NCLEVER(1)                 OGMP SEQUENCE UTILITY                 NCLEVER(1)

NAME
    nclever - "Network Command-Line Entrez VERsion"

REVISION
    This documentation refers to nclever version 4.0

SYNOPSIS
    nclever [-b] [-e]

DESCRIPTION
    nclever is a tty-based version of NCBI's Entrez program. It is an interactive tool that allows easy browsing of the Entrez database. For more information about Entrez, see the Entrez manual, write to entrez@ncbi.nih.nlm.gov, or better yet access their web site at http://www.ncbi.nlm.nih.gov/Entrez/

    The original Entrez Browser program written by NCBI is a tool that uses windows, menus, and a pointing device; since not everyone has computers or terminals with graphics capabilities, nclever was written to do the same work using only text input/output. nclever can do almost everything the browser version can do, and some more. See the section called NCLEVER AND THE ENTREZ BROWSER for a more complete comparison between these two Entrez database access tools.

    In addition, the nclever program permits BATCH access to the Entrez databases: by use of script files, nclever can be made to perform queries in batch mode. In this way nclever can be used as a "search engine" for any application which has as its input a set of database queries in nclever format and can use as output any of the data in the Entrez databases (in any of the various formats supported by nclever).

    nclever's user interface is command-line based (thus its name). The user is presented with a prompt at which he or she types a command. Some commands perform searches in the Entrez databases, while others set options, display records, or save information in files.

THE "-b" SWITCH
    The "-b" (for "batch") command-line switch globally affects the way warning messages (and other outputs of some commands) appear; when supplied, it turns OFF an internal parameter called VerboseMode. VerboseMode can also be toggled on and off with the Option command. See the description for this command and the section USING NCLEVER IN BATCH MODE.
    Two very noticeable effects of turning off VerboseMode are that the information normally printed at startup is suppressed and that the interactive prompt is absent entirely!

THE MAIN LISTS
    The program manages two main lists. The first one is a list of terms, supplied by the user. It is the equivalent of the "Query Refinement" subwindow in the Entrez Browser. Each term is a search string used to query one of Entrez's indexes. Terms can be grouped together to make conjunctive or disjunctive queries.

    nclever computes the other list, called the current documents list, from the list of terms, every time the list of terms is changed. When doing neighboring or lookups, the current list of documents can become quite different from what is specified in the list of terms.

    Each of the two lists has its own "current database" associated with it. The term list database is changed with the database command, while the current documents database is usually the same as the term list database except when doing neighboring, when it may change.

COMMANDS
    Commands are described here; they are entered after nclever's prompt, which is "NCLEVER> ". Arguments to commands are separated by white space (that is, blanks or tabs) and sometimes can also be separated by commas. All commands can be abbreviated to the minimum number of characters necessary to resolve ambiguity with other commands. They are not case-sensitive.

  INFORMATION COMMANDS

    About:
        Synopsis: "About"
        Displays the version and authorship of the program; an abridged version of this message is also displayed once at the beginning of each invocation of nclever if VerboseMode is true.

    Help,Man,?:
        Synopsis: "Help [command]", "Man [command]", "? [command]"
        Without arguments, displays a list of all commands with a short description of each. If the name of a command is supplied as argument, displays more information about that command.

    Status:
        Synopsis: "Status"
        Reports miscellaneous Entrez database information.

    Info:
        Synopsis: "Info"
        Displays the list of available Entrez search fields in tabular format. Each field has

            1) a text tag (shown in square brackets)
            2) a descriptive name
            3) three flags indicating whether or not this field can be used to query the three databases:
                   M = Medline
                   P = Protein
                   N = Nucleotide

        This command is particularly useful in conjunction with the "Search" query command, which needs the text tags in its search expression. See the description of this command, below, in section QUERY COMMANDS.

  CONFIGURATION COMMANDS

    Option:
        Synopsis: "Option [[no|!]optname] [[no|!]optname...]"
        This command sets/resets options that change the behavior of other commands and of the program generally. With no arguments, it displays the current values of all options.

        There are two kinds of options: boolean and integer. Boolean options are set to TRUE simply by supplying their names as argument to the command, and set to FALSE by prefixing them with the letters "no" or with a "!". For example,

            Option MultipleMode

        sets the option MultipleMode to TRUE while

            Option noMultipleMode

        sets it to FALSE. There is currently only one integer option, and it is called CharsPerLine. It is set to a value by supplying a number after the option name, as in:

            Option CharsPerLine 80

        Several options can be set/reset on the same command line; they are not case-sensitive and they can all be abbreviated to the minimum number of characters necessary to distinguish between them. All options have default values, which can be saved to the user's nclever configuration file (see the section THE NCLEVER CONFIGURATION FILE).

        Here is a description of all possible options:

        Option [no]MEDAbstract
        Option [no]MEDGenes
        Option [no]MEDMesh
        Option [no]MEDSubstances
            These four options selectively enable/disable displaying parts of the MEDLINE records when the REPORT format is chosen (see the ARTICLE command).

        Option CharsPerLine
            This option tells nclever to use a display width of the given number of characters.
            It affects the display of sequence records in "Features" format, and the output of the ABOUT command.

        Option [no]ParentsPersist
            This option tells nclever to always include a copy of the records that were selected for neighboring when returning a list of their neighbors. When listing the current documents list, a "*" is shown beside the parent documents.

        Option [no]MultipleMode
            This option affects all search commands. When set to TRUE, the search commands parse their arguments, separating them at white space, and make a query for each of them. When set to FALSE, everything after the search command is considered part of the query, INCLUDING the white space. Therefore, when set to TRUE, a query of the form

                Author Struhl K

            will search for "Struhl" and then for "K", which might not be what the user wanted; setting NOMultipleMode will look for "Struhl K".

        Option [no]TruncationMode
            This option affects all search commands. When set to TRUE, all queries are made in Truncation Mode, that is, the search string is interpreted as a prefix of what is looked for. For example, a query on the word "cox" will effectively match "cox1", "cox2", etc. In that case, reported entries are shown with a "..." appended to the search string. When set to FALSE, an exact match between the search string and the indexed terms of Entrez is expected.

            SPECIAL NOTE, January 1999: even though Truncation Mode is correctly implemented in nclever, it doesn't work because of unfinished upgrades at NCBI's servers. It is recommended NOT to use Truncation Mode in day-to-day business, but to try it anyway about once a month. It may suddenly start working late in 1999 or in early 2000.

        Option [no]AllowNull
            This option affects all search commands. When set to TRUE, all queries that return an empty list of documents will still create an entry in the current list of terms. This behavior is useful when using nclever in batch mode; when using it interactively this option is better set to FALSE.
            The default is FALSE.

        Option [no]VerboseMode
            This option affects many commands. It basically toggles the display of warning messages. Usually it is set to TRUE for interactive query of the database, and set to FALSE on invocation (using the -b command-line option) when nclever is used in BATCH mode, doing automatic retrieval from the Entrez database under script control. In that case, nclever produces only useful information when the script is run, so the display of the prompt and of the initial welcome message is also disabled. See the section USING NCLEVER IN BATCH MODE for more information.

        Option Save
            This is not really an option. It tells nclever to write out the current setting of all the options to nclever's configuration file. See the section THE NCLEVER CONFIGURATION FILE for more information.

        OPTIONS DEFAULTS:
            Medline Report display options:
                - MedAbstracts   = TRUE
                - MedMesh        = TRUE
                - MedGenes       = TRUE
                - MedSubstances  = TRUE
            Miscellaneous options:
                - CharsPerLine   = 80
                - ParentsPersist = FALSE
                - MultipleMode   = TRUE
                - TruncationMode = FALSE
                - AllowNull      = FALSE
                - VerboseMode    = TRUE

    Database:
        Synopsis: "Database [database name]"
        This command sets the current terms lookup database to either "medline", "protein" or "nucleotide". Note that changing database implies doing a RESET of the current search environment: all searched terms are cleared, along with the current document list and its neighboring history. The default database is Medline. Without arguments, shows the current database of the list of terms.

    Article:
        Synopsis: "Article [article format]"
        This command sets the format in which Medline articles are displayed. If no arguments are supplied, it shows what format is currently chosen. Possible formats can be shown by supplying a question mark to the command (or anything else that is not a legal article format).

    Report:
        Synopsis: "Report [sequence report format]"
        This command sets the format in which sequence records are displayed.
        If no arguments are supplied, it shows what format is currently chosen. Possible formats can be shown by supplying a question mark to the command (or anything else that is not a legal sequence format).

    Class:
        Synopsis: "Class [sequence level]"
        This command tells nclever what level of complexity of the sequence to display. Possible levels are NucProt, SegSet and BioSeq, according to NCBI's internal data structures for representing sets of sequences. If no arguments are supplied, it shows what level is currently chosen. Possible levels can be shown by supplying a question mark to the command (or anything else that is not one of the three keywords just mentioned).

  QUERY COMMANDS

    Synopsis: " [term2] [term3]..."  (MultipleMode=TRUE)
              " "                    (MultipleMode=FALSE)

    All query commands do the same thing: they search for one or more terms in the indexes of the Entrez databases, and if the operation is successful, add each term to the current list of terms, at which point the terms can be grouped together, excluded, etc. There are fifteen query commands that each search a single Entrez field, and a more general query command called "Search" (described below) which can be used for more sophisticated boolean queries.

    There are two options that affect searching (see also the Option command). MultipleMode forces the query commands to make a query for each space-separated argument given to them; when MultipleMode is FALSE, spaces become significant in queries. TruncationMode allows queries to be made on prefixes of indexed terms; for example, a query on the word "cox" will effectively match "cox1", "cox2", etc.

    Some query commands apply to only some of the three Entrez databases. The "Info" command can show you which field is available for each database.
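
The effect of the two search options can be pictured with a short Python sketch. This is only an illustration of the behavior described above, not nclever's actual code: the function names `split_query` and `term_matches` are hypothetical, and the trimming of leading and trailing blanks and the case-sensitive comparison are assumptions.

```python
def split_query(argument_text: str, multiple_mode: bool) -> list[str]:
    """Return the terms one search command would look up (illustrative only)."""
    if multiple_mode:
        # MultipleMode TRUE: one query per white-space-separated argument.
        return argument_text.split()
    # MultipleMode FALSE: the whole argument, inner spaces included, is one query.
    # Trimming the outer blanks is an assumption of this sketch.
    return [argument_text.strip()]


def term_matches(search_string: str, indexed_term: str, truncation_mode: bool) -> bool:
    """Decide whether an indexed term satisfies the search string (illustrative only)."""
    if truncation_mode:
        # TruncationMode TRUE: the search string is treated as a prefix.
        return indexed_term.startswith(search_string)
    # TruncationMode FALSE: only an exact match counts.
    return indexed_term == search_string


if __name__ == "__main__":
    print(split_query("Struhl K", multiple_mode=True))
    print(split_query("Struhl K", multiple_mode=False))
    print(term_matches("cox", "cox1", truncation_mode=True))
```

With MultipleMode on, "Struhl K" yields two separate queries; with it off, it yields the single query "Struhl K", which is usually what an author search needs.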
    The query commands are:

        Accession  - Select documents by accession number
        Author     - Select documents by author
        Date       - Select documents by publication date (same as PDATE)
        Pdate      - Select documents by publication date (same as DATE)
        Edate      - Select documents by Entrez date
        Mdate      - Select documents by modification date
        ECnumber   - Select documents by E. C. Number
        Gene       - Select documents by gene name
        Journal    - Select documents by journal title
        Keyword    - Select documents by keyword
        Mesh       - Select documents by MESH terms
        Organism   - Select documents by organism name
        Pname      - Select documents by protein name
        Substance  - Select documents by substance name
        Text       - Select documents by text words (titles + abstracts)
        Title      - Select documents by title words

    The index of the date command contains years like "1968", combinations of years and months like "1995/01", and combinations of years, months and days like "1995/01/23".

        Search     - Select documents using an Entrez search expression

    The "Search" query command allows the user to enter explicitly his or her own Entrez query expression. These expressions are made up of query terms, field tags, parentheses and boolean operators. Query terms must be surrounded by double quotes; field tags are surrounded by square brackets (see also the Info command); the available boolean operators are "&" (and), "|" (or) and "-" (butnot). This query command is useful for searching Entrez with fields which do not have a corresponding search command in the list above. For example, to search Medline by page numbers the user can query with

        Search "293-295" [PAGE]

    A more complex example using the "&" boolean operator:

        Search "rioux" [AUTH] & "littlejohn" [AUTH]

    This last query is exactly equivalent to issuing two separate queries with the "Author" query command.

    When a term has been added to the list of terms, the current document list is updated, and its associated database is set to the term list's database.
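
For applications that generate queries, a "Search" line can be assembled mechanically from the syntax rules above. The following Python sketch is only an illustration: the helper names `term` and `search_command` and the `OPERATORS` table are hypothetical, parentheses are not handled, and no attempt is made to escape double quotes inside a query term.

```python
# Boolean operators as listed for the "Search" command.
OPERATORS = {"and": "&", "or": "|", "butnot": "-"}


def term(text: str, field_tag: str) -> str:
    """Quote a query term and attach its field tag in square brackets."""
    return f'"{text}" [{field_tag}]'


def search_command(first: str, *rest: tuple[str, str]) -> str:
    """Build one "Search" line from a first term and (operator name, term) pairs."""
    parts = [first]
    for operator_name, next_term in rest:
        parts.append(OPERATORS[operator_name])  # KeyError on an unknown operator name
        parts.append(next_term)
    return "Search " + " ".join(parts)


if __name__ == "__main__":
    print(search_command(term("293-295", "PAGE")))
    print(search_command(term("rioux", "AUTH"), ("and", term("littlejohn", "AUTH"))))
```

The two calls in the usage section reproduce the two example queries given above.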

  NEIGHBORING COMMANDS:

    Synopsis: " [num2] [num3]"

    The commands "Neighbors", "Medline", "Protein" and "Nucleic" are used to do neighboring and lookup (see below) searches. When supplied with a list of numbers corresponding to documents in the current documents list, they retrieve the set of "similar" documents (precomputed in the Entrez database; see the Entrez documentation for how these indexes are built). This set then becomes the new current list of documents. "Neighbors" are similar records in the same database as the current list. The other commands ("Medline", "Protein" and "Nucleic") specify another database (or the same one) in a more explicit manner (see LOOKUPS below).

    If the ParentsPersist option is TRUE, the documents used for neighboring will be included at the top of the new list, and marked with an "*" when listing it.

    Special recognized arguments are ALL for "all documents" and PARENTS for "parent documents" (they can be abbreviated to "A" and "P").

  LOOKUPS

    Unlike the Entrez Browser, nclever does not have a lookup command. Instead, lookups are performed by specifying the database to be accessed and the documents from the current list to be looked up in that database. Thus lookups are performed as described above for "NEIGHBORING" but apply only when the "Medline", "Protein" and "Nucleic" commands are used and when the most recent document list applies to a database different from the one specified in the command. For instance, if the Medline database had just been searched and the nucleic acid entry for the first entry was desired, the command

        Nucleic 1

    would retrieve that entry. On the other hand, if the neighbors of this document were desired, either of the commands

        Medline 1
        Neighbor 1

    would retrieve the neighbors of the first document on the list.

  HISTORY COMMANDS:

    Synopsis: "History"
              "Previous"
              "Next"

    When doing neighboring, the current list changes as the user browses lists of documents.
    nclever keeps a history of the changes, and the user is able to go back to previously fetched document lists with the "previous" command. The "next" command goes forward in the history list. The "history" command shows a summary of that list. Note that modifying the term list doesn't update the history list until the history list is explicitly accessed with one of the history commands.

  TAXONOMY COMMANDS:

    Synopsis: "Taxonomy List"
              "Taxonomy Down "
              "Taxonomy Up [num]"
              "Taxonomy Add [num]"

    These commands allow the user to browse the two taxonomic trees available with the two sequence databases. The "List" command shows information related to the current node in the tree: its lineage and the names of all its children, along with the number of documents found in the current sequence database. Taxonomy starts by default at the "root" of the tree, which is by convention at the 'top' and is its 'highest point'. The "Down" command allows the user to go to the child with the given number (as reported by the "List" command) of the current node. The "Up" command does the inverse; if a number is supplied, the user climbs back up the tree to the lineage level with that number. The "Add" command puts the list of documents specified by the given child into the current list of terms, in the same way as the search commands. If the current node in the tree is already a leaf (and therefore has no children), then no number needs to be specified.

  MISCELLANEOUS COMMANDS:

    List:
        Synopsis: "List [num]"
        Shows a summary of the current list of documents. Since this list can be very long, the default shows only the first 20 documents. When given a number N as argument, a summary of the first N documents is shown. Special arguments are "A" and "P"; see the NEIGHBORING COMMANDS subsection.

    Pick, Union and Not:
        Synopsis: " [num1] [num2] [num3]..."
        These commands are used to manipulate the terms in the list of terms.
        They take a list of numbers as arguments, each number corresponding to one of the terms shown by the list command. Since version 3.02, the "all" keyword can also be used to specify "all terms in the current term list".

        The "Pick" command used without argument simply shows the current term list. With arguments, it selects and unselects individual terms; an unselected term is shown with nothing in front of it while a selected one is shown with a "<" sign (unless grouped with the "Union" command). A negative number -n means to UNpick term number n. Unselected terms are completely ignored for the purpose of building the list of documents. The "Pick" command interprets "Pick 0" as "unpick all".

        The "Union" command groups all the terms whose numbers are supplied as arguments. Groups of one term are shown with a "<" sign in front of them, like a one-line bracket, while larger groups are shown with the characters "/", "|" and "\", which visually appear as larger brackets. Picking or unpicking single terms can be used to break up a group.

        The "Not" command can only be used on groups of one term. It is used to do boolean NOTs of terms. Such terms are shown with a "-" sign in front of the single "<". A negative number -n means to remove the boolean NOT associated with term number n.

        Note that grouping terms together with "Union" can move the terms around in the list (this doesn't apply to the "Pick" or "Not" commands).

        Examples:

            Pick 2 -5 3   - Selects terms 2 and 3, unselects term 5
            Union 2 4 6   - Groups terms 2, 4 and 6 (boolean "or")
            Not 1 3 -5    - Subtracts documents of terms 1 and 3; documents of term 5 are back to normal.

        Evaluation of the boolean expression built by PICKing, UNIONing and NOTing terms is done in the following manner: first, all groups made with the "Union" command are evaluated as ORs of the lists of documents specified by the terms. The groups are then ANDed together and finally, the single NOTed groups are subtracted from that result.
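
The evaluation order just described (OR within each group, AND across groups, then subtraction of the NOTed terms) can be sketched with Python sets. This is only an illustration under stated assumptions, not nclever's implementation: the function name `evaluate` and the document identifiers are made up, and returning an empty result when no positive group exists is an assumption of the sketch.

```python
def evaluate(groups: list[list[int]], negated: list[int], docs: dict[int, set[int]]) -> set[int]:
    """Combine per-term document sets.

    groups  - selected groups of term numbers (a single picked term is a group of one)
    negated - term numbers marked with "Not"
    docs    - hypothetical mapping from term number to its set of document ids
    """
    result = None
    for group in groups:
        # Step 1: OR together the documents of every term in the group.
        union: set[int] = set()
        for term_number in group:
            union |= docs[term_number]
        # Step 2: AND the groups together.
        result = union if result is None else result & union
    if result is None:
        # Assumption: with no positive group there is nothing to subtract from.
        result = set()
    # Step 3: subtract the documents of every NOTed term.
    for term_number in negated:
        result -= docs[term_number]
    return result


if __name__ == "__main__":
    # Made-up document ids for eight terms.
    docs = {1: {1, 2}, 2: {2, 3, 4, 5}, 3: {3}, 4: {4}, 5: {5, 9},
            6: {5}, 7: {2, 3, 4}, 8: {5}}
    print(evaluate([[2], [3, 4, 5], [7, 8]], [1, 6], docs))
```

The sample call uses one single-term group, two Union groups and two NOTed terms; with the made-up sets above, only documents 3 and 4 survive all three steps.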
        Therefore, a list of 8 terms like this:

            -< 1 Term1
             < 2 Term2
             / 3 Term3
             | 4 Term4
             \ 5 Term5
            -< 6 Term6
             / 7 Term7
             \ 8 Term8

        can be interpreted as the boolean expression

            ((2) AND (3 OR 4 OR 5) AND (7 OR 8)) AND (NOT 1) AND (NOT 6)

        When no argument is supplied, these commands show the current list of terms.

        When a change has been made to the list of terms, the current document list is updated, and its associated database is set to the term list's database.

    Type:
        Synopsis: "[Type] [num2] [num3]..."
        This command displays one or more documents from the current list of documents. It takes numbers as arguments to specify which documents to show. Special arguments are "A" and "P"; see the NEIGHBORING COMMANDS subsection. The format of the displayed documents depends on the settings of the "Article" or "Report" commands. The "Type" keyword itself is optional, since nclever will recognize a command that starts with a digit as an abbreviation for the "Type" command.

    UID:
        Synopsis: "UID [uid2] [uid3]..."
        This command doesn't affect any of the internal lists. It simply displays one or more documents from the current term database, specified by their UIDs. It therefore assumes that the user knows the correct UIDs. As with the "type" command, the format of the displayed documents depends on the settings of the "Article" and "Report" commands.

    File:
        Synopsis: "File [ [modifiers]]"
        This command tells nclever to send the useful output of other commands (like "list", "type", etc.) to a file. With no argument, it returns the output to the user's stream. When a file is supplied as argument, the output of the following commands is sent to the file name specified, and nothing is displayed on the screen.

        Some one-letter modifiers can be specified after the filename. An "A" means append to the file. A "1" means redirect the output for the next command ONLY, not all the following commands.
        It is possible to do a one-command-only redirection (with modifier "1") while the general output has already been redirected somewhere else. This feature is used internally by the "save" and "print" commands.

        Examples:

            File myabstracts 1   - Next command's output redirected
            List all             - This is what is sent to myabstracts
            List all             - This time it's sent to your console

    Save:
        Synopsis: "Save [num2] [num3]..."
        This command does the same thing as the "type" command, but sends its output to "filename". It is the same as doing "File 1" followed by "Type [num2] [num3]...".

    Print:
        Synopsis: "Print [num2] [num3]..."
        This command does the same thing as "save", but sends its output to the printer. The output is saved in a temporary file and that file is printed using the PRINT COMMAND configuration in the user's nclever configuration file. See the section called THE NCLEVER CONFIGURATION FILE. This command is implemented only for UNIX systems.

    Reset:
        Synopsis: "Reset"
        This command discards the current list of documents, its history list, and the current list of terms. It leaves all configuration settings unchanged (current term database, record formats, etc.).

    Saveuids:
        Synopsis: "Saveuids "
        This command saves the list of all the UIDs of the current documents list to the file filename.

    Loaduids:
        Synopsis: "Loaduids "
        This command reloads a list saved by the "saveuids" command. It adds the list to the term list, as if it were a legal searched-for term. The loaded list must be a list of UIDs from the same database as the term list's database (see the "database" command).

    Exit, Quit, EOF:
        Synopsis: "Exit", "Quit", "".
        This exits from nclever.

USING NCLEVER IN BATCH MODE
    Since nclever receives its commands on the standard input and displays records on its standard output, it can be used as a tool to query the Entrez database automatically. One simply has to feed it the commands on its standard input and gather the results on its standard output.
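
A calling program can drive nclever this way through a pipe. The Python sketch below is a minimal illustration, assuming an executable named `nclever` is on the PATH and using the "-b" switch described in this manual; the helper names `build_script` and `run_batch` are hypothetical, and nothing is assumed about the layout of the output beyond its being text.

```python
import subprocess


def build_script(commands: list[str]) -> str:
    """Join nclever commands into a script, one command per line."""
    return "\n".join(commands) + "\n"


def run_batch(commands: list[str], executable: str = "nclever") -> str:
    """Feed the commands to nclever on standard input and return its standard output.

    The executable name is an assumption; end of input (EOF) ends the session.
    """
    completed = subprocess.run(
        [executable, "-b"],            # -b turns VerboseMode off (no prompt, no banner)
        input=build_script(commands),  # commands go to standard input
        capture_output=True,           # results are gathered from standard output
        text=True,
        check=True,                    # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout


if __name__ == "__main__":
    script = ["Database Medline", "Option NoMultipleMode", "Author Struhl K", "Type 1"]
    print(build_script(script), end="")
    # records = run_batch(script)  # requires nclever to be installed
```

Keeping script construction separate from process execution makes the generated command text easy to inspect or save as a script file before any query is sent.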

    The "-b" (for BATCH) command-line switch can be used to turn off the internal option VerboseMode; this is advantageous in that it tells nclever not to print a prompt for each command, which would clutter the output and make it difficult for other programs to parse. The -b switch also disables the display of the introductory message. Therefore, building a scriptfile like this one:

        Database Medline
        Article ASN
        Option NoMultipleMode
        Author Struhl K
        Type 1

    and feeding it to nclever with "nclever -b