
Searching the Vanagon Mailing List Archives


 

Preface

This manual is an introduction to the new LISTSERV database functions. It is intended as a reference for general users with little or no knowledge of database systems, and it leaves out technical details that such users do not need to worry about.

This document discusses the syntax and operational characteristics of the LISTSERV database subsystem. It assumes that the reader is familiar with his or her e-mail client and with sending commands to a LISTSERV server.

If you just need a "quick start", read the next two sections for basic instructions. If you want a detailed tutorial on the SEARCH command itself, skim the next two sections and then start reading at the section entitled "The SEARCH command".

A basic database session

Let's say that you are looking for messages in the VANAGON mailing list that pertain to power mirrors.

To search for the term "power mirror" in the VANAGON list on GERRY.VANAGON.COM, create a new mail message addressed to LISTSERV@GERRY.VANAGON.COM and in the body (not the subject) of the message, simply type:

Search 'power mirror' in VANAGON

LISTSERV might respond to you with the following:

> search 'power mirror' in vanagon
-> 15 matches.

Item #   Date   Time  Recs   Subject
------   ----   ----  ----   -------
002464 00/01/26 20:02   15   FS: Lotsa parts from '87 Syncro
002674 00/01/28 14:28   46   Re: cold weather package?
002677 00/01/28 17:33   15   Re: cold weather package?
003488 00/02/07 10:24   63   still looking
003604 00/02/08 10:28   27   Power Mirrors not working
003608 00/02/08 13:02   39   Re: Power Mirrors not working
003936 00/02/11 11:57   29   Easy fix on power mirrors!
To order a copy of these postings, send the following command:

GETPOST vanagon 2464 2674 2677 3488 3604 3608 3936 4465 4468 4493-4499


>>> Item #2464 (26 Jan 2000 20:02) - FS: Lotsa parts from '87 Syncro
ECU, $100; AFM, $50; instrument panel, $75, (2) driver's side headlamp
assemblies, $75 each; power mirrors, w/switch and wiring, $150/all; 
                      ^^^^^^^^^^^^
grille, $50, 1 left armrest, blue, $20. Located in western MD.

>>> Item #2674 (28 Jan 2000 14:28) - Re: cold weather package?
If I'm not mistaken, the heated side mirrors came with the GL option
package that included the power windows and power mirrors.  Pretty
                                            ^^^^^^^^^^^^
cool, huh?

>>> Item #2677 (28 Jan 2000 17:33) - Re: cold weather package?

Does anybody know if the 87 GL (not Wolfsburg) w/power mirrors are 
                                                 ^^^^^^^^^^^^
heated? I never thought about it. I will test it tonight.

>>> Item #2678 (28 Jan 2000 17:45) - Re: cold weather package?

All power mirrors on the Vanagon are heated.  Rejoice in your new 
    ^^^^^^^^^^^^
found option!

>>> Item #2681 (28 Jan 2000 14:48) - Re: cold weather package?
>>> <KENWILFY@AOL.COM> 1/28/2000 2:45:16 PM >>>
All power mirrors on the Vanagon are heated.  Rejoice in your new 
    ^^^^^^^^^^^^
found option!

>>> Item #2704 (28 Jan 2000 22:21) - Re: cold weather package?

Nope, you aren't too high.....all of the power mirrors are heated at 
                                         ^^^^^^^^^^^^
the same time as the rear defroster.

>>> Item #3488 (7 Feb 2000 10:24) - still looking
                    front blower fan  out (checked fuse, fine)
                    power mirrors don't function -- fuse fine
                    ^^^^^^^^^^^^
                    one arm rest broken off
Summary of resource utilization
-------------------------------
 CPU time:      201.188 sec
 Overhead CPU:   26.406 sec
 CPU model:         2x90MHz Pentium (128M)

Note that LISTSERV includes excerpts from the indexed postings showing the search term(s) in context. We've deleted many of these excerpts from the example above to save space.

You would then use the GETPOST command to order the specific postings you want to read. For instance, to read postings 3604, 3608, and 4493 through 4499, you would create another new message (or reply to the response from LISTSERV without quoting its text) and type in the body:

GETPOST vanagon 3604 3608 4493-4499

LISTSERV would then respond with the desired postings.
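GETPOST accepts a list that mixes single item numbers with ranges. For example, 4493-4499 means 4493 through 4499. The short Python sketch below shows how such a list expands. `expand_items` is a hypothetical helper for illustration only and is not part of LISTSERV.

```python
def expand_items(spec: str) -> list[int]:
    """Expand a GETPOST-style item list such as '3604 3608 4493-4499'."""
    items: list[int] = []
    for token in spec.split():
        if "-" in token:
            # A range like '4493-4499' includes both end points.
            start, end = (int(part) for part in token.split("-", 1))
            items.extend(range(start, end + 1))
        else:
            items.append(int(token))
    return items

print(expand_items("3604 3608 4493-4499"))
# [3604, 3608, 4493, 4494, 4495, 4496, 4497, 4498, 4499]
```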

Narrowing the search

You can add further parameters to a search in order to narrow it. A SINCE clause limits a search by date, and a WHERE clause limits it by sender and/or subject line. For instance:

Search 'power mirror' in VANAGON since 99/01/01

Search 'power mirror' in VANAGON where sender contains 'Martha'

Search * in VANAGON where sender is COYOTE@VANAGON

Search * in VANAGON since 99/01/01 where subject contains 'power mirror'

are all valid search commands that can, depending on how well you have crafted them, dramatically reduce the number of entries returned to you.

The SEARCH command

This chapter introduces the SEARCH command. The minimum abbreviation of this command is just "S".

The syntax of this command is a bit complex, and will be introduced step by step.

Basic search functions

The two most important things you have to indicate when you search list archives are:

  1. The name of the list whose archives you want to search.

  2. What you want to search the individual documents for.

The name of the list to be searched is specified after the words or phrases to be sought and is prefixed with an IN keyword. For example, we might do this:

Search heater in VANAGON

This would select all the entries from list "VANAGON" containing the string "HEATER".

You could also use an asterisk as the search argument to select all the entries in the list, though this isn't terribly useful on its own:

Search * in VANAGON

Note that the list name does not have to be in uppercase. It is capitalized here only to make the examples easier to read.

If you want to 'narrow' your previous search, i.e. perform additional tests on the documents that have been previously selected, you must omit the IN keyword. In that case, the search will be applied to the previous 'hits' and will create a new 'hit list'.

But in most cases, we will want to search for something longer than one word, for example part of a 'key' sentence.

Search problem with heated power mirror in VANAGON

Another difficulty is that we might not remember the exact original sentence. This does not matter much, because LISTSERV searches for each word individually. In the above example, any entry containing the words "problem", "with", "heated", "power" and "mirror" would have matched the search, even if the words appeared in a different order.

But what if the original document had "mirrors" in it instead of "mirror"? Again, this is no problem. LISTSERV does not require the word to be surrounded by blanks to find a match, and it ignores case when performing the search. That is, "mirror" would have found a match on "mirrors"... and "with" would have found a match on "without" or "withstand"! This may seem inconsistent, but keep in mind that you can always "narrow down" a search later. Once a document has been excluded from the list of "hits", however, it is very difficult to bring it back.
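The matching rules above can be sketched in Python as follows. Each word is searched for on its own, as a case-insensitive substring, and the words may appear in any order. This is a simplified model and not LISTSERV's code. `matches` is a hypothetical name, and the sketch treats the implicit operator as a plain AND, so it ignores any proximity requirement that NEAR may impose.

```python
def matches(document: str, search: str) -> bool:
    """True if every blank-separated word of `search` occurs in `document`.

    Each word is matched as a case-insensitive substring, in any order.
    """
    text = document.upper()
    return all(word.upper() in text for word in search.split())

doc = "My heated power mirrors stopped working without warning."
print(matches(doc, "problem with heated power mirror"))  # False: 'problem' is absent
print(matches(doc, "heated mirror with"))  # True: 'with' matches inside 'without'
```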

Now what if I want to search for an exact string? For example, I am interested in the string "in C". It is very likely that almost any document in the database will contain both an "in" and the letter C. But what I am interested in is things which have been written, or programmed, or implemented, "in C". In that case, I can force LISTSERV to group words together by quoting them, as in:

Search 'in C' in VANAGON

This method can also be used to insert extra blanks between or before words: leading and trailing blanks are normally removed automatically, but they are preserved inside quoted strings. Please note that quotes must be doubled when specified inside quoted strings, as in:

Search 'Coyote''s van' in VANAGON

The search for 'in C' produced over fifty hits, because it also matched "in clear", "in core", and so on. However, I do not want to search for 'in C ' (with a trailing blank), because I would then miss "in C." or "in C," in the database. A search that respected the capital C would no longer find all those irrelevant hits. To get one, enclose your search string in double quotes instead of single quotes, for example:

Search "in C" in VANAGON

Note that single quotes should not be doubled inside double-quoted strings, and vice versa. Only quotes of the same type as the enclosing quotes should be doubled.

It is important to understand the difference between the two types of quoting. If you request a search for 'TEXT', you will find a match on "TEXT", "Text", "text" or even "teXt". This is the same behaviour as unquoted text. However, if you request a search for "TEXT", it will only find a match on "TEXT", not on "text" nor "Text".
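The quoting rules above can be summarized as a small tokenizer. The Python sketch below is a hypothetical illustration and not LISTSERV's parser:

- It uppercases unquoted words and single-quoted strings.
- It keeps double-quoted strings exactly as typed and marks them as case-sensitive.
- It preserves blanks inside quotes.
- It turns a doubled quote of the enclosing type into one literal quote.

Reserved words, operators and parentheses are not handled.

```python
def tokenize(text: str) -> list[tuple[str, bool]]:
    """Split search arguments into (string, case_sensitive) pairs."""
    tokens: list[tuple[str, bool]] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1  # blanks outside quotes only separate words
        elif ch in "'\"":
            quote, i, buf = ch, i + 1, []
            while i < len(text):
                if text[i] == quote:
                    if i + 1 < len(text) and text[i + 1] == quote:
                        buf.append(quote)  # doubled quote -> one literal quote
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                buf.append(text[i])  # blanks inside quotes are kept
                i += 1
            value = "".join(buf)
            # Double quotes keep case; single quotes behave like unquoted text.
            tokens.append((value, True) if quote == '"' else (value.upper(), False))
        else:
            start = i
            while i < len(text) and not text[i].isspace():
                i += 1
            tokens.append((text[start:i].upper(), False))
    return tokens

print(tokenize("'Coyote''s van'"))   # [("COYOTE'S VAN", False)]
print(tokenize('"in C" program'))    # [('in C', True), ('PROGRAM', False)]
```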

Quoting is also the only way to search for a reserved keyword like "IN". If you tried "Search in in VANAGON", LISTSERV would report that database "IN" does not exist and would reject the command, because the keyword IN marks the end of your search arguments. If you quote it, however, LISTSERV will not recognize it as a keyword and will search for it as intended. Similarly, to search for an asterisk you must quote it, since "Search *" means that all entries should be selected.

Now the problem is that there may be sentences starting with a capital I, e.g. "In C, it would be coded this way:". How can I catch these sentences?

Actually, you have been using "complex search expressions" from the beginning without being aware of it. When you searched for "problem with heated power mirror", you were in fact asking LISTSERV for "problem NEAR with NEAR heated NEAR power NEAR mirror". The NEAR is implicit, but it may be overridden.

You may even use parentheses if needed:

Search ("in C" or "In C") and program in VANAGON

The 'NEAR' can still be implied, as in:

Search vanagon (white or red) in VANAGON

Search (Propex heater) or (P4 heater) in VANAGON

Search bumper (silver or black but not metal) in VANAGON

The following commands are strictly equivalent:

Search (1990 vanagon) or (1980 vanagon not brown) in VANAGON

Search vanagon (1990 or (1980 not brown)) in VANAGON

Search vanagon (1990 or (1980 but not brown)) in VANAGON

Search vanagon NEAR (1990 OR (1980 AND NOT brown)) in VANAGON
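One way to convince yourself that these forms are equivalent is to check every combination of words being present or absent. The Python sketch below does this under a simplifying assumption: NEAR is treated as a plain boolean AND, so proximity is ignored. The function names are hypothetical.

```python
from itertools import product

# Each argument is True when that word occurs in the document.
def form_a(vanagon: bool, y1990: bool, y1980: bool, brown: bool) -> bool:
    # (1990 vanagon) or (1980 vanagon not brown), with NEAR treated as AND
    return (y1990 and vanagon) or (y1980 and vanagon and not brown)

def form_b(vanagon: bool, y1990: bool, y1980: bool, brown: bool) -> bool:
    # vanagon NEAR (1990 OR (1980 AND NOT brown)), with NEAR treated as AND
    return vanagon and (y1990 or (y1980 and not brown))

# Compare both forms over all 16 presence/absence combinations.
equivalent = all(
    form_a(*values) == form_b(*values)
    for values in product([False, True], repeat=4)
)
print(equivalent)  # True
```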

Date specifications

Since each document has been assigned a "date/time" field, it is possible to select documents based on this date field. This is accomplished by appending "date search rules" to the search expression, as in:

Search problem (serious or severe) in VANAGON since july

Search problem in VANAGON since oct 85

Search head gasket in VANAGON since 12/28

Search bumper from 12 january to august in VANAGON

Search projektzwo until 18 sept in VANAGON

Search list status since today 11:53 in VANAGON

The default values for omitted arguments are always chosen so as to exclude as few entries as possible. For example, "July" would mean "1 July 00:00:00" in a SINCE specification, and "31 July 23:59:59" in an UNTIL clause. The only exception is the year field, which always defaults to the current year.
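The defaulting rule for a bare month name can be sketched in Python as below. `month_boundary` is a hypothetical helper that illustrates the rule stated above:

- SINCE gets the first second of the month.
- UNTIL gets the last second of the month.
- An omitted year becomes the current year.

```python
import calendar
from datetime import datetime
from typing import Optional

def month_boundary(month: int, clause: str, year: Optional[int] = None) -> datetime:
    """Instant that a bare month name stands for in a SINCE or UNTIL rule."""
    if year is None:
        year = datetime.now().year  # the year defaults to the current year
    if clause.upper() == "SINCE":
        return datetime(year, month, 1, 0, 0, 0)  # start of the month
    if clause.upper() == "UNTIL":
        last_day = calendar.monthrange(year, month)[1]  # number of days in month
        return datetime(year, month, last_day, 23, 59, 59)  # end of the month
    raise ValueError(f"unsupported clause: {clause}")

print(month_boundary(7, "SINCE", 2000))  # 2000-07-01 00:00:00
print(month_boundary(7, "UNTIL", 2000))  # 2000-07-31 23:59:59
```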

Keyword search specifications

The last thing you may wish to search is the 'keywords' list. The following keywords are supported:

subject
sender

For example, you might want to search for 'door screen' where the subject contains the word 'mosquito':

Search door screen in VANAGON where subject contains mosquito

You may of course use complex expressions (with parentheses) in the WHERE clause. Additional comparison operators are available in this clause, such as IS, CONTAINS, all the usual arithmetic comparison operators, and some more. However, AND is no longer implied here, so you must specify it explicitly:

Search door screen in VANAGON where subject contains mosquito and sender IS hedwig@angryinch.org

Long search specifications

As search commands become more complex, they will no longer fit on a single line. To solve this problem, we begin the command with the string '// ' (two forward slashes and a space), followed by the SEARCH command and the search specifications. Any line of a database command that ends in a comma indicates that more is to follow on the next line, and you can repeat this as many times as you like. The following three commands are equivalent:

// Search bumper (plastic or (silver or black but not red)) ,
          in VANAGON

// Search bumper (plastic or ,
          (silver or black but not red)) ,
          in VANAGON

// Search bumper ,
          (plastic or ( ,
          silver or black ,
          but not red)) in VANAGON

The only 'trick' about this continuation line business is that you should always keep quoted strings on a single line. The process of identifying continuation lines and concatenating them afterwards may cause unwanted blanks to be inserted in the command line, which is no problem outside a quoted string since blanks are ignored, but might cause erroneous results in a quoted string.
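The continuation-line mechanism can be modelled roughly as follows. This Python sketch is an assumption about the behaviour and not LISTSERV's implementation. `join_continuations` is hypothetical, and it assumes exactly one blank is inserted at each join. It also shows how a quoted string split across lines picks up an unwanted blank.

```python
def join_continuations(lines: list[str]) -> str:
    """Join a '// '-prefixed command whose lines end in a comma."""
    parts: list[str] = []
    for line in lines:
        text = line.strip()
        if not parts and text.startswith("//"):
            text = text[2:].strip()  # drop the leading '//' marker
        if text.endswith(","):
            parts.append(text[:-1].strip())  # trailing comma: more follows
        else:
            parts.append(text)  # last line of the command
            break
    # Assumption: one blank is inserted at every join.
    return " ".join(parts)

print(join_continuations(["// Search bumper (plastic or ,",
                          "          (silver or black but not red)) ,",
                          "          in VANAGON"]))
# Search bumper (plastic or (silver or black but not red)) in VANAGON
print(join_continuations(["// Search 'power mir,", "ror' in VANAGON"]))
# Search 'power mir ror' in VANAGON   <- blank inserted inside the quotes
```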

If you want to search for several possible values in a given keyword, you do not have to repeat the keyword name and operator:

// Search * in VANAGON where ,
subject contains (Digifant or (Vanagon and computer))

// Search * in VANAGON where ,
sender contains earthlink.net or ,
(subject contains digifant and sender contains Darrell)

However, it should be noted that this 'factorization' is performed according to the rules of logic, which may not necessarily match those of English grammar. This removes any possible ambiguity as to the meaning of these clauses. Let's consider the following example:

subject does not contain (Vanagon and Type2)

This clause will get translated into:

subject does not contain Vanagon and subject does not contain Type2

In English you would probably say "subject contains neither Vanagon nor Type2", and this is how LISTSERV will understand it. However, if you read the clause aloud, you will probably not pronounce the parentheses and will end up saying "subject does not contain Vanagon and Type2", in other words "subject does not contain both Vanagon and Type2". That is a totally different thing, and it would be true for almost every message. The 'English meaning' could be obtained with the following clause:

not (subject contains (Vanagon and Type2))

In the former case, the negative 'does not contain' operator is distributed inside the parentheses. In the latter, only "contains" is moved inside, and the negation remains outside.
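The factorization rule can be sketched in Python as a function that distributes the keyword and comparison operator over each comparand of a nested expression. `factorize` and the tuple representation of expressions are hypothetical and chosen only for illustration. The last lines evaluate both readings of the example against a sample subject line to show that they really do differ.

```python
from typing import Union

# A comparand (str), or a tuple ("AND" | "OR", left, right).
Expr = Union[str, tuple]

def factorize(keyword: str, operator: str, expr: Expr) -> str:
    """Distribute 'keyword operator' over every comparand in expr."""
    if isinstance(expr, str):
        return f"{keyword} {operator} {expr}"
    logic, left, right = expr
    return (f"({factorize(keyword, operator, left)} {logic} "
            f"{factorize(keyword, operator, right)})")

print(factorize("subject", "DOES NOT CONTAIN", ("AND", "Vanagon", "Type2")))
# (subject DOES NOT CONTAIN Vanagon AND subject DOES NOT CONTAIN Type2)

subject = "Vanagon heater".upper()
# Factorized reading: contains neither Vanagon nor Type2.
logical = ("VANAGON" not in subject) and ("TYPE2" not in subject)
# 'English' reading: not (contains both Vanagon and Type2).
english = not ("VANAGON" in subject and "TYPE2" in subject)
print(logical, english)  # False True
```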

Phonetic search

There may be cases where you are looking for a certain value of a keyword but cannot remember its exact spelling. In these cases, it may be useful to try a phonetic search. A phonetic search yields a match for anything that 'sounds like' your search string, as determined by a predefined algorithm which is, of course, not perfect. It may give a hit for something that does not actually sound like your search string or, more rarely, miss a keyword that did sound like what you entered. There are two main reasons for this. First, the algorithm must be fast to execute, and therefore cannot be too sophisticated. Second, the way a given word is pronounced depends on the language in which it was written. For example, the phonetic transcription of the name 'Landau' is different in French, English, German and Russian. It is therefore impossible to decide whether one word sounds like another without knowing the language in which they are pronounced, and LISTSERV has no way of knowing it a priori.

Phonetic searches are performed through the use of the SOUNDS LIKE and DOES NOT SOUND LIKE operators, which are syntactically similar to CONTAINS and DOES NOT CONTAIN. That is, you could do something like:

Search * in VANAGON where SUBJECT sounds like projectzwo

Note: There is a little trick with the SOUNDS LIKE operator that you should be aware of. If your search string ('projectzwo' in our above example) is a single word, it will be compared individually to all the words in the reference string (i.e. the data from the database), and will be considered a hit if it 'sounds like' any of the words in the reference string. Thus, the search word 'Ekohl' sounds like the reference string 'Ecole Normale Superieure' because it matches the first word. If the search string contains more than one word, the search and reference strings will be compared phonetically as a whole (and 'Ekohl Dzentrahll' will therefore not match 'Ecole Normale Superieure'). Note that any search string containing more than a single word must be quoted, as explained in the previous sections of this chapter.

  +--------------------------------------------------------------------+
  |                                                                    |
  | > Search * in VANAGON where subject sounds like (projectzo or ,    |
  | > reemo)                                                           |
  |                                                                    |
  | -> 3 matches.                                                      |
  |                                                                    |
  | Ref# Conn  Nodeid   Site name                                      |
  | ---- ----  ------   ---------                                      |
  | 0292 87/03 CRNLASSP Cornell University Cornell Laboratory of Atomic|
  | 0301 87/03 CRNLION  Cornell University Cornell Laboratory of Plasma|
  | 0307 87/06 CRNLNUC  Cornell University Laboratory of Nuclear Studes|
  |                                                                    |
  | > Search * in BITEARN where SITE sounds like HOPTIKK               |
  |                                                                    |
  | -> 2 matches.                                                      |
  |                                                                    |
  | Ref# Conn Nodeid Site name                                         |
  | ---- ---- ------ ---------                                         |
  | 0751 87/09 FRIHAP31 Assistance Publique - Hopitaux de Paris        |
  | 2120 87/04 UOROPT University of Rochester The Institute of Optics  |
  |                                                                    |
  | > Search * in BITEARN where SITE sounds like SCHIKAGO              |
  |                                                                    |
  | -> 1 match.                                                        |
  |                                                                    |
  | Ref# Conn Nodeid Site name                                         |
  | ---- ---- ------ ---------                                         |
  | 0140 86/03 BMLSCK11 Studiecentrum voor Kernenergie (SCK/CEN), Mol, |
  |                                                                    |
  | Figure 7.   Sample SEARCH commands involving phonetic  match:  The |
  |             first  command  shows  an example of accurate phonetic |
  |             match, where the  result  is  exactly  what  the  user |
  |             expected.   In the second example, the user found what |
  |             he was  looking  for  ("Optics"),  but  an  additional |
  |             unwanted  entry was selected.  This is by far the most |
  |             common case.  The last command is a typical example of |
  |             phonetic clash, where the algorithm did not  translate |
  |             the  search string into phonetics as the user expected |
  |             it, with the result that the desired name  ("Chicago") |
  |             was  not  found and that completely irrelevant entries |
  |             were presented instead.                                |
  +--------------------------------------------------------------------+

The phonetic matching algorithm used by LISTSERV is a slightly modified version of SOUNDEX, a well-known algorithm that provides reasonably accurate matches at a very low CPU cost. It gives the best results with English, for which it was originally designed, but it is not too strongly tied to that language and can still be used with others. It is of course impossible to write a program that would work for all the languages in the world, or even for the most widely used ones, since they interpret the most common letter combinations in completely incompatible ways.
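For reference, the textbook (American) Soundex algorithm is sketched below in Python. It is combined with the single-word versus multi-word rule described in the note earlier in this chapter. LISTSERV uses a slightly modified version whose changes are not documented here, so this sketch will not necessarily reproduce the results in Figure 7. The word-by-word comparison of multi-word strings is also an assumption, because the exact method is not described. `soundex` and `sounds_like` are hypothetical names.

```python
# Soundex digit for each consonant; vowels, H, W and Y have no code.
CODES = {ch: digit
         for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                ("L", "4"), ("MN", "5"), ("R", "6"))
         for ch in letters}

def soundex(word: str) -> str:
    """Textbook American Soundex: first letter plus three digits."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]  # the first letter is kept as-is
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code  # adjacent letters with the same code count once
        # H and W do not separate letters with the same code; vowels do.
        if ch not in "HW":
            prev = code
    return (result + "000")[:4]  # pad or truncate to four characters

def sounds_like(search: str, reference: str) -> bool:
    search_words = search.split()
    ref_words = reference.split()
    if len(search_words) == 1:
        # A single word matches if it sounds like ANY word of the reference.
        target = soundex(search_words[0])
        return any(soundex(w) == target for w in ref_words)
    # Assumption: multi-word strings are compared as a whole, word by word.
    return [soundex(w) for w in search_words] == [soundex(w) for w in ref_words]

print(sounds_like("Ekohl", "Ecole Normale Superieure"))             # True
print(sounds_like("Ekohl Dzentrahll", "Ecole Normale Superieure"))  # False
```

Because this textbook variant keeps the first letter unchanged, it codes SCHIKAGO as S220 and CHICAGO as C220. The two therefore never match under this sketch.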

Exact syntax description

This section describes the exact syntax of the SEARCH command in technical terms. You can skip it if you are not interested in learning about the details of this command.

General syntax

  +--------+-----------------------------------------------------------+
  |        |                                                           |
  | Search |  search-rules                                             |
  | SELect |                                                           |
  |        |                                                           |
  |        |  Optional rules are:                                      |
  |        |                                                           |
  |        |    date-rules                                             |
  |        |    keyword-rules                                          |
  +--------+-----------------------------------------------------------+

The optional date-rules and keyword-rules arguments may appear in any order.

Date rules specification

You may optionally restrict the search to only those entries that lie within a given interval of time. To do so, specify one of the following date rules:

SINCE date-spec [time-spec]
FROM date-spec1 [time-spec1] TO date-spec2 [time-spec2]
UNTIL date-spec [time-spec]

The format of a date-spec is quite complex because of the number of different ways date/time specifications are usually expressed:

TODAY
yy
dd mm
[dd] [-] monthname[-] [yy]
mm/yy
mm-yy
yy/mm/dd
yy-mm-dd

Month names can be abbreviated to any length. If there is an ambiguity, the first month in chronological order is retained. For example, 'J' would mean 'January', 'JU' would be 'June' and 'JUL' would unambiguously select 'July'. The format of a time-spec is simply hh:mm[:ss].
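The rule for abbreviated month names, where an ambiguous prefix resolves to the earliest matching month, can be sketched in Python as below. `resolve_month` is a hypothetical helper, and the sketch handles only English month names.

```python
MONTHS = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
          "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"]

def resolve_month(abbrev: str) -> int:
    """Month number (1-12) for a month-name abbreviation of any length.

    Months are tried in chronological order, so an ambiguous prefix
    selects the earliest matching month.
    """
    prefix = abbrev.upper()
    for number, name in enumerate(MONTHS, start=1):
        if prefix and name.startswith(prefix):
            return number
    raise ValueError(f"not a month name: {abbrev!r}")

print(resolve_month("J"), resolve_month("ju"), resolve_month("JUL"))  # 1 6 7
```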

 +--------------------------------------------------------------------+
 | FROM 14 july TO oct 97                                             |
 | SINCE 96                                                           |
 | UNTIL 23-JUN-97                                                    |
 | SINCE today 11:30                                                  |
 |                                                                    |
 | Figure 8.  Sample date clauses                                     |
 +--------------------------------------------------------------------+

NOTE: Case is irrelevant in date specifications. The keywords (SINCE, UNTIL, etc.) have been capitalized only for legibility and can be entered in lower case if desired.

Keyword rules specification

You may request the actual document search to take place only for those entries which match a set of "keyword comparison" rules. The syntax is the following:

WHERE kwd-expression
WITH kwd-expression

(WHERE and WITH are synonyms.)

kwd-expression is, generally speaking, a mathematical expression of keyword/value comparisons, possibly bound by logical operators. Comparison operators have a higher precedence than logical operators, that is, "A>10 AND B=20" is interpreted as '(A>10) AND (B=20)'. The available comparison operators are listed below. All the operators appearing on a given line are synonyms.

  +--------------------------------------------------------------------+
  | =         IS                                                       |
  | ^=   <>   IS NOT                                                   |
  | >                                                                  |
  | <                                                                  |
  | >=                                                                 |
  | <=                                                                 |
  | CONTAINS                                                           |
  | DOES NOT CONTAIN                                                   |
  | SOUNDS LIKE                                                        |
  | DOES NOT SOUND LIKE                                                |
  |                                                                    |
  | Figure 9.  Comparison operators for WHERE clauses                  |
  +--------------------------------------------------------------------+

All these operators are self-explanatory, with two exceptions. CONTAINS and DOES NOT CONTAIN search the keyword value for a given "substring", and SOUNDS LIKE and DOES NOT SOUND LIKE perform the phonetic match described in the section on phonetic search. That is, "Sender contains jeff" would be true if the value of the "Sender" keyword was "Jeff Smith" or "Jeffrey Donaldson". Case is ignored during the comparison unless the search operand is double-quoted.
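The CONTAINS test can be modelled as a substring check that ignores case unless the operand was double-quoted. In this hypothetical Python sketch, the `double_quoted` flag stands in for the quoting information a real parser would supply. DOES NOT CONTAIN is simply the negation.

```python
def contains(value: str, operand: str, double_quoted: bool = False) -> bool:
    """Substring test used by CONTAINS; case is ignored unless double-quoted."""
    if double_quoted:
        return operand in value  # case-sensitive comparison
    return operand.upper() in value.upper()

def does_not_contain(value: str, operand: str, double_quoted: bool = False) -> bool:
    return not contains(value, operand, double_quoted)

print(contains("Jeff Smith", "jeff"))                       # True
print(contains("Jeffrey Donaldson", "jeff"))                # True
print(contains("Jeff Smith", "jeff", double_quoted=True))   # False
```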

If no valid comparison operator is specified between two arguments, "IS" (identity) is assumed.

The available logical operators are:

  +--------------------------------------------------------------------+
  | ^    NOT                                                           |
  | &    AND    BUT                                                    |
  | |    /    OR                                                       |
  |                                                                    |
  | Figure 10.  Logical (boolean) operators                            |
  +--------------------------------------------------------------------+

Please note that the logical operators AND and OR have equal precedence and are evaluated left-to-right.
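Because AND and OR share the same precedence and are evaluated left to right, 'A OR B AND C' means '(A OR B) AND C'. This differs from the convention in many programming languages. The hypothetical Python sketch below evaluates a flat expression this way. When in doubt, add parentheses to your search.

```python
def evaluate_left_to_right(items: list) -> bool:
    """Evaluate [operand, op, operand, op, ...] with AND and OR at equal precedence."""
    result = items[0]
    for op, operand in zip(items[1::2], items[2::2]):
        if op == "AND":
            result = result and operand
        elif op == "OR":
            result = result or operand
        else:
            raise ValueError(f"unknown operator: {op}")
    return result

# 'A OR B AND C' with A=True, B=False, C=False
print(evaluate_left_to_right([True, "OR", False, "AND", False]))  # False
print(True or False and False)  # True: Python gives AND higher precedence
```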

Finally, keywords and operators can be "factorized" when the same comparison is to be applied to a given keyword and a series of comparands. For example, you might enter:

Search * where sender contains ('CS Dept' and (Jack or Phil))

This is internally expanded to:

// SEARCH * WHERE sender CONTAINS 'CS Dept' AND ,
(sender CONTAINS Jack OR sender CONTAINS Phil)

Please note that the expression must always be enclosed in parentheses, even if it is a simple one:

Search * where sender contains (Joe or Morris)

This stems from the fact that comparison operators have a higher priority than logical (boolean) ones.

  +--------------------------------------------------------------------+
  | WHERE Sender is "Arthur Dent" ,                                    |
  | and Subject does not contain tea                                   |
  |                                                                    |
  | Where Sender is (Atiaran@Land or Elena@Land) ,                     |
  | and Subject contains ('Be true' but not Ur-Lord)                   |
  |                                                                    |
  | Figure 11.  Sample WHERE clauses                                   |
  +--------------------------------------------------------------------+

Search rules specification

Finally, you must specify what is to be searched inside the document. If you do not want anything to be sought at all (e.g. if you are only selecting known items from the database), you can specify an asterisk as a placeholder to waive the search. Otherwise you must specify a mathematical expression where arguments are search strings, possibly bound by logical operators (see Figure 10 for a comprehensive list). The default operator is AND, so that a search for "ENGINE PERFORMANCE TIPS" will select all entries where "ENGINE", "PERFORMANCE" and "TIPS" can be found (not necessarily in the same line).

  +--------------------------------------------------------------------+
  | Search *                                                           |
  |                                                                    |
  | Search 'I/O' Error                                                 |
  |                                                                    |
  | // Search Engine (performance or tips ,                            |
  | but not (replacement or question))                                 |
  |                                                                    |
  | Figure 12.  Sample document-search clauses                         |
  +--------------------------------------------------------------------+

Reserved words and quoting

When to quote strings

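Keyword names and search arguments need not be quoted, unless:

  1. They contain blanks that must be preserved, or several words that must be treated as a single string (for example, the phrase 'in C', or a multi-word operand of SOUNDS LIKE).

  2. They contain a character that would otherwise be taken as an operator, such as the '/' in 'I/O', or they consist of an asterisk.

  3. They are reserved words, such as IN.

  4. Case must be respected during the search, in which case they must be double-quoted.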

Any non-quoted word will be stripped of leading and trailing blanks and converted to uppercase before the search.

Single-quoted strings

Strings quoted in single quotes (') are converted to upper case and cause case to be ignored during the search. That is, they behave in the same manner as unquoted strings as far as the search algorithm is concerned. As a rule of thumb, any string can be single-quoted if desired, even if it does not need to be.

Single quotes must be doubled inside single-quoted strings, but double quotes must not be:

Search '"Joel''s Big Adventure" report' in VANAGON

Double-quoted strings

Strings quoted in double-quotes (") are not converted to upper case. They result in a case-sensitive search, which means that you should never double-quote a string unless you want case to be respected during the search.

Double quotes must be doubled inside double-quoted strings, but single quotes must not be:

Search """Joel's Big Adventure"" report" in VANAGON
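If you build search commands programmatically, you can apply the doubling rules above mechanically. The hypothetical `listserv_quote` helper below wraps a string in the chosen quote character and doubles only occurrences of that same character. It reproduces the two examples above.

```python
def listserv_quote(text: str, quote: str = "'") -> str:
    """Wrap text in quotes, doubling any quote of the same type inside it."""
    if quote not in ("'", '"'):
        raise ValueError("quote must be ' or \"")
    return quote + text.replace(quote, quote * 2) + quote

phrase = '"Joel\'s Big Adventure" report'
print(listserv_quote(phrase, "'"))  # '"Joel''s Big Adventure" report'
print(listserv_quote(phrase, '"'))  # """Joel's Big Adventure"" report"
```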