Searching list archives
How does it work?
Advanced searches
Search tips
Non-English searches
How does it work?
A search can be as simple as typing a single word in the "Search for:" box
and clicking on "Start the search," or it can involve the full power of LISTSERV's
database functions. Here are a few examples of simple searches (the text
of the example should be entered in the "Search for:" box, and none of the
other boxes should be filled in):
- To search for messages about John Kennedy, simply type John
Kennedy in the search box. This will show all the messages that contain
the words "John" and "Kennedy" close to each other.
- You could also type 'John Kennedy' in single quotes, which searches for that
exact phrase, but this would not show messages about "John F. Kennedy".
- For better results, you could use (John Kennedy) or JFK so that
you also get the messages that say "JFK".
- To search for words that are not necessarily close to each other, use
"AND". For instance, Mozart and Beethoven would show all the messages
that mention both composers, whereas Mozart Beethoven would only find
the small fraction of them in which the two names appear close to each other.
- To make a search case sensitive, enclose it in double quotation marks.
If you are interested in the works of Norman Mailer, you will probably find
that searching for Mailer returns a lot of unexpected messages, whereas
"Mailer" gives much better results.
- You can get as sophisticated as you want: ((John Kennedy) or JFK) and
not ((Bay Pigs) or Cuba) would look for messages about JFK that do not
mention Cuba or the Bay of Pigs.
- Some characters have special syntactical meaning to the database functions
and must be enclosed in single quotes for correct results. For instance,
parentheses need to be quoted in this manner: search for 'f(x)'
instead of f(x).
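The examples above rely on three kinds of matching. Words typed side by side must appear close to each other. A single-quoted string must appear exactly as typed. AND only requires each word to appear somewhere in the message. The Python sketch below models these behaviours so you can compare them. It is not LISTSERV's implementation. The function names `near`, `phrase` and `all_of` are hypothetical, and the five-word `WINDOW` is an assumed stand-in for "close", which this page does not define. Matching ignores case unless `case_sensitive=True`, which mirrors the double-quote rule.

```python
import re

# ASSUMPTION: LISTSERV does not say how close "close" is;
# a five-word window is a hypothetical stand-in.
WINDOW = 5


def tokenize(text, case_sensitive=False):
    """Split text into words and fold case unless the search is case sensitive."""
    words = re.findall(r"\w+", text)
    return words if case_sensitive else [w.lower() for w in words]


def normalize(terms, case_sensitive):
    """Fold the search terms the same way as the text."""
    return list(terms) if case_sensitive else [t.lower() for t in terms]


def near(text, terms, case_sensitive=False, window=WINDOW):
    """Like typing John Kennedy: every term must fall inside one window of words."""
    tokens = tokenize(text, case_sensitive)
    wanted = set(normalize(terms, case_sensitive))
    for start in range(max(1, len(tokens) - window + 1)):
        if wanted <= set(tokens[start:start + window]):
            return True
    return False


def phrase(text, terms, case_sensitive=False):
    """Like typing 'John Kennedy': the words must be adjacent and in order."""
    tokens = tokenize(text, case_sensitive)
    wanted = normalize(terms, case_sensitive)
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


def all_of(text, terms, case_sensitive=False):
    """Like Mozart and Beethoven: each term may appear anywhere in the message."""
    return set(normalize(terms, case_sensitive)) <= set(tokenize(text, case_sensitive))


jfk = "John F. Kennedy was elected in 1960."
print(near(jfk, ["John", "Kennedy"]))    # True: both words fall in one five-word window
print(phrase(jfk, ["John", "Kennedy"]))  # False: "F" sits between them
```

Because these functions return booleans, a compound query such as "Mozart and not Cuba" maps directly onto Python's own operators, for example `all_of(msg, ["Mozart"]) and not near(msg, ["Cuba"])`.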
Advanced searches
In the previous section, we discussed how to make a
simple (or even complex) search using the "Search for:" box. While this is
sufficient for most searches, the other search options can be used to further
restrict the scope of your search and make it easier for you to find what you
are looking for.
The substring search checkbox
By default, searches will only match full words: searching for planet
will not find messages containing the word "planetarium" (unless they also
contain the word "planet"). But if you check the "substring search" box,
your search will match any word containing the string you have entered. For
instance, a substring search for chem would find both "chemistry"
and "alchemy."
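The difference between the default full-word search and a substring search can be shown in a few lines of Python. The helper names are hypothetical, and case-insensitive comparison is assumed for both modes. This is a sketch of the behaviour described above, not LISTSERV's code.

```python
import re


def words(text):
    """Lower-cased words of the text (case-insensitive matching is assumed)."""
    return re.findall(r"\w+", text.lower())


def full_word_match(text, term):
    """Default behaviour: the term must be a whole word."""
    return term.lower() in words(text)


def substring_match(text, term):
    """With "substring search" checked: the term may appear inside any word."""
    return any(term.lower() in word for word in words(text))


print(full_word_match("A trip to the planetarium", "planet"))  # False
print(substring_match("A trip to the planetarium", "planet"))  # True
print(substring_match("Notes on alchemy", "chem"))             # True
```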
The subject search box
To restrict your search to messages whose subject contains specific search
words, simply type them in the subject search box. The syntax is the same as
for the "Search for:" box, with one difference: the "AND"
operator is redundant, because a subject field is very short and all the words
are considered to be "close" to each other. Thus, in the subject box there is
no difference between a search for Mozart and Beethoven and a search
for Mozart Beethoven.
Subject searches are a good alternative when searching large archives, or
when searching for topics that are mentioned quite often. If a word that you
are looking for appears in the subject of a message, it is much more likely to
reflect the actual contents of the message than if it only appears in one
isolated sentence. On the other hand, what you are looking for may be hidden
in a message about something else, in which someone happened to mention your
topic of interest in passing.
The author search box
You can also restrict your search to messages posted by a particular
person. If you know the e-mail address of the person who wrote the message
you are interested in, this can be a very effective way to find what you are
looking for, without having to go through dozens of unrelated messages.
Note that you do not need to know the exact e-mail address. For instance,
if you know that the userid is "john" and the host name is some machine at
XYZ.COM, you can simply enter john xyz.com in the author search box.
Since the author's e-mail address is a single word, there is no concept of
"close" vs. "distant," and the AND operator is redundant: john
xyz.com and john and xyz.com are equivalent.
Whatever you do, do not try to use wildcards (e.g., "john@*.xyz.com"),
as this is not the correct syntax. The author search box uses the same syntax
as the subject and "Search for:" boxes.
The "since" and "until" search boxes
It is not uncommon for popular mailing lists to have archives spanning 10 or
more years of activity. If the mailing list is about technology, you may
not be interested in messages that are older than a few years. Or you may
already know approximately when the information you are looking for was
posted to the list. You can use the "Since" and "Until" boxes to restrict your
search accordingly.
The syntax is very flexible and you can specify a date and/or time in just
about any of the commonly used formats:
- 23 Jun 1986 (self-explanatory).
- 1986-06-23 (international date format).
- 1995 or just 95 selects 1 Jan 1995 for the "since" box
or 31 Dec 1995 for the "until" box.
- APR selects April of the current year, 1st or 30th depending on
whether this was entered in the "since" or "until" box.
- APRIL 95 does the same as above, but for the year 1995.
- TODAY-7 (7 days ago) makes it easy to get a list of all the
messages posted in the past week. You can also use YESTERDAY or
TODAY for a shorter time span.
IMPORTANT: The US date format (mm/dd or mm/dd/yy) is not supported
because it is ambiguous. Many other countries use dd/mm or dd/mm/yy instead,
and to avoid ambiguities LISTSERV only supports the international date format,
yyyy-mm-dd or yy/mm/dd.
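To make the "since"/"until" rules concrete, here is a Python sketch that resolves each format listed above to a calendar date. It works under several assumptions. `parse_bound` and its helpers are hypothetical names. Time of day is ignored. The current date is passed in as `today` so that results are repeatable. The two-digit-year pivot (00-49 maps to the 2000s, 50-99 to the 1900s) is an assumption, not a documented LISTSERV rule. LISTSERV's own parser likely accepts more forms than these.

```python
import calendar
import re
from datetime import date, timedelta

MONTH_NAMES = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
               "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"]


def month_number(token):
    """Accept a month name or any abbreviation of at least three letters."""
    for i, name in enumerate(MONTH_NAMES, start=1):
        if len(token) >= 3 and name.startswith(token):
            return i
    raise ValueError(f"unknown month: {token}")


def full_year(text):
    """Expand a two-digit year. ASSUMPTION: the pivot at 50 is hypothetical."""
    year = int(text)
    if len(text) == 2:
        year += 2000 if year < 50 else 1900
    return year


def parse_bound(text, is_until, today):
    """Resolve a "since" (is_until=False) or "until" (is_until=True) entry to a date."""
    s = text.strip().upper()

    # TODAY, YESTERDAY, TODAY-7, ...
    m = re.fullmatch(r"(TODAY|YESTERDAY)(?:-(\d+))?", s)
    if m:
        base = today if m.group(1) == "TODAY" else today - timedelta(days=1)
        return base - timedelta(days=int(m.group(2) or 0))

    # International format: 1986-06-23
    m = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", s)
    if m:
        return date(int(m.group(1)), int(m.group(2)), int(m.group(3)))

    # yy/mm/dd: a US-style 06/23/86 fails here because 23 is not a valid month
    m = re.fullmatch(r"(\d{2})/(\d{2})/(\d{2})", s)
    if m:
        return date(full_year(m.group(1)), int(m.group(2)), int(m.group(3)))

    # 1995 or 95: first or last day of the year
    if re.fullmatch(r"\d{2}|\d{4}", s):
        year = full_year(s)
        return date(year, 12, 31) if is_until else date(year, 1, 1)

    # 23 Jun 1986
    m = re.fullmatch(r"(\d{1,2})\s+([A-Z]+)\s+(\d{2}|\d{4})", s)
    if m:
        return date(full_year(m.group(3)), month_number(m.group(2)), int(m.group(1)))

    # APR or APRIL 95: first or last day of the month
    m = re.fullmatch(r"([A-Z]+)(?:\s+(\d{2}|\d{4}))?", s)
    if m:
        month = month_number(m.group(1))
        year = full_year(m.group(2)) if m.group(2) else today.year
        last = calendar.monthrange(year, month)[1]
        return date(year, month, last if is_until else 1)

    # Anything else, including US-style mm/dd, is rejected
    raise ValueError(f"unsupported date format: {text!r}")
```

The sketch cannot detect every US-style date. An entry such as 01/02/03 is valid in both readings, so it is always taken as yy/mm/dd. The rule above resolves the ambiguity the same way.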
Search tips
Here are a few tips which may prove useful if you are not getting anywhere
with your search.
- In most cases, you will save a lot of time by using the
"Since" and "Until" boxes to narrow your search to a
particular date range, even if it is very approximate.
- If you know the author of the message and have his e-mail address, use
the author search box to restrict your search.
- If you know the author's name, but not his e-mail address, add his name to
the "Search for:" box. Hopefully it will be somewhere
in the message header or text, and this will help narrow the search. Make
sure to clearly separate the name from the rest of the search. If you were
looking for computer stores and know that the message you are looking
for was written by Mary Travis, your new search should be for (computer
stores) and (Mary Travis) (if you just search for computer stores
Mary Travis, the four words will have to be close to each other or there
will be no match).
- Make sure to read the notes on non-English searches
if you are conducting a search in a language that uses non-English characters.
- An easy way to find a recent message is to make a search with
TODAY-7 in the "Since" box, leaving all the other boxes empty. You can
add the URL to your hotlist and come back to it regularly to see all the
messages posted in the last week.
Non-English searches
Every effort has been made to make ISO-8859-* searches work as transparently
as possible, in spite of the complexity of the situation. In order to better
understand the cases where searches do not actually work as expected, you
should know that the messages are archived in the format in which they were
originally sent. This will typically include a mix of native 8-bit text,
MIME quoted-printable text, MIME base64 text, and other proprietary encoding
methods such as WINMAIL.DAT, plus of course 7-bit text. Each of these messages
presents its own challenges:
- Native 8-bit text normally produces the expected results. See below
for a list of generic problems that may affect even native 8-bit text.
- MIME quoted-printable text will, in most cases, produce the expected
results. Conceptually, the search is carried out as though the =xx
escape sequences had been replaced with their corresponding characters before
beginning the search. However, soft line breaks (trailing '=' signs) are not
processed (the lines are not merged). If the poster's mail client uses soft
line breaks to split words in the middle, they will not be recognized. For
instance, if the word "house" were written as "hou=" on one line followed by
"se" on the next line, LISTSERV would not find a match with the search string
"house".
- MIME base64 text is not supported by the search interface. This type
of encoding should only be used for binary data, because it is totally
unintelligible to people without a MIME user interface and because it is
context sensitive (that is, LISTSERV would have to decode the entire message
before beginning the search).
- Proprietary encoding methods such as WINMAIL.DAT are not supported
by the search interface. In most cases, these formats suffer from the same kind
of problems as MIME base64 text, and the mail programs that generate these
messages are being replaced with MIME-capable programs.
- 7-bit text (with national characters) does not work at all. It is
impossible to translate this text to native 8-bit form without knowing the
language in which it is written.
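You can reproduce the quoted-printable and base64 limitations with Python's standard `quopri` and `base64` modules. `decode_escapes_per_line` is a hypothetical helper that imitates the behaviour described above. It decodes each =xx escape but leaves soft line breaks unmerged. `quopri.decodestring` performs full decoding for comparison. The last lines show why base64 is context sensitive: the encoded form of a word depends on its byte offset, so searching for one fixed encoded pattern can miss it.

```python
import base64
import quopri
import re

# A quoted-printable body whose mail client split "house" with a soft line break.
qp_body = b"Our new hou=\nse is next to the caf=E9.\n"


def decode_escapes_per_line(raw):
    """Decode =XX escapes line by line, leaving soft line breaks unmerged."""
    def unescape(match):
        return bytes([int(match.group(1), 16)])

    return b"\n".join(
        re.sub(rb"=([0-9A-Fa-f]{2})", unescape, line) for line in raw.split(b"\n")
    )


per_line = decode_escapes_per_line(qp_body)
full = quopri.decodestring(qp_body)

print(b"caf\xe9" in per_line)  # True: the =E9 escape was decoded (Latin-1 e-acute)
print(b"house" in per_line)    # False: "hou=" and "se" are still on separate lines
print(b"house" in full)        # True: full decoding removes the soft line break

# Base64: the same word encodes differently depending on where it starts.
word = b"house"
pattern = base64.b64encode(word).rstrip(b"=")
print(pattern in base64.b64encode(word))         # True: the word starts at offset 0
print(pattern in base64.b64encode(b" " + word))  # False: a one-byte shift changes the encoding
```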
In addition, there are a number of generic problems that affect all message
formats:
- Code page: a typical international archive will contain messages in
a variety of incompatible code pages (Latin-1, Icelandic, etc.). While
LISTSERV knows the code page of each of the individual messages, it does not
know the code page of the search string you are entering, nor does it support
searches that span multiple code pages. If you search for one of the characters
in the Icelandic code page, LISTSERV may incorrectly match messages written in
another code page in which this character is not present, but where another
character with the same binary code was found in the message.
- Case-insensitive searches: special tables are required to properly
evaluate case-insensitive searches with non-ASCII characters. The tables
LISTSERV uses were designed for the Latin-1 (ISO-8859-1) code page and may not
give correct results with other code pages.
- EBCDIC systems: LISTSERV servers running on EBCDIC systems may give
incorrect results due to the multiple ASCII-EBCDIC translation steps
involved in processing your request. The TCP/IP product, the SMTP server, the
web server and LISTSERV each have their own tables, which may or may not be
identical.
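All three generic problems come down to the same thing: a given byte means different characters under different tables. The Python sketch below uses the standard codecs `iso-8859-1` (Latin-1), `iso-8859-2` (Latin-2, chosen here only as an example of a non-Latin-1 code page), and `cp037` and `cp500` (two EBCDIC variants). `latin1_fold` is a hypothetical stand-in for a Latin-1 case-folding table. It is not LISTSERV's actual table.

```python
# 1. Code pages: the Icelandic letter thorn is byte 0xFE in Latin-1,
#    but the same byte is a different letter in Latin-2, so a search
#    for that byte would also hit Latin-2 text that contains no thorn.
thorn = "þ".encode("iso-8859-1")
print(thorn)                               # b'\xfe'
print(thorn.decode("iso-8859-2") == "þ")   # False


# 2. Case folding with a table built for Latin-1 (hypothetical stand-in).
def build_latin1_fold():
    """Map every byte to its Latin-1 lower-case equivalent, if any."""
    table = bytearray(range(256))
    for b in range(256):
        lower = bytes([b]).decode("iso-8859-1").lower()
        if len(lower) == 1 and ord(lower) < 256:
            table[b] = ord(lower)
    return bytes(table)


LATIN1_FOLD = build_latin1_fold()


def latin1_fold(data):
    return data.translate(LATIN1_FOLD)


upper_a_ogonek = "Ą".encode("iso-8859-2")  # byte 0xA1 in Latin-2
lower_a_ogonek = "ą".encode("iso-8859-2")  # byte 0xB1 in Latin-2
# False: the Latin-1 table does not treat 0xA1 and 0xB1 as a case pair
print(latin1_fold(upper_a_ogonek) == latin1_fold(lower_a_ogonek))
print("Ą".lower() == "ą")  # True: folding works once the code page is known

# 3. EBCDIC: two translation tables that disagree about "[".
via_cp037 = "[".encode("cp037")
print(via_cp037.decode("cp500") == "[")  # False: the character changes in transit
```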
Please note: during the past 17 years of operation, several gigabytes of
Vanagon mail messages have been archived. Searching the entire collection
can take up to five minutes. Please be patient!
@ gerry.vanagon.com
The vanagon mailing list archives are copyright (c)
1994-2011, and may not be reproduced without the
express written permission of the list administrators.
Posting messages to this mailing list grants
a license to the mailing list administrators to reproduce
the message in a compilation, either printed or electronic.
All compilations will be not-for-profit, with any excess
proceeds going to the Vanagon mailing list.
Any profits from list compilations go exclusively
towards the management and operation of the Vanagon mailing
list and vanagon mailing list web site.