Vanagon EuroVan
Date:         Sun, 11 Jan 2009 13:43:14 -0500
Reply-To:     pickle vanagon <greenvanagon@GMAIL.COM>
Sender:       Vanagon Mailing List <vanagon@gerry.vanagon.com>
From:         pickle vanagon <greenvanagon@GMAIL.COM>
Subject:      Howto: make your own searchable vanagon archives
Content-Type: text/plain; charset=ISO-8859-1

A while ago, I promised that I would put together a file with some easy-to-follow instructions about how to get the vanagon archives into your gmail account "in a few days". First, a few words on why this is necessary: the web search interface for the archives doesn't use an indexed search, which means that a search can take 5 minutes or so, or, in practice, hours or more, since it will often simply time out before returning a result. Searching the archives with google is not a good substitute, since google doesn't index most of the archives, so a google search will miss many messages. (I will not go over the evidence for this again in these directions. If you missed my previous examples of how the google search doesn't work, "search the archives" to find them. Once you follow the directions below to get the archives in a browsable and searchable form, it will become obvious just how bad google is for the archives, as you'll see that some searches that get only a handful of hits with google should actually be getting hundreds.) There is one decent alternative to what I describe below when it comes to searching the archives: the "email" search interface, described here: http://gerry.vanagon.com/info/searching.html. It is still painfully slow and has its own share of issues, but it doesn't time out and is reliable.

Weeks later, I'm finally ready to deliver something less than what I promised. It is possible to get all of the archives into gmail (I've done it with my account) but I don't recommend it. It is slow and tedious (it took me more than a week of constant uploading by an automated script) due to gmail's slow imap import process. It can also make gmail mad at you (I got locked out of my account for a day at some point) and is generally not a good choice for most people. For that reason, I'm not going to try to give "step-by-step" directions on how to get the archives into your gmail.

But what I will do is almost as good (or maybe even better). By importing the archives into the opera mail client (which is free), you will have access to the entire archives of the vanagon list whether or not you are connected to the internet. Plus, since opera mail has an "indexed search", you can search the entire archives in seconds all on your own machine. The best thing would be if the web archive interface was "fixed" to use an indexed search, but until that happens, I consider this to be a pretty good solution for most people.
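
For anyone curious why an "indexed search" is so much faster, here is a minimal Python sketch of the general idea (an inverted index). It is not how opera mail or the web archive is actually implemented; the function names and the sample messages are made up for illustration. The index is built once, and after that each search is a handful of dictionary lookups instead of a scan through every message.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(messages):
    """Map each word to the set of message numbers that contain it.

    This is the slow part, but it only has to be done once.
    """
    index = defaultdict(set)
    for msg_id, text in enumerate(messages):
        for word in tokenize(text):
            index[word].add(msg_id)
    return index


def search(index, query):
    """Return the message numbers that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start with the matches for the first word, then narrow down.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    # Hypothetical sample messages, for illustration only.
    sample = [
        "Vanagon coolant leak near the water pump",
        "EuroVan coolant flush procedure",
        "Vanagon tire recommendations",
    ]
    idx = build_index(sample)
    print(sorted(search(idx, "vanagon coolant")))
```

A non-indexed search has to do the equivalent of re-reading every message for every query, which is why it gets slower as the archive grows.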

I've prepared the archives as an "mbox" file, which John Meeks has been kind enough to agree to host for listmembers to download on his vanagonauts site. You can also import the mbox file into a different email client of your choice, but I can't write instructions for every email client so you're on your own there. I chose opera as the "suggested" client because it seems to be the only readily available client that has an "indexed search" built in (and is available for just about any operating system), but another solution is to use another client (like thunderbird) in conjunction with google desktop or a similar search program. Like I said before, you are on your own if you want to pursue that route.
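
If you would rather poke at an mbox file directly instead of importing it into a mail client, Python's standard "mailbox" module can read the format. The following is only a sketch: the function name and the path you pass in are placeholders, and it does a plain (non-indexed) scan of the Subject lines, so expect it to be slow on a large file.

```python
import mailbox


def subjects_matching(mbox_path, keyword):
    """Return (date, subject) pairs for messages whose Subject contains keyword.

    mbox_path is a placeholder for wherever your mbox file lives.
    """
    keyword = keyword.lower()
    hits = []
    # create=False: raise an error instead of silently creating an empty file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # str() guards against header objects for non-ASCII subjects.
            subject = str(msg.get("Subject", ""))
            if keyword in subject.lower():
                hits.append((str(msg.get("Date", "")), subject))
    finally:
        box.close()
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_subjects.py /path/to/mboxfile keyword
    for date, subject in subjects_matching(sys.argv[1], sys.argv[2]):
        print(date, "|", subject)
```

This only looks at Subject lines; searching message bodies the same way works but is slower still, which is the whole reason to prefer a client with an index.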

One final thing: in spite of the fact that these directions can give us the ability to have complete access to the archives, I think we should keep in mind that most listmembers will not bother to download the file and go through with this (which is quite reasonable, since the process requires upwards of a gigabyte of free space to complete). As a result, I would suggest that it is still probably not very constructive to answer people's questions with the answer "search the archives", since, for many intents and purposes, the web archives remain broken and google doesn't work either.

Here are my directions to get the archive into opera mail:

0. Download the gerry.zip file from http://www.vanagonauts.com/files/gerry.zip

1. The file gerry.zip is compressed and needs to be unzipped. (Some versions of Windows may have this functionality built in?) The size of the file gerry.zip is 162M. If you have a file that is much smaller than that, it probably didn't download correctly. (For anybody that wants to know, the MD5 sum of the file is 408626910cc14145089907ba7f7c66c1)

2. When unzipped, the file "gerry" (no extension) is 576 megabytes. It contains every message in the vanagon archives (starting on April 2, 1994) through November 30, 2008. The unzipped file is an "mbox" file, which can be imported or used by most mail programs. You are on your own when it comes to importing this into your mail program of choice. I will show you how to import the file into opera mail, which includes an indexed search so that you can search the entire archive in seconds.

3. Install opera (get it for free from www.opera.com)

4. On the left bar in opera, click on the envelope button to go to opera mail. Under the file menu, choose "Import and Export" > "Import Mail". Choose "generic mbox file" and click next. Hit the "add mbox..." button and locate the unzipped "gerry" file. Under "Import into:", you can leave the setting as "new account" or you can choose an existing account if you already have an account set up that you would like to use. Click next, and opera will start processing the file. This will take a long time. When it finishes, you will have the archives in an easily searchable and browsable location.

5. If you set up opera as your email client for your vanagon subscription (follow readily available online instructions), you will always have all of the vanagon mail available for search, minus the "gap" between November 30, 2008 and whenever you set it up. Maybe at some point in the future, I will make an extra file to cover the "gap".
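
For those comfortable with a little scripting, the download check and the unzip step can be done in one go. This is a hedged sketch in Python using only the standard library; the MD5 sum is the one quoted in the directions above, while the function names and the file locations in the example are my own placeholders.

```python
import hashlib
import zipfile

# MD5 sum quoted in the directions for gerry.zip.
EXPECTED_MD5 = "408626910cc14145089907ba7f7c66c1"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 sum of a file, reading it in 1 MB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_unzip(zip_path, dest_dir, expected_md5=EXPECTED_MD5):
    """Check the download against the expected MD5 sum, then unzip it.

    Returns the list of file names inside the zip. Raises ValueError if
    the checksum does not match (most likely an incomplete download).
    """
    actual = md5_of_file(zip_path)
    if actual != expected_md5:
        raise ValueError(
            "MD5 mismatch: expected %s, got %s" % (expected_md5, actual)
        )
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


if __name__ == "__main__":
    # Placeholder locations; adjust to wherever you saved the download.
    names = verify_and_unzip("gerry.zip", "vanagon_archive")
    print("Unzipped:", names)
```

If the checksum does not match, download the file again before trying to import anything, since a truncated zip will either fail to unzip or give you an incomplete archive.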

Happy searching! Wes



Please note - During the past 17 years of operation, several gigabytes of Vanagon mail messages have been archived. Searching the entire collection will take up to five minutes to complete. Please be patient!




The vanagon mailing list archives are copyright (c) 1994-2011, and may not be reproduced without the express written permission of the list administrators. Posting messages to this mailing list grants a license to the mailing list administrators to reproduce the message in a compilation, either printed or electronic. All compilations will be not-for-profit, with any excess proceeds going to the Vanagon mailing list.

Any profits from list compilations go exclusively towards the management and operation of the Vanagon mailing list and vanagon mailing list web site.