Date: Sat, 28 Mar 2009 18:24:41 -0400
Reply-To: Harold Teer <teer.vanagon@GMAIL.COM>
Sender: Vanagon Mailing List <vanagon@gerry.vanagon.com>
From: Harold Teer <teer.vanagon@GMAIL.COM>
Subject: Re: Download archives
In-Reply-To: <vanagon%2009032815562303@GERRY.VANAGON.COM>
Content-Type: text/plain; charset=ISO-8859-1
On Sat, Mar 28, 2009 at 3:47 PM, Greg Sorkilmo <sorkilmo@gmail.com> wrote:
> Is it still possible to download the archives? I find that searching for
> stuff via the online archive takes way too long.
> Thanks,
> Greg S.
> 87' Vanagon Wolfsburg
>
>
Greg, I used the following procedure by Wes and it works perfectly for me.
Harold Teer
pickle vanagon to vanagon, Jan 11
A while ago, I promised that I would put together a file with some
easy-to-follow instructions about how to get the vanagon archives into your
gmail account "in a few days". First a few words on why this is necessary:
the web search interface for the archives doesn't use an indexed search,
which means that searching it can take 5 minutes or so, or, in practice,
hours or more, since it will often simply time out before returning a
result. Searching the archives with google is not a good substitute, since
google doesn't index most of the archives, so that a google search will miss
many messages. (I will not cover this fact in these directions. If you
missed my previous examples of how the google search doesn't work, "search
the archives" to find them. Once you follow the directions below to get the
archives in a browseable and searchable form, it will become obvious just
how bad google is for the archives, as you'll see that some searches that
get only a handful of hits with google should actually be getting
hundreds). There is one decent alternative to what I describe below when it
comes to searching the archives: the "email" search interface, described
here: http://gerry.vanagon.com/info/searching.html. It is still painfully
slow and has its own share of issues, but it doesn't time out and is
reliable.
Weeks later, I'm finally ready to deliver something less than what I
promised. It is possible to get all of the archives into gmail (I've done
it with my account) but I don't recommend it. It is slow and tedious (it
took me more than a week of constant uploading by an automated script) due
to gmail's slow imap import process. It can also make gmail mad at you (I
got locked out of my account for a day at some point) and is generally not a
good choice for most people. For that reason, I'm not going to try to give
"step-by-step" directions on how to get the archives into your gmail.
But what I will do is almost as good (or maybe even better). By importing
the archives into the opera mail client (which is free), you will have
access to the entire archives of the vanagon list whether or not you are
connected to the internet. Plus, since opera mail has an "indexed search",
you can search the entire archives in seconds all on your own machine. The
best thing would be if the web archive interface was "fixed" to use an
indexed search, but until that happens, I consider this to be a pretty good
solution for most people.
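
For readers wondering what an "indexed search" buys you, here is a minimal sketch in Python of the general idea (an inverted index that maps each word to the messages containing it). This is only an illustration of the technique, not how opera mail or the web archive is actually implemented; the function names and the three sample messages are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(messages):
    """Map each word to the set of message ids that contain it."""
    index = defaultdict(set)
    for msg_id, body in messages.items():
        for word in tokenize(body):
            index[word].add(msg_id)
    return index


def search(index, query):
    """Return ids of messages containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the first word's matches, then intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample messages, invented for illustration only.
messages = {
    1: "Coolant leak near the water pump",
    2: "Water pump replacement on an 87 Wolfsburg",
    3: "Fuel pump relay clicking",
}
index = build_index(messages)
print(sorted(search(index, "water pump")))
```

The index is built once, and each search afterwards only looks up the query words instead of re-reading every message, which is why an indexed search can answer quickly where a plain scan cannot.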
I've prepared the archives as an "mbox" file, which John Meeks has been kind
enough to agree to host for listmembers to download on his vanagonauts
site. You can also import the mbox file into a different email client of
your choice, but I can't write instructions for every email client so you're
on your own there. I chose opera as the "suggested" client because it seems
to be the only readily available client that has an "indexed search" built
in (and is available for just about any operating system), but another
solution is to use another client (like thunderbird) in conjunction with
google desktop or a similar search program. Like I said before, you are on
your own if you want to pursue that route.
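
If you would rather not use a mail client at all, the mbox file can also be read directly by a script. The following is a rough sketch, assuming Python 3 is installed; it uses the standard library `mailbox` module, and the function name `find_subjects` is just a name chosen for this example. It does a plain (non-indexed) scan of the subject lines, so expect it to be much slower than an indexed search on a file this size.

```python
import mailbox


def find_subjects(mbox_path, keyword):
    """Return (date, subject) pairs for messages whose subject contains keyword."""
    keyword = keyword.lower()
    hits = []
    # create=False: raise an error instead of creating an empty file
    # if the path is wrong.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # str() guards against header objects for non-ASCII subjects.
            subject = str(msg.get("Subject", ""))
            if keyword in subject.lower():
                hits.append((str(msg.get("Date", "")), subject))
    finally:
        box.close()
    return hits
```

For example, `find_subjects("gerry", "water pump")` would list the date and subject of every message in the unzipped "gerry" file whose subject mentions "water pump".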
One final thing: in spite of the fact that these directions can give us the
ability to have complete access to the archives, I think we should keep in
mind that most listmembers will not bother to download the file and go
through with this (which is quite reasonable, since the process requires
upwards of a gigabyte of free space to complete). As a result, I would
suggest that it is still probably not very constructive to answer people's
questions with the answer "search the archives", since, for many intents and
purposes, the web archives remain broken and google doesn't work either.
Here are my directions to get the archive into opera mail:
0. Download the gerry.zip file from
http://www.vanagonauts.com/files/gerry.zip
1. The file gerry.zip is compressed and needs to be unzipped. (Some
versions of Windows may have this functionality built in?) The size of the
file gerry.zip is 162 MB. If you have a file that is much smaller than that,
it probably didn't download correctly. (For anybody that wants to know, the
MD5 sum of the file is 408626910cc14145089907ba7f7c66c1)
2. When unzipped, the file "gerry" (no extension) is 576 megabytes. It
contains every message in the vanagon archives (starting on April 2, 1994)
through November 30, 2008. The unzipped file is an "mbox" file, which can
be imported or used by most mail programs. You are on your own when it
comes to importing this into your mail program of choice. I will show you
how to import the file into opera mail, which includes an indexed search so
that you can search the entire archive in seconds.
3. Install opera (get it for free from www.opera.com)
4. On the left bar in opera, click on the envelope button to go to opera
mail. Under the file menu, choose "Import and Export" > "Import Mail".
Choose "generic mbox file" and click next. Hit the "add mbox..." button and
locate the unzipped "gerry" file. Under "Import into:", you can leave the
setting as "new account" or you can choose an existing account if you
already have an account set up that you would like to use. Click next, and
opera will start processing the file. This will take a long time. When it
finishes, you will have the archives in an easily searchable and browsable
location.
5. If you set up opera as your email client for your vanagon subscription
(follow readily available online instructions), you will always have all of
the vanagon mail available for search, minus the "gap" between November 30,
2008 and whenever you set it up. Maybe at some point in the future, I will
make an extra file to cover the "gap".
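
Steps 1 and 2 above (checking the download and unzipping it) can also be done with a short script. This is a minimal sketch, assuming Python 3 is installed and that gerry.zip has already been downloaded; the function names are chosen for this example, and the only value taken from the directions is the MD5 sum given in step 1.

```python
import hashlib
import zipfile

# MD5 sum of gerry.zip as given in step 1 of the directions.
EXPECTED_MD5 = "408626910cc14145089907ba7f7c66c1"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_unzip(zip_path, dest_dir, expected_md5=EXPECTED_MD5):
    """Check the download against the expected MD5, then extract it.

    Returns the list of file names found in the archive.
    """
    actual = md5_of_file(zip_path)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch: got {actual}, expected {expected_md5}")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

Calling `verify_and_unzip("gerry.zip", ".")` raises an error if the download is incomplete or corrupted, and otherwise extracts the "gerry" mbox file into the current folder.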
Happy searching!
Wes