Vanagon EuroVan
Date:         Thu, 5 Dec 1996 19:58:20 -0500 (EST)
Sender:       Vanagon Mailing List <vanagon@vanagon.com>
From:         Skip Montanaro <skip@automatrix.com>
Subject:      New archiving steps being taken for the Vanagon and Type2 lists

Driven partly by panic that I'd get to a point in the very near future where I wouldn't be able to recover enough disk space for our corporate Web server to function properly, I began looking at alternate archiving options for the Vanagon and Type2 mailing lists yesterday. Together they currently consume about 190MB.

I implemented a new, saner, rollover scheme that keeps the current "volume" the same as it ever was, but when it gets rolled over, it winds up in a gzipped tar file (that's the Unix geek equivalent of things like PKZIP (DOS/Windows) or StuffIt (Mac) files). Viewing rolled over archives will be a bit slower, because access to them will go through a CGI script that extracts the file from the archive before delivering it.
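The extraction step described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard tarfile module, not the actual tgzextr script; the function name extract_member and its arguments are hypothetical.

```python
import tarfile


def extract_member(archive_path, member_name):
    """Return the bytes of one file stored inside a gzipped tar archive.

    Hypothetical helper for illustration; not the real tgzextr code.
    """
    with tarfile.open(archive_path, mode="r:gz") as tar:
        # extractfile() raises KeyError if the name is not in the archive
        # and returns None for members that are not regular files.
        member = tar.extractfile(member_name)
        if member is None:
            raise KeyError(f"{member_name!r} is not a regular file")
        return member.read()
```

A CGI wrapper would call something like extract_member("0004.tar.gz", "index.html") and write the result to the browser. The whole compressed stream may have to be read to find one member, which is why access to rolled-over volumes is slower.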

There are some obvious advantages with this new scheme:

* much improved disk space utilization (duh!) - things seem to be squishing by about a factor of six.

* if you are so inclined you can pull an entire archive from the server to view locally (though there is an itty bitty problem with that - see below).

* once the current Vanagon list volume is rolled over into a gzipped archive it should stay put, which means you should be able to create bookmarks to messages once they get to that stage.

There are a couple disadvantages as well:

* access to the gzipped files is going to be slower (duh!^2). Of course, most of the accesses to the archive are probably to the most recent messages, so this shouldn't be a huge problem. (Trust me. If it gets to be a problem, I'll let you all know... ;-)

* because of the way I put the archive together, the links in the files are hardcoded, so although you can download and extract the gzipped archives and view the individual files, you won't be able to follow links without filtering the hardcoded URLs in the files (could be done with a very simple sed script).
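The link filtering mentioned in the last point could also be done in Python instead of sed. The sketch below assumes the hardcoded links start with the tgzextr CGI prefix for a given volume; that prefix is a guess based on the URL pattern used for the archives and may not match what is actually in the files. The names PREFIX, localize_links and localize_directory are hypothetical.

```python
from pathlib import Path

# Assumed form of the hardcoded links; the real prefix may differ.
PREFIX = ("http://www.automatrix.com/cgi-bin/tgzextr/~skip/volkswagen/"
          "vanagon-list/archives/0004.tar.gz?file=")


def localize_links(html, prefix=PREFIX):
    """Strip the hardcoded prefix so links point at sibling files."""
    return html.replace(prefix, "")


def localize_directory(directory, prefix=PREFIX):
    """Rewrite every .html file in `directory` in place.

    Returns the number of files that were changed.
    """
    changed = 0
    for path in Path(directory).glob("*.html"):
        # latin-1 maps every byte to a character, so odd bytes in old
        # mail survive the round trip.
        original = path.read_text(encoding="latin-1")
        rewritten = localize_links(original, prefix)
        if rewritten != original:
            path.write_text(rewritten, encoding="latin-1")
            changed += 1
    return changed
```

After extracting an archive into a directory, calling localize_directory on it with the right prefix should leave relative links that work from a local disk.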

That said, here's where stuff exists now (first URL in a pair is for the Vanagon list, second for Type2):

* The most recent messages are still available at:

http://www.automatrix.com/~skip/volkswagen/vanagon-list/
http://www.automatrix.com/~skip/volkswagen/type2/

* The gzipped tarfile archives are in (or will be in):

http://www.automatrix.com/~skip/volkswagen/vanagon-list/archives/
http://www.automatrix.com/~skip/volkswagen/type2/archives/

The archives are numbered 0000.tar.gz, 0001.tar.gz, and so on. The most recent one is always available as latest.tar.gz.

* To view (for instance) the index.html file of the 0004.tar.gz archive, use:

http://www.automatrix.com/cgi-bin/tgzextr/~skip/volkswagen/vanagon-list/archives/0004.tar.gz?file=index.html
http://www.automatrix.com/cgi-bin/tgzextr/~skip/volkswagen/type2/archives/0004.tar.gz?file=index.html

Other file references are similar. Sorry for the length of the URLs. Hopefully you can all point-and-click from your mail readers...
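For anyone who would rather generate these URLs than type them, the pattern can be sketched as a small Python function. The base URL and the four-digit volume numbering come from the examples above; the function name archive_file_url and its parameters are hypothetical.

```python
from urllib.parse import quote

# Base of the tgzextr CGI URLs shown in the examples above.
BASE = "http://www.automatrix.com/cgi-bin/tgzextr/~skip/volkswagen"


def archive_file_url(list_dir, volume, filename):
    """Build the URL for one file inside a numbered archive volume.

    list_dir is "vanagon-list" or "type2"; volume is the archive number.
    """
    return (f"{BASE}/{list_dir}/archives/{volume:04d}.tar.gz"
            f"?file={quote(filename)}")
```

For example, archive_file_url("type2", 4, "index.html") gives the second URL in the pair above.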

If you have any trouble with the new archiving scheme, let me know. It will probably take a while to get things completely converted, since I'm trying to squeeze the archive conversions in at quiet times on the server. I have about 50 directories still to convert in the Vanagon archive. After I finish with them I'll start in on the Type2 archive (16 directories).

I plan to make one other immediate change to the way things are archived. I will run a periodic program on the server that detects when a certain number of messages has been archived in a volume and automatically rolls the volume over into a gzipped tar file. 400 messages seems like a reasonable size. This will keep the individual archives from getting too unwieldy when I'm not looking.
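A periodic rollover job along these lines might look roughly like the Python sketch below. It is an illustration under several assumptions, not the program that will actually run: it treats every plain file in the volume directory as one message, writes the next NNNN.tar.gz into an archives directory, refreshes latest.tar.gz, and then empties the volume. All function and variable names are hypothetical.

```python
import re
import shutil
import tarfile
from pathlib import Path

ROLLOVER_THRESHOLD = 400  # the message count suggested in the post


def next_archive_number(archive_dir):
    """Return one more than the highest existing NNNN.tar.gz number."""
    numbers = [int(m.group(1))
               for p in Path(archive_dir).glob("*.tar.gz")
               if (m := re.fullmatch(r"(\d{4})\.tar\.gz", p.name))]
    return max(numbers, default=-1) + 1


def roll_over_if_full(volume_dir, archive_dir, threshold=ROLLOVER_THRESHOLD):
    """Archive the current volume once it holds `threshold` files.

    Returns the path of the new archive, or None if nothing was done.
    Assumption: every plain file in volume_dir counts as one message.
    """
    volume_dir, archive_dir = Path(volume_dir), Path(archive_dir)
    files = sorted(p for p in volume_dir.iterdir() if p.is_file())
    if len(files) < threshold:
        return None
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{next_archive_number(archive_dir):04d}.tar.gz"
    with tarfile.open(target, "w:gz") as tar:
        for path in files:
            tar.add(path, arcname=path.name)
    # Keep latest.tar.gz pointing at the newest volume.
    shutil.copyfile(target, archive_dir / "latest.tar.gz")
    # Empty the current volume only after the archive is written.
    for path in files:
        path.unlink()
    return target
```

Run from cron, a call such as roll_over_if_full("vanagon-list", "vanagon-list/archives") would do nothing until the volume fills up. Subdirectories of the volume, including the archives directory itself, are skipped because only plain files are collected.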

Longer term I'd like to implement a simple search engine to improve the utility of the archives. Browsing through hypermail indexes isn't particularly efficient for anybody. I'd like to also figure out a way to not have to fiddle the URLs in the files.

<plug type=shameless>

If you have file archives you'd like to make available on the Web from a Unix server, or a large set of files on your Web site that take up space but only need to be browsable from the Web, you might want to take a look at the Python script, tgzextr. It's available from:

http://www.automatrix.com/~skip/

If you've never used Python to develop scripts (small or large), you should check it out at

http://www.python.org/

</plug>

Cheers,

Skip Montanaro     | Musi-Cal: http://concerts.calendar.com/
skip@calendar.com  | "It doesn't matter where you get your appetite as
(518)372-5583      | long as you eat at home." -- Sloan Wainwright







The vanagon mailing list archives are copyright (c) 1994-2011, and may not be reproduced without the express written permission of the list administrators. Posting messages to this mailing list grants a license to the mailing list administrators to reproduce the message in a compilation, either printed or electronic. All compilations will be not-for-profit, with any excess proceeds going to the Vanagon mailing list.

Any profits from list compilations go exclusively towards the management and operation of the Vanagon mailing list and vanagon mailing list web site.