Date: Fri, 6 Oct 2006 00:55:33 -0500
From: John Goerzen
To: gopher@complete.org
Subject: [gopher] The archive

First off, thanks to all those that have expressed interest in this. I have your emails and will get back to you. I've been rather busy lately due to the birth of our first baby [1] and our upcoming move in about a week. So it will likely be some time before I actually get anything sent off.

I realized also that quux.org had never been included in the run, since it was large and I could populate it from local backups, which I have now done.

I'd also like to document the directory structure. It is, roughly:

    gopher-arch/gopher/hostname/portnumber/selector

Where the selector is a Gopher menu, you will see it exist as a directory with a file named .gophermap within it. This file contains the raw Gopher menu file that was sent over by the server. This should be easily usable by PyGopherd and Bucktooth with only minor modifications.

I have run a duplicate file detector across the entire archive. Any duplicate files in it are hardlinked together. This saved about 10G of space. If you're on Windows, expect this to consume 10G more when unpacked than if you're on a Unix.

I also have a dump of the PostgreSQL database behind the robot (10M compressed, 200M uncompressed, 1.2G when loaded into PostgreSQL). I will toss that on the DVD as well for anyone that's interested.

The DVDs will be generated with:

    tar -cvf - gopher-arch/ | bzip2 -9 | split -d -b 4200m - gopher-arch.tar.bz2.

That is, each DVD will contain a slice of the tar'd+bzipped directory. If you are going to get a set of DVDs, you can read them in, and simply:

    cat gopher-arch.tar.bz2.* | bzcat | tar -xvf -

Some gopher servers do not use the slash as a path separator in the selector. Those servers will have a huge number of files/directories in their top-level -- could be thousands. You will need an efficient modern filesystem to extract all of them in their entirety, but there aren't many such servers.

I will get back to everyone once I have the time to send out the DVDs.

[1] http://changelog.complete.org/posts/545-The-News.html
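
For anyone scripting against the unpacked archive, here is a minimal sketch of the directory layout described above. It is only an illustration: the function names are hypothetical, it assumes a leading slash in the selector is simply dropped (the robot's handling of unusual selectors is not documented here), and it assumes .gophermap lines follow the usual tab-separated Gopher menu format (type character and display name, selector, host, port).

```python
import posixpath


def archive_path(root, host, port, selector, is_menu=False):
    """Return the assumed archive location for a Gopher item.

    Layout: root/gopher/hostname/portnumber/selector
    Assumption: a leading slash in the selector is dropped.
    """
    parts = [root, "gopher", host, str(port)]
    rel = selector.lstrip("/")
    if rel:
        parts.append(rel)
    if is_menu:
        # Menus are stored as a directory holding a .gophermap file.
        parts.append(".gophermap")
    return posixpath.join(*parts)


def parse_menu_line(line):
    """Parse one raw Gopher menu line; return None if it is not an item."""
    line = line.rstrip("\r\n")
    if not line or line == ".":
        # Empty line, or the lone "." that terminates a menu.
        return None
    fields = line[1:].split("\t")
    if len(fields) < 4 or not fields[3].isdigit():
        return None
    return {
        "type": line[0],
        "name": fields[0],
        "selector": fields[1],
        "host": fields[2],
        "port": int(fields[3]),
    }
```

For example, archive_path("gopher-arch", "quux.org", 70, "/Software", is_menu=True) would point at the .gophermap file for that menu, under the assumptions above.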