Message-ID: <44ECE066.7040001@sdf.lonestar.org>
Date: Wed, 23 Aug 2006 18:10:30 -0500
From: Benn Newman
To: gopher@complete.org
Subject: [gopher] Re: Gopherspace archive
References: <200608231518.k7NFIDGJ005586@floodgap.com>
In-Reply-To: <200608231518.k7NFIDGJ005586@floodgap.com>

Cameron Kaiser wrote:
>> :)). With all the software and binary stuff taken out, I think it should
>> be (nearly) manageable. The index file shouldn't be nearly as big as the
>> whole archive; I could then make a front end to that (yay for sed and awk!).
>
> Looking at the size of V-2's index file, I wouldn't count on that if you have
> any kind of useful indexing.
> Yes, you won't be indexing binaries, but a large
> text file will have *plenty* of keywords, and an index of keywords itself has
> to be indexed by your database engine to be usefully searchable. That'll
> consume a platter or two.
>
> I haven't decided yet if I can support hosting the archive. This sounds like
> a job for BitTorrent unless the other solution is to unpack the archive and
> let people take the pieces they want instead of downloading the whole thing.
> My uplink is a woeful 608k max ADSL line and the web and gopher servers both
> fight over it on a daily basis.
>

If you read the paper (we interrupt this e-mail for an MLA-ish citation!
Lesk, M. E., "Some Applications of Inverted Indexes on the UNIX System."
Murray Hill, New Jersey: a really long time ago. We now return to our
e-mail), you will find that there is a maximum number of keys per
file/entry (or there is supposed to be, anyway; documentation not meeting
reality adds spice to life, or something like that). I would not be writing
the engine (I don't hate myself *that* much), just a Gopher front end. I
never said it would be "fast."

I think BitTorrent is the way to go.

--
Benn Newman