From: Cameron Kaiser <spectre@stockholm.ptloma.edu>
To: gopher@complete.org
Subject: [gopher] Large indexing systems
Date: Mon, 5 Nov 2001 07:39:10 -0800 (PST)

Soliciting suggestions:

sfWAIS has crapped out on Veronica-2's final database. (When the pedal hits the metal ...) Apparently it can't cope with a dictionary that size: during the final merge it dies with a file seek error. Some hasty calculations suggest that disk space is not the problem.

Does anyone have experience with a good indexing system that scales to a very large number of documents? I tried Isearch, which was developed by people connected with the WAIS project. But it doesn't like the ancient g++ on this system, and this system doesn't like newer versions of g++ :-). There's also no guarantee that it wouldn't suffer from the same problem anyway.

I have a few ideas for developing my own indexer for large document counts. A rough prototype gave some hopeful numbers in simulation for disk space utilisation and search latency. However, developing it fully would unnecessarily delay the release of the last V-2 database. I would have to write something to build the new search index, and then rewrite VISHNU and Veronica-2 to talk to it. So, any suggestions from the floor?

-- 
----------------------------- personal page: http://www.armory.com/~spectre/ --
 Cameron Kaiser, Point Loma Nazarene University * ckaiser@stockholm.ptloma.edu
-- Please dispose of this message in the usual manner. -- Mission: Impossible -