From: Cameron Kaiser
Message-Id: <200512210030.QAA18218@floodgap.com>
Subject: [gopher] Re: Whats all this talk about?
In-Reply-To: <20051220180726.6afcb532@work1.hal3000.cx> from Chris at "Dec 20, 5 06:07:26 pm"
To: gopher@complete.org
Date: Tue, 20 Dec 2005 16:30:50 -0800 (PST)

> The box Veronica is on is a p200mmx
> Jughead is on another p200mmx
> Freebsd for both.
> The list of sites is included with the
> About_Veronica_Search text and
> About_Multi_Search talks of Jughead

Ah, thanks (*reads it*).

> I am having problems with the .tree script in
> that there is not any decent fall backs for things
> like high latency or lost connection, there is
> an "Alarm" sent in text and that ends the "tree-ing"
> for that site. This may be why the results are so far
> differing at times with yours Cameron.

Is the set up actually a crawler?
It's not clear to me if you're using a predigested index the outside sites
provide, or if you're crawling it yourself. I'm assuming based on

> I have shown which sites "Alarmed" and therefore
> are incomplete.
> For instance:
> gopher.semo.edu #alarm long way in
> that is to say after a long time and quite far
> in the tree I recieved an alarm which indcates
> one of several things, timeout, loss of connection,
> exceded "depth" etc.

that you are crawling it yourself.

> Cameron I think you are indexing more than I atm as
> well, with my raw data being about 20M and the data
> file being 10M with a 1M offset file and a 5M "other"
> file .

How many selectors does that translate to? For the record,

gopher% ls -sk # in kilobytes
total 956399
146408 history.MYD      3664 prospects.MYI      12 stats.frm
105496 history.MYI        12 prospects.frm  304968 textil.MYD
    12 history.frm         6 stats.MYD      391104 textil.MYI
  4696 prospects.MYD       9 stats.MYI          12 textil.frm

so not quite a gig so far. Note it is not full-text. textil is the keyword
and relevancy table, history is the selector/display string database,
prospects is the workspace table and stats is cached precomputed statistics
used for /world.

This is with 1.1 million selectors, give or take a couple thousand, using
my regular "stupid" crawler library.

Mind you, this is not a competition :) I'm just curious about how you're
getting things up and running. So far you seem to be getting pretty good
results for an early effort, so you are to be congratulated.

-- 
--------------------------------- personal: http://www.armory.com/~spectre/ ---
  Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckaiser@floodgap.com
-- Hi! I am a .signature virus. Copy me into your .signature to join in! -----