<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
"http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
<card id="index" title="Text File" newcontext="true">
<p>
From: Cameron Kaiser &lt;spectre@floodgap.com&gt;
Message-Id: &lt;200512210030.QAA18218@floodgap.com&gt;
Subject: [gopher] Re: Whats all this talk about?
In-Reply-To: &lt;20051220180726.6afcb532@work1.hal3000.cx&gt; from Chris at &quot;Dec 20,
 5 06:07:26 pm&quot;
To: gopher@complete.org
Date: Tue, 20 Dec 2005 16:30:50 -0800 (PST)
</p>
<p>&gt; The box Veronica is on is a p200mmx
&gt; Jughead is on another p200mmx
&gt; Freebsd for both.
&gt; The list of sites is included with the
&gt; About_Veronica_Search text and
&gt; About_Multi_Search talks of Jughead
</p>
<p>Ah, thanks (*reads it*).
</p>
<p>&gt; I am having problems with the .tree script in
&gt; that there is not any decent fall backs for things
&gt; like high latency or lost connection, there is
&gt; an &quot;Alarm&quot; sent in text and that ends the &quot;tree-ing&quot;
&gt; for that site. This may be why the results are so far
&gt; differing at times with yours Cameron.
</p>
<p>Is the setup actually a crawler? It&#x27;s not clear to me if you&#x27;re using a
predigested index the outside sites provide, or if you&#x27;re crawling it
yourself. I&#x27;m assuming based on
</p>
<p>&gt; I have shown which sites &quot;Alarmed&quot; and therefore
&gt; are incomplete.
&gt; For instance:
&gt; gopher.semo.edu #alarm long way in
&gt; that is to say after a long time and quite far
&gt; in the tree I received an alarm which indicates
&gt; one of several things, timeout, loss of connection,
&gt; exceeded &quot;depth&quot; etc.
</p>
<p>that you are crawling it yourself.
</p>
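<p>A minimal sketch of the kind of fallback the quoted text asks for, where a
slow or dropped site is retried and then recorded with a status instead of
ending the whole crawl. This assumes Python and plain TCP on port 70; the name
fetch_selector, its status strings, the retry count and the injectable connect
parameter are all hypothetical and are not taken from the .tree script or from
either crawler. It covers only timeouts and lost connections, not the depth
limit.

```python
import socket


def fetch_selector(host, selector, port=70, timeout=10.0, retries=2,
                   connect=socket.create_connection):
    """Fetch one Gopher selector and return (status, data).

    Failures are reported as a status string rather than raised, so a
    caller walking many sites can note the problem and move on.
    """
    last_status = "error"
    for _attempt in range(retries + 1):
        try:
            # The timeout applies to the connect and to every later recv.
            with connect((host, port), timeout=timeout) as conn:
                conn.sendall(selector.encode("latin-1") + b"\r\n")
                chunks = []
                while True:
                    chunk = conn.recv(4096)
                    if not chunk:
                        break  # server closed the connection: transfer done
                    chunks.append(chunk)
                return "ok", b"".join(chunks)
        except socket.timeout:
            last_status = "timeout"          # high latency: try again
        except OSError:
            last_status = "connection-lost"  # refused, reset, unreachable
    return last_status, b""
```

A caller could store the returned status next to the host name, which gives
the same information as the per-site alarm notes but lets the crawl continue.
</p>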
<p>&gt; Cameron I think you are indexing more than I atm as
&gt; well, with my raw data being about 20M and the data
&gt; file being 10M with a 1M offset file and a 5M &quot;other&quot;
&gt; file .
</p>
<p>How many selectors does that translate to? For the record,
</p>
<p>gopher% ls -sk # in kilobytes
total 956399
146408 history.MYD          3664 prospects.MYI          12 stats.frm
105496 history.MYI            12 prospects.frm      304968 textil.MYD
    12 history.frm             6 stats.MYD          391104 textil.MYI
  4696 prospects.MYD           9 stats.MYI              12 textil.frm
</p>
<p>so not quite a gig so far. Note it is not full-text.
</p>
<p>textil is the keyword and relevancy table, history is the selector/display
string database, prospects is the workspace table, and stats holds the cached
precomputed statistics used for /world. This is with 1.1 million selectors,
give or take a couple thousand, using my regular &quot;stupid&quot; crawler library.
</p>
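<p>As a rough check on the figures above, the per-file sizes can be summed and
divided by the selector count. This is only back-of-the-envelope arithmetic: it
assumes the ls -sk blocks are 1024 bytes each and takes the selector count as
exactly 1.1 million, which the text gives only approximately.

```python
# Sizes in kilobytes, copied from the ls -sk listing.
sizes_kb = {
    "history.MYD": 146408, "history.MYI": 105496, "history.frm": 12,
    "prospects.MYD": 4696, "prospects.MYI": 3664, "prospects.frm": 12,
    "stats.MYD": 6, "stats.MYI": 9, "stats.frm": 12,
    "textil.MYD": 304968, "textil.MYI": 391104, "textil.frm": 12,
}

total_kb = sum(sizes_kb.values())
selectors = 1_100_000  # assumption: "1.1 million, give or take"

# Assumption: one ls -sk block is 1024 bytes.
bytes_per_selector = total_kb * 1024 / selectors

print(total_kb)                   # 956399, matching the "total" line
print(round(bytes_per_selector))  # 890
```

So the whole database works out to a little under 900 bytes per selector,
index files included.
</p>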
<p>Mind you, this is not a competition :) I&#x27;m just curious about how you&#x27;re
getting things up and running. So far you seem to be getting pretty good
results for an early effort, so you are to be congratulated.
</p>
<p>--
--------------------------------- personal: http://www.armory.com/~spectre/ ---
  Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckaiser@floodgap.com
-- Hi! I am a .signature virus.  Copy me into your .signature to join in! -----
</p>
</card>
</wml>
