<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
"http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
<card id="index" title="Text File" newcontext="true">
<p>
From: Cameron Kaiser &lt;spectre@floodgap.com&gt;
Message-Id: &lt;200306250539.WAA13998@floodgap.com&gt;
Subject: [gopher] Veronica-2 again, and one last robots.txt argument
To: gopher@complete.org
Date: Tue, 24 Jun 2003 22:39:42 -0700 (PDT)
</p>
<p>The new crawler just took its first step tonight by taking a harnessed walk
around gopher.floodgap.com. It reads and understands robots.txt files (the
User-agent is veronica, or *), correctly traverses trees, and generates sane
indexes. Loop protection and auto-pruning will get tested later.
</p>
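<p>A minimal sketch, in Python, of how a crawler might pick out the robots.txt
rules that apply to it. This is not V-2's actual code: the function name
disallowed_for_veronica and the sample file are hypothetical, and it assumes
that Disallow lines from records addressed to either veronica or * all apply.

```python
def disallowed_for_veronica(robots_txt):
    """Collect Disallow values from records addressed to veronica or *."""
    rules = []
    agents = []          # User-agent names of the record being read
    in_rules = False     # True once the current record has listed a rule
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and padding
        if ":" not in line:
            continue                          # blank or malformed line
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:                      # a new record starts here
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field == "disallow":
            in_rules = True
            # An empty Disallow: value forbids nothing, so it is skipped.
            if value and ("veronica" in agents or "*" in agents):
                rules.append(value)
    return rules


# Hypothetical robots.txt, for illustration only.
sample = """\
User-agent: googlebot
Disallow: /web-only/

User-agent: veronica
Disallow: /private/
Disallow: 1/something

User-agent: *
Disallow: /tmp/
"""

print(disallowed_for_veronica(sample))
```

Run on the sample, this prints ['/private/', '1/something', '/tmp/']; the
googlebot record is ignored because it names neither veronica nor *.
</p>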
<p>To revisit robots.txt for a bit: the current logic has the following
consequences.
</p>
<p>* If you Disallow: / in your robots.txt file, not only will your site not be
  indexed, but its very existence will not even be registered in the statistics
  table (and consequently it will not appear on the master list of servers).
</p>
<p>* Disallow: intentionally says nothing about the itemtype, both because it
  is selector-oriented and because at least one person here (John) wanted as
  much overlap as possible between the Web and gopher robots.txt files, so that
  one filesystem can be presented both ways and the same robots.txt understood
  by both V-2 and any web robots.
</p>
<p>  The consequence is this. Any gopher server that requires an &quot;internal&quot;
  itemtype to be transmitted back to it (URLs like x.yz.com:70/11/something
  where the actual selector is 1/something) MUST include that itemtype in the
  Disallow: line (e.g., for this example, Disallow: 1/something).
</p>
<p>* Disallow: /path/ matches both /path and /path/ (but not substrings of either).
</p>
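<p>The three rules above can be sketched as a selector check, in Python. This
is an illustration under stated assumptions, not the crawler's real logic: the
names is_disallowed and should_register_server are hypothetical, and treating
selectors below /path/ (such as /path/sub) as covered is an assumption borrowed
from conventional robots.txt prefix matching.

```python
def is_disallowed(selector, rules):
    """Return True if any Disallow rule covers the selector."""
    for rule in rules:
        if not rule:
            continue                  # empty Disallow: forbids nothing
        if rule == "/":
            return True               # the whole server is off limits
        base = rule.rstrip("/")
        # /path/ covers /path and /path/ alike. Covering deeper selectors
        # such as /path/sub is an assumption, not stated in the post.
        if selector == base or selector.startswith(base + "/"):
            return True
    return False


def should_register_server(rules):
    """A server that disallows / is left out of the statistics table."""
    return "/" not in rules


rules = ["/path/", "1/something"]
print(is_disallowed("/path", rules))          # covered by /path/
print(is_disallowed("/pathology", rules))     # only shares a prefix string
print(is_disallowed("1/something", rules))    # itemtype given in the rule
print(is_disallowed("/something", rules))     # rule needs the leading 1
```

Because the rule is compared against the selector exactly as written, a server
whose selectors carry the itemtype has to write it into the Disallow: line, as
in the 1/something example.
</p>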
<p>If this will cause trouble for people, advise ASAP. I&#x27;m planning to unleash
the crawler sometime in the next week or two.
</p>
<p>--
---------------------------------- personal: http://www.armory.com/~spectre/ --
 Cameron Kaiser, Floodgap Systems Ltd * So. Calif., USA * ckaiser@floodgap.com
-- If you want divine justice, die. -- Nick Seldon ----------------------------
</p>
</card>
</wml>
