From: Cameron Kaiser
Message-Id: <200306301521.IAA30688@floodgap.com>
Subject: [gopher] Re: bot's running
To: gopher@complete.org
Date: Mon, 30 Jun 2003 08:21:25 -0700 (PDT)

> Yes, I noticed... and I have a small question or actually favor to ask...
> because I'm currently using the system in which the type of request should
> also be present (e.g. 0/robots.txt).
> so could the bot both check for robots.txt and 0/robots.txt? or is that a
> problem?

I think it will probably be okay. This is how it will work though:

The bot will check for "robots.txt" first. If this works, fine, this is
accepted.

Next the bot will check for "0/robots.txt".
If this works, fine, this is accepted; otherwise, no robots.txt is used for
the site.

The reason this is worth bringing up is that the two selectors could map to
different files depending on the server, so the behaviour needs to be known.
Thus selector "robots.txt" always takes precedence if found.

If this is no problem for everyone, I'll take down the bot for a few minutes
this afternoon and add in the changes. Obviously whenever the bot restarts,
it refetches all robot exclusions; these are held in memory and not in MySQL,
since they're transient anyway.

> greets and keep on the great work,

Arigatoo :-)

If people want to look up stats while the bot is crawling,

	gopher://helsinki.floodgap.com/1/world/

Refresh and watch the numbers change. Great for those coffee breaks.

-- 
---------------------------------- personal: http://www.armory.com/~spectre/ --
 Cameron Kaiser, Floodgap Systems Ltd * So. Calif., USA * ckaiser@floodgap.com
-- Greek tailor shop: "Euripedes?" "Yes -- Eumenides?" ------------------------
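
For anyone who wants to see the lookup order described above spelled out, here
is a rough sketch in Python. It is not the bot's actual code. The names
find_robots, ExclusionCache and the fetch callable are all made up for
illustration; fetch is assumed to return the file body for a (host, selector)
pair, or None when the request does not work. The sketch also assumes the bot
stops at the first selector that works, which the message above does not
state outright.

```python
from typing import Callable, Dict, Optional, Tuple

# Selectors tried in order; the plain "robots.txt" selector wins if found.
ROBOTS_SELECTORS = ("robots.txt", "0/robots.txt")

# Hypothetical fetcher type: (host, selector) -> file body, or None on failure.
Fetcher = Callable[[str, str], Optional[str]]


def find_robots(host: str, fetch: Fetcher) -> Optional[Tuple[str, str]]:
    """Return (selector, body) for the first selector that works, else None."""
    for selector in ROBOTS_SELECTORS:
        body = fetch(host, selector)
        if body is not None:
            return selector, body
    return None  # neither selector worked: no robots.txt is used for the site


class ExclusionCache:
    """In-memory store of exclusions; it starts empty after every restart."""

    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        self._by_host: Dict[str, Optional[Tuple[str, str]]] = {}

    def get(self, host: str) -> Optional[Tuple[str, str]]:
        # Fetch once per host per run; nothing is written to a database.
        if host not in self._by_host:
            self._by_host[host] = find_robots(host, self._fetch)
        return self._by_host[host]
```

Because the cache lives only in the process, restarting the program throws
it away and every host is looked up again, which matches the "transient"
behaviour mentioned above.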