From: Cameron Kaiser
To: gopher@complete.org
Subject: [gopher] Re: Improved binary file detection in Bucktooth 0.2.2
In-Reply-To: <20071228162923.GA26591@pongonova.net> from "brian@pongonova.net" at "Dec 28, 7 10:29:23 am"
Date: Fri, 28 Dec 2007 10:39:13 -0800 (PST)

> > The other thing I might do is just expand the number of file extensions
> > Bucktooth recognizes and generates item types for, since -B is the
> > fall-through case and there will always be datasets falling in the tails
> > of the bell curve.
>
> There is a Perl module that used the /etc/magic file to determine file
> types in the same way as "file" does (File::Type). That might be one
> approach...

Yeah, that would work too, except it would still be a lot of iteration (but
definitely would save the overhead of a fork).
What I need to do is just figure out a more reliable way to identify binary
data cheaply ("let Perl do it" being, of course, the cheapest way from Perl :-).

-- 
------------------------------------ personal: http://www.cameronkaiser.com/ --
  Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckaiser@floodgap.com
-- When in doubt, take a pawn. -- Mission: Impossible ("Crack-Up") ------------
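
As an illustration of the kind of cheap check discussed above, here is a
minimal sketch of a first-block heuristic, loosely modelled on what the Perl
documentation describes for -T/-B: any zero byte means binary, otherwise the
sample is binary if too many bytes look "odd". It is written in Python purely
for illustration (Bucktooth itself is Perl). The names looks_binary and
file_looks_binary, the 512-byte sample size, and the 30% cutoff are all
assumptions, not anything taken from Bucktooth or from Perl's source.

```python
# Hypothetical sketch: names, sample size and cutoff are assumptions.
BLOCK_SIZE = 512      # how much of the file to sample (assumed)
ODD_FRACTION = 0.30   # fraction of odd bytes that tips the guess to binary (assumed)

# Bytes treated as ordinary text: printable ASCII plus common control codes
# (tab, newline, carriage return, form feed, backspace, escape).
TEXT_BYTES = frozenset(range(0x20, 0x7F)) | frozenset(b"\t\n\r\f\b\x1b")


def looks_binary(sample: bytes) -> bool:
    """Guess whether a leading sample of a file is binary data."""
    if not sample:
        return False  # an empty sample is treated as text here (assumption)
    if b"\x00" in sample:
        return True   # a zero byte is taken as a strong sign of binary data
    # Count bytes outside the text set, including anything with the high bit set.
    odd = sum(1 for byte in sample if byte not in TEXT_BYTES)
    return odd / len(sample) > ODD_FRACTION


def file_looks_binary(path: str, block_size: int = BLOCK_SIZE) -> bool:
    """Apply the heuristic to the first block of the file at path."""
    with open(path, "rb") as handle:
        return looks_binary(handle.read(block_size))
```

Only one read is needed and nothing is forked, so the check is cheap, but it
has the same weakness as any such threshold: text made up mostly of non-ASCII
bytes (for example, a heavily accented 8-bit encoding) would be counted as odd
and could be misclassified, which is where an extension table or magic-number
lookup could still serve as a fallback.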