<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
"http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
<card id="index" title="Text File" newcontext="true">
<p>
Received: with ECARTIS (v1.0.0; list gopher);
 Fri, 28 Dec 2007 01:22:46 -0600 (CST)
Received: from static-71-170-11-156.dllstx.dsl-w.verizon.net ([71.170.11.156]
 helo=turquoise.pongonova.net)
	by glockenspiel.complete.org with esmtp
	(Exim 4.63)
	id 1J89YJ-0004eu-W2
	for gopher@complete.org; Fri, 28 Dec 2007 01:22:46 -0600
Received: by turquoise.pongonova.net (Postfix, from userid 1000)
	id 99706726; Fri, 28 Dec 2007 01:23:39 -0600 (CST)
Date: Fri, 28 Dec 2007 01:23:39 -0600
From: brian@pongonova.net
To: gopher@complete.org
Subject: [gopher] Improved binary file detection in Bucktooth 0.2.2
Message-ID: &lt;20071228072339.GA25327@pongonova.net&gt;
Mime-Version: 1.0
Content-type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.5.1i
X-Spam-Status: No (score 0.6): AWL=0.000, NO_REAL_NAME=0.55
X-Virus-Scanned: by Exiscan on glockenspiel.complete.org at Fri,
 28 Dec 2007 01:22:46 -0600
Content-Transfer-Encoding: 8bit
X-archive-position: 1772
X-ecartis-version: Ecartis v1.0.0
Sender: gopher-bounce@complete.org
Errors-to: gopher-bounce@complete.org
X-original-sender: brian@pongonova.net
Precedence: bulk
Reply-to: gopher@complete.org
List-help: &lt;mailto:ecartis@complete.org?Subject=help&gt;
List-unsubscribe: &lt;mailto:gopher-request@complete.org?Subject=unsubscribe&gt;
List-software: Ecartis version 1.0.0
List-Id: Gopher &lt;gopher.complete.org&gt;
X-List-ID: Gopher &lt;gopher.complete.org&gt;
List-subscribe: &lt;mailto:gopher-request@complete.org?Subject=subscribe&gt;
List-owner: &lt;mailto:jgoerzen@complete.org&gt;
List-post: &lt;mailto:gopher@complete.org&gt;
List-archive: &lt;http://www.complete.org/mailinglists/archives/&gt;
X-list: gopher
</p>
<p>I&#x27;m using buckd to serve up binary files, and noticed that several
binary files (mostly older PDFs with a lot of text in the file header)
were being identified as item type &quot;0&quot; rather than &quot;9&quot;. It turns out
that buckd uses the Perl -B operator to determine binary files.  To do
this, it examines some number of bytes in the file header for certain
characteristics (nul bytes, high-order bits set, etc.) and if that
number of bytes exceeds 30%, Perl identifies it as a binary file.
</p>
<p>This wasn&#x27;t accurate enough for my purposes, so I modified buckd.in so
that it calls the UNIX &quot;file&quot; command and greps for the string &quot;text&quot;
(guaranteed to be returned if a file is identified as a text file).
</p>
<p>I just want to emphasize that this is *not* a problem with Bucktooth,
but rather an issue with Perl.
</p>
<p>Here&#x27;s the patchfile with the change.  I opted to modify buckd.in and
simply regenerate buckd.
</p>
<p>--- buckd.in    2007-12-28 01:21:30.000000000 -0600
+++ buckd.in.new        2007-12-28 01:20:58.000000000 -0600
@@ -289,7 +289,7 @@
                ($xentr =~ /\.jpe?g$/i) ? &quot;I&quot; :
                ($xentr =~ /\.html?$/i) ? &quot;h&quot; :
                ($xentr =~ /\.hqx$/i) ? &quot;4&quot; :
-               (-B $xentr) ? &quot;9&quot; :
+               (grep(!/text/, `file $xentr`)) ? &quot;9&quot; :
        &quot;0&quot;;
        $xentr =~ s/^$DIR//;
        return ($itype, ($pentr eq $xentr) ? &#x27;&#x27; : $xentr);
</p>
<p>  --Brian
</p>
<p></p>
</card>
</wml>
