<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
"http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
<card id="index" title="Text File" newcontext="true">
<p>
#!/usr/local/bin/perl
# go4check, v1.3
#
#-------------------------------------------------------------------------------
# Introduction
#   go4check checks gopher links, probing each connection and testing the
#   output received.  It handles most types of links, reporting if the link
#   is ok, the host serving it is down/refusing connections, or its pathname
#   has changed.  It is not 100% successful at this, especially when it
#   comes to gopher0 servers, but does indeed help you keep on top of links
#   in your server(s).
#
#   To run, go4check requires only perl and socket.ph.  It understands
#   gopher0 and gopher+ servers.
#
#   go4check produces a line of output on stdout for each item appearing
#   in a gopher&#x27;s menu: the name of the item plus a result.  Indentation
#   serves to maintain items in context so problems can be located easily.
#   As an extra benefit, go4check&#x27;s output can be used as a roadmap of
#   the gopher after some rather trivial editing to remove results.
#
#   go4check is written by George A. Theall, George.A.Theall@mail.tju.edu.
#   You may freely use and redistribute this.  I can not offer any
#   support for this but am interested in your comments, suggestions,
#   and problem reports.
#
#   The latest version is available via gopher as:
#      gopher://tjgopher.tju.edu/00/networks/internet/tools/gopher/go4check
#
#   Note: Version 1.3 will probably be the last version of go4check I release.
#
#-------------------------------------------------------------------------------
# Operation
#   Before you run go4check, make sure perl and the header file socket.ph are
#   available on your system. [You can generate this file by running the perl
#   utility h2ph on /usr/include/sys/socket.h, or something similar.]
#
#   Invoke go4check with the name of the server to check and an optional port
#   number.  Other options can be used to specify a non-standard starting
#   path or generate copious debugging info.  go4check will test the items
#   listed in the initial menu and recurse into any menus it finds as long
#   as the names of server it finds match the one specified at go4check&#x27;s
#   invocation. go4check does, though, skip recursion if pathnames refer
#   to ftp gateways or point back to the initial entry point.
#
#   Results are directed to stdout, so you probably will want to redirect
#   to a file.  You might then remove instances of &quot;...ok.&quot;, which
#   indicate no problems and finally search on &quot;...can&#x27;t connect.&quot;,
#   &quot;...path changed.&quot;, and &quot;...timed out.&quot;.  Another possible result
#   is &quot;...n/a.&quot;, which is used when go4check doesn&#x27;t know how to check
#   a particular type of link.
#
#   You may want to tune the variables that go4check uses for testing
#   items of type 2 and 7.  See below where initial values are defined.
#   For items of type 2, go4check sends a invalid command, which causes
#   many CSO servers to respond in a way that go4check interprets as a
#   success.  As for items of type 7, I don&#x27;t know of any robust way
#   to test searches.  Currently, the best solution appears to be
#   to search for a word that&#x27;s common to whatever searches are in the
#   gopher being checked.
#
#   go4check is slow; it probably belongs in a cron job to run at night.
#
#-------------------------------------------------------------------------------
# History
#   15-Mar-95, GAT, v1.3
#	- Added ability to avoid recursing into selected paths.  Paths are
#	  tested using substr() so you can have go4check check a directory
#	  but not recurse further by appending a &quot;/&quot;, if that&#x27;s what you want.
#
#   27-Feb-95, GAT, v1.2
#      - Moved alarm for connecting to within the tcpconnect subroutine
#        to handle better time-out conditions.
#      - Wrapped initial gopher connection and telnet check with alarms.
#
#   31-Jan-95, GAT, v1.1
#      - Alarms are now used to abort connections that are otherwise hung.
#      - Added patches from R.D. Cameron for supporting type 7 items with
#        non-empty paths and checking error returns of type 3.
#      - Fixed glitch that arose on some servers (gopher.uwsp.edu for one)
#        that return lines with non-standard endings.
#      - Explicitly added an assignment for $| and set it to true so output
#        will be flushed after every print.
#
#   17-Oct-94, GAT
#      - Added a semicolon after a line in make_URL.  Its lack appears to
#        cause problems with some versions of Perl.
#
#   01-Sep-94, GAT, v1.0
#      - Released publically.
#
#   10-Aug-94, GAT, v1.0b2
#      - Added $snooze_length as a way to control how long to pause after
#        establishing a connection.
#      - Fixed initialization of %URLs.
#      - Changed format of internal URLs by removing &quot;:&quot; from between type
#        and path info.
#      - Used a configurable word to check search items.
#      - Added check of CSO servers.
#      - Adjusted regular expression used to check success/failure of
#        a link.
#      - Documented go4check&#x27;s operation.
#
#   12-Jul-94, GAT, v1.0b1
#      - Used pseudo URLs internally for storing links so they are not
#        checked more than once.
#      - Added support for most types of links, including telnet, binary
#        files, and searches.
#      - Used gopher+ protocol whenever possible to avoid retrieving
#        entire files.
#
#   09-Jun-94, GAT, v1.0a
#      - First version of go4check. Checks only files and directories.
#
#-------------------------------------------------------------------------------
</p>
<p></p>
<p># Specify where perl can find include files.
push(@INC, &quot;/usr/local/lib/perl&quot;);
</p>
<p></p>
<p># Define initial values for selected variables.
$| = 1;					# flush after every print?
$default_path2 = &quot;helo&quot;;		# for searching type 2 items
$default_search_term = &quot;cancer&quot;;	# for searching type 7 items
@excluded_paths = (			# paths to exclude
	&quot;1/tjgopher/changes&quot;,
	&quot;1/tju/atrium/&quot;,
	&quot;1/tju/nutrition/atrium&quot;,
	&quot;1/tju/jeffnews/current&quot;,
	&quot;1/tju/jeffnews/backissues/&quot;,
	&quot;1/tju/marketing/jeffnews&quot;,
	&quot;1/gophers/bylocation/tju&quot;,
	&quot;1/gophers/bylocation/philly&quot;);
$Indent = &quot;  &quot;;				# indentation at each level
$snooze_length = 3;			# time to snooze before connect
$timeout = 180;				# max len of connect (seconds)
%URLs = ();				# array of URL&#x27;s on server
</p>
<p></p>
<p># Check for options.
$DEBUG = 0;				# default to no debug
if ($ARGV[0] eq &#x27;-d&#x27;) {
	shift;
	$DEBUG = 1;
}
</p>
<p></p>
<p># Parse commandline args and provide help as needed.
$inithost = shift || &quot;&quot;;		# name of host to check
$initport = shift || 70;		# port number
$initpath = shift || &quot;&quot;;		# initial directory
if ($inithost eq &quot;&quot; || $inithost eq &quot;-?&quot;) {
	print &quot;$0 checks links in a gopher by probing connections\n\n&quot;;
	print &quot;Usage:  $0 [-d] host [port] [\&quot;path\&quot;]\n&quot;;
	print &quot;        unless specified, port defaults to 70 and path to \&quot;\&quot;.\n&quot;;
	print &quot;        -d is used for debugging.\n&quot;;
	exit(9);
}
</p>
<p></p>
<p># Set up subroutines to catch some alarms.
$SIG{&#x27;ALRM&#x27;} = handle_Timeout;
</p>
<p></p>
<p># Establish connection and check links.
require &#x27;sys/socket.ph&#x27;;
chop($thishost = `hostname`);		# needed for tcpconnect
&amp;check_Links($inithost, $initport, $initpath);
exit(0);
</p>
<p></p>
<p>########################################################################
#  check_Links - checks links for a given directory.                   #
#                                                                      #
#  Notes:                                                              #
#      - Links on the same host will be followed unless they point to  #
#        the root.  While this will prevent most recursion, there may  #
#        be some gophers with odd setups that lead to infinite loops.  #
#      - FTP links are not followed.                                   #
#  Entry:                                                              #
#        host = hostname                                               #
#        port = port number                                            #
#        path = selector string                                        #
#  Exit:                                                               #
#        New links are appended to @URLs.                              #
########################################################################
sub check_Links {
	local($host, $port, $path) = @_;
	local($margin) = $Indent . $margin;
	local($stat);
	local(@Items);
</p>
<p></p>
<p>	# Establish connection and read contents.
	$DEBUG &amp;&amp; print &quot;DEBUG: connecting to $host at port $port.\n&quot;;
	($GOPHER) = &amp;tcpconnect($host, $thishost);
	if ($@ &amp;&amp; $@ =~ /Timed Out/) {
		die &quot;$@&quot;;
	}
	($GOPHER) || die &quot;Can&#x27;t connect&quot;;
	$DEBUG &amp;&amp; print &quot;DEBUG: sending path \&quot;$path\&quot;.\n&quot;;
	eval {
		alarm($timeout);
		send($GOPHER, &quot;$path\r\n&quot;, 0);
		@Items = &lt;$GOPHER&gt;;
		close($GOPHER);
		alarm(0);
	};
	if ($@ &amp;&amp; $@ =~ /Timed Out/) {
		die &quot;$@&quot;;
	}
</p>
<p></p>
<p>	# Check each item, recursing into directories as necessary.
	foreach (@Items) {
		local($atype, $aname, $apath, $ahost, $aport, $aextra);
</p>
<p>		s/\s*$//;		# remove \r\n combo
		last if (/^\.$/);	# done if line is just a period
</p>
<p></p>
<p>		# Check status of each unique URL.
		$url = &amp;make_URL($_);
		s/^(.)// &amp;&amp; ($atype = $1);
		($aname, $apath, $ahost, $aport, $aextra) = split(/\t/, $_);
		chop($ahost) if ($ahost =~ /\.$/);
		if (defined($URLs{$url})) {	# already checked
			print &quot;$margin$aname...$URLs{$url}.\n&quot;;
		}
		else {
			$stat = ($URLs{$url} = &amp;test_URL($url, $aextra));
			print &quot;$margin$aname...$stat.\n&quot;;
		}
</p>
<p></p>
<p>		# Recurse as necessary.
		if ($stat eq &quot;ok&quot; &amp;&amp;
				$atype == 1 &amp;&amp;
				$ahost eq $inithost &amp;&amp;
				$aport eq $initport &amp;&amp;
				$apath ne &quot;&quot; &amp;&amp;
				&amp;is_Excluded($apath) == 0 &amp;&amp;
				$apath !~ /ftp.*:/) {
			&amp;check_Links($ahost, $aport, $apath);
		}
	}
}
</p>
<p></p>
<p>################################################
#  make_URL - constructs a URL from a string.  #
#                                              #
#  Notes:                                      #
#      - The URLs generated here are not 100%  #
#        kosher, only used internally.         #
#                                              #
#  Entry:                                      #
#        string as passed by gopher server.    #
#  Exit:                                       #
#        string representing URL.              #
################################################
sub make_URL {
	local($_) = @_;
	local($url);
	local($type, $name, $path, $host, $port);
</p>
<p></p>
<p>	s/^(.)// &amp;&amp; ($type = $1);
	($name, $path, $host, $port) = split(/\t/, $_);
	chop($host) if ($host =~ /\.$/);
	if ($type =~ /[01245679sgMhIi]/) {
		$url = &quot;gopher://$host:$port/$type$path&quot;;
	}
	elsif ($type =~ /[8T]/) {
		$url = &quot;telnet://&quot;;
		$path !~ /^$/ &amp;&amp; $url .= &quot;$path@&quot;;
		$url .= $host;
		$port &gt; 0 &amp;&amp; $url .= &quot;:$port&quot;;
		$url .= &quot;/&quot;;
	}
	return($url);
}
</p>
<p></p>
<p>###########################################################################
#  test_URL - check that a URL is accessible.                             #
#                                                                         #
#  Notes:                                                                 #
#      - I don&#x27;t have a good way to check gopher0 servers.  Currently, I  #
#        look for the string &quot;error.host&quot;, which servers like gn seem to  #
#        generate.  However, this fails with KA9Q, for which an error     #
#        message is indistinguishable from regular text.                  #
#      - For gopher+, a error code indicating a server is too busy is     #
#        treated as an error.  This may not be the right thing to do.     #
#      - If the server understands gopher+, we&#x27;ll only ask for info (!)   #
#        so as not to retrieve large files.  This approach also seems to  #
#        be the only way to check ASK blocks reliably.                    #
#      - CSO nameservers (type 2) are checked with an invalid command -   #
#        this returns a warning message from the server that is not       #
#        regarded as an error by go4check. Using the command &quot;fields&quot;     #
#        does *not* work since this typically results in lines starting   #
#        with -2, which look like errors.                                 #
#      - Checks of telnet links only see if host is up; no attempt        #
#        is made to login to whatever account may be specified.           #
#      - Checks of FTP links could be improved.  Currently, the info      #
#        returned is not examined beyond looking for the usual signs      #
#        of failure.                                                      #
#  Entry:                                                                 #
#        URL = URL to test                                                #
#        GPLUS = extra character indicating a gopher+ item.               #
#  Exit:                                                                  #
#        Text string indicating status of URL:                            #
#           &quot;ok&quot; = everything ok                                          #
#           &quot;can&#x27;t connect&quot; = can&#x27;t connect to host                       #
#           &quot;path changed&quot; = path changed                                 #
#           &quot;n/a&quot; = unknown status                                        #
###########################################################################
sub test_URL {
	local($_, $gplus) = @_;
	local($protocol, $logonid, $host, $port, $type, $path);
	local($1, $2, $3, $4, $5);
</p>
<p></p>
<p>	$DEBUG &amp;&amp; print &quot;DEBUG: checking $_.\n&quot;;
	m#^(\w+)://(.*):(\d+)/?(.?)(.*)#;
	$protocol = $1;
	$host = $2;
	$port = $3;
	$type = $4;
	$path = $5;
	if ($host =~ /@/) {
		($logonid, $host) = split(/@/, $host);
	}
	$DEBUG &amp;&amp; print &quot;protocol=$protocol; logonid=$logonid; host=$host; port=$port; type=$type; path=$path.\n&quot;;
</p>
<p></p>
<p>	# Check gopher links.
	if ($protocol eq &quot;gopher&quot;) {
		local($GOPHER);
		local($Stuff);
</p>
<p>		$DEBUG &amp;&amp; print &quot;DEBUG: checking gopher at $host;$port.\n&quot;;
		($GOPHER) = &amp;tcpconnect($host, $thishost);
		if ($@ &amp;&amp; $@ =~ /Timed Out/) {
			return &quot;timed out&quot;;
		}
		($GOPHER) || return &quot;can&#x27;t connect&quot;;
		$path .= &quot;\t!&quot; if ($gplus);	# Modify selector to get only info
		if ($type eq &quot;2&quot;) {
			$path = $default_path2 if ($path =~ /^$/);
		}
		elsif ($type eq &quot;7&quot;) {
                # Modification Oct. 19/94 by R.D. Cameron to append
                # handle the nonempty $path case:  to test in this
                # case, we send a tab and the search term after the
                # $path.
			if ($path =~ /^$/) {
				$path = $default_search_term;
			}
			else {
				$path = &quot;$path\t$default_search_term&quot;;
			}
			$path =~ s#^waissrc:(.*)/.*$#1$1#;
		}
		$DEBUG &amp;&amp; print &quot;DEBUG: sending path \&quot;$path\&quot;.\n&quot;;
		eval {
			alarm($timeout);
			send($GOPHER, &quot;$path\r\n&quot;, 0);
			$Stuff = &lt;$GOPHER&gt;;
			close($GOPHER);
			alarm(0);
		};
		if ($@ &amp;&amp; $@ =~ /Timed Out/) {
			return &quot;timed out&quot;;
		}
		$DEBUG &amp;&amp; print &quot;DEBUG: read \&quot;$Stuff\&quot;.\n&quot;;
</p>
<p></p>
<p>		# Test line for signs of errors.
		#
                # Modification Oct. 19/94 by R.D. Cameron to
                # check for type 3 error returns when a directory
                # listing is expected.  (According to the gopher
		# protocol, &quot;3&quot; as the first character of a directory
		# entry always indicates error.
		if ((($type eq &quot;1&quot;) | ($type eq &quot;7&quot;)) &amp;
		    ($Stuff =~ /^3/)) {
                        return(&quot;path changed&quot;);
		}
		# Test line for other signs of errors.
		elsif ($Stuff =~ /(^\-\-\d)|(\terror.host\t\d+)/) {
			return(&quot;path changed&quot;);
		}
		else {
			return(&quot;ok&quot;);
		}
	}
</p>
<p></p>
<p>	# Check telnet links.
	if ($protocol eq &quot;telnet&quot;) {
		local($TELNET);
</p>
<p>		$DEBUG &amp;&amp; print &quot;DEBUG: checking telnet at $host;$port.\n&quot;;
		($TELNET) = &amp;tcpconnect($host, $thishost);
		if ($@ &amp;&amp; $@ =~ /Timed Out/) {
			return &quot;timed out&quot;;
		}
		($TELNET) || return &quot;host down&quot;;
		return &quot;ok&quot;;
		close($TELNET);
	}
</p>
<p></p>
<p>	# If we get here, we don&#x27;t know how to test the link.
	return(&quot;n/a&quot;);
}
</p>
<p></p>
<p></p>
<p>#######################################################
#  is_Excluded - checks if a path is to be excluded.  #
#                                                     #
#  Entry:                                             #
#        path to be tested.                           #
#  Exit:                                              #
#        0/1 indicating no/yes.                       #
#######################################################
sub is_Excluded {
	local($path) = @_;
</p>
<p></p>
<p>	for (@excluded_paths) {
		if (index($path, $_) &gt;= $[) {
			return(1);
		}
	}
	return(0);
}
</p>
<p></p>
<p>################################################################
#  This comes from gopherhunt by Paul Lindner.                 #
#                                                              #
#  I&#x27;ve added a line to abort if it can&#x27;t resolve an address.  #
#  and return 0 if failure rather than die. GAT                #
#                                                              #
#  I also added an alarm to handle time-out conditions. GAT    #
################################################################
sub tcpconnect {                    #Get TCP info in place
   local($host, $hostname) = @_;
   local($name, $aliases, $type, $len);
   local($thisaddr, $thataddr, $this, $that);
   local($sockaddr);
   $sockaddr = &#x27;S n a4 x8&#x27;;
</p>
<p>   ($name,$aliases,$proto) = getprotobyname(&#x27;tcp&#x27;);
   ($name,$aliases,$port) = getservbyname($port, &#x27;tcp&#x27;)
        unless $port =~ /^\d+$/;
   ($name,$aliases,$type,$len,$thisaddr) = gethostbyname($hostname);
   ($name,$aliases,$type,$len,$thataddr) = gethostbyname($host);
   $name || return(0);
</p>
<p>   $this = pack($sockaddr, &amp;AF_INET, 0, $thisaddr);
   $that = pack($sockaddr, &amp;AF_INET, $port, $thataddr);
</p>
<p>   sleep($snooze_length);
</p>
<p>   eval {
	alarm($timeout);
	socket(N, &amp;PF_INET, &amp;SOCK_STREAM, $proto) || return(0);
	bind(N, $this)                            || return(0);
	connect(N, $that)                         || return(0);
	alarm(0);
   };
</p>
<p>   return(N);
}
</p>
<p></p>
<p>#####################################################
#  handle_Timeout - Die with a specific message.    #
#                                                   #
#  Notes:                                           #
#        - Calls to alarm() should be in an eval    #
#          block.                                   #
#                                                   #
#  Entry:                                           #
#        n/a                                        #
#  Exit:                                            #
#        Message &quot;Timed Out&quot; is returned.           #
#####################################################
sub handle_Timeout {
	$DEBUG &amp;&amp; print &quot;DEBUG: Timed Out.\n&quot;;
	die &quot;Timed Out&quot;;
}
</p>
</card>
</wml>
