Apur-ee.273
net.periphs
utcsrgv!utzoo!decvax!pur-ee!ghg
Sat Mar 20 14:53:58 1982
More SI experiences
	We have about 15 SI 9400 controllers around here, spread
over 3 or 4 PDP-11's and 6 or 7 VAXen.  Drives are a mix of CDC 9766's
(with Dysan packs which are error free at the bit level) and Fuji
160's. There are approx 10-12 9766's and 15-16 Fuji's total.  The
Fuji's are "on loan" from SI until the Eagle (Fuji 470 MB) drives
are ready.  Around 3 years ago, we helped SI debug the Unibus &
11/70 cache bus 9400's and they worked pretty well after that.  We were
running CDC9766's 2 years before DEC announced the RM05 (a 9766).
We pretty much had a controller on each drive, and there was no need
for seeks and "autoconf" code at that time since DEC had nothing
similar in the offing.  In general we have been very pleased with
the 9766 drives (with Dysan packs) and got 2-3 failures/year, mostly
power supply and cable problems.  The 3 or 4 DEC drives (RP04) have
had nothing but problems since day one.

	With the arrival of VAXen and the SBI version of the SI
controllers, we started to see more quality control problems
in some of the controllers (chips not in sockets, etc), although
the SBI emulator has been mostly free of bugs. One of the main
troubles with the SBI adaptor is that the IC pins are not clipped
after the boards are DIP soldered and tend to short to the next board.
The 9400 "mother" board (the very large PC board) is supported only
by its connectors and tends to fall out.  About 1-2 years ago, 9400's
had cheap sleeve bearing fans which croaked in 6 months or less,
cooking the controllers, and a cheap capacitor (MEPCO/ELECTRA) in
the power supply (mfg by ACDC) which failed, causing "spikes" and
dips on the +5V that did "random" resets on the controller.  Unless the
software looks at the byte/address counters at the end of each
xfer, the driver will usually see a normal end of transfer
(the reset causes an interrupt which the dev driver thinks is
the xfer finishing). What you really get is a buffer of trash
with no deverror being reported.  Long ago, on general paranoia,
I changed all disk drivers here to HALT ON UNRECOVERED NON-RAW
ERRORS.  This has saved many a filesystem by stopping dead
before the damage spreads.  It is OK to let raw I/O errors go
since they are usually swap (which is well checked) or disk testing.
SI has since fixed the bad fans and switched vendors on their
power supplies.
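
The counter check described above might look roughly like the
following C sketch.  Everything in it is hypothetical: the struct
layouts, the field names, and the functions xfer_complete_ok() and
xfer_judge() are made up for illustration and are not the real 9400
or MBA registers.  It assumes the byte counter counts down to zero
and the address register advances by the number of bytes moved.  The
halt-on-unrecovered-non-raw-error policy is shown as a plain decision
function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the counters read back at interrupt time. */
struct xfer_regs {
    uint32_t bytes_left;   /* residual byte count; 0 when fully done */
    uint32_t bus_addr;     /* next memory address the controller would use */
};

/* What the driver asked for when it started the transfer. */
struct xfer_req {
    uint32_t start_addr;
    uint32_t nbytes;
    int      is_raw;       /* nonzero for raw I/O */
};

enum xfer_verdict { XFER_OK, XFER_RETRY, XFER_FAIL, XFER_HALT };

/*
 * Return nonzero only if the counters agree that the whole request
 * was moved.  A controller reset that raises a stray interrupt leaves
 * the counters short (or cleared), so it fails this test.
 */
int xfer_complete_ok(const struct xfer_regs *r, const struct xfer_req *q)
{
    assert(r != NULL && q != NULL);
    if (r->bytes_left != 0)
        return 0;
    return r->bus_addr == q->start_addr + q->nbytes;
}

/* Decide what to do once the "transfer done" interrupt arrives. */
enum xfer_verdict xfer_judge(const struct xfer_regs *r,
                             const struct xfer_req *q,
                             int retries_left)
{
    if (xfer_complete_ok(r, q))
        return XFER_OK;
    if (retries_left > 0)
        return XFER_RETRY;
    /* Unrecovered: raw I/O just reports the error, anything else stops. */
    return q->is_raw ? XFER_FAIL : XFER_HALT;
}
```

A driver's interrupt routine would call xfer_judge() before marking
the buffer done, and halt on XFER_HALT instead of handing back a
buffer of trash.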

	Also, when unusual errors arise, the SBI (MBA) tends to hang up
with data xfer busy (0x80000000 bit in 0x20010008 set). Another
oddity is that the RPDS ready bit sets slightly before the MBA data
xfer busy bit clears (must be due to internal buffering). This messes
up standalone boots, formatters, etc, which spin on the RPDS ready bit.
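
One way for a standalone program to cope with the early ready bit is
to wait for both conditions, with a bound on the spin so that a hung
MBA does not hang the boot as well.  A minimal C sketch follows.  The
0x80000000 busy bit is the one quoted above; the drive ready mask
DS_DRY (0200) is assumed from the usual RP/RM style drive status
layout, and the disk_regs struct, its field names, and
wait_xfer_done() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MBA_DTBUSY 0x80000000u  /* MBA data xfer busy (bit quoted above) */
#define DS_DRY     0000200u     /* drive ready; assumed RP/RM-style mask */

/* Hypothetical view of the two registers; not a real register layout. */
struct disk_regs {
    volatile uint32_t mba_sr;   /* register holding the busy bit */
    volatile uint32_t rpds;     /* drive status register */
};

/*
 * Spin until the drive is ready AND the MBA is no longer busy.
 * Returns 0 on success, -1 if max_spins polls pass first (for
 * example when the MBA has hung with busy set).
 */
int wait_xfer_done(struct disk_regs *regs, unsigned long max_spins)
{
    assert(regs != NULL);
    while (max_spins-- > 0) {
        if ((regs->rpds & DS_DRY) != 0 &&
            (regs->mba_sr & MBA_DTBUSY) == 0)
            return 0;
    }
    return -1;
}
```

On a -1 return the caller can report the hang rather than spin
forever; a sensible value for max_spins depends on the CPU and is
left open here.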

	As many of you have also noted, only one copy of the RPDT
register is maintained (for drive 0) and copied for other units.
This wreaks havoc for the autoconf program. I have hacked the driver
to attempt seeks to nonexistent heads and check for errors to
tell the difference between Fuji's and 9766's on the same controller,
only to find that a seek to a nonexistent head on a unit other
than 0 causes the controller to hang busy while it scribbles
into the registers for drive 0. SI says they are making the
RPDT register slightly user programmable (via DIP SW's) so one
can get a couple of unique (non DEC) drive types.  We run everything
in "direct" mode (one large disk, not several "rm03"'s, etc per drive).

	The seek code seems a little flaky. There is no
look-ahead register maintained... so SEARCH commands just
turn into seeks. We have also noticed bad things happening on
Fuji's when an overlapped seek is in progress and an error
occurs on a data xfer to another drive.  The controller seems
to get lost and trash registers, etc... I haven't pinned this
down well yet and it is under investigation now. The multiple
CDC9766 drive systems DON'T GET ERRORS (thanks to Dysan) and
have not had any seek problems.  Being winchesters, the Fuji's
have 4 or 5 badspots each (mostly 1 and 2 bit errors). The 9400
controllers CORRECT IN THE HARDWARE up to an 11 bit burst error.
As most of you know, until recently, nobody's driver seemed to
handle correctable ECC errors correctly on DEC drives.
All but two of our Fuji drives are error free in practice due to the
error correction hardware on the 9400. The two drives with errors
each have one badspot (about 15 bits).

	The 4.1BSD bad block forwarding does not work with
currently delivered SI controllers. If you request it, they
may give you ECO's which, when combined with a couple of driver
hacks, will make it work, I think.  The problem is that the
device registers point to the wrong thing when a xfer is
stopped on a bad sector in a multisector transfer.

	We do our own installations/maintenance and have avoided
most of the field service problems. I have seen an SI CE put
an alignment pack with bent platters in our drive and try to 
spin it up until I stopped him. This same CE was at another site
when they ended up with 2 head crashed 9766's and 4 or 5 crashed 
packs. We have never had a 9766 crash. About 1/4 of our Fuji
160's have head crashed within the first 2-3 weeks of operation
(esp when heads didn't move for several hours).  This goes
against all that I had heard about the almost perfect reliability
record for them.

	Overall I think WE are still ahead with SI over DEC
for OUR needs.  We need SBI disks and are not turnkey users.
I don't think everybody is in this situation though.

George Goble
School of Electrical Engineering
Purdue University
W. Lafayette, IN

decvax!pur-ee!ghg  or
ucbvax!pur-ee!ghg  or

ARPAVAX.ghg@Berkeley (for ARPA)

-----------------------------------------------------------------
 gopher://quux.org/ conversion by John Goerzen <jgoerzen@complete.org>
 of http://communication.ucsd.edu/A-News/


This Usenet Oldnews Archive
article may be copied and distributed freely, provided:

1. There is no money collected for the text(s) of the articles.

2. The following notice remains appended to each copy:

The Usenet Oldnews Archive: Compilation Copyright (C) 1981, 1996 
 Bruce Jones, Henry Spencer, David Wiseman.