I wrote recently about a little machine logbook I found in a disused storage cabinet in the old science building. The machine it refers to might be one produced by Floating Point Systems, Inc., since I found a processor manual (part #860-7259-003) in the same cabinet, and some of the instructions seem similar. The manual was published in 1979 (heck, I don't think I had even seen a computer at the time) by FPS Inc. of Beaverton, Oregon, and it starts out promising:

Historically, array transform processors have been largely integer-arithmetic devices, since the slower processing rate of floating-point arithmetic was undesirable when working with large arrays of data. However, integer methods have problems which make programming awkward due to the limited dynamic range of integer arithmetic. Array scaling and block floating-point techniques either allowed human and other errors to creep into the results or were costly and time consuming. Further, as processing became more sophisticated, even 16-bit integer data words were insufficiently precise for preserving the accuracy of simple 8-bit analog-to-digital converted input data. This is because the many multiplications and additions in typical cascaded array processing can cause the propagation of truncation errors.
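
Just to make that concrete, here is a quick sketch of the effect they're describing. This is my own illustration, not anything from the manual: a 64-term dot product done once in floating point and once in a made-up Q15 fixed-point format where every product is truncated back to 16 bits.

```python
import random

def to_q15(x):
    """Quantize a value in [-1, 1) to Q15, i.e. 16-bit signed fixed point."""
    return max(-32768, min(32767, int(x * 32768)))

def q15_mul(a, b):
    """Multiply two Q15 values and truncate the product back to Q15
    by dropping the low 15 bits; this is where the error comes from."""
    return (a * b) >> 15

random.seed(1)
coeffs = [random.uniform(-1, 1) for _ in range(64)]
data   = [random.uniform(-1, 1) for _ in range(64)]

# Reference result computed in (double-precision) floating point.
ref = sum(c * d for c, d in zip(coeffs, data))

# Same dot product with every intermediate product truncated to 16 bits.
# (The accumulator is left wide here, to isolate the truncation error.)
acc = 0
for c, d in zip(coeffs, data):
    acc += q15_mul(to_q15(c), to_q15(d))
fixed = acc / 32768.0

print(f"floating point: {ref:.6f}")
print(f"truncated Q15 : {fixed:.6f}")
print(f"difference    : {abs(ref - fixed):.6f}")
```

Run it and the truncated result comes out a little off the floating-point reference, and the gap grows roughly with the number of truncated products; stretch that out to a long cascade of FFT stages and you can see why FPS wanted hardware floating point.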

You know, I think the exact same paragraph (well, ok, bump the numbers up from 16 to 64, for instance) could apply today. Paging forward, we find that the floating-point representation used in this machine is "...a true 10-bit binary exponent, which has more dynamic range than the standard 7-bit hexadecimal or 8-bit binary exponent. FPS then uses a 28-bit mantissa, plus three guard bits in the adder and a double mantissa at the multiplier output, ..." Count 'em: a 38-bit floating-point representation. Bear in mind that IEEE 754 wasn't standardized until 1985, and its single-precision format is only 32 bits.
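
For fun, here's a rough back-of-the-envelope comparison of that format against IEEE 754 single precision and the old hexadecimal floats the quote alludes to. The numbers are mine, not the manual's: I'm assuming a plain signed exponent and a normalized mantissa, and ignoring encoding details like sign bits and bias, so treat the output as order-of-magnitude only.

```python
import math

def fp_summary(name, exp_bits, mant_bits, base=2):
    """Very rough characterization of a floating-point format: largest
    representable magnitude and equivalent decimal digits of precision.
    Assumes a plain signed exponent applied to `base` and a normalized
    mant_bits-bit mantissa; real encodings differ in the details."""
    max_exp = 2 ** (exp_bits - 1) - 1          # largest positive exponent, roughly
    log10_max = max_exp * math.log10(base)     # decimal order of magnitude
    digits = mant_bits * math.log10(2)         # mantissa precision in decimal digits
    print(f"{name:45s} max ~1e{log10_max:4.0f}, ~{digits:4.1f} decimal digits")

fp_summary("FPS 38-bit (10-bit exp, 28-bit mantissa)", 10, 28)
fp_summary("IEEE 754 single (8-bit exp, 24-bit mantissa)", 8, 24)
# Hexadecimal normalization actually costs a few bits of precision; ignored here.
fp_summary("IBM-style hex float (7-bit exp, base 16)", 7, 24, base=16)
```

Roughly speaking, the 28-bit mantissa buys you about 8.4 decimal digits against the 7.2 of IEEE single precision, and the 10-bit binary exponent gives a range around 1e±154, a good part of the way to double precision's 1e±308.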

Moving on to the functional units of the processor, we find a main memory (MD) of up to 64k words (each 38 bits), plus a faster memory (167 ns cycle time) called the table memory (TM) of 512 words, also 38 bits each. The idea is that you store constants in table memory and reference them by index instead of using direct mode. There are two banks of accumulators, 32 in each bank, so one might speak of having 64 floating-point registers. There are some data-path fiddly bits here, and anyone who has read Tanenbaum's computer design books will recognize the trade-off between bus width, control-line count, and ease of (micro-)programming. The adder and floating-point multiplier are pipelined, with two and three stages respectively.
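
To get a feel for what the pipelining buys you, here's a toy model (mine, not the manual's) of pushing a 512-word block through a 3-stage multiplier, issuing one operation per cycle, versus waiting for each result before starting the next:

```python
def cycles_unpipelined(n_ops, stages):
    """Each operation must clear all stages before the next one starts."""
    return n_ops * stages

def cycles_pipelined(n_ops, stages):
    """One new operation enters the pipeline every cycle; after the last
    issue we just wait for the pipeline to drain."""
    return n_ops + stages - 1

N = 512            # say, scaling a 512-point block by table-memory constants
MUL_STAGES = 3     # the manual gives the floating-point multiplier 3 stages

print("unpipelined:", cycles_unpipelined(N, MUL_STAGES), "cycles")
print("pipelined:  ", cycles_pipelined(N, MUL_STAGES), "cycles")
```

With the pipeline kept full you get one result per cycle, so the latency of the extra stages only costs you the fill and drain at the ends of the block.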

The processor has DMA capability -- not within its own memory, but to the memory of a host machine. In other words, it hooks up to the memory bus of some other processor, copies blocks of data into its local memory, does its computations, and then moves the results back. So it's a co-processor.
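
If that sounds like how you'd use a GPU today, it should. Here's the pattern in miniature; everything in it (the names, the block size, the "computation") is invented for illustration, not the actual FPS host interface:

```python
# The co-processor pattern in miniature: copy a block from "host memory"
# into "local memory", compute there, copy the results back.
HOST_MEMORY = list(range(1_000))        # pretend this lives in the host machine
LOCAL_WORDS = 256                       # pretend local data memory is this small

def dma_in(host, offset, n):
    """Copy a block of host memory into local memory (simulated)."""
    return host[offset:offset + n]

def dma_out(host, offset, block):
    """Copy a block of local results back into host memory (simulated)."""
    host[offset:offset + len(block)] = block

def compute(block):
    """Whatever the array processor would do; a placeholder scale-by-two here."""
    return [2 * x for x in block]

for offset in range(0, len(HOST_MEMORY), LOCAL_WORDS):
    local = dma_in(HOST_MEMORY, offset, LOCAL_WORDS)
    local = compute(local)
    dma_out(HOST_MEMORY, offset, local)

print(HOST_MEMORY[:5], "...", HOST_MEMORY[-3:])
```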

And the whole thing fits in 16U of 19" rack space and draws 1200 W.

This goes to show that we (as in computer science and the computer industry) have come a long way since 1979. It also shows that there's very little truly new stuff -- the stream processors in your graphics card are similar in many ways to this old beast. Also, in the best style of wikis everywhere, the manual ends with a "help us improve this manual, send in your comments on the following postage-paid form" page. See, everything old is new again.