

Shayle Searle
1956 - 1995

I arrived at Cornell in September 1956. Professor (of Animal Breeding) C. R. Henderson was already very enthusiastic about Cornell having acquired a stored-program computer, an IBM 650. It had a capacity of 2000 10-digit (plus sign) words of storage. It was located in Phillips Hall, and the Animal Science Department was allocated access to it for limited hours each day, for preparing monthly production reports, cow by cow, for the numerous herds participating in the New York Dairy Herd Improvement plan. Computer programs written by Animal Science personnel were the basis for preparing the monthly reports. To produce these reports, thousands of pre-punched IBM cards, each showing a cow's milk production for the month, were taken from Wing (later Morrison) Hall to Phillips and processed. This produced an equal number of output cards which, with the input cards, were taken back to Wing and run through an IBM tabulator machine which printed reports that were then mailed to farmers. Henderson and others thoroughly appreciated how useful these reports were (and still are) to farmers.

But Henderson, whose interests were strongly statistical, also saw the huge opportunities for using the computer as a beneficial research tool - both for developing new methods of estimating genetic improvement in dairy cow populations and also for doing statistical calculations of large arrays of data. Thus it was that he and some of his graduate student assistants (myself included) were involved in statistically oriented research.

Computations on the IBM 650 were achieved by means of the "language" known as SOAP - symbolic optimal assembly program. It was based on three-letter mnemonics such as LDD for "load distributor", the latter being the means through which most of the various arithmetic and other operations were executed. For any desired series of calculations, a sequence of such instructions and data, punched one each on IBM 80-column cards, could achieve the desired result.

Although this sounds so primitive in light of today's computing environment, we were absolutely fascinated, not only by the technology but also by what that technology could do for us: for example, inverting a 10-by-10 matrix in 7 minutes seemed a miraculously short time, especially compared to the 3 or more hours it took by hand. My Ph.D. thesis, which involved inter-related regressions for a new method of estimating the effect of a young cow's age on the amount of milk she produced, required calculations that would have been horrendous without a computer for the 3,000 or so cows involved; but it took "only" two or three 8-hour nights on the computer which by that time (1958) the Animal Science Department had acquired, an IBM 650 the same as the one in Phillips Hall.

Of course all of the effort in using the computer at this time was based on IBM cards. Handling them demanded care, to be sure that the correct information had been entered on each cow's card. And one had to be certain the cards never got out of sequence; that not one of them got lost, not even bent. Their information entered the computer via a reading mechanism which required careful placement of the cards; if a single card caused a malfunction of the reading process the whole system ground to a stop until rectified. Then the reading process had to be re-started, making sure in doing so that no card got overlooked or read twice. It was a pain! I well remember the third of these all-nighters when around 2:00 a.m. I started to read in my last data set, which would lead to 3 to 4 hours of computing. This had all gone very smoothly for the two previous data sets, so feeling very confident I started reading the third data set, turned the lights off in the computer room, locked the door and went home to bed. I anticipated a 7:00 a.m. return to the computer room to pick up my finished results. I made the 7:00 a.m. return, but a bent card had let me down; only a few cards had been read after my 2:05 a.m. departure.

Of course, output was also just punched cards; and to see what the output was, those cards had to be run through another IBM machine, the tabulator, which printed the cards' content.

In 1959 I returned to my job as Research Statistician with the New Zealand Dairy Board, from which I'd taken leave to go to Cornell. And in 1962, I came back to Cornell, invited to be statistician in the University Computing Center located in Rand Hall. This was when the CDC 1604 computer was just coming to Cornell - a huge advance in size, speed, and technology compared to the IBM 650, which I'd cut my teeth on.

My responsibilities were two-fold: to be a consultant for students and faculty who wanted statistical analyses of data, and to decide what computer program packages should be developed and made available for doing the calculations of statistical analyses. The Computing Center had a programmer who could write the necessary programs, using Fortran. It was 1962, and there was as yet no commercially available software such as we have today. BMD from the University of California in Los Angeles (supported by NIH) was well on the way, but not available. And SAS had barely started; its first users' conference was not until 1975.

Consulting with (mostly graduate) students and faculty provided some unique situations. One student, from a smallish amount of data, had calculated a correlation coefficient. When I explained that his calculation provided little or no evidence for a real, non-zero correlation, he asked "would it perhaps be significant if there had been more data?" "Yes" was the answer, and off he went. Months later, after the consultee had left town with his Master's degree, a new student took up the research project - and discovered that the departed consultee had made a second copy of the data and calculated the correlation from the now double-sized data set! Same result, of course! And it had been announced as significant!
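In today's terms, a small Python sketch (with invented numbers, not the student's actual data) shows the point: duplicating every observation leaves the correlation coefficient unchanged, but the significance test then behaves as though twice as much evidence existed.

```python
# Sketch: duplicating data does not change r, but it shrinks the p-value,
# because the test is computed as if there were twice as many observations.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
x = rng.normal(size=12)              # a "smallish amount of data"
y = 0.4 * x + rng.normal(size=12)    # weakly related response

r1, p1 = pearsonr(x, y)                          # honest analysis
r2, p2 = pearsonr(np.tile(x, 2), np.tile(y, 2))  # every value entered twice

print(f"original:   r = {r1:.3f}, p = {p1:.3f}")
print(f"duplicated: r = {r2:.3f}, p = {p2:.3f}")  # same r, much smaller p
```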

Another student had data in which a number of values were not available, i.e., missing. As was quite customary in those days, he used a code of -1 to indicate a missing value. But then he ran his regression analysis treating the negative 1s as data!
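A minimal sketch (again with invented numbers) of what that does: fitting a line with the -1 codes included gives a quite different answer from fitting it after the coded-missing values are dropped.

```python
# Sketch: treating a -1 "missing value" code as a real measurement
# distorts a regression; the codes must be dropped before fitting.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.0, -1.0, 8.1, -1.0, 12.2])   # -1 means "missing"

slope_wrong, _ = np.polyfit(x, y, 1)              # -1s treated as data

keep = y != -1.0                                  # drop coded-missing values
slope_right, _ = np.polyfit(x[keep], y[keep], 1)

print("slope with -1s as data:", round(slope_wrong, 2))
print("slope with -1s dropped:", round(slope_right, 2))
```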

Faculty also had the occasional problem. A sociologist persuaded me to have a program written for Kendall's rank correlation. He assured me, very firmly, that he would make great use of it. I was skeptical but eventually agreed. But the program failed - because the professor had failed to tell us that within each of his 20 or so variables, many of the 200 observed values were the same. So many ties nullified even the use of ranks and made the substantial programming effort valueless.

And then there was the professor who had published an analysis (of variance) of his data which had provoked letters from his peers saying, "Your analysis just doesn't seem right." So he asked me what could be done. I asked for his data, some 300 values, and on running my eye down the column of 3-digit numbers I found two that were 5 digits, ending in 00. They were 100 times too big (due to a data-entry error), which had completely upset the analysis. Correcting them produced a sensible analysis. This, of course, was before the days of software that included data checking and editing.
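That kind of slip is exactly what a routine data screen now catches before any analysis is run; a minimal sketch (invented numbers) of such a check:

```python
# Sketch: flag values wildly out of range (e.g., keyed in 100 times too big)
# before running the analysis of variance.
import numpy as np

rng = np.random.default_rng(2)
values = rng.integers(100, 1000, size=300).astype(float)   # 3-digit data
values[[40, 210]] *= 100              # two entries mistakenly 100x too big

suspect = values > 10 * np.median(values)                  # crude range check
print("suspect positions:", np.flatnonzero(suspect))
print("suspect values:   ", values[suspect])
```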

In 1965 I was offered a line-item faculty position as Assistant Professor of Biological Statistics in the Biometrics Unit of the College of Agriculture, where I had had an honorary appointment from the day I'd come back to Cornell in 1962. A number of years after taking that faculty position I made my final contribution to Cornell computing: Annotated Computer Output (ACO). The time had come when there were several widely-used commercial software packages for calculating analyses of variance for an extensive variety of data sets, especially for what are now called unbalanced data (having unequal numbers of observations in the cells defined by the various classifications of the data). Unfortunately, among users of the software there was much misunderstanding and confusion as to how the software output had been calculated and what its meaning was. This prompted development of the ACO documents, which were based on simple, artificial data sets for which all pertinent calculations were done by hand. Those calculations and the computer output were then displayed alongside one another (computer output on the left half of a page, hand calculations on the right), along with quite extensive notes and book references explaining necessary details. Several hundred of these ACOs were sold through the Biometrics Unit at Cornell.

As a computer-oriented faculty member, my first major activity was to set up a successor to the tab shop in Warren Hall, which had for many years supported punched-card technology for co-operating units throughout the Statutory Colleges. This came about after Dean (of Agriculture) Charles Palm returned, in 1966, from a meeting with his peers at other agriculture colleges deeply concerned that agriculture at Cornell was not making good enough use of computers. He thereupon convened a committee chaired by Professor C. R. Henderson of Animal Science, of which I was a member. As a result I soon found myself as the "grunt man" for Nyle Brady, Director of Research in the College, charged with developing an organization dedicated to helping faculty improve their use of computers for both research and teaching.

A projective report of mine in late 1966 laid out responsibilities, administration and financing. (As an aside, the accompanying table shows interesting changes in the cost of computing from 1956 to 1967.) The organization for Warren Hall was initially to be called a Computer Service Group, but it soon took the title of Computer Activities Group (C.A.G.). Trying to find a director for such an organization was extremely difficult in those days, but eventually I recruited E. W. Jones from New Zealand (my home country) where he had for some twenty years managed punched card and computing facilities in the governmental Applied Mathematics Laboratory serving a wide variety of scientists.

Jones's management of CAG was supported by a committee of faculty, each member of which came from among the prime faculty users of CAG such as Biometry, Agricultural Economics, Agricultural Engineering, the Veterinary College and the College of Home Economics (to give them their 1966-67 names). This committee met once a year, and seemed to work quite satisfactorily, until after a few years it was peremptorily disbanded by a College administrator - and CAG was then soon absorbed into OCS.

I continued through to 1995 as Professor of Biological Statistics and concentrated on research and writing.

Approximate Cost of Computing at Cornell
Year   Computer            Additions per Second   Hourly Charge   System Cost   Charge for 1 Million Additions
1956   IBM 650             700                    $75             $300,000      $30.00
1959   Burroughs 220       5,000                  $150            $750,000      $8.60
1962   Control Data 1604   100,000                $240            $1,500,000    $0.66
1967   IBM 360/65          1,000,000              $350            $2,500,000    $0.10
Shayle R. Searle, 1968