Total Recall - C. Gordon Bell [15]
In this earliest stab at organizing my scanned data, I split my e-memory into two top-level divisions: items related to current events in my life, and an archive of older, inactive information. Under those two main folders, I had dozens of subfolders for categories including books, medical records, the Computer History Museum (which I’d helped start), trips, underwater photos, food, and so on. Under “Animals” you could find a picture of alligators, various images of San Francisco’s wild parrots, and an astonishing set of images showing a snake swallowing a kangaroo whole. I had my “Eagles” folder, of all things eagle-related, and a “Fun” folder, which included a picture of the adult me swinging from a rope.
To help find things again easily, I gave each item a long, detailed file name. For example, the file name of a technical article would include the title, where it was published, the date, keywords, and other pertinent details.
But even with all my documents and pictures stashed away in a well-thought-out classification hierarchy of file folders, it was hard to find particular items quickly, if at all, because doing so required remembering where each one had been put. It was just like a library organized by subject without a card catalog. Poring through multiple folders for the right name or thumbnail icon took too much time. Without better labels, even my photos were not much use. When I looked at some, I couldn’t recall what they were about. It was painfully clear that the problem would get far, far worse once I started adding hundreds of daily pictures and hours of daily audio to the jumble.
My friend and boss, Jim Gray, teased me about it. When you burn data onto most compact discs, the operation is permanent, and this is known as “Write Once, Read Many” or WORM. Jim mocked me as the inventor of WORN: “Write Once, Read Never.”
“It’s all just a bunch of bits unless it’s annotated,” I grumbled.
I began to realize the magnitude of what was lacking. This was not a project to store my life bits; it was about how to get them back!
Scanned documents are image files, not text files, and as such, they’re invisible to keyword searches. But with thousands upon thousands of documents in my e-memory, keyword searching would be the only way to re-locate an old file that I could only recollect one or two fragments of, such as a name, a dollar amount, or a dateline. So I ran all the scanned documents through optical character recognition (OCR) software, which is able to recognize written letters and numbers in an image and reconstruct them in a text file. What I ended up with were thousands upon thousands of text files that were neatly interleaved among the scanned files.
Now I just needed desktop search software, that is, software that would allow me to search through my thousands of files for some desired text, just like you search for Web pages now using Yahoo or Google. But at this time operating systems were still several years away from offering desktop search. Desktop search was in its infancy, and every such product I tried was pretty “bleeding edge.”
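The kind of search described here can be illustrated with a minimal sketch. It is not the software the author eventually used. It assumes a hypothetical layout in which every scanned image (say, `invoice.tif`) sits beside an OCR text file with the same base name (`invoice.txt`). The function name `search_ocr_text` and the `.tif` extension are assumptions made for illustration only.

```python
from pathlib import Path


def search_ocr_text(root, keyword):
    """Return the scans whose OCR text contains `keyword` (case-insensitive).

    Assumed layout (hypothetical): a scan "x.tif" has its OCR output in
    "x.txt" in the same folder. If no matching scan exists, the text file
    itself is returned instead.
    """
    needle = keyword.lower()
    hits = []
    # Walk every folder under root, in a stable (sorted) order.
    for txt in sorted(Path(root).rglob("*.txt")):
        text = txt.read_text(encoding="utf-8", errors="ignore")
        if needle in text.lower():
            scan = txt.with_suffix(".tif")
            hits.append(scan if scan.exists() else txt)
    return hits
```

A call such as `search_ocr_text("e-memory", "Acme")` would scan every text file on each query. That is workable for a small archive, but real desktop search tools build an index ahead of time so that a query does not have to reread thousands of files.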
I tried to get Microsoft to take the lead in desktop search, starting with the acquisition of a leading start-up, but was unable to convince the right people. I would have to wait for others to revolutionize search technology. In the meantime, if I wanted to continue my little lifelogging experiment, I would have to cobble together my own solution. In October 2001, Jim Gemmell and Roger Lueder, who had been working with me on other projects at Microsoft, decided that this would make a great research project for them to get involved in. We started out like we do with any new research project, by combing through the published literature to see what others had learned.
I dug up an old paper that I recalled as being relevant, and was surprised at just how relevant it was. In fact, it specified a system almost made to order for us. That’s pretty amazing, when you consider that it had been written more than fifty years earlier.
MEMEX
In 1945, when electronic computers were