Reading Machines

What if there were a machine that could read vast amounts of text, “understand” it, and then answer questions using the knowledge gained from reading? How might that affect libraries, library and information professionals, and indeed New Zealand?

Of course, such a machine already exists: it is called Watson. Created by IBM, it won the quiz show Jeopardy! in 2011, beating champion human contestants. It is now being used to create hundreds of applications around the world, notably in the health and legal industries.

IBM Watson is not the only game in town. Google in particular has its own machine learning program, Google DeepMind, which uses a different method from Watson's. There are many others. These are machines capable of reading a million books a second, never forgetting what they have learned, and continuing to learn and improve, ad infinitum.

I don’t think I need to spell out the implications, not only for libraries but for the World Wide Web itself. The basic point is that access to a question-answering and suggestion machine that outperforms any human expert, and so leads to better decision-making in every domain, has mind-boggling implications for the economy and society.

Library Involvement

But there is another aspect to this that needs to be emphasised, with libraries in mind: the machine reads text (books, articles, and reports). That is its input. Despite the oft-repeated mantra that the Internet puts all the world’s knowledge at your fingertips, librarians and information professionals know this is nonsense: most “knowledge” is not freely available on the Internet. It is actually stored in libraries.

So to feed this monster we need comprehensive collections of texts on all subjects, in both breadth and depth. This is, of course, crucial to its effectiveness.

Library collections, then, are indispensable. Further, the systems developed to securely store, record and catalogue the documents are crucial to prioritising and managing the feeding of the monster. The starting point is clearly the indexes and catalogues of books, articles, and reports so laboriously created by librarians over decades. I would submit that the best source for these is the New Zealand Index on The Knowledge Basket, the only truly multidisciplinary local database in New Zealand, which aggregates indexes and catalogues from most of the major libraries here, amounting to about 4 million records.


Now that IBM has made a cloud-based Watson ecosystem available, costs have come down dramatically (from millions of dollars) to the point where it is feasible to start a project.

This brings into focus what is required to build a system that could answer questions and make suggestions better than any human:

* Identify the inputs: already largely achieved through the work of librarians and scientists in indexing the texts, now gathered in the New Zealand Index.
* Collect the texts into a storage facility where a machine can read them.
* Create Watson applications through a collaborative effort between libraries and others (perhaps universities?).

In my opinion it is crucial for this development to be in the public domain from the outset. Its true potential will not be realised if it is locked away.


The library “industry” in New Zealand and elsewhere needs to recognise that, in a digital world, there must be a single collaborative organisation of libraries able to undertake these initiatives collectively, so that it has the resources to do so.

There are some problems to overcome. One is that much of the text is not yet digitised, and this is the challenge for libraries in New Zealand. Although we are close to having machines that can read paper texts (see the KNFB Reader), they are not yet accurate enough, so some human intervention is still necessary. But a digitisation effort is the lynchpin of any successful deployment of an AI, whether or not it is Watson-based: it is the prerequisite for any project that involves a reading machine, and will therefore provide the input for all future developments of this technology.

Are there copyright issues to overcome? Here we have a machine reading a book or article, but not storing or copying it in its original form. Is this different from a human reading an article, remembering its content, and then producing something new that incorporates what he or she has learned from that text and many others, whether a new text or the answer to a question?

On the other hand, the Watson ecosystem allows payments to content providers based on actual usage of a resource, in much the same way that The Knowledge Basket pays royalties to publishers. A crucial difference between an intelligent reading-and-answering machine and our current system, epitomised by The Knowledge Basket and other search aggregators, is that the relative value of the data will become clear. That is, retrieval's bias towards high-volume, low-value sources will be replaced by a bias towards low-volume, high-value ones. This should improve the financial viability of scholarly journals, for example, since Watson is more likely to treat them as trusted sources when producing its outputs.


Twenty-five years ago we launched The Knowledge Basket. If I were starting today, I would not create a Knowledge Basket; I would create a Watson.

In conclusion, while the World Wide Web and the Internet have disrupted, and continue to disrupt, the library profession, the next stage of innovation will disrupt a large part of the World Wide Web (namely searching) and, paradoxically, make libraries indispensable to its further development. Libraries need to rise to the challenge.
