As any experienced family history researcher will tell you, the National Library of Scotland provides access to several newspaper eResources. If you’ve ever tried researching an ancestor, or the details of a particular historical place or event, you’ll know that archives such as The Scotsman Digital Archive and British Newspaper Archive feature 18th and 19th century newspapers with articles of Scottish interest. These can be a unique source of information not found anywhere else online.
A certain technical wizardry goes into transforming the newspaper in your hand into the digital text that can be searched online. As with any digitisation process involving the presentation of words, a machine process called Optical Character Recognition (OCR) reads the scanned newspaper image and transforms it into the text used to retrieve search results. Newspapers present unique difficulties for OCR because of the poor print quality of some tiny, densely printed 19th century text, and the variety of historic fonts that OCR software handles less well. Correcting the OCR results means involving people to enhance digital content where the software has gone awry. The National Library of Australia, for example, famously used crowd-sourcing to let online communities correct the text of its Trove newspaper collections.
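To give a sense of how OCR accuracy can be measured before and after correction, here is a minimal Python sketch of a character error rate: the number of single-character edits needed to turn the OCR output into the corrected text, divided by the length of the corrected text. The function names and the sample strings are illustrative assumptions, not taken from any library's tooling.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text: str, corrected_text: str) -> float:
    """Edit distance as a fraction of the corrected text's length."""
    if not corrected_text:
        return 0.0 if not ocr_text else 1.0
    return edit_distance(ocr_text, corrected_text) / len(corrected_text)


# Hypothetical example: "h" misread as "b", and "m" misread as "rn".
ocr_line = "Tbe Scotsrnan"
corrected_line = "The Scotsman"
print(character_error_rate(ocr_line, corrected_line))  # 0.25
```

A lower figure means the OCR text is closer to what a human corrector would produce, so the same measure can show how much a round of corrections has helped.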
Trickier still is the arrangement of text in columns. Historical newspapers have complex structures, and layouts vary from one title to another. Newspaper articles can span several pages within a single issue, which disrupts the sequential reading order. Articles vary in length and often contain irregular content in the middle, such as sub-headings, illustrations or adverts. Improving the usability of digital newspaper collections so that search results are displayed within their articles requires a far more advanced level of document analysis than usual.
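The reading-order problem can be illustrated with a deliberately simplified Python sketch. It assumes each block of text already has a known position on the page and that all columns share one fixed width. Both are assumptions made for illustration; real newspaper pages need far more sophisticated layout analysis. The `TextBlock` class and `reading_order` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    text: str
    x: int  # left edge of the block, in pixels
    y: int  # top edge of the block, in pixels


def reading_order(blocks: list[TextBlock], column_width: int) -> list[TextBlock]:
    """Order blocks column by column (left to right), then top to
    bottom within each column. Assumes equal-width columns."""
    return sorted(blocks, key=lambda b: (b.x // column_width, b.y))


# Hypothetical two-column page, listed in top-to-bottom scan order.
page = [
    TextBlock("Column 1, headline", x=10, y=20),
    TextBlock("Column 2, top", x=410, y=25),
    TextBlock("Column 1, paragraph", x=12, y=200),
    TextBlock("Column 2, bottom", x=405, y=300),
]

for block in reading_order(page, column_width=400):
    print(block.text)
```

Reading straight down the page would jump from the first column's headline to the top of the second column. Grouping by column first keeps each column's text together, although it still cannot follow an article that continues on another page.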
In an attempt to understand how people visiting our website interact with our newspaper eResources, the National Library of Scotland is getting the scoop from leading software developers about tools to improve search accuracy. We have found examples of current innovation making the headlines by following the NewsEye and Impresso projects, and by talking to our colleagues at international libraries. An example of how articles have been extracted from the front page of the St Ronan’s Standard and Effective Advertiser, a local newspaper with some issues held in the Library’s collections, is shown here. Our findings will be used to consider how we might, in future, improve access to Scotland’s newspapers and deliver the news to anyone interested in the nation’s cultural history.
We are listening to your views about the accessibility and use of online Scottish newspapers through our online survey of digital newspaper collections. The survey is open for a few more days, until Friday 16th August. We hope to report in the coming months on progress towards making Scotland’s newspaper collections more easily accessible.