Hack to the future - investigate Yesterday’s news
Cultural heritage institutions hold not only physical documents, such as books, but also a goldmine of data whose full potential is waiting to be unlocked.
Since 2002, the National Library of Luxembourg (BnL) has been digitising a wide variety of historical documents such as newspapers, monographs, manuscripts, postcards and even posters. To make its collections accessible to the widest possible audience, and to provide tools for education and research, the BnL has published part of its digitised newspapers and metadata as open datasets. All available data, APIs and tools can be found at:
The digitised newspapers have very precise metadata (XML format) and contain the full text, article-segmentation information (tagged articles, images, tables, advertisements, …) and the original scanned high-resolution images (300 ppi TIFF). Every block of information, from individual words to complete paragraphs or articles, has page coordinates. All digitised materials are also available for viewing on a-z.lu and eluxemburgensia.lu.
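To give a feel for working with word-level page coordinates, here is a minimal Python sketch. It assumes an ALTO-like layout in which each word is a `String` element carrying `CONTENT`, `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes. The description above does not specify the schema, so the element names, attribute names and sample snippet are assumptions for illustration only; check them against the actual dataset documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical ALTO-like snippet. The real files may use different
# element names, attributes and XML namespaces.
SAMPLE = """
<Page>
  <TextBlock ID="B1">
    <TextLine>
      <String CONTENT="Luxembourg" HPOS="120" VPOS="340" WIDTH="210" HEIGHT="32"/>
      <String CONTENT="1902" HPOS="340" VPOS="340" WIDTH="80" HEIGHT="32"/>
    </TextLine>
  </TextBlock>
</Page>
"""


def extract_words(xml_text):
    """Return a list of (word, x, y, width, height) tuples."""
    root = ET.fromstring(xml_text.strip())
    words = []
    # iter() walks the whole tree, so nesting depth does not matter.
    for node in root.iter("String"):
        words.append((
            node.get("CONTENT"),
            int(node.get("HPOS")),
            int(node.get("VPOS")),
            int(node.get("WIDTH")),
            int(node.get("HEIGHT")),
        ))
    return words


words = extract_words(SAMPLE)
for word, x, y, w, h in words:
    print(f"{word!r} at x={x}, y={y}, size={w}x{h}")
```

If the real files declare an XML namespace, the tag passed to `iter()` would need the namespace prefix in `{uri}String` form.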
The focus of this challenge is to take the raw data and build new applications or tools for exploring, understanding, analysing or enriching the historical news. Participants are encouraged to create novel interfaces and/or make use of machine learning or artificial intelligence techniques.
Some ideas related to digitised newspaper data might include:
- Visually explore hundreds or thousands of digitised pages in a user-friendly way.
- Find and recommend similar articles, advertisements or illustrations.
- Enrich the full text using Named Entity Recognition (NER) or similar techniques.
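As one possible starting point for the "similar articles" idea, the sketch below ranks texts by TF-IDF cosine similarity using only the Python standard library. The sample articles, function names and the particular smoothed IDF formula are illustrative assumptions, not part of the BnL data or tooling; a real project would probably add language-aware tokenisation and OCR-noise handling.

```python
import math
import re
from collections import Counter

# Made-up article texts, used only to demonstrate the technique.
ARTICLES = [
    "The harvest festival opens in the city park on Sunday",
    "Train timetable changes on the northern railway line",
    "Sunday harvest festival draws crowds to the city park",
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so terms found in every document keep a small weight.
        vectors.append({
            t: count * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(docs, query_index):
    """Return the index of the document closest to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[query_index], v), i)
        for i, v in enumerate(vectors)
        if i != query_index
    ]
    return max(scores)[1]


print(most_similar(ARTICLES, 0))  # prints 2
```

Recomputing all vectors on every query is fine for a demo. For thousands of pages, the vectors would be computed once and stored in an index.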
While competing in this challenge, it is important to keep the following points in mind:
- Ask yourself how you can help researchers or historians answer precise historical questions, and how the general public can benefit as well.
- Feel free to prepare for the challenge by downloading and parsing the data or training your machine learning models in advance.