1 Department of Nordic Research, Faculty of Humanities, Københavns Universitet
2 Department of Nordic Research, Faculty of Humanities, Københavns Universitet
Adapting place-name editions to place-name databases and overcoming structural variation problems
The printed place-name series Danmarks Stednavne (Place-names of Denmark) has been published since 1922, and volume 26 was released in 2013. Even so, the series covers only about two-thirds of the area of Denmark. Since 2009 a parallel effort has been made to digitise the series through scanning and human-assisted character recognition; in the process, place-name data for the rest of the country, derived from cadastral databases and a database of medieval settlement names, has been added. The resulting database, currently holding about 200,000 entries, is published at www.danmarksstednavne.dk and draws heavily on the printed edition. But the century-long effort of publishing in printed form has created a series of challenges for strict database integration, first of all variations in microstructure that make parsing the text into information categories (i.e. database fields) quite difficult. So far, no fewer than 45 different database fields have been found necessary to structure the information found in a single place-name entry, some mandatory and some optional. Because the database is relational, some fields can occur several times within one entry (e.g. multiple source forms for one name). Following the conscious decision to split the information into this many categories (i.e. fields) instead of employing a broad 'other information' field, sophisticated algorithms have been developed to identify the information category from typographical characteristics and from the sequence of information in the series. Adding to the challenge is the macrostructural variation: the areas covered in printed form have been treated with shifting principles for selecting the names to be published.
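As a rough illustration of how typography and sequence can drive field assignment, the following minimal sketch classifies the text runs of one entry. It is not the project's actual algorithm: the `Run` and `Entry` types, the three-field layout, the typographic conventions (bold headword first, italic source forms, roman text as interpretation) and the sample entry are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One stretch of text with uniform typography (hypothetical input format)."""
    text: str
    style: str  # assumed values: "bold", "italic" or "roman"


@dataclass
class Entry:
    """A heavily reduced entry: three fields instead of the 45 mentioned above."""
    headword: str = ""
    source_forms: list = field(default_factory=list)  # repeatable field (one-to-many)
    interpretation: str = ""


def parse_entry(runs):
    """Assign each run to a field from its style and its position in the sequence."""
    entry = Entry()
    for run in runs:
        text = run.text.strip()
        if not text:
            continue
        if run.style == "bold" and not entry.headword:
            # Assumption: the first bold run of an entry is the headword.
            entry.headword = text
        elif run.style == "italic":
            # Assumption: every italic run is one more source form.
            entry.source_forms.append(text)
        else:
            # Everything else is collected as interpretation text.
            entry.interpretation = (entry.interpretation + " " + text).strip()
    return entry


# Invented sample entry, not taken from the printed series.
sample_runs = [
    Run("Eksempelby", "bold"),
    Run("Exempelby", "italic"),
    Run("Eksempelbye", "italic"),
    Run("settlement name", "roman"),
]
entry = parse_entry(sample_runs)
```

In a relational layout, the repeatable `source_forms` list would typically become a separate table whose rows point back to the entry, which is what allows several source forms per name.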
Finding the right balance between letting the algorithms structure this complex digitised information and supplementing them with manual work is crucial to the successful construction of a database that allows sophisticated searches while keeping the error rate acceptably low.
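One conceivable way to divide the work between algorithm and editor is sketched below. It assumes, purely for illustration, that the parser attaches a confidence score between 0 and 1 to each entry; the function name `split_for_review`, the scores and the threshold value are hypothetical and are not described in the project.

```python
def split_for_review(scored_entries, threshold=0.9):
    """Separate automatically accepted entries from those needing manual work.

    scored_entries: iterable of (entry_id, confidence) pairs (assumed format).
    threshold: hypothetical cut-off; entries scoring below it go to an editor.
    """
    accepted, needs_review = [], []
    for entry_id, confidence in scored_entries:
        if confidence >= threshold:
            accepted.append(entry_id)
        else:
            needs_review.append(entry_id)
    return accepted, needs_review


# Invented identifiers and scores, for illustration only.
accepted, needs_review = split_for_review([("a", 0.97), ("b", 0.62), ("c", 0.90)])
```

Raising the threshold sends more entries to manual review and lowers the risk of misassigned fields; lowering it does the opposite.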
Faculty of Humanities; place-names; digitisation; digital lexicography; microstructure; macrostructure; database