A benchmark dataset of herbarium specimen images with label data

dc.contributor University of Helsinki, Botany en
dc.contributor.author Dillen, Mathias
dc.contributor.author Groom, Quentin
dc.contributor.author Chagnoux, Simon
dc.contributor.author Güntsch, Anton
dc.contributor.author Hardisty, Alex
dc.contributor.author Haston, Elspeth
dc.contributor.author Livermore, Laurence
dc.contributor.author Runnel, Veljo
dc.contributor.author Schulman, Leif
dc.contributor.author Willemse, Luc
dc.contributor.author Wu, Zhengzhe
dc.contributor.author Phillips, Sarah
dc.date.accessioned 2019-02-12T14:26:01Z
dc.date.available 2019-02-12T14:26:01Z
dc.date.issued 2019-02-08
dc.identifier.citation Dillen, M, Groom, Q, Chagnoux, S, Güntsch, A, Hardisty, A, Haston, E, Livermore, L, Runnel, V, Schulman, L, Willemse, L, Wu, Z & Phillips, S 2019, 'A benchmark dataset of herbarium specimen images with label data', Biodiversity Data Journal, vol. 7, 31817. https://doi.org/10.3897/BDJ.7.e31817 en
dc.identifier.issn 1314-2828
dc.identifier.other PURE: 122243327
dc.identifier.other PURE UUID: cdc7e7db-2881-44e0-9e2f-9d6c871ff65f
dc.identifier.other ORCID: /0000-0002-1990-2173/work/54148974
dc.identifier.other WOS: 000458727100001
dc.identifier.uri http://hdl.handle.net/10138/298937
dc.description.abstract Background: More and more herbaria are digitising their collections. Images of specimens are made available online to facilitate access to them and allow extraction of information from them. Transcription of the data written on specimens is critical for general discoverability and enables incorporation into large aggregated research datasets. Different methods, such as crowdsourcing and artificial intelligence, are being developed to optimise transcription, but herbarium specimens pose difficulties in data extraction for many reasons. New information: To provide developers of transcription methods with a means of optimisation, we have compiled a benchmark dataset of 1,800 herbarium specimen images with corresponding transcribed data. These images originate from nine different collections and include specimens that reflect the multiple potential obstacles that transcription methods may encounter, such as differences in language, text format (printed or handwritten), specimen age and nomenclatural type status. We are making these specimens available with a Creative Commons Zero licence waiver and with permanent online storage of the data. By doing this, we are minimising the obstacles to the use of these images for transcription training. This benchmark dataset of images may also be used where a defined and documented set of herbarium specimens is needed, such as for the extraction of morphological traits, handwriting recognition and colour analysis of specimens. en
dc.format.extent 15
dc.language.iso eng
dc.relation.ispartof Biodiversity Data Journal
dc.rights en
dc.subject 119 Other natural sciences en
dc.title A benchmark dataset of herbarium specimen images with label data en
dc.type Article
dc.description.version Peer reviewed
dc.identifier.doi https://doi.org/10.3897/BDJ.7.e31817
dc.type.uri info:eu-repo/semantics/other
dc.type.uri info:eu-repo/semantics/publishedVersion

Files in this item

Files Size Format View
BDJ_article_31817.pdf 448.2Kb PDF View/Open
