The University of North Texas’ Texas Center for Digital Knowledge (TxCDK) and the Botanical Research Institute of Texas (BRIT) are conducting fundamental research to identify how human intelligence can be combined with machine processes to transform the text of museum specimen labels effectively and efficiently into high-quality, machine-processable parsed data. This project is advancing understanding of the workflows and processes best able to increase access to digitized biological collection metadata among its stakeholder communities: biologists, natural history museum collections managers, biodiversity standards groups, and the library and information science community. A key challenge faced by all natural history collections is finding a transformation process that yields high-quality results in a cost- and time-efficient manner. This research will produce a new workflow model for effective and efficient label data transformation, correction, and enhancement that can be replicated, adapted, and transferred to herbaria and other natural history collections.
Our study addresses the following research problem: What workflow best combines machine-assisted and human-assisted procedures to convert the text on specimen labels, effectively and efficiently, into machine-processable parsed data that can be ingested into a database and associated with the digitized specimen? The project's goal is to answer this question, and it will do so through the following objectives:
In addition to answering the research questions, the proposed study will produce the following deliverables:
The results of this research will inform a new workflow model for label data processing. Its core advantage is large-scale distributed collaboration, supported by tools that help people recognize and parse label data accurately and verify the accuracy of others' work. The workflow model can also draw on networked resources such as authority files and georeferencing tools to enrich the metadata, making it more useful and appealing and thereby increasing the use of digital biodiversity repositories.
Funded by Institute of Museum and Library Services National Leadership Grant # 06-08-0079-08 — http://www.imls.gov
The Institute of Museum and Library Services is the primary source of federal support for the nation’s 122,000 libraries and 17,500 museums. The Institute's mission is to create strong libraries and museums that connect people to information and ideas.