Apiary Project


About the Apiary Project

The University of North Texas’s Texas Center for Digital Knowledge (TxCDK) and the Botanical Research Institute of Texas (BRIT) will conduct fundamental research with the goal of identifying how human intelligence can be combined with machine processes for effective and efficient transformation of textual museum specimen label information into high-quality machine-processable parsed data. This two-year project will advance understanding of the workflow and processes best able to increase access to and use of digitized biological collection metadata within the stakeholder communities composed of biologists, natural history museum collections managers, biodiversity standards groups, and the library and information science community. A key challenge faced by all natural history collections is determining a transformation process that yields high-quality results in a cost- and time-efficient manner. The results of this research will yield a new workflow model for effective and efficient label data transformation, correction, and enhancement that can be replicated, adapted, and transferred to herbaria and other natural history collections.

Our study addresses this research problem: What workflow combines machine-assisted and human-assisted procedures to most effectively and efficiently convert textual data on specimen labels into machine-processable parsed data that can be ingested into a database and associated with the digitized specimen? The goal of this project is to answer this question through the following objectives:

  • Identify and test machine processes for initial transformation of label data
  • Identify human processes that act on the machine-transformed data to correct and enhance label data
  • Develop, test, and assess user interfaces to support human processes
  • Develop and test a workflow that incorporates both human- and machine-assisted procedures for effectiveness and efficiency in label data transformation and enhancement
  • Assess quality of metadata resulting from machine and human processes

In addition to answering the research question, the proposed study will produce the following deliverables:

  • Tested and validated procedures and workflow for human- and machine-assisted transformation of specimen label data
  • A replicable workflow model for transformation, correction, and enhancement of specimen label data
  • Reports that document all results from various research activities carried out during the study
  • Open source code used in the testbed (made available to the community)

The results of this research will inform a new workflow model for label data processing whose core advantage is large-scale distributed collaboration, supported by tools that help people recognize and parse label data accurately and check the accuracy of others' work. Additionally, the workflow model can incorporate access to networked resources such as authority files and geo-referencing tools to enhance the use and appeal of the metadata and thus enhance the use of digital biodiversity repositories.

Funded by Institute of Museum and Library Services National Leadership Grant # 06-08-0079-08 — http://www.imls.gov
The Institute of Museum and Library Services is the primary source of federal support for the nation’s 122,000 libraries and 17,500 museums. The Institute’s mission is to create strong libraries and museums that connect people to information and ideas.



Apiary Project ©2008 - 2010