How it Works

Workflow

The proposed general workflow for the Apiary Project begins after digitization with the identification of Regions of Interest (ROIs). This is followed by text transcription, text parsing, quality control, and ingestion into the Atrium biodiversity information system.

Figure: Workflow stages: digitize, identify ROIs, transcribe text, parse text, quality control, and Atrium biodiversity information system.
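The ordered workflow above can be sketched as a simple sequence of stages. This is only an illustration of the ordering described in the text, not the project's implementation; the names Stage and next_stage are hypothetical.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Workflow stages, in the order the text lists them (names are illustrative)."""
    DIGITIZE = 1
    IDENTIFY_ROIS = 2
    TRANSCRIBE_TEXT = 3
    PARSE_TEXT = 4
    QUALITY_CONTROL = 5
    INGEST_TO_ATRIUM = 6


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after the last stage."""
    stages = list(Stage)  # Enum members iterate in definition order
    position = stages.index(current)
    if position + 1 < len(stages):
        return stages[position + 1]
    return None
```

For example, next_stage(Stage.TRANSCRIBE_TEXT) returns Stage.PARSE_TEXT, and calling it on the final stage returns None to signal that the specimen record has been ingested.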

Each label on an herbarium specimen sheet will likely become an ROI. A specimen sheet can carry many types of labels, and therefore many types of ROIs. The system or human user must be able to distinguish between different label types as well as between data in different information categories.
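One way to picture an ROI is as a rectangle on the sheet image tagged with a label type. The sketch below is a minimal illustration under that assumption; the class names, the label categories, and the pixel-based coordinates are all hypothetical and are not taken from the Apiary Project itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class LabelType(Enum):
    # Hypothetical label categories; the project's actual list may differ.
    PRIMARY = "primary"
    ANNOTATION = "annotation"
    BARCODE = "barcode"


@dataclass(frozen=True)
class RegionOfInterest:
    """A rectangular region on a specimen sheet image, in pixels (assumed)."""
    x: int
    y: int
    width: int
    height: int
    label_type: LabelType


def regions_of_type(
    regions: List[RegionOfInterest], wanted: LabelType
) -> List[RegionOfInterest]:
    """Keep only the regions whose label type matches `wanted`."""
    return [region for region in regions if region.label_type is wanted]
```

Tagging each region with its type at selection time lets later steps, such as transcription and parsing, treat a barcode differently from a handwritten annotation.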

Label text is converted from an image to digital text through transcription, which may be performed by a human, by a computer using Optical Character Recognition (OCR), or by both. The digitized text is then parsed, or sorted, into the appropriate database fields using specialized, standards-compliant metadata. This will facilitate the use of the BRIT collection's data by other biodiversity institutions and scholars.
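To make the parsing step concrete, here is a minimal sketch that sorts lines of transcribed label text into named fields. It uses simple regular expressions rather than the natural language processing the project relies on, and it assumes Darwin Core term names (recordedBy, eventDate) as the target fields; the text above does not name a specific metadata standard. The line prefixes "Collector:" and "Date:" are also assumptions about how a label might be worded.

```python
import re
from typing import Dict, List, Tuple

# Assumed mapping from a field name to a pattern that recognizes it.
PATTERNS = {
    "recordedBy": re.compile(r"Collector:\s*(.+)", re.IGNORECASE),
    "eventDate": re.compile(r"Date:\s*(.+)", re.IGNORECASE),
}


def parse_label(text: str) -> Tuple[Dict[str, str], List[str]]:
    """Sort label lines into fields; return (fields, lines left unparsed)."""
    fields: Dict[str, str] = {}
    unparsed: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        for term, pattern in PATTERNS.items():
            match = pattern.match(line)
            if match:
                fields[term] = match.group(1).strip()
                break
        else:
            # No pattern matched: keep the line for later review.
            unparsed.append(line)
    return fields, unparsed
```

Given the made-up label text "Quercus alba L.", "Collector: J. Smith", "Date: 12 May 1932" on three lines, the function files the collector and date under their fields and returns the first line as unparsed. Lines that cannot be sorted automatically are natural candidates for the quality control step.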

Technologies

The Apiary Project's technology stack is composed of several open-source applications. A Fedora Commons repository provides the underlying data structure, and the Islandora module, developed at the University of Prince Edward Island, extends the capabilities of Fedora Commons. The user interfaces are developed with the Drupal content management system. OCRopus and GOCR perform Optical Character Recognition (OCR). The JPEG 2000 image server is djatoka. HERBIS (Erudite Recorded Botanical Information Synthesizer), created at the Yale Peabody Museum of Natural History, uses natural language processing (NLP) to process the herbarium specimen data and present it in a machine-readable format.

Figure: Technology stack consisting of Drupal, Islandora, Fedora Commons, OCRopus, HERBIS, and djatoka.
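As an illustration of how an image server fits into this stack, the sketch below builds a request URL that asks djatoka for the part of a sheet image covered by one ROI. It assumes the OpenURL parameter names described in the djatoka documentation (svc_id, svc.format, svc.region, with the region given as Y,X,H,W); the resolver address, the image identifier, and the function name are hypothetical, and how the Apiary Project actually calls djatoka is not stated here.

```python
from urllib.parse import urlencode


def djatoka_region_url(
    resolver_url: str, image_id: str, x: int, y: int, width: int, height: int
) -> str:
    """Build a getRegion request for a rectangle of a JPEG 2000 image.

    Parameter names are assumed from the djatoka OpenURL documentation.
    """
    params = {
        "url_ver": "Z39.88-2004",
        "rft_id": image_id,
        "svc_id": "info:lanl-repo/svc/getRegion",
        "svc_val_fmt": "info:ofi/fmt:kev:mtx:jpeg2000",
        "svc.format": "image/jpeg",
        # Assumed region order: top offset, left offset, height, width.
        "svc.region": f"{y},{x},{height},{width}",
    }
    return f"{resolver_url}?{urlencode(params)}"


# Hypothetical resolver address and image identifier, for illustration only.
url = djatoka_region_url(
    "http://localhost:8080/adore-djatoka/resolver",
    "info:example/sheet-1",
    x=10, y=20, width=400, height=300,
)
```

Requesting only the region behind a label means the OCR step can work on a small crop instead of the full-resolution sheet image.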

This unique integration of open-source tools provides a robust structure for performing the many information storage, retrieval, conversion, and presentation tasks necessary for the Apiary Project. For more detailed information on the technology stack and decision processes, see the development discussion group linked from the "Get Involved" page.