Table of Contents
The Apiary Project is a fundamental research project with the goal of identifying how human intelligence can be combined with machine processes for effective and efficient transformation of textual museum specimen label information into high-quality machine-processible parsed data.
Dual to Quad core 64 bit processor
4-16GB RAM
Sizeable storage space; we used a 100GB drive for original images and another 100GB drive for converted images
Apache 2
Mysql-server
PHP 5
Sun-java6-jdk
Subversion
curl
mercurial
scons
chkconfig
vim-nox
flashplugin-installer
varnish (*optional)
Dual to Quad core 64 bit processor
4+GB RAM
Sizeable storage space. Note: We connect to a shared image drive from the Apiary Project Server.
username: apiary
full name: password is apiary
password: apiary
Note: These documents will assume all passwords to be 'apiary'
Apache 2 (and its automatically added required packages)
Mysql-server
Note: These documents assume the root password is apiary
PHP 5 (and its automatically added required packages)
Sun-java6-jdk (and its automatically added required packages)
Subversion
curl
mercurial
scons
chkconfig
libapache2-mod-auth-mysql
flashplugin-installer
Php5-mysql
Php5-curl
Php5-xsl
Php5-imagick
Php5-gd
Php5-dev
Php-soap
libjpeg-progs
vim-nox (this fixes the weird behavior vi has with Ubuntu)
vi /etc/php5/apache2/php.ini
Replace ‘memory_limit = 16M’ with ‘memory_limit = 256M’
/etc/init.d/apache2 restart
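The memory_limit change above can also be scripted instead of made by hand in vi. The following is a minimal sketch, assuming a POSIX shell with sed and mktemp; the helper name set_memory_limit is hypothetical, and the demonstration runs on a scratch file rather than the live /etc/php5/apache2/php.ini.

```shell
#!/bin/sh
# set_memory_limit FILE VALUE: rewrite the memory_limit line in a php.ini-style file.
# Hypothetical helper: the result is built in a temp file and then copied back
# so that the permissions of FILE are left as they were.
set_memory_limit() {
  file=$1
  value=$2
  tmp=$(mktemp) || return 1
  # Replace whatever value memory_limit currently has; other lines are untouched.
  sed "s/^memory_limit[[:space:]]*=.*/memory_limit = $value/" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demonstration on a scratch file (the file contents here are made up).
demo_ini=$(mktemp)
printf 'max_execution_time = 30\nmemory_limit = 16M\n' > "$demo_ini"
set_memory_limit "$demo_ini" 256M
grep '^memory_limit' "$demo_ini"
```

On the real server the same call would target /etc/php5/apache2/php.ini as root, followed by the Apache restart shown above.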
mysql -u root -p
Accesses mysql command line, password is apiary
create database drupal;
grant all on drupal.* to 'drupalAdmin'@'localhost' identified by 'apiary';
grant all on drupal.* to 'drupalAdmin'@'%' identified by 'apiary';
create database fedora;
grant all on fedora.* to 'fedoraAdmin'@'localhost' identified by 'apiary';
grant all on fedora.* to 'fedoraAdmin'@'%' identified by 'apiary';
Note: This preserves versioning as drupal updates are released
ln -s /var/www/drupal-6.22 /var/www/drupal
mkdir /var/www/drupal/sites/default/files
chmod a+w /var/www/drupal/sites/default/files
vi /var/www/drupal/sites/default/files/.htaccess
Save blank file!
vi /etc/apache2/sites-enabled/000-default
change AllowOverride None to AllowOverride All for / and /var/www directories
cp /var/www/drupal/sites/default/default.settings.php /var/www/drupal/sites/default/settings.php
chmod a+w /var/www/drupal/sites/default/settings.php
Note: This preserves versioning
ln -s /usr/lib/jvm/java-6-sun-1.6.0.15 /usr/lib/jvm/java-home
vi /etc/profile
Add the following lines (Note: no spaces around equal signs!)
export FEDORA_HOME=/usr/local/fedora
export JAVA_HOME=/usr/lib/jvm/java-home
export PATH=$JAVA_HOME/bin:$PATH
export PATH=$FEDORA_HOME/server/bin:$PATH
export PATH=$FEDORA_HOME/client/bin:$PATH
export CATALINA_HOME=$FEDORA_HOME/tomcat
export LD_LIBRARY_PATH=/usr/lib/jvm/java-home
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
Note: This preserves versioning
ln -s /usr/local/fedora-3.2 /usr/local/fedora
$JAVA_HOME/bin/keytool -genkey -alias tomcat -keyalg RSA
Password:apiary
java -jar fedora-installer-3.2.1.jar
Installation type: custom
Fedora home directory: /usr/local/fedora-3.2
Fedora administrator password: apiary
Fedora server host: hostname
Fedora application server context: fedora
Authentication requirement for API-A: false
SSL availability: true
SSL required for API-A: false
SSL required for API-M: false
Servlet engine: included
Tomcat home directory: /usr/local/fedora-3.2/tomcat
Tomcat HTTP port: 8080
Tomcat shutdown port: 8005
Tomcat Secure HTTP port: 8443
Keystore file: default
Keystore password: apiary
Keystore type: JKS
Database: mysql
MySQL JDBC driver: included
Database username: fedoraAdmin
Database password: apiary
JDBC URL: jdbc:mysql://hostname/fedora?useUnicode=true&characterEncoding=UTF-8&autoReconnect=true
JDBC DriverClass: com.mysql.jdbc.Driver
Policy enforcement enabled: true
Enable Resource Index: true
Enable Messaging: false
Deploy local services and demos: true
Note: Some Ubuntu installs had this variable set to zero even using “Enable Resource Index: true”
vi $FEDORA_HOME/server/config/fedora.fcfg
Set fedora.server.resourceIndex.ResourceIndex to 1
cp $FEDORA_HOME/tomcat/webapps/fedora/admin/crossdomain.xml /usr/local/fedora/tomcat/webapps
Note: Allows anonymous users to be able to view public objects later while using Islandora.
vi $FEDORA_HOME/server/config/fedora-users.xml
Note: Islandora will send the username of fedora_anonymous and password of anonymous for all unauthenticated use.
Add the following <user> lines:
<user name="fedora_anonymous" password="anonymous">
<attribute name="fedoraRole">
<value>fedoraUser</value>
</attribute>
</user>
wget http://softlayer.dl.sourceforge.net/project/islandora/islandora/Islandora_Dru6_Fed_3.1_beta2009-04-03/DrupalFilter_3.jar
cp DrupalFilter_3.jar $FEDORA_HOME/tomcat/webapps/fedora/WEB-INF/lib/DrupalFilter.jar
vi $FEDORA_HOME/server/config/filter-drupal.xml
Caution: The xml file needs a blank line at the end.
Add the following lines:
<?xml version="1.0" encoding="UTF-8"?>
<!--
File to hold drupal connection info for the FilterDrupal servlet filter.
For multisite drupal installs you can include multiple connection elements.
We will query all the databases and assume any user in any drupal db with the
same username and password are the same user. We will gather all roles for
that user from all databases. This is a potential security risk if a user in
one drupal db has the same username and password as another user in a
separate drupal db. We are also assuming all drupal dbs to be mysql.
This file should be located in the same directory as the fedora.fcfg file.
-->
<FilterDrupal_Connection>
<connection server="hostname" dbname="drupal" user="drupalAdmin" password="apiary" port="3306">
<sql>
SELECT distinct u.uid as userid, u.name as Name, u.pass as Pass, r.name as role FROM drupal.users u, drupal.role r, drupal.users_roles where u.name=? and u.pass=? and r.rid=drupal.users_roles.rid and u.uid=drupal.users_roles.uid;
</sql>
</connection>
</FilterDrupal_Connection>
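Since filter-drupal.xml needs a blank line at the end, a small check can catch a file that was saved without one. This is a sketch only: it assumes "blank line" means that the final line of the file is empty, and the helper names ends_with_blank_line and ensure_blank_line are hypothetical.

```shell
#!/bin/sh
# ends_with_blank_line FILE: succeed if FILE is non-empty and its last line is empty.
ends_with_blank_line() {
  [ -s "$1" ] && [ -z "$(tail -n 1 "$1")" ]
}

# ensure_blank_line FILE: append newlines until the last line of FILE is empty.
# Does nothing when the blank line is already present.
ensure_blank_line() {
  until ends_with_blank_line "$1"; do
    printf '\n' >> "$1" || return 1
  done
}

# Demonstration on a scratch file standing in for filter-drupal.xml.
demo_xml=$(mktemp)
printf '<FilterDrupal_Connection>\n</FilterDrupal_Connection>\n' > "$demo_xml"
ensure_blank_line "$demo_xml"
ends_with_blank_line "$demo_xml" && echo "blank line present"
```

On the server the file to check would be $FEDORA_HOME/server/config/filter-drupal.xml.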
vi $FEDORA_HOME/tomcat/webapps/fedora/WEB-INF/web.xml
After
<filter>
<filter-name>XmlUserfileFilter</filter-name>
<filter-class>fedora.server.security.servletfilters.xmluserfile.FilterXmlUserfile</filter-class>
</filter>
Add
<filter>
<filter-name>DrupalFilter</filter-name>
<filter-class>ca.upei.roblib.fedora.servletfilter.FilterDrupal</filter-class>
</filter>
After
<filter-mapping>
<filter-name>XmlUserfileFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
Add
<filter-mapping>
<filter-name>DrupalFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
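After editing web.xml it is easy to add the filter but forget the filter-mapping, or the reverse. The sketch below is a plain text check, not an XML parser: it assumes each element sits on its own line as in the snippets above, and the helper name drupal_filter_registered is hypothetical.

```shell
#!/bin/sh
# drupal_filter_registered FILE: succeed when FILE names DrupalFilter twice
# (once in <filter>, once in <filter-mapping>) and names the filter class once.
drupal_filter_registered() {
  names=$(grep -cF '<filter-name>DrupalFilter</filter-name>' "$1" || true)
  classes=$(grep -cF 'ca.upei.roblib.fedora.servletfilter.FilterDrupal' "$1" || true)
  [ "$names" -eq 2 ] && [ "$classes" -eq 1 ]
}

# Demonstration on a scratch file holding only the two added blocks.
demo_web_xml=$(mktemp)
cat > "$demo_web_xml" <<'EOF'
<filter>
<filter-name>DrupalFilter</filter-name>
<filter-class>ca.upei.roblib.fedora.servletfilter.FilterDrupal</filter-class>
</filter>
<filter-mapping>
<filter-name>DrupalFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
EOF
drupal_filter_registered "$demo_web_xml" && echo "DrupalFilter registered"
```

On the server the file to check would be $FEDORA_HOME/tomcat/webapps/fedora/WEB-INF/web.xml.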
Note: Some Ubuntu installs had this variable set to zero even using “Enable Resource Index: true”
vi $FEDORA_HOME/server/config/fedora.fcfg
Set fedora.server.resourceIndex.ResourceIndex to 1
$FEDORA_HOME/server/bin/fedora-rebuild.sh
Choose “Rebuild the Resource Index”
Verify selection
wget http://iweb.dl.sourceforge.net/project/djatoka/djatoka/1.1/adore-djatoka-1.1.tar.gz
Note: This preserves versioning
ln -s /usr/local/adore-djatoka-1.1 /usr/local/adore-djatoka
cd /usr/local/adore-djatoka/bin
./compress.sh -i ../etc/test.jpg -o ../etc/test.jp2
Compress jpg file into jpeg2000 format
./extract.sh -i ../etc/test.jp2 -o ../etc/test-size1.jpg -l 1
extracts a jpg file from a jpeg2000 file
ls -l ../etc
verify test.jp2 and test-size1.jpg files were created
cp /usr/local/adore-djatoka/dist/adore-djatoka.war $CATALINA_HOME/webapps
vi $CATALINA_HOME/webapps/adore-djatoka/WEB-INF/classes/djatoka.properties
Add the following in the Referent Resolver Properties section
SimpleListResolver.maxRemoteCacheSize=10000
cd /var/www/images/jpeg2000
wget http://research.apiaryproject.org/images/apiary-aquarius-jpeg2000-vbox.tar.gz
vi /var/www/images/jpeg2000/imgIndex.txt
Replace all /mnt/converted_images with /var/www/images
mv /var/www/images/jpeg2000/imgIndex.txt $CATALINA_HOME/webapps/adore-djatoka/WEB-INF/classes
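The path replacement in imgIndex.txt can be done with sed rather than interactively in vi. A minimal sketch follows; the helper name rewrite_image_root is hypothetical, and the sample index line (a tab-separated identifier and file path) is an assumption about the file layout made only for the demonstration.

```shell
#!/bin/sh
# rewrite_image_root FILE OLD NEW: replace every occurrence of OLD with NEW in FILE.
# '|' is used as the sed delimiter because both values contain '/'.
rewrite_image_root() {
  tmp=$(mktemp) || return 1
  sed "s|$2|$3|g" "$1" > "$tmp" && cat "$tmp" > "$1"
  rm -f "$tmp"
}

# Demonstration on a scratch file with one made-up index entry.
demo_index=$(mktemp)
printf 'apiary:jpeg2000/mbb064\t/mnt/converted_images/jpeg2000/MBB064.jp2\n' > "$demo_index"
rewrite_image_root "$demo_index" /mnt/converted_images /var/www/images
cat "$demo_index"
```

The real call would target /var/www/images/jpeg2000/imgIndex.txt before it is moved into the djatoka classes directory.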
mkdir /home/apiary/ocropus
cd /home/apiary/ocropus
hg clone http://iulib.googlecode.com/hg iulib
hg clone http://ocropus.googlecode.com/hg ocropus
mkdir /home/apiary/ocrad
cd /home/apiary/ocrad
wget http://ftp.gnu.org/gnu/ocrad/ocrad-0.21.tar.gz
mkdir /home/apiary/gocr
cd /home/apiary/gocr
wget http://www-e.uni-magdeburg.de/jschulen/ocr/gocr-0.49.tar.gz
/var/www/drupal/includes/menu.inc
In function _menu_check_access(&$item, $map)
Replace:
if ($callback == 'user_access') {
  $item['access'] = (count($arguments) == 1) ? user_access($arguments[0]) : user_access($arguments[0], $arguments[1]);
}
With:
if ($callback == 'user_access') {
  if (strpos($item['file'], "modules/apiary_project") > -1) {
    $item['access'] = user_access("View Apiary Project");
  }
  else {
    $item['access'] = (count($arguments) == 1) ? user_access($arguments[0]) : user_access($arguments[0], $arguments[1]);
  }
}
cd /var/www/drupal/themes
svn co svn://projects.brit.org/src/svn_apiary/trunk/servers/applications/drupal/themes/zen
Note: This is a theme altered by the Apiary Project to remove all menus, padding, etc.
cd /var/www/drupal/themes
svn co svn://projects.brit.org/src/svn_apiary/trunk/servers/applications/drupal/themes/cti_flex_no_title
cd /var/www/drupal/themes
svn co svn://projects.brit.org/src/svn_apiary/trunk/servers/applications/drupal/themes/cti_flex
cd /home/apiary/
wget http://ftp.drupal.org/files/projects/imageapi-6.x-1.10.tar.gz
cd /var/www/drupal/modules
svn co http://fedora-commons.org/svn/root/islandora/islandora-module/Islandora-dru6-fed3/trunk/fedora_repository
Default Collection Name: Apiary's Fedora Repository
Default Collection PID: apiary:SpecimenBinders
A user with the Drupal role administrator: apiary
Pid namespaces allowed in this Drupal install: demo: changeme: Islandora: ilives: apiary: ap-specimen: ap-roi: ap-image: ap-model: ap-sdef: ap-sdep: ap-sdefcm:
Fedora Soap Management Url: http://hostname:8080/fedora/services/management?wsdl
Fedora base url: http://hostname:8080/fedora
Fedora RISearch URL: http://hostname:8080/fedora/risearch
Fedora Lucene Search URL: http://hostname:8080/fedoragsearch/rest
Fedora Lucene Index Name: BasicIndex
Fedora Soap Url: http://hostname:8080/fedora/services/access?wsdl
Click Save configuration
cd /var/www/drupal/modules
svn co svn://projects.brit.org/src/svn_apiary/branches/mellifera-apiary-1.0.0/apiary_project
chmod 777 /var/www/drupal/modules/apiary_project/workflow/templates_c
mkdir /var/www/drupal/sites/default/files/apiary_datastreams
chmod 777 /var/www/drupal/sites/default/files/apiary_datastreams
vi /var/www/drupal/modules/apiary_project/fedora_commons/config_fedora.inc
Set database host, username and password as specified in Create Fedora User and Database
Note: This allows for direct reading of the fedora database instead of querying fedora itself through CURL.
A query such as getting all pids with a name like ap-specimen is much easier to do from the database, where it takes one line, than through SPARQL or a Resource Index Fedora request.
At one point, we discussed the merits of storing all data for the workflow here, but instead chose to modify the drupal db.
vi /var/www/drupal/modules/apiary_project/workflow/index.php
Set $drupal_url = "http://hostname/drupal";
Set $djatoka_url = "http://hostname:8080/";
vi /var/www/drupal/modules/apiary_project/workflow/comparer.php
Set $drupal_url = "http://hostname/drupal";
vi /var/www/drupal/modules/apiary_project/workflow/search.php
Set $drupal_url = "http://hostname/drupal";
vi /var/www/drupal/modules/apiary_project/workflow/workflow.php
Set $drupal_url = "http://hostname/drupal";
cd /home/apiary/
wget http://ftp.drupal.org/files/projects/themekey-6.x-3.3.tar.gz
Scroll to ThemeKey section
Check Enabled for ThemeKey
Click Save configuration
cd /home/apiary
wget http://apache.tradebit.com/pub/lucene/solr/1.4.1/apache-solr-1.4.1.tgz
cp apiary_project/solr/conf/schema.xml $FEDORA_HOME/tomcat/webapps/apache-solr-1.4.1/example/solr/conf/schema.xml
$FEDORA_HOME/client/bin/fedora-admin.sh
Login
Username: fedoraAdmin
Password: apiary
Click File->Ingest->One Object->From File…
Navigate to /var/www/drupal/modules/apiary_project/digital_objects/required/apiary_SpecimenBinders.xml
Click Open
Select FOXML version 1.1
Click OK
cp /var/www/drupal/modules/apiary_project/digital_objects/sDefs/ocropus/dynamicOCR.php /var/www/drupal/sites/default/files/
$FEDORA_HOME/client/bin/fedora-admin.sh
Login
Username: fedoraAdmin
Click File->Ingest->One Object->From File…
Navigate to each of the following separately:
• /var/www/drupal/modules/apiary_project/digital_objects/sDefs/ocropus/ap-sdef_ocropus.xml
• /var/www/drupal/modules/apiary_project/digital_objects/sDefs/ocropus/ap-sdefcm_ocropus.xml
• /var/www/drupal/modules/apiary_project/digital_objects/sDefs/ocropus/ap-sdep_ocropus.xml
Click Open
Select FOXML version 1.1
Click OK
Click File->Exit
Title: Home Page
Menu settings
Menu link title:
Parent item: Primary links
Weight: 0
Body:
<h1>Welcome to the Apiary Project Proof of Concept</h1>
<?php if (user_is_logged_in()): ?>
<p>To begin the demo, select Workspace in the navigation menu above.</p>
<?php else: ?>
<p>To begin the demo, <a href="user">login</a> with username "demo" and password "demo" then select Workspace in the navigation menu above.</p>
<?php endif; ?>
Input Format: PHP code
URL path settings: homepage
cp /var/www/drupal/modules/apiary_project/images/cti_flex_logo.png /var/www/drupal/sites/default/files
Click configure next to CTI Flex theme
Scroll to Toggle display
Uncheck Site name
Scroll to Logo image settings
Uncheck Use the default logo
Path to custom logo: sites/default/files/cti_flex_logo.png
Scroll to Theme-specific settings ->Custom color settings
Select color for body background: #FFE8B7
Select color for header and footer backgrounds: #8EB7FE
Select color for main navigation bar and block header backgrounds: #3F4F6B
Select color for block content background: #D4DAE6
Click Save configuration
Click Add item tab
Path: http://hostname/drupal
Menu link title: Home
Description: home page
Parent item: <Primary links>
Weight: 0
Click Save
Click Add item tab
Path: user/login
Menu link title: Login
Description:
Parent item: <Primary links>
Weight: 2
Click Save
Click Add item tab
Path: logout
Menu link title: Logout
Description:
Parent item: <Primary links>
Weight: 2
Click Save
Max Specimens:
File name source file: http://hostname/images/jpeg2000/demo_filenames.txt
Referent ID: apiary:jpeg2000
JPEG2000 Url Base: http://hostname/images/jpeg2000
Source Url Base: http://hostname/images/original
Click Submit
Preface: Chapter 2 entails a lot, so if you have objects ingested then great! It's time to harness the power of the Apiary Project.
Whenever a restart seems necessary, use the following two commands to restart the project.
Note: start, stop and restart can all be passed in, but restart does not fail if either service is not yet running, so it is safe to use in any state.
/etc/init.d/solr restart
/etc/init.d/apiary_project restart
Permissions for the features that users of the workflow will be allowed to access.
Can users assigned to the workflow use Image Analysis features, such as creating ROIs?
Assign Users to this workflow
Select one, several or all drupal users who can view the Apiary Project and assign them to the workflow.
This feature creates a new object pool based on a Resource Index or Solr Query.
select $sp_pid from <#ri> where $sp_pid <fedora-rels-ext:isMemberOf> <info:fedora/apiary:SpecimenBinders>
Groundtruth is the expected result for a Specimen after it has been accurately analyzed, transcribed and parsed. These values do not necessarily correspond to any real fedora objects.
Displays the existing Groundtruth datastream for a requested Specimen pid. The pid can be entered manually or passed as a specimen_pid=ap-specimen:Specimen-X variable.
A demonstration of the simplicity of sharing images from the djatoka image server. Any installation can ingest these images.
Ingest a number of specimens using the same rft-id base, jp2 url base and source url base. See Ingest Apiary Demo Objects.
Compare two text strings and see the resulting Levenshtein and simple text distances, displayed using Daisydiff.
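For readers unfamiliar with the measure, the Levenshtein distance between two strings is the minimum number of single-character insertions, deletions and substitutions needed to turn one into the other. The sketch below illustrates the calculation only; it is not the comparer's own code. It assumes a POSIX awk and plain ASCII input without backslashes, and the function name levenshtein is hypothetical.

```shell
#!/bin/sh
# levenshtein A B: print the edit distance between strings A and B.
# Classic dynamic programming over two rows (prev and cur).
levenshtein() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    la = length(a); lb = length(b)
    # Distance from the empty prefix of a to each prefix of b.
    for (j = 0; j <= lb; j++) prev[j] = j
    for (i = 1; i <= la; i++) {
      cur[0] = i
      for (j = 1; j <= lb; j++) {
        cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
        best = prev[j] + 1                                    # deletion
        if (cur[j-1] + 1 < best) best = cur[j-1] + 1          # insertion
        if (prev[j-1] + cost < best) best = prev[j-1] + cost  # substitution
        cur[j] = best
      }
      for (j = 0; j <= lb; j++) prev[j] = cur[j]
    }
    print prev[lb]
  }'
}

levenshtein kitten sitting
```

A transcription that differs from its groundtruth by a few characters therefore yields a small distance, while an unrelated string yields a distance close to the length of the longer string.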
Search by specimenMetadata keyword across all metadata fields or within one specific field.
If changes have been made to Fedora Objects outside the Apiary Project, i.e. directly in Fedora, this useful tool will re-index all Apiary Project objects.
Preface: Now that we have covered how to create workflows, it's time to use them.
Loads Specimen immediately to the right or left of the current Specimen in the Workflow Queue
Loads ROI immediately and switches to the Transcribe Text Tab
Loads ROI immediately and switches to the Parse Text Tab
Preface: An attempt to cover a variety of known issues
From a command line, running echo $FEDORA_HOME should print the location of the fedora directory.
If it does not, the variable needs to be set. See Edit Environment Variables.
Others that could be missing:
JAVA_HOME, CATALINA_HOME, LD_LIBRARY_PATH
Some of these directories use Symlinks that can become invalid after an update, especially JAVA_HOME
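The checks above can be run in one pass. This is a minimal sketch, assuming a POSIX shell; the helper name check_env_dir is hypothetical. LD_LIBRARY_PATH is left out because it holds a list of paths rather than a single directory.

```shell
#!/bin/sh
# check_env_dir NAME: report whether variable NAME is set and resolves to a directory.
# NAME must be a trusted variable name because it is passed through eval.
check_env_dir() {
  name=$1
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "$name is not set"
    return 1
  fi
  # -d follows symlinks, so a dangling link (for example an outdated java-home) fails here.
  if [ ! -d "$value" ]; then
    echo "$name points to a missing directory: $value"
    return 1
  fi
  echo "$name ok: $value"
}

status=0
for var in FEDORA_HOME JAVA_HOME CATALINA_HOME; do
  check_env_dir "$var" || status=1
done
echo "environment check status: $status"
```

A status of 1 means at least one variable needs attention in /etc/profile or one of the versioning symlinks needs to be recreated.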
Other techniques used to improve the Apiary Project
vi $FEDORA_HOME/tomcat/webapps/apache-solr-1.4.1/example/solr/conf/schema.xml
Solr offers many other features that are beyond the scope of this project. They can be learned about here
vi /var/www/drupal/modules/apiary_project/workflow/include/search.php
Add new code to include added Dynamically Indexed Item
mysql -u root -p
Accesses mysql command line, password is apiary
create database djatoka;
grant all on djatoka.* to 'djatokaAdmin'@'localhost' identified by 'apiary';
grant all on djatoka.* to 'djatokaAdmin'@'%' identified by 'apiary';
use djatoka;
CREATE TABLE `resources` (
`identifier` varchar(150) NOT NULL,
`imageFile` varchar(255) NOT NULL,
`original_file_url` text NOT NULL,
`jp2_file_url` text NOT NULL,
PRIMARY KEY (`identifier`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
vi /usr/local/fedora/tomcat/webapps/adore-djatoka/WEB-INF/classes/djatoka.properties
Comment out:
#OpenURLJP2KService.referentResolverImpl=gov.lanl.adore.djatoka.openurl.SimpleListResolver
Uncomment:
OpenURLJP2KService.referentResolverImpl=gov.lanl.adore.djatoka.openurl.plugin.rftdb.DatabaseResolver
DatabaseResolver.url=jdbc:mysql://localhost/djatoka
DatabaseResolver.driver=com.mysql.jdbc.Driver
DatabaseResolver.login=djatokaAdmin
DatabaseResolver.pwd=apiary
DatabaseResolver.maxActive=500
DatabaseResolver.maxIdle=10
DatabaseResolver.query=SELECT identifier, imageFile FROM resources WHERE identifier='\\i';
The images' identifiers and file names are originally stored in a text file at
$FEDORA_HOME/tomcat/webapps/adore-djatoka/WEB-INF/classes/imgIndex.txt
For each image you wish to serve, add a record modelled on the following SQL command, which serves the image MBB064.jp2:
INSERT INTO djatoka.resources(`identifier`,`imageFile`,`original_file_url`,`jp2_file_url`)
values('apiary:jpeg2000/mbb064', '/var/www/jpeg2000/MBB064.jp2',
'http://research.apiaryproject.org/images/original/MBB064.tif',
'http://research.apiaryproject.org/images/jpeg2000/MBB064.jp2');
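With many images, writing one INSERT per image by hand becomes tedious, so the statements can be generated from the existing index file. The sketch below rests on several assumptions: that imgIndex.txt is tab-separated with the identifier first and the local .jp2 path second, that each original is a .tif with the same file stem, and that identifiers and paths contain no quote characters (nothing is SQL-escaped). The helper name index_to_sql is hypothetical.

```shell
#!/bin/sh
# index_to_sql INDEX ORIGINAL_BASE JP2_BASE: print one INSERT statement per index line.
# \047 is the octal escape for a single quote inside the awk program.
index_to_sql() {
  awk -F '\t' -v orig="$2" -v jp2="$3" '
    NF >= 2 {
      n = split($2, parts, "/")   # last path component is the file name
      file = parts[n]
      stem = file
      sub(/\.jp2$/, "", stem)
      printf "INSERT INTO djatoka.resources(`identifier`,`imageFile`,`original_file_url`,`jp2_file_url`) VALUES(\047%s\047, \047%s\047, \047%s/%s.tif\047, \047%s/%s\047);\n", $1, $2, orig, stem, jp2, file
    }' "$1"
}

# Demonstration with a single made-up index entry in a scratch file.
demo_index=$(mktemp)
printf 'apiary:jpeg2000/mbb064\t/var/www/jpeg2000/MBB064.jp2\n' > "$demo_index"
index_to_sql "$demo_index" \
  http://research.apiaryproject.org/images/original \
  http://research.apiaryproject.org/images/jpeg2000
```

The output can be reviewed and then piped into mysql -u djatokaAdmin -p djatoka.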
References:
http://fak3r.com/2009/01/27/howto-serve-jpeg2000-images-with-a-scalable-infrastructure/
http://groups.drupal.org/node/25425/revisions/88248/view
http://blog.gootum.com/linux-blog/installing-varnish-reverse-proxy-for-ubuntu
http://comments.gmane.org/gmane.comp.web.varnish.misc/1840
a2enmod proxy
a2enmod proxy_ajp
a2enmod disk_cache
a2enmod file_cache
a2enmod mem_cache
a2enmod deflate
vi /etc/apache2/ports.conf
Change port 80 to 8019
vi /etc/apache2/sites-enabled/000-default
Change port 80 to 8019
mkdir -p /var/lib/varnish/pressflow
chown varnish.varnish /var/lib/varnish/pressflow
vi /etc/varnish/default.vcl
Change the backend default values:
backend default {
.host = "hostname";
.port = "8019";
}
Protect against Apache rewrite conflicts:
Uncomment sub vcl_fetch
Add after the if statements:
if (obj.http.location ~ ":8019"){
set obj.http.location = regsub(obj.http.location,"\:8019","");
}
The Apiary Project uses two powerful approaches to user interfacing and data handling. Both provide the ability to process everything on the server side using the full features of drupal. First, pages can be created within drupal itself. Second, pages external to drupal are supported using ajax and jQuery.
These are the primary type of pages the Apiary Admin pages use. They are called by the automatically created drupal node "apiary".
This file will be loaded from drupal, inside the cti_flex_no_title theme thanks to ThemeKey.
Example:
http://hostname/drupal/apiary?ref=groundtruth
This page is handled internally by drupal. The apiary page loads the apiary_project/workflow/include/groundtruth.php file and processes everything created therein.
Currently index (the actual workspace page), comparer, search and workflow use this approach.
An externally loaded page that is populated by ajax calls to the drupal site
Example:
http://hostname/drupal/modules/apiary_project/workflow/index.php
This page is completely outside the scope of drupal allowing full control of user interface customization
In our case, we needed to apply jQuery to the full body of the page and this could not be done with drupal adding its own headers, etc first
Ajax calls can still be made to drupal, since the VIEW_APIARY edit was made to the menu.inc file, allowing drupal to do the work and the page to then be populated
The Task Queue is loaded by the response of the ajax call made to http://hostname/drupal/apiary/workflow_ajax/queue_list/1/0/0
Inside the external file, apiary_project/workflow/index.php, are variables that must already have been set
$drupal_url = "http://hostname/drupal";
$djatoka_url = "http://hostname:8080/";
These are stored using a SMARTY template
These tell the workflow where to make its ajax calls, mainly using jQuery
There are a number of loaded javascript and css files (apiary_project/workflow/assets/js and apiary_project/workflow/assets/css)
The return responses are then presented to the user with the javascript and css styling already loaded
When a jQuery ajax call is made, drupal checks for a current login and if the user has the VIEW_APIARY permission
Once verified, drupal processes the request on the server side
Often the request requires connection to the Fedora Repository, and a call using Islandora must be made (apiary_project/fedora_commons)
We also write to the Fedora Repository using CURL
Other times drupal executes a Solr query to process the response
All this information is used to return html and JSON data in an expected format
reference: http://www.apiaryproject.org/technical
Downloads for the Apiary Project are available at https://www.apiaryproject.org/downloads
This was a main focus since the images we would be analyzing were high-resolution, 200+ MB tif files. The Adore-Djatoka Server is designed to handle large images, but only from the JPEG2000 file format, so image conversion had to be explored. In the end, we found that a combination of ImageMagick and OpenJPEG would allow us to convert the large tif files into JPEG2000. A significant amount of RAM and/or swap space is definitely needed. System requirements can be viewed here.
Choosing how to store the digital objects was a tough decision, as the data we want to keep could be stored in a database, text files or a repository. The ability to track versioning as an object progressed through the various stages of the workflow was a key necessity. Fedora's inherent use of Darwin Core and its multiple datastream support made it an ideal choice, but its overall speed and lack of interfacing were a serious hurdle. A database like MySQL would be faster, but it lacked the already developed tools we needed on hand to get the rest of the project moving forward.
The overall objective of the Apiary Project is to reduce the number of man hours it will take to process Specimens from start to finish. Plus, users in the world today want things fast! The faster the UI the better our objective will be met. Djatoka allowed us to handle large images effectively but sometimes had caching issues. Changing the caching properties was a must.
Varnish, an http accelerator, was also installed on the Apiary Project server. This receives http requests before they ever get to Apache2 to handle. So if an image is requested multiple times, the subsequent requests are returned by Varnish instead of being handled with Apache. Needing to see a new page can become an issue if Varnish never lets Apache process it. So the use of No-Cache headers is a good practice for some pages.
The API tools used to request a datastream and its content from Fedora, parse the data and then return results are anything but lightning fast. We chose to implement Apache-Solr indexing, which is done when an indexed datastream for an ROI or Specimen Image is saved. For comparison, a request to get the Specimen, Image and ROI object information needed to fill the item browser with 40 Specimens loaded in 10 seconds using fedora only. The same 40-specimen request loaded in under 1.5 seconds using Solr to fetch the information.
I chose to create this documentation using OxygenXML given their relationship with BRIT and its ability to transform the xml into html and pdf formats.
I chose to create the code documentation using Doxygen for its ability to lay out the classes and functions in an API-type format.