Stephen,
As the lead developer on the SobekCM open-source digital repository project, 
and formerly a developer for the University of Florida Libraries, I have 
looked at this quite a bit and learned a few things over time.

I began development working on tracking systems to manage a fairly large-scale 
digitization shop at UF before I was even working on the public repository 
side.  When I arrived (around 1999), metadata was double-keyed several times 
for each item during the tracking and metadata creation process.  It seemed 
obvious to me that we needed a tracking system, and one that would hold 
metadata for each item.  This was fairly easy to do when our metadata was very 
homogeneous and based on simple Dublin Core.  It worked well, and the system 
could easily spit out ready-made METS (and MXF) packages.
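The shape of that generation step can be sketched in a few lines of Python.  
This is a hypothetical illustration, not the actual tracking-system code: the 
field names, the dict-based record, and the single-dmdSec layout are my 
assumptions; only the METS and Dublin Core namespaces are standard.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("mets", METS)
ET.register_namespace("dc", DC)

def dc_to_mets(record):
    """Wrap a dict of simple Dublin Core fields in a minimal METS shell."""
    mets = ET.Element(f"{{{METS}}}mets", OBJID=record["identifier"])
    # One descriptive metadata section holding the DC elements inline.
    dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", ID="DMD1")
    wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", MDTYPE="DC")
    xml_data = ET.SubElement(wrap, f"{{{METS}}}xmlData")
    for field, value in record.items():
        ET.SubElement(xml_data, f"{{{DC}}}{field}").text = value
    return ET.tostring(mets, encoding="unicode")

package = dc_to_mets({"identifier": "UF00000001",
                      "title": "Sample Item",
                      "creator": "Doe, Jane"})
print(package)
```

With homogeneous DC records, the whole export really is this mechanical; the 
trouble described below starts once each material type needs its own schema.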

Over time, I began to experiment with MODS and increasingly started using 
specialized metadata schemas for different types of objects, such as herbarium 
or oral history materials.  I envisioned a tracking system that would hold all 
of this metadata relationally and provide different tabs based on the material 
type.  So, oral history items would have an extra tab exposing the oral history 
metadata, and herbarium items would have a similar special tab.  While 
development of this moved ahead, the entire system seemed unwieldy; adding a 
new schema was laborious, as was even adding a new field.

After several years of this, we began the SobekCM digital repository software 
development.  After that experience I swore off trying to store very complex 
structured data relationally in the database.  (This may also have had to do 
with an IMLS project I worked on that proved the futility of this approach.)  
I generally eschew triple-stores as the basis for library systems in favor of 
relational databases, on the premise that we DO actually understand the basic 
relationships of digital resources to collections and the sub-relations 
there.  We keep the data within METS files with one or more descriptive 
metadata sections, and essentially the database only points to that METS 
file.  For searching, we use a flattened table structure with one row per 
item, much like a Solr/Lucene document, alongside Solr/Lucene itself.
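That arrangement might be sketched like this (hypothetical table and column 
names, and SQLite purely for illustration; SobekCM's actual schema differs): 
one flattened row per item for searching, plus a pointer to the METS file that 
holds the full descriptive metadata.

```python
import sqlite3

# One flattened row per item, much like a Solr/Lucene document:
# searchable fields are denormalized into columns, while the deeply
# structured metadata stays in the METS file on disk.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item (
        item_id   INTEGER PRIMARY KEY,
        bib_id    TEXT NOT NULL,       -- repository identifier
        title     TEXT,                -- flattened for searching
        creator   TEXT,
        pub_date  TEXT,
        mets_path TEXT NOT NULL        -- pointer to the METS file
    )
""")
conn.execute(
    "INSERT INTO item (bib_id, title, creator, pub_date, mets_path) "
    "VALUES (?, ?, ?, ?, ?)",
    ("UF00000001", "Sample Herbarium Sheet", "Doe, Jane",
     "1923", "/data/UF00000001/UF00000001.mets.xml"),
)

# A search touches only the flat table; the METS file is opened
# only when the full record is needed.
row = conn.execute(
    "SELECT bib_id, mets_path FROM item WHERE title LIKE ?",
    ("%Herbarium%",),
).fetchone()
print(row)
```

The point is that no table ever tries to mirror the internal structure of 
MODS or any other schema; new metadata formats cost nothing at the database 
level because the database never sees them.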

My advice is to steer clear of taking beautifully (and deeply) structured 
metadata from MODS, Darwin Core, VRA Core (and who knows what else) and 
trying to create tables and relations for it.

I think you can point some database tools at the schema and have them 
generate the tables for you.  Just doing that will probably dissuade you.  ;)

Mark V. Sullivan
CIO & Application Architect
Sobek Digital Hosting and Consulting, LLC
[email protected]
352-682-9692 (mobile)


________________________________________
From: Code for Libraries <[email protected]> on behalf of Stephen Schor 
<[email protected]>
Sent: Friday, April 17, 2015 1:27 PM
To: [email protected]
Subject: [CODE4LIB] Modeling a repository's objects in a relational database

Hullo.

I'm interested to hear about people's approaches for modeling repository
objects in a normalized, spec-agnostic, _relational_ way while maintaining
the ability to cast objects as various specs (MODS, Dublin Core).

People often resort to storing an object as one specification (the text of
the MODS, for example), and then converting it to other specs using XSLT or
their favorite language, with established mappings / conversions. (
http://www.loc.gov/standards/mods/mods-conversions.html)

Baking a MODS representation into a database text field can introduce
problems with queryability and remediation that I _feel_ would be hedged
by factoring out information from the XML document, and modeling it
in a relational DB.

This is an idea that's been knocking around in my head for a while.
I'd like to hear if people have gone down this road... and I'm especially
eager to hear both success and horror stories about the kind of results
they got.

Stephen
