On May 21, 2007, at 10:52 PM, Gary Browne wrote:
> I'm wondering if anyone has any hints on how to prepare TEI documents for indexing - I was about to write some XSLT but didn't want to reinvent the wheel (unless it's punctured)?
I'm using Ruby to index TEI files, and leveraging the XPathMapper functionality built into the solr-ruby gem.
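To illustrate the general idea of mapping XPath expressions to index fields, here is a minimal sketch in Python using only the standard library. It is not the solr-ruby XPathMapper API. The field names, the XPath expressions, the sample document, and the use of the TEI P5 namespace are all assumptions made for the example; a real project would substitute its own.

```python
import xml.etree.ElementTree as ET

# Assumption: the documents use the TEI P5 namespace. Older or forked
# TEI variants may use no namespace, or different element names.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical mapping: index field name -> XPath into the TEI document.
FIELD_XPATHS = {
    "title": ".//tei:titleStmt/tei:title",
    "author": ".//tei:titleStmt/tei:author",
    "text": ".//tei:body//tei:p",
}


def map_tei(xml_string, field_xpaths=FIELD_XPATHS):
    """Return a dict of field name -> list of text values found by XPath."""
    root = ET.fromstring(xml_string)
    doc = {}
    for field, xpath in field_xpaths.items():
        values = []
        for node in root.findall(xpath, TEI_NS):
            # Flatten nested markup (e.g. <hi>) and normalize whitespace.
            text = " ".join("".join(node.itertext()).split())
            if text:
                values.append(text)
        if values:
            doc[field] = values
    return doc


# Made-up sample document, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt>
    <title>The Blessed Damozel</title>
    <author>Dante Gabriel Rossetti</author>
  </titleStmt></fileDesc></teiHeader>
  <text><body>
    <p>First <hi>paragraph</hi>.</p>
    <p>Second paragraph.</p>
  </body></text>
</TEI>"""

doc = map_tei(SAMPLE)
```

The resulting dict is the kind of field/value structure an indexer would then send to Solr; how it is sent depends on the client library in use.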
The (not so) funny thing about TEI is that every project uses it slightly differently, so whatever solution you come up with is unlikely to be exactly right for other projects (sadly). So the wheel is punctured. For the Rossetti Archive (a heavily forked TEI variant), we use XSLT to generate RDF/XML, which is then fed into a Java-based indexer that uses Sesame's API to parse the RDF and send it to Solr. [The reason we go to RDF first is that it is the convention we've developed for getting data into NINES for all archives, not just ours.]
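The RDF-to-index step can be sketched roughly as follows. This is a simplified Python stand-in, not the Java/Sesame indexer described above: it only handles flat `rdf:Description` elements rather than full RDF/XML, and the Dublin Core properties, URIs, and the choice of local property names as field names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def rdf_to_docs(rdf_xml):
    """Turn each top-level rdf:Description into one field/value dict."""
    root = ET.fromstring(rdf_xml)
    docs = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        # Assumption: the rdf:about URI serves as the document id.
        doc = {"id": desc.get(f"{{{RDF_NS}}}about")}
        for prop in desc:
            # Use the literal text if present, else the rdf:resource URI.
            text = (prop.text or "").strip()
            value = text or prop.get(f"{{{RDF_NS}}}resource")
            if value:
                doc.setdefault(local_name(prop.tag), []).append(value)
        docs.append(doc)
    return docs


# Made-up sample record, for illustration only.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/works/1">
    <dc:title>The Blessed Damozel</dc:title>
    <dc:creator>Dante Gabriel Rossetti</dc:creator>
    <dc:source rdf:resource="http://example.org/tei/1.xml"/>
  </rdf:Description>
</rdf:RDF>"""

docs = rdf_to_docs(SAMPLE_RDF)
```

Going through a shared intermediate format like this means each archive only needs its own TEI-to-RDF transform, while the indexing step stays the same for everyone.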
Erik