I'm reading my document data from a CMS and indexing it with calls to
curl. Each curl call includes 'stream.url' so that Tika also indexes the
actual document that the CMS's stored URL points to. This works fine.
On the presentation side I have a dropdown listing the titles of all the
indexed documents; when a user clicks one, it opens in a new window.
I've been using JavaScript to parse the JSON returned from Solr and
build the dropdown. The problem is that I can't get the titles sorted
alphabetically.
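
To make the setup concrete, here is a minimal sketch of the kind of
parsing involved. It is not my exact code: the field names "title" and
"url", the sample documents, and the helper name toDropdownEntries are
all assumptions for illustration. It shows how a multivalued title can
yield more than one label for a single URL.

```javascript
// Hypothetical Solr JSON response; "title" and "url" are assumed field names.
const sampleResponse = {
  response: {
    docs: [
      {
        url: "http://cms.example.com/docs/budget.doc",
        // One value from the CMS, one (assumed) value extracted by Tika.
        title: ["Budget Report", "Microsoft Word - budget_v3.doc"]
      },
      {
        url: "http://cms.example.com/docs/summary.pdf",
        title: ["Annual Summary"]
      }
    ]
  }
};

// Hypothetical helper: emit one dropdown entry per title value.
function toDropdownEntries(solrJson) {
  const entries = [];
  for (const doc of solrJson.response.docs) {
    // A multivalued field arrives as an array; tolerate a plain string too.
    const titles = Array.isArray(doc.title) ? doc.title : [doc.title];
    for (const label of titles) {
      entries.push({ label: label, url: doc.url });
    }
  }
  return entries;
}

const entries = toDropdownEntries(sampleResponse);
console.log(entries);
```

With the sample data above, two documents produce three entries, which
is the mismatch between titles and URLs that I'm trying to avoid.
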
If I facet on the title field with facet.sort, I get back ALL the sorted
titles in the facet block, but the facet block doesn't include the
associated URLs. A sorted query won't work because title is a
multivalued field. The one option I can think of is to make title
single-valued so that each title has a one-to-one relationship with the
returned URL. To do that I'd need to be able to *not* index the values
that Tika returns.
My understanding was that I could use 'literal.title' in the curl call
to limit what Tika contributes to the index. That doesn't seem to be
working: a test facet query returns more titles than I have in the CMS.
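
For reference, the request has roughly the following shape. This is a
sketch in JavaScript rather than the actual curl command; the host, the
/update/extract handler path, the 'literal.id' and 'commit' parameters,
and the helper name buildExtractUrl are assumptions. Only 'stream.url'
and 'literal.title' come from what I described above.

```javascript
// Hypothetical helper: build the extract request URL for one CMS record.
function buildExtractUrl(solrBase, record) {
  const params = new URLSearchParams({
    "literal.id": record.id,       // assumed: unique key supplied from the CMS
    "literal.title": record.title, // title as stored in the CMS
    "stream.url": record.url,      // document for Tika to fetch and parse
    "commit": "true"               // assumed: commit after each document
  });
  // "/update/extract" is assumed to be where the extracting handler is mapped.
  return solrBase + "/update/extract?" + params.toString();
}

// Hypothetical record and Solr location.
const extractUrl = buildExtractUrl("http://localhost:8983/solr", {
  id: "cms-42",
  title: "Budget Report",
  url: "http://cms.example.com/docs/budget.doc"
});
console.log(extractUrl);
```

URLSearchParams handles the escaping of the nested URL, which is easy to
get wrong when assembling the query string by hand.
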
Am I understanding the 'literal.title' processing correctly? Does
anybody have experience/suggestions on how to handle this?
Thanks - Tod
- Overriding Tika's field processing Tod