I'm reading my document data from a CMS and indexing it using calls to curl. The curl call includes 'stream.url' so Tika will also index the actual document pointed to by the URL stored in the CMS. This works fine.

On the presentation side, I have a dropdown listing the titles of all the indexed documents; when a user clicks one, the document opens in a new window. I've been using JavaScript to parse the JSON returned from Solr and build the dropdown. The problem is that I can't get the titles sorted alphabetically.

If I use facet.sort on the title field, I get back ALL the sorted titles in the facet block, but that doesn't include the associated URLs. A sorted query won't work because title is a multivalued field.
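
For illustration, here is a minimal sketch of sorting on the client instead, which keeps each title paired with its URL. It assumes the query response follows the usual Solr JSON layout (response.docs), that each doc carries a multivalued 'title' and a single 'url' field, and that the first title value is the one wanted for display. The function names and the sample data are hypothetical, and which title value comes first depends on how the document was indexed.

```javascript
// Sketch only: field names `title` and `url` and the sample data are assumptions.

// Pick a single display title from a possibly multivalued field.
function firstTitle(doc) {
  const t = doc.title;
  if (Array.isArray(t)) return t.length > 0 ? String(t[0]) : "";
  return t == null ? "" : String(t);
}

// Turn a Solr JSON response into { title, url } pairs sorted by title,
// ignoring case, so each dropdown entry keeps its own URL.
function sortedDropdownEntries(solrResponse) {
  const docs = (solrResponse.response && solrResponse.response.docs) || [];
  return docs
    .map((doc) => ({ title: firstTitle(doc), url: doc.url }))
    .filter((entry) => entry.title !== "" && entry.url)
    .sort((a, b) =>
      a.title.localeCompare(b.title, undefined, { sensitivity: "base" })
    );
}

// Hypothetical response with multivalued titles.
const sample = {
  response: {
    docs: [
      { title: ["Zebra report", "Untitled"], url: "http://cms.example.com/z.pdf" },
      { title: ["alpha guide"], url: "http://cms.example.com/a.pdf" },
      { title: ["Beta notes", "Draft"], url: "http://cms.example.com/b.pdf" },
    ],
  },
};

console.log(sortedDropdownEntries(sample));
```

This only sidesteps the sorting problem in the browser; it does not change what ends up in the title field of the index.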

The one option I can think of is to make the title single-valued so that each title has a one-to-one relationship with the returned URL. To do that, I'd need to be able to *not* index the Tika-returned values.

My understanding was that I could use 'literal.title' in the curl call to limit what would be included in the index from Tika. That doesn't seem to be working: a test facet query returns more values than I have in the CMS.
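
To make the request concrete, here is a sketch of how such an extract URL might be assembled. 'stream.url' and 'literal.title' come from the description above; the host, the '/update/extract' handler path, the 'literal.id' and 'commit' parameters, and all the values are assumptions for illustration, not the actual setup. The sketch only shows the parameters being sent and does not settle whether 'literal.title' suppresses the title Tika extracts.

```javascript
// Sketch only: base URL, handler path, and document values are hypothetical.

// Build the URL for one extract request, passing the CMS title as a literal
// and the CMS-stored document URL as the remote stream for Tika to fetch.
function buildExtractUrl(solrBase, doc) {
  const params = new URLSearchParams();
  params.set("literal.id", doc.id);       // assumed unique key field
  params.set("literal.title", doc.title); // title taken from the CMS
  params.set("stream.url", doc.url);      // document for Tika to parse
  params.set("commit", "true");
  return `${solrBase}/update/extract?${params.toString()}`;
}

const extractUrl = buildExtractUrl("http://localhost:8983/solr", {
  id: "doc-1",
  title: "Annual Report",
  url: "http://cms.example.com/docs/annual.pdf",
});

console.log(extractUrl);
```

The resulting string can be passed to curl as-is, since URLSearchParams takes care of the URL encoding.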

Am I understanding the 'literal.title' processing correctly? Does anybody have experience/suggestions on how to handle this?


Thanks - Tod
