If you change 'title' to be single-valued, the ExtractingRequestHandler may or may not override it. I remember a go-round on this problem, and the handler has code that explicitly checks for single-valued vs. multi-valued fields.
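
For reference, here is a rough sketch of the kind of extract request being discussed, with the URL built in Python rather than typed into curl. The host, the `/update/extract` handler path, the `id` unique key and the sample values are all assumptions for illustration; only `literal.title` and `stream.url` come from the original question. Whether the literal replaces the Tika-extracted title or is added alongside it depends on the Solr version and on whether the field is single- or multi-valued, so treat this as a starting point for testing, not a fix.

```python
from urllib.parse import urlencode


def build_extract_url(solr_base, doc_id, title, document_url, commit=True):
    """Build a Solr Cell extract URL that passes the CMS title as a literal.

    solr_base    -- base URL of the Solr instance (hypothetical value below)
    doc_id       -- value for the unique key, assumed here to be named 'id'
    title        -- title taken from the CMS, sent as literal.title
    document_url -- URL of the stored document, fetched by Solr via stream.url
    """
    params = [
        ("literal.id", doc_id),
        ("literal.title", title),
        ("stream.url", document_url),
    ]
    if commit:
        params.append(("commit", "true"))
    # urlencode escapes spaces and the nested URL so the query string stays valid
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


url = build_extract_url(
    "http://localhost:8983/solr",              # assumed local Solr instance
    "doc-1",                                   # made-up document id
    "Annual Report",                           # made-up CMS title
    "http://cms.example.com/docs/report.pdf",  # made-up CMS document URL
)
print(url)
```

The resulting URL can be passed to curl as before. A facet query on title after indexing one test document should show whether only the literal value ended up in the field.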
And this may all be different in different Solr versions. The DataImportHandler has Tika support in 3.x and trunk, and the DIH gives a lot more control over which field gets which value.

On Thu, Oct 28, 2010 at 8:53 AM, Tod <listac...@gmail.com> wrote:
> I'm reading my document data from a CMS and indexing it using calls to curl.
> The curl call includes 'stream.url' so Tika will also index the actual
> document pointed to by the CMS' stored url. This works fine.
>
> Presentation side I have a dropdown with the title of all the indexed
> documents such that when a user clicks one of them it opens in a new window.
> Using js, I've been parsing the json returned from Solr to create the
> dropdown. The problem is I can't get the titles sorted alphabetically.
>
> If I use a facet.sort on the title field I get back ALL the sorted titles in
> the facet block, but that doesn't include the associated URL's. A sorted
> query won't work because title is a multivalued field.
>
> The one option I can think of is to make the title single valued so that I
> have a one to one relationship to the returned url. To do that I'd need to
> be able to *not* index the Tika returned values.
>
> If I read right, my understanding was that I could use 'literal.title' in
> the curl call to limit what would be included in the index from Tika. That
> doesn't seem to be working as a test facet query returns more than I have in
> the CMS.
>
> Am I understanding the 'literal.title' processing correctly? Does anybody
> have experience/suggestions on how to handle this?
>
>
> Thanks - Tod
>

--
Lance Norskog
goks...@gmail.com