Re: Overriding Tika's field processing

Lance Norskog Fri, 29 Oct 2010 02:55:17 -0700

If you change 'title' to be single-valued, the ExtractingRequestHandler
may or may not override it. I remember a go-round on this problem. But
the handler has code that explicitly checks for single-valued vs.
multi-valued fields.


And this may all be different in different Solr versions. The
DataImportHandler has Tika support in 3.x and trunk, and the DIH gives
a lot more control over which field gets which value.
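
For reference, here is a rough sketch of the kind of extract request being
discussed, written in Python rather than curl so the parameters are easy to
see. The host, core path, document id, title, and CMS URL are all made-up
placeholders, and the helper name build_extract_url is hypothetical. It only
builds the URL; it does not send anything to Solr.

```python
from urllib.parse import urlencode


def build_extract_url(solr_base, doc_id, title, doc_url):
    """Build a request URL for Solr's /update/extract handler.

    All argument values are placeholders supplied by the caller; nothing
    here is taken from a real installation.
    """
    params = [
        ("literal.id", doc_id),      # literal.<field> sets a field value directly
        ("literal.title", title),    # the title as stored in the CMS
        ("stream.url", doc_url),     # Solr fetches this document and runs Tika on it
        ("commit", "true"),
    ]
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(build_extract_url(
        "http://localhost:8983/solr",
        "doc-1",
        "Annual Report",
        "http://cms.example.com/docs/report.pdf",
    ))
```

As far as I can tell, literal.title only adds a value to the field. It does
not by itself stop Tika's extracted title from also being added when 'title'
is multi-valued, which would be consistent with the extra facet values
described below.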

On Thu, Oct 28, 2010 at 8:53 AM, Tod <listac...@gmail.com> wrote:
> I'm reading my document data from a CMS and indexing it using calls to curl.
>  The curl call includes 'stream.url' so Tika will also index the actual
> document pointed to by the CMS' stored url.  This works fine.
>
> Presentation side I have a dropdown with the title of all the indexed
> documents such that when a user clicks one of them it opens in a new window.
>  Using js, I've been parsing the json returned from Solr to create the
> dropdown.  The problem is I can't get the titles sorted alphabetically.
>
> If I use a facet.sort on the title field I get back ALL the sorted titles in
> the facet block, but that doesn't include the associated URLs.  A sorted
> query won't work because title is a multivalued field.
>
> The one option I can think of is to make the title single valued so that I
> have a one to one relationship to the returned url.  To do that I'd need to
> be able to *not* index the Tika returned values.
>
> If I read right, my understanding was that I could use 'literal.title' in
> the curl call to limit what would be included in the index from Tika.  That
> doesn't seem to be working as a test facet query returns more than I have in
> the CMS.
>
> Am I understanding the 'literal.title' processing correctly?  Does anybody
> have experience/suggestions on how to handle this?
>
>
> Thanks - Tod
>
>



-- 
Lance Norskog
goks...@gmail.com
