Re: Indexing and ExtractingRequestHandler

Harry Hochheiser Wed, 11 Aug 2010 20:09:02 -0700

Thanks.

I've run the Tika command line on the Excel file, and its output
contains content that doesn't appear to be indexed. I've also tried
parsing the Excel file with Tika and then indexing the resulting text
through the extracting request handler, and that doesn't work either.
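
One way to make that comparison concrete is sketched below in Python. This
is a rough illustration under assumptions, not a tested recipe: the base URL
and the /update/extract handler path are guesses based on the example Solr
configuration, and both function names are made up for this message. It
relies on the handler's extractOnly and extractFormat parameters, which ask
Solr to return Tika's output instead of indexing it. The term check uses
plain whitespace splitting and lowercasing, which will not match whatever
analyzer the schema actually applies.

```python
from urllib.parse import urlencode


def extract_only_url(base_url, handler="/update/extract"):
    """Build a URL that asks the extracting handler to return Tika's
    output rather than index it.

    The handler path is an assumption (it matches the example config);
    adjust it to whatever solrconfig.xml actually registers.
    """
    params = {"extractOnly": "true", "extractFormat": "text", "wt": "json"}
    return base_url.rstrip("/") + handler + "?" + urlencode(params)


def missing_terms(extracted_text, indexed_terms):
    """Return tokens present in the extracted text but absent from the
    supplied list of indexed terms.

    Tokenization here is whitespace splitting plus lowercasing, a crude
    stand-in for the real analysis chain.
    """
    indexed = {term.lower() for term in indexed_terms}
    tokens = {token.lower() for token in extracted_text.split()}
    return sorted(tokens - indexed)
```

The spreadsheet would be POSTed to the generated URL as a file upload, and
the returned text fed to missing_terms together with a term list pulled from
the index by some other means.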


As far as Luke goes, I've built it from scratch. It still bombs. Is it
possible that it's not compatible with Lucene builds based on trunk?

thanks,


-harry

On Wed, Aug 11, 2010 at 6:48 PM, Jan Høydahl / Cominvent
<jan....@cominvent.com> wrote:
> Hi,
>
> You can try the Tika command line to parse your Excel file; then you will see
> the exact textual output from it, which will be indexed into Solr, and thus
> inspect whether something is missing.
>
> Are you sure you use a version of Luke which supports your version of Lucene?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Training in Europe - www.solrtraining.com
>
> On 11. aug. 2010, at 23.33, Harry Hochheiser wrote:
>
>> I'm trying to use Solr to index the contents of an Excel file, using
>> the ExtractingRequestHandler (CSV handler won't work for me - I need
>> to consider the whole spreadsheet as one document), and I'm running
>> into some trouble.
>>
>> Is there any way to see what's going on during the indexing process?
>> I'm concerned that I may be losing some terms, and I'd like to see if
>> I can snoop on the terms that are added to the index as they go along.
>> How might I do this?
>>
>> Barring that, how can I inspect the index after the fact? I have tried to
>> use Luke to see what's in the index, but I get an error: "Unknown
>> format version -10". Is it possible to get Luke to work?
>>
>> My Solr build is straight out of SVN.
>>
>> thanks,
>>
>> harry
>
>
