I did that and it worked.
Thanks very much for your expert assistance, Jan!
Paul
From: Jan Høydahl / Cominvent
To: solr-user@lucene.apache.org
Date: 19/08/2010 16:15
Subject: Re: Missing tokens
Hi,
Your bug is right there in the WhitespaceTokenizer, where you see that it
> | term position    | 1    |
> |------------------+------|
> | term text        | ob10 |
> |------------------+------|
> | term type        | word |
> |------------------+------|
> | source start,end | 0,4  |
>
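For illustration only, here is a minimal Python sketch of the kind of mismatch a whitespace-only tokenizer can produce. It is not Solr code: `whitespace_tokens` is a hypothetical helper that merely mimics splitting on whitespace and lowercasing. It assumes the indexed text contains `OB10.` with a trailing period, as in the summary snippet quoted in this thread, and that the query term is analysed to `ob10`, as in the table above.

```python
def whitespace_tokens(text):
    """Split on whitespace only and lowercase each piece.

    Hypothetical stand-in for a whitespace tokenizer followed by a
    lowercase filter; punctuation stays attached to the token.
    """
    return [tok.lower() for tok in text.split()]


# Assumed indexed text, taken from the summary snippet in this thread.
indexed = whitespace_tokens("for OB10. 8-26 Profiles")
# Assumed query term.
query = whitespace_tokens("ob10")

print(indexed)              # ['for', 'ob10.', '8-26', 'profiles']
print(query)                # ['ob10']
print(query[0] in indexed)  # False: 'ob10' is not the same token as 'ob10.'
```

Under these assumptions the indexed token keeps its trailing period, so an exact term match on `ob10` finds nothing.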
I did look at ExtractingRequestHandler a while ago, but I don't think it
supported password protected files. Just looked at it again, and it looks
like it does now.
From: Jan Høydahl / Cominvent
To: solr-user@lucene.apache.org
Date: 18/08/2010 23:16
Subject: Re:
search term in the summary field of the doc returned.
>
> Here's a snippet of the summary field from that returned doc
>
> To produce a downloadable file using a format suitable
> for OB10. 8-26 Profiles
>
I'm thinking that the extracted text from pdfbox may have hidden chars that
solr can't parse. However, before I go down that road, I just want to be
sure I'm not making schoolboy errors with my solr setup.
thanks
Paul
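One way to check the hidden-character theory is to list every character in the extracted text that is not plain printable ASCII. The Python sketch below is only an illustration under stated assumptions: `suspicious_chars` is a hypothetical helper, and the sample string with a soft hyphen is invented for the example, not taken from the actual PDF.

```python
import unicodedata


def suspicious_chars(text):
    """Return (index, code point, Unicode name) for each character that is
    neither printable ASCII nor ordinary whitespace (space, tab, newline)."""
    found = []
    for i, ch in enumerate(text):
        if ch in " \t\n\r":
            continue
        if not (32 < ord(ch) < 127):
            name = unicodedata.name(ch, "<unnamed>")
            found.append((i, f"U+{ord(ch):04X}", name))
    return found


# Hypothetical extracted text containing an invisible soft hyphen.
sample = "for OB10\u00ad. 8-26 Profiles"
for entry in suspicious_chars(sample):
    print(entry)  # (8, 'U+00AD', 'SOFT HYPHEN')
```

If the list comes back empty for the real extracted text, hidden characters are unlikely to be the cause and the analysis chain is the next thing to check.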
From: Jan Høydahl / Cominvent
Hi,
Can you share with us how your schema looks for this field? What FieldType?
What tokenizer and analyser?
How do you parse the PDF document? Before submitting to Solr? With what tool?
How do you do the query? Do you get the same results when doing the query from
a browser, not SolrJ?
--
Jan