Re: Missing tokens

2010-08-19 Thread paul . moran
I did that and it worked. Thanks very much for your expert assistance, Jan!
Paul
From: Jan Høydahl / Cominvent
To: solr-user@lucene.apache.org
Date: 19/08/2010 16:15
Subject: Re: Missing tokens
Hi, Your bug is right there in the WhitespaceTokenizer, where you see that it
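For readers following the thread: the quoted reply above is cut off, so the exact diagnosis is not shown here. As a rough illustration only (plain Python, not Solr or Lucene code, with `str.split()` standing in as an assumed approximation of a whitespace-only tokenizer), this sketch shows how splitting purely on whitespace can leave a term unfindable when punctuation stays attached to it. The sample text is the summary snippet quoted elsewhere in this thread; the function names are hypothetical.

```python
# Illustrative sketch only: str.split() is an assumed stand-in for a
# whitespace-only tokenizer. This is not Solr's actual implementation.

summary = ("To produce a downloadable file using a format suitable "
           "for OB10. 8-26 Profiles")


def whitespace_tokens(text):
    """Split on runs of whitespace only; punctuation stays attached."""
    return text.split()


def lowercase(tokens):
    """Lowercase every token, as a lowercasing filter would."""
    return [t.lower() for t in tokens]


indexed_terms = lowercase(whitespace_tokens(summary))
query_term = "ob10"

print(indexed_terms)
print(query_term in indexed_terms)  # False: the indexed term is "ob10."
print("ob10." in indexed_terms)     # True: the trailing period was kept
```

If this is what happened here, a tokenizer or filter that drops trailing punctuation is the kind of change to look at; which one fits depends on the field's schema.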

Re: Missing tokens

2010-08-19 Thread Jan Høydahl / Cominvent
> | term position    | 1    |
> | term text        | ob10 |
> | term type        | word |
> | source start,end | 0,4  |

Re: Missing tokens

2010-08-19 Thread paul . moran
I did look at ExtractingRequestHandler a while ago, but I don't think it supported password protected files. Just looked at it again, and it looks like it does now.
From: Jan Høydahl / Cominvent
To: solr-user@lucene.apache.org
Date: 18/08/2010 23:16
Subject: Re:

Re: Missing tokens

2010-08-18 Thread Jan Høydahl / Cominvent
' search term in the summary field of the doc returned.
>
> Here's a snippet of the summary field from that returned doc
>
> To produce a downloadable file using a format suitable
> for OB10. 8-26 Profiles
>
> I'm thinking that the extracted text from pdfbox may ha

Re: Missing tokens

2010-08-18 Thread paul . moran
suitable for OB10. 8-26 Profiles
I'm thinking that the extracted text from PDFBox may have hidden chars that Solr can't parse. However, before I go down that road, I just want to be sure I'm not making schoolboy errors with my Solr setup.
Thanks,
Paul
From: Jan Høydahl / Cominven

Re: Missing tokens

2010-08-18 Thread Jan Høydahl / Cominvent
Hi, Can you share with us how your schema looks for this field? What FieldType? What tokenizer and analyser? How do you parse the PDF document? Before submitting to Solr? With what tool? How do you do the query? Do you get the same results when doing the query from a browser, not SolrJ? -- Jan