Re: Missing tokens

2010-08-19 Thread paul . moran
I did that and it worked. Thanks very much for your expert assistance, Jan! Paul From: Jan Høydahl / Cominvent To: solr-user@lucene.apache.org Date: 19/08/2010 16:15 Subject: Re: Missing tokens Hi, Your bug is right there in the WhitespaceTokenizer, where you see that it
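The quoted diagnosis is cut off in this listing, so the exact cause is not shown. As a general illustration only (not taken from the thread), here is a minimal sketch of one property of whitespace-only tokenization: punctuation stays attached to the token, so `OB10.` and `OB10` become different terms. The class and the `whitespaceTokens` helper are hypothetical, written in plain Java rather than with Lucene's `WhitespaceTokenizer`, and the lowercasing step is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WhitespaceSplitDemo {
    // Hypothetical helper: splits on whitespace and lowercases each token.
    // It does NOT strip punctuation, which is the point of the illustration.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Sample text taken from the summary snippet quoted in this thread.
        List<String> tokens = whitespaceTokens("format suitable for OB10. 8-26 Profiles");
        System.out.println(tokens);
        System.out.println("contains ob10  : " + tokens.contains("ob10"));
        System.out.println("contains ob10. : " + tokens.contains("ob10."));
    }
}
```

If this is what happened here, a query term analyzed to `ob10` would not match an indexed term `ob10.`; the Solr analysis page shows what each side actually produces.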

Re: Missing tokens

2010-08-19 Thread Jan Høydahl / Cominvent
> | term position    | 1    |
> | term text        | ob10 |
> | term type        | word |
> | source start,end | 0,4  |

Re: Missing tokens

2010-08-19 Thread paul . moran
I did look at ExtractingRequestHandler a while ago, but I don't think it supported password-protected files. Just looked at it again, and it looks like it does now. From: Jan Høydahl / Cominvent To: solr-user@lucene.apache.org Date: 18/08/2010 23:16 Subject: Re:

Re: Missing tokens

2010-08-18 Thread Jan Høydahl / Cominvent
' search term in the summary field of the doc returned. > > Here's a snippet of the summary field from that returned doc > > To produce a downloadable file using a format suitable > for OB10. 8-26 Profiles > > I'm thinking that the extracted text from pdfbox may ha

Re: Missing tokens

2010-08-18 Thread paul . moran
suitable for OB10. 8-26 Profiles I'm thinking that the extracted text from pdfbox may have hidden chars that solr can't parse. However, before I go down that road, I just want to be sure I'm not making schoolboy errors with my solr setup. thanks Paul From: Jan Høydahl / Cominven

Re: Missing tokens

2010-08-18 Thread Jan Høydahl / Cominvent
Hi, Can you share with us how your schema looks for this field? What FieldType? What tokenizer and analyser? How do you parse the PDF document? Before submitting to Solr? With what tool? How do you do the query? Do you get the same results when doing the query from a browser, not SolrJ? -- Jan

Missing tokens

2010-08-18 Thread paul . moran
Hi, I'm having a problem with certain search terms not being found when I do a query. I'm using Solrj to index a pdf document, and add the contents to the 'contents' field. If I query the 'contents' field on the SolrInputDocument doc object as below, I get 50k tokens. StringTokenizer to = new Str
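The code in the message above is cut off, so what follows is only a minimal sketch of the kind of check it describes: counting whitespace-delimited tokens in the extracted text with `java.util.StringTokenizer`. The class name, the `countTokens` wrapper, and the sample string are assumptions for illustration. In the original, the text would come from the `contents` field of the `SolrInputDocument`.

```java
import java.util.StringTokenizer;

public class TokenCount {
    // Counts tokens separated by StringTokenizer's default delimiters
    // (space, tab, newline, carriage return, form feed).
    static int countTokens(String text) {
        return new StringTokenizer(text).countTokens();
    }

    public static void main(String[] args) {
        // Stand-in for the extracted PDF text; assumed sample, not the real document.
        String contents = "To produce a downloadable file using a format suitable for OB10.";
        System.out.println("tokens in extracted text: " + countTokens(contents));
    }
}
```

This count only describes the raw extracted string. It says nothing about how many terms Solr actually indexed, which depends on the field's analyzer and on index-time limits.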

Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Hi, I was making the same mistake described at this URL: http://search.lucidimagination.com/search/document/30616a061f8c4bf6/solr_ignoring_maxfieldlength maxFieldLength is set in two places. Earlier I had changed it only in the indexDefaults section; now I have changed it in the mainIndex section as well, and it worked. Thanks Mark & Erick.

Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Hi Erick, Yes, I deleted the index and re-indexed after increasing the value (I have restarted Tomcat as well), but still no luck. I was just wondering: the field that I am trying to index has the complete document text in it, as I am storing that, but I am not getting the complete terms/tokens into t

Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Can anyone suggest/guide me on this. Best Regards, Kranti K K Parisa 2010/1/19 Kranti™ K K Parisa > Hi Mark, > > I changed the value to 1,000,000,000 to just test my luck. > > But unfortunately I am still not getting the index for all Token. > > Please suggest. > > Best Regards, > Kranti K K

Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Erick Erickson
Did you reindex the documents you examined? That limit is applied when you index. Try searching the user list for maxFieldLength; this topic has been discussed many times and you should find a solution. HTH Erick 2010/1/19 Kranti™ K K Parisa > Can anyone suggest/guide me on this. > > Best Rega

Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Hi Mark, I changed the value to 1,000,000,000 to just test my luck. But unfortunately I am still not getting the index for all Token. Please suggest. Best Regards, Kranti K K Parisa 2010/1/19 Kranti™ K K Parisa > Hi Mark, > > As you see my config file contains the value as 10,000 > 1 >

Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Hi Mark, As you see my config file contains the value as 10,000 1 But when I check thru Lukeall jar file I can see the Term count around 3,000. Please suggest. Best Regards, Kranti K K Parisa 2010/1/19 Mark Miller > It limits the number of tokens that will be indexed. > > Kranti™ K K P

Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Mark Miller
It limits the number of tokens that will be indexed. Kranti™ K K Parisa wrote: > Hi Mark, > > I really appreciate the quick reply. > > here is what I have in the config xml > > 32 > 2147483647 > * 1* > 1000 > 1 > > Does this matter with Tokens?? Because the field I am using
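To illustrate the statement that the setting limits the number of tokens indexed, here is a minimal simulation in plain Java. It is a sketch under stated assumptions, not Lucene or Solr code: `indexedTokens` is a hypothetical helper that keeps only the first `maxFieldLength` tokens of a field, and the value 10,000 is taken from the figure mentioned elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldLengthLimitDemo {
    // Hypothetical helper, not a Lucene/Solr API: keeps only the first
    // maxFieldLength tokens, mimicking an index-time token limit.
    static List<String> indexedTokens(List<String> tokens, int maxFieldLength) {
        int kept = Math.min(tokens.size(), maxFieldLength);
        return new ArrayList<>(tokens.subList(0, kept));
    }

    public static void main(String[] args) {
        // Build a stand-in document of 50,000 distinct tokens: t0 .. t49999.
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) {
            tokens.add("t" + i);
        }

        List<String> indexed = indexedTokens(tokens, 10_000);
        System.out.println("tokens in field : " + tokens.size());
        System.out.println("tokens indexed  : " + indexed.size());
        // A token before the cut-off is searchable, one after it is not.
        System.out.println("t9999 indexed   : " + indexed.contains("t9999"));
        System.out.println("t10000 indexed  : " + indexed.contains("t10000"));
    }
}
```

The stored field can still hold the full text while later terms are missing from the index, which matches the symptom described in this thread. Raising the limit only affects documents indexed after the change, so a reindex is needed.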

Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Hi Mark, I really appreciate the quick reply. Here is what I have in the config xml 32 2147483647 * 1* 1000 1 Does this matter with Tokens?? Because the field I am using has the full content of the file (I checked that using Lukeall jar file), however Tokens are n

Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Mark Miller
Kranti™ K K Parisa wrote: > Hi All, > > I have a problem with SOLR indexing. I am trying to index a 96-page PDF file > (using PDFBox to extract the file contents into a String). > Surprisingly, SOLR indexing is not done for the full document: I can't > get all the tokens, even though the fiel

Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Hi All, I have a problem with SOLR indexing. I am trying to index a 96-page PDF file (using PDFBox to extract the file contents into a String). Surprisingly, SOLR indexing is not done for the full document: I can't get all the tokens, even though the field contains the full text of the PDF a