Re: using extract handler: data not extracted

2014-01-11 Thread sweety
This is the output I get when indexing through *solrj*; I followed the link you suggested and tried indexing a .doc file. 400 17 org.apache.solr.search.SyntaxError: Cannot parse 'id:C:\solr\document\src\new_index_doc\document_1.doc': Encountered " ":" ": "" at line 1, column 4. Was expecting one o
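
The SyntaxError above is consistent with the id value reaching the query parser unescaped: in `id:C:\solr\...` the second `:` and the backslashes are read as query syntax. Below is a minimal Python sketch of escaping such a value before building the query. The helper name `escape_query_value` and the exact character set are assumptions for illustration, modeled loosely on what SolrJ's `ClientUtils.escapeQueryChars` is meant to do; it is not a confirmed fix for this thread.

```python
# Characters assumed to be query syntax for the Solr/Lucene query parser.
# This set is an illustrative assumption, not taken from the thread.
_SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_value(value: str) -> str:
    """Backslash-escape characters the query parser would treat as syntax."""
    escaped = []
    for ch in value:
        if ch in _SPECIAL_CHARS or ch.isspace():
            escaped.append("\\")
        escaped.append(ch)
    return "".join(escaped)


# The document id from the error message, used as a raw string.
doc_id = r"C:\solr\document\src\new_index_doc\document_1.doc"
query = "id:" + escape_query_value(doc_id)
print(query)
```

Only the value is escaped; the `id:` field prefix keeps its colon so the parser still sees a field query.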

Searching Numeric Data

2014-01-11 Thread Shashi Kant
Hi all, I have a use-case where I would need to search a set of numeric values, using a query set. My business case is 1. I have various Rock samples from various locations {R1...Rn} with multiple measurements like Porosity [255] - an array of values , Conductivity [1028] - also an array of number

Re: using extract handler: data not extracted

2014-01-11 Thread Erick Erickson
You know, what I'd do is one of two things: 1> Set up a remote debugging session for your server and debug it. It's actually quite simple. Get the source code (see http://wiki.apache.org/solr/HowToContribute). I'll give you http://wiki.apache.org/solr/HowToContribute. The sections near the bottom w

Re: using extract handler: data not extracted

2014-01-11 Thread sweety
The logging screen does not show the tika package. I also searched on the net: it requires the log4j and slf4j jars, is that true? Do I need to configure package-level logging? -- View this message in context: http://lucene.472066.n3.nabble.com/using-extract-handler-data-not-extracted-tp4110850

Re: using extract handler: data not extracted

2014-01-11 Thread Andrea Gazzarini
On the admin console you should be able to tune the log at package level On 11 Jan 2014 17:31, "sweety" wrote: > how set finest for tika package?? > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/using-extract-handler-data-not-extracted-tp4110850p4110888.html > Sent

Re: using extract handler: data not extracted

2014-01-11 Thread sweety
How do I set FINEST for the tika package?

Re: using extract handler: data not extracted

2014-01-11 Thread Andrea Gazzarini
Set the Tika packages to FINEST too. On 11 Jan 2014 15:25, "sweety" wrote: > I set the level of extract handler to finest, now the logs are : > INFO: [document] webapp=/solr path=/update/extract > params={commit=true&literal.id=12&debug=true} {add=[12 > (1456944038966984704)],commit=} 0 2631 > Jan 11,

Re: using extract handler: data not extracted

2014-01-11 Thread sweety
I set the level of extract handler to finest, now the logs are : INFO: [document] webapp=/solr path=/update/extract params={commit=true&literal.id=12&debug=true} {add=[12 (1456944038966984704)],commit=} 0 2631 Jan 11, 2014 7:51:57 PM org.apache.solr.servlet.SolrDispatchFilter handleAdminRequest INF

Re: using extract handler: data not extracted

2014-01-11 Thread Andrea Gazzarini
Try setting the extract request handler and the Tika packages to FINEST / DEBUG level, and post the relevant log lines. On 11 Jan 2014 14:38, "sweety" wrote: > Sorry, that my question was not clear. > Initially when indexed pdf files it showed the data within this pdf in the > contents field.as follows:(t

Re: using extract handler: data not extracted

2014-01-11 Thread sweety
Sorry that my question was not clear. Initially, when I indexed PDF files, the data within the PDF showed up in the contents field, as follows (this is the output for the initially indexed documents): Cloud ctured As tale in size as well as complexity. We need a cloud based system that will solve this problem

Re: using extract handler: data not extracted

2014-01-11 Thread Andrea Gazzarini
> Why is it so?? I'm reading your post on my mobile so probably I didn't get the point: other than the date_modified field, what is the problem? Fields with "ignored" prefix? That is perfectly right according to your configuration. The other fields you declared aren't there because they are not

Re: using extract handler: data not extracted

2014-01-11 Thread Erick Erickson
Are you sure date_modified is a meta-data field in the PDF document you're extracting? Best, Erick On Sat, Jan 11, 2014 at 3:00 AM, sweety wrote: > I need to index rich text documents, this is* solrconfig.xml for extract > handler*: > class="solr.extraction.ExtractingRequestHandler" > > > > tr

Re: leading wildcard characters

2014-01-11 Thread Ahmet Arslan
Hi Peter, Yes you are correct. There is no way to disable it.  Weird thing is javadoc says default is false but it is enabled by default in  SolrQueryParserBase.  boolean allowLeadingWildcard = true; http://search-lucene.com/jd/solr/solr-core/org/apache/solr/parser/SolrQueryParserBase.html#setA

using extract handler: data not extracted

2014-01-11 Thread sweety
I need to index rich text documents, this is the *solrconfig.xml for extract handler*: true ignored_ true My *schema.xml* is: But after *indexing using this curl*: curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F"myfile=Coding.pdf" when queried as
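
One thing worth checking in the curl command above: with `-F`, curl uploads a file's contents only when the value is prefixed with `@` (as in `myfile=@Coding.pdf`); without it, the literal text `Coding.pdf` is sent as the field value. Whether that applies here is an assumption, since the archive may have altered the command. As a rough sketch, the same request can be built in Python so that the file bytes are placed in the multipart body explicitly. The function name `build_extract_request` and the placeholder bytes are hypothetical; the URL, the `literal.id` and `commit` parameters, and the `myfile` field name come from the post.

```python
import urllib.parse
import urllib.request
import uuid


def build_extract_request(base_url, doc_id, filename, file_bytes, commit=True):
    """Build (but do not send) a multipart POST to the /update/extract handler."""
    params = {"literal.id": doc_id, "commit": "true" if commit else "false"}
    url = base_url.rstrip("/") + "/update/extract?" + urllib.parse.urlencode(params)

    # Hand-rolled multipart/form-data body with a single file part.
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="myfile"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + file_bytes + tail

    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder bytes stand in for the real PDF; in practice read the file in "rb" mode.
req = build_extract_request(
    "http://localhost:8080/solr/document", "12", "Coding.pdf", b"%PDF-1.4 placeholder"
)
print(req.full_url)
```

Sending the request would be a separate step with `urllib.request.urlopen(req)` against a running Solr instance; the sketch stops before that so it can be inspected without a server.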