Re: using extract handler: data not extracted

2014-01-12 Thread sweety
I am working on Windows 7 -- View this message in context: http://lucene.472066.n3.nabble.com/using-extract-handler-data-not-extracted-tp4110850p4110993.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: using extract handler: data not extracted

2014-01-12 Thread Andrea Gazzarini
Not really sure... the issue seems related to text extraction, so the first suspect is Tika; Solr is playing a secondary role here. If Tika is extracting the text correctly, there should be an error or a warning on the Solr side (an exception, a "content field too long" warning, or something like that). What about th

Re: using extract handler: data not extracted

2014-01-12 Thread sweety
Sorry for the mistake. I'm using Solr 4.2, which has tika-1.3. So now, java -jar tika-app-1.3.jar -v C:\Coding.pdf parses the PDF document without any error or message. Also, java -jar tika-app-1.3.jar -t C:\Coding.pdf shows the entire document. Which means there is no problem in Tika, right?
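To repeat the command-line check above for several files, the two tika-app invocations can be wrapped in a small script. This is only a sketch: the helper names tika_command and looks_extracted are hypothetical, and the only flags used are the -v and -t flags already shown in the message.

```python
# Flags taken from the commands in the message above:
# -v (verbose) and -t (plain-text output).
TIKA_MODES = {"verbose": "-v", "text": "-t"}


def tika_command(jar, document, mode="text"):
    """Build the argument list for running tika-app on one document.

    `jar` and `document` are passed through unchanged; this helper is
    hypothetical and only assembles the command, it does not run it.
    """
    if mode not in TIKA_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["java", "-jar", jar, TIKA_MODES[mode], document]


def looks_extracted(text, min_chars=1):
    """Return True if the extracted text contains at least min_chars
    non-whitespace-trimmed characters (a crude 'not empty' check)."""
    return len(text.strip()) >= min_chars
```

The list can be handed to subprocess.run(cmd, capture_output=True, text=True), and the captured stdout passed to looks_extracted. If the text comes out fine here but the content field is still empty in Solr, the problem is more likely in the Solr-side configuration than in Tika itself.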

Re: using extract handler: data not extracted

2014-01-12 Thread sweety
Sorry for the mistake. I'm using Solr 4.2, which has tika-1.3. So now, java -jar tika-app-1.3.jar -v C:\Coding.pdf parses the PDF document without any error or message. Also, java -jar tika-app-1.4.jar -t C:\Cloud.docx shows the entire document. Which means there is no problem in Tika, right?

Re: using extract handler: data not extracted

2014-01-12 Thread Andrea Gazzarini
Please stay on (or clarify) your issue: in the first example you told us the problem is with "Coding.pdf" file. What is that Cloud.docx? Why don't you try with Coding.pdf? And what is the result of the extraction from command line with Coding.pdf and the same tika version that is in your SOLR? I w

Re: using extract handler: data not extracted

2014-01-12 Thread sweety
Through the command line (java -jar tika-app-1.4.jar -v C:\Cloud.docx), Apache Tika is able to parse .docx files, so can I use this tika-app-1.4.jar in Solr? How do I do that?

Re: using extract handler: data not extracted

2014-01-12 Thread Andrea Gazzarini
A premise: as Erick explained, most probably this issue has nothing to do with Solr. So these are the options that, in my mind, you have:

OPTION #1: Using Tika as a command-line tool
a) Download Tika. Make sure it is the same version as the one in your Solr.
b) Read here: http://tika.apache.org/1.4/gettingstarted.h

Re: using extract handler: data not extracted

2014-01-12 Thread sweety
Yes, all 3 points are right. Let me solve the first one: there is some error in Tika-level indexing, so I need to debug at the Tika level, right? But how do I do that? The Solr admin does not show package-wise logging.

Re: using extract handler: data not extracted

2014-01-12 Thread Andrea Gazzarini
Wait, don't confuse things... they should be three different issues:
1. With curl, indexing happens but leaves the content field empty, so probably something occurs at the Tika level during the text extraction. That's the reason why I told you about the Tika logging.
2. With SolrJ, indexing doesn't happen

Re: using extract handler: data not extracted

2014-01-11 Thread sweety
This is the output I get when indexing through SolrJ; I followed the link you suggested and tried indexing a .doc file. 400 17 org.apache.solr.search.SyntaxError: Cannot parse 'id:C:\solr\document\src\new_index_doc\document_1.doc': Encountered " ":" ": "" at line 1, column 4. Was expecting one o
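The parser stops at column 4, which is the second colon in 'id:C:\...': the file path is being used as a query value without escaping. A minimal sketch of the usual fix follows, assuming the id really is the raw Windows path. The helper names are hypothetical; SolrJ ships a comparable ClientUtils.escapeQueryChars method, and the character set below is meant to mirror the query-syntax special characters rather than to be an authoritative copy of it.

```python
# Characters treated as special by the Lucene/Solr query syntax
# (assumed set; whitespace is handled separately below).
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(value):
    """Prefix every special character or whitespace with a backslash."""
    out = []
    for ch in value:
        if ch in SPECIAL_CHARS or ch.isspace():
            out.append("\\")
        out.append(ch)
    return "".join(out)


def id_query(doc_id):
    """Build an id:<value> query with the value escaped."""
    return "id:" + escape_query_chars(doc_id)
```

With this, the colon and every backslash in the path are escaped before the query string reaches the parser. Using a shorter id that contains no special characters (for example the bare file name) is another way to avoid the error.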

Re: using extract handler: data not extracted

2014-01-11 Thread Erick Erickson
You know, what I'd do is one of two things: 1> Set up a remote debugging session for your server and debug it. It's actually quite simple. Get the source code (see http://wiki.apache.org/solr/HowToContribute). I'll give you http://wiki.apache.org/solr/HowToContribute. The sections near the bottom w

Re: using extract handler: data not extracted

2014-01-11 Thread sweety
The logging screen does not show the Tika package. Also, I searched on the net and it says that it requires the log4j and slf4j jars; is that true? Do I need to do some configuration for package-level logging?

Re: using extract handler: data not extracted

2014-01-11 Thread Andrea Gazzarini
On the admin console you should be able to tune the log at package level.

On 11 Jan 2014 17:31, "sweety" wrote:
> how set finest for tika package??

Re: using extract handler: data not extracted

2014-01-11 Thread sweety
How do I set FINEST for the Tika package?

Re: using extract handler: data not extracted

2014-01-11 Thread Andrea Gazzarini
Set the Tika packages to FINEST too.

On 11 Jan 2014 15:25, "sweety" wrote:
> I set the level of extract handler to finest, now the logs are :
> INFO: [document] webapp=/solr path=/update/extract
> params={commit=true&literal.id=12&debug=true} {add=[12
> (1456944038966984704)],commit=} 0 2631
> Jan 11,

Re: using extract handler: data not extracted

2014-01-11 Thread sweety
I set the level of the extract handler to FINEST; now the logs are:
INFO: [document] webapp=/solr path=/update/extract params={commit=true&literal.id=12&debug=true} {add=[12 (1456944038966984704)],commit=} 0 2631
Jan 11, 2014 7:51:57 PM org.apache.solr.servlet.SolrDispatchFilter handleAdminRequest
INF
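The logged request (path=/update/extract with literal.id and commit parameters on the "document" core) can be reproduced from a script when testing different files. The sketch below only builds the URL. The base address http://localhost:8983/solr is an assumption, since the thread never states the host or port, and extract_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def extract_url(base, core, doc_id, commit=True):
    """Build the /update/extract URL for one document.

    `base` is the Solr root URL (assumed, not given in the thread),
    `core` is the core name seen in the log ("document"), and
    `doc_id` is sent as literal.id. urlencode percent-encodes the id,
    so ids containing ':' or '\\' are safe to pass.
    """
    params = {"literal.id": doc_id, "commit": "true" if commit else "false"}
    return f"{base}/{core}/update/extract?{urlencode(params)}"
```

The file itself would still be sent as the request body or as a multipart upload, for example with curl as discussed in the thread.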

Re: using extract handler: data not extracted

2014-01-11 Thread Andrea Gazzarini
Try setting the extract request handler and the Tika packages to FINEST / DEBUG level, and post the relevant log lines.

On 11 Jan 2014 14:38, "sweety" wrote:
> Sorry, that my question was not clear.
> Initially when indexed pdf files it showed the data within this pdf in the
> contents field.as follows:(t

Re: using extract handler: data not extracted

2014-01-11 Thread sweety
Sorry that my question was not clear. Initially, when I indexed PDF files, the contents field showed the data within the PDF, as follows (this is the output for the initially indexed documents): Cloud ctured As tale in size as well as complexity. We need a cloud based system that will solve this problem

Re: using extract handler: data not extracted

2014-01-11 Thread Andrea Gazzarini
> Why is it so??

I'm reading your post on my mobile, so probably I didn't get the point: other than the date_modified field, what is the problem? Fields with the "ignored" prefix? That is perfectly right according to your configuration. The other fields you declared aren't there because they are not

Re: using extract handler: data not extracted

2014-01-11 Thread Erick Erickson
Are you sure date_modified is a meta-data field in the PDF document you're extracting?

Best,
Erick

On Sat, Jan 11, 2014 at 3:00 AM, sweety wrote:
> I need to index rich text documents, this is solrconfig.xml for extract
> handler:
> class="solr.extraction.ExtractingRequestHandler" >
>
> tr