Thanks, I will try using SolrJ. But when I tried to index a .docx file, I got a different error:

SEVERE: null:java.lang.RuntimeException: java.lang.VerifyError: (class: org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: (Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;) Wrong return type in function
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.VerifyError: (class: org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: (Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;) Wrong return type in function
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:59)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
    ... 16 more

I read this solution (http://stackoverflow.com/questions/14696371/how-to-extract-the-text-of-a-ppt-file-with-tika), which says that removing certain jars fixes the error, but none of the jars it mentions are in my classpath. Could jars still be causing the issue?
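
One way to check whether conflicting jars are involved is to list every jar that ships the class named in the error. The sketch below is illustrative and not from the thread: `JarClassFinder`, its `jarsContaining` helper, and the default `WEB-INF/lib` directory are assumed names, so point it at whichever directories your Solr installation actually loads jars from. If more than one jar turns up, mismatched POI versions may be the cause.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper: lists the jars in one directory that contain a given class.
final class JarClassFinder {

    /** Returns the sorted file names of all *.jar files in libDir that contain className. */
    static List<String> jarsContaining(Path libDir, String className) throws IOException {
        // A class a.b.C is stored inside a jar as the entry a/b/C.class
        String entry = className.replace('.', '/') + ".class";
        List<String> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jf = new JarFile(jar.toFile())) {
                    if (jf.getEntry(entry) != null) {
                        hits.add(jar.getFileName().toString());
                    }
                }
            }
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed defaults; pass the real lib directory and class name as arguments.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "WEB-INF/lib");
        String cls = args.length > 1 ? args[1] : "org.apache.poi.extractor.ExtractorFactory";
        for (String jar : jarsContaining(libDir, cls)) {
            System.out.println(jar);
        }
    }
}
```

Run it once per directory that contributes jars to Solr. It only looks at the top level of the directory it is given, so nested directories need separate runs.
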
Thank You.

On Wednesday, October 9, 2013 12:54 PM, sweety shinde <sweetyshind...@yahoo.com> wrote:

I will try using SolrJ. Now I tried indexing .docx files and I get a different error. The logs are:

SEVERE: null:java.lang.RuntimeException: java.lang.VerifyError: (class: org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: (Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;) Wrong return type in function
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.VerifyError: (class: org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: (Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;) Wrong return type in function
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:59)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
    ... 16 more

But do jars cause these errors? I read one solution which said that removing a few jars from the classpath may solve the errors, but those jars are not present in my classpath (link to the solution: http://stackoverflow.com/questions/14696371/how-to-extract-the-text-of-a-ppt-file-with-tika).

Thank You.

On Wednesday, October 9, 2013 6:05 AM, Erick Erickson [via Lucene] <ml-node+s472066n4094231...@n3.nabble.com> wrote:

Hmmm, that is odd, the glob dynamicField should pick this up. Not quite sure what's going on.
You can parse the file via Tika yourself and look at what's in there. It's a relatively simple SolrJ program; here's a sample:
http://searchhub.org/2012/02/14/indexing-with-solrj/

Best,
Erick

On Tue, Oct 8, 2013 at 4:15 PM, sweety <[hidden email]> wrote:
> This is my new schema.xml:
> <schema name="documents">
> <fields>
> <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
> <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
> <field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
> <field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
> <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
> <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
> <field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
> <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
> <dynamicField name="ignored_*" type="string" indexed="false" stored="true" multiValued="true"/>
> <dynamicField name="*" type="ignored" multiValued="true" />
> <copyfield source="id" dest="text" />
> <copyfield source="author" dest="text" />
> </fields>
> <types>
> <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
> <fieldType name="integer" class="solr.IntField" />
> <fieldType name="long" class="solr.LongField" />
> <fieldType name="string" class="solr.StrField" />
> <fieldType name="text" class="solr.TextField" />
> </types>
> <uniqueKey>id</uniqueKey>
> </schema>
> I still get the same error.
>
> ________________________________
> From: Erick Erickson [via Lucene] <[hidden email]>
> To: sweety <[hidden email]>
> Sent: Tuesday, October 8, 2013 7:16 AM
> Subject: Re: no such field error:smaller big block size details while indexing doc files
>
> Well, one of the attributes parsed out of, probably the
> meta-information associated with one of your structured
> docs is SMALLER_BIG_BLOCK_SIZE_DETAILS and
> Solr Cell is faithfully sending that to your index. If you
> want to throw all these in the bit bucket, try defining
> a true catch-all field that ignores things, like this.
> <dynamicField name="*" type="ignored" multiValued="true" />
>
> Best,
> Erick
>
> On Mon, Oct 7, 2013 at 8:03 AM, sweety <[hidden email]> wrote:
>
>> I'm trying to index .doc, .docx, and pdf files.
>> I'm using this url:
>> curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@complex.doc"
>>
>> This is the error I get:
>> Oct 07, 2013 5:02:18 PM org.apache.solr.common.SolrException log
>> SEVERE: null:java.lang.RuntimeException: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS
>>     at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
>>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
>>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
>>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
>>     at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
>>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
>>     at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
>>     at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
>>     at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>     at java.lang.Thread.run(Unknown Source)
>> Caused by: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS
>>     at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:93)
>>     at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:190)
>>     at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:184)
>>     at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:376)
>>     at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:165)
>>     at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
>>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
>>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
>>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>>     ... 16 more
>>
>> Also, using the same type of url, txt, mp3 and pdf files are indexed successfully.
>> (curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@abc.txt")
>>
>> Schema.xml is:
>> <schema name="documents">
>> <fields>
>> <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
>> <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
>> <field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
>> <field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
>> <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
>> <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
>> <field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
>> <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
>>
>> <dynamicField name="ignored_*" type="string" indexed="false" stored="true" multiValued="true"/>
>> <copyfield source="id" dest="text" />
>> <copyfield source="author" dest="text" />
>> </fields>
>>
>> <types>
>> <fieldType name="integer" class="solr.IntField" />
>> <fieldType name="long" class="solr.LongField" />
>> <fieldType name="string" class="solr.StrField" />
>> <fieldType name="text" class="solr.TextField" />
>> <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
>> </types>
>> <uniqueKey>id</uniqueKey>
>> </schema>
>>
>> I'm not able to understand what kind of error this is, please help me.
>> >> >> >> >> >> >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/no-such-field-error-smaller-big-block-size-details-while-indexing-doc-files-tp4093883.html >> Sent from the Solr - User mailing list archive at Nabble.com. > > > ________________________________ > > If you reply to this email, your message will be added to the discussion > below:http://lucene.472066.n3.nabble.com/no-such-field-error-smaller-big-block-size-details-while-indexing-doc-files-tp4093883p4094013.html > To unsubscribe from no such field error:smaller big block size details while > indexing doc files, click here. > NAML > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/no-such-field-error-smaller-big-block-size-details-while-indexing-doc-files-tp4093883p4094166.html > Sent from the Solr - User mailing list archive at Nabble.com. ________________________________ If you reply to this email, your message will be added to the discussion below:http://lucene.472066.n3.nabble.com/no-such-field-error-smaller-big-block-size-details-while-indexing-doc-files-tp4093883p4094231.html To unsubscribe from no such field error:smaller big block size details while indexing doc files, click here. NAML -- View this message in context: http://lucene.472066.n3.nabble.com/no-such-field-error-smaller-big-block-size-details-while-indexing-doc-files-tp4093883p4094303.html Sent from the Solr - User mailing list archive at Nabble.com.