Unless there's a regression in the ExtractingRequestHandler, this error should mean that both of the following are true:

A) you have an id field defined in your Solr schema file that's marked as a required field, and
B) you did not specify an ID parameter when you submitted your document to the handler.

If you don't want your Solr docs to have an id field, then mark that field as not required in your schema.

If you *do* want your Solr docs to have a required field called id, then you'll need to specify the ID when you submit your document. One way is to use an ext.literal parameter, more or less like this:

startofURL...&ext.literal.id=13&...restofURL

Alternatively, you can try the field mapping mechanism, which is hopefully described on the wiki page.

Cheers,
Chris

On Thu, Mar 19, 2009 at 3:46 PM, Larry Reid <lcr...@jadesystems.ca> wrote:
> I'm trying to index Word, PDF and other documents with Solr. I installed
> the latest nightly build of Solr on March 17. I followed the
> instructions in the Wiki for ExtractingRequestHandler at
> http://wiki.apache.org/solr/ExtractingRequestHandler#head-c95841f9eda007b6b4e4594ead12a04223cf7b6e.
>
> I have produced text output from Tika in the nightly build directories
> from PDF files.
>
> When I try the suggested test curl commands in the "Getting Started with
> the Solr Example" section of the Wiki page, I get the following. Any idea
> what I've done wrong? Thanks in advance for your help.
>
> $ curl http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text -F "myfi...@tutorial.pdf"
> <html>
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
> <title>Error 500 </title>
> </head>
> <body><h2>HTTP ERROR: 500</h2><pre>org.apache.solr.common.SolrException: Document [null] missing required field: id
>
> org.apache.solr.common.SolrException: org.apache.solr.common.SolrException: Document [null] missing required field: id
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:169)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>     at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>     at org.mortbay.jetty.Server.handle(Server.java:285)
>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>     at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
>     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
>     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
>     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>     at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>     at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
> Caused by: org.apache.solr.common.SolrException: Document [null] missing required field: id
>     at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:292)
>     at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:90)
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:95)
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:157)
>     ... 22 more
> </pre>
> <p>RequestURI=/solr/update/extract</p><p><i><small><a href="http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>
>
> </body>
> </html>
>
>
> Larry Reid
> Principal Consultant, Jade Systems Inc
> Mobile: +1 604.376.8884
> Pragmatic IT Blog | El Blog Technologia Pragmatica | www.jadesystems.ca
>
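
As a rough sketch of the ext.literal.id suggestion in the reply above, the request could be assembled along these lines. The parameter names ext.literal.id, ext.idx.attr and ext.def.fl are taken from this thread; the id value 13, the form field name "upload", and the helper names build_url and post_doc are placeholders for illustration only, and none of this has been checked against a particular nightly build.

```shell
#!/bin/sh
# Assumed endpoint, copied from the curl command quoted in this thread.
SOLR_URL="http://localhost:8983/solr/update/extract"

# Build the request URL for a given document id.
# ext.literal.id supplies a value for the required "id" field.
build_url() {
  printf '%s?ext.literal.id=%s&ext.idx.attr=true&ext.def.fl=text' "$SOLR_URL" "$1"
}

# Post a file to the extracting handler.
# $1 = document id, $2 = path to the file.
# "upload" is a hypothetical form field name, not one taken from the wiki.
post_doc() {
  curl "$(build_url "$1")" -F "upload=@$2"
}

# Example call (needs a running Solr instance, so it is left commented out):
# post_doc 13 tutorial.pdf
```

Quoting the URL through "$(build_url ...)" avoids having to escape the & characters by hand, which is what the backslash in the original command was doing.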