Doh! I get it. Ignore my questions in the previous e-mail. The XML files
have the id in them. For Word/Excel/PDF etc., it's up to the client (a
crawler or whatever) to create a unique id if I want one.
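
In case it helps anyone else following the thread, here is a minimal
sketch of how a client might do that in Python. It is only an
illustration under stated assumptions: the endpoint and the ext.idx.attr
and ext.def.fl parameters are copied from my curl command quoted below,
ext.literal.id is the parameter Chris suggests, and the helper names
(make_doc_id, build_extract_url) and the choice of a SHA-1 hash of the
file path as the id are hypothetical, not anything Solr prescribes.

```python
import hashlib
from urllib.parse import urlencode

# Assumed endpoint, copied from the curl example quoted later in this thread.
EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def make_doc_id(path: str) -> str:
    """Derive a stable id from the file's path (hypothetical scheme)."""
    return hashlib.sha1(path.encode("utf-8")).hexdigest()


def build_extract_url(path: str) -> str:
    """Build the extract URL, passing the id as ext.literal.id."""
    params = {
        "ext.literal.id": make_doc_id(path),
        "ext.idx.attr": "true",
        "ext.def.fl": "text",
    }
    return EXTRACT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # The file itself would still be sent as a multipart upload to this URL.
    print(build_extract_url("/docs/tutorial.pdf"))
```

Hashing the path gives the same id every time the same file is
submitted, so, assuming id is the schema's unique key, re-indexing a
file should replace the earlier document instead of adding a second one.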

Thanks again for pointing me in the right direction. I'm really
impressed with how easy it's been for a non-Java/web app guy to get Solr
going. Excellent work!

On Thu, 2009-03-19 at 16:51 -0700, Chris Harris wrote:

> Unless there's a regression in the ExtractingRequestHandler, this
> is most likely happening because both
> 
> A) you have an id field defined in your solr schema file that's marked
> as a required field
> 
> and
> 
> B) you did not specify an ID parameter when you submitted your
> document to the handler.
> 
> If you don't want your Solr docs to have an id field, then mark that
> field as not required in your schema.
> 
> If you *do* want your Solr docs to have a required field called id,
> then you'll need to specify the ID when you submit your document. One
> way is using an ext.literal parameter, more or less like this:
> 
>     startofURL...&ext.literal.id=13&...restofURL
> 
> Alternatively, you can try the field mapping mechanism, which is
> hopefully described on the wiki page.
> 
> Cheers,
> Chris
> 
> On Thu, Mar 19, 2009 at 3:46 PM, Larry Reid <lcr...@jadesystems.ca> wrote:
> > I'm trying to index Word, PDF and other documents with Solr. I installed
> > the latest nightly build of Solr on March 17. I followed the
> > instructions in the Wiki for ExtractingRequestHandler at
> > http://wiki.apache.org/solr/ExtractingRequestHandler#head-c95841f9eda007b6b4e4594ead12a04223cf7b6e.
> >
> > I have produced text output from PDF files with Tika in the nightly
> > build directories.
> >
> > When I try the suggested test curl commands in the "Getting Started with
> > the Solr Example" section of the Wiki page, I get the following. Any idea
> > what I've done wrong? Thanks in advance for your help.
> >
> > $ curl http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text -F "myfi...@tutorial.pdf"
> > <html>
> > <head>
> > <meta http-equiv="Content-Type" content="text/html;
> > charset=ISO-8859-1"/>
> > <title>Error 500 </title>
> > </head>
> > <body><h2>HTTP ERROR: 500</h2><pre>org.apache.solr.common.SolrException:
> > Document [null] missing required field: id
> >
> > org.apache.solr.common.SolrException: org.apache.solr.common.SolrException: Document [null] missing required field: id
> >        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:169)
> >        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
> >        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> >        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
> >        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
> >        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
> >        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
> >        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
> >        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> >        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
> >        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
> >        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
> >        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> >        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
> >        at org.mortbay.jetty.Server.handle(Server.java:285)
> >        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
> >        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
> >        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
> >        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
> >        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
> >        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
> >        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
> > Caused by: org.apache.solr.common.SolrException: Document [null] missing required field: id
> >        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:292)
> >        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
> >        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:90)
> >        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:95)
> >        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:157)
> >        ... 22 more
> > </pre>
> > <p>RequestURI=/solr/update/extract</p><p><i><small><a
> > href="http://jetty.mortbay.org/">Powered by
> > Jetty://</a></small></i></p><br/>
> > <br/>
> >
> > </body>
> > </html>
> >
> >
> > Larry Reid
> > Principal Consultant, Jade Systems Inc
> > Mobile: +1 604.376.8884
> > Pragmatic IT Blog | El Blog Technologia Pragmatica | www.jadesystems.ca
> >


Larry Reid
Principal Consultant, Jade Systems Inc
Mobile: +1 604.376.8884
Pragmatic IT Blog | El Blog Technologia Pragmatica | www.jadesystems.ca
