Re: Document is missing mandatory uniqueKey field: id for Solr PDF indexing

Furkan KAMACI Sat, 27 Apr 2013 04:55:57 -0700

OK, I asked another question for it and it has solved, thanks.

2013/4/26 Furkan KAMACI <furkankam...@gmail.com>


> Jack, thanks for your answers. Ok, when I remove -Durl parameter I think
> it works, thanks. However I think that I have a problem with my schema. I
> get that error:
>
> Apr 26, 2013 3:52:21 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: ERROR:
> [doc=/home/ll/Desktop/b/lucene-solr-lucene_solr_4_2_1/solr/example/exampledocs/523387.pdf]
> multiple values encountered for non multiValued copy field text:
> application/pdf
>
>
> 2013/4/26 Jack Krupansky <j...@basetechnology.com>
>
>> Maybe you are confusing things by mixing instructions - there are
>> SEPARATE instructions for directly using SolrCell and implicitly using it
>> via post.jar. Pick which you want and stick with it. DO NOT MIX the
>> instructions.
>>
>> You wrote: " I run that command: java -Durl=
>> http://localhost:8983/solr/update/extract -jar post.jar 523387.pdf"
>>
>> Was there a GOOD reason that you chose that URL?
>>
>> Best to stay with what the post.jar wiki recommends:
>>
>> Post all CSV, XML, JSON and PDF documents using AUTO mode which detects
>> type based on file name:
>>
>> java -Dauto -jar post.jar *.csv *.xml *.json *.pdf
>>
>> Or, stick with SolrCell directly, but follow its distinct instructions:
>> http://wiki.apache.org/solr/ExtractingRequestHandler
>>
>> Again, DO NOT MIX the instructions from the two.
>>
>> post.jar is designed so that you do not need to know or care exactly how
>> rich document indexing works.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Furkan KAMACI
>> Sent: Friday, April 26, 2013 5:30 AM
>> To: solr-user@lucene.apache.org
>> Subject: Document is missing mandatory uniqueKey field: id for Solr PDF
>> indexing
>>
>>
>> I use Solr 4.2.1 and these are my fields:
>>
>> <field name="id" type="string" indexed="true" stored="true"
>> required="true"
>> multiValued="false" />
>> <field name="text" type="text_general" indexed="true" stored="true"/>
>>
>>
>> <!-- Common metadata fields, named specifically to match up with
>> SolrCell metadata when parsing rich documents such as Word, PDF.
>> Some fields are multiValued only because Tika currently may return
>> multiple values for them. Some metadata is parsed from the documents,
>> but there are some which come from the client context:
>> "content_type": From the HTTP headers of incoming stream
>> "resourcename": From SolrCell request param resource.name
>> -->
>> <field name="title" type="text_general" indexed="true" stored="true"
>> multiValued="true"/>
>> <field name="subject" type="text_general" indexed="true" stored="true"/>
>> <field name="description" type="text_general" indexed="true"
>> stored="true"/>
>> <field name="comments" type="text_general" indexed="true" stored="true"/>
>> <field name="author" type="text_general" indexed="true" stored="true"/>
>> <field name="keywords" type="text_general" indexed="true" stored="true"/>
>> <field name="category" type="text_general" indexed="true" stored="true"/>
>> <field name="resourcename" type="text_general" indexed="true"
>> stored="true"/>
>> <field name="url" type="text_general" indexed="true" stored="true"/>
>> <field name="content_type" type="string" indexed="true" stored="true"
>> multiValued="true"/>
>> <field name="last_modified" type="date" indexed="true" stored="true"/>
>> <field name="links" type="string" indexed="true" stored="true"
>> multiValued="true"/>
>>
>> <!-- Main body of document extracted by SolrCell.
>> NOTE: This field is not indexed by default, since it is also copied to
>> "text"
>> using copyField below. This is to save space. Use this field for returning
>> and
>> highlighting document content. Use the "text" field to search the content.
>> -->
>> <field name="content" type="text_general" indexed="false" stored="true"
>> multiValued="true"/>
>>
>>
>> <!-- catchall field, containing all other searchable text fields
>> (implemented
>> via copyField further on in this schema -->
>> <!--
>> <field name="text" type="text_general" indexed="true" stored="false"
>> multiValued="true"/>
>> -->
>> <!-- catchall text field that indexes tokens both normally and in reverse
>> for efficient
>> leading wildcard queries. -->
>> <field name="text_rev" type="text_general_rev" indexed="true"
>> stored="false" multiValued="true"/>
>>
>> <!-- non-tokenized version of manufacturer to make it easier to sort or
>> group
>> results by manufacturer. copied from "manu" via copyField -->
>> <field name="manu_exact" type="string" indexed="true" stored="false"/>
>>
>> <field name="payloads" type="payloads" indexed="true" stored="true"/>
>>
>> <field name="_version_" type="long" indexed="true" stored="true"/>
>>
>> I run that command:
>>
>> java -Durl=http://localhost:8983/solr/update/extract -jar post.jar
>> 523387.pdf
>>
>> However I get that error, any ideas?
>>
>> Apr 26, 2013 12:26:51 PM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrException: Document is missing
>> mandatory
>> uniqueKey field: id
>> at
>>
>> org.apache.solr.update.AddUpdateCommand.getIndexedId(AddUpdateCommand.java:88)
>> at
>>
>> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:464)
>> at
>>
>> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:346)
>> at
>>
>> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>> at
>>
>> org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:121)
>> at
>>
>> org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
>> at
>>
>> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
>> at
>>
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>> at
>>
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
>> at
>>
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
>> at
>>
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>> at
>>
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
>> at
>>
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
>> at
>>
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>> at
>>
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
>> at
>>
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>> at
>>
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
>> at
>>
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>> at
>>
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
>> at
>>
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>> at
>>
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>> at
>>
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>> at
>>
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>> at org.eclipse.jetty.server.Server.handle(Server.java:365)
>> at
>>
>> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
>> at
>>
>> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>> at
>>
>> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
>> at
>>
>> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
>> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
>> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
>> at
>>
>> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>> at
>>
>> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>> at
>>
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>> at
>>
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>> at java.lang.Thread.run(Thread.java:722)
>>
>
>

Re: Document is missing mandatory uniqueKey field: id for Solr PDF indexing

Reply via email to