: Right. You're requiring that every document have an ID (via uniqueKey), but
: there's nothing magic about DIH that'll automagically parse a PDF file and
: map something into your ID field.
: 
: So you have to create a unique ID before you send your doc to Curl. I'm

a) This example isn't using DIH, it's using the extracting request handler 
directly

b) in the example URL provided, Ahson was already using the exact syntax 
you mentioned...

: > curl "http://localhost:8983/solr1/update/extract?literal.DocID=123&fmap.content=Contents&commit=true"
: >  -F "myfi...@d:/solr/apache-solr-1.4.0/docs/filename1.pdf"

...note the "literal.DocID" param (where "DocID" is the field listed as 
uniqueKey in his example)

The actual root of the problem is that the "lowernames" param 
(which is declared "true" in the Solr 1.4 example declaration of 
/update/extract) is getting applied to all field names, even the literal 
ones...

http://wiki.apache.org/solr/ExtractingRequestHandler#Order_of_field_operations
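
To make the effect concrete, here is a rough sketch in Python of what happens to the literal param. This is not Solr's code: lower_name and resolve_literals are hypothetical names, and the mapping rule (lowercase, with anything that is not a letter or digit turned into an underscore) is my approximation of what lowernames does.

```python
import re


def lower_name(field: str) -> str:
    """Approximation of lowernames=true: lowercase the name and
    replace every character that is not a-z or 0-9 with '_'."""
    return re.sub(r"[^a-z0-9]", "_", field.lower())


def resolve_literals(params: dict, lowernames: bool = True) -> dict:
    """Collect literal.<name> params into document fields, applying the
    lowercase mapping to the literal field names as well (hypothetical helper)."""
    fields = {}
    for key, value in params.items():
        if key.startswith("literal."):
            name = key[len("literal."):]
            fields[lower_name(name) if lowernames else name] = value
    return fields


# Params taken from the example URL in this thread.
params = {"literal.DocID": "123", "fmap.content": "Contents", "commit": "true"}
doc = resolve_literals(params)
print(doc)  # the value ends up under "docid", so the uniqueKey "DocID" is never set
```

With the mapping on, the document carries a "docid" field but no "DocID" field, which is why the uniqueKey looks like it is missing.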

Ahson: You could change your uniqueKey field to something that is all 
lowercase, or you could set lowernames=false in your config (which will 
affect all field names extracted by Tika)
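
A sketch of the second option applied per request rather than in the config, assuming the /update/extract handler declares lowernames under its defaults (not invariants) so that a request param can override it. The helper name extract_url is made up for illustration; the host, core path and field names are the ones from the example above.

```python
from urllib.parse import urlencode

# Endpoint from the example URL in this thread.
BASE = "http://localhost:8983/solr1/update/extract"


def extract_url(doc_id: str, lowernames: bool) -> str:
    """Build the extract URL with an explicit lowernames setting
    (hypothetical helper, it only assembles the query string)."""
    params = [
        ("literal.DocID", doc_id),
        ("fmap.content", "Contents"),
        ("lowernames", "true" if lowernames else "false"),
        ("commit", "true"),
    ]
    return BASE + "?" + urlencode(params)


url = extract_url("123", lowernames=False)
print(url)
```

The resulting URL can be dropped into the same curl command in place of the original one. Keep in mind that with lowernames off, the Tika metadata fields also keep their original case.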

(Personally, I think the order of operations in the 
ExtractingRequestHandler makes no sense at all)

-Hoss
