Grant, you are quite right! I was too far down in the weeds, and didn't need to be doing all that craziness.

One other comment, however: I saw you edited the wiki (thank you!) and added the line:

+ It is highly recommend that you try using the extract only option to see what values actually get set for these.

I am not sure that is correct, although it is what I would expect. When I run:

budapest:karaoke epugh$ curl http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text\&ext.extract.only=true -F "fi...@mccm.pdf"

The response I get back (via curl) looks like:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1728</int></lst><str name="mccm.pdf">&lt;?xml version="1.0" encoding="UTF-8"?&gt;

SNIP LOTS OF DOCUMENT CONTENT

&lt;/div&gt;
&lt;/body&gt;
&lt;/html&gt;
</str>
</response>

And I don't actually see the metadata fields. I would expect to, however!
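
One rough way to check which named entries a response like this carries is to list its name attributes. This is only a sketch: the response string is a trimmed stand-in for the output above (the document body is replaced by SNIP), and the variable names are made up for illustration. On the full response it would also match any name="..." text that appears inside the escaped document body.

```shell
# Trimmed stand-in for the extract-only response shown above (assumption:
# the document body is replaced by SNIP for brevity).
response='<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">1728</int></lst><str name="mccm.pdf">SNIP</str></response>'

# Pull out every name="..." attribute value, one per line.
names=$(printf '%s\n' "$response" | grep -o 'name="[^"]*"' | sed -e 's/^name="//' -e 's/"$//')

printf '%s\n' "$names"
```

For the trimmed sample this lists responseHeader, status, QTime and mccm.pdf, i.e. only the header and the one entry named after the file, with no separate metadata entries.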

Eric



On May 28, 2009, at 8:28 PM, Grant Ingersoll wrote:


On May 28, 2009, at 11:29 AM, Eric Pugh wrote:

Hi all,

I want to use the Tika attribute stream_name as my unique key, which I can do if I specify <uniqueKey>stream_name</uniqueKey> and run curl:

curl http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text\&ext.capture=stream_name\&ext.map.stream_name=stream_name -F "fi...@angeleyes.kar"



Why do you need to have the ext.capture and why do you need to map stream_name to stream_name? If the name in tika metadata is a field name, you don't need to map.

Also, I assume I'm missing something here because why can't you just pass in id=<name of the stream> since presumably, in your examples anyway, you have this info, right? If not, I don't know where else you are getting it from, b/c it is a Solr thing, not a Tika thing. In fact, that reminds me, I should document those values that the ERH adds to the Metadata.

However, this means that I can't use the ext.metadata.prefix to capture the other metadata fields via:

curl http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text\&ext.metadata.prefix=metadata_\&ext.capture=stream_name\&ext.map.stream_name=stream_name -F "fi...@angeleyes.kar"

If I do, it seems like stream_name is lost because it is now metadata_stream_name, but I can't use that name in my ext.capture and ext.map:

curl http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text\&ext.metadata.prefix=metadata_\&ext.capture=metadata_stream_name\&ext.map.metadata_stream_name=stream_name -F "fi...@angeleyes.kar"

Any ideas?  Currently seems like an either/or, but I'd like both!

Eric


-----------------------------------------------------
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal





--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search





