Grant, you are quite right! I was too far down in the weeds, and
didn't need to be doing all that craziness.
One other comment, however: I saw you edited the wiki (thank you!) and
added the line:
+ It is highly recommend that you try using the extract only option to
see what values actually get set for these.
I am not sure that is correct, although it is what I would expect.
When I run:
budapest:karaoke epugh$ curl http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text\&ext.extract.only=true -F "fi...@mccm.pdf"
<?xml version="1.0" encoding="UTF-8"?>
The response I get back (via curl) looks like:
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1728</int></lst>
<str name="mccm.pdf"><?xml version="1.0" encoding="UTF-8"?>
SNIP LOTS OF DOCUMENT CONTENT
</div>
</body>
</html>
</str>
</response>
And I don't actually see the metadata fields. I would expect to,
however!
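One quick way to double-check what actually comes back is to list the named top-level entries of the response. The following is only a sketch in Python: the sample string is a shortened stand-in for the real response (with the document body escaped, as it would need to be in well-formed XML), and `top_level_names` is a hypothetical helper, not part of Solr.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the extract-only response (assumption: the real
# body is much longer and arrives XML-escaped inside the <str> element).
SAMPLE = (
    '<response>'
    '<lst name="responseHeader">'
    '<int name="status">0</int><int name="QTime">1728</int>'
    '</lst>'
    '<str name="mccm.pdf">&lt;html&gt;&lt;body&gt;...&lt;/body&gt;&lt;/html&gt;</str>'
    '</response>'
)


def top_level_names(response_xml):
    """Return the name attribute of each direct child of <response>."""
    root = ET.fromstring(response_xml)
    return [child.get("name") for child in root]


if __name__ == "__main__":
    print(top_level_names(SAMPLE))
```

For the response above this lists only the header and the document body. If the metadata were being returned, I would expect at least one more entry to show up here.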
Eric
On May 28, 2009, at 8:28 PM, Grant Ingersoll wrote:
On May 28, 2009, at 11:29 AM, Eric Pugh wrote:
Hi all,
I want to use the Tika attribute stream_name as my unique key,
which I can do if I specify <uniqueKey>stream_name</uniqueKey> and
run curl:
curl http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text\&ext.capture=stream_name\&ext.map.stream_name=stream_name -F "fi...@angeleyes.kar"
Why do you need the ext.capture, and why do you need to map
stream_name to stream_name? If the name in the Tika metadata is
already a field name, you don't need to map it.
Also, I assume I'm missing something here: why can't you just
pass in id=<name of the stream>, since presumably, in your examples
anyway, you have this info? If not, I don't know where else
you are getting it from, because it is a Solr thing, not a Tika thing.
In fact, that reminds me, I should document those values that the
ERH adds to the Metadata.
However, this means that I can't use the ext.metadata.prefix to
capture the other metadata fields via:
curl http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text\&ext.metadata.prefix=metadata_\&ext.capture=stream_name\&ext.map.stream_name=stream_name -F "fi...@angeleyes.kar"
If I do, it seems like stream_name is lost because it is now
metadata_stream_name, but I can't use that name in my ext.capture
and ext.map:
curl http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text\&ext.metadata.prefix=metadata_\&ext.capture=metadata_stream_name\&ext.map.metadata_stream_name=stream_name -F "fi...@angeleyes.kar"
Any ideas? Currently it seems like an either/or, but I'd like both!
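To rule out shell escaping as a factor, the same request URL can also be assembled programmatically. This is just a sketch in Python using only the parameters already shown above; `build_extract_url` is a hypothetical helper, and the file upload itself is left out.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/karaoke/update/extract"

# The parameters from the prefix + capture attempt, in the same order.
PARAMS = [
    ("ext.def.fl", "text"),
    ("ext.metadata.prefix", "metadata_"),
    ("ext.capture", "stream_name"),
    ("ext.map.stream_name", "stream_name"),
]


def build_extract_url(params):
    """Join the base URL and the query parameters, keeping their order."""
    return BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_extract_url(PARAMS))
```

Building the query string this way keeps each parameter on its own line, so the prefix and capture variants are easier to compare side by side.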
Eric
-----------------------------------------------------
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467
| http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search