On 9/23/2010 6:52 AM, mehdi.es...@gmail.com wrote:
Hi,
I have exactly the same problem as the one you submitted in this link
http://lucene.472066.n3.nabble.com/Data-Import-Handler-Rich-Format-Documents-td905478.html
and I would like to ask whether you found a solution.
I started looking at Tika and the DataImportHandler, but I haven't managed to
find the right way to write the syntax.
If you did find the right syntax, could you please post an example?
Thanks.
Bumping this to the list...
Unfortunately I could never get DIH to work correctly. My suspicion is
that I was using a stock 1.4.0 Solr but attempting to perform a task
that was only available in the latest build. My customer requirements
demand a well-vetted GA release, so experimenting was not an
option. I attempted a quick (and sloppy) upgrade to 1.4.1, but had no
luck. I believe the next GA release might be my solution.
I tried working around that obstacle with SolrJ's
ContentStreamUpdateRequest (see
http://lucene.472066.n3.nabble.com/Solrj-ContentStreamUpdateRequest-Slow-td1023630.html#a1301927).
After floundering for a while, I decided to put that on hold. I ended
up writing a Perl script that emulates the command-line cURL request I
referenced in the thread above. It took about 72 hours to index
~850,000 entries (if anyone is interested).
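For anyone who wants the cURL approach without Perl, here is a minimal Python sketch of the same kind of request. It assumes Solr Cell's ExtractingRequestHandler is mapped at /update/extract (as in the stock 1.4 example solrconfig.xml), that each document's unique key is supplied through the `literal.id` parameter, and that Solr listens on localhost:8983. The helper names (`build_extract_url`, `build_multipart_body`, `make_extract_request`) are hypothetical. It only builds the POST; sending it with `urllib.request.urlopen(req)` is left to the caller.

```python
import urllib.parse
import urllib.request
import uuid

# Assumption: hypothetical host/port; Solr Cell mapped at /update/extract.
SOLR_URL = "http://localhost:8983/solr"


def build_extract_url(base_url, doc_id, commit=False):
    """Build the /update/extract URL; literal.id supplies the document's unique key."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return base_url.rstrip("/") + "/update/extract?" + urllib.parse.urlencode(params)


def build_multipart_body(field_name, filename, data, boundary):
    """Encode one file as multipart/form-data, like cURL's -F 'myfile=@file'."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail


def make_extract_request(base_url, doc_id, filename, data, commit=False):
    """Return an unsent POST request carrying the file to the extract handler."""
    boundary = uuid.uuid4().hex
    body = build_multipart_body("myfile", filename, data, boundary)
    return urllib.request.Request(
        build_extract_url(base_url, doc_id, commit),
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


if __name__ == "__main__":
    # Build (but do not send) a request for one file.
    req = make_extract_request(SOLR_URL, "doc1", "tutorial.html", b"<html></html>", commit=True)
    print(req.get_method(), req.full_url)
```

In a bulk loop, committing once at the end rather than on every file usually helps throughput. For scale, ~850,000 entries in 72 hours works out to roughly 3.3 documents per second.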
I plan to loop back and try the suggestions Hoss last made; I just
haven't had the time to respond. I'm sure things will work; I just
needed something quickly and don't have the seasoned experience the
other developers do.
- Tod