On 9/23/2010 6:52 AM, mehdi.es...@gmail.com wrote:
Hi,
I have exactly the same problem as the one you described in this thread:
http://lucene.472066.n3.nabble.com/Data-Import-Handler-Rich-Format-Documents-td905478.html
and I would like to ask whether you found a solution.
I have started looking at Tika and the DataImportHandler, but I can't find
the right syntax.
If you did find it, could you please post an example?
Thanks.

Bumping this to the list...

Unfortunately, I never got DIH to work correctly. My suspicion is that I was running a stock Solr 1.4.0 while attempting something that was only available in a newer build. My customer's requirements demand a well-vetted GA release, so experimenting with nightly builds was not an option. I tried a quick (and admittedly sloppy) upgrade to 1.4.1, with no luck. I believe the next GA release may be my solution.

I tried to work around that by using SolrJ's ContentStreamUpdateRequest (see http://lucene.472066.n3.nabble.com/Solrj-ContentStreamUpdateRequest-Slow-td1023630.html#a1301927). After floundering for a while, I put that on hold and instead wrote a Perl script that emulates the command-line cURL request I referenced in that thread. It took about 72 hours to index ~850,000 entries, if anyone is interested.
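For anyone who wants to try the same kind of workaround, here is a rough sketch (in Python, not the Perl script mentioned above) of the request such a script would send: the file's raw bytes POSTed to the ExtractingRequestHandler with a `literal.id` parameter and a Content-Type header, mirroring `curl "...?literal.id=doc1" --data-binary @file -H "Content-type:..."`. It assumes a default Solr install with the handler mapped at `http://localhost:8983/solr/update/extract`; that URL and the helper names `build_extract_request` and `index_file` are illustrative assumptions, not taken from the original thread.

```python
import mimetypes
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: default Solr URL with the ExtractingRequestHandler at /update/extract.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_request(path, doc_id, data, commit=False):
    """Build a POST roughly equivalent to:
    curl "<url>?literal.id=<doc_id>" --data-binary @<path> -H "Content-type:<mime>"
    (hypothetical helper, for illustration only)
    """
    params = {"literal.id": doc_id}  # literal.* sets a field value on the indexed doc
    if commit:
        params["commit"] = "true"
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime, _ = mimetypes.guess_type(path)
    return Request(
        SOLR_EXTRACT_URL + "?" + urlencode(params),
        data=data,
        headers={"Content-Type": mime or "application/octet-stream"},
        method="POST",
    )


def index_file(path, doc_id, commit=False):
    """Read a file and send it to Solr (hypothetical helper); returns the HTTP status."""
    with open(path, "rb") as f:
        req = build_extract_request(path, doc_id, f.read(), commit)
    with urlopen(req) as resp:
        return resp.status
```

In a bulk run it is usually faster to leave `commit` off for individual files and issue a single commit at the end, rather than committing after every document.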

I plan to loop back and try the suggestions Hoss made most recently; I just haven't had time to respond. I'm sure it will work eventually. I just needed something quickly and don't have the seasoned experience the other developers here do.


- Tod
