What's a GA release?

Dennis Gearon
Signature Warning
----------------
EARTH has a Right To Life, otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Fri, 9/24/10, Lance Norskog <goks...@gmail.com> wrote:

> From: Lance Norskog <goks...@gmail.com>
> Subject: Re: Data Import Handler Rich Format Documents
> To: solr-user@lucene.apache.org
> Date: Friday, September 24, 2010, 6:19 PM
>
> The TikaEntityProcessor is the class in the DIH that calls the Tika
> libraries. TikaEntityProcessor is not in Solr 1.4 or 1.4.1. It is in
> the trunk and the 3.x branch.
>
> I have set it up from the 3.x branch. I discovered that the
> "DefaultParser" does not work, and you have to explicitly
> name the parser for the file format you want to use.
>
> https://issues.apache.org/jira/browse/SOLR-2116
>
> Tod wrote:
> > On 9/23/2010 6:52 AM, mehdi.es...@gmail.com wrote:
> >> Hi,
> >> I have exactly the same problem than the one you submitted in this link
> >> http://lucene.472066.n3.nabble.com/Data-Import-Handler-Rich-Format-Documents-td905478.html
> >> and I would like to ask you if you got a solution for that.
> >> I started to have a look on tika and DataImportHandler but I don't
> >> success to find to right way of writing the syntax.
> >> So can you please give an example if you successed to find the
> >> right syntax.
> >> Thanks.
> >
> > Bumping this to the list...
> >
> > Unfortunately I could never get DIH to work correctly. My suspicion
> > is that I was using a stock 1.4.0 Solr but attempting to perform a
> > task that was only available on the latest build. My customer
> > requirements demand a pretty well vetted GA release so experimenting
> > was not an option. I attempted an upgrade (quickly, sloppily) to
> > 1.4.1 but no luck. I believe the next GA release might be my solution.
> >
> > I tried getting around that bump by trying SolrJ
> > ContentStreamUpdateRequest @
> > http://lucene.472066.n3.nabble.com/Solrj-ContentStreamUpdateRequest-Slow-td1023630.html#a1301927.
> > After floundering for a while I decided to put that on hold. I ended
> > up writing a Perl script that emulates the command line cURL that I
> > referenced in the above thread. It took about 72 hours to index
> > ~850,000 entries (if anyone is interested).
> >
> > I plan on looping back to try the suggestions Hoss last made, just
> > haven't had the time to respond. I'm sure things will work I just
> > needed something quickly and don't have the seasoned experience the
> > other developers do.
> >
> >
> > - Tod