We import anywhere from five to fifty million small documents a day from
a postgres database. I wrestled with getting the DIH stuff to work for
us for about a year, and I was much happier when I ditched that approach
and switched to writing a few hundred lines of relatively simple code
that directly handles the logic of what gets updated and how it gets
queried from postgres.
The DIH stuff is great for lots of cases, but if you are getting to the
point of trying to hack its undocumented internals, I suspect you are
better off spending a day or two of your time just writing all of the
update logic yourself.
We found that a relatively simple combination of postgres triggers, CSV
exports driven by those triggers, and calls to the /update/csv handler
worked best for us.
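(None of this code is in the thread, but the fiddliest part of a homegrown CSV export tends to be the quoting. A minimal sketch, with class and method names of my own choosing, assuming Solr's default double-quote encapsulator on /update/csv:)

```java
import java.util.List;

public class CsvRows {
    // Quote one field for Solr's /update/csv handler (default
    // encapsulator is a double quote): wrap the value in quotes and
    // double any embedded quotes, RFC 4180 style.
    static String csvField(String value) {
        if (value == null) {
            return "";
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Join one exported row into a single CSV line.
    static String csvLine(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(csvField(fields.get(i)));
        }
        return sb.toString();
    }
}
```

With quoting handled this way, embedded commas, quotes, and newlines in document bodies survive the round trip without tripping up the CSV loader.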
-hal
On 3/16/15 9:59 AM, Shawn Heisey wrote:
On 3/16/2015 7:15 AM, sreedevi s wrote:
I had checked this post. I don't know whether this is possible, but my
query is whether I can use the DIH configuration for indexing via SolrJ.
You can use SolrJ for accessing DIH. I have code that does this, but
only for full index rebuilds.
It won't be particularly obvious how to do it. Writing code that can
interpret DIH status and know when it finishes, succeeds, or fails is
very tricky, because DIH only reports human-readable status info, not
machine-readable info, and the info is not very consistent.
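(To illustrate why this is fragile: the only signals a client gets are a status flag and free-text messages whose wording varies across Solr versions, so any interpreter ends up being a keyword heuristic. A sketch of that heuristic, with names and the message strings in the test chosen by me as illustrations:)

```java
public class DihStatus {
    enum Result { RUNNING, SUCCEEDED, FAILED }

    // Heuristic interpretation of a DIH status response. DIH reports
    // status "busy" while an import runs; once it is "idle", the only
    // clue to success or failure is the wording of the human-readable
    // statusMessages text, so we scan it for failure keywords.
    static Result interpret(String status, String messages) {
        if (!"idle".equals(status)) {
            return Result.RUNNING;
        }
        String text = messages == null ? "" : messages.toLowerCase();
        if (text.contains("failed") || text.contains("aborted")
                || text.contains("rolled back")) {
            return Result.FAILED;
        }
        return Result.SUCCEEDED;
    }
}
```

Any change to the message wording in a new Solr release can silently break a parser like this, which is exactly the inconsistency problem described above.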
I can't just share my code, because it's extremely convoluted ... but
the general gist is to create a SolrQuery object, use setRequestHandler
to set the handler to "/dataimport" or whatever your DIH handler is, and
set the other parameters on the request like "command" to "full-import"
and so on.
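(In SolrJ that gist is: build a SolrQuery, call setRequestHandler("/dataimport"), set "command" to "full-import", and run it through your SolrClient. Under the hood that amounts to a plain HTTP GET, which this stdlib-only sketch builds explicitly; the base URL, handler path, and parameter values are illustrative, not from the thread:)

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class DihRequest {
    // Build the URL a DIH command resolves to. With SolrJ you would put
    // the same parameters on a SolrQuery, call
    // setRequestHandler("/dataimport"), and execute it via a SolrClient;
    // the resulting request is equivalent to this GET URL.
    static String dihUrl(String baseUrl, String handler,
                         Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl).append(handler).append('?');
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                sb.append('&');
            }
            first = false;
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }
}
```

For example, passing "command" = "full-import" and "clean" = "true" against a core at http://localhost:8983/solr/mycore yields the same request SolrJ would send to the /dataimport handler.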
Thanks,
Shawn
--
Hal Roberts
Fellow
Berkman Center for Internet & Society
Harvard University