Take a look at some of the integrations people are using with Apache Storm. We do something similar on a larger scale: we wrote a Postgres spout and a Solr indexing bolt.
-msj

On Mon, Mar 16, 2015 at 11:08 AM, Hal Roberts <hrobe...@cyber.law.harvard.edu> wrote:

> We import anywhere from five to fifty million small documents a day from a
> postgres database. I wrestled to get the DIH stuff to work for us for
> about a year and was much happier when I ditched that approach and switched
> to writing the few hundred lines of relatively simple code to handle
> directly the logic of what gets updated and how it gets queried from
> postgres ourselves.
>
> The DIH stuff is great for lots of cases, but if you are getting to the
> point of trying to hack its undocumented internals, I suspect you are
> better off spending a day or two of your time just writing all of the
> update logic yourself.
>
> We found a relatively simple combination of postgres triggers, export to
> csv based on those triggers, and then just calling update/csv to work best
> for us.
>
> -hal
>
>
> On 3/16/15 9:59 AM, Shawn Heisey wrote:
>
>> On 3/16/2015 7:15 AM, sreedevi s wrote:
>>
>>> I had checked this post. I don't know whether this is possible, but my
>>> query is whether I can use the configuration for DIH for indexing via SolrJ.
>>>
>>
>> You can use SolrJ for accessing DIH. I have code that does this, but
>> only for full index rebuilds.
>>
>> It won't be particularly obvious how to do it. Writing code that can
>> interpret DIH status and know when it finishes, succeeds, or fails is
>> very tricky because DIH only uses human-readable status info, not
>> machine-readable, and the info is not very consistent.
>>
>> I can't just share my code, because it's extremely convoluted ... but
>> the general gist is to create a SolrQuery object, use setRequestHandler
>> to set the handler to "/dataimport" or whatever your DIH handler is, and
>> set the other parameters on the request like "command" to "full-import"
>> and so on.
>>
>> Thanks,
>> Shawn
>>
>>
> --
> Hal Roberts
> Fellow
> Berkman Center for Internet & Society
> Harvard University
>
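
For anyone who wants to try the request Shawn describes without going through SolrJ, here is a rough sketch of the same call over plain HTTP in Python. It is not Shawn's code and not the SolrJ API. The core URL `http://localhost:8983/solr/mycore` and the helper names `dih_url`, `dih_is_idle`, and `run_full_import` are made up for illustration. I am also assuming the JSON reply carries a top-level "status" of "idle" or "busy"; check that against what your Solr version actually returns.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical core URL; adjust host, port, and core name for your setup.
SOLR_CORE_URL = "http://localhost:8983/solr/mycore"


def dih_url(command, handler="/dataimport", base=SOLR_CORE_URL, **params):
    """Build the URL for a DIH request such as command=full-import."""
    query = {"command": command, "wt": "json"}
    query.update(params)
    return base.rstrip("/") + handler + "?" + urlencode(query)


def dih_is_idle(response_body):
    """Return True when the reply reports that no import is running.

    Assumption: the JSON reply has a top-level "status" field whose value
    is "idle" or "busy". This says nothing about success or failure.
    """
    return json.loads(response_body).get("status") == "idle"


def run_full_import():
    """Start a full import and return the raw JSON reply (needs a live Solr)."""
    with urlopen(dih_url("full-import")) as resp:
        return resp.read().decode("utf-8")
```

`dih_url("status")` gives the URL to poll afterwards. Note that `dih_is_idle` only tells you whether an import is still running. Working out whether it succeeded or failed still means reading the human-readable status messages, which is the tricky part Shawn mentions.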