This might be helpful: http://searchhub.org/2012/02/14/indexing-with-solrj/
It combines using Tika for structured documents and using a JDBC
connector, but extracting the DB-specific stuff should be quite easy.

Best,
Erick

On Sun, Apr 27, 2014 at 7:24 AM, Yuval Dotan <yuvaldo...@gmail.com> wrote:
> Thanks Shawn
>
> In your opinion, what do you think is easier, writing the importer from
> scratch or extending the DIH (for example: adding the state etc...)?
>
> Yuval
>
> On Thu, Apr 24, 2014 at 6:47 PM, Shawn Heisey <s...@elyograg.org> wrote:
>
>> On 4/24/2014 9:24 AM, Yuval Dotan wrote:
>>
>>> I want to use the DIH component in order to import data from old
>>> postgresql DB.
>>> I want to be able to recover from errors and crashes.
>>> If an error occurs I should be able to restart and continue indexing
>>> from where it stopped.
>>> Is the DIH good enough for my requirements ?
>>> If not is it possible to extend one of its classes in order to support
>>> the recovery?
>>
>> The entity in the Dataimport Handler (DIH) config has an "onError"
>> attribute.
>>
>> http://wiki.apache.org/solr/DataImportHandler#Schema_for_the_data_config
>> https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-EntityProcessors
>>
>> But honestly, if you want a really robust Java program that indexes to
>> Solr and does precisely what you want, you may be better off writing it
>> yourself using SolrJ and JDBC. DIH is powerful and efficient, but when
>> you write the program yourself, you can do anything you want with your
>> data.
>>
>> You also have the possibility of resuming an import after a Solr crash.
>> Because DIH is embedded in Solr and doesn't save any kind of state data
>> about an import in progress, that's pretty much impossible with DIH.
>> With a SolrJ program, you'd have to handle that yourself, but it would
>> be *possible*.
>>
>> https://cwiki.apache.org/confluence/display/solr/Using+SolrJ
>>
>> Thanks,
>> Shawn
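
For illustration only, here is one way the "handle that yourself" part of a SolrJ importer could be sketched. It is not taken from the thread or from the linked article. The idea is to persist the last indexed primary key to a small checkpoint file after every batch, and on restart to continue from that key. Everything named here (ResumableImporter, Row, RowSource, BatchSink, the checkpoint file) is hypothetical. RowSource stands in for a JDBC query along the lines of "SELECT ... WHERE id > ? ORDER BY id LIMIT ?", and BatchSink stands in for sending the batch to Solr through a SolrJ client and committing. The sketch also assumes the table has a positive, increasing numeric key.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

public class ResumableImporter {

    /** One source row; a real importer would build this from a JDBC ResultSet. */
    public static final class Row {
        public final long id;
        public final String body;
        public Row(long id, String body) { this.id = id; this.body = body; }
    }

    /** Hypothetical stand-in for a JDBC query returning rows with id > lastId, ordered by id. */
    public interface RowSource {
        List<Row> fetchAfter(long lastId, int limit) throws Exception;
    }

    /** Hypothetical stand-in for adding a batch to Solr via SolrJ and committing it. */
    public interface BatchSink {
        void index(List<Row> batch) throws Exception;
    }

    /** Returns the last indexed id, or 0 if no checkpoint exists (assumes ids are positive). */
    public static long readCheckpoint(Path file) throws IOException {
        if (!Files.exists(file)) return 0L;
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return Long.parseLong(text.trim());
    }

    /** Writes to a temp file first, then moves it over the old checkpoint. */
    static void writeCheckpoint(Path file, long lastId) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, Long.toString(lastId).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Indexes everything after the checkpoint; returns how many rows this run sent. */
    public static long run(RowSource source, BatchSink sink, Path checkpoint, int batchSize)
            throws Exception {
        long lastId = readCheckpoint(checkpoint);
        long indexed = 0;
        while (true) {
            List<Row> batch = source.fetchAfter(lastId, batchSize);
            if (batch.isEmpty()) return indexed;
            sink.index(batch);                      // may throw; checkpoint is then left untouched
            lastId = batch.get(batch.size() - 1).id;
            writeCheckpoint(checkpoint, lastId);    // only advance after the batch succeeded
            indexed += batch.size();
        }
    }
}
```

Because the checkpoint only advances after a batch has been indexed, a crash between indexing and writing the checkpoint means that batch is sent again on restart. That should be harmless as long as the Solr uniqueKey is the row id, since re-adding a document with the same key replaces the earlier copy.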