This might be helpful: http://searchhub.org/2012/02/14/indexing-with-solrj/

It combines Tika for parsing structured documents with a JDBC connector,
but pulling out just the DB-specific part should be quite easy.
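
A minimal sketch of that DB-specific part, assuming Java 9+ and plain JDBC: each row becomes a field-name/value map, and the maps go to the Solr side in fixed-size batches. `DocSink` and `BatchIndexer` are hypothetical names. In a real SolrJ program the sink would convert each map into a `SolrInputDocument` and call the client's `add(...)`; that part is stubbed out here so the snippet has no external dependencies.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the Solr side. With SolrJ, an implementation
// would turn each map into a SolrInputDocument and call add(...) on the client.
interface DocSink {
    void send(List<Map<String, Object>> batch) throws Exception;
}

class BatchIndexer {
    // Copies the current JDBC row into a field-name -> value map. Assumes the
    // column labels already match Solr field names (alias them with AS if not).
    static Map<String, Object> toDoc(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        Map<String, Object> doc = new LinkedHashMap<>();
        for (int i = 1; i <= md.getColumnCount(); i++) {
            doc.put(md.getColumnLabel(i), rs.getObject(i));
        }
        return doc;
    }

    // Lazily adapts a ResultSet to an Iterator, so rows are pulled one at a
    // time instead of being collected into a list first.
    static Iterator<Map<String, Object>> rows(ResultSet rs) {
        return new Iterator<>() {
            private Boolean hasRow; // null means "cursor not advanced yet"

            @Override
            public boolean hasNext() {
                if (hasRow == null) {
                    try {
                        hasRow = rs.next();
                    } catch (SQLException e) {
                        throw new RuntimeException(e);
                    }
                }
                return hasRow;
            }

            @Override
            public Map<String, Object> next() {
                if (!hasNext()) throw new NoSuchElementException();
                hasRow = null; // the next hasNext() call advances the cursor
                try {
                    return toDoc(rs);
                } catch (SQLException e) {
                    throw new RuntimeException(e);
                }
            }
        };
    }

    // Sends documents in batches of batchSize, flushes the remainder,
    // and returns the number of documents sent.
    static int index(Iterator<Map<String, Object>> docs, int batchSize, DocSink sink) throws Exception {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<Map<String, Object>> batch = new ArrayList<>(batchSize);
        int sent = 0;
        while (docs.hasNext()) {
            batch.add(docs.next());
            if (batch.size() == batchSize) {
                sink.send(new ArrayList<>(batch)); // copy: batch is reused below
                sent += batch.size();
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            sink.send(new ArrayList<>(batch));
            sent += batch.size();
        }
        return sent;
    }
}
```

Usage would look roughly like `BatchIndexer.index(BatchIndexer.rows(stmt.executeQuery(sql)), 1000, sink)`, where 1000 is an arbitrary starting batch size. Batching keeps memory bounded and avoids one request to Solr per row. With the PostgreSQL JDBC driver, a large table also needs `setAutoCommit(false)` on the connection and `setFetchSize(...)` on the statement; otherwise the driver loads the whole result set into memory.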

Best,
Erick

On Sun, Apr 27, 2014 at 7:24 AM, Yuval Dotan <yuvaldo...@gmail.com> wrote:
> Thanks, Shawn.
>
> In your opinion, which would be easier: writing the importer from
> scratch or extending the DIH (for example, adding state etc.)?
>
>
> Yuval
>
>
> On Thu, Apr 24, 2014 at 6:47 PM, Shawn Heisey <s...@elyograg.org> wrote:
>
>> On 4/24/2014 9:24 AM, Yuval Dotan wrote:
>>
>>> I want to use the DIH component to import data from an old PostgreSQL DB.
>>> I want to be able to recover from errors and crashes.
>>> If an error occurs, I should be able to restart and continue indexing from
>>> where it stopped.
>>> Is the DIH good enough for my requirements?
>>> If not, is it possible to extend one of its classes in order to support
>>> recovery?
>>>
>>
>> The entity element in the DataImportHandler (DIH) config has an "onError"
>> attribute (abort, skip, or continue) that controls what happens when
>> processing a row fails.
>>
>> http://wiki.apache.org/solr/DataImportHandler#Schema_for_the_data_config
>> https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-EntityProcessors
>>
>> But honestly, if you want a really robust Java program that indexes to
>> Solr and does precisely what you want, you may be better off writing it
>> yourself using SolrJ and JDBC.  DIH is powerful and efficient, but when you
>> write the program yourself, you can do anything you want with your data.
>>
>> Writing it yourself also makes it possible to resume an import after a
>> Solr crash. Because DIH is embedded in Solr and doesn't save any state
>> about an import in progress, that's pretty much impossible with DIH. With
>> a SolrJ program, you'd have to handle that yourself, but it would be
>> *possible*.
>>
>> https://cwiki.apache.org/confluence/display/solr/Using+SolrJ
>>
>> Thanks,
>> Shawn
>>
>>
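
One way to get that resumability in a SolrJ program, sketched under stated assumptions: the table has a unique, increasing numeric key (hypothetical column `id`), rows are read with keyset paging (`WHERE id > ? ORDER BY id`), and after each batch is committed the highest id is written to a local checkpoint file. After a crash, the importer restarts from that id. `Checkpoint`, `IdSource`, `BatchSink`, and `ResumableImport` are hypothetical names, and the Solr calls are stubbed behind `BatchSink` so this compiles with the JDK alone (Java 11+ for `Files.readString`/`writeString`).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical checkpoint: the highest id that has been sent AND committed.
class Checkpoint {
    private final Path file;

    Checkpoint(Path file) { this.file = file; }

    // Last committed id, or Long.MIN_VALUE if no import has run yet.
    long load() throws IOException {
        if (!Files.exists(file)) return Long.MIN_VALUE;
        return Long.parseLong(Files.readString(file, StandardCharsets.UTF_8).trim());
    }

    // Write a temp file, then rename it over the old one, so a crash
    // mid-write never leaves a truncated checkpoint behind.
    void save(long lastId) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.writeString(tmp, Long.toString(lastId), StandardCharsets.UTF_8);
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }
}

// Hypothetical page source: up to `limit` ids greater than `afterId`, ascending.
// Over JDBC this would be something like
// SELECT ... FROM t WHERE id > ? ORDER BY id LIMIT ?
interface IdSource {
    List<Long> idsAfter(long afterId, int limit) throws Exception;
}

// Hypothetical Solr side: index one batch (e.g. SolrJ add(...)) and commit.
interface BatchSink {
    void indexAndCommit(List<Long> ids) throws Exception;
}

class ResumableImport {
    // Starts after the saved checkpoint and advances it only once a batch
    // has been committed. Returns the number of ids indexed in this run.
    static int run(IdSource source, BatchSink sink, Checkpoint cp, int batchSize) throws Exception {
        long last = cp.load();
        int count = 0;
        while (true) {
            List<Long> page = source.idsAfter(last, batchSize);
            if (page.isEmpty()) return count;
            sink.indexAndCommit(page); // if this throws, the checkpoint is unchanged
            last = page.get(page.size() - 1);
            cp.save(last);
            count += page.size();
        }
    }
}
```

Because the checkpoint only moves after a commit, a crash can at worst cause one batch to be sent again; with Solr's default overwrite-by-`uniqueKey` behavior, that just replaces the same documents. Committing after every batch is expensive on a large import, so committing every N batches and saving the checkpoint only at those points is a reasonable variation.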
