Thanks, Shawn.

In your opinion, which is easier: writing the importer from scratch, or
extending the DIH (for example, by adding import state)?


Yuval


On Thu, Apr 24, 2014 at 6:47 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 4/24/2014 9:24 AM, Yuval Dotan wrote:
>
>> I want to use the DIH component to import data from an old PostgreSQL
>> DB.
>> I want to be able to recover from errors and crashes.
>> If an error occurs, I should be able to restart and continue indexing from
>> where it stopped.
>> Is the DIH good enough for my requirements?
>> If not, is it possible to extend one of its classes to support the
>> recovery?
>>
>
> The entity in the Dataimport Handler (DIH) config has an "onError"
> attribute.
>
> http://wiki.apache.org/solr/DataImportHandler#Schema_for_the_data_config
> https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-EntityProcessors
>
> But honestly, if you want a really robust Java program that indexes to
> Solr and does precisely what you want, you may be better off writing it
> yourself using SolrJ and JDBC.  DIH is powerful and efficient, but when you
> write the program yourself, you can do anything you want with your data.
>
> If you write the program yourself, you also have the possibility of
> resuming an import after a Solr crash. Because DIH is embedded in Solr and
> doesn't save any kind of state data about an import in progress, resuming
> is pretty much impossible with DIH. With a SolrJ program, you'd have to
> handle that yourself, but it would be *possible*.
>
> https://cwiki.apache.org/confluence/display/solr/Using+SolrJ
>
> Thanks,
> Shawn
>
>
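
For illustration, here is a minimal sketch of how a hand-written importer
could keep the state that DIH does not. It is not taken from the thread: the
class name, the checkpoint file, and the two interfaces are all hypothetical.
RowSource stands in for a JDBC query ordered by a numeric primary key, and
BatchIndexer stands in for a SolrJ add followed by a commit. The sketch
assumes the source table has a monotonically increasing id column.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Sketch of a restartable import loop; every name here is hypothetical. */
final class ResumableImporter {

    /** One source row: an increasing primary key plus its payload. */
    public static final class Row {
        public final long id;
        public final String body;

        public Row(long id, String body) {
            this.id = id;
            this.body = body;
        }
    }

    /** Stand-in for a JDBC query like: SELECT ... WHERE id > ? ORDER BY id LIMIT ? */
    public interface RowSource {
        List<Row> fetchAfter(long lastId, int batchSize) throws Exception;
    }

    /** Stand-in for sending the batch to Solr and committing it. */
    public interface BatchIndexer {
        void indexAndCommit(List<Row> batch) throws Exception;
    }

    private final Path checkpointFile;

    ResumableImporter(Path checkpointFile) {
        this.checkpointFile = checkpointFile;
    }

    /** Returns the last id recorded as indexed, or 0 if there is no checkpoint yet. */
    long readCheckpoint() throws IOException {
        if (!Files.exists(checkpointFile)) {
            return 0L;
        }
        byte[] raw = Files.readAllBytes(checkpointFile);
        return Long.parseLong(new String(raw, StandardCharsets.UTF_8).trim());
    }

    /** Writes to a temp file, then moves it over the old checkpoint. */
    private void writeCheckpoint(long lastId) throws IOException {
        Path tmp = checkpointFile.resolveSibling(checkpointFile.getFileName() + ".tmp");
        Files.write(tmp, Long.toString(lastId).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, checkpointFile, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Imports everything after the checkpoint; returns the number of rows indexed. */
    long run(RowSource source, BatchIndexer indexer, int batchSize) throws Exception {
        long lastId = readCheckpoint();
        long indexed = 0;
        while (true) {
            List<Row> batch = source.fetchAfter(lastId, batchSize);
            if (batch.isEmpty()) {
                return indexed;
            }
            indexer.indexAndCommit(batch);
            // Record progress only after the batch has been indexed.
            lastId = batch.get(batch.size() - 1).id;
            writeCheckpoint(lastId);
            indexed += batch.size();
        }
    }
}
```

Because the checkpoint is written only after a batch succeeds, a crash between
the two steps means that batch is sent again on restart. That should be
harmless as long as each document carries the same uniqueKey value, since Solr
then replaces the earlier copy rather than duplicating it.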
