All,
I have a few questions regarding the data import handler. We have some
pretty gnarly SQL queries to load our indices and our current loader
implementation is extremely fragile. I would like to migrate to the DIH,
using SolrJ + EmbeddedSolr + some custom code to load the indices
remotely, so that my index loader and main search engine are separated.
Currently, unless I am missing something, the data gathering from the entity
and the data processing (i.e. conversion to a Solr document) are done
sequentially, and I was looking to make this execute in parallel so that I
can have multiple threads processing different parts of the resultset and
loading documents into Solr. Secondly, I need to create temporary tables to
store the results of a few queries and use them later in inner joins; I was
wondering how best to go about this.
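To make the parallel idea concrete, here is a minimal sketch of how the
resultset could be split into id ranges, one per worker thread. This is
illustrative only: it assumes ids are roughly contiguous between
MIN(custom_id) and MAX(custom_id), and the class/column names are mine, not
anything in DIH.

```java
import java.util.ArrayList;
import java.util.List;

// Split [minId, maxId] into contiguous inclusive ranges so each worker
// thread can fetch and convert its own slice of the resultset, e.g. via
//   SELECT ... WHERE custom_id BETWEEN lo AND hi
public class RangePartitioner {

    public static List<long[]> partition(long minId, long maxId, int workers) {
        List<long[]> ranges = new ArrayList<>();
        long total = maxId - minId + 1;
        long chunk = (total + workers - 1) / workers; // ceiling division
        for (long lo = minId; lo <= maxId; lo += chunk) {
            long hi = Math.min(lo + chunk - 1, maxId);
            ranges.add(new long[] { lo, hi });
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : partition(1, 1000, 4)) {
            System.out.println(r[0] + " - " + r[1]);
        }
    }
}
```

If the id space is sparse, chunks will be uneven; a COUNT(*)-based scheme
with LIMIT/OFFSET would balance better at the cost of extra queries.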

I am thinking to add support in DIH for the following:
1) Temporary tables (maybe call them temporary entities?) -- specific to
SQL only, unless it can be generalized to other sources.
2) Parallel support
  - Including some mechanism to get the number of records (whether by a
COUNT(*) or by MAX(custom_id) - MIN(custom_id))
3) Support in DIH or Solr to post documents to a remote index (i.e. create a
new UpdateHandler instead of DirectUpdateHandler2).
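Points 2 and 3 together amount to a producer/consumer setup: worker threads
convert rows into documents and hand them to something that posts to the
remote index. A rough sketch of that shape, assuming a stand-in
RemoteSender interface where a SolrJ client would actually be wired in
(none of these names are existing DIH/Solr classes):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelLoader {

    // Stand-in for a SolrJ client posting to the remote index; in real
    // code this would wrap e.g. a remote Solr HTTP client's add() call.
    interface RemoteSender {
        void post(Map<String, ?> doc);
    }

    // Converts each row on a worker thread and posts it to the sender.
    // Returns the number of documents posted.
    public static int load(List<? extends Map<String, ?>> rows, int workers,
                           RemoteSender sender) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger posted = new AtomicInteger();
        for (Map<String, ?> row : rows) {
            pool.submit(() -> {
                // "conversion" step: in real code, build the Solr input
                // document from the row here, then hand it off.
                sender.post(row);
                posted.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return posted.get();
    }
}
```

The sender side is where the remote-UpdateHandler question lands: the same
loader works against an embedded core or a remote server, since only the
RemoteSender implementation changes.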

If any of these exist or anyone else is working on this (OR you have better
suggestions), please let me know.

Thanks!
Amit
