All, I have a few questions regarding the DataImportHandler (DIH). We have some fairly gnarly SQL queries to load our indices, and our current loader implementation is extremely fragile. I am looking to migrate to the DIH; specifically, I want to use SolrJ + EmbeddedSolr + some custom code to load the indices remotely, so that my index loader and my main search engine are separated.

Currently, unless I am missing something, gathering the data from the entity and processing it (i.e. converting each row to a Solr document) happen sequentially. I would like these to run in parallel, so that multiple threads can each process a different slice of the resultset and load documents into Solr. Secondly, I need to create temporary tables to store the results of a few queries and use them later in inner joins, and I was wondering how best to go about this.
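For context, the joins I am talking about are the kind that DIH can currently only express as nested entities, where the child query re-runs per parent row against `${parent.field}`. A rough data-config sketch (table and column names invented for illustration):

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr" password="secret"/>
  <document>
    <entity name="item" query="SELECT id, name FROM item">
      <!-- child query runs once per parent row; this is what a
           pre-built temporary table joined in the parent query
           would let us avoid -->
      <entity name="feature"
              query="SELECT description FROM feature WHERE item_id='${item.id}'"/>
    </entity>
  </document>
</dataConfig>
```

With a temporary table populated up front, the parent entity could instead be a single joined SELECT against it.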
I am thinking of adding support in DIH for the following:

1) Temporary tables (maybe call them "temporary entities"?). This is specific to SQL, though, unless it can be generalized to other sources.
2) Parallel execution, including some mechanism to determine the number of records (whether a COUNT or MAX(custom_id) - MIN(custom_id)).
3) Support in DIH or Solr for posting documents to a remote index (i.e. a new UpdateHandler instead of DirectUpdateHandler2).

If any of these already exist, or anyone else is working on them (or you have better suggestions), please let me know.

Thanks!
Amit
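To make item 2 concrete, here is a minimal sketch of what I have in mind for slicing the id range across workers; the class and method names are mine, not anything in DIH, and each resulting [lo, hi] chunk would become one worker's `WHERE custom_id BETWEEN lo AND hi` clause (e.g. fed to an ExecutorService):

```java
import java.util.ArrayList;
import java.util.List;

public class RangePartitioner {

    /** Splits the inclusive range [minId, maxId] into at most nChunks
     *  contiguous, non-overlapping [lo, hi] slices. */
    public static List<long[]> partition(long minId, long maxId, int nChunks) {
        List<long[]> chunks = new ArrayList<long[]>();
        long total = maxId - minId + 1;
        long size = (total + nChunks - 1) / nChunks; // ceiling division
        for (long lo = minId; lo <= maxId; lo += size) {
            long hi = Math.min(lo + size - 1, maxId);
            chunks.add(new long[]{lo, hi});
        }
        return chunks;
    }

    public static void main(String[] args) {
        // MIN/MAX would come from a preliminary query, e.g.
        //   SELECT MIN(custom_id), MAX(custom_id) FROM my_table
        for (long[] c : partition(1, 1000, 4)) {
            System.out.println(c[0] + "-" + c[1]);
        }
        // prints: 1-250, 251-500, 501-750, 751-1000 (one per line)
    }
}
```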