Writing to a remote Solr through SolrJ is in the cards. I may even take it up after the 1.4 release. For now, your best bet is to extend the class SolrWriter and override the corresponding methods for add/delete.
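A minimal sketch of what such an override might do: buffer documents as DIH hands them over, and flush them to the remote server in batches. The `Consumer` here stands in for a SolrJ call such as `server.add(docs)`; the class and method names are illustrative assumptions, not actual Solr APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Sketch of the add/override idea: buffer documents locally and flush
 * them to a remote Solr in batches. The sender Consumer stands in for
 * a SolrJ call like server.add(batch); all names are illustrative.
 */
class RemoteBatchWriter {
    private final List<Map<String, Object>> buffer = new ArrayList<>();
    private final int batchSize;
    private final Consumer<List<Map<String, Object>>> sender;

    RemoteBatchWriter(int batchSize, Consumer<List<Map<String, Object>>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
    }

    /** Called once per document, as an overridden add() would be. */
    void add(Map<String, Object> doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** POST whatever is buffered and clear the buffer. */
    void flush() {
        if (!buffer.isEmpty()) {
            sender.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

Batching keeps the number of HTTP round-trips down, which matters once the loader and the index live on different machines.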
On Wed, Apr 29, 2009 at 2:06 AM, Amit Nithian <anith...@gmail.com> wrote:
> I do remember LuSQL and a discussion regarding the performance implications
> of using it compared to the DIH. My only reason to stick with DIH is that we
> may have other data sources for document loading in the near term that may
> make LuSQL too specific for our needs.
>
> Regarding the bug to write to the index in a separate thread: while helpful,
> it doesn't address my use case, which is as follows:
> 1) Write a loader application using EmbeddedSolr + SolrJ + DIH (create a
> bogus local request with path='/dataimport') so that the DIH code is invoked.
> 2) Instead of using the DirectUpdateHandler2 update handler, write a custom
> update handler that takes a Solr document and POSTs it to a remote Solr
> server. I could queue documents here and POST in bulk, but those are details.
> 3) Possibly multi-thread the DIH so that multiple threads can process
> different database segments, then construct and POST Solr documents.
>  - For example, thread 1 processes IDs 1-100; thread 2, 101-200; thread 3,
> 201-...
>  - If the Solr server is multithreaded in writing to the index, that's
> great and helps performance.
>
> #3 is possible depending on performance tests. #1 and #2 I believe I need
> because I want my loader separated from the master server for development,
> deployment, and just general separation of concerns.
>
> Thanks
> Amit
>
> On Tue, Apr 28, 2009 at 6:03 AM, Glen Newton <glen.new...@gmail.com> wrote:
>
>> Amit,
>>
>> You might want to take a look at LuSql [1] and see if it may be
>> appropriate for the issues you have.
>>
>> thanks,
>>
>> Glen
>>
>> [1] http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
>>
>> 2009/4/27 Amit Nithian <anith...@gmail.com>:
>> > All,
>> > I have a few questions regarding the data import handler. We have some
>> > pretty gnarly SQL queries to load our indices, and our current loader
>> > implementation is extremely fragile.
>> > I am looking to migrate over to the DIH; however, I am looking to use
>> > SolrJ + EmbeddedSolr + some custom stuff to remotely load the indices
>> > so that my index loader and main search engine are separated.
>> > Currently, unless I am missing something, the data gathering from the
>> > entity and the data processing (i.e. conversion to a Solr document)
>> > are done sequentially, and I was looking to make this execute in
>> > parallel so that I can have multiple threads processing different
>> > parts of the result set and loading documents into Solr. Secondly, I
>> > need to create temporary tables to store the results of a few queries
>> > and use them later for inner joins, and I was wondering how best to go
>> > about this.
>> >
>> > I am thinking of adding support in DIH for the following:
>> > 1) Temporary tables (maybe call them temporary entities)? --Specific
>> > only to SQL, though, unless it can be generalized to other sources.
>> > 2) Parallel support
>> >  - Including some mechanism to get the number of records (whether it
>> > be a count or MAX(custom_id)-MIN(custom_id))
>> > 3) Support in DIH or Solr to post documents to a remote index (i.e.
>> > create a new UpdateHandler instead of DirectUpdateHandler2).
>> >
>> > If any of these exist, or anyone else is working on this (OR you have
>> > better suggestions), please let me know.
>> >
>> > Thanks!
>> > Amit
>>
>> --
>> -
>
--
--Noble Paul
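The ID-range partitioning described in the thread (thread 1 takes IDs 1-100, thread 2 takes 101-200, and so on) can be sketched as a small helper that splits [minId, maxId] into contiguous chunks, one per worker. Each worker would then run its own `WHERE id BETWEEN lo AND hi` query and POST the resulting documents; the class and method names here are illustrative, not DIH APIs.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Splits an inclusive ID range [minId, maxId] into contiguous chunks,
 * one per worker thread. Each returned long[] is an inclusive {lo, hi}
 * pair; a worker would query its own slice of the table.
 */
class RangePartitioner {
    static List<long[]> split(long minId, long maxId, int workers) {
        List<long[]> ranges = new ArrayList<>();
        long span = maxId - minId + 1;
        long chunk = (span + workers - 1) / workers; // ceiling division
        for (long lo = minId; lo <= maxId; lo += chunk) {
            long hi = Math.min(lo + chunk - 1, maxId);
            ranges.add(new long[] { lo, hi });
        }
        return ranges;
    }
}
```

Note that MAX(custom_id)-MIN(custom_id) only gives even ranges, not even row counts, if there are gaps in the ID sequence; skewed chunks are the usual price of this scheme.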