Writing to a remote Solr through SolrJ is in the cards. I may even
take it up after the 1.4 release. For now your best bet is to extend
the SolrWriter class and override the corresponding methods for
add/delete.
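A rough sketch of that approach follows. The stub below stands in for the real org.apache.solr.handler.dataimport.SolrWriter (which lives in solr-core and whose upload method takes a SolrInputDocument); the simplified signatures are assumptions to be checked against the 1.4 source before use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Stand-in for org.apache.solr.handler.dataimport.SolrWriter -- the real
// class takes a SolrInputDocument; a Map is used here only to keep the
// sketch self-contained.
class SolrWriterStub {
    public boolean upload(Map<String, Object> doc) { return true; }
    public void deleteDoc(Object key) {}
}

// A writer that queues documents for a remote Solr instead of writing
// to the local index directly.
class RemoteSolrWriter extends SolrWriterStub {
    private final List<Map<String, Object>> pending =
            new ArrayList<Map<String, Object>>();

    @Override
    public boolean upload(Map<String, Object> doc) {
        pending.add(doc); // queue instead of a local write; flush over HTTP later
        return true;
    }

    @Override
    public void deleteDoc(Object key) {
        // would translate into a remote deleteById call instead of a local delete
    }

    public int pendingCount() { return pending.size(); }
}
```

The overridden methods forward each document to (or queue it for) the remote server rather than touching the local index.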
On Wed, Apr 29, 2009 at 2:06 AM, Amit Nithian <anith...@gmail.com> wrote:
> I do remember LuSQL and a discussion regarding the performance implications
> of using it compared to the DIH. My only reason to stick with DIH is that we
> may have other data sources for document loading in the near term that may
> make LuSQL too specific for our needs.
>
> Regarding the bug about writing to the index in a separate thread: while
> helpful, it doesn't address my use case, which is as follows:
> 1) Write a loader application using EmbeddedSolr + SolrJ + DIH (create a
> bogus local request with path='/dataimport') so that the DIH code is invoked
> 2) Instead of using the DirectUpdateHandler2 update handler, write a custom
> update handler that takes a Solr document and POSTs it to a remote Solr
> server. I could queue documents here and POST in bulk, but those are details.
> 3) Possibly multi-thread the DIH so that multiple threads can process
> different database segments, construct and POST solr documents.
>  - For example, thread 1 processes IDs 1-100, thread 2, 101-200, thread 3,
> 201-...
>  - If the Solr server is multithreaded in writing to the index, that's
> great and helps performance.
>
> #3 is possible depending on performance tests. #1 and #2 I believe I need
> because I want my loader separated from the master server for development,
> deployment and just general separation of concerns.
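Step 2 above (queue documents and POST in bulk) could be sketched roughly like this. The Sender interface is a hypothetical stand-in for the actual SolrJ call (e.g. an HTTP add against the remote server), and the batch size is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the remote call -- in practice this would be a SolrJ
// client POSTing the batch to the remote Solr server.
interface Sender {
    void send(List<String> docs); // one remote POST per batch
}

// Buffers documents coming out of the custom update handler and
// flushes them to the remote server in batches.
class BatchingPoster {
    private final Sender sender;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<String>();

    BatchingPoster(Sender sender, int batchSize) {
        this.sender = sender;
        this.batchSize = batchSize;
    }

    // Called once per document.
    void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) flush();
    }

    // POST whatever is queued and clear the buffer.
    void flush() {
        if (!buffer.isEmpty()) {
            sender.send(new ArrayList<String>(buffer));
            buffer.clear();
        }
    }
}
```

A final flush() after the import finishes catches the partial last batch.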
>
> Thanks
> Amit
>
> On Tue, Apr 28, 2009 at 6:03 AM, Glen Newton <glen.new...@gmail.com> wrote:
>
>> Amit,
>>
>> You might want to take a look at LuSql[1] and see if it may be
>> appropriate for the issues you have.
>>
>> thanks,
>>
>> Glen
>>
>> [1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
>>
>> 2009/4/27 Amit Nithian <anith...@gmail.com>:
>> > All,
>> > I have a few questions regarding the data import handler. We have some
>> > pretty gnarly SQL queries to load our indices, and our current loader
>> > implementation is extremely fragile. I am looking to migrate over to the
>> > DIH; however, I want to use SolrJ + EmbeddedSolr + some custom stuff
>> > to remotely load the indices so that my index loader and main search
>> > engine are separated.
>> > Currently, unless I am missing something, the data gathering from the
>> > entity and the data processing (i.e. conversion to a Solr document) are
>> > done sequentially, and I was looking to make this execute in parallel so
>> > that I can have multiple threads processing different parts of the result
>> > set and loading documents into Solr. Secondly, I need to create temporary
>> > tables to store the results of a few queries and use them later for inner
>> > joins, and I was wondering how best to go about this.
>> >
>> > I am thinking to add support in DIH for the following:
>> > 1) Temporary tables (maybe call them temporary entities)? -- Specific
>> > only to SQL, though, unless it can be generalized to other sources.
>> > 2) Parallel support
>> >  - Including some mechanism to get the number of records (whether it be
>> > a count or MAX(custom_id)-MIN(custom_id))
>> > 3) Support in DIH or Solr to post documents to a remote index (i.e.
>> > create a new UpdateHandler instead of DirectUpdateHandler2).
>> >
>> > If any of these exist, or anyone else is working on this (or you have
>> > better suggestions), please let me know.
>> >
>> > Thanks!
>> > Amit
>> >
>>
>>
>>
>



-- 
--Noble Paul
