There have been a couple of discussions about finding a DIH successor
(including on the HelioSearch list), but no real momentum as far as I can
tell.

I think somebody will have to really pitch in and do the same couple
of scenarios DIH does in several different frameworks (TodoMVC style).
That should get it going.
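
For concreteness, here is a rough sketch of one such scenario: rows from a
SQL query are batched on one thread and handed to several indexing threads.
It is only an illustration, not tested against a real Solr. DocSink and
ParallelIndexer are hypothetical names; with SolrJ the sink would wrap the
"server.add(docs)" call mentioned below, and the row iterator would wrap a
JDBC ResultSet. The batch size and thread count are arbitrary.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the Solr client (assumption, not a SolrJ type). */
interface DocSink {
    void add(List<Map<String, Object>> batch) throws Exception;
}

class ParallelIndexer {
    /**
     * Reads rows on the calling thread, groups them into batches, and hands
     * each batch to a fixed pool of workers. Returns the number of rows sent.
     */
    static long indexAll(Iterator<Map<String, Object>> rows, int batchSize,
                         int threads, DocSink sink) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // Cap the batches in flight so a fast reader cannot outrun a slow sink.
        Semaphore inFlight = new Semaphore(threads * 2);
        List<Future<?>> pending = new ArrayList<>();
        long sent = 0;
        try {
            List<Map<String, Object>> batch = new ArrayList<>(batchSize);
            while (rows.hasNext()) {
                batch.add(rows.next());
                sent++;
                if (batch.size() == batchSize) {
                    pending.add(submit(pool, inFlight, sink, batch));
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                pending.add(submit(pool, inFlight, sink, batch));
            }
            // Wait for every batch; get() rethrows any worker failure.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return sent;
    }

    private static Future<?> submit(ExecutorService pool, Semaphore inFlight,
                                    DocSink sink, List<Map<String, Object>> batch)
            throws InterruptedException {
        inFlight.acquire(); // blocks the reader while the workers are saturated
        return pool.submit(() -> {
            try {
                sink.add(batch);
            } finally {
                inFlight.release();
            }
            return null;
        });
    }

    public static void main(String[] args) throws Exception {
        // Fake "rows" standing in for a SQL result set.
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            Map<String, Object> row = new HashMap<>();
            row.put("id", i);
            rows.add(row);
        }
        AtomicLong indexed = new AtomicLong();
        long sent = indexAll(rows.iterator(), 1000, 4,
                batch -> indexed.addAndGet(batch.size()));
        System.out.println(sent + " rows sent, " + indexed.get() + " indexed");
    }
}
```

The semaphore is there because the database side can be much faster than
the Solr side, so the reader has to be throttled somehow. Whether this beats
running DIH on all shards at once is exactly what a side-by-side comparison
would have to show.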

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Mon, Feb 17, 2014 at 7:40 PM, Mikhail Khludnev
<mkhlud...@griddynamics.com> wrote:
> On Sat, Feb 15, 2014 at 1:07 PM, Shawn Heisey <s...@elyograg.org> wrote:
>
>> On 2/14/2014 10:45 PM, William Bell wrote:
>> > On virtual cores the DIH handler is really slow. On a 12 core box it only
>> > uses 1 core while indexing.
>> >
>> > Does anyone know how to do Java threading from a SQL query into Solr?
>> > Examples?
>> >
>> > I can use SolrJ to do it, or I might be able to modify DIH to enable
>> > threading.
>> >
>> > At some point in 3.x threading was enabled in DIH, but it was removed
>> > since people were having issues with it (we never did).
>>
>> If you know how to fix DIH so it can do multiple indexing threads
>> safely, please open an issue and upload a patch.
>>
> Please! Don't do it. Never again!
> https://issues.apache.org/jira/browse/SOLR-3011
>
> As far as I understand the general idea is to find the DIH successor
> https://issues.apache.org/jira/browse/SOLR-4799?focusedCommentId=13738424&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13738424
>
>
>>
>> I'm still using DIH for full rebuilds, but I'd actually like to replace
>> it with a rebuild routine written in SolrJ.  I currently achieve decent
>> speed by running DIH on all my shards at the same time.
>>
>> I do use SolrJ for once-a-minute index maintenance, but the code that
>> I've written to pull data out of SQL and write it to Solr is not able to
>> index millions of documents in a single thread as fast as DIH does.  I
>> have been building a multithreaded design in my head, but I haven't had
>> a chance to write real code and see whether it's actually a good design.
>>
>> For me, the bottleneck is definitely Solr, not the database.  I recently
>> wrote a test program that uses my current SolrJ indexing method.  If I
>> skip the "server.add(docs)" line, it can read all 91 million docs from
>> the database and build SolrInputDocument objects for them in 2.5 hours
>> or less, all with a single thread.  When I do a real rebuild with DIH,
>> it takes a little more than 4.5 hours -- and that is inherently
>> multithreaded, because it's doing all the shards simultaneously.  I have
>> no idea how long it would take with a single-threaded SolrJ program.
>>
>> Thanks,
>> Shawn
>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  <mkhlud...@griddynamics.com>
