You can run the DIH with multiple threads feeding from the same query. How much that helps also depends on document size: large documents may index faster if they each get their own thread. This may in turn interact with the new NRT multi-commit code.
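
As a rough back-of-envelope check on whether 10 million records is in range, the figures Shawn quotes below can be turned into a throughput estimate. This is only a sketch: it assumes "over 11 million" means exactly 11,000,000 documents per shard and "less than three hours" means exactly three hours, so the rates are lower bounds for his setup and say nothing certain about different hardware, schemas, or queries. All names in the snippet are made up for illustration.

```python
# Back-of-envelope DIH throughput estimate from the figures in the quoted message.
# Assumptions (not measurements): 11,000,000 docs per shard, 3 hours wall time,
# six shards importing in parallel. Real numbers were "over" and "less than" these.

DOCS_PER_SHARD = 11_000_000
PARALLEL_SHARDS = 6
IMPORT_SECONDS = 3 * 60 * 60  # three hours


def docs_per_second(docs: int, seconds: int) -> float:
    """Average indexing rate over the whole import."""
    return docs / seconds


# Lower bound on the rate a single shard sustained during the full import.
per_shard_rate = docs_per_second(DOCS_PER_SHARD, IMPORT_SECONDS)

# Combined rate across the six shards that import simultaneously.
aggregate_rate = per_shard_rate * PARALLEL_SHARDS

# Time for a 10 million record import at the single-shard rate.
TARGET_DOCS = 10_000_000
hours_for_target = TARGET_DOCS / per_shard_rate / 3600

print(f"per-shard rate : {per_shard_rate:,.0f} docs/s (lower bound)")
print(f"aggregate rate : {aggregate_rate:,.0f} docs/s (lower bound)")
print(f"10M documents  : about {hours_for_target:.1f} hours on one shard")
```

On those assumptions a single DIH instance sustains roughly a thousand documents per second, which would put a 10 million record import under three hours without any sharding.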

On Sun, Mar 4, 2012 at 5:19 PM, Shawn Heisey <s...@elyograg.org> wrote:
> On 3/4/2012 3:31 AM, Sphene Software wrote:
>>
>> Folks,
>>
>> I am planning to use DIH for an index of size 10 million records.
>>
>> I would like to know the following;
>> - Can DIH scale for this size of an indexes
>> - If DIH is a bottleneck, what is the specific issue and how it can be
>> addressed
>
>
> My entire index is about 67 million documents. There are a total of seven
> shards, six of them have over 11 million documents each. I can do a full
> dataimport (from MySQL) of those six shards simultaneously in less than
> three hours. The seventh shard is less than 500000 documents and builds
> after the others during a full rebuild. It is rare that we have to do a
> full rebuild, it's mostly at schema change time.
>
> I use SolrJ for updates, my experience with that so far suggests that doing
> the full import with my SolrJ code would take significantly longer than
> three hours.
>
> Thanks,
> Shawn
>

--
Lance Norskog
goks...@gmail.com