On Mon, Mar 5, 2012 at 5:56 AM, Lance Norskog <goks...@gmail.com> wrote:
> You can run the DIH with multiple threads feeding from the same query.
> FWIW, https://issues.apache.org/jira/browse/SOLR-3011
> Depends also on the size of the document: large documents may index
> faster if they have their own threads. This may then interact with the
> new NRT multi-commit code.
>
> On Sun, Mar 4, 2012 at 5:19 PM, Shawn Heisey <s...@elyograg.org> wrote:
> > On 3/4/2012 3:31 AM, Sphene Software wrote:
> >>
> >> Folks,
> >>
> >> I am planning to use DIH for an index of size 10 million records.
> >>
> >> I would like to know the following:
> >> - Can DIH scale for an index of this size?
> >> - If DIH is a bottleneck, what is the specific issue and how can it be
> >> addressed?
> >
> >
> > My entire index is about 67 million documents. There are a total of seven
> > shards, six of them have over 11 million documents each. I can do a full
> > dataimport (from MySQL) of those six shards simultaneously in less than
> > three hours. The seventh shard is less than 500000 documents and builds
> > after the others during a full rebuild. It is rare that we have to do a
> > full rebuild; it's mostly at schema change time.
> >
> > I use SolrJ for updates; my experience with that so far suggests that doing
> > the full import with my SolrJ code would take significantly longer than
> > three hours.
> >
> > Thanks,
> > Shawn
> >
>
> --
> Lance Norskog
> goks...@gmail.com
>

--
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>