Hey Jack,

I have indexed around 10 million documents so far, which comes to about 20 GB of index. Each document consists of nearly 100 string fields, with up to 10 characters of data per field. In my case the number of fields per document can grow considerably (from the current 100 to 500 or even more).
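To put those figures in perspective, here is a rough back-of-the-envelope sketch. It assumes index size grows linearly with document count and with the number of fields, which is only an approximation (real growth depends on term overlap, stored vs. indexed fields, and merges). The function names and the `field_growth` parameter are made up for illustration; the 10 billion document target and the 100 million docs-per-shard goal are the numbers from the quoted thread below.

```python
import math

GB = 10**9  # decimal gigabytes; an assumption, the thread does not say which unit


def bytes_per_doc(index_bytes, doc_count):
    """Average on-disk index bytes per document."""
    return index_bytes / doc_count


def projected_index_bytes(per_doc_bytes, doc_count, field_growth=1.0):
    """Linear extrapolation of index size.

    field_growth is a hypothetical multiplier, e.g. 5.0 if documents grow
    from 100 to 500 fields and size scales in proportion (an assumption).
    """
    return per_doc_bytes * field_growth * doc_count


# Figures from this thread: 10 million docs currently occupy about 20 GB.
per_doc = bytes_per_doc(20 * GB, 10_000_000)

# Target from the thread: 10 billion docs, at 100 million docs per shard.
total = projected_index_bytes(per_doc, 10_000_000_000)
shards = math.ceil(10_000_000_000 / 100_000_000)
per_shard = total / shards

print(f"{per_doc:.0f} bytes per document")
print(f"{total / GB:.0f} GB total index across {shards} shards")
print(f"{per_shard / GB:.0f} GB of index per shard")
```

Under these assumptions the full data set would come to roughly 20 TB of index, or about 200 GB per shard at 100 million documents each, before any growth in the number of fields.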
Leaving the exceptional cases aside, I am mainly interested in a way to keep the data evenly distributed, with the right ratio of index size per shard.

Thanks!

On Wed, Jun 4, 2014 at 7:47 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> How many documents was in that 20GB index?
>
> I'm skeptical that a 1 billion document shard "won't be a problem." I mean
> technically it is possible, but as you are already experiencing, it may
> take a long time and a very powerful machine to do so. 100 million (or 250
> million max) would be a more realistic goal. Even then, it depends on your
> doc size and machine size.
>
> The main point from the previous discussion is that although the technical
> hard limit for a Solr shard is 2G docs, from a practical perspective it is
> very difficult to get to that limit, not that indexing 1 billion docs on a
> single shard is "just fine"!
>
> As a general rule, if you want fast queries for high volume, strive to
> assure that your per-shard index fits entirely into the system memory
> available for OS caching of file system pages.
>
> In any case, a proof of concept implementation will tell you everything
> you need to know.
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Vineet Mishra
> Sent: Wednesday, June 4, 2014 2:45 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr maximum Optimal Index Size per Shard
>
>
> Thanks all for your response.
> I presume this conversation concludes that indexing around 1Billion
> documents per shard won't be a problem, as I have 10 Billion docs to index,
> so approx 10 shards with 1 Billion each should be fine with it and how
> about Memory, what size of RAM should be fine for this amount of data?
> Moreover what should be the indexing technique for this huge data set, as
> currently I am indexing with EmbeddedSolrServer but its going pathetically
> slow after some 20Gb of indexing.
> Comparatively SolrHttpPost was slow due
> to network delays and response but after this long running the indexing
> with EmbeddedSolrServer I am getting a different notion.
> Any good indexing technique for this huge dataset would be highly
> appreciated.
>
> Thanks again!
>
>
> On Wed, Jun 4, 2014 at 6:40 AM, rulinma <ruli...@gmail.com> wrote:
>
> mark.
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Solr-maximum-Optimal-Index-Size-per-Shard-tp4139565p4139698.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
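
Jack's rule of thumb quoted above (the per-shard index should fit in the memory available for OS file caching) can be turned into a rough shard-count estimate. This is only a sketch under stated assumptions: one shard per node, an index size extrapolated linearly from the thread's figures (20 GB for 10 million docs, so about 20 TB for 10 billion), and hypothetical hardware numbers (64 GB or 256 GB of RAM per node, 16 GB reserved for the JVM heap and the OS). None of the function names come from Solr.

```python
import math

GB = 10**9  # decimal gigabytes (assumption)


def shards_for_cache_fit(total_index_bytes, ram_bytes, reserved_bytes):
    """Shards needed so each shard's index fits in the OS page cache.

    Assumes one shard per node; reserved_bytes covers JVM heap and OS overhead.
    """
    cache_bytes = ram_bytes - reserved_bytes
    if cache_bytes <= 0:
        raise ValueError("no memory left for the OS page cache")
    return math.ceil(total_index_bytes / cache_bytes)


def shards_needed(total_docs, total_index_bytes, max_docs_per_shard,
                  ram_bytes, reserved_bytes):
    """Take the stricter of the doc-count limit and the cache-fit limit."""
    by_docs = math.ceil(total_docs / max_docs_per_shard)
    by_cache = shards_for_cache_fit(total_index_bytes, ram_bytes, reserved_bytes)
    return max(by_docs, by_cache)


TOTAL_DOCS = 10_000_000_000   # target from the thread
TOTAL_INDEX = 20_000 * GB     # linear extrapolation from 20 GB per 10M docs
MAX_DOCS = 100_000_000        # the "more realistic goal" from the thread

# Hypothetical hardware: 64 GB nodes vs. 256 GB nodes, 16 GB reserved on each.
small_nodes = shards_needed(TOTAL_DOCS, TOTAL_INDEX, MAX_DOCS, 64 * GB, 16 * GB)
large_nodes = shards_needed(TOTAL_DOCS, TOTAL_INDEX, MAX_DOCS, 256 * GB, 16 * GB)

print(f"64 GB nodes:  {small_nodes} shards")
print(f"256 GB nodes: {large_nodes} shards")
```

With small nodes the memory rule dominates and the shard count climbs well past what the document-count limit alone would suggest; with large nodes the 100 million docs-per-shard goal becomes the binding constraint. As the thread says, a proof of concept on real data is the only reliable way to settle the numbers.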