Hi Toke,

That is spectacular. It is really great to hear that you have already indexed
2.7TB+ of data and that query response times for such a huge dataset are
still on the order of milliseconds to a few seconds.
Could you tell me which indexing mechanism you are using? I started with
EmbeddedSolrServer, but it became quite slow after a few tens of GB (~30+) of
indexing; I started indexing a week ago and it is still only at 37GB. I
assume an HTTP POST based approach will also be sluggish because of network
latency and the wait for each response. I then tried CloudSolrServer, but I
am running into a strange exception, "ClassCastException Cannot cast to
Exception", while adding a SolrInputDocument to the server:

        CloudSolrServer server1 =
                new CloudSolrServer("zkHost:port1,zkHost:port2,zkHost:port3", false);
        server1.setDefaultCollection("mycollection");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("ID", "123");
        doc.addField("A0_s", "282628854");

        server1.add(doc); // Error is thrown at this line
        server1.commit();
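
To avoid waiting on a full round trip for every single document, I am
thinking of batching the adds so that each HTTP request carries many
documents at once. A rough sketch of what I have in mind is below;
"documentsToIndex" just stands in for whatever produces my documents, and
the batch size of 1000 is an arbitrary starting point, not something I have
tuned:

        CloudSolrServer server = new CloudSolrServer("zkHost:port1,zkHost:port2,zkHost:port3");
        server.setDefaultCollection("mycollection");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (SolrInputDocument doc : documentsToIndex) {
            batch.add(doc);
            // Flush in chunks so that one request carries many documents
            // instead of paying the network latency once per document.
            if (batch.size() == 1000) {
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch); // send the final partial batch
        }
        server.commit();

If even that turns out to be too slow over HTTP, would something like
ConcurrentUpdateSolrServer, which queues documents and streams them to Solr
from background threads, be a better choice for bulk loads?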

Thanks again, Toke, for sharing those stats.


On Fri, Jun 6, 2014 at 5:04 PM, Toke Eskildsen <t...@statsbiblioteket.dk>
wrote:

> On Fri, 2014-06-06 at 12:32 +0200, Vineet Mishra wrote:
> > Does that mean that for querying smoothly we need memory at least equal
> > to or greater than the size of the index?
>
> If you absolutely, positively have to reduce latency as much as
> possible, then yes. With an estimated index size of 2TB, I would guess
> that 10-20 machines with powerful CPUs (1 per shard per expected
> concurrent request) would also be advisable. While you're at it, do make
> sure that you're using high-speed memory.
>
> That was not a serious suggestion, should you be in doubt. Very few
> people need the best latency possible. Most just need the individual
> searches to be "fast enough" and want to scale throughput instead.
>
> > As in my case the index will be very large (~2TB), that amount of
> > memory is not practical. Even if it is split across multiple shards,
> > say around 10 shards, 200GB of RAM would still not be a feasible
> > option.
>
> We're building a projected 24TB index collection and are currently at
> 2.7TB+, growing with about 1TB/10 days. Our current plan is to use a
> single machine with 256GB of RAM, but we will of course adjust along the
> way if it proves to be too small.
>
> Requirements differ with the corpus and the needs, but for us, SSDs as
> storage seems to provide quite enough of a punch. I did a little testing
> yesterday: https://plus.google.com/u/0/+TokeEskildsen/posts/4yPvzrQo8A7
>
> tl;dr: for small result sets (< 1M hits) on unwarmed searches with
> simple queries, response time is below 100ms. If we enable faceting with
> plain Solr, this jumps to about 1 second.
>
> I did a top on the machine and it says that 50GB is currently used for
> caching, so an 80GB (and probably less) machine would work fine for our
> 2.7TB index.
>
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>
