On 2/21/2014 1:39 AM, search engn dev wrote:
> As you suggested, I have indexed 12 million sample records in Solr on
> hardware with 8GB RAM. The size of the index is 3GB. Can I extrapolate
> this to predict the actual size of the index?

If those records are about the same size as the records in the system as
a whole, you can probably use that ratio to extrapolate.
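
One way to run that extrapolation, assuming index size scales roughly
linearly with document count (reasonable if the records are similar):

  estimated index size = 3GB * (total docs / 12,000,000)

The real ratio can drift a bit as segments merge and as the term
dictionary stops growing, so treat the result as a ballpark figure.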

Based on that, I would guess that the index is probably going to be
about 85GB.  That's a lot less than I would have guessed, so perhaps
there's a lot of extra stuff in that 250GB that doesn't actually get
sent to Solr.

Even though the documents are small, the sheer number of them will
probably require a larger Java heap than the relatively small index size
alone would suggest.
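
As a rough sketch, with the example Jetty setup that ships with Solr, the
heap is set on the startup command line (the 8g here is just a
placeholder; the right value has to come from testing with your data):

  java -Xms8g -Xmx8g -jar start.jar

Setting -Xms equal to -Xmx avoids pauses from heap resizing.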

Do you have any idea what kind of query volume you're going to have?  If
it's low, you can put multiple shards on your multi-CPU machines and take
advantage of parallel processing.  If the query volume is high, you'll
need all those CPUs to handle the load of one shard, and you might need
more than two machines for each shard.
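
For example (a sketch, with hypothetical names and counts), if you go the
SolrCloud route (more on that below), the Collections API lets you put
several shards on each node:

  http://host:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=8&replicationFactor=2&maxShardsPerNode=4

With a layout like that, a low-volume query fans out across the shards
and uses all the CPUs on a machine in parallel.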

You'll want to shard your index even though it's relatively small in
terms of disk space, because a billion documents is a LOT.

If you're just starting out, SolrCloud is probably a good way to go.  It
handles document routing across shards for you.  You didn't say whether
that was your plan or not.
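
A minimal sketch of bringing up SolrCloud nodes (the ZooKeeper hosts are
placeholders for your own ensemble):

  java -DzkHost=zk1:2181,zk2:2181,zk3:2181 -jar start.jar

Once a collection is created with numShards, the default compositeId
router hashes each document's uniqueKey to pick its shard, so the routing
really is hands-off.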

Thanks,
Shawn
