On 12/5/2011 6:57 PM, Jamie Johnson wrote:
> Question which is a bit off topic: you mention your algorithm for
> sharding; how do you handle updates, or do you not have to deal with
> that in your scenario?

I have a long-running program based on SolrJ that handles updates. Once
a minute, I run through [...] ways. The most recent data
(between 3.5 and 7 days, trying to keep it below 500,000 records) goes
into one shard. The rest of the data is split using the formula
crc32(did) % numShards. The value of numShards is currently six. Each of
those large shards has nearly 11 million documents in 20GB of disk space.
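The split described here (one small "hot" shard for recent documents, everything older assigned by crc32(did) % numShards) can be sketched in Java. The class and method names, the use of the decimal string form of the document id as CRC input, and the exact age threshold are my assumptions for illustration, not details confirmed by the thread:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ShardRouter {
    static final int NUM_SHARDS = 6;                            // "numShards is currently six"
    static final long HOT_WINDOW_MS = 7L * 24 * 60 * 60 * 1000; // assumed ~7-day hot window

    // Route a document: recent ones go to the single "hot" shard,
    // older ones to one of the large shards via crc32(did) % numShards.
    static String shardFor(long did, long docTimeMs, long nowMs) {
        if (nowMs - docTimeMs < HOT_WINDOW_MS) {
            return "hot";
        }
        CRC32 crc = new CRC32();
        crc.update(Long.toString(did).getBytes(StandardCharsets.US_ASCII));
        return "shard" + (crc.getValue() % NUM_SHARDS);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(shardFor(12345L, now, now)); // prints "hot"
        System.out.println(shardFor(12345L, 0L, now));  // one of shard0..shard5, stable per did
    }
}
```

One consequence of this scheme, presumably, is that a document only has to move once: when it ages out of the hot window, it is deleted from the hot shard and reindexed into the shard its crc32 value dictates, and routine updates before that point touch only the hot shard.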
On Mon, Dec 5, 2011 at 3:28 PM, Shawn Heisey wrote:
> On 12/4/2011 12:41 AM, Ted Dunning wrote:
>> Read the papers I referred to. They describe how to search a fairly
>> enormous corpus with an 8GB in-memory index (and no disk cache at all).
>
> They would seem to indicate moving away from Solr. While that would not
> be entirely out of the question, I don't [...]
On Sat, Dec 3, 2011 at 6:36 PM, Shawn Heisey wrote:
> On 12/3/2011 2:25 PM, Ted Dunning wrote:
>> Things have changed since I last did this sort of thing seriously. My
>> guess is that this is a relatively small amount of memory to devote to
>> search. It used to be that the only way to do this effectively with
>> Lucene-based systems was to keep the heap rel[...]
> [...] The value of numShards is currently six. Each of those large
> shards has nearly 11 million documents in 20GB of disk space.

OK. That is a relatively common arrangement.

> I am already using the concept of micro-sharding, but certainly not on a
> grand scale. One copy of the index is served by two hosts with 8 CPU
> cores, so each host has three of the large shards. Doing some least
> common multiple calculations, I have determined that 420 shards would
> allow me to use the [...]
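The message cuts off before saying what 420 shards would allow, but 420 is the least common multiple of 1 through 7, so a 420-way split divides evenly over any host count up to seven; that reading of the "least common multiple calculations" is my assumption. A minimal sketch of the arithmetic:

```java
public class ShardMath {
    // Euclid's algorithm, then lcm(a, b) = a / gcd(a, b) * b.
    static long gcd(long a, long b) { return b == 0 ? a : gcd(b, a % b); }
    static long lcm(long a, long b) { return a / gcd(a, b) * b; }

    public static void main(String[] args) {
        long shards = 1;
        for (int n = 1; n <= 7; n++) shards = lcm(shards, n);
        System.out.println(shards); // prints 420
        // 420 micro-shards spread evenly across 2..7 hosts:
        for (int hosts = 2; hosts <= 7; hosts++) {
            System.out.println(hosts + " hosts -> " + (420 / hosts) + " shards each");
        }
    }
}
```

With 420 micro-shards, repartitioning after adding a host is just moving whole micro-shards, never re-hashing documents, as long as the host count stays within the divisor set.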