On 9/2/2015 3:32 PM, Renee Sun wrote:
> I think we have similar structure where we use frontier/back instead of
> hot/cold :-)
>
> so yes we will probably have to do the same.
>
> since we have large customers and some of them may have terabytes of
> data and end up with hundreds of cold cores.... the blind delete
> broadcasting to all of them is a performance killer.
>
> I am thinking of adding an in-memory inventory of coreID : docID so I
> can identify which core a document is in efficiently... what do you
> think about it?

I could write code for the deleteByQuery method to figure out where to
send the requests.  Performance hasn't become a problem with the "send
to all shards" method.  If it does, then I know exactly what to do:

If the ID value that we use for sharding is larger than X, it goes to
the hot shard.  If not, then I would CRC32 hash the ID, mod the hash
value by the number of cold shards, and send it to the shard number (0
through 5 for our indexes) that comes out.
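In code, the routing would look something like the sketch below.  The
cutoff value and the shard names are hypothetical placeholders (the real
value of X depends on the current layout); the CRC32/mod step matches
what I described above:

```python
import zlib

NUM_COLD_SHARDS = 6  # cold shards numbered 0 through 5, as in our indexes
HOT_CUTOFF = 50_000_000  # hypothetical value of X; IDs above this are in the hot shard


def shard_for_id(delete_id: int) -> str:
    """Return the name of the shard that holds this sharding ID."""
    if delete_id > HOT_CUTOFF:
        return "hot"
    # CRC32 hash the ID (as text), mod by the number of cold shards
    crc = zlib.crc32(str(delete_id).encode("ascii"))
    return "cold%d" % (crc % NUM_COLD_SHARDS)
```

A deleteByQuery helper would call shard_for_id for each ID in the query
and send each delete only to the shard that comes back, instead of
broadcasting to all of them.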

Our sharding ID field is actually not our uniqueKey field for Solr,
although it is the autoincrement primary key on the source MySQL
database.  Another way to think about this field is as the "delete id". 
Our Solr uniqueKey is a different field that has a unique-enforcing
index in MySQL.

If you want good performance with sharding operations, then you need a
sharding algorithm that is completely deterministic based on the key
value and the current shard layout.  If the shard layout changes then it
should not change frequently.  Our layout changes only once a day, at
which time the oldest documents are moved from the hot shard to the cold
shards.
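The reason the layout must be stable is easy to demonstrate: with
hash-mod routing, changing the shard count remaps most keys, so every
document that moves would have to be physically reindexed.  A quick
sketch (shard counts chosen only for illustration):

```python
import zlib


def cold_shard(delete_id: int, num_shards: int) -> int:
    """CRC32-mod routing: deterministic for a fixed shard count."""
    return zlib.crc32(str(delete_id).encode("ascii")) % num_shards


# Count how many of 10,000 IDs would land on a different cold shard
# if the layout changed from 5 cold shards to 6.
moved = sum(1 for i in range(10_000) if cold_shard(i, 5) != cold_shard(i, 6))
# Roughly five out of six keys move, which is why the layout should
# change rarely, if ever.
```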

Thanks,
Shawn
