Hi everyone,

I'm running into a problem with SolrCloud replicas and thought I would ask the 
list to see if anyone else has seen this / gotten past it.

Right now, we are running with only one replica per shard.  This is obviously a
problem because if any node goes down, the whole collection goes offline.  Due
to garbage collection issues, this happens once or twice a week, causing a
great deal of instability.  If we increase to 2 replicas per shard, then once
we index new documents and the shards autocommit, the replicas of each shard
get out of sync with each other, with different numbers of documents, different
numbers of deleted documents, different facet counts - pretty much totally
divergent indexes.  Shards always show green and available, and never go into
recovery or any other state that would indicate a mismatch.  There are also no
errors in the logs to indicate anything is going wrong.  Even long after
indexing has finished, the replicas never come back into sync.  The only way to
get consistency again is to delete one set of replicas and then add them back
in.  Unfortunately, when we do this, we invariably discover that many documents
(2-3%) are missing from the index.
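
For anyone who wants to reproduce the mismatch, here is a minimal sketch of one
way to compare replicas, assuming each replica core is queried directly with
distrib=false so the request is not fanned out to other cores.  The core URLs
and the find_divergent_shards / count_docs names are hypothetical, and the
counts are only comparable once every replica has committed.

```python
import json
import urllib.parse
import urllib.request


def count_docs(core_url):
    """Return numFound for one replica core, bypassing distributed search."""
    params = urllib.parse.urlencode(
        {"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"}
    )
    with urllib.request.urlopen(f"{core_url}/select?{params}", timeout=30) as resp:
        return json.load(resp)["response"]["numFound"]


def find_divergent_shards(replicas_by_shard, counter=count_docs):
    """Map shard name -> {core_url: count} for shards whose replicas disagree.

    replicas_by_shard: dict of shard name -> list of replica core URLs
    (hypothetical input, e.g. taken from the cluster state).
    counter: callable returning the live document count for one core URL.
    """
    divergent = {}
    for shard, core_urls in replicas_by_shard.items():
        counts = {url: counter(url) for url in core_urls}
        # More than one distinct count means the replicas disagree.
        if len(set(counts.values())) > 1:
            divergent[shard] = counts
    return divergent
```

The counter argument is injectable so the comparison logic can be exercised
without a running cluster.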

We have tried setting the min_rf parameter, and have found that with min_rf=2
we almost never get back rf=2.  We almost always get rf=1; we then resend the
request, get rf=1 again, and end up in what is effectively an infinite loop.
The only way to get rf=2 back is to index one document at a time.
Unfortunately, we have to update millions of documents a day, so indexing this
way isn't really feasible, and even when indexing one document at a time we
still occasionally find ourselves in an infinite loop.  This doesn't appear to
be related to the documents we are indexing - if we stop the indexing process
and restart Solr, the exact same document will go through fine the next time,
until indexing stalls on another random document.
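
To make the retry behaviour concrete, here is a rough sketch of a bounded
resend loop, which gives up after a fixed number of attempts instead of
retrying forever.  It assumes the achieved replication factor comes back as
"rf" in the responseHeader of the parsed update response; send is a
hypothetical callable that posts the batch with the given min_rf and returns
that parsed response, and the attempt limit and delays are arbitrary.

```python
import time


def achieved_rf(response):
    """Read the achieved replication factor from a parsed update response.

    Assumption: the value is reported as responseHeader.rf.
    """
    return response.get("responseHeader", {}).get("rf")


def send_with_min_rf(send, docs, min_rf=2, max_attempts=5,
                     base_delay=0.5, sleep=time.sleep):
    """Resend docs until rf >= min_rf or the attempts run out.

    send: hypothetical callable (docs, min_rf) -> parsed response dict.
    Returns (succeeded, attempts_used, last_rf).
    """
    last_rf = None
    for attempt in range(1, max_attempts + 1):
        last_rf = achieved_rf(send(docs, min_rf))
        if last_rf is not None and last_rf >= min_rf:
            return True, attempt, last_rf
        if attempt < max_attempts:
            # Exponential backoff before the next resend.
            sleep(base_delay * 2 ** (attempt - 1))
    return False, max_attempts, last_rf
```

A batch that still fails after max_attempts can be logged and set aside rather
than blocking the rest of the indexing run.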

We have 8 nodes, with 4 shards apiece, all running one collection with about
900M documents.  An important note is that we have a block join system with 3
tiers of documents (products -> skus -> sku_history).  During indexing, we are
forced to delete all documents for a product before adding the product back
into the index, in order to avoid orphaned children / grandchildren.  All
documents, at every tier, are indexed with the top-level product ID so that we
can delete all child/grandchild documents before updating the product.  So, for
each updated product, we send a delete call followed by an add call.  We have
also tried putting both the delete and the add in the same update request, with
the same results.
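
As an illustration of the combined request, here is a sketch that builds a
single JSON update body containing a delete-by-query followed by an add.  The
field name product_id and the function name are hypothetical stand-ins, not
our actual schema or indexing code; nested skus would go under
_childDocuments_ in the product document.

```python
import json


def build_replace_request(product_id, product_doc, id_field="product_id"):
    """Build one JSON update body: delete the whole block, then re-add it.

    id_field is a hypothetical name for the field that carries the
    top-level product ID on every document in the block.
    """
    # Escape backslashes and quotes so the ID is safe inside a quoted term.
    escaped = str(product_id).replace("\\", "\\\\").replace('"', '\\"')
    commands = {
        # dicts keep insertion order, so the delete is serialized first.
        "delete": {"query": f'{id_field}:"{escaped}"'},
        "add": {"doc": product_doc},
    }
    return json.dumps(commands)


body = build_replace_request(
    "P123",
    {
        "id": "P123",
        "product_id": "P123",
        "_childDocuments_": [{"id": "S1", "product_id": "P123"}],
    },
)
```

The resulting string would be posted to the collection's update handler with a
JSON content type.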

Everything we have found by searching says that none of what we're seeing
should be happening.

We are currently running Solr 6.0 with ZooKeeper 3.4.6.  We experienced the
same behavior on 5.4 as well.

--
Steve
