Hi Daniel,
Maybe it is Monday and I am still not warmed up, but your details seem a bit 
imprecise to me. It may not be directly related to your problem, but just to 
rule out some strange Solr setup, here is my understanding: 
You are running a single SolrCloud cluster with 8 nodes. It has a single 
collection with X shards and Y replicas. You use DIH to index data and you use 
curl to interact with Solr and start the DIH process. You see that some 
replicas of some shards have less data, and after a node restart they end up 
being OK.

Is this right? If it is, what are X and Y? Do you have autocommit set up, or 
do you commit explicitly? Did you check the logs on the node with less data, 
and did you see any errors/warnings? Do you do full imports or incremental 
imports?
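
To see which replica is behind, one option is to query each core directly 
with distrib=false, so that it answers only from its own index, and compare 
the document counts of the replicas of one shard. Below is a minimal Python 
sketch of that idea. The core URLs and helper names are made up for 
illustration, and a matching count is only a coarse check: a replica can hold 
the same number of documents and still serve stale field values.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen


def replica_count_url(core_url):
    """Build a match-all query URL that is answered by this core only."""
    params = {"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"}
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def fetch_count(core_url, timeout=10):
    """Return numFound as reported by a single core (needs a live Solr)."""
    with urlopen(replica_count_url(core_url), timeout=timeout) as resp:
        return json.load(resp)["response"]["numFound"]


def out_of_sync(counts):
    """Given {core_url: numFound} for the replicas of ONE shard, return the
    entries whose count differs from the most common count."""
    if not counts:
        return {}
    expected, _ = Counter(counts.values()).most_common(1)[0]
    return {url: n for url, n in counts.items() if n != expected}
```

In practice you would call fetch_count() for every replica of a shard (the 
core URLs can be read from the Cloud view in the admin UI) and pass the 
resulting dictionary to out_of_sync().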

Not related to the issue, but just a note that Solr does not guarantee 
consistency at any given time. It offers what is called “eventual 
consistency”: once updates stop, all replicas will (should) end up in the same 
state. Having said that, using Solr results directly in your UI would require 
you to either cache the documents used on the UI/middle layer, implement some 
sort of stickiness, or retrieve only the ID from Solr and load the data from 
primary storage. If you have static data and you update the index once a day, 
you can use aliases to switch between the new and old index, and you will 
suffer from this issue only while doing the switch.
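
As a rough illustration of the alias approach, the sketch below builds the 
Collections API CREATEALIAS request that points an alias at the freshly 
indexed collection. Calling CREATEALIAS with an existing alias name repoints 
it. The alias name "products" and the collection names "products_a" and 
"products_b" are assumptions for the example, not something taken from your 
setup.

```python
from urllib.parse import urlencode


def create_alias_url(solr_base, alias, collection):
    """Build the Collections API URL that points `alias` at `collection`."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return solr_base.rstrip("/") + "/admin/collections?" + urlencode(params)


def next_collection(current, candidates=("products_a", "products_b")):
    """Pick the collection that is NOT currently behind the alias, i.e. the
    one the next full import should rebuild."""
    first, second = candidates
    return second if current == first else first
```

The daily job would then import into next_collection(current), check the 
result, and only afterwards request create_alias_url(...) (for example with 
curl) so that queries move to the new index in one step.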

Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 12 Feb 2018, at 12:00, Daniel Carrasco <d.carra...@i2tic.com> wrote:
> 
> Hello, thanks for your help.
> 
> I answer below.
> 
> Greetings!!
> 
> 2018-02-12 11:31 GMT+01:00 Emir Arnautović <emir.arnauto...@sematext.com>:
> 
>> Hi Daniel,
>> Can you tell us more about your document update process. How do you commit
>> changes? Since it got fixed after restart, it seems to me that on that one
>> node index searcher was not reopened after updates. Do you see any
>> errors/warnings on that node?
>> 
> 
> I've asked the programmers and it looks like they are using the collection's
> dataimport via curl. I think the data is imported from a Microsoft SQL
> Server using a Solr plugin.
> 
> 
>> Also, what do you mean by “All nodes are standalone”?
>> 
> 
> I mean that the nodes don't share a filesystem (I'm planning to use Hadoop,
> but I have to learn to create and maintain the cluster first). All nodes have
> their own data drive and are connected to the cluster using ZooKeeper.
> 
> 
>> 
>> Regards,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 12 Feb 2018, at 11:16, Daniel Carrasco <d.carra...@i2tic.com> wrote:
>>> 
>>> Hello,
>>> 
>>> We're using Solr to manage product data in our shop, and last week some
>>> customers called us saying that the price differs between the shop and the
>>> shopping basket. After researching a bit, I've noticed that it sometimes
>>> happens on page refresh.
>>> After disabling all caches, I queried all Solr instances to see if the data
>>> is correct, and I've seen that one of them gives a different price for the
>>> product, so it looks like that instance has not got the updated data.
>>> 
>>>  - How is it possible that a node in a cluster has different data?
>>>  - How can I check if the data is in sync? The cluster looks all
>>>  healthy in the admin UI, and the node is active and OK.
>>>  - Is there any way to detect this error? And how can I force a resync?
>>> 
>>> After restarting the node it got synced, so the data is now OK, but I can't
>>> restart the nodes every time to see if the data is right (it took a lot of
>>> time to sync again).
>>> 
>>> My configuration is: 8 Solr nodes using v7.1.0 and ZooKeeper v3.4.11. All
>>> nodes are standalone (I'm not using Hadoop).
>>> 
>>> Thanks and greetings!
>>> --
>>> _________________________________________
>>> 
>>>     Daniel Carrasco Marín
>>>     Ingeniería para la Innovación i2TIC, S.L.
>>>     Tlf:  +34 911 12 32 84 Ext: 223
>>>     www.i2tic.com <http://www.i2tic.com/>
>>> _________________________________________
>> 
>> 
> 
> 
> -- 
> _________________________________________
> 
>      Daniel Carrasco Marín
>      Ingeniería para la Innovación i2TIC, S.L.
>      Tlf:  +34 911 12 32 84 Ext: 223
>      www.i2tic.com <http://www.i2tic.com/>
> _________________________________________
