Hello,

2018-02-12 12:32 GMT+01:00 Emir Arnautović <emir.arnauto...@sematext.com>:

> Hi Daniel,
> Maybe it is Monday and I am still not warmed up, but your details seem a
> bit imprecise to me. Maybe not directly related to your problem, but just
> to exclude that you are having some strange Solr setup, here is my
> understanding: You are running a single SolrCloud cluster with 8 nodes. It
> has a single collection with X shards and Y replicas. You use DIH to index
> data and you use curl to interact with Solr and start the DIH process. You
> see some replicas for some shards having less data, and after a node
> restart it ends up being OK.
> Is this right? If it is, what are X and Y?

Close to it:
- I have a SolrCloud cluster with 8 nodes, but it has multiple collections.
- Every collection has only one shard, for performance reasons (I did some tests splitting shards and queries were slower).
- Every collection has 8 replicas (one per node).
- After restarting a node, it starts to recover the collections. I don't know whether Solr serves data directly in that state or fetches the data from other nodes before serving it, but even while it is recovering, the data looks OK.

> Do you have autocommit set up or do you commit explicitly?

I'm not sure about that. How can I check it? It is not specified in the curl command, but it will be true by default, right?

> Did you check logs on the node with less data and did you see any
> errors/warnings?

I'm not sure when it failed, and the cluster logs a lot of warnings and errors all the time (maybe related to queries from the shop), so it is hard to determine whether an import error exists and which error is related to the import. It's like searching for a needle in a haystack.

> Do you do full imports or incremental imports?

I've checked the curl command, and it looks like it is doing full imports without cleaning the data:

http://' . $solr_ip . ':8983/solr/descriptions/dataimport?command=full-import&clean=false&entity=description_'.$idm[$j].'_lastupdate

> Not related to the issue, but just a note that Solr does not guarantee
> consistency at any time - it has something called “eventual consistency” -
> once updates stop, all replicas will (should) end up in the same state.
> Having said that, using Solr results directly in your UI would require you
> to either cache used documents on the UI/middle layer, implement some sort
> of stickiness, or retrieve only IDs from Solr and load the data from primary
> storage. If you have static data and you update the index once a day, you
> can use aliases and switch between the new and old index, and you will
> suffer from this issue only at the moment of the switch.

But is it normal for the data to be inconsistent for such a long time? It looks like the data has been inconsistent for about a week...

Another question: with HDFS, will the data be consistent? With HDFS the data will be shared between nodes, so updates will be available on all nodes at the same time, right?

Thanks!!

> Regards,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
> On 12 Feb 2018, at 12:00, Daniel Carrasco <d.carra...@i2tic.com> wrote:
>
> > Hello, thanks for your help.
> >
> > I answer below.
> >
> > Greetings!!
> >
> > 2018-02-12 11:31 GMT+01:00 Emir Arnautović <emir.arnauto...@sematext.com>:
> >
> >> Hi Daniel,
> >> Can you tell us more about your document update process? How do you commit
> >> changes? Since it got fixed after the restart, it seems to me that on that
> >> one node the index searcher was not reopened after updates. Do you see any
> >> errors/warnings on that node?
> >>
> >
> > I've asked the programmers, and it looks like they are running the
> > collections' dataimport via curl.
> > I think the data is imported from a Microsoft SQL
> > Server using a Solr plugin.
> >
> >> Also, what do you mean by “All nodes are standalone”?
> >>
> >
> > I mean that the nodes don't share a filesystem (I'm planning to use Hadoop,
> > but I have to learn how to create and maintain the cluster first). Each node
> > has its own data drive and is connected to the cluster using ZooKeeper.
> >
> >>
> >> Regards,
> >> Emir
> >>
> >>> On 12 Feb 2018, at 11:16, Daniel Carrasco <d.carra...@i2tic.com> wrote:
> >>>
> >>> Hello,
> >>>
> >>> We're using Solr to manage product data in our shop, and last week some
> >>> customers called us saying that the price differs between the shop and
> >>> the shopping basket. After researching a bit, I noticed that it sometimes
> >>> happens on page refresh.
> >>> After disabling all caches, I queried all Solr instances to see if the
> >>> data is correct, and I saw that one of them gives a different price for
> >>> the product, so it looks like that instance has not received the updated
> >>> data.
> >>>
> >>> - How is it possible that a node in a cluster has different data?
> >>> - How can I check whether the data is in sync? The cluster looks all
> >>> healthy in the admin UI, and the node is active and OK.
> >>> - Is there any way to detect this error? And how can I force a resync?
> >>>
> >>> After restarting the node it got synced, so the data is OK now, but I
> >>> can't restart the nodes every time to check whether the data is right (it
> >>> takes a lot of time to sync again).
> >>>
> >>> My configuration is: 8 Solr nodes using v7.1.0 and ZooKeeper v3.4.11. All
> >>> nodes are standalone (I'm not using Hadoop).
> >>>
> >>> Thanks and greetings!
> >>> --
> >>> _________________________________________
> >>>
> >>> Daniel Carrasco Marín
> >>> Ingeniería para la Innovación i2TIC, S.L.
> >>> Tel: +34 911 12 32 84 Ext: 223
> >>> www.i2tic.com
> >>> _________________________________________
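One way to approach the question “How can I check whether the data is in sync?” is to ask every replica core for the same document with `distrib=false`, which makes Solr answer from that core's local index only instead of distributing the request, and then compare the answers. Because it goes through `/select`, it also reflects what that core's currently open searcher returns, which is what the shop actually sees. Below is a minimal Python sketch under stated assumptions: the core URLs are already known (for example from the Cloud view of the Admin UI), the uniqueKey field is `id`, and the host names, core names, the `price` field and all helper names are hypothetical. It compares a stored field rather than index versions, since each NRT replica indexes and commits on its own, so index versions can differ even when the documents match.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter


def replica_query_url(core_url, doc_id, field="price"):
    """Build a /select URL answered by this core's local index only."""
    params = {
        "q": "{!term f=id}" + doc_id,  # assumption: uniqueKey is "id"; term parser avoids escaping
        "fl": "id," + field,           # hypothetical field name, e.g. "price"
        "distrib": "false",            # do not fan the request out to other replicas
        "wt": "json",
    }
    return core_url.rstrip("/") + "/select?" + urllib.parse.urlencode(params)


def fetch_field(core_url, doc_id, field="price", timeout=5):
    """Return the value one replica holds for the field, or None if the doc is missing there."""
    url = replica_query_url(core_url, doc_id, field)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        docs = json.load(resp)["response"]["docs"]
    return docs[0].get(field) if docs else None


def divergent_replicas(values_by_core):
    """Return the cores whose value differs from the most common value."""
    if not values_by_core:
        return []
    # json.dumps turns multiValued (list) values into hashable, comparable keys.
    keys = {core: json.dumps(value, sort_keys=True) for core, value in values_by_core.items()}
    majority, _ = Counter(keys.values()).most_common(1)[0]
    return sorted(core for core, key in keys.items() if key != majority)


if __name__ == "__main__":
    # Offline example with made-up values: the third replica still has an old price.
    seen = {
        "http://solr1:8983/solr/descriptions_shard1_replica_n1": 19.99,
        "http://solr2:8983/solr/descriptions_shard1_replica_n3": 19.99,
        "http://solr3:8983/solr/descriptions_shard1_replica_n5": 24.99,
    }
    print(divergent_replicas(seen))
```

Calling `fetch_field` against all 8 replicas for a document whose price is known to have changed, then passing the results to `divergent_replicas`, should point at a stale replica like the one described in the thread. A coarser variant of the same idea compares `numFound` from a `q=*:*&rows=0&distrib=false` query on each core.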
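Emir's suggestion of switching between a new and an old index with aliases could look roughly like this, assuming the data is rebuilt on a schedule. The import goes into a fresh collection. Once that import is complete and committed, an alias that the shop queries is pointed at the new collection through the Collections API `CREATEALIAS` action. Calling `CREATEALIAS` again with an existing alias name re-points that alias. The dated naming scheme, the alias name `descriptions_live` and the helper names are hypothetical. Creating the new collection and deleting old ones are left out.

```python
from datetime import date
from urllib.parse import urlencode


def dated_collection_name(base, day):
    """Hypothetical naming scheme: one freshly built collection per daily import."""
    return "{}_{}".format(base, day.strftime("%Y%m%d"))


def create_alias_url(solr_base, alias, collection):
    """Collections API CREATEALIAS call; for an existing alias it re-points it."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return solr_base.rstrip("/") + "/admin/collections?" + urlencode(params)


if __name__ == "__main__":
    # After the new collection is fully imported and committed, switch the alias to it.
    new_collection = dated_collection_name("descriptions", date(2018, 2, 12))
    print(create_alias_url("http://solr1:8983/solr", "descriptions_live", new_collection))
```

With this layout the replicas of the collection behind the alias only change at switch time, which matches Emir's point that inconsistency is then limited to the moment of the switch.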