Hey Shawn,

Because this is a test deployment, the replica count is set to 1, so as far as I understand, data will not be replicated for this core. Basically, we have two SOLR instances running on the same box, one on port 8983 and the other on port 8984. We have 9 cores on this SOLR cloud deployment: 5 on the instance on port 8983 and the other 4 on the instance on port 8984. As far as I can tell, all cores suffer from the occasional null document, but the one where I can easily see errors is a config core where we store configuration data for our system. Since the configuration data should always be there, we throw an exception as soon as we get a null document, which is why I noticed the problem.
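To make the failure mode concrete, here is a minimal sketch of that kind of null-document check. It assumes the /get handler is asked for JSON and returns a top-level "doc" key, which matches the doc:null response mentioned in this thread. The names extract_doc and MissingDocumentError are hypothetical and only for illustration.

```python
import json


class MissingDocumentError(Exception):
    """Raised when a /get response carries no document (doc is null)."""


def extract_doc(response_body: str) -> dict:
    """Return the document from a JSON /get response, or raise if it is null."""
    payload = json.loads(response_body)
    # Assumption: a single-id /get response looks like {"doc": {...}} or {"doc": null}.
    doc = payload.get("doc")
    if doc is None:
        raise MissingDocumentError("/get returned doc:null")
    return doc
```

A check like this fails loudly on the first null response, which is why the problem is visible on the config core and easy to miss on the others.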
Our client code that connects to the APIs randomly chooses between the different ports because it does not know which instance it should ask. So no, we did not try sending requests directly to the instance that has the data, but since there are no additional replicas, there is nothing for the core to get out of sync with.

To add to what Chris was saying, although the core that is seeing the issue is not hit very hard, other cores in the setup will be. We are building a clustering environment with auto-scaling, so under heavy load we can easily have 200-300 clients hitting the SOLR instances simultaneously.

On Thu, Sep 27, 2018 at 3:38 PM Chris Ulicny <culicny@iq.media> wrote:

> I don't think I've much to add that Steve hasn't already covered, but we've
> also seen this "null doc" problem in one of our setups.
>
> In one of our Solr Cloud instances in production where the /get handler is
> hit very hard in bursts, the /get request will occasionally return "null"
> for a document that exists. However, there is very heavy indexing (no
> overwrites or deletes) during that time which we assumed was the cause.
> This happens on 2 collections which have 10 shards each, replication factor
> of 2, spread across 4 hosts. During testing and when we first moved to this
> setup in production, we had a replication factor of 1, and still
> experienced the same issue of periodic "null" returned for documents, so it
> is probably not a replica synchronization issue.
>
> These documents were indexed about 10 minutes prior and had already been
> successfully returned to previous /get requests. We haven't been able to
> replicate it with any consistency, but it isn't a particularly critical
> issue with our use case.
>
> Best,
> Chris
>
>
> On Thu, Sep 27, 2018 at 2:53 PM Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 9/27/2018 11:48 AM, sgaron cse wrote:
> > > So this is a SOLR core where we keep configuration data so it is almost
> > > never written to. The statistics for the core say it's been last modified
> > > 4 hours ago, yet I got doc:null from the API an hour ago. And also you
> > > don't have to have a lot of data into the core. For example, this core has
> > > only 11 documents in it. The document I'm trying to fetch is about 45KB if
> > > that matters.
> >
> > Are there multiple replicas of this collection? Have you tried sending
> > requests specifically to the replica cores with distrib=false on the URL
> > to keep SolrCloud from sending the request elsewhere within the cluster,
> > to see if maybe the replicas are not as synchronized as they should be?
> > Without distrib=false, you cannot control which machine(s) will answer
> > your query.
> >
> > Replicas shouldn't get out of sync unless something goes very wrong, but
> > it has been known to happen.
> >
> > > Other things to note, this SOLR cloud instance is running multiple cores
> > > (9 cores total) and some of them are getting completely hammered. But I
> > > figured that each core is its own thing, I may be wrong.
> > >
> > > BTW, I'm not 100% familiar with SOLR cloud but I see in the Replication
> > > section that the Master (Searching) and the Master (Replicable) are
> > > running different version / different gen. Not sure if that matters, not
> > > sure what that means.
> >
> > For normal usage, you can completely ignore the replication master
> > information when Solr is running in SolrCloud mode. SolrCloud only uses
> > replication for recovering indexes that get out of sync (in a way that
> > SolrCloud can detect), and it configures the replication handler on the
> > fly when it is needed. The information it returns at any other time will
> > be meaningless. When things are operating normally, the replication
> > feature will never be used.
> >
> > Thanks,
> > Shawn
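
A minimal sketch of the direct check Shawn suggests, for anyone who wants to try it: build a /get URL that targets one core on one instance and add distrib=false so SolrCloud does not forward the request. The host, the core name "config_shard1_replica1", the document id "sys-config", and the helper names are all assumptions for illustration; the real core name has to be read from the Solr admin UI.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def direct_get_url(port: int, core: str, doc_id: str, host: str = "localhost") -> str:
    """Build a /get URL aimed at a single core, with distributed routing disabled."""
    # distrib=false asks Solr to answer from this core only (per Shawn's suggestion).
    params = urlencode({"id": doc_id, "distrib": "false", "wt": "json"})
    return f"http://{host}:{port}/solr/{core}/get?{params}"


def fetch_doc(url: str, timeout: float = 5.0):
    """Fetch the URL and return the "doc" value (None when Solr answers doc:null)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp).get("doc")


# Hypothetical usage against the instance on port 8983:
# url = direct_get_url(8983, "config_shard1_replica1", "sys-config")
# print(fetch_doc(url))
```

Calling fetch_doc repeatedly against the port that actually hosts the core, and logging every None, should help show whether the null comes from the core itself or only appears when the request is routed through the other instance.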