I've set up a test program on a local machine; we'll see if I can reproduce it. Here's the setup:
1> created a 2-shard, leader (primary) only collection
2> added 1M simple docs to it (ids 0-999,999) and some text
3> re-added 100,000 docs with a random id between 0 and 999,999 (inclusive) to ensure there were deleted docs. Don't have any clue whether that matters.
4> fired up a 16-thread query program doing RTG on random doc IDs. The program will stop when it either gets a null response or the response isn't the doc asked for.
5> running 7.3.1
6> I'm using the SolrJ RTG code 'cause it was easy (a rough sketch of the query loop is at the end of this message)
7> All this is running locally on a Mac Pro, no network involved, which is another variable I suppose
8> 7M queries later, no issues
9> there's no indexing going on at all

Steve and Chris: What about this test setup do you imagine doesn't reflect what your setup is doing? Things I can think of to test, roughly in order:

> mimic how y'all are calling RTG more faithfully
> index to this collection, perhaps not at a high rate
> create another collection and actively index to it
> separate the machines running Solr from the one doing any querying or indexing
> ???

And, of course, if it reproduces, then run it to death on 7.5 to see if it's still a problem.

Best,
Erick

On Fri, Sep 28, 2018 at 10:21 AM Shawn Heisey <apa...@elyograg.org> wrote:
>
> On 9/28/2018 6:09 AM, sgaron cse wrote:
> > because this is a test deployment, replica is set to 1, so as far as I
> > understand, data will not be replicated for this core. Basically we have
> > two SOLR instances running on the same box. One on port 8983, the other on
> > port 8984. We have 9 cores on this SOLR cloud deployment, 5 of which are
> > on the instance on port 8983 and the other 4 on port 8984.
>
> A question that isn't really related to the problem you're investigating
> now: Why are you running two Solr instances on the same machine? 9
> cores is definitely not too many for one Solr instance.
>
> > As far as I can tell, all cores suffer from the occasional null document.
> > But the one that I can easily see errors from is a config core where we
> > store configuration data for our system. Since the configuration data
> > should always be there, we throw exceptions as soon as we get a null
> > document, which is why I noticed the problem.
>
> When you say "null document" do you mean that you get no results, or
> that you get a result with a document, but that document has nothing in
> it? Are there any errors returned or logged by Solr when this happens?
>
> > Our client code that connects to the APIs randomly chooses between all the
> > different ports because it does not know which instance it should ask. So
> > no, we did not try sending directly to the instance that has the data, but
> > since there is no replica there is no way that this should get out of sync.
>
> I was suggesting this as a troubleshooting step, not a change to how you
> use Solr. Basically trying to determine what happens if you send a
> request directly to the instance and core that contains the document
> with distrib=false, to see if it behaves differently than when it's a
> more generic collection-directed query. The idea was to try and narrow
> down exactly where to look for a problem.
>
> If you wait a few seconds, does the problem go away? When using real
> time get, a new document must be written to a segment and a new realtime
> searcher must be created before you can get that document. These things
> typically happen very quickly, but it's not instantaneous.
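For reference, the direct single-core check Shawn describes above would look something like this in SolrJ. The core name, port, and doc id below are just placeholders for whatever your deployment actually uses; treat it as a sketch:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class DirectRtgCheck {
      public static void main(String[] args) throws Exception {
        // Point straight at the instance and core that own the doc,
        // rather than at the collection.
        try (SolrClient core = new HttpSolrClient.Builder(
            "http://localhost:8983/solr/config_core").build()) {
          ModifiableSolrParams p = new ModifiableSolrParams();
          p.set("distrib", "false");                 // keep the request on this core only
          SolrDocument doc = core.getById("some-doc-id", p);  // real time get (/get)
          System.out.println(doc == null ? "null document" : doc);
        }
      }
    }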
> > To add up to what Chris was saying, although the core that is seeing the
> > issue is not hit very hard, other cores in the setup will be. We are
> > building a clustering environment that has auto-scaling, so if we are
> > under heavy load, we can easily have 200-300 clients hitting the SOLR
> > instance simultaneously.
>
> That much traffic is going to need multiple replicas on separate
> hardware, with something in place to do load balancing. Unless your code
> is Java and you can use CloudSolrClient, I would recommend an external
> load balancer.
>
> Thanks,
> Shawn
>
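For completeness, here's roughly what my test program's query loop looks like. It uses CloudSolrClient, which, as Shawn notes, also takes care of load balancing across the cluster's nodes for Java clients. The ZooKeeper address, collection name, and id field are placeholders; this is a sketch rather than the exact code:

    import java.util.Collections;
    import java.util.Optional;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.concurrent.atomic.AtomicLong;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrDocument;

    public class RtgHammer {
      public static void main(String[] args) throws Exception {
        // ZooKeeper address and collection name are placeholders.
        CloudSolrClient client = new CloudSolrClient.Builder(
            Collections.singletonList("localhost:2181"), Optional.empty()).build();
        client.setDefaultCollection("rtg_test");

        AtomicBoolean failed = new AtomicBoolean(false);
        AtomicLong count = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(16);

        for (int t = 0; t < 16; t++) {
          pool.submit(() -> {
            while (!failed.get()) {
              // Random doc id in the 0-999,999 range that was indexed.
              String id = Integer.toString(ThreadLocalRandom.current().nextInt(1_000_000));
              try {
                SolrDocument doc = client.getById(id);   // SolrJ real time get
                if (doc == null || !id.equals(doc.getFieldValue("id"))) {
                  System.err.println("Bad RTG response for id " + id + ": " + doc);
                  failed.set(true);                      // stop all threads on first failure
                }
              } catch (Exception e) {
                e.printStackTrace();
                failed.set(true);
              }
              long n = count.incrementAndGet();
              if (n % 1_000_000 == 0) {
                System.out.println(n + " queries, no issues so far");
              }
            }
          });
        }

        pool.shutdown();
        pool.awaitTermination(7, TimeUnit.DAYS);
        client.close();
      }
    }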