On 9/28/2018 8:11 PM, sgaron cse wrote:
@Shawn
We're running two instances on one machine for two reasons:
1. The box has plenty of resources (48 cores / 256 GB RAM) and since I was
reading that it's not recommended to use more than 31 GB of heap in Solr,
we figured 96 GB for keeping index data in the OS cache + 31 GB of heap
per instance was a good idea.

Do you know that these Solr instances actually DO need 31 GB of heap, or are you following advice from somewhere that says "use one quarter of your memory as the heap size"?  That advice is not in the Solr documentation, and never will be.  Figuring out the right heap size requires experimentation.

https://wiki.apache.org/solr/SolrPerformanceProblems#How_much_heap_space_do_I_need.3F

How big (on disk) is each of these nine cores, and how many documents are in each one?  Which of them is in each Solr instance?  With that information, we can make a *guess* about how big your heap should be.  Figuring out whether the guess is correct generally requires careful analysis of a GC log.

2. We're in a testing phase so we wanted a SolrCloud configuration; we will
most likely have a much bigger deployment once going to production. In prod
right now, we run a six-machine Riak cluster. Riak is a key/value document
store and has Solr built in for search, but we are trying to push the
key/value aspect of Riak into Solr. That way we would have one less piece
to worry about in our system.

Solr is not a database.  It is not intended to be a data repository.  All of its optimizations (most of which are actually in Lucene) are geared towards search.  While technically it can be a key-value store, that is not what it was MADE for.  Software actually designed for that role is going to be much better than Solr as a key-value store.

When I say null document, I mean the /get API returns: {doc: null}

The problem is definitely not always there. We also have large periods of
time (a few hours) where we have no problems. I'm just extremely hesitant
to retry when I get a null document because in some cases, getting a null
document is a valid outcome. Our caching layer relies heavily on this, for
example. If I were to retry every null I'd pay a big penalty in
performance.

I've just done a little test with the 7.5.0 techproducts example.  It looks like returning doc:null actually is how the RTG handler says it didn't find the document.  This seems very wrong to me, but I didn't design it, and that response needs SOME kind of format.
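
For what it's worth, here's a minimal SolrJ sketch of that behavior, assuming a local 7.5.0 instance with the techproducts example loaded (the id is just one that doesn't exist in that index).  SolrJ's getById goes through the /get handler and hands you back a Java null when the document isn't found:

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;

public class RtgNullCheck {
  public static void main(String[] args) throws SolrServerException, IOException {
    // Assumes Solr 7.5.0 running locally with the techproducts example loaded.
    try (HttpSolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/techproducts").build()) {
      // getById uses the /get (real-time get) handler under the hood and
      // returns null when the handler reports the document was not found.
      SolrDocument doc = client.getById("does-not-exist");
      System.out.println(doc == null ? "RTG says the document is not there" : doc.toString());
    }
  }
}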

Have you done any testing to see whether the standard searching handler (typically /select, but many other URL paths are possible) returns results when RTG doesn't?  Do you know for these failures whether the document has been committed or not?
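
If it helps, here's a rough diagnostic sketch along those lines -- it probes both handlers for the same id and flags any disagreement.  It assumes the uniqueKey field is named "id" and a core/collection called "techproducts"; adjust both for your setup:

import java.io.IOException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class RtgVsSelect {
  public static void main(String[] args) throws SolrServerException, IOException {
    // Default is an id from the techproducts sample data; pass your own instead.
    String id = args.length > 0 ? args[0] : "SP2514N";
    try (HttpSolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/techproducts").build()) {
      // Real-time get (/get): can see uncommitted documents via the update log.
      SolrDocument fromRtg = client.getById(id);

      // Standard search (/select): only sees what a searcher sees, so the
      // document must have been committed (hard or soft) to show up here.
      QueryResponse rsp = client.query(new SolrQuery("id:\"" + id + "\""));
      long numFound = rsp.getResults().getNumFound();

      System.out.println("id=" + id
          + " rtg=" + (fromRtg == null ? "doc:null" : "found")
          + " select.numFound=" + numFound);
      if (fromRtg == null && numFound > 0) {
        System.out.println("/select finds it but RTG does not -- worth a closer look");
      }
    }
  }
}

Running something like this against the ids that come back null would also tell you whether the failures line up with commit timing.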

As for your last comment, part of our testing phase is also testing the
limits. Our framework has auto-scaling built in, so if we have a burst of
requests, the system will automatically spin up more clients. We're pushing
10% of our production system to that test server to see how it will handle
it.

To spin up another replica, Solr must copy all its index data from the leader replica.  Not only can this take a long time if the index is big, but it will put a lot of extra I/O load on the machine(s) with the leader roles.  So performance will actually be WORSE before it gets better when you spin up another replica, and if the index is big, that condition will persist for quite a while.  Copying the index data will be constrained by the speed of your network and by the speed of your disks.  Often the disks are slower than the network, but that is not always the case.
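
Just to make the mechanics concrete, here's a hedged SolrJ sketch of adding a replica through the Collections API -- the collection and shard names are placeholders, not taken from your setup.  The ADDREPLICA request itself is cheap; the expensive part is the recovery that follows, when the new replica pulls the leader's index files over the network:

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.response.CollectionAdminResponse;

public class AddReplicaSketch {
  public static void main(String[] args) throws SolrServerException, IOException {
    // Placeholder names -- substitute your own collection and shard.
    String collection = "mycollection";
    String shard = "shard1";

    // Collections API requests go to the Solr base URL, not a specific core.
    try (HttpSolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8983/solr").build()) {
      CollectionAdminResponse rsp = CollectionAdminRequest
          .addReplicaToShard(collection, shard)
          .process(client);
      // A successful response means the replica was created; it still has to
      // replicate the index from the shard leader before it can serve queries.
      System.out.println("ADDREPLICA success: " + rsp.isSuccess());
    }
  }
}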

Thanks,
Shawn
