The RAM buffer holds analyzed, ready-to-index documents. The raw input is what the transaction log stores.
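
For reference, the two pieces are configured separately in solrconfig.xml. This is a minimal sketch, assuming a stock Solr 5.x layout; the value 100 is purely illustrative and not a recommendation, and note that the element is spelled ramBufferSizeMB in the config file:

```xml
<!-- solrconfig.xml (fragment); values are illustrative assumptions -->
<config>
  <indexConfig>
    <!-- Lucene IndexWriter buffer: in-memory index structures built from
         analyzed documents; flushed to a new segment once the limit is hit -->
    <ramBufferSizeMB>100</ramBufferSizeMB>
  </indexConfig>

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- Transaction log (tlog): records the raw update requests -->
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>
</config>
```

So the analysis cost is paid before a document lands in the RAM buffer, while the tlog only ever sees the update as it was sent.
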
On Sun, Dec 18, 2016 at 5:32 AM, Jaroslaw Rozanski <m...@jarekrozanski.com> wrote:
> Hi Erick,
>
> Not talking about separation any more. I merely summarized the message from
> Pushkar. As I said, it was clear that it was not possible.
>
> About the RAMBufferSizeMB, getting back to my original question: is this
> buffer for storing update requests or ready-to-index, analyzed documents?
>
> Documentation suggests the former; your first mention, however, suggests the
> latter.
>
> Thanks,
> Jaroslaw
>
> On 18/12/16 02:16, Erick Erickson wrote:
>> Yes indexing is adding stress. No you can't separate
>> the two in SolrCloud. End of story, why beat it to death?
>> You'll have to figure out the sharding strategy that
>> meets your indexing and querying needs and live
>> within that framework. I'd advise setting up a small
>> cluster and driving it to its tipping point and extrapolating
>> from there. Here's the long version of "the sizing exercise".
>>
>> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>
>> My point is that while indexing to Solr/Lucene there is
>> additional pressure. That pressure has a fixed upper
>> limit that doesn't grow with the number of docs. That's not
>> true for searching: as you add more docs per node, the
>> pressure (especially memory) increases. Concentrate
>> your efforts there IMO.
>>
>> Best
>> Erick
>>
>> On Sat, Dec 17, 2016 at 12:54 PM, Jaroslaw Rozanski
>> <m...@jarekrozanski.com> wrote:
>>> Hi Erick,
>>>
>>> So what does this buffer represent? What does it actually store? Raw
>>> update request or analyzed document?
>>>
>>> The documentation suggests that it stores actual update requests.
>>>
>>> Obviously an analyzed document can and will occupy much more space than a
>>> raw one. Also, analysis will create a lot of new allocations and subsequent
>>> GC work.
>>>
>>> Yes, you are probably right that search puts more stress and is the main
>>> memory user, but the combination of:
>>> - non-trivial analysis,
>>> - high volume of updates and
>>> - search on the same node
>>>
>>> seems to be adding fuel to the fire.
>>>
>>> From the previous response by Pushkar, it is clear that separation is not
>>> achievable with the existing SolrCloud mechanism.
>>>
>>> Thanks
>>>
>>> On 17/12/16 20:24, Erick Erickson wrote:
>>>> bq: I am more concerned with indexing memory requirements at volume
>>>>
>>>> By and large this isn't much of a problem. RAMBufferSizeMB in
>>>> solrconfig.xml governs how much memory is consumed in Solr for
>>>> indexing. When that limit is exceeded, the buffer is flushed to disk.
>>>> I've rarely heard of indexing being a memory issue. Anecdotally I
>>>> haven't seen throughput benefit with buffer sizes over 128M.
>>>>
>>>> You're correct in that master/slave style replication would use less
>>>> memory on the slave, although there are other costs. I.e. rather than
>>>> the data for document X being sent to the replicas once as in
>>>> SolrCloud, that data is re-sent to the slave every time it's merged
>>>> into a new segment.
>>>>
>>>> That said, memory issues are _far_ more prevalent on the search side
>>>> of things so unless this is a proven issue in your environment I would
>>>> fight other fires.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Fri, Dec 16, 2016 at 1:06 PM, Jaroslaw Rozanski
>>>> <m...@jarekrozanski.com> wrote:
>>>>> Thanks, that issue looks interesting!
>>>>>
>>>>> On 16/12/16 16:38, Pushkar Raste wrote:
>>>>>> This kind of separation is not supported yet. There is, however, some
>>>>>> work going on; you can read about it at
>>>>>> https://issues.apache.org/jira/browse/SOLR-9835
>>>>>>
>>>>>> This unfortunately would not support soft commits and hence would not
>>>>>> be a good solution for near real time indexing.
>>>>>>
>>>>>> On Dec 16, 2016 7:44 AM, "Jaroslaw Rozanski" <m...@jarekrozanski.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Sorry, not what I meant.
>>>>>>>
>>>>>>> The leader is responsible for distributing update requests to replicas.
>>>>>>> So eventually all replicas have the same state as the leader. Not a
>>>>>>> problem.
>>>>>>>
>>>>>>> It is more about the performance of this. If I gather correctly, normal
>>>>>>> replication happens by standard update request. Not by, say, segment
>>>>>>> copy.
>>>>>>>
>>>>>>> Which means an update on the leader is as "expensive" as on a replica.
>>>>>>>
>>>>>>> Hence, if my understanding is correct, sending search requests to
>>>>>>> replicas only, in an index-heavy environment, would bring no benefit.
>>>>>>>
>>>>>>> So the question is: is there a mechanism in SolrCloud (not the legacy
>>>>>>> master/slave set-up) to make one node take the load of indexing while
>>>>>>> other nodes focus on searching?
>>>>>>>
>>>>>>> This is not a question of SolrClient, because it is clear how to direct
>>>>>>> search requests to specific nodes. This is more about index optimization
>>>>>>> so that certain nodes (i.e. replicas) could suffer less due to high
>>>>>>> volume indexing while serving search requests.
>>>>>>>
>>>>>>> On 16/12/16 12:35, Dorian Hoxha wrote:
>>>>>>>> The leader is the source of truth. You expect to make the replica the
>>>>>>>> source of truth or something??? Doesn't make sense?
>>>>>>>> What people do is send writes to leader/master and reads to
>>>>>>>> replicas/slaves in other solr/other-dbs.
>>>>>>>>
>>>>>>>> On Fri, Dec 16, 2016 at 1:31 PM, Jaroslaw Rozanski
>>>>>>>> <m...@jarekrozanski.com> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> According to documentation, in normal operation (not recovery) in Solr
>>>>>>>>> Cloud configuration the leader sends updates it receives to all the
>>>>>>>>> replicas.
>>>>>>>>>
>>>>>>>>> This means all nodes in the shard perform the same effort to index a
>>>>>>>>> single document. Correct?
>>>>>>>>>
>>>>>>>>> Is there then a benefit to *not* sending search requests to the
>>>>>>>>> leader, but only to replicas?
>>>>>>>>>
>>>>>>>>> Given an index & search heavy Solr Cloud system, is it possible to
>>>>>>>>> separate search from indexing nodes?
>>>>>>>>>
>>>>>>>>> RE: Solr 5.5.0
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jaroslaw Rozanski | e: m...@jarekrozanski.com
>>>>>>>>> 695E 436F A176 4961 7793 5C70 AFDF FB5E 682C 4D3D
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Jaroslaw Rozanski | e: m...@jarekrozanski.com
>>>>>>> 695E 436F A176 4961 7793 5C70 AFDF FB5E 682C 4D3D
>>>>>>>
>>>>>
>>>>> --
>>>>> Jaroslaw Rozanski | e: m...@jarekrozanski.com
>>>>> 695E 436F A176 4961 7793 5C70 AFDF FB5E 682C 4D3D
>>>>>
>>>
>>> --
>>> Jaroslaw Rozanski | e: m...@jarekrozanski.com
>>> 695E 436F A176 4961 7793 5C70 AFDF FB5E 682C 4D3D
>>>
>
> --
> Jaroslaw Rozanski | e: m...@jarekrozanski.com
> 695E 436F A176 4961 7793 5C70 AFDF FB5E 682C 4D3D