[jira] [Commented] (GEODE-8278) Gateway sender queues using heap memory way above configured value after server restart

Barrett Oglesby (Jira) Wed, 16 Dec 2020 14:12:37 -0800


    [ 
https://issues.apache.org/jira/browse/GEODE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250674#comment-17250674
 ]


Barrett Oglesby commented on GEODE-8278:
----------------------------------------

One thing to note is that the GatewaySenderEventImpl and the 
VersionedThinDiskRegionEntryHeapStringKey1 share a reference to the value in 
the initial load case.

The VersionedThinDiskRegionEntryHeapStringKey1 is the entry in the data region 
used in the case of a String key.

The VersionedThinDiskRegionEntryHeapStringKey1 value field contains a 
VMCachedDeserializable. The VMCachedDeserializable value field contains a 
byte[].

The GatewaySenderEventImpl value field contains a byte[].

These two byte[]s are the same object after the initial load.

From the above histograms in the load case, the totals show:
{noformat}
Total        521194      126837648
Total        579438      128892808
{noformat}
If I size the actual queue entries, I see:
{noformat}
Sender primaryEntries=2,530; primaryBytes=39,751,392; secondaryEntries=2,470; 
secondaryBytes=38,726,816; totalEntries=5,000; totalBytes=78,478,208
Sender primaryEntries=2,470; primaryBytes=38,726,816; secondaryEntries=2,530; 
secondaryBytes=39,751,392; totalEntries=5,000; totalBytes=78,478,208
{noformat}
That shared reference is broken in the recovery case. The 
GatewaySenderEventImpl and the VersionedThinDiskRegionEntryHeapStringKey1 will 
each have their own copy of the value.
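
To illustrate why a broken shared reference matters for heap usage, here is a 
minimal sketch. It does not use Geode code: EntrySketch and EventSketch are 
hypothetical stand-ins for the region entry (with its VMCachedDeserializable) 
and the GatewaySenderEventImpl, and the clone() call only simulates each 
object getting its own copy during recovery. Object headers and other fields 
are ignored; only the byte[] payload is counted.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class SharedValueSketch {
  // Hypothetical stand-in for the data region entry and the byte[] it reaches.
  static final class EntrySketch {
    final byte[] value;
    EntrySketch(byte[] value) { this.value = value; }
  }

  // Hypothetical stand-in for the queue event and its byte[] value field.
  static final class EventSketch {
    final byte[] value;
    EventSketch(byte[] value) { this.value = value; }
  }

  // Sums payload bytes, counting each distinct array instance only once.
  static long payloadBytes(byte[]... arrays) {
    Set<byte[]> seen = Collections.newSetFromMap(new IdentityHashMap<byte[], Boolean>());
    long total = 0;
    for (byte[] array : arrays) {
      if (seen.add(array)) {
        total += array.length;
      }
    }
    return total;
  }

  // Initial load: both objects point at the same byte[].
  static long loadCase(int size) {
    byte[] value = new byte[size];
    EntrySketch entry = new EntrySketch(value);
    EventSketch event = new EventSketch(value);
    return payloadBytes(entry.value, event.value);
  }

  // Recovery (simulated): each object holds its own copy of the byte[].
  static long recoveryCase(int size) {
    byte[] value = new byte[size];
    EntrySketch entry = new EntrySketch(value);
    EventSketch event = new EventSketch(value.clone());
    return payloadBytes(entry.value, event.value);
  }

  public static void main(String[] args) {
    // prints "load=1024 recovery=2048"
    System.out.println("load=" + loadCase(1024) + " recovery=" + recoveryCase(1024));
  }
}
```

With a shared reference the payload is held on the heap once; with separate 
copies it is held twice, which is the direction of the difference seen in the 
histograms.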

After recovery with no changes, the totals are:
{noformat}
Total        442436      133431416
Total        447847      226306728
{noformat}
If I size the actual queue entries, I see:
{noformat}
Sender primaryEntries=2,467; primaryBytes=259,256; secondaryEntries=2,533; 
secondaryBytes=266,168; totalEntries=5,000; totalBytes=525,424
Sender primaryEntries=2,533; primaryBytes=39,818,456; secondaryEntries=2,467; 
secondaryBytes=51,515,512; totalEntries=5,000; totalBytes=91,333,968
{noformat}
One server has recovered the keys; the other server has recovered the keys and 
done a GII to get the values. Eviction has not occurred in this case since the 
totalBytes (91,333,968) is well above the initial load case (78,478,208).

After recovery with the changes I posted above, the totals are:
{noformat}
Total        443319      133451360
Total        449229      204864160
{noformat}
If I size the actual queue entries, I see:
{noformat}
Sender primaryEntries=2,490; primaryBytes=261,648; secondaryEntries=2,510; 
secondaryBytes=263,776; totalEntries=5,000; totalBytes=525,424
Sender primaryEntries=2,510; primaryBytes=34,831,496; secondaryEntries=2,490; 
secondaryBytes=44,224,224; totalEntries=5,000; totalBytes=79,055,720
{noformat}
One server has recovered the keys; the other server has recovered the keys and 
done a GII to get the values. Eviction has occurred in this case since the 
totalBytes (79,055,720) is similar to the initial load case (78,478,208).
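
The comparison behind "greater than" and "similar to" can be made explicit 
with a little arithmetic. The sketch below only subtracts the totalBytes 
values quoted in this comment (for the server that did the GII); the class 
and method names are made up for illustration.

```java
public class QueueSizeComparison {
  // totalBytes values copied from the sizing output quoted in this comment.
  static final long INITIAL_LOAD = 78_478_208L;
  static final long RECOVERY_NO_CHANGES = 91_333_968L;
  static final long RECOVERY_WITH_CHANGES = 79_055_720L;

  // How far a queue's totalBytes sits above the initial load case.
  static long deltaFromInitialLoad(long totalBytes) {
    return totalBytes - INITIAL_LOAD;
  }

  public static void main(String[] args) {
    // prints "no changes:   +12855760"
    System.out.println("no changes:   +" + deltaFromInitialLoad(RECOVERY_NO_CHANGES));
    // prints "with changes: +577512"
    System.out.println("with changes: +" + deltaFromInitialLoad(RECOVERY_WITH_CHANGES));
  }
}
```

Without the changes the recovered queue is about 12.9 MB larger than after 
the initial load; with the changes the gap is under 0.6 MB.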


> Gateway sender queues using heap memory way above configured value after 
> server restart
> ---------------------------------------------------------------------------------------
>
>                 Key: GEODE-8278
>                 URL: https://issues.apache.org/jira/browse/GEODE-8278
>             Project: Geode
>          Issue Type: Bug
>          Components: eviction
>            Reporter: Alberto Gomez
>            Assignee: Alberto Gomez
>            Priority: Major
>
> In a Geode system with the following characteristics:
>  * WAN replication
>  * redundant partitioned regions
>  * overflow configured for the gateway sender queues by means of 
> persistence and a maximum queue memory setting.
>  * gateway receivers stopped in one site (B)
>  * Operations sent to the site that does not have the gateway receivers 
> stopped (A)
> When operations are sent to site A, the gateway sender queues start to grow 
> as expected and the heap memory consumed by the queues does not grow 
> indefinitely given that there is overflow to disk when the limit is reached.
> But if a server is restarted, it will show much higher heap memory usage 
> than it did before the restart, or than the other servers show.
> This can even prevent the server from restarting if the heap memory it 
> requires is above the configured limit.
> According to the memory analyzer the entries taking up the memory are 
> subclasses of ```VMThinDiskLRURegionEntryHeap```.
> The number of instances of this type is the same on the restarted server 
> as on the servers that were not restarted, but on the restarted server they 
> take much more memory. The reason seems to be that the ```value``` member 
> attribute of the instances on the restarted server contains 
> ```VMCachedDeserializable``` objects, while on the servers that were not 
> restarted the attribute contains either ```null``` or 
> ```GatewaySenderEventImpl``` objects that use much less memory than the 
> ```VMCachedDeserializable``` ones.
>  If redundancy is not configured for the region then the problem does not 
> manifest, i.e. the heap memory used by the restarted server is similar to 
> what it used prior to the restart.
> If the node that was not restarted is then restarted, the previously 
> restarted node seems to release the extra memory (my guess is that it is 
> processing the other process's queue).
> Also, if traffic is sent again to the Geode cluster, then it seems eviction 
> kicks in and after some short time, the memory of the restarted server goes 
> down to the level it had before it had been restarted.
> As a summary, the problem seems to be that if a server does GII 
> (getInitialImage) from another server, eviction does not occur for gateway 
> sender queue entries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
