Re: waiting for client write

2021-06-13 Thread Ayub Khan
Ranier,

I did the MTU change and it did seem to bring down the clientWrite waits to
half.

The change I did was to enable ICMP to have Destination Unreachable
fragmentation needed and DF set

"When there is a difference in the MTU size in the network between two
hosts, first make sure that your network settings don't block path MTU
discovery (PMTUD). PMTUD enables the receiving host to respond to the
originating host with the following ICMP message: Destination Unreachable:
fragmentation needed and DF set (ICMP Type 3, Code 4). This message
instructs the originating host to use the lowest MTU size along the network
path to resend the request. "

https://docs.aws.amazon.com/redshift/latest/mgmt/connecting-drop-issues.html



Vijay,

Below is the pgbench result which was executed from the remote instance
pointing to RDS

postgres@localhost:/archive$ pgbench -h pg-cluster -p 5432 -U testuser -c
50 -j 2 -P 60 -T 600 testdb -f /archive/test.sql
starting vacuum...pgbench: error: ERROR:  relation "pgbench_branches" does
not exist
pgbench: (ignoring this error and continuing anyway)
pgbench: error: ERROR:  relation "pgbench_tellers" does not exist
pgbench: (ignoring this error and continuing anyway)
pgbench: error: ERROR:  relation "pgbench_history" does not exist
pgbench: (ignoring this error and continuing anyway)
end.
progress: 60.0 s, 18.3 tps, lat 2631.655 ms stddev 293.592
progress: 120.0 s, 19.6 tps, lat 2533.271 ms stddev 223.722
progress: 180.0 s, 20.3 tps, lat 2446.050 ms stddev 158.397
progress: 240.1 s, 19.2 tps, lat 2575.506 ms stddev 292.418
progress: 300.0 s, 20.0 tps, lat 2482.908 ms stddev 181.770
progress: 360.0 s, 22.1 tps, lat 2245.147 ms stddev 110.855
progress: 420.0 s, 20.7 tps, lat 2397.270 ms stddev 289.324
progress: 480.0 s, 18.8 tps, lat 2625.595 ms stddev 240.250
progress: 540.0 s, 20.1 tps, lat 2467.336 ms stddev 133.121
progress: 600.0 s, 20.2 tps, lat 2455.824 ms stddev 137.976
transaction type: /archive/test.sql
scaling factor: 1
query mode: simple
number of clients: 50
number of threads: 2
duration: 600 s
number of transactions actually processed: 12007
latency average = 2480.042 ms
latency stddev = 242.243 ms
tps = 19.955602 (including connections establishing)
tps = 19.955890 (excluding connections establishing)

On Wed, Jun 9, 2021 at 5:47 PM Ayub Khan  wrote:

> attached is the screenshot of RDS performance insights for AWS and it
> shows high waiting client writes. The api performance is slow. I read that
> this might be due to IOPS on RDS. However we have 80k IOPS on this test
> RDS.
>
> Below is the query which is being load tested
>
> SELECT
>
>a.menu_item_id,
>a.menu_item_name,
>a.menu_item_category_id,
>b.menu_item_category_desc,
>c.menu_item_variant_id,
>c.menu_item_variant_type_id,
>c.price,
>c.size_id,
>c.parent_menu_item_variant_id,
>d.menu_item_variant_type_desc,
>e.size_desc,
>f.currency_code,
>a.image,
>a.mark_id,
>m.mark_name
>
>  FROM .menu_item_category AS b, .menu_item_variant AS
> c,
>.menu_item_variant_type AS d, .item_size AS e,
> .restaurant AS f,
>.menu_item AS a
>
>LEFT OUTER JOIN .mark AS m
>  ON (a.mark_id = m.mark_id)
>
>  WHERE a.menu_item_category_id =
> b.menu_item_category_id AND a.menu_item_id = c.menu_item_id AND
>c.menu_item_variant_type_id =
> d.menu_item_variant_type_id AND d.is_hidden = 'false' AND
>c.size_id = e.size_id AND a.restaurant_id =
> f.restaurant_id AND f.restaurant_id = 1528 AND
>(a.menu_item_category_id = NULL OR NULL IS NULL)
>
>AND c.menu_item_variant_id = (SELECT
> min(menu_item_variant_id)
>  FROM
> .menu_item_variant
>  WHERE
> menu_item_id = a.menu_item_id AND deleted = 'N'
>  LIMIT 1) AND
> a.active = 'Y'
>AND (CONCAT_WS('', ',', a.hidden_branch_ids,
> ',') NOT LIKE CONCAT_WS('', '%,4191,%') OR
> NULL IS NULL)
>AND .is_menu_item_available(a.menu_item_id, 'Y') = 'Y'
>
>ORDER BY a.row_order, menu_item_id;
>
> --Ayub
>


-- 

Sun Certified Enterprise Architect 1.5
Sun Certified Java Programmer 1.4
Microsoft Certified Systems Engineer 2000
http://in.linkedin.com/pub/ayub-khan/a/811/b81
mobile:+966-502674604

Re: waiting for client write

2021-06-13 Thread Vijaykumar Jain
thanks.

>latency average = 2480.042 ms

that latency is pretty high, even after changing the mtu ? for a query that
takes 5ms to run (from your explain analyze above) and returns a few 100
rows.

so it does look like a network latency, but it seems strange when you said
the same query from the same ec2 host ran fast against oracle compared to
postgres RDS.
So oracle and RDS on vms with separate mtu  setting ?

i was tried to simulate issues with the client if any :),
Slow ClientWrites · Issue #2480 · brianc/node-postgres (github.com)

Some intial performance work by brianc · Pull Request #2031 ·
brianc/node-postgres (github.com)

ClientRead statement_timeout · Issue #1952 · brianc/node-postgres
(github.com) 

I tried to play around with FEBE protocols to delay flush sync etc, but
could not simulate clientwrite wait event :(. Sorry.
and i am not touching java :)

I was asking for a run like this, with -r that would have shown latency per
statement. but anyways.

Below I make use of tc (traffic control) to add a delay to my lo interface,
and check how pgbench runs vary with increased latency at interface.
useless for your case, but i use this to tell dev when they have
slot queries if, roundtrip delay is high, it is not a pg fault :)

postgres@db:~/playground$ sudo tc -s qdisc | head -3
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

postgres@db:~/playground$ sudo tc qdisc add dev lo root netem delay 100ms
 # add a delay on lo of 100ms via tc module

postgres@db:~/playground$ sudo tc -s qdisc | head -3
qdisc netem 8007: dev lo root refcnt 2 limit 1000 delay 100.0ms
 Sent 8398 bytes 15 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

postgres@db:~/playground$ pgbench -c 2 -C -j 2 -n -P 2 -T 10 -r -f
pgbench.test -h 127.0.0.1
progress: 2.0 s, 1.0 tps, lat 603.211 ms stddev 0.007
progress: 4.2 s, 1.8 tps, lat 602.838 ms stddev 0.101
progress: 6.0 s, 1.1 tps, lat 603.730 ms stddev 0.034
progress: 8.0 s, 2.0 tps, lat 603.058 ms stddev 0.081
progress: 10.3 s, 1.8 tps, lat 600.852 ms stddev 0.030
pgbench (PostgreSQL) 14.0
transaction type: pgbench.test
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 2
duration: 10 s
number of transactions actually processed: 18
latency average = 602.357 ms
latency stddev = 1.112 ms
average connection time = 605.499 ms
tps = 1.655749 (including reconnection times)
statement latencies in milliseconds:
   200.672  begin;
   200.797  select 1;
   200.917  end;

postgres@db:~/playground$ sudo tc qdisc del dev lo root netem  # remove
delay
postgres@db:~/playground$ sudo tc -s qdisc | head -3
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

postgres@db:~/playground$ pgbench -c 2 -C -j 2 -n -P 2 -T 10 -r -f
pgbench.test -h 127.0.0.1
progress: 2.0 s, 1272.4 tps, lat 0.200 ms stddev 0.273
progress: 4.0 s, 1155.3 tps, lat 0.306 ms stddev 0.304
progress: 6.0 s, 1241.7 tps, lat 0.261 ms stddev 0.290
progress: 8.0 s, 1508.6 tps, lat 0.150 ms stddev 0.140
progress: 10.0 s, 1172.7 tps, lat 0.292 ms stddev 0.302
pgbench (PostgreSQL) 14.0
transaction type: pgbench.test
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 2
duration: 10 s
number of transactions actually processed: 12704
latency average = 0.236 ms
latency stddev = 0.271 ms
average connection time = 1.228 ms
tps = 1270.314254 (including reconnection times)
statement latencies in milliseconds:
 0.074  begin;
 0.115  select 1;
 0.048  end;


Re: waiting for client write

2021-06-13 Thread Ayub Khan
Vijay,

I did not change the MTU on the network interface but  created incoming
rule on the security group  as per the below documentation:

 PMTUD enables the receiving host to respond to the originating host with
the following ICMP message: Destination Unreachable: fragmentation needed
and DF set (ICMP Type 3, Code 4). This message instructs the originating
host to use the lowest MTU size along the network path to resend the
request. Without this negotiation, packet drop can occur because the
request is too large for the receiving host to accept.

I also did another test, instead of using RDS, installed postgresql on a
similar VM as that of where oracle is installed and tested it. Now even
when both client and postgresql VMs have the same MTU settings still in the
pg activity table I could see clientwrite waits.

-Ayub

On Wed, Jun 9, 2021 at 5:47 PM Ayub Khan  wrote:

> attached is the screenshot of RDS performance insights for AWS and it
> shows high waiting client writes. The api performance is slow. I read that
> this might be due to IOPS on RDS. However we have 80k IOPS on this test
> RDS.
>
> Below is the query which is being load tested
>
> SELECT
>
>a.menu_item_id,
>a.menu_item_name,
>a.menu_item_category_id,
>b.menu_item_category_desc,
>c.menu_item_variant_id,
>c.menu_item_variant_type_id,
>c.price,
>c.size_id,
>c.parent_menu_item_variant_id,
>d.menu_item_variant_type_desc,
>e.size_desc,
>f.currency_code,
>a.image,
>a.mark_id,
>m.mark_name
>
>  FROM .menu_item_category AS b, .menu_item_variant AS
> c,
>.menu_item_variant_type AS d, .item_size AS e,
> .restaurant AS f,
>.menu_item AS a
>
>LEFT OUTER JOIN .mark AS m
>  ON (a.mark_id = m.mark_id)
>
>  WHERE a.menu_item_category_id =
> b.menu_item_category_id AND a.menu_item_id = c.menu_item_id AND
>c.menu_item_variant_type_id =
> d.menu_item_variant_type_id AND d.is_hidden = 'false' AND
>c.size_id = e.size_id AND a.restaurant_id =
> f.restaurant_id AND f.restaurant_id = 1528 AND
>(a.menu_item_category_id = NULL OR NULL IS NULL)
>
>AND c.menu_item_variant_id = (SELECT
> min(menu_item_variant_id)
>  FROM
> .menu_item_variant
>  WHERE
> menu_item_id = a.menu_item_id AND deleted = 'N'
>  LIMIT 1) AND
> a.active = 'Y'
>AND (CONCAT_WS('', ',', a.hidden_branch_ids,
> ',') NOT LIKE CONCAT_WS('', '%,4191,%') OR
> NULL IS NULL)
>AND .is_menu_item_available(a.menu_item_id, 'Y') = 'Y'
>
>ORDER BY a.row_order, menu_item_id;
>
> --Ayub
>


-- 

Sun Certified Enterprise Architect 1.5
Sun Certified Java Programmer 1.4
Microsoft Certified Systems Engineer 2000
http://in.linkedin.com/pub/ayub-khan/a/811/b81
mobile:+966-502674604
--
It is proved that Hard Work and kowledge will get you close but attitude
will get you there. However, it's the Love
of God that will put you over the top!!


Re: waiting for client write

2021-06-13 Thread Ayub Khan
Vijay,

below is the benchmark result when executed against bench_mark database
instead of running the test with slow query on the application database.
This shows that it might not be an issue with MTU but some issue with the
application database itself and the query.


postgres@localhost:~$ pgbench -h test-cluster -p 5432 -U testuser -c 50 -j
2 -P 60 -T 600 bench_mark
starting vacuum...end.
progress: 60.0 s, 17830.3 tps, lat 2.765 ms stddev 0.632
progress: 120.0 s, 18450.3 tps, lat 2.681 ms stddev 0.582
progress: 180.0 s, 18405.0 tps, lat 2.688 ms stddev 0.588
progress: 240.0 s, 17087.9 tps, lat 2.897 ms stddev 0.717
progress: 300.0 s, 18280.6 tps, lat 2.706 ms stddev 0.595
progress: 360.0 s, 18433.9 tps, lat 2.683 ms stddev 0.582
progress: 420.0 s, 18308.4 tps, lat 2.702 ms stddev 0.599
progress: 480.0 s, 18156.7 tps, lat 2.725 ms stddev 0.615
progress: 540.0 s, 16803.3 tps, lat 2.946 ms stddev 0.764
progress: 600.0 s, 18266.6 tps, lat 2.708 ms stddev 0.602
transaction type: 
scaling factor: 150
query mode: simple
number of clients: 50
number of threads: 2
duration: 600 s
number of transactions actually processed: 10801425
latency average = 2.747 ms
latency stddev = 0.635 ms
tps = 18001.935315 (including connections establishing)
tps = 18002.205940 (excluding connections establishing)

On Wed, Jun 9, 2021 at 5:47 PM Ayub Khan  wrote:

> attached is the screenshot of RDS performance insights for AWS and it
> shows high waiting client writes. The api performance is slow. I read that
> this might be due to IOPS on RDS. However we have 80k IOPS on this test
> RDS.
>
> Below is the query which is being load tested
>
> SELECT
>
>a.menu_item_id,
>a.menu_item_name,
>a.menu_item_category_id,
>b.menu_item_category_desc,
>c.menu_item_variant_id,
>c.menu_item_variant_type_id,
>c.price,
>c.size_id,
>c.parent_menu_item_variant_id,
>d.menu_item_variant_type_desc,
>e.size_desc,
>f.currency_code,
>a.image,
>a.mark_id,
>m.mark_name
>
>  FROM .menu_item_category AS b, .menu_item_variant AS
> c,
>.menu_item_variant_type AS d, .item_size AS e,
> .restaurant AS f,
>.menu_item AS a
>
>LEFT OUTER JOIN .mark AS m
>  ON (a.mark_id = m.mark_id)
>
>  WHERE a.menu_item_category_id =
> b.menu_item_category_id AND a.menu_item_id = c.menu_item_id AND
>c.menu_item_variant_type_id =
> d.menu_item_variant_type_id AND d.is_hidden = 'false' AND
>c.size_id = e.size_id AND a.restaurant_id =
> f.restaurant_id AND f.restaurant_id = 1528 AND
>(a.menu_item_category_id = NULL OR NULL IS NULL)
>
>AND c.menu_item_variant_id = (SELECT
> min(menu_item_variant_id)
>  FROM
> .menu_item_variant
>  WHERE
> menu_item_id = a.menu_item_id AND deleted = 'N'
>  LIMIT 1) AND
> a.active = 'Y'
>AND (CONCAT_WS('', ',', a.hidden_branch_ids,
> ',') NOT LIKE CONCAT_WS('', '%,4191,%') OR
> NULL IS NULL)
>AND .is_menu_item_available(a.menu_item_id, 'Y') = 'Y'
>
>ORDER BY a.row_order, menu_item_id;
>
> --Ayub
>


-- 

Sun Certified Enterprise Architect 1.5
Sun Certified Java Programmer 1.4
Microsoft Certified Systems Engineer 2000
http://in.linkedin.com/pub/ayub-khan/a/811/b81
mobile:+966-502674604
--
It is proved that Hard Work and kowledge will get you close but attitude
will get you there. However, it's the Love
of God that will put you over the top!!