Why could different data in a table be processed with different performance?
I am experiencing a strange performance problem when accessing JSONB content by primary key.

My DB version(): PostgreSQL 10.3 (Ubuntu 10.3-1.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
postgres.conf: https://justpaste.it/6pzz1
uname -a: Linux postgresnlpslave 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
The machine is virtual, running under Hyper-V.
Processor: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz, 1x1 cores
Disk storage: the host has two vmdx drives, the first shared between the root partition and an LVM PV, the second a single LVM PV. Both PVs are in a VG containing the swap and postgres data partitions. The data is mostly on the first PV.

I have the following table:

CREATE TABLE articles (
    article_id bigint NOT NULL,
    content jsonb NOT NULL,
    published_at timestamp without time zone NOT NULL,
    appended_at timestamp without time zone NOT NULL,
    source_id integer NOT NULL,
    language character varying(2) NOT NULL,
    title text NOT NULL,
    topics topic[] NOT NULL,
    objects object[] NOT NULL,
    cluster_id bigint NOT NULL,
    CONSTRAINT articles_pkey PRIMARY KEY (article_id)
)

We have a Python lib (using the psycopg2 driver) to access this table. It executes simple queries against the table; one of them is used for bulk downloading of content and looks like this:

select content from articles where article_id between $1 and $2

I noticed that with some IDs it works pretty fast, while with others it is 4-5 times slower. It is worth noting that there are two main 'categories' of IDs in this table: the first is the range 27000-5, and the second is the range 100-10003000. For the first range it is 'fast' and for the second it is 'slow'. Besides having larger absolute values (taking them out of int range into bigint), values in the second range are more 'sparse': in the first range the values are almost consecutive (with very few 'holes' of missing values), while in the second range there are many more 'holes' (average fill is 35%). Total number of rows in the first range: ~62M, in the second range: ~10M.

I conducted several experiments to rule out any influence from the library's code and from network throughput; I omit most of them here. I ended up iterating over the table with EXPLAIN to simulate the read load:

explain (analyze, buffers)
select count(*), sum(length(content::text)) from articles where article_id between %s and %s

Sample output:

Aggregate  (cost=8635.91..8635.92 rows=1 width=16) (actual time=6625.993..6625.995 rows=1 loops=1)
  Buffers: shared hit=26847 read=3914
  ->  Index Scan using articles_pkey on articles  (cost=0.57..8573.35 rows=5005 width=107) (actual time=21.649..1128.004 rows=5000 loops=1)
        Index Cond: ((article_id >= 43800) AND (article_id <= 438005000))
        Buffers: shared hit=4342 read=671
Planning time: 0.393 ms
Execution time: 6626.136 ms

Aggregate  (cost=5533.02..5533.03 rows=1 width=16) (actual time=33219.100..33219.102 rows=1 loops=1)
  Buffers: shared hit=6568 read=7104
  ->  Index Scan using articles_pkey on articles  (cost=0.57..5492.96 rows=3205 width=107) (actual time=22.167..12082.624 rows=2416 loops=1)
        Index Cond: ((article_id >= '10002100'::bigint) AND (article_id <= '10002101'::bigint))
        Buffers: shared hit=50 read=2378
Planning time: 0.517 ms
Execution time: 33219.218 ms

During iteration, I parse the result of EXPLAIN and collect a series of the following metrics:

- buffer hits/reads for the table,
- buffer hits/reads for the index,
- number of rows (from "Index Scan..."),
- duration of execution.
Based on the metrics above I calculate derived metrics (a worked example using the sample plans above is given at the end of this message):

- disk read rate: (index reads + table reads) * 8192 / duration,
- reads ratio: (index reads + table reads) / (index reads + table reads + index hits + table hits),
- data rate: (index reads + table reads + index hits + table hits) * 8192 / duration,
- rows rate: number of rows / duration.

Since the "density" of IDs is different in the "small" and "big" ranges, I adjusted the size of the chunks so as to get around 5000 rows on each iteration in both cases, though my experiments show that the chunk size does not really matter much.

The issue posted at the very beginning of my message was confirmed for the *whole* first and second ranges (so it was not just caused by randomly cached data).

To eliminate cache influence, I restarted the Postgres server and flushed the OS buffers:

/$ postgresql stop; sync; echo 3 > /proc/sys/vm/drop_caches; postgresql start

After this I repeated the test and got nearly the same picture.

"Small" range: the disk read rate is around 10-11 MB/s uniformly across the test. The output rate was 1300-1700 rows/s. The reads ratio is around 13% (why? Shouldn't it be ~100% after drop_caches?).
"Big" range: most of the time the disk read speed was about 2 MB/s, but sometimes it jumped to 26-30 MB/s. The output rate was 70-80 rows/s (but it varied a lot and reached 8000 rows/s). The reads ratio also varied a lot.

I rendered the series from the last test into charts:
"Small" range: https://i.stack.imgur.com/3Zfml.png
"Big" range (insane): https://i.stack.imgur.com/VXdID.png
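To make the derived-metric formulas concrete, here is the arithmetic for the second ("big" range) sample plan above, using the top-level Buffers totals (hit=6568, read=7104, which already include the index scan node's counts), 2416 rows and the 33219 ms execution time. This is a throwaway SQL calculation, not part of the original test harness; the results line up with the ~2 MB/s and 70-80 rows/s figures reported for the "big" range:

SELECT 7104 * 8192 / 33.219 / 1e6                 AS disk_read_rate_mb_s,  -- ~1.75 MB/s
       7104::numeric / (7104 + 6568)              AS reads_ratio,          -- ~0.52
       (7104 + 6568) * 8192 / 33.219 / 1e6        AS data_rate_mb_s,       -- ~3.4 MB/s
       2416 / 33.219                              AS rows_rate_per_s;      -- ~73 rows/s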
Re: Why could different data in a table be processed with different performance?
On Thu, Sep 20, 2018 at 05:07:21PM -0700, Vladimir Ryabtsev wrote:
> I am experiencing a strange performance problem when accessing JSONB
> content by primary key.

> I noticed that with some IDs it works pretty fast while with other it is
> 4-5 times slower. It is suitable to note, there are two main 'categories'
> of IDs in this table: first is range 27000-5, and second is
> range 100-10003000. For the first range it is 'fast' and for
> the second it is 'slow'.

Was the data populated differently, too?
Has the table been reindexed (or pg_repack'ed) since loading (or vacuumed, for that matter)?
Were the tests run when the DB was otherwise idle?

You can see the index scan itself takes an additional 11 sec; the "heap" portion takes the remaining, additional 14 sec (33s - 12s - 7s).

So it seems to me like the index itself is slow to scan. *And* the heap referenced by the index is slow to scan, probably due to being referenced by the index less consecutively.

> "Small' range: disk read rate is around 10-11 MB/s uniformly across the
> test. Output rate was 1300-1700 rows/s. Read ratio is around 13% (why?
> Shouldn't it be ~ 100% after drop_caches?).

I guess you mean the buffer cache hit ratio: read/hit, which I think should actually be read/(hit+read).

It's because a given buffer can be requested multiple times. For example, if an index page is read which references multiple items on the same heap page, each heap access is counted separately. If the index is freshly built, that'd happen for nearly every item.

Justin

> Aggregate  (cost=8635.91..8635.92 rows=1 width=16) (actual time=6625.993..6625.995 rows=1 loops=1)
>   Buffers: shared hit=26847 read=3914
>   ->  Index Scan using articles_pkey on articles  (cost=0.57..8573.35 rows=5005 width=107) (actual time=21.649..1128.004 rows=5000 loops=1)
>         Index Cond: ((article_id >= 43800) AND (article_id <= 438005000))
>         Buffers: shared hit=4342 read=671

> Aggregate  (cost=5533.02..5533.03 rows=1 width=16) (actual time=33219.100..33219.102 rows=1 loops=1)
>   Buffers: shared hit=6568 read=7104
>   ->  Index Scan using articles_pkey on articles  (cost=0.57..5492.96 rows=3205 width=107) (actual time=22.167..12082.624 rows=2416 loops=1)
>         Index Cond: ((article_id >= '10002100'::bigint) AND (article_id <= '10002101'::bigint))
>         Buffers: shared hit=50 read=2378
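One way to put a number on the "referenced by the index less consecutively" point above is the planner's correlation statistic for the key column. This check is an illustrative addition, not something suggested in the thread:

SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'articles' AND attname = 'article_id';
-- correlation close to 1.0 (or -1.0) means heap rows are laid out roughly in
-- index order, so an index range scan touches heap pages sequentially;
-- a value near 0 means the same scan jumps all over the table.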
Re: Why could different data in a table be processed with different performance?
> Was the data populated differently, too?

Here is how new records were coming in over the last two months, by day: https://i.stack.imgur.com/zp9WP.png

During a day, records come in evenly (in both ranges), slightly faster during European and American working hours.

Since Jul 1, 2018, when we started populating with online records, the trend was approximately the same as before Aug 04, 2018 (see the picture). Then it changed for the "big" range; we are now in a transition period until it stabilizes.

We also imported historical data in bulk from another system. The first part was the range with big numbers; they were added in a couple of days. The second part was the range with small numbers; it took around a week. Online records kept coming in uninterrupted during the import.

Rows are updated rarely and almost never deleted.

Here is the distribution of JSONB field length (converted to ::text) over the last 5 days:

<10KB: 665066
10-20KB: 225697
20-30KB: 25640
30-40KB: 6678
40-50KB: 2100
50-60KB: 1028
Other (max 2.7MB): 2248 (only single exemplars larger than 250KB)

> Has the table been reindexed (or pg_repack'ed) since loading (or vacuumed for that matter)?

Not sure what you mean... We created indexes on some fields (on appended_at, published_at, source_id). When I came across the problem I noticed that the table was not being vacuumed. I then ran VACUUM ANALYZE manually, but it did not change anything about the issue.

> Were the tests run when the DB was otherwise idle?

Yes, as I said, my tests were performed on the slave; there were no other users connected (only me, monitoring sessions from pgAdmin), and I never noticed any significant I/O from processes other than postgres (only light load from replication).

> You can see the index scan itself takes an additional 11sec, the "heap" portion takes the remaining, additional 14sec (33s-12s-7s).

Sorry, I see 33 s total and 12 s for the index; where do you see 7 s?

> I guess you mean buffers cache hit ratio: read/hit, which I think should actually be read/(hit+read).

I will quote myself:

> reads ratio: (index reads + table reads) / (index reads + table reads + index hits + table hits)

So yes, you are right, that is what I mean.

Some extra info about my system, following the QA recommendations:

OS version: Ubuntu 16.04.2 LTS / xenial

~$ time dd if=/dev/mapper/postgresnlpslave--vg-root of=/dev/null bs=1M count=32K skip=$((128*$RANDOM/32))
32768+0 records in
32768+0 records out
34359738368 bytes (34 GB, 32 GiB) copied, 62.1574 s, 553 MB/s
0.05user 23.13system 1:02.15elapsed 37%CPU (0avgtext+0avgdata 3004maxresident)k
67099496inputs+0outputs (0major+335minor)pagefaults 0swaps

The DBMS is accessed directly (no pgpool, pgbouncer, etc).

RAM: 58972 MB

At the physical device level, RAID10 is used.

Table metadata: (relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid)) = (articles, 7824944, 6.74338e+07, 7635864, 10, false, 454570926080)

Regards,
Vlad
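For reference, the table metadata tuple above corresponds to a pg_class lookup along these lines (reconstructed from the field list; the exact query was not given in the thread):

SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts,
       relhassubclass, reloptions, pg_table_size(oid)
FROM pg_class
WHERE relname = 'articles';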
Re: Why could different data in a table be processed with different performance?
Sorry, dropped -performance.

>>>> Has the table been reindexed (or pg_repack'ed) since loading (or vacuumed for that matter)?
>>> Not sure what you mean... We created indexes on some fields (on
>> I mean REINDEX INDEX articles_pkey;
>> Or (from "contrib"): /usr/pgsql-10/bin/pg_repack -i articles_pkey
> I never did it... Do you recommend to try it? Which variant is preferable?

REINDEX is likely to block access to the table [0], while pg_repack is "online" (except for briefly acquiring an exclusive lock).

[0] https://www.postgresql.org/docs/10/static/sql-reindex.html

>>>> You can see the index scan itself takes an additional 11sec, the "heap" portion takes the remaining, additional 14sec (33s-12s-7s).
>>> Sorry, I see 33 s total and 12 s for index, where do you see 7 s?
>> 6625 ms (for short query).
>> So the heap component of the long query is 14512 ms slower.
> Yes, I see, thanks.
> So reindex can help only with index component? What should I do for heap?
> May be reindex the corresponding toast table?

I think reindex will improve the heap access... and maybe the index access too. I don't see why it would be bloated without UPDATE/DELETE, but you could check to see if its size changes significantly after reindex.

Justin
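A minimal sketch of the "check whether its size changes" idea, assuming the rebuild is done with plain REINDEX rather than pg_repack:

-- index size before the rebuild
SELECT pg_size_pretty(pg_relation_size('articles_pkey'));

-- rebuild the primary-key index; this blocks writes to the table (and queries
-- that want to use this index) for the duration
REINDEX INDEX articles_pkey;

-- index size after: a large drop would indicate the index was bloated
SELECT pg_size_pretty(pg_relation_size('articles_pkey'));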
Re: Why could different data in a table be processed with different performance?
Vladimir Ryabtsev wrote:

> explain (analyze, buffers)
> select count(*), sum(length(content::text)) from articles where article_id between %s and %s
>
> Sample output:
>
> Aggregate  (cost=8635.91..8635.92 rows=1 width=16) (actual time=6625.993..6625.995 rows=1 loops=1)
>   Buffers: shared hit=26847 read=3914
>   ->  Index Scan using articles_pkey on articles  (cost=0.57..8573.35 rows=5005 width=107) (actual time=21.649..1128.004 rows=5000 loops=1)
>         Index Cond: ((article_id >= 43800) AND (article_id <= 438005000))
>         Buffers: shared hit=4342 read=671
> Planning time: 0.393 ms
> Execution time: 6626.136 ms
>
> Aggregate  (cost=5533.02..5533.03 rows=1 width=16) (actual time=33219.100..33219.102 rows=1 loops=1)
>   Buffers: shared hit=6568 read=7104
>   ->  Index Scan using articles_pkey on articles  (cost=0.57..5492.96 rows=3205 width=107) (actual time=22.167..12082.624 rows=2416 loops=1)
>         Index Cond: ((article_id >= '10002100'::bigint) AND (article_id <= '10002101'::bigint))
>         Buffers: shared hit=50 read=2378
> Planning time: 0.517 ms
> Execution time: 33219.218 ms
>
> During iteration, I parse the result of EXPLAIN and collect series of following metrics:
>
> - buffer hits/reads for the table,
> - buffer hits/reads for the index,
> - number of rows (from "Index Scan..."),
> - duration of execution.
>
> Based on metrics above I calculate inherited metrics:
>
> - disk read rate: (index reads + table reads) * 8192 / duration,
> - reads ratio: (index reads + table reads) / (index reads + table reads + index hits + table hits),
> - data rate: (index reads + table reads + index hits + table hits) * 8192 / duration,
> - rows rate: number of rows / duration.
>
> Since "density" of IDs is different in "small" and "big" ranges, I adjusted size of chunks in order to get around 5000 rows on each iteration in both cases, though my experiments show that chunk size does not really matter a lot.
>
> The issue posted at the very beginning of my message was confirmed for the *whole* first and second ranges (so it was not just caused by randomly cached data).
>
> To eliminate cache influence, I restarted Postgres server with flushing buffers:
>
> /$ postgresql stop; sync; echo 3 > /proc/sys/vm/drop_caches; postgresql start
>
> After this I repeated the test and got next-to-same picture.
>
> "Small' range: disk read rate is around 10-11 MB/s uniformly across the test. Output rate was 1300-1700 rows/s. Read ratio is around 13% (why? Shouldn't it be ~ 100% after drop_caches?).
> "Big" range: In most of time disk read speed was about 2 MB/s but sometimes it jumped to 26-30 MB/s. Output rate was 70-80 rows/s (but varied a lot and reached 8000 rows/s). Read ratio also varied a lot.
>
> I rendered series from the last test into charts:
> "Small" range: https://i.stack.imgur.com/3Zfml.png
> "Big" range (insane): https://i.stack.imgur.com/VXdID.png
>
> During the tests I verified disk read speed with iotop and found its indications very close to ones calculated by me based on EXPLAIN BUFFERS. I cannot say I was monitoring it all the time, but I confirmed it when it was 2 MB/s and 22 MB/s on the second range and 10 MB/s on the first range. I also checked with htop that CPU was not a bottleneck and was around 3% during the tests.
>
> The issue is reproducible on both master and slave servers. My tests were conducted on slave, while there were no any other load on DBMS, or disk activity on the host unrelated to DBMS.
> My only assumption is that different fragments of data are being read with different speed due to virtualization or something, but... why is it so strictly bound to these ranges? Why is it the same on two different machines?

What is the storage system?

Setting "track_io_timing = on" should measure the time spent doing I/O more accurately.

One problem with measuring read speed that way is that "buffers read" can mean "buffers read from storage" or "buffers read from the file system cache", but you say you observe a difference even after dropping the cache.

To verify if the difference comes from the physical placement, you could run VACUUM (FULL), which rewrites the table, and see if that changes the behavior.

Another idea is that the operating system rearranges I/O in a way that is not ideal for your storage. Try a different I/O scheduler by running

echo deadline > /sys/block/sda/queue/scheduler

(replace "sda" with the disk where your database resides)

See if that changes the observed I/O speed.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com
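A sketch of the first two suggestions above in SQL (assuming sufficient privileges: track_io_timing is a superuser-settable parameter, and VACUUM (FULL) takes an ACCESS EXCLUSIVE lock and needs disk space for a full copy of the table):

-- enable I/O timing for the current session; EXPLAIN (ANALYZE, BUFFERS)
-- then reports an "I/O Timings" line for each node
SET track_io_timing = on;

-- re-run the test query (%s placeholders as in the original harness)
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*), sum(length(content::text))
FROM articles WHERE article_id BETWEEN %s AND %s;

-- rewrite the table to test whether physical placement is the cause
VACUUM (FULL, VERBOSE) articles;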
Re: Why could different data in a table be processed with different performance?
> Setting "track_io_timing = on" should measure the time spent doing I/O more accurately. I see I/O timings after this. It shows that 96.5% of long queries is spent on I/O. If I subtract I/O time from total I get ~1,4 s for 5000 rows, which is SAME for both ranges if I adjust segment borders accordingly (to match ~5000 rows). Only I/O time differs, and differs significantly. > One problem with measuring read speed that way is that "buffers read" can mean "buffers read from storage" or "buffers read from the file system cache", I understand, that's why I conducted experiments with drop_caches. > but you say you observe a difference even after dropping the cache. No, I say I see NO significant difference (accurate to measurement error) between "with caches" and after dropping caches. And this is explainable, I think. Since I read consequently almost all data from the huge table, no cache can fit this data, thus it cannot influence significantly on results. And whilst the PK index *could* be cached (in theory) I think its data is being displaced from buffers by bulkier JSONB data. Vlad
