Re: Why could different data in a table be processed with different performance?
> Another idea is that the operating system rearranges I/O in a way that is
> not ideal for your storage.
> Try a different I/O scheduler by running
> echo deadline > /sys/block/sda/queue/scheduler

My scheduler was already "deadline". In some places I read that in a virtual
environment the "noop" scheduler is sometimes better, so I tried it. However,
the experiment showed NO noticeable difference between them ("deadline":
https://i.stack.imgur.com/wCOJW.png, "noop": https://i.stack.imgur.com/lB33u.png).
At the same time, the tests show almost identical patterns of read-speed changes
when going over the "slow" range.

Vlad

On Thu, 20 Sep 2018 at 20:17, Laurenz Albe wrote:
> Vladimir Ryabtsev wrote:
> > explain (analyze, buffers)
> > select count(*), sum(length(content::text)) from articles where
> > article_id between %s and %s
> >
> > Sample output:
> >
> > Aggregate (cost=8635.91..8635.92 rows=1 width=16) (actual time=6625.993..6625.995 rows=1 loops=1)
> >   Buffers: shared hit=26847 read=3914
> >   -> Index Scan using articles_pkey on articles (cost=0.57..8573.35 rows=5005 width=107) (actual time=21.649..1128.004 rows=5000 loops=1)
> >         Index Cond: ((article_id >= 43800) AND (article_id <= 438005000))
> >         Buffers: shared hit=4342 read=671
> > Planning time: 0.393 ms
> > Execution time: 6626.136 ms
> >
> > Aggregate (cost=5533.02..5533.03 rows=1 width=16) (actual time=33219.100..33219.102 rows=1 loops=1)
> >   Buffers: shared hit=6568 read=7104
> >   -> Index Scan using articles_pkey on articles (cost=0.57..5492.96 rows=3205 width=107) (actual time=22.167..12082.624 rows=2416 loops=1)
> >         Index Cond: ((article_id >= '10002100'::bigint) AND (article_id <= '10002101'::bigint))
> >         Buffers: shared hit=50 read=2378
> > Planning time: 0.517 ms
> > Execution time: 33219.218 ms
> >
> > During iteration, I parse the result of EXPLAIN and collect a series of the
> > following metrics:
> >
> > - buffer hits/reads for the table,
> > - buffer hits/reads for the index,
> > - number of rows (from "Index Scan..."),
> > - duration of execution.
> >
> > Based on the metrics above I calculate derived metrics:
> >
> > - disk read rate: (index reads + table reads) * 8192 / duration,
> > - reads ratio: (index reads + table reads) / (index reads + table reads + index hits + table hits),
> > - data rate: (index reads + table reads + index hits + table hits) * 8192 / duration,
> > - rows rate: number of rows / duration.
> >
> > Since the "density" of IDs is different in the "small" and "big" ranges, I
> > adjusted the size of chunks in order to get around 5000 rows on each
> > iteration in both cases, though my experiments show that chunk size does
> > not really matter a lot.
> >
> > The issue posted at the very beginning of my message was confirmed for the
> > *whole* first and second ranges (so it was not just caused by randomly
> > cached data).
> >
> > To eliminate cache influence, I restarted the Postgres server with flushing
> > buffers:
> >
> > /$ postgresql stop; sync; echo 3 > /proc/sys/vm/drop_caches; postgresql start
> >
> > After this I repeated the test and got a next-to-same picture.
> >
> > "Small" range: disk read rate is around 10-11 MB/s uniformly across the test.
> > Output rate was 1300-1700 rows/s. Read ratio is around 13% (why? Shouldn't it
> > be ~100% after drop_caches?).
> > "Big" range: most of the time the disk read speed was about 2 MB/s, but
> > sometimes it jumped to 26-30 MB/s. Output rate was 70-80 rows/s (but varied
> > a lot and reached 8000 rows/s). Read ratio also varied a lot.
> >
> > I rendered series from the last test into charts:
> > "Small" range: https://i.stack.imgur.com/3Zfml.png
> > "Big" range (insane): https://i.stack.imgur.com/VXdID.png
> >
> > During the tests I verified disk read speed with iotop and found its
> > indications very close to the ones I calculated based on EXPLAIN BUFFERS.
> > I cannot say I was monitoring it all the time, but I confirmed it when it
> > was 2 MB/s and 22 MB/s on the second range and 10 MB/s on the first range.
> > I also checked with htop that the CPU was not a bottleneck and was around
> > 3% during the tests.
> >
> > The issue is reproducible on both master and slave servers. My tests were
> > conducted on the slave, while there was no other load on the DBMS, nor disk
> > activity on the host unrelated to the DBMS.
> >
> > My only assumption is that different fragments of data are being read with
> > different speed due to virtualization or something, but... why is it so
> > strictly bound to these ranges? Why is it the same on two different machines?
>
> What is the storage system?
>
> Setting "track_io_timing = on" should measure the time spent doing I/O
> more accurately.
>
> One problem with measuring read speed that way is that "buffers read" can
> mean "buffers read from storage" or "buffers read from the file system cache",
> but you say you observe a difference even after dropping the cache.
>
> To verify if the
Re: Why could different data in a table be processed with different performance?
Hi,

Answers (and questions) inline here below.

On 22/09/18 11:19, Vladimir Ryabtsev wrote:
> > is the length of the text equally distributed over the 2 partitions?
> Not 100% equally, but to me it does not seem to be a big deal... Considering
> the ranges independently:
> First range: ~70% < 10 KB, ~25% for 10-20 KB, ~3% for 20-30 KB, everything
> else is less than 1% (with 10 KB steps).
> Second range: ~80% < 10 KB, ~18% for 10-20 KB, ~2% for 20-30 KB, everything
> else is less than 1% (with 10 KB steps).

Agree, it should not play a role here.

> > From what you posted, the first query retrieves 5005 rows, but the second
> > 2416. It might be helpful if we are able to compare 5000 vs 5000.
> Yes, it was just an example; here are the plans for approximately the same
> number of rows:
>
> Aggregate (cost=9210.12..9210.13 rows=1 width=16) (actual time=4265.478..4265.479 rows=1 loops=1)
>   Buffers: shared hit=27027 read=4311
>   I/O Timings: read=2738.728
>   -> Index Scan using articles_pkey on articles (cost=0.57..9143.40 rows=5338 width=107) (actual time=12.254..873.081 rows=5001 loops=1)
>         Index Cond: ((article_id >= 43803) AND (article_id <= 438035000))
>         Buffers: shared hit=4282 read=710
>         I/O Timings: read=852.547
> Planning time: 0.235 ms
> Execution time: 4265.554 ms
>
> Aggregate (cost=11794.59..11794.60 rows=1 width=16) (actual time=62298.559..62298.559 rows=1 loops=1)
>   Buffers: shared hit=15071 read=14847
>   I/O Timings: read=60703.859
>   -> Index Scan using articles_pkey on articles (cost=0.57..11709.13 rows=6837 width=107) (actual time=24.686..24582.221 rows=5417 loops=1)
>         Index Cond: ((article_id >= '10002104'::bigint) AND (article_id <= '10002106'::bigint))
>         Buffers: shared hit=195 read=5244
>         I/O Timings: read=24507.621
> Planning time: 0.494 ms
> Execution time: 62298.630 ms
>
> If we subtract I/O from the total time, we get 1527 ms vs 1596 ms, which are
> very close timings for everything other than I/O (considering the slightly
> higher number of rows in the second case). But the I/O time differs dramatically.
>
> > Also it is worth noticing that the 'estimated' differs from 'actual' on the
> > second query. I think that happens because data is differently distributed
> > over the ranges. Probably the analyzer does not have enough samples to
> > understand the real distribution.
> I think we should not worry about it unless the planner chooses a poor plan,
> should we? Statistics affect picking a proper plan, but not the execution
> of the plan, don't they?

Agree, it was pure speculation.

> > or to create partial indexes on the 2 ranges.
> Sure, I will try it with partial indexes. Should I drop the existing PK
> index, or is ensuring that the planner picks the range index enough?

You cannot drop it, since it backs the PRIMARY KEY. You can create 2 partial
indexes and the planner will pick them up for you (and the planning time will
go a bit up).

> > - does the raid controller have a cache?
> > - how big is the cache? (when you measure disk speed, that will influence
> > the result very much, if you do not run the test on a big-enough data chunk)
> > best if it is disabled during your tests
> I am pretty sure there is some, usually it's several tens of megabytes, but I
> ran disk read tests several times with chunks that could not fit in the
> cache and with random offsets, so I am pretty sure that something around 500
> MB/s is reasonably accurate (but it is only for sequential read).

It is not unusual to have 1 GB cache or more... and do not forget to drop the
cache between tests + do a sync.

> > - is the OS caching disk blocks too? maybe you want to drop everything from
> > there too.
> How can I find it out? And how to drop it? Or do you mean the hypervisor OS?
> Anyway, don't you think that caching specifics cannot really explain these
> issues?

Sorry, I meant the hypervisor OS. Given that most of the time is spent on I/O,
caching may be playing a role.

I tried to reproduce your problem, but I cannot come even close to your
results. Everything goes smoothly with or without shared buffers, or OS cache.

A few questions and considerations came to mind:

- how big is your index?

- how big is the table?

- given the size of shared_buffers, almost 2M blocks should fit, but you say 2
consecutive runs still are hitting the disk. That's strange indeed since you
are using way more than 2M blocks. Did you check whether there are perhaps any
other processes or cronjobs (on postgres and on the system) that are maybe
reading data and flushing out the cache?

You can make use of pg_buffercache in order to see what is actually cached.
That might help to have an overview of the content of it.

- As Laurenz suggested (VACUUM FULL), you might want to move data around.
You can try also a dump + restore to narrow the problem to data or disk.

- You might also want to try to see the disk graph of Windows, while you are
running your tests. It can show you if data (and good to know how much) is
actually fetched from disk or not.
Explain is slow with tables having many columns
Hello,

I have found that EXPLAIN on tables with many (hundreds of) columns is slow
compared to nominal executions. This can break application performance when
using auto_explain or pg_store_plans.

Here is my test case (with 500 columns; it can be pushed to 1000 or 1600):

create table a();

DO $$
DECLARE
  i int;
BEGIN
  for i in 1..500 loop
    execute 'alter table a add column a'||i::text||' int';
  end loop;
END $$;

# \timing
# select a500 from a;
 a500
------
(0 rows)

Time: 0,319 ms

# explain analyze select a500 from a;
                                   QUERY PLAN
 Seq Scan on a  (cost=0.00..10.40 rows=40 width=4) (actual time=0.010..0.010 rows=0 loops=1)
 Planning time: 0.347 ms
 Execution time: 0.047 ms
(3 rows)

Time: 4,290 ms

Here is a loop to try to understand where this comes from:

DO $$
DECLARE
  i int;
  j int;
BEGIN
  for j in 1..100 loop
    for i in 1..500 loop
      execute 'explain select a'||i::text||' from a';
    end loop;
  end loop;
END $$;

Using perf top, most of the CPU time seems to come from colname_is_unique in
ruleutils.c:

 59,54%  libc-2.26.so  [.] __GI___strcmp_ssse3
 26,11%  postgres      [.] colname_is_unique.isra.2
  1,46%  postgres      [.] AllocSetAlloc
  1,43%  postgres      [.] SearchCatCache3
  0,70%  postgres      [.] set_relation_column_names
  0,56%  libc-2.26.so  [.] __strlen_avx2

select version();
PostgreSQL 11devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.2.0-8ubuntu3) 7.2.0, 64-bit

Could this be improved?

Regards
PAscal

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html
Re: Explain is slow with tables having many columns
On Mon, Sep 24, 2018 at 12:22:28PM -0700, legrand legrand wrote:
> Hello,
> I have found that explain on tables with many (hundreds) columns
> are slow compare to nominal executions.

See also this thread from last month:

https://www.postgresql.org/message-id/flat/CAEe%3DmRnNNL3RDKJDmY%3D_mpcpAb5ugYL9NcchELa6Qgtoz2NjCw%40mail.gmail.com

Justin
Re: Explain is slow with tables having many columns
Justin Pryzby wrote
> On Mon, Sep 24, 2018 at 12:22:28PM -0700, legrand legrand wrote:
> > Hello,
> > I have found that explain on tables with many (hundreds) columns
> > are slow compare to nominal executions.
>
> See also this thread from last month:
>
> https://www.postgresql.org/message-id/flat/CAEe%3DmRnNNL3RDKJDmY%3D_mpcpAb5ugYL9NcchELa6Qgtoz2NjCw%40mail.gmail.com
>
> Justin

Maybe, I will check that patch...

I thought it would also have been related to
https://www.postgresql.org/message-id/CAMkU%3D1xPqHP%3D7YPeChq6n1v_qd4WGf%2BZvtnR-b%2BgyzFqtJqMMQ%40mail.gmail.com

Regards
PAscal

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html
Re: Explain is slow with tables having many columns
Hi,

(CCing -hackers)

On 2018-09-24 12:22:28 -0700, legrand legrand wrote:
> I have found that explain on tables with many (hundreds) columns
> are slow compare to nominal executions.

Yea, colname_is_unique() (called via make_colname_unique()) is essentially
O(#total_columns) and roughly called once for each column in a select list
(or USING, or ...). IIRC we've hit this once when I was at Citus, too.

We really should be using a more appropriate data structure here - very
likely a hashtable. Unfortunately such a change would likely be a bit too
much to backpatch...

Greetings,

Andres Freund
Re: Explain is slow with tables having many columns
Hi,

On 2018-09-24 12:43:44 -0700, legrand legrand wrote:
> Justin Pryzby wrote
> > On Mon, Sep 24, 2018 at 12:22:28PM -0700, legrand legrand wrote:
> > > Hello,
> > > I have found that explain on tables with many (hundreds) columns
> > > are slow compare to nominal executions.
> >
> > See also this thread from last month:
> >
> > https://www.postgresql.org/message-id/flat/CAEe%3DmRnNNL3RDKJDmY%3D_mpcpAb5ugYL9NcchELa6Qgtoz2NjCw%40mail.gmail.com
> >
> > Justin
>
> maybe, I will check that patch ...
>
> I thought it would also have been related to
> https://www.postgresql.org/message-id/CAMkU%3D1xPqHP%3D7YPeChq6n1v_qd4WGf%2BZvtnR-b%2BgyzFqtJqMMQ%40mail.gmail.com

Neither of these are related to the problem.

Greetings,

Andres Freund
Re: Why could different data in a table be processed with different performance?
> You can create 2 partial indexes and the planner will pick it up for you.
> (and the planning time will go a bit up).
Created two partial indexes and ensured the planner uses them. But the result
is still the same, no noticeable difference.
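For reference, a minimal sketch of that setup; the index names, the split
point and the query bounds below are illustrative, not the real range limits:

CREATE INDEX articles_id_dense_idx  ON articles (article_id)
    WHERE article_id <  1000000000;   -- hypothetical split point
CREATE INDEX articles_id_sparse_idx ON articles (article_id)
    WHERE article_id >= 1000000000;

-- Confirm the planner picks the partial index for a range query:
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*), sum(length(content::text))
FROM articles
WHERE article_id BETWEEN 1000002000000 AND 1000002005000;  -- illustrative bounds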
> it is not unusual to have 1GB cache or more... and do not forget to drop
> the cache between tests + do a sync
I conducted several long runs of dd, so I am sure that these numbers are
fairly correct. However, what worries me is that I tested sequential read
speed, while during my experiments Postgres might need to read from random
places, thus reducing the real read speed dramatically. I have a feeling that
this can be the reason.
I also reviewed the import scripts and found that the import was done in
DESCENDING order of IDs. It was done so to get the most recent records sooner;
maybe it caused some inefficiency in the storage... But again, it was so for
both ranges.
> - how big is your index?
pg_table_size('articles_pkey') = 1561 MB
> - how big is the table?
pg_table_size('articles') = 427 GB
pg_table_size('pg_toast.pg_toast_221558') = 359 GB
> - given the size of shared_buffers, almost 2M blocks should fit, but you
> say 2 consecutive runs still are hitting the disk. That's strange indeed
> since you are using way more than 2M blocks.
TBH, I cannot say I understand your calculations with the number of blocks...
But to clarify: consecutive runs with the SAME parameters do NOT hit the disk;
only the first one does, subsequent ones read only from the buffer cache.
> Did you check that perhaps are there any other processes or cronjobs (on
> postgres and on the system) that are maybe reading data and flushing out
> the cache?
I checked with iotop that nothing else reads intensively from any disk in
the system. And again, the result is 100% reproducible and depends on the ID
range only; if there were anything like that, I would have noticed some
fluctuation in the results.
> You can make use of pg_buffercache in order to see what is actually
> cached.
It seems that there is no such view in my DB; could it be that the module
is not installed?
> - As Laurenz suggested (VACUUM FULL), you might want to move data around.
> You can try also a dump + restore to narrow the problem to data or disk
I launched VACUUM FULL, but it ran very slowly; according to my calculation
it might take 17 hours. I will try to copy the data into another table with
the same structure, or spin up another server, and will let you know.
> - You might also want to try to see the disk graph of Windows, while you
> are running your tests. It can show you if data (and good to know how much)
> is actually fetched from disk or not.
I wanted to do so, but I don't have access to the Hyper-V server; I will try
to request credentials from the admins.
A couple more observations:
1) The result of my experiment is almost unaffected by other server load.
Another user was running a query (over this table) with a read speed of ~130
MB/s, while my query read at 1.8-2 MB/s.
2) iotop shows a higher IO % (~93-94%) with slower read speed (though it is
not quite clear what this field means). The process from the example above had
~55% IO at 130 MB/s, while my process had ~93% at ~2 MB/s.
Regards,
Vlad
Re: Why could different data in a table be processed with different performance?
On Mon, Sep 24, 2018 at 03:28:15PM -0700, Vladimir Ryabtsev wrote:
> > it is not unusual to have 1GB cache or more... and do not forget to drop
> > the cache between tests + do a sync
> I also reviewed import scripts and found the import was done in DESCENDING
> order of IDs.
This seems significant.. it means the heap was probably written in backwards
order relative to the IDs, and the OS readahead is ineffective when index
scanning across a range of IDs. From memory, Linux since (at least) 2.6.32 can
optimize this. You mentioned you're using 4.4. Does your LVM have readahead
ramped up? Try lvchange -r 65536 data/postgres (or similar).
Also.. these might be impractical solutions for several reasons, but did you
try either 1) forcing a bitmap scan (of only one index), to force the heap
reads to be ordered, if not sequential? SET enable_indexscan=off (and maybe
SET enable_seqscan=off and others as needed).
Or, 2) Using a brin index (scanning of which always results in bitmap heap
scan).
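A sketch of both suggestions follows; the SET commands only affect the current
session, and the BRIN index name and the range bounds are made up for
illustration:

-- 1) Force a bitmap heap scan for the test:
SET enable_indexscan = off;
-- SET enable_seqscan = off;   -- add if the planner switches to a seq scan instead
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*), sum(length(content::text))
FROM articles
WHERE article_id BETWEEN 1000000000 AND 1000005000;   -- illustrative bounds
RESET enable_indexscan;

-- 2) A BRIN index on the ID column; scanning it always produces a bitmap heap scan:
CREATE INDEX articles_article_id_brin ON articles USING brin (article_id);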
> > - how big is the table?
> pg_table_size('articles') = 427 GB
> pg_table_size('pg_toast.pg_toast_221558') = 359 GB
Ouch .. if it were me, I would definitely want to make that a partitioned
table..
Or perhaps two unioned together with a view? One each for the sparse and dense
range?
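A rough sketch of the two options, assuming a hypothetical split point between
the dense and sparse ID ranges (all names and boundary values below are
illustrative; PostgreSQL 10 already supports declarative range partitioning):

-- Option 1: range-partitioned table.
CREATE TABLE articles_part (
    article_id bigint NOT NULL,
    content    jsonb  NOT NULL
    -- remaining columns as in the original table
) PARTITION BY RANGE (article_id);

CREATE TABLE articles_dense  PARTITION OF articles_part
    FOR VALUES FROM (0) TO (1000000000);
CREATE TABLE articles_sparse PARTITION OF articles_part
    FOR VALUES FROM (1000000000) TO (MAXVALUE);

-- Option 2: two physically separate tables behind a view.
CREATE VIEW articles_all AS
    SELECT * FROM articles_dense_tbl
    UNION ALL
    SELECT * FROM articles_sparse_tbl;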
> > You can make use of pg_buffercache in order to see what is actually
> > cached.
> It seems that there is no such a view in my DB, could it be that the module
> is not installed?
Right, it's in the postgresql-contrib package.
And you have to "CREATE EXTENSION pg_buffercache".
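Once the extension exists, a standard summary query over pg_buffercache
(essentially the example from the documentation) shows how many shared buffers
each relation currently occupies:

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT c.relname,
       count(*)                        AS buffers,
       pg_size_pretty(count(*) * 8192) AS buffered
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
 AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                           WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY 2 DESC
LIMIT 10;

Running this before and after one of the test queries would show whether the
articles heap, its TOAST table and articles_pkey actually stay cached.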
> > - As Laurenz suggested (VACUUM FULL), you might want to move data around.
> You can try also a dump + restore to narrow the problem to data or disk
> I launched VACUUM FULL, but it ran very slowly, according to my calculation
> it might take 17 hours. I will try to copy the data into another table with
> the same structure, or spin up another server, and will let you know.
I *suspect* VACUUM FULL won't help, since (AIUI) it copies all "visible" tuples
from the source table into a new table (and updates indices as necessary). It
can resolve bloat due to historic DELETEs, but since I think your table was
written in reverse order of pkey, I think it'll also copy it in reverse order.
CLUSTER will fix that. You can use pg_repack to do so online...but it's going
to be painful for a table+toast 1TiB in size: it'll take all day, and also
require an additional 1TB while running (same as VAC FULL).
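In SQL terms the suggestion boils down to something like this (CLUSTER takes
an ACCESS EXCLUSIVE lock and rewrites the whole table, which is why pg_repack
is mentioned as the online alternative):

-- Rewrite the heap in article_id order, then refresh planner statistics.
CLUSTER articles USING articles_pkey;
ANALYZE articles;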
Justin
Re: Why could different data in a table be processed with different performance?
> And if you run
> select count(*) from articles where article_id between %s and %s
> i.e. without reading the json, is your buffers hit count increasing?

Tried this. This is somewhat interesting, too... Even the index-only scan is
faster for the "fast" range. The results are consistently fast in it, with
small and constant numbers of hits and reads. For the big one, on the contrary,
it shows a huge number of hits (why? how does it manage to do the same with
fewer block accesses in the "fast" range?) and the duration is "jumping", with
higher values on average.

"Fast": https://i.stack.imgur.com/63I9k.png
"Slow": https://i.stack.imgur.com/QzI3N.png

Note that the results on the charts are averaged by 1M, but particular values
in the "slow" range reached 4 s, while the maximum execution time for the
"fast" range was only 0.3 s.

Regards,
Vlad
Re: Why could different data in a table be processed with different performance?
> This seems significant.. it means the heap was probably written in backwards
> order relative to the IDs, and the OS readahead is ineffective when index
> scanning across a range of IDs.

But again, why is it different for one range and another? It was reversed for
both ranges.

> I would definitely want to make that a partitioned table

Yes, I believe it will be partitioned in the future.

> I *suspect* VACUUM FULL won't help, since (AIUI) it copies all "visible"
> tuples from the source table into a new table (and updates indices as
> necessary). It can resolve bloat due to historic DELETEs, but since I think
> your table was written in reverse order of pkey, I think it'll also copy it
> in reverse order.

I am going to copy the slow range into a nearby table and see if the problem
reproduces (I hope "INSERT INTO t2 SELECT * FROM t1 WHERE ..." will keep the
existing order of rows). Then I could try the same after CLUSTER.

Regards,
Vlad
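A minimal sketch of that experiment; the new table name and the range bounds
are illustrative only:

-- Copy the "slow" range into a table with the same structure.
CREATE TABLE articles_slow (LIKE articles INCLUDING ALL);
INSERT INTO articles_slow
SELECT * FROM articles
WHERE article_id BETWEEN 1000000000 AND 1100000000;   -- illustrative bounds

-- Later: rewrite the copy in ID order and re-run the read test.
-- (The index name is assumed; check \d articles_slow for the actual name.)
CLUSTER articles_slow USING articles_slow_pkey;
ANALYZE articles_slow;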
Re: Why could different data in a table be processed with different performance?
> did you try either 1) forcing a bitmap scan (of only one index), to force
> the heap reads to be ordered, if not sequential? SET enable_indexscan=off
> (and maybe SET enable_seqscan=off and others as needed).

Disabling index scan made it a bitmap scan. It is surprising, but this
increased read speed in both ranges: it became two times faster for the "fast"
range and three times faster for the "slow" range (for certain segments of
data I checked on; the whole experiment takes a while, though). But there is
still a difference between the ranges, it is now ~20 MB/s vs ~6 MB/s.

Vlad
Re: Why could different data in a table be processed with different performance?
On Mon, Sep 24, 2018 at 05:59:12PM -0700, Vladimir Ryabtsev wrote:
> > I *suspect* VACUUM FULL won't help, since (AIUI) it copies all "visible" ...
> I am going to copy the slow range into a nearby table and see if it reproduces
> (I hope "INSERT INTO t2 SELECT * FROM t1 WHERE ..." will keep the existing
> order of rows). Then I could try the same after CLUSTER.

If it does an index scan, I think that will badly fail to keep the same order
of heap TIDs - it'll be inserting rows in ID order rather than in (I guess)
reverse ID order.

Justin
Re: Why could different data in a table be processed with different performance?
> If it does an index scan, I think that will badly fail to keep the same
> order of heap TIDs - it'll be inserting rows in ID order rather than in
> (I guess) reverse ID order.

According to the plan, it is going to be a seq scan with a filter.

Vlad
Re: Why could different data in a table be processed with different performance?
On Mon, Sep 24, 2018 at 05:59:12PM -0700, Vladimir Ryabtsev wrote:
> > This seems significant.. it means the heap was probably written in
> > backwards order relative to the IDs, and the OS readahead is ineffective
> > when index scanning across a range of IDs.
> But again, why is it different for one range and another? It was reversed
> for both ranges.

I don't have an explanation for it.. but I'd be curious to know
pg_stats.correlation for the id column:

SELECT schemaname, tablename, attname, correlation
FROM pg_stats WHERE tablename='articles' AND column='article_id' LIMIT 1;

Justin
Re: Why could different data in a table be processed with different performance?
> but I'd be curious to know
> SELECT schemaname, tablename, attname, correlation
> FROM pg_stats WHERE tablename='articles' AND column='article_id' LIMIT 1;

I think you meant 'attname'. It gives:

 schemaname | tablename |  attname   | correlation
------------+-----------+------------+-------------
 storage    | articles  | article_id | -0.77380306

Vlad
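For completeness, the corrected query (pg_stats exposes the column name as
attname) is:

SELECT schemaname, tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'articles' AND attname = 'article_id'
LIMIT 1;

A correlation of -0.77 is strongly negative, i.e. the physical row order is
largely the reverse of article_id order, which is consistent with the
descending-order import mentioned earlier in the thread.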
Re: Why could different data in a table be processed with different performance?
Hi, Vladimir,

Reading the whole thread, it seems you should look deeper into the IO subsystem.

1) Which file system are you using?

2) What is the segment layout of the LVM PVs and LVs? See
https://www.centos.org/docs/5/html/Cluster_Logical_Volume_Manager/report_object_selection.html
for how to check. If data is fragmented, maybe the disks are doing a lot of
seeking?

3) Do you use LVM for any "extra" features, such as snapshots?

4) You can try using seekwatcher to see where on the disk the slowness is
occurring. You get a chart similar to this:
http://kernel.dk/dd-md0-xfs-pdflush.png

5) BCC is a collection of tools that might shed a light on what is happening:
https://github.com/iovisor/bcc

Kind regards,
Gasper

On 21. 09. 2018 02:07, Vladimir Ryabtsev wrote:
> I am experiencing a strange performance problem when accessing JSONB
> content by primary key.
>
> My DB version() is PostgreSQL 10.3 (Ubuntu 10.3-1.pgdg14.04+1) on
> x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
> postgres.conf: https://justpaste.it/6pzz1
> uname -a: Linux postgresnlpslave 4.4.0-62-generic #83-Ubuntu SMP Wed
> Jan 18 14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> The machine is virtual, running under Hyper-V.
> Processor: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz, 1x1 cores
> Disk storage: the host has two vmdx drives, the first shared between the
> root partition and an LVM PV, the second is a single LVM PV. Both PVs are
> in a VG containing swap and postgres data partitions. The data is
> mostly on the first PV.
>
> I have such a table:
>
> CREATE TABLE articles
> (
>     article_id bigint NOT NULL,
>     content jsonb NOT NULL,
>     published_at timestamp without time zone NOT NULL,
>     appended_at timestamp without time zone NOT NULL,
>     source_id integer NOT NULL,
>     language character varying(2) NOT NULL,
>     title text NOT NULL,
>     topics topic[] NOT NULL,
>     objects object[] NOT NULL,
>     cluster_id bigint NOT NULL,
>     CONSTRAINT articles_pkey PRIMARY KEY (article_id)
> )
>
> We have a Python lib (using the psycopg2 driver) to access this table. It
> executes simple queries against the table; one of them is used for bulk
> downloading of content and looks like this:
>
> select content from articles where id between $1 and $2
>
> I noticed that with some IDs it works pretty fast, while with others it
> is 4-5 times slower. It is worth noting that there are two main
> 'categories' of IDs in this table: the first is the range 27000-5,
> and the second is the range 100-10003000. For the first range it
> is 'fast' and for the second it is 'slow'. Besides the larger absolute
> numbers (pushing them from int to bigint), values in the second
> range are more 'sparse', which means that in the first range values are
> almost consecutive (with very few 'holes' of missing values), while in
> the second range there are many more 'holes' (average filling is 35%).
> Total number of rows in the first range: ~62M, in the second range: ~10M.
>
> I conducted several experiments to eliminate the possible influence of the
> library's code and network throughput; I omit some of them. I ended up
> with iterating over the table with EXPLAIN to simulate read load:
>
> explain (analyze, buffers)
> select count(*), sum(length(content::text)) from articles where
> article_id between %s and %s
>
> Sample output:
>
> Aggregate (cost=8635.91..8635.92 rows=1 width=16) (actual time=6625.993..6625.995 rows=1 loops=1)
>   Buffers: shared hit=26847 read=3914
>   -> Index Scan using articles_pkey on articles (cost=0.57..8573.35 rows=5005 width=107) (actual time=21.649..1128.004 rows=5000 loops=1)
>         Index Cond: ((article_id >= 43800) AND (article_id <= 438005000))
>         Buffers: shared hit=4342 read=671
> Planning time: 0.393 ms
> Execution time: 6626.136 ms
>
> Aggregate (cost=5533.02..5533.03 rows=1 width=16) (actual time=33219.100..33219.102 rows=1 loops=1)
>   Buffers: shared hit=6568 read=7104
>   -> Index Scan using articles_pkey on articles (cost=0.57..5492.96 rows=3205 width=107) (actual time=22.167..12082.624 rows=2416 loops=1)
>         Index Cond: ((article_id >= '10002100'::bigint) AND (article_id <= '10002101'::bigint))
>         Buffers: shared hit=50 read=2378
> Planning time: 0.517 ms
> Execution time: 33219.218 ms
>
> During iteration, I parse the result of EXPLAIN and collect a series of the
> following metrics:
>
> - buffer hits/reads for the table,
> - buffer hits/reads for the index,
> - number of rows (from "Index Scan..."),
> - duration of execution.
>
> Based on the metrics above I calculate derived metrics:
>
> - disk read rate: (index reads + table reads) * 8192 / duration,
> - reads ratio: (index reads + table reads) / (index reads + table reads + index hits + table hits),
> - data rate: (index reads + table reads + index hits + table hits) * 8192 / duration,
> - rows rate: number of rows / duration.
>
> Since the "density" of IDs is different in the "small" and "big" ranges, I
> adjusted the size of chunks in order to get around 5000 rows on each
> iteration in both cases.
Re: To keep indexes in memory, is large enough effective_cache_size enough?
Hi!

"Index in memory" topic:

After a read operation starts, it seems that a big part of an index gets
loaded into memory quite quickly. A lot of IDs fit into one 8 KB page in
PostgreSQL. When the reading operation starts, pages start to be loaded into
memory quickly. So this "feature" of PostgreSQL may still work well in our
very long "random" IDs case.

It is not needed / not possible to keep the whole index in memory: e.g. if
there is not enough memory, or the size of the index is bigger than memory,
it is not even possible to keep the whole index in memory.

(Regarding in-memory DB functionalities: I do not know whether an "in-memory"
index / db would work in such situations, if the index would not fit in
memory. We would like to keep most of the index in memory, but not the whole
index in all cases, e.g. when there is not enough memory available.)

So, again, maybe PostgreSQL works well in our case.

Regarding double buffering: I do not know how much double buffering would slow
down operations. It could also be possible to turn off the kernel page cache
on our DB server to avoid double buffering, although we may still keep it in
use.

BR Sam

On Wednesday, September 19, 2018 5:50 PM, Sam R. wrote:

Thanks for the comments!

Sam wrote:
> > The data in db table columns is not needed to be kept in memory, only the
> > index. (hash index.)

Jeff Janes wrote:
> This sounds like speculation. Do you have hard evidence that this is
> actually the case?

In our case the "ID" is a randomly generated number. (A large ID.) It is not
a "sequential" number, but random. In the generation phase, it is a very large
random number. Our application may not even generate the random ID.

We use a hash index over the ID. At the moment, in "pure theory", we will read
randomly through the hash index. So, no one will be able to know what part of
the data (from the table) should be kept in memory.

Side note: Of course there may be (even many) use cases where the same data is
read again and again. Still, I am thinking now from a very theoretical point
of view (which we may still apply in practice).

In general: I am not certain how PostgreSQL or hash indexes work in detail, so
my claim / wish of keeping only the index in memory may be faulty. (This is
one reason for these discussions.)

BR Sam
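A minimal sketch of the setup discussed above, using a hypothetical table and
index name; pg_prewarm (a contrib extension, mentioned here as an assumption
since it does not appear in the thread) can be used to pull just the index
into shared buffers:

-- Hypothetical table with a large, randomly generated ID.
CREATE TABLE big_table (
    id      bigint NOT NULL,
    payload jsonb  NOT NULL
);

-- Hash index over the random ID, as discussed above.
CREATE INDEX big_table_id_hash ON big_table USING hash (id);

-- Compare the index size with shared_buffers / available RAM.
SELECT pg_size_pretty(pg_relation_size('big_table_id_hash'));

-- Optionally pre-load only the index (not the heap) into shared buffers.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('big_table_id_hash', 'buffer');

Whether the index then stays in memory still depends on shared_buffers, the OS
page cache and the access pattern, which is exactly the double-buffering
trade-off mentioned above.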
