HDD vs SSD without explanation
Dear all,

Could someone help me analyze the two execution plans below (EXPLAIN ANALYZE was used)? The query is query 9 of the TPC-H benchmark [1].

I'm using an HP server with an Intel Xeon 2.8GHz/4-core CPU and 8GB of memory, with a 320GB 15Krpm SAS HDD and a 500GB Samsung EVO SSD. My parameters in postgresql.conf are the defaults, except that on the SSD I changed random_page_cost = 1.0.

I do not understand why the query runs in half the time on the HDD. To explain better: on the HDD the query execution takes 12 minutes on average, while on the SSD it takes 26 minutes. I think maybe the execution plan performs many write operations, and that is why the 15Krpm SAS HDD has been faster. In any case, I always thought an SSD would be equal or faster, but in this case (and in four more cases we have here) it lost badly to the HDDs.

Any help in understanding this is welcome.

Best Regards
Neto

Query execution time on SSD:
execution 1: 00:23:29
execution 2: 00:28:38
execution 3: 00:27:32
execution 4: 00:27:54
execution 5: 00:27:35
execution 6: 00:26:19
Average: 26 minutes 54 seconds

Query execution time on HDD:
execution 1: 00:12:44
execution 2: 00:12:30
execution 3: 00:12:47
execution 4: 00:13:02
execution 5: 00:13:00
execution 6: 00:12:47
Average: 12 minutes 48 seconds

--- EXECUTION PLAN, SSD STORAGE ---

Finalize GroupAggregate  (cost=15694362.41..15842178.65 rows=60150 width=66) (actual time=1670577.649..1674717.444 rows=175 loops=1)
  Group Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone))
  ->  Gather Merge  (cost=15694362.41..15839923.02 rows=120300 width=66) (actual time=1670552.446..1674716.748 rows=525 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial GroupAggregate  (cost=15693362.39..15825037.39 rows=60150 width=66) (actual time=1640482.164..1644619.574 rows=175 loops=3)
              Group Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone))
              ->  Sort  (cost=15693362.39..15709690.19 rows=6531119 width=57) (actual time=1640467.384..1641511.970 rows=4344197 loops=3)
                    Sort Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone)) DESC
                    Sort Method: external merge  Disk: 319512kB
                    ->  Hash Join  (cost=4708869.23..14666423.78 rows=6531119 width=57) (actual time=1366753.586..1634128.122 rows=4344197 loops=3)
                          Hash Cond: (lineitem.l_suppkey = supplier.s_suppkey)
                          ->  Hash Join  (cost=4683027.67..14400582.74 rows=6531119 width=43) (actual time=1328019.213..1623919.675 rows=4344197 loops=3)
                                Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
                                ->  Hash Join  (cost=1993678.29..11279593.98 rows=6531119 width=47) (actual time=245906.330..1316201.213 rows=4344197 loops=3)
                                      Hash Cond: ((lineitem.l_suppkey = partsupp.ps_suppkey) AND (lineitem.l_partkey = partsupp.ps_partkey))
                                      ->  Hash Join  (cost=273200.59..9157211.71 rows=6531119 width=45) (actual time=5103.563..1007657.993 rows=4344197 loops=3)
                                            Hash Cond: (lineitem.l_partkey = part.p_partkey)
                                            ->  Parallel Seq Scan on lineitem  (cost=0.00..5861332.93 rows=15093 width=41) (actual time=3.494..842667.110 rows=80004097 loops=3)
                                            ->  Hash  (cost=263919.95..263919.95 rows=565651 width=4) (actual time=4973.807..4973.807 rows=434469 loops=3)
                                                  Buckets: 131072  Batches: 8  Memory Usage: 2933kB
                                                  ->  Seq Scan on part  (cost=0.00..263919.95 rows=565651 width=4) (actual time=11.810..4837.287 rows=434469 loops=3)
                                                        Filter: ((p_name)::text ~~ '%orchid%'::text)
                                                        Rows Removed by Filter: 7565531
                                      ->  Hash  (cost=1052983.08..1052983.08 rows=31999708 width=22) (actual time=240711.936..240711.936 rows=3200 loops=3)
                                            Buckets: 65536  Batches: 512  Memory Usage: 3941kB
                                            ->  Seq Scan on partsupp  (cost=0.00..1052983.08 rows=31999708 width=22) (actual time=0.033..228828.149 rows=3200 loops=3)
                                ->  Hash  (cost=1704962.28..1704962.28 rows=6728 width=8) (actual time=253669.242..253669.242 rows=6000 loops=3)
                                      Buckets: 131072  Batches: 1024  Memory Usage: 3316kB
                                      ->  Seq Scan on orders  (cost=0.00..1704962.28 rows=6728 width=8) (actual time=0.038..237545.226 rows [...]
Re: HDD vs SSD without explanation
On Sun, Jan 14, 2018 at 12:44:00PM -0800, Neto pr wrote:
> Dear all,
>
> Could someone help me analyze the two execution plans below (EXPLAIN
> ANALYZE was used)? The query is query 9 of the TPC-H benchmark [1].
>
> I'm using an HP server with an Intel Xeon 2.8GHz/4-core CPU and 8GB of
> memory, with a 320GB 15Krpm SAS HDD and a 500GB Samsung EVO SSD.
>
> I think maybe the execution plan performs many write operations, and
> that is why the 15Krpm SAS HDD has been faster.
The query plan is all garbled by the mail client; could you resend? Or post a link from
https://explain.depesz.com/
To see if the query is causing many writes (due to dirty pages, sorts, etc.),
run it with EXPLAIN (ANALYZE, BUFFERS).
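For example, a minimal sketch against one of your TPC-H tables (the small count(*) is only an illustration; run your query 9 the same way):

EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM lineitem;

The "temp read/written" figures in its output show how much data is spilling to temporary files.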
But from what I could tell, your problems are here:
->  Parallel Seq Scan on lineitem  (cost=0.00..5861332.93 rows=15093 width=41) (actual time=3.494..842667.110 rows=80004097 loops=3)
vs
->  Parallel Seq Scan on lineitem  (cost=0.00..5861333.40 rows=15140 width=41) (actual time=41.805..224438.909 rows=80004097 loops=3)

->  Seq Scan on partsupp  (cost=0.00..1052983.08 rows=31999708 width=22) (actual time=0.033..228828.149 rows=3200 loops=3)
vs
->  Seq Scan on partsupp  (cost=0.00..1052934.38 rows=31994838 width=22) (actual time=0.037..37865.003 rows=3200 loops=3)
Can you reproduce the speed difference using dd?
time sudo dd if=/dev/sdX of=/dev/null bs=1M count=32K skip=$((128*$RANDOM/32))
# set bs to optimal_io_size
Or: bonnie++ -f -n0
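For instance, to compare the two drives directly (I am assuming the HDD is sda and the SSD is sdb; adjust to your actual device names):

time sudo dd if=/dev/sda of=/dev/null bs=1M count=32K skip=$((128*$RANDOM/32))
time sudo dd if=/dev/sdb of=/dev/null bs=1M count=32K skip=$((128*$RANDOM/32))

If the SSD is not clearly faster here, the problem is below PostgreSQL (hardware, firmware, or kernel) rather than in the query plan.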
What OS/kernel are you using? LVM? filesystem? I/O scheduler? partitions?
readahead? blockdev --getra
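For example (prints the readahead, in 512-byte sectors, for each device; device names assumed as above):

sudo blockdev --getra /dev/sda /dev/sdb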
If you're running under linux, maybe you can just send the output of:
for a in /sys/block/sdX/queue/*; do echo "$a `cat $a`"; done
or: tail
/sys/block/sdX/queue/{minimum_io_size,optimal_io_size,read_ahead_kb,scheduler,rotational,max_sectors_kb,logical_block_size,physical_block_size}
Justin
Re: HDD vs SSD without explanation
Thanks for the reply.

I'll try to upload the execution plan with EXPLAIN (ANALYZE, BUFFERS) to the
website: https://explain.depesz.com/

I'm running an experiment for scientific research, and this is what I find
strange: the HDD performance far outweighs the performance of the SSD.

Do you think that if I run a VACUUM FULL, the performance on the SSD will
become better than on the 15Krpm SAS HDD?

Best Regards
Neto
Re: HDD vs SSD without explanation
2018-01-14 13:40 GMT-08:00 Justin Pryzby :
> [...]
> What OS/kernel are you using? LVM? filesystem? I/O scheduler? partitions?
> readahead? blockdev --getra
OS: Debian 8, 64-bit, kernel 3.16.0-4.
See the disk and filesystem details below:
root@hp2ml110deb:/# fdisk -l
Disk /dev/sda: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 26F5EB21-30DB-44E4-B9E2-E8105846B6C4
Device          Start        End    Sectors   Size Type
/dev/sda1        2048    1050623    1048576   512M EFI System
/dev/sda2     1050624 1937274879 1936224256 923.3G Linux filesystem
/dev/sda3  1937274880 1953523711   16248832   7.8G Linux swap
Disk /dev/sdb: 465.8 GiB, 500107862016 bytes, 976773168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
The DBMS and the users' tablespace are installed on /dev/sdb, the SSD.
> If you're running under linux, maybe you can just send the output of:
> for a in /sys/block/sdX/queue/*; do echo "$a `cat $a`"; done
> or: tail
> /sys/block/sdX/queue/{minimum_io_size,optimal_io_size,read_ahead_kb,scheduler,rotational,max_sectors_kb,logical_block_size,physical_block_size}
>
> Justin
Re: HDD vs SSD without explanation
2018-01-14 15:59 GMT-08:00 Neto pr :
> Thanks for the reply.
> I'll try to upload the execution plan with EXPLAIN (ANALYZE, BUFFERS) to the
> website: https://explain.depesz.com/

Below is a new execution plan, with ANALYZE and BUFFERS. This time, without changing anything in the configuration of the DBMS, I just restarted the DBMS, and a time of 16 minutes was obtained, against the 26 minutes of the other execution. But it still has not managed to beat the execution time on the 15Krpm SAS HDD.

I was not able to upload it to the site, because I'm saving the execution plan in the database, and when I retrieve it, it loses the line breaks, and the site does not allow the upload.

--- Execution plan with BUFFERS, executed on SSD storage ---

Finalize GroupAggregate  (cost=1588.33..15980046.69 rows=60150 width=66) (actual time=969248.287..973686.679 rows=175 loops=1)
  Group Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone))
  Buffers: shared hit=1327602 read=2305013, temp read=1183857 written=1180940
  ->  Gather Merge  (cost=1588.33..15977791.06 rows=120300 width=66) (actual time=969222.164..973685.582 rows=525 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=1327602 read=2305013, temp read=1183857 written=1180940
        ->  Partial GroupAggregate  (cost=15821228.31..15962905.44 rows=60150 width=66) (actual time=941985.137..946403.344 rows=175 loops=3)
              Group Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone))
              Buffers: shared hit=3773802 read=7120852, temp read=3550293 written=3541542
              ->  Sort  (cost=15821228.31..15838806.37 rows=7031225 width=57) (actual time=941954.595..943119.850 rows=4344197 loops=3)
                    Sort Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone)) DESC
                    Sort Method: external merge  Disk: 320784kB
                    Buffers: shared hit=3773802 read=7120852, temp read=3550293 written=3541542
                    ->  Hash Join  (cost=4708859.28..14719466.13 rows=7031225 width=57) (actual time=619996.638..933725.615 rows=4344197 loops=3)
                          Hash Cond: (lineitem.l_suppkey = supplier.s_suppkey)
                          Buffers: shared hit=3773732 read=7120852, temp read=3220697 written=3211409
                          ->  Hash Join  (cost=4683017.71..14434606.65 rows=7071075 width=43) (actual time=579893.395..926348.061 rows=4344197 loops=3)
                                Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
                                Buffers: shared hit=3758207 read=7108695, temp read=3114271 written=3105025
                                ->  Hash Join  (cost=1993687.71..11297331.33 rows=7071075 width=47) (actual time=79741.803..805259.856 rows=4344197 loops=3)
                                      Hash Cond: ((lineitem.l_suppkey = partsupp.ps_suppkey) AND (lineitem.l_partkey = partsupp.ps_partkey))
                                      Buffers: shared hit=1754251 read=5797780, temp read=2369849 written=2366741
                                      ->  Hash Join  (cost=273201.71..9157213.44 rows=7071075 width=45) (actual time=5363.078..672302.517 rows=4344197 loops=3)
                                            Hash Cond: (lineitem.l_partkey = part.p_partkey)
                                            Buffers: shared hit=325918 read=5027133, temp read=1742658 written=1742616
                                            ->  Parallel Seq Scan on lineitem  (cost=0.00..5861333.20 rows=15120 width=41) (actual time=0.129..536226.436 rows=80004097 loops=3)
                                                  Buffers: shared hit=2 read=4861280
                                            ->  Hash  (cost=263921.00..263921.00 rows=565657 width=4) (actual time=5362.100..5362.100 rows=434469 loops=3)
                                                  Buckets: 131072  Batches: 8  Memory Usage: 2933kB
                                                  Buffers: shared hit=325910 read=165853, temp written=3327
                                                  ->  Seq Scan on part  (cost=0.00..263921.00 rows=565657 width=4) (actual time=0.025..5279.959 rows=434469 loops=3)
                                                        Filter: ((p_name)::text ~~ '%orchid%'::text)
                                                        Rows Removed by Filter: 7565531
                                                        Buffers: shared hit=325910 read=165853
                                      ->  Hash  (cost=1052986.00..1052986.00 rows=3200 width=22) (actual time=74231.061..74231.061 rows=3200 loops=3)
                                            Buckets: 65536  Batches: 512  Memory Usage: 3941kB
                                            Buffers: shared hit=1428311 read=770647, temp written=513846
                                            ->  Seq Scan on partsupp  (cost=0.00..1052986.00 rows=3200 width=22) (actual time=0.037..66316.652 rows=3200 loops=3)
                                                  Buffers: shared hit=1428311 read=770647
                                ->  Hash  (cost=1704955.00..1704955.00 rows=6000 width=8) (actual time=46310.630..46310.63 [...]
Re: HDD vs SSD without explanation
On Sun, Jan 14, 2018 at 06:25:40PM -0800, Neto pr wrote:
> > The query plan is all garbled by mail , could you resend? Or post a link
> > from
> > https://explain.depesz.com/
On Sun, Jan 14, 2018 at 06:36:02PM -0800, Neto pr wrote:
> I was not able to upload to the site, because I'm saving the execution
> plan in the database, and when I retrieve it, it loses the line breaks,
That's why it's an issue for me, too.
> > What OS/kernel are you using? LVM? filesystem? I/O scheduler?
> > partitions?
>
> See the disk and filesystem details below:
> root@hp2ml110deb:/# fdisk -l
> Disk /dev/sda: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
>
> Disk /dev/sdb: 465.8 GiB, 500107862016 bytes, 976773168 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
>
What about sdb partitions/FS?
On Sun, Jan 14, 2018 at 06:25:40PM -0800, Neto pr wrote:
> The DBMS and tablespace of users is installed in /dev/sdb SSD.
Is that also a temp_tablespace? Or are your hashes spilling to the HDD instead?
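One way to check, as a sketch (names and locations depend on your setup):

SHOW temp_tablespaces;
SELECT spcname, pg_tablespace_location(oid) FROM pg_tablespace;

If temp_tablespaces is empty, temporary files go to pgsql_tmp under the database's default tablespace.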
Group Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone))
Buffers: shared hit=3773802 read=7120852, temp read=3550293 written=3541542
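(At the default 8 kB page size, temp written=3541542 means 3541542 * 8 kB, roughly 27 GB of temporary-file writes per execution, so which device holds the temp files matters a great deal here.)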
Is your SSD being used for anything else?
What about these?
> > readahead? blockdev --getra
> > If you're running under linux, maybe you can just send the output of:
> > for a in /sys/block/sdX/queue/*; do echo "$a `cat $a`"; done
> > or: tail
> > /sys/block/sdX/queue/{minimum_io_size,optimal_io_size,read_ahead_kb,scheduler,rotational,max_sectors_kb,logical_block_size,physical_block_size}
> > Can you reproduce the speed difference using dd ?
> > time sudo dd if=/dev/sdX of=/dev/null bs=1M count=32K
> > skip=$((128*$RANDOM/32)) # set bs to optimal_io_size
> >
> > Or: bonnie++ -f -n0
Justin
