HDD vs SSD without explanation

2018-01-14 Thread Neto pr
Dear all,

Could someone help me analyze the two execution plans below (EXPLAIN ANALYZE
was used)? The query is query 9 of the TPC-H benchmark [1].
I'm using an HP server with an Intel Xeon 2.8GHz/4-core CPU, 8GB of memory, a
320GB 15Krpm SAS HDD, and a 500GB Samsung EVO SSD.
My DBMS parameters in postgresql.conf are the defaults, but on the SSD I
changed random_page_cost = 1.0.
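In case it helps to reproduce this, a sketch of how that setting can be
applied (the tablespace name ssd_space is illustrative, not my real one):

  -- per session, for quick experiments:
  SET random_page_cost = 1.0;
  SHOW random_page_cost;
  -- or persistently, for everything on the SSD tablespace:
  ALTER TABLESPACE ssd_space SET (random_page_cost = 1.0);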

I do not understand why the query runs in half the time on the HDD. To explain
better: on the HDD the query execution takes on average 12 minutes, while on
the SSD it takes 26 minutes.
I think maybe the execution plan performs many write operations, and that is
why the 15Krpm SAS HDD has been faster.
Anyway, I always thought that an SSD would be equal or faster, but in this
case, and in four more cases we have here, it lost by a lot to the HDD.

Any help in understanding this is welcome.

Best Regards
Neto

--- Query execution time on SSD ---
execution 1: 00:23:29
execution 2: 00:28:38
execution 3: 00:27:32
execution 4: 00:27:54
execution 5: 00:27:35
execution 6: 00:26:19
Average: 26 minutes 54 seconds

--- Query execution time on HDD ---
execution 1: 00:12:44
execution 2: 00:12:30
execution 3: 00:12:47
execution 4: 00:13:02
execution 5: 00:13:00
execution 6: 00:12:47
Average: 12 minutes 48 seconds

-- EXECUTION PLAN, SSD Storage --

Finalize GroupAggregate  (cost=15694362.41..15842178.65 rows=60150 width=66) (actual time=1670577.649..1674717.444 rows=175 loops=1)
  Group Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone))
  ->  Gather Merge  (cost=15694362.41..15839923.02 rows=120300 width=66) (actual time=1670552.446..1674716.748 rows=525 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial GroupAggregate  (cost=15693362.39..15825037.39 rows=60150 width=66) (actual time=1640482.164..1644619.574 rows=175 loops=3)
              Group Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone))
              ->  Sort  (cost=15693362.39..15709690.19 rows=6531119 width=57) (actual time=1640467.384..1641511.970 rows=4344197 loops=3)
                    Sort Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone)) DESC
                    Sort Method: external merge  Disk: 319512kB
                    ->  Hash Join  (cost=4708869.23..14666423.78 rows=6531119 width=57) (actual time=1366753.586..1634128.122 rows=4344197 loops=3)
                          Hash Cond: (lineitem.l_suppkey = supplier.s_suppkey)
                          ->  Hash Join  (cost=4683027.67..14400582.74 rows=6531119 width=43) (actual time=1328019.213..1623919.675 rows=4344197 loops=3)
                                Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
                                ->  Hash Join  (cost=1993678.29..11279593.98 rows=6531119 width=47) (actual time=245906.330..1316201.213 rows=4344197 loops=3)
                                      Hash Cond: ((lineitem.l_suppkey = partsupp.ps_suppkey) AND (lineitem.l_partkey = partsupp.ps_partkey))
                                      ->  Hash Join  (cost=273200.59..9157211.71 rows=6531119 width=45) (actual time=5103.563..1007657.993 rows=4344197 loops=3)
                                            Hash Cond: (lineitem.l_partkey = part.p_partkey)
                                            ->  Parallel Seq Scan on lineitem  (cost=0.00..5861332.93 rows=15093 width=41) (actual time=3.494..842667.110 rows=80004097 loops=3)
                                            ->  Hash  (cost=263919.95..263919.95 rows=565651 width=4) (actual time=4973.807..4973.807 rows=434469 loops=3)
                                                  Buckets: 131072  Batches: 8  Memory Usage: 2933kB
                                                  ->  Seq Scan on part  (cost=0.00..263919.95 rows=565651 width=4) (actual time=11.810..4837.287 rows=434469 loops=3)
                                                        Filter: ((p_name)::text ~~ '%orchid%'::text)
                                                        Rows Removed by Filter: 7565531
                                      ->  Hash  (cost=1052983.08..1052983.08 rows=31999708 width=22) (actual time=240711.936..240711.936 rows=3200 loops=3)
                                            Buckets: 65536  Batches: 512  Memory Usage: 3941kB
                                            ->  Seq Scan on partsupp  (cost=0.00..1052983.08 rows=31999708 width=22) (actual time=0.033..228828.149 rows=3200 loops=3)
                                ->  Hash  (cost=1704962.28..1704962.28 rows=6728 width=8) (actual time=253669.242..253669.242 rows=6000 loops=3)
                                      Buckets: 131072  Batches: 1024  Memory Usage: 3316kB
                                      ->  Seq Scan on orders  (cost=0.00..1704962.28 rows=6728 width=8) (actual time=0.038..237545.226 rows

Re: HDD vs SSD without explanation

2018-01-14 Thread Justin Pryzby
On Sun, Jan 14, 2018 at 12:44:00PM -0800, Neto pr wrote:
> Dear all,
> 
> Could someone help me analyze the two execution plans below (EXPLAIN ANALYZE
> was used)? The query is query 9 of the TPC-H benchmark [1].
>
> I'm using an HP server with an Intel Xeon 2.8GHz/4-core CPU, 8GB of memory,
> a 320GB 15Krpm SAS HDD, and a 500GB Samsung EVO SSD.
>
> I think maybe the execution plan performs many write operations, and that
> is why the 15Krpm SAS HDD has been faster.

The query plan is all garbled by mail; could you resend?  Or post a link from
https://explain.depesz.com/

To see if the query is causing many writes (due to dirty pages, sorts, etc.),
run it with EXPLAIN (ANALYZE, BUFFERS).
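For example (a sketch; the count(*) query below is only a stand-in for the
full Q9 text):

  EXPLAIN (ANALYZE, BUFFERS)
  SELECT count(*) FROM lineitem;  -- substitute the real TPC-H Q9 query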

But from what I could tell, your problems are here:

->  Parallel Seq Scan on lineitem  (cost=0.00..5861332.93 rows=15093 width=41) (actual time=3.494..842667.110 rows=80004097 loops=3)
vs
->  Parallel Seq Scan on lineitem  (cost=0.00..5861333.40 rows=15140 width=41) (actual time=41.805..224438.909 rows=80004097 loops=3)

->  Seq Scan on partsupp  (cost=0.00..1052983.08 rows=31999708 width=22) (actual time=0.033..228828.149 rows=3200 loops=3)
vs
->  Seq Scan on partsupp  (cost=0.00..1052934.38 rows=31994838 width=22) (actual time=0.037..37865.003 rows=3200 loops=3)

Can you reproduce the speed difference using dd?
time sudo dd if=/dev/sdX of=/dev/null bs=1M count=32K skip=$((128*$RANDOM/32))  # set bs to optimal_io_size

Or: bonnie++ -f -n0

What OS/kernel are you using?  LVM?  filesystem?  I/O scheduler?  partitions?
readahead?  blockdev --getra

If you're running under linux, maybe you can just send the output of:
for a in /sys/block/sdX/queue/*; do echo "$a `cat $a`"; done
or: tail 
/sys/block/sdX/queue/{minimum_io_size,optimal_io_size,read_ahead_kb,scheduler,rotational,max_sectors_kb,logical_block_size,physical_block_size}

Justin



Re: HDD vs SSD without explanation

2018-01-14 Thread Neto pr
Thanks for the reply.
I'll try to upload the execution plan with EXPLAIN (ANALYZE, BUFFERS) to the
website: https://explain.depesz.com/

I'm running an experiment for scientific research, and this is what I find
strange: the HDD performance far outweighs the performance of the SSD.

Do you think that if I run a VACUUM FULL, the performance with the SSD will
become better than the 15Krpm SAS HDD?
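What I have in mind is roughly this (a sketch over the two largest TPC-H
tables; VERBOSE and ANALYZE are optional):

  VACUUM (FULL, VERBOSE, ANALYZE) lineitem;
  VACUUM (FULL, VERBOSE, ANALYZE) partsupp;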

Best Regards
Neto
 





2018-01-14 19:40 GMT-02:00 Justin Pryzby:
> On Sun, Jan 14, 2018 at 12:44:00PM -0800, Neto pr wrote:
>> Dear all,
>>
>> Could someone help me analyze the two execution plans below (EXPLAIN ANALYZE
>> was used)? The query is query 9 of the TPC-H benchmark [1].
>>
>> I'm using an HP server with an Intel Xeon 2.8GHz/4-core CPU, 8GB of memory,
>> a 320GB 15Krpm SAS HDD, and a 500GB Samsung EVO SSD.
>>
>> I think maybe the execution plan performs many write operations, and that
>> is why the 15Krpm SAS HDD has been faster.
>
> The query plan is all garbled by mail; could you resend?  Or post a link from
> https://explain.depesz.com/
>
> To see if the query is causing many writes (due to dirty pages, sorts, etc.),
> run it with EXPLAIN (ANALYZE, BUFFERS).
>
> But from what I could tell, your problems are here:
>
> ->  Parallel Seq Scan on lineitem  (cost=0.00..5861332.93 rows=15093 width=41) (actual time=3.494..842667.110 rows=80004097 loops=3)
> vs
> ->  Parallel Seq Scan on lineitem  (cost=0.00..5861333.40 rows=15140 width=41) (actual time=41.805..224438.909 rows=80004097 loops=3)
>
> ->  Seq Scan on partsupp  (cost=0.00..1052983.08 rows=31999708 width=22) (actual time=0.033..228828.149 rows=3200 loops=3)
> vs
> ->  Seq Scan on partsupp  (cost=0.00..1052934.38 rows=31994838 width=22) (actual time=0.037..37865.003 rows=3200 loops=3)
>
> Can you reproduce the speed difference using dd?
> time sudo dd if=/dev/sdX of=/dev/null bs=1M count=32K skip=$((128*$RANDOM/32))  # set bs to optimal_io_size
>
> Or: bonnie++ -f -n0
>
> What OS/kernel are you using?  LVM?  filesystem?  I/O scheduler?  partitions?
> readahead?  blockdev --getra
>
> If you're running under linux, maybe you can just send the output of:
> for a in /sys/block/sdX/queue/*; do echo "$a `cat $a`"; done
> or: tail 
> /sys/block/sdX/queue/{minimum_io_size,optimal_io_size,read_ahead_kb,scheduler,rotational,max_sectors_kb,logical_block_size,physical_block_size}
>
> Justin



Re: HDD vs SSD without explanation

2018-01-14 Thread Neto pr
2018-01-14 13:40 GMT-08:00 Justin Pryzby:
> On Sun, Jan 14, 2018 at 12:44:00PM -0800, Neto pr wrote:
>> Dear all,
>>
>> Could someone help me analyze the two execution plans below (EXPLAIN ANALYZE
>> was used)? The query is query 9 of the TPC-H benchmark [1].
>>
>> I'm using an HP server with an Intel Xeon 2.8GHz/4-core CPU, 8GB of memory,
>> a 320GB 15Krpm SAS HDD, and a 500GB Samsung EVO SSD.
>>
>> I think maybe the execution plan performs many write operations, and that
>> is why the 15Krpm SAS HDD has been faster.
>
> The query plan is all garbled by mail; could you resend?  Or post a link from
> https://explain.depesz.com/
>
> To see if the query is causing many writes (due to dirty pages, sorts, etc.),
> run it with EXPLAIN (ANALYZE, BUFFERS).
>
> But from what I could tell, your problems are here:
>
> ->  Parallel Seq Scan on lineitem  (cost=0.00..5861332.93 rows=15093 width=41) (actual time=3.494..842667.110 rows=80004097 loops=3)
> vs
> ->  Parallel Seq Scan on lineitem  (cost=0.00..5861333.40 rows=15140 width=41) (actual time=41.805..224438.909 rows=80004097 loops=3)
>
> ->  Seq Scan on partsupp  (cost=0.00..1052983.08 rows=31999708 width=22) (actual time=0.033..228828.149 rows=3200 loops=3)
> vs
> ->  Seq Scan on partsupp  (cost=0.00..1052934.38 rows=31994838 width=22) (actual time=0.037..37865.003 rows=3200 loops=3)
>
> Can you reproduce the speed difference using dd?
> time sudo dd if=/dev/sdX of=/dev/null bs=1M count=32K skip=$((128*$RANDOM/32))  # set bs to optimal_io_size
>
> Or: bonnie++ -f -n0
>
> What OS/kernel are you using?  LVM?  filesystem?  I/O scheduler?  partitions?
> readahead?  blockdev --getra

OS = Debian 8 64bits - 3.16.0-4

See below for the disk filesystems:
root@hp2ml110deb:/# fdisk -l
Disk /dev/sda: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 26F5EB21-30DB-44E4-B9E2-E8105846B6C4

Device          Start        End    Sectors   Size Type
/dev/sda1        2048    1050623    1048576   512M EFI System
/dev/sda2     1050624 1937274879 1936224256 923.3G Linux filesystem
/dev/sda3  1937274880 1953523711   16248832   7.8G Linux swap

Disk /dev/sdb: 465.8 GiB, 500107862016 bytes, 976773168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


The DBMS and the users' tablespace are installed on /dev/sdb, the SSD.
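To double-check that from inside PostgreSQL, this is a sketch of what can be
run (pg_tablespace_location() resolves each tablespace to its on-disk path):

  SELECT spcname, pg_tablespace_location(oid) AS location
  FROM pg_tablespace;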

> If you're running under linux, maybe you can just send the output of:
> for a in /sys/block/sdX/queue/*; do echo "$a `cat $a`"; done
> or: tail 
> /sys/block/sdX/queue/{minimum_io_size,optimal_io_size,read_ahead_kb,scheduler,rotational,max_sectors_kb,logical_block_size,physical_block_size}
>
> Justin



Re: HDD vs SSD without explanation

2018-01-14 Thread Neto pr
2018-01-14 15:59 GMT-08:00 Neto pr:
> Thanks for the reply.
> I'll try to upload the execution plan with EXPLAIN (ANALYZE, BUFFERS) to the
> website: https://explain.depesz.com/
>

Below is a new execution plan, with ANALYZE and BUFFERS. This time, without
changing anything in the configuration of the DBMS, I just restarted the DBMS,
and a time of 16 minutes was obtained, against the 26 minutes of the other
execution. But it still has not managed to beat the execution time on the
15Krpm SAS HDD.
I was not able to upload the plan to the site, because I'm saving the
execution plan in the database, and when I retrieve it, it loses the line
breaks, so the site does not accept the upload.
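If it helps, a sketch of one way to capture the plan with line breaks intact,
writing psql output straight to a file instead of saving it in the database
(the filename is arbitrary, and the count(*) is a stand-in for Q9):

  \o /tmp/q9_plan_ssd.txt
  EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM lineitem;
  \o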


--- Execution Plan with BUFFERS, executed on SSD Storage ---

Finalize GroupAggregate  (cost=1588.33..15980046.69 rows=60150 width=66) (actual time=969248.287..973686.679 rows=175 loops=1)
  Group Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone))
  Buffers: shared hit=1327602 read=2305013, temp read=1183857 written=1180940
  ->  Gather Merge  (cost=1588.33..15977791.06 rows=120300 width=66) (actual time=969222.164..973685.582 rows=525 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=1327602 read=2305013, temp read=1183857 written=1180940
        ->  Partial GroupAggregate  (cost=15821228.31..15962905.44 rows=60150 width=66) (actual time=941985.137..946403.344 rows=175 loops=3)
              Group Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone))
              Buffers: shared hit=3773802 read=7120852, temp read=3550293 written=3541542
              ->  Sort  (cost=15821228.31..15838806.37 rows=7031225 width=57) (actual time=941954.595..943119.850 rows=4344197 loops=3)
                    Sort Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone)) DESC
                    Sort Method: external merge  Disk: 320784kB
                    Buffers: shared hit=3773802 read=7120852, temp read=3550293 written=3541542
                    ->  Hash Join  (cost=4708859.28..14719466.13 rows=7031225 width=57) (actual time=619996.638..933725.615 rows=4344197 loops=3)
                          Hash Cond: (lineitem.l_suppkey = supplier.s_suppkey)
                          Buffers: shared hit=3773732 read=7120852, temp read=3220697 written=3211409
                          ->  Hash Join  (cost=4683017.71..14434606.65 rows=7071075 width=43) (actual time=579893.395..926348.061 rows=4344197 loops=3)
                                Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
                                Buffers: shared hit=3758207 read=7108695, temp read=3114271 written=3105025
                                ->  Hash Join  (cost=1993687.71..11297331.33 rows=7071075 width=47) (actual time=79741.803..805259.856 rows=4344197 loops=3)
                                      Hash Cond: ((lineitem.l_suppkey = partsupp.ps_suppkey) AND (lineitem.l_partkey = partsupp.ps_partkey))
                                      Buffers: shared hit=1754251 read=5797780, temp read=2369849 written=2366741
                                      ->  Hash Join  (cost=273201.71..9157213.44 rows=7071075 width=45) (actual time=5363.078..672302.517 rows=4344197 loops=3)
                                            Hash Cond: (lineitem.l_partkey = part.p_partkey)
                                            Buffers: shared hit=325918 read=5027133, temp read=1742658 written=1742616
                                            ->  Parallel Seq Scan on lineitem  (cost=0.00..5861333.20 rows=15120 width=41) (actual time=0.129..536226.436 rows=80004097 loops=3)
                                                  Buffers: shared hit=2 read=4861280
                                            ->  Hash  (cost=263921.00..263921.00 rows=565657 width=4) (actual time=5362.100..5362.100 rows=434469 loops=3)
                                                  Buckets: 131072  Batches: 8  Memory Usage: 2933kB
                                                  Buffers: shared hit=325910 read=165853, temp written=3327
                                                  ->  Seq Scan on part  (cost=0.00..263921.00 rows=565657 width=4) (actual time=0.025..5279.959 rows=434469 loops=3)
                                                        Filter: ((p_name)::text ~~ '%orchid%'::text)
                                                        Rows Removed by Filter: 7565531
                                                        Buffers: shared hit=325910 read=165853
                                      ->  Hash  (cost=1052986.00..1052986.00 rows=3200 width=22) (actual time=74231.061..74231.061 rows=3200 loops=3)
                                            Buckets: 65536  Batches: 512  Memory Usage: 3941kB
                                            Buffers: shared hit=1428311 read=770647, temp written=513846
                                            ->  Seq Scan on partsupp  (cost=0.00..1052986.00 rows=3200 width=22) (actual time=0.037..66316.652 rows=3200 loops=3)
                                                  Buffers: shared hit=1428311 read=770647
                                ->  Hash  (cost=1704955.00..1704955.00 rows=6000 width=8) (actual time=46310.630..46310.63

Re: HDD vs SSD without explanation

2018-01-14 Thread Justin Pryzby
On Sun, Jan 14, 2018 at 06:25:40PM -0800, Neto pr wrote:
> > The query plan is all garbled by mail; could you resend?  Or post a link
> > from https://explain.depesz.com/

On Sun, Jan 14, 2018 at 06:36:02PM -0800, Neto pr wrote:
> I was not able to upload the plan to the site, because I'm saving the
> execution plan in the database, and when I retrieve it, it loses the line breaks,

That's why it's an issue for me, too.

> > What OS/kernel are you using?  LVM?  filesystem?  I/O scheduler?  
> > partitions?
> 
> See below for the disk filesystems:
> root@hp2ml110deb:/# fdisk -l
> Disk /dev/sda: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
> 
> Disk /dev/sdb: 465.8 GiB, 500107862016 bytes, 976773168 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> 
What about sdb partitions/FS?

On Sun, Jan 14, 2018 at 06:25:40PM -0800, Neto pr wrote:
> The DBMS and the users' tablespace are installed on /dev/sdb, the SSD.

Is that also a temp_tablespace?  Or are your hashes spilling to the HDD instead?

Group Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone))
Buffers: shared hit=3773802 read=7120852, temp read=3550293 written=3541542
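A sketch of how to check that, and to point temporary files at the SSD if
they are not already there (the tablespace name ssd_temp is illustrative):

  SHOW temp_tablespaces;  -- empty means temp files go under the default tablespace
  ALTER SYSTEM SET temp_tablespaces = 'ssd_temp';
  SELECT pg_reload_conf();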

Is your SSD being used for anything else?

What about these?

> > readahead?  blockdev --getra

> > If you're running under linux, maybe you can just send the output of:
> > for a in /sys/block/sdX/queue/*; do echo "$a `cat $a`"; done
> > or: tail 
> > /sys/block/sdX/queue/{minimum_io_size,optimal_io_size,read_ahead_kb,scheduler,rotational,max_sectors_kb,logical_block_size,physical_block_size}

> > Can you reproduce the speed difference using dd ?
> > time sudo dd if=/dev/sdX of=/dev/null bs=1M count=32K 
> > skip=$((128*$RANDOM/32)) # set bs to optimal_io_size
> >
> > Or: bonnie++ -f -n0

Justin