Re: Why could different data in a table be processed with different performance?

2018-09-25 Thread Fabio Pardi


On 25/09/18 00:28, Vladimir Ryabtsev wrote:
>
> > it is not unusual to have 1GB cache or more...  and do not forget to drop 
> > the cache between tests + do a sync
> I conducted several long runs of dd, so I am sure that these numbers are 
> fairly correct. However, what worries me is that I test sequential read speed, 
> while during my experiments Postgres might need to read from random places, 
> thus reducing the real read speed dramatically. I have a feeling that this 
> can be the reason.
> I also reviewed the import scripts and found that the import was done in 
> DESCENDING order of IDs. It was done that way to get the most recent records 
> sooner; maybe it caused some inefficiency in the storage... But again, it was 
> the same for both ranges.
>
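(For the archives, the cache-dropping sequence mentioned earlier usually looks like the sketch below; the device path and sizes are placeholders, and it needs root:)

```shell
# Flush dirty pages, then drop the page cache, so that dd measures the
# disk rather than memory.
sync
echo 3 > /proc/sys/vm/drop_caches

# Sequential read test over ~8 GB (placeholder device path):
dd if=/dev/sda of=/dev/null bs=8k count=1000000
```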
> > - how big is your index?
> pg_table_size('articles_pkey') = 1561 MB
>
> > - how big is the table?
> pg_table_size('articles') = 427 GB
> pg_table_size('pg_toast.pg_toast_221558') = 359 GB
>

Since you have a very big TOAST table and you are using spinning disks, I 
think that increasing the block size will bring benefits. (Partitioning is not 
a bad idea either.)

If my understanding of TOAST is correct, if the data fits in blocks of, let's 
say, 16 or 24 KB, then one block retrieval from Postgres will result in fewer 
seeks on the disk and less chance that data gets scattered across your disk. 
(A very quick and dirty calculation shows your average record size is 17 KB.)
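For what it's worth, that rough per-record figure can be reproduced in SQL. This is only a sketch: it assumes the planner statistics (pg_class.reltuples) are reasonably fresh for 'articles', and reuses the TOAST table name quoted above.

```sql
-- Rough average on-disk footprint per row, heap plus TOAST,
-- based on planner statistics (an estimate, not an exact count).
SELECT pg_size_pretty(
         ((pg_table_size('articles')
           + pg_table_size('pg_toast.pg_toast_221558'))
          / NULLIF(reltuples::bigint, 0))::bigint
       ) AS avg_row_size
FROM pg_class
WHERE relname = 'articles';
```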

One thing you might want to have a look at is, again, the RAID controller and 
your OS. You might want to have all of them aligned in block size, or maybe 
have Postgres's block size be a multiple of what the OS and RAID controller 
use.



> > - given the size of shared_buffers, almost 2M blocks should fit, but you 
> > say 2 consecutive runs still are hitting the disk. That's strange indeed 
> > since you are using way more than 2M blocks.
> TBH, I cannot say I understand your calculations with the number of blocks...
shared_buffers = 15GB IIRC (justpaste link is gone)

15 * 1024 * 1024 = 15728640 KB

using 8KB blocks = 1966080 total blocks

if you query shared_buffers you should get the same number of total available 
blocks
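The arithmetic above, as a runnable one-liner (the psql command in the comment is just one way to cross-check it against the live server):

```shell
# 15 GB of shared_buffers expressed in the default 8 kB blocks:
echo $((15 * 1024 * 1024 / 8))
# prints 1966080
#
# pg_settings reports shared_buffers in the same 8 kB units, so on the
# server this should match:
#   psql -Atc "SELECT setting FROM pg_settings WHERE name = 'shared_buffers'"
```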

> But to clarify: consecutive runs with the SAME parameters do NOT hit the 
> disk; only the first one does, subsequent ones read only from the buffer 
> cache.
>
I'm a bit confused... every query you pasted contains 'read':

        Buffers: shared hit=50 read=2378

and 'read' means you are reading from disk (or OS cache). Or not?
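To make that concrete: the counters come from EXPLAIN with the BUFFERS option. The query below is only an illustrative placeholder against the table from this thread:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM articles WHERE id BETWEEN 200000000 AND 200010000;
-- "shared hit" = pages found in shared_buffers;
-- "read"       = pages requested from the kernel, which may be served
--                from the OS page cache OR the physical disk.
```

Watching iostat alongside the query is one way to tell how many of those 'read' pages actually touched the disk.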



> > - As Laurenz suggested (VACUUM FULL), you might want to move data around. 
> > You can also try a dump + restore to narrow the problem down to data or disk.
> I launched VACUUM FULL, but it ran very slowly; according to my calculation 
> it might take 17 hours. I will try to copy the data into another table with 
> the same structure, or spin up another server, and let you know.
>
Cool, that should also clarify whether the reverse insert order matters or not.
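A minimal sketch of the copy approach, assuming the primary key column is named id (the table name is taken from the thread; the rest is assumption):

```sql
-- Rebuild the data in ascending id order, so the new heap (and its
-- TOAST table) is written out sequentially in forward order.
CREATE TABLE articles_copy (LIKE articles INCLUDING ALL);
INSERT INTO articles_copy SELECT * FROM articles ORDER BY id;
```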

> > - You might also want to look at the disk graph of Windows while you are 
> > running your tests. It can show you whether data is actually being fetched 
> > from disk (and it's good to know how much).
> I wanted to do so, but I don't have access to the Hyper-V server; I will try 
> to request credentials from the admins.
>
> A couple more observations:
> 1) The result of my experiment is almost unaffected by other server load. 
> Another user was running a query (over this table) with a read speed of 
> ~130 MB/s, while my query read at 1.8-2 MB/s.
> 2) iotop shows a higher IO % (~93-94%) with the slower read speed (though it 
> is not quite clear what this field means). The process from the example above 
> had ~55% IO at 130 MB/s, while my process had ~93% at ~2 MB/s.
>
I think that's because you are looking at the 'IO' column, which indicates 
(from the manual) '...the percentage of time the thread/process spent [...] 
while waiting on I/O.'
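One way to see both numbers side by side (needs root; flags as I recall them from the iotop man page):

```shell
# -b batch mode (log instead of interactive UI), -o only show
# processes actually doing I/O, -n 3 take three samples.
# The IO> column is "% of time spent waiting on I/O";
# DISK READ is the actual throughput.
iotop -b -o -n 3
```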

> Regards,
> Vlad
>

regards,

fabio pardi


Re: link to Slow_Query_Questions from wiki/Main Page

2018-09-25 Thread Justin Pryzby
I asked a few weeks ago [0] but didn't get a response on -docs, so I am
resending here for wider review/discussion.

[0] 
https://www.postgresql.org/message-id/flat/20180908012957.GA15350%40telsasoft.com

On Fri, Sep 07, 2018 at 08:29:57PM -0500, Justin Pryzby wrote:
> Hi,
> 
> I'm suggesting to link to:
> https://wiki.postgresql.org/wiki/Slow_Query_Questions
> 
> From either:
> https://wiki.postgresql.org/wiki/Main%20Page
> or:
> https://wiki.postgresql.org/wiki/Performance_Optimization
> 
> I know it's a wiki, but it looks like I'm not allowed to edit the 'Main' page,
> so I'm asking here, which is probably for the best anyway.  Feel free to
> forward to or ask for opinions on the -perform list.
> 
> Thanks,
> Justin



Re: link to Slow_Query_Questions from wiki/Main Page

2018-09-25 Thread Alvaro Herrera
On 2018-Sep-25, Justin Pryzby wrote:

> I asked a few weeks ago [0] but didn't get a response on -docs, so I am
> resending here for wider review/discussion.

I support the idea of adding a link to "Performance Optimization".
That's not a protected page, so you should be able to do it.


> [0] 
> https://www.postgresql.org/message-id/flat/20180908012957.GA15350%40telsasoft.com
> 
> On Fri, Sep 07, 2018 at 08:29:57PM -0500, Justin Pryzby wrote:
> > Hi,
> > 
> > I'm suggesting to link to:
> > https://wiki.postgresql.org/wiki/Slow_Query_Questions
> > 
> > From either:
> > https://wiki.postgresql.org/wiki/Main%20Page
> > or:
> > https://wiki.postgresql.org/wiki/Performance_Optimization
> > 
> > I know it's a wiki, but it looks like I'm not allowed to edit the 'Main' 
> > page, so I'm asking here, which is probably for the best anyway.  Feel 
> > free to forward to or ask for opinions on the -perform list.
> > 
> > Thanks,
> > Justin
> 


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Why could different data in a table be processed with different performance?

2018-09-25 Thread Vladimir Ryabtsev
> 1) Which file system are you using?
From Linux's view it's ext4. The real vmdx file on Hyper-V is stored on NTFS,
as far as I know.

> 2) What is the segment layout of the LVM PVs and LVs?
I am a bit lost with it. Is that what you are asking about?
master:
# pvs --segments
  PV         VG        Fmt  Attr PSize  PFree  Start SSize
  /dev/sda5  ubuntu-vg lvm2 a--  19.76g 20.00m     0  4926
  /dev/sda5  ubuntu-vg lvm2 a--  19.76g 20.00m  4926   127
  /dev/sda5  ubuntu-vg lvm2 a--  19.76g 20.00m  5053     5
# lvs --segments
  LV     VG        Attr      #Str Type   SSize
  root   ubuntu-vg -wi-ao---    1 linear  19.24g
  swap_1 ubuntu-vg -wi-ao---    1 linear 508.00m

slave:
# pvs --segments
  PV         VG                  Fmt  Attr PSize   PFree Start SSize
  /dev/sda3  postgresnlpslave-vg lvm2 a--  429.77g    0      0 110021
  /dev/sda5  postgresnlpslave-vg lvm2 a--  169.52g    0      0  28392
  /dev/sda5  postgresnlpslave-vg lvm2 a--  169.52g    0  28392   2199
  /dev/sda5  postgresnlpslave-vg lvm2 a--  169.52g    0  30591   2560
  /dev/sda5  postgresnlpslave-vg lvm2 a--  169.52g    0  33151  10246
  /dev/sdb1  postgresnlpslave-vg lvm2 a--  512.00g    0      0 131071
# lvs --segments
  LV     VG                  Attr   #Str Type   SSize
  root   postgresnlpslave-vg -wi-ao    1 linear 110.91g
  root   postgresnlpslave-vg -wi-ao    1 linear  40.02g
  root   postgresnlpslave-vg -wi-ao    1 linear  10.00g
  root   postgresnlpslave-vg -wi-ao    1 linear 429.77g
  root   postgresnlpslave-vg -wi-ao    1 linear 512.00g
  swap_1 postgresnlpslave-vg -wi-ao    1 linear   8.59g

> 3) Do you use LVM for any "extra" features, such as snapshots?
I don't think so, but how do I check? vgs gives #SN = 0; is that it?

> 4) You can try using seekwatcher to see where on the disk the slowness is
> occurring. You get a chart similar to this:
> http://kernel.dk/dd-md0-xfs-pdflush.png
> 5) BCC is a collection of tools that might shed light on what is
> happening. https://github.com/iovisor/bcc
Will look into it.
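In case it helps, a couple of starting points (the paths are the usual Ubuntu locations for the BCC tools, and /dev/sda is a placeholder, so treat these as assumptions):

```shell
# Latency histogram of block I/O, one 10-second sample, per disk:
/usr/share/bcc/tools/biolatency -D 10 1
# Per-request trace with sector offsets -- large jumps between
# consecutive offsets indicate random access:
/usr/share/bcc/tools/biosnoop
# seekwatcher renders a chart like the one linked above from a
# blktrace recording:
blktrace -d /dev/sda -o trace   # then: seekwatcher -t trace -o out.png
```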

Regards,
Vlad


Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-25 Thread David Rowley
On Tue, 25 Sep 2018 at 18:36, Sam R.  wrote:
> Regarding double buffering: I do not know how much double buffering would 
> slow down operations.
> It could also be possible to turn off the kernel page cache on our DB server 
> to avoid double buffering, although we may still keep it in use.

I think you've misunderstood double buffering.  The double buffering
itself does not slow anything down: if the buffer is already in shared
buffers, then Postgres does not need to look any further for it. Double
buffering only becomes an issue when buffers existing twice in memory
cause other useful buffers to appear zero times.
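If you want to measure rather than guess, the pg_buffercache contrib extension shows what is actually sitting in shared_buffers. A sketch (the relation name is just an example):

```sql
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- How many 8 kB buffers of a given relation are cached right now.
SELECT count(*) AS buffers,
       pg_size_pretty(count(*) * 8192) AS cached_size
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE c.relname = 'my_index';
```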

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services