Strange query behaviour between 9.4 and 12beta1

2019-06-04 Thread Vasilis Ventirozos
Hello everyone, 
I started comparing performance between Postgres 9.4 and 12beta1, more
specifically comparing the new (materialized) CTEs.
The statements I use are application statements that I have little control over.
Hardware is identical, as both clusters are running on the same server, on the
same disks, with the same data.
Also, cluster settings are almost identical and both clusters have been
analyzed.
In all my tests 12 is faster, sometimes much faster, apart from one single
query that takes ~12 seconds on 9.4 and nearly 300 seconds on 12.

Plans for both:
Plan for 12
Plan for 9.4

The plans are obfuscated, apologies for that, but what stands out is the
following:


Hash Left Join 

 (cost=200,673.150..203,301.940 rows=153,121 width=64) (actual 
time=1,485.883..284,536.440 rows=467,123 loops=1)
Hash Cond: (lima_sierra(six_lima_november2.victor_romeo, 1) = 
foxtrot_hotel.victor_romeo)
Join Filter: (whiskey_uniform1.six_sierra = foxtrot_hotel.uniform_juliet)
Rows Removed by Join Filter: 4549925366


I really can't understand where these 4.5 billion rows have been removed from;
there is nothing suggesting that this dataset was ever created (e.g. in temp
files), and these numbers definitely don't match what I was expecting, which is
more or less what I'm seeing in the 9.4 plan.

Obviously I've tested this more than once and the behaviour persists.


Best Regards,
Vasilis Ventirozos



Re: Shortest offline window on database migration

2019-06-04 Thread Stephen Frost
Greetings,

* Haroldo Kerry ([email protected]) wrote:
> The bottleneck at dump is CPU (a single one, on a 44 thread server), as we
> are using the -Fc option, that does not allow multiple jobs.
> We tried some time ago to use the --jobs option of pg_dump but it was
> slower, even with more threads. Our guess is the sheer volume of files
> outweighs the processing gains of using a compressed file output. Also
> pg_dump even with multiple jobs spends a lot of time (1h+) on the "reading
> dependency data" section that seems to be single threaded (our application
> is multi-tenant and we use schemas to separate tenant data, hence we have a
> lot of tables).

You might want to reconsider using the separate-schemas-for-tenants
approach.  This isn't the only annoyance you can run into with lots and
lots of tables.  That said, are you using the newer version of pg_dump
(which is what you should be doing when migrating to a newer version of
PG, always)?  We've improved it over time, though I can't recall off-hand
if this particular issue was improved in between the releases being
discussed here.  Of course, lots of little files and dealing with them
could drag down performance when working in parallel.  Still, I'm a bit
surprised that it's ending up slower than -Fc.
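
For reference, pg_dump's --jobs option requires the directory output
format rather than -Fc; a sketch of that invocation (database names,
path, and job count are placeholders):

```
# -Fd (directory format) is what permits parallel dump workers (-j);
# the single-file -Fc format is inherently serial.
pg_dump -Fd -j 8 -f /backups/mydb.dir mydb

# pg_restore can parallelize from the same directory format:
pg_restore -j 8 -d mydb_new /backups/mydb.dir
```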

> We are creating the replica using :
> docker exec pg-2 pg_basebackup -h 192.168.0.107 -U replication -P --xlog -D
> /var/lib/postgresql/data_9.6
> and it is taking 1h10m , instead of the 2h I reported initially, because we
> were using rsync with checksums to do it, after experimenting with
> pg_basebackup we found out it is faster, rsync was taking 1h just to
> calculate all checksums. Thanks for your insight on this taking too long.

So, it's a bit awkward still, unfortunately, but you can use pgbackrest
to effectively give you a parallel-replica-build.  The steps are
something like:

Get pgbackrest WAL archiving up and going, with the repo on the
destination server/filesystem, but have 'compress=n' in the
pgbackrest.conf for the repo.

Run: pgbackrest --stanza=mydb --type=full --process-max=8 backup

Once that's done, just do:

mv /path/to/repo/backup/mydb/20190605-12F/pg_data /new/pgdata
chmod -R g-rwx /new/pgdata

Then in /new/pgdata, create a recovery.conf file like:

restore_command = 'pgbackrest --stanza=mydb archive-get %f "%p"'

And start up the DB server.
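
The repo configuration referred to above might look roughly like the
following; the stanza name and paths are placeholders, and the exact
host/path options depend on where the repo lives relative to the cluster:

```
# pgbackrest.conf (illustrative sketch, pgbackrest 2.x option names)
[global]
repo1-path=/path/to/repo
compress=n

[mydb]
pg1-path=/var/lib/postgresql/data_9.6
```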

We have some ideas about how to make that whole thing cleaner, but the
rewrite into C has delayed our efforts; perhaps once that's done (this
fall), we can look at it.

Of course, you won't have an actual backup of the new database server at
that point yet, so you'll want to clean things up and make that happen
ASAP.  Another option, which is what I usually recommend, is just to
take a new backup (properly) and then do a restore from it, but that'll
obviously take longer since there's two copies being done instead of one
(though you can parallelize to your heart's content, so it can still be
quite fast if you have enough CPU and I/O).
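
A pgbackrest restore parallelizes the same way as the backup does; a
sketch, with the stanza name again a placeholder:

```
# --process-max applies to restore as well as backup
pgbackrest --stanza=mydb --process-max=8 restore
```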

Thanks,

Stephen




Re: Strange query behaviour between 9.4 and 12beta1

2019-06-04 Thread Tomas Vondra

On Tue, Jun 04, 2019 at 05:34:07PM +0300, Vasilis Ventirozos wrote:

> Hello everyone,
> I started comparing performance between postgres 9.4 and 12beta1 more
> specifically comparing the new (materialized) CTE.


Are you saying the CTE is specified as MATERIALIZED in the query on 12?
Because I don't see it in the explain plan (it's mentioned in the 9.4
plan, though).


> The statements I use are application statements that I have little control over.
> Hardware is identical, as both clusters are running on the same server, on the
> same disks, with the same data.
> Also, cluster settings are almost identical and both clusters have been
> analyzed.
> In all my tests 12 is faster, sometimes much faster, apart from one single
> query that takes ~12 seconds on 9.4 and nearly 300 seconds on 12.
>
> Plans for both:
> Plan for 12
> Plan for 9.4
>
> The plans are obfuscated, apologies for that, but what stands out is the
> following:



Meh.



> Hash Left Join
>  (cost=200,673.150..203,301.940 rows=153,121 width=64) (actual
> time=1,485.883..284,536.440 rows=467,123 loops=1)
> Hash Cond: (lima_sierra(six_lima_november2.victor_romeo, 1) =
> foxtrot_hotel.victor_romeo)
> Join Filter: (whiskey_uniform1.six_sierra = foxtrot_hotel.uniform_juliet)
> Rows Removed by Join Filter: 4549925366



You have two equality conditions for the join. The first one is used to
match rows by the hash join itself - it's used to compute the hash
values and to do the lookups. But there may be multiple rows that match
on either side, generating additional "virtual" rows. Those are then
removed by the second condition.

Consider for example simple cross-join on this table:

  a  |  b
 ----+----
  1  |  a
  1  |  b
  2  |  a
  2  |  b

and the query is

 SELECT * FROM t t1 JOIN t t2 ON (t1.a = t2.a AND t1.b = t2.b)

Now, in the first phase, the hash join might only do (t1.a = t2.a),
which will generate 8 rows

  a | t1.b | t2.b
 ---+------+------
  1 |  a   |  a
  1 |  a   |  b
  1 |  b   |  a
  1 |  b   |  b
  2 |  a   |  a
  2 |  a   |  b
  2 |  b   |  a
  2 |  b   |  b

And then it will apply the second condition (t1.b = t2.b) as a "filter"
which will remove some of the rows. In your case, the first step
generates the 4.5B rows that the second step then discards.

I'm not sure why we don't use both conditions as a hash condition.
Perhaps it's a data type that does not support hashing, and on 9.4 that
does not matter because we end up using merge join.
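
The two-phase behavior described above can be sketched in a few lines;
this is an illustrative simulation (not PostgreSQL code), using the
four-row example table t(a, b):

```python
from collections import defaultdict

# Hypothetical data mirroring the example table t(a, b).
t = [(1, "a"), (1, "b"), (2, "a"), (2, "b")]

# Build phase: hash the inner side on the hash condition column only (a).
hash_table = defaultdict(list)
for a, b in t:
    hash_table[a].append((a, b))

# Probe phase: each outer row matches every inner row with the same a,
# producing "virtual" rows; the join filter (t1.b = t2.b) then removes some.
emitted, removed_by_filter = [], 0
for a, b in t:
    for a2, b2 in hash_table[a]:
        if b == b2:                 # Join Filter
            emitted.append((a, b, b2))
        else:
            removed_by_filter += 1  # "Rows Removed by Join Filter"

print(len(emitted), removed_by_filter)  # prints: 4 4
```

The hash lookup produces 8 candidate pairs, of which 4 survive the
filter - the same shape as the plan above, just at a much smaller scale.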




> I really can't understand where these 4.5 billion rows have been removed from;
> there is nothing suggesting that this dataset was ever created (e.g. in temp
> files), and these numbers definitely don't match what I was expecting, which is
> more or less what I'm seeing in the 9.4 plan.
>
> Obviously I've tested this more than once and the behaviour persists.
>
> Best Regards,
> Vasilis Ventirozos




regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services