Re: Memory and hardware calculation:

2019-01-22 Thread Fabio Pardi
Hi,

As per the '1100 concurrent connections', I think you want to take a close
look at your CPU.

That looks like a lot of connections to me, far more than 8 cores can
handle; you will probably end up in a situation where your CPU spends
a lot of time handling the connections rather than serving them.
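
A quick sanity check (a sketch; run it while the system is under load) is
to see how many of those connections are actually doing work at any instant:

 SELECT state, count(*)
 FROM pg_stat_activity
 GROUP BY state
 ORDER BY count(*) DESC;
 -- many 'idle' rows mean connections are held open without doing useful work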


regards,

fabio pardi


On 1/21/19 12:24 PM, Rangaraj G wrote:
> Hi,
>
> Memory and hardware calculation:
>
> How much memory is required to achieve good performance with 6 GB RAM, 8
> cores, a maximum of 1100 concurrent connections, and 1 GB of output memory
> from the procedure?
>
> Are any changes required in hardware or work memory expansion?
>
> Regards
>
> RANGARAJ G



Very long query planning times for database with lots of partitions

2019-01-22 Thread Mickael van der Beek
Hey everyone,

I have a PostgreSQL 10 database that contains two tables which both have
two levels of partitioning (by list, on a single value each), meaning that
a partitioned table is itself partitioned again.

The data in my use case is stored across 5K to 10K partitions (children
and grand-children of the two tables mentioned above), depending on usage
levels.

Three indexes are set on each grand-child partition. The partitioning
columns are not covered by them.
(I don't believe the partition columns need to be indexed, do they?)
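
For illustration, a minimal sketch of this layout (the names follow the
example query further down; the real schema differs):

 CREATE TABLE table_a (
     a                        bigint,
     b                        bigint,
     partition_level_1_column text,
     partition_level_2_column text
 ) PARTITION BY LIST (partition_level_1_column);

 -- level 1: one partition per value, itself partitioned again
 CREATE TABLE table_a_foo PARTITION OF table_a
     FOR VALUES IN ('foo')
     PARTITION BY LIST (partition_level_2_column);

 -- level 2: the grand-child that actually holds the rows
 CREATE TABLE table_a_foo_bar PARTITION OF table_a_foo
     FOR VALUES IN ('bar');

 -- in PG 10, indexes are created per grand-child
 CREATE INDEX ON table_a_foo_bar (a);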

With this setup, I experience queries that have very slow planning times
but fast execution times, even for simple queries where only a couple of
partitions are searched and the partition values are hard-coded.

Researching the issue, I thought that the linear search used by
PostgreSQL 10 to find the partition table metadata was the cause.

cf: https://blog.2ndquadrant.com/partition-elimination-postgresql-11/

So I decided to try out PostgreSQL 11, which included the two aforementioned
fixes:

- https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=499be013de65242235ebdde06adb08db887f0ea5
- https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9fdb675fc5d2de825414e05939727de8b120ae81

Alas, it seems that the version upgrade didn't change anything.
I ran an `ANALYZE` before doing my tests, so I believe that the statistics
are calculated and fresh.

Now, I know that PostgreSQL doesn't like having lots of partitions, but I
would still like to understand why the query planner is so slow in
PostgreSQL 10 and PostgreSQL 11.
(I also wonder what counts as "a lot" of partitions in PostgreSQL. Looking
at use cases of extensions like TimescaleDB, I wouldn't expect 5K to 10K
partitions to be a whole lot.)

An example of a simple query that I run on both PostgreSQL versions:

EXPLAIN ANALYZE
SELECT
    table_a.a,
    table_b.a
FROM
    (
        SELECT a, b
        FROM table_a
        WHERE partition_level_1_column = 'foo'
          AND partition_level_2_column = 'bar'
    ) AS table_a
INNER JOIN
    (
        SELECT a, b
        FROM table_b
        WHERE partition_level_1_column = 'baz'
          AND partition_level_2_column = 'bat'
    ) AS table_b
    ON table_b.b = table_a.b
LIMIT 10;


Running this query on my database with 5K partitions (split roughly two
thirds of the partitions for table_b and one third for table_a) returns:

- Planning Time: 7155.647 ms
- Execution Time: 2.827 ms

Thank you in advance for your help!

Mickael


Re: Very long query planning times for database with lots of partitions

2019-01-22 Thread Justin Pryzby
On Tue, Jan 22, 2019 at 02:44:29PM +0100, Mickael van der Beek wrote:
> [...]
> So I decided to try out PostgreSQL 11 which included the two aforementioned
> fixes:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=499be013de65242235ebdde06adb08db887f0ea5
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9fdb675fc5d2de825414e05939727de8b120ae81

Those reduce the CPU time needed, but that's not the most significant issue.

For Postgres up through 11 (including for relkind=p), planning requires
1) stat()ing every 1GB segment file of every partition, even partitions that
are eventually excluded by constraints or partition bounds; AND 2) open()ing
every index on every partition, even if it's excluded later.

Postgres 12 is expected to resolve this and allow "many" (perhaps 10k)
partitions: https://commitfest.postgresql.org/20/1778/

I think Postgres through 11 would consider 1000 partitions to be "too many".

You *might* be able to mitigate the high cost of stat()ing tables by ensuring
that the table metadata stays in the OS cache, by running something like:
 find /var/lib/pgsql /tablespace -ls

You *might* be able to mitigate the high cost of open()ing the indices by
keeping their first page in cache (preferably in the Postgres buffer cache),
either by running a cron job that runs EXPLAIN, or perhaps something like
pg_prewarm on the indices. (I did something like this for our largest
customers to improve performance as a stopgap.)
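
A sketch of the pg_prewarm idea (assumes the pg_prewarm extension is
available and a partition naming pattern like 'table_a_%'; adjust to your
schema):

 CREATE EXTENSION IF NOT EXISTS pg_prewarm;

 -- load just block 0 (the first page) of every index on matching partitions
 SELECT i.indexrelid::regclass AS index_name,
        pg_prewarm(i.indexrelid::regclass, 'buffer', 'main', 0, 0)
 FROM pg_index i
 JOIN pg_class c ON c.oid = i.indrelid
 WHERE c.relname LIKE 'table_a_%';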

Justin



RE: Very long query planning times for database with lots of partitions

2019-01-22 Thread Steven Winfield
Do you have constraint_exclusion set correctly (i.e. ‘on’ or ‘partition’)?
If so, does the EXPLAIN output mention all of your parent partitions, or are
some being successfully pruned?
Planning times can be sped up significantly if the planner can exclude parent
partitions without ever having to examine the constraints of their child (and
grandchild) partitions. If parents are not being pruned, take another look at
your query and try to figure out why the planner might believe a parent
partition cannot be outright disregarded from the query – does the query
contain a filter on the parent partitions’ partition key, for example?
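
A quick way to check both points (a sketch; the table and column names follow
the earlier example query and are assumptions):

 SHOW constraint_exclusion;   -- expect 'partition' (the default) or 'on'

 EXPLAIN
 SELECT a, b
 FROM table_a
 WHERE partition_level_1_column = 'foo'
   AND partition_level_2_column = 'bar';
 -- if pruning works, only the matching child partitions appear in the plan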

I believe TimescaleDB has its own query planner optimisations for discarding
partitions early at planning time.

Good luck,
Steve.





Re: Very long query planning times for database with lots of partitions

2019-01-22 Thread Mickael van der Beek
Thank you both for your quick answers,

@Justin Based on your answer, it would seem that partitioning, or at least
partitioning this much, is not the right direction to take.

The reason I originally wanted to use partitioning is that I'm storing a
multi-tenant graph; as the data grew, so did the indexes, and once they were
larger than the available RAM, query performance went down the drain.

The two levels of partitioning let me create one level for the tenant-level
partitioning and one level for the business logic, where I could further
partition the tables into the different types of nodes and edges I was
storing.
(These are the table_a and table_b in my example query. There is also a
table_c which connects table_a and table_b, but I wanted to keep it simple.)
Another reason was that we do regular, automated cleanups of the data, and
dropping all of a tenant's data (hundreds of thousands of rows) is very fast
with a DROP TABLE of its partition and rather slow with a regular DELETE
query (even an indexed one).
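
For example (a sketch; the partition and column names are invented):

 -- near-instant metadata operation:
 DROP TABLE table_a_tenant_42;

 -- row-by-row, touching every index:
 DELETE FROM table_a WHERE tenant_column = 'tenant_42';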
With the redesign of the database schema (which included the partitioning
changes), I also dramatically reduced the amount and size of data per row on
the nodes and edges by storing the large and numerous metadata fields in
separate tables that are not part of the graph traversal process.

Based on the usage numbers I see, I expect around 12K tenants in the medium
term, which means that even partitioning per tenant on those two tables alone
would lead to 24K partitions, way above your approximate limit of 1K
partitions.

Queries are always limited to one tenant's data, which was one of the
motivations behind partitioning in the first place.

Not sure what you would advise in this case for a multi-tenant graph?

@Steven, yes, constraint_exclusion is set to the default value of
'partition'.
The EXPLAIN ANALYZE output also shows the partitions being pruned correctly.
So the query plan looks sound, and the query execution confirms this.
But reaching that point is really where the issue lies for me.



On Tue, Jan 22, 2019 at 3:07 PM Steven Winfield <
[email protected]> wrote:

> Do you have constraint_exclusion set correctly (i.e. ‘on’ or ‘partition’)?
> [...]
>
> Good luck,
>
> Steve.
>


RE: Memory and hardware calculation:

2019-01-22 Thread Rangaraj G
Hi,

My question: we have 1100 parallel connections, each with 1 GB of input data
and 1 GB of output data, and we are currently using 64 GB RAM and 8 cores.

But we need all the reports to complete in under 3 seconds.

So kindly suggest how to expand our hardware and work memory.
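
(As a back-of-envelope sketch of why these numbers matter: each connection
can consume up to work_mem per sort or hash node, so 1100 connections
multiply whatever work_mem is set to. An inventory of the current settings
is an assumed first step for any sizing discussion:)

 SHOW max_connections;  -- the 1100 concurrent sessions above
 SHOW work_mem;         -- consumed per connection, per sort/hash node
 SHOW shared_buffers;   -- often sized around 25% of RAM as a rule of thumb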

Is there any possibility of getting your mobile number?

Regards,
RANGARAJ G


From: Cleiton Luiz Domazak 
Sent: Monday, January 21, 2019 11:13 PM
To: Rangaraj G 
Cc: [email protected]; [email protected]; 
[email protected]
Subject: Re: Memory and hardware calculation:



On Mon, Jan 21, 2019 at 5:35 PM Rangaraj G wrote:
Hi,

Memory and hardware calculation:

How much memory is required to achieve good performance with 6 GB RAM, 8
cores, a maximum of 1100 concurrent connections, and 1 GB of output memory
from the procedure?

https://pgtune.leopard.in.ua/#/

Are any changes required in hardware or work memory expansion?


Regards
RANGARAJ G



SELECT performance drop

2019-01-22 Thread Jan Nielsen
I just noticed that one of my Hibernate JPA SELECTs against my Heroku PG
10.4 instance is taking a  l o o o n g  time to complete,
as this EXPLAIN (ANALYZE, BUFFERS)
shows. The database is 591MB, running on PG 10.4 on Heroku, with the
following row counts and index use:


    relname     | percent_of_times_index_used | rows_in_table
----------------+-----------------------------+---------------
 fm_order       | 99                          |       2233237
 fm_grant       | Insufficient data           |        204282
 fm_trader      | 5                           |         89037
 fm_capital     | 99                          |         84267
 fm_session     | 99                          |          7182
 fm_person      | 99                          |          4365
 fm_allocation  | 96                          |          4286
 fm_approval    | Insufficient data           |           920
 fm_market      | 97                          |           583
 fm_account     | 93                          |           451
 fm_marketplace | 22                          |           275
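
(For reference, a sketch of the kind of query that produces a table like
this from the standard statistics views; the exact snippet used may differ:)

 SELECT relname,
        CASE WHEN idx_scan + seq_scan = 0 THEN 'Insufficient data'
             ELSE (100 * idx_scan / (idx_scan + seq_scan))::text
        END AS percent_of_times_index_used,
        n_live_tup AS rows_in_table
 FROM pg_stat_user_tables
 ORDER BY n_live_tup DESC;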


and the offending JPA JPQL is:

@Query("SELECT o FROM Order o WHERE"
 + "   o.type  = 'LIMIT'   "
 + "   AND o.session.original  = :originalSessionId"
 + "   AND ( ( "
 + " o.consumer IS NULL"
 + ") OR ( "
 + " o.consumer IS NOT NULL"
 + " AND o.consumer > 0"
 + " AND EXISTS (  "
 + "   SELECT 1 FROM Order oo WHERE"
 + " oo.id= o.consumer "
 + " AND oo.session.original  = :originalSessionId "
 + " AND oo.type  = 'LIMIT'"
 + " AND oo.owner!= o.owner"
 + " ) "
 + "   )   "
 + " ) "
 + " ORDER BY o.lastModifiedDate DESC  ")

I'd like to get this SELECT to complete in a few milliseconds again, instead
of the several minutes (!) it is now taking. Any ideas what I might try?

Thanks for your time,

Jan


Re: SELECT performance drop

2019-01-22 Thread Jan Nielsen
On Tue, Jan 22, 2019 at 1:04 PM Jan Nielsen wrote:

> I just noticed that one of my Hibernate JPA SELECTs against my Heroku PG
> 10.4 instance is taking a  l o o o n g  time to complete, as this EXPLAIN
> (ANALYZE, BUFFERS) shows. [...]
>
>
...which Hibernate converts to:

SELECT order0_.id AS id1_7_,
   order0_.created_by AS created_2_7_,
   order0_.created_date   AS created_3_7_,
   order0_.last_modified_by   AS last_mod4_7_,
   order0_.last_modified_date AS last_mod5_7_,
   order0_.consumer   AS consumer6_7_,
   order0_.market_id  AS market_14_7_,
   order0_.original   AS original7_7_,
   order0_.owner_id   AS owner_i15_7_,
   order0_.owner_target   AS owner_ta8_7_,
   order0_.price  AS price9_7_,
   order0_.session_id AS session16_7_,
   order0_.side   AS side10_7_,
   order0_.supplier   AS supplie11_7_,
   order0_.type   AS type12_7_,
   order0_.units  AS units13_7_
FROM   fm_order order0_
   CROSS JOIN fm_session session1_
WHERE  order0_.session_id = session1_.id
   AND order0_.type = 'LIMIT'
   AND session1_.original = 7569
   AND ( order0_.consumer IS NULL
  OR ( order0_.consumer IS NOT NULL )
 AND order0_.consumer > 0
 AND ( EXISTS (SELECT 1
   FROM   fm_order order2_
  CROSS JOIN fm_session session3_
   WHERE  order2_.session_id = session3_.id
  AND order2_.id = order0_.consumer
  AND session3_.original = 7569
  AND order2_.type = 'LIMIT'
  AND
 order2_.owner_id <> order0_.owner_id) ) )
ORDER  BY order0_.last_modified_date DESC;






Re: SELECT performance drop

2019-01-22 Thread legrand legrand
Hello,

Could you check that the statistics for fm_session are accurate?

Regards
PAscal







Re: SELECT performance drop

2019-01-22 Thread Jan Nielsen
On Tue, Jan 22, 2019 at 2:55 PM legrand legrand wrote:

> Hello,
> could you check that statistics for fm_session are accurate ?
>
> Regards
> PAscal
>

heroku pg:psql -c "SELECT schemaname, relname, last_analyze FROM
pg_stat_all_tables WHERE relname LIKE 'fm_%'"

 schemaname |    relname     | last_analyze
------------+----------------+--------------
 public     | fm_account     |
 public     | fm_allocation  |
 public     | fm_approval    |
 public     | fm_capital     |
 public     | fm_grant       |
 public     | fm_market      |
 public     | fm_marketplace |
 public     | fm_order       |
 public     | fm_person      |
 public     | fm_session     |
 public     | fm_trader      |

I suspect you'd say "not accurate"? :-o After running ANALYZE, the
performance is much better. Thank you so much!
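
(For completeness, the fix was simply refreshing the planner statistics,
e.g.:

 ANALYZE;              -- whole database
 ANALYZE fm_session;   -- or just the table in question
)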