Re: Help me in reducing the CPU cost for the high cost query below, as it is hitting production seriously!!

2018-05-23 Thread pavan95
Hi all/Justin,

As suggested, I created an index on res_users.res_employee_id, and the below
link is the explain plan result.

Link:  https://explain.depesz.com/s/hoct

The cost of the previous query was 92,129, and the cost of the current
modified query, after creating the above index, is 91,462. The good thing is
we can see a small improvement!

Please find the table definitions used in the query (including
tms_workflow_history, which you asked for).

1. tms_timesheet_details:

amp_test=# \d tms_timesheet_details
                          Table "public.tms_timesheet_details"
       Column        |            Type             |                     Modifiers
---------------------+-----------------------------+---------------------------------------------------
 id                  | integer                     | not null default nextval('tms_timesheet_details_id_seq'::regclass)
 status              | character varying           |
 create_uid          | integer                     |
 effort_hours        | double precision            |
 work_order_no       | character varying           |
 res_employee_id     | character varying           |
 wsr_header_id       | integer                     |
 remarks             | character varying           |
 write_date          | timestamp without time zone |
 timesheet_header_id | integer                     |
 date                | date                        |
 create_date         | timestamp without time zone |
 write_uid           | integer                     |
 release_no          | character varying           |
 project_id          | character varying           |
 loc_name            | character varying           |
 user_id             | integer                     |
 ao_emp_id           | character varying           |
Indexes:
    "tms_timesheet_details_pkey" PRIMARY KEY, btree (id)
    "tms_timesheet_details_uniq_res_employee_id_efforts" UNIQUE, btree (res_employee_id, work_order_no, release_no, date, project_id)
    "timesheet_detail_inx" btree (wsr_header_id, timesheet_header_id)
    "tms_timesheet_details_all_idx" btree (wsr_header_id, work_order_no, release_no, date, effort_hours)
    "tms_timesheet_details_id_idx" btree (id) WHERE wsr_header_id IS NOT NULL
    "ts_detail_date_idx" btree (date)
    "ts_detail_hdr_id_idx" btree (timesheet_header_id)
    "ts_detail_release_no_idx" btree (release_no)
    "work_order_no_idx" btree (work_order_no)


2. tms_workflow_history:

amp_test=# \d tms_workflow_history
                         Table "public.tms_workflow_history"
      Column       |            Type             |                 Modifiers
-------------------+-----------------------------+--------------------------------------------------
 id                | integer                     | not null default nextval('tms_workflow_history_id_seq'::regclass)
 create_uid        | integer                     |
 current_activity  | character varying           |
 user_id           | integer                     |
 sequence          | integer                     |
 is_final_approver | boolean                     |
 wsr_id            | integer                     |
 write_uid         | integer                     |
 timesheet_id      | integer                     |
 state             | character varying           |
 write_date        | timestamp without time zone |
 remarks           | character varying           |
 create_date       | timestamp without time zone |
 group_id          | integer                     |
 active            | boolean                     |
Indexes:
    "tms_workflow_history_pkey" PRIMARY KEY, btree (id)
    "curract_state_isfinal_app_idx" btree (current_activity, state, is_final_approver)
    "timesheet_id_group_id_active_idx" btree (timesheet_id, group_id, active)
    "tms_wkf_his_active_is_final_approveridx" btree (active, is_final_approver)
    "tms_wkf_his_group_id_idx" btree (group_id)
    "tms_wkf_his_timesheet_id_idx" btree (timesheet_id)
    "tms_wkf_hist_current_activity_idx" btree (current_activity)
    "tms_wkf_hist_state_idx" btree (state)
    "wsr_id_idx" btree (wsr_id)

3. res_users:

 Table "public.res_users"
  Column   |Type |  
Modifiers
---+-+
 id| integer | not null default
nextval('res_users_id_seq'::regclass)
 active| boolean | default true
 login | character varying   | not null
 password  | character varying   |
 company_id| integer | not null
 partner_id| integer | not null
 create_date   | timestamp without time zone |
 share | boolean |
 write_uid   

Re: Help me in reducing the CPU cost for the high cost query below, as it is hitting production seriously!!

2018-05-23 Thread mlunnon
Hi Pavan,
That's quite a big query. I can see that the generate_series function is
getting called repeatedly, and the planner estimates for this subquery are
off by a factor of 66. You might try rewriting it using a WITH query. I am
assuming that you have already analyzed all the tables and also added
appropriate indexes on join/query columns.
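
A minimal sketch of that rewrite, assuming the period lookup
(tms_timesheet_period, id = 127) used in the query posted later in this
thread; the date series is computed once in the CTE, and each subquery then
references it instead of calling generate_series itself:

    with period_days as (
        select d::date as day
        from generate_series((select start_date from tms_timesheet_period where id = 127),
                             (select end_date from tms_timesheet_period where id = 127),
                             '1 day'::interval) d
    )
    select sum(floor(effort_hours) + (effort_hours - floor(effort_hours)) / 0.6)
    from tms_timesheet_details
    where work_order_no != 'CORPORATE'
      and date in (select day from period_days);

A single WITH clause attached to the full statement can be shared by all of
the subqueries, so the series is generated once per query rather than once
per subquery evaluation.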
Regards
Matthew






Re: Help me in reducing the CPU cost for the high cost query below, as it is hitting production seriously!!

2018-05-23 Thread Justin Pryzby
On Wed, May 23, 2018 at 12:01:06AM -0700, pavan95 wrote:
> As suggested, I created an index on res_users.res_employee_id, and the below
> link is the explain plan result.
> 
> Link:  https://explain.depesz.com/s/hoct
> 
> The cost of the previous query was 92,129, and the cost of the current
> modified query, after creating the above index, is 91,462. The good thing is

Forget the cost - that's postgres' *model* of the combined IO+CPU.
If the model is off, that may cause bad plans and could be looked into
further.

In any case, that index cut your runtime from 75sec to 60sec (in spite of the
modelled cost).

It looks like you resolved the bad estimate on the users table?

> 2. tms_workflow_history:
> Indexes:
> "tms_workflow_history_pkey" PRIMARY KEY, btree (id)
> "curract_state_isfinal_app_idx" btree (current_activity, state, 
> is_final_approver)
> "timesheet_id_group_id_active_idx" btree (timesheet_id, group_id, active)
> "tms_wkf_his_active_is_final_approveridx" btree (active, 
> is_final_approver)
> "tms_wkf_his_group_id_idx" btree (group_id)
> "tms_wkf_his_timesheet_id_idx" btree (timesheet_id)
> "tms_wkf_hist_current_activity_idx" btree (current_activity)
> "tms_wkf_hist_state_idx" btree (state)
> "wsr_id_idx" btree (wsr_id)

How big is the table ?  And curract_state_isfinal_app_idx ?
Have these been reindexed (or pg_repacked) recently?
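
One way to check both sizes, using PostgreSQL's built-in size functions:

    select pg_size_pretty(pg_table_size('tms_workflow_history'))             as table_size,
           pg_size_pretty(pg_relation_size('curract_state_isfinal_app_idx')) as index_size;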

It seems to me that the remaining query optimization is to improve this:
> Bitmap Heap Scan on tms_workflow_history history (cost=193.19..1,090.50 rows=6,041 width=12) (actual time=3.692..15.714 rows=11,351 loops=1)

I think you could consider clustering (or repacking) the table on
curract_state_isfinal_app_idx (but you'll have to judge if that's okay and
won't negatively affect other queries).
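
For reference, a minimal sketch of that suggestion (note that CLUSTER takes
an ACCESS EXCLUSIVE lock and rewrites the table, so it needs a quiet
maintenance window; re-analyze afterwards):

    cluster tms_workflow_history using curract_state_isfinal_app_idx;
    analyze tms_workflow_history;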

But, what's your target runtime ?  Improvements here could cut at most 15sec
off the total 60sec.  If you're hoping to save more than that, you'll need to
(also) look further than the query:

 - postgres parameters: what are shared_buffers, work_mem,
   effective_cache_size? (a query to list non-default settings is sketched
   after this list)
   + https://wiki.postgresql.org/wiki/Server_Configuration
 - are there other DBs/applications running on the server/VM ?
 - kernel tuning (readahead, VM parameters, write cache, scheduler, THP, etc)
 - server hardware (what OS? storage? RAM? filesystem?)
 - how does the storage perform outside of postgres?
   + something like this: /usr/sbin/bonnie++ -f -n0 -x4 -d /var/lib/pgsql
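
One way to answer the parameters question, adapted from the wiki page above
(it lists every setting changed from its default):

    select name, current_setting(name), source
    from pg_settings
    where source not in ('default', 'override');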

Justin



Re: Help me in reducing the CPU cost for the high cost query below, as it is hitting production seriously!!

2018-05-23 Thread pavan95
Hi Matthew,

Yeah, you said it right! I have analyzed the entire database and also
created appropriate indexes for the columns used in WHERE/JOIN clauses.

I will just provide the fourth UNION part of the query, which you can
analyze more easily (it's not that big).

Please find the query part below, and refer to the table definitions in my
previous posts.
Query:

select
    0 as id,
    header.id as header_id,
    '0' as emp_id,
    0 as sno,
    users.alias_id as alias,
    partner.name as name,
    (select sum(floor(effort_hours) + (effort_hours - floor(effort_hours)) / 0.6)
     from tms_timesheet_details
     where work_order_no != 'CORPORATE'
       and res_employee_id = users.res_employee_id
       and date in (select date::date
                    from generate_series((select start_date from tms_timesheet_period where id = 127),
                                         (select end_date from tms_timesheet_period where id = 127),
                                         '1 day'::interval) date)) billed_hrs,
    (select sum(floor(effort_hours) + (effort_hours - floor(effort_hours)) / 0.6)
     from tms_timesheet_details
     where release_no = 'unbillable_time'
       and res_employee_id = users.res_employee_id
       and date in (select date::date
                    from generate_series((select start_date from tms_timesheet_period where id = 127),
                                         (select end_date from tms_timesheet_period where id = 127),
                                         '1 day'::interval) date)) as unbilled_hrs,
    (select sum(floor(effort_hours) + (effort_hours - floor(effort_hours)) / 0.6)
     from tms_timesheet_details
     where release_no = 'paid_time_off'
       and res_employee_id = users.res_employee_id
       and date in (select date::date
                    from generate_series((select start_date from tms_timesheet_period where id = 127),
                                         (select end_date from tms_timesheet_period where id = 127),
                                         '1 day'::interval) date)) as paid_time_off,
    (select sum(floor(effort_hours) + (effort_hours - floor(effort_hours)) / 0.6)
     from tms_timesheet_details
     where release_no = 'unpaid_leave'
       and res_employee_id = users.res_employee_id
       and date in (select date::date
                    from generate_series((select start_date from tms_timesheet_period where id = 127),
                                         (select end_date from tms_timesheet_period where id = 127),
                                         '1 day'::interval) date)) as unpaid_leave,
    (select sum(floor(effort_hours) + (effort_hours - floor(effort_hours)) / 0.6)
     from tms_timesheet_details
     where release_no = 'bereavement_time'
       and res_employee_id = users.res_employee_id
       and date in (select date::date
                    from generate_series((select start_date from tms_timesheet_period where id = 127),
                                         (select end_date from tms_timesheet_period where id = 127),
                                         '1 day'::interval) date)) as breavement_time,
    (select sum(floor(effort_hours) + (effort_hours - floor(effort_hours)) / 0.6)
     from tms_timesheet_details
     where res_employee_id = users.res_employee_id
       and date >= '2018-04-16' and date <= '2018-04-30'
       and release_no in ('sick_leave', 'casual_leave', 'privilege_leave', 'optional_holiday')) as leave,
    'Not Submitted' state,
    header.res_employee_id as header_emp_id,
    'Not Submitted' as header_status
from res_users users,
     res_partner partner,
     tms_timesheet_status status,
     tms_timesheet_header header
where header.res_employee_id = users.res_employee_id
  and status.id = header.status_id
  and users.partner_id = partner.id
  and status.name = 'Draft'
  and header.timesheet_period_id = 127
  and header.res_employee_id in ('14145', '14147',
'ON-14148', '11331', '11332', '11333', 'ON-11334', '65432', '65416',
'54643', '23266', '4681', '56464', '64649', '89564', '98798', '1',
'44466', '87852', '65464', '65464', '44655', '8201', '65465', 'ON-78785',
'13233', 'ON-5544', 'ON-54654', '23131', '98765', '25134', '13218', '84645',
'4687', '6546', '4988', '89796', '79878', '7198', '15726', '2132', '5310',
'13056', '4446', '16825', '16740', '3912', '19601', '13200', '12981',
'ON-3332', '13166', 'ON-3144', 'ON-1251', 'ON-2799', 'ON-2338', '7286',
'ON-2381', 'ON-3102', 'ON-2938', '64782', '5407', '54641', '46379',
'G151151', '5007', '6011', '5050', '20869', '20204', '12410', '10488',
'14582', '13574', '12982', '7884', '7788', '13417', '7922', '16744',
'16746', '16756', '8292', '16745', '19989', '8297', '5020', '14184',
'17161', '20767', '20753', '20289', '19979', '19975', '20272', '4292',
'G9341010', '14791', '5121', 'ON-1767', 'ON-581', 'ON-700', 'ON-437',
'ON-562', 'ON-1726', 'OFF-1060', 'ON-147', 'OFF-612', 'OFF-635', 'OFF-857',
'ON-900280', 'ON-1934', 'ON-1922', 'ON-2258', 'OFF-2537', 'ON-2872',
'ON-
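
As a further option beyond the WITH rewrite suggested above: the six
correlated subqueries each scan tms_timesheet_details separately, once per
output row. On PostgreSQL 9.4+ they could be collapsed into a single pass
per employee with aggregate FILTER clauses; a minimal sketch for two of the
columns, with the date range hardcoded as in the leave subquery:

    select res_employee_id,
           sum(floor(effort_hours) + (effort_hours - floor(effort_hours)) / 0.6)
               filter (where work_order_no != 'CORPORATE')   as billed_hrs,
           sum(floor(effort_hours) + (effort_hours - floor(effort_hours)) / 0.6)
               filter (where release_no = 'unbillable_time') as unbilled_hrs
    from tms_timesheet_details
    where date >= '2018-04-16' and date <= '2018-04-30'
    group by res_employee_id;

The aggregated result could then be joined to res_users on res_employee_id
instead of evaluating one subquery per output column.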

Re: Help me in reducing the CPU cost for the high cost query below, as it is hitting production seriously!!

2018-05-23 Thread pavan95
Hi Justin,

>How big is the table ?  And curract_state_isfinal_app_idx ? 
>Have these been reindexed (or pg_repacked) recently? 

The size of the table 'tms_workflow_history' is 7600 kB (which is pretty
small). Yes, those indexes were dropped and recreated.

>It looks like you resolved the bad estimate on the users table? 
Yeah, I think the same.

Please find the explain plan, whose cost has increased vastly again. Is this
because of the increase in rows?

Link:  https://explain.depesz.com/s/Ifr

The above explain plan was taken from the production server, and this is the
main plan to tune.

Please let me know where I'm going wrong. Thank you in advance!

Regards,
Pavan






Re: dsa_allocate() faliure

2018-05-23 Thread Sand Stone
>> At which commit ID?
83fcc615020647268bb129cbf86f7661feee6412 (5/6)

>> do you mean that these were separate PostgreSQL clusters, and they were all
>> running the same query and they all crashed like this?
A few worker nodes: a table is hash-partitioned by "aTable.did" by Citus,
and further partitioned by PG10 by time range on the field "ts". As far as I
could tell, Citus just does a query rewrite and executes the same type of
queries on all nodes.

>> so this happened at the same time or at different times?
At the same time. The queries are simple count and sum queries; here is
the relevant part from one of the worker nodes:
2018-05-23 01:24:01.492 UTC [130536] ERROR:  dsa_allocate could not
find 7 free pages
2018-05-23 01:24:01.492 UTC [130536] CONTEXT:  parallel worker
STATEMENT:  COPY (SELECT count(1) AS count, sum(worker_column_1) AS
sum FROM (SELECT subquery.avg AS worker_column_1 FROM (SELECT
aTable.did, avg((aTable.sum OPERATOR(pg_catalog./)
(aTable.count)::double precision)) AS avg FROM public.aTable_102117
aTable WHERE ((aTable.ts OPERATOR(pg_catalog.>=) '2018-04-25
00:00:00+00'::timestamp with time zone) AND (aTable.ts
OPERATOR(pg_catalog.<=) '2018-04-30 00:00:00+00'::timestamp with time
zone) AND (aTable.v OPERATOR(pg_catalog.=) 12345)) GROUP BY
aTable.did) subquery) worker_subquery) TO STDOUT WITH (FORMAT binary)


>> a parallel worker process
I think this is more of a PG10 parallel bg worker issue; as far as I can
tell, Citus just lets each worker PG server do its own planning.

I will try to run more experiments on this and see if there is any specific
query that causes the parallel query execution to fail. As far as I can
tell, the level of concurrency triggered this issue: when executing tens of
queries like the ones shown on the worker nodes, the PG10 core may or may
not spawn more bg workers, depending on the stats.

Thanks for your time!





On Tue, May 22, 2018 at 9:44 PM, Thomas Munro
 wrote:
> On Wed, May 23, 2018 at 4:10 PM, Sand Stone  wrote:
>> dsa_allocate could not find 7 free pages
>> I just saw this error message again on all of my worker nodes (I am using
>> Citus 7.4 rel). The PG core is my own build of release_10_stable
>> (10.4) out of GitHub on Ubuntu.
>
> At which commit ID?
>
> All of your worker nodes... so this happened at the same time or at
> different times?  I don't know much about Citus -- do you mean that
> these were separate PostgreSQL clusters, and they were all running the
> same query and they all crashed like this?
>
>> What's the best way to debug this? I am running pre-production tests
>> for the next few days, so I could gather info. if necessary (I cannot
>> pinpoint a query to repro this yet, as we have 10K queries running
>> concurrently).
>
> Any chance of an EXPLAIN plan for the query that crashed like this?
> Do you know if it's using multiple Gather[Merge] nodes and parallel
> bitmap heap scans?  Was it a regular backend process or a parallel
> worker process (or a Citus worker process, if that is a thing?) that
> raised the error?
>
> --
> Thomas Munro
> http://www.enterprisedb.com



Re: Help me in reducing the CPU cost for the high cost query below, as it is hitting production seriously!!

2018-05-23 Thread Justin Pryzby
On Wed, May 23, 2018 at 07:03:18AM -0700, pavan95 wrote:
> Please find the explain plan, whose cost has increased vastly again. Is this
> because of the increase in rows?
> 
> Link:  https://explain.depesz.com/s/Ifr

That's explain without "analyze", so not very useful.

There's a handful of questions:

On Wed, May 23, 2018 at 08:43:22AM -0500, Justin Pryzby wrote:
>  - postgres parameters: what are shared_buffers, work_mem, effective_cache_size?
>    + https://wiki.postgresql.org/wiki/Server_Configuration
>  - are there other DBs/applications running on the server/VM ?
>  - kernel tuning (readahead, VM parameters, write cache, scheduler, THP, etc)
>  - server hardware (what OS? storage? RAM? filesystem?)
>  - how does the storage perform outside of postgres?
>+ something like this: /usr/sbin/bonnie++ -f -n0 -x4 -d /var/lib/pgsql

Justin



Re: Help me in reducing the CPU cost for the high cost query below, as it is hitting production seriously!!

2018-05-23 Thread pavan95
Hi Justin,

Please find the below explain plan link.

Link:  https://explain.depesz.com/s/owE

Any help is appreciated. Thanks in advance.


Regards,
Pavan






Re: Help me in reducing the CPU cost for the high cost query below, as it is hitting production seriously!!

2018-05-23 Thread Justin Pryzby
On Wed, May 23, 2018 at 10:20:42PM -0700, pavan95 wrote:
> Hi Justin,
> 
> Please find the below explain plan link.
> 
> Link:  https://explain.depesz.com/s/owE   

That's EXPLAIN ANALYZE, but EXPLAIN (ANALYZE, BUFFERS) is better.
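
For example (BUFFERS adds shared-buffer hit and read counts to every node in
the plan):

    explain (analyze, buffers)
    select count(*) from tms_timesheet_details;  -- substitute the real query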

Is this on a completely different server than the previous plans ?

This rowcount misestimate appears to be a significant part of the problem:
 Merge Join (cost=228.77..992.11 ROWS=20 width=22) (actual time=4.353..12.439 ROWS=343 loops=1)
   Merge Cond: (history_2.timesheet_id = header_2.id)

You could look at the available stats for that table's column in pg_stats.
Is there a "most common values" list?
Maybe you need to ALTER TABLE .. SET STATISTICS 999 (or some increased value)
and re-analyze?
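
A minimal sketch of both steps, assuming the misestimated column is the join
column timesheet_id from the merge condition above:

    -- inspect the existing stats for the column
    select n_distinct, most_common_vals, most_common_freqs
    from pg_stats
    where tablename = 'tms_workflow_history' and attname = 'timesheet_id';

    -- raise the per-column statistics target, then refresh
    alter table tms_workflow_history alter column timesheet_id set statistics 999;
    analyze tms_workflow_history;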

You can see these are also taking a large component of the query time:

 Bitmap Index Scan on ts_detail_release_no_idx (cost=0.00..33.86 rows=1,259 width=0) (actual time=0.304..0.304 rows=1,331 LOOPS=327)
   Index Cond: ((release_no)::text = 'paid_time_off'::text)
 ...
 Bitmap Index Scan on ts_detail_release_no_idx (cost=0.00..33.86 rows=1,259 width=0) (actual time=0.304..0.304 rows=1,331 LOOPS=343)
   Index Cond: ((release_no)::text = 'paid_time_off'::text)

I wonder whether it would help to
CREATE INDEX ON tms_timesheet_details(timesheet_header_id) WHERE
((release_no)::text = 'paid_time_off'::text);

In addition to the other settings I asked about, it might be interesting to
SHOW effective_io_concurrency;

You're at the point where I can't reasonably contribute much more.

Justin