Re: select distinct runs slow on pg 10.6

2019-09-09 Thread yash mehta
In addition to below mail, we have used btree indexes for primary key
columns. Below is the query:

 select distinct shipmentre0_.FIN_IDas FIN1_53_0_,
workflowst10_.FIN_ID   as FIN1_57_1_,
carriers3_.FIN_ID  as FIN1_40_2_,
shipmentro1_.FIN_IDas FIN1_33_3_,
shipmentme11_.FIN_ID   as FIN1_5_4_,
workflowst9_.FIN_IDas FIN1_57_5_,
workflowst8_.FIN_IDas FIN1_57_6_,
workflowst7_.FIN_IDas FIN1_57_7_,
consignees5_.FIN_IDas FIN1_81_8_,
consignees6_.FIN_IDas FIN1_81_9_,
shipmentty4_.FIN_IDas FIN1_8_10_,
shipmentsc2_.FIN_IDas FIN1_78_11_,
shipmentre0_.MOD_IDas MOD2_53_0_,
shipmentre0_.SHIPMENT_METHOD_IDas SHIPMENT3_53_0_,
shipmentre0_.SHIPPER_IDas SHIPPER4_53_0_,
shipmentre0_.CONSIGNEES_ID as CONSIGNEES5_53_0_,
shipmentre0_.SHIPMENT_BASIS_ID as SHIPMENT6_53_0_,
shipmentre0_.SHIPMENT_TYPE_ID  as SHIPMENT7_53_0_,
shipmentre0_.SHIPMENT_ARRANGEMENT_ID   as SHIPMENT8_53_0_,
shipmentre0_.SHIPMENT_DATE as SHIPMENT9_53_0_,
shipmentre0_.SHIPMENT_CURRENCY_ID  as SHIPMENT10_53_0_,
shipmentre0_.CARRIER_CREW_EXTN_ID  as CARRIER11_53_0_,
shipmentre0_.END_TIME  as END12_53_0_,
shipmentre0_.SHIPMENT_VALUE_USDas SHIPMENT13_53_0_,
shipmentre0_.SHIPMENT_VALUE_BASE   as SHIPMENT14_53_0_,
shipmentre0_.INSURANCE_VALUE_USD   as INSURANCE15_53_0_,
shipmentre0_.INSURANCE_VALUE_BASE  as INSURANCE16_53_0_,
shipmentre0_.REMARKS   as REMARKS53_0_,
shipmentre0_.DELETION_REMARKS  as DELETION18_53_0_,
shipmentre0_.SHIPMENT_STATUS_IDas SHIPMENT19_53_0_,
shipmentre0_.VAULT_STATUS_ID   as VAULT20_53_0_,
shipmentre0_.SHIPMENT_CHARGE_STATUSas SHIPMENT21_53_0_,
shipmentre0_.SHIPMENT_DOCUMENT_STATUS  as SHIPMENT22_53_0_,
shipmentre0_.INSURANCE_PROVIDERas INSURANCE23_53_0_,
shipmentre0_.SHIPMENT_PROVIDER as SHIPMENT24_53_0_,
shipmentre0_.SECURITY_PROVIDER_ID  as SECURITY25_53_0_,
shipmentre0_.CONSIGNEE_CONTACT_NAMEas CONSIGNEE26_53_0_,
shipmentre0_.SIGNALas SIGNAL53_0_,
shipmentre0_.CHARGEABLE_WT as CHARGEABLE28_53_0_,
shipmentre0_.NO_OF_PIECES  as NO29_53_0_,
shipmentre0_.REGIONS_IDas REGIONS30_53_0_,
shipmentre0_.IS_DELETEDas IS31_53_0_,
shipmentre0_.CREATED   as CREATED53_0_,
shipmentre0_.CREATED_BYas CREATED33_53_0_,
shipmentre0_.LAST_UPDATED  as LAST34_53_0_,
shipmentre0_.LAST_UPDATED_BY   as LAST35_53_0_,
shipmentre0_.LAST_CHECKED_BY   as LAST36_53_0_,
shipmentre0_.LAST_MAKEDas LAST37_53_0_,
shipmentre0_.MAKER_CHECKER_STATUS  as MAKER38_53_0_,
shipmentre0_.SHADOW_ID as SHADOW39_53_0_,
--(select now()) as formula48_0_,
workflowst10_.WORKFLOW_MODULE  as WORKFLOW2_57_1_,
workflowst10_.NAME as NAME57_1_,
workflowst10_.DEAL_DISPLAY_MODULE  as DEAL4_57_1_,
workflowst10_.WORKFLOW_LEVEL   as WORKFLOW5_57_1_,
workflowst10_.IS_DEAL_EDITABLE as IS6_57_1_,
workflowst10_.GEN_CONFOas GEN7_57_1_,
workflowst10_.GEN_DEAL_TICKET  as GEN8_57_1_,
workflowst10_.GEN_SETTLEMENTS  as GEN9_57_1_,
workflowst10_.VAULT_START  as VAULT10_57_1_,
workflowst10_.UPDATE_MAIN_INV  as UPDATE11_57_1_,
workflowst10_.UPDATE_OTHER_INV as UPDATE12_57_1_,
workflowst10_.RELEASE_SHIPMENT as RELEASE13_57_1_,
workflowst10_.IS_DEAL_SPLITTABLE   as IS14_57_1_,
workflowst10_.SEND_EMAIL   as SEND15_57_1_,
workflowst10_.IS_DELETED   as IS16_57_1_,
workflowst10_.CREATED  as CREATED57_1_,
workflowst10_.CREATED_BY   as CREATED18_57_1_,
workflowst10_.LAST_UPDATED as LAST19_57_1_,
workflowst10_.LAST_UPDATED_BY  as LAST20_57_1_,
workflowst10_.LAST_CHECKED_BY  as LAST21_57_1_,
workflowst10_.LAST_MAKED   as LAST22_57_1_,
workflowst10_.MOD_ID   as MOD23_57_1_,
workflowst10_.MAKER_CHECKER_STATUS as MAKER24_57_1_,
workflowst10_.SHADOW_IDas SHADOW25_57_1_,
--(select now()) as formula52_1_,
carriers3_.MOD_ID  as MOD2_40_2_,
carriers3_.CITIES_ID   as CITIES3_40_2_,
carriers3_.CODEas CODE40_2_,
carriers3_.NAMEas NAME40_2_,
carriers3_.CARRIER

select distinct runs slow on pg 10.6

2019-09-09 Thread yash mehta
We have a query that takes 1min to execute in postgres 10.6 and the same
executes in 4 sec in Oracle database. The query is doing 'select distinct'.
If I add a 'group by' clause, performance in postgres improves
significantly and fetches results in 2 sec (better than oracle). But
unfortunately, we cannot modify the query. Could you please suggest a way
to improve performance in Postgres without modifying the query.

*Original condition: time taken 1min*

Sort Method: external merge  Disk: 90656kB



*After removing distinct from query: time taken 2sec*

Sort Method: top-N heapsort  Memory: 201kB



*After increasing work_mem to 180MB; it takes 20sec*

Sort Method: quicksort  Memory: 172409kB



SELECT * FROM pg_stat_statements ORDER BY total_time DESC limit 1;

-[ RECORD 1
]---+-

userid  | 174862

dbid| 174861

queryid | 1469376470

query   | 

calls   | 1

total_time  | 59469.972661

min_time| 59469.972661

max_time| 59469.972661

mean_time   | 59469.972661

stddev_time | 0

rows| 25

shared_blks_hit | 27436

shared_blks_read| 2542

shared_blks_dirtied | 0

shared_blks_written | 0

local_blks_hit  | 0

local_blks_read | 0

local_blks_dirtied  | 0

local_blks_written  | 0

temp_blks_read  | 257

temp_blks_written   | 11333

blk_read_time   | 0

blk_write_time  | 0


Re: select distinct runs slow on pg 10.6

2019-09-09 Thread Flo Rance
On Mon, Sep 9, 2019 at 10:38 AM yash mehta  wrote:

> In addition to below mail, we have used btree indexes for primary key
> columns. Below is the query:
>
>  select distinct shipmentre0_.FIN_IDas FIN1_53_0_,
> workflowst10_.FIN_ID   as FIN1_57_1_,
> carriers3_.FIN_ID  as FIN1_40_2_,
> shipmentro1_.FIN_IDas FIN1_33_3_,
> shipmentme11_.FIN_ID   as FIN1_5_4_,
> workflowst9_.FIN_IDas FIN1_57_5_,
> workflowst8_.FIN_IDas FIN1_57_6_,
> workflowst7_.FIN_IDas FIN1_57_7_,
> consignees5_.FIN_IDas FIN1_81_8_,
> consignees6_.FIN_IDas FIN1_81_9_,
> shipmentty4_.FIN_IDas FIN1_8_10_,
> shipmentsc2_.FIN_IDas FIN1_78_11_,
> shipmentre0_.MOD_IDas MOD2_53_0_,
> shipmentre0_.SHIPMENT_METHOD_IDas SHIPMENT3_53_0_,
> shipmentre0_.SHIPPER_IDas SHIPPER4_53_0_,
> shipmentre0_.CONSIGNEES_ID as CONSIGNEES5_53_0_,
> shipmentre0_.SHIPMENT_BASIS_ID as SHIPMENT6_53_0_,
> shipmentre0_.SHIPMENT_TYPE_ID  as SHIPMENT7_53_0_,
> shipmentre0_.SHIPMENT_ARRANGEMENT_ID   as SHIPMENT8_53_0_,
> shipmentre0_.SHIPMENT_DATE as SHIPMENT9_53_0_,
> shipmentre0_.SHIPMENT_CURRENCY_ID  as SHIPMENT10_53_0_,
> shipmentre0_.CARRIER_CREW_EXTN_ID  as CARRIER11_53_0_,
> shipmentre0_.END_TIME  as END12_53_0_,
> shipmentre0_.SHIPMENT_VALUE_USDas SHIPMENT13_53_0_,
> shipmentre0_.SHIPMENT_VALUE_BASE   as SHIPMENT14_53_0_,
> shipmentre0_.INSURANCE_VALUE_USD   as INSURANCE15_53_0_,
> shipmentre0_.INSURANCE_VALUE_BASE  as INSURANCE16_53_0_,
> shipmentre0_.REMARKS   as REMARKS53_0_,
> shipmentre0_.DELETION_REMARKS  as DELETION18_53_0_,
> shipmentre0_.SHIPMENT_STATUS_IDas SHIPMENT19_53_0_,
> shipmentre0_.VAULT_STATUS_ID   as VAULT20_53_0_,
> shipmentre0_.SHIPMENT_CHARGE_STATUSas SHIPMENT21_53_0_,
> shipmentre0_.SHIPMENT_DOCUMENT_STATUS  as SHIPMENT22_53_0_,
> shipmentre0_.INSURANCE_PROVIDERas INSURANCE23_53_0_,
> shipmentre0_.SHIPMENT_PROVIDER as SHIPMENT24_53_0_,
> shipmentre0_.SECURITY_PROVIDER_ID  as SECURITY25_53_0_,
> shipmentre0_.CONSIGNEE_CONTACT_NAMEas CONSIGNEE26_53_0_,
> shipmentre0_.SIGNALas SIGNAL53_0_,
> shipmentre0_.CHARGEABLE_WT as CHARGEABLE28_53_0_,
> shipmentre0_.NO_OF_PIECES  as NO29_53_0_,
> shipmentre0_.REGIONS_IDas REGIONS30_53_0_,
> shipmentre0_.IS_DELETEDas IS31_53_0_,
> shipmentre0_.CREATED   as CREATED53_0_,
> shipmentre0_.CREATED_BYas CREATED33_53_0_,
> shipmentre0_.LAST_UPDATED  as LAST34_53_0_,
> shipmentre0_.LAST_UPDATED_BY   as LAST35_53_0_,
> shipmentre0_.LAST_CHECKED_BY   as LAST36_53_0_,
> shipmentre0_.LAST_MAKEDas LAST37_53_0_,
> shipmentre0_.MAKER_CHECKER_STATUS  as MAKER38_53_0_,
> shipmentre0_.SHADOW_ID as SHADOW39_53_0_,
> --(select now()) as formula48_0_,
> workflowst10_.WORKFLOW_MODULE  as WORKFLOW2_57_1_,
> workflowst10_.NAME as NAME57_1_,
> workflowst10_.DEAL_DISPLAY_MODULE  as DEAL4_57_1_,
> workflowst10_.WORKFLOW_LEVEL   as WORKFLOW5_57_1_,
> workflowst10_.IS_DEAL_EDITABLE as IS6_57_1_,
> workflowst10_.GEN_CONFOas GEN7_57_1_,
> workflowst10_.GEN_DEAL_TICKET  as GEN8_57_1_,
> workflowst10_.GEN_SETTLEMENTS  as GEN9_57_1_,
> workflowst10_.VAULT_START  as VAULT10_57_1_,
> workflowst10_.UPDATE_MAIN_INV  as UPDATE11_57_1_,
> workflowst10_.UPDATE_OTHER_INV as UPDATE12_57_1_,
> workflowst10_.RELEASE_SHIPMENT as RELEASE13_57_1_,
> workflowst10_.IS_DEAL_SPLITTABLE   as IS14_57_1_,
> workflowst10_.SEND_EMAIL   as SEND15_57_1_,
> workflowst10_.IS_DELETED   as IS16_57_1_,
> workflowst10_.CREATED  as CREATED57_1_,
> workflowst10_.CREATED_BY   as CREATED18_57_1_,
> workflowst10_.LAST_UPDATED as LAST19_57_1_,
> workflowst10_.LAST_UPDATED_BY  as LAST20_57_1_,
> workflowst10_.LAST_CHECKED_BY  as LAST21_57_1_,
> workflowst10_.LAST_MAKED   as LAST22_57_1_,
> workflowst10_.MOD_ID   as MOD23_57_1_,
> workflowst10_.MAKER_CHECKER_STATUS as MAKER24_57_1_,
> workflowst10_.SHADOW_IDas SHADOW25_57_1_,
> --(select now()) as formula52_1_,
> carriers3_.MOD_ID   

Re: select distinct runs slow on pg 10.6

2019-09-09 Thread yash mehta
Hi Flo,

PFB the explain plan:



"Limit  (cost=5925.59..5944.03 rows=25 width=6994) (actual
time=57997.219..58002.451 rows=25 loops=1)"

"  ->  Unique  (cost=5925.59..5969.10 rows=59 width=6994) (actual
time=57997.218..58002.416 rows=25 loops=1)"

"->  Sort  (cost=5925.59..5925.74 rows=59 width=6994) (actual
time=57997.214..57997.537 rows=550 loops=1)"

"  Sort Key: shipmentre0_.shipment_date, shipmentre0_.fin_id,
workflowst10_.fin_id, carriers3_.fin_id, shipmentro1_.fin_id,
shipmentme11_.fin_id, workflowst9_.fin_id, workflowst8_.fin_id,
workflowst7_.fin_id, consignees5_.fin_id, consignees6_.fin_id,
shipmentty4_.fin_id, shipmentsc2_.fin_id, shipmentre0_.mod_id,
shipmentre0_.shipment_method_id, shipmentre0_.shipment_basis_id,
shipmentre0_.shipment_arrangement_id, shipmentre0_.shipment_currency_id,
shipmentre0_.carrier_crew_extn_id, shipmentre0_.end_time,
shipmentre0_.shipment_value_usd, shipmentre0_.shipment_value_base,
shipmentre0_.insurance_value_usd, shipmentre0_.insurance_value_base,
shipmentre0_.remarks, shipmentre0_.deletion_remarks,
shipmentre0_.insurance_provider, shipmentre0_.shipment_provider,
shipmentre0_.security_provider_id, shipmentre0_.consignee_contact_name,
shipmentre0_.signal, shipmentre0_.chargeable_wt, shipmentre0_.no_of_pieces,
shipmentre0_.regions_id, shipmentre0_.created, shipmentre0_.created_by,
shipmentre0_.last_updated, shipmentre0_.last_updated_by,
shipmentre0_.last_checked_by, shipmentre0_.last_maked,
shipmentre0_.maker_checker_status, shipmentre0_.shadow_id,
workflowst10_.workflow_module, workflowst10_.name,
workflowst10_.deal_display_module, workflowst10_.workflow_level,
workflowst10_.is_deal_editable, workflowst10_.gen_confo,
workflowst10_.gen_deal_ticket, workflowst10_.gen_settlements,
workflowst10_.vault_start, workflowst10_.update_main_inv,
workflowst10_.update_other_inv, workflowst10_.release_shipment,
workflowst10_.is_deal_splittable, workflowst10_.send_email,
workflowst10_.is_deleted, workflowst10_.created, workflowst10_.created_by,
workflowst10_.last_updated, workflowst10_.last_updated_by,
workflowst10_.last_checked_by, workflowst10_.last_maked,
workflowst10_.mod_id, workflowst10_.maker_checker_status,
workflowst10_.shadow_id, carriers3_.mod_id, carriers3_.cities_id,
carriers3_.code, carriers3_.name, carriers3_.carrier_types,
carriers3_.name_in_fl, carriers3_.iata_code, carriers3_.kc_code,
carriers3_.airline_acct, carriers3_.address1, carriers3_.address2,
carriers3_.address3, carriers3_.address4, carriers3_.terminal,
carriers3_.airline_agent, carriers3_.accountinginfo,
carriers3_.import_dept, carriers3_.import_after_office_hour,
carriers3_.import_contact, carriers3_.import_fax, carriers3_.import_email,
carriers3_.export_deptt, carriers3_.export_after_office_hour,
carriers3_.export_contact, carriers3_.export_fax,
carriers3_.import_contact_no, carriers3_.export_contact_no,
carriers3_.export_email, carriers3_.awb_issued_by, carriers3_.is_deleted,
carriers3_.created, carriers3_.created_by, carriers3_.last_updated,
carriers3_.last_updated_by, carriers3_.last_checked_by,
carriers3_.last_maked, carriers3_.maker_checker_status,
carriers3_.shadow_id, shipmentro1_.mod_id, shipmentro1_.region_id,
shipmentro1_.airway_bill_no, shipmentro1_.shipment_date,
shipmentro1_.arrival_date, shipmentro1_.leg_no, shipmentro1_.no_of_pcs,
shipmentro1_.chargeable_weight, shipmentro1_.carrier_crew_extn_id,
shipmentro1_.is_deleted, shipmentro1_.created, shipmentro1_.created_by,
shipmentro1_.last_updated, shipmentro1_.last_updated_by,
shipmentro1_.last_checked_by, shipmentro1_.last_maked,
shipmentro1_.maker_checker_status, shipmentro1_.shadow_id,
shipmentme11_.mod_id, shipmentme11_.code, shipmentme11_.name,
shipmentme11_.shipment_method_type, shipmentme11_.is_deleted,
shipmentme11_.created, shipmentme11_.created_by,
shipmentme11_.last_updated, shipmentme11_.last_updated_by,
shipmentme11_.last_checked_by, shipmentme11_.last_maked,
shipmentme11_.maker_checker_status, shipmentme11_.shadow_id,
workflowst9_.workflow_module, workflowst9_.name,
workflowst9_.deal_display_module, workflowst9_.workflow_level,
workflowst9_.is_deal_editable, workflowst9_.gen_confo,
workflowst9_.gen_deal_ticket, workflowst9_.gen_settlements,
workflowst9_.vault_start, workflowst9_.update_main_inv,
workflowst9_.update_other_inv, workflowst9_.release_shipment,
workflowst9_.is_deal_splittable, workflowst9_.send_email,
workflowst9_.is_deleted, workflowst9_.created, workflowst9_.created_by,
workflowst9_.last_updated, workflowst9_.last_updated_by,
workflowst9_.last_checked_by, workflowst9_.last_maked, workflowst9_.mod_id,
workflowst9_.maker_checker_status, workflowst9_.shadow_id,
workflowst8_.workflow_module, workflowst8_.name,
workflowst8_.deal_display_module, workflowst8_.workflow_level,
workflowst8_.is_deal_editable, workflowst8_.gen_confo,
workflowst8_.gen_deal_ticket, workflowst8_.gen_settlements,
workflowst8_.vault_start, workflowst8_.update_main_inv,
workflowst8_.update_other_inv, workflowst8_.release_shipmen

Re: select distinct runs slow on pg 10.6

2019-09-09 Thread Justin Pryzby
On Mon, Sep 09, 2019 at 02:00:01PM +0530, yash mehta wrote:
> We have a query that takes 1min to execute in postgres 10.6 and the same
> executes in 4 sec in Oracle database. The query is doing 'select distinct'.
> If I add a 'group by' clause, performance in postgres improves
> significantly and fetches results in 2 sec (better than oracle). But
> unfortunately, we cannot modify the query. Could you please suggest a way
> to improve performance in Postgres without modifying the query.

Not sure it helps, but I remember this:
https://www.postgresql.org/message-id/CAKJS1f9q0j3BgMUsDbtf9%3DecfVLnqvkYB44MXj0gpVuamcN8Xw%40mail.gmail.com




Re: select distinct runs slow on pg 10.6

2019-09-09 Thread Flo Rance
There are few things to consider:
- you don't need to use distinct on all columns (and therefore sort all
columns)
- you should try to sort in memory, better than on-disk
- it seems that the planner doesn't predict the good number of rows

Regards,
Florian

On Mon, Sep 9, 2019 at 12:46 PM Justin Pryzby  wrote:

> On Mon, Sep 09, 2019 at 02:00:01PM +0530, yash mehta wrote:
> > We have a query that takes 1min to execute in postgres 10.6 and the same
> > executes in 4 sec in Oracle database. The query is doing 'select
> distinct'.
> > If I add a 'group by' clause, performance in postgres improves
> > significantly and fetches results in 2 sec (better than oracle). But
> > unfortunately, we cannot modify the query. Could you please suggest a way
> > to improve performance in Postgres without modifying the query.
>
> Not sure it helps, but I remember this:
>
> https://www.postgresql.org/message-id/CAKJS1f9q0j3BgMUsDbtf9%3DecfVLnqvkYB44MXj0gpVuamcN8Xw%40mail.gmail.com
>
>
>


Re: select distinct runs slow on pg 10.6

2019-09-09 Thread Michael Lewis
If you can't modify the query, then there is nothing more to be done to
optimize the execution afaik. Distinct is much slower than group by in
scenarios like this with many columns. You already identified the disk sort
and increased work mem to get it faster by 3x. There are not any other
tricks of which I am aware.


Re: select distinct runs slow on pg 10.6

2019-09-09 Thread yash mehta
Hi Michael/Justin/Flo,

Thank you all for your assistance. As Michael said, looks like there are no
more tricks left.

On Mon, Sep 9, 2019 at 9:09 PM Michael Lewis  wrote:

> If you can't modify the query, then there is nothing more to be done to
> optimize the execution afaik. Distinct is much slower than group by in
> scenarios like this with many columns. You already identified the disk sort
> and increased work mem to get it faster by 3x. There are not any other
> tricks of which I am aware.
>