Bitmap heap scan performance

2019-08-09 Thread Rob Emery
Hi Guys,

I’m at a bit of a loss where I can go with the following 2 queries
that are over the same data structure (DDL attached) under postgresql
PostgreSQL 9.5.16 on x86_64-pc-linux-gnu (Debian 9.5.16-1.pgdg90+1),
compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit and
could do with a second set of eyes if someone would oblige.

I’ve attached Query1.txt and Query2.txt along with the DDL for the
tables and indicies and execution plans.

On our production environment we’re running at about 2 seconds (with
the cache warm); I’m getting a comparable speed on my playbox. It
seems to me like the Bitmap Heap Scan on proposal is the issue because
the recheck is throwing away enormous amounts of data. The
has_been_anonymised flag on the proposal is effectively a soft-delete;
so I’ve tried adding something like :

CREATE INDEX ON proposal.proposal (system_id, legacy_organisation_id, reference)
WHERE has_been_anonymised = false;

Which I was hoping would shrink the size of the index significantly
and encourage an index scan rather than bitmap, however it didn’t have
that effect. For reference:

Has_been_anonymised false:1534790
Has_been_anonymised true:7072192

Row counts over the whole table in question are :
Proposal.proposal:86069822340 MB
Proposal.note:2624423  1638 MB

Presumably I could partition proposal on has_been_anonymised, however
the row counts seem low enough that it feels a bit like overkill? We
also need referential integrity so I'll need to wait until that's in
(I think it's coming in PG12?)

If I decrease the number of legacy_organisation_id’s that are being
used then the query performance gets much better, but presumably
that’s because there’s a smaller dataset.

Any thoughts or ideas?

Thanks
Rob

-- 
 


A big Get Focused ‘thank you’ 

Why you 
should partner with an Agile company 



*
*
*Phone:* 0800 021 0888   Email: [email protected] 

Codeweavers Ltd | Barn 4 | Dunston 
Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 
04092394 | VAT registration no. 974 9705 63 



 
  
  

CREATE TABLE proposal.proposal
(
  id bigserial NOT NULL,
  reference uuid NOT NULL,
  system_id integer NOT NULL,
  legacy_id integer NOT NULL,
  organisation_id integer,
  legacy_organisation_id integer NOT NULL,
  has_been_anonymised boolean NOT NULL,
  external_source character varying,
  CONSTRAINT pk_proposal PRIMARY KEY (id),
  CONSTRAINT proposal_reference_key UNIQUE (reference),
  CONSTRAINT proposal_system_id_legacy_id_key UNIQUE (system_id, legacy_id)
)
WITH (
  OIDS=FALSE
);

CREATE INDEX proposal_has_been_anonymised_system_id_legacy_organisation__idx
  ON proposal.proposal
  USING btree
  (has_been_anonymised, system_id, legacy_organisation_id);

CREATE INDEX proposal_has_been_anonymised_system_id_legacy_organisation_idx1
  ON proposal.proposal
  USING btree
  (has_been_anonymised, system_id, legacy_organisation_id, reference);

--

CREATE TABLE proposal.note
(
  id bigserial NOT NULL,
  reference uuid NOT NULL,
  proposal_reference uuid NOT NULL,
  entry_time timestamp with time zone NOT NULL,
  legacy_originator_id integer,
  type_id integer NOT NULL,
  content text NOT NULL,
  legacy_read_by integer,
  time_read timestamp with time zone,
  "from" character varying(100),
  "to" character varying(100),
  originator_reference uuid,
  read_by_reference uuid,
  CONSTRAINT pk_note PRIMARY KEY (id),
  CONSTRAINT note_proposal_reference_fkey FOREIGN KEY (proposal_reference)
  REFERENCES proposal.proposal (reference) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT note_reference_key UNIQUE (reference)
)
WITH (
  OIDS=FALSE
);

CREATE INDEX note_entry_time_type_id_idx
  ON proposal.note
  USING btree
  (entry_time, type_id)
  WHERE legacy_read_by IS NULL;

CREATE INDEX note_proposal_reference_idx
  ON proposal.note
  USING btree
  (proposal_reference);

CREATE INDEX note_proposal_reference_type_id_entry_time_idx
  ON proposal.note
  USING btree
  (proposal_reference, type_id, entry_time)
  WHERE legacy_read_by IS NULL;

CREATE INDEX note_type_id_entry_time_idx
  ON proposal.note
  USING btree
  (type_id, entry_time)
  WHERE legacy_read_by IS NULL;EXPLAIN (ANALYZE TRUE, VERBOSE TRUE, COSTS TRUE, BUFFERS TRUE, TIMING TRUE) 
SELECT COUNT(1)
FROM proposal.proposal
INNER JOIN proposal.note 
ON proposal.note.proposal_reference = proposal.proposal.reference
WHERE proposal.proposal.system_id = '11'
AND proposal.proposal.legacy_organisation_id IN ('2', '6', '7', '8', 
'9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', 
'22', '23', '24', '25',

Re: Bitmap heap scan performance

2019-08-12 Thread Rob Emery
Aha!

That's a great hint, we had that set down to an obscenely low value
due to our max_connections setting being quite high. I've tweaked it
back up to 4MB for now and it's definitely had a marked improvement!

Many Thanks,
Rob

On 09/08/2019, Jeff Janes  wrote:
> On Fri, Aug 9, 2019 at 4:42 AM Rob Emery  wrote:
>
>
>>
>> It
>> seems to me like the Bitmap Heap Scan on proposal is the issue because
>> the recheck is throwing away enormous amounts of data.
>
>
> Have you tried increasing work_mem?  The probable reason for the recheck is
> that your bitmap overflows the allowed memory, and then switches
> from storing every tid to storing just the block numbers.  As indicated by
> the lossy part of "Heap Blocks: exact=3983 lossy=27989"
>
> The
>> has_been_anonymised flag on the proposal is effectively a soft-delete;
>> so I’ve tried adding something like :
>>
>> CREATE INDEX ON proposal.proposal (system_id, legacy_organisation_id,
>> reference)
>> WHERE has_been_anonymised = false;
>>
>> Which I was hoping would shrink the size of the index significantly
>>
>
> The partial index should be smaller, but when comparing to the index with
> "has_been_anonymised" as the leading column, it won't make a lot of
> difference.  You only have to scan a smaller part of the larger index, and
> the sizes of part of the index you have to scan in each case will be
> roughly comparable.
>
>
>> and encourage an index scan rather than bitmap, however it didn’t have
>> that effect.
>
>
> To encourage index scans over bitmap scans, you can increase
> effective_cache_size.  Or to really force the issue, you can "set
> enable_bitmapscan=off" but that is something you would usually do locally
> for experimental purposes, not do it in production's config settings.
>
> Cheers,
>
> Jeff
>


-- 
Robert Emery
Infrastructure Director

E: [email protected] | T: 01785 711633 | W: www.codeweavers.net

-- 
 <https://codeweavers.net>


A big Get Focused ‘thank you’ 
<https://codeweavers.net/company-blog/a-big-get-focused-thank-you>
Why you 
should partner with an Agile company 
<https://codeweavers.net/company-blog/why-you-should-partner-with-an-agile-company>


*
*
*Phone:* 0800 021 0888   Email: [email protected] 
<mailto:[email protected]>
Codeweavers Ltd | Barn 4 | Dunston 
Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 
04092394 | VAT registration no. 974 9705 63 



 
<https://twitter.com/Codeweavers_Ltd>  
<https://www.facebook.com/Codeweavers.Ltd/>  
<https://www.linkedin.com/company/codeweavers-limited>