Re: PostgreSQL 12.3 slow index scan chosen

2020-06-20 Thread Tom Lane
I wrote:
> ... oh, now I see: apparently, your filter condition is such that *no*
> rows of the objectcustomfieldvalues table get past the filter:
>
>   ->  Index Scan using objectcustomfieldvalues3 on 
> objectcustomfieldvalues objectcustomfieldvalues_1  (cost=0.56..807603.40 
> rows=915 width=4) (actual time=21165.441..21165.441 rows=0 loops=1)
>  Filter: ((disabled = 0) AND ((largecontent ~~* 
> '%958575%'::text) OR ((content)::text ~~* '%958575%'::text)))
>  Rows Removed by Filter: 19030904

> That's kind of annoying :-(.  I wonder if there's a way to be smarter?
> This case would work a lot better if the filter conditions were not
> applied till after the merge; but of course that wouldn't be an
> improvement in general.  Or maybe we should penalize the mergejoin
> cost estimate if there's a highly selective filter in the way.

I experimented with this some more, with the intention of creating a
planner patch that would do the latter, and was surprised to find that
there already is such a penalty.  It's sort of indirect and undocumented,
but nonetheless the estimate for whether a mergejoin can stop early
does get heavily de-rated if the planner realizes that the table is
being heavily filtered.  So now I'm thinking that your problem is that
"rows=915" is not a good enough estimate of what will happen in this
indexscan.  It looks good in comparison to the table size of 19M rows,
but on a percentage basis compared to the true value of 0 rows, it's
still pretty bad.  You said you'd increased the stats target for
objectcustomfieldvalues.objectid, but maybe the real problem is needing
to increase the targets for content and largecontent, in hopes of driving
down the estimate for how many rows will pass this filter condition.

regards, tom lane




Re: PostgreSQL 12.3 slow index scan chosen

2020-06-20 Thread Kenneth Marshall
On Sat, Jun 20, 2020 at 02:22:03PM -0400, Tom Lane wrote:
> I wrote:
> > ... oh, now I see: apparently, your filter condition is such that *no*
> > rows of the objectcustomfieldvalues table get past the filter:
> >
> >   ->  Index Scan using objectcustomfieldvalues3 on 
> > objectcustomfieldvalues objectcustomfieldvalues_1  (cost=0.56..807603.40 
> > rows=915 width=4) (actual time=21165.441..21165.441 rows=0 loops=1)
> >  Filter: ((disabled = 0) AND ((largecontent ~~* 
> > '%958575%'::text) OR ((content)::text ~~* '%958575%'::text)))
> >  Rows Removed by Filter: 19030904
> 
> > That's kind of annoying :-(.  I wonder if there's a way to be smarter?
> > This case would work a lot better if the filter conditions were not
> > applied till after the merge; but of course that wouldn't be an
> > improvement in general.  Or maybe we should penalize the mergejoin
> > cost estimate if there's a highly selective filter in the way.
> 
> I experimented with this some more, with the intention of creating a
> planner patch that would do the latter, and was surprised to find that
> there already is such a penalty.  It's sort of indirect and undocumented,
> but nonetheless the estimate for whether a mergejoin can stop early
> does get heavily de-rated if the planner realizes that the table is
> being heavily filtered.  So now I'm thinking that your problem is that
> "rows=915" is not a good enough estimate of what will happen in this
> indexscan.  It looks good in comparison to the table size of 19M rows,
> but on a percentage basis compared to the true value of 0 rows, it's
> still pretty bad.  You said you'd increased the stats target for
> objectcustomfieldvalues.objectid, but maybe the real problem is needing
> to increase the targets for content and largecontent, in hopes of driving
> down the estimate for how many rows will pass this filter condition.
> 
>   regards, tom lane

Hi Tom,

I increased the statistics on the content field because it had the most
values (1900) versus largecontent (1000). When I reached 8000, the
plan changed to:

# explain (analyze,buffers) SELECT DISTINCT main.* FROM Articles main JOIN 
ObjectCustomFieldValues ObjectCustomFieldValues_1  ON ( 
ObjectCustomFieldValues_1.Disabled = '0' ) AND ( 
ObjectCustomFieldValues_1.ObjectId = main.id )  WHERE 
(ObjectCustomFieldValues_1.LargeContent ILIKE '%958575%' OR 
ObjectCustomFieldValues_1.Content ILIKE '%958575%') AND (main.Disabled  = '0') 
ORDER BY main.SortOrder ASC, main.Name ASC;

 QUERY
PLAN
 

 Unique  (cost=1151.07..1151.10 rows=1 width=137) (actual time=1.782..1.782 
rows=0 loops=1)
   Buffers: shared hit=158
   ->  Sort  (cost=1151.07..1151.08 rows=1 width=137) (actual time=1.781..1.781 
rows=0 loops=1)
 Sort Key: main.sortorder, main.name, main.id, main.summary, 
main.class, main.parent, main.uri, main.creator, main.created, 
main.lastupdatedby, main.lastupdated
 Sort Method: quicksort  Memory: 25kB
 Buffers: shared hit=158
 ->  Hash Join  (cost=185.12..1151.06 rows=1 width=137) (actual 
time=1.777..1.777 rows=0 loops=1)
   Hash Cond: (objectcustomfieldvalues_1.objectid = main.id)
   Buffers: shared hit=158
   ->  Bitmap Heap Scan on objectcustomfieldvalues 
objectcustomfieldvalues_1  (cost=177.36..1141.46 rows=699 width=4) (actual 
time=1.704..1.704 rows=0 loops=1)
 Recheck Cond: ((largecontent ~~* '%958575%'::text) OR 
((content)::text ~~* '%958575%'::text))
 Filter: (disabled = 0)
 Buffers: shared hit=154
 ->  BitmapOr  (cost=177.36..177.36 rows=868 width=0) 
(actual time=1.703..1.703 rows=0 loops=1)
   Buffers: shared hit=154
   ->  Bitmap Index Scan on 
objectcustomfieldvalues_largecontent_trgm  (cost=0.00..30.80 rows=1 width=0) 
(actual time=0.282..0.282 rows=0 loops=1)
 Index Cond: (largecontent ~~* '%958575%'::text)
 Buffers: shared hit=28
   ->  Bitmap Index Scan on 
objectcustomfieldvalues_content_trgm  (cost=0.00..146.21 rows=868 width=0) 
(actual time=1.421..1.421 rows=0 loops=1)
 Index Cond: ((content)::text ~~* 
'%958575%'::text)
 Buffers: shared hit=126
   ->  Hash  (cost=5.91..5.91 rows=148 width=137) (actual 
time=0.071..0.071 rows=148 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 34kB
 Buffers: shared hit=4