On Sat, Jun 20, 2020 at 02:22:03PM -0400, Tom Lane wrote:
> I wrote:
> > ... oh, now I see: apparently, your filter condition is such that *no*
> > rows of the objectcustomfieldvalues table get past the filter:
> >
> > -> Index Scan using objectcustomfieldvalues3 on
> > objectcustomfieldvalues objectcustomfieldvalues_1 (cost=0.56..807603.40
> > rows=915 width=4) (actual time=21165.441..21165.441 rows=0 loops=1)
> > Filter: ((disabled = 0) AND ((largecontent ~~*
> > '%958575%'::text) OR ((content)::text ~~* '%958575%'::text)))
> > Rows Removed by Filter: 19030904
>
> > That's kind of annoying :-(. I wonder if there's a way to be smarter?
> > This case would work a lot better if the filter conditions were not
> > applied till after the merge; but of course that wouldn't be an
> > improvement in general. Or maybe we should penalize the mergejoin
> > cost estimate if there's a highly selective filter in the way.
>
> I experimented with this some more, with the intention of creating a
> planner patch that would do the latter, and was surprised to find that
> there already is such a penalty. It's sort of indirect and undocumented,
> but nonetheless the estimate for whether a mergejoin can stop early
> does get heavily de-rated if the planner realizes that the table is
> being heavily filtered. So now I'm thinking that your problem is that
> "rows=915" is not a good enough estimate of what will happen in this
> indexscan. It looks good in comparison to the table size of 19M rows,
> but on a percentage basis compared to the true value of 0 rows, it's
> still pretty bad. You said you'd increased the stats target for
> objectcustomfieldvalues.objectid, but maybe the real problem is needing
> to increase the targets for content and largecontent, in hopes of driving
> down the estimate for how many rows will pass this filter condition.
>
> regards, tom lane
Hi Tom,
I increased the statistics on the content field because it had the most
values (1900) versus largecontent (1000). When I reached 8000, the
plan changed to:
# explain (analyze,buffers) SELECT DISTINCT main.* FROM Articles main JOIN
ObjectCustomFieldValues ObjectCustomFieldValues_1 ON (
ObjectCustomFieldValues_1.Disabled = '0' ) AND (
ObjectCustomFieldValues_1.ObjectId = main.id ) WHERE
(ObjectCustomFieldValues_1.LargeContent ILIKE '%958575%' OR
ObjectCustomFieldValues_1.Content ILIKE '%958575%') AND (main.Disabled = '0')
ORDER BY main.SortOrder ASC, main.Name ASC;
QUERY
PLAN
Unique (cost=1151.07..1151.10 rows=1 width=137) (actual time=1.782..1.782
rows=0 loops=1)
Buffers: shared hit=158
-> Sort (cost=1151.07..1151.08 rows=1 width=137) (actual time=1.781..1.781
rows=0 loops=1)
Sort Key: main.sortorder, main.name, main.id, main.summary,
main.class, main.parent, main.uri, main.creator, main.created,
main.lastupdatedby, main.lastupdated
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=158
-> Hash Join (cost=185.12..1151.06 rows=1 width=137) (actual
time=1.777..1.777 rows=0 loops=1)
Hash Cond: (objectcustomfieldvalues_1.objectid = main.id)
Buffers: shared hit=158
-> Bitmap Heap Scan on objectcustomfieldvalues
objectcustomfieldvalues_1 (cost=177.36..1141.46 rows=699 width=4) (actual
time=1.704..1.704 rows=0 loops=1)
Recheck Cond: ((largecontent ~~* '%958575%'::text) OR
((content)::text ~~* '%958575%'::text))
Filter: (disabled = 0)
Buffers: shared hit=154
-> BitmapOr (cost=177.36..177.36 rows=868 width=0)
(actual time=1.703..1.703 rows=0 loops=1)
Buffers: shared hit=154
-> Bitmap Index Scan on
objectcustomfieldvalues_largecontent_trgm (cost=0.00..30.80 rows=1 width=0)
(actual time=0.282..0.282 rows=0 loops=1)
Index Cond: (largecontent ~~* '%958575%'::text)
Buffers: shared hit=28
-> Bitmap Index Scan on
objectcustomfieldvalues_content_trgm (cost=0.00..146.21 rows=868 width=0)
(actual time=1.421..1.421 rows=0 loops=1)
Index Cond: ((content)::text ~~*
'%958575%'::text)
Buffers: shared hit=126
-> Hash (cost=5.91..5.91 rows=148 width=137) (actual
time=0.071..0.071 rows=148 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 34kB
Buffers: shared hit=4