Hstore index for full text search

2020-08-11 Thread Burhan Akbulut
Hi all,

I need help in full text search optimization for hstore type. I added my
query to explain.depesz, you can check the query and also i added explain
analyze result in this link: https://explain.depesz.com/s/QT1e

table_ord.ops column type is hstore. I couldn't find the effective index
that would reduce the run time of the query.

When I tried to add an gin index to the hstore column, I got the following
error:

create index on table_ord USING gin (ops);
ERROR:  index row size 2728 exceeds maximum 2712 for index " table_ord_
ops_idx"

How can I fix this query ?

Best Regards,

Burhan Akbulut
DBA - Cooksoft


Re: Hstore index for full text search

2020-08-11 Thread Michael Lewis
Hash Cond: (o.courier_id = cc.id)
Filter: (((o.tracker_code)::text ~~* '%1654323%'::text) OR
((table_cus.name)::text
~~* '%1654323%'::text) OR ((au.username)::text ~~ '%1654323%'::text) OR
((o.source)::text ~~* '%1654323%'::text) OR ((o.ops -> 'shop'::text) ~~*
'%1654323%'::text) OR ((o.ops -> 'camp_code'::text) ~~* '%1654323%'::text)
OR ((city.name)::text ~~* '%1654323%'::text) OR ((co.name)::text ~~*
'%1654323%'::text) OR ((o.tr_code)::text ~~* '%1654323%'::text) OR ((o.ops
? 'shipping_company'::text) AND ((o.ops -> 'shipping_company'::text) ~~*
'%1654323%'::text)) OR ((cc.name)::text ~~* '%1654323%'::text))


All those OR conditions on different tables and fields seems like it will
be unlikely that the planner will do anything with the index you are trying
to create (for this query).

On the error, I came across discussions on dba.stackexchange.com
referencing a limit of about 1/3 of the page size (8192) for every
key because of it being a btree underneath. It could be one or more of your
keys in ops (like shop, camp_code, and shipping_company) is much longer
than those examples shown in the query.


Re: Hstore index for full text search

2020-08-11 Thread Tom Lane
Michael Lewis  writes:
> Hash Cond: (o.courier_id = cc.id)
> Filter: (((o.tracker_code)::text ~~* '%1654323%'::text) OR
> ((table_cus.name)::text
> ~~* '%1654323%'::text) OR ((au.username)::text ~~ '%1654323%'::text) OR
> ((o.source)::text ~~* '%1654323%'::text) OR ((o.ops -> 'shop'::text) ~~*
> '%1654323%'::text) OR ((o.ops -> 'camp_code'::text) ~~* '%1654323%'::text)
> OR ((city.name)::text ~~* '%1654323%'::text) OR ((co.name)::text ~~*
> '%1654323%'::text) OR ((o.tr_code)::text ~~* '%1654323%'::text) OR ((o.ops
> ? 'shipping_company'::text) AND ((o.ops -> 'shipping_company'::text) ~~*
> '%1654323%'::text)) OR ((cc.name)::text ~~* '%1654323%'::text))

> All those OR conditions on different tables and fields seems like it will
> be unlikely that the planner will do anything with the index you are trying
> to create (for this query).

A GIN index on an hstore column only provides the ability to search for
exact matches to hstore key strings.  There are a few bells and whistles,
like the ability to AND or OR such conditions.  But basically it's just an
exact-match engine, and it doesn't index the hstore's data values at all
(which is why the implementors weren't too concerned about having a length
limit on the index entries).  There is 0 chance of this index type being
useful for what the OP wants to do.

Given these examples, I'd think about setting up a collection of pg_trgm
indexes on the specific hstore keys you care about, ie something like

CREATE INDEX ON mytable USING GIST ((ops -> 'camp_code') gist_trgm_ops);
CREATE INDEX ON mytable USING GIST ((ops -> 'shipping_company') gist_trgm_ops);
...

which'd allow indexing queries like

... WHERE (ops -> 'camp_code') LIKE '%1654323%'
   OR (ops -> 'shipping_company') LIKE '%1654323%'

I'm not sure how far this will get you, though; if there's a whole lot
of different keys of interest, maintaining a separate index for each
one is probably too much overhead.  Another point is that you will only
get an indexscan if *every* OR'd clause matches some index.  The example
query looks sufficiently unstructured that that might be hard to ensure.

I kind of wonder whether this data design is actually a good idea.
It doesn't seem to match your querying style terribly well.

regards, tom lane




Re: Hstore index for full text search

2020-08-11 Thread Michael Lewis
On Tue, Aug 11, 2020 at 4:46 PM Tom Lane  wrote:

> A GIN index on an hstore column only provides the ability to search for
> exact matches to hstore key strings.  There are a few bells and whistles,
> like the ability to AND or OR such conditions.  But basically it's just an
> exact-match engine, and it doesn't index the hstore's data values at all
> (which is why the implementors weren't too concerned about having a length
> limit on the index entries).  There is 0 chance of this index type being
> useful for what the OP wants to do.
>

Thanks for sharing. More like json path ops and not the full key and value.
Interesting.


> Another point is that you will only
> get an indexscan if *every* OR'd clause matches some index.  The example
> query looks sufficiently unstructured that that might be hard to ensure.
>

Does this still apply when the where clauses are on several tables and not
just one?


Re: Is there a known bug with SKIP LOCKED and "tuple to be locked was already moved to another partition due to concurrent update"?

2020-08-11 Thread Jim Jarvie

Hi Gunther & List,

I think I have an extremely similar issue and things point in the same 
direction of a potential issue for skip locked on partitioned tables.


Background is I had a queue table on v9.6 with fairly high volume (>50M 
items, growth in the 1+M/daily).


Processing the queue with FOR UPDATE SKIP LOCKED was reliable but 
traffic volumes on v9.6 and the fact v12 is current let to migrating to 
v12 and using a partitioned table.


Queue has distinct categories of items, so the table is partitioned by 
list on each category.  Processing in 1 category results in it being 
updated to the next logical category (i.e. it will migrate partition 
once it is processed).


Within each category, there can be 10'sM rows, so the list partition is 
hash partitioned as well.  I don't think this is the issue but is 
mentioned for completeness.


Now, when processing the queue, there are regular transaction aborts 
with "tuple to be locked was already moved to another partition due to 
concurrent update".


From everything I can trace, it really does look like this is caused by 
rows which should be locked/skipped as they are processed by a different 
thread.


I tried switching 'for update' to 'for key share' and that created a 
cascade of deadlock aborts, so was worse for my situation.


For now, I roll back and repeat the select for update skip locked until 
it succeeds - which it eventually does.


However, it really feels like these should just have been skipped by 
PostgreSQL without the rollback/retry until success.


So, am I missing something/doing it wrong?  Or could there be a 
potential issue that needs raised?


Thanks

Jim



On 30-Jun.-2020 12:10, Gunther Schadow wrote:

Hi all,

long time ago I devised with your help a task queuing system which 
uses SELECT ... FOR UPDATE SKIP LOCKED for many parallel workers to 
find tasks in the queue, and it used a partitioned table where the hot 
part of the queue is short and so the query for a job is quick and the 
skip locked locking makes sure that one job is only assigned to one 
worker. And this works pretty well for me, except that when we run 
many workers we find a lot of these failures occurring:


"tuple to be locked was already moved to another partition due to 
concurrent update"


This would not exactly look like a bug, because the message says "to 
be locked", so at least it's not allowing two workers to lock the same 
tuple. But it seems that the skip-locked mode should not make an error 
out of this, but treat it as the tuple was already locked. Why would 
it want to lock the tuple (representing the job) if another worker has 
already finished his UPDATE of the job to mark it as "done" (which is 
what makes the tuple move to the "completed" partition.)


Either the SELECT for jobs to do returned a wrong tuple, which was 
already update, or there is some lapse in the locking.


Either way it would seem to be a waste of time throwing all these 
errors when the tuple should not even have been selected for update 
and locking.


I wonder if anybody knows anything about that issue? Of course you'll 
want to see the DDL and SQL queries, etc. but you can't really try it 
out unless you do some massively parallel magic. So I figured I just ask.


regards,
-Gunther







Re: Hstore index for full text search

2020-08-11 Thread Tom Lane
Michael Lewis  writes:
> On Tue, Aug 11, 2020 at 4:46 PM Tom Lane  wrote:
>> Another point is that you will only
>> get an indexscan if *every* OR'd clause matches some index.  The example
>> query looks sufficiently unstructured that that might be hard to ensure.

> Does this still apply when the where clauses are on several tables and not
> just one?

Yeah.  In that case there's no hope of an indexscan at all, since for all
the planner knows, the query might match some table rows that don't meet
any of the conditions mentioned for that table's columns.  If you can
write

WHERE (condition-on-t1.a OR condition-on-t1.b OR ...)
  AND (condition-on-t2.x OR condition-on-t2.y OR ...)

then you've got a chance of making the OR'd conditions into index
qualifications on t1 or t2 respectively.  But if it's

WHERE condition-on-t1.a OR condition-on-t1.b OR ...
   OR condition-on-t2.x OR condition-on-t2.y OR ...

then you're in for full-table scans.  (This is another thing that
was bothering me about the data design, though I failed to think
it through clearly enough to state before.)

regards, tom lane