Hstore index for full text search
Hi all, I need help in full text search optimization for hstore type. I added my query to explain.depesz, you can check the query and also i added explain analyze result in this link: https://explain.depesz.com/s/QT1e table_ord.ops column type is hstore. I couldn't find the effective index that would reduce the run time of the query. When I tried to add an gin index to the hstore column, I got the following error: create index on table_ord USING gin (ops); ERROR: index row size 2728 exceeds maximum 2712 for index " table_ord_ ops_idx" How can I fix this query ? Best Regards, Burhan Akbulut DBA - Cooksoft
Re: Hstore index for full text search
Hash Cond: (o.courier_id = cc.id) Filter: (((o.tracker_code)::text ~~* '%1654323%'::text) OR ((table_cus.name)::text ~~* '%1654323%'::text) OR ((au.username)::text ~~ '%1654323%'::text) OR ((o.source)::text ~~* '%1654323%'::text) OR ((o.ops -> 'shop'::text) ~~* '%1654323%'::text) OR ((o.ops -> 'camp_code'::text) ~~* '%1654323%'::text) OR ((city.name)::text ~~* '%1654323%'::text) OR ((co.name)::text ~~* '%1654323%'::text) OR ((o.tr_code)::text ~~* '%1654323%'::text) OR ((o.ops ? 'shipping_company'::text) AND ((o.ops -> 'shipping_company'::text) ~~* '%1654323%'::text)) OR ((cc.name)::text ~~* '%1654323%'::text)) All those OR conditions on different tables and fields seems like it will be unlikely that the planner will do anything with the index you are trying to create (for this query). On the error, I came across discussions on dba.stackexchange.com referencing a limit of about 1/3 of the page size (8192) for every key because of it being a btree underneath. It could be one or more of your keys in ops (like shop, camp_code, and shipping_company) is much longer than those examples shown in the query.
Re: Hstore index for full text search
Michael Lewis writes: > Hash Cond: (o.courier_id = cc.id) > Filter: (((o.tracker_code)::text ~~* '%1654323%'::text) OR > ((table_cus.name)::text > ~~* '%1654323%'::text) OR ((au.username)::text ~~ '%1654323%'::text) OR > ((o.source)::text ~~* '%1654323%'::text) OR ((o.ops -> 'shop'::text) ~~* > '%1654323%'::text) OR ((o.ops -> 'camp_code'::text) ~~* '%1654323%'::text) > OR ((city.name)::text ~~* '%1654323%'::text) OR ((co.name)::text ~~* > '%1654323%'::text) OR ((o.tr_code)::text ~~* '%1654323%'::text) OR ((o.ops > ? 'shipping_company'::text) AND ((o.ops -> 'shipping_company'::text) ~~* > '%1654323%'::text)) OR ((cc.name)::text ~~* '%1654323%'::text)) > All those OR conditions on different tables and fields seems like it will > be unlikely that the planner will do anything with the index you are trying > to create (for this query). A GIN index on an hstore column only provides the ability to search for exact matches to hstore key strings. There are a few bells and whistles, like the ability to AND or OR such conditions. But basically it's just an exact-match engine, and it doesn't index the hstore's data values at all (which is why the implementors weren't too concerned about having a length limit on the index entries). There is 0 chance of this index type being useful for what the OP wants to do. Given these examples, I'd think about setting up a collection of pg_trgm indexes on the specific hstore keys you care about, ie something like CREATE INDEX ON mytable USING GIST ((ops -> 'camp_code') gist_trgm_ops); CREATE INDEX ON mytable USING GIST ((ops -> 'shipping_company') gist_trgm_ops); ... which'd allow indexing queries like ... WHERE (ops -> 'camp_code') LIKE '%1654323%' OR (ops -> 'shipping_company') LIKE '%1654323%' I'm not sure how far this will get you, though; if there's a whole lot of different keys of interest, maintaining a separate index for each one is probably too much overhead. Another point is that you will only get an indexscan if *every* OR'd clause matches some index. The example query looks sufficiently unstructured that that might be hard to ensure. I kind of wonder whether this data design is actually a good idea. It doesn't seem to match your querying style terribly well. regards, tom lane
Re: Hstore index for full text search
On Tue, Aug 11, 2020 at 4:46 PM Tom Lane wrote: > A GIN index on an hstore column only provides the ability to search for > exact matches to hstore key strings. There are a few bells and whistles, > like the ability to AND or OR such conditions. But basically it's just an > exact-match engine, and it doesn't index the hstore's data values at all > (which is why the implementors weren't too concerned about having a length > limit on the index entries). There is 0 chance of this index type being > useful for what the OP wants to do. > Thanks for sharing. More like json path ops and not the full key and value. Interesting. > Another point is that you will only > get an indexscan if *every* OR'd clause matches some index. The example > query looks sufficiently unstructured that that might be hard to ensure. > Does this still apply when the where clauses are on several tables and not just one?
Re: Is there a known bug with SKIP LOCKED and "tuple to be locked was already moved to another partition due to concurrent update"?
Hi Gunther & List, I think I have an extremely similar issue and things point in the same direction of a potential issue for skip locked on partitioned tables. Background is I had a queue table on v9.6 with fairly high volume (>50M items, growth in the 1+M/daily). Processing the queue with FOR UPDATE SKIP LOCKED was reliable but traffic volumes on v9.6 and the fact v12 is current let to migrating to v12 and using a partitioned table. Queue has distinct categories of items, so the table is partitioned by list on each category. Processing in 1 category results in it being updated to the next logical category (i.e. it will migrate partition once it is processed). Within each category, there can be 10'sM rows, so the list partition is hash partitioned as well. I don't think this is the issue but is mentioned for completeness. Now, when processing the queue, there are regular transaction aborts with "tuple to be locked was already moved to another partition due to concurrent update". From everything I can trace, it really does look like this is caused by rows which should be locked/skipped as they are processed by a different thread. I tried switching 'for update' to 'for key share' and that created a cascade of deadlock aborts, so was worse for my situation. For now, I roll back and repeat the select for update skip locked until it succeeds - which it eventually does. However, it really feels like these should just have been skipped by PostgreSQL without the rollback/retry until success. So, am I missing something/doing it wrong? Or could there be a potential issue that needs raised? Thanks Jim On 30-Jun.-2020 12:10, Gunther Schadow wrote: Hi all, long time ago I devised with your help a task queuing system which uses SELECT ... FOR UPDATE SKIP LOCKED for many parallel workers to find tasks in the queue, and it used a partitioned table where the hot part of the queue is short and so the query for a job is quick and the skip locked locking makes sure that one job is only assigned to one worker. And this works pretty well for me, except that when we run many workers we find a lot of these failures occurring: "tuple to be locked was already moved to another partition due to concurrent update" This would not exactly look like a bug, because the message says "to be locked", so at least it's not allowing two workers to lock the same tuple. But it seems that the skip-locked mode should not make an error out of this, but treat it as the tuple was already locked. Why would it want to lock the tuple (representing the job) if another worker has already finished his UPDATE of the job to mark it as "done" (which is what makes the tuple move to the "completed" partition.) Either the SELECT for jobs to do returned a wrong tuple, which was already update, or there is some lapse in the locking. Either way it would seem to be a waste of time throwing all these errors when the tuple should not even have been selected for update and locking. I wonder if anybody knows anything about that issue? Of course you'll want to see the DDL and SQL queries, etc. but you can't really try it out unless you do some massively parallel magic. So I figured I just ask. regards, -Gunther
Re: Hstore index for full text search
Michael Lewis writes: > On Tue, Aug 11, 2020 at 4:46 PM Tom Lane wrote: >> Another point is that you will only >> get an indexscan if *every* OR'd clause matches some index. The example >> query looks sufficiently unstructured that that might be hard to ensure. > Does this still apply when the where clauses are on several tables and not > just one? Yeah. In that case there's no hope of an indexscan at all, since for all the planner knows, the query might match some table rows that don't meet any of the conditions mentioned for that table's columns. If you can write WHERE (condition-on-t1.a OR condition-on-t1.b OR ...) AND (condition-on-t2.x OR condition-on-t2.y OR ...) then you've got a chance of making the OR'd conditions into index qualifications on t1 or t2 respectively. But if it's WHERE condition-on-t1.a OR condition-on-t1.b OR ... OR condition-on-t2.x OR condition-on-t2.y OR ... then you're in for full-table scans. (This is another thing that was bothering me about the data design, though I failed to think it through clearly enough to state before.) regards, tom lane
