Re: AWS forcing PG upgrade from v9.6 a disaster
*SOLVED !!!* Below is the *new* EXPLAIN ANALYZE for *13.2* on AWS RDS
(with *no changes* to server parameters) along with the prior EXPLAIN
ANALYZE outputs for easy comparison.
While I didn't discount the significance & effect of optimizing the
server parameters, this problem always seemed to me like a fundamental
difference in how the PostgreSQL planner viewed the structure of the
query. In particular, I had a usage pattern of writing VIEWS that
worked very well with v9.6 & prior versions, but which made me suspect a
route of attack:
Since the FCC tables contain lots of one-character codes for different
conditions, to simplify maintenance & displays to humans, I created over
twenty tiny lookup tables (a dozen or so entries in each table), to
render a human-readable field as a replacement for the original
one-character field in many of the VIEWs. In some cases those
"humanized" fields were used as conditions in SELECT statements. Of
course, fields that are not referenced or selected for output from a
particular query, never get looked up (an advantage over using a JOIN
for each lookup). In some cases, for ease of handling multiple or
complex lookups, I indeed used a JOIN. All this worked fine until v10.
Here's the FROM clause that bit me:
FROM lic_en
JOIN govt_region USING (territory_id, country_id)
LEFT JOIN zip_code USING (territory_id, country_id, zip5)
LEFT JOIN "County" USING (territory_id, country_id, fips_county);
The first two JOINs are not the problem, & are in fact retained in my
solution. The problem is the third JOIN, where "fips_county" from
"County" is actually matched with the corresponding field from the
"zip_code" VIEW. Works fine, if you don't mind the performance impact
in v10 & above. It has now been rewritten, to be a sub-query for an
output field. Voila ! Back to sub-second query times.
This also solved performance issues with other queries as well. I also
now use lookup values as additional fields in the output, in addition to
the original fields, which should help some more (but means some changes
to some web pages that do queries).
-- Dean
ps: I wonder how many other RDS users of v9.6 are going to get a very
rude awakening *very soon*, as AWS is not allowing new instances of v9.6
after *August 2* (see https://forums.aws.amazon.com/ann.jspa?annID=8499
). Whether that milestone affects restores from snapshots, remains to
be seen (by others, not by me). In other words, users should plan to be
up & running on a newer version well before August. Total cost to me?
I"m in my *8th day* of dealing with this, & I still have a number of web
pages to update, due to changes in SQL field names to manage this mess.
This was certainly not a obvious solution.
*Here's from 13.2 (new):*
=> EXPLAIN ANALYZE SELECT club_count, extra_count, region_count,
callsign AS trustee_callsign, applicant_type, entity_name, licensee_id
AS _lid FROM genclub_multi_ WHERE club_count >= 5 ORDER BY extra_count
DESC, club_count DESC, entity_name;
QUERY PLAN
Sort (cost=457.77..457.77 rows=1 width=64) (actual
time=48.737..48.742 rows=43 loops=1)
Sort Key: "_Club".extra_count DESC, "_Club".club_count DESC,
"_EN".entity_name
Sort Method: quicksort Memory: 31kB
-> Nested Loop Left Join (cost=1.57..457.76 rows=1 width=64)
(actual time=1.796..48.635 rows=43 loops=1)
-> Nested Loop (cost=1.28..457.07 rows=1 width=71) (actual
time=1.736..48.239 rows=43 loops=1)
Join Filter: (("_EN".country_id =
"_GovtRegion".country_id) AND ("_EN".state = "_GovtRegion".territory_id))
Rows Removed by Join Filter: 1297
-> Nested Loop (cost=1.28..453.75 rows=1 width=70)
(actual time=1.720..47.778 rows=43 loops=1)
Join Filter: (("_HD".unique_system_identifier =
"_EN".unique_system_identifier) AND ("_HD".callsign = "_EN".callsign))
-> Nested Loop (cost=0.85..450.98 rows=1
width=65) (actual time=1.207..34.912 rows=43 loops=1)
-> Nested Loop (cost=0.43..376.57 rows=27
width=50) (actual time=0.620..20.956 rows=43 loops=1)
-> Seq Scan on "_Club"
(cost=0.00..4.44 rows=44 width=35) (actual time=0.037..0.067 rows=44
loops=1)
Filter: (club_count >= 5)
Rows Removed by Filter: 151
-> Index Scan using "_HD_callsign" on
"_HD" (cost=0.43..8.45 rows=1 width=15) (actual time=0.474..0.474
rows=1 loops=44)
Index Cond: (callsign =
"_Club".trustee_callsign)
Filter: (license_status =
'A'::bpchar)
Rows Re
Re: AWS forcing PG upgrade from v9.6 a disaster
> On May 30, 2021, at 20:07, Dean Gibson (DB Administrator) > wrote: > The first two JOINs are not the problem, & are in fact retained in my > solution. The problem is the third JOIN, where "fips_county" from "County" > is actually matched with the corresponding field from the "zip_code" VIEW. > Works fine, if you don't mind the performance impact in v10 & above. It has > now been rewritten, to be a sub-query for an output field. Voila ! Back to > sub-second query times. If, rather than a subquery, you explicitly called out the join criteria with ON, did it have the same performance benefit?
Re: AWS forcing PG upgrade from v9.6 a disaster
On 2021-05-30 20:41, Christophe Pettus wrote: On May 30, 2021, at 20:07, Dean Gibson (DB Administrator) wrote: The first two JOINs are not the problem, & are in fact retained in my solution. The problem is the third JOIN, where "fips_county" from "County" is actually matched with the corresponding field from the "zip_code" VIEW. Works fine, if you don't mind the performance impact in v10 & above. It has now been rewritten, to be a sub-query for an output field. Voila ! Back to sub-second query times. If, rather than a subquery, you explicitly called out the join criteria with ON, did it have the same performance benefit? I thought that having a "USING" clause, was semantically equivalent to an "ON" clause with the equalities explicitly stated. So no, I didn't try that. The matching that occurred is *exactly *what I wanted. I just didn't want the performance impact.
Re: AWS forcing PG upgrade from v9.6 a disaster
"Dean Gibson (DB Administrator)" writes: > I thought that having a "USING" clause, was semantically equivalent to > an "ON" clause with the equalities explicitly stated. So no, I didn't > try that. USING is not that, or at least not only that ... read the manual. I'm wondering if what you saw is some side-effect of the aliasing that USING does. regards, tom lane
Re: AWS forcing PG upgrade from v9.6 a disaster
On 2021-05-30 21:44, Tom Lane wrote: "Dean Gibson (DB Administrator)" writes: I thought that having a "USING" clause, was semantically equivalent to an "ON" clause with the equalities explicitly stated. So no, I didn't try that. USING is not that, or at least not only that ... read the manual. I'm wondering if what you saw is some side-effect of the aliasing that USING does. regards, tom lane /|USING ( /|join_column|/ [, ...] )|/ /A clause of the form //|USING ( a, b, ... )|//is shorthand for //|ON left_table.a = right_table.a AND left_table.b = right_table.b ...|//. Also, //|USING|//implies that only one of each pair of equivalent columns will be included in the join output, not both./ / / /The //|USING|//clause is a shorthand that allows you to take advantage of the specific situation where both sides of the join use the same name for the joining column(s). It takes a comma-separated list of the shared column names and forms a join condition that includes an equality comparison for each one. For example, joining //|T1|//and //|T2|//with //|USING (a, b)|//produces the join condition //|ON /|T1|/.a = /|T2|/.a AND /|T1|/.b = /|T2|/.b|//./ /Furthermore, the output of //|JOIN USING|//suppresses redundant columns: there is no need to print both of the matched columns, since they must have equal values. While //|JOIN ON|//produces all columns from //|T1|//followed by all columns from //|T2|//, //|JOIN USING|//produces one output column for each of the listed column pairs (in the listed order), followed by any remaining columns from //|T1|//, followed by any remaining columns from //|T2|//./ /Finally, //|NATURAL|//is a shorthand form of //|USING|//: it forms a //|USING|//list consisting of all column names that appear in both input tables. As with //|USING|//, these columns appear only once in the output table. If there are no common column names, //|NATURAL JOIN|//behaves like //|JOIN ... ON TRUE|//, producing a cross-product join./ I get that it's like NATURAL, in that only one column is included. Is there some other side-effect? Is the fact that I was using a LEFT JOIN, relevant? Is what I was doing, unusual (or risky)?
