The query plan get all columns but I'm using only one column.

2020-04-25 Thread Moises Lopez
We have production database that has slow queries because of the query get
all columns even if I'm using only one column.
The result is slow for tables that there are too much columns
The weird part is that there is environment that I can't reproduce it even
if they are using the same postgresql.conf
I didn't find what is the variant/configuration to avoid it
I could reproduce it using the official docker image of postgresql

* Steps to reproduce it

1. Run the following script:
docker run --name psql1 -d -e POSTGRES_PASSWORD=pwd postgres
docker exec -it --user=postgres psql1 psql
# Into docker container
CREATE DATABASE db;
\connect db;
CREATE TABLE link (
ID serial PRIMARY KEY,
url VARCHAR (255) NOT NULL,
name VARCHAR (255) NOT NULL,
description VARCHAR (255),
rel VARCHAR (50)
);
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT l1.url
FROM link l1
JOIN link l2
  ON l1.url=l2.url;

2. See result of the Query Plan:
QUERY PLAN

---
Hash Join  (cost=10.90..21.85 rows=40 width=516) (actual
time=0.080..0.081 rows=1 loops=1)
Output: l1.url
Hash Cond: ((l1.url)::text = (l2.url)::text)
Buffers: shared hit=5
->  Seq Scan on public.link l1  (cost=0.00..10.40 rows=40 width=516)
(actual time=0.010..0.011 rows=1 loops=1)
*Output: l1.id , l1.url, l1.name
, l1.description, l1.rel*
Buffers: shared hit=1
->  Hash  (cost=10.40..10.40 rows=40 width=516) (actual
time=0.021..0.021 rows=1 loops=1)
Output: l2.url
Buckets: 1024  Batches: 1  Memory Usage: 9kB
Buffers: shared hit=1
->  Seq Scan on public.link l2  (cost=0.00..10.40 rows=40
width=516) (actual time=0.010..0.011 rows=1 loops=1)
Output: l2.url
Buffers: shared hit=1
Planning Time: 0.564 ms
Execution Time: 0.142 ms

3. Notice that I'm using only the column "url" for "JOIN" and "SELECT"
section,
but the "Output" section is returning all columns.

Is there a manner to avoid returning all columns in order to get a better
performance?

Thank you in advance

* PostgreSQL version:

psql postgres -c "SELECT version()"
PostgreSQL 12.2 (Debian 12.2-2.pgdg100+1) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit

Changes made to the settings in the postgresql.conf file:  see Server
Configuration for a quick way to list them all.
without changes

Operating system and version:
cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"

-- 
Moisés López
@moylop260


Re: PostgreSQL does not choose my indexes well

2020-04-25 Thread Stephen Frost
Greetings,

* Tom Lane ([email protected]) wrote:
> Stephen Frost  writes:
> > Turns out to be because what was provided wasn't actually what was being
> > used- there's a domain in there and that seems to gum up the works and
> > make it so we don't consider the partial index as being something we can
> > use (see the discussion at the end of the other sub-thread).
> 
> Some simple experiments here don't find that a domain-type column prevents
> use of the partial index.  So it's still not entirely clear what's
> happening for the OP.  I concur with Jeff's suggestion to try forcing
> use of the desired index, and see whether it happens at all and what
> the cost estimate is.

Once burned, twice shy, I suppose- considering we weren't given the
actual DDL the first round, I'm guessing there's other differences.

> I'm also wondering exactly which Postgres version this is.

Also a good question.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: The query plan get all columns but I'm using only one column.

2020-04-25 Thread Tom Lane
Moises Lopez  writes:
> ->  Seq Scan on public.link l1  (cost=0.00..10.40 rows=40 width=516)
> (actual time=0.010..0.011 rows=1 loops=1)
> *Output: l1.id , l1.url, l1.name
> , l1.description, l1.rel*

This is normal; it is not a bug, and it is not a source of performance
issues either.  The planner is choosing to do that to avoid a projection
step in this plan node, because there's no need for one.  On the other
scan, where it *is* important to project out just the required columns to
minimize the size of the hash table above the scan, it does do so:

> ->  Seq Scan on public.link l2  (cost=0.00..10.40 rows=40
> width=516) (actual time=0.010..0.011 rows=1 loops=1)
> Output: l2.url

> Is there a manner to avoid returning all columns in order to get a better
> performance?

You have not shown us anything about what your actual performance
issue is, but this isn't it.

https://wiki.postgresql.org/wiki/Slow_Query_Questions

regards, tom lane




Re: PostgreSQL does not choose my indexes well

2020-04-25 Thread Arcadio Ortega Reinoso


I'm also wondering exactly which Postgres version this is.

Also a good question.

Thanks,

Stephen

postgresql-12/bionic-pgdg,now 12.2-2.pgdg18.04+1 amd64 [instalado]
postgresql-client-12/bionic-pgdg,now 12.2-2.pgdg18.04+1 amd64 
[instalado, automático]
postgresql-client-common/bionic-pgdg,bionic-pgdg,now 213.pgdg18.04+1 all 
[instalado, automático]
postgresql-common/bionic-pgdg,bionic-pgdg,now 213.pgdg18.04+1 all 
[instalado, automático]
postgresql-doc-11/bionic-pgdg,bionic-pgdg,now 11.7-2.pgdg18.04+1 all 
[instalado]





signature.asc
Description: PGP signature


Re: The query plan get all columns but I'm using only one column.

2020-04-25 Thread Michael Lewis
The example is nonsensical so I expect it is too contrived to be useful for
analyzing the actual problem.

Additionally, the total query time is under 1ms and most of it is planning
time. Use a prepared statement or do something else to reduce planning time
like reducing statistics target if that actually makes sense for your use
case.

Else, show us something much closer to the real problem.