pg11: Query using index, but only for half the partitions
Hello all

Does anyone have an explanation for this? The query uses only three columns of the table "snap", and all three are in an index. The planner seems to think that some partitions are better scanned in full, yet for the other half of them it's using the index just fine.

Can someone enlighten me as to what could be happening here?

Cheers
Stefan

Table:

\d+ snap
                                                        Table "statspack.snap"
      Column       |            Type             | Collation | Nullable |             Default              | Storage  | Stats target | Description
-------------------+-----------------------------+-----------+----------+----------------------------------+----------+--------------+-------------
 snap_id           | integer                     |           |          | nextval('snap_id_seq'::regclass) | plain    |              |
 start_time        | timestamp without time zone |           |          |                                  | plain    |              |
 end_time          | timestamp without time zone |           |          |                                  | plain    |              |
 metadata_snap_id  | integer                     |           |          |                                  | plain    |              |
 instance_id       | character varying(200)      |           |          |                                  | extended |              |
 data_size         | bigint                      |           |          |                                  | plain    |              |
 state             | text                        |           |          |                                  | extended |              |
Partition key: RANGE (start_time)
Indexes:
    "snap_snap_id_ix" UNIQUE, btree (start_time, snap_id)
    "snap_instance_id_ix" btree (start_time, snap_id, instance_id)
Partitions: snap_20200225 FOR VALUES FROM ('2020-02-25 00:00:00') TO ('2020-02-26 00:00:00'),
            snap_20200226 FOR VALUES FROM ('2020-02-26 00:00:00') TO ('2020-02-27 00:00:00'),
            ... (omitted for clarity)
            snap_20200423 FOR VALUES FROM ('2020-04-23 00:00:00') TO ('2020-04-24 00:00:00')

Query predicates:

from statspack.cloudwatch_metrics m,
     statspack.snap s,
     statspack.settings l
where m.snap_id = s.snap_id
  and m.start_time = s.start_time
  and s.snap_id = l.snap_id
  and s.start_time = l.start_time
  and l.name = 'max_connections'
  and s.instance_id::varchar = 'test01'
  and m.start_time > now() - interval '30 days'
order by m.snap_id

Plan:

                                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=4972.00..4974.41 rows=21 width=212) (actual time=131.541..132.934 rows=2763 loops=1)
   ->  Subquery Scan on m  (cost=4972.00..4972.26 rows=21 width=124) (actual time=125.612..127.744 rows=2763 loops=1)
         ->  Sort  (cost=4972.00..4972.05 rows=21 width=188) (actual time=125.609..126.226 rows=2763 loops=1)
               Sort Key: m_1.snap_id
               Sort Method: quicksort  Memory: 830kB
               ->  WindowAgg  (cost=4970.49..4971.54 rows=21 width=188) (actual time=120.667..124.384 rows=2763 loops=1)
                     ->  Sort  (cost=4970.49..4970.54 rows=21 width=147) (actual time=120.614..121.186 rows=2763 loops=1)
                           Sort Key: (date_trunc('day'::text, m_1.start_time))
                           Sort Method: quicksort  Memory: 485kB
                           ->  WindowAgg  (cost=2050.81..4970.02 rows=21 width=147) (actual time=118.370..119.501 rows=2763 loops=1)
                                 ->  Nested Loop  (cost=2050.81..4968.92 rows=21 width=107) (actual time=7.974..113.780 rows=2763 loops=1)
                                       ->  Hash Join  (cost=2050.53..4309.54 rows=18 width=40) (actual time=7.942..83.302 rows=2763 loops=1)
                                             Hash Cond: ((m_1.snap_id = s.snap_id) AND (m_1.start_time = s.start_time))
                                             ->  Append  (cost=0.00..1870.00 rows=51845 width=28) (actual time=0.059..60.010 rows=51681 loops=1)
                                                   Subplans Removed: 19
                                                   ->  Seq Scan on cloudwatch_metrics_20200324 m_1  (cost=0.00..37.53 rows=1068 width=28) (actual time=0.058..0.676 rows=1070 loops=1)
                                                         Filter: (start_time > (now() - '30 days'::interval))
                                                         Rows Removed by Filter: 160
                                                   ->  Seq Scan on cloudwatch_metrics_20200325 m_2  (cost=0.00..54.59 rows=1805 width=28) (actual time=0.006..1.062 rows=1805 loops=1)
                                                         Filter: (start_time > (now() - '30 days'::interval))
                                                   ... (omitted for clarity)
                                                   ->  Seq Scan on cloudwatch_metrics_20200423 m_31  (cost=0.00..17.99 rows=571 width=28) (actual time=0.008..0.325 rows=572 loops=1)
                                                         Filter: (start_time > (now() - '30 days'::interval))
                                             ->  Hash  (cost=2007.12..2007.12 rows=2894 width=12) (actual time=7.836..7.836 rows=2878 loops=1)
                                                   Buckets: 4096  Batches: 1  Memory Usage: 167kB
                                                   ->  Append  (cost=0.00..20
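For context, one way to see how large the planner thinks each partition is (reltuples and relpages come from the last ANALYZE or VACUUM); this is only a rough sketch, using the parent table name from the \d+ output above:

-- Per-partition size estimates as seen by the planner.
-- 'statspack.snap' is the parent table from the \d+ output above.
select c.relname   as partition_name,
       c.relpages  as pages,
       c.reltuples as estimated_rows
from pg_inherits i
join pg_class c on c.oid = i.inhrelid
where i.inhparent = 'statspack.snap'::regclass
order by c.relname;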
Re: pg11: Query using index, but only for half the partitions
Thanks Andreas,

But I don't think that's what's happening. Take this example:

->  Seq Scan on snap_20200328 s_23  (cost=0.00..51.73 rows=95 width=12) (actual time=0.007..0.225 rows=95 loops=1)
      Filter: ((instance_id)::text = 'test01'::text)
      Rows Removed by Filter: 1723

There's no question that this is more expensive than just reading the 95 rows from the index directly and returning them - table access is not required for this query. The index fully satisfies the query's columns, as is evident in the output:

->  Index Only Scan using snap_20200325_start_time_snap_id_instance_id_idx on snap_20200325 s_20  (cost=0.28..74.80 rows=95 width=12) (actual time=0.011..0.113 rows=95 loops=1)
      Index Cond: (instance_id = 'test01'::text)
      Heap Fetches: 0

That is, unless it doesn't consider that fact when costing?

On Thu, Apr 23, 2020 at 5:01 PM Andreas Kretschmer wrote:

>
>
> On 23.04.20 at 10:13, Stefan Knecht wrote:
> > Seq Scan on snap_20200225 s (cost=0.00..1.19 rows=1 width=12)
>
> the partition is very small, so it's cheaper to scan only the table (one
> block) than index + table (1 + 1 block).
>
>
> Regards, Andreas
>
> --
> 2ndQuadrant - The PostgreSQL Support Company.
> www.2ndQuadrant.com

--
// zztat - The Next-Gen Oracle Performance Monitoring and Reaction Framework!
Visit us at zztat.net | @zztat_oracle | fb.me/zztat | zztat.net/blog/
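One factor worth checking here: even when an index covers every referenced column, an index-only scan still has to visit the heap for any page not marked all-visible, and the planner estimates that cost from pg_class.relallvisible. Partitions that haven't been vacuumed recently can therefore look more expensive to scan via the index than one might expect. A rough sketch to compare visibility-map coverage per partition (again assuming the parent table name from the thread):

-- How much of each snap partition the visibility map covers; pages that are
-- not all-visible force heap fetches even for an "index only" scan.
select c.relname,
       c.relallvisible,
       c.relpages,
       round(100.0 * c.relallvisible / nullif(c.relpages, 0), 1) as pct_all_visible
from pg_inherits i
join pg_class c on c.oid = i.inhrelid
where i.inhparent = 'statspack.snap'::regclass
order by c.relname;

-- A VACUUM refreshes the visibility map (and relallvisible):
-- vacuum analyze statspack.snap;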
Re: Oracle vs. PostgreSQL - a comment
Okay, I'll bite.

Comparing Postgres with Oracle is a bit like comparing a rubber duck you might buy your three-year-old with a 30-ton super tanker. Do they both float? Yeah, but that's about the only similarity. The rubber duck barely tells you how and why it floats, but the super tanker is packed with instrumentation, statistics, events and trace functionality down to every last bit of activity. Of course, that comes at a cost.

Oracle is also the single most feature-rich database out there - the feature set of Postgres isn't even 1% of what Oracle has. It's not a fair comparison. Postgres has its place, it's free, and it works well. But you can't compare it to an RDBMS like Oracle - not in terms of size, nor the time it takes to install (and your 2 hours are definitely on the high end - it shouldn't take much more than half an hour).

In fact, you likely want to limit the feature set you install with Oracle: to keep it as lean as possible, to reduce bugs (yes, it contains many millions of lines of code more than Postgres, and since it's written by humans it will obviously have more bugs), and also to reduce the time it takes to install, upgrade and patch it. There are ways to do that.

That's my THB 0.02

On Sat, May 30, 2020 at 7:21 PM Paul Förster wrote:

> Hi,
>
> I know, this list is not for this, but I just couldn't resist. Please
> forgive me.
>
> Being an Oracle DBA for two decades now (back then starting with Oracle
> 8.0.5) and only doing PostgreSQL since version 10.3, I feel compelled to
> share some of my experiences with both.
>
> Quick facts:
>
> All installations were performed on the same host except for Oracle Data
> Guard and Patroni with etcd, which both require at least three nodes (or an
> odd number >2, respectively) to establish a democracy to determine the
> master/primary and replica/standby databases. However, all machines have
> the same hardware and operating system:
>
> OS: openSUSE Leap 15.1 (server setup, not desktop)
> CPU: Intel i7-7700T CPU
> RAM: 32 GB
> Disk Hardware: SSD
>
> Also, Oracle requires 161 additional packages to be installed, many of
> which are 32-bit packages, for a supposedly 64-bit only software! This
> results in 150 MB additional disk space needed and swamps the system with
> 32-bit packages!
>
> PostgreSQL only requires a few packages to be installed depending on the
> options one chooses to compile the source with. Anyway, none of these
> packages require a 32-bit version!
>
> Size of installation:
>
> Software:
> $ du -sh /data/postgres/12.3 /data/oracle/product/19.6
> 62M     /data/postgres/12.3
> 8.5G    /data/oracle/product/19.6
>
> Databases:
> $ du -sh /data/oradb/*
> 3.3G    /data/oradb/cdb01    # Oracle Container w/ 1 PDB
> 1.8G    /data/oradb/sdb01    # Oracle stand alone database
> $ du -sh /data/pgdb/sdb01
> 659M    /data/pgdb/sdb01     # PostgreSQL 12.3 database cluster
>
> All databases are a clean setup, no schemas, users, tables, data, etc.
> Just an empty base.
>
> Installation:
> Oracle 19c: ~2h
>     unzip to ORACLE_HOME
>     runInstaller
>     unzip newest OPatch p6880880_20_Linux-x86-64
>     apply p30797938_19_Linux-x86-64 (19.6.1)
> PostgreSQL 12.3 compiled from source: ~3m30s
>     bunzip postgresql-12.3.tar.bz2 to PGHOME
>     make install-world
>
> Create database:
> Oracle:
>     create stand alone database: ~30m
>     create container database (cdb$root): ~47m
>     create pluggable database (pdb): ~26s
>     Memory to run a database reasonably well:
>         at least 1 GB SGA for a stand alone database
>         at least 4 GB SGA for a container database
> PostgreSQL:
>     initdb: <1s
>     create database: <200ms
>     Memory: 128 MB db_buffers
>
> Start/stop database:
> Oracle:
>     startup (standard, 1 GB SGA): ~15s
>     shutdown immediate (standard, 1 GB SGA): ~21s
>     startup (container, 4 GB SGA): ~16s
>     shutdown immediate (container, 4 GB SGA): ~23s
> PostgreSQL:
>     pg_ctl start: 0.1s
>     pg_ctl stop: 0.2s
>
> Other discoveries/experiences:
> Oracle:
>     Set up Data Guard (2 nodes) with observer (3rd node): ~4h
>     Applying a PSU or RU often requires downtime of 60m-90m.
>     Migrating a major version often requires downtime of 60m-90m.
>     Migrating a new major version requires a lot of work in advance.
>     Switching Data Guard takes ~1m.
> PostgreSQL:
>     Set up Patroni (2 nodes) with etcd (3 nodes): ~30m
>     Applying a new minor version requires downtime of <2s.
>     Migrating a new major version requires downtime of <20s.
>     Migrating a new major version requires a few minutes work in advance.
>     Switching Patroni takes ~1s.
>
> Oracle has some good concept
Re: Value Too long varchar(100)
You should be able to do something with this if you require it to be done with COPY:

https://paquier.xyz/postgresql-2/postgres-9-3-feature-highlight-copy-tofrom-program/

But, as David suggested, I'd also recommend doing an intermediate step: load the data into a table, or pre-process the file to be how you want it.

On Thu, Oct 27, 2022 at 7:19 PM David G. Johnston <david.g.johns...@gmail.com> wrote:

>
>
> On Thu, Oct 27, 2022 at 5:02 AM Rama Krishnan wrote:
>
>> Hi team,
>>
>> We are getting a csv file from a client to upload data into my db table.
>> One particular column, i.e. the client description column, contains more
>> than 100 characters, hence I am getting "value too long" for varchar(100),
>> so we decided to load only the first 100 characters into the db. How can
>> we do this with the copy command?
>>
>
> You cannot. Either fix the content of the file or remove the arbitrary
> length limitation on the field (i.e., change the type to "text"). I
> suggest the latter. You may also copy into a temporary staging table that
> lacks the limit, then use insert to move the transformed data (via a select
> query) into the production table.
>
> David J.
>
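A rough sketch of the staging-table route David describes, with made-up table, column, and file names (the real CSV layout and target table are not shown in the thread):

-- Hypothetical staging table without the length limit.
create temporary table clients_staging (
    client_id   integer,
    description text
);

-- \copy runs COPY from the client side; adjust the file name and options.
\copy clients_staging from 'clients.csv' with (format csv, header true)

-- Move the data across, keeping only the first 100 characters.
insert into clients (client_id, description)
select client_id, left(description, 100)
from clients_staging;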
To create or not to create that index
Hello

Why does this happen?

profile_aggregates=> create index concurrently foo_idx on agg (status, foots, created_ts);
^CCancel request sent
ERROR:  canceling statement due to user request
profile_aggregates=>
profile_aggregates=> create index concurrently foo_idx on agg (status, foots, created_ts);
ERROR:  relation "foo_idx" already exists

Are these operations not atomic?

Cheers
Stefan
Re: To create or not to create that index
But that "invalid" index is being used by queries

On Fri, Aug 18, 2023 at 10:41 AM David G. Johnston <david.g.johns...@gmail.com> wrote:

> On Thu, Aug 17, 2023 at 8:37 PM Stefan Knecht wrote:
>
>> Hello
>>
>> Why does this happen?
>>
>> profile_aggregates=> create index concurrently foo_idx on agg (status,
>> foots, created_ts);
>> ^CCancel request sent
>> ERROR:  canceling statement due to user request
>> profile_aggregates=>
>> profile_aggregates=> create index concurrently foo_idx on agg (status,
>> foots, created_ts);
>> ERROR:  relation "foo_idx" already exists
>>
>> Are these operations not atomic?
>>
>
> No, being atomic would interfere with doing things concurrently. Per the
> docs:
>
> In a concurrent index build, the index is actually entered as an “invalid”
> index into the system catalogs in one transaction, then two table scans
> occur in two more transactions.
> ...
> If a problem arises while scanning the table, such as a deadlock or a
> uniqueness violation in a unique index, the CREATE INDEX command will fail
> but leave behind an “invalid” index.
>
> "Problem" includes you forcibly killing it while it is running.
>
> https://www.postgresql.org/docs/current/sql-createindex.html
>
> David J.
>
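Not spelled out above, but the usual cleanup after a cancelled CREATE INDEX CONCURRENTLY is to find the leftover invalid index, drop it, and run the build again. A sketch: the catalog query is generic, and the DROP/CREATE simply reuse the names from the session above.

-- List indexes left in an invalid state by failed or cancelled concurrent builds.
select c.relname as index_name,
       t.relname as table_name
from pg_index i
join pg_class c on c.oid = i.indexrelid
join pg_class t on t.oid = i.indrelid
where not i.indisvalid;

-- Drop the leftover index and retry the concurrent build.
drop index foo_idx;   -- or: drop index concurrently foo_idx;
create index concurrently foo_idx on agg (status, foots, created_ts);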
Re: To create or not to create that index
Ah, no, it is not. Something else was changed at the same time. Sigh.

Thanks for clarifying, David

On Fri, Aug 18, 2023 at 10:42 AM Stefan Knecht wrote:

> But that "invalid" index is being used by queries
>
> On Fri, Aug 18, 2023 at 10:41 AM David G. Johnston <david.g.johns...@gmail.com> wrote:
>
>> On Thu, Aug 17, 2023 at 8:37 PM Stefan Knecht wrote:
>>
>>> Hello
>>>
>>> Why does this happen?
>>>
>>> profile_aggregates=> create index concurrently foo_idx on agg (status,
>>> foots, created_ts);
>>> ^CCancel request sent
>>> ERROR:  canceling statement due to user request
>>> profile_aggregates=>
>>> profile_aggregates=> create index concurrently foo_idx on agg (status,
>>> foots, created_ts);
>>> ERROR:  relation "foo_idx" already exists
>>>
>>> Are these operations not atomic?
>>>
>>
>> No, being atomic would interfere with doing things concurrently. Per the
>> docs:
>>
>> In a concurrent index build, the index is actually entered as an “invalid”
>> index into the system catalogs in one transaction, then two table scans
>> occur in two more transactions.
>> ...
>> If a problem arises while scanning the table, such as a deadlock or a
>> uniqueness violation in a unique index, the CREATE INDEX command will fail
>> but leave behind an “invalid” index.
>>
>> "Problem" includes you forcibly killing it while it is running.
>>
>> https://www.postgresql.org/docs/current/sql-createindex.html
>>
>> David J.
>>