pg11: Query using index, but only for half the partitions
Hello all

Does anyone have an explanation for this? The query uses only three columns of the table "snap", and all three are in an index. The planner seems to think that some partitions are better scanned in full, yet for the other half of them it's using the index just fine.

Can someone enlighten me as to what could be happening here?

Cheers
Stefan

Table:

\d+ snap
                                                        Table "statspack.snap"
      Column       |            Type             | Collation | Nullable |             Default              | Storage  | Stats target | Description
-------------------+-----------------------------+-----------+----------+----------------------------------+----------+--------------+-------------
 snap_id           | integer                     |           |          | nextval('snap_id_seq'::regclass) | plain    |              |
 start_time        | timestamp without time zone |           |          |                                  | plain    |              |
 end_time          | timestamp without time zone |           |          |                                  | plain    |              |
 metadata_snap_id  | integer                     |           |          |                                  | plain    |              |
 instance_id       | character varying(200)      |           |          |                                  | extended |              |
 data_size         | bigint                      |           |          |                                  | plain    |              |
 state             | text                        |           |          |                                  | extended |              |
Partition key: RANGE (start_time)
Indexes:
    "snap_snap_id_ix" UNIQUE, btree (start_time, snap_id)
    "snap_instance_id_ix" btree (start_time, snap_id, instance_id)
Partitions: snap_20200225 FOR VALUES FROM ('2020-02-25 00:00:00') TO ('2020-02-26 00:00:00'),
            snap_20200226 FOR VALUES FROM ('2020-02-26 00:00:00') TO ('2020-02-27 00:00:00'),
            ... (omitted for clarity)
            snap_20200423 FOR VALUES FROM ('2020-04-23 00:00:00') TO ('2020-04-24 00:00:00')

Query predicates:

from statspack.cloudwatch_metrics m,
     statspack.snap s,
     statspack.settings l
where m.snap_id = s.snap_id
  and m.start_time = s.start_time
  and s.snap_id = l.snap_id
  and s.start_time = l.start_time
  and l.name = 'max_connections'
  and s.instance_id::varchar = 'test01'
  and m.start_time > now() - interval '30 days'
order by m.snap_id

Plan:

                                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=4972.00..4974.41 rows=21 width=212) (actual time=131.541..132.934 rows=2763 loops=1)
   ->  Subquery Scan on m  (cost=4972.00..4972.26 rows=21 width=124) (actual time=125.612..127.744 rows=2763 loops=1)
         ->  Sort  (cost=4972.00..4972.05 rows=21 width=188) (actual time=125.609..126.226 rows=2763 loops=1)
               Sort Key: m_1.snap_id
               Sort Method: quicksort  Memory: 830kB
               ->  WindowAgg  (cost=4970.49..4971.54 rows=21 width=188) (actual time=120.667..124.384 rows=2763 loops=1)
                     ->  Sort  (cost=4970.49..4970.54 rows=21 width=147) (actual time=120.614..121.186 rows=2763 loops=1)
                           Sort Key: (date_trunc('day'::text, m_1.start_time))
                           Sort Method: quicksort  Memory: 485kB
                           ->  WindowAgg  (cost=2050.81..4970.02 rows=21 width=147) (actual time=118.370..119.501 rows=2763 loops=1)
                                 ->  Nested Loop  (cost=2050.81..4968.92 rows=21 width=107) (actual time=7.974..113.780 rows=2763 loops=1)
                                       ->  Hash Join  (cost=2050.53..4309.54 rows=18 width=40) (actual time=7.942..83.302 rows=2763 loops=1)
                                             Hash Cond: ((m_1.snap_id = s.snap_id) AND (m_1.start_time = s.start_time))
                                             ->  Append  (cost=0.00..1870.00 rows=51845 width=28) (actual time=0.059..60.010 rows=51681 loops=1)
                                                   Subplans Removed: 19
                                                   ->  Seq Scan on cloudwatch_metrics_20200324 m_1  (cost=0.00..37.53 rows=1068 width=28) (actual time=0.058..0.676 rows=1070 loops=1)
                                                         Filter: (start_time > (now() - '30 days'::interval))
                                                         Rows Removed by Filter: 160
                                                   ->  Seq Scan on cloudwatch_metrics_20200325 m_2  (cost=0.00..54.59 rows=1805 width=28) (actual time=0.006..1.062 rows=1805 loops=1)
                                                         Filter: (start_time > (now() - '30 days'::interval))
                                                   ... (omitted for clarity)
                                                   ->  Seq Scan on cloudwatch_metrics_20200423 m_31  (cost=0.00..17.99 rows=571 width=28) (actual time=0.008..0.325 rows=572 loops=1)
                                                         Filter: (start_time > (now() - '30 days'::interval))
                                             ->  Hash  (cost=2007.12..2007.12 rows=2894 width=12) (actual time=7.836..7.836 rows=2878 loops=1)
                                                   Buckets: 4096  Batches: 1  Memory Usage: 167kB
                                                   ->  Append  (cost=0.00..20
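For context, one way to see how large the planner thinks each partition is (reltuples and relpages come from the last ANALYZE or VACUUM); this is only a rough sketch, using the parent table name from the \d+ output above:

-- Per-partition size estimates as seen by the planner.
-- 'statspack.snap' is the parent table from the \d+ output above.
select c.relname   as partition_name,
       c.relpages  as pages,
       c.reltuples as estimated_rows
from pg_inherits i
join pg_class c on c.oid = i.inhrelid
where i.inhparent = 'statspack.snap'::regclass
order by c.relname;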
Re: pg11: Query using index, but only for half the partitions
Thanks Andreas,

But I don't think that's what's happening. Take this example:

->  Seq Scan on snap_20200328 s_23  (cost=0.00..51.73 rows=95 width=12) (actual time=0.007..0.225 rows=95 loops=1)
      Filter: ((instance_id)::text = 'test01'::text)
      Rows Removed by Filter: 1723

There's no question that this is more expensive than just reading the 95 rows from the index directly and returning them - table access is not required for this query. The index fully satisfies the query's columns, as is evident in the output:

->  Index Only Scan using snap_20200325_start_time_snap_id_instance_id_idx on snap_20200325 s_20  (cost=0.28..74.80 rows=95 width=12) (actual time=0.011..0.113 rows=95 loops=1)
      Index Cond: (instance_id = 'test01'::text)
      Heap Fetches: 0

That is, unless it doesn't consider that fact when costing?

On Thu, Apr 23, 2020 at 5:01 PM Andreas Kretschmer wrote:

>
>
> On 23.04.20 at 10:13, Stefan Knecht wrote:
> > Seq Scan on snap_20200225 s (cost=0.00..1.19 rows=1 width=12)
>
> the partition is very small, so it's cheaper to scan only the table (one
> block) than index + table (1 + 1 block).
>
>
> Regards, Andreas
>
> --
> 2ndQuadrant - The PostgreSQL Support Company.
> www.2ndQuadrant.com

--
// zztat - The Next-Gen Oracle Performance Monitoring and Reaction Framework!
Visit us at zztat.net | @zztat_oracle | fb.me/zztat | zztat.net/blog/
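One factor worth checking here: even when an index covers every referenced column, an index-only scan still has to visit the heap for any page not marked all-visible, and the planner estimates that cost from pg_class.relallvisible. Partitions that haven't been vacuumed recently can therefore look more expensive to scan via the index than one might expect. A rough sketch to compare visibility-map coverage per partition (again assuming the parent table name from the thread):

-- How much of each snap partition the visibility map covers; pages that are
-- not all-visible force heap fetches even for an "index only" scan.
select c.relname,
       c.relallvisible,
       c.relpages,
       round(100.0 * c.relallvisible / nullif(c.relpages, 0), 1) as pct_all_visible
from pg_inherits i
join pg_class c on c.oid = i.inhrelid
where i.inhparent = 'statspack.snap'::regclass
order by c.relname;

-- A VACUUM refreshes the visibility map (and relallvisible):
-- vacuum analyze statspack.snap;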
Re: Oracle vs. PostgreSQL - a comment
Okay, I'll bite.

Comparing Postgres with Oracle is a bit like comparing a rubber duck you might buy your three-year-old with a 30-ton super tanker. Do they both float? Yeah, but that's about the only similarity. The rubber duck barely tells you how and why it floats, but the super tanker is packed with instrumentation, statistics, events and trace functionality down to every last bit of activity. Of course, that comes at a cost.

Oracle is also the single most feature-rich database out there - the feature set of Postgres isn't even 1% of what Oracle has. It's not a fair comparison. Postgres has its place, it's free, and it works well. But you can't compare it to an RDBMS like Oracle - not in terms of size, nor the time it takes to install (and your 2 hours are definitely on the high end - it shouldn't take much more than half an hour).

In fact, you likely want to limit the feature set you install with Oracle: to keep it as lean as possible, to reduce bugs (yes, it contains many millions of lines of code more than Postgres, and since it's written by humans it will obviously have more bugs), and also to reduce the time it takes to install, upgrade and patch it. There are ways to do that.

That's my THB 0.02

On Sat, May 30, 2020 at 7:21 PM Paul Förster wrote:

> Hi,
>
> I know, this list is not for this, but I just couldn't resist. Please
> forgive me.
>
> Being an Oracle DBA for two decades now (back then starting with Oracle
> 8.0.5) and only doing PostgreSQL since version 10.3, I feel compelled to
> share some of my experiences with both.
>
> Quick facts:
>
> All installations were performed on the same host except for Oracle Data
> Guard and Patroni with etcd, which both require at least three nodes (or an
> odd number >2, respectively) to establish a democracy to determine the
> master/primary and replica/standby databases. However, all machines have
> the same hardware and operating system:
>
> OS: openSUSE Leap 15.1 (server setup, not desktop)
> CPU: Intel i7-7700T CPU
> RAM: 32 GB
> Disk Hardware: SSD
>
> Also, Oracle requires 161 additional packages to be installed, many of
> which are 32-bit packages, for a supposedly 64-bit only software! This
> results in 150 MB additional disk space needed and swamps the system with
> 32-bit packages!
>
> PostgreSQL only requires a few packages to be installed depending on the
> options one chooses to compile the source with. Anyway, none of these
> packages require a 32-bit version!
>
> Size of installation:
>
> Software:
> $ du -sh /data/postgres/12.3 /data/oracle/product/19.6
> 62M     /data/postgres/12.3
> 8.5G    /data/oracle/product/19.6
>
> Databases:
> $ du -sh /data/oradb/*
> 3.3G    /data/oradb/cdb01    # Oracle Container w/ 1 PDB
> 1.8G    /data/oradb/sdb01    # Oracle stand alone database
> $ du -sh /data/pgdb/sdb01
> 659M    /data/pgdb/sdb01     # PostgreSQL 12.3 database cluster
>
> All databases are a clean setup, no schemas, users, tables, data, etc.
> Just an empty base.
>
> Installation:
> Oracle 19c: ~2h
>     unzip to ORACLE_HOME
>     runInstaller
>     unzip newest OPatch p6880880_20_Linux-x86-64
>     apply p30797938_19_Linux-x86-64 (19.6.1)
> PostgreSQL 12.3 compiled from source: ~3m30s
>     bunzip postgresql-12.3.tar.bz2 to PGHOME
>     make install-world
>
> Create database:
> Oracle:
>     create stand alone database: ~30m
>     create container database (cdb$root): ~47m
>     create pluggable database (pdb): ~26s
>     Memory to run a database reasonably well:
>         at least 1 GB SGA for a stand alone database
>         at least 4 GB SGA for a container database
> PostgreSQL:
>     initdb: <1s
>     create database: <200ms
>     Memory: 128 MB db_buffers
>
> Start/stop database:
> Oracle:
>     startup (standard, 1 GB SGA): ~15s
>     shutdown immediate (standard, 1 GB SGA): ~21s
>     startup (container, 4 GB SGA): ~16s
>     shutdown immediate (container, 4 GB SGA): ~23s
> PostgreSQL:
>     pg_ctl start: 0.1s
>     pg_ctl stop: 0.2s
>
> Other discoveries/experiences:
> Oracle:
>     Set up Data Guard (2 nodes) with observer (3rd node): ~4h
>     Applying a PSU or RU often requires downtime of 60m-90m.
>     Migrating a major version often requires downtime of 60m-90m.
>     Migrating a new major version requires a lot of work in advance.
>     Switching Data Guard takes ~1m.
> PostgreSQL:
>     Set up Patroni (2 nodes) with etcd (3 nodes): ~30m
>     Applying a new minor version requires downtime of <2s.
>     Migrating a new major version requires downtime of <20s.
>     Migrating a new major version requires a few minutes work in advance.
>     Switching Patroni takes ~1s.
>
> Oracle has some good concept
Re: Value Too long varchar(100)
You should be able to do something with this if you require it to be done with COPY:

https://paquier.xyz/postgresql-2/postgres-9-3-feature-highlight-copy-tofrom-program/

But, as David suggested, I'd also recommend doing an intermediate step: load the data into a table, or pre-process the file to be how you want it.

On Thu, Oct 27, 2022 at 7:19 PM David G. Johnston <david.g.johns...@gmail.com> wrote:

>
>
> On Thu, Oct 27, 2022 at 5:02 AM Rama Krishnan wrote:
>
>> Hi team,
>>
>> We are getting a csv file from a client to upload data into my db table.
>> One particular column, i.e. the client description column, contains more
>> than 100 characters, hence I am getting "value too long" for varchar(100),
>> so we decided to load only the first 100 characters into the db. How can
>> we do this with the copy command?
>>
>
> You cannot. Either fix the content of the file or remove the arbitrary
> length limitation on the field (i.e., change the type to "text"). I
> suggest the latter. You may also copy into a temporary staging table that
> lacks the limit, then use insert to move the transformed data (via a select
> query) into the production table.
>
> David J.
>
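A rough sketch of the staging-table route David describes, with made-up table, column, and file names (the real CSV layout and target table are not shown in the thread):

-- Hypothetical staging table without the length limit.
create temporary table clients_staging (
    client_id   integer,
    description text
);

-- \copy runs COPY from the client side; adjust the file name and options.
\copy clients_staging from 'clients.csv' with (format csv, header true)

-- Move the data across, keeping only the first 100 characters.
insert into clients (client_id, description)
select client_id, left(description, 100)
from clients_staging;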
To create or not to create that index
Hello

Why does this happen?

profile_aggregates=> create index concurrently foo_idx on agg (status, foots, created_ts);
^CCancel request sent
ERROR:  canceling statement due to user request
profile_aggregates=>
profile_aggregates=> create index concurrently foo_idx on agg (status, foots, created_ts);
ERROR:  relation "foo_idx" already exists

Are these operations not atomic?

Cheers
Stefan
Re: To create or not to create that index
But that "invalid" index is being used by queries

On Fri, Aug 18, 2023 at 10:41 AM David G. Johnston <david.g.johns...@gmail.com> wrote:

> On Thu, Aug 17, 2023 at 8:37 PM Stefan Knecht wrote:
>
>> Hello
>>
>> Why does this happen?
>>
>> profile_aggregates=> create index concurrently foo_idx on agg (status,
>> foots, created_ts);
>> ^CCancel request sent
>> ERROR:  canceling statement due to user request
>> profile_aggregates=>
>> profile_aggregates=> create index concurrently foo_idx on agg (status,
>> foots, created_ts);
>> ERROR:  relation "foo_idx" already exists
>>
>> Are these operations not atomic?
>>
>
> No, being atomic would interfere with doing things concurrently. Per the
> docs:
>
> In a concurrent index build, the index is actually entered as an “invalid”
> index into the system catalogs in one transaction, then two table scans
> occur in two more transactions.
> ...
> If a problem arises while scanning the table, such as a deadlock or a
> uniqueness violation in a unique index, the CREATE INDEX command will fail
> but leave behind an “invalid” index.
>
> "Problem" includes you forcibly killing it while it is running.
>
> https://www.postgresql.org/docs/current/sql-createindex.html
>
> David J.
>
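Not spelled out above, but the usual cleanup after a cancelled CREATE INDEX CONCURRENTLY is to find the leftover invalid index, drop it, and run the build again. A sketch: the catalog query is generic, and the DROP/CREATE simply reuse the names from the session above.

-- List indexes left in an invalid state by failed or cancelled concurrent builds.
select c.relname as index_name,
       t.relname as table_name
from pg_index i
join pg_class c on c.oid = i.indexrelid
join pg_class t on t.oid = i.indrelid
where not i.indisvalid;

-- Drop the leftover index and retry the concurrent build.
drop index foo_idx;   -- or: drop index concurrently foo_idx;
create index concurrently foo_idx on agg (status, foots, created_ts);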
Re: To create or not to create that index
Ah, no, it is not. Something else was changed at the same time. Sigh.

Thanks for clarifying, David

On Fri, Aug 18, 2023 at 10:42 AM Stefan Knecht wrote:

> But that "invalid" index is being used by queries
>
> On Fri, Aug 18, 2023 at 10:41 AM David G. Johnston <david.g.johns...@gmail.com> wrote:
>
>> On Thu, Aug 17, 2023 at 8:37 PM Stefan Knecht wrote:
>>
>>> Hello
>>>
>>> Why does this happen?
>>>
>>> profile_aggregates=> create index concurrently foo_idx on agg (status,
>>> foots, created_ts);
>>> ^CCancel request sent
>>> ERROR:  canceling statement due to user request
>>> profile_aggregates=>
>>> profile_aggregates=> create index concurrently foo_idx on agg (status,
>>> foots, created_ts);
>>> ERROR:  relation "foo_idx" already exists
>>>
>>> Are these operations not atomic?
>>>
>>
>> No, being atomic would interfere with doing things concurrently. Per the
>> docs:
>>
>> In a concurrent index build, the index is actually entered as an “invalid”
>> index into the system catalogs in one transaction, then two table scans
>> occur in two more transactions.
>> ...
>> If a problem arises while scanning the table, such as a deadlock or a
>> uniqueness violation in a unique index, the CREATE INDEX command will fail
>> but leave behind an “invalid” index.
>>
>> "Problem" includes you forcibly killing it while it is running.
>>
>> https://www.postgresql.org/docs/current/sql-createindex.html
>>
>> David J.
>>