Optimizer choosing the wrong plan

2018-11-26 Thread Viswanath
Postgres server version - 9.5.10
RAM - 128 GB
work_mem - 64 MB

Problematic query with explain:
Query 1 (original):
explain analyse SELECT myTable1.ID FROM myTable1 LEFT JOIN myTable2 ON
myTable1.ID = myTable2.ID WHERE ((((myTable1.bool_val = true) AND
(myTable1.small_intval IN (1,2,3))) AND ((myTable2.bigint_val = 1) AND
(myTable1.bool_val = true))) AND (((myTable1.ID >= 1) AND
(myTable1.ID <= 1)))) ORDER BY 1 DESC, 1 NULLS FIRST LIMIT 11;

                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.00..8077.43 rows=11 width=8) (actual time=6440.245..6440.245 rows=0 loops=1)
   ->  Nested Loop  (cost=1.00..1268000.55 rows=1727 width=8) (actual time=6440.242..6440.242 rows=0 loops=1)
         ->  Index Scan Backward using myTable2_fk1_idx on myTable2  (cost=0.43..1259961.54 rows=1756 width=8) (actual time=6440.241..6440.241 rows=0 loops=1)
               Filter: (bigint_val = 1)
               Rows Removed by Filter: 12701925
         ->  Index Scan using myTable1_fk2_idx on myTable1  (cost=0.56..4.57 rows=1 width=8) (never executed)
               Index Cond: ((id = myTable2.id) AND (id >= '1'::bigint) AND (id <= '1'::bigint))
               Filter: (bool_val AND bool_val AND (small_intval = ANY ('{1,2,3}'::integer[])))
 Planning time: 0.654 ms
 Execution time: 6440.353 ms
(10 rows)

The columns myTable1.ID and myTable2.bigint_val are both indexed.

The table myTable2 contains 12701952 rows, of which only 86227 are not null
and 146 are distinct.

The above query returns 0 rows, since the 'myTable2.bigint_val = 1' criterion
matches nothing. It takes 6 seconds to execute because the planner chooses
myTable1.ID column's index.


If I use NULLS LAST in the ORDER BY clause, the planner chooses the plan
below instead, since the index cannot be used to produce a DESC NULLS LAST
ordering. The query then executes in milliseconds.
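
A note on why, as a sketch for context: a default btree index is ASC NULLS
LAST, and scanning it backward yields DESC NULLS FIRST, which is exactly what
plain "ORDER BY 1 DESC" asks for. An index that could serve the DESC NULLS
LAST ordering directly would have to be declared with that ordering, for
example (the index name here is invented):

CREATE INDEX mytable1_id_desc_nl_idx ON myTable1 (ID DESC NULLS LAST);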

Query 2 (order by modified to avoid the index):
explain analyse SELECT myTable1.ID FROM myTable1 LEFT JOIN myTable2 ON
myTable1.ID = myTable2.ID WHERE ((((myTable1.bool_val = true) AND
(myTable1.small_intval IN (1,2,3))) AND ((myTable2.bigint_val = 1) AND
(myTable1.bool_val = true))) AND (((myTable1.ID >= 1) AND
(myTable1.ID <= 1)))) ORDER BY 1 DESC NULLS LAST, 1 NULLS FIRST LIMIT 11;


                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=11625.07..11625.10 rows=11 width=8) (actual time=0.028..0.028 rows=0 loops=1)
   ->  Sort  (cost=11625.07..11629.39 rows=1727 width=8) (actual time=0.028..0.028 rows=0 loops=1)
         Sort Key: myTable1.id DESC NULLS LAST
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=0.85..11586.56 rows=1727 width=8) (actual time=0.024..0.024 rows=0 loops=1)
               ->  Index Scan using bigint_val_idx_px on myTable2  (cost=0.29..3547.55 rows=1756 width=8) (actual time=0.024..0.024 rows=0 loops=1)
                     Index Cond: (bigint_val = 1)
               ->  Index Scan using myTable1_fk2_idx on myTable1  (cost=0.56..4.57 rows=1 width=8) (never executed)
                     Index Cond: ((id = myTable2.id) AND (id >= '1'::bigint) AND (id <= '1'::bigint))
                     Filter: (bool_val AND bool_val AND (small_intval = ANY ('{1,2,3}'::integer[])))
 Planning time: 0.547 ms
 Execution time: 0.110 ms

The reason Postgres chooses the 1st plan over the 2nd is its cost: plan 1 is
estimated at 8077.43 versus 11625.10 for plan 2. But plan 2 is obviously the
right one.

I tried running vacuum analyse on the table many times, and tried changing
the statistics target of the column to 250 (since there are only 149 distinct
values). Neither helped. The planner still thinks there are 1727 rows
matching the condition myTable2.bigint_val = 1, but there are none.

I also tried changing the LIMIT of the 1st query. Increasing the limit
increases the cost of the 1st plan, so with a LIMIT of 16 the planner chooses
the 2nd plan for the same query.
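
For context, this matches how the planner pro-rates a LIMIT over its input:
with an estimated 1727 matching rows, LIMIT 11 is charged about 1.00 +
(11/1727) * (1268000.55 - 1.00), i.e. about 8077, while LIMIT 16 is charged
about (16/1727) * 1268000, i.e. about 11747, just above plan 2's 11625.10,
hence the switch.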

Query 3 (same as 1st but limit increased to 16):

explain analyse SELECT myTable1.ID FROM myTable1 LEFT JOIN myTable2 ON
myTable1.ID = myTable2.ID WHERE ((((myTable1.bool_val = true) AND
(myTable1.small_intval IN (1,2,3))) AND ((myTable2.bigint_val = 1) AND
(myTable1.bool_val = true))) AND (((myTable1.ID >= 1) AND
(myTable1.ID <= 199

autovacuum run but last_autovacuum is empty

2018-11-26 Thread Mariel Cherkassky
Hi,
I checked pg_stat_all_tables and I see that last_autovacuum is empty for all
the tables, but in the database's log I see that autovacuum was running and
deleted records (not all of them, but still deleted...).
Can someone explain why pg_stat_all_tables doesn't show the real data?


Re: autovacuum run but last_autovacuum is empty

2018-11-26 Thread Tom Lane
Mariel Cherkassky  writes:
> I checked pg_stat_all_tables and I see that last_autovacuum is empty for all
> the tables, but in the database's log I see that autovacuum was running and
> deleted records (not all of them, but still deleted...).
> Can someone explain why pg_stat_all_tables doesn't show the real data?

Hm, are you sure the stats collector is working at all?  Are other fields
in the pg_stat data updating?

(If not, you may have a problem with firewall rules blocking traffic
on the stats loopback port.)
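
A quick way to check whether any counters are moving at all, as a sketch
using standard pg_stat_all_tables columns:

SELECT relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_all_tables
WHERE schemaname = 'public'
ORDER BY relname;

-- Run some queries or DML against a table, wait a moment, and re-run:
-- if these counters never change, the stats collector isn't receiving data.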

regards, tom lane



Re: dsa_allocate() failure

2018-11-26 Thread Jakub Glapa
So, the issue occurs only on the production db and right now I cannot
reproduce it.
I had a look at dmesg and indeed I see something like:


--
regards,
pozdrawiam,
Jakub Glapa


On Fri, Nov 23, 2018 at 5:10 PM Justin Pryzby  wrote:

> On Fri, Nov 23, 2018 at 03:31:41PM +0100, Jakub Glapa wrote:
> > Hi Justin, I've upgraded to 10.6 but the error still shows up:
> >
> > If I set it to max_parallel_workers=0 I also get it, and my connection is
> > being closed (but the server is alive):
> >
> > psql db@host as user => set max_parallel_workers=0;
>
> Can you show the plan (explain without analyze) for the nonparallel case?
>
> Also, it looks like the server crashed in that case (even if it restarted
> itself quickly).  Can you confirm ?
>
> For example: dmesg |tail might show "postmaster[8582]: segfault [...]" or
> similar.  And other clients would've been disconnected.  (For example, you'd
> get an error in another, previously-connected session the next time you run:
> SELECT 1).
>
> In any case, could you try to find a minimal way to reproduce the problem?
> I mean, is the dataset and query small and something you can publish, or can
> you reproduce with data generated from (for example) generate_series()?
>
> Thanks,
> Justin
>


Re: dsa_allocate() failure

2018-11-26 Thread Jakub Glapa
sorry, the message was sent out too early.

So, the issue occurs only on the production db and right now I cannot
reproduce it.
I had a look at dmesg and indeed I see something like:

postgres[30667]: segfault at 0 ip 557834264b16 sp 7ffc2ce1e030
error 4 in postgres[557833db7000+6d5000]

and AFAIR other sessions I had opened at that time were indeed disconnected.

When it comes to the execution plan for max_parallel_workers=0, there is no
real difference. I guess max_parallel_workers has no effect here and
max_parallel_workers_per_gather should have been used instead.
Why it caused a server crash is unknown right now.

I cannot really give a reproducible recipe.
My case is that I have a parent table with ~300 partitions, and I initiate a
select on ~100 of them with select [...] from fa where client_id() and
[filters].
I know this is not effective. Every partition has several indexes, and this
query acquires a lot of locks... even for relations not used in the query.
PG11 should have a better partition pruning mechanism, but I'm not there yet
to upgrade.
Some of the partitions have millions of rows.

I'll keep observing; maybe I'll find a pattern for when this occurs.


--
regards,
pozdrawiam,
Jakub Glapa


On Mon, Nov 26, 2018 at 4:26 PM Jakub Glapa  wrote:

> So, the issue occurs only on the production db and right now I cannot
> reproduce it.
> I had a look at dmesg and indeed I see something like:
>
>
> --
> regards,
> Jakub Glapa
>
>
> On Fri, Nov 23, 2018 at 5:10 PM Justin Pryzby 
> wrote:
>
>> On Fri, Nov 23, 2018 at 03:31:41PM +0100, Jakub Glapa wrote:
>> > Hi Justin, I've upgraded to 10.6 but the error still shows up:
>> >
>> > If I set it to max_parallel_workers=0 I also get it, and my connection is
>> > being closed (but the server is alive):
>> >
>> > psql db@host as user => set max_parallel_workers=0;
>>
>> Can you show the plan (explain without analyze) for the nonparallel case?
>>
>> Also, it looks like the server crashed in that case (even if it restarted
>> itself quickly).  Can you confirm ?
>>
>> For example: dmesg |tail might show "postmaster[8582]: segfault [...]" or
>> similar.  And other clients would've been disconnected.  (For example,
>> you'd get an error in another, previously-connected session the next time
>> you run: SELECT 1).
>>
>> In any case, could you try to find a minimal way to reproduce the problem?
>> I mean, is the dataset and query small and something you can publish, or
>> can you reproduce with data generated from (for example) generate_series()?
>>
>> Thanks,
>> Justin
>>
>


Re: autovacuum run but last_autovacuum is empty

2018-11-26 Thread Mariel Cherkassky
I think that something is wrong, because some tables have a value in
last_autoanalyze and some don't, while most of them were analyzed. For
example:

     relname     | n_live_tup | n_dead_tup |        last_autovacuum        |       last_autoanalyze
-----------------+------------+------------+-------------------------------+-------------------------------
 tbl1            |        975 |         37 |                               | 2018-11-26 16:15:51.557115+02
 tbl2            |    4283798 |     739345 | 2018-11-26 17:43:46.663709+02 | 2018-11-26 17:44:38.234908+02
 tbl3            |    3726015 |     596362 | 2018-11-26 17:37:33.726438+02 | 2018-11-26 17:48:30.126623+02
 pg_language     |          0 |          0 |                               |
 pg_toast_19018  |          0 |          0 |                               |
 pg_toast_13115  |          0 |          0 |                               |
 tbl4            |          0 |          0 |                               |


log data:
[root@pg_log]# cat postgresql-Sun.log | grep automatic | awk {'print $6 $7,$10'} | grep db1 | sort -n | uniq -c
  1 automaticanalyze "db1.public.attributes"
  4 automaticanalyze "db1.public.tbl3"
  3 automaticanalyze "db1.public.tbl3"
 11 automaticanalyze "db1.public.tbl2"
  1 automaticanalyze "db1.public.tbl4"
  5 automaticanalyze "db1.public.tbl1"
  1 automaticvacuum "db1.pg_catalog.pg_statistic":
  5 automaticvacuum "db1.pg_toast.pg_toast_19239":
  1 automaticvacuum "db1.pg_toast.pg_toast_19420":
  2 automaticvacuum "db1.pg_toast.pg_toast_2619":
  3 automaticvacuum "db1.public.tbl3":
  9 automaticvacuum "db1.public.tbl2":
  1 automaticvacuum "db1.public.tbl4":


I just changed the names of the tables, but all other values are accurate.


Firewall rules should block all stats, not just some of them, right?
Something is weird...





On Mon, 26 Nov 2018 at 17:25, Tom Lane <[email protected]> wrote:

> Mariel Cherkassky  writes:
> > I checked pg_stat_all_tables and I see that last_autovacuum is empty for
> > all the tables, but in the database's log I see that autovacuum was
> > running and deleted records (not all of them, but still deleted...).
> > Can someone explain why pg_stat_all_tables doesn't show the real data?
>
> Hm, are you sure the stats collector is working at all?  Are other fields
> in the pg_stat data updating?
>
> (If not, you may have a problem with firewall rules blocking traffic
> on the stats loopback port.)
>
> regards, tom lane
>


Re: dsa_allocate() failure

2018-11-26 Thread Justin Pryzby
Hi, thanks for following through.

On Mon, Nov 26, 2018 at 04:38:35PM +0100, Jakub Glapa wrote:
> I had a look at dmesg and indeed I see something like:
> 
> postgres[30667]: segfault at 0 ip 557834264b16 sp 7ffc2ce1e030
> error 4 in postgres[557833db7000+6d5000]

That's useful; I think "at 0" means a null pointer was dereferenced.

Can you check /var/log/messages (or ./syslog or similar) and verify the
timestamp matches the time of the last crash (and not an unrelated crash) ?

The logs might also indicate if the process dumped a core file anywhere. 

I don't know what distribution/OS you're using, but it might be good to install
abrt (RHEL) or apport (ubuntu) or other mechanism to save coredumps, or to
manually configure /proc/sys/kernel/core_pattern.

On centos, I usually set:
/etc/abrt/abrt-action-save-package-data.conf
OpenGPGCheck = no

Also, it might be good to install debug symbols, in case you do find a core
dump now or get one later.

On centos: yum install postgresql10-debuginfo, or: debuginfo-install
postgresql10-server
Make sure the debug symbols exactly match the server version.

Justin



Re: Optimizer choosing the wrong plan

2018-11-26 Thread Jeff Janes
On Mon, Nov 26, 2018 at 5:11 AM Viswanath  wrote:

> Postgres server version - 9.5.10
> RAM - 128 GB
> work_mem - 64 MB
>
> Problematic query with explain:
> Query 1 (original):
> explain analyse SELECT myTable1.ID FROM myTable1 LEFT JOIN myTable2 ON
> myTable1.ID = myTable2.ID WHERE ((((myTable1.bool_val = true) AND
> (myTable1.small_intval IN (1,2,3))) AND ((myTable2.bigint_val = 1) AND
> (myTable1.bool_val = true))) AND (((myTable1.ID >= 1) AND
> (myTable1.ID <= 1)))) ORDER BY 1 DESC, 1 NULLS FIRST LIMIT 11;
>

There is no point doing a LEFT JOIN when the NULL-extended rows get
filtered out later.

Also, ordering by the same column twice is peculiar, to say the least.
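
A cleaned-up equivalent would look something like this (just a sketch; it
keeps the apparently-redacted ID bounds exactly as written):

SELECT myTable1.ID
FROM myTable1
JOIN myTable2 ON myTable1.ID = myTable2.ID  -- inner join: the WHERE clause discards NULL-extended rows anyway
WHERE myTable1.bool_val
  AND myTable1.small_intval IN (1, 2, 3)
  AND myTable2.bigint_val = 1
  AND myTable1.ID BETWEEN 1 AND 1
ORDER BY myTable1.ID DESC  -- ordering once is enough
LIMIT 11;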


> The table myTable2 contains 12701952 rows, of which only 86227 are not null
> and 146 are distinct.
>

I assume you mean the column myTable2.ID has that many not null and
distinct?


>
> The above query returns 0 rows, since the 'myTable2.bigint_val = 1'
> criterion matches nothing. It takes 6 seconds to execute because the
> planner chooses myTable1.ID column's index.


More importantly, it chooses the index on myTable2.ID.  It does also use
the index on myTable1.ID, but that is secondary.

The ideal index for this query would probably be a composite index on
myTable2 (bigint_val, id DESC).  The planner would probably choose to use
that index even if the statistics are off.
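
For example, a sketch (the index name is invented; CONCURRENTLY just avoids
blocking writes while the index builds):

CREATE INDEX CONCURRENTLY mytable2_bigintval_id_idx
    ON myTable2 (bigint_val, id DESC);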

> I tried running vacuum analyse on the table many times, and tried changing
> the statistics target of the column to 250 (since there are only 149
> distinct values). Neither helped. The planner still thinks there are 1727
> rows matching the condition myTable2.bigint_val = 1, but there are none.
>

It would be interesting if you could upgrade a copy of your server to v11 and
try it there.  We made changes to ANALYZE in that version which were intended
to improve this situation, and it would be nice to know whether it actually
did so for your case.

Also, increasing the statistics target even beyond 250 might help.  If every
one of the observed values is seen at least twice, the system assumes it has
observed all distinct values that exist.  But if any value is seen exactly
once, that causes it to take a different path (the one which got modified in
v11).
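
For example, a sketch (1000 is just an illustration; the target can go up to
10000):

ALTER TABLE myTable2 ALTER COLUMN bigint_val SET STATISTICS 1000;
ANALYZE myTable2;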

Cheers,

Jeff


Re: dsa_allocate() failure

2018-11-26 Thread Jakub Glapa
Justin thanks for the information!
I'm running Ubuntu 16.04.
I'll try to prepare for the next crash.
Couldn't find anything this time.


--
regards,
Jakub Glapa


On Mon, Nov 26, 2018 at 4:52 PM Justin Pryzby  wrote:

> Hi, thanks for following through.
>
> On Mon, Nov 26, 2018 at 04:38:35PM +0100, Jakub Glapa wrote:
> > I had a look at dmesg and indeed I see something like:
> >
> > postgres[30667]: segfault at 0 ip 557834264b16 sp 7ffc2ce1e030
> > error 4 in postgres[557833db7000+6d5000]
>
> That's useful, I think "at 0" means a null pointer dereferenced.
>
> Can you check /var/log/messages (or ./syslog or similar) and verify the
> timestamp matches the time of the last crash (and not an unrelated crash) ?
>
> The logs might also indicate if the process dumped a core file anywhere.
>
> I don't know what distribution/OS you're using, but it might be good to
> install
> abrt (RHEL) or apport (ubuntu) or other mechanism to save coredumps, or to
> manually configure /proc/sys/kernel/core_pattern.
>
> On centos, I usually set:
> /etc/abrt/abrt-action-save-package-data.conf
> OpenGPGCheck = no
>
> Also, it might be good to install debug symbols, in case you do find a core
> dump now or get one later.
>
> On centos: yum install postgresql10-debuginfo, or: debuginfo-install
> postgresql10-server
> Make sure the debug symbols exactly match the server version.
>
> Justin
>


Re: autovacuum run but last_autovacuum is empty

2018-11-26 Thread Tom Lane
Mariel Cherkassky  writes:
> I think that something is wrong, because some tables have a value in
> last_autoanalyze and some don't, while most of them were analyzed.

It's possible that autovacuum abandoned some vacuum attempts without
getting to the end because of lock conflicts.
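
One way to catch that in the act, as a sketch (assuming 9.6 or later for the
wait-event columns in pg_stat_activity), is to watch for autovacuum workers
stuck waiting on locks:

SELECT pid, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%';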

regards, tom lane



Re: dsa_allocate() failure

2018-11-26 Thread Alvaro Herrera
On 2018-Nov-26, Jakub Glapa wrote:

> Justin thanks for the information!
> I'm running Ubuntu 16.04.
> I'll try to prepare for the next crash.
> Couldn't find anything this time.

As I recall, the apport stuff in Ubuntu is terrible ... I've seen it
take 40 minutes to write the crash dump to disk, during which the
database was "down".  I don't know why it is so slow (it's a rather
silly python script that apparently processes the core dump one byte at
a time, and you can imagine that with a few gigabytes of shared memory
that takes a while).  Anyway my recommendation was to *remove* that
stuff from the server and make sure the core file is saved by normal
means.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services