Re: FPGA optimization ...

2019-11-06 Thread AJG
From what I have read and benchmarks I have seen:

FPGA shines for writes (and, from memory, up to 3x real-world speedup
for queries, as opposed to the 10x claim)

GPU shines/outperforms FPGA for reads. There is a very recent and
interesting academic paper[1] on a high-performance GPU B-Tree (vs. LSM)
and the incredible performance it gets, but I 'think' it requires NVIDIA
(so no easy/super epyc+gpu+hbm on-chip combo solution then ;) ).

Aren't both FPGA and GPU going to require changing the executor from
pull to push to get real benefits from them? Isn't that something Andres
is working on (pull to push)?

What is really exciting is UPMEM (little 500 MHz processors on the
memory); cost will be little more than the memory cost itself, and it
shows up to 20x performance improvement on things like index search
(from memory). It has a C library, and they claim it only needs a few
hundred lines of code to integrate (from memory), but it is not clear to
me what use cases it can be used for beyond the ones they show
benchmarks for.


[1] https://escholarship.org/content/qt1ph2x5td/qt1ph2x5td.pdf?t=pkvkdm



--
Sent from: 
https://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html




Re: FPGA optimization ...

2019-11-06 Thread Tomas Vondra

On Wed, Nov 06, 2019 at 11:01:37AM -0700, AJG wrote:

> From what I have read and benchmarks I have seen:
>
> FPGA shines for writes (and, from memory, up to 3x real-world speedup
> for queries, as opposed to the 10x claim)
>
> GPU shines/outperforms FPGA for reads. There is a very recent and
> interesting academic paper[1] on a high-performance GPU B-Tree (vs. LSM)
> and the incredible performance it gets, but I 'think' it requires NVIDIA
> (so no easy/super epyc+gpu+hbm on-chip combo solution then ;) ).
>
> Aren't both FPGA and GPU going to require changing the executor from
> pull to push to get real benefits from them? Isn't that something Andres
> is working on (pull to push)?



I think it very much depends on how the FPGA/GPU/... is used.

If we're only talking about FPGA I/O acceleration, essentially FPGA
between the database and storage, it's likely possible to get that
working without any extensive executor changes. Essentially create an
FPGA-aware variant of SeqScan and you're done. Or an FPGA-aware
tuplesort, or something like that. Neither of these should require
significant planner/executor changes, except for costing.


> What is really exciting is UPMEM (little 500 MHz processors on the
> memory); cost will be little more than the memory cost itself, and it
> shows up to 20x performance improvement on things like index search
> (from memory). It has a C library, and they claim it only needs a few
> hundred lines of code to integrate (from memory), but it is not clear to
> me what use cases it can be used for beyond the ones they show
> benchmarks for.



Interesting, and perhaps relevant for in-memory databases.



[1] https://escholarship.org/content/qt1ph2x5td/qt1ph2x5td.pdf?t=pkvkdm


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Slow planning, fast execution for particular 3-table query

2019-11-06 Thread David Wheeler
>> I'll try reindexing each of the tables just to make sure it's not strange 
>> index imbalance or something causing the issue.
> I've seen this issue a few times - and a reindex helps.

Alas our reindex doesn’t seem to have helped. I’m going to see if we can 
reproduce this on a non-prod environment so we can muck about a bit more. If we 
can reproduce it in a safe place, is there a tool we can use to get more info 
out of the query planner to find what it’s doing to take so long?
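In case it helps with the digging: on 9.5, EXPLAIN ANALYZE already reports planning time separately from execution time, and a plain EXPLAIN (which only plans, without executing) under \timing isolates the planning cost on its own. A sketch with placeholder table names (the real schema isn't shown in this thread):

```sql
-- Hypothetical three-table join standing in for the real query.
-- EXPLAIN ANALYZE reports "Planning time" and "Execution time"
-- separately, so the planning overhead is visible directly.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM a
JOIN b ON b.a_id = a.id
JOIN c ON c.b_id = b.id
WHERE a.tid = 42;

-- A plain EXPLAIN only plans, so under psql's \timing its elapsed
-- time is almost entirely planning:
-- \timing on
EXPLAIN SELECT * FROM a JOIN b ON b.a_id = a.id;
```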

Regards,

David

From: Pavel Stehule 
Date: Monday, 4 November 2019 at 4:53 pm
To: David Wheeler 
Cc: Tom Lane , "[email protected]" 
, Cameron Redpath 

Subject: Re: Slow planning, fast execution for particular 3-table query



On Mon, 4 Nov 2019 at 6:17, David Wheeler <[email protected]> wrote:
>To see this issue, you have to have recently
>inserted or deleted a bunch of extremal values of the indexed join-key
>column.  And the problem only persists until those values become known
>committed-good, or known dead-to-everybody.  (Maybe you've got a
>long-running transaction somewhere, postponing the dead-to-everybody
>condition?)

There are no long-running transactions that have backend_xmin set in 
pg_stat_activity, if that's what you mean here. There are also no open prepared 
transactions or replication slots which I understand have a similar 
keeping-things-alive issue.

These tables are biggish (hundreds of mb), but not changing so frequently that 
I'd expect large quantities of data to be inserted or deleted before autovac 
can get in there and clean it up. And certainly not in a single uncommitted 
transaction.

I'll try reindexing each of the tables just to make sure it's not strange index 
imbalance or something causing the issue.

I've seen this issue a few times - and a reindex helps.

Pavel


Regards,

David

On 4/11/19, 4:01 pm, "Tom Lane" <[email protected]>
wrote:

David Wheeler <[email protected]>
writes:
> We’re having trouble working out why the planning time for this
> particular query is slow (~2.5s vs 0.9ms execution time). As you can see
> below, there are only 3 tables involved so it’s hard to imagine what
> decisions the planner has to make that take so long.

I wonder whether this traces to the cost of trying to estimate the
largest/smallest value of an indexed column by looking into the index.
Normally that's pretty cheap, but if you have a lot of recently-inserted
or recently-deleted values at the end of the index, it can get painful.
AFAIR this only happens for columns that are equijoin keys, so the fact
that your query is a join is significant.

I'm not convinced that this is the problem, because it's a corner case
that few people hit.  To see this issue, you have to have recently
inserted or deleted a bunch of extremal values of the indexed join-key
column.  And the problem only persists until those values become known
committed-good, or known dead-to-everybody.  (Maybe you've got a
long-running transaction somewhere, postponing the dead-to-everybody
condition?)

> Postgres version 9.5.19

If this *is* the cause, v11 and up have a performance improvement that
you need:

https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=3ca930fc3

regards, tom lane



Re: Slow planning, fast execution for particular 3-table query

2019-11-06 Thread Michael Lewis
Is default_statistics_target set above default 100? I would assume that
would reflect in the size of pg_statistic, but wanted to ask since
increasing that from 100 to 1000 was the only time I have seen planning
time explode. Are other queries slow to plan?
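For reference, a sketch of how both of those things could be checked (standard catalog queries, nothing specific to this thread's schema assumed):

```sql
-- Current statistics target (the default is 100)
SHOW default_statistics_target;

-- Rough size of the statistics the planner has to read
SELECT pg_size_pretty(pg_total_relation_size('pg_statistic'));

-- Per-column overrides, if any (attstattarget > 0 means explicitly raised)
SELECT attrelid::regclass, attname, attstattarget
FROM pg_attribute
WHERE attstattarget > 0;
```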


Re: Slow planning, fast execution for particular 3-table query

2019-11-06 Thread Justin Pryzby
On Mon, Nov 04, 2019 at 03:04:45AM +, David Wheeler wrote:
> Postgres version 9.5.19
> Each of the tables has between 3-4 indexes, and all the indexes include tid 
> as first parameter.

On Mon, Nov 04, 2019 at 12:00:59AM -0500, Tom Lane wrote:
> If this *is* the cause, v11 and up have a performance improvement that
> you need:

But note that such an index definition has been prohibited since 9.6:

https://www.postgresql.org/docs/9.6/release-9-6.html
|Disallow creation of indexes on system columns, except for OID columns (David 
Rowley)
|Such indexes were never considered supported, and would very possibly 
misbehave since the system might change the system-column fields of a tuple 
without updating indexes. However, previously there were no error checks to 
prevent them from being created.

Justin




Re: FPGA optimization ...

2019-11-06 Thread Andres Freund
Hi,

On 2019-11-06 22:54:48 +0100, Tomas Vondra wrote:
> If we're only talking about FPGA I/O acceleration, essentially FPGA
> between the database and storage, it's likely possible to get that
> working without any extensive executor changes. Essentially create an
> FPGA-aware variant of SeqScan and you're done. Or an FPGA-aware
> tuplesort, or something like that. Neither of this should require
> significant planner/executor changes, except for costing.

I doubt that that is true.  For one, you need to teach the FPGA
to understand at least enough about the intricacies of postgres storage
format to be able to make enough sense of visibility information to
know when it is safe to look at a tuple (you can't evaluate quals
before checking visibility). It also needs to be fed a lot of
information about the layout of the table, the involved operators, etc.
And even if you define those away somehow, you still need to make sure
that the on-disk state is coherent with the in-memory state - which
definitely requires reaching outside of just a replacement seqscan node.

I've a hard time believing that, even though some storage vendors are
pushing this model heavily, the approach of performing qual evaluation
on the storage level is actually useful for anything close to a general
purpose database, especially a row store.

It's more realistic to have a model where the FPGA is fed pre-processed
data and streams out the processed results. That way there are no
problems with coherency, one can transparently handle the parts of
reading the data that the FPGA can't, etc.


But I admit I'm sceptical that even the above model is relevant for
postgres. The potential market seems likely to stay small, and there's
so much more performance work that's applicable to everyone using PG,
even without access to special purpose hardware.

Greetings,

Andres Freund




Re: Slow planning, fast execution for particular 3-table query

2019-11-06 Thread David Wheeler
> Is default_statistics_target set above default 100? I would assume that
> would reflect in the size of pg_statistic, but wanted to ask since
> increasing that from 100 to 1000 was the only time I have seen planning
> time explode. Are other queries slow to plan?

Looks like you’ve found it! Someone has set the target to 10k so that’s going 
to wildly increase planning time.

Thanks for your help! And thanks to the others who chipped in along the way 😊

Regards,

David
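If the 10k target was originally raised for the sake of a few specific columns, one way to scope it back down; a sketch with hypothetical table/column names:

```sql
-- Restore the global default (takes effect after a config reload)
ALTER SYSTEM SET default_statistics_target = 100;
SELECT pg_reload_conf();

-- Keep a higher target only where it is actually needed,
-- then refresh the statistics for that table
ALTER TABLE some_table ALTER COLUMN some_column SET STATISTICS 1000;
ANALYZE some_table;
```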



Re: Slow planning, fast execution for particular 3-table query

2019-11-06 Thread David Rowley
On Thu, 7 Nov 2019 at 11:59, Justin Pryzby  wrote:
>
> On Mon, Nov 04, 2019 at 03:04:45AM +, David Wheeler wrote:
> > Postgres version 9.5.19
> > Each of the tables has between 3-4 indexes, and all the indexes include tid 
> > as first parameter.


> But note that index definition will be prohibited since:
>
> https://www.postgresql.org/docs/9.6/release-9-6.html
> |Disallow creation of indexes on system columns, except for OID columns 
> (David Rowley)
> |Such indexes were never considered supported, and would very possibly 
> misbehave since the system might change the system-column fields of a tuple 
> without updating indexes. However, previously there were no error checks to 
> prevent them from being created.

David will have meant the user column named "tid" rather than the
system column named "ctid".

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Slow planning, fast execution for particular 3-table query

2019-11-06 Thread Justin Pryzby
On Thu, Nov 07, 2019 at 01:15:30PM +1300, David Rowley wrote:
> On Thu, 7 Nov 2019 at 11:59, Justin Pryzby  wrote:
> >
> > On Mon, Nov 04, 2019 at 03:04:45AM +, David Wheeler wrote:
> > > Postgres version 9.5.19
> > > Each of the tables has between 3-4 indexes, and all the indexes include 
> > > tid as first parameter.
> 
> > But note that index definition will be prohibited since:
> >
> > https://www.postgresql.org/docs/9.6/release-9-6.html
> > |Disallow creation of indexes on system columns, except for OID columns 
> > (David Rowley)
> > |Such indexes were never considered supported, and would very possibly 
> > misbehave since the system might change the system-column fields of a tuple 
> > without updating indexes. However, previously there were no error checks to 
> > prevent them from being created.
> 
> David will have meant the user column named "tid" rather than the
> system column named "ctid".

Ah.  And David must have meant David W :)

Justin 




Re: FPGA optimization ...

2019-11-06 Thread Tomas Vondra

On Wed, Nov 06, 2019 at 03:15:53PM -0800, Andres Freund wrote:

> Hi,
>
> On 2019-11-06 22:54:48 +0100, Tomas Vondra wrote:
> > If we're only talking about FPGA I/O acceleration, essentially FPGA
> > between the database and storage, it's likely possible to get that
> > working without any extensive executor changes. Essentially create an
> > FPGA-aware variant of SeqScan and you're done. Or an FPGA-aware
> > tuplesort, or something like that. Neither of these should require
> > significant planner/executor changes, except for costing.


> I doubt that that is true.  For one, you need to teach the FPGA
> to understand at least enough about the intricacies of postgres storage
> format to be able to make enough sense of visibility information to
> know when it is safe to look at a tuple (you can't evaluate quals
> before checking visibility). It also needs to be fed a lot of
> information about the layout of the table, the involved operators, etc.
> And even if you define those away somehow, you still need to make sure
> that the on-disk state is coherent with the in-memory state - which
> definitely requires reaching outside of just a replacement seqscan node.



That's true, of course - the new node would have to know a lot of
details about the on-disk format, meaning of operators, etc. Not
trivial, that's for sure. (I think PGStrom does this)

What I had in mind were extensive changes to how the executor works in
general, because the OP mentioned changing the executor from pull to
push, or abandoning the iterative executor design. And I think that
would not be necessary ...
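For context, the pull-vs-push distinction being discussed, in rough pseudocode (not actual PostgreSQL executor code):

```
-- Pull ("Volcano") model: the consumer drives; each node asks its
-- child for the next tuple, one at a time.
loop
    tuple := root_node.next()
    exit when tuple is null
    emit(tuple)
end loop

-- Push model: the producer drives; the scan pushes each tuple up
-- into its parent node.
for tuple in base_table loop
    parent_node.consume(tuple)
end loop
```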


> I've a hard time believing that, even though some storage vendors are
> pushing this model heavily, the approach of performing qual evaluation
> on the storage level is actually useful for anything close to a general
> purpose database, especially a row store.



I agree with this too - it's unlikely to be a huge win for "regular"
workloads, it's usually aimed at (some) analytical workloads.

And yes, row store is not the most efficient format for this type of
accelerators (I don't have much experience with FPGA, but for GPUs it's
very inefficient).


> It's more realistic to have a model where the FPGA is fed pre-processed
> data and streams out the processed results. That way there are no
> problems with coherency, one can transparently handle the parts of
> reading the data that the FPGA can't, etc.



Well, the whole idea is that the FPGA does a lot of "simple" filtering
before the data even get into RAM / CPU, etc. So I don't think this
model would perform well - I assume the "processing" necessary could
easily be more expensive than the gains.



> But I admit I'm sceptical that even the above model is relevant for
> postgres. The potential market seems likely to stay small, and there's
> so much more performance work that's applicable to everyone using PG,
> even without access to special purpose hardware.



Not sure. It certainly is irrelevant for everyone who does not have
access to systems with FPGAs, and useful only for some workloads. How
large the market is, I don't know.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services