Very large table: Partition it or not?

2020-12-16 Thread Jack Orenstein
I have a table in an analytics database (Postgres 12.3) that gathers data
continuously. It is at 5B rows, with an average row size of 250 bytes. The
table has five indexes, on bigint and varchar columns, all with keys of one
or two columns.

There are currently frequent updates and deletions, but the net change in
the number of rows is continuously positive. We are rearchitecting the
application to avoid the updates and deletes; i.e., the table will soon be
append-only (so vacuuming will be needed only to avoid transaction ID
wraparound).

I know that the maximum table size is 32TB, which at ~250 bytes/row allows
for about 128B rows. Based on this calculation, and the expected growth rate
(2B rows/year currently), we should be good for quite a while.

What are the pros and cons of partitioning the table? Without partitioning,
are we liable to run into trouble as this table keeps growing? I do realize
that some query times will grow with table size, and that partitioning,
combined with parallel query execution, can address that problem. I'm more
wondering about problems in maintaining tables and indexes once we have
10B, 20B, ... rows.

Jack Orenstein


Btree vs. GIN

2021-01-01 Thread Jack Orenstein
I am working on a research project involving a datatype, D, with the
following characteristics:

- A value of D is a variable-length binary string.

- A value of D usually gives rise to a single index term, but there could
occasionally be more (up to some configurable maximum).

- The index terms are int64.

GIN looks like a good fit for my requirements, and I've done a little
experimentation (using btree_gin) to determine that the optimizer is
working as I'd hoped.

So I'm going to proceed with GIN indexes, writing an extension for my
datatype. But I'm wondering about btrees and GIN, in general. This
discussion was pretty interesting:
https://www.postgresql-archive.org/Questions-about-btree-gin-vs-btree-gist-for-low-cardinality-columns-td6088041.html

My main questions are about the remaining differences between btree and GIN
indexes. With btree key deduplication in Postgres 12, one of the main
differences between btrees and GIN is gone (my understanding is that GIN
has done this optimization for a long time).

I think that one other major difference between the two is that GIN can
handle multiple keys per row, while a btree cannot. Is there some
fundamental reason why the btree cannot accommodate multiple keys per row?
I think that this would have no impact on the btree structure itself. The
only difference would be that the same row (ctid) could be associated with
multiple keys.

I guess the top-level question is this: Is it possible in principle for the
btree index to subsume GIN index capabilities?

Jack Orenstein


Crashing on insert to GIN index

2021-01-03 Thread Jack Orenstein
I am defining a new type, FooBar, and trying to create a GIN index for it.
Everything is working well without the index. FooBar values are getting
into a table, and being retrieved and selected correctly. But I'm getting a
crash when I add a GIN index on a column of type FooBar.

Here is the operator class:

create operator class foobar_ops
default for type foobar using gin
as
    operator 1 @@,
    function 1 foobar_cmp(bigint, bigint),
    function 2 foobar_item_to_keys(foobar, internal),
    function 3 foobar_query_to_keys(foobar, internal, int2, internal, internal),
    function 4 foobar_match(internal, int2, anyelement, int4, internal, internal),
    function 5 foobar_partial_match(foobar, foobar, int2, internal);

Here is the postgres function for extracting keys from FooBar values:

create function foobar_item_to_keys(foobar, internal) returns internal
as '$libdir/foobar'
language C immutable strict parallel safe;

And the implementation:

Datum foobar_item_to_keys(PG_FUNCTION_ARGS)
{
    FooBar* foobar = (FooBar*) DatumGetPointer(PG_GETARG_DATUM(0));
    int32* n_keys = (int32*) PG_GETARG_POINTER(1);
    int64_t* keys = (int64_t*) palloc(sizeof(int64_t));
    *n_keys = 1;
    keys[0] = foobar->key0;
    PG_RETURN_POINTER(keys);
}

(Eventually there will be multiple keys, so it really does need to be a GIN
index.)

I have used ereport debugging to prove that the FooBar delivered into
foobar_item_to_keys is correct, and that the PG_RETURN_POINTER statement is
being reached.

I have been reading the Postgres docs, and comparing my code to the
examples in contrib, and cannot see what I'm doing wrong. Can anyone see a
problem in what I've described? Or point me in the right direction to debug
this problem?

Thanks.

Jack Orenstein


Re: Crashing on insert to GIN index

2021-01-03 Thread Jack Orenstein
Thank you, the missing STORAGE clause was the problem.

As for the non-standard coding: I did start out with more correct coding,
and it wandered off as I tried to figure out what was causing the crash.

Jack Orenstein

On Sun, Jan 3, 2021 at 7:57 PM Tom Lane  wrote:

> Jack Orenstein  writes:
> > I am defining a new type, FooBar, and trying to create a GIN index for
> it.
> > Everything is working well without the index. FooBar values are getting
> > into a table, and being retrieved and selected correctly. But I'm
> getting a
> > crash when I add a GIN index on a column of type FooBar.
>
> > Here is the operator class:
>
> > create operator class foobar_ops
> > default for type foobar using gin
> > as
> > operator 1 @@,
> > function 1 foobar_cmp(bigint, bigint),
> > function 2 foobar_item_to_keys(foobar, internal),
> > function 3 foobar_query_to_keys(foobar, internal, int2, internal,
> > internal),
> > function 4 foobar_match(internal, int2, anyelement, int4,
> internal,
> > internal),
> > function 5 foobar_partial_match(foobar, foobar, int2, internal);
>
> Hmm, don't you want a "STORAGE bigint" clause in there?
>
> > And the implementation:
>
> > int64_t* keys = (int64_t*) palloc(sizeof(int64_t));
>
> As a general rule, ignoring the conventions about how to use Datums
> is a good way to cause yourself pain.  It doesn't look like what
> you've shown us so far is directly broken ... as long as you don't
> try to run it on 32-bit hardware ... but bugs could easily be lurking
> nearby.  More, the fact that this code looks nothing like standard
> coding for the task is not making your life easier, because you
> can't easily compare what you've done to other functions.  It'd be
> much wiser to write this as
>
> Datum *keys = (Datum *) palloc(sizeof(Datum) * whatever);
>
> and then use Int64GetDatum() to convert your integer key
> values to Datums.  Yes, I'm well aware that that macro is
> physically a no-op (... on 64-bit hardware ...) but you're
> best advised to not rely on that, but think of Datum as a
> physically distinct type.
>
> regards, tom lane
>


Missing declaration of _PG_init()

2021-01-08 Thread Jack Orenstein
I am writing an extension. The docs describe a _PG_init function that will
be called upon loading the shared library
(https://www.postgresql.org/docs/12/xfunc-c.html). I include postgres.h and
fmgr.h, but when I compile, _PG_init has not been declared.
Grepping the postgres source, _PG_init appears to be involved in
programming language extensions, and the function is declared in plpgsql.h.
Looking at various contrib modules, I see
explicit declarations of _PG_init(void).
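
For example, the pattern in those modules looks roughly like this (a minimal
sketch; the actual initialization each module performs is omitted):

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/* Declared explicitly, since neither postgres.h nor fmgr.h provides it. */
void _PG_init(void);

void
_PG_init(void)
{
    /* one-time setup when the shared library is loaded */
}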

Should _PG_init(void) be declared somewhere that is included by postgres.h
or fmgr.h?

Jack Orenstein


Finding memory corruption in an extension

2021-01-08 Thread Jack Orenstein
An extension I'm creating is causing Postgres to crash, almost certainly
due to memory corruption.  I am using palloc0/pfree, calling SET_VARSIZE,
and generally following the procedures documented here:
https://www.postgresql.org/docs/12/xfunc-c.html. I am also testing my code
outside of Postgres (using malloc/free instead of palloc0/pfree), and
valgrind is not finding any corruption or leaks.
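
The standalone build substitutes the allocators with something roughly like
this (a sketch; the real arrangement differs):

#ifdef STANDALONE_TEST
/* Outside of Postgres, map the palloc family onto libc so the same sources
 * can run under valgrind. */
#include <stdlib.h>
#define palloc(sz)   malloc(sz)
#define palloc0(sz)  calloc(1, (sz))
#define pfree(p)     free(p)
#else
#include "postgres.h"
#endif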

The crash is not completely reproducible, but when it does happen, it's
pretty fast -- create a table, insert a couple of rows, explain a query.
(My goal is to create a GIN index on my datatype, but this crash occurs
without the index.)

I'm interested in advice on how to go about hunting down my problem.
Something along the lines of a debugging malloc, or valgrind, for Postgres.

Jack Orenstein


Static memory, shared memory

2021-01-09 Thread Jack Orenstein
I am writing a Postgres extension, and thought that I had memory
corruption (thanks for the --enable-cassert lead). I might, but it now
looks like I need to understand the use of shared memory and locking in
Postgres. So I have two questions.

1) I am now guessing that my original problem is caused by relying on
static memory in my extension (i.e., in the source declaring
PG_MODULE_MAGIC). This static memory is almost but not quite constant -- it
is initialized from _PG_init, and then never modified. I suspect that this
cannot work in general (since Postgres is multi-process), but I thought it
would be adequate for early development. However, I am seeing this static
memory get corrupted even when there is only a single process executing the
extension code (verified by examining getpid()). So the question is this:
Is the use of non-constant static memory ill-advised, even assuming there
is just one process relying on it? Or is it more likely that I still have
run-of-the-mill memory corruption? (--enable-cassert hasn't reported any
problems.)

2) Assuming that I should be using shared memory instead of static, I am
reading https://www.postgresql.org/docs/12/xfunc-c.html#id-1.8.3.13.14, and
examining contrib/pg_prewarm. The xfunc-c documentation mentions
RequestNamedLWLockTranche to get an array of LWLocks, but the sample code
that follows calls GetNamedLWLockTranche. And the pg_prewarm code doesn't
rely on an array of locks; it initializes a single lock (which I think
would be adequate for my needs). I understand the purpose of locks for
obtaining and manipulating shared memory, but I am confused about the
correct way to proceed. I'm guessing it is safe to assume that pg_prewarm
works and is a good model to follow, and that the doc may be buggy. Can
someone clarify my confusion, and perhaps point me at a tutorial on correct
usage of the interfaces for LWLocks and shared memory? (I haven't been able
to find one.)
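
To make the question concrete, here is the pattern I think the docs and
pg_prewarm are describing -- a sketch with made-up names, requesting the lock
and the space in _PG_init and looking both up in the shmem startup hook. Is
this roughly right?

#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"

PG_MODULE_MAGIC;

typedef struct MySharedState          /* hypothetical shared state */
{
    LWLock *lock;                     /* protects everything below */
    /* ... the ~64K lookup table would go here ... */
} MySharedState;

static MySharedState *my_state = NULL;
static shmem_startup_hook_type prev_shmem_startup_hook = NULL;

void _PG_init(void);
static void my_shmem_startup(void);

void
_PG_init(void)
{
    if (!process_shared_preload_libraries_in_progress)
        return;

    /* Reserve shared memory and one named LWLock; nothing is allocated yet. */
    RequestAddinShmemSpace(MAXALIGN(sizeof(MySharedState)));
    RequestNamedLWLockTranche("my_extension", 1);

    prev_shmem_startup_hook = shmem_startup_hook;
    shmem_startup_hook = my_shmem_startup;
}

static void
my_shmem_startup(void)
{
    bool found;

    if (prev_shmem_startup_hook)
        prev_shmem_startup_hook();

    LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
    my_state = ShmemInitStruct("my_extension", sizeof(MySharedState), &found);
    if (!found)
    {
        /* First process here: pick up the named lock and initialize the state. */
        my_state->lock = &(GetNamedLWLockTranche("my_extension"))[0].lock;
        /* ... compute and store the lookup table ... */
    }
    LWLockRelease(AddinShmemInitLock);
}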

Thank you.

Jack Orenstein


Understanding GIN indexes

2021-01-11 Thread Jack Orenstein
I am building a new type, which will be indexed using a GIN index. Things
are starting to work, and I am seeing queries use the index, call the
partialMatch(), consistent(), and compare() functions, and return correct
results.

However, I am still unclear on some aspects of how partialMatch and
consistent are supposed to work (so my implementation of consistent()
always sets *recheck to true).
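
For reference, that placeholder amounts to roughly this (a sketch, with
everything except the recheck handling omitted):

/* Always ask the executor to refetch the heap tuple and re-evaluate the
 * query operator directly. */
Datum foobar_consistent(PG_FUNCTION_ARGS)
{
    bool* recheck = (bool*) PG_GETARG_POINTER(5);   /* argument 5 is *recheck */

    *recheck = true;
    PG_RETURN_BOOL(true);
}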

1) The recheck logic of consistent() is unclear to me. The docs say (
https://www.postgresql.org/docs/12/gin-extensibility.html):

On success, *recheck should be set to true if the heap tuple needs to be
rechecked against the query operator, or false if the index test is exact.
That is, a false return value guarantees that the heap tuple does not match
the query; a true return value with *recheck set to false guarantees that
the heap tuple does match the query; and a true return value with *recheck
set to true means that the heap tuple might match the query, so it needs to
be fetched and rechecked by evaluating the query operator directly against
the originally indexed item.

How can it ever be correct to return true and set *recheck to false? My
understanding of conventional (btree) indexes is that the row needs to be
retrieved, and the index condition rechecked, because the table has
visibility information, and the index does not -- a key in the index might
correspond to an obsolete row version. I understand visibility map
optimizations, and the fact that going to the actual data page can
sometimes be skipped. But that doesn't seem to be what the consistent()
recheck flag is about.

In other words, how can consistent() ever decide that a recheck is not
necessary, since the index entry may be from an obsolete row version?
Couldn't returning true and setting *recheck to false result in a false
positive?

2) For partial matches, why does consistent() need to be called at all? For
a given key (2nd arg), partialMatch() decides whether the key satisfies the
index condition. Why is a further check by consistent() required?

I think that my mental model of how GIN works must be way off. Is there a
presentation or paper that explains how GIN works?

Jack Orenstein


Re: Static memory, shared memory

2021-01-11 Thread Jack Orenstein



On Sat, Jan 9, 2021 at 12:18 PM Tom Lane  wrote:
> Jack Orenstein  writes:
> > I am writing a Postgres extension, and thought that I had memory
> > corruption, (thanks for the --enable-cassert lead). I might, but It now
> > looks like I need to understand the use of shared memory and locking in
> > Postgres. So I have two questions.
> 
> > 1) I am now guessing that my original problem is caused by relying on
> > static memory in my extension (i.e., in the source declaring
> > PG_MODULE_MAGIC). This static memory is almost but not quite constant -- it
> > is initialized from _PG_init, and then never modified. I suspect that this
> > cannot work in general (since Postgres is multi-process), but I thought it
> > would be adequate for early development. However, I am seeing this static
> > memory get corrupted even when there is only a single process executing the
> > extension code (verified by examining getpid()).
> 
> Define what you mean by "corrupted".  It seems highly unlikely that any
> code but your own is touching this memory.

Some fields have expected values, others do not.

I think I have just figured out this problem, and it was indeed a case of
shooting myself in the foot.
 
> 
> Really the big-picture question here is what are you hoping to accomplish
> and why do you think this memory might need to be shared?

The type I am implementing depends on looking up data in an array whose size is 
approximately 64k. This array needs to be computed once and for all early on, 
and is then consulted as the type is used. For now, I have hardwired the 
parameters that determine the array's contents, and the array is constructed 
during _PG_init. I will eventually remove this scaffolding, and compute the 
array contents when required, which should be a rare event.
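
The scaffolding amounts to roughly this (a sketch; compute_entry and the
sizes are placeholders):

/* File-static table, filled once in _PG_init from hardwired parameters and
 * only read after that. */
#define LOOKUP_ENTRIES 65536

static int64 lookup_table[LOOKUP_ENTRIES];

void
_PG_init(void)
{
    for (int i = 0; i < LOOKUP_ENTRIES; i++)
        lookup_table[i] = compute_entry(i);   /* compute_entry: hypothetical */
}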

Jack Orenstein