Very large table: Partition it or not?
I have a table in an analytics database (Postgres 12.3) that gathers data continuously. It is at 5B rows, with an average row size of 250 bytes. The table has five indexes, on bigint and varchar columns, all with keys of one or two columns.

There are currently frequent updates and deletions, but the net change in the number of rows is continuously positive. We are rearchitecting the application to avoid the updates and deletes, i.e., the table will soon be append-only (so vacuuming will be needed only to avoid transaction id wraparound).

I know that the maximum table size is 32TB, which allows for 128B rows. Based on this calculation, and the expected growth rate (2B rows/year currently), we should be good for quite a while.

What are the pros and cons of partitioning the table? Without partitioning, are we liable to run into trouble as this table keeps growing? I do realize that some query times will grow with table size, and that partitioning, combined with parallel query execution, can address that problem. I'm more wondering about problems in maintaining tables and indexes once we have 10B, 20B, ... rows.

Jack Orenstein
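For reference, the capacity arithmetic behind the 128B figure (decimal terabytes assumed):

$$\frac{32 \times 10^{12}\ \text{bytes}}{250\ \text{bytes/row}} = 1.28 \times 10^{11}\ \text{rows} = 128\text{B rows}$$

At the stated growth of 2B rows/year on top of the current 5B, that ceiling is roughly (128 - 5) / 2, or about 60 years, away.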
Btree vs. GIN
I am working on a research project involving a datatype, D, with the following characteristics:

- A value of D is a variable-length binary string.
- A value of D usually gives rise to a single index term, but there could occasionally be more (up to some configurable maximum).
- The index terms are int64.

GIN looks like a good fit for my requirements, and I've done a little experimentation (using btree_gin) to determine that the optimizer is working as I'd hoped. So I'm going to proceed with GIN indexes, writing an extension for my datatype. But I'm wondering about btrees and GIN in general. This discussion was pretty interesting: https://www.postgresql-archive.org/Questions-about-btree-gin-vs-btree-gist-for-low-cardinality-columns-td6088041.html

My main questions are about the remaining differences between btree and GIN indexes. With btree key deduplication in Postgres 13, one of the main differences between btrees and GIN is gone (my understanding is that GIN has done this optimization for a long time). I think that one other major difference between the two is that GIN can handle multiple keys per row, while a btree cannot. Is there some fundamental reason why a btree cannot accommodate multiple keys per row? I think that this would have no impact on the btree structure itself; the only difference would be that the same row (ctid) could be associated with multiple keys.

I guess the top-level question is this: is it possible in principle for the btree index to subsume GIN index capabilities?

Jack Orenstein
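To make the "multiple keys per row" point concrete, here is a minimal sketch of what a GIN extractValue support function for D might look like; the function and helper names are hypothetical. GIN calls this once per indexed value, and the value may hand back any number of keys:

    /* Hypothetical GIN extractValue support function for type D.
     * Each indexed value contributes n int64 keys, returned as a
     * palloc'd array of Datums. */
    Datum
    d_extract_value(PG_FUNCTION_ARGS)
    {
        D      *item = (D *) DatumGetPointer(PG_GETARG_DATUM(0));
        int32  *nkeys = (int32 *) PG_GETARG_POINTER(1);
        Datum  *keys;
        int     n = d_key_count(item);    /* hypothetical helper */
        int     i;

        *nkeys = n;
        keys = (Datum *) palloc(sizeof(Datum) * n);
        for (i = 0; i < n; i++)
            keys[i] = Int64GetDatum(d_key(item, i));  /* hypothetical helper */

        PG_RETURN_POINTER(keys);
    }

A btree, by contrast, stores exactly one key per index entry, which is the structural difference the question is probing.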
Crashing on insert to GIN index
I am defining a new type, FooBar, and trying to create a GIN index for it. Everything is working well without the index. FooBar values are getting into a table, and being retrieved and selected correctly. But I'm getting a crash when I add a GIN index on a column of type FooBar.

Here is the operator class:

    create operator class foobar_ops
    default for type foobar using gin
    as
        operator 1 @@,
        function 1 foobar_cmp(bigint, bigint),
        function 2 foobar_item_to_keys(foobar, internal),
        function 3 foobar_query_to_keys(foobar, internal, int2, internal, internal),
        function 4 foobar_match(internal, int2, anyelement, int4, internal, internal),
        function 5 foobar_partial_match(foobar, foobar, int2, internal);

Here is the postgres function for extracting keys from FooBar values:

    create function foobar_item_to_keys(foobar, internal) returns internal
    as '$libdir/foobar' language C immutable strict parallel safe;

And the implementation:

    Datum
    foobar_item_to_keys(PG_FUNCTION_ARGS)
    {
        FooBar  *foobar = (FooBar *) DatumGetPointer(PG_GETARG_DATUM(0));
        int32   *n_keys = (int32 *) PG_GETARG_POINTER(1);
        int64_t *keys = (int64_t *) palloc(sizeof(int64_t));

        *n_keys = 1;
        keys[0] = foobar->key0;

        PG_RETURN_POINTER(keys);
    }

(Eventually there will be multiple keys, so it really does need to be a GIN index.)

I have used ereport debugging to prove that the FooBar delivered into foobar_item_to_keys is correct, and that the PG_RETURN_POINTER statement is being reached. I have been reading the Postgres docs, and comparing my code to the examples in contrib, and cannot see what I'm doing wrong.

Can anyone see a problem in what I've described? Or point me in the right direction to debug this problem?

Thanks.

Jack Orenstein
Re: Crashing on insert to GIN index
Thank you, the missing STORAGE clause was the problem. As for the non-standard coding: I did start out with more correct coding, and it wandered off as I tried to figure out what was causing the crash.

Jack Orenstein

On Sun, Jan 3, 2021 at 7:57 PM Tom Lane wrote:

> Jack Orenstein writes:
> > I am defining a new type, FooBar, and trying to create a GIN index for it.
> > Everything is working well without the index. FooBar values are getting
> > into a table, and being retrieved and selected correctly. But I'm getting
> > a crash when I add a GIN index on a column of type FooBar.
>
> > Here is the operator class:
>
> > create operator class foobar_ops
> > default for type foobar using gin
> > as
> >     operator 1 @@,
> >     function 1 foobar_cmp(bigint, bigint),
> >     function 2 foobar_item_to_keys(foobar, internal),
> >     function 3 foobar_query_to_keys(foobar, internal, int2, internal, internal),
> >     function 4 foobar_match(internal, int2, anyelement, int4, internal, internal),
> >     function 5 foobar_partial_match(foobar, foobar, int2, internal);
>
> Hmm, don't you want a "STORAGE bigint" clause in there?
>
> > And the implementation:
>
> > int64_t* keys = (int64_t*) palloc(sizeof(int64_t));
>
> As a general rule, ignoring the conventions about how to use Datums
> is a good way to cause yourself pain. It doesn't look like what
> you've shown us so far is directly broken ... as long as you don't
> try to run it on 32-bit hardware ... but bugs could easily be lurking
> nearby. More, the fact that this code looks nothing like standard
> coding for the task is not making your life easier, because you
> can't easily compare what you've done to other functions. It'd be
> much wiser to write this as
>
>     Datum *keys = (Datum *) palloc(sizeof(Datum) * whatever);
>
> and then use Int64GetDatum() to convert your integer key
> values to Datums. Yes, I'm well aware that that macro is
> physically a no-op (... on 64-bit hardware ...) but you're
> best advised to not rely on that, but think of Datum as a
> physically distinct type.
>
> regards, tom lane
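For readers hitting the same crash: putting Tom's two points together, the operator class needs a STORAGE bigint clause, and the key-extraction function should build a Datum array rather than a raw int64_t array. A sketch of the corrected function, still assuming a single key0 field for now:

    Datum
    foobar_item_to_keys(PG_FUNCTION_ARGS)
    {
        FooBar *foobar = (FooBar *) DatumGetPointer(PG_GETARG_DATUM(0));
        int32  *n_keys = (int32 *) PG_GETARG_POINTER(1);
        Datum  *keys = (Datum *) palloc(sizeof(Datum) * 1);  /* one key, for now */

        *n_keys = 1;
        keys[0] = Int64GetDatum(foobar->key0);  /* proper int64 -> Datum conversion */

        PG_RETURN_POINTER(keys);
    }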
Missing declaration of _PG_init()
I am writing an extension. The docs describe a _PG_init function that will be called upon loading the shared library (https://www.postgresql.org/docs/12/xfunc-c.html). I include postgres.h and fmgr.h, but on compilation, the compiler reports that _PG_init has not been declared. Grepping the postgres source, _PG_init appears to be involved in procedural language extensions, and the function is declared in plpgsql.h. Looking at various contrib modules, I see explicit declarations of _PG_init(void). Should _PG_init(void) be declared somewhere included by postgres.h or fmgr.h?

Jack Orenstein
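For anyone with the same question: in Postgres 12 the core headers do not declare _PG_init for you, so the convention, visible throughout contrib, is for the extension to declare it itself. A minimal sketch:

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    void _PG_init(void);    /* declare it yourself; the headers don't */

    void
    _PG_init(void)
    {
        /* one-time, per-process initialization goes here */
    }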
Finding memory corruption in an extension
An extension I'm creating is causing Postgres to crash, almost certainly due to memory corruption. I am using palloc0/pfree, calling SET_VARSIZE, and generally following the procedures documented here: https://www.postgresql.org/docs/12/xfunc-c.html. I am also testing my code outside of Postgres (using malloc/free instead of palloc0/pfree), and valgrind is not finding any corruption or leaks.

The crash is not completely reproducible, but when it does happen, it's pretty fast -- create a table, insert a couple of rows, explain a query. (My goal is to create a GIN index on my datatype, but this crash occurs without the index.)

I'm interested in advice on how to go about hunting down my problem -- something along the lines of a debugging malloc, or valgrind, for Postgres.

Jack Orenstein
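For context, a minimal sketch of the varlena allocation pattern being followed; the type name and sizes are hypothetical. The point to audit is that the size passed to SET_VARSIZE matches the size that was palloc'd, since a mismatch there is a classic cause of exactly this kind of heap corruption:

    /* Hypothetical constructor for a varlena type: allocate header plus
       payload, then record the total size in the header. */
    FooBar *result = (FooBar *) palloc0(VARHDRSZ + payload_bytes);
    SET_VARSIZE(result, VARHDRSZ + payload_bytes);
    memcpy(VARDATA(result), payload, payload_bytes);
    PG_RETURN_POINTER(result);

As for a debugging malloc for palloc'd storage: a build configured with --enable-cassert turns on memory-context checking and clobbering of freed memory, which is the closest built-in analogue.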
Static memory, shared memory
I am writing a Postgres extension, and thought that I had memory corruption (thanks for the --enable-cassert lead). I might, but it now looks like I need to understand the use of shared memory and locking in Postgres. So I have two questions.

1) I am now guessing that my original problem is caused by relying on static memory in my extension (i.e., in the source file declaring PG_MODULE_MAGIC). This static memory is almost but not quite constant -- it is initialized in _PG_init, and then never modified. I suspect that this cannot work in general (since Postgres is multi-process), but I thought it would be adequate for early development. However, I am seeing this static memory get corrupted even when there is only a single process executing the extension code (verified by examining getpid()). So the question is this: is the use of non-constant static memory ill-advised, even assuming there is just one process relying on it? Or is it more likely that I still have run-of-the-mill memory corruption? (--enable-cassert hasn't notified me of any problems.)

2) Assuming that I should be using shared memory instead of static, I am reading https://www.postgresql.org/docs/12/xfunc-c.html#id-1.8.3.13.14 and examining contrib/pg_prewarm. The xfunc-c documentation mentions RequestNamedLWLockTranche to get an array of LWLocks, but the sample code that follows calls GetNamedLWLockTranche. And the pg_prewarm code doesn't rely on an array of locks; it initializes a single lock (which I think would be adequate for my needs). I understand the purpose of locks for obtaining and manipulating shared memory, but I am confused about the correct way to proceed. I'm guessing it is safe to assume that pg_prewarm works and is a good model to follow, and that the doc may be buggy. Can someone clarify my confusion, and perhaps point me at a tutorial on correct usage of the interfaces for LWLocks and shared memory? (I haven't been able to find one.)

Thank you.

Jack Orenstein
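On the apparent doc/sample mismatch: the two functions are not alternatives but two halves of one documented pattern. RequestNamedLWLockTranche reserves the lock(s) at shared_preload_libraries time, in _PG_init; GetNamedLWLockTranche fetches them later, from the shmem startup hook. A minimal sketch of that named-tranche pattern, with all identifiers ("myext", MyExtState) hypothetical:

    #include "postgres.h"
    #include "fmgr.h"
    #include "miscadmin.h"
    #include "storage/ipc.h"
    #include "storage/lwlock.h"
    #include "storage/shmem.h"

    PG_MODULE_MAGIC;

    /* hypothetical shared state for the extension */
    typedef struct MyExtState
    {
        LWLock *lock;
        /* ... shared data ... */
    } MyExtState;

    static MyExtState *myext_state = NULL;
    static shmem_startup_hook_type prev_shmem_startup_hook = NULL;

    void _PG_init(void);
    static void myext_shmem_startup(void);

    void
    _PG_init(void)
    {
        if (!process_shared_preload_libraries_in_progress)
            return;

        /* reserve shared memory and one named LWLock; both must happen here */
        RequestAddinShmemSpace(MAXALIGN(sizeof(MyExtState)));
        RequestNamedLWLockTranche("myext", 1);

        prev_shmem_startup_hook = shmem_startup_hook;
        shmem_startup_hook = myext_shmem_startup;
    }

    static void
    myext_shmem_startup(void)
    {
        bool found;

        if (prev_shmem_startup_hook)
            prev_shmem_startup_hook();

        LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
        myext_state = ShmemInitStruct("myext", sizeof(MyExtState), &found);
        if (!found)
        {
            /* first process to attach initializes the struct */
            myext_state->lock = &(GetNamedLWLockTranche("myext"))->lock;
        }
        LWLockRelease(AddinShmemInitLock);
    }

GetNamedLWLockTranche returns the reserved array of locks; with one lock requested, taking the first element's lock, as above, gives the single lock that pg_prewarm-style code uses.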
Understanding GIN indexes
I am building a new type, which will be indexed using a GIN index. Things are starting to work, and I am seeing queries use the index, call the partialMatch(), consistent(), and compare() functions, and return correct results. However, I am still unclear on some aspects of how partialMatch and consistent are supposed to work (so my implementation of consistent() always sets *recheck to true).

1) The recheck logic of consistent() is unclear to me. The docs say (https://www.postgresql.org/docs/12/gin-extensibility.html):

    On success, *recheck should be set to true if the heap tuple needs to be
    rechecked against the query operator, or false if the index test is exact.
    That is, a false return value guarantees that the heap tuple does not
    match the query; a true return value with *recheck set to false guarantees
    that the heap tuple does match the query; and a true return value with
    *recheck set to true means that the heap tuple might match the query, so
    it needs to be fetched and rechecked by evaluating the query operator
    directly against the originally indexed item.

How can it ever be correct to return true and set *recheck to false? My understanding of conventional (btree) indexes is that the row needs to be retrieved, and the index condition rechecked, because the table has visibility information, and the index does not -- a key in the index might correspond to an obsolete row version. I understand visibility map optimizations, and the fact that going to the actual data page can sometimes be skipped. But that doesn't seem to be what the consistent() recheck flag is about. In other words, how can consistent() ever decide that a recheck is not necessary, since the index entry may be from an obsolete row version? Couldn't returning true and setting *recheck to false result in a false positive?

2) For partial matches, why does consistent() need to be called at all? For a given key (2nd arg), partialMatch() decides whether the key satisfies the index condition. Why is a further check by consistent() required?

I think that my mental model of how GIN works must be way off. Is there a presentation or paper that explains how GIN works?

Jack Orenstein
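A minimal sketch of the conservative consistent() described above -- always setting *recheck to true, so every candidate row is refetched and re-evaluated against the query operator. The names are hypothetical; the argument layout matches the Postgres 12 GIN consistent() support-function interface:

    Datum
    foobar_consistent(PG_FUNCTION_ARGS)
    {
        bool   *check = (bool *) PG_GETARG_POINTER(0);
        /* StrategyNumber strategy = PG_GETARG_UINT16(1); */
        /* Datum query = PG_GETARG_DATUM(2); */
        int32   nkeys = PG_GETARG_INT32(3);
        /* Pointer *extra_data = (Pointer *) PG_GETARG_POINTER(4); */
        bool   *recheck = (bool *) PG_GETARG_POINTER(5);
        int     i;

        /* Treat the index test as lossy: any row the index suggests is
           refetched and checked against the query operator itself. */
        *recheck = true;

        /* Match if any of the query's keys was found for this item. */
        for (i = 0; i < nkeys; i++)
        {
            if (check[i])
                PG_RETURN_BOOL(true);
        }
        PG_RETURN_BOOL(false);
    }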
Re: Static memory, shared memory
On Sat, Jan 9, 2021 at 12:18 PM Tom Lane wrote:

> Jack Orenstein writes:
> > I am writing a Postgres extension, and thought that I had memory
> > corruption, (thanks for the --enable-cassert lead). I might, but It now
> > looks like I need to understand the use of shared memory and locking in
> > Postgres. So I have two questions.
>
> > 1) I am now guessing that my original problem is caused by relying on
> > static memory in my extension (i.e., in the source declaring
> > PG_MODULE_MAGIC). This static memory is almost but not quite constant --
> > it is initialized from _PG_init, and then never modified. I suspect that
> > this cannot work in general (since Postgres is multi-process), but I
> > thought it would be adequate for early development. However, I am seeing
> > this static memory get corrupted even when there is only a single process
> > executing the extension code (verified by examining getpid()).
>
> Define what you mean by "corrupted". It seems highly unlikely that any
> code but your own is touching this memory.

Some fields have expected values, others do not. I think I have just figured out this problem, and it was indeed my own gun shooting my own foot.

> Really the big-picture question here is what are you hoping to accomplish
> and why do you think this memory might need to be shared?

The type I am implementing depends on looking up data in an array whose size is approximately 64k. This array needs to be computed once and for all early on, and is then consulted as the type is used. For now, I have hardwired the parameters that determine the array's contents, and the array is constructed during _PG_init. I will eventually remove this scaffolding, and compute the array contents when required, which should be a rare event.

Jack Orenstein
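Assuming the read-only-after-_PG_init design described above, the process-local version would look something like this minimal sketch (the sizing and the compute function are hypothetical). Each backend loads the library and runs _PG_init itself, so every process gets its own copy; no shared memory or locking is needed as long as the table is never modified afterwards:

    #define LOOKUP_ENTRIES 8192    /* 8192 * sizeof(int64) = 64kB, hypothetical */

    static int64 lookup_table[LOOKUP_ENTRIES];

    void
    _PG_init(void)
    {
        int i;

        /* Runs once per backend, at library load; the table is treated
           as read-only from here on. */
        for (i = 0; i < LOOKUP_ENTRIES; i++)
            lookup_table[i] = compute_entry(i);   /* hypothetical */
    }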