On Fri, Mar 27, 2026 at 4:31 AM Lukas Fittl <[email protected]> wrote:
> I've been thinking more about 0003 (pg_collect_advice) today, and I'm
> getting increasingly skeptical that we should try to get that into 19.
> It just feels like there is more design work to be done here, and I
> don't see the pressing need to have this in place.
>
> Instead, I wonder if we should just add a "debug_log_plan_advice"
> setting to pg_plan_advice, that always logs the plan advice when
> enabled. Basically like "always_store_advice_details", but emit a log
> line in addition to doing the work. That could either be enabled on a
> single session with a sufficiently high client_min_messages to consume
> it directly, or written to the log for a few minutes when trying to
> capture production activity (on small production systems where the
> logging overhead is acceptable).
Sure, we could do something like that. It means that people have to do log parsing to get the advice out cleanly, but it avoids the concerns about memory utilization, and it's simple to code.

> For the other one, 0004 (pg_stash_advice), I feel it's worth trying to
> get it in, if we can figure out the remaining issues. I'll try to do
> another pass on that tomorrow after some sleep.

Let me just talk a little about the way that I see the space of possible designs here. Let's set aside what we do or do not have time to code for a moment and just think about what makes some kind of sense in theory. I believe we can divide this up along a few different axes:

1. Where is advice stored for on-the-fly retrieval?

Possible answers include: in shared memory, in local memory, in a table, on disk. Realistically, though, I doubt that "in a table" is a viable option. Even if we hard-coded a direct index lookup, i.e., without going through the query planner, I think this would be a fairly expensive thing to do for every query, and if this is going to be usable on production systems, it has to be fast. I am absolutely certain that "on disk" is not a realistic option.

Local memory is an option. It has the advantage of making server-lifespan memory leaks impossible, and the further advantage of avoiding contention between different backends, since each backend would have its own copy of the data. The big disadvantage is memory consumption. If we think the advice store is going to be small, that might be fine, but somebody is potentially going to have thousands of advice strings stored, and duplicating those across hundreds (or, gulp, thousands) of backends is pretty bad.

2. What provision do we have for durability?

Possible answers include: in a table, on disk, nothing. I went with nothing, partly on the theory that it gives users more flexibility. We don't really care where they store their query IDs and advice strings, as long as they have a way to feed them into the mechanism.
But of course I was also trying to save myself implementation complexity. There are some tricky things about a table: as implemented, the advice stores are cluster-wide objects, but tables are database objects. If we're supposed to automatically load advice strings from a table that might be in another database into an in-memory store, well, we can't. That could be fixed by scoping stores to a specific database, which would be inconvenient only for users who have the same schema in multiple databases and want to share stashes across DBs, which is probably not common. A disk file is also an option. It requires inventing some kind of custom format that we can generate and parse, which has some complexity, but reading from a table has some complexity too; I'm not sure which is messier.

3. How do we limit memory usage?

One possible approach is to limit the size of the hash table by entries or by memory consumption, but I don't think that's very valuable if that's all we do, because presumably all it's going to accomplish is annoying people who hit the limit. It could make sense if we switched to a design where the superuser creates the stash, assigns it a memory limit, and then there's a way to give permission to use that stash to some particular user, who is then bound by the memory limit. In that kind of design, the person setting aside the memory has different and superior privileges to the person using it, so the cap serves a real purpose. A complicating factor here is that dshash doesn't seem to be well set up to enforce memory limits. You can do it if each dshash uses a separate DSA, but that's clunky, too.

A completely different direction is to treat the in-memory component of the system as a cache, backed by a table or file from which cold entries are retrieved as needed. The problem with this -- and I think it's a pretty big problem -- is that performance will probably fall off a cliff as soon as you overflow the cache.
I mean, it might not, if most of the requests can be satisfied from cache and a small percentage get fetched from cold storage, but if it's a case where a uniformly-accessed working set is 10% larger than the cache, the cache hit ratio will go down the tubes and so will performance.

4. How do we match advice strings to queries?

The query ID is the obvious answer, but you could also think of having an option to match on query text, for cases when query IDs collide. You could do something like store a tag in the query string or in a GUC and look that up instead of the query ID, but then I'm not sure why you wouldn't just store the advice string instead of the tag, and then you don't need a hash table lookup anymore. You could do some kind of pattern matching on the query string rather than using the query ID, but that feels like it would be hard to get right, and also probably more expensive. There are probably other options here, but I don't know what they are. I doubt that it makes sense from a performance standpoint to delegate out to a function written in a high-level language, and if you want to delegate to C code, then you can forget about pg_stash_advice and just use the add_advisor hook directly. I really feel like I'm probably missing some possible techniques here. I wonder if other people will come up with some clever ideas.

5. What should the security model be?

Right now it's as simple as can be: the superuser gets to decide who can use the mechanism. But also, not to be overlooked, an individual session can always opt out of automatic advice application by clearing the stash_name GUC. It shouldn't be possible to force wrong answers on another session even if you can impose arbitrary plan_advice, but there is a good chance that you could find a way to make their session catastrophically slow, so it's good that users can opt themselves out of the mechanism.
Nonetheless, if we really want mutually untrusting users to be able to use this facility, then stashes should have owners, and you should only be able to access a stash whose owner has authorized you to do so. This is all a bit awkward, because there's no way for an extension to integrate with the built-in permissions mechanisms for handling e.g. dropped users, and in-memory objects are a poor fit anyway. Also, all of this is bound to have a performance cost: if every access to a stash involves permissions checking, that's going to add a possibly-significant amount of overhead to a code path that might be very hot. And, in many environments, that permissions check would be a complete waste of cycles.

Things could maybe be simplified by deciding that stash access can't be delegated: a stash has one and only one owner, and it can affect only that user and not any other. That is unlike what we do for built-in objects, but it simplifies the permissions check to strcmp(), which is super-cheap compared to catalog lookups. All in all, I don't really know which way to jump here: what I've got right now looks almost stupidly primitive, and I suspect it is, but adding complexity along what seem to be the obvious lines isn't an obvious win, either.

6. How do we tell users when there's a problem?

Right now, the only available answer is to set pg_plan_advice.feedback_warnings, which I don't think is unreasonable. If you're using advice as intended, i.e. only for particular queries that really need it, then it really shouldn't be generating any output unless something's gone wrong, but I also don't think everyone is going to want this information going to the main log file. Somehow percolating the advice feedback back to the advisor that provided it -- pg_stash_advice or whatever -- feels like a thing that is probably worth doing.
pg_stash_advice could then summarize the feedback and provide reports: hey, look, of the 100 queries for which you stashed advice, 97 always showed /* matched */ for every item of advice, but the last 3 had varying numbers of other results. Being able to query that via SQL seems like it would be quite useful. I tend to think that it's almost a given that we're going to eventually want something like this, but the details aren't entirely clear to me. It could, for example, be developed as a separate extension that works with any advisor, rather than being coupled to pg_stash_advice specifically -- but I'm also not saying that's better, and I think it might be worse. Figuring out exactly what we want to do here is a lot less critical than the items mentioned above because, no matter what we do about any of those things, this can always be added afterwards if and when desired. But I'm mentioning it for completeness.

If there are other design axes we should be talking about, I'm keen to hear them. This is just what came to mind off-hand. Of course, I'm also interested in everyone's views on what the right decisions are from among the available options (or other options they may have in mind).

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com
