Automatically parsing in-line composite types
Hi! I am trying to understand how I could automatically parse an in-line composite type. By in-line composite type I mean a type corresponding to ROW. For example, in the following query:

SELECT _id, body, (SELECT array_agg(ROW(comments._id, comments.body)) FROM comments WHERE comments.post_id=posts._id) AS comments FROM posts

It looks like I can figure out that "comments" is an array of records. But then there is really no way to understand how to parse those records, i.e., what the types of the fields in the record are. I start the parsing process by looking at the types returned in the RowDescription message and then reading their descriptions from the pg_type table. Is there some other way to get the full typing information of the result, which I assume is available to PostgreSQL internally? Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
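To make the lookup step concrete, this is roughly the pg_type query I mean (a sketch; $1 stands for a type OID taken from the RowDescription message):

-- typtype = 'c' marks a registered composite type, typelem != 0 marks an
-- array (typelem is then the element type's OID), and typrelid points at
-- the pg_class entry describing a composite type's columns.
SELECT typname, typtype, typelem, typrelid FROM pg_type WHERE oid = $1;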
Re: Automatically parsing in-line composite types
Hi! Bump my previous question. I find it surprising that it seems this information is not possible to be reconstructed by the client, when the server has to have it internally. Is this a new feature request or am I missing something? > I am trying to understand how could I automatically parse an in-line > composite type. By in-line composite type I mean a type corresponding > to ROW. For example, in the following query: > > SELECT _id, body, (SELECT array_agg(ROW(comments._id, comments.body)) > FROM comments WHERE comments.post_id=posts._id) AS comments FROM posts > > It looks like I can figure out that "comments" is an array of records. > But then there is no way really to understand how to parse those > records? So what are types of fields in the record? > > I start the parsing process by looking at types returned in > RowDescription message and then reading descriptions in pg_type table. > > Is there some other way to get full typing information of the result I > am assuming is available to PostreSQL internally? Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Automatically parsing in-line composite types
Hi! On Tue, Oct 29, 2019 at 5:23 AM Dave Cramer wrote: > Reading the RowDescription is the only way I am aware of. But that provides only the types for the top-level fields, not the inline composite types. If your top-level field is a registered composite type then yes, it works out if you then go and read the definitions of those types from the system tables. But for any other case, where you for example subselect a list of columns from a table in a sub-query, it does not work out. I think ideally, with the introduction of composite types into PostgreSQL, RowDescription should have been extended to provide information for composite types as well, recursively. That way you would not even have to go and fetch additional information about those types, potentially hitting race conditions. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
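For registered composite types, the "read the definitions from the system tables" step looks roughly like this (a sketch; $1 is the composite type's OID from RowDescription, or from pg_type.typelem when the column is an array of that type):

SELECT a.attname, a.atttypid, format_type(a.atttypid, a.atttypmod) AS type_name
FROM pg_attribute AS a
WHERE a.attrelid = (SELECT typrelid FROM pg_type WHERE oid = $1)
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY a.attnum;

This only works when the composite type is declared with CREATE TYPE or is a table's row type; the anonymous record type produced by ROW(...) has no such catalog entry, which is exactly the problem.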
Re: Automatically parsing in-line composite types
Hi! On Tue, Oct 29, 2019 at 9:06 AM Fabio Ugo Venchiarutti wrote: > You can use subqueries and array_agg() to deepen your output tree all > the way to a stack overflow, a single _to_json() call at the > top will recursively traverse and convert whatever you feed it. Yes, what you are describing is exactly the sad state of things: the only way to meaningfully retrieve inline composite types, which are made when one aggregates rows like that or subselects a set of fields from a table in a sub-query, is to convert the whole thing to JSON and transmit it that way. Because this is the only way you can parse things on the client. If you leave it as the raw composite type encoding, you cannot really parse it correctly on the client in all cases without knowing what types are stored inside those composite types you are getting. But JSON is not a lossless transport format: it does not support the full floating-point spec (no Infinity, no NaN), and for many field types it just converts values to their string representation, which can be problematic. For example, if you have binary blobs. So no, JSON is a workaround, but it is sad that we should have to use it. PostgreSQL seems to be almost there with the support for composite types and nested query results, only it seems you cannot really parse them out. I mean, why does PostgreSQL even have its own binary format for results, then? It could just transmit everything as JSON. :-) But that does not really work for many data types. I think RowDescription should be extended to provide full recursive metadata about all data types. That would be the best way to do it. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
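For example, the JSON workaround applied to the query from my first message would look something like this (a sketch; json_agg and json_build_object replace array_agg(ROW(...)) so that the client only has to parse JSON):

SELECT _id, body,
       (SELECT json_agg(json_build_object('_id', comments._id, 'body', comments.body))
        FROM comments WHERE comments.post_id = posts._id) AS comments
FROM posts;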
Re: Automatically parsing in-line composite types
Hi! On Tue, Oct 29, 2019 at 11:33 AM Tom Lane wrote: > [ shrug... ] In a world where stability of the wire protocol were > of zero value, maybe we would do that. In the real world, don't > hold your breath. Oh, yes. I would hope this would be possible in a backwards compatible way. I am not too familiar with the wire protocol to know the answer to that though. > Clients would also > have to be prepared to parse and de-escape the data representation, > which is not trivial in either text or binary cases. Yes, but currently they cannot be prepared. They simply lack the necessary information. So if they are not prepared, then the state is the same as it is currently: they get some composite type in its encoded representation as a value. But if they are prepared, they have the necessary metadata to parse it. > On the whole I think it's generally better practice to explode your > composite types into separate fields for transmission to the client. The issue here is that it is really hard to make a general client for PostgreSQL. The user might want to run an arbitrary SQL query. I would like to be able to parse that automatically, without the user having to additionally specify how to parse it, or requiring them to change the SQL query, or showing them the encoded representation directly (not very user friendly). I agree that in simple cases one could just change the SQL query, but that is not really always possible. For example, aggregating a related table into an array is very useful because it makes the amount of data transmitted over the wire much smaller (instead of having to repeat the contents of the main table's rows again and again). > Note that the cases where JSON or XML shine are where you don't > necessarily have a consistent set of fields in different instances > of the composite values. Even if we did extend RowDescription to > support describing composites' sub-fields, it wouldn't be in > much of a position to deal with that. Yes, but that case is already handled: you just have a column of type "JSON" (or "JSONB") and it is clear how to automatically parse that. What I am missing is a way to automatically parse composite types. Those are generally not completely arbitrary, but are defined by the query, not by the data. What would be the next step to move this further in some direction? Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Automatically parsing in-line composite types
Hi! On Wed, Oct 30, 2019 at 8:37 AM Merlin Moncure wrote: > Check out libpqtypes: https://github.com/pgagarinov/libpqtypes Interesting. I have looked at the code a bit but I do not find how it determines the type for inline compound types, like the ones that appear in my original SQL query example. Could you maybe point me to the piece of code there handling that? Because to my understanding/exploration that information is simply not exposed to the client in any way. :-( > it does exactly what you want. It's a wrapper for libpq that provides > client side parsing for the binary protocol with array and composite > type parsing. It looks to me like it does parsing of composite types only if they are registered composite types, but not, for example, the ones you get if you project a subset of fields from a table in a subquery. Those have no registered composite type? Also, how are you handling discovery of registered types: do you read that on demand from the database? They are not provided over the wire? > Virtually any > non-C client application really ought to be using json rather than the > custom binary structures libpqtyps would provide. I thought that initially, too, but then found out that JSON has some heavy limitations because the implementation in PostgreSQL is standards-based. There is also no hook to do custom encoding of non-JSON values. So binary blobs are converted in an ugly way (base64 would be better). You also lose a lot of meta-information, because everything non-JSON gets converted to strings automatically. Like knowing what is a date. I think MongoDB with BSON made much more sense here. It looks like a perfect balance between the simplicity of the JSON structure and adding a few more useful data types. But yes, JSON is great also because clients often have optimized JSON readers, which can beat any other binary serialization format. In node.js, it is simply the fastest way there is to transfer data: https://mitar.tnode.com/post/in-nodejs-always-query-in-json-from-postgresql/ Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Automatically parsing in-line composite types
Hi! On Wed, Oct 30, 2019 at 3:06 PM Merlin Moncure wrote: > It looks it up from the database. Yes, this is how I started doing it in my prototype as well. > Correct. Only declared (via CREATE TYPE) composite types will work due > to protocol limitations. Exactly. This is where I got stuck, so this is why I started this thread. :-( > So if you decided to scratch in itch and create a postgres > BSON type, no one would likely use it, since the chances of adoption > in core are slim to none. Yea. :-( So we get back to square one. :-( One other approach I was investigating was developing a Babel-like transpiler for PostgreSQL SQL, so that I could have plugins which would convert SQL queries to automatically encode values in JSON. And then parse it back out once results arrive. Because yes, as you note, JSON is the only stable and supported format among all installations there is (except for the wire format, which has limitations). So having to map to it and back, but without developer having to think about it, might be the best solution. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Automatically parsing in-line composite types
Hi! On Thu, Oct 31, 2019 at 7:59 AM Tom Lane wrote: > tl;dr: there are a lot more worms in this can than you might guess. But with all those issues, how do people then use compound types in practice at all? Just at the top-level and have custom code to parse them out for every query they are making? Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Watching for view changes
Hi! I am trying to see how I could use NOTIFY/LISTEN to watch changes to the result of a complicated SELECT query which spans multiple tables. Ideally, I would like to leave it to PostgreSQL to determine when some data (and which data) in the result of the SELECT query has changed. So I am thinking that creating a temporary view using that query could be a way, if only I could find a way to watch such a view for changes somehow. But it seems this is not really possible. I looked into two mechanisms:

- Logical replication. Instead of NOTIFY/LISTEN I could simply create a publication over a view and then subscribe to it. But it seems logical replication can be done only over base tables and not views. [1]
- Using an "after" trigger on the view to get a notification when the view gets changed. I could even use transition relations to have information about what changed. But sadly it seems that this is possible only if there is also an INSTEAD OF trigger on the view. But I would like to get a notification when the view has changed because underlying tables have changed, and not because of an UPDATE query on the view itself. Moreover, I do not really need writable views. [2]

So I wonder if I am missing anything. Is there some other best practice for getting notifications when the result of a query changes in real time, along with information about what changed? How hard would it be to implement such triggers on a view, firing whenever the view changes? Is there a process to make a feature request? (Also, I have not really managed to get statement-level "after" triggers to run on a view at all. If I rewrite a query with INSTEAD OF, then the triggers on the underlying tables are fired, not the view's. So I am not sure what the expected use there even is.)

[1] https://www.postgresql.org/docs/devel/logical-replication-restrictions.html
[2] https://www.postgresql.org/docs/devel/trigger-definition.html

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Watching for view changes
Hi! Thanks. Care for a link or at least thread subject? Also, Googling around this seems a pretty popular request. I am quite surprised that there is not something out-of-box available for this, efficient and scalable. Mitar On Thu, Dec 20, 2018 at 7:33 AM Ron wrote: > > On 12/20/18 3:17 AM, Mitar wrote: > > Hi! > > > > I am trying to see how I could use NOTIFY/LISTEN to watch changes of a > > complicated SELECT query which spans multiple tables. Ideally, I would > > like to leave to PostgreSQL to determine when some data (and which > > data) in the result of the SELECT query has changed. So I am thinking > [snip] > > There was a LONG thread on this list a few months ago about monitoring for > DDL changes. It should prove helpful (or cause you to despair). > > > -- > Angular momentum makes the world go 'round. > -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Watching for view changes
Hi! Ah, that was a thread about table changes. I am interested in data changes (results from a query). Thanks. Mitar On Thu, Dec 20, 2018 at 8:01 AM Adrian Klaver wrote: > > On 12/20/18 7:54 AM, Mitar wrote: > > Hi! > > > > Thanks. Care for a link or at least thread subject? > > https://www.postgresql.org/message-id/CA%2BFnnTxqScf6mDw_7HLMfu7YTagPrsYUw-bc%3DOKsBQ0TqprvEA%40mail.gmail.com > > > > > Also, Googling around this seems a pretty popular request. I am quite > > surprised that there is not something out-of-box available for this, > > efficient and scalable. > > > > > > Mitar > > > > On Thu, Dec 20, 2018 at 7:33 AM Ron wrote: > >> > >> On 12/20/18 3:17 AM, Mitar wrote: > >>> Hi! > >>> > >>> I am trying to see how I could use NOTIFY/LISTEN to watch changes of a > >>> complicated SELECT query which spans multiple tables. Ideally, I would > >>> like to leave to PostgreSQL to determine when some data (and which > >>> data) in the result of the SELECT query has changed. So I am thinking > >> [snip] > >> > >> There was a LONG thread on this list a few months ago about monitoring for > >> DDL changes. It should prove helpful (or cause you to despair). > >> > >> > >> -- > >> Angular momentum makes the world go 'round. > >> > > > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Watching for view changes
Hi! On Thu, Dec 20, 2018 at 8:11 AM Mike Rylander wrote: > If, as I assume, the FROM clause of the SELECT is essentially static, > just with varying WHERE conditions, you could just use an AFTER > INSERT/UPDATE/DELETE trigger on each of the tables involved to send a > NOTIFY whenever data that is exposed by the SELECT is changed. You mean that for every SQL query I am making to a database, I would manually determine which tables are involved and then set up triggers with NOTIFY to signal that the SELECT might have changed? I am trying to see if this is something I could abstract out so that it would be done automatically for any query. I have looked into using EXPLAIN to get a list of tables involved in a query, but the issue is that it looks non-trivial to determine what has really changed as a consequence of those tables changing. I would have to cache the result of a query myself and then do a diff myself? Are there any ways to do this inside PostgreSQL without having to cache results on the client and do it there? > You can deliver a payload, such as the table name, primary key value and, > with a little work, even a list of fields that were modified as a JSON > blob, and let the application do whatever needs to be done to react > the the changes -- issue other queries, etc. But this would mean that I would have to know how changes on the involved tables influence the query results. I would like to not have to do SQL query parsing and understanding on the client. So ideally, I would get information directly from PostgreSQL. For me, an API where I could define an AFTER UPDATE trigger FOR EACH ROW on a view would be perfect. In that trigger I could get information about which rows of the view changed and then use NOTIFY to inform the client. Or even use transition relations to get the old and new state in the FOR EACH STATEMENT case (but then I would probably still have to diff it myself). And the view could represent any query, without me having to try to understand and parse it. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
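For reference, the per-table approach suggested above would look roughly like this (a sketch; the channel name, the _id column, and the JSON payload shape are my assumptions):

CREATE OR REPLACE FUNCTION notify_posts_changed() RETURNS trigger AS $$
DECLARE
  row_id text;
BEGIN
  -- Only OLD is available for DELETE; otherwise use NEW.
  IF TG_OP = 'DELETE' THEN
    row_id := OLD._id;
  ELSE
    row_id := NEW._id;
  END IF;
  -- NOTIFY payloads are size-limited, so send only identifying data, not whole rows.
  PERFORM pg_notify('posts_changed', json_build_object('op', TG_OP, 'id', row_id)::text);
  RETURN NULL;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_changed
  AFTER INSERT OR UPDATE OR DELETE ON posts
  FOR EACH ROW EXECUTE FUNCTION notify_posts_changed();

The client then LISTENs on the channel and has to work out itself how each change maps to the query results, which is exactly the part I would like to avoid doing by hand.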
Re: Watching for view changes
Hi! On Thu, Dec 20, 2018 at 12:07 PM Rob Sargent wrote: > Are you hoping to see the difference in the returned values for successive > calls to the same query? I would like to get a notification, in as close to real time as possible, that the results of a query would have changed if I ran the same query again, without me having to do polling or diffing. So if my query was SELECT * FROM table, then having an after update trigger on that table would give me such information on a row-by-row basis, or a statement basis. But I am not doing just such queries. So I was thinking that I could create a view and then do SELECT * FROM view, to get an update trigger on changes to the view. Changes would happen because of changes to the underlying tables, and I would like to be able to know when and what in that query has changed when the underlying tables have changed. I have found this package [1] which seems to be close in the idea, but it does not seem to work in all cases. There is also an older, different package. [2] I would like to see if something like this could be done inside the database itself. [1] https://github.com/nothingisdead/pg-live-query [2] https://github.com/numtel/pg-live-select Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Watching for view changes
Hi! On Thu, Dec 20, 2018 at 12:54 PM Kevin Brannen wrote: > Hmm, I guess I could see that as long as the DB wasn't too [write] busy, > else you'd be flooded with notifications. > Sure. But this could be resolved by allowing notifications to be batched together, i.e., debounced. So one could maybe configure how often at most you want such notifications, and if they occur more often, they would be combined into one. > Maybe it's a useful idea for you ... or maybe not. 😊 > Thanks. Yes, this is one approach: hooking into every modify call at the app level and in this way having some information about what is changing. I would prefer doing it in the database though, so that it could be independent of the source of the change. Moreover, not all UPDATE queries really do end up updating the data. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Watching for view changes
Hi! Thanks for more responses. On Thu, Dec 20, 2018 at 6:28 PM George Neuner wrote: > >You mean that for every SQL query I am making to a database, I would > >manually determine which tables are involved and then setup triggers > >with NOTIFY that the SELECT might have changed? > > You can just put an update trigger on every table. You aren't forced > to listen for notifications. You are right. It is easier to just have triggers in advance. > The difficulty is that views are not guaranteed to be updateable. What do you mean by that? Just to be clear: I care only about read-only views. The changes to views I care about happen because the underlying tables are updated; I do not care about UPDATE queries against views themselves. So you mean here that it is not guaranteed that you can make an UPDATE query against a view? Yes, that I know. But if underlying tables change, the view is always updated, no? I mean, this is the main purpose of a view. :-) > As a technical matter, you can put a trigger on a view, but it may never fire. It fires only for UPDATE queries against the view itself (and only if an INSTEAD OF trigger does not invalidate the update). But it never fires for updates which happen because of changes to the underlying tables. I would like to know: a) Do people agree/think that would be a good API for my use case? b) How hard would it be to implement/develop something like that? Is this something PostgreSQL already knows internally and it is just a question of exposing it? c) Is there some better way to achieve this? > AFAIK, update and delete triggers do work on materialized views, > because they really are physical tables. Depending on your use case, > materialized views may or may not be of help to you. Yes, materialized views are too heavy for me. But having update and delete triggers on memory-only temporary views would be perfect. Also, materialized views have to be manually refreshed, no? So it is not really true that they get updated automatically (and that update triggers would run) as soon as the underlying tables are modified? > >I would like to not have to do SQL query parsing and understanding > >on the client. So ideally, I would get information directly from > >PostgreSQL. For me, an API where I could do AFTER UPDATE trigger on > >FOR EACH ROW on a view would be perfect. In that trigger I could get > >information which rows of the view changed and then use NOTIFY to > >inform the client. > > You might want to use BEFORE UPDATE so you get both the old and new > row data. I am not sure how this helps. If I have a query like (or a view defined with such a query):

SELECT "_id", "body", (SELECT row_to_json(posts) FROM posts WHERE posts."_id"=comments."postId") AS "post" FROM comments

If I get a trigger notification that some row in the "comments" table has changed, how do I get the updated row in the query results (or in the view defined with such a query)? I would have to parse the SQL and figure out how to do the transformation myself, no? So I am not sure how triggers on the underlying tables can really tell me what in the view has been updated. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Watching for view changes
Hi! On Fri, Dec 21, 2018 at 11:10 PM George Neuner wrote: > A materialized view IS exactly such a deliberate cache of results from > applying a view. It is a real table that can be monitored for changes > using INSERT, UPDATE and/or DELETE triggers. Caching is needed if you want to compute a difference between previous version and new. But if you want to just know new value, then I could imagine that (a simple implementation would) on every change to any underlying table check if this change matches selectors of the query and if such apply its operations/projections and produce the new value. So yes, you need caching if you want to decrease CPU use, but you could also see it as new values being computed again and again through query. Would such caching you are mentioning really improve performance, I do not know, so it might be premature optimization? If we do not go down the cache path, then it seems there is no other way to have this "apply this query again on those updated rows from table". In a way I see query as a transformation (in relational algebra) of original tables to results and I would need to be able to apply that transformation again on new rows. And if some additional data is necessary (some rows for other non-changed tables) it would just recompute that again, instead of using cache. If we do go down the cache path, then I agree, materialized views seems nice, but I would prefer temporary materialized views: they should be cleaned up at the end of the session. Moreover, they should be ideally just in memory, not really on disk. Materialized views are currently stored to disk, no? > With a materialized view, you must apply the view code again (call > REFRESH) to see changes to the underlying tables - it doesn't happen > automagically. But when refreshed, triggers on the cache table would > react to changes. So you are saying I could use triggers on the cache table to know what really changed instead of having to compute diff myself? Interesting. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Watching for view changes
Hi! This is of very practical concern. :-) I have many apps I developed in Meteor [1] using MongoDB. The whole Meteor stack is about reactive programming, where the web UI automatically rerenders as data in the database is changing. Meteor achieves this using complicated server-side code which tails the MongoDB oplog, maps this to reactive queries, and maps how they update based on changes it observes in the oplog. This is then pushed to the client, which rerenders. This approach has scalability issues, and it also costs a lot of resources on the server side to first copy data from the DB into the server-side component and then keep that state in the server-side component in sync with the DB. It generally has to reimplement oplog parsing, query parsing and evaluation, to be able to do all that. I like this declarative style of programming, where you define reactive queries which select data from the DB, define a transformation, and then render it in the UI. As data in the DB changes, everything else gets updated automatically. It is a pretty nice way of programming, without having to think about all the places which might be updating the DB and how to update the UI based on all those places. Maybe not for everyone and all tasks, but in my case I generally work with collaborative online tools where such a real-time aspect of working together is pretty neat. So I like PostgreSQL and I have used it in other apps. And now I am trying to see if I could find an efficient way for PostgreSQL to have such reactive queries and send me data as the query results change. I think the DB already has to deal with most of such logic, and wiring it together in the DB instead of in the server side of the app might allow better performance and scaling here. For example, CREATE PUBLICATION seems a reasonable API as well (instead of a trigger + notification + working around a limit on how much data can be sent in a notification), but it does not work on materialized views. Why is that? I thought materialized views are the same as tables, just with a nicer API to copy a query into those tables when wanted. Currently it seems what is there is very similar to what MongoDB provides: publication/observe on a table level. So if I were not using joins I could SUBSCRIBE to the PUBLICATION. I would still have to implement logic for how to map those changes to changes in the result of a query though, to know how to update the results. If I could publish a materialized view, PostgreSQL could do that mapping for me. And I could also do joins. So it seems materialized views are close to this, but not perfect. So I have some questions:

- Is there a technical reason why a PUBLICATION cannot be done on a materialized view? I mean, I could probably manually create/simulate materialized views through regular tables. What are the differences between regular tables and materialized views?
- Are there limits on how many subscribers a PUBLICATION can effectively have?
- For my case it would be great if materialized views could be TEMPORARY, in-memory (and if PostgreSQL runs out of buffer space for it, I would prefer an error), and UNLOGGED. Any particular reasons which would prevent them from being implemented as such?

[1] https://github.com/meteor/meteor

Mitar On Sat, Dec 22, 2018 at 1:16 PM Rob Sargent wrote: > Is this of theoretical interest (up to and including a > specification/requirement) or this a practical concern (i.e. need to know > when to update somebody’s dashboard widget (but the query is too slow to > simply refresh on-demand)? 
> > > On Dec 22, 2018, at 9:42 AM, Ricardo Martin Gomez < > rimartingo...@hotmail.com> wrote: > > Hi, perhaps you can use triggers for some tables. > Regards. > > Obtener Outlook para Android <https://aka.ms/ghei36> > > -- > *From:* Mitar > *Sent:* Saturday, December 22, 2018 1:21:49 AM > *To:* Kevin Brannen > *Cc:* pgsql-general@lists.postgresql.org > *Subject:* Re: Watching for view changes > > Hi! > > On Thu, Dec 20, 2018 at 12:54 PM Kevin Brannen wrote: > >> Hmm, I guess I could see that as long as the DB wasn't too [write] busy, >> else you'd be flooded with notifications. >> > > Sure. But this could be resolved by allowing notifications to be batched > together. Debounce them. So could maybe configure how often you want such > notifications and if they are more often they would be combined together > into one. > > >> Maybe it's a useful idea for you ... or maybe not. 😊 >> > > Thanks. Yes, this is one approach to do it. Hooking into every modify call > at the app level and in this way have some information what is changing. I > would prefer doing it in the database though, so that it could be > independent from the source of the change. Moreover, not all UPDATE queries > really do end up updating the data. > > > Mitar > > -- > http://mitar.tnode.com/ > https://twitter.com/mitar_m > > > -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Watching for view changes
Hi! On Sun, Dec 23, 2018 at 1:00 AM George Neuner wrote: > A spreadsheet is an order of magnitude simpler to implement than a > DBMS, but the combination would be an order of magnitude (or more) > harder. Even object graph databases don't offer spreadsheet > functionality, and it would be a lot easier to do there than in a > table relational system. But having that readily available would be so cool. :-) Especially because it is hard. And also, it does not necessarily have to be a full graph. Just one level of dependencies. Then those are recomputed. And then anything depending on the now-changed values gets recomputed again. And so on. So there is no need to traverse the whole graph at once. > In PG, all *named* tables are backed on disk - even temporary tables. > Only anonymous tables of query results can exist entirely in memory > [and even they can spill onto disk when necessary]. > > With enough memory you can cache all your tables in shared buffers and > have enough extra that you never run out of work buffers and never > overflow a work buffer. But that is the best you can achieve with PG. Thanks for all this input. So I am now thinking in terms of materialized views. For my case it would be great if materialized views could be TEMPORARY (removed at the end of the session), in-memory (and if PostgreSQL runs out of buffer space for it, I would prefer an error instead of spilling to disk), and UNLOGGED. Any particular reasons which would prevent them from being implemented as such? I thought that materialized views are just a fancy table with a stored query, so that you can easily REFRESH them instead of doing that yourself. I can then wire triggers on the underlying tables to REFRESH materialized views automatically. Is there some easy way to debounce those refreshes? If I just blindly trigger REFRESH in every row trigger, this could do a lot of work. I would prefer to combine all changes into, for example, 100 ms batches and refresh only once per 100 ms. So in some way I would like to be able to control the level of real-time I need. I have found a blog post [1] about this, but it seems pretty tricky and requires an outside cron job. For a 100 ms batching time this feels like something better done inside PostgreSQL itself. The last question is how I get changes in materialized views streamed to the client. It seems one option is a trigger on the materialized view which uses NOTIFY to tell the client about the change. But NOTIFY has a limit on the payload size, so I cannot just send the change to the client. It seems I would need an additional table to store the change; the client would then get a notification, read from that table, and remove the rows read. So in a way I would need my own queue for changes. Any other suggestion how to do that? I looked into PUBLICATION and SUBSCRIBE, but it seems this is only supported between servers, not server-client, and it also works only on base tables, not materialized views (not sure again why, because aren't materialized views just tables?). Would it be possible to use a client to subscribe instead of a server? [1] https://onethingsimple.com/2017/10/sync-materialized-views-after-debounce-period/ Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
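The trigger wiring I have in mind for the refresh part would be roughly this (a sketch; my_matview and the posts base table are assumptions, and the 100 ms debouncing would still have to happen elsewhere, for example on the client):

CREATE OR REPLACE FUNCTION refresh_my_matview() RETURNS trigger AS $$
BEGIN
  -- With a unique index on the materialized view, CONCURRENTLY could be
  -- used here so that readers are not blocked during the refresh.
  REFRESH MATERIALIZED VIEW my_matview;
  RETURN NULL;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER refresh_my_matview
  AFTER INSERT OR UPDATE OR DELETE ON posts
  FOR EACH STATEMENT EXECUTE FUNCTION refresh_my_matview();

This refreshes once per statement on the base table; batching refreshes across statements is exactly the part that still needs outside help.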
Determine in a trigger if UPDATE query really changed anything
Hi! Currently I am doing:

CREATE TRIGGER some_trigger AFTER UPDATE ON my_table REFERENCING NEW TABLE AS new_table OLD TABLE AS old_table FOR EACH STATEMENT EXECUTE FUNCTION my_trigger();

In my trigger I do:

PERFORM * FROM ((TABLE old_table EXCEPT TABLE new_table) UNION ALL (TABLE new_table EXCEPT TABLE old_table)) AS differences LIMIT 1;
IF FOUND THEN ...

But I wonder if there is an easier way. I would just like to know if an UPDATE really changed anything. For DELETE I do "PERFORM * FROM old_table LIMIT 1" and for INSERT I do "PERFORM * FROM new_table LIMIT 1" and I think this is reasonable. Still, I am not sure why I have to store the whole relation just to know if a statement really changed anything. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
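For the row-level case (not quite the statement-level question above), the "did anything actually change" test can also be expressed directly in a trigger's WHEN condition, comparing the whole old and new rows. A sketch, where my_row_trigger() is a hypothetical row-level trigger function:

CREATE TRIGGER some_row_trigger
  AFTER UPDATE ON my_table
  FOR EACH ROW
  WHEN (OLD.* IS DISTINCT FROM NEW.*)
  EXECUTE FUNCTION my_row_trigger();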
Re: Watching for view changes
Hi! On Fri, Dec 21, 2018 at 11:10 PM George Neuner wrote: > A materialized view IS exactly such a deliberate cache of results from > applying a view. It is a real table that can be monitored for changes > using INSERT, UPDATE and/or DELETE triggers. Are you sure one can use triggers on a materialized view? I am getting: "my_materialized_view" is not a table or view as an error when I am trying to create a trigger on materialized view. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Determine in a trigger if UPDATE query really changed anything
Hi! On Mon, Dec 24, 2018 at 12:31 AM George Neuner wrote: > You're still thinking in terms of deltas for the whole table. Think > individual rows instead. > > With a BY ROW trigger, the difference between the NEW and OLD > variables lets you see the changes to the particular row. I was thinking of the statement trigger because I thought that is what works on materialized views. Now I see that this is not true anyway. But still, I am using these triggers to do a materialized view refresh, and I would prefer to do those at the statement level and not at the row level, so that I run the refresh only once per statement on the base table. For pushing notifications this can be done at the row level. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Watching for view changes
Hi! On Sun, Dec 23, 2018 at 11:59 PM George Neuner wrote: > IIRC the cache table's name is generated. I don't know the proper > incantations to get it from the catalogs, but an easy way to find it > is to put the materialized view into its own tablespace, then search > pg_tables for objects in that space. Oh, what a trick. :-) Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Watching for view changes
Hi! On Mon, Dec 24, 2018 at 12:20 AM George Neuner wrote: > Also remember that you may have to deal with DELETEd rows. If you > can't send row data by NOTIFY, then the client HAS to cache the whole > view anyway to see what's been deleted. The client side (web browser) is caching the view (because it is rendering it). I do not want the server-side component (the PostgreSQL client) to have to cache it as well. So I will use a materialized view to cache it in PostgreSQL directly, and then sync it all the way up to the web browser. Not sure if this changes anything? > If it makes sense, have the client collect some number of > notifications and read all the indicated rows in one query. You are right. Thanks again. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
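The batched read suggested above would be something like this on my side (a sketch; my_matview and _id are from my schema, and $1 is an array of ids collected from the notifications):

SELECT * FROM my_matview WHERE _id = ANY($1);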
Re: Watching for view changes
Hi! I am unable to find cache table's name. :-( Mitar On Mon, Dec 24, 2018 at 1:01 AM Mitar wrote: > > Hi! > > On Sun, Dec 23, 2018 at 11:59 PM George Neuner wrote: > > IIRC the cache table's name is generated. I don't know the proper > > incantations to get it from the catalogs, but an easy way to find it > > is to put the materialized view into its own tablespace, then search > > pg_tables for objects in that space. > > Oh, what a trick. :-) > > > Mitar > > -- > http://mitar.tnode.com/ > https://twitter.com/mitar_m -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Watching for view changes
Hi! On Mon, Dec 24, 2018 at 1:31 AM George Neuner wrote: > Did you execute the view or simply define it? I think the table is > created the 1st time the view is executed. View is populated with data and I can select from it. Still, I cannot find any table which would look like it in pg_tables? Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
CREATE UNLOGGED MATERIALIZED VIEW
Hi! I am looking at parser grammar rules and it looks like the following is a valid query: CREATE UNLOGGED MATERIALIZED VIEW Any particular reason this is not documented? Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: CREATE UNLOGGED MATERIALIZED VIEW
Hi! That is for transient tables (NEW and OLD tables), no? Not for the main materialized view table. Mitar On Tue, Dec 25, 2018 at 10:14 AM Adrian Klaver wrote: > > On 12/25/18 1:13 AM, Mitar wrote: > > Hi! > > > > I am looking at parser grammar rules and it looks like the following > > is a valid query: > > > > CREATE UNLOGGED MATERIALIZED VIEW > > > > Any particular reason this is not documented? > > My guess because of this: > > https://doxygen.postgresql.org/matview_8c.html > > Line 458-461 > > "/* >* We can skip WAL-logging the insertions, unless PITR or streaming > * replication is in use. We can skip the FSM in any case. > */ > " > > > > > > > Mitar > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: CREATE UNLOGGED MATERIALIZED VIEW
Hi! On Tue, Dec 25, 2018 at 10:25 AM Adrian Klaver wrote: > Don't know. From my POV MATERIALIZED VIEWS are transient any way you > look at them. They capture a state at a point in time and then the data > they depend on moves on. Given that they can be refreshed at will not > sure logging/not logging really matters. I agree with you. But the current default is RELPERSISTENCE_PERMANENT, and there is an option for RELPERSISTENCE_UNLOGGED. So I think that we should either a) make RELPERSISTENCE_UNLOGGED the default, b) document RELPERSISTENCE_UNLOGGED, or c) remove it. I propose b). Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: CREATE UNLOGGED MATERIALIZED VIEW
Hi! FYI, I started working on also adding TEMPORARY materialized views. [1] [1] https://www.postgresql.org/message-id/CAKLmikOwqJx7J%3Dfg7VFcuk2OUjqLy%3DHZj_J5stY1x8X7%2B14g5g%40mail.gmail.com UNLOGGED is still unsupported, but TEMPORARY materialized views do not have the issue mentioned in the quoted source comment because they do not exist after a crash anyway. Mitar On Wed, Dec 26, 2018 at 8:35 AM Tom Lane wrote: > > Mitar writes: > > I am looking at parser grammar rules and it looks like the following > > is a valid query: > > CREATE UNLOGGED MATERIALIZED VIEW > > Any particular reason this is not documented? > > Because it's not supported: > > regression=# CREATE UNLOGGED MATERIALIZED VIEW foo AS select 1; > ERROR: materialized views cannot be UNLOGGED > > Tracing down the source of that error message, we find > > /* > * For now, we disallow unlogged materialized views, because it seems > * like a bad idea for them to just go to empty after a crash. (If we > * could mark them as unpopulated, that would be better, but that > * requires catalog changes which crash recovery can't presently > * handle.) > */ > if (stmt->into->rel->relpersistence == RELPERSISTENCE_UNLOGGED) > ereport(ERROR, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("materialized views cannot be UNLOGGED"))); > > So that's a value judgment you might or might not agree with (and even if > you don't, it's not clear whether there are any deeper implementation > problems that would be exposed if we just deleted this prohibition). > But there it is, and it's why the documentation doesn't mention such a > capability. > > The reason the grammar allows the possibility is likely some combination > of (1) sharing grammar with other statements, (2) an error like the above > is a lot more user-friendly than "syntax error" if someone tries to use > the nonexistent feature, and (3) making provision for future enhancements. > > regards, tom lane -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Watching for view changes
Hi! I started a patch to address this limitation. [1] [1] https://www.postgresql.org/message-id/CAKLmikMN%2B0W79A7Wg4krK55cTLxb%3DTARdz2b%3Dvqp19ao6AAZjw%40mail.gmail.com Mitar On Mon, Dec 24, 2018 at 1:38 AM Mitar wrote: > > Hi! > > On Mon, Dec 24, 2018 at 1:31 AM George Neuner wrote: > > Did you execute the view or simply define it? I think the table is > > created the 1st time the view is executed. > > View is populated with data and I can select from it. Still, I cannot > find any table which would look like it in pg_tables? > > > Mitar > > -- > http://mitar.tnode.com/ > https://twitter.com/mitar_m -- http://mitar.tnode.com/ https://twitter.com/mitar_m
In which session context is a trigger run?
Hi! It seems to me that triggers on a table are run in the session context of the user who made a change in the table, and not of the user who defined the trigger? So I create a temporary function:

CREATE OR REPLACE FUNCTION pg_temp.my_function()

And a temporary table:

CREATE TEMPORARY TABLE posts_temp ( "id" CHAR(17) PRIMARY KEY );

And I add it to a regular table as a trigger:

CREATE TRIGGER posts_insert AFTER INSERT ON posts REFERENCING NEW TABLE AS new_table FOR EACH STATEMENT EXECUTE FUNCTION pg_temp.my_function();

When a row is added to the "posts" table outside of my session, the function "my_function" is called, but it seems it cannot access the "posts_temp" table. It seems that "my_function" is called in the session of the user who modified the table, and not of the user who created the function (pg_temp.my_function above) or the user who added the trigger. Is there a way to modify/configure this? Switch the session? Is there some other way that I could define a trigger which runs for the duration of my session and interacts with temporary objects inside my session, when data in regular tables is modified? Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: In which session context is a trigger run?
Hi! On Fri, Dec 28, 2018 at 12:57 PM Adrian Klaver wrote: > > CREATE OR REPLACE FUNCTION pg_temp.my_function() > > And it does what? Copies/transforms data from posts into posts_temp. > > When a row is added to "posts" table outside of my session, function > > "my_function" is called, but it seems it cannot access "posts_temp" > > table. It seems that "my_function" is called in the session of the > > Temp tables can not be seen by other sessions. Sure, but why is a temporary function used as a temporary trigger made from my session not run inside my session? Then it could see a temporary table made in my session. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: In which session context is a trigger run?
Hi! On Fri, Dec 28, 2018 at 3:25 PM Adrian Klaver wrote: > > Sure, but why is a temporary function used as a temporary trigger made > > There is no such thing as a temporary trigger. A trigger defined using a temporary function gets deleted once a function gets deleted, which is at the end of the session. Thus, it is a temporary trigger. Feel free to try it. Create a function in pg_temp and then define a trigger, disconnect, and you will see that trigger is deleted as well. > > from my session not run inside my session? Then it could see a > > temporary table made in my session. > > Except that is not what your OP stated: > > "And I add it to a regular table as a trigger: > > CREATE TRIGGER posts_insert AFTER INSERT ON posts REFERENCING NEW > TABLE AS new_table FOR EACH STATEMENT EXECUTE FUNCTION > pg_temp.my_function(); > > When a row is added to "posts" table outside of my session, function > "my_function" is called, but it seems it cannot access "posts_temp" > table." > > So are you talking about another case now? No. Still the same case. I have a regular table "posts" and a temporary table "posts_temp". I want to create a trigger on "posts" which calls "my_function". "my_function" then copies data from "posts" to "posts_temp". The problem is that if "posts" is modified in another session, the trigger cannot access "posts_temp". I wonder if there is a way to call "my_function" inside the same temporary context / session where it was defined, because in that same session also "posts_temp" was defined. Mitar
Re: In which session context is a trigger run?
Hi! On Fri, Dec 28, 2018 at 9:36 PM Adrian Klaver wrote: > When you create the temporary function it is 'pinned' to a particular > session/pg_temp_nn. Running the trigger in another session 'pins' it to > that session and it is not able to see the posts_temp table in the > original session. Yes. But why is trigger run in that other session? Could there be a way to get trigger to run in the session where it was declared? And yes, global temporary tables would be another approach to solve this. But being able to set the session could be another, no? Or are sessions linked to OS processes PostgreSQL is using and this is why triggers run based on the session in which a change on the table was made? Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Using placeholders when creating a materialized view
Hi! Is it possible to use placeholders and provide values to the query when creating a materialized view? It looks like direct passing of values to the query is not possible and I have to encode them? Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
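Since direct parameter passing does not seem possible for CREATE MATERIALIZED VIEW, one workaround is to build the statement text with format(), which quotes a value safely (a sketch; the view name, table, column, and value are all placeholders):

DO $$
BEGIN
  EXECUTE format(
    'CREATE MATERIALIZED VIEW recent_posts AS SELECT * FROM posts WHERE created_at > %L',
    '2019-01-01'::timestamptz  -- %L quotes this value as a literal
  );
END
$$;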
Re: In which session context is a trigger run?
Hi! On Sun, Dec 30, 2018 at 9:58 PM David G. Johnston wrote: > Doesn’t seem likely. Maybe you can use NOTIFY/LISTEN in your temporary > triggers and have your active client perform the work after being notified. Yes, this is what I ended up doing. I signal the client and then the client dispatches the call inside the correct client connection/session. It seems to work fine. The latency it adds is probably also not too big a problem for me. I will see over time. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: In which session context is a trigger run?
Hi! On Mon, Dec 31, 2018 at 2:58 AM Peter J. Holzer wrote: > On 2018-12-30 08:56:13 -0800, Adrian Klaver wrote: > > On 12/30/18 3:08 AM, Peter J. Holzer wrote: > > > If I understood Mitar correctly he wants the trigger to execute in the > > > session where it was declared, not in the sessio where the statement was > > > executed that triggered the trigger. > > > > There is the additional hitch that the trigger is being declared to use a > > temporary function that is defined in an alias schema pg_temp. > > Yeah, but that would be neatly solved by the magic teleportation of the > trigger execution. (I think this is actually how he got the idea that > executing the trigger in a different session would solve his problem.) Thanks for understanding well what I am asking. Yes, it seems some form of sending around triggers would be needed. I do not really care if those triggers would run after transaction is committed. So they should be some REALLY AFTER triggers. :-) For now I am using NOTIFY/LISTEN to send information to the client which then dispatches it to current connection/session. It allows me also to batch multiple trigger calls together. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Dropping and creating a trigger
Hi! I am seeing such errors in logs:

ERROR: trigger "myapp_assignments" for relation "assignments" already exists
STATEMENT: BEGIN TRANSACTION;
  DROP TRIGGER IF EXISTS "myapp_assignments" ON "assignments";
  CREATE TRIGGER "myapp_assignments" AFTER INSERT OR UPDATE OR DELETE ON "assignments" FOR EACH ROW EXECUTE PROCEDURE "tblobs_myapp"();
  COMMIT;

How is this possible? If I am inside a transaction, this should work, no? Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Dropping and creating a trigger
Hi! On Sat, Jan 5, 2019 at 9:35 AM Adrian Klaver wrote: > > How is this possible? If I am inside a transaction, this should work, no? > > Works here: I thought so. This is being run in parallel multiple times by a benchmarking tool I made. So it is not just done once, but many times (50x) at almost the same time. > select version(); > version PostgreSQL 11.1 (Debian 11.1-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0- 18+deb9u1) 6.3.0 20170516, 64-bit In fact, I am using this Docker image [1] with two patches applied (the ones I currently have in the commitfest). I think they are unrelated to this problem. > 2) Is this one of your 'temporary' trigger/function combos? No. I was trying to fix this code of a package I found. [2] It currently does not work well because, again, if it runs multiple times in parallel, it happens that sometimes the same trigger tries to be created twice in a row, failing the second time. So I tried to fix it by wrapping it into a transaction, but then it surprisingly didn't work. To reproduce this (if people are interested), I think you could try:

- use the Docker image [1]
- clone this benchmarking tool [3]
- after installing, modify node_modules/pg-table-observer/dist/PgTableObserver.js to try BEGIN/COMMIT around the block, see the attached patch
- maybe modify index.js to provide connection information for your PostgreSQL instance (the CONN_STR variable)
- run: node --experimental-worker --expose-gc index.js pg-query-observer
- ignore errors from the app, check PostgreSQL logs

[1] https://github.com/mitar/docker-postgres
[2] https://github.com/Richie765/pg-table-observer/blob/master/src/PgTableObserver.js#L199
[3] https://github.com/mitar/node-pg-reactivity-benchmark

(How can this thread be moved to the bugs mailing list?)

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m

--- node_modules/pg-table-observer/dist/PgTableObserver.js.orig 2019-01-05 14:11:33.303140087 -0800
+++ node_modules/pg-table-observer/dist/PgTableObserver.js 2019-01-05 14:19:24.817530060 -0800
@@ -373,7 +373,7 @@
 trigger_name = channel + '_' + table;
 _context4.prev = 2;
 _context4.next = 5;
-return db.none('\nCREATE TRIGGER $1~\nAFTER INSERT OR UPDATE OR DELETE ON $2~\nFOR EACH ROW EXECUTE PROCEDURE $3~()\n ', [trigger_name, table, _this2.trigger_func]);
+return db.none('\nBEGIN; DROP TRIGGER IF EXISTS "' + trigger_name + '" ON "' + table + '"; CREATE TRIGGER "' + trigger_name + '"\nAFTER INSERT OR UPDATE OR DELETE ON "' + table + '"\nFOR EACH ROW EXECUTE PROCEDURE "' + _this2.trigger_func + '"(); COMMIT;\n ');

 case 5:
 _context4.next = 9;
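One way to serialize the concurrent DROP/CREATE TRIGGER pairs, sketched under the assumption that the failures come from parallel runs of this same block, is a transaction-scoped advisory lock (the lock key is an arbitrary constant):

BEGIN;
-- Concurrent transactions block here until the holder commits, so the
-- following DROP ... IF EXISTS then sees any trigger the other one created.
SELECT pg_advisory_xact_lock(873245);
DROP TRIGGER IF EXISTS "myapp_assignments" ON "assignments";
CREATE TRIGGER "myapp_assignments" AFTER INSERT OR UPDATE OR DELETE ON "assignments" FOR EACH ROW EXECUTE PROCEDURE "tblobs_myapp"();
COMMIT;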
Re: Watching for view changes
Hi! I have a followup to this thread. George, thanks for all the help. I spent some time investigating various approaches and I am reporting some findings here, so that they might help others as well. First, I tried the approach with MATERIALIZED VIEWs holding the cached contents of the query. I attached triggers to all source tables which then notified (using LISTEN/NOTIFY) the client about changes. The client throttled those notifications and eventually triggered a REFRESH MATERIALIZED VIEW. Because I also attached triggers to the materialized view, I got notifications (using LISTEN/NOTIFY) of which rows had changed. The client might then decide to also fetch the rows themselves. For this to work well I made two patches: a patch to allow creation of TEMPORARY MATERIALIZED VIEWs [1] and a patch to allow attaching triggers to materialized views [2]. In the second patch I also changed the REFRESH MATERIALIZED VIEW CONCURRENTLY logic to issue not just DELETEs and INSERTs for changed rows, but to compute which rows have changed and issue UPDATEs for them as well. This makes it easier for the client to know what changed. This worked well, but it was slower than some other packages I have found on the web which were trying to provide similar functionality. I made a benchmark to compare them [3] and found out that there was room for improvement. First, instead of sending updates of a MATERIALIZED VIEW using LISTEN/NOTIFY and then fetching rows, I used a trigger to copy changes to another temporary table, and then just used DELETE FROM temp_table RETURNING * to get all results from the table and return them to the client, all inside the same transaction, so data in that temporary table was never committed. This made things a bit better; mostly, the latency between a change and getting it to the client became more predictable. Before, there were sometimes quite large spikes. Once I did that, I realized that in fact a MATERIALIZED VIEW is not really necessary. All I need is a place to cache previous results of the query; I do not really care about updates to the MATERIALIZED VIEW itself. So I decided to do the following. If I already have a temporary table with previous results, when I want to refresh my query, I create a new temporary table using CREATE TABLE AS with the PREPAREd query, I compute the diff between those two tables in the same way REFRESH MATERIALIZED VIEW CONCURRENTLY does, in one query, and I return those results to the client. Then I just DROP the old cache table and rename the new cache table to the old cache table's name. So instead of computing a diff, updating the materialized view, running triggers, and copying to a table, I just compute a diff and that is it. This now works very well. I made and published a package doing this [4]. Of course, the main problem is still that for every change in a source table I have to eventually refresh the query. And without more logic this can really become problematic. Ideally, we could reuse some ideas/code from Incremental View Maintenance [5] to compute what changes to the query results should happen based on changes to source tables. Then we could just push those to the client (and update the temporary table). We would then do a full refresh only when things could not be inferred. Thanks again for everything. 
[1] https://commitfest.postgresql.org/21/1951/
[2] https://commitfest.postgresql.org/21/1948/
[3] https://github.com/mitar/node-pg-reactivity-benchmark
[4] https://github.com/tozd/node-reactive-postgres
[5] https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
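To illustrate the final approach described above, a minimal sketch of one refresh step (names are assumptions: the query is prepared as my_query and the previous results live in query_cache):

-- take a new snapshot of the query results
CREATE TEMPORARY TABLE query_cache_new AS EXECUTE my_query;

-- symmetric difference against the previous snapshot: rows only in
-- query_cache_new are new or updated, rows only in query_cache are
-- deleted or the old versions of updated rows
(TABLE query_cache_new EXCEPT TABLE query_cache)
UNION ALL
(TABLE query_cache EXCEPT TABLE query_cache_new);

-- swap the snapshots for the next refresh
DROP TABLE query_cache;
ALTER TABLE query_cache_new RENAME TO query_cache;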
Optimizing the same PREPAREd static query (without parameters)
Hi! If I have a PREPAREd query without parameters (static) and I EXECUTE it repeatedly in the same session, does PostgreSQL learn/optimize anything across those runs? Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Optimizing the same PREPAREd static query (without parameters)
Hi! On Mon, Jan 7, 2019 at 12:09 AM David Rowley wrote: > On Mon, 7 Jan 2019 at 18:54, Mitar wrote: > > If I have a PREPAREd query without parameters (static) and I EXECUTE > > it repeatedly in the same session, does PostgreSQL learn/optimize > > anything across those runs? > > Yes, it will generate the query plan on the first invocation of > EXECUTE and use that plan for all subsequent EXECUTEs for the session > until you DEALLOCATE the prepared query or DISCARD PLANS/ALL; So this sounds more like no? So the same plan is used, so PostgreSQL is not further optimizing the plan based on any statistics or anything? Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
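One way to observe this from a session (a sketch; the table and the predicate are placeholders):

PREPARE q AS SELECT count(*) FROM posts WHERE body LIKE '%something%';

-- The first EXECUTE builds and caches the plan; later EXECUTEs in the same
-- session reuse it until DEALLOCATE q or DISCARD PLANS/ALL.
EXPLAIN (ANALYZE) EXECUTE q;
EXPLAIN (ANALYZE) EXECUTE q;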
Re: Optimizing the same PREPAREd static query (without parameters)
Hi! On Mon, Jan 7, 2019 at 12:44 AM David Rowley wrote: > If you're asking if it caches the result and foregoes scanning the > underlying tables, then that's a "No". Else what further optimising > did you have in mind? For example, it could learn better statistics. In documentation [1] it is written: > A generic plan assumes that each value supplied to EXECUTE is one of the > column's distinct values and that column values are uniformly distributed. > For example, if statistics record three distinct column values, a generic > plan assumes a column equality comparison will match 33% of processed rows. > Column statistics also allow generic plans to accurately compute the > selectivity of unique columns. So it could learn that the values used are not distinct values, or that column values are not uniformly distributed? And maybe decide to change the plan? So it makes a plan, runs it, determines that the plan was not as good as expected, I run it again, it decides to try another plan. It is better, it decides to switch to it and keep it. [1] https://www.postgresql.org/docs/devel/sql-prepare.html Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Benchmark of using JSON to transport query results in node.js
Hi!

I made some benchmarks of using JSON to transport query results to node.js, and it seems it really makes a difference compared to using the native or standard PostgreSQL result formats. The idea is that you simply wrap all results into JSON, like SELECT to_json(t) FROM (... original query ...) AS t. I am guessing this is because node.js/JavaScript has a really fast JSON parser, while everything else has overhead. See my blog post for more details [1]. Any feedback welcome.

This makes me wonder: if serialization/deserialization makes such a big impact, were there efforts to improve how results are serialized for over-the-wire transmission? For example, to use something like Cap'n Proto [2] to serialize into a structure which can be used directly, without any real deserialization?

[1] https://mitar.tnode.com/post/181893159351/in-nodejs-always-query-in-json-from-postgresql
[2] https://capnproto.org/

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
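(Concretely, the wrapping looks like this; the inner query is just illustrative. The first form returns one JSON value per row, the second returns the whole result set as a single JSON array:)

SELECT to_json(t) FROM (SELECT i, j FROM my_table) AS t;
SELECT json_agg(t) FROM (SELECT i, j FROM my_table) AS t;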
Re: Benchmark of using JSON to transport query results in node.js
Hi!

On Fri, Jan 11, 2019 at 3:06 AM Tony Shelver wrote:
> I'm fairly new to Postgres, but one question is how node.js implements the
> native driver when fetching the data: fetchall, fetchmany or fetch.single?
> Also which native driver is it using?

The package I used is here [1]. It is using libpq. Is there some better native driver to use? It seems it is mostly using the PQexec call; I am not sure how that call internally fetches the data [2]. I think the reason for the slowness is that parsing of the results is still done in JavaScript [3].

[1] https://github.com/brianc/node-pg-native
[2] https://www.postgresql.org/docs/devel/libpq-exec.html
[3] https://github.com/brianc/node-pg-native/blob/master/lib/build-result.js

> Does the native driver do a round trip for each record fetched, or can it
> batch them into multiples?

I think it waits for all results to first arrive through the native driver and then starts processing them in JavaScript.

> Another option is to query directly from node.js and get JSON or native query
> from the database (extensive use of functions / stored procedures).

For web applications, I was even thinking about this crazy approach: get PostgreSQL to encode all results in JSON, and then in node.js do not parse the JSON at all, but send it as a string directly to the browser. I have seen this many times in my other projects: having REST and other layers in between requires you to read from the database, serialize the data into something, then deserialize it back on the server side, maybe even wrap it into ORM objects (Django, I am looking at you), just so that you can pass it to your REST code, which then converts it to JSON and sends it over. From my older benchmarks using Python/Django, this added a few hundred milliseconds very quickly when, for example, time-series GeoJSON data (a few thousand points) was read and sent to the client through a REST interface. I hacked it so that I passed the GeoJSON fields through directly as strings, by injecting the GeoJSON into a placeholder in the otherwise serialized JSON (luckily JSON is recursive) through pure string manipulation, and it improved things greatly. And I just disabled all other REST content types except JSON; we were not using XML or YAML over our API anyway.

So to me it seems encoding all results in JSON is always a win in node.js. If you consume the results in node.js, great. If you are planning to pass them on, maybe just pass them on as-is.

> Our application is data-intensive, involving a lot of geotracking data across
> hundreds of devices at it's core, and then quite a bit of geo/mapping/
> analytics around that..

It seems maybe very similar to what I had in Python/Django/REST in the past. :-)

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
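(A sketch of the pass-through idea, expressed in SQL alone; the column alias and the inner query are illustrative. The query returns a single text column which the node.js side can write to the HTTP response without ever parsing it:)

SELECT coalesce(json_agg(t), '[]'::json)::text AS payload
  FROM (SELECT i, j FROM my_table) AS t;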
Relaxing NaN/Infinity restriction in JSON fields
Hi!

When migrating from MongoDB to PostgreSQL, one thing which just surprised me is that I cannot store NaN/Infinity in JSON fields. I know that standard JSON restricts those values, but allowing them is a very common (and welcome) relaxation. What are the prospects of this restriction being lifted? It is really sad that one cannot directly store all IEEE 754 double precision floating point values.

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Relaxing NaN/Infinity restriction in JSON fields
Hi!

On Mon, May 6, 2019 at 1:21 PM Tom Lane wrote:
> Getting us to deviate from the RFC so blatantly would be a very hard sell.
> A large part of the point of the JSON datatype is to be interoperable;
> once you give that up you may as well use some not-standard-at-all
> representation.

Python supports that relaxation, enabled by default: https://docs.python.org/3/library/json.html#infinite-and-nan-number-values

> There is not, and never has been, any claim that JSON numbers correspond
> to the IEEE spec.

There is a note [1], but yes, it does not claim that, nor did I claim that. I am just saying that the reality is that most people these days use IEEE floating point numbers, so it is sad that those cannot be easily stored in JSON, or in the database.

[1] https://tools.ietf.org/html/rfc7159#page-7

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Relaxing NaN/Infinity restriction in JSON fields
Hi!

On Wed, May 8, 2019 at 6:09 AM Alvaro Herrera wrote:
> If you want to complain about JSON, it's IETF that you need to talk
> about, not us -- we're just implementing their spec. As for storing the
> numbers in a database, you can already do that, just not on the JSON
> datatype.

Yes, I see now why so many projects implement variations on JSON, like BSON: so that they can have a mostly compatible structure, but with support for all floats and for datetime values. What are the thoughts on adding something like that? A PgJSON variant. :-)

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Why is writing JSONB faster than just JSON?
Hi!

I have a project where, among other data, we want to store static JSON objects which can get pretty large (10-100 KB). I was trying to evaluate how it would work if we simply stored them as an additional column in a PostgreSQL database, so I made a benchmark [1]. The results surprised me a bit and I am writing here because I would like to understand them. Namely, it looks like writing into a jsonb typed column is 30% faster than writing into a json typed column. Why is that? Does jsonb not require parsing the JSON and converting it? That should be slower than just storing the blob as-is?

[1] https://gitlab.com/mitar/benchmark-pg-json

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
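(The benchmark [1] is written in node.js, but in essence it times the equivalent of the following for each column type; the table names here are just illustrative:)

CREATE TABLE t_json  (data json);
CREATE TABLE t_jsonb (data jsonb);
CREATE TABLE t_text  (data text);

-- the same ~10-100 KB document, with mostly random string values, inserted 10k times
INSERT INTO t_json  (data) VALUES ('{"key": "..."}');
INSERT INTO t_jsonb (data) VALUES ('{"key": "..."}');
INSERT INTO t_text  (data) VALUES ('{"key": "..."}');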
Re: Why is writing JSONB faster than just JSON?
Hi!

On Thu, Apr 15, 2021 at 12:11 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > My point was that for JSON, after validating that the input is
> > syntactically correct, we just store it as-received. So in particular
> > the amount of whitespace in the value would depend on how the client
> > had chosen to format the JSON. This'd affect the stored size of
> > course, and I think it would have an effect on compression time too.
>
> Yes, I got it and just wanted to confirm you were right - this was the
> reason I've observed slowdown trying to reproduce the report.

Thank you for trying to reproduce the report. I did a bit more digging myself and I am still confused.

First, it is important to note that the JSON I am using contains primarily random strings as values, so not really something which is easy to compress. See the example at [1]. I realized, though, that in the previous benchmark I had been using the same JSON document and inserting it multiple times, so compression might have worked across documents or something. So I ran a version of the benchmark with different JSONs being inserted (with the same structure, just with random strings as values). There was no difference.

Second, as you can see from [1], the JSON representation I am using is really compact and has no extra spaces. I also used pg_total_relation_size to get the size of the table after inserting 10k rows, and the numbers are similar, with JSONB being slightly larger than the others. So I think the compression idea does not hold.

So I do not know what is happening and why you cannot reproduce it. Maybe explain a bit how you are trying to reproduce it? Directly from the psql console? Are you using the same version as me (13.2)?

Numbers when inserting the same large JSON 10k times:

Type: jsonb
Mean: 200243.1
Stddev: 1679.7741187433503
Size: { pg_total_relation_size: '4611792896' }

Type: json
Mean: 256938.5
Stddev: 2471.9909890612466
Size: { pg_total_relation_size: '4597833728' }

Type: text
Mean: 248175.3
Stddev: 376.677594236769
Size: { pg_total_relation_size: '4597833728' }

Numbers when inserting different JSONs 10k times:

Type: jsonb
Mean: 202794.5
Stddev: 978.5346442512907
Size: { pg_total_relation_size: '4611792896' }

Type: json
Mean: 259437.9
Stddev: 1785.8411155531167
Size: { pg_total_relation_size: '4597833728' }

Type: text
Mean: 250060.5
Stddev: 912.9207249263213
Size: { pg_total_relation_size: '4597833728' }

[1] https://gitlab.com/mitar/benchmark-pg-json/-/blob/master/example.json

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Why is writing JSONB faster than just JSON?
Hi!

On Fri, Apr 23, 2021 at 10:28 AM Francisco Olarte wrote:
> A fast look at the link. It seems to be long string of random LOWER
> CASE HEX DIGITS. A simple huffman coder can probably put it in 5 bits
> per char, and a more sophisticated algorithm can probably approach 4.

But this compressibility would apply to both the JSONB and JSON column types, no? Moreover, it looks like the JSONB column type ends up larger on disk.

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Why is writing JSONB faster than just JSON?
Hi!

On Fri, Apr 23, 2021 at 10:49 AM Francisco Olarte wrote:
> Of course, I did not follow the thread to deeply, just pointed that in
> case you were assuming that was not going to be stored compressed.

Thanks for pointing that out. I was just trying to make sure I understand you correctly and that we are all on the same page about the implications. It seems we are.

> Also, not surprised JSONB ends up being fatter,

Yes, by itself this is not surprising. The reason I mentioned it is that in my original post in this thread I reported being surprised that inserting into a JSONB column is observably faster than inserting into a JSON or TEXT column (for the same data), and I wonder why that is. One theory presented was that JSONB might compress better, so there is less I/O and insertion is therefore faster. But JSONB does not look more compressed (I had not measured the sizes in my original benchmark), so now I am searching for other explanations for the results of my benchmark.

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
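(For reference, this is roughly how the on-disk sizes can be compared, using the illustrative table names from the sketch earlier in the thread:)

SELECT pg_size_pretty(pg_total_relation_size('t_jsonb')) AS jsonb_size,
       pg_size_pretty(pg_total_relation_size('t_json'))  AS json_size,
       pg_size_pretty(pg_total_relation_size('t_text'))  AS text_size;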
Determining if a table really changed in a trigger
Hi!

I have a trigger like:

CREATE TRIGGER update_trigger
  AFTER UPDATE ON my_table
  REFERENCING NEW TABLE AS new_table OLD TABLE AS old_table
  FOR EACH STATEMENT EXECUTE FUNCTION trigger_function();

I would like to test inside trigger_function whether the table really changed. I have tried to do:

PERFORM * FROM ((TABLE old_table EXCEPT TABLE new_table) UNION ALL (TABLE new_table EXCEPT TABLE old_table)) AS differences LIMIT 1;
IF FOUND THEN
  ... changed ...
END IF;

But this fails if the table contains a JSON field, with the error:

could not identify an equality operator for type json

The table has a unique index column, if that helps.

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Determining if a table really changed in a trigger
Hi!

Thank you everyone for your responses; I have investigated them. I have also found the composite type operators [1]. Is there no way to tell the EXCEPT operator to use *= as its equality operator? A *EXCEPT would seem to be a useful operator to have. :-)

I am not sure about performance, though. EXCEPT is generally fast, but probably because it can use indexes; I am not sure how fast *= is, given that it compares binary representations. What is the experience of others with this operator?

[1] https://www.postgresql.org/docs/current/functions-comparisons.html#COMPOSITE-TYPE-COMPARISON

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
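(To illustrate what I mean by *=: it compares the internal binary representation, so it also works for columns of a type like json which has no equality operator. The values below are illustrative; note this also means two semantically equal json values that differ only in whitespace compare as different:)

SELECT ROW(1, '{"key": 1}'::json) *= ROW(1, '{"key": 1}'::json);  -- true
SELECT ROW(1, '{"key": 1}'::json) *= ROW(1, '{"key": 2}'::json);  -- false
SELECT ROW(1, '{"key": 1}'::json) *= ROW(1, '{"key":1}'::json);   -- false, whitespace differs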
Re: Determining if a table really changed in a trigger
Hi!

On Tue, Oct 26, 2021 at 10:17 PM Mark Dilger wrote:
> I can't tell from your post if you want the trivial update to be performed,
> but if not, would it work to filter trivial updates as:

No, I want to skip trivial updates (those which do not change anything). But my trigger is per statement, not per row, so I do not think your approach works there? This is why I am making a more complicated check inside the trigger itself.

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Determining if a table really changed in a trigger
Hi!

On Tue, Oct 26, 2021 at 10:55 PM Mark Dilger wrote:
> The trigger "my_table_trig" in the example is a per row trigger, but it
> exists only to demonstrate that the rule has filtered out the appropriate
> rows. You can use the rule "my_table_rule" as written and a per statement
> trigger, as here:

Oh, very interesting. I thought that this was not possible because the WHEN condition on statement triggers does not have NEW and OLD. But this is a very cool way to combine rules with triggers, where a rule can still operate per row. Thank you for sharing this!

> Note that there is a performance cost to storing the old rows using the
> REFERENCING clause of the trigger

Yes; by moving the trivial-update check to a rule, I need REFERENCING only to see whether there were any changes at all. This seems a bit excessive. Is there a way to check if any rows have been affected by an UPDATE inside a per statement trigger without using REFERENCING?

> Note that I used equality and inequality rather than IS DISTINCT FROM and IS
> NOT DISTINCT FROM in the design, but you should think about how NULL values
> (old, new, or both) will behave in the solution you choose.

I have just now tested the following rule:

CREATE RULE filter_trivial_updates AS ON UPDATE TO my_table WHERE NEW *= OLD DO INSTEAD NOTHING;

and it looks like it works well. It sidesteps the issue around the equality operator for type json and also just compares NULLs as another value (which I would like). I am not sure how it compares performance-wise with listing all columns and using the regular equality operator.

I also notice that you check if a table has any rows with:

SELECT true INTO have_rows FROM old_values LIMIT 1;
IF have_rows THEN ...

Is this just a question of style, or is it a better approach than my:

PERFORM * FROM old_values LIMIT 1;
IF FOUND THEN ...

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Determining if a table really changed in a trigger
Hi!

On Wed, Oct 27, 2021 at 1:16 AM Mark Dilger wrote:
> If Mitar finds that suppress_redundant_updates_trigger is sufficient, that
> may be a simpler solution. Thanks for mentioning it.
>
> The suppress_redundant_updates_trigger uses memcmp on the old and new rows.
> I don't know if memcmp will be sufficient in this case, since json can be
> binary unequal and yet turn out to be equal once cast to jsonb. I was using
> the rule and casting the json column to jsonb before comparing for equality.

Very interesting, I didn't know about that trigger. Memcmp is OK for my use case; this is why I am considering *= as well.

I am guessing that, if I am already doing a row comparison on every UPDATE before my AFTER trigger so that I do not run the trigger (the rule-based approach suggested by Mark), it is probably better to do that row comparison in a BEFORE trigger which prevents the UPDATE from even happening: I already pay for the row comparison, so at least I could prevent the disk write as well. Do I understand that correctly?

So the only remaining question is how to prevent my statement trigger from running if no rows end up being changed by an INSERT/UPDATE/DELETE, without having to use REFERENCING.

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Determining if a table really changed in a trigger
Hi!

On Wed, Oct 27, 2021 at 3:56 AM Michael Lewis wrote:
> If you end up with no rows changing from an insert or delete, something seems
> awry. Unless you mean 0 rows affected.

Isn't this the same? Isn't the number of rows affected the same as the number of rows changing? For example:

DELETE FROM my_table where i=100;

would not change anything in your example. But probably this is just terminology I have used badly.

> Do after statement triggers still execute? I suppose they very well might.

I have run the following and it seems statement triggers still execute even if nothing changes:

postgres=# create table my_table (i integer, j json);
CREATE TABLE
postgres=# insert into my_table select gs::integer, '{"key":1}'::json from generate_series(1,3) gs;
INSERT 0 3
postgres=# create function my_table_func () returns trigger as $$
declare
  have_rows boolean;
begin
  raise warning 'trigger called';
  if (tg_op = 'INSERT') then
    select true into have_rows from new_values limit 1;
    if have_rows then
      raise warning 'rows have changed';
    end if;
  elsif (tg_op = 'UPDATE' or tg_op = 'DELETE') then
    select true into have_rows from old_values limit 1;
    if have_rows then
      raise warning 'rows have changed';
    end if;
  end if;
  return null;
end
$$ language plpgsql;
CREATE FUNCTION
postgres=# create trigger my_table_trig_insert after insert on my_table referencing new table as new_values for each statement execute function my_table_func();
CREATE TRIGGER
postgres=# create trigger my_table_trig_update after update on my_table referencing old table as old_values for each statement execute function my_table_func();
CREATE TRIGGER
postgres=# create trigger my_table_trig_delete after delete on my_table referencing old table as old_values for each statement execute function my_table_func();
CREATE TRIGGER
postgres=# update my_table set j = '{"key":2}'::jsonb;
WARNING: trigger called
WARNING: rows have changed
UPDATE 3
postgres=# update my_table set j = '{"key":2}'::jsonb;
WARNING: trigger called
WARNING: rows have changed
UPDATE 3
postgres=# create trigger z_min_update before update on my_table for each row execute function suppress_redundant_updates_trigger();
CREATE TRIGGER
postgres=# update my_table set j = '{"key":2}'::jsonb;
WARNING: trigger called
UPDATE 0
postgres=# update my_table set j = '{"key":3}'::jsonb;
WARNING: trigger called
WARNING: rows have changed
UPDATE 3
postgres=# delete from my_table where i = 100;
WARNING: trigger called
DELETE 0
postgres=# insert into my_table select * from my_table where i = 100;
WARNING: trigger called
INSERT 0 0

> Would the statement even execute if no rows get updated and that is prevented
> with before update? I would assume null is being returned rather than old if
> the trigger finds the row to be identical.

It looks like a statement trigger is always called, and what I see through REFERENCING matches the number of affected rows as reported by the psql shell. Also notice how the number of affected rows is non-zero for the trivial update issued before suppress_redundant_updates_trigger is installed, both through REFERENCING and in the psql shell. That also matches the documentation:

> ..., a trigger that is marked FOR EACH STATEMENT only executes once for any
> given operation, regardless of how many rows it modifies (in particular, an
> operation that modifies zero rows will still result in the execution of any
> applicable FOR EACH STATEMENT triggers).

So it would be really cool to be able to access the number of affected rows inside a trigger without the use of REFERENCING.
Given that the WHEN condition of a statement trigger is currently mostly useless (because the condition cannot refer to any values in the table), maybe providing something like an AFFECTED variable there would be the way to go? Then one could write:

CREATE TRIGGER my_trigger AFTER UPDATE ON my_table
  FOR EACH STATEMENT WHEN (AFFECTED <> 0)
  EXECUTE FUNCTION my_table_func();

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Determining if a table really changed in a trigger
Hi!

On Wed, Oct 27, 2021 at 12:56 PM Marcos Pegoraro wrote:
>> Oh, very interesting. I thought that this is not possible because WHEN
>> condition on triggers does not have NEW and OLD. But this is a very
>> cool way to combine rules with triggers, where a rule can still
>> operate by row.
>
> That is not true

Sorry to be imprecise. In this thread I am interested in statement triggers, which is why I didn't mention it explicitly here. Statement triggers do not have NEW and OLD, but you can combine them with a row-level rule and this then works well together.

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Determining if a table really changed in a trigger
Hi!

On Wed, Oct 27, 2021 at 12:46 AM Mark Dilger wrote:
> I felt the same way about it, but after glancing quickly through the code and
> docs nothing jumped out. The information is clearly available, as it gets
> returned at the end of the UPDATE statement in the "UPDATE 0" OR "UPDATE 3",
> but I don't see how to access that from the trigger. I might have to submit
> a patch for that if nobody else knows a way to get it. (Hopefully somebody
> will respond with the answer...?)

Anyone? Any way to determine the number of affected rows in a statement trigger?

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Determining if a table really changed in a trigger
Hi!

On Sat, Nov 6, 2021 at 2:43 PM Tom Lane wrote:
> Mitar writes:
> > Anyone? Any way to determine the number of affected rows in a statement
> > trigger?
>
> Check the size of the transition relation.

Yes, this is what we are currently doing, but it looks very inefficient if you want just the number, no? Or even if you only want to know whether it is zero or non-zero.

Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
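(For anyone finding this thread later, this is the approach in question, sketched as a minimal statement-level trigger; the names are illustrative and mirror my earlier example:)

create function my_table_count_func() returns trigger as $$
declare
  affected bigint;
begin
  -- the transition relation has to be scanned just to obtain the count
  select count(*) into affected from new_values;
  raise warning 'rows affected: %', affected;
  return null;
end
$$ language plpgsql;

create trigger my_table_trig_count
  after insert on my_table
  referencing new table as new_values
  for each statement execute function my_table_count_func();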