IS NOT DISTINCT FROM statement
I have some query:

EXPLAIN ANALYZE select id from sometable where fkey IS NOT DISTINCT FROM 21580;

                                 QUERY PLAN
------------------------------------------------------------------------------
 Gather  (cost=10.00..39465.11 rows=1 width=4) (actual time=0.512..129.625 rows=1 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on sometable  (cost=0.00..39455.01 rows=1 width=4) (actual time=77.995..103.806 rows=0 loops=5)
         Filter: (NOT (fkey IS DISTINCT FROM 21580))
         Rows Removed by Filter: 675238
 Planning time: 0.101 ms
 Execution time: 148.517 ms

The other query:

EXPLAIN ANALYZE select id from sometable where fkey=21580;

                                 QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using sometable_index1 on sometable  (cost=0.43..8.45 rows=1 width=4) (actual time=0.075..0.076 rows=1 loops=1)
   Index Cond: (fkey = 21580)
 Planning time: 0.117 ms
 Execution time: 0.101 ms
(4 rows)

There is a unique index on sometable(fkey).

Is there any reason that "IS NOT DISTINCT FROM" can't be auto-transformed to "=" when the value on the right side of the expression is not NULL, or is there any way to use an index with an "IS NOT DISTINCT FROM" clause?

Artur Zajac
Re: IS NOT DISTINCT FROM statement
Artur Zając wrote:
> Is there any reason that "IS NOT DISTINCT FROM" can't be auto-transformed
> to "=" when value on right side of expression is not NULL, or is this any
> way to use index with "IS NOT DISTINCT FROM" statement?

That would subtly change the semantics of the expression:

test=> SELECT NULL IS NOT DISTINCT FROM 21580;
 ?column?
----------
 f
(1 row)

test=> SELECT NULL = 21580;
 ?column?
----------

(1 row)

One expression is FALSE, the other NULL. It doesn't matter in the context of your specific query, but it could matter.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
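The difference Laurenz describes can also be seen side by side in one query; this is a hypothetical session sketch (the values are illustrative):

```sql
-- Null-aware vs. plain equality on the same pairs of values.
-- "=" yields NULL when either operand is NULL; IS NOT DISTINCT FROM
-- always yields TRUE or FALSE.
SELECT a, b,
       a = b                    AS plain_equals,   -- NULL for rows 2 and 3
       a IS NOT DISTINCT FROM b AS not_distinct    -- FALSE for row 2, TRUE for row 3
FROM (VALUES (21580, 21580),
             (NULL::int, 21580),
             (NULL::int, NULL::int)) AS t(a, b);
```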
Re: IS NOT DISTINCT FROM statement
On Sat, 9 Mar 2019 at 00:30, Artur Zając wrote:
> Is there any reason that "IS NOT DISTINCT FROM" can't be auto-transformed
> to "=" when value on right side of expression is not NULL, or is this any
> way to use index with "IS NOT DISTINCT FROM" statement?

Probably nothing, other than that nobody has done it yet. It might be reasonable to have some transformation stage called from distribute_restrictinfo_to_rels() when adding single-rel RestrictInfos to RTE_RELATION base rels. It's only these that you can check for NOT NULL constraints, i.e. it is not possible with rtekinds such as RTE_FUNCTION and the like.

It becomes more complex if you consider that someone might have added a partial index on the relation that matches the IS NOT DISTINCT FROM clause. In this case, they might not be happy that their index can no longer be used. Fixing that would require some careful surgery on predicate_implied_by() to teach it about IS NOT DISTINCT FROM clauses. However, that seems to go a step beyond what predicate_implied_by() does for now. Currently, it only gets to know about quals, not the relations they belong to, so there'd be no way to know from there that the NOT NULL constraint exists. I'm not sure if there's a good reason for this or not; it might be because it's not been required before.

It gets more complex still if you want to consider other quals in the list to prove not-nullness.

In short, probably possible, but why not just write an equality clause, if you know NULLs are not possible?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
RE: IS NOT DISTINCT FROM statement
> In short, probably possible, but why not just write an equality clause,
> if you know NULLs are not possible?

In fact I construct queries like this (usually in PL/pgSQL):

SELECT column FROM table
WHERE column1 IS NOT DISTINCT FROM $1
  AND column2 IS NOT DISTINCT FROM $2;

The "IS NOT DISTINCT FROM" clause simplifies the query ($1 or $2 may be NULL; column1 and column2 have indexes).

I made some workaround. I made a function:

CREATE OR REPLACE FUNCTION smarteq(v1 int, v2 int) RETURNS bool AS
$BODY$
SELECT (CASE WHEN v2 IS NULL THEN (v1 IS NULL) ELSE v1 = v2 END);
$BODY$ LANGUAGE 'sql' IMMUTABLE PARALLEL SAFE;

And then:

explain analyze select id from sometable where smarteq(id1,21580);

                                 QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using sometable_index1 on sometable  (cost=0.43..8.45 rows=1 width=4) (actual time=0.085..0.086 rows=1 loops=1)
   Index Cond: (id1 = 21580)
 Planning time: 0.223 ms
 Execution time: 0.117 ms
(4 rows)

explain analyze select id from sometable where smarteq(id1,NULL);

                                 QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on sometable  (cost=19338.59..57045.02 rows=882343 width=4) (actual time=116.236..306.304 rows=881657 loops=1)
   Recheck Cond: (id1 IS NULL)
   Heap Blocks: exact=9581
   ->  Bitmap Index Scan on sometable_index1  (cost=0.00..19118.00 rows=882343 width=0) (actual time=114.209..114.209 rows=892552 loops=1)
         Index Cond: (id1 IS NULL)
 Planning time: 0.135 ms
 Execution time: 339.229 ms

It looks like it works, but I must check whether it will still work in PL/pgSQL (I expect some problems if the query is prepared).

Artur Zajac
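A possible alternative to a smarteq()-style wrapper (not from the thread, but a common rewrite) is to spell the null-aware match as an OR of two independently indexable conditions; whether the planner actually uses the index should be verified with EXPLAIN:

```sql
-- Sketch: behaves like "id1 IS NOT DISTINCT FROM $1" for a prepared query,
-- written so each branch can match an index condition on its own.
PREPARE find_row(int) AS
SELECT id
FROM sometable
WHERE id1 = $1                      -- ordinary indexable equality
   OR (id1 IS NULL AND $1 IS NULL); -- null-matches-null branch

-- EXECUTE find_row(21580);
-- EXECUTE find_row(NULL);
```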
Re: IS NOT DISTINCT FROM statement
On Sat, 9 Mar 2019 at 01:25, Artur Zając wrote:
> I made some workaround. I made a function:
>
> CREATE OR REPLACE FUNCTION smarteq(v1 int, v2 INT) RETURNS BOOL AS
> $BODY$
> SELECT (CASE WHEN v2 IS NULL THEN (v1 IS NULL) ELSE v1=v2 END);
> $BODY$ LANGUAGE 'sql' IMMUTABLE PARALLEL SAFE;
>
> explain analyze select id from sometable where smarteq(id1,NULL);
>
>                                  QUERY PLAN
> ------------------------------------------------------------------------------
>  Bitmap Heap Scan on sometable  (cost=19338.59..57045.02 rows=882343 width=4) (actual time=116.236..306.304 rows=881657 loops=1)
>    Recheck Cond: (id1 IS NULL)
>    Heap Blocks: exact=9581
>    ->  Bitmap Index Scan on sometable_index1  (cost=0.00..19118.00 rows=882343 width=0) (actual time=114.209..114.209 rows=892552 loops=1)
>          Index Cond: (id1 IS NULL)
>  Planning time: 0.135 ms
>  Execution time: 339.229 ms
>
> It looks like it works, but I must check if it will still work in plpgsql
> (I expect some problems if the query is prepared).

I think with either of those you'll just be at the mercy of whether a generic or custom plan is chosen. If you get a custom plan then likely your CASE statement will be inlined and constant-folded away, but for a generic plan that can't happen, since those constants are not Consts, they're Params. Most likely, if you've got an index on the column, you'll perhaps always get a custom plan, as the generic plan would result in a seq scan and it would have to evaluate your CASE statement for each row.

By default, generic plans are only considered on the 6th query execution and are only chosen if the generic cost is cheaper than the average custom plan cost plus a fuzz cost for planning. PG12 gives you a bit more control over that with the plan_cache_mode GUC, but that's not out yet. However, possibly the cost of planning each execution is cheaper than doing the seq scan, so you might be better off with this.
There is a risk that the planner does for some reason choose a generic plan and ends up doing the seq scan, but for that to happen the table would likely have to be small (in which case it wouldn't matter) or the costs would have to be off, which might cause you some pain.

The transformation mentioned earlier could only work if the arguments of the IS NOT DISTINCT FROM were Vars or Consts. It couldn't work with Params, since the values are unknown to the planner.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
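For reference, the PG12 knob mentioned above looks like this; a sketch, only applicable once PostgreSQL 12 is available:

```sql
-- Force the plan cache to re-plan with the actual parameter values on
-- every execution (other settings: auto, force_generic_plan).
SET plan_cache_mode = force_custom_plan;
```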
How to model data in DB (PostgreSQL) for SNMP-like multiple configurations
I am working on a product managing and monitoring networks (an NMS-like
product). The product manages the configuration of network devices; for now
each device stores its configuration in a simple table - this was the
original design.
CREATE TABLE public.configuration (
    id integer NOT NULL,
    config json NOT NULL,
    CONSTRAINT configuration_pkey PRIMARY KEY (id)
);
A config looks like:
{
"_id": 20818132,
"type": "Modem",
"data": [{
"oid": "1.3.6.1.4.1..3.10.2.2.25.4.1.40",
"instance": "24",
"value": "null"
},
{
"oid": "1.3.6.1.4.1..3.5.10.1.86",
"instance": "0",
"value": "562"
},
{
"oid": "1.3.6.1.4.1..3.5.10.3.92.4.1",
"instance": "0",
"value": "0"
},
{
"oid": "1.3.6.1.4.1..3.10.2.2.25.4.1.43",
"instance": "24",
"value": "vlan24"
},
{
"oid": "1.3.6.1.4.1..3.10.2.2.25.4.1.43",
"instance": "25",
"value": "vlan25"
}
]}
And there are many plv8 (JavaScript procedural language extension for
PostgreSQL) stored procedures working on bulks of such configs, reading some
OIDs, changing them conditionally, removing some of them and adding others,
especially in use cases like: there are some upper-level meta-configurations
at different levels, which on change have to propagate all their updated
parameters to all affected leaf configs. A simple test example (without
touching the 'data' node):
CREATE OR REPLACE FUNCTION public.process_jsonb_plv8()
RETURNS void AS
$BODY$
var CFG_TABLE_NAME = "configurations";
var selPlan = plv8.prepare("select c.config from " + CFG_TABLE_NAME +
    " c where c.id = $1", ['int']);
var updPlan = plv8.prepare('update ' + CFG_TABLE_NAME +
    ' set config = $1 where id = $2', ['jsonb', 'int']);
try {
    var ids = plv8.execute('select id from devices');
    for (var i = 0; i < ids.length; i++) {
        var db_cfg = selPlan.execute(ids[i].id); // get current json config from DB
        var cfg = db_cfg[0].config;
        cfg["key0"] = 'plv8_json';               // add some dummy key
        updPlan.execute(cfg, ids[i].id);         // put updated JSON config in DB
        plv8.elog(NOTICE, "UPDATED = " + ids[i].id);
    }
} finally {
    selPlan.free();
    updPlan.free();
}
return;
$BODY$
LANGUAGE plv8 VOLATILE
COST 100;
For real use cases the plv8 SPs are more complicated, doing a FOR loop
through ALL OID objects of the 'data' array, checking whether each one is
the one looked for, and updating its value and/or removing it and/or adding
a newer one if necessary.

Since the number of devices in the DB increased from several hundred to 40K
or even 70K, and the number of OID+instance combinations also increased from
several hundred to ~1K and sometimes up to 10K within a config, we started
facing slowness in bulk (especially global, update-ALL-devices)
updates/searches.

In order to get rid of the FOR-loop step for each configuration I've
converted the data node from an array to an object (key-value model),
something like:
{
"_id": 20818132,
"type": "Modem",
"data": {
"1.3.6.1.4.1..3.10.2.2.25.4.1.40": {
"24": "null"
},
"1.3.6.1.4.1..3.5.10.1.86": {
"0": "562"
},
"1.3.6.1.4.1..3.5.10.3.92.4.1": {
"0": "0"
},
"1.3.6.1.4.1..3.10.2.2.25.4.1.43": {
"24": "vlan24",
"25": "vlan25"
}
}}
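With the key-value layout above, and assuming the config column is (or is cast to) jsonb, individual OID instances can be addressed by path and containment searches can be indexed; a sketch, not from the original post:

```sql
-- Fetch one OID instance from one device by path (O(1) within the document).
SELECT config #>> '{data,1.3.6.1.4.1..3.10.2.2.25.4.1.43,24}' AS value
FROM public.configuration
WHERE id = 20818132;

-- A GIN index supports "which devices have OID x = value y" containment queries.
CREATE INDEX configuration_config_gin
    ON public.configuration USING gin (config);

SELECT id
FROM public.configuration
WHERE config @> '{"data": {"1.3.6.1.4.1..3.10.2.2.25.4.1.43": {"24": "vlan24"}}}';
```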
Now in order to get a concrete OID (e.g.
"1.3.6.1.4.1..3.10.2.2.25.4.1.43") and/or its instance I do 1-2 *O(1)*
operations instead of *O(n)*, and it became a bit faster. After I changed
the column type from json to jsonb I got a lot of memory issues with plv8
stored procedures, so now the question is:

*What are the best practices to store such data and use cases in the DB?*

taking into consideration the following:
- Bulk and global updates are frequent enough (a user-done operation,
  several times per week) and take a long time - several minutes, an
  annoying user experience.
- Consulting some OIDs only from a concrete config is a medium-frequency
  use case.
- Consulting ALL devices that have some specific OID (SNMP parameter) set
  to a specific value is a medium-frequency case.
- Consulting (reading) a configuration for a specific device as a whole
  document is a frequent use case (it is sent to the device as JSON or as
  converted CSV, it is sent in a modified JSON format to other utilities,
  etc.).
One suggestion from other opinions is to move ALL configurations to a
simple plain relational table:

CREATE TABLE public.configuration_plain (
    device_id integer,
    oid text,
    instance text,
    value text
);
Looking like:

    id       | oid                         | instance | value
 ----------- | --------------------------- | -------- | -----
    20818132 | 1.3.6.1.4.1..2.13           | 0        | VSAT
    20818132 | 1.3.6.1.4.1..3.10.2.2.10.15 | 0        | 0
    20818132 | 1.3.6.1.4.1..3.10.2.2.10.17 | 0        | 0
    20818132 | 1.3.6.1.4.1..3.10.2.2.10.18 | 0        | 1
    20818132 | 1.3.6.1.4.1..3.10.2.2.10.19 | 0        | 2
    20818132 | 1.3.6.1.4.1..3.10.
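Whichever way the rows are loaded, the plain relational layout makes the medium-frequency searches directly indexable; a sketch with illustrative index names:

```sql
-- Per-device lookup and whole-config reads.
CREATE INDEX configuration_plain_device_idx
    ON public.configuration_plain (device_id);

-- "All devices where SNMP parameter x has value y" becomes a btree lookup.
CREATE INDEX configuration_plain_oid_value_idx
    ON public.configuration_plain (oid, value);

SELECT device_id
FROM public.configuration_plain
WHERE oid = '1.3.6.1.4.1..3.5.10.1.86' AND value = '562';
```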
Re: IS NOT DISTINCT FROM statement
David Rowley writes:
> On Sat, 9 Mar 2019 at 01:25, Artur Zając wrote:
>> CREATE OR REPLACE FUNCTION smarteq(v1 int,v2 INT) RETURNS BOOL AS
>> $BODY$
>> SELECT (CASE WHEN v2 IS NULL THEN (v1 IS NULL) ELSE v1=v2 END);
>> $BODY$ LANGUAGE 'sql' IMMUTABLE PARALLEL SAFE;

> The transformation mentioned earlier could only work if the arguments
> of the IS NOT DISTINCT FROM were Vars or Consts. It couldn't work with
> Params since the values are unknown to the planner.

Just looking at this example, I'm wondering if there'd be any value in adding a rule to eval_const_expressions that converts IS DISTINCT FROM with one constant-NULL argument into an IS NOT NULL test on the other argument. Doing anything with the general case would be hard, as you mentioned, but this "workaround" suggests that the OP isn't actually concerned with the general case.

[ experiments... ]  Oh, look at this:

regression=# explain verbose select f1 is distinct from null from int4_tbl;
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on public.int4_tbl  (cost=0.00..1.05 rows=5 width=1)
   Output: (f1 IS NOT NULL)
(2 rows)

regression=# explain verbose select f1 is not distinct from null from int4_tbl;
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on public.int4_tbl  (cost=0.00..1.05 rows=5 width=1)
   Output: (f1 IS NULL)
(2 rows)

So somebody already inserted this optimization, but I don't see it happening in eval_const_expressions ... oh, it's way earlier, in transformAExprDistinct:

    /*
     * If either input is an undecorated NULL literal, transform to a NullTest
     * on the other input. That's simpler to process than a full DistinctExpr,
     * and it avoids needing to require that the datatype have an = operator.
     */
    if (exprIsNullConstant(rexpr))
        return make_nulltest_from_distinct(pstate, a, lexpr);
    if (exprIsNullConstant(lexpr))
        return make_nulltest_from_distinct(pstate, a, rexpr);

I'm hesitant to call that wrong; the ability to avoid a dependency on an "=" operator is kind of nice. But it doesn't help for cases requiring a Param substitution.
So maybe if we *also* had a check for this in eval_const_expressions, that would address the OP's problem. But the use-case would be a bit narrow, given that the parser is catching the simplest case.

regards, tom lane
Re: How to model data in DB (PostgreSQL) for SNMP-like multiple configurations
Is there a reason not to use a relational model instead of json(b) here? I
think that is in fact considered best practice.
On Fri, 8 Mar 2019 at 15:40, Alexandru Lazarev wrote:
> I am working on product managing and monitoring Network (NMS-like
> products).
> [snip - remainder of the original message quoted verbatim]
Re: How to model data in DB (PostgreSQL) for SNMP-like multiple configurations
For now I do not see a strong reason, but I inherited this project from
other developers. Originally there was MongoDB and the structure was more
complex, having SNMP-like nested tables with OID.Instance1.Instance2.Instance3,
and in JSON it looked like:
{
"1.3.6.1.4.1..3.10.2.2.25.4.1.43.1": {
"1": {
"24": "vlan24",
"25": "vlan25"
},
"2": {
"24": "127.0.0.1",
"25": "8.8.8.8"
}
}
}
Here we have a table within a table - how to model this relationally, with
separate tables and JOINs only?
I am not excluding that in the future I'll have such a requirement.
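One way to model the nested-instance case relationally (a sketch with made-up names) is to give each nesting level its own instance column rather than a separate table:

```sql
-- Each leaf of the nested structure becomes one row; e.g. the value 'vlan24'
-- above becomes (oid, instance1 = '1', instance2 = '24').
CREATE TABLE public.configuration_nested (
    device_id integer NOT NULL,
    oid       text    NOT NULL,
    instance1 text    NOT NULL,
    instance2 text    NOT NULL,
    value     text,
    PRIMARY KEY (device_id, oid, instance1, instance2)
);
```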
The other reason is that devices request their config, and some other tools
request device configs, as a single document/file - this creates a bit of
overhead for composing the document in JSON or XML or CSV format from a
relational table (I understand it is doable, but...)
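The whole-document read can still be served from a relational layout by aggregating rows back into JSON on the fly; a sketch assuming the configuration_plain table suggested earlier:

```sql
-- Rebuild the per-device key-value document from relational rows.
SELECT device_id,
       jsonb_build_object('_id', device_id,
                          'data', jsonb_object_agg(oid, per_oid)) AS config
FROM (
    SELECT device_id, oid,
           jsonb_object_agg(instance, value) AS per_oid
    FROM public.configuration_plain
    GROUP BY device_id, oid
) AS per_device_oid
GROUP BY device_id;
```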
BTW in PG documentation:
"
*8.14.2. Designing JSON documents effectively*
*Representing data as JSON can be considerably more flexible than the
traditional relational data model, which is compelling in environments
where requirements are fluid. It is quite possible for both approaches to
co-exist and complement each other within the same application. However,
even for applications where maximal flexibility is desired, it is still
recommended that JSON documents have a somewhat fixed structure. The
structure is typically unenforced (though enforcing some business rules
declaratively is possible), but having a predictable structure makes it
easier to write queries that usefully summarize a set of "documents"
(datums) in a table.

JSON data is subject to the same concurrency-control
considerations as any other data type when stored in a table. Although
storing large documents is practicable, keep in mind that any update
acquires a row-level lock on the whole row. Consider limiting JSON
documents to a manageable size in order to decrease lock contention among
updating transactions. Ideally, JSON documents should each represent an
atomic datum that business rules dictate cannot reasonably be further
subdivided into smaller datums that could be modified independently.*"

https://www.postgresql.org/docs/9.6/datatype-json.html
On Fri, Mar 8, 2019 at 5:15 PM Alban Hertroys wrote:
> Is there a reason not to use a relational model instead of json(b) here? I
> think that is in fact considered best practice.
>
> On Fri, 8 Mar 2019 at 15:40, Alexandru Lazarev <
> [email protected]> wrote:
>
>> I am working on product managing and monitoring Network (NMS-like
>> products).
>> [snip - remainder of the original message quoted verbatim]