Re: PostgreSQL ping/pong to client

2019-04-18 Thread Ajay Pratap
Maybe postgresql (application) layer ping pong.

Enabling system wise timeouts can impact my whole setup including my web
app, thats why it is hard for me to configure them.

On Wed, Apr 17, 2019, 22:12 Francisco Olarte  wrote:

> On Wed, Apr 17, 2019 at 4:49 PM Ajay Pratap 
> wrote:
> > Correction: I meant when my java application dies postgres should break
> all the connections that were associated with that peer.
>
> And how is the server supposed to detect that without keepalives? TCP
> is dessigned to survice for extended period of times without traffic,
> I used that a lot in the dial up times.
>
> And what makes you think keepalives are impactful and unrealistic? I
> use them a lot, they do not impact my workloads measurably.
>
> Francisco Olarte.
>

-- 


*Disclaimer:* The information in this communication is confidential and 
may be legally privileged. It is intended solely for the use of the 
individual or entity to whom it is addressed and others authorized to 
receive it. If you are not the intended recipient you are hereby notified 
that any disclosure, copying, distribution or taking action in reliance of 
the contents of this information is strictly prohibited and may be 
unlawful. Drishti is neither liable for the improper, incomplete 
transmission of the information contained in this communication nor any 
delay in its receipt. The communication is not intended to operate as an 
electronic signature under any applicable law. Drishti assumes no 
responsibility for any loss or damage resulting from the use of e-mails.


Re: Multicolumn index for single-column queries?

2019-04-18 Thread Gavin Flower

On 18/04/2019 18:52, rihad wrote:

Hi. Say there are 2 indexes:

    "foo_index" btree (foo_id)

    "multi_index" btree (foo_id, approved, expires_at)


foo_id is an integer. Some queries involve all three columns in their 
WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove foo_index 
and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to 
updating them both, but wouldn't searches

on foo_id alone become slower?

Thanks.



The multi column index will require more RAM to hold it.  So if there is 
memory contention, then there would be an increased risk of swapping, 
leading to slower query times.


I suspect that if there is more than enough RAM, then a multi column 
index will be slightly slower than a single column index. However, the 
difference will probably be lost in the noise -- in other words, the 
various things happening in the background will most likely to have far 
more significant impact on query duration.  IMHO



Cheers,
Gavin






Re: Multicolumn index for single-column queries?

2019-04-18 Thread Andreas Kretschmer




Am 18.04.19 um 08:52 schrieb rihad:

Hi. Say there are 2 indexes:

    "foo_index" btree (foo_id)

    "multi_index" btree (foo_id, approved, expires_at)


foo_id is an integer. Some queries involve all three columns in their 
WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove foo_index 
and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to 
updating them both, but wouldn't searches
on foo_id alone become slower? 


it depends .

it depends on the queries you are using, on your workload. a 
multi-column-index will be large than an index over just one column,

therefore you will have more disk-io when you read from such an index.


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com





Re: Multicolumn index for single-column queries?

2019-04-18 Thread Laurenz Albe
Andreas Kretschmer wrote:
> Am 18.04.19 um 08:52 schrieb rihad:
> > Hi. Say there are 2 indexes:
> > 
> > "foo_index" btree (foo_id)
> > 
> > "multi_index" btree (foo_id, approved, expires_at)
> > 
> > 
> > foo_id is an integer. Some queries involve all three columns in their 
> > WHERE clauses, some involve only foo_id.
> > Would it be ok from general performance standpoint to remove foo_index 
> > and rely only on multi_index? I know that
> > PG would have to do less work updating just one index compared to 
> > updating them both, but wouldn't searches
> > on foo_id alone become slower? 
> 
> it depends .
> 
> it depends on the queries you are using, on your workload. a 
> multi-column-index will be large than an index over just one column,
> therefore you will have more disk-io when you read from such an index.

To be more explicit: if you can live with a slightly less efficient
index scan and want fast data modifications, use only the second index.

If you hardly ever update the table, don't mind the wasted space and
want every bit of query speed (data warehouse), having both indexes
might be better.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Multicolumn index for single-column queries?

2019-04-18 Thread Ron

On 4/18/19 2:14 AM, Andreas Kretschmer wrote:



Am 18.04.19 um 08:52 schrieb rihad:

Hi. Say there are 2 indexes:

    "foo_index" btree (foo_id)

    "multi_index" btree (foo_id, approved, expires_at)


foo_id is an integer. Some queries involve all three columns in their 
WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove foo_index 
and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to 
updating them both, but wouldn't searches
on foo_id alone become slower? 


it depends .

it depends on the queries you are using, on your workload. a 
multi-column-index will be large than an index over just one column,

therefore you will have more disk-io when you read from such an index.


But two indexes are larger than one index, and updating two indexes requires 
more disk IO than updating one index.


(Prefix compression would obviate the need for this question.  Then your 
multi-column index would be *much* smaller.)



--
Angular momentum makes the world go 'round.


Re: Multicolumn index for single-column queries?

2019-04-18 Thread Gavin Flower

On 19/04/2019 01:24, Ron wrote:

On 4/18/19 2:14 AM, Andreas Kretschmer wrote:



Am 18.04.19 um 08:52 schrieb rihad:

Hi. Say there are 2 indexes:

    "foo_index" btree (foo_id)

    "multi_index" btree (foo_id, approved, expires_at)


foo_id is an integer. Some queries involve all three columns in 
their WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove 
foo_index and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to 
updating them both, but wouldn't searches
on foo_id alone become slower? 


it depends .

it depends on the queries you are using, on your workload. a 
multi-column-index will be large than an index over just one column,

therefore you will have more disk-io when you read from such an index.


But two indexes are larger than one index, and updating two indexes 
requires more disk IO than updating one index.


Agreed.

A key question would be: how often is the query run, compared to the 
frequency Insertions, Updates, and Deletions -- wrt the table.




(Prefix compression would obviate the need for this question. Then 
your multi-column index would be *much* smaller.)


True, but a multi column index will still be bigger than single column 
index.


[...]




Re: PostgreSQL ping/pong to client

2019-04-18 Thread Tom Lane
Ajay Pratap  writes:
> Enabling system wise timeouts can impact my whole setup including my web
> app, thats why it is hard for me to configure them.

Uh ... who said anything about system-wide timeouts?

The normal way to set this up is to configure the tcp_keepalives_xxx
PG parameters, which will be applied to connections to the Postgres
server, nothing else.

regards, tom lane




Re: Multicolumn index for single-column queries?

2019-04-18 Thread Harald Fuchs
Andreas Kretschmer  writes:

> Am 18.04.19 um 08:52 schrieb rihad:
>> Hi. Say there are 2 indexes:
>>
>>     "foo_index" btree (foo_id)
>>
>>     "multi_index" btree (foo_id, approved, expires_at)
>>
>>
>> foo_id is an integer. Some queries involve all three columns in
>> their WHERE clauses, some involve only foo_id.
>> Would it be ok from general performance standpoint to remove
>> foo_index and rely only on multi_index? I know that
>> PG would have to do less work updating just one index compared to
>> updating them both, but wouldn't searches
>> on foo_id alone become slower? 
>
> it depends .
>
> it depends on the queries you are using, on your workload. a
> multi-column-index will be large than an index over just one column,
> therefore you will have more disk-io when you read from such an index.

I think it also depends on the average number of rows having the same foo_id.





Re: Multicolumn index for single-column queries?

2019-04-18 Thread Ron

On 4/18/19 8:45 AM, Gavin Flower wrote:

On 19/04/2019 01:24, Ron wrote:

On 4/18/19 2:14 AM, Andreas Kretschmer wrote:

[snip]


(Prefix compression would obviate the need for this question. Then your 
multi-column index would be *much* smaller.)


True, but a multi column index will still be bigger than single column index.


TANSTAAFL.


--
Angular momentum makes the world go 'round.




Re: Method to pass data between queries in a multi-statement transaction

2019-04-18 Thread Adrian Klaver

On 4/17/19 5:30 PM, Souvik Bhattacherjee wrote:
There are few if any situations where you need to immediately and 
completely pass all values from one query to another in the same 
transaction where the queries cannot just be combined into a single 
statement.  Your representative example is one that is easily combined 
into a single statement.


 > What if I need the result of the join to be stored into table3 as 
well as the tuples that participated in the query to be deleted from 
table1. The following can be done without the need to transfer values 
from the previous query into the next:


begin;
insert into table3 (id, attr1, attr2) (select t1.cid, t1.empname, 
t2.dept from table1 t1, table2 t2 where t1.cid = t2.cid);
delete from table1 where cid in (select c.cid from table1 t1, table2 t2 
where t1.cid = t2.cid);


Well the DELETE is not going to work as c.cid will error as undefined.


commit;

However note that we have to perform the join twice, which is not 
efficient. Now to make things worse, increase the number of tables to 
join while imposing the requirement of tuple deletion to apply to all or 
to a subset of the tables that participate in join.


You might want to take a look at CTE's:

https://www.postgresql.org/docs/11/queries-with.html



Now, the stuff you are trying seems to indicate you are trying to do 
something in C, inside the engine itself, with all of this.  If that is 
the case you may want to be more clear as to what you are attempting to 
do.  But as far as server SQL goes the only persistence area are 
tables/relations - including temporary ones.



I'm trying to modify the engine here.


-SB

On Wed, Apr 17, 2019 at 6:16 PM David G. Johnston 
mailto:david.g.johns...@gmail.com>> wrote:



On Wed, Apr 17, 2019 at 3:04 PM Souvik Bhattacherjee
mailto:kivu...@gmail.com>> wrote:

Hello,

I'm trying to pass some values between queries in a
multi-statement transaction. For example, consider the following
representative multi-statement transaction:

begin;
select * from table1 t1, table2 t2 where t1.cid = t2.cid;
delete from table1 where cid in
(values-to-be-populated-from-the-previous-query);
commit;


There are few if any situations where you need to immediately and
completely pass all values from one query to another in the same
transaction where the queries cannot just be combined into a single
statement.  Your representative example is one that is easily
combined into a single statement.

Now, the stuff you are trying seems to indicate you are trying to do
something in C, inside the engine itself, with all of this.  If that
is the case you may want to be more clear as to what you are
attempting to do.  But as far as server SQL goes the only
persistence area are tables/relations - including temporary ones.

David J.




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Method to pass data between queries in a multi-statement transaction

2019-04-18 Thread Souvik Bhattacherjee
Thanks Michel.

However this only works if a is an unique attribute in the table that would
help us to identify tuples that participated in the join. Consider the
following join:

insert into table3 (id, level, empname, salary)
(select  t0.cid, t0.level, t1.empname, t2.salary from table0 t0, table1 t1,
table2 t2 where t0.cid = t1.cid and t1.pid = t2.pid);

Now if I want to delete those tuples from table2 that satisfied the join
condition, I need to execute the join again with additional attributes.
Also note that based on query plan, i.e. whether table0 and table1 were
joined first followed by table1 and table2, we have to execute one
additional join to get the tuples in table2 that satisfied the join
condition (t1.pid = t2.pid).

Getting that information while the query is executed may not be difficult.
There are other use cases in my application that require me to transfer the
data from one query to the next within a transaction.

Thus, what I'm looking for here is way to store the information and then
pass that information to the next query efficiently.
For example, is it possible to define a struct of my choice, private to the
current transaction, that would store the data and then pass it around to
the next query in the transaction without having to materialize that struct
(or deal with concurrency issues as in the hash table approach mentioned
earlier) .

-SB

On Wed, Apr 17, 2019 at 10:47 PM Michel Pelletier <
pelletier.mic...@gmail.com> wrote:

> On Wed, Apr 17, 2019 at 5:30 PM Souvik Bhattacherjee 
> wrote:
>
>> > What if I need the result of the join to be stored into table3 as well
>> as the tuples that participated in the query to be deleted from table1. The
>> following can be done without the need to transfer values from the previous
>> query into the next:
>>
>> begin;
>> insert into table3 (id, attr1, attr2) (select t1.cid, t1.empname, t2.dept
>> from table1 t1, table2 t2 where t1.cid = t2.cid);
>> delete from table1 where cid in (select c.cid from table1 t1, table2 t2
>> where t1.cid = t2.cid);
>> commit;
>>
>>
>
> You can use INSERT...RETURNING in a WITH query:
>
> postgres=# create table foo (a integer);
> CREATE TABLE
> postgres=# create table bar (b integer);
> CREATE TABLE
> postgres=# insert into bar values (42);
> INSERT 0 1
> postgres=# with i as (insert into foo values (42) returning a) delete from
> bar where b = (select a from i);
> DELETE 1
>
> -Michel
>
>
>>
>> On Wed, Apr 17, 2019 at 6:16 PM David G. Johnston <
>> david.g.johns...@gmail.com> wrote:
>>
>>>
>>> On Wed, Apr 17, 2019 at 3:04 PM Souvik Bhattacherjee 
>>> wrote:
>>>
 Hello,

 I'm trying to pass some values between queries in a multi-statement
 transaction. For example, consider the following representative
 multi-statement transaction:

 begin;
 select * from table1 t1, table2 t2 where t1.cid = t2.cid;
 delete from table1 where cid in
 (values-to-be-populated-from-the-previous-query);
 commit;

>>>
>>> There are few if any situations where you need to immediately and
>>> completely pass all values from one query to another in the same
>>> transaction where the queries cannot just be combined into a single
>>> statement.  Your representative example is one that is easily combined into
>>> a single statement.
>>>
>>> Now, the stuff you are trying seems to indicate you are trying to do
>>> something in C, inside the engine itself, with all of this.  If that is the
>>> case you may want to be more clear as to what you are attempting to do.
>>> But as far as server SQL goes the only persistence area are
>>> tables/relations - including temporary ones.
>>>
>>> David J.
>>>
>>


Re: Method to pass data between queries in a multi-statement transaction

2019-04-18 Thread Souvik Bhattacherjee
Well the DELETE is not going to work as c.cid will error as undefined.

> Yes, that's a typo. I haven't tested it out before typing; just wanted to
convey the general idea.

-SB

On Thu, Apr 18, 2019 at 10:50 AM Adrian Klaver 
wrote:

> On 4/17/19 5:30 PM, Souvik Bhattacherjee wrote:
> > There are few if any situations where you need to immediately and
> > completely pass all values from one query to another in the same
> > transaction where the queries cannot just be combined into a single
> > statement.  Your representative example is one that is easily combined
> > into a single statement.
> >
> >  > What if I need the result of the join to be stored into table3 as
> > well as the tuples that participated in the query to be deleted from
> > table1. The following can be done without the need to transfer values
> > from the previous query into the next:
> >
> > begin;
> > insert into table3 (id, attr1, attr2) (select t1.cid, t1.empname,
> > t2.dept from table1 t1, table2 t2 where t1.cid = t2.cid);
> > delete from table1 where cid in (select c.cid from table1 t1, table2 t2
> > where t1.cid = t2.cid);
>
> Well the DELETE is not going to work as c.cid will error as undefined.
>
> > commit;
> >
> > However note that we have to perform the join twice, which is not
> > efficient. Now to make things worse, increase the number of tables to
> > join while imposing the requirement of tuple deletion to apply to all or
> > to a subset of the tables that participate in join.
>
> You might want to take a look at CTE's:
>
> https://www.postgresql.org/docs/11/queries-with.html
>
> >
> > Now, the stuff you are trying seems to indicate you are trying to do
> > something in C, inside the engine itself, with all of this.  If that is
> > the case you may want to be more clear as to what you are attempting to
> > do.  But as far as server SQL goes the only persistence area are
> > tables/relations - including temporary ones.
> >
> >> I'm trying to modify the engine here.
> >
> > -SB
> >
> > On Wed, Apr 17, 2019 at 6:16 PM David G. Johnston
> > mailto:david.g.johns...@gmail.com>> wrote:
> >
> >
> > On Wed, Apr 17, 2019 at 3:04 PM Souvik Bhattacherjee
> > mailto:kivu...@gmail.com>> wrote:
> >
> > Hello,
> >
> > I'm trying to pass some values between queries in a
> > multi-statement transaction. For example, consider the following
> > representative multi-statement transaction:
> >
> > begin;
> > select * from table1 t1, table2 t2 where t1.cid = t2.cid;
> > delete from table1 where cid in
> > (values-to-be-populated-from-the-previous-query);
> > commit;
> >
> >
> > There are few if any situations where you need to immediately and
> > completely pass all values from one query to another in the same
> > transaction where the queries cannot just be combined into a single
> > statement.  Your representative example is one that is easily
> > combined into a single statement.
> >
> > Now, the stuff you are trying seems to indicate you are trying to do
> > something in C, inside the engine itself, with all of this.  If that
> > is the case you may want to be more clear as to what you are
> > attempting to do.  But as far as server SQL goes the only
> > persistence area are tables/relations - including temporary ones.
> >
> > David J.
> >
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: Method to pass data between queries in a multi-statement transaction

2019-04-18 Thread Michael Lewis
>
> Thus, what I'm looking for here is way to store the information and then
> pass that information to the next query efficiently.
> For example, is it possible to define a struct of my choice, private to
> the current transaction, that would store the data and then pass it around
> to the next query in the transaction without having to materialize that
> struct (or deal with concurrency issues as in the hash table approach
> mentioned earlier) .
>

Perhaps I am missing something obvious, but why not use a temp table?


SQL query

2019-04-18 Thread Vikas Sharma
Hi,

I have come across a query that a developer wrote to update a few rows in
table, the query did update the two desired rows but also updated the
rest of the table with the column value as 'false'.

Update tableA set col1 = null and col2 in (1,2);

The query updated col1 to null for the rows where col2 was either 1 or 2,
rest of rows were also updated for col1 to 'false'.
 The above was run without where clause.
Could the experts throw some light on this?

Regards
Vikas Sharma


SQL query

2019-04-18 Thread David G. Johnston
On Thursday, April 18, 2019, Vikas Sharma  wrote:
>
>  The above was run without where clause.
>

There was no where clause thus every row has to be updated by
definition...null and bool evaluates to either false or null since if bool
is false the null doesn’t matter and if bool is true the result is unknown
from the null.

David J.


Re: SQL query

2019-04-18 Thread Adrian Klaver

On 4/18/19 9:43 AM, Vikas Sharma wrote:

Hi,

I have come across a query that a developer wrote to update a few rows 
in table, the query did update the two desired rows but also updated the 
rest of the table with the column value as 'false'.


Update tableA set col1 = null and col2 in (1,2);

The query updated col1 to null for the rows where col2 was either 1 or 
2, rest of rows were also updated for col1 to 'false'.

  The above was run without where clause.
Could the experts throw some light on this?


Hmm.
What Postgres version?

Assuming col1 is boolean, correct?

My experimentation:

create table up_test(id integer, col1 boolean, col2 integer);

insert into up_test values (1, 't', 1), (2, null, 2), (3, 't', 4);

 update up_test set col1=null and col2 in(1,2);
UPDATE 3

select * from up_test;
 id | col1 | col2
+--+--
  1 |  |1
  2 |  |2
  3 | f|4
(3 rows)

truncate up_test ;
TRUNCATE TABLE
insert into up_test values (1, 't', 1), (2, null, 2), (3, 't', 4);
INSERT 0 3

update up_test set col1=(null and col2 in(1,2));
UPDATE 3
test_(postgres)# select * from up_test;
 id | col1 | col2
+--+--
  1 |  |1
  2 |  |2
  3 | f|4


Looks to me it is seeing the and as part of an expression.


Regards
Vikas Sharma



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Method to pass data between queries in a multi-statement transaction

2019-04-18 Thread Michel Pelletier
On Thu, Apr 18, 2019 at 9:06 AM Michael Lewis  wrote:

> Thus, what I'm looking for here is way to store the information and then
>> pass that information to the next query efficiently.
>> For example, is it possible to define a struct of my choice, private to
>> the current transaction, that would store the data and then pass it around
>> to the next query in the transaction without having to materialize that
>> struct (or deal with concurrency issues as in the hash table approach
>> mentioned earlier) .
>>
>
> Perhaps I am missing something obvious, but why not use a temp table?
>

Right, or as Adrian and I pointed out, use a CTE (WITH) query, which will
materialize any results you want for the query.  Perhaps if you give us a
working, reproducible, self contained example of what you expect we can
help you better.


Re: SQL query

2019-04-18 Thread Ron

On 4/18/19 11:43 AM, Vikas Sharma wrote:

Hi,

I have come across a query that a developer wrote to update a few rows in 
table, the query did update the two desired rows but also updated the 
rest of the table with the column value as 'false'.


Update tableA set col1 = null and col2 in (1,2);

The query updated col1 to null for the rows where col2 was either 1 or 2, 
rest of rows were also updated for col1 to 'false'.

 The above was run without where clause.


On the QA server, right?

--
Angular momentum makes the world go 'round.




Re: ERROR: operator class "gin__int_ops" does not exist for access method "gin"

2019-04-18 Thread Adrian Klaver

On 4/17/19 9:38 PM, Julie Nishimura wrote:

Thank you! All works now


Great.
I mentioned upstream about migrating to version newer then 9.4. Given 
the amount of work you are putting into this, would you not be better 
served by going to a version that has more then ~10 months left on its 
community support?:


https://www.postgresql.org/support/versioning/

Version Current minor Supported First Release Final Release
9.4 9.4.21Yes   December 18, 2014 February 13, 2020

--
Adrian Klaver
adrian.kla...@aklaver.com




Re: SQL query

2019-04-18 Thread Ravi Krishna
Not able to produce this with PG 11.1

If col1 is any type other than boolean, the update statement fails in syntax.
If col1 is boolean, then it updated it correctly.  In other words
   update col1 = NULL
and col2 in (1,2)
is treated same as
   update col1 = NULL
 where col2 in (1,2)

Also I checked it on DB2 which rejected the UPDATE sql as bad syntax, as 
expected.





Re: SQL query

2019-04-18 Thread Ravi Krishna
Oh wait.  I see that it in both cases it did update correct target rows, but 
the value of col1 for non matching rows
is different. In the first case (and col2), the non matching rows also got 
updated.

So yes, same behavior like yours.


> Sent: Thursday, April 18, 2019 at 2:36 PM
> From: "Ravi Krishna" 
> To: "Vikas Sharma" 
> Cc: pgsql-general@lists.postgresql.org
> Subject: Re: SQL query
>
> Not able to produce this with PG 11.1
>
> If col1 is any type other than boolean, the update statement fails in syntax.
> If col1 is boolean, then it updated it correctly.  In other words
>update col1 = NULL
> and col2 in (1,2)
> is treated same as
>update col1 = NULL
>  where col2 in (1,2)
>
> Also I checked it on DB2 which rejected the UPDATE sql as bad syntax, as 
> expected.
>
>
>
>




Re: SQL query

2019-04-18 Thread Adrian Klaver

On 4/18/19 11:36 AM, Ravi Krishna wrote:

Not able to produce this with PG 11.1

If col1 is any type other than boolean, the update statement fails in syntax.
If col1 is boolean, then it updated it correctly.  In other words
update col1 = NULL
 and col2 in (1,2)
is treated same as
update col1 = NULL
  where col2 in (1,2)


The above is not the same format as OP's query:

Update tableA set col1 = null and col2 in (1,2);


Also I checked it on DB2 which rejected the UPDATE sql as bad syntax, as 
expected.







--
Adrian Klaver
adrian.kla...@aklaver.com




Re: SQL query

2019-04-18 Thread Ravi Krishna


> The above is not the same format as OP's query:
>
> Update tableA set col1 = null and col2 in (1,2);

I did include set in the sql.  I typed it wrong here.




Re: Method to pass data between queries in a multi-statement transaction

2019-04-18 Thread David G. Johnston
On Thu, Apr 18, 2019 at 9:03 AM Souvik Bhattacherjee 
wrote:

> Thanks Michel.
>
> However this only works if a is an unique attribute in the table that
> would help us to identify tuples that participated in the join. Consider
> the following join:
>
> insert into table3 (id, level, empname, salary)
> (select  t0.cid, t0.level, t1.empname, t2.salary from table0 t0, table1
> t1, table2 t2 where t0.cid = t1.cid and t1.pid = t2.pid);
>
> Now if I want to delete those tuples from table2 that satisfied the join
> condition, I need to execute the join again with additional attributes.
>

Or just add pid to table3...

> Also note that based on query plan, i.e. whether table0 and table1 were
> joined first followed by table1 and table2, we have to execute one
> additional join to get the tuples in table2 that satisfied the join
> condition (t1.pid = t2.pid).
>

???

> Getting that information while the query is executed may not be difficult.
> There are other use cases in my application that require me to transfer the
> data from one query to the next within a transaction.
>

There may be some that benefit to some degree but its likely that you can
write the application and queries in such a way to avoid a hard requirement.

Thus, what I'm looking for here is way to store the information and then
> pass that information to the next query efficiently.
>
For example, is it possible to define a struct of my choice, private to the
> current transaction, that would store the data and then pass it around to
> the next query in the transaction without having to materialize that struct
> (or deal with concurrency issues as in the hash table approach mentioned
> earlier) .
>

How much development and maintenance effort are you willing to spend here
to gain what is likely to amount to only a bit of efficiency?  Many things
are possible if you are going to modify the server code but why add grief?

David J.


Re: Multicolumn index for single-column queries?

2019-04-18 Thread Gavin Flower

On 19/04/2019 02:12, Ron wrote:

On 4/18/19 8:45 AM, Gavin Flower wrote:

On 19/04/2019 01:24, Ron wrote:

On 4/18/19 2:14 AM, Andreas Kretschmer wrote:

[snip]


(Prefix compression would obviate the need for this question. Then 
your multi-column index would be *much* smaller.)


True, but a multi column index will still be bigger than single 
column index.


TANSTAAFL.



QUOTE: [Oh, 'tanstaafl.' Means ~There ain't no such thing as a free lunch.']
From The Moon is a Harsh Mistress, by Robert Heinlein. Published 1966
Is where I first came across TANSTAAFL.

However, it appears to have been used at least as early as 1949.

Just adding this, as probably there are many people who don't know the 
acronym.






Re: Multicolumn index for single-column queries?

2019-04-18 Thread Gavin Flower

On 19/04/2019 01:47, Harald Fuchs wrote:

Andreas Kretschmer  writes:


Am 18.04.19 um 08:52 schrieb rihad:

Hi. Say there are 2 indexes:

     "foo_index" btree (foo_id)

     "multi_index" btree (foo_id, approved, expires_at)


foo_id is an integer. Some queries involve all three columns in
their WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove
foo_index and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to
updating them both, but wouldn't searches
on foo_id alone become slower?

it depends .

it depends on the queries you are using, on your workload. a
multi-column-index will be large than an index over just one column,
therefore you will have more disk-io when you read from such an index.

I think it also depends on the average number of rows having the same foo_id.

The number of rows referenced by an index entry for the multi_index will 
always be less than or equal to those for the matching foo_index.


Also there will be fewer index entries per block for the multi_index.  
Which is why the I/O count will be higher; even in the best case, where 
there is an equal row referenced by the index entries.






Re: Method to pass data between queries in a multi-statement transaction

2019-04-18 Thread Souvik Bhattacherjee
Or just add pid to table3...

> That's an application requirement. So pid cannot be added at will to
table3.

How much development and maintenance effort are you willing to spend here
to gain what is likely to amount to only a bit of efficiency?  Many things
are possible if you are going to modify the server code but why add grief?

> How much development / maintenance effort do you anticipate for
implementing this feature? This is something that my application would need
so I'm willing to dedicate some time to it.

By the way, I figured that the hash table deallocation issue could be
resolved (although in an inefficient way) by serializing the data to a
string and then copying that value into the hash table during insertion.
However the hash table is still visible to all the transactions I suppose
and as a result needs to be locked. Just wanted to let you know that I have
initialized the hash table within the PostgresMain() method and the hash
table is declared as an extern variable, which I anticipate to be accessed
by many methods. How difficult is it to make a hash table (or any data
structure) private to the current transaction so that I do not have to
resort to locking?

-SB


On Thu, Apr 18, 2019 at 5:34 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Thu, Apr 18, 2019 at 9:03 AM Souvik Bhattacherjee 
> wrote:
>
>> Thanks Michel.
>>
>> However this only works if a is an unique attribute in the table that
>> would help us to identify tuples that participated in the join. Consider
>> the following join:
>>
>> insert into table3 (id, level, empname, salary)
>> (select  t0.cid, t0.level, t1.empname, t2.salary from table0 t0, table1
>> t1, table2 t2 where t0.cid = t1.cid and t1.pid = t2.pid);
>>
>> Now if I want to delete those tuples from table2 that satisfied the join
>> condition, I need to execute the join again with additional attributes.
>>
>
> Or just add pid to table3...
>
>> Also note that based on query plan, i.e. whether table0 and table1 were
>> joined first followed by table1 and table2, we have to execute one
>> additional join to get the tuples in table2 that satisfied the join
>> condition (t1.pid = t2.pid).
>>
>
> ???
>
>> Getting that information while the query is executed may not be
>> difficult. There are other use cases in my application that require me to
>> transfer the data from one query to the next within a transaction.
>>
>
> There may be some that benefit to some degree but its likely that you can
> write the application and queries in such a way to avoid a hard requirement.
>
> Thus, what I'm looking for here is way to store the information and then
>> pass that information to the next query efficiently.
>>
> For example, is it possible to define a struct of my choice, private to
>> the current transaction, that would store the data and then pass it around
>> to the next query in the transaction without having to materialize that
>> struct (or deal with concurrency issues as in the hash table approach
>> mentioned earlier) .
>>
>
> How much development and maintenance effort are you willing to spend here
> to gain what is likely to amount to only a bit of efficiency?  Many things
> are possible if you are going to modify the server code but why add grief?
>
> David J.
>
>


Re: Multicolumn index for single-column queries?

2019-04-18 Thread Gavin Flower

On 19/04/2019 01:47, Harald Fuchs wrote:

Andreas Kretschmer  writes:


Am 18.04.19 um 08:52 schrieb rihad:

Hi. Say there are 2 indexes:

     "foo_index" btree (foo_id)

     "multi_index" btree (foo_id, approved, expires_at)


foo_id is an integer. Some queries involve all three columns in
their WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove
foo_index and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to
updating them both, but wouldn't searches
on foo_id alone become slower?

it depends .

it depends on the queries you are using, on your workload. a
multi-column-index will be large than an index over just one column,
therefore you will have more disk-io when you read from such an index.

I think it also depends on the average number of rows having the same foo_id.

The number of rows referenced by an index entry for the multi_index will 
always be less than or equal to those for the matching foo_index.


Also there will be fewer index entries per block for the multi_index, 
which is why the I/O count will be higher even in the best case where 
there is an equal number of rows referenced by each index entry.






Re: Multicolumn index for single-column queries?

2019-04-18 Thread Gavin Flower

On 19/04/2019 14:01, Gavin Flower wrote:
[...]


Also there will be fewer index entries per block for the multi_index, 
which is why the I/O count will be higher even in the best case where 
there is an equal number of rows referenced by each index entry.



Not sure why my system had this still in my draft folder!


Sorry, for the duplication...