Manual query vs trigger during data load

2024-09-13 Thread yudhi s
Hello All,

We have a table into which hundreds of millions of rows are inserted each
day. We now have a requirement to apply some transformation/lookup logic to
a few of the input bind values while inserting the data. So I wanted to
understand: is it possible to do this along with the INSERT query itself, or
is it better to create a trigger for the same?

For example, below is the current INSERT query used in the Java code. We
want to fetch the value for "column2" from a lookup table rather than
inserting it directly as it comes from the customer side. So I am thinking
of a trigger like the one below, but at the same time I also want to compare
the performance of doing the lookup the normal way vs. having it performed
by a trigger.

One way I am thinking of is to first fetch the value of "column2" from
reference_tab1 using a separate SELECT query in the Java code itself, and
then pass that to the insert query below, but I think that will increase the
response time as it is a separate DB call.

1) So, is there a way I can do it directly in the single INSERT query
itself, without an additional SELECT query? I will then try to compare that
with the trigger-based approach.
2) Additionally, will this decision impact a batch insert approach? I.e., in
the case of a trigger, will the batch insert fail because the trigger forces
it to be processed row by row?

INSERT INTO tab_part1 (column1, column2, column3, column4, column5,
part_date)
VALUES (:v_col1, :v_col2, :v_col3, :v_col4, :v_col5,
CURRENT_DATE);

CREATE OR REPLACE FUNCTION trg_func_populate_column2() RETURNS TRIGGER AS $$
BEGIN
-- Fetch reference value and populate column2
NEW.column2 := (SELECT lookup_key FROM reference_tab1 WHERE lookup_key
= old.column2);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Regards
Yudhi


Manual query vs trigger during data load

2024-09-13 Thread Thiemo Kellner
Hi

To me it seems possible to create an insert select, e.g.

Insert into tab1 (val1, val2)
Select valA, valB
From tab2
Where valC = :param1
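
Applied to the insert from the original post, that could look roughly like
the sketch below (assuming reference_tab1 carries the value to be stored, in
a column called here ref_value):

INSERT INTO tab_part1 (column1, column2, column3, column4, column5, part_date)
SELECT :v_col1, r.ref_value, :v_col3, :v_col4, :v_col5, CURRENT_DATE
FROM reference_tab1 r
WHERE r.lookup_key = :v_col2;

Note that, unlike a trigger, this inserts nothing when the lookup finds no
match, which may or may not be what is wanted.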

Cheers

Thiemo


Storing plans with pg_stat_statements

2024-09-13 Thread Ebubekir Büyüktosun
Hello community,

As you know, pg_stat_statements is a very popular and useful extension used by many Postgres users. What do you think of adding the ability to store or capture query plans as a new feature in this extension? This could be an optional feature, disabled by default and controlled with a parameter. I am sure it would be a very helpful feature for many of us. I just present this idea for your consideration.

Best regards




Re: Manual query vs trigger during data load

2024-09-13 Thread Juan Rodrigo Alejandro Burgos Mella
Hello, I find it unlikely that the trigger will work properly, since the
reserved fields of the OLD subset have no value in an INSERT

Atte
JRBN

On Fri, Sep 13, 2024, 04:32, yudhi s wrote:

> Hello All,
>
> We are having a table which is going to be inserted with 100's of millions
> of rows each day. And we now want to have a requirement in which we need to
> do some transformation/lookup logic built on top of a few of the input bind
> values , while inserting the data. So I wanted to understand ,is it
> possible to do it along with the INSERT query or is it better to have a
> trigger created for the same?
>
> For. e.g Below is the current Insert query used in the Java code. We want
> to fetch the value for "column2" from a lookup table rather than directly
> inserting as it's coming from the customer side. So I am thinking of a
> trigger like below. But at the same time I also want to compare the
> performance of a normal way of doing the lookup vs having it performed
> using triggers.
>
> So one way i am thinking is first fetching the value of the "column2" from
> reference_tab1 using a separate "select query" in Java code itself,  and
> then passing that to the below insert query, but i think that will increase
> the response time as that will be a separate DB call.
>
> 1)So,  is there a way I can do it directly using the single INSERT query
> itself without additional SELECT query? And then will try to compare that
> with the trigger based approach.
> 2)Additionally , if this decision will impact a batch insert approach.
> i.e. say , in case of trigger , will the batch insert fail because
> trigger will force it to make it row by row?
>
> INSERT INTO tab_part1 (column1, column2, column3, column4, column5,
> part_date)
> VALUES (:v_col1, :v_col2, :v_col3, :v_col4, :v_col5,
> CURRENT_DATE);
>
> CREATE OR REPLACE FUNCTION trg_func_populate_column2() RETURNS TRIGGER AS
> $$
> BEGIN
> -- Fetch reference value and populate column2
> NEW.column2 := (SELECT lookup_key FROM reference_tab1 WHERE lookup_key
> = old.column2);
> RETURN NEW;
> END;
> $$ LANGUAGE plpgsql;
>
> Regards
> Yudhi
>


Reg: Size difference

2024-09-13 Thread Vinay Oli
Hi Team

I have been using PostgreSQL for the past 6 years. PostgreSQL has
significantly impacted my life, providing me with great opportunities for
knowledge and self-development.

I'm currently facing a strange issue with PostgreSQL 15.0. I have a
primary-standby setup that is in sync, with a replication slot in place.
There are 18 databases, and one of the databases on the primary side is 104
GB, while the same database on the standby side is 216 GB. Both are in sync
with zero delay.

Could this be a bug? If so, has it been resolved in newer releases? If it
is not a bug, how can this issue be fixed? Is there a solution or any
supporting documentation available?

WAL and log files are being rotated properly. The issue is with a database
named services_mfs. On the primary cluster, the services_mfs database is
104GB, but on the standby cluster, it is 216GB, even though both clusters
are in sync. The standby database is only used in case of a crash, which is
managed by a Patroni cluster with etcd.



Thanks,

Vinay Kumar


Re: DDL issue

2024-09-13 Thread Rich Shepard

On Thu, 12 Sep 2024, David G. Johnston wrote:


Read up on “many-to-many” data models. In SQL they involve a linking
table, one row per bidirectional edge, in addition to the two node tables.


David,

Thanks very much. I knew about those a long time ago but haven't needed them
recently, so I forgot about them. Will re-learn.

Much appreciated,

Rich




Re: DDL issue

2024-09-13 Thread Rich Shepard

On Thu, 12 Sep 2024, Adrian Klaver wrote:


Quick and dirty:

people_table
  person_id  PK
  name_last
  name_first
  email_address
  ph_number
  ...

location_table
  loc_id   PK
  person_id_fk FK <--> people_table(person_id)
  loc_name
  loc_st_addr
  loc_st_city
  loc_st_st_prov
  ...

contact_table
  contact_id  PK
  loc_id_fk   FK <--> location_table(loc_id)
  contact_ph_number
  contact_email  --Can be null

It can get more involved than this; it depends on how flexible you want to get.


Adrian,

There are many companies with multiple locations, but few owners with
multiple companies, each with a different location. David's reminder about
many-to-many tables will do the job.

Thanks,

Rich




Re: DDL issue

2024-09-13 Thread Rich Shepard

On Fri, 13 Sep 2024, Muhammad Usman Khan wrote:


To handle this situation in PostgreSQL, you can model the data in a way
that maintains a single entry for each owner in the people table while
linking the owner to multiple dairies through a separate dairies table.
This is a typical one-to-many relationship (one person can own many
dairies). The following is testing scenario which might help you


Muhammad,

Thank you. There are a number of industries (most involving natural
resources) so the one-to-many table will be more general.

Regards,

Rich




Re: DDL issue

2024-09-13 Thread Rich Shepard

On Fri, 13 Sep 2024, Tony Shelver wrote:


Or if you want to get even more flexible, where a dairy could have more
than one owner as well as one owner having more than one dairy, you could
create an intersection / relationship table.

Something like

-- Create people table (one entry per person)
CREATE TABLE people (
    person_id SERIAL PRIMARY KEY,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    email VARCHAR(100) UNIQUE,
    phone_number VARCHAR(15)
);

-- Intersection table: one row per person-dairy relationship
CREATE TABLE people_dairy_map (
    pdm_id SERIAL PRIMARY KEY,
    person_id INT REFERENCES people(person_id),
    dairy_id INT REFERENCES dairies(dairy_id)
);


Thanks, Tony.

Regards,

Rich




Re: Reg: Size difference

2024-09-13 Thread Karsten Hilbert
On Fri, Sep 13, 2024 at 05:39:22PM +0530, Vinay Oli wrote:

> I'm currently facing a strange issue with PostgreSQL 15.0. I have a
> primary-standby setup that is in sync, with a replication slot in place.
> There are 18 databases, and one of the databases on the primary side is 104
> GB, while the same database on the standby side is 216 GB. Both are in sync
> with zero delay.

The first thing that comes to my mind is a different effect of
autovacuum on the two clusters.
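
If it is bloat, comparing per-relation sizes on both sides should show it
quickly; a minimal sketch (run inside the services_mfs database on the
primary and on the standby):

SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'i', 'm')
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 20;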

Best,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B




Re: post-bootstrap init : permission denied pg_description

2024-09-13 Thread François SIMON
On Thu, Sep 12, 2024 at 01:38:11PM -0400, Tom Lane wrote:
> François SIMON writes:
> > So the problem seems to come from xlc, and only at initdb step.
> > I can see that initdb runs a backend postgres in single user mode.
> > And this is this backend, when compiled with xlc, that shows :
> > FATAL:  permission denied for table pg_description
> 
> Yeah.  This seems like it must be an xlc code-generation bug.
> It's got nothing to do with filesystem permissions: you're failing
> an ACL check at the logical "table" level, which you should not
> be because single-user mode implies superuser permissions.
> 
> We do have one buildfarm member building with 64-bit xlc on
> AIX 7.1:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2024-09-11%2003%3A51%3A02
> 
> I see that that uses a bunch of odd CFLAGS, but they look like they
> are mostly to suppress compiler warnings.  Another important bit of
> the recipe for AIX is -D_LARGE_FILES=1, but I'm not sure that that
> would have any impact during initdb (since we're not dealing with
> any large files at that point).
> 
> You could perhaps try different -O optimization levels.  Also
> make sure your xlc is up-to-date.

Thank you for all this information.

Unfortunately I haven't obtained any better results yet.

I tried -O2, -O3, with or without -D_LARGE_FILES=1, xlc_r instead
of xlc, and all the same flags as the buildfarm member.

I will try with another version of xlc if possible.

François

> 
>   regards, tom lane




Re: Manual query vs trigger during data load

2024-09-13 Thread Adrian Klaver

On 9/13/24 02:58, Juan Rodrigo Alejandro Burgos Mella wrote:
Hello, I find it unlikely that the trigger will work properly, since the 
reserved fields of the OLD subset have no value in an INSERT


I'm not seeing that the OP is asking for OLD.* values, they are just 
looking to include the result of a lookup on another table in the INSERT.




Atte
JRBN




--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Manual query vs trigger during data load

2024-09-13 Thread Adrian Klaver

On 9/13/24 07:50, Adrian Klaver wrote:

On 9/13/24 02:58, Juan Rodrigo Alejandro Burgos Mella wrote:
Hello, I find it unlikely that the trigger will work properly, since 
the reserved fields of the OLD subset have no value in an INSERT


I'm not seeing that the OP is asking for OLD.* values, they are just 
looking to include the result of a lookup on another table in the INSERT.


My mistake I see the OLD reference now.





Atte
JRBN






--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Functionally dependent columns in SELECT DISTINCT

2024-09-13 Thread Willow Chargin
On Thu, Sep 12, 2024 at 11:13 PM  wrote:
>
> What about using DISTINCT ON () ?
> SELECT DISTINCT ON (items.id) items.*
> FROM items
>   JOIN parts ON items.id = parts.item_id
> WHERE part_id % 3 = 0
> ORDER BY items.id,items.create_time DESC
> LIMIT 5;
>
> This gives me this plan: https://explain.depesz.com/s/QHr6 on 16.2  (Windows, 
> i7-1260P)

Ordering by items.id changes the answer, though. In the example I gave,
items.id and items.create_time happened to be in the same order, but
that needn't hold. In reality I really do want the ID columns of the
*most recent* items.

You can see the difference if you build the test dataset a bit
differently:

INSERT INTO items(id, create_time)
SELECT i, now() - make_interval(secs => random() * 1e6)
FROM generate_series(1, 100) s(i);

We want the returned create_times to be all recent, and the IDs now
should look roughly random.




Will hundreds of thousands of this type of query cause a parsing issue

2024-09-13 Thread Wong, Kam Fook (TR Technology)
Fellow Postgres experts,

We have a flavor of this type of query with a long in-list of bind variables
(see below).  We notice that some of the bind variables come in as 0, which
causes the optimizer to choose full scans on two of the three tables involved.
One thought to avoid the full table scans is to chop off the not-needed bind
variables (proven to work after some tests).  But my other worry is that this
will cause parsing issues, because the app will be executing > 100k/sec of this
type of query.

I am an Oracle DBA, and this change will certainly generate a different query
id, which in turn generates tons of extra parsing in the DB, because in Oracle
all soft and hard parsing occurs at the DB level.  But my understanding is that
for Postgres, parsing occurs at the client JDBC level.  Am I understanding this
correctly?

In summary/my concern:

1) Where does query parsing occur?
2) Will this cause extra parsing in the Postgres DB?  Is there any pg system
table to measure parsing?


SELECT  abc, efg from DOCLOC a, COLLECTION b  WHERE  a.colum1 IN ($1, $2, $3, 
$4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, 
$21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31, $32, $33, $34, $35, $36, 
$37, $38, $39, $40, $41, $42, $43, $44, $45, $46, $47, $48, $49, $50, $51, $52, 
$53, $54, $55, $56, $57, $58, $59, $60, $61, $62, $63, $64, $65, $66, $67, $68, 
$69, $70, $71, $72, $73, $74, $75, $76, $77, $78, $79, $80, $81, $82, $83, $84, 
$85, $86, $87, $88, $89, $90, $91, $92, $93, $94, $95, $96, $97, $98, $99, 
$100, $101, $102, $103, $104, $105, $106, $107, $108, $109, $110, $111, $112, 
$113, $114, $115, $116, $117, $118, $119, $120, $121, $122, $123, $124, $125, 
$126, $127, $128, $129, $130, $131, $132, $133, $134, $135, $136, $137, $138, 
$139, $140, $141, $142, $143, $144, $145, $146, $147, $148, $149, $150, $151, 
$152, $153, $154, $155, $156, $157, $158, $159, $160, $161, $162, $163, $164, 
$165, $166, $167, $168, $169, $170, $171, $172, $173, $174, $175, $176, $177, 
$178, $179, $180, $181, $182, $183, $184, $185, $186, $187, $188, $189, $190, 
$191, $192, $193, $194, $195, $196, $197, $198, $199, $200, $201, $202, $203, 
$204, $205, $206, $207, $208, $209, $210, $211, $212, $213, $214, $215, $216, 
$217, $218, $219, $220, $221, $222, $223, $224, $225, $226, $227, $228, $229, 
$230, $231, $232, $233, $234, $235, $236, $237, $238, $239, $240, $241, $242, 
$243, $244, $245, $246, $247, $248, $249, $250, $251, $252, $253, $254, $255, 
$256, $257, $258, $259, $260, $261, $262, $263, $264, $265, $266, $267, $268, 
$269, $270, $271, $272, $273, $274, $275, $276, $277, $278, $279, $280, $281, 
$282, $283, $284, $285, $286, $287, $288, $289, $290, $291, $292, $293, $294, 
$295, $296, $297, $298, $299, $300, $301, $302, $303, $304, $305, $306, $307, 
$308, $309, $310, $311, $312, $313, $314, $315, $316, $317, $318, $319, $320, 
$321, $322, $323, $324, $325, $326, $327, $328, $329, $330, $331, $332, $333, 
$334, $335, $336, $337, $338, $339, $340, $341, $342, $343, $344, $345, $346, 
$347, $348, $349, $350, $351, $352, $353, $354, $355, $356, $357, $358, $359, 
$360, $361, $362, $363, $364, $365, $366, $367, $368, $369, $370, $371, $372, 
$373, $374, $375, $376, $377, $378, $379, $380, $381, $382, $383, $384, $385, 
$386, $387, $388, $389, $390, $391, $392, $393, $394, $395, $396, $397, $398, 
$399, $400, $401, $402, $403, $404, $405, $406, $407, $408, $409, $410, $411, 
$412, $413, $414, $415, $416, $417, $418, $419, $420, $421, $422, $423, $424, 
$425, $426, $427, $428, $429, $430, $431, $432, $433, $434, $435, $436, $437, 
$438, $439, $440, $441, $442, $443, $444, $445, $446, $447, $448, $449, $450, 
$451, $452, $453, $454, $455, $456, $457, $458, $459, $460, $461, $462, $463, 
$464, $465, $466, $467, $468, $469, $470, $471, $472, $473, $474, $475, $476, 
$477, $478, $479, $480, $481, $482, $483, $484, $485, $486, $487, $488, $489, 
$490, $491, $492, $493, $494, $495, $496, $497, $498, $499, $500, $501, $502, 
$503, $504, $505, $506, $507, $508, $509, $510, $511, $512, $513, $514, $515, 
$516, $517, $518, $519, $520, $521, $522, $523, $524, $525, $526, $527, $528, 
$529, $530, $531, $532, $533, $534, $535, $536, $537, $538, $539, $540, $541, 
$542, $543, $544, $545, $546, $547, $548, $549, $550, $551, $552, $553, $554, 
$555, $556, $557, $558, $559, $560, $561, $562, $563, $564, $565, $566, $567, 
$568, $569, $570, $571, $572, $573, $574, $575, $576, $577, $578, $579, $580, 
$581, $582, $583, $584, $585, $586, $587, $588, $589, $590, $591, $592, $593, 
$594, $595, $596, $597, $598, $599, $600, $601, $602, $603, $604, $605, $606, 
$607, $608, $609, $610, $611, $612, $613, $614, $615, $616, $617, $618, $619, 
$620, $621, $622, $623, $624, $625, $626, $627, $628, $629, $630, $631, $632, 
$633, $634, $635, $636, $637, $638, $639, $640, $641, $642, $643, $644, $645, 
$646, $647, $648, $649, $650, $651, $652, $653, $654, $655, $656, $657, $658, 
$659, $660, $661, $662, $663, $664, $665, $666, $667, $668, $669, $670, $671, 
$67

Re: Functionally dependent columns in SELECT DISTINCT

2024-09-13 Thread David G. Johnston
On Friday, September 13, 2024, Willow Chargin 
wrote:

> In reality I really do want the ID columns of the
> *most recent* items.
>

Use a window function to rank them and pull out rank=1, or use a lateral
subquery to surgically (fetch first 1) retrieve the first row when sorted
by recency descending.

David J.


Re: Functionally dependent columns in SELECT DISTINCT

2024-09-13 Thread Willow Chargin
Thanks both for your suggestions so far.

On Fri, Sep 13, 2024 at 8:43 AM David G. Johnston
 wrote:
>
> On Friday, September 13, 2024, Willow Chargin  wrote:
>>
>> In reality I really do want the ID columns of the
>> *most recent* items.
>
>
> Use a window function to rank them and pull out rank=1

Hmm, like this? noting that it's rank<=5, not rank=1:

-- 1. rank all item-part combinations, densely since an item may
--    have multiple parts
-- 2. limit by rank, still retaining multiple copies of each item
-- 3. de-duplicate IDs
SELECT DISTINCT id FROM (
SELECT id, dense_rank FROM (
SELECT
items.id,
dense_rank() OVER (ORDER BY create_time DESC)
FROM items JOIN parts ON items.id = parts.item_id
WHERE part_id % 3 = 0
) q
WHERE dense_rank <= 5
) q

I've done this before, but my experience is that it's usually far slower
because the rank is computed eagerly even for rows that don't match the
rank bound. And indeed here it takes 20% longer than even the slower
GROUP BY from before: https://explain.depesz.com/s/mQIi

> or use a lateral subquery to surgically (fetch first 1) retrieve the first 
> row when sorted by recency descending.

I'm not sure that I see how to apply this when I need top-k, not top-1.




Re: DDL issue

2024-09-13 Thread Rich Shepard

On Thu, 12 Sep 2024, Adrian Klaver wrote:


Quick and dirty:

people_table
  person_id  PK
  name_last
  name_first
  email_address
  ph_number
  ...

location_table
  loc_id   PK
  person_id_fk FK <--> people_table(person_id)
  loc_name
  loc_st_addr
  loc_st_city
  loc_st_st_prov
  ...

contact_table
  contact_id  PK
  loc_id_fk   FK <--> location_table(loc_id)
  contact_ph_number
  contact_email  --Can be null

It can get more involved than this; it depends on how flexible you want to get.


Adrian,

This comes close; I need to think about this.

Thanks,

Rich





Re: Will hundreds of thousands of this type of query cause a parsing issue

2024-09-13 Thread David Mullineux
I would usually try to avoid such long IN clauses. Why not try this:
create a temp table with one column, bulk insert all your IDs into that
table, and then change your query to join to the temp table?  This also has
the advantage of working for thousands of values.
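
A rough sketch of that shape (table and column names follow the query quoted
below; the join condition to COLLECTION is an assumption):

CREATE TEMP TABLE wanted_ids (id text);

-- bulk-load the values, e.g. with COPY or a batched multi-row INSERT
COPY wanted_ids (id) FROM STDIN;

SELECT abc, efg
FROM docloc a
JOIN collection b ON b.collection_name = a.collection_name
JOIN wanted_ids w ON w.id = a.colum1;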

On Fri, 13 Sept 2024, 16:35 Wong, Kam Fook (TR Technology), <
kamfook.w...@thomsonreuters.com> wrote:

> Follow Postgres expert,
>
> We have a flavor of this type of query with long in-list/bind variables
> (see below).  We notice that some of the bind variables come in as 0 which
> causes the optimizer to choose to full scan two of the following 3 tables.
> One thought to fix a full table scan is to chop off the not-needed bind
> variables (proven to work after some tests).  But my other worry is will
> cause parsing issues because the app will be executing > 100k/sec with this
> type of query.
>
> I am an Oracle DBA, and this change for sure will generate a different
> query id.  Which in turn generates tons of extra parsing to the DB because
> all soft and hard parsing occurs at the DB level.  But my understanding for
> Postgres is parsing occurs at the client jdbc level.  Am I understanding
> this correctly?
>
>
> In summary/my concern:
>
> 1) Where does query parsing occur?
> 2) Will this cause extra parsing to the posgress DB?  Any pg system table
> to measure parsing?
>
>
>
> SELECT  abc, efg from DOCLOC a, COLLECTION b  WHERE  a.colum1 IN ($1, $2,
> $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18,
> $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31, $32, $33,
> $34, $35, $36, $37, $38, $39, $40, $41, $42, $43, $44, $45, $46, $47, $48,
> $49, $50, $51, $52, $53, $54, $55, $56, $57, $58, $59, $60, $61, $62, $63,
> $64, $65, $66, $67, $68, $69, $70, $71, $72, $73, $74, $75, $76, $77, $78,
> $79, $80, $81, $82, $83, $84, $85, $86, $87, $88, $89, $90, $91, $92, $93,
> $94, $95, $96, $97, $98, $99, $100, $101, $102, $103, $104, $105, $106,
> $107, $108, $109, $110, $111, $112, $113, $114, $115, $116, $117, $118,
> $119, $120, $121, $122, $123, $124, $125, $126, $127, $128, $129, $130,
> $131, $132, $133, $134, $135, $136, $137, $138, $139, $140, $141, $142,
> $143, $144, $145, $146, $147, $148, $149, $150, $151, $152, $153, $154,
> $155, $156, $157, $158, $159, $160, $161, $162, $163, $164, $165, $166,
> $167, $168, $169, $170, $171, $172, $173, $174, $175, $176, $177, $178,
> $179, $180, $181, $182, $183, $184, $185, $186, $187, $188, $189, $190,
> $191, $192, $193, $194, $195, $196, $197, $198, $199, $200, $201, $202,
> $203, $204, $205, $206, $207, $208, $209, $210, $211, $212, $213, $214,
> $215, $216, $217, $218, $219, $220, $221, $222, $223, $224, $225, $226,
> $227, $228, $229, $230, $231, $232, $233, $234, $235, $236, $237, $238,
> $239, $240, $241, $242, $243, $244, $245, $246, $247, $248, $249, $250,
> $251, $252, $253, $254, $255, $256, $257, $258, $259, $260, $261, $262,
> $263, $264, $265, $266, $267, $268, $269, $270, $271, $272, $273, $274,
> $275, $276, $277, $278, $279, $280, $281, $282, $283, $284, $285, $286,
> $287, $288, $289, $290, $291, $292, $293, $294, $295, $296, $297, $298,
> $299, $300, $301, $302, $303, $304, $305, $306, $307, $308, $309, $310,
> $311, $312, $313, $314, $315, $316, $317, $318, $319, $320, $321, $322,
> $323, $324, $325, $326, $327, $328, $329, $330, $331, $332, $333, $334,
> $335, $336, $337, $338, $339, $340, $341, $342, $343, $344, $345, $346,
> $347, $348, $349, $350, $351, $352, $353, $354, $355, $356, $357, $358,
> $359, $360, $361, $362, $363, $364, $365, $366, $367, $368, $369, $370,
> $371, $372, $373, $374, $375, $376, $377, $378, $379, $380, $381, $382,
> $383, $384, $385, $386, $387, $388, $389, $390, $391, $392, $393, $394,
> $395, $396, $397, $398, $399, $400, $401, $402, $403, $404, $405, $406,
> $407, $408, $409, $410, $411, $412, $413, $414, $415, $416, $417, $418,
> $419, $420, $421, $422, $423, $424, $425, $426, $427, $428, $429, $430,
> $431, $432, $433, $434, $435, $436, $437, $438, $439, $440, $441, $442,
> $443, $444, $445, $446, $447, $448, $449, $450, $451, $452, $453, $454,
> $455, $456, $457, $458, $459, $460, $461, $462, $463, $464, $465, $466,
> $467, $468, $469, $470, $471, $472, $473, $474, $475, $476, $477, $478,
> $479, $480, $481, $482, $483, $484, $485, $486, $487, $488, $489, $490,
> $491, $492, $493, $494, $495, $496, $497, $498, $499, $500, $501, $502,
> $503, $504, $505, $506, $507, $508, $509, $510, $511, $512, $513, $514,
> $515, $516, $517, $518, $519, $520, $521, $522, $523, $524, $525, $526,
> $527, $528, $529, $530, $531, $532, $533, $534, $535, $536, $537, $538,
> $539, $540, $541, $542, $543, $544, $545, $546, $547, $548, $549, $550,
> $551, $552, $553, $554, $555, $556, $557, $558, $559, $560, $561, $562,
> $563, $564, $565, $566, $567, $568, $569, $570, $571, $572, $573, $574,
> $575, $576, $577, $578, $579, $580, $581, $582, $583, $584, $585, $586,
> $587, $588, $589, $590, $591, $592, $593, $594, $595, $596, $597, $5

Re: Manual query vs trigger during data load

2024-09-13 Thread Rob Sargent



> On Sep 13, 2024, at 10:57 AM, Adrian Klaver  wrote:
> 
> On 9/13/24 07:50, Adrian Klaver wrote:
>>> On 9/13/24 02:58, Juan Rodrigo Alejandro Burgos Mella wrote:
>>> Hello, I find it unlikely that the trigger will work properly, since the 
>>> reserved fields of the OLD subset have no value in an INSERT
>> I'm not seeing that the OP is asking for OLD.* values, they are just looking 
>> to include the result of a lookup on another table in the INSERT.
> 
> My mistake I see the OLD reference now.
> 
>>> 
Personally I would cache the lookup on the Java side and send the correct value
to a simple insert statement.

> 
> 
> 




Re: Functionally dependent columns in SELECT DISTINCT

2024-09-13 Thread David G. Johnston
>
>
>
> > or use a lateral subquery to surgically (fetch first 1) retrieve the
> first row when sorted by recency descending.
>
> I'm not sure that I see how to apply this when I need top-k, not top-1.
>

Fetch first k

It's just a modern limit clause.
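
For the earlier example that could be written like the sketch below (assuming
the same items/parts schema; EXISTS is used instead of the join so each id
appears only once):

SELECT id
FROM items i
WHERE EXISTS (SELECT 1 FROM parts p
              WHERE p.item_id = i.id AND p.part_id % 3 = 0)
ORDER BY i.create_time DESC
FETCH FIRST 5 ROWS ONLY;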

David J.


Re: Will hundreds of thousands of this type of query cause a parsing issue

2024-09-13 Thread Greg Sabino Mullane
On Fri, Sep 13, 2024 at 11:35 AM Wong, Kam Fook (TR Technology) <
kamfook.w...@thomsonreuters.com> wrote:

> 1) Where does query parsing occur?
>
>
Always on the server side, although your driver may do something as well.

2) Will this cause extra parsing to the posgress DB?
>

Yes


>   Any pg system table to measure parsing?
>

No

You want to send an array of values to the same query, so it can be
prepared once, like so:

SELECT abc, efg
FROM docloc a
JOIN collection b USING (collection_name)
WHERE a.column1 = ANY($1)
AND a.stage_id = (
  select max(stage_id) from collection_pit c
  where c.collection_name = a.collection_name
  and c.pid_id < $2 and c.stage_code = $3
);

Then you can always pass in three arguments, the first being an array of
all the column1 values you want.

You might also want to get familiar with plan_cache_mode:
https://www.postgresql.org/docs/current/sql-prepare.html
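
As a sketch (the parameter types and example values are guesses), the
prepare-once / execute-many flow combined with plan_cache_mode could look like:

PREPARE doc_lookup (text[], int, text) AS
SELECT abc, efg
FROM docloc a
JOIN collection b USING (collection_name)
WHERE a.column1 = ANY($1)
AND a.stage_id = (
  select max(stage_id) from collection_pit c
  where c.collection_name = a.collection_name
  and c.pid_id < $2 and c.stage_code = $3
);

SET plan_cache_mode = force_generic_plan;  -- reuse one generic plan instead of re-planning
EXECUTE doc_lookup(ARRAY['val1', 'val2', 'val3'], 42, 'X');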

Cheers,
Greg


Re: Manual query vs trigger during data load

2024-09-13 Thread yudhi s
On Fri, Sep 13, 2024 at 8:27 PM Adrian Klaver 
wrote:

> On 9/13/24 07:50, Adrian Klaver wrote:
> > On 9/13/24 02:58, Juan Rodrigo Alejandro Burgos Mella wrote:
> >> Hello, I find it unlikely that the trigger will work properly, since
> >> the reserved fields of the OLD subset have no value in an INSERT
> >
> > I'm not seeing that the OP is asking for OLD.* values, they are just
> > looking to include the result of a lookup on another table in the INSERT.
>
> My mistake I see the OLD reference now.
>
> >
>

My mistake. The trigger was supposed to use "new.col2", fetch the
corresponding lookup value from the lookup table, and insert that value into
the target table.

Now my question is: in such a situation the trigger will work fine, but is
that the optimal way of doing it? Or should we convert the query somehow so
that the lookup table can be queried along with the INSERT in one shot, with
a single DB call? And is it true that a trigger on the target table will
suppress the batch insert and make it row by row, even if we call it in a
batch fashion?

As "thiemo" mentioned , it can be done as below method, but if we have
multiple lookup tables to be populated for multiple columns , then , how
can the INSERT query be tweaked to cater the need here? And I understand ,
the lookup table can be cached in Java and refreshed at a certain point in
time, but I was trying to understand if this can be doable by directly
querying the database, considering the lookup tables are having large data
sets in them.

Insert into tab1 (val1, val2)
Select valA, valB
From tab2
Where valC = :param1
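
For illustration only (reference_tab2, ref_value and the bind names are
hypothetical), several lookups can be folded into one INSERT ... SELECT:

INSERT INTO tab_part1 (column1, column2, column3, column4, column5, part_date)
SELECT :v_col1, r1.ref_value, r2.ref_value, :v_col4, :v_col5, CURRENT_DATE
FROM reference_tab1 r1, reference_tab2 r2
WHERE r1.lookup_key = :v_col2
  AND r2.lookup_key = :v_col3;

If any lookup finds no match, no row is inserted at all; LEFT JOIN / COALESCE
variants would be needed if a missing lookup should still insert a row.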


Re: Will hundreds of thousands of this type of query cause a parsing issue

2024-09-13 Thread shammat



On 13.09.24 at 17:34, Wong, Kam Fook (TR Technology) wrote:

We have a flavor of this type of query with long in-list/bind
variables (see below).  We notice that some of the bind variables
come in as 0 which causes the optimizer to choose to full scan two of
the following 3 tables.  One thought to fix a full table scan is to
chop off the not-needed bind variables (proven to work after some
tests).  But my other worry is will cause parsing issues because the
app will be executing > 100k/sec with this type of query.

I am an Oracle DBA, and this change for sure will generate a
different query id.  Which in turn generates tons of extra parsing to
the DB because all soft and hard parsing occurs at the DB level.  But
my understanding for Postgres is parsing occurs at the client jdbc
level.  Am I understanding this correctly?

In summary/my concern:

1) Where does query parsing occur?
2) Will this cause extra parsing to the posgress DB?  Any pg system table to 
measure parsing?



You can simplify the query to a single parameter by passing the list of values 
as an array:

SELECT  abc, efg
from DOCLOC a,
     COLLECTION b
WHERE a.colum1 = ANY($1)
  AND a.COLLECTION_NAME = b.COLLECTION_NAME
  AND a.STAGE_ID = (SELECT MAX(STAGE_ID)
                    FROM COLLECTION_PIT
                    WHERE COLLECTION_PIT.COLLECTION_NAME = a.COLLECTION_NAME
                      AND COLLECTION_PIT.PIT_ID <= $2
                      AND COLLECTION_PIT.STAGE_CODE = $3)

You can then pass the array using PreparedStatement.setArray()

This has the additional advantage that you don't need to build the query 
dynamically
and there is only a single statement to be parsed. I don't think Postgres 
distinguishes
between soft and hard parses as it doesn't cache plans as aggressively as 
Oracle.






update faster way

2024-09-13 Thread yudhi s
Hello,
We have to update a column value (from numbers like '123' to codes like
'abc', by looking them up in a reference table) in a partitioned table with
billions of rows in it, with each partition having hundreds of millions of
rows. As we tested, ~30 million rows take ~20 minutes to update, so by this
calculation it is going to take days to update all the values. So my
questions are:

1) Is there any inbuilt way of running the update query in parallel
(e.g. using parallel hints etc.) to make it run faster?
2) Should we run each individual partition in a separate session (e.g. five
partitions have their updates done at the same time from 5 different
sessions)? Will that have any locking effect, or can we just start the
sessions and let them run without impacting our live transactions? (A
per-partition sketch follows the query below.)

UPDATE tab_part1
SET column1 = reftab.code
FROM reference_tab reftab
WHERE tab_part1.column1 = reftab.column1;
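
As a sketch of option 2 (the child-table names are hypothetical), each
session could target one partition directly:

-- session 1
UPDATE tab_part1_p1 t
SET column1 = r.code
FROM reference_tab r
WHERE t.column1 = r.column1;

-- session 2 runs the same statement against tab_part1_p2, and so on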

Regards
Yudhi


Re: update faster way

2024-09-13 Thread Igor Korot
Hii,

On Fri, Sep 13, 2024 at 10:22 PM yudhi s  wrote:
>
> Hello,
> We have to update a column value(from numbers like '123' to codes like 'abc' 
> by looking into a reference table data) in a partitioned table with billions 
> of rows in it, with each partition having 100's millions rows. As we tested 
> for ~30million rows it's taking ~20minutes to update. So if we go by this 
> calculation, it's going to take days for updating all the values. So my 
> question is
>
> 1) If there is any inbuilt way of running the update query in parallel (e.g. 
> using parallel hints etc) to make it run faster?
> 2) should we run each individual partition in a separate session (e.g. five 
> partitions will have the updates done at same time from 5 different 
> sessions)? And will it have any locking effect or we can just start the 
> sessions and let them run without impacting our live transactions?

Do you have any indexes?
If not - you should, if yes - what are they?

Thank you.

>
> UPDATE tab_part1
> SET column1 = reftab.code
> FROM reference_tab reftab
> WHERE tab_part1.column1 = reftab.column1;
>
> Regards
> Yudhi




Re: update faster way

2024-09-13 Thread yudhi s
>
>
>
> Do you have any indexes?
> If not - you should, if yes - what are they?
>
>
>
Yes, we have a primary key on this table, which is on a UUID-type column,
and we also have other indexes on timestamp columns. But how is that going
to help, as we are going to update almost all the rows in the table?