Manual query vs trigger during data load
Hello All,

We have a table into which hundreds of millions of rows are inserted each day. We now have a requirement to apply some transformation/lookup logic to a few of the input bind values while inserting the data. So I wanted to understand: is it possible to do this along with the INSERT query, or is it better to create a trigger for it?

For example, below is the current insert query used in the Java code. We want to fetch the value for "column2" from a lookup table rather than inserting it directly as it comes from the customer side. So I am thinking of a trigger like the one below. At the same time I also want to compare the performance of doing the lookup the normal way versus having it performed by the trigger.

One way I am thinking of is to first fetch the value of "column2" from reference_tab1 using a separate SELECT query in the Java code itself, and then pass that to the insert query below, but I think that will increase the response time as it is a separate DB call.

1) So, is there a way to do it directly in the single INSERT query itself, without an additional SELECT query? I would then compare that with the trigger-based approach.
2) Additionally, will this decision impact a batch insert approach? That is, in the case of a trigger, will the batch insert fail because the trigger forces it to run row by row?

INSERT INTO tab_part1 (column1, column2, column3, column4, column5, part_date)
VALUES (:v_col1, :v_col2, :v_col3, :v_col4, :v_col5, CURRENT_DATE);

CREATE OR REPLACE FUNCTION trg_func_populate_column2() RETURNS TRIGGER AS $$
BEGIN
    -- Fetch reference value and populate column2
    NEW.column2 := (SELECT lookup_key FROM reference_tab1 WHERE lookup_key = old.column2);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Regards
Yudhi
Manual query vs trigger during data load
Hi

To me it seems possible to use an insert-select, e.g.

Insert into tab1 (val1, val2)
Select valA, valB
From tab2
Where valC = :param1

Cheers
Thiemo
Storing plans with pg_stat_statements
Hello community,

As you know, pg_stat_statements is a very popular and useful extension used by many Postgres users. What do you think of adding the ability to store or capture query plans as a new feature in this extension? This could be an optional feature, disabled by default and controlled by a parameter. I am sure it would be a very helpful feature for many of us. I just present this idea for your consideration.

Best Regards
Re: Manual query vs trigger during data load
Hello, I find it unlikely that the trigger will work properly, since the reserved fields of the OLD subset have no value in an INSERT.

Atte
JRBN
Reg: Size difference
Hi Team

I have been using PostgreSQL for the past 6 years. PostgreSQL has significantly impacted my life, providing me with great opportunities for knowledge and self-development.

I'm currently facing a strange issue with PostgreSQL 15.0. I have a primary-standby setup that is in sync, with a replication slot in place. There are 18 databases, and one of the databases on the primary side is 104 GB, while the same database on the standby side is 216 GB. Both are in sync with zero delay. Could this be a bug? If so, has it been resolved in newer releases? If it is not a bug, how can this issue be fixed? Is there a solution or any supporting documentation available?

WAL and log files are being rotated properly. The issue is with a database named services_mfs. On the primary cluster, the services_mfs database is 104 GB, but on the standby cluster it is 216 GB, even though both clusters are in sync. The standby database is only used in case of a crash, which is managed by a Patroni cluster with etcd.

Thanks,
Vinay Kumar
Re: DDL issue
On Thu, 12 Sep 2024, David G. Johnston wrote: Read up on “many-to-many” data models. In SQL they involve a linking table, one row per bidirectional edge, in addition to the two node tables. David, Thanks very much. I knew about those a long time ago but haven't needed them in a long time so I forgot about them. Will re-learn. Much appreciated, Rich
Re: DDL issue
On Thu, 12 Sep 2024, Adrian Klaver wrote:

> Quick and dirty:
>
> people_table
>   person_id PK
>   name_last
>   name_first
>   email_address
>   ph_number
>   ...
>
> location_table
>   loc_id PK
>   person_id_fk FK <--> people_table(person_id)
>   loc_name
>   loc_st_addr
>   loc_st_city
>   loc_st_st_prov
>   ...
>
> contact_table
>   contact_id PK
>   loc_id_fk FK <--> location_table(loc_id)
>   contact_ph_number
>   contact_email  --Can be null
>
> It can get more involved than this, depends on how flexible you want to get.

Adrian,

There are many companies with multiple locations, but few owners with multiple companies, each with a different location. David's reminder about many-to-many tables will do the job.

Thanks, Rich
Re: DDL issue
On Fri, 13 Sep 2024, Muhammad Usman Khan wrote:

> To handle this situation in PostgreSQL, you can model the data in a way that maintains a single entry for each owner in the people table while linking the owner to multiple dairies through a separate dairies table. This is a typical one-to-many relationship (one person can own many dairies). The following is a testing scenario which might help you.

Muhammad,

Thank you. There are a number of industries (most involving natural resources) so the one-to-many table will be more general.

Regards, Rich
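For reference, a minimal sketch of that one-to-many layout (untested; all table and column names here are illustrative placeholders, not from the thread):

CREATE TABLE people (
    person_id   SERIAL PRIMARY KEY,
    first_name  TEXT,
    last_name   TEXT,
    email       TEXT UNIQUE
);

CREATE TABLE dairies (
    dairy_id    SERIAL PRIMARY KEY,
    person_id   INT NOT NULL REFERENCES people(person_id),  -- the single owner
    dairy_name  TEXT,
    location    TEXT
);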
Re: DDL issue
On Fri, 13 Sep 2024, Tony Shelver wrote:

> Or if you want to get even more flexible, where a dairy could have more than one owner as well as one owner having more than one dairy, you could create an intersection / relationship table. Something like
>
> -- Create people table (one entry per person)
> CREATE TABLE people (
>     person_id    SERIAL PRIMARY KEY,
>     first_name   VARCHAR(100),
>     last_name    VARCHAR(100),
>     email        VARCHAR(100) UNIQUE,
>     phone_number VARCHAR(15)
> );
>
> -- Create intersection table (one entry per person/dairy pair)
> CREATE TABLE people_dairy_map (
>     pdm_id    SERIAL PRIMARY KEY,
>     person_id INT REFERENCES people(person_id),
>     dairy_id  INT REFERENCES dairies(dairy_id)
> );

Thanks, Tony.

Regards, Rich
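With an intersection table like that in place, ownership questions are answered by joining through it; for example (untested, using the names from the sketch above plus a hypothetical dairy_name column):

SELECT p.last_name, d.dairy_name
FROM people p
JOIN people_dairy_map m ON m.person_id = p.person_id
JOIN dairies d ON d.dairy_id = m.dairy_id
WHERE p.email = 'owner@example.com';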
Re: Reg: Size difference
Am Fri, Sep 13, 2024 at 05:39:22PM +0530 schrieb Vinay Oli: > I'm currently facing a strange issue with PostgreSQL 15.0. I have a > primary-standby setup that is in sync, with a replication slot in place. > There are 18 databases, and one of the databases on the primary side is 104 > GB, while the same database on the standby side is 216 GB. Both are in sync > with zero delay. First thing that comes to my mind is different effect of autovacuum between both clusters. Best, Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
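One way to narrow down where the extra space lives is to compare per-relation sizes on both sides; a rough sketch (run it in the affected database on both the primary and the standby and compare the output):

SELECT n.nspname, c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'i', 't', 'm')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 20;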
Re: post-bootstrap init : permission denied pg_description
On Thu, Sep 12, 2024 at 01:38:11PM -0400, Tom Lane wrote:
> François SIMON writes:
> > So the problem seems to come from xlc, and only at initdb step.
> > I can see that initdb runs a backend postgres in single user mode.
> > And this is this backend, when compiled with xlc, that shows :
> > FATAL: permission denied for table pg_description
>
> Yeah. This seems like it must be an xlc code-generation bug.
> It's got nothing to do with filesystem permissions: you're failing
> an ACL check at the logical "table" level, which you should not
> be because single-user mode implies superuser permissions.
>
> We do have one buildfarm member building with 64-bit xlc on
> AIX 7.1:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2024-09-11%2003%3A51%3A02
>
> I see that that uses a bunch of odd CFLAGS, but they look like they
> are mostly to suppress compiler warnings. Another important bit of
> the recipe for AIX is -D_LARGE_FILES=1, but I'm not sure that that
> would have any impact during initdb (since we're not dealing with
> any large files at that point).
>
> You could perhaps try different -O optimization levels. Also
> make sure your xlc is up-to-date.

Thank you for all this information. Unfortunately I haven't obtained any better results yet. I tried -O2, -O3, with or without -D_LARGE_FILES=1, xlc_r instead of xlc, and all the same flags as the buildfarm member. I will try with another version of xlc if possible.

François

> regards, tom lane
Re: Manual query vs trigger during data load
On 9/13/24 02:58, Juan Rodrigo Alejandro Burgos Mella wrote:
> Hello, I find it unlikely that the trigger will work properly, since the
> reserved fields of the OLD subset have no value in an INSERT
>
> Atte
> JRBN

I'm not seeing that the OP is asking for OLD.* values, they are just looking to include the result of a lookup on another table in the INSERT.

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: Manual query vs trigger during data load
On 9/13/24 07:50, Adrian Klaver wrote:
> On 9/13/24 02:58, Juan Rodrigo Alejandro Burgos Mella wrote:
>> Hello, I find it unlikely that the trigger will work properly, since the
>> reserved fields of the OLD subset have no value in an INSERT
>>
>> Atte
>> JRBN
>
> I'm not seeing that the OP is asking for OLD.* values, they are just looking
> to include the result of a lookup on another table in the INSERT.

My mistake, I see the OLD reference now.

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: Functionally dependent columns in SELECT DISTINCT
On Thu, Sep 12, 2024 at 11:13 PM wrote: > > What about using DISTINCT ON () ? > SELECT DISTINCT ON (items.id) items.* > FROM items > JOIN parts ON items.id = parts.item_id > WHERE part_id % 3 = 0 > ORDER BY items.id,items.create_time DESC > LIMIT 5; > > This gives me this plan: https://explain.depesz.com/s/QHr6 on 16.2 (Windows, > i7-1260P) Ordering by items.id changes the answer, though. In the example I gave, items.id and items.create_time happened to be in the same order, but that needn't hold. In reality I really do want the ID columns of the *most recent* items. You can see the difference if you build the test dataset a bit differently: INSERT INTO items(id, create_time) SELECT i, now() - make_interval(secs => random() * 1e6) FROM generate_series(1, 100) s(i); We want the returned create_times to be all recent, and the IDs now should look roughly random.
Will hundreds of thousands of this type of query cause a parsing issue
Fellow Postgres experts,

We have a flavor of this type of query with a long in-list of bind variables (see below). We notice that some of the bind variables come in as 0, which causes the optimizer to choose to full scan two of the following 3 tables. One thought to fix the full table scan is to chop off the not-needed bind variables (proven to work after some tests). But my other worry is that this will cause parsing issues, because the app will be executing > 100k/sec of this type of query.

I am an Oracle DBA, and this change for sure will generate a different query id, which in turn generates tons of extra parsing in the DB, because all soft and hard parsing occurs at the DB level. But my understanding for Postgres is that parsing occurs at the client JDBC level. Am I understanding this correctly?

In summary, my concerns:
1) Where does query parsing occur?
2) Will this cause extra parsing on the Postgres DB? Is there any pg system table to measure parsing?

SELECT abc, efg from DOCLOC a, COLLECTION b WHERE a.colum1 IN ($1, $2, $3, $4,
$5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20,
...
$664, $665, $666, $667, $668, $669, $670, $671, $67
Re: Functionally dependent columns in SELECT DISTINCT
On Friday, September 13, 2024, Willow Chargin wrote: > In reality I really do want the ID columns of the > *most recent* items. > Use a window function to rank them and pull out rank=1, or use a lateral subquery to surgically (fetch first 1) retrieve the first row when sorted by recency descending. David J.
Re: Functionally dependent columns in SELECT DISTINCT
Thanks both for your suggestions so far.

On Fri, Sep 13, 2024 at 8:43 AM David G. Johnston wrote:
>
> On Friday, September 13, 2024, Willow Chargin wrote:
>>
>> In reality I really do want the ID columns of the
>> *most recent* items.
>
> Use a window function to rank them and pull out rank=1

Hmm, like this? Noting that it's rank<=5, not rank=1:

-- 1. rank all item-part combinations, densely since an item may have multiple parts
-- 2. limit by rank, still retaining multiple copies of each item
-- 3. de-duplicate IDs
SELECT DISTINCT id
FROM (
    SELECT id, dense_rank
    FROM (
        SELECT items.id, dense_rank() OVER (ORDER BY create_time DESC)
        FROM items
        JOIN parts ON items.id = parts.item_id
        WHERE part_id % 3 = 0
    ) q
    WHERE dense_rank <= 5
) q

I've done this before, but my experience is that it's usually far slower because the rank is computed eagerly even for rows that don't match the rank bound. And indeed here it takes 20% longer than even the slower GROUP BY from before: https://explain.depesz.com/s/mQIi

> or use a lateral subquery to surgically (fetch first 1) retrieve the first
> row when sorted by recency descending.

I'm not sure that I see how to apply this when I need top-k, not top-1.
Re: DDL issue
On Thu, 12 Sep 2024, Adrian Klaver wrote:

> Quick and dirty:
>
> people_table
>   person_id PK
>   name_last
>   name_first
>   email_address
>   ph_number
>   ...
>
> location_table
>   loc_id PK
>   person_id_fk FK <--> people_table(person_id)
>   loc_name
>   loc_st_addr
>   loc_st_city
>   loc_st_st_prov
>   ...
>
> contact_table
>   contact_id PK
>   loc_id_fk FK <--> location_table(loc_id)
>   contact_ph_number
>   contact_email  --Can be null
>
> It can get more involved than this, depends on how flexible you want to get.

Adrian,

This comes close; I need to think about this.

Thanks, Rich
Re: Will hundreds of thousands of this type of query cause a parsing issue
I would usually try to avoid such long IN clauses. Why not try this: create a temp table of 1 column, bulk insert all your IDs into that table, then change your query to join to the temp table? This also has the advantage of working for thousands of values.

On Fri, 13 Sept 2024, 16:35 Wong, Kam Fook (TR Technology), <kamfook.w...@thomsonreuters.com> wrote:

> We have a flavor of this type of query with a long in-list of bind variables
> (see below). We notice that some of the bind variables come in as 0 which
> causes the optimizer to choose to full scan two of the following 3 tables.
> One thought to fix the full table scan is to chop off the not-needed bind
> variables (proven to work after some tests). But my other worry is that this
> will cause parsing issues because the app will be executing > 100k/sec with
> this type of query.
>
> I am an Oracle DBA, and this change for sure will generate a different
> query id, which in turn generates tons of extra parsing in the DB because
> all soft and hard parsing occurs at the DB level. But my understanding for
> Postgres is that parsing occurs at the client JDBC level. Am I understanding
> this correctly?
>
> In summary, my concerns:
>
> 1) Where does query parsing occur?
> 2) Will this cause extra parsing on the Postgres DB? Any pg system table
> to measure parsing?
>
> SELECT abc, efg from DOCLOC a, COLLECTION b WHERE a.colum1 IN ($1, $2, $3, ...
Re: Manual query vs trigger during data load
> On Sep 13, 2024, at 10:57 AM, Adrian Klaver wrote:
>
> On 9/13/24 07:50, Adrian Klaver wrote:
>> On 9/13/24 02:58, Juan Rodrigo Alejandro Burgos Mella wrote:
>>> Hello, I find it unlikely that the trigger will work properly, since the
>>> reserved fields of the OLD subset have no value in an INSERT
>>
>> I'm not seeing that the OP is asking for OLD.* values, they are just looking
>> to include the result of a lookup on another table in the INSERT.
>
> My mistake I see the OLD reference now.

Personally I would cache the lookup on the Java side and send the correct value to a simple insert statement.
Re: Functionally dependent columns in SELECT DISTINCT
>> or use a lateral subquery to surgically (fetch first 1) retrieve the
>> first row when sorted by recency descending.
>
> I'm not sure that I see how to apply this when I need top-k, not top-1.

Fetch first k. It's just a modern limit clause.

David J.
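For instance, applied to the items/parts tables from earlier in the thread, one shape of that idea is the following untested sketch (whether it beats the window-function plan would have to be checked with EXPLAIN):

SELECT id
FROM (
    SELECT DISTINCT items.id, items.create_time
    FROM items
    JOIN parts ON items.id = parts.item_id
    WHERE part_id % 3 = 0
) q
ORDER BY create_time DESC
FETCH FIRST 5 ROWS ONLY;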
Re: Will hundreds of thousands of this type of query cause a parsing issue
On Fri, Sep 13, 2024 at 11:35 AM Wong, Kam Fook (TR Technology) <kamfook.w...@thomsonreuters.com> wrote:

> 1) Where does query parsing occur?

Always on the server side, although your driver may do something as well.

> 2) Will this cause extra parsing to the Postgres DB?

Yes

> Any pg system table to measure parsing?

No

You want to send an array of values to the same query, so it can be prepared once, like so:

SELECT abc, efg
FROM docloc a
JOIN collection b USING (collection_name)
WHERE a.column1 = ANY($1)
  AND a.stage_id = (
      select max(stage_id)
      from collection_pit c
      where c.collection_name = a.collection_name
        and c.pid_id < $2
        and c.stage_code = $3
  );

Then you can always pass in three arguments, the first being an array of all the column1 values you want.

You might also want to get familiar with plan_cache_mode: https://www.postgresql.org/docs/current/sql-prepare.html

Cheers,
Greg
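A rough illustration of what that looks like on the server side, as an untested sketch (the parameter types here are guesses; adjust them to the real column types):

PREPARE doc_lookup(int[], int, text) AS
    SELECT abc, efg
    FROM docloc a
    JOIN collection b USING (collection_name)
    WHERE a.column1 = ANY($1)
      AND a.stage_id = (SELECT max(stage_id)
                        FROM collection_pit c
                        WHERE c.collection_name = a.collection_name
                          AND c.pid_id < $2
                          AND c.stage_code = $3);

-- plan_cache_mode accepts auto, force_custom_plan or force_generic_plan
SET plan_cache_mode = force_generic_plan;

EXECUTE doc_lookup(ARRAY[101, 202, 303], 42, 'A');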
Re: Manual query vs trigger during data load
On Fri, Sep 13, 2024 at 8:27 PM Adrian Klaver wrote:

> On 9/13/24 07:50, Adrian Klaver wrote:
>> On 9/13/24 02:58, Juan Rodrigo Alejandro Burgos Mella wrote:
>>> Hello, I find it unlikely that the trigger will work properly, since
>>> the reserved fields of the OLD subset have no value in an INSERT
>>
>> I'm not seeing that the OP is asking for OLD.* values, they are just
>> looking to include the result of a lookup on another table in the INSERT.
>
> My mistake I see the OLD reference now.

My mistake. The trigger was supposed to use "new.col2" and fetch the corresponding lookup value from the lookup table and insert that value into the target table. Now my question is: in such a situation the trigger will work fine, but is that the optimal way of doing it? Or should we rewrite the query in some way so that the lookup table can be queried along with the INSERT in one shot, with a single DB call?

And is it true that a trigger on the target table will suppress the batch insert and make it row by row, even if we call it in a batch fashion?

As Thiemo mentioned, it can be done with the method below, but if we have multiple lookup tables to be populated for multiple columns, how can the INSERT query be tweaked to cater to that need? And I understand the lookup table can be cached in Java and refreshed at a certain point in time, but I was trying to understand if this can be done by directly querying the database, considering the lookup tables have large data sets in them.

Insert into tab1 (val1, val2)
Select valA, valB
From tab2
Where valC = :param1
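For example, would something along these lines be the right direction? (An untested sketch; reference_tab2 and the lookup_value columns are hypothetical names standing in for the second lookup table and the looked-up codes.)

INSERT INTO tab_part1 (column1, column2, column3, column4, column5, part_date)
SELECT :v_col1,
       (SELECT r1.lookup_value FROM reference_tab1 r1 WHERE r1.lookup_key = :v_col2),  -- first lookup
       :v_col3,
       (SELECT r2.lookup_value FROM reference_tab2 r2 WHERE r2.lookup_key = :v_col4),  -- hypothetical second lookup
       :v_col5,
       CURRENT_DATE;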
Re: Will hundreds of thousands of this type of query cause a parsing issue
On 13.09.24 at 17:34, Wong, Kam Fook (TR Technology) wrote:

> We have a flavor of this type of query with a long in-list of bind variables (see below). We notice that some of the bind variables come in as 0 which causes the optimizer to choose to full scan two of the following 3 tables. One thought to fix the full table scan is to chop off the not-needed bind variables (proven to work after some tests). But my other worry is that this will cause parsing issues because the app will be executing > 100k/sec with this type of query.
>
> I am an Oracle DBA, and this change for sure will generate a different query id, which in turn generates tons of extra parsing in the DB because all soft and hard parsing occurs at the DB level. But my understanding for Postgres is that parsing occurs at the client JDBC level. Am I understanding this correctly?
>
> In summary, my concerns:
> 1) Where does query parsing occur?
> 2) Will this cause extra parsing on the Postgres DB? Any pg system table to measure parsing?

You can simplify the query to a single parameter by passing the list of values as an array:

SELECT abc, efg
FROM DOCLOC a, COLLECTION b
WHERE a.colum1 = ANY($1)
  AND a.COLLECTION_NAME = b.COLLECTION_NAME
  AND a.STAGE_ID = (SELECT MAX(STAGE_ID)
                    FROM COLLECTION_PIT
                    WHERE COLLECTION_PIT.COLLECTION_NAME = a.COLLECTION_NAME
                      AND COLLECTION_PIT.PIT_ID <= $2
                      AND COLLECTION_PIT.STAGE_CODE = $3)

You can then pass the array using PreparedStatement.setArray(). This has the additional advantage that you don't need to build the query dynamically and there is only a single statement to be parsed.

I don't think Postgres distinguishes between soft and hard parses as it doesn't cache plans as aggressively as Oracle.
update faster way
Hello,

We have to update a column value (from numbers like '123' to codes like 'abc', by looking into a reference table) in a partitioned table with billions of rows, with each partition having hundreds of millions of rows. As we tested, for ~30 million rows it takes ~20 minutes to update. So by that calculation, it is going to take days to update all the values. So my questions are:

1) Is there any inbuilt way of running the update query in parallel (e.g. using parallel hints etc.) to make it run faster?
2) Should we run each individual partition in a separate session (e.g. five partitions having their updates done at the same time from 5 different sessions)? And will it have any locking effect, or can we just start the sessions and let them run without impacting our live transactions?

UPDATE tab_part1
SET column1 = reftab.code
FROM reference_tab reftab
WHERE tab_part1.column1 = reftab.column1;

Regards
Yudhi
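For the per-partition option (2), the shape would be something like the following, run concurrently from separate sessions (an untested sketch; the partition name is illustrative):

UPDATE tab_part1_p1 t   -- repeat per partition, one session each
SET column1 = reftab.code
FROM reference_tab reftab
WHERE t.column1 = reftab.column1;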
Re: update faster way
Hii, On Fri, Sep 13, 2024 at 10:22 PM yudhi s wrote: > > Hello, > We have to update a column value(from numbers like '123' to codes like 'abc' > by looking into a reference table data) in a partitioned table with billions > of rows in it, with each partition having 100's millions rows. As we tested > for ~30million rows it's taking ~20minutes to update. So if we go by this > calculation, it's going to take days for updating all the values. So my > question is > > 1) If there is any inbuilt way of running the update query in parallel (e.g. > using parallel hints etc) to make it run faster? > 2) should we run each individual partition in a separate session (e.g. five > partitions will have the updates done at same time from 5 different > sessions)? And will it have any locking effect or we can just start the > sessions and let them run without impacting our live transactions? Do you have any indexes? If not - you should, if yes - what are they? Thank you. > > UPDATE tab_part1 > SET column1 = reftab.code > FROM reference_tab reftab > WHERE tab_part1.column1 = subquery.column1; > > Regards > Yudhi
Re: update faster way
> Do you have any indexes?
> If not - you should, if yes - what are they?

Yes, we have a primary key on this table, which is on a UUID type column, and we also have other indexes on other timestamp columns. But how is this going to help, as we are going to update almost all the rows in the table?