Re: INSERT Trigger to check for existing records

2020-11-22 Thread Michael Lewis
If you can modify your insert statement, and live with an extra column in
the data, no trigger is needed as best I can figure.

Create a unique index over the existing columns, add a "created_on" field
and call insert on conflict (unique index) do nothing.

This should give the behavior you want.


Re: INSERT Trigger to check for existing records

2020-11-22 Thread Hagen Finley

Hello Michael,

Thanks so much for this advice. As I mentioned previously, I'm not very 
good at this yet, so forgive me if my response is obtuse. I really love 
databases but my sentiments may be unrequited.


In reality my table has lots of columns (~30) including a report date 
(repdate) and each week's pull has a new repdate ( in this case 
2020-11-02 and 2020-11-09) which could function as a "created on" field.


To clarify, I would create an unique index on all the columns in the old 
report records (2020-11-02)  or just the three I am comparing 
(dealid,stage and revenue)?


In either case, so far in my efforts it looks like the create index 
fails because there are lots of rows with the same stage value, and a 
few with the same revenue value.


Create UNIQUE INDEX idx_sfdc
ON sfdc(ndealid, stage, revusd);

ERROR: could not create unique index "idx_sfdc" DETAIL: Key (ndealid, 
stage, revusd)=(19743576, Commit - 90% , 22072.37) is duplicated. SQL 
state: 23505


I probably could create an unique index on the dealid column as that 
should be unique. Would that be enough? It seems like that would insert 
ONLY the new records with a new (unique) dealid and that would 
definitely by an important step forward.


I hesitate to admit I have no idea how I would code the "call insert on 
conflict (unique index) do nothing" syntax, but I would be excited to learn.


Best,

Hagen


On 11/22/20 8:54 AM, Michael Lewis wrote:
If you can modify your insert statement, and live with an extra column 
in the data, no trigger is needed as best I can figure.


Create a unique index over the existing columns, add a "created_on" 
field and call insert on conflict (unique index) do nothing.


This should give the behavior you want.


Re: INSERT Trigger to check for existing records

2020-11-22 Thread Adrian Klaver

On 11/22/20 9:53 AM, Hagen Finley wrote:

Hello Michael,

Thanks so much for this advice. As I mentioned previously, I'm not very 
good at this yet, so forgive me if my response is obtuse. I really love 
databases but my sentiments may be unrequited.


The overriding issue is lack of a plan. From your first post:

"Here’s what I (REALLY) want:


Trigger looks at three fields prior to new insert: Deal ID (numeric),
revenue (numeric), stage(char) Example: 19743576 22072.37Commit
- 90%


 1. If the NEW dealid doesn't match any of the OLD dealids, insert the
new row
 2. if the NEW dealid, revenue and stage fields ALL match the OLD
dealid, revenue and stage, skip (don't insert the NEW row)
 3. If the NEW dealid matches an OLD dealid but either the NEW revenue
OR the stage fields have changed (don't match OLD record) insert new
row (I'll review both rows manually)
"

And from later post:

" I figure I could
use the chk field to delete the new inserts I didn't need."


From this I come up with the following:

1) Data rules

  a) If dealid in new data does not exist in old data INSERT row.
  b) Id dealid is in both new and old data AND revenue OR stage don't 
 match then INSERT and mark for review.
  c) If new dealid, revenue, stage match old dealid, revenue, stage 
then do not INSERT.


2) Process the data. Choices
   a) Use trigger on table sfdc
   b) Use staging table to hold new data and then process into sfdc table

3) Process the data. Same basic principle for both choices in 2) Flowchart
   a) In new data search for dealid in table sfdc if it does not exist 
add data to sfdc.

   b) If new data dealid does exist in sfdc
1) If revenue or stage field differ mark for review
2) If they do match skip further processing
4) Thoughts about above.
   a) To me table sfdc should only hold vetted data that is known to be 
unique per row.
   b) The data for review  1)b) 3)b) should end up in another review 
table e.g. sfdc_review.

   c) Since from OP ' 80% of the records are
the same as the existing records from the week before.' it makes sense 
to use the staging table 2)b) process rather then throwing away a lot of 
INSERTs.


If this makes sense then it comes down to decision in which choice in 2) 
to use. At that point it is filling in the flowchart with the exact 
steps to take.




In reality my table has lots of columns (~30) including a report date 
(repdate) and each week's pull has a new repdate ( in this case 
2020-11-02 and 2020-11-09) which could function as a "created on" field.


To clarify, I would create an unique index on all the columns in the old 
report records (2020-11-02)  or just the three I am comparing 
(dealid,stage and revenue)?


In either case, so far in my efforts it looks like the create index 
fails because there are lots of rows with the same stage value, and a 
few with the same revenue value.


Create UNIQUE INDEX idx_sfdc
ON sfdc(ndealid, stage, revusd);

ERROR: could not create unique index "idx_sfdc" DETAIL: Key (ndealid, 
stage, revusd)=(19743576, Commit - 90% , 22072.37) is duplicated. SQL 
state: 23505


I probably could create an unique index on the dealid column as that 
should be unique. Would that be enough? It seems like that would insert 
ONLY the new records with a new (unique) dealid and that would 
definitely by an important step forward.


I hesitate to admit I have no idea how I would code the "call insert on 
conflict (unique index) do nothing" syntax, but I would be excited to learn.


Best,

Hagen


On 11/22/20 8:54 AM, Michael Lewis wrote:
If you can modify your insert statement, and live with an extra column 
in the data, no trigger is needed as best I can figure.


Create a unique index over the existing columns, add a "created_on" 
field and call insert on conflict (unique index) do nothing.


This should give the behavior you want.



--
Adrian Klaver
adrian.kla...@aklaver.com




RE: INSERT Trigger to check for existing records

2020-11-22 Thread hagen
Thank you Adrian,

Your summary looks exactly right to me. I think option 2b looks more in reach 
for my limited skillset.
Let me see if I can make that work (dubious) and report.

Best,

Hagen

-Original Message-
From: Adrian Klaver  
Sent: Sunday, November 22, 2020 11:26 AM
To: Hagen Finley ; pgsql-general@lists.postgresql.org
Subject: Re: INSERT Trigger to check for existing records

On 11/22/20 9:53 AM, Hagen Finley wrote:
> Hello Michael,
> 
> Thanks so much for this advice. As I mentioned previously, I'm not 
> very good at this yet, so forgive me if my response is obtuse. I 
> really love databases but my sentiments may be unrequited.

The overriding issue is lack of a plan. From your first post:

"Here’s what I (REALLY) want:


Trigger looks at three fields prior to new insert: Deal ID (numeric),
revenue (numeric), stage(char) Example: 19743576 22072.37Commit
- 90%


  1. If the NEW dealid doesn't match any of the OLD dealids, insert the
 new row
  2. if the NEW dealid, revenue and stage fields ALL match the OLD
 dealid, revenue and stage, skip (don't insert the NEW row)
  3. If the NEW dealid matches an OLD dealid but either the NEW revenue
 OR the stage fields have changed (don't match OLD record) insert new
 row (I'll review both rows manually) "

And from later post:

" I figure I could
use the chk field to delete the new inserts I didn't need."


 From this I come up with the following:

1) Data rules

   a) If dealid in new data does not exist in old data INSERT row.
   b) Id dealid is in both new and old data AND revenue OR stage don't
  match then INSERT and mark for review.
   c) If new dealid, revenue, stage match old dealid, revenue, stage then do 
not INSERT.

2) Process the data. Choices
a) Use trigger on table sfdc
b) Use staging table to hold new data and then process into sfdc table

3) Process the data. Same basic principle for both choices in 2) Flowchart
a) In new data search for dealid in table sfdc if it does not exist add 
data to sfdc.
b) If new data dealid does exist in sfdc
1) If revenue or stage field differ mark for review
2) If they do match skip further processing
4) Thoughts about above.
a) To me table sfdc should only hold vetted data that is known to be unique 
per row.
b) The data for review  1)b) 3)b) should end up in another review table 
e.g. sfdc_review.
c) Since from OP ' 80% of the records are the same as the existing records 
from the week before.' it makes sense to use the staging table 2)b) process 
rather then throwing away a lot of INSERTs.

If this makes sense then it comes down to decision in which choice in 2) to 
use. At that point it is filling in the flowchart with the exact steps to take.

> 
> In reality my table has lots of columns (~30) including a report date
> (repdate) and each week's pull has a new repdate ( in this case
> 2020-11-02 and 2020-11-09) which could function as a "created on" field.
> 
> To clarify, I would create an unique index on all the columns in the 
> old report records (2020-11-02)  or just the three I am comparing 
> (dealid,stage and revenue)?
> 
> In either case, so far in my efforts it looks like the create index 
> fails because there are lots of rows with the same stage value, and a 
> few with the same revenue value.
> 
> Create UNIQUE INDEX idx_sfdc
> ON sfdc(ndealid, stage, revusd);
> 
> ERROR: could not create unique index "idx_sfdc" DETAIL: Key (ndealid, 
> stage, revusd)=(19743576, Commit - 90% , 22072.37) is duplicated. SQL
> state: 23505
> 
> I probably could create an unique index on the dealid column as that 
> should be unique. Would that be enough? It seems like that would 
> insert ONLY the new records with a new (unique) dealid and that would 
> definitely by an important step forward.
> 
> I hesitate to admit I have no idea how I would code the "call insert 
> on conflict (unique index) do nothing" syntax, but I would be excited to 
> learn.
> 
> Best,
> 
> Hagen
> 
> 
> On 11/22/20 8:54 AM, Michael Lewis wrote:
>> If you can modify your insert statement, and live with an extra 
>> column in the data, no trigger is needed as best I can figure.
>>
>> Create a unique index over the existing columns, add a "created_on" 
>> field and call insert on conflict (unique index) do nothing.
>>
>> This should give the behavior you want.


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: INSERT Trigger to check for existing records

2020-11-22 Thread Hagen Finley

Adrian and Michael,

My current insert logic (which works) is in a psycopg2 python script 
which reads a spreadsheet row into an array, so for the moment I didn't 
want to add that integration to my struggle.


   cur = conn.cursor()
   \
   query = "INSERT INTO
   
sfdc(theater,country,account,smotion,opname,cprod,opid,*ndealid,*qnum,*stage,revusd*,cdate,bdate,age,opown,opnum,sonum,fbdate,region,dqnum,pid,closed,won,onum,repdate)

   VALUES
   
(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s);"

   cur.executemany(query, frecords)
   conn.commit()
   conn.close()


However, the following is something of a stepping stone towards the 
destination and, (dare I say it? ;-) it works:


CREATE OR REPLACE FUNCTION same_test(did numeric)
RETURNS numeric AS $$
BEGIN
  IF $1 IN
      (SELECT ndealid from hygiene_112)
  THEN
    UPDATE hygiene_119 SET paid = 'SAME';
  ELSE
      UPDATE hygiene_119 SET paid = 'NEW';
  END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;


I changed the dealid to something that doesn't exist (14593030) in the 
11-2 table and the function updates the 11-9 table.status field to "NEW":


sales=# UPDATE hygiene_112 SET ndealid = 14593030 WHERE ndealid = 14593039;
UPDATE 1

SELECT same_test(14593039);

+--+---+--++--+--+--+
|repdate   |ndealid |revusd    |stage |status  |
+--+---+--++--+--+--+
|2020-11-09|14593039|1015624.23|Propose - 60% *|NEW * |
+--+---+--++--+--+--+

When I changed it back I get the proper "SAME" update:

sales=# UPDATE hygiene_112 SET ndealid = 14593039 WHERE ndealid = 14593030;
UPDATE 1

SELECT same_test(14593039);

+--+---+--++--+--+--+
|repdate   |ndealid |revusd    |stage |status  |
+--+---+--++--+--+--+
|2020-11-09|14593039|1015624.23|Propose - 60% |*SAME * |
+--+---+--++--+--+--+


I'm generally amazed when ANYTHING works so this is good news (to me). 
It seems logical I could replace the UPDATE statement with an INSERT 
statement at this point.


However, that only addresses one of the /data rules /on my checklist.

I'll keep forging ahead here and see what additional progress I can 
attain. Very much appreciate your patient assistance here.


Best,

Hagen


On 11/22/20 11:26 AM, Adrian Klaver wrote:

On 11/22/20 9:53 AM, Hagen Finley wrote:

Hello Michael,

Thanks so much for this advice. As I mentioned previously, I'm not 
very good at this yet, so forgive me if my response is obtuse. I 
really love databases but my sentiments may be unrequited.


The overriding issue is lack of a plan. From your first post:

"Here’s what I (REALLY) want:


Trigger looks at three fields prior to new insert: Deal ID (numeric),
revenue (numeric), stage(char) Example: 19743576 22072.37 Commit
- 90%


 1. If the NEW dealid doesn't match any of the OLD dealids, insert the
    new row
 2. if the NEW dealid, revenue and stage fields ALL match the OLD
    dealid, revenue and stage, skip (don't insert the NEW row)
 3. If the NEW dealid matches an OLD dealid but either the NEW revenue
    OR the stage fields have changed (don't match OLD record) insert new
    row (I'll review both rows manually)
"

And from later post:

" I figure I could
use the chk field to delete the new inserts I didn't need."


From this I come up with the following:

1) Data rules

  a) If dealid in new data does not exist in old data INSERT row.
  b) Id dealid is in both new and old data AND revenue OR stage don't 
 match then INSERT and mark for review.
  c) If new dealid, revenue, stage match old dealid, revenue, stage 
then do not INSERT.


2) Process the data. Choices
   a) Use trigger on table sfdc
   b) Use staging table to hold new data and then process into sfdc table

3) Process the data. Same basic principle for both choices in 2) 
Flowchart
   a) In new data search for dealid in table sfdc if it does not exist 
add data to sfdc.

   b) If new data dealid does exist in sfdc
1) If revenue or stage field differ mark for review
2) If they do match skip further processing
4) Thoughts about above.
   a) To me table sfdc should only hold vetted data that is known to 
be unique per row.
   b) The data for review  1)b) 3)b)