Time-consuming deletion on large tables
Hi,

The feed_posts table has over 50 million rows. I am deleting all rows of a
certain type that are over 60 days old. When I try to do a delete like the
one below, it hangs for an entire day, so I need to kill it with
pg_terminate_backend(pid):

DELETE FROM feed_posts
WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
AND created_at > '2020-05-11 00:00:00'
AND created_at < '2020-05-12 00:00:00';

So I need help in figuring out how to do large deletes on a production
database during normal hours.

The explain plan is given below:

"Delete on feed_posts  (cost=1156.57..195748.88 rows=15534 width=6)"
"  ->  Bitmap Heap Scan on feed_posts  (cost=1156.57..195748.88 rows=15534 width=6)"
"        Recheck Cond: ((created_at >= '2020-05-11 00:00:00'::timestamp without time zone) AND (created_at <= '2020-05-12 00:00:00'::timestamp without time zone))"
"        Filter: (feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'::uuid)"
"        ->  Bitmap Index Scan on feed_posts_created_at  (cost=0.00..1152.68 rows=54812 width=0)"
"              Index Cond: ((created_at >= '2020-05-11 00:00:00'::timestamp without time zone) AND (created_at <= '2020-05-12 00:00:00'::timestamp without time zone))"

Please help me with deleting these rows. Do I need to change anything in
the postgres configuration, or in the table structure?

Regards,
Atul
Re: Time-consuming deletion on large tables
Hi Atul,

Please try the code below. Execute all the statements in one transaction:

begin;
select * into new_table from old_table where type = 'abcz';
truncate table old_table;
insert into old_table select * from new_table;
commit;

On Thu, Dec 3, 2020 at 8:16 PM Atul Kumar wrote:
> Hi,
>
> The feed_posts table has over 50 million rows. I am deleting all rows of
> a certain type that are over 60 days old. When I try to do a delete like
> the one below, it hangs for an entire day, so I need to kill it with
> pg_terminate_backend(pid):
>
> DELETE FROM feed_posts
> WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
> AND created_at > '2020-05-11 00:00:00'
> AND created_at < '2020-05-12 00:00:00';
>
> So I need help in figuring out how to do large deletes on a production
> database during normal hours.
>
> [...]
>
> Regards,
> Atul

--
Regards,
Ravikumar S,
Ph: 8106741263
Re: Time-consuming deletion on large tables
On Thu, Dec 03, 2020 at 08:43:57PM +0530, Ravikumar Reddy wrote:
> Please try the code below. Execute all the statements in one transaction:
>
> begin;
> select * into new_table from old_table where type = 'abcz';
> truncate table old_table;
> insert into old_table select * from new_table;
> commit;

This looks like advice for when most of the rows are being deleted, but I
don't think that's true here. It'd need to LOCK old_table first, right?
Also, truncate isn't MVCC safe.

Atul: What server version? Do you have an index on feed_definition_id?

https://wiki.postgresql.org/wiki/Slow_Query_Questions

If an explain (analyze,buffers) SELECT runs in a reasonable time for that
query, include its output.

On Thu, Dec 3, 2020 at 8:16 PM Atul Kumar wrote:
> The feed_posts table has over 50 million rows. I am deleting all rows of
> a certain type that are over 60 days old. When I try to do a delete like
> the one below, it hangs for an entire day, so I need to kill it with
> pg_terminate_backend(pid):
>
> DELETE FROM feed_posts
> WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
> AND created_at > '2020-05-11 00:00:00'
> AND created_at < '2020-05-12 00:00:00';
>
> So I need help in figuring out how to do large deletes on a production
> database during normal hours.
>
> Please help me with deleting these rows. Do I need to change anything in
> the postgres configuration, or in the table structure?
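For reference, a minimal sketch of that check against the query from the
first message (the SELECT mirrors the DELETE's WHERE clause; the index name
is made up, and whether the index pays off depends on how selective
feed_definition_id is):

-- Time just the row-finding part of the DELETE, without deleting anything:
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM feed_posts
WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
  AND created_at > '2020-05-11 00:00:00'
  AND created_at < '2020-05-12 00:00:00';

-- If feed_definition_id turns out not to be indexed, a composite index
-- matching the WHERE clause lets the scan skip other feeds entirely
-- (CONCURRENTLY avoids blocking writes and cannot run inside a transaction):
CREATE INDEX CONCURRENTLY feed_posts_feed_def_created_at_idx
    ON feed_posts (feed_definition_id, created_at);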
Re: Time-consuming deletion on large tables
Justin Pryzby writes:
> On Thu, Dec 03, 2020 at 08:43:57PM +0530, Ravikumar Reddy wrote:
>> When I try to do a delete like this: it hangs for an entire day, so I
>> need to kill it with pg_terminate_backend(pid).
>>
>> DELETE FROM feed_posts
>> WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
>> AND created_at > '2020-05-11 00:00:00'
>> AND created_at < '2020-05-12 00:00:00';

90% of the "delete takes forever" complaints that we hear trace down to
having a foreign key reference to the deletion-target table that's not
backed by an index on the referencing column. Then you end up getting a
seqscan on the referencing table to look for rows referencing a
row-to-be-deleted. And then another one for the next row. Etc.

You could try "explain analyze" on a query deleting just a single one of
these rows and see if an RI enforcement trigger is what's eating the time.

			regards, tom lane
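For illustration, a minimal sketch of that single-row test (the ctid
subquery is just one way to pick a single matching row; wrapping it in a
transaction and rolling back leaves the data untouched):

BEGIN;
EXPLAIN ANALYZE
DELETE FROM feed_posts
WHERE ctid IN (
    SELECT ctid
    FROM feed_posts
    WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
      AND created_at > '2020-05-11 00:00:00'
      AND created_at < '2020-05-12 00:00:00'
    LIMIT 1
);
ROLLBACK;

-- In the output, look for lines of the form
--   Trigger for constraint <some_fkey>: time=... calls=...
-- A large time there points at an unindexed referencing column.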
Re: Time-consuming deletion on large tables
On 12/3/20 11:16 AM, Tom Lane wrote:
> 90% of the "delete takes forever" complaints that we hear trace down to
> having a foreign key reference to the deletion-target table that's not
> backed by an index on the referencing column. Then you end up getting a
> seqscan on the referencing table to look for rows referencing a
> row-to-be-deleted. And then another one for the next row. Etc.
>
> You could try "explain analyze" on a query deleting just a single one of
> these rows and see if an RI enforcement trigger is what's eating the time.

Yeah. IIRC some other RDBMS systems actually create such an index if it
doesn't already exist. Maybe we should have a warning when setting up an
FK constraint if the referencing fields aren't usefully indexed.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
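In the meantime, a catalog query along these lines can spot foreign keys
whose referencing columns have no covering index (a heuristic sketch: it
only checks whether the first N index columns contain the N FK columns,
and the slice bounds rely on pg_index.indkey being zero-based):

-- List FK constraints with no index whose leading columns cover them:
SELECT c.conrelid::regclass        AS referencing_table,
       c.conname                   AS constraint_name,
       pg_get_constraintdef(c.oid) AS definition
FROM pg_constraint c
WHERE c.contype = 'f'
  AND NOT EXISTS (
        SELECT 1
        FROM pg_index i
        WHERE i.indrelid = c.conrelid
          -- leading index columns must contain the FK columns, in any order
          AND (i.indkey::int2[])[0:cardinality(c.conkey) - 1] @> c.conkey
      );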
Temporarily disable not null constraints
Hi,

Can we disable not null constraints temporarily in a session-based
transaction, like we disable FK constraints?

SET session_replication_role = 'replica';
alter table table_name disable trigger user;

The above two options are working for unique constraint violation
exceptions.

Thanks,
Rj
Re: Temporarily disable not null constraints
On Thu, Dec 3, 2020 at 1:00 PM Nagaraj Raj wrote:
> Hi,
>
> Can we disable not null constraints temporarily in a session-based
> transaction, like we disable FK constraints?
>
> SET session_replication_role = 'replica';
> alter table table_name disable trigger user;
>
> The above two options are working for unique constraint violation
> exceptions.
>
> Thanks,
> Rj

You can alter the column and remove the not null constraint, do your work,
and then add it back, but it will have to verify that all rows have that
column set; that is, you can't leave some of them null.
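A minimal sketch of that sequence (table and column names are placeholders):

-- Placeholders: table_name / col.
ALTER TABLE table_name ALTER COLUMN col DROP NOT NULL;
-- ... do the work that needs NULLs temporarily, then backfill them ...
UPDATE table_name SET col = 0 WHERE col IS NULL;       -- hypothetical backfill
ALTER TABLE table_name ALTER COLUMN col SET NOT NULL;  -- rescans the table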
Re: Temporarily disable not null constraints
Generally, you shouldn't be disabling your constraints, especially if you
have multiple parallel processes accessing your db. Instead, you should
create them DEFERRABLE and have them checked at the end of your
transaction.

Regarding your question about NOT NULL: it is not possible to have it
deferred (please check this page:
https://www.postgresql.org/docs/13/sql-set-constraints.html).

You may alter your column to drop the NOT NULL and then add it back, but
all rows will still have to be checked, which I doubt you would like to
see on a large table.

regards, milos

On Thu, Dec 3, 2020 at 9:00 PM Nagaraj Raj wrote:
> Hi,
>
> Can we disable not null constraints temporarily in a session-based
> transaction, like we disable FK constraints?
>
> [...]
>
> Thanks,
> Rj
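For illustration, a sketch of the DEFERRABLE approach (all table, column,
and constraint names here are made up):

-- A deferrable FK; when deferred, it is checked at COMMIT instead of per
-- statement:
ALTER TABLE child
    ADD CONSTRAINT child_parent_fk
    FOREIGN KEY (parent_id) REFERENCES parent (id)
    DEFERRABLE INITIALLY IMMEDIATE;

BEGIN;
SET CONSTRAINTS child_parent_fk DEFERRED;
-- statements here may violate the FK mid-transaction ...
COMMIT;  -- ... but the constraint must hold at this point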
Re: Temporarily disable not null constraints
On Thu, Dec 03, 2020 at 07:58:15PM +, Nagaraj Raj wrote:
> Can we disable not null constraints temporarily in a session-based
> transaction, like we disable FK constraints?

If you're trying to temporarily violate the not-null constraint.. I don't
know if it's a good idea.. but maybe this feature in v12 helps you:

https://www.postgresql.org/docs/12/sql-altertable.html
| Ordinarily this is checked during the ALTER TABLE by scanning the entire
| table; however, if a valid CHECK constraint is found which proves no NULL
| can exist, then the table scan is skipped.

When you're done violating constraints, you can ALTER .. ADD CONSTRAINT ..
CHECK (.. IS NOT NULL) NOT VALID, then ALTER .. VALIDATE CONSTRAINT, and
then ALTER the column to SET NOT NULL.

--
Justin
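Spelled out as a sketch (v12 or later; table, column, and constraint names
are placeholders):

ALTER TABLE table_name ALTER COLUMN col DROP NOT NULL;
-- ... temporarily write NULLs, then clean them all up ...
ALTER TABLE table_name
    ADD CONSTRAINT col_not_null CHECK (col IS NOT NULL) NOT VALID;
ALTER TABLE table_name VALIDATE CONSTRAINT col_not_null; -- scans, weaker lock
ALTER TABLE table_name ALTER COLUMN col SET NOT NULL;    -- scan skipped (v12+)
ALTER TABLE table_name DROP CONSTRAINT col_not_null;     -- redundant now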
Re: Pg_locks and pg_stat_activity
At Fri, 4 Dec 2020 12:11:59 +0530, Ravikumar Reddy wrote in
> Hi,
>
> pg_stat_activity -- provides the active and idle connections for our
> database.
> pg_locks -- provides the query/procedure/function details if any object
> is currently locked in our database.
Yeah..
That result is quite hard to read, but..
> On Fri, Dec 4, 2020 at 11:43 AM aditya desai wrote:
>
> > Hi Postgres Experts,
> > Requesting advice on the queries below. I am new to Postgres :(
> >
> > Regards,
> > Aditya.
> >
> > On Tue, Dec 1, 2020 at 9:41 PM aditya desai wrote:
> >
> >> Hi,
> >> One of our APIs calls the three queries below. The output (job ids) of
> >> the 1st query is fed to the 2nd query in the API.
> >>
> >> Query 1:
> >>
> >> select j.id from job j
> >> where $19=$20
> >>   and j.internaljobcode in ($21,$22,$23,$24,$25,$26,$27,$28,$29,$30,$31)
> >>   and j.countrycode = $1 and j.facilitycode in ($2)
> >>   and j.jobstatuscode in ($3, $4, $5, $6)
> >>   and j.internaljobcode in ($7, $8, $9, $10, $11, $12, $13, $14, $15)
> >>   and ((j.jobstartdatetime between $16 and $17) or j.jobstartdatetime IS NULL)
> >> ORDER BY createddate DESC limit $18
> >>
> >> Query 2:
> >>
> >> with JobData AS (
> >>     select * from job
> >>     where id in ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
> >> )
> >> select j.id, j.jobcategory, j.internaljobcode, j.jobname,
> >>     j.jobstatuscode, j.jobreferencenumber, vws.vascodes, j.createddate,
> >>     j.facilitycode, j.countrycode, j.sladate, j.codamount,
> >>     j.jobstartdatetime, j.jobenddatetime, j.attemptcount,
> >>     j.primeindicator, j.rescheduleddatetime, j.jobproductcode,
> >>     j.tour_id, j.pickupaccount, j.connoterequired, j.expectedbags,
> >>     j.expectedparcels, j.isservicepointpickup,
> >>     j.estimateddeliverydatetime, j.currency, j.paymentmethod,
> >>     j.paymentamount, j.missorted, j.pickupcustomername, j.mps,
> >>     j.parcelcount, j.jobcontactpersonname, t.courier_id, t.tourid,
> >>     js.jobstatusname, jt.externaljobcode, ja.city, ja.postalcode,
> >>     ja.addressline1, ja.addressline2, ja.addressline3, ja.addresstype,
> >>     ja.state
> >> from JobData j
> >> join jobaddress ja on ja.job_id = j.id
> >> join jobstatus js on js.jobstatuscode = j.jobstatuscode
> >> join jobtype jt on j.internaljobcode = jt.internaljobcode
> >> left join (
> >>     select v.job_id, string_agg(distinct(v.code), 'PPOD') as vascodes
> >>     from JobData j join valueaddedservices v on j.id = v.job_id
> >>     group by v.job_id
> >> ) AS vws on vws.job_id = j.id
> >> left join tour t on t.id = j.tour_id
> >>     and ((j.internaljobcode in ('003','012') and ja.addresstype = 'RETURN')
> >>          or j.internaljobcode not in ('003','012'))
> >> ORDER BY id DESC;
> >>
> >> Query 3:
> >>
> >> with JobCount as (
> >>     select jobstatuscode, count($14) stat_count
> >>     from job j
> >>     where $15=$16 and j.countrycode = $1 and j.facilitycode in ($2)
> >>       and j.internaljobcode in ($3, $4, $5, $6, $7, $8, $9, $10, $11)
> >>       and ((j.jobstartdatetime between $12 and $13)
> >>            or j.jobstartdatetime IS NULL)
> >>     group by j.jobstatuscode
> >> )
> >> select js.jobstatuscode, COALESCE(stat_count, $17) stat_count
> >> from JobCount jc
> >> right outer join jobstatus js on jc.jobstatuscode = js.jobstatuscode
> >>
> >>
> >> When I run explain analyze for the first two queries, Execution Time
> >> is below 1 millisecond; basically, the queries run fast and at low
> >> cost when run from psql or pgAdmin. However, when they are called from
> >> the API, Average Time in pg_stat_statements shows more than 1 second.
> >> When the load test runs, these queries get called concurrently, and
> >> response time becomes poor as load increases. Could this be due to
> >> locking?
> >>
> >>
> >> I checked pg_locks and I see the records below; the query I used is
> >> also given below. I could see a few ExclusiveLocks for "virtualxid"
> >> records and for queries with CTEs (WITH clause). Please advise.
> >>
> >>
> >>
> >> SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa
> >> ON pl.pid = psa.pid;
You would find that the "granted" column in all the rows from pg_locks is
"true", that is, no one is waiting on a lock. That slowdown doesn't seem
to be coming from lock conflicts, at least.
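A minimal sketch for checking just that (note that every backend always
holds an ExclusiveLock on its own virtualxid, so those rows by themselves
are normal):

-- Show only lock requests that are actually waiting:
SELECT pl.pid, pl.locktype, pl.mode, psa.query
FROM pg_locks pl
JOIN pg_stat_activity psa ON pl.pid = psa.pid
WHERE NOT pl.granted;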
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
