Alter the column data type of a large table
Hi all,

I have a table with 1.6 billion records. The data type of the primary key column was incorrectly defined as integer, and I need to change the column type to bigint. Are there any ideas for this?

Solutions that have been tried:
Adding a temporary column was too time-consuming, so I gave up.
Using a temporary table: there is no good way to migrate the original table's data to the temporary table.

Thanks in advance.
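For context, a quick way to see how close an integer key is to its 2,147,483,647 limit; the table name "accounts" and column "id" are placeholders for illustration only, not names from this thread:

    SELECT max(id)              AS current_max,
           2147483647 - max(id) AS ids_remaining  -- headroom left before integer overflow
    FROM accounts;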
Re: Alter the column data type of a large table
What I mean is that it will take a lot of time to complete all the data processing. I have to say that it would be a good solution to adjust the column type without affecting online users.

I found a tool on GitHub, see the link below; unfortunately, it is for MySQL...

https://github.com/github/gh-ost

Regards.

Michael Lewis wrote on Friday, December 4, 2020 at 1:04 PM:

> Please do reply all.
>
> Not sure what you mean about it taking too much time. It's rewriting a
> bunch of data. It's going to take a while. The best you can do is break
> the work up into small pieces and commit each piece.
>
> On Thu, Dec 3, 2020, 7:11 PM charles meng wrote:
>
>> Thanks for your help. I think the first method I tried (adding a
>> temporary column) is similar to what you said, but it takes too much
>> time for me.
>>
>> Thanks again.
>>
>> Michael Lewis wrote on Friday, December 4, 2020 at 1:11 AM:
>>
>>> On Wed, Dec 2, 2020 at 11:53 PM charles meng wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have a table with 1.6 billion records. The data type of the primary
>>>> key column was incorrectly defined as integer, and I need to change
>>>> the column type to bigint. Are there any ideas for this?
>>>>
>>>> Solutions that have been tried:
>>>> Adding a temporary column was too time-consuming, so I gave up.
>>>> Using a temporary table: there is no good way to migrate the original
>>>> table's data to the temporary table.
>>>>
>>>> Thanks in advance.
>>>
>>> You can add a new column with no default value (NULL as the default)
>>> and it will be very fast. Then you can gradually update rows in batches
>>> (on PG11+, perhaps use a DO script with a loop that commits after every
>>> X rows) to set the new column equal to the primary key. Lastly, in a
>>> single transaction, update any remaining rows where the bigint column
>>> is null, change which column is the primary key, and drop the old one.
>>> This keeps each transaction reasonably sized so it does not hold up
>>> other processes.
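A rough sketch of the batched approach Michael describes, assuming PG11+ (where a DO block run outside an explicit transaction may contain COMMIT) and illustrative names only: an "accounts" table with integer key "id", new column "id_big", and default constraint name "accounts_pkey":

    -- Add the new column with no default: a fast, catalog-only change.
    ALTER TABLE accounts ADD COLUMN id_big bigint;

    -- Backfill in batches, committing after each one so transactions stay small.
    DO $$
    DECLARE
        batch_rows integer;
    BEGIN
        LOOP
            UPDATE accounts
               SET id_big = id
             WHERE id IN (SELECT id
                            FROM accounts
                           WHERE id_big IS NULL
                           LIMIT 50000);
            GET DIAGNOSTICS batch_rows = ROW_COUNT;
            EXIT WHEN batch_rows = 0;
            COMMIT;  -- transaction control inside DO blocks needs PG11+
        END LOOP;
    END
    $$;

    -- Final cut-over in one short transaction.
    BEGIN;
    UPDATE accounts SET id_big = id WHERE id_big IS NULL;
    ALTER TABLE accounts DROP CONSTRAINT accounts_pkey;
    ALTER TABLE accounts DROP COLUMN id;
    ALTER TABLE accounts RENAME COLUMN id_big TO id;
    ALTER TABLE accounts ADD PRIMARY KEY (id);  -- builds a new index; may take a while on 1.6B rows
    COMMIT;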
Re: Alter the column data type of a large table
Hi Olivier,

My PG version is 10.

Anyway, thanks a lot for your help.

Best regards.

Olivier Gautherot wrote on Friday, December 4, 2020 at 6:14 PM:

> Hi Charles,
>
> On 04-12-2020 9:44, Olivier Gautherot wrote:
>
>> Hi Charles,
>>
>> On Fri, Dec 4, 2020 at 9:12 AM charles meng wrote:
>>
>>> What I mean is that it will take a lot of time to complete all the data
>>> processing. I have to say that it would be a good solution to adjust
>>> the column type without affecting online users.
>>>
>>> I found a tool on GitHub, see the link below; unfortunately, it is for
>>> MySQL...
>>>
>>> https://github.com/github/gh-ost
>>
>> MySQL has its own strategy with regard to column handling, so what works
>> there does not necessarily fit here.
>>
>> There are some good ideas in this thread, but I would start with a few
>> checks:
>>
>> 1) What version of PG are you using?
>>
>> 2) If you can, try adding a column typed bigint on your current setup
>> (not necessarily in production) - if the version is recent enough it
>> will be a simple catalog update. Michael's suggestion is viable.
>>
>> 3) Given the massive number of rows, I would update as suggested,
>> progressively in batches of a few tens of thousands. Make sure you
>> commit and vacuum after each batch to reclaim the space (or you may end
>> up with a storage space issue on top of everything else). In the
>> meantime, add a trigger to set the new column to the key value. Once the
>> new column is fully populated, drop the old column and set the new one
>> as the primary key (it may take a while to rebuild the index).
>>
>> 4) If your table is still growing, I would definitely look into
>> partitioning, as it can hardly make things worse.
>>
>> 5) If you're brave enough, convert your current table into a partition
>> (rename it to something like table_hist), duplicate the table definition
>> under the current name (adjusting the primary key type), and set
>> INHERITS on the primary key range. The inheritance should take care of
>> the type conversion (I haven't tried it, but it's worth a try). If it
>> works, you will reach your goal without downtime or significant
>> overhead.
>
> Sorry, just tried this one and it failed: type mismatch.
>
> Cheers
> --
> Olivier Gautherot
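The trigger Olivier mentions in point 3 might look roughly like this; again the names ("accounts", "id", "id_big") are placeholders. On PG10 the syntax is EXECUTE PROCEDURE (EXECUTE FUNCTION only arrived in PG11):

    -- Keep the new bigint column in sync for rows written while the backfill runs.
    CREATE OR REPLACE FUNCTION mirror_id_to_bigint() RETURNS trigger AS $$
    BEGIN
        NEW.id_big := NEW.id;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER accounts_mirror_id
        BEFORE INSERT OR UPDATE ON accounts
        FOR EACH ROW
        EXECUTE PROCEDURE mirror_id_to_bigint();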
Re: Alter the column data type of a large table
Hi Kevin,

This sounds like a good idea. I will work on it and let you know the result.

Most appreciated.

Kevin Brannen wrote on Saturday, December 5, 2020 at 12:04 AM:

> From: Olivier Gautherot
>
>>> 5) If you're brave enough, convert your current table into a partition
>>> (rename it to something like table_hist), duplicate the table
>>> definition under the current name (adjusting the primary key type), and
>>> set INHERITS on the primary key range. The inheritance should take care
>>> of the type conversion (I haven't tried it, but it's worth a try). If
>>> it works, you will reach your goal without downtime or significant
>>> overhead.
>>
>> Sorry, just tried this one and it failed: type mismatch.
>
> Seems like a sound idea in general. I'd probably rename the tables; let's
> call them "big_hist" for the old big table and "big_split" for the new
> partitioned table that will be used going forward - assuming the original
> table was called "big". Then create a view that looks at both of those
> but is named the same as the old table, and let the view cast the old key
> (big_hist.id::bigint) so it matches the new type, because the view will
> probably be a union and the types need to match. That way your
> application only has to pause long enough to run a few meta-commands,
> then everything can resume, and as Olivier pointed out, you can fix the
> data by moving it from big_hist to big_split in the background as you
> have time.
>
> I'd probably put it all in a transaction too:
>
> Create table ...  -- all the commands to create your partitioned table big_split here
> Begin;
> Alter table big rename to big_hist;
> Create view big as select * from big_split union select id::bigint /* , other cols */ from big_hist;
> Commit;
>
> Try it on a dev system, and if it works you're off and running. I'd
> expect the view to slow things down a little, but probably not too much
> if you have good indexes. But at least you could transition without major
> downtime, and then rename "big_split" back to "big" and drop "big_hist"
> when you've finished the transition. I might even be tempted to add a
> trigger so that all new inserts into "big" really go into "big_split", so
> "big_hist" doesn't grow any more. Your imagination is probably the
> limit. 😊
>
> HTH,
> Kevin
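A slightly fleshed-out version of Kevin's sketch, including the insert-redirect trigger he mentions at the end. The column list is reduced to just the key, the names follow his examples ("big", "big_hist", "big_split"), and UNION ALL is used instead of UNION on the assumption that any given row lives in only one of the two tables; a real table would carry its remaining columns through each statement:

    -- New table with a bigint key (create its partitions as needed before the cut-over).
    CREATE TABLE big_split (id bigint PRIMARY KEY /* , other cols */);

    BEGIN;
    ALTER TABLE big RENAME TO big_hist;
    CREATE VIEW big AS
        SELECT id /* , other cols */ FROM big_split
        UNION ALL
        SELECT id::bigint /* , other cols */ FROM big_hist;
    COMMIT;

    -- Route new inserts aimed at "big" into big_split, so big_hist stops growing.
    CREATE FUNCTION big_insert_redirect() RETURNS trigger AS $$
    BEGIN
        INSERT INTO big_split VALUES (NEW.*);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER big_insert
        INSTEAD OF INSERT ON big
        FOR EACH ROW EXECUTE PROCEDURE big_insert_redirect();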