Alter the column data type of the large data volume table.

2020-12-02 Thread charles meng
Hi all,

I have a table with 1.6 billion records. The primary key column was
mistakenly created as integer, and I need to change its type to bigint.
Are there any ideas for this?

Solutions I have tried:
Adding a temporary column was too time-consuming, so I gave up.
With a temporary table, there is no good way to migrate the original
table's data into it.

Thanks in advance.


Re: Alter the column data type of the large data volume table.

2020-12-04 Thread charles meng
What I mean is that it will take a lot of time to complete all of the data
processing. I have to say that a solution that adjusts the column type
without affecting online users would be a good one.

I found a tool on GitHub (see the link below); unfortunately, it is for
MySQL...

https://github.com/github/gh-ost

Regards.

Michael Lewis wrote on Fri, Dec 4, 2020 at 1:04 PM:

> Please do reply all.
>
> Not sure what you mean about it taking too much time. It's rewriting a
> bunch of data. It's going to take a while. The best you can do is break the
> work up into small pieces and commit each piece.
>
> On Thu, Dec 3, 2020, 7:11 PM charles meng  wrote:
>
>> Thanks for your help. I think the first method I tried (adding a
>> temporary column) is similar to what you said, but it takes too much
>> time for me.
>>
>>
>> Thanks again.
>>
>> Michael Lewis wrote on Fri, Dec 4, 2020 at 1:11 AM:
>>
>>> On Wed, Dec 2, 2020 at 11:53 PM charles meng  wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have a table with 1.6 billion records. The primary key column was
>>>> mistakenly created as integer, and I need to change its type to
>>>> bigint. Are there any ideas for this?
>>>>
>>>> Solutions I have tried:
>>>> Adding a temporary column was too time-consuming, so I gave up.
>>>> With a temporary table, there is no good way to migrate the original
>>>> table's data into it.
>>>>
>>>> Thanks in advance.
>>>>
>>>
>>> You can add a new column with NO default value (so it is simply null)
>>> and the operation will be very fast. Then you can gradually update rows
>>> in batches (if on PG11+, perhaps use a DO script with a loop that
>>> commits after every X rows) to set the new column equal to the primary
>>> key. Lastly, in one transaction, update any new rows where the bigint
>>> column is still null, change which column is the primary key, and drop
>>> the old one. This keeps each transaction reasonably sized so it does
>>> not hold up other processes.
>>>
>>
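
A minimal sketch of the batched backfill Michael describes above, assuming
the table is named "big" with an integer primary key "id" (both names are
placeholders) and that ids are positive; COMMIT inside a DO block requires
PG11+:

ALTER TABLE big ADD COLUMN id_new bigint;  -- no default: fast, catalog-only change

DO $$
DECLARE
  batch_size constant bigint := 50000;  -- tune to your workload
  max_id bigint;
  lo bigint := 0;
BEGIN
  SELECT max(id) INTO max_id FROM big;
  WHILE lo < max_id LOOP
    -- copy the key into the new column, one slice at a time
    UPDATE big SET id_new = id
    WHERE id > lo AND id <= lo + batch_size AND id_new IS NULL;
    lo := lo + batch_size;
    COMMIT;  -- PG11+: transaction control is allowed in DO blocks
  END LOOP;
END $$;

The final swap (backfilling any stragglers, promoting id_new to primary
key, dropping the old column) still happens in one short transaction, as
Michael notes.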


Re: Alter the column data type of the large data volume table.

2020-12-04 Thread charles meng
Hi Olivier,

My PG version is 10.
Anyway, thanks a lot for your help.

Best regards.

Olivier Gautherot wrote on Fri, Dec 4, 2020 at 6:14 PM:

> Hi Charles,
> On 04-12-2020 9:44, Olivier Gautherot wrote:
>
> Hi Charles,
>
> On Fri, Dec 4, 2020 at 9:12 AM charles meng  wrote:
>
>> What I mean is that it will take a lot of time to complete all of the
>> data processing. I have to say that a solution that adjusts the column
>> type without affecting online users would be a good one.
>>
>> I found a tool on GitHub (see the link below); unfortunately, it is for
>> MySQL...
>>
>> https://github.com/github/gh-ost
>>
>
> MySQL has its own strategy with regard to column handling so what works
> there does not necessarily fit here.
>
> There are some good ideas in this thread but I would start with a few
> checks:
>
> 1) What version of PG are you using?
>
> 2) On your current setup (not necessarily in Production), try adding a
> column typed bigint - if your version is recent enough it will be a simple
> catalog update. Michael's suggestion is viable.
>
> 3) Given the massive number of rows, I would update as suggested,
> progressively in batches of a few tens of thousands. Make sure you commit
> and vacuum after each batch to reclaim the space (or you may end up with a
> storage space issue on top of everything else). In the meantime, add a
> trigger to set the new column to the primary key value (a sketch follows
> this message). Once the new column is fully populated, drop the old column
> and set the new one as primary key (it may take a while to rebuild the
> index).
>
> 4) If your table is still growing, I would definitely look into
> partitioning as it can hardly make things worse.
>
> 5) If you're brave enough, convert your current table into a partition
> (rename it to something like table_hist), duplicate the table model under
> the same name as now (adjusting the primary key type) and set the INHERITS
> on the primary key range. The inheritance should take care of the type
> conversion (I haven't tried it but it's worth a try). If it works, you
> will reach your goal without downtime or significant overhead.
>
> Sorry, just tried this one and it failed: type mismatch.
>
> Cheers
> --
> Olivier Gautherot
>
>
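
A minimal sketch of the trigger from step 3, keeping the new column
populated for rows written while the backfill runs (table and column names
are placeholders; note that PG10 spells it EXECUTE PROCEDURE, not EXECUTE
FUNCTION):

CREATE FUNCTION big_sync_id_new() RETURNS trigger AS $$
BEGIN
  NEW.id_new := NEW.id;  -- mirror the old integer key into the new bigint column
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER big_sync_id_new
  BEFORE INSERT OR UPDATE ON big
  FOR EACH ROW EXECUTE PROCEDURE big_sync_id_new();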


Re: Alter the column data type of the large data volume table.

2020-12-05 Thread charles meng
Hi Kevin,

This sounds like a good idea; I will work on it and let you know the
result.

Most appreciated.

Kevin Brannen wrote on Sat, Dec 5, 2020 at 12:04 AM:

> From: Olivier Gautherot
>
> >> 5) If you're brave enough, convert your current table into a partition
> >> (rename it to something like table_hist), duplicate the table model
> >> under the same name as now (adjusting the primary key type) and set the
> >> INHERITS on the primary key range. The inheritance should take care of
> >> the type conversion (I haven't tried it but it's worth a try). If it
> >> works, you will reach your goal without downtime or significant
> >> overhead.
>
> >Sorry, just tried this one and it failed: type mismatch.
>
>
>
> Seems like a sound idea in general. I’d probably rename the tables: let’s
> call them “big_hist” for the old big table and “big_split” for the new
> partitioned table that will be used going forward – assuming the original
> table was called “big”. Then create a view that looks at both of those but
> is named the same as the old table, and let the view do a type cast on the
> old key, like big_hist.id::bigint, so it matches the new type, because the
> view will probably be a union and the types need to match. That way your
> application only has to pause long enough to do a few meta-commands and
> then everything can resume, and as Olivier pointed out, you can fix the
> data by moving it from big_hist to big_split in the background as you have
> time.
>
>
>
> I’d probably put it all in a transaction too:
>
>
>
> Create table … -- all the commands to create your partitioned table
> big_split here
>
> Begin;
>
> Alter table big rename to big_hist;
>
> Create view big as select * from big_split union select id::bigint,
> /* other cols */ from big_hist;
>
> Commit;
>
>
>
> Try it on a dev system and if it works you’re off and running. I’d expect
> the view to slow things down a little, but probably not too much if you
> have good indexes. But at least you could transition without major downtime
> and then rename “big_split” back to “big” and drop “big_hist” when you’ve
> finished the transition. I might even be tempted to add a trigger so that
> all new inserts into “big” really go into “big_split” so “big_hist” doesn’t
> grow any more. Your imagination is probably the limit. 😊
>
>
>
> HTH,
>
> Kevin
>
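
A minimal sketch of the routing trigger Kevin mentions: once "big" is a
view, plain inserts into it need an INSTEAD OF trigger to land the rows in
"big_split" (names follow Kevin's example; this is an untested sketch):

CREATE FUNCTION big_route_insert() RETURNS trigger AS $$
BEGIN
  INSERT INTO big_split VALUES (NEW.*);  -- all new rows go to the new table
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER big_insert
  INSTEAD OF INSERT ON big
  FOR EACH ROW EXECUTE PROCEDURE big_route_insert();

With this in place, big_hist stops growing and can be drained into
big_split in the background.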