Creating A GIN index on JSONB column (large database)

2022-08-09 Thread Taylor Smith
Hi all,

I have a database that is in excess of 20TB in size, partitioned by date on
a month to month basis.

There is a column within that stores text (validated to be json but not
stored as JSONB). I have a requirement to make the json searchable using
GIN indexes. however when we try to create the GIN index on AWS RDS our
temp storage maxes out which crashes the create process resulting in
unhealthy created indexes.

This might be an AWS issue but we are wondering the reason why creating
these indexes (concurrently) would cause the temp storage to blow up and
crash out and is it possible to mitigate or a necessary evil.

The only solution we have so far is scale up our RDS instance.

Kind regards,

Taylor Smith


Re: Creating A GIN index on JSONB column (large database)

2022-08-09 Thread Rob Sargent



> On Aug 9, 2022, at 7:04 AM, Taylor Smith  wrote:
> 
> Hi all,
> 
> I have a database that is in excess of 20TB in size, partitioned by date on a 
> month to month basis.
> 
> There is a column within that stores text (validated to be json but not 
> stored as JSONB). I have a requirement to make the json searchable using GIN 
> indexes. however when we try to create the GIN index on AWS RDS our temp 
> storage maxes out which crashes the create process resulting in unhealthy 
> created indexes. 
> 
> This might be an AWS issue but we are wondering the reason why creating these 
> indexes (concurrently) would cause the temp storage to blow up and crash out 
> and is it possible to mitigate or a necessary evil. 
> 
> The only solution we have so far is scale up our RDS instance.
> 
> Kind regards,
> 
> Taylor Smith

The scale-up might be needed only for the create index.  Scale down afterwards?



Re: Creating A GIN index on JSONB column (large database)

2022-08-09 Thread Taylor Smith
Thanks Rob,

This is what I was thinking. Would you say it presents any risk then
having a database unable to rebuild its own indexes if needed?




On Tue, Aug 9, 2022 at 3:39 PM Rob Sargent  wrote:

>
>
> > On Aug 9, 2022, at 7:04 AM, Taylor Smith 
> wrote:
> >
> > Hi all,
> >
> > I have a database that is in excess of 20TB in size, partitioned by date
> on a month to month basis.
> >
> > There is a column within that stores text (validated to be json but not
> stored as JSONB). I have a requirement to make the json searchable using
> GIN indexes. however when we try to create the GIN index on AWS RDS our
> temp storage maxes out which crashes the create process resulting in
> unhealthy created indexes.
> >
> > This might be an AWS issue but we are wondering the reason why creating
> these indexes (concurrently) would cause the temp storage to blow up and
> crash out and is it possible to mitigate or a necessary evil.
> >
> > The only solution we have so far is scale up our RDS instance.
> >
> > Kind regards,
> >
> > Taylor Smith
>
> The scale-up might be needed only for the create index.  Scale down
> afterwards?


Re: Creating A GIN index on JSONB column (large database)

2022-08-09 Thread Rob Sargent



> On Aug 9, 2022, at 7:45 AM, Taylor Smith  wrote:
> 
> Thanks Rob,
> 
> This is what I was thinking. Would you say it presents any risk then having a 
> database unable to rebuild its own indexes if needed?
> 
> 

No I would not.  The scale-up + rebuild should tell you how much you actually 
need and if you’ve over done the scale up you can scale down to what you need.  
Was the original spec expecting 20+T?  Any chance of rolling older months off 
to some other node?





Re: Creating A GIN index on JSONB column (large database)

2022-08-09 Thread Taylor Smith
Yes, the database was always going to be a minimum of 20TB however we build
the indexes on the partition which at max is about 1.3TB, which in temp
files will use about 800GB temporary files. Note it doesn't have the same
effect when doing non-concurrently. I don't think rolling off would help as
its likely largest partition will either be the current month or the
previous month so the problem would still persist.



On Tue, Aug 9, 2022 at 3:50 PM Rob Sargent  wrote:

>
>
> > On Aug 9, 2022, at 7:45 AM, Taylor Smith 
> wrote:
> >
> > Thanks Rob,
> >
> > This is what I was thinking. Would you say it presents any risk then
> having a database unable to rebuild its own indexes if needed?
> >
> >
>
> No I would not.  The scale-up + rebuild should tell you how much you
> actually need and if you’ve over done the scale up you can scale down to
> what you need.  Was the original spec expecting 20+T?  Any chance of
> rolling older months off to some other node?
>
>


Re: Creating A GIN index on JSONB column (large database)

2022-08-09 Thread Ron

On 8/9/22 09:39, Rob Sargent wrote:



On Aug 9, 2022, at 7:04 AM, Taylor Smith  wrote:

Hi all,

I have a database that is in excess of 20TB in size, partitioned by date on a 
month to month basis.

There is a column within that stores text (validated to be json but not stored 
as JSONB). I have a requirement to make the json searchable using GIN indexes. 
however when we try to create the GIN index on AWS RDS our temp storage maxes 
out which crashes the create process resulting in unhealthy created indexes.

This might be an AWS issue but we are wondering the reason why creating these 
indexes (concurrently) would cause the temp storage to blow up and crash out 
and is it possible to mitigate or a necessary evil.

The only solution we have so far is scale up our RDS instance.

Kind regards,

Taylor Smith

The scale-up might be needed only for the create index.  Scale down afterwards?


Scaling down disks means using DMS (which costs money) to migrate the 
instance to a new instance.


--
Angular momentum makes the world go 'round.




Re: Creating A GIN index on JSONB column (large database)

2022-08-09 Thread Ron

800GB/20TB = 4%.

You've got a 20TB database, and run out of space when *4% is added*.

Either you're running way to close to the edge, or I misunderstand something.

Either way, enable auto-scaling. 
https://aws.amazon.com/about-aws/whats-new/2019/06/rds-storage-auto-scaling/



On 8/9/22 10:29, Taylor Smith wrote:
Yes, the database was always going to be a minimum of 20TB however we 
build the indexes on the partition which at max is about 1.3TB, which in 
temp files will use about 800GB temporary files. Note it doesn't have the 
same effect when doing non-concurrently. I don't think rolling off would 
help as its likely largest partition will either be the current month or 
the previous month so the problem would still persist.




On Tue, Aug 9, 2022 at 3:50 PM Rob Sargent  wrote:



> On Aug 9, 2022, at 7:45 AM, Taylor Smith 
wrote:
>
> Thanks Rob,
>
> This is what I was thinking. Would you say it presents any risk then
having a database unable to rebuild its own indexes if needed?
>
>

No I would not.  The scale-up + rebuild should tell you how much you
actually need and if you’ve over done the scale up you can scale down
to what you need.  Was the original spec expecting 20+T?  Any chance
of rolling older months off to some other node?



--
Angular momentum makes the world go 'round.

index row size 2720 exceeds btree version 4

2022-08-09 Thread Daulat
Hello Team,

We are facing an error while uploading data in a table that has two B-tree
indexes. As per the Postgres documentation there is a limitation of b-tree
index size but I need to know if there is any alternative to overcome this
issue.


Error:

" index row size 2720 exceeds btree version 4 maximum 2704 for index
""uk_gvoa_gi_ad"" 54000"
uk_gvoa_gi_ad

Index:

UNIQUE INDEX pk_gvoa_id ON test.groupedvuln_asset USING btree
(groupedvuln_orphanasset_id)"

UNIQUE INDEX uk_gvoa_gi_ad ON test.groupedvuln_asset USING btree (group_id,
hostip, macaddress, fqdn, scanid)"

Thanks