multiple tables got corrupted

2020-09-15 Thread Vasu Madhineni
Hi All,

In one of my Postgres databases, multiple tables got corrupted. I followed
the steps below, but still get the same error:

1. SET zero_damaged_pages = on
2. VACUUM ANALYZE, then VACUUM FULL

moh_fa=# VACUUM FULL;
ERROR:  could not read block 9350 in file "base/1156523/1270812":
Input/output error

I tried to take a backup of the tables with pg_dump, but hit the same error.
The files exist physically in the base directory.

How should I proceed? There is no backup to restore from.

Thanks in advance

Regards,
Vasu Madhineni


Re: multiple tables got corrupted

2020-09-15 Thread Magnus Hagander
On Tue, Sep 15, 2020 at 11:15 AM Vasu Madhineni wrote:

> Hi All,
>
> In one of my Postgres databases, multiple tables got corrupted. I followed
> the steps below, but still get the same error:
>
> 1. SET zero_damaged_pages = on
> 2. VACUUM ANALYZE, then VACUUM FULL
>


That is a very destructive first attempt. I hope you took a full disk-level
backup of the database before you did that, as it can ruin your chances for
forensics and data recovery for other issues.


> moh_fa=# VACUUM FULL;
> ERROR:  could not read block 9350 in file "base/1156523/1270812":
> Input/output error
>
> I tried to take a backup of the tables with pg_dump, but hit the same error.
> The files exist physically in the base directory.
>
> How should I proceed? There is no backup to restore from.
>
>
This is clearly some sort of disk error, and with no backups to restore you
will definitely be losing data.

I'd start by figuring out which tables are not corrupted and still work, and
back those up (with pg_dump, for example) as soon as possible to a different
machine -- since it's not exactly unlikely that further disk errors will
appear.

Once you've done that, identify the damaged tables and try to do partial
recovery. For example, look at the file 1270812: how big is it? PostgreSQL
is failing to read block 9350, which starts 76595200 bytes (9350 * 8192)
into the file. If that is at the very end of the file, you can try to get
the data out up to that point with LIMIT. If it's in the middle of the
file, it gets trickier, but a similar approach can be used.
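
As a sketch of that first check (the catalog functions are standard; 1156523
in the error path is the database's OID, so run this connected to that
database), something like the following maps the file back to a table and
shows how many blocks it has:

    moh_fa=# SELECT relname, pg_relation_size(oid) / current_setting('block_size')::int AS blocks
             FROM pg_class WHERE pg_relation_filenode(oid) = 1270812;

If the block count comes out only slightly above 9350, the damage is near the
end of the table and a LIMIT-based dump should recover most of it.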

Also, unless you are running with data checksums enabled, I wouldn't fully
trust the data in the tables that you *can* read either. Since you clearly
have disk issues, they may have caused corruption elsewhere as well, so
whatever verification you can do against other tables, you should do as
well.
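
For instance, whether checksums are on is just a SHOW, and a client-side
\copy of a table forces every block of it to be read (the table name and
output path below are only placeholders):

    moh_fa=# SHOW data_checksums;
    moh_fa=# \copy some_other_table to '/tmp/some_other_table.out'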


You'll of course also want to check any kernel logs or storage system logs
to see if they can give you a hint as to what happened, but they are
unlikely to actually give you something that will help you fix the problem.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: multiple tables got corrupted

2020-09-15 Thread Vasu Madhineni
Is it possible to identify which rows are corrupted in a particular table?



Re: How to calculate shared memory size

2020-09-15 Thread Tom Lane
Rébeli-Szabó Tamás writes:
> I want to do a proper calculation of the number of huge pages needed for 
> PostgreSQL to start.
> For that I need to know how much shared memory will be allocated at 
> startup.

TBH, the only method that's not going to break over time is
"start the server with the parameters you want, and look to
see how big the shmem segment is".

We used to have an approximate formula in the documentation,
but we took it out years ago because it was constantly out of date.

regards, tom lane




Re: multiple tables got corrupted

2020-09-15 Thread Magnus Hagander
Try reading the table "row by row" until it breaks. That is, SELECT * FROM ...
LIMIT 1, then LIMIT 2, etc. To make it faster, use a binary search, starting
from what seems like a reasonable place given the size of the table versus
the first failed block; the principle is the same. Once a query fails,
you've found a corrupt block...
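
A rough sketch of that search, with a made-up table name and row counts; each
query stops reading as soon as its LIMIT is satisfied, so it only fails once
the scan reaches a bad block:

    moh_fa=# SELECT count(*) FROM (SELECT * FROM broken_table LIMIT 400000) s;  -- ok
    moh_fa=# SELECT count(*) FROM (SELECT * FROM broken_table LIMIT 800000) s;  -- ERROR: could not read block ...
    moh_fa=# SELECT count(*) FROM (SELECT * FROM broken_table LIMIT 600000) s;  -- keep halving the interval
    moh_fa=# \copy (SELECT * FROM broken_table LIMIT 600000) to '/tmp/broken_table.partial'

The last line salvages the rows in front of the corrupt block once you know
the largest LIMIT that still works.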

//Magnus


On Tue, Sep 15, 2020 at 12:46 PM Vasu Madhineni wrote:

> Is it possible to identify which rows are corrupted in a particular table?



How to calculate shared memory size

2020-09-15 Thread Rébeli-Szabó Tamás

Hello all,

I want to do a proper calculation of the number of huge pages needed for 
PostgreSQL to start.


For that I need to know how much shared memory will be allocated at 
startup.


Looking at the shared memory calculation in the source (ipci.c), I can 
see that two terms stand out:


  size = add_size(size, BufferShmemSize());   ---> shared_buffers
  size = add_size(size, XLOGShmemSize());     ---> 1/32 * shared_buffers, capped at one WAL segment


However, if I only use those two terms (shared buffers + WAL segment 
size) for my calculation, the result will be off by a lot, especially 
with bigger shared buffers (e.g. 32 GB).


What other major terms (on the order of 10 megabytes or more) do I need to
take into account to get a better result?
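
For reference, the settings that drive those two terms can be read at runtime,
which at least pins down their values on a given server (a sketch only;
BufferShmemSize() scales with shared_buffers, and XLOGShmemSize() with
wal_buffers, which by default is shared_buffers/32 capped at one WAL segment):

    SELECT name, setting, unit
      FROM pg_settings
     WHERE name IN ('shared_buffers', 'wal_buffers', 'block_size', 'wal_segment_size');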


Regards,

tamas




Problems with MemoryContextSwitchTo ()

2020-09-15 Thread Yessica Brinkmann
Hello.
I think several of you will already remember me; I'm the one working on the
IndexAdviser topic, only I have changed my email address.
As you may recall, I am doing my thesis on modifications to the IndexAdviser,
and I really appreciate the help you have given me in the various PostgreSQL
groups.
I was nearing the end of the programming part of the thesis when I ran into
a problem compiling my program, and in the process some lines of source code
were accidentally moved around.
Because of that, I think I have problems with the memory context switch
again: at some point the context switch stopped working for me, I believe
because of those moved lines.
The situation is that I have a function called get_columnnames which, in its
second foreach, prints null for the values of idxcd->varattnnames[i].
I only added that second foreach to test whether the data is really being
saved and whether I can read it back properly.
Since the data is not retrieved (or not saved) properly, the value of
idxcd->varattnnames[i] also shows up as null in the subsequent functions of
my program.
I would really appreciate your help in finding out why the function prints
null for the values of idxcd->varattnnames[i] in the second foreach --
whether it is due to an error in the memory context switch, or something
else.

I am sending my function as an attachment.


Best regards,
Yessica Brinkmann.
static List*
get_columnnames( List* candidates )
{
    int             proc;
    int             ret;
    StringInfoData  query;          /* string for the query text */
    StringInfoData  cols;           /* string for the column predicates */
    MemoryContext   outerContext;
    ListCell        *cell;
    IndexCandidate  *idxcd;

    elog( DEBUG3, "IND ADV: get_column_names: ENTER" );

    initStringInfo( &query );
    initStringInfo( &cols );

    foreach( cell, candidates )     /* foreach cell in candidates */
    {
        int i;

        idxcd = (IndexCandidate*) lfirst( cell );

        if( idxcd == NULL )
        {
            elog( INFO, "idxcd IS NULL" );
            continue;   /* Or is that fatal enough to break instead? */
        }

        if( !idxcd->idxused )
            continue;

        /*
         * Reset the strings left over from the previous candidate.
         * resetStringInfo() empties the string but keeps its buffer, so
         * there is no need to look at the StringInfo internals here.
         */
        resetStringInfo( &cols );
        resetStringInfo( &query );

        elog( INFO, "reloid:%d", idxcd->reloid );

        appendStringInfo( &query,
                          "select a.attname from pg_class c, pg_attribute a"
                          " where c.oid = %d AND a.attrelid = c.oid AND (",
                          idxcd->reloid );

        /* build "a.attnum=N OR a.attnum=M ..." for every column of the candidate */
        for( i = 0; i < idxcd->ncols; ++i )
        {
            appendStringInfo( &cols, "%s a.attnum=%d",
                              (i > 0 ? " OR" : ""), idxcd->varattno[i] );
            elog( INFO, "varattno i:%d", idxcd->varattno[i] );
        } /* foreach col in varattno */

        appendStringInfoString( &cols, ")" );

        /* FIXME: Mention the column names explicitly after the table name. */
        appendStringInfo( &query, "%s;", cols.data );

        elog( INFO, "QUERY:%s", query.data );

        if( query.len > 0 ) /* if we generated any SQL */
        {
            /*
             * Remember the caller's memory context; presumably the query
             * results are copied back into it further down, once the SPI
             * context has been entered.
             */
            outerContext = CurrentMemoryContext;