Query returns no results until REINDEX

2020-02-07 Thread Colin Adler
Hi all,

Earlier today we were trying to debug why a row wasn't being deleted from a
table and we ran into some interesting behavior.

This is the table in question:

coder=# \d+ extensions
Table "public.extensions"
Column |   Type   | Collation | Nullable | Default
| Storage  | Stats target | Description
+--+---+--+-+--+--+-
ext_name   | text |   | not null |
| extended |  |
publisher_name | text |   | not null |
| extended |  |
ext_version| text |   | not null |
| extended |  |
obj| jsonb|   | not null |
| extended |  |
created_at | timestamp with time zone |   | not null |
| plain|  |
updated_at | timestamp with time zone |   | not null |
| plain|  |
Indexes:
   "extensions_pkey" PRIMARY KEY, btree (ext_name, publisher_name,
ext_version)
Foreign-key constraints:
   "extensions_ext_name_fkey" FOREIGN KEY (ext_name, publisher_name)
REFERENCES extension_statistics(ext_name, publisher_name) ON DELETE CASCADE
Referenced by:
   TABLE "extension_assets" CONSTRAINT "extension_assets_ext_name_fkey"
FOREIGN KEY (ext_name, publisher_name, ext_version) REFERENCES
extensions(ext_name, publisher_name, ext_version) ON DELETE CASCADE

coder=# select ext_name from extensions;
ext_name
---
vim
Go
min-theme
terraform
prettier-vscode
vscode-icons
gitlens
vscode-eslint
cpptools
rust
Angular-BeastCode
(11 rows)

We start to run into issues when querying for the "Go" extension.

coder=# select ext_name from extensions where ext_name = 'Go';
ext_name
--
(0 rows)

Other extensions seem to be queried just fine.

coder=# select ext_name from extensions where ext_name =
'Angular-BeastCode';
ext_name
---
Angular-BeastCode
(1 row)

Using LIKE seems to find the broken row fine.

coder=# select ext_name from extensions where ext_name LIKE '%Go';
ext_name
--
Go
(1 row)

That then begs the question, maybe there's some weird whitespace in front
causing it to fail.

coder=# select encode(ext_name::bytea, 'hex') from extensions where
ext_name LIKE '%Go';
encode

476f
(1 row)

Doesn't seem like it. After a bit of confusion I thought to reindex the
table.

coder=# reindex table extensions;
REINDEX
coder=# select ext_name from extensions where ext_name = 'Go';
ext_name
--
Go
(1 row)

Seems to work now. My question is, is this something I should report to the
maintainers? I took a snapshot of the data folder before the reindex in
case it
would be helpful. Is index corruption something that should be actively
looked
out for?

It's worth noting this particular row has existed in the database for quite
a
long time. Probably over 3 months.

Thanks for taking a look,
Colin


Re: Query returns no results until REINDEX

2020-02-09 Thread Colin Adler
Peter,

Thanks for taking the time to respond and hope you're having a good
weekend. My
apologies for not noting the Postgres version. I thought about it after I
sent
the email but I didn't want to spam the thread.

$ psql --version
psql (PostgreSQL) 11.6

One interesting thing you mention is that the system probably changed libc
versions. This is correct. We recently moved from an Alpine based docker
container running `musl libc 1.1.22` to a RHEL based one running `glibc
2.17`.
This was done a month or so ago. The `Go` extension row would've existed in
the
database before this switch.

I dumped and restored our database in prod and all seems to be fine now. I'm
now testing using the backup from prod on my local machine with the same
RHEL
image running `glibc 2.17`.

coder=# select ext_name from extensions where ext_name = 'Go';
 ext_name
--
(0 rows)

coder=# SELECT bt_index_check('extensions_pkey', true);
ERROR:  item order invariant violated for index "extensions_pkey"
DETAIL:  Lower index tid=(1,2) (points to heap tid=(0,2)) higher index
tid=(1,3) (points to heap tid=(2,1)) page lsn=1/FF9C13B8.

Looks like it found something. I checked out the contrib/pageinspect docs
but
wasn't too sure what to run. Are incompatible libc versions causing btree
corruption something you consider a bug? If it's something you'd like to
look
into further I can gladly send over the database files.

Thanks,
Colin

On Fri, Feb 7, 2020 at 7:08 PM Peter Geoghegan  wrote:

> On Fri, Feb 7, 2020 at 3:52 PM Colin Adler  wrote:
> > Seems to work now. My question is, is this something I should report to
> the
> > maintainers?
>
> I am one of the people that maintains the B-Tree code.
>
> You didn't mention what version of Postgres you're using here. That
> could be important. Please let us know. Mention the minor component of
> the release version, too (i.e. say 12.2, not just 12).
>
> > I took a snapshot of the data folder before the reindex in case it
> > would be helpful. Is index corruption something that should be actively
> looked
> > out for?
>
> Yes -- look for corruption. If I had to guess, I'd say that this has
> something to do with upgrading the operating system to use a
> different, incompatible glibc. Or perhaps it has something to do with
> streaming replication between machines with different glibc version.
>
> You should try running contrib/amcheck, which should be able to isolate
> index corruption, and give you a specific complaint. You may then be
> able to inspect the exact index page with the problem using
> contrib/pageinspect. Something like this ought to do it on Postgres
> 11 or 12:
>
> CREATE EXTENSION IF NOT EXISTS amcheck
> SELECT bt_index_check('my_index', true);
>
> If that doesn't show any errors, then perhaps try this:
>
> SELECT bt_index_parent_check('my_index', true);
>
> If you're on Postgres 10, then you should leave out the second
> argument, "true", since that version doesn't have the extra
> heapallindexed check.
>
> Let us know what you see.
>
> --
> Peter Geoghegan
>