Full text search - wildcard and a stop word

2022-02-22 Thread Allan Jardine
Hi all,

I'm venturing into full text search in Postgres for the first time, and I'd
like to be able to do a search by the start of a word - so I used the `:*`
operator. However, this doesn't behave as I'd expect with a stop word. For
example, my name is "Allan", so I often use it as a test string, and it
contains `all`, which is a stop word - that's how I noticed this issue.

To illustrate:

=> select to_tsquery('al:*');
 to_tsquery
------------
 'al':*
(1 row)

=> select to_tsquery('all:*');
NOTICE:  text-search query contains only stop words or doesn't contain
lexemes, ignored
 to_tsquery
------------

(1 row)

=> select to_tsquery('alla:*');
 to_tsquery
------------
 'alla':*
(1 row)

I understand why that is happening - the NOTICE basically explains it - but
given the wildcard at the end, it seems to me this should return `'all':*`.
Is this by design, or could it be considered a bug? I'm using Postgres
12.10.

Thanks,
Allan


Re: Full text search - wildcard and a stop word

2022-02-22 Thread Tom Lane
Allan Jardine  writes:
> => select to_tsquery('all:*');
> NOTICE:  text-search query contains only stop words or doesn't contain
> lexemes, ignored
>  to_tsquery
> ------------
>
> (1 row)

> I understand why that is happening - the NOTICE basically explains it - but
> given the wildcard at the end, it seems to me this should return `'all':*`.
> Is this by design, or could it be considered a bug?

It's a hard problem.  If we don't normalize the presented word, we risk
not matching cases that users would expect to match (because the word
is going to be compared to data that probably *was* normalized).
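
For concreteness, a minimal sketch with the default 'english' configuration:
the stored lexeme is the stem, so a query that keeps the literal word would
miss it:

n=# select to_tsvector('english', 'running fast');
   to_tsvector
------------------
 'fast':2 'run':1
(1 row)

n=# select to_tsvector('english', 'running fast') @@ 'running:*'::tsquery;
 ?column?
----------
 f
(1 row)

(to_tsquery('running:*') would instead normalize to 'run':*, which matches.)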

In this particular case, you can skip the normalization by just not
using to_tsquery:

n=# select 'all:*'::tsquery;
 tsquery 
---------
 'all':*
(1 row)

but that might or might not be what you want in general.
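
For what it's worth, the un-normalized prefix still matches your test string
(again assuming the 'english' configuration), since 'all':* covers the
stored lexeme 'allan':

n=# select to_tsvector('english', 'Allan Jardine') @@ 'all:*'::tsquery;
 ?column?
----------
 t
(1 row)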

Perhaps the ideal behavior here would be "normalize, but don't throw away
stopwords", but unfortunately our dictionary APIs don't support that.

regards, tom lane




AWS vs GCP storage

2022-02-22 Thread Torsten Förtsch
Hi,

I have a few databases with several TB-sized tables. We recently moved one
of those databases from AWS EC2 to GCP. Today I ran VACUUM FREEZE on those
tables and every time I saw our application freezing up (and throwing tons
of errors) for a few minutes right after the message about the end of
vacuum (duration: 30182257.392 ms  statement: vacuum freeze verbose ...).
We never saw anything like that on AWS.

The database is 9.6. At the moment I have no details about the storage
configuration on GCP. The machine was set up by another department.

Any suggestions on why that is happening and how to prevent it would be
appreciated.

Thanks,
Torsten


Re: AWS vs GCP storage

2022-02-22 Thread Adrian Klaver

On 2/22/22 12:27, Torsten Förtsch wrote:

> Hi,
>
> I have a few databases with several TB-sized tables. We recently moved
> one of those databases from AWS EC2 to GCP. Today I ran VACUUM FREEZE on
> those tables and every time I saw our application freezing up (and
> throwing tons of errors) for a few minutes right after the message about
> the end of vacuum (duration: 30182257.392 ms  statement: vacuum freeze
> verbose ...). We never saw anything like that on AWS.


And the errors were?



> The database is 9.6. At the moment I have no details about the storage
> configuration on GCP. The machine was set up by another department.
>
> Any suggestions on why that is happening and how to prevent it would be
> appreciated.
>
> Thanks,
> Torsten



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: AWS vs GCP storage

2022-02-22 Thread Nikolay Samokhvalov
On Tue, Feb 22, 2022 at 12:27 PM Torsten Förtsch wrote:

> Hi,
>
> I have a few databases with several TB-sized tables. We recently moved one
> of those databases from AWS EC2 to GCP. Today I ran VACUUM FREEZE on those
> tables and every time I saw our application freezing up (and throwing tons
> of errors) for a few minutes right after the message about the end of
> vacuum (duration: 30182257.392 ms  statement: vacuum freeze verbose ...).
> We never saw anything like that on AWS.
>
> The database is 9.6. At the moment I have no details about the storage
> configuration on GCP. The machine was set up by another department.
>
> Any suggestions on why that is happening and how to prevent it would be
> appreciated.
>

The first thing to check is that you're using SSD PDs, not regular PDs.
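
For example, something along these lines (disk name and zone are
placeholders) should print a type ending in pd-ssd rather than pd-standard:

$ gcloud compute disks describe my-data-disk --zone=us-central1-a \
    --format='value(type)'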

Second: GCP throttles disk IO according to two rules (applied independently
to regular PDs and SSD PDs): one based on disk size and one based on the
number of vCPUs. You need enough vCPUs to get maximum disk IO, which may be
counter-intuitive. I suggest checking out the docs and benchmarking the
disks using fio (a sample invocation is sketched after the links):
- https://cloud.google.com/compute/docs/disks/performance
- https://cloud.google.com/compute/docs/disks/benchmarking-pd-performance
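
A rough starting point for fio (file path, size, and runtime are
placeholders; 8k random reads approximate Postgres page IO):

$ fio --name=pgread --filename=/path/on/the/pg/volume/fio.test \
      --size=10G --rw=randread --bs=8k --ioengine=libaio --iodepth=64 \
      --direct=1 --runtime=60 --time_based --group_reporting

Then compare the reported IOPS and throughput with the limits for your disk
size and vCPU count from the first doc.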