Re: Is there a way to speed up WAL replay?

2018-10-31 Thread Nicolas Grilly
This tool may be useful:

https://github.com/joyent/pg_prefaulter
Faults pages into PostgreSQL shared_buffers or filesystem caches in advance
of WAL apply

Nicolas

On Wed, Oct 31, 2018 at 6:38 AM Torsten Förtsch wrote:

> Hi,
>
> I am working on restoring a database from a base backup + WAL. With the
> default settings the database replays about 3-4 WAL files per second. The
> startup process takes about 65% of a CPU and writes data at somewhere
> between 50 and 100 MB/sec.
>
> Is there a way to speed that up? The disk can easily sustain 400-500
> MB/sec.
>
> Thanks,
> Torsten
>
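
A rough sanity check on the numbers quoted above, assuming the default 16 MB
WAL segment size (the thread does not state the segment size, so
WAL_SEGMENT_MB below is an assumption, as is the helper name): 3-4 segments
per second works out to roughly 48-64 MB/sec of WAL consumed. A minimal
sketch of that arithmetic:

```python
# Back-of-the-envelope check of the replay rate described above.
# Assumption: default 16 MB WAL segments (not stated in the thread).
WAL_SEGMENT_MB = 16


def replay_rate_mb_per_sec(segments_per_sec, segment_mb=WAL_SEGMENT_MB):
    """Return the WAL volume consumed per second, in MB."""
    return segments_per_sec * segment_mb


low = replay_rate_mb_per_sec(3)   # 3 segments/sec
high = replay_rate_mb_per_sec(4)  # 4 segments/sec
disk_low_mb = 400                 # lower bound of the quoted disk throughput

print(f"WAL consumed: {low}-{high} MB/sec")
print(f"Share of disk bandwidth at most: {high / disk_low_mb:.0%}")
```

Under that assumption the disk's sequential bandwidth is far from saturated,
which would be consistent with replay being held back by something other than
raw throughput, for example the startup process waiting on random reads. That
is the kind of stall a prefaulting tool is meant to reduce.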


Re: Where **not** to use PostgreSQL?

2019-02-28 Thread Nicolas Grilly
On Thu, Feb 28, 2019 at 1:24 PM Chris Travers wrote:

> 1.  a) TB-scale full text search systems.
>  b) PostgreSQL's full text search is quite capable but not so powerful
> that it can completely replace Lucene-based systems.  So you have to
> consider complexity vs functionality if you are tying it in with other data that
> is already in PostgreSQL.  Note further that my experience with at least
> ElasticSearch is that it is easier to scale something built on multiple
> PostgreSQL instances into the PB range than it is to scale ElasticSearch
> into the PB range.
>  c) Solr or ElasticSearch
>

One question about your use of PostgreSQL for a TB-scale full-text search
system: Did you order search results using ts_rank or ts_rank_cd? I'm
asking because in my experience, PostgreSQL full-text search is extremely
efficient until you need ranking. The indexes don't contain the information
needed for ranking, so the heap has to be consulted for each matching row,
which implies a lot of random IO.

I'd be curious to know a bit more about your experience in this regard.

Regards,

Nicolas Grilly

PS: A potential solution to the performance issue I mentioned is this PG
extension: https://github.com/postgrespro/rum
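
To illustrate the ranking cost described above, here is a toy model in
Python. It is not how PostgreSQL or RUM are implemented; every name in it is
hypothetical, and term frequency stands in for a real ranking function. It
only contrasts an index that stores document ids alone (ranking needs one row
fetch per match) with an index that also stores ranking data (no row fetches).

```python
from collections import defaultdict

# Toy "heap": document id -> text. Reading a row counts as one heap fetch.
HEAP = {
    1: "postgres full text search",
    2: "search search search engine",
    3: "postgres replication",
}


def build_plain_index(heap):
    """Term -> set of doc ids (no positions or frequencies stored)."""
    index = defaultdict(set)
    for doc_id, text in heap.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def build_freq_index(heap):
    """Term -> {doc id: term frequency}; ranking data lives in the index."""
    index = defaultdict(dict)
    for doc_id, text in heap.items():
        for term in text.split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def ranked_search_plain(index, heap, term):
    """Rank matches by term frequency; every match needs a heap fetch."""
    heap_fetches = 0
    scored = []
    for doc_id in index.get(term, ()):
        text = heap[doc_id]  # simulated random read of the row
        heap_fetches += 1
        scored.append((text.split().count(term), doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored], heap_fetches


def ranked_search_freq(index, term):
    """Rank matches from index data alone; no heap fetches needed."""
    postings = index.get(term, {})
    ordered = sorted(postings, key=lambda doc_id: (-postings[doc_id], doc_id))
    return ordered, 0


plain = build_plain_index(HEAP)
freq = build_freq_index(HEAP)
print(ranked_search_plain(plain, HEAP, "search"))
print(ranked_search_freq(freq, "search"))
```

In this model the number of row fetches grows with the number of matches, not
with the number of results returned, which is why ranking a common term gets
expensive when the rows are scattered on disk.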


Re: Where **not** to use PostgreSQL?

2019-02-28 Thread Nicolas Grilly
On Thu, Feb 28, 2019 at 2:12 PM Chris Travers wrote:

> Where I did this on the TB scale, we had some sort of ranking but it was
> not based on ts_rank.
>
> On the PB scale systems I work on now, it is distributed, and we don't
> order in PostgreSQL (or anywhere else, though if someone wants to write to
> disk and sort, they can do this I guess)
>

Thanks!
