RE: Query is slow when run for first time; subsequent execution is fast
On Windows, how do I put an entry in my db startup script to run this query
(pg_prewarm) immediately after starting the server, and let the query warm the
cache itself? Also, by "after starting the server", which server is meant: the
database I restarted, or the Windows system?
Thank you.
>Hi,
>On 17 Jan 2018 12:55, "POUSSEL, Guillaume" wrote:
>Are you on Windows or Linux? I’m on Windows and wondering if the issue is
>the same on Linux?
>I have experienced this on Mac and Linux machines.
>You can try pg_prewarm, on pg_statistic table and its index. But I'd
>probably just put an entry in my db startup script to run this query
>immediately after starting the server, and let the query warm the cache
>itself.
>I will try this suggestion and get back on the thread. Is pg_statistic the
>only table to be pre cached? Pls let me know if any other table/index needs
>to be pre warmed.
>Btw, I don't think running a "select * from pg_statistic" will fill the shared
>buffer. Only 256 kB of data will be cached during sequential scans. I will
>try pg_prewarm
>Why do you restart your database often
>Postgres is bundled with our application and deployed by our client.
>Starting / stopping the server is not under my control.
>Regards,
>Nanda
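The prewarm step suggested in the quoted reply could look roughly like the sketch below. It assumes the pg_prewarm contrib module is installed with the server (available from PostgreSQL 9.4) and that the statistics index still has its standard catalog name; neither detail is stated in the thread.

```sql
-- Assumption: the pg_prewarm contrib module is installed with the server.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load the statistics catalog and its index into shared buffers.
-- Each call returns the number of blocks it read.
SELECT pg_prewarm('pg_statistic');
SELECT pg_prewarm('pg_statistic_relid_att_inh_index');
```

Saved to a file (for example a hypothetical prewarm.sql), these statements can be run with psql -f from whatever script or scheduled task starts the database service.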
Inconsistent query times and spiky CPU with GIN tsvector search
Hello all,
We are running postgresql 9.4 and we have a table where we do some full-text
searching using a GIN index on a tsvector column:
CREATE TABLE public.location_search
(
id bigint NOT NULL DEFAULT nextval('location_search_id_seq'::regclass),
…
search_field_tsvector tsvector
)
and
CREATE INDEX location_search_tsvector_idx
ON public.location_search USING gin
(search_field_tsvector)
TABLESPACE pg_default;
The search_field_tsvector column contains the data from the location's name and
address:
to_tsvector('pg_catalog.english', COALESCE(NEW.name, '')) ||
to_tsvector(COALESCE(address, ''))
This setup has been running very well, but as our load is getting heavier, the
performance seems to be getting much more inconsistent. Our searches are run
on a dedicated read replica, so this server is only doing queries against this
one table. IO is very low, indicating to me that the data is all in memory.
However, we're getting some queries taking upwards of 15-20 seconds, while the
average is closer to 1 second.
A sample query that's running slowly is
explain (analyze, buffers)
SELECT ls.location AS locationId FROM location_search ls
WHERE ls.client = 1363
AND ls.favorite = TRUE
AND search_field_tsvector @@ to_tsquery('CA-94:* &E &San:*')
LIMIT 4;
And the explain analyze is:
Limit (cost=39865.85..39877.29 rows=1 width=8) (actual time=4471.120..4471.120 rows=0 loops=1)
  Buffers: shared hit=25613
  -> Bitmap Heap Scan on location_search ls (cost=39865.85..39877.29 rows=1 width=8) (actual time=4471.117..4471.117 rows=0 loops=1)
       Recheck Cond: (search_field_tsvector @@ to_tsquery('CA-94:* &E &San:*'::text))
       Filter: (favorite AND (client = 1363))
       Rows Removed by Filter: 74
       Heap Blocks: exact=84
       Buffers: shared hit=25613
       -> Bitmap Index Scan on location_search_tsvector_idx (cost=0.00..39865.85 rows=6 width=0) (actual time=4470.895..4470.895 rows=84 loops=1)
            Index Cond: (search_field_tsvector @@ to_tsquery('CA-94:* &E &San:*'::text))
            Buffers: shared hit=25529
Planning time: 0.335 ms
Execution time: 4487.224 ms
I'm a little bit at a loss as to where to start with this. Any suggestions
would be hugely appreciated!
Thanks,
Scott
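The Buffers figures in the plan above can be converted into sizes to see where the work goes. This is only a sketch and assumes the default 8 kB block size, which the message does not state.

```sql
-- Assumption: default block size of 8 kB (8192 bytes) per shared buffer.
SELECT pg_size_pretty(25613 * 8192::bigint) AS total_touched,
       pg_size_pretty(25529 * 8192::bigint) AS index_scan_touched;
```

Both values come out at about 200 MB. The index scan accounts for 25529 of the 25613 buffer hits, and the remaining 84 match the "Heap Blocks: exact=84" line, so nearly all of the time is spent inside the GIN index rather than in the table.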
Re: Inconsistent query times and spiky CPU with GIN tsvector search
Scott Rankin wrote:
> We are running postgresql 9.4 and we have a table where we do some
> full-text searching using a GIN index on a tsvector column:
>
> CREATE INDEX location_search_tsvector_idx
> ON public.location_search USING gin
> (search_field_tsvector)
> TABLESPACE pg_default;
>
> This setup has been running very well, but as our load is getting heavier,
> the performance seems to be getting much more inconsistent.
> Our searches are run on a dedicated read replica, so this server is only
> doing queries against this one table. IO is very low, indicating to me
> that the data is all in memory. However, we're getting some queries taking
> upwards of 15-20 seconds, while the average is closer to 1 second.
>
> A sample query that's running slowly is
>
> explain (analyze, buffers)
> SELECT ls.location AS locationId FROM location_search ls
> WHERE ls.client = 1363
> AND ls.favorite = TRUE
> AND search_field_tsvector @@ to_tsquery('CA-94:* &E &San:*')
> LIMIT 4;
>
> And the explain analyze is:
>
> Limit (cost=39865.85..39877.29 rows=1 width=8) (actual time=4471.120..4471.120 rows=0 loops=1)
>   Buffers: shared hit=25613
>   -> Bitmap Heap Scan on location_search ls (cost=39865.85..39877.29 rows=1 width=8) (actual time=4471.117..4471.117 rows=0 loops=1)
>        Recheck Cond: (search_field_tsvector @@ to_tsquery('CA-94:* &E &San:*'::text))
>        Filter: (favorite AND (client = 1363))
>        Rows Removed by Filter: 74
>        Heap Blocks: exact=84
>        Buffers: shared hit=25613
>        -> Bitmap Index Scan on location_search_tsvector_idx (cost=0.00..39865.85 rows=6 width=0) (actual time=4470.895..4470.895 rows=84 loops=1)
>             Index Cond: (search_field_tsvector @@ to_tsquery('CA-94:* &E &San:*'::text))
>             Buffers: shared hit=25529
> Planning time: 0.335 ms
> Execution time: 4487.224 ms
Not sure, but maybe you are suffering from bad performance because of a
long "GIN pending list".
If yes, then the following can help:
ALTER INDEX location_search_tsvector_idx SET (gin_pending_list_limit = 512);
Or you can disable the feature altogether:
ALTER INDEX location_search_tsvector_idx SET (fastupdate = off);
Then clean the pending list with
SELECT gin_clean_pending_list('location_search_tsvector_idx'::regclass);
Disabling the pending list will slow down data modification, but should
keep the SELECT performance stable.
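Whether the pending list is actually long can be checked before changing anything. The following is a minimal sketch, assuming the pgstattuple contrib extension can be installed (it is not mentioned in the thread). Since the searches run on a read replica, the extension would have to be created on the primary. Also worth knowing: the gin_pending_list_limit storage parameter first appeared in PostgreSQL 9.5 and gin_clean_pending_list() in 9.6, so on 9.4 the pending list is flushed by VACUUM (run on the primary) instead.

```sql
-- Assumption: pgstattuple is available; CREATE EXTENSION must run on the primary.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- pending_pages and pending_tuples show how much data is waiting
-- in the fastupdate pending list of the GIN index.
SELECT * FROM pgstatginindex('location_search_tsvector_idx');

-- On 9.4, a VACUUM of the table moves pending entries into the main
-- index structure.
VACUUM public.location_search;
```

If pending_pages stays near zero while the slow queries occur, the pending list is probably not the cause.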
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Re: Query is slow when run for first time; subsequent execution is fast
On Tue, Sep 4, 2018 at 3:16 AM jimmy wrote:
> On windows, how to put an entry in my db startup script to run this query
> (pg_prewarm) immediately after startng the server, and let the query warm
> the cache itself.
Starting with PostgreSQL version 11 (to be released soon), you can use
pg_prewarm.autoprewarm.
Until then, maybe this:
https://superuser.com/questions/502160/run-a-scheduled-task-after-a-windows-service-is-started
I've tested neither one.
Cheers,
Jeff
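For the autoprewarm route, a possible setup is sketched below. It assumes PostgreSQL 11 or later with the pg_prewarm module installed, and that no other library is currently listed in shared_preload_libraries.

```sql
-- Assumption: PostgreSQL 11+ with the pg_prewarm module installed.
-- Caution: this replaces any existing shared_preload_libraries value.
ALTER SYSTEM SET shared_preload_libraries = 'pg_prewarm';

-- After restarting the server, confirm that the background worker is
-- enabled (pg_prewarm.autoprewarm defaults to on once the library loads).
SHOW pg_prewarm.autoprewarm;
```

The worker periodically records which blocks are in shared buffers and reloads them at the next start, so it only helps once the server has run with the library loaded at least once.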
Re: Performance difference in accessing different columns in a Postgres Table
Hi All,
I was wondering whether the case is solved or still continuing. As a Postgres
newbie, I can't understand any of the terms (JIT, tuple deformation) mentioned
above. Could anyone please let me know what the current status is?
Thanks,
Dineshkumar.
On Wed, Aug 1, 2018 at 8:51 PM Jeff Janes wrote:
> On Mon, Jul 30, 2018 at 3:02 PM, Andres Freund wrote:
>
>> Hi,
>>
>> On 2018-07-30 13:31:33 -0400, Jeff Janes wrote:
>> > I don't know where the time is going with the as-committed JIT. None of
>> > the JIT-specific timings reported by EXPLAIN (ANALYZE) add up to anything
>> > close to the slow-down I'm seeing. Shouldn't compiling and optimization
>> > time show up there?
>>
>> As my timings showed, I don't see the slowdown you're reporting. Could
>> you post a few EXPLAIN ANALYZEs?
>
> I don't think you showed any timings where jit_above_cost < query cost <
> jit_optimize_above_cost, which is where I saw the slow down. (That is also
> where things naturally land for me using default settings)
>
> I've repeated my test case on a default build (./configure --with-llvm
> --prefix=) and default postgresql.conf, using the post-11BETA2 commit
> 5a71d3e.
>
> I've attached the full test case, and the full output.
>
> Here are the last two executions, with jit=on and jit=off, respectively.
> Doing it with TIMING OFF doesn't meaningfully change things, nor does
> increasing shared_buffers beyond the default.
>
> QUERY PLAN
> --
> Seq Scan on i200c200 (cost=0.00..22.28 rows=828 width=16) (actual time=29.317..11966.291 rows=1000 loops=1)
> Planning Time: 0.034 ms
> JIT:
>   Functions: 2
>   Generation Time: 1.589 ms
>   Inlining: false
>   Inlining Time: 0.000 ms
>   Optimization: false
>   Optimization Time: 9.002 ms
>   Emission Time: 19.948 ms
> Execution Time: 12375.493 ms
> (11 rows)
>
> Time: 12376.281 ms (00:12.376)
> SET
> Time: 1.955 ms
> QUERY PLAN
> Seq Scan on i200c200 (cost=0.00..22.28 rows=828 width=16) (actual time=0.063..3897.302 rows=1000 loops=1)
> Planning Time: 0.037 ms
> Execution Time: 4292.400 ms
> (3 rows)
>
> Time: 4293.196 ms (00:04.293)
>
> Cheers,
>
> Jeff
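For readers new to the terms: JIT here means just-in-time compilation of parts of query execution, which PostgreSQL 11 can apply when a plan's estimated cost exceeds certain thresholds. The comparison described in the quoted message can be repeated in a single session along the lines of the sketch below. The SELECT is only a placeholder, since the actual test query was in an attachment that is not reproduced here.

```sql
-- Cost thresholds that decide when JIT compilation, optimization and
-- inlining are applied to a query.
SHOW jit_above_cost;
SHOW jit_optimize_above_cost;
SHOW jit_inline_above_cost;

-- Run the same statement with and without JIT and compare the timings.
-- Placeholder query: the thread's real test statement is not shown.
SET jit = on;
EXPLAIN (ANALYZE) SELECT * FROM i200c200;
SET jit = off;
EXPLAIN (ANALYZE) SELECT * FROM i200c200;
```

When JIT is used, the EXPLAIN (ANALYZE) output gains a "JIT:" section like the one in the first plan above.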
