Searching in varchar column having 100M records

2019-07-17 Thread mayank rupareliya
My table has 100M records of data like the sample below (all dummy data), with a
btree index on the "field" column.
*Searching for any text in that column takes a long time (more than 1 minute).*

user_id                              field
d848f466-5e12-46e7-acf4-e12aff592241 Northern Arkansas College
24c32757-e6a8-4dbd-aac7-1efd867156ce female
6e225c57-c1d1-48a5-b9aa-513223efc81b 1.0, 3.67, 3.67, 4.67, 7.0, 3.0
088c6342-a240-45a7-9d12-e0e707292031 Weber
b05088cf-cba6-4bd7-8f8f-1469226874d0 addd#[email protected]


Table and index are created using following query.

create table fields(user_id varchar(64), field varchar(64));
CREATE INDEX index_field ON public.fields USING btree (field);

Search Query:
EXPLAIN (ANALYZE, BUFFERS) select * from fields where field='Mueller';

Bitmap Heap Scan on fields  (cost=72.61..10069.32 rows=2586 width=55) (actual time=88.017..65358.548 rows=31882 loops=1)
  Recheck Cond: ((field)::text = 'Mueller'::text)
  Heap Blocks: exact=31403
  Buffers: shared hit=2 read=31492
  ->  Bitmap Index Scan on index_field  (cost=0.00..71.96 rows=2586 width=0) (actual time=55.960..55.960 rows=31882 loops=1)
        Index Cond: ((field)::text = 'Mueller'::text)
        Buffers: shared read=91
Planning Time: 0.331 ms
Execution Time: 65399.314 ms


Any suggestions for improvement?

Best Regards,
Mayank


Re: Searching in varchar column having 100M records

2019-07-17 Thread Sergei Kornilov
Hello

Please recheck with track_io_timing = on in the configuration. EXPLAIN
(ANALYZE, BUFFERS) with this option will also report how much time is spent on I/O.

>   Buffers: shared hit=2 read=31492

31492 blocks / 65 sec ~ 480 IOPS, not bad if you are using an HDD.

Your query reads the table data from disk (or from the OS cache). You need more
RAM for shared_buffers, or disks with better performance.
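
To get those I/O timings, a minimal sketch (assuming a superuser session, since
track_io_timing is a superuser-settable parameter), enabling it just for this test:

   SET track_io_timing = on;
   EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM fields WHERE field = 'Mueller';
   -- plan nodes should then include "I/O Timings: read=..." lines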

regards, Sergei




Re: Searching in varchar column having 100M records

2019-07-17 Thread Tomas Vondra

On Wed, Jul 17, 2019 at 02:53:20PM +0300, Sergei Kornilov wrote:

> Hello
>
> Please recheck with track_io_timing = on in the configuration. EXPLAIN
> (ANALYZE, BUFFERS) with this option will also report how much time is
> spent on I/O.
>
> >   Buffers: shared hit=2 read=31492
>
> 31492 blocks / 65 sec ~ 480 IOPS, not bad if you are using an HDD.
>
> Your query reads the table data from disk (or from the OS cache). You need
> more RAM for shared_buffers, or disks with better performance.



Either that, or try creating a covering index, so that the query can do an
index-only scan. That might reduce the amount of I/O against the table, and
in the index the data should be located close together (on the same page or
on pages close to each other).

So try something like

   CREATE INDEX ios_idx ON fields (field, user_id);

and make sure the table is vacuumed often enough (so that the visibility
map is up to date).
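
A sketch of how to check the effect once that index exists, assuming the table
from the original post:

   VACUUM (ANALYZE) fields;                       -- refresh the visibility map and stats
   EXPLAIN (ANALYZE, BUFFERS)
   SELECT * FROM fields WHERE field = 'Mueller';  -- should now show an Index Only Scan
                                                  -- with few (ideally zero) heap fetches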


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Searching in varchar column having 100M records

2019-07-17 Thread Andreas Kretschmer




On 17.07.19 at 14:48, Tomas Vondra wrote:
> Either that, or try creating a covering index, so that the query can do an
> index-only scan. That might reduce the amount of I/O against the table, and
> in the index the data should be located close together (on the same page or
> on pages close to each other).
>
> So try something like
>
>    CREATE INDEX ios_idx ON fields (field, user_id);
>
> and make sure the table is vacuumed often enough (so that the visibility
> map is up to date).


Yeah, and please don't use varchar(64) for the user_id column; use UUID instead,
to save space on disk and for faster comparisons.
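
A sketch of the suggested definition, assuming every user_id value is a
well-formed UUID:

   CREATE TABLE fields (
       user_id uuid,           -- 16 bytes fixed, faster to compare than a string
       field   varchar(64)
   );
   CREATE INDEX index_field ON fields USING btree (field);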



Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com





Re: Searching in varchar column having 100M records

2019-07-17 Thread David G. Johnston
On Wed, Jul 17, 2019 at 4:04 AM mayank rupareliya wrote:

> create table fields(user_id varchar(64), field varchar(64));
> CREATE INDEX index_field ON public.fields USING btree (field);
>
> Any suggestions for improvement?
>

Reduce the number of rows by constructing a relationally normalized data
model.
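
For instance, a hypothetical sketch (all table and column names made up): store
each distinct field value once and reference it by id, so the 100M-row table and
its index both shrink.

   CREATE TABLE field_values (
       field_id serial PRIMARY KEY,
       field    text UNIQUE
   );
   CREATE TABLE user_fields (
       user_id  uuid,
       field_id integer REFERENCES field_values (field_id)
   );
   CREATE INDEX ON user_fields (field_id);

   -- a search then becomes a join:
   SELECT uf.user_id
   FROM user_fields uf
   JOIN field_values fv USING (field_id)
   WHERE fv.field = 'Mueller';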

David J.


Re: Perplexing, regular decline in performance

2019-07-17 Thread Hugh Ranalli
On Wed, 26 Jun 2019 at 15:18, Tom Lane  wrote:

> Alvaro Herrera  writes:
> > On 2019-Jun-26, Hugh Ranalli wrote:
> >> From my research in preparing for the upgrade, I understood transparent
> >> huge pages were a good thing, and should be enabled. Is this not
> correct?
>
> > It is not.
>
> Yeah ... they would be a good thing perhaps if the quality of the kernel
> implementation were better.  But there are way too many nasty corner
> cases, at least with the kernel versions people around here have
> experimented with.  You're best off to disable THP and instead manually
> arrange for Postgres' shared memory to use huge pages.  I forget where
> to look for docs about doing that, but I think we have some.
>

We've been dealing with some other production issues, so my apologies for
not replying sooner. I see now that I had confused huge pages with
*transparent* huge pages. We have a maintenance window coming up this
weekend, so we'll disable transparent huge pages and configure huge pages
manually. I found the docs here:
https://www.postgresql.org/docs/11/kernel-resources.html#LINUX-HUGE-PAGES

Thank you very much!


Re: Perplexing, regular decline in performance

2019-07-17 Thread Alvaro Herrera
On 2019-Jun-26, Justin Pryzby wrote:

> > Also, should pg_buffercache perhaps be run at the beginning and end of the
> > week, to see if there is a significant difference?
> 
> Yes; buffercache can be pretty volatile, so I'd capture it several times at
> both the beginning and end of the week.

Be careful with pg_buffercache though, as it can cause a hiccup in
operation.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Perplexing, regular decline in performance

2019-07-17 Thread Andres Freund
Hi

On 2019-07-17 13:55:51 -0400, Alvaro Herrera wrote:
> Be careful with pg_buffercache though, as it can cause a hiccup in
> operation.

I think that's been fixed a few years back:

commit 6e654546fb61f62cc982d0c8f62241b3b30e7ef8
Author: Heikki Linnakangas 
Date:   2016-09-29 13:16:30 +0300

Don't bother to lock bufmgr partitions in pg_buffercache.

That makes the view a lot less disruptive to use on a production system.
Without the locks, you don't get a consistent snapshot across all buffers,
but that's OK. It wasn't a very useful guarantee in practice.

Ivan Kartyshov, reviewed by Tomas Vondra and Robert Haas.

Discusssion: 

so everything from 10 onwards ought to be fine.
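
For reference, here is the kind of per-relation snapshot being discussed, as a
sketch adapted from the pg_buffercache documentation (the extension has to be
installed with CREATE EXTENSION pg_buffercache first):

   SELECT n.nspname, c.relname, count(*) AS buffers
   FROM pg_buffercache b
   JOIN pg_class c
     ON b.relfilenode = pg_relation_filenode(c.oid)
    AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                              WHERE datname = current_database()))
   JOIN pg_namespace n ON n.oid = c.relnamespace
   GROUP BY n.nspname, c.relname
   ORDER BY 3 DESC
   LIMIT 10;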

Greetings,

Andres Freund




Re: Searching in varchar column having 100M records

2019-07-17 Thread Gavin Flower

On 17/07/2019 23:03, mayank rupareliya wrote:
[...]

Table and index are created using following query.

create table fields(user_id varchar(64), field varchar(64));
CREATE INDEX index_field ON public.fields USING btree (field);


[...]

Any particular reason for using varchar instead of text, for field?

Also, as Andreas pointed out, use UUID for the user_id.
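
A hypothetical migration sketch combining both suggestions (it assumes every
existing user_id is a well-formed UUID string, and it rewrites the whole
100M-row table, so plan for the lock and the disk churn):

   ALTER TABLE fields
       ALTER COLUMN user_id TYPE uuid USING user_id::uuid,
       ALTER COLUMN field   TYPE text;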


Cheers,
Gavin