explain analyze faster then query

2018-11-25 Thread Mariel Cherkassky
Hi,
I'm using postgres 9.6.
I have a table with 100M+ records which consume on disk about 8GB. In
addition I have an index on the id column of the table.
When I run in psql : explain analyze select id from my_table order by id
The query returns output after 130 seconds which is great. The plan that is
chosen is Index only scan.
However when I run the query without the explain analyze it takes forever
to run it(More then two hours).
All the statistics are accurate and work_mem set to 4MB. What there is so
much difference between running the query with explain analyze and without ?
Is there a possibility that it is related to fetching or something like
that ?

Thanks.


Re: explain analyze faster then query

2018-11-25 Thread Justin Pryzby
Cc: [email protected],
[email protected]

Please avoid simultaneously sending the same question to multiple lists.

It means that people can't see each others replies and everything that implies.

On Sun, Nov 25, 2018 at 03:08:33PM +0200, Mariel Cherkassky wrote:
> However when I run the query without the explain analyze it takes forever
> to run it(More then two hours).
> Is there a possibility that it is related to fetching or something like
> that ?

If it's a remote database, I expect that's why.
Maybe you can test by running the query on the DB server.
Or by running another variant of the query, such as:

WITH x AS (QUERY GOES HERE) SELECT 1;

which returns only one row but after having executed the query behind CTE, as
optimization fence.

Justin



Re: explain analyze faster then query

2018-11-25 Thread Mariel Cherkassky
I run it from inside the machine on the local database.
For example :

db=# create table rule_test as select generate_series(1,1);
SELECT 1

db=# explain analyze select generate_series from rule_test order by
generate_series asc;
QUERY PLAN
---
 Sort  (cost=17763711.32..18045791.04 rows=112831890 width=4) (actual
time=62677.752..100928.829 rows=1 loops=1)
   Sort Key: generate_series
   Sort Method: external merge  Disk: 1367624kB
   ->  Seq Scan on rule_test  (cost=0.00..1570796.90 rows=112831890
width=4) (actual time=0.019..36098.463 rows=1 loops=1)
 Planning time: 0.072 ms
 Execution time: 107025.113 ms
(6 rows)

db=# create index on rule_test(generate_series);
CREATE INDEX
db=# select generate_series from rule_test order by generate_series asc;


db=# explain analyze select generate_series from rule_test order by
generate_series asc;

   QUERY PLAN

 Index Only Scan using rule_test_generate_series_idx on rule_test
(cost=0.57..2490867.57 rows=1 width=4) (actual
time=0.103..63122.906 rows=1 loops=1)
   Heap Fetches: 1
 Planning time: 6.682 ms
 Execution time: 69265.311 ms
(4 rows)

db=# select generate_series from rule_test order by generate_series asc;
stuck for more then a hour


‫בתאריך יום א׳, 25 בנוב׳ 2018 ב-15:30 מאת ‪Justin Pryzby‬‏ <‪
[email protected]‬‏>:‬

> Cc: [email protected],
> [email protected]
>
> Please avoid simultaneously sending the same question to multiple lists.
>
> It means that people can't see each others replies and everything that
> implies.
>
> On Sun, Nov 25, 2018 at 03:08:33PM +0200, Mariel Cherkassky wrote:
> > However when I run the query without the explain analyze it takes forever
> > to run it(More then two hours).
> > Is there a possibility that it is related to fetching or something like
> > that ?
>
> If it's a remote database, I expect that's why.
> Maybe you can test by running the query on the DB server.
> Or by running another variant of the query, such as:
>
> WITH x AS (QUERY GOES HERE) SELECT 1;
>
> which returns only one row but after having executed the query behind CTE,
> as
> optimization fence.
>
> Justin
>


Re: explain analyze faster then query

2018-11-25 Thread Justin Pryzby
On Sun, Nov 25, 2018 at 03:37:46PM +0200, Mariel Cherkassky wrote:
> I run it from inside the machine on the local database.
> For example :
> 
> db=# create table rule_test as select generate_series(1,1);
> SELECT 1

> db=# explain analyze select generate_series from rule_test order by
> generate_series asc;

So it's returning 100M rows to the client, which nominally will require moving
400MB.

And pgsql is formatting the output.

I did a test with 10M rows:

[pryzbyj@database ~]$ command time -v psql postgres -c 'SELECT * FROM 
rule_test' |wc -c&
Command being timed: "psql postgres -c SELECT * FROM rule_test"
User time (seconds): 11.52
Percent of CPU this job got: 78%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:17.25
Maximum resident set size (kbytes): 396244
...
17053

Explain analyze takes 0.8sec, but returning query results uses 11sec CPU time
on the *client*, needed 400MB RAM (ints now being represented as strings
instead of machine types), and wrote 170MB to stdout, Also, if the output is
being piped to less, the data is going to be buffered there, which means your
query is perhaps using 4GB RAM in psql + 4GB in less..

Is the server swapping ?   check "si" and "so" in output of "vmstat -w 1"

Justin