Hello,

I had a look at the query used in pgsql/dbpgsql.c to calculate the used quota 
in db_get_quotum_used(). I executed the current query (with "size" instead of 
"m.size")

SELECT SUM(messagesize)
FROM messages
WHERE mailbox_idnr IN
     (SELECT mailbox_idnr
      FROM mailboxes
      WHERE owner_idnr = %llu::bigint)
  AND status < 2;

and the equivalent query (which is used in mysql/dbmysql.c)

SELECT SUM(m.messagesize)
FROM messages m, mailboxes mb
WHERE m.mailbox_idnr = mb.mailbox_idnr
     AND mb.owner_idnr = %llu::bigint 
     AND m.status < 2;

on an account which had 2155 mails in 52 mailboxes (the whole database 
contains 26000 mails in 400 mailboxes). The server is a Celeron 1200MHz with 
256 MB RAM.

I asked PostgreSQL 7.3 to EXPLAIN ANALYZE the queries (details see below): 
PostgreSQL uses two nested sequential scans (on messages and mailboxes) for 
the first query and the total execution time is 12144.25 msec.
The second query is executed as a hash join and takes only 154.69 msec.

These results are repeatable and the interactive behaviour of the mail server 
is _much_ better with the second query when, e.g., moving mails between 
folders.

I've attached a patch to change the query in pgsql/dbpgsql.c to be the same as 
the query used in mysql/dbmysql.c .

Is there a reason why the query in pgsql/dbpgsql.c has been formulated 
differently, with a sub-select?

Regards,
Armin


Detailed output from "EXPLAIN ANALYZE":

dbmail=# explain analyze SELECT SUM(messagesize) FROM messages WHERE 
mailbox_idnr IN (SELECT mailbox_idnr FROM mailboxes WHERE owner_idnr = 
10::bigint) AND status < 2;
                                                      QUERY PLAN                
                                                                 
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=140954.36..140954.36 rows=1 width=8) (actual 
time=12144.10..12144.10 rows=1 loops=1)
   ->  Seq Scan on messages  (cost=0.00..140922.85 rows=12601 width=8) (actual 
time=11.92..12130.91 rows=2056 loops=1)
         Filter: ((status < 2) AND (subplan))
         SubPlan
           ->  Seq Scan on mailboxes  (cost=0.00..11.03 rows=52 width=8) 
(actual time=0.02..0.43 rows=50 loops=25105)
                 Filter: (owner_idnr = 10::bigint)
 Total runtime: 12144.25 msec


dbmail=# explain analyze SELECT SUM(m.messagesize) FROM messages m, mailboxes 
mb WHERE m.mailbox_idnr = mb.mailbox_idnr AND mb.owner_idnr = 10::bigint AND 
m.status < 2;
                                                        QUERY PLAN              
                                                                   
---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=856.95..856.95 rows=1 width=24) (actual time=154.51..154.51 
rows=1 loops=1)
   ->  Hash Join  (cost=11.16..848.80 rows=3260 width=24) (actual 
time=1.35..142.72 rows=2056 loops=1)
         Hash Cond: ("outer".mailbox_idnr = "inner".mailbox_idnr)
         ->  Seq Scan on messages m  (cost=0.00..670.89 rows=25202 width=16) 
(actual time=0.10..111.64 rows=25105 loops=1)
               Filter: (status < 2)
         ->  Hash  (cost=11.03..11.03 rows=52 width=8) (actual time=1.11..1.11 
rows=0 loops=1)
               ->  Seq Scan on mailboxes mb  (cost=0.00..11.03 rows=52 
width=8) (actual time=0.18..1.05 rows=52 loops=1)
                     Filter: (owner_idnr = 10::bigint)
 Total runtime: 154.69 msec
Index: pgsql/dbpgsql.c
===================================================================
RCS file: /cvsroot-dbmail/dbmail/pgsql/dbpgsql.c,v
retrieving revision 1.99
diff -u -r1.99 dbpgsql.c
--- pgsql/dbpgsql.c	2003/03/21 13:02:22	1.99
+++ pgsql/dbpgsql.c	2003/05/20 18:05:23
@@ -156,8 +156,8 @@
 {
   u64_t q=0;
 
-  snprintf(query, DEF_QUERYSIZE, "SELECT SUM(messagesize) FROM messages WHERE "
-	   "mailbox_idnr IN (SELECT mailbox_idnr FROM mailboxes WHERE owner_idnr = %llu::bigint) AND "
+  snprintf(query, DEF_QUERYSIZE, "SELECT SUM(m.messagesize) FROM messages m, mailboxes mb "
+	   "WHERE m.mailbox_idnr = mb.mailbox_idnr AND mb.owner_idnr = %llu::bigint AND "
 	   "m.status < 2",
 	   userid);
 

Reply via email to