On Tuesday 09 September 2008 23:15:45, Bill Moran wrote:
> In response to "Yuri Timofeev" <[EMAIL PROTECTED]>:
> > 2008/9/9 Bill Moran <[EMAIL PROTECTED]>:
> > > In response to Bob Hetzel <[EMAIL PROTECTED]>:
> > > ... Additionally, doing that
> > > rewrite so that it results in improvements to all SQL platforms is
> > > probably going to require that the code be broken into SQL-dependent
> > > and SQL-independent sections.
> >
> > It may be that it has to be done in the critical sections.
>
> Agreed. This was demonstrated last year by some massive speed improvements
> in the way that job records are saved.
>
> > The problem is that different DBMSs have different optimizers.
> > For example, in PostgreSQL (during my tests) EXPLAIN does not
> > show that indexes will be used:
> >
> > bacula=# EXPLAIN SELECT File.FileId, Job.JobId FROM File LEFT OUTER JOIN
> > Job ON (File.JobId = Job.JobId) WHERE Job.JobId IS NULL LIMIT 300000;
> >                                 QUERY PLAN
> > ---------------------------------------------------------------------------
> >  Limit  (cost=78.86..24169.76 rows=300000 width=8)
> >    ->  Hash Left Join  (cost=78.86..181966.90 rows=2265021 width=8)
> >          Hash Cond: (file.jobid = job.jobid)
> >          Filter: (job.jobid IS NULL)
> >          ->  Seq Scan on file  (cost=0.00..113937.42 rows=4530042 width=8)
> >          ->  Hash  (cost=58.38..58.38 rows=1638 width=4)
> >                ->  Seq Scan on job  (cost=0.00..58.38 rows=1638 width=4)
> > (7 rows)
> >
> > However, PostgreSQL worked quickly.
>
> That's because using indexes on that query is a horrifically bad idea:
>
> bacula=# EXPLAIN SELECT File.FileId, Job.JobId FROM File LEFT OUTER JOIN Job ON
> bacula-# (File.JobId = Job.JobId) WHERE Job.JobId IS NULL LIMIT 300000;
>                                 QUERY PLAN
> ---------------------------------------------------------------------------
>  Limit  (cost=53.15..14650.94 rows=300000 width=8)
>    ->  Hash Left Join  (cost=53.15..2115902.45 rows=43482932 width=8)
>          Hash Cond: ("outer".jobid = "inner".jobid)
>          Filter: ("inner".jobid IS NULL)
>          ->  Seq Scan on file  (cost=0.00..1463605.32 rows=43482932 width=8)
>          ->  Hash  (cost=50.52..50.52 rows=1052 width=4)
>                ->  Seq Scan on job  (cost=0.00..50.52 rows=1052 width=4)
> (7 rows)
>
> bacula=# set enable_seqscan=off;
> SET
> Time: 0.210 ms
> bacula=# EXPLAIN SELECT File.FileId, Job.JobId FROM File LEFT OUTER JOIN Job ON
> bacula-# (File.JobId = Job.JobId) WHERE Job.JobId IS NULL LIMIT 300000;
>                                 QUERY PLAN
> ---------------------------------------------------------------------------
>  Limit  (cost=0.00..1042544.60 rows=300000 width=8)
>    ->  Merge Right Join  (cost=0.00..151109653.30 rows=43482932 width=8)
>          Merge Cond: ("outer".jobid = "inner".jobid)
>          Filter: ("outer".jobid IS NULL)
>          ->  Index Scan using job_pkey on job  (cost=0.00..175.80 rows=1052 width=4)
>          ->  Index Scan using file_jobid_idx on file  (cost=0.00..150565938.23 rows=43482932 width=8)
> (6 rows)
>
> Notice that PostgreSQL estimates the index scan on file_jobid_idx at a cost
> of 150,565,938 alone, whereas the entire sequential scan costs only
> 1,463,605. The seq scans are a clear winner in this case.
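
A side note on the enable_seqscan experiment above: that setting only affects the current session, so it is worth restoring the default afterwards, otherwise later queries in the same session will keep planning without seq scans. A minimal sketch:

```sql
-- enable_seqscan is a session-local planner setting; restore the
-- default after experimenting so subsequent queries plan normally.
RESET enable_seqscan;
-- or, equivalently:
SET enable_seqscan = on;
```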
>
> > MySQL should (in theory) use the multi-column index it already has
> > in make_mysql_tables.in:
> > CREATE TABLE File (...
> >     INDEX (JobId, PathId, FilenameId)
> >
> > but MySQL worked slowly.
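
One way to confirm whether MySQL actually picks that index is to run EXPLAIN on the same query (a sketch; the exact output columns vary by MySQL version):

```sql
-- MySQL's EXPLAIN prints one row per table; if the "key" column is
-- NULL for the File table, the (JobId, PathId, FilenameId) index was
-- not used for this query.
EXPLAIN SELECT File.FileId, Job.JobId
FROM File LEFT OUTER JOIN Job ON (File.JobId = Job.JobId)
WHERE Job.JobId IS NULL LIMIT 300000;
```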
>
> I don't know much about the innards of MySQL's query planner, but it
> doesn't seem to be as robust as Postgres'. Index scanning is usually
> the best way to go, but (as is shown above) not always. I have trouble
> believing that the MySQL developers always use available indexes as
> that would create a laundry list of corner cases where queries perform
> horribly, but it seems that their planning logic fails in this particular
> case.
>
> It's not unusual. If you mess with Postgres long enough, you'll find
> odd queries that perform poorly even though there's a much faster way
> to do it. And that's the rub. If I were writing a system purely for
> PostgreSQL, I could reorder joins or make other tweaks to the query
> until it ran quickly, but then I might have a query that ran like
> molasses on MySQL. The inverse is true as well.
>
> My point is that such work is definitely warranted in the director; I'm
> not sure it's warranted in something like dbcheck. If you've already
> got it done, there's no reason to waste it, but suggesting that someone
> review dbcheck and optimize the SQL is like asking me to clean toilets
> with a toothbrush -- it's a useful exercise to teach privates not to
> mouth off to sergeants, but it's not a particularly efficient way to get
> the toilet clean.
>
> I expect that, on Postgres, the fastest way to delete orphaned records
> would be:
> delete from filename
> where filename.filenameid not in (select filenameid from file);
>
> Assuming that is fast, would such a query even work on MySQL?
We (essentially Marc Cousin) have already done some optimizations for
PostgreSQL; you will find a pure-SQL version of dbcheck in
trunk/bacula/examples/database/dbcheck.sql.
It's *pretty* fast with PostgreSQL because the engine is able to do a merge
or hash join. MySQL uses a nested loop (a loop within a loop) instead and
needs indexes.
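
For what it's worth, the NOT IN query quoted above can also be expressed with NOT EXISTS, which both engines accept; MySQL's planner has historically coped better with a correlated NOT EXISTS than with a large NOT IN subquery. An untested sketch, using the table and column names from the Bacula schema:

```sql
-- Delete Filename rows no longer referenced by any File row.
-- Semantically equivalent to the NOT IN form (FilenameId is NOT NULL),
-- but phrased as an anti-join that both planners can optimize.
DELETE FROM Filename
WHERE NOT EXISTS (SELECT 1
                  FROM File
                  WHERE File.FilenameId = Filename.FilenameId);
```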
Bye
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel