On 11/13/2010 03:07 PM, Leo Franchi wrote:
> Hello,
>
> Below are my observations too, just to see if other users' compare.
>
> On Sat, Nov 13, 2010 at 4:06 AM, Mikko C. <mikko....@gmail.com> wrote:
>> Hi,
>> I found some time to run some tests with the new scanner.
>>
>> Amarok from git master of today:
>> Full rescan with the collection already being present on the external
>> MySQL database.
>>
>> - 11:30 mins for the first scanning part (up to 50% in the progress bar)
>> - 2:50 mins for the last part (remaining 50%)
>>
>> Total time: around 14:20 mins.
>>
>> tracks found: 21113
>> albums found: 1703
>> artists found: 1013
>
> Rescan with empty mysql database:
>
> 11:00 amarokcollectionscanner run
> 16:00 scan result processing / committing
>
> total of 26:00
>
> 47 636 tracks.
>
> Old scanner:
>
> 11:30 total time for amarokcollectionscanner + committing.
This is almost certainly due to the way that insertions and other DB accesses were handled in the old scanning code. I did a lot of work to minimize DB calls in every way I possibly could, because they were by far the slowest part of scanning, other than the actual I/O access on the drives.

The end result was a lot of really nasty data structures that emulated the behavior of running various SQL calls. These data structures stored all of the information to be committed, and that information was then committed in one go, using the largest packet size possible. This made it quite complex, yes -- but it made it extremely fast.

You've probably seen them before, but see e.g.
http://jefferai.org/2009/07/db-changes-call-for-benchmarkers/
and
http://jefferai.org/2009/10/speed-never-gets-old-at-least-in-software/
and especially
http://jefferai.org/2009/11/the-collection-scanners-ultimate-speed-bump-and-cases/

I haven't seen any proper query logs for the new scanner, because when I was last looking at them with Leo there were logic problems in the new scanner that were keeping the queries screwed up -- hopefully those have been fixed. But I'm guessing from what I *did* see that each track uses several database accesses -- an INSERT or two into various tables and several SELECT queries. If so, this is going to be the big bottleneck and the big reason for the slowdown.

--Jeff
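For illustration only, the buffer-then-commit idea described above could be sketched roughly like this. This is not the Amarok scanner code: it uses Python with an in-memory SQLite database instead of MySQL, the two-table schema is made up, and `MAX_PACKET_BYTES`, `chunk_rows`, `commit_buffered` and `make_db` are hypothetical names. The byte budget only stands in for MySQL's `max_allowed_packet`. The point is that the artist lookup happens in a dictionary instead of a per-track SELECT, and the rows go out in a handful of multi-row INSERTs inside one transaction.

```python
import sqlite3

# Hypothetical byte budget standing in for MySQL's max_allowed_packet.
MAX_PACKET_BYTES = 1024
# Keeps the bound-parameter count under SQLite's default per-statement limit.
MAX_ROWS_PER_STATEMENT = 400


def chunk_rows(rows, max_bytes=MAX_PACKET_BYTES, max_rows=MAX_ROWS_PER_STATEMENT):
    """Yield lists of rows whose estimated encoded size fits in max_bytes."""
    chunk, size = [], 0
    for row in rows:
        row_size = sum(len(str(value).encode("utf-8")) for value in row)
        if chunk and (size + row_size > max_bytes or len(chunk) >= max_rows):
            yield chunk
            chunk, size = [], 0
        chunk.append(row)
        size += row_size
    if chunk:
        yield chunk


def commit_buffered(conn, scanned):
    """Buffer (artist, title) pairs in memory, then commit them in one go.

    Returns the number of INSERT statements that were sent.
    """
    artist_ids = {}  # replaces a per-track "SELECT id FROM artists WHERE name = ?"
    track_rows = []
    for artist, title in scanned:
        artist_id = artist_ids.setdefault(artist, len(artist_ids) + 1)
        track_rows.append((title, artist_id))

    batches = (
        ("artists", "(id, name)", [(i, name) for name, i in artist_ids.items()]),
        ("tracks", "(title, artist_id)", track_rows),
    )
    statements = 0
    with conn:  # a single transaction for the whole commit
        for table, columns, rows in batches:
            for chunk in chunk_rows(rows):
                placeholders = ", ".join(["(?, ?)"] * len(chunk))
                flat = [value for row in chunk for value in row]
                conn.execute(
                    f"INSERT INTO {table} {columns} VALUES {placeholders}", flat
                )
                statements += 1
    return statements


def make_db():
    """Create the made-up two-table schema used by this sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    conn.execute("CREATE TABLE tracks (title TEXT, artist_id INTEGER)")
    return conn
```

With a per-track approach, the same data would need at least one SELECT and one INSERT per track, so the statement count grows with the collection size rather than with the number of packets.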
_______________________________________________
Amarok-devel mailing list
Amarok-devel@kde.org
https://mail.kde.org/mailman/listinfo/amarok-devel