On Wed, 2010-11-17 at 17:38 -0500, Leo Franchi wrote:
> On Tue, Nov 16, 2010 at 6:48 PM, Jeff Mitchell <mitch...@kde.org> wrote:
> > On 11/13/2010 03:07 PM, Leo Franchi wrote:
> >> Hello,
> >>
> >> Below are my observations too, just to see how other users' results compare.
> >>
> >> On Sat, Nov 13, 2010 at 4:06 AM, Mikko C. <mikko....@gmail.com> wrote:
> >>> Hi,
> >>> I found some time to run some tests with the new scanner.
> >>>
> >>> Amarok from git master of today:
> >>> Full rescan with the collection already being present on the external
> >>> MySQL database.
> >>>
> >>> - 11:30 mins for the first scanning part (up to 50% in the progress bar)
> >>> - 2:50 mins for the last part (remaining 50%)
> >>>
> >>> Total time: around 14:20 mins.
> >>>
> >>> tracks found: 21113
> >>> albums found: 1703
> >>> artists found: 1013
> >>
> >> Rescan with empty MySQL database:
> >>
> >> 11:00 amarokcollectionscanner run
> >> 16:00 scan result processing / committing
> >>
> >> total of 26:00
> >>
> >> 47 636 tracks.
> >>
> >> Old scanner:
> >>
> >> 11:30 total time for amarokcollectionscanner + committing.
> >
> > This is almost certainly due to the way that insertions and other DB
> > accesses were handled in the old scanning code.
> >
> > I did a lot of work doing everything I possibly could to minimize DB
> > calls, because they were by far the slowest part of the scanning,
> > other than actual I/O access on the drives. The end result was a lot of
> > really nasty data structures to be able to emulate the behavior of
> > running various SQL calls. These data structures would store all
> > information to be committed, and then this information would be
> > committed in one go, using the largest packet size possible. This made
> > it quite complex, yes -- but it made it extremely fast. You've probably
> > seen them before but see e.g.
> > http://jefferai.org/2009/07/db-changes-call-for-benchmarkers/ and
> > http://jefferai.org/2009/10/speed-never-gets-old-at-least-in-software/
> > and especially
> > http://jefferai.org/2009/11/the-collection-scanners-ultimate-speed-bump-and-cases/
> >
> > I haven't seen any proper query logs for the new scanner because when I
> > was last looking at them with Leo there were logic problems in the new
> > scanner that were keeping queries screwed up -- hopefully those have
> > been fixed. But I'm guessing from what I *did* see that each track uses
> > several database accesses -- an INSERT or two into various tables and
> > several SELECT or so queries. If so, this is going to be the big
> > bottleneck and the big reason for the slowdown.
>
> When I profiled the slowness of the new scan result processor, 95% of
> the time was spent in MySQL calls. Just wanted to underline Jeff's
> point. Thousands of SQL queries == bad, and all of Jeff's hard work
> making the scanner minimize how many SQL operations it did is not
> something to throw away lightly.
>
> I do hope and believe we can get the needed fixes into the current
> scanner before getting closer to the 2.4 betas. But if we get there and
> the scanner is still significantly worse for users with large
> collections (of which we have a lot!) we should revert to the old
> scanner until the issues are worked out.
>
> leo
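[Editor's aside: the batching principle Jeff describes -- buffering all rows in memory and committing them in one large statement instead of issuing one query per track -- can be sketched in a few lines. Amarok's actual scanner is C++ against MySQL; the sketch below only illustrates the idea using Python's sqlite3 module, and the table and column names are invented, not Amarok's schema.]

```python
import sqlite3
import time

# Illustrative only: per-row INSERTs vs. one buffered, batched commit.
# Table/columns are made up for the example, not Amarok's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (url TEXT, artist TEXT, album TEXT)")

rows = [("/music/%d.mp3" % i, "artist%d" % (i % 100), "album%d" % (i % 500))
        for i in range(10000)]

# Naive approach: one INSERT statement per track (what a scanner that
# queries the DB for every file ends up doing).
start = time.perf_counter()
for row in rows:
    conn.execute("INSERT INTO tracks VALUES (?, ?, ?)", row)
conn.commit()
naive = time.perf_counter() - start

conn.execute("DELETE FROM tracks")
conn.commit()

# Batched approach: buffer everything, then hand it over in one call.
start = time.perf_counter()
conn.executemany("INSERT INTO tracks VALUES (?, ?, ?)", rows)
conn.commit()
batched = time.perf_counter() - start

print("per-row: %.3fs  batched: %.3fs" % (naive, batched))
```

With a real client/server database such as MySQL the gap is far larger than in-process sqlite3 shows, because every per-row statement also pays a network round trip; that round-trip cost is what packing rows into the largest possible packet avoids.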
Hi all,

I agree that 16 minutes just for committing the data is too much.

My earlier tests with the scan result processor showed a time increase of
200% to 300%. Now this sounds bad, but my tests also showed that only 5%
of the time was spent in the database. A collection of 13000 files needs
58 seconds for a full scan on an existing collection and 93 seconds on an
empty one. Extrapolating to the 47000-file collection, it should take
around five to six minutes, which would be consistent with the 200% time
increase. So it seems that for large collections there is an additional
delay somewhere. I still assume that some kind of index buffer is growing
too big for memory, which would cause additional delays.

Let's just see what else I can do. There are a lot of options open before
we need to copy whole tables around. One of them would be for the
Registry to realize that it has all existing tracks already buffered;
from that point on it would no longer need to query for additional
tracks. Another would be to precompile queries, or to combine them as the
old scanner did. But first I would like to find out why the access time
increases so much. That might not only concern the scanning but would
also slow down every other operation.

Also, while we are at it: I am still thinking that we might commit the
changes as we go along. The only drawback would be that the abort button
near the progress bar would no longer abort, but instead just stop the
scanning. I think that this might not be a bad thing. The button does not
have any label, just a no-parking symbol. At least this "committing while
we scan" should be done for an empty collection. It would decrease the
time a new user has to wait until they can start using Amarok.

Cheers,
Ralf

_______________________________________________
Amarok-devel mailing list
Amarok-devel@kde.org
https://mail.kde.org/mailman/listinfo/amarok-devel
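[Editor's aside: Ralf's "committing while we scan" idea -- flushing scan results in fixed-size batches instead of one big commit at the end, so an abort keeps everything committed so far -- can be sketched as follows. This is a hypothetical illustration in Python/sqlite3, not Amarok's implementation; the names `commit_in_batches` and `BATCH_SIZE` are invented for the example.]

```python
import sqlite3

# Hypothetical sketch: flush scanned tracks in batches so partial
# results survive an abort. BATCH_SIZE and the schema are invented.
BATCH_SIZE = 1000

def commit_in_batches(conn, scanned_tracks):
    """Insert scanned tracks, committing every BATCH_SIZE rows.

    Aborting between batches keeps everything committed so far,
    which is exactly the trade-off Ralf describes: "abort" becomes
    "stop scanning" rather than "roll everything back".
    """
    batch = []
    for track in scanned_tracks:
        batch.append(track)
        if len(batch) >= BATCH_SIZE:
            conn.executemany("INSERT INTO tracks VALUES (?, ?, ?)", batch)
            conn.commit()  # earlier batches are now permanent
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany("INSERT INTO tracks VALUES (?, ?, ?)", batch)
        conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (url TEXT, artist TEXT, album TEXT)")
tracks = [("/music/%d.mp3" % i, "a", "b") for i in range(2500)]
commit_in_batches(conn, tracks)
print(conn.execute("SELECT COUNT(*) FROM tracks").fetchone()[0])  # -> 2500
```

For an empty collection this means a new user can start playing the first committed batch while the scan is still running, which is the benefit Ralf highlights.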