Hi Tobias,

On 24/11/2011 16:20, Tobias Quathamer wrote:
> On 23.11.2011 03:00, David Prévot wrote:
>> Since isoquery has been rewritten in Python, it's a lot slower than it
>> used to be.
> I've done some profiling now and I couldn't find any bottlenecks which
> could be improved. The most time for ISO 639-3 takes the call to
> lxml.etree.parse

I forgot to mention that, in order to make the comparative test in the
initial bug report, I applied s/639-3/639/ to the isoquery calls (since
ISO 639-3 was not supported in the Lenny version).

> Are you sure that those additional five minutes needed for the build are
> coming from isoquery? If the system has been upgraded to Squeeze, maybe
> another program is the culprit.

The test was actually made on my Sid box, where I only downgraded
isoquery between the two tests, so it should rule out external
components.

> Anyway, I've been looking at the file in question (dtc.def) and tried to
> rewrite it a bit. With this new version, two calls to isoquery are
> removed from the source.

Wow, great idea, thanks a lot for digging into that!

> I think that this should speed up the build, if
> isoquery is really responsible for the additional time.
>
> Could you please try the patch? Unfortunately, I haven't had the chance
> to do so, because wml currently fails on my system.

Sure, I just had to fix a build issue, attached as fixdtc.diff (Perl
stuff included in wml makes it quite weird to debug ;-).

All the following tests were run on my Sid box, with s/639-3/639/ in
dtc.def and scripts/fix-files.sh. The build (“time make install”) is only
run in the webwml/english/international/l10n subdirectory (please note
that in order to build, it needs some data imported daily by the
lessoften cronjob [0]).

1) Up to date Sid, with Lenny's isoquery (isoquery 0.16-1):

real    15m10.163s
user    0m42.263s
sys     0m16.565s

2) Up to date Sid (isoquery 1.5-1):

real    20m15.055s
user    4m4.611s
sys     0m48.947s

3) Up to date Sid (isoquery 1.5-1) with your dtc.def patch:

real    17m47.653s
user    2m34.166s
sys     0m32.854s

3bis) I had even better results while hardcoding the few undefined ISO
codes, using take1.diff, but I would prefer not to go that way, because
some more undefined ISO codes may reach the archive in the future:

real    17m12.212s
user    2m7.224s
sys     0m26.382s

Since isoquery is also called in script/fix-file.sh, with the same crappy
logic, I fixed it following your lead, thanks again! Given the timing
results, the actual improvement does not seem really significant, but the
code is lighter now, so it's not a complete loss.

4) Up to date Sid (isoquery 1.5-1) with updated dtc.def and fix-file.sh:

real    17m33.876s
user    2m7.332s
sys     0m27.994s

Thanks to your enhancement, around two and a half minutes (fourteen
times: one build for each language) have been gained compared to the
current build (test #2). On the other hand, around two and a half minutes
have been lost compared with the previous version of isoquery (test #1).

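For anyone wanting to recheck those two differences, here is a minimal
Python sketch that recomputes them from the “real” times of tests 1, 2
and 4. The helper names (parse_real, delta) are mine and purely
illustrative, they are not part of any script in the tree:

```python
import re


def parse_real(value: str) -> float:
    """Convert a `time` duration such as '15m10.163s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# "real" wall-clock times copied from tests 1, 2 and 4 in this mail.
REAL = {
    "1": "15m10.163s",  # Lenny's isoquery 0.16-1
    "2": "20m15.055s",  # isoquery 1.5-1, unpatched
    "4": "17m33.876s",  # isoquery 1.5-1, patched dtc.def and fix-file.sh
}


def delta(slower: str, faster: str) -> float:
    """Seconds by which test `slower` exceeds test `faster`."""
    return round(parse_real(REAL[slower]) - parse_real(REAL[faster]), 3)


gained = delta("2", "4")  # saved by the patches, compared to test 2
lost = delta("4", "1")    # still slower than the old isoquery in test 1

print(f"gained vs. test 2: {gained:.3f}s")
print(f"lost vs. test 1:   {lost:.3f}s")
```

That gives roughly 161 s gained and 144 s lost, i.e. about 2m41s and
2m24s, which is where the “around two and a half minutes” comes from.
Keep in mind that each figure comes from a single run.
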
I wondered how much better it would have been if I had not written crappy
calls in the first place (in the end, not that different from #1):

5) Up to date Sid, with Lenny's isoquery (isoquery 0.16-1) with fixed
scripts:

real    15m4.842s
user    0m38.710s
sys     0m12.905s

6) Up to date Sid (isoquery 1.5-1) with fixed scripts, using ISO 639-3
(what I intend to commit once I've actually checked there is no
regression in the built pages):

real    17m58.035s
user    2m53.519s
sys     0m34.306s

The build was always done after a cleanup (“make clean”), which is why it
takes so long. scripts/gen-files.pl is only called once, not for every
language, so I should have run gen-files.pl before actually calling the
build :/ I'll have to think of a more reliable way to test those changes
(and build all languages in order to have more significant numbers to
compare: a single build for each test gives only an approximate idea).

0: http://anonscm.debian.org/gitweb/?p=debwww/cron.git;a=blob;f=lessoften;hb=HEAD

Regards

David

diff --git a/english/international/l10n/dtc.def b/english/international/l10n/dtc.def
index 14285c6..bf0df3e 100644
--- a/english/international/l10n/dtc.def
+++ b/english/international/l10n/dtc.def
@@ -154,8 +154,9 @@ sub language_name {
         $lang=$1;
         $country=$2;
     }
-    my $lang_fullname = chomp(`isoquery -i 639 $lang`);
-    if ($? == 0) {
+    my $lang_fullname = `isoquery -i 639 $lang 2>/dev/null`;
+    chomp $lang_fullname;
+    if ($lang_fullname != '') {
         $lang_fullname =~ s/^.*\t//;
         $lang_fullname = dgettext("iso_639_3", "$lang_fullname");
         # #624476 workaround: French typography expect languages to start with a lowercase
@@ -164,8 +165,9 @@ sub language_name {
         return qq(<Unknown_Language>);
     }
     if (defined $country) {
-        my $country_fullname = chomp(`isoquery -c $country`);
-        if ($? == 0) {
+        my $country_fullname = `isoquery -c $country 2>/dev/null`;
+        chomp $country_fullname;
+        if ($country_fullname != '') {
             $country_fullname =~ s/^.*\t//;
             $country_fullname = dgettext("iso_3166", "$country_fullname");
             return "<langcountryoutput $lang_fullname $country_fullname>";

diff --git a/english/international/l10n/dtc.def b/english/international/l10n/dtc.def
index 14285c6..9c7dc7f 100644
--- a/english/international/l10n/dtc.def
+++ b/english/international/l10n/dtc.def
@@ -154,8 +154,9 @@ sub language_name {
         $lang=$1;
         $country=$2;
     }
-    my $lang_fullname = chomp(`isoquery -i 639 $lang`);
-    if ($? == 0) {
+    if ($lang != 'cz' and $lang != 'frp' and $lang != 'hne' and $lang != 'pms' and $lang != 'sp'){
+        $lang_fullname = `isoquery -i 639 $lang`;
+        chomp $lang_fullname;
         $lang_fullname =~ s/^.*\t//;
         $lang_fullname = dgettext("iso_639_3", "$lang_fullname");
         # #624476 workaround: French typography expect languages to start with a lowercase
@@ -164,8 +165,9 @@ sub language_name {
         return qq(<Unknown_Language>);
     }
     if (defined $country) {
-        my $country_fullname = chomp(`isoquery -c $country`);
-        if ($? == 0) {
+        if ($country != 'FX' and $country != 'YU'){
+            my $country_fullname = `isoquery -c $country`;
+            chomp $country_fullname;
             $country_fullname =~ s/^.*\t//;
             $country_fullname = dgettext("iso_3166", "$country_fullname");
             return "<langcountryoutput $lang_fullname $country_fullname>";