Hi Tobias,

On 24/11/2011 16:20, Tobias Quathamer wrote:
> Am 23.11.2011 03:00, schrieb David Prévot:
>> Since isoquery has been rewritten in Python, it's a lot slower than it
>> used to be.

> I've done some profiling now and I couldn't find any bottlenecks which
> could be improved. For ISO 639-3, most of the time is spent in the call
> to lxml.etree.parse.

I forgot to mention that, in order to make the comparative test in the
initial bug report, I made a s/639-3/639/ for the isoquery calls (since
ISO 639-3 was not supported in the Lenny version).

> Are you sure that those additional five minutes needed for the build are
> coming from isoquery? If the system has been upgraded to Squeeze, maybe
> another program is the culprit.

The test was actually made on my Sid box where I only downgraded
isoquery between the two tests, so it should rule out external components.

> Anyway, I've been looking at the file in question (dtc.def) and tried to
> rewrite it a bit. With this new version, two calls to isoquery are
> removed from the source.

Wow, great idea, thanks a lot for digging into that!

> I think that this should speed up the build, if
> isoquery is really responsible for the additional time.
> 
> Could you please try the patch? Unfortunately, I haven't had the chance
> to do so, because wml currently fails on my system.

Sure, I just had to fix a build issue, attached as fixdtc.diff (Perl
stuff included in wml makes it quite weird to debug ;-).

All the following tests are made on my Sid box, with s/639-3/639/ in
dtc.def and scripts/fix-files.sh. The build (“time make install”) is
only made in the webwml/english/international/l10n subdirectory (please
note that in order to build, it needs some data imported daily by the
lessoften cronjob [0]).

1) Up to date Sid, with Lenny's isoquery (isoquery 0.16-1):

real    15m10.163s
user    0m42.263s
sys     0m16.565s


2) Up to date Sid (isoquery 1.5-1):

real    20m15.055s
user    4m4.611s
sys     0m48.947s


3) Up to date Sid (isoquery 1.5-1) with your dtc.def patch:

real    17m47.653s
user    2m34.166s
sys     0m32.854s


3bis) I got even better results by hardcoding the few undefined ISO
codes (take1.diff), but I would prefer not to go that way, because more
undefined ISO codes may reach the archive in the future:

real    17m12.212s
user    2m7.224s
sys     0m26.382s


Since isoquery is also called in scripts/fix-file.sh, with the same
crappy logic, I fixed it following your lead, thanks again! Given the
timings, the actual improvement does not seem very significant, but the
code is lighter now, so it's not a complete loss.
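
The idea behind both changes (run the external command once, then decide
from its captured output instead of calling it a second time) can be
sketched as follows. This is only an illustration in Python, not the
code from dtc.def: lookup_name and the stand-in command are hypothetical
names, and the only thing taken from the patch is that the name is what
follows the last tab of the output.

```python
import subprocess
import sys


def lookup_name(command):
    """Run `command` once; return the last tab-separated field, or None."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # same effect as 2>/dev/null
        text=True,
    )
    line = result.stdout.strip()
    if not line:
        # Empty output is treated as "unknown code": no second call needed.
        return None
    return line.split("\t")[-1]


# Stand-in commands (assumption): they only mimic a tab-separated answer
# and an empty answer, so the sketch runs without isoquery installed.
known = [sys.executable, "-c", "print('fr\\t\\tfr\\tFrench')"]
unknown = [sys.executable, "-c", "pass"]

print(lookup_name(known))
print(lookup_name(unknown))
```

Each lookup then costs one process start and one XML parse instead of
two, which is where the saving should come from.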

4) Up to date Sid (isoquery 1.5-1) with updated dtc.def and fix-file.sh:

real    17m33.876s
user    2m7.332s
sys     0m27.994s

Thanks to your enhancement, around two and a half minutes (times
fourteen: one build for each language) have been won compared to the
current build (test #2). On the other hand, around two and a half
minutes are still lost compared with the previous version of isoquery
(test #1).
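
For reference, the differences can be recomputed from the "real" values
of tests #1, #2 and #4 quoted in this message. A small sketch (the
helper name to_seconds is only for illustration):

```python
def to_seconds(stamp):
    """Convert a `time` value such as '20m15.055s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# "real" values copied from tests #1, #2 and #4.
real = {
    1: "15m10.163s",  # Lenny's isoquery 0.16-1
    2: "20m15.055s",  # isoquery 1.5-1, current scripts
    4: "17m33.876s",  # isoquery 1.5-1, updated dtc.def and fix-file.sh
}

gained = to_seconds(real[2]) - to_seconds(real[4])
lost = to_seconds(real[4]) - to_seconds(real[1])

print(f"won compared to #2:  {gained:.1f}s")   # 161.2s, about 2m41s
print(f"lost compared to #1: {lost:.1f}s")     # 143.7s, about 2m24s
```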


I wondered how much better it would have been if I had not written
crappy calls in the first place (not that different from #1 in the end):

5) Up to date Sid, with Lenny's isoquery (isoquery 0.16-1) with fixed
scripts:

real    15m4.842s
user    0m38.710s
sys     0m12.905s



6) Up to date Sid (isoquery 1.5-1) with fixed scripts, using ISO 639-3
(what I intend to commit once I've actually checked there is no
regression in the built pages):

real    17m58.035s
user    2m53.519s
sys     0m34.306s



The build was always done after a cleanup (“make clean”), which is why
it takes so long. Since scripts/gen-files.pl is only called once, not
for every language, I should have run gen-files.pl before actually
calling the build :/ I'll have to think of a more reliable way to test
those changes (and build all languages in order to have more significant
numbers to compare: a single build for each test gives only an
approximate idea).
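
One possible way to get more comparable numbers would be to repeat each
measurement and look at the mean and the spread rather than at a single
run. A minimal sketch, assuming the build can be wrapped in a callable;
benchmark, summarize and fake_build are hypothetical names, and
fake_build is only a stand-in workload, not the real build:

```python
import statistics
import time


def benchmark(task, runs=5):
    """Run `task` several times; return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (mean, sample standard deviation); needs at least two runs."""
    return statistics.mean(durations), statistics.stdev(durations)


def fake_build():
    # Stand-in workload (assumption); the real task would run the build.
    sum(range(100_000))


samples = benchmark(fake_build, runs=5)
mean, spread = summarize(samples)
print(f"{len(samples)} runs: mean {mean:.4f}s, stdev {spread:.4f}s")
```

If the spread between runs is of the same order as the difference
between two variants, the difference cannot really be trusted.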

0:
http://anonscm.debian.org/gitweb/?p=debwww/cron.git;a=blob;f=lessoften;hb=HEAD

Regards

David
diff --git a/english/international/l10n/dtc.def b/english/international/l10n/dtc.def
index 14285c6..bf0df3e 100644
--- a/english/international/l10n/dtc.def
+++ b/english/international/l10n/dtc.def
@@ -154,8 +154,9 @@ sub language_name {
                 $lang=$1;
                 $country=$2;
         }
-	my $lang_fullname = chomp(`isoquery -i 639 $lang`);
-	if ($? == 0) {
+	my $lang_fullname = `isoquery -i 639 $lang 2>/dev/null`;
+	chomp $lang_fullname;
+	if ($lang_fullname ne '') {
 		$lang_fullname =~ s/^.*\t//;
 		$lang_fullname = dgettext("iso_639_3", "$lang_fullname");
 		# #624476 workaround: French typography expect languages to start with a lowercase
@@ -164,8 +165,9 @@ sub language_name {
                 return qq(<Unknown_Language>);
         }
 	if (defined $country) {
-		my $country_fullname = chomp(`isoquery -c $country`);
-		if ($? == 0) {
+		my $country_fullname = `isoquery -c $country 2>/dev/null`;
+		chomp $country_fullname;
+		if ($country_fullname ne '') {
 			$country_fullname =~ s/^.*\t//;
 			$country_fullname = dgettext("iso_3166", "$country_fullname");
 			return "<langcountryoutput $lang_fullname $country_fullname>";
diff --git a/english/international/l10n/dtc.def b/english/international/l10n/dtc.def
index 14285c6..9c7dc7f 100644
--- a/english/international/l10n/dtc.def
+++ b/english/international/l10n/dtc.def
@@ -154,8 +154,9 @@ sub language_name {
                 $lang=$1;
                 $country=$2;
         }
-	my $lang_fullname = chomp(`isoquery -i 639 $lang`);
-	if ($? == 0) {
+	if ($lang ne 'cz' and $lang ne 'frp' and $lang ne 'hne' and $lang ne 'pms' and $lang ne 'sp'){
+		$lang_fullname = `isoquery -i 639 $lang`;
+		chomp $lang_fullname;
 		$lang_fullname =~ s/^.*\t//;
 		$lang_fullname = dgettext("iso_639_3", "$lang_fullname");
 		# #624476 workaround: French typography expect languages to start with a lowercase
@@ -164,8 +165,9 @@ sub language_name {
                 return qq(<Unknown_Language>);
         }
 	if (defined $country) {
-		my $country_fullname = chomp(`isoquery -c $country`);
-		if ($? == 0) {
+		if ($country ne 'FX' and $country ne 'YU'){
+			my $country_fullname = `isoquery -c $country`;
+			chomp $country_fullname;
 			$country_fullname =~ s/^.*\t//;
 			$country_fullname = dgettext("iso_3166", "$country_fullname");
 			return "<langcountryoutput $lang_fullname $country_fullname>";
