See attached patch improvement. Thanks.
-- G
--- Begin Message ---
Title: Re: xgettext: problems with PHP heredocs

Gaëtan Frenoy wrote:
> Note that this problem was already reported a while ago
> in gnu.utils.bug
> (see http://groups.google.be/group/gnu.utils.bug/tree/browse_frm/thread/94bed260eb3dde71/e8c679cb94fb3b8a)

Thanks for re-reporting it; the original reporter had not provided a complete
testcase.

> But so far, I did not find any fix.
>
> First let's describe the problem. Say you have the
> following PHP source file.
>
> -------- bug_heredoc.php -------------------------- start -
> <?php
>
> $foo = _("Bar");
> echo <<<_END_
> [...]
> <script language="javascript">
> [...]
> </script>
> _END_;
> $foo2 = _("Bar2");
>
> ?>
> -------- bug_heredoc.php ---------------------------- end -
>
> Now, run "xgettext" to extract marked strings :
>
> -- xgettext call ---------------------------------- start -
> $ xgettext -L PHP --omit-header heredoc_bug.php -o -
> #: heredoc_bug.php:3
> msgid "Bar"
> msgstr ""
> -- xgettext call ------------------------------------ end -
>
> Surprisingly, "Bar2" is not reported, nor are any of the marked
> strings located after the end of the PHP heredoc section.
>
> If you are not familiar with PHP, here are some words
> about Heredoc syntax:
> http://www.php.net/manual/en/language.types.string.php#language.types.string.syntax.heredoc

You're right; this is a bug in xgettext. Thanks also for the syntax
reference.

> By digging into the code, I found the following fix
> for gettext-tools/src/x-php.c :
>
> -- x-php.c patch --------------------------------- start -
> $ diff -abu x-php.orig.c x-php.c
> --- x-php.orig.c 2003-12-30 12:30:01.000000000 +0100
> +++ x-php.c 2006-05-04 19:14:43.434424200 +0200
> @@ -1087,12 +1087,18 @@
> {
> int bufidx = 0;
>
> + /* Skip blank lines before processing
> + * possible label */
> + do
> + c = phase1_getc ();
> + while (c != EOF && (c == '\n' || c == '\r'));
> +
> while (bufidx < bufpos)
> {
> c = phase1_getc ();
> if (c == EOF)
> break;
> - if (c != buffer[bufidx])
> + if (c != buffer[bufidx++])
> {
> phase1_ungetc (c);
> break;
> -- x-php.c patch ----------------------------------- end -
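
One way to see a weakness in the blank-line skip quoted above is the small
sketch below. It is not code from x-php.c: match_after_skip and its
push_back flag are hypothetical names, and a plain string stands in for the
phase1_getc stream. It assumes that a character read by the skip loop is not
pushed back, which is how the quoted patch reads. Under that assumption the
first character of the label line is consumed before the comparison starts.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: mimics the patched logic on a plain string.
   Returns how many characters of `label` were matched.  With push_back == 0
   the character that ended the skip loop is lost, as in the quoted patch;
   with push_back != 0 it is put back before matching begins.  */
static size_t
match_after_skip (const char *text, const char *label, int push_back)
{
  size_t pos = 0;
  size_t idx = 0;
  size_t len = strlen (label);
  int c;

  /* Skip blank lines: this reads one character too many.  */
  do
    c = text[pos] != '\0' ? (unsigned char) text[pos++] : EOF;
  while (c != EOF && (c == '\n' || c == '\r'));

  if (push_back && c != EOF)
    pos--;

  /* Compare the following characters against the label.  */
  while (idx < len)
    {
      c = text[pos] != '\0' ? (unsigned char) text[pos++] : EOF;
      if (c == EOF || c != (unsigned char) label[idx])
        break;
      idx++;
    }
  return idx;
}
```

Calling match_after_skip ("\nEOTMARKER\n", "EOTMARKER", 0) matches no
characters, because the 'E' was already consumed by the skip loop, while the
same call with push_back set to 1 matches the whole label.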
Thanks for the patch; the missing bufidx increment is indeed half of the
bug. However, your fix of the blank lines bug does not work well. Try for
example the input file

======================= foo.php =======================
<?
echo _("Egyptians");
echo <<<EOTMARKER
Ramses
EOTMARKER;
echo _("Babylonians");
echo <<<EOTMARKER
Nebukadnezar
EOTMARKER
echo _("Assyrians");
echo <<<EOTMARKER
Assurbanipal
EOT
echo _("Persians");
echo <<<EOTMARKER
Darius

echo _("Greeks");
echo <<<EOTMARKER
Alexander

EOTMARKER
echo _("Romans");
echo <<<EOTMARKER
Augustus
EOTMARKER
echo _("Goths");
echo <<<EOTMARKER
Odoakar
EOTMARKER
echo _("Franks");
?>
===============================================

The expected xgettext output here is:
===============================================
msgid "Egyptians"
msgstr ""

msgid "Babylonians"
msgstr ""

msgid "Assyrians"
msgstr ""

msgid "Romans"
msgstr ""

msgid "Franks"
msgstr ""
===============================================

I'm using the appended patch.
*** gettext-0.14.5/gettext-tools/src/x-php.c.bak 2005-05-20 22:46:40.000000000 +0200
--- gettext-0.14.5/gettext-tools/src/x-php.c 2006-05-12 03:42:58.000000000 +0200
***************
*** 1097,1109 ****
phase1_ungetc (c);
break;
}
}
! c = phase1_getc ();
! if (c != ';')
! phase1_ungetc (c);
! c = phase1_getc ();
! if (c == '\n' || c == '\r')
! break;
}
}
--- 1097,1113 ----
phase1_ungetc (c);
break;
}
+ bufidx++;
}
! if (bufidx == bufpos)
! {
! c = phase1_getc ();
! if (c != ';')
! phase1_ungetc (c);
! c = phase1_getc ();
! if (c == '\n' || c == '\r')
! break;
! }
}
}
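
The matching logic after this patch can be sketched on its own, outside
xgettext. This is only an illustration: struct input, in_getc, in_ungetc and
skip_heredoc are hypothetical names, and a string with a cursor stands in for
the phase1_getc / phase1_ungetc stream. The loop advances the label index on
every matching character and looks for the optional ';' and the line end only
when the whole label has matched.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the xgettext input stream.  */
struct input
{
  const char *text;
  size_t pos;
};

static int
in_getc (struct input *in)
{
  unsigned char c = (unsigned char) in->text[in->pos];
  if (c == '\0')
    return EOF;
  in->pos++;
  return c;
}

static void
in_ungetc (struct input *in, int c)
{
  if (c != EOF)
    in->pos--;
}

/* Skip a heredoc body.  The cursor starts just after "<<<LABEL", before the
   newline.  Returns 1 if a terminator line was found, 0 at end of input.  */
static int
skip_heredoc (struct input *in, const char *label)
{
  size_t len = strlen (label);

  assert (len > 0);
  for (;;)
    {
      int c = in_getc (in);
      if (c == EOF)
        return 0;
      if (c == '\n' || c == '\r')
        {
          size_t idx = 0;

          /* Try to match the label at the start of the next line.  */
          while (idx < len)
            {
              c = in_getc (in);
              if (c == EOF)
                break;
              if (c != (unsigned char) label[idx])
                {
                  in_ungetc (in, c);
                  break;
                }
              idx++;
            }
          /* Only a complete label may end the heredoc.  */
          if (idx == len)
            {
              c = in_getc (in);
              if (c != ';')
                in_ungetc (in, c);
              c = in_getc (in);
              if (c == '\n' || c == '\r')
                return 1;
            }
        }
    }
}
```

With label "EOTMARKER", a line containing only "EOT" does not end the
heredoc in this sketch, and a blank line directly before the terminator is
handled because the unmatched newline is pushed back and read again by the
outer loop.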
Bruno
--- End Message ---