Edit report at http://bugs.php.net/bug.php?id=54089&edit=1

 ID:                 54089
 User updated by:    nicolas dot grekas+php at gmail dot com
 Reported by:        nicolas dot grekas+php at gmail dot com
 Summary:            token_get_all with regards to __halt_compiler is not
                     binary safe
-Status:             Assigned
+Status:             Open
 Type:               Bug
 Package:            Unknown/Other Function
 Operating System:   Any
 PHP Version:        5.3.5
 Assigned To:        iliaa
 Block user comment: N
 Private report:     N

 New Comment:

Really, the current patch is a step backward: I can no longer do things
that were easy before (getting the halt_compiler_offset with
token_get_all())...

Please consider reverting it!


Previous Comments:
------------------------------------------------------------------------
[2011-03-03 15:44:33] nicolas dot grekas+php at gmail dot com

Sorry to reopen. As 5.3.6 is in RC, I just want to be sure my previous
comment has been read. What about reverting the patch?

------------------------------------------------------------------------
[2011-03-01 10:15:47] nicolas dot grekas+php at gmail dot com

Thanks for the patch. After reading it, I'm not sure it really helps,
considering that stopping at T_HALT_COMPILER was already easy to do in
plain PHP. In fact, it may be worse: if I now want to access the data
after T_HALT_COMPILER in PHP using the tokenizer, I have to write more
code, because that data is missing from the token array.



Also, as a corner case, note that __halt_compiler is always followed by
three valid tokens: "(", ")", then ";" or T_CLOSE_TAG, with any number of
T_WHITESPACE/T_COMMENT/T_DOC_COMMENT tokens in between.
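
To make this rule concrete, here is a minimal sketch of how the halt
compiler offset can be derived from the token array in plain PHP.
find_halt_compiler_offset() is a hypothetical helper name, not part of
PHP. The sketch assumes that every token up to the end of the
__halt_compiler statement reproduces the source byte for byte, which
should hold when that part of the file is valid PHP.

```php
<?php

// Hypothetical helper (not part of PHP): returns the byte offset of the
// data that follows the __halt_compiler statement in $code, or false if
// no complete statement is found.
function find_halt_compiler_offset($code)
{
    $offset    = 0;    // bytes covered by the tokens seen so far
    $remaining = null; // significant tokens still expected after T_HALT_COMPILER

    foreach (@token_get_all($code) as $token) {
        $type = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;
        $offset += strlen($text);

        if (T_HALT_COMPILER === $type) {
            // "(", ")" and then ";" or T_CLOSE_TAG must follow
            $remaining = 3;
        } elseif (null !== $remaining
            && T_WHITESPACE !== $type
            && T_COMMENT !== $type
            && T_DOC_COMMENT !== $type
            && 0 === --$remaining
        ) {
            // Third significant token consumed: the raw data starts here
            return $offset;
        }
    }

    return false;
}

// Test string taken from the original report
$code   = "<?php __halt_compiler();\x01?>\x02";
$offset = find_halt_compiler_offset($code);

// Read the trailing data from $code itself, not from the token array
$data = false === $offset ? '' : (string) substr($code, $offset);
```

For the test string from the original report the offset is 24, the
length of "<?php __halt_compiler();", and $data holds the four raw bytes
that follow. Because the data is sliced out of $code directly, it does
not matter how the tokenizer treats the bytes after the statement.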



My view is that this "bug" can be fixed by introducing a new
T_UNEXPECTED_CHARACTER token type, matching those "Unexpected character
in input" warnings: this would make token_get_all() binary safe. Is
it a good idea? I don't know whether it is difficult to implement, or
whether it would introduce a BC break, so maybe a "Won't fix" on this
bug is enough?



Could the patch be reverted? I'm afraid that is the best option for
tokenizer users...

Here is what I was using before the patch to work around this binary
incompatibility:



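<?php

// $code is assumed to hold the PHP source being tokenized.

// New token matching an "Unexpected character in input"
define('T_UNEXPECTED_CHARACTER', -1);

$src_tokens = @token_get_all($code);
$bin_tokens = array();
$offset =  0;
$i      = -1;

while (isset($src_tokens[++$i]))
{
        // Token text: element 1 of an array token, or the string token itself
        $t = is_array($src_tokens[$i]) ? $src_tokens[$i][1] : $src_tokens[$i];

        // Re-insert any characters the tokenizer dropped before this token
        while (isset($code[$offset]) && $t[0] !== $code[$offset])
                $bin_tokens[] = array(T_UNEXPECTED_CHARACTER, $code[$offset++]);

        $offset += strlen($t);
        $bin_tokens[] = $src_tokens[$i];
        unset($src_tokens[$i]);
}

// Re-insert any characters dropped after the last token
while (isset($code[$offset]))
        $bin_tokens[] = array(T_UNEXPECTED_CHARACTER, $code[$offset++]);

// Here, $bin_tokens contains binary safe tokens

?>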

------------------------------------------------------------------------
[2011-02-28 16:18:35] il...@php.net

This bug has been fixed in SVN.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.



------------------------------------------------------------------------
[2011-02-28 16:18:28] il...@php.net

Automatic comment from SVN on behalf of iliaa
Revision: http://svn.php.net/viewvc/?view=revision&revision=308761
Log: Fixed bug #54089 (token_get_all() does not stop after
__halt_compiler).

------------------------------------------------------------------------
[2011-02-24 13:16:17] nicolas dot grekas+php at gmail dot com

Description:
------------
A. token_get_all() drops characters that are not allowed in plain
PHP code and that trigger an "Unexpected character in input" warning.



B. After a T_HALT_COMPILER, the remaining input is still tokenized as
if it were PHP code, when in reality it is just arbitrary data.



So, when using token_get_all() on code that contains a T_HALT_COMPILER,
the data after it is corrupted because of A.

Test script:
---------------
<?php

$code = "<?php __halt_compiler();\x01?>\x02";

$tokens = token_get_all($code);
$reconstructed_code = '';

foreach ($tokens as $t)
{
        $reconstructed_code .= isset($t[1]) ? $t[1] : $t;
}

var_dump($code);
var_dump($reconstructed_code);



Expected result:
----------------
string(28) "<?php __halt_compiler();?>"

string(28) "<?php __halt_compiler();?>"



Actual result:
--------------
PHP Warning:  Unexpected character in input:  '' (ASCII=1) state=0 on
line 5

string(28) "<?php __halt_compiler();?>"

string(27) "<?php __halt_compiler();?>"




------------------------------------------------------------------------


