 ID:                 54089
 User updated by:    nicolas dot grekas+php at gmail dot com
 Reported by:        nicolas dot grekas+php at gmail dot com
 Summary:            token_get_all with regards to __halt_compiler is not binary safe
-Status:             Assigned
+Status:             Open
 Type:               Bug
 Package:            Unknown/Other Function
 Operating System:   Any
 PHP Version:        5.3.5
 Assigned To:        iliaa
 Block user comment: N
 Private report:     N

 New Comment:

Really, the current patch is a step backward: I can no longer do things that were easy before (getting the halt_compiler_offset with token_get_all). Please consider reverting it!


Previous Comments:
------------------------------------------------------------------------
[2011-03-03 15:44:33] nicolas dot grekas+php at gmail dot com

Sorry to reopen. As 5.3.6 is in RC, I just want to be sure my previous comment has been read. What about reverting the patch?

------------------------------------------------------------------------
[2011-03-01 10:15:47] nicolas dot grekas+php at gmail dot com

Thanks for the patch. After reading it, I'm not sure it really helps, considering that stopping at T_HALT_COMPILER was already easy to do in plain PHP.

In fact, it may be worse: if I now want to access the data after T_HALT_COMPILER in PHP using the tokenizer, I have to write more code, because that data is missing from the token array.

Also, as a corner case, __halt_compiler is always followed by 3 valid tokens: "(", ")", then ";" or T_CLOSE_TAG, with any number of T_WHITESPACE/T_COMMENT/T_DOC_COMMENT tokens in between.

My view is that this "bug" could be fixed by introducing a new T_UNEXPECTED_CHARACTER token type, matching those "Unexpected character in input" warnings: this would make token_get_all binary safe. Is it a good idea? I don't know whether it's difficult to implement, nor whether it would introduce any BC break, so maybe a "Won't fix" on this bug is enough?

Could the patch be reverted? I'm afraid that's best for tokenizer users...
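The three-token rule from the 2011-03-01 comment is enough to locate the raw data after __halt_compiler in plain PHP. The sketch below is illustrative only: `halt_compiler_data_offset()` is a hypothetical helper, it assumes PHP 7.1 or later for the type declarations, and it assumes that token_get_all() still returns the "(", ")" and ";" (or "?>") tokens and drops no bytes before them. If token_get_all() stops immediately after T_HALT_COMPILER, those tokens are missing and the helper returns null.

```php
<?php
// Hypothetical helper (not part of ext/tokenizer): returns the byte offset at
// which the raw data after __halt_compiler starts, or null if it cannot tell.
// Assumes the tokens before the data reproduce the source byte for byte.
function halt_compiler_data_offset(string $code): ?int
{
    $tokens = @token_get_all($code);
    $offset = 0;      // bytes of $code covered by the tokens seen so far
    $expected = null; // tokens still required after T_HALT_COMPILER

    foreach ($tokens as $token) {
        $text = is_array($token) ? $token[1] : $token;
        $offset += strlen($text);

        if ($expected === null) {
            // Still looking for __halt_compiler itself.
            if (is_array($token) && $token[0] === T_HALT_COMPILER) {
                $expected = ['(', ')', ';'];
            }
            continue;
        }

        // Any number of whitespace and comment tokens may appear in between.
        if (is_array($token) && in_array($token[0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true)) {
            continue;
        }

        $want = array_shift($expected);
        // The last token may be ";" or a closing tag "?>".
        $ok = $token === $want
            || ($want === ';' && is_array($token) && $token[0] === T_CLOSE_TAG);
        if (!$ok) {
            return null; // malformed __halt_compiler(); sequence
        }
        if ($expected === []) {
            return $offset; // data starts right after ";" or "?>"
        }
    }

    return null; // no complete __halt_compiler(); sequence found
}
```

For the test script's input, on a tokenizer that still returns those three tokens, the offset is 24, so `substr($code, 24)` is the four-byte payload `"\x01?>\x02"`.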
Here is what I was using before the patch to work around this lack of binary safety:

<?php

// New token matching an "Unexpected character in input"
define('T_UNEXPECTED_CHARACTER', -1);

$src_tokens = @token_get_all($code);
$bin_tokens = array();
$offset = 0;
$i = -1;

while (isset($src_tokens[++$i]))
{
    // Token text: element [1] for array tokens, the string itself otherwise
    $t = isset($src_tokens[$i][1]) ? $src_tokens[$i][1] : $src_tokens[$i];

    // Every byte skipped by token_get_all() becomes its own token
    while ($t[0] !== $code[$offset])
        $bin_tokens[] = array(T_UNEXPECTED_CHARACTER, $code[$offset++]);

    $offset += strlen($t);
    $bin_tokens[] = $src_tokens[$i];
    unset($src_tokens[$i]);
}

// Here, $bin_tokens contains binary safe tokens

?>

------------------------------------------------------------------------
[2011-02-28 16:18:35] il...@php.net

This bug has been fixed in SVN.

Snapshots of the sources are packaged every three hours; this change will be in the next snapshot. You can grab the snapshot at http://snaps.php.net/.

Thank you for the report, and for helping us make PHP better.

------------------------------------------------------------------------
[2011-02-28 16:18:28] il...@php.net

Automatic comment from SVN on behalf of iliaa
Revision: http://svn.php.net/viewvc/?view=revision&revision=308761
Log: Fixed bug #54089 (token_get_all() does not stop after __halt_compiler).

------------------------------------------------------------------------
[2011-02-24 13:16:17] nicolas dot grekas+php at gmail dot com

Description:
------------
A. token_get_all() drops some characters that are not allowed in plain PHP code and triggers an "Unexpected character in input" warning.

B. After T_HALT_COMPILER, the remaining input is still tokenized as if it were PHP code, when in reality it is arbitrary data.

So, when token_get_all() is used on code that contains T_HALT_COMPILER, the data after it is corrupted because of A.
Test script:
---------------
<?php

$code = "<?php __halt_compiler();\x01?>\x02";
$tokens = token_get_all($code);
$reconstructed_code = '';

foreach ($tokens as $t)
{
    $reconstructed_code .= isset($t[1]) ? $t[1] : $t;
}

var_dump($code);
var_dump($reconstructed_code);

Expected result:
----------------
string(28) "<?php __halt_compiler();\x01?>\x02"
string(28) "<?php __halt_compiler();\x01?>\x02"

Actual result:
--------------
PHP Warning: Unexpected character in input: '\x01' (ASCII=1) state=0 on line 5
string(28) "<?php __halt_compiler();\x01?>\x02"
string(27) "<?php __halt_compiler();?>\x02"

(In the results above, \x01 and \x02 stand for the raw control bytes, which var_dump() prints verbatim.)
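The expected and actual results are hard to compare because var_dump() prints the control bytes invisibly. A small diagnostic can escape those bytes and report where the rebuilt string first diverges from the source. This is a sketch, not part of the original report: `rebuild_from_tokens()`, `visible()` and `first_difference()` are hypothetical helper names, and it assumes PHP 7.1 or later. What it prints depends on the PHP version, since the tokenizer's handling of __halt_compiler changed with the patch discussed in this thread.

```php
<?php
// Concatenate the text of every token returned by token_get_all().
function rebuild_from_tokens(array $tokens): string
{
    $out = '';
    foreach ($tokens as $t) {
        $out .= is_array($t) ? $t[1] : $t;
    }
    return $out;
}

// Escape control and high bytes as C-style octal (e.g. \001) so they show up.
function visible(string $s): string
{
    return addcslashes($s, "\0..\37\177..\377");
}

// Index of the first differing byte, or null if both strings are identical.
function first_difference(string $a, string $b): ?int
{
    $len = min(strlen($a), strlen($b));
    for ($i = 0; $i < $len; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    return strlen($a) === strlen($b) ? null : $len;
}

$code = "<?php __halt_compiler();\x01?>\x02";
$rebuilt = rebuild_from_tokens(@token_get_all($code));

printf("source:  string(%d) \"%s\"\n", strlen($code), visible($code));
printf("rebuilt: string(%d) \"%s\"\n", strlen($rebuilt), visible($rebuilt));

$pos = first_difference($code, $rebuilt);
echo $pos === null ? "round trip is byte-exact\n" : "first difference at byte $pos\n";
```

With the behavior reported under "Actual result", the first difference would be at byte 24, where the dropped \x01 used to be.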