ID:               39078
 Comment by:       toby dot walsh at fxhome dot com
 Reported By:      main at springtimesoftware dot com
 Status:           Open
 Bug Type:         Feature/Change Request
 Operating System: Windows XP
 PHP Version:      5.1.6
 New Comment:

I believe Derick probably meant to link to RFC 2396:

http://www.ietf.org/rfc/rfc2396.txt

It says...

----
Many URI include components consisting of or delimited by, certain
   special characters.  These characters are called "reserved", since
   their usage within the URI component is limited to their reserved
   purpose.  If the data for a URI component would conflict with the
   reserved purpose, then the conflicting data must be escaped before
   forming the URI.

      reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                    "$" | ","
----

Notice that the "+" symbol is now in the reserved list.

This issue is confusing because the older RFC 1738 did indeed say that
the "+" symbol did not need to be encoded. The newer RFC 2396 actually
draws attention to this change.

----
G.2. Modifications from both RFC 1738 and RFC 1808

Changed to URI syntax instead of just URL.

Confusion regarding the terms "character encoding", the URI
"character set", and the escaping of characters with %<hex><hex>
equivalents has (hopefully) been reduced.  Many of the BNF rule names
regarding the character sets have been changed to more accurately
describe their purpose and to encompass all "characters" rather than
just US-ASCII octets.  Unless otherwise noted here, these
modifications do not affect the URI syntax.

Both RFC 1738 and RFC 1808 refer to the "reserved" set of characters
as if URI-interpreting software were limited to a single set of
characters with a reserved purpose (i.e., as meaning something other
than the data to which the characters correspond), and that this set
was fixed by the URI scheme.  However, this has not been true in
practice; any character that is interpreted differently when it is
escaped is, in effect, reserved.  Furthermore, the interpreting
engine on a HTTP server is often dependent on the resource, not just
the URI scheme.  The description of reserved characters has been
changed accordingly.

The plus "+", dollar "$", and comma "," characters have been added to
those in the "reserved" set, since they are treated as reserved
within the query component.
----

So I believe PHP is correct to decode the "+" as a " ".

You should be using the JavaScript function encodeURIComponent() to
escape your strings. encodeURIComponent() will encode "+" chars properly.
Here's a good page which shows the difference between JavaScript's
encoding functions.

http://xkr.us/articles/javascript/encode-compare/
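
As a quick illustration of the difference, here is a minimal sketch that
can be run in Node.js or a browser console. The sample string "C++ rocks"
is an assumed example, not something taken from the report.

```javascript
// Sample value containing both "+" and a space (assumed example).
const value = "C++ rocks";

// escape() is a legacy function; it leaves "+" untouched.
const viaEscape = escape(value);

// encodeURI() is meant for whole URIs, so it also leaves "+" alone.
const viaEncodeURI = encodeURI(value);

// encodeURIComponent() percent-encodes "+" as %2B.
const viaComponent = encodeURIComponent(value);

console.log(viaEscape);     // C++%20rocks
console.log(viaEncodeURI);  // C++%20rocks
console.log(viaComponent);  // C%2B%2B%20rocks
```

Only the last form survives a decoder that treats "+" as a space.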


Previous Comments:
------------------------------------------------------------------------

[2009-08-10 15:02:31] boriss at web dot de

I'd like to see an option to change the runtime behavior of PHP, too.
Even if the JavaScript function escape() worked, a user could still
enter a URL with a query string himself. Imagine you have a search
engine and someone enters a URL with ?query=C++. If you use
$_GET['query'] you just don't know if someone searched for "C++" or
"C  ".

------------------------------------------------------------------------

[2008-07-16 20:18:49] edA-qa at disemia dot com

I would also like to add that decoding '+' to a space is just plain
wrong. I got burnt again by this when using base64_encode, which should
produce URL-safe strings, but for PHP it doesn't, since the output may
include the '+'.

A global option to use the proper rawurldecode would be great.
Otherwise I'm stuck, like many developers, reparsing the query
string/URL manually and unable to use $_POST and $_GET.
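
For context, standard Base64 output can contain "+" (and "/"), so it
generally has to be percent-encoded or converted to the URL-safe alphabet
before it goes into a query string. A minimal sketch in JavaScript
(Node.js, using its global Buffer). The byte values and the helper name
toUrlSafe are illustrative assumptions.

```javascript
// Two bytes chosen so the standard Base64 output contains "+" and "/".
const raw = Buffer.from([0xfb, 0xff]);
const standard = raw.toString("base64");

// Option 1: percent-encode the standard Base64 text before sending it.
const percentEncoded = encodeURIComponent(standard);

// Option 2 (hypothetical helper): switch to the URL-safe alphabet,
// replacing "+" with "-", "/" with "_", and dropping "=" padding.
function toUrlSafe(b64) {
  return b64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}
const urlSafe = toUrlSafe(standard);

console.log(standard);        // +/8=
console.log(percentEncoded);  // %2B%2F8%3D
console.log(urlSafe);         // -_8
```

With option 2 the receiving side has to reverse the substitution before
decoding.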

------------------------------------------------------------------------

[2008-06-12 00:25:52] jerm at live dot com

I'm with David on this.

On the client-side, I'm using the JavaScript escape() function to
encode data for sending to the server using a POST ajax request.
(Original bug report refers to $_GET, but this is also affecting
$_POST)

The server sees both plus signs "+" and "%20" as spaces. And yes, PHP
is seeing the plus, untouched by Apache, as I can prove using:

echo file_get_contents("php://input"); // Display raw POST

This is very frustrating. I'm currently getting around this by parsing
the raw POST data manually (above), and not using the pre-parsed $_POST
data.
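
The manual parsing described here boils down to splitting the raw body on
"&" and "=" and percent-decoding each piece without treating "+" as a
space. A rough sketch of that logic, written in JavaScript purely for
illustration: parseRaw is a hypothetical helper, not a PHP or browser
API, and the sample body is assumed. URLSearchParams is shown for
contrast because it applies the form-encoding rule that "+" means space.

```javascript
// Hypothetical helper: parse "a=1&b=2" while keeping "+" literal.
// Note: decodeURIComponent throws on malformed "%" sequences.
function parseRaw(body) {
  const result = {};
  for (const pair of body.split("&")) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    const name = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? "" : pair.slice(eq + 1);
    result[decodeURIComponent(name)] = decodeURIComponent(value);
  }
  return result;
}

const body = "query=C++&note=a%20b";  // assumed sample input

const rawParsed = parseRaw(body);             // "+" stays "+"
const formParsed = new URLSearchParams(body); // "+" becomes " "

console.log(JSON.stringify(rawParsed.query));          // "C++"
console.log(JSON.stringify(formParsed.get("query")));  // "C  "
console.log(JSON.stringify(rawParsed.note));           // "a b"
```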

------------------------------------------------------------------------

[2006-10-10 13:30:10] main at springtimesoftware dot com

So, that's it? Just a few ignorant attempts to classify this feature
request as Bogus, with no assignment to a developer to make this feature
request happen?

I'm disappointed.

An option to process incoming URL args using rawurldecode instead of
urldecode would benefit so many users!

David Spector

------------------------------------------------------------------------

[2006-10-07 22:53:52] main at springtimesoftware dot com

I'm not sure I'm following you.

Section "Reserved:" in RFC 1738 (at
http://www.freesoft.org/CIE/RFC/1738/4.htm) states:

----
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.
----

Since "+" is listed, I would expect that any agent that obeyed this RFC
would transmit "+" unchanged.

That means that Apache should transmit "+" unchanged to PHP.

This is why I would be surprised to find that Apache is the cause of
this problem.

Indeed, if I browse (using IE 6.0) to a Web page that contains a call
to phpinfo(), browsing using a URL that contains the argument
"Arg=+%20", then phpinfo() reports that _SERVER["QUERY_STRING"] has the
value "Arg=+%20". (I just did this, I'm not making this up.)

This confirms that the plus sign is getting to PHP okay.

So wouldn't you agree with me that Apache cannot be causing this
problem?

PHP must be using urldecode() when it parses the arguments into the
$_GET array, yes? Otherwise, how would the plus sign in the argument
become a space?

David

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/39078
