Edit report at https://bugs.php.net/bug.php?id=48219&edit=1

 ID:                 48219
 Comment by:         alastair at alastairs-place dot net
 Reported by:        carsten_sttgt at gmx dot de
 Summary:            Add entry for possible content-transfer-encoding in
                     uploaded file information
 Status:             Open
 Type:               Feature/Change Request
 Package:            HTTP related
 Operating System:   *
 PHP Version:        5.*, 6CVS (2009-05-09)
 Block user comment: N
 Private report:     N

 New Comment:

I'll add that RFC2388 glibly states that "a boundary is selected that does not
occur in any of the data".  This is not, of course, the implementation that some
browser writers have chosen, nor would following that recommendation be
reasonable in the general case, since it might necessitate pre-scanning a large
file prior to upload; rather, they pick a random boundary string that they think
is not likely to come up in practice.
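
To make the contrast concrete, here is a minimal sketch of both strategies. The helper names and the "----FormBoundary" prefix are made up for illustration and are not taken from any browser or from PHP.

```python
import secrets


def random_boundary(nbytes: int = 16) -> str:
    """Pick a random boundary, as browsers do in practice (hypothetical helper)."""
    # 16 random bytes become 32 hex characters, well under the 70-char limit.
    return "----FormBoundary" + secrets.token_hex(nbytes)


def boundary_is_safe(boundary: str, payload: bytes) -> bool:
    """The strict reading of the RFC: pre-scan the payload for the delimiter."""
    return ("--" + boundary).encode("ascii") not in payload


payload = b"some file contents\r\n--test\r\nmore contents"

# A predictable boundary collides with this payload.
print(boundary_is_safe("test", payload))

# A random boundary is overwhelmingly unlikely to collide, but only the
# pre-scan (which costs a full pass over the data) can prove it.
print(boundary_is_safe(random_boundary(), payload))
```

The pre-scan is the only one of the two that gives a guarantee, which is exactly why it is impractical for large uploads.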

RFC2388 also quite clearly states that:

  Each part may be encoded and the "content-transfer-encoding" header
  supplied if the value of that part does not conform to the default
  encoding.


Previous Comments:
------------------------------------------------------------------------
[2012-03-27 14:53:53] alastair at alastairs-place dot net

The claim that HTTP, as a binary-supporting protocol, does not need
Content-Transfer-Encoding for form POSTs is bogus.

The problem is very simple; if your MIME boundary is set to (say) "test", then
a POST with a body like this:

  --test
  Content-Disposition: form-data; name="frob"
  
  $frobValue
  --test--

can go wrong if $frobValue happens to contain something like

  This is line one.
  --test
  Content-Disposition: form-data; name="hack"

  This is very naughty.

This could happen from a web browser, if the boundary was predictable.  It's
unlikely to happen by accident (since in practice the boundary will be randomly
generated and contain a significant number of characters), but it could
nevertheless happen.
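
The failure can be reproduced with a deliberately naive parser. This is a sketch only: `naive_split` and `build_body` are hypothetical helpers, not PHP's rfc1867.c logic, and the delimiter handling is simplified (a real parser also anchors on the preceding CRLF).

```python
def build_body(boundary: str, name: str, value: str) -> str:
    """Build a one-field multipart/form-data body without checking the value."""
    return (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n'
        "\r\n"
        f"{value}\r\n"
        f"--{boundary}--\r\n"
    )


def naive_split(body: str, boundary: str) -> list[str]:
    """Split a body into parts by scanning for the delimiter only."""
    delimiter = "--" + boundary
    # Drop the closing "--boundary--" marker and anything after it.
    body = body.split(delimiter + "--")[0]
    # Text before the first delimiter is preamble, so skip element 0.
    return [part.strip("\r\n") for part in body.split(delimiter)[1:]]


frob_value = (
    "This is line one.\r\n"
    "--test\r\n"
    'Content-Disposition: form-data; name="hack"\r\n'
    "\r\n"
    "This is very naughty."
)

parts = naive_split(build_body("test", "frob", frob_value), "test")
print(len(parts))  # 2: the single field has been parsed as two parts
print(parts[1].splitlines()[0])
```

One submitted field comes out as two parts, and the second carries the injected `name="hack"` header.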

There are two ways to deal with this problem.  The first is to set
Content-Length headers on the subparts; the MIME parser can then read that many
bytes in the knowledge that no *real* boundary will be within the data.  The
second is to use Content-Transfer-Encoding and either send the data as e.g.
base64, or use quoted-printable in combination with a boundary that is not
valid quoted-printable data.
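
Both approaches can be sketched in a few lines. The function name `read_part_by_length` is hypothetical; the sketch assumes the standard base64 alphabet (A-Z, a-z, 0-9, "+", "/", "="), which contains no "-" and therefore can never spell out a delimiter that begins with "--".

```python
import base64


def read_part_by_length(stream: bytes, offset: int, length: int) -> tuple[bytes, int]:
    """Return exactly `length` bytes from `offset`, without scanning for a delimiter."""
    end = offset + length
    if end > len(stream):
        raise ValueError("part is shorter than its declared Content-Length")
    return stream[offset:end], end


value = b"This is line one.\r\n--test\r\nThis is very naughty."

# Approach 1: a per-part Content-Length lets the parser skip over
# boundary-like bytes inside the data.
stream = value + b"\r\n--test--\r\n"
data, end = read_part_by_length(stream, 0, len(value))
print(data == value)

# Approach 2: base64 output contains no "-", so the delimiter cannot occur.
encoded = base64.b64encode(value)
print(b"--test" in encoded)
print(base64.b64decode(encoded) == value)
```

Either way the parser no longer depends on the boundary being absent from the raw data.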

Unfortunately, as far as I can see from reading rfc1867.c, PHP SUPPORTS
NEITHER!  Even for binary files, PHP *ignores* Content-Length and scans for a
boundary instead.  Result: there is a statistical likelihood, however small,
that the POST data will not be as expected.

------------------------------------------------------------------------
[2010-12-20 08:55:51] j...@php.net

Updated. Shouldn't it be enough if we add the encoding when it is passed by the
uploader? Then you could handle the data more easily. Any other fields that are
missing? :) I don't think PHP should decode it automatically.

------------------------------------------------------------------------
[2009-11-20 21:46:47] codeslinger at compsalot dot com

Well, I mostly deal with email, especially including webmail, and as far as I
can see, nearly all attachments are base64 encoded. In fact it is hard to find
anything that isn't, unless it's plain text.

So, I guess I was a little bit confused about the difference between HTTP 
uploads and email uploads, since they both use MIME and typically they both 
contain web pages.

With regard to this feature request, I would really like PHP to make the MIME
header info available.  That way we can easily do our own decoding, as long as
we have access to the info that tells us what needs to be decoded.  Currently
we don't, at least not without kludgy hacks, and that makes it hard to do
something which should be simple.
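
As a rough illustration of what "our own decoding" would look like once the header value is exposed, here is a sketch. `decode_part` is a hypothetical helper, and treating a missing header as "no encoding" is an assumption of this example.

```python
import base64
import quopri
from typing import Optional


def decode_part(data: bytes, transfer_encoding: Optional[str]) -> bytes:
    """Decode one uploaded part according to its Content-Transfer-Encoding value."""
    # Assumption: an absent header means the bytes were sent as-is.
    encoding = (transfer_encoding or "binary").strip().lower()
    if encoding == "base64":
        return base64.b64decode(data)
    if encoding == "quoted-printable":
        return quopri.decodestring(data)
    if encoding in ("7bit", "8bit", "binary"):
        # These values declare the nature of the data but apply no transformation.
        return data
    raise ValueError(f"unsupported Content-Transfer-Encoding: {encoding!r}")


print(decode_part(b"aGVsbG8=", "base64"))
print(decode_part(b"caf=C3=A9", "Quoted-Printable"))
print(decode_part(b"raw bytes", None))
```

The point is that the decoding itself is trivial; what is missing is access to the header value that selects the branch.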

------------------------------------------------------------------------
[2009-11-19 23:55:12] avalon73 at caerleon dot us

RFC 2616 section 3.7.2 itself says nothing about the use of
Content-Transfer-Encoding (CTE).

RFCs 1867 and 2388 both mention the possibility of the multipart/form-data MIME 
type being used with email as a transport as well as HTTP.  The CTE header and 
the "base64" and "quoted-printable" encodings were included in MIME 
specifically for moving 8-bit data over 7-bit transport protocols, which 
included basic (non-enhanced) SMTP at the time of its creation (and still does, 
if you adhere strictly to the RFCs).  The other standard encodings defined for 
the CTE header (7bit, 8bit, and binary) imply no content encoding at all.

HTTP is and has always been an 8-bit clean transport protocol.  Because of 
that, it has no need for any encodings designed to move 8-bit data over a 7-bit 
protocol.  In fact, the use of such encodings would only needlessly add bulk to 
the data being transferred.  If no such transformation is necessary, the 
addition of the CTE header is also not necessary.  Section 19.4.5 of RFC 2616 
would seem to merely codify this fact, effectively forbidding the use of CTE 
over HTTP.
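
The added bulk is easy to quantify: base64 emits 4 output characters for every 3 input bytes, so the encoded size is 4 * ceil(n / 3). A small sketch (the helper name `base64_length` is made up for this example):

```python
import base64
import os


def base64_length(n: int) -> int:
    """Size in bytes of the base64 encoding of n input bytes, without line breaks."""
    return 4 * ((n + 2) // 3)


payload = os.urandom(300_000)
encoded = base64.b64encode(payload)

print(len(encoded), base64_length(len(payload)))
print(len(encoded) / len(payload))  # roughly 1.33
```

MIME-style base64 with a line break after every 76 characters adds a little more on top of that one-third overhead.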

------------------------------------------------------------------------
[2009-11-19 23:00:39] carsten_sttgt at gmx dot de

> Has anyone noticed this?
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.4.5

Sure, but in rfc2616-sec3.html#sec3.7.2 you can read that multipart/form-data
in particular is defined in RFC1867 (RFC2388). And there you can read about the
content-transfer-encoding.

Regards,
Carsten

------------------------------------------------------------------------


The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=48219

