"Costin Manolache" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> On 3/17/06, Jean-frederic Clere <[EMAIL PROTECTED]> wrote:
>>
>> Costin Manolache wrote:
>>
>> >Sorry, I forgot there are 2 meanings of 'xml syntax' :-). I was
>> >thinking of the case where the output is an xml file - with the
>> >encoding in the declaration, but in a regular jsp. (Well, the patch is
>> >not dealing with jspx anyway.)
>> >I was referring to the fact that <?xml encoding="iso-8859-2"?> is
>> >treated as template text, and pageEncoding (or web.xml) takes
>> >precedence.
>> >In jsp-xml (jspx) it seems we report an error if the web.xml encoding
>> >doesn't match the <?xml?> encoding. I can't see many use cases for
>> >having an explicit encoding in the xml header and yet reading the file
>> >with a different encoding.
>> >
>> In my case the xml header is:
>> <?xml version="1.0" encoding="OSD_EBCDIC_DF04_1"?> (In EBCDIC...)
>> Reading the file with ISO-8859-1 encoding only gives garbage.
>>
>> But the patch prevents reading the <%@page pageEncoding="bla" %>, so it
>> is bad.
>
> Yes, the patch is bad - but what would be a good patch?
>
> - if pageEncoding is not specified but the document starts with
>   <?xml encoding=...?> - use the xml encoding
It would be Tomcat-specific, but +1, since it's in the spirit of using the
<%@page contentType="text/xml; charset=OSD_EBCDIC_DF04_1" %> as the
default if pageEncoding isn't specified.

> - if pageEncoding is specified and so is <?xml encoding?> - report an
>   error (like jspx does), or a warning, or choose the xml encoding

-1, since the <?xml encoding?> in JSP syntax corresponds to a 'charset='
on the Content-Type header, i.e. it's an output encoding for the page, not
a way to read the input document.

> - leave current behavior - use the default 8859-1 or pageEncoding only.

Of course, this is what RIs like GlassFish are required to do :). For
Tomcat, I'm perfectly happy to have smart guessing as long as it doesn't
override the declared <%@page pageEncoding="bla" %>.

> <?xml encoding?> is probably more used and supported (i.e. more
> 'standard' :-) than jsp pageEncoding.
> The jsp spec is clear that the last option should be used - but having 2
> conflicting encodings is a source of problems, and if we can't follow
> the 'higher' standard, we can at least warn.
>
> Well - not a big deal, but encodings tend to be a headache area for many
> people, in particular when different parts of the system have different
> 'standards' and defaults plus autodetection (in the browser, http, html,
> xml, or jsp).
>
> Costin
>
>> The old code should be improved to allow using the sourceEnc when the
>> pageEncoding is not specified, and ISO-8859-1 if neither is specified.
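For reference, the precedence argued about above for pages in standard (JSP) syntax can be sketched in Java. This is a minimal, hypothetical sketch, not Jasper's code: `StandardSyntaxEncoding`, `resolve()` and `charsetOf()` are invented names. It follows the order given in JSP 2.0 (jsp-config page-encoding, then the pageEncoding attribute, then the contentType charset, then ISO-8859-1, with a translation-time error when jsp-config and pageEncoding name different encodings). The optional `xmlPrologEnc` fallback models the Tomcat-specific extension proposed in this thread; placing it after the contentType charset is an assumption.

```java
/**
 * Hypothetical sketch (not Jasper code) of page-encoding resolution for a JSP
 * page in standard syntax, in the order given by JSP 2.0: jsp-config
 * page-encoding, pageEncoding attribute, contentType charset, ISO-8859-1.
 */
class StandardSyntaxEncoding {

    /** Extracts the charset parameter of a contentType value, or null. Quoted values are not handled. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                return p.substring(8).trim();
            }
        }
        return null;
    }

    static String resolve(String jspConfigEnc, String pageEncodingAttr,
                          String contentTypeCharset, String xmlPrologEnc,
                          boolean useXmlPrologFallback) {
        // Different names in jsp-config and pageEncoding are a translation-time error.
        // Names are compared case-insensitively; aliases such as utf8/UTF-8 are not normalized.
        if (jspConfigEnc != null && pageEncodingAttr != null
                && !jspConfigEnc.equalsIgnoreCase(pageEncodingAttr)) {
            throw new IllegalStateException("jsp-config page-encoding '" + jspConfigEnc
                    + "' conflicts with pageEncoding '" + pageEncodingAttr + "'");
        }
        if (jspConfigEnc != null) {
            return jspConfigEnc;
        }
        if (pageEncodingAttr != null) {
            return pageEncodingAttr;
        }
        if (contentTypeCharset != null) {
            return contentTypeCharset;
        }
        // Hypothetical Tomcat-specific extension: reached only when nothing above was
        // declared, so it never overrides an explicit pageEncoding.
        if (useXmlPrologFallback && xmlPrologEnc != null) {
            return xmlPrologEnc;
        }
        return "ISO-8859-1"; // spec default
    }

    public static void main(String[] args) {
        // Declared pageEncoding wins over the <?xml encoding?> template text.
        System.out.println(resolve(null, "UTF-8", null, "ISO-8859-2", true));
        // Nothing declared: the prolog is used only when the extension is enabled.
        System.out.println(resolve(null, null, null, "ISO-8859-2", true));
        System.out.println(resolve(null, null, null, "ISO-8859-2", false));
        // The contentType charset acts as the default when pageEncoding is absent.
        System.out.println(resolve(null, null,
                charsetOf("text/xml; charset=OSD_EBCDIC_DF04_1"), null, false));
    }
}
```

Because the fallback is consulted only when nothing was declared, it cannot override an explicit pageEncoding, which is the condition attached to the +1 above.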
>>
>> Cheers
>>
>> Jean-Frederic
>>
>> >
>> >Costin
>> >
>> >On 3/17/06, Bill Barker <[EMAIL PROTECTED]> wrote:
>> >
>> >>>-----Original Message-----
>> >>>From: Costin Manolache [mailto:[EMAIL PROTECTED]
>> >>>Sent: Friday, March 17, 2006 11:57 AM
>> >>>To: Tomcat Developers List
>> >>>Subject: Re: svn commit: r386315 -
>> >>>/tomcat/jasper/tc5.5.x/src/share/org/apache/jasper/compiler/ParserController.java
>> >>>
>> >>>In his example (where both XML and JSP declare encodings), which one
>> >>>would win?
>> >>>
>> >>The patch only affects pages in JSP syntax, so the <?xml ... ?> is just
>> >>another piece of template text :).
>> >>
>> >>>IMO the XML encoding should win, i.e. if the file uses xml syntax and
>> >>>starts with <?xml version="1.0" encoding="iso-8859-2" ?>, then the jsp
>> >>>pageEncoding should be ignored.
>> >>>If a jsp is written using the XML syntax, it is supposed to follow the
>> >>>XML rules - there is no exception in the XML spec for jsps specifying
>> >>>the encoding with their own syntax.
>> >>>
>> >>The JSP expert group agrees with you :). In XML syntax, the XML encoding
>> >>should win out over <jsp:directive.page pageEncoding="..." />.
>> >>
>> >>>For non-XML jsps, I think respecting pageEncoding is a must. The jsp
>> >>>reader must scan the file to find the pageEncoding string, which is not
>> >>>trivial (there is a reason why XML requires the encoding to be the
>> >>>first thing in the file, at the top; I wouldn't bet on jasper
>> >>>implementing it correctly :-)
>> >>>
>> >>In JSP syntax, the spec (Appendix D) says that pageEncoding should win
>> >>(at least when there is no matching <page-encoding /> in web.xml :).
>> >>What the patch breaks is that with it, Jasper won't even look for the
>> >>pageEncoding most of the time.
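The contrasting rule for JSP documents (XML syntax) might look like the following: the XML encoding wins and, as noted above for jspx, a disagreeing declaration is reported as an error instead of being silently overridden. Again a hypothetical sketch (`XmlSyntaxEncoding` and `requireSame()` are invented names). It assumes that the prolog's encoding, or UTF-8 when the prolog names none, is the value a pageEncoding attribute and a jsp-config page-encoding must match, and it skips BOM-based detection.

```java
/**
 * Hypothetical sketch (not Jasper code) of the rule for JSP documents in XML
 * syntax: the encoding comes from the XML prolog, and a pageEncoding attribute
 * or jsp-config page-encoding naming something else is an error.
 */
class XmlSyntaxEncoding {

    static String resolve(String xmlPrologEnc, String pageEncodingAttr, String jspConfigEnc) {
        // Assumption: without an encoding declaration (and ignoring BOMs) the XML default, UTF-8, applies.
        String effective = (xmlPrologEnc != null) ? xmlPrologEnc : "UTF-8";
        requireSame(effective, pageEncodingAttr, "pageEncoding attribute");
        requireSame(effective, jspConfigEnc, "jsp-config page-encoding");
        return effective;
    }

    // Report a conflict instead of silently picking one of the two encodings.
    private static void requireSame(String effective, String declared, String source) {
        if (declared != null && !declared.equalsIgnoreCase(effective)) {
            throw new IllegalStateException(source + " '" + declared
                    + "' conflicts with XML encoding '" + effective + "'");
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("ISO-8859-2", null, null));         // prolog only
        System.out.println(resolve("ISO-8859-2", "iso-8859-2", null)); // consistent declarations
        try {
            resolve("ISO-8859-2", "UTF-8", null);                      // conflicting declarations
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```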
>> >>
>> >>Jasper looks like it does a pretty good job of guessing to set up the
>> >>Reader that scans for the pageEncoding directive. And JFC seems to
>> >>agree, since the patch is to use the guessed encoding rather than the
>> >>one that was specified :).
>> >>
>> >>>Costin
>> >>>
>> >>>On 3/17/06, Bill Barker <[EMAIL PROTECTED]> wrote:
>> >>>
>> >>>>>-----Original Message-----
>> >>>>>From: Jean-frederic Clere [mailto:[EMAIL PROTECTED]
>> >>>>>Sent: Friday, March 17, 2006 4:13 AM
>> >>>>>To: Tomcat Developers List
>> >>>>>Subject: Re: svn commit: r386315 -
>> >>>>>/tomcat/jasper/tc5.5.x/src/share/org/apache/jasper/compiler/ParserController.java
>> >>>>>
>> >>>>>Bill Barker wrote:
>> >>>>>
>> >>>>>>>-----Original Message-----
>> >>>>>>>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
>> >>>>>>>Sent: Thursday, March 16, 2006 3:55 AM
>> >>>>>>>To: tomcat-dev@jakarta.apache.org
>> >>>>>>>Subject: svn commit: r386315 -
>> >>>>>>>/tomcat/jasper/tc5.5.x/src/share/org/apache/jasper/compiler/ParserController.java
>> >>>>>>>
>> >>>>>>>Author: jfclere
>> >>>>>>>Date: Thu Mar 16 03:54:29 2006
>> >>>>>>>New Revision: 386315
>> >>>>>>>
>> >>>>>>>URL: http://svn.apache.org/viewcvs?rev=386315&view=rev
>> >>>>>>>Log:
>> >>>>>>>If the encoding is not specified use the detected one.
>> >>>>>>>
>> >>>>>>-1.
>> >>>>>>If it gets to this point, the detected encoding is *wrong* (e.g.
>> >>>>>><?xml version="1.0" encoding="iso-8859-2" ?> in JSP syntax).
>> >>>>>>
>> >>>>>Why wrong?
>> >>>>>
>> >>>>Because the right encoding is the one specified in the
>> >>>><%@page pageEncoding="utf8"%>.
>> >>>>
>> >>>>>+++
>> >>>>>Connected to localhost.
>> >>>>>Escape character is '^]'.
>> >>>>>GET /try1.jsp
>> >>>>><?xml version="1.0" encoding="ISO-8859-2"?>
>> >>>>><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
>> >>>>> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
>> >>>>>+++
>> >>>>>
>> >>>>This is about pageEncoding, so I don't see the relevance.
>> >>>>
>> >>>>>>I don't have access to an EBCDIC machine to know what the problem
>> >>>>>>is, but this isn't the fix. Possibly a better way to guess the
>> >>>>>>encoding of the Reader?
>> >>>>>>
>> >>>>>Thinking about it, the patch is not perfect, but the old code is
>> >>>>>worse: we have a piece of code that correctly detects the source
>> >>>>>encoding and then destroys it...
>> >>>>>
>> >>>>However, the old code adheres to the JSP spec, whereas your patch
>> >>>>breaks the JSP spec (Appendix D). That automatically makes the old
>> >>>>code better than your patch.
>> >>>>
>> >>>>>In doParse() in ParserController.java the following happens:
>> >>>>>parse() is called with pageEnc = sourceEnc,
>> >>>>>jspConfigPageEnc = null and
>> >>>>>isDefaultPageEncoding = false.
>> >>>>>But on the line before, the jspReader uses the sourceEnc to create
>> >>>>>the InputStreamReader, so the content of the file is translated to
>> >>>>>utf-8 when reading it.
>> >>>>>In Validator.java the charset will be set to the detected encoding:
>> >>>>>in the example above, iso-8859-2. Bad for me: that will be
>> >>>>>OSD_EBCDIC_DF04_1.
>> >>>>>
>> >>>>The only issue is why Jasper can't recognize your
>> >>>><%@page pageEncoding="OSD_EBCDIC_DF04_1" %> statement.
>> >>>>That's the part that I can't figure out (and your patch is masking :).
>> >>>>
>> >>>>>Cheers
>> >>>>>
>> >>>>>Jean-Frederic
>> >>>>>
>> >>>>>>
>> >>>>>>This message is intended only for the use of the person(s) listed
>> >>>>>>above as the intended recipient(s), and may contain information that
>> >>>>>>is PRIVILEGED and CONFIDENTIAL. If you are not an intended recipient,
>> >>>>>>you may not read, copy, or distribute this message or any attachment.
>> >>>>>>If you received this communication in error, please notify us
>> >>>>>>immediately by e-mail and then delete all copies of this message and
>> >>>>>>any attachments.
>> >>>>>>In addition you should be aware that ordinary (unencrypted) e-mail
>> >>>>>>sent through the Internet is not secure. Do not send confidential or
>> >>>>>>sensitive information, such as social security numbers, account
>> >>>>>>numbers, personal identification numbers and passwords, to us via
>> >>>>>>ordinary (unencrypted) e-mail.
>> >>>>>>
>> >>>>>>---------------------------------------------------------------------
>> >>>>>>To unsubscribe, e-mail: [EMAIL PROTECTED]
>> >>>>>>For additional commands, e-mail: [EMAIL PROTECTED]
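The two-step approach debated in this thread (guess an encoding only well enough to read the directive, then honor whatever pageEncoding declares) can be sketched as below. This is not Jasper's implementation, and `TwoPassEncoding`, `sniff()`, `findPageEncoding()` and `pageEncoding()` are hypothetical names. The byte patterns are a subset of the autodetection table in Appendix F of the XML 1.0 specification. The regular expression ignores JSP comments and the other pitfalls that make the scan non-trivial, and jsp-config and contentType are left out. Mapping the EBCDIC pattern to `IBM037` is an assumption: that charset must be installed in the JVM, and a page like the OSD_EBCDIC_DF04_1 one described above would still be read with its declared code page in the second pass.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical two-pass sketch (not Jasper code): sniff the first bytes only to
 * get a Reader that can see ASCII-range directive text, then let the declared
 * pageEncoding decide the real encoding.
 */
class TwoPassEncoding {

    // Subset of the XML 1.0 Appendix F autodetection table (UCS-4 cases omitted).
    static String sniff(byte[] b) {
        int b0 = b.length > 0 ? b[0] & 0xFF : -1;
        int b1 = b.length > 1 ? b[1] & 0xFF : -1;
        int b2 = b.length > 2 ? b[2] & 0xFF : -1;
        int b3 = b.length > 3 ? b[3] & 0xFF : -1;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";                   // UTF-8 BOM
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";                              // BOM
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";                              // BOM
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";  // "<?"
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";  // "<?"
        if (b0 == 0x4C && b1 == 0x6F && b2 == 0xA7 && b3 == 0x94) return "IBM037";    // EBCDIC "<?xm" (assumed code page)
        // ASCII-compatible or unknown: ISO-8859-1 maps every byte, so the scan never fails.
        return "ISO-8859-1";
    }

    // Naive: does not skip <%-- --%> comments or attribute values containing '%'.
    private static final Pattern PAGE_ENCODING = Pattern.compile(
            "<%@\\s*page\\b[^%]*?\\bpageEncoding\\s*=\\s*[\"']([^\"']+)[\"']");

    static String findPageEncoding(byte[] page, String guess) {
        // Charset.forName throws UnsupportedCharsetException if the guess is not installed.
        String text = new String(page, Charset.forName(guess));
        Matcher m = PAGE_ENCODING.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // The declared value wins; the sniffed guess is only used to read the directive.
    // Simplification: jsp-config and contentType are not consulted.
    static String pageEncoding(byte[] page) {
        String declared = findPageEncoding(page, sniff(page));
        return declared != null ? declared : "ISO-8859-1";
    }

    public static void main(String[] args) {
        byte[] page = ("<?xml version=\"1.0\" encoding=\"ISO-8859-2\"?>\n"
                + "<%@ page pageEncoding=\"UTF-8\" %>\n").getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(page));        // scan-only guess
        System.out.println(pageEncoding(page)); // declared pageEncoding, not the prolog value
    }
}
```

In this sketch the detected encoding never replaces a declared pageEncoding; it only decides how the bytes are decoded while looking for the directive.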