Francois PIETTE wrote:
>> But 3 bytes looks like UTF-8 ?
>
> I don't know. You said it was UTF-16 if not encoded.

I installed IIS 7 on my Vista box and found that IIS 7 uses UTF-8 in
directory listings. The HTTP header contains the "charset=UTF-8"
content-type extension.

However, I think the ICS server should continue to use HTML entities.
HTML entities represent both iso-8859-1 (Latin1) and Unicode character
numbers (in Unicode the first 256 chars are the same as Latin1). So in
order to create a _valid_ mapping, an AnsiString MUST be converted with
the current ANSI code page to a UnicodeString/WideString first!

This can be achieved easily in TextToHtmlText() with a local WideString
variable to which the parameter Src : String is assigned. Characters
above #255 must then be represented as numerical HTML entities
(&#nnnn;). That's all, fully backwards compatible and works in D2009
as well :)

--
Arno Garrels

>
> ----- Original Message -----
> From: "Arno Garrels" <[EMAIL PROTECTED]>
> To: "ICS support mailing" <[email protected]>
> Sent: Thursday, October 09, 2008 7:03 PM
> Subject: Re: [twsocket] HTML encoding in HttpSrv func.
> TextToHtmlText()
>
>
>> Francois PIETTE wrote:
>>>> The twothird character is not 'encoded' either as "&#8532;"
>>>> (decimal) or as "&#x2154;" (hex)? If so, IIS sends plain UTF-16!
>>>
>>> Yes, no encoding at all. Just the 3 bytes. So UTF-16.
>>
>> But 3 bytes looks like UTF-8 ?
>>
>> --
>> Arno Garrels
>>
>>>
>>> --
>>> [EMAIL PROTECTED]
>>> http://www.overbyte.be
>>>
>>>
>>> ----- Original Message -----
>>> From: "Arno Garrels" <[EMAIL PROTECTED]>
>>> To: "ICS support mailing" <[email protected]>
>>> Sent: Thursday, October 09, 2008 5:26 PM
>>> Subject: Re: [twsocket] HTML encoding in HttpSrv func.
>>> TextToHtmlText()
>>>
>>>
>>>> Francois Piette wrote:
>>>>>> Yes, if someone has Apache or a newer IIS installed he could
>>>>>> help. Create a file name with characters not in current ANSI
>>>>>> code page by copying those characters from the Windows
>>>>>> application charmap.exe. Then start a packet sniffer and log a
>>>>>> directory listing.
>>>>>
>>>>> Using IIS6 on W2K3.
>>>>
>>>> Thanks!
>>>>
>>>>> The twothird character (U+2154) is sent in the dirlist as 3
>>>>> characters : 0xE2 0x85 0x94. In the href link, the 3 characters
>>>>> are expressed as %e2%85%94
>>>>
>>>> That's UTF-8 URL-encoded.
>>>>
>>>>> while they are binary in the text itself.
>>>>
>>>> The twothird character is not 'encoded' either as "&#8532;"
>>>> (decimal) or as "&#x2154;" (hex)? If so, IIS sends plain UTF-16!
>>>>
>>>>> There is nothing in the html header to tell which code page or
>>>>> charset is used. --
>>>>
>>>> Browsers seem to be very good in detecting the correct character
>>>> set nowadays.
>>>>
>>>> --
>>>> Arno Garrels
>>>> --
>>>> To unsubscribe or change your settings for TWSocket mailing list
>>>> please goto
>>>> http://lists.elists.org/cgi-bin/mailman/listinfo/twsocket Visit
>>>> our website at http://www.overbyte.be
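
For illustration only, here is a rough sketch of the two points discussed in this thread, written in Python rather than Delphi so it can be run anywhere. It is not the ICS implementation. The name text_to_html_text is a hypothetical stand-in for TextToHtmlText(), and Windows-1252 is merely assumed as the "current ANSI code page". The sketch first checks the byte sequences reported for U+2154, then applies the proposed rule: decode with the ANSI code page first, then write every character above #255 as a decimal entity. Characters in the range 128..255 are left untouched here; how ICS treats them is outside this sketch.

```python
import html
from urllib.parse import quote

TWO_THIRDS = "\u2154"  # the "twothird" character from the directory listing

# The three bytes seen on the wire are the UTF-8 encoding of U+2154.
utf8_bytes = TWO_THIRDS.encode("utf-8")       # b'\xe2\x85\x94'
# UTF-16 would need only two bytes (little-endian shown here).
utf16_bytes = TWO_THIRDS.encode("utf-16-le")  # b'\x54\x21'
# Percent-encoding the UTF-8 bytes gives the href form (Python emits
# upper-case hex digits, the server in the thread sent lower-case).
href_form = quote(TWO_THIRDS)                 # '%E2%85%94'


def text_to_html_text(src: str) -> str:
    """Hypothetical sketch: escape HTML markup characters, then write
    every character above U+00FF as a decimal entity (&#nnnn;)."""
    out = []
    for ch in html.escape(src, quote=True):
        code_point = ord(ch)
        out.append(ch if code_point <= 0xFF else f"&#{code_point};")
    return "".join(out)


def ansi_to_html_text(raw: bytes, code_page: str = "cp1252") -> str:
    """Decode ANSI bytes with the code page FIRST (assumed cp1252 here),
    so the entity numbers are real Unicode code points."""
    return text_to_html_text(raw.decode(code_page))


if __name__ == "__main__":
    print(utf8_bytes.hex(" "), "|", utf16_bytes.hex(" "), "|", href_form)
    print(text_to_html_text("2\u2154 <cups>"))  # 2&#8532; &lt;cups&gt;
    # Byte 0x80 is the euro sign in Windows-1252, i.e. U+20AC, not U+0080.
    print(ansi_to_html_text(b"\x80"))           # &#8364;
```

The last call shows why the conversion step matters: using the raw byte value 0x80 as the entity number would give &#128;, which is not the euro sign in Unicode.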
