Francois PIETTE wrote:
>> But 3 bytes looks like UTF-8 ?
> 
> I don't know. You said it was UTF-16 if not encoded.

I installed IIS 7 on my Vista box and found that IIS 7
uses UTF-8 in directory listings. The HTTP Content-Type header
carries the "charset=UTF-8" parameter.
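
As a quick cross-check of the byte values quoted further down in this
thread, here is a small Python sketch (Python is used only for
illustration, it is not part of ICS). It shows that U+2154 takes three
bytes in UTF-8 but only two in UTF-16, and how the same character looks
when URL-encoded or written as a numeric HTML entity.

```python
from urllib.parse import quote

two_thirds = "\u2154"  # VULGAR FRACTION TWO THIRDS

utf8_bytes = two_thirds.encode("utf-8")       # three bytes: E2 85 94
utf16_bytes = two_thirds.encode("utf-16-le")  # two bytes, no BOM

# quote() percent-encodes the UTF-8 bytes; it emits upper-case hex,
# which is equivalent to the lower-case form seen in the href.
url_encoded = quote(two_thirds)

decimal_entity = f"&#{ord(two_thirds)};"
hex_entity = f"&#x{ord(two_thirds):X};"

print(utf8_bytes.hex(" "), len(utf16_bytes), url_encoded,
      decimal_entity, hex_entity)
```

So three raw bytes 0xE2 0x85 0x94 in the listing text are consistent
with UTF-8, not UTF-16.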


However, I think the ICS server should continue to use HTML
entities.
Numeric HTML entities refer to Unicode character numbers, and the
first 256 Unicode characters are the same as iso-8859-1 (Latin1).
So in order to create a _valid_ mapping, an AnsiString MUST first be
converted from the current ANSI code page to a
UnicodeString/WideString! This can be achieved easily in
TextToHtmlText() by a local WideString variable that is assigned
the parameter Src : String.
Characters above #255 must then be represented as numerical HTML
entities (&#nnnn;). That's all, fully backwards compatible, and it
works in D2009 as well :)
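
The idea above can be sketched roughly as follows. This is a Python
illustration only, not the ICS Delphi code: the function name
text_to_html_text, the ansi_codepage parameter and the cp1252 default
are assumptions made for the example. Characters up to #255 other than
the markup characters are passed through unchanged here, whereas the
real TextToHtmlText() may map them to named entities.

```python
def text_to_html_text(src: bytes, ansi_codepage: str = "cp1252") -> str:
    """Hypothetical sketch: ANSI bytes -> HTML text with numeric entities."""
    # Step 1: convert the ANSI bytes to Unicode using the code page,
    # the equivalent of assigning the AnsiString to a WideString.
    wide = src.decode(ansi_codepage)

    markup = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}
    out = []
    for ch in wide:
        if ch in markup:
            out.append(markup[ch])
        elif ord(ch) > 255:
            # Step 2: code points above 255 become decimal entities.
            out.append(f"&#{ord(ch)};")
        else:
            out.append(ch)
    return "".join(out)


# Byte 0x80 is the euro sign in cp1252, which is U+20AC in Unicode.
print(text_to_html_text(b"\x80 5 < 6"))
```

Without the conversion step, byte 0x80 would be emitted as character
number 128, which is not the euro sign in Unicode; that is why the
code page conversion has to come first.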

--
Arno Garrels


> 
> ----- Original Message -----
> From: "Arno Garrels" <[EMAIL PROTECTED]>
> To: "ICS support mailing" <[email protected]>
> Sent: Thursday, October 09, 2008 7:03 PM
> Subject: Re: [twsocket] HTML encoding in HttpSrv func. TextToHtmlText()
> 
> 
>> Francois PIETTE wrote:
>>>> The twothird character is not 'encoded' either as "&#8532;"
>>>> (decimal) or as "&#x2154;" (hex)? If so, IIS sends plain UTF-16!
>>> 
>>> Yes, no encoding at all. Just the 3 bytes. So UTF-16.
>> 
>> But 3 bytes looks like UTF-8 ?
>> 
>> --
>> Arno Garrels
>> 
>>> 
>>> --
>>> [EMAIL PROTECTED]
>>> http://www.overbyte.be
>>> 
>>> 
>>> ----- Original Message -----
>>> From: "Arno Garrels" <[EMAIL PROTECTED]>
>>> To: "ICS support mailing" <[email protected]>
>>> Sent: Thursday, October 09, 2008 5:26 PM
>>> Subject: Re: [twsocket] HTML encoding in HttpSrv func. TextToHtmlText()
>>> 
>>> 
>>>> Francois Piette wrote:
>>>>>> Yes, if someone has Apache or a newer IIS installed he could
>>>>>> help. Create a file name with characters not in the current ANSI
>>>>>> code page by copying those characters from the Windows application
>>>>>> charmap.exe. Then start a packet sniffer and log a directory
>>>>>> listing.
>>>>> 
>>>>> Using IIS6 on W2K3.
>>>> 
>>>> Thanks!
>>>> 
>>>>> The twothird character (U+2154) is sent in the dirlist as 3
>>>>> characters : 0xE2 0x85 0x94. In the href link, the 3 characters
>>>>> are expressed as %e2%85%94
>>>> 
>>>> That's UTF-8 URL-encoded.
>>>> 
>>>>> while they are binary in the text itself.
>>>> 
>>>> The twothird character is not 'encoded' either as "&#8532;"
>>>> (decimal) or as "&#x2154;" (hex)? If so, IIS sends plain UTF-16!
>>>> 
>>>>> There is nothing in the html header to tell which code page or
>>>>> charset is used. --
>>>> 
>>>> Browsers seem to be very good in detecting the correct character
>>>> set nowadays.
>>>> 
>>>> --
>>>> Arno Garrels
>>>> --
>>>> To unsubscribe or change your settings for TWSocket mailing list
>>>> please goto
>>>> http://lists.elists.org/cgi-bin/mailman/listinfo/twsocket Visit
>>>> our website at http://www.overbyte.be 
