At 5:21 PM +0100 1/31/00, Gergely Madarasz wrote:
>I use htdig with a locale: de_DE setting. It seems unable to find
>occurrences of words containing non-ascii characters that are part of
>titles, <Hn> or emphasis elements. Say, if i look for "b�g" in my
>data, it finds an index.html document that contains the line

This is rather odd. The HTML parser doesn't pay much attention to
emphasis tags like <strong> or <em>, and it doesn't treat <Hn> tags
any differently as far as recording words goes.

However, Marc Pohl <[EMAIL PROTECTED]> found a problem with the
handling of 8-bit characters. I don't know whether it will fix your
problem, but his patch is attached.
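
For illustration only, here is a minimal sketch of one common kind of 8-bit character bug in C++ word splitting. It is not taken from the attached patch or from the ht://Dig source, and I haven't confirmed that the patch addresses this particular pitfall. The names is_word_byte and split_words are hypothetical, and treating every byte >= 0x80 as a letter is an assumption that is only roughly right for ISO-8859-1 text.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not from ht://Dig): decide whether a byte is part
// of a word. The cast matters: where plain char is signed, a byte such as
// 0xFC is negative, and passing a negative value other than EOF to
// isalnum() is undefined behaviour.
bool is_word_byte(char c)
{
    unsigned char u = static_cast<unsigned char>(c);
    // Assumption for this sketch: any byte >= 0x80 counts as a letter.
    return u >= 0x80 || std::isalnum(u) != 0;
}

// Hypothetical helper: split text into words at every non-word byte.
std::vector<std::string> split_words(const std::string &text)
{
    std::vector<std::string> words;
    std::string current;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (is_word_byte(text[i])) {
            current += text[i];
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        words.push_back(current);
    return words;
}

// Small self-check: an ISO-8859-1 word with an umlaut stays in one piece.
void self_check()
{
    std::vector<std::string> w = split_words("M\xFCnchen ist sch\xF6n.");
    assert(w.size() == 3);
    assert(w[0] == "M\xFCnchen");
}
```

If a splitter tested text[i] without the cast, a word containing an 8-bit character could be cut in two or dropped on platforms where char is signed, which would look a lot like the symptom described.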

Please let me know if this helps,

WordList.patch

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] 
You will receive a message to confirm this. 
