At 5:21 PM +0100 1/31/00, Gergely Madarasz wrote:
>I use htdig with a locale: de_DE setting. It seems unable to find
>occurrences of words containing non-ascii characters that are part of
>titles, <Hn> or emphasis elements. Say, if i look for "b�g" in my
>data, it finds an index.html document that contains the line
This is rather odd. You see, the HTML parser doesn't pay much
attention to emphasis tags like <strong> or <em> and doesn't really
do anything different about <Hn> tags as far as recording words.
However, Marc Pohl <[EMAIL PROTECTED]> found a problem with the
handling of 8-bit characters. I don't know whether his fix also
solves this problem, but the patch is attached.
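
For illustration only: the contents of the patch are not reproduced in
this message, so the sketch below is not htdig's code and may not match
what the patch changes. It shows one common way 8-bit characters get
mishandled in C++ word splitting. Where plain char is signed, a byte
such as 0xF6 is negative, and passing a negative value to isalnum() is
undefined behavior unless it is first cast to unsigned char. The names
is_word_char and split_words are hypothetical.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not from htdig): true if byte c counts as part of a word
// under the current LC_CTYPE locale.
static bool is_word_char(char c)
{
    // Cast to unsigned char first. On platforms where char is signed, 8-bit
    // characters are negative, and isalnum() on a negative value other than
    // EOF is undefined behavior.
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper (not from htdig): split text into runs of word characters.
static std::vector<std::string> split_words(const std::string &text)
{
    std::vector<std::string> words;
    std::string current;
    for (char c : text) {
        if (is_word_char(c)) {
            current += c;
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        words.push_back(current);
    }
    return words;
}
```

Whether an 8-bit byte counts as a letter here still depends on the
active locale. In the default "C" locale it normally does not, so a
program would have to call setlocale(LC_CTYPE, ...) with a locale that
is actually installed (for example de_DE) before a word containing such
a byte stays in one piece.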
Please let me know if this helps,
WordList.patch
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/