Looking at Parser.flex...
/* Non whitespace and not close of tag (right angle bracket), i.e.
 * chars that would not cause an unquoted attribute to end */
NONSEP=[^>\n\r\ \t\b\012:?]
NONSEP_NOQUOTE=[^>\n\r\ \t\b\012:?"]
This I don't understand... "?" and ":" do not terminate the attribute
(meaning the URL in an a href=<unquoted URL>). Presumably they are excluded
to reduce backtracking? Anyway, the proposed modifications are:
NONSEP=[^>\n\r\ \t\b\012:]
NONSEP_NOQUOTE=[^>\n\r\ \t\b\012:"]
......
/* Catch any colon or ?htl= within the URL */
LINK_PATTERNS1={LINK_ATTRS}{WS}={WS}["][^":]*[:][^"]*
LINK_PATTERNS2={LINK_ATTRS}{WS}={WS}({NONSEP_NOQUOTE}{NONSEP}*)?[:]{NONSEP}*
LINK_PATTERNS3={LINK_ATTRS}{WS}={WS}["][^"?]*[?]htl=
LINK_PATTERNS4={LINK_ATTRS}{WS}={WS}({NONSEP_NOQUOTE}{NONSEP}*)?[?]htl=
LINK_PATTERNS={LINK_PATTERNS1}|{LINK_PATTERNS2}|{LINK_PATTERNS3}|{LINK_PATTERNS4}
This should achieve the functionality we want: block all colons (if we
want to change the port, we should encode it as
__CHECKED_HTTP_hostname_port__ or something), and allow ? unless it's
part of a ?htl=... However, I could be grossly mistaken. Comments?
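
To sanity-check the intent, here is a rough sketch that transcribes the
four patterns into java.util.regex and tries them on a few attribute
strings. It is only an approximation of what the generated lexer would
do. LINK_ATTRS and WS are not shown above, so the definitions used here
(href|src and optional blanks) are assumptions, as are the class and
method names. "?htl=" is read as a literal question mark followed by
"htl=", as the comment in the grammar describes.

```java
import java.util.regex.Pattern;

class LinkPatternSketch {
    // Assumed stand-ins: the post does not show LINK_ATTRS or WS.
    static final String LINK_ATTRS = "(?:href|src)";
    static final String WS = "[ \\t\\r\\n]*";

    // Proposed character classes; \012 is the same character as \n,
    // and \x08 is the backspace that the grammar writes as \b.
    static final String NONSEP = "[^> \\n\\r\\t\\x08:]";
    static final String NONSEP_NOQUOTE = "[^> \\n\\r\\t\\x08:\"]";

    static final String PREFIX = LINK_ATTRS + WS + "=" + WS;

    static final Pattern LINK_PATTERNS = Pattern.compile(
        // 1: quoted value containing a colon
        PREFIX + "\"[^\":]*:[^\"]*"
        // 2: unquoted value containing a colon
        + "|" + PREFIX + "(?:" + NONSEP_NOQUOTE + NONSEP + "*)?:" + NONSEP + "*"
        // 3: quoted value whose first '?' starts "?htl="
        + "|" + PREFIX + "\"[^\"?]*[?]htl="
        // 4: unquoted value containing "?htl="
        + "|" + PREFIX + "(?:" + NONSEP_NOQUOTE + NONSEP + "*)?[?]htl=");

    // True if any of the four link patterns matches somewhere in the text.
    static boolean isCaught(String tagText) {
        return LINK_PATTERNS.matcher(tagText).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "href=\"http://example.com/\"",
            "href=http://example.com/",
            "href=\"/page?htl=5\"",
            "href=/page?htl=5",
            "href=\"/page?key=value\"",
            "href=/page?key=value",
        };
        for (String s : samples) {
            System.out.println(s + " -> caught: " + isCaught(s));
        }
    }
}
```

If the transcription is faithful, anything with a colon or a ?htl= is
caught, and a plain query string such as /page?key=value is let through.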