Hi all

As the title says, I had problems with comments in HTML and with the file
HTML.cc.
What is the problem? Well, most web pages at our school are produced by
people who don't know s**t about HTML.

They produce comments like:

<!hello -->
<!------------hello------------>

The first one is captured by HTML.cc at line 169 and following:

              else
                {
                  // Not a comment declaration after all
                  // but possibly DTD: get to the end

It isn't legal HTML, but it is handled.

But the second one causes more problems.
The code from line 155 onward:

              // Not the end of the declaration yet:
              // we'll try to find an actual comment
              if (strncmp((char *)position, "--", 2) == 0)
                {
                  // Found start of comment - now find the end
                  position += 2;
                  q = (unsigned char*)strstr((char *)position, "--");
                  if (!q)
                    {
                      *position = '\0';
                      break;    // Rest of document seems to be a comment...

tries to determine whether it is a comment. Next, the code tries to find
the end by searching for the -- just before the > (comments end with -->).
But for the second comment above this fails: the search stops at the first
--, which sits right after the opening <!--, long before the real end.
Anyway, it messes up my indexing. The trick is (I HOPE) that line 161:

                  q = (unsigned char*)strstr((char *)position, "--");

should be changed to:

                  q = (unsigned char*)strstr((char *)position, "-->");

This finds the first occurrence of -->, so it doesn't handle nested
comments. Anyway, it works on my htdig system.

Another problem is that M$ FrontPage 98 in combination with the FrontPage
Server Extensions doesn't produce <AREA> tags. Instead it creates a webbot
(inside a comment). If the webbot has links, these links don't get indexed.
Of course this is an M$ / user problem; I just wanted you to know about it.

Hope you had some time to read it.

Greetz

--jesse