According to Geoff Hutchison:
> Well I've been working on the documentation. There's still a bunch 
> to do, but I'm going to finish up for the night. If everyone would 
> take a look at tonight's snapshot, I'd appreciate it--I'm looking for 
> showstoppers mostly, but please go through the STATUS report. If 
> there's anything on there already resolved or that can be resolved 
> quickly, let me know.

I don't know if this is a showstopper or not, but the score calculation
code in htsearch/parser.cc just seems wrong to me...


            dm->score = (wr->Flags() & FLAG_TEXT) * config.Double("text_factor", 1);
            dm->score += (wr->Flags() & FLAG_CAPITAL) * config.Double("caps_factor", 1);
            dm->score += (wr->Flags() & FLAG_TITLE) * config.Double("title_factor", 1);
            dm->score += (wr->Flags() & FLAG_HEADING) * config.Double("heading_factor", 1);
            dm->score += (wr->Flags() & FLAG_KEYWORDS) * config.Double("keywords_factor", 1);
            dm->score += (wr->Flags() & FLAG_DESCRIPTION) * config.Double("meta_description_factor", 1);
            dm->score += (wr->Flags() & FLAG_AUTHOR) * config.Double("author_factor", 1);
            dm->score += (wr->Flags() & FLAG_LINK_TEXT) * config.Double("description_factor", 1);

First of all, it strikes me as odd that this code is duplicated in the else
clause of the if statement, and inefficient that every factor is looked up
and converted again for each and every word reference that gets scored.
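
For illustration, here is a minimal sketch of one way to address both points. It is not existing htdig code. The factors are read once into a struct, and a single scoring function serves both branches of the if. `Config`, `ScoreFactors`, `LoadFactors` and `ScoreWord` are hypothetical names, and `Config` only imitates the `Double(name, default)` call shape seen above. The per-flag tests use the `!= 0` and `== FLAG_TEXT` forms discussed further down. That choice is itself an assumption about the intended fix.

```cpp
#include <cassert>
#include <map>
#include <string>

// Flag values copied from the definitions quoted in this message.
#define FLAG_TEXT 0
#define FLAG_CAPITAL 1
#define FLAG_TITLE 2
#define FLAG_HEADING 4
#define FLAG_KEYWORDS 8
#define FLAG_DESCRIPTION 16
#define FLAG_AUTHOR 32
#define FLAG_LINK_TEXT 64

// Hypothetical stand-in for the real configuration object. Only the
// Double(name, default) call shape used in parser.cc is imitated here.
class Config {
public:
    void Set(const std::string &name, double value) { values[name] = value; }
    double Double(const std::string &name, double def) const {
        std::map<std::string, double>::const_iterator it = values.find(name);
        if (it == values.end())
            return def;
        return it->second;
    }
private:
    std::map<std::string, double> values;
};

// Every factor, looked up and converted once instead of once per word.
struct ScoreFactors {
    double text, caps, title, heading, keywords, meta_description, author, link_text;
};

ScoreFactors LoadFactors(const Config &config) {
    ScoreFactors f;
    f.text = config.Double("text_factor", 1);
    f.caps = config.Double("caps_factor", 1);
    f.title = config.Double("title_factor", 1);
    f.heading = config.Double("heading_factor", 1);
    f.keywords = config.Double("keywords_factor", 1);
    f.meta_description = config.Double("meta_description_factor", 1);
    f.author = config.Double("author_factor", 1);
    f.link_text = config.Double("description_factor", 1);
    return f;
}

// A single scoring routine that both branches of the if could call.
// Each flag adds its factor once, however large the flag's bit value is.
double ScoreWord(int flags, const ScoreFactors &f) {
    double score = 0;
    if (flags == FLAG_TEXT)          // no bits set: treated as plain text
        score += f.text;
    if ((flags & FLAG_CAPITAL) != 0)
        score += f.caps;
    if ((flags & FLAG_TITLE) != 0)
        score += f.title;
    if ((flags & FLAG_HEADING) != 0)
        score += f.heading;
    if ((flags & FLAG_KEYWORDS) != 0)
        score += f.keywords;
    if ((flags & FLAG_DESCRIPTION) != 0)
        score += f.meta_description;
    if ((flags & FLAG_AUTHOR) != 0)
        score += f.author;
    if ((flags & FLAG_LINK_TEXT) != 0)
        score += f.link_text;
    return score;
}
```

In the search loop, `LoadFactors(config)` would be called once before the word references are iterated over. Each branch of the if would then just do `dm->score = ScoreWord(wr->Flags(), factors);`.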

However, my bigger concern is this.  The FLAG_* macros used above are
defined as:

#define FLAG_TEXT 0
#define FLAG_CAPITAL 1
#define FLAG_TITLE 2
#define FLAG_HEADING 4
#define FLAG_KEYWORDS 8
#define FLAG_DESCRIPTION 16
#define FLAG_AUTHOR 32
#define FLAG_LINK_TEXT 64
#define FLAG_URL 128

Now if FLAG_TEXT is 0, then (anything & 0) will always be 0, won't it?
So, it seems text_factor will never be counted.  If these FLAG_* macros
are to represent bit masks, none of them should be 0.  Or, if you want
the default value (no bits set) to indicate text words, the test should
be more like (wr->Flags() == FLAG_TEXT).
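
The following is a small self-contained check of the FLAG_TEXT point, with the two alternatives side by side. The function names are made up for the illustration. The `AltFlags` numbering in option 2 is purely hypothetical and is only meant to show what giving text a bit of its own would look like.

```cpp
#include <cassert>

#define FLAG_TEXT 0
#define FLAG_CAPITAL 1
#define FLAG_TITLE 2

// What the current code effectively computes for the text term:
// masking with 0 clears every bit, so the result is 0 for every input.
int TextTermAsWritten(int flags) { return flags & FLAG_TEXT; }

// Option 1: treat "no bits set" as plain text.
bool IsTextByEquality(int flags) { return flags == FLAG_TEXT; }

// Option 2 (hypothetical renumbering, values illustrative only): give text
// its own bit so it can be tested with & like every other flag.
enum AltFlags {
    ALT_FLAG_TEXT = 1,
    ALT_FLAG_CAPITAL = 2,
    ALT_FLAG_TITLE = 4
};

bool IsTextByBit(int flags) { return (flags & ALT_FLAG_TEXT) != 0; }
```

The two options differ in meaning. With option 1, a word counts as text only when it carries no other flag. With option 2, a word can be both text and, say, capitalized. However, option 2 presumably changes the flag values stored with existing word references.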

What's more, (wr->Flags() & FLAG_LINK_TEXT) will always be either 0 or 64,
depending on whether that bit is set in wr->Flags().  So, for link text,
you'll end up adding 64 times the value of description_factor to the score.
Somehow, I don't think that's the intended behaviour.  Perhaps something
like ((wr->Flags() & FLAG_LINK_TEXT) != 0) would do?  I'd make the changes
and commit them myself, but at this stage I'd like a second opinion in
case I'm misreading something.
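
The arithmetic in isolation shows the problem. Masking yields the bit's own value rather than 1, so the factor gets scaled by that value. This affects every flag except FLAG_CAPITAL, whose value happens to be 1. For example, a title word gets 2 times title_factor. The helper names below are illustrative only.

```cpp
#include <cassert>

#define FLAG_CAPITAL 1
#define FLAG_TITLE 2
#define FLAG_LINK_TEXT 64

// As in parser.cc: the masked value (0 or the bit's own value) multiplies
// the factor, so a set FLAG_LINK_TEXT bit contributes 64 * factor.
double TermAsWritten(int flags, int mask, double factor) {
    return (flags & mask) * factor;
}

// Proposed form: the comparison yields false or true, which becomes
// 0 or 1 in arithmetic, so a set bit contributes exactly 1 * factor.
double TermFixed(int flags, int mask, double factor) {
    return ((flags & mask) != 0) * factor;
}
```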

I'm also a bit unsure about what's allowed in C++.  Apparently the
ternary (x ? y : z) operator isn't supposed to be used, judging by the
response when I used it, so I'm wondering whether an expression like
(x == y) or (x != y) is guaranteed to be 0 or 1 in C++, as it is in C.
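
On the C++ question: the comparison operators yield a value of type bool rather than int. However, the language guarantees that when a bool is converted or promoted to an integer, false becomes 0 and true becomes 1. So (x != y) used in arithmetic behaves as it does in C. The sketch below checks this. It also shows an explicit if as an alternative that avoids both the conversion and ?:. The function names are illustrative only.

```cpp
#include <cassert>

// (flags & mask) != 0 has type bool; returning it as int converts
// false to 0 and true to 1, so the result is always exactly 0 or 1.
int AsZeroOrOne(int flags, int mask) {
    return (flags & mask) != 0;
}

// An explicit if sidesteps the bool-to-int conversion (and ?:) entirely.
double FactorIfSet(int flags, int mask, double factor) {
    if (flags & mask)
        return factor;
    return 0.0;
}
```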

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] 
You will receive a message to confirm this. 
