According to Aaron Turner:
> So, if a user does a search from category 1.3.5 on a word that matches
> against article 127 he'll get one match.  If he does the search from cat
> 1.3 however he will get two hits for the same article (127).  This is
> what I'm trying to prevent.  Basically, I need htsearch to scan the URLs
> of all matches, select the value of 'id' and compare each hit to the
> others and drop duplicates.
> 
> I was told the way to do this was by modifying Display::buildMatchList() in
> htsearch/Display.cc to weed out the duplicates before they're counted and
> paginated.  (Doing this post htsearch via a wrapper causes incorrect
> paging info.)
> 
> Honestly, I'm pretty SOL at this point.  I barely can do 'hello world' in
> C++, and a friend of mine who knows C/C++ took a quick look at the code and
> quickly gave up because it wasn't intuitive in his opinion.
> 
> I was hoping someone could give me some pointers that I could either
> forward to my friend so that he could do it, or so that when I pick up one
> of the 5 or so C++ books I have laying around I can figure it out myself.
> (I've done a lot of Pascal/Perl programming in the past, so I know how to
> program, just not C++ or much OOP.)

I think your friend gave up too easily.  If Display::buildMatchList()
is too unintuitive, how about Display::includeURL(), which is called
from buildMatchList()?  It's quite straightforward, and selects matches
based solely on URL.  Currently, it checks the URL against two lists,
limitTo and excludeFrom, built from the restrict and exclude input
parameters, and returns 1 if the URL is to be included.  You'd just
need to add further tests before returning 1, to make sure the URL is
not a duplicate.  Your new code would have to keep track of the IDs
it has seen so far, for any URLs that call your particular CGI script,
and check whether each new ID is one it has already seen.  If so, it
returns 0; if it's a new ID, or the URL is not a reference to this CGI
script, it returns 1.
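
To give you or your friend something concrete to start from, here is a
rough, standalone sketch of that extra test.  It is not the actual
htsearch code: it uses std::string and std::set rather than ht://Dig's
own classes, and the names extractArticleId, DuplicateFilter and the
script name you pass in are all made up for illustration.  It also
assumes the article number appears in the query string as "id=...".

```cpp
#include <set>
#include <string>

// Hypothetical helper (not part of ht://Dig): if `url` refers to the CGI
// script named `script` and carries an "id=..." query parameter, store the
// value in `id` and return true.  Otherwise return false.
static bool extractArticleId(const std::string &url,
                             const std::string &script,
                             std::string &id)
{
    std::string::size_type q = url.find('?');
    if (q == std::string::npos)
        return false;

    // The part before '?' must end with the script name.
    std::string path = url.substr(0, q);
    if (path.size() < script.size() ||
        path.compare(path.size() - script.size(), script.size(), script) != 0)
        return false;

    // Walk the query string one "name=value" pair at a time.
    std::string::size_type pos = q + 1;
    while (pos < url.size()) {
        std::string::size_type end = url.find_first_of("&;", pos);
        if (end == std::string::npos)
            end = url.size();
        if (url.compare(pos, 3, "id=") == 0) {
            id = url.substr(pos + 3, end - pos - 3);
            return !id.empty();
        }
        pos = end + 1;
    }
    return false;
}

// Hypothetical class: remembers every id seen so far and answers the same
// 1/0 question that includeURL() answers.
class DuplicateFilter {
public:
    explicit DuplicateFilter(const std::string &script) : script_(script) {}

    // Returns 1 if the URL should be kept, 0 if its id was already seen.
    int include(const std::string &url)
    {
        std::string id;
        if (!extractArticleId(url, script_, id))
            return 1;   // not a reference to our CGI script: never filtered
        // insert() reports whether the id was new to the set.
        return seen_.insert(id).second ? 1 : 0;
    }

private:
    std::string script_;
    std::set<std::string> seen_;
};
```

In the real code you would call something like include() just before
includeURL() returns 1, with the set of seen IDs living somewhere that
lasts for the whole search (a member of Display, say).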

As long as you don't care which of the duplicates it selects, this should
do the trick.  If you want to get fancier and select the duplicate with the
lowest section number, you'd probably have to do post-processing on the
whole list, after buildMatchList() has gathered up all matches.
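
For the fancier variant, a post-processing pass might look roughly like
the sketch below.  Again this is standalone illustration, not htsearch
code: the Match struct, parseSection and dropDuplicates are invented
names, and I'm assuming each match has already been split into an id and
a dotted section string such as "1.3.5", with "lowest" meaning that 1.3
sorts before 1.3.5 and 1.9 before 1.10.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Hypothetical record for one search hit; not an ht://Dig type.
struct Match {
    std::string url;
    std::string id;       // empty if the URL is not our CGI script
    std::string section;  // dotted section number, e.g. "1.3.5"
};

// Turn "1.3.5" into {1, 3, 5} so sections compare numerically.
static std::vector<long> parseSection(const std::string &s)
{
    std::vector<long> parts;
    std::string::size_type pos = 0;
    while (pos <= s.size()) {
        std::string::size_type dot = s.find('.', pos);
        if (dot == std::string::npos)
            dot = s.size();
        parts.push_back(std::strtol(s.substr(pos, dot - pos).c_str(), 0, 10));
        pos = dot + 1;
    }
    return parts;
}

// Keep one match per id, preferring the lowest section number.  The
// surviving match takes the list position of the first hit for that id.
static std::vector<Match> dropDuplicates(const std::vector<Match> &matches)
{
    std::map<std::string, std::size_t> best;  // id -> index into result
    std::vector<Match> result;
    for (std::size_t i = 0; i < matches.size(); ++i) {
        const Match &m = matches[i];
        if (m.id.empty()) {          // not our script: always kept
            result.push_back(m);
            continue;
        }
        std::map<std::string, std::size_t>::iterator it = best.find(m.id);
        if (it == best.end()) {
            best[m.id] = result.size();
            result.push_back(m);
        } else if (parseSection(m.section) <
                   parseSection(result[it->second].section)) {
            result[it->second] = m;  // lower section wins
        }
    }
    return result;
}
```

Because this runs on the complete list before anything is counted or
paginated, the page totals should still come out right.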

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.
