[htdig] Won't index Word docs

Keith Pettit Mon, 09 Dec 2002 14:43:32 -0800

I am so boggled with this. I've followed all the instructions, tried
different versions of htdig and different parsers, and nothing seems to
work. I can't tell where the failure is.


Basically, I'm running htdig on an index page I created. All this page
has is a bunch of links to Word documents, but it doesn't search through
any of the docs.

This is what I get when I run it:
htdig: Run complete
htdig: 1 server seen:
htdig:     www.drgutah.com:80 1 document
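The run above reports only 1 document, which suggests htdig never followed the .doc links at all. One way to check that offline is the minimal Python sketch below: it extracts the links from the index page and reports whether each .doc link would pass the `limit_urls_to` and `exclude_urls` settings from the config further down. It assumes both settings behave as plain substring lists, which is a simplification of htdig's own matching; `LinkCollector` and `check_doc_links` are hypothetical names, not part of htdig.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def check_doc_links(html, base_url, limit_urls_to, exclude_urls):
    """Return (absolute_url, verdict) for every .doc link in html.

    Assumption: limit_urls_to and exclude_urls are treated as plain
    substring lists, a simplification of htdig's real matching.
    """
    collector = LinkCollector()
    collector.feed(html)
    results = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        if not url.split("?")[0].lower().endswith(".doc"):
            continue  # only report Word document links
        if not any(pattern in url for pattern in limit_urls_to):
            verdict = "outside limit_urls_to"
        elif any(pattern in url for pattern in exclude_urls):
            verdict = "matched exclude_urls"
        else:
            verdict = "ok"
        results.append((url, verdict))
    return results
```

For example, links that point at a different host name than `start_url` would come back as "outside limit_urls_to", which would explain a crawl that stops after the index page.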

I've tried htparsedoc, parse_doc.pl, and doc2html. htparsedoc and
parse_doc.pl work if I execute them by themselves and point them at a
Word file. I can't get doc2html to work, and I assume that's because I
won't buy the commercial converter. So I'm assuming there is some sort
of issue in my config. I've got it pointing to the right places; it just
seems to be ignoring the .doc files. Maybe there is some way I can force
it to go through them.

Thanks for any help.

Keith
[EMAIL PROTECTED]

external_parsers: application/msword /opt/www/htdig/bin/htparsedoc \
                  application/postscript /opt/www/htdig/bin/htparsedoc \
                  application/pdf /opt/www/htdig/bin/htparsedoc

database_dir:           /opt/www/htdig/db
start_url:              http://myurl.com
limit_urls_to:          ${start_url}
exclude_urls:           /cgi-bin/ .cgi
maintainer:             [EMAIL PROTECTED]
max_head_length:        10000
max_doc_size:           2000000
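As far as we understand htdig, it picks an external parser by the Content-Type header the web server sends for each document, not by the file extension, so a server that labels .doc files as something other than application/msword (for example application/octet-stream) would leave them unparsed. The hypothetical helpers below are a minimal Python sketch of that lookup, assuming the simple "content-type parser" pair format used in the external_parsers line above; htdig's own handling of the attribute may differ.

```python
def parse_external_parsers(value):
    """Split an external_parsers value into {mime_type: parser_path}.

    Assumes whitespace-separated 'content-type parser' pairs, with
    trailing backslashes used only as line continuations.
    """
    tokens = [t for t in value.split() if t != "\\"]
    if len(tokens) % 2:
        raise ValueError("external_parsers needs content-type/parser pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))


def parser_for(parsers, content_type_header):
    """Return the parser for a Content-Type header, or None if none matches.

    Parameters such as '; charset=...' are ignored, and the MIME type
    is compared case-insensitively.
    """
    mime = content_type_header.split(";")[0].strip().lower()
    return parsers.get(mime)
```

Feeding `parser_for` the Content-Type your server actually returns for one of the .doc URLs (for example as shown by `curl -I`) indicates whether the application/msword entry can ever match.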

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html