According to Warren Jones:
> I'm not at all sure that the patch to URL.cc is the best solution,
> but something like it is essential for our site, and I suspect
> others are in the same situation.  Here are the details:
> 
>     o We must index only valid_extensions, since we have no
>       control over what individual users put in their web
>       directories, and some are ...uhm... indiscriminate.
> 
>     o If a user puts a binary executable in his web directory,
>       our server announces that it's type "text/html".
>       I don't have control over this either.

This is a bit odd.  I believe most servers use text/plain as the default
type for files with no suffix or an unknown suffix.  Still, htdig would
index text/plain files, so binary files with no file name suffix would
still pose a problem.

>     o Using valid_extensions also allows URL's with no extension
>       (after my patch to Retriever.cc).  This is as it should be,
>       since many URL's with no extension are subdirectories,
>       which we need to index.  But other URL's with no extension
>       are binary executables or heaven knows what.
> 
>     o Users can't be relied on to use a trailing slash in links
>       that point to a directory, e.g. <A HREF="subdirectory/">.
> 
> In short, I see no way to tell whether a URL with no extension
> is 1) a subdirectory, which we want to index or 2) binary garbage,
> which we want to ignore, except to do what I've done in URL.cc:
> add a trailing slash to the URL and try to retrieve it.
> 
> Still, I agree with Gilles in being a little uncomfortable with
> this solution.  I'd be happy if someone could suggest something
> that's more elegant.

The problem is that this change totally breaks things for cases where it's
valid to have text files with no suffix.  E.g., one may want to index
a directory of HTML documentation files which also contains text/plain
files like COPYING, ChangeLog, README, etc.  If your change is necessary
for your system, then perhaps it could be selectable by a new config
attribute, but to make this the default or only action would cause a
lot of users a lot of grief.
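
As a rough sketch of what a selectable behaviour along those lines might
look like: the option name slash_extensionless_urls and both helper
functions below are hypothetical, not existing htdig attributes or code,
and the path is assumed to carry no query string or fragment.

```cpp
#include <string>

// Hypothetical option; htdig has no attribute by this name.
// Off by default, so suffix-less text files keep working.
struct CrawlOptions {
    bool slash_extensionless_urls = false;
};

// True if the last path segment is non-empty and contains no '.',
// i.e. it could be either a directory or a suffix-less file.
bool hasNoExtension(const std::string &path) {
    std::string::size_type slash = path.rfind('/');
    std::string last =
        (slash == std::string::npos) ? path : path.substr(slash + 1);
    return !last.empty() && last.find('.') == std::string::npos;
}

// Append a trailing slash to extension-less paths only when the
// site has explicitly asked for it; otherwise leave the path alone.
std::string normalizePath(const std::string &path,
                          const CrawlOptions &opts) {
    if (opts.slash_extensionless_urls && hasNoExtension(path))
        return path + "/";
    return path;
}
```

With the option left off, a path like /docs/README is passed through
untouched and can still be fetched as text/plain.  A site in Warren's
situation would turn it on and get /docs/README/ instead, accepting that
suffix-less text files are then no longer retrieved as files.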

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
