On Mon, Jan 17, 2000 at 01:00:53PM -0600, Gilles Detillieux wrote:
> According to Warren Jones:
> > I'm not at all sure that the patch to URL.cc is the best solution,
> > but something like it is essential for our site, and I suspect
> > others are in the same situation. Here are the details:
> >
> > o We must index only valid_extensions, since we have no
> > control over what individual users put in their web
> > directories, and some are ...uhm... indiscriminate.
> >
> > o If a user puts a binary executable in his web directory,
> > our server announces that it's type "text/html".
> > I don't have control over this either.
>
> This is a bit odd. I believe most servers use text/plain as the default
> type, for files with no suffix or an unknown suffix. Still, htdig would
> index text/plain files, so binary files with no file name suffix would
> still pose a problem.

My mistake. Our web server actually defaults to text/plain for files
without an extension. As you say, this still presents a problem.

> The problem is that change totally breaks things for cases where it's
> valid to have text files with no suffix. E.g., one may want to index
> a directory of HTML documentation files which also contains text/plain
> files like COPYING, ChangeLog, README, etc. If your change is necessary
> for your system, then perhaps it could be selectable by a new config
> attribute, but to make this the default or only action would cause a
> lot of users a lot of grief.

I attempted to address this by checking whether "valid_extensions"
is set before rejecting URLs that have no extension and are not
directories. I suspect that this will be the desired behavior in
most cases where valid_extensions is used, but perhaps it would be
better to have a new config attribute, maybe something like
"no_null_extensions". Or perhaps URLs that have no extension and
are not directories should be retrieved only if "valid_extensions"
includes a zero-length string. This would require a change to my
patch for Retriever.cc.

It's unfortunate that we have to skip files like COPYING, ChangeLog,
README, etc., but this seems unavoidable when there's no reliable way
to tell whether such files are text or binary.
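
To make the zero-length-string idea concrete, here is a rough standalone
sketch of the check I have in mind. It is not the actual patch to URL.cc
or Retriever.cc: the helper names (path_extension, extension_allowed) and
the use of std::set are my own assumptions for illustration, and the real
code would read the list from the "valid_extensions" attribute.

```cpp
#include <set>
#include <string>

// Hypothetical helper (not part of ht://Dig): return the extension of the
// last path segment, including the dot, or "" if that segment has none.
std::string path_extension(const std::string &path)
{
    std::string::size_type slash = path.rfind('/');
    std::string::size_type start =
        (slash == std::string::npos) ? 0 : slash + 1;
    std::string::size_type dot = path.rfind('.');
    // A dot that appears before the last '/' belongs to a directory name.
    if (dot == std::string::npos || dot < start)
        return "";
    return path.substr(dot);
}

// Hypothetical check: decide whether a URL path should be retrieved.
//  - an empty valid_extensions set means "no restriction"
//  - paths ending in '/' are treated as directories and always allowed
//  - extensionless files are allowed only if "" is listed explicitly
bool extension_allowed(const std::string &path,
                       const std::set<std::string> &valid_extensions)
{
    if (valid_extensions.empty())
        return true;
    if (!path.empty() && path[path.size() - 1] == '/')
        return true;
    return valid_extensions.count(path_extension(path)) > 0;
}
```

With this approach, a site like ours would list only real extensions and
so skip README along with the binaries, while a site that trusts its
extensionless files could add the empty string to the list.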
--
Warren