Hi, folks.

This patch from Paul Henson fixes some long-standing complaints about
limitations in the local_urls and local_user_urls support.  It's for
3.1.4, so I don't know how easily it'll apply to 3.2.0b1, but it looks
to me like a clean implementation.  Can we have a vote on including it
in 3.2.0b1, despite the feature freeze?

Here's my
+1

--- begin forwarded message from Paul B. Henson ---
From [EMAIL PROTECTED]  Thu Jan 27 16:23:52 2000
Date: Thu, 27 Jan 2000 14:23:46 -0800 (PST)
From: "Paul B. Henson" <[EMAIL PROTECTED]>
To: Gilles Detillieux <[EMAIL PROTECTED]>
cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: Local file system access enhancements (PR#744)
In-Reply-To: <[EMAIL PROTECTED]>
Message-ID: <[EMAIL PROTECTED]>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Thu, 20 Jan 2000, Gilles Detillieux wrote:

> According to Paul B. Henson ([EMAIL PROTECTED]):
> > I would like to request that the local_default_doc allow a list of strings
> > specifying multiple default index files sorted by preference.
> 
> It's not really a bug, but it is a limitation that several others have
> requested we address.  It's on the to-do list, but won't make it into
> 3.2.0b1.  Hopefully someone will come forward to implement it for the
> following release.
>
> > We also have multiple locations where a ~name reference is redirected (e.g.,
> > /~foo/ could be either /dfs/user/foo or /dfs/group/foo).
> > 
> > I would like to request that the local_user_urls be enhanced to allow ~foo
> > to be found by searching multiple locations in preference order.
[...]
> 
> Ideally, though, htdig should keep trying other matches in the list if the
> first one fails, rather than only trying the first match in local_urls or
> local_user_urls.  Again, hopefully someone will implement this eventually.

Here is a patch to htdig-3.1.4 that allows you to specify multiple default
index files separated by white space, e.g.,

   local_default_doc:  index.html index.htm

that will be tried in order until one is found or the list is exhausted.
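
As a rough illustration of the behaviour described above (not the patch itself), here is a minimal standalone sketch using the C++ standard library instead of ht://Dig's String and StringList classes. The helper names split_default_docs, expand_default_docs and first_regular_file are hypothetical, and the sketch assumes a POSIX system for stat():

```cpp
#include <sstream>
#include <string>
#include <vector>
#include <sys/stat.h>

// Hypothetical helper: split a whitespace-separated attribute value such as
// "index.html index.htm" into its tokens, keeping the order given.
std::vector<std::string> split_default_docs(const std::string &value)
{
    std::vector<std::string> docs;
    std::istringstream in(value);
    std::string tok;
    while (in >> tok)
        docs.push_back(tok);
    return docs;
}

// Hypothetical helper: if `path` ends in '/' and default documents are
// configured, return one candidate per default document in preference
// order; otherwise return `path` unchanged as the only candidate.
std::vector<std::string> expand_default_docs(const std::string &path,
                                             const std::vector<std::string> &docs)
{
    std::vector<std::string> out;
    if (!path.empty() && path.back() == '/' && !docs.empty()) {
        for (const std::string &d : docs)
            out.push_back(path + d);
    } else {
        out.push_back(path);
    }
    return out;
}

// Hypothetical helper: return the first candidate that exists and is a
// regular file, or an empty string if the list is exhausted.
std::string first_regular_file(const std::vector<std::string> &candidates)
{
    struct stat st;
    for (const std::string &c : candidates)
        if (stat(c.c_str(), &st) == 0 && S_ISREG(st.st_mode))
            return c;
    return std::string();
}
```

With "index.html index.htm" configured, a path such as /dfs/web/public/ would yield /dfs/web/public/index.html and then /dfs/web/public/index.htm as candidates, and an empty result from first_regular_file corresponds to falling back to HTTP.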

The patch also adds the ability to have multiple identical prefixes with
different paths in the local_urls configuration variable, e.g.,

   local_urls:  http://www.csupomona.edu/=/dfs/web/public/ \
                http://www.csupomona.edu/=/dfs/web/data/

that will be tried in order until one is found or the list is exhausted.
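
To show the idea of collecting every matching prefix rather than stopping at the first, here is a small sketch under the same assumptions as before: standard C++ containers rather than ht://Dig's own classes, and hypothetical names (PrefixMapping, local_url_candidates) that do not appear in the patch:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical type: one "prefix=path" entry from local_urls.
typedef std::pair<std::string, std::string> PrefixMapping;

// Hypothetical helper: for every entry whose URL prefix matches the start
// of `url`, replace the prefix with the local path and collect the result.
// Entries are visited in configuration order, so the returned list is in
// preference order. An empty result means the URL is not local.
std::vector<std::string> local_url_candidates(const std::string &url,
                                              const std::vector<PrefixMapping> &mappings)
{
    std::vector<std::string> out;
    for (const PrefixMapping &m : mappings) {
        const std::string &prefix = m.first;
        if (url.compare(0, prefix.size(), prefix) == 0)
            out.push_back(m.second + url.substr(prefix.size()));
    }
    return out;
}
```

For the two entries shown above, a URL like http://www.csupomona.edu/a/b.html would produce /dfs/web/public/a/b.html followed by /dfs/web/data/a/b.html.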

Finally, the patch adds the ability to have multiple locations for
local_user_urls, which are likewise searched in the order given until a
file is found or the list is exhausted, e.g.,

   local_user_urls:  http://www.csupomona.edu/=/dfs/group/,/ \
                     http://www.csupomona.edu/=/dfs/user/,/
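
For illustration only, here is a sketch of how a ~user URL might expand against several "prefix=path,dir" entries. It assumes the candidate is built as path + user + dir + rest, skips the home-directory lookup used when path is empty, and matches the prefix exactly against the part of the URL before the tilde; the names UserMapping and user_url_candidates are hypothetical:

```cpp
#include <string>
#include <vector>

// Hypothetical type: one "prefix=path,dir" entry from local_user_urls.
struct UserMapping {
    std::string prefix; // URL text expected before "~user"
    std::string path;   // directory prepended to the user name
    std::string dir;    // directory appended after the user name
};

// Hypothetical helper: split the URL into head, user and rest around
// "~user/", then build one candidate per matching entry, in configuration
// order. Entries with an empty path are skipped in this sketch (assumed
// to need a home-directory lookup, which is omitted here).
std::vector<std::string> user_url_candidates(const std::string &url,
                                             const std::vector<UserMapping> &maps)
{
    std::vector<std::string> out;
    std::string::size_type tilde = url.find('~');
    if (tilde == std::string::npos)
        return out;
    std::string::size_type slash = url.find('/', tilde);
    if (slash == std::string::npos || slash == tilde + 1)
        return out; // no user name, or nothing after it
    const std::string head = url.substr(0, tilde);
    const std::string user = url.substr(tilde + 1, slash - tilde - 1);
    const std::string rest = url.substr(slash + 1);
    for (const UserMapping &m : maps) {
        if (m.prefix != head || m.path.empty())
            continue; // try the next entry instead of giving up
        out.push_back(m.path + user + m.dir + rest);
    }
    return out;
}
```

Under these assumptions, http://www.csupomona.edu/~foo/x.html would give /dfs/group/foo/x.html first and /dfs/user/foo/x.html second.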


The patch is not overly complicated; hopefully, you will be able to
incorporate it into your next release.


Thanks...

-----------------------------------------------------------------------------------
diff -r -c htdig-3.1.4/htdig/Document.cc htdig-3.1.4-new/htdig/Document.cc
*** htdig-3.1.4/htdig/Document.cc       Thu Dec  9 16:28:44 1999
--- htdig-3.1.4-new/htdig/Document.cc   Thu Jan 27 11:51:52 2000
***************
*** 571,589 ****
  
  
  //*****************************************************************************
! // DocStatus Document::RetrieveLocal(time_t date, char *filename)
  //   Attempt to retrieve the document pointed to by our internal URL
! //   using a local filename given. Returns Document_ok,
  //   Document_not_changed or Document_not_local (in which case the
  //   retriever tries it again using HTTP).
  //
  Document::DocStatus
! Document::RetrieveLocal(time_t date, char *filename)
  {
      struct stat stat_buf;
!     // Check that it exists, and is a regular file. 
!     if ((stat(filename, &stat_buf) == -1) || !S_ISREG(stat_buf.st_mode))
!       return Document_not_local;
  
      modtime = stat_buf.st_mtime;
      if (modtime <= date)
--- 571,602 ----
  
  
  //*****************************************************************************
! // DocStatus Document::RetrieveLocal(time_t date, StringList *filenames)
  //   Attempt to retrieve the document pointed to by our internal URL
! //   using a list of potential local filenames given. Returns Document_ok,
  //   Document_not_changed or Document_not_local (in which case the
  //   retriever tries it again using HTTP).
  //
  Document::DocStatus
! Document::RetrieveLocal(time_t date, StringList *filenames)
  {
      struct stat stat_buf;
!     String *filename;
! 
!     filenames->Start_Get();
! 
!     // Loop through list of potential filenames until the list is exhausted
!     // or a suitable file is found.
!     while ((filename = (String *)filenames->Get_Next()) &&
!          ((stat(*filename, &stat_buf) == -1) || !S_ISREG(stat_buf.st_mode)))
!         if (debug > 1)
!           cout << "  tried local file " << *filename << endl;
!     
!     if (!filename)
!         return Document_not_local;
! 
!     if (debug > 1)
!         cout << "  found existing file " << *filename << endl;
  
      modtime = stat_buf.st_mtime;
      if (modtime <= date)
***************
*** 592,598 ****
      // Process only HTML files (this could be changed if we read
      // the server's mime.types file).
      // (...and handle a select few other types for now...)
!     char *ext = strrchr(filename, '.');
      if (ext == NULL)
                return Document_not_local;
      if ((mystrcasecmp(ext, ".html") == 0) || (mystrcasecmp(ext, ".htm") == 0))
--- 605,611 ----
      // Process only HTML files (this could be changed if we read
      // the server's mime.types file).
      // (...and handle a select few other types for now...)
!     char *ext = strrchr(*filename, '.');
      if (ext == NULL)
                return Document_not_local;
      if ((mystrcasecmp(ext, ".html") == 0) || (mystrcasecmp(ext, ".htm") == 0))
***************
*** 607,613 ****
        return Document_not_local;
  
      // Open it
!     FILE *f = fopen(filename, "r");
      if (f == NULL)
        return Document_not_local;
  
--- 620,626 ----
        return Document_not_local;
  
      // Open it
!     FILE *f = fopen(*filename, "r");
      if (f == NULL)
        return Document_not_local;
  
diff -r -c htdig-3.1.4/htdig/Document.h htdig-3.1.4-new/htdig/Document.h
*** htdig-3.1.4/htdig/Document.h        Thu Dec  9 16:28:44 1999
--- htdig-3.1.4-new/htdig/Document.h    Thu Jan 27 11:51:16 2000
***************
*** 19,24 ****
--- 19,25 ----
  #include "Object.h"
  #include "URL.h"
  #include "htString.h"
+ #include "StringList.h"
  #if TIME_WITH_SYS_TIME
  # include <sys/time.h>
  # include <time.h>
***************
*** 79,85 ****
        Document_not_local
      };
      DocStatus                 RetrieveHTTP(time_t date);
!     DocStatus                 RetrieveLocal(time_t date, char *filename);
  
      //
      // Return an appropriate parsable object for the document type.
--- 80,86 ----
        Document_not_local
      };
      DocStatus                 RetrieveHTTP(time_t date);
!     DocStatus                 RetrieveLocal(time_t date, StringList *filenames);
  
      //
      // Return an appropriate parsable object for the document type.
diff -r -c htdig-3.1.4/htdig/Retriever.cc htdig-3.1.4-new/htdig/Retriever.cc
*** htdig-3.1.4/htdig/Retriever.cc      Thu Dec  9 16:28:44 1999
--- htdig-3.1.4-new/htdig/Retriever.cc  Thu Jan 27 12:18:26 2000
***************
*** 139,148 ****
        {
            String robotsURL = "http://";
            robotsURL << u.host() << "/robots.txt";
!           String *localRobotsFile = GetLocal(robotsURL.get());
!           server = new Server(u.host(), u.port(), localRobotsFile);
            servers.Add(u.signature(), server);
!           delete localRobotsFile;
        }
        else if (from && visited.Exists(url)) 
        {
--- 139,148 ----
        {
            String robotsURL = "http://";
            robotsURL << u.host() << "/robots.txt";
!           StringList *localRobotsFiles = GetLocal(robotsURL.get());
!           server = new Server(u.host(), u.port(), localRobotsFiles);
            servers.Add(u.signature(), server);
!           delete localRobotsFiles;
        }
        else if (from && visited.Exists(url)) 
        {
***************
*** 402,413 ****
      // Retrive document, first trying local file access if possible.
      Document::DocStatus status;
      server = (Server *) servers[url.signature()];
!     String *local_filename = GetLocal(url.get());
!     if (local_filename)
      {  
          if (debug > 1)
!           cout << "Trying local file " << *local_filename << endl;
!         status = doc->RetrieveLocal(date, *local_filename);
          if (status == Document::Document_not_local)
          {
            if (local_urls_only)
--- 402,413 ----
      // Retrive document, first trying local file access if possible.
      Document::DocStatus status;
      server = (Server *) servers[url.signature()];
!     StringList *local_filenames = GetLocal(url.get());
!     if (local_filenames)
      {  
          if (debug > 1)
!           cout << "Trying local files" << endl;
!         status = doc->RetrieveLocal(date, local_filenames);
          if (status == Document::Document_not_local)
          {
            if (local_urls_only)
***************
*** 421,427 ****
            else
                status = Document::Document_no_server;
          }
!         delete local_filename;
      }
      else if (server && !server->IsDead())
          status = doc->RetrieveHTTP(date);
--- 421,427 ----
            else
                status = Document::Document_no_server;
          }
!         delete local_filenames;
      }
      else if (server && !server->IsDead())
          status = doc->RetrieveHTTP(date);
***************
*** 747,762 ****
  
  
  //*****************************************************************************
! // String* Retriever::GetLocal(char *url)
! //   Returns a string containing the (possible) local filename
  //   of the given url, or 0 if it's definitely not local.
! //   THE CALLER MUST FREE THE STRING AFTER USE!
  //
! String*
  Retriever::GetLocal(char *url)
  {
      static StringList *prefixes = 0;
      static StringList *paths = 0;
  
      //
      // Initialize prefix/path list if this is the first time.
--- 747,763 ----
  
  
  //*****************************************************************************
! // StringList* Retriever::GetLocal(char *url)
! //   Returns a list of strings containing the (possible) local filenames
  //   of the given url, or 0 if it's definitely not local.
! //   THE CALLER MUST FREE THE STRINGLIST AFTER USE!
  //
! StringList*
  Retriever::GetLocal(char *url)
  {
      static StringList *prefixes = 0;
      static StringList *paths = 0;
+     static StringList *defaultdocs = 0;
  
      //
      // Initialize prefix/path list if this is the first time.
***************
*** 766,771 ****
--- 767,773 ----
      {
        prefixes = new StringList();
        paths = new StringList();
+       defaultdocs = new StringList();
  
        String t = config["local_urls"];
        char *p = strtok(t, " \t");
***************
*** 782,793 ****
              paths->Add(path);
            p = strtok(0, " \t");
        }
      }
  
      // Check first for local user...
      if (strchr(url, '~'))
      {
!       String *local = GetLocalUser(url);
        if (local)
            return local;
      }
--- 784,804 ----
              paths->Add(path);
            p = strtok(0, " \t");
        }
+       t = config["local_default_doc"];
+       p = strtok(t, " \t");
+       while (p)       
+       {
+           defaultdocs->Add(p);
+           p = strtok(0, " \t");
+       }
+       if (defaultdocs->Count() == 0)
+           { delete defaultdocs; defaultdocs = 0; }  // reset so later null checks are valid
      }
  
      // Check first for local user...
      if (strchr(url, '~'))
      {
!       StringList *local = GetLocalUser(url, defaultdocs);
        if (local)
            return local;
      }
***************
*** 797,802 ****
--- 808,814 ----
          return 0;
      
      String *prefix, *path;
+     StringList *local_names = new StringList();
      prefixes->Start_Get();
      paths->Start_Get();
      while ((prefix = (String*) prefixes->Get_Next()))
***************
*** 807,830 ****
            int l = strlen(url)-prefix->length()+path->length()+4;
            String *local = new String(*path, l);
            *local += &url[prefix->length()];
!           if (local->last() == '/' && config["local_default_doc"] != "")
!             *local += config["local_default_doc"];
!           return local;
        }       
      }
      return 0;
  }
  
  
  //*****************************************************************************
! // String* Retriever::GetLocalUser(char *url)
! //   If the URL has ~user part, returns a string containing the
! //   (possible) local filename of the given url, or 0 if it's
  //   definitely not local.
! //   THE CALLER MUST FREE THE STRING AFTER USE!
  //
! String*
! Retriever::GetLocalUser(char *url)
  {
      static StringList *prefixes = 0, *paths = 0, *dirs = 0;
      static Dictionary home_cache;
--- 819,854 ----
            int l = strlen(url)-prefix->length()+path->length()+4;
            String *local = new String(*path, l);
            *local += &url[prefix->length()];
!           if (local->last() == '/' && defaultdocs) {
!             defaultdocs->Start_Get();
!             while (String *defaultdoc = (String *)defaultdocs->Get_Next()) {
!               String *localdefault = new String(*local, local->length()+defaultdoc->length()+1);
!               localdefault->append(*defaultdoc);
!               local_names->Add(localdefault);
!             }
!             delete local;
!           }
!           else
!             local_names->Add(local);
        }       
      }
+     if (local_names->Count() > 0)
+         return local_names;
+ 
+     delete local_names;
      return 0;
  }
  
  
  //*****************************************************************************
! // StringList* Retriever::GetLocalUser(char *url, StringList *defaultdocs)
! //   If the URL has ~user part, return a list of strings containing the
! //   (possible) local filenames of the given url, or 0 if it's
  //   definitely not local.
! //   THE CALLER MUST FREE THE STRINGLIST AFTER USE!
  //
! StringList*
! Retriever::GetLocalUser(char *url, StringList *defaultdocs)
  {
      static StringList *prefixes = 0, *paths = 0, *dirs = 0;
      static Dictionary home_cache;
***************
*** 882,887 ****
--- 906,912 ----
      paths->Start_Get();
      dirs->Start_Get();
      String *prefix, *path, *dir;
+     StringList *local_names = new StringList();
      while ((prefix = (String*) prefixes->Get_Next()))
      {
          path = (String*) paths->Get_Next();
***************
*** 906,912 ****
            if (home)
                *local += *home;
            else
!               return 0;
        }
        else
        {
--- 931,937 ----
            if (home)
                *local += *home;
            else
!               continue;
        }
        else
        {
***************
*** 915,924 ****
        }
        *local += *dir;
        *local += rest;
!       if (local->last() == '/' && config["local_default_doc"] != "")
!         *local += config["local_default_doc"];
!       return local;
      }
      return 0;
  }
  
--- 940,962 ----
        }
        *local += *dir;
        *local += rest;
!       if (local->last() == '/' && defaultdocs) {
!         defaultdocs->Start_Get();
!         while (String *defaultdoc = (String *)defaultdocs->Get_Next()) {
!           String *localdefault = new String(*local, local->length()+defaultdoc->length()+1);
!           localdefault->append(*defaultdoc);
!           local_names->Add(localdefault);
!         }
!         delete local;
!       }
!       else
!         local_names->Add(local);
      }
+ 
+     if (local_names->Count() > 0)
+         return local_names;
+ 
+     delete local_names;
      return 0;
  }
  
***************
*** 933,939 ****
  {
      int ret;
  
!     String *local_filename = GetLocal(url);
      ret = (local_filename != 0);
      delete local_filename;
  
--- 971,977 ----
  {
      int ret;
  
!     StringList *local_filename = GetLocal(url);
      ret = (local_filename != 0);
      delete local_filename;
  
***************
*** 1174,1180 ****
                    //
                    String robotsURL = "http://";
                    robotsURL << url.host() << "/robots.txt";
!                   String *localRobotsFile = GetLocal(robotsURL.get());
                    server = new Server(url.host(), url.port(), localRobotsFile);
                    servers.Add(url.signature(), server);
                    delete localRobotsFile;
--- 1212,1218 ----
                    //
                    String robotsURL = "http://";
                    robotsURL << url.host() << "/robots.txt";
!                   StringList *localRobotsFile = GetLocal(robotsURL.get());
                    server = new Server(url.host(), url.port(), localRobotsFile);
                    servers.Add(url.signature(), server);
                    delete localRobotsFile;
***************
*** 1307,1313 ****
                    //
                    String robotsURL = "http://";
                    robotsURL << url.host() << "/robots.txt";
!                   String *localRobotsFile = GetLocal(robotsURL.get());
                    server = new Server(url.host(), url.port(), localRobotsFile);
                    servers.Add(url.signature(), server);
                    delete localRobotsFile;
--- 1345,1351 ----
                    //
                    String robotsURL = "http://";
                    robotsURL << url.host() << "/robots.txt";
!                   StringList *localRobotsFile = GetLocal(robotsURL.get());
                    server = new Server(url.host(), url.port(), localRobotsFile);
                    servers.Add(url.signature(), server);
                    delete localRobotsFile;
diff -r -c htdig-3.1.4/htdig/Retriever.h htdig-3.1.4-new/htdig/Retriever.h
*** htdig-3.1.4/htdig/Retriever.h       Thu Dec  9 16:28:44 1999
--- htdig-3.1.4-new/htdig/Retriever.h   Thu Jan 27 12:05:41 2000
***************
*** 12,17 ****
--- 12,18 ----
  #include "Dictionary.h"
  #include "Queue.h"
  #include "List.h"
+ #include "StringList.h"
  
  class URL;
  class Document;
***************
*** 68,75 ****
      //
      // Routines for dealing with local filesystem access
      //
!     String *            GetLocal(char *url);
!     String *            GetLocalUser(char *url);
      int                       IsLocalURL(char *url);
        
  private:
--- 69,76 ----
      //
      // Routines for dealing with local filesystem access
      //
!     StringList *            GetLocal(char *url);
!     StringList *            GetLocalUser(char *url, StringList *defaultdocs);
      int                       IsLocalURL(char *url);
        
  private:
diff -r -c htdig-3.1.4/htdig/Server.cc htdig-3.1.4-new/htdig/Server.cc
*** htdig-3.1.4/htdig/Server.cc Thu Dec  9 16:28:44 1999
--- htdig-3.1.4-new/htdig/Server.cc     Thu Jan 27 11:04:37 2000
***************
*** 20,28 ****
  
  
  //*****************************************************************************
! // Server::Server(char *host, int port, String *local_robots_file)
  //
! Server::Server(char *host, int port, String *local_robots_file)
  {
      if (debug > 0)
        cout << endl << "New server: " << host << ", " << port << endl;
--- 20,28 ----
  
  
  //*****************************************************************************
! // Server::Server(char *host, int port, StringList *local_robots_files)
  //
! Server::Server(char *host, int port, StringList *local_robots_files)
  {
      if (debug > 0)
        cout << endl << "New server: " << host << ", " << port << endl;
***************
*** 47,57 ****
      static int                local_urls_only = config.Boolean("local_urls_only");
      time_t            timeZero = 0;
      Document::DocStatus       status;
!     if (local_robots_file)
      {  
          if (debug > 1)
!           cout << "Trying local file " << *local_robots_file << endl;
!         status = doc.RetrieveLocal(timeZero, *local_robots_file);
          if (status == Document::Document_not_local)
          {
            if (local_urls_only)
--- 47,57 ----
      static int                local_urls_only = config.Boolean("local_urls_only");
      time_t            timeZero = 0;
      Document::DocStatus       status;
!     if (local_robots_files)
      {  
          if (debug > 1)
!           cout << "Trying local files " << endl;
!         status = doc.RetrieveLocal(timeZero, local_robots_files);
          if (status == Document::Document_not_local)
          {
            if (local_urls_only)
diff -r -c htdig-3.1.4/htdig/Server.h htdig-3.1.4-new/htdig/Server.h
*** htdig-3.1.4/htdig/Server.h  Thu Dec  9 16:28:44 1999
--- htdig-3.1.4-new/htdig/Server.h      Thu Jan 27 11:58:29 2000
***************
*** 11,16 ****
--- 11,17 ----
  
  #include "Object.h"
  #include "htString.h"
+ #include "StringList.h"
  #include "Stack.h"
  #include "Queue.h"
  #include "StringMatch.h"
***************
*** 25,31 ****
        //
        // Construction/Destruction
        //
!       Server(char *host, int port, String *local_robots_file = NULL);
        ~Server();
  
        //
--- 26,32 ----
        //
        // Construction/Destruction
        //
!       Server(char *host, int port, StringList *local_robots_files = NULL);
        ~Server();
  
        //
-----------------------------------------------------------------------------------




-- 
Paul B. Henson  |  (909) 869-3781  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768

--- end forwarded message from Paul B. Henson ---

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] 
You will receive a message to confirm this. 
