Hi Bruno,

On 3/26/24 5:20 AM, Bruno Haible wrote:
> Regarding the 'modules' subdirectory: When I see you replacing 2 lines of code
> with 8 lines of code, and especially searching for a 'modules' file name
> component, I wonder whether there's no simpler solution.
> Namely, a module of name NAME can be located at one of the following 
> locations:
>   gnulib_dir/modules/NAME
>   local_dir1/modules/NAME
>   local_dir2/modules/NAME
>   ...
> (Or it can logically be in such a location but physically reside in a 
> temporary
> file.)
> So, if the GLModule.__init__ method knows about which directory (gnulib_dir,
> local_dir1, local_dir2, ...) resolved the lookup from GLModuleSystem.py:108,
> it just needs to remove the /modules/ file name component from that (fixed!)
> position. I think this should be simpler than to search for a /modules/ file
> name component and then argue whether the first one found or the last one
> found is the better match.

Not sure why I didn't do this in the first place to be honest...
Here is two patches. Most of the change is in these lines:

         if self.exists(module):
             path, istemp = self.filesystem.lookup(joinpath('modules', module))
-            result = GLModule(self.config, path, istemp)
+            # Get the directory containing the module. May be a '--local-dir' 
directory.
+            directory = remove_trailing_slashes(subend(joinpath('modules', 
module), '', path))
+            result = GLModule(self.config, directory, module, istemp)
             return result

So, as you explained in your example, we remove 'modules/NAME' from
'gnulib_dir/modules/NAME' to get the directory. Then we pass that
directory and the module name to GLModule.__init__().

The 'remove_trailing_slashes' isn't strictly necessary since:

    joinpath('dir1//', 'dir2')
    'dir1/dir2'

but in case we want the path by itself, I think it is preferable not
to have the trailing slash.

Also, I used the '@property' decorator for getting the path. Maybe
this is personal preference, but I like it for values that should be
read only [1]:

+    @property
+    def path(self) -> str:
+        '''Return the path to the module description file.'''
+        return joinpath(self.directory, 'modules', self.name)

In this case, the path always depends on the directory and module
name. We shouldn't need to change it:

      module = GLModule('gnulib-dir', 'my-module', False)
      # Print the path.
      print(module.path)
      'gnulib-dir/modules/my-module'
      # This will fail and throw an AttributeError
      module.path = 'this will fail'
      AttributeError: property 'path' of 'GLModule' object has no setter

The second patch just fixes the Coreutils --local-dir module sorting
by changing __eq__() and friends to use the module name and not the
path.

And here is the test again to show that by caching the names the test
does not take 20 seconds just because of that regular expression. :)

$ time env GNULIB_TOOL_IMPL=py ./test-all.sh 
PASS: test-hello-c-gnulib-1.sh
PASS: test-hello-c-gnulib-automakesubdir-1.sh
PASS: test-hello-c-gnulib-automakesubdir-withtests-1.sh
PASS: test-hello-c-gnulib-conddeps-1.sh
PASS: test-hello-c-gnulib-nonrecursive-1.sh
PASS: test-wget2-1.sh
./test-oath-toolkit-1.out tmp636420-out differ: byte 561, line 24
--- ./test-oath-toolkit-1.out   2024-03-30 18:09:34.351161750 -0700
+++ tmp636420-out       2024-03-30 21:04:46.108986889 -0700
@@ -21,6 +21,7 @@
   top/GNUmakefile
   top/maint.mk
 Updating file gl/m4/gnulib-common.m4 (backup in gl/m4/gnulib-common.m4~)
+Updating gl/Makefile.am (backup in gl/Makefile.am~)
 Finished.
 
 You may need to add #include directives for the following .h files.
FAIL: gnulib-tool's output has unexpected differences.
FAIL: test-oath-toolkit-1.sh
PASS: test-oath-toolkit-2.sh
PASS: test-oath-toolkit-3.sh
PASS: test-oath-toolkit-4.sh
PASS: test-oath-toolkit-5.sh
PASS: test-coreutils-1.sh
PASS: test-emacs-1.sh

real    0m4.960s
user    0m2.863s
sys     0m2.137s

[1] https://docs.python.org/3/library/functions.html#property

Collin
From 0adefc31ffeb08c483b38cfc65cca733a0e9e7f7 Mon Sep 17 00:00:00 2001
From: Collin Funk <collin.fu...@gmail.com>
Date: Sat, 30 Mar 2024 20:05:00 -0700
Subject: [PATCH 1/2] gnulib-tool.py: Cache the module name instead of
 computing it.

(GLModule.__init__): Replace path parameter with a directory and name
parameter. Update docstring.
* pygnulib/GLModuleSystem.py (GLModuleSystem.find): Extract the
directory from the module path. Use it when creating the GLModule
object.
(GLModule.path): New property function. Returns the path to the module
description file.
(GLModule.getName): Return the cached name instead of using a regular
expression to extract it from the path.
---
 ChangeLog                  | 13 +++++++++++++
 pygnulib/GLModuleSystem.py | 38 +++++++++++++++++++++++++-------------
 2 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index cf25ecf79e..dc03400c2a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,16 @@
+2024-03-30  Collin Funk  <collin.fu...@gmail.com>
+
+	gnulib-tool.py: Cache the module name instead of computing it.
+	(GLModule.__init__): Replace path parameter with a directory and name
+	parameter. Update docstring.
+	* pygnulib/GLModuleSystem.py (GLModuleSystem.find): Extract the
+	directory from the module path. Use it when creating the GLModule
+	object.
+	(GLModule.path): New property function. Returns the path to the module
+	description file.
+	(GLModule.getName): Return the cached name instead of using a regular
+	expression to extract it from the path.
+
 2024-03-30  Collin Funk  <collin.fu...@gmail.com>
 
 	gnulib-tool.py: Don't discard the 'dummy' module.
diff --git a/pygnulib/GLModuleSystem.py b/pygnulib/GLModuleSystem.py
index 4add195f2a..afe2d8aa0e 100644
--- a/pygnulib/GLModuleSystem.py
+++ b/pygnulib/GLModuleSystem.py
@@ -50,6 +50,7 @@ lines_to_multiline = constants.lines_to_multiline
 isdir = os.path.isdir
 isfile = os.path.isfile
 filter_filelist = constants.filter_filelist
+remove_trailing_slashes = constants.remove_trailing_slashes
 
 
 #===============================================================================
@@ -102,7 +103,9 @@ class GLModuleSystem(object):
                             % type(module).__name__)
         if self.exists(module):
             path, istemp = self.filesystem.lookup(joinpath('modules', module))
-            result = GLModule(self.config, path, istemp)
+            # Get the directory containing the module. May be a '--local-dir' directory.
+            directory = remove_trailing_slashes(subend(joinpath('modules', module), '', path))
+            result = GLModule(self.config, directory, module, istemp)
             return result
         else:  # if not self.exists(module)
             if self.config['errors']:
@@ -183,29 +186,35 @@ class GLModule(object):
     # List of characters allowed in shell identifiers.
     shell_id_chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_'
 
-    def __init__(self, config: GLConfig, path: str, patched: bool = False) -> None:
-        '''Create new GLModule instance. Arguments are path and patched, where
-        path is a string representing the path to the module and patched is a
-        bool indicating that module was created after applying patch.'''
+    def __init__(self, config: GLConfig, directory: str, name: str, patched: bool = False) -> None:
+        '''Create new GLModule instance.
+
+        config stores information shared between classes.
+        directory is the directory that contains the module description file
+          in it's modules/ subdirectory.
+        name is the name of the module.
+        patched is True if the module had a patch applied to it, else False.'''
         self.args = dict()
         self.cache = dict()
         self.content = ''
         if type(config) is not GLConfig:
             raise TypeError('config must be a GLConfig, not %s'
                             % type(config).__name__)
-        if type(path) is not str:
-            raise TypeError('path must be a string, not %s'
-                            % type(path).__name__)
+        if type(directory) is not str:
+            raise TypeError(f'directory must be a string, not {type(directory).__name__}')
+        if type(name) is not str:
+            raise TypeError(f'name must be a string, not {type(name).__name__}')
         if type(patched) is not bool:
             raise TypeError('patched must be a bool, not %s'
                             % type(patched).__name__)
-        self.path = path
+        self.directory = directory
+        self.name = name
         self.patched = patched
         self.config = config
         self.filesystem = GLFileSystem(self.config)
         self.modulesystem = GLModuleSystem(self.config)
         # Read the module description file into memory.
-        with codecs.open(path, 'rb', 'UTF-8') as file:
+        with codecs.open(self.path, 'rb', 'UTF-8') as file:
             self.content = file.read().replace('\r\n', '\n')
         # Dissect it into sections.
         self.sections = dict()
@@ -219,6 +228,11 @@ class GLModule(object):
         if last_section_label != None:
             self.sections[last_section_label] = self.content[last_section_start:]
 
+    @property
+    def path(self) -> str:
+        '''Return the path to the module description file.'''
+        return joinpath(self.directory, 'modules', self.name)
+
     def __eq__(self, module: object) -> bool:
         '''x.__eq__(y) <==> x==y'''
         result = False
@@ -284,9 +298,7 @@ class GLModule(object):
 
     def getName(self) -> str:
         '''Return the name of the module.'''
-        pattern = re.compile(joinpath('modules', '(.*)$'))
-        result = pattern.findall(self.path)[0]
-        return result
+        return self.name
 
     def isPatched(self) -> bool:
         '''Check whether module was created after applying patch.'''
-- 
2.44.0

From 7804711f8a074cefa078e1e6625f3f3f7a1b3a57 Mon Sep 17 00:00:00 2001
From: Collin Funk <collin.fu...@gmail.com>
Date: Sat, 30 Mar 2024 20:51:22 -0700
Subject: [PATCH 2/2] gnulib-tool.py: Fix sorting of modules when --local-dir
 is used.

* pygnulib/GLModuleSystem.py (GLModule.__eq__, GLModule.__ne__)
(GLModule.__ge__, GLModule.__gt__, GLModule.__hash__, GLModule.__le__)
(GLModule.__lt__): Use module names as identifiers instead of paths.
---
 ChangeLog                  |  7 +++++++
 pygnulib/GLModuleSystem.py | 14 +++++++-------
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index dc03400c2a..28d1c2e855 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2024-03-30  Collin Funk  <collin.fu...@gmail.com>
+
+	gnulib-tool.py: Fix sorting of modules when --local-dir is used.
+	* pygnulib/GLModuleSystem.py (GLModule.__eq__, GLModule.__ne__)
+	(GLModule.__ge__, GLModule.__gt__, GLModule.__hash__, GLModule.__le__)
+	(GLModule.__lt__): Use module names as identifiers instead of paths.
+
 2024-03-30  Collin Funk  <collin.fu...@gmail.com>
 
 	gnulib-tool.py: Cache the module name instead of computing it.
diff --git a/pygnulib/GLModuleSystem.py b/pygnulib/GLModuleSystem.py
index afe2d8aa0e..3bed06aded 100644
--- a/pygnulib/GLModuleSystem.py
+++ b/pygnulib/GLModuleSystem.py
@@ -237,7 +237,7 @@ class GLModule(object):
         '''x.__eq__(y) <==> x==y'''
         result = False
         if type(module) is GLModule:
-            if self.path == module.path:
+            if self.name == module.name:
                 result = True
         return result
 
@@ -245,7 +245,7 @@ class GLModule(object):
         '''x.__ne__(y) <==> x!=y'''
         result = False
         if type(module) is GLModule:
-            if self.path != module.path:
+            if self.name != module.name:
                 result = True
         return result
 
@@ -253,7 +253,7 @@ class GLModule(object):
         '''x.__ge__(y) <==> x>=y'''
         result = False
         if type(module) is GLModule:
-            if self.path >= module.path:
+            if self.name >= module.name:
                 result = True
         return result
 
@@ -261,20 +261,20 @@ class GLModule(object):
         '''x.__gt__(y) <==> x>y'''
         result = False
         if type(module) is GLModule:
-            if self.path > module.path:
+            if self.name > module.name:
                 result = True
         return result
 
     def __hash__(self) -> int:
         '''x.__hash__() <==> hash(x)'''
-        result = hash(self.path) ^ hash(self.patched)
+        result = hash(self.name) ^ hash(self.patched)
         return result
 
     def __le__(self, module: object) -> bool:
         '''x.__le__(y) <==> x<=y'''
         result = False
         if type(module) is GLModule:
-            if self.path <= module.path:
+            if self.name <= module.name:
                 result = True
         return result
 
@@ -282,7 +282,7 @@ class GLModule(object):
         '''x.__lt__(y) <==> x<y'''
         result = False
         if type(module) is GLModule:
-            if self.path < module.path:
+            if self.name < module.name:
                 result = True
         return result
 
-- 
2.44.0

Reply via email to