Re: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment

2024-10-24 Thread MRAB via Python-list

On 2024-10-24 20:21, Left Right wrote:

> > > > The stack is created on line 760 with os.lstat and entries are appended
> > > > on lines 677 (os.rmdir), 679 (os.close) and 689 (os.lstat).
> > > >
> > > > 'func' is popped off the stack on line 651 and checked in the following
> > > > lines.
> > > >
> > > > I can't see anywhere else where something else is put onto the stack or
> > > > an entry is replaced.
>
> But _rmtree_safe_fd() compares func to a *dynamically* resolved
> reference: os.lstat. If the reference to os changed (or the os object was
> modified to hold a new reference at lstat) between the time os.lstat was
> added to the stack and the time of comparison, then the comparison
> would fail. To illustrate my idea:
>
> os.lstat = lambda x: x # thread 1
> stack.append((os.lstat, ...)) # thread 1
> os.lstat = lambda x: x # thread 2
> func, *_ = stack.pop() # thread 1
> assert func is os.lstat # thread 1 (failure!)
>
> The only question is: is it possible to modify os.lstat like that, and
> if so, how?
>
> Other alternatives include a malfunctioning "is" operator, a
> malfunctioning module cache... all of those are a lot less likely.

What is the probability of replacing os.lstat, os.close or os.rmdir from
another thread at just the right time?

--
https://mail.python.org/mailman/listinfo/python-list


Re: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment

2024-10-24 Thread Barry via Python-list



> On 24 Oct 2024, at 15:07, Christian Buhtz via Python-list wrote:
> 
> On one hand Fedora seems to use a tool called "mock" to build packages in a 
> chroot environment.
> On the other hand the test suite of "Back In Time" does read and write to the 
> real file system.

I am a Fedora packager and can help explain what the tools are doing.

Mock runs the build in a chroot env that allows for reproducible clean-room
builds.
Sort of like a container.

This is nothing to do with the python mock package.

What do you mean by the real file system?

You cannot write to the /usr file system. Is that what your tests do?
If so that needs changing.

Barry




shutil.rmtree() fails when used in Fedora (rpm) "mock" environment

2024-10-24 Thread Christian Buhtz via Python-list

Hello,
I am the upstream maintainer of "Back In Time" [1] and am investigating an
issue that a distro maintainer from Fedora reported [2] to me.


On one hand Fedora seems to use a tool called "mock" to build packages 
in a chroot environment.
On the other hand the test suite of "Back In Time" does read and write 
to the real file system.
One test fails because a temporary directory is cleaned up using 
shutil.rmtree(). Please see the output below.


I am not familiar with Fedora and "mock", so I am not able to reproduce
this on my own.
It seems the Fedora maintainer also has no clue how to solve it or why 
it happens.


Can you please have a look (especially at the line "assert func is
os.lstat")?
Maybe you have an idea what the intention is behind this error raised by
an "assert" statement inside "shutil.rmtree()".


Thanks in advance,
Christian Buhtz

[1] -- 
[2] -- 

__ General.test_ctor_defaults __

self = 

    def test_ctor_defaults(self):
        """Default values in constructor."""
>       with TemporaryDirectory(prefix='bit.') as temp_name:

test/test_uniquenessset.py:47:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

/usr/lib64/python3.13/tempfile.py:946: in __exit__
    self.cleanup()
/usr/lib64/python3.13/tempfile.py:950: in cleanup
    self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
/usr/lib64/python3.13/tempfile.py:930: in _rmtree
    _shutil.rmtree(name, onexc=onexc)
/usr/lib64/python3.13/shutil.py:763: in rmtree
    _rmtree_safe_fd(stack, onexc)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

stack = []
onexc = .onexc at 0xb39bc860>

    def _rmtree_safe_fd(stack, onexc):
        # Each stack item has four elements:
        # * func: The first operation to perform: os.lstat, os.close or os.rmdir.
        #   Walking a directory starts with an os.lstat() to detect symlinks; in
        #   this case, func is updated before subsequent operations and passed to
        #   onexc() if an error occurs.
        # * dirfd: Open file descriptor, or None if we're processing the top-level
        #   directory given to rmtree() and the user didn't supply dir_fd.
        # * path: Path of file to operate upon. This is passed to onexc() if an
        #   error occurs.
        # * orig_entry: os.DirEntry, or None if we're processing the top-level
        #   directory given to rmtree(). We used the cached stat() of the entry to
        #   save a call to os.lstat() when walking subdirectories.
        func, dirfd, path, orig_entry = stack.pop()
        name = path if orig_entry is None else orig_entry.name
        try:
            if func is os.close:
                os.close(dirfd)
                return
            if func is os.rmdir:
                os.rmdir(name, dir_fd=dirfd)
                return

            # Note: To guard against symlink races, we use the standard
            # lstat()/open()/fstat() trick.
>           assert func is os.lstat
E           AssertionError

/usr/lib64/python3.13/shutil.py:663: AssertionError



Re: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment

2024-10-24 Thread Left Right via Python-list
From reading the code where the exception is coming from, this is how
I interpret the intention of the author: they build a list (not sure
why they used a list, when there's a stack data structure in Python)
which they use as a stack, where the elements of the stack are
4-tuples. The important part about these tuples is that the first
element, the operation to be performed by rmtree(), has to be one of
the known filesystem-related functions. The code raising the exception
checks that it's one of those and, if it isn't, crashes.
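To make the structure described above concrete, here is a deliberately simplified, hypothetical sketch of a stack-driven tree removal that dispatches on function identity in the same spirit. It is not shutil's implementation: it uses plain paths instead of directory file descriptors, stores 2-tuples instead of 4-tuples, and has none of the symlink-race protection; the name rmtree_sketch is made up for illustration.

```python
import os
import stat
import tempfile


def rmtree_sketch(top):
    """Hypothetical, simplified stack-driven removal (NOT shutil.rmtree).

    Each stack item is (func, path): func says what to do next, and the
    dispatch below compares it by identity, like shutil's _rmtree_safe_fd().
    """
    stack = [(os.lstat, top)]
    while stack:
        func, path = stack.pop()
        if func is os.rmdir:
            os.rmdir(path)  # everything pushed above it has been removed by now
            continue
        # Anything else must be the "inspect this path" step.
        assert func is os.lstat
        st = os.lstat(path)  # lstat: do not follow symlinks
        if stat.S_ISDIR(st.st_mode):
            # Push rmdir first so it is popped only after all the children.
            stack.append((os.rmdir, path))
            with os.scandir(path) as entries:
                for entry in entries:
                    stack.append((os.lstat, entry.path))
        else:
            os.unlink(path)  # regular files, and (on POSIX) symlinks


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    os.makedirs(os.path.join(base, "a", "b"))
    with open(os.path.join(base, "a", "b", "f.txt"), "w") as fh:
        fh.write("x")
    rmtree_sketch(base)
    print(os.path.exists(base))  # False
```

The LIFO order is what makes this work: a directory's rmdir step sits below its children on the stack, so it only runs once they are gone.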

There is, however, a problem with testing equality (more strictly,
identity in this case) between functions: it's possible that a
function isn't identical to itself if, e.g., the "os" module was somehow
loaded twice. I'm not sure if that's a real possibility with how
Python works... but maybe in some cases, like multithreaded
environments, it could happen...

To investigate this, I'd edit the file with the assertion and make it
print the actual values of os.lstat and func. My guess is that
they are both somehow "lstat", but with different memory addresses.
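A minimal sketch of the kind of information worth printing, assuming you can get hold of both objects at the point of failure (for example by temporarily editing the assertion as suggested, or from a debugger). The helper name describe_mismatch is hypothetical; nothing like it exists in shutil.

```python
import os
import sys


def describe_mismatch(func, expected):
    """Return a short report on two objects that were expected to be identical.

    Hypothetical diagnostic helper, for illustration only.
    """
    lines = [
        f"func:     {func!r} id=0x{id(func):x} module={getattr(func, '__module__', None)!r}",
        f"expected: {expected!r} id=0x{id(expected):x} module={getattr(expected, '__module__', None)!r}",
        f"identical: {func is expected}",
        # If this is False, the `os` seen here is not the module
        # currently registered in sys.modules.
        f"sys.modules['os'] is os: {sys.modules.get('os') is os}",
    ]
    return "\n".join(lines)


# The same function object is identical to itself...
print(describe_mismatch(os.lstat, os.lstat))
# ...while a wrapper with the same behaviour is a different object.
print(describe_mismatch(lambda p: os.lstat(p), os.lstat))
```

If the two reprs name the same function but the ids differ, that would support the "two different lstat objects" theory; if the repr of func is something else entirely, the stack was fed something unexpected.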



Re: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment

2024-10-24 Thread Dan Sommers via Python-list
On 2024-10-24 at 20:54:53 +0100,
MRAB via Python-list wrote:

> On 2024-10-24 20:21, Left Right wrote:
> > > > > The stack is created on line 760 with os.lstat and entries are appended
> > > > > on lines 677 (os.rmdir), 679 (os.close) and 689 (os.lstat).
> > > > >
> > > > > 'func' is popped off the stack on line 651 and checked in the following
> > > > > lines.
> > > > >
> > > > > I can't see anywhere else where something else is put onto the stack or
> > > > > an entry is replaced.
> > 
> > But _rmtree_safe_fd() compares func to a *dynamically* resolved
> > reference: os.lstat. If the reference to os changed (or the os object was
> > modified to hold a new reference at lstat) between the time os.lstat was
> > added to the stack and the time of comparison, then the comparison
> > would fail. To illustrate my idea:
> > 
> > os.lstat = lambda x: x # thread 1
> > stack.append((os.lstat, ...)) # thread 1
> > os.lstat = lambda x: x # thread 2
> > func, *_ = stack.pop() # thread 1
> > assert func is os.lstat # thread 1 (failure!)
> > 
> > The only question is: is it possible to modify os.lstat like that, and
> > if so, how?
> > 
> > Other alternatives include a malfunctioning "is" operator, a
> > malfunctioning module cache... all of those are a lot less likely.
> What is the probability of replacing os.lstat, os.close or os.rmdir from
> another thread at just the right time?

That is never the right question in a multi-threaded system. The answer
is always that it doesn't matter; the odds will beat you in the end. Or
sometimes right in the middle of a CPU instruction; does anyone remember
the MC680XX series?

Yes, as a matter of fact, I did use to make my living designing,
building, delivering, and maintaining such systems.


Re: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment

2024-10-24 Thread MRAB via Python-list

On 2024-10-24 17:30, Left Right wrote:

> > The stack is created on line 760 with os.lstat and entries are appended
> > on lines 677 (os.rmdir), 679 (os.close) and 689 (os.lstat).
> >
> > 'func' is popped off the stack on line 651 and checked in the following lines.
> >
> > I can't see anywhere else where something else is put onto the stack or
> > an entry is replaced.
>
> But how do you know this code isn't executed from different threads?
> What I anticipate to be the problem is that the "os" module is
> imported twice, and there are two references to "os.lstat". Normally,
> this wouldn't cause a problem, because they would be the same stateless
> function, but once you compare them, the identity test will fail,
> because those functions were loaded multiple times into different
> memory locations.
>
> I don't know of any specific mechanism for forcing the interpreter to
> import the same module multiple times, but if that were possible (which
> in principle it is), then it would explain the behavior.

The stack is a local variable, and os.lstat, etc., are pushed and popped
in one function and then in another that it calls, so they're all in the
same thread.



Re: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment

2024-10-24 Thread Left Right via Python-list
> The stack is created on line 760 with os.lstat and entries are appended
> on lines 677 (os.rmdir), 679 (os.close) and 689 (os.lstat).
>
> 'func' is popped off the stack on line 651 and checked in the following lines.
>
> I can't see anywhere else where something else is put onto the stack or
> an entry is replaced.

But how do you know this code isn't executed from different threads?
What I anticipate to be the problem is that the "os" module is
imported twice, and there are two references to "os.lstat". Normally,
this wouldn't cause a problem, because they would be the same stateless
function, but once you compare them, the identity test will fail,
because those functions were loaded multiple times into different
memory locations.

I don't know of any specific mechanism for forcing the interpreter to
import the same module multiple times, but if that were possible (which
in principle it is), then it would explain the behavior.
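For what it's worth, a normal import statement does not execute a module a second time: if the name is already in sys.modules, that cached object is returned. Identity only breaks if the same source is executed into two separate module objects. The sketch below simulates that with a made-up module name fake_os and a hypothetical load_copy() helper; it is an illustration, not something shutil does. It also checks that os.lstat is the built-in re-exported from the platform C module (posix or nt), so re-executing os.py alone would likely still hand out the same function object.

```python
import importlib
import os
import sys
import types

# 1. A repeated import is a lookup in sys.modules, not a reload.
assert importlib.import_module("os") is os
assert sys.modules["os"] is os

# os.lstat is the built-in from the platform C module (posix or nt),
# re-exported by os.py.
assert os.lstat is sys.modules[os.name].lstat

# 2. Hypothetical: executing the same source into two module objects
#    ("loading it twice") produces two distinct function objects.
SOURCE = "def lstat(path):\n    return path\n"


def load_copy(name):
    """Create a fresh module object and execute SOURCE in it (illustration only)."""
    module = types.ModuleType(name)
    exec(SOURCE, module.__dict__)
    return module


first = load_copy("fake_os")
second = load_copy("fake_os")
print(first.lstat is second.lstat)  # False: two different objects
print(first.lstat.__code__.co_code == second.lstat.__code__.co_code)  # True: same bytecode
```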


Re: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment

2024-10-24 Thread MRAB via Python-list

On 2024-10-24 16:17, Left Right via Python-list wrote:

> From reading the code where the exception is coming from, this is how
> I interpret the intention of the author: they build a list (not sure
> why they used a list, when there's a stack data structure in Python)
> which they use as a stack, where the elements of the stack are
> 4-tuples. The important part about these tuples is that the first
> element, the operation to be performed by rmtree(), has to be one of
> the known filesystem-related functions. The code raising the exception
> checks that it's one of those and, if it isn't, crashes.
>
> There is, however, a problem with testing equality (more strictly,
> identity in this case) between functions: it's possible that a
> function isn't identical to itself if, e.g., the "os" module was somehow
> loaded twice. I'm not sure if that's a real possibility with how
> Python works... but maybe in some cases, like multithreaded
> environments, it could happen...
>
> To investigate this, I'd edit the file with the assertion and make it
> print the actual values of os.lstat and func. My guess is that
> they are both somehow "lstat", but with different memory addresses.

The stack is created on line 760 with os.lstat and entries are appended
on lines 677 (os.rmdir), 679 (os.close) and 689 (os.lstat).

'func' is popped off the stack on line 651 and checked in the following lines.

I can't see anywhere else where something else is put onto the stack or
an entry is replaced.





Re: Chardet oddity

2024-10-24 Thread Mark Bourne via Python-list

Albert-Jan Roskam wrote:

> Today I used chardet.detect in the repl and it returned windows-1252
> (incorrect, because it later resulted in a UnicodeDecodeError). When I ran
> chardet as a script (which uses UniversalDetector) this returned
> MacRoman. Isn't chardet.detect the correct way? I've used this method many
> times.
> # Interpreter
> >>> contents = open(FILENAME, "rb").read()
> >>> chardet.detect(content)


Is that copy and pasted from the terminal, or retyped with possible
transcription errors? As written, you've assigned the file's contents
to `contents`, but passed `content` (with no "s") to `chardet.detect`,
so the result would depend on whatever was previously assigned to `content`.

> {'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, 'language': ''}
> # Terminal
> $ python -m chardet FILENAME
> FILENAME: MacRoman with confidence 0.7167379080370483
> Thanks!
> Albert-Jan


--
Mark.


Re: Chardet oddity

2024-10-24 Thread Roland Mueller via Python-list
On Wed, 23 Oct 2024 at 20:11, Albert-Jan Roskam via Python-list
wrote:

> Today I used chardet.detect in the repl and it returned windows-1252
> (incorrect, because it later resulted in a UnicodeDecodeError). When I ran
> chardet as a script (which uses UniversalDetector) this returned
> MacRoman. Isn't chardet.detect the correct way? I've used this method many
> times.
> # Interpreter
> >>> contents = open(FILENAME, "rb").read()
> >>> chardet.detect(content)
> {'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, 'language': ''}
> # Terminal
> $ python -m chardet FILENAME
> FILENAME: MacRoman with confidence 0.7167379080370483
> Thanks!
> Albert-Jan
>

The entry point for the chardet module is chardet.cli.chardetect:main, and
main() calls the function description_of(lines, name), where 'lines' is a
file opened in mode 'rb' and 'name' holds the file name.

I think the crucial difference is that description_of(lines, name) reads
the opened file line by line and stops as soon as the detector reports that
it is done (u.done), whereas reading the whole file into the variable
contents and passing it to detect() may give a different result,
depending on the input. I was not able to reproduce this behaviour, though.
I am assuming that you used the same Python for both tests.

This is how I tried it in interactive mode:

>>> from chardet.cli import chardetect
>>> chardetect.description_of(open('/tmp/DATE', 'rb'), 'some file')
'some file: ascii with confidence 1.0'
>>>

Your approach
>>> from chardet import detect
>>> detect(open('/tmp/DATE','rb').read())
{'encoding': 'ascii', 'confidence': 1.0, 'language': ''}


From /usr/lib/python3/dist-packages/chardet/cli/chardetect.py:

def description_of(lines, name='stdin'):
    u = UniversalDetector()
    for line in lines:
        line = bytearray(line)
        u.feed(line)
        # shortcut out of the loop to save reading further - particularly
        # useful if we read a BOM.
        if u.done:
            break
    u.close()
    result = u.result
    ...
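Whichever entry point is used, chardet's answer is a statistical guess (both confidences reported here are around 0.7). A cheap sanity check, sketched below with the standard library only, is to try strict decoding with each candidate. decodes_cleanly is a hypothetical helper and the sample bytes are made up: 0x8D is one of the few byte values that Python's cp1252 codec leaves undefined, whereas mac_roman maps every byte, which is one way a Windows-1252 guess can still end in a UnicodeDecodeError.

```python
def decodes_cleanly(data, candidates):
    """Return {encoding: True/False} for strict decoding of `data`.

    Hypothetical helper: it only tells you which encodings are *possible*,
    not which one is correct.
    """
    results = {}
    for encoding in candidates:
        try:
            data.decode(encoding)  # errors="strict" is the default
        except UnicodeDecodeError:
            results[encoding] = False
        else:
            results[encoding] = True
    return results


# Made-up sample: "Fran" + byte 0x8D + "ais" (0x8D is "ç" in Mac Roman).
sample = b"Fran\x8dais"
print(decodes_cleanly(sample, ["windows-1252", "mac_roman"]))
# {'windows-1252': False, 'mac_roman': True}
```

Note that mac_roman will "succeed" on any input, so a True there proves little; a False for windows-1252, on the other hand, rules it out.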




Re: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment

2024-10-24 Thread Left Right via Python-list
> > > The stack is created on line 760 with os.lstat and entries are appended
> > > on lines 677 (os.rmdir), 679 (os.close) and 689 (os.lstat).
> > >
> > > 'func' is popped off the stack on line 651 and checked in the following
> > > lines.
> > >
> > > I can't see anywhere else where something else is put onto the stack or
> > > an entry is replaced.

But _rmtree_safe_fd() compares func to a *dynamically* resolved
reference: os.lstat. If the reference to os changed (or the os object was
modified to hold a new reference at lstat) between the time os.lstat was
added to the stack and the time of comparison, then the comparison
would fail. To illustrate my idea:

os.lstat = lambda x: x # thread 1
stack.append((os.lstat, ...)) # thread 1
os.lstat = lambda x: x # thread 2
func, *_ = stack.pop() # thread 1
assert func is os.lstat # thread 1 (failure!)

The only question is: is it possible to modify os.lstat like that, and
if so, how?

Other alternatives include a malfunctioning "is" operator, a
malfunctioning module cache... all of those are a lot less likely.
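The mechanism in the illustration above can be reproduced deterministically, without any threads, by rebinding os.lstat between the push and the pop. This is a minimal sketch; push_then_check is a hypothetical name, and it restores the real function in a finally block so the patch does not leak out.

```python
import os


def push_then_check(rebind):
    """Push os.lstat, optionally rebind it, pop, and repeat shutil's identity check.

    push_then_check is a hypothetical name used only for this illustration.
    """
    original = os.lstat
    stack = [(os.lstat, None, ".", None)]  # same 4-tuple shape as shutil's stack
    try:
        if rebind:
            # Stand-in for some other code (another thread, a test helper...)
            # assigning a new object to os.lstat between the push and the pop.
            # The wrapper behaves the same but is a different object.
            os.lstat = lambda *args, **kwargs: original(*args, **kwargs)
        func, *_ = stack.pop()
        return func is os.lstat
    finally:
        os.lstat = original  # always put the real function back


print(push_then_check(rebind=False))  # True
print(push_then_check(rebind=True))   # False
```

So the question in the thread reduces to whether anything in the test run assigns to os.lstat (or to the os name that shutil uses) while rmtree() is running.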


Re: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment

2024-10-24 Thread Left Right via Python-list
> What is the probability of replacing os.lstat, os.close or os.rmdir from
> another thread at just the right time?

If the thread does "import os", and its start is logically connected to
calling _rmtree_safe_fd(), I'd say there's a very good chance! That is,
again, granted that the reference to os.lstat *can* be modified in
this way.

But before we keep guessing any further, it'd be best if the OP could get
us the info on what's stored in "func" and "os.lstat" at the time the
assertion fails.
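On the "import os" in another thread scenario specifically, a sketch suggesting that an import statement executed in a second thread simply binds the module already cached in sys.modules and leaves os.lstat untouched, so by itself it should not break the identity check; only code that actually assigns to os.lstat (or rebinds the os name used by shutil) would.

```python
import os
import threading

before = os.lstat
seen = {}


def worker():
    # This import runs in another thread. It looks "os" up in sys.modules
    # and binds it locally; it does not re-run os.py or replace os.lstat.
    import os as thread_os
    seen["module"] = thread_os
    seen["lstat"] = thread_os.lstat


thread = threading.Thread(target=worker)
thread.start()
thread.join()

print(seen["module"] is os)     # True
print(seen["lstat"] is before)  # True
print(os.lstat is before)       # True
```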