bjfar opened a new issue, #41857:
URL: https://github.com/apache/arrow/issues/41857
### Describe the bug, including details regarding any error messages, version, and platform.
Python version: 3.10.14
pyarrow version: 16.1.0
pandas version: 2.2.2
pytest version: 8.2.1
I have some apparently niche circumstances that trigger the following error:
```
/home/benf/repos/tetra/python/tests/test_minimal.py:24:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/home/benf/micromamba/envs/tetra/lib/python3.10/site-packages/pandas/util/_decorators.py:333: in wrapper
    return func(*args, **kwargs)
/home/benf/micromamba/envs/tetra/lib/python3.10/site-packages/pandas/core/frame.py:3113: in to_parquet
    return to_parquet(
/home/benf/micromamba/envs/tetra/lib/python3.10/site-packages/pandas/io/parquet.py:476: in to_parquet
    impl = get_engine(engine)
/home/benf/micromamba/envs/tetra/lib/python3.10/site-packages/pandas/io/parquet.py:63: in get_engine
    return engine_class()
/home/benf/micromamba/envs/tetra/lib/python3.10/site-packages/pandas/io/parquet.py:169: in __init__
    import pandas.core.arrays.arrow.extension_types # pyright: ignore[reportUnusedImport] # noqa: F401
/home/benf/micromamba/envs/tetra/lib/python3.10/site-packages/pandas/core/arrays/arrow/extension_types.py:59: in <module>
    pyarrow.register_extension_type(_period_type)
pyarrow/types.pxi:1954: in pyarrow.lib.register_extension_type
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>   ???
E   pyarrow.lib.ArrowKeyError: A type extension with name pandas.period already defined
pyarrow/error.pxi:91: ArrowKeyError
========================================================= short test summary info =========================================================
FAILED python/tests/test_minimal.py::test_pyarrow_issue_2 - pyarrow.lib.ArrowKeyError: A type extension with name pandas.period already defined
```
It seems to have something to do with how pytest orchestrates its tests.
Here is my minimal example:
test_minimal.py
```
import pytest
import pandas as pd

# Enable the pytester plugin, which provides the `testdir` fixture.
pytest_plugins = ["pytester"]


def test_pyarrow_issue(testdir, tmp_path):
    path = str(tmp_path / "test.tar")
    df = pd.DataFrame()
    df.to_parquet(path)


def test_pyarrow_issue_2(testdir, tmp_path):
    path = str(tmp_path / "test_2.tar")
    df = pd.DataFrame()
    df.to_parquet(path)
```
Running `pytest test_minimal.py` then triggers the error.
Notably, the error does *not* occur if either test is run independently, and
it does not occur if the `testdir` fixture is removed or replaced with some
other fixture. So I guess it has something to do with whatever `testdir` is
doing under the hood, presumably with how pandas/pyarrow get imported.
In my real case I would really quite like to keep using the `testdir`
fixture, though I can probably find a different way to do things. But
nonetheless this behaviour seemed worth reporting. Not sure if it is a pyarrow
issue though, or whether it is more of a pytest issue, or maybe even pandas.
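
One possible mechanism, offered as an assumption rather than a confirmed diagnosis: if `testdir` restores `sys.modules` to an earlier snapshot after each test, then a module first imported during a test is forgotten afterwards, and the next test re-runs its top-level code. Any registration performed at import time against state that outlives `sys.modules` would then fail the second time. The sketch below reproduces that pattern with the standard library only. The names `demo_registry`, `demo_ext_types`, `register`, and `run_test_with_snapshot` are hypothetical stand-ins, not pandas, pyarrow, or pytest APIs.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path

# Hypothetical stand-in for a process-wide registry (assumed to behave like
# pyarrow's extension type registry). It is installed before any snapshot is
# taken, so restoring a snapshot never resets it.
registry = types.ModuleType("demo_registry")
registry.types = {}


def register(name):
    """Register a type name once; a second registration is an error."""
    if name in registry.types:
        raise KeyError(f"A type extension with name {name} already defined")
    registry.types[name] = object()


registry.register = register
sys.modules["demo_registry"] = registry

# Hypothetical stand-in for a module that registers a type as an
# import-time side effect.
module_dir = tempfile.mkdtemp()
Path(module_dir, "demo_ext_types.py").write_text(
    "import demo_registry\n"
    "demo_registry.register('pandas.period')\n"
)
sys.path.insert(0, module_dir)


def run_test_with_snapshot():
    """Import the module, then forget every module imported meanwhile."""
    snapshot = set(sys.modules)
    try:
        importlib.import_module("demo_ext_types")
    finally:
        # Assumed to mimic a fixture that restores sys.modules on teardown.
        for name in set(sys.modules) - snapshot:
            del sys.modules[name]


run_test_with_snapshot()  # first "test": the registration succeeds

second_error = None
try:
    run_test_with_snapshot()  # second "test": the module body runs again
except KeyError as exc:
    second_error = exc

print(second_error)
```

If this is what happens in the real case, importing the affected module before the snapshot is taken (for example at the top of the test module) might avoid the second registration, but that is untested here.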
### Component(s)
Parquet, Python