bellabaiyunyu opened a new issue, #47155:
URL: https://github.com/apache/arrow/issues/47155
### Describe the bug, including details regarding any error messages, version, and platform.
Hi team,
We are seeing the following error after upgrading to pyarrow 21.0.0.
Downgrading to 20.0.0 resolves the issue.
I believe this is a backward-incompatible change. Will the team release a
patch that makes the latest pyarrow backward compatible?
```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "<REDACTED_LOCAL_ENV>/build/workloads/nemo-dgxc-benchmarking-g_v25.04/NeMo/scripts/performance/llm/pretrain_deepseek_v2_lite.py", line 21, in <module>
    from nemo.collections.llm.recipes.deepseek_v2_lite import pretrain_recipe
  File "<REDACTED_LOCAL_ENV>/build/workloads/nemo-dgxc-benchmarking-g_v25.04/NeMo/nemo/collections/llm/__init__.py", line 21, in <module>
    from nemo.collections.llm.bert.data import BERTMockDataModule, BERTPreTrainingDataModule, SpecterDataModule
  File "<REDACTED_LOCAL_ENV>/build/workloads/nemo-dgxc-benchmarking-g_v25.04/NeMo/nemo/collections/llm/bert/data/__init__.py", line 3, in <module>
    from nemo.collections.llm.bert.data.specter import SpecterDataModule
  File "<REDACTED_LOCAL_ENV>/build/workloads/nemo-dgxc-benchmarking-g_v25.04/NeMo/nemo/collections/llm/bert/data/specter.py", line 18, in <module>
    from datasets import DatasetDict, load_dataset
  File "<REDACTED_LOCAL_ENV>/env/lib/python3.12/site-packages/datasets/__init__.py", line 22, in <module>
    from .arrow_dataset import Dataset
  File "<REDACTED_LOCAL_ENV>/env/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 67, in <module>
    from .arrow_writer import ArrowWriter, OptimizedTypedSequence
  File "<REDACTED_LOCAL_ENV>/env/lib/python3.12/site-packages/datasets/arrow_writer.py", line 27, in <module>
    from .features import Features, Image, Value
  File "<REDACTED_LOCAL_ENV>/env/lib/python3.12/site-packages/datasets/features/__init__.py", line 18, in <module>
    from .features import Array2D, Array3D, Array4D, Array5D, ClassLabel, Features, Sequence, Value
  File "<REDACTED_LOCAL_ENV>/env/lib/python3.12/site-packages/datasets/features/features.py", line 634, in <module>
    class _ArrayXDExtensionType(pa.PyExtensionType):
                                ^^^^^^^^^^^^^^^^^^
AttributeError: module 'pyarrow' has no attribute 'PyExtensionType'. Did you mean: 'ExtensionType'?
```
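The failure should be reproducible without the full NeMo stack, since the traceback ends at a plain attribute lookup on the `pyarrow` module. The following is a minimal sketch of a check for that attribute. The helper name `has_py_extension_type` is hypothetical and not part of pyarrow or `datasets`; it returns `None` when the module cannot be imported at all.

```python
import importlib


def has_py_extension_type(module_name="pyarrow"):
    """Report whether the named module exposes a `PyExtensionType` attribute.

    Hypothetical helper for illustration only. Returns None if the module
    is not installed, otherwise True or False.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, "PyExtensionType")


if __name__ == "__main__":
    result = has_py_extension_type()
    if result is None:
        print("pyarrow is not installed in this environment")
    elif result:
        print("pyarrow.PyExtensionType is available")
    else:
        print("pyarrow.PyExtensionType is missing (matches the AttributeError above)")
```

Based on the report above, this would be expected to print the "missing" message with pyarrow 21.0.0 and the "available" message with 20.0.0, although only the traceback itself has been confirmed here.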
### Component(s)
Python