This is an automated email from the ASF dual-hosted git repository.

rok pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new 1cd1841c06 GH-49150: [Doc][CI][Python] Doctests failing on rst files due to pandas 3+ (#49088)
1cd1841c06 is described below

commit 1cd1841c06c2c5c849340549ba57fc015d50005a
Author: Rok Mihevc <[email protected]>
AuthorDate: Thu Feb 5 01:06:04 2026 +0100

    GH-49150: [Doc][CI][Python] Doctests failing on rst files due to pandas 3+ (#49088)
    
    Fixes: #49150
    See https://github.com/apache/arrow/pull/48619#issuecomment-3823269381
    
    ### Rationale for this change
    
    Fix CI doctest failures on the rst docs caused by pandas 3 changing the default string dtype and timestamp resolution.
    
    ### What changes are included in this PR?
    
    Doctest expectations are made more general to allow for both pandas 2 and pandas 3 style string types (and the pandas 3 microsecond timestamp resolution); the docs CI job also moves from Python 3.10 to 3.11.
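    
    For context, a minimal sketch (not part of this change; names and values are illustrative only) of the output differences the doctests now accommodate. The reprs follow from the doc diffs below and assume pandas 3 with its PyArrow-backed default string dtype:
    
    ```python
    import pandas as pd
    import pyarrow as pa
    
    # Default string dtype: reprs as ``object`` under pandas 2.x, ``str`` under pandas 3.x.
    s = pd.Series(["foo", "bar"])
    print(s.dtype)
    
    # Conversion to Arrow picks up the new default string type:
    # ``string`` under pandas 2.x, ``large_string`` under pandas 3.x
    # (matching the pandas.rst and parquet.rst diffs below).
    table = pa.Table.from_pandas(pd.DataFrame({"two": s}))
    print(table.schema.field("two").type)
    
    # Timestamp resolution: pandas 3.x infers microseconds instead of nanoseconds,
    # so the dtype reads ``datetime64[us, UTC]`` rather than ``datetime64[ns, UTC]``.
    ts = pd.date_range("2020-01-01T00:00:00Z", freq="h", periods=3)
    print(ts.dtype)
    ```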
    
    ### Are these changes tested?
    
    By CI
    
    ### Are there any user-facing changes?
    
    No
    * GitHub Issue: #49150
    
    Authored-by: Rok Mihevc <[email protected]>
    Signed-off-by: Rok Mihevc <[email protected]>
---
 .github/workflows/python.yml   |  6 +++---
 docs/source/python/data.rst    |  2 +-
 docs/source/python/ipc.rst     | 12 ++++++------
 docs/source/python/pandas.rst  | 12 ++++++------
 docs/source/python/parquet.rst |  6 +++---
 5 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/.github/workflows/python.yml b/.github/workflows/python.yml
index e5d367958d..bc7fe3cd68 100644
--- a/.github/workflows/python.yml
+++ b/.github/workflows/python.yml
@@ -69,10 +69,10 @@ jobs:
           - conda-python-3.12-no-numpy
         include:
           - name: conda-python-docs
-            cache: conda-python-3.10
+            cache: conda-python-3.11
             image: conda-python-docs
-            title: AMD64 Conda Python 3.10 Sphinx & Numpydoc
-            python: "3.10"
+            title: AMD64 Conda Python 3.11 Sphinx & Numpydoc
+            python: "3.11"
           - name: conda-python-3.11-nopandas
             cache: conda-python-3.11
             image: conda-python
diff --git a/docs/source/python/data.rst b/docs/source/python/data.rst
index 279ec5dc61..22a3114fdd 100644
--- a/docs/source/python/data.rst
+++ b/docs/source/python/data.rst
@@ -684,7 +684,7 @@ When using :class:`~.DictionaryArray` with pandas, the analogue is
    6    NaN
    7    baz
    dtype: category
-   Categories (3, object): ['foo', 'bar', 'baz']
+   Categories (3, str): ['foo', 'bar', 'baz']
 
 .. _data.record_batch:
 
diff --git a/docs/source/python/ipc.rst b/docs/source/python/ipc.rst
index 9b4458c748..8f96363968 100644
--- a/docs/source/python/ipc.rst
+++ b/docs/source/python/ipc.rst
@@ -160,12 +160,12 @@ DataFrame output:
    >>> with pa.ipc.open_file(buf) as reader:
    ...     df = reader.read_pandas()
    >>> df[:5]
-      f0    f1     f2
-   0   1   foo   True
-   1   2   bar   None
-   2   3   baz  False
-   3   4  None   True
-   4   1   foo   True
+      f0   f1     f2
+   0   1  foo   True
+   1   2  bar   None
+   2   3  baz  False
+   3   4  NaN   True
+   4   1  foo   True
 
 Efficiently Writing and Reading Arrow Data
 ------------------------------------------
diff --git a/docs/source/python/pandas.rst b/docs/source/python/pandas.rst
index 9999a5b779..7aacaaff60 100644
--- a/docs/source/python/pandas.rst
+++ b/docs/source/python/pandas.rst
@@ -170,7 +170,7 @@ number of possible values.
 
    >>> df = pd.DataFrame({"cat": pd.Categorical(["a", "b", "c", "a", "b", 
"c"])})
    >>> df.cat.dtype.categories
-   Index(['a', 'b', 'c'], dtype='object')
+   Index(['a', 'b', 'c'], dtype='str')
    >>> df
      cat
    0   a
@@ -182,7 +182,7 @@ number of possible values.
    >>> table = pa.Table.from_pandas(df)
    >>> table
    pyarrow.Table
-   cat: dictionary<values=string, indices=int8, ordered=0>
+   cat: dictionary<values=large_string, indices=int8, ordered=0>
    ----
    cat: [  -- dictionary:
    ["a","b","c"]  -- indices:
@@ -196,7 +196,7 @@ same categories of the Pandas DataFrame.
    >>> column = table[0]
    >>> chunk = column.chunk(0)
    >>> chunk.dictionary
-   <pyarrow.lib.StringArray object at ...>
+   <pyarrow.lib.LargeStringArray object at ...>
    [
      "a",
      "b",
@@ -224,7 +224,7 @@ use the ``datetime64[ns]`` type in Pandas and are converted to an Arrow
 
    >>> df = pd.DataFrame({"datetime": pd.date_range("2020-01-01T00:00:00Z", 
freq="h", periods=3)})
    >>> df.dtypes
-   datetime    datetime64[ns, UTC]
+   datetime    datetime64[us, UTC]
    dtype: object
    >>> df
                       datetime
@@ -234,9 +234,9 @@ use the ``datetime64[ns]`` type in Pandas and are converted to an Arrow
    >>> table = pa.Table.from_pandas(df)
    >>> table
    pyarrow.Table
-   datetime: timestamp[ns, tz=UTC]
+   datetime: timestamp[us, tz=UTC]
    ----
-   datetime: [[2020-01-01 00:00:00.000000000Z,...,2020-01-01 02:00:00.000000000Z]]
+   datetime: [[2020-01-01 00:00:00.000000Z,2020-01-01 01:00:00.000000Z,2020-01-01 02:00:00.000000Z]]
 
 In this example the Pandas Timestamp is time zone aware
 (``UTC`` in this case), and this information is used to create the Arrow
diff --git a/docs/source/python/parquet.rst b/docs/source/python/parquet.rst
index 638df963cd..30a84b3dc6 100644
--- a/docs/source/python/parquet.rst
+++ b/docs/source/python/parquet.rst
@@ -238,9 +238,9 @@ concatenate them into a single table. You can read individual row groups with
    >>> parquet_file.read_row_group(0)
    pyarrow.Table
    one: double
-   two: string
+   two: large_string
    three: bool
-   __index_level_0__: string
+   __index_level_0__: large_string
    ----
    one: [[-1,null,2.5]]
    two: [["foo","bar","baz"]]
@@ -352,7 +352,7 @@ and improved performance for columns with many repeated string values.
    one: double
    two: dictionary<values=string, indices=int32, ordered=0>
    three: bool
-   __index_level_0__: string
+   __index_level_0__: large_string
    ----
    one: [[-1,null,2.5]]
    two: [  -- dictionary:
