[jira] [Comment Edited] (ARROW-16340) [C++][Python] Move all Python related code into PyArrow

Yue Ni (Jira) Sat, 19 Nov 2022 19:58:06 -0800


    [ https://issues.apache.org/jira/browse/ARROW-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17636259#comment-17636259 ]


Yue Ni edited comment on ARROW-16340 at 11/20/22 3:57 AM:
----------------------------------------------------------

[~kou] thanks for the pointer. After some more experiments, I managed to make 
it work. The two minor changes were (the compiler doesn't catch either of them):
{code:cpp}
// 1. The cast looks like this (the address of c_stream is passed as an integer):
auto py_reader =
    py_reader_import_from_c(static_cast<uintptr_t>(reinterpret_cast<int64_t>(&c_stream)));
// 2. There is no need for `pybind11::reinterpret_steal<pybind11::object>(py_table)`;
//    py_table should be returned directly.
return py_table;
{code}
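The point of change 1 is that the stream struct never crosses into Python as a typed pointer: only its address does, as a plain integer, and the importer casts it back. The minimal sketch below shows that round trip in isolation. `FakeStream`, `import_from_address` and `export_and_import` are hypothetical stand-ins for `ArrowArrayStream` and the Python-side importer, not Arrow APIs, and casting straight to `uintptr_t` is assumed to be equivalent to the `int64_t` detour above on 64-bit platforms.

{code:cpp}
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the ArrowArrayStream struct of the Arrow C stream interface.
struct FakeStream {
  int id;
};

// Hypothetical stand-in for the Python-side importer: it only receives an
// integer address and must turn it back into a pointer.
int import_from_address(std::uintptr_t address) {
  assert(address != 0);  // a null address would mean nothing was exported
  auto* stream = reinterpret_cast<FakeStream*>(address);
  return stream->id;
}

// Producer side: pass the address of a stack-allocated struct as an integer.
// Casting directly to uintptr_t avoids the intermediate int64_t step.
int export_and_import(int id) {
  FakeStream c_stream{id};
  std::uintptr_t address = reinterpret_cast<std::uintptr_t>(&c_stream);
  return import_from_address(address);
}
{code}

Because the struct lives on the stack here, the importer must finish using it before `export_and_import` returns; a real exported stream has the same lifetime constraint until its `release` callback is called.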
{quote}If you use both of Apache Arrow C++ from vcpkg and pyarrow wheel from 
PyPI, you mix multiple Apache Arrow C++ libraries. It causes unexpected 
behavior such as a crash
{quote}
Although I managed to make it work, I am not entirely sure this is the 
recommended approach:

1) Would it help to always keep pyarrow's version and the Arrow C++ library 
version consistent (for example, both at 10.0.0)?

2) If I use the official pyarrow wheel (in Python) together with the pyarrow C++ 
library and the Arrow C++ library (both built via vcpkg), is that any better 
than using the C data stream API? You said `you mix multiple Apache Arrow C++ 
libraries` and that this could cause unexpected behavior, but it seems that even 
without this approach, as long as I use the pyarrow wheel in Python, I may run 
into such unexpected problems. Is that correct?
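For question 1, one way to at least detect a version mismatch at startup is a guard that compares the release numbers reported by both sides. The sketch below is hypothetical: `release_part` and `same_arrow_release` are illustrative helpers, and it assumes the inputs would come from something like `arrow::GetBuildInfo().version_string` on the C++ side and `pyarrow.__version__` on the Python side (that wiring is not shown). Matching versions does not by itself mean only one copy of the Arrow C++ library is loaded, which is what the quoted warning is about.

{code:cpp}
#include <string>

// Hypothetical helper: keep only the leading "major.minor.patch" part of a
// version string, dropping suffixes such as "-SNAPSHOT" or ".dev123".
std::string release_part(const std::string& version) {
  std::string::size_type end = version.find_first_not_of("0123456789.");
  std::string release = version.substr(0, end);  // npos keeps the whole string
  while (!release.empty() && release.back() == '.') {
    release.pop_back();  // "11.0.0.dev123" -> "11.0.0." -> "11.0.0"
  }
  return release;
}

// Hypothetical guard: true when both version strings name the same release.
bool same_arrow_release(const std::string& cpp_version, const std::string& py_version) {
  std::string a = release_part(cpp_version);
  std::string b = release_part(py_version);
  return !a.empty() && a == b;
}
{code}

Comparing only the release part is a deliberate simplification; a stricter check could also compare build details.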


> [C++][Python] Move all Python related code into PyArrow
> -------------------------------------------------------
>
>                 Key: ARROW-16340
>                 URL: https://issues.apache.org/jira/browse/ARROW-16340
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Alenka Frim
>            Assignee: Alenka Frim
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 10.0.0
>
>          Time Spent: 33h 10m
>  Remaining Estimate: 0h
>
> Move {{src/arrow/python}} directory into {{pyarrow}} and arrange PyArrow to 
> build it.
> More details can be found on this thread:
> https://lists.apache.org/thread/jbxyldhqff4p9z53whhs95y4jcomdgd2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
