[
https://issues.apache.org/jira/browse/ARROW-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17636259#comment-17636259
]
Yue Ni edited comment on ARROW-16340 at 11/20/22 3:57 AM:
----------------------------------------------------------
[~kou] thanks for the pointer. After some more experiments, I managed to make
it work. The two minor changes are below (the compiler doesn't help with either
of them):
{code:java}
// 1. the cast looks like this:
auto py_reader = py_reader_import_from_c(
    static_cast<uintptr_t>(reinterpret_cast<int64_t>(&c_stream)));
// 2. no need for `pybind11::reinterpret_steal<pybind11::object>(py_table)`;
//    py_table should be returned directly
return py_table;
{code}
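For anyone following along, here is a minimal standalone sketch of the idea behind change (1): the address of the stream struct is handed over as a plain integer and turned back into a pointer on the receiving side. It uses only the standard library. `FakeStream`, `address_of`, `stream_from_address` and `round_trips` are made-up names for illustration, not Arrow or pyarrow APIs, and the sketch goes straight to `uintptr_t` rather than through `int64_t` as in my snippet above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the exported C stream struct.
// This is NOT the real Arrow definition; it only gives us something to point at.
struct FakeStream {
  int value;
};

// Producer side: expose the struct's address as a plain integer,
// which is the kind of value that can be passed across a language boundary.
std::uintptr_t address_of(FakeStream* stream) {
  return reinterpret_cast<std::uintptr_t>(stream);
}

// Consumer side: turn the integer back into a pointer to the same struct.
FakeStream* stream_from_address(std::uintptr_t address) {
  return reinterpret_cast<FakeStream*>(address);
}

// Returns true when the consumer ends up with the very same object.
bool round_trips(FakeStream* stream) {
  return stream_from_address(address_of(stream)) == stream;
}
```

The struct has to stay alive until the consumer has finished with the address, since only the integer value is passed along, not ownership.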
{quote}If you use both of Apache Arrow C++ from vcpkg and pyarrow wheel from
PyPI, you mix multiple Apache Arrow C++ libraries. It causes unexpected
behavior such as a crash
{quote}
Although I managed to make it work, I am not entirely sure whether this is the
recommended approach:
1) Do you think it helps if I keep pyarrow's version and the Arrow C++ library
version always consistent (for example, both at 10.0.0)?
2) If I use the official pyarrow wheel in Python together with the pyarrow C++
library and the Arrow C++ library, both compiled from vcpkg, is that any better
than using the C data stream API? You said `you mix multiple Apache Arrow C++
libraries` and that this could cause unexpected behavior, but it seems that
even if I don't use this approach, as long as I use the pyarrow wheel in
Python, I may run into the same kind of unexpected problem. Is that correct?
> [C++][Python] Move all Python related code into PyArrow
> -------------------------------------------------------
>
> Key: ARROW-16340
> URL: https://issues.apache.org/jira/browse/ARROW-16340
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Python
> Reporter: Alenka Frim
> Assignee: Alenka Frim
> Priority: Major
> Labels: pull-request-available
> Fix For: 10.0.0
>
> Time Spent: 33h 10m
> Remaining Estimate: 0h
>
> Move {{src/arrow/python}} directory into {{pyarrow}} and arrange PyArrow to
> build it.
> More details can be found on this thread:
> https://lists.apache.org/thread/jbxyldhqff4p9z53whhs95y4jcomdgd2