erikhansenwong opened a new issue, #46224:
URL: https://github.com/apache/arrow/issues/46224
### Describe the bug, including details regarding any error messages, version, and platform.
On some calls to `Table.join_asof`, my Python process becomes unresponsive
and uses zero CPU. It appears to be a thread deadlock or something
similar. I have created an example that triggers the deadlock with high
probability on my laptop.
Here are the details of my setup:
- Python 3.12.7
- pyarrow==19.0.1
- numpy==2.2.4
- pandas==2.2.3
- Ubuntu 22.04.5
- CPU: 13th Gen Intel(R) Core(TM) i9-13980HX
I was also able to reproduce the deadlock on a colleague's Mac laptop with
Apple silicon using this example, so I assume the hardware does not make a
big difference.
On my laptop, this always deadlocks before the 300th iteration:
```python
import numpy as np
import pandas as pd
import pyarrow as pa

n_left = 100
n_right = 200_000
left_start = pd.Timestamp("2025-04-07T07:45:55", tz="UTC")
right_start = pd.Timestamp("2025-04-07T00:00:00", tz="UTC")
time_end = pd.Timestamp("2025-04-07T12:05:59", tz="UTC")
tolerance_nanos = 60 * 1_000_000_000

np.random.seed(0)


def get_timestamps(start, end, n):
    seconds = (end - start).total_seconds()
    # Random increments between consecutive timestamps.
    td = np.random.uniform(0, 1, n)
    # Zero out roughly half of the increments so that many timestamps repeat.
    td *= np.random.choice([0, 1], n)
    # Rescale so that the increments sum to the full time span.
    td *= seconds / td.sum()
    td = td.cumsum()
    return start + pd.to_timedelta(td, "seconds")


left_schema = pa.schema([pa.field("timestamp", pa.timestamp("ns", "UTC"))])
right_schema = pa.schema(
    [
        pa.field("timestamp", pa.timestamp("ns", "UTC")),
        pa.field("value", pa.float64()),
    ]
)

left = pa.table(
    {"timestamp": get_timestamps(left_start, time_end, n_left)},
    schema=left_schema,
)
right = pa.table(
    {
        "timestamp": get_timestamps(right_start, time_end, n_right),
        "value": np.random.normal(100, 5, n_right),
    },
    schema=right_schema,
)

# Repeat the same join; the process hangs on one of the iterations.
for i in range(1000):
    print(f"{i:>5} | {pd.Timestamp.now()}")
    left.join_asof(
        right,
        on="timestamp",
        by=[],
        tolerance=tolerance_nanos,
    )
```
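To detect the hang automatically instead of watching the printed iterations, the example above could be run in a child interpreter with a timeout. This is only a sketch using the standard library: `run_snippet_with_timeout` is a hypothetical helper name, not part of pyarrow, and any time limit passed to it is an arbitrary choice.

```python
import subprocess
import sys


def run_snippet_with_timeout(code, timeout_s):
    """Run `code` in a fresh Python interpreter.

    Returns "ok" if it exits with status 0, "failed" if it exits with a
    non-zero status, and "hung" if it is still running after `timeout_s`
    seconds (the child process is killed in that case).
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            timeout=timeout_s,
            capture_output=True,
            text=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return "hung"
    return "ok" if result.returncode == 0 else "failed"
```

Passing the reproduction script as the `code` string with a generous limit (for example 60 seconds, assuming a normal run finishes well within that) would report `"hung"` when the deadlock occurs.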
### Component(s)
Python