Fokko opened a new issue, #45557: URL: https://github.com/apache/arrow/issues/45557
### Describe the enhancement requested

Consider the following code:

```python
Python 3.10.14 (main, Mar 19 2024, 21:46:16) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>>
>>> arrow_schema = pa.schema(
...     [
...         pa.field("city", pa.string(), nullable=False),
...         pa.field("population", pa.int32(), nullable=False),
...     ]
... )
>>>
>>> # Write some data
>>> df = pa.Table.from_pylist(
...     [
...         {"city": "Amsterdam", "population": 921402},
...         {"city": "San Francisco", "population": 808988},
...     ],
...     schema=arrow_schema,
... )
>>>
>>> joined = df.join(df, "city", join_type="inner")
>>>
>>> joined
pyarrow.Table
city: string
population: int32
population: int32
----
city: [["Amsterdam","San Francisco"]]
population: [[921402,808988]]
population: [[921402,808988]]
>>> df
pyarrow.Table
city: string not null
population: int32 not null
----
city: [["Amsterdam","San Francisco"]]
population: [[921402,808988]]
```

We inner join two tables whose fields are all `not null`, but the fields in the output are nullable. An inner join cannot introduce nulls, so when the corresponding fields on both sides are not null, the output fields can be marked not null too.

I would be happy to see if I can add this with some pointers to the relevant code.

### Component(s)

Python