Fokko opened a new issue, #45557:
URL: https://github.com/apache/arrow/issues/45557

   ### Describe the enhancement requested
   
   Consider the following code:
   
   ```python
   Python 3.10.14 (main, Mar 19 2024, 21:46:16) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import pyarrow as pa
   >>> 
   >>> arrow_schema = pa.schema(
   ...     [
   ...         pa.field("city", pa.string(), nullable=False),
   ...         pa.field("population", pa.int32(), nullable=False),
   ...     ]
   ... )
   >>> 
   >>> # Write some data
   >>> df = pa.Table.from_pylist(
   ...     [
   ...         {"city": "Amsterdam", "population": 921402},
   ...         {"city": "San Francisco", "population": 808988},
   ...     ],
   ...     schema=arrow_schema,
   ... )
   >>> 
   >>> joined = df.join(df, "city", join_type="inner")
   >>> 
   >>> joined
   pyarrow.Table
   city: string
   population: int32
   population: int32
   ----
   city: [["Amsterdam","San Francisco"]]
   population: [[921402,808988]]
   population: [[921402,808988]]
   >>> df
   pyarrow.Table
   city: string not null
   population: int32 not null
   ----
   city: [["Amsterdam","San Francisco"]]
   population: [[921402,808988]]
   ```
   
   We do an inner join of two tables whose fields are all `not null`, but every field in the output schema is nullable. An inner join only emits matched rows and never pads a row with nulls, so when a field is not-null on its input side, the corresponding output field can be marked not null too.
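
To make the reasoning concrete, here is a minimal sketch of the nullability rule in plain Python. It is only an illustration and not Arrow's actual implementation: `FieldInfo`, `_PADDED_SIDES`, and `output_nullable` are hypothetical names, and the sketch covers non-key fields for four join types only, ignoring how key columns are coalesced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldInfo:
    """Hypothetical stand-in for a schema field: a name plus a nullable flag."""
    name: str
    nullable: bool


# For each join type, the input sides whose columns may be padded with nulls
# when a row on the other side has no match. An inner join pads neither side.
_PADDED_SIDES = {
    "inner": frozenset(),
    "left outer": frozenset({"right"}),
    "right outer": frozenset({"left"}),
    "full outer": frozenset({"left", "right"}),
}


def output_nullable(field: FieldInfo, side: str, join_type: str) -> bool:
    """Return whether `field`, taken from `side`, must be nullable in the output."""
    if side not in ("left", "right"):
        raise ValueError(f"unknown side: {side!r}")
    if join_type not in _PADDED_SIDES:
        raise ValueError(f"unsupported join type: {join_type!r}")
    # A field stays nullable if it already was, or if the join can pad its side.
    return field.nullable or side in _PADDED_SIDES[join_type]


population = FieldInfo("population", nullable=False)
inner_result = output_nullable(population, "left", "inner")
left_outer_result = output_nullable(population, "right", "left outer")
```

Under these assumptions, `inner_result` is `False` (the field can stay not null), while `left_outer_result` is `True`, because a left outer join may pad right-side columns with nulls for unmatched rows.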
   
   I would be happy to try adding this, given some pointers to the relevant code.
   
   ### Component(s)
   
   Python


