danielstreit opened a new issue, #288:
URL: https://github.com/apache/arrow-js/issues/288
### Describe the bug, including details regarding any error messages, version, and platform.

**Version:** [email protected] (also tested and observed the issue in 17.0.0)

I have a use case involving an Arrow file that contains two columns with the same name but different types. In this example there are two "id" columns: one is an Int64 and the other is a string. When I load this Arrow file using `tableFromIPC`, the table schema marks both columns as strings.

Simple base64-encoded Arrow file used in this example:

```
/////7ABAAAQAAAAAAAKAA4ABgANAAgACgAAAAAABAAQAAAAAAEKAAwAAAAIAAQACgAAAAgAAAAIAAAAAAAAAAIAAADAAAAABAAAAFr///8UAAAAiAAAAIwAAAAAAAAFiAAAAAIAAABAAAAABAAAABT///8IAAAAFAAAAAgAAAAic3RyaW5nIgAAAAAXAAAAU3Bhcms6RGF0YVR5cGU6SnNvblR5cGUATP///wgAAAAQAAAABgAAAFNUUklORwAAFgAAAFNwYXJrOkRhdGFUeXBlOlNxbE5hbWUAAAAAAAAEAAQABAAAAAIAAABpZAAAAAASABgAFAAAABMADAAAAAgABAASAAAAFAAAAIwAAACUAAAAAAAAApgAAAACAAAARAAAAAQAAADM////CAAAABAAAAAGAAAAImxvbmciAAAXAAAAU3Bhcms6RGF0YVR5cGU6SnNvblR5cGUACAAMAAgABAAIAAAACAAAABAAAAAGAAAAQklHSU5UAAAWAAAAU3Bhcms6RGF0YVR5cGU6U3FsTmFtZQAAAAAAAAgADAAIAAcACAAAAAAAAAFAAAAAAgAAAGlkAAD/////yAAAABQAAAAAAAAADAAWAAYABQAIAAwADAAAAAADBAAYAAAAKAAAAAAAAAAAAAoAGAAMAAQACAAKAAAAbAAAABAAAAABAAAAAAAAAAAAAAAFAAAAAAAAAAAAAAABAAAAAAAAAAgAAAAAAAAACAAAAAAAAAAQAAAAAAAAAAEAAAAAAAAAGAAAAAAAAAAIAAAAAAAAACAAAAAAAAAAAgAAAAAAAAAAAAAAAgAAAAEAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAIAAAAxMAAAAAAAAP////8AAAAA
```

Below are the details of the Table object returned by `tableFromIPC`. Notice that in the schema both `id` columns are marked as Utf8, while the column details show the actual difference in column types.

```
=== Arrow Table Information ===
Rows: 1
Columns: 2
Schema: Schema<{ 0: id: Utf8, 1: id: Utf8 }>

=== Table Contents ===
[ {"id": 0, "id": "10"} ]

=== Column Details ===
Column 0: undefined (Int64)
Column 1: undefined (Utf8)
```

<details>
<summary>Code to generate the above result</summary>

```typescript
import { tableFromIPC } from 'apache-arrow';

const arrowBuffer = base64ToUint8Array(base64String);

// Parse the Arrow IPC data into a table
const table = tableFromIPC(arrowBuffer);

console.log('\n=== Arrow Table Information ===');
console.log(`Rows: ${table.numRows}`);
console.log(`Columns: ${table.numCols}`);
console.log(`Schema: ${table.schema}`);

console.log('\n=== Table Contents ===');
console.log(table.toString());

console.log('\n=== Column Details ===');
for (let i = 0; i < table.numCols; i++) {
  const column = table.getChildAt(i);
  console.log(`Column ${i}: ${column.name} (${column.type})`);
}
```

</details>

I have tested the same Arrow file with pyarrow, and it shows the expected result:

```
Schema:
id: int64 not null
  -- field metadata --
  Spark:DataType:SqlName: 'BIGINT'
  Spark:DataType:JsonType: '"long"'
id: string not null
  -- field metadata --
  Spark:DataType:SqlName: 'STRING'
  Spark:DataType:JsonType: '"string"'

Batch 0:
pyarrow.RecordBatch
id: int64 not null
id: string not null
----
id: [0]
id: ["10"]
```
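For reference, here is a minimal self-contained sketch of the discrepancy, assuming Node.js (so `Buffer` is available to decode the base64 blob); `base64String` is a placeholder for the base64 Arrow IPC stream shown above. It compares the type reported by the schema with the type of the underlying child vector for each column:

```typescript
import { tableFromIPC } from 'apache-arrow';

// Placeholder: paste the base64-encoded Arrow IPC stream from the description here.
const base64String = '<base64 Arrow IPC stream from above>';

// Assumes Node.js, where Buffer can decode base64 into a Uint8Array.
const arrowBuffer = Buffer.from(base64String, 'base64');

const table = tableFromIPC(arrowBuffer);

// Compare what the schema reports against the actual type of each child vector.
// With the file above, the schema reports Utf8 for both "id" columns, while the
// child vectors are Int64 and Utf8 respectively.
for (let i = 0; i < table.numCols; i++) {
  const field = table.schema.fields[i];
  const child = table.getChildAt(i);
  console.log(`Column ${i} ("${field.name}"): schema type = ${field.type}, vector type = ${child?.type}`);
}
```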
