danielstreit opened a new issue, #288:
URL: https://github.com/apache/arrow-js/issues/288
### Describe the bug, including details regarding any error messages, version, and platform.

**Version:** [email protected] (also tested and observed the issue in 17.0.0)

I have a use case involving an Arrow file that contains two columns with the same name but different types. In this example there are two "id" columns: one is an Int64 and the other is a string. When I load this Arrow file using `tableFromIPC`, the table schema marks both columns as strings.

Simple base64-encoded Arrow file used in this example:

```
/////7ABAAAQAAAAAAAKAA4ABgANAAgACgAAAAAABAAQAAAAAAEKAAwAAAAIAAQACgAAAAgAAAAIAAAAAAAAAAIAAADAAAAABAAAAFr///8UAAAAiAAAAIwAAAAAAAAFiAAAAAIAAABAAAAABAAAABT///8IAAAAFAAAAAgAAAAic3RyaW5nIgAAAAAXAAAAU3Bhcms6RGF0YVR5cGU6SnNvblR5cGUATP///wgAAAAQAAAABgAAAFNUUklORwAAFgAAAFNwYXJrOkRhdGFUeXBlOlNxbE5hbWUAAAAAAAAEAAQABAAAAAIAAABpZAAAAAASABgAFAAAABMADAAAAAgABAASAAAAFAAAAIwAAACUAAAAAAAAApgAAAACAAAARAAAAAQAAADM////CAAAABAAAAAGAAAAImxvbmciAAAXAAAAU3Bhcms6RGF0YVR5cGU6SnNvblR5cGUACAAMAAgABAAIAAAACAAAABAAAAAGAAAAQklHSU5UAAAWAAAAU3Bhcms6RGF0YVR5cGU6U3FsTmFtZQAAAAAAAAgADAAIAAcACAAAAAAAAAFAAAAAAgAAAGlkAAD/////yAAAABQAAAAAAAAADAAWAAYABQAIAAwADAAAAAADBAAYAAAAKAAAAAAAAAAAAAoAGAAMAAQACAAKAAAAbAAAABAAAAABAAAAAAAAAAAAAAAFAAAAAAAAAAAAAAABAAAAAAAAAAgAAAAAAAAACAAAAAAAAAAQAAAAAAAAAAEAAAAAAAAAGAAAAAAAAAAIAAAAAAAAACAAAAAAAAAAAgAAAAAAAAAAAAAAAgAAAAEAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAIAAAAxMAAAAAAAAP////8AAAAA
```

Below are the details of the Table object returned by `tableFromIPC`. Notice that in the schema both `id` columns are marked as Utf8, while the column details show the actual difference in column types.

```
=== Arrow Table Information ===
Rows: 1
Columns: 2
Schema: Schema<{ 0: id: Utf8, 1: id: Utf8 }>

=== Table Contents ===
[ {"id": 0, "id": "10"} ]

=== Column Details ===
Column 0: undefined (Int64)
Column 1: undefined (Utf8)
```

<details>
<summary>Code to generate the above result</summary>

```typescript
import { tableFromIPC } from 'apache-arrow';

const arrowBuffer = base64ToUint8Array(base64String);

// Parse the Arrow IPC data into a table
const table = tableFromIPC(arrowBuffer);

console.log('\n=== Arrow Table Information ===');
console.log(`Rows: ${table.numRows}`);
console.log(`Columns: ${table.numCols}`);
console.log(`Schema: ${table.schema}`);

console.log('\n=== Table Contents ===');
console.log(table.toString());

console.log('\n=== Column Details ===');
for (let i = 0; i < table.numCols; i++) {
  const column = table.getChildAt(i);
  console.log(`Column ${i}: ${column.name} (${column.type})`);
}
```

</details>

I have tested the same Arrow file with pyarrow, and it shows the expected result:

```
Schema:
id: int64 not null
  -- field metadata --
  Spark:DataType:SqlName: 'BIGINT'
  Spark:DataType:JsonType: '"long"'
id: string not null
  -- field metadata --
  Spark:DataType:SqlName: 'STRING'
  Spark:DataType:JsonType: '"string"'

Batch 0:
pyarrow.RecordBatch
id: int64 not null
id: string not null
----
id: [0]
id: ["10"]
```
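For reference, here is a minimal self-contained sketch of the discrepancy, assuming Node.js (so `Buffer` is available to decode the base64 blob); `base64String` is a placeholder for the base64 Arrow IPC stream shown above. It compares the type reported by the schema with the type of the underlying child vector for each column:

```typescript
import { tableFromIPC } from 'apache-arrow';

// Placeholder: paste the base64-encoded Arrow IPC stream from the description here.
const base64String = '<base64 Arrow IPC stream from above>';

// Assumes Node.js, where Buffer can decode base64 into a Uint8Array.
const arrowBuffer = Buffer.from(base64String, 'base64');

const table = tableFromIPC(arrowBuffer);

// Compare what the schema reports against the actual type of each child vector.
// With the file above, the schema reports Utf8 for both "id" columns, while the
// child vectors are Int64 and Utf8 respectively.
for (let i = 0; i < table.numCols; i++) {
  const field = table.schema.fields[i];
  const child = table.getChildAt(i);
  console.log(`Column ${i} ("${field.name}"): schema type = ${field.type}, vector type = ${child?.type}`);
}
```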
