Forshining opened a new issue, #44059: URL: https://github.com/apache/arrow/issues/44059
### Describe the usage question you have. Please include as many useful details as possible.

I have a parquet file that already contains the `__index_level_0__` column. I am trying to delete the rows that do not meet a criterion and write the updated dataframe to a new parquet file. The code is as follows:

```python
import pyarrow.parquet as pq
import pyarrow as pa
import pandas as pd
import os

## select the images with aesthetic score >= 5
for m in range(100,128):
    # table_temp = pq.ParquetFile('part-00000-4e217ab5-40f3-4738-ac05-c1cb9f75ef32-c000.snappy.parquet')
    # column_names = table_temp.schema.names
    # print("Column names:", column_names)
    read_file = f'part-00{m}-4e217ab5-40f3-4738-ac05-c1cb9f75ef32-c000.snappy.parquet'
    table_temp_pandas = pq.read_table(read_file).to_pandas()
    index_to_be_deleted = []
    for i in range(table_temp_pandas.shape[0]):
        if table_temp_pandas.loc[i, 'AESTHETIC_SCORE'] < 5:
            index_to_be_deleted.append(i)
    table_dropped = table_temp_pandas.drop(table_temp_pandas.index[index_to_be_deleted])
    split_string = read_file.split("-")
    write_file = "new-part-" + split_string[1] + ".parquet"
    table_dropped_p = pa.Table.from_pandas(table_dropped, preserve_index=False)
    print(f"Writing new table into parquet......:{m+1}/127")
    pq.write_table(table_dropped_p, write_file)
    print("Completed")
    print(f"Starting to remove the former parquet file: part-00{m}-4e217ab5-40f3-4738-ac05-c1cb9f75ef32-c000.snappy.parquet")
    os.remove(f"part-00{m}-4e217ab5-40f3-4738-ac05-c1cb9f75ef32-c000.snappy.parquet")
    print("File removed successfully!")
```

However, when I tried to read the new parquet files with the following code:

```python
read_file = "./datasets--laion--aesthetics_v2_4.75/new-part-00000.parquet"
table_temp_pandas = pq.read_table(read_file).to_pandas()
```

I get this error:

```bash
pyarrow.lib.ArrowInvalid: Multiple matches for FieldRef.Name(__index_level_0__)
```

After checking the new dataframe, I found that it has two columns named `__index_level_0__`: one from the original parquet file and another added by the write operation. The original files have already been removed, because I do not have enough storage to keep the original and the new parquet files at the same time. Is there a way to delete one of the `__index_level_0__` columns working only from the new parquet files?

I would appreciate anyone's time and help!

### Component(s)

Parquet, Python