zeddit commented on issue #132: URL: https://github.com/apache/iceberg-python/issues/132#issuecomment-1806675152
@Fokko thanks a lot for your reply. I have read the bullets carefully, but it is still not clear to me whether `pyiceberg` can return data in order, i.e. 1) rows in the same partition are adjacent and ordered, and 2) the partitions themselves are ordered, e.g. if we partition by month, the months come back in ascending order.

I am also unsure what effort is needed to achieve that: 1) run a global sort with the right engine (Trino does not seem to be the right one), 2) switch from Trino to another engine such as Spark, whose global sort is semantically more consistent with pyiceberg, 3) modify the pyiceberg source code to assemble data files in the right order, 4) control how we write the data, or 5) there is no way to achieve it.

In my tests, a global sort in Trino did not achieve this. I think it is an interoperability issue between Trino and pyiceberg. I still think pyiceberg has a different view than other distributed query engines because of its architecture, and Python cannot comfortably reorder a dataset larger than 100GB.

P.S. Sorry, I did not quite get what `conflict` means here. Is it the case where multiple writers write to the same partition simultaneously, and Iceberg needs to support transactions, which requires keeping the order between operations? Is there any documentation explaining why this makes it hard to keep the order? Sorry for asking so many questions, and thanks a lot.
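To make option 3 (assembling data files in the right order on the client) more concrete, here is a rough sketch of the idea: if every data file is already sorted by the same key, the files can be stream-merged without loading the whole dataset into memory. This is plain Python with made-up in-memory "files"; `merge_sorted_files` and the `(month, value)` row layout are hypothetical, and nothing here is pyiceberg API.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# Hypothetical row layout for illustration: (month, value).
Row = Tuple[str, int]


def merge_sorted_files(files: Iterable[Iterable[Row]]) -> Iterator[Row]:
    """Lazily merge inputs that are each already sorted by (month, value)."""
    # heapq.merge holds only one pending row per input at a time,
    # so memory use grows with the number of files, not the number of rows.
    return heapq.merge(*files)


# Stand-ins for three data files, each internally sorted by (month, value).
file_a = [("2023-01", 1), ("2023-03", 7)]
file_b = [("2023-01", 4), ("2023-02", 2)]
file_c = [("2023-02", 9), ("2023-03", 3)]

merged = list(merge_sorted_files([file_a, file_b, file_c]))
print(merged)
```

This only works when each file is sorted by the same key. If the files are not sorted individually, a full sort is needed instead, which is exactly the concern with datasets larger than 100GB.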
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org