zeddit commented on issue #132:
URL: https://github.com/apache/iceberg-python/issues/132#issuecomment-1806675152

   @Fokko thanks a lot for your reply.
   
   I have read the bullets carefully, and it is still not clear to me whether 
`pyiceberg` can read data back in order, i.e. 1) rows in the same partition 
are adjacent and sorted, and 2) partitions are returned in order, e.g. if we 
partition by month, then the months should come back in ascending order.
   
   I am also unsure what we would need to do to achieve that: 1) run a global 
sort (with the right engine; Trino is not the right one), 2) switch from Trino 
to another engine such as Spark, whose global sort is semantically more 
consistent with pyiceberg, 3) modify the pyiceberg source code to assemble 
data files in the right order, 4) control how we write the data, or 5) accept 
that there is no way to achieve it.
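
   To make option 3 more concrete, here is a minimal sketch of the read-side 
behaviour I have in mind. It does not use any real pyiceberg API: `DataFile` 
and `read_in_order` are hypothetical names, the rows are made-up tuples, and 
it assumes each data file is already sorted internally and carries a single 
partition value.

```python
import heapq
from dataclasses import dataclass
from itertools import groupby
from typing import Iterator, List, Tuple

Row = Tuple[str, int]  # (month, sort key), purely illustrative


@dataclass
class DataFile:
    """Hypothetical stand-in for one data file; not a pyiceberg class."""
    partition: str   # e.g. "2023-01"
    rows: List[Row]  # assumed to be already sorted within the file


def read_in_order(files: List[DataFile]) -> Iterator[Row]:
    """Yield rows partition by partition, in ascending partition order."""
    by_partition = sorted(files, key=lambda f: f.partition)
    for _, group in groupby(by_partition, key=lambda f: f.partition):
        # Files in the same partition may overlap in key range, so their
        # sorted runs are merged instead of simply concatenated.
        yield from heapq.merge(*(f.rows for f in group))


files = [
    DataFile("2023-02", [("2023-02", 1), ("2023-02", 4)]),
    DataFile("2023-01", [("2023-01", 2), ("2023-01", 9)]),
    DataFile("2023-01", [("2023-01", 3), ("2023-01", 5)]),
]
ordered = list(read_in_order(files))
print(ordered)
```

   The merge step only works if every file is sorted on write, which is why 
options 1, 2 and 4 would still matter even if the reader were changed.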
   
   In my tests, a global sort in Trino did not achieve this. I think it is an 
interoperability issue between Trino and pyiceberg. I still think pyiceberg 
takes a different view from distributed query engines, both because of its 
architecture and because Python cannot comfortably reorder a dataset of more 
than 100 GB.
   
   P.S. Sorry, I did not fully understand what `conflict` means here. Is it 
the case where multiple writers write to the same partition simultaneously, 
and Iceberg has to support transactions, which requires keeping the 
operations in order? Is there any documentation I could read to learn why 
this makes it hard to preserve the order?
   Sorry for asking so many questions, and thank you very much.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

