Fokko commented on issue #132: URL: https://github.com/apache/iceberg-python/issues/132#issuecomment-1803401481
> In my opinion, if we need the user to sort a large dataset in Python, it will make a lot of use cases unusable, especially for time-series cases. Python is slow, unlike the MPP or Spark situation.

I agree there, but the API is often just in Python. For example, we use PyArrow, which pushes the work down into the C++ layer.

> I want to know if PyIceberg reads data in a consistent way.

Yes, we do some ordering. This is important when we fetch the top n rows: we always want to return the same rows, not the ones that came back the quickest.

> If a local sort is used for partitions, the row order within a partition will be preserved. In other words, the order is the one in which the data was written.

Yes, this is true, and I think you can achieve this today. Especially when the data is small (which it is in your case, IIRC from Slack), you could just write a single file for the partition.

> Every insertion will create a snapshot, and because we insert the rows individually, each row will be put in its own data file.

Every insertion being a new snapshot is okay, but I think it will also create a new manifest per commit. Trino uses [fast-append](https://iceberg.apache.org/spec/#snapshots), which means that for each append operation a new manifest is created instead of rewriting the existing metadata. We [keep the order of sequences](https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/__init__.py#L871-L875) since the Python list will maintain order.

> Then we try to append the same amount of data into the sorted table with

If you add an `ORDER BY 1`, you'll introduce a global sort, which should probably fix this without needing to optimize the table.

> It's bad news that the order between partitions can never be controlled by any means of controlling the writing method. E.g., even when we conduct a global sort, the month order in the final result is still random, which makes time-series analysis disappointing.
I think you can also fix this by adding an `ORDER BY`.

> When the data is large enough it gets split within a partition, i.e. multiple data files that each contain part of the whole data.

If your data is relatively small, then having a single file is best. Also, you could tune the row-group sizes to get decent parallelism (PyArrow will do this for you).

> What will happen when a row deletion occurs? Will the order be messed up?

The order should be maintained.

> What will happen when schema evolution occurs?

It should not influence the order, since it is just schema projection.

> Besides, it's quite strange that newly inserted rows show up at the top of the final results; in my opinion this will add challenges to integrating with the data-ingestion subsystem. Any advice on it?

I think this depends mostly on Trino and on how the data is written. As mentioned earlier, the fast-append might mess things up because you're relying on how Trino produces the manifest.