fresh-borzoni commented on PR #2730: URL: https://github.com/apache/fluss/pull/2730#issuecomment-4154465477
@leekeiabstraction TY, looks good, a couple of suggestions.

Let's include a list of contributors; it's a good way to say thank you to all the dedicated people we are lucky to have.

Also, there are a couple of things we can mention (choose whatever you think is good):

* Idempotent writes, like in Kafka.
* Batch scanner mode, where we pass Arrow batches directly (we can also return raw Arrow with metadata). Java doesn't have this at all, but it's very beneficial for Python's Polars/Pandas, C++ plugs Arrow into DuckDB, and Rust's tools also benefit from this mode. Java's clients (Spark/Flink) have their own internal row formats, so it's not as big of a benefit there and it was never implemented.
* Memory-bounded backpressure.
* Priority prefetch PQ (priority queue) for remote segments; it's more scalable than Java's. When scanning across many buckets with segments in remote storage, the parallel downloads eliminate the bottleneck of Java's single thread blocking on one segment at a time.
* Pandas/Polars-specific API for Python.

Also, maybe we need a cover image; if you like, you can take the otter with the crab from my blog.

Some diagrams would also be good. A few propositions:

* Layered arch: the simplest one, just to explain that there is a core SDK and then bindings on top.
* Batch mode: just to show how it's good for new clients that have better integration with Arrow.
* PQ fetching of remote segments compared with Java (just to hint at how it's different and why Rust has an edge here). Maybe a bit dense, so we may prefer to skip this.
* Just a stylized image of all the notable features of the Rust SDK that sells it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
