littleDrew commented on issue #13438: URL: https://github.com/apache/iceberg/issues/13438#issuecomment-3094323984
Hi @pvary, regarding the 3 needed steps: it seems step 1 is already done, and **step 3 seems to need discussion among Iceberg community members. Is there a planned meeting to decide this?** Step 2 looks more like additional testing for verification.

As for the performance benefit of integrating Lance compared with Parquet, I have also run some tests. I mainly looked at full scan performance and random access performance.

- Full scan: the test uses 1 million rows. Scan performance can be better than Parquet (1.5~3x faster) when there are not too many columns (it should be fewer than 50 columns). With too many columns, scan performance is no better than Parquet.
- Random access: my current test follows the official Lance test case. It shows a 40x speedup when randomly taking only 20~50 rows from the file. I still have doubts about the 40x figure, because it only tests a Lance file rather than a Lance table.
- Disk space: I found that Lance takes 50% more disk space than Parquet. Maybe we should also consider how to handle this.

Thus, **do you think other work, such as implementing a more native Lance reader, is needed to improve performance** (for example, random access)?
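For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a timing harness for the two measurements (full scan and random take of a small number of rows). It is not the test code behind the numbers above. The names `time_call` and `compare_formats` are hypothetical, and the readers are passed in as plain callables, so the actual Parquet and Lance read calls are left to the caller.

```python
import random
import statistics
import time
from typing import Callable, Dict, Sequence


def time_call(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock seconds over `repeats` calls of fn."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare_formats(
    full_scan: Dict[str, Callable[[], object]],
    random_take: Dict[str, Callable[[Sequence[int]], object]],
    num_rows: int,
    take_size: int = 50,
    repeats: int = 5,
    seed: int = 0,
) -> Dict[str, Dict[str, float]]:
    """Time a full scan and a random take for each named format.

    Both dicts are keyed by a format label (for example "parquet", "lance").
    The same row indices are used for every format so the results are comparable.
    """
    rng = random.Random(seed)
    indices = sorted(rng.sample(range(num_rows), take_size))
    results: Dict[str, Dict[str, float]] = {}
    for name, scan in full_scan.items():
        take = random_take[name]
        results[name] = {
            "full_scan_s": time_call(scan, repeats),
            "random_take_s": time_call(lambda: take(indices), repeats),
        }
    return results


if __name__ == "__main__":
    # Stand-in "readers" over an in-memory list; replace with real file reads.
    rows = list(range(1_000_000))
    report = compare_formats(
        full_scan={"in_memory": lambda: sum(rows)},
        random_take={"in_memory": lambda idx: [rows[i] for i in idx]},
        num_rows=len(rows),
    )
    print(report)
```

Using the median over several repeats and a fixed seed for the row indices keeps the comparison less sensitive to cache warm-up and to which rows happen to be picked.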
