littleDrew commented on issue #13438:
URL: https://github.com/apache/iceberg/issues/13438#issuecomment-3094323984

   Hi @pvary, regarding the 3 required steps: it seems you have already done 
step 1. **Step 3 seems to need discussion among Iceberg community members. Is 
there a planned meeting to decide this?** Step 2 looks more like a matter of 
running more tests to verify.
   
   As for the performance benefit of integrating with Lance compared with 
Parquet, I have also run some tests. Here I mainly consider full scan 
performance and random access performance. 
   - The test uses 1 million rows. It shows that scan performance may be 
better than Parquet (1.5~3x faster) when there are not too many columns (fewer 
than 50), but with too many columns scan performance will not be better than 
Parquet.
   - Random access performance: my current test is similar to the official 
Lance test case. It shows a 40x speedup when randomly taking only 20~50 rows 
from the file. I still have doubts about the 40x figure, since I only tested a 
Lance file rather than a Lance table.
   - Disk space issue: I found that using Lance takes 50% more disk space than 
Parquet. Maybe we should also consider how to deal with this.
   
   Thus, **do you think other work, such as implementing a more native Lance 
reader, is needed to improve performance** (e.g. random access)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


