lidavidm commented on PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#issuecomment-2750178633
> Then the only remaining I/O we need to address is the table metadata file
which stores a single json string. This is the only reason to keep the
`InputFile` and `OutputFile` interfaces
zhjwpku commented on PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#issuecomment-2751059392
I'll refactor the FileIO as per Gang's suggestion. Thank you all for your
valuable input. Starting with a minimal implementation and expanding it as
needed seems like an ideal approach.
pitrou commented on PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#issuecomment-2750938941
> Good question! IMHO, a streaming JSON reader is an overkill in this case.
That's why I propose to define a minimal FileIO like:
>
> ```
> class FileIO {
> public:
> virt
lidavidm commented on PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#issuecomment-2750433013
That seems reasonable.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific commen
lidavidm commented on PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#issuecomment-2750434350
It will mean different implementations won't be exactly drop-in portable but
I think that's too big a goal to tackle anyways
--
This is an automated message from the Apache Git Service
lidavidm commented on PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#issuecomment-2750412560
Yeah, in that case I don't think we need InputFile/OutputFile.
Is the idea that basically, iceberg-cpp would effectively not deal with I/O
(or would only deal with I/O of metadata
wgtmac commented on PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#issuecomment-2750431481
Exactly. Does that make sense to you?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the sp
wgtmac commented on PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#issuecomment-2750307640
Good question! IMHO, a streaming JSON reader is an overkill in this case.
That's why I propose to define a minimal FileIO like:
```
class FileIO {
public:
virtual expected Del
wgtmac commented on code in PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#discussion_r2011241905
##
src/iceberg/io/fs_file_reader.cc:
##
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.
wgtmac commented on code in PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#discussion_r2011224442
##
src/iceberg/file_io.h:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NO
zhjwpku commented on code in PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#discussion_r2011195005
##
src/iceberg/io/fs_file_reader.cc:
##
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.
zhjwpku commented on code in PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#discussion_r2011192149
##
src/iceberg/file_io.h:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the N
wgtmac commented on code in PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#discussion_r2010260510
##
src/iceberg/file_io.h:
##
Review Comment:
Why do we need this?
##
src/iceberg/file_io.h:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache S
lidavidm commented on code in PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#discussion_r2009380801
##
src/iceberg/file_io.h:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the
zhjwpku commented on PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#issuecomment-2597349794
> > Can we provide both?
>
> I think we can. It is pretty straight-forward to do something similar to
`arrow::RandomnAccessFile`:
https://github.com/apache/arrow/blob/main/cpp/src/
wgtmac commented on PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#issuecomment-2597327334
> Can we provide both?
I think we can. It is pretty straight-forward to do something similar to
`arrow::RandomnAccessFile`:
https://github.com/apache/arrow/blob/main/cpp/src/arrow/i
zhjwpku commented on code in PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#discussion_r1918650611
##
src/iceberg/io/fs_file_io.h:
##
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See
zhjwpku commented on PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#issuecomment-2594455927
> Is my understanding correct that this FileIo pertains to locations that
are local, on HDFS, or on S3?
Yeah, I hope this FileIO to be extended to other storages.
--
This is an a
zhjwpku commented on PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#issuecomment-2594498304
> For instance, async vs sync: [#2
(comment)](https://github.com/apache/iceberg-cpp/issues/2#issuecomment-2522607394)
Can we provide both?
>
> Or, whether the core librar
MisterRaindrop commented on PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#issuecomment-2594372750
Is my understanding correct that this FileIo pertains to locations that are
local, on HDFS, or on S3?
--
This is an automated message from the Apache Git Service.
To respond to t
lidavidm commented on PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#issuecomment-2594348439
For instance, async vs sync:
https://github.com/apache/iceberg-cpp/issues/2#issuecomment-2522607394
Or, whether the core library should do any IO at all:
https://github.com/apache
zhjwpku commented on PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#issuecomment-2594324922
> Has iceberg-cpp decided on an IO strategy already?
>
> It might be more productive to start writing the IO-less components, such
as parsing the various metadata files, etc.
lidavidm commented on code in PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#discussion_r1917497458
##
src/iceberg/io/file_io.h:
##
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See th
pitrou commented on PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#issuecomment-2593303684
Has iceberg-cpp decided on an IO strategy already?
It might be more productive to start writing the IO-less components, such as
parsing the various metadata files, etc.
--
This is
wgtmac commented on PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#issuecomment-2593244484
cc @lidavidm @MisterRaindrop
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific co
zhjwpku commented on PR #30:
URL: https://github.com/apache/iceberg-cpp/pull/30#issuecomment-2592893837
@wgtmac @Fokko @pitrou @Xuanwo @raulcd Please take a look at this early
implementation of FileIO, I plan to add more interfaces, before that, I hope to
hear some advices, thanks.
--
T
26 matches
Mail list logo