Re: [I] Support partitioned writes [iceberg-python]

2025-01-20 Thread via GitHub
Fokko commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-2602221346 https://github.com/apache/iceberg-python/pull/1345 has been merged, closing this one :) -- This is an automated message from the Apache Git Service. To respond to the message, p

Re: [I] Support partitioned writes [iceberg-python]

2025-01-20 Thread via GitHub
Fokko closed issue #208: Support partitioned writes URL: https://github.com/apache/iceberg-python/issues/208

Re: [I] Support partitioned writes [iceberg-python]

2024-10-30 Thread via GitHub
Fokko commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-2448147351 @RLashofRegas Sorry for the long wait, @sungwy has been working on adding a Rust extension to efficiently run the bucketing transform 🥳 We're blocked on a release on the Rust sid

Re: [I] Support partitioned writes [iceberg-python]

2024-08-24 Thread via GitHub
RLashofRegas commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-2308652459 @Fokko any expected timeline you can share on support for bucket transform? Is there a separate issue I can follow for that? Thanks for all the hard work so far!!

Re: [I] Support partitioned writes [iceberg-python]

2024-06-09 Thread via GitHub
mike-luabase commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-2156618883 Here's what I've been trying (sorry for long example, but thought the context would help) ```python iowa_sales_df = pcsv.read_csv("/Users/mritchie712/blackbird/de

Re: [I] Support partitioned writes [iceberg-python]

2024-06-09 Thread via GitHub
mike-luabase commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-2156617018 @Fokko I installed from source, but I'm [hitting this error](https://github.com/apache/iceberg-python/blob/94e8a9835995e3b61f07f0dfb48d8a22a1e1d1b0/pyiceberg/table/__init__.

Re: [I] Support partitioned writes [iceberg-python]

2024-06-09 Thread via GitHub
Fokko commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-2156424393 Hey everyone, support for partitioned writes is coming along pretty nicely. We are still missing some of the transforms, such as the bucket transform. Most of the stuff is on the main branc

Re: [I] Support partitioned writes [iceberg-python]

2024-06-08 Thread via GitHub
ppasquet commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-2156311362 Curious as well about where you guys stand on partitioned writes.

Re: [I] Support partitioned writes [iceberg-python]

2024-06-06 Thread via GitHub
deepika094 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-2151843836 Hi, do we have any way to write to a partitioned table so far?

Re: [I] Support partitioned writes [iceberg-python]

2024-04-30 Thread via GitHub
jaychia commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-2085981955 > Idea from @Fokko - support day/month/year transforms first You can also try using the transforms that Daft has already implemented. Full list of transforms: * [Ex

Re: [I] Support partitioned writes [iceberg-python]

2024-04-30 Thread via GitHub
syun64 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-2085959258 Idea from @Fokko - support day/month/year transforms first
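
For readers unfamiliar with the day/month/year transforms mentioned here, the following is a minimal plain-Python sketch of what they compute for a date value, based on the definitions in the Iceberg table spec (years, months, and days counted from 1970-01-01). The function names are illustrative assumptions and this is not PyIceberg's implementation.

```python
from datetime import date

# Reference point used by the Iceberg spec for the temporal transforms.
EPOCH = date(1970, 1, 1)


def year_transform(d: date) -> int:
    """Whole years elapsed since 1970 (negative before 1970)."""
    return d.year - EPOCH.year


def month_transform(d: date) -> int:
    """Whole months elapsed since 1970-01 (negative before 1970)."""
    return (d.year - EPOCH.year) * 12 + (d.month - 1)


def day_transform(d: date) -> int:
    """Whole days elapsed since 1970-01-01 (negative before 1970)."""
    return (d - EPOCH).days


sample = date(2024, 4, 30)
partition_values = {
    "year": year_transform(sample),
    "month": month_transform(sample),
    "day": day_transform(sample),
}
```

Because each transform maps a date to a single integer, rows can be grouped by that integer to find the target partition, which is likely part of why these transforms are a natural first step after identity.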

Re: [I] Support partitioned writes [iceberg-python]

2024-04-29 Thread via GitHub
jqin61 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-2083999610 Updates for monthly sync: 1. Working on dynamic overwrite which gets unblocked by partial deletes https://github.com/apache/iceberg-python/pull/569 2. For transforms functio

Re: [I] Support partitioned writes [iceberg-python]

2024-02-01 Thread via GitHub
jqin61 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1922851143 Opened draft PR with working code samples (it supports partitioned append with identity transform for now): https://github.com/apache/iceberg-python/pull/353

Re: [I] Support partitioned writes [iceberg-python]

2024-02-01 Thread via GitHub
jqin61 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1921735727 I have an incoming PR with working code samples that conform to the design above and cover identity transform + append as the first step of supporting partitioned write. During i

Re: [I] Support partitioned writes [iceberg-python]

2024-01-31 Thread via GitHub
syun64 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1919558503 The [Design Document](https://docs.google.com/document/d/1TLIzxKJilvhAq4JDoGMWMZdkRZXvcrG5YrxLvJ5UXkQ/edit#heading=h.f84o4qaemlga) on data file writes that was discussed during t

Re: [I] Support partitioned writes [iceberg-python]

2024-01-26 Thread via GitHub
syun64 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1912742015 @jqin61 and I discussed this a great deal offline, and we just wanted to follow up on step (2). If we wanted to use existing PyArrow functions, I think we could use a 2 pass algo

Re: [I] Support partitioned writes [iceberg-python]

2024-01-26 Thread via GitHub
asheeshgarg commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1912738795 @Fokko @syun64 another option I can think of is to use Polars; a simple example is below, with hashing, partitioning, and sorting within each partition. Where all the partition is

Re: [I] Support partitioned writes [iceberg-python]

2024-01-26 Thread via GitHub
asheeshgarg commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1912636932 @jqin61 I have also seen this behavior with pyarrow.dataset.write_dataset(): it removes the partition columns from the written-out Parquet files. @syun64 above approac

Re: [I] Support partitioned writes [iceberg-python]

2024-01-26 Thread via GitHub
syun64 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1912560771 Maybe another approach we could take if we want to use existing PyArrow functions is: 1. table.sort_by (all partitions) 2. figure out the row index for each permutation of p

Re: [I] Support partitioned writes [iceberg-python]

2024-01-26 Thread via GitHub
syun64 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1912495464 Right, as @jqin61 mentioned, if we only had to support **Transformed Partitions**, we could have employed some hack to add partition column to the dataset, which gets consumed by

Re: [I] Support partitioned writes [iceberg-python]

2024-01-26 Thread via GitHub
jqin61 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1912416306 @Fokko Thank you! These 2 points of supporting hidden partitioning and extracting metrics efficiently during writing are very insightful! For using pyarrow.dataset.write_da

Re: [I] Support partitioned writes [iceberg-python]

2024-01-25 Thread via GitHub
asheeshgarg commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1910802240 @Fokko thanks for pointing out the mismatch. After modifying the datatype it worked.

Re: [I] Support partitioned writes [iceberg-python]

2024-01-25 Thread via GitHub
Fokko commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1910290987 Hey @jqin61 Thanks for the elaborate post, and sorry for my slow reply. I did want to take the time to write a good answer. Probably the following statement needs ano

Re: [I] Support partitioned writes [iceberg-python]

2024-01-24 Thread via GitHub
asheeshgarg commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1908955580 @Fokko @jqin61 Today I tried a basic example of a partitioned write: from pyiceberg.io.pyarrow import schema_to_pyarrow import pyarrow as pa from pyarrow import parquet

Re: [I] Support partitioned writes [iceberg-python]

2024-01-23 Thread via GitHub
syun64 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1906879564 > @Fokko @jqin61 I am also interested in moving this forward, as we also deal with a lot of writes that involve partitions. Happy to collaborate on this. For write_dataset() we might

Re: [I] Support partitioned writes [iceberg-python]

2024-01-23 Thread via GitHub
jqin61 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1906856252 > @jqin61 just wondering if we can use this directly https://arrow.apache.org/docs/python/generated/pyarrow.dataset.partitioning.html Thank you Ashish! I overlooked it, as

Re: [I] Support partitioned writes [iceberg-python]

2024-01-23 Thread via GitHub
asheeshgarg commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1906088623 @jqin61 just wondering if we can use this directly https://arrow.apache.org/docs/python/generated/pyarrow.dataset.partitioning.html

Re: [I] Support partitioned writes [iceberg-python]

2024-01-19 Thread via GitHub
jqin61 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1901345425 Based on the existing discussion, there are 3 major possible directions for detecting partitions and writing each partition in a multi-threaded way to maximize I/O. It seems ther

Re: [I] Support partitioned writes [iceberg-python]

2024-01-15 Thread via GitHub
Fokko commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1891664926 @jqin61 I did some more thinking over the weekend, and I think that the approach that you suggested is the most flexible. I forgot about the sort-order that we also want to add at

Re: [I] Support partitioned writes [iceberg-python]

2024-01-12 Thread via GitHub
Fokko commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1889891973 I currently see two approaches: - First get the unique partitions, and then, for each partition, filter out the relevant data. It is nice that we know the partition upfron

Re: [I] Support partitioned writes [iceberg-python]

2024-01-12 Thread via GitHub
jqin61 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1889656103 >How are we going to fan out the writing of the data. We have an Arrow table, what is an efficient way to compute the partitions and scale out the work. For example, are we going

Re: [I] Support partitioned writes [iceberg-python]

2024-01-10 Thread via GitHub
jqin61 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1885365903 > In Iceberg it can be that some files are still on an older partitioning, we should make sure that we handle those correctly based on the that we provide. It seems Spark's

Re: [I] Support partitioned writes [iceberg-python]

2024-01-06 Thread via GitHub
Fokko commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1879726210 Hey @jqin61 Thanks for replying here. I'm not aware of anyone having started on this already. It would be great if you can take a stab at it 🚀

Re: [I] Support partitioned writes [iceberg-python]

2024-01-05 Thread via GitHub
jqin61 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1879332455 Hi @Fokko and Iceberg community, @syun64 and I are continuing to work on testing the write capability in the [Write support PR](https://github.com/apache/iceberg-python/pull/41). We