Al-Moatasem opened a new issue, #974:
URL: https://github.com/apache/iceberg-python/issues/974

   ### Apache Iceberg version
   
   0.6.1 (latest release)
   
   ### Please describe the bug 🐞
   
   Hi,
   
   I am trying to use the **rest** catalog and write data into **MinIO**. The script can communicate with MinIO (it creates the `metadata.json` file under the `metadata` directory); however, it raises `OSError: When initiating multiple part upload for key 'poc_new/coordinates/data/00000-0-f27b7921-a6d7-4c7e-b034-2d12221e5054.parquet' in bucket 'warehouse': AWS Error NETWORK_CONNECTION during CreateMultipartUpload operation: Encountered network error when sending http request`.
   
   This is the Docker Compose file I use:
   ```yaml
   version: '3'
   services:
     rest:
       image: tabulario/iceberg-rest:1.5.0
       container_name: iceberg-rest
       ports:
         - 8181:8181
       environment:
         - AWS_ACCESS_KEY_ID=admin
         - AWS_SECRET_ACCESS_KEY=password
         - AWS_REGION=us-east-1
         - CATALOG_WAREHOUSE=s3://warehouse/
         - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
         - CATALOG_S3_ENDPOINT=http://minio:9000
       networks:
         iceberg-rest:
   
   
     minio:
       image: minio/minio:RELEASE.2024-05-10T01-41-38Z
       container_name: minio
       environment:
         - MINIO_ROOT_USER=admin
         - MINIO_ROOT_PASSWORD=password
         - MINIO_DOMAIN=minio
   
       ports:
         - 9001:9001
         - 9000:9000
       command: [ "server", "/data", "--console-address", ":9001" ]
       networks:
         iceberg-rest:
           aliases:
             - warehouse.minio 
   
     mc:
       depends_on:
         - minio
       image: minio/mc:RELEASE.2024-05-09T17-04-24Z
       container_name: mc
       entrypoint: |
         /bin/sh -c "
            until (/usr/bin/mc config host add minio http://minio:9000 admin password)
           do
             echo '...waiting...' && sleep 1;
           done;
           /usr/bin/mc rm -r --force minio/warehouse;
           /usr/bin/mc mb minio/warehouse;
           /usr/bin/mc policy set public minio/warehouse;
           tail -f /dev/null
         "
       environment:
         - AWS_ACCESS_KEY_ID=admin
         - AWS_SECRET_ACCESS_KEY=password
         - AWS_REGION=us-east-1
       networks:
         iceberg-rest:
   
   
   networks:
     iceberg-rest:
   
   ```
   
   And this is the script:
   ```py
   import pyarrow as pa
   from pyiceberg.catalog import load_rest
   from pyiceberg.exceptions import NamespaceAlreadyExistsError, TableAlreadyExistsError
   
   catalog = load_rest(
       name="rest",
       conf={
           "uri": "http://localhost:8181/";,
       },
   )
   
   
   namespace = "poc_new"
   try:
       catalog.create_namespace(namespace)
   except NamespaceAlreadyExistsError as e:
       pass
   
   
   df = pa.Table.from_pylist(
       [
           {"lat": 52.371807, "long": 4.896029},
           {"lat": 52.387386, "long": 4.646219},
           {"lat": 52.078663, "long": 4.288788},
       ],
   )
   schema = df.schema
   
   table_name = "coordinates"
   table_identifier = f"{namespace}.{table_name}"
   try:
       table = catalog.create_table(
           identifier=table_identifier,
           schema=schema,
       )
   except TableAlreadyExistsError as e:
       pass
   
   table = catalog.load_table(table_identifier)
   table.append(df)
   ```
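
   For context, the script above passes only the `uri` to the catalog, so the client-side FileIO that writes the Parquet data files has no S3 endpoint or credentials of its own (`CATALOG_S3_ENDPOINT` in the Compose file configures the REST service, not the client). Below is a sketch of catalog properties with explicit client-side S3 settings — the `s3.*` property names are taken from the PyIceberg configuration docs, the values from the Compose file above, and it assumes MinIO is reachable from the host at `http://localhost:9000`; whether this resolves the error is untested here:

   ```python
   # Catalog properties intended for pyiceberg.catalog.load_rest(name="rest", conf=conf).
   # "uri" is from the original script; the s3.* keys configure the client-side
   # FileIO, which otherwise falls back to default AWS endpoint resolution.
   conf = {
       "uri": "http://localhost:8181/",
       "s3.endpoint": "http://localhost:9000",  # MinIO port as published on the host
       "s3.access-key-id": "admin",             # matches MINIO_ROOT_USER
       "s3.secret-access-key": "password",      # matches MINIO_ROOT_PASSWORD
       "s3.region": "us-east-1",
   }
   ```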
   
   The traceback:
   ```
   Traceback (most recent call last):
     File "d:\flink_iceberg\poc_01_iceberg_rest.py", line 40, in <module>
       table.append(df)
     File "D:\flink_iceberg\.venv2\Lib\site-packages\pyiceberg\table\__init__.py", line 1068, in append
       for data_file in data_files:
     File "D:\flink_iceberg\.venv2\Lib\site-packages\pyiceberg\table\__init__.py", line 2423, in _dataframe_to_data_files
       yield from write_file(table, iter([WriteTask(write_uuid, next(counter), df)]))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "D:\flink_iceberg\.venv2\Lib\site-packages\pyiceberg\io\pyarrow.py", line 1726, in write_file
       with fo.create(overwrite=True) as fos:
            ^^^^^^^^^^^^^^^^^^^^^^^^^
     File "D:\flink_iceberg\.venv2\Lib\site-packages\pyiceberg\io\pyarrow.py", line 299, in create
       output_file = self._filesystem.open_output_stream(self._path, buffer_size=self._buffer_size)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "pyarrow\_fs.pyx", line 868, in pyarrow._fs.FileSystem.open_output_stream
     File "pyarrow\error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow\error.pxi", line 115, in pyarrow.lib.check_status
   OSError: When initiating multiple part upload for key 'poc_new/coordinates/data/00000-0-efc0be57-453d-442d-af13-2e0b2382a53d.parquet' in bucket 'warehouse': AWS Error NETWORK_CONNECTION during CreateMultipartUpload operation: Encountered network error when sending http request
   ```
   In MinIO, the `metadata` directory is created and stores the `metadata.json` file, but there is no `data` directory.
   
![image](https://github.com/user-attachments/assets/f301b226-dab4-4453-9755-955096475ba3)
   
   
   Also, this is the `requirements.txt` file:
   ```
   annotated-types==0.7.0
   apache-beam==2.48.0
   apache-flink==1.19.1
   apache-flink-libraries==1.19.1
   avro-python3==1.10.2
   certifi==2024.7.4
   charset-normalizer==3.3.2
   click==8.1.7
   cloudpickle==2.2.1
   colorama==0.4.6
   confluent-kafka==2.5.0
   crcmod==1.7
   dill==0.3.1.1
   dnspython==2.6.1
   docopt==0.6.2
   duckdb==0.9.2
   duckdb_engine==0.13.0
   Faker==26.0.0
   fastavro==1.9.5
   fasteners==0.19
   fsspec==2023.12.2
   greenlet==3.0.3
   grpcio==1.65.1
   hdfs==2.7.3
   httplib2==0.22.0
   idna==3.7
   kafka-python==2.0.2
   markdown-it-py==3.0.0
   mdurl==0.1.2
   mmhash3==3.0.1
   numpy==1.24.4
   objsize==0.6.1
   orjson==3.10.6
   packaging==24.1
   pandas==2.2.2
   polars==1.2.1
   proto-plus==1.24.0
   protobuf==4.23.4
   py4j==0.10.9.7
   pyarrow==11.0.0
   pydantic==2.8.2
   pydantic-settings==2.3.4
   pydantic_core==2.20.1
   pydot==1.4.2
   Pygments==2.18.0
   pyiceberg==0.6.1
   pymongo==4.8.0
   pyparsing==3.1.2
   python-dateutil==2.9.0.post0
   python-dotenv==1.0.1
   pytz==2024.1
   regex==2024.7.24
   requests==2.32.3
   rich==13.7.1
   ruamel.yaml==0.18.6
   ruamel.yaml.clib==0.2.8
   six==1.16.0
   sortedcontainers==2.4.0
   SQLAlchemy==2.0.31
   strictyaml==1.7.3
   typing_extensions==4.12.2
   tzdata==2024.1
   urllib3==2.2.2
   zstandard==0.23.0
   ```
   
   I checked [this Slack thread](https://apache-iceberg.slack.com/archives/C029EE6HQ5D/p1707633685716559) for the same issue, but it doesn't contain a fix for my case.
   OS: Windows 10
   
   Environment variables containing `aws` in the three containers:
   
   `iceberg-rest` container
   ```
   iceberg@ce79d3f11b5f:/usr/lib/iceberg-rest$ env | grep -i aws
   AWS_REGION=us-east-1
   CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
   AWS_SECRET_ACCESS_KEY=password
   AWS_ACCESS_KEY_ID=admin
   ```
   
   The `minio` container doesn't have any env var containing `aws`.
   
   `mc` container
   ```
   AWS_REGION=us-east-1
   AWS_SECRET_ACCESS_KEY=password
   AWS_ACCESS_KEY_ID=admin
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

