fuzing commented on issue #12895:
URL: https://github.com/apache/iceberg/issues/12895#issuecomment-2839697843

   @jesumyip  - sorry, a couple of other things, namely:
   
   - You're resource capping the connect container at 2GB.  Kafka Connect is 
notoriously memory hungry.  Depending upon your use-case, you may run into JVM 
memory exhaustion issues out of the gate.  Make sure that you tune 
KAFKA_HEAP_OPTS so that the heap is set a couple/few GB below whatever has been 
allocated to the container (we use a 32GB limited container with 28GB allocated 
to the JVM heap like so: "-Xmx28G").  For a laptop/toy/test configuration I'd 
probably start with 6GB total, with 4GB allocated to the JVM heap (although this 
is quite lean for K/C).
   - On startup, in distributed mode, the connector will sit there idle until 
you prime the pump by injecting your config on the Kafka Connect REST port (see 
the example config below, with <sections-between-angle-brackets> standing in for 
variables we inject).  This will kick off the process.  By contrast, in 
standalone mode you can do all your config via the environment (but you will 
need to make the appropriate changes to the container to get this working - 
personally I wouldn't bother with standalone mode).  Adjust commit threads based 
on the number of cores allocated (2x #cores is usually the go-to, but often you 
can scale up from 2x to keep your cores maxed out for high throughput; monitor 
via docker stats or whatever you use).  Configure transforms etc. per your 
use-case.
   - The example config below uses 4 tasks, and in our test setup each task 
ingests from a pair of Kafka partitions (i.e. 8 total partitions).
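
The sizing rules of thumb in the bullets above can be sketched as a small helper. This is only an illustration: the function name, the 16GB threshold, and the 2GB/4GB headroom values are assumptions chosen to reproduce the two data points mentioned (32GB container with a 28GB heap, 6GB container with a 4GB heap), not Kafka Connect or Iceberg defaults.

```python
def suggest_connect_sizing(container_gb: int, cores: int,
                           partitions: int, tasks: int) -> dict:
    """Rule-of-thumb sizing for a Kafka Connect container (hypothetical helper)."""
    if container_gb < 3:
        raise ValueError("container too small to leave headroom for the JVM")
    # Assumption: leave 2GB of non-heap headroom on small containers and
    # 4GB on large ones ("a couple/few GB below" the container limit).
    headroom_gb = 2 if container_gb < 16 else 4
    heap_gb = container_gb - headroom_gb
    return {
        "KAFKA_HEAP_OPTS": f"-Xmx{heap_gb}G",
        # Starting point of 2x cores; scale up if the cores are not saturated.
        "iceberg.control.commit.threads": str(2 * cores),
        # Ceiling division: the most partitions any single task will own.
        "partitions_per_task": -(-partitions // tasks),
    }
```

For instance, `suggest_connect_sizing(32, 4, 8, 4)` matches the setup described above, assuming 4 cores: a 28GB heap, 8 commit threads, and 2 partitions per task.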
   
   
   ```
   {
     "tasks.max": "4",
     "connector.class": "org.apache.iceberg.connect.IcebergSinkConnector",
     "topics": "<some-topic>",
     "key.converter": "org.apache.kafka.connect.storage.StringConverter",
     "value.converter": "org.apache.kafka.connect.storage.StringConverter",
     "key.converter.schemas.enable": "true",
     "value.converter.schemas.enable": "false",
     "iceberg.catalog": "iceberg",
     "iceberg.catalog.type": "rest",
     "iceberg.catalog.uri": "<catalog-uri>",
     "iceberg.catalog.s3.path-style-access": "true",
     "iceberg.tables.auto-create-enabled": "true",
     "iceberg.tables.evolve-schema-enabled": "true",
     "iceberg.control.commit.interval-ms": "300000",
     "iceberg.control.commit.timeout-ms": "30000",
     "iceberg.control.commit.threads": "8",
     "iceberg.catalog.warehouse": "<warehouse>",
     "iceberg.catalog.io-impl": "<implementation>",
     "iceberg.catalog.s3.endpoint": "<endpoint>",
     "iceberg.catalog.s3.access-key-id": "<key_id>",
     "iceberg.catalog.s3.secret-access-key": "<secret>",
     "iceberg.catalog.client.region": "<region>",
     "transforms": "jsontomap,tsconverter",
     "transforms.jsontomap.type": "org.apache.iceberg.connect.transforms.JsonToMapTransform",
     "transforms.jsontomap.json.root": "false",
     "transforms.tsconverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$$Value",
     "transforms.tsconverter.field": "timestamp",
     "transforms.tsconverter.unix.precision": "milliseconds",
     "transforms.tsconverter.target.type": "Timestamp",
     "sink.properties.strip_outer_array": "true"
   }
   ```
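
For completeness, here is a minimal sketch of how a flat config map like the one above could be pushed to a distributed-mode worker. It relies on the Kafka Connect REST endpoint `PUT /connectors/{name}/config`, which accepts the flat map directly and creates or updates the connector. The helper names, the connector name `iceberg-sink`, and the base URL `http://localhost:8083` (8083 being Connect's default REST port) are assumptions for illustration; substitute whatever your deployment uses.

```python
import json
import urllib.parse
import urllib.request


def build_config_request(base_url: str, name: str,
                         config: dict) -> urllib.request.Request:
    """Build a PUT /connectors/{name}/config request carrying a flat config map."""
    quoted = urllib.parse.quote(name, safe="")
    url = f"{base_url.rstrip('/')}/connectors/{quoted}/config"
    return urllib.request.Request(
        url,
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def submit_config(base_url: str, name: str, config: dict) -> int:
    """Send the request to the Connect worker and return the HTTP status code."""
    request = build_config_request(base_url, name, config)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

With the JSON above loaded into a dict named `config`, a call such as `submit_config("http://localhost:8083", "iceberg-sink", config)` would be the "prime the pump" step. Note that `POST /connectors` expects a different body shape, with the map nested under a `config` key next to a `name` key.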
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

