[
https://issues.apache.org/jira/browse/KAFKA-13926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kumud Kumar Srivatsava Tirupati updated KAFKA-13926:
----------------------------------------------------
Description:
Hello,
Today's Connect predicates enable checks only on the record metadata. However,
this can be limiting, considering that {*}many built-in and custom
transformations that we (the community) use are more key/value centric{*}.
Some use-cases this can solve:
* Data type conversions of certain pre-identified fields for records coming
across datasets, applied only if those fields exist. [Ex: TimestampConverter can
be run only if the specified date field exists, irrespective of the record
metadata.]
* Skipping a transform if a given field does or does not exist. A lot of
built-in transforms raise exceptions (Ex: the InsertField transform if the field
already exists), thereby breaking the task. Giving users this control enables
them to consciously configure for such cases.
* Even though some built-in transforms explicitly handle these cases, invoking
them would still be an unnecessary pass-through loop.
* Considering that each connector usually deals with multiple datasets (even
100s for a database CDC connector), metadata-centric predicate checking is
somewhat limiting when it comes to such pre-identified custom metadata fields
in the records.
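To make the first two use cases concrete, here is a minimal stand-alone sketch
of the kind of check such a predicate would perform. It is not the proposed
implementation: records are modelled as plain maps so that it compiles without
Kafka Connect on the classpath, and the names HasFieldSketch, hasField, applyIf
and toEpochMillis, as well as the created_at field, are hypothetical. A real
predicate would instead implement
org.apache.kafka.connect.transforms.predicates.Predicate and would also need to
handle Struct values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Stand-alone sketch of a "has field" check (hypothetical names throughout).
 * Record values are plain maps here; this is an assumption made only to keep
 * the example free of Kafka Connect dependencies.
 */
public class HasFieldSketch {

    /** Returns true when the value is a map that contains the named field. */
    public static boolean hasField(Object value, String fieldName) {
        if (!(value instanceof Map)) {
            return false; // null or primitive values have no fields
        }
        return ((Map<?, ?>) value).containsKey(fieldName);
    }

    /**
     * Runs the transform only when the field check (optionally negated)
     * passes; otherwise the value is returned untouched.
     */
    public static Map<String, Object> applyIf(Map<String, Object> value,
                                              String fieldName,
                                              boolean negate,
                                              UnaryOperator<Map<String, Object>> transform) {
        boolean matches = hasField(value, fieldName) != negate;
        return matches ? transform.apply(value) : value;
    }

    /**
     * Stand-in for a field-centric transform: parses "created_at" into a long.
     * It throws if the field is missing, which is the failure being guarded.
     */
    public static Map<String, Object> toEpochMillis(Map<String, Object> value) {
        Map<String, Object> copy = new HashMap<>(value);
        copy.put("created_at", Long.parseLong((String) value.get("created_at")));
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> withField = Map.of("id", 1, "created_at", "1653436800000");
        Map<String, Object> withoutField = Map.of("id", 2);

        // The transform runs for the first record and is skipped for the second.
        System.out.println(applyIf(withField, "created_at", false, HasFieldSketch::toEpochMillis));
        System.out.println(applyIf(withoutField, "created_at", false, HasFieldSketch::toEpochMillis));
    }
}
```

Without the guard, the second record would make the stand-in transform throw,
which mirrors the task-breaking behaviour described in the second bullet.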
I know some of these cases can be handled within the transforms themselves, but
that defeats the purpose of having predicates.
We have built this predicate for our own use and have found it extremely
helpful. Please let me know your thoughts so that I can raise a PR.
KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-845%3A+%27HasField%27+predicate+for+kafka+connect
> Proposal to have "HasField" predicate for kafka connect
> -------------------------------------------------------
>
> Key: KAFKA-13926
> URL: https://issues.apache.org/jira/browse/KAFKA-13926
> Project: Kafka
> Issue Type: Improvement
> Components: KafkaConnect
> Reporter: Kumud Kumar Srivatsava Tirupati
> Assignee: Kumud Kumar Srivatsava Tirupati
> Priority: Major