[
https://issues.apache.org/jira/browse/KAFKA-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Edvard Poliakov updated KAFKA-6002:
-----------------------------------
Description:
My colleague and I have been working on a new Transform that takes a JSON
string and transforms it into an actual object, like this:
{code}
{
"a" : "{\"b\": 23}"
}
{code}
into
{code}
{
"a" : {
"b" : 23
}
}
{code}
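As a rough, language-neutral illustration of the conversion above (this is not Kafka Connect code; the function name {{expand_json_field}} and the decision to leave non-JSON values untouched are assumptions made for the sketch), the core step could look like this:

```python
import json


def expand_json_field(record, field):
    """Return a copy of `record` where `record[field]` is parsed from a JSON string.

    Hypothetical helper for illustration only; values that are not strings,
    or strings that are not valid JSON, are returned unchanged.
    """
    value = record.get(field)
    if not isinstance(value, str):
        return dict(record)  # nothing to expand
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        return dict(record)  # keep the original string as-is
    expanded = dict(record)
    expanded[field] = parsed
    return expanded


record = {"a": "{\"b\": 23}"}
print(json.dumps(expand_json_field(record, "a")))  # prints {"a": {"b": 23}}
```

Note that this only handles the schemaless case; the difficulty discussed next is what Schema to attach to the parsed value.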
There is no robust way to build a Schema from a JSON object alone, because the
value can be something like an empty array or a null, which provides no
information about the schema of the object. So I see two options here.
1. Have the transform take the schema as a transform parameter. The problem I
found with this is that it is not clear which JSON schema specification should
be used. I assume it would be reasonable to use http://json-schema.org/, but
Kafka Connect does not seem to support it currently. Moreover, reading through
the JsonConverter class in Kafka Connect, I cannot tell which specification the
JSON schema used in that class follows, for example in the {{asConnectSchema}}
method on {{JsonConverter}}, [see
here|https://github.com/apache/kafka/blob/trunk/connect/json/src/main/java/org/apache/kafka/connect/json/JsonConverter.java#L415].
2. Keep updating the schema with each object received. However, I can't see a
standard and robust way of handling the edge cases.
I am happy to create a pull request for this transform, if we can agree on
something here. :)
> Kafka Connect Transform transforming JSON string into actual object
> -------------------------------------------------------------------
>
> Key: KAFKA-6002
> URL: https://issues.apache.org/jira/browse/KAFKA-6002
> Project: Kafka
> Issue Type: Improvement
> Components: KafkaConnect
> Reporter: Edvard Poliakov
> Priority: Minor
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)