[GitHub] [camel-k] lburgazzoli commented on issue #1980: Add support for multiple data types and schemas in Kamelets

GitBox Thu, 04 Feb 2021 01:36:40 -0800


lburgazzoli commented on issue #1980:
URL: https://github.com/apache/camel-k/issues/1980#issuecomment-773169266



   > Let's do another iteration on this...
   > 
   > I've been thinking about your comments and I like the idea of having stuff also as 
CRs. I remember some brainstorming with @lburgazzoli about how dynamic schemas 
may work in this model. The idea was to let Kamelets define their schemas, if 
known in advance, but also let KameletBindings redefine them, if needed.
   > 
   > DataFormats are generic in Camel, but when talking about connectors 
(a.k.a. Kamelets), I think it's better for the Kamelet to enumerate all the 
possible dataformats it supports. E.g. @davsclaus was talking about sources 
that can only produce `binary` data (i.e. no dataformat), but there are many 
other examples: e.g. a "hello world" string cannot be transformed into FHIR 
data by simply plugging in the FHIR JSON dataformat, and not all data is 
suitable for CSV encoding.
   > 
   > I also see that we're talking about formats and schemas as if they were 
the same thing, but even if they are related (i.e. dataFormat + Kamelet [+ 
Binding Properties] may imply a Schema), maybe we can do a better job in 
treating them as separate entities.
   > 
   > I think the following model may be good for the in-Kamelet specification 
of a "format":
   > 
   > ```yaml
   > kind: Kamelet
   > apiVersion: camel.apache.org/v1alpha1
   > metadata:
   >   name: chuck-source
   > # ... 
   > spec:
   >   definition:
   >     properties:
   >       format:
   >         title: Format
   >         type: string
   >         enum:
   >         - JSON
   >         - Avro
   >         default: JSON
   > # ... 
   > formats:
   > - name: JSON
   >   # optional, useful in case of in/out Kamelets
   >   scope: out
   >   schema:
   >     mediaType: "application/json"
   >     data: # the JSON schema inline
   >     url: # alternative link to the schema
   >     ref: # alternative Kubernetes reference to the schema (see below)
   >       name: # ...
   >   # the source produces JSON by default, no libs or transformations needed
   > 
   > - name: Avro
   >   schema:
   >     type: avro-schema
   >     mediaType: "application/avro"
   >     data: # the avro schema inline
   >     url: # alternative link to the schema
   >     ref: # alternative Kubernetes reference to the schema (see below)
   >       name: # ...
   >   dataFormat:
   >     # optional, but if not provided "no format" is assumed
   >     id: "avro"
   >     properties: # only if "id" is present
   >       class-name: org.apache.camel.xxx.MyClass
   >       compute-schema: true|false
   >       # ...
   >     dependencies:
   >     - camel:jackson
   >     - camel:avro
   >     - mvn:org.acme/my-artifact/1.0.0
   > ```
   > 
   > Note the `scope` property, which allows defining the specific 
details of transformations for input and output of a particular format. I'd not 
complicate life and assume that users will choose only 1 format using the 
standard `format` property (not an `inputFormat` and `outputFormat`). So if I 
choose `CSV`, the Kamelet will consume and produce CSV. Anyway, the shape 
(schema) of the input CSV can be different from the one of the output CSV (and 
that's described in the Kamelet).
   > 
   
   I think we could also have a case where we want the data format to 
automatically compute the schema, e.g. from a POJO, so basically a format 
without the `schema` section.
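   
   As a rough sketch of that case, reusing the `Avro` entry from the proposal 
above (the `compute-schema` property and the class name are the placeholders 
from that example, not a settled API), a format entry with no `schema` section 
might look like:
   
   ```yaml
   formats:
   - name: Avro
     # no `schema` section: the data format is assumed to derive it from the POJO
     dataFormat:
       id: "avro"
       properties:
         class-name: org.apache.camel.xxx.MyClass  # placeholder class from the proposal
         compute-schema: true
       dependencies:
       - camel:jackson
       - camel:avro
   ```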
   
   > The `schema` here is declared inline in the Kamelet, to make it 
self-contained, but we can create also a `Schema` CR:
   > 
   > ```yaml
   > kind: Schema
   > apiVersion: camel.apache.org/v1alpha1
   > metadata:
   >   name: my-avro-schema
   > spec:
   >   type: avro-schema
   >   mediaType: application/avro
   >   data: # the avro schema inline
   >   url: # alternative URL reference
   >   # no, ref is forbidden here
   > ```
   > 
   > Structure is almost the same as the inline version.
   > 
   > The binding can use the predefined schema:
   > 
   > ```yaml
   > kind: KameletBinding
   > apiVersion: camel.apache.org/v1alpha1
   > metadata:
   >   name: chuck-to-channel
   > spec:
   >   source:
   >     kind: Kamelet
   >     apiVersion: camel.apache.org/v1alpha1
   >     name: chuck-source
   >     properties:
   >       # may have been omitted, since it's the default
   >       format: JSON
   >   sink:
   >     # ...
   > ```
   > 
   > The binding above will produce objects in JSON format with the inline 
definition of the schema. The one below is using a custom schema:
   > 
   > ```yaml
   > kind: KameletBinding
   > apiVersion: camel.apache.org/v1alpha1
   > metadata:
   >   name: chuck-to-channel
   > spec:
   >   source:
   >     kind: Kamelet
   >     apiVersion: camel.apache.org/v1alpha1
   >     name: chuck-source
   >     properties:
   >       # since there's no inline format named "my-avro", it refers to the external one
   >       format: Avro
   >     schema:
   >       # since it's a source, we assume this is the schema of the output
   >       ref:
   >         name: my-avro-schema
   >       # or alternatively also inline
   >       data: #...
   >       url: # ...
   >   sink:
   >     # ...
   > ```
   > 
   > This mechanism may be used also in cases where the schema can be computed 
dynamically before running the integration. In this case, an external entity 
saves the schema in a CR and references it in the KameletBinding.
   > 
   > For the use case of using the Schema CR to sync external entities (like 
registries), it's possible, but we should think more about that because of edge 
cases: sometimes the schema is known only at runtime and sometimes it varies 
from message to message. In those cases, it's the integration itself that needs 
to update the registries. Probably it would be cleaner if it's the integration 
that always updates the registry.
   
   Yep, we don't need to publish each schema up-front, but pre-computed 
schemas (either because they are known at runtime or because they are computed 
before running the integration) should be stored as CRs so others can 
eventually consume them.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
