lburgazzoli commented on issue #1980: URL: https://github.com/apache/camel-k/issues/1980#issuecomment-773169266
> Let's do another iteration on this...
>
> I've been thinking about your comments and I like the idea of having stuff also as CRs. I remember some brainstorming with @lburgazzoli about how dynamic schemas may work in this model. The idea was to let Kamelets define their schemas, if known in advance, but also let KameletBindings redefine them, if needed.
>
> DataFormats are generic in Camel, but when talking about connectors (a.k.a. Kamelets), I think it's better for the Kamelet to enumerate all the possible dataformats it supports. E.g. @davsclaus was talking about sources that can only produce `binary` data (i.e. no dataformat), but there are many other examples: a "hello world" string cannot be transformed into FHIR data by simply plugging in the FHIR JSON dataformat, and not all data is suitable for CSV encoding.
>
> I also see that we're talking about formats and schemas as if they were the same thing, but even if they are related (i.e. dataFormat + Kamelet [+ Binding Properties] may imply a Schema), maybe we can do a better job in treating them as separate entities.
>
> I think the following model may be good for the in-Kamelet specification of a "format":
>
> ```yaml
> kind: Kamelet
> apiVersion: camel.apache.org/v1alpha1
> metadata:
>   name: chuck-source
>   # ...
> spec:
>   definition:
>     properties:
>       format:
>         title: Format
>         type: string
>         enum:
>         - JSON
>         - Avro
>         default: JSON
>   # ...
>   formats:
>   - name: JSON
>     # optional, useful in case of in/out Kamelets
>     scope: out
>     schema:
>       mediaType: "application/json"
>       data: # the JSON schema inline
>       url: # alternative link to the schema
>       ref: # alternative Kubernetes reference to the schema (see below)
>         name: # ...
>     # the source produces JSON by default, no libs or transformations needed
>
>   - name: Avro
>     schema:
>       type: avro-schema
>       mediaType: "application/avro"
>       data: # the avro schema inline
>       url: # alternative link to the schema
>       ref: # alternative Kubernetes reference to the schema (see below)
>         name: # ...
>     dataFormat:
>       # optional, but if not provided "no format" is assumed
>       id: "avro"
>       properties: # only if "id" is present
>         class-name: org.apache.camel.xxx.MyClass
>         compute-schema: true|false
>         # ...
>     dependencies:
>     - camel:jackson
>     - camel:avro
>     - mvn:org.acme/my-artifact/1.0.0
> ```
>
> Note the `scope` property, which allows defining the specific details of transformations for the input and output of a particular format. I'd not complicate life and assume that users will choose only 1 format using the standard `format` property (not an `inputFormat` and `outputFormat`). So if I choose `CSV`, the Kamelet will consume and produce CSV. Anyway, the shape (schema) of the input CSV can be different from that of the output CSV (and that's described in the Kamelet).
>

I think we could also have a case where we want the data format to automatically compute the schema, e.g. from a POJO, so basically a format without the `schema` section.

> The `schema` here is declared inline in the Kamelet, to make it self-contained, but we can also create a `Schema` CR:
>
> ```yaml
> kind: Schema
> apiVersion: camel.apache.org/v1alpha1
> metadata:
>   name: my-avro-schema
> spec:
>   type: avro-schema
>   mediaType: application/avro
>   data: # the avro schema inline
>   url: # alternative URL reference
>   # no, ref is forbidden here
> ```
>
> Structure is almost the same as the inline version.
>
> The binding can use the predefined schema:
>
> ```yaml
> kind: KameletBinding
> apiVersion: camel.apache.org/v1alpha1
> metadata:
>   name: chuck-to-channel
> spec:
>   source:
>     kind: Kamelet
>     apiVersion: camel.apache.org/v1alpha1
>     name: chuck-source
>     properties:
>       # may have been omitted, since it's the default
>       format: JSON
>   sink:
>     # ...
> ```
>
> The binding above will produce objects in JSON format with the inline definition of the schema. The one below is using a custom schema:
>
> ```yaml
> kind: KameletBinding
> apiVersion: camel.apache.org/v1alpha1
> metadata:
>   name: chuck-to-channel
> spec:
>   source:
>     kind: Kamelet
>     apiVersion: camel.apache.org/v1alpha1
>     name: chuck-source
>     properties:
>       # since there's no inline format named "my-avro", it refers to the external one
>       format: Avro
>     schema:
>       # since it's a source, we assume this is the schema of the output
>       ref:
>         name: my-avro-schema
>       # or alternatively also inline
>       data: #...
>       url: # ...
>   sink:
>     # ...
> ```
>
> This mechanism may also be used in cases where the schema can be computed dynamically before running the integration. In this case, an external entity saves the schema in a CR and references it in the KameletBinding.
>
> For the use case of using the Schema CR to sync external entities (like registries), it's possible, but we should think more about that because of edge cases: sometimes the schema is known only at runtime and sometimes it varies from message to message. In those cases, it's the integration itself that needs to update the registries. Probably it would be cleaner if it's the integration that always updates the registry.

Yep, we don't need to publish each schema up-front, but for pre-computed schemas (either because they are known at runtime or because they are computed before running the integration), we should store them as CRs so others can eventually consume them.
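
To make the lookup order discussed above concrete, here is a minimal Python sketch of how the effective schema could be picked. It is only an illustration of the proposal, not Camel K code: the `resolve_schema` helper, the `CHUCK_SOURCE` dict layout, and the rule that a binding-level schema always wins are assumptions based on the YAML examples in this thread.

```python
from typing import Optional

# Hypothetical, trimmed-down view of the chuck-source Kamelet spec from the proposal.
CHUCK_SOURCE = {
    "definition": {"properties": {"format": {"enum": ["JSON", "Avro"], "default": "JSON"}}},
    "formats": [
        {"name": "JSON", "scope": "out", "schema": {"mediaType": "application/json"}},
        {"name": "Avro", "schema": {"type": "avro-schema", "mediaType": "application/avro"}},
    ],
}


def resolve_schema(kamelet_spec: dict,
                   chosen_format: Optional[str] = None,
                   binding_schema: Optional[dict] = None) -> Optional[dict]:
    """Return the schema a binding would end up with (hypothetical helper)."""
    # Assumption: a schema declared in the KameletBinding overrides the Kamelet's one.
    if binding_schema is not None:
        return binding_schema
    # Fall back to the Kamelet's default format when the binding omits it.
    if chosen_format is None:
        chosen_format = kamelet_spec["definition"]["properties"]["format"]["default"]
    for fmt in kamelet_spec.get("formats", []):
        if fmt["name"] == chosen_format:
            # Assumption: a format without a schema section yields None,
            # meaning the data format would have to compute the schema itself.
            return fmt.get("schema")
    raise ValueError(f"format {chosen_format!r} is not declared by the Kamelet")
```

With this sketch, `resolve_schema(CHUCK_SOURCE)` falls back to the inline JSON schema, while passing `binding_schema={"ref": {"name": "my-avro-schema"}}` returns the reference to the external `Schema` CR unchanged; actually dereferencing that CR is left out on purpose.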