[
https://issues.apache.org/jira/browse/NIFI-15745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Payne updated NIFI-15745:
------------------------------
Fix Version/s: 2.9.0
Status: Patch Available (was: Open)
> Schema Inference is very inefficient when complex inner fields have many
> nullable values
> ----------------------------------------------------------------------------------------
>
> Key: NIFI-15745
> URL: https://issues.apache.org/jira/browse/NIFI-15745
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
> Fix For: 2.9.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When records contain inner "records" / "objects" and we infer a schema over
> many records, any inner fields that are nullable and therefore absent from
> some records (especially common in JSON) cause our inference to create a
> UNION of record types. For example, if we had:
> {code:java}
> [{
>   "name": "Mark",
>   "project": {
>     "name": "nifi",
>     "org": "The Apache Software Foundation",
>     "yearEstablished": 2014
>   }
> },
> {
>   "name": "John",
>   "project": {
>     "name": "nifi",
>     "language": "Java",
>     "jiraProject": "NIFI"
>   },
>   "language": {
>     "name": "Java"
>   }
> }] {code}
> Each of these records has an inner record with nullable fields, so the schema
> would define project as a {{UNION}} of two Record types.
> This works okay for a simple example like this. But consider a FlowFile with
> thousands or tens of thousands of Records, where inner objects can be very
> complex. The {{UNION}} becomes massive, and it takes an inordinate amount of
> time to infer the schema.
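
The growth described above can be illustrated with a minimal sketch. This is not NiFi's inference code: the helpers `infer_type` and `infer_field_unions` are hypothetical names, and the model is deliberately naive, treating every distinct set of inner fields as its own record type. It runs the two records from the description, then a synthetic inner object with four optional fields in which every combination of present and absent fields occurs.

```python
from itertools import combinations


def infer_type(value):
    """Return a hashable description of a JSON value's type (illustrative only)."""
    if isinstance(value, dict):
        # A nested object becomes a record type identified by its exact field set.
        return ("RECORD", frozenset((k, infer_type(v)) for k, v in value.items()))
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT"
    if isinstance(value, str):
        return "STRING"
    return "OTHER"


def infer_field_unions(records):
    """Map each top-level field to the set of distinct types observed for it."""
    unions = {}
    for record in records:
        for name, value in record.items():
            unions.setdefault(name, set()).add(infer_type(value))
    return unions


# The two records from the issue description.
example = [
    {"name": "Mark",
     "project": {"name": "nifi",
                 "org": "The Apache Software Foundation",
                 "yearEstablished": 2014}},
    {"name": "John",
     "project": {"name": "nifi", "language": "Java", "jiraProject": "NIFI"},
     "language": {"name": "Java"}},
]
example_unions = infer_field_unions(example)

# Synthetic data (an assumption for illustration): an inner object with four
# optional fields, with every subset of those fields appearing once.
optional = ["a", "b", "c", "d"]
synthetic = [
    {"inner": {key: "x" for key in subset}}
    for size in range(len(optional) + 1)
    for subset in combinations(optional, size)
]
synthetic_unions = infer_field_unions(synthetic)

print(len(example_unions["project"]))  # 2 distinct record types
print(len(synthetic_unions["inner"]))  # 16 distinct record types (2 ** 4)
```

Under this naive model the number of record types in the union can double with each additional optional inner field, which is consistent with the slowdown reported for large, complex FlowFiles.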