[ https://issues.apache.org/jira/browse/NIFI-15681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071724#comment-18071724 ]
ASF subversion and git services commented on NIFI-15681:
--------------------------------------------------------
Commit 5fc2e6a02a51312a9c63192bad1d4f43030f4fc4 in nifi's branch
refs/heads/main from agturley
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=5fc2e6a02a5 ]
NIFI-15681 - Enhance PutElasticsearchJson to support NDJSON, JSON Array, and
Single JSON input formats with size-based batching (#10981)
> Enhance PutElasticsearchJson to support NDJSON, JSON Array, and Single JSON
> input formats with size-based batching
> ------------------------------------------------------------------------------------------------------------------
>
> Key: NIFI-15681
> URL: https://issues.apache.org/jira/browse/NIFI-15681
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Affects Versions: 2.8.0
> Environment: Containerized NiFi 2.8.0 on RHEL 9
> Reporter: Adam Turley
> Priority: Major
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> The existing PutElasticsearchJson processor is limited to indexing one JSON
> document per FlowFile. In high-volume ingest scenarios, upstream flow logic
> must therefore reshape the data into one FlowFile per document before it can
> be sent to Elasticsearch. This creates excessive NiFi session overhead and
> makes it impractical to send pre-aggregated NDJSON or JSON array payloads
> directly.
> This improvement enhances PutElasticsearchJson in-place while remaining fully
> backwards compatible with existing flows. No schema, Record Reader, or schema
> registry is required — JSON is passed through directly, making it suitable
> for dynamic or schema-less documents.
> Why not PutElasticsearchRecord?
> PutElasticsearchRecord is the right choice when data arrives in a structured,
> well-known format (Avro, CSV, Parquet, etc.) and field-level type mapping,
> schema enforcement, or schema evolution is needed. However, it introduces
> significant overhead that is unnecessary in many JSON ingest pipelines:
> * Schema requirement — a Record Reader and schema (via schema registry,
> inferred, or embedded) must be defined and maintained. For JSON data with
> dynamic fields, deeply nested structures, or schema-less designs, this is a
> configuration burden with no benefit.
> * Deserialization cost — PutElasticsearchRecord fully deserializes the input
> into NiFi's internal Record object model and then re-serializes it to JSON
> for the _bulk request. This is a two-way type conversion for data that is
> already valid JSON, adding CPU and memory overhead on every document.
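
For illustration only, the pass-through idea described above can be sketched
roughly as follows. This is not the processor's code (NiFi itself is Java);
the function name bulk_batches, the index name, and the byte limit are
hypothetical. The sketch assumes each NDJSON line is already a valid JSON
document, copies it verbatim after an "index" action line in the
Elasticsearch _bulk format, and starts a new request body whenever adding the
next entry would exceed the size limit.

```python
import json
from typing import Iterable, Iterator


def bulk_batches(ndjson_lines: Iterable[str], index: str, max_bytes: int) -> Iterator[bytes]:
    """Yield _bulk request bodies built from NDJSON lines (hypothetical helper).

    Documents are never parsed: each line is copied through as-is, preceded
    by an action line. A document larger than max_bytes still gets a batch
    of its own rather than being dropped.
    """
    # The same action line is reused for every document in this sketch.
    action = json.dumps({"index": {"_index": index}}, separators=(",", ":")).encode("utf-8") + b"\n"
    batch = bytearray()
    for line in ndjson_lines:
        doc = line.strip()
        if not doc:
            continue  # ignore blank lines between documents
        entry = action + doc.encode("utf-8") + b"\n"
        # Flush the current batch if this entry would push it over the limit.
        if batch and len(batch) + len(entry) > max_bytes:
            yield bytes(batch)
            batch = bytearray()
        batch += entry
    if batch:
        yield bytes(batch)


if __name__ == "__main__":
    lines = ['{"a":1}', '{"b":2}', '{"c":3}']
    for body in bulk_batches(lines, index="logs", max_bytes=80):
        print(body.decode("utf-8"), end="---\n")
```

Because the documents are treated as opaque bytes, no schema or Record Reader
is involved; the trade-off is that malformed JSON is only detected by
Elasticsearch when the request is sent.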
--
This message was sent by Atlassian Jira
(v8.20.10#820010)