Scrooge-McDucks opened a new pull request, #11058:
URL: https://github.com/apache/nifi/pull/11058

   …e fragment attributes from MergeContent in Defragment mode
   
   ## Summary
   
   This change adds optional fragment attribute support to `UnpackContent` so 
unpacked FlowFiles can be regrouped downstream using `MergeContent` in 
`Defragment` mode.
   
   It also updates `MergeContent` to remove reassembly-related attributes from 
the merged FlowFile once defragmentation has completed successfully, including:
   
   - `fragment.identifier`
   - `fragment.index`
   - `fragment.count`
   - `segment.original.filename`
   
   ## Motivation
   
   A common dataflow pattern is:
   
   1. Data is packaged to optimise transport
   2. `UnpackContent` extracts individual FlowFiles
   3. Files are enriched or transformed independently
   4. Files are regrouped and repackaged
   
   This works well conceptually, but today `UnpackContent` does not provide a 
built-in way to assign the fragment attributes needed for downstream reassembly 
across formats such as ZIP, TAR, and FlowFile Package.
   
   Without those attributes, users need custom logic to preserve grouping and 
ordering, which adds complexity and can lead to inconsistent behaviour.
   
   This change makes that workflow easier by allowing `UnpackContent` to 
optionally generate fragment attributes, while ensuring `MergeContent` removes 
the temporary reassembly metadata once the final merged FlowFile has been 
produced.
   
   ## Changes Included
   
   ### UnpackContent
   
   Added optional support for assigning fragment attributes to unpacked 
FlowFiles.
   
   #### New Properties
   
   **Add Fragment Attributes**
   - When enabled, assigns:
     - `fragment.identifier`
     - `fragment.index`
     - `fragment.count`
   
   **Fragment Identifier Value**
   - Specifies the value used for `fragment.identifier`
   - Supports Expression Language evaluated against the incoming packed FlowFile
   - Evaluated once per source FlowFile, with the resulting value applied to 
all unpacked FlowFiles derived from that source
   - Default: `${UUID()}`
   
   Examples:
   - `${UUID()}` for a unique grouping per archive (default)
   - `${filename}` for grouping based on the original filename
   - `${archive.filename}` when an explicit archive attribute is available
   
   #### Behaviour
   
   When enabled:
   - All FlowFiles produced from a single archive share the same 
`fragment.identifier`
   - `fragment.index` is assigned based on entry order within the archive
   - `fragment.count` is set to the total number of unpacked entries
   - The identifier expression is evaluated once per parent FlowFile
   
   When disabled:
   - No change to current `UnpackContent` behaviour
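
The behaviour described above can be sketched outside NiFi as follows. This is a minimal illustration in plain Python, not the processor's code: FlowFile attributes are modelled as dictionaries, and `assign_fragment_attributes`, `identifier_for`, and `enabled` are hypothetical names standing in for the processor logic, the evaluated `Fragment Identifier Value` expression, and the `Add Fragment Attributes` property. Zero-based indexing is assumed from the example table in this description.

```python
import uuid
from typing import Callable, Dict, List

Attributes = Dict[str, str]


def assign_fragment_attributes(
    parent: Attributes,
    entries: List[Attributes],
    identifier_for: Callable[[Attributes], str] = lambda _parent: str(uuid.uuid4()),
    enabled: bool = True,
) -> List[Attributes]:
    """Return copies of `entries`, with fragment attributes added when enabled."""
    if not enabled:
        # Disabled: attributes of the unpacked entries pass through unchanged.
        return [dict(entry) for entry in entries]

    # Evaluate the identifier once per parent so that every child shares it.
    identifier = identifier_for(parent)
    count = str(len(entries))
    return [
        {
            **entry,
            "fragment.identifier": identifier,
            "fragment.index": str(index),  # entry order within the archive
            "fragment.count": count,
        }
        for index, entry in enumerate(entries)
    ]


parent = {"filename": "archive.zip"}
entries = [{"filename": f"file{n}.txt"} for n in (1, 2, 3)]

# Equivalent in spirit to configuring the identifier as ${filename}.
unpacked = assign_fragment_attributes(parent, entries, lambda p: p["filename"])
```

Evaluating the identifier before the loop is what keeps a random default from producing a different value for each unpacked entry.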
   
   ### MergeContent
   
   Updated `MergeContent` so that after a successful defragmentation, the 
merged FlowFile no longer retains temporary reassembly metadata.
   
   In `Defragment` mode, `MergeContent` now removes the following attributes 
from the merged FlowFile:
   
   - `fragment.identifier`
   - `fragment.index`
   - `fragment.count`
   - `segment.original.filename`
   
   This ensures the final merged output reflects the completed repackaged 
artifact rather than the intermediate fragmentation state used to drive 
regrouping.
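
As a rough sketch of the cleanup, assuming the merged FlowFile's attributes are modelled as a plain dictionary: `strip_reassembly_attributes` and the sample attribute values below are hypothetical and only illustrate which keys are dropped, not how `MergeContent` does it internally.

```python
from typing import Dict

# The four reassembly attributes listed in this description.
REASSEMBLY_ATTRIBUTES = (
    "fragment.identifier",
    "fragment.index",
    "fragment.count",
    "segment.original.filename",
)


def strip_reassembly_attributes(merged: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the merged attributes without reassembly metadata."""
    return {
        key: value
        for key, value in merged.items()
        if key not in REASSEMBLY_ATTRIBUTES
    }


# Hypothetical attributes of a merged FlowFile before cleanup.
merged = {
    "filename": "archive.zip",
    "mime.type": "application/zip",
    "fragment.identifier": "archive.zip",
    "fragment.index": "0",
    "fragment.count": "3",
    "segment.original.filename": "archive.zip",
}

cleaned = strip_reassembly_attributes(merged)
```

Attributes outside the list, such as `filename` in this sample, are left untouched.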
   
   ## Compatibility
   
   - Fully backward compatible
   - Fragment attribute generation in `UnpackContent` is opt-in
   - Existing flows are unchanged unless the new property is enabled
   - The `MergeContent` cleanup only applies after successful defragmentation
   
   ## Example
   
   Input:
   - `archive.zip` containing 3 files
   
   Unpack output when enabled with `${filename}` as the identifier:
   
   | filename  | fragment.identifier | fragment.index | fragment.count |
   |-----------|---------------------|----------------|----------------|
   | file1.txt | archive.zip         | 0              | 3              |
   | file2.txt | archive.zip         | 1              | 3              |
   | file3.txt | archive.zip         | 2              | 3              |
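
To show why these three attributes are enough for regrouping, the rows above can be fed to a simplified grouping routine. This is a sketch under stated assumptions, not `MergeContent`'s implementation: `regroup` is a hypothetical helper, FlowFiles are modelled as attribute dictionaries, and a group counts as complete only when its indexes run from 0 to `fragment.count - 1`, matching the zero-based indexes in the table.

```python
from collections import defaultdict
from typing import Dict, List


def regroup(flowfiles: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group by fragment.identifier and return complete groups ordered by index."""
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for attrs in flowfiles:
        groups[attrs["fragment.identifier"]].append(attrs)

    complete: Dict[str, List[str]] = {}
    for identifier, members in groups.items():
        expected = int(members[0]["fragment.count"])
        ordered = sorted(members, key=lambda a: int(a["fragment.index"]))
        indexes = [int(a["fragment.index"]) for a in ordered]
        # Keep the group only if every index 0..expected-1 is present exactly once.
        if indexes == list(range(expected)):
            complete[identifier] = [a["filename"] for a in ordered]
    return complete


# The rows from the table, deliberately out of order.
rows = [
    {"filename": "file3.txt", "fragment.identifier": "archive.zip",
     "fragment.index": "2", "fragment.count": "3"},
    {"filename": "file1.txt", "fragment.identifier": "archive.zip",
     "fragment.index": "0", "fragment.count": "3"},
    {"filename": "file2.txt", "fragment.identifier": "archive.zip",
     "fragment.index": "1", "fragment.count": "3"},
]

regrouped = regroup(rows)
# regrouped == {"archive.zip": ["file1.txt", "file2.txt", "file3.txt"]}
```

A group with a missing fragment is left out of the result in this sketch, for example `regroup(rows[:2])` returns an empty dictionary.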
   
   After processing and successful defragmentation in `MergeContent`, the 
merged FlowFile no longer retains:
   
   - `fragment.identifier`
   - `fragment.index`
   - `fragment.count`
   - `segment.original.filename`
   
   
   
   # Summary
   
   [NIFI-15758](https://issues.apache.org/jira/browse/NIFI-15758)
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [x] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue 
created
   
   ### Pull Request Tracking
   
   - [x] Pull Request title starts with Apache NiFi Jira issue number, such as 
`NIFI-00000`
   - [x] Pull Request commit message starts with Apache NiFi Jira issue number, 
such as `NIFI-00000`
   - [x] Pull request contains [commits 
signed](https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits)
 with a registered key indicating `Verified` status
   
   ### Pull Request Formatting
   
   - [x] Pull Request based on current revision of the `main` branch
   - [x] Pull Request refers to a feature branch with one commit containing 
changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request 
creation.
   
   ### Build
   
   - [x] Build completed using `./mvnw clean install -P contrib-check`
     - [x] JDK 21
     - [ ] JDK 25
   
   ### Licensing
   
   - [x] New dependencies are compatible with the [Apache License 
2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License 
Policy](https://www.apache.org/legal/resolved.html)
   - [x] New dependencies are documented in applicable `LICENSE` and `NOTICE` 
files
   
   ### Documentation
   
   - [x] Documentation formatting appears as expected in rendered files
   

