Mike,

This is extremely common. Both sides of this are. You have some low-latency or batch producer and you want to deliver to some low-latency or batch receiver.
This is what splitting is for (in the case of going large to small) or joining is for (in the case of 'batching'). MergeContent is designed for the batching/aggregation case. It allows you to merge using a couple of strategies, with binary concatenation being the most common. The very classic example is receiving a live stream of data which needs to be sent to HDFS. We'd set up MergeContent to aggregate data to a size that is close to or matches the desired HDFS block size.

Now the part that is interesting that you mention is what if 'object' 45 of 100 causes a problem with the downstream system. How would/could NiFi know about that object? Is it not feasible to evaluate the data for its fitness to merge prior to doing so?

Anyway - let us know what you're thinking in terms of how NiFi would know which object was problematic or that any were problematic for that matter.

Thanks
Joe

On Tue, Feb 24, 2015 at 9:28 AM, Mike Drob <[email protected]> wrote:
> NiFi experts,
>
> Let's say that I want to send data from NiFi to some destination that works
> much better when the documents are batched. I do not think this is an
> unreasonable ask.
>
> I imagine that I would want to first combine all of the records in one
> processor, and then pass on to a dedicated processor for sending the data?
> I'm not sure yet if I would be able to use existing processors for this, or
> if I could create my own, but this part feels fairly straightforward.
>
> Next, let's imagine that some document in the batch causes it to fail. I
> would like to un-batch, and create smaller batches, and try to send those,
> assuming that some piece of the data was malformed and not a transient
> error like network unavailable. Is this pattern workable? I can imagine
> several layers of fail/split/retry to winnow from 1000 documents to 100 to
> 10 to 1, so that I can still get most of my data sent and know exactly
> which documents fail.
>
> I'm largely thinking out loud here, somebody stop me if I'm off the deep
> end, or if this has been done before and we have examples (I didn't see any
> readily apparent).
>
> Mike
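
For reference, the fail/split/retry idea described in the quoted message could be sketched roughly as below. This is only an illustration in plain Python, not NiFi code and not an existing processor. The names `send_with_bisect` and `fake_send` are hypothetical, the sketch halves a failed batch rather than going 1000 to 100 to 10 to 1, and it assumes a failure always means bad data in the batch rather than a transient error such as the network being unavailable.

```python
from typing import Callable, List, Sequence, Tuple


def send_with_bisect(
    docs: Sequence[str],
    send_batch: Callable[[Sequence[str]], bool],
) -> Tuple[List[str], List[str]]:
    """Send docs as one batch; if the batch is rejected, split it and retry.

    Returns (sent, failed). `failed` holds the documents that were
    rejected even when sent on their own.
    """
    if not docs:
        return [], []
    if send_batch(docs):
        return list(docs), []
    if len(docs) == 1:
        # A single document that still fails is the problem document.
        return [], list(docs)
    mid = len(docs) // 2
    sent_left, failed_left = send_with_bisect(docs[:mid], send_batch)
    sent_right, failed_right = send_with_bisect(docs[mid:], send_batch)
    return sent_left + sent_right, failed_left + failed_right


def fake_send(batch: Sequence[str]) -> bool:
    # Hypothetical destination: rejects any batch containing the bad document.
    return all(doc != "bad-45" for doc in batch)


# 100 documents where 'object' 45 is malformed.
docs = [f"doc-{i}" for i in range(100)]
docs[45] = "bad-45"
sent, failed = send_with_bisect(docs, fake_send)
```

In this sketch the only signal from the destination is a pass/fail result for the whole batch, which is why the batch has to be narrowed by repeated sends. If the destination can report which record it rejected, none of the splitting is needed.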
