Github user joewitt commented on the pull request:
https://github.com/apache/incubator-nifi/pull/28#issuecomment-74810190
Review Findings:
- Adjusted the LICENSE/NOTICE information for Standard Nar, NiFi source
root, and NiFi assembly.
- Can we cache the compiled JsonPaths [they are threadsafe, etc..]?
- If so whenever onPropertyModified is triggers we know to invalidate the
cache. Can let it get refreshed during the next onTrigger.
- EvaluateJsonPath: Instead of pulling some set batch size use the
@SupportsBatching annotation and the framework can take care of this for you
and will do so in a manner that favors user defined tolerances for latency vs
throughput. In addition this removes the need for the for loop, the
flowfileloop goto, etc.. But it does make the previous suggestion more
important (caching compiled JsonPaths)
- Another contributor has provided an auto-documentation mechanism. Not
sure of its status for being pulled in just yet though. In the mean time please
consider adding documentation pages in src/main/resources/docs....
- EvaluateJsonPath: Line 207. Is exception handling the only possible
option for handling/detecting this? Exceptions create a shocking amount of
stress on GC and overall JVM performance at rate. Having exception handling be
key to the flow control will greatly limit the applicability of this processor
for truly high volume processing. If there is no alternative it is probably
worth documenting this fact as a caveat for users to know in the docs for the
processor. If you'd like to see just how horrible exceptions are for
throughput of the JVM do a simple test of Integer.parseInt to parse a String
(when some of the inputs could be non-Integers) vs a regex to test first if it
appears to be an Int then using Integer.parseInt.
- EvaluateJsonPath: Consider telling the provenance event reporter if you
modified the content of the flow file and providing some details about the
change.
- SplitJson: Exception based handling again to be avoided if possible.
- Both Processors: It appears to be that the library will load the full
input stream in memory then return results, splits, etc.. If so that provides
important information about its applicability on large JSON documents. If so
please document this concern with the processor so users can consider this and
its impact to the heap/memory space.
- SplitJson: Consider telling the provenance event reporter that the splits
are forks from the original.
Summary: Really good stuff here. This will be a very helpful processor.
Please consider making the suggested modifications. I will push to origin this
new branch NIFI-360. Please modify off that and we can work from that PR.
Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---