Github user joewitt commented on the pull request:

    https://github.com/apache/incubator-nifi/pull/28#issuecomment-74810190
  
    Review Findings:
    
    - Adjusted the LICENSE/NOTICE information for Standard Nar, NiFi source 
root, and NiFi assembly.
    
    - Can we cache the compiled JsonPaths [they are threadsafe, etc..]?
      - If so whenever onPropertyModified is triggers we know to invalidate the 
cache.  Can let it get refreshed during the next onTrigger.
      
    - EvaluateJsonPath: Instead of pulling some set batch size use the 
@SupportsBatching annotation and the framework can take care of this for you 
and will do so in a manner that favors user defined tolerances for latency vs 
throughput.  In addition this removes the need for the for loop, the 
flowfileloop goto, etc..  But it does make the previous suggestion more 
important (caching compiled JsonPaths)
    
    - Another contributor has provided an auto-documentation mechanism.  Not 
sure of its status for being pulled in just yet though. In the mean time please 
consider adding documentation pages in src/main/resources/docs....
    
    - EvaluateJsonPath: Line 207.  Is exception handling the only possible 
option for handling/detecting this?  Exceptions create a shocking amount of 
stress on GC and overall JVM performance at rate.  Having exception handling be 
key to the flow control will greatly limit the applicability of this processor 
for truly high volume processing.  If there is no alternative it is probably 
worth documenting this fact as a caveat for users to know in the docs for the 
processor.  If you'd like to see just how horrible exceptions are for 
throughput of the JVM do a simple test of Integer.parseInt to parse a String 
(when some of the inputs could be non-Integers) vs a regex to test first if it 
appears to be an Int then using Integer.parseInt.
    
    - EvaluateJsonPath: Consider telling the provenance event reporter if you 
modified the content of the flow file and providing some details about the 
change.
    
    - SplitJson: Exception based handling again to be avoided if possible.
    
    - Both Processors: It appears to be that the library will load the full 
input stream in memory then return results, splits, etc..  If so that provides 
important information about its applicability on large JSON documents.  If so 
please document this concern with the processor so users can consider this and 
its impact to the heap/memory space.
    
    - SplitJson: Consider telling the provenance event reporter that the splits 
are forks from the original.
    
    Summary: Really good stuff here.  This will be a very helpful processor.  
Please consider making the suggested modifications.  I will push to origin this 
new branch NIFI-360.  Please modify off that and we can work from that PR.  
Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to