Ryan,



I have a few notes/suggestions below. Overall, it looks great - thanks for the 
work you’ve put into this!




Any reason for the naming? I’d suggest “ConvertCSVToAvro”, “ConvertJSONToAvro” 
and “PutKite” or something like that. Specifically, we try to avoid having the 
word “Processor” in the names of the Processors and name them as verbs. This is 
done specifically because it helps the user understand what is going on when 
looking at the graph if the processors are named as verbs that describe what 
they do. (GetFile -> ConvertJSONToAvro -> PutKite is easier to understand than 
GetFile -> KiteJSONToAvroProcessor -> KiteStorageProcessor)



It looks like you have a couple of places that you allow a comma-separated list 
of filenames. After splitting, should probably call “trim” so that users can 
add spaces or new-lines between the items.



You use a “@VisibleForTesting” annotation on your PropertyDescriptors and make 
them package-level. Generally, we make all PropertyDescriptors public.




In the CSV to Avro processor, you have a member variable “CSVProperties 
props;”. That is initialized in the @OnScheduled, so it needs to be marked 
volatile so that threads in the onTrigger method have appropriate access to it.




I would avoid catching FlowFileAccessException or Throwable in the processors - 
generally, the framework is best at handling these.




I don’t know if this matters because of the readers/writers you’re using but 
just to be sure: The InputStream and OutputStream that the framework gives to 
you when you call session.read / session.write are not buffered. So if the 
readers or writers don’t buffer data themselves performance could hurt. It may 
help to wrap those streams in BufferedInputStream, BufferedOutputStream?



A couple of the files have Cloudera Copyright notices but most do not.



Thanks

-Mark








From: rdblue
Sent: ‎Sunday‎, ‎February‎ ‎15‎, ‎2015 ‎5‎:‎57‎ ‎AM
To: [email protected]





GitHub user rdblue opened a pull request:

    https://github.com/apache/incubator-nifi/pull/24

    NIFI-238: Add Kite processors.

    Includes:
    * KiteStorageProcessor - store Avro files in a Kite dataset
    * KiteCSVToAvroProcessor - convert CSV to Avro for storage
    * KiteJSONToAvroProcessor - convert JSON to Avro for storage

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rdblue/incubator-nifi NIFI-238-kite-processors

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-nifi/pull/24.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #24
    
----
commit b96a1b8c14deba10c36f6092888eb7396d0b652b
Author: Ryan Blue <[email protected]>
Date:   2015-02-13T02:53:49Z

    NIFI-238: Add Kite processors.
    
    Includes:
    * KiteStorageProcessor - store Avro files in a Kite dataset
    * KiteCSVToAvroProcessor - convert CSV to Avro for storage
    * KiteJSONToAvroProcessor - convert JSON to Avro for storage

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to