Ryan,
I have a few notes/suggestions below. Overall, it looks great - thanks for the work you’ve put into this! Any reason for the naming? I’d suggest “ConvertCSVToAvro”, “ConvertJSONToAvro” and “PutKite” or something like that. Specifically, we try to avoid having the word “Processor” in the names of the Processors and name them as verbs. This is done specifically because it helps the user understand what is going on when looking at the graph if the processors are named as verbs that describe what they do. (GetFile -> ConvertJSONToAvro -> PutKite is easier to understand than GetFile -> KiteJSONToAvroProcessor -> KiteStorageProcessor) It looks like you have a couple of places that you allow a comma-separated list of filenames. After splitting, should probably call “trim” so that users can add spaces or new-lines between the items. You use a “@VisibleForTesting” annotation on your PropertyDescriptors and make them package-level. Generally, we make all PropertyDescriptors public. In the CSV to Avro processor, you have a member variable “CSVProperties props;”. That is initialized in the @OnScheduled, so it needs to be marked volatile so that threads in the onTrigger method have appropriate access to it. I would avoid catching FlowFileAccessException or Throwable in the processors - generally, the framework is best at handling these. I don’t know if this matters because of the readers/writers you’re using but just to be sure: The InputStream and OutputStream that the framework gives to you when you call session.read / session.write are not buffered. So if the readers or writers don’t buffer data themselves performance could hurt. It may help to wrap those streams in BufferedInputStream, BufferedOutputStream? A couple of the files have Cloudera Copyright notices but most do not. Thanks -Mark From: rdblue Sent: Sunday, February 15, 2015 5:57 AM To: [email protected] GitHub user rdblue opened a pull request: https://github.com/apache/incubator-nifi/pull/24 NIFI-238: Add Kite processors. Includes: * KiteStorageProcessor - store Avro files in a Kite dataset * KiteCSVToAvroProcessor - convert CSV to Avro for storage * KiteJSONToAvroProcessor - convert JSON to Avro for storage You can merge this pull request into a Git repository by running: $ git pull https://github.com/rdblue/incubator-nifi NIFI-238-kite-processors Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-nifi/pull/24.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #24 ---- commit b96a1b8c14deba10c36f6092888eb7396d0b652b Author: Ryan Blue <[email protected]> Date: 2015-02-13T02:53:49Z NIFI-238: Add Kite processors. Includes: * KiteStorageProcessor - store Avro files in a Kite dataset * KiteCSVToAvroProcessor - convert CSV to Avro for storage * KiteJSONToAvroProcessor - convert JSON to Avro for storage ---- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [email protected] or file a JIRA ticket with INFRA. ---
