This would work for some templates but not all. Events as a collection need to 
support the PEvents and LEvents APIs, and flat files would make those types of 
queries rather difficult. I believe the current philosophy for PIO is that to 
include something in the core it would need to support all templates—put 
another way, it would have to provide the minimum API.
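To illustrate why file-backed event storage makes those queries awkward, here is a minimal sketch (in Python, with illustrative field names resembling PIO events—`find_events` is a hypothetical helper, not a PredictionIO API): without an index, every filtered lookup over a JSON-lines event log degenerates into a full scan, whereas the HBase/SQL backends can answer such predicates efficiently.

```python
import io
import json

# A tiny hypothetical JSON-lines event log, one event per line.
events_jsonl = "\n".join(json.dumps(e) for e in [
    {"event": "buy",  "entityId": "u1", "eventTime": "2016-08-01T00:00:00Z"},
    {"event": "view", "entityId": "u2", "eventTime": "2016-08-02T00:00:00Z"},
    {"event": "buy",  "entityId": "u1", "eventTime": "2016-08-03T00:00:00Z"},
])

def find_events(fp, entity_id=None, event=None):
    """A PEvents-style filtered query over a flat file: every predicate
    forces a scan of the entire file, because there is no index."""
    for line in fp:
        e = json.loads(line)
        if entity_id is not None and e["entityId"] != entity_id:
            continue
        if event is not None and e["event"] != event:
            continue
        yield e

hits = list(find_events(io.StringIO(events_jsonl), entity_id="u1", event="buy"))
print(len(hits))  # 2
```

This is only a sketch of the access pattern; the real LEvents/PEvents interfaces carry more parameters (time ranges, entity types, etc.), each of which would need the same full-scan treatment on files.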

However, importing those formats is fair game. Parquet is supported as output, 
and JSON for both input and output. Adding formats should be easy.

However, we have discussed a future version of PIO that would use composition 
of microservices to create the combined API for a particular algorithm; one 
that did not need all PEvents queries might bring in a reader that reads files. 
This is still far off and not agreed to by all, but I'd like to hear of a use 
for it.


On Aug 2, 2016, at 7:32 PM, Hyukjin Kwon <[email protected]> wrote:

Hi all,


I started to take an interest in PredictionIO a few months ago and have been
digging into it on my own.

It seems PredictionIO requires MySQL, PostgreSQL, or HBase with
Elasticsearch as storage to save data and metadata, and it saves models into
HDFS, the local file system, MySQL, or PostgreSQL (is my understanding correct?)

I am just wondering if we can add support for HDFS and the local FS as
storage with some file formats such as Parquet, ORC, CSV, or JSON (maybe
reading via Spark, unlike the other storages).

I should look into this further, though. The advantage of this would be to
reduce the minimum dependencies.

I thought about adding a [Proposal] tag to the title, but I didn't because I
feel like this may have been suggested before and I might be missing something.

Could you please give some feedback?


Thanks!
