Chan, Donald, and I met about this discussion and wanted to share the idea with everyone in hopes it seems reasonable. Please comment.
Our goals:
1) a stateless CLI (with one exception)
2) commands that run from anywhere, in any order
3) replace `pio build` with `sbt build` plus a new command, something like `pio register`, that stores metadata. All the other PIO commands read this metadata.

This can and will be done incrementally:
1) Chan’s PIO-51 is the first step. It will not allow commands to run from anywhere; it will be renamed and will make some steps towards the goal.
2) Remove `pio build` and replace it with `sbt build` and `pio register`.

#1, along with many already-merged PRs and changes including optional Elasticsearch 5.x support, will go into Apache PredictionIO-0.11.0 in the next few weeks. #2 may have to wait for PredictionIO-0.11.0+.

The basis for the stateless workflow will be the new concept of an engine-instance-id, created at `pio register` time or optionally set by the user. All other commands will use it to reference the metadata from any location, including other machines connected to PIO resources, which makes them (except for `pio register`) stateless.

Thread was Re: [jira] [Commented] (PIO-51) Enable `pio build/train/deploy` outside of engine directory

On Jan 22, 2017, at 11:54 AM, Pat Ferrel (JIRA) <[email protected]> wrote:

[ https://issues.apache.org/jira/browse/PIO-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15833658#comment-15833658 ]

Pat Ferrel commented on PIO-51:
-------------------------------

I understand that in PIO there is a plethora of ids that confuse and hide the real issue. Try to ignore those and think about what we want, not so much what we have. We need to simplify this by removing as many of these confusing ids as possible, as quickly as possible, so we don't draw out or add to the confusion. You and I have a hard time explaining what purpose those ids have (imagine the user's confusion); I choose to ignore them for this very reason.
Regardless of future work, the path-to-engine-directory should not be used to identify an engine instance, however indirectly. This is because the exact same code jar may be used for multiple engine instances, and there may be multiple engine.json files for each of those engines in the same directory. If you are implying a 1-1 correspondence between directories and engine instances, we have a problem. These other intermediate ids are completely useless, in principle, and must be removed in order to make the CLI stateless.

Remember the next step is to replace `pio build` with `sbt build`, which will require a workflow change to create `pio register`. If this means waiting on "run from anywhere" until `sbt build` and `pio register` are implemented, so be it. Doing "run from everywhere" without asking how it affects the stateless workflow and CLI is asking for trouble.

> Enable `pio build/train/deploy` outside of engine directory
> -----------------------------------------------------------
>
> Key: PIO-51
> URL: https://issues.apache.org/jira/browse/PIO-51
> Project: PredictionIO
> Issue Type: Improvement
> Reporter: Chan
>
> Users can now provide the engine directory path as --engine-dir or -ed, and
> call `pio build/train/deploy` from anywhere.
> The “engineVersion” used to identify a prediction engine is created using the
> hash of the engine directory path. As a result, the filepath of the engine
> had to be kept the same in a distributed setup, with multiple machines using
> the same trained model. This was a point of confusion for some users, which
> led to this change.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
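
To make the difference between the two identification schemes concrete, here is a minimal sketch in Python. It is not PIO code: `path_based_version`, `MetadataStore`, `register`, and `lookup` are hypothetical names invented for illustration, SHA-1 is only a stand-in for whatever hash PIO actually uses, and the in-memory dict is an assumed placeholder for the shared metadata storage. It only shows why an id derived from the directory path differs between machines, while an id assigned at register time can point at the same jar with different engine.json contents from anywhere.

```python
import hashlib
import uuid
from typing import Optional


def path_based_version(engine_dir: str) -> str:
    """Hypothetical stand-in for an id derived from the engine directory path."""
    return hashlib.sha1(engine_dir.encode("utf-8")).hexdigest()


class MetadataStore:
    """Hypothetical shared metadata store that every machine can reach."""

    def __init__(self) -> None:
        self._instances = {}

    def register(self, engine_json: dict, jar: str,
                 instance_id: Optional[str] = None) -> str:
        # The id is either supplied by the user or generated here.
        # It never depends on where the engine lives on disk.
        instance_id = instance_id or uuid.uuid4().hex
        self._instances[instance_id] = {"engine_json": engine_json, "jar": jar}
        return instance_id

    def lookup(self, instance_id: str) -> dict:
        # Any later command needs only the id to find its metadata.
        return self._instances[instance_id]


# Same engine checked out at different paths on two machines:
# the path-derived ids do not match.
v_alice = path_based_version("/home/alice/my-engine")
v_bob = path_based_version("/opt/engines/my-engine")

# One jar, two engine.json variants, two independent engine instances.
store = MetadataStore()
small = store.register({"rank": 10}, "my-engine.jar", instance_id="recs-small")
large = store.register({"rank": 50}, "my-engine.jar")
```

In this sketch only `register` writes state; everything else would take an id and read from the store, which is the sense in which the remaining commands could be stateless.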
