[
https://issues.apache.org/jira/browse/PIO-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088312#comment-16088312
]
ASF GitHub Bot commented on PIO-105:
------------------------------------
GitHub user mars opened a pull request:
https://github.com/apache/incubator-predictionio/pull/412
Batch Predictions
JIRA issue [PIO-105](https://issues.apache.org/jira/browse/PIO-105)
Provides a new `pio batchpredict` command.
Reads from a multi-object JSON input file, with one query object per line. Example:
```json
{"user":"1"}
{"user":"2"}
{"user":"3"}
{"user":"4"}
{"user":"5"}
```
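An input file in this shape can be generated with a few lines of script. The following is a minimal sketch, not part of the pull request: the function names `format_queries` and `write_queries` are hypothetical, and the `user` field simply mirrors the example above (the real query fields depend on the engine being queried).

```python
import json


def format_queries(user_ids):
    """Return one compact JSON query object per line.

    There is no enclosing array and no separating commas, matching the
    multi-object layout shown above. The "user" field is an assumption
    taken from the example; other engines expect other query fields.
    """
    return "".join(
        json.dumps({"user": str(user_id)}, separators=(",", ":")) + "\n"
        for user_id in user_ids
    )


def write_queries(path, user_ids):
    """Write the formatted queries to a file at `path`."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(format_queries(user_ids))
```

Calling `write_queries("batchpredict-input.json", range(1, 6))` would produce a file with the same five lines as the example; the file name here is only a placeholder.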
Writes to a multi-object JSON output file (actually written as Hadoop partition files), with one result per line. Example:
```json
{"query":{"user":"1"},"prediction":{"itemScores":[{"item":"1","score":33},{"item":"2","score":32}]}}
{"query":{"user":"2"},"prediction":{"itemScores":[{"item":"5","score":55},{"item":"3","score":28}]}}
{"query":{"user":"3"},"prediction":{"itemScores":[{"item":"2","score":16},{"item":"3","score":12}]}}
{"query":{"user":"4"},"prediction":{"itemScores":[{"item":"3","score":19},{"item":"1","score":18}]}}
{"query":{"user":"5"},"prediction":{"itemScores":[{"item":"1","score":24},{"item":"4","score":14}]}}
```
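Because each output line is a self-contained JSON object, the results can be consumed line by line. The sketch below is illustrative only: `parse_predictions` and `read_output_dir` are hypothetical helpers, the `itemScores` structure is taken from the example above, and the `part-*` file naming is an assumption based on the usual Hadoop convention rather than something stated in this pull request.

```python
import glob
import json
import os


def parse_predictions(lines):
    """Map each queried user to a list of (item, score) pairs.

    Assumes every non-blank line has the shape shown above:
    {"query": {"user": ...}, "prediction": {"itemScores": [...]}}
    """
    results = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        user = record["query"]["user"]
        scores = record["prediction"]["itemScores"]
        results[user] = [(s["item"], s["score"]) for s in scores]
    return results


def read_output_dir(output_dir):
    """Merge the results from every partition file in `output_dir`.

    Assumption: partition files are named part-* (for example part-00000).
    """
    results = {}
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(path, encoding="utf-8") as f:
            results.update(parse_predictions(f))
    return results
```

Keying the result by user is a convenience for this example; if the same query can appear more than once in the input, a list of records would preserve every result.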
See the included [console usage help](#diff-2cf174557564e09d52157be8e839fecf).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mars/incubator-predictionio batch-predict
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-predictionio/pull/412.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #412
----
commit 99ee6493bddc8f02aee384f3a2db27c6ae3f68cc
Author: Mars Hall <[email protected]>
Date: 2017-07-13T00:12:25Z
Implement BatchPredict
commit c205357498e4a4a745810b04130c5bbad78f8686
Author: Mars Hall <[email protected]>
Date: 2017-07-14T22:29:26Z
Improve console help for batch predict.
commit 93f7ed3e5ed10155a688a032e367793d75fa116a
Author: Mars Hall <[email protected]>
Date: 2017-07-14T22:46:30Z
Undo experimental change to publish tools artifact
----
> Batch Predictions
> -----------------
>
> Key: PIO-105
> URL: https://issues.apache.org/jira/browse/PIO-105
> Project: PredictionIO
> Issue Type: New Feature
> Components: Core
> Reporter: Mars Hall
> Assignee: Mars Hall
>
> Implement a new {{pio batchpredict}} command to enable massive, fast, batch
> predictions from a trained model. Read a multi-object JSON file as the input
> format, with one query object per line. Similarly, write results to a
> multi-object JSON file, with one prediction result + its original query per
> line.
> Currently, getting bulk predictions from PredictionIO is possible with either:
> * a {{pio eval}} script, which will always train a fresh, unvalidated model
> before getting predictions
> * a custom script that hits the {{queries.json}} HTTP API, which is a serious
> bottleneck when requesting hundreds of thousands or millions of predictions
> Neither of these existing bulk-prediction hacks is adequate, for the reasons
> mentioned.
> It's time for this use case to be a first-class command :D