[
https://issues.apache.org/jira/browse/PIO-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090850#comment-16090850
]
ASF GitHub Bot commented on PIO-105:
------------------------------------
Github user takezoe commented on a diff in the pull request:
https://github.com/apache/incubator-predictionio/pull/412#discussion_r127857163
--- Diff: tools/src/main/scala/org/apache/predictionio/tools/commands/Engine.scala ---
@@ -262,6 +263,56 @@ object Engine extends EitherLogging {
}
}
+ /** Batch predict with an engine.
+ *
+ * @param ea An instance of [[EngineArgs]]
+ * @param engineInstanceId An instance of [[engineInstanceId]]
+ * @param batchPredictArgs An instance of [[BatchPredictArgs]]
+ * @param sparkArgs An instance of [[SparkArgs]]
+ * @param pioHome [[String]] with a path to PIO installation
+ * @param verbose A [[Boolean]]
+ * @return An instance of [[Expected]] contaning either [[Left]]
+ * with an error message or [[Right]] with a handle to process
+ * of a running angine and a function () => Unit,
+ * that must be called when the process is complete
+ */
+ def batchPredict(
+ ea: EngineArgs,
+ engineInstanceId: Option[String],
+ batchPredictArgs: BatchPredictArgs,
+ sparkArgs: SparkArgs,
+ pioHome: String,
+ verbose: Boolean = false): Expected[(Process, () => Unit)] = {
+
+ val engineDirPath = getEngineDirPath(ea.engineDir)
+ val verifyResult = Template.verifyTemplateMinVersion(
+ new File(engineDirPath, "template.json"))
+ if (verifyResult.isLeft) {
+ return Left(verifyResult.left.get)
--- End diff --
Generally, avoiding `return` is preferred in Scala where possible. I think the
following is better:
```scala
verifyResult match {
  // Rebuild the Left so its type matches the method's result type
  case Left(err) => Left(err)
  case Right(_) => {
    ...
  }
}
```
Or this, which is a bit shorter:
```scala
verifyResult.right.flatMap { _ =>
  ...
}
```
However, there are some other `return`s in this file, so it might be better
to do the full refactoring in a separate pull request.
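For reference, here is a minimal, self-contained sketch of both styles. The names `verify` and `launch` are hypothetical stand-ins (they are not the real `Template.verifyTemplateMinVersion` or the rest of `batchPredict`), and the error and success types are simplified to `String`:

```scala
object EitherWithoutReturn {
  // Hypothetical stand-in for the template version check.
  def verify(version: Int): Either[String, Unit] =
    if (version >= 2) Right(()) else Left(s"version $version is too old")

  // Hypothetical stand-in for the remaining work of the method.
  def launch(name: String): Either[String, String] =
    Right(s"started $name")

  // Style 1: pattern match. The Left is rebuilt so that its type
  // matches the result type of the method.
  def runWithMatch(version: Int, name: String): Either[String, String] =
    verify(version) match {
      case Left(err) => Left(err)
      case Right(_) => launch(name)
    }

  // Style 2: right projection plus flatMap. A Left short-circuits and
  // launch is never called. On Scala 2.12+ Either is right-biased, so
  // a plain flatMap without .right also works there.
  def runWithFlatMap(version: Int, name: String): Either[String, String] =
    verify(version).right.flatMap { _ => launch(name) }
}
```

Both variants return the error unchanged when the check fails and only run the remaining work when it succeeds, so neither needs an early `return`.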
> Batch Predictions
> -----------------
>
> Key: PIO-105
> URL: https://issues.apache.org/jira/browse/PIO-105
> Project: PredictionIO
> Issue Type: New Feature
> Components: Core
> Reporter: Mars Hall
> Assignee: Mars Hall
>
> Implement a new {{pio batchpredict}} command to enable massive, fast, batch
> predictions from a trained model. Read a multi-object JSON file as the input
> format, with one query object per line. Similarly, write results to a
> multi-object JSON file, with one prediction result + its original query per
> line.
> Currently getting bulk predictions from PredictionIO is possible with either:
> * a {{pio eval}} script, which will always train a fresh, unvalidated model
> before getting predictions
> * a custom script that hits the {{queries.json}} HTTP API, which is a serious
> bottleneck when requesting hundreds-of-thousands or millions of predictions
> Neither of these existing bulk-prediction hacks is adequate for the reasons
> mentioned.
> It's time for this use-case to be a first-class command :D
> Pull request https://github.com/apache/incubator-predictionio/pull/412