Hi All,

I just started experimenting with "sparklyr" and am already loving it.

I'm trying to build an extension by writing an R wrapper for Spark's
Gaussian mixture model (GaussianMixture). My attempt is below, followed
by the error message. I'm not sure whether this is possible and, if so,
what is wrong with my code.

Any hints would be much appreciated.

Best,
Axel.

-----

library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")

x <- copy_to(sc, iris)
x <- x %>% select(Petal_Width, Petal_Length)

# set params
k <- 3
iter.max <- 100
features <- dplyr::tbl_vars(x)
compute.cost <- TRUE
tolerance <- 1e-4
ml.options <- ml_options()

df <- spark_dataframe(x)
sc <- spark_connection(df)
df <- ml_prepare_features(
  x = df,
  features = features,
  envir = environment()
  # ml.options = ml.options
)
envir <- new.env(parent = emptyenv())
envir$id <- ml.options$id.column
df <- df %>%
  sdf_with_unique_id(envir$id) %>%
  spark_dataframe()
tdf <- ml_prepare_dataframe(df, features, ml.options = ml.options,
                            envir = envir)
envir$model <- "org.apache.spark.ml.clustering.GaussianMixture"
gmm <- invoke_new(sc, envir$model)
> Error: failed to invoke spark command
> 16/10/09 16:35:35 ERROR <init> on org.apache.spark.ml.clustering.GaussianMixture failed

