I have a data.frame with 300 observations of 36 variables, a mix of numeric
and categorical, many of them with NA values. I am trying to evaluate the
partitioning around medoids (PAM) clustering algorithm for a marketing
segmentation study. My original dataset has over 130,000 observations, but
I took a sample to make the problem easier to reproduce.


My machine: Mac OS X 10.9.3

    > sessionInfo()
    R version 3.1.0 (2014-04-10)
    Platform: x86_64-apple-darwin13.1.0 (64-bit)


Problem: I get an error when running internal and stability validation with
the clValid package from CRAN.


Code:

    #Convert csv to data.frame
    frame <- as.data.frame(Smallstore1)

    #Load the cluster package (daisy, pam)
    library(cluster)

    #Create dissimilarity matrix with the Gower coefficient for mixed variables;
    #ordratio = 1:36 treats all 36 columns as ordinal
    daisy1 <- daisy(frame, metric = "gower", type = list(ordratio = 1:36))

    #k-medoids (PAM) with 3 clusters
    kanswers <- pam(daisy1, 3, diss = TRUE)

    #Evaluate PAM with 2 to 6 clusters using the clValid package
    library(clValid)

    #Internal validation
    internval1 <- clValid(daisy1, 2:6, clMethods = "pam", validation = "internal")
    #Error in switch(class(obj), matrix = mat <- obj, ExpressionSet = mat <- Biobase::exprs(obj),  :
    #  EXPR must be a length 1 vector

    summary(internval1)
    #Error in summary(internval1) :
    #  error in evaluating the argument 'object' in selecting a method for
    #  function 'summary': Error: object 'internval1' not found

    #Stability validation
    stabval1 <- clValid(daisy1, 2:6, clMethods = "pam", validation = "stability")
    #Error in switch(class(obj), matrix = mat <- obj, ExpressionSet = mat <- Biobase::exprs(obj),  :
    #  EXPR must be a length 1 vector
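
The error text shows clValid dispatching on `switch(class(obj), ...)`. A minimal sketch, assuming that is where the call fails: a `daisy()` result carries a class vector of length 2, and `switch()` rejects any EXPR that is not length 1 with exactly this message. The small data frame `toy` is hypothetical and only stands in for the real data.

```r
library(cluster)

# Hypothetical toy data: two factors and one numeric column with an NA.
# Gender is complete, so every pair of rows shares at least one variable.
toy <- data.frame(
  Age    = factor(c("25-34", NA, "35-44", "25-34")),
  Gender = factor(c("Male", "Male", "Female", "Female")),
  Income = c(100, NA, 75, 50)
)

d <- daisy(toy, metric = "gower")

# A daisy() result has two classes, not one
cls <- class(d)
print(cls)

# switch() needs a length-1 EXPR, so dispatching on class(d) fails
msg <- tryCatch(switch(cls, matrix = "matrix", "other"),
                error = function(e) conditionMessage(e))
print(msg)
```

If this is the cause, clValid would need its input in one of the forms its `switch()` accepts (the error shows `matrix` and `ExpressionSet` branches) rather than a precomputed dissimilarity object.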


Data:


I converted the data.frame to a dissimilarity matrix with the daisy function
and ran partitioning around medoids (pam) with 3 clusters. Both daisy and
pam come from the cluster package on CRAN. Because the data.frame has mixed
variable types, I used the Gower distance coefficient. Here is the head of
the first 7 variables; I removed the user names from the email addresses
for privacy reasons.


    > head(frame)
      user_id email          Age   Gender Household.Income Marital.Status Presence.of.children
    1 12945   @bellycard.com <NA>  Male   <NA>             <NA>           <NA>
    2 12947   @bellycard.com <NA>  Male   <NA>             <NA>           <NA>
    3 12990   @gmail.com     <NA>  <NA>   <NA>             <NA>           <NA>
    4 13160   @gmail.com     25-34 Male   100k-125k        Single         No
    5 13195   @gmail.com     <NA>  Male   75k-100k         Single         No
    6 13286   @gmail.com     <NA>  <NA>   <NA>             <NA>           <NA>
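
One possible way to get an internal-validation measure without handing `daisy1` to clValid, sketched under assumptions: `pam()` already returns silhouette information when given a dissimilarity, so the average silhouette width can be compared for k = 2 to 6 directly on the Gower matrix. The data below is a hypothetical toy frame modeled loosely on the columns shown above; coding `Age` and `Household.Income` as ordered factors and dropping `user_id` and `email` are assumptions, not part of the original analysis. This only covers silhouette width, not clValid's connectivity, Dunn, or stability measures.

```r
library(cluster)

# Hypothetical stand-in for a few survey columns like those in head(frame);
# the real Smallstore1 data is not available here.
set.seed(42)
n <- 60
toy <- data.frame(
  user_id = seq_len(n),
  email   = sample(c("@gmail.com", "@bellycard.com"), n, replace = TRUE),
  Age     = sample(c("18-24", "25-34", "35-44", NA), n, replace = TRUE),
  Gender  = sample(c("Male", "Female"), n, replace = TRUE),
  Household.Income = sample(c("50k-75k", "75k-100k", "100k-125k", NA),
                            n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Assumption: age and income bands are ordinal, gender is nominal, and the
# identifier columns (user_id, email) should not drive the segmentation.
prep <- data.frame(
  Age = factor(toy$Age, levels = c("18-24", "25-34", "35-44"), ordered = TRUE),
  Gender = factor(toy$Gender),
  Household.Income = factor(toy$Household.Income,
                            levels = c("50k-75k", "75k-100k", "100k-125k"),
                            ordered = TRUE)
)

# Gower dissimilarity; daisy() leaves a missing value out of the
# dissimilarities involving that row
d <- daisy(prep, metric = "gower")
stopifnot(!anyNA(d))  # pam() cannot use NA dissimilarities

# Average silhouette width for k = 2..6, computed by pam() itself
ks <- 2:6
avg_sil <- sapply(ks, function(k) pam(d, k, diss = TRUE)$silinfo$avg.width)
names(avg_sil) <- ks
print(round(avg_sil, 3))

best_k <- ks[which.max(avg_sil)]
cat("k with the largest average silhouette width:", best_k, "\n")
```

A row whose clustering variables are all NA (like rows 3 and 6 above, at least in the columns shown) can share no variables with another row, which would leave NA entries in the dissimilarity; the `anyNA()` check catches that before `pam()` is called.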


Please let me know if I can provide more information.
-- 
Scott Davis
Cell: (408)826-9561
Skype ID: Scdavis61
San Jose, CA.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
