Hi,
R has a vast array of tools for cluster analysis. There's even a task
view: https://cran.r-project.org/web/views/Cluster.html
Which method is best for your needs will require you to spend some time
understanding the pros and cons, and possibly consulting with a local
statistician.
Hi
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Venky
> Sent: Wednesday, June 17, 2015 8:43 AM
> To: R Help R
> Subject: [R] cluster analysis
>
> Hi friends,
>
> I have data like this
>
In R or elsewhere?
>
>
> Group
> Employee size WOE Employe
Dear Sun Shine,
dtes <- dist(tes.df, method = 'euclidean')
dtesFreq <- hclust(dtes, method = 'ward.D')
plot(dtesFreq, labels = names(tes.df))
However, I get an error message when trying to plot this: "Error in
graphics:::plotHclust(n1, merge, height, order(x$order), hang, : invalid
dendrogram input"
On Wed, Mar 20, 2013 at 3:55 AM, Emma Gibson wrote:
> I am trying to perform cluster analysis on survey data where each
> respondent has answered several questions, some of which have categorical
> answers ("blue" "pink" "green" etc) and some of which have scale answers
> (rating from 1 to 10 etc)
These are the errors I've been having. I have been trying 3 different things
1- Mclust:
This is the example I have been following:
# Model Based Clustering
library(mclust)
fit <- Mclust(mydata)
plot(fit, mydata) # plot results
print(fit) # display the best model
What I have done:
> fit <- Mclu
It's hard to answer these questions without knowing what the errors are and
how they can be reproduced.
Best, Ingmar
On Thu, Nov 22, 2012 at 1:03 AM, KitKat wrote:
> Thanks, I have been trying that site and another one
> (http://www.statmethods.net/advstats/cluster.html)
>
> I don't know if I sh
Thanks, I have been trying that site and another one
(http://www.statmethods.net/advstats/cluster.html)
I don't know if I should be doing mclust or mcclust, but either way, the
code is not working. I am following the guidelines online at:
mcclust - http://cran.r-project.org/web/packages/mcclust/
http://cran.r-project.org/web/views/Cluster.html
might be a good start
Brian
On Nov 21, 2012, at 1:36 PM, KitKat wrote:
> Thank you for replying!
> I made a new post asking if there are any websites or files on how to
> download package mclust (or other Bayesian cluster analysis packages) an
Thank you for replying!
I made a new post asking if there are any websites or files on how to
download package mclust (or other Bayesian cluster analysis packages) and
the appropriate R functions? Sorry I don't know how this forum works yet
Dear Katherine,
function flexmixedruns in package fpc may do what you want; it fits mixtures
with continuous and categorical variables, can use the BIC for giving you the
number of mixture components and also gives you posterior probabilities for
cases to belong to components.
Note that genera
Have a look at the package mclust.
Jose
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of
Ingmar Visser [i.vis...@uva.nl]
Sent: 15 November 2012 21:10
To: KitKat
Cc: r-help@r-project.org
Subject: Re: [R] cluster analysis in R
Dear KitKat,
After installing R and reading some introductory material on getting
started with R you may want to check the CRAN task view on cluster analysis:
http://cran.r-project.org/web/views/Cluster.html
which has many useful references to all kinds and flavors of clustering
techniques, hierarchical and otherwise.
Hi, Taisa,
It depends on many factors, e.g. the nature of your data, the volume of
the data set, etc.
The R analog of SAS FASTCLUS is kmeans() (for a practical example, check slide #35
here:
http://www.slideshare.net/whitish/textmining-with-r)
Check also kmedoids (pam) and hclust.
Good luck,
-Alex
At the R command prompt
?kmeans (for info on the R equivalent to FASTCLUS)
?hclust (for info on the R equivalent to CLUSTER)
Install package clusterSim
and look at function index.G1 for the Calinski-Harabasz pseudo F-statistic
--
David L Carlson
Assoc
On Wed, Apr 4, 2012 at 10:12 AM, Petr Savicky wrote:
> On Wed, Apr 04, 2012 at 01:32:10PM +0200, paladini wrote:
> Var1 <- c("(1,2)", "(7,8)", "(4,7)")
> Var2 <- c("(1,5)", "(3,88)", "(12,4)")
> Var3 <- c("(4,2)", "(6,5)", "(4,4)")
> DF <- data.frame(Var1, Var2, Var3, stringsAsFactors=FALSE)
On Wed, Apr 04, 2012 at 01:32:10PM +0200, paladini wrote:
> Hello,
> I want to do a cluster analysis with my data. The problem is that the
> variables don't consist of single values; the entries are pairs of
> values.
> That looks like this:
>
>
> Variable 1:    Variable 2:    Variable 3:
You can create distance matrices for each Variable, square them, sum them,
and take the square root. As for getting the data into a data frame, the
simplest would be to enter the three variables into six columns like the
following:
data
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    1    5
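That recipe can be sketched in base R. The matrix below is reconstructed from the pairs quoted earlier, and the two-columns-per-variable layout is an assumption for illustration:

```r
# Reconstructing the pairs as a 3 x 6 matrix, two columns per original
# variable (assumed layout, taken from the values quoted above).
data <- matrix(c(1, 2,  1,  5, 4, 2,
                 7, 8,  3, 88, 6, 5,
                 4, 7, 12,  4, 4, 4),
               nrow = 3, byrow = TRUE)

# One Euclidean distance matrix per pair of columns, then combine:
# square them, sum them, and take the square root, as described above.
d1 <- dist(data[, 1:2])
d2 <- dist(data[, 3:4])
d3 <- dist(data[, 5:6])
dAll <- sqrt(d1^2 + d2^2 + d3^2)

hc <- hclust(dAll, method = "average")
```

For plain Euclidean distance this combination is identical to dist() on all six columns at once, which makes a handy sanity check.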
On Thu, Mar 31, 2011 at 11:48 AM, Hans Ekbrand wrote:
>
> The variables are unordered factors, stored as integers 1:9, where
>
> 1 means "Full-time employment"
> 2 means "Part-time employment"
> 3 means "Student"
> 4 means "Full-time self-employee"
> ...
>
> Do Euclidean distances make sense on
On Thu, Mar 31, 2011 at 08:48:02PM +0200, Hans Ekbrand wrote:
> On Thu, Mar 31, 2011 at 07:06:31PM +0100, Christian Hennig wrote:
> > Dear Hans,
> >
> > clara doesn't require a distance matrix as input (and therefore
> > doesn't require you to run daisy), it will work with the raw data
> > matrix
On Thu, Mar 31, 2011 at 07:06:31PM +0100, Christian Hennig wrote:
> Dear Hans,
>
> clara doesn't require a distance matrix as input (and therefore
> doesn't require you to run daisy), it will work with the raw data
> matrix using
> Euclidean distances implicitly.
> I can't tell you whether Euclide
Dear Hans,
clara doesn't require a distance matrix as input (and therefore doesn't
require you to run daisy), it will work with the raw data matrix using
Euclidean distances implicitly.
I can't tell you whether Euclidean distances are appropriate in this
situation (this depends on the interpre
Peter Langfelder wrote:
>
> On Fri, Nov 26, 2010 at 6:55 AM, Derik Burgert wrote:
>> Dear list,
>>
>> running a hierachical cluster analysis I want to define a number of
>> objects that build a cluster already. In other words: I want to force
>> some of the cases to be in the same cluster from
On Fri, Nov 26, 2010 at 6:55 AM, Derik Burgert wrote:
> Dear list,
>
> running a hierachical cluster analysis I want to define a number of objects
> that build a cluster already. In other words: I want to force some of the
> cases to be in the same cluster from the start of the algorithm.
>
> An
Hi Ulrich,
I'm studying the principles of Affinity Propagation and I'm really glad to
use your package (apcluster) in order to cluster my data. I have just an
issue to solve..
If I apply the function: apcluster(sim)
where sim is the matrix of dissimilarities, sometimes I encounter the
warning
Hi Jim,
Wow! Very nice job at http://mephisto.unige.ch/traminer/preview.shtml I'm
going to read more about it.
I have a lot of different steps, in a sequence. Actually, 586 different
possible steps, but I have 4269 legal cases, with a maximum of 379 steps
each one.
If you want, I can send this da
Hi Allan,
It helps a lot. I'll try to read more about it.
But, as you asked me, here is a brief explanation of the necessary
columns of the sample data pasted at the end:
id_processo: identify a legal case, it is its primary key.
ordem_andamento: is the step number inside a legal case (id_pr
Pablo, we've had success using
http://mephisto.unige.ch/traminer/preview.shtml to look at marketing paths.
Question would be how many distinct case step descriptions are there?
HTH, Jim
On Jul 26, 2010 9:44 AM, "Pablo Cerdeira" wrote:
Hi all,
I have no idea if this question is too easy to be an
>
> What do you suggest in order to assign a new observation to a determined
> cluster?
>
As I mentioned already, I would simply assign the new observation to the
cluster to whose exemplar the new observation is most similar to (in a
knn1-like fashion). To compute these similarities, you can use t
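A minimal base-R sketch of that knn1-style assignment; the function name is made up, and the similarity measure (negative squared Euclidean distance, affinity propagation's usual default) is an assumption:

```r
# Assign a new observation to the cluster whose exemplar is most similar,
# in a knn1-like fashion. `exemplars` holds one exemplar per row.
assign_to_exemplar <- function(x_new, exemplars) {
  # negative squared Euclidean distance: one similarity per exemplar
  sims <- -colSums((t(exemplars) - x_new)^2)
  which.max(sims)  # index of the cluster whose exemplar wins
}

exemplars <- rbind(c(0, 0), c(5, 5))
assign_to_exemplar(c(4.2, 4.8), exemplars)  # -> 2 (closest to the second exemplar)
```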
Christian wrote:
and then implement nearest-neighbours classification myself if I needed it.
It should be pretty straightforward to implement.
Do you intend modify the code of the knn1() function by yourself?
No; if you understand what the nearest neighbours method does, it's not
very complicated.
Ulrich wrote:
>Affinity propagation produces quite a number of clusters.
I tried with q=0 and it produced 17 clusters. Anyway, that's a good idea,
thanks. I'm looking forward to testing it with my dataset.
So I'll probably use daisy() to compute an appropriate dissimilarity then
apcluster() or another meth
Sorry, Joris, I overlooked that you already mentioned daisy() in your
posting. I should have credited your recommendation in my previous message.
Cheers, Ulrich
>
> I had a look at the documentation of the package apcluster.
> That's interesting but do you have any example using it with both
> categorical
> and numerical variables? I'd like to test it with a large dataset..
>
Your posting has opened my eyes: problems where both numerical and
categorical f
Hi Abanero,
first, I have to correct myself. Knn1 is a supervised learning algorithm, so
my comment wasn't completely correct. In any case, if you want to do a
clustering prior to a supervised classification, the function daisy() can
handle any kind of variable. The resulting distance matrix can b
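A short sketch of that, assuming the cluster package (shipped with R as a recommended package) is available; the toy data frame is made up:

```r
library(cluster)  # for daisy()

# Mixed categorical + numeric data, as in the original question
df <- data.frame(colour = factor(c("blue", "pink", "blue", "green")),
                 rating = c(2, 9, 3, 8))

d  <- daisy(df)                        # Gower dissimilarity for mixed types
hc <- hclust(as.dist(d), method = "average")
```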
Hi,
thank you Joris and Ulrich for your answers.
Joris Meys wrote:
>see the library randomForest for example
I'm trying to find some example in randomForest with categorical variables
but I haven't found anything. Do you know any example with both categorical
and numerical variables? Anyway I
Dear abanero,
In principle, k nearest neighbours classification can be computed on
any dissimilarity matrix. Unfortunately, knn and knn1 seem to assume
Euclidean vectors as input, which restricts their use.
I'd probably compute an appropriate dissimilarity between points (have a
look at Gowe
abanero wrote:
>
> Do you know something like “knn1” that works with categorical variables
> too?
> Do you have any suggestion?
>
There are surely plenty of clustering algorithms around that do not require
a vector space structure on the inputs (like KNN does). I think
agglomerative clustering w
Not a direct answer, but from your description it looks like you are better
of with supervised classification algorithms instead of unsupervised
clustering. see the library randomForest for example. Alternatively, you can
try a logistic regression or a multinomial regression approach, but these
are
I'm not sure why you'd expect Euclidean distance and squared Euclidean
distance to
give the same results.
Euclidean distance is the square root of the sums of squared
differences for each variable, and that's exactly what dist() returns.
http://en.wikipedia.org/wiki/Euclidean_distance
On a map,
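A two-point check of that, using the classic 3-4-5 triangle:

```r
# dist() returns the Euclidean distance itself (the square root), so the
# squared version must be computed explicitly.
x <- rbind(c(0, 0), c(3, 4))
d <- dist(x)        # sqrt((3-0)^2 + (4-0)^2) = 5
as.numeric(d)       # 5
as.numeric(d)^2     # 25 -- squared Euclidean, if that is what you want
```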
Hi Jeoffrey,
How stable are the results in general ?
If you repeat the analysis in R several times, does it yield the same
results ?
Tal
Hi Samantha,
You can check out the graph and source code on this page:
http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=79
best, Xian
Hi Samantha,
Did you check out the help for plclust? There's a labels argument that
is used to label the leaves of your dendrogram. By default, the rownames
of your dataframe are used.
Sarah
On Wed, Mar 10, 2010 at 9:01 PM, Samantha wrote:
>
> Hi,
>
> I am clustering data based on three numeric
Without knowing what your data set really looks like, I'd look to decision
trees, specifically package rpart with method = "class".
Your problem may not be appropriate in that environment, but it is hard to
say with limited explanation of issues.
good luck
Steve Friedman Ph. D.
Spatial Statist
On 17.11.2009 5:22, Charles C. Berry wrote:
>
> Once you get the hang of it, you'll be in a position to modify an existing
> hclust object.
I believe that I managed to solve the problem. (The code may not
be too refined, and my R is perhaps a bit dialectal. The function
may fail especially if th
Original Message
Subject: Re: [R] Cluster analysis: hclust manipulation possible?
Date: Mon, 16 Nov 2009 19:22:54 -0800
From: Charles C. Berry
To: Jopi Harri
On Mon, 16 Nov 2009, Jopi Harri
On 16.11.2009 19:13, Charles C. Berry wrote:
>> The question: Can this be accomplished in the *dendrogram plot*
>> by manipulating the resulting hclust data structure or by some
>> other means, and if yes, how?
>
> Yes, you need to study
>
> ?hclust
>
> particularly the part about 'Value'
On Mon, 16 Nov 2009, Jopi Harri wrote:
I am doing cluster analysis [hclust(Dist, method="average")] on
data that potentially contains redundant objects. As expected,
the inclusion of redundant objects affects the clustering result,
i.e., the data a1 = a2 = a3, b, c, d, e1 = e2 is likely to cluster
On Mon, 2009-07-13 at 23:42 -0700, Hollix wrote:
> Hi folks,
>
> I tried for the first time hclust. Unfortunately, with missing data in my
> data file, it doesn't seem
> to work. I found no information about how to consider missing data.
>
> Omission of all missings is not really an option as I w
vegdist() in the vegan package optionally allows pairwise deletion of missing
values when computing dissimilarities. The result can be used as the first
argument to hclust().
('Caveat emptor', of course.)
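A sketch of that, assuming the vegan package is installed (it is not part of base R); the toy matrix with NAs is made up:

```r
library(vegan)  # for vegdist(); assumed installed

x <- matrix(c(1, 2, NA,
              2, 3,  4,
              5, NA, 6), nrow = 3, byrow = TRUE)

# na.rm = TRUE asks for pairwise deletion of missing values
d  <- vegdist(x, method = "euclidean", na.rm = TRUE)
hc <- hclust(d, method = "average")
```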
From: r-help-boun...@r-project.org [r-help-boun...
Dear Alex,
actually fixing the number of clusters in kmeans and then ending up with a
smaller number because of empty clusters is not a standard method of
estimating the number of clusters. It may happen (as apparently in some of
your examples), but it is generally rather unusual. In most cases
Try this:
c4 <- cutree(cluster, k=4)
aggregate(data, by = list(cluster = c4), FUN = mean)  # per-cluster means
HTH
Marcelino
On 2009-02-20 Jgaspard wrote:
Hi all!
I'm new to R and don't know much about it. Because it is free, I managed to
learn it a little bit.
Here is my problem: I did a cluster analysis on 30 observations and 16
variab
jgaspard wrote:
Hi all!
I'm new to R and don't know much about it. Because it is free, I managed to
learn it a little bit.
Here is my problem: I did a cluster analysis on 30 observations and 16
variables (monde, figaro, liberation, etc.). Here is the .txt data file:
"monde","figaro","liberat
Dan,
I don't use the flexclust package, but if I understand your question
correctly, you can use your own distance measure to calculate a
dissimilarity matrix and pass that to, e.g., agnes() in the cluster
package.
Stephen
On Fri, Feb 6, 2009 at 9:42 AM, Jim Porzak wrote:
> Dan,
>
> Check out F
Dan,
Check out Fritz Leisch's flexclust package.
HTH,
Jim Porzak
TGN.com
San Francisco, CA
http://www.linkedin.com/in/jimporzak
use R! Group SF: http://ia.meetup.com/67/
On Fri, Feb 6, 2009 at 7:11 AM, Dan Stanger wrote:
> Hello All,
>
> I have data where each feature data point is a vector, a
If you can define a distance between two vectors (where each one has some
numerical and some categorical coordinates) then you can proceed with any
clustering algorithm.
One possibility to get such a distance is to use randomForest, which can
produce a proximity matrix that can be turned into a distance matrix.
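A sketch of that route, assuming the randomForest package is installed (not part of base R); the toy data frame is made up:

```r
library(randomForest)  # assumed installed
set.seed(1)

x <- data.frame(size = c(rnorm(10, mean = 0), rnorm(10, mean = 5)),
                type = factor(rep(c("a", "b"), each = 10)))

rf <- randomForest(x = x, proximity = TRUE)  # unsupervised: no response given
d  <- as.dist(1 - rf$proximity)              # proximity -> dissimilarity
hc <- hclust(d, method = "average")
```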
Try hclust() with daisy() in the cluster package.
Cheers,
Jin
Jin Li, PhD
Spatial Modeller/
Computational Statistician
Marine & Coastal Environment
Geoscience Australia
Ph: 61 (02) 6249 9899
Fax: 61 (02) 6249 9956
email: [EMAIL PROTECTED]
---
Dear Miha,
a general way to do this is as follows:
Define a distance measure by aggregating the
Euclidean distance on the (X,Y)-space and the trivial 0-1 distance (0 if
the category is the same) on the categorical variable. Perform cluster
analysis (whichever you want) on the resulting distance matrix.
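A base-R sketch of that aggregate distance; the function name and the weight w on the categorical part are assumptions you would tune for your data:

```r
# Euclidean distance on (X, Y) plus a 0-1 penalty when the category differs
mixed_dist <- function(xy, category, w = 1) {
  d_xy  <- dist(xy)                                        # Euclidean part
  d_cat <- as.dist(1 * outer(category, category, FUN = "!="))  # 0-1 part
  d_xy + w * d_cat
}

xy   <- rbind(c(0, 0), c(0, 1), c(5, 5))
cat_ <- factor(c("a", "b", "a"))
d    <- mixed_dist(xy, cat_)
hc   <- hclust(d, method = "average")  # any distance-based method will do
```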
AMINA SHAHZADI,
The eternal question.
What I do is generate a range of solutions, profile them on the variables
used to cluster the data and on any other information I have, and then
present the solutions to a group of others to assess their meaningfulness.
Amna,
You have posted this question to the list several times now over the past
few weeks.
Several of us have recommended hclust() as a starting point.
However, your question about the optimal number of clusters to choose is not
an R question. I recommend that you tackle the literature on this
On Thu, 2007-11-01 at 08:19 -0800, amna khan wrote:
> Hi Sir
>
> How can we select the optimum number of clusters?
>
> Best Regards
There are many ways you could do this, some better than others. A key
factor is which method of "cluster analysis" you are using.
I'd suggest you read up about the
> Subject: [R] Cluster Analysis
>
> Dear all,
>
> I would like to know if I can do a hierarchical cluster analysis in R
> using my own similarity matrix and how. Thanks. Katia Freire.
Yes. ;)
Reading the help for dist() and hclust() should make the procedure for
doing this appear fairly straightforward.
take a look at hclust()
Dieter
Katia Freire wrote:
> Dear all,
>
> I would like to know if I can do a hierarchical cluster analysis in R using
> my own similarity matrix and how. Thanks. Katia Freire.
>
>
> [[alternative HTML version deleted]]
>
> _
Hi Amna,
I believe you are looking for these functions
?hclust
[with method = "ward"]
?kmeans
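On a toy data set (made up here), the two suggestions look like this; note that current R spells the Ward option "ward.D" or "ward.D2" rather than "ward":

```r
# Two well-separated groups for illustration
set.seed(42)
mydata <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
                matrix(rnorm(20, mean = 5), ncol = 2))

# Ward's method (ward.D is conventionally fed squared distances)
hc     <- hclust(dist(mydata)^2, method = "ward.D")
groups <- cutree(hc, k = 2)

# K-means with the same number of clusters
km <- kmeans(mydata, centers = 2)
table(groups, km$cluster)   # cross-tabulate the two partitions
```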
Best regards,
Stephen
--- amna khan <[EMAIL PROTECTED]> wrote:
> Hi Sir
>
> How to perform cluster analysis using Ward's method and K- means
> clustering?
>
> Regards
>
> --
> AMINA SHAHZADI
>
On 10/18/07, amna khan <[EMAIL PROTECTED]> wrote:
> Hi Sir
>
> How to perform cluster analysis using Ward's method and K- means clustering?
For beginning, try to perform it using the GUI Rcmdr.
Regards,
Liviu