On Fri, 31 Jan 2020 at 20:31, Berry, Charles
wrote:
>
>
> > On Jan 31, 2020, at 1:04 AM, Emmanuel Levy
> wrote:
> >
> > Hi,
> >
> > I'd like to use the Netflix challenge data and just can't figure
Hi,
I'd like to use the Netflix challenge data and just can't figure out how to
efficiently "scan" the files.
https://www.kaggle.com/netflix-inc/netflix-prize-data
The files have two types of rows: either an *ID*, e.g. "1:", "2:", etc., or
three values associated with each ID:
The format is as follows:
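A minimal sketch of one way to parse that layout in base R: mark the rows that end with ":", and assign every data row the most recent ID above it. The sample lines below are made up to illustrate the format; with the real files you would get `lines` from readLines() on the downloaded data.

```r
## a few lines in the Netflix-prize layout (stand-in for readLines() on the real file)
lines <- c("1:", "1488844,3,2005-09-06", "822109,5,2005-05-13",
           "2:", "885013,4,2005-10-19")

is.id <- grepl(":$", lines)                        # TRUE for the "1:", "2:" rows
movie.id <- as.integer(sub(":$", "", lines[is.id]))

## each data line gets the ID of the most recent "N:" line above it
id.for.line <- movie.id[cumsum(is.id)]

ratings <- read.csv(text = lines[!is.id], header = FALSE,
                    col.names = c("customer", "rating", "date"))
ratings$movie <- id.for.line[!is.id]
```

The cumsum(is.id) trick avoids any per-line loop, so it stays fast even on millions of rows.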
column' data.frames, not 'empty'
> data.frames, which could be either.)
>
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Wed, Nov 2, 2016 at 6:48 AM, Emmanuel Levy
> wrote:
>
>> Dear All,
>>
>> This sounds simple but can
Dear All,
This sounds simple but can't figure out a good way to do it.
Let's say that I have an empty data frame "df":
## creates the df
df = data.frame( id=1, data=2)
## empties the df, perhaps there is a more elegant way to create an empty
df?
df = df[-c(1),]
> df
[1] id data
&lt;0 rows&gt; (or 0-length row.names)
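For what it's worth, a zero-row data frame can also be built directly, with the column types declared up front (a small sketch):

```r
## empty data frame with typed, zero-length columns
df <- data.frame(id = integer(0), data = numeric(0))

## or take zero rows of an existing data frame, keeping its column types
df <- data.frame(id = 1, data = 2)[0, ]

nrow(df)  # 0
```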
~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> 2015-07-21 15:43 GMT+02:00 E
Hi,
The answer to this is probably straightforward: I have a data frame and I'd
like to build an index of column combinations, e.g.
col1 col2 --> col3 (the index I need)
 A    1        1
 A    1        1
 A    2        2
 B    1        3
 B    2        4
 B    2        4
At th
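One way to build such an index in base R: collapse the columns into a single key with interaction(), then number each row's key by its order of first appearance. The data below reproduce the small table above, with col3 as the index asked for.

```r
d <- data.frame(col1 = c("A", "A", "A", "B", "B", "B"),
                col2 = c(1, 1, 2, 1, 2, 2))

## one key per row, then number the keys in order of first appearance
key <- interaction(d$col1, d$col2, drop = TRUE)
d$col3 <- match(key, unique(key))
d$col3  # 1 1 2 3 4 4
```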
I did not know that unique worked on entire rows!
That is great, thank you very much!
Emmanuel
On 27 December 2012 22:39, Marc Schwartz wrote:
> unique(t(apply(cbind(v1, v2), 1, sort)))
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailm
Hi,
I've had this problem for a while and tackled it in a quite dirty way,
so I'm wondering if a better solution exists:
If we have two vectors:
v1 = c(0,1,2,3,4)
v2 = c(5,3,2,1,0)
How can I remove one instance of the "3,1" / "1,3" duplicate pair?
At the moment I'm using the following solution, which is q
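Marc Schwartz's one-liner quoted above can be applied like this: sorting within each pair makes (1,3) and (3,1) identical rows, so unique() drops the repeat.

```r
v1 <- c(0, 1, 2, 3, 4)
v2 <- c(5, 3, 2, 1, 0)

## sort within each (v1, v2) pair so order no longer matters, then dedupe
pairs <- unique(t(apply(cbind(v1, v2), 1, sort)))
nrow(pairs)  # 4: the (1,3)/(3,1) pair is kept only once
```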
Hi,
That sounds simple but I cannot think of a really fast way of getting
the following:
c(1,1,2,2,3,3,4,4) would give c(1,3,5,7)
i.e., a function that returns the indexes of the first occurrences of numbers.
Note that numbers may have any order e.g., c(3,4,1,2,1,1,2,3,5), can
be very large, a
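A short sketch: !duplicated() marks first occurrences directly, and it works whatever the order of the values.

```r
x <- c(1, 1, 2, 2, 3, 3, 4, 4)
which(!duplicated(x))  # 1 3 5 7

## order does not matter:
y <- c(3, 4, 1, 2, 1, 1, 2, 3, 5)
which(!duplicated(y))  # 1 2 3 4 9
```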
to your question, but 1L and 2L are just the
> integers 1 and 2 (the L makes them integers instead of doubles which is
> useful for some things)
>
> Michael
>
> On May 11, 2012, at 2:15 PM, Emmanuel Levy wrote:
>
>> Hello,
>>
>> The heatmap function conveniently
Hello,
The heatmap function conveniently has a "reorder.dendrogram" function
so that clusters follow a certain logic.
It seems that the hclust function doesn't have such a feature. I can use
the "reorder" function on the dendrogram obtained from hclust, but
this does not modify the hclust object it
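One workaround, if only the leaf order matters: reorder the dendrogram and convert it back with as.hclust(). A sketch (the weights 10:1 and the mtcars subset are arbitrary choices here):

```r
hc <- hclust(dist(mtcars[1:10, ]))

## reorder the leaves on the dendrogram, then go back to an hclust object
dend <- reorder(as.dendrogram(hc), 10:1, agglo.FUN = mean)
hc2 <- as.hclust(dend)
hc2$order  # the new leaf order
```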
OK, it seems that the array2df function from arrayhelpers package does
the job :)
On 19 April 2012 16:46, Emmanuel Levy wrote:
> Hi,
>
> I have a three dimensional array, e.g.,
>
> my.array = array(0, dim=c(2,3,4), dimnames=list( d1=c("A1","A2"),
> d2=c("
Hi,
I have a three dimensional array, e.g.,
my.array = array(0, dim=c(2,3,4), dimnames=list( d1=c("A1","A2"),
d2=c("B1","B2","B3"), d3=c("C1","C2","C3","C4")) )
what I would like to get is then a dataframe:
d1 d2 d3 value
A1 B1 C1 0
A2 B1 C1 0
.
.
.
A2 B3 C4 0
I'm sure there is one function t
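Besides array2df mentioned above, base R's as.data.frame.table() does this directly; it only needs the dimnames:

```r
my.array <- array(0, dim = c(2, 3, 4),
                  dimnames = list(d1 = c("A1", "A2"),
                                  d2 = c("B1", "B2", "B3"),
                                  d3 = c("C1", "C2", "C3", "C4")))

## one row per cell: columns d1, d2, d3 and the cell value
df <- as.data.frame.table(my.array, responseName = "value")
head(df, 2)  # A1/A2 vary fastest, as in the listing above
```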
Xtrans = Xabs-X1
Ytrans = Yabs-Y1
return(c(Xtrans,Ytrans))
}
On 12 March 2012 20:58, David Winsemius wrote:
>
> On Mar 12, 2012, at 3:07 PM, Emmanuel Levy wrote:
>
>> Hi Jeff,
>>
>> Thanks for your reply and the example.
>>
>> I'm not sure if it could
Hi,
I am trying to normalize some data. First I fitted a principal curve
(using the LCPM package), but now I would like to apply a
transformation so that the curve becomes a "straight diagonal line" on
the plot. The data used to fit the curve would then be normalized by
applying the same transfor
of y on x. That is not what you say you want,
> so these approaches are unlikely to work.
>
>
> -- Bert
>
> On Sat, Mar 10, 2012 at 6:20 PM, Emmanuel Levy
> wrote:
>> Hi,
>>
>> I'm wondering which function would allow fitting this type of data:
>>
Hi,
I'm wondering which function would allow fitting this type of data:
tmp=rnorm(2000)
X.1 = 5+tmp
Y.1 = 5+ (5*tmp+rnorm(2000))
tmp=rnorm(100)
X.2 = 9+tmp
Y.2 = 40+ (1.5*tmp+rnorm(100))
X.3 = 7+ 0.5*runif(500)
Y.3 = 15+20*runif(500)
X = c(X.1,X.2,X.3)
Y =
y(abs(my.loess$res)/max(abs(my.loess$res))) )
On 10 March 2012 18:30, Emmanuel Levy wrote:
> Hi,
>
> I posted a message earlier entitled "How to fit a line through the
> "Mountain crest" ..."
>
> I figured loess is probably the best way, but it seem
not sure why I did not get an
error message.
I'll post the lines of code as a reply to the second post.
All the best,
Emmanuel
On 10 March 2012 19:46, David Winsemius wrote:
>
> On Mar 10, 2012, at 3:55 PM, Emmanuel Levy wrote:
>
>> Hi,
>>
>> I'
Hi,
I posted a message earlier entitled "How to fit a line through the
"Mountain crest" ..."
I figured loess is probably the best way, but it seems that the
problem is the robustness of the fit. Below I paste an example to
illustrate the problem:
tmp=rnorm(2000)
X.background = 5+tmp; Y.b
Hi,
I'm trying to normalize data by fitting a line through the highest density
of points (in a 2D plot).
In other words, if you visualize the data as a density plot, the fit I'm
trying to achieve is the line that goes through the "crest" of the mountain.
This is similar yet different to what LOES
Dear All,
I would like to generate random protein sequences using a HMM model.
Has anybody done that before, or would you have any idea which package
is likely to be best for that?
The important facts are that the HMM will be fitted on ~3 million
sequential observations, with 20 different states
Hello Roger,
Thanks for the suggestions.
I finally managed to do it using the output of kde2d - The code is
pasted below. Actually this made me realize that the outcome of kde2d
can be quite influenced by outliers if a boundary box is not given
(try running the code without the boundary box, e.g.
surprised if there is a trick with quantile that
escapes my mind.
Thanks for your help,
Emmanuel
On 19 November 2010 21:25, David Winsemius wrote:
>
> On Nov 19, 2010, at 8:44 PM, Emmanuel Levy wrote:
>
>> Hello,
>>
>> This sounds like a problem to which many sol
Hello,
This sounds like a problem to which many solutions should exist, but I
did not manage to find one.
Basically, given a list of datapoints, I'd like to keep those within
the X% percentile highest density.
That would be equivalent to retain only points within a given line of
a contour plot.
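A rough sketch of the kde2d-based idea mentioned elsewhere in this thread: estimate the 2-D density, look up each point's density at its nearest grid cell, and keep the points above a chosen density quantile. The 75% cut-off and grid size are arbitrary choices here.

```r
library(MASS)  # for kde2d

set.seed(1)
x <- rnorm(1000); y <- rnorm(1000)

dens <- kde2d(x, y, n = 100)

## density at each data point, via its nearest grid cell
ix <- findInterval(x, dens$x)
iy <- findInterval(y, dens$y)
z <- dens$z[cbind(ix, iy)]

## keep the points in the 75% highest-density region
keep <- z >= quantile(z, 0.25)
```

As noted above, kde2d can be sensitive to outliers unless a boundary box (lims) is supplied.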
Update: sorry for the stupid question, let's say it's pretty late.
For those who may be as tired as I am and get the same warning: the
width and height should be given as numbers, not strings!
On 16 November 2010 04:17, Emmanuel Levy wrote:
> Hi,
>
> The pdf function would not let me ch
Hi,
The pdf function would not let me change the paper size and gives me
the following warning:
pdf("figure.pdf", width="6", height="10")
Warning message:
‘mode(width)’ and ‘mode(height)’ differ between new and previous
==> NOT changing ‘width’ & ‘height’
If I use the option paper = "
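The fix is simply to pass numbers rather than quoted strings (writing to a temp file here just for illustration):

```r
## width and height must be numeric, not character strings
pdf(file.path(tempdir(), "figure.pdf"), width = 6, height = 10)
plot(1:10)
dev.off()
```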
> But if the 1st order differences are the same, then doesn't it follow that
> the 2nd, 3rd, ... order differences must be the same between the original and
> the new "random" vector. What am I missing?
You are missing nothing, sorry; I wrote something wrong. What I would
like to be preserved is
with this problem? Or even better of a package?
Thanks for your help,
Emmanuel
2009/8/12 Nordlund, Dan (DSHS/RDA) :
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
>> Behalf Of Emmanuel Levy
>> Sent: Wedne
lp me solve it.
Many thanks!
Emmanuel
PS: I apologize that I sent a second post. This one did not appear in
my "R-help" label so I assumed it wasn't sent for some reason.
2009/8/12 Ted Harding :
> On 12-Aug-09 22:05:24, Emmanuel Levy wrote:
>> Dear All,
>&
Dear All, (my apologies if it got posted twice; it seems it didn't
get through)
I cannot find a solution to the following problem although I suppose
this is a classic.
I have a vector V of X=length(V) values between 1 and N.
I would like to get random samples of X values also compri
Dear All,
I cannot find a solution to the following problem although I imagine
that it is a classic, hence my email.
I have a vector V of X values between 1 and N.
I would like to get random samples of X values also between
1 and N, but the important point is:
* I would like
Dear all,
I have been using normalize.loess and I get the following error
message when my matrix contains NA values:
> my.mat = matrix(nrow=100, ncol=4, runif(400) )
> my.mat[1,1]=NA
> my.mat.n = normalize.loess(my.mat, verbose=TRUE)
Done with 1 vs 2 in iteration 1
Done with 1 vs 3 in iteration 1
Dear Brian, Mose, Peter and Stefan,
Thanks a lot for your replies - the issues are now clearer to me. (and
I apologize for not using the appropriate list).
Best wishes,
Emmanuel
2008/11/19 Peter Dalgaard <[EMAIL PROTECTED]>:
> Stefan Evert wrote:
>>
>> On 19 Nov 2008, at 07:56, Prof Brian Ri
Dear All,
I just read an announcement saying that Mathematica is launching a
version working with Nvidia GPUs. It is claimed that it'd make it
~10-100x faster!
http://www.physorg.com/news146247669.html
I was wondering if you are aware of any development going into this
direction with R?
Thanks f
Hi Chuck,
Thanks a lot for your suggestion.
> You can find all such matches (not just the disjoint ones that gregexpr
> finds) using something like this:
>
>twomatch <-function(x,y) intersect(x+1,y)
>match.list <-
>list(
>which( vec %in% c(3
Dear All,
I have a long string and need to search for regular expressions in
there. However it becomes horribly slow as the string length
increases.
Below is an example: when "i" increases by 5, the time spent increases
by more! (my string is 11,000,000 letters long!)
I also noticed that
- the s
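Chuck Berry's positional trick quoted earlier in this thread can be sketched like this: split the string into a letter vector once, then work with integer positions instead of regexes over one huge string.

```r
set.seed(1)
s <- paste(sample(c("A", "B", "C"), 1e5, replace = TRUE), collapse = "")

vec <- strsplit(s, "")[[1]]            # one element per letter
twomatch <- function(x, y) intersect(x + 1, y)

## positions of every "B" that immediately follows an "A"
hits <- twomatch(which(vec == "A"), which(vec == "B"))
```

Unlike gregexpr, this also finds overlapping matches, and longer patterns can be built by chaining intersect() calls on shifted position sets.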
Dear All,
I hope the title speaks by itself.
I believe that there should be a solution when I see what Mclust is
able to do. However, this problem is quite
particular in that d3 is not known and does not necessarily correspond
to a common distribution (e.g. normal, exponential ...).
However it mu
Hi Duncan,
I'm really stupid --- yes of course!!
Thanks for pointing out the (now) obvious.
All the best,
E
2008/10/21 Duncan Murdoch <[EMAIL PROTECTED]>:
> On 10/21/2008 2:56 PM, Emmanuel Levy wrote:
>>
>> Dear All,
>>
>> I have a distribution of
Dear All,
I have a distribution of values and I would like to assess the
uni/bimodality of the distribution.
I managed to decompose it into two normal distribs using Mclust, and
the BIC criteria is best for two parameters.
However, the problem is that the BIC criteria is not a P-value, which
I wo
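One commonly used formal test of unimodality is Hartigans' dip test; a sketch, assuming the diptest package is installed. (This is a different route from Mclust's BIC comparison, not a p-value for the BIC itself.)

```r
library(diptest)

set.seed(1)
x <- c(rnorm(200, mean = 0), rnorm(200, mean = 4))

dip.test(x)  # a small p-value argues against unimodality
```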
,0.15),type="n",xlab=" ",ylab=" ",axes=F, ylim=c(0,0.4) )
axis(side=1)
for (i in 1:2) {
ni <- v$parameters$pro[i]*dnorm(x0,
mean=as.numeric(v$parameters$mean[i]),sd=1)
lines(x0,ni,col=1)
nt <- nt+ni
}
lines(x0,nt,lwd=3)
segments(my.data,0,my.data,0.02)
Best,
this would be great; is it possible to
somehow force the parameters (e.g variance) to be
greater than a particular threshold?
Thanks,
Emmanuel
2008/10/20 Emmanuel Levy <[EMAIL PROTECTED]>:
> Dear list members,
>
> I am using Mclust in order to deconvolute a distribution that I
&
Dear list members,
I am using Mclust in order to deconvolute a distribution that I
believe is a sum of two gaussians.
First I can make a model:
> my.data.model = Mclust(my.data, modelNames=c("E"), warn=T, G=1:3)
But then, when I try to plot the result, I get the following error:
> mclust1Dplot(
oblem should disappear. It relates to encoding of strings.
>
> D.
>
> Emmanuel Levy wrote:
>> Dear list members,
>>
>> I encountered this problem and the solution pointed out in a previous
>> thread did not work for me.
>> (e.g. install.packages("
Dear list members,
I encountered this problem and the solution pointed out in a previous
thread did not work for me.
(e.g. install.packages("RCurl", repos = "http://www.omegahat.org/R"))
I work with Ubuntu Hardy, and installed R 2.6.2 via apt-get.
I really need RCurl in order to use biomaRt ...
are doing. Can you make a small example
> that shows what you have and what you want?
>
> Is ?split what you are after?
>
> Emmanuel Levy wrote:
>>
>> Dear Peter and Henrik,
>>
>> Thanks for your replies - this helps speed up a bit, but I thought
>> t
l example
> that shows what you have and what you want?
>
> Is ?split what you are after?
>
> Emmanuel Levy wrote:
>>
>> Dear Peter and Henrik,
>>
>> Thanks for your replies - this helps speed up a bit, but I thought
>> there would be something much faster.
>>
gers does
> t4 <- system.time(res <- which(as.integer(x) == match("A", levels(x))))
> print(t4/t1);
> user   system  elapsed
> 0.417  0.000   0.3636364
>
> So, the latter seems to be the fastest way to identify those elements.
>
> My $.02
>
> /Hen
Dear All,
I have a large data frame (270 lines and 14 columns), and I would like to
extract the information in a particular way, illustrated below:
Given a data frame "df":
> col1=sample(c(0,1),10, rep=T)
> names = factor(c(rep("A",5),rep("B",5)))
> df = data.frame(names,col1)
> df
names
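split() and tapply(), suggested elsewhere in this thread ("Is ?split what you are after?"), handle this kind of per-level extraction; a small sketch on the same df:

```r
set.seed(1)
col1 <- sample(c(0, 1), 10, replace = TRUE)
names <- factor(c(rep("A", 5), rep("B", 5)))
df <- data.frame(names, col1)

split(df$col1, df$names)        # one vector of col1 values per level
tapply(df$col1, df$names, sum)  # or one summary value per level
```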
0 - 0.17.
I haven't looked yet at the locfit package as it is not installed, but
I will check it out!
Thanks for helping!
Emmanuel
On 20/03/2008, David Winsemius <[EMAIL PROTECTED]> wrote:
> "Emmanuel Levy" <[EMAIL PROTECTED]> wrote in
> news:[EMAIL PROTECTED]
in
> the base distribution, which will do exactly what you requested.
>
>
> Bert Gunter
> Genentech Nonclinical Statistics
>
>
>
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of Emmanuel Levy
> Sent: Wednesday
Dear All,
I'm sure this is not the first time this question comes up, but I
couldn't find the keywords that would point me to it, so
apologies if this is a re-post.
Basically I've got thousands of points, each depending on three variables:
x, y, and z.
if I do a plot(x,y, col=z), I get somet