On Fri, 31 Jan 2020 at 20:31, Berry, Charles
wrote:
>
>
> > On Jan 31, 2020, at 1:04 AM, Emmanuel Levy
> wrote:
> >
> > Hi,
> >
> > I'd like to use the Netflix challenge data and just can't figure
Hi,
I'd like to use the Netflix challenge data and just can't figure out how to
efficiently "scan" the files.
https://www.kaggle.com/netflix-inc/netflix-prize-data
The files have two types of rows: either an *ID*, e.g. "1:", "2:", etc., or
three values associated with each ID:
The format is as follows:
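A minimal sketch of one way to parse that layout in base R: mark the rows that end with ":", and assign every data row the most recent ID above it. The sample lines below are made up to illustrate the format; with the real files you would get `lines` from readLines() on the downloaded data.

```r
## a few lines in the Netflix-prize layout (stand-in for readLines() on the real file)
lines <- c("1:", "1488844,3,2005-09-06", "822109,5,2005-05-13",
           "2:", "885013,4,2005-10-19")

is.id <- grepl(":$", lines)                        # TRUE for the "1:", "2:" rows
movie.id <- as.integer(sub(":$", "", lines[is.id]))

## each data line gets the ID of the most recent "N:" line above it
id.for.line <- movie.id[cumsum(is.id)]

ratings <- read.csv(text = lines[!is.id], header = FALSE,
                    col.names = c("customer", "rating", "date"))
ratings$movie <- id.for.line[!is.id]
```

The cumsum(is.id) trick avoids any per-line loop, so it stays fast even on millions of rows.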
column' data.frames, not 'empty'
> data.frames, which could be either.)
>
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Wed, Nov 2, 2016 at 6:48 AM, Emmanuel Levy
> wrote:
>
>> Dear All,
>>
>> This sounds simple but can
Dear All,
This sounds simple but can't figure out a good way to do it.
Let's say that I have an empty data frame "df":
## creates the df
df = data.frame( id=1, data=2)
## empties the df, perhaps there is a more elegant way to create an empty
df?
df = df[-c(1),]
> df
[1] id data
&lt;0 rows&gt; (or 0-length row.names)
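For what it's worth, a zero-row data frame can also be built directly, with the column types declared up front (a small sketch):

```r
## empty data frame with typed, zero-length columns
df <- data.frame(id = integer(0), data = numeric(0))

## or take zero rows of an existing data frame, keeping its column types
df <- data.frame(id = 1, data = 2)[0, ]

nrow(df)  # 0
```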
~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> 2015-07-21 15:43 GMT+02:00 E
Hi,
The answer to this is probably straightforward: I have a data frame and I'd
like to build an index of column combinations, e.g.
col1 col2 --> col3 (the index I need)
 A    1        1
 A    1        1
 A    2        2
 B    1        3
 B    2        4
 B    2        4
At th
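One way to build such an index in base R: collapse the columns into a single key with interaction(), then number each row's key by its order of first appearance. The data below reproduce the small table above, with col3 as the index asked for.

```r
d <- data.frame(col1 = c("A", "A", "A", "B", "B", "B"),
                col2 = c(1, 1, 2, 1, 2, 2))

## one key per row, then number the keys in order of first appearance
key <- interaction(d$col1, d$col2, drop = TRUE)
d$col3 <- match(key, unique(key))
d$col3  # 1 1 2 3 4 4
```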
I did not know that unique worked on entire rows!
That is great, thank you very much!
Emmanuel
On 27 December 2012 22:39, Marc Schwartz wrote:
> unique(t(apply(cbind(v1, v2), 1, sort)))
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailm
Hi,
I've had this problem for a while and tackled it in a quite dirty way,
so I'm wondering if a better solution exists:
If we have two vectors:
v1 = c(0,1,2,3,4)
v2 = c(5,3,2,1,0)
How can I remove one instance of the "3,1" / "1,3" duplicate pair?
At the moment I'm using the following solution, which is q
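Marc Schwartz's one-liner quoted above can be applied like this: sorting within each pair makes (1,3) and (3,1) identical rows, so unique() drops the repeat.

```r
v1 <- c(0, 1, 2, 3, 4)
v2 <- c(5, 3, 2, 1, 0)

## sort within each (v1, v2) pair so order no longer matters, then dedupe
pairs <- unique(t(apply(cbind(v1, v2), 1, sort)))
nrow(pairs)  # 4: the (1,3)/(3,1) pair is kept only once
```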
Hi,
That sounds simple but I cannot think of a really fast way of getting
the following:
c(1,1,2,2,3,3,4,4) would give c(1,3,5,7)
i.e., a function that returns the indexes of the first occurrences of numbers.
Note that numbers may have any order e.g., c(3,4,1,2,1,1,2,3,5), can
be very large, a
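A short sketch: !duplicated() marks first occurrences directly, and it works whatever the order of the values.

```r
x <- c(1, 1, 2, 2, 3, 3, 4, 4)
which(!duplicated(x))  # 1 3 5 7

## order does not matter:
y <- c(3, 4, 1, 2, 1, 1, 2, 3, 5)
which(!duplicated(y))  # 1 2 3 4 9
```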
to your question, but 1L and 2L are just the
> integers 1 and 2 (the L makes them integers instead of doubles which is
> useful for some things)
>
> Michael
>
> On May 11, 2012, at 2:15 PM, Emmanuel Levy wrote:
>
>> Hello,
>>
>> The heatmap function conveniently
Hello,
The heatmap function conveniently has a "reorder.dendrogram" function
so that clusters follow a certain logic.
It seems that the hclust function doesn't have such a feature. I can use
the "reorder" function on the dendrogram obtained from hclust, but
this does not modify the hclust object it
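One workaround, if only the leaf order matters: reorder the dendrogram and convert it back with as.hclust(). A sketch (the weights 10:1 and the mtcars subset are arbitrary choices here):

```r
hc <- hclust(dist(mtcars[1:10, ]))

## reorder the leaves on the dendrogram, then go back to an hclust object
dend <- reorder(as.dendrogram(hc), 10:1, agglo.FUN = mean)
hc2 <- as.hclust(dend)
hc2$order  # the new leaf order
```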
OK, it seems that the array2df function from arrayhelpers package does
the job :)
On 19 April 2012 16:46, Emmanuel Levy wrote:
> Hi,
>
> I have a three dimensional array, e.g.,
>
> my.array = array(0, dim=c(2,3,4), dimnames=list( d1=c("A1","A2"),
> d2=c("
Hi,
I have a three dimensional array, e.g.,
my.array = array(0, dim=c(2,3,4), dimnames=list( d1=c("A1","A2"),
d2=c("B1","B2","B3"), d3=c("C1","C2","C3","C4")) )
what I would like to get is then a dataframe:
d1 d2 d3 value
A1 B1 C1 0
A2 B1 C1 0
.
.
.
A2 B3 C4 0
I'm sure there is one function t
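Besides array2df mentioned above, base R's as.data.frame.table() does this directly; it only needs the dimnames:

```r
my.array <- array(0, dim = c(2, 3, 4),
                  dimnames = list(d1 = c("A1", "A2"),
                                  d2 = c("B1", "B2", "B3"),
                                  d3 = c("C1", "C2", "C3", "C4")))

## one row per cell: columns d1, d2, d3 and the cell value
df <- as.data.frame.table(my.array, responseName = "value")
head(df, 2)  # A1/A2 vary fastest, as in the listing above
```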
Xtrans = Xabs-X1
Ytrans = Yabs-Y1
return(c(Xtrans,Ytrans))
}
On 12 March 2012 20:58, David Winsemius wrote:
>
> On Mar 12, 2012, at 3:07 PM, Emmanuel Levy wrote:
>
>> Hi Jeff,
>>
>> Thanks for your reply and the example.
>>
>> I'm not sure if it could
Hi,
I am trying to normalize some data. First I fitted a principal curve
(using the LCPM package), but now I would like to apply a
transformation so that the curve becomes a "straight diagonal line" on
the plot. The data used to fit the curve would then be normalized by
applying the same transfor
of y on x. That is not what you say you want,
> so these approaches are unlikely to work.
>
>
> -- Bert
>
> On Sat, Mar 10, 2012 at 6:20 PM, Emmanuel Levy
> wrote:
>> Hi,
>>
>> I'm wondering which function would allow fitting this type of data:
>>
Hi,
I'm wondering which function would allow fitting this type of data:
tmp=rnorm(2000)
X.1 = 5+tmp
Y.1 = 5+ (5*tmp+rnorm(2000))
tmp=rnorm(100)
X.2 = 9+tmp
Y.2 = 40+ (1.5*tmp+rnorm(100))
X.3 = 7+ 0.5*runif(500)
Y.3 = 15+20*runif(500)
X = c(X.1,X.2,X.3)
Y =
y(abs(my.loess$res)/max(abs(my.loess$res))) )
On 10 March 2012 18:30, Emmanuel Levy wrote:
> Hi,
>
> I posted a message earlier entitled "How to fit a line through the
> "Mountain crest" ..."
>
> I figured loess is probably the best way, but it seem
not sure why I did not get an
error message.
I'll post the lines of code as a reply to the second post.
All the best,
Emmanuel
On 10 March 2012 19:46, David Winsemius wrote:
>
> On Mar 10, 2012, at 3:55 PM, Emmanuel Levy wrote:
>
>> Hi,
>>
>> I'
Hi,
I posted a message earlier entitled "How to fit a line through the
"Mountain crest" ..."
I figured loess is probably the best way, but it seems that the
problem is the robustness of the fit. Below I paste an example to
illustrate the problem:
tmp=rnorm(2000)
X.background = 5+tmp; Y.b
Hi,
I'm trying to normalize data by fitting a line through the highest density
of points (in a 2D plot).
In other words, if you visualize the data as a density plot, the fit I'm
trying to achieve is the line that goes through the "crest" of the mountain.
This is similar yet different to what LOES
Dear All,
I would like to generate random protein sequences using a HMM model.
Has anybody done that before, or would you have any idea which package
is likely to be best for that?
The important facts are that the HMM will be fitted on ~3 million
sequential observations, with 20 different states
Hello Roger,
Thanks for the suggestions.
I finally managed to do it using the output of kde2d - The code is
pasted below. Actually this made me realize that the outcome of kde2d
can be quite influenced by outliers if a boundary box is not given
(try running the code without the boundary box, e.g.
surprised if there is a trick with quantile that
escapes my mind.
Thanks for your help,
Emmanuel
On 19 November 2010 21:25, David Winsemius wrote:
>
> On Nov 19, 2010, at 8:44 PM, Emmanuel Levy wrote:
>
>> Hello,
>>
>> This sounds like a problem to which many sol
Hello,
This sounds like a problem to which many solutions should exist, but I
did not manage to find one.
Basically, given a list of datapoints, I'd like to keep those within
the X% percentile highest density.
That would be equivalent to retain only points within a given line of
a contour plot.
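A rough sketch of the kde2d-based idea mentioned elsewhere in this thread: estimate the 2-D density, look up each point's density at its nearest grid cell, and keep the points above a chosen density quantile. The 75% cut-off and grid size are arbitrary choices here.

```r
library(MASS)  # for kde2d

set.seed(1)
x <- rnorm(1000); y <- rnorm(1000)

dens <- kde2d(x, y, n = 100)

## density at each data point, via its nearest grid cell
ix <- findInterval(x, dens$x)
iy <- findInterval(y, dens$y)
z <- dens$z[cbind(ix, iy)]

## keep the points in the 75% highest-density region
keep <- z >= quantile(z, 0.25)
```

As noted above, kde2d can be sensitive to outliers unless a boundary box (lims) is supplied.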
Update: sorry for the stupid question, let's say it's pretty late.
For those who may be as tired as I am and get the same warning: the
width and height should be given as numbers, not strings!
On 16 November 2010 04:17, Emmanuel Levy wrote:
> Hi,
>
> The pdf function would not let me ch
Hi,
The pdf function would not let me change the paper size and gives me
the following warning:
pdf("figure.pdf", width="6", height="10")
Warning message:
‘mode(width)’ and ‘mode(height)’ differ between new and previous
==> NOT changing ‘width’ & ‘height’
If I use the option paper = "
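The fix is simply to pass numbers rather than quoted strings (writing to a temp file here just for illustration):

```r
## width and height must be numeric, not character strings
pdf(file.path(tempdir(), "figure.pdf"), width = 6, height = 10)
plot(1:10)
dev.off()
```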
> But if the 1st order differences are the same, then doesn't it follow that
> the 2nd, 3rd, ... order differences must be the same between the original and
> the new "random" vector. What am I missing?
You are missing nothing, sorry; I wrote something wrong. What I would
like to be preserved is
with this problem? Or even better of a package?
Thanks for your help,
Emmanuel
2009/8/12 Nordlund, Dan (DSHS/RDA) :
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
>> Behalf Of Emmanuel Levy
>> Sent: Wedne
lp me solve it.
Many thanks!
Emmanuel
PS: I apologize that I sent a second post. This one did not appear in
my "R-help" label so I assumed it wasn't sent for some reason.
2009/8/12 Ted Harding :
> On 12-Aug-09 22:05:24, Emmanuel Levy wrote:
>> Dear All,
>&
Dear All, (my apologies if it got posted twice; it seems it didn't
get through)
I cannot find a solution to the following problem although I suppose
this is a classic.
I have a vector V of X=length(V) values between 1 and N.
I would like to get random samples of X values also compri
Dear All,
I cannot find a solution to the following problem although I imagine
that it is a classic, hence my email.
I have a vector V of X values between 1 and N.
I would like to get random samples of X values also between
1 and N, but the important point is:
* I would like
Dear all,
I have been using normalize.loess and I get the following error
message when my matrix contains NA values:
> my.mat = matrix(nrow=100, ncol=4, runif(400) )
> my.mat[1,1]=NA
> my.mat.n = normalize.loess(my.mat, verbose=TRUE)
Done with 1 vs 2 in iteration 1
Done with 1 vs 3 in iteration 1
Dear Brian, Mose, Peter and Stefan,
Thanks a lot for your replies - the issues are now clearer to me. (and
I apologize for not using the appropriate list).
Best wishes,
Emmanuel
2008/11/19 Peter Dalgaard <[EMAIL PROTECTED]>:
> Stefan Evert wrote:
>>
>> On 19 Nov 2008, at 07:56, Prof Brian Ri
Dear All,
I just read an announcement saying that Mathematica is launching a
version working with Nvidia GPUs. It is claimed that it'd make it
~10-100x faster!
http://www.physorg.com/news146247669.html
I was wondering if you are aware of any development going into this
direction with R?
Thanks f
Hi Chuck,
Thanks a lot for your suggestion.
> You can find all such matches (not just the disjoint ones that gregexpr
> finds) using something like this:
>
>twomatch <-function(x,y) intersect(x+1,y)
>match.list <-
>list(
>which( vec %in% c(3
Dear All,
I have a long string and need to search for regular expressions in
there. However it becomes horribly slow as the string length
increases.
Below is an example: when "i" increases by 5, the time spent increases
by more! (my string is 11,000,000 letters long!)
I also noticed that
- the s
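Chuck Berry's positional trick quoted earlier in this thread can be sketched like this: split the string into a letter vector once, then work with integer positions instead of regexes over one huge string.

```r
set.seed(1)
s <- paste(sample(c("A", "B", "C"), 1e5, replace = TRUE), collapse = "")

vec <- strsplit(s, "")[[1]]            # one element per letter
twomatch <- function(x, y) intersect(x + 1, y)

## positions of every "B" that immediately follows an "A"
hits <- twomatch(which(vec == "A"), which(vec == "B"))
```

Unlike gregexpr, this also finds overlapping matches, and longer patterns can be built by chaining intersect() calls on shifted position sets.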
Dear All,
I hope the title speaks by itself.
I believe that there should be a solution when I see what Mclust is
able to do. However, this problem is quite
particular in that d3 is not known and does not necessarily correspond
to a common distribution (e.g. normal, exponential ...).
However it mu
Hi Duncan,
I'm really stupid --- yes of course!!
Thanks for pointing out the (now) obvious.
All the best,
E
2008/10/21 Duncan Murdoch <[EMAIL PROTECTED]>:
> On 10/21/2008 2:56 PM, Emmanuel Levy wrote:
>>
>> Dear All,
>>
>> I have a distribution of
Dear All,
I have a distribution of values and I would like to assess the
uni/bimodality of the distribution.
I managed to decompose it into two normal distribs using Mclust, and
the BIC criteria is best for two parameters.
However, the problem is that the BIC criteria is not a P-value, which
I wo
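One commonly used formal test of unimodality is Hartigans' dip test; a sketch, assuming the diptest package is installed. (This is a different route from Mclust's BIC comparison, not a p-value for the BIC itself.)

```r
library(diptest)

set.seed(1)
x <- c(rnorm(200, mean = 0), rnorm(200, mean = 4))

dip.test(x)  # a small p-value argues against unimodality
```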
,0.15),type="n",xlab=" ",ylab=" ",axes=F, ylim=c(0,0.4) )
axis(side=1)
for (i in 1:2) {
ni <- v$parameters$pro[i]*dnorm(x0,
mean=as.numeric(v$parameters$mean[i]),sd=1)
lines(x0,ni,col=1)
nt <- nt+ni
}
lines(x0,nt,lwd=3)
segments(my.data,0,my.data,0.02)
Best,
this would be great; is it possible to
somehow force the parameters (e.g variance) to be
greater than a particular threshold?
Thanks,
Emmanuel
2008/10/20 Emmanuel Levy <[EMAIL PROTECTED]>:
> Dear list members,
>
> I am using Mclust in order to deconvolute a distribution that I
&
Dear list members,
I am using Mclust in order to deconvolute a distribution that I
believe is a sum of two gaussians.
First I can make a model:
> my.data.model = Mclust(my.data, modelNames=c("E"), warn=T, G=1:3)
But then, when I try to plot the result, I get the following error:
> mclust1Dplot(
oblem should disappear. It relates to encoding of strings.
>
> D.
>
> Emmanuel Levy wrote:
>> Dear list members,
>>
>> I encountered this problem and the solution pointed out in a previous
>> thread did not work for me.
>> (e.g. install.packages("
Dear list members,
I encountered this problem and the solution pointed out in a previous
thread did not work for me.
(e.g. install.packages("RCurl", repos = "http://www.omegahat.org/R"))
I work with Ubuntu Hardy, and installed R 2.6.2 via apt-get.
I really need RCurl in order to use biomaRt ...
are doing. Can you make a small example
> that shows what you have and what you want?
>
> Is ?split what you are after?
>
> Emmanuel Levy wrote:
>>
>> Dear Peter and Henrik,
>>
>> Thanks for your replies - this helps speed up a bit, but I thought
>> t
l example
> that shows what you have and what you want?
>
> Is ?split what you are after?
>
> Emmanuel Levy wrote:
>>
>> Dear Peter and Henrik,
>>
>> Thanks for your replies - this helps speed up a bit, but I thought
>> there would be something much faster.
>>
gers does
> t4 <- system.time(res <- which(as.integer(x) == match("A", levels(x))))
> print(t4/t1);
> user   system  elapsed
> 0.417  0.000   0.3636364
>
> So, the latter seems to be the fastest way to identify those elements.
>
> My $.02
>
> /Hen
Dear All,
I have a large data frame (270 lines and 14 columns), and I would like to
extract the information in a particular way, illustrated below:
Given a data frame "df":
> col1=sample(c(0,1),10, rep=T)
> names = factor(c(rep("A",5),rep("B",5)))
> df = data.frame(names,col1)
> df
names
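split() and tapply(), suggested elsewhere in this thread ("Is ?split what you are after?"), handle this kind of per-level extraction; a small sketch on the same df:

```r
set.seed(1)
col1 <- sample(c(0, 1), 10, replace = TRUE)
names <- factor(c(rep("A", 5), rep("B", 5)))
df <- data.frame(names, col1)

split(df$col1, df$names)        # one vector of col1 values per level
tapply(df$col1, df$names, sum)  # or one summary value per level
```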
0 - 0.17.
I haven't looked yet at the locfit package as it is not installed, but
I will check it out!
Thanks for helping!
Emmanuel
On 20/03/2008, David Winsemius <[EMAIL PROTECTED]> wrote:
> "Emmanuel Levy" <[EMAIL PROTECTED]> wrote in
> news:[EMAIL PROTECTED]
in
> the base distribution, which will do exactly what you requested.
>
>
> Bert Gunter
> Genentech Nonclinical Statistics
>
>
>
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of Emmanuel Levy
> Sent: Wednesday
Dear All,
I'm sure this is not the first time this question comes up, but I
couldn't find the keywords that would point me to it, so
apologies if this is a re-post.
Basically I've got thousands of points, each depending on three variables:
x, y, and z.
if I do a plot(x,y, col=z), I get somet