[R] R and MySQL
I installed MySQL 5.0.67 and R. I installed RMySQL and added the environment variable MYSQL_HOME, but R still does not want to load the library. It says:

Error : .onLoad failed in loadNamespace() for 'RMySQL', details:
  call: NULL
  error: MYSQL_HOME was set but does not point to a directory
Error: package/namespace load failed for 'RMySQL'

Please help.

--
View this message in context: http://r.789695.n4.nabble.com/R-and-MySQL-tp2340371p2340371.html
Sent from the R help mailing list archive at Nabble.com.

______________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] R and MySQL
Hello De-Jian,

Thanks. That helped. It works now.

On Fri, Aug 27, 2010 at 2:30 AM, De-Jian, ZHAO wrote:
> I am not sure whether you are working under Windows. Hope the following
> message helps.
>
> Using the RMySQL package under Windows
> http://www.stat.berkeley.edu/users/spector/s133/RMySQL_windows.html
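The error complains that MYSQL_HOME is set but does not point to a directory. A minimal check from inside R, assuming a Windows setup as in the linked page; the install path below is only a placeholder, not a known-good location:

```r
# Placeholder path -- substitute the actual MySQL installation directory.
Sys.setenv(MYSQL_HOME = "C:/Program Files/MySQL/MySQL Server 5.0")

# RMySQL's .onLoad fails unless this names an existing directory:
dir.exists(Sys.getenv("MYSQL_HOME"))  # should be TRUE before library(RMySQL)
```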
[R] remove leading slash
Hello,

How do I change this:

> cnt_str
[1] "\002" "\001" "\102"

...to this:

> cnt_str
[1] "2" "1" "102"

Having trouble because of this:

> nchar(cnt_str[1])
[1] 1

Thanks!

Ben
Re: [R] remove leading slash
Thanks for all your help. I did it this way:

> x = sapply(cnt_str, deparse)
> x
       \002        \001        \102
"\"\\002\"" "\"\\001\"" "\"\\102\""
> as.numeric(substr(x, 3, 5))
[1]   2   1 102

...which is a bit of a hack, but gets me where I want to go.

Thanks,
Ben

On Fri, Jun 8, 2012 at 11:56 AM, Duncan Murdoch wrote:
> On 08/06/2012 1:50 PM, Peter Langfelder wrote:
>> On Fri, Jun 8, 2012 at 10:25 AM, David Winsemius wrote:
>> >
>> > "\001" is ASCII cntrl-A, a single character.
>> >
>> > ?Quotes # not the first, second or third place I looked, but I knew
>> > I had seen it before.
>>
>> If you still want to obtain the actual codes, you will be able to get
>> the number using utf8ToInt from package base or AsciiToInt from
>> package sfsmisc. By default, the integer codes will be printed in base
>> 10, though.
>
> You could use
>
> > as.octmode(as.integer(charToRaw("\102")))
> [1] "102"
>
> if you really want the octal versions. Doesn't work so well on "\1022"
> though (because that's two characters long).
>
> Duncan Murdoch
>
>> A roundabout way, assuming you are on a *nix system, would be to
>> dump() cnt_str into a file, say tmp.txt, then run in a shell (or using
>> system()) something like
>>
>> sed --in-place 's/\\//g' tmp.txt
>>
>> to remove the slashes, then use
>>
>> cnt_str_new = read.table("tmp.txt")
>>
>> in R to get the codes back in. I'll let you iron out the details.
>>
>> Peter
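The approaches discussed in this thread can be compared side by side; a small self-contained sketch:

```r
cnt_str <- c("\002", "\001", "\102")   # the original single-character strings

# The deparse/substr hack: turn each string into its escaped source form,
# then read the three digits of the octal escape as a number.
x <- sapply(cnt_str, deparse)
as.numeric(substr(x, 3, 5))            # 2 1 102

# utf8ToInt gives the integer character code, in base 10:
sapply(cnt_str, utf8ToInt)             # 2 1 66  ("\102" is octal for 66, "B")

# as.octmode recovers the octal representation Duncan showed:
as.octmode(as.integer(charToRaw("\102")))  # "102"
```

Note the two conversions answer different questions: the substr hack recovers the octal digits of the escape, while utf8ToInt gives the decimal character code.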
Re: [R] remove leading slash
Okay, Bill smelt something wrong, so I must revise.

This works for large numbers:

prds = sapply(sapply(cnt_str, charToRaw), as.integer)

PS - this also solves an issue I've been having elsewhere...
PPS - Bill, I'm reading binary files... and learning.

thanks!
ben

On Fri, Jun 8, 2012 at 12:16 PM, William Dunlap wrote:
> Can you tell us why you are interested in this mapping?
> I.e., how did the "\001" and "\102" arise and why do you
> want to convert them to the integers 1 and 102?
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
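On the sample data from this thread, the charToRaw() revision gives decimal character codes, which is worth noting next to the earlier octal-digits hack; a sketch:

```r
cnt_str <- c("\002", "\001", "\102")

# charToRaw gives the underlying byte; as.integer converts it to base 10.
# Note the last value: "\102" is the octal escape for decimal 66 ("B"),
# so this differs from the earlier substr() hack, which returned 102.
prds <- sapply(sapply(cnt_str, charToRaw), as.integer)
prds  # 2 1 66
```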
Re: [R] remove leading slash
Yes, I've been messing with that. I've also been using the hexView package. Reading as characters first is just helping me figure out the structure of this binary file. In this situation it really helped. For example:

È \001 \002 20012

This probably isn't how I'll do it in my final draft. I'm now looking for a date or series of dates in the binary file... I'm guessing the dates will be represented as 3 integers, one each for month, day, and year. Any help on strategy here would be great... I'm reading a file with a dbs extension, if that helps.

Thanks!
ben

On Fri, Jun 8, 2012 at 12:44 PM, William Dunlap wrote:
> When reading binary files, it is usually best to use readBin's
> what=, size=, signed=, and endian= arguments to get what you want.
> Reading as characters and then converting them as you are doing
> is a very hard way to do things (and this particular conversion doesn't
> make much sense).
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
[R] strings concatenation and organization (fast)
Hello,

What is the fastest way to do this? It has to be done quite a few times. Basically I have sets of 3 numbers (as characters) and sets of 3 dashes, and I have to store them in named columns. The order of the sets and the column name they fall under is important. The actual numbers and the pattern/order of the sets should be considered random/unpredictable.

Sample data:

vec = c("1","2","3","-","-","-","4","5","6","1","2","3","-","-","-")
rep_vec = rep(vec, times=20)
nms = c("A","B","C","D")

I need to get this:

    A     B     C     D
"123" "---" "456" "123"
"---" "123" "---" "456"
"123" "---" "123" "---"
"456" "123" "---" "123"
"---" "456" "123" "---"

Note: a matrix of 4 columns and 5 rows of concatenated string sets.

Thanks!!

Ben
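One vectorized way to build such a matrix, sketched under the assumption that the values always arrive in complete groups of three:

```r
vec <- c("1","2","3","-","-","-","4","5","6","1","2","3","-","-","-")
rep_vec <- rep(vec, times = 20)
nms <- c("A","B","C","D")

# Fold the vector into a 3-row matrix (each column is a consecutive triple),
# paste each column into one string, then reshape row-wise into 4 columns.
triples <- apply(matrix(rep_vec, nrow = 3), 2, paste, collapse = "")
mat <- matrix(triples, ncol = 4, byrow = TRUE, dimnames = list(NULL, nms))

head(mat)  # first row: "123" "---" "456" "123"
```

With the full rep_vec this gives 25 rows; the 5-row block in the question is the result for one copy of vec.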
Re: [R] strings concatenation and organization (fast)
I'm checking out Phil's solution... so far so good. Thanks!

Yes, 25 not 5 rows, sorry about that.

Rui - I can't modify rep_vec... that's just sample data. I have to start with rep_vec and go from there.

have a good weekend all...

Ben

On Fri, Jun 15, 2012 at 2:51 PM, Rui Barradas wrote:
> Hello,
>
> Try
>
> vec = c("1","2","3","-","-","-","4","5","6","1","2","3","-","-","-")
> nms = c("A","B","C","D")
> rep_vec <- rep(sapply(split(vec, cumsum(rep(c(1, 0, 0), 5))), paste,
>                collapse=""), 4)
> mat <- matrix(rep_vec, nrow=5, byrow=TRUE, dimnames=list(NULL, nms))
> mat
>
> Hope this helps,
>
> Rui Barradas
[R] seek(), skip by bits (not by bytes) in binary file
Hello,

Has a function been built that will skip to a certain bit in a binary file?

As of 2009 the answer was 'no':
http://r.789695.n4.nabble.com/read-binary-file-seek-td900847.html
https://stat.ethz.ch/pipermail/r-help/2009-May/199819.html

If you feel I don't need to (like in the links above), please provide some help. (Note this is my first time working with binary files.)

I'm still working on the script, but here is where I am right now. The for loop is being used because:

1) I have to get down to the correct position, then get the info I want/need. The stuff I am reading through (x) is not fully understood; it is a mix of various chars, floats, integers, etc. of various sizes, so I don't know how many bytes to read in unless I read them bit by bit. (The information and the structure of the information change daily, so I'm skipping over it.)
2) If I skip it all in one readBin() my 'n' value is often up to 20 times too big (I get an error) and/or R won't let me "allocate a vector of size ...". So I split it up into chunks (divide by 20 etc.) and read each chunk, then trash each part that is readBin()'d. Then on the last line I get the data that I want (data1).

Here is my working code:

# I have to read 'junk' bits from the to.read file, which is a huge integer,
# so I divide it up and loop through to.read in parts (jb_part).
divr = 20
mod = junk %% divr
jb_part = as.integer(junk/divr)
jb_part_mod = jb_part + mod # catch the remainder/modulus

# connect to the binary file
to.read = file(paste(dbs_path, "/", dbs_file, sep=""), "rb")
# loop in chunks to where I want to be
for(i in 1:(divr-1)){
  x = readBin(to.read, "raw", n=jb_part, size=1)
  x = NULL # trash the result b/c I don't want it
}
# read a little more to include the remainder/modulus left over by
# dividing by 20 above
x = readBin(to.read, 'raw', n=jb_part_mod, size=1)
x = NULL # trash it

# finally get the data that I want
data1 = readBin(to.read, double(), n=some_number, size=size_to_use)

This works, but it is SLOW! Any ideas on how to get down to the correct bit a bit quicker (pun intended). :)

Thanks for any help!

Ben
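For skipping whole bytes, seek() on the connection avoids the chunked reads entirely; a sketch using a throwaway temp file (note that R connections address bytes, not bits):

```r
# Build a small throwaway file: 100 junk bytes followed by two doubles.
tmp <- tempfile()
con <- file(tmp, "wb")
writeBin(as.raw(1:100), con)     # junk to be skipped
writeBin(c(3.14, 2.72), con)     # the payload we actually want
close(con)

# Jump straight past the junk instead of reading and discarding it.
con <- file(tmp, "rb")
seek(con, where = 100, origin = "start")
data1 <- readBin(con, double(), n = 2, size = 8)
close(con)
data1  # 3.14 2.72
```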
Re: [R] seek(), skip by bits (not by bytes) in binary file
Other people at my firm who know a lot about binary files couldn't figure out the parts of the file that I am skipping over. Part of the issue is that there are several different files (dbs extension files) like this that I have to process, and the structures do change depending on the source of these files.

In short, the problem is over my head and I was hoping to go right to the correct bit and read, which would make things much easier. I guess not... Thanks for your help though.

Anyone else?

thanks,
ben

On Tue, Jun 19, 2012 at 10:10 AM, jim holtman wrote:
> I am not sure why reading through 'bit-by-bit' gets you to where you
> want to be. I assume that the file has some structure, even though it
> may be changing daily. You mentioned the various types of data that
> it might contain; are they all in 'byte' sized chunks? If you really
> have data that begins in the middle of a byte and then extends over
> several bytes, you will have to write some functions that will pull
> out this data and then reconstruct it into an object (e.g., integer,
> numeric, ...) that R understands. Can you provide some more
> definition of what the data actually looks like and how you would find
> the "pattern" of the data. Almost all systems read, at the lowest
> level, byte-sized chunks, and if you really have to get down to the bit
> level to reconstruct the data, then you have to write the unpack/pack
> functions. This can all be done once you understand the structure of
> the data. So some examples would be useful if you want someone to
> propose a solution.
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
Re: [R] seek(), skip by bits (not by bytes) in binary file
This post got me thinking, and this works (fast!) to get the first 10 integers that I want:

# I'm still testing this...
# once I find the value of 'junk' and 'size_to_use', which I already had/have:
to.read = file(file_path_name, "rb")
seek(to.read, where=junk)
data1 = readBin(to.read, integer(), n=10, size=size_to_use)

Seems kinda silly that I didn't think of this before... I looked into using seek() before... Anyway, thanks for helping me think it through.

PS - I still don't know how to use "the 3rd bit of the 71st byte"... or was that an example of how to think about the problem?

Thanks!
Ben

On Tue, Jun 19, 2012 at 11:07 AM, Jeff Newmiller wrote:
> If the structure really changes day by day, then you have to decipher how
> it is constructed in order to find the correct bit to go to.
>
> If you think you already know which bit to go to, then the way you know is
> "the 3rd bit of the 71st byte", which means that the existing seek function
> should be sufficient to get that byte and pick apart the bits to get the
> ones you want.
>
> I recommend using the hexView package for this kind of task.
>
> Jeff Newmiller
> Sent from my phone. Please excuse my brevity.
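Jeff's "3rd bit of the 71st byte" can be acted on directly with seek() plus rawToBits(); a sketch with a throwaway file (rawToBits() returns the eight bits least-significant first):

```r
# Throwaway file whose 71st byte is 0x04, i.e. only bit 3 is set.
tmp <- tempfile()
writeBin(as.raw(c(rep(0, 70), 0x04)), tmp)

con <- file(tmp, "rb")
seek(con, where = 70, origin = "start")   # byte 71 sits at 0-based offset 70
byte71 <- readBin(con, "raw", n = 1)
close(con)

bits <- rawToBits(byte71)   # 8 raw values; bits[1] is the least significant
as.integer(bits[3])         # 1
```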
[R] activate console
Hello,

After I plot something, how do I reactivate the console (and not the plot window) so I don't have to click on the console each time to go to the next command?

Example that does not work:

fun = function(x){ plot(x); dev.set(dev.prev())}
fun(1:4)

...and another that does not work:

fun = function(x){ plot(x); dev.set(NULL)}
fun(1:4)

Again, by 'not work' I mean I can't seem to give control back to the console after I plot. I didn't find anything online.

thanks,
Ben
Re: [R] activate console
Perfect, thanks!

Naturally, now I need to resize the console so it doesn't cover my new plots. I'd like to resize it on the fly (from within the function), then reset the size to its previous size. curConsoleDims() and resize.console() are made-up functions, but they demonstrate what I am trying to do:

fun = function(x){
  plot(x)
  bringToTop(-1) # bring console to top (activate console) - thanks Bert!
  dims = curConsoleDims()
  on.exit(resize.console(dims)) # restore the previous size on exit
  resize.console(width=100, height=100)
}
fun(1:4)

Thanks!
Ben

On Wed, Nov 16, 2011 at 10:37 AM, Bert Gunter wrote:
> ??focus ## admittedly, not the first keyword that comes to mind
> ?bringToTop
>
> -- Bert
[R] zeros to NA's - faster
Hello,

Is there a faster way to do this? Basically, I'd like to NA out all values in a column of all_data if there are no 1's in the same column of the other matrix, iu. Put another way, I want to replace the values in an all_data column if the values in the same column of iu are all 0.

This is pretty slow for me, but works:

all_data = matrix(c(1:9), 3, 3)
colnames(all_data) = c('a','b','c')
> all_data
     a b c
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9

iu = matrix(c(1,0,0,0,1,0,0,0,0), 3, 3)
colnames(iu) = c('a','b','c')
> iu
     a b c
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 0

fun = function(x, d){
  vals = d[,x]
  i = iu[,x]
  if(!any(i == 1)){
    vals = rep(NA, times=length(vals))
  }
  vals
}
all_data = sapply(colnames(iu), fun, all_data)
> all_data
     a b  c
[1,] 1 4 NA
[2,] 2 5 NA
[3,] 3 6 NA

...again, this works, but is slow for a large number of columns. Have anything faster?

Thanks,
ben
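A fully vectorized alternative sketch: colSums() finds the all-zero columns of iu in one pass, and a single indexed assignment blanks them, with no per-column function calls:

```r
all_data <- matrix(1:9, 3, 3, dimnames = list(NULL, c("a","b","c")))
iu <- matrix(c(1,0,0, 0,1,0, 0,0,0), 3, 3,
             dimnames = list(NULL, c("a","b","c")))

# A column survives only if iu has at least one 1 in it.
all_data[, colSums(iu == 1) == 0] <- NA
all_data  # column c is now all NA; a and b are untouched
```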
[R] variable types - logistic regression
Hello, Is there an example out there that shows how to treat each of the predictor variable types when doing logistic regression in R? Something like this:

glm(y~x1+x2+x3+x4, data=mydata, family=binomial(link="logit"), na.action=na.pass)

I'm drawing mostly from: http://www.ats.ucla.edu/stat/r/dae/logit.htm ...but there are only two types of variable in the example given. I'm wondering if the answer is that easy or if I have to consider more with different types of variables. It seems like factor() is doing a lot of the organization for me. I will need to understand how to perform logistic regression in R on all data types, all in the same model (potentially). As it stands, I think I can solve all of my data type issues with:

factor(x, ordered=TRUE)  ...for all discrete ordinal variables
factor(x, ordered=FALSE) ...for all discrete nominal variables

...and do nothing for everything else. I'm pretty sure it's not that simple because of some other posts I've seen, but I haven't seen a post that discusses ALL data types in logistic regression. Here is what I think will work at this point:

glm(y ~ all_other_vars + factor(disc_ord_var, ordered=TRUE) + factor(disc_nom_var, ordered=FALSE), data=mydata, family=binomial(link="logit"), na.action=na.pass)

I'm also looking for any best-practices help as well. I'm new'ish to R... and oddly enough I haven't had the pleasure of doing much regression in R yet. Regards, Ben
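A minimal runnable sketch of mixing the three variable types in one logistic regression (the data frame and variable names here are invented for illustration, not from the thread):

```r
set.seed(1)
mydata <- data.frame(
  y    = rbinom(100, 1, 0.5),
  cont = rnorm(100),                                   # continuous: use as-is
  nom  = factor(sample(c("a", "b", "c"), 100, TRUE)),  # nominal: unordered factor
  ord  = factor(sample(c("lo", "mid", "hi"), 100, TRUE),
                levels = c("lo", "mid", "hi"),
                ordered = TRUE)                        # ordinal: ordered factor
)

# glm picks contrasts automatically: treatment contrasts for the nominal
# factor, polynomial contrasts for the ordered one
fit <- glm(y ~ cont + nom + ord, data = mydata, family = binomial(link = "logit"))
summary(fit)
```

Setting the factor types in the data frame (rather than inline in the formula) keeps the formula readable and makes predict() behave consistently on new data.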
[R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred
Sorry if this is a duplicate: this is a re-post because the pdf's mentioned below did not go through. Hello, I'm new'ish to R, and very new to glm. I've read a lot about my issue:

Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

...including:
http://tolstoy.newcastle.edu.au/R/help/05/07/7759.html
http://r.789695.n4.nabble.com/glm-fit-quot-fitted-probabilities-numerically-0-or-1-occurred-quot-td849242.html
(Note that I never found "MASS4 pp.197-8"; however, Ted's post was quite helpful.)

This is a common question, sorry. Because it is a common issue, I am posting everything I know about it and how I think I am not falling into the same trap as the others (but I must be, for some reason I am not yet aware of). From the two links above I gather that my warning "glm.fit: fitted probabilities numerically 0 or 1 occurred" arises from a "perfect fit" situation (i.e. the issue where all the high-value x's (predictor variables) are Y=1 (response=1), or the other way around). I don't feel my data has this issue. Please point out how it does! The list post instructions state that I can attach pdf's, so I attached plots of my data right before I do the call to glm.
The attachments are plots of my data stored in variable l_yx (as can be seen in the axis names):

My response (vertical axis) by row index (horizontal axis): plot(l_yx[,1],type='h')
My predictor variable (vertical axis) by row index (horizontal axis): plot(l_yx[,2],type='h')

So here is more info on my data frame/data (in case you can't see my pdf attachments):

> unique(l_yx[,1])
[1] 0 1
> mean(l_yx[,2])
[1] 0.01123699
> max(l_yx[,2])
[1] 14.66518
> min(l_yx[,2])
[1] 0
> attributes(l_yx)
$dim
[1] 690303      2
$dimnames
$dimnames[[1]]
NULL
$dimnames[[2]]
[1] "y" "x"

With the above data I do:

> l_logit = glm(y~x, data=as.data.frame(l_yx), family=binomial(link="logit"))
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

Why am I getting this warning when I have data points of varying values for y=1 and y=0? In other words, I don't think I have the linear separation issue discussed in one of the links I provided.

PS - Then I do this and I get an odds ratio of a crazy size:

> l_sm = summary(l_logit) # coef pval is $coefficients[8], log odds $coefficients[2]
> l_exp_coef = exp(l_logit$coefficients)[2] # exponentiate the coefficients
> l_exp_coef
       x
3161.781

So for one unit increase in the predictor variable I get a 3160.781% (3161.781 - 1 = 3160.781) increase in odds? That can't be correct either. How do I correct for this issue? (I tried multiplying the predictor variables by a constant and the odds ratio goes down, but the warning above still persists, and shouldn't the odds ratio be independent of the predictor variable's scale?) Thank you for your help! Ben
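For comparison, here is a tiny synthetic example (not the poster's data) where the warning genuinely comes from complete separation: y is perfectly predicted by a threshold on x, so the fitted probabilities are driven to exactly 0 and 1:

```r
x <- 1:10
y <- c(rep(0, 5), rep(1, 5))  # y == 1 exactly when x > 5: complete separation

# glm warns here: "glm.fit: fitted probabilities numerically 0 or 1 occurred"
# (typically together with "algorithm did not converge")
fit <- glm(y ~ x, family = binomial(link = "logit"))

range(fitted(fit))  # fitted probabilities pushed to the 0/1 boundaries
```

Comparing range(x[y==0]) with range(x[y==1]) on real data, as suggested later in this thread, is the quick way to check whether the two groups overlap on x.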
Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred
[... continuation of the d[1:500,2] listing: several hundred more small non-negative values, roughly 5e-05 to 1.25e+00, omitted ...]

Thanks for your help, Ben

On Thu, Dec 1, 2011 at 11:55 AM, peter dalgaard wrote:
>
> On Dec 1, 2011, at 18:54 , Ben quant wrote:
>
> > Sorry if this is a duplicate: This is a re-post because the pdf's mentioned
> > below did not go through.
>
> Still not there. Sometimes it's because your mailer doesn't label them
> with the appropriate mime-type (e.g. as application/octet-stream, which is
> "arbitrary binary"). Anyways, see below
>
> [snip]
>
> > With the above data I do:
> > > l_logit = glm(y~x, data=as.data.frame(l_yx), family=binomial(link="logit"))
[message truncated in the archive]
Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred
Here you go:

> attach(as.data.frame(l_yx))
> range(x[y==1])
[1] -22500.746.
> range(x[y==0])
[1] -10076.5303653.0228

How do I know what is acceptable? Also, here are the screen shots of my data that I tried to send earlier (two screen shots, two pages): http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf Thank you, Ben

On Thu, Dec 1, 2011 at 2:24 PM, peter dalgaard wrote:
>
> On Dec 1, 2011, at 21:32 , Ben quant wrote:
>
> > Thank you for the feedback, but my data looks fine to me. Please tell me
> > if I'm not understanding.
>
> Hum, then maybe it really is a case of a transition region being short
> relative to the range of your data. Notice that the warning is just that: a
> warning. I do notice that the distribution of your x values is rather
> extreme -- you stated a range of 0--14 and a mean of 0.01. And after all,
> an odds ratio of 3000 per unit is only a tad over a doubling per 0.1 units.
>
> Have a look at range(x[y==0]) and range(x[y==1]).
>
> > I followed your instructions and here is a sample of the first 500
> > values: (info on 'd' is below that)
> >
> > > d <- as.data.frame(l_yx)
> > > x = with(d, y[order(x)])
> > > x[1:500] # I have 1's and 0's dispersed throughout
> > [... 500 values, mostly 0's with 1's scattered throughout, omitted ...]
> >
> > # I get the warning still
> > > l_df = as.data.frame(l_yx)
> > > l_logit = glm(y~x, data=l_df, family=binomial(link="logit"))
> >
> > Warning message:
> > glm.fit: fitted probabilities numerically 0 or 1 occurred
> >
> > # some info on 'd' above:
> >
> > > d[1:500,1]
> > [... 500 values, all 0, omitted ...]
> > > d[1:500,2]
> > [1] 3.023160e-03 7.932130e-02 0.00e+00 1.779657e-02 1.608374e-01 ...
> > [... remaining values, small non-negative numbers, truncated in the archive ...]
Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred
Oops! Please ignore my last post. I mistakenly gave you different data I was testing with. This is the correct data:

Here you go:

> attach(as.data.frame(l_yx))
> range(x[y==0])
[1] 0.0 14.66518
> range(x[y==1])
[1] 0.0 13.49791

How do I know what is acceptable? Also, here are the screen shots of my data that I tried to send earlier (two screen shots, two pages): http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf Thank you, Ben

[quoted text omitted -- it repeats the previous message (the incorrect ranges) and Peter Dalgaard's reply with the 500-value data dumps verbatim]
Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred
I'm not proposing this as a permanent solution, just investigating the warning. I zeroed out the three outliers and received no warning. Can someone tell me why I am getting no warning now?

I did this 3 times to get rid of the 3 outliers:

mx_dims = arrayInd(which.max(l_yx), dim(l_yx))
l_yx[mx_dims] = 0

Now this does not produce a warning:

l_logit = glm(y~x, data=as.data.frame(l_yx), family=binomial(link="logit"))

Can someone tell me why this occurred? Also, again, here are the screen shots of my data that I tried to send earlier (two screen shots, two pages): http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf Thank you for your help, Ben

[quoted text omitted -- it repeats the two previous messages (the corrected ranges and Peter Dalgaard's reply with the 500-value data dumps) verbatim]
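The repeated which.max/arrayInd step above can be done in one pass; a sketch (assuming l_yx is the two-column y/x matrix from this thread, with the outliers in the x column):

```r
k <- 3  # number of outliers to zero out

# row indices of the k largest x values, found in a single sort
ord <- order(l_yx[, 2], decreasing = TRUE)[seq_len(k)]
l_yx[ord, 2] <- 0
```

This is equivalent to calling which.max() k times, but only scans (sorts) the column once.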
Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred
Thank you so much for your help. The data I am using is the last file called l_yx.RData at this link (the second file contains the plots from earlier): http://scientia.crescat.net/static/ben/

Seems like the warning went away with pmin(x,1), but now the OR is over 15k. If I multiply my x's by 1000 I get a much more realistic OR. So I guess this brings me to a much different question: aren't OR's comparable between factors/data? In this case they don't seem to be. However, with different data the OR's only change a very small amount (+8.0e-4) when I multiply the x's by 1000. I don't understand. Anyways, here is a run with the raw data and a run with your suggestion (pmin(x,1)) that removed the warning:

> l_logit = glm(y~x, data=as.data.frame(l_yx), family=binomial(link="logit"))
> l_logit

Call: glm(formula = y ~ x, family = binomial(link = "logit"), data = as.data.frame(l_yx))

Coefficients:
(Intercept)            x
     -2.293        8.059

Degrees of Freedom: 690302 Total (i.e. Null); 690301 Residual
Null Deviance: 448800
Residual Deviance: 447100 AIC: 447100

> l_exp_coef = exp(l_logit$coefficients)[2]
> l_exp_coef
       x
3161.781
> dim(l_yx)
[1] 690303      2
> l_yx = cbind(l_yx[,1],pmin(l_yx[,2],1))
> dim(l_yx)
[1] 690303      2
> colnames(l_yx) = c('y','x')
> mean(l_yx[,2])
[1] 0.01117248
> range(l_yx[,2])
[1] 0 1
> head(l_yx[,2])
[1] 0.00302316 0.07932130 0.00000000 0.01779657 0.16083735 0.00000000
> unique(l_yx[,1])
[1] 0 1
> l_logit = glm(y~x, data=as.data.frame(l_yx), family=binomial(link="logit"))
> l_logit

Call: glm(formula = y ~ x, family = binomial(link = "logit"), data = as.data.frame(l_yx))

Coefficients:
(Intercept)            x
     -2.312        9.662

Degrees of Freedom: 690302 Total (i.e. Null); 690301 Residual
Null Deviance: 448800
Residual Deviance: 446800 AIC: 446800

> l_exp_coef = exp(l_logit$coefficients)[2]
> l_exp_coef
       x
15709.52

Thanks, Ben

On Thu, Dec 1, 2011 at 4:32 PM, peter dalgaard wrote:
>
> On Dec 1, 2011, at 23:43 , Ben quant wrote:
>
> > I'm not proposing this as a permanent solution, just investigating the
> > warning. I zeroed out the three outliers and received no warning. Can
> > someone tell me why I am getting no warning now?
>
> It's easier to explain why you got the warning before. If the OR for a one
> unit change is 3000, the OR for a 14 unit change is on the order of 10^48
> and that causes over/underflow in the conversion to probabilities.
>
> I'm still baffled at how you can get that model fitted to your data,
> though. One thing is that you can have situations where there are fitted
> probabilities of one corresponding to data that are all one and/or fitted
> zeros where data are zero, but you seem to have cases where you have both
> zeros and ones at both ends of the range of x. Fitting a zero to a one or
> vice versa would make the likelihood zero, so you'd expect that the
> algorithm would find a better set of parameters rather quickly. Perhaps the
> extremely large number of observations that you have has something to do
> with it?
>
> You'll get the warning if the fitted zeros or ones occur at any point of
> the iterative procedure. Maybe it isn't actually true for the final model,
> but that wouldn't seem consistent with the OR that you cited.
>
> Anyways, your real problem lies with the distribution of the x values. I'd
> want to try transforming it to something more sane. Taking logarithms is
> the obvious idea, but you'd need to find out what to do about the zeros --
> perhaps log(x + 1e-4)? Or maybe just cut the outliers down to size with
> pmin(x,1).

[remainder of quoted text omitted -- it repeats Ben's previous message verbatim]
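Peter's two suggested transforms can be sketched as follows (a hedged illustration on synthetic skewed data, since l_yx itself is not reproduced here; the offset 1e-4 is his suggestion for handling the exact zeros):

```r
set.seed(42)
x <- rexp(1000, rate = 100)   # heavily right-skewed, like the poster's x
x[sample(1000, 50)] <- 0      # with some exact zeros, as in the thread

x_log  <- log(x + 1e-4)  # log transform; the small offset keeps the zeros finite
x_pmin <- pmin(x, 1)     # or simply cap the outliers at 1

range(x_log)   # compressed, roughly symmetric scale
range(x_pmin)  # bounded to [0, 1]
```

Either transform shrinks the extreme tail that was driving the fitted probabilities to 0/1, at the cost of changing what "one unit of x" means for the odds ratio.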
[R] R on the cloud - Windows to Linux
Hello, I'm working with the gam function, and due to the amount of data I am working with it takes a long time to run. I looked at the tips for getting it to run faster, but none are without unacceptable side effects. That is the real problem. I have accepted that gam will run a long time. I will be running gam many times for many different models. To make gam usable I am looking at splitting the work up and putting all of it on an Amazon EC2 cloud. I have a Windows machine and I'm planning on running Linux EC2 instances via Amazon. I have R running on one EC2 instance now. Now I'm looking at:

1) dividing up the processing
2) creating/terminating instances via R
3) porting code and data to the cloud
4) producing plots on the cloud and getting them back on my (Windows) computer for review
5) doing all of the above programmatically (overnight)

I am new'ish to R, brand new to the cloud, and new to Linux (but I have access to a Linux expert at my company). I'm looking for 1) guidance so I am headed in the best direction from the start, 2) any gotchas I can learn from, and 3) package suggestions. Thank you very much for your assistance! Regards, Ben
Re: [R] R on the cloud - Windows to Linux
Thank you! I subscribed to R-hpc, thanks. I replied and I'm waiting for list approval. I am willing to work, but I'm not sure what to do to get these to work. I literally started using the cloud yesterday and R a couple months ago. I don't know where to start because it looks like rzmq is not available for Windows, and it looks like AWS.tools and deathstar depend on rzmq, so by dependency they seem unavailable to me since I have a local Windows box. Or do I have this wrong? I want to work, but where do I start? Will using a local Windows box continue to be an issue as I progress with R and EC2? I've run into several hurdles already, including some that are not associated with the cloud. Thank you for your help! Ben

On Wed, Dec 7, 2011 at 7:00 PM, Whit Armstrong wrote:
> subscribe to R-hpc.
>
> and check out these:
> https://github.com/armstrtw/rzmq
> https://github.com/armstrtw/AWS.tools
> https://github.com/armstrtw/deathstar
>
> and this:
> http://code.google.com/p/segue/
>
> If you're willing to work, you can probably get deathstar to work
> using a local windows box and remote linux nodes.
>
> -Whit
>
> [quoted text of Ben's original message omitted -- it repeats the previous post verbatim]
Re: [R] R on the cloud - Windows to Linux
Due to my lack of experience with R and the cloud, I am leery of attempting any patch development for Windows compatibility. I think it would be cool to contribute at some point, but I am still too new. Anyway, I'm looking into using my company's Linux server via PuTTY as my local machine (as you suggest). Thanks! Ben On Thu, Dec 8, 2011 at 9:44 AM, Whit Armstrong wrote: > > I don't know where to start because, it looks like rzmq is not available > for > > Windows and it looks like AWS.tools and deathstar depend on rzmq, so by > > Hence my reference to work. Patches welcome. > > > Will using a > > local Windows box continue to be an issue as I progress with R and EC2? > I've > > run into several hurdles already, including some that are not associated > > with the cloud. > > My opinion only, but if you want to use big data and HPC, then use Linux. > > If you move your data into S3, you can simply boot up a micro Linux > instance in the cloud and do your development there (I think usage of > a micro instance is free w/ a new AWS account). > > If you have local Linux servers available, then even better. > > -Whit
[R] gam, what is the function(s)
Hello, I'd like to understand 'what' is predicting the response for library(mgcv) gam? For example: library(mgcv) fit <- gam(y~s(x),data=as.data.frame(l_yx),family=binomial) xx <- seq(min(l_yx[,2]),max(l_yx[,2]),len=101) plot(xx,predict(fit,data.frame(x=xx),type="response"),type="l") I want to see the generalized function(s) used to predict the response that is plotted above. In other words, f(x) = {[what?]}. I'm new to gam and relatively new to R. I did read ?gam, but I didn't see what I wanted. Thanks, Ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] gam, what is the function(s)
Thank you Simon. I already ordered your book. Regards, Ben On Fri, Dec 9, 2011 at 10:49 AM, Simon Wood wrote: > See help("mgcv-FAQ"), item 2. > > best, > Simon > > > On 09/12/11 15:05, Ben quant wrote: > >> Hello, >> >> I'd like to understand 'what' is predicting the response for library(mgcv) >> gam? >> >> For example: >> >> library(mgcv) >> fit <- gam(y~s(x),data=as.data.frame(l_yx),family=binomial) >> xx <- seq(min(l_yx[,2]),max(l_yx[,2]),len=101) >> plot(xx,predict(fit,data.frame(x=xx),type="response"),type="l") >> >> I want to see the generalized function(s) used to predict the response >> that >> is plotted above. In other words, f(x) = {[what?]}. I'm new to gam and >> relatively new to R. I did read ?gam, but I didn't see what I wanted. >> >> Thanks, >> >> Ben >> >> [[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >> > > -- > Simon Wood, Mathematical Science, University of Bath BA2 7AY UK > +44 (0)1225 386603 http://people.bath.ac.uk/sw283
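For readers without the book to hand: the FAQ item Simon points to boils down to the fitted f(x) being a weighted sum of spline basis functions, beta' * b(x), and mgcv exposes the evaluated basis via predict(..., type = "lpmatrix"). A hedged sketch on invented toy data (not the poster's l_yx):

```r
## Hedged sketch: what gam() actually predicts with, on toy data.
library(mgcv)
set.seed(1)
x <- runif(200)
y <- rbinom(200, 1, plogis(sin(2 * pi * x)))
fit <- gam(y ~ s(x), family = binomial)

xx  <- seq(min(x), max(x), len = 5)
Xp  <- predict(fit, data.frame(x = xx), type = "lpmatrix")   # basis functions at xx
eta <- as.numeric(Xp %*% coef(fit))                          # f(xx) on the link scale
all.equal(eta, as.numeric(predict(fit, data.frame(x = xx)))) # TRUE
plogis(eta)   # matches predict(..., type = "response")
```

So "f(x) = {[what?]}" is answered by the rows of the lpmatrix times coef(fit), pushed through the inverse link (here plogis) for the response scale.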
Re: [R] R on the cloud - Windows to Linux
: expected primary-expression before '>' token interface.cpp:202: error: expected `)' before ';' token interface.cpp:204: error: 'msg' was not declared in this scope interface.cpp:210: error: 'msg' was not declared in this scope interface.cpp: In function 'SEXPREC* receiveInt(SEXPREC*)': interface.cpp:227: error: 'zmq' has not been declared interface.cpp:227: error: expected `;' before 'msg' interface.cpp:228: error: 'zmq' has not been declared interface.cpp:228: error: 'socket' was not declared in this scope interface.cpp:228: error: expected type-specifier before 'zmq' interface.cpp:228: error: expected `>' before 'zmq' interface.cpp:228: error: expected `(' before 'zmq' interface.cpp:228: error: 'zmq' has not been declared interface.cpp:228: error: expected primary-expression before '>' token interface.cpp:228: error: expected `)' before ';' token interface.cpp:230: error: 'msg' was not declared in this scope interface.cpp:235: error: 'msg' was not declared in this scope interface.cpp:240: error: 'msg' was not declared in this scope interface.cpp: In function 'SEXPREC* receiveDouble(SEXPREC*)': interface.cpp:250: error: 'zmq' has not been declared interface.cpp:250: error: expected `;' before 'msg' interface.cpp:251: error: 'zmq' has not been declared interface.cpp:251: error: 'socket' was not declared in this scope interface.cpp:251: error: expected type-specifier before 'zmq' interface.cpp:251: error: expected `>' before 'zmq' interface.cpp:251: error: expected `(' before 'zmq' interface.cpp:251: error: 'zmq' has not been declared interface.cpp:251: error: expected primary-expression before '>' token interface.cpp:251: error: expected `)' before ';' token interface.cpp:253: error: 'msg' was not declared in this scope interface.cpp:258: error: 'msg' was not declared in this scope interface.cpp:263: error: 'msg' was not declared in this scope make: *** [interface.o] Error 1 ERROR: compilation failed for package 'rzmq' * removing
'/home/bnachtrieb/R/x86_64-redhat-linux-gnu-library/2.13/rzmq' The downloaded packages are in '/tmp/RtmpoTdDMm/downloaded_packages' Warning message: In install.packages("rzmq", dependencies = TRUE) : installation of package 'rzmq' had non-zero exit status Thank you for your help! Ben On Wed, Dec 7, 2011 at 7:00 PM, Whit Armstrong wrote: > subscribe to R-hpc. > > and check out these: > https://github.com/armstrtw/rzmq > https://github.com/armstrtw/AWS.tools > https://github.com/armstrtw/deathstar > > and this: > http://code.google.com/p/segue/ > > If you're willing to work, you can probably get deathstar to work > using a local Windows box and remote Linux nodes. > > -Whit > > > On Wed, Dec 7, 2011 at 6:02 PM, Ben quant wrote: > > Hello, > > > > I'm working with the gam function, and due to the amount of data I am > > working with, it is taking a long time to run. I looked at the tips to get > > it to run faster, but none have acceptable side effects. That is the real > > problem. > > > > I have accepted that gam will run a long time. I will be running gam many > > times for many different models. To make gam usable I am looking at > > splitting the work up and putting all of it on an Amazon EC2 cloud. I have > > a Windows machine and I'm (planning on) running Linux EC2 instances via > > Amazon. > > > > I have R running on one EC2 instance now. Now I'm looking to: > > > > 1) divide the processing > > 2) create/terminate instances via R > > 3) port code and data to the cloud > > 4) produce plots on the cloud and get them back on my (Windows) > > computer for review > > 5) do all of the above programmatically (overnight) > > > > I am newish to R, brand new to the cloud, and new to Linux (but I > > have access to a Linux expert at my company). I'm looking for 1) guidance > > so I am headed in the best direction from the start, 2) any gotchas I can > > learn from, 3) package suggestions. > > > > Thank you very much for your assistance!
> > > > Regards, > > > > Ben > > > >[[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] folders of path - platform independent
Hello, I'm attempting to get the folders of a path in a robust way (platform independent, format independent). It has to run on Windows and Linux and tolerate different formats. For these: (The paths don't actually exist in Linux but you get the idea.) Windows: file_full_path = "C://Program Files//R//R-2.13.1//NEWS.pdf" file_full_path = "C:\Program Files\R\R-2.13.1\NEWS.pdf" Linux: file_full_path = "~/Program FilesR\R-2.13.1\NEWS.pdf" file_full_path = "/home/username/Program FilesR\R-2.13.1\NEWS.pdf" I would get for Windows: "C", "Program Files", "R", "R-2.13.1","NEWS.pdf" I would get for Linux: "home","username", "Program Files", "R", "R-2.13.1","NEWS.pdf" (The drive and/or home/username aren't necessary, but would be nice to have. Also, that file name isn't necessary, but would be nice.) Thank you for your help, Ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] folders of path - platform independent (repost)
Hello, (sorry re-posting due to typo) I'm attempting to get the folders of a path in a robust way (platform independent, format independent). It has to run on Windows and Linux and tolerate different formats. For these: (The paths don't actually exist in Linux but you get the idea.) Windows: file_full_path = "C://Program Files//R//R-2.13.1//NEWS.pdf" file_full_path = "C:\Program Files\R\R-2.13.1\NEWS.pdf" Linux: file_full_path = "~/Program FilesR/R-2.13.1/NEWS.pdf" file_full_path = "/home/username/Program FilesR/R-2.13.1/NEWS.pdf" I would get for Windows: "C", "Program Files", "R", "R-2.13.1","NEWS.pdf" I would get for Linux: "home","username", "Program Files", "R", "R-2.13.1","NEWS.pdf" (The drive and/or home/username aren't necessary, but would be nice to have. Also, that file name isn't necessary, but would be nice.) Thank you for your help, Ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] folders of path - platform independent (repost)
Excellent! Thanks, Ben On Wed, Dec 28, 2011 at 2:37 PM, Duncan Murdoch wrote: > On 11-12-28 4:30 PM, Ben quant wrote: > >> Hello, (sorry re-posting due to typo) >> >> I'm attempting to get the folders of a path in a robust way (platform >> independent, format independent). It has to run on Windows and Linux and >> tolerate different formats. >> >> For these: (The paths don't actually exist in Linux but you get the idea.) >> >> Windows: >> file_full_path = "C://Program Files//R//R-2.13.1//NEWS.pdf" >> file_full_path = "C:\Program Files\R\R-2.13.1\NEWS.pdf" >> Linux: >> file_full_path = "~/Program FilesR/R-2.13.1/NEWS.pdf" >> file_full_path = "/home/username/Program FilesR/R-2.13.1/NEWS.pdf" >> >> I would get for Windows: "C", "Program Files", "R", "R-2.13.1","NEWS.pdf" >> I would get for Linux: "home","username", "Program Files", "R", >> "R-2.13.1","NEWS.pdf" >> (The drive and/or home/username aren't necessary, but would be nice to >> have. Also, that file name isn't necessary, but would be nice.) >> >> Thank you for your help, >> >> > If you use the normalizePath() function with winslash="/", then all > current platforms will return a path using "/" as the separator, so you > could do something like > > strsplit(normalizePath(filename, winslash="/"), "/")[[1]] > > You need to be careful with normalizePath: at least on Windows, it will > not necessarily do what you wanted if the filename doesn't exist. > > Duncan Murdoch
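For paths that may not exist on disk (the case Duncan warns about, since normalizePath() can misbehave on Windows for nonexistent files), a hedged alternative is to walk the path with dirname()/basename(). The helper name below is invented; the drive or root component is dropped, which the original post said was optional.

```r
## Hedged sketch: split a path into components without touching the disk.
path_parts <- function(p) {
  parts <- character(0)
  while (p != dirname(p)) {          # dirname() is a fixed point at the root
    parts <- c(basename(p), parts)   # prepend, so order is outermost-first
    p <- dirname(p)
  }
  parts
}

path_parts("/home/username/Program Files/R-2.13.1/NEWS.pdf")
# "home" "username" "Program Files" "R-2.13.1" "NEWS.pdf"
```

Because it never calls normalizePath(), this accepts hypothetical paths, though it assumes the input already uses a single consistent separator.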
Re: [R] folders of path - platform independent (repost)
One quick follow-up on reversing your example. Is there an easy way to get the file.path separator for the platform? file.path("","") seems to be the only way to do it. So if filename is a valid file path, this will return the folders, drive, and file name in vector form regardless of the platform: folders = strsplit(normalizePath(filename, winslash="/"), "/")[[1]] This will undo the above regardless of the platform: paste(folders,collapse=file.path("","")) Thanks again for your help, Duncan! Ben > On Wed, Dec 28, 2011 at 2:37 PM, Duncan Murdoch > wrote: > >> On 11-12-28 4:30 PM, Ben quant wrote: >> >>> Hello, (sorry re-posting due to typo) >>> >>> I'm attempting to get the folders of a path in a robust way (platform >>> independent, format independent). It has to run on Windows and Linux and >>> tolerate different formats. >>> >>> For these: (The paths don't actually exist in Linux but you get the >>> idea.) >>> >>> Windows: >>> file_full_path = "C://Program Files//R//R-2.13.1//NEWS.pdf" >>> file_full_path = "C:\Program Files\R\R-2.13.1\NEWS.pdf" >>> Linux: >>> file_full_path = "~/Program FilesR/R-2.13.1/NEWS.pdf" >>> file_full_path = "/home/username/Program FilesR/R-2.13.1/NEWS.pdf" >>> >>> I would get for Windows: "C", "Program Files", "R", "R-2.13.1","NEWS.pdf" >>> I would get for Linux: "home","username", "Program Files", "R", >>> "R-2.13.1","NEWS.pdf" >>> (The drive and/or home/username aren't necessary, but would be nice to >>> have. Also, that file name isn't necessary, but would be nice.) >>> >>> Thank you for your help, >>> >>> >> If you use the normalizePath() function with winslash="/", then all >> current platforms will return a path using "/" as the separator, so you >> could do something like >> >> strsplit(normalizePath(filename, winslash="/"), "/")[[1]] >> >> You need to be careful with normalizePath: at least on Windows, it will >> not necessarily do what you wanted if the filename doesn't exist.
>> >> Duncan Murdoch >> > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] folders of path - platform independent (repost)
Oops. I guess I stopped reading about the fsep param when I saw PATH and R_LIB because I'm not interested in those. I didn't get to the part I was interested in. Thanks! Ben On Wed, Dec 28, 2011 at 5:33 PM, David Winsemius wrote: > > On Dec 28, 2011, at 5:57 PM, Ben quant wrote: > > One quick follow-up on reversing your example. Is there an easy way to get >> the file.path separator for the platform? file.path("","") seems to be >> the only way to do it. >> > > I don't get it. Did you look at ?file.path ? Its default call shows fsep= > > > .Platform$file.sep > [1] "/" > > ?.Platform > > -- > David. > > >> So if filename is a valid file path, this will return the folders, drive, >> and file name in vector form regardless of the platform: >> folders = strsplit(normalizePath(filename, winslash="/"), "/")[[1]] >> This will undo the above regardless of the platform: >> paste(folders,collapse=file.path("","")) >> > > > >> Thanks again for your help Duncan! >> >> Ben >> >> >> On Wed, Dec 28, 2011 at 2:37 PM, Duncan Murdoch < >>> murdoch.dun...@gmail.com> wrote: >>> >>> On 11-12-28 4:30 PM, Ben quant wrote: >>>> >>>>> Hello, (sorry re-posting due to typo) >>>>> >>>>> I'm attempting to get the folders of a path in a robust way (platform >>>>> independent, format independent). It has to run on Windows and Linux >>>>> and >>>>> tolerate different formats. >>>>> >>>>> For these: (The paths don't actually exist in Linux but you get the >>>>> idea.) >>>>> >>>>> Windows: >>>>> file_full_path = "C://Program Files//R//R-2.13.1//NEWS.pdf" >>>>> file_full_path = "C:\Program Files\R\R-2.13.1\NEWS.pdf" >>>>> Linux: >>>>> file_full_path = "~/Program FilesR/R-2.13.1/NEWS.pdf" >>>>> file_full_path = "/home/username/Program FilesR/R-2.13.1/NEWS.pdf" >>>>> >>>>> I would get for Windows: "C", "Program Files", "R", >>>>> "R-2.13.1","NEWS.pdf" >>>>> I would get for Linux: "home","username", "Program Files", "R", >>>>> "R-2.13.1","NEWS.pdf" >>>>> (The drive and/or home/username aren't necessary, but would be nice to >>>>> have. Also, that file name isn't necessary, but would be nice.) >>>>> >>>>> Thank you for your help, >>>>> >>>>> >>>> If you use the normalizePath() function with winslash="/", then all >>>> current platforms will return a path using "/" as the separator, so you >>>> could do something like >>>> >>>> strsplit(normalizePath(filename, winslash="/"), "/")[[1]] >>>> >>>> >>>> You need to be careful with normalizePath: at least on Windows, it will >>>> not necessarily do what you wanted if the filename doesn't exist. >>>> >>>> Duncan Murdoch >>>> >>>> >>> >>> >> [[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > David Winsemius, MD > West Hartford, CT
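Putting David's pointer together with Duncan's split, the round trip looks like this; a small hedged sketch (the folders vector is invented, standing in for strsplit() output):

```r
## Hedged sketch: rejoin path components without guessing the separator.
folders <- c("home", "username", "R-2.13.1", "NEWS.pdf")  # e.g. from strsplit()

.Platform$file.sep                       # the platform's separator, "/" here
paste(folders, collapse = .Platform$file.sep)
# "home/username/R-2.13.1/NEWS.pdf"

do.call(file.path, as.list(folders))     # same result, no separator lookup needed
```

The do.call(file.path, ...) form is often the tidier choice, since file.path() supplies fsep = .Platform$file.sep by default.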
[R] R.oo package: do setMethodS3 work upon construction
Hello (Heinrich), I did not know I could do this. It doesn't seem to be documented anywhere. Thought this would be helpful to the fraction of the community using package R.oo. Note the call of a setMethodS3 method, xOne, in the setConstructorS3. This is extremely useful if xOne (in this case) is a very complex method (that you always want to be called every time you create a new object). If I have something wrong please let me know! (I'm about to implement this in a large'ish program.) Great package for OOP programming! Example 1: setConstructorS3("ClassA", function() { this = extend(Object(), "ClassA", .x=NULL ) this$xOne() # this is useful! this }) setMethodS3("xOne", "ClassA", function(this,...) { this$.x = 1 }) setMethodS3("getX", "ClassA", function(this,...) { this$.x }) So x is always 1: > a = ClassA() > a$x [1] 1 If you are new to R.oo: if you only want x to be 1 (I.e. xOne above is simple) you should do something like this: Example 2: setConstructorS3("ClassA", function() { this = extend(Object(), "ClassA", .x=1 ) this }) setMethodS3("getX", "ClassA", function(this,...) { this$.x }) > a = ClassA() > a$x [1] 1 The following further illustrates what you can do with Example 1 above: Example 3: setConstructorS3("ClassA", function() { this = extend(Object(), "ClassA", .x=NULL, .y=1 ) this$xOne() this$xPlusY() this }) setMethodS3("xOne", "ClassA", function(this,...) { this$.x = 1 }) setMethodS3("xPlusY", "ClassA", function(this,...) { this$.x = this$.x + this$.y }) setMethodS3("getX", "ClassA", function(this,...) { this$.x }) > a = ClassA() > a$x [1] 2 Hope that helps! Ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] textplot in layout
Hello, Someone (Erik) recently posted about putting text on a plot. That thread didn't help. I'd like to put text directly below the 'sub' text (with no gap). The code below is the best I can do. Note the large undesirable gap between 'sub' and 'test'. I'd like the word 'test' to be just below the top box() border (directly below 'sub'). year <- c(2000 , 2001 , 2002 , 2003 , 2004) rate <- c(9.34 , 8.50 , 7.62 , 6.93 , 6.60) op <- par(no.readonly = TRUE) on.exit(par(op)) layout(matrix(c(1,2), 2, 1, byrow = TRUE),heights=c(8,1)) par(mar=c(5,3,3,3)) plot(year,rate,main='main',sub='sub') library(gplots) par(mar=c(0,0,0,0),new=F) textplot('test',valign='top',cex=1) box() Note: I'd rather solve it with textplot. If not, my next stop is grid.text(). Also, the text I am plotting with textplot is much longer, so a multi-line text plot would solve my next issue (which I have not looked into yet). Lastly, layout is not necessary; I just used it because I thought it would do what I wanted. Thanks, Ben
Re: [R] textplot in layout
Perfect, thanks! ben On Tue, Oct 25, 2011 at 8:12 AM, Eik Vettorazzi wrote: > Hi Ben, > maybe mtext is of more help here? > > par(mar=c(7,3,3,3)) > plot(year,rate,main='main',sub='sub') > mtext('test',cex=1,side=1,line=5) > box() > > cheers > > Am 25.10.2011 15:26, schrieb Ben quant: > > Hello, > > > > Someone (Erik) recently posted about putting text on a plot. That thread > > didn't help. I'd like to put text directly below the 'sub' text (with no > > gap). The code below is the best I can do. Note the large undesirable gap > > between 'sub' and 'test'. I'd like the word 'test' to be just below the > top > > box() boarder (directly below 'sub'). > > > > year <- c(2000 , 2001 , 2002 , 2003 , 2004) > > rate <- c(9.34 , 8.50 , 7.62 , 6.93 , 6.60) > > op <- par(no.readonly = TRUE) > > on.exit(par(op)) > > layout(matrix(c(1,2), 2, 1, byrow = TRUE),heights=c(8,1)) > > par(mar=c(5,3,3,3)) > > plot(year,rate,main='main',sub='sub') > > library(gplots) > > par(mar=c(0,0,0,0),new=F) > > textplot('test',valign='top',cex=1) > > box() > > > > Note: I'd rather solve it with textplot. If not, my next stop is > > grid.text(). Also, the text I am plotting with textplot is much longer so > a > > multiple line text plot would solve my next issue (of which I have not > > looked into yet). Lastly, layout is not necessary. I just used it because > I > > thought it would do what I wanted. > > > > Thanks, > > > > Ben > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > -- > Eik Vettorazzi > > Department of Medical Biometry and Epidemiology > University Medical Center Hamburg-Eppendorf > > Martinistr. 
52 > 20246 Hamburg > > T ++49/40/7410-58243 > F ++49/40/7410-57790 > > -- > Pflichtangaben gemäß Gesetz über elektronische Handelsregister und > Genossenschaftsregister sowie das Unternehmensregister (EHUG): > > Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen > Rechts; Gerichtsstand: Hamburg > > Vorstandsmitglieder: Prof. Dr. Guido Sauter (Vertreter des Vorsitzenden), > Dr. Alexander Kirstein, Joachim Prölß, Prof. Dr. Dr. Uwe Koch-Gromus > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
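For the multi-line follow-up the original post anticipates, Eik's mtext() approach extends naturally: one call per line of text, stepping the line= offset. A hedged sketch reusing the thread's toy data (the note strings are invented):

```r
## Hedged sketch: several lines of text under the 'sub' title via mtext().
year <- c(2000, 2001, 2002, 2003, 2004)
rate <- c(9.34, 8.50, 7.62, 6.93, 6.60)

par(mar = c(8, 3, 3, 3))           # extra bottom margin for two note lines
plot(year, rate, main = "main", sub = "sub")
mtext("first line of the note",  side = 1, line = 5, cex = 1)
mtext("second line of the note", side = 1, line = 6, cex = 1)
box()
```

Each extra line of the note costs one more margin line in par(mar=) plus one more mtext() call, so no layout() or gplots::textplot() is needed.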
[R] R.oo package, inherit two classes
Hello, How do I inherit two classes using the R.oo package. Below is kind of a silly example, but I am trying to create class PerDog from classes Dog and Person. Error at bottom. I've tried a few other ways of using extend(), but nothing seems to get me what I want. Example: setConstructorS3("Person", function(age=NA) { this = extend(Object(), "Person", .age=age ) this }) setMethodS3("getAge", "Person", function(this, ...) { this$.age; }) setMethodS3("setAge", "Person", function(this,num, ...) { this$.age = num; }) # .. setConstructorS3("Dog", function(dog_age=NA) { this = extend(Object(), "Dog", .dog_age=dog_age ) this }) setMethodS3("getDogAge", "Dog", function(this, ...) { this$.dog_age; }) setMethodS3("setDogAge", "Dog", function(this,num, ...) { this$.dog_age = num; }) #.. setConstructorS3("PerDog", function(age=NA,wt=NA,dog_age=NULL) { extend(Person(age=age),Dog(dog_age=dog_age), "PerDog", .wt=wt ) }) setMethodS3("getWeight", "PerDog", function(this, ...) { this$.wt; }) setMethodS3("setWeight", "PerDog", function(this,w, ...) 
{ this$.wt = w; }) > pd = PerDog(67,150,1) Error in list(`PerDog(67, 150, 1)` = , `extend(Person(age = age), Dog(dog_age = dog_age), "PerDog", .wt = wt)` = , : [2011-10-27 09:34:06] Exception: Missing name of field #1 in class definition: Dog: 0x73880408 at throw(Exception(...)) at throw.default("Missing name of field #", k, " in class definition: ", ...className) at throw("Missing name of field #", k, " in class definition: ", ...className) at extend.Object(Person(age = age), Dog(dog_age = dog_age), "PerDog", .wt = wt) at extend(Person(age = age), Dog(dog_age = dog_age), "PerDog", .wt = wt) at PerDog(67, 150, 1) Three (of many) other things I have tried: 1) setConstructorS3("PerDog", function(age=NA,wt=NA,dog_age=NULL) { this = extend(extend(Person(age=age), "PerDog"),Dog(dog_age=dog_age), "PerDog", .wt=wt ) this }) 2) setConstructorS3("PerDog", function(age=NA,wt=NA,dog_age=NULL) { this = extend(Dog(dog_age=dog_age), "PerDog", .wt=wt ) this }) setConstructorS3("PerDog", function(age=NA,wt=NA,dog_age=NULL) { this = extend(Person(age=age), "PerDog", .wt=wt ) this }) 3) setConstructorS3("PerDog", function(age=NA,wt=NA,dog_age=NULL) { this = extend(Dog(dog_age=dog_age), "PerDog", setConstructorS3("PerDog", function(age=NA,wt=NA,dog_age=NULL) { extend(Person(age=age), "PerDog", .wt=wt ) }) ) this }) Thanks, ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] preceding X. and X
Hello, Why do I get a preceding "X." (that is, an X followed by a period) for negative numbers and an "X" for positive numbers when I read a csv file? Am I stuck with this? If so, how do I convert it to normal numbers? dat=read.csv(file_path) > dat [1] X0.0 X.0.240432350374 X0.355468069625 X.0.211469972378 X1.1812797415 X.0.227975150826 X0.74834842067 X.1.04923922494 X0.566058942902 X.0.184077147931 [11] X.0.693389240029 X.0.474961946724 X.0.557555716654 X0.374198813899 X0.560620781209 X.0.0609127295732 X0.645337364133 X0.353711785227 X.0.0999146114953 X.0.320711825714 [21] X0.332194935294 X0.513794862516 X0.228124868198 X0.141250108666 X0.879359879038 X0.721652892103 X.1.14723732497 X.0.0871541975062 X0.302181204959 X0.0594492294833 [31] X.0.240723094394 X0.358971714966 X.0.42954330242 X.0.0739978455876 X.0.108806367787 X0.616107131373 X.0.202669947993 X.0.200450609711 X0.15421692014 X.0.0629346641528 [41] X1.16077454571 X.0.100980386545 X.0.457429357325 X0.128929934631 X.0.143442822494 X.1.09050490567 X.0.270230489547 X.0.438100470791 X.0.069111547 X0.18367056566 [51] X0.728842996177 X0.221986311856 X.0.793971624503 X.0.258083713185 X0.460468157809 X.0.608552686527 X.0.11024558138 X.0.247014689522 X.0.137467423146 X0.0577133684917 [61] X0.615590960098 X.0.210395786553 X0.372979876654 X.0.763661795812 X1.22248872639 X1.17541364078 X1.34965201031 X.0.0653956005331 X0.446173249776 X0.738548926264 [71] X0.426787360705 X.0.409994430265 X.0.445643675958 etc... Thanks, Ben
Re: [R] preceding X. and X
Figured it out. Solution: dat=read.csv(file_path, header=F) > dat V1 V2V3 V4 V5 V6V7 V8V9V10V11V12V13 V14 V15 V16 V17 V18 V19V20 1 0 -0.2404324 0.3554681 -0.21147 1.18128 -0.2279752 0.7483484 -1.049239 0.5660589 -0.1840771 -0.6933892 -0.4749619 -0.5575557 0.3741988 0.5606208 -0.06091273 0.6453374 0.3537118 -0.09991461 -0.3207118 V21 V22 V23 V24 V25 V26 V27V28 V29V30V31 V32V33 V34V35 V36V37V38 V39 1 0.3321949 0.5137949 0.2281249 0.1412501 0.8793599 0.7216529 -1.147237 -0.0871542 0.3021812 0.05944923 -0.2407231 0.3589717 -0.4295433 -0.07399785 -0.1088064 0.6161071 -0.2026699 -0.2004506 0.1542169 V40 V41V42V43 V44V45 V46V47V48 V49 V50 V51 V52 V53V54 V55V56V57V58 1 -0.06293466 1.160775 -0.1009804 -0.4574294 0.1289299 -0.1434428 -1.090505 -0.2702305 -0.4381005 -0.0691 0.1836706 0.728843 0.2219863 -0.7939716 -0.2580837 0.4604682 -0.6085527 -0.1102456 -0.2470147 V59V60 V61V62 V63V64 V65 V66 V67V68 V69 V70 V71 V72V73 V74 V75V76V77 1 -0.1374674 0.05771337 0.615591 -0.2103958 0.3729799 -0.7636618 1.222489 1.175414 1.349652 -0.0653956 0.4461732 0.7385489 0.4267874 -0.4099944 -0.4456437 0.1310654 0.5912901 0.03645256 -0.1760742 V78 V79 V80 Thanks, Ben On Thu, Oct 27, 2011 at 1:12 PM, Justin Haynes wrote: > Id look at the actual csv file. I assume it has the X there also. > sounds like a good candidate for some data munging tools first before > you bring it into R. also ?str of the data would be helpful. My first > guess is those are all being read as column names. Were they data in > the data.frame dat the should be quoted: > > > dat<-c('X0.0','X.0.24','X0.35','X.0.211') > > dat > [1] "X0.0""X.0.24" "X0.35" "X.0.211" > > versus: > > > names(dat)<-c('col_one','X.0.44',0.65,'last_col') > > dat > col_oneX.0.44 0.65 last_col > "X0.0" "X.0.24" "X0.35" "X.0.211" > > > > However, if you want to use R to clean it up, I'd use the stringr package. 
> > > library(stringr) > > > dat<-str_replace(dat,'X.0.','-0.') > > dat > [1] "X0.0" "-0.24" "X0.35" "-0.211" > > dat<-str_replace(dat,'X','') > > dat > [1] "0.0""-0.24" "0.35" "-0.211" > > dat<-as.numeric(dat) > > dat > [1] 0.000 -0.240 0.350 -0.211 > > > > hope that helps, > > Justin > > > On Thu, Oct 27, 2011 at 11:47 AM, Ben quant wrote: > > Hello, > > > > Why do I get preceding "X." (that is a and X followed by a period) for > > negative numbers and an "X" for positive numbers when I read a csv file? > Am > > I stuck with this? If so, how do I convert it to normal numbers? > > > > dat=read.csv(file_path) > > > >> dat > > [1] X0.0 X.0.240432350374 X0.355468069625 > > X.0.211469972378 X1.1812797415 X.0.227975150826 X0.74834842067 > > X.1.04923922494X0.566058942902X.0.184077147931 > > [11] X.0.693389240029 X.0.474961946724 X.0.557555716654 > > X0.374198813899X0.560620781209X.0.0609127295732 X0.645337364133 > > X0.353711785227X.0.0999146114953 X.0.320711825714 > > [21] X0.332194935294X0.513794862516X0.228124868198 > > X0.141250108666X0.879359879038X0.721652892103X.1.14723732497 > > X.0.0871541975062 X0.302181204959X0.0594492294833 > > [31] X.0.240723094394 X0.358971714966X.0.42954330242 > > X.0.0739978455876 X.0.108806367787 X0.616107131373X.0.202669947993 > > X.0.200450609711 X0.15421692014 X.0.0629346641528 > > [41] X1.16077454571 X.0.100980386545 X.0.457429357325 > > X0.128929934631X.0.143442822494 X.1.09050490567X.0.270230489547 > > X.0.438100470791 X.0.069111547 X0.18367056566 > > [51] X0.728842996177X0.221986311856X.0.793971624503 > > X.0.258083713185 X0.460468157809X.0.608552686527 X.0.11024558138 > > X.0.247014689522 X.0.137467423146 X0.0577133684917 > > [61] X0.615590960098X.0.210395786553 X0.372979876654 > > X.0.763661795812 X1.22248872639
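What actually happened above: with the default header = TRUE, the file's single row of numbers became column names, and make.names() mangled them (an "X" prefix, and "-" turned into "."). A small hedged sketch of both the cause and the header = FALSE cure (the temp file is invented for illustration):

```r
## The cause: make.names() fixes up syntactically invalid column names.
make.names(c("0.355468", "-0.240432"))
# "X0.355468" "X.0.240432"   <- the mysterious X and X. prefixes

## The cure: tell read.csv the one-line file has no header row.
tmp <- tempfile(fileext = ".csv")
writeLines("0.0,-0.240432,0.355468", tmp)
read.csv(tmp)                  # header = TRUE: numbers swallowed as X... names
read.csv(tmp, header = FALSE)  # numeric columns V1..V3, as intended
```

This is why the string-munging detour with stringr, while workable, is unnecessary here: the data were never strings in the file, only in the misread names.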
Re: [R] preceding X. and X
I think it is what I want. The values look OK. I do get a warning. Here is what you asked for:

> dat = read.csv(file_path, header = F)
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'y:\ALL STRATEGIES\INVEST-TRADING\Zacks\RCsvData\VarPortRtns.csv'
> str(dat)
'data.frame':   1 obs. of  251 variables:
 $ V1 : num 0
 $ V2 : num -0.24
 $ V3 : num 0.355
 $ V4 : num -0.211
 $ V5 : num 1.18
 $ V6 : num -0.228
 $ V7 : num 0.748
 $ V8 : num -1.05
 $ V9 : num 0.566
 $ V10: num -0.184
 $ V11: num -0.693
 ...etc
> dat
[snip: the same 1-row data frame of plain numeric values, V1..V251]

Ben

On Thu, Oct 27, 2011 at 1:37 PM, Nordlund, Dan (DSHS/RDA) <nord...@dshs.wa.gov> wrote:

> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Ben quant
> > Sent: Thursday, October 27, 2011 12:26 PM
> > To: r-help@r-project.org
> > Subject: Re: [R] preceding X. and X
> >
> > Figured it out.
> > Solution:
> >
> > dat = read.csv(file_path, header = F)
> >
> > [snip: quoted output from the "Figured it out" message above]
[R] RpgSQL vs RPostgreSQL
Hello,

Could someone who has experience with or knowledge of both the RPostgreSQL and RpgSQL packages provide some feedback? Thanks! I am most interested in hearing from people who know both packages, not just one. The only real difference I can see is that RpgSQL has a Java dependency, which I am not opposed to if it provides some added benefit; otherwise I will probably use the RPostgreSQL package. Both packages still look to be maintained.

I have skimmed over both of these links:
http://cran.r-project.org/web/packages/RpgSQL/RpgSQL.pdf
http://cran.r-project.org/web/packages/RPostgreSQL/RPostgreSQL.pdf

Thanks,
Ben

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] RpgSQL row names
Hello,

Using the RpgSQL package, there must be a way to get the row names into the table automatically. In the example below, I'm trying to get rid of the cbind line, yet have the row names of the data frame populate a column.

> bentest = matrix(1:4, 2, 2)
> dimnames(bentest) = list(c('ra','rb'), c('ca','cb'))
> bentest
   ca cb
ra  1  3
rb  2  4
> bentest = cbind(item_name = rownames(bentest), bentest)
> dbWriteTable(con, "r.bentest", bentest)
[1] TRUE
> dbGetQuery(con, "SELECT * FROM r.bentest")
  item_name ca cb
1        ra  1  3
2        rb  2  4

Thanks,
Ben
Re: [R] RpgSQL row names
This is great, thanks! I have another unrelated question. I'll create a new email for that one.

ben

On Mon, Nov 7, 2011 at 4:16 PM, Gabor Grothendieck wrote:
> On Mon, Nov 7, 2011 at 5:34 PM, Ben quant wrote:
> > Hello,
> >
> > Using the RpgSQL package, there must be a way to get the row names into
> > the table automatically. In the example below, I'm trying to get rid of
> > the cbind line, yet have the row names of the data frame populate a
> > column.
> > [snip: example quoted in full in the original message above]
>
> The RJDBC based drivers currently don't support that. You can create a
> higher level function that does it.
>
> dbGetQuery2 <- function(...) {
>   out <- dbGetQuery(...)
>   i <- match("row_names", names(out), nomatch = 0)
>   if (i > 0) {
>     rownames(out) <- out[[i]]
>     out <- out[-i]   # drop the row_names column wherever it landed
>   }
>   out
> }
>
> rownames(BOD) <- letters[1:nrow(BOD)]
> dbWriteTable(con, "BOD", cbind(row_names = rownames(BOD), BOD))
> dbGetQuery2(con, "select * from BOD")
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
[R] multi-line query
Hello,

I'm using package RpgSQL. Is there a better way to create a multi-line query/character string? I'm looking for less typing and better readability.

This is not very readable for large queries:

s <- 'create table r.BOD("id" int primary key,"name" varchar(12))'

I write a lot of code, so I'm looking to type less than this, but it is more readable from an SQL standpoint:

s <- gsub("\n", "", 'create table r.BOD(
  "id" int primary key
  ,"name" varchar(12))
')

How it is used:

dbSendUpdate(con, s)

Regards,

Ben
[R] dbWriteTable with field data type
Hello, When I do: dbWriteTable(con, "r.BOD", cbind(row_names = rownames(BOD), BOD)) ...can I specify the data types such as varchar(12), float, double precision, etc. for each of the fields/columns? If not, what is the best way to create a table with specified field data types (with the RpgSQL package/R)? Regards, Ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
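One hedged approach (an editor's sketch based on general DBI/RJDBC conventions, not confirmed for RpgSQL): create the table with explicit column types via dbSendUpdate first, then load the rows with dbWriteTable. The DDL might look like this; column names match R's BOD dataset, but the types are illustrative assumptions:

```sql
-- Hypothetical DDL, sent with dbSendUpdate(con, ...) before loading the
-- data frame rows via dbWriteTable. Names/types chosen for illustration.
create table r.BOD(
  "row_names" varchar(12),
  "Time"      double precision,
  "demand"    double precision
);
```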
Re: [R] multi-line query
Because I don't know anything about sqldf. :) Here is what happens, but I'm sure it is happening because I didn't read the manual yet:

> s <- sqldf('create table r.dat("id" int primary key,"val" int)')
Error in ls(envir = envir, all.names = private) :
  invalid 'envir' argument
Error in !dbPreExists : invalid argument type

ben

On Tue, Nov 8, 2011 at 10:41 AM, jim holtman wrote:
> Why not just send it in as is. I use SQLite (via sqldf) and here is
> the way I write my SQL statements:
>
>   inRange <- sqldf('
>     select t.*
>          , r.start
>          , r.end
>     from total t, commRange r
>     where t.comm = r.comm and
>           t.loc between r.start and r.end and
>           t.loc != t.new
>   ')
>
> On Tue, Nov 8, 2011 at 11:43 AM, Ben quant wrote:
> > [snip: multi-line query question quoted in full in the original
> > message above]
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
[R] fridays date to date
Hello, How do I get the dates of all Fridays between two dates? thanks, Ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fridays date to date
Great thanks! ben On Thu, Mar 1, 2012 at 1:30 PM, Marc Schwartz wrote: > On Mar 1, 2012, at 2:02 PM, Ben quant wrote: > > > Hello, > > > > How do I get the dates of all Fridays between two dates? > > > > thanks, > > > > Ben > > > Days <- seq(from = as.Date("2012-03-01"), >to = as.Date("2012-07-31"), >by = "day") > > > str(Days) > Date[1:153], format: "2012-03-01" "2012-03-02" "2012-03-03" "2012-03-04" > ... > > # See ?weekdays > > > Days[weekdays(Days) == "Friday"] > [1] "2012-03-02" "2012-03-09" "2012-03-16" "2012-03-23" "2012-03-30" > [6] "2012-04-06" "2012-04-13" "2012-04-20" "2012-04-27" "2012-05-04" > [11] "2012-05-11" "2012-05-18" "2012-05-25" "2012-06-01" "2012-06-08" > [16] "2012-06-15" "2012-06-22" "2012-06-29" "2012-07-06" "2012-07-13" > [21] "2012-07-20" "2012-07-27" > > HTH, > > Marc Schwartz > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
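A small portability note on the answer above (an editor's aside, not from the thread): weekdays() returns locale-dependent day names, so comparing against "Friday" fails under non-English locales. A locale-independent variant keys off the numeric day of the week instead:

```r
# Same Fridays as the weekdays() approach above, but locale-independent:
# POSIXlt's wday component is numeric (0 = Sunday, ..., 5 = Friday).
Days <- seq(from = as.Date("2012-03-01"), to = as.Date("2012-07-31"), by = "day")
fridays <- Days[as.POSIXlt(Days)$wday == 5]
head(fridays, 3)  # "2012-03-02" "2012-03-09" "2012-03-16"
```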
[R] fill data forward in data frame.
Hello,

My direct desire is a good (fast) way to fill values forward until there is another value, then fill that value forward, in the data xx (at the bottom of this email). For example, rows 1 to 45 should be NA (no change), but from row 46 to row 136 the value should be 12649, and from row 137 to the next value it should be 13039.00. The last line of code is all you need for this part.

If you are so inclined, my goal is this: I want to create a weekly time series out of some data based on the report date. The report date is 'rd' below, and is the correct date for the time series. My idea (in part seen below) is to align rd and ua via the incorrect date (the time series date), then merge that using the report date (rd) and a daily series of dates (dt), so I capture all of the dates. That gets the data in the right start period. I've done all of this so far below and it looks fine. Then I plan to roll all of those values forward to the next value (see question above), then I'll do something like this:

xx[weekdays(xx[,1]) == "Friday",]

...to get a weekly series of Friday values. I'm thinking someone probably has a faster way of doing this. I have to do this many times, so speed is important. Thanks!
Here is what I have done so far:

dt <- seq(from = as.Date("2009-06-01"), to = Sys.Date(), by = "day")

> nms
 [1] "2009-06-30" "2009-09-30" "2009-12-31" "2010-03-31" "2010-06-30"
 [6] "2010-09-30" "2010-12-31" "2011-03-31" "2011-06-30" "2011-09-30"
[11] "2011-12-31"

> rd
  2009-06-30   2009-09-30   2009-12-31   2010-03-31   2010-06-30
"2009-07-16" "2009-10-15" "2010-01-19" "2010-04-19" "2010-07-19"
  2010-09-30   2010-12-31   2011-03-31   2011-06-30   2011-09-30
"2010-10-18" "2011-01-18" "2011-04-19" "2011-07-18" "2011-10-17"
  2011-12-31
"2012-01-19"

> ua
2009-06-30 2009-09-30 2009-12-31 2010-03-31 2010-06-30 2010-09-30
  12649.00   13039.00   13425.00   13731.00   14014.00   14389.00
2010-12-31 2011-03-31 2011-06-30 2011-09-30 2011-12-31
  14833.00   15095.00   15481.43   15846.43   16186.43

> x = merge(ua, rd, by = 'row.names')
> names(x) = c('z.date', 'val', 'rt_date')
> xx = merge(dt, x, by.y = 'rt_date', by.x = 1, all.x = T)
> xx
             x     z.date      val
1   2009-06-01                  NA
2   2009-06-02                  NA
3   2009-06-03                  NA
...etc
45  2009-07-15                  NA
46  2009-07-16 2009-06-30   12649
47  2009-07-17                  NA
...etc
136 2009-10-14                  NA
137 2009-10-15 2009-09-30 13039.00
138 2009-10-16                  NA
...etc
Re: [R] fill data forward in data frame.
That is great! Thank you very much.

Ben

On Thu, Mar 1, 2012 at 2:57 PM, Petr Savicky wrote:

> On Thu, Mar 01, 2012 at 02:31:01PM -0700, Ben quant wrote:
> > Hello,
> >
> > My direct desire is a good (fast) way to fill values forward until
> > there is another value, then fill that value forward, in the data xx.
> >
> > [snip: the rest of the question and transcript, quoted in full in the
> > original "fill data forward in data frame" message above]
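A minimal base-R fill-forward (last observation carried forward) sketch for the question above (an editor's illustration, not quoted from the thread; zoo::na.locf does the same job if the zoo package is available):

```r
# Carry the last non-NA value forward; leading NAs stay NA.
locf <- function(x) {
  values <- c(NA, x[!is.na(x)])    # NA stands in for "no value seen yet"
  values[cumsum(!is.na(x)) + 1]    # index of the most recent non-NA value
}

locf(c(NA, NA, 12649, NA, NA, 13039, NA))
# NA NA 12649 12649 12649 13039 13039
```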
[R] data frame of strings formatted
Hello,

I have another question. I have a data frame that looks like this:

                    a          b
2007-03-31 "20070514" "20070410"
2007-06-30 "20070814" "20070709"
2007-09-30 "20071115" "20071009"
2007-12-31 "20080213" "20080109"
2008-03-31 "20080514" "20080407"
2008-06-30 "20080814" "--"
2008-09-30 "20081114" "20081007"
2008-12-31 "20090217" "20090112"
2009-03-31 "--"       "20090407"
2009-06-30 "20090817" "20090708"
2009-09-30 "20091113" "--"
2009-12-31 "20100212" "20100111"
2010-03-31 "20100517" "20100412"
2010-06-30 "20100816" "20100712"
2010-09-30 "20101112" "20101007"
2010-12-31 "20110214" "20110110"
2011-03-31 "20110513" "20110411"
2011-06-30 "20110815" "20110711"
2011-09-30 "2015"     "20111011"

(actually it has about 10,000 columns)

I'd like all of the strings to be formatted like 2011-11-15, 2011-10-11, etc., as a data frame of the same dimensions with all of the dimnames intact. They don't have to be of date format. "--" can be NA or left the same. It does have to be fast though...

Thanks!

ben
Re: [R] data frame of strings formatted
Thanks a ton! That is great.

ben

On Thu, Mar 1, 2012 at 9:29 PM, Peter Langfelder wrote:

> On Thu, Mar 1, 2012 at 8:05 PM, Ben quant wrote:
> > Hello,
> >
> > I have a data frame that looks like this:
> > [snip: table quoted in full in the original message above]
> >
> > I'd like all of the strings to be formatted like 2011-11-15,
> > 2011-10-11, etc. as a data frame of the same dimensions and all of the
> > dimnames intact. They don't have to be of date format. "--" can be NA
> > or left the same. It does have to be fast though...
>
> There may be a ready-made function for this, but if not, substring and
> paste are your friends. Look them up.
>
> Here's how I would do it:
>
> fix = function(x)
> {
>   year = substring(x, 1, 4);
>   mo = substring(x, 5, 6);
>   day = substring(x, 7, 8);
>   ifelse(year=="--", "NA", paste(year, mo, day, sep = "-"))
> }
>
> fixed = apply(YourDataFrame, 2, fix)
> dimnames(fixed) = dimnames(YourDataFrame)
>
> Since you don't provide an example I can't test it exhaustively but it
> seems to work for me.
> Peter
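A possible alternative to the substring() approach above (an editor's sketch, assuming every valid entry is exactly yyyymmdd): a single vectorized regex substitution, which leaves the "--" placeholders untouched:

```r
# Works on whole vectors/columns at once; entries that are not pure
# yyyymmdd (e.g. "--") pass through unchanged.
x <- c("20070514", "--", "20111011")
sub("^([0-9]{4})([0-9]{2})([0-9]{2})$", "\\1-\\2-\\3", x)
# [1] "2007-05-14" "--"         "2011-10-11"
```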
[R] speed up merge
Hello,

I have a nasty loop that I have to do 11877 times. The only thing that really slows it down is this merge:

xx1 = merge(dt, ua_rd, by.x = 1, by.y = 'rt_date', all.x = T)

Any ideas on how to speed it up? The output can't change materially (it works), but I'd like it to go faster. I'm looking at getting around the loop (not shown), but I'm trying to speed up the merge first. I'll post regarding the loop if nothing comes of this post.

Here is some information on what type of stuff is going into the merge:

> class(ua_rd)
[1] "matrix"
> dim(ua_rd)
[1] 20  2
> head(ua_rd)
           AName             rt_date
2007-03-31 "14066.580078125" "2007-04-26"
2007-06-30 "14717"           "2007-07-19"
2007-09-30 "15528"           "2007-10-25"
2007-12-31 "17609"           "2008-01-24"
2008-03-31 "17168"           "2008-04-24"
2008-06-30 "17681"           "2008-07-17"
> class(dt)
[1] "character"
> length(dt)
[1] 1799
> dt[1:10]
[1] "2007-03-31" "2007-04-01" "2007-04-02" "2007-04-03" "2007-04-04" "2007-04-05" "2007-04-06" "2007-04-07"
[9] "2007-04-08" "2007-04-09"

thanks,

Ben
Re: [R] speed up merge
I'm not sure. I'm still looking into it. It's pretty involved, so I asked about the simplest part first (the merge question). I'll reply back with a mock-up/sample that is testable under a more appropriate subject line. Probably this weekend.

Regards,
Ben

On Fri, Mar 2, 2012 at 4:37 AM, Hans Ekbrand wrote:
> On Fri, Mar 02, 2012 at 03:24:20AM -0700, Ben quant wrote:
> > Hello,
> >
> > I have a nasty loop that I have to do 11877 times.
>
> Are you completely sure about that? I often find myself avoiding
> loops-by-row by constructing vectors of which rows fulfil a
> condition, and then creating new vectors out of that vector. If you
> elaborate on the problem, perhaps we could find a way to avoid the
> loops altogether?
>
> Mostly as a note to self, I wrote
> http://code.cjb.net/vectors-instead-of-loop.html, it might be
> understood by others too, but I'm not sure.
>
> --
> Hans Ekbrand (http://sociologi.cjb.net)
Re: [R] speed up merge
I'll have to give this a try this weekend. Thank you!

ben

On Fri, Mar 2, 2012 at 12:07 PM, jim holtman wrote:
> One way to speed up the merge is not to use merge. You can use 'match' to
> find matching indices and then assemble the result manually.
>
> Does this do what you want:
>
> > ua <- read.table(text = '           AName             rt_date
> + 2007-03-31 "14066.580078125" "2007-04-01"
> + 2007-06-30 "14717"           "2007-04-03"
> + 2007-09-30 "15528"           "2007-10-25"
> + 2007-12-31 "17609"           "2008-04-06"
> + 2008-03-31 "17168"           "2008-04-24"
> + 2008-06-30 "17681"           "2008-04-09"', header = TRUE, as.is = TRUE)
>
> > dt <- c("2007-03-31", "2007-04-01", "2007-04-02", "2007-04-03", "2007-04-04",
> +         "2007-04-05", "2007-04-06", "2007-04-07",
> +         "2007-04-08", "2007-04-09")
>
> > # find matching values in ua
> > indx <- match(dt, ua$rt_date)
>
> > # create new result matrix
> > xx1 <- cbind(dt, ua[indx,])
> > rownames(xx1) <- NULL # delete funny names
> > xx1
>            dt    AName    rt_date
> 1  2007-03-31       NA       <NA>
> 2  2007-04-01 14066.58 2007-04-01
> 3  2007-04-02       NA       <NA>
> 4  2007-04-03 14717.00 2007-04-03
> 5  2007-04-04       NA       <NA>
> 6  2007-04-05       NA       <NA>
> 7  2007-04-06       NA       <NA>
> 8  2007-04-07       NA       <NA>
> 9  2007-04-08       NA       <NA>
> 10 2007-04-09       NA       <NA>
>
> On Fri, Mar 2, 2012 at 5:24 AM, Ben quant wrote:
>> Hello,
>>
>> I have a nasty loop that I have to do 11877 times. The only thing that
>> slows it down really is this merge:
>>
>> xx1 = merge(dt, ua_rd, by.x = 1, by.y = 'rt_date', all.x = T)
>>
>> Any ideas on how to speed it up? The output can't change materially (it
>> works), but I'd like it to go faster. I'm looking at getting around the
>> loop (not shown), but I'm trying to speed up the merge first. I'll post
>> regarding the loop if nothing comes of this post.
>> Here is some information on what type of stuff is going into the merge:
>>
>> [snip: class/dim/head output quoted in full in the original "speed up
>> merge" message above]
>>
>> thanks,
>>
>> Ben
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
[R] removing data look-ahead, something faster.
Hello,

Thank you for your help/advice! The issue here is speed/efficiency. I can do what I want, but it is really slow. The goal is to have the ability to do calculations on my data and have it adjusted for look-ahead. I see two ways to do this (I'm open to more ideas). My terminology: unadjusted = values not adjusted for look-ahead bias; adjusted = values adjusted for look-ahead bias.

1) I could a) do calculations on unadjusted values, then b) adjust the resulting values for look-ahead bias. Here is what I mean:

a) Using a time series of val1, I could say: [(val1 - val1 4 periods ago) / val1 4 periods ago] = resultval. ("Periods" correspond to the z.dates in my example below.)

b) Then I would adjust resultval for look-ahead based on val1's associated report date. Note: I don't think this will be the fastest.

2) I could do the same calculation [(val1 - val1 4 periods ago) / val1 4 periods ago] = resultval, but my calculation function would get the 'right' values that have no look-ahead bias. I'm not sure how I would do this, but maybe a query starting with the date that I want, indexed to the appropriate report date, indexed to the correct value to return. But how do I do this in R? I think I would have to put this in our database and do a query. The data comes to me in RData format. I could put it all in our database via RpgSQL, which we already use. Note: I think this will be fastest.

Anyway, my first attempt was to solve part b of idea #1 above. Here is how my data looks and my first attempt at solving part b of idea #1. It only takes 0.14 seconds for my mock data, but that is way too slow. The major things slowing it down are A) the loop and B) the merge statement.
# mock data: this is how it comes to me (raw)
# in practice I have over 10,000 columns

# the starting 'periods' for my data
z.dates = c("2007-03-31","2007-06-30","2007-09-30","2007-12-31",
            "2008-03-31","2008-06-30","2008-09-30","2008-12-31")
nms = c("A","B","C","D")

# these are the report dates that are the real days the data was available
rd1 = matrix(c("20070514","20070814","20071115","20080213","20080514","20080814","20081114","20090217",
               "20070410","20070709","20071009","20080109","20080407","20080708","20081007","20090112",
               "20070426","--","--","--","--","--","--","20090319",
               "--","--","--","--","--","--","--","--"),
             nrow = 8, ncol = 4)
dimnames(rd1) = list(z.dates, nms)

# this is the unadjusted raw data, which always has the same dimensions,
# rownames, and colnames as the report dates
ua = matrix(c(640.35,636.16,655.91,657.41,682.06,702.90,736.15,667.65,
              2625.050,2625.050,2645.000,2302.000,1972.000,1805.000,1547.000,1025.000,
              NaN,NaN,-98.426,190.304,180.894,183.220,172.520,144.138,
              NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),
            nrow = 8, ncol = 4)
dimnames(ua) = list(z.dates, nms)

# change anything below. I can't change anything above this line.
# My first attempt at this was to solve part b of #1 above.
fix = function(x) {
  year = substring(x, 1, 4)
  mo   = substring(x, 5, 6)
  day  = substring(x, 7, 8)
  ifelse(year == "--", "NA", paste(year, mo, day, sep = "-"))
}

rd = apply(rd1, 2, fix)
dimnames(rd) = dimnames(rd1)

dt1 <- seq(from = as.Date(z.dates[1]), to = as.Date("2009-03-25"), by = "day")
dt  = sapply(dt1, as.character)

fin = dt
ck_rows = length(dt)
bad = character(0)
start_t_all = Sys.time()

for(cn in 1:ncol(ua)){
  uac = ua[,cn]
  tkr = colnames(ua)[cn]
  rdc = rd[,cn]
  ua_rd = cbind(uac, rdc)
  colnames(ua_rd) = c(tkr, 'rt_date')
  xx1 = merge(dt, ua_rd, by.x = 1, by.y = 'rt_date', all.x = T)
  xx = as.character(xx1[,2])
  values <- c(NA, xx[!is.na(xx)])
  ind = cumsum(!is.na(xx)) + 1
  y <- values[ind]
  if(ck_rows == length(y)){
    fin = data.frame(fin, y)
  }else{
    bad = c(bad, tkr)
  }
}
colnames(fin) = c('daily_dates', nms)

# after this I would slice and dice the data into weekly, monthly, etc.
# periodicity as needed, but this leaves it in daily format, which is as
# granular as I will get.

print("over all time for loop")
print(Sys.time() - start_t_all)

Regards,

Ben
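On the speed concern above, one possible restructuring (an editor's sketch, not from the thread): findInterval() maps each daily date straight to the most recent report date at or before it, which would replace both the per-column merge and the cumsum fill:

```r
# Sketch with made-up report dates/values; assumes report dates per column
# are sorted ascending, which findInterval requires.
rep.dates <- as.Date(c("2007-07-16", "2007-10-15", "2008-01-19"))
vals      <- c(12649, 13039, 13425)

days <- seq(as.Date("2007-07-01"), as.Date("2007-10-20"), by = "day")
idx  <- findInterval(days, rep.dates)  # 0 = before the first report date
fill <- c(NA, vals)[idx + 1]           # NA for the pre-report stretch

fill[c(1, 20, 108)]  # NA 12649 13039
```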
[R] Matrix Package, sparseMatrix, more NaN's than zeros
Hello,

I have a lot of data and it has a lot of NaN values. I want to compress the data so I don't have memory issues later. Using the Matrix package, the sparseMatrix function, and some fiddling around, I have successfully reduced the 'size' of my data (as measured by object.size()). However, NaN values are found all over in my data; zeros are important, but zeros are found very infrequently in my data. So I turn NaN's into zeros and zeros into very small numbers. I don't like changing the zeros into small numbers, because that is not the truth. I know this is a judgement call on my part based on the impact non-zero zeros will have on my analysis.

My question is: do I have any other option? Is there a better solution for this issue?

Here is a small example:

# make sample data
M <- Matrix(10 + 1:28, 4, 7)
M2 <- cBind(-1, M)
M2[, c(2,4:6)] <- 0
M2[1:2, 2] <- M2[c(3,4), ] <- M2[, c(3,4,5)] <- NaN
M3 = M2

# my 'fiddling' to make sparseMatrix save space
M3[M3==0] = 1e-08      # turn zeros into small values
M3[is.nan(M3)] = 0     # turn NaN's into zeros

# saving space
sM <- as(M3, "sparseMatrix")

# Note that this is just a sample of what I am doing. This reduces the
# object.size() if you have a lot more data. In this simple example it
# actually increases the object.size() because the data is so small.

What I know about Matrix: http://cran.r-project.org/web/packages/Matrix/vignettes/Intro2Matrix.pdf

Thanks,
Ben
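One way to avoid turning zeros into fake small numbers (a sketch of an alternative, not something from the thread; the object names `sv` and `pat` are mine): store two sparse objects, one holding the values with NaN coded as 0, and one logical pattern marking which cells hold real data. True zeros survive because the pattern disambiguates "structural zero" (really NaN) from "real zero".

```r
library(Matrix)

set.seed(1)
m <- matrix(NaN, 200, 200)
idx <- sample(length(m), 300)
m[idx] <- c(0, rnorm(299))                # a few real values, including one true zero

sv <- m
sv[is.nan(sv)] <- 0                       # values, with NaN coded as 0
sv <- Matrix(sv, sparse = TRUE)
pat <- Matrix(!is.nan(m), sparse = TRUE)  # TRUE where a real value exists

# reconstruct: NaN everywhere except where 'pat' says the value is real
rec  <- matrix(NaN, nrow(m), ncol(m))
keep <- as.matrix(pat)
rec[keep] <- as.matrix(sv)[keep]

stopifnot(identical(is.nan(rec), is.nan(m)), all(rec[keep] == m[keep]))
```

Both `sv` and `pat` only store the ~300 real cells, so the footprint stays small while zeros remain honest zeros; whether two sparse objects beat one depends on how your analysis consumes the data.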
[R] index instead of loop?
Hello,

Does anyone know of a way I can speed this up? Basically I'm attempting to get the data item on the same row as the report date, for each report date available. In reality I have over 11k columns, not just A, B, C, D, and I have to do that over 100 times. My solution is slow, but it works. The loop is slow because of merge.

# create sample data
z.dates = c("2007-03-31","2007-06-30","2007-09-30","2007-12-31",
            "2008-03-31","2008-06-30","2008-09-30","2008-12-31")
nms = c("A","B","C","D")
# these are the report dates that are the real days the data was available
rd1 = matrix(c("20070514","20070814","20071115","20080213","20080514","20080814","20081114","20090217",
               "20070410","20070709","20071009","20080109","20080407","20080708","20081007","20090112",
               "20070426","--","--","--","--","--","--","20090319",
               "--","--","--","--","--","--","--","--"),
             nrow=8, ncol=4)
dimnames(rd1) = list(z.dates, nms)
# this is the unadjusted raw data, that always has the same dimensions,
# rownames, and colnames as the report dates
ua = matrix(c(640.35,636.16,655.91,657.41,682.06,702.90,736.15,667.65,
              2625.050,2625.050,2645.000,2302.000,1972.000,1805.000,1547.000,1025.000,
              NaN,NaN,-98.426,190.304,180.894,183.220,172.520,144.138,
              NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),
            nrow=8, ncol=4)
dimnames(ua) = list(z.dates, nms)
# change anything below.
# My first attempt at this
fix = function(x) {
  year = substring(x, 1, 4)
  mo   = substring(x, 5, 6)
  day  = substring(x, 7, 8)
  ifelse(year == "--", "NA", paste(year, mo, day, sep = "-"))
}
rd = apply(rd1, 2, fix)
dimnames(rd) = dimnames(rd1)
dt1 <- seq(from = as.Date(z.dates[1]), to = as.Date("2009-03-25"), by = "day")
dt = sapply(dt1, as.character)
fin = dt
ck_rows = length(dt)
bad = character(0)
start_t_all = Sys.time()
for(cn in 1:ncol(ua)){
  uac = ua[,cn]
  tkr = colnames(ua)[cn]
  rdc = rd[,cn]
  ua_rd = cbind(uac, rdc)
  colnames(ua_rd) = c(tkr, 'rt_date')
  xx1 = merge(dt, ua_rd, by.x = 1, by.y = 'rt_date', all.x = TRUE)
  xx = as.character(xx1[,2])
  values <- c(NA, xx[!is.na(xx)])
  ind = cumsum(!is.na(xx)) + 1
  y <- values[ind]
  if(ck_rows == length(y)){
    fin = data.frame(fin, y)
  } else {
    bad = c(bad, tkr)
  }
}
colnames(fin) = c('daily_dates', nms)
print("over all time for loop")
print(Sys.time() - start_t_all)
print(fin)

Thanks,
Ben

PS - the real/overall issue is below, but it is probably too involved to follow.

On Sat, Mar 3, 2012 at 2:30 PM, Ben quant wrote:
> Hello,
>
> Thank you for your help/advice!
>
> The issue here is speed/efficiency. I can do what I want, but it's really
> slow.
>
> The goal is to have the ability to do calculations on my data and have it
> adjusted for look-ahead. I see two ways to do this:
> (I'm open to more ideas. My terminology: unadjusted = values not adjusted
> for look-ahead bias; adjusted = values adjusted for look-ahead bias.)
>
> 1) I could a) do calculations on unadjusted values, then b) adjust the
> resulting values for look-ahead bias. Here is what I mean:
> a) I could say the following using a time series of val1: [(val1 - val1 4
> periods ago) / val1 4 periods ago] = resultval. ("Periods" correspond to
> the z.dates in my example below.)
> b) Then I would adjust the resultval for look-ahead based on val1's
> associated report date.
> Note: I don't think this will be the fastest.
> > 2) I could do the same calculation [(val1 - val1 4 periods ago) / val1 4 > periods ago] = resultval, but my calculation function would get the 'right' > values that would have no look-ahead bias. I'm not sure how I would do > this, but maybe a query starting with the date that I want, indexed to > appropriate report date indexed to the correct value to return. But how do > I do this in R? I think I would have to put this in our database and do a > query. The data comes to me in RData format. I could put it all in our > database via PpgSQL which we already use. > Note: I think this will be fastest. > > Anyway, my first attempt at this was to solve part b of #1 above. Here is > how my data looks and my first attempt at solving part b of idea #1 above. > It only takes 0.14 seconds for my mock data, but that is way too slow. The > major
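For the record, the per-column merge()/cumsum() carry-forward above can also be done with a single findInterval() call per column, which scales much better to thousands of columns. A sketch under the thread's data layout (report dates sorted ascending within each column, NA where "--" appeared); the helper name `locf_col` and the toy objects are mine, not from the thread:

```r
# toy stand-ins for one slice of the thread's 'ua' values / report dates
ua  <- cbind(A = c(640.35, 636.16, 655.91), B = c(2625.05, NaN, 2645.00))
rd  <- list(A = as.Date(c("2007-05-14", "2007-08-14", "2007-11-15")),
            B = as.Date(c("2007-04-10", NA, "2007-10-09")))  # NA = "--"
dt1 <- seq(as.Date("2007-05-01"), as.Date("2007-12-01"), by = "day")

# last observation carried forward, keyed on report date
locf_col <- function(vals, dates) {
  ok <- !is.na(dates)                              # skip unreported periods
  i  <- findInterval(as.numeric(dt1), as.numeric(dates[ok]))
  c(NA, vals[ok])[i + 1L]                          # i == 0 (before 1st report) -> NA
}

fin <- data.frame(daily_dates = dt1,
                  vapply(colnames(ua), function(j) locf_col(ua[, j], rd[[j]]),
                         numeric(length(dt1))))
```

One vectorized binary-search lookup per column replaces the merge() against the full daily grid, so each column costs O(n log k) for n grid days and k report dates.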
Re: [R] index instead of loop?
Just looking at this, but it looks like ix doesn't exist:

sapply(1:length(inxlist), function(i) if(length(ix[[i]])) fin1[ix[[i]], tkr + 1] <<- ua[i, tkr])

Trying to sort it out now.

Ben

On Mon, Mar 5, 2012 at 7:48 PM, Rui Barradas wrote:
> Hello,
>
> > On Mar 05, 2012, at 8:53pm, Ben quant wrote:
> > Hello,
> >
> > Does anyone know of a way I can speed this up?
>
> Maybe, let's see.
>
> > # change anything below.
>
> # Yes.
> # First, start by using dates, not characters
>
> fdate <- function(x, format="%Y%m%d"){
>     DF <- data.frame(x)
>     for(i in colnames(DF)){
>         DF[, i] <- as.Date(DF[, i], format=format)
>         class(DF[, i]) <- "Date"
>     }
>     DF
> }
>
> rd1 <- fdate(rd1)
> # This is yours, use it.
> dt1 <- seq(from = as.Date(z.dates[1]), to = as.Date("2009-03-25"), by = "day")
> # Set up the result, no time expensive 'cbind' inside a loop
> fin1 <- data.frame(matrix(NA, nrow=length(dt1), ncol=ncol(ua) + 1))
> fin1[, 1] <- dt1
> nr <- nrow(rd1)
>
> # And vectorize
> for(tkr in 1:ncol(ua)){
>     x <- c(rd1[, tkr], as.Date("-12-31"))
>     inxlist <- lapply(1:nr, function(i) which(x[i] <= dt1 & dt1 < x[i + 1]))
>     sapply(1:length(inxlist), function(i) if(length(ix[[i]])) fin1[ix[[i]], tkr + 1] <<- ua[i, tkr])
> }
> colnames(fin1) <- c("daily_dates", colnames(ua))
>
> # Check results
> str(fin)
> str(fin1)
> head(fin)
> head(fin1)
> tail(fin)
> tail(fin1)
>
> Note that 'fin' has factors, 'fin1' numerics.
> I haven't timed it but I believe it should be faster.
>
> Hope this helps,
>
> Rui Barradas
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/index-instead-of-loop-tp4447672p4448567.html
> Sent from the R help mailing list archive at Nabble.com.
Re: [R] index instead of loop?
I think this is what you meant:

z.dates = c("2007-03-31","2007-06-30","2007-09-30","2007-12-31",
            "2008-03-31","2008-06-30","2008-09-30","2008-12-31")
nms = c("A","B","C","D")
# these are the report dates that are the real days the data was available
rd1 = matrix(c("20070514","20070814","20071115","20080213","20080514","20080814","20081114","20090217",
               "20070410","20070709","20071009","20080109","20080407","20080708","20081007","20090112",
               "20070426","--","--","--","--","--","--","20090319",
               "--","--","--","--","--","--","--","--"),
             nrow=8, ncol=4)
dimnames(rd1) = list(z.dates, nms)
# this is the unadjusted raw data, that always has the same dimensions,
# rownames, and colnames as the report dates
ua = matrix(c(640.35,636.16,655.91,657.41,682.06,702.90,736.15,667.65,
              2625.050,2625.050,2645.000,2302.000,1972.000,1805.000,1547.000,1025.000,
              NaN,NaN,-98.426,190.304,180.894,183.220,172.520,144.138,
              NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),
            nrow=8, ncol=4)
dimnames(ua) = list(z.dates, nms)

##
fdate <- function(x, format="%Y%m%d"){
  DF <- data.frame(x)
  for(i in colnames(DF)){
    DF[, i] <- as.Date(DF[, i], format=format)
    class(DF[, i]) <- "Date"
  }
  DF
}
rd1 <- fdate(rd1)
# This is yours, use it.
dt1 <- seq(from = as.Date(z.dates[1]), to = as.Date("2009-03-25"), by = "day")
# Set up the result, no time expensive 'cbind' inside a loop
fin1 <- data.frame(matrix(NA, nrow=length(dt1), ncol=ncol(ua) + 1))
fin1[, 1] <- dt1
nr <- nrow(rd1)
# And vectorize
for(tkr in 1:ncol(ua)){
  x <- c(rd1[, tkr], as.Date("-12-31"))
  # inxlist <- lapply(1:nr, function(i) which(x[i] <= dt1 & dt1 < x[i + 1]))
  ix <- lapply(1:nr, function(i) which(x[i] <= dt1 & dt1 < x[i + 1]))
  sapply(1:length(ix), function(i) if(length(ix[[i]])) fin1[ix[[i]], tkr + 1] <<- ua[i, tkr])
}
colnames(fin1) <- c("daily_dates", colnames(ua))
# Check results
str(fin1)
head(fin1)
tail(fin1)

On Tue, Mar 6, 2012 at 7:34 AM, Ben quant wrote:
> Just looking at this, but it looks like ix doesn't exist:
>
>     sapply(1:length(inxlist), function(i) if(length(ix[[i]])) fin1[ix[[i]], tkr + 1] <<- ua[i, tkr])
>
> Trying to sort it out now.
>
> Ben
Re: [R] index instead of loop?
Unfortunately, your solution does not scale well. (Tough for you to test this without my real data.) If ua is my data and rd1 holds my report dates (same as the code below) and I use more columns, your solution slows considerably. Remember I have ~11k columns in my real data, so scalability is critical.

Here are the processing times using real data:

Use 4 columns: ua = ua[,1:4]; rd1 = rd1[,1:4]
mine: 2.4 sec's
yours: 1.39 sec's
Note: yours is faster with 4 columns (like the mockup data I provided.)

Use 150 columns: ua = ua[,1:150]; rd1 = rd1[,1:150]
mine: 5 sec's
yours: 9 sec's

Use 300 columns: ua = ua[,1:300]; rd1 = rd1[,1:300]
mine: 9.5 sec's
yours: 1 min

# data
Here is the mockup data and code used. (Anyone looking to test the scalability may want to add more columns.)

Mockup data:
z.dates = c("2007-03-31","2007-06-30","2007-09-30","2007-12-31",
            "2008-03-31","2008-06-30","2008-09-30","2008-12-31")
nms = c("A","B","C","D")
# these are the report dates that are the real days the data was available
rd1 = matrix(c("20070514","20070814","20071115","20080213","20080514","20080814","20081114","20090217",
               "20070410","20070709","20071009","20080109","20080407","20080708","20081007","20090112",
               "20070426","--","--","--","--","--","--","20090319",
               "--","--","--","--","--","--","--","--"),
             nrow=8, ncol=4)
dimnames(rd1) = list(z.dates, nms)

My code:
start_t_all = Sys.time()
nms = colnames(ua)
fix = function(x) {
  year = substring(x, 1, 4)
  mo   = substring(x, 5, 6)
  day  = substring(x, 7, 8)
  ifelse(year == "--", "NA", paste(year, mo, day, sep = "-"))
}
rd = apply(rd1, 2, fix)
dimnames(rd) = dimnames(rd1)
dt1 <- seq(from = as.Date(z.dates[1]), to = as.Date(z.dates[length(z.dates)]), by = "day")
dt = sapply(dt1, as.character)
fin = dt
ck_rows = length(dt)
bad = character(0)
for(cn in 1:ncol(ua)){
  uac = ua[,cn]
  tkr = colnames(ua)[cn]
  rdc = rd[,cn]
  ua_rd = cbind(uac, rdc)
  colnames(ua_rd) = c(tkr, 'rt_date')
  xx1 = merge(dt, ua_rd, by.x = 1, by.y = 'rt_date', all.x = TRUE)
  xx = as.character(xx1[,2])
  values <- c(NA, xx[!is.na(xx)])
  ind = cumsum(!is.na(xx)) + 1
  y <- values[ind]
  if(ck_rows == length(y)){
    fin = data.frame(fin, y)
  } else {
    bad = c(bad, tkr)
  }
}
if(length(bad)){
  nms = nms[!(nms %in% bad)]
}
colnames(fin) = c('daily_dates', nms)
print("over all time for loop")
print(Sys.time() - start_t_all)

### Your code:
z.dates = rownames(ua)
start_t_all = Sys.time()
fdate <- function(x, format="%Y%m%d"){
  DF <- data.frame(x)
  for(i in colnames(DF)){
    DF[, i] <- as.Date(DF[, i], format=format)
    class(DF[, i]) <- "Date"
  }
  DF
}
rd1 <- fdate(rd1)
# This is yours, use it.
dt1 <- seq(from = as.Date(z.dates[1]), to = as.Date(z.dates[length(z.dates)]), by = "day")
# Set up the result, no time expensive 'cbind' inside a loop
fin1 <- data.frame(matrix(NA, nrow=length(dt1), ncol=ncol(ua) + 1))
fin1[, 1] <- dt1
nr <- nrow(rd1)
# And vectorize
for(tkr in 1:ncol(ua)){
  x <- c(rd1[, tkr], as.Date("-12-31"))
  # inxlist <- lapply(1:nr, function(i) which(x[i] <= dt1 & dt1 < x[i + 1]))
  ix <- lapply(1:nr, function(i) which(x[i] <= dt1 & dt1 < x[i + 1]))
  sapply(1:length(ix), function(i) if(length(ix[[i]])) fin1[ix[[i]], tkr + 1] <<- ua[i, tkr])
}
colnames(fin1) <- c("daily_dates", colnames(ua))
print(Sys.time() - start_t_all)

Thanks for your efforts though,
ben

On Tue, Mar 6, 2012 at 7:39 AM, Ben quant wrote:
> I think this is what you meant:
Re: [R] index instead of loop?
Hello,

In case anyone is interested in a faster solution for lots of columns. This solution is slower if you only have a few columns. If anyone has anything faster, I would be interested in seeing it.

### some mockup data
z.dates = c("2007-03-31","2007-06-30","2007-09-30","2007-12-31",
            "2008-03-31","2008-06-30","2008-09-30","2008-12-31")
nms = c("A","B","C","D") # add more columns to see how the code below is faster
# these are the report dates that are the real days the data was available,
# so show the data the day after this date ('after' is a design decision)
rd1 = matrix(c("20070514","20070814","20071115","20080213","20080514","20080814","20081114","20090217",
               "20070410","20070709","20071009","20080109","20080407","20080708","20081007","20090112",
               "20070426","--","--","--","--","--","--","20090319",
               "--","--","--","--","--","--","--","--"),
             nrow=8, ncol=4)
dimnames(rd1) = list(z.dates, nms)
# this is the unadjusted raw data, that always has the same dimensions,
# rownames, and colnames as the report dates
ua = matrix(c(640.35,636.16,655.91,657.41,682.06,702.90,736.15,667.65,
              2625.050,2625.050,2645.000,2302.000,1972.000,1805.000,1547.000,1025.000,
              NaN,NaN,-98.426,190.304,180.894,183.220,172.520,144.138,
              NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),
            nrow=8, ncol=4)
dimnames(ua) = list(z.dates, nms)

# the fastest code I have found:
start_t_all = Sys.time()
fix = function(x) {
  year = substring(x, 1, 4)
  mo   = substring(x, 5, 6)
  day  = substring(x, 7, 8)
  ifelse(year == "--", "NA", paste(year, mo, day, sep = "-"))
}
rd = apply(rd1, 2, fix)
dimnames(rd) = dimnames(rd1)
wd1 <- seq(from = as.Date(min(z.dates)), to = Sys.Date(), by = "day")
#wd1 = wd1[weekdays(wd1) == "Friday"] # uncomment to go weekly
wd = sapply(wd1, as.character)
mat = matrix(NA, nrow=length(wd), ncol=ncol(ua))
rownames(mat) = wd
nms = as.Date(rownames(ua))
for(i in 1:length(wd)){
  d = as.Date(wd[i])
  diff = abs(nms - d)
  rd_row_idx = max(which(diff == min(diff)))
  rd_col_idx = which(rd[rd_row_idx,] < d)
  if((rd_row_idx - 1) > 0){
    mat[i,] = ua[rd_row_idx - 1,]
  }
  if(length(rd_col_idx)){
    mat[i, rd_col_idx] = ua[rd_row_idx, rd_col_idx]
  }
}
colnames(mat) = colnames(ua)
print(Sys.time() - start_t_all)

Regards,
Ben

On Tue, Mar 6, 2012 at 8:22 AM, Rui Barradas wrote:
> Hello,
>
> > Just looking at this, but it looks like ix doesn't exist:
> >
> >     sapply(1:length(inxlist), function(i) if(length(ix[[i]]))
> >     fin1[ix[[i]], tkr + 1] <<- ua[i, tkr])
> >
> > Trying to sort it out now.
>
> Right, sorry.
> I've changed the name from 'ix' to 'inxlist' to make it more readable just
> before posting.
> And since the object 'ix' still existed in the R global environment it
> didn't throw an error...
>
> Your correction in the post that followed is what I meant.
>
> Correction (full loop, tested):
>
> for(tkr in 1:ncol(ua)){
>     x <- c(rd1[, tkr], as.Date("-12-31"))
>     ix <- lapply(1:nr, function(i) which(x[i] <= dt1 & dt1 < x[i + 1]))
>     sapply(1:length(ix), function(i) if(length(ix[[i]])) fin1[ix[[i]], tkr + 1] <<- ua[i, tkr])
> }
>
> Rui Barradas
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/index-instead-of-loop-tp4447672p4450186.html
> Sent from the R help mailing list archive at Nabble.com.
[R] extract same columns and rows in two matrices
Hello,

I have two matrices. They both have different row names and column names, but they have some common row names and column names. The row names and column names that are in common are what I am interested in. I also want the columns in the two matrices aligned the same. In the end, I need rd[1,1] and ua[1,1], for example, to access the same column and row in both matrices. Thank you very much for all your help. I can do it, but I am pretty sure there is a better/faster way:

# make some sample data
ua = matrix(c(1,2,3,4,5,6), nrow=2, ncol=3)
colnames(ua) = c('a','b','c')
rownames(ua) = c('ra','rb')
rd1 = matrix(c(7,8,9,10,11,12,13,14,15,16,17,18), nrow=3, ncol=4)
colnames(rd1) = c('c','b','a','d')
rownames(rd1) = c('rc','rb','ra')

> rd1
   c  b  a  d
rc 7 10 13 16
rb 8 11 14 17
ra 9 12 15 18
> ua
   a b c
ra 1 3 5
rb 2 4 6

# get common columns and rows and order them the same; this works but is slow'ish
rd1_cn = colnames(rd1)
ua_cn = colnames(ua)
common_t = merge(rd1_cn, ua_cn, by.x=1, by.y=1)
common_t = as.character(common_t[,1])
rd1 = rd1[,common_t]
ua = ua[,common_t]
rd1_d = rownames(rd1)
ua_d = rownames(ua)
common_d = merge(rd1_d, ua_d, by.x=1, by.y=1)
common_d = as.character(common_d[,1])
rd = rd1[common_d,]
ua = ua[common_d,]

# this is what I want
> rd
    a  b c
ra 15 12 9
rb 14 11 8
> ua
   a b c
ra 1 3 5
rb 2 4 6

Thanks!
ben
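For what it's worth, the merge() round-trips above can be replaced by intersect(), which returns the common names directly (in the order of its first argument, so starting from ua yields the a, b, c ordering shown in the desired output):

```r
ua <- matrix(1:6, nrow = 2, ncol = 3,
             dimnames = list(c("ra", "rb"), c("a", "b", "c")))
rd1 <- matrix(7:18, nrow = 3, ncol = 4,
              dimnames = list(c("rc", "rb", "ra"), c("c", "b", "a", "d")))

cc <- intersect(colnames(ua), colnames(rd1))  # "a" "b" "c"
rr <- intersect(rownames(ua), rownames(rd1))  # "ra" "rb"

# character subscripts index by name, so both results are aligned cell-for-cell
rd  <- rd1[rr, cc, drop = FALSE]
ua2 <- ua[rr, cc, drop = FALSE]
```

Because both subscripts are name vectors, alignment is guaranteed no matter how the rows and columns of the two inputs are shuffled, and there is no intermediate data.frame.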
Re: [R] index instead of loop?
Hmm. If I understand what you are saying, you are correct. I get 144.138 for 2009-03-20 for column C. Maybe I posted the wrong code? If so, sorry. Let me know if you disagree.

I still plan to come back to this and optimize it more, so if you see anything that would make it faster that would be great. Of course, the for loop is my focus for optimization. Due to some issues in the real data I had to add the lag and lag2 stuff in (I don't think I had that before). In my real data the values don't really belong in the z.dates they are aligned with, but to avoid lots of empty values in the flat matrix (ua) they were forced in. I can push them into their "real" dates via looking at a deeper lag.

I'm thinking that all the "which" stuff in the for loop can be nested so that it runs faster. Also the as.Date, abs() and max(which(...)) stuff seems like it could be handled better/faster or outside the loop. If you are interested in helping further, I can post a link to some 'real' data. Here is what I am using now and it seems to work.
Sorry, my code is still very fluid:

z.dates = c("2007-03-31","2007-06-30","2007-09-30","2007-12-31",
            "2008-03-31","2008-06-30","2008-09-30","2008-12-31")
nms = c("A","B","C","D")
# these are the report dates that are the real days the data was available
rd1 = matrix(c("20070514","20070814","20071115","20080213","20080514","20080814","20081114","20090217",
               "20070410","20070709","20071009","20080109","20080407","20080708","20081007","20090112",
               "20070426","--","--","--","--","--","--","20090319",
               "--","--","--","--","--","--","--","--"),
             nrow=8, ncol=4)
dimnames(rd1) = list(z.dates, nms)
# this is the unadjusted raw data, that always has the same dimensions,
# rownames, and colnames as the report dates
ua = matrix(c(640.35,636.16,655.91,657.41,682.06,702.90,736.15,667.65,
              2625.050,2625.050,2645.000,2302.000,1972.000,1805.000,1547.000,1025.000,
              NaN,NaN,-98.426,190.304,180.894,183.220,172.520,144.138,
              NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),
            nrow=8, ncol=4)
dimnames(ua) = list(z.dates, nms)
z.dates = rownames(ua)

## by rows
## FASTEST
start_t_all = Sys.time()
fix = function(x) {
  year = substring(x, 1, 4)
  mo   = substring(x, 5, 6)
  day  = substring(x, 7, 8)
  ifelse(year == "--", "--", paste(year, mo, day, sep = "-"))
}
rd = apply(rd1, 2, fix)
dimnames(rd) = dimnames(rd1)
wd1 <- seq(from = as.Date(min(z.dates)), to = Sys.Date(), by = "day")
wd1 = wd1[weekdays(wd1) == "Friday"]   # weekly
wd = sapply(wd1, as.character)
mat = matrix(NA, nrow=length(wd), ncol=ncol(ua))
rownames(mat) = wd
nms = as.Date(rownames(ua))
for(i in 1:length(wd)){
  d = as.Date(wd[i])
  diff = abs(nms - d)
  rd_row_idx = max(which(diff == min(diff)))
  rd_col_idx      = which(as.Date(rd[rd_row_idx, ],     format="%Y-%m-%d") < d)
  rd_col_idx_lag  = which(as.Date(rd[rd_row_idx - 1, ], format="%Y-%m-%d") < d)
  rd_col_idx_lag2 = which(as.Date(rd[rd_row_idx - 2, ], format="%Y-%m-%d") < d)
  if(length(rd_col_idx_lag2) && (rd_row_idx - 2) > 0){
    mat[i, rd_col_idx_lag2] = ua[rd_row_idx - 2, rd_col_idx_lag2]
  }
  if(length(rd_col_idx_lag)){
    mat[i, rd_col_idx_lag] = ua[rd_row_idx - 1, rd_col_idx_lag]
  }
  if(length(rd_col_idx)){
    mat[i, rd_col_idx] = ua[rd_row_idx, rd_col_idx]
  }
}
colnames(mat) = colnames(ua)
print(Sys.time() - start_t_all)

Let me know if you disagree,
Ben

On Wed, Mar 7, 2012 at 5:57 PM, Rui Barradas wrote:
> Hello again.
>
> Ben quant wrote
> > Hello,
> >
> > In case anyone is interested in a faster solution for lots of columns.
> > This solution is slower if you only have a few columns. If anyone has
> > anything faster, I would be interested in seeing it.
Re: [R] index values of one matrix to another of a different size
> Hello,
>
> Is this the fastest way to use indices from one matrix to reference rows
> in another, smaller matrix? I am dealing with very big data (lots of
> columns, and I have to do this lots of times).
>
> ## sample data ##
> vals = matrix(LETTERS[1:9], nrow=3, ncol=3)
> colnames(vals) = c('col1','col2','col3')
> rownames(vals) = c('row1','row2','row3')
>
> > vals
>      col1 col2 col3
> row1 "A"  "D"  "G"
> row2 "B"  "E"  "H"
> row3 "C"  "F"  "I"
>
> # this is a matrix of row references to vals above. The values all stay in
> # the same column but shift in row via the indices.
> indx = matrix(c(1,1,3,3,2,2,2,3,1,2,2,1), nrow=4, ncol=3)
>
> > indx
>      [,1] [,2] [,3]
> [1,]    1    2    1
> [2,]    1    2    2
> [3,]    3    2    2
> [4,]    3    3    1
> ### end sample data
>
> # my solution
> > matrix(vals[cbind(c(indx), rep(1:ncol(indx), each=nrow(indx)))],
>          nrow=nrow(indx), ncol=ncol(indx))
>      [,1] [,2] [,3]
> [1,] "A"  "E"  "G"
> [2,] "A"  "E"  "H"
> [3,] "C"  "E"  "H"
> [4,] "C"  "F"  "G"
>
> Thanks,
>
> Ben
>
> PS - Rui, I thought you may want to see this since I think this will be a
> faster way to deal with the issue you were working with me on... although
> I don't show how I build the matrix of indices, I think you get the idea.
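The one-liner above can be written a bit more transparently as a two-column (row, column) index matrix, which is the form `[` treats as cell coordinates; the result is the same:

```r
vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3,
               dimnames = list(c("row1", "row2", "row3"),
                               c("col1", "col2", "col3")))
indx <- matrix(c(1,1,3,3, 2,2,2,3, 1,2,2,1), nrow = 4, ncol = 3)

# pair each row index with its column number, then index vals once
ij  <- cbind(as.vector(indx), rep(seq_len(ncol(indx)), each = nrow(indx)))
out <- matrix(vals[ij], nrow = nrow(indx), ncol = ncol(indx))
```

A two-column integer index does a single vectorized lookup with no intermediate character conversion, so it stays fast as `indx` grows.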
Re: [R] index instead of loop?
Here is my latest. I kind of changed the problem (for speed). In real life I have over 300 uadata-type matrices, each having over 20 rows and over 11,000 columns. However, the rddata file is valid for all of the uadata matrices that I have (300).

What I am doing now: I'm creating a matrix of row indices which will either lag the row values or not, based on the report data (rddata). Then I apply that matrix of row indices to each uadata data item (300 times) to create a matrix of the correctly row-adjusted data items for the correct columns, at the dimensions and periodicity that I want (weekly in this case). The key being: I only do the 'adjustment' once (which is comparatively slow), and I apply those results to each data matrix (fast!). I'm open to ideas. I put this together quickly, so hopefully all is well.

# sample data
zdates = c("2007-03-31","2007-06-30","2007-09-30","2007-12-31",
           "2008-03-31","2008-06-30","2008-09-30","2008-12-31")
nms = c("A","B","C","D")
# these are the report dates that are the real days the data was available
rddata = matrix(c("20070514","20070814","20071115","20080213","20080514","20080814","20081114","20090217",
                  "20070410","20070709","20071009","20080109","20080407","20080708","20081007","20090112",
                  "20070426","--","--","--","--","--","--","20090319",
                  "--","--","--","--","--","--","--","--"),
                nrow=8, ncol=4)
dimnames(rddata) = list(zdates, nms)
# this is the unadjusted raw data, that always has the same dimensions,
# rownames, and colnames as the report dates
uadata = matrix(c(640.35,636.16,655.91,657.41,682.06,702.90,736.15,667.65,
                  2625.050,2625.050,2645.000,2302.000,1972.000,1805.000,1547.000,1025.000,
                  NaN,NaN,-98.426,190.304,180.894,183.220,172.520,144.138,
                  NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),
                nrow=8, ncol=4)
dimnames(uadata) = list(zdates, nms)

# I do this once
fix = function(x) {
  year = substring(x, 1, 4)
  mo   = substring(x, 5, 6)
  day  = substring(x, 7, 8)
  ifelse(year == "--", "--", paste(year, mo, day, sep = "-"))
}
rd = apply(rddata, 2, fix)
dimnames(rd) = dimnames(rddata)
wd1 <- seq(from = as.Date(min(zdates)), to = Sys.Date(), by = "day")
wd1 = wd1[weekdays(wd1) == "Friday"]   # weekly
wd = sapply(wd1, as.character)
mat = matrix(NA, nrow=length(wd), ncol=ncol(uadata))
rownames(mat) = wd
nms = as.Date(rownames(uadata))
for(i in 1:length(wd)){
  d = as.Date(wd[i])
  diff = abs(nms - d)
  rd_row_idx = max(which(diff == min(diff)))
  rd_row_idx_lag = rd_row_idx - 1
  rd_row_idx_lag2 = rd_row_idx - 2
  rd_col_idx      = which(as.Date(rd[rd_row_idx, ],      format="%Y-%m-%d") < d)
  rd_col_idx_lag  = which(as.Date(rd[rd_row_idx_lag, ],  format="%Y-%m-%d") < d)
  rd_col_idx_lag2 = which(as.Date(rd[rd_row_idx_lag2, ], format="%Y-%m-%d") < d)
  ## if(length(rd_col_idx_lag2) && (rd_row_idx - 2) > 0){
  if(rd_row_idx_lag2 > 0){
    # mat[i,rd_col_idx_lag2] = ua[rd_row_idx_lag2,rd_col_idx_lag2]
    mat[i, rd_col_idx_lag2] = rd_row_idx_lag2
  }
  #if(length(rd_col_idx_lag)){
  mat[i, rd_col_idx_lag] = rd_row_idx_lag
  #}
  #if( length(rd_col_idx)){
  mat[i, rd_col_idx] = rd_row_idx
  #}
}
indx = mat
vals = uadata

## I do this 300 times
x = matrix(vals[cbind(c(indx), rep(1:ncol(indx), each=nrow(indx)))],
           nrow=nrow(indx), ncol=ncol(indx))

Regards,
ben

On Thu, Mar 8, 2012 at 11:40 AM, Rui Barradas wrote:
> Hello,
>
> > Hmm. If I understand what you are saying, you are correct. I get
> > 144.138 for 2009-03-20 for column C. Maybe I posted the wrong code?
> > If so, sorry.
>
> I think I have the fastest so far solution, and it checks with your
> corrected, last one.
>
> I've made just a change: to transform it into a function I renamed the
> parameters (only for use inside the function) 'zdates', without the
> period, 'rddata' and 'uadata'.
>
> 'fun1' is yours, 'fun2', mine. Here it goes.
> > > fun1 <- function(zdates, rddata, uadata){ > fix = function(x) >{ > year = substring(x, 1, 4) > mo = substring(x, 5, 6) > day = substring(x, 7, 8) > ifelse(year=="--", "--", paste(year, mo, day, sep = "-")) > >} > rd = apply(rddata, 2, fix) >dimnames(rd) = dimnames(rd) > >wd1 <- seq(from =as.Date(min(zdates)), to = Sys.Date(), by = "day") > #wd1 = wd1[weekdays(wd1) == "Friday"] # uncomment to go weekly >wd = sapply(wd1, as.character) > mat = matrix(NA,nrow=length(wd),ncol=ncol(uadata)) >rownames(mat) = wd >nms = as.Date(rownames(uadata)) > >for(i in 1:length(wd)){ > d = as.Date(wd[i]) > diff = abs(nms - d) > rd_row_idx = max(which(diff == min(diff))) > rd_col_idx = which(as.Date(rd[rd_row_idx,], format="%Y-%m-%d") < d) > rd_col_idx_lag = which(as.Date(rd[rd_row_idx - 1,], format="%Y-%m-%d") > < d) > rd_col_idx_lag2 = which(as.Date(rd[rd_row_idx - 2,], > format="%Y-%m-%d") < d) > > if(length(rd_col_idx_lag2) && (rd_row_idx
Re: [R] index values of one matrix to another of a different size
Thanks for the info. Unfortunately it's a little bit slower after one apples-to-apples test using my big data. Mine: 0.28 seconds. Yours: 0.73 seconds. Not a big deal, but significant when I have to do this 300 to 500 times.

regards,
ben

On Fri, Mar 9, 2012 at 1:23 PM, Rui Barradas wrote:
> Hello,
>
> I don't know if it's the fastest, but it's more natural to have an index
> matrix with two columns only, one for each coordinate. And it's fast.
>
> fun <- function(valdata, inxdata){
>     nr <- nrow(inxdata)
>     nc <- ncol(inxdata)
>     mat <- matrix(NA, nrow=nr*nc, ncol=2)
>     i1 <- 1
>     i2 <- nr
>     for(j in 1:nc){
>         mat[i1:i2, 1] <- inxdata[, j]
>         mat[i1:i2, 2] <- rep(j, nr)
>         i1 <- i1 + nr
>         i2 <- i2 + nr
>     }
>     matrix(valdata[mat], ncol=nc)
> }
>
> fun(vals, indx)
>
> Rui Barradas
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Re-index-values-of-one-matrix-to-another-of-a-different-size-tp4458666p4460575.html
> Sent from the R help mailing list archive at Nabble.com.
Re: [R] index values of one matrix to another of a different size
Very interesting. You are doing some stuff here that I have never seen. Thank you. I will test it on my real data on Monday and let you know what I find. That cmpfun function looks very useful! Thanks, Ben On Sat, Mar 10, 2012 at 10:26 AM, Joshua Wiley wrote: > Hi Ben, > > It seems likely that there are bigger bottle necks in your overall > program/use---have you tried Rprof() to find where things really get > slowed down? In any case, f2() below takes about 70% of the time as > your function in your test data, and 55-65% of the time for a bigger > example I constructed. Rui's function benefits substantially from > byte compiling, but is still slower. As a side benefit, f2() seems to > use less memory than your current implementation. > > Cheers, > > Josh > > %% > ##sample data ## > vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3, > dimnames = list(c('row1','row2','row3'), c('col1','col2','col3'))) > > indx <- matrix(c(1,1,3,3,2,2,2,3,1,2,2,1), nrow=4, ncol=3) > storage.mode(indx) <- "integer" > > > f <- function(x, i, di = dim(i), dx = dim(x)) { > out <- x[c(i + matrix(0:(dx[1L] - 1L) * dx[1L], nrow = di[1L], ncol > = di[2L], TRUE))] > dim(out) <- di > return(out) > } > > > fun <- function(valdata, inxdata){ >nr <- nrow(inxdata) >nc <- ncol(inxdata) >mat <- matrix(NA, nrow=nr*nc, ncol=2) >i1 <- 1 >i2 <- nr >for(j in 1:nc){ >mat[i1:i2, 1] <- inxdata[, j] >mat[i1:i2, 2] <- rep(j, nr) >i1 <- i1 + nr >i2 <- i2 + nr >} >matrix(valdata[mat], ncol=nc) > } > > require(compiler) > f2 <- cmpfun(f) > fun2 <- cmpfun(fun) > > system.time(for (i in 1:1) f(vals, indx)) > system.time(for (i in 1:1) f2(vals, indx)) > system.time(for (i in 1:1) fun(vals, indx)) > system.time(for (i in 1:1) fun2(vals, indx)) > system.time(for (i in 1:1) > > matrix(vals[cbind(c(indx),rep(1:ncol(indx),each=nrow(indx)))],nrow=nrow(indx),ncol=ncol(indx))) > > ## now let's make a bigger test set > set.seed(1) > vals2 <- matrix(sample(LETTERS, 10^7, TRUE), nrow = 10^4) > indx2 <- sapply(1:ncol(vals2), FUN 
= function(x) sample(10^4, 10^3, TRUE)) > > dim(vals2) > dim(indx2) > > ## the best contenders from round 1 > gold <- > matrix(vals2[cbind(c(indx2),rep(1:ncol(indx2),each=nrow(indx2)))],nrow=nrow(indx2),ncol=ncol(indx2)) > test1 <- f2(vals2, indx2) > all.equal(gold, test1) > > system.time(for (i in 1:20) f2(vals2, indx2)) > system.time(for (i in 1:20) > > matrix(vals2[cbind(c(indx2),rep(1:ncol(indx2),each=nrow(indx2)))],nrow=nrow(indx2),ncol=ncol(indx2))) > > %% > > On Sat, Mar 10, 2012 at 7:48 AM, Ben quant wrote: > > Thanks for the info. Unfortunately its a little bit slower after one > apples > > to apples test using my big data. Mine: 0.28 seconds. Yours. 0.73 > seconds. > > Not a big deal, but significant when I have to do this 300 to 500 times. > > > > regards, > > > > ben > > > > On Fri, Mar 9, 2012 at 1:23 PM, Rui Barradas wrote: > > > >> Hello, > >> > >> I don't know if it's the fastest but it's more natural to have an index > >> matrix with two columns only, > >> one for each coordinate. And it's fast. > >> > >> fun <- function(valdata, inxdata){ > >>nr <- nrow(inxdata) > >>nc <- ncol(inxdata) > >>mat <- matrix(NA, nrow=nr*nc, ncol=2) > >>i1 <- 1 > >>i2 <- nr > >>for(j in 1:nc){ > >>mat[i1:i2, 1] <- inxdata[, j] > >>mat[i1:i2, 2] <- rep(j, nr) > >>i1 <- i1 + nr > >>i2 <- i2 + nr > >>} > >>matrix(valdata[mat], ncol=nc) > >> } > >> > >> fun(vals, indx) > >> > >> Rui Barradas > >> > >> > >> -- > >> View this message in context: > >> > http://r.789695.n4.nabble.com/Re-index-values-of-one-matrix-to-another-of-a-different-size-tp4458666p4460575.html > >> Sent from the R help mailing list archive at Nabble.com. > >> > >> __ > >> R-help@r-project.org mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-help > &g
Re: [R] index values of one matrix to another of a different size
Joshua, Just confirming quickly that your method using cmpfun and your f function below was fastest using my real data. Again, thank you for your help! Ben On Sat, Mar 10, 2012 at 1:21 PM, Joshua Wiley wrote: > On Sat, Mar 10, 2012 at 12:11 PM, Ben quant wrote: > > Very interesting. You are doing some stuff here that I have never seen. > > and that I would not typically do or recommend (e.g., fussing with > storage mode or manually setting the dimensions of an object), but > that can be faster by sacrificing higher level functions flexibility > for lower level, more direct control. > > > Thank you. I will test it on my real data on Monday and let you know > what I > > find. That cmpfun function looks very useful! > > It can reduce the overhead of repeated function calls. I find the > biggest speedups when it is used with some sort of loop. Then again, > many loops can be avoided entirely, which often yields even larger > performance gains. > > > > > Thanks, > > You're welcome. You might also look at the data table package by > Matthew Dowle. It does some *very* fast indexing and subsetting and > if those operations are serious slow down for you, you would likely > benefit substantially from using it. One final comment, since you are > creating the matrix of indices; if you can create it in such a way > that it already has the vector position rather than row/column form, > you could eliminate the need for my f2() function altogether as you > could use it to directly index your data, and then just add dimensions > back afterward. > > Cheers, > > Josh > > > Ben > > > > > > On Sat, Mar 10, 2012 at 10:26 AM, Joshua Wiley > > wrote: > >> > >> Hi Ben, > >> > >> It seems likely that there are bigger bottle necks in your overall > >> program/use---have you tried Rprof() to find where things really get > >> slowed down? In any case, f2() below takes about 70% of the time as > >> your function in your test data, and 55-65% of the time for a bigger > >> example I constructed. 
Rui's function benefits substantially from > >> byte compiling, but is still slower. As a side benefit, f2() seems to > >> use less memory than your current implementation. > >> > >> Cheers, > >> > >> Josh > >> > >> %% > >> ##sample data ## > >> vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3, > >> dimnames = list(c('row1','row2','row3'), c('col1','col2','col3'))) > >> > >> indx <- matrix(c(1,1,3,3,2,2,2,3,1,2,2,1), nrow=4, ncol=3) > >> storage.mode(indx) <- "integer" > >> > >> > >> f <- function(x, i, di = dim(i), dx = dim(x)) { > >> out <- x[c(i + matrix(0:(dx[1L] - 1L) * dx[1L], nrow = di[1L], ncol > >> = di[2L], TRUE))] > >> dim(out) <- di > >> return(out) > >> } > >> > >> > >> fun <- function(valdata, inxdata){ > >>nr <- nrow(inxdata) > >>nc <- ncol(inxdata) > >>mat <- matrix(NA, nrow=nr*nc, ncol=2) > >>i1 <- 1 > >>i2 <- nr > >>for(j in 1:nc){ > >>mat[i1:i2, 1] <- inxdata[, j] > >>mat[i1:i2, 2] <- rep(j, nr) > >>i1 <- i1 + nr > >>i2 <- i2 + nr > >>} > >>matrix(valdata[mat], ncol=nc) > >> } > >> > >> require(compiler) > >> f2 <- cmpfun(f) > >> fun2 <- cmpfun(fun) > >> > >> system.time(for (i in 1:1) f(vals, indx)) > >> system.time(for (i in 1:1) f2(vals, indx)) > >> system.time(for (i in 1:1) fun(vals, indx)) > >> system.time(for (i in 1:1) fun2(vals, indx)) > >> system.time(for (i in 1:1) > >> > >> > matrix(vals[cbind(c(indx),rep(1:ncol(indx),each=nrow(indx)))],nrow=nrow(indx),ncol=ncol(indx))) > >> > >> ## now let's make a bigger test set > >> set.seed(1) > >> vals2 <- matrix(sample(LETTERS, 10^7, TRUE), nrow = 10^4) > >> indx2 <- sapply(1:ncol(vals2), FUN = function(x) sample(10^4, 10^3, > TRUE)) > >> > >> dim(vals2) > >> dim(indx2) > >> > >> ## the best contenders from round 1 > >> gold <- > >> > matrix(vals2[cbind(c(indx2),rep(1:ncol(indx2),each=nrow(indx2)))],nrow=nrow(indx2),ncol=ncol(indx2)) &g
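Joshua's closing suggestion elsewhere in this thread — precompute vector (linear) positions instead of (row, col) pairs, so no conversion function is needed at all — can be sketched as follows. The `vals`/`indx` objects match the thread's sample data (without dimnames); the position arithmetic is the same trick his `f()` uses.

```r
vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3)
indx <- matrix(c(1,1,3,3, 2,2,2,3, 1,2,2,1), nrow = 4, ncol = 3)

# Element (i, j) of indx points at vals[indx[i, j], j]; in column-major
# storage that is linear position indx[i, j] + (j - 1) * nrow(vals)
pos <- c(indx) + rep(seq_len(ncol(indx)) - 1L, each = nrow(indx)) * nrow(vals)

out <- vals[pos]       # plain vector indexing, no cbind() needed
dim(out) <- dim(indx)  # restore the matrix shape afterward, as Joshua suggests
out
```

If the index matrix can be generated in linear-position form in the first place, the `pos` computation disappears too and the lookup is a single `[` call.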
[R] gam - Y axis probability scale with confidence/error lines
Hello, How do I plot a gam fit object with probability on the Y axis vs raw values on the X axis, and include the confidence lines? Details... I'm using the gam function like this:

l_yx[,2] = log(l_yx[,2] + .0004)
fit <- gam(y~s(x), data = as.data.frame(l_yx), family = binomial)

And I want to plot it so that probability is on the Y axis and values are on the X axis (i.e. I don't want log likelihood on the Y axis or the log of my values on my X axis):

xx <- seq(min(l_yx[,2]), max(l_yx[,2]), len = 101)
plot(xx, predict(fit, data.frame(x = xx), type = "response"), type = "l",
     xaxt = "n", xlab = "Churn", ylab = "P(Top Performer)")
at <- c(.001, .01, .1, 1, 10) # <-- I'd also like to generalize this rather than hard code the numbers
axis(1, at = log(at + .0004), label = at)

So far, using the code above, everything looks the way I want. But that does not give me any information on variability/confidence/certainty. How do I get the dashed lines from this: plot(fit) ...on the same scales as above? Related question: how do I get the dashed values out of the fit object so I can do 'stuff' with them? Thanks, Ben PS - thank you Patrick for your help previously.
Re: [R] gam - Y axis probability scale with confidence/error lines
That was embarrassingly easy. Thanks again Patrick! Just correcting a little typo in his reply; this is probably what he meant:

pred = predict(fit, data.frame(x = xx), type = "response", se.fit = TRUE)
upper = pred$fit + 1.96 * pred$se.fit
lower = pred$fit - 1.96 * pred$se.fit

# For people who are interested, this is how you plot it line by line:
plot(xx, pred$fit, type = "l", xlab = fd$getFactorName(), ylab = ylab,
     ylim = c(min(lower), max(upper)))
lines(xx, upper, type = "l", lty = 'dashed')
lines(xx, lower, type = "l", lty = 'dashed')

In my opinion this is only important if the desired y axis is different from what plot(fit) gives you for a gam fit (i.e. fit <- gam(...stuff...)) and you want to plot the confidence intervals. thanks again! Ben

On Wed, Mar 14, 2012 at 10:39 AM, Patrick Breheny wrote:
> The predict() function has an option 'se.fit' that returns what you are
> asking for. If you set this equal to TRUE in your code:
>
> pred <- predict(fit, data.frame(x = xx), type = "response", se.fit = TRUE)
>
> will return a list with two elements, 'fit' and 'se.fit'. The pointwise
> confidence intervals will then be
>
> pred$fit + 1.96*se.fit
> pred$fit - 1.96*se.fit
>
> for 95% confidence intervals (replace 1.96 with the appropriate quantile
> of the normal distribution for other confidence levels).
>
> You can then do whatever "stuff" you want to do with them, including plot
> them.
>
> --Patrick
>
> On 03/14/2012 10:48 AM, Ben quant wrote:
>> Hello,
>>
>> How do I plot a gam fit object on probability (Y axis) vs raw values (X
>> axis) axis and include the confidence plot lines?
>>
>> Details...
>>
>> I'm using the gam function like this:
>> l_yx[,2] = log(l_yx[,2] + .0004)
>> fit <- gam(y~s(x), data = as.data.frame(l_yx), family = binomial)
>>
>> And I want to plot it so that probability is on the Y axis and values are
>> on the X axis (i.e. 
I don't want log likelihood on the Y axis or the log
>> of my values on my X axis):
>>
>> xx <- seq(min(l_yx[,2]), max(l_yx[,2]), len = 101)
>> plot(xx, predict(fit, data.frame(x = xx), type = "response"),
>>      type = "l", xaxt = "n", xlab = "Churn", ylab = "P(Top Performer)")
>> at <- c(.001, .01, .1, 1, 10) # <-- I'd also like to generalize
>> this rather than hard code the numbers
>> axis(1, at = log(at + .0004), label = at)
>>
>> So far, using the code above, everything looks the way I want. But that
>> does not give me any information on variability/confidence/certainty.
>> How do I get the dashed lines from this:
>> plot(fit)
>> ...on the same scales as above?
>>
>> Related question: how do I get the dashed values out of the fit object so I
>> can do 'stuff' with them?
>>
>> Thanks,
>>
>> Ben
>
> --
> Patrick Breheny
> Assistant Professor
> Department of Biostatistics
> Department of Statistics
> University of Kentucky
Re: [R] gam - Y axis probability scale with confidence/error lines
Thank you. The binomial()$linkinv() is good to know. Ben

On Wed, Mar 14, 2012 at 12:23 PM, Patrick Breheny wrote:
> Actually, I responded a bit too quickly last time, without really reading
> through your example carefully. You're fitting a logistic regression model
> and plotting the results on the probability scale. The better way to do
> what you propose is to obtain the confidence interval on the scale of the
> linear predictor and then transform to the probability scale, as in:
>
> x <- seq(0, 1, by = .01)
> y <- rbinom(length(x), size = 1, p = x)
> require(gam)
> fit <- gam(y~s(x), family = binomial)
> pred <- predict(fit, se.fit = TRUE)
> yy <- binomial()$linkinv(pred$fit)
> l <- binomial()$linkinv(pred$fit - 1.96*pred$se.fit)
> u <- binomial()$linkinv(pred$fit + 1.96*pred$se.fit)
> plot(x, yy, type = "l")
> lines(x, l, lty = 2)
> lines(x, u, lty = 2)
>
> --
> Patrick Breheny
> Assistant Professor
> Department of Biostatistics
> Department of Statistics
> University of Kentucky
>
> On 03/14/2012 01:49 PM, Ben quant wrote:
>> That was embarrassingly easy. Thanks again Patrick! Just correcting a
>> little typo to his reply. this is probably what he meant:
>>
>> pred = predict(fit, data.frame(x = xx), type = "response", se.fit = TRUE)
>> upper = pred$fit + 1.96 * pred$se.fit
>> lower = pred$fit - 1.96 * pred$se.fit
>>
>> # For people who are interested this is how you plot it line by line:
>> plot(xx, pred$fit, type = "l", xlab = fd$getFactorName(), ylab = ylab,
>>      ylim = c(min(lower), max(upper)))
>> lines(xx, upper, type = "l", lty = 'dashed')
>> lines(xx, lower, type = "l", lty = 'dashed')
>>
>> In my opinion this is only important if the desired y axis is different
>> than what plot(fit) gives you for a gam fit (i.e fit <-
>> gam(...stuff...)) and you want to plot the confidence intervals.
>>
>> thanks again! 
>> >> Ben >> >> On Wed, Mar 14, 2012 at 10:39 AM, Patrick Breheny >> > <mailto:patrick.breheny@uky.**edu>> >> wrote: >> >>The predict() function has an option 'se.fit' that returns what you >>are asking for. If you set this equal to TRUE in your code: >> >>pred <- predict(fit,data.frame(x=xx),_**_type="response",se.fit=TRUE) >> >> >>will return a list with two elements, 'fit' and 'se.fit'. The >>pointwise confidence intervals will then be >> >>pred$fit + 1.96*se.fit >>pred$fit - 1.96*se.fit >> >>for 95% confidence intervals (replace 1.96 with the appropriate >>quantile of the normal distribution for other confidence levels). >> >>You can then do whatever "stuff" you want to do with them, including >>plot them. >> >>--Patrick >> >> >>On 03/14/2012 10:48 AM, Ben quant wrote: >> >>Hello, >> >>How do I plot a gam fit object on probability (Y axis) vs raw >>values (X >>axis) axis and include the confidence plot lines? >> >>Details... >> >>I'm using the gam function like this: >>l_yx[,2] = log(l_yx[,2] + .0004) >>fit<- gam(y~s(x),data=as.data.frame(**__l_yx),family=binomial) >> >> >>And I want to plot it so that probability is on the Y axis and >>values are >>on the X axis (i.e. I don't want log likelihood on the Y axis or >>the log of >>my values on my X axis): >> >>xx<- seq(min(l_yx[,2]),max(l_yx[,2]**__),len=101) >>plot(xx,predict(fit,data.__**frame(x=xx),type="response"),_** >> _type="l",xaxt="n",xlab="**Churn"__,ylab="P(Top >> >>Performer)") >>at<- c(.001,.01,.1,1,10) #<-- I'd also like to >>generalize >>this rather than hard code the numbers >>axis(1,at=log(at+ .0004),label=at) >> >>So far, using the code above, everything looks the way I want. >>But that >>does not give me anything information on >>variability/confidence/__**certainty. >> >>How do I get the dash plots from this: >>plot(fit) >>...on the same scales as above? >> >>Related question: how do get the dashed values out of the fit >>object so I >>can do 'stuff' with it? >> >>Thanks, &
[R] resetting console
Hello, I'm still hoping my issue is preventable and not worthy of a bug/crash report, hence my post is in 'help'. Anyway, I'd like to know how to reset the console so it is clear of all residual effects caused by previous scripts. Details: I run a script once and it runs successfully (but very slowly). The script uses fairly sizable data. No problem so far. Then I run the same exact script again and the console eventually crashes. Naturally, I thought it was the size of the data so I: 1) run script successfully (which includes a plot) 2) do this: dev.off() rm(list=ls(all=TRUE)) gc() 3) run script again ...and it still crashes. There isn't an R error or anything, I just get a Microsoft error report request window and the console goes away. However, if I: 1) run script successfully 2) shut down, and reopen console 3) run script again ...everything runs as expected and the console does not crash. If the script runs successfully the first time and I'm clearing all available memory (I think) what is 'remaining' that I need to reset in the console (that restarting the console solves/clears out)? PS - Because the script runs successfully, I'm thinking the script itself is not all that important. I'd prefer an answer that indicates generally how to reset the console. Basically I'm loading some data, manipulating the data including a log transformation, doing a gam fit (family="binomial"), and finally I plot the data from the fit. Interestingly, if I set family to "gaussian" or if I do not log transform the data, the console does not crash. Or should I post a crash/bug? Regards, Ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
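For what it's worth, the cleanup in step 2 can be made slightly more thorough, though none of this clears C-level state held inside loaded packages, which is the usual culprit when only a full restart cures a crash. A hedged sketch of a fuller reset:

```r
graphics.off()                   # close *all* graphics devices, not just the current one
rm(list = ls(all.names = TRUE))  # also removes dot-prefixed objects that plain ls() hides
invisible(gc())                  # release freed memory back to the OS where possible
# State that survives the above: loaded packages/DLLs, options(), the random
# seed, and anything cached at the C level by compiled code.
```

Since the crash here is reproducible (binomial family plus log-transformed data, second run only), it is probably worth filing a bug report with a minimal example despite the first run succeeding.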
[R] sequencing environments
Hello, I can get at environments if I know their names, but what if I want to look at what environments currently exist at some point in a script? In other words, if I don't know what environments exist and I don't know their sequence/hierarchy, how do I display a visual representation of the environments and how they relate to one another? I'm looking at getting away from the package R.oo and using R in its normal state, but I need a way to "check in on" the status and organization of my environments. I've done considerable research on R's environments, but it's a challenging thing to google and come up with meaningful results. Thanks, Ben
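A partial workaround, not from the thread itself: while there is no global registry of every environment, the attached search path and any environment's parent chain are inspectable from code, which covers many practical "what exists right now" questions. A base-R sketch:

```r
search()   # the attached packages and the global environment, in lookup order

# Walk the parent chain from the global environment up to emptyenv(),
# recording each environment's name -- this is R's symbol lookup order
e <- globalenv()
chain <- character(0)
while (!identical(e, emptyenv())) {
  chain <- c(chain, environmentName(e))
  e <- parent.env(e)
}
chain <- c(chain, environmentName(emptyenv()))
chain  # e.g. "R_GlobalEnv", "package:stats", ..., "base", "R_EmptyEnv"
```

Environments created with new.env() inside functions or packages will not show up here unless something holds a reference to them; that is the gap Duncan describes in his reply.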
Re: [R] sequencing environments
Thank you Duncan. Interesting. I find it strange that you can't get a list of the environments. But I'll deal with it... Anyway, I'm about to start a new R dev project for my company. I'm thinking about architecture, organization, and gotchas. I went through much of the documentation you sent me. Thanks! I came up with what I think is the best way to implement environments (which I am using like I would use a class in a traditional OO language) that can be reused in various programs. I'm thinking of creating different scripts like this:

# this is saved under the script name EnvTest.R
myEnvir = new.env()
var1 = 2 + 2
assign("myVar1", var1, envir = myEnvir)

Then I will write programs like this that will use the environments and the objects/functions they contain:

source("EnvTest.R")
prgmVar1 = get("myVar1", envir = myEnvir)
## do stuff with env objects
print(prgmVar1)

Do you think this is the best way to use environments to avoid naming conflicts, take advantage of separation of data, organize scripting logically, etc. (the benefits of traditional OO classes)? Eventually, I'll use this on a Linux machine in the cloud using:
https://github.com/armstrtw/rzmq
https://github.com/armstrtw/AWS.tools
https://github.com/armstrtw/deathstar
http://code.google.com/p/segue/
...do you (or anyone else) see any gotchas here? Any suggestions, help, things to watch for are welcome... Note: I am aware of the (surprising?) scoping rules. Thanks so much for your help. Ben

On Tue, Feb 14, 2012 at 5:04 AM, Duncan Murdoch wrote:
> On 12-02-14 12:34 AM, Ben quant wrote:
>> Hello,
>>
>> I can get at environments if I know their names, but what if I want to look
>> at what environments currently exist at some point in a script? In other
>> words, if I don't know what environments exist and I don't know their
>> sequence/hierarchy, how do I display a visual representation of the
>> environments and how they relate to one another? 
>> > > Environments are objects and most of them are maintained in the same > places as other objects (including some obscure places, such as in > structures maintained only in external package code), so it's not easy to > generate a complete list. > > > >> I'm looking at getting away from the package R.oo and using R in normal >> state, but I need a way to "check in on" the status and organization of my >> environments. >> >> I've done considerable research on R's environments, but its a challenging >> thing to google and come up with meaningful results. >> > > I would suggest reading the technical documentation: the R Language > manual, the R Internals manual, and some of the papers on the "Technical > papers" page. > > Duncan Murdoch > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
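As a base-R alternative to sourcing scripts that populate named environments, local() builds an environment-backed "module" in a single expression, with state kept private and updated across calls via `<<-`. A small sketch (the names are illustrative, not from the thread):

```r
# A module-like object: local() evaluates its body in a fresh environment
# and returns the last value -- here, a function that closes over that environment
counter <- local({
  n <- 0                  # private state, invisible from the global environment
  function() {
    n <<- n + 1           # <<- writes to the enclosing (module) environment
    n
  }
})

counter()               # 1
counter()               # 2
environment(counter)$n  # peek at the private state: 2
```

This gives the naming-conflict isolation asked about above without any get()/assign() boilerplate, at the cost of the state being reachable only through the closure (or environment()).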
Re: [R] sequencing environments
Thanks Gabor/Duncan, I might give that proto package a try. The R.oo package is more intuitive for someone coming from a traditional OO background, but compared to proto, it looks like it requires a lot more typing to create the same amount of functionality. I've used R.oo for a number of months now and it works great. The other option is to just use get() and assign(), like I suggested in my original post, which seems to be the simplest, but more typing than proto. Thanks for the info! Have a good weekend... ben On Wed, Feb 15, 2012 at 11:09 PM, Gabor Grothendieck < ggrothendi...@gmail.com> wrote: > On Wed, Feb 15, 2012 at 11:58 PM, Ben quant wrote: > > Thank you Duncan. Interesting. I find it strange that you can't get a > list > > of the environments. But I'll deal with it... > > > > Anyway, I'm about to start a new R dev project for my company. I'm > thinking > > about architecture, organization, and gotchas. I went through much of the > > documentation you sent me. Thanks!. I came up with what I think is the > best > > way to implement environments (which I am using like I would use a class > in > > a traditional OO language) that can be reused in various programs. > > > > I'm thinking of creating different scripts like this: > > #this is saved as script name EnvTest.R > > myEnvir = new.env() > > var1 = 2 + 2 > > assign("myx",var1,envir=myEnvir) > > > > Then I will write programs like this that will use the environments and > the > > objects/functions they contain: > > source("EnvTest.r") > > prgmVar1 = get("myVar1",pos=myEnvir) > > ## do stuff with env objects > > print(prgmVar1) > > > > Do you think this is the best way to use environments to avoid naming > > conflicts, take advantage of separation of data, organize scripting > > logically, etc. (the benefits of traditional OO classes)? 
Eventually, > I'll > > use this on a Linux machine in the cloud using.: > > https://github.com/armstrtw/rzmq > > https://github.com/armstrtw/AWS.tools > > https://github.com/armstrtw/deathstar > > http://code.google.com/p/segue/ > > > Reference classes, the oo.R package and the proto package provide OO > implementations based on environments. > > Being particular familiar with the proto package > (http://r-proto.googlecode.com), I will discuss it. The graph.proto > function in that package will draw a graphViz graph of your proto > objects (environments). Using p and x in place of myEnv and myx your > example is as follows. > > library(proto) > p <- proto(x = 2+2) > p$x # 4 > > # add a method, incr > p$incr <- function(.) .$x <- .$x + 1 > p$incr() # increment x > p$x # 5 > > # create a child > # it overrides x; inherits incr from p > ch <- p$proto(x = 100) > ch$incr() > ch$x # 101 > > -- > Statistics & Software Consulting > GKX Group, GKX Associates Inc. > tel: 1-877-GKX-GROUP > email: ggrothendieck at gmail.com > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] proto: make a parameter persist
The code below works as expected but: Using the proto package, is this the best way to 1) make a parameter persist if the parameter is passed in with a value, 2) allow for calling the bias() function without a parameter assignment, 3) have the x2 value initialize as 5? Thanks for your feedback. Giving the proto package a test beat and establishing some templates for myself.

> oo <- proto(expr = {
+   x = c(10, 20, 15, 19, 17)
+   x2 = 5                        # so x2 initializes as 5, but can be overwritten with param assignment
+   bias <- function(., x2 = .$x2) {  # x2=.$x2 so no default param is needed
+     .$x2 = x2                   # so x2 persists in the env
+     .$x <- .$x + x2
+   }
+ })
> o = oo$proto()
> o$x   # [1] 10 20 15 19 17
> o$x2  # [1] 5
> o$bias(x2 = 100)
> o$x2  # [1] 100
> o$x   # [1] 110 120 115 119 117

Regards, Ben
Re: [R] proto: make a parameter persist
I like it better. Thanks! Ben On Fri, Feb 17, 2012 at 11:38 PM, Gabor Grothendieck < ggrothendi...@gmail.com> wrote: > On Sat, Feb 18, 2012 at 12:44 AM, Ben quant wrote: > > The code below works as expected but: > > Using the proto package, is this the best way to 1) make a parameter > > persist if the parameter is passed > > in with a value, 2) allow for calling the bias() function without a > > parameter assignment, 3) have > > the x2 value initialize as 5? Thanks for your feedback. Giving the > > proto package a test beat and > > establishing some templates for myself. > > > >> oo <- proto(expr = {x = c(10, 20, 15, 19, 17) x2 = > 5 # so x2 initializes as 5, but can be overwritten with param assignment > bias <- function(.,x2=.$x2) { # x2=.$x2 so no default > param is needed .$x2 = x2 # so x2 persists in the > env .$x <- .$x + x2 } })> o = > oo$proto()> o$x # [1] 10 20 15 19 17> o$x2 #[1] 5> o$bias(x2 = 100)> o$x2 # > [1] 100> o$x # [1] 110 120 115 119 117 > > > > This is not very different from what you have already but here it is > for comparison. Note that the with(...) line has the same meaning as > .$x <- .$x + .$x2 : > > oo <- proto( > x = c(10, 20, 15, 19, 17), > x2 = 5, > bias = function(., x2) { > if (!missing(x2)) .$x2 <- x2 > with(., x <- x + x2) > } > ) > > -- > Statistics & Software Consulting > GKX Group, GKX Associates Inc. > tel: 1-877-GKX-GROUP > email: ggrothendieck at gmail.com > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] proto: make a parameter persist
Thanks again for your help so far on proto. I have another question. What is the best way to "do stuff" based on data prior to calling a function? I tried the code below without expr (and including commas after data member assignments), but it errors out. I'd like to make decisions based on inputs during proto object construction and prep my data members for use in the functions that (would) follow. I think I could probably get around this with a small function alongside the other functions in the same proto object, but I'd rather not repeat that in each function...if that makes sense. See my last line of code below:

makeProto = proto(expr = {
  data_member1 = NULL
  data_member2 = 5
  data_member3 = NULL
  if(!is.null(data_member1)){
    with(., data_member3 = data_member1 + data_member2)
  }
})

oo = makeProto$proto()
oo$data_member1 # NULL
oo$data_member2 # 5
oo$data_member3 # NULL

oo2 = makeProto$proto(data_member1 = 7)
oo2$data_member1 # 7
oo2$data_member2 # 5
oo2$data_member3 # I want this to be 12 (12 = 7 + 5), but I get NULL

It's late for me so hopefully this makes sense... Thanks! ben

On Fri, Feb 17, 2012 at 11:38 PM, Gabor Grothendieck < ggrothendi...@gmail.com> wrote:
> On Sat, Feb 18, 2012 at 12:44 AM, Ben quant wrote:
> > The code below works as expected but:
> > Using the proto package, is this the best way to 1) make a parameter
> > persist if the parameter is passed in with a value, 2) allow for calling
> > the bias() function without a parameter assignment, 3) have the x2 value
> > initialize as 5? Thanks for your feedback. Giving the
> > proto package a test beat and
> > establishing some templates for myself. 
> > > >> oo <- proto(expr = {x = c(10, 20, 15, 19, 17) x2 = > 5 # so x2 initializes as 5, but can be overwritten with param assignment > bias <- function(.,x2=.$x2) { # x2=.$x2 so no default > param is needed .$x2 = x2 # so x2 persists in the > env .$x <- .$x + x2 } })> o = > oo$proto()> o$x # [1] 10 20 15 19 17> o$x2 #[1] 5> o$bias(x2 = 100)> o$x2 # > [1] 100> o$x # [1] 110 120 115 119 117 > > > > This is not very different from what you have already but here it is > for comparison. Note that the with(...) line has the same meaning as > .$x <- .$x + .$x2 : > > oo <- proto( > x = c(10, 20, 15, 19, 17), > x2 = 5, > bias = function(., x2) { > if (!missing(x2)) .$x2 <- x2 > with(., x <- x + x2) > } > ) > > -- > Statistics & Software Consulting > GKX Group, GKX Associates Inc. > tel: 1-877-GKX-GROUP > email: ggrothendieck at gmail.com > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] proto: make a parameter persist
Thank you very much! I'll follow-up with more questions as I dabble...if I have any. Thank you ben On Tue, Feb 21, 2012 at 7:01 AM, Gabor Grothendieck wrote: > On Tue, Feb 21, 2012 at 12:15 AM, Ben quant wrote: > > Thanks again for your so far on proto. I have another question. > > > > What is the best way to "do stuff" based on data prior to calling a > > function? I tried the code below without expr (and including commas after > > data member assignments), but it errors out. I'd like to make decisions > > based on inputs during proto object construction and prep my data_members > > for use in the functions that (would) follow. I think I could probably > get > > around this with a small function with other functions within the same > proto > > object, but I'd rather not repeat that in each function...if that makes > > sense. See my last line of code below: > > > > makeProto = proto( expr={ > > data_member1=NULL > > data_member2=5 > > data_member3=NULL > > if(!is.null(data_member1)){ > > with(.,data_member3 = data_member1 + data_member2) > > } > > }) > > oo = makeProto$proto() > > oo$data_member1 # NULL > > oo$data_member2 # 5 > > oo$data_member3 # NULL > > oo2 = makeProto$proto(data_member1 = 7) > > oo2$data_member1 # 7 > > oo2$data_member2 # 5 > > oo2$data_member3 # I want this to be 12 (12 = 7 + 5), but I get NULL > > > > Its late for me so hopefully this makes sense... > > > > There are multiple issues here: > > 1. The expr is executed at the time you define the proto object -- its > not a method. Once the proto object is defined the only thing that is > left is the result of the computation so you can't spawn a child and > then figure that this code will be rerun as if its a constructor. You > need to define a constructor method to do that. > > 2. You can't use dot as if it were a special notation -- its not. A > single dot is just the name of an ordinary variable and is not > anything special that proto knows about. 
> In the examples where dot is used, it's used as the first formal
> argument to various methods, but this was the choice of the method
> writer and not something required by proto. We could have used self or
> this or any variable name.
>
> 3. Note that the code in expr=... is already evaluated in the
> environment of the proto object so you don't need with.
>
> 4. I personally find it clearer to reserve = for argument assignment
> and use <- for ordinary assignment, but that is mostly a style issue
> and it's up to you.
>
> 5. The discussion of traits in the proto vignette illustrates
> constructors -- be sure to read that. Traits are not a special
> construct built into proto but rather just a way in which you can
> use proto. That is one of the advantages of the prototype model of
> OO -- you don't need special language constructs for many
> situations where ordinary OO needs such constructs, since they are all
> subsumed under one more general set of primitives.
>
> Here we define the trait MakeProto (again, traits are not a special
> language feature of proto but are just a way of using it):
>
> MakeProto <- proto(
>   new = function(., ...) {
>     .$proto(expr = if ( !is.null(d1) ) d3 <- d1 + d2, ...)
>   },
>   d1 = NULL,
>   d2 = 5,
>   d3 = NULL
> )
>
> oo <- MakeProto$new()
> oo$d1  # NULL
> oo$d2  # 5
> oo$d3  # NULL
>
> oo2 <- MakeProto$new(d1 = 7)
> oo2$d1  # 7
> oo2$d2  # 5
> oo2$d3  # 12
>
> In the above, oo$d1, oo$d2, oo$d3 are actually located in MakeProto and
> delegated to oo, so that when one writes oo$d2 it looks into MakeProto
> since it cannot find d2 in oo. oo2$d2 is also not in oo2 but
> delegated from MakeProto; however, oo2$d1 and oo2$d3 are located in
> oo2 itself. That is due to the way we set it up and we could have set
> it up differently. Try str(MakeProto); str(oo); str(oo2) to see
> this.
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
[R] rank with uniform count for each rank
Hello,

What is the best way to get ranks for a vector of values, limit the range of rank values and create an equal count in each group? I call this uniform ranking: a uniform count/number in each group.

Here is an example using three groups.

Say I have values:
x = c(3, 2, -3, 1, 0, 5, 10, 30, -1, 4)
names(x) = letters[1:10]
> x
 a  b  c  d  e  f  g  h  i  j
 3  2 -3  1  0  5 10 30 -1  4

I would like:
a b c d e f g h i j
2 2 1 2 1 3 3 3 1 3

Same thing as above, maybe easier to see:
 c  i  e  d  b  a  j  f  g  h
-3 -1  0  1  2  3  4  5 10 30
I would get:
c i e d b a j f g h
1 1 1 2 2 2 3 3 3 3

Note that there are 4 values with a rank of 3 because I can't get even numbers (10/3 = 3.333).

Been to ?sort, ?order, ?quantile, ?cut, and ?split.

Thanks,
Ben
Re: [R] rank with uniform count for each rank
Thank you everyone! We already use the Hmisc package so I'll likely use cut2. Ben On Wed, Feb 22, 2012 at 2:22 PM, David Winsemius wrote: > > On Feb 22, 2012, at 4:01 PM, Ben quant wrote: > > Hello, >> >> What is the best way to get ranks for a vector of values, limit the range >> of rank values and create equal count in each group? I call this uniform >> ranking...uniform count/number in each group. >> >> Here is an example using three groups: >> >> Say I have values: >> x = c(3, 2, -3, 1, 0, 5, 10, 30, -1, 4) >> names(x) = letters[1:10] >> >>> x >>> >> a b c d e f g h i j >> 3 2 -3 1 0 5 10 30 -1 4 >> I would like: >> a b c d e f g h i j >> 2 2 1 2 1 3 3 3 1 3 >> >> Same thing as above, maybe easier to see: >> c i e d b a j f g h >> -3 -1 0 1 2 3 4 5 10 30 >> I would get: >> c i e d b a j f g h >> 1 1 1 2 2 2 3 3 3 3 >> >> Note that there are 4 values with a rank of 3 because I can't get even >> numbers (10/3 = 3.333). >> >> Been to ?sort, ?order, ?quantile, ?cut, and ?split. >> > > You may need to look more carefully at the definitions and adjustments to > `cut` and `quantile` but this does roughly what you asked: > > n=3 > as.numeric( cut(x, breaks=quantile(x, prob=(0:n)/n) , include.lowest=TRUE) > ) > @ [1] 1 1 1 1 2 2 2 3 3 3 > > It a fairly common task and Harrell's cut2 function has a g= parameter > (for number of groups) that I generally use: > > library(Hmisc) > > cut2(x, g=3) > [1] [-3, 2) [-3, 2) [-3, 2) [-3, 2) [ 2, 5) [ 2, 5) [ 2, 5) [ 5,30] [ > 5,30] [ 5,30] > Levels: [-3, 2) [ 2, 5) [ 5,30] > > as.numeric( cut2(x, g=3)) > [1] 1 1 1 1 2 2 2 3 3 3 > > > > >> Thanks, >> >> Ben >> >>[[alternative HTML version deleted]] >> >> __** >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/**listinfo/r-help<https://stat.ethz.ch/mailman/listinfo/r-help> >> PLEASE do read the posting guide http://www.R-project.org/** >> posting-guide.html <http://www.R-project.org/posting-guide.html> >> and provide commented, minimal, self-contained, reproducible 
code. >>
> David Winsemius, MD
> West Hartford, CT
[R] group calculations with other columns for the ride
Hello,

I can get the median for each factor, but I'd like another column to go along with each factor. The nm column is a long name for the lvls column. So unique works, except that the order can get messed up.

Example:
x = data.frame(val=1:10,
               lvls=c('cat2',rep("cat1",4),rep("cat2",4),'cat1'),
               nm=c('longname2',rep("longname1",4),rep("longname2",4),'longname1'))
x
   val lvls        nm
1    1 cat2 longname2
2    2 cat1 longname1
3    3 cat1 longname1
4    4 cat1 longname1
5    5 cat1 longname1
6    6 cat2 longname2
7    7 cat2 longname2
8    8 cat2 longname2
9    9 cat2 longname2
10  10 cat1 longname1

unique doesn't work in the data.frame:
mdn = do.call(rbind,lapply(split(x[,1], x[,2]), median))
data.frame(mdn,ln=as.character(unique(x[,3])))
     mdn        ln
cat1   4 longname2
cat2   7 longname1

I want:
     mdn        ln
cat1   4 longname1
cat2   7 longname2

Thank you very much!

PS - looking for simple'ish solutions. I know I can do it with loops and merges, but is there an option I am not using here?

Ben
Re: [R] group calculations with other columns for the ride
Excellent! I wonder why I haven't seen aggregate before. Thanks! ben On Tue, Feb 28, 2012 at 4:51 PM, ilai wrote: > aggregate(val~lvls+nm,data=x,FUN='median') > > > > On Tue, Feb 28, 2012 at 4:43 PM, Ben quant wrote: > > Hello, > > > > I can get the median for each factor, but I'd like another column to go > > with each factor. The nm column is a long name for the lvls column. So > > unique work except for the order can get messed up. > > > > Example: > > x = > > > data.frame(val=1:10,lvls=c('cat2',rep("cat1",4),rep("cat2",4),'cat1'),nm=c('longname2',rep("longname1",4),rep("longname2",4),'longname1')) > > x > > val lvlsnm > > 11 cat2 longname2 > > 22 cat1 longname1 > > 33 cat1 longname1 > > 44 cat1 longname1 > > 55 cat1 longname1 > > 66 cat2 longname2 > > 77 cat2 longname2 > > 88 cat2 longname2 > > 99 cat2 longname2 > > 10 10 cat1 longname1 > > > > unique doesn't work in data.frame: > > mdn = do.call(rbind,lapply(split(x[,1], x[,2]), median)) > > data.frame(mdn,ln=as.character(unique(x[,3]))) > > mdnln > > cat1 4 longname2 > > cat2 7 longname1 > > > > I want: > > mdnln > > cat1 4 longname1 > > cat2 7 longname2 > > > > Thank you very much! > > > > PS - looking for simple'ish solutions. I know I can do it with loops and > > merges, but is there an option I am not using here? > > > > Ben > > > >[[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] quarter end dates between two date strings
Hello, I have two date strings, say "1972-06-30" and "2012-01-31", and I'd like to get every quarter period end date between those dates? Does anyone know how to do this? Speed is important... Here is a small sample: Two dates: "2007-01-31" "2012-01-31" And I'd like to get this: [1] "2007-03-31" "2007-06-30" "2007-09-30" "2007-12-31" "2008-03-31" "2008-06-30" "2008-09-30" "2008-12-31" [9] "2009-03-31" "2009-06-30" "2009-09-30" "2009-12-31" "2010-03-31" "2010-06-30" "2010-09-30" "2010-12-31" [17] "2011-03-31" "2011-06-30" "2011-09-30" "2011-12-31" Thanks! ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
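No answer to this one is archived here. A minimal base-R sketch (the helper name quarter_ends and the approach -- take the first day of each quarter-opening month and step back one day -- are my own assumptions, not the poster's eventual solution):

```r
# Quarter-end dates strictly between two dates, base R only.
quarter_ends <- function(from, to) {
  from <- as.Date(from); to <- as.Date(to)
  # first day of every month in a window slightly wider than [from, to]
  firsts <- seq(as.Date(format(from, "%Y-%m-01")), to + 93, by = "month")
  # the day before a quarter-opening month (Jan/Apr/Jul/Oct) is a quarter end
  qe <- firsts[format(firsts, "%m") %in% c("01", "04", "07", "10")] - 1
  qe[qe > from & qe < to]
}

quarter_ends("2007-01-31", "2012-01-31")
# "2007-03-31" "2007-06-30" ... "2011-09-30" "2011-12-31"
```

seq.Date is vectorized, so this should stay fast even over spans like 1972-2012.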
[R] lines crosses
Hello, If the exact value does not exist in the vector, can I still get at the intersections? Is there a simple way to do this and avoid looping? Seems like there would be a simple R function to do this... Example: vec <- c(5,4,3,2,3,4,5) vec [1] 5 4 3 2 3 4 5 intersect(vec,2.5) numeric(0) I want to get: 2.5 and 2.5 My real data is very large and I don't know the values of anything ahead of time. The vec vector is produced by the gam function so it can be just about any continuous line. Thanks, Ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
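No reply survives in this chunk of the archive. Since vec comes from gam() and is effectively piecewise linear between sample points, the crossings can be found without a loop by detecting sign changes and interpolating. A sketch (the function name crossings is made up; it returns the x positions of the intersections -- the y value at each one is the level itself, 2.5 here):

```r
# x-positions where a sampled curve crosses a given level, found by
# detecting sign changes of (vec - level) and interpolating linearly.
crossings <- function(vec, level) {
  d <- vec - level
  i <- which(d[-length(d)] * d[-1] < 0)  # sign change between index i and i+1
  i + d[i] / (d[i] - d[i + 1])           # linear interpolation within the segment
}

vec <- c(5, 4, 3, 2, 3, 4, 5)
crossings(vec, 2.5)
# [1] 3.5 4.5   (the curve hits 2.5 halfway between indices 3-4 and 4-5)
```

One caveat: the strict inequality skips samples that land exactly on the level (d == 0); those would need a separate which(d == 0) check if they matter.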
[R] get plot axis rounding method
Hello, Does anyone know how to get the rounding method used for the axis tick numbers/values in plot()? I'm using mtext() to plot the values used to plot vertical and horizontal lines (using abline()) and I'd like these vertical and horizontal line values to be rounded like the axis tick values are rounded. In other words, I want numbers plotted with mtext() to be rounded in the same fashion as the axis values given by default by plot(). thank you very much for your help! ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
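No answer is archived here either. As far as I know, the tick values plot() draws can be recovered with axTicks() once the plot exists (or approximated with pretty() on the data range), so the same formatting can be reused for mtext(); a sketch under that assumption:

```r
x <- c(0.1234, 4.5678, 8.9101)
plot(x)
ticks <- axTicks(2)      # the y-axis tick values the current plot actually used
# pretty(range(x)) reproduces the default tick algorithm without needing a plot

# label an abline value using formatting similar to the axis labels
# (this rounding rule is an approximation, not the engine's exact behavior):
h <- 4.5678
abline(h = h)
mtext(format(signif(h, 3)), side = 2, at = h, las = 1)
```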
[R] check if excel file is
Hello again, I'd like to determine if an Excel file is open or writable. Can anyone help me with that? I write some stats to an .xlsx Excel file using the xlsx package. I can't write to the file unless its closed. How do I determine if the .xlsx file is open or closed so I can write to it? I've looked at file.info and file.access and I couldn't get those to work for me. Any help would be great! ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] check if excel file is
Forgot this: the solution doesn't have to come from the xlsx package... thanks ben On Fri, Apr 27, 2012 at 10:08 AM, Ben quant wrote: > Hello again, > > I'd like to determine if an Excel file is open or writable. Can anyone > help me with that? > > I write some stats to an .xlsx Excel file using the xlsx package. I can't > write to the file unless its closed. How do I determine if the .xlsx file > is open or closed so I can write to it? > > I've looked at file.info and file.access and I couldn't get those to work > for me. > > Any help would be great! > ben > > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] check if excel file is
To get around the issue below, I just wrapped it with try(), but would like to know how to know the question below. Thanks! ben On Fri, Apr 27, 2012 at 10:13 AM, Ben quant wrote: > Forgot this: the solution doesn't have to come from the xlsx package... > > thanks > > ben > > > On Fri, Apr 27, 2012 at 10:08 AM, Ben quant wrote: > >> Hello again, >> >> I'd like to determine if an Excel file is open or writable. Can anyone >> help me with that? >> >> I write some stats to an .xlsx Excel file using the xlsx package. I can't >> write to the file unless its closed. How do I determine if the .xlsx file >> is open or closed so I can write to it? >> >> I've looked at file.info and file.access and I couldn't get those to >> work for me. >> >> Any help would be great! >> ben >> >> >> > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
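The thread ends with the try() workaround. For the underlying question -- on Windows, Excel holds an exclusive lock on an open workbook -- one test is to try opening the file for appending and see whether that fails. A sketch (the helper name is_writable is made up, and the locking behavior is OS-dependent, so treat this as an assumption rather than a guaranteed check):

```r
# TRUE if the file can currently be opened for writing
# (i.e. Excel does not appear to hold a lock on it).
is_writable <- function(path) {
  if (!file.exists(path)) return(TRUE)               # new file: nothing locks it
  con <- try(file(path, open = "ab"), silent = TRUE) # append mode, don't clobber
  if (inherits(con, "try-error")) return(FALSE)
  close(con)
  TRUE
}

if (is_writable("stats.xlsx")) {
  # write.xlsx(stats, "stats.xlsx")   # the xlsx-package write from the thread
} else {
  message("stats.xlsx appears to be open in Excel; close it and retry.")
}
```

Note there is still a race between the check and the actual write, so keeping the try() around the write itself is sensible in any case.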
[R] GAM, how to set qr=TRUE
Hello,

I don't understand what went wrong or how to fix this. How do I set qr=TRUE for gam?

When I produce a fit using gam like this:

fit = gam(y~s(x), data=as.data.frame(l_yx), family=family, control=list(keepData=T))

...then try to use predict (see #1 below in the traceback()):

> traceback()
6: stop("lm object does not have a proper 'qr' component.\n Rank zero or should not have used lm(.., qr=FALSE).") at #81
5: qr.lm(object) at #81
4: summary.glm(object, dispersion = dispersion) at #81
3: summary(object, dispersion = dispersion) at #81
2: predict.glm(fit, data.frame(x = xx), type = "response", se.fit = T, col = prediction_col, lty = prediction_ln) at #81
1: predict(fit, data.frame(x = xx), type = "response", se.fit = T, col = prediction_col, lty = prediction_ln) at #81

...I get this error:

Error in qr.lm(object) : lm object does not have a proper 'qr' component. Rank zero or should not have used lm(.., qr=FALSE).

I read this post: http://tolstoy.newcastle.edu.au/R/devel/06/04/5133.html

So I tried adding qr=T to the gam call but it didn't make any difference. This is how I did it:

fit = gam(y~s(x), data=as.data.frame(l_yx), family=family, control=list(keepData=T), qr=T)

It's all very strange because I've produced fits with this data many times before with no issues (and never having to do anything with the qr parameter). I don't understand why this is coming up or how to fix it.

PS - I don't think this matters, but I am calling a script called FunctionGamFit.r like this:
err = system(paste('"C:\\Program Files\\R\\R-2.14.1\\bin\\R.exe"', 'CMD BATCH FunctionGamFit.r'), wait = T)
...to produce the fit.

Thanks for any help!
ben
Re: [R] GAM, how to set qr=TRUE
Solution: have package mgcv loaded when you predict...not just for the fit. :) Silly mistake... Thanks Simon! Ben On Thu, May 3, 2012 at 3:56 PM, Ben quant wrote: > Hello, > > I don't understand what went wrong or how to fix this. How do I set > qr=TRUE for gam? > > When I produce a fit using gam like this: > > fit = gam(y~s(x),data=as.data.frame(l_yx),family=family,control = > list(keepData=T)) > > ...then try to use predict: > (see #1 below in the traceback() ) > > > traceback() > 6: stop("lm object does not have a proper 'qr' component.\n Rank zero or > should not have used lm(.., qr=FALSE).") at #81 > 5: qr.lm(object) at #81 > 4: summary.glm(object, dispersion = dispersion) at #81 > 3: summary(object, dispersion = dispersion) at #81 > 2: predict.glm(fit, data.frame(x = xx), type = "response", se.fit = T, >col = prediction_col, lty = prediction_ln) at #81 > 1: predict(fit, data.frame(x = xx), type = "response", se.fit = T, >col = prediction_col, lty = prediction_ln) at #81 > > ...I get this error: > > Error in qr.lm(object) : lm object does not have a proper 'qr' component. > Rank zero or should not have used lm(.., qr=FALSE). > > I read this post: http://tolstoy.newcastle.edu.au/R/devel/06/04/5133.html > > So I tried adding qr=T to the gam call but it didn't make any difference. > This is how I did it: > > fit = gam(y~s(x),data=as.data.frame(l_yx),family=family,control = > list(keepData=T),qr=T) > > Its all very strange because I've produced fits with this data may times > before with no issues (and never having to do anything with the qr > parameter. I don't understand why this is coming up or how to fix it. > > PS - I don't think this matters, but I am calling a script called > FunctionGamFit.r like this: > err = system(paste('"C:\\Program Files\\R\\R-2.14.1\\bin\\R.exe"', 'CMD > BATCH FunctionGamFit.r'), wait = T) > ...to produce the fit. > > Thanks for any help! 
> > ben
[R] domain/number line/range reduction problem
Hello, Currently I'm only coming up with brute force solutions to this issue. Wondering if anyone knows of a better way to do this. The problem: I have endpoints of one x range (x_rng) and an unknown number of s ranges (s[#]_rng) also defined by endpoints. What I want are the parts of the x ranges that don't overlap the s ranges. The examples below demonstrate what I mean. I'm glossing over an obvious endpoint inclusion/exclusion issue here for simplicity, but in a perfect world the resulting ranges would not include the s range endpoints and would include endpoints of the x range if they were not eliminated by an s range. Is there some function(s) in R that would make this easy? Ex 1. For: x_rng = c(-100,100) s1_rng = c(-25.5,30) s2_rng = c(0.77,10) s3_rng = c(25,35) s4_rng = c(70,80.3) s5_rng = c(90,95) I would get: xa_rng = c(-100,-25.5) xb_rng = c(35,70) xc_rng = c(80.3,90) xd_rng = c(95,100) Ex 2. For: x_rng = c(-50.5,100) s1_rng = c(-75.3,30) I would get: xa_rng = c(30,100) Ex 3. For: x_rng = c(-75.3,30) s1_rng = c(-50.5,100) I would get: xa_rng = c(-75.3,-50.5) Ex 4. For: x_rng = c(-100,100) s1_rng = c(-105,105) I would get something like: xa_rng = c(NA,NA) or... xa_rng = NA Ex 5. For: x_rng = c(-100,100) s1_rng = c(-100,100) I would get something like: xa_rng = c(NA,NA) or... xa_rng = NA [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] domain/number line/range reduction problem
Clarification/correction: Ex 5 isn't consistent with the other examples To be consistent with the other examples the resulting ranges would be something like: xa_rng = c(-100,-100) xb_rng = c(100,100) or just... xa_rng = -100 xb_rng = 100 However, my original Ex 5 would be a good solution if the s range endpoints were not included in the results per my statement: "...but in a perfect world the resulting ranges would not include the s range endpoints and would include endpoints of the x range if they were not eliminated by an s range." Sorry, for any confusion. Thanks! Ben On Fri, May 11, 2012 at 8:58 AM, Ben quant wrote: > Hello, > > Currently I'm only coming up with brute force solutions to this issue. > Wondering if anyone knows of a better way to do this. > > The problem: I have endpoints of one x range (x_rng) and an unknown number > of s ranges (s[#]_rng) also defined by endpoints. What I want are the parts > of the x ranges that don't overlap the s ranges. The examples below > demonstrate what I mean. I'm glossing over an obvious endpoint > inclusion/exclusion issue here for simplicity, but in a perfect world the > resulting ranges would not include the s range endpoints and would include > endpoints of the x range if they were not eliminated by an s range. > > Is there some function(s) in R that would make this easy? > > Ex 1. > For: > x_rng = c(-100,100) > > s1_rng = c(-25.5,30) > s2_rng = c(0.77,10) > s3_rng = c(25,35) > s4_rng = c(70,80.3) > s5_rng = c(90,95) > > I would get: > xa_rng = c(-100,-25.5) > xb_rng = c(35,70) > xc_rng = c(80.3,90) > xd_rng = c(95,100) > > Ex 2. > For: > x_rng = c(-50.5,100) > > s1_rng = c(-75.3,30) > > I would get: > xa_rng = c(30,100) > > Ex 3. > For: > x_rng = c(-75.3,30) > > s1_rng = c(-50.5,100) > > I would get: > xa_rng = c(-75.3,-50.5) > > Ex 4. > For: > x_rng = c(-100,100) > > s1_rng = c(-105,105) > > I would get something like: > xa_rng = c(NA,NA) > or... > xa_rng = NA > > Ex 5. 
> For: > x_rng = c(-100,100) > > s1_rng = c(-100,100) > > I would get something like: > xa_rng = c(NA,NA) > or... > xa_rng = NA > > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] range segment exclusion using range endpoints
Hello, I'm posting this again (with some small edits). I didn't get any replies last time...hoping for some this time. :) Currently I'm only coming up with brute force solutions to this issue (loops). I'm wondering if anyone has a better way to do this. Thank you for your help in advance! The problem: I have endpoints of one x range (x_rng) and an unknown number of s ranges (s[#]_rng) also defined by the range endpoints. I'd like to remove the x ranges that overlap with the s ranges. The examples below demonstrate what I mean. What is the best way to do this? Ex 1. For: x_rng = c(-100,100) s1_rng = c(-25.5,30) s2_rng = c(0.77,10) s3_rng = c(25,35) s4_rng = c(70,80.3) s5_rng = c(90,95) I would get: -100,-25.5 35,70 80.3,90 95,100 Ex 2. For: x_rng = c(-50.5,100) s1_rng = c(-75.3,30) I would get: 30,100 Ex 3. For: x_rng = c(-75.3,30) s1_rng = c(-50.5,100) I would get: -75.3,-50.5 Ex 4. For: x_rng = c(-100,100) s1_rng = c(-105,105) I would get something like: NA,NA or... NA Ex 5. For: x_rng = c(-100,100) s1_rng = c(-100,100) I would get something like: -100,-100 100,100 or just... -100 100 PS - You may have noticed that in all of the examples I am including the s range endpoints in the desired results, which I can deal with later in my program so its not a problem... I think leaving in the s range endpoints simplifies the problem. Thanks! Ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] range segment exclusion using range endpoints
rence: return Ranges object describing points that are in x > but not y >x <- unionIntervals(x) >y <- unionIntervals(y) >nx <- nrow(x) >ny <- nrow(y) >u <- c(x[, 1], y[, 1], x[, 2], y[, 2]) >o <- order(u) >u <- u[o] >vx <- cumsum(jx <- rep(c(1, 0, -1, 0), c(nx, ny, nx, ny))[o]) >vy <- cumsum(jy <- rep(c(0, -1, 0, 1), c(nx, ny, nx, ny))[o]) >as.Ranges(u[vx == 1 & vy == 0], u[(vx == 1 & jy == -1) | (jx == -1 & vy > == 0)]) > } > > intersectRanges <- function(x, y) > { ># return Ranges object describing points that are in both x and y >x <- unionIntervals(x) >y <- unionIntervals(y) >nx <- nrow(x) >ny <- nrow(y) >u <- c(x[, 1], y[, 1], x[, 2], y[, 2]) >o <- order(u) >u <- u[o] >vx <- cumsum(jx <- rep(c(1, 0, -1, 0), c(nx, ny, nx, ny))[o]) >vy <- cumsum(jy <- rep(c(0, 1, 0, -1), c(nx, ny, nx, ny))[o]) >as.Ranges(u[vx == 1 & vy == 1], u[(vx == 1 & jy == -1) | (jx == -1 & vy > == 1)]) > } > > inRanges <- function(x, Ranges) > { >if (length(x) == 1) { >any(x > Ranges[,1] & x <= Ranges[,2]) >} else { >Ranges <- unionIntervals(Ranges) >(findInterval(-x, rev(-as.vector(t(Ranges %% 2) == 1 >} > } > > plot.Ranges <- function(x, ...) > { ># mainly for debugging - no plotting controls, all ... must be Ranges > objects. >RangesList <- list(x=x, ...) >labels <- vapply(as.list(substitute(list(x, ...)))[-1], > function(x)deparse(x)[1], "") >oldmar <- par(mar = replace(par("mar"), 2, max(nchar(labels)/2, 10))) >on.exit(par(oldmar)) >xlim <- do.call("range", c(unlist(RangesList, recursive=FALSE), > list(finite=TRUE))) >ylim <- c(0, length(RangesList)+1) >plot(type="n", xlim, ylim, xlab="", ylab="", axes=FALSE) >grid(ny=0) >axis(side=1) >axis(side=2, at=seq_along(RangesList), lab=labels, las=1, tck=0) >box() >incr <- 0.45 / max(vapply(RangesList, nrow, 0)) >xr <- par("usr")[1:2] # for intervals that extend to -Inf or Inf. 
>for(i in seq_along(RangesList)) { >r <- RangesList[[i]] >if (nrow(r)>0) { >y <- i + seq(0, by=incr, len=nrow(r)) >r <- r[order(r[,1]),,drop=FALSE] >segments(pmax(r[,1], xr[1]), y, pmin(r[,2], xr[2]), y) > } >} > } > > > > > -Original Message- > > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] > On Behalf > > Of Ben quant > > Sent: Saturday, May 12, 2012 10:54 AM > > To: r-help@r-project.org > > Subject: [R] range segment exclusion using range endpoints > > > > Hello, > > > > I'm posting this again (with some small edits). I didn't get any replies > > last time...hoping for some this time. :) > > > > Currently I'm only coming up with brute force solutions to this issue > > (loops). I'm wondering if anyone has a better way to do this. Thank you > for > > your help in advance! > > > > The problem: I have endpoints of one x range (x_rng) and an unknown > number > > of s ranges (s[#]_rng) also defined by the range endpoints. I'd like to > > remove the x ranges that overlap with the s ranges. The examples below > > demonstrate what I mean. > > > > What is the best way to do this? > > > > Ex 1. > > For: > > x_rng = c(-100,100) > > > > s1_rng = c(-25.5,30) > > s2_rng = c(0.77,10) > > s3_rng = c(25,35) > > s4_rng = c(70,80.3) > > s5_rng = c(90,95) > > > > I would get: > > -100,-25.5 > > 35,70 > > 80.3,90 > > 95,100 > > > > Ex 2. > > For: > > x_rng = c(-50.5,100) > > > > s1_rng = c(-75.3,30) > > > > I would get: > > 30,100 > > > > Ex 3. > > For: > > x_rng = c(-75.3,30) > > > > s1_rng = c(-50.5,100) > > > > I would get: > > -75.3,-50.5 > > > > Ex 4. > > For: > > x_rng = c(-100,100) > > > > s1_rng = c(-105,105) > > > > I would get something like: > > NA,NA > > or... > > NA > > > > Ex 5. > > For: > > x_rng = c(-100,100) > > > > s1_rng = c(-100,100) > > > > I would get something like: > > -100,-100 > > 100,100 > > or just... 
> > -100 > > 100 > > > > PS - You may have noticed that in all of the examples I am including the > s > > range endpoints in the desired results, which I can deal with later in my > > program so its not a problem... I think leaving in the s range endpoints > > simplifies the problem. > > > > Thanks! > > Ben > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] range segment exclusion using range endpoints
Great solution! Thanks! Ben On Sat, May 12, 2012 at 12:50 PM, jim holtman wrote: > Here is an example of how you might do it. It uses a technique of > counting how many items are in a queue based on their arrival times; > it can be used to also find areas of overlap. > > Note that it would be best to use a list for the 's' end points > > > > # note the next statement removes names of the format 's[0-9]+_rng' > > # it would be best to create a list with the 's' endpoints, but this is > > # what the OP specified > > > > rm(list = grep('s[0-9]+_rng', ls(), value = TRUE)) # Danger Will > Robinson!! > > > > # ex 1 > > x_rng = c(-100,100) > > > > s1_rng = c(-25.5,30) > > s2_rng = c(0.77,10) > > s3_rng = c(25,35) > > s4_rng = c(70,80.3) > > s5_rng = c(90,95) > > > > # ex 2 > > # x_rng = c(-50.5,100) > > > > # s1_rng = c(-75.3,30) > > > > # ex 3 > > # x_rng = c(-75.3,30) > > > > # s1_rng = c(-50.5,100) > > > > # ex 4 > > # x_rng = c(-100,100) > > > > # s1_rng = c(-105,105) > > > > # find all the names -- USE A LIST NEXT TIME > > sNames <- grep("s[0-9]+_rng", ls(), value = TRUE) > > > > # initial matrix with the 'x' endpoints > > queue <- rbind(c(x_rng[1], 1), c(x_rng[2], 1)) > > > > # add the 's' end points to the list > > # this will be used to determine how many things are in a queue (or > areas that > > # overlap) > > for (i in sNames){ > + queue <- rbind(queue > + , c(get(i)[1], 1) # enter queue > + , c(get(i)[2], -1) # exit queue > + ) > + } > > queue <- queue[order(queue[, 1]), ] # sort > > queue <- cbind(queue, cumsum(queue[, 2])) # of people in the queue > > print(queue) > [,1] [,2] [,3] > [1,] -100.0011 > [2,] -25.5012 > [3,]0.7713 > [4,] 10.00 -12 > [5,] 25.0013 > [6,] 30.00 -12 > [7,] 35.00 -11 > [8,] 70.0012 > [9,] 80.30 -11 > [10,] 90.0012 > [11,] 95.00 -11 > [12,] 100.0012 > > > > # print out values where the last column is 1 > > for (i in which(queue[, 3] == 1)){ > + cat("start:", queue[i, 1L], ' end:', queue[i + 1L, 1L], "\n") > + } > start: -100 end: -25.5 
> start: 35 end: 70 > start: 80.3 end: 90 > start: 95 end: 100 > > > > > = > > On Sat, May 12, 2012 at 1:54 PM, Ben quant wrote: > > Hello, > > > > I'm posting this again (with some small edits). I didn't get any replies > > last time...hoping for some this time. :) > > > > Currently I'm only coming up with brute force solutions to this issue > > (loops). I'm wondering if anyone has a better way to do this. Thank you > for > > your help in advance! > > > > The problem: I have endpoints of one x range (x_rng) and an unknown > number > > of s ranges (s[#]_rng) also defined by the range endpoints. I'd like to > > remove the x ranges that overlap with the s ranges. The examples below > > demonstrate what I mean. > > > > What is the best way to do this? > > > > Ex 1. > > For: > > x_rng = c(-100,100) > > > > s1_rng = c(-25.5,30) > > s2_rng = c(0.77,10) > > s3_rng = c(25,35) > > s4_rng = c(70,80.3) > > s5_rng = c(90,95) > > > > I would get: > > -100,-25.5 > > 35,70 > > 80.3,90 > > 95,100 > > > > Ex 2. > > For: > > x_rng = c(-50.5,100) > > > > s1_rng = c(-75.3,30) > > > > I would get: > > 30,100 > > > > Ex 3. > > For: > > x_rng = c(-75.3,30) > > > > s1_rng = c(-50.5,100) > > > > I would get: > > -75.3,-50.5 > > > > Ex 4. > > For: > > x_rng = c(-100,100) > > > > s1_rng = c(-105,105) > > > > I would get something like: > > NA,NA > > or... > > NA > > > > Ex 5. > > For: > > x_rng = c(-100,100) > > > > s1_rng = c(-100,100) > > > > I would get something like: > > -100,-100 > > 100,100 > > or just... > > -100 > > 100 > > > > PS - You may have noticed that in all of the examples I am including the > s > > range endpoints in the desired results, which I can deal with later in my > > program so its not a problem... I think leaving in the s range endpoints > > simplifies the problem. > > > > Thanks! 
> > Ben
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
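Following Jim's own advice to keep the 's' endpoints in a list rather than using get() on generated variable names, the same sweep can be written more compactly. This is an editorial sketch, not from the thread; `s_list`, `events`, `depth`, and `gaps` are names introduced here for illustration:

```r
x_rng  <- c(-100, 100)
s_list <- list(c(-25.5, 30), c(0.77, 10), c(25, 35), c(70, 80.3), c(90, 95))

# one row per endpoint: +1 when a range opens, -1 when an s range closes;
# both x endpoints get +1 so the uncovered parts of x have count exactly 1
events <- rbind(
  cbind(c(x_rng[1], x_rng[2]), 1),
  do.call(rbind, lapply(s_list, function(s) rbind(c(s[1], 1), c(s[2], -1))))
)
events <- events[order(events[, 1]), ]
depth  <- cumsum(events[, 2])

# intervals where only x is "open" are the uncovered parts
starts <- which(depth == 1)
gaps   <- cbind(start = events[starts, 1], end = events[starts + 1, 1])
```

For example 1 this yields the same four gaps as Jim's transcript: (-100, -25.5), (35, 70), (80.3, 90), and (95, 100).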
Re: [R] range segment exclusion using range endpoints
Turns out this solution doesn't work if an s range extends outside the x
range. I didn't include that in my examples, but it is something I have to
deal with quite often.

For example, s1_rng below causes an issue:

x_rng = c(-100,100)
s1_rng = c(-250.5,30)
s2_rng = c(0.77,10)
s3_rng = c(25,35)
s4_rng = c(70,80.3)
s5_rng = c(90,95)

sNames <- grep("s[0-9]+_rng", ls(), value = TRUE)
queue <- rbind(c(x_rng[1], 1), c(x_rng[2], 1))
for (i in sNames) {
  queue <- rbind(queue,
                 c(get(i)[1],  1),  # enter queue
                 c(get(i)[2], -1))  # exit queue
}
queue <- queue[order(queue[, 1]), ]        # sort
queue <- cbind(queue, cumsum(queue[, 2]))  # number of items in the queue
for (i in which(queue[, 3] == 1)) {
  cat("start:", queue[i, 1L], " end:", queue[i + 1L, 1L], "\n")
}

Regards,

ben

On Sat, May 12, 2012 at 12:50 PM, jim holtman wrote:
> [quoted text clipped -- Jim's solution appears in full earlier in this
> thread]
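One way to make the queue approach robust to s ranges that extend beyond x is to clip each s range to the x range before building the queue, and drop any s range that becomes empty. This is an editorial sketch in base R, not from the thread; `clip`, `s_list`, and `s_clipped` are names introduced here for illustration:

```r
x_rng  <- c(-100, 100)
s_list <- list(c(-250.5, 30), c(0.77, 10), c(25, 35), c(70, 80.3), c(90, 95))

# clip one s range to the x range
clip <- function(s, x) c(max(s[1], x[1]), min(s[2], x[2]))

s_clipped <- lapply(s_list, clip, x = x_rng)
# drop ranges that fall entirely outside x (empty after clipping)
s_clipped <- Filter(function(s) s[1] < s[2], s_clipped)
```

After clipping, the first s range becomes c(-100, 30), so every "enter queue" event inside x has a matching "exit queue" event and the cumulative counts stay balanced.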
Re: [R] range segment exclusion using range endpoints
Yes, it is. I'm looking into understanding this now...

thanks!
Ben

On Mon, May 14, 2012 at 12:38 PM, William Dunlap wrote:
> To the list of functions I sent, add another that converts a list of
> intervals into a Ranges object:
>
> as.Ranges.list <- function (x, ...) {
>     stopifnot(nargs() == 1, all(vapply(x, length, 0) == 2))
>     # use c() instead of unlist() because c() doesn't mangle POSIXct
>     # and Date objects
>     x <- unname(do.call(c, x))
>     odd <- seq(from = 1, to = length(x), by = 2)
>     as.Ranges(bottoms = x[odd], tops = x[odd + 1])
> }
>
> Then stop using get() and assign() all over the place and instead make
> lists of related intervals and convert them to Ranges objects:
>
> > x <- as.Ranges(list(x_rng))
> > s <- as.Ranges(list(s1_rng, s2_rng, s3_rng, s4_rng, s5_rng))
> > x
>   bottoms tops
> 1    -100  100
> > s
>   bottoms tops
> 1 -250.50 30.0
> 2    0.77 10.0
> 3   25.00 35.0
> 4   70.00 80.3
> 5   90.00 95.0
>
> and then compute the difference between the sets x and s (i.e., describe
> the points in x but not in s as a union of intervals):
>
> > setdiffRanges(x, s)
>   bottoms tops
> 1    35.0   70
> 2    80.3   90
> 3    95.0  100
>
> and for a graphical check do
>
> > plot(x, s, setdiffRanges(x, s))
>
> Are those the numbers you want?
>
> I find it easier to use standard functions and data structures for this
> than to adapt the cumsum/order idiom to different situations.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> > -----Original Message-----
> > From: Ben quant
> > Sent: Monday, May 14, 2012 11:07 AM
> > To: jim holtman
> > Cc: r-help@r-project.org
> > Subject: Re: [R] range segment exclusion using range endpoints
> >
> > Turns out this solution doesn't work if the s range is outside the
> > range of the x range. I didn't include that in my examples, but it is
> > something I have to deal with quite often.
> >
> > [quoted text clipped]
Re: [R] range segment exclusion using range endpoints
Thank you Steve! This does everything I need (at this point). The code
below excludes the y2 ranges from the y1 range:

library('intervals')

y1 = Intervals(c(-100,100))
y2 = Intervals(rbind(
  c(-100.5,30),
  c(0.77,10),
  c(25,35),
  c(70,80.3),
  c(90,95)
))

> interval_difference(y1,y2)
Object of class Intervals_full
3 intervals over R:
(35, 70)
(80.3, 90)
(95, 100]

PS - I'm pretty sure William's solution worked as well, but I opted for
the package solution, which is a bit more robust. Thanks everyone!

Ben

On Mon, May 14, 2012 at 1:06 PM, Steve Lianoglou wrote:
> Hi all,
>
> Nice code samples presented all around.
>
> Just wanted to point out that I think the stuff found in the `intervals`
> package might also be helpful:
>
> http://cran.at.r-project.org/web/packages/intervals/index.html
>
> HTH,
> -steve
>
> On Mon, May 14, 2012 at 2:54 PM, Ben quant wrote:
> > Yes, it is. I'm looking into understanding this now...
> >
> > [quoted text clipped -- William Dunlap's Ranges solution appears in
> > full earlier in this thread]
[R] pass objects into "..." (dot dot dot)
Hello,

Thanks in advance for any help!

How do I pass an unknown number of objects into the "..." (dot dot dot)
parameter? Put another way, is there some standard way to pass multiple
objects into "..." to "fool" the function into thinking the objects were
passed in separately/explicitly with comma separation (like "x,y,z" when
x, y and z are objects to be passed into "...")?

Details: I'm working with this function and parameter list:

interval_intersection(x, ..., check_valid = TRUE)

To illustrate... This works, and I get the expected interval:

library('intervals')

# create individual Intervals objects
z = Intervals(c(1,10))
y = Intervals(c(5,10))
x = Intervals(c(4,6))

> interval_intersection(x,y,z)
Object of class Intervals
1 interval over R:
[5, 6]

...but at run time I don't know how many Intervals objects I will have, so
I can't list them explicitly like "x,y,z". So I build a matrix of
intervals (per the package manual), and the function doesn't do what I
want:

> xyz = matrix(c(4,5,1,6,10,10), nrow = 3)
> xyz
     [,1] [,2]
[1,]    4    6
[2,]    5   10
[3,]    1   10
> xyz_interval = Intervals(xyz)
> interval_intersection(xyz_interval)
Object of class Intervals
1 interval over R:
[1, 10]

...[1, 10] is unexpected/wrong because I want the intersection of the
three intervals. So I conclude that I need to pass in the individual
Intervals objects, but how do I do that if I don't know how many I have at
run time? I tried putting them in a list, but that didn't work. I also
tried using paste(, sep = ',') and get().

Is there some standard way to pass multiple objects into "..." to "fool"
the function into thinking they are passed in separately/explicitly with
comma separation?

Thanks!

ben
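An editorial aside, not from the thread: the standard R idiom for spreading a list of objects into a function's "..." is do.call(), which invokes a function with the elements of a list as its individual arguments. A minimal sketch, assuming the `intervals` package is installed (`ivs` is a name introduced here for illustration):

```r
library(intervals)

# build the Intervals objects in a list whose length is only known at run time
ivs <- list(Intervals(c(1, 10)), Intervals(c(5, 10)), Intervals(c(4, 6)))

# do.call(f, args) calls f(args[[1]], args[[2]], ...), so each list element
# is passed as a separate argument -- exactly what "..." expects
result <- do.call(interval_intersection, ivs)
# equivalent to interval_intersection(ivs[[1]], ivs[[2]], ivs[[3]]),
# which the thread above shows yields the interval [5, 6]
```

The same idiom works for any variadic function, e.g. do.call(rbind, list_of_matrices).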