[R] identify duplicate entries in data frame and calculate mean
I have a data frame with 10 columns. The last column holds an alphanumeric identifier. For most rows this identifier is unique within the file, but some identifiers occur in duplicate, triplicate or more. When they occur more than once, they are in consecutive rows; so when there is a duplicate, triplicate or quadruplicate (let's call them multiplicates), the entries are in consecutive rows. Column 7 holds an integer (which may or may not be unique; it does not matter).

I want to identify each set of multiple entries (multiplicates) in column 10 and then, for each multiplicate, calculate the mean of the integers in column 7.

As an example, I will show just two columns:

Length  Identifier
321     A234
350     A234
340     A234
180     B123
198     B225

What I want to do (in the above example) is collapse all the A234's and report the mean, to get this:

Length  Identifier
337     A234
180     B123
198     B225

Matthew

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] identify duplicate entries in data frame and calculate mean
Thank you very much, Tom. This gets me thinking in the right direction. One thing I should have mentioned is that the data frame will have a little over 40,000 rows.

On 5/24/2016 4:08 PM, Tom Wright wrote:
> Using dplyr:
>
> library(dplyr)
> x <- data.frame(Length = c(321, 350, 340, 180, 198),
>                 ID = c(rep('A234', 3), 'B123', 'B225'))
> x %>% group_by(ID) %>% summarise(m = mean(Length))
Re: [R] identify duplicate entries in data frame and calculate mean
Thanks, Tom. I was making a mistake looking at your example, and that was my problem. Cool answer; it works great. Thank you very much.

Matthew

On 5/24/2016 4:23 PM, Tom Wright wrote:
> Don't see that as being a big problem. If your data grows, then dplyr
> supports connections to external databases. Alternatively, if you just
> want a mean, most databases can do that directly in SQL.
Re: [R] identify duplicate entries in data frame and calculate mean
Thank you very much, Dan. These work great. Two more great answers to my question.

Matthew

On 5/24/2016 4:15 PM, Nordlund, Dan (DSHS/RDA) wrote:
You have several options.

1. You could use the aggregate function. If your data frame is called DF, you could do something like

   with(DF, aggregate(Length, list(Identifier), mean))

2. You could use the dplyr package like this

   library(dplyr)
   summarize(group_by(DF, Identifier), mean(Length))

Hope this is helpful,

Dan

Daniel Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services
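Both answers in this thread collapse every row sharing an identifier, wherever it appears in the frame. Since the question states that multiplicates always occupy consecutive rows, that is equivalent here; but if identical identifiers could ever occur in separate consecutive runs that should stay apart, grouping on run boundaries is safer. A base-R sketch using the example data from the thread (the run-id trick is an assumption, not taken from the replies):

```r
# Example data from the thread
DF <- data.frame(Length = c(321, 350, 340, 180, 198),
                 Identifier = c("A234", "A234", "A234", "B123", "B225"))

# Collapse each identifier and take the mean Length
aggregate(Length ~ Identifier, data = DF, FUN = mean)
# A234 -> 337, B123 -> 180, B225 -> 198

# If only *consecutive* runs should be collapsed, number the runs first
run <- cumsum(c(TRUE, DF$Identifier[-1] != DF$Identifier[-nrow(DF)]))
aggregate(Length ~ run + Identifier, data = DF, FUN = mean)
```

With the example data both calls give the same result, because each identifier forms a single run.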
[R] use value in variable to be name of another variable
I want to get a value that has been assigned to a variable, and then use that value as the name of another variable. For example:

tTargTFS[1,1]
# returns:
#        V1
# "AT1G01010"

Now, I want to make AT1G01010 the name of a variable:

AT1G01010 <- tTargTFS[-1,1]

Then, go to the next element, tTargTFS[1,2], which produces

#        V1
# "AT1G01030"

and then

AT1G01030 <- tTargTFS[-1,2]

I want to do this up to tTargTFS[1, 2666], so I want to do this in a script and not manually. tTargTFS is a list of 2: chr [1:265, 1:2666], but I also have the data in a data frame of 265 observations of 2666 variables, if that data structure makes things easier.

My initial attempts are not working. Starting with a test data structure that is a little simpler, I have tried:

for (i in 1:4) {
  ATG <- tTargTFS[1, i]
  assign(cat(ATG), tTargTFS[-1, i])
}

Matthew
Re: [R] use value in variable to be name of another variable
Hi Jim,

Wow! And it does exactly what I was looking for. Thank you very much. That assign function is pretty nice. I should become more familiar with it.

Matthew

On 7/11/2016 5:59 PM, Jim Lemon wrote:
Hi Matthew,

This question is a bit mysterious, as we don't know what the object "chr" is. However, have a look at this and see if it is close to what you want to do.

# set up a little matrix of character values
tTargTFS <- matrix(paste("A", rep(1:4, each=4), "B", rep(1:4, 4), sep=""), ncol=4)
# try the assignment on the first row and column
assign(tTargTFS[1,1], tTargTFS[-1,1])
# see what it looks like - okay
A1B1
# run the assignment over the matrix
for(i in 1:4) assign(tTargTFS[1,i], tTargTFS[-1,i])
# see what the variables look like
A1B1
A2B1
A3B1
A4B1

It does what I would expect.

Jim
Re: [R] use value in variable to be name of another variable
Hi Rolf,

Thanks for the warning. I think it was because my initial efforts used the assign function that Jim provided his solution using it. Any suggestions for how it could be done without assign()?

Matthew

On 7/11/2016 6:31 PM, Rolf Turner wrote:
On 12/07/16 10:13, Matthew wrote:

> Hi Jim,
> Wow! And it does exactly what I was looking for. Thank you very much.
> That assign function is pretty nice. I should become more familiar with it.

Indeed you should, and assign() is indeed nice and useful and handy. But it should be used with care and circumspection. It *alters the global environment*, which is fraught with peril. Generally speaking, most things that can be done with assign() (and its companion function get()) are better and more safely done using lists and functions and other "natural" R-ish constructs. Resist the temptation to turn R into a macro language.

cheers,

Rolf Turner
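One way to follow Rolf's advice without assign() is to keep the columns in a single named list rather than creating free-standing global variables. A sketch using Jim's test matrix (the structure is taken from the thread; the list-based approach itself is an assumption, not code anyone posted):

```r
# Jim's test matrix: first row holds the names, remaining rows the values
tTargTFS <- matrix(paste("A", rep(1:4, each = 4), "B", rep(1:4, 4), sep = ""), ncol = 4)

# One named list instead of many variables in the global environment
vals <- setNames(
  lapply(seq_len(ncol(tTargTFS)), function(i) tTargTFS[-1, i]),
  tTargTFS[1, ]
)

vals[["A1B1"]]  # same data that assign() would have put in variable A1B1
```

Looking elements up with `vals[[name]]` replaces get(), and the whole collection can be passed to functions or iterated over with lapply, which is what Rolf means by "natural" R-ish constructs.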
[R] tcltk2 entry box
Is anyone familiar enough with the tcltk2 package to know whether it is possible to have an entry box where a user can enter information (such as a path to a file or a number) and then use the entered information downstream in an R script?

The idea is for someone unfamiliar with R to just start an R script that takes care of all the commands for them, so all they have to do is get the script started. However, there are always a couple of pieces of information that change each time the script is used (for example, a different file will be processed by the script). So I would like a way for the user to input that information as the script runs.

Matthew McCormack
Re: [R] tcltk2 entry box
Thank you very much, Greg, for the tkwait commands. I am just starting to try out examples on the SciViews web page to get a feel for tcltk in R, and tkwait.variable and tkwait.window seem like they could be very useful to me. I will add these to my practice scripts and see what I can do with them.

Matthew

On 7/9/2015 5:31 PM, Greg Snow wrote:
If you want your script to wait until a value has been entered, then you can use the tkwait.variable or tkwait.window commands to make the script wait before continuing (or you can bind the code to a button, so that you enter the value and then click on the button to run the code).

On Wed, Jul 8, 2015 at 7:58 PM, Matthew McCormack wrote:
Wow! Very nice. Thank you very much, John. This is very helpful and just what I need. Yes, I can see that I should have paid attention to tcltk before going to tcltk2.

Matthew

On 7/8/2015 8:37 PM, John Fox wrote:
Dear Matthew,

For file selection, see ?tcltk::tk_choose.files or ?tcltk::tkgetOpenFile. You could enter a number in a tk entry widget, but, depending upon the nature of the number, a slider or other widget might be a better choice. For a variety of helpful tcltk examples see <http://www.sciviews.org/_rgui/tcltk/>, originally by James Wettenhall but now maintained by Philippe Grosjean (the author of the tcltk2 package). (You probably don't need tcltk2 for the simple operations that you mention, but see ?tk2spinbox for an alternative to a slider.)

Best,

John

---
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.socsci.mcmaster.ca/jfox/
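A minimal sketch of the pattern Greg describes, using only the base tcltk package (the widget names, layout, and variable names here are illustrative, not taken from the thread):

```r
library(tcltk)

tt <- tktoplevel()
path_var <- tclVar("")   # will hold whatever the user types
done <- tclVar(0)        # flipped when the user clicks OK

entry <- tkentry(tt, textvariable = path_var, width = 50)
button <- tkbutton(tt, text = "OK",
                   command = function() tclvalue(done) <- 1)
tkpack(entry, button)

tkwait.variable(done)    # the script pauses here until OK is clicked
infile <- tclvalue(path_var)  # entered value, usable downstream
tkdestroy(tt)
```

After tkwait.variable returns, `infile` is an ordinary R string, so the rest of the script can pass it to read.csv or any other function.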
Re: [R] need help with excel data
Try ASAP Utilities (Home and Student edition), http://www.asap-utilities.com/index.php. When installed, it appears in Excel; use Select > Columns & Rows, and then option #18. If that is not helpful, then try DigDB, http://www.digdb.com/, but that one requires a subscription. It will also split columns. You may have to do some 'cleaning' of individual cells, such as removing leading and/or trailing spaces. A lot of this can be done with the ASAP Utilities 'Text' pull-down menu.

Matthew

On 1/21/2015 3:31 PM, Dr Polanski wrote:
Hi all!

Sorry to bother you. I am trying to learn some R via Coursera courses and other internet sources, yet I haven't managed to get far, and now I need to do some, I hope, not too difficult things which I think R can do, yet I have no idea how to make it do so.

I have a big set of empirical data, obtained by my colleagues, stored in an inconvenient way: all of the data are in two cells of an Excel table. An example of the data is in the attached file (the link):

https://drive.google.com/file/d/0B64YMbf_hh5BS2tzVE9WVmV3bFU/view?usp=sharing

The first column has a number, and the second has a whole vector (I guess), which looks like "some words in Cyrillic (the length varies)" followed by a set of numbers, "12*23 34*45" (another problem is that sometimes it is "12*23, 34*56"). There are about 3000 rows, so it is impossible to do this manually.

What I need at the end is to have the parts in separate Excel cells: what is written in words, then | 12 | 23 | 34 | 45 |.

Do you think it is possible to do this using R (or something else)? Thank you very much in advance, and sorry for asking such a stupid question; the problem is, I am trying and haven't yet even managed to install openSUSE onto my laptop, only Ubuntu! :)

Thank you very much!
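The split Dr Polanski describes can also be done in R itself. A sketch, assuming each cell is "text, then digit groups separated by `*`, spaces, or commas" as described (the sample strings and the assumption that every row yields the same number of digit groups are mine, not from the thread):

```r
# Two example cells in the shape described (Cyrillic text, then numbers)
x <- c("какие-то слова 12*23 34*45", "другие слова 12*23, 34*56")

# The text part: everything before the first digit, trimmed
words <- trimws(sub("[0-9].*$", "", x))

# The numeric part: every run of digits in the cell, in order
nums <- lapply(x, function(s)
  as.numeric(regmatches(s, gregexpr("[0-9]+", s))[[1]]))

# Assemble one row per cell; do.call(rbind, ...) assumes every cell
# has the same count of numbers, as in the | 12 | 23 | 34 | 45 | example
out <- data.frame(words, do.call(rbind, nums))
write.csv(out, "split.csv", row.names = FALSE)  # openable in Excel
```

gregexpr finds all digit runs regardless of whether they are separated by `*`, a space, or a comma, which handles both "12*23 34*45" and "12*23, 34*56".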
[R] creating a dataframe with full_join and looping over a list of lists
I have been trying to create a dataframe by looping through a list of lists, using dplyr's full_join so as to keep common elements on the same row. But I have a couple of problems: 1) the lists have different numbers of elements, and 2) in the final dataframe I would like the column names to be the names of the lists. Is this possible?

for(j in avector){
  mydf3 <- data.frame(myenter)         # Start with a list, myenter, as a dataframe; mydf3 now has 1 column.
                                       # This first column will be the longest column in the final mydf3.
  atglsts <- as.data.frame(comatgs[j]) # Loop through a list of lists, comatgs; each pass turns one list
                                       # into a one-column dataframe, atglsts.
                                       # The column name is the name of the list.
                                       # Each atglsts dataframe has a different number of elements.
  mydf3 <- full_join(mydf3, atglsts)   # What I want: add the newly made dataframe, atglsts, as a
}                                      # new column of mydf3, keeping common elements on the same row.
# I could rename the column to 'AGI' so that I can join by 'AGI',
# but then I would lose the name of the list. In the final dataframe,
# I want to know the name of the original list each column was made from.

Matthew
Re: [R] creating a dataframe with full_join and looping over a list of lists.
This is fantastic! It is exactly what I was looking for. It is part of a larger Shiny app, so it was difficult to provide a working example as part of the post, and after figuring out how your code works (I am an R novice), I made a couple of small tweaks and it works great! Thank you very much, Jim, for the work you put into this.

Matthew

On 3/21/2019 11:01 PM, Jim Lemon wrote:
External Email - Use Caution

Hi Matthew,

Remember, keep it on the list so that people know the status of the request. I couldn't get this to work with the "_source_info_" variable. It seems to be unreadable as a variable name. So, this _may_ be what you want. I don't know if it can be done with "merge", and I don't know the function "full_join".

WRKY8_colamp_a <- as.character(
 c("AT1G02920","AT1G06135","AT1G07160","AT1G11925","AT1G14540","AT1G16150",
   "AT1G21120","AT1G26380","AT1G26410","AT1G35210","AT1G49000","AT1G51920",
   "AT1G56250","AT1G66090","AT1G72520","AT1G80840","AT2G02010","AT2G18690",
   "AT2G30750","AT2G39200","AT2G43620","AT3G01830","AT3G54150","AT3G55840",
   "AT4G03460","AT4G11470","AT4G11890","AT4G14370","AT4G15417","AT4G15975",
   "AT4G31940","AT4G35180","AT5G01540","AT5G05300","AT5G11140","AT5G24110",
   "AT5G25250","AT5G36925","AT5G46295","AT5G64750","AT5G64905","AT5G66020"))

bHLH10_col_a <- as.character(c("AT1G72520","AT3G55840","AT5G20230","AT5G64750"))

bHLH10_colamp_a <- as.character(
 c("AT1G01560","AT1G02920","AT1G16420","AT1G17147","AT1G35210","AT1G51620",
   "AT1G57630","AT1G72520","AT2G18690","AT2G19190","AT2G40180","AT2G44370",
   "AT3G23250","AT3G55840","AT4G03460","AT4G04480","AT4G04540","AT4G08555",
   "AT4G11470","AT4G11890","AT4G16820","AT4G23280","AT4G35180","AT5G01540",
   "AT5G05300","AT5G20230","AT5G22530","AT5G24110","AT5G56960","AT5G57010",
   "AT5G57220","AT5G64750","AT5G66020"))

# let myenter be the sorted superset
myenter <- sort(unique(c(WRKY8_colamp_a, bHLH10_col_a, bHLH10_colamp_a)))

splice <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  newy <- rep(NA, nx)
  if (ny) {
    yi <- 1
    for (xi in 1:nx) {
      if (x[xi] == y[yi]) {
        newy[xi] <- y[yi]
        yi <- yi + 1
      }
      if (yi > ny) break()
    }
  }
  return(newy)
}

comatgs <- list(WRKY8_colamp_a = WRKY8_colamp_a,
                bHLH10_col_a = bHLH10_col_a,
                bHLH10_colamp_a = bHLH10_colamp_a)
mydf3 <- data.frame(myenter, stringsAsFactors = FALSE)
for (j in 1:length(comatgs)) {
  tmp <- data.frame(splice(myenter, sort(comatgs[[j]])))
  names(tmp) <- names(comatgs)[j]
  mydf3 <- cbind(mydf3, tmp)
}

Jim

On Fri, Mar 22, 2019 at 10:29 AM Matthew wrote:
Hi Jim,

Thanks for the reply. That was pretty dumb of me. I took that out of the loop. comatgs is longer than this, but here is a sample of 4 of its 569 elements:

$WRKY8_colamp_a
 [1] "AT1G02920" "AT1G06135" "AT1G07160" "AT1G11925" "AT1G14540" "AT1G16150" "AT1G21120"
 [8] "AT1G26380" "AT1G26410" "AT1G35210" "AT1G49000" "AT1G51920" "AT1G56250" "AT1G66090"
[15] "AT1G72520" "AT1G80840" "AT2G02010" "AT2G18690" "AT2G30750" "AT2G39200" "AT2G43620"
[22] "AT3G01830" "AT3G54150" "AT3G55840" "AT4G03460" "AT4G11470" "AT4G11890" "AT4G14370"
[29] "AT4G15417" "AT4G15975" "AT4G31940" "AT4G35180" "AT5G01540" "AT5G05300" "AT5G11140"
[36] "AT5G24110" "AT5G25250" "AT5G36925" "AT5G46295" "AT5G64750" "AT5G64905" "AT5G66020"

$`_source_info_`
character(0)

$bHLH10_col_a
[1] "AT1G72520" "AT3G55840" "AT5G20230" "AT5G64750"

$bHLH10_colamp_a
 [1] "AT1G01560" "AT1G02920" "AT1G16420" "AT1G17147" "AT1G35210" "AT1G51620" "AT1G57630"
 [8] "AT1G72520" "AT2G18690" "AT2G19190" "AT2G40180" "AT2G44370"
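Jim's splice() walks two sorted vectors in parallel. When every list in comatgs is a subset of myenter (as in the data above), the same per-column result can be sketched with %in%; this is an alternative under that assumption, not code anyone posted in the thread:

```r
# comatgs: the named list of ID vectors; myenter: the sorted superset of IDs
mydf3 <- data.frame(myenter, stringsAsFactors = FALSE)
for (nm in names(comatgs)) {
  # put the ID in rows where this list contains it, NA elsewhere;
  # assigning with [[nm]] carries the list's name onto the column
  mydf3[[nm]] <- ifelse(myenter %in% comatgs[[nm]], myenter, NA)
}
```

Because the names are used as list/column names rather than variable names, awkward entries like `_source_info_` are no problem here; an empty list simply yields an all-NA column.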
[R] working on a data frame
I am coming from the perspective of Excel and VBA scripts, but I would like to do the following in R.

I have a data frame with 14 columns and 32,795 rows. I want to check the value in column 8 (row 1) to see if it is a 0. If it is not a zero, proceed to the next row and check the value in column 8. If it is a zero, then a) change the zero to a 1, b) divide the value in column 9 (row 1) by 1, c) place the result in column 10 (row 1), and d) repeat this for each of the other 32,794 rows.

Is this possible with an R script, and is this the way to go about it? If it is, could anyone get me started?

Matthew
Re: [R] working on a data frame
Thank you for your comments, Peter. A couple of questions. Can I do something like the following?

if yourData[,8]==0, then yourData[,8]==1,
yourData[,10] <- yourData[,9]/yourData[,8]

I think I am just going to have to learn more about R. I thought getting into R would be like going from Perl to Python or Java, but it seems that R programming works differently.

Matthew

On 7/25/2014 12:06 AM, Peter Alspach wrote:

Tena koe Matthew

"Column 10 contains the result of the value in column 9 divided by the value in column 8. If the value in column 8==0, then the division can not be done, so I want to change the zero to a one in order to do the division." That being the case, think in terms of vectors, as Sarah says. Try:

yourData[,10] <- yourData[,9]/yourData[,8]
yourData[yourData[,8]==0,10] <- yourData[yourData[,8]==0,9]

This doesn't change the 0 to 1 in column 8, but it doesn't appear you actually need to do that.

HTH

Peter Alspach

-Original Message-
From: Matthew McCormack
Sent: Friday, 25 July 2014 3:16 p.m.
To: Sarah Goslee
Cc: r-help@r-project.org
Subject: Re: [R] working on a data frame

On 7/24/2014 8:52 PM, Sarah Goslee wrote:

> Hi, your description isn't clear. Row 1, or the row in which column 8 == 0?

All rows in which the value in column 8==0.

> Why do you want to divide by 1?

Column 10 contains the result of the value in column 9 divided by the value in column 8. If the value in column 8==0, then the division can not be done, so I want to change the zero to a one in order to do the division. This is a fairly standard thing to do with this data. (The data are measurements of amounts at two time points. Sometimes a thing will not be present at the beginning (0) but very present at the later time. Column 10 is the log2 of the change. Infinity is not an easy number to work with, so it is common to change the 0 to a 1. On the other hand, something may be present at time 1 but not at the later time; in this case column 10 would be taking the log2 of a number divided by 0, so again the zero is commonly changed to a one in order to get a usable value in column 10. In both of the preceding cases there was a real change, but Inf and NaN are not helpful.)

> Ditto on the row 1 question.

I want to work on all rows where column 8 (and column 9) contain a zero. Column 10 contains the result of the value in column 9 divided by the value in column 8. So, for row 1, column 10 contains the ratio of column 9 row 1 divided by column 8 row 1, and so on through the whole 32,000 or so rows. Most rows do not have a zero in columns 8 or 9; some rows have a zero in column 8 only, and some rows have a zero in column 9 only. I want to get rid of the zeros in these two columns and then do the division to get a manageable value in column 10. Division by zero and Inf are not considered 'manageable' by me.

> What do you want column 10 to be if column 8 isn't 0? Does it already have a value? I suppose it must.

Yes, column 10 does have something, but this something can be Inf or NaN, which I want to get rid of.

> Assuming you want to put the new values in the rows where column 8 == 0, you can do it in two steps:
>
> mydata[,10] <- ifelse(mydata[,8] == 0, mydata[,9]/whatever, mydata[,10]) # where whatever is the thing you want to divide by that probably isn't 1
> mydata[,8] <- ifelse(mydata[,8] == 0, 1, mydata[,8])
>
> R programming is best done by thinking about vectorizing things, rather than doing them in loops. Reading the Intro to R that comes with your installation is a good place to start.

Would it be better to change the data frame into a matrix, or something else?

Thanks for your help.

Matthew

--
Sarah Goslee
http://www.stringpage.com
http://www.sarahgoslee.com
http://www.functionaldiversity.org
Re: [R] working on a data frame
Thank you very much Peter, Bill and Petr for some great and quite elegant solutions. There is a lot I can learn from these.

Yes to your question, Bill, about the raw numbers: they are counts and they cannot be negative. The data are RNA sequencing data, where approximately 32,000 genes are being measured for changes between two conditions. Some genes are not present (cannot be measured) initially but are present in the second condition, and the reverse is also true: some genes are present initially and then not present in the second condition (these are often the most interesting genes). This makes it difficult to compare the changes of all genes mathematically, so it is common practice to change the 0's to 1's and then redo the log2. 1 is considered sufficiently small; actually anything up to 3 or 5 could be just due to 'background noise' in the measurement process, but it is somewhat arbitrary.

Matthew

On 7/28/2014 2:43 AM, PIKAL Petr wrote:
> Hi
>
> I like to use logical values directly in computations if possible.
>
> yourData[,10] <- yourData[,9]/(yourData[,8]+(yourData[,8]==0))
>
> Logical values are automagically considered FALSE=0 and TRUE=1 and can be used in computations. If you really want to change 0 to 1 in column 8 you can use
>
> yourData[,8] <- yourData[,8]+(yourData[,8]==0)
>
> without ifelse stuff.
>
> Regards
> Petr
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of William Dunlap
> Sent: Friday, July 25, 2014 8:07 PM
> To: Matthew
> Cc: r-help@r-project.org
> Subject: Re: [R] working on a data frame
>
>> if yourData[,8]==0, then yourData[,8]==1,
>> yourData[,10] <- yourData[,9]/yourData[,8]
>
> You could express this in R as
>
> is8Zero <- yourData[,8] == 0
> yourData[is8Zero, 8] <- 1
> yourData[is8Zero, 10] <- yourData[is8Zero,9] / yourData[is8Zero,8]
>
> Note how logical (Boolean) values are used as subscripts - read the '[' as 'such that' when using logical subscripts.
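Petr's logical-arithmetic trick can be tried on a small made-up data frame (the numbers and column layout below are invented for illustration, not taken from the real data):

```r
# Minimal sketch of the logical-arithmetic trick on toy data.
yourData <- data.frame(matrix(0, nrow = 4, ncol = 10))
yourData[, 8] <- c(2, 0, 4, 0)   # denominator counts, some zero
yourData[, 9] <- c(6, 5, 0, 8)   # numerator counts

# (yourData[,8] == 0) is TRUE/FALSE, which arithmetic treats as 1/0,
# so zeros in the denominator are bumped to 1 before dividing.
yourData[, 10] <- yourData[, 9] / (yourData[, 8] + (yourData[, 8] == 0))
yourData[, 10]
# 3 5 0 8
```

The appeal of this form is that it is a single vectorized expression: no loop, no ifelse(), and the original zeros in column 8 are left untouched unless you explicitly overwrite them.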
There are many more ways to express the same thing. (I am tempted to change the algorithm to avoid the divide-by-zero problem by making the quotient (numerator + epsilon)/(denominator + epsilon), where epsilon is a very small number. I am assuming that the raw numbers are counts, or at least cannot be negative.)

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Jul 25, 2014 at 10:44 AM, Matthew wrote:
> Thank you for your comments, Peter. A couple of questions.
>
> Can I do something like the following?
>
> if yourData[,8]==0, then yourData[,8]==1,
> yourData[,10] <- yourData[,9]/yourData[,8]
>
> I think I am just going to have to learn more about R. I thought getting into R would be like going from Perl to Python or Java etc., but it seems like R programming works differently.
>
> Matthew
>
> On 7/25/2014 12:06 AM, Peter Alspach wrote:
>> Tena koe Matthew
>>
>> "Column 10 contains the result of the value in column 9 divided by the value in column 8. If the value in column 8==0, then the division cannot be done, so I want to change the zero to a one in order to do the division."
>>
>> That being the case, think in terms of vectors, as Sarah says. Try:
>>
>> yourData[,10] <- yourData[,9]/yourData[,8]
>> yourData[yourData[,8]==0,10] <- yourData[yourData[,8]==0,9]
>>
>> This doesn't change the 0 to 1 in column 8, but it doesn't appear you actually need to do that.
>>
>> HTH
>> Peter Alspach
>>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Matthew McCormack
>> Sent: Friday, 25 July 2014 3:16 p.m.
>> To: Sarah Goslee
>> Cc: r-help@r-project.org
>> Subject: Re: [R] working on a data frame
>>
>> On 7/24/2014 8:52 PM, Sarah Goslee wrote:
>>> Hi,
>>>
>>> Your description isn't clear:
>>>
>>> On Thursday, July 24, 2014, Matthew wrote:
>>>> I am coming from the perspective of Excel and VBA scripts, but I would like to do the following in R.
>>>>
>>>> I have a data frame with 14 columns and 32,795 rows.
>>>>
>>>> I want to check the value in column 8 (row 1) to see if it is a 0. If it is not a zero, proceed to the next row and check the value for column 8. If it is a zero, then a) change the zero to a 1, b) divide the value in column 9 (row 1) by 1,
>>>
>>> Row 1, or the row in which column 8 == 0?
>>
>> All rows in which the value in column 8==0.
>>
>>> Why do you want to divide by 1?
>>
>> Column 10 contains the result of the value in column 9 divided by the value in column 8. If the value in column 8==0, then the division cannot be done, so I want to change the zero to a one in order to do the division.
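Bill's pseudocount suggestion can be sketched in the same style; `eps`, the column layout, and the numbers are illustrative assumptions, not code from the thread:

```r
# Hypothetical example of the (numerator + eps)/(denominator + eps) approach.
yourData <- data.frame(matrix(0, nrow = 3, ncol = 10))
yourData[, 8] <- c(10, 0, 5)    # counts at time 1
yourData[, 9] <- c(20, 8, 0)    # counts at time 2

eps <- 0.5  # small pseudocount; the exact choice is arbitrary
yourData[, 10] <- log2((yourData[, 9] + eps) / (yourData[, 8] + eps))
yourData[, 10]  # finite values even where a raw count is zero
```

Because both counts get the same offset, a gene with identical counts at both time points still maps to log2(1) = 0, which is one reason this variant is sometimes preferred over replacing zeros with ones.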
[R] find the data frames in list of objects and make a list of them
Hi everyone,

I would like to find which objects are data frames among all the objects I have created (in other words, in what you get when you type ls()), and then I would like to make a list of these data frames.

Explained in other words: after typing ls(), you get the names of objects. Which objects are data frames? How do you then make a list of these data frames?

A second question: is this the best way to make a list of data frames without having to manually type c(dataframe1, dataframe2, ...)?

Matthew
Re: [R] find the data frames in list of objects and make a list of them
Jim,

Wow, that was cool! This function is *really* useful. Thank you very much! (It is also way beyond my capability.)

I need to make a list of data frames because then I am going to bind them with dplyr using 'dplyr::rbind_all(listOfDataFrames)'. This will make a single data frame, and from that single data frame I can make a heat map of all the data. For example, when I use your fantastic function, my.ls(), I get:

my.ls()
                                                             Size      Class Length     Dim
.Random.seed                                                2,544    integer    626
cpl                                                        28,664  character    512
filenames                                                   2,120  character     19
filepath                                                      216  character      1
i                                                             152  character      1
Mer7_1-1_160-226A_1_gene_exp_diff_filt_hc_log2.txt         81,152 data.frame      3  529 x 3
Mer7_1-1_Mer7_1-2_gene_exp_diff_filt_hc_log2.txt           31,624 data.frame      3  199 x 3
Mer7_1-1_S150-160-226A_1_gene_exp_diff_filt_hc_log2.txt    81,152 data.frame      3  529 x 3
Mer7_1-1_W29_1_gene_exp_diff_filt_hc_log2.txt             129,376 data.frame      3  849 x 3
Mer7_1-1_W29_S150-226A_1_gene_exp_diff_filt_hc_log2.txt   126,816 data.frame      3  835 x 3
Mer7_1-1_W29_S160-162A_1_gene_exp_diff_filt_hc_log2.txt    82,792 data.frame      3  537 x 3
Mer7_1-1_W29_S226A_1_gene_exp_diff_filt_hc_log2.txt       115,008 data.frame      3  756 x 3
Mer7_1-2_160-226A_1_gene_exp_diff_filt_hc_log2.txt         79,936 data.frame      3  519 x 3
Mer7_1-2_S150-160-226A_1_gene_exp_diff_filt_hc_log2.txt    84,512 data.frame      3  548 x 3
Mer7_1-2_W29_1_gene_exp_diff_filt_hc_log2.txt             130,568 data.frame      3  857 x 3
Mer7_1-2_W29_S160-162A_1_gene_exp_diff_filt_hc_log2.txt    83,768 data.frame      3  542 x 3
Mer7_1-2_W29_S226A_1_gene_exp_diff_filt_hc_log2.txt       119,008 data.frame      3  783 x 3
Mer7_2-1_160-226A_2_gene_exp_diff_filt_hc_log2.txt        105,344 data.frame      3  685 x 3
Mer7_2-1_Mer7_2-2_gene_exp_diff_filt_hc_log2.txt           26,216 data.frame      3  166 x 3
Mer7_2-1_S150-160-226A_2_gene_exp_diff_filt_hc_log2.txt   106,368 data.frame      3  693 x 3
Mer7_2-1_W29_2_gene_exp_diff_filt_hc_log2.txt             160,200 data.frame      3 1053 x 3
Mer7_2-1_W29_S150-226A_2_gene_exp_diff_filt_hc_log2.txt   152,696 data.frame      3 1005 x 3
Mer7_2-1_W29_S160-162A_2_gene_exp_diff_filt_hc_log2.txt   113,992 data.frame      3  743 x 3
Mer7_2-1_W29_S226A_2_gene_exp_diff_filt_hc_log2.txt       138,944 data.frame      3  914 x 3
my.ls                                                      35,624   function      1
myfiles                                                     2,120  character     19
names                                                       2,424       list     19
test                                                          680  character      5
whatisthis                                                  2,424       list     19
**Total                                                 2,026,440        ---    ---     ---

What I need is to make the list of data frames for the dplyr command, dplyr::rbind_all(listOfDataFrames). Ideally, this would also be a specific subset of all the data frames, say the data frames with W29 in the name. This is something we, our lab, would be doing routinely and at various times of the day, so I want to automate the process so it does not need anyone to manually sit at the computer and type the list of data frames.

Matthew

On 8/13/2014 3:06 PM, jim holtman wrote:
> Here is a function that I use that might give you the results you want:
>
> =
>> my.ls()
>                        Size      Class Length       Dim
> .Random.seed          2,544    integer    626
> .remapHeaderFile     40,440 data.frame      2   373 x 2
> colID                   216  character      3
> delDate                 104  character      1
> deliv                15,752 data.table      7   164 x 7
> f_drawPallet         36,896   function      1
> i                        96  character      1
> indx                168,816  character   1782
> pallet              172,696 data.table      3  1782 x 3
> pallets             405,736 data.table     14 1782 x 14
> picks            26,572,856 data.table     19 154247 x 19
> wb                      656   Workbook      1
> wSplit           68,043,136       list   1782
> x                        56    numeric      2
> **Total          95,460,000        ---    ---       ---
>
>> my.ls
> function (pos = 1, sorted = FALSE, envir = as.environment(pos))
> {
>     .result
Re: [R] find the data frames in list of objects and make a list of them
Hi Richard,

Thank you very much for your reply and your code. Your code is doing just what I asked for, but does not seem to be what I need. I will need to review some basic R before I can continue.

I am trying to list data frames in order to bind them into one single data frame with something like dplyr::rbind_all(list of data frames), but when I try dplyr::rbind_all(lsDataFrame(ls())), I get the error: object at index 1 not a data.frame. So, I am going to have to learn some more about lists in R before proceeding.

Thank you for your help and code.

Matthew

On 8/13/2014 3:12 PM, Richard M. Heiberger wrote:
> I would do something like this
>
> lsDataFrame <- function(xx=ls()) xx[sapply(xx, function(x) is.data.frame(get(x)))]
>
> ls("package:datasets")
> lsDataFrame(ls("package:datasets"))
Re: [R] find the data frames in list of objects and make a list of them
Thank you very much, Bill! It has taken me a while to figure out, but yes, what I need is a list (the R object, list) of data frames and not a character vector containing the names of the data frames. This works well and is getting me in the direction I want to go.

Matthew

On 8/13/2014 7:40 PM, William Dunlap wrote:
> Previously you asked
>> A second question: is this the best way to make a list of data frames without having to manually type c(dataframe1, dataframe2, ...)?
>
> If you use 'c' there you will not get a list of data.frames - you will get a list of all the columns in the data.frames you supplied. Use 'list' instead of 'c' if you are taking that route.
>
> The *apply functions are helpful here. To make a list of all data.frames in an environment you can use the following function, which takes the environment to search as an argument.
>
> f <- function(envir = globalenv()) {
>     tmp <- eapply(envir,
>                   all.names=TRUE,
>                   FUN=function(obj) if (is.data.frame(obj)) obj else NULL)
>     # remove NULL's now
>     tmp[!vapply(tmp, is.null, TRUE)]
> }
>
> Use it as
>     allDataFrames <- f(globalenv())  # or just f()
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
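Putting the pieces of this thread together, here is a self-contained sketch of collecting data frames by name and row-binding them. The object names are invented, and base `do.call(rbind, ...)` stands in for `dplyr::rbind_all` so the example has no package dependency (in current dplyr, `bind_rows()` is the corresponding function):

```r
# Run at the top level: collect every data.frame in the current workspace
# into a named list, optionally subset by name, then row-bind them.
df_a     <- data.frame(x = 1:2, y = c("a", "b"))   # invented example objects
df_W29_b <- data.frame(x = 3:4, y = c("c", "d"))
not_a_df <- 1:10

dfs <- Filter(is.data.frame, mget(ls()))   # named list of only the data frames
w29 <- dfs[grepl("W29", names(dfs))]       # subset whose names contain "W29"
combined <- do.call(rbind, dfs)            # one data frame with all the rows
```

`do.call(rbind, ...)` requires the data frames to share column names; for ragged inputs, `dplyr::bind_rows()` fills missing columns with NA instead of failing.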
[R] change default installation of R
I have R version 2.15.0 installed in /usr/local/bin, and this is the default; in other words, when I type 'which R' this is the path I get. I have also installed R into /usr/local/R-3.1.1/. I used ./configure and then make to install this version. After make, I get the following error messages:

../unix/sys-std.o: In function `initialize_rlcompletion':
/usr/local/R-3.1.1/src/unix/sys-std.c:689: undefined reference to `rl_sort_completion_matches'
collect2: ld returned 1 exit status
make[3]: *** [R.bin] Error 1
make[3]: Leaving directory `/usr/local/R-3.1.1/src/main'
make[2]: *** [R] Error 2
make[2]: Leaving directory `/usr/local/R-3.1.1/src/main'
make[1]: *** [R] Error 1
make[1]: Leaving directory `/usr/local/R-3.1.1/src'
make: *** [R] Error 1

I want to make R-3.1.1 the default, so that when I type 'which R', I get /usr/local/R-3.1.1. To do this I first cd'd into /usr/local/bin and renamed R to R-old_10-30-14, then created a symlink with 'ln -s /usr/local/R-3.1.1/bin R', but when I type 'which R', I get 'no R in ...', where '...' is my PATH variable. If I remove the symlink and then create another one with 'ln -s /usr/local/R-3.1.1/bin/R R', then after typing 'which R', I get:

/usr/local/bin/R: line 259: /usr/local/R-3.1.1/bin/exec/R: No such file or directory
/usr/local/bin/R: line 259: exec: /usr/local/R-3.1.1/bin/exec/R: cannot execute: No such file or directory

This is the same message I get if I just type /usr/local/R-3.1.1/bin/R at the command line.

Matthew
[R] transpose and split dataframe
I have a data frame that is a lot bigger, but for simplicity's sake we can say it looks like this:

Regulator   hits
AT1G69490   AT4G31950,AT5G24110,AT1G26380,AT1G05675
AT2G55980   AT2G85403,AT4G89223

In other words:

data.frame: 2 obs. of 2 variables
 $ Regulator: Factor w/ 2 levels
 $ hits     : Factor w/ 6 levels

I want to transpose it so that Regulator supplies the column headings and each of the AGI numbers now separated by commas is a row. So, AT1G69490 is now the header of the first column, and AT4G31950 is row 1 of column 1, AT5G24110 is row 2 of column 1, etc.; AT2G55980 is the header of column 2, and AT2G85403 is row 1 of column 2, etc.

I have tried playing around with strsplit(TF2list[2:2]) and strsplit(as.character(TF2list[2:2])), but I am getting nowhere.

Matthew
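For the simplified two-row example above, one strsplit-based sketch (this anticipates the fuller solutions later in the thread) is:

```r
# Sketch using the simplified two-row example from the post; the data
# frame is rebuilt here so the snippet is self-contained.
TF2list <- data.frame(
  Regulator = c("AT1G69490", "AT2G55980"),
  hits = c("AT4G31950,AT5G24110,AT1G26380,AT1G05675",
           "AT2G85403,AT4G89223"),
  stringsAsFactors = TRUE)

# Split each comma-separated 'hits' string into a character vector ...
hitsplit <- strsplit(as.character(TF2list$hits), ",")
names(hitsplit) <- as.character(TF2list$Regulator)

# ... then pad the shorter vectors with NA so they can share a data frame.
n <- max(lengths(hitsplit))
out <- data.frame(lapply(hitsplit, function(x) c(x, rep(NA, n - length(x)))))
out
#   AT1G69490 AT2G55980
# 1 AT4G31950 AT2G85403
# 2 AT5G24110 AT4G89223
# 3 AT1G26380      <NA>
# 4 AT1G05675      <NA>
```

The key point is that `strsplit()` must be given a character vector, not a one-column data frame, which is why `TF2list[2:2]` on its own gets nowhere.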
[R] Fwd: Re: transpose and split dataframe
Thanks for your reply. I was trying to simplify it a little, but must have got it wrong. Here is the real dataframe, TF2list:

> str(TF2list)
'data.frame': 152 obs. of 2 variables:
 $ Regulator: Factor w/ 87 levels "AT1G02065","AT1G13960",..: 17 6 6 54 54 82 82 82 82 82 ...
 $ hits     : Factor w/ 97 levels "AT1G05675,AT3G12910,AT1G22810,AT1G14540,AT1G21120,AT1G07160,AT5G22520,AT1G56250,AT2G31345,AT5G22530,AT4G11170,A"| __truncated__,..: 65 57 90 57 87 57 56 91 31 17 ...

And the first few lines resulting from dput(head(TF2list)):

> dput(head(TF2list))
structure(list(Regulator = structure(c(17L, 6L, 6L, 54L, 54L, 82L), .Label = c("AT1G02065", "AT1G13960", "AT1G18860", "AT1G23380", "AT1G29280", "AT1G29860", "AT1G30650", "AT1G55600", "AT1G62300", "AT1G62990", "AT1G64000", "AT1G66550", "AT1G66560", "AT1G66600", "AT1G68150", "AT1G69310", "AT1G69490", "AT1G69810", "AT1G70510", ...

This is another way of looking at the first 4 entries (Regulator is tab-separated from hits):

Regulator hits
1 AT1G69490 AT4G31950,AT5G24110,AT1G26380,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G79680,AT3G02840,AT5G25260,AT5G57220,AT2G37430,AT2G26560,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT5G05300,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT5G52760,AT5G66020,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT2G02010,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT2G40180,AT1G59865,AT4G35180,AT4G15417,AT1G51820,AT1G06135,AT1G36622,AT5G42830
2 AT1G29860
AT4G31950,AT5G24110,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G14540,AT1G79680,AT1G07160,AT3G23250,AT5G25260,AT1G53625,AT5G57220,AT2G37430,AT3G54150,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT4G14450,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT4G08555,AT5G66020,AT5G26920,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT4G35180,AT4G15417,AT1G51820,AT4G40020,AT1G06135 3 AT1G2986 AT5G64905,AT1G21120,AT1G07160,AT5G25260,AT1G53625,AT1G56250,AT2G31345,AT4G11170,AT1G66090,AT1G26410,AT3G55840,AT1G69930,AT4G03460,AT5G25250,AT5G36925,AT1G26420,AT5G42380,AT1G16150,AT2G22880,AT1G02930,AT4G11890,AT1G72520,AT5G66020,AT2G43620,AT2G44370,AT4G15975,AT1G35210,AT5G46295,AT1G11925,AT2G39200,AT1G02920,AT4G14370,AT4G35180,AT4G15417,AT2G18690,AT5G11140,AT1G06135,AT5G42830 So, the goal would be to first: Transpose the existing dataframe so that the factor Regulator becomes a column name (column 1 name = AT1G69490, column2 name AT1G29860, etc.) and the hits associated with each Regulator become rows. Hits is a comma separated 'list' ( I do not not know if technically it is an R list.), so it would have to be comma 'unseparated' with each entry becoming a row (col 1 row 1 = AT4G31950, col 1 row 2 - AT5G24410, etc); like this : AT1G69490 AT4G31950 AT5G24110 AT1G05675 AT5G64905 ... 
(I did not include all the rows.)

I think it would be best to actually make the first entry a separate dataframe (one column with name = AT1G69490 and a number of rows depending on the number of hits), then make the second column (column name = AT1G29860, and a number of rows depending on the number of hits) into a new dataframe and do a full join of the two dataframes; continue by making the third column (column name = AT1G2986) into a dataframe and full join it with the previous; and continue for the 152 observations, so that the end result is a dataframe with 152 columns and a number of rows depending on the entry with the greatest number of hits. The full joins I can do with dplyr, but getting up to that point seems rather difficult.

This would get me to my ultimate goal: each Regulator is a column name (152 columns) and a given row has either NA or the same hit. This seems very difficult to me, but I appreciate any attempt.

Matthew

On 4/30/2019 4:34 PM, David L Carlson wrote:
> External Email - Use Caution
>
> I think we need more information. Can you give us the structure of the data with str(YourDataFrame)? Alternatively you could copy a small piece into your email message by copying and pasting the results of the following code:
>
> dput(head(YourDataFrame))
>
> The data frame you present could not be a data frame since you say "hits" is a factor with a variable number of elements. If each value of "hits" was a single character string, it would only have 2 factor levels, not 6, and your efforts to parse the string would make more sense. Transposing to a data frame would only be possi
Re: [R] Fwd: Re: transpose and split dataframe
Thank you very much, David and Jim, for your work and solutions. I have been working through both of them to better learn R. They both proceed through a similar logic, except that David's starts with a character matrix and Jim's with a dataframe, and both end with equivalent dataframes (identical(tmmdf, TF2list2) returns TRUE). They have both been very helpful.

However, there is one attribute of my intended final dataframe that is missing. Looking at part of the final dataframe:

head(tmmdf)
  AT1G69490 AT1G29860 AT1G29860.1 AT4G18170 AT4G18170.1 AT5G46350
1 AT4G31950 AT4G31950   AT5G64905 AT4G31950   AT5G64905 AT4G31950
2 AT5G24110 AT5G24110   AT1G21120 AT5G24110   AT1G14540 AT5G24110
3 AT1G26380 AT1G05675   AT1G07160 AT1G05675   AT1G21120 AT1G05675

Row 1 has AT4G31950 in columns 1, 2, 4 and 6, but in columns 3 and 5 AT4G31950 turns up in a different row. What I was aiming at is that each row would have a unique entry, so that AT4G31950 is in row 1 of columns 1, 2, 4 and 6 and NA is in row 1 of columns 3 and 5, while AT4G31950 is in row 2 of columns 3 and 5 and NA is in row 2 of columns 1, 2, 4 and 6. So, it would look like this:

head(intended_df)
  AT1G69490 AT1G29860 AT1G29860.1 AT4G18170 AT4G18170.1 AT5G46350
1 AT4G31950 AT4G31950        NA   AT4G31950        NA   AT4G31950
2      NA        NA    AT4G31950        NA   AT4G31950        NA

I have been trying to adjust the code to get my intended result, basically by trying to build a dataframe one column at a time from each entry in the character matrix, but have not got anything near working yet.

Matthew

On 4/30/2019 6:29 PM, David L Carlson wrote:
> If you read the data frame with read.csv() or one of the other read() functions, use the as.is=TRUE argument to prevent conversion to factors. If not, do the conversion first:
>
> # Convert factors to characters
> DataMatrix <- sapply(TF2list, as.character)
> # Split the vector of hits
> DataList <- sapply(DataMatrix[, 2], strsplit, split=",")
> # Use the values in Regulator to name the parts of the list
> names(DataList) <- DataMatrix[,"Regulator"]
>
> # Now create a data frame
> # How long is the longest list of hits?
> mx <- max(sapply(DataList, length))
> # Now add NAs to vectors shorter than mx
> DataList2 <- lapply(DataList, function(x) c(x, rep(NA, mx-length(x))))
> # Finally convert back to a data frame
> TF2list2 <- do.call(data.frame, DataList2)
>
> Try this on a portion of the list, say 25 lines, and print each object to see what is happening.
>
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77843-4352
Re: [R] Fwd: Re: transpose and split dataframe
Thank you very much, Jim and David, for your scripts and accompanying explanations.

I was intrigued at the results that came from David's script. As seen below, where I have taken a small piece of his DataTable:

          AT1G69490 AT1G29860 AT4G18170 AT5G46350
AT1G01560         0         0         0         1
AT1G02920         1         2         2         4
AT1G02930         1         2         2         4
AT1G05675         1         1         1         2

there are numbers other than 1 or 0, which was not what I was expecting. The data I am working with come from downloading results of an analysis done at a particular web site. I looked at Jim's solution, and the equivalent of the above would be:

          AT1G69490 AT1G29860 AT1G29860 AT4G18170 AT4G18170 AT5G46350 AT5G46350 AT5G46350 AT5G46350 AT5G46350
AT1G01560        NA        NA        NA        NA        NA        NA        NA        NA AT1G01560        NA
AT1G02920 AT1G02920 AT1G02920 AT1G02920 AT1G02920 AT1G02920 AT1G02920 AT1G02920 AT1G02920 AT1G02920        NA
AT1G02930 AT1G02930 AT1G02930 AT1G02930 AT1G02930 AT1G02930 AT1G02930 AT1G02930 AT1G02930 AT1G02930        NA
AT1G05675 AT1G05675 AT1G05675        NA AT1G05675        NA AT1G05675 AT1G05675        NA        NA        NA

The above is the format that I was desiring, but I was not expecting that a single ATG number would be the name of multiple columns. As shown above, AT1G29860 is the name of two columns and AT5G46350 is the name of 5 columns. When a single ATG number, such as AT5G46350, names multiple columns, the contents of each of those columns may or may not be the same.

For example, going across a single row looking at AT1G02920: it occurs in the first column, hence the 1 in David's DataTable. It occurs in both AT1G29860 columns, hence the 2 in the DataTable. It again occurs in both AT4G18170 columns, so another 2 in the DataTable, and finally it occurs in only 4 of the 5 AT5G46350 columns, so the 4 in the DataTable. When the same ATG number names multiple columns, it is because different methods were used to determine the content of each column.

So, if an ATG number such as AT1G05675 occurs in all columns with the same name, I then know that this association was shown by multiple methods, and if it only occurs in some of the columns, I know that not all methods associated it with the column-name ATG. David's result complements Jim's, and both end up being very helpful to me.

Thanks again to both of you for your time and help.

Matthew

On 5/2/2019 8:40 PM, Jim Lemon wrote:
> External Email - Use Caution
>
> Hi again,
> Just noticed that the NA fill in the original solution is unnecessary, thus:
>
> # split the second column at the commas
> hitsplit<-strsplit(mmdf$hits,",")
> # get all the sorted hits
> allhits<-sort(unique(unlist(hitsplit)))
> tmmdf<-as.data.frame(matrix(NA,ncol=length(hitsplit),nrow=length(allhits)))
> # change the names of the list
> names(tmmdf)<-mmdf$Regulator
> for(column in 1:length(hitsplit)) {
>   hitmatches<-match(hitsplit[[column]],allhits)
>   hitmatches<-hitmatches[!is.na(hitmatches)]
>   tmmdf[hitmatches,column]<-allhits[hitmatches]
> }
>
> Jim
>
> On Fri, May 3, 2019 at 10:32 AM Jim Lemon wrote:
>> Hi Matthew,
>> I'm not sure whether you want something like your initial request or David's solution.
>> The result of this can be transformed into the latter:
>>
>> mmdf<-read.table(text="Regulator hits
>> AT1G69490 AT4G31950,AT5G24110,AT1G26380,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G79680,AT3G02840,AT5G25260,AT5G57220,AT2G37430,AT2G26560,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT5G05300,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT5G52760,AT5G66020,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT2G02010,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT2G40180,AT1G59865,AT4G35180,AT4G15417,AT1G51820,AT1G06135,AT1G36622,AT5G42830
>> AT1G29860 AT4G31950,AT5G24110,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G14540,AT1G79680,AT1G07160,AT3G23250,AT5G25260,AT1G53625,AT5G57220,AT2G37430,AT3G54150,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT4G14450,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT4G08555,AT5G66020,AT5G26920,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT4G35180,AT4G15417,AT1G51820,AT4G40020,AT1G06135
>> AT1G2986 AT5G64905,AT1G21120,AT1G07160,AT5G25260,AT1G53625,AT1G56250,AT2G31345,AT4G11170,AT1G66090,AT1G26410,AT3G55840,AT1G69930,AT4G03460,AT5G25250,AT5G36925,AT1G26420,AT5G42380,AT1G16150,AT2G22880,AT1G02930,AT4G
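Jim's row-alignment idea can be reduced to a self-contained sketch on toy data (the split lists below are made up, standing in for the result of `strsplit(mmdf$hits, ",")` on the real data):

```r
# Toy version of the approach: each distinct hit gets its own fixed row,
# so the same identifier always lands in the same row of every column.
hitsplit <- list(AT1G69490 = c("AT4G31950", "AT5G24110"),
                 AT1G29860 = c("AT5G24110", "AT1G05675"))

allhits <- sort(unique(unlist(hitsplit)))   # one row per distinct hit
tmmdf <- as.data.frame(matrix(NA_character_,
                              nrow = length(allhits),
                              ncol = length(hitsplit)))
names(tmmdf) <- names(hitsplit)
rownames(tmmdf) <- allhits

for (column in seq_along(hitsplit)) {
  hitmatches <- match(hitsplit[[column]], allhits)
  tmmdf[hitmatches, column] <- allhits[hitmatches]
}
tmmdf
#           AT1G69490 AT1G29860
# AT1G05675      <NA> AT1G05675
# AT4G31950 AT4G31950      <NA>
# AT5G24110 AT5G24110 AT5G24110
```

Because `match()` indexes into the fixed, sorted `allhits` vector, a hit shared by two regulators appears in the same row of both columns, which is the "unique entry per row" property Matthew asked for.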
Re: [R] Contour lines in a persp plot
Thanks a lot, that is all I want. If someone is interested, see the code below.

library(lattice)  # added: wireframe(), trellis.par.get(), etc. come from lattice

panel.3d.contour <- function(x, y, z, rot.mat, distance, nlevels = 20,
                             zlim.scaled, ...)
# the three dots pass along the remaining parameters, which keep their defaults
{
    add.line <- trellis.par.get("add.line")
    panel.3dwire(x, y, z, rot.mat, distance, zlim.scaled = zlim.scaled, ...)
    clines <- contourLines(x, y, matrix(z, nrow = length(x), byrow = TRUE),
                           nlevels = nlevels)
    for (ll in clines) {
        m <- ltransform3dto3d(rbind(ll$x, ll$y, zlim.scaled[2]), rot.mat,
                              distance)
        panel.lines(m[1,], m[2,], col = add.line$col, lty = add.line$lty,
                    lwd = add.line$lwd)
    }
}

fn <- function(x,y){sin(x)+2*y}  # this looks like a corrugated tin roof
x <- seq(from=1,to=100,by=2)     # generates a list of x values to sample
y <- seq(from=1,to=100,by=2)     # generates a list of y values to sample
z <- outer(x,y,FUN=fn)           # applies the funct. across the combos of x and y

wireframe(z, zlim = c(1, 300), nlevels = 10, aspect = c(1, 0.5),
          panel.aspect = 0.6, panel.3d.wireframe = panel.3d.contour,
          shade = FALSE, screen = list(z = 20, x = -60))

--
View this message in context: http://r.789695.n4.nabble.com/Contour-lines-in-a-persp-plot-tp4667220p4667309.html
Sent from the R help mailing list archive at Nabble.com.
[R] Save intermediate result in a same file
Hello everybody, I have to save a 100-iteration computation to a file every 5 iterations until the end. I first create a vector A of 100 elements for the 100 iterations, and I want to update A in the file every 5 iterations. I use "save" but it doesn't work. Does someone have an idea? I need help. Cheers. -- View this message in context: http://r.789695.n4.nabble.com/Save-intermediate-result-in-a-same-file-tp4677350.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
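One way to do this with save() — a minimal sketch, assuming the results accumulate in a pre-allocated vector A and the checkpoint file is simply overwritten each time (the file name is illustrative). A common reason save() "doesn't work" is calling it without an explicit file= argument:

```r
# Pre-allocate the results vector for the 100 iterations
A <- numeric(100)

for (i in 1:100) {
  A[i] <- i^2  # placeholder for the real computation

  # Every 5th iteration, rewrite the checkpoint file with the current A;
  # save() needs the object name(s) and an explicit file= argument
  if (i %% 5 == 0) {
    save(A, file = "checkpoint.RData")
  }
}

# Later (or after a crash), restore A with:
# load("checkpoint.RData")
```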
[R] fwrite() not found in data.table package
Hi all, I used to use the fwrite() function in data.table but I cannot get it to work now. The function is not in the data.table package, even though a help page exists for it. My session info is below. Any ideas on how to get fwrite() to work would be much appreciated. Thanks!

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.3 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8
 [8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.10.5

loaded via a namespace (and not attached):
[1] tools_3.2.0 chron_2.3-47 tcltk_3.2.0

-- Matthew C Keller Asst. Professor of Psychology University of Colorado at Boulder www.matthewckeller.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fwrite() not found in data.table package
Thanks Jeff! It turns out that my problem was that I tried to install the newest data.table package while the old data.table package was loaded in R. Full instructions for installing data.table are here: https://github.com/Rdatatable/data.table/wiki/Installation On Mon, Oct 2, 2017 at 10:55 AM, Jeff Newmiller wrote: > You are asking about (a) a contributed package (b) for a package version > that is not in CRAN and (c) an R version that is outdated, which stretches > the definition of "on topic" here. Since that function does not appear to > have been removed from that package (I am not installing a development > version to test if it is broken for your benefit), I will throw out a guess > that if you update R to 3.4.1 or 3.4.2 then things might start working. If > not, I suggest you use the CRAN version of the package and create a > reproducible example (check it with package reprex) and try again here, or > ask one of the maintainers of that package. > -- > Sent from my phone. Please excuse my brevity. > > On October 2, 2017 8:56:46 AM PDT, Matthew Keller > wrote: > >Hi all, > > > >I used to use fwrite() function in data.table but I cannot get it to > >work > >now. The function is not in the data.table package, even though a help > >page > >exists for it. My session info is below. Any ideas on how to get > >fwrite() > >to work would be much appreciated. Thanks! 
> > > >> sessionInfo() > >R version 3.2.0 (2015-04-16) > >Platform: x86_64-unknown-linux-gnu (64-bit) > >Running under: Red Hat Enterprise Linux Server release 6.3 (Santiago) > > > >locale: > > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > >LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 > >LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 > >LC_PAPER=en_US.UTF-8 > > [8] LC_NAME=C LC_ADDRESS=C > >LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 > >LC_IDENTIFICATION=C > > > >attached base packages: > >[1] stats graphics grDevices utils datasets methods base > > > >other attached packages: > >[1] data.table_1.10.5 > > > >loaded via a namespace (and not attached): > >[1] tools_3.2.0 chron_2.3-47 tcltk_3.2.0 > -- Matthew C Keller Asst. Professor of Psychology University of Colorado at Boulder www.matthewckeller.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
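A quick diagnostic, after restarting R so that no stale copy of data.table is still loaded: check the installed version and whether it actually exports fwrite() (fwrite() first appeared in the 1.9.8 development series, so any recent CRAN release provides it):

```r
library(data.table)
packageVersion("data.table")
"fwrite" %in% getNamespaceExports("data.table")  # TRUE when the installed build has it
```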
Re: [R] tcltk2 entry box
Wow ! Very nice. Thank you very much, John. This is very helpful and just what I need. Yes, I can see that I should have paid attention to tcltk before going to tcltk2. Matthew On 7/8/2015 8:37 PM, John Fox wrote: Dear Matthew, For file selection, see ?tcltk::tk_choose.files or ?tcltk::tkgetOpenFile . You could enter a number in a tk entry widget, but, depending upon the nature of the number, a slider or other widget might be a better choice. For a variety of helpful tcltk examples see <http://www.sciviews.org/_rgui/tcltk/>, originally by James Wettenhall but now maintained by Philippe Grosjean (the author of the tcltk2 package). (You probably don't need tcltk2 for the simple operations that you mention, but see ?tk2spinbox for an alternative to a slider.) Best, John --- John Fox, Professor McMaster University Hamilton, Ontario, Canada http://socserv.socsci.mcmaster.ca/jfox/ -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Matthew Sent: July-08-15 8:01 PM To: r-help Subject: [R] tcltk2 entry box Is anyone familiar enough with the tcltk2 package to know if it is possible to have an entry box where a user can enter information (such as a path to a file or a number) and then be able to use the entered information downstream in a R script ? The idea is for someone unfamiliar with R to just start an R script that would take care of all the commands for them so all they have to do is get the script started. However, there is always a couple of pieces of information that will change each time the script is used (for example, a different file will be processed by the script). So, I would like a way for the user to input that information as the script ran. Matthew McCormack __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. 
--- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
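Following John Fox's pointers, a minimal sketch of collecting user input at the top of a script — tk_choose.files() for a file path plus a simple entry widget for a number (the widget layout and variable names are illustrative):

```r
library(tcltk)

# Native file-chooser dialog; returns the selected path(s) as a character vector
path <- tk_choose.files(caption = "Select the file to process")

# A small dialog with an entry box for a numeric parameter
tt <- tktoplevel()
num <- tclVar("10")  # default value shown in the box
tkgrid(tklabel(tt, text = "Threshold:"), tkentry(tt, textvariable = num))
tkgrid(tkbutton(tt, text = "OK", command = function() tkdestroy(tt)))
tkwait.window(tt)    # block until the user clicks OK

threshold <- as.numeric(tclvalue(num))  # usable downstream in the script
dat <- read.csv(path)                   # e.g. process the chosen file
```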
[R] setting up R -- VM Fusion, WIndows7
Hi, As i need R to speak to Bloomberg (and big only runs on windows), i'm running windows 7 via VM Fusion on my mac. I think i am having permission problems, as i cannot use install.packages, and cannot change .libPaths via either a .Rprofile, or Profile.site. I've posted more detail in this super-user question -- http://superuser.com/questions/948083/how-to-set-environment-variables-in-vm-fusion-windows-7 Throwing it over to this list as well, as I've spent about half the time i had allowed for my project on (not getting) set up. I realise this is a very niche problem - hoping that someone else has had a similar problem, and can offer pointers. best mj [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
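For the library-path half of the problem, one hedged workaround is to point .libPaths() at a directory the Windows account can definitely write to, from a startup file such as ~/.Rprofile (the path below is hypothetical):

```r
# In ~/.Rprofile (or Rprofile.site, if etc/ is writable):
userlib <- "C:/Users/yourname/Documents/R/library"  # hypothetical writable location
if (!dir.exists(userlib)) dir.create(userlib, recursive = TRUE)
.libPaths(c(userlib, .libPaths()))

# install.packages() then defaults to the first (writable) library, or force it:
# install.packages("zoo", lib = userlib)
```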
[R] simple question - mean of a row of a data.frame
Hi all, Simple question I should know: I'm unclear on the logic of why the sum of a row of a data.frame returns a valid sum but the mean of a row of a data.frame returns NA: sum(rock[2,]) [1] 10901.05 mean(rock[2,],trim=0) [1] NA Warning message: In mean.default(rock[2, ], trim = 0) : argument is not numeric or logical: returning NA I get that rock[2,] is itself a data.frame of mode list, but why the inconsistency between functions? How can you figure this out from, e.g., ?mean ?sum Thanks in advance, Matt -- Matthew C Keller Asst. Professor of Psychology University of Colorado at Boulder www.matthewckeller.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
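The asymmetry comes from method dispatch (a sketch of the usual explanation): sum() belongs to the Summary group generic, for which a data.frame method exists, while mean() falls through to mean.default(), which insists on a single numeric vector. Two ways around it:

```r
mean(unlist(rock[2, ]))  # flatten the one-row data frame into a numeric vector
rowMeans(rock[2, ])      # or use the dedicated row-wise helper
```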
[R] Defining Variables from a Matrix for 10-Fold Cross Validation
Good afternoon, I am trying to run a 10-fold CV, using a matrix as my data set. Essentially, I want "y" to be the first column of the matrix, and my "x" to be all remaining columns (2-257). I've posted some of the code I used below, and the data set (called "zip.train") is in the "ElemStatLearn" package. The error message is highlighted in red, and the corresponding section of code is bolded. (I am not concerned with the warning message, just the error message). The issue I am experiencing is the error message below the code: I haven't come across that specific message before, and am not exactly sure how to interpret its meaning. What exactly is this error message trying to tell me? Any suggestions or insights are appreciated! Thank you all, Matthew Campbell > library (ElemStatLearn) > library(kknn) > data(zip.train) > train=zip.train[which(zip.train[,1] %in% c(2,3)),] > test=zip.test[which(zip.test[,1] %in% c(2,3)),] > nfold = 10 > infold = sample(rep(1:10, length.out = (x))) Warning message: In rep(1:10, length.out = (x)) : first element used of 'length.out' argument > *> mydata = data.frame(x = train[ , c(2,257)] , y = train[ , 1])* > > K = 20 > errorMatrix = matrix(NA, K, 10) > > for (l in nfold) + { + for (k in 1:20) + { + knn.fit = kknn(y ~ x, train = mydata[infold != l, ], test = mydata[infold == l, ], k = k) + errorMatrix[k, l] = mean((knn.fit$fitted.values - mydata$y[infold == l])^2) + } + } Error in model.frame.default(formula, data = train) : variable lengths differ (found for 'x') [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
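Three slips in the posted code seem to line up with the error: train[, c(2,257)] keeps only two predictor columns (2:257 keeps them all); length.out = (x) refers to a leftover object x in the workspace rather than the number of rows; and the formula y ~ x then finds that stray x (of a different length) instead of the data-frame columns, which is exactly what "variable lengths differ (found for 'x')" reports. A corrected sketch (my reconstruction, not tested against the original data):

```r
library(ElemStatLearn)  # provides zip.train / zip.test
library(kknn)

data(zip.train)
train <- zip.train[zip.train[, 1] %in% c(2, 3), ]

nfold <- 10
# length.out must be the number of observations, not a stray object 'x'
infold <- sample(rep(1:nfold, length.out = nrow(train)))

# 2:257 takes all predictor columns; c(2, 257) takes only two of them
mydata <- data.frame(x = train[, 2:257], y = train[, 1])

K <- 20
errorMatrix <- matrix(NA, K, nfold)

for (l in 1:nfold) {  # 'for (l in nfold)' would run the outer loop only once
  for (k in 1:K) {
    fit <- kknn(y ~ ., train = mydata[infold != l, ],  # y ~ . uses every x column
                test = mydata[infold == l, ], k = k)
    errorMatrix[k, l] <- mean((fit$fitted.values - mydata$y[infold == l])^2)
  }
}
```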
[R] Define pch and color based on two different columns
I am making a lattice plot and I would like to use the value in one column to define the pch and another column to define color of points. Something like: xyplot(mpg ~ wt | cyl, data=mtcars, col = gear, pch = carb ) There are unique pch points in the second and third panels, but these points are only unique within the plots, not among all the plots (as they should be). You can see this if you use the following code: xyplot(mpg ~ wt | cyl, data=mtcars, groups = carb ) This plot looks great for one group, but if you try to invoke two groups using c(gear, carb) I think it simply takes unique combinations of those two variables and plots them as unique colors. Another solution given by a StackExchange user: mypch <- 1:6 mycol <- 1:3 xyplot(mpg ~ wt | cyl, panel = function(x, y, ..., groups, subscripts) { pch <- mypch[factor(carb[subscripts])] col <- mycol[factor(gear[subscripts])] grp <- c(gear,carb) panel.xyplot(x, y, pch = pch, col = col) } ) This solution has the same problems as the code at the top. I think the issue causing problems with both solutions is that not every value for each group is present in each panel, and they are almost never in the same order. I think R is just interpreting the appearance of unique values as a signal to change to the next pch or color. My actual data file is very large, and it's not possible to sort my way out of this mess. It would be best if I could just use the value in two columns to actually define a color or pch for each point on an entire plot. Is there a way to do this? Ps, I had to post this via email because the Nabble site kept sending me an error message: "Message rejected by filter rule match" Thanks, Matt *Matthew R. Snyder* *~* PhD Candidate University Fellow University of Toledo Computational biologist, ecologist, and bioinformatician Sponsored Guest Researcher at NOAA PMEL, Seattle, WA. 
matthew.snyd...@rockets.utoledo.edu msnyder...@gmail.com [image: Mailtrack] <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&;> Sender notified by Mailtrack <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&;> 04/09/19, 1:49:27 PM [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Define pch and color based on two different columns
Thanks, Jim. I appreciate your contributed answer, but neither of those make the desired plot either. I'm actually kind of shocked this isn't an easier more straightforward thing. It seems like this would be something that a user would want to do frequently. I can actually do this for single plots in ggplot. Maybe I should contact the authors of lattice and see if this is something they can help me with or if they would like to add this as a feature in the future... Matt *Matthew R. Snyder* *~* PhD Candidate University Fellow University of Toledo Computational biologist, ecologist, and bioinformatician Sponsored Guest Researcher at NOAA PMEL, Seattle, WA. matthew.snyd...@rockets.utoledo.edu msnyder...@gmail.com [image: Mailtrack] <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&;> Sender notified by Mailtrack <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&;> 04/09/19, 7:52:27 PM On Tue, Apr 9, 2019 at 4:53 PM Jim Lemon wrote: > Hi Matthew, > How about this? > > library(lattice) > xyplot(mpg ~ wt | cyl, >data=mtcars, >col = mtcars$gear, >pch = mtcars$carb > ) > library(plotrix) > grange<-range(mtcars$gear) > xyplot(mpg ~ wt | cyl, >data=mtcars, >col = > color.scale(mtcars$gear,extremes=c("blue","red"),xrange=grange), >pch = as.character(mtcars$carb) > ) > > Jim > > On Wed, Apr 10, 2019 at 7:43 AM Matthew Snyder > wrote: > > > > I am making a lattice plot and I would like to use the value in one > column > > to define the pch and another column to define color of points. Something > > like: > > > > xyplot(mpg ~ wt | cyl, > >data=mtcars, > >col = gear, > >pch = carb > > ) > > > > There are unique pch points in the second and third panels, but these > > points are only unique within the plots, not among all the plots (as they > > should be). 
You can see this if you use the following code: > > > > xyplot(mpg ~ wt | cyl, > >data=mtcars, > >groups = carb > > ) > > > > This plot looks great for one group, but if you try to invoke two groups > > using c(gear, carb) I think it simply takes unique combinations of those > > two variables and plots them as unique colors. > > > > Another solution given by a StackExchange user: > > > > mypch <- 1:6 > > mycol <- 1:3 > > > > xyplot(mpg ~ wt | cyl, > > panel = function(x, y, ..., groups, subscripts) { > > pch <- mypch[factor(carb[subscripts])] > > col <- mycol[factor(gear[subscripts])] > > grp <- c(gear,carb) > > panel.xyplot(x, y, pch = pch, col = col) > > } > > ) > > > > This solution has the same problems as the code at the top. I think the > > issue causing problems with both solutions is that not every value for > each > > group is present in each panel, and they are almost never in the same > > order. I think R is just interpreting the appearance of unique values as > a > > signal to change to the next pch or color. My actual data file is very > > large, and it's not possible to sort my way out of this mess. It would be > > best if I could just use the value in two columns to actually define a > > color or pch for each point on an entire plot. Is there a way to do this? > > > > Ps, I had to post this via email because the Nabble site kept sending me > an > > error message: "Message rejected by filter rule match" > > > > Thanks, > > Matt > > > > > > > > *Matthew R. Snyder* > > *~* > > PhD Candidate > > University Fellow > > University of Toledo > > Computational biologist, ecologist, and bioinformatician > > Sponsored Guest Researcher at NOAA PMEL, Seattle, WA. 
> > matthew.snyd...@rockets.utoledo.edu > > msnyder...@gmail.com > > > > > > > > [image: Mailtrack] > > < > https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&; > > > > Sender > > notified by > > Mailtrack > > < > https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&; > > > > 04/09/19, > > 1:49:27 PM > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Define pch and color based on two different columns
I want to have one column in a dataframe define the color and another define the pch. This can be done easily with a single panel: xyplot(mpg ~ wt, data=mtcars, col = mtcars$gear, pch = mtcars$carb ) This produces the expected result: two pch that are the same color are unique in the whole plot. But when you add cyl as a factor. Those two points are only unique within their respective panels, and not across the whole plot. Matt *Matthew R. Snyder* *~* PhD Candidate University Fellow University of Toledo Computational biologist, ecologist, and bioinformatician Sponsored Guest Researcher at NOAA PMEL, Seattle, WA. matthew.snyd...@rockets.utoledo.edu msnyder...@gmail.com [image: Mailtrack] <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&;> Sender notified by Mailtrack <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&;> 04/09/19, 9:26:09 PM On Tue, Apr 9, 2019 at 9:23 PM Bert Gunter wrote: > 1. I am quite sure that whatever it is that you want to do can be done. > Probably straightforwardly. The various R graphics systems are mature and > extensive. > > 2. But I, for one, do not understand from your post what it is that you > want to do. Nor does anyone else apparently. > > Cheers, > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along and > sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Tue, Apr 9, 2019 at 8:10 PM Matthew Snyder > wrote: > >> Thanks, Jim. >> >> I appreciate your contributed answer, but neither of those make the >> desired >> plot either. I'm actually kind of shocked this isn't an easier more >> straightforward thing. It seems like this would be something that a user >> would want to do frequently. I can actually do this for single plots in >> ggplot. 
Maybe I should contact the authors of lattice and see if this is >> something they can help me with or if they would like to add this as a >> feature in the future... >> >> Matt >> >> >> >> *Matthew R. Snyder* >> *~* >> PhD Candidate >> University Fellow >> University of Toledo >> Computational biologist, ecologist, and bioinformatician >> Sponsored Guest Researcher at NOAA PMEL, Seattle, WA. >> matthew.snyd...@rockets.utoledo.edu >> msnyder...@gmail.com >> >> >> >> [image: Mailtrack] >> < >> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&; >> > >> Sender >> notified by >> Mailtrack >> < >> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&; >> > >> 04/09/19, >> 7:52:27 PM >> >> On Tue, Apr 9, 2019 at 4:53 PM Jim Lemon wrote: >> >> > Hi Matthew, >> > How about this? >> > >> > library(lattice) >> > xyplot(mpg ~ wt | cyl, >> >data=mtcars, >> >col = mtcars$gear, >> >pch = mtcars$carb >> > ) >> > library(plotrix) >> > grange<-range(mtcars$gear) >> > xyplot(mpg ~ wt | cyl, >> >data=mtcars, >> >col = >> > color.scale(mtcars$gear,extremes=c("blue","red"),xrange=grange), >> >pch = as.character(mtcars$carb) >> > ) >> > >> > Jim >> > >> > On Wed, Apr 10, 2019 at 7:43 AM Matthew Snyder >> > wrote: >> > > >> > > I am making a lattice plot and I would like to use the value in one >> > column >> > > to define the pch and another column to define color of points. >> Something >> > > like: >> > > >> > > xyplot(mpg ~ wt | cyl, >> > >data=mtcars, >> > >col = gear, >> > >pch = carb >> > > ) >> > > >> > > There are unique pch points in the second and third panels, but these >> > > points are only unique within the plots, not among all the plots (as >> they >> > > should be). 
You can see this if you use the following code: >> > > >> > > xyplot(mpg ~ wt | cyl, >> > >data=mtcars, >> > >groups = carb >> > > ) >> > > >> > > This plot looks great for one group, but if you try to invoke two >> groups >> > > using c(gear, carb) I think it simply tak
Re: [R] Define pch and color based on two different columns
I tried this too: xyplot(mpg ~ wt | cyl, data=mtcars, # groups = carb, subscripts = TRUE, col = as.factor(mtcars$gear), pch = as.factor(mtcars$carb) ) Same problem... *Matthew R. Snyder* *~* PhD Candidate University Fellow University of Toledo Computational biologist, ecologist, and bioinformatician Sponsored Guest Researcher at NOAA PMEL, Seattle, WA. matthew.snyd...@rockets.utoledo.edu msnyder...@gmail.com [image: Mailtrack] <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&;> Sender notified by Mailtrack <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&;> 04/09/19, 9:28:11 PM On Tue, Apr 9, 2019 at 8:18 PM Jeff Newmiller wrote: > Maybe you should use factors rather than character columns. > > On April 9, 2019 8:09:43 PM PDT, Matthew Snyder > wrote: > >Thanks, Jim. > > > >I appreciate your contributed answer, but neither of those make the > >desired > >plot either. I'm actually kind of shocked this isn't an easier more > >straightforward thing. It seems like this would be something that a > >user > >would want to do frequently. I can actually do this for single plots in > >ggplot. Maybe I should contact the authors of lattice and see if this > >is > >something they can help me with or if they would like to add this as a > >feature in the future... > > > >Matt > > > > > > > >*Matthew R. Snyder* > >*~* > >PhD Candidate > >University Fellow > >University of Toledo > >Computational biologist, ecologist, and bioinformatician > >Sponsored Guest Researcher at NOAA PMEL, Seattle, WA. 
> >matthew.snyd...@rockets.utoledo.edu > >msnyder...@gmail.com > > > > > > > >[image: Mailtrack] > >< > https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&; > > > >Sender > >notified by > >Mailtrack > >< > https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&; > > > >04/09/19, > >7:52:27 PM > > > >On Tue, Apr 9, 2019 at 4:53 PM Jim Lemon wrote: > > > >> Hi Matthew, > >> How about this? > >> > >> library(lattice) > >> xyplot(mpg ~ wt | cyl, > >>data=mtcars, > >>col = mtcars$gear, > >>pch = mtcars$carb > >> ) > >> library(plotrix) > >> grange<-range(mtcars$gear) > >> xyplot(mpg ~ wt | cyl, > >>data=mtcars, > >>col = > >> color.scale(mtcars$gear,extremes=c("blue","red"),xrange=grange), > >>pch = as.character(mtcars$carb) > >> ) > >> > >> Jim > >> > >> On Wed, Apr 10, 2019 at 7:43 AM Matthew Snyder > >> wrote: > >> > > >> > I am making a lattice plot and I would like to use the value in one > >> column > >> > to define the pch and another column to define color of points. > >Something > >> > like: > >> > > >> > xyplot(mpg ~ wt | cyl, > >> >data=mtcars, > >> >col = gear, > >> >pch = carb > >> > ) > >> > > >> > There are unique pch points in the second and third panels, but > >these > >> > points are only unique within the plots, not among all the plots > >(as they > >> > should be). You can see this if you use the following code: > >> > > >> > xyplot(mpg ~ wt | cyl, > >> >data=mtcars, > >> >groups = carb > >> > ) > >> > > >> > This plot looks great for one group, but if you try to invoke two > >groups > >> > using c(gear, carb) I think it simply takes unique combinations of > >those > >> > two variables and plots them as unique colors. 
> >> > > >> > Another solution given by a StackExchange user: > >> > > >> > mypch <- 1:6 > >> > mycol <- 1:3 > >> > > >> > xyplot(mpg ~ wt | cyl, > >> > panel = function(x, y, ..., groups, subscripts) { > >> > pch <- mypch[factor(carb[subscripts])] > >> > col <- mycol[factor(gear[subscripts])] > >> > grp <- c(gear,carb) > >> > panel.xyplot(x, y, pch = pch, col = col) > >> > }
Re: [R] Define pch and color based on two different columns
You are not late to the party. And you solved it! Thank you very much. You just made my PhD a little closer to reality! Matt *Matthew R. Snyder* *~* PhD Candidate University Fellow University of Toledo Computational biologist, ecologist, and bioinformatician Sponsored Guest Researcher at NOAA PMEL, Seattle, WA. matthew.snyd...@rockets.utoledo.edu msnyder...@gmail.com [image: Mailtrack] <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&;> Sender notified by Mailtrack <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&;> 04/09/19, 10:01:53 PM On Tue, Apr 9, 2019 at 9:37 PM Peter Langfelder wrote: > Sorry for being late to the party, but has anyone suggested a minor > but important modification of the code from stack exchange? > > xyplot(mpg ~ wt | cyl, > panel = function(x, y, ..., groups, subscripts) { > pch <- mypch[factor(carb)[subscripts]] > col <- mycol[factor(gear)[subscripts]] > grp <- c(gear,carb) > panel.xyplot(x, y, pch = pch, col = col) > } > ) > > From the little I understand about what you're trying to do, this may > just do the trick. > > Peter > > On Tue, Apr 9, 2019 at 2:43 PM Matthew Snyder > wrote: > > > > I am making a lattice plot and I would like to use the value in one > column > > to define the pch and another column to define color of points. Something > > like: > > > > xyplot(mpg ~ wt | cyl, > >data=mtcars, > >col = gear, > >pch = carb > > ) > > > > There are unique pch points in the second and third panels, but these > > points are only unique within the plots, not among all the plots (as they > > should be). You can see this if you use the following code: > > > > xyplot(mpg ~ wt | cyl, > >data=mtcars, > >groups = carb > > ) > > > > This plot looks great for one group, but if you try to invoke two groups > > using c(gear, carb) I think it simply takes unique combinations of those > > two variables and plots them as unique colors. 
> > > > Another solution given by a StackExchange user: > > > > mypch <- 1:6 > > mycol <- 1:3 > > > > xyplot(mpg ~ wt | cyl, > > panel = function(x, y, ..., groups, subscripts) { > > pch <- mypch[factor(carb[subscripts])] > > col <- mycol[factor(gear[subscripts])] > > grp <- c(gear,carb) > > panel.xyplot(x, y, pch = pch, col = col) > > } > > ) > > > > This solution has the same problems as the code at the top. I think the > > issue causing problems with both solutions is that not every value for > each > > group is present in each panel, and they are almost never in the same > > order. I think R is just interpreting the appearance of unique values as > a > > signal to change to the next pch or color. My actual data file is very > > large, and it's not possible to sort my way out of this mess. It would be > > best if I could just use the value in two columns to actually define a > > color or pch for each point on an entire plot. Is there a way to do this? > > > > Ps, I had to post this via email because the Nabble site kept sending me > an > > error message: "Message rejected by filter rule match" > > > > Thanks, > > Matt > > > > > > > > *Matthew R. Snyder* > > *~* > > PhD Candidate > > University Fellow > > University of Toledo > > Computational biologist, ecologist, and bioinformatician > > Sponsored Guest Researcher at NOAA PMEL, Seattle, WA. 
> > matthew.snyd...@rockets.utoledo.edu > > msnyder...@gmail.com > > > > > > > > [image: Mailtrack] > > < > https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&; > > > > Sender > > notified by > > Mailtrack > > < > https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&; > > > > 04/09/19, > > 1:49:27 PM > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
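Assembling Peter Langfelder's fix into a self-contained sketch: the decisive change is building each factor over the whole column first and only then subscripting (factor(carb)[subscripts]), so a given carb or gear value maps to the same pch/col in every panel. By contrast, factor(carb[subscripts]) re-levels within each panel, which is why the earlier attempts drifted:

```r
library(lattice)

mypch <- 1:6
mycol <- 1:3

# Factors built once over the full columns: the level -> symbol/colour
# mapping is global, not per panel
carb.f <- factor(mtcars$carb)
gear.f <- factor(mtcars$gear)

xyplot(mpg ~ wt | cyl, data = mtcars,
       panel = function(x, y, ..., subscripts) {
         panel.xyplot(x, y,
                      pch = mypch[carb.f[subscripts]],
                      col = mycol[gear.f[subscripts]])
       })
```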
Re: [R] working on a data frame
On 7/24/2014 8:52 PM, Sarah Goslee wrote: > Hi, > > Your description isn't clear: > > On Thursday, July 24, 2014, Matthew <mailto:mccorm...@molbio.mgh.harvard.edu>> wrote: > > I am coming from the perspective of Excel and VBA scripts, but I > would like to do the following in R. > > I have a data frame with 14 columns and 32,795 rows. > > I want to check the value in column 8 (row 1) to see if it is a 0. > If it is not a zero, proceed to the next row and check the value > for column 8. > If it is a zero, then > a) change the zero to a 1, > b) divide the value in column 9 (row 1) by 1, > > > Row 1, or the row in which column 8 == 0? All rows in which the value in column 8==0. > Why do you want to divide by 1? Column 10 contains the result of the value in column 9 divided by the value in column 8. If the value in column 8==0, then the division can not be done, so I want to change the zero to a one in order to do the division. This is a fairly standard thing to do with this data. (The data are measurements of amounts at two time points. Sometimes a thing will not be present in the beginning (0), but very present at the later time. Column 10 is the log2 of the change. Infinite is not an easy number to work with, so it is common to change the 0 to a 1. On the other hand, something may be present at time 1, but not at the later time. In this case column 10 would be taking the log2 of a number divided by 0, so again the zero is commonly changed to a one in order to get a useable value in column 10. In both the preceding cases there was a real change, but Inf and NaN are not helpful.) > > c) place the result in column 10 (row 1) and > > > Ditto on the row 1 question. I want to work on all rows where column 8 (and column 9) contain a zero. Column 10 contains the result of the value in column 9 divided by the value in column 8. So, for row 1, column 10 row 1 contains the ratio column 9 row 1 divided by column 8 row 1, and so on through the whole 32,000 or so rows. 
Most rows do not have a zero in columns 8 or 9. Some rows have a zero in column 8 only, and some rows have a zero in column 9 only. I want to get rid of the zeros in these two columns and then do the division to get a manageable value in column 10. Division by zero and Inf are not considered 'manageable' by me.

> What do you want column 10 to be if column 8 isn't 0? Does it already
> have a value? I suppose it must.

Yes, column 10 does have something, but this something can be Inf or NaN, which I want to get rid of.

> > d) repeat this for each of the other 32,794 rows.
> >
> > Is this possible with an R script, and is this the way to go about
> > it? If it is, could anyone get me started?
>
> Assuming you want to put the new values in the rows where column 8 ==
> 0, you can do it in two steps:
>
> mydata[,10] <- ifelse(mydata[,8] == 0, mydata[,9]/whatever, mydata[,10])
> # where whatever is the thing you want to divide by that probably isn't 1
> mydata[,8] <- ifelse(mydata[,8] == 0, 1, mydata[,8])
>
> R programming is best done by thinking about vectorizing things,
> rather than doing them in loops. Reading the Intro to R that comes
> with your installation is a good place to start.

Would it be better to change the data frame into a matrix, or something else?

Thanks for your help.

> Sarah
>
> --
> Sarah Goslee
> http://www.stringpage.com
> http://www.sarahgoslee.com
> http://www.functionaldiversity.org

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
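Sarah's two-step ifelse() approach can be sketched end to end on a toy frame. The 14-column layout and the zero-to-one convention are from the thread; the values, and replacing zeros in both columns 8 and 9 before taking the log2 ratio, are assumptions for illustration:

```r
# Toy stand-in for the 32,795-row frame: only columns 8-10 matter here.
set.seed(1)
mydata <- as.data.frame(matrix(sample(0:5, 140, replace = TRUE), nrow = 10))

# Replace zeros in columns 8 and 9 by 1 (the thread's convention), then
# recompute the log2 ratio in column 10 -- all vectorized, no row loop.
mydata[, 8] <- ifelse(mydata[, 8] == 0, 1, mydata[, 8])
mydata[, 9] <- ifelse(mydata[, 9] == 0, 1, mydata[, 9])
mydata[, 10] <- log2(mydata[, 9] / mydata[, 8])

all(is.finite(mydata[, 10]))   # no Inf or NaN remains
```

Because everything is a whole-column operation, this also answers the follow-up question: there is no need to convert the data frame to a matrix for speed at this size.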
Re: [R] reshape: melt and cast
Yep, that works. Thanks, Stephen. I should have drawn the parallel with Excel Pivot tables sooner. On Tue, Sep 1, 2015 at 9:36 AM, stephen sefick wrote: > I would make this minimal. In other words, use an example data set, dput, > and use output of dput in a block of reproducible code. I don't understand > exactly what you want, but does sum work? If there is more than one record > for a given set of factors the sum is the sum of the counts. If only one > record, then the sum is the same as the original number. > > On Tue, Sep 1, 2015 at 10:00 AM, Matthew Pickard < > matthew.david.pick...@gmail.com> wrote: > >> Thanks, Stephen. I've looked into the fun.aggregate argument. I don't >> want to aggregate, so I thought leaving it blank (allowing it to default to >> NULL) would do that. >> >> >> Here's a corrected post (with further explanation): >> >> Hi, >> >> I have data that looks like this: >> >> >dput(head(ratings)) >> structure(list(QCode = structure(c(5L, 7L, 5L, 7L, 5L, 7L), .Label = >> c("APPEAR", >> "FEAR", "FUN", "GRAT", "GUILT", "Joy", "LOVE", "UNGRAT"), class = >> "factor"), >> PID = structure(c(1L, 1L, 2L, 2L, 3L, 3L), .Label = c("1123", >> "1136", "1137", "1142", "1146", "1147", "1148", "1149", "1152", >> "1153", "1154", "1156", "1158", "1161", "1164", "1179", "1182", >> "1183", "1191", "1196", "1197", "1198", "1199", "1200", "1201", >> "1203", "1205", "1207", "1208", "1209", "1214", "1216", "1219", >> "1220", "1222", "1223", "1224", "1225", "1226", "1229", "1236", >> "1237", "1238", "1240", "1241", "1243", "1245", "1246", "1248", >> "1254", "1255", "1256", "1257", "1260", "1262", "1264", "1268", >> "1270", "1272", "1278", "1279", "1280", "1282", "1283", "1287", >> "1288", "1292", "1293", "1297", "1310", "1311", "1315", "1329", >> "1332", "1333", "1343", "1346", "1347", "1352", "1354", "1355", >> "1356", "1360", "1368", "1369", "1370", "1378", "1398", "1400", >> "1403", "1404", "1411", "1412", "1420", "1421", "1423", "1424", >> "1426", "1428", "1432", 
"1433", "1435", "1436", "1438", "1439", >> "1440", "1441", "1443", "1444", "1446", "1447", "1448", "1449", >> "1450", "1453", "1454", "1456", "1459", "1460", "1461", "1462", >> "1463", "1468", "1471", "1475", "1478", "1481", "1482", "1487", >> "1488", "1490", "1493", "1495", "1497", "1503", "1504", "1508", >> "1509", "1511", "1513", "1514", "1515", "1522", "1524", "1525", >> "1526", "1527", "1528", "1529", "1532", "1534", "1536", "1538", >> "1539", "1540", "1543", "1550", "1551", "1552", "1554", "1555", >> "1556", "1558", "1559"), class = "factor"), RaterName = >> structure(c(1L, >> 1L, 1L, 1L, 1L, 1L), .Label = c("cwormhoudt", "zspeidel"), class = >> "factor"), >> SI1 = c(2L, 1L, 1L, 1L, 2L, 1L), SI2 = c(2L, 2L, 2L, 2L, >> 2L, 3L), SI3 = c(3L, 3L, 3L, 3L, 2L, 4L), SI4 = c(1L, 2L, >> 1L, 1
[R] fast way to create composite matrix based on mixed indices?
Hi all,

Sorry for the title here but I find this difficult to describe succinctly. Here's the problem.

I want to create a new matrix where each row is a composite of an old matrix, but where the row & column indexes of the old matrix change for different parts of the new matrix. For example, the second row of the new matrix (which has, e.g., 10 columns) might be columns 1 to 3 of row 2 of the old matrix, columns 4 to 8 of row 1 of the old matrix, and columns 9 to 10 of row 3 of the old matrix.

Here's an example in code:

#The old matrix
(old.mat <- matrix(1:30,nrow=3,byrow=TRUE))

#matrix of indices to create the new matrix from the old one.
#The 1st column gives the row number of the new matrix
#the 2nd gives the row of the old matrix that we're going to copy into the new matrix
#the 3rd gives the starting column of the old matrix for the row in col 2
#the 4th gives the end column of the old matrix for the row in col 2
index <- matrix(c(1,1,1,4,
                  1,3,5,10,
                  2,2,1,3,
                  2,1,4,8,
                  2,3,9,10),
                nrow=5,byrow=TRUE,
                dimnames=list(NULL,c('new.mat.row','old.mat.row','old.mat.col.start','old.mat.col.end')))

I will be given old.mat and index and want to create new.mat from them. I want to create a new matrix of two rows that looks like this:

new.mat <- matrix(c(1:4,25:30,11:13,4:8,29:30),byrow=TRUE,nrow=2)

So here, the first row of new.mat is columns 1 to 4 of row 1 of old.mat and columns 5 to 10 of row 3 of old.mat.

new.mat and old.mat will always have the same number of columns but the number of rows could differ.

I could accomplish this in a loop, but the real problem is quite large (new.mat might have 1e8 elements), and so a for loop would be prohibitively slow. I may resort to unix tools and use a shell script, but wanted to first see if this is doable in R in a fast way.

Thanks in advance!

Matt

--
Matthew C Keller
Asst.
Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com
Re: [R] fast way to create composite matrix based on mixed indices?
Brilliant, Denes. Thank you for your help. This worked and is obviously much faster than a loop...

On Thu, Sep 17, 2015 at 3:22 PM, Dénes Tóth wrote:
> Hi Matt,
>
> you could use matrix indexing. Here is a possible solution, which could be
> optimized further (probably).
>
> # The old matrix
> (old.mat <- matrix(1:30,nrow=3,byrow=TRUE))
>
> # matrix of indices
> index <- matrix(c(1,1,1,4,
>                   1,3,5,10,
>                   2,2,1,3,
>                   2,1,4,8,
>                   2,3,9,10),
>                 nrow=5,byrow=TRUE,
>                 dimnames=list(NULL,
>                               c('new.mat.row','old.mat.row',
>                                 'old.mat.col.start','old.mat.col.end')))
>
> # expected result
> new.mat <- matrix(c(1:4,25:30,11:13,4:8,29:30),
>                   byrow=TRUE, nrow=2)
>
> # column indices
> ind <- mapply(seq, index[, 3], index[, 4],
>               SIMPLIFY = FALSE, USE.NAMES = FALSE)
> ind_len <- vapply(ind, length, integer(1))
> ind <- unlist(ind)
>
> # old indices
> old.ind <- cbind(rep(index[, 2], ind_len), ind)
>
> # new indices
> new.ind <- cbind(rep(index[, 1], ind_len), ind)
>
> # create the new matrix
> result <- matrix(NA_integer_, max(index[, 1]), max(index[, 4]))
>
> # fill the new matrix
> result[new.ind] <- old.mat[old.ind]
>
> # check the results
> identical(result, new.mat)
>
> HTH,
> Denes

--
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com
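The key idiom in Denes' answer is worth isolating: indexing a matrix with a two-column matrix of (row, col) pairs reads or writes many scattered cells in one vectorized step, which is what removes the need for a loop. A minimal demonstration:

```r
# Matrix indexing: each row of idx names one (row, col) cell of m,
# so a single assignment fills arbitrary scattered positions.
m <- matrix(0, nrow = 2, ncol = 3)
idx <- cbind(c(1, 2, 2),   # target rows
             c(3, 1, 2))   # target columns
m[idx] <- c(7, 8, 9)       # writes cells (1,3), (2,1), (2,2) at once
m
```

The same idiom reads cells: `old.mat[old.ind]` pulls every source cell as one flat vector, and `result[new.ind] <- ...` scatters that vector into the destination, so the whole rearrangement is two vectorized operations regardless of how many runs the index table describes.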
[R] Why does residuals.coxph use naive.var?
Hi all,

I noticed that the scaled Schoenfeld residuals produced by residuals.coxph(fit, type="scaledsch") were different from those returned by cox.zph for a model where robust standard errors have been estimated. Looking at the source code for both functions suggests this is because residuals.coxph uses the naive variance to scale the Schoenfeld residuals, whereas cox.zph uses the robust version when it is available.

Lines 20-21 of the version of residuals.coxph currently on github:

vv <- drop(object$naive.var)
if (is.null(vv)) vv <- drop(object$var)

i.e. the naive variance is used even when a robust version is available. Why is this the case? Have I missed something? Am I right in thinking that using the robust variance is the better choice if the intention is to check the proportional hazards assumption?

Here is a reproducible example using the heart data:

data(heart)
fit <- coxph(Surv(start, stop, event) ~ year + age + surgery + cluster(id), data=jasa1)
# Should return TRUE since both produce the scaled Schoenfeld residuals
all(residuals(fit, type='scaledsch') == cox.zph(fit)$y)

Thanks for your help.
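One way to see the two scalers diverge on the poster's own model, assuming the documented `var` and `naive.var` components of a coxph fit (this only illustrates why the residuals differ; it does not settle which scaling is preferable):

```r
library(survival)

# The poster's model: cluster(id) requests robust (sandwich) standard errors.
fit <- coxph(Surv(start, stop, event) ~ year + age + surgery + cluster(id),
             data = jasa1)

# With cluster(), coxph() keeps both variance estimates: fit$var holds the
# robust sandwich estimate and fit$naive.var the model-based one.  Since
# residuals.coxph scales by naive.var while cox.zph prefers the robust var
# (per the poster's reading of the source), any difference between the two
# matrices propagates into the scaled Schoenfeld residuals.
max(abs(fit$var - fit$naive.var)) > 0
```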
Re: [R] 64-bit R on Mac OS X 10.4.5
"gfortran -arch x86_64" \
> FC="gfortran -arch x86_64" \
> --with-system-zlib \
> --with-blas='-framework vecLib' --with-lapack && \
> make -j4 && \
> make check && \
> make install
> cd ..
>
> When I try to run it by typing R, it gives me the following error:
> -bash: R: command not found
>
> Can anybody help me to solve this problem or direct me to better
> step-by-step instructions?
> Thanks
> Joseph

--
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com
[R] tryCatch
Hi All R-Gurus,

I am trying to debug a program, and I think tryCatch will help. The functions involved process through so many times before I encounter the error that things are a bit slow to use debug and browser(). I've read the help file and postings on conditions, and am still having trouble.

While running my program I am getting a NaN from a called function, and I want to know the input parameters that generate this, so I have included the following in the code of the main function (calling function):

tryCatch(delM > S,
         exception=function(e) print(list(S=S, Si=Si, D=D, theta=S/N, incr=del.t)),
         finally=print("finally"))

This is actually part of an "if" statement, where delM > S is the condition. If delM is NaN, an error results. The above tryCatch does not work in the way I wish. What sort of condition does this little expression throw when it encounters delM = NaN? Is it an exception? What is wrong with the above handler, etc.?

Kind regards,
Matt Redding

DISCLAIMER: The information contained in the above e-mail message or messages (which includes any attachments) is confidential and may be legally privileged. It is intended only for the use of the person or entity to which it is addressed. If you are not the addressee any form of disclosure, copying, modification, distribution or any action taken or omitted in reliance on the information is unauthorised. Opinions contained in the message(s) do not necessarily reflect the opinions of the Queensland Government and its authorities. If you received this communication in error, please notify the sender immediately and delete it from your computer system network.
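For what it's worth, a sketch of what seems to be happening (the variable values below are invented): NaN > S evaluates to NA, and if (NA) signals an ordinary error of class simpleError. There is no built-in condition class called "exception" in R, so a handler named exception= never fires; catching "error" does:

```r
# Invented stand-ins for the poster's variables.
delM <- NaN; S <- 5; Si <- 1; D <- 2; N <- 10; del.t <- 0.1

res <- tryCatch(
  # NaN > 5 is NA, and if(NA) raises "missing value where TRUE/FALSE needed"
  if (delM > S) "bigger" else "smaller",
  error = function(e) {
    # dump the inputs that produced the NaN, as the poster intended
    print(list(S = S, Si = Si, D = D, theta = S / N, incr = del.t))
    "caught"
  },
  finally = print("finally")
)
res   # "caught"
```

A cheaper alternative when the bad value is known to be NaN is to test for it directly before the comparison, e.g. `if (is.nan(delM)) { ...dump inputs... }`, which avoids wrapping the hot loop in a handler at all.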
[R] RGoogleDocs: getDocs() - "problems connecting to get the list of documents"
Hi,

I have been using RGoogleDocs successfully for some time now, but something seems to have happened which is preventing me from accessing my data in Google spreadsheets. I get the message "problems connecting to get the list of documents" when I use getDocs, despite being logged in, e.g.:

sheets.con = getGoogleDocsConnection(getGoogleAuth("username", "password", service = "wise"))
ts = getWorksheets("formname", sheets.con)
=> Error in getDocs(con) : problems connecting to get the list of documents

Does anyone know what might be causing this? Is it maybe a problem at the Google end?

Matthew Blackett
Researcher
King's College London
http://geography.kcl.ac.uk/micromet/MBlackett/
[R] Video demo of using svSocket with data.table
Dear r-help,

If you haven't already seen this: http://www.youtube.com/watch?v=rvT8XThGA8o

The video consists of typing at the console and graphics; there is no audio and no slides. Please press the HD button and maximise. It's about 8 minutes.

Regards,
Matthew
[R] Plotting point text-labels with lattice splom
I have read the thread re: "Plotting text with lattice" but can't seem to get from there to what I need... would appreciate any advice.

I have used splom to plot data of the first three principal components from a PCA analysis. Here is the code I have thus far:

> mydata.pr <- prcomp(mydata)
> grps <- substr(rownames(mydata), 1, 4)
> super.sym = trellis.par.get("superpose.symbol")
> splom(data.frame(mydata.pr$x[,1:3]), groups = grps, panel = panel.superpose,
      key = list(title = "Four Items in PCA space",
                 text = list(c("G", "H", "N", "Il")),
                 points = list(pch = super.sym$pch[1:4], col = super.sym$col[1:4])))

I would now like to append text labels to each point in the plot that will identify the item based on its rowname in the original data set. So, something like this gets me the labels I want:

> labs <- substr(rownames(mydata), 1, 6)

My trouble then comes in figuring out how to get these labels to "attach" to the corresponding points in the plot.

Thanks.
Matt

--
Matthew Jockers
Stanford University
http://www.stanford.edu/~mjockers
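A hedged sketch of one way to attach the labels, on invented data: a custom panel function draws the points and then calls panel.text() at the same coordinates, so the labels follow the points in every splom panel. Combining this with groups/panel.superpose is left aside for clarity:

```r
library(lattice)

# Invented stand-in for mydata: 10 named rows, 4 numeric columns.
set.seed(42)
mydata <- as.data.frame(matrix(rnorm(40), nrow = 10,
                               dimnames = list(paste0("item", 1:10),
                                               paste0("V", 1:4))))
mydata.pr <- prcomp(mydata)
labs <- substr(rownames(mydata), 1, 6)

# Each splom subpanel receives the (x, y) pair for all points in row order,
# so labs lines up with the points without any extra bookkeeping.
p <- splom(data.frame(mydata.pr$x[, 1:3]),
           panel = function(x, y, ...) {
             panel.xyplot(x, y, ...)
             panel.text(x, y, labels = labs, pos = 3, cex = 0.6)
           })
print(p)
```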
[R] R help:
Hi,

I have written code to do some averaging of data over uneven intervals. The for loop keeps missing particular depths, and I once got an error message reading:

*** caught segfault ***
address 0xc023, cause 'memory not mapped'

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

The portion of the code that is giving me problems is:

if(length(which( interp.depth == highres.depth[i] )) > 0 ) {
  print(paste("depth = ",highres.depth[i],sep=""))
  depth.tracker <- c(highres.depth[i],depth.tracker)
  caco3.interp.vector <- c(mean(caco3.interp),caco3.interp.vector)
  caco3.interp <- numeric(0)
}

When the routine misses a depth, it returns a length of zero for (say) depth = 1.4, or highres.depth[141], but when I type in the value 1.4, I get the proper answer. Any idea what is going on here?

thanks
Matt

______
Matthew S. Fantle
Assistant Professor
Department of Geosciences
Penn State University
212 Deike Bldg.
University Park, PA 16802
Phone: 814-863-9968
mfan...@psu.edu
Departmental Homepage: http://www.geosc.psu.edu/people/faculty/personalpages/mfantle/index.html
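A guess at the "missed depth" symptom (it does not explain the segfault, which is worth reporting separately): depth grids built by seq() often fail exact == tests against typed literals, even though both values print identically, because the accumulated multiples of 0.1 are not exactly the double nearest to the literal:

```r
# A grid like the poster's: 14 * 0.1 is not bit-identical to the literal 1.4.
highres.depth <- seq(0, 2, by = 0.1)

highres.depth[15]                          # prints 1.4, but...
highres.depth[15] == 1.4                   # FALSE: exact comparison misses it

# Compare with a tolerance instead of ==
isTRUE(all.equal(highres.depth[15], 1.4))  # TRUE
which(abs(highres.depth - 1.4) < 1e-8)     # tolerant replacement for which(==)
```

So a robust version of the test in the loop is `length(which(abs(interp.depth - highres.depth[i]) < tol)) > 0` for a tolerance smaller than the grid spacing, rather than `interp.depth == highres.depth[i]`.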
Re: [R] data.table evaluating columns
I'd go a bit further and remind that the r-help posting guide is clear:

"For questions about functions in standard packages distributed with R (see the FAQ Add-on packages in R), ask questions on R-help. If the question relates to a contributed package, e.g., one downloaded from CRAN, try contacting the package maintainer first. You can also use find("functionname") and packageDescription("packagename") to find this information. ONLY send such questions to R-help or R-devel if you get no reply or need further assistance. This applies to both requests for help and to bug reports."

The "ONLY" is in bold in the posting guide. I changed the bold to capitals above for people reading this in text only.

Since Tom and I are friendly and responsive, users of data.table don't usually make it to r-help. We'll follow up this one off-list. Please note that Rob's question is very good by the rest of the posting guide, so no complaints there, only that it was sent to the wrong place. Please keep the questions coming, but send them to us, not r-help.

You do sometimes see messages to r-help starting something like "I have contacted the authors/maintainers but didn't hear back, does anyone know ...". To not state that they had would be an implicit request for further work by the community (for free) to ask if they had. So it's not enough to contact the maintainer first; you also have to say that you have, and perhaps how long ago would be helpful too. For r-forge projects I usually send any question to everyone on the project (easy to find) or, if they have a list, to that.

HTH
Matthew

"Tom Short" wrote in message news:fd27013a1003021718w409acb32r1281dfeca5593...@mail.gmail.com...
On Tue, Mar 2, 2010 at 7:09 PM, Rob Forler wrote:
> Hi everyone,
>
> I have the following code that works on data frames that I would like to
> work on data.tables. However, I'm not really sure how to go about it.
>
> I basically have the following:
>
> names = c("data1", "data2")
> frame = data.frame(list(key1=as.integer(c(1,2,3,4,5,6)),
>     key2=as.integer(c(1,2,3,2,5,6)), data1 = c(3,3,2,3,5,2),
>     data2 = c(3,3,2,3,5,2)))
>
> for(i in 1:length(names)){
>     frame[, paste(names[i], "flag")] = frame[,names[i]] < 3
> }
>
> Now I try with data.table code:
>
> names = c("data1", "data2")
> frame = data.table(list(key1=as.integer(c(1,2,3,4,5,6)),
>     key2=as.integer(c(1,2,3,2,5,6)), data1 = c(3,3,2,3,5,2),
>     data2 = c(3,3,2,3,5,2)))
>
> for(i in 1:length(names)){
>     frame[, paste(names[i], "flag"), with=F] = as.matrix(frame[,names[i], with=F]) < 3
> }

Rob, this type of question is better for the package maintainer(s) directly rather than R-help. That said, one answer is to use list addressing:

for(i in 1:length(names)){
    frame[[paste(names[i], "flag")]] = frame[[names[i]]] < 3
}

Another option is to manipulate frame as a data frame and convert to data.table when you need that functionality (conversion is quick). In the data.table version, frame[,names[i], with=F] is the same as frame[,names[i], drop=FALSE] (the answer is a list, not a vector). Normally, it's easier to use [[]] or $ indexing to get this. Also, frame[i,j] <- something assignment is still a bit buggy for data.tables.

- Tom

Tom Short
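Tom's [[ ]] suggestion collected into a self-contained sketch, using a plain data.frame so it runs without the package; the same [[ ]] addressing works on a data.table, which is the point of his answer:

```r
# Rob's data, as a data.frame; [[ ]] addressing is class-agnostic here.
nms <- c("data1", "data2")
frame <- data.frame(key1 = 1:6,
                    key2 = c(1L, 2L, 3L, 2L, 5L, 6L),
                    data1 = c(3, 3, 2, 3, 5, 2),
                    data2 = c(3, 3, 2, 3, 5, 2))

# Add one logical flag column per data column, named e.g. "data1 flag".
for (nm in nms) {
  frame[[paste(nm, "flag")]] <- frame[[nm]] < 3
}

names(frame)   # key1, key2, data1, data2, "data1 flag", "data2 flag"
```

(Using `nms` rather than `names` as the variable also avoids shadowing the base function `names()`, which the original code does.)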
Re: [R] data.table evaluating columns
That in itself is a question for the maintainer, off r-help. When the posting guide says "contact the package maintainer first" it means it literally, and applies even to questions about the existence of a mailing list for the package. So what I'm supposed to do now is tell you how the posting guide works, and tell you that I'll reply off list. Then hopefully the community will be happy with me too. So I'll reply off list :-)

"Rob Forler" wrote in message news:eb472fec1003030502s4996511ap8dfd329a3...@mail.gmail.com...
> Okay, I appreciate the help, and I appreciate the FAQ reminder. I will read
> the r-help posting guide. I'm relatively new to using the support systems
> around R. So far everyone has been really helpful.
>
> I'm confused as to which data.table "list" I should be using.
> http://lists.r-forge.r-project.org/pipermail/datatable-commits/ doesn't
> appear to be correct. Or just directly sending an email to all of you?
>
> Thanks again,
> Rob
>
> On Wed, Mar 3, 2010 at 6:05 AM, Matthew Dowle wrote:
>
>> I'd go a bit further and remind that the r-help posting guide is clear:
>> [earlier message, quoted here in full, trimmed; see above]
Re: [R] Three most useful R package
Dieter,

One way to check if a package is active is by looking on r-forge. If you are referring to data.table, you would have found it is actually very active at the moment and is far from abandoned. What you may be referring to is a warning, not an error, with v1.2 on R 2.10+. That was fixed many moons ago. The r-forge version is where it's at.

Rather than commenting in public about a warning on a package, making a conclusion about its abandonment, and doing this without copying the maintainer, perhaps you could have contacted the maintainer to let him know you had found a problem. That would have been a more community-spirited action to take. Doing that at the time you found out would have been helpful too, rather than saving it up for now. Or you can always check the svn logs yourself, as the r-forge guys have made that trivial to do.

All,

Can we please now stop this thread? The crantastic people worked hard to provide a better solution. If the community refuses to use crantastic, that's up to the community, but to start now filling up r-help with votes on packages when so much effort was put into a much, much better solution ages ago? It's as quick to put your votes into crantastic as it is to write to r-help. What's your problem, folks, with crantastic? The second reply mentioned crantastic but you all chose to ignore it, it seems. If you want to vote, use crantastic. If you don't want to vote, don't vote. But using r-help to vote?! The better solution is right there: http://crantastic.org/

Matthew

"Dieter Menne" wrote in message news:1267626882999-1576618.p...@n4.nabble.com...
>
> Rob Forler wrote:
>>
>> And data.table because it does aggregation about 50 times faster than plyr
>> (which I used to use a lot).
>>
> This is correct; from the error message it spits out, one has to conclude
> that it was abandoned at R version 2.4.x
>
> Dieter
Re: [R] ifthen() question
This post breaks the posting guide in multiple ways. Please read it again (and then again), in particular the first 3 paragraphs. You will help yourself by following it.

The solution is right there in the help page for ?data.frame and other places, including An Introduction to R. I think it's more helpful *not* to tell you what it is, so that you discover it for yourself, learn how to learn, and google. I hope that you appreciate that I've been helpful by simply (and quickly) telling you the answer *is* there.

Having said that, you don't appear to be aware of the many packages around that do this task; you appear to be re-inventing the wheel. I suggest you briefly investigate each and every one of the top 30 packages ranked by crantastic before writing any more R code. A little time invested doing that will pay you dividends in the long run. That is not a complaint of you, though, as that advice is not in the posting guide.

Matthew

"AC Del Re" wrote in message news:85cf8f8d1003040735k2b076142jc99b7ec34da87...@mail.gmail.com...
> Hi All,
>
> I am using a specialized aggregation function to reduce a dataset with
> multiple rows per id down to 1 row per id. My function works perfectly when
> there are >1 id but alters the 'var.g' in undesirable ways when this
> condition is not met. Therefore, I have been trying ifthen() statements to
> keep the original value when the length of unique id == 1, but I cannot get
> it to work, e.g.:
>
> # function to aggregate effect sizes:
> aggs <- function(g, n.1, n.2, cor = .50) {
>   n.1 <- mean(n.1)
>   n.2 <- mean(n.2)
>   N_ES <- length(g)
>   corr.mat <- matrix(rep(cor, N_ES^2), nrow=N_ES)
>   diag(corr.mat) <- 1
>   g1g2 <- cbind(g) %*% g
>   PSI <- (8*corr.mat + g1g2*corr.mat^2)/(2*(n.1+n.2))
>   PSI.inv <- solve(PSI)
>   a <- rowSums(PSI.inv)/sum(PSI.inv)
>   var.g <- 1/sum(PSI.inv)
>   g <- sum(g*a)
>   out <- cbind(g, var.g, n.1, n.2)
>   return(out)
> }
>
> # automating this procedure for all rows of df.
> This format works perfectly when there is >1 id per row only:
>
> agg_g <- function(id, g, n.1, n.2, cor = .50) {
>   st <- unique(id)
>   out <- data.frame(id=rep(NA,length(st)))
>   for(i in 1:length(st)) {
>     out$id[i] <- st[i]
>     out$g[i] <- aggs(g=g[id==st[i]], n.1=n.1[id==st[i]],
>                      n.2=n.2[id==st[i]], cor)[1]
>     out$var.g[i] <- aggs(g=g[id==st[i]], n.1=n.1[id==st[i]],
>                          n.2=n.2[id==st[i]], cor)[2]
>     out$n.1[i] <- round(mean(n.1[id==st[i]]),0)
>     out$n.2[i] <- round(mean(n.2[id==st[i]]),0)
>   }
>   return(out)
> }
>
> # The attempted solution using ifthen() and minor changes to the function,
> # but it's not working properly:
> agg_g <- function(df, var.g, id, g, n.1, n.2, cor = .50) {
>   df$var.g <- var.g
>   st <- unique(id)
>   out <- data.frame(id=rep(NA,length(st)))
>   for(i in 1:length(st)) {
>     out$id[i] <- st[i]
>     out$g[i] <- aggs(g=g[id==st[i]], n.1=n.1[id==st[i]],
>                      n.2=n.2[id==st[i]], cor)[1]
>     out$var.g[i] <- ifelse(length(st[i])==1, df$var.g[id==st[i]],
>                            aggs(g=g[id==st[i]], n.1=n.1[id==st[i]],
>                                 n.2=n.2[id==st[i]], cor)[2])
>     out$n.1[i] <- round(mean(n.1[id==st[i]]),0)
>     out$n.2[i] <- round(mean(n.2[id==st[i]]),0)
>   }
>   return(out)
> }
>
> # sample data:
> id <- c(1, rep(1:19))
> n.1 <- c(10,20,13,22,28,12,12,36,19,12,36,75,33,121,37,14,40,16,14,20)
> n.2 <- c(11,22,10,20,25,12,12,36,19,11,34,75,33,120,37,14,40,16,10,21)
> g <- c(.68,.56,.23,.64,.49,-.04,1.49,1.33,.58,1.18,-.11,1.27,.26,.40,.49,
>        .51,.40,.34,.42,1.16)
> var.g <- c(.08,.06,.03,.04,.09,.04,.009,.033,.0058,.018,.011,.027,.026,
>            .0040,.049,.0051,.040,.034,.0042,.016)
> df <- data.frame(id, n.1, n.2, g, var.g)
>
> Any help is much appreciated,
>
> AC
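For what it's worth, one thing to check in the attempted solution (an observation, not a tested fix for the whole function): st[i] is a single element, so length(st[i]) is always 1 and the ifelse() condition can never be FALSE. The intended test is presumably how many rows share that id:

```r
# Toy id vector with one duplicated id.
id <- c(1, 1, 2, 3)
st <- unique(id)

length(st[1])       # 1 -- a single element always has length 1
sum(id == st[1])    # 2 -- the number of rows carrying this id

# So a per-id "is this a singleton?" test would be sum(id == st[i]) == 1,
# and since that condition is a single scalar, plain if () ... else ...
# is clearer inside the loop than the vectorized ifelse().
```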
Re: [R] Nonparametric generalization of ANOVA
Frank, I respect your views but I agree with Gabor. The posting guide does not support your views. It is not any of our views that are important but we are following the posting guide. It covers affiliation. It says only that "some" consider it "good manners to include a concise signature specifying affiliation". It does not agree that it is bad manners not to. It is therefore going too far to urge R-gurus, whoever they might be, to ignore such postings on that basis alone. It is up to responders (I think that is the better word, which is the one used by the posting guide) whether they reply. Missing affiliation is OK by the posting guide. Users shouldn't be put off from posting because of that alone. Sending from an anonymous email address such as "BioStudent" is also fine by the posting guide as far as my eyes read it. It says only that the email address should work. I would also answer such anonymous posts, providing they demonstrate they made best efforts to follow the posting guide, as usual for all requests for help. It's so easy to send from a false, but apparently real, name, so why worry about that? If you disagree with the posting guide then you could make a suggestion to get the posting guide changed with respect to these points. But, currently, good practice is defined by the posting guide, and I can't see that your view is backed up by it. In fact it seems to me that these points were carefully considered, and the wording is careful on these points. As far as I know you are wrong that there is no moderator. There are in fact an uncountable number of people who are empowered to moderate, i.e. all of us. In other words it's up to the responders to moderate. The posting guide is our guide. As a last resort we can alert the list administrator (which I believe is the correct name for him in that role), who has powers to remove an email address from the list if he thinks that is appropriate, or act otherwise, or not at all. It is actually up to responders (i.e. 
all of us) to ensure the posting guide is followed. My view is that the problems started with some responders on some occasions. They sometimes forgot, a little bit, to encourage and remind posters to follow the posting guide when it was not followed. This then may have encouraged more posters to think it was ok not to follow the posting guide. That is my own personal view, not a statistical one backed up by any evidence. Matthew "Frank E Harrell Jr" wrote in message news:4b913880.9020...@vanderbilt.edu... > Gabor Grothendieck wrote: >> I am happy to answer posts to r-help regardless of the name and email >> address of the poster but would draw the line at someone excessively >> posting without a reasonable effort to find the answer first or using >> it for homework since such requests could flood the list making it >> useless for everyone. > > Gabor I respectfully disagree. It is bad practice to allow anonymous > postings. We need to see real names and real affiliations. > > r-help is starting to border on uselessness because of the age old problem > of the same question being asked every two days, a high frequency of > specialty questions, and answers given with the best of intentions in > incremental or contradictory e-mail pieces (as opposed to a cumulative > wiki or hierarchically designed discussion web forum), as there is no > moderator for the list. We don't need even more traffic from anonymous > postings. > > Frank > >> >> On Fri, Mar 5, 2010 at 10:55 AM, Ravi Varadhan >> wrote: >>> David, >>> >>> I agree with your sentiments. I also think that it is bad posting >>> etiquette not to sign one's genuine name and affiliation when asking for >>> help, which "blue sky" seems to do a lot. Bert Gunter has already >>> raised this issue, and I completely agree with him. I would also like to >>> urge the R-gurus to ignore such postings. >>> >>> Best, >>> Ravi. >>> >>> >>> Ravi Varadhan, Ph.D. 
>>> Assistant Professor, >>> Division of Geriatric Medicine and Gerontology >>> School of Medicine >>> Johns Hopkins University >>> >>> Ph. (410) 502-2619 >>> email: rvarad...@jhmi.edu >>> >>> >>> - Original Message - >>> From: David Winsemius >>> Date: Friday, March 5, 2010 9:25 am >>> Subject: Re: [R] Nonparametric generalization of ANOVA >>> To: blue sky >>> Cc: r-h...@stat.math.ethz.ch >>> >>> >>>> On Mar 5, 2010, at 8:19 AM, blue sky wrote: >>>> >&
Re: [R] Nonparametric generalization of ANOVA
John, So you want BlueSky to change their name to "Paul Smith" at "New York University", just to give a totally random example of a false name, and then you will be happy? I just picked a popular, real name at a real, big place. Are you, or is anyone else, going to check that it's real? We want BlueSky to ask great questions, which haven't been asked before, and to follow the posting guide. If BlueSky improves the knowledge base, what's the problem? This person may well be breaking the posting guide for many other reasons (I haven't looked), and if they are then you could take issue with them on those points, but not for simply writing as "BlueSky". David W has got it right when he replied to "ManInMoon". Shall we stop this thread now, and follow his lead? I would have picked "ManOnMoon" myself but maybe that one was taken. It's rather difficult to be on a moon, let alone inside it. Matthew "John Sorkin" wrote in message news:4b91068702cb00064...@medicine.umaryland.edu... > The sad part of this interchange is that Blue Sky does not seem to be > amenable to suggestion. He, or she, has not taken note of, or responded to, the > fact that a number of people believe it is good manners to give a real > name and affiliation. My mother taught me that when two people tell you > that you are drunk you should lie down until the inebriation goes away. > Blue Sky, several people have noted that you would do well to give us your > name and affiliation. Is this too much to ask given that people are good > enough to help you? > John > > > > > John David Sorkin M.D., Ph.D. > Chief, Biostatistics and Informatics > University of Maryland School of Medicine Division of Gerontology > Baltimore VA Medical Center > 10 North Greene Street > GRECC (BT/18/GR) > Baltimore, MD 21201-1524 > (Phone) 410-605-7119 > (Fax) 410-605-7913 (Please call phone number above prior to faxing)>>> > "Matthew Dowle" 3/5/2010 12:58 PM >>> > Frank, I respect your views but I agree with Gabor. 
The posting guide > does > not support your views. > > It is not any of our views that are important but we are following the > posting guide. It covers affiliation. It says only that "some" consider > it > "good manners to include a concise signature specifying affiliation". It > does not agree that it is bad manners not to. It is therefore going too > far > to urge R-gurus, whoever they might be, to ignore such postings on that > basis alone. It is up to responders (I think that is the better word > which > is the one used by the posting guide) whether they reply. Missing > affiliation is ok by the posting guide. Users shouldn't be put off from > posting because of that alone. > > Sending from an anonymous email address such as "BioStudent" is also fine > by > the posting guide as far as my eyes read it. It says only that the email > address should work. I would also answer such anonymous posts, providing > they demonstrate they made best efforts to follow the posting guide, as > usual for all requests for help. Its so easy to send from a false, but > apparently real name, why worry about that? > > If you disagree with the posting guide then you could make a suggestion to > get the posting guide changed with respect to these points. But, > currently, > good and practice is defined by the posting guide, and I can't see that > your > view is backed up by it. In fact it seems to me that these points were > carefully considered, and the wording is careful on these points. > > As far as I know you are wrong that there is no moderator. There are in > fact an uncountable number of people who are empowered to moderate i.e. > all > of us. In other words its up to the responders to moderate. The posting > guide is our guide. As a last resort we can alert the list administrator > (which I believe is the correct name for him in that role), who has powers > to remove an email address from the list if he thinks that is appropriate, > or act otherwise, or not at all. 
It is actually up to responders (i.e. > all > of us) to ensure the posting guide is followed. > > My view is that the problems started with some responders on some > occasions. > They sometimes forgot, a little bit, to encourage and remind posters to > follow the posting guide when it was not followed. This then may have > encouraged more posters to think it was ok not to follow the posting > guide. > That is my own personal view, not a statistical one backed up by any > evidence. > > Matthew > > > "Frank E Harrell Jr" wrote in message > news:4b913880.9020...@vanderbil
Re: [R] fit a gamma pdf using Residual Sum-of-Squares
Thanks for making it quickly reproducible - I was able to see that message in English within a few seconds. The start has x=86, but the data is also called x. Remove x=86 from start and you get a different error. P.S. - please do include the R version information. It saves time for us, and we like it if you save us time. "vincent laperriere" wrote in message news:883644.16455...@web24106.mail.ird.yahoo.com... Hi all, I would like to fit a gamma pdf to my data using the method of RSS (Residual Sum-of-Squares). Here are the data: x <- c(86, 90, 94, 98, 102, 106, 110, 114, 118, 122, 126, 130, 134, 138, 142, 146, 150, 154, 158, 162, 166, 170, 174) y <- c(2, 5, 10, 17, 26, 60, 94, 128, 137, 128, 77, 68, 65, 60, 51, 26, 17, 9, 5, 2, 3, 7, 3) I have typed the following code, using the nls method: fit <- nls(y ~ (1/((s^a)*gamma(a))*x^(a-1)*exp(-x/s)), start = c(s=3, a=75)) But I get the following error message (sorry, this is in German): Fehler in qr(.swts * attr(rhs, "gradient")) : Dimensionen [Produkt 3] passen nicht zur Länge des Objektes [23] Zusätzlich: Warnmeldung: In .swts * attr(rhs, "gradient") : Länge des längeren Objektes ist kein Vielfaches der Länge des kürzeren Objektes [In English: Error in qr(.swts * attr(rhs, "gradient")): dimensions [product 3] do not match the length of the object [23]. In addition, warning message: longer object length is not a multiple of shorter object length.] Could anyone help me with the code? I would greatly appreciate it. Sincerely yours, Vincent Laperrière. [[alternative HTML version deleted]] > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
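A hedged sketch of the fix Matthew describes: `start` should name only the parameters to be estimated (s and a), never the predictor x, which nls() takes from the environment. Note that, as his reply says, removing x=86 from start merely produces "a different error" - the model may still fail to converge for other reasons (e.g. the density is not scaled to the count data) - so the call is wrapped in try():

```r
# Data from the post:
x <- c(86, 90, 94, 98, 102, 106, 110, 114, 118, 122, 126, 130, 134,
       138, 142, 146, 150, 154, 158, 162, 166, 170, 174)
y <- c(2, 5, 10, 17, 26, 60, 94, 128, 137, 128, 77, 68, 65, 60, 51,
       26, 17, 9, 5, 2, 3, 7, 3)
# `start` lists only the parameters -- not the data vector x:
fit <- try(nls(y ~ (1/((s^a)*gamma(a))*x^(a-1)*exp(-x/s)),
               start = c(s = 3, a = 75)),
           silent = TRUE)
```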
Re: [R] IMPORTANT - To remove the null elements from a vector
Welcome to R, Barbara. It's quite an incredible community from all walks of life. Your beginner questions are answered in the manual. See An Introduction to R. Please read the posting guide again because it contains lots of good advice for you. Some people read it three times before posting because they have so much respect for the community. Sometimes they trip up over themselves to show they have read it. Btw - just to let you know that starting your subject lines with "IMPORTANT" is considered by some people to be a demanding tone when asking for free help. Not everyone, but some people. Two posts starting with IMPORTANT within 5 minutes is another thing that a very large number of people around the world may have just seen you do. I'm just letting you know, in case you were not aware of this. You received answers from four people who clearly don't mind, and you have your answers. Was that your only goal in posting? Did you consider there might be downsides? This is a public list read by many people and one thing the posting guide says is that your questions are saved in the archives forever. Just checking you knew that. I wouldn't want you to reduce your reputation accidentally. A future employer (it might be a company, or it might be a university) anywhere in the world might do a simple search on your name, and that might be why you don't get an interview: you would have shown (in their minds) that you didn't have respect for guidelines. I would hate for something like that to happen, all just because you didn't know you were supposed to read the posting guide; it wouldn't be fair on you. So it would be very unfair of me to know that, and suspect that you don't, but not tell you about the posting guide, wouldn't it? I hope this information helps you. It is entirely up to you. r-help is a great way to increase your reputation, but it can reduce your reputation too. 
By asking great questions, or even contributing, you can proudly put that on your CV and increase your chances of getting that interview, or getting that position. I have seen on several CVs from students the text "please search for my name on r-help". r-help is just like everything else you do in public. What you write, you write in the public domain, and you write it free of charge, and free of restriction. All this applies to all of us. When asking for help, and when giving help. Matthew wrote in message news:of1a8063a1.fc14f5ff-onc12576e1.00466053-c12576e1.00466...@uniroma1.it... > > I have a vector that has null elements. How do I remove these elements? > For example: > x=[10 0 30 40 0 0] I want the vector y=[10 30 40] > Thanks > [[alternative HTML version deleted]] > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
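For completeness of the archive, the base-R answer the four responders presumably gave (assuming, as the example implies, that "null elements" means zero values; genuinely missing values would instead be dropped with `x[!is.na(x)]`):

```r
x <- c(10, 0, 30, 40, 0, 0)
y <- x[x != 0]   # logical indexing keeps only the non-zero elements
y
# [1] 10 30 40
```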
Re: [R] speed
Your choice of subject line alone shows some people that you missed some small details from the posting guide. The ability to notice small details may be important for you to demonstrate in future. Any answer in this thread is unlikely to be found by a topic search on subject lines alone since "speed" is a single word. One fast way to increase your reputation is to contribute. You now have an opportunity. If you follow Jim's good advice, discover the answer for yourself, and post it back to the group, changing the subject line so that it's easier for others to find in future, that's one way you can contribute and increase your reputation. If you don't do that, that's your choice. It is entirely up to you. Whatever action you take next (even doing nothing is an action), it is visible in public for everyone to search back and find within seconds. HTH "Adam Majewski" wrote in message news:hn6fp4$2g...@dough.gmane.org... > Hi, > > I have found some example R code: > http://commons.wikimedia.org/wiki/File:Mandelbrot_Creation_Animation_%28800x600%29.gif > > When I run this code on my computer it takes a few seconds. > > I wanted to make a similar program in Maxima CAS: > > http://thread.gmane.org/gmane.comp.mathematics.maxima.general/29949/focus=29968 > > for example: > > f(x,y,n) := > block([i:0, c:x+y*%i,ER:4,iMax:n,z:0], > while abs(z)<ER and i<iMax > do (z:z*z + c,i:i+1), > min(ER,abs(z)))$ > > wxanimate_draw3d( >n, 5, >enhanced3d=true, >user_preamble="set pm3d at b; set view map", >xu_grid=70, >yv_grid=70, >explicit('f(x,y,n), x, -2, 0.7, y, -1.2, 1.2))$ > > > But it takes so long to make even one image (hours) > > What makes the difference, and why is R so fast? > > Regards > > Adam > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
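Editor's note on the technical question buried in the thread (why does the R version take seconds while the Maxima version takes hours): the usual explanation is vectorization - idiomatic R Mandelbrot code updates the entire grid of points per iteration in compiled C-level operations, rather than looping point by point as the quoted Maxima function does. A minimal sketch of that idea (grid size and iteration count are illustrative only, not taken from the linked script):

```r
# Vectorized Mandelbrot iteration counts: the whole 70x70 grid of
# complex points is updated at once on each pass of the loop.
C <- outer(seq(-2, 0.7, length.out = 70),
           1i * seq(-1.2, 1.2, length.out = 70), "+")
Z <- matrix(0 + 0i, nrow(C), ncol(C))
n <- matrix(0L, nrow(C), ncol(C))   # iterations before escape
for (k in 1:20) {
  alive <- Mod(Z) < 2               # points that have not escaped yet
  Z[alive] <- Z[alive]^2 + C[alive]
  n[alive] <- n[alive] + 1L
}
```

Twenty passes over 4,900 points is 20 vector operations here, versus roughly 98,000 interpreted loop-body evaluations in the point-by-point version.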
Re: [R] Strange result in survey package: svyvar
This list is the wrong place for that question. The posting guide tells you, in bold, to contact the package maintainer first. If you had already done that, and didn't hear back from him, then you should tell us, so that we know you followed the guide. "Corey Sparks" wrote in message news:c7bd3ca5.206a%corey.spa...@utsa.edu... > Hi R users, > I'm using the survey package to calculate summary statistics for a large > health survey (the Demographic and Health Survey for Honduras, 2006), and > when I try to calculate the variances for several variables, I get > negative > numbers. I thought it may be my data, so I ran the example on the help > page: > > data(api) > ## one-stage cluster sample > dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc) > > svyvar(~api00+enroll+api.stu+api99, dclus1) > variance SE > api00 11182.8 1386.4 > api00 11516.3 1412.9 > api.stu -4547.1 3164.9 > api99 12735.2 1450.1 > > If I look at the full matrix for the variances (and covariances): > test<-svyvar(~api00+enroll+api.stu+api99, dclus1) > > print(test, covariance=T) > variance SE > api00:api00 11182.8 1386.4 > enroll:api00 -5492.4 3458.1 > api.stu:api00 -4547.1 3164.9 > api99:api00 11516.3 1412.9 > api00:enroll -5492.4 3458.1 > enroll:enroll 136424.3 41377.2 > api.stu:enroll 114035.7 34153.9 > api99:enroll -3922.3 3589.9 > api00:api.stu -4547.1 3164.9 > enroll:api.stu 114035.7 34153.9 > api.stu:api.stu 96218.9 28413.7 > api99:api.stu -3060.0 3260.9 > api00:api99 11516.3 1412.9 > enroll:api99 -3922.3 3589.9 > api.stu:api99 -3060.0 3260.9 > api99:api99 12735.2 1450.1 > > > I see that the function is actually returning the covariance for the > api.stu > with the api00 variable. > > I can get the correct variances if I just take > diag(test) > > But I just was wondering if anyone else was having this problem. 
I'm > using > : >> sessionInfo() > R version 2.10.1 Patched (2009-12-20 r50794) > x86_64-apple-darwin9.8.0 > > locale: > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] survey_3.19 > > loaded via a namespace (and not attached): > [1] tools_2.10.1 > > And have the same error on a linux server. > > Thanks, > Corey > -- > Corey Sparks > Assistant Professor > Department of Demography and Organization Studies > University of Texas at San Antonio > 501 West Durango Blvd > Monterey Building 2.270C > San Antonio, TX 78207 > 210-458-3166 > corey.sparks 'at' utsa.edu > https://rowdyspace.utsa.edu/users/ozd504/www/index.htm > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] leaps error
R version: 2.10.0 platform: i486-pc-linux-gnu I'm trying to perform model selection from a data.frame object (creatively named "data") using the leaps function, and I run across the following error: > leaps(data[,3:7], data[,1], nbest = 10) Error in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = NCOL(x) + int, : character variables must be duplicated in .C/.Fortran Here is a sample of what 'data' looks like:

   ethanol flask batch delabso delgluc delglyc ph
1     0.00     1     0    1.41     0.0     0.7  1
2     0.00     2     0    1.33     0.0     0.6  9
3     0.00     2     0    1.18     0.0     1.1  1
4     0.00     3     1    1.58     0.0     3.5  1
5     0.00     4     0    1.25     0.0     5.0  1
6     0.00     4     0   -0.01     0.0     5.0  1
7     0.32     5     0   -0.08     0.0     1.5  1
8     0.00     6     1    1.22     0.1     3.0  1
9     0.00     6     1    1.30     0.3     0.4  1
10    0.13     7     0    1.48     0.3     1.4  1

where flask, batch, and ph are factors and the rest of the variables are of class numeric. There are NAs in this data set. Does anyone understand this? Thanks Matt Scholz PhD Candidate Department of Ag. and Biosystems Engineering University of Arizona (520) 626-6947 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
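No answer to this question appears in this chunk of the archive, so a hedged editorial diagnosis: leaps() hands its predictor matrix straight to Fortran, which is where the "character variables must be duplicated in .C/.Fortran" message comes from, so every column must be numeric - but the data frame here contains factors (flask, batch, ph) plus NAs. A sketch of the usual workaround via model.matrix(), on invented stand-in data (variable names are hypothetical, not from the post):

```r
set.seed(1)
df <- data.frame(y    = rnorm(10),
                 num1 = rnorm(10),
                 num2 = rnorm(10),
                 fac  = factor(sample(letters[1:3], 10, replace = TRUE),
                               levels = letters[1:3]))
df <- na.omit(df)                      # drop rows with NAs first
# Expand the factor into numeric dummy columns and drop the intercept:
X <- model.matrix(~ num1 + num2 + fac, data = df)[, -1]
# leaps::leaps(X, df$y, nbest = 10)    # now every column of X is numeric
```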
Re: [R] Forecasting with Panel Data
Ricardo, I see you got no public answer so far, on either of the two lists you posted to at the same time yesterday. You are therefore unlikely to ever get a reply. I also see you've been having trouble getting answers in the past, back to Nov 09, at least. For example no reply to "Credit Migration Matrix" (Jan 2010) and no reply to "Help with a Loop in function" (Nov 2009). For your information, this is a public place and it took me about 10 seconds to assess you. Anyone else on the planet can do this too. Please read the posting guide AND the links from it, especially the last link. I suggest you read it fully, and slowly. I think it's just that you didn't know about it, or somehow missed it by accident. You were told to read it, though, at the time you subscribed to this list, at least. Don't worry, this is not a huge problem. You can build up your reputation again very quickly. With the kindest of regards, Matthew "Ricardo Gonçalves Silva" wrote in message news:df406bd9dbe644a9b8c0642a3c3f8...@ricardopc... > Dear Users, > > Can I perform out-of-sample forecasts from panel data (fixed effects model) > using R? > > Thanks in advance, > > Ricardo. > [[alternative HTML version deleted]] > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [R-SIG-Mac] How to interrupt an R process that hangs
Hi all, Thanks Simon and Duncan for the help. Sorry to be dense, but I'm still unsure how to interrupt such processes. Here's an example: for (i in 1:10){ a <- matrix(rnorm(10^5*10^5),ncol=10^5) b <- svd(a) } If you run this, R will hang (i.e., it's a legitimate execution, it will just take a really long time to execute). The most obvious solution is to write code that doesn't do unintended things, but that's not always possible. Is there a way to interrupt it? I tried: kill -s INT <PID> and at least on Mac it had no effect. Thanks again, Matt On Mon, Mar 15, 2010 at 1:19 PM, Simon Urbanek wrote: > > On Mar 15, 2010, at 14:42 , Adam D. I. Kramer wrote: > >> +1--this is the single most-annoying issue with R that I know of. >> >> My usual solution, after accomplishing nothing as R spins idly for a >> couple >> hours, is to kill the process and lose any un-saved work. save.history() >> is >> my friend, but is a big delay when you work with big data sets as I do, so >> I >> don't run it after every command. >> >> I have cc'd r-help here, however, because I experience this problem with >> non-OSX R as well...when I run it in Linux or from the OSX command-line (I >> compile R for Darwin without aqua/R-framework), the same thing happens. >> >> Is there some way around this? Is this a known problem? >> > > "Hanging" for a long period of time is usually caused by poorly written > C/Fortran code. You can always interrupt R as long as it is in the R code. > Once you load a package that uses native code (C/Fortran/..) you have to > rely on the sanity of the developer to call R_CheckUserInterrupt() or > rchkusr() often enough (see 6.12 in R-ext). If you have some particular > package that does not do that, I would suggest alerting the author. By > definition this requires cooperation from authors, because interrupting > random code forcefully (as it was possible many years ago) creates leaks and > unstable states. 
> > Cheers, > Simon > > > >> Google searching suggests no solution, timeline, or anything, but the >> problem has been annoying users for at least twelve years: >> http://tolstoy.newcastle.edu.au/R/help/9704/0151.html >> >> Cordially, >> Adam >> >> On Mon, 15 Mar 2010, Matthew Keller wrote: >> >>> HI all, >>> >>> Apologies for this question. I'm sure it's been asked many times, but >>> despite 20 minutes of looking, I can't find the answer. I never use >>> the GUI, I use emacs, but my postdoc does, so I don't know what to >>> tell her about the following: >>> >>> Occasionally she'll mess up in her code and cause R to hang >>> indefinitely (e.g., R is trying to do something that will take days). >>> In these situations, is there an option other than killing R (and the >>> work you've done on your script to that point)? >>> >>> Thank you, >>> >>> Matthew Keller >>> >>> >>> -- >>> Matthew C Keller >>> Asst. Professor of Psychology >>> University of Colorado at Boulder >>> www.matthewckeller.com >>> >>> ___ >>> R-SIG-Mac mailing list >>> r-sig-...@stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac >>> >> >> ___ >> R-SIG-Mac mailing list >> r-sig-...@stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/r-sig-mac >> >> > > -- Matthew C Keller Asst. Professor of Psychology University of Colorado at Boulder www.matthewckeller.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [R-SIG-Mac] How to interrupt an R process that hangs
Hi all, Thanks for the responses. Ted - thank you for your help. I had to laugh. I'm no computer guru, but I do know unix well enough to know not to type "<PID>". But then again, my original code did contain a matrix with > 2^31-1 elements, so maybe your assumption was reasonable ;) Anyway, all your kill statements merely kill R, script included, which doesn't really do what I'd like. Thus, summary of responses: Question: "How do I interrupt an R process that's taking too long?" Answer: "You don't. Kill R. And don't make mistakes." Matthew On Mon, Mar 15, 2010 at 2:49 PM, Ted Harding wrote: > [Though I'm not using a Mac, OS X is a Unix variant and should > have the commands used below installed] > > Did you *literally* do > kill -s INT <PID> > without substituting the R PID for "<PID>"? In a Mac console, do > > ps aux | grep R > > On my Linux machine this currently responds with (amongst some > irrelevant lines): > > ted 8625 0.0 3.2 41568 34096 pts/6 S+ Mar13 0:07 > /usr/lib/R/bin/exec/R --no-save > > showing that the PID of the R process is 8625. Then you can do > whatever corresponds to > > kill -s INT 8625 > > (replacing "8625" with what you get from ps). However, when I > just tried it, it didn't work for me either. So I changed the > Signal from "INT" to "HUP", and this time it did work. Maybe > try this instead? > > Other ways of using 'kill' include > (a) Use the signal number (1 for HUP, 2 for INT) like > > kill -1 8625 or kill -2 8625 > > (b) Don't search for the numeric Process ID (PID) but kill it > by name ('killall' command): > > killall -1 R or killall -2 R > > However, this will kill every running instance of R (if you have > two or more running simultaneously), and you may not want that! > > Hoping this helps, > Ted. > > > > On 15-Mar-10 20:20:29, Matthew Keller wrote: >> Hi all, >> >> Thanks Simon and Duncan for the help. Sorry to be dense, but I'm still >> unsure how to interrupt such processes. 
Here's an example: >> >> for (i in 1:10){ >> a <- matrix(rnorm(10^5*10^5),ncol=10^5) >> b <- svd(a) } >> >> If you run this, R will hang (i.e., it's a legitimate execution, it >> will just take a really long time to execute). The most obvious >> solution is to write code that doesn't do unintended things, but >> that's not always possible. Is there a way to interrupt it? I tried: >> >> kill -s INT <PID> >> >> and at least on Mac it had no effect. Thanks again, >> >> Matt >> >> >> >> On Mon, Mar 15, 2010 at 1:19 PM, Simon Urbanek >> wrote: >>> >>> On Mar 15, 2010, at 14:42 , Adam D. I. Kramer wrote: >>> >>>> +1--this is the single most-annoying issue with R that I know of. >>>> >>>> My usual solution, after accomplishing nothing as R spins idly for a >>>> couple >>>> hours, is to kill the process and lose any un-saved work. >>>> save.history() >>>> is >>>> my friend, but is a big delay when you work with big data sets as I >>>> do, so >>>> I >>>> don't run it after every command. >>>> >>>> I have cc'd r-help here, however, because I experience this problem >>>> with >>>> non-OSX R as well...when I run it in Linux or from the OSX >>>> command-line (I >>>> compile R for Darwin without aqua/R-framework), the same thing >>>> happens. >>>> >>>> Is there some way around this? Is this a known problem? >>>> >>> >>> "Hanging" for a long period of time is usually caused by poorly >>> written >>> C/Fortran code. You can always interrupt R as long as it is in the R >>> code. >>> Once you load a package that uses native code (C/Fortran/..) you have >>> to >>> rely on the sanity of the developer to call R_CheckUserInterrupt() or >>> rchkusr() often enough (see 6.12 in R-ext). If you have some >>> particular >>> package that does not do that, I would suggest alerting the author. By >>> definition this requires cooperation from authors, because >>> interrupting >>> random code forcefully (as it was possible many years ago) creates >>> leaks and >>> unstable states. 
>>> >>> Cheers, >>> Simon >>> >>> >>> >>>> Google searching suggests no solut
[R] Problem specifying Gamma distribution in lme4/glmer
Dear R and lme4 users- I am trying to fit a mixed-effects model, with the glmer function in lme4, to right-skewed, zero-inflated, non-normal data representing understory grass and forb biomass (continuous) as a function of tree density (indicated by leaf-area). Thus, I have tried to specify a Gamma distribution with a log-link function but consistently receive an error as follows: > total=glmer(total~gla4+(1|plot)+(1|year/month),data=veg,family=Gamma(link=log)) > summary(total) Error in asMethod(object) : matrix is not symmetric [1,2] I have also tried fitting glmm's with lme4 and glmer to other Gamma-distributed data but receive the same error. Has anyone had similar problems and found any solutions? Thank you for your input. Best regards, ___ Matt Giovanni, Ph.D. NSERC Visiting Research Fellow Canadian Wildlife Service 2365 Albert St., Room 300 Regina, SK S4P 4K1 306-780-6121 work 402-617-3764 mobile http://sites.google.com/site/matthewgiovanni/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merge data frame and keep unmatched
Or if you need it to be fast, try data.table. X[Y] is a join when X and Y are both data.tables. X[Y] is a left join, Y[X] is a right join. 'nomatch' controls the inner/outer join, i.e. what happens for unmatched rows. This is much faster than merge(). "Gabor Grothendieck" wrote in message news:971536df0906100704q433f5f99ld3f9c23e69d95...@mail.gmail.com... Try: merge(completedf, partdf, all.x = TRUE) or library(sqldf) # see http://sqldf.googlecode.com sqldf("select * from completedf left join partdf using(beta, alpha)") On Wed, Jun 10, 2009 at 9:56 AM, Etienne B. Racine wrote: > > Hi, > > With two data sets, one complete and another one partial, I would like to > merge them and keep the unmatched lines. The problem is that merge() > doesn't > keep the unmatched lines. Is there another function that I could use to > merge the data frames? > > Example: > > completedf <- expand.grid(alpha=letters[1:3],beta=1:3) > partdf <- data.frame( > alpha= c('a','a','c'), > beta = c(1,3,2), > val = c(2,6,4)) > > mergedf <- merge(x=completedf, y=partdf, by=c('alpha','beta')) > # it only kept the common rows > nrow(mergedf) > > Thanks, > Etienne > -- > View this message in context: > http://www.nabble.com/Merge-data-frame-and-keep-unmatched-tp23962874p23962874.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
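A short sketch of the X[Y] join syntax described above, applied to Etienne's example data (data.table package required; exact output formatting may differ across versions of the package, and `nomatch=0` has since gained `nomatch=NULL` as a synonym):

```r
library(data.table)
# Keyed data.tables; the key columns are what X[Y] joins on.
completedf <- data.table(expand.grid(alpha = letters[1:3], beta = 1:3,
                                     stringsAsFactors = FALSE),
                         key = c("alpha", "beta"))
partdf <- data.table(alpha = c("a", "a", "c"), beta = c(1L, 3L, 2L),
                     val = c(2, 6, 4), key = c("alpha", "beta"))
partdf[completedf]                # all 9 rows kept; unmatched val is NA
partdf[completedf, nomatch = 0]   # inner join: only the 3 matched rows
```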
Re: [R] How to order an data.table by values of an column?
If the question really meant to say "data.table" (i.e. package "data.table") then it's easier than the data.frame answer. dt = data.table(Categ=c(468,351,0,234,117),Perc=c(31.52,27.52,0.77,22.55,15.99)) dt[order(Categ)] Notice there is no dt$ required before Categ. Also note the comma is optional. See help("[.data.table") Another example : dt[Categ>300,cumsum(Perc+Categ)] [1] 499.52 878.04 That's it. The i and the j are evaluated within the data.table, i.e. you can use column names as variables in expressions, like a built-in with() and subset(). A join between 2 data.tables X and Y is just X[Y]. This is much faster than merge(). "Allan Engelhardt" wrote in message news:4a309f8e.4000...@cybaea.com... > See help("order") and help("[.data.frame"). > > > df <- > data.frame(Categ=c(468,351,0,234,117),Perc=c(31.52,27.52,0.77,22.55,15.99)) > df[order(df$Categ),] > # Categ Perc > # 3 0 0.77 > # 5 117 15.99 > # 4 234 22.55 > # 2 351 27.52 > # 1 468 31.52 > > > Lesandro wrote: >> Hello! >> >> Can you help me? How do I order a data.table by the values of a column? >> >> For example: >> >> Initial table >> >> Categ Perc >> 468 31.52 >> 351 27.52 >> 0 0.77 >> 234 22.55 >> 117 15.99 >> >> Final table >> >> Categ Perc >> 0 0.77 >> 117 15.99 >> 234 22.55 >> 351 27.52 >> 468 31.52 >> >> Lesandro >> >> [[alternative HTML version deleted]] >> >> >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
[R] Memory errors when using QCA package
Hi, I have been using the QCA package, in particular the "eqmcc" function and I am having some issues when trying to use this to minimise a particular boolean function. The boolean function in question has 16 variables, and I am providing the full truth table for the function (65536 with 256 true entries), in the following way : library(QCA) func_tt = read.table("func.tt",header=TRUE) eqmcc(func_tt, outcome="O", expl.0=TRUE) However, after calculating for a little while, the system throws up a memory error : Error in vector("double", length) : cannot allocate vector of length 2130706560 However, looking at the memory usage, I seem to have far more than 2GB free. Is there some kind of built-in limit on the size of the heap in R? If so, is there some way I can extend this? Does anyone have any insight into this? Perhaps I am doing something stupid? Thanks Matthew __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
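A note on the error itself (my arithmetic, not from the thread): the message reports the failure of a single allocation, not that total memory is exhausted, so having more than 2 GB free does not help:

```r
# a double vector of the reported length would need roughly:
2130706560 * 8 / 1024^3   # about 15.9 GiB for one allocation
```

No heap setting rescues that on a 32-bit build; the computation itself has to be made smaller.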
Re: [R] If else statements
Here are some references. Please read these first and post again if you are still stuck after reading them. If you do post again, we will need x and y. 1. Introduction to R : 9.2.1 Conditional execution: if statements. 2. R Language Definition : 3.2 Control structures. 3. R for beginners by E Paradis : 6.1 Loops and vectorization 4. Eric Raymond's essay "How to Ask Questions The Smart Way" http://www.catb.org/~esr/faqs/smart-questions.html. HTH Matthew "tj" wrote in message news:1269325933723-1678705.p...@n4.nabble.com... > > Hi everyone! > May I request again for your help? > I need to make some codes using if else statements... > Can I do an "if-else statement" inside an "if-else statement"? Is this the > correct form of writing it? > Thank you.=) > > Example: > > for (v in 1:6) { > for (i in 2:200) { > if (v==1) > (if max(x*v-y*v)>1 break()) > > if (v==2) > (if max(x*v-y*v)>1.8 break()) > > if (v==3) > (if max(x*v-y*v)>2 break()) > } > } > -- > View this message in context: > http://n4.nabble.com/If-else-statements-tp1678705p1678705.html > Sent from the R help mailing list archive at Nabble.com. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
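For the record, tj's loop as posted will not parse: in R the condition of `if` must be in parentheses. A minimal corrected sketch of nested if/else with `break` — the vectors `x` and `y` are invented here, since none were supplied in the original post:

```r
# x and y are placeholders -- the original post did not define them
x <- seq(0.1, 2, length.out = 200)
y <- seq(0.05, 1, length.out = 200)

for (v in 1:3) {
  for (i in 2:200) {
    if (v == 1) {
      if (max(x * v - y * v) > 1) break      # condition needs parentheses
    } else if (v == 2) {
      if (max(x * v - y * v) > 1.8) break    # 'break' needs no parentheses
    } else if (v == 3) {
      if (max(x * v - y * v) > 2) break
    }
  }
}
```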
Re: [R] Mosaic
When you click search on the R homepage, type "mosaic" into the box, and click the button, do the top 3 links seem relevant ? Your previous 2 requests for help : 26 Feb : Response was SuppDists. Yet that is the first hit returned by the subject line you posted : "Hartleys table" 22 Feb : Response was shapiro.test. Yet that is in the second hit returned by the subject line you posted : "normality in split plot design" Spot the pattern ? "Silvano" wrote in message news:a9322645c4f846a3a6a9daaa8b5a2...@ccepc... Hi, I have this data set: obitoss = c( 5.8,17.4,5.9,17.6,5.8,17.5,4.7,15.8, 3.8,13.4,3.8,13.5,3.7,13.4,3.4,13.6, 4.4,17.3,4.3,17.4,4.2,17.5,4.3,17.0, 4.4,13.6,5.1,14.6,5.7,13.5,3.6,13.3, 6.5,19.6,6.4,19.4,6.3,19.5,6.0,19.7) (dados = data.frame( regiao = factor(rep(c('Norte', 'Nordeste', 'Sudeste', 'Sul', 'Centro-Oeste'), each=8)), ano = factor(rep(c('2000','2001','2002','2003'), each=2)), sexo = factor(rep(c('F','M'), 4)), resp=obitoss)) I would like to make a mosaic to represent the numeric variable depending on 3 variables. Does anyone know how to do? -- Silvano Cesar da Costa Departamento de Estatística Universidade Estadual de Londrina Fone: 3371-4346 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
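One possible answer, as a hedged sketch: sum `resp` over the three factors with `xtabs()` and pass the 3-way table to base R's `mosaicplot()`. Whether this is the display Silvano wanted is an assumption:

```r
obitoss <- c(5.8,17.4,5.9,17.6,5.8,17.5,4.7,15.8,
             3.8,13.4,3.8,13.5,3.7,13.4,3.4,13.6,
             4.4,17.3,4.3,17.4,4.2,17.5,4.3,17.0,
             4.4,13.6,5.1,14.6,5.7,13.5,3.6,13.3,
             6.5,19.6,6.4,19.4,6.3,19.5,6.0,19.7)
dados <- data.frame(
  regiao = rep(c('Norte','Nordeste','Sudeste','Sul','Centro-Oeste'), each = 8),
  ano    = rep(c('2000','2001','2002','2003'), each = 2),
  sexo   = c('F','M'),
  resp   = obitoss)

# 3-way table of summed resp, drawn as a mosaic
mosaicplot(xtabs(resp ~ regiao + ano + sexo, data = dados),
           main = "resp by regiao, ano and sexo", color = TRUE)
```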
Re: [R] translating SQL statements into data.table operations
Nick, Good question, but just sent to the wrong place. The posting guide asks you to contact the package maintainer first before posting to r-help only if you don't hear back. I guess one reason for that is that if questions about all 2000+ packages were sent to r-help, then r-help's traffic could go through the roof. Another reason could be that some (i.e. maybe many, maybe few) package maintainers don't actually monitor r-help and might miss any messages you post here. I only saw this one thanks to google alerts. Since I'm writing anyway ... are you using the latest version on r-forge which has the very fast grouping? Have you set multi-column keys on both edt and cdt and tried edt[cdt,roll=TRUE] syntax ? We'll help you off list to climb the learning curve quickly. We are working on FAQs and a vignette and they should be ready soon too. Please do follow up with us (myself and Tom Short cc'd are the main developers) off list and one of us will be happy to help further. Matthew "Nick Switanek" wrote in message news:772ec1011003241351v6a3f36efqb0b0787564691...@mail.gmail.com... > I've recently stumbled across data.table, Matthew Dowle's package. I'm > impressed by the speed of the package in handling operations with large > data.frames, but am a bit overwhelmed with the syntax. I'd like to express > the SQL statement below using data.table operations rather than sqldf > (which > was incredibly slow for a small subset of my financial data) or > import/export with a DBMS, but I haven't been able to figure out how to do > it. I would be grateful for your suggestions. > > nick > > > > My aim is to join events (trades) from two datasets ("edt" and "cdt") > where, > for the same stock, the events in one dataset occur between 15 and 75 days > before the other, and within the same time window. I can only see how to > express the "WHERE e.SYMBOL = c.SYMBOL" part in data.table syntax. 
I'm > also > at a loss at whether I can express the remainder using data.table's > %between% operator or not. > > ctqm <- sqldf("SELECT e.*, > c.DATE 'DATEctrl', > c.TIME 'TIMEctrl', > c.PRICE 'PRICEctrl', > c.SIZE 'SIZEctrl' > > FROM edt e, ctq c > > WHERE e.SYMBOL = c.SYMBOL AND > julianday(e.DATE) - julianday(c.DATE) BETWEEN 15 AND > 75 AND > strftime('%H:%M:%S',c.TIME) BETWEEN > strftime('%H:%M:%S',e.BEGTIME) AND strftime('%H:%M:%S',e.ENDTIME)") > > [[alternative HTML version deleted]] > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
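For readers finding this thread later: data.table gained non-equi joins in version 1.9.8 (2016), long after this exchange, which express exactly this kind of WHERE clause. A hypothetical sketch on toy stand-ins for edt and ctq (column names assumed from the SQL above; not the 2010 syntax the thread discusses):

```r
library(data.table)

# toy stand-ins -- the real column set is assumed from the SQL above
edt <- data.table(SYMBOL = "IBM", DATE = as.IDate("2009-06-01"),
                  BEGTIME = as.ITime("09:30:00"), ENDTIME = as.ITime("10:00:00"))
ctq <- data.table(SYMBOL = "IBM",
                  DATE   = as.IDate(c("2009-04-01", "2009-05-25")),
                  TIME   = as.ITime(c("09:45:00", "11:00:00")),
                  PRICE  = c(100, 101), SIZE = c(10, 20))

# e.DATE - c.DATE between 15 and 75  =>  c.DATE in [e.DATE-75, e.DATE-15]
edt[, `:=`(lodate = DATE - 75L, hidate = DATE - 15L)]
ctqm <- ctq[edt,
            on = .(SYMBOL, DATE >= lodate, DATE <= hidate,
                   TIME >= BEGTIME, TIME <= ENDTIME),
            nomatch = 0]
```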
Re: [R] NA values in indexing
The type of 'NA' is logical. So x[NA] behaves more like x[TRUE] i.e. silent recycling. > class(NA) [1] "logical" > x=101:108 > x[NA] [1] NA NA NA NA NA NA NA NA > x[c(TRUE,NA)] [1] 101 NA 103 NA 105 NA 107 NA > x[as.integer(NA)] [1] NA HTH Matthew "Barry Rowlingson" wrote in message news:d8ad40b51003260509y6b671e53o9f79142d2b52c...@mail.gmail.com... If you index a vector with a vector that has NA in it, you get NA back: > x=101:107 > x[c(NA,4,NA)] [1] NA 104 NA > x[c(4,NA)] [1] 104 NA All well and good. ?"[" says, under NAs in indexing: When extracting, a numerical, logical or character NA index picks an unknown element and so returns NA in the corresponding element of a logical, integer, numeric, complex or character result, and NULL for a list. (It returns 00 for a raw result.] But if the indexing vector is all NA, you get back a vector of length of your original vector rather than of your index vector: > x[c(NA,NA)] [1] NA NA NA NA NA NA NA Maybe it's just me, but I find this surprising, and I can't see it documented. Bug or undocumented feature? Apologies if I've missed something obvious. Barry sessionInfo() R version 2.11.0 alpha (2010-03-25 r51407) i686-pc-linux-gnu locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
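A small addendum (mine, not from the thread): if the length-of-index behaviour is what you want, coerce the NA index away from logical before subsetting:

```r
x <- 101:107
x[as.integer(c(NA, NA))]  # NA NA -- length 2, not length(x)
x[which(c(NA, NA))]       # integer(0) -- which() drops NA/FALSE positions
```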
Re: [R] Combing
Val, Type "combine two data sets" (text you wrote in your post) into www.rseek.org. The first two links are: "Quick-R: Merge" and "Merging data: A tutorial". Isn't it quicker for you to use rseek, rather than the time it takes to write a post and wait for a reply ? Don't you also get more detailed information that way too ? You already received advice from others on this list to look at www.rseek.org on 26 Oct, package 'sos' on 27 Oct, and to 'read the manuals and FAQs before posting' on 5 Nov. This month you have posted 3 times : "Loop", "Renumbering" and "Combing". References : 1. Posting Guide headings : "Do your homework before posting" and "Further resources" 2. Contributed Documentation e.g. 'R Reference Card' by Tom Short http://cran.r-project.org/doc/contrib/Short-refcard.pdf. 3. Eric Raymond's essay http://www.catb.org/~esr/faqs/smart-questions.html. e.g. you posted to r-help 10 times so far, 9 of the 10 subjects were either a single word, or a single function name. HTH Matthew "Val" wrote in message news:cdc083ac1003290413s7e047e25lc4202568af119...@mail.gmail.com... > Hi all, > > I want to combine two data sets (ZA and ZB to get ZAB). > The common variable between the two data sets is ID. > > Data ZA > ID F M > 1 0 0 > 2 0 0 > 3 1 2 > 4 1 0 > 5 3 2 > 6 5 4 > > Data ZB > > ID v1 v2 v3 > 3 2.5 3.4 302 > 4 8.6 2.9 317 > 5 9.7 4.0 325 > 6 7.5 1.9 296 > > Output (ZAB) > > ID F M v1 v2 v3 > 1 0 0 -9 -9 -9 > 2 0 0 -9 -9 -9 > 3 1 2 2.5 3.4 302 > 4 1 0 8.6 2.9 317 > 5 3 2 9.7 4.0 325 > 6 5 4 7.5 1.9 296 > > Any help is highly appreciated in advance, > > Val > > [[alternative HTML version deleted]] > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
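Rseek aside, the answer itself is a one-liner plus a fill step. A sketch using Val's data — the -9 fill value is taken from his expected output:

```r
ZA <- data.frame(ID = 1:6,
                 F  = c(0, 0, 1, 1, 3, 5),
                 M  = c(0, 0, 2, 0, 2, 4))
ZB <- data.frame(ID = 3:6,
                 v1 = c(2.5, 8.6, 9.7, 7.5),
                 v2 = c(3.4, 2.9, 4.0, 1.9),
                 v3 = c(302, 317, 325, 296))

ZAB <- merge(ZA, ZB, by = "ID", all.x = TRUE)  # keep unmatched ZA rows
ZAB[is.na(ZAB)] <- -9                          # recode the gaps as -9
```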
[R] single quotes and double quotes in a system() command. What to do?
Hi all, I would like to run the following from within R: awk '{$3=$4="";gsub(" ","");print}' myfile > outfile However, this obviously won't work: system("awk '{$3=$4="";gsub(" ","");print}' myfile > outfile") and this won't either: system("awk '{$3=$4='';gsub(' ','');print}' myfile > outfile") Can anyone help me out? I'm sure there's an obvious solution. Thanks, Matt -- Matthew C Keller Asst. Professor of Psychology University of Colorado at Boulder www.matthewckeller.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
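For what it's worth, the usual fix is to escape the inner double quotes with backslashes, or to let `shQuote()` add the single quotes around the awk program (both untested against Matt's actual file):

```r
# escape the double quotes inside the double-quoted R string
system("awk '{$3=$4=\"\"; gsub(\" \",\"\"); print}' myfile > outfile")

# or keep the awk program readable and let shQuote() add the single quotes
prog <- '{$3=$4="";gsub(" ","");print}'
system(paste("awk", shQuote(prog), "myfile > outfile"))
```

The second form works because the program contains no single quotes, so `shQuote()` can wrap it in them unchanged.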
Re: [R] Error "grid must have equal distances in each direction"
M Joshi, I don't know but I guess that some might have looked at your previous thread on 14 March (also about the geoR package). You received help and good advice then, but it doesn't appear that you are following it. It appears to be a similar problem this time. Also, this list is the wrong place for that question. Please read the posting guide to find out the correct place. Its a question about a package. HTH, Matthew "maddy" wrote in message news:1269974076132-1745651.p...@n4.nabble.com... > > Hello All, > > Can anyone please help me on this error? > > Error in FUN(X[[1L]], ...) : > different grid distances detected, but the grid must have equal distances > in each direction -- try gridtriple=TRUE that avoids numerical errors. > > The program that I am trying to run posted in the previous post of this > thread.After the rows 1021 of my matrix of size 1024*1024, I start getting > all the values as 0s. > How to set the gridtriple as I am using the grf function which does not > take > this parameter as input. > > The maximum vector limit that can be reached in 'R' is 2^30, why does it > not > allow me to create arrays of length even of size 2^17? > > Thanks, > M Joshi > -- > View this message in context: > http://n4.nabble.com/Error-grid-must-have-equal-distances-in-each-direction-tp1695189p1745651.html > Sent from the R help mailing list archive at Nabble.com. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question about 'logit' and 'mlogit' in Zelig
Abraham, This appears to be your 3rd unanswered post to r-help in March, all 3 have been about the Zelig package. Please read the posting guide and find out the correct place to send questions about packages. Then you might get an answer. HTH Matthew "Mathew, Abraham T" wrote in message news:281f7a5fdfef844696011cb21185f8ac0be...@mailbox-11.home.ku.edu... I'm running a multinomial logit in R using the Zelig packages. According to str(trade962a), my dependent variable is a factor with three levels. When I run the multinomial logit I get an error message. However, when I run 'model=logit' it works fine. any ideas on whats wrong? ## MULTINOMIAL LOGIT anes96two <- zelig(trade962a ~ age962 + education962 + personal962 + economy962 + partisan962 + employment962 + union962 + home962 + market962 + race962 + income962, model="logit", data=data96) summary(anes96two) #Error in attr(tt, "depFactors")$depFactorVar : # $ operator is invalid for atomic vectors ## LOGIT Call: zelig(formula = trade962a ~ age962 + education962 + personal962 + economy962 + partisan962 + employment962 + union962 + home962 + market962 + race962 + income962, model = "logit", data = data96) Deviance Residuals: Min 1Q Median 3Q Max -2.021 -1.179 0.764 1.032 1.648 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -0.697675 0.600991 -1.161 0.2457 age962 0.003235 0.004126 0.784 0.4330 education962 -0.065198 0.038002 -1.716 0.0862 . personal9620.006827 0.072421 0.094 0.9249 economy962-0.200535 0.084554 -2.372 0.0177 * partisan9620.092361 0.079005 1.169 0.2424 employment962 -0.009346 0.044106 -0.212 0.8322 union962 -0.016293 0.149887 -0.109 0.9134 home962 -0.150221 0.133685 -1.124 0.2611 market962 0.292320 0.128636 2.272 0.0231 * race9620.205828 0.094890 2.169 0.0301 * income962 0.263363 0.048275 5.455 4.89e-08 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 1841.2 on 1348 degrees of freedom Residual deviance: 1746.3 on 1337 degrees of freedom (365 observations deleted due to missingness) AIC: 1770.3 Number of Fisher Scoring iterations: 4 Thanks Abraham __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] zero standard errors with geeglm in geepack
You may not have got an answer because you posted to the wrong place. Its a question about a package. Please read the posting guide. "miriza" wrote in message news:1269886286228-1695430.p...@n4.nabble.com... > > Hi! > > I am using geeglm to fit a Poisson model to a timeseries of count data as > follows. Since there are no clusters I use 73 values of 1 for the ids. > The > problem I have is that I am getting standard errors of zero for the > parameters. What am I doing wrong? > Thanks, Michelle >> N_Base > [1] 95 85 104 88 102 104 91 88 85 115 96 83 91 107 96 116 118 > 103 > 89 88 101 117 82 80 83 103 115 119 95 90 82 91 108 115 93 96 72 > [38] 98 95 98 97 104 86 107 92 94 95 100 107 76 104 101 80 102 > 100 > 91 96 89 71 109 97 113 99 127 115 91 81 73 69 92 90 78 57 >> Year > [1] 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 > 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 > 1961 > [31] 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 > 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 > 1991 > [61] 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2006 > > tes=geese(formula = N_Base ~ Year, id = rep(1, 73), family = "poisson", > corstr = "ar1") >> summary(tes) > > Call: > geese(formula = N_Base ~ Year, id = rep(1, 73), family = "poisson", >corstr = "ar1") > > Mean Model: > Mean Link: log > Variance to Mean Relation: poisson > > Coefficients: >estimate san.se wald p > (Intercept) 7.1131 0 Inf 0 > Year -0.0013 0 Inf 0 > > Scale Model: > Scale Link:identity > > Estimated Scale Parameters: >estimate san.se wald p > (Intercept) 1.79 0 Inf 0 > > Correlation Model: > Correlation Structure: ar1 > Correlation Link: identity > > Estimated Correlation Parameters: > estimate san.se wald p > alpha0.187 0 Inf 0 > > Returned Error Value:0 > Number of clusters: 1 Maximum cluster size: 73 > > -- > View this message in context: > 
http://n4.nabble.com/zero-standard-errors-with-geeglm-in-geepack-tp1695430p1695430.html > Sent from the R help mailing list archive at Nabble.com. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] GEE for a timeseries of count (one cluster)
Contact the authors of those packages ? "miriza" wrote in message news:1269981675252-1745896.p...@n4.nabble.com... > > Hi! > > I was wondering if there were any packages that would allow me to fit a > GEE > to a single timeseries of counts so that I could account for > autocorrelation > in the data. I tried gee, geepack and yags packages, but I do not get > standard errors for the parameters when using a single cluster. Any tips? > > Thanks, Michelle > -- > View this message in context: > http://n4.nabble.com/GEE-for-a-timeseries-of-count-one-cluster-tp1745896p1745896.html > Sent from the R help mailing list archive at Nabble.com. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] mcmcglmm starting value example
Apparently not, since this your 3rd unanswered thread to r-help this month about this package. Please read the posting guide and find out where you should send questions about packages. Then you might get an answer. "ping chen" wrote in message news:975148.47160...@web15304.mail.cnb.yahoo.com... > Hi R-users: > > Can anyone give an example of giving starting values for MCMCglmm? > I can't find any anywhere. > I have 1 random effect (physicians, and there are 50 of them) > and family="ordinal". > > How can I specify starting values for my fixed effects? It doesn't seem to > have the option to do so. > > Thanks, Ping > > > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] GLM / large dataset question
Geelman, This appears to be your first post to this list. Welcome to R. Nearly 2 days is quite a long time to wait though, so you are unlikely to get a reply now. Feedback : the question seems quite vague and imprecise. It depends on which R you mean (32bit/64bit) and how much ram you have. It also depends on your data and what you want to do with it. Did you mean 100.000 (i.e. one hundred) or 100,000. Also, '8000 explanatory variables' seems a lot, especially to be stored in 'a factor'. There is no R code in your post so we can't tell if you're using glm correctly or not. You could provide the result of object.size(), and dim() on your data rather than explaining it in words. No reply often, but not always, means you haven't followed some detail of the posting guide or haven't followed this : http://www.catb.org/~esr/faqs/smart-questions.html. HTH Matthew "geelman" wrote in message news:mkedkcmimcmgohidffmbieklcaaa.geel...@zonnet.nl... > LS, > > How large a dataset can glm fit with a binomial link function? I have a > set > of about 100.000 observations and about 8000 explanatory variables (a > factor > with 8000 levels). > > Is there a way to find out how large datasets R can handle in general? > > > > Thanks in advance, > > > geelman > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
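To make the feedback concrete: a factor with 8000 levels expands to roughly 8000 dummy columns in glm's dense model matrix, so the size question answers itself with a little arithmetic (a rough estimate of mine, ignoring the extra copies glm makes along the way):

```r
n <- 100000   # observations (assuming 100,000 was meant, not 100)
p <- 8000     # dummy columns from the 8000-level factor
n * p * 8 / 1024^3   # ~5.96 GiB of doubles for the model matrix alone
```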
Re: [R] Adding RcppFrame to RcppResultSet causes segmentation fault
Rob, Please look again at Romain's reply to you on 19th March. He informed you then that Rcpp has its own dedicated mailing list and he gave you the link. Matthew "R_help Help" wrote in message news:ad1ead5f1003291753p68d6ed52q572940f13e1c0...@mail.gmail.com... > Hi, > > I'm a bit puzzled. I uses exactly the same code in RcppExamples > package to try adding RcppFrame object to RcppResultSet. When running > it gives me segmentation fault problem. I'm using gcc 4.1.2 on redhat > 64bit. I'm not sure if this is the cause of the problem. Any advice > would be greatly appreciated. Thank you. > > Rob. > > > int numCol=4; > std::vector colNames(numCol); > colNames[0] = "alpha"; // column of strings > colNames[1] = "beta"; // column of reals > colNames[2] = "gamma"; // factor column > colNames[3] = "delta"; // column of Dates > RcppFrame frame(colNames); > > // Third column will be a factor. In the current implementation the > // level names are copied to every factor value (and factors > // in the same column must have the same level names). The level names > // for a particular column will be factored out (pardon the pun) in > // a future release. > int numLevels = 2; > std::string *levelNames = new std::string[2]; > levelNames[0] = std::string("pass"); // level 1 > levelNames[1] = std::string("fail"); // level 2 > > // First row (this one determines column types). > std::vector row1(numCol); > row1[0].setStringValue("a"); > row1[1].setDoubleValue(3.14); > row1[2].setFactorValue(levelNames, numLevels, 1); > row1[3].setDateValue(RcppDate(7,4,2006)); > frame.addRow(row1); > > // Second row. 
> std::vector row2(numCol); > row2[0].setStringValue("b"); > row2[1].setDoubleValue(6.28); > row2[2].setFactorValue(levelNames, numLevels, 1); > row2[3].setDateValue(RcppDate(12,25,2006)); > frame.addRow(row2); > > RcppResultSet rs; > rs.add("PreDF", frame); > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Adding RcppFrame to RcppResultSet causes segmentation fault
He could have posted into this thread then at the time to say that. Otherwise it appears like its open. "Romain Francois" wrote in message news:4bb4c4b8.2030...@dbmail.com... The thread has been handled in Rcpp-devel. Rob posted there 7 minutes after posting on r-help. FWIW, I think the problem is fixed on the Rcpp 0.7.11 version (on cran incoming) Romain Le 01/04/10 17:47, Matthew Dowle a écrit : > > Rob, > Please look again at Romain's reply to you on 19th March. He informed you > then that Rcpp has its own dedicated mailing list and he gave you the > link. > Matthew > > "R_help Help" wrote in message > news:ad1ead5f1003291753p68d6ed52q572940f13e1c0...@mail.gmail.com... >> Hi, >> >> I'm a bit puzzled. I uses exactly the same code in RcppExamples >> package to try adding RcppFrame object to RcppResultSet. When running >> it gives me segmentation fault problem. I'm using gcc 4.1.2 on redhat >> 64bit. I'm not sure if this is the cause of the problem. Any advice >> would be greatly appreciated. Thank you. >> >> Rob. >> >> >> int numCol=4; >> std::vector colNames(numCol); >> colNames[0] = "alpha"; // column of strings >> colNames[1] = "beta"; // column of reals >> colNames[2] = "gamma"; // factor column >> colNames[3] = "delta"; // column of Dates >> RcppFrame frame(colNames); >> >> // Third column will be a factor. In the current implementation the >> // level names are copied to every factor value (and factors >> // in the same column must have the same level names). The level names >> // for a particular column will be factored out (pardon the pun) in >> // a future release. >> int numLevels = 2; >> std::string *levelNames = new std::string[2]; >> levelNames[0] = std::string("pass"); // level 1 >> levelNames[1] = std::string("fail"); // level 2 >> >> // First row (this one determines column types). 
>> std::vector row1(numCol); >> row1[0].setStringValue("a"); >> row1[1].setDoubleValue(3.14); >> row1[2].setFactorValue(levelNames, numLevels, 1); >> row1[3].setDateValue(RcppDate(7,4,2006)); >> frame.addRow(row1); >> >> // Second row. >> std::vector row2(numCol); >> row2[0].setStringValue("b"); >> row2[1].setDoubleValue(6.28); >> row2[2].setFactorValue(levelNames, numLevels, 1); >> row2[3].setDateValue(RcppDate(12,25,2006)); >> frame.addRow(row2); >> >> RcppResultSet rs; >> rs.add("PreDF", frame); -- Romain Francois Professional R Enthusiast +33(0) 6 28 91 30 30 http://romainfrancois.blog.free.fr |- http://tr.im/OIXN : raster images and RImageJ |- http://tr.im/OcQe : Rcpp 0.7.7 `- http://tr.im/O1wO : highlight 0.1-5 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nlrq parameter bounds
Ashley, This appears to be your first post to this list. Welcome to R. Over 2 days is quite a long time to wait though, so you are unlikely to get a reply now. Feedback: since nlrq is in package quantreg, its a question about a package and should be sent to the package maintainer. Some packages though, over 40 of the 664 on r-forge, have dedicated help/devel/forum lists hosted on r-forge. No reply from r-help often, but not always, means you haven't followed some detail of the posting guide or haven't followed this : http://www.catb.org/~esr/faqs/smart-questions.html. HTH Matthew "Ashley Greenwood" wrote in message news:45708.131.217.6.9.1269916052.squir...@webmail.student.unimelb.edu.au... > Hi there, > Can anyone please tell me if it is possible to limit parameters in nlrq() > to 'upper' and 'lower' bounds as per nls()? If so how?? > > Many thanks in advance > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] memory error
> someone else on this list may be able to give you a ballpark estimate > of how much RAM this merge would require. I don't have an absolute estimate, but try data.table::merge, as it needs less working memory than base::merge. 20 million rows of 5 columns isn't beyond 32bit : (1*4 + 4*8)*19758564/1024^3 = 0.662GB Also try sqldf to do the join. Matthew "Sharpie" wrote in message news:1270102758449-1747733.p...@n4.nabble.com... > > > Janet Choate-2 wrote: >> >> Thanx for clarification on stating my problem, Charlie. >> >> I am attempting to merge to files, i.e.: >> hi39 = merge(comb[,c("hillID","geo")], hi.h39, by=c("hillID")) >> >> if this is relevant or helps to explain: >> the file 'comb' is 3 columns and 1127 rows >> the file 'hi.h39' is 5 columns and 19758564 rows >> >> i started a new clean R session in which i was able to read those 2 files >> in, but get the following error when i try to merge them: >> >> R(2175) malloc: *** mmap(size=79036416) failed (error code=12) >> *** error: can't allocate region >> *** set a breakpoint in malloc_error_break to debug >> R(2175) malloc: *** mmap(size=79036416) failed (error code=12) >> *** error: can't allocate region >> *** set a breakpoint in malloc_error_break to debug >> R(2175) malloc: *** mmap(size=158068736) failed (error code=12) >> *** error: can't allocate region >> *** set a breakpoint in malloc_error_break to debug >> R(2175) malloc: *** mmap(size=158068736) failed (error code=12) >> *** error: can't allocate region >> *** set a breakpoint in malloc_error_break to debug >> R(2175) malloc: *** mmap(size=158068736) failed (error code=12) >> *** error: can't allocate region >> *** set a breakpoint in malloc_error_break to debug >> Error: cannot allocate vector of size 150.7 Mb >> >> so the final error is "Cannot allocate vector of size 150.7 Mb", as >> suggested when R runs out of memory. >> >> i am running R version 2.9.2, on mac os X 10.5 - leopard. >> >> any suggestion on how to increase R's memory on a mac? 
>> thanx for any much needed help! >> Janet >> > > Ah, so it is indeed a shortage of memory problem. With R 2.9.2, you are > likely running a 32 bit version of R which will be limited to accessing at > most 4 GB of RAM. You may want to try the newest version of R, 2.10.1, as > it includes a 64 bit version that will allow you to access significantly > more memory- provided you have the RAM installed on your system. > > I'm not too hot on memory usage calculation, but someone else on this list > may be able to give you a ballpark estimate of how much RAM this merge > would > require. If it turns out to be a ridiculous amount, you will need to > consider breaking the merge up into chunks or finding an out-of-core (i.e. > not dependent on RAM for storage) merge tool. > > Hope this helps! > > -Charlie > > - > Charlie Sharpsteen > Undergraduate-- Environmental Resources Engineering > Humboldt State University > -- > View this message in context: > http://n4.nabble.com/memory-error-tp1747357p1747733.html > Sent from the R help mailing list archive at Nabble.com. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] match function or "=="
Please install v1.3 from R-forge : install.packages("data.table",repos="http://R-Forge.R-project.org";) It will be ready for CRAN soon. Please follow up on datatable-h...@lists.r-forge.r-project.org Matthew "bo" wrote in message news:1270689586866-1755876.p...@n4.nabble.com... > > Thank you very much for the help. > > I installed data.table package, but I keep getting the following warnings: > >> setkey(DT,id,date) > Warning messages: > 1: In `[.data.table`(deref(x), o) : > This R session is < 2.4.0. Please upgrade to 2.4.0+. > > I'm using R 2.10, but why I keep getting warnings on upgrades. Thanks > again. > > > -- > View this message in context: > http://n4.nabble.com/match-function-or-tp1754505p1755876.html > Sent from the R help mailing list archive at Nabble.com. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Code is too slow: mean-centering variables in a data frame by subgroup
Hi Dimitri, A start has been made at explaining .SD in FAQ 2.1. This was previously on a webpage, but its just been moved to a vignette : https://r-forge.r-project.org/plugins/scmsvn/viewcvs.php/*checkout*/branch2/inst/doc/faq.pdf?rev=68&root=datatable Please note: that vignette is part of a development branch on r-forge, and as such isn't even released to the r-forge repository yet. Please also see FAQ 4.5 in that vignette and follow up on datatable-h...@lists.r-forge.r-project.org An introduction vignette is taking shape too (again, in the development branch i.e. bleeding edge) : https://r-forge.r-project.org/plugins/scmsvn/viewcvs.php/*checkout*/branch2/inst/doc/intro.pdf?rev=68&root=datatable HTH Matthew "Dimitri Liakhovitski" wrote in message news:r2rdae9a2a61004071314xc03ae851n4c9027b28df5a...@mail.gmail.com... Yes, Tom's solution is indeed the fastest! On my PC it took .17-.22 seconds while using ave() took .23-.27 seconds. And of course - the last two methods I mentioned took 1.3 SECONDS, not MINUTES (it was a typo). All that is left to me is to understand what .SD stands for. :-) Dimitri On Wed, Apr 7, 2010 at 4:04 PM, Rob Forler wrote: > Leave it up to Tom to solve things wickedly fast :) > > Just as an fyi Dimitri, Tom is one of the developers of data.table. > > -Rob > > On Wed, Apr 7, 2010 at 2:51 PM, Dimitri Liakhovitski > wrote: >> >> Wow, thank you, Tom! >> >> On Wed, Apr 7, 2010 at 3:46 PM, Tom Short >> wrote: >> > Here's how I would have done the data.table method. 
It's a bit faster than the ave approach on my machine:

>>> # install.packages("data.table", repos="http://R-Forge.R-project.org")
>>> library(data.table)
>>>
>>> f3 <- function(frame) {
>>> +   frame <- as.data.table(frame)
>>> +   frame[, lapply(.SD[, 2:ncol(.SD), with = FALSE],
>>> +                  function(x) x / mean(x, na.rm = TRUE)),
>>> +         by = "group"]
>>> + }
>>>
>>> system.time(new.frame2 <- f2(frame))  # ave
>>>    user  system elapsed
>>>    0.50    0.08    1.24
>>> system.time(new.frame3 <- f3(frame))  # data.table
>>>    user  system elapsed
>>>    0.25    0.01    0.30
>>>
>>> - Tom
>>>
>>> Tom Short
>>>
>>> On Wed, Apr 7, 2010 at 12:46 PM, Dimitri Liakhovitski wrote:
>>>> I would like to thank once more everyone who helped me with this question.
>>>> I compared the speed for different approaches. Below are the results
>>>> of my comparisons - in case anyone is interested:
>>>>
>>>> ### Building an EXAMPLE FRAME with N rows - with groups and a lot of NAs:
>>>> N <- 10
>>>> set.seed(1234)
>>>> frame <- data.frame(group=rep(paste("group", 1:10), N/10),
>>>>                     a=rnorm(1:N), b=rnorm(1:N), c=rnorm(1:N), d=rnorm(1:N),
>>>>                     e=rnorm(1:N), f=rnorm(1:N), g=rnorm(1:N))
>>>> frame <- frame[order(frame$group), ]
>>>>
>>>> ## Introducing 60% NAs:
>>>> names.used <- names(frame)[2:length(frame)]
>>>> set.seed(1234)
>>>> for(i in names.used){
>>>>   i.for.NA <- sample(1:N, round((N*.6), 0))
>>>>   frame[[i]][i.for.NA] <- NA
>>>> }
>>>> lapply(frame[2:8], function(x) length(x[is.na(x)]))  # Checking that it worked
>>>> ORIGframe <- frame  ## placeholder for the unchanged original frame
>>>>
>>>> ### Objective of the code - divide each value by its group mean
>>>>
>>>> ### METHOD 1 - the FASTEST - using ave(): ##
>>>> frame <- ORIGframe
>>>> f2 <- function(frame) {
>>>>   for(i in 2:ncol(frame)) {
>>>>     frame[,i] <- ave(frame[,i], frame[,1], FUN=function(x) x/mean(x, na.rm=TRUE))
>>>>   }
>>>>   frame
>>>> }
>>>> system.time({new.frame <- f2(frame)})  # Took me 0.23-0.27 sec
>>>> ###
>>>>
>>>> ### METHOD 2 - fast, just a bit slower - using data.table: ##
>>>> # If you don't have it - install the package - NOT from CRAN:
>>>> install.packages("data.table", repos="http://R-Forge.R-project.org")
>>>> library(data.table)
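The `.SD` idiom used in `f3` above generalizes to any per-group computation. A minimal sketch (the toy data and column names here are invented for illustration; this assumes a current data.table release, where `.SD` is simply the subset of non-grouping columns for each group):

```r
library(data.table)

# Toy data: three groups, two numeric columns (invented for illustration)
DT <- data.table(group = rep(c("a", "b", "c"), each = 4),
                 x = 1:12, y = 12:1)

# Apply a function to every non-grouping column at once via .SD
res <- DT[, lapply(.SD, mean), by = group]
print(res)
```

In modern data.table the `.SDcols` argument restricts which columns enter `.SD`, replacing the `.SD[, 2:ncol(.SD), with = FALSE]` workaround shown in the thread.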
[R] Any chance R will ever get beyond the 2^31-1 vector size limit?
Hi all,

My institute will hopefully be working on cutting-edge genetic sequencing data by the Fall of 2010. The datasets will be 10's of GB large and growing. I'd like to use R to do primary analyses. This is OK, because we can just throw $ at the problem and get lots of RAM running on 64 bit R. However, we are still running up against the fact that vectors in R cannot contain more than 2^31-1 elements. I know there are "ways around" this issue, and trust me, I think I've tried them all (e.g., bringing in portions of the data at a time; using large-dataset packages in R; using SQL databases, etc). But all these 'solutions' are, at the end of the day, much much more cumbersome, programming-wise, than just doing things in native R. Maybe that's just the cost of doing what I'm doing. But my questions, which may well be naive (I'm not a computer programmer), are:

1) Is there an *inherent* limit to vectors being < 2^31-1 long? I.e., in an alternative history of R's development, would it have been feasible for R to not have had this limitation?

2) Is there any possibility that this limit will be overcome in future revisions of R?

I'm very very grateful to the people who have spent important parts of their professional lives developing R. I don't think anyone back in, say, 1995, could have foreseen that datasets would be >> 2^31-1 in size. For better or worse, however, in many fields of science, that is routinely the case today. *If* it's possible to get around this limit, then I'd like to know whether the R Development Team takes seriously the needs of large data users, or if they feel that (perhaps not mutually exclusively) developing such capacity is best left up to ad hoc R packages and alternative analysis programs.

Best,

Matt

-- Matthew C Keller Asst.
Professor of Psychology University of Colorado at Boulder www.matthewckeller.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
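The 2^31-1 figure in the thread above comes from R's historical use of a signed 32-bit integer as the vector index, and it can be checked directly. A small sketch (note: since this 2010 thread, R 3.0.0 added "long vectors", so on a 64-bit build atomic vectors can now exceed this limit):

```r
# The pre-R-3.0.0 vector-length ceiling is the largest signed 32-bit integer
.Machine$integer.max == 2^31 - 1   # TRUE

# Integer arithmetic that would pass that ceiling overflows to NA (with a warning)
is.na(.Machine$integer.max + 1L)   # TRUE
```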
[R] Interpreting factor*numeric interaction coefficients
Dear all,

I am a relative novice with R, so please forgive any terrible errors...

I am working with a GLM that describes a response variable as a function of a categorical variable with three levels and a continuous variable. These two predictor variables are believed to interact. An example of such a model follows at the bottom of this message, but here is a section of its summary table:

            Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.220186   0.539475   2.262   0.0237 *
var1         0.028182   0.050850   0.554   0.5794
cat2        -0.112454   0.781137  -0.144   0.8855
cat3         0.339589   0.672828   0.505   0.6138
var1:cat2    0.007091   0.068072   0.104   0.9170
var1:cat3   -0.027248   0.064468  -0.423   0.6725

I am having trouble interpreting this output. I think I understand that:
# the 'var1' value refers to the slope of the relationship within the first factor level
# the 'cat2' and 'cat3' values refer to the difference in intercept from 'cat1'
# the interaction terms describe the difference in slope between the relationship in 'cat1' and that in 'cat2' and 'cat3' respectively

Therefore, if I wanted a single value to describe the slope in either cat2 or cat3, I would sum the interaction value with that of var1. However, if I wanted to report a standard error for the slope in 'cat2', how would I go about doing this? Is the reported standard error that for the overall slope for that factor level, or is the actual standard error a function of the standard error of var1 and that of the interaction?

Any help with this would be much appreciated,
Matthew Carroll

### example code
resp <- rpois(30, 5)
cat <- factor(rep(c(1:3), 10))
var1 <- rnorm(30, 10, 3)
mod <- glm(resp ~ var1 * cat, family="poisson")
summary(mod)

Call:
glm(formula = resp ~ var1 * cat, family = "poisson")

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.80269  -0.54107  -0.06169   0.51819   1.58169

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.220186   0.539475   2.262   0.0237 *
var1         0.028182   0.050850   0.554   0.5794
cat2        -0.112454   0.781137  -0.144   0.8855
cat3         0.339589   0.672828   0.505   0.6138
var1:cat2    0.007091   0.068072   0.104   0.9170
var1:cat3   -0.027248   0.064468  -0.423   0.6725
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 23.222 on 29 degrees of freedom
Residual deviance: 22.192 on 24 degrees of freedom
AIC: 133.75

Number of Fisher Scoring iterations: 5

--
Matthew Carroll
E-mail: mjc...@york.ac.uk

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
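To answer the question in the post above: the reported interaction standard error is for the *difference* in slopes, not for the cat2 slope itself. The SE of the cat2 slope (var1 + var1:cat2) must be built from the coefficient covariance matrix, using Var(b1 + b2) = Var(b1) + Var(b2) + 2*Cov(b1, b2). A sketch using the example model from the post (the `set.seed` call is added so the simulation is reproducible):

```r
set.seed(1)
resp <- rpois(30, 5)
cat  <- factor(rep(c(1:3), 10))
var1 <- rnorm(30, 10, 3)
mod  <- glm(resp ~ var1 * cat, family = "poisson")

# Slope within cat2 is the sum of the two coefficients
b <- coef(mod)
slope.cat2 <- b["var1"] + b["var1:cat2"]

# Its standard error combines both variances and their covariance
V <- vcov(mod)
se.cat2 <- sqrt(V["var1", "var1"] + V["var1:cat2", "var1:cat2"] +
                2 * V["var1", "var1:cat2"])
```

Equivalently, `se.cat2` is `sqrt(t(c) %*% V %*% c)` for the contrast vector c = (0, 1, 0, 0, 1, 0); packages such as car (`deltaMethod`) wrap this computation.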
Re: [R] Any chance R will ever get beyond the 2^31-1 vector size limit?
HI Duncan and R users, Duncan, thank you for taking the time to respond. I've had several other comments off the list, and I'd like to summarize what these have to say, although I won't give sources since I assume there was a reason why people chose not to respond to the whole list. The long and short of it is that there is hope for people who want R to get beyond the 2^31-1 vector size limit. First off, I received a couple of responses from people who wanted to commiserate and me to summarize what I learned. Here you go. Second, the package bigmemory and ff can both help with memory issues. I've had success using bigmemory before, and found it to be quite intuitive. Third, one knowledgeable responder doubted that changing the 2^31-1 limit would 'break' old datasets. He says, "This might be true for isolated cases of objects stored in binary formats or in workspaces, but I don't see that as anywhere near as important as the change you (and we) would like to see." Fourth, another knowledgeable responder felt it was likely that, given the demand driven by the huge increases in dataset sizes, this limitation would likely be overcome within the next few years. Best, Matt On Fri, Apr 9, 2010 at 6:36 PM, Duncan Murdoch wrote: > On 09/04/2010 7:38 PM, Matthew Keller wrote: >> >> Hi all, >> >> My institute will hopefully be working on cutting-edge genetic >> sequencing data by the Fall of 2010. The datasets will be 10's of GB >> large and growing. I'd like to use R to do primary analyses. This is >> OK, because we can just throw $ at the problem and get lots of RAM >> running on 64 bit R. However, we are still running up against the fact >> that vectors in R cannot contain more than 2^31-1. I know there are >> "ways around" this issue, and trust me, I think I've tried them all >> (e.g., bringing in portions of the data at a time; using large-dataset >> packages in R; using SQL databases, etc). 
But all these 'solutions' >> are, at the end of the day, much much more cumbersome, >> programming-wise, than just doing things in native R. Maybe that's >> just the cost of doing what I'm doing. But my questions, which may >> well be naive (I'm not a computer programmer), are:
>>
>> 1) Is there an *inherent* limit to vectors being < 2^31-1 long? I.e., >> in an alternative history of R's development, would it have been >> feasible for R to not have had this limitation?
>
> The problem is that we use "int" as a vector index. On most platforms, > that's a signed 32 bit integer, with max value 2^31-1.
>
>> 2) Is there any possibility that this limit will be overcome in future >> revisions of R?
>
> Of course, R is open source. You could rewrite all of the internal code > tomorrow to use 64 bit indexing.
>
> Will someone else do it for you? Even that is possible. One problem is > that this will make all of your data incompatible with older versions of R. > And back to the original question: are you willing to pay for the > development? Then go ahead, you can have it tomorrow (or later, if your > budget is limited). Are you waiting for someone else to do it for free? > Then you need to wait for someone who knows how to do it to want to do it.
>
>> I'm very very grateful to the people who have spent important parts of >> their professional lives developing R. I don't think anyone back in, >> say, 1995, could have foreseen that datasets would be >>2^32-1 in >> size. For better or worse, however, in many fields of science, that is >> routinely the case today. *If* it's possible to get around this limit, >> then I'd like to know whether the R Development Team takes seriously >> the needs of large data users, or if they feel that (perhaps not >> mutually exclusively) developing such capacity is best left up to ad >> hoc R packages and alternative analysis programs.
>
> There are many ways around the limit today.
Put your data in a dataframe > with many columns each of length 2^31-1 or less. Put your data in a > database, and process it a block at a time. Etc. > > Duncan Murdoch > >> >> Best, >> >> Matt >> >> >> > > -- Matthew C Keller Asst. Professor of Psychology University of Colorado at Boulder www.matthewckeller.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sum specific rows in a data frame
Or try data.table 1.4 on r-forge, its grouping is faster than aggregate:

         agg datatable
X10    0.012     0.008
X100   0.020     0.008
X1000  0.172     0.020
X1     1.164     0.144
X1e.05 9.397     1.180

install.packages("data.table", repos="http://R-Forge.R-project.org")
require(data.table)
dt = as.data.table(df)
t3 <- system.time(zz3 <- dt[, list(sumflt=sum(fltval), sumint=sum(intval)), by=id])

Matthew

On Thu, 15 Apr 2010 13:09:17 +, hadley wickham wrote:
> On Thu, Apr 15, 2010 at 1:16 AM, Chuck wrote:
>> Depending on the size of the dataframe and the operations you are
>> trying to perform, aggregate or ddply may be better. In the function
>> below, df has the same structure as your dataframe.
>
> Current version of plyr:
>
>          agg  ddply
> X10    0.005  0.007
> X100   0.007  0.026
> X1000  0.086  0.248
> X1     0.577  3.136
> X1e.05 4.493 44.147
>
> Development version of plyr:
>
>          agg  ddply
> X10    0.003  0.005
> X100   0.007  0.007
> X1000  0.042  0.044
> X1     0.410  0.443
> X1e.05 4.479  4.237
>
> So there are some big speed improvements in the works.
>
> Hadley

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
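The two approaches being timed in the message above can be reproduced on a toy frame (the size and column values here are invented; current CRAN data.table no longer needs the R-Forge repository):

```r
library(data.table)

# Toy stand-in for df: grouped numeric columns (invented for illustration)
set.seed(42)
df <- data.frame(id     = sample(letters, 1e4, replace = TRUE),
                 fltval = runif(1e4),
                 intval = sample(1:100, 1e4, replace = TRUE))

# Base R: formula interface to aggregate
agg <- aggregate(cbind(fltval, intval) ~ id, data = df, FUN = sum)

# data.table: grouped sums in one pass
dt  <- as.data.table(df)
res <- dt[, list(sumflt = sum(fltval), sumint = sum(intval)), by = id]
```

Both return one row per `id`; the data.table form is the one that scales to the row counts benchmarked above.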
[R] multiple paired t-tests without loops
I am new to R and I suspect my problem is easily solved, but I haven't been able to figure it out without using loops. I am trying to implement Blair & Karniski's (1993) permutation test. I've included a sample data frame below. This data frame represents the conditional means (C1, C2) for 3 subjects in 2 consecutive samples of a continuous data set (e.g. ERP waveform). Each sample includes all possible permutations of the subject means (2^N), which is 8 in this case.

The problem: I need to run a paired t-test on each Sample X Permutation set and save the maximum t-value obtained for each sample. The real data set has 16 subjects (2^16 permutations) and 500 samples, which leads to more than 32 million t-tests. I have a loop version of the program working, but it would take a few weeks to complete the job and I was hoping that someone could tell me how to do it faster?

thank you kindly,
Matthew Finkbeiner

"Sample" "C1" "C2" "PermN"
1 5 8 perm1
1 4 3 perm1
1 6 4 perm1
2 2 6 perm1
2 3 1 perm1
2 7 4 perm1
1 8 5 perm2
1 3 4 perm2
1 6 4 perm2
2 6 2 perm2
2 1 3 perm2
2 7 4 perm2
1 5 8 perm3
1 3 4 perm3
1 6 4 perm3
2 2 6 perm3
2 1 3 perm3
2 7 4 perm3
1 8 5 perm4
1 4 3 perm4
1 4 6 perm4
2 6 2 perm4
2 3 1 perm4
2 4 7 perm4
1 5 8 perm5
1 4 3 perm5
1 4 6 perm5
2 2 6 perm5
2 3 1 perm5
2 4 7 perm5
1 8 5 perm6
1 3 4 perm6
1 4 6 perm6
2 6 2 perm6
2 1 3 perm6
2 4 7 perm6
1 5 8 perm7
1 3 4 perm7
1 4 6 perm7
2 2 6 perm7
2 1 3 perm7
2 4 7 perm7
1 8 5 perm8
1 4 3 perm8
1 6 4 perm8
2 6 2 perm8
2 3 1 perm8
2 7 4 perm8

--
Dr. Matthew Finkbeiner
Senior Lecturer & ARC Australian Research Fellow
Macquarie Centre for Cognitive Science (MACCS)
Macquarie University, Sydney, NSW 2109
Phone: +61 2 9850-6718 Fax: +61 2 9850-6059
Homepage: http://www.maccs.mq.edu.au/~mfinkbei
Lab Homepage: http://www.maccs.mq.edu.au/laboratories/action/

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to make read in a vector of 0s and 1s with no space between them
Hi all,

Probably a rudimentary question. I have a flat file that looks like this (the real one has ~10e6 elements):

10110100101001011101011

and I want to pull that into R as a vector, but with each digit being its own element. There are no separators between the digits. How can I accomplish this? Thanks in advance!

Matt

-- Matthew C Keller Asst. Professor of Psychology University of Colorado at Boulder www.matthewckeller.com

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to make read in a vector of 0s and 1s with no space between them
Hi all,

Quickly received an answer off the list. To do this is easy. Pull it in using e.g., scan(). Then use strsplit:

z <- '10001011010010'
strsplit(z,'')

On Sun, Apr 25, 2010 at 10:52 AM, Matthew Keller wrote:
> Hi all,
>
> Probably a rudimentary question. I have a flat file that looks like
> this (the real one has ~10e6 elements):
>
> 10110100101001011101011
>
> and I want to pull that into R as a vector, but with each digit being
> its own element. There are no separators between the digits. How can
> I accomplish this? Thanks in advance!
>
> Matt
>
> --
> Matthew C Keller
> Asst. Professor of Psychology
> University of Colorado at Boulder
> www.matthewckeller.com

-- Matthew C Keller Asst. Professor of Psychology University of Colorado at Boulder www.matthewckeller.com

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
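Completing the strsplit() answer above: the result is a one-element list of single-character strings, so it still needs unwrapping and conversion to get a numeric vector. A sketch (the literal string stands in for the line you would read with `scan(file, what = "character")`):

```r
z <- "10110100101001011101011"   # stand-in for the scanned line

# strsplit on "" splits into single characters; [[1]] unwraps the list
v <- as.integer(strsplit(z, "")[[1]])

head(v)     # 1 0 1 1 0 1
length(v)   # 23
```

For a ~10e6-digit file this remains fast, since the split and coercion are both single vectorized calls.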
Re: [R] multiple paired t-tests without loops
Yes, I suspect that I will end up using a sampling approach, but I'd like to use an exact test if it's at all feasible. Here are two samples of data from 5 subjects:

Sample Subj     C1     C2
    44    1 0.0093 0.0077
    44    2 0.0089 0.0069
    44    3 0.0510 0.0432
    44    4 0.0140 0.0147
    44    5 0.0161 0.0117
    45    1 0.0103 0.0086
    45    2 0.0099 0.0078
    45    3 0.0542 0.0458
    45    4 0.0154 0.0163
    45    5 0.0175 0.0129

and then here is the script I've pieced together from things I've found on the web (sorry for not citing the snippets!). any pointers on how to speed it up would be greatly appreciated.

#--
# Utility function that returns the binary representation of 1:(2^n) X SubjN
binary.v <- function(n) {
  x <- 1:(2^n)
  mx <- max(x)
  digits <- floor(log2(mx))
  ans <- 0:(digits-1)
  lx <- length(x)
  x <- matrix(rep(x, rep(digits, lx)), ncol=lx)
  (x %/% 2^ans) %% 2
}

library(plyr)

# first some global variables
TotalSubjects <- 5
TotalSamples <- 2
StartSample <- 44
EndSample <- ((StartSample + TotalSamples) - 1)
maxTs <- NULL
obsTs <- NULL

# create index array that drives the permutations for all samples
ind <- binary.v(TotalSubjects)
# transpose ind so that the first 2^N items correspond to S1,
# the second 2^N correspond to S2 and so on...
transind <- t(ind)

# get data file that is organized first by sample then by subj
# (e.g. sample1 subject1, sample1 subject2 ... sample1 subjectN)
# sampledatafile <- file.choose()
samples <- read.table(sampledatafile, header=T)

# this is the progress bar
pb <- txtProgressBar(min = StartSample, max = EndSample, style = 3)
setTxtProgressBar(pb, 1)
start.t <- proc.time()

# begin loop that analyzes data sample by sample
for (s in StartSample:EndSample) {
  S <- samples[samples$Sample==s, ]   # pick up data for current sample

  # reproduce data frame rows once for each permutation to be done
  expanddata <- S[rep(1:nrow(S), each = 2^TotalSubjects), ]
  # create new array to hold the flipped (permuted) data
  permdata <- expanddata
  # permute the data
  permdata[transind==1, 3] <- expanddata[transind==1, 4]   # Cnd1 <- Cnd2
  permdata[transind==1, 4] <- expanddata[transind==1, 3]   # Cnd2 <- Cnd1

  # create permutation number as a factor in the data frame
  PermN <- rep(rep(1:2^TotalSubjects, TotalSubjects), 2)
  # create Sample number as a factor
  Sample <- rep(permdata[, 1], 2)   # Sample number is in the 1st column
  # create subject IDs as a factor
  Subj <- rep(permdata[, 2], 2)     # Subject ID is in the 2nd column

  # stack the permuted data
  StackedPermData <- stack(permdata[, 3:4])
  # bind all the factors together
  StackedPermData <- as.data.frame(cbind(Sample, Subj, PermN, StackedPermData))
  # sort by perm
  sortedstack <- as.data.frame(StackedPermData[order(StackedPermData$PermN, StackedPermData$Sample), ])
  # clear up some memory
  rm(expanddata, permdata, StackedPermData)

  # pull out data 1 perm at a time
  res <- ddply(sortedstack, c("Sample", "PermN"), function(.data){
    # type combinations by class
    combs <- t(combn(sort(unique(.data[, 5])), 2))
    # applying the t-test for them
    aaply(combs, 1, function(.r){
      x1 <- .data[.data[, 5]==.r[1], 4]
      x2 <- .data[.data[, 5]==.r[2], 4]
      tvalue <- t.test(x1, x2, paired = T)
      res <- c(tvalue$statistic, tvalue$parameter, tvalue$p.value)
      names(res) <- c('stat', 'df', 'pvalue')
      res
    })
  })

  # update progress bar
  setTxtProgressBar(pb, s)
  # get max T vals
  maxTs <- c(maxTs, tapply(res$stat, list(res$Sample), max))
  # get observed T vals
  obsTs <- c(obsTs, res$stat[length(res$stat)])
  # here we need to save res to a binary file
}

# close out the progress bar
close(pb)
end.t <- proc.time() - start.t
print(end.t)

# get cutoffs: these are the 2-tailed t-vals that maintain
# experimentwise error at the 0.05 level
lowerT <- quantile(maxTs, .025)
upperT <- quantile(maxTs, .975)

On 4/27/2010 6:53 AM, Greg Snow wrote:
The usual way to speed up permutation testing is to sample from the set of possible permutations rather than looking at all possible ones. If you show some code then we may be able to find some inefficiencies for you, but there is no general solution; poorly written uses of apply will be slower than well-written for loops. In some cases rewriting critical pieces in C or Fortran will help quite a bit, but we need to see what you are already doing to know if that will help or not.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
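The per-permutation t.test() calls in the script above dominate its run time. A paired t statistic is just mean(d)/(sd(d)/sqrt(n)) of the subject differences d = C1 - C2, and swapping a subject's condition labels only flips the sign of that subject's difference, so all 2^n permutation t values for one sample can be computed in a single matrix operation. A sketch (not the poster's code; the function name is invented, and the `d` below comes from sample 44 of the data shown above):

```r
# All sign-flip paired t statistics for one sample, fully vectorized
perm.tvals <- function(d) {
  n <- length(d)
  # every +1/-1 sign pattern: a 2^n-by-n matrix, one permutation per row
  signs <- as.matrix(expand.grid(rep(list(c(1, -1)), n)))
  D <- t(signs) * d   # n x 2^n: each column is one sign-flipped copy of d
  m <- colMeans(D)
  s <- sqrt((colSums(D^2) - n * m^2) / (n - 1))   # per-column sample sd
  m / (s / sqrt(n))
}

# Subject differences C1 - C2 for sample 44 in the data above
d  <- c(0.0016, 0.0020, 0.0078, -0.0007, 0.0044)
tv <- perm.tvals(d)   # 2^5 = 32 t values; tv[1] is the unpermuted statistic
```

For 16 subjects this is one 16 x 65536 matrix per sample instead of 65536 t.test() calls, which is the main speed-up the thread is asking for.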
Re: [R] Confusing concept of vector and matrix in R
Rolf: "Well then, why don't you go away and design and build your own statistics and data analysis language/package to replace R?" What a nice reply! The fellow is just trying to understand R. That response reminds me of citizens of my own country who cannot abide by any criticism of the USA: "If you don't like it, why don't you leave?" Classy. I have sympathies with the author. When I first began using R (migrating from Matlab), I also found the vector concept strange, especially because I was doing a lot of matrix algebra back then and didn't like the concept of conflating a row vector with a column vector. But I've since gotten used to it and can hardly remember why I struggled with this early on. Perhaps your experience will be similar. Best of luck! Matt On Mon, Apr 26, 2010 at 7:40 PM, Charles C. Berry wrote: > On Mon, 26 Apr 2010, Stu wrote: > >> Hi all, >> >> One subtlety is that the drop argument only works if you specify 2 or >> more indices e.g. [i, j, ..., drop=F]; but not for a single index e.g >> [i, drop=F]. > > Wrong. > >> a <- structure(1:5,dim=5) >> dim(a) > > [1] 5 >> >> dim(a[2:3,drop=F]) # don't drop regardless > > [1] 2 >> >> dim(a[2,drop=F]) # dont' drop regardless > > [1] 1 >> >> dim(a[2:3,drop=T]) # no extent of length 1 > > [1] 2 >> >> dim(a[2,drop=T]) # drop, extent of length 1 > > NULL > > >> >> Why doesn't R complain about the unused "drop=F" argument in the >> single index case? > > In the example you give (one index for a two-dimension array), vector > indexing is assumed. For vector indexing, drop is irrelevant. > > HTH, > > Chuck >> >> Cheers, >> - Stu >> >> a = matrix(1:10, nrow=1) >> b = matrix(10:1, ncol=1) >> >> # a1 is an vector w/o dim attribute (i.e. drop=F is ignored silently) >> (a1 = a[2:5, drop=F]) >> dim(a1) >> >> # a2 is an vector WITH dim attribute: a row matrix (drop=F works) >> (a2 = a[, 2:5, drop=F]) >> dim(a2) >> >> # b1 is an vector w/o dim attribute (i.e. 
drop=F is ignored silently) >> (b1 = b[2:5, drop=F]) >> dim(b1) >> >> # b2 is an vector WITH dim attribute: a column matrix (drop=F works) >> (b2 = b[2:5, , drop=F]) >> dim(b2) >> >> >> On Mar 30, 4:08 pm, lith wrote: >>>> >>>> Reframe the problem. Rethink why you need to keep dimensions. I never >>>> ever had to use drop. >>> >>> The problem is that the type of the return value changes if you happen >>> to forget to use drop = FALSE, which can easily turn into a nightmare: >>> >>> m <-matrix(1:20, ncol=4) >>> for (i in seq(3, 1, -1)) { >>> print(class(m[1:i, ]))} >>> >>> [1] "matrix" >>> [1] "matrix" >>> [1] "integer" >>> >>> __ >>> r-h...@r-project.org mailing >>> listhttps://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting >>> guidehttp://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > Charles C. Berry (858) 534-2098 > Dept of Family/Preventive > Medicine > E mailto:cbe...@tajo.ucsd.edu UC San Diego > http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- Matthew C Keller Asst. Professor of Psychology University of Colorado at Boulder www.matthewckeller.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
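The drop behavior debated in the thread above can be made concrete in a few lines; this sketch restates Stu's and Chuck's examples rather than adding anything new:

```r
m <- matrix(1:20, ncol = 4)

# Two-index form: drop = FALSE keeps the length-1 dimension
dim(m[1, ])                 # NULL -- result fell back to a plain vector
dim(m[1, , drop = FALSE])   # 1 4 -- still a matrix

# One-index form on a 1-d array, as in Chuck's example: drop only
# matters when a dim attribute is present and an extent has length 1
a <- structure(1:5, dim = 5)
dim(a[2:3, drop = FALSE])   # 2
dim(a[2, drop = TRUE])      # NULL -- the length-1 extent is dropped
```

On a plain dim-less vector, `[i, drop = FALSE]` is silently ignored, which is the behavior Stu found surprising.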
Re: [R] Size limitations for model.matrix?
Hi Gerald, A matrix and an array *are* vectors that can be indexed by 2+ indices. Thus, matrices and arrays are also limited to 2^31-1 elements. You might check out the bigmemory package, which can help with these issues... Matt On Wed, Apr 28, 2010 at 11:01 AM, wrote: > > Hello, > > I am running: > > R version 2.10.0 (2009-10-26) > Copyright (C) 2009 The R Foundation for Statistical Computing > ISBN 3-900051-07-0 > > on a RedHat Linux box with 48Gb of memory. > > I am trying to create a model.matrix for a big model on a moderately large > data set. It seems there is a size limitation to this model.matrix. > >> dim(coll.train) > [1] 677236 128 >> coll.1st.model.mat <- model.matrix(coll.1st.formula, data = coll.train) >> dim(coll.1st.model.mat) > [1] 581618 169 > > One I saw the resulting model.matrix had fewer rows than the original > data.frame I played with the number of input variables in the model: > >> ttt <- model.matrix(~kmpleasure + vehage + age + gender + marital.status > + > + license.category + minor.conviction + driver.training.certificate + > + admhybrid + anpol + anveh + cie + dblct + faq13c + faq20 + faq27 + > faq43 + > + faq5a + fra2 + frb2 + frb3 + kmaff + kmannuel + kmtravai + lima + > maison + > + nacp + nap + nbcond + nbcondpo + nbvt + rabmlt06 + rabmtve + > rabperprg + > + rabretrai + statnuit + tarcl06 + utilusa + sexeocc + ageocc + napocc, > + data = coll.train) > dim(ttt) > [1] 677236 109 > > ## OK so far, but if I had one more variable there will be missing rows. 
> >> ttt <- model.matrix(~kmpleasure + vehage + age + gender + marital.status > + > + license.category + minor.conviction + driver.training.certificate + > + admhybrid + anpol + anveh + cie + dblct + faq13c + faq20 + faq27 + > faq43 + > + faq5a + fra2 + frb2 + frb3 + kmaff + kmannuel + kmtravai + lima + > maison + > + nacp + nap + nbcond + nbcondpo + nbvt + rabmlt06 + rabmtve + > rabperprg + > + rabretrai + statnuit + tarcl06 + utilusa + sexeocc + ageocc + napocc > + > + prof.b2, data = coll.train) > dim(ttt) > [1] 676379 110 > > Is there a limit to the size of a matrix and of a data.frame. I know the > limit for the length of a vector to be 2^31, but we are very far from that > here. Am I missing something? > > Thanks for any support, > > Gérald Jean > Conseiller senior en statistiques, > VP Actuariat et Solutions d'assurances, > Desjardins Groupe d'Assurances Générales > télephone : (418) 835-4900 poste (7639) > télecopieur : (418) 835-6657 > courrier électronique: gerald.j...@dgag.ca > > "We believe in God, others must bring Data." > > W. Edwards Deming > > > Le message ci-dessus, ainsi que les documents l'accompagnant, sont destinés > uniquement aux personnes identifiées et peuvent contenir des informations > privilégiées, confidentielles ou ne pouvant être divulguées. Si vous avez > reçu ce message par erreur, veuillez le détruire. > > This communication ( and/or the attachments ) is intended for named > recipients only and may contain privileged or confidential information which > is > not to be disclosed. If you received this communication by mistake please > destroy all copies. > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Matthew C Keller Asst. 
Professor of Psychology University of Colorado at Boulder www.matthewckeller.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
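The likely explanation for Gérald's missing rows above is not a size limit at all: when given a formula, model.matrix() first builds a model frame under the default na.action (na.omit), so any row with an NA in any variable of the formula is silently dropped, and each added variable (such as prof.b2) can remove further rows where it is missing. A small sketch with toy data (column names invented):

```r
# Toy frame: one NA in a, one NA in b (invented for illustration)
df <- data.frame(y = 1:6,
                 a = c(1, 2, NA, 4, 5, 6),
                 b = c("u", "v", "u", NA, "v", "u"))

nrow(model.matrix(~ a, data = df))      # 5: the row with NA in a is dropped
nrow(model.matrix(~ a + b, data = df))  # 4: adding b drops its NA row too
```

Checking `colSums(is.na(coll.train))` for the variables in the formula would confirm whether the lost rows match the NA counts.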
Re: [R] Using plyr::dply more (memory) efficiently?
I don't know about that, but try this:

install.packages("data.table", repos="http://R-Forge.R-project.org")
require(data.table)
summaries = data.table(summaries)
summaries[, sum(counts), by=symbol]

Please let us know if that returns the correct result, and if its memory/speed is ok?

Matthew

"Steve Lianoglou" wrote in message news:w2kbbdc7ed01004290606lc425e47cs95b36f6bf0a...@mail.gmail.com...
> Hi all,
>
> In short:
>
> I'm running ddply on an admittedly (somehow) large data.frame (not
> that large). It runs fine until it finishes and gets to the
> "collating" part where all subsets of my data.frame have been
> summarized and they are being reassembled into the final summary
> data.frame (sorry, don't know the correct plyr terminology). During
> collation, my R workspace RAM usage goes from about 1.5 GB up to 20GB
> until I kill it.
>
> Running a similar piece of code that iterates manually w/o ddply by
> using a combo of lapply and a do.call(rbind, ...) uses considerably
> less RAM (tops out at about 8GB).
>
> How can I use ddply more efficiently?
>
> Longer:
>
> Here's more info:
>
> * The data.frame itself ~ 15.8 MB when loaded.
> * ~400,000 rows, 8 columns
>
> It looks like so:
>
>    exon.start exon.width exon.width.unique exon.anno counts symbol transcript  chr
> 1        4225        468                 0       utr      0 WASH5P     WASH5P chr1
> 2        4833         69                 0       utr      1 WASH5P     WASH5P chr1
> 3        5659        152                38       utr      1 WASH5P     WASH5P chr1
> 4        6470        159                 0       utr      0 WASH5P     WASH5P chr1
> 5        6721        198                 0       utr      0 WASH5P     WASH5P chr1
> 6        7096        136                 0       utr      0 WASH5P     WASH5P chr1
> 7        7469        137                 0       utr      0 WASH5P     WASH5P chr1
> 8        7778        147                 0       utr      0 WASH5P     WASH5P chr1
> 9        8131         99                 0       utr      0 WASH5P     WASH5P chr1
> 10      14601        154                 0       utr      0 WASH5P     WASH5P chr1
> 11      19184         50                 0       utr      0 WASH5P     WASH5P chr1
> 12       4693        140                36    intron      2 WASH5P     WASH5P chr1
> 13       4902        757                36    intron      1 WASH5P     WASH5P chr1
> 14       5811        659               144    intron     47 WASH5P     WASH5P chr1
> 15       6629         92                21    intron      1 WASH5P     WASH5P chr1
> 16       6919        177                 0    intron      0 WASH5P     WASH5P chr1
> 17       7232        237                35    intron      2 WASH5P     WASH5P chr1
> 18       7606        172                 0    intron      0 WASH5P     WASH5P chr1
> 19       7925        206                 0    intron      0 WASH5P     WASH5P chr1
> 20       8230       6371               109    intron     67 WASH5P     WASH5P chr1
> 21      14755       4429                55    intron     12 WASH5P     WASH5P chr1
> ...
>
> I'm "ply"-ing over the "transcript" column and the function transforms
> each such subset of the data.frame into a new data.frame that is just
> 1 row per transcript, which basically has the sum of the "counts" for
> each transcript.
>
> The code would look something like this (`summaries` is the data.frame
> I'm referring to):
>
> rpkm <- ddply(summaries, .(transcript), function(df) {
>   data.frame(symbol=df$symbol[1], counts=sum(df$counts))
> })
>
> (It actually calculates 2 more columns that are returned in the
> data.frame, but I'm not sure that's really important here.)
>
> To test some things out, I've written another function to manually
> iterate/create subsets of my data.frame to summarize.
>
> I'm using sqldf to dump the data.frame into a db, then I lapply over
> subsets of the db (`where transcript=x`) to summarize each subset of my
> data into a list of single-row data.frames (like ddply is doing), and
> finish with a `do.call(rbind, the.dfs)` on this list.
>
> This returns the exact same result ddply would return, and by the time
> `do.call` finishes, my RAM usage hits about 8 GB.
>
> So, what am I doing wrong with ddply that makes the RAM usage in the
> last step ("collation" -- the equivalent of my final
> `do.call(rbind, my.dfs)`) more than 12 GB higher?
>
> Thanks,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
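The manual split/summarize/rbind approach described above can be sketched in base R. The toy `summaries` data.frame below is hypothetical (the real one has ~400,000 rows and more columns), but the shape of the computation is the same:

```r
# Toy stand-in for the real `summaries` data.frame (hypothetical values)
summaries <- data.frame(
  counts     = c(0, 1, 1, 2, 47, 12),
  symbol     = c("WASH5P", "WASH5P", "WASH5P", "GENE2", "GENE2", "GENE2"),
  transcript = c("WASH5P", "WASH5P", "WASH5P", "GENE2", "GENE2", "GENE2"),
  stringsAsFactors = FALSE
)

# Manual equivalent of the ddply call: split by transcript,
# summarize each chunk to one row, then rbind the pieces back together
chunks  <- split(summaries, summaries$transcript)
the.dfs <- lapply(chunks, function(df) {
  data.frame(symbol = df$symbol[1], counts = sum(df$counts),
             stringsAsFactors = FALSE)
})
rpkm <- do.call(rbind, the.dfs)
rpkm
#        symbol counts
# GENE2   GENE2     61
# WASH5P WASH5P      2
```

Note that `do.call(rbind, ...)` still copies every piece once at the end, which is where both this and the ddply version pay their memory cost.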
Re: [R] Using plyr::ddply more (memory) efficiently?
"Steve Lianoglou" wrote in message news:t2ybbdc7ed01004290812n433515b5vb15b49c170f5a...@mail.gmail.com...

> Thanks for directing me to the data.table package. I read through some
> of the vignettes, and it looks quite nice.
>
> While your sample code would provide the answer if I wanted to just
> compute some summary statistic/function over groups of my data.frame
> (using `by=symbol`), what's the best way to produce several pieces of
> info per subset?
>
> For instance, I see that I can do something like this:
>
> summaries[, list(counts=sum(counts), width=sum(exon.width)), by=symbol]

Yes, that's it.

> But what if I need to do some more complex processing within the
> subsets defined by `by=symbol` -- like several lines of programming
> logic for one result, say?
>
> I guess I can open a new block that just returns a data.table? Like:
>
> summaries[, {
>   cnts <- sum(counts)
>   ew <- sum(exon.width)
>   # ... some complex things
>   complex <- # .. result of complex things
>   data.table(counts=cnts, width=ew, cplx=complex)
> }, by=symbol]
>
> Is that right? (I mean, it looks like it's working, but maybe there's
> a more idiomatic way?)

Yes, you've got it. Rather than a data.table at the end, though, just return a list; it's faster. Shorter vectors will still be recycled to match any longer ones. Or just this:

summaries[, list(
    counts = sum(counts),
    width  = sum(exon.width),
    cplx   = # .. result of complex things
), by=symbol]

Sounds like it's working, but could you give us an idea whether it is quick and memory efficient?
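For comparison, the same several-results-per-group pattern can be written in base R with `split()` and `do.call(rbind, ...)`; the toy data and the `density` column below are hypothetical stand-ins for the "complex things" in the thread:

```r
# Hypothetical toy data with the two columns used in the thread
summaries <- data.frame(
  counts     = c(1, 2, 3, 10, 20),
  exon.width = c(100, 150, 50, 300, 400),
  symbol     = c("A", "A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Several computed columns per group: any amount of logic fits in the
# per-group function body, as long as it returns a one-row data.frame
res <- do.call(rbind, lapply(split(summaries, summaries$symbol), function(df) {
  cnts <- sum(df$counts)
  ew   <- sum(df$exon.width)
  data.frame(symbol  = df$symbol[1],
             counts  = cnts,
             width   = ew,
             density = cnts / ew,   # stand-in for the "complex" result
             stringsAsFactors = FALSE)
}))
res
```

The data.table `{ ... }` block in the quoted code does the same thing, but groups in place without materializing a list of intermediate data.frames, which is why it tends to be faster and lighter on memory.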
[R] Calculating Random Effects Coefficients from lmer
Hello all,

I am new to the listserv but hope to contribute what I can. For now, however, I hope to tap into all your knowledge about mixed-effects models.

Currently, I am running a mixed-effects model on time-series panel data. For this model I want to find the fixed and random effect for two discrete variables.

My model:

m1 <- lmer(mghegdp_who ~ govdisgdp_up_net + pridisgdp_up_net + mgdppc_usd06_imf +
           drdisgdp + mggegdpwb + hiv_prevalence +
           (0 + govdisgdp_up_net|country) + (0 + pridisgdp_up_net|country), data)

To find the overall effect *with confidence intervals* of govdisgdp_up_net and pridisgdp_up_net, I need to account for the fixed and random effects. Has anyone calculated this? Or does anyone have suggestions?

Also, does anyone know how to calculate the postVar for a model with multiple random effects?

Thank you, and I look forward to hearing your insights.

Matt
'Lost in Seattle'
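On combining fixed and random effects: in lme4, `coef()` on a fitted model returns, for each grouping level, the fixed effect plus that level's random effect (i.e. `fixef()` + `ranef()`). A minimal sketch on lme4's built-in `sleepstudy` data, since the poster's data is not available; the random-slope term mirrors the `(0 + x | country)` terms above:

```r
library(lme4)

# Random slope for Days by Subject, analogous to (0 + x | country)
m <- lmer(Reaction ~ Days + (0 + Days | Subject), sleepstudy)

fe      <- fixef(m)["Days"]            # overall (fixed) slope
re      <- ranef(m)$Subject[, "Days"]  # per-subject deviations from it
percoef <- coef(m)$Subject             # fixed + random, per subject

# coef() is exactly fixef() + ranef(), level by level:
all.equal(unname(fe + re), percoef[, "Days"])  # TRUE

# Approximate (Wald) confidence interval for the fixed effect:
confint(m, parm = "Days", method = "Wald")
```

Intervals for the combined (fixed + random) per-group effects are trickier, since they need the conditional variances of the random effects (`ranef(m, condVar = TRUE)`, the successor to the older `postVar` argument).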
[R] ARMAtoMA
Hello R users!

I have a question about the output of ARMAtoMA when used to calculate the variance of a model. I have a mixed model of the form ARMA(1,1). The actual model takes the form:

X(t) = 0.75*X(t-1) + a(t) - 0.4*a(t-1)

Given that gamma(0) takes the form [(1 + theta^2 - 2*theta*phi)/(1 - phi^2)]*sigma_a^2, I would expect a process variance of 4.02*sigma_a^2 when I substitute 0.75 for phi and -0.4 for theta.

When I run ARMAtoMA,

result <- ARMAtoMA(ar=c(0.75), ma=c(-0.4), lag.max=40)
sum(result^2) + 1

I get 1.28. If I input 0.4 instead of -0.4 in ARMAtoMA, I get the result I expected. Is there a sign dependence in the R function that I am overlooking?

Thanks in advance.

Matt
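The sign dependence comes from R's parameterisation: `arima()`/`ARMAtoMA()` write the MA part with a plus sign, X(t) = phi*X(t-1) + a(t) + theta*a(t-1), whereas the Box-Jenkins formula quoted above uses a minus sign, so the textbook theta = 0.4 corresponds to `ma = -0.4` in R. A quick numeric check (assuming unit innovation variance):

```r
phi   <- 0.75
theta <- -0.4   # R's convention: X(t) = phi*X(t-1) + a(t) + theta*a(t-1)

# Closed-form ARMA(1,1) variance under R's sign convention:
# gamma(0)/sigma_a^2 = (1 + theta^2 + 2*phi*theta) / (1 - phi^2)
v_formula <- (1 + theta^2 + 2 * phi * theta) / (1 - phi^2)

# Same quantity from the psi-weights; a generous lag.max ensures the
# truncated sum has converged (psi_j decays like phi^j)
psi   <- ARMAtoMA(ar = phi, ma = theta, lag.max = 200)
v_psi <- 1 + sum(psi^2)

c(v_formula, v_psi)   # both 1.28
```

So ARMAtoMA is consistent with its own sign convention; plugging theta = -0.4 into the minus-sign textbook formula is what produces the 4.02 figure.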