Re: [R] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statements

2024-11-27 Thread Tom Woolman
Oh and don't forget: #first line of code, bring dplyr into memory after that package has been installed. library(dplyr) On Wednesday, November 27th, 2024 at 12:05 PM, Tom Woolman wrote: > > > Check out the dplyr package, specifically the mutate function. > > # Cre

Re: [R] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statements

2024-11-27 Thread Tom Woolman
Check out the dplyr package, specifically the mutate function. # Create new column based on existing column value df <- df %>% mutate(FirstDay = ifelse(ID == 2, 5, NA)) df Repeat as needed to capture all of the day/firstday combinations you want to account for. Like everything else in R, there are
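For the by-group question in the subject line, a minimal sketch of one way dplyr could mimic SAS `first.` and `retain` processing. The data frame and the column names `ID` and `Day` are hypothetical and are not taken from the original poster's data.

```r
library(dplyr)

# Hypothetical data: ID is the BY-group key, Day is the value to carry forward.
df <- data.frame(ID = c(1, 1, 2, 2, 2), Day = c(3, 7, 5, 6, 9))

df <- df %>%
  group_by(ID) %>%
  mutate(
    is_first = row_number() == 1,  # TRUE on the first row of each group, like SAS first.ID
    FirstDay = first(Day)          # first Day of the group, repeated on every row (RETAIN-like)
  ) %>%
  ungroup()

df
```

This relies on the rows already being in the desired order within each group; an `arrange()` step before `group_by()` may be needed otherwise.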

Re: [R] How to do non-parametric calculations in R

2022-06-11 Thread Tom Woolman
Imagine that it's the year 2022 and you don't know how to look up information about performing a Kruskal-Wallis H test. It would take you longer to join the listserv and then write such a cockamamie email than to open the stats textbook you are supposed to have for the course, much less doing

Re: [R] categorizing data

2022-05-29 Thread Tom Woolman
Some ideas: You could create a cluster model with k=3 for each of the 3 variables, to determine what constitutes high/medium/low centroid values for each of the 3 plant types. Centroid values could then be used as the upper/lower boundary ranges for high/med/low. Or utilize a hist
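A rough sketch of the k=3 clustering idea for a single variable, using `kmeans` from base R's stats package. The simulated vector `x` is an assumption for illustration, and this version labels points by cluster membership (clusters ranked by centroid) rather than by explicit boundary values.

```r
set.seed(42)

# Hypothetical measurements for one variable, drawn around three distinct levels.
x <- c(rnorm(30, mean = 10), rnorm(30, mean = 20), rnorm(30, mean = 30))

# Fit a 3-cluster model; nstart reruns with several random starts.
km <- kmeans(x, centers = 3, nstart = 10)

# Rank the clusters by centroid so that rank 1 = lowest centroid, 3 = highest.
ord <- order(km$centers[, 1])
level <- factor(match(km$cluster, ord),
                levels = 1:3,
                labels = c("low", "medium", "high"))

table(level)
sort(km$centers[, 1])  # centroid values, usable as reference points for each band
```

The same steps would be repeated for each of the other variables.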

Re: [R] Is there a canonical way to pronounce CRAN?

2022-05-04 Thread Tom Woolman
Everyone needs to speak English exactly like I do or else they're doing it wrong :) But I pronounce CRAN the same way that I pronounce the first half of cranberry. On 2022-05-04 20:24, Avi Gross via R-help wrote: Extended discussion may be a waste but speaking for myself, I found it highl

Re: [R] Combining data.frames

2022-03-19 Thread Tom Woolman
Have you looked at the merge function in base R? https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/merge On 2022-03-19 21:15, Jeff Reichman wrote: R-Help Community I'm trying to combine two data.frames, each containing 10 columns, of which they share two common fiel
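A small sketch of `merge` joining two data frames on two shared key columns. The data frames and the column names `id` and `year` are made up for illustration.

```r
# Two hypothetical data frames sharing the key columns id and year.
a <- data.frame(id = c(1, 2, 3), year = c(2020, 2020, 2021), x = c(10, 20, 30))
b <- data.frame(id = c(1, 2, 4), year = c(2020, 2020, 2021), y = c("a", "b", "c"))

# Inner join: keeps only rows whose (id, year) pair appears in both.
inner <- merge(a, b, by = c("id", "year"))

# Full outer join: keeps every row from both, filling gaps with NA.
full <- merge(a, b, by = c("id", "year"), all = TRUE)

inner
full
```

Using `all.x = TRUE` or `all.y = TRUE` instead of `all = TRUE` gives a left or right join.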

Re: [R] Time for a companion mailing list for R packages?

2022-01-13 Thread Tom Woolman
I concur on both of Eric's suggestions below. I love R but I couldn't imagine using it on a daily basis without "key" packages for various regression and classification modeling problems, etc. Likewise on being able to embed images (within reason... maybe establish a max KB or MB file size fo

Re: [R] Defining Parameters in arules

2021-11-23 Thread Tom Woolman
Graham Williams has a book titled "Data Mining with Rattle and R", which has a chapter on association rules and the arules package. Williams' Rattle GUI package for R also lets you define an association rules model using a graphical interface (which creates the R code for you in the log file for

Re: [R] Creating a log-transformed histogram of multiclass data

2021-08-03 Thread Tom Woolman
Apologies, I left out 3 critical lines of code after the randomized sample dataframe is created: group_a <- d[ which(d$label =='A'), ] group_b <- d[ which(d$label =='B'), ] group_c <- d[ which(d$label =='C'), ] On 2021-08-03 18:56, Tom Woolman wro
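As a sketch of the same per-class subsetting in one step, `split` can produce all three groups at once, after which a log-scaled histogram can be computed per group. The sample data frame `d` and its `value` column are assumptions here, since the original randomized data frame is not shown in this excerpt.

```r
set.seed(1)

# Hypothetical stand-in for the randomized sample data frame.
d <- data.frame(
  label = sample(c("A", "B", "C"), 300, replace = TRUE),
  value = rlnorm(300)  # strictly positive, so log10 is defined
)

# One list element per class, equivalent to group_a / group_b / group_c.
groups <- split(d, d$label)

# Histogram of log10(value) for each class; plot = FALSE returns the bin counts only.
h <- lapply(groups, function(g) hist(log10(g$value), plot = FALSE))

sapply(groups, nrow)
```

Calling `plot(h$A)` (or `hist` without `plot = FALSE`) would draw the chart for a given class.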

[R] Creating a log-transformed histogram of multiclass data

2021-08-03 Thread Tom Woolman
# Resending this message since the original email was held in queue by the listserv software because of a "suspicious" subject line, and/or because of attached .png histogram chart attachments. I'm guessing that the listserv software doesn't like multiple image file attachments. Hi everyon

Re: [R] [EXT] Re: Assigning categorical values to dates

2021-07-21 Thread Tom Woolman
ngton State University Graduate Advocate, American Association of University Professors (OR) Recent work (https://www.researchgate.net/profile/Nathan_Parsons3/publications) Schedule an appointment (https://calendly.com/nate-parsons) On Wednesday, Jul 21, 2021 at

Re: [R] Assigning categorical values to dates

2021-07-21 Thread Tom Woolman
y, Washington State University Graduate Advocate, American Association of University Professors (OR) Recent work (https://www.researchgate.net/profile/Nathan_Parsons3/publications) Schedule an appointment (https://calendly.com/nate-parsons) On Wednesday, Jul 21, 2021 at 8:30 PM, Tom Woolman

Re: [R] Assigning categorical values to dates

2021-07-21 Thread Tom Woolman
Couldn't you convert the date columns to character type data in a data frame, and then convert those strings to factors in a 2nd step? The only downside I think to treating dates as factor levels is that you might have an awful lot of factors if you have a large enough dataset. Quoti
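A minimal sketch of the two-step conversion described above, using made-up dates:

```r
# Hypothetical date column.
dates <- as.Date(c("2021-07-19", "2021-07-21", "2021-07-19", "2021-07-20"))

# Step 1: dates to character strings (ISO format, so they sort chronologically).
date_chr <- as.character(dates)

# Step 2: strings to a factor; each distinct date becomes one level.
date_fct <- factor(date_chr)

levels(date_fct)
nlevels(date_fct)  # grows with the number of distinct dates in the data
```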

Re: [R] Using R to analyse Court documents

2021-07-20 Thread Tom Woolman
Hi Brian. I assume you're interested in some kind of classification of the theme or the contents within each document? In which case I would direct you to natural language processing for multinomial classification of unstructured data. Basically an NLP (natural language processing) classifica

Re: [R] Windows path backward slash

2020-12-24 Thread Tom Woolman
In Windows versions of R/RStudio when referring to filename paths, you need to either use two "\\" characters instead of one, OR use the forward slash "/" as used in Linux/Unix. It's an unfortunate conflict between R and Windows in that a single \ character by itself is treated as an esc
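A short sketch of the equivalent ways to write a Windows path in R. The path itself is hypothetical and nothing is read from disk.

```r
# Doubled backslashes: each "\\" in the source is one backslash in the string.
p1 <- "C:\\Users\\me\\data.csv"

# Forward slashes, which R accepts on Windows as well.
p2 <- "C:/Users/me/data.csv"

# Raw string literal (available from R 4.0.0): backslashes are taken literally.
p3 <- r"(C:\Users\me\data.csv)"

# file.path() builds the path with forward slashes.
p4 <- file.path("C:", "Users", "me", "data.csv")

identical(p1, p3)
identical(p2, p4)
```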

Re: [R] cooks distance for repeated measures anova

2020-12-23 Thread Tom Woolman
Hi Dr. Pedersen. I haven't used Cook's distance on an aov object, but I do it all the time from an lm (general linear model) object, i.e.: mod <- lm(data=dataframe) cooksdistance <- cooks.distance(mod) I *think* you might be able to simulate an aov using the lm function by selecting the parameter in
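A sketch of the `lm` route using the built-in `mtcars` data; the formula and the 4/n cutoff are illustrative assumptions, not part of the original message. A single-stratum `aov` fit also inherits from `lm`, so `cooks.distance` accepts it as well. A repeated-measures fit with an `Error()` term returns a different kind of object and is not covered by this sketch.

```r
# Linear model on a built-in data set (formula chosen only for illustration).
mod <- lm(mpg ~ wt + hp, data = mtcars)
cd <- cooks.distance(mod)

# A common rule of thumb flags observations with distance above 4 / n.
influential <- cd[cd > 4 / nrow(mtcars)]

# One-way aov fit: the object has class c("aov", "lm"), so the same call works.
mod_aov <- aov(mpg ~ factor(cyl), data = mtcars)
cd_aov <- cooks.distance(mod_aov)

head(sort(cd, decreasing = TRUE))
```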

Re: [R] counting duplicate items that occur in multiple groups

2020-11-18 Thread Tom Woolman
ount dupAcctID<-colSums(table(Data)>0) Data$dupAcct<-NA # fill in the new column for(i in 1:length(dupAcctID)) Data$dupAcct[Data$AcctID == names(dupAcctID[i])]<-dupAcctID[i] Jim On Wed, Nov 18, 2020 at 8:20 AM Tom Woolman wrote: Hi everyone. I have a dataframe that is a collectio

Re: [R] counting duplicate items that occur in multiple groups

2020-11-17 Thread Tom Woolman
in his "Bloom County" comic strip ) On Tue, Nov 17, 2020 at 3:29 PM Tom Woolman wrote: Hi Bill. Sorry to be so obtuse with the example data, I was trying (too hard) not to share any actual values so I just created randomized values for my example; of course I should have specified th

Re: [R] counting duplicate items that occur in multiple groups

2020-11-17 Thread Tom Woolman
")) ? Must each vendor have only one account? If not, what should the result be for Data2 <- data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"), Account=c("A1","A2","A2","A2","A3","

[R] counting duplicate items that occur in multiple groups

2020-11-17 Thread Tom Woolman
Hi everyone. I have a dataframe that is a collection of Vendor IDs plus a bank account number for each vendor. I'm trying to find a way to count the number of duplicate bank accounts that occur in more than one unique Vendor_ID, and then assign the count value for each row in the dataframe
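One possible reading of the question, sketched in base R: for every row, count how many distinct vendors use that row's bank account. The data and the column name `Account` are hypothetical; only `Vendor_ID` appears in the message.

```r
# Hypothetical vendor/account pairs.
Data <- data.frame(
  Vendor_ID = c("V1", "V2", "V3", "V1", "V4", "V2"),
  Account   = c("A1", "A2", "A2", "A2", "A3", "A4")
)

# Number of distinct vendors per account, named by account.
n_vendors <- tapply(Data$Vendor_ID, Data$Account, function(v) length(unique(v)))

# Look up each row's account and attach the count to that row.
Data$vendors_per_account <- as.integer(n_vendors[Data$Account])

# Flag accounts shared by more than one vendor.
Data$shared <- Data$vendors_per_account > 1

Data
```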

[R] RIDIT scoring in R

2020-09-14 Thread Tom Woolman
Hi everyone. I'd like to perform RIDIT scoring of a column that consists of ordinal values, but I don't have a comparison dataset to use against it as required by the Ridit::ridit function. As a question of best practice, could I use a normally distributed frequency distribution table gen

Re: [R] Assigning cores

2020-09-03 Thread Tom Woolman
Hi Leslie and all. You may want to investigate using SparklyR on a cloud environment like AWS, where you have more packages that are designed to work on cluster computing environments and you have more control over those types of parallel operations. V/r, Tom W. Quoting Leslie Rutkows

Re: [R] kernlab ksvm rbfdot kernel - prediction returning fewer rows than provided for input

2020-06-10 Thread Tom Woolman
. Quoting Tom Woolman : Hi everyone. I'm using the kernlab ksvm function with the rbfdot kernel for a binary classification problem and getting a strange result back. The predictions seem to be very accurate judging by the training results provided by the algorithm, but I'm unable to

[R] kernlab ksvm rbfdot kernel - prediction returning fewer rows than provided for input

2020-06-10 Thread Tom Woolman
Hi everyone. I'm using the kernlab ksvm function with the rbfdot kernel for a binary classification problem and getting a strange result back. The predictions seem to be very accurate judging by the training results provided by the algorithm, but I'm unable to generate a confusion matrix

[R] random forest significance testing tools

2020-05-10 Thread Tom Woolman
train[,1:29], nperm=99, ntree=500) Thanks in advance. Tom Woolman PhD student, Indiana State University

[R] Problem with nnet:multinom

2019-06-21 Thread Tom Woolman
I am using R with the nnet package to perform a multinomial logistic regression on a training dataset with ~5800 records and 45 predictor variables. Predictor variables were chosen as a subset of all ~120 available variables based on PCA. My t

[R] Trying to fix code that will find highest 5 column names and their associated values for each row in a data frame in R

2018-12-17 Thread Tom Woolman
I have a data frame with 10 variables of integer data for various attributes about each row of data, and I need to know the highest 5 variables for each row in this data frame and output that to a new data frame. In addition to the 5 highest variable names, I also need to kn
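A base R sketch of one way to pull the five largest values per row together with their column names. The data frame, its column names `v1` to `v10`, and the output column names are all made up for illustration.

```r
set.seed(7)

# Hypothetical data: 4 rows, 10 integer columns, no tied values.
df <- as.data.frame(matrix(sample(1:100, 40), nrow = 4,
                           dimnames = list(NULL, paste0("v", 1:10))))
m <- as.matrix(df)
n <- nrow(m)

# For each row, the column positions of the 5 largest values, largest first.
idx <- t(apply(m, 1, order, decreasing = TRUE))[, 1:5]

# Translate positions into column names and into the values themselves.
top_names  <- matrix(colnames(m)[idx], nrow = n)
top_values <- matrix(m[cbind(rep(seq_len(n), 5), as.vector(idx))], nrow = n)

# New data frame: name1..name5 followed by value1..value5.
result <- data.frame(top_names, top_values, stringsAsFactors = FALSE)
names(result) <- c(paste0("name", 1:5), paste0("value", 1:5))

result
```

Ties are broken by column position here, since `order` keeps the original ordering among equal values.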