After reading the metals data frame, I would do this:

metals$result <- as.numeric(gsub('<','',metals$Cedar.Creek))
metals$flag <- ifelse(grepl('<',metals$Cedar.Creek),'<','h')

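As a minimal, self-contained sketch of the same idea: the four-row data frame and the names toy, censored and measured are made up here for illustration and are not from Sam's data. It uses a logical flag rather than a character one, which makes it easy to pull out only the measured rows afterwards.

```r
# Toy stand-in for the metals data; these four values are illustrative only.
toy <- data.frame(
  Parameter   = c("Antimony", "Barium", "Copper", "Zinc"),
  Cedar.Creek = c("<100", "<500", "516", "0.13"),
  stringsAsFactors = FALSE  # keep the column as character, not factor
)

# Numeric value with any "<" stripped (fixed = TRUE treats "<" literally).
toy$result <- as.numeric(gsub("<", "", toy$Cedar.Creek, fixed = TRUE))

# TRUE where the lab reported a detection limit rather than a measurement.
toy$censored <- grepl("<", toy$Cedar.Creek, fixed = TRUE)

# Keep only the rows that were actually measured.
measured <- toy[!toy$censored, ]
print(measured)
```

The original Cedar.Creek column is left untouched, so the information about which values were reported as "<k" is not lost.
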
Also, assuming you got your data into R using read.table(), read.csv(), or similar, I would include stringsAsFactors=FALSE as another argument to the function call. You don't need factors at this point.

-Don

-- 
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 7/9/14 11:02 AM, "Sam Albers" <tonightstheni...@gmail.com> wrote:

>Thanks for all the responses. It is sometimes difficult to outline
>exactly what you need. These responses were helpful to get there.
>Speaking to Bert's point a bit, I needed a column to identify where
>the < symbol was used. If I knew more about R I think I might be
>embarrassed to post my solution to that problem but here is how I used
>Sarah's solution but still kept the info about detection limits. I'm
>sure there is a more elegant way:
>
>metals <-
>structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
>8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L), .Label
>= c("Antimony",
>"Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)",
>"Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury",
>"Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium",
>"Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek =
>structure(c(3L,
>3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
>4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200",
>"<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4",
>"1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4",
>"22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50",
>"516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72",
>"77", "89", "951"), class = "factor")), .Names = c("Parameter",
>"Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")
>
>metals$temp1<-metals$Cedar.Creek
>metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
>metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
>metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)
>
>metals$temp2<-metals$temp1==metals$Cedar.Creek
>metals$Detection<-factor(ifelse(metals$temp2=="TRUE","Measured","Limit"))
>metals[,c(1,2,5)]
>
>Thanks again!
>
>Sam
>
>On Wed, Jul 9, 2014 at 10:41 AM, Bert Gunter <gunter.ber...@gene.com> wrote:
>> Well, ?grep and ?regex are clearly apropos here -- dealing with
>> character data is an essential skill for handling input from diverse
>> sources with various formatting conventions. I suggest you go through
>> one of the many regular expression tutorials on the web to learn more.
>>
>> But this may not be the important issue here at all. If "<k" means the
>> value is left censored at k -- i.e. we know it's less than k but not
>> how much less -- then Sarah's proposal is not what you want to do.
>> Exactly what you do want to do depends on context, and as it concerns
>> statistical methodology, is not something that should be discussed
>> here. Consult a local statistician if this is a correct guess.
>> Otherwise ignore.
>>
>> ... and please post in plain text in future (as requested) as HTML can
>> get garbled.
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>> (650) 467-7374
>>
>> "Data is not information. Information is not knowledge. And knowledge
>> is certainly not wisdom."
>> Clifford Stoll
>>
>> On Wed, Jul 9, 2014 at 10:26 AM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
>>> Hi Sam,
>>>
>>> I'd take the similar tack of removing the < instead. Note that if you
>>> import the data frame using the stringsAsFactors=FALSE argument, you
>>> don't need the first step.
>>>
>>> metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
>>> metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
>>> metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)
>>>
>>> R> str(metals)
>>> 'data.frame': 19 obs. of 2 variables:
>>> $ Parameter : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6 7 8 9 10 11 ...
>>> $ Cedar.Creek: num 100 100 500 100 10 1000 100 516 550 10 ...
>>>
>>> Sarah
>>>
>>> On Wed, Jul 9, 2014 at 1:19 PM, Sam Albers <tonightstheni...@gmail.com> wrote:
>>>> Hello,
>>>>
>>>> I have recently received a dataset from a metal analysis company. The
>>>> dataset is filled with less than symbols. What I am looking for is an
>>>> efficient way to subset for any whole numbers from the dataset. The
>>>> column is automatically formatted as a factor because of the "<"
>>>> symbols, making it difficult to deal with the numbers in a useful way.
>>>>
>>>> So in sum any ideas on how I could subset the example below for only
>>>> whole numbers?
>>>>
>>>> Thanks in advance!
>>>>
>>>> Sam
>>>>
>>>> #code
>>>>
>>>> metals <-
>>>> structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
>>>> 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L), .Label
>>>> = c("Antimony",
>>>> "Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)",
>>>> "Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury",
>>>> "Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium",
>>>> "Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek =
>>>> structure(c(3L,
>>>> 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
>>>> 4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200",
>>>> "<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4",
>>>> "1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4",
>>>> "22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50",
>>>> "516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72",
>>>> "77", "89", "951"), class = "factor")), .Names = c("Parameter",
>>>> "Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")
>>>>
>>>
>>> --
>>> Sarah Goslee
>>> http://www.functionaldiversity.org
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.