Dear Peter,

Thank you for your answer; the function na.locf() is exactly what I needed!

I had already started processing my dataset, so the first lines (used as headers) were not included in the sample I sent. In the full dataset, though, there is also a "unit" line before the first value.
And yes, of course, divide by 1000.

Best,
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On May 10, 2019 at 3:29 PM peter dalgaard <pda...@gmail.com> wrote:

> From nm to micron, _divide_ by 1000.... (as you likely know)
>
> What are the units of the first value? Looks like micron in your example, but
> is there a rule?
>
> Basically, it is a "last observation carried forward" type problem, so
> something like this:
>
> my.data <- structure(list(V1 = c("2019/05/10", "#", "#", "#", "2019/05/10",
> "2019/05/10", "2019/05/10", "#", "#", "#", "2019/05/10", "#", "#", "#",
> "2019/05/10", "#", "#", "#", "2019/05/10", "2019/05/10"), V19 =
> c("0.2012800083", "45", "Sq", "µm", "0.3634383236", "0.4360454777",
> "0.3767733568", "45", "Sq", "nm", "102.013048", "45", "Sq", "µm",
> "0.1413840498", "45", "Sq", "nm", "65.4459715", "46.45802917")), row.names =
> c(NA, 20L), class = "data.frame")
>
> y <- my.data$V19
> u <- ifelse(y=="nm" | y=="µm", y, NA)
> num <- my.data$V1 != "#"
> uu <- zoo::na.locf(u, na.rm=FALSE)
> data.frame(val = as.numeric(y[num]), units = uu[num])
>
> giving
>
>           val units
> 1   0.2012800  <NA>
> 2   0.3634383    µm
> 3   0.4360455    µm
> 4   0.3767734    µm
> 5 102.0130480    nm
> 6   0.1413840    µm
> 7  65.4459715    nm
> 8  46.4580292    nm
>
> and you can surely take it from there.
>
> -pd
>
> > On 10 May 2019, at 13:54 , Ivan Calandra <calan...@rgzm.de> wrote:
> >
> > Dear useRs,
> >
> > Below is a sample of my dataset (I have more rows and columns).
> >
> > As you can see in the 2nd column, there are values, the name of the parameter
> > ('Sq' in that case), some integer ('45' in that case) and the unit ('µm' or
> > 'nm').
> > I know how to extract the rows of interest (those with values), but they are
> > expressed in different units. All values following a line with the unit are
> > expressed in that unit, but the number of lines is not constant (sometimes each
> > value is expressed in a different unit so there will be a new unit line, but
> > there are sometimes several values in a row expressed in the same unit so
> > without unit lines in between). I hope this is clear (it should be with the
> > example provided).
> > This messy dataset comes from an external software so I don't have any means to
> > format the ways the data are collated. I have to find a way to deal with it in
> > R.
> >
> > What I would like to do is convert the values in nm to µm; I just need to
> > multiply by 1000.
> >
> > What I don't know is how to identify the values that are expressed in nm (all
> > values that follow a line with 'nm' until there is a line with 'µm').
> >
> > I don't even know how I should search online because I don't know how this kind
> > of operation is called.
> > Any help is appreciated.
> >
> > Thank you in advance.
> > Ivan
> >
> > my.data <- structure(list(V1 = c("2019/05/10", "#", "#", "#", "2019/05/10",
> > "2019/05/10", "2019/05/10", "#", "#", "#", "2019/05/10", "#", "#", "#",
> > "2019/05/10", "#", "#", "#", "2019/05/10", "2019/05/10"), V19 =
> > c("0.2012800083", "45", "Sq", "µm", "0.3634383236", "0.4360454777",
> > "0.3767733568", "45", "Sq", "nm", "102.013048", "45", "Sq", "µm",
> > "0.1413840498", "45", "Sq", "nm", "65.4459715", "46.45802917")), row.names =
> > c(NA, 20L), class = "data.frame")
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd....@cbs.dk Priv: pda...@gmail.com
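
For readers of the archive, the remaining step discussed in this thread (carry the unit forward, then divide the nm values by 1000) might look roughly like the sketch below. It is not code posted by either author. It swaps zoo::na.locf() for a base-R carry-forward built on cummax(), so that it runs without extra packages. The names um, is_value, last_seen, res and val_um are made up for illustration. It also assumes that a value with no preceding unit line (the first row of the sample) is left unchanged; in the full dataset a unit line reportedly precedes the first value, so that case should not arise there.

```r
# "µm" written as a Unicode escape so the script does not depend on file encoding.
um <- "\u00b5m"

# Sample data from the thread (only the two columns that were posted).
my.data <- data.frame(
  V1 = c("2019/05/10", "#", "#", "#", "2019/05/10", "2019/05/10", "2019/05/10",
         "#", "#", "#", "2019/05/10", "#", "#", "#", "2019/05/10", "#", "#", "#",
         "2019/05/10", "2019/05/10"),
  V19 = c("0.2012800083", "45", "Sq", um, "0.3634383236", "0.4360454777",
          "0.3767733568", "45", "Sq", "nm", "102.013048", "45", "Sq", um,
          "0.1413840498", "45", "Sq", "nm", "65.4459715", "46.45802917"),
  stringsAsFactors = FALSE
)

y <- my.data$V19
is_value <- my.data$V1 != "#"   # rows that hold a measurement

# Keep only the unit labels; every other entry becomes NA.
u <- ifelse(y %in% c("nm", um), y, NA_character_)

# Base-R "last observation carried forward": for each row, find the index of
# the most recent non-NA unit (0 if none has been seen yet).
last_seen <- cummax(ifelse(is.na(u), 0L, seq_along(u)))
uu <- rep(NA_character_, length(u))
uu[last_seen > 0] <- u[last_seen[last_seen > 0]]

res <- data.frame(val = as.numeric(y[is_value]),
                  units = uu[is_value],
                  stringsAsFactors = FALSE)

# Convert nm to µm by dividing by 1000.
# Assumption: rows with an unknown unit (NA) are left as they are.
in_nm <- !is.na(res$units) & res$units == "nm"
res$val_um <- ifelse(in_nm, res$val / 1000, res$val)

print(res)
```

The units column produced here should match the one in Peter's output above; with zoo installed, zoo::na.locf(u, na.rm = FALSE) can replace the two cummax() lines.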