(For the units) Why not simply:
sub(".*\\[(.+)\\]","\\1", headers)

Cheers,
Bert

On Tue, Sep 17, 2019 at 6:40 AM Ivan Calandra <calan...@rgzm.de> wrote:

> Thank you Ivan for your help!
>
> Your solution for the first problem is so simple I didn't even think
> about it!
> What I find weird is that "_w_|\\.csv$" works as expected ("OR"), but is
> there no way to combine two patterns with an "AND"?
>
> Your solution to the second problem is actually unfortunately even more
> complicated to me than the gsub() solution. But I'm glad I can learn
> about regmatches() and regexpr()!
>
> Best,
> Ivan
>
> --
> Dr. Ivan Calandra
> TraCEr, laboratory for Traceology and Controlled Experiments
> MONREPOS Archaeological Research Centre and
> Museum for Human Behavioural Evolution
> Schloss Monrepos
> 56567 Neuwied, Germany
> +49 (0) 2631 9772-243
> https://www.researchgate.net/profile/Ivan_Calandra
>
> On 17/09/2019 09:14, Ivan Krylov wrote:
> > On Tue, 17 Sep 2019 08:48:43 +0200
> > Ivan Calandra <calan...@rgzm.de> wrote:
> >
> >> CSVs <- list.files(path=..., pattern="\\.csv$")
> >> w.files <- CSVs[grep(pattern="_w_", CSVs)]
> >>
> >> Of course, what I would like to do is list only the interesting files
> >> from the beginning, rather than subsetting the whole list of files.
> > One way to express that would be "_w_.*\\.csv$", meaning that the
> > filename has to have "_w_" in it, followed by anything (any character
> > repeated any number of times, including 0), followed by ".csv" at the
> > end of the line.
> >
> >> 2) The units of the variables are given in the original headers. I
> >> would like to extract the units. This is what I did:
> >> headers <- c("dist to origin on curve [mm]","segment on section [mm]",
> >>              "angle 1 [degree]", "angle 2 [degree]","angle 3 [degree]")
> >> units.var <- gsub(pattern="^.*\\[|\\]$", "", headers)
> >>
> >> It seems to me to be overly complicated using gsub(). Isn't there a way
> >> to extract what is interesting rather than deleting what is not?
> > Pure-R way: use regmatches() + regexpr(). Both regmatches and regexpr
> > take the character vector as an argument, so duplication is hard to
> > avoid:
> >
> > units <- regmatches(headers, regexpr('\\[.*\\]', headers))
> >
> > The stringr package has an str_match() function with a nicer interface:
> > str_match(headers, '\\[.*\\]') -> units.
> >
> > Such "greedy" patterns containing ".*" present a few pitfalls, e.g.
> > looking for text in parentheses using the pattern "\\(.*\\)" in
> > "...(abc)...(def)..." will match the whole "(abc)...(def)" instead of
> > single groups "(abc)" and "(def)", but with your examples the pattern
> > should work as presented. One other option would be to ask for "[",
> > followed by zero or more characters that are not "]", followed by "]":
> > '\\[[^]]*\\]'.
> >
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
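
For comparison, here is a minimal base-R sketch that runs the patterns discussed in this thread side by side on the example headers. The result names (units_sub, units_gsub, units_rm, w_files) and the file names are made up for illustration and do not come from the original posts.

```r
headers <- c("dist to origin on curve [mm]", "segment on section [mm]",
             "angle 1 [degree]", "angle 2 [degree]", "angle 3 [degree]")

# sub() with a capture group: keep only what is inside the brackets
units_sub <- sub(".*\\[(.+)\\]", "\\1", headers)
# "mm" "mm" "degree" "degree" "degree"

# gsub() deleting everything around the unit (the original approach)
units_gsub <- gsub("^.*\\[|\\]$", "", headers)

# regmatches() + regexpr() with the non-greedy-style character class;
# note that the match still includes the brackets
units_rm <- regmatches(headers, regexpr("\\[[^]]*\\]", headers))
# "[mm]" "[mm]" "[degree]" "[degree]" "[degree]"

# File name pattern: "_w_" somewhere, ".csv" at the end
files <- c("a_w_1.csv", "a_x_1.csv", "b_w_2.txt")  # hypothetical file names
w_files <- grep("_w_.*\\.csv$", files, value = TRUE)
# "a_w_1.csv"
```

The sub() and gsub() variants give the same result here; the regmatches() variant would need one more step to strip the brackets.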