https://stackoverflow.com/questions/3041320/regex-and-operator/37692545
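A minimal sketch of the "AND" question from the thread, using only base R's grepl(). The file names are made up for illustration and stand in for what list.files() would return. It compares the OR pattern from the thread with three ways of requiring both "_w_" and a ".csv" ending: the ordered pattern suggested below, two lookaheads (a PCRE feature, so perl = TRUE is needed), and two separate grepl() calls combined with &. As far as I know, list.files() has no perl argument, so only the ordered pattern can go straight into its pattern argument; the other two forms would filter the vector it returns.

```r
# Hypothetical file names standing in for the output of list.files(path = ...)
files <- c("sample1_w_run1.csv", "sample1_d_run1.csv",
           "sample2_w_run2.txt", "notes.csv", "sample3_w_.csv")

# OR: matches if either alternative is found, so it is too permissive here
either <- grepl("_w_|\\.csv$", files)   # TRUE for all five names

# 1) Ordered pattern: "_w_" somewhere, then anything, then ".csv" at the end.
#    Because ".csv" is anchored at the end, "_w_" can only occur before it,
#    so this is equivalent to an AND for these two conditions.
ordered <- grepl("_w_.*\\.csv$", files)

# 2) Order-independent AND with two lookaheads; lookaheads need PCRE (perl = TRUE)
lookahead <- grepl("^(?=.*_w_)(?=.*\\.csv$)", files, perl = TRUE)

# 3) Plain R: test each condition separately and combine the logical vectors
combined <- grepl("_w_", files) & grepl("\\.csv$", files)

files[combined]
# → "sample1_w_run1.csv" "sample3_w_.csv"
```

The & form is usually the easiest to read and extend; the lookahead form is mainly useful when an API accepts a single regex but also accepts perl = TRUE.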
On September 17, 2019 6:39:13 AM PDT, Ivan Calandra <calan...@rgzm.de> wrote:
>Thank you Ivan for your help!
>
>Your solution for the first problem is so simple I didn't even think
>about it!
>What I find weird is that "_w_|\\.csv$" works as expected ("OR"), but
>is there no way to combine two patterns with an "AND"?
>
>Your solution to the second problem is actually, unfortunately, even more
>complicated to me than the gsub() solution. But I'm glad I can learn
>about regmatches() and regexpr()!
>
>Best,
>Ivan
>
>--
>Dr. Ivan Calandra
>TraCEr, laboratory for Traceology and Controlled Experiments
>MONREPOS Archaeological Research Centre and
>Museum for Human Behavioural Evolution
>Schloss Monrepos
>56567 Neuwied, Germany
>+49 (0) 2631 9772-243
>https://www.researchgate.net/profile/Ivan_Calandra
>
>On 17/09/2019 09:14, Ivan Krylov wrote:
>> On Tue, 17 Sep 2019 08:48:43 +0200
>> Ivan Calandra <calan...@rgzm.de> wrote:
>>
>>> CSVs <- list.files(path=..., pattern="\\.csv$")
>>> w.files <- CSVs[grep(pattern="_w_", CSVs)]
>>>
>>> Of course, what I would like to do is list only the interesting files
>>> from the beginning, rather than subsetting the whole list of files.
>> One way to express that would be "_w_.*\\.csv$", meaning that the
>> filename has to have "_w_" in it, followed by anything (any character
>> repeated any number of times, including 0), followed by ".csv" at the
>> end of the line.
>>
>>> 2) The units of the variables are given in the original headers. I
>>> would like to extract the units. This is what I did:
>>> headers <- c("dist to origin on curve [mm]", "segment on section [mm]",
>>>              "angle 1 [degree]", "angle 2 [degree]", "angle 3 [degree]")
>>> units.var <- gsub(pattern="^.*\\[|\\]$", "", headers)
>>>
>>> It seems to me overly complicated to use gsub(). Isn't there a way
>>> to extract what is interesting rather than deleting what is not?
>> Pure-R way: use regmatches() + regexpr().
>> Both regmatches and regexpr
>> take the character vector as an argument, so duplication is hard to
>> avoid:
>>
>> units <- regmatches(headers, regexpr('\\[.*\\]', headers))
>>
>> The stringr package has an str_match() function with a nicer interface:
>> str_match(headers, '\\[.*\\]') -> units.
>>
>> Such "greedy" patterns containing ".*" present a few pitfalls, e.g.
>> looking for text in parentheses using the pattern "\\(.*\\)" in
>> "...(abc)...(def)..." will match the whole "(abc)...(def)" instead of
>> single groups "(abc)" and "(def)", but with your examples the pattern
>> should work as presented. One other option would be to ask for "[",
>> followed by zero or more characters that are not "]", followed by "]":
>> '\\[[^]]*\\]'.
>>
>
>______________________________________________
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

--
Sent from my phone. Please excuse my brevity.
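A minimal sketch of the unit-extraction question, using the headers vector from the thread and base R only. Note that the regmatches()/regexpr() call keeps the brackets ("[mm]"), so it is not a drop-in replacement for the gsub() result; putting the brackets inside a lookbehind and lookahead (perl = TRUE) leaves only the unit. The last lines reproduce the greedy-match pitfall described in the thread with the "(abc)...(def)" string. If stringr is used instead, str_match() returns a matrix (whole match in the first column), whereas str_extract() returns a plain character vector.

```r
headers <- c("dist to origin on curve [mm]", "segment on section [mm]",
             "angle 1 [degree]", "angle 2 [degree]", "angle 3 [degree]")

# regexpr() locates the first match in each string; regmatches() extracts it.
# The brackets are part of the match here.
with_brackets <- regmatches(headers, regexpr("\\[[^]]*\\]", headers))
# → "[mm]" "[mm]" "[degree]" "[degree]" "[degree]"

# Lookbehind/lookahead (PCRE, hence perl = TRUE) require the brackets
# without including them in the match, so only the unit is returned.
units <- regmatches(headers, regexpr("(?<=\\[)[^]]*(?=\\])", headers, perl = TRUE))
# → "mm" "mm" "degree" "degree" "degree"

identical(units, gsub("^.*\\[|\\]$", "", headers))   # TRUE: same as the gsub() approach

# Greedy ".*" vs. a negated character class on a string with two groups
x <- "...(abc)...(def)..."
regmatches(x, regexpr("\\(.*\\)", x))                 # greedy: "(abc)...(def)"
regmatches(x, regexpr("\\([^)]*\\)", x))              # first group only: "(abc)"
regmatches(x, gregexpr("\\([^)]*\\)", x))[[1]]        # all groups: "(abc)" "(def)"
```

The negated class [^]]* is what keeps each match inside a single pair of brackets; with only one bracketed unit per header, the greedy '\\[.*\\]' gives the same result.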