Thank you Ivan for your help!

Your solution for the first problem is so simple I didn't even think about it! What I find weird is that "_w_|\\.csv$" works as expected ("OR"), but is there no way to combine two patterns with an "AND"?

Your solution to the second problem is, unfortunately, even more complicated to me than the gsub() solution. But I'm glad I can learn about regmatches() and regexpr()!

Best,
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 17/09/2019 09:14, Ivan Krylov wrote:
On Tue, 17 Sep 2019 08:48:43 +0200
Ivan Calandra <calan...@rgzm.de> wrote:

CSVs <- list.files(path=..., pattern="\\.csv$")
w.files <- CSVs[grep(pattern="_w_", CSVs)]

Of course, what I would like to do is list only the interesting files
from the beginning, rather than subsetting the whole list of files.

One way to express that would be "_w_.*\\.csv$", meaning that the
filename has to have "_w_" in it, followed by anything (any character
repeated any number of times, including 0), followed by ".csv" at the
end of the line.
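
A minimal sketch of how this pattern behaves, using made-up file names
(they are hypothetical, not taken from the original data). It also shows
one way to get an "AND" of two separate patterns, by combining two
grepl() calls on an already listed vector; this is an alternative to a
single pattern, not something list.files() does by itself:

```r
# Hypothetical file names, for illustration only
files <- c("a_w_1.csv", "a_x_1.csv", "b_w_2.txt", "c_w_3.csv.bak")

# Single pattern: "_w_" somewhere, then anything, then ".csv" at the end
one.pattern <- grep("_w_.*\\.csv$", files, value = TRUE)

# "AND" of two separate patterns via two grepl() calls
two.patterns <- files[grepl("_w_", files) & grepl("\\.csv$", files)]

print(one.pattern)
print(two.patterns)
```

With these example names both approaches select the same file.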

2) The units of the variables are given in the original headers. I
would like to extract the units. This is what I did:

headers <- c("dist to origin on curve [mm]", "segment on section [mm]",
             "angle 1 [degree]", "angle 2 [degree]", "angle 3 [degree]")
units.var <- gsub(pattern="^.*\\[|\\]$", "", headers)

It seems to me to be overly complicated using gsub(). Isn't there a way
to extract what is interesting rather than deleting what is not?

Pure-R way: use regmatches() + regexpr(). Both regmatches and regexpr
take the character vector as an argument, so duplication is hard to
avoid:

units <- regmatches(headers, regexpr('\\[.*\\]', headers))
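
As a self-contained sketch of the line above, using the headers from the
question: the extracted matches still contain the square brackets, so a
second step is needed to get the bare unit names. The names units.bare,
mixed and short are made up for this illustration:

```r
headers <- c("dist to origin on curve [mm]", "segment on section [mm]",
             "angle 1 [degree]", "angle 2 [degree]", "angle 3 [degree]")

# regexpr() finds the first match in each element;
# regmatches() extracts the matched text
units <- regmatches(headers, regexpr("\\[.*\\]", headers))
print(units)  # the matches still include the square brackets

# Strip the leading "[" and trailing "]" to keep only the unit name
units.bare <- gsub("^\\[|\\]$", "", units)
print(units.bare)

# Elements without a match are dropped, so the result can be shorter
# than the input
mixed <- c("a [x]", "no unit")
short <- regmatches(mixed, regexpr("\\[.*\\]", mixed))
print(length(short))
```

Because non-matching elements are dropped, the result lines up with the
input only when every header contains a unit.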

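The stringr package has a str_match() function with a nicer interface:

str_match(headers, '\\[.*\\]') -> units

Note that str_match() returns a character matrix; str_extract() takes
the same arguments and returns a plain character vector.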

Such "greedy" patterns containing ".*" present a few pitfalls, e.g.
looking for text in parentheses using the pattern "\\(.*\\)" in
"...(abc)...(def)..." will match the whole "(abc)...(def)" instead of
single groups "(abc)" and "(def)", but with your examples the pattern
should work as presented. One other option would be to ask for "[",
followed by zero or more characters that are not "]", followed by "]":
'\\[[^]]*\\]'.
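
The difference between the greedy pattern and the negated character
class can be checked with a short sketch. The first string is the
example from the paragraph above; the second string and the variable
names are made up for illustration:

```r
x <- "...(abc)...(def)..."

# Greedy: ".*" runs up to the last ")" in the string
greedy <- regmatches(x, regexpr("\\(.*\\)", x))
print(greedy)

# Negated class: each match stops at the first ")" after its "("
# gregexpr() finds all matches, so regmatches() returns a list
groups <- regmatches(x, gregexpr("\\([^)]*\\)", x))[[1]]
print(groups)

# The same idea with square brackets, on a hypothetical header
# that contains two units
h <- "speed [mm] per [s]"
brackets <- regmatches(h, gregexpr("\\[[^]]*\\]", h))[[1]]
print(brackets)
```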


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
