Part of your problem is that your regexes have spaces in them, so that's what you're matching.
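To make the point about spaces concrete, here is a toy sketch. The strings are made up for illustration and are not taken from your files or scripts. A space in a pattern is a literal character like any other, so the pattern only matches where exactly that space occurs:

```r
# Toy input, invented for illustration (an assumption, not the real data).
x <- c("Leipzig, Germany", "Leipzig,Germany", "Leipzig,  Germany")

# The space after the comma is literal: exactly one space must be present.
strict <- grepl(", Germany$", x)
strict
# [1]  TRUE FALSE FALSE

# " *" accepts any number of spaces, including none.
loose <- grepl(", *Germany$", x)
loose
# [1] TRUE TRUE TRUE

# Alternatively, strip leading and trailing spaces before matching.
trimmed <- gsub("^ +| +$", "", "  Germany ")
trimmed
# [1] "Germany"
```

If the spacing in the real data varies, either allow for it in the pattern or strip it beforehand.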
A small reproducible example would be more useful. I'm not feeling
inclined to wade through all your linked files on Friday evening, but
see if this helps:

    # First record from your message, with two of the countries changed
    # so it is obvious that the country names are really being extracted.
    testdata <- paste(
      "[Engel, Kathrin M. Y.; Schroeck, Kristin; Schoeneberg, Torsten;",
      "Schulz, Angela] Univ Leipzig, Fac Med, Inst Biochem, Leipzig, New Zealand;",
      "[Teupser, Daniel; Holdt, Lesca Miriam; Thiery, Joachim] Univ Leipzig, Fac",
      "Med, Inst Lab Med Clin Chem & Mol Diagnost, Leipzig, USA; [Toenjes, Anke;",
      "Kern, Matthias; Blueher, Matthias; Stumvoll, Michael] Univ Leipzig, Fac Med,",
      "Dept Internal Med, Leipzig, Germany; [Dietrich, Kerstin; Kovacs, Peter] Univ",
      "Leipzig, Fac Med, Interdisciplinary Ctr Clin Res, Leipzig, Germany; [Kruegel,",
      "Ute] Univ Leipzig, Fac Med, Rudolf Boehm Inst Pharmacol & Toxicol, Leipzig,",
      "Germany; [Scheidt, Holger A.; Schiller, Juergen; Huster, Daniel] Univ",
      "Leipzig, Fac Med, Inst Med Phys & Biophys, Leipzig, Germany; [Brockmann,",
      "Gudrun A.] Humboldt Univ, Inst Anim Sci, D-10099 Berlin, Germany; [Augustin,",
      "Martin] Ingenium Pharmaceut AG, Martinsried, Germany")

    # Remove the bracketed author lists (non-greedy, one bracket pair at a time).
    results <- gsub("\\[.*?\\]", "", testdata)
    # Split into one element per affiliation.
    results <- unlist(strsplit(results, ";"))
    # Keep only the text after the last ", " in each affiliation.
    results <- sapply(results, function(x) sub("^.*, ([A-Za-z ]*)$", "\\1", x))
    names(results) <- NULL
    results
    ## [1] "New Zealand" "USA"         "Germany"     "Germany"     "Germany"
    ## [6] "Germany"     "Germany"     "Germany"

Sarah

On Fri, May 25, 2012 at 4:31 PM, Sabina Arndt <sabina.ar...@hotmail.de> wrote:
> Hello r-help members,
>
> the solutions which Sarah Goslee and arun sent to me in such a prompt and
> helpful manner work well with the examples I cut from the data.frame I'm
> analyzing. Thank you very much for that!
> I incorporated them into my R-script and discovered that it still doesn't
> work properly, unfortunately. I have no idea why that's the case.
> You see, I want to extract country names from the contents of tab-delimited
> text files.
> This is an example of the data I'm using:
> http://pastebin.com/mYZNDXg6
> This is the script I'm using to import the data:
> http://pastebin.com/Z10UUH3z (It requires the text files to be in a folder
> which doesn't contain any other .txt files.)
> This is the script I'm using to extract the country names:
> http://pastebin.com/G37fuPba
> This is the string that's in the relevant field of the first record I'm
> working on:
>
> [Engel, Kathrin M. Y.; Schroeck, Kristin; Schoeneberg, Torsten; Schulz,
> Angela] Univ Leipzig, Fac Med, Inst Biochem, Leipzig, Germany; [Teupser,
> Daniel; Holdt, Lesca Miriam; Thiery, Joachim] Univ Leipzig, Fac Med, Inst
> Lab Med Clin Chem & Mol Diagnost, Leipzig, Germany; [Toenjes, Anke; Kern,
> Matthias; Blueher, Matthias; Stumvoll, Michael] Univ Leipzig, Fac Med, Dept
> Internal Med, Leipzig, Germany; [Dietrich, Kerstin; Kovacs, Peter] Univ
> Leipzig, Fac Med, Interdisciplinary Ctr Clin Res, Leipzig, Germany;
> [Kruegel, Ute] Univ Leipzig, Fac Med, Rudolf Boehm Inst Pharmacol & Toxicol,
> Leipzig, Germany; [Scheidt, Holger A.; Schiller, Juergen; Huster, Daniel]
> Univ Leipzig, Fac Med, Inst Med Phys & Biophys, Leipzig, Germany;
> [Brockmann, Gudrun A.] Humboldt Univ, Inst Anim Sci, D-10099 Berlin,
> Germany; [Augustin, Martin] Ingenium Pharmaceut AG, Martinsried, Germany
>
> This is the incorrect result my extraction script gives me for the first
> record:
>
>> C1s[1]
>  [1] "[ENGEL, KATHRIN M. Y." "KRISTIN"               "TORSTEN"
>  [4] "GERMANY"               "DANIEL"                "LESCA MIRIAM"
>  [7] "GERMANY"               "ANKE"                  "MATTHIAS"
> [10] "MATTHIAS"              "GERMANY"               "KERSTIN"
> [13] "GERMANY"               "GERMANY"               "[SCHEIDT, HOLGER A."
> [16] "JUERGEN"               "GERMANY"               "HUMBOLDT"
> [19] "GERMANY"
>
> For some reason the first and sixth pair of the eight square brackets are
> not removed ... Do you understand why?
> Instead I'd like to get this result, though:
>
>> C1s[1]
> [1] "GERMANY"  "GERMANY"  "GERMANY"
> [4] "GERMANY"  "GERMANY"  "GERMANY"
> [7] "HUMBOLDT" "GERMANY"
>
> What am I doing wrong?
> What are the errors in my R-script?
> Would anybody be so kind as to take a look and help me out, please?
> Thank you very much in advance!
>
> Faithfully yours,
>
> Sabina Arndt
>

-- 
Sarah Goslee
http://www.functionaldiversity.org

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.