Hi, I have run into a problem when using gregexpr and regmatches; I get the following error message:

  Error in iconv(x, "latin1", "ASCII") :
    'x' must be a list of NULL or raw vectors

The data:

(1) I have two journal articles, and after some regex manipulation I am at the following situation:

  # manipulate only two full-text articles
  author.test <- articles1[1:2]
  # extract author information
  r <- gregexpr("(\"authors\":(.*?)\"(.*?)\")|(\"authors\": \\[(.*?)\\],)", author.test)
  authors.raw <- regmatches(author.test, r)
  authors.raw
  [[1]]
  [1] "\"authors\": [\"Allan G. KING\", \"B. Lindsay LOWELL\", \"Frank D. BEAN\"],"

  [[2]]
  [1] "\"authors\": \"Chris Baldry\", \""

(2) Now, if I want to conduct additional regex manipulation, I get the error stated above:

  r <- gregexpr("([^(\"authors\":)])(.*?)(\"(.*?)\")", authors.raw)
  authors.raw <- regmatches(authors.raw, r)
  Error in iconv(x, "latin1", "ASCII") :
    'x' must be a list of NULL or raw vectors

(3) One way to avoid this is to unlist(authors.raw) (see below), but the problem with this is that I lose some of the information that was contained in the list. The first element contains three character elements, which are the authors of the first paper. I want to keep them in that list format.

  > authors.raw <- unlist(regmatches(authors.raw, r))
  > authors.raw
  [1] " [\"Allan G. KING\"" ", \"B. Lindsay LOWELL\"" ", \"Frank D. BEAN\"" " \"Chris Baldry\""

(4) So what I want to do is avoid unlist() and apply gregexpr() multiple times in a row.

Any ideas?

Thanks in advance,
Adel

--
View this message in context: http://r.789695.n4.nabble.com/Using-gregexpr-and-regmatches-but-getting-Iconv-error-tp4700677.html
Sent from the R help mailing list archive at Nabble.com.
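
One possible way to keep the list structure, offered as a sketch under assumptions rather than a definitive answer: regmatches() documents its x argument as a character vector, so the second pass can be applied to each list element separately with lapply() instead of to the list as a whole. In the code below, authors.raw is a stand-in rebuilt from the output shown in (1), and extract_each is a hypothetical helper name, not a base R function.

```r
# Stand-in for the poster's authors.raw: a list with one character
# vector per article (values copied from the output in step (1)).
authors.raw <- list(
  "\"authors\": [\"Allan G. KING\", \"B. Lindsay LOWELL\", \"Frank D. BEAN\"],",
  "\"authors\": \"Chris Baldry\", \""
)

# Hypothetical helper: run gregexpr()/regmatches() on each list element
# on its own, so regmatches() always receives a character vector.
extract_each <- function(x, pattern) {
  lapply(x, function(s) unlist(regmatches(s, gregexpr(pattern, s))))
}

# Second regex pass, using the pattern from step (2).
authors.step2 <- extract_each(authors.raw, "([^(\"authors\":)])(.*?)(\"(.*?)\")")

# The result is still a list with one element per article.
str(authors.step2)
```

Because the helper returns a list of character vectors again, it can be called repeatedly, feeding each result into the next pass, without an intermediate unlist() over the whole object.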