On Fri, 31 Jan 2020 18:06:00 +0000 Ioanna Ioannou <ii54...@msn.com> wrote:
> I want to extract e.g., the country from all these files. How can I
> add NA for the files for which the country is not mentioned?

I am starting from the beginning, since I don't know what you have
tried and where exactly you are stuck.

> A <- data.frame(name1 = c('fields', 'fields', 'fields'),
>                 name2 = c('category', 'asset', 'country'),
>                 value = c('Structure Class', 'Building', 'Colombia'))

Given one such data frame, we can use logical vector subscripts to
extract the 'country' field. The following command returns a logical
vector:

A[, 'name2'] == 'country'
# [1] FALSE FALSE  TRUE

If we pass it to the subscript operator (type ?'[' at the R prompt for
more info), we get the matching rows of the data frame:

subs <- A[, 'name2'] == 'country'
A[subs, ]
#    name1   name2    value
# 3 fields country Colombia

Okay, now we just need to choose the correct column:

A[subs, 'value']
# [1] Colombia
# Levels: Building Colombia Structure Class

What happens if there is no "country" row?

C[C[, 'name2'] == 'country', 'value']
# factor(0)
# Levels: Building Fragility Structure Class

We get a 0-length vector instead of the NA we want. The length()
function and the `if` control-flow construct let us test for
0-length vectors (see ?length and ?'if'):

x <- C[C[, 'name2'] == 'country', 'value']
if (length(x) == 1) x else NA
# [1] NA

Bonus question: what happens if there is more than one "country" row
in the data frame? What should happen instead?

See also:
https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Index-vectors

Note that the "value" column is a factor (that's why we get these
"Levels:" lines when we print the vectors; see ?factor). You want a
character vector, so we will coerce the value to the desired type
using the as.character() function.
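The single-data-frame steps above can be collected into one function. This is a minimal sketch, not code from the thread: get_field() is a hypothetical name, the contents of C are a guess (only its factor levels were shown), and "warn and return NA" on duplicate rows is just one possible answer to the bonus question. It uses which() so that NA entries in the 'name2' column are skipped instead of producing NA rows, and as.character() so the result is a string whether 'value' is a factor (the data.frame() default in R 3.x) or already character (the default since R 4.0.0, when stringsAsFactors became FALSE).

```r
# Hypothetical helper (not from the thread): return the 'value' of the
# row whose 'name2' equals `field`, or NA_character_ if there is none.
get_field <- function(df, field = "country") {
  # which() drops NA comparisons, unlike a bare logical subscript
  rows <- which(df[, "name2"] == field)
  if (length(rows) == 0) return(NA_character_)
  if (length(rows) > 1) {
    # one possible policy for the bonus question: warn and give up
    warning("more than one '", field, "' row; returning NA")
    return(NA_character_)
  }
  # as.character() works whether 'value' is a factor or character
  as.character(df[rows, "value"])
}

# stringsAsFactors = TRUE reproduces the factor columns seen above
A <- data.frame(name1 = c('fields', 'fields', 'fields'),
                name2 = c('category', 'asset', 'country'),
                value = c('Structure Class', 'Building', 'Colombia'),
                stringsAsFactors = TRUE)
# Guessed stand-in for the poster's C: no "country" row
C <- data.frame(name1 = c('fields', 'fields', 'fields'),
                name2 = c('category', 'asset', 'fragility'),
                value = c('Structure Class', 'Building', 'Fragility'),
                stringsAsFactors = TRUE)

get_field(A)  # "Colombia"
get_field(C)  # NA
```

Returning NA_character_ rather than plain NA keeps the result a character vector in every case, which matters once the function is applied to many data frames.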
> Essentially I want a vector called country which will look like this:
>
> Country <- c('Colombia', 'Greece', NA)

Once we have a procedure to deal with one data frame, we can apply it
to multiple data frames by putting the procedure into a function and
calling it on a list of data frames using one of the *apply functions
(see ?vapply):

# TODO: produce the list programmatically by calling the JSON reading
# function on a vector of filenames
dataframes <- list(A, B, C)
# apply an anonymous function to each of the data frames,
# return the results as a vector
sapply(dataframes, function(x) {
  country <- x[x[, 'name2'] == 'country', 'value'] # look for "country" row
  # return the country as a string if exactly one row was found, NA otherwise
  if (length(country) == 1) as.character(country) else NA
})

I am pretty sure there are other ways to perform this operation, but I
find this one the easiest to explain.

-- 
Best regards,
Ivan

P.S.
> [[alternative HTML version deleted]]

Please post e-mails in plain text, not HTML. See
<http://www.R-project.org/posting-guide.html> for more info.
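A sketch of the same loop written with vapply() instead of sapply(), assuming data frames shaped like the one in the question. B and C below are made-up stand-ins (the poster's B and C were not shown), and make_df() is a hypothetical convenience helper. With FUN.VALUE = character(1), vapply() insists that every call return exactly one character string, so a later change that breaks this raises an error instead of sapply() quietly returning a list or a differently typed vector. Because vapply() will not promote a logical NA to character, the missing case returns NA_character_ rather than plain NA.

```r
# Hypothetical helper to build small example data frames
make_df <- function(name2, value) {
  data.frame(name1 = rep('fields', length(name2)), name2 = name2,
             value = value, stringsAsFactors = TRUE)
}
A <- make_df(c('category', 'asset', 'country'),
             c('Structure Class', 'Building', 'Colombia'))
# Made-up stand-ins for the poster's other files
B <- make_df(c('category', 'country'), c('Structure Class', 'Greece'))
C <- make_df(c('category', 'asset', 'fragility'),
             c('Structure Class', 'Building', 'Fragility'))

dataframes <- list(A, B, C)

# Each call must return a single string; NA_character_ (not NA) is
# required because vapply() rejects a logical result here
Country <- vapply(dataframes, function(x) {
  country <- x[x[, 'name2'] == 'country', 'value']
  if (length(country) == 1) as.character(country) else NA_character_
}, FUN.VALUE = character(1))

# Same values as c('Colombia', 'Greece', NA) from the question
print(Country)
```

Another small difference: on an empty list, sapply() returns list() while vapply() returns character(0), so the result type stays the same even when no files are found.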