Hi:

This is basically Scott's idea with a few added details.

Let's assume your files have similar names - e.g., they differ only by
number.
The example below creates ten files of similar structure to yours. There are
then two paths one can follow: (1) put all the files into a specific
directory, or
(2) keep them where they are.

This is my current working directory (Win 7):
> getwd()
[1] "C:/Users/Dennis/Documents"

# Create ten files, each with 20 IDs and a random count. The files are
# then written as csv files to the current working directory. This is
# simply a way for me to generate data that in some sense mimics the data
# you already have. You don't need to reproduce this since you already
# have the file list.
for (i in 1:10) {
    df <- data.frame(id = sprintf('%02d', 1:20),
                     count = rpois(20, 50))
    write.csv(df, file = paste('file_', sprintf('%02d', i), '.csv',
                               sep = ''),
              row.names = FALSE)
}

# Option 1: Move all the files to a separate subdirectory of the current
# directory - I'll call it 'myfiles', because I'm highly imaginative. [If
# your files have names that are difficult to isolate with a certain
# string pattern, this is probably the best option.]
# Once the files are moved, I can change the working directory to myfiles:
setwd('myfiles')

# > getwd()
# [1] "C:/Users/Dennis/Documents/myfiles"

# Now, read all the csv files from this directory into a list object - in
# your case, it may be simpler to define a vector of names with
# list.files() instead and check that it's right before using lapply,
# something like
# filelist <- list.files(pattern = '\\.csv$', all.files = FALSE)
# readlist <- lapply(filelist, read.csv, header = TRUE)
# The line below combines the two. (Note that pattern = is a regular
# expression, so the dot needs to be escaped and the $ anchors the match
# to the end of the filename; a bare '.csv' would also match, say,
# 'mycsvnotes.txt'.)
readlist <- lapply(list.files(pattern = '\\.csv$', all.files = FALSE),
                   read.csv, header = TRUE)
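An aside, in case you'd rather not setwd() into the subdirectory at all: list.files() can take the directory as its first argument and return usable paths via full.names = TRUE. A sketch - the two demo files below just stand in for your real ones:

```r
# Demo setup: a 'myfiles' subdirectory with two small csv files, standing
# in for the directory described above
dir.create('myfiles', showWarnings = FALSE)
for (i in 1:2)
    write.csv(data.frame(id = sprintf('%02d', 1:3), count = i * (1:3)),
              file = file.path('myfiles', sprintf('file_%02d.csv', i)),
              row.names = FALSE)

# full.names = TRUE prefixes each filename with 'myfiles/', so read.csv()
# can find the files without any setwd()
files <- list.files('myfiles', pattern = '\\.csv$', full.names = TRUE)
readlist <- lapply(files, read.csv, header = TRUE)
length(readlist)   # 2
```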

# Assign names count_01 to count_10 to the list components (rationale:
# these are the column names I'll want to use in the final data frame)
names(readlist) <- paste('count', sprintf('%02d', seq_along(readlist)),
                         sep = '_')
# As Scott intimated (but never used :), fire up the plyr and reshape
# packages:
library(plyr)
library(reshape)
# The first command is equivalent to do.call(rbind, readlist), but the
# advantage of ldply is that it copies over the list component names in a
# variable named .id as well, which as we'll see is very useful...
dtf <- ldply(readlist, rbind)
head(dtf)      # to see the first few lines

# The cast() function in the reshape package takes our 'long' data in dtf
# and reshapes it to 'wide' form according to the formula - in this case,
# the rows will be the id numbers and the columns will be count_01 -
# count_10. Fortunately, the count is taken as the 'value' variable.
# (This is made more explicit in the reshape2 package, where the
# corresponding function is dcast() and count would be the quoted
# argument of value.var = )...but this works:
cast(dtf, id ~ .id)
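For comparison, here is roughly what the reshape2 version would look like - a sketch only, assuming reshape2 is installed; the little data frame below just mimics the shape of dtf (the .id column from ldply, plus id and count):

```r
library(reshape2)  # install.packages('reshape2') if you don't have it

# A stand-in for dtf: 'long' data with the .id column that ldply added
dtf <- data.frame(.id   = rep(c('count_01', 'count_02'), each = 2),
                  id    = rep(c('01', '02'), 2),
                  count = c(10, 20, 30, 40))

# dcast() is the reshape2 analogue of cast(); value.var names the
# measurement column explicitly instead of guessing it
wide <- dcast(dtf, id ~ .id, value.var = 'count')
wide
#   id count_01 count_02
# 1 01       10       30
# 2 02       20       40
```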

# Option 2: The files happen to be in the same directory as getwd(), but
# may be mixed in with a bunch of other files. This is the case in my
# Documents directory.
setwd('..')
> getwd()
[1] "C:/Users/Dennis/Documents"

# I may have other .csv files in this directory, so I'm probably better
# off trying to match 'file_' instead of '.csv'. Otherwise, it's pretty
# much the same story as above:
list2 <- lapply(list.files(pattern = 'file_', all.files = FALSE),
                read.csv, header = TRUE)
names(list2) <- paste('count', sprintf('%02d', seq_along(list2)),
                      sep = '_')
dtg <- ldply(list2, rbind)
cast(dtg, id ~ .id)
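And since you mentioned trying cbind: if you'd rather stay in base R and skip the add-on packages entirely, merge() inside Reduce() does the same job. (cbind falls over as soon as the ids don't line up across files; merge matches on id.) A sketch, with a toy named list standing in for list2 above:

```r
# A toy named list standing in for list2 above
list2 <- list(count_01 = data.frame(id = c('01', '02'), count = c(10, 20)),
              count_02 = data.frame(id = c('01', '02'), count = c(30, 40)))

# Rename each component's count column after the component itself, then
# merge the data frames pairwise on id; all = TRUE keeps ids that are
# missing from some files (they come through as NA)
renamed <- lapply(names(list2), function(nm) {
    df <- list2[[nm]]
    names(df)[names(df) == 'count'] <- nm
    df
})
merged <- Reduce(function(x, y) merge(x, y, by = 'id', all = TRUE), renamed)
merged
#   id count_01 count_02
# 1 01       10       30
# 2 02       20       40
```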

A third option is to create a separate subdirectory for the data files,
copy an R shortcut into that directory (at least under Windows, anyway),
go to Properties and change the 'Start in' directory to its name. Then
follow Option 1.

HTH,
Dennis



On Sat, Mar 5, 2011 at 6:39 PM, Richard Green <gree...@uw.edu> wrote:

> Hello R users,
> I am fairly new to R and was hoping you could point me in the right
> direction. I have a set of text files (36).
> Each file has only two columns (id and count) , I am trying to figure out a
> way to load all the files together and
> then have them ordered by id into a matrix data frame. For example
>
> If each txt file has :
> ID           count
> id_00002 20
> id_00003 3
>
> A Merged File:
> ID           count_file1 count_file2 count_file3 count_file4
> id_00002 20         8              12               5             19 26
> id_00003 3 0 2 0 0 0
> id_00004 75 84 241 149 271 257
>
> Is there a relatively simply way to do that in R? I was trying with <-
> read.table
> and then <- cbind but that does not appear to be working.  Any suggestions
> folks have are appreciated.
> Thanks
> -Rich
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
