Re: [R] Splitting columns and forming new data files in R

Zilefac Elvis Thu, 10 Apr 2014 23:15:08 -0700

Great. Thanks AK.

On Thursday, April 10, 2014 11:14 PM, arun <[email protected]> wrote:
 
Hi,
OK. In that case, change `lst1New` as follows. Also, your files had no "Sim"
column, so I renamed the fifth column to "Sim".


lst1New <- lapply(lst1, function(x) {
  lst2 <- setNames(lapply(x, function(y) {
    dat <- read.table(y, sep=" ", header=TRUE, stringsAsFactors=FALSE)
    names(dat)[5] <- "Sim"
    dat[,1:5]
  }), names1)
  dat2 <- do.call(cbind, lst2)
  indx <- grepl("Sim", names(dat2))
  dat3 <- dat2[indx]
  dat4 <- dat2[!indx][,1:4]
  names(dat4) <- gsub(".*\\.", "", names(dat4))
  lapply(split(names(dat3), gsub(".*\\.", "", names(dat3))), function(x) {
    dat5 <- cbind(dat4, dat3[,x])
    dat5$Tmean <- -999.9
    dat6 <- dat5[,c(1:4,7:6,8,5)]
    colnames(dat6)[2:3] <- format(Coord[match(unique(dat6$Site),Coord$Site),3:2], digits=4)
    dat7 <- dat6[,-4]
    mat1 <- as.matrix(dat7)
    colnames(mat1)[-(2:3)] <- ' '
    mat1
  })
})
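
To see what the do.call(cbind, ...), grepl("Sim", ...) and split(...) steps in `lst1New` do to the column names, here is a minimal sketch on made-up data. The helper `make_df` and all values are hypothetical; only the column-naming pattern mirrors the thread.

```r
# Hypothetical stand-ins for two of the per-variable files (values are made up).
make_df <- function(a, b) {
  data.frame(Year = 2000, Month = 1, Day = 1:2, Site = "G100",
             Sim001 = a, Sim002 = b)
}
lst2 <- list(Precip = make_df(c(0, 1.2), c(0.5, 0)),
             Tmax   = make_df(c(8.2, 0.4), c(7.9, 1.1)))

# cbind on a named list prefixes each column with the list name,
# e.g. "Precip.Year", "Tmax.Sim001".
dat2 <- do.call(cbind, lst2)

# Keep the simulation columns, then group them by the part after the last dot,
# so each group holds the same simulation across all variables.
sim_cols <- names(dat2)[grepl("Sim", names(dat2))]
groups <- split(sim_cols, gsub(".*\\.", "", sim_cols))
groups
```

Each element of `groups` lists the columns that end up together in one output file.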


A.K.




On Friday, April 11, 2014 12:12 AM, "[email protected]" 
<[email protected]> wrote:





Hi AK, the program works perfectly. I tried a different data set but was
unable to modify the program to suit it. Attached.

The attached data set has 5 columns instead of the 104 columns for which the
program was developed. I was unable to edit `lst1New <-`.
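
One way to avoid editing the hard-coded `dat[,1:104]` for each data set is to cap the column count at whatever the file actually has. This is only a sketch: `first_cols` is a hypothetical helper, not part of the program in this thread, and the data frames are dummies.

```r
# Hypothetical helper: keep at most `max_cols` columns, however many exist.
# Indexing a data frame past its last column stops with
# "undefined columns selected", which is what 1:104 does on a 5-column file.
first_cols <- function(dat, max_cols = 104) {
  dat[, seq_len(min(ncol(dat), max_cols)), drop = FALSE]
}

wide   <- as.data.frame(matrix(0, nrow = 2, ncol = 110))  # dummy 110-column data
narrow <- as.data.frame(matrix(0, nrow = 2, ncol = 5))    # dummy 5-column data

ncol(first_cols(wide))
ncol(first_cols(narrow))
```

With a helper like this, the same call could serve both the 104-column and the 5-column files, although any renaming of columns (such as setting "Sim") would still need to match the data.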

------ Original Message ------

From : arun
>To : Zilefac Elvis;
>Sent : 10-04-2014 22:02
>Subject : Re: Re: Splitting columns and forming new data files in R
>
>Hi Atem,
>It may be that the program slows with the size of the dataset.

On Thursday, April 10, 2014 11:13 PM, Zilefac Elvis wrote:
Hi AK,
Please apply this program of yours to my attached data set. I have been
struggling for hours but did not succeed. I am learning faster than I expected,
but this one is beyond me.
Thanks,
Atem.
#---------------------------------------------------------------------------
dir.create("final")
list.files()
#[1] "coordinates.csv" "final"          "Precip"          "Tmax"   "Tmin"

# read coordinates (lat,long)
Coord <- read.csv(list.files(pattern=".csv"), header=TRUE, stringsAsFactors=FALSE)
# list other files except .csv and final
lfile <- list.files()[!grepl(".csv|final", list.files())]
# getwd of these files/contents
files <- paste(paste(getwd(), lfile, sep="/"), list.files(lfile), sep="/")
lst1 <- split(files, gsub(".*\\/(.*)\\.csv", "\\1", files))
names1 <- gsub(".*\\/(.*)\\/.*\\.csv", "\\1", lst1[[1]])

lst1New <- lapply(lst1, function(x) {
  lst2 <- setNames(lapply(x, function(y) {
    dat <- read.table(y, sep=" ", header=TRUE, stringsAsFactors=FALSE)
    dat[,1:104]
  }), names1)
  dat2 <- do.call(cbind, lst2)
  indx <- grepl("Sim", names(dat2))
  dat3 <- dat2[indx]
  dat4 <- dat2[!indx][,1:4]
  names(dat4) <- gsub(".*\\.", "", names(dat4))
  lapply(split(names(dat3), gsub(".*\\.", "", names(dat3))), function(x) {
    dat5 <- cbind(dat4, dat3[,x])
    dat5$Tmean <- -999.9
    dat6 <- dat5[,c(1:4,7:6,8,5)]
    colnames(dat6)[2:3] <- format(Coord[match(unique(dat6$Site),Coord$Site),3:2], digits=4)
    dat7 <- dat6[,-4]
    mat1 <- as.matrix(dat7)
    colnames(mat1)[-(2:3)] <- ' '
    mat1
  })
})
# Change dat to [,1:104] if you need all rows.

lst2New <- lapply(lst1New, function(x) {names(x) <- NULL; x})
#head(lst2New[[1]][[1]],4)
#49.53 -96.77
#[1,] 2000    1      1 -9.13 8.23 -999.9 0
#[2,] 2000    1      2 -9.51 0.39 -999.9 0

lapply(names(lst2New), function(x) {
  nm1 <- paste(x, names(lst1New[[x]]), sep="_")
  nm2 <- paste0(paste(paste0(getwd(),"/final"), nm1, sep="/"), ".csv")
  lapply(seq_along(lst1New[[x]]), function(i) {
    x1 <- lst2New[[x]][[i]]
    write.csv(x1, nm2[i], quote=FALSE, row.names=FALSE)
  })
})
#-----------------------------------------------------------------------------
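
The nested paste() calls that build the output paths (`nm2`) in the program above can also be written with file.path(). The sketch below assumes the same "final" folder and uses two example file stems in the G100_Sim001 style from the thread; it only builds the path strings and does not write anything.

```r
# Example stems in the style used in the thread (site code + simulation name).
nm1 <- paste("G100", c("Sim001", "Sim002"), sep = "_")

# Path construction as in the thread.
nm2_thread <- paste0(paste(paste0(getwd(), "/final"), nm1, sep = "/"), ".csv")

# Equivalent construction with file.path(), which joins components with "/".
nm2_alt <- file.path(getwd(), "final", paste0(nm1, ".csv"))

identical(nm2_thread, nm2_alt)
basename(nm2_alt)
```

Both forms give the same strings; file.path() mainly makes the folder and file components easier to read.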
On Wednesday, April 9, 2014 1:29 AM, arun wrote:
Hi Atem,
No problem. Glad it worked. In the first instance, I should have used `[[`
instead of `[` in the last line of code, which created the confusion:
lapply(...., lst2New[[x]][i]; write.table...)

On Wednesday, April 9, 2014 1:35 AM, Zilefac Elvis wrote:
Wow! You finally fixed it.
I appreciate your endless efforts.
Atem.
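
The `[[` versus `[` point in A.K.'s note above can be seen on a small nested list. This is a minimal sketch with made-up matrices; the list name `lst` is hypothetical and only stands in for something shaped like `lst2New`.

```r
# A nested list shaped like lst2New: one element per site, each holding matrices.
lst <- list(G100 = list(matrix(1:4, 2), matrix(5:8, 2)))

a <- lst[["G100"]][1]    # single bracket: still a list, of length 1
b <- lst[["G100"]][[1]]  # double bracket: the matrix itself

class(a)
class(b)
dim(b)
```

Passing `a` to write.table() hands it a one-element list rather than the matrix, which is why the double bracket is needed in the writing step.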
On Tuesday, April 8, 2014 11:26 PM, arun wrote:
Hi Atem,
If you change colnames(mat1)[-(2:3)] <- ' ' in 'lst1New':

lst1New <- lapply(lst1, function(x) {
  lst2 <- setNames(lapply(x, function(y) {
    dat <- read.table(y, sep=" ", header=TRUE, stringsAsFactors=FALSE)
    dat[,1:104]
  }), names1)
  dat2 <- do.call(cbind, lst2)
  indx <- grepl("Sim", names(dat2))
  dat3 <- dat2[indx]
  dat4 <- dat2[!indx][,1:4]
  names(dat4) <- gsub(".*\\.", "", names(dat4))
  lapply(split(names(dat3), gsub(".*\\.", "", names(dat3))), function(x) {
    dat5 <- cbind(dat4, dat3[,x])
    dat5$Tmean <- -999.9
    dat6 <- dat5[,c(1:4,7:6,8,5)]
    colnames(dat6)[2:3] <- format(Coord[match(unique(dat6$Site),Coord$Site),3:2], digits=4)
    dat7 <- dat6[,-4]
    mat1 <- as.matrix(dat7)
    colnames(mat1)[-(2:3)] <- ' '
    mat1
  })
})

lst2New <- lapply(lst1New, function(x) {names(x) <- NULL; x})
head(lst2New[[1]][[1]],2)
#     49.53 -96.77
#[1,] 2000     1      1 -9.13 8.23 -999.9 0
#[2,] 2000     1      2 -9.51 0.39 -999.9 0

lapply(names(lst2New), function(x) {
  nm1 <- paste(x, names(lst1New[[x]]), sep="_")
  nm2 <- paste0(paste(paste0(getwd(),"/final"), nm1, sep="/"), ".csv")
  lapply(seq_along(lst1New[[x]]), function(i) {
    x1 <- lst2New[[x]][[i]]
    write.table(x1, nm2[i], quote=FALSE, row.names=TRUE)
  })
})

##output file
dat1 <- read.csv(paste(paste(getwd(),"final",sep="/"),"G100_Sim001.csv",sep="/"),
                 header=TRUE, sep=" ", row.names=1)[1:3,1:7]
mat1N <- as.matrix(dat1)
colnames(mat1N) <- gsub("X\\.|X", "", dimnames(mat1N)[[2]])
colnames(mat1N)[-(2:3)] <- " "
mat1N
#       49.53 96.77
#1 2000     1     1  -9.13  8.23 -999.9 0
#2 2000     1     2  -9.51  0.39 -999.9 0
#3 2000     1     3 -18.10 -5.67 -999.9 0
A.K.

On Wednesday, April 9, 2014 1:06 AM, arun wrote:
Hi Atem,
I slightly modified:

lst1New <- lapply(lst1, function(x) {
  lst2 <- setNames(lapply(x, function(y) {
    dat <- read.table(y, sep=" ", header=TRUE, stringsAsFactors=FALSE)
    dat[,1:104]
  }), names1)
  dat2 <- do.call(cbind, lst2)
  indx <- grepl("Sim", names(dat2))
  dat3 <- dat2[indx]
  dat4 <- dat2[!indx][,1:4]
  names(dat4) <- gsub(".*\\.", "", names(dat4))
  lapply(split(names(dat3), gsub(".*\\.", "", names(dat3))), function(x) {
    dat5 <- cbind(dat4, dat3[,x])
    dat5$Tmean <- -999.9
    dat6 <- dat5[,c(1:4,7:6,8,5)]
    colnames(dat6)[2:3] <- format(Coord[match(unique(dat6$Site),Coord$Site),3:2], digits=4)
    dat7 <- dat6[,-4]
    mat1 <- as.matrix(dat7)
    colnames(mat1)[-(2:3)] <- "Missval"
    mat1
  })
})

head(lst1New[[1]][[1]],2)
#     Missval 49.53 -96.77 Missval Missval Missval Missval
#[1,]    2000     1      1   -9.13    8.23  -999.9       0
#[2,]    2000     1      2   -9.51    0.39  -999.9       0

lst2New <- lapply(lst1New, function(x) {names(x) <- NULL; x})
lapply(names(lst2New), function(x) {
  nm1 <- paste(x, names(lst1New[[x]]), sep="_")
  nm2 <- paste0(paste(paste0(getwd(),"/final"), nm1, sep="/"), ".csv")
  lapply(seq_along(lst1New[[x]]), function(i) {
    x1 <- lst2New[[x]][[i]]
    write.table(x1, nm2[i], quote=FALSE, row.names=FALSE)
  })
})

##output file
read.csv(paste(paste(getwd(),"final",sep="/"),"G100_Sim001.csv",sep="/"),
         header=TRUE, sep=" ", check.names=FALSE)[1:3,]
#Missval 49.53 -96.77 Missval Missval Missval Missval
#1    2000     1      1   -9.13    8.23  -999.9       0
#2    2000     1      2   -9.51    0.39  -999.9       0
#3    2000     1      3  -18.10   -5.67  -999.9       0

A.K.

On Tuesday, April 8, 2014 9:28 PM, Zilefac Elvis wrote:
Hi AK,
Let's try to use characters other than NA and see what happens. I tried
'Missval' but the Lat and Long had Missval prefixed to them. How can I get rid
of any characters in the Lat and Long columns?
Thanks,
Atem.

Hi AK,
Thanks for the timely reply. "Lat" should be on the MONTH column while "Long"
should be on the DAY column. I guess you did it this way. I will try the code
once I get to school.
Thanks again.
Atem.

------ Original Message ------
From : arun
>To : Zilefac Elvis;
>Sent : 08-04-2014 04:44
>Subject : Re: Splitting columns and forming new data files in R
>
>Hi,
>The 'lat' and 'long' names you mentioned correspond to which columns
>in the "final" dataset?

On Tuesday, April 8, 2014 1:17 AM, Zilefac Elvis wrote:
Hi AK,
Please, I need your help. I finally solved the previous task I sent to you.
I have Precip, Tmin and Tmax in three different folders (attached).
Each folder has 4 files with identical names in all folders (we can match
case). Within each file are [,YYY MM DD sim001...sim100] (some files may have
more than 100 simulations. Use only the first 100).

Q1) Open all three folders, go to file 1 (e.g. G100), copy column 1 (sim001, do
not copy date) and paste it in a new folder called "final". Do so for column 2
(sim002),...,column 100 (sim100). So from file 1 alone with 100 sims, you will
have 100 files in "final". The files in "final" should be labelled, for
example, as G100_sim001, G100_sim002,...,G100_sim100; G101_sim001, G101_sim002
etc. The format of all files in "final" is similar to:
     50 -110.7
1961 1  1 -999.9 -999.9 -999.9 0
1961 1  2 -999.9 -999.9 -999.9 2.38
1961 1  3 -999.9 -999.9 -999.9 0
1961 1  4 -999.9 -999.9 -999.9 0
1961 1  5 -999.9 -999.9 -999.9 0
1961 1  6 -999.9 -999.9 -999.9 0
1961 1  7 -999.9 -999.9 -999.9 0
1961 1  8 -999.9 -999.9 -999.9 0
1961 1  9 -999.9 -999.9 -999.9 5.19
1961 1 10 -999.9 -999.9 -999.9 0
1961 1 11 -999.9 -999.9 -999.9 0
1961 1 12 -999.9 -999.9 -999.9 0
1961 1 13 -999.9 -999.9 -999.9 0
1961 1 14 -999.9 -999.9 -999.9 0
1961 1 15 -999.9 -999.9 -999.9 0

The columns after the date should be [Tmin,Tmax,Tmean,Precip]. Please do not
include column names in output. Output files are .csv.
*Fill column "Tmean" with -999.9 in all files.
Therefore, using the sample I have provided, you will have 4 sites * 100 sims =
400 files in folder "final".

Q2) From the attached coordinates file, please copy Lat and Long corresponding
to the Site and paste it in the first row of every file starting with that site
code. For example, all files beginning with G100_sim... will have their first
row similar to:
     49.53 -96.7
1961 1  1 -999.9 -999.9 -999.9 0

This looks very cumbersome for me to handle. Thanks very much.
Atem.
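
For the layout asked for in Q1 and Q2 (a first row holding only Lat and Long, then data rows with no column names), one possible approach is to write the coordinate line first and append the data. This is a sketch under assumptions, not the method used in the replies: the values are made up, the separator is assumed to be a space, and the file goes to a temporary path.

```r
# Made-up coordinates and data, for illustration only.
lat  <- 49.53
long <- -96.77
dat <- data.frame(Year = 1961, Month = 1, Day = 1:2,
                  Tmin = -999.9, Tmax = -999.9, Tmean = -999.9,
                  Precip = c(0, 2.38))

out <- tempfile(fileext = ".csv")

# First row: Lat and Long only.
writeLines(paste(lat, long), out)

# Data rows appended below it, without row names or column names.
write.table(dat, out, append = TRUE, quote = FALSE,
            row.names = FALSE, col.names = FALSE)

readLines(out)
```

Unlike the blank-column-name approach in the replies, this places the coordinates in the first two fields of the first line rather than above the MONTH and DAY columns, so the alignment would need adjusting if that position matters.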
        [[alternative HTML version deleted]]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
