Hi Atem,

Try this:

I created 3 folders (Precip, Tmax, Tmin) within the folder "sample"
#working directory: sample
list.files()
#[1] "Imputation_Daily_Sim01.dat"    "Imputation_Daily_Sim02.dat"   
#[3] "Imputation_Daily_Sim03.dat"    "Precip"                       
#[5] "Sim1971-2000_Daily_Sim001.dat" "Sim1971-2000_Daily_Sim002.dat"
#[7] "Sim1971-2000_Daily_Sim003.dat" "Tmax"                         
#[9] "Tmin" 
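
If the three output folders do not exist yet, they could also be created from R. A minimal sketch, assuming the working directory is already "sample"; outDirs is only an illustrative name and is not used in the code further down:

```r
# Create the three output folders inside the current working directory.
# 'outDirs' is an illustrative name (an assumption, not from the original code).
outDirs <- c("Precip", "Tmin", "Tmax")
for (d in outDirs) {
  # showWarnings = FALSE keeps this quiet when the folder is already there
  dir.create(d, showWarnings = FALSE)
}
```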

list.files(pattern="Sim1971-2000")
#[1] "Sim1971-2000_Daily_Sim001.dat" "Sim1971-2000_Daily_Sim002.dat"
#[3] "Sim1971-2000_Daily_Sim003.dat"

lst1 <- lapply(list.files(pattern="Sim1971-2000"),function(x) readLines(x))

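#drop the 1970 records; "^" anchors the match to the year at the start of each line
lst1Not1970 <- lapply(lst1,function(x) x[!grepl("^1970",x)])

#Using a small subset (head() avoids NAs if a file has fewer than 1000 lines):
lst1Sub <- lapply(lst1Not1970,function(x) head(x,1000))


#for the full data, replace lst1Sub with lst1Not1970 below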

lst2 <- lapply(lst1Sub,function(x) {
  #date and site code: everything up to and including the site
  dateSite <- gsub("(.*G\\d+).*","\\1",x)
  dat1 <- data.frame(Year=as.numeric(substr(dateSite,1,4)),
                     Month=as.numeric(substr(dateSite,5,6)),
                     Day=as.numeric(substr(dateSite,7,8)),
                     Site=substr(dateSite,9,12),
                     stringsAsFactors=FALSE)
  #the three values that follow the site code
  Sims <- gsub(".*G\\d+\\s+(.*)","\\1",x)
  #separate negative values that run into the preceding number
  fused <- grep("\\d+-",Sims)
  Sims[fused] <- gsub("(.*)([- ][0-9]+\\.[0-9]+)","\\1 \\2",
                      gsub("^([0-9]+\\.[0-9]+)(.*)","\\1 \\2",Sims[fused]))
  Sims1 <- read.table(text=Sims,header=FALSE)
  names(Sims1) <- c("Precipitation", "Tmin", "Tmax")
  cbind(dat1,Sims1)
})
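
To see what the two nested gsub() calls above do, here is a small stand-alone sketch. The value strings are made up (I am assuming the date and site have already been stripped and that negative values sometimes run into the number before them); your real lines may look different.

```r
# Hypothetical value strings (Precip, Tmin, Tmax); not taken from the real files.
Sims <- c("0.00 -12.30 -5.60",   # already separated
          "0.00-12.30 -5.60",    # 1st and 2nd value fused
          "1.50 -13.20-10.40",   # 2nd and 3rd value fused
          "2.10-7.80-0.50")      # all three fused

# Strings in which a digit is followed directly by a minus sign
fused <- grep("\\d+-", Sims)

# Inner gsub: put a space after the first number
step1 <- gsub("^([0-9]+\\.[0-9]+)(.*)", "\\1 \\2", Sims[fused])
# Outer gsub: put a space in front of the last number
Sims[fused] <- gsub("(.*)([- ][0-9]+\\.[0-9]+)", "\\1 \\2", step1)

# Extra spaces are harmless: read.table splits on any run of whitespace
Sims1 <- read.table(text = Sims, header = FALSE,
                    col.names = c("Precipitation", "Tmin", "Tmax"))
Sims1
```

Each string ends up with three whitespace-separated numbers, so read.table() returns three numeric columns.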

Precip <- lapply(lst2,function(x) x[,1:5])

Tmin <- lapply(lst2,function(x) x[,c(1:4,6)]) 


Tmax <- lapply(lst2,function(x) x[,c(1:4,7)])

Precip1 <- cbind(Precip[[1]][,1:4],do.call(cbind,lapply(Precip,`[`,5)))

names(Precip1)[5:ncol(Precip1)] <- 
paste0("Sim",sprintf("%03d",1:length(Precip))) 
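
The column-binding step can be checked on a toy list first. This is only a sketch with made-up numbers; toy and key are illustrative names. Note that the approach assumes every simulation file lists the same dates and sites in the same row order, because the values are bound side by side without any matching.

```r
# Toy stand-in for 'Precip': one data frame per simulation file,
# identical date/site columns, different values (made-up numbers).
key <- data.frame(Year = 1971, Month = 1, Day = 1:2,
                  Site = c("GGG1", "GGG2"), stringsAsFactors = FALSE)
toy <- list(cbind(key, Precipitation = c(0.0, 1.2)),
            cbind(key, Precipitation = c(0.4, 0.0)),
            cbind(key, Precipitation = c(2.5, 3.1)))

# Keep the date/site columns once, then bind column 5 of every list element
toy1 <- cbind(toy[[1]][, 1:4], do.call(cbind, lapply(toy, `[`, 5)))
names(toy1)[5:ncol(toy1)] <- paste0("Sim", sprintf("%03d", seq_along(toy)))
toy1
```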


lapply(split(Precip1,Precip1$Site),function(x) 
write.table(x,file=paste(getwd(),"Precip",paste0(unique(x$Site),".dat"),sep="/"),row.names=FALSE,quote=FALSE))

Tmin1 <- cbind(Tmin[[1]][,1:4],do.call(cbind,lapply(Tmin,`[`,5)))

names(Tmin1) <- names(Precip1)

lapply(split(Tmin1,Tmin1$Site),function(x) 
write.table(x,file=paste(getwd(),"Tmin",paste0(unique(x$Site),".dat"),sep="/"),row.names=FALSE,quote=FALSE))
 


Tmax1 <- cbind(Tmax[[1]][,1:4],do.call(cbind,lapply(Tmax,`[`,5)))

names(Tmax1) <- names(Precip1)

lapply(split(Tmax1,Tmax1$Site),function(x) 
write.table(x,file=paste(getwd(),"Tmax",paste0(unique(x$Site),".dat"),sep="/"),row.names=FALSE,quote=FALSE))
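
Once the site files are written, each folder can be read back one file at a time and a function applied to each. A rough, self-contained sketch in a temporary folder with made-up values; colMeans() is only a placeholder for whatever function you need, and outDir, toy and res are illustrative names:

```r
# Write two toy site files into a temporary "Precip" folder (made-up values).
outDir <- file.path(tempdir(), "Precip")
dir.create(outDir, showWarnings = FALSE)
toy <- data.frame(Year = 1971, Month = 1, Day = c(1, 1, 2, 2),
                  Site = c("GGG1", "GGG2", "GGG1", "GGG2"),
                  Sim001 = c(0.0, 1.2, 0.4, 0.0),
                  Sim002 = c(2.5, 3.1, 0.0, 0.6),
                  stringsAsFactors = FALSE)
invisible(lapply(split(toy, toy$Site), function(x)
  write.table(x, file = file.path(outDir, paste0(unique(x$Site), ".dat")),
              row.names = FALSE, quote = FALSE)))

# Read every site file back and apply a function to it, one at a time.
files <- list.files(outDir, pattern = "\\.dat$", full.names = TRUE)
res <- lapply(files, function(f) {
  d <- read.table(f, header = TRUE, stringsAsFactors = FALSE)
  # placeholder function: mean of each simulation column
  colMeans(d[, grep("^Sim", names(d)), drop = FALSE])
})
names(res) <- sub("\\.dat$", "", basename(files))
res
```

The same loop should work for Tmin and Tmax by changing the folder name.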
 


Hope this helps.
A.K.


On Friday, March 28, 2014 2:07 AM, Zilefac Elvis <zilefacel...@yahoo.com> wrote:

Hi AK,
Suppose you had used the large file, which you could not download.
My final output will be as follows:
Three folders:
1) Precip
2) Tmin or minimum temperature
3) Tmax or maximum temperature

Within each folder, we will have 120 files. Each file is named by the site code,
e.g. GGG1, GGG2, ..., G120.
Each file will be a dataframe with the first 3 columns as the date
(Year, Month, Day). Years are from 1971-2000. For the large file, the date
columns are followed by the simulation numbers, e.g.
Year,Month,Day,sim001,sim002...sim100. For the sample file, it would be
Year,Month,Day,sim001,sim002,sim003.

Thanks again.
Atem.






On Thursday, March 27, 2014 11:55 PM, Zilefac Elvis <zilefacel...@yahoo.com> 
wrote:

Hi AK,
Attached is a sample from the large file. The expected output is explained at
the end of this message (bold).
It is a little lengthy but is worth it given that there are many sites.
I have attached three simulations, so you will have sim1,sim2,sim3
instead of sim1 to sim100 as in the previous message.
############################################################################
I have done some simulations in R and would like to arrange my data in a usable
format.
The data are too large, so I have shared them via Dropbox.
When you load Calibration.RData into the workspace, you will find the site codes
(column 1) in "Prairies.Sites".
My initial dataset was in the form of a dataframe with columns denoting
stations. So I had three dataframes, one each for precipitation, Tmin, and Tmax.
You reshaped the dataframes individually into three column vectors (see the file
called PrecipTminTmax) using this code:
library(reshape2)
dat1 <- read.table("predictand.csv",header=TRUE,stringsAsFactors=TRUE,sep="\t")
# predictand.csv had 123 columns, with columns 1,2,3 as the date.
dat1<-precipitation
dat2M <- melt(dat1,id.var=c("year","month","day"))
dat2M1 <- dat2M[with(dat2M,order(year,month,day,variable)),]
dim(dat2M1)
#[1] 1972320       5
row.names(dat2M1) <- 1:nrow(dat2M1)
PrecipTminTmax<-cbind(precipitation,Tmin,Tmax)

The problem to be solved
Attached is a large file (SimCalibration.zip) containing my simulations (001 to 
100). Please import files starting with "Sim1971-2000_Daily_" only. The rest is 
not important. My analysis is for the period 1971-2000. Any data before or 
after this period should be ignored.
My simulation was done in R using Fortran encoding to read data values. All 
files are ".dat". In each file, the columns are as follows :
Year, Month, Day, Site, Precip, Tmin, Tmax. In another project involving 
rainfall only, I read such files into R using this code:
rain.data <- scan("gaugvals.all",what=character(),sep="\n",n=257212)
rain.data <- data.frame(Year=as.numeric(substr(rain.data,1,4)),
                        Month=as.numeric(substr(rain.data,5,6)),
                        Day=as.numeric(substr(rain.data,7,8)),
                        Site=substr(rain.data,10,12),
                        Rain=as.numeric(substr(rain.data,13,18)))

1) I would like to read all files beginning with "Sim1971-2000_Daily_".
2) Split each file by variable name (Precip, Tmin, Tmax) and then arrange each
variable in the form of a dataframe. For example, I will take Precip from site
GGG1 and have a dataframe with column names Year,Month,Day,
sim1,sim2,...,sim100. Repeat this for all 120 sites, so that for Precip you
will have 120 files corresponding to the site codes. Each file has nrows with
Year,Month,Day, sim1...sim100 columns.
3) Please repeat the above for Tmin and Tmax, so that in the end I will have
three folders (Precip, Tmin and Tmax). Each folder has 120 files, with each file
being a dataframe containing the date and 100 columns.

When you successfully go through this "difficult" section, I will
access each folder, read each file and apply a function to it one at a time.
Thanks AK, this is part of my MSc thesis project. Your help will be fully
acknowledged. You have helped me a lot towards the success of this project.
Atem.




On Thursday, March 27, 2014 9:09 PM, arun <smartpink...@yahoo.com> wrote:

Hi Atem,

I tried to download the first file. 
It is taking me forever.  With the speed I have, I doubt it would be 
successful.  Can you just provide some small reproducible example data and what 
your expected output would be?
Arun

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
