Hi I am almost 100% sure that you would spare yourself much trouble if you changed your date column to real date
?as.Date reshape your wide format to long one library(reshape2) ?melt to get 3 column data.frame with one date column, one country column and one value column use ?aggregate and ?format to get summary value something like aggregate(value column, list(format(date column, "%m.%Y"), country column), mean) But if you insist to scratch your left ear with right hand accross your head, you could continue your way. Cheers Petr > -----Original Message----- > From: R-help <r-help-boun...@r-project.org> On Behalf Of Subhamitra > Patra > Sent: Friday, September 13, 2019 3:20 PM > To: Jim Lemon <drjimle...@gmail.com>; r-help mailing list <r-help@r- > project.org> > Subject: Re: [R] Query about calculating the monthly average of daily data > columns > > Dear Sir, > > Yes, I understood the logic. But, still, I have a few queries that I mentioned > below your answers. > > "# if you only have to get the monthly averages, it can be done this way > > spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*) > > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)" > > > > B. Here, I need to define the no. of months, and years separately, right? > > or else what 2, and 3 (in bold) indicates? > > > > To get the grouping variable of sequential months that you want, you only > need the month and year values of the dates in the first column. First I used > the "strsplit" function to split the date field at the hyphens, then used > "sapply" to extract ("[") the second (month) and *third (year)* parts as two > new columns. Because you have more than one year of data, you need the > year values or you will group all Januarys, all Februarys and so on. > Notice how I pass both of the new columns as a list (a data frame is a type of > list) in the call to get the mean of each month. > > 1. Here, as per my understanding, the "3" indicates the 3rd year, right? > But, you showed an average for 2 months of the same year. Then, what "3" > in the spdat$year object indicate? > > > C. From this part, I got the exact average values of both January and > > February of 1994 for country A, and B. But, in code, I have a query > > that I need to define spdat$returnA, and spdat$returnB separately > > before writing this code, right? Like this, I need to define for each > > 84 countries separately with their respective number of months, and > > years before writing this code, right? > > > > I don't think so. Because I don't know what your data looks like, I am > guessing that for each row, it has columns for each of the 84 countries. I > don't know what these columns are named, either. Maybe: > > date Australia Belarus ... Zambia > 01/01/1994 20 21 22 > ... > > Here, due to my misunderstanding about the code, I was wrong. But, what > data structure you guessed, it is absolutely right that for each row, I have > columns for each of the 84 countries. So, I think, I need to define the date > column with no. of months, and years once for all the countries. > Therefore, I got my answer to the first and third question in the previous > email (what you suggested) that I no need to define the column of each > country, as the date, and no. of observations are same for all countries. > But, the no. of days are different for each month, and similarly, for each > year. So, I think I need to define date for each year separately. Hence, I > have > given an example of 12 months, for 2 years (i.e. 1994, and 1995), and have > written the following code. Please correct me in case I am wrong. > > spdat<-data.frame( > > dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(r > ep(1,21),rep(2,20), > rep(3,23), rep(4,21), > rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12 > ,22) > ),rep(1994,260) > dates1= > paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,2 > 2),rep(2,20), > rep(3,23), rep(4,20), > rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12 > ,21) > ),rep(1995,259) ,sep="-") > > Concerning the exporting of structure of the dataset to excel, I will have > 12*84 matrix. But, please suggest me the way to proceed for the large > sample. I have mentioned below what I understood from your code. Please > correct me if I am wrong. > 1. I need to define the date for each year as the no. of days in each month > are different for each year (as mentioned in my above code). For instance, in > my data file, Jan 1994 has 21 days while Jan 1995 has 22 days. > 2. Need to define the date column as character. > 3. Need to define the monthly average for each month, and year. So, now > code will be as follows. > spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2,3,4,5,6,7,8,9,10,11,12) > %%%%As I need all months average sequentially. > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3) > > Here, this meaning of "3", I am really unable to get. > > 4. Need to define each country with each month and year as mentioned in > the last part of your code. > > Please suggest me in this regard. > > Thank you. > > > > > > > > [image: Mailtrack] > <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca > mpaign=signaturevirality5&> > Sender > notified by > Mailtrack > <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca > mpaign=signaturevirality5&> > 09/13/19, > 06:41:41 PM > > On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon <drjimle...@gmail.com> wrote: > > > Hi Subhamitra, > > I'll try to write my answers adjacent to your questions below. > > > > On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra < > > subhamitra.pa...@gmail.com> wrote: > > > >> Dear Sir, > >> > >> Thank you very much for your suggestion. > >> > >> Yes, your suggested code worked. But, actually, I have data from 3rd > >> January 1994 to 3rd August 2017 for very large (i.e. for 84 > >> countries) sample. From this, I have given the example of the years > >> up to 2000. Before applying the same code for the long 24 years, I > >> want to learn the logic behind the code. Actually, some part of the > >> code is not understandable to me which I mentioned in the bold letter as > follows. > >> > >> "spdat<-data.frame( > >> dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"), > >> returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))" > >> > >> A. Here, I need to define the no. of days in a month, and the no. of > >> countries name separately, right? But, what is meant by 15:50, and > >> 10:45 in return A, and B respectively? > >> > > > > To paraphrase Donald Trump, this is FAKE DATA! I have no idea what the > > real values of return are, so I made them up using the "sample" function. > > However, this is not meant to mislead anyone, just to show how > > whatever numbers are in your data can be used in calculations. The > > colon (":") operator creates a sequence of numbers starting with the > > one to the left and ending with the one to the right. > > > >> > >> "# if you only have to get the monthly averages, it can be done this > >> way > >> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*) > >> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)" > >> > >> B. Here, I need to define the no. of months, and years separately, right? > >> or else what 2, and 3 (in bold) indicates? > >> > > > > To get the grouping variable of sequential months that you want, you > > only need the month and year values of the dates in the first column. > > First I used the "strsplit" function to split the date field at the > > hyphens, then used "sapply" to extract ("[") the second (month) and > > third (year) parts as two new columns. Because you have more than one > > year of data, you need the year values or you will group all Januarys, > > all Februarys and so on. Notice how I pass both of the new columns as > > a list (a data frame is a type of > > list) in the call to get the mean of each month. > > > >> > >> "# get the averages by month and year - is this correct? > >> monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean) > >> monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)" > >> > >> C. From this part, I got the exact average values of both January and > >> February of 1994 for country A, and B. But, in code, I have a query > >> that I need to define spdat$returnA, and spdat$returnB separately > >> before writing this code, right? Like this, I need to define for each > >> 84 countries separately with their respective number of months, and > >> years before writing this code, right? > >> > > > > I don't think so. Because I don't know what your data looks like, I am > > guessing that for each row, it has columns for each of the 84 > > countries. I don't know what these columns are named, either. Maybe: > > > > date Australia Belarus ... Zambia > > 01/01/1994 20 21 22 > > ... > > > > > >> Yes, after obtaining the monthly average for each country's data, I > >> need to use them for further calculations. So, I want to export the > >> result to excel. But, until understanding the code, I think I willn't > >> able to apply for the entire sample, and cannot be able to discuss > >> the format of the resulted column to export to excel. > >> > > > > Say that we perform the grouped mean calculation for the first two > > country columns like this: > > monmeans<-sapply(spdat[,2:3],by,spdat[,c("month","year")],mean) > > monmeans > > Australia Belarus > > [1,] 29.70000 30.43333 > > [2,] 34.17857 27.39286 > > > > We are presented with a 2x2 matrix of monthly means in just the format > > someone might use for importing into Excel. The first row is January > > 1994, the second February 1994 and so on. By expanding the columns to > > include all the countries in your data, You should have the result you want. > > > > Jim > > > > > -- > *Best Regards,* > *Subhamitra Patra* > *Phd. Research Scholar* > *Department of Humanities and Social Sciences* *Indian Institute of > Technology, Kharagpur* > *INDIA* > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. Osobní údaje: Informace o zpracování a ochraně osobních údajů obchodních partnerů PRECHEZA a.s. jsou zveřejněny na: https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information about processing and protection of business partner’s personal data are available on website: https://www.precheza.cz/en/personal-data-protection-principles/ Důvěrnost: Tento e-mail a jakékoliv k němu připojené dokumenty jsou důvěrné a podléhají tomuto právně závaznému prohláąení o vyloučení odpovědnosti: https://www.precheza.cz/01-dovetek/ | This email and any documents attached to it may be confidential and are subject to the legally binding disclaimer: https://www.precheza.cz/en/01-disclaimer/ ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.