Thanks Bert. I also managed to get this work files = list.files(pattern="*.txt") df = ldply(files, read_csv,col_names=c("ticker","date","open","high", "low", "close", "volume")) Cheers, Nathan
On Fri, Nov 15, 2019 at 3:45 PM Bert Gunter <bgunter.4...@gmail.com> wrote: > Ha! -- A bug! "Corrected" version inline below: > Bert Gunter > > On Thu, Nov 14, 2019 at 8:10 PM Bert Gunter <bgunter.4...@gmail.com> > wrote: > >> Brute force approach, possibly inefficient: >> >> 1. You have a vector of file names. Sort them in the appropriate (time) >> order. These names are also the component names of all the data frames in >> your list that you read in, call it yourlist. >> >> 2. Create a vector of all the unique ticker names, perhaps by creating a >> vector of all the names and then unique() -ing it. Call this vector snames >> with n.names in it. It will probably have length several hundred at least I >> assume. >> >> 3. Suppose the 6 columns of data of each data frame that you want are >> named cnames = c("stocknames","Open", "High", "Low", "Close", "Volume"). >> >> 4. You could proceed as you suggested, but it would likely be more >> efficient, since all data that you want are numeric, to create a 3D array >> of NA's via: >> >> yourdat <- array(dim = c(n.dates, n.names, 5), dimnames = list(NULL, >> snames, cnames[-1])) >> >> 5. Then just loop through your list of files and use indexing to fill in >> the columns x category slices for each date. Stocks that are missing will >> be NA automatically. e.g. (warning: UNTESTED): >> >> For date "d", let df be the data frame from date "d" in your list, i.e. >> >> df <- yourlist[["d"]][, cnames] >> ## Note The order of the listed stocks in the "stocknames" column can be >> different from frame to frame of your master list. >> >> Then fill in the flat for the dth date (i.e. dth row) in your array by: >> >> ## corrected line here: > >> yourdat[ ,df[ ,"stocknames"], cnames[-1] <- as.matrix(df[ ,-1]) ## need >> to omit the column names so it converts to numeric matrix >> > ## need to get the names of the stocks in the "stocknames" column in the > order they appear in df. > >> >> This should fill in the values of the 2nd and 3rd dimensions of the >> array for all the stocks on the dth date with the data for each stock in >> the data frame matched to the appropriate column in the array. >> >> The entire loop will give all dates for all stocks and all categories >> with NA's for missing days. (*IF IT WORKS!*) >> You may need to modify this sightly if, for example, your stock names are >> row names rather than a field in your data frame. I leave such adjustments >> to you. >> >> Note again that this is fairly elementary with just arrays and indexing. >> Basic tutorials should tell you about all of this. Also, when plotting, >> you'll have to convert your dates to suitable date-time format. >> >> Cheers, >> Bert >> >> >> >> >> On Thu, Nov 14, 2019 at 4:55 PM Nhan La <lathanhn...@gmail.com> wrote: >> >>> Hi Bert, >>> >>> I've attempted to find the answer and actually been able to import the >>> individual data sets into a list of data frames. >>> >>> But I'm not sure how to go ahead with the next step. I'm not necessarily >>> asking for a final answer. Perhaps if you (I mean others as well) would >>> like a constructive coaching, you would suggest a few key words to look at? >>> >>> Sorry for the HTML thing, this is my first post. I'll do better next >>> times. >>> >>> Thanks, >>> Nathan >>> >>> >>> >>> On Fri, Nov 15, 2019 at 11:34 AM Bert Gunter <bgunter.4...@gmail.com> >>> wrote: >>> >>>> So you've made no attempt at all to do this for yourself?! >>>> >>>> That suggests to me that you need to spend time with some R tutorials. >>>> >>>> Also, please post in plain text on this plain text list. HTML can get >>>> mangled, as it may have here. >>>> >>>> -- Bert >>>> "The trouble with having an open mind is that people keep coming along >>>> and sticking things into it." >>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >>>> >>>> >>>> On Thu, Nov 14, 2019 at 4:11 PM Nhan La <lathanhn...@gmail.com> wrote: >>>> >>>>> I have many separate data files in csv format for a lot of daily stock >>>>> prices. Over a few years there are hundreds of those data files, whose >>>>> names are the dates of data record. >>>>> >>>>> In each file there are variables of ticker (or stock trading code), >>>>> date, >>>>> open price, high price, low price, close price, and trading volume. For >>>>> example, inside a data file named 20150128.txt it looks like this: >>>>> >>>>> FB,20150128,1.075,1.075,0.97,0.97,725221 >>>>> AAPL,20150128,2.24,2.24,2.2,2.24,63682 >>>>> AMZN,20150128,0.4,0.415,0.4,0.415,194900 >>>>> NFLX,20150128,50.19,50.21,50.19,50.19,761845 >>>>> GOOGL,20150128,1.62,1.645,1.59,1.63,684835 ...................and many >>>>> more.................. >>>>> >>>>> In case it's relevant, the number of stocks in these files are not >>>>> necessarily the same (so there will be missing data). I need to import >>>>> and >>>>> create 5 separate time series data frames from those files, one each >>>>> for >>>>> Open, High, Low, Close and Volume. In each data frame, rows are >>>>> indexed by >>>>> date, and columns by ticker. For example, the data frame Open may look >>>>> like >>>>> this: >>>>> >>>>> DATE,FB,AAPL,AMZN,NFLX,GOOGL,... 20150128,1.5,2.2,0.4,5.1,1.6,... >>>>> 20150129,NA,2.3,0.5,5.2,1.7,... ... >>>>> >>>>> What will be an efficient way to do that? I've used the following >>>>> codes to >>>>> read the files into a list of data frames but don't know what to do >>>>> next >>>>> from here. >>>>> >>>>> files = list.files(pattern="*.txt") mydata = lapply(files, >>>>> read.csv,head=FALSE) >>>>> >>>>> Thanks, >>>>> >>>>> Nathan >>>>> >>>>> Disclaimer: In case it's relevant, this question is also posted on >>>>> stackoverflow. >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> ______________________________________________ >>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide >>>>> http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>>> >>>> [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.