Hi everybody, I'm working with some very messy data. I have tried to clean it up in SAS and SAS/IML, but there is not enough information on how to handle certain things in SAS, so I have turned to R. The task itself should be rather simple, so I was wondering if someone could help me out.
The original .csv has dimensions 7138 x 6338 (dim() gives [1] 7138 6338). It holds the funds side by side, each with its dates and the observations for each date, covering around 10 years and 4000+ funds. COL5 holds the next fund's name, and so on. The top-left corner looks like this:

    COL1             COL2        COL3       COL4
    HBNNF US Equity
    Date             EQY_SH_OUT  PX_VOLUME
    #NAME?           #N/A N/A    135000
    7/7/2008         #N/A N/A    105000
    7/17/2008        #N/A N/A    590000
    7/22/2008        #N/A N/A    40000

In R, this .csv somehow seems to be read as a list (typeof() returns "list") rather than as a data frame, and many operations, such as regexpr searches over the whole file, do not work or behave strangely.

I want to stack the fund data into a long dataset with the columns Fund_name, Date, EQY_SH_OUT and PX_VOLUME, with the fund name repeated on every date. It should look like this:

    Fund_name        Date       EQY_SH_OUT  PX_VOLUME
    HBNNF US Equity  7/7/2008   #N/A N/A    105000
    HBNNF US Equity  7/17/2008  #N/A N/A    590000
    HBNNF US Equity  7/22/2008  #N/A N/A    40000
    HBNNF US Equity  7/24/2008  #N/A N/A    3000
    HBNNF US Equity  7/31/2008  #N/A N/A    1000
    HBNNF US Equity  8/20/2008  #N/A N/A    1000
    HBNNF US Equity  8/26/2008  #N/A N/A    2000
    HBNNF US Equity  8/27/2008  #N/A N/A    2000
    HBNNF US Equity  9/2/2008   #N/A N/A    5000
    HND CN Equity    1/17/2008  #N/A N/A    28000
    HND CN Equity    1/18/2008  #N/A N/A    25000
    HND CN Equity    1/21/2008  #N/A N/A    5000
    HND CN Equity    1/22/2008  #N/A N/A    101000
    HND CN Equity    1/23/2008  #N/A N/A    122000

Is there any way to accomplish this? There should be an easy way, but I have never worked with lists, and the file somehow does not read as a data frame, which gives strange results.
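For what it's worth, typeof() reports "list" for every data frame, because a data.frame is stored as a list of equal-length columns; class() or is.data.frame() is the check that tells you whether read.csv() returned a data frame. The "Levels:" lines in the console output point to the more likely culprit: before R 4.0.0, read.csv() defaulted to stringsAsFactors = TRUE, so every text column became a factor. Calling as.character() on the whole data frame then yields one string per column, built from the factors' integer codes rather than their labels, which would explain why grep("Equity", ...) returns integer(0). The minimal sketch below uses a made-up three-row stand-in called small (its name and contents are illustrative only, not the real file) and shows one way to turn the factors back into text and search cell by cell.

```r
# Illustrative stand-in for small_raw (made-up contents), built the way
# read.csv() did by default before R 4.0.0: text columns become factors.
small <- data.frame(COL1 = c("HBNNF US Equity", "Date", "7/7/2008"),
                    COL2 = c("", "EQY_SH_OUT", "#N/A N/A"),
                    stringsAsFactors = TRUE)

typeof(small)          # "list": every data frame is stored as a list of columns
class(small)           # "data.frame": this is the check that matters
is.factor(small$COL1)  # TRUE, hence the "Levels:" lines when printing

# Turn every factor column back into plain character strings, in place
small[] <- lapply(small, as.character)

# Search cell by cell; as.matrix() keeps the row/column layout
cells <- as.matrix(small)
hits  <- which(matrix(grepl("Equity", cells), nrow = nrow(cells)), arr.ind = TRUE)
hits                   # one match, at row 1, column 1

# When reading the real file, avoid factors from the start
# ("funds.csv" is a hypothetical file name):
# small_raw <- read.csv("funds.csv", stringsAsFactors = FALSE)
```

Since R 4.0.0 the default is stringsAsFactors = FALSE, so on a current R version the factor conversion only happens if it is requested explicitly.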
    > small_raw[1,1]
    [1] HBNNF US Equity
    Levels: 0.26 0.46 COL1 HBNNF US Equity
    > grep("Equity",as.character(small_raw))
    integer(0)
    > small_raw[[1]]
      [1] HBNNF US Equity
      [5]
      [9]
     [13]
     [17]
     [21]
     [25]
     [29]
     [33]
     [37]
     [41]
     [45]
     [49]
     [53]
     [57]
     [61]
     [65]
     [69]
     [73]
     [77]
     [81]
     [85]
     [89]
     [93]
     [97] 0.46 0.46
    [101] 0.46 0.26
    [105] 0.26 0.26
    [109] 0.26 0.26
    [113] 0.26 0.26
    [117] 0.26 0.26
    [121] 0.26 0.26
    [125] 0.26 0.26
    [129] 0.26 0.26
    [133] 0.26 0.26
    [137] 0.26 0.26
    [141] 0.26 0.26
    [145] 0.26 0.26
    [149] 0.26 0.26
    [153] 0.26 0.26
    [157] 0.26 0.26
    [161] 0.26 0.26
    [165] 0.26
    [169]
    [173]
    [177]
    [181]
    [185]
    [189]
    [193]
    [197]
    Levels: 0.26 0.46 COL1 HBNNF US Equity

I have been on this for a while. Thank you in advance!

Arsenio

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
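A minimal sketch of the stacking, under assumptions read off the sample in the post rather than confirmed: each fund occupies a 4-column block (Date, EQY_SH_OUT, PX_VOLUME, then an empty spacer, so the next fund starts in COL5); within a block, row 1 holds the fund name, row 2 the field labels, and the observations start at row 3. Judging from the desired output, rows whose Date is not an m/d/yyyy date (the "#NAME?" row and any blank padding under shorter series) are dropped. Everything is read as text with colClasses = "character", which sidesteps the factor conversion. The helper stack_funds() and the file name funds.csv are hypothetical, and the demo runs on a tiny made-up CSV built from values quoted in the post. Note that 6338 is not a multiple of 4, so the block width is worth checking against the real file; the sketch ignores trailing columns that do not form a full 3-column block.

```r
# Sketch (hypothetical helper): stack side-by-side fund blocks into long format.
# ASSUMPTIONS, not confirmed by the post: each fund uses a 4-column block
# (Date, EQY_SH_OUT, PX_VOLUME, empty spacer); within a block, row 1 holds the
# fund name, row 2 the field labels, and observations start at row 3.
stack_funds <- function(raw, block_width = 4) {
  date_re <- "^[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}$"       # m/d/yyyy
  starts  <- seq(1, ncol(raw) - 2, by = block_width)  # full 3-column blocks only
  blocks <- lapply(starts, function(j) {
    block <- raw[-(1:2), j:(j + 2)]                   # drop name and label rows
    names(block) <- c("Date", "EQY_SH_OUT", "PX_VOLUME")
    block <- block[grepl(date_re, block$Date), ]      # drops "#NAME?" and blanks
    data.frame(Fund_name = rep(raw[1, j], nrow(block)), block,
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, blocks)
  rownames(out) <- NULL                               # reset the 1..n row names
  out
}

# Tiny made-up input in the assumed layout, using values quoted in the post.
csv_lines <- c(
  "COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8",
  "HBNNF US Equity,,,,HND CN Equity,,,",
  "Date,EQY_SH_OUT,PX_VOLUME,,Date,EQY_SH_OUT,PX_VOLUME,",
  "#NAME?,#N/A N/A,135000,,1/17/2008,#N/A N/A,28000,",
  "7/7/2008,#N/A N/A,105000,,1/18/2008,#N/A N/A,25000,",
  "7/17/2008,#N/A N/A,590000,,,,,"
)
# colClasses = "character" keeps every cell as text (no factors, no type guessing)
raw  <- read.csv(text = csv_lines, colClasses = "character")
long <- stack_funds(raw)
print(long)

# For the real file (hypothetical name; use header = FALSE if there is no
# COL1, COL2, ... line, so that the fund-name row still ends up as row 1):
# raw <- read.csv("funds.csv", colClasses = "character")
```

Once stacked, PX_VOLUME can be converted with as.numeric() (cells such as "#N/A N/A" become NA, with a coercion warning) and Date with as.Date(long$Date, format = "%m/%d/%Y"). Keeping everything as character until after the reshape avoids mixing types across funds whose blocks were exported differently.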