Hi everybody,

I'm working with some very messy data. I have tried to clean it up in SAS
and SAS/IML, but there is not enough information on how to handle certain
things in SAS, so I have turned to R. The task itself should be rather
simple, so I was wondering if someone could help me out.

The original .csv has dimensions 7138 x 6338 (as reported by dim). It holds
funds with their dates and the observations for each date, covering around
10 years and 4000+ funds. The funds sit side by side, so COL5 has the next
fund's name, and so on.

COL1             COL2       COL3        COL4
HBNNF US Equity  Date       EQY_SH_OUT  PX_VOLUME
                 #NAME?     #N/A N/A    135000
                 7/7/2008   #N/A N/A    105000
                 7/17/2008  #N/A N/A    590000
                 7/22/2008  #N/A N/A    40000


In R, this .csv is somehow read as a list (according to typeof) and not as a
data frame, and many things, such as regexpr searches across the whole file,
either do not work or behave strangely. I want to stack the fund data and
create a long dataset with fund name, date, eqy_sh_out and px_volume, with the
fund name present for each date.
That should look like this:

Fund_name        Date       EQY_SH_OUT  PX_VOLUME
HBNNF US Equity  7/7/2008   #N/A N/A    105000
HBNNF US Equity  7/17/2008  #N/A N/A    590000
HBNNF US Equity  7/22/2008  #N/A N/A    40000
HBNNF US Equity  7/24/2008  #N/A N/A    3000
HBNNF US Equity  7/31/2008  #N/A N/A    1000
HBNNF US Equity  8/20/2008  #N/A N/A    1000
HBNNF US Equity  8/26/2008  #N/A N/A    2000
HBNNF US Equity  8/27/2008  #N/A N/A    2000
HBNNF US Equity  9/2/2008   #N/A N/A    5000
HND CN Equity    1/17/2008  #N/A N/A    28000
HND CN Equity    1/18/2008  #N/A N/A    25000
HND CN Equity    1/21/2008  #N/A N/A    5000
HND CN Equity    1/22/2008  #N/A N/A    101000
HND CN Equity    1/23/2008  #N/A N/A    122000
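
For reference, a minimal base R sketch of the stacking described here. It rests
on assumptions that are not confirmed by the real file: every fund occupies
exactly four consecutive columns, the first row of each block holds the fund
name and the field labels, and the columns are character rather than factor.
The names raw, block_width, stack_one and long are made up for illustration.

```r
# Toy stand-in for the wide file: two funds, four columns each.
# Row 1 holds the fund name and the field labels; later rows hold the data.
raw <- data.frame(
  COL1 = c("HBNNF US Equity", "", ""),
  COL2 = c("Date", "7/7/2008", "7/17/2008"),
  COL3 = c("EQY_SH_OUT", "#N/A N/A", "#N/A N/A"),
  COL4 = c("PX_VOLUME", "105000", "590000"),
  COL5 = c("HND CN Equity", "", ""),
  COL6 = c("Date", "1/17/2008", "1/18/2008"),
  COL7 = c("EQY_SH_OUT", "#N/A N/A", "#N/A N/A"),
  COL8 = c("PX_VOLUME", "28000", "25000"),
  stringsAsFactors = FALSE
)

block_width <- 4                               # assumed columns per fund
starts <- seq(1, ncol(raw), by = block_width)  # first column of each block

# Turn one four-column block into long format with the fund name repeated.
stack_one <- function(s) {
  body <- raw[-1, s:(s + block_width - 1)]     # drop the label row
  out <- data.frame(
    Fund_name  = raw[1, s],                    # recycled down the rows
    Date       = body[[2]],
    EQY_SH_OUT = body[[3]],
    PX_VOLUME  = body[[4]],
    stringsAsFactors = FALSE
  )
  # Funds may have histories of different length, so drop rows with no date.
  out[!is.na(out$Date) & out$Date != "", ]
}

long <- do.call(rbind, lapply(starts, stack_one))
rownames(long) <- NULL
long
```

If the real file does not have a column count that is a multiple of four, the
block boundaries would have to be located differently, for example by looking
for the cells that contain "Equity" in the first row.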


Is there a way to accomplish this? There should be an easy one, but I have
never worked with lists, and somehow the file is not read as a data frame,
which gives strange results.
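
For context, a data frame in R is stored as a list of columns, so typeof
reporting "list" does not by itself mean the import failed; class or
is.data.frame is the more telling check. A minimal sketch on made-up inline
text (csv_text and fund_demo are hypothetical names), which also passes
stringsAsFactors = FALSE so that text columns stay character instead of
becoming factors:

```r
# Made-up CSV content mimicking the first fund block of the real file.
csv_text <- "COL1,COL2,COL3,COL4
HBNNF US Equity,Date,EQY_SH_OUT,PX_VOLUME
,7/7/2008,#N/A N/A,105000
,7/17/2008,#N/A N/A,590000"

# read.csv accepts literal text through its text argument.
fund_demo <- read.csv(text = csv_text, stringsAsFactors = FALSE)

typeof(fund_demo)         # storage type: a data frame is a list of columns
class(fund_demo)          # the class is what marks it as a data frame
is.data.frame(fund_demo)
sapply(fund_demo, class)  # per-column classes, character here
```

The Levels line in the output further down suggests the columns of small_raw
are factors, which is what read.csv produced for text columns by default
before R 4.0.0.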

> small_raw[1,1]
[1] HBNNF US Equity
Levels:  0.26 0.46 COL1 HBNNF US Equity

> grep("Equity",as.character(small_raw))
integer(0)
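
One possible reason for the empty result: as.character applied to a whole
data frame produces one string per column rather than one per cell, and for
factor columns that string appears to be built from the underlying integer
codes, not from the labels. A hedged sketch on a made-up data frame
(small_demo is a hypothetical stand-in for small_raw) that searches column by
column instead:

```r
# small_demo is a made-up stand-in for small_raw, with factor columns.
small_demo <- data.frame(
  COL1 = c("HBNNF US Equity", "", ""),
  COL2 = c("Date", "7/7/2008", "7/17/2008"),
  stringsAsFactors = TRUE
)

# Whole-frame conversion: one string per column, not one per cell.
whole <- as.character(small_demo)
length(whole)

# Column-wise search: convert each column to character, then grep within it.
hits <- lapply(small_demo, function(col) grep("Equity", as.character(col)))
hits$COL1  # row positions within COL1 that contain "Equity"

# Logical mask over every cell, then the row/column index of each match.
mask <- sapply(small_demo, function(col) grepl("Equity", as.character(col)))
which(mask, arr.ind = TRUE)
```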

> small_raw[[1]]
  [1] HBNNF US Equity                                                
  [5]                                                                
  [9]                                                                
 [13]                                                                
 [17]                                                                
 [21]                                                                
 [25]                                                                
 [29]                                                                
 [33]                                                                
 [37]                                                                
 [41]                                                                
 [45]                                                                
 [49]                                                                
 [53]                                                                
 [57]                                                                
 [61]                                                                
 [65]                                                                
 [69]                                                                
 [73]                                                                
 [77]                                                                
 [81]                                                                
 [85]                                                                
 [89]                                                                
 [93]                                                                
 [97] 0.46                            0.46                           
[101] 0.46                            0.26                           
[105] 0.26                            0.26                           
[109] 0.26                            0.26                           
[113] 0.26                            0.26                           
[117] 0.26                            0.26                           
[121] 0.26                            0.26                           
[125] 0.26                            0.26                           
[129] 0.26                            0.26                           
[133] 0.26                            0.26                           
[137] 0.26                            0.26                           
[141] 0.26                            0.26                           
[145] 0.26                            0.26                           
[149] 0.26                            0.26                           
[153] 0.26                            0.26                           
[157] 0.26                            0.26                           
[161] 0.26                            0.26                           
[165] 0.26                                                           
[169]                                                                
[173]                                                                
[177]                                                                
[181]                                                                
[185]                                                                
[189]                                                                
[193]                                                                
[197]                                                
Levels:  0.26 0.46 COL1 HBNNF US Equity

I have been on this for a while. Thank you in advance! 

Arsenio

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
