Re: [R] rowspan and readHTMLTable

Chris Stubben Wed, 08 May 2013 09:55:45 -0700

Sorry to answer my own question; I guess here's one way to read this table. Other suggestions are still welcome.


Chris


------

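library(XML)

# parse the example table (asText = TRUE: the string is HTML, not a file name)
x <- htmlParse("<table>
<tr><td rowspan=2>ab</td><td>X</td></tr>
<tr><td rowspan=2>YZ</td></tr>
<tr><td>c</td></tr>
</table>", asText = TRUE)

# split by rows
z <- getNodeSet(x, "//tr")

# create empty data.frame - probably not the best solution...
# size it from the table itself (assumes the first row has one <td> per column, no colspan)
nr <- length(z)
nc <- length(xpathSApply(z[[1]], ".//td", xmlValue))
t1 <- data.frame(matrix(NA, nrow = nr, ncol = nc))

for (i in seq_len(nr)) {
  # rowspan of each cell in row i (default 1 when the attribute is absent)
  rowspan <- as.numeric(xpathSApply(z[[i]], ".//td", xmlGetAttr, "rowspan", 1))
  val <- xpathSApply(z[[i]], ".//td", xmlValue)

  # fill values into empty cells (cells not already filled from a row above),
  # taking one free column per cell
  n <- which(is.na(t1[i, ]))[seq_along(val)]
  t1[i, n] <- val

  for (j in seq_along(rowspan)) {
    if (rowspan[j] > 1) {
      ## repeat value down column
      t1[(i + 1):(i + rowspan[j] - 1), n[j]] <- val[j]
    }
  }
}

t1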
  X1 X2
1 ab  X
2 ab YZ
3  c YZ
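A reusable variant of the same idea, as a rough sketch only: the helper name fill_rowspans and its arguments are hypothetical (not part of the XML package or pmcTable). It takes the per-row cell values and rowspan values, i.e. what the two xpathSApply calls in the loop return, and still assumes there is no colspan. It tracks filled cells in a character matrix (NA = still free), so a rowspan running past the last row is clipped instead of growing the table, and a row with more cells than free columns stops with an error.

```r
# Hypothetical helper, not part of the XML package or pmcTable:
# expand rowspan cells into a rectangular character matrix.
#   vals  - list with one character vector of cell values per <tr>
#   spans - list with one numeric vector of rowspan values per <tr>
#   ncol  - number of columns (assumed known; colspan is not handled)
fill_rowspans <- function(vals, spans, ncol) {
  nr <- length(vals)
  out <- matrix(NA_character_, nrow = nr, ncol = ncol)
  for (i in seq_len(nr)) {
    # columns in row i not already filled by a rowspan from above
    free <- which(is.na(out[i, ]))
    if (length(vals[[i]]) > length(free)) {
      stop("row ", i, " has more cells than free columns")
    }
    cols <- free[seq_along(vals[[i]])]
    out[i, cols] <- vals[[i]]
    for (j in seq_along(vals[[i]])) {
      # last row covered by this cell, clipped to the table height
      last <- min(i + spans[[i]][j] - 1, nr)
      if (last > i) out[(i + 1):last, cols[j]] <- vals[[i]][j]
    }
  }
  out
}

# With the row nodes z from the XML example above (requires library(XML)):
# vals  <- lapply(z, function(r) xpathSApply(r, ".//td", xmlValue))
# spans <- lapply(z, function(r) as.numeric(xpathSApply(r, ".//td", xmlGetAttr, "rowspan", 1)))
# fill_rowspans(vals, spans, ncol = length(vals[[1]]))

# Same example table, typed in directly:
vals  <- list(c("ab", "X"), "YZ", "c")
spans <- list(c(2, 1), 2, 1)
as.data.frame(fill_rowspans(vals, spans, ncol = 2))
```

Working on a matrix keeps the per-cell assignments simple; as.data.frame() converts the result at the end.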

If you are interested, I used this code in the pmcTable function (https://github.com/cstubben/pubmed). To get Table 1, this now works:


doc <- pmc("PMC3544749")  # downloads XML from the OAI service

t1 <- pmcTable(doc, 1)  # parses the table; also saves the caption and footnotes as attributes

t1[1:4, 1:4]

                          Category Gen Name Rvnumber                                      Description
1 Lipids and Fatty Acid Metabolism     kasB   Rv2246 3-oxoacyl-[acyl-carrier protein] synthase 2 kasb
2           Mycolic acid synthesis    mmaA4  Rv0642c                  Methoxy mycolic acid synthase 4
3           Mycolic acid synthesis     pcaA  Rv0470c    Mycolic acid synthase (cyclopropane synthase)
4           Mycolic acid synthesis     pcaA  Rv0470c    Mycolic acid synthase (cyclopropane synthase)





--

Chris Stubben

Los Alamos National Lab
Bioscience Division
MS M888
Los Alamos, NM 87545

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
