Hello,

Here's a function that doesn't do it all but might help.

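fun <- function(x){
    # split on runs of whitespace (spaces or tabs) and drop empty tokens
    x1 <- unlist(strsplit(x, "[[:space:]]+"))
    x2 <- x1[nchar(x1) > 0]
    i <- as.integer(x2[1])                  # first token is the label
    x3 <- unlist(strsplit(x2[-1], ":"))     # index, value, index, value, ...
    j <- as.integer(x3[rep(c(TRUE, FALSE), length(x3)/2)])   # column indices
    y <- numeric(max(j, 0L))                # length 0 if the line has no values
    y[j] <- as.numeric(x3[rep(c(FALSE, TRUE), length(x3)/2)])
    list(row = i, line = y)
}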

x <- "1  5:1  27:3  345:10"
fun(x)

If you know that your labels, i.e., row numbers, are consecutive, have the function return just 'y', not a list.
Then use readLines to read the file in and lapply to apply fun to each line. Something like

ln <- readLines(filename)
lst <- lapply(ln, fun)

Then you'll have another problem: the lines' lengths. They generally won't all be the same, because each vector is only as long as the largest column index on its line, so in order to make a data.frame or matrix you'll need extra work. Try the code above and say whether it's on the right track.
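
One possible way to do that extra work is to pad every vector with zeros up to the longest one and then bind the rows together. This is only a sketch: parse_line is a hypothetical stand-in for the function above, the two strings stand in for the result of readLines(filename), and it assumes every index:value pair is well formed.

```r
# Hypothetical helper: parse one line into its label and a dense numeric vector.
parse_line <- function(x) {
    tok <- strsplit(trimws(x), "[[:space:]]+")[[1]]
    kv <- unlist(strsplit(tok[-1], ":", fixed = TRUE))
    odd <- seq_along(kv) %% 2 == 1          # odd positions hold column indices
    j <- as.integer(kv[odd])
    y <- numeric(max(j, 0L))                # length 0 if the line has no values
    y[j] <- as.numeric(kv[!odd])
    list(row = as.integer(tok[1]), line = y)
}

# Stand-in for ln <- readLines(filename)
ln <- c("1  5:1  27:3  345:10", "2  3:2.5  7:1")
lst <- lapply(ln, parse_line)

# Pad each vector with trailing zeros to the longest length, then stack the rows.
ncol_max <- max(vapply(lst, function(e) length(e$line), integer(1)))
m <- do.call(rbind, lapply(lst, function(e) {
    c(e$line, numeric(ncol_max - length(e$line)))
}))

# Labels go in the first column, one column per feature index after that.
df <- data.frame(label = vapply(lst, function(e) e$row, integer(1)), m)
```

Note that this stores every zero explicitly, so it is only practical when the largest column index is modest.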

Also, take a look at package Matrix. It's a recommended package and it implements sparse matrices.
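
As a rough sketch of how that could look, the index:value pairs can be passed directly to Matrix's sparseMatrix() as row indices, column indices and values, so the zeros are never stored. The two strings below stand in for readLines(filename), and the code assumes every line has a label followed by at least one well-formed index:value pair.

```r
library(Matrix)

# Stand-in for ln <- readLines(filename)
ln <- c("1  5:1  27:3  345:10", "2  3:2.5  7:1")

tok <- strsplit(trimws(ln), "[[:space:]]+")
labels <- as.integer(vapply(tok, function(tk) tk[1], character(1)))

# Everything after the label is an "index:value" token.
feats <- lapply(tok, function(tk) tk[-1])
kv <- do.call(rbind, strsplit(unlist(feats), ":", fixed = TRUE))
j <- as.integer(kv[, 1])

# One matrix row per input line; repeat each row number once per pair on it.
m <- sparseMatrix(i = rep(seq_along(ln), lengths(feats)),
                  j = j,
                  x = as.numeric(kv[, 2]),
                  dims = c(length(ln), max(j)))
```

The labels are kept in a separate vector here, since a sparse matrix holds only the numeric values.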

Hope this helps,

Rui Barradas

On 09-10-2012 05:56, Noah Silverman wrote:
I have a bunch of data sets that were created for the libsvm tool.  They are in 
"colon separated sparse format".

i.e.

1  5:1  27:3  345:10

Is a row with the label of "1" and only has values in columns 5, 27, and 345.

I want to read these into a data.frame in R.

Is there a simple way to do this?

--
Noah Silverman, M.S.
UCLA Department of Statistics
8117 Math Sciences Building
Los Angeles, CA 90095

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.