Subsetting with [] and with head() gives different results. R code:

> head(train$data, 5)
[1] 0 0 1 0 0
> train$data[1:5, 1:5]
5 x 5 sparse Matrix of class "dgCMatrix"
     cap-shape=bell cap-shape=conical cap-shape=convex
[1,]              .                 .                1
[2,]              .                 .                1
[3,]              1                 .                .
[4,]              .                 .                1
[5,]              .                 .                1
     cap-shape=flat cap-shape=knobbed
[1,]              .                 .
[2,]              .                 .
[3,]              .                 .
[4,]              .                 .
[5,]              .                 .

On Fri, Oct 20, 2017 at 3:51 PM, C W <tmrs...@gmail.com> wrote:

> Thank you for your responses.
>
> I guess I don't feel alone. I don't find that the documentation goes into
> any detail.
>
> I also find it surprising that,
>
> > object.size(train$data)
> 1730904 bytes
>
> > object.size(as.matrix(train$data))
> 6575016 bytes
>
> the dgCMatrix actually takes less memory, though it *looks* like the
> opposite.
>
> Cheers!
>
> On Fri, Oct 20, 2017 at 3:22 PM, David Winsemius <dwinsem...@comcast.net>
> wrote:
>
>>
>> > On Oct 20, 2017, at 11:11 AM, C W <tmrs...@gmail.com> wrote:
>> >
>> > Dear R list,
>> >
>> > I came across dgCMatrix. I believe this class is associated with sparse
>> > matrices.
>>
>> Yes. See:
>>
>> help('dgCMatrix-class', pack=Matrix)
>>
>> If Martin Maechler happens to respond to this you should listen to him
>> rather than anything I write. Much of what the Matrix package does appears
>> to be magical to one such as I.
>>
>> >
>> > I see there are 8 attributes to train$data. I am confused why there are
>> > so many. Some are vectors; what do they do?
>> >
>> > Here's the R code:
>> >
>> > library(xgboost)
>> > data(agaricus.train, package='xgboost')
>> > data(agaricus.test, package='xgboost')
>> > train <- agaricus.train
>> > test <- agaricus.test
>> > attributes(train$data)
>> >
>>
>> I got a bit of an annoying surprise when I did something similar. It
>> appeared to me that I did not need to load the xgboost library since all
>> that was being asked was "where is the data" in an object that should be
>> loaded from that library using the `data` function.
>> The last command asking
>> for the attributes filled up my console with a 100K length vector (actually
>> 2 of such vectors). The `str` function returns a more useful result.
>>
>> > data(agaricus.train, package='xgboost')
>> > train <- agaricus.train
>> > names( attributes(train$data) )
>> [1] "i"        "p"        "Dim"      "Dimnames" "x"        "factors"
>> "class"
>> > str(train$data)
>> Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
>>   ..@ i       : int [1:143286] 2 6 8 11 18 20 21 24 28 32 ...
>>   ..@ p       : int [1:127] 0 369 372 3306 5845 6489 6513 8380 8384 10991
>> ...
>>   ..@ Dim     : int [1:2] 6513 126
>>   ..@ Dimnames:List of 2
>>   .. ..$ : NULL
>>   .. ..$ : chr [1:126] "cap-shape=bell" "cap-shape=conical"
>> "cap-shape=convex" "cap-shape=flat" ...
>>   ..@ x       : num [1:143286] 1 1 1 1 1 1 1 1 1 1 ...
>>   ..@ factors : list()
>>
>> > Where is the data, is it in $p, $i, or $x?
>>
>> So the "data" (meaning the values of the sparse matrix) are in the @x
>> leaf. The values all appear to be the number 1. The @i leaf is the sequence
>> of row locations for the value entries while the @p items are somehow
>> connected with the columns (I think, since 127 and 126=number of columns
>> from the @Dim leaf are only off by 1).
>>
>> Doing this
>>
>> > colSums(as.matrix(train$data))
>>       cap-shape=bell   cap-shape=conical
>>                  369                   3
>>     cap-shape=convex      cap-shape=flat
>>                 2934                2539
>>    cap-shape=knobbed    cap-shape=sunken
>>                  644                  24
>>  cap-surface=fibrous cap-surface=grooves
>>                 1867                   4
>>    cap-surface=scaly  cap-surface=smooth
>>                 2607                2035
>>      cap-color=brown      cap-color=buff
>>                 1816
>> # now snipping the rest of that output.
>>
>> Now this makes me think that the @p vector gives you the cumulative sum
>> of number of items per column:
>>
>> > all( cumsum( colSums(as.matrix(train$data)) ) == train$data@p[-1] )
>> [1] TRUE
>>
>> >
>> > Thank you very much!
>> >
>> > [[alternative HTML version deleted]]
>>
>> Please read the Posting Guide.
>> Your code was not mangled in this
>> instance, but HTML code often arrives in an unreadable mess.
>>
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius
>> Alameda, CA, USA
>>
>> 'Any technology distinguishable from magic is insufficiently advanced.'
>> -Gehm's Corollary to Clarke's Third Law
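
The roles of the @i, @p and @x slots discussed in this thread can be checked on a toy matrix. What follows is only a minimal sketch, assuming the Matrix package is installed. The 4 x 3 matrix `dense` and the names `nnz_per_col`, `row_index`, `col_index` and `rebuilt` are made up for illustration and are not part of the xgboost data. One value is deliberately set to 5 so that column sums and per-column entry counts differ.

```r
library(Matrix)

# Hypothetical toy data (not from agaricus.train); matrix() fills column by column.
dense <- matrix(c(5, 0, 0, 1,    # column "a"
                  0, 0, 1, 0,    # column "b"
                  1, 1, 0, 1),   # column "c"
                nrow = 4, ncol = 3,
                dimnames = list(NULL, c("a", "b", "c")))
m <- Matrix(dense, sparse = TRUE)   # stored in compressed column form (dgCMatrix)

m@x   # the stored (nonzero) values, column by column
m@i   # 0-based row index of each stored value
m@p   # 0-based column pointers, length ncol(m) + 1

# Consecutive differences of @p count the stored entries in each column.
nnz_per_col <- diff(m@p)
all(nnz_per_col == colSums(as.matrix(m) != 0))

# Rebuild the dense matrix from the three slots alone.
col_index <- rep(seq_len(ncol(m)), times = nnz_per_col)  # 1-based column of each entry
row_index <- m@i + 1L                                    # convert to 1-based rows
rebuilt <- matrix(0, nrow = nrow(m), ncol = ncol(m), dimnames = dimnames(m))
rebuilt[cbind(row_index, col_index)] <- m@x
all(rebuilt == as.matrix(m))
```

In the agaricus data the cumulative column sums equal @p[-1] only because every stored value is 1. In general, diff(@p) counts the stored entries per column regardless of their values, as column "a" in the toy matrix shows (sum 6, but 2 entries).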