Re: [R] how to find number of unique rows for combination of r columns

Gerrit Eichner Fri, 08 Nov 2019 07:21:08 -0800

It seems that dt is not a (base R) data frame but a
data.table. I assume you will have to convert dt
into a data frame (perhaps with as.data.frame) to be
able to apply unique in the suggested way. However,
I am not familiar with data.tables. Perhaps somebody
else can provide a better-informed answer.
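
A minimal sketch of the conversion described above, assuming the data.table package is installed. The toy table and its values are invented for illustration; only the column names are taken from the original post. On a table with tens of millions of rows the extra data frame may cost noticeable memory, so treat this as an illustration rather than a recommendation.

```r
library(data.table)

# Toy stand-in for the poster's table (values are invented for illustration)
dt <- data.table(
  chr     = c("chr1", "chr1", "chr1", "chr2"),
  pos     = c(54490L, 54490L, 58814L, 54490L),
  gene_id = c("ENSG00000227232", "ENSG00000227232",
              "ENSG00000227232", "ENSG00000227232"),
  pval_nominal = c(0.61, 0.30, 0.44, 0.32)
)

# Convert to a plain data frame so that df[c(...)] selects columns
df <- as.data.frame(dt)

# Keep the three key columns, drop duplicate rows, and count what is left
udt <- unique(df[c("chr", "pos", "gene_id")])
n_unique <- nrow(udt)
print(n_unique)
```

Rows 1 and 2 of the toy table agree in all three key columns, so they are counted once.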


 Regards  --  Gerrit

---------------------------------------------------------------------
Dr. Gerrit Eichner                   Mathematical Institute, Room 212
gerrit.eich...@math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
http://www.uni-giessen.de/eichner
---------------------------------------------------------------------

On 08.11.2019 at 16:02, Ana Marija wrote:

I tried it but I got this error:

udt <- unique(dt[c("chr", "pos", "gene_id")])

Error in `[.data.table`(dt, c("chr", "pos", "gene_id")) :
   When i is a data.table (or character vector), the columns to join by
must be specified using 'on=' argument (see ?data.table), by keying x
(i.e. sorted, and, marked as sorted, see ?setkey), or by sharing
column names between x and i (i.e., a natural join). Keyed joins might
have further speed benefits on very large data due to x being sorted
in RAM.
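
As the message itself says, a character vector passed as the only argument to `[` on a data.table is taken as i (a join or row lookup), not as a column selection. A possible way to get the count without leaving data.table is sketched below, assuming the data.table package is installed; the toy table and its values are invented, and only the column names come from the post.

```r
library(data.table)

# Toy stand-in for the poster's table (values are invented for illustration)
dt <- data.table(
  chr     = c("chr1", "chr1", "chr1", "chr2"),
  pos     = c(54490L, 54490L, 58814L, 54490L),
  gene_id = c("ENSG00000227232", "ENSG00000227232",
              "ENSG00000227232", "ENSG00000227232"),
  pval_nominal = c(0.61, 0.30, 0.44, 0.32)
)

cols <- c("chr", "pos", "gene_id")

# uniqueN() counts distinct rows, considering only the columns named in `by`
n_unique <- uniqueN(dt, by = cols)

# Same count via de-duplicating on those columns and counting the rows
n_unique2 <- nrow(unique(dt, by = cols))

print(n_unique)
```

Both calls look only at the columns in `cols`, so the differing p-values in the other column do not affect the count.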

On Fri, Nov 8, 2019 at 8:58 AM Gerrit Eichner
<gerrit.eich...@math.uni-giessen.de> wrote:


Hi, Ana,

doesn't

udt <- unique(dt[c("chr", "pos", "gene_id")])
nrow(udt)

get close to what you want?

   Hth  --  Gerrit


On 08.11.2019 at 15:38, Ana Marija wrote:

Hello,

I have a data frame like this:

head(dt,20)

       chr    pos         gene_id pval_nominal  pval_ret       wl      wr
   1: chr1  54490 ENSG00000227232    0.6084950 0.7837780 31.62278 21.2838
   2: chr1  58814 ENSG00000227232    0.2952110 0.8975820 31.62278 21.2838
   3: chr1  60351 ENSG00000227232    0.4397880 0.8679590 31.62278 21.2838
   4: chr1  61920 ENSG00000227232    0.3195280 0.6018090 31.62278 21.2838
   5: chr1  63671 ENSG00000227232    0.2377390 0.9880390 31.62278 21.2838
   6: chr1  64931 ENSG00000227232    0.2766790 0.9070370 31.62278 21.2838
   7: chr1  81587 ENSG00000227232    0.6057930 0.6167630 31.62278 21.2838
   8: chr1 115746 ENSG00000227232    0.4078770 0.7799110 31.62278 21.2838
   9: chr1 135203 ENSG00000227232    0.4078770 0.9299130 31.62278 21.2838
  10: chr1 138593 ENSG00000227232    0.8464560 0.5696060 31.62278 21.2838

it is very big,

dim(dt)

[1] 73719122        8

To count the number of unique rows for the 3 columns chr, pos and gene_id,
I could just join those 3 columns and then count. But how would I find
the number of unique rows for these 3 columns without joining them?

Thanks
Ana

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

