Dear r-help readers, why is it so much slower to query an sqlite database using RSQlite «from the outside» using param like
statement <-
"SELECT * FROM gene2refseq
LEFT JOIN gene_info ON
gene_info.GeneID = gene2refseq.GeneID
WHERE gene2refseq.`RNA_nucleotide_accession.version`
LIKE ?"
db <- "gene_info.sqlite"
conn <- DBI::dbConnect(RSQLite::SQLite(), db)
x1 <- DBI::dbGetQuery(conn=conn,
statement=statement,
param=list(Håkan20210914$RNANucleotideAccession))
compared to querying «from the inside»
of sqlite, by writing your search terms
as a table first, and then calling it
statement <-
"SELECT * FROM H
LEFT JOIN gene2refseq R ON
R.`RNA_nucleotide_accession.version`
LIKE '%' || H.RNANucleotideAccession || '%'
LEFT JOIN gene_info I ON I.GeneID = R.GeneID"
DBI::dbWriteTable(conn, "H", Håkan20210914)
x2 <- DBI::dbGetQuery(conn=conn, statement=statement)
DBI::dbDisconnect(conn)
On my system (E5-2603 v4), the first
query took more than an hour, while the
second took only a few minutes ...
Do you guys know of any faster (but also
nice) way to dig around in very large
tsv files like
https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz
and
https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
?
Best,
Rasmus
signature.asc
Description: PGP signature
______________________________________________ [email protected] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

