Re: [R] duckdb table from multiple csv files

Jan van der Laan Mon, 25 May 2026 04:46:06 -0700




On 5/25/26 04:46, Naresh Gurbuxani wrote:


" If all the data were in a few files, then in memory duckdb would work."

I only need a subset of data at any time.  Duckdb allows a virtual table for 
each file.  This is not practical with thousands of files.  With a few large 
files, this can work.  Here the goal is to establish a connection, not to load 
all data at once.

If the files have the same columns, you can also open all files as one virtual table using duckdb. The code below creates a view called 'flights' with the data from all csv files in data/ and its subdirectories.


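# In-memory duckdb database; the csv files themselves are not loaded here
con <- DBI::dbConnect(duckdb::duckdb())

# The view is only a definition; the files are read when it is queried
sql <- paste0("CREATE OR REPLACE VIEW flights AS ",
  "SELECT * FROM read_csv('data/**/*.csv');")
DBI::dbExecute(con, sql)

DBI::dbListTables(con)

# Pulls everything into R; add a WHERE clause or LIMIT to fetch a subset
DBI::dbGetQuery(con, "SELECT * FROM flights;")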
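
Since only a subset is needed at any time, the filter can go into the query so that only matching rows come back to R. The following is a minimal sketch, not part of the code above: the two sample files, the subset_demo directory and the carrier and dep_delay columns are made up so that the example runs on its own.

```r
library(DBI)

# Hypothetical sample files so the sketch is self-contained
root <- normalizePath(tempdir(), winslash = "/")
dir <- file.path(root, "subset_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(carrier = c("AA", "UA"), dep_delay = c(5, 20)),
          file.path(dir, "jan.csv"), row.names = FALSE)
write.csv(data.frame(carrier = c("AA", "DL"), dep_delay = c(30, 1)),
          file.path(dir, "feb.csv"), row.names = FALSE)

con <- dbConnect(duckdb::duckdb())

# filename = true adds a column holding the source file of each row
dbExecute(con, sprintf(
  "CREATE OR REPLACE VIEW flights AS SELECT * FROM read_csv('%s', filename = true);",
  file.path(dir, "*.csv")))

# The WHERE clause is evaluated by duckdb; only matching rows reach R
delayed <- dbGetQuery(con,
  "SELECT carrier, dep_delay, filename FROM flights
   WHERE dep_delay > ? ORDER BY dep_delay;",
  params = list(10))

dbDisconnect(con, shutdown = TRUE)
```

duckdb still has to scan every csv file to evaluate the filter, so this limits what is transferred to R, not what is read from disk.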

duckdb is fast and will do things in parallel, but for every query it will have to go through all files. Going through 200GB of data will take time. So, if you have to query the data repeatedly it is probably going to speed up your code significantly if you resave your data in another format.
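
One possible way to do the resaving, as a sketch only: Parquet is my assumption for the "other format" (a persistent duckdb database file would be another option), and the sample files and paths below are made up so that the example runs on its own.

```r
library(DBI)

# Hypothetical sample data standing in for the real csv files
root <- normalizePath(tempdir(), winslash = "/")
dir <- file.path(root, "resave_demo", "2026")
dir.create(dir, recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(id = 1:3, delay = c(5, 0, 12)),
          file.path(dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 4:6, delay = c(7, 3, 0)),
          file.path(dir, "b.csv"), row.names = FALSE)

csv_glob <- file.path(root, "resave_demo", "**", "*.csv")
parquet_file <- file.path(root, "flights.parquet")

con <- dbConnect(duckdb::duckdb())

# One pass over all csv files, written out as a single Parquet file
dbExecute(con, sprintf(
  "COPY (SELECT * FROM read_csv('%s')) TO '%s' (FORMAT parquet);",
  csv_glob, parquet_file))

# Point the view at the Parquet file; later queries no longer parse csv
dbExecute(con, sprintf(
  "CREATE OR REPLACE VIEW flights AS SELECT * FROM read_parquet('%s');",
  parquet_file))

n <- dbGetQuery(con, "SELECT count(*) AS n FROM flights;")$n

dbDisconnect(con, shutdown = TRUE)
```

The conversion itself still reads all csv files once, so it only pays off when the data are queried more than once.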


HTH,

Jan

______________________________________________
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
