Although there is some code there, we may want to step back and ask what the person wants to do.
Assuming you have thousands of files in CSV format and you can easily access them one by one, then there are two major methods that seem interesting if you want to evaluate them in some way. You can open all the files and combine them into one entity you can query. Alternatively, you can create a query that you then do repeatedly on as many (or all) of the files till you get a (combined) result. The query can use native R mechanisms, or a package that does SQL if you put the data into a database, or another package such as in the tidyverse using dplyr and so on. In real life, you have to ask if the many CSV files contain valid data without duplications or missing data and so on. If it ios not guaranteed to be exactly what you want, then consolidating the data and running multiple functions on it to make it what you want may be important. Removing (or maybe fixing) some rows may make sense. It may make sense to remove columns not needed for the query or rows that fail some condition that simplifies later queries. It may make sense to choose subsets, remove duplicates on the columns remaining and much more. As one of many examples, you might determine info about some column and then remove outliers above and below a threshold. If you insist on using SQL, as the final query, you could push all the remaining data into a database. If your data is too large when combined, there are packages that can help you work with them albeit this is an example where pushing it all into a database first makes sense, albeit it can be slow. And note, your solution can include work outside of R such as copying all the files (assuming n o header in each) into one file and then continuing from R. Good luck. -----Original Message----- From: R-help <[email protected]> On Behalf Of John Kane Sent: Sunday, May 24, 2026 5:41 PM To: Naresh Gurbuxani <[email protected]> Cc: [email protected] Subject: Re: [R] duckdb table from multiple csv files I am not really sure what you are doing here. Where are the files stored? Are they in one place? What size are they? On Sun, 24 May 2026 at 09:35, Naresh Gurbuxani <[email protected]> wrote: > > I have approximately ten thousand csv files with identical columns and > formats. I want to run some SQL queries on a virtual database, where all > of these files are treated as one table. While it is possible to run > SQL query, dbListTables() does not show this table. Is it possible to > list all tables including those created from arrow FileSystem? > > Is it possible to achieve this result without arrow package? > > # Create example data > library(data.table) > data("flights", package = "nycflights13") > fwrite(flights[(origin == "EWR")], "data/flights/ewr_flights.csv") > fwrite(flights[(origin == "JFK")], "data/flights/jfk_flights.csv") > fwrite(flights[(origin == "LGA")], "data/flights/lga_flights.csv") > > data("airports", package = "nycflights13") > fwrite(airports, "data/airports.csv") > > # Verify data saved as intended > dir("data") > [1] "airports.csv" "flights" > dir("data/flights/") > [1] "ewr_flights.csv" "jfk_flights.csv" "lga_flights.csv" > > # Create virtual database with two tables > library(arrow) > library(duckdb) > > # csv file successfully registed as a table > con <- dbConnect(duckdb()) > duckdb_read_csv(con, "airports", "data/airports.csv") > dbListTables(con) > [1] "airports" > > # flights_arrow does not show up as a table > flights_arrow <- open_csv_dataset("data/flights") > duckdb_register_arrow(con, "flights", flights_arrow) > dbListTables(con) > [1] "airports" > dbGetQuery(con, "SELECT table_name FROM information_schema.tables;") > table_name > 1 airports > > # SQL queries can be run on flights table > dbGetQuery(con, "SELECT * FROM flights LIMIT 2;") > year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time > arr_delay carrier > 1 2013 1 1 517 515 2 830 > 819 11 UA > 2 2013 1 1 554 558 -4 740 > 728 12 UA > flight tailnum origin dest air_time distance hour minute > time_hour > 1 1545 N14228 EWR IAH 227 1400 5 15 2013-01-01 > 10:00:00 > 2 1696 N39463 EWR ORD 150 719 5 58 2013-01-01 > 10:00:00 > > ______________________________________________ > [email protected] mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > https://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- John Kane Kingston ON Canada [[alternative HTML version deleted]] ______________________________________________ [email protected] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide https://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ______________________________________________ [email protected] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide https://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

