JavOrraca opened a new issue, #2508: URL: https://github.com/apache/arrow-adbc/issues/2508
### What would you like help with? After connecting to Snowflake with R's v0.16.0 [`adbcsnowflake`](https://arrow.apache.org/adbc/current/r/adbcsnowflake/index.html) and [`adbcdrivermanager`](https://arrow.apache.org/adbc/current/r/adbcdrivermanager/)) I benchmarked query + download times using ADBC vs ODBC. This could be a case of user error or bad setup. My benchmarking (see screenshot below) tested ADBC vs ODBC on the same observations and columns, iterating over different combinations of row counts and columns.  For the ODBC method, I established a connection to Snowflake with `DBI` and then used `dplyr` + `dbplyr` to interact with Snowflake, finally collecting the results as a tibble via `dplyr::collect()`. For the ADBC method, I used the connection string below and collected my results to a tibble using `tibble::as_tibble()` as shown on the [`adbcsnowflake`](https://arrow.apache.org/adbc/current/r/adbcsnowflake/index.html) website. ```{r} # Not shown here is the ODBC connection string using {DBI} but # it relies on the same inputs as the ADBC inputs, below, and the # ODBC connection was assigned to the object "conn_odbc" library(adbcdrivermanager) db <- adbc_database_init( adbcsnowflake::adbcsnowflake(), username = Sys.getenv("ldap_username"), adbc.snowflake.sql.account = Sys.getenv("sf_account"), adbc.snowflake.sql.uri.protocol = "https", adbc.snowflake.sql.uri.host = Sys.getenv("sf_host"), adbc.snowflake.sql.uri.port = "443", adbc.snowflake.sql.auth_type = "auth_oauth", adbc.snowflake.sql.client_option.auth_token = Sys.getenv("sf_auth_token"), adbc.snowflake.sql.role = Sys.getenv("sf_role"), adbc.snowflake.sql.warehouse = Sys.getenv("sf_warehouse"), adbc.snowflake.sql.database = Sys.getenv("sf_database") ) conn_adbc <- adbc_connection_init(db) # Benchmarking psuedo-code example library(bench) library(dplyr) library(dbplyr) benchmarking_odbc_vs_adbc <- bench::mark( min iterations = 5, ODBC_1000_5_col = conn_odbc |> dplyr::tbl(I("CATALOG.SCHEMA.TABLE_NAME")) |> dplyr::select(name, age, tenure, year_yyyy, month_mm) |> dplyr::filter( year_yyyy == 2024, month_mm == 12 ) |> head(1000) |> dplyr::collect(), ADBC_1000_5_col = conn_adbc |> adbcdrivermanager::read_adbc( "SELECT TOP 1000 col_1, col_2, col_3, col_4, col_5 FROM CATALOG.SCHEMA.TABLE_NAME WHERE year_yyyy = 2024 AND month_mm = 12" ) |> tibble::as_tibble(), check = FALSE ) ``` As indicated on my screenshot above, my results show that ADBC performs better than ODBC on the smaller data sets but on the largest of data pulls (1 million rows and 75 columns), ODBC performs better. I ran 5 iterations for each combination of rows and columns and forn the largest data set, ADBC takes ~57 minutes to pull the data while ODBC takes ~50 seconds. There could be a number of reasons for this (e.g., user error, or driver config setup, or something even Snowflake side) but if anything on the surface jumps out at what I should be doing differently, any pointers would be appreciated. Thanks so much for your help and awesome work with Arrow and these drivers. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org