Thanks to all of you. It's great to interact with you; your comments are opportunities to learn more, not only about the specific question I posted but also about many related topics.
Most of the comments agree on the single long-format data frame, and Jeff's synthesis has been particularly interesting.

I run R on a server, which is well maintained and most likely faster than my PC.

The main variables I am dealing with are snow-pack height and daily snowfall amount; in support of these two measurements there are many other meteorological parameters (such as wind direction, wind speed, air temperature, theta-e air temperature, surface snow-pack temperature, incident radiation, reflected radiation).

The frequency of the new sensors is getting higher and higher (at present it is 10 minutes, and in case of emergency it can switch to 5 minutes!). I have spent a lot of effort to "normalize" the data to half-hourly frequency.

I use this data for several different purposes; the most important are:
- graphical comparisons for manual validation (these comparisons may involve different sensors of a single meteorological station, or the same sensor across several meteorological stations)
- studying some regressions that may turn out to be important
- climatological studies

A single data frame is easy to handle; this is what I've been doing so far. Yes, in a few years' time my data frame will exceed 20M rows, and that will always be a concern.

Thank you again for everything
Stefano

         (oo)
--oOO--( )--OOo--------------------------------------
Stefano Sofia MSc, PhD
Civil Protection Department - Marche Region - Italy
Meteo Section
Snow Section
Via Colle Ameno 5
60126 Torrette di Ancona, Ancona (AN)
Office: +39 071 806 7743
E-mail: stefano.so...@regione.marche.it
---Oo---------oO----------------------------------------

________________________________
From: Jeff Reichman <reichm...@sbcglobal.net>
Sent: Friday, 15 August 2025 01:00
To: Stefano Sofia; r-help@R-project.org
Subject: RE: [R] About size of data frames

Great question, and one that touches on both performance and usability in R. Here's a breakdown of the trade-offs and recommendations:

You're comparing three data structure strategies for handling ~16.5 million observations:

- Single long data frame (~16.5M rows × 3 columns)
  Pros: simple to manage, easy to filter/group, tidyverse-friendly
  Cons: may require more memory; slower row-wise operations
- Wide data frame (~235K rows × 141 columns)
  Pros: fast column-wise operations; good for matrix-style analysis
  Cons: harder to reshape/filter; less tidy
- List of 70 data frames (each ~235K rows × 3 columns)
  Pros: parallel processing possible; modular
  Cons: complex to manage; harder to aggregate or compare

Performance Considerations

- Memory Efficiency: A single long data frame is generally more memory-efficient than a list of data frames, especially if column types are consistent.
- Vectorization: R is optimized for vectorized operations. A long format works well with dplyr, data.table, and tidyverse tools.
- Parallelism: If you plan to process each sensor independently, a list of data frames could allow parallel computation using future, furrr, or parallel.
- Reshaping Costs: Wide formats are fast for matrix-style operations but can be cumbersome when filtering by time, sensor, or value.

I'd stick with the single long-format data frame:
- It aligns with tidy data principles.
- It's easier to filter, group, and summarize.
- It integrates seamlessly with packages like ggplot2, dplyr, and data.table.

If performance becomes an issue:
- Consider converting to a data.table object (setDT(df)), which is highly optimized for large datasets.
- Use indexing and keys for faster filtering.
- Use arrow::read_parquet() or fst::write_fst() for fast disk I/O if you need to save/load frequently.

If you're doing seasonal analysis, consider adding a season column.
That way, you can easily group by sensor, season, and day without needing to split the data.

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Stefano Sofia via R-help
Sent: Thursday, August 14, 2025 6:27 AM
To: r-help@R-project.org
Subject: [R] About size of data frames

Dear R-list users,
let me ask you a very general question about the performance of big data frames.

I deal with half-hourly meteorological data from about 70 sensors over 28 winter seasons. This means that for each sensor I have 48 values per day and 181 days per winter season (182 in a leap year):

48 * 181 * 28 = 234,576
234,576 * 70 = 16,420,320

From the computational point of view, is it better to deal with a single data frame of approximately 16.5 M rows and 3 columns (one for date, one for sensor code and one for value), with a single data frame of approximately 235,000 rows and 141 columns, or with 70 different data frames of approximately 235,000 rows and 3 columns? Or does it make no difference?

I personally would prefer the first choice, because it would be easier for me to deal with a single data frame and few columns.

Thank you for your usual help
Stefano

________________________________

IMPORTANT NOTICE: This e-mail message is intended to be received only by persons entitled to receive the confidential information it may contain. E-mail messages to clients of Regione Marche may contain information that is confidential and legally privileged. Please do not read, copy, forward, or store this message unless you are an intended recipient of it. If you have received this message in error, please forward it to the sender and delete it completely from your computer system. Pursuant to art. 2.4 of Annex 1 to DGR no. 74/2021, please note that, in case of necessity and urgency, the reply to this e-mail message may be viewed by persons other than the recipient.

[[alternative HTML version deleted]]
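
The long-format layout favoured in this thread, together with the season column Jeff suggests, might be sketched in base R as below. This is only an illustration on toy data, not Stefano's actual workflow. The column names (datetime, sensor, value), the sensor codes, and the helper rows_per_sensor() are invented for the example. The November-to-April season rule is an assumption that happens to give 181 days (182 in a leap year). The sketch also recomputes the size estimate: 48 * 181 * 28 evaluates to 243,264, while the 234,576 quoted in the thread equals 48 * 181 * 27, so the number of seasons is worth double-checking.

```r
# Rows per sensor: readings per day * days per season * number of seasons.
# rows_per_sensor() is a hypothetical helper, not part of any package.
rows_per_sensor <- function(per_day = 48, days = 181, seasons = 28) {
  per_day * days * seasons
}
n28 <- rows_per_sensor()               # 243264
n27 <- rows_per_sensor(seasons = 27)   # 234576, the figure quoted in the thread

# Toy long-format table: 3 made-up sensors, half-hourly readings, 4 days.
stamps  <- seq(as.POSIXct("2024-12-30 00:00", tz = "UTC"),
               as.POSIXct("2025-01-02 23:30", tz = "UTC"),
               by = "30 min")
sensors <- c("S01", "S02", "S03")
set.seed(1)
long <- data.frame(
  datetime = rep(stamps, times = length(sensors)),
  sensor   = rep(sensors, each = length(stamps)),
  value    = round(runif(length(stamps) * length(sensors), 0, 150), 1),
  stringsAsFactors = FALSE
)

# Season label, ASSUMING a winter season runs from November to April,
# so that January 2025 belongs to season "2024/2025".
yr <- as.integer(format(long$datetime, "%Y"))
mo <- as.integer(format(long$datetime, "%m"))
start_year  <- ifelse(mo >= 11, yr, yr - 1L)
long$season <- paste0(start_year, "/", start_year + 1L)
long$day    <- as.Date(long$datetime, tz = "UTC")

# Daily mean per sensor and season, computed directly on the long table.
daily <- aggregate(value ~ sensor + season + day, data = long, FUN = mean)

# A per-sensor list can still be produced on demand, without storing it.
by_sensor <- split(long, long$sensor)
```

Keeping the long table as the single stored object and deriving the per-sensor list with split() only when needed is one way to get the modularity of the third layout without maintaining 70 separate data frames.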
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.