(arrow-site) branch asf-site updated: Update R changelog for 23.0.0 (#752)

thisisnic Wed, 28 Jan 2026 20:06:56 -0800

This is an automated email from the ASF dual-hosted git repository.

thisisnic pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-site.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 42deea61fe6 Update R changelog for 23.0.0 (#752)
42deea61fe6 is described below

commit 42deea61fe68aadb2872dc722d2b37a4fdc4f8be
Author: Nic Crane <[email protected]>
AuthorDate: Wed Jan 28 23:05:43 2026 -0500

    Update R changelog for 23.0.0 (#752)
    
    This needed doing manually as we didn't update the relevant page until
    after the release vote.
---
 docs/r/news/index.html | 85 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 51 insertions(+), 34 deletions(-)

diff --git a/docs/r/news/index.html b/docs/r/news/index.html
index b178f9adb04..c93017589fd 100644
--- a/docs/r/news/index.html
+++ b/docs/r/news/index.html
@@ -73,7 +73,24 @@
     </div>
 
     <div class="section level2">
-<h2 class="pkg-version" data-toc-text="23.0.0" id="arrow-2300">arrow 23.0.0<a 
class="anchor" aria-label="anchor" href="#arrow-2300"></a></h2>
+<h2 class="pkg-version" data-toc-text="23.0.0" id="arrow-2300">arrow 23.0.0<a 
class="anchor" aria-label="anchor" href="#arrow-2300"></a></h2><p 
class="text-muted">CRAN release: 2026-01-23</p>
+<div class="section level3">
+<h3 id="new-features-23-0-0">New features<a class="anchor" aria-label="anchor" 
href="#new-features-23-0-0"></a></h3>
+<ul><li>
+<code><a href="https://rdrr.io/r/base/nchar.html" 
class="external-link">nchar()</a></code> now supports <code>keepNA = 
FALSE</code> (<a href="https://github.com/HyukjinKwon" 
class="external-link">@HyukjinKwon</a>, <a 
href="https://github.com/apache/arrow/issues/48665" 
class="external-link">#48665</a>).</li>
+<li>
+<code><a href="https://stringr.tidyverse.org/reference/str_like.html" 
class="external-link">stringr::str_ilike()</a></code> binding for 
case-insensitive pattern matching (<a 
href="https://github.com/apache/arrow/issues/48262" 
class="external-link">#48262</a>).</li>
+</ul></div>
+<div class="section level3">
+<h3 id="minor-improvements-and-fixes-23-0-0">Minor improvements and fixes<a 
class="anchor" aria-label="anchor" 
href="#minor-improvements-and-fixes-23-0-0"></a></h3>
+<ul><li>Fix slow performance reading files with a large number of columns (<a 
href="https://github.com/apache/arrow/issues/48104" 
class="external-link">#48104</a>).</li>
+<li>Fix segfault when calling <code><a 
href="../reference/concat_tables.html">concat_tables()</a></code> on a 
<code>RecordBatch</code> (<a 
href="https://github.com/apache/arrow/issues/47885" 
class="external-link">#47885</a>).</li>
+<li>Writing partitioned datasets on S3 no longer requires 
<code>ListBucket</code> permissions (<a href="https://github.com/HaochengLIU" 
class="external-link">@HaochengLIU</a>, <a 
href="https://github.com/apache/arrow/issues/47599" 
class="external-link">#47599</a>).</li>
+</ul></div>
+<div class="section level3">
+<h3 id="installation-23-0-0">Installation<a class="anchor" aria-label="anchor" 
href="#installation-23-0-0"></a></h3>
+<ul><li>As of version 23.0.0, <code>arrow</code> requires C++20 to build from 
source. This means that you may need a newer compiler than the default on some 
older systems. See <code><a href="../articles/install.html">vignette("install", 
package = "arrow")</a></code> for guidance.</li>
+</ul></div>
 </div>
     <div class="section level2">
 <h2 class="pkg-version" data-toc-text="22.0.0.1" id="arrow-22001">arrow 
22.0.0.1<a class="anchor" aria-label="anchor" href="#arrow-22001"></a></h2><p 
class="text-muted">CRAN release: 2025-12-23</p>
@@ -117,7 +134,7 @@
 <h3 id="minor-improvements-and-fixes-21-0-0">Minor improvements and fixes<a 
class="anchor" aria-label="anchor" 
href="#minor-improvements-and-fixes-21-0-0"></a></h3>
 <ul><li>Expose an option 
<code>check_directory_existence_before_creation</code> in 
<code>S3FileSystem</code> to reduce I/O calls on cloud storage (<a 
href="https://github.com/HaochengLIU"; class="external-link">@HaochengLIU</a>, 
<a href="https://github.com/apache/arrow/issues/41998"; 
class="external-link">#41998</a>).</li>
 <li>
-<code><a href="https://dplyr.tidyverse.org/reference/case_when.html"; 
class="external-link">case_when()</a></code> now correctly detects objects that 
are not in the global environment (<a href="https://github.com/etiennebacher"; 
class="external-link">@etiennebacher</a>, <a 
href="https://github.com/apache/arrow/issues/46667"; 
class="external-link">#46667</a>).</li>
+<code>case_when()</code> now correctly detects objects that are not in the 
global environment (<a href="https://github.com/etiennebacher" 
class="external-link">@etiennebacher</a>, <a 
href="https://github.com/apache/arrow/issues/46667" 
class="external-link">#46667</a>).</li>
 <li>Negative fractional dates now correctly converted to integers by flooring 
values (<a href="https://github.com/apache/arrow/issues/46873"; 
class="external-link">#46873</a>).</li>
 <li>Backwards compatibility checks for legacy Arrow C++ versions have been 
removed from the R package (<a 
href="https://github.com/apache/arrow/issues/46491" 
class="external-link">#46491</a>). This shouldn’t affect most users of this 
package and would only impact you if you were building the R package from 
source with different R package and Arrow C++ versions.</li>
 <li>Require CMake 3.25 or greater in bundled build script for full-source 
builds (<a href="https://github.com/apache/arrow/issues/46834"; 
class="external-link">#46834</a>). This shouldn’t affect most users.</li>
@@ -167,9 +184,9 @@
 <h3 id="new-features-17-0-0">New features<a class="anchor" aria-label="anchor" 
href="#new-features-17-0-0"></a></h3>
 <ul><li>R functions that users write that use functions that Arrow supports in 
dataset queries now can be used in queries too. Previously, only functions that 
used arithmetic operators worked. For example, <code>time_hours &lt;- 
function(mins) mins / 60</code> worked, but <code>time_hours_rounded &lt;- 
function(mins) round(mins / 60)</code> did not; now both work. These are 
automatic translations rather than true user-defined functions (UDFs); for 
UDFs, see <code><a href="../reference/re [...]
 <li>
-<code><a href="https://dplyr.tidyverse.org/reference/mutate.html"; 
class="external-link">mutate()</a></code> expressions can now include 
aggregations, such as <code>x - mean(x)</code>. (<a 
href="https://github.com/apache/arrow/issues/41350"; 
class="external-link">#41350</a>)</li>
+<code>mutate()</code> expressions can now include aggregations, such as 
<code>x - mean(x)</code>. (<a 
href="https://github.com/apache/arrow/issues/41350" 
class="external-link">#41350</a>)</li>
 <li>
-<code><a href="https://dplyr.tidyverse.org/reference/summarise.html"; 
class="external-link">summarize()</a></code> supports more complex expressions, 
and correctly handles cases where column names are reused in expressions. (<a 
href="https://github.com/apache/arrow/issues/41223"; 
class="external-link">#41223</a>)</li>
+<code>summarize()</code> supports more complex expressions, and correctly 
handles cases where column names are reused in expressions. (<a 
href="https://github.com/apache/arrow/issues/41223" 
class="external-link">#41223</a>)</li>
 <li>The <code>na_matches</code> argument to the <code>dplyr::*_join()</code> 
functions is now supported. This argument controls whether <code>NA</code> 
values are considered equal when joining. (<a 
href="https://github.com/apache/arrow/issues/41358"; 
class="external-link">#41358</a>)</li>
 <li>R metadata, stored in the Arrow schema to support round-tripping data 
between R and Arrow/Parquet, is now serialized and deserialized more strictly. 
This makes it safer to load data from files from unknown sources into R 
data.frames. (<a href="https://github.com/apache/arrow/issues/41969"; 
class="external-link">#41969</a>)</li>
 </ul></div>
@@ -322,7 +339,7 @@
 <li>Ensure that the RStringViewer helper class does not own any Array 
references (<a href="https://github.com/apache/arrow/issues/35812"; 
class="external-link">#35812</a>)</li>
 <li>
 <code><a href="https://rdrr.io/r/base/strptime.html"; 
class="external-link">strptime()</a></code> in arrow will return a 
timezone-aware timestamp if <code>%z</code> is part of the format string (<a 
href="https://github.com/apache/arrow/issues/35671"; 
class="external-link">#35671</a>)</li>
-<li>Column ordering when combining <code><a 
href="https://dplyr.tidyverse.org/reference/group_by.html"; 
class="external-link">group_by()</a></code> and <code><a 
href="https://dplyr.tidyverse.org/reference/across.html"; 
class="external-link">across()</a></code> now matches dplyr (<a 
href="https://github.com/eitsupi"; class="external-link">@eitsupi</a>, <a 
href="https://github.com/apache/arrow/issues/35473"; 
class="external-link">#35473</a>)</li>
+<li>Column ordering when combining <code>group_by()</code> and 
<code>across()</code> now matches dplyr (<a href="https://github.com/eitsupi" 
class="external-link">@eitsupi</a>, <a 
href="https://github.com/apache/arrow/issues/35473" 
class="external-link">#35473</a>)</li>
 </ul></div>
 <div class="section level3">
 <h3 id="installation-13-0-0">Installation<a class="anchor" aria-label="anchor" 
href="#installation-13-0-0"></a></h3>
@@ -352,7 +369,7 @@
 <ul><li>The <code><a 
href="../reference/read_parquet.html">read_parquet()</a></code> and <code><a 
href="../reference/read_feather.html">read_feather()</a></code> functions can 
now accept URL arguments (<a 
href="https://github.com/apache/arrow/issues/33287"; 
class="external-link">#33287</a>, <a 
href="https://github.com/apache/arrow/issues/34708"; 
class="external-link">#34708</a>).</li>
 <li>The <code>json_credentials</code> argument in 
<code>GcsFileSystem$create()</code> now accepts a file path containing the 
appropriate authentication token (<a href="https://github.com/amoeba"; 
class="external-link">@amoeba</a>, <a 
href="https://github.com/apache/arrow/issues/34421"; 
class="external-link">#34421</a>, <a 
href="https://github.com/apache/arrow/issues/34524"; 
class="external-link">#34524</a>).</li>
 <li>The <code>$options</code> member of <code>GcsFileSystem</code> objects can 
now be inspected (<a href="https://github.com/amoeba"; 
class="external-link">@amoeba</a>, <a 
href="https://github.com/apache/arrow/issues/34422"; 
class="external-link">#34422</a>, <a 
href="https://github.com/apache/arrow/issues/34477"; 
class="external-link">#34477</a>).</li>
-<li>The <code><a 
href="../reference/read_delim_arrow.html">read_csv_arrow()</a></code> and 
<code><a href="../reference/read_json_arrow.html">read_json_arrow()</a></code> 
functions now accept literal text input wrapped in <code><a 
href="https://rdrr.io/r/base/AsIs.html" class="external-link">I()</a></code> to 
improve compatibility with <code>readr::read_csv()</code> (<a 
href="https://github.com/eitsupi"; class="external-link">@eitsupi</a>, <a 
href="https://github.com/apache/arrow/issues/18 [...]
+<li>The <code><a 
href="../reference/read_delim_arrow.html">read_csv_arrow()</a></code> and 
<code><a href="../reference/read_json_arrow.html">read_json_arrow()</a></code> 
functions now accept literal text input wrapped in <code><a 
href="https://rdrr.io/r/base/AsIs.html" class="external-link">I()</a></code> to 
improve compatibility with <code><a 
href="https://readr.tidyverse.org/reference/read_delim.html" 
class="external-link">readr::read_csv()</a></code> (<a 
href="https://github.com/eitsu [...]
 <li>Nested fields can now be accessed using <code>$</code> and <code>[[</code> 
in dplyr expressions (<a href="https://github.com/apache/arrow/issues/18818"; 
class="external-link">#18818</a>, <a 
href="https://github.com/apache/arrow/issues/19706"; 
class="external-link">#19706</a>).</li>
 </ul></div>
 <div class="section level3">
@@ -409,7 +426,7 @@
 </ul></div>
 <div class="section level4">
 <h4 id="dplyr-compatibility-11-0-0-2">dplyr compatibility<a class="anchor" 
aria-label="anchor" href="#dplyr-compatibility-11-0-0-2"></a></h4>
-<ul><li>New dplyr (1.1.0) function <code><a 
href="https://dplyr.tidyverse.org/reference/join_by.html"; 
class="external-link">join_by()</a></code> has been implemented for dplyr joins 
on Arrow objects (equality conditions only). (<a 
href="https://github.com/apache/arrow/issues/33664"; 
class="external-link">#33664</a>)</li>
+<ul><li>New dplyr (1.1.0) function <code>join_by()</code> has been implemented 
for dplyr joins on Arrow objects (equality conditions only). (<a 
href="https://github.com/apache/arrow/issues/33664" 
class="external-link">#33664</a>)</li>
 <li>Output is accurate when multiple <code><a 
href="https://dplyr.tidyverse.org/reference/group_by.html"; 
class="external-link">dplyr::group_by()</a></code>/<code><a 
href="https://dplyr.tidyverse.org/reference/summarise.html"; 
class="external-link">dplyr::summarise()</a></code> calls are used. (<a 
href="https://github.com/apache/arrow/issues/14905"; 
class="external-link">#14905</a>)</li>
 <li>
 <code><a href="https://dplyr.tidyverse.org/reference/summarise.html"; 
class="external-link">dplyr::summarize()</a></code> works with division when 
divisor is a variable. (<a href="https://github.com/apache/arrow/issues/14933"; 
class="external-link">#14933</a>)</li>
@@ -465,7 +482,7 @@
 <h3 id="arrow-dplyr-queries-10-0-0">Arrow dplyr queries<a class="anchor" 
aria-label="anchor" href="#arrow-dplyr-queries-10-0-0"></a></h3>
 <p>Several new functions can be used in queries:</p>
 <ul><li>
-<code><a href="https://dplyr.tidyverse.org/reference/across.html"; 
class="external-link">dplyr::across()</a></code> can be used to apply the same 
computation across multiple columns, and the <code><a 
href="https://tidyselect.r-lib.org/reference/where.html"; 
class="external-link">where()</a></code> selection helper is supported in 
<code><a href="https://dplyr.tidyverse.org/reference/across.html"; 
class="external-link">across()</a></code>;</li>
+<code><a href="https://dplyr.tidyverse.org/reference/across.html" 
class="external-link">dplyr::across()</a></code> can be used to apply the same 
computation across multiple columns, and the <code>where()</code> selection 
helper is supported in <code>across()</code>;</li>
 <li>
 <code><a href="../reference/add_filename.html">add_filename()</a></code> can 
be used to get the filename a row came from (only available when querying 
<code><a href="../reference/Dataset.html">?Dataset</a></code>);</li>
 <li>Added five functions in the <code>slice_*</code> family: <code><a 
href="https://dplyr.tidyverse.org/reference/slice.html"; 
class="external-link">dplyr::slice_min()</a></code>, <code><a 
href="https://dplyr.tidyverse.org/reference/slice.html"; 
class="external-link">dplyr::slice_max()</a></code>, <code><a 
href="https://dplyr.tidyverse.org/reference/slice.html"; 
class="external-link">dplyr::slice_head()</a></code>, <code><a 
href="https://dplyr.tidyverse.org/reference/slice.html"; class="exte [...]
@@ -680,9 +697,9 @@
 <div class="section level3">
 <h3 id="enhancements-to-dplyr-and-datasets-7-0-0">Enhancements to dplyr and 
datasets<a class="anchor" aria-label="anchor" 
href="#enhancements-to-dplyr-and-datasets-7-0-0"></a></h3>
 <ul><li>Additional <a href="https://lubridate.tidyverse.org"; 
class="external-link">lubridate</a> features: <code>week()</code>, more of the 
<code>is.*()</code> functions, and the label argument to <code>month()</code> 
have been implemented.</li>
-<li>More complex expressions inside <code><a 
href="https://dplyr.tidyverse.org/reference/summarise.html"; 
class="external-link">summarize()</a></code>, such as <code>ifelse(n() &gt; 1, 
mean(y), mean(z))</code>, are supported.</li>
+<li>More complex expressions inside <code>summarize()</code>, such as 
<code>ifelse(n() &gt; 1, mean(y), mean(z))</code>, are supported.</li>
 <li>When adding columns in a dplyr pipeline, one can now use 
<code>tibble</code> and <code>data.frame</code> to create columns of tibbles or 
data.frames respectively (e.g. <code>... %&gt;% mutate(df_col = tibble(a, b)) 
%&gt;% ...</code>).</li>
-<li>Dictionary columns (R <code>factor</code> type) are supported inside of 
<code><a href="https://dplyr.tidyverse.org/reference/coalesce.html"; 
class="external-link">coalesce()</a></code>.</li>
+<li>Dictionary columns (R <code>factor</code> type) are supported inside of 
<code>coalesce()</code>.</li>
 <li>
 <code><a href="../reference/open_dataset.html">open_dataset()</a></code> 
accepts the <code>partitioning</code> argument when reading Hive-style 
partitioned files, even though it is not required.</li>
 <li>The experimental <code><a 
href="../reference/map_batches.html">map_batches()</a></code> function for 
custom operations on dataset has been restored.</li>
@@ -697,7 +714,7 @@
 <code><a href="https://rdrr.io/r/utils/head.html"; 
class="external-link">head()</a></code> no longer hangs on large CSV 
datasets.</li>
 <li>There is an improved error message when there is a conflict between a 
header in the file and schema/column names provided as arguments.</li>
 <li>
-<code><a href="../reference/write_csv_arrow.html">write_csv_arrow()</a></code> 
now follows the signature of <code>readr::write_csv()</code>.</li>
+<code><a href="../reference/write_csv_arrow.html">write_csv_arrow()</a></code> 
now follows the signature of <code><a 
href="https://readr.tidyverse.org/reference/write_delim.html" 
class="external-link">readr::write_csv()</a></code>.</li>
 </ul></div>
 <div class="section level3">
 <h3 id="other-improvements-and-fixes-7-0-0">Other improvements and fixes<a 
class="anchor" aria-label="anchor" 
href="#other-improvements-and-fixes-7-0-0"></a></h3>
@@ -743,27 +760,27 @@
 <h2 class="pkg-version" data-toc-text="6.0.0" id="arrow-600">arrow 6.0.0<a 
class="anchor" aria-label="anchor" href="#arrow-600"></a></h2>
 <p>There are now two ways to query Arrow data:</p>
 <div class="section level3">
-<h3 id="1-expanded-arrow-native-queries-aggregation-and-joins-6-0-0">1. 
Expanded Arrow-native queries: aggregation and joins<a class="anchor" 
aria-label="anchor" 
href="#1-expanded-arrow-native-queries-aggregation-and-joins-6-0-0"></a></h3>
-<p><code><a href="https://dplyr.tidyverse.org/reference/summarise.html"; 
class="external-link">dplyr::summarize()</a></code>, both grouped and 
ungrouped, is now implemented for Arrow Datasets, Tables, and RecordBatches. 
Because data is scanned in chunks, you can aggregate over larger-than-memory 
datasets backed by many files. Supported aggregation functions include <code><a 
href="https://dplyr.tidyverse.org/reference/context.html"; 
class="external-link">n()</a></code>, <code><a href="https [...]
-<p>Along with <code><a 
href="https://dplyr.tidyverse.org/reference/summarise.html"; 
class="external-link">summarize()</a></code>, you can also call <code><a 
href="https://dplyr.tidyverse.org/reference/count.html"; 
class="external-link">count()</a></code>, <code><a 
href="https://dplyr.tidyverse.org/reference/count.html"; 
class="external-link">tally()</a></code>, and <code><a 
href="https://dplyr.tidyverse.org/reference/distinct.html"; 
class="external-link">distinct()</a></code>, which effectiv [...]
-<p>This enhancement does change the behavior of <code><a 
href="https://dplyr.tidyverse.org/reference/summarise.html"; 
class="external-link">summarize()</a></code> and <code><a 
href="https://dplyr.tidyverse.org/reference/compute.html"; 
class="external-link">collect()</a></code> in some cases: see “Breaking 
changes” below for details.</p>
-<p>In addition to <code><a 
href="https://dplyr.tidyverse.org/reference/summarise.html"; 
class="external-link">summarize()</a></code>, mutating and filtering equality 
joins (<code><a href="https://dplyr.tidyverse.org/reference/mutate-joins.html"; 
class="external-link">inner_join()</a></code>, <code><a 
href="https://dplyr.tidyverse.org/reference/mutate-joins.html"; 
class="external-link">left_join()</a></code>, <code><a 
href="https://dplyr.tidyverse.org/reference/mutate-joins.html"; class="exte [...]
+<h3 id="id_1-expanded-arrow-native-queries-aggregation-and-joins-6-0-0">1. 
Expanded Arrow-native queries: aggregation and joins<a class="anchor" 
aria-label="anchor" 
href="#id_1-expanded-arrow-native-queries-aggregation-and-joins-6-0-0"></a></h3>
+<p><code><a href="https://dplyr.tidyverse.org/reference/summarise.html"; 
class="external-link">dplyr::summarize()</a></code>, both grouped and 
ungrouped, is now implemented for Arrow Datasets, Tables, and RecordBatches. 
Because data is scanned in chunks, you can aggregate over larger-than-memory 
datasets backed by many files. Supported aggregation functions include 
<code>n()</code>, <code>n_distinct()</code>, <code>min()</code>, <code><a 
href="https://rdrr.io/r/base/Extremes.html"; class=" [...]
+<p>Along with <code>summarize()</code>, you can also call 
<code>count()</code>, <code>tally()</code>, and <code>distinct()</code>, which 
effectively wrap <code>summarize()</code>.</p>
+<p>This enhancement does change the behavior of <code>summarize()</code> and 
<code>collect()</code> in some cases: see “Breaking changes” below for 
details.</p>
+<p>In addition to <code>summarize()</code>, mutating and filtering equality 
joins (<code>inner_join()</code>, <code>left_join()</code>, 
<code>right_join()</code>, <code>full_join()</code>, <code>semi_join()</code>, 
and <code>anti_join()</code>) are also supported natively in Arrow.</p>
 <p>Grouped aggregation and (especially) joins should be considered somewhat 
experimental in this release. We expect them to work, but they may not be well 
optimized for all workloads. To help us focus our efforts on improving them in 
the next release, please let us know if you encounter unexpected behavior or 
poor performance.</p>
 <p>New non-aggregating compute functions include string functions like 
<code>str_to_title()</code> and <code><a 
href="https://rdrr.io/r/base/strptime.html"; 
class="external-link">strftime()</a></code> as well as compute functions for 
extracting date parts (e.g. <code>year()</code>, <code>month()</code>) from 
dates. This is not a complete list of additional compute functions; for an 
exhaustive list of available compute functions see <code><a 
href="../reference/list_compute_functions.html"> [...]
 <p>We’ve also worked to fill in support for all data types, such as 
<code>Decimal</code>, for functions added in previous releases. All type 
limitations mentioned in previous release notes should no longer be valid, and 
if you find a function that is not implemented for a certain data type, please 
<a href="https://issues.apache.org/jira/projects/ARROW/issues" 
class="external-link">report an issue</a>.</p>
 </div>
 <div class="section level3">
-<h3 id="2-duckdb-integration-6-0-0">2. DuckDB integration<a class="anchor" 
aria-label="anchor" href="#2-duckdb-integration-6-0-0"></a></h3>
+<h3 id="id_2-duckdb-integration-6-0-0">2. DuckDB integration<a class="anchor" 
aria-label="anchor" href="#id_2-duckdb-integration-6-0-0"></a></h3>
 <p>If you have the <a href="https://CRAN.R-project.org/package=duckdb"; 
class="external-link">duckdb package</a> installed, you can hand off an Arrow 
Dataset or query object to <a href="https://duckdb.org/"; 
class="external-link">DuckDB</a> for further querying using the <code><a 
href="../reference/to_duckdb.html">to_duckdb()</a></code> function. This allows 
you to use duckdb’s <code>dbplyr</code> methods, as well as its SQL interface, 
to aggregate data. Filtering and column projection don [...]
 <p>You can also take a duckdb <code>tbl</code> and call <code><a 
href="../reference/to_arrow.html">to_arrow()</a></code> to stream data to 
Arrow’s query engine. This means that in a single dplyr pipeline, you could 
start with an Arrow Dataset, evaluate some steps in DuckDB, then evaluate the 
rest in Arrow.</p>
 </div>
 <div class="section level3">
 <h3 id="breaking-changes-6-0-0">Breaking changes<a class="anchor" 
aria-label="anchor" href="#breaking-changes-6-0-0"></a></h3>
-<ul><li>Row order of data from a Dataset query is no longer deterministic. If 
you need a stable sort order, you should explicitly <code><a 
href="https://dplyr.tidyverse.org/reference/arrange.html"; 
class="external-link">arrange()</a></code> the query result. For calls to 
<code><a href="https://dplyr.tidyverse.org/reference/summarise.html"; 
class="external-link">summarize()</a></code>, you can set 
<code>options(arrow.summarise.sort = TRUE)</code> to match the current 
<code>dplyr</code> beha [...]
+<ul><li>Row order of data from a Dataset query is no longer deterministic. If 
you need a stable sort order, you should explicitly <code>arrange()</code> the 
query result. For calls to <code>summarize()</code>, you can set 
<code>options(arrow.summarise.sort = TRUE)</code> to match the current 
<code>dplyr</code> behavior of sorting on the grouping columns.</li>
 <li>
-<code><a href="https://dplyr.tidyverse.org/reference/summarise.html"; 
class="external-link">dplyr::summarize()</a></code> on an in-memory Arrow Table 
or RecordBatch no longer eagerly evaluates. Call <code><a 
href="https://dplyr.tidyverse.org/reference/compute.html"; 
class="external-link">compute()</a></code> or <code><a 
href="https://dplyr.tidyverse.org/reference/compute.html"; 
class="external-link">collect()</a></code> to evaluate the query.</li>
+<code><a href="https://dplyr.tidyverse.org/reference/summarise.html" 
class="external-link">dplyr::summarize()</a></code> on an in-memory Arrow Table 
or RecordBatch no longer eagerly evaluates. Call <code>compute()</code> or 
<code>collect()</code> to evaluate the query.</li>
 <li>
-<code><a href="https://rdrr.io/r/utils/head.html"; 
class="external-link">head()</a></code> and <code><a 
href="https://rdrr.io/r/utils/head.html"; 
class="external-link">tail()</a></code> also no longer eagerly evaluate, both 
for in-memory data and for Datasets. Also, because row order is no longer 
deterministic, they will effectively give you a random slice of data from 
somewhere in the dataset unless you <code><a 
href="https://dplyr.tidyverse.org/reference/arrange.html"; class="external-lin 
[...]
+<code><a href="https://rdrr.io/r/utils/head.html" 
class="external-link">head()</a></code> and <code><a 
href="https://rdrr.io/r/utils/head.html" 
class="external-link">tail()</a></code> also no longer eagerly evaluate, both 
for in-memory data and for Datasets. Also, because row order is no longer 
deterministic, they will effectively give you a random slice of data from 
somewhere in the dataset unless you <code>arrange()</code> to specify 
sorting.</li>
 <li>Simple Feature (SF) columns no longer save all of their metadata when 
converting to Arrow tables (and thus when saving to Parquet or Feather). This 
also includes any dataframe column that has attributes on each element (in 
other words: row-level metadata). Our previous approach to saving this metadata 
is both (computationally) inefficient and unreliable with Arrow queries + 
datasets. This will most impact saving SF columns. For saving these columns we 
recommend either converting the  [...]
 <li>Datasets are officially no longer supported on 32-bit Windows on R &lt; 
4.0 (Rtools 3.5). 32-bit Windows users should upgrade to a newer version of R 
in order to use datasets.</li>
 </ul></div>
@@ -785,7 +802,7 @@
 <li>
 <code><a href="../reference/write_parquet.html">write_parquet()</a></code> no 
longer errors when used with a grouped data.frame</li>
 <li>
-<code><a href="https://dplyr.tidyverse.org/reference/case_when.html"; 
class="external-link">case_when()</a></code> now errors cleanly if an 
expression is not supported in Arrow</li>
+<code>case_when()</code> now errors cleanly if an expression is not supported 
in Arrow</li>
 <li>
 <code><a href="../reference/open_dataset.html">open_dataset()</a></code> now 
works on CSVs without header rows</li>
 <li>Fixed a minor issue where the short readr-style types <code>T</code> and 
<code>t</code> were reversed in <code><a 
href="../reference/read_delim_arrow.html">read_csv_arrow()</a></code>
@@ -813,19 +830,19 @@
 <div class="section level3">
 <h3 id="more-dplyr-5-0-0">More dplyr<a class="anchor" aria-label="anchor" 
href="#more-dplyr-5-0-0"></a></h3>
 <ul><li>
-<p>There are now more than 250 compute functions available for use in <code><a 
href="https://dplyr.tidyverse.org/reference/filter.html"; 
class="external-link">dplyr::filter()</a></code>, <code><a 
href="https://dplyr.tidyverse.org/reference/mutate.html"; 
class="external-link">mutate()</a></code>, etc. Additions in this release 
include:</p>
+<p>There are now more than 250 compute functions available for use in <code><a 
href="https://dplyr.tidyverse.org/reference/filter.html" 
class="external-link">dplyr::filter()</a></code>, <code>mutate()</code>, etc. 
Additions in this release include:</p>
 <ul><li>String operations: <code><a 
href="https://rdrr.io/r/base/strsplit.html"; 
class="external-link">strsplit()</a></code> and <code>str_split()</code>; 
<code><a href="https://rdrr.io/r/base/strptime.html"; 
class="external-link">strptime()</a></code>; <code><a 
href="https://rdrr.io/r/base/paste.html"; 
class="external-link">paste()</a></code>, <code><a 
href="https://rdrr.io/r/base/paste.html"; 
class="external-link">paste0()</a></code>, and <code>str_c()</code>; <code><a 
href="https://rdrr.i [...]
 </li>
 <li>Date/time operations: <code>lubridate</code> methods such as 
<code>year()</code>, <code>month()</code>, <code>wday()</code>, and so on</li>
 <li>Math: logarithms (<code><a href="https://rdrr.io/r/base/Log.html"; 
class="external-link">log()</a></code> et al.); trigonometry (<code><a 
href="https://rdrr.io/r/base/Trig.html"; class="external-link">sin()</a></code>, 
<code><a href="https://rdrr.io/r/base/Trig.html"; 
class="external-link">cos()</a></code>, et al.); <code><a 
href="https://rdrr.io/r/base/MathFun.html"; 
class="external-link">abs()</a></code>; <code><a 
href="https://rdrr.io/r/base/sign.html"; class="external-link">sign()</a> [...]
 </li>
-<li>Conditional functions, with some limitations on input type in this 
release: <code><a href="https://rdrr.io/r/base/ifelse.html"; 
class="external-link">ifelse()</a></code> and <code><a 
href="https://dplyr.tidyverse.org/reference/if_else.html"; 
class="external-link">if_else()</a></code> for all but <code>Decimal</code> 
types; <code><a href="https://dplyr.tidyverse.org/reference/case_when.html"; 
class="external-link">case_when()</a></code> for logical, numeric, and temporal 
types only; <cod [...]
+<li>Conditional functions, with some limitations on input type in this 
release: <code><a href="https://rdrr.io/r/base/ifelse.html" 
class="external-link">ifelse()</a></code> and <code>if_else()</code> for all 
but <code>Decimal</code> types; <code>case_when()</code> for logical, numeric, 
and temporal types only; <code>coalesce()</code> for all but lists/structs. 
Note also that in this release, factors/dictionaries are converted to strings 
in these functions.</li>
 <li>
-<code>is.*</code> functions are supported and can be used inside <code><a 
href="https://dplyr.tidyverse.org/reference/relocate.html"; 
class="external-link">relocate()</a></code>
+<code>is.*</code> functions are supported and can be used inside 
<code>relocate()</code>
 </li>
 </ul></li>
-<li><p>The print method for <code>arrow_dplyr_query</code> now includes the 
expression and the resulting type of columns derived by <code><a 
href="https://dplyr.tidyverse.org/reference/mutate.html"; 
class="external-link">mutate()</a></code>.</p></li>
-<li><p><code><a href="https://dplyr.tidyverse.org/reference/transmute.html"; 
class="external-link">transmute()</a></code> now errors if passed arguments 
<code>.keep</code>, <code>.before</code>, or <code>.after</code>, for 
consistency with the behavior of <code>dplyr</code> on 
<code>data.frame</code>s.</p></li>
+<li><p>The print method for <code>arrow_dplyr_query</code> now includes the 
expression and the resulting type of columns derived by 
<code>mutate()</code>.</p></li>
+<li><p><code>transmute()</code> now errors if passed arguments 
<code>.keep</code>, <code>.before</code>, or <code>.after</code>, for 
consistency with the behavior of <code>dplyr</code> on 
<code>data.frame</code>s.</p></li>
 </ul></div>
 <div class="section level3">
 <h3 id="csv-writing-5-0-0">CSV writing<a class="anchor" aria-label="anchor" 
href="#csv-writing-5-0-0"></a></h3>
@@ -836,8 +853,8 @@
 </ul></div>
 <div class="section level3">
 <h3 id="c-interface-5-0-0">C interface<a class="anchor" aria-label="anchor" 
href="#c-interface-5-0-0"></a></h3>
-<ul><li>Added bindings for the remainder of C data interface: Type, Field, and 
RecordBatchReader (from the experimental C stream interface). These also have 
<code><a 
href="https://rstudio.github.io/reticulate/reference/r-py-conversion.html"; 
class="external-link">reticulate::py_to_r()</a></code> and <code><a 
href="https://rstudio.github.io/reticulate/reference/r-py-conversion.html"; 
class="external-link">r_to_py()</a></code> methods. Along with the addition of 
the <code>Scanner$ToRecordBat [...]
-<li>C interface methods are exposed on Arrow objects (e.g. 
<code>Array$export_to_c()</code>, <code>RecordBatch$import_from_c()</code>), 
similar to how they are in <code>pyarrow</code>. This facilitates their use in 
other packages. See the <code><a 
href="https://rstudio.github.io/reticulate/reference/r-py-conversion.html"; 
class="external-link">py_to_r()</a></code> and <code><a 
href="https://rstudio.github.io/reticulate/reference/r-py-conversion.html"; 
class="external-link">r_to_py()</a></c [...]
+<ul><li>Added bindings for the remainder of C data interface: Type, Field, and 
RecordBatchReader (from the experimental C stream interface). These also have 
<code><a 
href="https://rstudio.github.io/reticulate/reference/r-py-conversion.html"; 
class="external-link">reticulate::py_to_r()</a></code> and 
<code>r_to_py()</code> methods. Along with the addition of the 
<code>Scanner$ToRecordBatchReader()</code> method, you can now build up a 
Dataset query in R and pass the resulting stream of bat [...]
+<li>C interface methods are exposed on Arrow objects (e.g. 
<code>Array$export_to_c()</code>, <code>RecordBatch$import_from_c()</code>), 
similar to how they are in <code>pyarrow</code>. This facilitates their use in 
other packages. See the <code>py_to_r()</code> and <code>r_to_py()</code> 
methods for usage examples.</li>
 </ul></div>
 <div class="section level3">
 <h3 id="other-enhancements-5-0-0">Other enhancements<a class="anchor" 
aria-label="anchor" href="#other-enhancements-5-0-0"></a></h3>
@@ -874,9 +891,9 @@
 <h3 id="dplyr-methods-4-0-0">dplyr methods<a class="anchor" 
aria-label="anchor" href="#dplyr-methods-4-0-0"></a></h3>
 <p>Many more <code>dplyr</code> verbs are supported on Arrow objects:</p>
 <ul><li>
-<code><a href="https://dplyr.tidyverse.org/reference/mutate.html"; 
class="external-link">dplyr::mutate()</a></code> is now supported in Arrow for 
many applications. For queries on <code>Table</code> and 
<code>RecordBatch</code> that are not yet supported in Arrow, the 
implementation falls back to pulling data into an in-memory R 
<code>data.frame</code> first, as in the previous release. For queries on 
<code>Dataset</code> (which can be larger than memory), it raises an error if 
the functi [...]
+<code><a href="https://dplyr.tidyverse.org/reference/mutate.html"; 
class="external-link">dplyr::mutate()</a></code> is now supported in Arrow for 
many applications. For queries on <code>Table</code> and 
<code>RecordBatch</code> that are not yet supported in Arrow, the 
implementation falls back to pulling data into an in-memory R 
<code>data.frame</code> first, as in the previous release. For queries on 
<code>Dataset</code> (which can be larger than memory), it raises an error if 
the functi [...]
 <li>
-<code><a href="https://dplyr.tidyverse.org/reference/transmute.html"; 
class="external-link">dplyr::transmute()</a></code> (which calls <code><a 
href="https://dplyr.tidyverse.org/reference/mutate.html"; 
class="external-link">mutate()</a></code>)</li>
+<code><a href="https://dplyr.tidyverse.org/reference/transmute.html"; 
class="external-link">dplyr::transmute()</a></code> (which calls 
<code>mutate()</code>)</li>
 <li>
 <code><a href="https://dplyr.tidyverse.org/reference/group_by.html"; 
class="external-link">dplyr::group_by()</a></code> now preserves the 
<code>.drop</code> argument and supports on-the-fly definition of columns</li>
 <li>
@@ -996,7 +1013,7 @@
 <code><a href="../reference/write_dataset.html">write_dataset()</a></code> to 
Feather or Parquet files with partitioning. See the end of <code><a 
href="../articles/dataset.html">vignette("dataset", package = 
"arrow")</a></code> for discussion and examples.</li>
 <li>Datasets now have <code><a href="https://rdrr.io/r/utils/head.html"; 
class="external-link">head()</a></code>, <code><a 
href="https://rdrr.io/r/utils/head.html"; 
class="external-link">tail()</a></code>, and take (<code>[</code>) methods. 
<code><a href="https://rdrr.io/r/utils/head.html"; 
class="external-link">head()</a></code> is optimized but the others may not be 
performant.</li>
 <li>
-<code><a href="https://dplyr.tidyverse.org/reference/compute.html"; 
class="external-link">collect()</a></code> gains an <code>as_data_frame</code> 
argument, default <code>TRUE</code> but when <code>FALSE</code> allows you to 
evaluate the accumulated <code>select</code> and <code>filter</code> query but 
keep the result in Arrow, not an R <code>data.frame</code>
+<code>collect()</code> gains an <code>as_data_frame</code> argument, default 
<code>TRUE</code> but when <code>FALSE</code> allows you to evaluate the 
accumulated <code>select</code> and <code>filter</code> query but keep the 
result in Arrow, not an R <code>data.frame</code>
 </li>
 <li>
 <code><a href="../reference/read_delim_arrow.html">read_csv_arrow()</a></code> 
supports specifying column types, both with a <code>Schema</code> and with the 
compact string representation for types used in the <code>readr</code> package. 
It also has gained a <code>timestamp_parsers</code> argument that lets you 
express a set of <code>strptime</code> parse strings that will be tried to 
convert columns designated as <code>Timestamp</code> type.</li>
@@ -1063,7 +1080,7 @@
 <code>character</code> vectors that exceed 2GB are converted to Arrow 
<code>large_utf8</code> type</li>
 <li>
 <code>POSIXlt</code> objects can now be converted to Arrow 
(<code>struct</code>)</li>
-<li>R <code><a href="https://rdrr.io/r/base/attributes.html"; 
class="external-link">attributes()</a></code> are preserved in Arrow metadata 
when converting to Arrow RecordBatch and table and are restored when converting 
from Arrow. This means that custom subclasses, such as 
<code>haven::labelled</code>, are preserved in round trip through Arrow.</li>
+<li>R <code><a href="https://rdrr.io/r/base/attributes.html"; 
class="external-link">attributes()</a></code> are preserved in Arrow metadata 
when converting to Arrow RecordBatch and table and are restored when converting 
from Arrow. This means that custom subclasses, such as <code><a 
href="https://haven.tidyverse.org/reference/labelled.html"; 
class="external-link">haven::labelled</a></code>, are preserved in round trip 
through Arrow.</li>
 <li>Schema metadata is now exposed as a named list, and it can be modified by 
assignment like <code>batch$metadata$new_key &lt;- "new value"</code>
 </li>
 <li>Arrow types <code>int64</code>, <code>uint32</code>, and 
<code>uint64</code> now are converted to R <code>integer</code> if all values 
fit in bounds</li>
@@ -1167,7 +1184,7 @@
 <h2 class="pkg-version" data-toc-text="0.16.0" id="arrow-0160">arrow 0.16.0<a 
class="anchor" aria-label="anchor" href="#arrow-0160"></a></h2><p 
class="text-muted">CRAN release: 2020-02-09</p>
 <div class="section level3">
 <h3 id="multi-file-datasets-0-16-0">Multi-file datasets<a class="anchor" 
aria-label="anchor" href="#multi-file-datasets-0-16-0"></a></h3>
-<p>This release includes a <code>dplyr</code> interface to Arrow Datasets, 
which let you work efficiently with large, multi-file datasets as a single 
entity. Explore a directory of data files with <code><a 
href="../reference/open_dataset.html">open_dataset()</a></code> and then use 
<code>dplyr</code> methods to <code><a 
href="https://dplyr.tidyverse.org/reference/select.html"; 
class="external-link">select()</a></code>, <code><a 
href="https://dplyr.tidyverse.org/reference/filter.html"; clas [...]
+<p>This release includes a <code>dplyr</code> interface to Arrow Datasets, 
which let you work efficiently with large, multi-file datasets as a single 
entity. Explore a directory of data files with <code><a 
href="../reference/open_dataset.html">open_dataset()</a></code> and then use 
<code>dplyr</code> methods to <code>select()</code>, <code><a 
href="https://rdrr.io/r/stats/filter.html"; 
class="external-link">filter()</a></code>, etc. Work will be done where 
possible in Arrow memory. When n [...]
 <p>See <code><a href="../articles/dataset.html">vignette("dataset", package = 
"arrow")</a></code> for details.</p>
 </div>
 <div class="section level3">
@@ -1246,7 +1263,7 @@
 </div>
 
 <div class="pkgdown-footer-right">
-  <p>Site built with <a href="https://pkgdown.r-lib.org/"; 
class="external-link">pkgdown</a> 2.2.0.</p>
+  <p>Site built with <a href="https://pkgdown.r-lib.org/"; 
class="external-link">pkgdown</a> 2.1.3.</p>
 </div>
 
     </footer></div>
