This is an automated email from the ASF dual-hosted git repository.
thisisnic pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 42deea61fe6 Update R changelog for 23.0.0 (#752)
42deea61fe6 is described below
commit 42deea61fe68aadb2872dc722d2b37a4fdc4f8be
Author: Nic Crane <[email protected]>
AuthorDate: Wed Jan 28 23:05:43 2026 -0500
Update R changelog for 23.0.0 (#752)
This needed doing manually as we didn't update the relevant page until
after the release vote.
---
docs/r/news/index.html | 85 ++++++++++++++++++++++++++++++--------------------
1 file changed, 51 insertions(+), 34 deletions(-)
diff --git a/docs/r/news/index.html b/docs/r/news/index.html
index b178f9adb04..c93017589fd 100644
--- a/docs/r/news/index.html
+++ b/docs/r/news/index.html
@@ -73,7 +73,24 @@
</div>
<div class="section level2">
-<h2 class="pkg-version" data-toc-text="23.0.0" id="arrow-2300">arrow 23.0.0<a
class="anchor" aria-label="anchor" href="#arrow-2300"></a></h2>
+<h2 class="pkg-version" data-toc-text="23.0.0" id="arrow-2300">arrow 23.0.0<a
class="anchor" aria-label="anchor" href="#arrow-2300"></a></h2><p
class="text-muted">CRAN release: 2026-01-23</p>
+<div class="section level3">
+<h3 id="new-features-23-0-0">New features<a class="anchor" aria-label="anchor"
href="#new-features-23-0-0"></a></h3>
+<ul><li>
+<code><a href="https://rdrr.io/r/base/nchar.html"
class="external-link">nchar()</a></code> now supports <code>keepNA =
FALSE</code> (<a href="https://github.com/HyukjinKwon"
class="external-link">@HyukjinKwon</a>, <a
href="https://github.com/apache/arrow/issues/48665"
class="external-link">#48665</a>).</li>
+<li>
+<code><a href="https://stringr.tidyverse.org/reference/str_like.html"
class="external-link">stringr::str_ilike()</a></code> binding for
case-insensitive pattern matching (<a
href="https://github.com/apache/arrow/issues/48262"
class="external-link">#48262</a>).</li>
+</ul></div>
+<div class="section level3">
+<h3 id="minor-improvements-and-fixes-23-0-0">Minor improvements and fixes<a
class="anchor" aria-label="anchor"
href="#minor-improvements-and-fixes-23-0-0"></a></h3>
+<ul><li>Fix slow performance reading files with a large number of columns (<a
href="https://github.com/apache/arrow/issues/48104"
class="external-link">#48104</a>).</li>
+<li>Fix segfault when calling <code><a
href="../reference/concat_tables.html">concat_tables()</a></code> on a
<code>RecordBatch</code> (<a
href="https://github.com/apache/arrow/issues/47885"
class="external-link">#47885</a>).</li>
+<li>Writing partitioned datasets on S3 no longer requires
<code>ListBucket</code> permissions (<a href="https://github.com/HaochengLIU"
class="external-link">@HaochengLIU</a>, <a
href="https://github.com/apache/arrow/issues/47599"
class="external-link">#47599</a>).</li>
+</ul></div>
+<div class="section level3">
+<h3 id="installation-23-0-0">Installation<a class="anchor" aria-label="anchor"
href="#installation-23-0-0"></a></h3>
+<ul><li>As of version 23.0.0, <code>arrow</code> requires C++20 to build from
source. This means that you may need a newer compiler than the default on some
older systems. See <code><a href="../articles/install.html">vignette("install",
package = "arrow")</a></code> for guidance.</li>
+</ul></div>
</div>
<div class="section level2">
<h2 class="pkg-version" data-toc-text="22.0.0.1" id="arrow-22001">arrow
22.0.0.1<a class="anchor" aria-label="anchor" href="#arrow-22001"></a></h2><p
class="text-muted">CRAN release: 2025-12-23</p>
@@ -117,7 +134,7 @@
<h3 id="minor-improvements-and-fixes-21-0-0">Minor improvements and fixes<a
class="anchor" aria-label="anchor"
href="#minor-improvements-and-fixes-21-0-0"></a></h3>
<ul><li>Expose an option
<code>check_directory_existence_before_creation</code> in
<code>S3FileSystem</code> to reduce I/O calls on cloud storage (<a
href="https://github.com/HaochengLIU" class="external-link">@HaochengLIU</a>,
<a href="https://github.com/apache/arrow/issues/41998"
class="external-link">#41998</a>).</li>
<li>
-<code><a href="https://dplyr.tidyverse.org/reference/case_when.html"
class="external-link">case_when()</a></code> now correctly detects objects that
are not in the global environment (<a href="https://github.com/etiennebacher"
class="external-link">@etiennebacher</a>, <a
href="https://github.com/apache/arrow/issues/46667"
class="external-link">#46667</a>).</li>
+<code>case_when()</code> now correctly detects objects that are not in the
global environment (<a href="https://github.com/etiennebacher"
class="external-link">@etiennebacher</a>, <a
href="https://github.com/apache/arrow/issues/46667"
class="external-link">#46667</a>).</li>
<li>Negative fractional dates now correctly converted to integers by flooring
values (<a href="https://github.com/apache/arrow/issues/46873"
class="external-link">#46873</a>).</li>
<li>Backwards compatibility checks for legacy Arrow C++ versions have been
removed from the R package (<a
href="https://github.com/apache/arrow/issues/46491"
class="external-link">#46491</a>). This shouldn’t affect most users of this
package and would only impact you if you were building the R package from
source with different R package and Arrow C++ versions.</li>
<li>Require CMake 3.25 or greater in bundled build script for full-source
builds (<a href="https://github.com/apache/arrow/issues/46834"
class="external-link">#46834</a>). This shouldn’t affect most users.</li>
@@ -167,9 +184,9 @@
<h3 id="new-features-17-0-0">New features<a class="anchor" aria-label="anchor"
href="#new-features-17-0-0"></a></h3>
<ul><li>R functions that users write that use functions that Arrow supports in
dataset queries now can be used in queries too. Previously, only functions that
used arithmetic operators worked. For example, <code>time_hours <-
function(mins) mins / 60</code> worked, but <code>time_hours_rounded <-
function(mins) round(mins / 60)</code> did not; now both work. These are
automatic translations rather than true user-defined functions (UDFs); for
UDFs, see <code><a href="../reference/re [...]
<li>
-<code><a href="https://dplyr.tidyverse.org/reference/mutate.html"
class="external-link">mutate()</a></code> expressions can now include
aggregations, such as <code>x - mean(x)</code>. (<a
href="https://github.com/apache/arrow/issues/41350"
class="external-link">#41350</a>)</li>
+<code>mutate()</code> expressions can now include aggregations, such as
<code>x - mean(x)</code>. (<a
href="https://github.com/apache/arrow/issues/41350"
class="external-link">#41350</a>)</li>
<li>
-<code><a href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">summarize()</a></code> supports more complex expressions,
and correctly handles cases where column names are reused in expressions. (<a
href="https://github.com/apache/arrow/issues/41223"
class="external-link">#41223</a>)</li>
+<code>summarize()</code> supports more complex expressions, and correctly
handles cases where column names are reused in expressions. (<a
href="https://github.com/apache/arrow/issues/41223"
class="external-link">#41223</a>)</li>
<li>The <code>na_matches</code> argument to the <code>dplyr::*_join()</code>
functions is now supported. This argument controls whether <code>NA</code>
values are considered equal when joining. (<a
href="https://github.com/apache/arrow/issues/41358"
class="external-link">#41358</a>)</li>
<li>R metadata, stored in the Arrow schema to support round-tripping data
between R and Arrow/Parquet, is now serialized and deserialized more strictly.
This makes it safer to load data from files from unknown sources into R
data.frames. (<a href="https://github.com/apache/arrow/issues/41969"
class="external-link">#41969</a>)</li>
</ul></div>
@@ -322,7 +339,7 @@
<li>Ensure that the RStringViewer helper class does not own any Array
references (<a href="https://github.com/apache/arrow/issues/35812"
class="external-link">#35812</a>)</li>
<li>
<code><a href="https://rdrr.io/r/base/strptime.html"
class="external-link">strptime()</a></code> in arrow will return a
timezone-aware timestamp if <code>%z</code> is part of the format string (<a
href="https://github.com/apache/arrow/issues/35671"
class="external-link">#35671</a>)</li>
-<li>Column ordering when combining <code><a
href="https://dplyr.tidyverse.org/reference/group_by.html"
class="external-link">group_by()</a></code> and <code><a
href="https://dplyr.tidyverse.org/reference/across.html"
class="external-link">across()</a></code> now matches dplyr (<a
href="https://github.com/eitsupi" class="external-link">@eitsupi</a>, <a
href="https://github.com/apache/arrow/issues/35473"
class="external-link">#35473</a>)</li>
+<li>Column ordering when combining <code>group_by()</code> and
<code>across()</code> now matches dplyr (<a href="https://github.com/eitsupi"
class="external-link">@eitsupi</a>, <a
href="https://github.com/apache/arrow/issues/35473"
class="external-link">#35473</a>)</li>
</ul></div>
<div class="section level3">
<h3 id="installation-13-0-0">Installation<a class="anchor" aria-label="anchor"
href="#installation-13-0-0"></a></h3>
@@ -352,7 +369,7 @@
<ul><li>The <code><a
href="../reference/read_parquet.html">read_parquet()</a></code> and <code><a
href="../reference/read_feather.html">read_feather()</a></code> functions can
now accept URL arguments (<a
href="https://github.com/apache/arrow/issues/33287"
class="external-link">#33287</a>, <a
href="https://github.com/apache/arrow/issues/34708"
class="external-link">#34708</a>).</li>
<li>The <code>json_credentials</code> argument in
<code>GcsFileSystem$create()</code> now accepts a file path containing the
appropriate authentication token (<a href="https://github.com/amoeba"
class="external-link">@amoeba</a>, <a
href="https://github.com/apache/arrow/issues/34421"
class="external-link">#34421</a>, <a
href="https://github.com/apache/arrow/issues/34524"
class="external-link">#34524</a>).</li>
<li>The <code>$options</code> member of <code>GcsFileSystem</code> objects can
now be inspected (<a href="https://github.com/amoeba"
class="external-link">@amoeba</a>, <a
href="https://github.com/apache/arrow/issues/34422"
class="external-link">#34422</a>, <a
href="https://github.com/apache/arrow/issues/34477"
class="external-link">#34477</a>).</li>
-<li>The <code><a
href="../reference/read_delim_arrow.html">read_csv_arrow()</a></code> and
<code><a href="../reference/read_json_arrow.html">read_json_arrow()</a></code>
functions now accept literal text input wrapped in <code><a
href="https://rdrr.io/r/base/AsIs.html" class="external-link">I()</a></code> to
improve compatibility with <code>readr::read_csv()</code> (<a
href="https://github.com/eitsupi" class="external-link">@eitsupi</a>, <a
href="https://github.com/apache/arrow/issues/18 [...]
+<li>The <code><a
href="../reference/read_delim_arrow.html">read_csv_arrow()</a></code> and
<code><a href="../reference/read_json_arrow.html">read_json_arrow()</a></code>
functions now accept literal text input wrapped in <code><a
href="https://rdrr.io/r/base/AsIs.html" class="external-link">I()</a></code> to
improve compatibility with <code><a
href="https://readr.tidyverse.org/reference/read_delim.html"
class="external-link">readr::read_csv()</a></code> (<a
href="https://github.com/eitsu [...]
<li>Nested fields can now be accessed using <code>$</code> and <code>[[</code>
in dplyr expressions (<a href="https://github.com/apache/arrow/issues/18818"
class="external-link">#18818</a>, <a
href="https://github.com/apache/arrow/issues/19706"
class="external-link">#19706</a>).</li>
</ul></div>
<div class="section level3">
@@ -409,7 +426,7 @@
</ul></div>
<div class="section level4">
<h4 id="dplyr-compatibility-11-0-0-2">dplyr compatibility<a class="anchor"
aria-label="anchor" href="#dplyr-compatibility-11-0-0-2"></a></h4>
-<ul><li>New dplyr (1.1.0) function <code><a
href="https://dplyr.tidyverse.org/reference/join_by.html"
class="external-link">join_by()</a></code> has been implemented for dplyr joins
on Arrow objects (equality conditions only). (<a
href="https://github.com/apache/arrow/issues/33664"
class="external-link">#33664</a>)</li>
+<ul><li>New dplyr (1.1.0) function <code>join_by()</code> has been implemented
for dplyr joins on Arrow objects (equality conditions only). (<a
href="https://github.com/apache/arrow/issues/33664"
class="external-link">#33664</a>)</li>
<li>Output is accurate when multiple <code><a
href="https://dplyr.tidyverse.org/reference/group_by.html"
class="external-link">dplyr::group_by()</a></code>/<code><a
href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">dplyr::summarise()</a></code> calls are used. (<a
href="https://github.com/apache/arrow/issues/14905"
class="external-link">#14905</a>)</li>
<li>
<code><a href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">dplyr::summarize()</a></code> works with division when
divisor is a variable. (<a href="https://github.com/apache/arrow/issues/14933"
class="external-link">#14933</a>)</li>
@@ -465,7 +482,7 @@
<h3 id="arrow-dplyr-queries-10-0-0">Arrow dplyr queries<a class="anchor"
aria-label="anchor" href="#arrow-dplyr-queries-10-0-0"></a></h3>
<p>Several new functions can be used in queries:</p>
<ul><li>
-<code><a href="https://dplyr.tidyverse.org/reference/across.html"
class="external-link">dplyr::across()</a></code> can be used to apply the same
computation across multiple columns, and the <code><a
href="https://tidyselect.r-lib.org/reference/where.html"
class="external-link">where()</a></code> selection helper is supported in
<code><a href="https://dplyr.tidyverse.org/reference/across.html"
class="external-link">across()</a></code>;</li>
+<code><a href="https://dplyr.tidyverse.org/reference/across.html"
class="external-link">dplyr::across()</a></code> can be used to apply the same
computation across multiple columns, and the <code>where()</code> selection
helper is supported in <code>across()</code>;</li>
<li>
<code><a href="../reference/add_filename.html">add_filename()</a></code> can
be used to get the filename a row came from (only available when querying
<code><a href="../reference/Dataset.html">?Dataset</a></code>);</li>
<li>Added five functions in the <code>slice_*</code> family: <code><a
href="https://dplyr.tidyverse.org/reference/slice.html"
class="external-link">dplyr::slice_min()</a></code>, <code><a
href="https://dplyr.tidyverse.org/reference/slice.html"
class="external-link">dplyr::slice_max()</a></code>, <code><a
href="https://dplyr.tidyverse.org/reference/slice.html"
class="external-link">dplyr::slice_head()</a></code>, <code><a
href="https://dplyr.tidyverse.org/reference/slice.html" class="exte [...]
@@ -680,9 +697,9 @@
<div class="section level3">
<h3 id="enhancements-to-dplyr-and-datasets-7-0-0">Enhancements to dplyr and
datasets<a class="anchor" aria-label="anchor"
href="#enhancements-to-dplyr-and-datasets-7-0-0"></a></h3>
<ul><li>Additional <a href="https://lubridate.tidyverse.org"
class="external-link">lubridate</a> features: <code>week()</code>, more of the
<code>is.*()</code> functions, and the label argument to <code>month()</code>
have been implemented.</li>
-<li>More complex expressions inside <code><a
href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">summarize()</a></code>, such as <code>ifelse(n() > 1,
mean(y), mean(z))</code>, are supported.</li>
+<li>More complex expressions inside <code>summarize()</code>, such as
<code>ifelse(n() > 1, mean(y), mean(z))</code>, are supported.</li>
<li>When adding columns in a dplyr pipeline, one can now use
<code>tibble</code> and <code>data.frame</code> to create columns of tibbles or
data.frames respectively (e.g. <code>... %>% mutate(df_col = tibble(a, b))
%>% ...</code>).</li>
-<li>Dictionary columns (R <code>factor</code> type) are supported inside of
<code><a href="https://dplyr.tidyverse.org/reference/coalesce.html"
class="external-link">coalesce()</a></code>.</li>
+<li>Dictionary columns (R <code>factor</code> type) are supported inside of
<code>coalesce()</code>.</li>
<li>
<code><a href="../reference/open_dataset.html">open_dataset()</a></code>
accepts the <code>partitioning</code> argument when reading Hive-style
partitioned files, even though it is not required.</li>
<li>The experimental <code><a
href="../reference/map_batches.html">map_batches()</a></code> function for
custom operations on datasets has been restored.</li>
@@ -697,7 +714,7 @@
<code><a href="https://rdrr.io/r/utils/head.html"
class="external-link">head()</a></code> no longer hangs on large CSV
datasets.</li>
<li>There is an improved error message when there is a conflict between a
header in the file and schema/column names provided as arguments.</li>
<li>
-<code><a href="../reference/write_csv_arrow.html">write_csv_arrow()</a></code>
now follows the signature of <code>readr::write_csv()</code>.</li>
+<code><a href="../reference/write_csv_arrow.html">write_csv_arrow()</a></code>
now follows the signature of <code><a
href="https://readr.tidyverse.org/reference/write_delim.html"
class="external-link">readr::write_csv()</a></code>.</li>
</ul></div>
<div class="section level3">
<h3 id="other-improvements-and-fixes-7-0-0">Other improvements and fixes<a
class="anchor" aria-label="anchor"
href="#other-improvements-and-fixes-7-0-0"></a></h3>
@@ -743,27 +760,27 @@
<h2 class="pkg-version" data-toc-text="6.0.0" id="arrow-600">arrow 6.0.0<a
class="anchor" aria-label="anchor" href="#arrow-600"></a></h2>
<p>There are now two ways to query Arrow data:</p>
<div class="section level3">
-<h3 id="1-expanded-arrow-native-queries-aggregation-and-joins-6-0-0">1.
Expanded Arrow-native queries: aggregation and joins<a class="anchor"
aria-label="anchor"
href="#1-expanded-arrow-native-queries-aggregation-and-joins-6-0-0"></a></h3>
-<p><code><a href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">dplyr::summarize()</a></code>, both grouped and
ungrouped, is now implemented for Arrow Datasets, Tables, and RecordBatches.
Because data is scanned in chunks, you can aggregate over larger-than-memory
datasets backed by many files. Supported aggregation functions include <code><a
href="https://dplyr.tidyverse.org/reference/context.html"
class="external-link">n()</a></code>, <code><a href="https [...]
-<p>Along with <code><a
href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">summarize()</a></code>, you can also call <code><a
href="https://dplyr.tidyverse.org/reference/count.html"
class="external-link">count()</a></code>, <code><a
href="https://dplyr.tidyverse.org/reference/count.html"
class="external-link">tally()</a></code>, and <code><a
href="https://dplyr.tidyverse.org/reference/distinct.html"
class="external-link">distinct()</a></code>, which effectiv [...]
-<p>This enhancement does change the behavior of <code><a
href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">summarize()</a></code> and <code><a
href="https://dplyr.tidyverse.org/reference/compute.html"
class="external-link">collect()</a></code> in some cases: see “Breaking
changes” below for details.</p>
-<p>In addition to <code><a
href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">summarize()</a></code>, mutating and filtering equality
joins (<code><a href="https://dplyr.tidyverse.org/reference/mutate-joins.html"
class="external-link">inner_join()</a></code>, <code><a
href="https://dplyr.tidyverse.org/reference/mutate-joins.html"
class="external-link">left_join()</a></code>, <code><a
href="https://dplyr.tidyverse.org/reference/mutate-joins.html" class="exte [...]
+<h3 id="id_1-expanded-arrow-native-queries-aggregation-and-joins-6-0-0">1.
Expanded Arrow-native queries: aggregation and joins<a class="anchor"
aria-label="anchor"
href="#id_1-expanded-arrow-native-queries-aggregation-and-joins-6-0-0"></a></h3>
+<p><code><a href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">dplyr::summarize()</a></code>, both grouped and
ungrouped, is now implemented for Arrow Datasets, Tables, and RecordBatches.
Because data is scanned in chunks, you can aggregate over larger-than-memory
datasets backed by many files. Supported aggregation functions include
<code>n()</code>, <code>n_distinct()</code>, <code>min()</code>, <code><a
href="https://rdrr.io/r/base/Extremes.html" class=" [...]
+<p>Along with <code>summarize()</code>, you can also call
<code>count()</code>, <code>tally()</code>, and <code>distinct()</code>, which
effectively wrap <code>summarize()</code>.</p>
+<p>This enhancement does change the behavior of <code>summarize()</code> and
<code>collect()</code> in some cases: see “Breaking changes” below for
details.</p>
+<p>In addition to <code>summarize()</code>, mutating and filtering equality
joins (<code>inner_join()</code>, <code>left_join()</code>,
<code>right_join()</code>, <code>full_join()</code>, <code>semi_join()</code>,
and <code>anti_join()</code>) are also supported natively in Arrow.</p>
<p>Grouped aggregation and (especially) joins should be considered somewhat
experimental in this release. We expect them to work, but they may not be well
optimized for all workloads. To help us focus our efforts on improving them in
the next release, please let us know if you encounter unexpected behavior or
poor performance.</p>
<p>New non-aggregating compute functions include string functions like
<code>str_to_title()</code> and <code><a
href="https://rdrr.io/r/base/strptime.html"
class="external-link">strftime()</a></code> as well as compute functions for
extracting date parts (e.g. <code>year()</code>, <code>month()</code>) from
dates. This is not a complete list of additional compute functions; for an
exhaustive list of available compute functions see <code><a
href="../reference/list_compute_functions.html"> [...]
<p>We’ve also worked to fill in support for all data types, such as
<code>Decimal</code>, for functions added in previous releases. All type
limitations mentioned in previous release notes should no longer be valid, and
if you find a function that is not implemented for a certain data type, please
<a href="https://issues.apache.org/jira/projects/ARROW/issues"
class="external-link">report an issue</a>.</p>
</div>
<div class="section level3">
-<h3 id="2-duckdb-integration-6-0-0">2. DuckDB integration<a class="anchor"
aria-label="anchor" href="#2-duckdb-integration-6-0-0"></a></h3>
+<h3 id="id_2-duckdb-integration-6-0-0">2. DuckDB integration<a class="anchor"
aria-label="anchor" href="#id_2-duckdb-integration-6-0-0"></a></h3>
<p>If you have the <a href="https://CRAN.R-project.org/package=duckdb"
class="external-link">duckdb package</a> installed, you can hand off an Arrow
Dataset or query object to <a href="https://duckdb.org/"
class="external-link">DuckDB</a> for further querying using the <code><a
href="../reference/to_duckdb.html">to_duckdb()</a></code> function. This allows
you to use duckdb’s <code>dbplyr</code> methods, as well as its SQL interface,
to aggregate data. Filtering and column projection don [...]
<p>You can also take a duckdb <code>tbl</code> and call <code><a
href="../reference/to_arrow.html">to_arrow()</a></code> to stream data to
Arrow’s query engine. This means that in a single dplyr pipeline, you could
start with an Arrow Dataset, evaluate some steps in DuckDB, then evaluate the
rest in Arrow.</p>
</div>
<div class="section level3">
<h3 id="breaking-changes-6-0-0">Breaking changes<a class="anchor"
aria-label="anchor" href="#breaking-changes-6-0-0"></a></h3>
-<ul><li>Row order of data from a Dataset query is no longer deterministic. If
you need a stable sort order, you should explicitly <code><a
href="https://dplyr.tidyverse.org/reference/arrange.html"
class="external-link">arrange()</a></code> the query result. For calls to
<code><a href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">summarize()</a></code>, you can set
<code>options(arrow.summarise.sort = TRUE)</code> to match the current
<code>dplyr</code> beha [...]
+<ul><li>Row order of data from a Dataset query is no longer deterministic. If
you need a stable sort order, you should explicitly <code>arrange()</code> the
query result. For calls to <code>summarize()</code>, you can set
<code>options(arrow.summarise.sort = TRUE)</code> to match the current
<code>dplyr</code> behavior of sorting on the grouping columns.</li>
<li>
-<code><a href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">dplyr::summarize()</a></code> on an in-memory Arrow Table
or RecordBatch no longer eagerly evaluates. Call <code><a
href="https://dplyr.tidyverse.org/reference/compute.html"
class="external-link">compute()</a></code> or <code><a
href="https://dplyr.tidyverse.org/reference/compute.html"
class="external-link">collect()</a></code> to evaluate the query.</li>
+<code><a href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">dplyr::summarize()</a></code> on an in-memory Arrow Table
or RecordBatch no longer eagerly evaluates. Call <code>compute()</code> or
<code>collect()</code> to evaluate the query.</li>
<li>
-<code><a href="https://rdrr.io/r/utils/head.html"
class="external-link">head()</a></code> and <code><a
href="https://rdrr.io/r/utils/head.html"
class="external-link">tail()</a></code> also no longer eagerly evaluate, both
for in-memory data and for Datasets. Also, because row order is no longer
deterministic, they will effectively give you a random slice of data from
somewhere in the dataset unless you <code><a
href="https://dplyr.tidyverse.org/reference/arrange.html" class="external-lin
[...]
+<code><a href="https://rdrr.io/r/utils/head.html"
class="external-link">head()</a></code> and <code><a
href="https://rdrr.io/r/utils/head.html"
class="external-link">tail()</a></code> also no longer eagerly evaluate, both
for in-memory data and for Datasets. Also, because row order is no longer
deterministic, they will effectively give you a random slice of data from
somewhere in the dataset unless you <code>arrange()</code> to specify
sorting.</li>
<li>Simple Feature (SF) columns no longer save all of their metadata when
converting to Arrow tables (and thus when saving to Parquet or Feather). This
also includes any dataframe column that has attributes on each element (in
other words: row-level metadata). Our previous approach to saving this metadata
is both (computationally) inefficient and unreliable with Arrow queries +
datasets. This will most impact saving SF columns. For saving these columns we
recommend either converting the [...]
<li>Datasets are officially no longer supported on 32-bit Windows on R <
4.0 (Rtools 3.5). 32-bit Windows users should upgrade to a newer version of R
in order to use datasets.</li>
</ul></div>
@@ -785,7 +802,7 @@
<li>
<code><a href="../reference/write_parquet.html">write_parquet()</a></code> no
longer errors when used with a grouped data.frame</li>
<li>
-<code><a href="https://dplyr.tidyverse.org/reference/case_when.html"
class="external-link">case_when()</a></code> now errors cleanly if an
expression is not supported in Arrow</li>
+<code>case_when()</code> now errors cleanly if an expression is not supported
in Arrow</li>
<li>
<code><a href="../reference/open_dataset.html">open_dataset()</a></code> now
works on CSVs without header rows</li>
<li>Fixed a minor issue where the short readr-style types <code>T</code> and
<code>t</code> were reversed in <code><a
href="../reference/read_delim_arrow.html">read_csv_arrow()</a></code>
@@ -813,19 +830,19 @@
<div class="section level3">
<h3 id="more-dplyr-5-0-0">More dplyr<a class="anchor" aria-label="anchor"
href="#more-dplyr-5-0-0"></a></h3>
<ul><li>
-<p>There are now more than 250 compute functions available for use in <code><a
href="https://dplyr.tidyverse.org/reference/filter.html"
class="external-link">dplyr::filter()</a></code>, <code><a
href="https://dplyr.tidyverse.org/reference/mutate.html"
class="external-link">mutate()</a></code>, etc. Additions in this release
include:</p>
+<p>There are now more than 250 compute functions available for use in <code><a
href="https://dplyr.tidyverse.org/reference/filter.html"
class="external-link">dplyr::filter()</a></code>, <code>mutate()</code>, etc.
Additions in this release include:</p>
<ul><li>String operations: <code><a
href="https://rdrr.io/r/base/strsplit.html"
class="external-link">strsplit()</a></code> and <code>str_split()</code>;
<code><a href="https://rdrr.io/r/base/strptime.html"
class="external-link">strptime()</a></code>; <code><a
href="https://rdrr.io/r/base/paste.html"
class="external-link">paste()</a></code>, <code><a
href="https://rdrr.io/r/base/paste.html"
class="external-link">paste0()</a></code>, and <code>str_c()</code>; <code><a
href="https://rdrr.i [...]
</li>
<li>Date/time operations: <code>lubridate</code> methods such as
<code>year()</code>, <code>month()</code>, <code>wday()</code>, and so on</li>
<li>Math: logarithms (<code><a href="https://rdrr.io/r/base/Log.html"
class="external-link">log()</a></code> et al.); trigonometry (<code><a
href="https://rdrr.io/r/base/Trig.html" class="external-link">sin()</a></code>,
<code><a href="https://rdrr.io/r/base/Trig.html"
class="external-link">cos()</a></code>, et al.); <code><a
href="https://rdrr.io/r/base/MathFun.html"
class="external-link">abs()</a></code>; <code><a
href="https://rdrr.io/r/base/sign.html" class="external-link">sign()</a> [...]
</li>
-<li>Conditional functions, with some limitations on input type in this
release: <code><a href="https://rdrr.io/r/base/ifelse.html"
class="external-link">ifelse()</a></code> and <code><a
href="https://dplyr.tidyverse.org/reference/if_else.html"
class="external-link">if_else()</a></code> for all but <code>Decimal</code>
types; <code><a href="https://dplyr.tidyverse.org/reference/case_when.html"
class="external-link">case_when()</a></code> for logical, numeric, and temporal
types only; <cod [...]
+<li>Conditional functions, with some limitations on input type in this
release: <code><a href="https://rdrr.io/r/base/ifelse.html"
class="external-link">ifelse()</a></code> and <code>if_else()</code> for all
but <code>Decimal</code> types; <code>case_when()</code> for logical, numeric,
and temporal types only; <code>coalesce()</code> for all but lists/structs.
Note also that in this release, factors/dictionaries are converted to strings
in these functions.</li>
<li>
-<code>is.*</code> functions are supported and can be used inside <code><a
href="https://dplyr.tidyverse.org/reference/relocate.html"
class="external-link">relocate()</a></code>
+<code>is.*</code> functions are supported and can be used inside
<code>relocate()</code>
</li>
</ul></li>
-<li><p>The print method for <code>arrow_dplyr_query</code> now includes the
expression and the resulting type of columns derived by <code><a
href="https://dplyr.tidyverse.org/reference/mutate.html"
class="external-link">mutate()</a></code>.</p></li>
-<li><p><code><a href="https://dplyr.tidyverse.org/reference/transmute.html"
class="external-link">transmute()</a></code> now errors if passed arguments
<code>.keep</code>, <code>.before</code>, or <code>.after</code>, for
consistency with the behavior of <code>dplyr</code> on
<code>data.frame</code>s.</p></li>
+<li><p>The print method for <code>arrow_dplyr_query</code> now includes the
expression and the resulting type of columns derived by
<code>mutate()</code>.</p></li>
+<li><p><code>transmute()</code> now errors if passed arguments
<code>.keep</code>, <code>.before</code>, or <code>.after</code>, for
consistency with the behavior of <code>dplyr</code> on
<code>data.frame</code>s.</p></li>
</ul></div>
<div class="section level3">
<h3 id="csv-writing-5-0-0">CSV writing<a class="anchor" aria-label="anchor"
href="#csv-writing-5-0-0"></a></h3>
@@ -836,8 +853,8 @@
</ul></div>
<div class="section level3">
<h3 id="c-interface-5-0-0">C interface<a class="anchor" aria-label="anchor"
href="#c-interface-5-0-0"></a></h3>
-<ul><li>Added bindings for the remainder of C data interface: Type, Field, and
RecordBatchReader (from the experimental C stream interface). These also have
<code><a
href="https://rstudio.github.io/reticulate/reference/r-py-conversion.html"
class="external-link">reticulate::py_to_r()</a></code> and <code><a
href="https://rstudio.github.io/reticulate/reference/r-py-conversion.html"
class="external-link">r_to_py()</a></code> methods. Along with the addition of
the <code>Scanner$ToRecordBat [...]
-<li>C interface methods are exposed on Arrow objects (e.g.
<code>Array$export_to_c()</code>, <code>RecordBatch$import_from_c()</code>),
similar to how they are in <code>pyarrow</code>. This facilitates their use in
other packages. See the <code><a
href="https://rstudio.github.io/reticulate/reference/r-py-conversion.html"
class="external-link">py_to_r()</a></code> and <code><a
href="https://rstudio.github.io/reticulate/reference/r-py-conversion.html"
class="external-link">r_to_py()</a></c [...]
+<ul><li>Added bindings for the remainder of C data interface: Type, Field, and
RecordBatchReader (from the experimental C stream interface). These also have
<code><a
href="https://rstudio.github.io/reticulate/reference/r-py-conversion.html"
class="external-link">reticulate::py_to_r()</a></code> and
<code>r_to_py()</code> methods. Along with the addition of the
<code>Scanner$ToRecordBatchReader()</code> method, you can now build up a
Dataset query in R and pass the resulting stream of bat [...]
+<li>C interface methods are exposed on Arrow objects (e.g.
<code>Array$export_to_c()</code>, <code>RecordBatch$import_from_c()</code>),
similar to how they are in <code>pyarrow</code>. This facilitates their use in
other packages. See the <code>py_to_r()</code> and <code>r_to_py()</code>
methods for usage examples.</li>
</ul></div>
<div class="section level3">
<h3 id="other-enhancements-5-0-0">Other enhancements<a class="anchor"
aria-label="anchor" href="#other-enhancements-5-0-0"></a></h3>
@@ -874,9 +891,9 @@
<h3 id="dplyr-methods-4-0-0">dplyr methods<a class="anchor"
aria-label="anchor" href="#dplyr-methods-4-0-0"></a></h3>
<p>Many more <code>dplyr</code> verbs are supported on Arrow objects:</p>
<ul><li>
-<code><a href="https://dplyr.tidyverse.org/reference/mutate.html"
class="external-link">dplyr::mutate()</a></code> is now supported in Arrow for
many applications. For queries on <code>Table</code> and
<code>RecordBatch</code> that are not yet supported in Arrow, the
implementation falls back to pulling data into an in-memory R
<code>data.frame</code> first, as in the previous release. For queries on
<code>Dataset</code> (which can be larger than memory), it raises an error if
the functi [...]
+<code><a href="https://dplyr.tidyverse.org/reference/mutate.html"
class="external-link">dplyr::mutate()</a></code> is now supported in Arrow for
many applications. For queries on <code>Table</code> and
<code>RecordBatch</code> that are not yet supported in Arrow, the
implementation falls back to pulling data into an in-memory R
<code>data.frame</code> first, as in the previous release. For queries on
<code>Dataset</code> (which can be larger than memory), it raises an error if
the functi [...]
<li>
-<code><a href="https://dplyr.tidyverse.org/reference/transmute.html"
class="external-link">dplyr::transmute()</a></code> (which calls <code><a
href="https://dplyr.tidyverse.org/reference/mutate.html"
class="external-link">mutate()</a></code>)</li>
+<code><a href="https://dplyr.tidyverse.org/reference/transmute.html"
class="external-link">dplyr::transmute()</a></code> (which calls
<code>mutate()</code>)</li>
<li>
<code><a href="https://dplyr.tidyverse.org/reference/group_by.html"
class="external-link">dplyr::group_by()</a></code> now preserves the
<code>.drop</code> argument and supports on-the-fly definition of columns</li>
<li>
@@ -996,7 +1013,7 @@
<code><a href="../reference/write_dataset.html">write_dataset()</a></code> to
Feather or Parquet files with partitioning. See the end of <code><a
href="../articles/dataset.html">vignette("dataset", package =
"arrow")</a></code> for discussion and examples.</li>
<li>Datasets now have <code><a href="https://rdrr.io/r/utils/head.html"
class="external-link">head()</a></code>, <code><a
href="https://rdrr.io/r/utils/head.html"
class="external-link">tail()</a></code>, and take (<code>[</code>) methods.
<code><a href="https://rdrr.io/r/utils/head.html"
class="external-link">head()</a></code> is optimized but the others may not be
performant.</li>
<li>
-<code><a href="https://dplyr.tidyverse.org/reference/compute.html"
class="external-link">collect()</a></code> gains an <code>as_data_frame</code>
argument, default <code>TRUE</code> but when <code>FALSE</code> allows you to
evaluate the accumulated <code>select</code> and <code>filter</code> query but
keep the result in Arrow, not an R <code>data.frame</code>
+<code>collect()</code> gains an <code>as_data_frame</code> argument, default
<code>TRUE</code> but when <code>FALSE</code> allows you to evaluate the
accumulated <code>select</code> and <code>filter</code> query but keep the
result in Arrow, not an R <code>data.frame</code>
</li>
<li>
<code><a href="../reference/read_delim_arrow.html">read_csv_arrow()</a></code>
supports specifying column types, both with a <code>Schema</code> and with the
compact string representation for types used in the <code>readr</code> package.
It also has gained a <code>timestamp_parsers</code> argument that lets you
express a set of <code>strptime</code> parse strings that will be tried to
convert columns designated as <code>Timestamp</code> type.</li>
@@ -1063,7 +1080,7 @@
<code>character</code> vectors that exceed 2GB are converted to Arrow
<code>large_utf8</code> type</li>
<li>
<code>POSIXlt</code> objects can now be converted to Arrow
(<code>struct</code>)</li>
-<li>R <code><a href="https://rdrr.io/r/base/attributes.html"
class="external-link">attributes()</a></code> are preserved in Arrow metadata
when converting to Arrow RecordBatch and table and are restored when converting
from Arrow. This means that custom subclasses, such as
<code>haven::labelled</code>, are preserved in round trip through Arrow.</li>
+<li>R <code><a href="https://rdrr.io/r/base/attributes.html"
class="external-link">attributes()</a></code> are preserved in Arrow metadata
when converting to Arrow RecordBatch and table and are restored when converting
from Arrow. This means that custom subclasses, such as <code><a
href="https://haven.tidyverse.org/reference/labelled.html"
class="external-link">haven::labelled</a></code>, are preserved in round trip
through Arrow.</li>
<li>Schema metadata is now exposed as a named list, and it can be modified by
assignment like <code>batch$metadata$new_key <- "new value"</code>
</li>
<li>Arrow types <code>int64</code>, <code>uint32</code>, and
<code>uint64</code> now are converted to R <code>integer</code> if all values
fit in bounds</li>
@@ -1167,7 +1184,7 @@
<h2 class="pkg-version" data-toc-text="0.16.0" id="arrow-0160">arrow 0.16.0<a
class="anchor" aria-label="anchor" href="#arrow-0160"></a></h2><p
class="text-muted">CRAN release: 2020-02-09</p>
<div class="section level3">
<h3 id="multi-file-datasets-0-16-0">Multi-file datasets<a class="anchor"
aria-label="anchor" href="#multi-file-datasets-0-16-0"></a></h3>
-<p>This release includes a <code>dplyr</code> interface to Arrow Datasets,
which let you work efficiently with large, multi-file datasets as a single
entity. Explore a directory of data files with <code><a
href="../reference/open_dataset.html">open_dataset()</a></code> and then use
<code>dplyr</code> methods to <code><a
href="https://dplyr.tidyverse.org/reference/select.html"
class="external-link">select()</a></code>, <code><a
href="https://dplyr.tidyverse.org/reference/filter.html" clas [...]
+<p>This release includes a <code>dplyr</code> interface to Arrow Datasets,
which let you work efficiently with large, multi-file datasets as a single
entity. Explore a directory of data files with <code><a
href="../reference/open_dataset.html">open_dataset()</a></code> and then use
<code>dplyr</code> methods to <code>select()</code>, <code><a
href="https://rdrr.io/r/stats/filter.html"
class="external-link">filter()</a></code>, etc. Work will be done where
possible in Arrow memory. When n [...]
<p>See <code><a href="../articles/dataset.html">vignette("dataset", package =
"arrow")</a></code> for details.</p>
</div>
<div class="section level3">
@@ -1246,7 +1263,7 @@
</div>
<div class="pkgdown-footer-right">
- <p>Site built with <a href="https://pkgdown.r-lib.org/"
class="external-link">pkgdown</a> 2.2.0.</p>
+ <p>Site built with <a href="https://pkgdown.r-lib.org/"
class="external-link">pkgdown</a> 2.1.3.</p>
</div>
</footer></div>