TPDeramus opened a new issue, #43747: URL: https://github.com/apache/arrow/issues/43747
### Describe the bug, including details regarding any error messages, version, and platform. Hi Arrow Devs. Not sure if this would be more of a usage question or a bug, but it certainly seemed to behave like the latter. I have a large amount of data that needs to be queried in a specific way using a combination of hard (exact matches) and soft (begins with) string matches based on another condition (e.g. value of "class" in another column). I was recently able to get this working via `if_else` with many other singular instances where the queries were smaller, but with this larger dataset I need to recruit `case_when` for a set of larger, conditional mutations based on which "database" the query belongs to. However, every time I do this, the package seems to throw an error that it cannot be run in `arrow` and needs to be pulled into R: ``` FilteredOutput <- LargeDataset |> select(ID, BeginDate, EndDate, class, identity_string, database) |> distinct() |> mutate(Foundate = as.Date("1700-01-01"), Logdate = case_when((((class == "A" & arrow_is_in(identity_string,options = list(value_set = Array$create(HardA), skip_nulls = TRUE))) | (class == "A" & str_starts(identity_string, WildA)) | (class == "B" & arrow_is_in( identity_string, options = list(value_set = Array$create(HardB),skip_nulls = TRUE))) | (class == "C" & arrow_is_in(identity_string, options = list(value_set = Array$create(CHard),skip_nulls = TRUE)))) & database == "secondary") ~ EndDate, database == "primary" ~ as.Date(as.integer(BeginDate) + 30), .default = Foundate)) Error: Expression case_when((((arrow_is_in(identity_string, options = list(value_set = Array$create(HardA), ... not supported in Arrow Call collect() first to pull data into R ``` I assumed this was some kind of issue with doing a date conversion in `arrow`, which I realize does not have hard-mapped `tidyverse` commands for doing mathematical operations on dates (e.g. I need to add 30 days to BeginDate and the like, hence why I used Foundate as an imputed placeholder that would not be possible to remove later. However, oddly enough, every single one of these other options seem to work: ``` #Works (adding to date permitted, no case_when) FilteredOutput <- LargeDataset |> select(ID, BeginDate, EndDate, class, identity_string, database) |> distinct() |> mutate(Logdate = if_else(database == "primary", as.Date(as.integer(BeginDate) + 30), EndDate)) |> as_arrow_table() #Works (no case_when, no math performed on date) FilteredOutput <- LargeDataset |> select(ID, BeginDate, EndDate, class, identity_string, database) |> distinct() |> mutate(Logdate = if_else((((class == "A" & arrow_is_in(identity_string,options = list(value_set = Array$create(HardA), skip_nulls = TRUE))) | (class == "A" & str_starts(identity_string, WildA)) | (class == "B" & arrow_is_in(identity_string, options = list(value_set = Array$create(HardB),skip_nulls = TRUE))) | (class == "C" & arrow_is_in(identity_string, options = list(value_set = Array$create(CHard),skip_nulls = TRUE)))) & database == "secondary"), EndDate, NA)) |> as_arrow_table() #Works (no case when, adding to date, permits operations on date) FilteredOutput <- LargeDataset |> select(ID, BeginDate, EndDate, class, identity_string, database) |> distinct() |> mutate(Foundate = as.Date("1700-01-01"), Logdate = if_else(database == "primary", as.Date(as.integer(BeginDate) + 30), Foundate)) |> as_arrow_table() #This Works(HAS case_when, but value for met criteria is an integer, not a date) > FilteredOutput <- LargeDataset |> select(ID, BeginDate, EndDate, class, identity_string, database) |> distinct() |> mutate(Foundate = as.Date("1700-01-01"), Logdate = case_when((((class == "A" & arrow_is_in(identity_string,options = list(value_set = Array$create(HardA), skip_nulls = TRUE))) | (class == "A" & str_starts(identity_string, WildA)) | (class == "B" & arrow_is_in( identity_string, options = list(value_set = Array$create(HardB),skip_nulls = TRUE))) | (class == "C" & arrow_is_in(identity_string, options = list(value_set = Array$create(CHard),skip_nulls = TRUE)))) & database == "secondary") ~ 1, database == "primary" ~ 0, .default = 2)) |> as_arrow_table() #This Works (no case_when, saves as date with no math) FilteredOutput <- LargeDataset |> select(ID, BeginDate, EndDate, class, identity_string, database) |> distinct() |> mutate(Foundate = as.Date("1700-01-01"), Logdate = if_else((((class == "A" & arrow_is_in(identity_string,options = list(value_set = Array$create(HardA), skip_nulls = TRUE))) | (class == "A" & str_starts(identity_string, WildA)) | (class == "B" & arrow_is_in(identity_string, options = list(value_set = Array$create(HardB),skip_nulls = TRUE))) | (class == "C" & arrow_is_in(identity_string, options = list(value_set = Array$create(CHard),skip_nulls = TRUE)))) & database == "secondary"), EndDate, Foundate)) |> as_arrow_table() ``` Am I facing some sort of syntax issue or is there a limitation when attempting to set `case_when` conditions with regards to dates? Thank you in advance for any suggestions. ### Component(s) R -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org