Just wanted to give an update on the status of this, since it�s been a couple 
days and I�ve had a chance to work on it a little more.

Improvements:
- Fixed a few bugs, added some more robust checking to ensure correct checking 
for leaf nodes
- Corrected references to �in-order� traversals, I actually meant �pre-order�
- Added new documentation, including some new examples to the �Usage� section
- Cleaned up some names/variables/identifiers
- Added some additional code to have function accurately replicate a weird bug 
of `stats::dendrapply` that is used in CRAN packages. Full details are in my PR 
(linked below).


I�ve integrated this into the svn mirror at r-devel/r-svn, and put out a PR at 
https://github.com/r-devel/r-svn/pull/111. Current PR is passing all build 
checks aside from Windows, which is throwing the error `Sorry, but: Error 
response from server: 500` while installing Miktex. I�m not sure what�s causing 
this, but it seems to be something aside from my code because it�s also 
crashing builds for other PRs.

A link to the diff file is here: 
https://patch-diff.githubusercontent.com/raw/r-devel/r-svn/pull/111.diff

Happy to open a Bugzilla report as well; this is enough code that a discussion 
is probably warranted, and Bugzilla may be an easier place to discuss compared 
to here. Also happy to discuss on the PR itself.

Thank you to everyone that has taken a look at my code, I appreciate people 
taking the time to read through it.

Sincerely,
Aidan Lakshman

-----------------------
Aidan Lakshman (he/him)<https://www.ahl27.com/>
Doctoral Candidate, Wright Lab<https://www.wrightlabscience.com/>
University of Pittsburgh School of Medicine
Department of Biomedical Informatics
ah...@pitt.edu
(724) 612-9940


From: Lakshman, Aidan H <ah...@pitt.edu>
Date: Friday, February 24, 2023 at 07:42
To: Toby Hocking <tdho...@gmail.com>
Cc: R-devel@r-project.org <R-devel@r-project.org>
Subject: Re: [Rd] `dendrapply` Enhancements
Hi Toby,

Thanks for your reply! I haven�t heard about the R project sprint, but I�ll 
definitely check it out. UK is going to be a little hard for me to get to 
funding-wise, but I�ll try to apply for funding.

I appreciate your other comments. As far as coding style, I did do everything I 
could think of to make sure it�s a drop-in replacement for the current version 
with the default settings, so all the user-exposed arguments/variables should 
be identical. I used the conventions in 
https://github.com/wch/r-source/wiki/Contributing for commenting and 
whitespace, so hopefully that all looks okay. I�m realizing there may be some 
differences in tab widths, but I can fix that later today.

-Aidan

-----------------------
Aidan Lakshman (he/him)<https://www.ahl27.com/>
Doctoral Candidate, Wright Lab<https://www.wrightlabscience.com/>
University of Pittsburgh School of Medicine
Department of Biomedical Informatics
ah...@pitt.edu
(724) 612-9940


From: Toby Hocking <tdho...@gmail.com>
Date: Friday, February 24, 2023 at 06:57
To: Lakshman, Aidan H <ah...@pitt.edu>
Cc: R-devel@r-project.org <R-devel@r-project.org>
Subject: Re: [Rd] `dendrapply` Enhancements
Hi Aidan, I think you are on the right email list.
I'm not R-core, but this looks like an interesting/meaningful/significant 
contribution to base R.
I'm not sure what the original dendrapply looks like in terms of code style 
(variable names/white space formatting/etc) but in my experience it is 
important that your code contribution makes minimal changes in that area.
Did you hear about the R project sprint 2023? 
https://contributor.r-project.org/r-project-sprint-2023/<https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcontributor.r-project.org%2Fr-project-sprint-2023%2F&data=05%7C01%7CAHL27%40pitt.edu%7C41b702f139034cd7165608db165e4530%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C638128366414606277%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=oB%2BBivUsBjIgBtZNU8mh%2Fz2rujD3bv9MbWdxqNtyUFk%3D&reserved=0>
 Your work falls into the "new developments" category so I think you could 
apply for that funding to participate.
Toby

On Fri, Feb 24, 2023 at 3:47 AM Lakshman, Aidan H 
<ah...@pitt.edu<mailto:ah...@pitt.edu>> wrote:
Hi everyone,

My apologies if this isn�t the right place to submit this�I�m new to the 
R-devel community and still figuring out what is where.

If people want to skip my writeup and just look at the code, I�ve made a 
repository for it here: 
https://github.com/ahl27/new_dendrapply/tree/master<https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fahl27%2Fnew_dendrapply%2Ftree%2Fmaster&data=05%7C01%7CAHL27%40pitt.edu%7C41b702f139034cd7165608db165e4530%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C638128366414606277%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=QI%2B5t1C%2BJB15D8o8noZra4W87fgyITm12nluGN%2BFNoE%3D&reserved=0>.
 I�m not quite sure how to integrate it into a fork of R-devel; the package 
structure is different from what I�m used to.

I had written a slightly improved version of dendrapply for one of my research 
projects, and my advisor encouraged me to submit it to the R project. It took 
me longer than I expected, but I�ve finally gotten my implementation to be a 
drop-in replacement for `stats::dendrapply`. The man page for 
`stats::dendrapply` says �The implementation is somewhat experimental and 
suggestions for enhancements (or nice examples of usage) are very welcome,� so 
I figured this had the potential to be a worthwhile contribution. I wanted to 
send it out to R-devel to see if this was something worth pursuing as an 
enhancement to R.

The implementation I have is based in C, which I understand implies an 
increased burden of maintenance over pure R code. However, it does come with 
the following benefits:

- Completely eliminates recursion, so no memory overhead from function calls or 
possibility of stack overflows (this was a major issue reported on some of the 
functions in one of our Bioconductor packages that previously used 
`dendrapply`).
- Modest runtime improvement, around 2x on my computer (2021 MBP, 32GB RAM). 
I�m relatively confident this could be optimized more.
- Seemingly significant reduction in memory reduction, still working on a 
robust benchmark. Suggestions for the best way to do that are welcome.
- Support for applying functions with an inorder traversal (as in 
`stats::dendrapply`) as well as using a postorder traversal.

This implementation was tested manually as well as running all the unit tests 
in `dendextend`, which comprises a lot of applications of `dendrapply`.

The postorder traversal would be a significant new functionality to dendrapply, 
as it would allow for functions that use the child nodes to correctly execute. 
A toy example of this is something like:
```
exFunc <- function(x){
  attr(x, 'newA') <- 'a'
  if(is.null(attr(x, 'leaf'))){
    cat(attr(x[[1]], 'newA'), attr(x[[2]], 'newA'))
    cat('\n')
  }
  x
})

dendrapply(dend, exFunc)
```

With the current version of dendrapply, this prints nothing, but the postorder 
traversal version will print �a� twice for each internal branch. If this would 
be a worthwhile addition, I can refactor the code for brevity and add a 
`how=c("in.order", "post.order")`, with the default value �in.order� to 
maintain backwards compatibility. A preorder traversal version should also be 
possible, I just haven�t gotten to it yet.

I think the runtime could be optimized more as well.

Thank you in advance for looking at my code and offering feedback; I�m excited 
at the possibility of helping contribute to the R project! I�m happy to discuss 
more either here, on GitHub, or on the R Contributors Slack.

Sincerely,
Aidan Lakshman

-----------------------
Aidan Lakshman 
(he/him)<https://www.ahl27.com/<https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ahl27.com%2F&data=05%7C01%7CAHL27%40pitt.edu%7C41b702f139034cd7165608db165e4530%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C638128366414606277%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=7KUpJpdulSIzSXbpDJlyUV8pMJm%2BSVFvDOJTlVs9lhc%3D&reserved=0>>
Doctoral Candidate, Wright 
Lab<https://www.wrightlabscience.com/<https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.wrightlabscience.com%2F&data=05%7C01%7CAHL27%40pitt.edu%7C41b702f139034cd7165608db165e4530%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C638128366414606277%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=JMgw%2BMiQ6xdp3OokToJ2nyyco%2BryiFH%2B9ap5iU3yJH8%3D&reserved=0>>
University of Pittsburgh School of Medicine
Department of Biomedical Informatics
ah...@pitt.edu<mailto:ah...@pitt.edu>
(724) 612-9940


        [[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org<mailto:R-devel@r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel<https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-devel&data=05%7C01%7CAHL27%40pitt.edu%7C41b702f139034cd7165608db165e4530%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C638128366414606277%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=I88%2FQhGHXDRS2yHqvh53k3MSWHSd5z2KBgORUHIxfG0%3D&reserved=0>

        [[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to