Re: [R-pkg-devel] "pdflatex: not found" in GitHub Action

2022-04-23 Thread Greg Hunt
Spencer,
These are quite different issues.  pdflatex is an executable for
processing LaTeX source, and it appears to be missing from the check
environment.  pdfpages is a LaTeX package, and my question is about its
being inconsistently available on different RHub platforms and how that
inconsistency bears on the wisdom of using it in a package vignette.
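
(A hedged aside, not something from the thread: on a CI runner the usual
fix for "pdflatex: not found" is to install a TeX distribution in the
workflow, for example TinyTeX from R; pdfpages is then one more TeX Live
package to add.)

install.packages("tinytex")
tinytex::install_tinytex()           # minimal TeX Live, provides pdflatex
tinytex::tlmgr_install("pdfpages")   # the LaTeX package discussed above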

Greg

On Sat, 23 Apr 2022 at 22:34, Spencer Graves <
spencer.gra...@effectivedefense.org> wrote:

> Hello, All:
>
>
>   Greg Hunt  asked this list 4 hours ago,
> "Is the
> use of pdfpages a viable approach" for use in vignettes?
>
>
> Three of four GitHub Action checks of "JamesRamsay5/fda" failed
> because of "pdflatex: not found".
>
>
>   What do you suggest I do about that problem, if anything?
>
>
>   Thanks,
>   Spencer Graves
>
>
>  Forwarded Message 
> Subject:[JamesRamsay5/fda] Run failed: R-CMD-check - master
> (663569e)
> Date:   Fri, 22 Apr 2022 20:40:16 -0700
> From:   sbgraves237 
> Reply-To:   JamesRamsay5/fda 
> To: JamesRamsay5/fda 
> CC: Ci activity 
>
>
>  [JamesRamsay5/fda] R-CMD-check workflow run
>
>
>R-CMD-check: Some jobs were not successful
>
> View workflow run
> <https://github.com/JamesRamsay5/fda/actions/runs/2210173324>
>
> windows-latest (release)
>
> *R-CMD-check* / windows-latest (release)
> Failed in 3 minutes and 5 seconds
>
> annotations for R-CMD-check / windows-latest (release) 6
> <https://github.com/JamesRamsay5/fda/actions/runs/2210173324>
>
> macOS-latest (release)
>
> *R-CMD-check* / macOS-latest (release)
> Cancelled
>
> annotations for R-CMD-check / macOS-latest (release) 2
> <https://github.com/JamesRamsay5/fda/actions/runs/2210173324>
>
> ubuntu-20.04 (release)
>
> *R-CMD-check* / ubuntu-20.04 (release)
> Failed in 15 minutes and 29 seconds
>
> annotations for R-CMD-check / ubuntu-20.04 (release) 12
> <https://github.com/JamesRamsay5/fda/actions/runs/2210173324>
>
> ubuntu-20.04 (devel)
>
> *R-CMD-check* / ubuntu-20.04 (devel)
> Failed in 2 minutes and 21 seconds
>
> annotations for R-CMD-check / ubuntu-20.04 (devel) 6
> <https://github.com/JamesRamsay5/fda/actions/runs/2210173324>
>



Re: [R-pkg-devel] Searching examples in source code

2022-05-08 Thread Greg Hunt
Indeed, and since this is a database client, the question is not just the
elapsed time of the client code: the optimisation opportunities need to be
viewed as a proportion of the overall request time, i.e. client time,
network time, and typical and minimum database server time.
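
As a first step, something like this gives the per-function breakdown of
client time (a hedged sketch; some_client_call() is a stand-in for
whatever RBaseX operation is slow, not a real function):

Rprof("client.out")
for (i in 1:100) result <- some_client_call()   # hypothetical workload
Rprof(NULL)
summaryRprof("client.out")$by.self              # time per function, self vs total

If most of the elapsed time turns out to be network or database server
time, no amount of Rcpp will help.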

On Mon, 9 May 2022 at 05:24, Joshua Ulrich  wrote:

> Hi Ben,
>
> On Sat, May 7, 2022 at 4:24 PM Ben Engbers 
> wrote:
> >
> > Hi,
> >
> > My package (RBaseX) is written entirely in R. The performance is not bad
> > but to further improve the performance I want to investigate to what
> > extent use of C++ makes sense. Problem is that I have little experience
> > with C++ and none with Rcpp. So I am looking for examples.
> > On my (linux) system I installed several packages that needed to be
> > compiled. So it is likely that I already have examples on my system.
> >
> I strongly recommend you profile your code to determine where there
> are performance bottlenecks before writing any new code. Especially
> before adding compiled code to your package.
>
> The infamous Knuth quote:
> "We should forget about small efficiencies, say about 97% of the time:
> premature optimization is the root of all evil. Yet we should not pass
> up our opportunities in that critical 3%. A good programmer will not
> be lulled into complacency by such reasoning, he will be wise to look
> carefully at the critical code; but only after that code has been
> identified."
>
> Best,
> Josh
>
> > My first question is if there is a useful linux command to search all
> > the source code of installed packages on my system.
> > The second question is if there is a command to search all packages at
> > https://cran.r-project.org/web/packages/available_packages_by_name.html?
> >
> > This question is not only relevant to C++ examples. It would also be
> > nice if you could search for occurrences of commands in R code.
> >
> > Ben
> >
>
>
>
> --
> Joshua Ulrich  |  about.me/joshuaulrich
> FOSS Trading  |  www.fosstrading.com
>
>



Re: [R-pkg-devel] URL checks

2022-06-30 Thread Greg Hunt
Ivan,
I am sure that we can make this work a bit better, and I do agree that
making it work perfectly isn't going to happen.  I don't think that the
behaviour you're seeing is likely to be stateful, i.e. the server
recording the fact that you have previously made a request from a browser.
That type of protection is implemented for some DDoS attacks, but it soaks
up resources (money/speed) to little point when there is no DDoS or when
the DDoS is too large: remembering and looking up prior requests has a
cost and isn't really scalable.

I got errors for the DOI and .eu examples earlier today (though I didn't
hit the rstudio link), and I never accessed those pages from a browser.
Removing the -I from the curl request allowed them to succeed.

Without the HTTP HEAD method I got a 302 (redirect) from doi.org, which
seems to indicate that the ID exists, and a 404 (not found) for an ID
which did not.  For DOI checks I suggest removing the nobody settings and
treating anything other than a 404 from the doi.org web server as success.
More precisely: regard a 302 redirect as success, a 404 as failure, and
anything else as potentially ambiguous (we'd need to categorise those as
temporary or permanent, though I am not sure how much better that extra
complexity makes things).  That would be an improvement over the current
behaviour for most references to doi.org.
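
As a rough sketch of that rule (my illustration using the curl package,
not CRAN's actual check code):

library(curl)
h <- new_handle(followlocation = FALSE)   # GET, but don't follow the redirect
resp <- curl_fetch_memory("https://doi.org/10.1000/182", handle = h)
if (resp$status_code == 302) {
  "ok"          # the DOI resolves
} else if (resp$status_code == 404) {
  "broken"      # the DOI does not exist
} else {
  "ambiguous"   # would need categorising
}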

I get the same 403 code from rstudio that you do; I suspect they are
checking the browser in a way that doi.org doesn't.  That's probably to
protect their site text from content scraping, and getting into an arms
race with them is likely to be pointless.  Forbidden does mean that the
server is there, but we can't tell the difference between not found and
any other condition.  I'd suggest that a 403 (which means actively
declined by the web server) should be treated as success IF there is a
cloudflare server header as well, with more CDNs added to the check over
time.  You aren't going to get access anyway.  It looks like the top
three CDN vendors are CloudFlare, Amazon and Akamai; covering those three
would get you about 90% coverage of CDN-fronted hosts, and CloudFlare is
the overwhelming market leader.

In summary:

   - removing nobody, which selects the HEAD method, may allow the
   composite indicators .eu sites to work, meaning that sites which have
   removed support for HEAD (not an uncommon thing to do at the prompting
   of IT auditors) will start to work.
   - removing nobody and then not following the redirect may allow the
   doi.org requests to work.
   - a 403 code together with a cloudflare server header in the response
   should be regarded as success; it's as positive a signal as you are
   likely to see (a sketch follows this list), and more CDNs could be
   added to the check over time.
   - check what the responses from Amazon and Akamai look like so they can
   be identified too (Amazon responses carry a bunch of X-amzn-* headers,
   and an Akamai site I looked at included an x-akamai-transformed header
   in its response).  I would add logging to the build environment to
   collect the requests and response headers from failed requests to see
   what the overall behaviour is.
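
For the 403-plus-CloudFlare rule, the header test could look like this
(again my own illustration; parse_headers_list() lower-cases the header
names, and resp is from the sketch above):

hdrs <- curl::parse_headers_list(resp$headers)
cdn_403 <- resp$status_code == 403 &&
  identical(hdrs[["server"]], "cloudflare")   # server exists, just refuses bots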

I think it's worth exploring to remove a bunch of the recurring questions
about URL lookup.  The question is whether the servers running the CRAN
checks see the same behaviour that I am seeing.  If we can get, say, two
thirds of the errors resolved this way then everyone is better off.

None of this will increase the traffic rate from CRAN as a result of these
checks, and frankly I doubt that you are going to generate enough traffic
to show up in anyone's traffic analysis anyway.  The hits on doi.org are
likely to be the single largest group, and doi.org clearly expect that
people will access the site from scripts, so I doubt that this will cause
more explicit blocking.  For myself, I tend to get a bit antsy about sites
that submit failed requests over and over, or ones that seem to be
systematically scraping a site (meaning many thousands of requests and/or
a very organised pattern).


Greg


On Thu, 30 Jun 2022 at 18:36, Ivan Krylov  wrote:

> Greg,
>
> I realise you are trying to solve the problem and I thank you for
> trying to make the URL checks better for everyone. I probably sound
> defeatist in my e-mails; sorry about that.
>
> On Thu, 30 Jun 2022 17:49:49 +1000
> Greg Hunt  wrote:
>
> > Do you have evidence that, even without the use of HEAD,
> > CloudFlare is rejecting the CRAN checks?
>
> Unfortunately, yes, I think it's possible:
>
> $ curl -v
> https://support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages
> # ...skipping TLS logs...
> > GET /hc/en-us/articles/219949047-Installing-older-versions-of-packages
> HTTP/2
> > Host: support.rstudio.com
> > User-Agent: curl/7.64.0
> > Accept: */*
> >
> * Connection state changed (MAX_CONCURRE

Re: [R-pkg-devel] Undocumented requirement for CRAN

2022-09-12 Thread Greg Hunt
These aren't requirements of the language; they are issues of code and
documentation quality.  Other languages have exactly the same issues, and
no code-audit tool I've ever seen provides 100% coverage of potential
issues.  There is always a level of human intelligence applied in
effective code review (I do know you can do code review without involving
people, but in practice that tends to be fairly basic).  The CRAN process
(human checking on initial submission and a bit less on later submissions)
seems like a good balance of complexity and development effort.

Things like enforcing documentation of return values would require package
standards to be tightened, which would cause problems for existing
packages.  Rd does have \value{} markup for return values, but deciding
whether a \value section is present and adequate requires either a bunch
more intelligence in the check or human inspection.
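
For what it's worth, the markup side is just an Rd \value{} section, which
roxygen2 generates from @return; a generic sketch, not code from the
thread:

#' Add two numbers
#'
#' @param x,y Numeric vectors.
#' @return A numeric vector, the elementwise sum of `x` and `y`.
#' @export
add2 <- function(x, y) x + y

The hard part, as above, is checking that the section is present and
actually says something useful.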

The moment you document a function and explain how to get at it, you have
committed to dealing with what happens when you change it.  Regardless of
whether it's described as internal or publicly visible, people will use
it, and they will get annoyed when you change it or when it is broken
(which may not matter for the internal use in the package but may be
visible in edge cases from outside).  If they have to go digging to find
and understand the internal interfaces, it's much clearer to them that
they are on their own.  Looked at another way: if the external interface
is complete, then people don't need the internal interfaces; if it's
incomplete, that's an interface design problem.  I have seen a distinction
made between general-use programming interfaces and product- (or release-)
sensitive programming interfaces, but those come with statements about how
they are likely to change and what limited purposes they can be used for.
That's a more bureaucratic approach than R developers are likely to engage
with.

Greg

On Tue, 13 Sept 2022 at 09:46, Jiří Moravec 
wrote:

> For an experienced R developer like you, certain things might seem
> obvious without the need for any documentation.
>
> But please understand that other languages do not have such requirements.
> So for new people, or people coming from different languages, this might
> not seem as obvious as it is for you.
>
> R already has a capability to do automated checking of packages to
> enforce certain level of quality.
>  From this perspective, at least to me, it doesn't make sense that some
> issues are automatically flagged,
> while other issues, which might be on the same or lower level of
> complexity, are not.
> (also, packages are not journal articles)
>
> Same with documentation, I can't spot and fix an issue, if I am not even
> aware that it is an issue.
>
>
>  > If a user can't count on the interface for those functions remaining
> unchanged, why document it in a user-visible place?
>
> Why not? Even unexported functions are user-visible through ::: . Since
> they are already documented, I might as well produce full documentation
> that is checked during `R CMD check`.
> Isn't one of R's advantages the ability to read the code of any function
> without wading through the source files?
>
>  > The fact that some base packages don't document this is a deficiency
> in that documentation, not an excuse for having a deficiency in your
> documentation.
>
> That is good to know. I certainly know it now after having to fix this
> issue in my package. But how was I supposed to know about it when this
> problem is not documented, `R CMD check` doesn't flag it, and the
> official documentation uses it?
>
>
> -- Jirka
>
> On 9/13/22 11:19, Duncan Murdoch wrote:
> > On 12/09/2022 6:42 p.m., Jiří Moravec wrote:
> >> There are quite a lot of undocumented requirements for CRAN.
> >> These have bitten me several times already.
> >>
> >> They are not documented in the
> >> https://cran.r-project.org/doc/manuals/R-exts.html
> >> Nor they are marked by `R CMD check`
> >>
> >> Ideally, these would be documented AND flagged by R CMD check.
> >> Otherwise, it is a waste of time for both CRAN team and package
> >> developers.
> >>
> >> So far, the undocumented requirements that were flagged for me are:
> >>
> >> * Documenting return value even for functions without return value
> >> -- This is even contrary to the base code (i.e., many graphical
> >> functions do not document return values)
> >>
> >> * Commented code in examples
> >>
> >> * Examples for non-exported internal functions
> >> -- I understand that this is related to the fact that any ::: is
> >> highly discouraged (which is documented) and that examples for
> >> unexported functions cannot be run without ::: .
> >>But I like the idea of using properly documented internal
> >> functions and usage of examples as for rudimentary testing.
> >>
> >>
> >> Are there any other undocumented requirements?
> >
> > Of course there are.  CRAN is not an automaton, it is a group of

Re: [R-pkg-devel] Non-ASCII and CRAN Checks

2022-09-20 Thread Greg Hunt
Leaving data in the wrong encoding leaves a bug lying around waiting to
surface.  Is the data correctly encoded as Latin-1 (ISO 8859-1), Windows
8-bit (codepage 1252, also sometimes referred to as Latin-1), or some
Unicode encoding (likely UTF-8)?
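
If it really is Latin-1, a minimal sketch of making the bytes and the
declared encoding agree (the string content here is my own illustration):

x <- "institui\xe7\xe3o"   # Latin-1 bytes for "instituição"
Encoding(x) <- "latin1"    # declare what the bytes actually are
x_utf8 <- enc2utf8(x)      # or convert outright to UTF-8
Encoding(x_utf8)           # "UTF-8"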

Character mapping is not such an issue between the traditional 8-bit
character sets (they do very little interpretation), but going in and out
of Unicode with incorrectly encoded data can leave non-characters (a
concept that does not exist in 8-bit character sets) in your data, which
bite you much later when other systems require the data to be valid
Unicode.  I also had some extremely odd behaviour from R around the
beginning of the year when some Unicode accented characters got into
variable names and data frame data access got quite weird.

Greg

On Wed, 21 Sept 2022 at 10:04, Hadley Wickham  wrote:

> In my experience this NOTE does not interfere with CRAN submission and you
> can ignore it.
>
> Hadley
>
> On Monday, September 19, 2022, Igor L  wrote:
>
> > Hello everybody,
> >
> > I'm testing my package with the devtools::check() function and I got a
> > warning about non-ASCII strings that were found.
> >
> > These characters are in a dataframe and, as they are names of
> institutions
> > used to filter databases, it makes no sense to translate them.
> >
> > Is there any way to make the check accept these characters?
> >
> > They are in latin1 encoding.
> >
> > Thanks in advance!
> >
> > --
> > *Igor Laltuf Marques*
> > Economist (UFF)
> > Master in urban and regional planning (IPPUR-UFRJ)
> > Researcher at ETTERN e CiDMob
> > https://igorlaltuf.github.io/
> >
> >
>
>
> --
> http://hadley.nz
>
>



Re: [R-pkg-devel] corrupted NAMESPACE file

2023-01-20 Thread Greg Hunt
U+FEFF at the start of a UTF-8 file serves no byte-order purpose, since
UTF-8 has no byte order; encoded in UTF-8 it appears as the bytes EF BB
BF.  If tools insist on writing it, or choke on reading it, that is
arguably the problem.
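
If readLines(..., encoding = "UTF-8-BOM") doesn't cooperate, a byte-level
sketch (my suggestion, not something from this thread) sidesteps the
encoding machinery entirely:

raw <- readBin("NAMESPACE", what = "raw", n = file.size("NAMESPACE"))
if (length(raw) >= 3 && identical(raw[1:3], as.raw(c(0xef, 0xbb, 0xbf))))
  raw <- raw[-(1:3)]        # drop the EF BB BF prefix
writeBin(raw, "NAMESPACE")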

On Sat, 21 Jan 2023 at 05:09, Bill Dunlap  wrote:

> Setting the locale to "C" (or perhaps some other non-UTF-8 locale) will
> show the BOM bytes.  E.g., on Windows I get:
>
> > Sys.getlocale()
> [1] "LC_COLLATE=English_United States.utf8;LC_CTYPE=English_United States.utf8;LC_MONETARY=English_United States.utf8;LC_NUMERIC=C;LC_TIME=English_United States.utf8"
> > tools::showNonASCIIfile('https://raw.githubusercontent.com/JamesRamsay5/fda/master/NAMESPACE')
> > rawToChar(readBin('https://raw.githubusercontent.com/JamesRamsay5/fda/master/NAMESPACE', what="raw", n=20))
> [1] "export(AmpPhasDec"
> > Sys.setlocale(locale="C")
> [1] "C"
> > tools::showNonASCIIfile('https://raw.githubusercontent.com/JamesRamsay5/fda/master/NAMESPACE')
> 1: export(AmpPhasDecomp,
> > rawToChar(readBin('https://raw.githubusercontent.com/JamesRamsay5/fda/master/NAMESPACE', what="raw", n=20))
> [1] "\357\273\277export(AmpPhasDec"
>
> -Bill
>
>
> On Fri, Jan 20, 2023 at 9:16 AM Spencer Graves <
> spencer.gra...@effectivedefense.org> wrote:
>
> > Hi, Ivan and Uwe:
> >
> >
> >   Thanks for your suggestions, but I've so far been unable to get
> > them
> > to work.  see below.
> >
> >
> > On 1/20/23 9:22 AM, Uwe Ligges wrote:
> > >
> > >
> > > On 20.01.2023 15:53, Ivan Krylov wrote:
> > >> On Fri, 20 Jan 2023 08:41:25 -0600,
> > >> Spencer Graves  wrote:
> > >>
> > >>> ** byte-compile and prepare package for lazy loading
> > >>> Error in parse(nsFile, keep.source = FALSE, srcfile = NULL) :
> > >>> 1:1: unexpected input
> > >>
> > >> tools::showNonASCIIfile('
> > https://raw.githubusercontent.com/JamesRamsay5/fda/master/NAMESPACE')
> > >> # 1: export(AmpPhaseDecomp,
> > >>
> > >> Your NAMESPACE file starts with a U+FEFF ZERO WIDTH NO-BREAK SPACE.
> > >> You'll need to remove it, e.g. by re-creating the first line.
> > >
> > >
> > > Note that this is also called "byte order mark" (BOM). Tell your editor
> > > not to create files with BOM.
> > >
> > > You can also fix in R:
> > >
> > > x <- readLines(..., encoding="UTF-8-BOM")
> > > writeLines(x, ..)
> >
> >
> >   In RStudio 2022.12.0+353 (the current version),
> >
> >
> > tools::showNonASCIIfile('
> > https://raw.githubusercontent.com/JamesRamsay5/fda/master/NAMESPACE')
> >
> >
> > returned "char(0)".  'readLines' and 'writeLines' as Uwe suggested
> > failed to fix it for me.
> >
> >
> >   The first problem I noticed with this was that RStudio could
> not
> > read
> > the NAMESPACE file.  When I tried, it said, "File is binary rather than
> > text so cannot be opened by the source editor."  I changed something
> > using a different editor and did "git commit" and "git push", and got
> > the error on GitHub that I reported above.  I copied the file elsewhere,
> > deleted it locally and from GitHub, then recreated it in LibreOffice by
> > manually typing the first and last lines then copying the rest from a
> > copy I had saved elsewhere.  Then RStudio would open the file, but I
> > still get the same error message as above from both "R CMD build fda"
> > locally and from GitHub Action at:
> >
> >
> > https://github.com/JamesRamsay5/fda
> >
> >
> >   Other suggestions?
> >   Thanks,
> >   Spencer Graves
> >
> > >
> > > Best,
> > > Uwe Ligges
> > >
> > >
> > >
> > >
> > >
> >
> >
>
>



Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

2023-08-25 Thread Greg Hunt
The question should be: in how many cases is the current behaviour a
problem?  In a shared environment, sure, you have to be more careful; I'd
say don't let the teenagers in there.  The CRAN build server does need to
do something to protect itself, and I don't greatly mind the 2-thread
limit: I implemented it by hand in my examples and didn't think about it
afterwards.  On most 8-, 16- or 32-way environments, dedicated or
semi-dedicated to a particular workload, the defaults make some level of
sense, and those are probably most of the use cases.  Protecting
high-processor-count environments from people who don't know what they are
doing would seem to be a mismatch between the people and the environment,
not so much a matter of software.
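
("Implemented it by hand" meaning something like the following at the top
of an example or vignette chunk; a sketch only, since which knobs matter
depends on the packages involved:)

Sys.setenv(OMP_THREAD_LIMIT = 2)   # caps OpenMP-based compiled code
data.table::setDTthreads(2)        # caps data.table specifically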

On Sat, 26 Aug 2023 at 11:49, Jeff Newmiller 
wrote:

> You have a really bizarre way of twisting what others are saying, Dirk. I
> have seen no-one here saying 'limit R to 2 threads' except for you, as a
> way to paint opposing views to be absurd.
>
> What _is_ being said is that users need to be _in control_, but _the
> default needs to do least harm_ until those users take responsibility for
> that control. Do not turn the throttle up until the user is prepared for
> the consequences. Trying to subvert that responsibility into packages by
> default is going to make more trouble than giving the people using those
> packages simple examples of how to take that control.
>
> A similar problem happens when users discover .Rprofile and insert all
> those pesky library statements into it, making their scripts
> irreproducible. If data.table made a warp10() function that activated this
> current default performance setting then the user would be clearly at fault
> for using it in an inappropriate environment like a shared HPC or the CRAN
> servers. Don't put a brick on the accelerator of a teenager's car before
> they even figure out where the brakes are.
>
> On August 25, 2023 6:17:04 PM PDT, Dirk Eddelbuettel 
> wrote:
> >
> >On 26 August 2023 at 12:05, Simon Urbanek wrote:
> >| In reality it's more people running R on their laptops vs the rest of
> the world.
> >
> >My point was that we also have 'single user on really Yuge workstation'.
> >
> >Plus we all know that those users are often not sysadmins, and do not have
> >our levels of accumulated systems knowledge.
> >
> >So we should give _more_ power by default, not less.
> >
> >| [...] they will always be saying blatantly false things like "R is not
> for large data"
> >
> >By limiting R (and/or packages) to two threads we will only get more of
> >these.  Our collective call.
> >
> >This whole thread is pretty sad, actually.
> >
> >Dirk
> >
>
> --
> Sent from my phone. Please excuse my brevity.
>
>



Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

2023-08-26 Thread Greg Hunt
Tim,
I think that things like data.table have a different set of problems
depending on the environment.  Working out the right degree of parallelism
for an IO workload is a hard question that depends on the characteristics
of the IO subsystem, the characteristics of the dataset, and on what
problem you really have (really, how much it's worth spending to achieve
an optimal answer).  It would be interesting to see how well data.table
would do with several tens of threads on several tens of processors
reading a file; I suspect it might not be pretty (coordination overheads
could be large relative to the actual gains from IO parallelism), but it's
not a subject I've looked at.  It would not surprise me if the right
answer was to cap the number of threads, but that cap would probably still
be higher than the usual number of processors in the average physical or
virtual box.  This stuff is not easy, and it's saturated with "it depends"
answers.  The underlying problem here is that to get optimal, or
optimal-enough, behaviour a 96-way or larger box will require different
configuration of the software than an 8- or 16-way VM.


Greg

On Sat, 26 Aug 2023 at 18:15, Tim Taylor 
wrote:

> I’m definitely sympathetic to both sides but have come around to the view
> of Greg, Dirk et al. It seems sensible to have a default that benefits the
> majority of “normal” users and require explicit action in shared
> environments not vice-versa.
>
> That is not to say that data.table could not do better with its
> heuristics (e.g. respecting cgroups settings as raised by Henrik in
> https://github.com/Rdatatable/data.table/issues/5620) but the current
> defaults (50%) seem reasonable for, dare I say, most users.
>
> Tim

Re: [R-pkg-devel] URL syntax causes R CMD build failure - a fix

2023-09-02 Thread Greg Hunt
The percent-encoded characters appear to be valid in that URL, suggesting
that rejecting them is an error.  That kind of error can occur when the
software processing them converts them back to a non-Unicode character
set.
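
A quick way to see that (my illustration, not from the thread): the
escapes decode to an en dash, so the URL is legal percent-encoding rather
than junk.

utils::URLdecode("Levenberg%E2%80%93Marquardt_algorithm")
# [1] "Levenberg–Marquardt_algorithm"   (in a UTF-8 locale)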

On Sun, 3 Sep 2023 at 4:34 am, J C Nash  wrote:

> I'm posting this in case it helps some other developers getting build
> failure.
>
> Recently package nlsr that I maintain got a message that it failed to
> build on
> some platforms. The exact source of the problem is still to be illuminated,
> but seems to be in knitr::render and/or pandoc or an unfortunate
> interaction.
> An update to pandoc triggered a failure to process a vignette that had been
> happily processed for several years. The error messages are unhelpful, at
> least
> to me,
>
> Error at "nlsr-devdoc.knit.md" (line 5419, column 1):
> unexpected end of input
> Error: pandoc document conversion failed with error 64
> Execution halted
>
> Unfortunately, adding "keep_md: TRUE" (you need upper case TRUE to save it
> when
> there is no error of this type), did not save the intermediate file in this
> case. However, searching for "pandoc error 64" presented one web page
> where the author
> used brute force search of his document by removing / replacing sections
> to find
> the line(s) that caused trouble. This is a little tedious, but effective.
> In my
> case, the offending line turned out to be a copied and pasted URL
>
> https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm
>
> The coded characters can be replaced by a hyphen, to give,
>
> https://en.wikipedia.org/wiki/Levenberg-Marquardt_algorithm
>
> and this, when pasted in Mozilla Firefox at least, will go to the
> appropriate
> wikipedia page.
>
> I'd be interested in hearing from others who have had similar
> difficulties. I
> suspect this is relatively rare, and causing some sort of infelicity in the
> output of knitr::render that then trips up some versions of pandoc, that
> may,
> for instance, be now applying stricter rules to URL syntax.
>
> Best,
>
> John Nash
>
>



Re: [R-pkg-devel] Too many cores used in examples (not caused by data.table)

2023-10-24 Thread Greg Hunt
In my case recently, after an hour or so's messing about, I disabled some
tests and example executions to get rid of the offending times.  I doubt
that I am the only one to have done that.

On Tue, 24 Oct 2023 at 9:38 pm, Helske, Jouni  wrote:

> Thanks for the help, I now tried resubmitting with
> Sys.setenv("OMP_THREAD_LIMIT" = 2) at the top of the exchange example, but
> I still get the same note:
>
> Examples with CPU time > 2.5 times elapsed time
>   user system elapsed ratio
> exchange 1.196   0.04   0.159 7.774
>
> Not sure what to try next.
>
> Best,
> Jouni
> 
> From: Ivan Krylov 
> Sent: Friday, October 20, 2023 16:54
> To: Helske, Jouni 
> Cc: r-package-devel@r-project.org 
> Subject: Re: [R-pkg-devel] Too many cores used in examples (not caused by
> data.table)
>
> On Thu, 19 Oct 2023 05:57:54 +,
> "Helske, Jouni"  wrote:
>
> > But I just realised that bssm uses Armadillo via RcppArmadillo, which
> > uses OpenMP by default for some elementwise operations. So, I wonder
> > if that could be the culprit?
>
> I wasn't able to reproduce the NOTE either, despite manually setting
> the environment variable
> _R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD=2 before running R CMD
> check, but I think I can see the code using OpenMP. Here's what I did:
>
> 0. Temporarily lower the system protections against capturing
> performance traces of potentially sensitive parts:
>
> echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
>
> (Set it back to 3 after you're done.)
>
> 1. Run the following command with the development version of the
> package installed:
>
> env OPENBLAS_NUM_THREADS=1 \
>  perf record --call-graph dwarf,4096 \
>  R -e 'library(bssm); system.time(replicate(100, example(exchange)))'
>
> OPENBLAS_NUM_THREADS=1 will prevent OpenBLAS from spawning worker
> threads if you have it installed. (A different BLAS may need different
> environment variables.)
>
> 2. Run `perf report` and browse collected call stack information.
>
> The call stacks are hard to navigate, but I think they are not pointing
> towards Armadillo. At least, setting ARMA_OPENMP_THREADS=1 doesn't
> help, but setting OMP_THREAD_LIMIT=1 does.
>
> --
> Best regards,
> Ivan
>
>



Re: [R-pkg-devel] Fortune candidate Re: Issue with R Package on CRAN - OpenMP and clang17

2023-10-31 Thread Greg Hunt
If I remember rightly, one of the early Algol compilers for the IBM
mainframe couldn't be compiled on an IBM mainframe because it was too
memory-hungry (it had to be cross-compiled).  The numbers change, but the
problems don't, except that I haven't run a compile lately that ran out of
memory like that.


On Tue, 31 Oct 2023 at 12:24 pm, Dirk Eddelbuettel  wrote:

>
> On 31 October 2023 at 19:58, Ivan Krylov wrote:
> | [...] The computers that helped launch the first
> | people into space had 2 kWords of memory, but nowadays you need more
> | than 256 MBytes of RAM to launch a bird into a pig and 10 GBytes of
> | storage in order to compile a compiler. This is what progress looks
> | like.
>
> Fortune!!
>
> Dirk
>
> --
> dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>
>



Re: [R-pkg-devel] CRAN submission struggle

2023-12-29 Thread Greg Hunt
Christiaan,
The elapsed-time note is because CRAN expects examples to be configured to
run single-threaded, and some package that you use (or a package used by a
package that you use) is multi-threaded by default and is using more CPU
time than clock time.  If you cannot figure out how to reconfigure the
multi-threaded package, a number of people have found that the simplest
thing to do is disable running the example (which reduces the effective
test coverage provided by the example).
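
(For reference, "disable running the example" usually means wrapping it in
\donttest{} in the Rd file.  A generic sketch, not code from this thread;
note that CRAN's incoming checks may still run \donttest examples, so
reconfiguring the threading is the better fix where possible:)

\examples{
\donttest{
IOPS(ExampleTradeData)   # the slow, multi-threaded part
}
}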

I haven't encountered the MiKTeX exception file before, but I suspect it's
a side effect of a MiKTeX error.  Packages should not leave files behind
in the temp directory.  If you expect a MiKTeX error, you need to remove
the file; if you don't, you need to track down and fix or work around the
bug.  The build process is really a quality check on your package.

Greg

On Fri, 29 Dec 2023 at 3:01 am, Christiaan Pieterse <
pietie.cjp.1...@gmail.com> wrote:

> Hi,
>
> Thank you for showing the difference in the ExampleTradeData. I've fixed
> this by adding a .Gitignore file and a "data-raw" folder to load the
> ExampleTradeData. I hope I did this correctly. When I check the package (
> https://github.com/WoutersResearchGroup/R-IO-PS/tree/CRAN-prep) in
> RStudio.
> I only get 3 notes (see below), and if I run it in PositCloud, it crashes
> or yields the same 1 ERROR and 2 NOTES result as before. Why might this be?
> Is it a problem or is it fine if I continue working in RStudio since I
> cannot increase the specs in PositCloud because I'm working on a research
> group account?
>
> Here are the 3 notes I receive in RStudio:
>
> The first is the expected New Submission Note.
>
> The second is the runtime that is too long:
> * checking examples ... [43s] NOTE
> Examples with CPU (user + system) or elapsed time > 5s
>   user system elapsed
> IOPS 10.06   3.35   35.04
> How can I reduce this time? I'm not sure how to reduce the size of my
> ExampleTradeData without the check giving errors when running the example.
>
> The third note I am unsure what it means:
> * checking for detritus in the temp directory ... NOTE
> Found the following files/directories:
>   'lastMiKTeXException'
>
> Kind regards
> Christiaan
>
> On Thu, 28 Dec 2023 at 15:55, Ivan Krylov  wrote:
>
> > Hi Christiaan,
> >
> > On Thu, 28 Dec 2023 14:57:55 +0200,
> > Christiaan Pieterse  wrote:
> >
> > > Still, I couldn't figure out why I ran into this problem, so I
> > > created a test file called "Test Example.R" (available at the same
> > > GitHub repository:
> > > https://github.com/WoutersResearchGroup/R-IO-PS/tree/CRAN-prep).
> >
> > I see you're always adding or updating files to the GitHub repo by
> > means of uploading. While that's certainly one way to use GitHub, it
> > combines the least convenient aspects of two approaches to using GitHub.
> >
> > With GitHub purely in the browser, GitHub is just a website where you
> > keep and edit code, running nothing else on the local computer. Code
> > can be run in Codespaces or using GitHub Actions. Microsoft will want
> > to be paid money to run code on their computers.
> >
> > With GitHub as a Git remote, there is a local checkout [*] that's kept
> > in sync with GitHub by means of commits [**] and pushes [***], letting
> > you create meaningful, describable snapshots of changes in your code
> > spanning multiple files at the same time.
> >
> > Right now, it probably feels like Dropbox but worse.
> >
> > > This file creates the function in the global environment (note that
> > > this is the same function code as available in the package
> > > "R/iopspackage2.0.R" file), and then runs this function with the same
> > > example as in the package (If you want to try this yourself, just
> > > load the data/ExampleTradeData.rda in before running the Test Example
> > > file). This test file yields no errors when I run it and produces the
> > > correct results. When I then proceed to build and check the package,
> > > it yields the same example error as before. I do not understand why
> > > or what could cause this issue.
> >
> > The difference is in the ExampleTradeData variable, which "Test
> > Example.R" doesn't define.
> >
> > With data(ExampleTradeData), the script works.
> >
> > With ExampleTradeData <-
> >
> >
> read.csv(system.file("extdata","ExampleTradeData.csv",package="iopspackage")),
> > the script fails exactly the same way as example(IOPS) does.
> >
> > > I'm not sure if I should send out another email to the developers to
> > > see if someone else spots something I'm not seeing.
> >
> > It may help to keep Cc: r-package-devel@r-project.org in the e-mails
> > for the search engines to index the potential solutions in the mailing
> > list archives.
> >
> > --
> > Best regards,
> > Ivan
> >
> > [*]
> > https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
> >
> > [**]
> >
> >
> https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository
> >
> > [***]
> > https://git-scm.com/book/en/

[R-pkg-devel] Order of repo access from options("repos")

2024-03-31 Thread Greg Hunt
When I set multiple repositories in options(repos = ...), the order of
access is providing me with some surprises as I work through some CI/CD
issues:

Given:

options(
  repos = c(
    CRAN = "http://localhost:3001/proxy",
    C = "http://172.17.0.1:3002",
    B = "http://172.17.0.1:3001/proxy",
    A = "http://localhost:3002"
  )
)


the order in the build log after this is:

#12 178.7 Warning: unable to access index for repository
http://localhost:3001/proxy/src/contrib:
#12 178.7   cannot open URL '
http://localhost:3001/proxy/src/contrib/PACKAGES'
#12 178.7 Warning: unable to access index for repository
http://172.17.0.1:3002/src/contrib:
#12 178.7   cannot open URL 'http://172.17.0.1:3002/src/contrib/PACKAGES'
#12 178.9 Warning: unable to access index for repository
http://localhost:3002/src/contrib:
#12 178.9   cannot open URL 'http://localhost:3002/src/contrib/PACKAGES'
#12 179.0 trying URL '
http://172.17.0.1:3001/proxy/src/contrib/png_0.1-8.tar.gz'
#12 179.1 Content type 'application/x-gzip' length 24880 bytes (24 KB)


Which indicates that the order is:

CRAN, C, A, B...

Note that A comes before B in the URL accesses, when I was expecting
either CRAN, C, B, A (physical order) or A, B, C, CRAN (alphabetical).

As an alternative, given:

options(
  repos = c(
    C = "http://172.17.0.1:3002",
    B = "http://172.17.0.1:3001/proxy",
    A = "http://localhost:3002",
    CRAN = "http://localhost:3001/proxy"
  )
)


The order is:

#12 0.485 Warning: unable to access index for repository
http://172.17.0.1:3002/src/contrib:
#12 0.485   cannot open URL 'http://172.17.0.1:3002/src/contrib/PACKAGES'
#12 1.153 Warning: unable to access index for repository
http://localhost:3002/src/contrib:
#12 1.153   cannot open URL 'http://localhost:3002/src/contrib/PACKAGES'
#12 1.153 Warning: unable to access index for repository
http://localhost:3001/proxy/src/contrib:
#12 1.153   cannot open URL '
http://localhost:3001/proxy/src/contrib/PACKAGES'
#12 1.250 trying URL '
http://172.17.0.1:3001/proxy/src/contrib/rlang_1.1.3.tar.gz'


Which seems to be C, A, CRAN, B.

What is it about B?

The help doesn't talk about this.  It says:

repos:
character vector of repository URLs for use by available.packages and
related functions. Initially set from entries marked as default in the
‘repositories’ file, whose path is configurable via environment variable
R_REPOSITORIES (set this to NULL to skip initialization at startup). The
‘factory-fresh’ setting from the file in R.home("etc") is c(CRAN="@CRAN@"),
a value that causes some utilities to prompt for a CRAN mirror. To avoid
this do set the CRAN mirror, by something like


local({
  r <- getOption("repos")
  r["CRAN"] <- "https://my.local.cran"
  options(repos = r)
})
in your ‘.Rprofile’, or use a personal ‘repositories’ file.


Note that you can add more repositories (Bioconductor, R-Forge, RForge.net,
...) for the current session using setRepositories.


Now I am not setting the values in exactly the way that the manual says, so
I experimented in case something was wrong there:

> options('repos')$repos
                          CRAN
"https://cloud.r-project.org"
> local({
+   r <- getOption("repos")
+   r["CRAN"] <- "https://my.local.cran"
+   options(repos = r)
+ })
> options('repos')$repos
                   CRAN
"https://my.local.cran"
> str(options('repos'))
List of 1
 $ repos: Named chr "https://my.local.cran"
  ..- attr(*, "names")= chr "CRAN"
> local({
+   r <- getOption("repos")
+   r["CRAN"] <- "https://my.local.cran"
+   options(repos = r)
+ })
> options(
+   repos = c(
+     C = "http://172.17.0.1:3002",
+     B = "http://172.17.0.1:3001/proxy",
+     A = "http://localhost:3002",
+     CRAN = "http://localhost:3001/proxy"
+   )
+ )
> options('repos')$repos
                             C                              B
      "http://172.17.0.1:3002" "http://172.17.0.1:3001/proxy"
                             A                           CRAN
       "http://localhost:3002"  "http://localhost:3001/proxy"
> str(options('repos'))
List of 1
 $ repos: Named chr [1:4] "http://172.17.0.1:3002" "http://172.17.0.1:3001/proxy" "http://localhost:3002" "http://localhost:3001/proxy"
  ..- attr(*, "names")= chr [1:4] "C" "B" "A" "CRAN"
> local({
+   r <- getOption("repos")
+   r["CRAN"] <- "https://my.local.cran"
+   r["C"] = "http://172.17.0.1:3002"
+   r["B"] = "http://172.17.0.1:3001/proxy"
+   r["A"] = "http://localhost:3002"
+   r["CRAN"] = "http://localhost:3001/proxy"
+   options(repos = r)
+ })
> str(options('repos'))
List of 1
 $ repos: Named chr [1:4] "http://172.17.0.1:3002" "http://172.17.0.1:3001/proxy" "http://localhost:3002" "http://localhost:3001/proxy"
  ..- attr(*, "names")= chr [1:4] "C" "B" "A" "CRAN"
> options('repos')$repos
                             C                              B
      "http://172.17.0.1:3002" "http://172.17.0.1:3001/proxy"
                             A                           CRAN
       "http://localhost:3002"  "http://localhost:3001/proxy"

Re: [R-pkg-devel] Order of repo access from options("repos")

2024-03-31 Thread Greg Hunt
Dirk,
Sadly I can't use localhost for all of those.  172.17.0.1 is an internal
Docker IP, not the localhost address (127.0.0.1); the two sets of
addresses are there to handle two different scenarios, and different ones
will fail to resolve in different scenarios.  Are you saying that the DNS
lookup adds a timing issue to the search order?  Isn't the list
deterministically ordered?


Greg

On Sun, 31 Mar 2024 at 22:15, Dirk Eddelbuettel  wrote:

>
> Greg,
>
> There are AFAICT two issues here: how R unrolls the named vector that is
> the
> 'repos' element in the list 'options', and how your computer resolves DNS
> for
> localhost vs 172.17.0.1.  I would try something like
>
>    options(repos = c(CRAN = "http://localhost:3001/proxy",
>                      C = "http://localhost:3002",
>                      B = "http://localhost:3003/proxy",
>                      A = "http://localhost:3004"))
>
> or the equivalent with 172.17.0.1. When I do that here I get errors from
> first to last as we expect:
>
>> options(repos = c(CRAN = "http://localhost:3001/proxy",
>                    C = "http://localhost:3002",
>                    B = "http://localhost:3003/proxy",
>                    A = "http://localhost:3004"))
>> available.packages()
>Warning: unable to access index for repository
> http://localhost:3001/proxy/src/contrib:
>  cannot open URL 'http://localhost:3001/proxy/src/contrib/PACKAGES'
>Warning: unable to access index for repository
> http://localhost:3002/src/contrib:
>  cannot open URL 'http://localhost:3002/src/contrib/PACKAGES'
>Warning: unable to access index for repository
> http://localhost:3003/proxy/src/contrib:
>  cannot open URL 'http://localhost:3003/proxy/src/contrib/PACKAGES'
>Warning: unable to access index for repository
> http://localhost:3004/src/contrib:
>  cannot open URL 'http://localhost:3004/src/contrib/PACKAGES'
> Package Version Priority Depends Imports LinkingTo Suggests
> Enhances License License_is_FOSS License_restricts_use OS_type Archs MD5sum
> NeedsCompilation File Repository
>>
>
> Dirk
>
> --
> dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>



Re: [R-pkg-devel] Order of repo access from options("repos")

2024-03-31 Thread Greg Hunt
Martin, Dirk, Kevin,
Thanks for your help.  To summarise: the order of access is undefined, and
every repo URL is accessed.   I'm working in an environment
where "known-good" is more important than "latest", so what follows is an
explanation of the problem space from my perspective.
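
One consequence worth knowing (a sketch of how to see it, my own
illustration rather than anything from the discussion): because every
repository is consulted and the highest version wins, you can ask
available.packages() where each package would actually come from.

ap <- available.packages()
ap[c("rlang", "png"), c("Version", "Repository")]   # which repo "won"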

What I am experimenting with is pinning down the versions of the packages
that a moderately complex solution is built against, using a combination
of an internal repository of cached packages (internally written packages,
our own hopefully transient copies of packages archived from CRAN,
packages live on CRAN, and packages present in both GitHub and CRAN which
we build and cache locally) and a proxy that populates that cache in
specific build processes by intercepting requests to CRAN.  I'd like to
use the base R functions if possible, and I want to let the version
numbers in the dependencies float, because a) we do need to maintain
approximate currency in the package versions we use, and b) I have no
business monkeying around with third parties' dependencies.  renv looks
helpful but makes some assumptions about disk access to its cache that I'd
rather avoid by running an internal repo; the team is spread around the
world, so shared cache volumes are not a great idea.

The business with the multiple repo addresses is one approach to working
around Docker's inability to understand that people need to access the
Docker host's ports from inside a container or a build; the current Docker
treatment of the host's internal IP is far from transparent (I have
scripts that run both inside and outside of Docker containers, and they
used to be able to work out for themselves what environment they run in;
that's got harder lately).  That led down a path in which one set of
addresses did not reject connection attempts, making each package
installation (and there are hundreds) take some number of minutes for the
connections to time out.  Thankfully I don't actually have to deal with
that.

We have had a few cases where our dependencies have been archived from
CRAN and we have maintained our own copy for a period of days to months, a
period in which we do not know what the next package version number will
be.  It would be convenient not to have to think about that; a
deterministic, terminating search of a sequence of repos looked like a
nice idea for that, but I may have to do something different.

There was a recent case where a package made a breaking change in its
interface in a release (not version) update that broke another package we
depend on.  It would be nice to be able to temporarily pin that package at
its previous version (without updating the source of the third party
package that depends on it) to preserve our own build-ability while those
packages sort themselves out.

There is one case where a pull request for a CRAN-hosted package was
verbally accepted but never actioned, so we have our own forked version of
a CRAN-hosted package which I need to decide what to do with one day soon.
There is another case where the package version number is different in
CRAN from the one we want.

We have a dependency on a package that we build from a Git repo but which
is also present in CRAN.  I don't want to be dependent on the maintainers
keeping the package version in the Git copy of the DESCRIPTION file higher
than the version in CRAN.  Ideally I'd like to build and push to the
internal repo and not have to think about it after that.  The same issue
as before arises: as it stands today I have to either worry about, and
probably edit, the version number in the build, or manage the
cache-population process so that the internal package instance is added
after any CRAN-sourced dependencies, and make sure that the public CRAN
instances are not accessed in the build.

All of these problems are soluble by special-casing the affected installs,
specifically managing the cache population (with a requirement that the
cache and CRAN not be searched at the same time), or editing version
numbers whose next values I do not control, but I would like to try the
simplest approach first.  I know I'm not going to get a clean solution
here; the relative weights of "known-good" and "latest" differ depending
on where you stand.


Greg

On Sun, 31 Mar 2024 at 22:43, Martin Morgan  wrote:

> available.packages indicates that
>
>
>
>  By default, the return value includes only packages whose version
>
>  and OS requirements are met by the running version of R, and only
>
>  gives information on the latest versions of packages.
>
>
>
> So all repositories are consulted and then the result filtered to contain
> just the most recent version of each. Does it matter then what order the
> repositories are visited?
>
>
>
> Martin Morgan
>
>
>
> *From: *R-package-devel  on behalf
> of Greg Hunt

Re: [R-pkg-devel] Order of repo access from options("repos")

2024-04-02 Thread Greg Hunt
Jan,
That's only the case if you want to allow later version numbers to
override the versions in the internal repository; that is the "known-good"
is more important than "latest" point above.

Having a defined set of dependencies while still maintaining currency is a
difficult problem.  Always fetching dependencies from a public source is a
very bad idea (which is why I am looking at these issues), but not doing it
accumulates future costs as interfaces and sets of bugs evolve and need to
be remediated.  Those future costs can become very large indeed in a large
system.

Compounding the problem, CRAN caching is not supported universally by
commercial infrastructure: I think Artifactory and Nexus do it; the AWS
and Azure offerings don't.

Greg

On Wed, 3 Apr 2024 at 01:05, Jan van der Laan  wrote:

> Interesting. That would also mean that putting a company repo first does
> not protect against dependency confusion attacks (people intentionally
> uploading packages with the same name as company internal packages on
> CRAN;
>
> https://arstechnica.com/information-technology/2021/02/supply-chain-attack-that-fooled-apple-and-microsoft-is-attracting-copycats/)
>
>
>
> Jan

[R-pkg-devel] Package vulnerabilities

2024-04-03 Thread Greg Hunt
Uwe,
Whether it takes a lot of effort to get malicious code into a company
depends on the pay-off, which can be large relative to the effort.  The
hack mentioned earlier was interesting largely because the package users'
priorities were fundamentally insecure (higher version number wins,
defaulting to public repositories) and the specific package names meant
that the hack was narrowly targeted, making it less likely to be
discovered than exfiltration code inserted into a widely used package.
Having an identifiable set of package dependencies at any point in time is
a beginning.  It's difficult to effectively control developer behaviour, so
there is a risk there, but what makes it into production can in principle
be identified and controlled.
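
Even a crude manifest is auditable.  A minimal sketch of capturing one
(the file name is illustrative):

  deps <- installed.packages()[, c("Package", "Version")]
  write.csv(deps, file = "package-manifest.csv", row.names = FALSE)

Diffing that file from build to build at least tells you when the
dependency set changed.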

I had a look at the CVE database; it's difficult to identify R package
vulnerabilities there.  Some other searching turned up a couple of
vulnerabilities and some rather promotional blog posts, one of which
asserted that R code is almost always run in controlled environments, which
was sadly hilarious.

Is there a source of vulnerability information for R packages?  Are there
or have there been examples of actually malicious R packages in the wild?


Greg

[[alternative HTML version deleted]]

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Failed: Future File Timestamp Check

2025-02-07 Thread Greg Hunt
Does all this mean that the check is not handling its own errors?
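
The error message suggests as much: if the time service returns garbage,
the remote time presumably parses to NA, and the comparison itself then
fails instead of being caught.  A rough sketch of the failure mode,
reconstructed from the error text quoted below rather than from the actual
check code:

  now_local <- unclass(Sys.time())  # local clock, seconds since the epoch
  now <- NA_real_                   # what a garbled response would parse to
  if (abs(now_local - now[1]) > 300) message("clock skew")
  ## Error in if (...) : missing value where TRUE/FALSE needed

That would also explain why _R_CHECK_SYSTEM_CLOCK_=FALSE makes the problem
disappear rather than merely pass.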

On Sat, 8 Feb 2025 at 8:28 AM, Henrik Bengtsson 
wrote:

> > It has to have the "datetime" entry. If you can't fix your network you
> > can skip that test with
> >
> > _R_CHECK_FUTURE_FILE_TIMESTAMPS_=FALSE
>
> I'm quite sure that is overridden by 'R CMD check' when using the
> --as-cran flag. The workaround that I have found is to set the
> environment variable:
>
> _R_CHECK_SYSTEM_CLOCK_=FALSE
>
> I have observed worldtimeapi.org timing
> out a lot over the last couple of months from several different
> networks and hosts. I've seen it happen in the past too (last couple
> of years), but it has got considerably worse recently. Maybe they're
> throttling per IP(?). I consider it quite unreliable these days, so
> I use _R_CHECK_SYSTEM_CLOCK_=FALSE by default now.
>
> /Henrik
>
> On Tue, Feb 4, 2025 at 5:50 PM Simon Urbanek
>  wrote:
> >
> > Josiah,
> >
> > that check tests the accuracy of the system clock by querying
> > https://worldtimeapi.org/api/timezone/etc/UTC so my guess would be that
> > you have either network or proxy issues which cause that request to fail
> > by providing garbage instead of the actual response.
> >
> > The call to test yourself is
> > > readLines("http://worldtimeapi.org/api/timezone/etc/UTC";, warn=FALSE)
> > [1]
> "{\"utc_offset\":\"+00:00\",\"timezone\":\"Etc/UTC\",\"day_of_week\":3,\"day_of_year\":36,\"datetime\":\"2025-02-05T01:42:19.272728+00:00\",\"utc_datetime\":\"2025-02-05T01:42:19.272728+00:00\",\"unixtime\":1738719739,\"raw_offset\":0,\"week_number\":6,\"dst\":false,\"abbreviation\":\"UTC\",\"dst_offset\":0,\"dst_from\":null,\"dst_until\":null,\"client_ip\":\"121.98.39.155\"}"
> >
> > It has to have the "datetime" entry. If you can't fix your network you
> > can skip that test with
> >
> > _R_CHECK_FUTURE_FILE_TIMESTAMPS_=FALSE
> >
> > Cheers,
> > Simon
> >
> >
> > > On Feb 5, 2025, at 10:11 AM, Josiah Parry 
> wrote:
> > >
> > > I'm running R CMD check for my package {calcite} (source:
> > > https://github.com/r-arcGIS/calcite) which is failing due to what
> > > *looks* like a bug.
> > >
> > > R CMD check fails at "checking for future file timestamps"
> > >
> > > I get this error:  ...Error in if (abs(unclass(now_local) -
> > > unclass(now)[1]) > 300) missing value where TRUE/FALSE needed.
> > >
> > > It seems that an NA is being generated somehow during this check but
> > > I'm unsure how.
> > >
> > > One thing that comes to mind is that the file that contains all of my
> > > function definitions is generated using writeLines() but the output of
> `
> > > file.info()` looks normal to me.
> > >
> > > Have others encountered this? I'm on R 4.4.0 Puppy Cup
> > >
> > > platform   aarch64-apple-darwin20
> > > arch   aarch64
> > > os darwin20
> > > system aarch64, darwin20
> > > status
> > > major  4
> > > minor  4.0
> > > year   2024
> > > month  04
> > > day24
> > > svn rev86474
> > > language   R
> > > version.string R version 4.4.0 (2024-04-24)
> > > nickname   Puppy Cup
> > >
> > >   [[alternative HTML version deleted]]
> > >
> > > __
> > > R-package-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-package-devel
> > >
> >
> > __
> > R-package-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>

[[alternative HTML version deleted]]

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] how to notify users of obsolete and new package

2025-02-11 Thread Greg Hunt
If, as your install page says, there are only trivial differences in the
code that uses the old and new packages, why force people to reinstall by
disabling their code with what you term a shell?  Surely a package startup
message would be enough, and better than disabling your users' previously
(hopefully) working code.  Having a package suddenly stop working is fairly
annoying and happens all the time through the usual entropy in complex open
source systems; having that failure created deliberately just adds to the
annoyance and is likely to impose time costs on users at moments they did
not choose.  I have had this happen a number of times: some small project
that I've left for a few weeks and returned to, expecting some trivial
update and the re-generation of a graph or a notebook to take a few
minutes, only to discover that, thanks to a rash update of some kind (R,
Python or OS packages), I have to spend time I don't have bringing the
pieces up to date.  Big, long-lived things, sure, they get their
environment frozen, but the transient or ad-hoc pieces of work eat time
with this type of problem.  If it can be avoided, it should be.
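
For what it's worth, the startup-message alternative is only a few lines
in the package's zzz.R; a minimal sketch, with illustrative wording:

  .onAttach <- function(libname, pkgname) {
    packageStartupMessage(
      "dartR has been superseded by a set of new packages.\n",
      "See https://green-striped-gecko.github.io/dartR/ for how to migrate."
    )
  }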

Neither the dartR GitHub page (https://green-striped-gecko.github.io/dartR/)
nor the repository at https://github.com/green-striped-gecko/dartR/ mentions
that the package has been superseded; you have to read the install page,
which is surprising given this email thread.  One additional way of letting
people know about the new version would be to say so on the GitHub page and
in the README.

It looks like the dartR package has been removed from CRAN due to
uncorrected errors in the code.

Greg

On Tue, 11 Feb 2025 at 08:55, Bernd.Gruber 
wrote:

> Hi,
>
> I have a quick question. I have an older package (dartR) that is now
> superseded by a series of new packages.
>
> Still, we noticed that several users have not yet updated and moved to
> the new packages. Hence the question:
>
> Is it okay to submit a "shell" package under the name of the old package
> that does nothing else than telling the user to install the new package
> (and a link/code how to do that)?
>
> There would only be one function, which updates some legacy data to a
> new format.
>
> Is that accepted, or is there another way to let users know (e.g. via the
> CRAN package pages)?
>
> Thanks and regards, Bernd
>
>
>
> ==
> Dr Bernd Gruber   Tel: (02) 6206 3804 Fax: (02) 6201 2328
> Professor
> Institute for Applied Ecology
> Faculty of Applied Science
> University of Canberra   ACT 2601 AUSTRALIA
> Email: bernd.gru...@canberra.edu.au
> WWW:
> http://www.canberra.edu.au/faculties/science/staff/profiles/dr-bernd-gruber
>
> Australian Government Higher Education Provider Number CRICOS:#00212K
>
> NOTICE & DISCLAIMER: This email and any files transmitted with it may
> contain
> confidential or copyright material and are for the attention of the
> addressee
> only. If you have received this email in error please notify us by email
> reply and delete it from your system. The University of Canberra accepts
> no liability for any damage caused by any virus transmitted by this email.
>
>
> ==
>
>
> The Ngunnawal people are the Traditional Custodians of the ACT where UC's
> Bruce Campus is situated and are an integral and celebrated part of UC's
> culture. We also acknowledge other First Nations Peoples.
>
> Australian Government Higher Education Registered Provider (CRICOS)
> #00212K. TEQSA Provider ID: PRV12003 (Australian University)
> Email Disclaimer <https://www.canberra.edu.au/about-uc/disclaimer-copyright-privacy-accessibility>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>

[[alternative HTML version deleted]]

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Check time > 10min

2025-06-02 Thread Greg Hunt
That log ends with status OK, so is that the run that had the problem?
Isn't that "time exceeded" message in the log somewhere?

On Tue, 3 Jun 2025 at 11:41, Murray Efford via R-package-devel <
r-package-devel@r-project.org> wrote:

> On the face of it, I would need to throw out all the examples, and all
> the tests. That can't be right. Am I wrong to take the times in the log at
> face value? Where did the other 6 minutes go? Please excuse my obtuseness.
> 
> From: Dirk Eddelbuettel 
> Sent: Tuesday, 3 June 2025 12:54
> To: Murray Efford 
> Cc: R Package Development 
> Subject: Re: [R-pkg-devel] Check time > 10min
>
>
> On 3 June 2025 at 00:12, Murray Efford via R-package-devel wrote:
> | My revision of package 'secr' fails CRAN pre-test on Windows (R 4.5.0)
> | because total check time exceeds 10 min (it's 760 seconds or 13 min). I
> | can't see how to fix this as none of the times listed in the log
> | https://win-builder.r-project.org/incoming_pretest/secr_5.2.2_20250602_054847/Windows/00check.log
> | seems exceptional:
> | * checking CRAN incoming feasibility ... [18s] OK
> | * checking R code for possible problems ... [116s] OK
> | * checking examples ... [87s] OK
> | * checking tests ... [59s] OK
> | * checking re-building of vignette outputs ... [42s] OK
> | * checking PDF version of manual ... [32s] OK
> | * checking HTML version of manual ... [42s] OK
> | and the total of these components is only 396 sec (6.6 min), so I must
> | be missing something. I would appreciate any advice.  Not much was added in
> | this release, and I don't like the idea of blindly hacking off bits.
>
> To a first approximation every test is a function of some variable we can
> describe as 'N' which you, as author of the package and the tests,
> understand best.
>
> Surely you must know a way to define a new N1 <- N/2, or some other
> appropriate scaling. Then try running with N1 instead. And you can also
> make both tests and examples _conditional_ on some other control variable.
>
> It's all just code. Bend it like Beckham.
>
> Dirk
>
> |
> |[[alternative HTML version deleted]]
> |
> | __
> | R-package-devel@r-project.org mailing list
> | https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
> --
> dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>
> [[alternative HTML version deleted]]
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>

[[alternative HTML version deleted]]

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Check time > 10min

2025-06-03 Thread Greg Hunt
Dirk,
Even if he gets the test and example times to zero, his total time in that
thirteen minute run is still above ten minutes.  In my view the incomplete
time reporting (we don't know what makes up the thirteen minutes) is a bug
in the build process.

Greg

On Tue, 3 Jun 2025 at 10:54, Dirk Eddelbuettel  wrote:

>
> On 3 June 2025 at 00:12, Murray Efford via R-package-devel wrote:
> | My revision of package 'secr' fails CRAN pre-test on Windows (R 4.5.0)
> | because total check time exceeds 10 min (it's 760 seconds or 13 min). I
> | can't see how to fix this as none of the times listed in the log
> | https://win-builder.r-project.org/incoming_pretest/secr_5.2.2_20250602_054847/Windows/00check.log
> | seems exceptional:
> | * checking CRAN incoming feasibility ... [18s] OK
> | * checking R code for possible problems ... [116s] OK
> | * checking examples ... [87s] OK
> | * checking tests ... [59s] OK
> | * checking re-building of vignette outputs ... [42s] OK
> | * checking PDF version of manual ... [32s] OK
> | * checking HTML version of manual ... [42s] OK
> | and the total of these components is only 396 sec (6.6 min), so I must
> | be missing something. I would appreciate any advice.  Not much was added in
> | this release, and I don't like the idea of blindly hacking off bits.
>
> To a first approximation every test is a function of some variable we can
> describe as 'N' which you, as author of the package and the tests,
> understand best.
>
> Surely you must know a way to define a new N1 <- N/2, or some other
> appropriate scaling. Then try running with N1 instead. And you can also
> make both tests and examples _conditional_ on some other control variable.
>
> It's all just code. Bend it like Beckham.
>
> Dirk
>
> |
> |   [[alternative HTML version deleted]]
> |
> | __
> | R-package-devel@r-project.org mailing list
> | https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
> --
> dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>

[[alternative HTML version deleted]]

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Check time > 10min

2025-06-03 Thread Greg Hunt
Dirk,
To clarify the reference to zero cost.

If Murray is being told that the total time is thirteen minutes and that it
needs to be less than ten, he might try to reduce the cost of tests and
examples, but together they don't add up to the three minutes he needs to
cut.  Even if the combined cost of tests and examples is reduced to zero,
he is still not below ten minutes.
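
In numbers: 760 seconds total, minus (87 + 59) seconds of examples and
tests, leaves 614 seconds, still above the 600-second limit.  And those
146 seconds are the most that the usual trimming pattern can recover; a
minimal sketch, gating slow code on a control variable whose name is
hypothetical:

  if (nzchar(Sys.getenv("SECR_EXTENDED_CHECKS"))) {
    nrepl <- 1000   # full-size simulations for extended runs
  } else {
    nrepl <- 100    # scaled-down runs for routine checks
  }

The other six minutes have to come from somewhere else.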

Greg

On Tue, 3 Jun 2025 at 22:47, Dirk Eddelbuettel  wrote:

>
> Greg,
>
> On 3 June 2025 at 21:22, Greg Hunt wrote:
> | Dirk,
> | Even if he gets the test and example times to zero, his total time in
> | that thirteen minute run is still above ten minutes.  In my view the
> | incomplete time reporting (we don't know what makes up the thirteen
> | minutes) is a bug in the build process.
>
> Are you aware of these (documented, if you know where to look) environment
> variables?  (Copied from my ~/.R/check.Renviron, and 2 mins may be too
> tight.)
>
> _R_CHECK_INSTALL_TIMING_CPU_TO_ELAPSED_THRESHOLD_=2
> _R_CHECK_TIMINGS_=2
> _R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_=2
> _R_CHECK_TEST_TIMING_CPU_TO_ELAPSED_THRESHOLD_=2
>
> Also, I am lost between your dueling propositions 'test and example times
> [are] zero' and 'still above ten minutes'. Can you re-explain? Generally
> zero is in effect less than 10.  So I must be misunderstanding something.
>
> Dirk
>
> |
> | Greg
> |
> | On Tue, 3 Jun 2025 at 10:54, Dirk Eddelbuettel  wrote:
> |
> |
> | On 3 June 2025 at 00:12, Murray Efford via R-package-devel wrote:
> | | My revision of package 'secr' fails CRAN pre-test on Windows (R 4.5.0)
> | | because total check time exceeds 10 min (it's 760 seconds or 13 min). I
> | | can't see how to fix this as none of the times listed in the log
> | | https://win-builder.r-project.org/incoming_pretest/secr_5.2.2_20250602_054847/Windows/00check.log
> | | seems exceptional:
> | | * checking CRAN incoming feasibility ... [18s] OK
> | | * checking R code for possible problems ... [116s] OK
> | | * checking examples ... [87s] OK
> | | * checking tests ... [59s] OK
> | | * checking re-building of vignette outputs ... [42s] OK
> | | * checking PDF version of manual ... [32s] OK
> | | * checking HTML version of manual ... [42s] OK
> | | and the total of these components is only 396 sec (6.6 min), so I must be
> | | missing something. I would appreciate any advice.  Not much was added in
> | | this release, and I don't like the idea of blindly hacking off bits.
> |
> | To a first approximation every test is a function of some variable we
> | can describe as 'N' which you, as author of the package and the tests,
> | understand best.
> |
> | Surely you must know a way to define a new N1 <- N/2, or some other
> | appropriate scaling. Then try running with N1 instead. And you can also
> | make both tests and examples _conditional_ on some other control variable.
> |
> | It's all just code. Bend it like Beckham.
> |
> | Dirk
> |
> | |
> | |   [[alternative HTML version deleted]]
> | |
> | | __
> | | R-package-devel@r-project.org mailing list
> | | https://stat.ethz.ch/mailman/listinfo/r-package-devel
> |
> | --
> | dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
> |
> | __
> | R-package-devel@r-project.org mailing list
> | https://stat.ethz.ch/mailman/listinfo/r-package-devel
> |
>
> --
> dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>

[[alternative HTML version deleted]]

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Check time > 10min

2025-06-03 Thread Greg Hunt
Dirk,
In the original email, there was this:

* checking examples ... [87s] OK
* checking tests ... [59s] OK

Am I interpreting it wrong or are these numbers the elapsed times for
checking examples and tests?

Greg

On Wed, 4 Jun 2025 at 00:17, Dirk Eddelbuettel  wrote:

>
> Greg,
>
> On 3 June 2025 at 23:58, Greg Hunt wrote:
> | To clarify the reference to zero cost.
> |
> | If Murray is being told that total time is thirteen minutes and that
> | the time needs to be less than ten, he might try to reduce the cost of
> | tests and examples, but they don't in total add up to the required
> | three minutes. Even if the combined cost of tests and examples is
> | reduced to zero he is still not below ten minutes.
>
> It is my (possibly wrong) understanding that _total time_ is not limited,
> but test and example time are. It is the latter two this thread is about,
> not the total time. Total time is much higher for some packages (arrow,
> duckdb, terra, my own RQuantLib, the various *stan* packages, ...).
>
> Dirk
>
> --
> dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>

[[alternative HTML version deleted]]

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel