Re: [R-pkg-devel] Questions about making a database package (Rpolyhedra)

2018-06-29 Thread Mark van der Loo
Hi Alejandro,

Brooke Anderson gave a nice talk at useR!2017 addressing this exact issue.
See
https://schd.ws/hosted_files/user2017/19/anderson-eddelbuettel-use_r_talk.pdf
for the slides. The basic idea is to use an external CRAN-like repository for
the data back-end. Brooke used 'drat' to set up such a repo.

-Mark
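
For comparison with the drat approach, the scheme described further down in this thread (a tagged data repository, fetched over https into the user's home directory on first use) might look roughly like the sketch below. It is written in Python purely as a language-neutral illustration and is not Rpolyhedra's code: the owner `example-owner`, the tag `v0.2.5`, the cache layout and every function name are hypothetical, and the URL pattern is assumed to be the one GitHub uses for tag archives.

```python
from pathlib import Path
from typing import Callable
from urllib.parse import urlparse
from urllib.request import urlopen


def release_url(owner: str, repo: str, tag: str) -> str:
    """Build the https URL of the tarball GitHub serves for a given tag."""
    return f"https://github.com/{owner}/{repo}/archive/refs/tags/{tag}.tar.gz"


def fetch_https(url: str) -> bytes:
    """Download url, refusing any scheme other than https."""
    if urlparse(url).scheme != "https":
        raise ValueError(f"refusing insecure download: {url}")
    with urlopen(url) as response:
        return response.read()


def ensure_database(
    tag: str,
    cache_root: Path,
    fetch: Callable[[str], bytes] = fetch_https,
    owner: str = "example-owner",  # hypothetical; the thread does not name the owner
    repo: str = "RpolyhedraDB",
) -> Path:
    """Return the cached tarball for tag, downloading it only if it is missing."""
    target = cache_root / repo / f"{tag}.tar.gz"
    if not target.exists():
        data = fetch(release_url(owner, repo, tag))
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    return target
```

A caller would pin the tag inside the package, for example `ensure_database("v0.2.5", Path.home() / ".R" / "data")`, so that each package version always resolves to the same data release.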



On Thu, 28 Jun 2018 at 13:56, alejandro baranek <
alejandrobara...@gmail.com> wrote:

> Hi Joris:
>
> Thank you for your comments.
> Of course, we are using https for additional downloads.
>
> For the moment we do not need GitHub LFS, but it is an alternative we
> can explore after this short step: our immediate goal is to make the
> package lighter on CRAN. It is now 35 kB, so I think we did well.
>
> We are defining an XSD for exporting polyhedra in XML. After that, it will
> be possible to build an API on top of the polyhedra database and make the
> improvement you suggest. That will take time, though: we have no funding
> yet for this project and want to implement some functionality to make it
> more valuable first. Making it easy to port to other languages is on our
> roadmap. The interface we are using is really simple, so it will probably
> be the API interface too.
>
> Best, Ale.
>
>
> 2018-06-28 5:23 GMT-03:00 Joris Meys :
>
> > Hi Ale,
> >
> > I'd personally use a more specific solution like github LFS (large file
> > storage) for a versioned database. You should also check with CRAN itself,
> > as they keep high standards for everything that's not a standard install.
> > More specifically (from CRAN policies) :
> >
> > Downloads of additional software or data as part of package installation
> > or startup should only use secure download mechanisms (e.g., ‘https’ or
> > ‘ftps’).
> >
> > Personally I would store that information in a public database somewhere
> > with a (minimal) API. This can then be extended without inflating the
> > download and would allow people to install only a subset of what they need.
> > That would also allow people to port your work to other languages by
> > simply writing a wrapper around the DB API. It's not a necessity, but I
> > thought it was worth mentioning as an option.
> >
> > Cheers
> > Joris
> >
> > On Wed, Jun 27, 2018 at 10:22 PM, alejandro baranek <
> > alejandrobara...@gmail.com> wrote:
> >
> >> For now, this is our situation: about 150 polyhedra are published, but
> >> more than 800 are ready to publish and we cannot publish all of them
> >> because of the package size.
> >>
> >> It is not a problem on GitHub; it is a problem on CRAN, with build time
> >> (we fixed the testing time with simple sampling techniques). I would like
> >> to hear more from experienced package developers about these issues, but
> >> we seem to have found a solution.
> >>
> >> We decided to create another GitHub repo, RpolyhedraDB. When you install
> >> the package, it downloads the database, at the tag recorded in the
> >> package's data folder, into a directory under the user's home directory.
> >> The package will therefore be minimal for CRAN, will be RR, and will
> >> install the database on first use (on Travis or other QA/continuous
> >> integration services it will of course install it). It will be possible
> >> to set up different database sizes using the tags, in case we find users
> >> prefer that.
> >>
> >>
> >> Best, Ale.
> >>
> >>
> >> 2018-03-29 4:43 GMT-03:00 Berry Boessenkool <berryboessenk...@hotmail.com>:
> >>
> >> >
> >> > I assume you cannot simply reduce the 150 to a few for demonstration
> >> > purposes?
> >> >
> >> > I have seen people using DRAT packages on github for data, but gh is
> >> > limited in size restrictions as well...
> >> >
> >> > No expert in this, but maybe this helps a little bit...
> >> >
> >> > Berry
> >> >
> >> > --
> >> > *From:* R-package-devel on behalf of alejandro baranek
> >> > *Sent:* Tuesday, March 27, 2018 19:26
> >> > *To:* r-package-devel@r-project.org
> >> > *Subject:* [R-pkg-devel] Questions about making a database package
> >> > (Rpolyhedra)
> >> >
> >> > Hello group:
> >> >
> >> > We released Rpolyhedra V0.2 last month. It is able to scrape more than
> >> > 800 polyhedra definitions from public sources. At V0.2.4 we are
> >> > publishing only 150 because of the time needed to scrape all the
> >> > polyhedra, the testing time, and the resulting size of the package. The
> >> > difference is a configuration setting in zzz.R that is very simple to
> >> > change (anyone who wants to try it can build the package themselves).
> >> > The source files of the polyhedra definitions alone are more than 12 MB
> >> > (we include them in the data folder so the package is self-sufficient).
> >> >
> >> > But we have doubts about good practices for publishing a database
> >> > package.
> >> >
> >> > We think the solution is to split the package into an internal
> >> > Rpolyhedra-lib, open source but not on CRAN, and Rpolyhedra with a
> >> > catalog
> >> > sewhich enables to con

Re: [R-pkg-devel] Checksums changed on CRAN without any visible modifications to files.

2018-06-29 Thread Joris Meys
Thx Henrik.

That's indeed one of the issues: right now these minor and defensible
changes are not reflected in the version or filename. Hence there's really
no way to know the tarball (and hence the checksums) changed other than a
build suddenly failing. Our sysadmin proposed to add e.g. _1 or _patched to
the updated tarball, but I also realize this would require a lot of changes
in other places. I wish I knew a way this could be avoided without causing
trouble elsewhere, but I'm not inventive enough, alas.

The other three packages mentioned in that issue don't even show a visible
change. So in those cases, all (text) files in the tarball are identical
and yet the checksum somehow changed as well. There it's even more
baffling, but I'm more confident that this should be solvable on CRAN
without interfering with other things.

Cheers
Joris
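
One mechanism that can produce this effect is archive metadata: a .tar.gz records modification times both in the tar member headers and in the gzip header, so repacking byte-identical files can still change the checksum of the tarball. Whether that is what happened on CRAN is not established in this thread. The sketch below only demonstrates the effect, in Python, with made-up file contents, timestamps and helper names, and it uses SHA-256 rather than MD5 (any digest behaves the same way here).

```python
import gzip
import hashlib
import io
import tarfile


def pack(files: dict[str, bytes], mtime: int) -> bytes:
    """Pack files into an in-memory .tar.gz, stamping all headers with mtime."""
    raw = io.BytesIO()
    with gzip.GzipFile(fileobj=raw, mode="wb", mtime=mtime) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for name in sorted(files):
                info = tarfile.TarInfo(name)
                info.size = len(files[name])
                info.mtime = mtime
                tar.addfile(info, io.BytesIO(files[name]))
    return raw.getvalue()


def member_digests(tarball: bytes) -> dict[str, str]:
    """Digest of each regular file's contents, ignoring archive metadata."""
    digests = {}
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
        for member in tar:
            if member.isfile():
                content = tar.extractfile(member).read()
                digests[member.name] = hashlib.sha256(content).hexdigest()
    return digests


# Made-up package contents, packed twice with different timestamps.
files = {
    "pkg/DESCRIPTION": b"Package: pkg\nVersion: 0.1\n",
    "pkg/R/a.R": b"f <- function() 1\n",
}
old = pack(files, mtime=1515337630)
new = pack(files, mtime=1527068756)

print(hashlib.sha256(old).hexdigest() == hashlib.sha256(new).hexdigest())  # False
print(member_digests(old) == member_digests(new))  # True
```

Comparing per-file digests, which is what the MD5 file inside a package tarball lists, therefore answers a different question than comparing the digest of the tarball itself.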



On Thu, Jun 28, 2018 at 6:50 PM, Henrik Bengtsson <
henrik.bengts...@gmail.com> wrote:

> Below are more details on/examples of changes due to ORCID URLs being
> added to the DESCRIPTION file (from
> https://github.com/easybuilders/easybuild-easyconfigs/pull/6446#issuecomment-396574744):
>
> $ diff -ru RWeka.orig RWeka
> diff -ru RWeka.orig/DESCRIPTION RWeka/DESCRIPTION
> --- RWeka.orig/DESCRIPTION 2018-01-07 16:27:10.0 +0100
> +++ RWeka/DESCRIPTION 2018-05-23 11:45:56.0 +0200
> @@ -26,12 +26,12 @@
>  License: GPL-2
>  NeedsCompilation: no
>  Packaged: 2018-01-07 15:04:47 UTC; hornik
> -Author: Kurt Hornik [aut, cre] (-0003-4198-9911),
> +Author: Kurt Hornik [aut, cre] (),
>    Christian Buchta [ctb],
>    Torsten Hothorn [ctb],
>    Alexandros Karatzoglou [ctb],
>    David Meyer [ctb],
> -  Achim Zeileis [ctb] (-0003-0918-3766)
> +  Achim Zeileis [ctb] ()
>  Maintainer: Kurt Hornik 
>  Repository: CRAN
>  Date/Publication: 2018-01-07 16:17:29
> diff -ru RWeka.orig/MD5 RWeka/MD5
> --- RWeka.orig/MD5 2018-01-07 16:27:10.0 +0100
> +++ RWeka/MD5 2018-05-23 11:45:56.0 +0200
> @@ -1,5 +1,5 @@
>  5ee28414fe580928907527d9e4217845 *CHANGELOG
> -4aae74779d3a1de0fdc64beec22078ee *DESCRIPTION
> +fe0f10b7f193e91112c978228acaa5ae *DESCRIPTION
>  41b1dde3a37014e3c2c5fa208fc47167 *NAMESPACE
>  f9a81f720aebf5398a94efa32a2047a5 *R/AAA.R
>  e8b6adbe6a0b2cf61f433762e1fd16dd *R/arff.R
>
> It looks like such updates to existing tarballs cause trouble
> downstream (e.g. EasyBuild).  Although these updates are minor (in a
> functional sense), they do mean that we cannot guarantee that everyone
> gets identical installs.
>
> /Henrik
> On Wed, Jun 27, 2018 at 7:03 AM Joris Meys  wrote:
> >
> > Correction: In the case of mgcv it was the publication date that changed
> > for some reason. Our sysadmins keep reaching out to me in the hope to
> > resolve this. Is there a way they can reach out so this can be clarified?
> >
> > For reference, the latest issue on mgcv :
> > https://github.com/easybuilders/easybuild-easyconfigs/issues/6501
> >
> > Cheers
> > Joris
> >
> > On Wed, Jun 27, 2018 at 3:30 PM, Joris Meys  wrote:
> >
> > > Dear Uwe,
> > >
> > > > sorry to bother you again with it, but I was wondering if you had any
> > > > further idea about what happened. I just received another report, this
> > > > time about mgcv_1.8-23.tar.gz. It goes beyond my understanding as to why
> > > > the MD5 sums would change without any change to the package. Is there
> > > > anything I can check on this side to make it easier for you?
> > >
> > > Cheers
> > > Joris
> > >
> > > > On Thu, Jun 14, 2018 at 6:55 PM, Joris Meys wrote:
> > >
> > >> Dear Uwe,
> > >>
> > > >> Thank you for being willing to take a look. The report was about the
> > > >> following tarballs:
> > > >>
> > > >> pkgmaker_0.22.tar.gz
> > > >> rngtools_1.2.4.tar.gz
> > > >> RcppProgress_0.4.tar.gz
> > > >>
> > > >> Our sysadmin tried diff -ru, but couldn't find any difference in the
> > > >> text files.
> > >>
> > >> Cheers
> > >> Joris
> > >>
> > >> On Thu, Jun 14, 2018 at 5:46 PM, Uwe Ligges <
> > >> lig...@statistik.tu-dortmund.de> wrote:
> > >>
> > >>>
> > >>>
> > >>> On 13.06.2018 15:40, Joris Meys wrote:
> > >>>
> > 
> > 
> >  On Wed, Jun 13, 2018 at 3:16 PM, Uwe Ligges <
> >  lig...@statistik.tu-dortmund.de  dortmund.de>>
> >  wrote:
> > 
> > 
> >  When CRAN repacks and changes the DESCRIPTION file, the
> checksums
> >  change, of course.
> > 
> >  Best,
> >  Uwe Ligges
> > 
> > 
> >  Dear Uwe,
> > 
> >  I understood that from the previous issue. In this case however,
> none
> >  of the files -including the DESCRIPTION file- were changed. Am I
> right in
> >  suspecting that a package is repacked when moved to the archive?
> > 
> > >>>
> > >>> No, we do not repack generally then. Perhaps the package got
> orphaned?
> > >>>
> > >>>
> > >>> Which package / tarball were you talking about? I can take a look why
> > >>> th

Re: [R-pkg-devel] Questions about making a database package (Rpolyhedra)

2018-06-29 Thread Dirk Eddelbuettel


On 29 June 2018 at 09:15, Mark van der Loo wrote:
| Hi Alejandro,
| 
| Brooke Anderson gave a nice talk at useR!2017 addressing this exact issue.
| See
| https://schd.ws/hosted_files/user2017/19/anderson-eddelbuettel-use_r_talk.pdf
| for
| the slides. The basic idea is to use an external CRAN-like repository for
| the data back-end. Brooke used 'drat' to set up such a repo.

Thanks for the kind words. We also turned this fun project into an R Journal
paper with more background and details:

  https://journal.r-project.org/archive/2017/RJ-2017-026/index.html

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Questions about making a database package (Rpolyhedra)

2018-06-29 Thread alejandro baranek
Hello Mark, Dirk:

We will study your suggestions and write back in a few days with the
approach we decide on.

Thanks!
Ale




-- 
 alejandro baranek
@ken4rab 
