[Rd] Defragmentation of memory

2016-09-04 Thread Måns Magnusson
Dear all developers,

I'm working with a lot of textual data in R and need to process it batch
by batch. I read in batches of 10,000 documents and do some calculations
(unigrams, 2-grams and 3-grams) that result in objects that consume quite a
lot of memory. In every iteration a new object (~500 MB) is created (I can't
control the size, so a new object needs to be created each iteration). The
computation gets slower with every iteration (the first iteration takes
7 seconds; after 30 iterations each one takes 20-30 minutes).

I think I have localized the problem to R's memory handling, and that my
approach is fragmenting the memory. If I do the batch handling in Bash and
start a new R session for each batch, it takes ~7 seconds per batch, so
the individual batches are not the problem. The garbage collector does not
seem to handle this (potential) fragmentation.

Could the poor performance after a couple of iterations be caused by memory
fragmentation? If so, is there a solution that can be used to handle this
within R, such as defragmentation or restarting R from within R?
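
To make the "new R session for each batch" idea concrete, here is a minimal
sketch of driving it from a parent R session instead of Bash. It assumes a
Unix-alike system; the worker script, the run_batch() helper and the squared
batch id standing in for the n-gram computation are all hypothetical and only
illustrate the pattern, not my actual code.

```r
# Path to the Rscript binary belonging to the running R installation
rscript <- file.path(R.home("bin"), "Rscript")

# Hypothetical worker script: reads a batch id and an output path from the
# command line, does the heavy work, and saves only the result to disk.
worker <- tempfile(fileext = ".R")
writeLines(c(
  "args <- commandArgs(trailingOnly = TRUE)",
  "batch_id <- as.integer(args[1])",
  "out_file <- args[2]",
  "result <- batch_id^2  # stand-in for the n-gram computation",
  "saveRDS(result, out_file)"
), worker)

# Run one batch in a fresh R process; the child's memory is released to the
# operating system when the process exits.
run_batch <- function(batch_id, script = worker) {
  out_file <- tempfile(fileext = ".rds")
  status <- system2(rscript,
                    c("--vanilla", shQuote(script), batch_id, shQuote(out_file)))
  if (status != 0L) stop("batch ", batch_id, " failed with exit status ", status)
  readRDS(out_file)
}

results <- lapply(1:3, run_batch)
```

Each call pays the start-up cost of a new R process, so this only makes
sense when the per-batch work dominates that cost.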

With kind regards
Måns Magnusson

PhD Student, Statistics, Linköping University.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] new function to tools/utils package: dependencies based on DESCRIPTION file

2016-09-04 Thread Jan Górecki
Is there a better mailing list for utils-related discussion?
Jan

On 16 June 2016 at 14:00, Michael Lawrence wrote:

> I agree that the utils package needs some improvements related to
> this, and hope to make them eventually. This type of feedback is very
> helpful.
>
> Thanks,
> Michael
>
>
>
> On Thu, Jun 16, 2016 at 1:42 AM, Jan Górecki  wrote:
> > Dear Joris,
> >
> > So it looks like the proposed function makes a lot of sense then,
> > doesn't it?
> >
> > Cheers,
> > Jan
> >
> > On 16 June 2016 at 08:37, Joris Meys  wrote:
> >> Dear Jan,
> >>
> >> It is unavoidable to have OS and R dependencies for devtools. The building
> >> process for packages is both OS and R dependent, so devtools has to be too,
> >> according to my understanding.
> >>
> >> Cheers
> >> Joris
> >>
> >> On 14 Jun 2016 18:56, "Jan Górecki"  wrote:
> >>
> >> Hi Thierry,
> >>
> >> I'm perfectly aware of it. Any idea when devtools would be shipped as
> >> a base R package, or at least as a recommended package? That would
> >> actually address the problem described in my email.
> >> I have a range of useful functions available in the tools/utils packages,
> >> which are shipped together with R. They don't require any OS dependencies
> >> or R dependencies, unlike devtools, which requires both. Installing
> >> unnecessary OS dependencies and R dependencies just for such a simple
> >> wrapper doesn't seem to be an elegant way to address it, hence my
> >> proposal to include that simple function in the tools or utils package.
> >>
> >> Regards,
> >> Jan Gorecki
> >>
> >> On 14 June 2016 at 16:17, Thierry Onkelinx wrote:
> >>> Dear Jan,
> >>>
> >>> Similar functionality is available in devtools::dev_package_deps()
> >>>
> >>> Best regards,
> >>>
> >>> ir. Thierry Onkelinx
> >>> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
> >>> and Forest
> >>> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> >>> Kliniekstraat 25
> >>> 1070 Anderlecht
> >>> Belgium
> >>>
> >>> To call in the statistician after the experiment is done may be no more
> >>> than asking him to perform a post-mortem examination: he may be able to
> >>> say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> >>> The plural of anecdote is not data. ~ Roger Brinner
> >>> The combination of some data and an aching desire for an answer does not
> >>> ensure that a reasonable answer can be extracted from a given body of
> >>> data. ~ John Tukey
> >>>
> >>> 2016-06-14 16:54 GMT+02:00 Jan Górecki :
> 
>  Hi all,
> 
>  Packages tools and utils have a lot of useful stuff for R developers.
>  I find one task still not as straightforward as it could be: simply
>  extracting the dependencies of a package from its DESCRIPTION file (before
>  it is even installed to a library). This would be valuable for automating
>  CI setup in a more metadata-driven way.
>  I know the function below is short and simple, but having to define it
>  in each CI workflow is a pain; it could already be available in the
>  tools or utils namespace.
> 
>  package.dependencies.dcf <- function(file = "DESCRIPTION",
>                                       which = c("Depends", "Imports", "LinkingTo")) {
>    stopifnot(file.exists(file), is.character(which))
>    which_all <- c("Depends", "Imports", "LinkingTo", "Suggests", "Enhances")
>    if (identical(which, "all"))
>      which <- which_all
>    else if (identical(which, "most"))
>      which <- c("Depends", "Imports", "LinkingTo", "Suggests")
>    stopifnot(which %in% which_all)
>    dcf <- read.dcf(file, which)
>    # parse fields
>    raw.deps <- unlist(strsplit(dcf[!is.na(dcf)], ",", fixed = TRUE))
>    # strip stated dependency version
>    deps <- trimws(sapply(strsplit(trimws(raw.deps), "(", fixed = TRUE), `[[`, 1L))
>    # exclude base R pkgs
>    base.pkgs <- c("R", rownames(installed.packages(priority = "base")))
>    setdiff(deps, base.pkgs)
>  }
> 
>  This makes it easy to install all package dependencies based only on the
>  DESCRIPTION file, which simplifies custom CI workflows to:
> 
>  if (length(pkgs <- package.dependencies.dcf(which = "all")))
>    install.packages(pkgs)
> 
>  It would also not require installing custom packages or shell scripts.
> 
>  Regards,
>  Jan Gorecki
>
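
For illustration, the parsing steps of the package.dependencies.dcf function
quoted above can be traced on a small example. This is only a sketch: the
DESCRIPTION contents below are made up, and the version constraint is
stripped with sub() rather than the strsplit()/`[[` combination used in the
quoted function.

```r
# Write a made-up DESCRIPTION file to a temporary location
desc <- tempfile()
writeLines(c(
  "Package: demo",
  "Version: 0.1.0",
  "Depends: R (>= 3.1.0), methods",
  "Imports: data.table (>= 1.9.6), jsonlite",
  "Suggests: testthat"
), desc)

# Read only the dependency fields; read.dcf() returns a character matrix
# with one column per requested field
fields <- c("Depends", "Imports", "Suggests")
dcf <- read.dcf(desc, fields = fields)

# Split the comma-separated entries, then drop any "(>= x.y.z)" constraint
raw <- unlist(strsplit(dcf[!is.na(dcf)], ",", fixed = TRUE))
deps <- trimws(sub("\\(.*$", "", raw))

# Exclude R itself and the packages with priority "base" (e.g. methods)
base_pkgs <- c("R", rownames(installed.packages(priority = "base")))
deps <- setdiff(deps, base_pkgs)
```

After these steps deps holds the three non-base packages named in the file:
data.table, jsonlite and testthat.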
