Re: [R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

Rui Barradas Thu, 27 Sep 2012 19:18:09 -0700

Hello,

If you really need to trash your disk, why not use seek()?


 > fl <- file("Test.txt", open = "wb")
 > seek(fl, where = 1024, origin = "start", rw = "write")
[1] 0
 > writeChar(character(1), fl, nchars = 1, useBytes = TRUE)
Warning message:
In writeChar(character(1), fl, nchars = 1, useBytes = TRUE) :
   writeChar: more characters requested than are in the string - will zero-pad
 > close(fl)


File "Test.txt" is now about 1 KB in size (slightly more than 1024 bytes, since the write starts at offset 1024).
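
For an exact size, the same idea can be wrapped in a small helper. This is only a sketch: make_empty_file and its arguments are names made up here, and it writes the final byte with writeBin() instead of writeChar() to avoid the warning above. Whether the skipped range is stored sparsely depends on the filesystem, and ?seek itself discourages relying on seek() on Windows, so test it there first.

```r
# Hypothetical helper: create a file of exactly n_bytes bytes by seeking to
# the position of the last byte and writing a single zero byte there.
make_empty_file <- function(path, n_bytes) {
  stopifnot(n_bytes >= 1)
  con <- file(path, open = "wb")
  on.exit(close(con))
  # Offsets are zero-based, so the last byte sits at n_bytes - 1.
  seek(con, where = n_bytes - 1, origin = "start", rw = "write")
  writeBin(raw(1), con)
  invisible(path)
}

tmp <- tempfile(fileext = ".bin")
make_empty_file(tmp, 1024)
file.size(tmp)
```

Because n_bytes is passed as an ordinary numeric (double), the helper itself does not impose an integer limit on the size, although large offsets still depend on the platform's large-file support.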

Hope this helps,

Rui Barradas
On 27-09-2012 20:17, Jonathan Greenberg wrote:
> Folks:
>
> Asked this question some time ago, and found what appeared (at first) to be
> the best solution, but I'm now finding a new problem.  First off, it seemed
> like ff as Jens suggested worked:
>
> # outdata_ncells = the number of rows * number of columns * number of bands in an image:
> out<-ff(vmode="double",length=outdata_ncells,filename=filename)
> finalizer(out) <- close
> close(out)
>
> This was working fine until I attempted to set length to a VERY large
> number: outdata_ncells = 17711913600.  This would create a file that is
> 131.964GB.  Big, but not obscenely so (and certainly not larger than the
> filesystem can handle).  However, length appears to be restricted
> by .Machine$integer.max (I'm on a 64-bit Windows box):
>> .Machine$integer.max
> [1] 2147483647
>
> Any suggestions on how to solve this problem for much larger file sizes?
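
A quick check of the numbers quoted above, as a sketch: 8 bytes per cell is assumed from vmode="double", and the variable names are illustrative only. R holds the cell count as a double without loss, but it is far beyond what fits in an integer.

```r
n_cells        <- 17711913600        # value from the quoted message
bytes_per_cell <- 8                  # assumption: one double per cell
n_bytes        <- n_cells * bytes_per_cell

n_bytes / 2^30                       # file size in GiB, about 131.964
n_cells > .Machine$integer.max       # TRUE: too large for an integer length
```
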
>
> --j
>
>
> On Thu, May 3, 2012 at 10:44 AM, Jonathan Greenberg <j...@illinois.edu> wrote:
>
>> Thanks, all!  I'll try these out.  I'm trying to work up something that is
>> platform independent (if possible) for use with mmap.  I'll do some tests
>> on these suggestions and see which works best. I'll try to report back in a
>> few days.  Cheers!
>>
>> --j
>>
>>
>>
>> 2012/5/3 "Jens Oehlschlägel" <jens.oehlschlae...@truecluster.com>
>>
>>> Jonathan,
>>>
>>> On some filesystems (e.g. NTFS, see below) it is possible to create
>>> 'sparse' memory-mapped files, i.e. reserving the space without the cost of
>>> actually writing initial values.
>>> Package 'ff' does this automatically and also allows the file to be
>>> accessed in parallel. Check the example below and see how creating a big
>>> file is immediate.
>>>
>>> Jens Oehlschlägel
>>>
>>>
>>>> library(ff)
>>>> library(snowfall)
>>>> ncpus <- 2
>>>> n <- 1e8
>>>> system.time(
>>> + x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
>>> + )
>>>         User      System     elapsed
>>>         0.01        0.00        0.02
>>>> # check finalizer, with an explicit filename we should have a 'close' finalizer
>>>> finalizer(x)
>>> [1] "close"
>>>> # if not, set it to 'close' in order to not let slaves delete x on slave shutdown
>>>> finalizer(x) <- "close"
>>>> sfInit(parallel=TRUE, cpus=ncpus, type="SOCK")
>>> R Version:  R version 2.15.0 (2012-03-30)
>>>
>>> snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2
>>> CPUs.
>>>
>>>> sfLibrary(ff)
>>> Library ff loaded.
>>> Library ff loaded in cluster.
>>>
>>> Warning message:
>>> In library(package = "ff", character.only = TRUE, pos = 2, warn.conflicts
>>> = TRUE,  :
>>>    'keep.source' is deprecated and will be ignored
>>>> sfExport("x") # note: do not export the same ff multiple times
>>>> # explicitly opening avoids a gc problem
>>>> # opening with 'mmeachflush' instead of 'mmnoflush' is a bit slower but
>>>> # prevents OS write storms when the file is larger than RAM
>>>> sfClusterEval(open(x, caching="mmeachflush"))
>>> [[1]]
>>> [1] TRUE
>>>
>>> [[2]]
>>> [1] TRUE
>>>
>>>> system.time(
>>> + sfLapply( chunk(x, length=ncpus), function(i){
>>> +   x[i] <- runif(sum(i))
>>> +   invisible()
>>> + })
>>> + )
>>>         User      System     elapsed
>>>         0.00        0.00       30.78
>>>> system.time(
>>> + s <- sfLapply( chunk(x, length=ncpus), function(i) quantile(x[i],
>>> +   c(0.05, 0.95)) )
>>> + )
>>>         User      System     elapsed
>>>         0.00        0.00        4.38
>>>> # for completeness
>>>> sfClusterEval(close(x))
>>> [[1]]
>>> [1] TRUE
>>>
>>> [[2]]
>>> [1] TRUE
>>>
>>>> csummary(s)
>>>               5%  95%
>>> Min.    0.04998 0.95
>>> 1st Qu. 0.04999 0.95
>>> Median  0.05001 0.95
>>> Mean    0.05001 0.95
>>> 3rd Qu. 0.05002 0.95
>>> Max.    0.05003 0.95
>>>> # stop slaves
>>>> sfStop()
>>> Stopping cluster
>>>
>>>> # with the close finalizer we are responsible for deleting the file explicitly (unless we want to keep it)
>>>> delete(x)
>>> [1] TRUE
>>>> # remove r-side metadata
>>>> rm(x)
>>>> # truly free memory
>>>> gc()
>>>
>>>
>>>   *Sent:* Thursday, 03 May 2012 at 00:23
>>> *From:* "Jonathan Greenberg" <j...@illinois.edu>
>>> *To:* r-help <r-help@r-project.org>, r-sig-...@r-project.org
>>> *Subject:* [R-sig-hpc] Quickest way to make a large "empty" file on
>>> disk?
>>>   R-helpers:
>>>
>>> What would be the absolute fastest way to make a large "empty" file (e.g.
>>> filled with all zeroes) on disk, given a byte size and a given
>>> number of empty values? I know I can use writeBin, but the "object" in
>>> this case may be far too large to store in main memory. I'm asking because
>>> I'm going to use this file in conjunction with mmap to do parallel writes
>>> to this file. Say, I want to create a blank file of 10,000 floating point
>>> numbers.
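
One way around the memory concern with writeBin, sketched under assumptions: write the zeros in fixed-size chunks so that only one chunk is ever held in memory. write_zeros and chunk_size are hypothetical names, and since this really writes every byte it is unlikely to be the fastest option for very large files.

```r
# Hypothetical helper: write n zero-valued doubles to path in chunks of at
# most chunk_size values, so memory use is bounded by one chunk.
write_zeros <- function(path, n, chunk_size = 1e6) {
  con <- file(path, open = "wb")
  on.exit(close(con))
  remaining <- n
  while (remaining > 0) {
    k <- min(remaining, chunk_size)
    writeBin(numeric(k), con)   # numeric(k) is a vector of k zeros
    remaining <- remaining - k
  }
  invisible(path)
}

tmp <- tempfile(fileext = ".bin")
write_zeros(tmp, 10000, chunk_size = 4096)
file.size(tmp) == 10000 * 8     # 8 bytes per double
```
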
>>>
>>> Thanks!
>>>
>>> --j
>>>
>>> --
>>> Jonathan A. Greenberg, PhD
>>> Assistant Professor
>>> Department of Geography and Geographic Information Science
>>> University of Illinois at Urbana-Champaign
>>> 607 South Mathews Avenue, MC 150
>>> Urbana, IL 61801
>>> Phone: 415-763-5476
>>> AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
>>> http://www.geog.illinois.edu/people/JonathanGreenberg.html
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-hpc mailing list
>>> r-sig-...@r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>>>
>>>
>>>
>>
>>
>
>
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

