[R-pkg-devel] Possible bug report/false positive

2022-01-06 Thread Igor L
To whom it may concern,
I'm using devtools::check() and trying to upload my package to CRAN (the
code is available on GitHub), but I'm not able to understand why the check
is showing 1 NOTE:


-- R CMD check results

dail 0.0.0.9000 
Duration: 17.9s
> checking R code for possible problems ... NOTE
  Found an obsolete/platform-specific call in the following function:
'requests'
  Found the platform-specific device:
'X11'
  dev.new() is the preferred way to open a new device, in the unlikely
  event one is needed.
  requests: no visible binding for global variable 'protocolo'
  Undefined global functions or variables:
protocolo
0 errors ✔ | 0 warnings ✔ | 1 note ✖


After uploading it to CRAN, I noticed that: (1) this only happens on
Debian; and (2) as pointed out in this answer, the check may be
interpreting X11 as the X11() device function from grDevices.
Does anyone know the reason for this and how to solve it?

Best regards,
*Igor Laltuf Marques*
https://igorlaltuf.github.io/

[[alternative HTML version deleted]]

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Possible bug report/false positive

2022-01-06 Thread Igor L
Ivan,
Declaring the variables inside the function solved the problem (protocolo
<- X1 <- X2 <- ... X21 <- NULL).
Thanks for the help and suggestions for improving the code.
Best Regards,
Igor

On Thu, Jan 6, 2022 at 11:37 AM, Ivan Krylov 
wrote:

> On Wed, 5 Jan 2022 17:15:20 -0300
> Igor L  wrote:
>
> >   Found the platform-specific device:
> >'X11'
>
> > how to solve it?
>
> One of the tricks that work (in the sense that calls to functions using
> non-standard evaluation don't result in warnings about "Undefined
> global functions or variables") is to declare the variables locally,
> inside the function:
>
> protocolo <- X1 <- X2 <- ... X21 <- NULL
> var <- readr::read_csv2(...) ...
>
> Alternatively, since you know that the file always has 21 columns,
> you can pass the variable to `colnames<-` instead of dplyr::rename,
> together with a vector of column names you already have in the
> nomes.colunas vector. This way, you won't need to declare the 21 dummy
> variables.
>
> By the way, you shouldn't declare functions like download.file and unzip
> as global variables. Instead, import them from the utils package in your
> NAMESPACE (or using the @importFrom Roxygen tag, if you use Roxygen).
>
> There are other ways the package code could be improved:
>
>  - There doesn't seem to be a need for the dynamically-named variable
>you create using assign(paste0('pedidos', i), var) and remove soon
>after; you can just use `var` instead of get(paste0('pedidos', i)).
>  - If you're worried about leaving temporary variables around, move
>the loop body into a separate function so that anything you don't
>return from it would be cleaned up automatically.
>  - You can future-proof your package by creating the URLs with
>    paste0('https://dadosabertos-download.cgu.gov.br/FalaBR/Arquivos_FalaBR_Filtrado/Arquivos_csv_',
>           year, '.zip') instead of hard-coding their list. It seems likely to
>    me that once the 2022 Right to Information Law report is available,
>    it'll have a predictable URL. If not, then you'll update the
>    package (as you were going to anyway).
>  - If you need to iterate over indices in a vector, use for (idx in
>seq_along(vector)) instead of for (i in vector) and match() to
>find the index. Though in this case, the code can be modified to
>avoid the need for the index in the loop body.
>
> --
> Best regards,
> Ivan
>
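For reference, a compact sketch of the two workarounds described above; the file path, the column count, and the contents of nomes.colunas are assumptions based on the thread, not code taken from the package:

```r
# Option 1: declare the NSE column names locally, so R CMD check
# finds a binding for them (abbreviated here; repeat for X3..X20).
requests_nse <- function(file) {
  protocolo <- X1 <- X2 <- X21 <- NULL
  var <- readr::read_csv2(file)
  dplyr::filter(var, !is.na(protocolo))
}

# Option 2: skip dplyr::rename and assign all 21 column names at
# once via `colnames<-`, so no dummy variables are needed.
set_names_at_once <- function(var, nomes.colunas) {
  colnames(var) <- nomes.colunas
  var
}
```

Option 2 sidesteps the NOTE entirely, because no non-standard evaluation of column names is involved.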



[R-pkg-devel] Non-ASCII and CRAN Checks

2022-09-19 Thread Igor L
Hello everybody,

I'm testing my package with the devtools::check() function and I got a
"found non-ASCII strings" warning.

These characters are in a dataframe and, as they are names of institutions
used to filter databases, it makes no sense to translate them.

Is there any way to make the check accept these characters?

They are in latin1 encoding.

Thanks in advance!
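One workaround I believe is commonly used (my suggestion; this thread shows no reply) is to keep the data as-is but convert the latin1 strings to marked UTF-8 before saving with usethis::use_data(); the remaining note about "marked UTF-8 strings" can then be explained in cran-comments. The column name instituicao below is hypothetical:

```r
# Convert latin1-encoded columns to (marked) UTF-8 before saving
# the dataframe with usethis::use_data().
df <- data.frame(instituicao = "Funda\xe7\xe3o", stringsAsFactors = FALSE)
df$instituicao <- iconv(df$instituicao, from = "latin1", to = "UTF-8")
Encoding(df$instituicao)  # "UTF-8"
```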

--
*Igor Laltuf Marques*
Economist (UFF)
Master in urban and regional planning (IPPUR-UFRJ)
Researcher at ETTERN e CiDMob
https://igorlaltuf.github.io/



[R-pkg-devel] How to decrease time to import files in xlsx format?

2022-10-04 Thread Igor L
Hello all,

I'm developing an R package that basically downloads, imports, cleans and
merges nine files in xlsx format updated monthly from a public institution.

The problem is that importing files in xlsx format is time consuming.

My initial idea was to parallelize the execution of the read_xlsx function
according to the number of cores in the user's processor, but apparently it
didn't make much difference: when I parallelized it, the execution time
only went from 185.89 to 184.12 seconds:

# not parallelized code
y <- purrr::map_dfr(paste0(dir.temp, '/', lista.arquivos.locais),
                    readxl::read_excel, sheet = 1, skip = 4,
                    col_types = c(rep('text', 30)))

# parallelized code
future::plan(future::multicore, workers = 4)
y <- furrr::future_map_dfr(paste0(dir.temp, '/', lista.arquivos.locais),
                           readxl::read_excel, sheet = 1, skip = 4,
                           col_types = c(rep('text', 30)))

 Any suggestions to reduce the import processing time?

Thanks in advance!

-- 
*Igor Laltuf Marques*
Economist (UFF)
Master in Urban and Regional Planning (IPPUR-UFRJ)
Researcher at ETTERN and CiDMob
https://igorlaltuf.github.io/



Re: [R-pkg-devel] How to decrease time to import files in xlsx format?

2022-10-05 Thread Igor L
According to my internet research, it looks like readxl is the fastest
package.

The profvis package indicated that the bottleneck is indeed in importing
the files.

My processor has six cores, but when I use four of them the computer
crashes completely. When I use three cores, it's still usable. So I ran
one more benchmark comparing a for loop, map_dfr and future_map_dfr (with
multisession and three cores).

After the benchmark was run 10 times, the result was:

 expr              min      lq       mean     median   uq       max      neval
 import_for()      140.9940 147.9722 160.7229 155.6459 172.4661 199.1059 10
 import_map_dfr()  161.6707 339.6769 480.5760 567.8389 643.8895 666.0726 10
 import_furrr()    112.1374 116.4301 127.5976 129.0067 137.9179 140.8632 10
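For context, a comparison like the one above can be set up with the microbenchmark package; the import_* functions here are placeholders standing in for the three import strategies, not functions from the package:

```r
library(microbenchmark)

# Placeholder import functions: in the real comparison each wraps one
# strategy (for loop, purrr::map_dfr, furrr::future_map_dfr).
import_for     <- function() Sys.sleep(0.01)
import_map_dfr <- function() Sys.sleep(0.01)
import_furrr   <- function() Sys.sleep(0.01)

res <- microbenchmark(
  import_for(),
  import_map_dfr(),
  import_furrr(),
  times = 10   # the benchmark in this thread was run 10 times
)
print(res)
```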

For me it is proven that using the furrr package is the best solution in
this case, but what would explain so much difference with map_dfr?

On Tue, Oct 4, 2022 at 4:58 PM, Jeff Newmiller <
jdnew...@dcn.davis.ca.us> wrote:

> It looks like you are reading directly from URLs? How do you know the
> delay is not network I/O delay?
>
> Parallel computation is not a panacea. It allows tasks _that are
> CPU-bound_ to get through the CPU-intensive work faster. You need to be
> certain that your tasks actually can benefit from parallelism before using
> it... there is a significant overhead and added complexity to using
> parallel processing that will lead to SLOWER processing if mis-used.
>
> On October 4, 2022 11:29:54 AM PDT, Igor L  wrote:
> >Hello all,
> >
> >I'm developing an R package that basically downloads, imports, cleans and
> >merges nine files in xlsx format updated monthly from a public
> institution.
> >
> >The problem is that importing files in xlsx format is time consuming.
> >
> >My initial idea was to parallelize the execution of the read_xlsx function
> >according to the number of cores in the user's processor, but apparently
> it
> >didn't make much difference, since when trying to parallelize it the
> >execution time went from 185.89 to 184.12 seconds:
> >
> ># not parallelized code
> >y <- purrr::map_dfr(paste0(dir.temp, '/', lista.arquivos.locais),
> >   readxl::read_excel, sheet = 1, skip = 4, col_types =
> >c(rep('text', 30)))
> >
> ># parallelized code
> >future::plan(future::multicore, workers = 4)
> >y <- furrr::future_map_dfr(paste0(dir.temp, '/', lista.arquivos.locais),
> >                           readxl::read_excel, sheet = 1, skip = 4,
> >                           col_types = c(rep('text', 30)))
> >
> > Any suggestions to reduce the import processing time?
> >
> >Thanks in advance!
> >
>
> --
> Sent from my phone. Please excuse my brevity.
>



[R-pkg-devel] Problem with package containing spatial data

2023-02-08 Thread Igor L
Hello all,

I'm developing a package that contains spatial data about public safety in
Rio de Janeiro.

The problem is that when I use the usethis::use_data() function, which
transforms the shapefile data into a file with the .rda extension, I
cannot use the geometry attribute to create a map.

E.g.:

# Raw-data script:

spatial_aisp <- sf::st_read('data-raw/shp_aisp/lm_aisp_2019.shp')

plot(spatial_aisp) # works

# Same data from the .rda file after usethis::use_data(spatial_aisp,
overwrite = TRUE)

x <- ispdata::spatial_aisp

plot(x) # does not work

Error message:
Error in data.matrix(x) :
  'list' object cannot be coerced to type 'double'


This is happening with all spatial data in the package. I'm using LazyData:
true and have already disabled the file compression options, but the
problem persists.

Any ideas?

Scripts can be accessed at https://github.com/igorlaltuf/ispdata

Thanks!
-- 
*Igor Laltuf Marques*
Economist (UFF)
Master in Urban and Regional Planning (IPPUR-UFRJ)
Researcher at ETTERN and CiDMob Laboratories
https://igorlaltuf.github.io/



Re: [R-pkg-devel] Problem with package containing spatial data

2023-02-09 Thread Igor L
Thank you all for your help.

Adam, I followed your suggestion, but I still can't figure out why the data
is only available locally when I run the devtools::load_all() function.

When I install the package from GitHub, the data does not appear (even
using the data() function).

Extdata: https://github.com/igorlaltuf/ispdata/tree/main/inst/extdata
Import: https://github.com/igorlaltuf/ispdata/blob/main/R/spatial_cisp.R

And is there any way to reduce the size of files in gpkg format? The
package increased from 5 to 10 megabytes. I tried to make it smaller with
the function sf::st_write("inst/extdata/spatial_aisp.gpkg", compress =
"deflate", append = F), but the size remained the same.
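On the file-size question: a GeoPackage is an SQLite database and is not compressed by the driver, so one alternative (my suggestion, not something from this thread) is to shrink the geometries themselves before writing, e.g. with sf::st_simplify():

```r
# Simplify polygon boundaries before writing the GeoPackage.
# dTolerance is in the units of the CRS (assumed here to be meters,
# i.e. a projected CRS); larger values give smaller files but
# coarser borders.
spatial_aisp_small <- sf::st_simplify(spatial_aisp, dTolerance = 100)
sf::st_write(spatial_aisp_small, "inst/extdata/spatial_aisp.gpkg",
             append = FALSE)
```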

Thanks again for all the help.

On Thu, Feb 9, 2023 at 2:41 PM, Alexandre Courtiol <
alexandre.court...@gmail.com> wrote:

> Hi Igor,
>
> I had the same issue using terra rather than sf a couple of weeks ago.
>
> I thought of solving the issue as follows:
>
> 1. Store the shapefiles under extdata.
>
> 2. Create a function that loads the files:
>
> .build_internal_files <- function() {
>   ## This function should not be called by the user.
>   ## It performs the lazy loading of the data since terra cannot handle
>   ## rda files.
>   assign("CountryBorders",
>          terra::vect(system.file("extdata/CountryBorders.shp", package = "IsoriX")),
>          envir = as.environment("package:IsoriX"))
>   assign("OceanMask",
>          terra::vect(system.file("extdata/OceanMask.shp", package = "IsoriX")),
>          envir = as.environment("package:IsoriX"))
> }
>
> 3. Call that function automatically upon attach using .onAttach():
>
> .onAttach <- function(libname, pkgname) {
>   .build_internal_files() ## lazy loading of the internal data
> }
>
> It seems to work...
>
> Note that .onAttach() is a standard way of defining a function that is
> recognised by R and run when the package is attached.
>
> ++
>
> Alex
>
> On Thu, 9 Feb 2023 at 11:11, Duncan Murdoch 
> wrote:
>
>> On 09/02/2023 3:56 a.m., Ivan Krylov wrote:
>> > On Wed, 8 Feb 2023 11:32:36 -0300
>> > Igor L  wrote:
>> >
>> >> spatial_aisp <- sf::st_read('data-raw/shp_aisp/lm_aisp_2019.shp')
>> >>
>> >> plot(spatial_aisp) # works
>> >>
>> >> # Same data from .rda file after use usethis::use_data(spatial_aisp,
>> >> overwrite = TRUE)
>> >>
>> >> x <- ispdata::spatial_aisp
>> >>
>> >> plot(x) # does not work
>> >
>> > Does this break in a new R session, but start working when you load the
>> > sf namespace? I think that your package needs to depend on sf in order
>> > for this to work. Specifying it in Imports may be enough to make the
>> > plot.sf S3 method available to the user.
>>
>> Specifying a package in the Imports field of DESCRIPTION guarantees that
>> it will be available to load, but doesn't load it.  Importing something
>> from it via the NAMESPACE triggers a load, as does executing code like
>> pkg::fn, or explicitly calling loadNamespace("pkg"), or loading a
>> package that does one of these things.
>>
>>
>> > You may encounter other problems if you go this way, like R CMD check
>> > complaining that you don't use the package you're importing. Loading
>> > the data from a file on demand would also load the sf namespace and
>> > thus solve the problem.
>>
>> Workarounds for the check complaints are discussed here, among other
>> places:  https://stackoverflow.com/a/75384338/2554330 .
>>
>> Duncan Murdoch
>>
>
>
> --
> Alexandre Courtiol, www.datazoogang.de
>
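Putting Ivan's and Duncan's points together, a minimal sketch of one way to make the plot method available to users (assuming sf is already listed in Imports) is to load the sf namespace when the package itself is loaded, e.g. in R/zzz.R (hypothetical file name):

```r
# R/zzz.R
.onLoad <- function(libname, pkgname) {
  # Loading the sf namespace registers its S3 methods (plot.sf, etc.),
  # so plot() dispatches correctly on the package's sf objects even if
  # the user never calls library(sf).
  requireNamespace("sf", quietly = TRUE)
}
```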
