Re: [Rd] Problem with parseData
Dear Barbara,

thank you for the report. This is something to be fixed in R - I am now testing a patch that adds the extra node for the equality assignment expression.

Best,
Tomas

On 07/30/2018 05:35 PM, Barbara Lerner wrote:

Hi,

I have run into a problem with parseData from the utils package. When an assignment is done with = instead of <-, the information provided by parseData does not include an entry for the assignment.

For this input, stored in file "BadPosition.R":

    y <- 5
    foo = 7

And running this code:

    parsed <- parse("BadPosition.R", keep.source=TRUE)
    parsedData <- utils::getParseData(parsed, includeText=TRUE)
    print(paste("parseData =", parsedData))

I get the following output:

[1] "parseData = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)"
[2] "parseData = c(1, 1, 1, 3, 6, 6, 1, 1, 5, 7, 7)"
[3] "parseData = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)"
[4] "parseData = c(6, 1, 1, 4, 6, 6, 3, 3, 5, 7, 7)"
[5] "parseData = c(7, 1, 3, 2, 4, 5, 10, 12, 11, 13, 14)"
[6] "parseData = c(0, 3, 7, 7, 5, 7, 12, 0, 0, 14, 0)"
[7] "parseData = c(\"expr\", \"SYMBOL\", \"expr\", \"LEFT_ASSIGN\", \"NUM_CONST\", \"expr\", \"SYMBOL\", \"expr\", \"EQ_ASSIGN\", \"NUM_CONST\", \"expr\")"
[8] "parseData = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)"
[9] "parseData = c(\"y <- 5\", \"y\", \"y\", \"<-\", \"5\", \"5\", \"foo\", \"foo\", \"=\", \"7\", \"7\")"

Notice how there is an entry for "y <- 5" beginning on line 1, column 1, ending at line 1, column 6, but there is no analogous entry for "foo = 7".

I am running R 3.5.0 on a Mac running macOS 10.12.6. Thanks for your help and please let me know if you need any further information.

Barbara
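[A quick way to see the missing entry. This is a minimal sketch of the check, my illustration rather than part of the original report; it assumes the same "BadPosition.R" file as above:]

    pd <- utils::getParseData(parse("BadPosition.R", keep.source = TRUE),
                              includeText = TRUE)
    # the <- assignment has a wrapping "expr" entry covering the whole statement...
    nrow(subset(pd, text == "y <- 5"))   # 1
    # ...but under the bug no entry covers the = assignment as a whole
    nrow(subset(pd, text == "foo = 7"))  # 0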
Re: [Rd] longint
On 15 August 2018 at 20:32, Benjamin Tyner wrote:
| Thanks for the replies and for confirming my suspicion.
|
| Interestingly, src/include/S.h uses a trick:
|
| #define longint int
|
| and so does the nlme package (within src/init.c).

As Bill Dunlap already told you, this is a) ancient and b) was concerned with the int 16-bit to 32-bit transition period. I.e. a long time ago. Old C programmers remember.

You should preferably not even use 'long int' on the other side but rely on the fact that all compilers nowadays allow you to specify exactly what size is used via int64_t (long), int32_t (int), ... and the unsigned cousins (which R does not have).

So please receive the value as an int64_t and then cast it to an int32_t -- which corresponds to R's notion of an integer on every platform. And please note that the conversion is lossy.

If you must keep 64 bits then the bit64 package by Jens Oehlschlaegel is good and e.g. fully supported inside data.table. We use it for 64-bit integers as nanosecond timestamps in our nanotime package (which has some converters).

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
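[A small R illustration of the lossiness Dirk mentions -- my sketch, assuming the bit64 package is installed:]

    library(bit64)
    x <- as.integer64("2147483648")  # 2^31, one past .Machine$integer.max
    x                                # held exactly as a 64-bit integer
    as.integer(x)                    # NA with a warning: the cast to R's 32-bit integer is lossy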
Re: [Rd] longint
On 08/16/2018 05:12 AM, Dirk Eddelbuettel wrote:
> So please receive the value as an int64_t and then cast it to an
> int32_t -- which corresponds to R's notion of an integer on every
> platform.

Only on Intel platforms int is 32 bits. Strictly speaking int is only required to be >= 16 bits. Who knows what the size of an int is on the Sunway TaihuLight for example ;-)

H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Re: [Rd] longint
On 16/08/2018 18:33, Hervé Pagès wrote:
> On 08/16/2018 05:12 AM, Dirk Eddelbuettel wrote:
>> ... rely on the fact that all compilers nowadays allow you to specify
>> exactly what size is used via int64_t (long), int32_t (int), ... and
>> the unsigned cousins

Well, not all compilers. Those types were introduced in C99, but are optional in that standard and in C11 and C++11. I have not checked C++1[47], but expect they are also optional there. int_fast64_t is not optional in C99, so R uses that if int64_t is not supported. [It is easy to overlook that they are optional in C99, and at one time R assumed them.]

> Only on Intel platforms int is 32 bits. Strictly speaking int is only
> required to be >= 16 bits. Who knows what the size of an int is on the
> Sunway TaihuLight for example ;-)

R's configure checks that int is 32 bit and will not compile without it (src/main/arithmetic.c) ... so int and int32_t are the same on all platforms where the latter is defined.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
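[The 32-bit guarantee can be checked from within R itself; a small illustrative snippet of mine:]

    .Machine$integer.max               # 2147483647, i.e. 2^31 - 1, on every platform R builds on
    .Machine$integer.max == 2^31 - 1   # TRUE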
Re: [Rd] longint
On 08/16/2018 11:30 AM, Prof Brian Ripley wrote:
> R's configure checks that int is 32 bit and will not compile without it
> (src/main/arithmetic.c) ... so int and int32_t are the same on all
> platforms where the latter is defined.

Good to know. Thanks for the clarification!

--
Hervé Pagès
[Rd] Thanks for help with validspamobject
Hi,

Thanks for all your help. The problem with an error involving validspamobject() has been resolved, as a new version of spdep (0.7-7) was just released and it seems to have stopped using the deprecated function.

Ron B.
Re: [Rd] Package compiler - efficiency problem
Dear Tomas,

thank you for the prompt response and for taking an interest in this issue. I really appreciate your compiler project and the efficiency gains in the usual case. I am aware of the limitations of interpreted languages too, and because of that, even when writing my first mail I had a hunch that it would not be easy to address this problem. As you mentioned, optimising the compiler for handling non-standard code may be tricky and harmful for usual code. The question is whether gEcon is the only package that may face the same issue because of compilation.

The functions generated by gEcon are systems of non-linear equations defining the equilibrium of an economy (see http://gecon.r-forge.r-project.org/files/gEcon-users-guide.pdf if you want to learn a bit about how we obtain it). The rows you suggested vectorising are indeed vectorisable because they define equilibrium for similar markets (e.g. production and sale of beverages and food), but they do not have to be vectorisable in the general case. So as not to delve into too much detail, I will stop here in describing how the equations originate. However, I would like to point out that similar large systems of equations may arise in other fields ( https://en.wikipedia.org/wiki/Steady_state ) and there may be other packages that generate similarly large systems (e.g. network problems like hydraulic networks). In that case, reports such as mine may help you to assess the scale of the problem.

Thank you for the suggestions for improvement in our approach; I am going to discuss them with the other package developers.

Regards,
Karol Podemski

On Mon, 13 Aug 2018 at 18:02, Tomas Kalibera wrote:

> Dear Karol,
>
> thank you for the report. I can reproduce that the function from your
> example takes very long to compile and I can see where most time is
> spent. The compiler is itself written in R and requires a lot of
> resources for large functions (foo() has over 16,000 lines of code,
> nearly 1 million instructions/operands, 45,000 constants). In
> particular, a lot of time is spent in garbage collection and in finding
> a unique set of constants. Some optimizations of the compiler may be
> possible, but it is unlikely that functions this large will compile
> fast any time soon. For non-generated code, we now have byte-compilation
> on installation by default, which at least removes the compile overhead
> from runtime. Even though the compiler is slow, it is important to keep
> in mind that in principle, with any compiler there will be functions
> where compilation does not improve performance (whether the compile
> time is included or not).
>
> I think it is not a good idea to generate code for functions like foo()
> in R (or any interpreted language). You say that R's byte-code compiler
> produces code that runs 5-10x faster than when the function is
> interpreted by the AST interpreter (uncompiled), which sounds like a
> good result, but I believe that avoiding code generation would be much
> faster than that, apart from drastically reducing code size and
> therefore compile time. The generator of these functions has much more
> information than the compiler - it could be turned into an interpreter
> of these functions and compute their values on the fly.
>
> A significant source of inefficiency in the generated code is
> element-wise operations, such as
>
> r[12] <- -vv[88] + vv[16] * (1 + ppff[1307])
> ...
> r[139] <- -vv[215] + vv[47] * (1 + ppff[1434])
>
> (these could be vectorized, which would reduce code size and improve
> interpretation speed; and make the code somewhat readable). Most of the
> code lines in the generated functions seem to be easily vectorizable.
>
> Compilers and interpreters necessarily use some heuristics or optimize
> at some code patterns. Optimizing for generated code may be tricky as
> it could even harm performance of usual code. And I would much rather
> optimize the compiler for the usual code.
>
> Indeed, a pragmatic solution requiring the least amount of work would
> be to disable compilation of these generated functions. There is not a
> documented way to do that and maybe we could add it (technically it is
> trivial), but I have been reluctant so far - in some cases, compilation
> even of these functions may be beneficial - if the speedup is 5-10x and
> we run them very many times. But once the generated code included some
> pragma preventing compilation, it would never be compiled. Also, the
> trade-offs may change as the compiler evolves, perhaps not in this
> case, but in others where such a pragma may be used.
>
> Well, so the short answer would be that these functions should not be
> generated in the first place. If it were too much work to rewrite,
> perhaps the generator could just be improved to produce vectorized
> operations.
>
> Best,
> Tomas
>
> On 12.8.2018 21:31, Karol Podemski wrote:
> > Dear R team,
> >
> > I am a co-author and maintainer of one of R packages distributed by
> > R-forge (gEcon). One of gEcon package users found a strange behaviour of
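[For illustration, here is one way the vectorization Tomas suggests could look. This is my own sketch; the index sequences below are guesses that merely fit the two quoted equations -- the real patterns come from gEcon's generator:]

    # dummy inputs so the sketch runs standalone; in gEcon these come from the model
    vv <- runif(300); ppff <- runif(1500); r <- numeric(200)
    i  <- 12:139                 # equation rows r[12] ... r[139]
    j1 <- 88:215                 # first vv index advances one-for-one with i
    j2 <- rep(16:47, each = 4)   # second vv index: a guessed pattern fitting both quoted endpoints
    k  <- 1307:1434              # ppff index advances one-for-one with i
    r[i] <- -vv[j1] + vv[j2] * (1 + ppff[k])   # one statement replaces 128 generated assignments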
Re: [Rd] Package compiler - efficiency problem
Karol,

If I understood correctly, functions like "foo" are automatically generated by gEcon's model parser. For such a long function, and depending on how many times you need to call it, it may make more sense to generate C++ code instead (including the 'for' loop). Then you can use Rcpp::sourceCpp, or Rcpp::cppFunction, to compile it and run it from R.

Iñaki
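[A minimal sketch of this Rcpp route -- my illustration, not part of the thread; foo_cpp and the single equation shown are made up, and it assumes the Rcpp package is installed:]

    # cppFunction() compiles the C++ source and exposes foo_cpp() in the R session
    Rcpp::cppFunction('
    NumericVector foo_cpp(NumericVector vv, NumericVector ppff) {
        NumericVector r(200);
        // C++ indexing is 0-based: this is r[12] <- -vv[88] + vv[16] * (1 + ppff[1307])
        r[11] = -vv[87] + vv[15] * (1.0 + ppff[1306]);
        // ... the generator would emit the remaining equations (or a loop) here ...
        return r;
    }')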