[R-pkg-devel] Formula modeling

2021-10-07 Thread pikappa.devel
Dear R-package-devel subscribers,

 

My question concerns a package design issue relating to the usage of
formulas.

 

I am interested in describing via formulas systems of the form:

d = p + x + y
s = p + w + y
p = z + y
q = min(d,s).

 

The context in which I am working is that of market models with, primarily,
panel data. In the above system, one may think of the first equation as
demand, the second as supply, and the third as an equation (co-)determining
prices. The fourth equation is implicitly used by the estimation method, and
it does not need to be specified when programming the R formula. If you need
more information about the system, you may check the package diseq.

Currently, I am using constructors to build market model objects. In a
constructor call, I pass [i] the right-hand sides of the first three
equations as strings, [ii] an argument indicating whether the equations of
the system have correlated shocks, [iii] the identifiers of the used dataset
(one for the subjects of the panel and one for time), and [iv] the quantity
(q) and price (p) variables. These four arguments contain all the necessary
information for constructing a model.

 

I would now like to re-implement model construction using formulas, which
would be a more regular practice for most R users. I am currently
considering passing all the above information with a single formula of the
form:

 

q | p | subject | time | rho ~ p + x + y | p + w + y | z + y 

 

where subject and time are the identifiers, and rho indicates whether
correlated or independent shocks should be used.
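
As a rough illustration of how a formula of this shape could be taken apart, here is a minimal base R sketch. It relies on the fact that `|` binds more tightly than `~` in R's grammar, so each side of the formula parses as nested `|` calls. The helper name `split_bars` is hypothetical and is not part of diseq or any package mentioned in this thread.

```r
# Hypothetical helper (an assumption for illustration): recursively split an
# expression on its top-level `|` operators and return the parts as a list.
split_bars <- function(expr) {
  if (is.call(expr) && identical(expr[[1]], as.name("|"))) {
    c(split_bars(expr[[2]]), split_bars(expr[[3]]))
  } else {
    list(expr)
  }
}

f <- q | p | subject | time | rho ~ p + x + y | p + w + y | z + y

# f[[2]] is the left-hand side, f[[3]] the right-hand side of the formula.
lhs_parts <- vapply(split_bars(f[[2]]), deparse, character(1))
rhs_parts <- vapply(split_bars(f[[3]]), deparse, character(1))

print(lhs_parts)  # "q" "p" "subject" "time" "rho"
print(rhs_parts)  # "p + x + y" "p + w + y" "z + y"
```

The parts come back as plain expressions, so their meaning (response, identifier, or shock indicator) would still have to be assigned by position.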

 

I am unaware of other packages that use formulas in this way (for instance,
passing the identifiers in the formula), and I wonder if this would go
against any good practices. Would it be better to exclude some of the
necessary elements for constructing the model? This might make the resulting
formulas more similar to those of models with multiple responses or multiple
parts. I am not sure, though, how one would use such model formulas without
all the relevant information. Is there any suggested design alternative that
I could check?

 

I would appreciate any suggestions and discussion!

 

Kind regards,

Pantelis



__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] [External] Formula modeling

2021-10-08 Thread pikappa.devel
Hi,

The different environments can potentially be an issue in the future. I was not 
aware of the vector construction notation, and I think this is what I was 
mainly looking for. 

I could provide two initialization methods. One will use the ugly vector 
notation that one could use to bind the whole model with a particular 
environment. The second can be more user-friendly and use the comma-separated 
list of formulas. Essentially, the second will prepare the vector formula and 
call the first initialization method.
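
A minimal sketch of what the second, user-friendly method might do before delegating to the first: it collects separate two-sided formulas and assembles the single vector-notation formula in one chosen environment. The function name `combine_formulas` and its `env` argument are assumptions made for illustration only.

```r
# Hypothetical helper: turn several two-sided formulas into one formula of
# the form c(lhs1, lhs2, ...) ~ c(rhs1, rhs2, ...), bound to a single
# environment (by default, the caller's).
combine_formulas <- function(..., env = parent.frame()) {
  fs <- list(...)
  stopifnot(all(vapply(fs, function(f) {
    inherits(f, "formula") && length(f) == 3L
  }, logical(1))))
  lhs <- lapply(fs, function(f) f[[2]])
  rhs <- lapply(fs, function(f) f[[3]])
  # Build the calls c(...) for each side, then the `~` call joining them.
  lhs_call <- as.call(c(list(as.name("c")), lhs))
  rhs_call <- as.call(c(list(as.name("c")), rhs))
  # Evaluating a `~` call in `env` yields a formula whose environment is `env`.
  eval(call("~", lhs_call, rhs_call), env)
}

combined <- combine_formulas(d ~ p + x + y, s ~ p + w + y, p ~ z + y)
print(combined)
```

Whatever environments the individual formulas carried are discarded here, which is the point of the design: the whole system is resolved against one environment.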

The comment about the (|) operator makes sense, and I would also like to avoid
it to the extent that it is feasible. So, I am currently thinking of something
along these lines:

c(d, s, p | subject | time) ~ c(p + x + y, p + w + y, z + y)

This is very similar to how the function ?lme4::lmer uses the bar to separate 
expressions for design matrices from grouping factors. Actually, the subject 
and time variables are needed for subsetting prices for various operations 
required for the model matrix. 

Thanks for the suggestions; they are very helpful!

Best,
Pantelis

-----Original Message-----
From: Duncan Murdoch  
Sent: Friday, October 8, 2021 2:04 AM
To: Richard M. Heiberger ; pikappa.de...@gmail.com
Cc: r-package-devel@r-project.org
Subject: Re: [R-pkg-devel] [External] Formula modeling

On 07/10/2021 5:58 p.m., Duncan Murdoch wrote:
> I don't work with models like this, but I would find it more natural 
> to express the multiple formulas in a list:
> 
> list(d ~ p + x + y, s ~ p + w + y, p ~ z + y)
> 
> I'd really have no idea how either of the proposals below should be parsed.

There's a disadvantage to this proposal.  I'd assume that "p" means the same in 
all 3 formulas, but with the notation I give, it could refer to
3 unrelated variables, because each of the formulas would have its own 
environment, and they could all be different.  I guess you could make it a 
requirement that they all use the same environment, but that's likely going to 
be confusing to users, who won't know what it means.
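
To make the environment concern concrete, here is a small base R sketch. The functions `make_demand` and `make_supply` are hypothetical; they only serve to create formulas in two different environments, each with its own `p`.

```r
# Each formula remembers the environment in which it was created.
make_demand <- function() {
  p <- "p as seen by the demand formula"
  d ~ p + x + y
}
make_supply <- function() {
  p <- "p as seen by the supply formula"
  s ~ p + w + y
}

eqs <- list(make_demand(), make_supply())
envs <- lapply(eqs, environment)

# The two formulas do not share an environment, so `p` resolves differently.
same_env <- identical(envs[[1]], envs[[2]])
p_seen <- vapply(envs, function(e) get("p", envir = e), character(1))
print(same_env)  # FALSE
print(p_seen)

# One way to impose the "same environment" requirement after the fact.
eqs_shared <- lapply(eqs, function(f) {
  environment(f) <- globalenv()
  f
})
```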

Another possibility that wouldn't have this problem (but in my opinion is kind 
of ugly) is to use R vector construction notation:

   c(d, s, p) ~ c(p + x + y, p + w + y, z + y)
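
For illustration, a minimal base R sketch of how a package might unpack this notation into one formula per equation, all sharing the environment of the original formula. This is an assumed approach, not something taken from an existing package.

```r
f <- c(d, s, p) ~ c(p + x + y, p + w + y, z + y)

# Both sides are calls to c(); drop the function name to get the elements.
lhs <- as.list(f[[2]])[-1]
rhs <- as.list(f[[3]])[-1]
stopifnot(length(lhs) == length(rhs))

# Rebuild one two-sided formula per equation. Evaluating each `~` call in
# environment(f) gives every equation the same environment as f.
eqs <- lapply(seq_along(lhs), function(i) {
  eval(call("~", lhs[[i]], rhs[[i]]), environment(f))
})

eq_text <- vapply(eqs, function(g) paste(deparse(g), collapse = ""), character(1))
print(eq_text)  # "d ~ p + x + y" "s ~ p + w + y" "p ~ z + y"
```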

Duncan Murdoch

> 
> Of course, if people working with models like this are used to working 
> with notation like yours, that would be a strong argument to use your 
> notation.
> 
> Duncan Murdoch
> 
> On 07/10/2021 5:51 p.m., Richard M. Heiberger wrote:
>> I am responding to a subset of what you asked.  There are packages 
>> which use multiple formulas in their argument sequence.
>>
>>
>> What you have as a single formula with | as a separator
>>
>> q | p | subject | time | rho ~ p + x + y | p + w + y | z + y
>>
>> I think would be better as a comma-separated list of formulas
>>
>> q , p , subject , time , rho ~ p + x + y , p + w + y , z + y
>>
>> because in R notation | is usually an operator, not a separator.
>>
>> lattice uses formulas and the | is used as a conditioning operator.
>>
>> nlme and lme4 can have multiple formulas in the same calling sequence.
>>
>> lme4 is newer. From its ?lme4-package: ‘lme4’ covers approximately
>> the same ground as the earlier ‘nlme’ package.
>>
>> lme4 is probably the model you are looking for in terms of package design.
>>
>>> On Oct 07, 2021, at 17:20, pikappa.de...@gmail.com wrote:
>>>

[R-pkg-devel] R CMD check: checking for detritus in the temp directory

2023-06-08 Thread pikappa.devel
Dear all,

 

I am checking a package with DESCRIPTION dependencies:

 

Depends:
    R (>= 4.1.0)
Imports:
    keras,
    R6,
    reticulate,
    tensorflow

 

I am getting a note:

 

❯ checking for detritus in the temp directory ... NOTE
  Found the following files/directories:
    ‘__autograph_generated_file3jcyufy7.py’
    ‘__autograph_generated_file4blbq_bi.py’
    ‘__autograph_generated_file6dur53sj.py’
    ‘__autograph_generated_filerj3ohibl.py’ ‘__pycache__’

 

I have found these previous discussions, but they seem different from my case.

 

https://stat.ethz.ch/pipermail/r-package-devel/2020q3/005698.html

https://stat.ethz.ch/pipermail/r-package-devel/2021q1/006477.html

https://stat.ethz.ch/pipermail/r-package-devel/2019q4/004626.html

https://stat.ethz.ch/pipermail/r-package-devel/2020q4/006161.html

 

I think the temp files are from tensorflow. There is a vignette in the package
that constructs a keras Model. Is it possible to avoid the check NOTE in my
package?

 

Kind Regards,

Pantelis





Re: [R-pkg-devel] R CMD check: checking for detritus in the temp directory

2023-06-12 Thread pikappa.devel
Thanks a lot; I am going to try this out.

Kind Regards,
Pantelis

-----Original Message-----
From: Ivan Krylov  
Sent: Sunday, June 11, 2023 12:41 PM
To: pikappa.de...@gmail.com
Cc: r-package-devel@r-project.org
Subject: Re: [R-pkg-devel] R CMD check: checking for detritus in the temp 
directory

On Thu, 8 Jun 2023 14:34:03 +0200
 wrote:

> I think the temp files are from tensorflow. There is a vignette in the
> package that constructs a keras Model. Is it possible to avoid the check
> NOTE in my package?

Other CRAN packages seem to remove these files (generated by
tensorflow.autograph) manually:

https://github.com/cran/vetiver/blob/35b24768cf0e84fab96610e001bba377dc777953/tests/testthat/setup.R#L13

https://github.com/cran/reservr/blob/98628416ba8d6a5f8f4c93682328f6a171bcd86d/tests/testthat/test-zzz.R#L6
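
A minimal base R sketch of this kind of manual cleanup, for illustration only. It is not copied from either package. The helper name `remove_autograph_detritus` is hypothetical, and the default location `dirname(tempdir())` is an assumption about where the Python side writes its files.

```r
# Hypothetical helper: delete files and directories that match the names
# reported in the check NOTE. The default `dir` is an assumption; pass the
# directory explicitly if Python's temporary directory differs.
remove_autograph_detritus <- function(dir = dirname(tempdir())) {
  leftovers <- list.files(
    dir,
    pattern = "^__autograph_generated_file.*\\.py$|^__pycache__$",
    full.names = TRUE
  )
  unlink(leftovers, recursive = TRUE)
  invisible(leftovers)
}

# Demonstration on a scratch directory, so nothing real is touched.
scratch <- file.path(tempdir(), "detritus-demo")
dir.create(scratch, showWarnings = FALSE)
file.create(file.path(scratch, "__autograph_generated_file3jcyufy7.py"))
dir.create(file.path(scratch, "__pycache__"), showWarnings = FALSE)
file.create(file.path(scratch, "keep.txt"))

remove_autograph_detritus(scratch)
remaining <- list.files(scratch)
print(remaining)  # "keep.txt"
```

In a package, a call like this would presumably go at the end of the vignette or in test teardown. Note that a shared temporary directory may contain a `__pycache__` belonging to another process, so a narrower pattern may be safer there.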

--
Best regards,
Ivan
