Re: [Rd] unable to load shared object

2018-10-05 Thread Paul Johnson
Hm. I cannot find it now, but I saw notes about problems with OneDrive,
and maybe also Dropbox, because file paths are not exactly what R expects.
Usually I keep my package library in a local folder. Did your R choose that
location automatically?
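
In case it helps, here is a minimal sketch of pointing R at a local library
that stays off OneDrive; the path below is a placeholder, not necessarily
where your R looks:

## Placeholder path; pick any local, non-synced folder
dir.create("C:/R/win-library/3.5", recursive = TRUE, showWarnings = FALSE)
.libPaths(c("C:/R/win-library/3.5", .libPaths()))   # put the local folder first
.libPaths()                                          # confirm where installs will now go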

Paul Johnson
University of Kansas

On Fri, Oct 5, 2018, 3:01 AM Witold E Wolski  wrote:

> Thanks for asking.
>
> The path where R tries to load the DLL from does not exist, I think (or
> the install process deletes it after the error - might this be the
> case?):
> C:/Users/wewol/OneDrive/Documents/R/win-library/3.5/grpc/libs/x64/grpc.dll'
>
> When I am checking after the error that path is only valid up to:
> C:/Users/wewol/OneDrive/Documents/R/win-library/3.5/
>
> So maybe (?) the install process wrongly assumes that the package is
> installed (but it never was) and then it executes
> ** testing if installed package can be loaded
>
> and runs into the error:
>
> Error: package or namespace load failed for 'grpc' in inDL(x,
> as.logical(local), as.logical(now), ...):
>  unable to load shared object
>
> 'C:/Users/wewol/OneDrive/Documents/R/win-library/3.5/grpc/libs/x64/grpc.dll':
>   LoadLibrary failure:  The specified module could not be found.
> On Fri, 5 Oct 2018 at 09:37, Paul Johnson  wrote:
> >
> > Greetings.
> > Is it possible that Onedrive is causing trouble? Do other packages you
> build and install work well from that directory?
> >
> > Paul Johnson
> > University of Kansas
> >
> > On Wed, Oct 3, 2018, 3:13 AM Witold E Wolski  wrote:
> >>
> >> Hello,
> >>
> >> I am trying to install a package with some src files on Windows (the
> >> Linux install works fine). The sources seem to build, and there is a
> >> *.dll in the src folder, but then the installation process fails with:
> >>
> >> ```
> >> ** testing if installed package can be loaded
> >> Error: package or namespace load failed for 'grpc' in inDL(x,
> >> as.logical(local), as.logical(now), ...):
> >>  unable to load shared object
> >>
> 'C:/Users/wewol/OneDrive/Documents/R/win-library/3.5/grpc/libs/x64/grpc.dll':
> >>   LoadLibrary failure:  The specified module could not be found.
> >>
> >> Error: loading failed
> >> Execution halted
> >> ```
> >>
> >> Do I need to point the installer to the *.dll in the src directory by
> >> creating some special function (e.g. dyn.load) or creating some
> >> special file?
> >>
> >> Help would be greatly appreciated:
> >>
> >> The packages sources are here:
> >> https://github.com/wolski/grpc-1
> >>
> >> Witek
> >>
> >>
> >> --
> >> Witold Eryk Wolski
> >>
> >> __
> >> R-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>
> --
> Witold Eryk Wolski
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Proposed function file.backup

2019-02-14 Thread Paul Johnson
Dear R Core:

In the kutils package, I wrote a function that is so handy that I would
like to ask you to put it in R itself.

file.backup() will look at a file, find its last modification time, and
create a new backup with a name that appends YYYYMMDD-HHMM to the file
name. So now, whenever I worry that running write.csv or saving a graph
might destroy something valuable, I use an idiom like

fn <- "filename.csv"
file.backup(fn)
write.csv(dat, file = fn)

So if "filename.csv" already exists, it will get a backup copy.  If an old
backup version exists, the function doesn't re-copy.  In my workflow, this
is very convenient, and it could be integrated with other R file writers
that have overwrite as a parameter.  If overwrite could be set to "backup"
instead of TRUE or FALSE, that would be handy.
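
For illustration, here is a hedged sketch of what such a writer could look
like; write_csv_safe() and its arguments are made-up names, not an existing
function, and it assumes a file.backup() like the one below:

write_csv_safe <- function(x, file, overwrite = FALSE) {
    if (file.exists(file)) {
        if (identical(overwrite, "backup")) {
            file.backup(file)                       # the function proposed below
        } else if (!isTRUE(overwrite)) {
            stop("file exists and overwrite is not TRUE")
        }
    }
    write.csv(x, file = file, row.names = FALSE)
}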

I sincerely hope you will consider it:

##' Create a backup version of a file by renaming it.
##'
##' Inserts the date-time of the most recent modification at the
##' end of the file name, before the extension.
##'
##' Return is the new file name that was created, using whatever
##' path information was provided in the file's original name. However,
##' the fullpath argument can be set to TRUE, so a path with the
##' full directory name will be created and returned.
##'
##' @param name A character string for the name of the file.
##' @param fullpath Return the full directory path to the
##' file. Default FALSE, return only the file name.
##' @param keep.old If FALSE (default), rename the file. Otherwise, keep the old copy.
##' @param verbose If TRUE, list the files in the output directory
##' @author Shadi Pirhosseinloo, Paul Johnson
##' @return The name of the newly created file.
##' @importFrom tools file_ext
##' @importFrom tools file_path_sans_ext
##' @export
##' @examples
##' tdir <- tempdir()
##' owd <- getwd()
##'
##' setwd(tdir)
##' system("touch test.1.txt")
##' system("touch test.2.txt")
##' system("touch test.3.txt")
##' system("touch test.4.txt")
##' system("touch test.5.txt")
##' ## note: no extension next
##' system("touch test.6")
##' list.files()
##' file.backup("test.1.txt")
##' file.backup("test.2.txt", fullpath=TRUE)
##' list.files()
##' setwd(owd)
##' file.backup(file.path(tdir, "test.3.txt"))
##' ## Next should be same path because input had a full path
##' file.backup(file.path(tdir, "test.4.txt"), fullpath=TRUE)
##' file.backup(file.path(tdir, "test.5.txt"), fullpath = TRUE, verbose = TRUE)
##' file.backup(file.path(tdir, "test.6"))
file.backup <- function(name, fullpath = FALSE, keep.old = FALSE, verbose = FALSE){
    if(!file.exists(name)){
        MESSG <- paste("file", name, "does not exist. No backup created.")
        warning(MESSG)
        return(NULL)
    }
    dir.source <- dirname(normalizePath(name))

    date_cr <- base::format(base::file.info(name)$mtime, "%Y%m%d-%H%M")
    ext_name <- tools::file_ext(name)
    noext_name <- tools::file_path_sans_ext(name)
    new_name <- paste0(noext_name, "-", date_cr,
                       if(!ext_name == "") {paste0(".", ext_name)})

    ## Abort if new file name already exists
    if (file.exists(new_name)){
        MESSG <- paste("backup file already exists. No new backup created.")
        warning(MESSG)
        return(new_name)
    }
    ret <- if(keep.old){
               file.copy(name, new_name, recursive = TRUE, overwrite = TRUE,
                         copy.mode = TRUE, copy.date = TRUE)
           }else{
               file.rename(name, new_name)
           }
    if(!ret) {
        MESSG <- paste("file.rename(", name, ",", new_name, ") failed")
        stop(MESSG)
    }
    if(verbose){
        cat("Full file list of directory:", dir.source, "\n")
        print(list.files(dir.source))
        cat("End of file list\n")
    }
    if(fullpath){
        new_name_fullpath <- normalizePath(new_name)
        return(new_name_fullpath)
    }

    ## else, give back adjusted with same path information it had
    new_name
}







-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Large discrepancies in the same object being saved to .RData

2010-07-10 Thread Paul Johnson
On Wed, Jul 7, 2010 at 7:12 AM, Duncan Murdoch  wrote:
> On 06/07/2010 9:04 PM, julian.tay...@csiro.au wrote:
>>
>> Hi developers,
>>
>>
>>
>> After some investigation I have found there can be large discrepancies in
>> the same object being saved as an external "xx.RData" file. The immediate
>> repercussion of this is the possible increased size of your .RData workspace
>> for no apparent reason.
>>
>>
>>
> I haven't worked through your example, but in general the way that local
> objects get captured is when part of the return value includes an
> environment.

Hi, can I ask a follow-up question?

Is there a tool to browse *.Rdata files without loading them into R?

In HDF5 (a data storage format we use sometimes), there is a CLI
program "h5dump" that will spit out line-by-line all the contents of a
storage entity.  It will literally track through all the metadata, all
the vectors of scores, etc.  I've found that handy to "see what's
really in there" in cases like the one the OP asked about.
Sometimes we find that there are things that are "in there" by
mistake, as Duncan describes, and then we can try to figure out why
they are there.
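
It is not the external dump I'm asking about, but the closest in-R habit I
know is a sketch like this (the file name is a placeholder):

e <- new.env()
load("workspace.RData", envir = e)                    # load into a throwaway environment
objs <- ls(e, all.names = TRUE)
sizes <- sapply(objs, function(nm) object.size(get(nm, envir = e)))
sort(sizes, decreasing = TRUE)                        # what is really in there, and how big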

pj


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Can you share a working example of R program aided by fast BLAS?

2010-08-19 Thread Paul Johnson
Can one of you give me an R program that displays the benefits of an
accelerated BLAS in R?

Here's why I ask, in case you wonder:

In a Linux cluster, I've hit some bumps in the road.  The worst one by
far was that I installed R, then GotoBLAS2 with default settings, and
after that, jobs using Rmpi were *really* *really* slow.  I mean
horrible.  If a job took 15 minutes when run by itself, outside of
MPI, it took 1 full day when run inside MPI. Literally the same job.

I learned later that GotoBLAS2 defaults to allow threads equal to the
number of cores, and that the threads are not compatible with MPI.
This latter point not clearly stated in the GotoBLAS2 documents, so
far as I can find, but after I realized that was the problem, I did
find one other cluster website that mentioned the same problem. "If
your application uses GotoBLAS and all cores as MPI threads, setting
GOTO_NUM_THREADS larger than one will usually result in drastically
slower performance."
(http://hpc.uark.edu/hpc/support/software/numerical.html#gotoblas).
In the GotoBLAS2 documentation, it warns of weird thread related
delays, but it implies that the slowdown--if it happens--is a result
of bad user code, rather than this more fundamental mismatch between
OpenMPI (or MPI in general) and GotoBLAS2.

In the process of diagnosing the big slowdown, I've been making many
time comparisons. When I installed GotoBLAS2 in the first place, it
was because so many people (and the R admin manual) said that R's
ordinary BLAS is rudimentary/slow.  In the test cases I've tried, R's
BLAS is not that bad. In fact, in the test programs we run, the time
is not substantially different between GotoBLAS2 and R's BLAS.  I also
compared the Intel Math Kernel Library (MKL) BLAS and didn't notice a huge
difference.
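
For what it's worth, the kind of test program I've been timing is a minimal
sketch like this (the matrix size is arbitrary):

set.seed(1)
n <- 2000L
A <- matrix(rnorm(n * n), n, n)
system.time(B <- crossprod(A))            # BLAS-heavy: t(A) %*% A
system.time(chol(B))                      # Cholesky factorization
system.time(svd(A, nu = 0, nv = 0))       # singular values only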

So, well, I think that means I'm running bad test cases for R and GotoBLAS2.

Oh, and one more thing. I have not been able to find an example R
program that benefited at all from allowing threads > 1 in the GotoBLAS2
environment settings.  In fact, if a one-thread job takes 15 minutes,
the one that allows 2 or more threads takes 21 minutes, and allowing
more threads makes the job take even longer. This is literally the
same job, same cluster node, the only difference is changing the
environment variable that adjusts the GotoBLAS2 threads allowed.

So if you know whether your example depends on threads or not, I would
appreciate the warning.

pj

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] C or Java code generation

2010-08-20 Thread Paul Johnson
On Thu, Aug 19, 2010 at 5:47 AM, Vyacheslav Karamov
 wrote:
>>
>>
>>           Hi All!
>>
>>           I'm new to R and I need to know is it possible for R to
>>        generate
>>           C/C++ source code, Java byte code or native Win32 DLL like
>>        MatLab?
>
> Is there any possibility to use R without installing?
> I mean that I have my own application written in MS Visual C++ and I need to
> use R script in this app. I can install R and use it via DCOM, but it's not
> convenient for the end users of my program.
>

Do your users have Web access? If so, you can set up an Rweb (or
similar) server that could run the jobs for those users. I have one
set up like that. Users transmit their code via HTTP and can get
results back in a web page.  I think you could probably program around
the output management question, but I did not try to do it. I mean,
something like "wget" could download the images directly, bypassing
the HTTP server.

Rweb has been abandoned by its original author, but it can be made to
work, and there are other, newer http server programs to run R.  I
have not tried them yet, mainly because my customers needed to run
Rweb because their clients told them to use it.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] No RTFM?

2010-08-20 Thread Paul Johnson
On Thu, Aug 19, 2010 at 7:08 PM, Spencer Graves
 wrote:
>  What do you think about adding a "No RTFM" policy to the R mailing lists?
> Per, "http://en.wikipedia.org/wiki/RTFM":
>
I think this is a great suggestion.

I notice the R mailing list already has a gesture in this direction:
"Rudeness and ad hominem comments are not acceptable. Brevity is OK."

But the people who behave badly don't care about policies like this
and they will keep doing what they do.

pj

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] No RTFM?

2010-08-21 Thread Paul Johnson
On Sat, Aug 21, 2010 at 11:04 AM, Hadley Wickham  wrote:
>> previous suggestion by a regular contributor.  I still think a better
>> response is not to escalate:  Either ignore the post or say something like,
>> "I don't understand your question.  Please provide a self-contained minimal
>> example as suggested in the Posting Guide ... ."
>
> I agree wholeheartedly. I have tried to do this with the ggplot2
> mailing list, and I think it has been extremely successful in
> fostering a community that is friendly and polite, yet still provides
> excellent technical support (and these days, most of it doesn't come
> from me!).
>

I've been insulted in r-help and it's no fun.  After Gabor pointed out
the logic in it, I have to admit he is right. It does keep traffic
down.  I don't go there anymore unless I'm completely desperate.

I agree also that sometimes RTFM is the right answer, especially if it is
stated as "That is discussed on p. 162 of the R Guide for Whatever..."
I don't think people are insulted if you tell them to check
something specific.

Usually, first-time visitors don't know about the R posting guide, and
when they ask an incomplete question, we should just refer them to it.
 We don't need the angry tone that we often see, but I don't think
people mind being referred. This presupposes the posting guide is
helpful. If somebody forgets the posting guide twice, then I think we
should all berate and insult them. I mean vigorously :)

My personal opinion is that the R posting guide could be revised to be
more direct.  Exhausted, frustrated people don't benefit as much as
they could because the document is too long.

This is hard to fix because everything in there was added for good
reasons.  (I've been here long enough to remember a time before the
guide, sad to say.)  I tried to reshape the posting guide and I can
see that it is a really hard problem.

What do you think of this: The priority is to put the most important
thing at the top. The second priority is brevity.

=

Posting Guide: How to ask good questions that prompt useful answers

People are busy, so ask your question in a useful way.

1. Every question to r-help should begin with the following.

A. Output from the command sessionInfo()

B. Output from Sys.getlocale()

C. Information about how you installed R. Did you make any changes,
such as incorporating a BLAS library? If you don't know, ask your
system administrator.

D. If you see an error or something unexpected, your message MUST
include the EXACT code that you were running. We mean, ALL of your
commands that were run before the error occurred.  If you are unsure
of what you did, close your session, clean up your code, and start
over to reproduce the problem in the most direct way.  Your post MUST
include the EXACT VERBATIM error message.

If you are working on a long program that requires a dataset,
post the dataset and the program somewhere on the Internet and
refer us to it. It is simply not possible to guess about what
might be going wrong in your program unless we can see it.

If you don't want to share your data, construct a small example
dataset that produces the same problem. Post it and refer us to it.

E. If you have isolated the problem to a certain part of your project,
write a small, self-contained 'working example' that causes the
problem and include it with your post.

Example. Why does this code:

dat <- data.frame(x=c(1,2,3), y=c(3,4,5))
plot (x, y, data=dat)

cause this:

Error in plot(x, y, data = dat) : object 'x' not found

We can easily reproduce that and explain the problem to you. We can't
help with a question like "my plot failed, something about an object
was missing."
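
With a reproducible example like that, the reply can be as short as this
(one of several equivalent fixes):

## plot.default() has no 'data' argument; use the formula method or with()
plot(y ~ x, data = dat)
with(dat, plot(x, y))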

2. How to avoid making the members of r-help angry at you.

A. Do not call a problem a "bug" unless you really mean to say that
   someone has made a mistake. If you say "bug", they hear
   "embarrassing personal flaw" or "gigantic boil".  We know
   you don't mean that, but sometimes they forget.

B. Before you write to r-help, search for answers from previous questions.
   1. Try Google? Or her cousin Yahoo?
   2. Try these inside R:

  help.search("whatever")
  RSiteSearch("whatever")
  apropos("whatever")

C. Read R-intro, R help pages, and the R FAQs.

  ?whatever


3. Do not send your question to r-help unless your question is about R
or the base R packages that are supported by the R Core Team.

A. Don't ask statistics questions, unless they fall into the form "Which
R procedure or package should I use to conduct an analysis of ..." or
"Does anybody have a working example of procedure XYZ?" or "Can I
obtain XYZ result from an object of class ZYX?"

B. If you have trouble with a package from CRAN or elsewhere, write to
the author of that package.  r-help might be a good place to ask,
"I've been using package XYZ and the author seems to have abandoned
the project. Does anybody know of a replacement?" Otherwise, don't.

Note that the

Re: [Rd] No RTFM?

2010-08-22 Thread Paul Johnson
Hey Ben:

One of my colleagues bought your book and was reading it during a
faculty meeting last Tuesday.  Everybody kept asking, "What's that?"

If you know how to put it in the wiki, would you please do it and let
us know where it is.  I was involved with an R wiki about 5 or 6 years
ago, but completely lost track of it.

I'm going to keep pushing for sharp, clear guidelines.  If I could get
R-help set up with a message template like Bugzilla's, I think it
would be awesome! To disagree with Ted: people won't mind giving us
what we need, as long as we tell them exactly what to do. We save
ourselves the circle of "give us sessionInfo()" and "give us your
LANG" and "what version is that" and so forth.

But I'll consider it a victory if we can get the requirement of
sessionInfo() in as the first thing in the posting guide !

I agree that "which packages can we help with" in r-help is a good
question. I had in mind that we should field questions about anything
in the R tarball, including the recommended ones. I think I did not
use the right words when I tried to express that.  Maybe we just need
to list the packages to be clear.  Things like "foreign" that have
passed from the ownership of the original writer to the R Core Team
are certainly eligible.

I also admit that I have not looked at R-FAQ since about 2005, and I
don't want to duplicate that effort.  If R-FAQ has a nice concise list
of ways to get information about specific things, the posting guide
might just refer to that section by name/number, rather than
duplicate it.  I'm torn here, because people often assume that if you
give them a link, it must not be important (that's why you left it
out).

pj

On Sat, Aug 21, 2010 at 6:04 PM, Ben Bolker  wrote:
> Paul Johnson  gmail.com> writes:
>
>>
>
>  [snip: lots more snippage to try get gmane to let me post]
>
>> What do you think of this: The priority is to put the most important
>> thing at the top. The second priority is brevity.
>
>  I really like this.
>  Some suggestions:
>
> =
>> Posting Guide: How to ask good questions that prompt useful answers
>>
>> People are busy, so ask your question in a useful way.
>> 1. Every question to r-help should begin with the following.
>> A. Output from the command sessionInfo()
>> B. Output from Sys.getlocale()
>> C. Information about how you installed R. Did you make any changes,
>> such as incorporating a BLAS library. If you don't know, ask your
>> system administrator.
>
>   I would make this optional or put it further down. I don't see that
> many problems on the list that are due to non-standard installations.
> Most of the most clueless people are (a) using stock installations
> and/or (b) don't even know who installed R on the computer they are
> using. I don't mind sending them to find out/ask if it's a real
> issue, but it feels like an unnecessary hurdle.
>
>> D. If you see an error or something unexpected, your message MUST
>> include the EXACT code that you were running. We mean, ALL of your
>> commands that were run before the error occurred.  If you are unsure
>> of what you did, close your session, clean up your code, and start
>> over to reproduce the  problem in the most direct way.  Your post MUST
>> include the EXACT VERBATIM error message.
>>
>> If you are working on a long program that requires a dataset,
>> post the dataset and the program somewhere on the Internet and
>> refer us to it. It is simply not possible to guess about what
>> might be going wrong in your program unless we can see it.
>>
>> If you don't want to share your data, construct a small example
>> dataset that produces the same problem. Post it and refer us to it.
>
>   This is where we have to emphasize 'small, reproducible example'
> more strongly -- perhaps move the next item up.
> I dread the pages and pages of random R-session crap
> that will be posted following this advice literally ...
>
>> E. If you have isolated the problem to a certain part of your project,
>> write a small, self-contained 'working example' that causes the
>> problem and include it with your post.
>>
>> Example. Why does this code:
>>
>> dat <- data.frame(x=c(1,2,3), y=c(3,4,5))
>> plot (x, y, data=dat)
>>
>> cause this:
>>
>> Error in plot(x, y, data = dat) : object 'x' not found
>>
>> We can easily reproduce that and explain the problem to you. We can't
>> help with a question like "my plot failed, something about an object
>> was missing."
>>
>> 2. How to avoid making the members 

Re: [Rd] No RTFM?

2010-08-22 Thread Paul Johnson
On Sun, Aug 22, 2010 at 9:22 PM, Ted Harding
 wrote:
>
> (with Cc: to r-devel)
>    I presume you mean "sessionInfo()". "systemInfo()" hasn't been
>    mentioned so far, I think.
>

brain fart. I'm old, you know :)

>. I am questioning your proposal
>    that
>      1. Every question to r-help should begin with the following.
>      A. Output from the command sessionInfo()
>      B. Output from Sys.getlocale()
>      [etc.]
>    Not *every* question, as I've said before. It depends on the case.

I promise this is my last post in this thread.

How about this?  I aim for more concise, direct instruction.

Hence I replace

"This guide is intended to help you get the most out of the R mailing
lists, and to avoid embarrassment. Like many responses posted on the
list, it is written in a concise manner. This is not intended to be
unfriendly - it is more a consequence of allocating the limited
available time and space to technical issues rather than to social
niceties.

The list: Remember that R is free software, constructed and maintained
by volunteers. They have various reasons for contributing software and
participating on the mailing lists, but often have limited time."

with

"People are busy, so ask your question in a useful way."

I think we need to get the most vital instructions up to the top of
the guide. And the single most vital thing is sessionInfo(), as far as
I can tell. I wish sessionInfo would just grab the locale information.

You interpret my plan as harsh/restrictive, and I don't mean it that
way.  I'm aiming for clear and understandable to tired/frustrated
people.

Suppose it is phrased it this way instead:

1. Unless you are confident that the problem you are asking about is
not affected by your OS or version of R, please include this
information with your post:

A. Output from the command sessionInfo()

B. Output from Sys.getlocale()
...


I think your experience with the befuddled users must be different
from mine. Mine are looking for a clear instruction on what to do,
they don't know what's wrong, and don't mind doing something specific
as long as they are sure you want it.

My users are perplexed by a statement like this:

"Surprising behavior and bugs: What you think is a bug may be many
other things, such as a default behavior that you do not like, a
feature, an undocumented feature, or a bug in the documentation. You
do not need to commit yourself to one of these in order to ask a
question. If it is a real bug, someone will notice it. Before you post
a real bug report, make sure you read R Bugs in the R-faq. If you're
not completely and utterly sure something is a bug, post a question to
r-help, not a bug report to r-bugs - every bug report requires manual
action by one of the R-core members. Also, see Simon Tatham's essay on
How to Report Bugs Effectively.

For questions about unexpected behavior or a possible bug, you should,
at a minimum, copy and paste the output from sessionInfo() into your
message."

I don't think they get much out of this. They need to know that
"bug" is a "fighting word" (US legal term: an expression for which you
can reasonably expect to be punched in the nose, so it is not
protected free speech) for programmers, that they are deeply insulted by
it, and that users should never use it.  The only person who should call
something a bug is the author.  Only the Mother can say a baby is
ugly.

> [3] I have tried to argue for a moderate and flexible spirit in
>    what is advised in the Posting Guide.

I agree, but can it be done more clearly and concisely?

pj


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How do you make a formal "feature" request?

2010-08-28 Thread Paul Johnson
On Sat, Aug 21, 2010 at 10:41 AM, Donald Winston  wrote:
> Who decides what features are in R and how they are implemented? If there is 
> someone here who has that authority I have this request:
>
> A report() function analogous to the plot() function that makes it easy to 
> generate a report from a table of data. This should not be in some auxiliary 
> package, but part of the core R just like plot(). As a long time SAS user I 
> cannot believe R does not have this. Please don't give me any crap about 
> Sweave, LaTex, and the "power" of R to roll your own. You don't have to "roll 
> your own" plot do you? Reports are no different. If you don't agree do not 
> bother me. If you agree then please bring this request to the appropriate 
> authorities for consideration or tell me how to do it.
>
> Thanks.
>        [[alternative HTML version deleted]]

Dear Donald:

I've wondered about your question about "how do things get into R."
It seems to me the tone of your post pissed people off and they are
giving you some abuse.  By telling you to write it yourself, they are
expressing their frustration toward you.  Really, though, you have
only yourself to blame for that.  You treat the readers like your
employees.

I think the nicer answer might go along these lines.

1.  I recently suggested code to patch R in this list and one of the
readers pointed me to this note in R-FAQ, which seems to be about as
close as you are going to get to a "request".

"There is a section of the bug repository for suggestions for
enhancements for R labelled ‘wishlist’. Suggestions can be submitted
in the same ways as bugs, but please ensure that the subject line
makes clear that this is for the wishlist and not a bug report, for
example by starting with ‘Wishlist:’. "


2. How do you suppose plot got into R? Somebody volunteered it, it
worked nicely, and many people have improved and perfected it over the
years.  If you could give us a working version of "report", the same
might happen.

3. I hate SAS report.  You should too. It stinks. We don't need that.
It is for boneheads.
You don't even tell us what is good about report or give us an
example. Or a link to an example.

4. In the R family, there are many functions like report that you should try.

Please note the function "summary" is supplied by almost all modeling
tools. It is supposed to do roughly what you want.  Does it have too
much information or too little?

There are several ways to beautify that output.

Try xtable. It can make HTML or LaTeX output.

I wonder if the memisc package function mtable does what you want. It
collects up a bunch of regressions for you in a nice-looking table.
I've found it very useful. I don't know what your objection is against
LaTeX, incidentally.  Similar tables can be created with the Hmisc and
apsrtable packages.
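
For example, a minimal sketch (it assumes the xtable and memisc packages
are installed):

fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
library(xtable)
print(xtable(summary(fit1)), type = "html")   # coefficient table as HTML
library(memisc)
mtable(fit1, fit2)                            # side-by-side regression table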


Good luck. Next time, don't tell us to go to hell when you ask your question.

pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] environment question

2010-12-26 Thread Paul Johnson
Hello, everybody.

I'm putting together some lecture notes and course exercises on R
programming.  My plan is to pick some R packages, ask students to read
through code and see why things work, maybe make some changes.  As I
look for examples, I'm running up against the problem that packages
use coding idioms that are unfamiliar to me.

A difficult thing for me is explaining the scope of variables in R
functions.  When should we pass an object to a function, and when should
we let the R system search about for an object?  I've been puzzling
through ?environment for quite a while.

Here's an example from one of the packages that I like, called "ltm".
In the function "ltm.fit" the work of calculating estimates is sent to
different functions like "EM", "loglikltm", and "scoreltm".  Before
that, this is used:

environment(EM) <- environment(loglikltm) <- environment(scoreltm) <-
environment()

##and then EM is called
res.EM <- EM(betas, constraint, control$iter.em, control$verbose)

I want to make sure I understand this. The environment line gets the
current environment and then assigns it for those 3 functions, right?
All variables and functions that can be accessed from the current
position in the code become available to the functions EM, loglikltm,
and scoreltm.

So, which options should be explicitly inserted into a function call,
and which should be left in the environment for R to find when it needs
them?

1. I *think* that when EM is called, the variables "betas",
"constraint", and "control" are already in the environment.

The EM function is declared like this, using the same names "betas" and
"constraint":

EM <-
function (betas, constraint, iter, verbose = FALSE) {

It seems to me that if I wrote the function call like this (leave out
"betas" and "constraint")

res.EM <- EM(control$iter.em, control$verbose)

R will run EM and go find "betas" and "constraint" in the environment;
there was no need to name them as arguments.


2. Is a function like EM allowed to alter objects that it finds through
the environment, ones that are not passed as arguments? I understand
that a function cannot alter an object that is passed explicitly, but
what about the ones it grabs from the environment?
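
Here is a toy sketch (made up, not taken from ltm) that I use to convince
myself about both questions:

## Reassigning a function's environment changes where its free variables
## are found (here the reassignment is a no-op, since EM is created inside maker).
maker <- function() {
    betas <- c(0.1, 0.2)                 # local to maker()
    EM <- function(iter) betas * iter    # 'betas' is free inside EM
    environment(EM) <- environment()     # point EM at maker()'s frame
    EM(10)
}
maker()                                  # c(1, 2)

## Plain '<-' inside a function creates a local copy; '<<-' reaches into
## the enclosing environment and really does modify it.
counter <- local({
    n <- 0
    function() { n <<- n + 1; n }
})
counter(); counter()                     # 1, then 2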

If you have ideas about packages that might be handy teaching
examples, please let me know.

pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] 'libRblas.so' missing in R 2.12.1

2010-12-26 Thread Paul Johnson
On Thu, Dec 16, 2010 at 7:31 AM, Christian Kohler
 wrote:
> Dear R developers,
>
> I just compiled the latest version of R (2.12.1) and noticed that 
> 'libRblas.so' is missing in the '/x86_64/src/extra/blas' subdirectory of my
> R-installation.
>
> Did I miss ongoing discussions on the Mailinglist about this or might it be a 
> local problem?
>
> Christian Kohler

Hi, Christian:

Did you get an answer? I'm guessing you asked because you want to
follow along with the optimization advice in the R Installation and
Administration manual, section "A.3.1.5 Shared BLAS".

I've asked the same thing myself.  Depending on the configure options
you specified, libRblas may be built as a shared library, or R may be
linked against an external BLAS.  There seems to be some tension, and
the experts will give you differing advice about whether R should be a
shared library and whether you should allow libRblas.so to be built.
The Install manual says that a non-shared library will load more
quickly but, of course, if the BLAS is built into R itself, then you
can't play games pointing the symbolic link to other shared-library
implementations.

If you have access to an Ubuntu system, you will notice there is no
libRblas.so supplied in the packages they provide.  They are pointing
to a shared BLAS from the Intel Math Kernel Library.

The last time I looked, that was configured like this:

  ./configure --prefix=/usr   \
   --with-cairo\
   --with-jpeglib  \
   --with-pango\
   --with-png  \
   --with-readline \
   --with-tcltk\
   --with-system-bzlib \
   --with-system-pcre  \
   --with-system-zlib  \
   --mandir=/usr/share/man \
   --infodir=/usr/share/info   \
   --datadir=/usr/share/R/share\
   --includedir=/usr/share/R/include   \
   $(atlas)\
   $(lapack)   \
   --without-gnome \
   --enable-R-profiling\
   --enable-R-shlib\
   --enable-memory-profiling   \
   --without-recommended-packages  \
   --build $(buildarch)

$(atlas) draws its value from the rules file

atlas   = --with-blas

Similarly, $(lapack)

lapack  = --with-lapack


If you want to get libRblas.so built, you need to remove the atlas and
lapack lines. Also, watch out for this configure option:

--disable-BLAS-shlib

If you take out the atlas/lapack statements, you get libRblas.so, and
then you can follow along with the R Install manual to replace that
with a symbolic link to one of the optimized BLAS libraries.

However, my experience, and that of at least one other person, is that
the speedup you will observe from that is not too substantial.

http://www.cybaea.net/Blogs/Data/Faster-R-through-better-BLAS.html


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Request: Suggestions for "good teaching" packages, esp. with C code

2011-02-15 Thread Paul Johnson
Hello,

I am looking for CRAN packages that don't teach bad habits.  Can I
have suggestions?

I don't mean the recommended packages that come with R, I mean the
contributed ones.  I've been sampling a lot of examples and am
surprised that many ignore seemingly agreed-upon principles of R
coding. In r-devel, almost everyone seems to support the "functional
programming" theme in Chambers's book on Software For Data Analysis,
but when I go look at randomly selected packages, programmers don't
follow that advice.

In particular:

1. Functions must avoid "mystery variables from nowhere."

Looking at a function's code, it should not be necessary to ask "what's
variable X?" and go hunting in the commands that lead up to the
function call.  If X is used in the function, it should be in a named
argument, or extracted from one of the named arguments.  People who
rely on variables floating around in the user's environment are
creating hard-to-find bugs.

2. With rare exceptions, we don't want functions with indirect effects (no <<-).

3. Code should be vectorized where possible; C-style for loops over
vector elements should be avoided (a toy sketch of points 1 and 3
follows this list).

4. We don't want gratuitous use of "return" at the end of functions.
Why do people still do that?

5. Neatness counts.  Code should look nice!  Check out how beautiful
the functions in MASS look! I want code with spaces and " <- " rather
than everything jammed together with "=".
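
Here is a toy sketch of points 1 and 3 together (made up for illustration,
not drawn from any package):

sc <- 10                                  # floating around in the user's workspace
rescaleBad <- function(x) {
    out <- numeric(length(x))
    for (i in seq_along(x)) out[i] <- x[i] / sc   # 'sc' is a mystery variable from nowhere
    out
}
rescaleGood <- function(x, sc = 1) x / sc          # argument passed explicitly, and vectorized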

I don't mean to criticize any particular person's code in raising this
point.  For teaching examples, where should I focus?

Here's one candidate I've found:

MNP.  As far as I can tell, it meets the first four requirements, and it
has some very clear C code with it as well. I'm only hesitant there
because I'm not entirely sure that a package's C code should introduce
its own functions for handling vectors and matrices when some
general-purpose library might be more desirable.  But that's a small
point, and clarity and completeness count a great deal in my opinion.





-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Set up new CRAN mirror; but have questions before finalizing.

2011-02-18 Thread Paul Johnson
Hi, everybody

I have an account on Dreamhost.com and when I renewed it recently,
their message said my usage of storage and bandwidth had been
reasonably low. In an idle moment about 3 weeks ago, I followed your
instructions to set up a CRAN mirror on their server.  Here it is:

http://www.freefaculty.org/cran

It is not hosted at my university, but it is a working, high-availability
server.  Is there any reason it could not be listed as a CRAN
mirror? (Although I really have no idea where these machines
exist. I'm pretty sure it is in the USA. I'll try to find out.)

Maybe you might try it and see?  It has updated several times, with no
trouble that I can see.

I have a couple of small details to ask about.  Maybe this first one
is a potential "bug report" for the CRAN mirror instructions.

1. Permissions on the "src" and "web" folders are 700, and so running
"update.packages" or an apt-get update against the Debian stuff
results in permission-denied errors.  I re-set the permissions
manually, but wonder if I'm actually supposed to mess around with your
archive.  After doing that, it works.  But I worry a little bit about
what else might not be readable "down there" in the hierarchy, and I
wonder why anybody else's mirror works without doing that.

Notice before

> update.packages( repos=c("http://www.freefaculty.org/cran"))
Warning: unable to access index for repository
http://www.freefaculty.org/cran/src/contrib

and after giving others rx permission on "src".

> update.packages( repos=c("http://www.freefaculty.org/cran"))
RandomFields :
 Version 1.3.47 installed in /usr/local/lib/R/site-library
 Version 2.0.40 available at http://www.freefaculty.org/cran
Update (y/N/c)?

So at least that part works.  right?

2. When I run "apt-get update" against my mirror, I get a lot of
harassment about the lack of a security key for my repository.  Should
I be publishing an R Core Team key to fix that, or my own key? I've
never administered an apt repository before. I have administered yum
repositories, where the security is in the key on the individual RPMs;
I don't quite understand what the Debian thing is asking me to do.

If just a couple of you would test this out, I would feel more brave
in asking to have this listed on the CRAN mirror system.

Regards,

pj

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Set up new CRAN mirror; but have questions before finalizing.

2011-02-19 Thread Paul Johnson
On Sat, Feb 19, 2011 at 12:43 AM, Friedrich Leisch
 wrote:
>>>>>> On Fri, 18 Feb 2011 22:00:28 -0600,
>>>>>> Paul Johnson (PJ) wrote:
>
>  > Hi, everybody

>  > Maybe you might try it and see?  It has updated several times, no
>  > trouble I can see in that.
>
>  > I have a couple of small details to ask about.  Maybe this first one
>  > is a potential  "bug report" for the CRAN mirror instructions.
>
>  > 1. Permissions on "src" and "web" folders are 700,
>
> Then you must have made a mistake in setting up the mirror, because on
> the master we have 775 for both directories and all directories within
> these two. We also had never complaints about this before.
>
Greetings, Fritz

I'm surprised too.  All I did was follow the very clear and simple
instructions in the HOWTO. I run this script in the folder where the
web server lives:

$ cat downloadCran.sh
rsync -rtlzv --delete cran.r-project.org::CRAN \
    /home/freefaculty/freefaculty.org/cran

I promise (sincerely) I changed nothing that forced src or web to 700.

I manually set all the directories to permission 755 and will run
an update to see if they stay that way.  If all stays well, I'll
notify the address listed in the mirror HOWTO that this machine is in
California.

pj

>
> Best regards,
> Fritz Leisch
>



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] function call overhead

2011-02-28 Thread Paul Johnson
Snipping down to bare minimum history before comment:

On Wed, Feb 16, 2011 at 4:28 PM, Olaf Mersmann
 wrote:
> Dear Hadly, dear list,
>
> On Wed, Feb 16, 2011 at 9:53 PM, Hadley Wickham  wrote:
>
>>> system.time(replicate(1e4, base::print))
>>   user  system elapsed
>>  0.539   0.001   0.541
>>> system.time(replicate(1e4, print))
>>   user  system elapsed
>>  0.013   0.000   0.012

>> library("microbenchmark")
>> res <- microbenchmark(print, base::print, times=1)
>> res
>> print(res, unit="eps")
> Unit: evaluations per second
>                    min          lq      median          uq        max
> print       17543859.65 15384615.38 14705882.35 14492753.62 20665.8538
> base::print    23944.64    23064.33    22584.32    20659.88   210.5329
>

I think it is important to say that this slowdown is not unique to R
and is unrelated to the fact that R is interpreted.  The same happens
in compiled object-oriented languages like C++ or Objective-C. There
is an inherent cost in the runtime system to find a function or method
that is suitable to an object.

In agent-based modeling simulations, we call it the cost of "method
lookup" because the runtime system has to check for the existence of a
method each time it is called for a given object.   There is a
time-saving approach where one can cache the result of the lookup and
then call that result directly each time through the loop.
Implementing this is pretty complicated, however, and it is
discouraged unless you really need it.  It is especially dangerous
because this optimization throws away the runtime benefit of matching
the correct method to the class of the object.  (See
http://www.mulle-kybernetik.com/artikel/Optimization/opti-3.html,
which shows how one can even cache C library functions to avoid
lookup overhead. I'm told that the Obj-C 2.0 runtime will try to
optimize this automatically, but I've not tested that.)

The R solution is achieving that exact same kind of speed-up by saving
the function lookup in a local variable. The R approach, however, is
implemented much more easily than the Objective-C solution. There is
an obvious danger: if the saved method is not appropriate to an object
to which it applies, something unpredictable will happen.
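
In R terms, a minimal sketch of that caching idea (it assumes the
microbenchmark package from the quoted example):

library(microbenchmark)
cached <- base::print                     # resolve the '::' lookup once and keep it
microbenchmark(base::print, cached, times = 1000L)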

The same is true in C++.  I was fiddling around with the C++ code that
is included with the R package Siena (awesome package, incidentally)
last year and noticed a similar slowdown with method lookup.  In C++,
I was surprised to find a slowdown inside a class using an instance
variable prefixed with "this->".  For an instance variable, "this->x" and
"x" are the same thing, but to the runtime system, well, there's a
slowdown in finding "this" and getting x, compared to just using x.  To
the programmer who is trying to be clear and careful, putting "this->" on
the front of an instance variable is tidy, but it also slows down the
runtime a lot.

Hope this is not more confusing than when I started :)

pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Standalone C++ application for processing R parser output(SEXP)

2011-03-26 Thread Paul Johnson
On Wed, Mar 23, 2011 at 6:35 PM, Rob Anderson  wrote:
> Hi All,
>
> I am trying to write a source-to-source compiler for R. I am trying to
> leverage the R parser code for the purpose. I am trying to transform the
> SEXP returned from the parser into an AST for our own Ruby embedded Domain
> specific language.
>
> I tried using R CMD SHBIN to compile a C function that parses arbitrary R
> expressions. But I think, the generated .so file can be used from within R
> and not be called from other C or Ruby programs(I get linker errors).
>

I hope I am not telling you what you already know.  There are working
examples of standalone C programs that link with the R math library
(libRmath). I learned from the examples here:

http://www.math.ncu.edu.tw/~chenwc/R_note/index.php?item=standalone

and

http://www.stat.berkeley.edu/classes/s243/rmath.html

I can't say for sure if these give you access to all of the R stuff
you want, but you do get access to quite a bit.

PJ




> My Idea  is to use the SEXP processing functions/MACROS (CAR, CDR, CADR,
> etc..) from within C code and transform it to our AST format. I tried
> linking to libR.a and other lib*.so's when I compile the C code using gcc
> but, it doesn't work.
>
> I read that R exposes only small number of functions for library/package
> writers and the compiled *.so can only from within R.
>
> Any ideas on what is wrong, or how I can go about it?
>
> Appreciate any help.
>
> Thanks
> RJ
>
>        [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Wish there were a "strict mode" for R interpreter. What about You?

2011-04-09 Thread Paul Johnson
Years ago, I did lots of Perl programming. Perl will let you be lazy
and write functions that refer to undefined variables (like R does),
but there is also a strict mode so the interpreter will refuse to run
code that mentions a variable that has not been defined. I wish there
were a strict mode for checking R functions.

Here's why. We have a lot of students writing R functions around here
and they run into trouble because they use the same name for things
inside and outside of functions. When they call functions that have
mistaken or undefined references to names that they use elsewhere,
then variables that are in the environment are accidentally used. Know
what I mean?

dat <- whatever

someNewFunction <- function(z, w){
    ## do something with z and w to create a new "dat",
    ## but forget to assign it the name "dat"
    lm(y ~ x, data = dat)
    ## lm just used the wrong (global) data
}
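
Here is a runnable toy version of that trap (the data and names are made
up):

dat <- data.frame(x = 1:5, y = 1:5)              # global object
fitIt <- function(newdat) {
    ## meant to use 'newdat', but typed 'dat' by mistake
    lm(y ~ x, data = dat)                        # silently fits the global data
}
coef(fitIt(data.frame(x = 1:5, y = 5:1)))        # slope is +1, not -1: no error, just a wrong answer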

I wish R had a strict mode to return an error in that case. Users
don't realize they are getting nonsense because R finds things to fill
in for their mistakes.

Is this possible?  Does anybody agree it would be good?

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Wish R Core had a standard format (or generic function) for "newdata" objects

2011-04-26 Thread Paul Johnson
Is anybody working on a way to standardize the creation of "newdata"
objects for predict methods?

When using predict, I find it difficult/tedious to create newdata data
frames when there are many variables. It is necessary to set all
variables at the mean/mode/median, and then for some variables of
interest, one has to insert values for which predictions are desired.
I was at a presentation by Scott Long last week and he was discussing
the increasing emphasis in Stata on calculations of marginal
predictions in "SPost" and several other packages, and,
coincidentally, I had a student visit who is learning to use polr
from R's MASS package (W. Venables and B. Ripley), and we wrestled for
quite a while to try to make the same calculations that Stata makes
automatically.  It spits out predicted probabilities for each independent
variable, keeping the other variables at a reference level.
I've found R packages that aim to do essentially the same thing.

In Frank Harrell's Design/rms framework, he uses a "datadist"
function that generates an object that the user has to put into the R
options.  I think many users trip over the use of "options" there.  If
I don't use that for a month or two, I completely forget the fine
points and have to fight with it.  But it does "work" to give plots
and predict functions the information they require.

In Zelig (by Kosuke Imai, Gary King, and Olivia Lau), a function
"setx" does the work of creating "newdata" objects. That appears to be
about right as a candidate for a generic "newdata" function. Perhaps
it could directly generalize to all R regression functions, but right
now it is tailored to the models in Zelig. It has separate methods for
the different types of models, and that is a bit confusing to me, since
the "newdata" in one model should be the same as the newdata in
another, I'm guessing. But the code is all there; I'll keep looking.

In Effects (by John Fox), there are internal functions to create
newdata and plot the marginal effects. If you load effects and run,
for example, "effects:::effect.lm", you see that Prof. Fox has his own way
of grabbing information from model columns and calculating predictions.

I think it is time for the R Core Team to look at this and tell us what
the right way to do this is. I think the interface to setx in Zelig is
pretty easy to understand, at least for numeric variables.
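
To be concrete, here is a hedged sketch of the kind of helper I have in
mind; newdataAt() and everything in it are made-up illustrations, not an
existing API:

newdataAt <- function(model, focal, values) {
    mf <- model.frame(model)[, -1, drop = FALSE]   # drop the response
    typical <- lapply(mf, function(v) {
        if (is.numeric(v)) mean(v)
        else factor(names(which.max(table(v))), levels = levels(v))
    })
    nd <- data.frame(typical, check.names = FALSE)[rep(1, length(values)), , drop = FALSE]
    nd[[focal]] <- values
    rownames(nd) <- NULL
    nd
}
mtc <- transform(mtcars, cyl = factor(cyl))
m <- lm(mpg ~ wt + hp + cyl, data = mtc)
predict(m, newdata = newdataAt(m, "wt", c(2, 3, 4)))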

In R's termplot function, such a thing could be put to use.  As far as
I can tell now, termplot is doing most of the work of creating a
newdata object, but not exactly.

It seems like it would be a shame to proliferate more functions that
do the same thing, when it is such a common need.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wish R Core had a standard format (or generic function) for "newdata" objects

2011-04-27 Thread Paul Johnson
On Tue, Apr 26, 2011 at 7:39 PM, Duncan Murdoch
 wrote:

> If you don't like the way this was done in my three lines above, or by Frank
> Harrell, or the Zelig group, or John Fox, why don't you do it yourself, and
> get it right this time?  It's pretty rude to complain about things that
> others have given you for free, and demand they do it better.
>
> Duncan Murdoch
>

I offer a sincere apology for sounding that way.  I'm not attacking
anybody. I'm just talking, asking whether you agree this should be
standardized.  You disagree, and I respect that, since you are
actually doing the work.

From a "lowly user's point of view", I wish "you experts" out there
would tell us one way to do this, so we could follow your example.

When there's a regression model fitted with 20 variables in it, and
half of them are numeric, 4 are unordered factors, 3 are ordinal
factors, and what not, then this is a hard problem for many of us
ordinary users.  Or it is tedious.  They want to "keep everything fixed"
except one variable that takes on different specified values, and
they want to do that for every variable, one at a time.

Stata has made this easy for many models; R could as well, if we
coalesced on a more-or-less standard way to create newdata objects for
predict.

But, in the end, I agree with your sentiment.  I just have to do it and
show you it is handy.  I think Zelig's setx has it about right; I'll
pursue that strategy.

pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] median and data frames

2011-04-27 Thread Paul Johnson
On Wed, Apr 27, 2011 at 12:44 PM, Patrick Burns
 wrote:
> Here are some data frames:
>
> df3.2 <- data.frame(1:3, 7:9)
> df4.2 <- data.frame(1:4, 7:10)
> df3.3 <- data.frame(1:3, 7:9, 10:12)
> df4.3 <- data.frame(1:4, 7:10, 10:13)
> df3.4 <- data.frame(1:3, 7:9, 10:12, 15:17)
> df4.4 <- data.frame(1:4, 7:10, 10:13, 15:18)
>
> Now here are some commands and their answers:

>> median(df4.4)
> [1]  8.5 11.5
>> median(df3.2[c(1,2,3),])
> [1] 2 8
>> median(df3.2[c(1,3,2),])
> [1]  2 NA
> Warning message:
> In mean.default(X[[2L]], ...) :
>  argument is not numeric or logical: returning NA
>
>
>
> The sessionInfo is below, but it looks
> to me like the present behavior started
> in 2.10.0.
>
> Sometimes it gets the right answer.  I'd
> be grateful to hear how it does that -- I
> can't figure it out.
>

Hello, Pat.

Nice poetry there!  I think I have an actual answer, as opposed to the
usual crap I spew.

I would agree if you said median.data.frame ought to be written to
work columnwise, similar to mean.data.frame.

apply and sapply always give the correct answer:

> apply(df3.3, 2, median)
  X1.3   X7.9 X10.12
     2      8     11

> apply(df3.2, 2, median)
X1.3 X7.9
   2    8

> apply(df3.2[c(1,3,2),], 2, median)
X1.3 X7.9
   2    8

mean.data.frame is now implemented as

mean.data.frame <- function(x, ...) sapply(x, mean, ...)

I think we would suggest this for medians:

??

median.data.frame <- function(x,...) sapply(x, median, ...)

?

It works, see:

> median.data.frame(df3.2[c(1,3,2),])
X1.3 X7.9
   2    8

Would our next step be to enter that somewhere in R bugzilla? (I'm not
joking--I'm that naive).

I think I can explain why the current median works intermittently in
those cases you mention.  Give it a small set of pre-sorted data and all
is well.  median.default uses a sort function, and it is confused when
it is given a data.frame object rather than just a vector.


I put a browser() at the top of median.default

> median(df3.2[c(1,3,2),])
Called from: median.default(df3.2[c(1, 3, 2), ])
Browse[1]> n
debug at #4: if (is.factor(x)) stop("need numeric data")
Browse[2]> n
debug at #4: NULL
Browse[2]> n
debug at #6: if (length(names(x))) names(x) <- NULL
Browse[2]> n
debug at #6: names(x) <- NULL
Browse[2]> n
debug at #8: if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x)))
return(x[FALSE][NA])
Browse[2]> n
debug at #8: if (any(is.na(x))) return(x[FALSE][NA])
Browse[2]> n
debug at #8: NULL
Browse[2]> n
debug at #12: n <- length(x)
Browse[2]> n
debug at #13: if (n == 0L) return(x[FALSE][NA])
Browse[2]> n
debug at #13: NULL
Browse[2]> n
debug at #15: half <- (n + 1L)%/%2L
Browse[2]> n
debug at #16: if (n%%2L == 1L) sort(x, partial = half)[half] else
mean(sort(x,
partial = half + 0L:1L)[half + 0L:1L])
Browse[2]> n
debug at #16: mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])
Browse[2]> n
[1]  2 NA
Warning message:
In mean.default(X[[2L]], ...) :
  argument is not numeric or logical: returning NA


Note the sort there in step 16. I think that's what is killing us.

If you are lucky and give it a small data frame that is in order, like
df3.2, the sort doesn't produce gibberish. When I get to that point, I
will show you the sort's effect.

First, the case that "works". I moved the browser() down, because I
got tired of looking at the same old not-yet-erroneous output.


> median(df3.2)
Called from: median.default(df3.2)
Browse[1]> n
debug at #15: half <- (n + 1L)%/%2L
Browse[2]> n
debug at #16: if (n%%2L == 1L) sort(x, partial = half)[half] else
mean(sort(x,
partial = half + 0L:1L)[half + 0L:1L])
Browse[2]> n
debug at #16: mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])

Interactively, type

Browse[2]> sort(x, partial = half + 0L:1L)
  NA NA   NA   NA   NA   NA
1  1  7 NULL NULL NULL NULL
2  2  8
3  3  9
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

But it still gives you a "right" answer:

Browse[2]> n
[1] 2 8


But if you give it data out of order, the second column turns to NA,
and that causes doom.


> median(df3.2[c(1,3,2),])
Called from: median.default(df3.2[c(1, 3, 2), ])
Browse[1]> n
debug at #15: half <- (n + 1L)%/%2L
Browse[2]> n
debug at #16: if (n%%2L == 1L) sort(x, partial = half)[half] else
mean(sort(x,
partial = half + 0L:1L)[half + 0L:1L])
Browse[2]> n
debug at #16: mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])

Interactively:

Browse[2]> sort(x, partial = half + 0L:1L)
  NA   NA NA   NA   NA   NA
1  1 NULL  7 NULL NULL NULL
3  3   9   
2  2   8   
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

Browse[2]> n
[1]  2 NA
Warning message:
In mean.default(X[[2L]], ...) :
  argument is not numeric or logical: returning NA


Here's a larger test case. Note columns 1 and 3 turn to NULL

> df8.8 <- data.frame(a=2:8, b=1:7)

median(df8.8)


[Rd] requesting a mentor for R development

2011-06-20 Thread Paul Johnson
I'd like to learn the process of revising R functions & packages and
then submitting proposed patches to the R Core team.  Would someone be
willing to mentor me through one example?

For starters, consider an example.  I'd like to revise the t.test
function to return the stderr value to the user.  We only need to
change the "rval" in the third-from-the end line of
stats:::t.test.default.

Change this:

 rval <- list(statistic = tstat, parameter = df, p.value = pval,
conf.int = cint, estimate = estimate, null.value = mu,
alternative = alternative, method = method, data.name = dname)
class(rval) <- "htest"
return(rval)

To this:

 rval <- list(statistic = tstat, parameter = df, stderr=stderr, p.value = pval,
conf.int = cint, estimate = estimate, null.value = mu,
alternative = alternative, method = method, data.name = dname)
class(rval) <- "htest"
return(rval)

Here is where I need help.

1. What other changes in the R code & documents would be required for
consistency?

2. How does R Core Team expect/accept patches?
I do understand a bit about SVN and CVS.  I think I could mark this up
and make a diff, but I'm uncertain about the details.

3. How would I best  prepare a suggestion like this so as to get it
accepted into R?

If somebody will raise a hand, I'm willing to do the work to learn.

pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Randomness not due to seed

2011-07-25 Thread Paul Johnson
On Tue, Jul 19, 2011 at 8:13 AM, jeroen00ms  wrote:
> I am working on a reproducible computing platform for which I would like to
> be able to _exactly_ reproduce an R object. However, I am experiencing
> unexpected randomness in some calculations. I have a hard time finding out
> exactly how it occurs. The code below illustrates the issue.
>
> mylm1 <- lm(dist~speed, data=cars);
> mylm2 <- lm(dist~speed, data=cars);
> identical(mylm1, mylm2); #TRUE
>
> makelm <- function(){
>        return(lm(dist~speed, data=cars));
> }
>
> mylm1 <- makelm();
> mylm2 <- makelm();
> identical(mylm1, mylm2); #FALSE
>
> When inspecting both objects there seem to be some rounding differences.
> Setting a seed does not make a difference. Is there any way I can remove
> this randomness and exactly reproduce the object every time?
>

William Dunlap was correct.  Observe in the sequence of comparisons
below, the difference in the "terms" object is causing the identical
to fail. Everything else associated with this model--the coefficients,
the r-square, cov matrix, etc.--exactly matches.


> mylm1 <- lm(dist~speed, data=cars);
> mylm2 <- lm(dist~speed, data=cars);
> identical(mylm1, mylm2); #TRUE
[1] TRUE
> makelm <- function(){
+return(lm(dist~speed, data=cars));
+ }
> mylm1 <- makelm();
> mylm2 <- makelm();
> identical(mylm1, mylm2); #FALSE
[1] FALSE
> identical(coef(mylm1), coef(mylm2))
[1] TRUE
> identical(summary(mylm1), summary(mylm2))
[1] FALSE
> identical(coef(summary(mylm1)), coef(summary(mylm2)))
[1] TRUE
> all.equal(mylm1, mylm2)
[1] TRUE
> identical(summary(mylm1)$r.squared, summary(mylm2)$r.squared)
[1] TRUE
> identical(summary(mylm1)$adj.r.squared, summary(mylm2)$adj.r.squared)
[1] TRUE
> identical(summary(mylm1)$sigma, summary(mylm2)$sigma)
[1] TRUE
> identical(summary(mylm1)$fstatistic, summary(mylm2)$fstatistic)
[1] TRUE
> identical(summary(mylm1)$residuals, summary(mylm2)$residuals)
[1] TRUE
> identical(summary(mylm1)$cov.unscaled, summary(mylm2)$cov.unscaled)
[1] TRUE
> identical(summary(mylm1)$call, summary(mylm2)$call)
[1] TRUE
> identical(summary(mylm1)$terms, summary(mylm2)$terms)
[1] FALSE

> summary(mylm2)$terms
dist ~ speed
attr(,"variables")
list(dist, speed)
attr(,"factors")
  speed
dist  0
speed 1
attr(,"term.labels")
[1] "speed"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x...>
attr(,"predvars")
list(dist, speed)
attr(,"dataClasses")
 dist speed
"numeric" "numeric"
>
> summary(mylm1)$terms
dist ~ speed
attr(,"variables")
list(dist, speed)
attr(,"factors")
  speed
dist  0
speed 1
attr(,"term.labels")
[1] "speed"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x...>
attr(,"predvars")
list(dist, speed)
attr(,"dataClasses")
 dist speed
"numeric" "numeric"




-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] CRAN mirror size mushrooming; consider archiving some?

2011-07-25 Thread Paul Johnson
Hi, everybody

I'm setting up a new CRAN mirror and filled up the disk space the
server allotted me.  I asked for more, then filled that up.  Now the
system administrators want me to buy an $800 fiber channel card and a
storage device.  I'm going to do that, but it does make me want to
suggest to you that this is a problem.

CRAN now is about 68GB, and about 3/4 of that is in the bin folder,
where one finds copies of compiled packages for macosx and windows.
If the administrators of CRAN would move the packages for R before,
say, 2.12, to long term storage, then mirror management would be a bit
more, well, manageable.

Moving the R for windows packages for, say, R 2.0 through 2.10 would
save some space, and possibly establish a useful precedent for the
long term.

Here's the bin/windows folder. Note it is expanding exponentially (or nearly so)

$ du --max-depth=1 | sort
1012644 ./2.6
103504  ./1.7
122200  ./1.8
1239876 ./2.7
1487024 ./2.8
15220   ./ATLAS
167668  ./1.9
17921604 .
1866196 ./2.9
204392  ./2.0
2207708 ./2.10
2340120 ./2.13
2356272 ./2.12
2403176 ./2.11
298620  ./2.1
364292  ./2.2
438044  ./2.3
595920  ./2.4
698064  ./2.5
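
In the meantime, a mirror operator can trim what gets synced locally.
This is only a hypothetical sketch (the exclude patterns and the target
directory are illustrative, and skipping directories of course makes the
mirror partial, so it should be labeled as such):

rsync -rtlzv --delete \
  --exclude='bin/windows/contrib/1.*/' \
  --exclude='bin/windows/contrib/2.[0-9]/' \
  cran.r-project.org::CRAN /srv/cran-mirror/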

-- 

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Please explain your workflow from R code -> package -> R code -> package

2011-09-09 Thread Paul Johnson
Hi,

I'm asking another one of those questions that would be obvious if I
could watch your work while you do it.

I'm having trouble understanding the workflow of code and package maintenance.

Stage 1.  Make some R functions in a folder.  This is in a Subversion repo

R/trunk/myproject

Stage 2. Make a package:

After the package.skeleton, and R check, I have a new folder with the
project in it,

R/trunk/myproject/mypackage
  DESCRIPTION
  man
  R

I go into the man folder and manually edit the Rd files. I don't
change anything in the R folder because I think it is OK so far.

And eventually I end up with a tarball mypackage_1.0.tar.gz.

Stage 3. How to make the round trip? I add more R code, and
re-generate a package.

package.skeleton obliterates the help files I've already edited.

So keeping the R code in sync with the documentation appears to be a hassle.

In other languages, I've seen people write the documentation inside the
code files and then post-process it to make the documentation.  Is there
a similar thing for R, to unify the R code development and
documentation/package-making process?
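
(One possible answer, sketched here with a made-up function: roxygen2-style
comments keep the documentation next to the code, and the Rd files are
regenerated from them rather than re-created by package.skeleton, so hand
edits are never clobbered.)

#' Add two numeric vectors
#'
#' @param x a numeric vector
#' @param y a numeric vector
#' @return the elementwise sum of x and y
#' @export
add2 <- function(x, y) {
    x + y
}

## then, from the package directory, regenerate man/*.Rd with:
## roxygen2::roxygenise()    # or devtools::document()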

pj

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Proposed change in file.exists() to tolerate Windows

2015-08-27 Thread Paul Johnson
I'm writing to ask if R Core would make file.exists more Windows
tolerant when the argument has a trailing slash. This has been
discussed by users a few times here, I know it is not a new topic. But
it is not a solved problem, yet. I acknowledge that CRAN packages
exist which fix this by replacing file.exists(), but it seems more
elegant to me to fix the problem in R itself.

R Core goes to great extremes to accommodate Windows users and the
refusal to make file.exists() work in a cross-platform way is
incongruous.

I often do have slashes on the end of directory names being tested.
Now that I understand the meaning of ?file.exists,  I need to wrap the
name being checked in a slash-deleter

## delete trailing slashes
dts <- function(name) gsub("/$", "", name)
if(!file.exists(dts(any_name))) { ...

Can't you make file.exists do this invisibly? Maybe the argument could
be filtered through normalizePath() instead.
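
(A sketch of the kind of wrapper meant here; file.exists2 is a made-up
name, and only trailing slashes and backslashes are handled:)

file.exists2 <- function(...) {
    ## drop trailing forward or back slashes before testing
    file.exists(sub("[/\\\\]+$", "", c(...)))
}
file.exists2("C:/Users/someone/Documents/")   # hypothetical path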

If not, would you please consider putting a workaround like mine into
the file.exists documentation so Windows users can see how easy this
is to avoid?

Respectfully yours,

pj

-- 
Paul E. Johnson
Professor, Political Science        Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org  http://crmda.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] MKL Acceleration encouraging; need adjust package builds?

2015-11-23 Thread Paul Johnson
Dear R-devel:

The Cluster administrators at KU got enthusiastic about testing
R-3.2.2 with Intel MKL when I asked for some BLAS integration.  Below
I forward a performance report, which is encouraging, and thought you
would like to know the numbers.  Appears to my untrained eye there are
some extraordinary speedups on Cholesky decomposition, determinants,
and matrix inversion.

They had difficulty getting R to compile with a shared R BLAS (don't
know what went wrong there), so they went the other direction.

In his message to me, the technician says that I should consider
adjusting the compilation flags on the packages that use BLAS.  Do you
think that is needed? If R is compiled with non-shared BLAS libraries,
won't packages know where to look for the BLAS headers?

2. If I need to do that, I wonder how to do it and which packages need
attention: the Eigen and Armadillo packages, possibly the ones that
depend on them (lme4), and anything flowing through Rcpp.

Here's the build for some packages. Are they finding MKL BLAS?  How
would I know?
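
(One way to check after the fact, sketched here: inspect what the
compiled shared object actually links against.)

ldd /panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/RcppArmadillo/libs/RcppArmadillo.so \
  | grep -iE 'mkl|blas'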

* installing *source* package 'RcppArmadillo' ...
** package 'RcppArmadillo' successfully unpacked and MD5 sums checked
* checking LAPACK_LIBS: divide-and-conquer complex SVD available via
system LAPACK
** libs
g++ -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include
-I/usr/local/include
-I"/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/Rcpp/include"
 -I../inst/include -fpic  -O3 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
-fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
-mtune=generic -c RcppArmadillo.cpp -o RcppArmadillo.o
g++ -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include
-I/usr/local/include
-I"/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/Rcpp/include"
 -I../inst/include -fpic  -O3 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
-fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
-mtune=generic -c RcppExports.cpp -o RcppExports.o
g++ -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include
-I/usr/local/include
-I"/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/Rcpp/include"
 -I../inst/include -fpic  -O3 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
-fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
-mtune=generic -c fastLm.cpp -o fastLm.o
g++ -shared -L/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/lib
-L/usr/local/lib64 -o RcppArmadillo.so RcppArmadillo.o RcppExports.o
fastLm.o -L/panfs/pfs.acf.ku.edu/cluster/6.2/intel/2015/mkl/lib/intel64
-Wl,--no-as-needed -lmkl_gf_lp64 -Wl,--start-group -lmkl_gnu_thread
-lmkl_core -Wl,--end-group -fopenmp -ldl -lpthread -lm -lgfortran -lm
-L/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/lib -lR
installing to 
/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/RcppArmadillo/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (RcppArmadillo)

* installing *source* package 'RcppEigen' ...
** package 'RcppEigen' successfully unpacked and MD5 sums checked
** libs
g++ -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include
-I/usr/local/include
-I"/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/Rcpp/include"
 -I../inst/include -fpic  -O3 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
-fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
-mtune=generic -c RcppEigen.cpp -o RcppEigen.o
g++ -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include
-I/usr/local/include
-I"/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/Rcpp/include"
 -I../inst/include -fpic  -O3 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
-fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
-mtune=generic -c RcppExports.cpp -o RcppExports.o
g++ -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include
-I/usr/local/include
-I"/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/Rcpp/include"
 -I../inst/include -fpic  -O3 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
-fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
-mtune=generic -c fastLm.cpp -o fastLm.o
g++ -shared -L/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/lib
-L/usr/local/lib64 -o RcppEigen.so RcppEigen.o RcppExports.o fastLm.o
-L/panfs/pfs.acf.ku.edu/cluster/6.2/intel/2015/mkl/lib/intel64
-Wl,--no-as-needed -lmkl_gf_lp64 -Wl,--start-group -lmkl_gnu_thread
-lmkl_core -Wl,--end-group -fopenmp -ldl -lpthread -lm -lgfortran -lm
-L/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/lib -lR
installing to 
/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/RcppEigen/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (RcppEigen)

* installing *source* package 'MatrixModels' ...
** package 'MatrixModels' successfully unpacked and MD5 sums checked
** R
** preparing package for lazy loading
Creating a generic function for 'resid' from package 'stats' in
package 'MatrixModels'
Creating a generic function for 'fitted.va

Re: [Rd] MKL Acceleration encouraging; need adjust package builds?

2015-11-25 Thread Paul Johnson
On Mon, Nov 23, 2015 at 11:39 AM, David Smith  wrote:
> Hi Paul,
>
> We've been through this process ourselves for the Revolution R Open project. 
> There are a number of pitfalls to avoid, but you can take a look at how we 
> achieved it in the build scripts at:
>
> https://github.com/RevolutionAnalytics/RRO
>
> There are also some very useful notes in the R Installation guide:
> https://cran.r-project.org/doc/manuals/r-release/R-admin.html#BLAS
>
> Most packages do benefit from MKL (or any multi-threaded BLAS) to some 
> degree, although the actual benefit depends on the R functions they call. 
> Some packages (and some built-in R functions) don't call into BLAS endpoints, 
> so you won't see benefits in all cases.
>
> # David Smith
>
Dear David

I'm in the situation mentioned here in the docs, since BLAS is not shared.

"Note that under Unix (but not under Windows) if R is compiled against
a non-default BLAS and --enable-BLAS-shlib is not used, then all
BLAS-using packages must also be. So if R is re-built to use an
enhanced BLAS then packages such as quantreg will need to be
re-installed. "

I am building all of the modules from scratch, so if the default build
is sufficient, then I'll be done. When I asked the other day, I was
worried that packages would find the wrong shared library. As far as I
can tell now, I should not have been so worried.

Today, while browsing the R installation, I found the Makeconf file,
which has all the information a package should need.  I've verified
that the quantreg package detects this information, and we'll just
hope the others do too :)

In case anybody else comes along later and wonders how R can be
configured to make this go, here's the top of our Makeconf from the
installed R, which has the configure line as well as BLAS_LIBS, which,
so far as I can tell, is making all of this go.

Makeconf content

# etc/Makeconf.  Generated from Makeconf.in by configure.
#
# ${R_HOME}/etc/Makeconf
# R was configured using the following call
# (not including env. vars and site configuration)
# configure  '--prefix=/tools/cluster/6.2/R/3.2.2_mkl' '--with-tcltk'
'--enable-R-shlib' '--enable-shared' '--with-pic'
'--disable-BLAS-shlib'
'--with-blas=-L/panfs/pfs.acf.ku.edu/cluster/6.2/intel/2015/mkl/lib/intel64
-Wl,--no-as-needed -lmkl_gf_lp64 -Wl,--start-group -lmkl_gnu_thread
-lmkl_core  -Wl,--end-group -fopenmp  -ldl -lpthread -lm'
'--with-lapack'
'CFLAGS=-I/panfs/pfs.acf.ku.edu/cluster/system/pkg/R/curl7.45_install/include
-L/panfs/pfs.acf.ku.edu/cluster/6.2/R/3.2.2_mkl/lib64'
'JAVA_HOME=/tools/cluster/6.2/java/jdk1.8.0_66'

## This fails if it contains spaces, or if it is quoted
include $(R_SHARE_DIR)/make/vars.mk

AR = ar
## Used by packages 'maps' and 'mapdata'
AWK = gawk
BLAS_LIBS = -L/panfs/pfs.acf.ku.edu/cluster/6.2/intel/2015/mkl/lib/intel64
-Wl,--no-as-needed -lmkl_gf_lp64 -Wl,--start-group -lmkl_gnu_thread
-lmkl_core  -Wl,--end-group -fopenmp  -ldl -lpthread -lm
C_VISIBILITY = -fvisibility=hidden
...



pj

-- 
Paul E. Johnson
Professor, Political Science        Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org  http://crmda.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Package Dependency

2015-12-01 Thread Paul Johnson
Lately, I see a dependency problem with R-3.2 systems.
install.packages is asked to resolve dependencies, but some of them
are not installed, so the install fails.

I see this a lot with students who are able to install packages like
ggplot2 but then can't load it because a dependency "stringi" is
missing.  Those students are generally using Windows, but today I
reproduced the same on RedHat Linux with R-3.2.2.

desiredPackages is a big vector of packages to install. This is using
a trick that one of R Core taught me about 15 years ago...




> installedPackages <- rownames (installed.packages() )

> alreadyHave <- desiredPackages %in% installedPackages


> install.packages(desiredPackages[!alreadyHave],
+ dependencies = c("Depends"),
+ lib = TARGETLIB)
trying URL 'http://rweb.crmda.ku.edu/cran/src/contrib/DiagrammeR_0.8.1.tar.gz'
Content type 'application/x-gzip' length 3438156 bytes (3.3 MB)
==
downloaded 3.3 MB

trying URL 'http://rweb.crmda.ku.edu/cran/src/contrib/emplik_1.0-2.tar.gz'
Content type 'application/x-gzip' length 199124 bytes (194 KB)
==
downloaded 194 KB

trying URL 'http://rweb.crmda.ku.edu/cran/src/contrib/glmc_0.2-4.tar.gz'
Content type 'application/x-gzip' length 19721 bytes (19 KB)
==
downloaded 19 KB

trying URL 'http://rweb.crmda.ku.edu/cran/src/contrib/sem_3.1-6.tar.gz'
Content type 'application/x-gzip' length 156194 bytes (152 KB)
==
downloaded 152 KB

trying URL 'http://rweb.crmda.ku.edu/cran/src/contrib/OpenMx_2.3.1.tar.gz'
Content type 'application/x-gzip' length 2537264 bytes (2.4 MB)
==
downloaded 2.4 MB

ERROR: dependency 'visNetwork' is not available for package 'DiagrammeR'
* removing 
'/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/DiagrammeR'


However, I can manually install that individual package



> install.packages("visNetwork",
+ dependencies = c("Depends"),
+ lib = TARGETLIB)
trying URL 'http://rweb.crmda.ku.edu/cran/src/contrib/visNetwork_0.1.2.tar.gz'
Content type 'application/x-gzip' length 1828012 bytes (1.7 MB)
==
downloaded 1.7 MB

* installing *source* package 'visNetwork' ...
** package 'visNetwork' successfully unpacked and MD5 sums checked
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (visNetwork)

The downloaded source packages are in
'/library/tmp/RtmpBImblN/downloaded_packages'


After that, I start over,  the install of the desiredPackages vector
proceeds, but hits another snag:

> install.packages(desiredPackages[!alreadyHave],
+ dependencies = c("Depends"),
+ lib = TARGETLIB)
trying URL 'http://rweb.crmda.ku.edu/cran/src/contrib/DiagrammeR_0.8.1.tar.gz'
Content type 'application/x-gzip' length 3438156 bytes (3.3 MB)
==
downloaded 3.3 MB

trying URL 'http://rweb.crmda.ku.edu/cran/src/contrib/sem_3.1-6.tar.gz'
Content type 'application/x-gzip' length 156194 bytes (152 KB)
==
downloaded 152 KB

trying URL 'http://rweb.crmda.ku.edu/cran/src/contrib/OpenMx_2.3.1.tar.gz'
Content type 'application/x-gzip' length 2537264 bytes (2.4 MB)
==
downloaded 2.4 MB

* installing *source* package 'DiagrammeR' ...
** package 'DiagrammeR' successfully unpacked and MD5 sums checked
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (DiagrammeR)
ERROR: dependency 'StanHeaders' is not available for package 'OpenMx'


Go back, install just that one

> install.packages("StanHeaders", dependencies = c("Depends"), lib = TARGETLIB)
trying URL 'http://rweb.crmda.ku.edu/cran/src/contrib/StanHeaders_2.8.0.tar.gz'
Content type 'application/x-gzip' length 479231 bytes (467 KB)
==
downloaded 467 KB

* installing *source* package 'StanHeaders' ...
** package 'StanHeaders' successfully unpacked and MD5 sums checked
** inst
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (StanHeaders)

The downloaded source packages are in
'/library/tmp/RtmpBImblN/downloaded_packages'

I don't recall seeing this problem until the last few months.
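
(A hedged guess at the cause, with a sketch: dependencies = c("Depends")
skips packages listed under Imports and LinkingTo, which appears to be
where visNetwork and StanHeaders come in. Asking for those fields too
may avoid the two-pass dance.)

install.packages(desiredPackages[!alreadyHave],
                 dependencies = c("Depends", "Imports", "LinkingTo"),
                 lib = TARGETLIB)
## the default, dependencies = NA, covers the same three fields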

-- 
Paul E. Johnson
Professor, Political Science        Director
1541 Lilac Lane, Room 504  Center 

[Rd] something wrong in package submission procedure/website

2016-02-08 Thread Paul Johnson
Yesterday I uploaded new rockchalk_1.8.97.  Then I received email
saying that I needed to confirm the submission.  Here's the message.

Dear Paul E. Johnson
Someone has submitted the package rockchalk to CRAN.
You are receiving this email to confirm the submission as the maintainer of
this package.
To confirm the submission to CRAN, follow or copy & paste the following
link into your browser:

http://xmpalantir.wu.ac.at/cransubmit/conf_mail.php?code=b95e95eb5317901449ed7d311053bad8

If you did not submit the package or do not wish for it to be submitted to
CRAN, simply ignore this email

Submission Information:
Submitter: Paul Johnson 
Package: rockchalk
Version: 1.8.97

Note the email has the correct package version 1.8.97, what I uploaded
yesterday.

The email points to a webpage where I confirm submission.  The 3rd box
there requires me to agree to "I have fixed all problems shown on the
package check page" (which links to
http://cran.r-project.org/web/checks/check_results_rockchalk.html).

The problem is that the package check has a report on the PREVIOUS
rockchalk 1.8.92, not the one I uploaded yesterday.

So none of the beautiful repairs I implemented show up :(

pj

-- 
Paul E. Johnson
Professor, Political Science        Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org  http://crmda.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] configure statement for R-devel with updated zlib in user account

2016-02-12 Thread Paul Johnson
I'm aware R-devel no longer includes zlib. This works fine on
up-to-date Linux systems.

On the older Centos 6 cluster at KU, they have a zlib too old for
R-devel. The R-devel configure fails thus:

checking if zlib version >= 1.2.5... no
checking whether zlib support suffices... configure: error: zlib
library and headers are required

In other software, I've seen this kind of thing, so I installed zlib
1.2.8 into $HOME/packages and I can verify that the right pieces are
installed there.

Now I need you to tell me how to tell R to use that. Guesses so far:

$ export PATH=$HOME/packages/bin:$PATH
$ export LD_LIBRARY_PATH=$HOME/packages/lib:$LD_LIBRARY_PATH

What does R's configure statement need in order to find zlib in my user account?

In R-devel, the ./configure --help doesn't give me a simple-enough
explanation of what I'm supposed to do. I thought I had a solution by
adding this

--with-libpth-prefix=$HOME/packages

but configure still rejects it; the new zlib is not found.
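
(For reference, a sketch of the standard approach, assuming the headers
landed in $HOME/packages/include and the libraries in $HOME/packages/lib:)

export CPPFLAGS="-I$HOME/packages/include"
export LDFLAGS="-L$HOME/packages/lib"
export LD_LIBRARY_PATH="$HOME/packages/lib:$LD_LIBRARY_PATH"
./configure --prefix=$HOME/R-devel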

-- 
Paul E. Johnson
Professor, Political Science        Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org  http://crmda.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] configure statement for R-devel with updated zlib in user account

2016-02-18 Thread Paul Johnson
On Mon, Feb 15, 2016 at 3:28 AM, Jesper Gådin  wrote:
> Hi Paul,
>
> This might be what you are looking for.
> https://stat.ethz.ch/pipermail/r-devel/2015-April/070951.html
>
> Jesper
>

Thanks to help from Wes Mason at KU and Jesper Gadin in r-devel, I
compiled R-devel on RHEL 6.

R-devel requires install of newer gzip, bzip2, xz, pcre, and curl.
Compiles of bzip2 and pcre require a slightly greater-than-usual
amount of attention to detail.  I wrote it all out:

http://pj.freefaculty.org/blog/?p=315 "Building R-devel on RedHat Linux 6"

Regards

pj
-- 
Paul E. Johnson
Professor, Political Science        Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org  http://crmda.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Suggestion: pdf.options(embed=TRUE)

2016-05-31 Thread Paul Johnson
I learned last week that I generated a lot of PDFs and forgot to go
back and embed the fonts in them.

It would be very pleasant for me if pdf.options included an argument
to turn on font embedding and have fonts always embedded. Always.
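
(The current two-step workaround, sketched with a hypothetical file
name; embedFonts() requires Ghostscript:)

pdf("fig1.pdf")                  # hypothetical output file
plot(rnorm(100))
dev.off()
embedFonts("fig1.pdf", outfile = "fig1-embedded.pdf")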

Thanks for your time, as always.

-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

I only use this account for email list memberships. To write directly,
address me at pauljohn
at ku.edu.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] seq.int does not return a sequence of integers sometimes

2016-08-03 Thread Paul Johnson
I have a script that goes wrong because I assumed that seq.int would
return integers.

Below, please see that it does not, unless the user is super cautious about
inserting "L" with the inputs. I think seq.int should do the coercion for me
before returning the sequence.

> xx <- seq.int(1,10)
> class(xx)
[1] "integer"
> is.integer(xx)
[1] TRUE
> xx <- seq.int(1,10, 2)
> class(xx)
[1] "numeric"
> is.integer(xx)
[1] FALSE
> xx <- seq.int(1,10, 2L)
> class(xx)
[1] "numeric"
> is.integer(xx)
[1] FALSE
> xx <- seq.int(1L, 10L, 2L)
> class(xx)
[1] "integer"
> is.integer(xx)
[1] TRUE


I think all of those should have the same return value, if the function is
correctly named seq.int.
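
(A hedged workaround in the meantime: coerce explicitly whenever any
argument might be a double.)

xx <- as.integer(seq.int(1, 10, 2))
is.integer(xx)   # TRUE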

-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

I only use this account for email list memberships. To write directly,
address me at pauljohn at ku.edu.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Cluster: Various GCC, how important is consistency?

2016-10-17 Thread Paul Johnson
On a cluster that is based on RedHat 6.2, we are updating to R-3.3.1.
I have, from time to time, run into problems with various R packages
and some older versions of GCC. I wish we had newer Linux in the
cluster, but with 1000s of nodes running 1000s of jobs, well, they
don't want a restart.

Administrator suggested I try to build with the GCC that is provided
with the nodes, which is gcc-4.4.7.  To my surprise, R-3.3.1 compiled
with that.  After that, I got quite far, many 100s of packages
compiled, but then I hit a snag: RcppArmadillo explicitly refuses
to build with anything older than gcc-4.6.  The OpenMx and
emplik packages also refuse to compile with old gcc.

The cluster uses a module system, so it is easy enough to swap in various
gcc versions to see what compiles.

I did succeed in compiling RcppArmadillo with gcc 4.9.2.  But Rcpp is not
picky; it compiled with gcc-4.4.7.

I worry...

1) Will reliance on various GCC versions make the packages incompatible
with R, or with each other?

I logged out, logged back in, and with R 3.3.1 I can run

library(RcppArmadillo)
library(Rcpp)

with no errors so far. But I'm not stress testing it much.

I should rebuild everything?

I expect that if I were to use gcc-6 on one package, it would not be
compatible with binaries built with 4.4.7.  But is there a zone of
tolerance allowing 4.4.7 and 4.9 packages to coexist?

2) If I build with non-default GCC, are all of the R users going to
hit trouble if they don't have the same GCC I use?  Unless I make some
extraordinary effort, they are getting GCC 4.4.7. If they try to
install a package, they are getting that GCC, not the one I use to
build RcppArmadillo or the other trouble cases (or everything, if you
say I need to go back and rebuild).

From an administrative point of view, should I tie R-3.3.1 to a
particular version of GCC? I think I could learn how to do that.
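
(One hedged sketch of how that could look; the compiler paths are
illustrative only. A site-wide Makevars file is consulted whenever
packages are installed.)

# write ${R_HOME}/etc/Makevars.site for this R installation
cat > "$(R RHOME)/etc/Makevars.site" <<'EOF'
CC  = /tools/cluster/6.2/gcc/4.9.2/bin/gcc
CXX = /tools/cluster/6.2/gcc/4.9.2/bin/g++
FC  = /tools/cluster/6.2/gcc/4.9.2/bin/gfortran
EOF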

On the cluster, they use the module framework. There are about 50
versions of GCC.  It is easy enough to ask for a newer one:

$ module load gcc/4.9.2

It puts the gcc 4.9.2 binaries and shared libraries at the front of the PATHs.

pj


-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write me directly, address me at pauljohn at ku.edu.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Cluster: Various GCC, how important is consistency?

2016-10-18 Thread Paul Johnson
Dear Jeroen

Did you  rebuild R-3.3.1 and all of the packages with GCC-5.3 in order
to make this work?

The part that worries me is that the shared libraries won't be
consistent, with various versions of GCC in play.

On Tue, Oct 18, 2016 at 5:55 AM, Jeroen Ooms  wrote:
> On Tue, Oct 18, 2016 at 1:44 AM, Paul Johnson  wrote:
>>
>> Administrator suggested I try to build with the GCC that is provided
>> with the nodes, which is gcc-4.4.7.
>
> Redhat provides an alternative compiler (gcc 5.3 based) in one of it's
> opt-in repositories called "redhat developer toolkit" (RDT). In CentOS
> you install it as follows:
>
>   yum install -y centos-release-scl
>   yum install -y devtoolset-4-gcc-c++
>
> This compiler is specifically designed to be used alongside the EL6
> stock gcc 4.4.7. It includes a simple 'enable' script which will put
> RDT gcc and g++ in front of your PATH and LD_LIBRARY_PATH and so on.
>
> So what I do on CentOS is install R from EPEL (built with stock gcc
> 4.4.7) and whenever I need to install an R package that uses e.g.
> CXX11, simply start an R shell using the RDT compilers:
>
>source /opt/rh/devtoolset-4/enable
>R
>
> From what I have been able to test, this works pretty well (though I
> am not a regular EL user). But I was able to build R packages that use
> C++11 (such as feather) and once installed, these packages can be used
> even in a regular R session (without RDT enabled).



-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

I only use this account for email list memberships. To write directly,
address me at pauljohn at ku.edu.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] syntax difference clusterExport in parallel and snow

2016-12-13 Thread Paul Johnson
We got some errors and eventually figured out that
parallel::clusterExport's second argument is "varlist" while in
snow::clusterExport it is "list".

The user had loaded parallel first, but did something else which
inadvertently loaded snow; then clusterExport failed because we had
"varlist" and not "list".

Are these different on purpose?
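
(A hedged workaround until that is settled: pass the names as the
unnamed second argument, which both packages accept positionally.
myvar is a made-up example.)

library(parallel)
cl <- makeCluster(2)
myvar <- 42
clusterExport(cl, "myvar")   # unnamed: works with parallel and with snow
stopCluster(cl)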

pj

-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] duplicated factor labels.

2017-06-14 Thread Paul Johnson
Dear R devel

I've been wondering about this for a while. I am sorry to ask for your
time, but can one of you help me understand this?

This concerns duplicated labels, not levels, in the factor function.

I think it is hard to understand that factor() fails, but a levels()
assignment afterwards does not:

>  x <- 1:6
> xlevels <- 1:6
> xlabels <- c(1, NA, NA, 4, 4, 4)
> y <- factor(x, levels = xlevels, labels = xlabels)
Error in `levels<-`(`*tmp*`, value = if (nl == nL)
as.character(labels) else paste0(labels,  :
  factor level [3] is duplicated
> y <- factor(x, levels = xlevels)
> levels(y) <- xlabels
> y
[1] 1    <NA> <NA> 4    4    4
Levels: 1 4

If the latter use of levels() causes a good, expected result, couldn't
factor(..., labels = xlabels) be made to do the same thing?

That's the gist of it. To signal to you that I've been trying to
figure this out on my own, here is a revision I've tested in R's
factor function which "seems" to fix the matter. (Of course, it probably
causes lots of other problems I don't understand; that's why I'm
writing to you now.)

In the factor function, the class of f is assigned *after* levels(f) is called

levels(f) <- ## nl == nL or 1
if (nl == nL) as.character(labels)
else paste0(labels, seq_along(levels))
class(f) <- c(if(ordered) "ordered", "factor")

At that point, f is a bare integer vector, and `levels<-` is a primitive

> `levels<-`
function (x, value)  .Primitive("levels<-")

That's what generates the error.  I don't understand well what
.Primitive means here. I need to walk past that detail.

Suppose I revise the factor function to put the class(f) line before
the levels() assignment. Then `levels<-.factor` is called and all seems well.

factor <- function (x = character(), levels, labels = levels, exclude = NA,
ordered = is.ordered(x), nmax = NA)
{
if (is.null(x))
x <- character()
nx <- names(x)
if (missing(levels)) {
y <- unique(x, nmax = nmax)
ind <- sort.list(y)
levels <- unique(as.character(y)[ind])
}
force(ordered)
if (!is.character(x))
x <- as.character(x)
levels <- levels[is.na(match(levels, exclude))]
f <- match(x, levels)
if (!is.null(nx))
names(f) <- nx
nl <- length(labels)
nL <- length(levels)
if (!any(nl == c(1L, nL)))
stop(gettextf("invalid 'labels'; length %d should be 1 or %d",
nl, nL), domain = NA)
## class() moved up 3 rows
class(f) <- c(if (ordered) "ordered", "factor")
levels(f) <- if (nl == nL)
  as.character(labels)
 else paste0(labels, seq_along(levels))
f
}

> assignInNamespace("factor", factor, "base")
> x <- 1:6
> xlevels <- 1:6
> xlabels <- c(1, NA, NA, 4, 4, 4)
> y <- factor(x, levels = xlevels, labels = xlabels)
> y
[1] 1    <NA> <NA> 4    4    4
Levels: 1 4
> attributes(y)
$class
[1] "factor"

$levels
[1] "1" "4"

That's a "good" answer for me.

But I broke your function. I eliminated the check for duplicated levels.

> y <- factor(x, levels = c(1, 1, 1, 2, 2, 2), labels = xlabels)
> y
[1] 1    4    <NA> <NA> <NA> <NA>
Levels: 1 4

Rather than have factor return the "duplicated levels" error when
there are duplicated values in labels, I wonder why it is not better
to have a check for duplicated levels directly. For example, insert a
new else in this stanza

if (missing(levels)) {
y <- unique(x, nmax = nmax)
ind <- sort.list(y)
levels <- unique(as.character(y)[ind])
} ##next is new part
else {
levels <- unique(levels)
}

That will cause an error when there are duplicated levels because
there are more labels than levels:

> y <- factor(x, levels = c(1, 1, 1, 2, 2, 2), labels = xlabels)
Error in factor(x, levels = c(1, 1, 1, 2, 2, 2), labels = xlabels) :
  invalid 'labels'; length 6 should be 1 or 2

So, in conclusion, if levels() can work after creating a factor, I
wish an equivalent labels argument would be accepted. What is your
opinion?

pj
-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] duplicated factor labels.

2017-06-16 Thread Paul Johnson
On Fri, Jun 16, 2017 at 2:35 AM, Joris Meys  wrote:
> To extwnd on Martin 's explanation :
>
> In factor(), levels are the unique input values and labels the unique output
> values. So the function levels() actually displays the labels.
>

Dear Joris

I think we agree. Currently, factor insists both levels and labels be unique.

I wish that it would accept non-unique labels. I also understand it
is impractical to change this now in base R.

I don't think I succeeded in explaining why this would be nicer.
Here's another example. Fairly often, we see input data like

x <- c("Male", "Man", "male", "Man", "Female")

The first four represent the same value.  I'd like to go in one step
to a new factor variable with enumerated types "Male" and "Female".
This fails

xf <- factor(x, levels = c("Male", "Man", "male", "Female"),
labels = c("Male", "Male", "Male", "Female"))

Instead, we need 2 steps.

xf <- factor(x, levels = c("Male", "Man", "male", "Female"))
levels(xf) <- c("Male", "Male", "Male", "Female")

I think it is quirky that `levels<-.factor` allows the duplicated
labels, whereas factor does not.

I wrote a function rockchalk::combineLevels to simplify combining
levels, but most of the students here like plyr::mapvalues to do it.
The use of levels() can be tricky because one must enumerate all
values, not just the ones being changed.

But I do understand Martin's point. It's been this way 25 years, it
won't change. :).

> Cheers
> Joris
>
>


-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Rmpi, openMPI editions.

2017-06-19 Thread Paul Johnson
Greetings.

I see a warning message when using Rmpi compiled against a newer OpenMPI,
and would appreciate it if you could tell me what it means.

This warning happens with any OpenMPI > 1.6.5.  Even before starting a
cluster, just "sessionInfo" triggers this warning.

I'm pasting in the message from R-3.3.2 (this is MRO).

Do the R parallel package cluster functions violate the warnings described here?

> library("Rmpi")
> sessionInfo()
--
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:  n401 (PID 114242)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.4 (Santiago)

locale:
[1] C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] Rmpi_0.6-6   RevoUtilsMath_10.0.0

loaded via a namespace (and not attached):
[1] RevoUtils_10.0.2 parallel_3.3.2   tools_3.3.2
>

What I think this means is that we need to never run any multicore
functions and we need to be very careful that MKL or such does not
launch threads.  Is that right? Is it worse than that?

Why am I chasing this one today?

I've been on an adventure compiling R in a RedHat 6 cluster again. The
cluster admins here like the Microsoft R, and they had both 3.3 and
3.4 installed. However, we found some packaging flaws in 3.4 and so
that MRO was removed. I'm interested in building R-3.4, but it is a
pretty big job on the old RedHat.  I want to get this correct.

I've run into a problem I'd forgotten about with OpenMPI. If OpenMPI >=
2, then Rmpi will compile, but jobs hang with "stopCluster".  With
OpenMPI-1.6.5, we get a clean build and no warnings, and clusters do
start and stop cleanly.  With newer version 1 editions of OpenMPI,
such as 1.10 or 1.12 (I suspect any version > 1.6.5), Rmpi
generates an intimidating warning, but the cluster will stop when
asked.
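
For completeness, the warning text itself points at the knob for
silencing it; something like the following (a sketch, and only sensible
if the fork use really is known to be safe):

export OMPI_MCA_mpi_warn_on_fork=0
# or, equivalently, a line in an mca-params.conf file:
#   mpi_warn_on_fork = 0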


-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] arrows: no vectors for "code" and "angle" parameters

2017-06-19 Thread Paul Johnson
I was teaching new R users to make some fun graphs. I had some arrows examples
worked up when we came across a problem.  The arrows function ignores the 2nd
and following elements of vectors given as code and angle.

Would you please consider 1) allowing vectors for code and angle, or
2) returning an error or warning when the user mistakenly supplies a
vector for those parameters?  When code like this is wrapped into an
Sweaved document--where I don't see the graph on the screen--this error
is difficult to catch while proofreading.

Example:

x0 <- c(-1, -4, 4.5)
y0 <- c(-1, -4, -8)
x1 <- c(2, -2, -3)
y1 <- c(4, 4, 18)
mylengths <- c(0.2, 0.3, 0.15)
mycodes <- c(3, 2, 1)
myangle <- c(10, 60, 80)

plot(x = c(-5, 5), y = c(-10, 20),
 type = "n", xlab = "", ylab = "")

arrows(x0 = x0, y0 = y0, x1 = x1, y1 = y1,
   length = mylengths, code = mycodes, angle = myangle)

I found a workaround, but this is more difficult to explain to beginners...

plot(x = c(-5, 5), y = c(-10, 20),
 type = "n", xlab = "", ylab = "")

mapply(arrows, x0 = x0, y0 = y0, x1 = x1, y1 = y1,
   length = mylengths, angle = myangle, code = mycodes)
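
(If a beginner-friendly wrapper is wanted, a sketch; varrows is a
made-up name and simply hides the mapply call:)

varrows <- function(x0, y0, x1, y1, length = 0.25, angle = 30, code = 2, ...) {
    invisible(mapply(arrows, x0 = x0, y0 = y0, x1 = x1, y1 = y1,
                     length = length, angle = angle, code = code,
                     MoreArgs = list(...)))
}
varrows(x0, y0, x1, y1, length = mylengths, angle = myangle, code = mycodes)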

pj

-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] duplicated factor labels.

2017-06-23 Thread Paul Johnson
return(x)
  }

  mapidx <- match(x, from)
  mapidxNA  <- is.na(mapidx)

  # index of items in `from` that were found in `x`
  from_found <- sort(unique(mapidx))
  if (warn_missing && length(from_found) != length(from)) {
message("The following `from` values were not present in `x`: ",
  paste(from[!(1:length(from) %in% from_found) ], collapse = ", "))
  }

  x[!mapidxNA] <- to[mapidx[!mapidxNA]]
  x
}

In the rockchalk package, I wrote a function called combineLevels that
is careful with ordinal variables and only allows adjacent values to
be combined. I'm not suggesting you go that far with this simple
piece.

>
>
>
>
>>
>> -pd
>>
>>> On 23 Jun 2017, at 10:42 , Martin Maechler 
>>> wrote:
>>>
>>>>>>>> Martin Maechler 
>>>>>>>> on Thu, 22 Jun 2017 11:43:59 +0200 writes:
>>>
>>>
>>>>>>>> Paul Johnson 
>>>>>>>> on Fri, 16 Jun 2017 11:02:34 -0500 writes:
>>>
>>>
>>>>> On Fri, Jun 16, 2017 at 2:35 AM, Joris Meys 
>>>>> wrote:
>>>>>>
>>>>>> To extwnd on Martin 's explanation :
>>>>>>
>>>>>> In factor(), levels are the unique input values and labels the unique
>>>>>> output
>>>>>> values. So the function levels() actually displays the labels.
>>>>>>
>>>
>>>>> Dear Joris
>>>
>>>
>>>>> I think we agree. Currently, factor insists both levels and labels be
>>>>> unique.
>>>
>>>
>>>>> I wish that it would not accept nonunique labels. I also understand it
>>>>> is impractical to change this now in base R.
>>>
>>>
>>>>> I don't think I succeeded in explaining why this would be nicer.
>>>>> Here's another example. Fairly often, we see input data like
>>>
>>>
>>>>> x <- c("Male", "Man", "male", "Man", "Female")
>>>
>>>
>>>>> The first four represent the same value.  I'd like to go in one step
>>>>> to a new factor variable with enumerated types "Male" and "Female".
>>>>> This fails
>>>
>>>
>>>>> xf <- factor(x, levels = c("Male", "Man", "male", "Female"),
>>>>> labels = c("Male", "Male", "Male", "Female"))
>>>
>>>
>>>>> Instead, we need 2 steps.
>>>
>>>
>>>>> xf <- factor(x, levels = c("Male", "Man", "male", "Female"))
>>>>> levels(xf) <- c("Male", "Male", "Male", "Female")
>>>
>>>
>>>>> I think it is quirky that `levels<-.factor` allows the duplicated
>>>>> labels, whereas factor does not.
>>>
>>>
>>>>> I wrote a function rockchalk::combineLevels to simplify combining
>>>>> levels, but most of the students here like plyr::mapvalues to do it.
>>>>> The use of levels() can be tricky because one must enumerate all
>>>>> values, not just the ones being changed.
>>>
>>>
>>>>> But I do understand Martin's point. Its been this way 25 years, it
>>>>> won't change. :).
>>>
>>>
>>>> Well.. the above is a bit out of context.
>>>
>>>
>>>> Your first example really did not make a point to me (and Joris)
>>>> and I showed that you could use even two different simple factor() calls
>>>> to
>>>> produce what you wanted
>>>> yc <- factor(c("1",NA,NA,"4","4","4"))
>>>> yn <- factor(c( 1, NA,NA, 4,  4,  4))
>>>
>>>
>>>> Your new example is indeed  much more convincing !
>>>
>>>
>>>> (Note though that the two steps that are needed can be written
>>>> more shortly
>>>
>>>
>>>> The  "been this way 25 years"  is one a reason to be very
>>>> cautious(*) with changes, but not a reason for no changes!
>>>
>>>
>>>> (*) Indeed as some of you have noted we really should not "break
>>>> behavior".
>>>> This means to me we cannot accept a change there which gives
>>>> an error or a different result in cases the old behavior gave a valid
>>>> factor.
>>>
>>>
>>>> I'm looking at a possible change currently
>>>> [not promising that a change will happen ...]
>>>
>>>
>>> In the end, I've liked the change (after 2-3 iterations), and
>>> now been brave to commit to R-devel (svn 72845).
>>>
>>> With the change, I had to disable one of our own regression
>>> checks (tests/reg-tests-1b.R, line 726):
>>>
>>> The following is now (in R-devel -> R 3.5.0) valid:
>>>
>>>> factor(1:2, labels = c("A","A"))
>>>
>>>[1] A A
>>>Levels: A
>>>>
>>>>
>>>
>>> I wonder how many CRAN package checks will "break" from
>>> this (my guess is in the order of a dozen), but I hope
>>> that these breakages will be benign, e.g., similar to the above
>>> case where before an error was expected via tools :: assertError(.)
>>>
>>> Martin
>>>
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Rmpi, openMPI editions.

2017-07-05 Thread Paul Johnson
 API v2.0, Component v1.6.5)
   MCA btl: openib (MCA v2.0, API v2.0, Component v1.6.5)
   MCA btl: self (MCA v2.0, API v2.0, Component v1.6.5)
   MCA btl: sm (MCA v2.0, API v2.0, Component v1.6.5)
   MCA btl: tcp (MCA v2.0, API v2.0, Component v1.6.5)

And like this after changing either ~/openmpi/mca-params.conf or
etc/openmpi-mca-params.conf:

   MCA btl: ofud (MCA v2.0, API v2.0, Component v1.6.5)
   MCA btl: self (MCA v2.0, API v2.0, Component v1.6.5)
   MCA btl: sm (MCA v2.0, API v2.0, Component v1.6.5)
   MCA btl: tcp (MCA v2.0, API v2.0, Component v1.6.5)

I believe it is worth mentioning that, if some of your compute nodes
have Infiniband and some do not, then OpenMPI jobs will crash if they
try to integrate nodes connected with ethernet and Infiniband.  That
is another reason to tell OpenMPI not to try to use Infiniband at all.
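
(The kind of line used for that, as a sketch; it goes in the user-level
or system-level mca-params.conf file:)

# disable the InfiniBand byte-transfer layer entirely
btl = ^openib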

pj

On Mon, Jun 19, 2017 at 2:34 PM, Paul Johnson  wrote:
> Greetings.
>
> I see a warning message while compiling OpenMPI and would appreciate
> it if you tell me what it means.
>
> This warning happens with any OpenMPI > 1.6.5.  Even before starting a
> cluster, just "sessionInfo" triggers this warning.
>
> I'm pasting in the message from R-3.3.2 (this is MRO).
>
> Do the R parallel package cluster functions violate the warnings described 
> here?
>
>> library("Rmpi")
>> sessionInfo()
> --
> An MPI process has executed an operation involving a call to the
> "fork()" system call to create a child process.  Open MPI is currently
> operating in a condition that could result in memory corruption or
> other system errors; your MPI job may hang, crash, or produce silent
> data corruption.  The use of fork() (or system() or other calls that
> create child processes) is strongly discouraged.
>
> The process that invoked fork was:
>
>   Local host:  n401 (PID 114242)
>   MPI_COMM_WORLD rank: 0
>
> If you are *absolutely sure* that your application will successfully
> and correctly survive a call to fork(), you may disable this warning
> by setting the mpi_warn_on_fork MCA parameter to 0.
> --
> R version 3.3.2 (2016-10-31)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Red Hat Enterprise Linux Server release 6.4 (Santiago)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> other attached packages:
> [1] Rmpi_0.6-6   RevoUtilsMath_10.0.0
>
> loaded via a namespace (and not attached):
> [1] RevoUtils_10.0.2 parallel_3.3.2   tools_3.3.2
>>
>
> What I think this means is that we need to never run any multicore
> functions and we need to be very careful that MKL or such does not
> launch threads.  Is that right? Is it worse than that?
>
> Why am I chasing this one today?
>
> I've been on an adventure compiling R in a RedHat 6 cluster again. The
> cluster admins here like the Microsoft R, and they had both 3.3 and
> 3.4 installed. However, we found some packaging flaws in 3.4 and so
> that MRO was removed. I'm interested in building R-3.4, but it is a
> pretty big job on the old RedHat.  I want to get this correct.
>
> I've run into the problem I'd forgotten about OpenMPI. If OpenMPI >=
> 2, then Rmpi will compile, but jobs hang with "stopCluster".  With
> OpenMPI-1.6.5, we get a clean build and no warnings, and clusters do
> start and stop cleanly.  With newer version 1 editions of OpenMPI,
> such as 1.10 or 1.12 (I suspect any versions (> 1.6.5), the Rmpi
> generates an intimidating warning, but the cluster will stop when
> asked.
>
>
> --
> Paul E. Johnson   http://pj.freefaculty.org
> Director, Center for Research Methods and Data Analysis http://crmda.ku.edu
>
> To write to me directly, please address me at pauljohn at ku.edu.



-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] what do you think about write.table(... qmethod = "excel")?

2017-09-19 Thread Paul Johnson
Last week one of our clients reported trouble with a csv file I
generated with write.table.  He said that columns with quotes for
character variables were rejected by their data importer, which was
revised to match the way Microsoft Excel uses quotation marks in
character variables.  I explained to them that quoted character
variables are virtuous and wise, of course, but they say Microsoft
Excel CSV export no longer quotes characters unless they include
commas in the values.

They showed me a CSV file from Excel that looked like this

x1,x2,x3,x4 5 6
fred,barney,betty,x
bambam,"fred,wilma",pebbles,y

Note how the quotes only happen on row 2 column 2. I was surprised it
did that, but now I have some pressure to write a csv maker that has
that structure.  It's weird: even when there are spaces in values there
are no quotation marks.

Has anybody done this and verified that it matches CSV from MS Excel?
If I succeed will you consider a patch?

pj
-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] what do you think about write.table(... qmethod = "excel")?

2017-09-20 Thread Paul Johnson
On Tue, Sep 19, 2017 at 4:45 PM, Duncan Murdoch
 wrote:
>
>
> That's true, but if that's what they want to do, and they're willing to pay
> to be able to write files that imitate Excel, then why not do what they ask?
>
> On the other hand, if they aren't willing to pay for the work, then you
> should lecture them on how silly their request is.
>
> In any case, base R functions should not include nonsense, so this is not
> something that should go into R.
>
> Duncan Murdoch
>

I understand.  This is a paying client, I'm going where the money goes.

Here's my current working example of a function that writes a CSV
exactly as Excel does. I've posted this into StackOverflow
(https://stackoverflow.com/questions/25743018/how-to-conditionally-remove-quotes-in-write-csv).
It is buried under a thread, so maybe it will not get much attention and
bug-checking. Oh, well, I can hope. It processes variables
column-by-column; I don't know how to do it differently.



##' Write CSV files with quotes same as MS Excel 2013 or newer
##'
##' R's write.csv inserts quotes around all elements in a character
##' vector (if quote = TRUE).  In contrast, MS Excel CSV export no
##' longer inserts quotation marks on all elements in character
##' variables, except when the cells include commas or quotation
##' marks.  This function generates CSV files that are, so far as we
##' know, exactly the same "quoted style" as MS Excel CSV export
##' files.
##'
##' This works by manually inserting quotation marks where necessary and
##' turning FALSE R's own method to insert quotation marks.
##' @param x a data frame
##' @param file character string for file name
##' @param row.names Default FALSE for row.names
##' @importFrom utils write.table
##' @return the return from write.table, using revised quotes
##' @export
##' @author Paul Johnson
##' @examples
##' set.seed(234)
##' x1 <- data.frame(x1 = c("a", "b,c", "b", "The \"Washington, DC\""),
##'   x2 = rnorm(4), stringsAsFactors = FALSE)
##' x1
##' fn <- tempfile(pattern = "testcsv", fileext = ".csv")
##' writeCSV(x1, file = fn)
##' readLines(fn)
##' x2 <- read.table(fn, sep = ",", header = TRUE, stringsAsFactors = FALSE)
##' all.equal(x1,x2)
writeCSV <- function(x, file, row.names = FALSE){
xischar <- colnames(x)[sapply(x, is.character)]
for(jj in xischar){
x[ , jj] <- gsub('"', '""', x[ , jj], fixed = TRUE)
needsquotes <- grep('[\",]', x[ ,jj])
x[needsquotes, jj] <- paste0("\"", x[needsquotes, jj], "\"")
}
write.table(x, file = file, sep = ",", quote = FALSE,
row.names = row.names)
}

Output:

>  set.seed(234)
>  x1 <- data.frame(x1 = c("a", "b,c", "b", "The \"Washington, DC\""),
+x2 = rnorm(4), stringsAsFactors = FALSE)
>  x1
x1 x2
1a  0.6607697
2  b,c -2.0529830
3b -1.4992061
4 The "Washington, DC"  1.4712331
>  fn <- tempfile(pattern = "testcsv", fileext = ".csv")
>  writeCSV(x1, file = fn)
>  readLines(fn)
[1] "x1,x2"
[2] "a,0.660769736644892"
[3] "\"b,c\",-2.052983003941"
[4] "b,-1.49920605110092"
[5] "\"The \"\"Washington, DC\"\"\",1.4712331168047"
>  x2 <- read.table(fn, sep = ",", header = TRUE, stringsAsFactors = FALSE)
>  all.equal(x1,x2)
[1] TRUE

I already see one problem, that I've got no special arrangement for
column names with commas or quotes. People who want column names with
those things are even more wrong than the people who write a parser that
can't understand quotes on character variables.

-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] package check fail on Windows-release only?

2017-11-20 Thread Paul Johnson
I mistakenly left a write to "/tmp" in the rockchalk package (version
1.8.109) that I uploaded last Friday. Kurt H wrote and asked me to fix
it today.
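
(A minimal sketch of the usual way to avoid that in examples and tests,
assuming nothing beyond base R: write throwaway files under
tempfile()/tempdir() rather than a hard-coded "/tmp". The file contents
here are only an example.)

fn <- tempfile(pattern = "rockchalk-demo", fileext = ".csv")
write.csv(data.frame(x = 1:3), file = fn)
unlink(fn)   ## clean up; the whole tempdir() goes away with the session anyway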

While uploading a new one, I became aware of a problem I had not seen.
The version I uploaded last Friday, 1.8.109, has OK status on all
platforms except r-release-windows-ix86+x86_64. I get OK on
oldrel-windows and also on devel-windows.

https://cran.r-project.org/web/checks/check_results_rockchalk.html

Can you please advise me on what to do? I don't understand, well,
anything about this :(

The error says:

* installing *source* package 'rockchalk' ...
** package 'rockchalk' successfully unpacked and MD5 sums checked
** R
** data
** inst
** preparing package for lazy loading
Warning: S3 methods 'print.sparseSummary', 'print.diagSummary',
'c.abIndex', 'c.sparseVector', 'as.array.Matrix',
'as.array.sparseVector', 'as.matrix.Matrix', 'as.matrix.sparseVector',
'as.vector.Matrix', 'as.vector.sparseVector' were declared in
NAMESPACE but not found
Error in namespaceExport(ns, exports) :
  undefined exports: %&%, Cholesky, .SuiteSparse_version, Diagonal,
.symDiagonal, .sparseDiagonal, .trDiagonal, Hilbert, KhatriRao,
Matrix, MatrixClass, spMatrix, sparseMatrix, rsparsematrix, Schur,
abIseq, abIseq1, rep2abI, band, bandSparse, bdiag, .bdiag,
c.sparseVector, condest, onenormest, .asmatrix, .dsy2mat, .dsy2dsp,
.dxC2mat, .T2Cmat, ..2dge, .dense2sy, .C2nC, .nC2d, .nC2l, .diag.dsC,
.solve.dgC.chol, .solve.dgC.qr, .solve.dgC.lu, diagN2U, diagU2N,
.diagU2N, .diag2tT, .diag2sT, .diag2mat, drop0, expand, expm, facmul,
fac2sparse, fac2Sparse, forceSymmetric, T2graph, graph2T,
anyDuplicatedT, uniqTsparse, isTriangular, isDiagonal, isLDL,
is.null.DN, invPerm, lu, nearPD, nnzero, formatSpMatrix,
formatSparseM, .formatSparseSimple, printSpMatrix, printSpMatrix2,
qrR, rankMatrix, readHB, readMM, sparse.model.matrix, sparseVector,
symmpart, skewpart, tril, triu, updown, pack, unpack,
.updateCHMfactor, .validateCsparse, writeMM, cBind, rBind
ERROR: lazy loading failed for package 'rockchalk'
* removing 'd:/Rcompile/CRANpkg/lib/3.4/rockchalk'
* restoring previous 'd:/Rcompile/CRANpkg/lib/3.4/rockchalk'


-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Two R editions in Unix cluster systems

2013-10-15 Thread Paul Johnson
Dear R Devel

Some of our R users are still insisting we run R-2.15.3 because of
difficulties with a package called OpenMX.  It can't cooperate with new R,
oh well.

Other users need to run R-3.0.1. I'm looking for the most direct route to
install both, and allow users to choose at runtime.

In the cluster, things run faster if I install RPMs to each node, rather
than putting R itself on the NFS share (but I could do that if you think
it's really better)

In the past, I've used the SRPM packaging from EPEL repository to make a
few little path changes and build R RPM for our cluster nodes. Now I face
the problem of building 2 RPMS, one for R-2.15.3 and one for R-newest, and
somehow keeping them separate.

If you were me, how would you approach this?

Here's my guess

First, The RPM packages need unique names, of course.

Second, leave the RPM packaging for R-newest exactly the same as it always
was.  R is in the path, the R script and references among all the bits will
be fine, no need to fight. It will find what it needs in /usr/lib64/R or
whatnot.

For the legacy R, I'm considering 2 ideas.  I could install R with the same
prefix, /usr, but be very careful so the R bits are installed into separate
places. I just made a fresh build of R, and on RedHat 6 it appears to me that
R installs into these directories:
bin
libdir
share

So what if the configure line has the magic bindir=/usr/bin-R-2.15.3
libdir = /usr/lib64/R-2.15.3, and whatnot. If I were doing Debian
packaging, I suppose I'd be obligated (by the file system standard) to do
that kind of thing. But it looks like a headache.

The easy road is to set the prefix at some out-of-the-way place, like
/opt/R-2.15.3, and then use a post-install script to link
/opt/R-2.15.3/bin/R to /usr/bin/R-2.15.3.  When I tried that, it surprised
me because R did not complain about lack of access to devel headers. It
configures and builds fine.

R is now configured for x86_64-unknown-linux-gnu

  Source directory:  .
  Installation directory:/tmp/R

  C compiler:gcc -std=gnu99  -g -O2
  Fortran 77 compiler:   gfortran  -g -O2

  C++ compiler:  g++  -g -O2
  Fortran 90/95 compiler:gfortran -g -O2
  Obj-C compiler:gcc -g -O2 -fobjc-exceptions

  Interfaces supported:  X11, tcltk
  External libraries:readline, ICU, lzma
  Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
  Options enabled:   shared BLAS, R profiling, Java

  Recommended packages:  yes

Should I worry about any runtime complications of this older R finding
pieces of the newer R in the PATH ahead of it? I worry I'm making lazy
assumptions.

After that, I need to do some dancing with the RPM packaging.

I suppose there'd be some comfort if I could get the users to define R_HOME
in their user environment before launching jobs, I think that would
eliminate the danger of confusion between versions, wouldn't it?

pj
-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Tab formatting in dummy.coef.R

2014-01-02 Thread Paul Johnson
Happy New Year

I recognize this is a low priority issue, but... I'll fix it if you let
me.

There are some TABs where R style calls for 4 spaces. For example
R-3.0.2/src/library/stats/R/dummy.coef.R.

I never noticed this until today, when I was stranded on a deserted island
with only the R source code and a Swiss Army knife (vi). Now I realize my
~/.vimrc has tabstop set at 2, and it makes tab indentations in R source "go
backwards".  If you are strictly following R Internals 9 "R Coding
Standards", the TABs should be replaced with spaces; I shouldn't have to
reconfigure the editor.

Here's the way it looks in my vi

dummy.coef.aovlist <- function(object, use.na = FALSE, ...)
{
Terms <- terms(object, specials="Error")
err <- attr(Terms,"specials")$Error - 1
tl <- attr(Terms, "term.labels")[-err]
int <- attr(Terms, "intercept")
facs <- attr(Terms, "factors")[-c(1,1+err), -err, drop=FALSE]
vars <- rownames(facs)
xl <- attr(object, "xlevels")
if(!length(xl)) { # no factors in model
  return(as.list(coef(object)))
}
nxl <- setNames(rep.int(1, length(vars)), vars)
tmp <- unlist(lapply(xl, length))
nxl[names(tmp)] <- tmp
lterms <- apply(facs, 2L, function(x) prod(nxl[x > 0]))
nl <- sum(lterms)
args <- setNames(vector("list", length(vars)), vars)
for(i in vars)
  args[[i]] <- if(nxl[[i]] == 1) rep.int(1, nl)
  else factor(rep.int(xl[[i]][1L], nl), levels = xl[[i]])
dummy <- do.call("data.frame", args)
pos <- 0
rn <- rep.int(tl, lterms)
rnn <- rep.int("", nl)
for(j in tl) {
  i <- vars[facs[, j] > 0]
  ifac <- i[nxl[i] > 1]
  if(length(ifac) == 0L) {# quantitative factor
  rnn[pos + 1] <- j
  } else if(length(ifac) == 1L) { # main effect
  dummy[ pos+1L:lterms[j], ifac ] <- xl[[ifac]]
  rnn[ pos+1L:lterms[j] ] <- as.character(xl[[ifac]])
[... snip ]

The patch looks like this in my vi:

*** dummy.coef.R  2013-03-05 17:02:23.0 -0600
--- dummy.coef-indent.R 2014-01-02 21:49:25.828163544 -0600
***
*** 28,35 
  Terms <- delete.response(Terms)
  vars <- all.vars(Terms)
  xl <- object$xlevels
! if(!length(xl)) { # no factors in model
!   return(as.list(coef(object)))
  }
  nxl <- setNames(rep.int(1, length(vars)), vars)
  tmp <- unlist(lapply(xl, length)) ## ?? vapply(xl, length, 1L)
--- 28,35 
  Terms <- delete.response(Terms)
  vars <- all.vars(Terms)
  xl <- object$xlevels
! if(!length(xl)) {# no factors in model
! return(as.list(coef(object)))
  }
  nxl <- setNames(rep.int(1, length(vars)), vars)
  tmp <- unlist(lapply(xl, length)) ## ?? vapply(xl, length, 1L)
***
*** 38,64 
  nl <- sum(lterms)
  args <- setNames(vector("list", length(vars)), vars)
  for(i in vars)
!   args[[i]] <- if(nxl[[i]] == 1) rep.int(1, nl)
!   else factor(rep.int(xl[[i]][1L], nl), levels = xl[[i]])
  dummy <- do.call("data.frame", args)
  pos <- 0
  rn <- rep.int(tl, lterms)
  rnn <- rep.int("", nl)
  for(j in tl) {
!   i <- vars[facs[, j] > 0]
!   ifac <- i[nxl[i] > 1]
!   if(length(ifac) == 0L) {# quantitative factor
!   rnn[pos+1] <- j
!   } else if(length(ifac) == 1L) { # main effect
!   dummy[ pos+1L:lterms[j], ifac ] <- xl[[ifac]]
!   rnn[ pos+1L:lterms[j] ] <- as.character(xl[[ifac]])
!   } else {  # interaction
!   tmp <- expand.grid(xl[ifac])
!   dummy[ pos+1L:lterms[j], ifac ] <- tmp
!   rnn[ pos+1L:lterms[j] ] <-
! apply(as.matrix(tmp), 1L, function(x) paste(x, collapse=":"))
!   }
!   pos <- pos + lterms[j]
  }
  ## some terms like poly(x,1) will give problems here, so allow
  ## NaNs and set to NA afterwards.
--- 38,64 
  nl <- sum(lterms)
  args <- setNames(vector("list", length(vars)), vars)
  for(i in vars)
! args[[i]] <- if(nxl[[i]] == 1) rep.int(1, nl)
! else factor(rep.int(xl[[i]][1L], nl), levels = xl[[i]])
  dummy <- do.call("data.frame", args)
  pos <- 0
  rn <- rep.int(tl, lterms)
  rnn <- rep.int("", nl)
  for(j in tl) {
! i <- vars[facs[, j] > 0]
! ifac <- i[nxl[i] > 1]
! if(length(ifac) == 0L) {# quantitative factor
! rnn[pos+1] <- j
! } else if(length(ifac) == 1L) {# main effect
! dummy[ pos+1L:lterms[j], ifac ] <- xl[[ifac]]
! rnn[ pos+1L:lterms[j] ] <- as.character(xl[[ifac]])
! } else {# interaction
! tmp <- expand.grid(xl[ifac])
! dummy[ pos+1L:lterms[j], ifac ] <- tmp
! rnn[ pos+1L:lterms[j] ] <-
! apply(as.matrix(tmp), 1L, function(x) paste(x,
collapse=":"))
! }
! pos <- pos + lterms[j]
  }
[...snip...]

I could make a patch that applies at the top level, maybe one for each
library. If one of you will consider using it.

Again, I recognize this is a small thing, the editor c

[Rd] predict.glm line 28. Please explain

2014-01-13 Thread Paul Johnson
I imitated predict.glm, my thing worked, and now I need to revise it.  It would
help me very much if someone would explain predict.glm line 28, which says

object$na.action <- NULL # kill this for predict.lm calls

I want to know:

1) why does it set object$na.action to NULL?
2) what does the comment after it mean?

Maybe I need a pass by value lesson too, because I can't see how changing
that in object would have any effect on calculations done elsewhere.
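
(For reference, a minimal sketch of the pass-by-value part, using only
base R: arguments are copied on modification, so an assignment like the
one on line 28 only changes the function's local copy.)

f <- function(obj){
    obj$na.action <- NULL      ## modifies the local copy only
    is.null(obj$na.action)
}
m <- list(na.action = na.omit)
f(m)                    ## TRUE: gone inside the function
is.null(m$na.action)    ## FALSE: the caller's object is untouched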

pj


Here's the context from predict.glm, to save you trouble of looking it up:

predict.glm <-
  function(object, newdata = NULL, type = c("link", "response", "terms"),
   se.fit = FALSE, dispersion = NULL, terms = NULL,
   na.action = na.pass, ...)
{
## 1998/06/23 KH:  predict.lm() now merged with the version in lm.R

type <- match.arg(type)
na.act <- object$na.action
object$na.action <- NULL # kill this for predict.lm calls
if (!se.fit) {
## No standard errors
if(missing(newdata)) {
pred <- switch(type,
   link = object$linear.predictors,
   response = object$fitted.values,
   terms = predict.lm(object,  se.fit = se.fit,
   scale = 1, type = "terms", terms = terms)
   )
if(!is.null(na.act)) pred <- napredict(na.act, pred)
} else {
pred <- predict.lm(object, newdata, se.fit, scale = 1,
   type = ifelse(type == "link", "response",
type),
   terms = terms, na.action = na.action)
switch(type,
   response = {pred <- family(object)$linkinv(pred)},
   link = , terms = )
  }
} else {
## summary.survreg has no ... argument.
if(inherits(object, "survreg")) dispersion <- 1.
if(is.null(dispersion) || dispersion == 0)
dispersion <- summary(object, dispersion=dispersion)$dispersion
residual.scale <- as.vector(sqrt(dispersion))
pred <- predict.lm(object, newdata, se.fit, scale = residual.scale,
   type = ifelse(type == "link", "response", type),
   terms = terms, na.action = na.action)
fit <- pred$fit
se.fit <- pred$se.fit
switch(type,
   response = {
   se.fit <- se.fit * abs(family(object)$mu.eta(fit))
   fit <- family(object)$linkinv(fit)
   },
   link = , terms = )
if( missing(newdata) && !is.null(na.act) ) {
fit <- napredict(na.act, fit)
se.fit <- napredict(na.act, se.fit)
}
pred <- list(fit = fit, se.fit = se.fit, residual.scale =
residual.scale)
}
pred
}



-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] package compositions removed from CRAN. Please explain the output?

2014-05-13 Thread Paul Johnson
I notice compositions was removed from CRAN; that's one I want to try out.

I downloaded the last tarball, and the build and check output doesn't look
that bad: a warning, no errors.


$ R CMD build compositions
* checking for file ‘compositions/DESCRIPTION’ ... OK
* preparing ‘compositions’:
* checking DESCRIPTION meta-information ... OK
* cleaning src
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* looking to see if a ‘data/datalist’ file should be added
* building ‘compositions_1.40-0.tar.gz’

$ R CMD check --as-cran compositions_1.40-0.tar.gz
* using log directory ‘/tmp/compositions.Rcheck’
* using R version 3.1.0 (2014-04-10)
* using platform: x86_64-pc-linux-gnu (64-bit)
* using session charset: UTF-8
* checking for file ‘compositions/DESCRIPTION’ ... OK
* this is package ‘compositions’ version ‘1.40-0’
* checking CRAN incoming feasibility ... WARNING
Maintainer: ‘K. Gerald van den Boogaart ’
New submission
Package was archived on CRAN
Insufficient package version (submitted: 1.40.0, existing: 1.40.0)
CRAN repository db overrides:
  X-CRAN-Comment: Archived on 2014-05-01 as undefined-behaviour errors
were not corrected (and previouslu archived on 2014-03-07).
* checking package namespace information ... OK
* checking package dependencies ... NOTE
  No repository set, so cyclic dependency check skipped
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘compositions’ can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd line widths ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking contents of ‘data’ directory ... OK
* checking data for non-ASCII characters ... OK
* checking data for ASCII and uncompressed saves ... OK
* checking line endings in C/C++/Fortran sources/headers ... OK
* checking line endings in Makefiles ... OK
* checking compilation flags in Makevars ... OK
* checking for portable use of $(BLAS_LIBS) and $(LAPACK_LIBS) ... OK
* checking compiled code ... OK
* checking sizes of PDF files under ‘inst/doc’ ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... OK
Examples with CPU or elapsed time > 5s
   user system elapsed
SimulatedAmounts 18.810  0.188  19.064
rmahalanobis  8.573  0.016   8.619
replot7.078  0.012   7.113
* checking for unstated dependencies in tests ... OK
* checking tests ...
  Running ‘AitchisonTest.R’
  Running ‘CheckGeometry.R’ [38s/38s]
  Running ‘RobustTest.R’
  Running ‘TestMissing.R’ [39s/40s]
  Running ‘TestSimp.R’ [12s/12s]
  Running ‘acheck.R’
  Running ‘geostat.R’
  Running ‘missings.R’
  Running ‘quickcheck.R’
 OK
* checking PDF version of manual ... OK

WARNING: There was 1 warning.
NOTE: There was 1 note.
See
  ‘/tmp/compositions.Rcheck/00check.log’
for details.






-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] portableParallelSeeds Package violation, CRAN exception?

2014-08-06 Thread Paul Johnson
I'm writing to ask for a policy exception, or advice on how to make
this package CRAN allowable.

http://rweb.quant.ku.edu/kran/src/contrib/portableParallelSeeds_0.9.tar.gz

Yesterday I tried to submit a package on CRAN and Dr Ripley pointed
out that I had not understood the instructions about packages.  Here's
the part where the R check gives a Note

* checking R code for possible problems ... NOTE
Found the following assignments to the global environment:
File ‘portableParallelSeeds/R/initPortableStreams.R’:
   assign("currentStream", n, envir = .GlobalEnv)
   assign("currentStates", curStates, envir = .GlobalEnv)
   assign("currentStream", 1L, envir = .GlobalEnv)
   assign("startStates", runSeeds, envir = .GlobalEnv)
   assign("currentStates", runSeeds, envir = .GlobalEnv)
   assign("currentStream", as.integer(currentStream), envir = .GlobalEnv)
   assign("startStates", runSeeds, envir = .GlobalEnv)
   assign("currentStates", runSeeds, envir = .GlobalEnv)

Altering the user's environment requires a special arrangement with
CRAN. I believe this is justified, I'll sketch the reasons now. But,
mostly, I'm at your mercy and if there is any way to make this
possible, I would be very grateful.

To control & replace random number streams, it really is necessary to
alter the workspace. That's where the random generator state is
stored.  It is acknowledged in Robert Gentleman's book, R Programming
for Bioinformatics: "The decision to have these [random generator]
functions manipulate a global variable, .Random.seed, is slightly
unfortunate as it makes it somewhat more difficult to manage several
different random number streams simultaneously" (Gentleman, 2009, p.
201).

I have developed an understandable set of wrapper functions that handle this.
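
(A hedged sketch, not the package's actual code and not a claim about
what CRAN will accept: the usual way to quiet that NOTE is to keep the
bookkeeping variables in an environment owned by the package rather
than in .GlobalEnv. The names .pkgState and setStream below are only
illustrative; whether the .Random.seed assignment itself still needs a
special arrangement is a separate question.)

.pkgState <- new.env(parent = emptyenv())  ## lives in the package, not the user workspace

setStream <- function(n, seeds){
    assign("currentStream", as.integer(n), envir = .pkgState)
    assign("currentStates", seeds, envir = .pkgState)
    ## the generator state itself still has to be installed in the workspace
    assign(".Random.seed", seeds[[n]], envir = .GlobalEnv)
}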

Some of you may recall this project. I've asked about it here a couple
of times. We allow separate streams of randoms for different purposes
within a single R run. There is a framework to save 1000s of those
sets in a file, so it can be used on a cluster or in a single
workstation.  This is handy because, when 1 run in 10,000 on the
cluster exhibits some weird behavior, we can easily re-initiate that
interactively and see what's going on.

I have a  vignette "pps" that explains. I dropped a copy of that here
in case you don't want to get the package:

http://pj.freefaculty.org/scraps/pps.pdf

While working on that, I gained a considerably deeper understanding of
random generators and seeds.  That is what this vignette is about

http://pj.freefaculty.org/scraps/PRNG-basics.pdf


We've been running simulations on our cluster with the
portableParallelSeeds framework for 2 years, we've never had any
trouble.  We are able to re-start runs, verify random number draws in
separate streams.

PJ
-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] S3 generic method dispatch on promises

2015-01-16 Thread Paul Johnson
Dear R friends

I wanted a function to make a simple percent table that would be easy for
students to use. The goal originally was to have a simple thing people
would call like this

pctable(rowvar, colvar, data)

and the things "rowvar" and "colvar" might be names of variables in data. I
wanted to avoid the usage of "with" (as we now see in the table help).

Then some people wanted more features, and I agreed with the suggestion to
create a formula interface that people can call like so:

pctable(rowvar ~ colvar, data)

I end up with a generic function pctable and methods pctable.default,
pctable.formula, pctable.character.

I got that working, mostly I understand what's going on.

Except the following, which, actually, is a good lesson to me about
promises and method dispatch in R. An S3 generic will not send a call with
a promise in the first argument to pctable.default (as I had mistakenly
hoped). I'll paste in all the code below, but I think you will know the
answer even without running it.

pctable is a generic function.  In the workspace, I have no objects x and y,
but there are variables inside the data.frame dat named x and y.  Since y is
not an object, method dispatch fails thus:

> pctable(y, x, dat)
Error in pctable(y, x, dat) (from #3) : object 'y' not found

This direct call on pctable.default works (recall  y and x are promises):

> pctable.default(y, x, dat)
Count (column %)
 x
y 1  2  3  4  Sum
  A   5(20%) 3(12%) 5(20%) 6(24%) 19
  B   9(36%) 5(20%) 4(16%) 6(24%) 24
  C   1(4%)  6(24%) 3(12%) 2(8%)  12
  D   4(16%) 4(16%) 6(24%) 5(20%) 19
  E   6(24%) 7(28%) 7(28%) 6(24%) 26
  Sum 25 25 25 25 100

All the methods work fine when the first argument is a language object.

This works (dispatches to pctable.formula)

> pctable(y ~ x, dat)
Count (column %)
 x
y 1  2  3  4  Sum
  A   5(20%) 3(12%) 5(20%) 6(24%) 19
  B   9(36%) 5(20%) 4(16%) 6(24%) 24
  C   1(4%)  6(24%) 3(12%) 2(8%)  12
  D   4(16%) 4(16%) 6(24%) 5(20%) 19
  E   6(24%) 7(28%) 7(28%) 6(24%) 26
  Sum 25 25 25 25 100


This works (dispatches to pctable.default)
> pctable(dat$y, dat$x)
Count (column %)
 dat$x
dat$y 1  2  3  4  Sum
  A   5(20%) 3(12%) 5(20%) 6(24%) 19
  B   9(36%) 5(20%) 4(16%) 6(24%) 24
  C   1(4%)  6(24%) 3(12%) 2(8%)  12
  D   4(16%) 4(16%) 6(24%) 5(20%) 19
  E   6(24%) 7(28%) 7(28%) 6(24%) 26
  Sum 25 25 25 25 100

However, this fails because y is not an object with a type

> pctable(y, x, dat)
Error in pctable(y, x, dat) (from #3) : object 'y' not found

Can R be tricked into sending that call to pctable.default, where it does work?
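
(A minimal standalone sketch of the same behavior, using only base R:
UseMethod has to force the first argument to find its class, so a bare
symbol that does not exist fails before any method is chosen, while a
language object dispatches to the default method.)

f <- function(x, ...) UseMethod("f")
f.default <- function(x, ...) "reached f.default"
f(nonexistent)          ## Error: object 'nonexistent' not found
f(quote(nonexistent))   ## "reached f.default": a symbol has a class, so dispatch works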

Here's the code. I'm working on documentation and will put it in the
rockchalk package eventually, but I hate to leave this problem until I
fully understand it.


pctable <- function(rv, ...)
{
    UseMethod("pctable")
}

## rv: row variable, quoted or not
## cv: column variable, quoted or not
pctable.default <- function(rv, cv, data = parent.frame(),
                            rvlab = NULL, cvlab = NULL,
                            colpct = TRUE, rowpct = FALSE,
                            exclude = c(NA, NaN), rounded = FALSE)
{
    rvlabel <- if (!missing(rv)) deparse(substitute(rv))
    cvlabel <- if (!missing(cv)) deparse(substitute(cv))
    rvlab <- if (is.null(rvlab)) rvlabel else rvlab
    cvlab <- if (is.null(cvlab)) cvlabel else cvlab

    rvin <- eval(substitute(rv), envir = data, enclos = parent.frame())
    cvin <- eval(substitute(cv), envir = data, enclos = parent.frame())

    t1 <- table(rvin, cvin, dnn = c(rvlab, cvlab), exclude = exclude)
    rownames(t1)[is.na(rownames(t1))] <- "NA" ## symbol to letters
    colnames(t1)[is.na(colnames(t1))] <- "NA"
    if (rounded) t1 <- round(t1, -1)
    t2 <- addmargins(t1, c(1,2))
    t1colpct <- round(100*prop.table(t1, 2), 1)
    t1rowpct <- round(100*prop.table(t1, 1), 1)
    t1colpct <- apply(t1colpct, c(1,2), function(x) gsub("NaN", "", x))
    t1rowpct <- apply(t1rowpct, c(1,2), function(x) gsub("NaN", "", x))
    res <- list("count" = t2, "colpct" = t1colpct, "rowpct" = t1rowpct,
                call = match.call())
    class(res) <- "pctable"
    print(res, colpct = colpct, rowpct = rowpct)
    invisible(res)
}


pctable.formula <- function(formula, data = NULL, rvlab = NULL,
                            cvlab = NULL, colpct = TRUE, rowpct = FALSE,
                            exclude = c(NA, NaN), rounded = FALSE,
                            ..., subset = NULL)
{
    if (missing(data) || !is.data.frame(data))
        stop("pctable requires a data frame")
    if (missing(formula) || (length(formula) != 3L))
        stop("pctable requires a two sided formula")
    mt <- terms(formula, data = data)
    if (attr(mt, "response") == 0L) stop("response variable is required")
    mf <- match.call(expand.dots = FALSE)
    keepers <- match(c("formula", "data", "subset", "na.action"),
                     names(mf), 0L)
    mf <- mf[c(1L, keepers)]
    mf$drop.unused.levels <- FALSE

[Rd] "What Calls What" diagram. Flow Chart?

2011-10-09 Thread Paul Johnson
I don't know the right computer science words for this question, I'm
afraid. Apologies in advance.

How do you find your way around in somebody else's code?  If the user
runs a specific command and wants to know how the data is managed
until the result is returned, what should one do?

I've tried this manually with tools like mtrace and browser. This is a
bit frustrating because the browser does not stay in effect until the
work is done. It quits at the end of the function.  So you have to
attach the browser to the functions that are called there, and so
forth.  But that doesn't quite put everything together.

Example.  Recently I was trying to find out where the package lavaan's
calculations for the function cfa are actually done and it was a
maddening chase from one function to the next, as data was
re-organized and options were filled in. lavaan's "cfa" function
reformats some options, then the work gets done by an eval.

cfa> fit <- cfa(HS.model, data=HolzingerSwineford1939)
debugging in: cfa(HS.model, data = HolzingerSwineford1939)
debug: {
mc <- match.call()
mc$model.type = as.character(mc[[1L]])
if (length(mc$model.type) == 3L)
mc$model.type <- mc$model.type[3L]
mc$int.ov.free = TRUE
mc$int.lv.free = FALSE
mc$auto.fix.first = !std.lv
mc$auto.fix.single = TRUE
mc$auto.var = TRUE
mc$auto.cov.lv.x = TRUE
mc$auto.cov.y = TRUE
mc[[1L]] <- as.name("lavaan")
eval(mc, parent.frame())
}

The value of "mc" that gets executed by eval is this

Browse[2]> mc
lavaan(model.syntax = HS.model, data = HolzingerSwineford1939,
model.type = "cfa", int.ov.free = TRUE, int.lv.free = FALSE,
auto.fix.first = TRUE, auto.fix.single = TRUE, auto.var = TRUE,
auto.cov.lv.x = TRUE, auto.cov.y = TRUE)

So then I need to put a debug on "lavaan" and step through that, to see
what it does.

Is there a way to make a list of the functions that are called "pop
out", possibly with a flow chart?

Consider lm,  I want to know

lm -> lm.fit ->  .Fortran("dqrls")

I'm not asking for a conceptual UML diagram, so far as I know.

The kind of trace information you get with gdb in C programs and
shallow steps with "n" would probably help. I would not need to keep
attaching more functions with debug.
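
(A small base-R approximation, sketched on throwaway functions rather
than lavaan: trace() with no tracer prints a one-line message each time
a traced function is entered, which gives a rough call trail without
re-attaching debug() at every level. For functions inside a package
namespace, the where argument, e.g. where = asNamespace("lavaan"), may
be needed so that internal calls hit the traced copy.)

f1 <- function(x) f2(x) + 1
f2 <- function(x) f3(x) * 2
f3 <- function(x) x
trace(f2)
trace(f3)
f1(10)        ## console prints "trace: f2(x)" then "trace: f3(x)" before returning 21
untrace(f2)
untrace(f3)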

pj

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] "What Calls What" diagram. Flow Chart?

2011-10-10 Thread Paul Johnson
On Sun, Oct 9, 2011 at 5:29 PM,   wrote:
> Hi Paul
>
> Have you tried
>
> mvbutils::foodweb( where=asNamespace( 'lavaan'))
>
> (assuming lavaan has a namespace, otherwise where='package:lavaan')?
>
> Sounds like it's what you're after--
>
> Mark
>
Thanks, Mark. The foodweb graph for lavaan is a bit overwhelming.

The graph shows everything it finds that might be called at any time; it
doesn't help me trace the path of a specific user call to a particular
function. So I'm not entirely sure it is doing what I hope for.

While matching the graph against the source code, it seems to me some
R language idioms can confuse/break the foodweb.  When eval is called
on a string object, then I think function calls can escape detection.
In the cfa example code I put in the original post, the function
"lavaan" is called by eval, and as far as I can tell in the foodweb
output, that connection is not found.

I'm still studying your package, of course, but here's (I think) an
example. I know "cfa" does call "lavaan" through eval, but this code

library(lavaan)
library(mvbutils)
mvbutils::foodweb( where=asNamespace( 'lavaan'))

myfw <- mvbutils::foodweb( where=asNamespace( 'lavaan'))

callers.of("lavaan", myfw)


> [1] "independence.model"  "independence.model.fit"
[3] "independence.model.fit2" "setLavaanOptions"




-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] extra & in some lines of R KEYWORDS file

2011-11-02 Thread Paul Johnson
The KEYWORDS file (in R 2.13.1, anyway) seems to have some extra "&" symbols.
Look below at the "&" before "character", "complex", "category", and "NA".

Intentional or mistaken? If not mistaken, what do the extra "&" mean?

Basics
sysdata &   Basic System Variables  [!= S]
datasets&   Datasets available by data(.)   [!= S]
data&   Environments, Scoping, Packages [~= S]
manip   &   Data Manipulation
attribute   &   Data Attributes
classes &   Data Types (not OO)
& character &   Character Data ("String") Operations
& complex   &   Complex Numbers
& category  &   Categorical Data
& NA&   Missing Values  [!= S]
list&   Lists
chron   &   Dates and Times
package &   Package Summaries


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] One step way to create data frame with variable "variable names"?

2011-11-11 Thread Paul Johnson
Suppose

plotx <- "someName"
modx <- "otherName"
plotxRange <- c(10,20)
modxVals <- c(1,2,3)

It often happens that I want to create a data frame or other object with
plotx or modx as the variable names, but I can't work out the syntax to
do that.
I can get this done in 2 steps, creating the data frame and then
assigning names, as in

newdf <- data.frame( c(1, 2, 3, 4), c(4, 4, 4, 4))
colnames(newdf) <- c(plotx, modx)

I was trying to do this in one step, but can't find how.  If I could
get this in one step, it would simplify some bookkeeping, because the
current method requires me to build up the name vector to match
newdf, and sometimes I make mistakes.
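
(For what it's worth, a minimal sketch of one way to do it in a single
call, using setNames() from the stats package, which is attached by
default; nothing here is beyond base R.)

newdf <- setNames(data.frame(c(1, 2, 3, 4), c(4, 4, 4, 4)),
                  c(plotx, modx))
## or carry the names along with the values:
newdf2 <- data.frame(matrix(c(1, 2, 3, 4, 4, 4, 4, 4), ncol = 2,
                            dimnames = list(NULL, c(plotx, modx))))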

After the data.frame already exists, I understand I can insert new
variables using the variable names with syntax like

newdf[[plotx]] <- c(23, 23, 23, 23)

PJ

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] termplot & predict.lm. some details about calculating predicted values with "other variables set at the mean"

2011-12-13 Thread Paul Johnson
I'm making some functions to illustrate regressions and I have been
staring at termplot and predict.lm and residuals.lm to see how this is
done. I've wondered who wrote predict.lm originally, because I think
it is very clever.

I got interested because termplot doesn't work with models that include interactions:

> m1 <- lm(y ~ x1*x2)
> termplot(m1)
Error in `[.data.frame`(mf, , i) : undefined columns selected

Digging into that, I realized some surprising implications of
nonlinear formulas.

This issue arises when there are math functions in the regression
formula. The question focuses on what we mean by the mean of "x" when
we are discussing predictions and deviations.

Suppose one fits:

m1 <- lm (y ~ x1 + log(x2), data=dat)

I had thought the partial residual was calculated with reference to
the log of the mean of x2. But that's not right. It is calculated with
reference to mean(log(x2)). That seems misleading: termplot shows a
graph illustrating the effect of x2 on the horizontal axis (not
"log(x2)").  I should not say misleading; rather, it is unexpected.
I think users who want the reference value in the plot of x2 to be the
mean of x2 have a legitimate concern here.
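
(A minimal sketch of the distinction, assuming nothing beyond base R;
for any non-constant x2 the two reference values differ, by Jensen's
inequality.)

set.seed(42)
x2 <- rexp(1000)
mean(log(x2))   ## the reference described above: the mean of the transformed column
log(mean(x2))   ## the transform of the mean, which is a different number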

With a more elaborate formula, the mismatch gets more confusing.
Suppose the regression formula is

m2 <- lm (y ~ x1 + poly(x2,3), data=dat)

The model frame has these variables:

  y    x1    poly(x2, 3).1    poly(x2, 3).2    poly(x2, 3).3

and the partial residual calculation for variable x1, which I had
expected would be based on a polynomial transformation of mean(x2), is
the weighted sum of the means of the 3 polys.

Can you help me see this more clearly?  (Or less wrongly?)

Perhaps you think I don't understand partial residuals in termplot,
but I am pretty sure I do.  I made notes about it. See slides 54 and
55 in here: 
http://pj.freefaculty.org/guides/Rcourse/regression-tableAndPlot-1/regression-tableAndPlot.pdf

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] returning information from functions via attributes rather than return list

2012-01-03 Thread Paul Johnson
I would like to ask for advice from R experts about the benefits or
dangers of using attr to return information with an object that is
returned from a function. I have a feeling as though I have cheated by
using attributes, and wonder if I've done something fishy.

Maybe I mean to ask, where is the dividing line between attributes and
instance variables?  The separation is not clear in my mind anymore.

Background: I paste below a function that takes in a regression object,
makes changes to the data and/or call, and then runs a
revised regression.  In my earlier effort, I was building a return
list, including the new fitted regression object plus some
variables that have information about the changes that were made.

That creates some inconvenience, however.  When the regression is in a
list object, then methods for lm objects don't apply to that result
object. The return is not an lm anymore.  I either need to write
custom methods for every function or remember to extract the object
from the list before sending to the generic function.

I *guessed* it would work to write the new information as object
attributes, and it seems to work. There is a generic function
"meanCenter" and a method "meanCenter.default". At the end of
meanCenter.default, here's my use (or abuse) of attributes.

  res <- eval(mc)
  class(res) <- c("mcreg", class(model))
  attr(res, "centeredVars") <- nc
  attr(res, "centerCall") <-  match.call()
  res

I wrote print and summary methods, but other methods that work for lm
objects like plot will also work for these new ones.



meanCenter <- function(model, centerOnlyInteractors=TRUE,
centerDV=FALSE, standardize=FALSE, centerContrasts = F){
  UseMethod("meanCenter")
}

meanCenter.default <- function(model, centerOnlyInteractors=TRUE,
centerDV=FALSE, standardize=FALSE, centerContrasts = F){

  std <- function(x) {
if( !is.numeric(x) ){
  stop("center.lm tried to center a factor variable. No Can Do!")
} else {
  scale(x, center = TRUE, scale = standardize)
}
  }

  rdf <- get_all_vars(formula(model), model$model) #raw data frame
  t <- terms(model)
  tl <- attr(t, "term.labels")
  tmdc <- attr(t, "dataClasses") ##term model data classes

  isNumeric <- names(tmdc)[ which(tmdc %in% c("numeric"))]
  isFac <-  names(tmdc)[ which(tmdc %in% c("factor"))]
  if (tmdc[1] != "numeric") stop("Sorry, DV not a single numeric column")

  ##Build "nc", a vector of variable names that "need centering"
  ##
  if (!centerDV) {
if (centerOnlyInteractors == FALSE){
  nc <- isNumeric[-1] #-1 excludes response
  unique(nc)
}else{
  interactTerms <- tl[grep(":", tl)]
  nc <- unique(unlist(strsplit( interactTerms, ":")))
  nc <-  nc[which(nc %in% isNumeric)]
}
  }else{
if (centerOnlyInteractors == FALSE){
  nc <- isNumeric
}else{
  interactTerms <- tl[grep(":", tl)]
  nc <- unique(unlist(strsplit( interactTerms, ":")))
  nc <- nc[which(nc %in% isNumeric)]
  nc <- c( names(tmdc)[1] , nc)
}
  }


  mc <- model$call
  # run same model call, replacing non centered data with centered data.
  ## if no need to center factor contrasts:
  if (!centerContrasts)
{
  stddat <- rdf
  for (i in nc) stddat[ , i] <- std( stddat[, i])
  mc$data <- quote(stddat)
}else{
  ##dm: design matrix, only includes intercept and predictors
  dm <- model.matrix(model, data=rdf, contrasts.arg =
model$contrasts, xlev = model$xlevels)
  ##contrastIdx: indexes of contrast variables in dm
  contrastIdx <- which(attr(dm, "assign")== match(isFac, tl))
  contrastVars <- colnames(dm)[contrastIdx]
  nc <- c(nc, contrastVars)

  dm <- as.data.frame(dm)

  hasIntercept <- attr(t, "intercept")
  if (hasIntercept) dm <- dm[ , -1] # removes intercept, column 1

  dv <- rdf[ ,names(tmdc)[1]] #tmdc[1] is response variable name
  dm <- cbind(dv, dm)
  colnames(dm)[1] <- names(tmdc)[1] #put colname for dv

  dmnames <- colnames(dm)
  hasColon <- dmnames[grep(":", dmnames)]
  dm <- dm[ , -match(hasColon, dmnames)] ##remove vars with colons
(lm will recreate)

  ##Now, standardise the variables that need standardizing
  for (i in nc) dm[ , i] <- std( dm[, i])


  fmla <- formula(paste(dmnames[1], " ~ ",  paste(dmnames[-1],
collapse=" + ")))
  cat("This fitted model will use those centered variables\n")
  cat("Model-constructed interactions such as \"x1:x3\" are built
from centered variables\n")
  mc$formula <- formula(fmla)
  mc$data <-  quote(dm)
}

  cat("These variables", nc, "Are centered in the design matrix \n")

  res <- eval(mc)
  class(res) <- c("mcreg", class(model))
  attr(res, "centeredVars") <- nc
  attr(res, "centerCall") <-  match.call()
  res
}

summary.mcreg <- function(object, ...){
  nc <- attr(object, "centeredVars")
  cat("The centered variables were: \n")
  print(nc)
  cat("Even though the variables here have the same names as their
non-centered

Re: [Rd] returning information from functions via attributes rather than return list

2012-01-04 Thread Paul Johnson
On Tue, Jan 3, 2012 at 3:59 PM, Simon Urbanek
 wrote:
> Paul,
>
> On Jan 3, 2012, at 3:08 PM, Paul Johnson wrote:
>
>> I would like to ask for advice from R experts about the benefits or
>> dangers of using attr to return information with an object that is
>> returned from a function. I have a feeling as though I have cheated by
>> using attributes, and wonder if I've done something fishy.
>>
>> Maybe I mean to ask, where is the dividing line between attributes and
>> instance variables?  The separation is not clear in my mind anymore.
>>
>> Background: I paste below a function that takes in a regression object
>> and make changes to the data and/or call and then run a
>> revised regression.  In my earlier effort, I was building a return
>> list, including the new fitted regression object plus some
>> variables that have information about the changes that a were made.
>>
>> That creates some inconvenience, however.  When the regression is in a
>> list object, then methods for lm objects don't apply to that result
>> object. The return is not an lm anymore.
>
> Why don't you just subclass it? That's the "normal" way of doing things - you 
> simply add additional entries for your subclass (e.g. m$myItem1, m$myItem2, 
> ...), prepend your new subclass name and you're done. You can still dispatch 
> on your subclass before the superclass while superclass methods just work as 
> well..
>
> Cheers,
> Simon
>

Yes. I see that now.

But I'm still wondering about the attribute versus variable question.
To the programmer, is there any difference between returning
information as attributes or variables?

Does R care if I do this:

class(res) <- c("mcreg", "lm")
attr(res, "centeredVars") <- nc

Or this:

class(res) <- c("mcreg", "lm")
res$centeredVars <- nc

Is there some place "down there," in the C foundations of R,  where an
attribute is just a variable, as is centeredVars?
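
(A minimal sketch of how the two look from R code, using only base R;
it does not answer the C-level question, but it shows both ride along
on the same object, just in different slots.)

res <- lm(dist ~ speed, data = cars)
res$centeredVars <- "speed"              ## stored as a list element
attr(res, "centeredVars2") <- "speed"    ## stored as an attribute
"centeredVars"  %in% names(res)              ## TRUE
"centeredVars2" %in% names(attributes(res))  ## TRUE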

pj

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] delete.response leaves response in attribute dataClasses

2012-01-05 Thread Paul Johnson
I posted this one as an R bug
(https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14767), but
Prof. Ripley says I'm premature, and I should raise the question here.

Here's the behavior I assert is a bug:
The output from delete.response on a terms object alters the formula
by removing the dependent variable. It removes the response from the
"variables" attribute and it changes the response attribute from 1 to
0.  The response is removed from "predvars"

But it leaves the name of the dependent variable first in
"dataClasses".  It caused an unexpected behavior in my code, so (as
usual) the bug may be mine, but in my heart, I believe it belongs to
delete.response.

To illustrate, here's a terms object from a regression.

> tt
y ~ x1 * x2 + x3 + x4
attr(,"variables")
list(y, x1, x2, x3, x4)
attr(,"factors")
   x1 x2 x3 x4 x1:x2
y   0  0  0  0 0
x1  1  0  0  0 1
x2  0  1  0  0 1
x3  0  0  1  0 0
x4  0  0  0  1 0
attr(,"term.labels")
[1] "x1""x2""x3""x4""x1:x2"
attr(,"order")
[1] 1 1 1 1 2
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")

attr(,"predvars")
list(y, x1, x2, x3, x4)
attr(,"dataClasses")
        y        x1        x2        x3        x4
"numeric" "numeric" "numeric" "numeric" "numeric"

Now observe that delete.response removes the response from all
attributes except dataClasses.

> delete.response(tt)
~x1 * x2 + x3 + x4
attr(,"variables")
list(x1, x2, x3, x4)
attr(,"factors")
   x1 x2 x3 x4 x1:x2
x1  1  0  0  0 1
x2  0  1  0  0 1
x3  0  0  1  0 0
x4  0  0  0  1 0
attr(,"term.labels")
[1] "x1""x2""x3""x4""x1:x2"
attr(,"order")
[1] 1 1 1 1 2
attr(,"intercept")
[1] 1
attr(,"response")
[1] 0
attr(,".Environment")

attr(,"predvars")
list(x1, x2, x3, x4)
attr(,"dataClasses")
        y        x1        x2        x3        x4
"numeric" "numeric" "numeric" "numeric" "numeric"


pj

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] informal conventions/checklist for new predictive modeling packages

2012-01-05 Thread Paul Johnson
I agree with almost all, except the last point. Since I have
participated in wheel-reinvention lately, I agree with the bulk of
your comment. I don't think the fix is as easy as you suspect:
RSiteSearch won't help me find a function I need when I don't know the
magic words.  Some R functions have such unexpected names that only a
fastidious source-code reader would find them ("pretty", for example).
 But I agree with your concern.

But, as far as the last one is concerned, I think you are mistaken.
Explanation below.

On Wed, Jan 4, 2012 at 8:19 AM, Max Kuhn  wrote:
>
> (14) [OCD] For binary classification models, model the probability of
> the first level of a factor as the event of interest (again, for
> consistency) Note that glm() does not do this but most others use the
> first level.
>
When the DV is thought of as 0 and 1, and 1 is an "event", "success", or
"win" and 0 is a "non-event", "failure", or "loss", then if there is to be a
single predicted probability, I want it to be the probability of the
higher outcome.

glm is doing the thing I want, and I don't know of others that go the
other way, except PROC LOGISTIC in SAS.  And that has a long history
of causing confusion and despair.
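
(A minimal sketch, assuming nothing beyond base R, of the behavior I
mean: with a factor response, binomial glm() models the probability of
the second factor level, treating the first level as the "failure".)

y <- factor(c("loss", "win", "loss", "win", "win", "loss"),
            levels = c("loss", "win"))
x <- c(1, 1, 2, 2, 3, 3)
m <- glm(y ~ x, family = binomial)
round(fitted(m), 2)   ## fitted P(y == "win"), the second level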

I'd like to consider adding one thing to your list, though.  I have
wished (in this list and elsewhere) that there were a more regular
approach for calculating "newdata" objects that are used in predict.
Many packages have re-invented this (datadist in rms, effects), and
almost nobody here agreed with my wish for a more standard approach.
But if there were a standard approach, it would be much easier to hold
up R as an alternative to Stata when users pop up with "marginal
effects tables" from Stata that are very difficult to reproduce with
R.

Regards,
pj

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] delete.response leaves response in attribute dataClasses

2012-01-06 Thread Paul Johnson
Thanks, Bill

Counter-arguments at the end

On Thu, Jan 5, 2012 at 3:15 PM, William Dunlap  wrote:
> My feeling that everyone would index dataClasses by name was
> wrong.  I looked through the packages that used dataClasses
> and saw code that would break if the first (response) entry
> were omitted.  (I didn't check to see if passing the output
> of delete.response to these functions would be appropriate.)
> E.g.,
> file: AICcmodavg/R/predictSE.mer.r
>  ##matrix with info on factors
>  fact.frame <- attr(attr(orig.frame, "terms"), "dataClasses")[-1]
>
>  ##continue if factors
>  if(any(fact.frame == "factor")) {
>    id.factors <- which(fact.frame == "factor")
>    fact.name <- names(fact.frame)[id.factors] #identify the rows for factors
>
> Some packages create a dataClass attribute for a model.frame
> (not its terms attribute) that does not have any names:
> file: caper/R/macrocaic.R
>   attr(mf, "dataClasses") <- rep("numeric", dim(termFactors)[2])
> .checkMFClasses() does not throw an error for that, but it
> doesn't do any real checking either.
>
> Most users of dataClasses do pass it to .checkMFClasses() to
> compare it with newdata and that doesn't care if you have extra
> entries in dataClasses.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>

I can't understand what your point is.  I agree we can work around the
problem, but why should we have to?

If you confine yourself to the output of "delete.response" applied to
a terms object from a regression, can you point to any package or
usage that depends on leaving the response variable in the dataClasses
attribute?  I can't find one.  In R base, these are all the references
to delete.response:

stats/R/models.R:delete.response <- function (termobj)
stats/R/lm.R:Terms <- delete.response(tt)
stats/R/lm.R:Terms <- delete.response(tt)
stats/R/ppr.R:Terms <- delete.response(object$terms)
stats/R/loess.R:
as.matrix(model.frame(delete.response(terms(object)), newdata,
stats/R/dummy.coef.R:Terms <- delete.response(Terms)

I've looked it over carefully and predict.lm (in lm.R) would not be
affected by the change I propose. I can't find any usage in loess.R of
the dataClasses attribute.

Furthermore, I can't see how a person would use the dataClasses
attribute at all, after the other markers of the response are
eliminated. How is a method supposed to find which variable is the response
after response = 0?

I'm not disagreeing with you that I can work around the peculiarity
that the response is left in the dataClasses attribute of the output
object from delete.response.  I'm just saying it is a complication
that programmers should not have to put up with, because I think
delete.response should delete the response from all attributes of a
terms object.

pj


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Example of "task seeds" with R parallel. Critique?

2012-01-13 Thread Paul Johnson
Greetings:

In R parallel's vignette, there is a comment "It would however take
only slightly more work to allocate a stream to each task." (p.6).
I've written down a working example that can allocate not just one,
but several separate seeds for each task.  (We have just a few projects
here that need multiple streams.)  I would like to help work that up
for inclusion in the parallel package, if possible.

This approach is not original. It combines the idea behind snowFT and
ideas for setting and monitoring seeds to replicate random streams in
John Chambers' Software for Data Analysis, Section 6.10. I'm able to
save a block of seeds in a file, run a simulation for each one, and
then re-run any particular task and match the random numbers.

But I realize there are a lot of dangers I am not yet aware of, so I'm
asking you what can go wrong?  I see danger in conflicts between my
effort to manage seeds and the work of parallel functions that try to
manage seeds for me.  That's why I wish I could integrate task-based
seeds into parallel itself.


RNGkind("L'Ecuyer-CMRG")
set.seed(23456)

library(parallel)

## nrep = number of repetitions (or tasks)
## streamsPerRep = number of streams needed by each repetition
nReps <- 2000
streamsPerRep <- 2

## projSeeds=list of lists of stream seeds
projSeeds <- vector(mode="list", nReps)
for (i in 1:nReps) projSeeds[[i]] <- vector(mode="list", streamsPerRep)

runif(1) ##establishes .Random.seed
##Grab first seed
s <- .Random.seed
origSeed <- s

x <- rnorm(4) ##will compare later
x

for (i in 1:nReps) {
  for (j in 1:streamsPerRep){
projSeeds[[i]][[j]] <- s
s <- nextRNGStream(s)
  }
}


save(projSeeds, file="projSeeds.rda")

rm(projSeeds)

load("projSeeds.rda")

##Note that origSeed does match projSeeds
origSeed
projSeeds[[1]][[1]]


## And we get same random draws from project 1, stream 1
.Random.seed <- projSeeds[[1]][[1]]
rnorm(4)
x

##Another way (preferred?) to reset stream
assign(".Random.seed", projSeeds[[1]][[1]], envir = .GlobalEnv)
rnorm(4)




## Now, how to make this more systematic
## Each task needs streamsPerRep seeds
## startSeeds = for each stream, a starting seed
## currentSeeds = for each stream, a seed recording stream's current position
## currentStream = integer indicator of which stream is in use


## Test that interactively
currentStream <- 1
currentSeeds <- startSeeds <- projSeeds[[1]]
.Random.seed <- startSeeds[[currentStream]]

useStream <- function(n = NULL, origin = FALSE){
  if (n > length(currentSeeds)) stop("requested stream does not exist")
  ## record where the stream we are leaving stopped (<<- so it persists)
  currentSeeds[[currentStream]] <<- .Random.seed
  if (origin) assign(".Random.seed", startSeeds[[n]], envir = .GlobalEnv)
  else assign(".Random.seed",  currentSeeds[[n]], envir = .GlobalEnv)
  currentStream <<- n
}


useStream(n=1, origin=TRUE)
rnorm(4)

currentStream

useStream(n=2, origin=TRUE)
rnorm(4)

currentStream


## Now, make that work in a clustered environment

cl <- makeCluster(9, "MPI")

## run on worker, so can retrieve seeds for particular run
initSeeds <- function(p = NULL){
  currentStream <<- 1
  projSeeds[[p]]
}


clusterEvalQ(cl, {
  RNGkind("L'Ecuyer-CMRG")
})

clusterExport(cl, c("projSeeds", "useStream", "initSeeds"))

someHorrendousFunction <- function(run, parm){
  currentStream <- 1
  currentSeeds <- startSeeds <- initSeeds(run)
  assign(".Random.seed",  startSeeds[[currentStream]],  envir = .GlobalEnv)

  ##then some gigantic, long lasting computation occurs
  dat <- data.frame(x1 = rnorm(parm$N), x2 = rnorm(parm$N), y = rnorm(parm$N))
  m1 <- lm(y ~ x1 + x2, data=dat)
  list(m1, summary(m1), model.matrix(m1))
}

whatever <- list("N" = 999)

res <- parLapply(cl, 1:nReps, someHorrendousFunction, parm = whatever)

res[[77]]

##Prove I can repeat 77'th task

res77 <- someHorrendousFunction(77, parm = whatever)

## well, that worked.
stopCluster(cl)

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Last Call: Who says this is not a bug in delete.response()?

2012-01-24 Thread Paul Johnson
On January 5, I posted here concerning this issue
(https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14767), and I
have not heard any answers from people who think this is not a bug,
or a simple accident in the delete.response function.

The output from delete.response on a terms object alters the formula
by removing the dependent variable. It removes the response from the
"variables" attribute and it changes the response attribute from 1 to
0.  The response is removed from "predvars"

But it leaves the name of the dependent variable first in the in
"dataClasses" attribute.  I believe delete.response should be fixed, and if
I'm wrong, I wish one of you would tell me.  Otherwise, I will ask Prof.
Ripley to re-open the bug report.

This is an important API question, since functions that use regression
models (in my opinion) ought to be able to count on delete.response
to leave behind a dataClasses attribute that no longer includes the response.

If we could count on that, then, some methods that work with fitted regressions
could work more easily. For example, the work in termplot
with variables "carrier" and "carrier.terms" would be unnecessary,
since the column names of
"dataClasses" would be, exactly, the carrier.name vector and the data
type information
of those variables would eliminate a lot of fancy footwork one has to
do in order
to calculate predicted values.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] need gui matrix editor: does R Core team have advice on how?

2012-01-28 Thread Paul Johnson
Dear R-devel:

Would R core team consider endorsing a graphical toolkit and trying to
facilitate development of little GUI tools?

I need a gui matrix editor for users that want to be able to write
matrices that are later used to simulate data.  Instead of teaching
them to write a covariance matrix (for example, for mvtnorm), I want
to tell them run a function that pops up a table they can fill in.

The users need to be able to enter variable names in the matrix, so
something that would accept

a  0  0
0  b  0
c  d  e

would be great.  Something that would accept formulae like this would
be even more great.

a  0  0
0  b  a^2
c  d  e

I want this gui matrix editor to "just work" on Windows, Mac, Linux. I
don't mind building this on top of some widget set, but it is
important that the widget set be easily installable on the user end.

That's why I wish R core would offer us some guidance or advice.  I'm
not a programmer, but I can learn to program with a library, as long
as it is not a waste of time.

I've been searching R archives and here's what I found so far.

1. tcl/tk

Building on R's tcltk2, for people who have the Tcl add-on widget
TkTable installed, there are several packages.  In tcltk2 itself,
there is a function tk2edit, and there are others that try to
embellish it.  I've tried several of these; they seem to be not quite
done yet (one can't copy a rectangle, for example). But maybe I could
learn how to fix them up and make yet another TkTable-based editor.

Problem: it requires the user to have enough understanding to install the Tcl
widget TkTable.  And, for platforms like Windows, the user has to install
Tcl/Tk itself.  On Linux and Mac, that is not as big a hurdle, so
far as I can tell.  On Debian Linux, I found a package
libtktable that works, but I have no idea how tough that would be on
other Linux systems or Macintosh or Windows.

Another problem is that tcl/tk editions change rapidly, and on several
of our systems, we still don't have access to tcl/tk 8.5
(RedHat/Centos based clusters still running version 5 are like that).

2. Gtk

Building on RGtk2, I found the package RGtk2Extras.  This of course
requires a functioning GTK tool chain, which used to be a big problem on
the various platforms.  But the function dfedit appears to be almost
exactly what I'm looking for.  I can create a character matrix and put
letters in it how I want, but I later face the problem of how to evaluate
the matrix.
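
For the evaluation step, here is the sort of thing I have in mind (only
a sketch; it assumes the editor hands back a character matrix like the
one above, with values for the symbols supplied separately):

chmat <- matrix(c("a", "0", "0",
                  "0", "b", "a^2",
                  "c", "d", "e"), nrow = 3, byrow = TRUE)
vals <- list(a = 1, b = 2, c = 3, d = 4, e = 5)
nummat <- apply(chmat, c(1, 2),
                function(s) eval(parse(text = s), envir = vals))
nummat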

Problem: even more than tcl/tk, GTK versions change and become
incompatible, especially across platforms.

What about Qt or wxWidgets?  I gather RKWard is built on KDE, and
hence Qt.  I don't find matrix editors built with those toolkits, but I
don't know why not.

Maybe this is impossible for R core to advise us because you may
disagree on which widget library is best, but if there is some
consensus, I would be glad to know because I would do whatever you
recommend.

I'm pretty sure that, if you said "use library X, version XYZ", then
the worldwide usage of R is sufficiently broad that people would step
forward and help make sure those libraries are packaged for every OS.  I
mean, if there were no TkTable package for one of the Linux
distributions for which I make packages (RedHat, Fedora, Debian,
Ubuntu), I would create the packages.  But I'm not doing that now
because I have no reason to believe it is a good path from here on out.


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] tcltk GUIs (was need gui matrix editor: does R Core team have advice on how?)

2012-01-29 Thread Paul Johnson
On Sun, Jan 29, 2012 at 6:10 AM, Prof Brian Ripley
 wrote:
> On 28/01/2012 22:04, John Fox wrote:
>>
>> Dear Paul and Gabor,
>>
>> The Rcmdr GUI uses the tcltk package, so I have some experience with
>> providing an R tcltk-based GUI for various platforms.
>>
>> As Gabor says, everything works very smoothly on Windows because the R
>> Windows binary includes Tcl/Tk.
>
>
> Maybe, but getting it there was very far from smooth.  Tcl/Tk compiled under
> the compilers we used, but the resulting DLLs crashed R.  No one has ever
> found the cause and I used the system SDK (essentiallly a version of VC++)
> to build them.  And that puts us in a bind since the current system SDKs
> generate code depending on DLLs that are not part of the minimal OS versions
> we support (e.g. Windows XP and Server 2003, and the machine used to build
> was retired 2 years ago).
>

Thanks, this is clearing things up. I believe these comments mean
that, at the current time, tcl/tk is as close as there is to an
officially endorsed graphical toolkit.  As I search more, I find many
other community contributors (besides Prof. Fox) using tcl/tk
(Sciviews).  So I should learn how to work with that.  Prof Ripley's
comment makes me think the endorsement is not entirely enthusiastic,
though.

If there were a change to emphasize Gtk2, I don't think I would be
disappointed. I've been testing table-making examples today.  On
Debian Linux, I am having more luck with the Gtk2 based packages.
dfedit in RGtk2Extras "just works" for me. Example:

> library(RGtk2Extras)
> mat <- matrix(rnorm(100), 10, 10)
> dfedit(mat)

That edits the R object mat as expected.

On the other hand, I don't have success with the tk2edit from tcltk2,
even with the example in the help page:

> library(tcltk2)
Loading required package: tcltk
Loading Tcl/Tk interface ... done
> ?tk2edit
>   data(iris)
>  tk2edit(iris)
Error in matrix("", nrow = nrow(tA), ncol = ncol(tA)) :
  non-numeric matrix extent

I've fiddled with this quite a bit; I believe there's some small
mismatch between this particular system's Tcl/Tk libraries and the
ones that tcltk2 is expecting. Packaging of Tcl/Tk has caused lots of
trouble with the Swarm simulations that we run; maybe that's breaking
TkTable usage too.  I'm going to look into that some more.

I think the idea behind gWidgetstcltk is great: it aims to create R
functions that can use either Gtk2 or tcltk.  But the implementation is
a big hassle, it seems to me.  It inherits all of the management
troubles of both tcltk and Gtk2. For example:

> library(gWidgetstcltk)
> mat <- matrix(rnorm(100), 10 , 10)
> gdf(mat)
Select a GUI toolkit

1: gWidgetsRGtk2
2: gWidgetstcltk

Selection: 2
guiWidget of type: NULL for toolkit: guiWidgetsToolkittcltk
Warning message:
In .gdf(toolkit, items = items, name = name, do.subset = do.subset,  :
  Container is not correct. No NULL containers possible

When I run the example at the end of the help from ?gdf in
gWidgetstcltk, I get this (even before trying to use the table at
all).

>obj[,] <- head(mtcars) ## replace df
Error in `.leftBracket<-`(`*tmp*`, toolkit, ..., value = list(mpg = c(21,  :
  Value has different number of rows than the replacement area

If I make the other selection, opting for Gtk2, I don't get an error,
but nothing happens--no table pops up either.

> library(gWidgetstcltk)
> mat <- matrix(100, 10, 10)
> gdf(mat)
Select a GUI toolkit

1: gWidgetsRGtk2
2: gWidgetstcltk

Selection: 1
Loading required package: gWidgetsRGtk2
guiWidget of type: gGridRGtk for toolkit: guiWidgetsToolkitRGtk2

If I had not seen the Gtk2 table work well with RGtk2Extras, I'd have
no faith at all.

In conclusion, what am I supposed to work on?

If tcl/tk is likely to stay in the R for Windows package, and we can
work on streamlining the Macintosh and Windows instructions for tcltk
maintenance, then I see my mission now as making TkTable-based
widgets work.  Right?

Something Prof. Grothendieck said made me curious.  One can package
the TkTable library with an R package?  Why is it not already included
in packages like tcltk2 or gWidgetstcltk?

Debian package libtktable2.9 installs these files:

/usr/lib/Tktable2.9/libTktable2.9.so
/usr/lib/Tktable2.9/pkgIndex.tcl
/usr/lib/Tktable2.9/tkTable.tcl

So TkTable requires not just the Tcl bit, but a shared library. Is
that a substantial roadblock to R packaging of TkTable? (I've never
tried it.)

pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] portable parallel seeds project: request for critiques

2012-02-17 Thread Paul Johnson
I've got another edition of my simulation replication framework.  I'm
attaching 2 R files and pasting in the readme.

I would especially like to know if I'm doing anything that breaks
.Random.seed or other things that R's parallel uses in the
environment.

In case you don't want to wrestle with attachments, the same files are
online in our SVN

http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex66-ParallelSeedPrototype/


## Paul E. Johnson CRMDA 
## Portable Parallel Seeds Project.
## 2012-02-18

Portable Parallel Seeds Project

This is how I'm going to recommend we work with random number seeds in
simulations. It enhances work that requires runs with random numbers,
whether runs are in a cluster computing environment or in a single
workstation.

It is a solution for two separate problems.

Problem 1. I scripted up 1000 R runs and need high quality,
unique, replicable random streams for each one. Each simulation
runs separately, but I need to be confident their streams are
not correlated or overlapping. For replication, I need to be able to
select any run, say 667, and restart it exactly as it was.

Problem 2. I've written a Parallel MPI (Message Passing Interface)
routine that launches 1000 runs and I need to assure each has
a unique, replicable random stream. I need to be able to
select any run, say 667, and restart it exactly as it was.

This project develops one approach to create replicable simulations.
It blends ideas about seed management from John M. Chambers
Software for Data Analysis (2008) with ideas from the snowFT
package by Hana Sevcikova and Tony R. Rossini.


Here's my proposal.

1. Run a preliminary program to generate an array of seeds

run1:   seed1.1   seed1.2   seed1.3
run2:   seed2.1   seed2.2   seed2.3
run3:   seed3.1   seed3.2   seed3.3
...  ...   ...
run1000   seed1000.1  seed1000.2   seed1000.3

This example provides 3 separate streams of random numbers within each
run. Because we will use the L'Ecuyer "many separate streams"
approach, we are confident that there is no correlation or overlap
between any of the runs.

The projSeeds object has to have one row per run, but it is not a huge
file. I created seeds for 2000 runs of a project that requires 2 seeds
per run.  The saved size of the file is 104443kb, which is very small. By
comparison, a 1400x1050 jpg image would usually be twice that size.
If you save 10,000 runs-worth of seeds, the size rises to 521,993kb,
still pretty small.

Because the seeds are saved in a file, we are sure each
run can be replicated. We just have to teach each program
how to use the seeds. That is step two.
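
To give the idea, here is a sketch of the seed-creation step (this is
not the actual seedCreator.R, just an illustration using the stream
tools in the parallel package):

library(parallel)
RNGkind("L'Ecuyer-CMRG")
set.seed(234234)
nReps <- 1000           ## number of runs
streamsPerRun <- 3      ## random streams within each run
projSeeds <- vector("list", nReps)
s <- .Random.seed
for (i in seq_len(nReps)) {
  runSeeds <- vector("list", streamsPerRun)
  for (j in seq_len(streamsPerRun)) {
    runSeeds[[j]] <- s
    s <- nextRNGStream(s)
  }
  projSeeds[[i]] <- runSeeds
}
save(projSeeds, file = "projSeeds.rda")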


2. Inside each run, an initialization function runs that loads the
seeds file and takes the row of seeds that it needs.  As the
simulation progresses, the user can ask for random numbers from the
separate streams. When we need random draws from a particular stream,
we set the variable "currentStream" with the function useStream().

The function initSeedStreams creates several objects in
the global environment. It sets the integer currentStream,
as well as two list objects, startSeeds and currentSeeds.
At the outset of the run, startSeeds and currentSeeds
are the same thing. When we change the currentStream
to a different stream, the currentSeeds vector is
updated to remember where that stream was when we stopped
drawing numbers from it.
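
In rough outline, the two functions look something like this (a sketch
only; the real versions are in the attached R files and may differ in
detail):

initSeedStreams <- function(run) {
  currentStream <<- 1L
  startSeeds <<- currentSeeds <<- projSeeds[[run]]
  assign(".Random.seed", startSeeds[[1L]], envir = .GlobalEnv)
}

useStream <- function(n, origin = FALSE) {
  ## remember where the stream we are leaving stopped
  currentSeeds[[currentStream]] <<- get(".Random.seed", envir = .GlobalEnv)
  currentStream <<- n
  newSeed <- if (origin) startSeeds[[n]] else currentSeeds[[n]]
  assign(".Random.seed", newSeed, envir = .GlobalEnv)
}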


Now, for the proof of concept. A working example.

Step 1. Create the Seeds. Review the R program

seedCreator.R

That creates the file "projSeeds.rda".


Step 2. Use one row of seeds per run.

Please review "controlledSeeds.R" to see an example usage
that I've tested on a cluster.

"controlledSeeds.R" can also be run on a single workstation for
testing purposes.  There is a variable "runningInMPI" which determines
whether the code is supposed to run on the RMPI cluster or just in a
single workstation.


The code for each run of the model begins by loading the
required libraries and loading the seed file, if it exists, or
generating a new "projSeed" object if it is not found.

library(parallel)
RNGkind("L'Ecuyer-CMRG")
set.seed(234234)
if (file.exists("projSeeds.rda")) {
  load("projSeeds.rda")
} else {
  source("seedCreator.R")
}

## Suppose the "run" number is:
run <- 232
initSeedStreams(run)

After that, R's random generator functions will draw values
from the first random random stream that was initialized
in projSeeds. When each repetition (run) occurs,
R looks up the right seed for that run, and uses it.

If the user wants to begin drawing observations from the
second random stream, this command is used:

useStream(2)

If the user has drawn values from stream 1 already, but
wishes to begin again at the initial point in that stream,
use this command

useStream(1, origin = TRUE)


Question: Why is this approach better for parallel runs?

Answer: After a batch of simulations, we can re-start any
one of them and repeat it exactly. This builds on the idea
of the snowFT package, by Hana Sevcikova and A.J

Re: [Rd] portable parallel seeds project: request for critiques

2012-02-17 Thread Paul Johnson
On Fri, Feb 17, 2012 at 3:23 PM, Paul Gilbert  wrote:
> Paul
>
> I think (perhaps incorrectly) of the general problem being that one wants to
> run a random experiment, on a single node, or two nodes, or ten nodes, or
> any number of nodes, and reliably be able to reproduce the experiment
> without concern about how many nodes it runs on when you re-run it.
>
> From your description I don't have the impression your solution would do
> that. Am I misunderstanding?
>

Well, I think my approach does that!  Each time a function runs, it grabs
a pre-specified set of seed values and initializes the R .Random.seed
appropriately.

Since I take the pre-specified seeds from the L'Ecuyer et al approach
(cite below), I believe that
means each separate stream is dependably uncorrelated and non-overlapping, both
within a particular run and across runs.

> A second problem is that you want to use a proven algorithm for generating
> the numbers. This is implicitly solved by the above, because you always get
> the same result as you do on one node with a well proven RNG. If you
> generate a string of seed and then numbers from those, do you have a proven
> RNG?
>
Luckily, I think that part was solved by people other than me:

L'Ecuyer, P., Simard, R., Chen, E. J. and Kelton, W. D. (2002) An
object-oriented random-number package with many long streams and
substreams. Operations Research 50 1073–5.
http://www.iro.umontreal.ca/~lecuyer/myftp/papers/streams00.pdf



> Paul
>
>
> On 12-02-17 03:57 PM, Paul Johnson wrote:
>>
>> I've got another edition of my simulation replication framework.  I'm
>> attaching 2 R files and pasting in the readme.
>>
>> I would especially like to know if I'm doing anything that breaks
>> .Random.seed or other things that R's parallel uses in the
>> environment.
>>
>> In case you don't want to wrestle with attachments, the same files are
>> online in our SVN
>>
>>
>> http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex66-ParallelSeedPrototype/
>>
>>
>> ## Paul E. Johnson CRMDA
>> ## Portable Parallel Seeds Project.
>> ## 2012-02-18
>>
>> Portable Parallel Seeds Project
>>
>> This is how I'm going to recommend we work with random number seeds in
>> simulations. It enhances work that requires runs with random numbers,
>> whether runs are in a cluster computing environment or in a single
>> workstation.
>>
>> It is a solution for two separate problems.
>>
>> Problem 1. I scripted up 1000 R runs and need high quality,
>> unique, replicable random streams for each one. Each simulation
>> runs separately, but I need to be confident their streams are
>> not correlated or overlapping. For replication, I need to be able to
>> select any run, say 667, and restart it exactly as it was.
>>
>> Problem 2. I've written a Parallel MPI (Message Passing Interface)
>> routine that launches 1000 runs and I need to assure each has
>> a unique, replicatable, random stream. I need to be able to
>> select any run, say 667, and restart it exactly as it was.
>>
>> This project develops one approach to create replicable simulations.
>> It blends ideas about seed management from John M. Chambers
>> Software for Data Analysis (2008) with ideas from the snowFT
>> package by Hana Sevcikova and Tony R. Rossini.
>>
>>
>> Here's my proposal.
>>
>> 1. Run a preliminary program to generate an array of seeds
>>
>> run1:   seed1.1   seed1.2   seed1.3
>> run2:   seed2.1   seed2.2   seed2.3
>> run3:   seed3.1   seed3.2   seed3.3
>> ...      ...       ...
>> run1000   seed1000.1  seed1000.2   seed1000.3
>>
>> This example provides 3 separate streams of random numbers within each
>> run. Because we will use the L'Ecuyer "many separate streams"
>> approach, we are confident that there is no correlation or overlap
>> between any of the runs.
>>
>> The projSeeds has to have one row per project, but it is not a huge
>> file. I created seeds for 2000 runs of a project that requires 2 seeds
>> per run.  The saved size of the file 104443kb, which is very small. By
>> comparison, a 1400x1050 jpg image would usually be twice that size.
>> If you save 10,000 runs-worth of seeds, the size rises to 521,993kb,
>> still pretty small.
>>
>> Because the seeds are saved in a file, we are sure each
>> run can be replicated. We just have to teach each program
>> how to use the seeds. That is step two.
>>
>>
>> 2. Inside each run, an initialization function runs that loads the
>> seeds file and t

Re: [Rd] portable parallel seeds project: request for critiques

2012-02-17 Thread Paul Johnson
On Fri, Feb 17, 2012 at 5:06 PM, Petr Savicky  wrote:
> On Fri, Feb 17, 2012 at 02:57:26PM -0600, Paul Johnson wrote:
> Hi.
>
> Some of the random number generators allow as a seed a vector,
> not only a single number. This can simplify generating the seeds.
> There can be one seed for each of the 1000 runs and then,
> the rows of the seed matrix can be
>
>  c(seed1, 1), c(seed1, 2), ...
>  c(seed2, 1), c(seed2, 2), ...
>  c(seed3, 1), c(seed3, 2), ...
>  ...
>
Yes, I understand.

The seed things I'm using are the 6 integer values from the L'Ecuyer.
If you run the example script, the verbose option causes some to print
out.  The first 3 seeds in a saved project seeds file looks like:

> projSeeds[[1]]
[[1]]
[1] 407   376488316  1939487821  1433925148 -1040698333   579503880
[7]  -624878918

[[2]]
[1] 407 -1107332181   854177397  1773099324  1774170776  -266687360
[7]   816955059

[[3]]
[1] 407   936506900 -1924631332 -1380363206  2109234517  1585239833
[7] -1559304513

The 407 in the first position is an integer R uses to note the type of
stream for which the seed is intended, in this case L'Ecuyer-CMRG.





> There could be even only one seed and the matrix can be generated as
>
>  c(seed, 1, 1), c(seed, 1, 2), ...
>  c(seed, 2, 1), c(seed, 2, 2), ...
>  c(seed, 3, 1), c(seed, 3, 2), ...
>
> If the initialization using the vector c(seed, i, j) is done
> with a good quality hash function, the runs will be independent.
>
I don't have any formal proof that a "good quality hash function"
would truly create seeds from which independent streams will be drawn.

There is, however, the proof in the L'Ecuyer paper that one can take
the long stream and divide it into sections.  That's the approach I'm
taking here. It's the same approach that the parallel package in R
follows, and that parallel frameworks like snow use.

The different thing in my approach is that I'm saving one row of seeds
per simulation "run".  So each run can be replicated exactly.

I hope.

pj


> What is your opinion on this?
>
> An advantage of seeding with a vector is also that there can
> be significantly more initial states of the generator among
> which we select by the seed than 2^32, which is the maximum
> for a single integer seed.
>
> Petr Savicky.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] portable parallel seeds project: request for critiques

2012-02-22 Thread Paul Johnson
Greetings. Answers below.

On Tue, Feb 21, 2012 at 7:04 AM, Petr Savicky  wrote:

>
> Hi.
>
> In the description of your project in the file
>
>  http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex66-ParallelSeedPrototype/README
>
> you argue as follows
>
>  Question: Why is this better than the simple old approach of
>  setting the seeds within each run with a formula like
>
>  set.seed(2345 + 10 * run)
>
>  Answer: That does allow replication, but it does not assure
>  that each run uses non-overlapping random number streams. It
>  offers absolutely no assurance whatsoever that the runs are
>  actually non-redundant.
>
> The following demonstrates that the function set.seed() for
> the default generator indeed allows to have correlated streams.
>
>  step <- function(x)
>  {
>      x[x < 0] <- x[x < 0] + 2^32
>      x <- (69069 * x + 1) %% 2^32
>      x[x > 2^31] <- x[x > 2^31] - 2^32
>      x
>  }
>
>  n <- 1000
>  seed1 <- 124370417 # any seed
>  seed2 <- step(seed1)
>
>  set.seed(seed1)
>  x <- runif(n)
>  set.seed(seed2)
>  y <- runif(n)
>
>  rbind(seed1, seed2)
>  table(x[-1] == y[-n])
>
> The output is
>
>             [,1]
>  seed1 124370417
>  seed2 205739774
>
>  FALSE  TRUE
>      5   994
>
> This means that if the streams x, y are generated from the two
> seeds above, then y is almost exactly equal to x shifted by 1.
>
> What is the current state of your project?
>
> Petr Savicky.


Thanks for the example. That is exactly the point I've been making
about the old, ordinary way of setting seeds.

I continue testing on my plan, I believe it does work. The example
implementation is successful.  I have not received any critiques that
make me think it is theoretically wrong, but I also have not received
feedback from any body who has very detailed knowledge of R's parallel
package to tell me that my approach is good.

In order for this to be easy for users, I need to put the init streams
and set current stream functions into a package, and then streamline
the process of creating the seed array.  My opinion is that CRAN is
now overflowed with too many "one function" packages, I don't want to
create another one just for these two little functions, and I may roll
it into my general purpose regression package called "rockchalk".

If I worked in a company and had money riding on this, I'd need to
hire someone from R Core to stare at the way I'm setting, saving, and
re-setting the .Random.seed variable to be completely sure there are
not complications.  Because parallel package is so new, there is
relatively little documentation on how it is supposed to work. I'm
just gazing at the source code and trying not to  break anything it
does, and trying to stop it from breaking anything I do.

One technical issue that has been raised to me is that R parallel's
implementation of the L'Ecuyer generator is based on integer valued
variables, whereas the original L'Ecuyer code uses double real
variables.  But I'm trusting the R Core on this, if they code the
generator in a way that is good enough for R itself, it is good enough
for me. (And if that's wrong, we'll all find out together :) ).

Josef Leydold (the rstream package author) has argued that R's
implementation runs more slowly than it ought to. We had some
correspondence and I tracked a few threads in forums. It appears the
approach suggested there is roadblocked by some characteristics deep
down in R and the way random streams are managed.  Packages have only
a limited, tenuous method to replace R's generators with their own
generators. Here is the comment that L'Ecuyer and Leydold make in the
document that is distributed with rstream, "rstream: Streams of
Random Numbers for Stochastic Simulation":

"There is one drawback in the implementation. R uses a static table to
implement access to different
kinds of random number generators. Only one slot
(RNGkind(kind="user-supplied")) has been provided
for users who want to add her own generator. This slot relies on a
pointer to a function
called user_unif_rand. Thus rstream has to abuse this slot to add
additional random number
generators. However, there are other packages on CRAN that use the
same trick (randaes, rlecuyer,
rsprng, SuppDists) or load one of these packages (Rmpi, rpvm, rpart,
snow, varSelRF). Thus if a
user loads two of these packages or adds her own uniform random number
generator, the behavior
of at least one of the packages is broken. This problem could be fixed
by some changes in the R
core system, either by adding a package variable to
RNGkind(kind="user-supplied") similar to the
.Call function, or by replacing the static table by a dynamic one."

I interpret all of this to mean that, although other approaches may
work as well or better,
the only approach that can "defend itself " in the ecology of a
running R program is an approach
that fiddles with R's own generators, rather than tries to replace
them. And that's what I'm doing.

pj

ps

Just this moment, while googling for Leydold's comments,

[Rd] Unexpected behavior in factor level ordering

2012-02-25 Thread Paul Johnson
Hello, Everybody:

This may not be a "bug", but for me it is an unexpected outcome. A
factor variable's levels
do not retain their ordering after the levels function is used.  I
supply an example in which
a factor with values "BC" "AD" (in that order) is unintentionally
re-alphabetized by the levels
function.

To me, this is very bad behavior. Would you agree?


# Paul Johnson 2012-02-05

x <- c("AD","BC","AD","BC","AD","BC")
xf <- factor(x, levels=c("BC", "AD"), labels=c("Before Christ","After Christ"))
y <- rnorm(6)

m1 <- lm (y ~ xf )

plot(y ~ xf)

abline (m1)
## Just a little problem the line does not "go through" the box
## plot in the right spot because contrasts(xf) is 0,1 but
## the plot uses xf in 1,2.

xlevels <- levels(xf)
newdf <- data.frame(xf=xlevels)

ypred <- predict(m1, newdata=newdf)

##Watch now: the plot comes out "reversed", "After Christ" before "Before Christ"
plot(ypred ~ newdf$xf)

## Ah. Now I see:

levels(newdf$xf)
## Why doesn't newdf$xf respect the ordering of the levels?
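
## (Aside: the re-sorting seems to happen because data.frame() converts the
## character vector xlevels back into a factor with alphabetized levels.
## One workaround -- a sketch, not part of the original report -- is to
## rebuild the factor explicitly:)
newdf2 <- data.frame(xf = factor(xlevels, levels = xlevels))
levels(newdf2$xf)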




-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] --as-cran and need to ignore.svn directories

2012-03-12 Thread Paul Johnson
Good morning:

I submitted a package update to CRAN and got a bounce because I had
not run R CMD check with "--as-cran".  I'd not heard of that before,
but I'm glad to know about it now.

I see it warns when my functions do use partial argument matching, and
I like that advice very much.

Also I see this warning

* checking package subdirectories ... WARNING
Found the following directory(s) with names of version control directories:
  ./.svn
  ./R/.svn
  ./data/.svn
  ./inst/.svn
  ./inst/doc/.svn
  ./inst/examples/.svn
  ./vignettes/.svn
These should not be in a package tarball.

Is there a way to cause R to ignore the .svn folders while running R
CMD check --as-cran or R CMD build?

It seems a little tedious to have to copy the whole directory tree to
some other place and remove the .svn folders before building. I can do
it, but it just seems, well, tedious. I have the feeling that you
"frequent flyers"  would have worked around this already.

-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] --as-cran and need to ignore.svn directories

2012-03-12 Thread Paul Johnson
On Mon, Mar 12, 2012 at 12:47 PM, Martin Maechler
 wrote:
>> Jari Oksanen 
[snip]
>
>    > Paul,
>
>    > I think the best solution is to 'svn export' svn directory to a
>    > temporary directory/folder:
>
> No, I don't think that should not be needed.
>
> When building the tarball,  'R CMD build' *does* ignore the .svn
> folders, at least for me.
>

My tarball was created by R CMD build; I don't know why the .svn
folders were included.  There are no .svn folders in the packages I
made with 2.14.1, so I suppose that is the first problem I have to
figure out.  This was my first build with R-2.14.2.  If I get an
answer, I will write back.

> Paul's problem was using  --as-cran on his SVN *directory*
> but that's a slight contradiction in itself, as the CRAN maintainers
> always run the checks on a tarball...

I don't have an argument against what you say, and I'm glad to do it
however you recommend. But I don't understand this. How do you get the
tarball without first doing R CMD check and R CMD build on the working
directory where the package lives?  I just do what seems to be
recommended in the tutorial by Friedrich Leisch "Creating R Packages:
A Tutorial" and the R extensions document. Until now, I have done this
in the shell to create the package.

R CMD check rockchalk
R CMD build rockchalk

You mean everybody doesn't do it that way?

From now on, I'm adding --as-cran with check, but what else?

After that finishes, I should be running check on the tarball that was
created by build?

While I'm asking basic packaging questions, I have a vignette
packaging question. My vignette is in

vignettes/outreg.Rnw

The R build process runs latex to make sure the vignette can be built.
But, as far as I can tell, it does not copy the pdf file into

inst/doc/outreg.pdf.

So when building the package, I should still manually compile the pdf
and move it over there?


>
> Martin
>
>    > svn export my-svn-directory tmp-pkg-directory
>    > R CMD build tmp-pkg-directory
>    > R CMD check --as-cran ...
>
>    > The two advantages of 'svn export' that it (1) strips the .svn specific
>    > files, and (2) it really exports only those files that really are under
>    > version control. More often than once I have had some non-svn files in
>    > my svn directory so that *my* version of the package works, but the one
>    > actually in subversion fails.
>
>    > Cheers, Jari Oksanen
>
>    > --
>    > Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
>    > jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa


-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] overriding "summary.default" or "summary.data.frame". How?

2012-03-20 Thread Paul Johnson
I suppose everybody who makes a package for the first time thinks "I
can change anything!" and then runs into this same question. Has
anybody written out information on how a package can override
functions in R base in the R 2.14 (mandatory NAMESPACE) era?

Suppose I want to alphabetize variables in a summary.data.frame, or
return the standard deviation with the mean in summary output.  I'm
pasting in a working example below.  It has new "summary.factor"
method. It also has a function summarize that I might like to use in
place of summary.data.frame.

How would my new methods "drop on top" of R base functions?  It
appears my functions (summarizeFactors) can find my summary.factor,
but R's own summary uses its own summary.factor.


## summarizeNumerics takes a data frame or matrix, scans the columns
## to select only the numeric variables.  By default it alphabetizes
## the columns (use alphaSort = FALSE to stop that). It then
## calculates the quantiles for each variable, as well as the mean,
## standard deviation, and variance, and then packs those results into
## a matrix. The main benefits from this compared to R's default
## summary are 1) more summary information is returned for each
## variable, and 2) the results are returned in a matrix that is easy to
## use in further analysis.
summarizeNumerics <- function(dat, alphaSort = TRUE,  digits = max(3,
getOption("digits") - 3)){
  if (!is.data.frame(dat)) dat <- as.data.frame(dat)
  nums <- sapply(dat, is.numeric)
  datn <- dat[ , nums, drop = FALSE]
  if (alphaSort) datn <- datn[ , sort(colnames(datn)), drop = FALSE]
  sumdat <- apply(datn, 2, stats::quantile, na.rm=TRUE)
  sumdat <- rbind(sumdat, mean= apply(datn, 2, mean, na.rm=TRUE))
  sumdat <- rbind(sumdat, sd= apply(datn, 2, sd, na.rm=TRUE))
  sumdat <- rbind(sumdat, var= apply(datn, 2, var, na.rm=TRUE))
  sumdat <- rbind(sumdat, "NA's"=apply(datn, 2, function(x) sum(is.na(x
  signif(sumdat, digits)
}


summary.factor <- function(y, numLevels) {
  ## 5 nested functions to be used later

  divr <- function(p=0){
ifelse ( p>0 & p < 1, -p * log2(p), 0)
  }
  entropy <- function(p){
sum ( divr(p) )
  }
  maximumEntropy <- function(N) -log2(1/N)
  normedEntropy <- function(x) entropy(x)/ maximumEntropy(length(x))
  nas <- is.na(y)
  y <- factor(y)
  ll <- levels(y)
  tbl <- table(y)
  tt <- c(tbl)
  names(tt) <- dimnames(tbl)[[1L]]
  o <- sort.list(tt, decreasing = TRUE)
  if (length(ll) > numLevels){
toExclude <- numLevels:length(ll)
tt <- c(tt[o[-toExclude]], `(All Others)` = sum(tt[o[toExclude]]),
`NA's`=sum(nas))
  }else{
tt <- c(tt[o], `NA's`=sum(nas))
  }
  props <- prop.table(tbl);
  tt <- c(tt, "Entropy"=entropy(props), "NormedEntropy"= normedEntropy(props))
}


## Takes a data frame or matrix, scans the columns to find the
## variables that are not numeric and keeps them. By default it
## alphabetizes them (alphaSort = FALSE to stop that).  It then treats
## all non-numeric variables as if they were factors, and summarizes
## each in a say that I find useful. In particular, for each factor,
## it provides a table of the most frequently occurring values (the
## top "numLevels" values are represented).  As a diversity indictor,
## it calculates the Entropy and NormedEntropy values for each
## variable.  Note not all of this is original. It combines my code
## and R code from base/summary.R
summarizeFactors  <- function(dat = NULL, numLevels = 10, alphaSort =
TRUE, digits = max(3, getOption("digits") - 3))
{

  ##copies from R base::summary.R summary.data.frame
  ncw <- function(x) {
z <- nchar(x, type="w")
if (any(na <- is.na(z))) {
# FIXME: can we do better
  z[na] <- nchar(encodeString(z[na]), "b")
}
z
  }

  if (!is.data.frame(dat)) dat <- as.data.frame(dat)
  ##treat any nonnumeric as a factor
  factors <- sapply(dat, function(x) {!is.numeric(x) })
  ##If only one factor, need drop=FALSE.
  datf <- dat[ , factors, drop = FALSE]
  if (alphaSort) datf <- datf[ , sort(colnames(datf)), drop = FALSE]
  z  <- lapply(datf, summary.factor, numLevels=numLevels)
  nv <- length(datf)
  nm <- names(datf)
  lw <- numeric(nv)
  nr <- max(unlist(lapply(z, NROW)))
  for(i in 1L:nv) {
sms <- z[[i]]
lbs <- format(names(sms))
sms <- paste(lbs, ":", format(sms, digits = digits), "  ",
 sep = "")
lw[i] <- ncw(lbs[1L])
length(sms) <- nr
z[[i]] <- sms
  }
  z <- unlist(z, use.names=TRUE)
  dim(z) <- c(nr, nv)
  if(any(is.na(lw)))
warning("probably wrong encoding in names(.) of column ",
paste(which(is.na(lw)), collapse = ", "))
  blanks <- paste(character(max(lw, na.rm=TRUE) + 2L), collapse = " ")
  pad <- floor(lw - ncw(nm)/2)
  nm <- paste(substring(blanks, 1, pad), nm, sep = "")
  dimnames(z) <- list(rep.int("", nr), nm)
  attr(z, "class") <- c("table")
  z
}

##
## want to override summary.data.frame, but confusing.  When
## will R find my summary.data.frame, when will it find the one in base.
## us

[Rd] I wish xlim=c(0, NA) would work. How about I send you a patch?

2012-04-16 Thread Paul Johnson
I'm looking for an R mentor.  I want to propose a change in management
of plot options xlim and ylim.

Did you ever want to change one coordinate in xlim or ylim? It happens
to me all the time.

x <- rnorm(100, m=5, s=1)
y <- rnorm(100, m=6, s=1)
plot(x,y)

## Oh, I want the "y axis" to show above x=0.

plot(x,y, xlim=c(0, ))

##Output: Error in c(0, ) : argument 2 is empty

 plot(x,y, xlim=c(0,NA ))
## Output: Error in plot.window(...) : need finite 'xlim' values


I wish that plot would let me supply just the min (or max) and then
calculate the other value where needed.
It is a little bit tedious for the user to do that for herself.  The
user must be knowledgeable enough to know where the maximum (MAX) is
supposed to be, and then supply xlim=c(0, MAX). I can't see any reason
for insisting users have that deeper understanding of how R calculates
ranges for plots.
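
For now, the best the user can do is the manual calculation, something
like this (with the toy data above):

plot(x, y, xlim = c(0, max(x[is.finite(x)])))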

Suppose the user is allowed to supply NA to signal R should fill in the blanks.

plot(x,y, xlim=c(0, NA))


In plot.default now, I find this code to manage xlim

   xlim <- if (is.null(xlim))
range(xy$x[is.finite(xy$x)])

And I would change it to be something like
   ## get the default range
   nxlim <- range(xy$x[is.finite(xy$x)])

   if (is.null(xlim)) {
       ## no xlim supplied: same as current behavior
       xlim <- nxlim
   } else {
       ## otherwise, replace any NAs in xlim with values from nxlim
       xlim[is.na(xlim)] <- nxlim[is.na(xlim)]
   }


Who is the responsible party for plot.default?  How about it?

Think of how much happier users would be!

pj
-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Proposal: model.data

2012-05-03 Thread Paul Johnson
Greetings:

I'm still working on functions to make it easier for students to
interpret regression predictions.  I am working out a scheme to
more easily create "newdata" objects (for use in predict functions).
This has been done before in packages like Zelig, Effects,
rms, and others. Nothing is "exactly right," though. Stata users keep
flashing their "predicted probabity tables" at me and I'd like
something like that to use with R.

I'm proposing here a function model.data that receives a regression
and creates a dataframe of raw data from which newdata objects
can be constructed. This follows a suggestion that Bill Dunlap made
to me in response to a question I posted in r-help.

While studying termplot code, I saw the "carrier" function approach
to deducing the "raw" predictors.  However, it does not always work.

Here is one problem. termplot mistakes 10 in log(10 + x1) for a variable

Example:

dat <- data.frame(x1 = rnorm(100), x2 = rpois(100, lambda=7))
STDE <- 10
dat$y <- 1.2 * log(10 + dat$x1) + 2.3 * dat$x2 + rnorm(100, sd = STDE)

m1 <- lm( y ~ log(10 + x1)  + x2, data=dat)
termplot(m1)

## See the trouble? termplot thinks 10 is the term to plot.

Another problem is that predict( type="terms") does not behave
sensibly sometimes. RHS of formula that have nonlinear transformations
are misunderstood as separate terms.

##Example:
dat$y2 <- 1.2 * log(10 + dat$x1) + 2.3 * dat$x1^2 + rnorm(100, sd = STDE)

m2 <- lm( y2 ~ log(10 + x1)  + sin(x1), data=dat)
summary(m2)

predict(m2, type="terms")

## Output:
## log(10 + x1) sin(x1)
## 1 1.50051781 -2.04871711
## 2 -0.14707391  0.31131124

What I wish would happen instead is one "correct" prediction
for each value of x1. This should be the output:

predict(m2, newdata = data.frame(x1 = dat$x1))

## > predict(m2, newdata = data.frame(x1 = dat$x1))
##        1        2        3        4        5        6        7        8
## 17.78563 18.49806 17.50719 19.70093 17.45071 19.69718 18.84137 18.89971

The "fix" I'm testing now is the following new function, "model.data".
which tries to re-create the data object that would be
consistent with a fitted model. This follows a suggestion from
Bill Dunlap in r-help on 2012-04-22



##' Creates a "raw" (UNTRANSFORMED) data frame equivalent
##' to the input data that would be required to fit the given model.
##'
##' Unlike model.frame and model.matrix, this does not return transformed
##' variables.
##'
##' @param model A fitted regression model in which the data argument
##' is specified. This function will fail if the model was not fit
##' with the data option.
##' @return A data frame
##' @export
##' @author Paul E. Johnson 
##' @example inst/examples/model.data-ex.R
model.data <- function(model){
fmla <- formula(model)
allnames <- all.vars(fmla) ## all variable names
## indep variables, includes d in poly(x,d)
ivnames <- all.vars(formula(delete.response(terms(model))))
## datOrig: original data frame
datOrig <-  eval(model$call$data, environment(formula(model)))
if (is.null(datOrig))stop("model.data: input model has no data frame")
## datOrig: almost right, but includes d in poly(x, d)
dat <- get_all_vars(fmla, datOrig)
## Get rid of "d" and other "non variable" variable names that are
not in datOrig:
keepnames <- intersect(names(dat), names(datOrig))
## Keep only rows actually used in model fit, and the correct columns
dat <- dat[ row.names(model$model) , keepnames]
## keep ivnames that exist in datOrig
attr(dat, "ivnames") <- intersect(ivnames, names(datOrig))
invisible(dat)
}


This works for the test cases like log(10+x) and so forth:

## Examples:

head(m1.data <- model.data(m1))

head(m2.data <- model.data(m2))

## > head(m1.data <- model.data(m1))
##  y  x1 x2
## 1 18.53846  0.46176539  8
## 2 28.24759  0.09720934  7
## 3 23.88184  0.67602556  9
## 4 23.50130 -0.74877054  8
## 5 25.81714  1.02555255  5
## 6 24.75052 -0.69659539  6
## > head(m2.data <- model.data(m2))
##  y  x1
## 1 18.53846  0.46176539
## 2 28.24759  0.09720934
## 3 23.88184  0.67602556
## 4 23.50130 -0.74877054
## 5 25.81714  1.02555255
## 6 24.75052 -0.69659539


d <- 2
m4 <- lm(y ~ poly(x1,d), data=dat)

head(m4.data <- model.data(m4))

##  y  x1
## 1 18.53846  0.46176539
## 2 28.24759  0.09720934
## 3 23.88184  0.67602556

Another strength of this approach is that the return object has an
attribute "ivnames".  If R's termplot used model.dat instead of the
carrier functions, this would make for a much tighter set of code.
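
Here is a sketch (hypothetical code, not termplot itself) of the kind
of loop that attribute makes possible, holding the other predictors at
central values:

mdat <- model.data(m1)
ivs <- attr(mdat, "ivnames")
for (iv in ivs) {
    nd <- lapply(mdat[ivs], function(v) if (is.numeric(v)) mean(v) else v[1])
    nd <- as.data.frame(nd)[rep(1, 40), , drop = FALSE]
    nd[[iv]] <- seq(min(mdat[[iv]]), max(mdat[[iv]]), length.out = 40)
    plot(nd[[iv]], predict(m1, newdata = nd), type = "l",
         xlab = iv, ylab = "predicted y")
}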

What flaws do you see in this?

One flaw is that I did not know how to re-construct data from the
parent environment, so I insist the regression model has to have a
data argument. Is this necessary, or can one of the R experts help?

Another possible flaw: I'm keeping the columns from the data frame
that are needed to re-construct the model.frame, and to match
the rows, I'm using row.names for the model

Re: [Rd] Proposal: model.data

2012-05-03 Thread Paul Johnson
Greetings:

On Thu, May 3, 2012 at 11:36 AM, Brian G. Peterson  wrote:
> On Thu, 2012-05-03 at 10:51 -0500, Paul Johnson wrote:
>> If somebody in R Core would like this and think about putting it, or
>> something like it, into the base, then many chores involving predicted
>> values would become much easier.
>>
> Why does this need to be in base?  Implement it in a package.
>>
> If it works, and is additive, people will use it.  Look at 'reshape' or
> 'xts' or 'Matrix' just to name a few examples of widely used packages.
>

I can't use it to fix termplot unless it is in base.

Or are you suggesting I create my own "termplot" replacement?


> Regards,
>
>   - Brian
>
>



-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Proposal: model.data

2012-05-04 Thread Paul Johnson
On Thu, May 3, 2012 at 12:19 PM, Brian G. Peterson  wrote:
> On Thu, 2012-05-03 at 12:09 -0500, Paul Johnson wrote:
>> Greetings:
>>
>> On Thu, May 3, 2012 at 11:36 AM, Brian G. Peterson  
>> wrote:
>> > On Thu, 2012-05-03 at 10:51 -0500, Paul Johnson wrote:
>> >> If somebody in R Core would like this and think about putting it, or
>> >> something like it, into the base, then many chores involving predicted
>> >> values would become much easier.
>> >>
>> > Why does this need to be in base?  Implement it in a package.
>> >>

So, nobody agrees with me that R base should have model.data?  Too
bad!  You could save us a lot of effort.  I've found different efforts
to get the same work done in termplot and all of the implementations
of predict. And just about every regression related package has its
own approach.

If there were one good way that would always work, just think how
convenient it would be for all those function & package writers.  Oh,
well. I'm not saying that my model.data is perfect, just that I wish
there were a perfect one :)

Yesterday, I realized that predict.nls probably has to deal with this
problem so I studied that and have yet another version of model.data
to propose to you.  I'm using this in my regression-support package
rockchalk, so you don't need to give me Brian Peterson's advice to
"put it in a package".

The idea here is to take variables from a fitted model's data if it
can find them, and then grab variables from the parent environment IF
they have the correct length.  This means we ignore variables like d
in poly(x, d) because the variable d is not of the same length as the
variables in the model.

##' Creates a "raw" (UNTRANSFORMED) data frame equivalent
##' to the input data that would be required to fit the given model.
##'
##' Unlike model.frame and model.matrix, this does not return transformed
##' variables.
##'
##' @param model A fitted regression model in which the data argument
##' is specified. This function will fail if the model was not fit
##' with the data option.
##' @return A data frame
##' @export
##' @author Paul E. Johnson 
##' @example inst/examples/model.data-ex.R
model.data <- function(model){
#from nls, returns -1 for missing variables
lenVar <- function(var, data) tryCatch(length(eval(as.name(var),
 data, env)), error = function(e) -1)
fmla <- formula(model)
varNames <- all.vars(fmla) ## all variable names
## varNames includes d in poly(x,d), possibly other "constants"
## varNamesRHS <- all.vars(formula(delete.response(terms(model))))
## previous same as nls way?
fmla2 <- fmla
fmla2[[2L]] <- 0
varNamesRHS <- all.vars(fmla2)
varNamesLHS <- setdiff(varNames, varNamesRHS)
env <- environment(fmla)
if (is.null(env))
env <- parent.frame()

dataOrig <-  eval(model$call$data, environment(formula(model)))
rndataOrig <- row.names(dataOrig)
n <- sapply(varNames, lenVar, data=dataOrig)
targetLength <- length(eval(as.name(varNamesLHS[1]), dataOrig, env))
varNames <- varNames[ n == targetLength ]
ldata <- lapply(varNames, function(x) eval(as.name(x), dataOrig, env))
names(ldata) <- varNames
data <- data.frame(ldata[varNames])
if (!is.null(rndataOrig)) row.names(data) <- rndataOrig
## remove rows listed in model's na.action
## TODO: question: what else besides OMIT might be called for?
if ( !is.null(model$na.action)){
data <- data[ -as.vector(model$na.action),  ]
}
## keep varNamesRHS that exist in datOrig
attr(data, "varNamesRHS") <- setdiff(colnames(data), varNamesLHS)
invisible(data)
}

And some example output:
> ## check if model.data works when there is no data argument
> set.seed(12345)
> x1 <- rpois(100, l=6)
> x2 <- rnorm(100, m=50, s=10)
> x3 <- rnorm(100)
> y <- rnorm(100)
> m0 <- lm(y ~ x1 + x2 + x3)
> m0.data <- model.data(m0)
> x1[4:5] <- NA
> m0 <- lm(y ~ x1 + x2 + x3)
> m0.data <- model.data(m0)
> head(m0.data)
   y x1   x2 x3
1 -0.8086741  7 44.59614 -1.6193283
2  1.0011198  9 69.47693  0.5483979
3  0.4560525  8 50.53590  0.1952822
6  0.6417692  4 52.77954 -0.2509466
7 -0.4150210  5 56.91171  1.6993467
8 -0.4595757  6 58.23795 -0.3442988
> x1 <- rpois(100, l=6)
> x2 <- rnorm(100, m=50, s=10)
> x3 <- rnorm(100)
> xcat1 <- gl(2,50, labels=c("M","F"))
> xcat2 <- cut(rnorm(100), breaks=c(-Inf, 0, 0.4, 0.9, 1, Inf), labels=c("R", 
> "M", "D", "P", "G"))
> dat <- data.frame(x1, x2, x3, xcat1, xcat2)
> rm(

[Rd] Requests for vignette clarification (re: Writing R Extensions)

2012-06-01 Thread Paul Johnson
I apologize that these questions are stupid and literal.

I write to ask for clarification of comments in the R extensions
manual about vignettes.  I'm not great at LaTeX, but I'm not a
complete novice either, and some of the comments are puzzling to me.

1. I'm stumbling over this line:

"Make sure all files needed to run the R code in the vignette (data
sets, ...) are accessible by either placing them in the inst/doc
hierarchy of the source package or by using calls to system.file()."

Where it says "inst/doc", can I interpret it to mean "vignettes"?  The
vignette files are under vignettes. Why wouldn't those other files be
in there? Or does that mean I'm supposed to copy the style and bib
files from the vignettes folder to the inst/doc folder?  Or none of
the above :)

2. I'm also curious about the implications of the parenthesized
section of this comment:

"By default R CMD build will run Sweave on all files in Sweave format
in vignettes, or if that does not exist, inst/doc (but not in
sub-directories)."

At first I thought that meant it will search vignettes and
subdirectories under vignettes, or it will look under inst/doc, but no
subdirectories under inst/doc.  So I created vignettes in
subdirectories under vignettes and they are ignored by the build
process, so that was obviously wrong.  For clarification, it would
help me if the manual said

"By default R CMD build will run Sweave on all files in Sweave format
in vignettes (but not in sub-directories), or if that does not exist,
inst/doc ."

In this list I've read several questions/complaints from people who
don't want their vignettes rebuild during the package check or build
process, and I wondered if there is a benefit to having vignettes in
subdirectories.   Could inclusion of troublesome vignettes in
subdirectories be a way that people can circumvent the rebuilding and
re-checking of vignettes during build, check, or install?  If I build
my vignettes manually and copy the pdf output over to inst/doc, will
those pdf files be "legitimate" vignette files as far as CRAN is
concerned?  The writeup in R Extensions is a little bit confusing on
that point.

"By including the PDF version in the package sources it is not
necessary that the vignette PDFs can be re-built at install time,
i.e., the package author can use private R packages, screen snapshots
and LaTeX extensions which are only available on his machine."

It's just confusing; that's all I can say about it.

I could learn how to do this from some examples of packages that
manage vignettes the "right way", if you could tell me which ones are
"right" :) I'd like to see one that has a Makefile, uses a bib file,
and, if possible, one that imports a pdf file from elsewhere.  If
there is one that uses subdirectories under vignettes to keep separate
 the content of vignettes, that would be extra helpful.

I'm eager to do this in the correct way, just point me at some that are proper.

pj

-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Requests for vignette clarification (re: Writing R Extensions)

2012-06-04 Thread Paul Johnson
On Sun, Jun 3, 2012 at 2:02 PM, Paul Gilbert  wrote:
> I'll make a guess at some parts of this.
>
>
> On 12-06-01 02:53 PM, Paul Johnson wrote:
>>
>> I apologize that these questions are stupid and literal.
>>
>> I write to ask for clarification of comments in the R extensions
>> manual about vignettes.  I'm not great at LaTeX, but I'm not a
>> complete novice either, and some of the comments are puzzling to me.
>>
>> 1. I'm stumbling over this line:
>>
>> "Make sure all files needed to run the R code in the vignette (data
>> sets, ...) are accessible by either placing them in the inst/doc
>> hierarchy of the source package or by using calls to system.file()."
>>
>> Where it says "inst/doc", can I interpret it to mean "vignettes"?  The
>> vignette files are under vignettes. Why wouldn't those other files be
>> in there? Or does that mean I'm supposed to copy the style and bib
>> files from the vignettes folder to the inst/doc folder?  Or none of
>> the above :)
>
>
> I think the idea is that a user looking at an installed version of the
> package will be able to see things that are in the doc/ directory of the
> installed package. This automatically includes the source files (eg *.Stex)
> from vignettes/ and also the generated *.pdf and the *.R files stripped from
> the *.Stex files. If you want them to have access to other files then you
> should put those somewhere so they get installed, such as in the source
> package /inst/doc directory so they get put in the doc/ directory of the
> installed package. That should probably include anything else that is
> important to reproduce the results in the vignette, but I do not count the
> .bib file in that list (so I have it in vignettes/ and users would need to
> look at the package source to find it).
>
>>
>> 2. I'm also curious about the implications of the parenthesized
>> section of this comment:
>>
>> "By default R CMD build will run Sweave on all files in Sweave format
>> in vignettes, or if that does not exist, inst/doc (but not in
>> sub-directories)."
>>
>> At first I though that meant it will search vignettes and
>> subdirectories under vignettes, or it will look under inst/doc, but no
>> subdirectories under inst/doc.  So I created vignettes in
>> subdirectories under vignettes and they are ignored by the build
>> process, so that was obviously wrong.  For clarification, it would
>> help me if the manual said
>>
>> "By default R CMD build will run Sweave on all files in Sweave format
>> in vignettes (but not in sub-directories), or if that does not exist,
>> inst/doc ."
>>
>> In this list I've read several questions/complaints from people who
>> don't want their vignettes rebuild during the package check or build
>> process, and I wondered if there is a benefit to having vignettes in
>> subdirectories.   Could inclusion of troublesome vignettes in
>> subdirectories be a way that people can circumvent the rebuilding and
>> re-checking of vignettes during build, check, or install?  If I build
>> my vignettes manually and copy the pdf output over to inst/doc, will
>> those pdf files be "legitimate" vignette files as far as CRAN is
>> concerned?  The writeup in R Extensions is a little bit confusing on
>> that point.
>>
>> "By including the PDF version in the package sources it is not
>> necessary that the vignette PDFs can be re-built at install time,
>> i.e., the package author can use private R packages, screen snapshots
>> and LaTeX extensions which are only available on his machine."
>>
>> Its just confusing, that's all I can say about it.
>
>
> There was at least one earlier R-devel discussion of this, in which I
> contributed an incorrect understanding, but was generally straightened out
> by Uwe. I hope I have a correct understanding now.
>
> You can put a pdf file in inst/doc and specify "BuildVignettes: false" in
> the DESCRIPTION file, in which case the already constructed pdf from
> inst/doc will be used. The purpose of this is to allow vignettes which would
> not be completely constructed from sources, for example, because certain
> data or other resources may not be generally available. However, R CMD check
> will still try to parse the Sweave file and run the R code, and fail if it
> does not run. So, when the resources to build the vignette are not generally
> available this does require some special attention, often with try(), in the
> code for your vignette.
>
> It is possible to clai

Re: [Rd] stumped on re-building package vignette

2012-06-12 Thread Paul Johnson
On Wed, Jun 6, 2012 at 8:21 AM, Michael Friendly  wrote:
> [Env: Win Xp / StatET 2.0 / R 2.15.0]
>
> In my heplots package I extended the HE-examples.Rnw vignette under
> inst/doc. The package passes R CMD check
> on my machine:
>
> * using log directory 'C:/eclipse-3.7/heplots.Rcheck'
> * using R version 2.15.0 (2012-03-30)
> * using platform: i386-pc-mingw32 (32-bit)
> ...
> * checking sizes of PDF files under 'inst/doc' ... OK
> * checking installed files from 'inst/doc' ... OK
> * checking examples ... OK
> * checking for unstated dependencies in vignettes ... OK
> * checking package vignettes in 'inst/doc' ... OK
> * checking running R code from vignettes ...
> 'HE-examples.Rnw' ... OK
> 'repeated.Rnw' ... OK
> ...

2 suggestions.

First, on your own workstation, run the check like so:

R CMD check --as-cran package.tar.gz

That will give you results more like CRAN or Rforge.

Second, how did your pdf get into the folder "fig"?  It makes me
suspect you are using some forbidden Sweave settings.

Oh, third, here's a suggestion.  Build your own vignette, copy from
vignettes to inst/doc, and then in the DESCRIPTION file include the
option to ask the R system not to rebuild your vignette.

BuildVignettes: false

I think you will see this discussed/debated in R-devel and the
situation is somewhat fluid.

pj


>
> However, on R-Forge and on CRAN, the following error/warning is generated:
>
> Mon Jun  4 20:18:22 2012: Checking package heplots (SVN revision 136) ...
> * using log directory
> ‘/mnt/building/build_2012-06-04-20-02/RF_PKG_CHECK/PKGS/heplots.Rcheck’
> * using R version 2.15.0 Patched (2012-06-03 r59505)
> * using platform: x86_64-unknown-linux-gnu (64-bit)
>  ...
> * checking for unstated dependencies in vignettes ... OK
> * checking package vignettes in ‘inst/doc’ ... OK
> * checking running R code from vignettes ...
>   ‘HE-examples.Rnw’ ... [4s/4s] OK
>   ‘repeated.Rnw’ ... [4s/4s] OK
>  [9s/9s] OK
> * checking re-building of vignette PDFs ... NOTE
> Error in re-building vignettes:
>  ...
> Loading required package: car
> Loading required package: MASS
> Loading required package: nnet
> Error: processing vignette 'HE-examples.Rnw' failed with diagnostics:
> cannot open file 'fig/plot-plastic1.pdf'
> Execution halted
>
> * checking PDF version of manual ... OK
>
> I am unable to determine why the file fig/plot-plastic1.pdf cannot be
> opened. It is in my inst/doc/fig directory & is regenerated
> by the .Rnw file. What could cause this?
>
> Second, I have tried manually running tools::compactPDF("HE-examples.pdf) on
> the .pdf file under inst/doc in the package,
> but no change is made to the .pdf file. I can't see any way to correct this.
>
> --
> Michael Friendly     Email: friendly AT yorku DOT ca
> Professor, Psychology Dept.
> York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
> 4700 Keele Street    Web:   http://www.datavis.ca
> Toronto, ONT  M3J 1P3 CANADA
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Working on a Vignette called Rchaeology

2012-06-12 Thread Paul Johnson
Greetings, R developers

Here are my requests today.

1. Would you care to review this vignette
http://pj.freefaculty.org/R/Rchaeology.pdf and tell me if you think it
is wrong headed, and

2. Supposing  you do not think I'm completely wrong, would you care to
point me at more examples of R idioms that lead to deep insights into
the nature of R programming?


Longer more boring explanation:

In the rockchalk package, I have functions to make it easier to teach
regression to students who aren't R programmers. I just uploaded
package version 1.6 to CRAN. While I do this, I'm reading R source
code all the time because I need to understand how functions like
summary and anova receive R objects, take them apart, and do specific
chores.  I've been very glad that R is open source while I do this.

While working on this, an inspiration hit me! I want to call
"Rchaeology" the study of R idioms and customs deduced from the R
source code written by the experts.  Practicing this makes me an
Rchaeologist.

I've started a vignette, it's in rockchalk and up here:

http://pj.freefaculty.org/R/Rchaeology.pdf

I plan to work out sections that explore some particular usage
examples that touch on things that expose special features of the R
language.  Right now, I've only got one worked out, it explores this
idiom that Gabor  G. explained to me, which receives a formula object
and replaces a variable "x1" with "x1c".

do.call("substitute", list(newFmla, setNames(list(as.name("x1c")), "x1")))

I think that one is fabulous; it sheds a lot of light when you break
it down into pieces.
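
Here is a throwaway case (my own toy formula, nothing from rockchalk)
showing what it does:

newFmla <- y ~ x1 + x2
do.call("substitute", list(newFmla, setNames(list(as.name("x1c")), "x1")))
## y ~ x1c + x2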

I'd like to build up a good list of "Do This, Not That" bits, but it
is hard to find particular approaches that almost all of you will
agree to.  Almost everybody agrees that calling rbind over and over
again is slow, and do.call with rbind and a collection of stackable
things is better:

http://pj.freefaculty.org/R/WorkingExamples/stackListItems.R
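
In miniature (toy data of my own):

pieces <- lapply(1:500, function(i) data.frame(id = i, x = rnorm(3)))

## Do this: one call builds the whole result at once
stacked <- do.call("rbind", pieces)

## Not that: every pass re-copies everything accumulated so far,
## so the cost grows roughly quadratically with the number of pieces
stacked2 <- data.frame()
for (p in pieces) stacked2 <- rbind(stacked2, p)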

If you have ideas for "Do This, Not That", I'd be glad to hear them
and I'll look for example applications that make them relevant.

pj

-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Problem in vignette packaging of Sweave in utils package

2012-07-03 Thread Paul Johnson
In ?Sweave, it refers to Sweave User Manual. In the doc folder of
utils package, I see "Sweave.pdf".

However, I can't find it from within R


> vignette("Sweave User Manual")
Warning message:
vignette ‘Sweave User Manual’ not found


> browseVignettes("utils")
No vignettes found by browseVignettes("utils")


> library(help=utils)

does not refer to any vignettes.

The vignette does not appear in the main page for utils in help.start().

I checked the source code for the Sweave vignette, but I don't see
anything wrong. It has all of the required elements:

%\VignetteIndexEntry{Sweave User Manual}
%\VignettePackage{utils}
%\VignetteDepends{tools}
%\VignetteDepends{datasets}
%\VignetteDepends{stats}

Am I accessing it incorrectly, or is there something wrong in my
install of R-2.15.1?

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.15.1
-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Problem in vignette packaging of Sweave in utils package

2012-07-03 Thread Paul Johnson
On Tue, Jul 3, 2012 at 12:54 PM, Yihui Xie  wrote:
> Strange enough; I just noticed the HTML index pages of several base
> packages were gone (e.g. base, stats, tools, utils) under Ubuntu. Not
> sure if this is a problem of Debian packages or R itself.
>

It is not only on Debian where I see:

> vignette("Sweave")
Warning message:
vignette ‘Sweave’ not found

This afflicts RPM based installs as well. It is not a new problem, AFAICT.

Here's output from R-2.15.0 on a Centos Linux cluster node.:

> vignette("Sweave")
Warning message:
vignette 'Sweave' not found

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.iso885915   LC_NUMERIC=C
 [3] LC_TIME=en_US.iso885915LC_COLLATE=en_US.iso885915
 [5] LC_MONETARY=en_US.iso885915LC_MESSAGES=en_US.iso885915
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.15.0

> --
> Yihui Xie 
> Phone: 515-294-2465 Web: http://yihui.name
> Department of Statistics, Iowa State University
> 2215 Snedecor Hall, Ames, IA
>

-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Are R packages supposed to be "relocatable"? (avoiding BioConductor scripts...)

2012-07-19 Thread Paul Johnson
I've asked a question in the BioConductor list about package
management. My solution depends on your answer to the following
question.

Are installed R packages "relocatable"?

I mean relocatable in the same sense that files in a RedHat RPM file
might be "relocatable" after compiling
(http://www.rpm.org/max-rpm/ch-rpm-reloc.html).  This allows one to
build a package as the ordinary user and then the root user can take
the result and put it wherever it fits well in the path (say,
/usr/local/lib/R/library).

Here is why I asked.  I manage packages in our cluster and some users
have asked me to install some BioConductor packages.  BioConductor
install documents expect me to run a script as root that does a bunch
of changes, and I'm just unwilling to do that.  If I have to do
something as root, it has to be something more focused like running a
particular R command (install.packages(), for example).  It seems
insane to me that they expect (http://www.bioconductor.org/install) a
root user to run

source("http://bioconductor.org/biocLite.R";)
biocLite("limma")

If I could do the installs as me with their script, and then copy the
install folder into the system, then it would be OK, if the packages
would work.

Or perhaps post-processing is required to fiddle some paths inside
package files?
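
To be concrete, the workflow I have in mind looks like this, using plain
install.packages() pointed at the Bioconductor repository instead of their
script (the paths and the repository URL are only examples, and whether the
copied tree still works is exactly what I'm asking):

## step 1, as an ordinary user: install into a scratch library
dir.create("~/R-staging", showWarnings = FALSE)
install.packages("limma", lib = "~/R-staging",
                 repos = "http://bioconductor.org/packages/release/bioc")

## step 2, as root, outside R: copy the finished directory into place
##   cp -a ~/R-staging/limma /usr/local/lib/R/library/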

pj
-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Problem in vignette packaging of Sweave in utils package

2012-07-26 Thread Paul Johnson
New help request below

On Sat, Jul 7, 2012 at 7:25 AM, Duncan Murdoch  wrote:
> On 12-07-03 1:21 PM, Paul Johnson wrote:
>>
>> In ?Sweave, it refers to Sweave User Manual. In the doc folder of
>> utils package, I see "Sweave.pdf".
>>
>> However, I can't find it from within R
>>
>>
>>> vignette("Sweave User Manual")
>>
>> Warning message:
>> vignette ‘Sweave User Manual’ not found
>
>
> Turns out there was a bug in the code to install from tarballs, so base
> package vignettes weren't fully installed.  I've fixed it; installs from
> R-patched or R-devel revisions after 59750 should be okay.
>
> Duncan Murdoch
>
>

Thanks, Duncan

Can I ask you to check this one again?

I just built RPMS for R-2.15.1-patched 2012-07-25 on a Rocks Cluster
(Centos 5.3).  I still see the trouble that the vignette for Sweave is
missing from the vignette index.

Below I'm showing everything I can think of that is related to this,
hoping it will give you an idea if the problem is on my end.

> sessionInfo()
R version 2.15.1 Patched (2012-07-25 r59963)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.15.1

> browseVignettes("utils")
No vignettes found by browseVignettes("utils")

> vignette("Sweave")
Warning message:
vignette ‘Sweave’ not found

As far as I can tell, the vignette WAS built and installed, as evidenced:

$ pwd
/usr/lib64/R/library/utils/doc
[pauljohn@devel doc]$ ls
Sweave.pdf

But it is not found by the vignette indexing system, it's not in the
output from
help(package="utils")

Maybe I did something to break the install when making the RPM.  But I
don't know what:)  I also built R the "old fashioned way" in the shell
and the result is the same.

For the RPM, I used a spec file I've slightly adapted from Tom "Spot"
Calloway's spec file that is used to build R for EPEL and Fedora.
Here's a copy:

http://pj.freefaculty.org/misc/R-2.15.1-patch.spec

And the output (stdout and stderr) from the build process, obtained from

$ rpmbuild -ba R-2.15.1-patch.spec --define "dist kurocks53" >
R-2.15.1-patch.txt 2>&1

http://pj.freefaculty.org/misc/R-2.15.1-patch.txt

In order to make this work, I did one unsavory thing.  I took the
contents of this R-patched_2012-07-25.tar.gz and re-named the
directory R-2.15.1 and then repackaged that as R-2.15.1.tar.gz.  I
know this is unsavory to rename somebody else's packaging, but it was
the only way I could get the RPM build system to accept the version
number and file name. I put the date in the release version to reduce
confusion.  The RPMs have names like:

R-core-2.15.1-20120725kurocks53.x86_64.rpm

Oh, heck. I'm just uploading them all to the same folder as the spec
file, on the off chance you can see what's wrong.

libRmath-2.15.1-20120725kurocks53.x86_64.rpm
libRmath-devel-2.15.1-20120725kurocks53.x86_64.rpm
libRmath-static-2.15.1-20120725kurocks53.x86_64.rpm
R-2.15.1-20120725kurocks53.x86_64.rpm
R-core-2.15.1-20120725kurocks53.x86_64.rpm
R-debuginfo-2.15.1-20120725kurocks53.x86_64.rpm
R-devel-2.15.1-20120725kurocks53.x86_64.rpm

And the SRPM:

R-2.15.1-20120725kurocks53.src.rpm

pj

-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Problem in vignette packaging of Sweave in utils package

2012-07-27 Thread Paul Johnson
On Fri, Jul 27, 2012 at 9:54 AM, Duncan Murdoch
 wrote:
> On 12-07-27 1:23 AM, Paul Johnson wrote:
>>
>> New help request below
>>
>> On Sat, Jul 7, 2012 at 7:25 AM, Duncan Murdoch 
>> wrote:
>>>
>>> On 12-07-03 1:21 PM, Paul Johnson wrote:
>>>>
>>>>
>>>> In ?Sweave, it refers to Sweave User Manual. In the doc folder of
>>>> utils package, I see "Sweave.pdf".
>>>>
>>>> However, I can't find it from within R
>>>>
>>>>
>>>>> vignette("Sweave User Manual")
>>>>
>>>>
>>>> Warning message:
>>>> vignette ‘Sweave User Manual’ not found
>>>
>>>
>>>
>>> Turns out there was a bug in the code to install from tarballs, so base
>>> package vignettes weren't fully installed.  I've fixed it; installs from
>>> R-patched or R-devel revisions after 59750 should be okay.
>>>
>>> Duncan Murdoch
>>>
>>>
>>
>> Thanks, Duncan
>>
>> Can I ask you to check this one again?
>
>
> Sorry, I can't help with this.  If you build from svn or from a tarball,
> then I could help, but someone else produces the rpms, and I don't have
> access to a Linux system to try them.
>
> Duncan Murdoch
>
>

Sorry I was unclear. I did build from R-patched tarball, from here:

ftp://ftp.stat.math.ethz.ch/Software/R/R-patched_2012-07-25.tar.gz

I get the same result whether I build that into an RPM or if I build
and install the "old fashioned" way (configure, make, make install).
The Sweave pdf is created, but not found inside R.

I just re-ran the compilation in an Emacs shell and you can see
everything that goes on:

http://pj.freefaculty.org/misc/R-patched-build-20120725.txt

As you will see, I run into problems with pdf creation in

make pdf

dealing with refman. I think that links back to the ancient tetex
toolkit and the lack of the inconsolata font.  If that is the reason R
can't find the vignette, I would be glad to know.

I do get the Sweave.pdf vignette created, just not indexed.

pj
-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] tcltk capability

2012-09-27 Thread Paul Johnson
Sorry I did not see this sooner.  Response below:

On Thu, Aug 30, 2012 at 2:48 PM, Gabor Grothendieck
 wrote:
> There are quite a few packages that make use of tcltk and although
>
> - most R distributions have tcltk capability
> - its possible to query this via capabilities()[["tcltk"]]
> - attempting to build a package that needs it on such an R
> distribution will give a message letting one know
>
> users still seem to have problems with this.

I agree that packages are using tcltk, but users do so at their peril.
 Just loading tcltk breaks multicore functions like mclapply. I don't
understand why, you might be able to explain it to me.

http://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15040

tcltk is discouraged, at least to some in R core.

I would really appreciate some guidance on what graphics library I
should use to develop GUI for R programs.  There's no future in tcltk,
apparently, but I don't know what to work with.

pj

>
> There was recently an R bug report submitted that likely stemmed from
> the user having such a build of R and recently there was also an R
> help post that seemed to be related to this as well.  As more and more
> packages add GUI capability these incidents are bound to increase.
>
> The best situation would be to arrange that every distribution of R
> have tcltk capability but if this is difficult to arrange then I
> suggest there be a message at start up similar to the existing locale
> message letting the user know that the R distribution they are running
> lacks tcltk capability.  Preferably the message would be somewhat
> menacing so that users are encouraged to get a different distribution
> of R right from the start.
>
> This might reduce such problem reports somewhat.
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Changing arguments inside .Call. Wise to encourage "const" on all arguments?

2012-12-09 Thread Paul Johnson
I'm continuing my work on finding speedups in generalized inverse
calculations in some simulations.  It leads me back to .C and .Call,
and some questions I've never been able to answer for myself.  It may
be that I can push some calculations to LAPACK or C BLAS; that's why I
realized again that I don't understand the call-by-reference or
call-by-value semantics of .Call.

Why aren't users of .Call encouraged to "const" their arguments, and
why doesn't .Call do this for them (if we really believe in return by
value)?

R Gentleman's R Programming for Bioinformatics is the most
understandable treatment I've found on .Call. It appears to me .Call
leaves "wiggle room" where there should be none.  Here's Gentleman on
p. 185. "For .Call and .External, the return value is an R object (the
C functions must return a SEXP), and for these functions the values
that were passed are typically not modified.  If they must be
modified, then making a copy in R, prior to invoking the C code, is
necessary."

I *think* that means:

.Call allows return by reference, BUT we really wish users would not
use it. Users can damage R data structures that are pointed to unless
they really truly know what they are doing on the C side. ??

This seems dangerous. Why allow return  by reference at all?

On p. 197, there's a similar comment  "Any function that has been
invoked by either .External or .Call will have all of its arguments
protected already. You do not need to protect them.  [T]hey were
not duplicated and should be treated as read-only values."

"should be ... read-only" concerns me. They are "protected" in the
garbage collector sense, but they are not protected from "return by
reference" damage. Right?

Why doesn't the documentation recommend function writers to mark
arguments to C functions as const?  Isn't that what the return by
value policy would suggest?

Here's a troublesome example in  R src/main/array.c:

/* DropDims strips away redundant dimensioning information. */
/* If there is an appropriate dimnames attribute the correct */
/* element is extracted and attached to the vector as a names */
/* attribute.  Note that this function mutates x. */
/* Duplication should occur before this is called. */

SEXP DropDims(SEXP x)
{
SEXP dims, dimnames, newnames = R_NilValue;
int i, n, ndims;

   PROTECT(x);
   dims = getAttrib(x, R_DimSymbol);
[... SNIP ]
setAttrib(x, R_DimNamesSymbol, R_NilValue);
setAttrib(x, R_DimSymbol, R_NilValue);
setAttrib(x, R_NamesSymbol, newnames);
[... SNIP ]

return x;
}

Well, at least there's a warning comment with that one.  But it does
show .Call allows call by reference.

Why allow it, though? DropDims could copy x, modify the copy, and return it.

I wondered why DropDims bothers to return x at all. We seem to be
using modify and return by reference there.

I also wondered why x is PROTECTED, actually. It's an argument; wasn't
it automatically protected? Is it no longer protected after the
function starts modifying it?

Here's an  example with similar usage in Writing R Extensions, section
5.10.1 "Calling .Call".  It protects the arguments a and b (needed
??), then changes them.

#include <R.h>
#include <Rdefines.h>

 SEXP convolve2(SEXP a, SEXP b)
 {
 R_len_t i, j, na, nb, nab;
 double *xa, *xb, *xab;
 SEXP ab;

 PROTECT(a = AS_NUMERIC(a)); /* PJ wonders, doesn't this alter
"a"  in calling code*/
 PROTECT(b = AS_NUMERIC(b));
 na = LENGTH(a); nb = LENGTH(b); nab = na + nb - 1;
 PROTECT(ab = NEW_NUMERIC(nab));
 xa = NUMERIC_POINTER(a); xb = NUMERIC_POINTER(b);
 xab = NUMERIC_POINTER(ab);
 for(i = 0; i < nab; i++) xab[i] = 0.0;
 for(i = 0; i < na; i++)
  for(j = 0; j < nb; j++) xab[i + j] += xa[i] * xb[j];
 UNPROTECT(3);
 return(ab);
 }


Doesn't

PROTECT(a = AS_NUMERIC(a));

alter the data structure "a" both inside the C function and
in the calling R code as well? And, if a was PROTECTED automatically,
could we do without that PROTECT()?
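
For contrast, here is a minimal sketch (mine, not from R's sources or the
manual) of the copy-then-modify style I thought was the rule; if the
comments are wrong, that is exactly the confusion I'm asking about.

#include <R.h>
#include <Rinternals.h>

/* Scale a numeric vector by a factor, without touching the caller's copy. */
SEXP scaleCopy(SEXP x, SEXP factor)
{
    R_len_t i, n;
    double f = asReal(factor);
    SEXP out;

    PROTECT(x = coerceVector(x, REALSXP)); /* rebinds only my local pointer;
                                              the caller's binding is untouched */
    PROTECT(out = duplicate(x));           /* fresh object to write into */
    n = length(out);
    for (i = 0; i < n; i++)
        REAL(out)[i] = f * REAL(out)[i];
    UNPROTECT(2);
    return out;                            /* return by value, never by reference */
}

Called via .Call("scaleCopy", x, 2), that cannot damage the caller's x,
whatever type x arrived as.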

pj

-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] R-2.15.2 changes in computation speed. Numerical precision?

2012-12-12 Thread Paul Johnson
Speaking of optimization and speeding up R calculations...

I mentioned last week I want to speed up calculation of generalized
inverses. On Debian Wheezy with R-2.15.2, I see a huge speedup using a
souped up generalized inverse algorithm published by

V. N. Katsikis, D. Pappas, Fast computing of the Moore-Penrose inverse
matrix, Electronic Journal of Linear Algebra,
17(2008), 637-650.

I was so delighted to see the computation time drop on my Debian
system that I boasted to the WIndows users and gave them a test case.
They answered back "there's no benefits, plus Windows is faster than
Linux".

That sent me off on a bit of a goose chase, but I think I'm beginning
to understand the situation.  I believe R-2.15.2 introduced a tighter
requirement for precision, thus triggering longer-lasting calculations
in many example scripts. Better algorithms can avoid some of that
slowdown, as you see in this test case.

Here is the test code you can run to see:

http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.R

It downloads a data file from that same directory and then runs some
multiple imputations with the Amelia package.

Here's the output from my computer

http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.Rout

That includes the profile of the calculations that depend on the
ordinary generalized inverse algorithm based on svd and the new one.

See? The KP algorithm is faster.  And just as accurate as
Amelia:::mpinv or MASS::ginv (for details on that, please review my
notes in http://pj.freefaculty.org/scraps/profile/qrginv.R).

So I asked WIndows users for more detailed feedback, including
sessionInfo(), and I noticed that my proposed algorithm is not faster
on Windows--WITH OLD R!

Here's the script output with R-2.15.0, shows no speedup from the
KPginv algorithm

http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-Windows.Rout

On the same machine, I updated to R-2.15.2, and we see the same
speedup from the KPginv algorithm

http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-CRMDA02-WinR2.15.2.Rout

After that, I realized it is an R version change, not an OS
difference, I was a bit relieved.

What causes the difference in this case?  In the Amelia code, they try
to avoid doing the generalized inverse by using the ordinary solve(),
and if that fails, then they do the generalized inverse. In R 2.15.0,
the near singularity of the matrix is ignored, but not in R 2.15.2.
The ordinary solve is failing almost all the time, thus triggering the
use of the svd based generalized inverse.  Which is slower.
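
In stripped-down form, the pattern at issue looks like this (my own
sketch, not the Amelia source):

safeInv <- function(X, tol = sqrt(.Machine$double.eps)) {
    inv <- try(solve(X), silent = TRUE)          ## fast path
    if (inherits(inv, "try-error")) {            ## solve() declared it singular
        s <- svd(X)                              ## slow generalized-inverse path
        keep <- s$d > tol * s$d[1]
        inv <- s$v[, keep, drop = FALSE] %*%
            ((1 / s$d[keep]) * t(s$u[, keep, drop = FALSE]))
    }
    inv
}

If the effective singularity test tightens, the slow branch starts
running on nearly every call, and the whole imputation loop pays for it.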

The Katsikis and Pappas 2008 algorithm is the fastest one I've found
after translating from Matlab to R.  It is not so universally
applicable as svd-based methods; it will fail if there are linearly
dependent columns. However, it does tolerate columns of all zeros,
which seems to be the problem case in the particular application I am
testing.

I tried very hard to get the newer algorithm described here to go as
fast, but it is way way slower, at least in the implementations I
tried:
##  KPP
## Vasilios N. Katsikis, Dimitrios Pappas, Athanassios Petralias.  "An
## improved method for the computation of the Moore Penrose inverse
## matrix," Applied Mathematics and Computation, 2011

The notes on that are in the qrginv.R file linked above.

The fact that I can't make that newer KPP algorithm go faster,
although the authors show it can go faster in Matlab, leads me to a
bunch of other questions and possibly the need to implement all of
this in C with LAPACK or EIGEN or something like that, but at this
point, I've got to return to my normal job.  If somebody is good at
R's .Call interface and can make a pure C implementation of KPP, I'd be
glad to hear about it.

I think the key thing is that with R-2.15.2, there is an svd-related
bottleneck in the multiple imputation algorithms in Amelia. The
replacement version of the function Amelia:::mpinv does reclaim a 30%
time saving, while generating imputations that are identical, so far
as I can tell.

pj



-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R-2.15.2 changes in computation speed. Numerical precision?

2012-12-13 Thread Paul Johnson
On Thu, Dec 13, 2012 at 3:33 AM, Uwe Ligges
 wrote:
> Long message, but as far as I can see, this is not about base R but the
> contributed package Amelia: Please discuss possible improvements with its
> maintainer.
>
Thanks for answering, but I'm really surprised by your answer. The
package is a constant, but R got much slower  between R-2.15.1 and
R-2.15.2. The profile of functions called radically changed, svd gets
called much more because solve() fails much more often.

No change in R could account for it?  I'm not saying R is wrong, it
may be more accurate and better. After chasing the slowdown for a
week, I need some comfort. Does the LINPACK -> LAPACK change play a
role. The change I'm looking for is something that would substantially
tune up mathematical precision so that matrices that did not seem
singular before are now, in the eyes of functions like chol() and
solve().  Whereas in R-2.15.1, a matrix can be inverted by solve(),
for example, now R-2.15.2 rejects the matrix as singular.

I will take the problem up with applications, of course. But you see
how package writers might think it's ridiculous.  They argue, "I had a
perfectly fine calculation against R-2.15.0 and R-2.15.1, and now with
R-2.15.2 it takes three times as long?  And you want me to revise my
package?"

Would you be persuaded there's an R base question if I showed you a
particular matrix that can be decomposed or solved in R-2.15.1 but
cannot be in R-2.15.2? I should have thought of that before, I suppose
:)

pj
> Best,
> Uwe Ligges
>
>
>
> On 12.12.2012 19:14, Paul Johnson wrote:
>>
>> Speaking of optimization and speeding up R calculations...
>>
>> I mentioned last week I want to speed up calculation of generalized
>> inverses. On Debian Wheezy with R-2.15.2, I see a huge speedup using a
>> souped up generalized inverse algorithm published by
>>
>> V. N. Katsikis, D. Pappas, Fast computing of theMoore-Penrose inverse
>> matrix, Electronic Journal of Linear Algebra,
>> 17(2008), 637-650.
>>
>> I was so delighted to see the computation time drop on my Debian
>> system that I boasted to the WIndows users and gave them a test case.
>> They answered back "there's no benefits, plus Windows is faster than
>> Linux".
>>
>> That sent me off on a bit of a goose chase, but I think I'm beginning
>> to understand the situation.  I believe R-2.15.2 introduced a tighter
>> requirement for precision, thus triggering longer-lasting calculations
>> in many example scripts. Better algorithms can avoid some of that
>> slowdown, as you see in this test case.
>>
>> Here is the test code you can run to see:
>>
>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.R
>>
>> It downloads a data file from that same directory and then runs some
>> multiple imputations with the Amelia package.
>>
>> Here's the output from my computer
>>
>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.Rout
>>
>> That includes the profile of the calculations that depend on the
>> ordinary generalized inverse algorithm based on svd and the new one.
>>
>> See? The KP algorithm is faster.  And just as accurate as
>> Amelia:::mpinv or MASS::ginv (for details on that, please review my
>> notes in http://pj.freefaculty.org/scraps/profile/qrginv.R).
>>
>> So I asked WIndows users for more detailed feedback, including
>> sessionInfo(), and I noticed that my proposed algorithm is not faster
>> on Windows--WITH OLD R!
>>
>> Here's the script output with R-2.15.0, shows no speedup from the
>> KPginv algorithm
>>
>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-Windows.Rout
>>
>> On the same machine, I updated to R-2.15.2, and we see the same
>> speedup from the KPginv algorithm
>>
>>
>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-CRMDA02-WinR2.15.2.Rout
>>
>> After that, I realized it is an R version change, not an OS
>> difference, I was a bit relieved.
>>
>> What causes the difference in this case?  In the Amelia code, they try
>> to avoid doing the generalized inverse by using the ordinary solve(),
>> and if that fails, then they do the generalized inverse. In R 2.15.0,
>> the near singularity of the matrix is ignored, but not in R 2.15.2.
>> The ordinary solve is failing almost all the time, thus triggering the
>> use of the svd based generalized inverse.  Which is slower.
>>
>> The Katsikis and Pappas 2008 algorithm is the fastest one I've found
>> after translating from Matlab to R.  It is not so universally
>> applicable as svd based methods, it 

Re: [Rd] R-2.15.2 changes in computation speed. Numerical precision?

2012-12-13 Thread Paul Johnson
On Thu, Dec 13, 2012 at 9:01 PM, Yi (Alice) Wang  wrote:
> I have also encountered a similar problem. My mvabund package runs much
> faster on linux/OSX than on windows with both R/2.15.1 and R/2.15.2. For
> example, with mvabund_3.6.3 and R/2.15.2,
> system.time(example(anova.manyglm))
>

Hi, Alice

You have a different problem than I do.

The change from R-2.15.1 to R-2.15.2 makes the program slower on all
platforms.  The slowdown that emerges in R-2.15.2 on all types of
hardware concerns me.

It only seemed like a "Windows is better" issue when all the Windows
users who tested my program were using R-2.15.0 or R-2.15.1. As soon
as they update R, then they have the slowdown as well.

pj


> on OSX returns
>
>user  system elapsed
>   3.351   0.006   3.381
>
> but on windows 7 it returns
>
>user  system elapsed
>   13.13   0.00   13.14
>
> I also used svd frequently in my c code though by calling the gsl functions
> only. In my memory, I think the comp time difference is not that significant
> with earlier R versions. So maybe it is worth an investigation?
>
> Many thanks,
> Yi Wang
>
>
> On Thu, Dec 13, 2012 at 5:33 PM, Uwe Ligges
>  wrote:
>>
>> Long message, but as far as I can see, this is not about base R but the
>> contributed package Amelia: Please discuss possible improvements with its
>> maintainer.
>>
>> Best,
>> Uwe Ligges
>>
>>
>> On 12.12.2012 19:14, Paul Johnson wrote:
>>>
>>> Speaking of optimization and speeding up R calculations...
>>>
>>> I mentioned last week I want to speed up calculation of generalized
>>> inverses. On Debian Wheezy with R-2.15.2, I see a huge speedup using a
>>> souped up generalized inverse algorithm published by
>>>
>>> V. N. Katsikis, D. Pappas, Fast computing of theMoore-Penrose inverse
>>> matrix, Electronic Journal of Linear Algebra,
>>> 17(2008), 637-650.
>>>
>>> I was so delighted to see the computation time drop on my Debian
>>> system that I boasted to the WIndows users and gave them a test case.
>>> They answered back "there's no benefits, plus Windows is faster than
>>> Linux".
>>>
>>> That sent me off on a bit of a goose chase, but I think I'm beginning
>>> to understand the situation.  I believe R-2.15.2 introduced a tighter
>>> requirement for precision, thus triggering longer-lasting calculations
>>> in many example scripts. Better algorithms can avoid some of that
>>> slowdown, as you see in this test case.
>>>
>>> Here is the test code you can run to see:
>>>
>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.R
>>>
>>> It downloads a data file from that same directory and then runs some
>>> multiple imputations with the Amelia package.
>>>
>>> Here's the output from my computer
>>>
>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.Rout
>>>
>>> That includes the profile of the calculations that depend on the
>>> ordinary generalized inverse algorithm based on svd and the new one.
>>>
>>> See? The KP algorithm is faster.  And just as accurate as
>>> Amelia:::mpinv or MASS::ginv (for details on that, please review my
>>> notes in http://pj.freefaculty.org/scraps/profile/qrginv.R).
>>>
>>> So I asked WIndows users for more detailed feedback, including
>>> sessionInfo(), and I noticed that my proposed algorithm is not faster
>>> on Windows--WITH OLD R!
>>>
>>> Here's the script output with R-2.15.0, shows no speedup from the
>>> KPginv algorithm
>>>
>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-Windows.Rout
>>>
>>> On the same machine, I updated to R-2.15.2, and we see the same
>>> speedup from the KPginv algorithm
>>>
>>>
>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-CRMDA02-WinR2.15.2.Rout
>>>
>>> After that, I realized it is an R version change, not an OS
>>> difference, I was a bit relieved.
>>>
>>> What causes the difference in this case?  In the Amelia code, they try
>>> to avoid doing the generalized inverse by using the ordinary solve(),
>>> and if that fails, then they do the generalized inverse. In R 2.15.0,
>>> the near singularity of the matrix is ignored, but not in R 2.15.2.
>>> The ordinary solve is failing almost all the time, thus triggering the
>>> use of the svd based generalized inverse.  Which is slower.
>>>
>>> The Katsiki

[Rd] Found explanation for R-2.15.2 slowdown in one case; caution for any users of La_chol

2012-12-14 Thread Paul Johnson
2 days ago, I posted my long message about the observed slowdown in a
package between R-2.15.0 and R-2.15.2.

Uwe Ligges urged me to make a self-contained R example. That was the
encouragement I needed. I tracked the problem down to a failing use of
a LAPACK routine.

R's LAPACK C interface changed one variable in one function. But it
turned out to be an important change.  In case others have code that
is behaving in unexpected ways, I'd urge package writers to
double-check their usage of the Cholesky inverse. Here are details:

In R 2.15.0, src/main/lapack.c, we have the prototype:

SEXP La_chol (SEXP A)

BUT in R 2.15.2, the prototype changed:

SEXP La_chol (SEXP A, SEXP pivot)

In the problem case I was studying, the effort to use La_chol was
wrapped in a "try" catch framework, and when Cholesky failed, it fell
back to a singular value decomposition. That's much slower, of course.

Hence the program seemed slower under R-2.15.2, but it was really
failing in a way that I had not noticed.
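
A stripped-down sketch (mine, not the package's code) of how the failure
finally showed itself: look at what the try() has been swallowing instead
of letting the slow branch run silently.

X <- crossprod(matrix(rnorm(100), 10, 10))       ## a well-behaved p.d. matrix
res <- try(.Call("La_chol", X, PACKAGE = "base"), silent = TRUE)  ## old one-argument call
if (inherits(res, "try-error"))
    message("Cholesky path failed: ", conditionMessage(attr(res, "condition")))

With the extra pivot argument now required, that call fails on every
matrix, good or bad. Calling the documented wrapper chol() instead of the
internal entry point avoids depending on an interface that can change
without notice.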

pj

-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Found explanation for R-2.15.2 slowdown in one case; caution for any users of La_chol

2012-12-15 Thread Paul Johnson
On Sat, Dec 15, 2012 at 11:44 AM, Uwe Ligges
 wrote:
>
>
> On 14.12.2012 23:31, Paul Johnson wrote:
>>
> That is the reason why R CMD check gives a WARNING in the checks of your
> package for quite some time now:
>
>
> checking foreign function calls ... WARNING
>
> Foreign function calls with 'PACKAGE' argument in a base package:
> .Call("La_chol", ..., PACKAGE = "base")
> .Call("La_chol2inv", ..., PACKAGE = "base")
> Packages should not make .C/.Call/.Fortran calls to base packages. They
> are not part of the API, for use only by R itself and subject to change
> without notice.
>

Just to be clear: it's not *my* package. It is a package we use A LOT, so
I had to get to the bottom of the problem. I'm *very glad* to
understand it.

From a user perspective, I wonder if CRAN could warn me somehow?
Would it be a reasonable wishlist request to have the build process
take a snapshot of warnings and then have the installer play them back
to the user when the package is installed?  That would have saved me
some time this week.

Now, when I install that, I don't see a warning. (see below).

> install.packages("Amelia")
Installing package(s) into ‘/home/pauljohn/R/x86_64-pc-linux-gnu-library/2.15’
(as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session ---
Loading Tcl/Tk interface ... done
trying URL 'http://rweb.quant.ku.edu/cran/src/contrib/Amelia_1.6.3.tar.gz'
Content type 'application/x-gzip' length 1060878 bytes (1.0 Mb)
opened URL
==
downloaded 1.0 Mb

* installing *source* package ‘Amelia’ ...
** package ‘Amelia’ successfully unpacked and MD5 sums checked
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
   ‘amelia.Rnw’
** testing if installed package can be loaded

* DONE (Amelia)

The downloaded source packages are in
‘/tmp/Rtmpi0CLZe/downloaded_packages’
>
> So time to fix that. See also:
> http://cran.r-project.org/web/checks/check_results_Amelia.html
>
>
> Best,
> Uwe Ligges
>



-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] weird bug with parallel, RSQlite and tcltk

2013-01-03 Thread Paul Johnson
Happy new year.

On Mon, Dec 31, 2012 at 12:08 PM, Karl Forner  wrote:
> Hello,
>
> I spent a lot of a time on a weird bug, and I just managed to narrow it down.
>
> In parallel code (here with parallel::mclappy, but I got it
> doMC/multicore too), if the library(tcltk) is loaded, R hangs when
> trying to open a DB connection.
> I got the same behaviour on two different computers, one dual-core,
> and one 2 xeon quad-core.
>

I believe this is the same problem discussed here:

https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15040

I am interested to know the long term implications of the discussion
going on here, but can I interject to ask the short term question:

With R-2.15.2, is there something I can do to defend myself before
calling mclapply? Suppose I don't intentionally use tcltk, but some
package I use has loaded it. Or somebody ran install.packages(). What
to do?

I thought this would work:

detach(package:tcltk, unload=TRUE)

I ran one test case where it seemed to work, but then I started
several new sessions and got big crashes.

Example 1. detach(package:tcltk) without unload=TRUE hangs

> library(parallel)
> library(tcltk)
Loading Tcl/Tk interface ... done
> detach(package:tcltk)
> example(mclapply)

mclppl> ## No test:
mclppl> simplify2array(mclapply(rep(4, 5), rnorm))
   [,1]   [,2]   [,3]   [,4][,5]
[1,] -0.1499368 -0.7511576  0.7675234 -0.1324973 -0.01118655
[2,] -0.5989764  0.6605780 -0.9417879  0.2295157 -0.16750436
[3,]  1.4585402  0.2513995  1.6857897 -0.1200986  1.19094290
[4,] -0.1459071 -1.7570450  0.2968794  1.1964827  1.87066283

mclppl> # use the same random numbers for all values
mclppl> set.seed(1)

mclppl> simplify2array(mclapply(rep(4, 5), rnorm, mc.preschedule = FALSE,
mclppl+ mc.set.seed = FALSE))

Example 2. detach with unload=TRUE dies

> library(tcltk)
Loading Tcl/Tk interface ... done
> library(parallel)
> detach(package:tcltk, unload=TRUE)
> example(mclapply)
called Tcl_FindHashEntry on deleted table
Aborted

Ouch. Back to the shell, completely unceremonious. Does it matter in
which order they are loaded?

> library(parallel)
> library(tcltk)
Loading Tcl/Tk interface ... done
> detach(package:tcltk, unload=TRUE)
> example(mclapply)
called Tcl_FindHashEntry on deleted table
Aborted

In the short term, it would really help if we just had a foolproof way
to close, kill, remove, delete, detach and otherwise block tcltk, its
functions, its anything!

pj

> Here's the code:
>
> library(parallel)
> library(RSQLite)
> library(tcltk)
> #unloadNamespace("tcltk")
>
> res <- mclapply(1:2, function(x) {
> db <- DBI::dbConnect("SQLite", ":memory:")
> }, mc.cores=2)
> print("Done")
>
> When I execute it (R --vanilla  < test_parallel_db.R), it hangs
> forever, and I have to type several times CTRL+C to interrupt it. I
> then get this message:
>
> Warning messages:
> 1: In selectChildren(ac, 1) : error 'Interrupted system call' in select
> 2: In selectChildren(ac, 1) : error 'Interrupted system call' in select
>
> Then, just remove library(tcltk), or uncomment
> unloadNamespace("tcltk"), and it works fine again.
>
> I guess there's a bug somewhere, but where exactly ?
>
> Best,
>
> Karl Forner
>
> Further info:
>
>
> R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
> Copyright (C) 2012 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> ubuntu 12.04 and 12.10
>
> ubuntu package tk8.5
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] best practice for packages using mclapply to avoid tcltk

2013-02-05 Thread Paul Johnson
On Sun, Feb 3, 2013 at 1:34 PM, Simon Urbanek
 wrote:
> As Peter pointed out earlier, this is better addressed by disabling the 
> Tcl/Tk event loop in forked processes.
>
Dear Simon:

I don't understand.  Can you please try to say it again?

I find Peter's comment (on Jan 3, 2013, thread title: weird bug with
parallel, RSQlite and tcltk):

"More likely, the wisdom of calling R_ProcessEvents and R_PolledEvents
in parallel processes should be questioned. I tend to think that they
should both be disabled completely conditionally on R_isForkedChild.
At least in the Tk loop, some of the events are generated as responses
to specific queries, and having one process ask for something and
another one handling the reply, leaving the first one waiting
indefinitely, is just Not Right."

That suggested to me that the problem is in R itself, or in the tcltk package.

If package writers don't follow my suggestion, what do you think they
should do instead?

pj
> Cheers,
> Simon
>
> On Feb 2, 2013, at 5:02 PM, Paul Johnson wrote:
>
>> Dear R-devel friends:
>>
>> I'm back to bother you again about the conflict between mclapply and
>> tcltk. I've been
>> monitoring several packages that want to use mclapply to parallelize
>> computations and
>> need to figure out what should be done.
>>
>> It appears tcltk cannot be safely unloaded, so the best we can do is
>> check for the presence of tcltk and stop if it is found before
>> mclapply() is used.
>>
>> I wish you would please review my suggestion below.  Maybe checkForTcltk()
>> could be used in the parallel package. Otherwise, we are letting
>> people run with scissors.
>>
>> There's a warning about this in ?mclapply
>>
>> It is _strongly discouraged_ to use these functions in GUI or
>>embedded environments, because it leads to several processes
>>sharing the same GUI which will likely cause chaos (and possibly
>>crashes).  Child processes should never use on-screen graphics
>>devices.(Warning in ?mclapply)
>>
>> Bug report: (https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15040 )
>>
>> By the way, what is "embedded environments" in ?mclapply
>>
>> ## I don't want to crash your system, but if you want to see a freeze-up:
>> ## change 0 to 1 and run this:
>> if (0){
>> library(parallel)
>> library(tcltk)
>> example(mclapply)
>> }
>>
>> ## What are packagers supposed to do if they want to call mclapply?
>> ## It appears to me the best a package can do is scan for tcltk and
>> ## stop. Here's a function that does so.
>>
>> checkForTcltk <- function(){
>>if ("tcltk" %in% loadedNamespaces()){
>>stop("This function cannot be used because the R tcltk
>>package is loaded. It is necessary to Close R, and re-run
>>the program making sure that tcltk is never loaded before
>>this function is called.")
>>}
>> }
>>
>> ## put that to use.
>> MCLApply <- function(){
>>if (!require(parallel)) stop ("parallel wouldn't load")
>>checkForTcltk()
>>example(mclapply)
>> }
>>
>> ## test that:
>>
>> checkForTcltk()
>> MCLApply()
>>
>> library(tcltk)
>> checkForTcltk()
>>
>>
>> ## Why can't tcltk just be unloaded? I don't understand, but it is a deep
>> ## problem.
>>
>> ## Consider the ominous warnings in R detach's help:
>> ##
>> ## "Unloading some namespaces has undesirable side effects: e.g.
>> ##  unloading ‘grid’ closes all graphics devices, and on most systems
>> ##  ‘tcltk’ cannot be reloaded once it has been unloaded and may crash
>> ##  R if this is attempted." (Note: section of ?detach).
>> ##
>> ## To be fair, that does not say unloading tcltk is fatal, but
>> ## reloading it is fatal. And I've seen plenty of times when
>> ## unloading it is fatal.
>>
>> ## Example 1. Crash on library.dynam.unload()
>> detach("package:tcltk", unload = TRUE)
>> library.dynam.unload("tcltk", system.file(package = "tcltk"))
>>
>> ## Output
>> ## > library.dynam.unload("tcltk", system.file(package = "tcltk"))
>> ## >
>> ##  *** caught segfault ***
>> ## address 0x7f69c9d99580, cause 'memory not mapped'
>>
>> ## Possible actions:
>> ## 1: abort (with core dump, if enabled)
>> ## 2: normal R exit
>> ## 3: exit R without saving workspace
>> ## 4: exit R saving workspace
>>

[Rd] ifelse can't return a list? Please explain (R-2.15.3)

2013-03-24 Thread Paul Johnson
I hope you are doing well.

For me, this was an unexpected problem. I've hoped for quite a few
wrong things today, but I'm only asking you about this one. Why does

ifelse(1, list(a, b, c), list(x, y, z))

return a list with only a, not list(a, b, c) as I hoped.  I wish it
would either
cause an error or return the whole list, not just the first thing.

Working example:

> x <- 1
> y <- 2
> z <- 3
> a <- 4
> b <- 5
> c <- 6
> list(x,y,z)
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

> list(a,b,c)
[[1]]
[1] 4

[[2]]
[1] 5

[[3]]
[1] 6

> ifelse(1, list(a,b,c), list(x,y,z))
[[1]]
[1] 4

> ifelse(0, list(a,b,c), list(x,y,z))
[[1]]
[1] 1


> sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] rockchalk_1.5.5.10 car_2.0-16 nnet_7.3-5 MASS_7.3-23

loaded via a namespace (and not attached):
[1] compiler_2.15.3 tools_2.15.3


I realize I can code around this, but I'm just curious about why
ifelse hates me so much :(

> if (1) myvar <- list(a, b, c) else myvar <- list(x, y, z)
> myvar
[[1]]
[1] 4

[[2]]
[1] 5

[[3]]
[1] 6

> myvar <- if (1) list(a, b, c) else list(x, y, z)
> myvar
[[1]]
[1] 4

[[2]]
[1] 5

[[3]]
[1] 6
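
One more data point: with a length-3 test I get one element per test
entry, drawn from both lists, which makes me suspect the result is being
shaped to the length of the test:

> ifelse(c(TRUE, FALSE, TRUE), list(a, b, c), list(x, y, z))
[[1]]
[1] 4

[[2]]
[1] 2

[[3]]
[1] 6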


-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Patch proposal for R style consistency (concerning deparse.c)

2013-04-17 Thread Paul Johnson
Hello, everybody.

I recognize I'm asking you to deal with a not-very-important problem.
But it's important to me :)

I've noticed a little inconsistency in the print.function() output. I
traced the matter to deparse.c, for which I attach a patch that
addresses 3 separate things. There's one important thing that I'm
willing to stand behind and two little things that I think would be
good, but won't argue too much about. I ask if this is a legitimate
wishlist item in your bug tracker.

Here are the changes I'm trying to achieve.

1. I want this:

} else {

and not the current
   }
  else {

2 & 3. I want to omit space after if and for.   Since if and for are
functions in R, not keywords, I suggest that there should not be a
space before the opening parentheses.

Item 1 brings the output into line with the advice of the help page
for "if".  The output looks more like the a randomly selected R file
from the R source or the recommended packages. I'm teaching new users
that they should study the R source code itself as a guide for
programming (http://pj.freefaculty.org/R/Rstyle.pdf).  My argument
there would go more smoothly if print.function obeyed the "} else {"
policy that most of your source code and packages adhere to. We do see
r-help questions about the problem that user code can't be run
line-by-line, even when it is styled like the print.function() output.
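
The problem in miniature, pasting at the prompt one line at a time (the
failing lines are commented out here):

x <- 1
if (x > 0) {
    "positive"
}                      # the statement is complete at this brace ...
## else {              # ... so a bare 'else' next is "Error: unexpected 'else'"
##     "not positive"
## }

if (x > 0) {
    "positive"
} else {               # '} else {' keeps the statement open: no problem
    "not positive"
}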

As you know, this is discussed in R Extensions Section 3.1 on tidying
R code.  In my opinion, the current approach makes code less tidy,
rather than more.

Consider an excerpt from the top of predict.glm.R.  This R code is
correctly formatted, it is consistent with most of the rest of R:

  if (!se.fit) {
## No standard errors
if(missing(newdata)) {
pred <- switch(type,
   link = object$linear.predictors,
   response = object$fitted.values,
   terms = predict.lm(object,  se.fit=se.fit,
   scale = 1, type="terms", terms=terms)
   )
if(!is.null(na.act)) pred <- napredict(na.act, pred)
} else {

But the output from '> predict.glm' is not as good:

   if (!se.fit) {
if (missing(newdata)) {
pred <- switch(type, link = object$linear.predictors,
response = object$fitted.values, terms = predict.lm(object,
  se.fit = se.fit, scale = 1, type = "terms",
  terms = terms))
if (!is.null(na.act))
pred <- napredict(na.act, pred)
}
else {


I have applied the attached patch with R 3.0.0. I don't see any damage
from it, just tidier output:

> predict.glm
function (object, newdata = NULL, type = c("link", "response",
"terms"), se.fit = FALSE, dispersion = NULL, terms = NULL,
na.action = na.pass, ...)
{
type <- match.arg(type)
na.act <- object$na.action
object$na.action <- NULL
if(!se.fit) {
if(missing(newdata)) {
pred <- switch(type, link = object$linear.predictors,
response = object$fitted.values, terms = predict.lm(object,
  se.fit = se.fit, scale = 1, type = "terms",
  terms = terms))
if(!is.null(na.act))
pred <- napredict(na.act, pred)
} else {
pred <- predict.lm(object, newdata, se.fit, scale = 1,
type = ifelse(type == "link", "response", type),
terms = terms, na.action = na.action)
switch(type, response = {
pred <- family(object)$linkinv(pred)
}, link = , terms = )
}
} else {

Of course, you know much better than I do what else might break as a
result of this change.

In case the email list processor scrubs the patch, please try here:
http://pj.freefaculty.org/scraps/deparse-pj-20130417.patch

pj

--
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Patch proposal for R style consistency (concerning deparse.c)

2013-04-18 Thread Paul Johnson
OK, I concede that.

Now, how about "} else {"?

I will provide a patch that does only that change.

On Thu, Apr 18, 2013 at 3:05 AM, peter dalgaard  wrote:
>
> On Apr 18, 2013, at 05:39 , Paul Johnson wrote:
>
>> 2 & 3. I want to omit space after if and for.   Since if and for are
>> functions in R, not keywords, I suggest that there should not be a
>> space before the opening parentheses.
>
> Wrong. They are part of language constructs (and they _are_ keywords, not 
> names, that's why ?for won't work). The function calls are `if`(fee, {foo}, 
> {fie}) and something rebarbative for `for`().
>
> Besides, both constructs are harder to read without the spaces.
>
> --
> Peter Dalgaard, Professor
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd@cbs.dk  Priv: pda...@gmail.com
>



-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

