Re: [Rd] optim(…, method=‘L-BFGS-B’) stops with an error message while violating the lower bound

2016-10-10 Thread Martin Maechler
> Spencer Graves 
> on Sat, 8 Oct 2016 18:03:43 -0500 writes:

[.]

>  2.  It would be interesting to know if the
> current algorithm behind optim and optimx with
> method='L-BFGS-B' incorporates Morales and Nocedal (2011)
> 'Remark on “Algorithm 778: L-BFGS-B: Fortran Subroutines
> for Large-Scale Bound Constrained Optimization”'.  I
> created this vignette and started this threat hoping that
> someone on the R Core team might decide it's worth
> checking things like that.

well I hope you mean "thread" rather than "threat"  ;-)

I've now looked at the reference above, which is indeed quite
interesting.
doi 10.1145/2049662.2049669
--> http://dl.acm.org/citation.cfm?doid=2049662.2049669
A "free" (pre-publication I assume) version of the manuscript is
  http://www.eecs.northwestern.edu/~morales/PSfiles/acm-remark.pdf

The authors, Morales and Nocedal, the 2nd one being one of the
authors of the original L-BFGS-B (1997) paper, make two remarks, the 2nd one
about the "machine epsilon" used, and I can assure you that R's
optim() version never suffered from that; we've always been
using a C translation of the Fortran code, and then used DBL_EPSILON.
R's (main) source file for that is in .../src/appl/lbfgsb.c, e.g., here
https://svn.r-project.org/R/trunk/src/appl/lbfgsb.c
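
For anyone following along at the R level, here is a minimal sketch, not
taken from lbfgsb.c. It assumes an IEEE 754 platform, where C's DBL_EPSILON
corresponds to .Machine$double.eps. The objective fn and its target values
are made up for illustration: the unconstrained minimum lies below the lower
bound, so the first parameter should end up on the bound.

```r
# DBL_EPSILON as seen from R (assumption: IEEE 754 doubles, so 2^-52)
eps <- .Machine$double.eps

# Hypothetical toy objective: unconstrained minimum at (-1, 2),
# which lies outside the box [0, 5] x [0, 5]
fn <- function(p) sum((p - c(-1, 2))^2)

res <- optim(par = c(1, 1), fn = fn, method = "L-BFGS-B",
             lower = c(0, 0), upper = c(5, 5))

res$par          # first component should sit on the lower bound
res$convergence  # 0 indicates successful completion
res$message
```

Checking all(res$par >= lower) after a fit like this is a cheap way to notice
a bound violation of the kind this thread is about.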

OTOH, their remark 1 is very relevant and promising faster /
more reliable convergence. 
I'd be "happy" if optim() could gain a new option, say, "L-BFGS-B-2011"
which would incorporate what they call "modified L-BFGS-B".

However, I did not find published code to go together with their
remark.
Ideally, some of you interested in this would provide a patch
against the above  lbfgsb.c  file.

Martin Maechler,
ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Bug/Inconsistency in merge() with all.x when first nonmatching column in y is matrix

2016-10-10 Thread Russ Hamilton via R-devel
I've noticed inconsistent behavior with merge() when using all.x=TRUE.
After some digging I found the following test cases:
1) The snippet below doesn't work as expected, as the non-matching
columns of rows in a but not b take the value from the first matching
row instead of being NA:
--- Snip >>>
NUM<-25;
a <- data.frame(id=factor(letters[1:NUM]), qq=rep(NA, NUM), rr=rep(1.0,NUM))
b <- data.frame(id=c("e","a","f","y","x"))

b$mm <- as.vector(c(1,2,3.1,4.0,NA))%o%3.14
b$nn <- rep("from b", 5)

merge(a,b,by="id",all.x=TRUE)
<<< Snip ---
2) The modified snippet below works as expected:
--- Snip >>>
NUM<-25;
a <- data.frame(id=factor(letters[1:NUM]), qq=rep(NA, NUM), rr=rep(1.0,NUM))
b <- data.frame(id=c("e","a","f","y","x"))

b$nn <- rep("from b", 5)
b$mm <- as.vector(c(1,2,3.1,4.0,NA))%o%3.14

merge(a,b,by="id",all.x=TRUE)
<<< Snip ---

In src/library/base/R/merge.R:154, I see the following:
--- Snip >>>
for(i in seq_along(y)) {
    ## do it this way to invoke methods for e.g. factor
    if(is.matrix(y[[1]])) y[[1]][zap, ] <- NA
    else is.na(y[[i]]) <- zap
}
<<< Snip ---
Changing the '1's in the if statement to 'i's fixes this issue for me, i.e.:
--- Snip >>>
for(i in seq_along(y)) {
    ## do it this way to invoke methods for e.g. factor
    if(is.matrix(y[[i]])) y[[i]][zap, ] <- NA
    else is.na(y[[i]]) <- zap
}
<<< Snip ---
I'm actually not sure if the "if statement" is even needed (the "else"
case seems to handle matrices just fine).
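
A small check suggests the matrix branch may still matter once the matrix
has more than one column. The default `is.na<-` method does x[value] <- NA,
so on a matrix a vector of row numbers is treated as linear indices and only
reaches the first column. The test case above uses a one-column matrix, which
would explain why the "else" case looks fine there. The names m, by_rows and
by_isna below are made up for illustration; zap stands in for the row indices
that merge() blanks out.

```r
m   <- matrix(1:6, nrow = 3)   # 3 x 2 matrix
zap <- c(1L, 3L)               # rows that should become NA

# Matrix branch: whole rows are set to NA
by_rows <- m
by_rows[zap, ] <- NA

# "else" branch: default method does x[zap] <- NA (linear indexing),
# so only elements 1 and 3, i.e. column 1, are affected
by_isna <- m
is.na(by_isna) <- zap

by_rows
by_isna
```

If this reading is right, the "if" is needed for multi-column matrices, and
the fix of replacing the '1's with 'i's is the safer change.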

--Russ Hamilton



Re: [Rd] optim(…, method=‘L-BFGS-B’) stops with an error message while violating the lower bound

2016-10-10 Thread Avraham Adler
I believe the code can be found here:
http://users.iems.northwestern.edu/~nocedal/lbfgsb.html. Specifically,
lbfgsb.f in version 3.0 starts:

c This is a modified version of L-BFGS-B. Minor changes in the updated
c code appear preceded by a line comment as follows
c
c c-jlm-jn
c
c Major changes are described in the accompanying paper:
c
c Jorge Nocedal and Jose Luis Morales, Remark on "Algorithm 778:
c L-BFGS-B: Fortran Subroutines for Large-Scale Bound Constrained
c Optimization"  (2011). To appear in  ACM Transactions on
c Mathematical Software,
c
c The paper describes an improvement and a correction to Algorithm 778.
c It is shown that the performance of the algorithm can be improved
c significantly by making a relatively simple modication to the subspace
c minimization phase. The correction concerns an error caused by the use
c of routine dpmeps to estimate machine precision.


It is released under the New 3-clause BSD license, so porting it to C
for inclusion into R should be OK as long as the i's are dotted and
t's crossed.


Avi


[Rd] PKG_LIBS in make child processes

2016-10-10 Thread Ulrich Bodenhofer

[cross-posted from bioc-devel list]

Hi all,

I have a subtle question related to how R CMD SHLIB handles variables in 
make child processes. In more detail: I am the maintainer of the 'msa' 
package which has been in Bioconductor since April 2015. This package 
integrates three open-source libraries for multiple sequence alignment. 
This is organized in the following way: in src/, there are three 
sub-directories, one for each of the libraries (plus another one for a 
garbage collector library, but that is not relevant at this point). 
src/Makevars is made such that the libraries are compiled individually 
to static libraries in their respective sub-directory, then these static 
libraries are copied to src/, and finally the static libraries are 
integrated into msa.so. The Makevars file looks as follows:


PKG_LIBS=`${R_HOME}/bin${R_ARCH_BIN}/Rscript -e "if (Sys.info()['sysname'] == 'Darwin') cat('-Wl,-all_load ./libgc.a ./libClustalW.a ./libClustalOmega.a ./libMuscle.a') else cat('-Wl,--whole-archive ./libgc.a ./libClustalW.a ./libClustalOmega.a ./libMuscle.a  -Wl,--no-whole-archive')"`
PKG_CXXFLAGS=-I"./gc-7.2/include" -I"./Muscle/" -I"./ClustalW/src" -I"./ClustalOmega/src"

.PHONY: all mylibs

all: $(SHLIB)
$(SHLIB): mylibs

mylibs: build_gc build_muscle build_clustalw build_clustalomega

build_gc:
	make --file=msaMakefile --directory=gc-7.2
	@echo ""
	@echo "-- GC  -"
	@echo ""
	@echo "- Compilation finished -"
	@echo ""

build_muscle:
	make --file=msaMakefile --directory=Muscle
	@echo ""
	@echo " MUSCLE "
	@echo ""
	@echo "- Compilation finished -"
	@echo ""

build_clustalw:
	make --file=msaMakefile --directory=ClustalW
	@echo ""
	@echo "--- ClustalW ---"
	@echo ""
	@echo "- Compilation finished -"
	@echo ""

build_clustalomega:
	make --file=msaMakefile --directory=ClustalOmega
	@echo ""
	@echo "- ClustalOmega -"
	@echo ""
	@echo "- Compilation finished -"
	@echo ""

This has always worked on Linux and Mac OS so far. Now I have received 
an error report from a user who cannot install the package on a 64-bit 
openSUSE 13.1 system using R 3.3.1. It turned out that R CMD SHLIB as 
called in the make child processes (make target 'build_muscle' above) 
uses the value of PKG_LIBS defined in the first line of the top-level 
Makevars file shown above (which of course does not work and makes no 
sense), while this does not happen on any other Unix-like system I have 
tried so far (Ubuntu, CentOS, Mac OS). Maybe somebody can shed some 
light on how variables defined inside the Makevars file propagate to 
child processes. Thanks so much in advance!


Best regards,
Ulrich
