Re: [Rd] list_files() memory corruption?

2010-03-17 Thread Alistair Gee
On Mon, Mar 15, 2010 at 9:13 PM, Seth Falcon  wrote:

> I'm curious to know if this is a problem you have encountered while using R.
>
> My initial thought is that there isn't much benefit of making this part
> of the code smarter.  If your patch simplifies things, I'd be more
> interested.
>
> + seth
>

Yes. I had noticed that R occasionally segfaults (especially when I
run many concurrent R processes), so I used valgrind to log every use
of R. In the valgrind logs, I tracked the problem to list_files().

I attached a patch to platform.c (for trunk). Unfortunately, I am
having trouble building R from the subversion trunk--it is taking a
very long time decompressing/installing the recommended packages--so I
haven't been able to verify the fix yet. But my version of platform.c
does compile, and it does simplify the code b/c count_files() is no
longer needed.
--- platform.c~	2010-03-17 09:57:26.268110062 -0400
+++ platform.c	2010-03-17 09:56:09.346917055 -0400
@@ -871,61 +871,10 @@
 
 #include 
 
-static void count_files(const char *dnp, int *count,
-			Rboolean allfiles, Rboolean recursive,
-			const regex_t *reg)
-{
-DIR *dir;
-struct dirent *de;
-char p[PATH_MAX];
-#ifdef Windows
-/* For consistency with other functions */
-struct _stati64 sb;
-#else
-struct stat sb;
-#endif
-
-if (strlen(dnp) >= PATH_MAX)  /* should not happen! */
-	error(_("directory/folder path name too long"));
-if ((dir = opendir(dnp)) == NULL) {
-	warning(_("list.files: '%s' is not a readable directory"), dnp);
-} else {
-	while ((de = readdir(dir))) {
-	if (allfiles || !R_HiddenFile(de->d_name)) {
-		if (recursive) {
-#ifdef Win32
-		if (strlen(dnp) == 2 && dnp[1] == ':')
-			snprintf(p, PATH_MAX, "%s%s", dnp, de->d_name);
-		else
-			snprintf(p, PATH_MAX, "%s%s%s", dnp, R_FileSep, de->d_name);
-#else
-		snprintf(p, PATH_MAX, "%s%s%s", dnp, R_FileSep, de->d_name);
-#endif		
-#ifdef Windows
-		_stati64(p, &sb);
-#else
-		stat(p, &sb);
-#endif
-		if ((sb.st_mode & S_IFDIR) > 0) {
-			if (strcmp(de->d_name, ".") && strcmp(de->d_name, ".."))
-count_files(p, count, allfiles, recursive, reg);
-			continue;
-		}
-		}
-		if (reg) {
-		if (tre_regexec(reg, de->d_name, 0, NULL, 0) == 0) (*count)++;
-		} else (*count)++;
-	}
-	}
-	closedir(dir);
-}
-}
-
 static void list_files(const char *dnp, const char *stem, int *count, SEXP ans,
 		   Rboolean allfiles, Rboolean recursive,
-   const regex_t *reg)
+		   int pattern, regex_t reg, int *countmax, PROTECT_INDEX *idx)
 {
-int ans_len = length(ans);
 DIR *dir;
 struct dirent *de;
 char p[PATH_MAX], stem2[PATH_MAX];
@@ -971,20 +920,28 @@
 			} else
 strcpy(stem2, de->d_name);
 			list_files(p, stem2, count, ans, allfiles,
-   recursive, reg);
+   recursive, pattern, reg, countmax, idx);
 			}
 			continue;
 		}
 		}
-/* number of files could have changed since call to count_files */
-if (*count >= ans_len) break;
-		if (reg) {
-		if (tre_regexec(reg, de->d_name, 0, NULL, 0) == 0)
+		if (pattern) {
+if (tre_regexec(®, de->d_name, 0, NULL, 0) == 0) {
+if (*count == *countmax - 1) {
+*countmax *= 2;
+REPROTECT(ans = lengthgets(ans, *countmax), *idx);
+}
 			SET_STRING_ELT(ans, (*count)++,
    filename(stem, de->d_name));
-		} else
+}
+		} else {
+if (*count == *countmax - 1) {
+*countmax *= 2;
+REPROTECT(ans = lengthgets(ans, *countmax), *idx);
+}
 		SET_STRING_ELT(ans, (*count)++,
    filename(stem, de->d_name));
+}
 	}
 	}
 	closedir(dir);
@@ -993,11 +950,13 @@
 
 SEXP attribute_hidden do_listfiles(SEXP call, SEXP op, SEXP args, SEXP rho)
 {
+PROTECT_INDEX ipx;
 SEXP d, p, ans;
 int allfiles, fullnames, count, pattern, recursive, igcase, flags;
 int i, ndir;
 const char *dnp;
 regex_t reg;
+int countmax = 100;
 
 checkArity(op, args);
 d = CAR(args);  args = CDR(args);
@@ -1019,19 +978,13 @@
 
 if (pattern && tre_regcomp(®, translateChar(STRING_ELT(p, 0)), flags))
 	error(_("invalid 'pattern' regular expression"));
-count = 0;
-for (i = 0; i < ndir ; i++) {
-	if (STRING_ELT(d, i) == NA_STRING) continue;
-	dnp = R_ExpandFileName(translateChar(STRING_ELT(d, i)));
-	count_files(dnp, &count, allfiles, recursive, pattern ? ® : NULL);
-}
-PROTECT(ans = allocVector(STRSXP, count));
+PROTECT_WITH_INDEX(ans = allocVector(STRSXP, count), &ipx);
 count = 0;
 for (i = 0; i < ndir ; i++) {
 	if (STRING_ELT(d, i) == NA_STRING) continue;
 	dnp = R_ExpandFileName(translateChar(STRING_ELT(d, i)));
 	list_files(dnp, fullnames ? dnp : NULL, &count, ans, allfiles,
-		   recursive, pattern ? 

[Rd] Suggestion: Move \alias{.conflicts.OK} to base/man/library.Rd instead.

2010-03-17 Thread Henrik Bengtsson
Currently help.search(".conflicts.OK") gives:

  methods::.conflicts.OKAdditional (Support) Functions for Methods

However, help(".conflicts.OK", package="methods") provides no
information on the term ".conflicts.OK".

Check the R-devel SVN source code, this is only because there is an:

\alias{.conflicts.OK}

in the methods/man/MethodSupport.Rd file.

Is this a leftover from old times?  May I suggest to instead place
that \alias() in base/man/library.Rd, where ".conflicts.OK" is
discussed.  (The other place it is discussed in ?attach).

/Henrik

PS. A better place to document it is probably in 'Writing R
Extensions', which is intended for developers.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Suggestion: Not having to export .conflicts.OK in name spaces

2010-03-17 Thread Henrik Bengtsson
Currently library() and attach() fail to locate an existing
'.conflicts.OK' in a package wit name space, unless it is exported.
Since there should be little interest in exporting '.conflicts.OK'
otherwise, one may argue that those methods should look for
'.conflicts.OK' even if it is not exported.  If so, a patch for
library() is:

>svn diff library.R
Index: library.R
===
--- library.R   (revision 51296)
+++ library.R   (working copy)
@@ -312,7 +312,7 @@
 nogenerics <-
 !.isMethodsDispatchOn() || checkNoGenerics(env, package
)
 if(warn.conflicts &&
-   !exists(".conflicts.OK", envir = env, inherits = FALSE))

+   !exists(".conflicts.OK", envir = ns, inherits = FALSE))
 checkConflicts(package, pkgname, pkgpath,
nogenerics, ns)
 runUserHook(package, pkgpath)


In order to be consistent, a patch for attach() should also be
provided, which requires some more changes, but I won't spend time on
that unless the above is a wanted feature.

/Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] list_files() memory corruption?

2010-03-17 Thread Seth Falcon
On 3/17/10 7:16 AM, Alistair Gee wrote:
> Yes. I had noticed that R occasionally segfaults (especially when I
> run many concurrent R processes), so I used valgrind to log every use
> of R. In the valgrind logs, I tracked the problem to list_files().
> 
> I attached a patch to platform.c (for trunk). Unfortunately, I am
> having trouble building R from the subversion trunk--it is taking a
> very long time decompressing/installing the recommended packages--so I
> haven't been able to verify the fix yet. But my version of platform.c
> does compile, and it does simplify the code b/c count_files() is no
> longer needed.

Hmm, I see that you "grow" the vector containing filenames by calling
lengthgets and doubling the length.  I don't see where you cleanup
before returning -- seems likely you will end up returning a vector that
is too long.

And there are some performance characteristics to consider in terms of
both run time and memory profile.  Does making a single pass through the
files make up for the allocations/data copying that result from
lengthgets?  Is it worth possibly requiring twice the memory for the
worst case?

+ seth

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Call R function from Java

2010-03-17 Thread sara_m

Hi,

i would like also to install JRI in Eclipse but it doesnt work.. 

 @way4thesub: can you please tell me how you did that?

thanks
sara
-- 
View this message in context: 
http://n4.nabble.com/Call-R-function-from-Java-tp911738p1596298.html
Sent from the R devel mailing list archive at Nabble.com.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] list_files() memory corruption?

2010-03-17 Thread Alistair Gee
On Wed, Mar 17, 2010 at 9:42 AM, Seth Falcon  wrote:
> On 3/17/10 7:16 AM, Alistair Gee wrote:
>> Yes. I had noticed that R occasionally segfaults (especially when I
>> run many concurrent R processes), so I used valgrind to log every use
>> of R. In the valgrind logs, I tracked the problem to list_files().
>>
>> I attached a patch to platform.c (for trunk). Unfortunately, I am
>> having trouble building R from the subversion trunk--it is taking a
>> very long time decompressing/installing the recommended packages--so I
>> haven't been able to verify the fix yet. But my version of platform.c
>> does compile, and it does simplify the code b/c count_files() is no
>> longer needed.
>
> Hmm, I see that you "grow" the vector containing filenames by calling
> lengthgets and doubling the length.  I don't see where you cleanup
> before returning -- seems likely you will end up returning a vector that
> is too long.
>
> And there are some performance characteristics to consider in terms of
> both run time and memory profile.  Does making a single pass through the
> files make up for the allocations/data copying that result from
> lengthgets?  Is it worth possibly requiring twice the memory for the
> worst case?
>
> + seth
>
>
>
>
>
Sorry, I left out a call to shorten the vector (via final call to
lengthgets()). See new patch.

BTW, I modeled this code after code I found in the RODBC package that
handles rows returned from a database query. I don't know if this is a
typical approach to reallocation. Maybe there is a better way of
extending the vector, though wouldn't most of the memory usage be in
the strings (of the filenames) rather than the STRSXP vector itself?

Anyway, I'm offering this (untested) fix as it handles both
directories that have grown and directories that have shrunk, so that
the length of the vector is correct in both cases.

--
--- platform.c~	2010-03-17 09:57:26.268110062 -0400
+++ platform.c	2010-03-17 13:36:55.416853389 -0400
@@ -871,61 +871,10 @@
 
 #include 
 
-static void count_files(const char *dnp, int *count,
-			Rboolean allfiles, Rboolean recursive,
-			const regex_t *reg)
-{
-DIR *dir;
-struct dirent *de;
-char p[PATH_MAX];
-#ifdef Windows
-/* For consistency with other functions */
-struct _stati64 sb;
-#else
-struct stat sb;
-#endif
-
-if (strlen(dnp) >= PATH_MAX)  /* should not happen! */
-	error(_("directory/folder path name too long"));
-if ((dir = opendir(dnp)) == NULL) {
-	warning(_("list.files: '%s' is not a readable directory"), dnp);
-} else {
-	while ((de = readdir(dir))) {
-	if (allfiles || !R_HiddenFile(de->d_name)) {
-		if (recursive) {
-#ifdef Win32
-		if (strlen(dnp) == 2 && dnp[1] == ':')
-			snprintf(p, PATH_MAX, "%s%s", dnp, de->d_name);
-		else
-			snprintf(p, PATH_MAX, "%s%s%s", dnp, R_FileSep, de->d_name);
-#else
-		snprintf(p, PATH_MAX, "%s%s%s", dnp, R_FileSep, de->d_name);
-#endif		
-#ifdef Windows
-		_stati64(p, &sb);
-#else
-		stat(p, &sb);
-#endif
-		if ((sb.st_mode & S_IFDIR) > 0) {
-			if (strcmp(de->d_name, ".") && strcmp(de->d_name, ".."))
-count_files(p, count, allfiles, recursive, reg);
-			continue;
-		}
-		}
-		if (reg) {
-		if (tre_regexec(reg, de->d_name, 0, NULL, 0) == 0) (*count)++;
-		} else (*count)++;
-	}
-	}
-	closedir(dir);
-}
-}
-
 static void list_files(const char *dnp, const char *stem, int *count, SEXP ans,
 		   Rboolean allfiles, Rboolean recursive,
-   const regex_t *reg)
+		   int pattern, regex_t reg, int *countmax, PROTECT_INDEX *idx)
 {
-int ans_len = length(ans);
 DIR *dir;
 struct dirent *de;
 char p[PATH_MAX], stem2[PATH_MAX];
@@ -971,20 +920,28 @@
 			} else
 strcpy(stem2, de->d_name);
 			list_files(p, stem2, count, ans, allfiles,
-   recursive, reg);
+   recursive, pattern, reg, countmax, idx);
 			}
 			continue;
 		}
 		}
-/* number of files could have changed since call to count_files */
-if (*count >= ans_len) break;
-		if (reg) {
-		if (tre_regexec(reg, de->d_name, 0, NULL, 0) == 0)
+		if (pattern) {
+if (tre_regexec(®, de->d_name, 0, NULL, 0) == 0) {
+if (*count == *countmax - 1) {
+*countmax *= 2;
+REPROTECT(ans = lengthgets(ans, *countmax), *idx);
+}
 			SET_STRING_ELT(ans, (*count)++,
    filename(stem, de->d_name));
-		} else
+}
+		} else {
+if (*count == *countmax - 1) {
+*countmax *= 2;
+REPROTECT(ans = lengthgets(ans, *countmax), *idx);
+}
 		SET_STRING_ELT(ans, (*count)++,
    filename(stem, de->d_name));
+}
 	}
 	}
 	closedir(dir);
@@ -993,11 +950,13 @@
 
 SEXP attribute_hidden do_listfiles(SEXP call, SEXP op, SEXP args, SEXP rho)
 {
+PROTECT_IND

[Rd] y ~ X -1 , X a matrix

2010-03-17 Thread Ross Boylan
While browsing some code I discovered a call to lm that used a formula y
~ X - 1, where X was a matrix.

Looking through the documentation of formula, lm, model.matrix and maybe
some others I couldn't find this useage (R 2.10.1).  Is it anything I
can count on in future versions?  Is there documentation I've
overlooked?

For the curious: model.frame on the above equation returns a data.frame
with 2 columns.  The second "column" is the whole X matrix.
model.matrix on that object returns the expected matrix, with the
transition from the odd model.frame to the regular matrix happening in
an .Internal call.

Thanks.
Ross

P.S. I would appreciate cc's, since mail problems are preventing me from
seeing list mail.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] y ~ X -1 , X a matrix

2010-03-17 Thread Ted Harding
On 17-Mar-10 23:32:41, Ross Boylan wrote:
> While browsing some code I discovered a call to lm that used
> a formula y ~ X - 1, where X was a matrix.
> 
> Looking through the documentation of formula, lm, model.matrix
> and maybe some others I couldn't find this useage (R 2.10.1).
> Is it anything I can count on in future versions?  Is there
> documentation I've overlooked?
> 
> For the curious: model.frame on the above equation returns a
> data.frame with 2 columns.  The second "column" is the whole X
> matrix. model.matrix on that object returns the expected matrix,
> with the transition from the odd model.frame to the regular
> matrix happening in an .Internal call.
> 
> Thanks.
> Ross
> 
> P.S. I would appreciate cc's, since mail problems are preventing
> me from seeing list mail.

Hmmm ... I'm not sure what is the problem with what you describe.
Code:

  set.seed(54321)
  X  <- matrix(rnorm(50),ncol=2)
  Y  <- 1*X[,1] + 2*X[,2] + 0.25*rnorm(25)
  LM <- lm(Y ~ X-1)

  summary(LM)
  # Call:
  # lm(formula = Y ~ X - 1)
  # Residuals:
  #  Min   1Q   Median   3Q  Max 
  # -0.39942 -0.13143 -0.02249  0.11662  0.61661 
  # Coefficients:
  #Estimate Std. Error t value Pr(>|t|)
  # X1  0.977070.04159   23.49   <2e-16 ***
  # X2  2.091520.06714   31.15   <2e-16 ***
  # ---
  # Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 
  # Residual standard error: 0.2658 on 23 degrees of freedom
  # Multiple R-squared: 0.9863, Adjusted R-squared: 0.9851 
  # F-statistic: 826.6 on 2 and 23 DF,  p-value: < 2.2e-16 

  model.frame(LM)
  #  Y  X.1  X.2
  # 1   0.04936244 -0.178900750  0.051420078
  # 2  -0.54224173 -0.928044132 -0.027963292
  # [...]
  # 24  1.54196979  0.312332806  0.602009497
  # 25 -0.16928420 -1.285559427  0.394790358

  str(model.frame(LM))
  #  $ Y: num  0.0494 -0.5422 -0.7295 -3.4422 -3.1296 ...
  #  $ X: num [1:25, 1:2] -0.179 -0.928 -0.784 -1.651 -0.408 ...
  # [...]

  model.frame(Y ~ X-1)
  #  Y  X.1  X.2
  # 1   0.04936244 -0.178900750  0.051420078
  # 2  -0.54224173 -0.928044132 -0.027963292
  # [...]
  # 24  1.54196979  0.312332806  0.602009497
  # 25 -0.16928420 -1.285559427  0.394790358
  ## (Identical to above)

  str(model.frame(Y ~ X-1))
  # $ Y: num  0.0494 -0.5422 -0.7295 -3.4422 -3.1296 ...
  # $ X: num [1:25, 1:2] -0.179 -0.928 -0.784 -1.651 -0.408 ...
  # [...]
  ## (Identical to above)

Maybe the clue (admittedly somewhat obtuse( can be found in ?lm:

  lm(formula, data, subset, weights, na.action,
 method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE,
 singular.ok = TRUE, contrasts = NULL, offset, ...)
  [...]

  data: an optional data frame, list or environment (or object
coercible by 'as.data.frame' to a data frame) containing the
variables in the model.  If not found in 'data', the variables
are taken from 'environment(formula)', typically the
environment from which ?lm? is called.

So, in the example the variables are taken from X, "coercible
by 'as.data.frame' ... taken from 'environment(formula)'".

Hence (I guess) X is found in the environment and is coerced
into a dataframe with 2 columns, and X.1, X.2 are taken from there.

R Gurus: Please comment! (I'm only guessing by plausibility).
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 18-Mar-10   Time: 00:57:20
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] y ~ X -1 , X a matrix

2010-03-17 Thread Ross Boylan
On Thu, 2010-03-18 at 00:57 +, ted.hard...@manchester.ac.uk wrote:
> On 17-Mar-10 23:32:41, Ross Boylan wrote:
> > While browsing some code I discovered a call to lm that used
> > a formula y ~ X - 1, where X was a matrix.
> > 
> > Looking through the documentation of formula, lm, model.matrix
> > and maybe some others I couldn't find this useage (R 2.10.1).
> > Is it anything I can count on in future versions?  Is there
> > documentation I've overlooked?
> > 
> > For the curious: model.frame on the above equation returns a
> > data.frame with 2 columns.  The second "column" is the whole X
> > matrix. model.matrix on that object returns the expected matrix,
> > with the transition from the odd model.frame to the regular
> > matrix happening in an .Internal call.
> > 
> > Thanks.
> > Ross
> > 
> > P.S. I would appreciate cc's, since mail problems are preventing
> > me from seeing list mail.
> 
> Hmmm ... I'm not sure what is the problem with what you describe.
There is no problem in the "it doesn't work" sense.
There is a problem that it seems undocumented--though the help you quote
could rather indirectly be taken as a clue--and thus, possibly, subject
to change in later releases.

Ross Boylan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] y ~ X -1 , X a matrix

2010-03-17 Thread Dirk Eddelbuettel

On 17 March 2010 at 16:32, Ross Boylan wrote:
| While browsing some code I discovered a call to lm that used a formula y
| ~ X - 1, where X was a matrix.
| 
| Looking through the documentation of formula, lm, model.matrix and maybe
| some others I couldn't find this useage (R 2.10.1).  Is it anything I
| can count on in future versions?  Is there documentation I've
| overlooked?

>From help(formula):

  Details:

   In addition to ‘+’ and ‘:’, a number of other operators are useful
   in model formulae.  [...] The
   ‘-’ operator removes the specified terms, so that ‘(a+b+c)^2 -
   a:b’ is identical to ‘a + b + c + b:c + a:c’.  It can also used to
   remove the intercept term: ‘y ~ x - 1’ is a line through the
   origin.  A model with no intercept can be also specified as ‘y ~ x
   + 0’ or ‘y ~ 0 + x’.

What exactly are you questioning? That X is a matrix? That doesn't take
away from the fact that the rest is a formula.

Dirk

-- 
  Registration is open for the 2nd International conference R / Finance 2010
  See http://www.RinFinance.com for details, and see you in Chicago in April!

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel