[Rd] generic sample

2006-11-09 Thread Romain Francois
Hi,

When x is a data frame, sample(x) is not really sensible :

R> sample(iris) # sends back all the iris data set.

What about a generic sample function (from R-devel/src/library/base/R) 
with a method for class `data.frame` that would sample the row indexes 
and then do the subset. See the code below.


Cheers,


Romain


sample <- function(x, ...)
  UseMethod("sample")
 
sample.data.frame <- function(x, ...){
  x[ sample(1:nrow(x), ...), ] 
}


sample.default <- function(x, size, replace=FALSE, prob=NULL)
{
if(length(x) == 1 && x >= 1) {
if(missing(size)) size <- x
.Internal(sample(x, size, replace, prob))
}
else {
if(missing(size)) size <- length(x)
x[.Internal(sample(length(x), size, replace, prob))]
}
}


R> set.seed(4)
R> sample(iris, 5)
Sepal.Length Sepal.Width Petal.Length Petal.WidthSpecies
88   6.3 2.3  4.4 1.3 versicolor
24.9 3.0  1.4 0.2 setosa
44   5.0 3.5  1.6 0.6 setosa
41   5.0 3.5  1.3 0.3 setosa
119  7.7 2.6  6.9 2.3  virginica



-- 
*mangosolutions*
/data analysis that delivers/

Tel   +44 1249 467 467
Fax   +44 1249 467 468

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Retrieving function name

2006-11-09 Thread Tom McCallum
Hi,

Does anyone know how I can retrieve a function name, for example

If I have a function f as follows:

f <- function( myfunc ) {
print( name_of(myfunc) );
}

I want to know what I should have as "name_of" such that I could call this  
with :
f( median )
and it would print "median"

or f( my_function_name ) and it would print "m_function_name".

So far all I can get is the function definition that myfunc points to.

I thought the structure command might do it but this also just gives back  
the function definition and not the original name.

Any suggestions?

Tom

-- 
Dr. Thomas McCallum
Systems Architect,
Level E Limited
ETTC, The King's Buildings
Mayfield Road,
Edinburgh EH9 3JL, UK
Work  +44 (0) 131 472 4813
Fax:  +44 (0) 131 472 4719
http://www.levelelimited.com
Email: [EMAIL PROTECTED]

Level E is a limited company incorporated in Scotland. The contents of  
this e-mail are privileged and/or confidential. If you are not the  
intended recipient, please
notify the sender and ensure this e-mail is deleted and not read, copied  
or disclosed. It is your responsibility to scan this e-mail and any  
attachments for
computer viruses or other defects. Level E does not accept liability for  
any loss or damage which may result from this e-mail or any attachment.  
E-mail is not secure
and can be intercepted, corrupted or amended. Level E does not accept  
liability for errors or omissions arising as a result of interrupted or  
defective transmission.
Any views, opinions, conclusions or other information in this e-mail which  
do not relate to the business of Level E Limited are not authorised by  
Level E. Unless
specifically stated and authorised by Level E, nothing in this e-mail  
shall be taken to be an offer or acceptance of any contract of any nature.  
E-mail entering or leaving Level E's system is subject to random  
monitoring and recording.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Retrieving function name

2006-11-09 Thread Dimitris Rizopoulos
probably you're looking for:

f <- function(FUN, x){
list(funName = deparse(substitute(FUN)), value = FUN(x))
}

f(mean, rnorm(10))
f(median, rnorm(10))


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm


- Original Message - 
From: "Tom McCallum" <[EMAIL PROTECTED]>
To: 
Sent: Thursday, November 09, 2006 4:28 PM
Subject: [Rd] Retrieving function name


> Hi,
>
> Does anyone know how I can retrieve a function name, for example
>
> If I have a function f as follows:
>
> f <- function( myfunc ) {
> print( name_of(myfunc) );
> }
>
> I want to know what I should have as "name_of" such that I could 
> call this
> with :
> f( median )
> and it would print "median"
>
> or f( my_function_name ) and it would print "m_function_name".
>
> So far all I can get is the function definition that myfunc points 
> to.
>
> I thought the structure command might do it but this also just gives 
> back
> the function definition and not the original name.
>
> Any suggestions?
>
> Tom
>
> -- 
> Dr. Thomas McCallum
> Systems Architect,
> Level E Limited
> ETTC, The King's Buildings
> Mayfield Road,
> Edinburgh EH9 3JL, UK
> Work  +44 (0) 131 472 4813
> Fax:  +44 (0) 131 472 4719
> http://www.levelelimited.com
> Email: [EMAIL PROTECTED]
>
> Level E is a limited company incorporated in Scotland. The contents 
> of
> this e-mail are privileged and/or confidential. If you are not the
> intended recipient, please
> notify the sender and ensure this e-mail is deleted and not read, 
> copied
> or disclosed. It is your responsibility to scan this e-mail and any
> attachments for
> computer viruses or other defects. Level E does not accept liability 
> for
> any loss or damage which may result from this e-mail or any 
> attachment.
> E-mail is not secure
> and can be intercepted, corrupted or amended. Level E does not 
> accept
> liability for errors or omissions arising as a result of interrupted 
> or
> defective transmission.
> Any views, opinions, conclusions or other information in this e-mail 
> which
> do not relate to the business of Level E Limited are not authorised 
> by
> Level E. Unless
> specifically stated and authorised by Level E, nothing in this 
> e-mail
> shall be taken to be an offer or acceptance of any contract of any 
> nature.
> E-mail entering or leaving Level E's system is subject to random
> monitoring and recording.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Retrieving function name

2006-11-09 Thread Peter Dalgaard
"Tom McCallum" <[EMAIL PROTECTED]> writes:

> Hi,
> 
> Does anyone know how I can retrieve a function name, for example
> 
> If I have a function f as follows:
> 
> f <- function( myfunc ) {
>   print( name_of(myfunc) );
> }
> 
> I want to know what I should have as "name_of" such that I could call this  
> with :
>   f( median )
> and it would print "median"
> 
> or f( my_function_name ) and it would print "m_function_name".
> 
> So far all I can get is the function definition that myfunc points to.
> 
> I thought the structure command might do it but this also just gives back  
> the function definition and not the original name.
> 
> Any suggestions?

Depending on what you really want, this is either impossible or
trivial. The trivial version is 

f <- function(x) print(deparse(substitute(x)))

and the impossible one is to get something that prints "mean" if you
do something like

x<-1
f(switch(x, 1 = mean, 2 = median, 3 = sd, 4 = IQR))

or 

g <- function(foo) f(foo)
g(mean)

or indeed does anything sensible with

f(function(x,y) x*y)


Thing is, functions do not "have names", they can be anonymous or
assigned to multiple names, or be passed as arguments.

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] allocVector bug ?

2006-11-09 Thread Luke Tierney
On Wed, 8 Nov 2006, Vladimir Dergachev wrote:

> On Wednesday 08 November 2006 12:56 pm, Luke Tierney wrote:
>> On Mon, 6 Nov 2006, Vladimir Dergachev wrote:
>>> Hi Luke,
>>>
>>>
>>> I generally agree with this, however I believe that current logic breaks
>>> down for large allocation sizes and my code ends up spending 70% (and up)
>>> of computer time spinning inside garbage collector (I run oprofile to
>>> observe what is going on).
>>
>> Again please be careful about these sorts of statements.  I am sure
>> there are bugs in the memory manager and places where things "break
>> down" but this isn't one of them.  The memory manager is quite
>> deliberately biased towards keeping the total allocation low, if
>> necessary at the expense of some extra gc overhead.  This is needed if
>> we want to use the same settings across a wide range of
>> configurations, some of which have relatively little memory available
>> (think student labs).  The memory manager does try to learn about the
>> needs of a session, and as a result triggering value get adjusted.  It
>> is not true that every large allocation causes a gc.  This may be true
>> _initially_, but once total memory usage stabilizes at a particular
>> level it is no longer true (look at the way the heap limits are
>> adjusted).
>>
>> This approach of adjusting based on usage within a session is
>> reasonable and works well for longer sessions.  It may not work well
>> for short scripts that need large allocations.  I doubt that any
>> automated setting can work well in that situation while at the same
>> time keeping memory usage in other settings low. So it may be useful
>> to find ways of specifying a collection strategy appropriate for these
>> situations. If you can send me a simplified version of your usage
>> scenario then I will give this some thought and see if we can come up
>> with some reasonable ways of allowing user code to tweak gc behavior
>> for these situations.
>>
>
> Hi Luke,
>
>   Yes, I gladly concede the point that for a heuristic algorithm the notion
> of what is a "bug" is murky (besides crashes, etc, which is not what I am not
> talking about).
>
>   Here is why I called this a bug:
>
> 1. My understanding is that each time gc() needs to increase memory it
> performs a full garbage collection run. Right ?

The allocation process does not call gc before every call to malloc.
It only calls gc if the allocation would cross a threshold level.
Those theshold levels are adjusted in an effort to compromise between
keeping memory footprint low and not calling gc too often.  The code
you quote below is part of this adjustment process.  If this process
is working properly then as memory use grows there will initially be
more gc activity and then less as the thresholds adjust.

The command line arguments mentioned in ?Memory can be used to
influence some of this behavior.  Calling gcinfo(TRUE) will show you
when the internally triggered collections occur, and which level of
collection occurs.

>
> 2. This is not a problem with small memory sizes as they imply
> (presumably) small number of objects.
>
> 3. However, if one wants to allocate many objects (say columns in a data
> frame or just vectors) this results in large penalty
>
> Example 1: This simulates allocation of a data.frame with some character
> columns which are assumed to be factors. On my system first assignment is
> nearly instantaneous, why subsequent assignments take slightly less than 0.1
> seconds each.

I'm not sure these are quite doing what you intend.  You define Chars
but don't use it.  Also, system.time by default calls gc() before
doing the evaluation. Giving FALSE as the second argument may give you
a more realistic picture.

>
> L<-list()
> Chars<-as.character(1:10)
> for(i in 1:100)L[[i]]<-system.time(assign(paste("test", i), 1:100))
> Times<-do.call(rbind, L)
>
> Example 2: Same as example 1 but we first grow the memory with fake
> allocation:
>
> L<-list()
> Chars<-as.character(1:10)
> Data<-1:1
> rm(Data)
> for(i in 1:100)L[[i]]<-system.time(assign(paste("test", i), 1:100))
> Times<-do.call(rbind, L)
>
> In this case the first 20 or so allocations are very quck (faster than 0.02
> sec) and then garbage collector kicks in and the time rises to 0.08 seconds
> each - still less than in Example 1.
>
> This example is relevant because this sequence of allocations is exactly what
> happens when one uses read.table or scan (or database query) to load data.
>
> What is more, if the user then manipulates the loaded data by creating columns
> that are a combination of existing ones then this is very slow as well.
>
> I looked more carefully at your code in src/main/memory.c, function
> AdjustHeapSize:
>
> R_VSize = VNeeded;
>if (vect_occup > R_VGrowFrac) {
>   R_size_t change = R_VGrowIncrMin + R_VGrowIncrFrac * R_NSize;
>   if (R_MaxVSize - R_VSize >= change)
>   R_VSize += change;
>}
>
> Could it be that R_NSize 

Re: [Rd] allocVector bug ?

2006-11-09 Thread Vladimir Dergachev
On Thursday 09 November 2006 12:21 pm, Luke Tierney wrote:
> On Wed, 8 Nov 2006, Vladimir Dergachev wrote:
> > On Wednesday 08 November 2006 12:56 pm, Luke Tierney wrote:
> >> On Mon, 6 Nov 2006, Vladimir Dergachev wrote:
> >
> > Hi Luke,
> >
> >   Yes, I gladly concede the point that for a heuristic algorithm the
> > notion of what is a "bug" is murky (besides crashes, etc, which is not
> > what I am not talking about).
> >
> >   Here is why I called this a bug:
> >
> > 1. My understanding is that each time gc() needs to increase memory
> > it performs a full garbage collection run. Right ?
>
> The allocation process does not call gc before every call to malloc.
> It only calls gc if the allocation would cross a threshold level.
> Those theshold levels are adjusted in an effort to compromise between
> keeping memory footprint low and not calling gc too often.  The code
> you quote below is part of this adjustment process.  If this process
> is working properly then as memory use grows there will initially be
> more gc activity and then less as the thresholds adjust.

Well, I was seeing it call gc for every large vector. This probably happens be 
only for those larger  than R_VGrowIncrFrac * R_NSize. On my system R_NSize 
is never more than 1e6 so this would explain the problems when using 1e6 (and 
larger) vectors.

>
> > 2. This is not a problem with small memory sizes as they imply
> > (presumably) small number of objects.
> >
> > 3. However, if one wants to allocate many objects (say columns in a
> > data frame or just vectors) this results in large penalty
> >
> > Example 1: This simulates allocation of a data.frame with some character
> > columns which are assumed to be factors. On my system first assignment is
> > nearly instantaneous, why subsequent assignments take slightly less than
> > 0.1 seconds each.
>
> I'm not sure these are quite doing what you intend.  You define Chars
> but don't use it.  Also, system.time by default calls gc() before
> doing the evaluation. Giving FALSE as the second argument may give you
> a more realistic picture.

The Chars are defined to create lots of ncells and make gc() run time more 
realistic. It also mimics having a data.frame with a few factor columns.

As for system.time - thank you, I missed that ! 
Setting gcFirst=FALSE changes behavior in the first example to be 2 times 
faster and makes all the allocations in the second example faster.

I guess that extra call to gc() caused R_VSize to shrink too fast.

> > I looked more carefully at your code in src/main/memory.c, function
> > AdjustHeapSize:
> >
> > R_VSize = VNeeded;
> >if (vect_occup > R_VGrowFrac) {
> > R_size_t change = R_VGrowIncrMin + R_VGrowIncrFrac * R_NSize;
> > if (R_MaxVSize - R_VSize >= change)
> > R_VSize += change;
> >}
> >
> > Could it be that R_NSize should be R_VSize ? This would explain why I see
> > a problem in case R_VSize>>R_NSize.
>
> That does indeed look like a bug and that R_NSize should be R_VSize --
> well spotted, thanks.  I will need to experiment with this a bit more
> to see if it can safely be changed.  It will increase the memory
> footprint a bit.  Probaly not by enough to matter but if it does we
> may need to adjust some of the tuning constants.
>

Would there be something I can help you with ? Is there a script to run 
through common usage patterns ?

  thank you !

  Vladimir Dergachev


> Best,
>
> luke
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Failing to install R-2.4.0 on FC4

2006-11-09 Thread Benilton Carvalho
Hi everyone,

Firstly, I'm sorry for the cross-post. I re-read the posting guide  
and it appears my question is more related to r-devel (i think :-) )

I downloaded the source code available at:

http://cran.fhcrc.org/src/base/R-2/R-2.4.0.tar.gz

to a linux machine (Linux 2.6.11-1.1369_FC4smp #1 SMP).

I successfully configured and compiled it, which means I'm able to
run it from the R-2.4.0/bin directory.

I want to do a system wide installation (via 'sudo make install') and
help2man fails, ie, when I execute the installation command mentioned
previously I get:

make[1]: Entering directory `/home/bcarvalh/R-2.4.0/m4'
make[1]: Nothing to be done for `install'.
make[1]: Leaving directory `/home/bcarvalh/R-2.4.0/m4'
make[1]: Entering directory `/home/bcarvalh/R-2.4.0/tools'
make[1]: Nothing to be done for `install'.
make[1]: Leaving directory `/home/bcarvalh/R-2.4.0/tools'
make[1]: Entering directory `/home/bcarvalh/R-2.4.0/doc'
installing doc ...
help2man: can't get `--version' info from ../bin/R
make[1]: *** [R.1] Error 255
make[1]: Leaving directory `/home/bcarvalh/R-2.4.0/doc'
make: *** [install] Error 1

Searching the archive, I did find a similar report (http://
tolstoy.newcastle.edu.au/R/devel/06/01/3699.html) but the proposed
solution did not work for me.

Any suggestion?

Thank you very much,

Benilton Carvalho
PhD Candidate
Department of Biostatistics
Johns Hopkins University

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Failing to install R-2.4.0 on FC4

2006-11-09 Thread Peter Dalgaard
Benilton Carvalho <[EMAIL PROTECTED]> writes:

> Hi everyone,
> 
> Firstly, I'm sorry for the cross-post. I re-read the posting guide  
> and it appears my question is more related to r-devel (i think :-) )
> 
> I downloaded the source code available at:
> 
> http://cran.fhcrc.org/src/base/R-2/R-2.4.0.tar.gz
> 
> to a linux machine (Linux 2.6.11-1.1369_FC4smp #1 SMP).
> 
> I successfully configured and compiled it, which means I'm able to
> run it from the R-2.4.0/bin directory.
> 
> I want to do a system wide installation (via 'sudo make install') and
> help2man fails, ie, when I execute the installation command mentioned
> previously I get:
> 
> make[1]: Entering directory `/home/bcarvalh/R-2.4.0/m4'
> make[1]: Nothing to be done for `install'.
> make[1]: Leaving directory `/home/bcarvalh/R-2.4.0/m4'
> make[1]: Entering directory `/home/bcarvalh/R-2.4.0/tools'
> make[1]: Nothing to be done for `install'.
> make[1]: Leaving directory `/home/bcarvalh/R-2.4.0/tools'
> make[1]: Entering directory `/home/bcarvalh/R-2.4.0/doc'
> installing doc ...
> help2man: can't get `--version' info from ../bin/R
> make[1]: *** [R.1] Error 255
> make[1]: Leaving directory `/home/bcarvalh/R-2.4.0/doc'
> make: *** [install] Error 1
> 
> Searching the archive, I did find a similar report (http://
> tolstoy.newcastle.edu.au/R/devel/06/01/3699.html) but the proposed
> solution did not work for me.
> 
> Any suggestion?

Well if nothing else works, there's a Fedora Extra RPM for R-2.4.0...

http://fedoraproject.org/extras/4/i386/repodata/repoview/R-0-2.4.0-2.fc4.1.html

The symptoms suggest that you can run ../bin/R, but root cannot, so
check permissions.

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] invert argument in grep

2006-11-09 Thread Romain Francois

Hello,

What about an `invert` argument in grep, to return elements that are 
*not* matching a regular expression :


R> grep("pink", colors(), invert = TRUE, value = TRUE)

would essentially return the same as :

R> colors() [ - grep("pink", colors()) ]


I'm attaching the files that I modified (against today's tarball) for 
that purpose.


Cheers,

Romain

--
*mangosolutions*
/data analysis that delivers/

Tel   +44 1249 467 467
Fax   +44 1249 467 468

grep <-
function(pattern, x, ignore.case = FALSE, extended = TRUE, perl = FALSE,
 value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE)
{
pattern <- as.character(pattern)
## when value = TRUE we return names
if(!is.character(x)) x <- structure(as.character(x), names=names(x))
## behaves like == for NA pattern
if (is.na(pattern)) {
if(value)
return(structure(rep.int(as.character(NA), length(x)),
 names = names(x)))
else
return(rep.int(NA, length(x)))
}

if(perl)
.Internal(grep.perl(pattern, x, ignore.case, value, useBytes, invert))
else
.Internal(grep(pattern, x, ignore.case, extended, value, fixed,
   useBytes, invert))
}

sub <-
function(pattern, replacement, x, ignore.case = FALSE, extended = TRUE,
 perl = FALSE, fixed = FALSE, useBytes = FALSE)
{
pattern <- as.character(pattern)
replacement <- as.character(replacement)
if(!is.character(x)) x <- as.character(x)
if (is.na(pattern))
return(rep.int(as.character(NA), length(x)))

if(perl)
.Internal(sub.perl(pattern, replacement, x, ignore.case, useBytes))
else
.Internal(sub(pattern, replacement, x, ignore.case,
  extended, fixed, useBytes))
}

gsub <-
function(pattern, replacement, x, ignore.case = FALSE, extended = TRUE,
 perl = FALSE, fixed = FALSE, useBytes = FALSE)
{
pattern <- as.character(pattern)
replacement <- as.character(replacement)
if(!is.character(x)) x <- as.character(x)
if (is.na(pattern))
return(rep.int(as.character(NA), length(x)))

if(perl)
.Internal(gsub.perl(pattern, replacement, x, ignore.case, useBytes))
else
.Internal(gsub(pattern, replacement, x, ignore.case,
   extended, fixed, useBytes))
}

regexpr <-
function(pattern, text, extended = TRUE, perl = FALSE,
 fixed = FALSE, useBytes = FALSE)
{
pattern <- as.character(pattern)
text <- as.character(text)
if(perl)
.Internal(regexpr.perl(pattern, text, useBytes))
else
.Internal(regexpr(pattern, text, extended, fixed, useBytes))
}

gregexpr <-
function(pattern, text, extended = TRUE, perl = FALSE,
 fixed = FALSE, useBytes = FALSE)
{
pattern <- as.character(pattern)
text <- as.character(text)
if(perl)
  .Internal(gregexpr.perl(pattern, text, useBytes))
else
  .Internal(gregexpr(pattern, text, extended, fixed, useBytes))
}

agrep <-
function(pattern, x, ignore.case = FALSE, value = FALSE,
 max.distance = 0.1)
{
pattern <- as.character(pattern)
if(!is.character(x)) x <- as.character(x)
## behaves like == for NA pattern
if (is.na(pattern)){
if (value)
return(structure(rep.int(as.character(NA), length(x)),
 names = names(x)))
else
return(rep.int(NA, length(x)))
}

if(!is.character(pattern)
   || (length(pattern) < 1)
   || ((n <- nchar(pattern)) == 0))
stop("'pattern' must be a non-empty character string")

if(!is.list(max.distance)) {
if(!is.numeric(max.distance) || (max.distance < 0))
stop("'max.distance' must be non-negative")
if(max.distance < 1)# transform percentages
max.distance <- ceiling(n * max.distance)
max.insertions <- max.deletions <- max.substitutions <-
max.distance
} else {
## partial matching
table <- c("all", "deletions", "insertions", "substitutions")
ind <- pmatch(names(max.distance), table)
if(any(is.na(ind)))
warning("unknown match distance components ignored")
max.distance <- max.distance[!is.na(ind)]
names(max.distance) <- table[ind]
## sanity checks
comps <- unlist(max.distance)
if(!all(is.numeric(comps)) || any(comps < 0))
stop("'max.distance' components must be non-negative")
## extract restrictions
if(is.null(max.distance$all))
max.distance$all <- 0.1
max.insertions <- max.deletions <- max.substitutions <-
max.distance$all
if(!is.null(max.distance$deletions))
max.deletions <- max.distance$deletions
if(!is.null(max.distance$insertions))
max.insertions <- max.distance$insertions
if(!is.null(max.distance$substitutions))
max.substitutions <- max.distance$su