Re: [Rd] Characters vs. factors

2009-10-05 Thread David M Smith
On Mon, Oct 5, 2009 at 4:33 PM, hadley wickham  wrote:
> It seems like a recent trend in R has been to make character vectors
> and factors almost equivalent (apart from the way that factors always
> remember their original range).  There are a few exceptions:

A related issue is that modeling functions throw a warning when
character objects are used in place of factors:

> shopping <- 
> read.csv("http://spreadsheets.google.com/pub?key=tE9pXlYLwTAeiDWxL8h_viA&single=true&gid=0&range=A1%3AE37&output=csv",
>  as.is=TRUE)
> shopping$seconds <- as.numeric(as.difftime(shopping$Total.Time))
> fit <- lm(seconds ~ Number.of.Items + Payment - 1, shopping,subset=-8)
Warning message:
In model.matrix.default(mt, mf, contrasts) :
  variable 'Payment' converted to a factor

The warning doesn't affect R's behaviour, of course, but it does make
it difficult to sanction the otherwise sensible advice to R beginners
to read in data files with as.is=TRUE. (The warning leads to
difficult-to-answer questions.) For similar reasons I deleted the
warning from this post:
http://blog.revolution-computing.com/2009/09/is-the-express-line-really-faster-1.html
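
For anyone who wants to keep as.is=TRUE and still avoid the warning, one
option is to convert the character columns to factors just before
modelling. A minimal sketch, assuming a small made-up data frame in place
of the spreadsheet (the values are hypothetical; only the column names
follow the example above):

```r
# Hypothetical stand-in for the shopping data; the values are invented.
shopping <- data.frame(
  seconds         = c(120, 95, 210, 180, 60, 240),
  Number.of.Items = c(5, 3, 12, 9, 2, 14),
  Payment         = c("cash", "card", "card", "cash", "cash", "card"),
  stringsAsFactors = FALSE
)

# Find the character columns and convert each one to a factor explicitly,
# so the modelling code never has to do the conversion itself.
is_chr <- vapply(shopping, is.character, logical(1))
shopping[is_chr] <- lapply(shopping[is_chr], factor)

fit <- lm(seconds ~ Number.of.Items + Payment - 1, data = shopping)
print(coef(fit))
```

The conversion is explicit and happens in one place, which is easier to
explain to a beginner than a warning from inside model.matrix.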

In general the trend towards equivalence of factors and character
vectors is welcome, though.
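
Until that equivalence is complete, the nchar and strsplit exceptions
quoted below can be sidestepped by converting explicitly. A small sketch
using only base R; the example vector is made up:

```r
# A factor whose labels have different lengths (made-up example values).
f <- factor(c("a", "a b", "defgh"))

# Convert to character first, then apply the string functions.
chars <- as.character(f)
lens  <- nchar(chars)          # lengths of the labels, not of the codes
parts <- strsplit(chars, " ")  # works because the argument is character

print(lens)
print(parts)
```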

# David

On Mon, Oct 5, 2009 at 4:33 PM, hadley wickham  wrote:
>
> It seems like a recent trend in R has been to make character vectors
> and factors almost equivalent (apart from the way that factors always
> remember their original range).  There are a few exceptions:
>
>  * summary.character != summary.factor
>  * table(x, exclude = NULL) != table(factor(x), exclude=NULL) when x
> includes missing values
>
>  * strsplit on a factor
>
> > strsplit(factor(c("a", "a b")), " ")
> Error in strsplit(factor(c("a", "a b")), " ") : non-character argument
>
>  * nchar on a factor:
>
> > nchar(factor(c("abc", "d", "defgh")))
> [1] 1 1 1
>
>  * : with two character strings
>
> > "a":"b"
> Error in "a":"b" : NA/NaN argument
> In addition: Warning messages:
> 1: NAs introduced by coercion
> 2: NAs introduced by coercion
> > factor("a"):factor("b")
> [1] a:b
> Levels: a:b
>
> Regards,
>
> Hadley
>
> --
> http://had.co.nz/
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



--
David M Smith 
Director of Community, REvolution Computing www.revolution-computing.com
Tel: +1 (206) 577-4778 x3203 (San Francisco, USA)

Check out our upcoming events schedule at www.revolution-computing.com/events

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] Using R in a corporate environment

2010-03-10 Thread David M Smith
If you're looking for business and technical justifications for a
business to adopt R, I wrote some up last year:
http://blog.revolution-computing.com/2009/02/how-to-get-it-to-accept-and-love-r.html

In summary:
1. R is mainstream
2. R is supported
3. R is high-quality
4. R leads commercial packages in innovation
5. R is cost effective

Ironically, for many companies #1 is more important than #5, so
pointing to other companies using R is often a good strategy (see
http://blog.revolution-computing.com/rmedia/ for some media articles
that may help).
And also (if I may indulge the list for a small plug) some companies
are more comfortable paying for open-source software than installing
it free of cost (for the support, validation, and so on). If that's
the case for your company, there's REvolution R Enterprise:
http://www.revolution-computing.com/products/revolution-enterprise.php

Hope this helps,
# David Smith

--
David M Smith 
VP of Marketing, REvolution Computing  http://blog.revolution-computing.com
Tel: +1 (650) 330-0553 x205 (Palo Alto, CA, USA)

Download REvolution R free:
www.revolution-computing.com/downloads/revolution-r.php

On Wed, Mar 10, 2010 at 10:07 AM, Fernando Henrique Ferraz Pereira da
Rosa  wrote:
>
> Dear r-useRs,
>
> After a couple of years in a 'R exile' of sorts, I've recently changed jobs
> and my current employer (an American multinational in the food manufacturing
> industry) is much more open than my past employer (which wouldn't even want
> to hear about anything that didn't begin with SAS...). So, after my
> insistence corporate IT is now considering adopting R as part of our
> statistical applications toolbox.
>
> Things are not that simple though, and I'm now in the process of collecting
> data for writing a Business Case for R in our corporation, and this is the
> reason I'm writing you. If you have any examples (preferably with
> references) and/or experience in a similar scenario please do write me. I've
> already googled for some materials, and there's an excellent piece on the
> NyTimes of last year, which pointed that even Google was adopting R, and
> this is exactly the sort of thing I need to help convincing IT they'll be
> making a sound choice in adopting R.
>
> Thanks in advance for your attention,
>
> Fernando Rosa
>
> --
> "Though this be randomness, yet there is structure in't."
>                                          Rosa, F.H.F.P
>
> Instituto de Matemática e Estatística
> Universidade de São Paulo
> Fernando Henrique Ferraz P. da Rosa
> http://www.feferraz.net
>
>        [[alternative HTML version deleted]]
>
>
> __
> r-h...@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [Rd] FW: [R] The Quality & Accuracy of R

2009-01-26 Thread David M Smith
On Sun, Jan 25, 2009 at 4:20 PM, Peter Dalgaard
 wrote:
>>> - a good reason to want post-install validation is that validity can depend 
>>> on other part of the system outside developer control (e.g. an overzealous 
>>> BLAS optimization, sacrificing accuracy and/or standards compliance for 
>>> speed, can cause trouble). This is also a reason for not making too 
>>> far-reaching statements about validity.

I wanted to echo Peter's point here.  It's the main reason why we
don't claim our distribution of R is validated: *no* software can be
considered validated outside of the environment where it is installed
and used.  (We do however claim Revolution R is ready for a validation
*process*, a small but significant part of which is coming on-site to
run tests and verify the results.) We've come across a number of
environmental issues (locales, random number generators, shared
libraries, path settings, many others) that may affect the validation
process.  My main point here is that R can only be validated in situ,
and the process isn't practical to automate.  With the right build
tools in place, many of the *tests* can be automated, but that leaves
out validation on how the results are stored, used, and accessed in
practice.
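
To give a flavour of the environmental details involved, here is a rough
sketch of recording some of them from inside R. It is only an
illustration using base functions, not part of any validation procedure
we ship; the function name validation_environment is made up for this
example:

```r
# Collect a few environment facts that can influence results.
# validation_environment is a hypothetical helper name.
validation_environment <- function() {
  list(
    r_version = R.version.string,   # exact R release
    platform  = R.version$platform, # build platform
    locale    = Sys.getlocale(),    # affects sorting and text handling
    rng_kind  = RNGkind(),          # random number generators in use
    lib_paths = .libPaths(),        # where packages are loaded from
    path      = Sys.getenv("PATH")  # search path for external programs
  )
}

env <- validation_environment()
str(env)
```

A record like this covers only what R can see about itself; how results
are stored and used afterwards still has to be reviewed on site.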

> Muenchen, Robert A (Bob) wrote:
>> Asking to add a superfluous step to an installation may seem like a
>> waste of time, and technically it is. But psychologically this testing
>> will have an important impact that will silence many critics.

Nonetheless, Bob has an excellent point here -- even short of a
complete validation process, *perception* can prevent the validation
ball from getting stuck in the first place.  Giving the user some
degree of easily-digestible feedback that the installed R has run and
passed a battery of tests could help for that, and is something we'll
look at for the Revolution R distribution.
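
As a rough idea of what such feedback might look like, here is a toy
sketch that runs a few checks and prints a PASS/FAIL line for each. The
checks and the helper name run_checks are invented for illustration and
are far too small to count as a real test battery; where the test files
were installed, I believe R's own tools::testInstalledBasic() is the
closer fit.

```r
# Run a named list of zero-argument check functions and report each result.
# run_checks is a hypothetical helper, not an existing R function.
run_checks <- function(checks) {
  passed <- vapply(
    checks,
    function(f) isTRUE(tryCatch(f(), error = function(e) FALSE)),
    logical(1)
  )
  for (nm in names(passed)) {
    cat(sprintf("%-24s %s\n", nm, if (passed[[nm]]) "PASS" else "FAIL"))
  }
  invisible(passed)
}

checks <- list(
  "RNG reproducible" = function() {
    set.seed(1); a <- runif(3)
    set.seed(1); identical(a, runif(3))
  },
  "linear solve accurate" = function() {
    m <- matrix(c(2, 1, 1, 3), 2)
    b <- c(1, 2)
    isTRUE(all.equal(as.vector(m %*% solve(m, b)), b))
  },
  "mean of 1:10" = function() isTRUE(all.equal(mean(1:10), 5.5))
)

results <- run_checks(checks)
```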

# David Smith

P.S. For those who subscribe to r-devel but not r-help, some further
discussion of validation for R is here:
http://blog.revolution-computing.com/2009/01/analyzing-clinical-trial-data-with-r.html

--
David M Smith 
Director of Community, REvolution Computing www.revolution-computing.com
Tel: +1 (206) 577-4778 x3203 (Seattle, USA)



[Rd] Identifying graphics files produced by R

2009-02-13 Thread David M Smith
Oftentimes, I see graphs on the web that *look* like they've been
produced by R, but I can never be sure.  Or can I?  I notice that
PostScript files include a "%%Creator: R Software" line, but do R
graphics drivers encode any identifying information in GIF or PNG
files more commonly used on the web?  And if so, would such evidence
necessarily be obliterated in post-processing (e.g. cropping)?

I'm trying to do an informal survey of R's use to create statistical
graphics on the web, and if there's a way to identify graph files I
see as coming from R it would help a lot.

Thanks,
# David Smith

--
David M Smith 
Director of Community, REvolution Computing www.revolution-computing.com
Tel: +1 (206) 577-4778 x3203 (Seattle, USA)



[Rd] Summary: Identifying graphics files produced by R

2009-02-16 Thread David M Smith
Thanks to all those that responded to the question below, either on-list or
privately.  The bottom line is that there's no identifying information from
R in the metadata for PNG or JPG files (and R doesn't produce GIFs). I did
however figure out a way to automate a search for PDF and PostScript files
produced by R, and the details are here:
http://blog.revolution-computing.com/2009/02/r-graphics-in-the-media.html
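
For a single file already on disk, the PostScript check is easy to do by
hand. The sketch below is not the search method from the blog post, just
a minimal illustration: it writes a plot with R's postscript() device and
looks for the Creator comment near the top of the file. The helper name
made_by_r and the 50-line limit are my own choices for this example.

```r
# Return TRUE if the first n lines of a PostScript file contain
# R's Creator comment. made_by_r is a hypothetical helper name.
made_by_r <- function(path, n = 50) {
  header <- readLines(path, n = n, warn = FALSE)
  any(grepl("^%%Creator: R Software", header, useBytes = TRUE))
}

# Produce a throwaway PostScript plot to test against.
ps_file <- tempfile(fileext = ".ps")
postscript(ps_file)
plot(1:10)
invisible(dev.off())

print(made_by_r(ps_file))
```

This only helps while the comment survives; a file that has been
converted or re-saved by another program may no longer carry it.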

Thanks,
# David Smith

On Fri, Feb 13, 2009 at 1:15 PM, David M Smith <
da...@revolution-computing.com> wrote:

> Oftentimes, I see graphs on the web that *look* like they've been
> produced by R, but I can never be sure.  Or can I?  I notice that
> PostScript files include a "%%Creator: R Software" line, but do R
> graphics drivers encode any identifying information in GIF or PNG
> files more commonly used on the web?  And if so, would such evidence
> necessarily be obliterated in post-processing (e.g. cropping)?
>
> I'm trying to do an informal survey of R's use to create statistical
> graphics on the web, and if there's a way to identify graph files I
> see as coming from R it would help a lot.
>
> Thanks,
> # David Smith
>
> --
> David M Smith 
> Director of Community, REvolution Computing www.revolution-computing.com
> Tel: +1 (206) 577-4778 x3203 (Seattle, USA)
>



-- 
David M Smith 
Director of Community, REvolution Computing www.revolution-computing.com
Tel: +1 (206) 577-4778 x3203 (Seattle, USA)



Re: [Rd] Building R for Vistax64

2009-03-11 Thread David M Smith
On Wed, Mar 11, 2009 at 1:15 PM, Sim, Fraser
 wrote:
> Hi all,
>
> I have successfully built from source the 32-bit version of R on my
> Vista 64-bit box. I was hoping to graduate to a 64-bit version so I
> could analyze some larger data sets. I have 8gb RAM installed.

We (REvolution Computing) are beta testing a 64-bit build of R (2.7.2)
and its packages for Windows now.  There's more information at:

http://www.revolution-computing.com/products/windows-64bit.php
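
Whichever build you end up with, you can confirm from inside R whether it
is a 32-bit or 64-bit one. A quick sketch using base R only:

```r
# Pointer size in bytes is 4 on a 32-bit build and 8 on a 64-bit build.
pointer_bits <- 8 * .Machine$sizeof.pointer
cat("This R build uses", pointer_bits, "bit pointers; arch:", R.version$arch, "\n")
```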

# David Smith

-- 
David M Smith 
Director of Community, REvolution Computing www.revolution-computing.com
Tel: +1 (206) 577-4778 x3203 (Seattle, USA)



Re: [Rd] Closed-source non-free ParallelR ?

2009-04-22 Thread David M Smith
> --
> Pat Shields
> Software Engineer
> REvolution Computing
> One Century Tower | 265 Church Street, Suite 1006
> New Haven, CT  06510
> P: 203-777-7442 x250 | www.revolution-computing.com
>
> Check out our upcoming events schedule at
> www.revolution-computing.com/events
>



-- 
David M Smith 
Director of Community, REvolution Computing www.revolution-computing.com
Tel: +1 (206) 577-4778 x3203 (San Francisco, USA)

Check out our upcoming events schedule at www.revolution-computing.com/events



Re: [Rd] About ParallelR and licensing of packages

2009-04-26 Thread David M Smith
I rather feel that this discussion has gone beyond a topic and tone suitable
for r-devel. I would like to say however, as an author of several GPL works
myself, that I am confident that REvolution Computing (my employer, in case
that's not clear) is a good-faith member of the open-source community and
adheres to the letter and spirit of all licenses. We will reply to the
particulars of Mr Dowle's message in private email.
I invite any others who may wish to share comments or concerns to do so to
me directly at da...@revolution-computing.com.

# David Smith

-- 
David M Smith 
Director of Community, REvolution Computing www.revolution-computing.com
Tel: +1 (206) 577-4778 x3203 (San Francisco, USA)


On Sun, Apr 26, 2009 at 9:21 PM, Matthew Dowle wrote:

> Dear Danese,
>
> Without prejudice save as to costs
>
> I am the author of the R library "data.table". I released data.table under
> the provisions of the General Public License (GPL). This email is to notify
> REvolution that we may be in dispute.  If we are in dispute then I am
> entitled to issue litigation proceedings against REvolution for breach of
> contract.
>
> To establish if we are in fact in dispute, please answer the following :
>
> 1. Does REvolution R Enterprise include the library data.table ?
> 2. Has REvolution R Enterprise been distributed yet, for example has
> REvolution sold a copy ?
> 3. If it was distributed, was it distributed under a GPL-compatible license
> ?
>
> FSF guidance :
> http://www.fsf.org/licensing/licenses/gpl-faq.html#GPLInProprietarySystem
>
> Notwithstanding a potential dispute on the basis above, please also answer
> the following :
>
> 4. Has REvolution distributed any program code, written in R or any other
> language or environment or otherwise, which uses the library data.table, for
> example by calling functions that are provided by data.table at run time ?
> 5. If so, was such program code distributed under a GPL compatible license
> ?
> FSF guidance :
> http://www.fsf.org/licensing/licenses/gpl-faq.html#IfInterpreterIsGPL (3rd 
> paragraph)
> http://www.fsf.org/licensing/licenses/gpl-faq.html#IfLibraryIsGPL
> http://www.fsf.org/licensing/licenses/gpl-faq.html#NFUseGPLPlugins
>
> I am making every effort to agree with you that we are not in dispute.  I
> have several suggestions which may avoid dispute, for example you could
> remove data.table from REvolution R Enterprise. You could confirm that the
> aggregate work REvolution R Enterprise is released under a GPL-compatible
> license. There may well be other solutions you could suggest. You could
> decide to postpone distribution of REvolution R Enterprise until all
> potential disputes are resolved.  If I have not heard from you or your
> representatives within 21 days of today 26 April 2009 then I will instruct
> my legal representatives to establish whether there is a dispute.
> Alternatively you can confirm we are in dispute and I will start to accrue
> legal costs immediately thereon. Any such costs will themselves form part of
> the claim. I intend to be as open and forthcoming with you about costs as my
> lawyers permit me.
>
> This potential dispute is between myself only and REvolution. You must
> engage with me directly by answering the questions above with respect to
> data.table. It is a matter for you whether you answer publicly, via your
> lawyers or privately to me.  It is my understanding that any other GPL'd R
> library owners is also entitled to establish, either now or in the future,
> whether they are also potentially in dispute with you on the same basis as
> above. There are up to 1,700 distinct R libraries, each of which could
> potentially generate 1,700 claims of breach of contract on you. One of those
> is the R Foundation, who as license holder for the library "base" have
> stated they will make a public statement in due course. That is a matter for
> the R Foundation, and them alone.  In my potential dispute with you, under
> English law I have 6 years between the date of any as yet unknown breach of
> contract and the date by which I must serve notice on you and submit
> particulars of claim to the court.  My lawyers cannot start to draft
> particulars of claim until we have established we are actually in dispute.
>
> I remind you of the contract by which you are bound by me of your
> distributing of my library, or your distributing of programs (yours or
> otherwise) which use my library :
>
> Licensing FAQ page:http://www.fsf.org/licenses/gpl-faq.html
> Text of the GNU GPL:   http://www.fsf.org/copyleft/gpl.html
> Text of the GNU LGPL:  http://www.fsf.org/copyleft/lgpl.html
> FSF license list page: http://www.fsf.org/licenses/license-list.html
>
> I look fo