On 2009-01-20 15:24, Gavin Simpson wrote:
On Tue, 2009-01-20 at 14:45 +0100, Ian Jenkinson wrote:
See the R Import/Export manual.  Also
RSiteSearch("import excel")
gives many hits.  It seems as if this question is
asked almost daily.

On Sun, Jan 18, 2009 at 9:15 AM, Michele Santacatterina
<miksa...@gmail.com> wrote:
Hello,

i have a xls file. I will read it in r, what library-command i use for
this??

any ideas??
I feel concerned because I have just spent a frustrating couple of days trying to read an Excel (xls) file, with the aid of the R book (Crawley, 2007), and R help files. I failed, but finally found a workaround. My experience might help others.

You did read ?read.table yes?
Unfortunately not. Now I see that it is rich and detailed. Thank you.
There are three arguments there that can help in such situations:
'colClasses' allows the finest grained control over how R imports your
text files. You specify what each column is, noting that if you have
lots of columns, things like
c("numeric", rep("character", 12))

will deal with runs of columns of the same type, without having to type
them all by hand.
I'm a bit mystified by this.
'as.is' is a vector of logicals (TRUE/FALSE) that controls whether a
column is read in as is or converted.
I have just tried your suggestion:

> TTT<-read.table("/D/.../090117T_P.txt",header=TRUE,as.is="Log.microplank.biomass")

I CONFIRM IT WORKS! Sure enough elements in my "Log.microplank.biomass" column are now "numeric":

> (TTT[5,7])
[1] 1.612784
> class(TTT[5,7])
[1] "numeric"

Indeed, I see that Crawley (2007) gives this example on p.100:
[>] murder<-read.table("c:\\temp\\murders.txt,header=T,as.is="region")
, but until now I hadn't understood what "region" meant, so I didn't see how to use "as.is". Now I realise "region" is a header name in that particular data.frame.
Sorry for being such a newby!

'stringsAsFactors' a single logical. Should all character variables be
converted to factors.
I hadn't come across this. This would be useful if you wanted variables as factors, but my problem was that I got "factors" when I wanted "numeric"
Some of these would have been useful in your case.

I'm not sure what you tried, but I have found that saving an .xls file
as a CSV via OpenOffice.org (on Linux) and subsequently reading it in
with read.csv("foo.csv", ...) to be reasonably fool proof, especially
when one makes use of the arguments about for fine-grained processing.
I can appreciate this would probably have worked for me too, had I known how to do the "fine-grained processing".
Someone in this thread posted a response that included the use of RODBC,
which I haven't tried, but there are a plethora of ways to read data
from Excel without having to torture yourself and the data formats to do
so.
I saw this about RODBC, but it seemed a complicated way of doing things, and in any case it seems that to run RODBC you need Excel 2004 or higher, which I would need to buy. One beautiful thing about R and OOo is that they are excellent and they cost nothing, so you don't have to buy, borrow or steal them.
HTH

G

Ian
My data were in an Excel xls file

I have R (version 2.6.2) installed in Kubuntu Linux
I also have R (version 2.6.2) installed in Windows XP SP_3 running in VirtualBox (a Virtual Computer) in Kubuntu, and I have (very old) Excel 97 on this system.

I wasted a lot of time exporting from Excel in various formats (txt, csv, dif, tab-delimited, ;-delimited ,-delimited, etc.). (I checked they were of correct format by peeking with a text editor.) Then I would try reading using e.g. read.table("[file path]",header=TRUE) or read.csv(...) or read.csv2(...), or read.DIF(...), with or without "header=TRUE" or "header =FALSE". I also copied to the "clipboard" and tried reading using read.DIF("clipboard")
In many of these cases I did get a data.frame that looked nice on-screen.

My recurrent problem, however, was that many of the numeric variables in the resultant data frame were CLASS "factor". If you do arithmetic or plotting on factors, either it fails or gives wrong results.

So I spent hours using (as.numeric(...)) with variants and permutations, etc. Most times (as.numeric(...)) seems to work, but actually the data either remained unchanged (as a "factor") or gave "numeric" but wrong numbers.

I read the xls file using gnumeric application and saved as a dif file, then used read.DIF("[file path]"). This gave some correct "numeric" numbers but jumbled and partly duplicated.

N.B. My problems were essentially the same whether I used R in XP or in Linux (kubuntu)

MY SOLUTION (working in Linux):
Read the Excel file (xls) using Open Office.org (version 2.4.1) (downloadable for free for Linux or Windows).
Save as dif file.
In R,   TT<-read.DIF("[file path]",header=TRUE)
It worked, and all my numerical data elements were "numeric", correct and in the right order. Omit "header=TRUE" if you don't want the first elements of the spreadsheet columns declared as headers.

Hope this may help someone.

Here's a subset of my data in a data.frame (environmental data on plankton):

 >TT
Stn Day Mean.salinity Mean.temperature Secchi.disc. Log.microplank.biomass 1 1 12 0 14 0.7 1.954242509 2 1 70 13.5 16.55 0.3 3.083860801 3 1 93 13.45 16.85 0.6 2.651278014 4 1 153 6.78 14.2 0.5 2.075546961 5 1 200 0 9.3 0.7 1.612783857 6 1 231 0 7.1 0.8 1.491361694 7 1 283 0 8.8 0.4 2.123851641 8 1 330 4.95 9.45 0.3 2.276461804 9 1 370 16.6 12.3 0.4 2.728353782 10 3 12 16.25 11.95 0.55 2.025305865 11 3 70 22.35 16.1 0.5 2.096910013 12 3 93 26.05 17.15 1.5 1.707570176 13 3 153 23.4 14.2 1 1.755874856 14 3 200 14.05 8.6 0.4 1.812913357 15 3 231 7.9 6.3 0.3 1.897627091 16 3 283 11.2 7.25 0.7 1.832508913 17 3 330 19.95 8.1 0.5 1.785329835 18 3 370 24.35 11.5 0.4 2.361727836 19 4 12 18.1 12.05 0.6 1.792391689 20 4 70 24.35 15.9 0.7 1.973127854 21 4 93 27 17.35 1.3 1.982271233 22 4 153 25.8 14.2 0.8 1.924279286 23 4 200 16.2 9 0.4 1.653212514 24 4 231 11.5 6.85 0.4 1.819543936 25 4 283 10.95 8.2 0.25 2.096910013 26 4 330 19.7 8.45 0.4 2.025305865 27 4 370 25.6 11.5 0.5 2.274157849

Ian Jenkinson

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Dr Ian R. Jenkinson
Agence de Conseil et de Recherche Océanographiques
Lavergne
19320 La Roche Canillac
France

+33 555 29 19 48
+33 555 29 19 82 (fax)
+33 608 89 13 62 (mobile)
ian.jenkin...@wanadoo.fr
http://assoc.orange.fr/acro/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to