This certainly looks like a bug, and there are many ways of inducing bugs that 
only show up with large datasets - buffer overruns, fields that are too small 
to hold the number of rows, etc. Remember that there is NO official 
documentation of the .sas7bdat format; everything has been reverse engineered, 
and if something in the format is different for very large datasets, it may 
well have gone unnoticed.
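
As a rough illustration of the kind of misread that can produce numbers like 
the ones quoted below, here is a small Python sketch. It assumes (and this is 
only a hypothesis, not a diagnosis of read.sas7bdat) that each 8-byte double is 
being read from a position that is off by 4 bytes, so that the high half of the 
real value lands in the low half of the result. The helper name shifted_view 
is made up for this example.

```python
import struct

def shifted_view(value: float) -> float:
    """Hypothetical helper: simulate reading a double 4 bytes out of place.

    The high 4 bytes of the true value become the low 4 bytes of the
    result, and the high 4 bytes of the result are zero-filled.
    """
    raw = struct.pack(">d", float(value))   # 8 bytes, most significant first
    misread = b"\x00" * 4 + raw[:4]         # high word slides into the low word
    return struct.unpack(">d", misread)[0]

# The first three correct rows from the small dataset in the quoted message.
for person_id, age in [(612569, 55), (612571, 48), (612580, 78)]:
    print(f"{shifted_view(person_id):.6e}  {shifted_view(age):.6e}")
```

Under that assumption the loop prints tiny denormal values of the same form as 
the garbled output, which would point towards a misaligned offset rather than 
corrupted data. It says nothing about where in the file the offset goes wrong.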

However, read.sas7bdat is from the sas7bdat package, which has a maintainer. 
He may well be interested in tracking down the root cause if you can show him 
how to generate SAS datasets that reproduce the issue.

Best,
Peter D.

On 19 Nov 2013, at 22:40 , Li, Xiaochun <xiaoc...@iupui.edu> wrote:

> Dear R-ers,
> 
> I was trying to read in a large sas7bdat file (size 148094976 bytes) using 
> 'read.sas7bdat()', but it did not read in the data correctly.  E.g., the 
> first 5 rows will come out like this (I'm omitting other columns to keep it 
> readable):
> 
>       PERSON_ID           age
> 1  5.399114e-315 5.329436e-315
> 2  5.399114e-315 5.328302e-315
> 3  5.399114e-315 5.332026e-315
> 4  5.399114e-315 5.329112e-315
> 5  5.399114e-315 5.331055e-315
> 
> If I reduced the original sas dataset to the first 5 rows, 'read.sas7bdat' 
> read them in correctly:
> 
>  PERSON_ID age
> 1    612569  55
> 2    612571  48
> 3    612580  78
> 4    612606  53
> 5    612617  66
> 
> So for now I first save the SAS dataset as .csv and then read it using 
> 'read.csv'; that works fine.
> 
> Any suggestions as to why 'read.sas7bdat' didn't work, and whether some fix 
> in its code could make it work?
> 
> Thank you.
> _____________________________ 
> Xiaochun Li, Ph.D. 
> Department of Biostatistics 
> Indiana University 
> School of Medicine and
> Richard M. Fairbanks School of Public Health
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
