[Rd] [PATCH 2/2] readtable: add test for type conversion hook 'colConvert'
Signed-off-by: Kurt Van Dijck
---
 tests/reg-tests-2.R         | 21 +
 tests/reg-tests-2.Rout.save | 27 +++
 2 files changed, 48 insertions(+)

diff --git a/tests/reg-tests-2.R b/tests/reg-tests-2.R
index 9fd5242..5026fe7 100644
--- a/tests/reg-tests-2.R
+++ b/tests/reg-tests-2.R
@@ -1329,6 +1329,27 @@ unlink(foo)
 ## added in 2.0.0


+## colConvert in read.table
+probecol <- function(col) {
+    tmp <- as.POSIXlt(col, optional=TRUE, tryFormats=c("%d/%m/%Y %H:%M"));
+    if (all(!is.na(tmp)))
+        return (tmp)
+    tmp <- as.POSIXlt(col, optional=TRUE, tryFormats=c("%d/%m/%Y"));
+    if (all(!is.na(tmp)))
+        return (tmp)
+}
+
+Mat <- matrix(c(1:3, letters[1:3], 1:3, LETTERS[1:3],
+                c("22/4/1969", "8/4/1971", "23/9/1973"),
+                c("22/4/1969 6:01", " 8/4/1971 7:23", "23/9/1973 8:45")),
+              3, 6)
+foo <- tempfile()
+write.table(Mat, foo, sep = ",", col.names = FALSE, row.names = FALSE)
+read.table(foo, sep = ",", colConvert=probecol)
+unlist(sapply(.Last.value, class))
+unlink(foo)
+
+
 ## write.table with complex columns (PR#7260, in part)
 write.table(data.frame(x = 0.5+1:4, y = 1:4 + 1.5i), file = "")
 # printed all as complex in 2.0.0.

diff --git a/tests/reg-tests-2.Rout.save b/tests/reg-tests-2.Rout.save
index 598dd71..668898e 100644
--- a/tests/reg-tests-2.Rout.save
+++ b/tests/reg-tests-2.Rout.save
@@ -4206,6 +4206,33 @@ Warning message:
 > ## added in 2.0.0
 >
 >
+> ## colConvert in read.table
+> probecol <- function(col) {
++     tmp <- as.POSIXlt(col, optional=TRUE, tryFormats=c("%d/%m/%Y %H:%M"));
++     if (all(!is.na(tmp)))
++         return (tmp)
++     tmp <- as.POSIXlt(col, optional=TRUE, tryFormats=c("%d/%m/%Y"));
++     if (all(!is.na(tmp)))
++         return (tmp)
++ }
+>
+> Mat <- matrix(c(1:3, letters[1:3], 1:3, LETTERS[1:3],
++                 c("22/4/1969", "8/4/1971", "23/9/1973"),
++                 c("22/4/1969 6:01", " 8/4/1971 7:23", "23/9/1973 8:45")),
++               3, 6)
+> foo <- tempfile()
+> write.table(Mat, foo, sep = ",", col.names = FALSE, row.names = FALSE)
+> read.table(foo, sep = ",", colConvert=probecol)
+  V1 V2 V3 V4         V5                  V6
+1  1  a  1  A 1969-04-22 1969-04-22 06:01:00
+2  2  b  2  B 1971-04-08 1971-04-08 07:23:00
+3  3  c  3  C 1973-09-23 1973-09-23 08:45:00
+> unlist(sapply(.Last.value, class))
+       V1        V2        V3        V4       V51       V52       V61       V62
+"integer"  "factor" "integer"  "factor" "POSIXlt"  "POSIXt" "POSIXlt"  "POSIXt"
+> unlink(foo)
+>
+>
 > ## write.table with complex columns (PR#7260, in part)
 > write.table(data.frame(x = 0.5+1:4, y = 1:4 + 1.5i), file = "")
 "x" "y"
-- 
1.8.5.rc3
[Rd] [PATCH 1/2] readtable: add hook for type conversions per column
This commit adds a function parameter to read.table. The function is called
for every column. The goal is to allow specific (non-standard) type
conversions depending on the input.

When the parameter is not given, or the function returns NULL, the legacy
default applies. The colClasses parameter still takes precedence, i.e.
colConvert only applies to the default conversions.

This makes it possible to properly load a .csv with timestamps expressed in
the (quite common) %d/%m/%y %H:%M format, which was impossible before:
overruling as.POSIXlt only makes a copy in the user's namespace, and
read.table would still use the base version of as.POSIXlt.

Rather than fixing my specific requirement, this hook allows probing for any
custom format and doing smart things with little syntax.

Signed-off-by: Kurt Van Dijck
---
 src/library/utils/R/readtable.R | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/library/utils/R/readtable.R b/src/library/utils/R/readtable.R
index 238542e..076a707 100644
--- a/src/library/utils/R/readtable.R
+++ b/src/library/utils/R/readtable.R
@@ -65,6 +65,7 @@ function(file, header = FALSE, sep = "", quote = "\"'", dec = ".",
              strip.white = FALSE, blank.lines.skip = TRUE,
              comment.char = "#",
              allowEscapes = FALSE, flush = FALSE,
              stringsAsFactors = default.stringsAsFactors(),
+             colConvert = NULL,
              fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
 {
     if (missing(file) && !missing(text)) {
@@ -226,9 +227,18 @@ function(file, header = FALSE, sep = "", quote = "\"'", dec = ".",
     if(rlabp) do[1L] <- FALSE # don't convert "row.names"
     for (i in (1L:cols)[do]) {
         data[[i]] <-
-            if (is.na(colClasses[i]))
+            if (is.na(colClasses[i])) {
+                tmp <- NULL
+                if (!is.null(colConvert))
+                    # attempt to convert from user provided hook
+                    tmp <- colConvert(data[[i]])
+                if (!is.null(tmp))
+                    (tmp)
+                else
+                    # fallback, default
                     type.convert(data[[i]], as.is = as.is[i], dec=dec,
                                  numerals=numerals, na.strings = character(0L))
+            }
             ## as na.strings have already been converted to
             else if (colClasses[i] == "factor") as.factor(data[[i]])
             else if (colClasses[i] == "Date") as.Date(data[[i]])
-- 
1.8.5.rc3
Re: [Rd] [PATCH 1/2] readtable: add hook for type conversions per column
Hello,

I want to find out whether this patch is acceptable, and if not, what should
change.

Kind regards,
Kurt
Re: [Rd] [PATCH 1/2] readtable: add hook for type conversions per column
On Tue, 26 Mar 2019 12:48:12 -0700, Michael Lawrence wrote:
> Please file a bug on bugzilla so we can discuss this further.

All fine. I didn't find a way to create an account on bugs.r-project.org.
Did I just not see it, or do I need administrator assistance?

Kind regards,
Kurt
Re: [Rd] [RFC] readtable enhancement
Thank you for your answers.

I would rather not file a new bug, since what I coded isn't really a bug.

The problem I (and my colleagues) have today is very mundane: we read .csv
files with a lot of columns, most of which contain date-time stamps, coded as
DD/MM/YYYY HH:MM. This is not exotic, but the base library's read.table (and
derivatives) only accepts date-times in a limited number of possible formats
(which I understand very well). We could specify a format for each column
individually, but that syntax is rather difficult to maintain.

My solution to this specific problem became a trivial, yet generic, extension
to read.table. Rather than relying on the built-in type detection, I added a
parameter: a function that is called for each to-be-type-probed column, so I
can overrule the limited built-in default. If the function returns nothing,
the built-in default is still used.

This way, I could construct a type-probing function that is straightforward,
not hard to code, and keeps the read.table calls for my .csv files acceptable.

I'm sure I'm not the only one dealing with such needs (especially since
date-time formats exist in enormous variety), but I want to stress here that
my approach is agnostic to my specific problem.

For those asking to 'show me the code', I redirect to my 2nd patch, where the
tests have been extended with my specific problem.

What are your opinions about this?

Kind regards,
Kurt
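A minimal sketch of the probe-function idea described above, assuming the
colConvert argument proposed in PATCH 1/2 (the function and file names here
are made up for illustration; the probe mirrors the one in the PATCH 2/2
tests):

    ## hypothetical probe: accept one extra date-time format, else defer
    probe.datetime <- function(col) {
        tmp <- as.POSIXlt(col, optional = TRUE, tryFormats = "%d/%m/%Y %H:%M")
        if (all(!is.na(tmp)))
            return(tmp)
        NULL  # NULL lets read.table fall back to its built-in type detection
    }

    ## dat <- read.table("timestamps.csv", sep = ",", colConvert = probe.datetime)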
Re: [Rd] [RFC] readtable enhancement
Hey,

In the meantime, I submitted a bug. Thanks for the assistance on that.

> and I'm not convinced that
> coercion failures should fallback gracefully to the default.

The graceful fallback:
- makes the code more complex
+ keeps colConvert implementations small
+ requires the user to implement only what differs from the default
+ seemed to me the smallest overall effort

In my opinion, the graceful fallback makes the thing better, but without it
the colConvert parameter remains useful; it would still fill a gap.

> The implementation needs work though,

Other than removing the graceful fallback?

Kind regards,
Kurt

On Wed, 27 Mar 2019 14:28:25 -0700, Michael Lawrence wrote:
> This has some nice properties:
> 1) It self-documents the input expectations in a similar manner to
>    colClasses.
> 2) The implementation could eventually "push down" the coercion, e.g.,
>    calling it on each chunk of an iterative read operation.
>
> The implementation needs work though, and I'm not convinced that
> coercion failures should fall back gracefully to the default.
>
> Feature requests fall under a "bug" in bugzilla terminology, so please
> submit this there. I think I've made you an account.
>
> Thanks,
> Michael
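For illustration of the trade-off discussed above: even with the graceful
fallback, a caller can still fail loudly by checking the resulting column
classes after the read. This sketch reuses foo and probecol from the PATCH
2/2 tests and assumes the proposed colConvert argument (not part of released
R):

    dat <- read.table(foo, sep = ",", colConvert = probecol)
    ## stop with an error if the probe silently fell back for column 6
    stopifnot(inherits(dat$V6, "POSIXlt"))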
Re: [Rd] [RFC] readtable enhancement
On Wed, 27 Mar 2019 22:55:06 -0700, Gabriel Becker wrote:
> Kurt,
>
> Cool idea, and great "seeing new faces" on here proposing things and
> engaging with R-core.
>
> Some comments on the issue of fallbacks below.
>
> Another way of viewing coercion failure, I think, is that either the
> user-supplied converter has a bug in it or was mistakenly applied in a
> situation where it shouldn't have been. If that's the case, the
> fail-early-and-loud paradigm might ultimately be more helpful to users
> there.
>
> Another thought in the same vein is that if fallback occurs, the
> returned result will not be what the user asked for and is expecting.
> So either their code which assumes (e.g.) that a column has correctly
> parsed as a date is going to break in mysterious (to them) ways, or
> they have to put a bunch of their own checking logic after the call to
> see if their converters actually worked, in order to protect themselves
> from that. Neither really seems ideal to me; I think an error would be
> better, myself. I'm more of a software developer than a script
> writer/analyst though, so it's possible others' opinions would differ
> (though I'd be a bit surprised by that in this particular case, given
> the danger).

I see. So what if we provide a default colConvert, named e.g.
colConvertBuiltin, which is used when colConvert is not given?

1) This respects 'fail early and loud'.
2) The user would get what he asks for.
3) A colConvert implementation would be able to call colConvertBuiltin
   manually if desired, so colConvert stays limited to adding on top of the
   default implementation.

If this is acceptable, I'll prepare a new patch.

Kind regards,
Kurt
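A sketch of option 3) above, assuming a hypothetical colConvertBuiltin were
exported as the default converter (roughly today's type.convert() behaviour);
neither colConvert nor colConvertBuiltin exists in released R:

    myConvert <- function(col) {
        tmp <- as.POSIXlt(col, optional = TRUE, tryFormats = "%d/%m/%Y %H:%M")
        if (all(!is.na(tmp)))
            return(tmp)
        colConvertBuiltin(col)  # explicit, user-chosen fallback to the default
    }

    ## read.table(file, sep = ",", colConvert = myConvert)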
Re: [Rd] [Bug 17546] extend readtable with a hook for column type detection
Hey,

Does anyone have comments on v4 of the proposed patch?

Kind regards,
Kurt
[Rd] spss long labels
Hi,

A frequently seen issue with importing SPSS data files is that R does not
import the 'long variable names'.

I built a patch on the R project's foreign module, in order to import the
'long variable names' from SPSS (record 7, subtype 13). To complete the job,
I had to expand the "struct variable" definition to hold 64 + 1 characters.
I'm not aware of side effects.

The sfm-read.c code works fine. I didn't test a variety of platforms, as I
don't have an idea of what is regarded as sufficient testing. Anyway, I don't
expect major trouble there (no byte-swapping problems, no 32<->64 bit issues)
as it's mainly character processing.

The patch is relative to the foreign directory. It was created against the
trunk of the R project yesterday. We would appreciate it if you imported this
patch into the main tree.

Kind regards,
Kurt Van Dijck (C programmer) & Ilse Laurijssen (R user)
Belgium

Index: src/sfm-read.c
===================================================================
--- src/sfm-read.c      (revision 5168)
+++ src/sfm-read.c      (working copy)
@@ -188,6 +188,8 @@
 static int read_variables (struct file_handle * h, struct variable *** var_by_index);
 static int read_machine_int32_info (struct file_handle * h, int size, int count, int *encoding);
 static int read_machine_flt64_info (struct file_handle * h, int size, int count);
+static int read_long_var_names (struct file_handle * h, struct dictionary *
+    , unsigned long size, unsigned int count);
 static int read_documents (struct file_handle * h);

 /* Displays the message X with corrupt_msg, then jumps to the lossage
@@ -418,11 +420,15 @@
       break;

     case 7: /* Multiple-response sets (later versions of SPSS). */
-    case 13: /* long variable names. PSPP now has code for these
-                that could be ported if someone is interested. */
       skip = 1;
       break;

+    case 13: /* long variable names. PSPP now has code for these
+                that could be ported if someone is interested. */
+      if (!read_long_var_names(h, ext->dict, data.size, data.count))
+        goto lossage;
+      break;
+
     case 16: /* See
         http://www.nabble.com/problem-loading-SPSS-15.0-save-files-t2726500.html */
       skip = 1;
       break;
@@ -584,14 +590,72 @@
   return 0;
 }

+/* Read record type 7, subtype 13.
+ * long variable names
+ */
 static int
+read_long_var_names (struct file_handle * h, struct dictionary * dict
+    , unsigned long size, unsigned int count)
+{
+  char * data;
+  unsigned int j;
+  struct variable ** lp;
+  struct variable ** end;
+  char * p;
+  char * endp;
+  char * val;
+  if ((1 != size)||(0 == count)) {
+    warning("%s: strange record info seen, size=%u, count=%u"
+            ", ignoring long variable names"
+            , h->fn, size, count);
+    return 0;
+  }
+  size *= count;
+  data = Calloc (size +1, char);
+  bufread(h, data, size, 0);
+  /* parse */
+  end = &dict->var[dict->nvar];
+  p = data;
+  do {
+    if (0 != (endp = strchr(p, '\t')))
+      *endp = 0; /* put null terminator */
+    if (0 == (val = strchr(p, '='))) {
+      warning("%s: no long variable name for variable '%s'", h->fn, p);
+    } else {
+      *val = 0;
+      ++val;
+      /* now, p is key, val is long name */
+      for (lp = dict->var; lp < end; ++lp) {
+        if (!strcmp(lp[0]->name, p)) {
+          strncpy(lp[0]->name, val, sizeof(lp[0]->name));
+          break;
+        }
+      }
+      if (lp >= end) {
+        warning("%s: long variable name mapping '%s' to '%s'"
+                "for variable which does not exist"
+                , h->fn, p, val);
+      }
+    }
+    p = &endp[1]; /* put to next */
+  } while (endp);
+
+  free(data);
+  return 1;
+
+lossage:
+  free(data);
+  return 0;
+}
+
+static int
 read_header (struct file_handle * h, struct sfm_read_info * inf)
 {
   struct sfm_fhuser_ext *ext = h->ext;  /* File extension strcut. */
   struct sysfile_header hdr;            /* Disk buffer. */
   struct dictionary *dict;              /* File dictionary. */
   char prod_name[sizeof hdr.prod_name + 1];  /* Buffer for product name. */
-  int skip_amt = 0;     /* Amount of product name to omit. */
+  int skip_amt = 0;                     /* Amount of product name to omit. */
   int i;

   /* Create the dictionary. */
@@ -1495,7 +1559,7 @@
 /* Reads one case from system file H into the value array PERM
    according to the instructions given in associated dictionary DICT,
    which must have the get.* elements appropriately set.  Returns
-   nonzero only if successful. */
+   nonzero only if successful.  */
 int
 sfm_read_case (struct file_handle * h
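For clarity: the payload of a record type 7, subtype 13 is one tab-separated
string of SHORTNAME=longname pairs. The parsing done by the patch above
corresponds roughly to this R sketch (the payload shown is made up, matching
the test file posted later in this thread):

    payload <- "VARIABLE=variable1\tV2_A=variable2"
    pairs <- strsplit(strsplit(payload, "\t", fixed = TRUE)[[1]], "=", fixed = TRUE)
    setNames(vapply(pairs, `[`, "", 2L),   # long names ...
             vapply(pairs, `[`, "", 1L))   # ... keyed by the short names
    #    VARIABLE        V2_A
    # "variable1" "variable2"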
[Rd] spss long labels
Hi all,

I got no feedback at all concerning the merge of this patch into the source
tree. Am I supposed to do this myself? How should I do this (do I have
subversion commit access)? Is this patch acceptable at all? Is it being
tested?

I got some personal reactions on my post, showing there is general interest
in getting rid of the inconvenience of importing long labels from SPSS files.

Kurt Van Dijck wrote:
> Hi,
>
> A frequently seen issue with importing SPSS data files is that R does not
> import the 'long variable names'.
>
> I built a patch on the R project's foreign module, in order to import the
> 'long variable names' from SPSS (record 7, subtype 13). To complete the
> job, I had to expand the "struct variable" definition to hold 64 + 1
> characters. I'm not aware of side effects.
>
> The sfm-read.c code works fine. I didn't test a variety of platforms, as I
> don't have an idea of what is regarded as sufficient testing. Anyway, I
> don't expect major trouble there (no byte-swapping problems, no 32<->64 bit
> issues) as it's mainly character processing.
>
> The patch is relative to the foreign directory. It was created against the
> trunk of the R project yesterday. We would appreciate it if you imported
> this patch into the main tree.
>
> Kind regards,
> Kurt Van Dijck (C programmer) & Ilse Laurijssen (R user)
> Belgium
[Rd] spss long labels
On Tue, Jul 15, 2008 at 09:29:22AM +0100, Prof Brian Ripley wrote:
> On Tue, 15 Jul 2008, Martin Maechler wrote:
>
> > Hi Kurt,
> >
> > >>>>> "KVD" == Kurt Van Dijck <[EMAIL PROTECTED]>
> > >>>>>     on Wed, 09 Jul 2008 10:05:39 +0200 writes:
> >
> >     KVD> Hi all, I got no feedback at all concerning the merge
> >     KVD> of this patch in the source tree.  Am I supposed to do
> >     KVD> this myself?  How should I do this (do I have subversion
> >     KVD> commit access)?  Is this patch acceptable at all?  Is it
> >     KVD> being tested?
> >
> > I don't know if it's being tested.
> > It's vacation and traveling time, also for the R core team.
>
> Indeed.  This is on my TODO list, but I've been away (and unexpectedly
> mainly offline) for the last two weeks, and will be again until Friday.
>
> Hopefully I will have a chance to take a look next week, but we do need
> at least one example file.  (I could generate SPSS examples, but they may
> not be what you are trying to test.)
>
> > The foreign package source is kept in the svn archive
> >
> >     https://svn.r-project.org/R-packages/trunk/foreign/
> >
> > and I have tried to apply your patch (from July 2) to the sources, but
> >
> >     patch -p0 < K_Van_Dijck_patch
> >
> >     patching file src/sfm-read.c
> >     Hunk #1 FAILED at 188.
> >     Hunk #2 FAILED at 420.
> >     Hunk #3 FAILED at 590.
> >     Hunk #4 FAILED at 1559.
> >     4 out of 4 hunks FAILED -- saving rejects to file src/sfm-read.c.rej
> >     patching file src/var.h.in
> >     Hunk #1 FAILED at 41.
> >     Hunk #2 FAILED at 232.
> >     Hunk #3 FAILED at 306.
> >     Hunk #4 FAILED at 377.
> >     4 out of 4 hunks FAILED -- saving rejects to file src/var.h.in.rej
> >
> > Could you provide a patch against the development code from the
> > above url?
> > (after installing 'subversion', you get the development directory by
> >     svn co https://svn.r-project.org/R-packages/trunk/foreign/
> > )

I had problems with whitespace in the patch file; I attached a new one.

> >     KVD> I got some personal reactions on my post, proving there
> >     KVD> is general interest in getting rid of the inconvenience
> >     KVD> of importing long labels from SPSS files.
> >
> > My problem is that I cannot do much testing apart from the tests
> > already present in foreign/tests/spss.R
> >
> > Could you provide a new small *.sav file and a corresponding
> > read.spss() call which exhibits the problems and is fixed by your patch?
> >
> > Thank you in advance for your contribution!
> > Best regards,

I ran the spss.R in tests/; it worked fine. Be sure to clean all object files
before compiling.

Ilse made me a test .sav file (attached) with 2 variables (variable1 &
variable2), 3 records.

This piece of R code shows the problem:

# to resolve locale problem
Sys.setlocale(locale="C");

# read spss datafile
library(foreign);
data = read.spss("spss_long.sav", to.data.frame=TRUE);
# to.data.frame not necessary, but gives nicer output

# commands to show the data, the variable names and labels
data;
names(data);
attr(data, "variable.labels");

# result in unpatched version:
# both variable names are in shortened form
# (max 8 characters; provided in SPSS-file)
#> data;
#  VARIABLE V2_A
#1        1    1
#2        2    1
#3        2    3
#
#> names(data);
#[1] "VARIABLE" "V2_A"
#
#> attr(data,"variable.labels");
#   VARIABLE        V2_A
#"variable1" "variable2"

# and in patched version:
# variable names are the full names as originally defined in the SPSS-file
#> data;
#  variable1 variable2
#1         1         1
#2         2         1
#3         2         3
#> names(data);
#[1] "variable1" "variable2"
#> attr(data, "variable.labels");
#  variable1   variable2
#"variable1" "variable2"

Kind regards,
Kurt & Ilse

Index: src/sfm-read.c
===================================================================
--- src/sfm-read.c      (revision 5175)
+++ src/sfm-read.c      (working copy)
@@ -188,6 +188,8 @@
 static int read_variables (struct file_handle * h, struct variable *** var_by_index);
 static int read_machine_int32_info (struct file_handle * h, int size, int count, int *encoding);
 static int read_machine_flt64_info (struct file_handle * h, int size, int count);
+static int read_long_var_names (struct file_handle * h, struct dictionary *
+    , unsigned long size, unsigned int count);
 static int read_documents (struct file_h
[Rd] spss endianness bugfix
Hi,

We just upgraded R on Mac OS X to foreign package 0.8.27 from CRAN to get the
SPSS long variable names. The script tests/spss.R runs fine. Thanks for
importing the SPSS long labels patch and making it available for Mac OS X!

The first real-life data file, however, fails to load with "Unexpected end of
file". The SPSS file was saved on a P4 Linux machine, with R running on Mac
OS X (ppc). SPSS on Mac OS X could read the file. Saving it from SPSS there
gives the same problem when reading on P4 Linux. Testing proved that the
problem existed with earlier versions of the foreign package too.

I found the problem in src/sfm-read.c, read_documents(): the n_lines variable
does not get byte-swapped. This patch solves the problem.

Kind regards,
Kurt Van Dijck & Ilse Laurijssen

Index: src/sfm-read.c
===================================================================
--- src/sfm-read.c      (revision 5177)
+++ src/sfm-read.c      (working copy)
@@ -1336,6 +1336,8 @@
            h->fn));

   assertive_bufread(h, &n_lines, sizeof n_lines, 0);
+  if (ext->reverse_endian)
+    bswap_int32 (&n_lines);
   dict->n_documents = n_lines;
   if (dict->n_documents <= 0)
     lose ((_("%s: Number of document lines (%d) must be greater than 0"),
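To illustrate why the missing swap leads to "Unexpected end of file": a small
count read with the wrong byte order turns into a huge (or negative) number,
so the reader then expects far more document lines than the file contains. A
short R sketch (purely illustrative, not part of the patch):

    bytes <- writeBin(3L, raw(), size = 4, endian = "little")
    readBin(bytes, "integer", size = 4, endian = "little")  # 3
    readBin(bytes, "integer", size = 4, endian = "big")     # 50331648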