[Rd] [PATCH 2/2] readtable: add test for type conversion hook 'colConvert'
Signed-off-by: Kurt Van Dijck
---
 tests/reg-tests-2.R         | 21 +
 tests/reg-tests-2.Rout.save | 27 +++
 2 files changed, 48 insertions(+)

diff --git a/tests/reg-tests-2.R b/tests/reg-tests-2.R
index 9fd5242..5026fe7 100644
--- a/tests/reg-tests-2.R
+++ b/tests/reg-tests-2.R
@@ -1329,6 +1329,27 @@ unlink(foo)
 ## added in 2.0.0


+## colConvert in read.table
+probecol <- function(col) {
+    tmp <- as.POSIXlt(col, optional=TRUE, tryFormats=c("%d/%m/%Y %H:%M"));
+    if (all(!is.na(tmp)))
+        return (tmp)
+    tmp <- as.POSIXlt(col, optional=TRUE, tryFormats=c("%d/%m/%Y"));
+    if (all(!is.na(tmp)))
+        return (tmp)
+}
+
+Mat <- matrix(c(1:3, letters[1:3], 1:3, LETTERS[1:3],
+                c("22/4/1969", "8/4/1971", "23/9/1973"),
+                c("22/4/1969 6:01", " 8/4/1971 7:23", "23/9/1973 8:45")),
+              3, 6)
+foo <- tempfile()
+write.table(Mat, foo, sep = ",", col.names = FALSE, row.names = FALSE)
+read.table(foo, sep = ",", colConvert=probecol)
+unlist(sapply(.Last.value, class))
+unlink(foo)
+
+
 ## write.table with complex columns (PR#7260, in part)
 write.table(data.frame(x = 0.5+1:4, y = 1:4 + 1.5i), file = "")
 # printed all as complex in 2.0.0.

diff --git a/tests/reg-tests-2.Rout.save b/tests/reg-tests-2.Rout.save
index 598dd71..668898e 100644
--- a/tests/reg-tests-2.Rout.save
+++ b/tests/reg-tests-2.Rout.save
@@ -4206,6 +4206,33 @@ Warning message:
 > ## added in 2.0.0
 >
 >
+> ## colConvert in read.table
+> probecol <- function(col) {
++     tmp <- as.POSIXlt(col, optional=TRUE, tryFormats=c("%d/%m/%Y %H:%M"));
++     if (all(!is.na(tmp)))
++         return (tmp)
++     tmp <- as.POSIXlt(col, optional=TRUE, tryFormats=c("%d/%m/%Y"));
++     if (all(!is.na(tmp)))
++         return (tmp)
++ }
+>
+> Mat <- matrix(c(1:3, letters[1:3], 1:3, LETTERS[1:3],
++                 c("22/4/1969", "8/4/1971", "23/9/1973"),
++                 c("22/4/1969 6:01", " 8/4/1971 7:23", "23/9/1973 8:45")),
++               3, 6)
+> foo <- tempfile()
+> write.table(Mat, foo, sep = ",", col.names = FALSE, row.names = FALSE)
+> read.table(foo, sep = ",", colConvert=probecol)
+  V1 V2 V3 V4         V5                  V6
+1  1  a  1  A 1969-04-22 1969-04-22 06:01:00
+2  2  b  2  B 1971-04-08 1971-04-08 07:23:00
+3  3  c  3  C 1973-09-23 1973-09-23 08:45:00
+> unlist(sapply(.Last.value, class))
+       V1        V2        V3        V4       V51       V52       V61       V62
+"integer"  "factor" "integer"  "factor" "POSIXlt"  "POSIXt" "POSIXlt"  "POSIXt"
+> unlink(foo)
+>
+>
 > ## write.table with complex columns (PR#7260, in part)
 > write.table(data.frame(x = 0.5+1:4, y = 1:4 + 1.5i), file = "")
 "x" "y"
-- 
1.8.5.rc3
[Rd] [PATCH 1/2] readtable: add hook for type conversions per column
This commit adds a function parameter to read.table. The function is called
for every column. The goal is to allow specific (non-standard) type
conversions depending on the input.

When the parameter is not given, or the function returns NULL, the legacy
default applies. The colClasses parameter still takes precedence, i.e.
colConvert only applies to the default conversions.

This makes it possible to properly load a .csv with timestamps expressed in
the (quite common) %d/%m/%y %H:%M format, which was impossible before:
overruling as.POSIXlt only makes a copy in the user's namespace, and
read.table would still use the base version of as.POSIXlt.

Rather than fixing my specific requirement, this hook allows probing for any
custom format and doing smart things with little syntax.

Signed-off-by: Kurt Van Dijck
---
 src/library/utils/R/readtable.R | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/library/utils/R/readtable.R b/src/library/utils/R/readtable.R
index 238542e..076a707 100644
--- a/src/library/utils/R/readtable.R
+++ b/src/library/utils/R/readtable.R
@@ -65,6 +65,7 @@ function(file, header = FALSE, sep = "", quote = "\"'", dec = ".",
              strip.white = FALSE, blank.lines.skip = TRUE,
              comment.char = "#",
              allowEscapes = FALSE, flush = FALSE,
              stringsAsFactors = default.stringsAsFactors(),
+             colConvert = NULL,
              fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
 {
     if (missing(file) && !missing(text)) {
@@ -226,9 +227,18 @@ function(file, header = FALSE, sep = "", quote = "\"'", dec = ".",
     if(rlabp) do[1L] <- FALSE # don't convert "row.names"
     for (i in (1L:cols)[do]) {
         data[[i]] <-
-            if (is.na(colClasses[i]))
+            if (is.na(colClasses[i])) {
+                tmp <- NULL
+                if (!is.null(colConvert))
+                    # attempt to convert from user provided hook
+                    tmp <- colConvert(data[[i]])
+                if (!is.null(tmp))
+                    (tmp)
+                else
+                    # fallback, default
                     type.convert(data[[i]], as.is = as.is[i], dec=dec,
                                  numerals=numerals, na.strings = character(0L))
+            }
             ## as na.strings have already been converted to
             else if (colClasses[i] == "factor") as.factor(data[[i]])
             else if (colClasses[i] == "Date") as.Date(data[[i]])
-- 
1.8.5.rc3
Re: [Rd] [PATCH 1/2] readtable: add hook for type conversions per column
Hello,

I want to find out whether this patch is acceptable, and if not, what should
change.

Kind regards,
Kurt
Re: [Rd] [PATCH 1/2] readtable: add hook for type conversions per column
On Tue, 26 Mar 2019 12:48:12 -0700, Michael Lawrence wrote:
> Please file a bug on bugzilla so we can discuss this further.

All fine. I didn't find a way to create an account on bugs.r-project.org.
Did I just not see it, or do I need administrator assistance?

Kind regards,
Kurt
Re: [Rd] [RFC] readtable enhancement
Thank you for your answers.

I would rather not file a new bug, since what I coded isn't really a bug.

The problem I (and my colleagues) have today is very mundane: we read .csv
files with a lot of columns, most of which contain date-time stamps, coded as
DD/MM/YYYY HH:MM. This is not exotic, but the base library's read.table (and
derivatives) only accepts date-times in a limited number of possible formats
(which I understand very well). We could specify a format for each column
individually, but that syntax is rather difficult to maintain.

My solution to this specific problem became a trivial, yet generic, extension
to read.table. Rather than relying on the built-in type detection, I added a
parameter: a function that is called for each to-be-type-probed column, so I
can overrule the limited built-in default. If the function returns nothing,
the built-in default is still used.

This way, I could construct a type-probing function that is straightforward,
not hard to code, and keeps the read.table calls for my .csv files acceptable.

I'm sure I'm not the only one dealing with such needs (especially since
date-time formats exist in enormous variety), but I want to stress here that
my approach is agnostic to my specific problem.

For those asking to 'show me the code', I redirect to my 2nd patch, where the
tests have been extended with my specific problem.

What are your opinions about this?

Kind regards,
Kurt
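A minimal sketch of the probe-function idea described above, assuming the
colConvert argument proposed in PATCH 1/2 (the function and file names here
are made up for illustration; the probe mirrors the one in the PATCH 2/2
tests):

    ## hypothetical probe: accept one extra date-time format, else defer
    probe.datetime <- function(col) {
        tmp <- as.POSIXlt(col, optional = TRUE, tryFormats = "%d/%m/%Y %H:%M")
        if (all(!is.na(tmp)))
            return(tmp)
        NULL  # NULL lets read.table fall back to its built-in type detection
    }

    ## dat <- read.table("timestamps.csv", sep = ",", colConvert = probe.datetime)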
Re: [Rd] [RFC] readtable enhancement
Hey,

In the meantime, I submitted a bug. Thanks for the assistance on that.

> and I'm not convinced that
> coercion failures should fallback gracefully to the default.

The graceful fallback:
- makes the code more complex
+ keeps colConvert implementations small
+ requires the user to implement only what differs from the default
+ seemed to me the smallest overall effort

In my opinion, the graceful fallback makes the thing better, but without it
the colConvert parameter remains useful; it would still fill a gap.

> The implementation needs work though,

Other than removing the graceful fallback?

Kind regards,
Kurt

On Wed, 27 Mar 2019 14:28:25 -0700, Michael Lawrence wrote:
> This has some nice properties:
> 1) It self-documents the input expectations in a similar manner to
>    colClasses.
> 2) The implementation could eventually "push down" the coercion, e.g.,
>    calling it on each chunk of an iterative read operation.
>
> The implementation needs work though, and I'm not convinced that
> coercion failures should fall back gracefully to the default.
>
> Feature requests fall under a "bug" in bugzilla terminology, so please
> submit this there. I think I've made you an account.
>
> Thanks,
> Michael
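For illustration of the trade-off discussed above: even with the graceful
fallback, a caller can still fail loudly by checking the resulting column
classes after the read. This sketch reuses foo and probecol from the PATCH
2/2 tests and assumes the proposed colConvert argument (not part of released
R):

    dat <- read.table(foo, sep = ",", colConvert = probecol)
    ## stop with an error if the probe silently fell back for column 6
    stopifnot(inherits(dat$V6, "POSIXlt"))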
Re: [Rd] [RFC] readtable enhancement
On Wed, 27 Mar 2019 22:55:06 -0700, Gabriel Becker wrote:
> Kurt,
>
> Cool idea, and great "seeing new faces" on here proposing things and
> engaging with R-core.
>
> Some comments on the issue of fallbacks below.
>
> Another way of viewing coercion failure, I think, is that either the
> user-supplied converter has a bug in it or was mistakenly applied in a
> situation where it shouldn't have been. If that's the case, the
> fail-early-and-loud paradigm might ultimately be more helpful to users
> there.
>
> Another thought in the same vein is that if fallback occurs, the
> returned result will not be what the user asked for and is expecting.
> So either their code which assumes (e.g.) that a column has correctly
> parsed as a date is going to break in mysterious (to them) ways, or
> they have to put a bunch of their own checking logic after the call to
> see if their converters actually worked, in order to protect themselves
> from that. Neither really seems ideal to me; I think an error would be
> better, myself. I'm more of a software developer than a script
> writer/analyst though, so it's possible others' opinions would differ
> (though I'd be a bit surprised by that in this particular case, given
> the danger).

I see. So what if we provide a default colConvert, named e.g.
colConvertBuiltin, which is used when colConvert is not given?

1) This respects 'fail early and loud'.
2) The user would get what he asks for.
3) A colConvert implementation would be able to call colConvertBuiltin
   manually if desired, so colConvert stays limited to adding on top of the
   default implementation.

If this is acceptable, I'll prepare a new patch.

Kind regards,
Kurt
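A sketch of option 3) above, assuming a hypothetical colConvertBuiltin were
exported as the default converter (roughly today's type.convert() behaviour);
neither colConvert nor colConvertBuiltin exists in released R:

    myConvert <- function(col) {
        tmp <- as.POSIXlt(col, optional = TRUE, tryFormats = "%d/%m/%Y %H:%M")
        if (all(!is.na(tmp)))
            return(tmp)
        colConvertBuiltin(col)  # explicit, user-chosen fallback to the default
    }

    ## read.table(file, sep = ",", colConvert = myConvert)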
Re: [Rd] [Bug 17546] extend readtable with a hook for column type detection
Hey,

Does anyone have comments on v4 of the proposed patch?

Kind regards,
Kurt
[Rd] spss long labels
Hi,

A frequently seen issue with importing SPSS data files is that R does not
import the 'long variable names'.

I built a patch on the R project's foreign module, in order to import the
'long variable names' from SPSS (record 7, subtype 13). To complete the job,
I had to expand the "struct variable" definition to hold 64 + 1 characters.
I'm not aware of side effects.

The sfm-read.c code works fine. I didn't test a variety of platforms, as I
don't have an idea of what is regarded as sufficient testing. Anyway, I don't
expect major trouble there (no byte-swapping problems, no 32<->64 bit issues)
as it's mainly character processing.

The patch is relative to the foreign directory. It was created against the
trunk of the R project yesterday. We would appreciate it if you imported this
patch into the main tree.

Kind regards,
Kurt Van Dijck (C programmer) & Ilse Laurijssen (R user)
Belgium

Index: src/sfm-read.c
===================================================================
--- src/sfm-read.c      (revision 5168)
+++ src/sfm-read.c      (working copy)
@@ -188,6 +188,8 @@
 static int read_variables (struct file_handle * h, struct variable *** var_by_index);
 static int read_machine_int32_info (struct file_handle * h, int size, int count, int *encoding);
 static int read_machine_flt64_info (struct file_handle * h, int size, int count);
+static int read_long_var_names (struct file_handle * h, struct dictionary *
+    , unsigned long size, unsigned int count);
 static int read_documents (struct file_handle * h);

 /* Displays the message X with corrupt_msg, then jumps to the lossage
@@ -418,11 +420,15 @@
       break;

     case 7: /* Multiple-response sets (later versions of SPSS). */
-    case 13: /* long variable names. PSPP now has code for these
-                that could be ported if someone is interested. */
       skip = 1;
       break;

+    case 13: /* long variable names. PSPP now has code for these
+                that could be ported if someone is interested. */
+      if (!read_long_var_names(h, ext->dict, data.size, data.count))
+        goto lossage;
+      break;
+
     case 16: /* See
         http://www.nabble.com/problem-loading-SPSS-15.0-save-files-t2726500.html */
       skip = 1;
       break;
@@ -584,14 +590,72 @@
   return 0;
 }

+/* Read record type 7, subtype 13.
+ * long variable names
+ */
 static int
+read_long_var_names (struct file_handle * h, struct dictionary * dict
+    , unsigned long size, unsigned int count)
+{
+  char * data;
+  unsigned int j;
+  struct variable ** lp;
+  struct variable ** end;
+  char * p;
+  char * endp;
+  char * val;
+  if ((1 != size)||(0 == count)) {
+    warning("%s: strange record info seen, size=%u, count=%u"
+            ", ignoring long variable names"
+            , h->fn, size, count);
+    return 0;
+  }
+  size *= count;
+  data = Calloc (size +1, char);
+  bufread(h, data, size, 0);
+  /* parse */
+  end = &dict->var[dict->nvar];
+  p = data;
+  do {
+    if (0 != (endp = strchr(p, '\t')))
+      *endp = 0; /* put null terminator */
+    if (0 == (val = strchr(p, '='))) {
+      warning("%s: no long variable name for variable '%s'", h->fn, p);
+    } else {
+      *val = 0;
+      ++val;
+      /* now, p is key, val is long name */
+      for (lp = dict->var; lp < end; ++lp) {
+        if (!strcmp(lp[0]->name, p)) {
+          strncpy(lp[0]->name, val, sizeof(lp[0]->name));
+          break;
+        }
+      }
+      if (lp >= end) {
+        warning("%s: long variable name mapping '%s' to '%s'"
+                "for variable which does not exist"
+                , h->fn, p, val);
+      }
+    }
+    p = &endp[1]; /* put to next */
+  } while (endp);
+
+  free(data);
+  return 1;
+
+lossage:
+  free(data);
+  return 0;
+}
+
+static int
 read_header (struct file_handle * h, struct sfm_read_info * inf)
 {
   struct sfm_fhuser_ext *ext = h->ext;  /* File extension strcut. */
   struct sysfile_header hdr;            /* Disk buffer. */
   struct dictionary *dict;              /* File dictionary. */
   char prod_name[sizeof hdr.prod_name + 1];  /* Buffer for product name. */
-  int skip_amt = 0;     /* Amount of product name to omit. */
+  int skip_amt = 0;                     /* Amount of product name to omit. */
   int i;

   /* Create the dictionary. */
@@ -1495,7 +1559,7 @@
 /* Reads one case from system file H into the value array PERM
    according to the instructions given in associated dictionary DICT,
    which must have the get.* elements appropriately set.  Returns
-   nonzero only if successful. */
+   nonzero only if successful.  */
 int
 sfm_read_case (struct file_handle * h
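For clarity: the payload of a record type 7, subtype 13 is one tab-separated
string of SHORTNAME=longname pairs. The parsing done by the patch above
corresponds roughly to this R sketch (the payload shown is made up, matching
the test file posted later in this thread):

    payload <- "VARIABLE=variable1\tV2_A=variable2"
    pairs <- strsplit(strsplit(payload, "\t", fixed = TRUE)[[1]], "=", fixed = TRUE)
    setNames(vapply(pairs, `[`, "", 2L),   # long names ...
             vapply(pairs, `[`, "", 1L))   # ... keyed by the short names
    #    VARIABLE        V2_A
    # "variable1" "variable2"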
[Rd] spss long labels
Hi all,

I got no feedback at all concerning the merge of this patch into the source
tree. Am I supposed to do this myself? How should I do this (do I have
subversion commit access)? Is this patch acceptable at all? Is it being
tested?

I got some personal reactions on my post, showing there is general interest
in getting rid of the inconvenience of importing long labels from SPSS files.

Kurt Van Dijck wrote:
> Hi,
>
> A frequently seen issue with importing SPSS data files is that R does not
> import the 'long variable names'.
>
> I built a patch on the R project's foreign module, in order to import the
> 'long variable names' from SPSS (record 7, subtype 13). To complete the
> job, I had to expand the "struct variable" definition to hold 64 + 1
> characters. I'm not aware of side effects.
>
> The sfm-read.c code works fine. I didn't test a variety of platforms, as I
> don't have an idea of what is regarded as sufficient testing. Anyway, I
> don't expect major trouble there (no byte-swapping problems, no 32<->64 bit
> issues) as it's mainly character processing.
>
> The patch is relative to the foreign directory. It was created against the
> trunk of the R project yesterday. We would appreciate it if you imported
> this patch into the main tree.
>
> Kind regards,
> Kurt Van Dijck (C programmer) & Ilse Laurijssen (R user)
> Belgium
[Rd] spss long labels
On Tue, Jul 15, 2008 at 09:29:22AM +0100, Prof Brian Ripley wrote:
> On Tue, 15 Jul 2008, Martin Maechler wrote:
>
> > Hi Kurt,
> >
> > >>>>> "KVD" == Kurt Van Dijck <[EMAIL PROTECTED]>
> > >>>>>     on Wed, 09 Jul 2008 10:05:39 +0200 writes:
> >
> >     KVD> Hi all, I got no feedback at all concerning the merge
> >     KVD> of this patch in the source tree.  Am I supposed to do
> >     KVD> this myself?  How should I do this (do I have subversion
> >     KVD> commit access)?  Is this patch acceptable at all?  Is it
> >     KVD> being tested?
> >
> > I don't know if it's being tested.
> > It's vacation and traveling time, also for the R core team.
>
> Indeed.  This is on my TODO list, but I've been away (and unexpectedly
> mainly offline) for the last two weeks, and will be again until Friday.
>
> Hopefully I will have a chance to take a look next week, but we do need
> at least one example file.  (I could generate SPSS examples, but they may
> not be what you are trying to test.)
>
> > The foreign package source is kept in the svn archive
> >
> >     https://svn.r-project.org/R-packages/trunk/foreign/
> >
> > and I have tried to apply your patch (from July 2) to the sources, but
> >
> >     patch -p0 < K_Van_Dijck_patch
> >
> >     patching file src/sfm-read.c
> >     Hunk #1 FAILED at 188.
> >     Hunk #2 FAILED at 420.
> >     Hunk #3 FAILED at 590.
> >     Hunk #4 FAILED at 1559.
> >     4 out of 4 hunks FAILED -- saving rejects to file src/sfm-read.c.rej
> >     patching file src/var.h.in
> >     Hunk #1 FAILED at 41.
> >     Hunk #2 FAILED at 232.
> >     Hunk #3 FAILED at 306.
> >     Hunk #4 FAILED at 377.
> >     4 out of 4 hunks FAILED -- saving rejects to file src/var.h.in.rej
> >
> > Could you provide a patch against the development code from the
> > above url?
> > (after installing 'subversion', you get the development directory by
> >     svn co https://svn.r-project.org/R-packages/trunk/foreign/
> > )

I had problems with whitespace in the patch file; I attached a new one.

> >     KVD> I got some personal reactions on my post, proving there
> >     KVD> is general interest in getting rid of the inconvenience
> >     KVD> of importing long labels from SPSS files.
> >
> > My problem is that I cannot do much testing apart from the tests
> > already present in foreign/tests/spss.R
> >
> > Could you provide a new small *.sav file and a corresponding
> > read.spss() call which exhibits the problems and is fixed by your patch?
> >
> > Thank you in advance for your contribution!
> > Best regards,

I ran the spss.R in tests/; it worked fine. Be sure to clean all object files
before compiling.

Ilse made me a test .sav file (attached) with 2 variables (variable1 &
variable2), 3 records.

This piece of R code shows the problem:

# to resolve locale problem
Sys.setlocale(locale="C");

# read spss datafile
library(foreign);
data = read.spss("spss_long.sav", to.data.frame=TRUE);
# to.data.frame not necessary, but gives nicer output

# commands to show the data, the variable names and labels
data;
names(data);
attr(data, "variable.labels");

# result in unpatched version:
# both variable names are in shortened form
# (max 8 characters; provided in SPSS-file)
#> data;
#  VARIABLE V2_A
#1        1    1
#2        2    1
#3        2    3
#
#> names(data);
#[1] "VARIABLE" "V2_A"
#
#> attr(data,"variable.labels");
#   VARIABLE        V2_A
#"variable1" "variable2"

# and in patched version:
# variable names are the full names as originally defined in the SPSS-file
#> data;
#  variable1 variable2
#1         1         1
#2         2         1
#3         2         3
#> names(data);
#[1] "variable1" "variable2"
#> attr(data, "variable.labels");
#  variable1   variable2
#"variable1" "variable2"

Kind regards,
Kurt & Ilse

Index: src/sfm-read.c
===================================================================
--- src/sfm-read.c      (revision 5175)
+++ src/sfm-read.c      (working copy)
@@ -188,6 +188,8 @@
 static int read_variables (struct file_handle * h, struct variable *** var_by_index);
 static int read_machine_int32_info (struct file_handle * h, int size, int count, int *encoding);
 static int read_machine_flt64_info (struct file_handle * h, int size, int count);
+static int read_long_var_names (struct file_handle * h, struct dictionary *
+    , unsigned long size, unsigned int count);
 static int read_documents (struct file_h
[Rd] spss endianness bugfix
Hi,

We just upgraded R on Mac OS X to foreign package 0.8.27 from CRAN to get the
SPSS long variable names. The script tests/spss.R runs fine. Thanks for
importing the SPSS long labels patch and making it available for Mac OS X!

The first real-life data file, however, fails to load with "Unexpected end of
file". The SPSS file was saved on a P4 Linux machine, with R running on Mac
OS X (ppc). SPSS on Mac OS X could read the file. Saving it from SPSS there
gives the same problem when reading on P4 Linux. Testing proved that the
problem existed with earlier versions of the foreign package too.

I found the problem in src/sfm-read.c, read_documents(): the n_lines variable
does not get byte-swapped. This patch solves the problem.

Kind regards,
Kurt Van Dijck & Ilse Laurijssen

Index: src/sfm-read.c
===================================================================
--- src/sfm-read.c      (revision 5177)
+++ src/sfm-read.c      (working copy)
@@ -1336,6 +1336,8 @@
            h->fn));

   assertive_bufread(h, &n_lines, sizeof n_lines, 0);
+  if (ext->reverse_endian)
+    bswap_int32 (&n_lines);
   dict->n_documents = n_lines;
   if (dict->n_documents <= 0)
     lose ((_("%s: Number of document lines (%d) must be greater than 0"),
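To illustrate why the missing swap leads to "Unexpected end of file": a small
count read with the wrong byte order turns into a huge (or negative) number,
so the reader then expects far more document lines than the file contains. A
short R sketch (purely illustrative, not part of the patch):

    bytes <- writeBin(3L, raw(), size = 4, endian = "little")
    readBin(bytes, "integer", size = 4, endian = "little")  # 3
    readBin(bytes, "integer", size = 4, endian = "big")     # 50331648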