[Rd] [PATCH] Improve utf8clen and remove utf8_table4

2017-03-19 Thread Sahil Kang

Given a char `c' which should be the start byte of a utf8 character,
the utf8clen function returns the byte length of the utf8 character.

Before this patch, the utf8clen function would return either:
* 1 if `c' was an ascii character or a utf8 continuation byte
* An int in the range [2, 6] indicating the byte length of the utf8 
character


With this patch, the utf8clen function will now return either:
* -1 if `c' is not a valid utf8 start byte
* The byte length of the utf8 character (the number of leading 1's, 
really)


I believe returning -1 for continuation bytes makes utf8clen less error 
prone.

The utf8_table4 array is no longer needed and has been removed.

Sahil
Index: src/main/util.c
===
--- src/main/util.c	(revision 72365)
+++ src/main/util.c	(working copy)
@@ -1183,18 +1183,23 @@
 return TRUE;
 }
 
-/* Number of additional bytes */
-static const unsigned char utf8_table4[] = {
-  1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
-  1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
-  2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
-  3,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5 };
-
+/*
+ * If `c' is not a valid utf8 start byte, return -1.
+ * Otherwise, return the number of bytes in the utf8 string with start byte `c'
+ */
 int attribute_hidden utf8clen(char c)
 {
-/* This allows through 8-bit chars 10xx, which are invalid */
-if ((c & 0xc0) != 0xc0) return 1;
-return 1 + utf8_table4[c & 0x3f];
+int n = 0; /* number of leading 1's */
+int m = 0x80; /* byte mask */
+
+while (c & m) {
+++n;
+m >>= 1;
+}
+
+if (n == 0) return 1; /* an ascii char of the form 0xxx */
+else if (n == 1) return -1; /* invalid start byte of the form 10xx */
+else return n;
 }
 
 /* These return the result in wchar_t, but does assume
Index: src/main/valid_utf8.h
===
--- src/main/valid_utf8.h	(revision 72365)
+++ src/main/valid_utf8.h	(working copy)
@@ -75,7 +75,7 @@
 	if (c < 0xc0) return 1;   /* Isolated 10xx  byte */
 	if (c >= 0xfe) return 1; /* Invalid 0xfe or 0xff bytes */
 
-	ab = utf8_table4[c & 0x3f]; /* Number of additional bytes */
+	ab = utf8clen(c) - 1; /* Number of additional bytes */
 	if (length < ab) return 1;
 	length -= ab; /* Length remaining */
 
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [PATCH] Improve utf8clen and remove utf8_table4

2017-03-19 Thread Sahil Kang
Some of the code that uses utf8clen checks the validity of the utf8 
string before making the call.
However, there were some hairy areas where I felt that the new semantics 
may cause issues (if not now, then in future changes).


I've attached two patches:
* new_semantics.diff keeps the new semantics and updates those 
hairy areas above.
* old_semantics.diff maintains the old semantics (return 1 even for 
continuation bytes).


I don't think the new semantics will cause issues, especially with the 
updates, but we can err on the side of caution and keep the old 
semantics. I feel that the new semantics provide a clearer interface 
though (the function expects a start byte and should return an error if 
a start byte is not supplied).

In either case, the utf8_table4 array has been removed.

Sahil

On 03/19/2017 05:38 AM, Duncan Murdoch wrote:

On 19/03/2017 2:31 AM, Sahil Kang wrote:

Given a char `c' which should be the start byte of a utf8 character,
the utf8clen function returns the byte length of the utf8 character.

Before this patch, the utf8clen function would return either:
 * 1 if `c' was an ascii character or a utf8 continuation byte
 * An int in the range [2, 6] indicating the byte length of the utf8
character

With this patch, the utf8clen function will now return either:
 * -1 if `c' is not a valid utf8 start byte
 * The byte length of the utf8 character (the number of leading 1's,
really)

I believe returning -1 for continuation bytes makes utf8clen less error
prone.
The utf8_table4 array is no longer needed and has been removed.


utf8clen is used internally by R in more than a dozen places, and is 
likely used in packages as well.  Have you checked that this change in 
semantics won't break any of those uses?


Duncan Murdoch



Index: src/main/util.c
===
--- src/main/util.c	(revision 72365)
+++ src/main/util.c	(working copy)
@@ -1183,18 +1183,23 @@
 return TRUE;
 }
 
-/* Number of additional bytes */
-static const unsigned char utf8_table4[] = {
-  1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
-  1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
-  2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
-  3,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5 };
-
+/*
+ * If `c' is not a valid utf8 start byte, return 1.
+ * Otherwise, return the number of bytes in the utf8 string with start byte `c'
+ */
 int attribute_hidden utf8clen(char c)
 {
-/* This allows through 8-bit chars 10xx, which are invalid */
-if ((c & 0xc0) != 0xc0) return 1;
-return 1 + utf8_table4[c & 0x3f];
+int n = 0; /* number of leading 1's */
+int m = 0x80; /* byte mask */
+
+while (c & m) {
+++n;
+m >>= 1;
+}
+
+if (n == 0) return 1; /* an ascii char of the form 0xxx */
+else if (n == 1) return 1; /* invalid start byte of the form 10xx */
+else return n;
 }
 
 /* These return the result in wchar_t, but does assume
Index: src/main/valid_utf8.h
===
--- src/main/valid_utf8.h	(revision 72365)
+++ src/main/valid_utf8.h	(working copy)
@@ -75,7 +75,7 @@
 	if (c < 0xc0) return 1;   /* Isolated 10xx  byte */
 	if (c >= 0xfe) return 1; /* Invalid 0xfe or 0xff bytes */
 
-	ab = utf8_table4[c & 0x3f]; /* Number of additional bytes */
+	ab = utf8clen(c) - 1; /* Number of additional bytes */
 	if (length < ab) return 1;
 	length -= ab; /* Length remaining */
 
Index: src/main/character.c
===
--- src/main/character.c	(revision 72365)
+++ src/main/character.c	(working copy)
@@ -276,7 +276,9 @@
 if (ienc == CE_UTF8) {
 	const char *end = str + strlen(str);
 	for (i = 0; i < so && str < end; i++) {
-	int used = utf8clen(*str);
+	used = utf8clen(*str);
+	if (used < 0) used = 1; /* gobble up invalid utf8 start byte */
+
 	if (i < sa - 1) { str += used; continue; }
 	for (j = 0; j < used; j++) *buf++ = *str++;
 	}
@@ -459,10 +461,18 @@
 int i, in = 0, out = 0;
 
 if (ienc == CE_UTF8) {
-	for (i = 1; i < sa; i++) buf += utf8clen(*buf);
+	int len;
+	for (i = 1; i < sa; i++) {
+	len = utf8clen(*buf);
+	buf += len < 0 ? 1 : len;
+	}
 	for (i = sa; i <= so && in < strlen(str); i++) {
-	in +=  utf8clen(str[in]);
-	out += utf8clen(buf[out]);
+	len = utf8clen(str[in]);
+	in +=  len < 0 ? 1 : len;
+
+	len = utf8clen(buf[out]);
+	out += len < 0 ? 1 : len;
+
 	if (!str[in]) break;
 	}
 	if (in != out) memmove(buf+in, buf+out, strlen(buf+out)+1);
Index: src/main/connections.c
===
--- src/main/connections.c	(revision 72365)
+++ src/main/connections.c	(working copy)
@@ -4408,6 +4408,7 @@
 	if (iread >= nbytes) b

[Rd] [PATCH] Ensure correct order of evaluation in macro

2017-05-22 Thread Sahil Kang

Hello,

I'd like to contribute this small patch (attached) that I think will 
help prevent some future bugs from occurring in paste.c.
By wrapping the macro's arguments in parentheses, we can ensure that the 
correct order of evaluation will take place during preprocessing.


To illustrate, we can use the == operator which has lower evaluation 
precedence than the < operator:

* With the current macro, imax2(3==4, 1) expands to 0.
* After applying this patch, imax2(3==4, 1) expands to 1 as expected.

Since I'm still relatively new to the mailing list, I've kept this patch 
small.
I did notice other macros that have this same issue, so I can start 
sending additional patches if this seems okay.


Thanks,
Sahil
Index: src/main/paste.c
===
--- src/main/paste.c	(revision 72713)
+++ src/main/paste.c	(working copy)
@@ -31,7 +31,7 @@
 #include "Defn.h"
 #include 
 
-#define imax2(x, y) ((x < y) ? y : x)
+#define imax2(x, y) (((x) < (y)) ? (y) : (x))
 
 #include "Print.h"
 #include "RBufferUtils.h"
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [PATCH] Ensure correct order of evaluation in macro

2017-05-23 Thread Sahil Kang
Hi Duncan,

Would you merge this patch?
I'm planning on sending larger patches in the next few days that fix other 
macros I've seen, but I figured​ it'd be best to start with a smaller patch.

Thanks,
Sahil

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [PATCH] Ensure correct order of evaluation in macro

2017-05-24 Thread Sahil Kang

Hi Martin,

Would you feel comfortable merging my small patch in?
I'd like to send additional patches that will improve these types of 
function macros so I'm hoping to start with this small change.


Thanks,
Sahil

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] SUGGESTION: Use JIRA for Bug Reporting, Package Development and Project Management

2017-06-08 Thread Sahil Kang
I enjoy the simplicity of a mailing list; I'd prefer not to create a new user 
account to participate in a software project. Plenty of other software 
communities utilize mailing lists, and I think you'll also begin to prefer 
email as you get more involved with software projects.

R is also a GNU package​ so I don't think it's a good idea to use a closed 
source product like JIRA. I also think it's easier for the core devs to manage 
a mailing list instead of a JIRA instance.

I'm not sure what you mean about privacy concerns for package maintainers; can 
you elaborate?

Sahil

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] [Bug Fix] Default values not applied to ... arguments

2017-07-06 Thread Sahil Kang

Hi Duncan, Martin

Here's a small patch that fixes bug 15199 reported at:
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15199

I was able to reproduce the bug as Duncan had outlined just fine, but I 
did notice that when we debug(f), the problem went away.
I later realized that f(1,,3) behaved correctly the first time it was 
executed, but misbehaved as documented on subsequent calls.
This narrowed the problem down to the byte-compilation that occurs on 
subsequent function calls.


This patch prevents byte-compilation on closure objects.
Although it's a less than ideal solution, this patch fixes the bug at 
least until the underlying byte-compilation issue can be found (I'm 
currently scrutinizing the promiseArgs function at eval.c:2771).


Thanks,
Sahil
Index: src/main/eval.c
===
--- src/main/eval.c	(revision 72894)
+++ src/main/eval.c	(working copy)
@@ -1059,8 +1059,8 @@
 
 SEXP body = BODY(fun);
 
-if (R_jit_enabled > 0 && TYPEOF(body) != BCODESXP &&
-	! R_disable_bytecode && ! NOJIT(fun)) {
+if (R_jit_enabled > 0 && TYPEOF(fun) != CLOSXP && TYPEOF(body) != BCODESXP
+	&& !R_disable_bytecode && ! NOJIT(fun)) {
 
 	if (MAYBEJIT(fun)) {
 	/* function marked as MAYBEJIT the first time now seen
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [Bug Fix] Default values not applied to ... arguments

2017-07-07 Thread Sahil Kang
Yes, I see what you mean. My patch only disables JIT compilation for 
closures. If a user manually compiles a closure, however, the bug pops 
up again.

I think the bug may either lie in how the byte-compiler compiles 
closures, or how closures with compiled body's are executed by the AST 
interpreter (without my patch applied):

compiler::enableJIT(0) # turn off JIT compilation
f <- function(...) { g(...) }
g <- function(a=4, b=5, c=6) {
   print(c(missing(a), missing(b), missing(c)))
   b
}

f(1,,3) # works fine
# [1] FALSE TRUE FALSE
# [1] 5

f(1,,3) # works fine
# [1] FALSE TRUE FALSE
# [1] 5

ff_1 <- compiler::compile(f) # compile f
ff_2 <- compiler::cmpfun(f) # compile body of closure

eval(ff_1)(1,,3) # works fine
# [1] FALSE TRUE FALSE
# [1] 5

eval(ff_2)(1,,3) # bug shows up again
# [1] FALSE TRUE FALSE
# Error in g(...) : argument is missing, with no default

Thanks,
Sahil

On 07/06/2017 09:29 AM, Tomas Kalibera wrote:
>
> Thanks for the report, I've fixed 15199 in the AST interpreter in 
> 72664, I will fix it in the byte-code interpreter as well.
>
> If you ever needed to disable the JIT, there is API for that, see 
> ?compiler. Note though that by disabling the JIT you won't disable the 
> byte-code interpreter, because code also gets compiled when packages 
> are installed or when the compiler is invoked explicitly.
>
> Best,
> Tomas
>
> On 07/06/2017 04:40 PM, Sahil Kang wrote:
>> Hi Duncan, Martin
>>
>> Here's a small patch that fixes bug 15199 reported at:
>> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15199
>>
>> I was able to reproduce the bug as Duncan had outlined just fine, but 
>> I did notice that when we debug(f), the problem went away.
>> I later realized that f(1,,3) behaved correctly the first time it was 
>> executed, but misbehaved as documented on subsequent calls.
>> This narrowed the problem down to the byte-compilation that occurs on 
>> subsequent function calls.
>>
>> This patch prevents byte-compilation on closure objects.
>> Although it's a less than ideal solution, this patch fixes the bug at 
>> least until the underlying byte-compilation issue can be found (I'm 
>> currently scrutinizing the promiseArgs function at eval.c:2771).
>>
>> Thanks,
>> Sahil
>>
>>
>> __
>> R-devel@r-project.org  mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [Bug Fix] Default values not applied to ... arguments

2017-07-07 Thread Sahil Kang
I'm glad I could help, and I look forward to reading your patch so that I can 
learn more about the R internals. It'd be nice to close this 4 year old bug.

Sahil



 Original Message 
From: Tomas Kalibera 
Sent: July 7, 2017 11:10:05 AM PDT
To: Sahil Kang 
Subject: Re: [Rd] [Bug Fix] Default values not applied to ... arguments

As I said the old behavior had been for several years both in AST and BC 
interpreter. I fixed it in AST and I will fix it in BC. I know exactly 
where it is and how to fix it, there is no more help I could use on this 
issue and I don't think there is any need to discuss this further, and 
particularly so on R-devel. Please do not send speculative emails to 
R-devel, in the interest of saving time of people following the list.

Thanks
Tomas

On 07/07/2017 11:42 AM, Sahil Kang wrote:
> Yes, I see what you mean. My patch only disables JIT compilation for 
> closures. If a user manually compiles a closure, however, the bug pops 
> up again.
>

> I think the bug may either lie in how the byte-compiler compiles 
> closures, or how closures with compiled body's are executed by the AST 
> interpreter (without my patch applied):
>
> compiler::enableJIT(0) # turn off JIT compilation
> f <- function(...) { g(...) }
> g <- function(a=4, b=5, c=6) {
>   print(c(missing(a), missing(b), missing(c)))
>   b
> }
>
> f(1,,3) # works fine
> # [1] FALSE TRUE FALSE
> # [1] 5
>
> f(1,,3) # works fine
> # [1] FALSE TRUE FALSE
> # [1] 5
>
> ff_1 <- compiler::compile(f) # compile f
> ff_2 <- compiler::cmpfun(f) # compile body of closure
>
> eval(ff_1)(1,,3) # works fine
> # [1] FALSE TRUE FALSE
> # [1] 5
>
> eval(ff_2)(1,,3) # bug shows up again
> # [1] FALSE TRUE FALSE
> # Error in g(...) : argument is missing, with no default
>
> Thanks,
> Sahil
>
> On 07/06/2017 09:29 AM, Tomas Kalibera wrote:
>>
>> Thanks for the report, I've fixed 15199 in the AST interpreter in 
>> 72664, I will fix it in the byte-code interpreter as well.
>>
>> If you ever needed to disable the JIT, there is API for that, see 
>> ?compiler. Note though that by disabling the JIT you won't disable 
>> the byte-code interpreter, because code also gets compiled when 
>> packages are installed or when the compiler is invoked explicitly.
>>
>> Best,
>> Tomas
>>
>> On 07/06/2017 04:40 PM, Sahil Kang wrote:
>>> Hi Duncan, Martin
>>>
>>> Here's a small patch that fixes bug 15199 reported at:
>>> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15199
>>>
>>> I was able to reproduce the bug as Duncan had outlined just fine, 
>>> but I did notice that when we debug(f), the problem went away.
>>> I later realized that f(1,,3) behaved correctly the first time it 
>>> was executed, but misbehaved as documented on subsequent calls.
>>> This narrowed the problem down to the byte-compilation that occurs 
>>> on subsequent function calls.
>>>
>>> This patch prevents byte-compilation on closure objects.
>>> Although it's a less than ideal solution, this patch fixes the bug 
>>> at least until the underlying byte-compilation issue can be found 
>>> (I'm currently scrutinizing the promiseArgs function at eval.c:2771).
>>>
>>> Thanks,
>>> Sahil
>>>
>>>
>>> __
>>> R-devel@r-project.org  mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel