OK, found the problem: on platforms that support it, TRE uses wint_t (from 
wchar.h) as its character type (tre_cint_t), which on AIX is a *signed* int. 
TRE converts liberally between int and tre_cint_t, apparently assuming that 
the latter is unsigned, so that values converted back to int are suitable for 
comparisons etc. On other platforms wint_t is unsigned, so it works. Manually 
defining tre_cint_t as unsigned int fixes the issue.
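
As a rough illustration of why the signedness matters (a minimal sketch only, 
not TRE's actual code; the typedefs and function names are made up for the 
example): the open-ended range in the 64-bit debug output has -1 as its upper 
bound, and a signed comparison against -1 can never succeed for a valid 
character, whereas the same value converted to unsigned is the largest 
representable code.

```c
/* Hypothetical stand-ins for tre_cint_t; NOT TRE's real definitions. */
typedef int signed_cint_t;            /* signed, as wint_t is reported to be on AIX */
typedef unsigned int unsigned_cint_t; /* unsigned, as on the platforms that work */

/* Range test "lo <= c <= hi" carried out in the signed type.
 * With hi == -1 this is false for every non-negative c. */
int in_range_signed(signed_cint_t c, signed_cint_t lo, signed_cint_t hi)
{
    return c >= lo && c <= hi;
}

/* The same test in the unsigned type. A bound of -1 converted to
 * unsigned becomes UINT_MAX, so the range is effectively open-ended. */
int in_range_unsigned(unsigned_cint_t c, unsigned_cint_t lo, unsigned_cint_t hi)
{
    return c >= lo && c <= hi;
}
```

With these definitions, in_range_signed(51, 38, -1) is 0, while 
in_range_unsigned(51, 38, (unsigned_cint_t)-1) is 1; 51 is the code of the 
character '3' shown in the debug trace.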

Cheers,
Simon


On Jan 1, 2016, at 12:20 PM, Simon Urbanek <simon.urba...@r-project.org> wrote:

> Michael,
> 
> thanks, I'll have a look once my PDP VMs are up again (later today). This may 
> be a signedness issue although it's unclear why other platforms wouldn't be 
> affected.
> 
> Cheers,
> Simon
> 
> 
> On Dec 31, 2015, at 10:14 AM, Michael Felt <aixto...@gmail.com> wrote:
> 
>> On 2015-12-30 09:58, Michael Felt wrote:
>>> On 2015-12-29 11:02, Michael Felt wrote:
>>>> This seems to be a problem that goes back a long time - and I hope someone 
>>>> who understands what tre is supposed to be doing will look at this.
>>>> 
>>>> A short history of other people who have reported on this on different 
>>>> versions of AIX. I shall only add that I get the same results on AIX 5.3 
>>>> TL7, AIX 6.1 TL9 and AIX 7.1 TL3.
>>>> 
>>>> Basically, with settings that work for AIX and 32-bit - the only changes 
>>>> being
>>>> -maix32 becomes -maix64
>>>> and
>>>> export OBJECT_MODE=32 becomes export OBJECT_MODE=64
>>>> 
>>>> Then to shorten the 'make' bla bla, first run just make, then
>>>> 
>>>> cd src/library/tools
>>>> make -s sysdata
>>>> 
>>>> http://article.gmane.org/gmane.comp.lang.r.devel/38817/match=package+tools+malformed
>>>>  
>>>> http://article.gmane.org/gmane.comp.lang.r.devel/36886/match=package+tools+malformed
>>>>  
>>>> http://article.gmane.org/gmane.comp.lang.r.devel/23372/match=package+tools+malformed
>>>>  Date: 2010-01-25 06:55:41 GMT (5 years, 48 weeks, 1 day, 20 hours and 30 
>>>> minutes ago)
>>>> 
>>>> In addition, to get debug data, I have
>>>> 
>>>> * added -DTRE_DEBUG to src/extra/tre/Makefile # ALL_CFLAGS = $(ALL_CFLAGS_LO) -DTRE_DEBUG
>>>> * rm src/extra/tre/tre-match-parallel.o
>>>> * find . -name \*.so -exec rm {} \;
>>>> * make
>>>> * cd src/library/tools
>>>> * make -s sysdata
>>>> 
>>>> Attached are the two script files of the screen output. The 32-bit one is 
>>>> more verbose and, curiously, contains lines such as:
>>>>  found match 3037fd14 (while "found" does not occur in the 64-bit output)
>>>> 
>>>> root@x069:[/data/prj/cran/64/R-aix-3.2.3/src/library/tools]wc /tmp/sysdata.??.*
>>>>    4730   14123  139916 /tmp/sysdata.32.text
>>>>    1312    3688   40528 /tmp/sysdata.64.text
>>>>    6042   17811  180444 total
>>>> 
>>>> root@x069:[/data/prj/cran/64/R-aix-3.2.3/src/library/tools]grep -c found /tmp/sysdata.??.*
>>>> /tmp/sysdata.32.text:19
>>>> /tmp/sysdata.64.text:0
>>>> 
>>>> 
>>>> Hope this brings us (or me), closer to a resolution to an old concern.
>>>> 
>>>> And, best wishes for the new year!
>>>> 
>>>> Michael
>>>> 
>>>> 
>>> Still hoping for someone's curiosity/willingness.
>>> 
>>> The differences show up in the first comparison that is made (of the 
>>> string "3.2.3" it seems) - 32-bit is on the left, 64-bit on the right.
>>> 
>>> Script command is started on Tue Dec 29 08:39:16 UTC 2015.      |  Script command is started on Tue Dec 29 08:39:56 UTC 2015.
>>> root@x069:[/data/prj/cran/32/R-aix-3.2.3/src/library/tools]make -s sysdata  |  root@x069:[/data/prj/cran/64/R-aix-3.2.3/src/library/tools]make -s sysdata
>>> installing 'sysdata.rda'                                        |  installing 'sysdata.rda'
>>> tre_tnfa_run_parallel, input type 1                             |  tre_tnfa_run_parallel, input type 1
>>> length: -1                                                      |  length: -1
>>> pos:chr/code | states and tags                                  |  pos:chr/code | states and tags
>>> -------------+------------------------------------------------  |  -------------+------------------------------------------------
>>> init > 30380200 3038014c 30380098                               |  init > 110cc3040 110cc2f28 110cc2e10
>>> match end offset = -1                                           |  match end offset = -1
>>> tre_tnfa_run_parallel, input type 1                             |  tre_tnfa_run_parallel, input type 1
>>> length: -1                                                      |  length: -1
>>> pos:chr/code | states and tags                                  |  pos:chr/code | states and tags
>>> -------------+------------------------------------------------  |  -------------+------------------------------------------------
>>> init > 3037fb88                                                 |  init > 110cc3310
>>> 0: 3/00051 | 3037fb88/0:0                                       |  0: 3/00051 | 110cc3310/0:0
>>> 1: ./00046 | 3037fb88/0:0                                       |  1: ./00046 | 110cc3310/0:0
>>> init > 3037fb88                                                 |  init > 110cc3310
>>> 1: ./00046 | 3037fb88/0:1                                       |  1: ./00046 | 110cc3310/0:1
>>> 2: 2/00050 | 3037fb88/0:1                                       |  2: 2/00050 | 110cc3310/0:1
>>> assertion failed                                                |  assertion failed
>>> init > 3037fb88                                                 |  init > 110cc3310
>>> 2: 2/00050 | 3037fc18/0:1 3037fb88/0:2                          |  2: 2/00050 | 110cc33f0/0:1 110cc3310/0:2
>>> 3: ./00046 | 3037fc18/0:1 3037fb88/0:2                          |  3: ./00046 | 110cc33f0/0:1 110cc3310/0:2
>>> assertion failed *** DIFFERENCE ***                             |  init > 110cc3310
>>> init > 3037fb88                                                 |  3: ./00046 | 110cc3310/0:3
>>> 3: ./00046 | 3037fc18/0:1 3037fb88/0:3                          |  4: 3/00051 | 110cc3310/0:3
>>> 4: 3/00051 | 3037fc18/0:1 3037fb88/0:3                          |  assertion failed
>>> assertion failed                                                |  init > 110cc3310
>>> init > 3037fb88                                                 |  4: 3/00051 | 110cc33f0/0:3 110cc3310/0:4
>>> 4: 3/00051 | 3037fc18/0:3 3037fb88/0:4                          |  5: /00000 | 110cc33f0/0:3 110cc3310/0:4
>>> 5: /00000 | 3037fc18/0:3 3037fb88/0:4                           |  init > 110cc3310
>>> found match 3037fd14 *** DIFFERENCE ***                         |  match end offset = -1
>>> match end offset = 5 *** DIFFERENCE ***                         |  tre_tnfa_run_parallel, input type 1
>>> tre_tnfa_run_parallel, input type 1                             |  length: -1
>>> length: -1                                                      |  pos:chr/code | states and tags
>>> pos:chr/code | states and tags                                  |  -------------+------------------------------------------------
>>> -------------+------------------------------------------------  |  init > 110cc4780 110cc4668 110cc4550
>>> init > 303811c0 3038110c 30381058                               |  match end offset = -1
>>> match end offset = -1                                           |  tre_tnfa_run_parallel, input type 1
>>> tre_tnfa_run_parallel, input type 1                             |  length: -1
>>> length: -1                                                      |  pos:chr/code | states and tags
>>> pos:chr/code | states and tags                                  |  -------------+------------------------------------------------
>>> -------------+------------------------------------------------  |  init > 110cc5700 110cc55e8 110cc54d0
>>> 
>> One day further - looks like tre_compile (or just before, after all).
>> 
>> With TRE_DEBUG switched on in tre-compile.c and tre-ast.c I see (snip)
>> 
>> --- /tmp/x.32   2015-12-31 15:09:44.000000000 +0000
>> +++ /tmp/x.64   2015-12-31 15:09:30.000000000 +0000
>> @@ -1,5 +1,5 @@
>> - Script command is started on Thu Dec 31 15:04:39 2015.
>> - root@x069:[/data/prj/cran/32/R-aix-3.2.3/src/library/tools]make sysdata
>> + Script command is started on Thu Dec 31 15:08:43 2015.
>> + root@x069:[/data/prj/cran/64/R-aix-3.2.3/src/library/tools]make sysdata
>> installing 'sysdata.rda'
>> echo "tools:::sysdata2LazyLoadDB(\"/data/prj/cran/R-3.2.3/src/library/tools/R/sysdata.rda\",\"../../../library/tools/R\")" | \
>>   R_DEFAULT_PACKAGES=NULL LC_ALL=C ../../../bin/R --vanilla --slave
>> @@ -167,7 +167,7 @@
>> initial: 1/1,0, assert 0
>> initial: 0/0, assert 0
>> initial: 0/0, assert 0
>> - final state 30370718
>> + final state 110cba530
>> tre_compile: parsing '(^|[^%])(%%)*%V'
>> AST:
>> catenation, sub 0, 0 tags
>> @@ -177,7 +177,7 @@
>>         assertions: bol
>>         union, sub -1, 0 tags
>>           literal (, $) (0, 36), pos 0, sub -1, 0 tags
>> -           literal (&, M-^?) (38, 65535), pos 0, sub -1, 0 tags
>> +           literal (&, M-^?) (38, -1), pos 0, sub -1, 0 tags
>>       iteration {0, -1}, sub -1, 0 tags, greedy
>>         catenation, sub 2, 0 tags
>>           literal (%, %) (37, 37), pos 1, sub -1, 0 tags
>> @@ -197,7 +197,7 @@
>> Union
>> Literal 0-36
>> After union left
>> - Literal 38-65535
>> + Literal 38--1
>> After union right
>> After union right
>>   num_tags += 2
>> @@ -231,7 +231,7 @@
>>         assertions: bol
>>         union, sub -1, 0 tags
>>           literal (, $) (0, 36), pos 0, sub -1, 0 tags
>> -           literal (&, M-^?) (38, 65535), pos 0, sub -1, 0 tags
>> +           literal (&, M-^?) (38, -1), pos 0, sub -1, 0 tags
>>       iteration {0, -1}, sub -1, 2 tags, greedy
>>         catenation, sub 2, 1 tags
>>           literal (%, %) (37, 37), pos 1, sub -1, 1 tags
>> @@ -255,7 +255,7 @@
>> Union
>> Literal 0-36
>> After union left
>> - Literal 38-65535
>> + Literal 38--1
>> After union right
>> After union right
>> tre_add_tag_right: tag 3
>> @@ -342,7 +342,7 @@
>>           catenation, sub -1, 0 tags
>>             union, sub -1, 0 tags
>>               literal (, $) (0, 36), pos 0, sub -1, 0 tags
>> -               literal (&, M-^?) (38, 65535), pos 0, sub -1, 0 tags
>> +               literal (&, M-^?) (38, -1), pos 0, sub -1, 0 tags
>>             tag 4
>> 
>> It seems that in 32-bit mode the upper bound -1 is treated as unsigned 
>> (65535), but in 64-bit mode it stays -1.
>> 
>> I suspect I will "find it" - but a proposed change is appreciated.
>> 
>> Happy New Year,
>> Michael
>> 
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
> 
