Re: [R-pkg-devel] New package with C++ code causes R abort in RStudio, not in R console.

2024-11-15 Thread Luc De Wilde
The 'parser' in lavaanC was written in two parts:
* a part in  pure C/C++ without the R interface things like SEXP, etc. ; taking 
as input a string (the model) and a bool (produce debuginfo yes/no) , and 
returning a structure with all parsed information and debuginfo (if requested) 
or an error code (and position in the model) if an error was detected in the 
model; (lav_SyntaxParser.cpp + lav_SyntaxParser.h + lav_Util.cpp + lav_Util.h + 
lav_SmallStringList.h)
* an interface function with R which is callable from R with .Call taking SEXP 
model and SEXP debug as input and returning a SEXP with the parsed information 
in about the same format as the lavaan parser written in R returns. This 
interface function also evaluates the parts of the model syntax that are 
written as an R expression (and that can be evaluated in the Global 
environment). (lav_parser_interface.cpp)

The first part of parser was written using Visual Studio (the IDE I know well 
(for C#) because I wrote many programs in C# in this environment), and I also 
wrote a main function as 'replacement' for the interface function in R to be 
able to test this first part in Visual Studio.

When searching for the reason Rstudio aborts the R session when executing 
tests, I found - thanks to the help of Ivan - that there were some memory 
allocation or usage problems in my code. Because I work most of the time in 
Windows and the Address Sanitizer in R is only available on Linux/MacOS (cfr 
Writing R Extensions 4.3.3) I felt stuck.

Because C/C++ is relatively new to me (about 3 months) I didn't know very well 
the different tools for C/C++ available in Visual Studio, but looking at the 
documentation of Dr Memory I realized that there should also be some tools in 
Visual Studio to check the memory allocation and usage in C++ programs.  So I 
found that in Visual Studio Project options -> C/C++  -> General there is an 
option "Enable Address Sanitizer". When I enabled it and ran the tests I found 
the problematic code where I allocated one byte less then needed to copy a 
string.

I have looked at the documentation of Dr. Memory but not yet installed it.


Luc


Van: Ivan Krylov 
Verzonden: donderdag 14 november 2024 21:32
Aan: Luc De Wilde 
CC: r-package-devel@r-project.org ; Yves Rosseel 

Onderwerp: Re: [R-pkg-devel] New package with C++ code causes R abort in 
RStudio, not in R console.

В Thu, 14 Nov 2024 13:08:13 +
Luc De Wilde  пишет:

> At last, with checking of the program with address sanitizer in
> Visual Studio, I found an "off by 1 error" in my code. Now all tests
> pass without problems in Rstudio.

Congratulations on being able to solve the problem yourself!

It took me too long to figure out that the syntax errors I was getting
from lavaanC::lav_parse_model_string_c() were due to the invisible
U+00A0 (non-breakable space) characters from the e-mail. Once I ran
lav_parse_model_string_c(gsub('\ua0', ' ', model)) in a sanitized build
of R, I too saw the buffer overflow and a number of memory leaks.

Would you mind sharing with the list how you used address sanitizer
with R on Windows? Did you have to use the clang compiler, or have you
been able to use MSVC? Does Dr. Memory  find any
additional problems?

--
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Possible false negative for compiled C++ code in CRAN checks

2024-11-15 Thread Mauricio Vargas Sepulveda
Dear R Developers,

I implemented Dr. Krylov's fix, and now redatam is back to CRAN (Point 1).

Dr. Eddelbuettel's suggestion about not fully mimicking CRAN gcc-san setup was 
also correct (Point 2).

Point 1:
- When the big endian issue was described here, I tried the s390x image from 
R-Hub, which does not work on my laptop. Then I went to a big endian server 
that we have at UofT and that reproduced the error!

Point 2:
- I was testing with docker.io/rocker/r-devel-san:latest, and I could not 
replicate the error.
- I ended up using the R-Hub image with 
https://github.com/r-hub/containers/pull/81#issuecomment-2478009166, and that 
replicated the error line by line.

Besides it, I implemented the same fix for the command line tool and the Python 
package. I also changed the example data by aggregating Uruguay's census to 
show some numbers. Just cutting the levels and show region/city is not 
informative. Being Redatam a closed format with no standard or specification, I 
ended up reading the census with my pkg, aggregating with dplyr, saving to CSV, 
and then converting back to Redatam with a point-and-click tool following an 
unofficial tutorial from You Tube. There is no easy way  to aggregate and save 
in the same format as we do with SPSS.

I added Dr. Krylov to ctb for this very useful fix! Thanks a lot to Dr. Krylov 
and Dr. Eddelbuettel for the suggestions, I was going in circles being unable 
to replicate the issue and I now asked Dr. Ligges for the possibility to know 
more about the CRAN specific configuration to add it to R-Hub.

Best,

——
Mauricio "Pachá" Vargas Sepúlveda
PhD Student, Political Science
University of Toronto



From: Ivan Krylov 
Sent: November 14, 2024 4:50 PM
To: Mauricio Vargas Sepulveda 
Cc: r-package-devel@r-project.org 
Subject: Re: [R-pkg-devel] Possible false negative for compiled C++ code in 
CRAN checks

В Thu, 14 Nov 2024 16:24:16 +
Mauricio Vargas Sepulveda  пишет:

> After enabling the SAN flags, I cannot reproduce the gcc-san error
> [2].

Can you use the rocker/r-devel-san container? It reproduces the problem
for me.

When reading galapagos/cg15.dic, FuzzyEntityParser::ParseEntities()
keeps advancing over the file and failing to parse a single entity
until it eventually calls stop() because it didn't find any entities.

In a non-sanitized build, it first succeeds at 0-based offset 1095. In
a sanitized build, it fails for all offsets. I think this is due to the
ordering of the byte reads:
https://github.com/pachadotdev/open-redatam/blob/bbb65242f1af5f601def1c0b971ed601d459b4f3/src/readers/ByteArrayReader.cpp#L176-L192

In C++, an operation like the following:

static_cast(ReadByte()) << 8 |
static_cast(ReadByte());

...depends on the order in which the compiler will choose to evaluate
the calls to static_cast(ReadByte()), and this order is not
guaranteed to be left-to-right:
https://en.cppreference.com/w/cpp/language/eval_order

I edited all four byte-reading functions and replaced the one-statement
operations with separate statements for each of the byte reads:

--- redatam.orig/src/redatamlib/readers/ByteArrayReader.cpp 2024-11-09 
02:12:17.0 +
+++ redatam.new/src/redatamlib/readers/ByteArrayReader.cpp  2024-11-14 
21:25:54.0 +
@@ -175,23 +175,27 @@
 }

 uint16_t ByteArrayReader::ReadInt16LE() {
-  return static_cast(ReadByte()) |
- (static_cast(ReadByte()) << 8);
+  uint16_t a = static_cast(ReadByte());
+  uint16_t b = static_cast(ReadByte()) << 8;
+  return a | b;
 }

 uint32_t ByteArrayReader::ReadInt32LE() {
-  return static_cast(ReadInt16LE()) |
- static_cast(ReadInt16LE()) << 16;
+  uint32_t a = static_cast(ReadInt16LE());
+  uint32_t b = static_cast(ReadInt16LE()) << 16;
+  return a | b;
 }

 uint16_t ByteArrayReader::ReadInt16BE() {
-  return (static_cast(ReadByte()) << 8) |
- static_cast(ReadByte());
+ uint16_t a= (static_cast(ReadByte()) << 8);
+ uint16_t b= static_cast(ReadByte());
+ return a| b;
 }

 uint32_t ByteArrayReader::ReadInt32BE() {
-  return (static_cast(ReadInt16BE()) << 16) |
- static_cast(ReadInt16BE());
+  uint32_t b = static_cast(ReadInt16LE()) << 16;
+  uint32_t a = static_cast(ReadInt16LE());
+  return b | a;
 }

 }  // namespace RedatamLib

...and this seems to make the error vanish. I think I see the
misordering too. In the output of objdump -d
redatam.Rcheck/redatam/libs/redatam.so, I see:

00267010 <_ZN10RedatamLib15ByteArrayReader11ReadInt16LEEv>:

  267028:   e8 93 6f f5 ff  call   1bdfc0 
<_ZN10RedatamLib15ByteArrayReader8ReadByteEv@plt>
   first_byte <- ReadByte()
  26702d:   41 89 c4mov%eax,%r12d
   save first byte in r12
  26703d:   41 c1 e4 08 shl$0x8,%r12d
   left-shift the first byte!
  267041:   e8 7a 6f f5 ff 

Re: [R-pkg-devel] Possible false negative for compiled C++ code in CRAN checks

2024-11-15 Thread Dirk Eddelbuettel


On 15 November 2024 at 12:16, Mauricio Vargas Sepulveda wrote:
| [...] and I now asked Dr. Ligges for the possibility to know more about the 
CRAN specific configuration to add it to R-Hub.

It is (and has always been) documented in a text file on the server 

   https://www.stats.ox.ac.uk/pub/bdr/memtests/README.txt

which is also referenced in a few places (but I forget where, I tend to go
back to my rocker san repos who have it too -- Google gets a few pages of
hits for the URL as a constant quoted string).

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel