[Bug libstdc++/48257] New: std::string::assign() corrupts std::string static data when called on emptyString1 using emptyString2.data()

2011-03-23 Thread mohsinrzaidi at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48257



   Summary: std::string::assign() corrupts std::string static data

when called on emptyString1 using emptyString2.data()

   Product: gcc

   Version: 4.1.2

Status: UNCONFIRMED

  Severity: minor

  Priority: P3

 Component: libstdc++

AssignedTo: unassig...@gcc.gnu.org

ReportedBy: mohsinrza...@gmail.com





The output of the following program is:

4

4



#include 

#include 



using namespace std;

int main()

{

try

{

string emptyStr1;

string emptyStr2;

string emptyStr3;



emptyStr3.assign(emptyStr2.data(), 4);



cout << emptyStr1.size() << endl;

cout << emptyStr3.size() << endl;

}

catch(...)

{

cout << "Reached here" << endl;

}

}



The size of emptyStr1/3 should not have been modified when emptyStr3 was

assigned the contents of emptyStr2 up to 4 bytes although all string are empty.



>From what I've understood of the basic_string code, the following code flow

will result in this case:







  template

inline basic_string<_CharT, _Traits, _Alloc>::

basic_string()

#ifndef _GLIBCXX_FULLY_DYNAMIC_STRING

: _M_dataplus(_S_empty_rep()._M_refdata(), _Alloc()) { }

#else

: _M_dataplus(_S_construct(size_type(), _CharT(), _Alloc()), _Alloc()) { }

#endif







First, the default constructor. Since I do not have

"_GLIBCXX_FULLY_DYNAMIC_STRING" defined in my implementation, _M_dataplus._M_p

will get initialized to &(_S_empty_rep_storage + sizeof (_Rep)) for all three

strings.







  const _CharT*

  data() const

  { return _M_data(); }











  _CharT*

  _M_data() const

  { return  _M_dataplus._M_p; }











  template

basic_string<_CharT, _Traits, _Alloc>&

basic_string<_CharT, _Traits, _Alloc>::

assign(const _CharT* __s, size_type __n)

{

  __glibcxx_requires_string_len(__s, __n);

  _M_check_length(this->size(), __n, "basic_string::assign");

  if (_M_disjunct(__s) || _M_rep()->_M_is_shared())

return _M_replace_safe(size_type(0), this->size(), __s, __n);

  else

{

  // Work in-place.

  const size_type __pos = __s - _M_data();

  if (__pos >= __n)

_M_copy(_M_data(), __s, __n);

  else if (__pos)

_M_move(_M_data(), __s, __n);

  _M_rep()->_M_set_length_and_sharable(__n);

  return *this;

}

 }







When assign is called using emptyStr2.data(), which returns the same value as

that set above for _M_dataplus._M_p, _M_disjunct() will return 0, as will

_M_is_shared() and we end up going into the else block.



Here, since __s = _M_dataplus._M_p, which is what _M_data() returns, __pos = 0

and so the only statements we end up executing are the last two in the else

block thereby setting the length = 4 in the static storage returned by

_M_rep().



This results in the corruption of static storage used by all std::string

objects.



Does that make sense?



==

Using built-in specs.

Target: x86_64-redhat-linux

Configured with: ../configure --prefix=/usr --mandir=/usr/share/man

--infodir=/usr/share/info --enable-shared --enable-threads=posix

--enable-checking=release --with-system-zlib --enable-__cxa_atexit

--disable-libunwind-exceptions --enable-libgcj-multifile

--enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk

--disable-dssi --enable-plugin

--with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic

--host=x86_64-redhat-linux

Thread model: posix

gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)


[Bug libstdc++/48257] std::string::assign() corrupts std::string static data when called on emptyString1 using emptyString2.data()

2011-03-23 Thread mohsinrzaidi at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48257



Mohsin  changed:



   What|Removed |Added



 Status|RESOLVED|UNCONFIRMED

 Resolution|INVALID |



--- Comment #2 from Mohsin  2011-03-23 19:09:31 
UTC ---

(In reply to comment #1)

> this is undefined behaviour, the standard requires that emptyStr2.data() 
> points

> to an array of at least 4 chars



Kindly point me to where this is stated in the standards. I have searched but

have not been able to find any information in this regard.


[Bug libstdc++/48257] std::string::assign() corrupts std::string static data when called on emptyString1 using emptyString2.data()

2011-03-23 Thread mohsinrzaidi at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48257



Mohsin  changed:



   What|Removed |Added



 Status|RESOLVED|UNCONFIRMED

 Resolution|INVALID |



--- Comment #4 from Mohsin  2011-03-24 06:19:52 
UTC ---

(In reply to comment #3)

> In C++03 see 21.3.5.3 [lib.string.assign] paragraph 6 where 

> string::assign(const char* s, size_t n) is defined to return

> assign( string(s, n) ) and that constructor:

> 

> Constructs an object of class basic_string and determines its initial string

> value from the array of charT of length n whose first element is designated by

> s, 

> 

> In C++0x the requirement is explicit, [string::assign] paragraph 8 says

> 

> Requires: s points to an array of at least n elements of charT.



I believe this is what you were referring to (from N3242=11-0012):



basic_string& assign(const charT* s, size_type n);

8 Requires: s points to an array of at least n elements of charT.

9 Throws: length_error if n > max_size().

10 Effects: Replaces the string controlled by *this with a string of length n

whose elements are a copy

of those pointed to by s.

11 Returns: *this.



Thanks for pointing this out. However, does libstdc++ throw and error if "n >

max_size()"? I don't think it does.



>From the latest libstdc++ documentation:







template

basic_string< _CharT, _Traits, _Alloc > & std::basic_string< _CharT, _Traits,

_Alloc >::assign ( const _CharT *  __s,

size_type  __n 

) 



Set value to a C substring.



Parameters:

s The C string to use.

n Number of characters to use.



Returns:

Reference to this string.



This function sets the value of this string to the first n characters of s. If

n is is larger than the number of available characters in s, the remainder of s

is used.



Definition at line 261 of file basic_string.tcc.



References std::basic_string< _CharT, _Traits, _Alloc >::size().







I don't see this as saying that the string has to have at least n elements OR

that an error will be thrown if it isn't.



>From the latest basic_string code:







  template

00259 basic_string<_CharT, _Traits, _Alloc>&

00260 basic_string<_CharT, _Traits, _Alloc>::

00261 assign(const _CharT* __s, size_type __n)

00262 {

00263   __glibcxx_requires_string_len(__s, __n);

00264   _M_check_length(this->size(), __n, "basic_string::assign");

00265   if (_M_disjunct(__s) || _M_rep()->_M_is_shared())

00266 return _M_replace_safe(size_type(0), this->size(), __s, __n);

00267   else

00268 {

00269   // Work in-place.

00270   const size_type __pos = __s - _M_data();

00271   if (__pos >= __n)

00272 _M_copy(_M_data(), __s, __n);

00273   else if (__pos)

00274 _M_move(_M_data(), __s, __n);

00275   _M_rep()->_M_set_length_and_sharable(__n);

00276   return *this;

00277 }

00278  }







I don't see any errors being thrown. I am not trying to nit-pick but I've

raised this issue because I was affected by it and had to spend several days

isolating it. It would have made my life a whole lot easier if, as you claim,

libstdc++ had thrown an error.



Just a request, but please let the discussion complete before you go ahead and

mark the bug as Resolved-Invalid. I don't mind it being marked invalid, but I

consider it disrespectful for you to not even wait for my response and rushing

to mark it as invalid. Once we're through discussing, I will mark it invalid

myself if necessary.


[Bug libstdc++/48257] std::string::assign() corrupts std::string static data when called on emptyString1 using emptyString2.data()

2011-03-24 Thread mohsinrzaidi at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48257



--- Comment #8 from Mohsin  2011-03-24 09:50:11 
UTC ---

(In reply to comment #6)

> Also, GCC 4.1.2 is ancient and not supported here, you should either report

> bugs to the vendor of your version (Red Hat) or refer to a supported version.

> 



In my last comment, I did not post snippets to old code - I posted them off the

latest code.



> As for being disrespectful, the bug was invalid and I know that for a fact,

> which is why I didn't wait for you to agree.  You don't need to reopen the bug

> in order to post a new comment if you'd like pointers to the relevant pieces 
> of

> the standard or the code.



Agreed.


[Bug libstdc++/48257] std::string::assign() corrupts std::string static data when called on emptyString1 using emptyString2.data()

2011-03-24 Thread mohsinrzaidi at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48257



--- Comment #9 from Mohsin  2011-03-24 09:52:56 
UTC ---

(In reply to comment #7)

> (In reply to comment #4)

> > 

> > I don't see any errors being thrown. I am not trying to nit-pick but I've

> > raised this issue because I was affected by it and had to spend several days

> > isolating it. It would have made my life a whole lot easier if, as you 
> > claim,

> > libstdc++ had thrown an error.

> 

> I never claimed there'd be an exception throw in your original example, I said

> it's undefined behaviour.  An exception is thrown if n > max_size()

> 



Agreed.



> > Just a request, but please let the discussion complete before you go ahead 
> > and

> > mark the bug as Resolved-Invalid. I don't mind it being marked invalid, but 
> > I

> > consider it disrespectful for you to not even wait for my response and 
> > rushing

> > to mark it as invalid. Once we're through discussing, I will mark it invalid

> > myself if necessary.

> 

> That's not how it works, most users do not close their own bugs, maintainers

> don't want to keep revisiting old bugs to see if the user has decided to 
> accept

> the resolution yet.



Agreed.(In reply to comment #7)

> (In reply to comment #4)

> > 

> > I don't see any errors being thrown. I am not trying to nit-pick but I've

> > raised this issue because I was affected by it and had to spend several days

> > isolating it. It would have made my life a whole lot easier if, as you 
> > claim,

> > libstdc++ had thrown an error.

> 

> I never claimed there'd be an exception throw in your original example, I said

> it's undefined behaviour.  An exception is thrown if n > max_size()

> 

> > Just a request, but please let the discussion complete before you go ahead 
> > and

> > mark the bug as Resolved-Invalid. I don't mind it being marked invalid, but 
> > I

> > consider it disrespectful for you to not even wait for my response and 
> > rushing

> > to mark it as invalid. Once we're through discussing, I will mark it invalid

> > myself if necessary.

> 

> That's not how it works, most users do not close their own bugs, maintainers

> don't want to keep revisiting old bugs to see if the user has decided to 
> accept

> the resolution yet.



Agreed, thanks.


[Bug libstdc++/48257] std::string::assign() corrupts std::string static data when called on emptyString1 using emptyString2.data()

2011-03-24 Thread mohsinrzaidi at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48257



--- Comment #10 from Mohsin  2011-03-24 10:21:56 
UTC ---

Two questions here:



1. Is the behaviour undefined for __n < number of elements in __s?



2. For cases undefined in the specs, do we take steps to ensure robustness? I

still cannot digest that a programmer error could corrupt std::string static

memory.


[Bug libstdc++/48257] std::string::assign() corrupts std::string static data when called on emptyString1 using emptyString2.data()

2011-03-24 Thread mohsinrzaidi at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48257



--- Comment #11 from Mohsin  2011-03-24 10:24:15 
UTC ---

(In reply to comment #10)

> Two questions here:

> 

> 1. Is the behaviour undefined for __n < number of elements in __s?

>



Oops! I meant for __n >  number of elements in __s. 



>

> 2. For cases undefined in the specs, do we take steps to ensure robustness? I

> still cannot digest that a programmer error could corrupt std::string static

> memory.


[Bug libstdc++/48257] std::string::assign() corrupts std::string static data when called on emptyString1 using emptyString2.data()

2011-03-24 Thread mohsinrzaidi at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48257

--- Comment #13 from Mohsin  2011-03-25 05:48:21 
UTC ---
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #10)
> > > Two questions here:
> > > 
> > > 1. Is the behaviour undefined for __n < number of elements in __s?
> > >
> > 
> > Oops! I meant for __n >  number of elements in __s. 
> Yes!  The standard defines the behaviour of string::assign when s contains at
> least n elements. When that precondition isn't met the definition doesn't 
> apply
> i.e. it's undefined.

Sorry for the confusion here Jon. I meant to ask if the specs define what the
behaviour should be if __s *does not* contain __n elements.

> > > 2. For cases undefined in the specs, do we take steps to ensure 
> > > robustness? 
> Where possible, yes, that's what -D_GLIBCXX_DEBUG tries to do.  But in general
> it's not possible to verify that the supplied string meets the required 
> length.
> Given a const char*, how do you tell if it points to an array of at least n
> chars?  You can't.

We could always look for the null-termination :) But like you say below, this
would add overhead and still not handle all cases.

> It would be possible to check that the supplied const char* is not the shared
> static "empty string" representation, but that would add overhead and still
> wouldn't prevent similar errors like:
>   string s("oops", 999);

Understood.

> > > I
> > > still cannot digest that a programmer error could corrupt std::string 
> > > static
> > > memory.
> Really?  A sufficiently malicious/careless programmer can corrupt pretty much
> anything!

Hehe, understood.

> In any case, if you try your original example with current releases it doesn't
> print '4'

Great! I'll try that.

Thanks for bearing with my impetuous behaviour. I know it mustn't be easy
maintaining such a widely used code base.