[cfe-users] how clang merge strings in .rodata section

2018-07-04 Thread Jian, Xu via cfe-users
Hi,
The following C source code, abc.c:
#include <stdio.h>
int g_val=10;
const char *g_str="abc";
const char *g_str1="c";
int main(void)
{
    printf("%s %s: %d\n",g_str,g_str1,g_val);
    return 0;
}

When compiled with "clang abc.c -o abc", dumping the .rodata section gives:
# readelf -p .rodata abc

String dump of section '.rodata':
  [ 0]  abc
 [ 4]  %s %s: %d

When compiled with "gcc abc.c -o abc", dumping the .rodata section gives:
$ readelf -p .rodata abc

String dump of section '.rodata':
  [10]  abc
  [14]  c
  [16]  %s %s: %d^J

clang merges the short string ("c") into the tail of the longer string
("abc"), while gcc does not.
Does anybody know how to disable this behavior (to make it behave like gcc)?
Thanks.
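
One way to observe the merge from inside a program is to check whether the
pointer for the short literal falls inside the storage of the long one. The
following is only a minimal sketch and not part of abc.c: the helper name
points_into is hypothetical, and addresses are compared as integers because
relational comparison of pointers into unrelated objects is not defined by the
C standard.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original post): returns 1 if `inner`
 * points at a byte of the NUL-terminated string `outer`, terminator
 * included, and 0 otherwise. */
static int points_into(const char *outer, const char *inner)
{
    assert(outer != NULL && inner != NULL);
    uintptr_t begin = (uintptr_t)outer;
    uintptr_t end = begin + strlen(outer); /* address of the terminator */
    uintptr_t p = (uintptr_t)inner;
    return p >= begin && p <= end;
}
```

With the globals from abc.c, points_into(g_str, g_str1) would be expected to
return 1 for a build where "c" shares the tail of "abc" and 0 where each
literal has its own storage. The result depends on the compiler, linker, and
optimization level in use.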

___
cfe-users mailing list
cfe-users@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-users


Re: [cfe-users] how clang merge strings in .rodata section

2018-07-06 Thread Jian, Xu via cfe-users
Hi Hans,
We need to check whether the ELF files from two builds are identical.
String merging makes that comparison difficult.

For example, take the following lines of code (which may be in different files):
---
const char *s_array[1]={"s"};
const char *first_s="this first bigger s";
const char *second_s="this second bigger s";
---

After clang builds the ELF file, the element of s_array sometimes holds the
address of the tail of first_s in the .rodata section, and sometimes that of
second_s.
This leads to a diff in the .data section, since s_array is stored there.
The ELF files differ even though nothing has changed functionally.

Thanks.
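
To illustrate the point that the two builds are functionally the same even
though the stored addresses differ, here is a minimal sketch that compares two
pointer tables by the strings they refer to rather than by address. The helper
name same_strings is hypothetical; it is not something clang or any ELF tool
provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if two pointer tables of length n refer to
 * equal strings element by element, regardless of where the strings are
 * stored, and 0 otherwise. */
static int same_strings(const char *const *a, const char *const *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(a[i], b[i]) != 0)
            return 0;
    }
    return 1;
}
```

A table whose single element points at the last byte of first_s and one whose
element points at the last byte of second_s hold different addresses, yet both
refer to the string "s", so same_strings reports them as equal.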

-Original Message-
From: hwennb...@google.com [mailto:hwennb...@google.com] On Behalf Of Hans 
Wennborg
Sent: Friday, July 6, 2018 3:54 PM
To: Jian, Xu
Cc: cfe-users@lists.llvm.org
Subject: Re: [cfe-users] how clang merge strings in .rodata section

On Thu, Jul 5, 2018 at 3:18 AM, Jian, Xu via cfe-users wrote:
> clang merges the short string ("c") into the tail of the longer string
> ("abc"), while gcc does not.
>
> Does anybody know how to disable this behavior (to make it behave like gcc)?

I don't think there is a way to disable it.

Why do you want to disable this behaviour?

 - Hans


Re: [cfe-users] how clang merge strings in .rodata section

2018-07-10 Thread Jian, Xu via cfe-users
Hi Hans,
Thank you very much for your support.
This turns out not to be a clang problem.

The problem is that variable strings (the build date and build host) are
injected into the ELF file.
In zile/src/help.c:
  DEFUN ("zile-version", zile_version)
  /*+
  Show the zile version.
  +*/
  {
    minibuf_write ("Zile " VERSION " of " CONFIGURE_DATE " on " CONFIGURE_HOST);

    return TRUE;
  }
  END_DEFUN

This results in .rodata differences between the two builds:
  ***
  *** 1 
  !   [  1d1a]  Zile 2.2.59 of Wed Nov 01 2017 on host-10
  --- 1 
  !   [  1d1a]  Zile 2.2.59 of Wed Jul 04 2018 on host-04

"4" is a constant string defined in the source code.
In zile/src/variables.c:
  /*
  * Default variables values table.
  */
  static struct var_entry
  {
   char *var;    /* Variable name. */
   char *val;    /* Default value. */
   int local;    /* If true, becomes local when set. */
  } def_vars[] =
  {
  #define X(var, val, local, doc) { var, val, local },
  #include "tbl_vars.h"
  #undef X
  };
In zile/src/tbl_vars.h:
  X ("standard-indent", "4", FALSE, "\

"4" points at the end of " \F4" in one build and at the end of "Zile 2.2.59 of
Wed Jul 04 2018 on host-04" in the other, so after linking the .data sections
of the two ELF files differ.
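
The reason the version string becomes a merge candidate in only one of the two
builds can be shown with a simple suffix test: a short string can share the
tail of a longer one only if the longer one ends with it. This is a minimal
sketch; the helper name ends_with is hypothetical and is not how any compiler
or linker actually implements the merge.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `s` ends with `suffix`, that is, if
 * `suffix` could be stored as the tail of `s`, and 0 otherwise. */
static int ends_with(const char *s, const char *suffix)
{
    assert(s != NULL && suffix != NULL);
    size_t n = strlen(s);
    size_t m = strlen(suffix);
    return m <= n && strcmp(s + (n - m), suffix) == 0;
}
```

Applied to the two version strings in the diff above, ends_with returns 1 for
"Zile 2.2.59 of Wed Jul 04 2018 on host-04" with suffix "4" and 0 for
"Zile 2.2.59 of Wed Nov 01 2017 on host-10", which matches the observation
that "4" ends up at the tail of the version string in only one build.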

-Original Message-
From: hwennb...@google.com [mailto:hwennb...@google.com] On Behalf Of Hans 
Wennborg
Sent: Friday, July 6, 2018 5:01 PM
To: Jian, Xu
Cc: cfe-users@lists.llvm.org
Subject: Re: [cfe-users] how clang merge strings in .rodata section

On Fri, Jul 6, 2018 at 10:22 AM, Jian, Xu <xu.j...@dell.com> wrote:
> Hi Hans,
> We need to check whether the ELF files from two builds are identical.
> String merging makes that comparison difficult.
>
> For example, take the following lines of code (which may be in different
> files):
> ---
> const char *s_array[1]={"s"};
> const char *first_s="this first bigger s";
> const char *second_s="this second bigger s";
> ---
>
> After clang builds the ELF file, the element of s_array sometimes holds the
> address of the tail of first_s in the .rodata section, and sometimes that of
> second_s.
> This leads to a diff in the .data section, since s_array is stored there.
> The ELF files differ even though nothing has changed functionally.

Did the inputs change? If Clang is sometimes using the tail of first_s and 
sometimes second_s, for the same input, that's a bug. The compilation should be 
deterministic.

Can you provide sample input files and command lines that show this problem?

Thanks,
Hans

