[Lldb-commits] [lldb] 46cdcf0 - [lldb] Add support for UTF-8 unicode formatting

2021-12-25 Thread Luís Ferreira via lldb-commits

Author: Luís Ferreira
Date: 2021-12-25T20:19:09Z
New Revision: 46cdcf08730012128173cd261767a7d12898c8d6

URL: https://github.com/llvm/llvm-project/commit/46cdcf08730012128173cd261767a7d12898c8d6
DIFF: https://github.com/llvm/llvm-project/commit/46cdcf08730012128173cd261767a7d12898c8d6.diff

LOG: [lldb] Add support for UTF-8 unicode formatting

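This patch adds the missing formatting for UTF-8 Unicode: `char8_t` values now
default to `eFormatUnicode8`, which `DumpTypeValue` dumps as one-byte items.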

Cross-referencing https://reviews.llvm.org/D66447

Reviewed By: labath

Differential Revision: https://reviews.llvm.org/D112564

Added: 


Modified: 
    lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp
    lldb/test/API/functionalities/data-formatter/builtin-formats/TestBuiltinFormats.py

Removed: 




diff --git a/lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp b/lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp
index 0df95594eea2d..88c3aedb4c6b5 100644
--- a/lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp
+++ b/lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp
@@ -5149,6 +5149,8 @@ lldb::Format TypeSystemClang::GetFormat(lldb::opaque_compiler_type_t type) {
 case clang::BuiltinType::UChar:
 case clang::BuiltinType::WChar_U:
   return lldb::eFormatChar;
+case clang::BuiltinType::Char8:
+  return lldb::eFormatUnicode8;
 case clang::BuiltinType::Char16:
   return lldb::eFormatUnicode16;
 case clang::BuiltinType::Char32:
@@ -8957,6 +8959,7 @@ bool TypeSystemClang::DumpTypeValue(
 case eFormatCharPrintable:
 case eFormatCharArray:
 case eFormatBytes:
+case eFormatUnicode8:
 case eFormatBytesWithASCII:
   item_count = byte_size;
   byte_size = 1;

diff --git a/lldb/test/API/functionalities/data-formatter/builtin-formats/TestBuiltinFormats.py b/lldb/test/API/functionalities/data-formatter/builtin-formats/TestBuiltinFormats.py
index c894b80228cfe..7763305b58db8 100644
--- a/lldb/test/API/functionalities/data-formatter/builtin-formats/TestBuiltinFormats.py
+++ b/lldb/test/API/functionalities/data-formatter/builtin-formats/TestBuiltinFormats.py
@@ -115,8 +115,7 @@ def test(self):
 self.assertIn('= \\0\\e90zaZA\\v\\t\\r\\n\\f\\b\\a \n', self.getFormatted("character array", string_expr))
 self.assertIn('= \\0\\e90zaZA\\v\\t\\r\\n\\f\\b\\a \n', self.getFormatted("character", string_expr))
 self.assertIn('= ..90zaZA... \n', self.getFormatted("printable character", string_expr))
-# FIXME: This should probably print the characters in the uint128_t.
-self.assertIn('= 0x2007080c0a0d090b415a617a30391b00\n', self.getFormatted("unicode8", string_expr))
+self.assertIn('= 0x00 0x1b 0x39 0x30 0x7a 0x61 0x5a 0x41 0x0b 0x09 0x0d 0x0a 0x0c 0x08 0x07 0x20\n', self.getFormatted("unicode8", string_expr))
 
 # OSType
 ostype_expr = "(__UINT64_TYPE__)0x"
@@ -137,6 +136,9 @@ def test(self):
 # bytes with ASCII
 self.assertIn(r'= " \U001b\a\b\f\n\r\t\vaA09\0"', self.getFormatted("bytes with ASCII", "cstring"))
 
+# unicode8
+self.assertIn('= 0x78 0x56 0x34 0x12\n', self.getFormatted("unicode8", "0x12345678"))
+
 # unicode16
 self.assertIn('= U+5678 U+1234\n', self.getFormatted("unicode16", "0x12345678"))
 

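The expected strings in the updated test follow from the `DumpTypeValue` change, which sets `item_count = byte_size` and `byte_size = 1` for `eFormatUnicode8`. The following is a minimal Python sketch of that splitting, not LLDB's implementation. It assumes a little-endian target, and the helper names (`split_items`, `format_unicode8`, `format_unicode16`) are hypothetical and exist only for illustration.

```python
def split_items(value, byte_size, item_size):
    """Split a byte_size-wide value into item_size-wide items.

    Assumption: little-endian layout, as the expected test output implies.
    """
    raw = value.to_bytes(byte_size, "little")
    return [
        int.from_bytes(raw[i:i + item_size], "little")
        for i in range(0, byte_size, item_size)
    ]


def format_unicode8(value, byte_size):
    # One item per byte, mirroring item_count = byte_size; byte_size = 1.
    return " ".join(f"0x{b:02x}" for b in split_items(value, byte_size, 1))


def format_unicode16(value, byte_size):
    # Two-byte items, shown only for comparison with the unicode16 check.
    return " ".join(f"U+{u:04x}" for u in split_items(value, byte_size, 2))


print(format_unicode8(0x12345678, 4))   # 0x78 0x56 0x34 0x12
print(format_unicode16(0x12345678, 4))  # U+5678 U+1234
```

Applying the same one-byte split to the 128-bit value from the removed FIXME line, `0x2007080c0a0d090b415a617a30391b00`, gives the sixteen-byte sequence the new assertion expects.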


___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [PATCH] D112564: [lldb] Add support for UTF-8 unicode formatting

2021-12-25 Thread Luís Ferreira via Phabricator via lldb-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG46cdcf087300: [lldb] Add support for UTF-8 unicode 
formatting (authored by ljmf00).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112564/new/

https://reviews.llvm.org/D112564

Files:
  lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp
  lldb/test/API/functionalities/data-formatter/builtin-formats/TestBuiltinFormats.py


Index: lldb/test/API/functionalities/data-formatter/builtin-formats/TestBuiltinFormats.py
===
--- lldb/test/API/functionalities/data-formatter/builtin-formats/TestBuiltinFormats.py
+++ lldb/test/API/functionalities/data-formatter/builtin-formats/TestBuiltinFormats.py
@@ -115,8 +115,7 @@
 self.assertIn('= \\0\\e90zaZA\\v\\t\\r\\n\\f\\b\\a \n', self.getFormatted("character array", string_expr))
 self.assertIn('= \\0\\e90zaZA\\v\\t\\r\\n\\f\\b\\a \n', self.getFormatted("character", string_expr))
 self.assertIn('= ..90zaZA... \n', self.getFormatted("printable character", string_expr))
-# FIXME: This should probably print the characters in the uint128_t.
-self.assertIn('= 0x2007080c0a0d090b415a617a30391b00\n', self.getFormatted("unicode8", string_expr))
+self.assertIn('= 0x00 0x1b 0x39 0x30 0x7a 0x61 0x5a 0x41 0x0b 0x09 0x0d 0x0a 0x0c 0x08 0x07 0x20\n', self.getFormatted("unicode8", string_expr))
 
 # OSType
 ostype_expr = "(__UINT64_TYPE__)0x"
@@ -137,6 +136,9 @@
 # bytes with ASCII
 self.assertIn(r'= " \U001b\a\b\f\n\r\t\vaA09\0"', self.getFormatted("bytes with ASCII", "cstring"))
 
+# unicode8
+self.assertIn('= 0x78 0x56 0x34 0x12\n', self.getFormatted("unicode8", "0x12345678"))
+
 # unicode16
 self.assertIn('= U+5678 U+1234\n', self.getFormatted("unicode16", "0x12345678"))
 
Index: lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp
===
--- lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp
+++ lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp
@@ -5149,6 +5149,8 @@
 case clang::BuiltinType::UChar:
 case clang::BuiltinType::WChar_U:
   return lldb::eFormatChar;
+case clang::BuiltinType::Char8:
+  return lldb::eFormatUnicode8;
 case clang::BuiltinType::Char16:
   return lldb::eFormatUnicode16;
 case clang::BuiltinType::Char32:
@@ -8957,6 +8959,7 @@
 case eFormatCharPrintable:
 case eFormatCharArray:
 case eFormatBytes:
+case eFormatUnicode8:
 case eFormatBytesWithASCII:
   item_count = byte_size;
   byte_size = 1;



[Lldb-commits] [PATCH] D114668: [lldb][NFC] Move generic DWARFASTParser code out of Clang-specific code

2021-12-25 Thread Luís Ferreira via Phabricator via lldb-commits
ljmf00 added a comment.

In D114668#3159640, @bulbazord wrote:

> I think breaking it out of the Clang-specific class makes sense if we want 
> LLDB to be more language-agnostic. Do you have an idea of what bits of 
> `DWARFASTParserClang` can be moved out other than `ParseChildArrayInfo` and 
> `GetAccessTypeFromDWARF` (from the patch on top of this)? What is your 
> end-goal with this decoupling? I assume you want to work towards supporting 
> non-clang-based languages but I'm curious about the motivation.

@bulbazord Yes, my plan is to make the LLDB interfaces more language-agnostic, 
so they can accommodate a DWARFASTParser and TypeSystem for the D programming 
language. I've seen other language plugins, such as Go, that simply copy and 
paste this method, but I want to make the D addition clearer and avoid such 
duplication. You can see similar changes that decouple Clang-specific code in 
the stacked revisions.

I have made the requested changes. Can you re-review, please? Also pinging 
@shafik.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114668/new/

https://reviews.llvm.org/D114668
