kristof.beyls added a comment. Herald added a project: LLVM. This looks fine to me; I just have a number of nit picks. The only part where I don't understand the code logic is around the comment starting with "If this is the final byte of a multi-byte sequence".
================ Comment at: llvm/include/llvm/Support/FormattedStream.h:23-26 /// formatted_raw_ostream - A raw_ostream that wraps another one and keeps track /// of line and column position, allowing padding out to specific column /// boundaries and querying the number of lines written to the stream. /// ---------------- I think it would be useful to add a description to this comment that: (a) This class assumes that a UTF-8 encoding is used on the Stream; and (b) "This doesn't attempt to handle everything unicode can do (combining characters, right-to-left markers, ...), but hopefully covers most things likely to be common in messages and source code we might want to print." ================ Comment at: llvm/lib/Support/FormattedStream.cpp:30 - // Keep track of the current column and line by scanning the string for - // special characters - for (const char *End = Ptr + Size; Ptr != End; ++Ptr) { - ++Column; - switch (*Ptr) { + auto ProcessCodePoint = [&Line, &Column](StringRef CP) { + int Width = sys::locale::columnWidth(CP); ---------------- <bikeshedding mode> Given that ProcessCodePoint assumes that the Unicode code point represented in the UTF-8 encoding, maybe it be slightly better to name the lambda as ProcessUTF8CodePoint rather than ProcessCodePoint? </bikeshedding mode> ================ Comment at: llvm/lib/Support/FormattedStream.cpp:31 + auto ProcessCodePoint = [&Line, &Column](StringRef CP) { + int Width = sys::locale::columnWidth(CP); + // columnWidth returns -1 for non-printing characters. ---------------- I'm wondering if using sys::unicode::columnWidthUTF8 instead of sys::locale::columnWidth would be more future-proof and more clearly describe the intent that this function only processes UTF-8 and not strings encoded in other encodings? ================ Comment at: llvm/lib/Support/FormattedStream.cpp:33 + // columnWidth returns -1 for non-printing characters. + if (Width != -1) + Column += Width; ---------------- The documentation for sys::unicode::columnWidthUTF8 documents it returns ErrorNonPrintableCharacter (-1) if the text contains non-printable characters. Maybe it's more self-documenting to compare against ErrorNonPrintableCharacter rather than -1 in the above if condition? ================ Comment at: llvm/lib/Support/FormattedStream.cpp:36-37 + + // If this is the final byte of a multi-byte sequence, it can't be any of + // the special whitespace characters below. + if (CP.size() > 1) ---------------- Reading through the code linearly from the top to the bottom, I'm a bit surprised by this comment. I would expect CP to contain exactly the bytes that when interpreted as a UTF-8 encoded Unicode character, form exactly one Unicode character. Therefore, I'm not sure how to interpret "If this is the final byte of a multi-byte sequence.". I'm expecting "this" to refer to "CP" in this context. But that cannot be "just" the final byte of a multi-byte sequence, unless my assumption that CP contains exactly the bytes forming a single UTF-8 encoded Unicode character is wrong. Could CP contain a partial UTF-8 encoded character? If so, maybe it'd be better to change the name ProcessCodePoint to something that suggests that could be possible? ================ Comment at: llvm/unittests/Support/formatted_raw_ostream_test.cpp:134 + + // This character encodes as three bytes, so will cause the buffer to be + // flushed after the first byte (4 byte buffer, 3 bytes already written). We ---------------- I guess "This" refers to \u2468? If so, it'd be easier to read this comment if it was written like: "// \u2468 encodes as three bytes, ..." Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76291/new/ https://reviews.llvm.org/D76291 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits