wgtmac commented on code in PR #156:
URL: https://github.com/apache/iceberg-cpp/pull/156#discussion_r2255686870


##########
test/transform_test.cc:
##########
@@ -24,6 +24,7 @@
 
 #include <gmock/gmock.h>
 #include <gtest/gtest.h>
+#include <iceberg/expression/literal.h>

Review Comment:
   ```suggestion
   #include "iceberg/expression/literal.h"
   ```



##########
src/iceberg/type_fwd.h:
##########
@@ -36,6 +36,8 @@ enum class TypeId {
   kStruct,
   kList,
   kMap,
+  kNull,  // Note: A type having no physical storage. This is not an iceberg 
type, we add

Review Comment:
   See my comment above.



##########
src/iceberg/transform_function.cc:
##########
@@ -420,8 +428,8 @@ Result<std::unique_ptr<TransformFunction>> 
HourTransform::Make(
 VoidTransform::VoidTransform(std::shared_ptr<Type> const& source_type)
     : TransformFunction(TransformType::kVoid, source_type) {}
 
-Result<std::optional<Literal>> VoidTransform::Transform(const Literal& 
literal) {
-  return std::nullopt;
+Result<Literal> VoidTransform::Transform(const Literal& literal) {
+  return Literal::Null();

Review Comment:
   Per the spec, `All transforms must return `null` for a `null` input value`. 
We need to check if the input literal is a null for every kind of transform.
   



##########
src/iceberg/transform_function.cc:
##########
@@ -150,17 +150,25 @@ Result<std::optional<Literal>> 
TruncateTransform::Transform(const Literal& liter
       return NotImplemented("Truncate for Decimal is not implemented yet");
     }
     case TypeId::kString: {
+      // Strings are truncated to a valid UTF-8 string with no more than L 
code points.
       auto value = std::get<std::string>(literal.value());
-      if (value.size() > static_cast<size_t>(width_)) {
-        size_t safe_point = width_;
-        while (safe_point > 0 && (value[safe_point] & 0xC0) == 0x80) {
-          // Find the last valid UTF-8 character boundary before or at width_
-          safe_point--;
+      size_t code_point_count = 0;
+      size_t safe_point = 0;
+
+      for (size_t i = 0; i < value.size(); ++i) {
+        // Start of a new UTF-8 code point
+        if ((value[i] & 0xC0) != 0x80) {

Review Comment:
   nit: add a `iceberg/util/truncate_util.h` so we can test individual 
functions (utf8, integer, decimal, etc.) better without the boilerplate to 
create literal and transform. 



##########
src/iceberg/type.h:
##########
@@ -191,6 +191,23 @@ class ICEBERG_EXPORT MapType : public NestedType {
 /// Primitive types do not have nested fields.
 /// @{
 
+/// \brief A data type that has no physical storage.
+/// Technically, this is a primitive type, we treat it as a primitive type for
+/// convenience.
+class ICEBERG_EXPORT NullType : public PrimitiveType {

Review Comment:
   I don't think adding `NullType` and `TypeId::kNull` is the right path 
forward. It is confusing since it does not exist in the spec and may conflict 
with the purpose of `UnknownType` in v3. I'd suggest to revert changes related 
to this new type.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to