wgtmac commented on code in PR #190:
URL: https://github.com/apache/iceberg-cpp/pull/190#discussion_r2342877421


##########
src/iceberg/avro/avro_schema_util_internal.h:
##########
@@ -149,11 +149,44 @@ std::string ToString(const ::avro::LogicalType::Type& 
logical_type);
 /// \return True if the node has a map logical type, false otherwise.
 bool HasMapLogicalType(const ::avro::NodePtr& node);
 
+/// \brief Check if a string is a valid Avro name.
+///
+/// Valid Avro names must:
+/// 1. Start with a letter or underscore
+/// 2. Contain only letters, digits, or underscores
+///
+/// \param name The name to check.
+/// \return True if the name is valid, false otherwise.
+bool ValidAvroName(const std::string& name);

Review Comment:
   ```suggestion
   bool ValidAvroName(std::string_view name);
   ```
   
   `std::string_view` accepts wider types than `std::string`



##########
src/iceberg/avro/avro_schema_util.cc:
##########
@@ -181,10 +234,20 @@ Status ToAvroNodeVisitor::Visit(const StructType& type, 
::avro::NodePtr* node) {
     ::avro::NodePtr field_node;
     ICEBERG_RETURN_UNEXPECTED(Visit(sub_field, &field_node));
 
-    // TODO(gangwu): sanitize field name
-    (*node)->addName(std::string(sub_field.name()));
+    std::string origFieldName = std::string(sub_field.name());
+    bool isValidFieldName = ValidAvroName(origFieldName);
+    std::string fieldName =
+        isValidFieldName ? origFieldName : SanitizeFieldName(origFieldName);

Review Comment:
   ```suggestion
       bool is_valid_field_name = ValidAvroName(sub_field.name());
       std::string field_name =
           is_valid_field_name ? std::string(sub_field.name()) : 
SanitizeFieldName(sub_field.name());
   ```



##########
src/iceberg/avro/avro_schema_util.cc:
##########
@@ -181,10 +234,20 @@ Status ToAvroNodeVisitor::Visit(const StructType& type, 
::avro::NodePtr* node) {
     ::avro::NodePtr field_node;
     ICEBERG_RETURN_UNEXPECTED(Visit(sub_field, &field_node));
 
-    // TODO(gangwu): sanitize field name
-    (*node)->addName(std::string(sub_field.name()));
+    std::string origFieldName = std::string(sub_field.name());
+    bool isValidFieldName = ValidAvroName(origFieldName);
+    std::string fieldName =
+        isValidFieldName ? origFieldName : SanitizeFieldName(origFieldName);

Review Comment:
   We don't want to pay for a copy when it is a valid name.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to