attilakreiner commented on code in PR #10681:
URL: https://github.com/apache/iceberg/pull/10681#discussion_r1678355098


##########
.baseline/checkstyle/checkstyle.xml:
##########
@@ -281,16 +280,13 @@
         </module>
         <module name="InnerAssignment"/> <!-- Java Coding Guidelines: Inner 
assignments: Not used -->
         <module name="MemberName"> <!-- Java Style Guide: Non-constant field 
names -->
-            <property name="format" value="^[a-z][a-zA-Z0-9]+$"/>
-            <message key="name.invalidPattern" value="Member name ''{0}'' must 
match pattern ''{1}''."/>
+            <property name="format" value="^[a-z][a-zA-Z0-9]++$"/>

Review Comment:
   The `*+` and `++` are _possessive_ quantifiers (as opposed to `*` and `+` 
which are _greedy_; and `*?` and `+?` which are _lazy_). 
   
   I found the explanation on this page very useful: 
https://www.rexegg.com/regex-quantifiers.php#possessive
   
   My understanding is that in cases when we can use either _greedy_ and 
_possessive_ quantifiers, _possessive_ are faster. Quote form the link above:
   
   > This behavior is particularly useful when you know there is no valid 
reason why the engine should ever backtrack into a section of matched text, as 
you can save the engine a lot of needless work.
   
   This seems a common regex feature, but honestly speaking I wasn't familiar 
with in until @findepi pointed it out (thx for that) in the review of #10673.
   
   I understand using _possessive_ quantifiers is good for performance, OTOH 
it's possible that many people are unfamiliar with this syntax, so it may be a 
tradeoff in terms of readability (and also a deviation from checkstyle 
standards). WDYT?



##########
.baseline/checkstyle/checkstyle.xml:
##########
@@ -281,16 +280,13 @@
         </module>
         <module name="InnerAssignment"/> <!-- Java Coding Guidelines: Inner 
assignments: Not used -->
         <module name="MemberName"> <!-- Java Style Guide: Non-constant field 
names -->
-            <property name="format" value="^[a-z][a-zA-Z0-9]+$"/>
-            <message key="name.invalidPattern" value="Member name ''{0}'' must 
match pattern ''{1}''."/>
+            <property name="format" value="^[a-z][a-zA-Z0-9]++$"/>

Review Comment:
   The `*+` and `++` are _possessive_ quantifiers (as opposed to `*` and `+` 
which are _greedy_; and `*?` and `+?` which are _lazy_). 
   
   I found the explanation on this page very useful: 
https://www.rexegg.com/regex-quantifiers.php#possessive
   
   My understanding is that in cases when we can use either _greedy_ or 
_possessive_ quantifiers, _possessive_ are faster. Quote form the link above:
   
   > This behavior is particularly useful when you know there is no valid 
reason why the engine should ever backtrack into a section of matched text, as 
you can save the engine a lot of needless work.
   
   This seems a common regex feature, but honestly speaking I wasn't familiar 
with in until @findepi pointed it out (thx for that) in the review of #10673.
   
   I understand using _possessive_ quantifiers is good for performance, OTOH 
it's possible that many people are unfamiliar with this syntax, so it may be a 
tradeoff in terms of readability (and also a deviation from checkstyle 
standards). WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to