Fokko commented on code in PR #1862: URL: https://github.com/apache/iceberg-python/pull/1862#discussion_r2029322605
########## mkdocs/docs/row-filter-syntax.md: ########## @@ -0,0 +1,171 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Row Filter Syntax + +In addtion to the primary [Expression DSL](expression-dsl.md), PyIceberg provides a string-based statement interface for filtering rows in Iceberg tables. This guide explains the syntax and provides examples for supported operations. + +The row filter syntax is designed to be similar to SQL WHERE clauses. Here are the basic components: + +### Column References + +Columns can be referenced using either unquoted or quoted identifiers: + +```sql +column_name +"column.name" +``` + +### Literals + +The following literal types are supported: + +- Strings: `'hello world'` +- Numbers: `42`, `-42`, `3.14` +- Booleans: `true`, `false` (case insensitive) + +## Comparison Operations + +### Basic Comparisons + +```sql +column = 42 +column != 42 +column > 42 +column >= 42 +column < 42 +column <= 42 +``` + +!!! note + The `==` operator is an alias for `=` and `<>` is an alias for `!=` + +### String Comparisons + +```sql +column = 'hello' +column != 'world' +``` + +## NULL Checks + +Check for NULL values using the `IS NULL` and `IS NOT NULL` operators: + +```sql +column IS NULL +column IS NOT NULL +``` + +## NaN Checks + +For floating-point columns, you can check for NaN values: + +```sql +column IS NAN +column IS NOT NAN +``` + +## IN and NOT IN + +Check if a value is in a set of values: + +```sql +column IN ('a', 'b', 'c') +column NOT IN (1, 2, 3) +``` + +## LIKE Operations + +The LIKE operator supports pattern matching with a wildcard `%` at the end of the string: + +```sql +column LIKE 'prefix%' +column NOT LIKE 'prefix%' +``` + +!!! important + The `%` wildcard is only supported at the end of the pattern. Using it in the middle or beginning of the pattern will raise an error. + +## Logical Operations + +Combine multiple conditions using logical operators: + +```sql +column1 = 42 AND column2 = 'hello' +column1 > 0 OR column2 IS NULL +NOT (column1 = 42) +``` + +!!! tip + Parentheses can be used to group logical operations for clarity: + ```sql + (column1 = 42 AND column2 = 'hello') OR column3 IS NULL + ``` + +## Complete Examples + +Here are some complete examples showing how to combine different operations: + +```sql +-- Complex filter with multiple conditions +status = 'active' AND age > 18 AND NOT (country IN ('US', 'CA')) + +-- Filter with string pattern matching +name LIKE 'John%' AND age >= 21 + +-- Filter with NULL checks and numeric comparisons +price IS NOT NULL AND price > 100 AND quantity > 0 + +-- Filter with multiple logical operations +(status = 'pending' OR status = 'processing') AND NOT (priority = 'low') +``` + +## Common Pitfalls + +1. **String Quoting**: Always use single quotes for string literals. Double quotes are reserved for column identifiers. + ```sql + -- Correct + name = 'John' + + -- Incorrect + name = "John" + ``` + +2. **Wildcard Usage**: The `%` wildcard in LIKE patterns can only appear at the end. + ```sql + -- Correct + name LIKE 'John%' + + -- Incorrect (will raise an error) + name LIKE '%John%' + ``` + +3. **Case Sensitivity**: Boolean literals (`true`/`false`) are case insensitive, but string comparisons are case sensitive. + ```sql + -- All valid + is_active = true + is_active = TRUE + is_active = True + + -- Case sensitive + status = 'Active' -- Will not match 'active' + ``` + +## Best Practices +1. For complex use cases, use the primary [Expression DSL](expression-dsl.md) +2. When using multiple conditions, consider the order of operations (NOT > AND > OR) +3. For string comparisons, be consistent with case usage Review Comment: Nit on whitespace: ```suggestion 1. For complex use cases, use the primary [Expression DSL](expression-dsl.md) 2. When using multiple conditions, consider the order of operations (NOT > AND > OR) 3. For string comparisons, be consistent with case usage ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org