gsmiller opened a new issue, #12190: URL: https://github.com/apache/lucene/issues/12190
### Description

A use case was discussed on the [users list](https://lists.apache.org/thread/1yvzm400xgh8gfmo1r0kyfz6y2o1b438) for faceting based on expressions. The idea is to assign faceting path weights based on the output of an expression, where the expression can reference "aggregation" variables. An "aggregation variable" would provide a single value for any given path by aggregating some "normal" document-level variable using a specified aggregation function. If you think of this in a "map / reduce" kind of paradigm, an aggregation variable references some document-level variable for "mapping," and an aggregation function for "reduction."

We already implement "aggregation variables" with association faceting (e.g., `TaxonomyFacetFloatAssociations`, `TaxonomyFacetIntAssociations`). For example, imagine you were indexing real-world cities as documents, and those documents contained a facet field for the city's `country` and a field for the city's `population`. With the current association faceting implementations, you could facet on `country`, assigning each unique country a weight based on `sum(population)` (i.e., an "aggregation variable" referencing the `population` document-level variable and using `SUM` as an aggregation function).

For expression faceting, the idea is to be able to combine several of these "aggregation variables" into a single weight. Extending the above example, if we added lat/lon locations to each document and defined a `distance` variable for each document (which itself would be some expression based on some input lat/lon), we might want to define an expression like: `1000 * ln( 1 / var_distance_max ) + var_population_sum`.
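
To make the map / reduce framing concrete, here is a rough plain-Java sketch of how the example weight could be computed per country. It deliberately uses no Lucene APIs: the names `ExpressionFacetSketch`, `City`, `Aggregates`, `aggregate`, and `weight` are hypothetical, the sample numbers are made up, and `distance` is assumed to be already computed for each document.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; none of these names are Lucene APIs. */
final class ExpressionFacetSketch {

  /** One "document": a city with its facet value and document-level variables. */
  static final class City {
    final String country;    // facet path
    final double population; // document-level variable
    final double distance;   // document-level variable, assumed precomputed from lat/lon

    City(String country, double population, double distance) {
      this.country = country;
      this.population = population;
      this.distance = distance;
    }
  }

  /** Per-path state holding the two "aggregation variables" used by the expression. */
  static final class Aggregates {
    double populationSum = 0.0;                    // var_population_sum
    double distanceMax = Double.NEGATIVE_INFINITY; // var_distance_max
  }

  /** "Map" each document to its variables and "reduce" them per facet path. */
  static Map<String, Aggregates> aggregate(List<City> cities) {
    Map<String, Aggregates> byCountry = new LinkedHashMap<>();
    for (City city : cities) {
      Aggregates agg = byCountry.computeIfAbsent(city.country, k -> new Aggregates());
      agg.populationSum += city.population;
      agg.distanceMax = Math.max(agg.distanceMax, city.distance);
    }
    return byCountry;
  }

  /** Evaluates 1000 * ln(1 / var_distance_max) + var_population_sum (assumes distance > 0). */
  static double weight(Aggregates agg) {
    return 1000.0 * Math.log(1.0 / agg.distanceMax) + agg.populationSum;
  }

  public static void main(String[] args) {
    // Made-up sample data, not real cities.
    List<City> cities = List.of(
        new City("A", 100.0, 1.0),
        new City("A", 50.0, 0.5),
        new City("B", 10.0, 2.0));
    for (Map.Entry<String, Aggregates> entry : aggregate(cities).entrySet()) {
      System.out.println(entry.getKey() + " -> " + weight(entry.getValue()));
    }
  }
}
```

The point of the sketch is that the expression is evaluated once per path, after aggregation, rather than once per document.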

In that expression, I've made up a syntax in which an "aggregation variable" always begins with `var_`, followed by the name of the document-level variable being referenced, followed by `_[max|sum]` to indicate which aggregation function to use (right now, we only support max/sum in association faceting, but we could extend this).

Note that this idea is different from the faceting demo found in `ExpressionAggregationFacetsExample`. In that example, a single document-level expression is created and then aggregated at each path.

We explored this idea in a draft PR (#12184) to shake out some ideas and details. I think this is an interesting idea to explore further, but it needs some thought. For example, there's the issue of landing on an expression syntax that actually feels like it makes sense as part of the API (e.g., this `var_<name>_[max|sum]` syntax feels a little loose to me). I'm sure there are other functional questions as well; @stefanvodita raised some in the draft PR.

Let's explore the idea a bit and decide if it's worth moving forward or if we want to scrap it.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.