gsmiller opened a new issue, #12190:
URL: https://github.com/apache/lucene/issues/12190

   ### Description
   
   A use-case was discussed on the [users 
list](https://lists.apache.org/thread/1yvzm400xgh8gfmo1r0kyfz6y2o1b438) for 
faceting based on expressions. The idea is to assign faceting path weights 
based on the output of an expression, where the expression can reference 
"aggregation" variables.
   
   An "aggregation variable" would provide a single value for any given path by 
aggregating some "normal" document-level variable using a specified aggregation 
function. If you think of this in a "map / reduce" kind of paradigm, an 
aggregation variable references some document-level variable for "mapping," and 
an aggregation function for "reduction." We already implement "aggregation 
variables" with association faceting (e.g., `TaxonomyFacetFloatAssociations`, 
`TaxonomyFacetIntAssociations`). For example, imagine you were indexing 
real-world cities as documents, and those documents contained a facet field for 
the city's `country` and a field for the city's `population`. With the current 
association faceting implementations, you could facet on `country`, assigning 
each unique country a weight based on `sum(population)` (i.e., an "aggregation 
variable" referencing the `population` document-level variable and using `SUM` 
as an aggregation function).
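   
   As a rough illustration of the map/reduce framing above, here is a minimal 
sketch using plain Java collections rather than the Lucene faceting API. The 
`City` record, the method name, and the sample numbers are made up for this 
sketch and are not real population figures.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AggregationVariableSketch {
  // Hypothetical stand-in for an indexed document: one facet value (country)
  // plus one numeric document-level variable (population).
  public record City(String name, String country, long population) {}

  // "Map": read the document-level variable (population) from each document.
  // "Reduce": fold those values into one weight per facet path (country) using SUM.
  public static Map<String, Long> sumPopulationByCountry(List<City> cities) {
    Map<String, Long> weights = new TreeMap<>();
    for (City city : cities) {
      weights.merge(city.country(), city.population(), Long::sum);
    }
    return weights;
  }

  public static void main(String[] args) {
    // Illustrative numbers only.
    List<City> cities = List.of(
        new City("Lyon", "France", 500_000L),
        new City("Paris", "France", 2_100_000L),
        new City("Osaka", "Japan", 2_700_000L));
    System.out.println(sumPopulationByCountry(cities));
  }
}
```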
   
   For expression faceting, the idea is to be able to combine several of these 
"aggregation variables" into a single weight. Extending the above example, if 
we added lat/lon locations to each document and defined a `distance` variable 
for each document (which itself would be some expression based on some input 
lat/lon), we might want to define an expression like:
   `1000 * ln( 1 / var_distance_max ) + var_population_sum`.
   
   I've made up a syntax here where an "aggregation variable" always begins 
with `var_`, followed by the actual name of the document-level variable being 
referenced, followed by `_[max|sum]` to describe which aggregation function to 
use (right now, we only support max/sum in association faceting, but we could 
extend this).
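   
   To make the made-up naming scheme concrete, here is a minimal sketch in 
plain Java of how one path's weight could be computed. It does not use the 
Lucene expressions module: the `Doc` record, the `resolve` and `weight` helpers, 
and the sample values are all hypothetical, and the expression itself is 
hard-coded rather than parsed.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExpressionFacetSketch {
  // Hypothetical per-document values for the documents under one facet path.
  public record Doc(double distance, double population) {}

  // Matches the made-up syntax: var_<name>_<max|sum>.
  private static final Pattern AGG_VAR = Pattern.compile("var_(\\w+)_(max|sum)");

  // Resolve one aggregation variable over the documents under a single path.
  public static double resolve(String variable, List<Doc> docs) {
    Matcher m = AGG_VAR.matcher(variable);
    if (!m.matches()) {
      throw new IllegalArgumentException("Not an aggregation variable: " + variable);
    }
    // Pick the document-level variable being referenced.
    ToDoubleFunction<Doc> field = switch (m.group(1)) {
      case "distance" -> Doc::distance;
      case "population" -> Doc::population;
      default -> throw new IllegalArgumentException("Unknown variable: " + m.group(1));
    };
    // Apply the aggregation function named by the suffix.
    var values = docs.stream().mapToDouble(field);
    return m.group(2).equals("max") ? values.max().orElse(0) : values.sum();
  }

  // Weight for one path: 1000 * ln(1 / var_distance_max) + var_population_sum.
  public static double weight(List<Doc> docs) {
    double distanceMax = resolve("var_distance_max", docs);
    double populationSum = resolve("var_population_sum", docs);
    return 1000 * Math.log(1 / distanceMax) + populationSum;
  }

  public static void main(String[] args) {
    List<Doc> docsUnderPath = List.of(new Doc(2.0, 100.0), new Doc(4.0, 250.0));
    System.out.println(weight(docsUnderPath));
  }
}
```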
   
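   Note that this idea is different from the faceting demo found in 
`ExpressionAggregationFacetsExample`. In that example, a single document-level 
expression is evaluated for each document, and the results are then aggregated 
at each path. Here, the document-level variables are aggregated first, and the 
expression is evaluated once per path over the aggregated values.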
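   
   The difference in ordering can be sketched in plain Java as follows. This is 
only an illustration: the `Doc` record, the `population / distance` expression, 
and the sample values are hypothetical and are not taken from the demo class.

```java
import java.util.List;

public class AggregationOrderSketch {
  // Hypothetical per-document values for the documents under one facet path.
  public record Doc(double distance, double population) {}

  // Demo-style ordering: evaluate an expression per document, then aggregate (SUM here).
  public static double aggregateOfExpression(List<Doc> docs) {
    return docs.stream().mapToDouble(d -> d.population() / d.distance()).sum();
  }

  // Ordering discussed in this issue: aggregate each variable first,
  // then evaluate the expression once per path.
  public static double expressionOfAggregates(List<Doc> docs) {
    double populationSum = docs.stream().mapToDouble(Doc::population).sum();
    double distanceMax = docs.stream().mapToDouble(Doc::distance).max().orElse(Double.NaN);
    return populationSum / distanceMax;
  }

  public static void main(String[] args) {
    List<Doc> docs = List.of(new Doc(2.0, 100.0), new Doc(4.0, 200.0));
    // The two orderings generally produce different weights for the same path.
    System.out.println(aggregateOfExpression(docs));
    System.out.println(expressionOfAggregates(docs));
  }
}
```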
   
   We explored this idea in a draft PR (#12184) to shake out some ideas and 
details.
   
   I think this idea is worth exploring further, but it needs some thought. 
For example, there's the issue of landing on an expression syntax that 
actually makes sense as part of the API (e.g., this `var_<name>_[max|sum]` 
syntax feels a little loose to me). I'm sure there are other functional 
questions as well. @stefanvodita raised some in the draft PR.
   
   Let's explore the idea a bit and decide if it's worth moving forward or if 
we want to scrap it.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org