Joel -- thanks! Got this working and now feel in better shape to grok what's happening.

Out of curiosity, is there any work being done to customize scoreNodes scoring? There's a bunch of other forms of similarity I wouldn't mind playing with as well.

On Thu, Sep 22, 2016 at 6:06 PM Joel Bernstein <joels...@gmail.com> wrote:

You could use the facet() expression which works with multi-value fields. This emits aggregated tuples useful for recommendations. For example:

    facet(baskets,
          q="item:taco",
          buckets="item",
          bucketSorts="count(*) desc",
          bucketSizeLimit="100",
          count(*))

You can feed this to scoreNodes() to score the tuples for a recommendation. scoreNodes is a graph expression so it expects tuples to be formatted like a node set. Specifically it looks for the following fields: node, field and collection, which it uses to retrieve the IDF for each node.

The select() function can turn your facet response into a node set, so scoreNodes can operate on it:

    scoreNodes(
        select(facet(baskets,
                     q="item:taco",
                     buckets="item",
                     bucketSorts="count(*) desc",
                     bucketSizeLimit=100,
                     count(*)),
               item as node,
               count(*),
               replace(collection, null, withValue=baskets),
               replace(field, null, withValue=item)))

There is a ticket open to have scoreNodes operate directly on the facet() function so you don't have to deal with the select() function.

https://issues.apache.org/jira/browse/SOLR-9537

I'd like to get to this soon.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Sep 22, 2016 at 5:02 PM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:

> I have a field like follows in my search index
>
> {
>   "shopper_id": 1234,
>   "basket_id": 2512,
>   "items_bought": ["eggs", "tacos", "nachos"]
> }
>
> {
>   "shopper_id": 1236,
>   "basket_id": 2515,
>   "items_bought": ["eggs", "tacos", "chicken", "bubble gum"]
> }
>
> I would like to use some of the stream expression capabilities (in this
> case I'm looking at the recsys stuff) but it seems like I need to break up
> my data into tuples like
>
> {
>   "shopper_id": 1234,
>   "basket_id": 2512,
>   "item": "egg"
> },
> {
>   "shopper_id": 1234,
>   "basket_id": 2512,
>   "item": "taco"
> },
> {
>   "shopper_id": 1234,
>   "basket_id": 2512,
>   "item": "nacho"
> }
> ...
>
> For various other reasons, I'd prefer to keep my original data model with
> Solr doc == one shopper basket.
>
> Now is there a way to take documents above, output from a search tuple
> source and apply a stream mutator to emit baskets with a field broken up
> like above? (do let me know if I'm missing something completely here)
>
> Thanks!
> -Doug
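
For anyone trying the facet() -> select() -> scoreNodes() expression from Joel's reply, here is a rough Python sketch of how it might be assembled and sent to a collection's /stream handler. The host and port, the helper names (score_nodes_expr, stream_url, tuples_from_response, run) and the default limit are assumptions made for illustration; the "baskets" collection and "item" field are taken from Joel's example. Treat it as a starting point under those assumptions rather than a tested client.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Solr node at the default port. Adjust for your setup.
SOLR_URL = "http://localhost:8983/solr"


def score_nodes_expr(collection, field, term, limit=100):
    """Build the facet -> select -> scoreNodes expression from the thread."""
    facet = (
        f'facet({collection}, q="{field}:{term}", buckets="{field}", '
        f'bucketSorts="count(*) desc", bucketSizeLimit={limit}, count(*))'
    )
    # select() renames the bucket to "node" and fills in the "collection"
    # and "field" values that scoreNodes looks for.
    select = (
        f"select({facet}, {field} as node, count(*), "
        f"replace(collection, null, withValue={collection}), "
        f"replace(field, null, withValue={field}))"
    )
    return f"scoreNodes({select})"


def stream_url(collection, expr, base_url=SOLR_URL):
    """URL of the collection's /stream handler with the expression encoded."""
    query = urllib.parse.urlencode({"expr": expr})
    return f"{base_url}/{collection}/stream?{query}"


def tuples_from_response(payload):
    """Return the tuples of a /stream response, dropping the EOF marker."""
    docs = payload["result-set"]["docs"]
    return [doc for doc in docs if not doc.get("EOF")]


def run(collection, expr):
    """Send the expression to Solr (requires a running server)."""
    with urllib.request.urlopen(stream_url(collection, expr)) as resp:
        return tuples_from_response(json.load(resp))


if __name__ == "__main__":
    # Only prints the expression; call run("baskets", expr) against a live Solr.
    print(score_nodes_expr("baskets", "item", "taco"))
```

Building the expression as a string keeps it easy to paste into the Solr admin UI for comparison, and keeping run() separate from the builders means the expression can be checked without a server.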