Re: Stream expressions: Break up multivalue field into usable tuples

Doug Turnbull Sat, 08 Oct 2016 17:55:09 -0700

Joel -- thanks! Got this working and now feel in better shape to grok
what's happening.


Out of curiosity, is there any work being done to customize scoreNodes
scoring? There's a bunch of other forms of similarity I wouldn't mind
playing with as well.

On Thu, Sep 22, 2016 at 6:06 PM Joel Bernstein <joels...@gmail.com> wrote:

You could use the facet() expression, which works with multi-value fields.
This emits aggregated tuples useful for recommendations. For example:



facet(baskets,
      q="item:taco",
      buckets="item",
      bucketSorts="count(*) desc",
      bucketSizeLimit="100",
      count(*))
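As a rough illustration of what this aggregation computes, here is a plain-Python sketch (not Solr code, and not how Solr implements faceting). The sample baskets and the helper name facet_counts are invented for the example; the field name item follows the expression above.

```python
from collections import Counter

# Invented sample data: one dict per basket, with a multi-valued "item" field.
baskets = [
    {"basket_id": 1, "item": ["taco", "salsa", "eggs"]},
    {"basket_id": 2, "item": ["taco", "salsa"]},
    {"basket_id": 3, "item": ["eggs", "bread"]},
]

def facet_counts(docs, query_item, limit=100):
    """Count item values across docs containing query_item, highest count first."""
    counts = Counter()
    for doc in docs:
        if query_item in doc["item"]:      # stands in for q="item:taco"
            counts.update(doc["item"])     # each value of the multi-valued field gets its own bucket
    return [{"item": item, "count(*)": n} for item, n in counts.most_common(limit)]

tuples = facet_counts(baskets, "taco")
for t in tuples:
    print(t)
```

In this sketch the queried item is itself one of the buckets, since every matching basket contains it.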



You can feed this to scoreNodes() to score the tuples for a recommendation.
scoreNodes is a graph expression, so it expects tuples to be formatted like
a node set. Specifically, it looks for the following fields: node, field and
collection, which it uses to retrieve the IDF for each node.

The select() function can turn your facet response into a node set, so
scoreNodes can operate on it:



scoreNodes(
    select(
        facet(baskets,
              q="item:taco",
              buckets="item",
              bucketSorts="count(*) desc",
              bucketSizeLimit=100,
              count(*)),
        item as node,
        count(*),
        replace(collection, null, withValue=baskets),
        replace(field, null, withValue=item)))
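To make the reshaping concrete, here is a plain-Python sketch (not Solr code) of what the select() call above does to each facet tuple: rename item to node, keep count(*), and fill in constant collection and field values. The sample tuples and the helper name to_node_set are made up for illustration.

```python
# Invented facet() output: one aggregated tuple per bucket.
facet_tuples = [
    {"item": "salsa", "count(*)": 12},
    {"item": "nachos", "count(*)": 7},
]

def to_node_set(tuples, collection, field):
    """Rename item -> node and attach the collection and field names."""
    return [
        {
            "node": t["item"],          # item as node
            "count(*)": t["count(*)"],  # count(*) passed through unchanged
            "collection": collection,   # replace(collection, null, withValue=baskets)
            "field": field,             # replace(field, null, withValue=item)
        }
        for t in tuples
    ]

node_set = to_node_set(facet_tuples, collection="baskets", field="item")
for t in node_set:
    print(t)
```

Each resulting tuple carries the three fields scoreNodes looks for (node, field, collection) alongside the count.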



There is a ticket open to have scoreNodes operate directly on the facet()
function so you don't have to deal with the select() function:
https://issues.apache.org/jira/browse/SOLR-9537
I'd like to get to this soon.


Joel Bernstein

http://joelsolr.blogspot.com/



On Thu, Sep 22, 2016 at 5:02 PM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:



> I have a field like the following in my search index:
>
> {
>    "shopper_id": 1234,
>    "basket_id": 2512,
>    "items_bought": ["eggs", "tacos", "nachos"]
> }
>
> {
>    "shopper_id": 1236,
>    "basket_id": 2515,
>    "items_bought": ["eggs", "tacos", "chicken", "bubble gum"]
> }
>
> I would like to use some of the stream expression capabilities (in this
> case I'm looking at the recsys stuff), but it seems like I need to break up
> my data into tuples like
>
> {
>    "shopper_id": 1234,
>    "basket_id": 2512,
>    "item": "egg"
> },
> {
>    "shopper_id": 1234,
>    "basket_id": 2512,
>    "item": "taco"
> },
> {
>    "shopper_id": 1234,
>    "basket_id": 2512,
>    "item": "nacho"
> }
> ...
>
> For various other reasons, I'd prefer to keep my original data model with
> Solr doc == one shopper basket.
>
> Now, is there a way to take the documents above, as output from a search
> tuple source, and apply a stream mutator to emit baskets with a field broken
> up like above? (Do let me know if I'm missing something completely here.)
>
> Thanks!
> -Doug

>
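For illustration only, the per-item tuples described in the question can be sketched in plain Python. This is a client-side approximation of the desired output shape, not a Solr stream expression; the helper name explode is hypothetical, and the values are copied as stored rather than singularized.

```python
def explode(doc, multi_field, single_field):
    """Yield one tuple per value of multi_field, copying the remaining fields."""
    base = {k: v for k, v in doc.items() if k != multi_field}
    for value in doc[multi_field]:
        yield {**base, single_field: value}

# First basket from the question.
basket = {
    "shopper_id": 1234,
    "basket_id": 2512,
    "items_bought": ["eggs", "tacos", "nachos"],
}

tuples = list(explode(basket, "items_bought", "item"))
for t in tuples:
    print(t)
```

The indexed document stays one basket per Solr doc; only the emitted tuples are split per item.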
