Here is the pseudocode:

rollup(sort(fetch(gatherNodes())))
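To illustrate why the sort matters, here is a rough Python analogy, not Solr code. It assumes rollup behaves like a single-pass aggregation over consecutive runs of equal keys, which is how itertools.groupby works. The list of schematype values is copied from the quoted stream below, and streaming_rollup is a hypothetical helper name used only for this sketch.

```python
from itertools import groupby

# schematype values in the order they appear in the quoted stream
schematypes = [
    "Company", "Founding Event", "Customer", "Founding Event",
    "Employment", "Founding Event", "Employment", "Employment",
    "Employment", "Employment", "Customer",
]


def streaming_rollup(values):
    """Count consecutive runs of equal keys in one pass (hypothetical helper).

    A new bucket starts every time the key changes, so a key that
    reappears later in unsorted input yields a second bucket.
    """
    return [(key, sum(1 for _ in run)) for key, run in groupby(values)]


# Unsorted input: the same key shows up in several buckets.
unsorted_counts = streaming_rollup(schematypes)

# Sorted input: each key forms exactly one run, so one bucket per key.
sorted_counts = streaming_rollup(sorted(schematypes))

print(unsorted_counts)
print(sorted_counts)
```

On the unsorted list this gives repeated buckets for 'Founding Event' and 'Employment', the same pattern as in the reported result; after sorting there is one bucket per schematype. In the actual expression, the equivalent step would be wrapping the fetch in sort with something like by="schematype asc".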

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 22, 2017 at 5:19 PM, Joel Bernstein <joels...@gmail.com> wrote:

> You'll need to use the sort expression to sort the nodes by schematype
> first. The rollup expression is doing a MapReduce rollup that requires
> the records to be sorted by the "over" fields.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Jun 22, 2017 at 2:49 PM, Pratik Patel <pra...@semandex.net> wrote:
>
>> Hi,
>>
>> I have a streaming expression which uses the rollup function. My
>> understanding is that rollup takes an incoming stream and aggregates
>> over the given buckets. However, with the following query the result
>> contains duplicate tuples.
>>
>> Following is the streaming expression.
>>
>> rollup(
>>   fetch(
>>     collection1,
>>     gatherNodes(
>>       collection1,
>>       gatherNodes(collection1,
>>         walk="54227b412a1c4e574f88f2bb->eventParticipantID",
>>         gather="eventID"
>>       ),
>>       walk="eventID->conceptid",
>>       gather="conceptid",
>>       trackTraversal="true", scatter="branches,leaves"
>>     ),
>>     fl="schematype",
>>     on="node=conceptid"
>>   ),
>>   over="schematype",
>>   count(schematype)
>> )
>>
>> The result returned is as follows.
>>
>> {
>>   "result-set": {
>>     "docs": [
>>       {
>>         "count(schematype)": 1,
>>         "schematype": "Company"
>>       },
>>       {
>>         "count(schematype)": 1,
>>         "schematype": "Founding Event"
>>       },
>>       {
>>         "count(schematype)": 1,
>>         "schematype": "Customer"
>>       },
>>       {
>>         "count(schematype)": 1,
>>         "schematype": "Founding Event" // duplicate
>>       },
>>       {
>>         "count(schematype)": 1,
>>         "schematype": "Employment" // duplicate
>>       },
>>       {
>>         "count(schematype)": 1,
>>         "schematype": "Founding Event"
>>       },
>>       {
>>         "count(schematype)": 4,
>>         "schematype": "Employment"
>>       },......
>>     ]
>>   }
>>
>> As you can see, there is more than one tuple for
>> 'Founding Event'/'Employment'.
>>
>> Am I missing something here?
>>
>> Following is the content of the stream which is wrapped by rollup, if it
>> helps.
>>
>> // stream on which rollup is working
>> {
>>   "result-set": {
>>     "docs": [
>>       {
>>         "node": "54227b412a1c4e574f88f2bb",
>>         "schematype": "Company",
>>         "collection": "collection1",
>>         "field": "node",
>>         "level": 0
>>       },
>>       {
>>         "node": "543004f0c92c0a651166aea5",
>>         "schematype": "Founding Event",
>>         "collection": "collection1",
>>         "field": "eventID",
>>         "level": 1
>>       },
>>       {
>>         "node": "543004f0c92c0a651166ae99",
>>         "schematype": "Customer",
>>         "collection": "collection1",
>>         "field": "eventID",
>>         "level": 1
>>       },
>>       {
>>         "node": "543004f0c92c0a651166aea1",
>>         "schematype": "Founding Event",
>>         "collection": "collection1",
>>         "field": "eventID",
>>         "level": 1
>>       },
>>       {
>>         "node": "543004f0c92c0a651166ae78",
>>         "schematype": "Employment",
>>         "collection": "collection1",
>>         "field": "eventID",
>>         "level": 1
>>       },
>>       {
>>         "node": "54ee6178b54c1d65412b5f9f",
>>         "schematype": "Founding Event",
>>         "collection": "collection1",
>>         "field": "eventID",
>>         "level": 1
>>       },
>>       {
>>         "node": "543004f0c92c0a651166ae7c",
>>         "schematype": "Employment",
>>         "collection": "collection1",
>>         "field": "eventID",
>>         "level": 1
>>       },
>>       {
>>         "node": "543004f0c92c0a651166ae80",
>>         "schematype": "Employment",
>>         "collection": "collection1",
>>         "field": "eventID",
>>         "level": 1
>>       },
>>       {
>>         "node": "543004f0c92c0a651166ae8a",
>>         "schematype": "Employment",
>>         "collection": "collection1",
>>         "field": "eventID",
>>         "level": 1
>>       },
>>       {
>>         "node": "543004f0c92c0a651166ae94",
>>         "schematype": "Employment",
>>         "collection": "collection1",
>>         "field": "eventID",
>>         "level": 1
>>       },
>>       {
>>         "node": "543004f0c92c0a651166ae9d",
>>         "schematype": "Customer",
>>         "collection": "collection1",
>>         "field": "eventID",
>>         "level": 1
>>       },
>>       {
>>         "EOF": true,
>>         "RESPONSE_TIME": 38
>>       }
>>     ]
>>   }
>> }
>>
>> If I rollup on the level field then the results are as expected but not
>> when the field is schematype. Any idea what's going on here?
>>
>>
>> Thanks,
>>
>> Pratik