It looks like you just need to set the rows parameter in the search expression. If you don't set rows, the default will be 20 I believe, which will pull the top 20 docs from each shard. If you have 5 shards, then the 1,000 results would make sense.
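For example, taking the expression from your message and just adding rows to the inner search (the 100000 here is only a placeholder; set it to whatever upper bound you need):

classify(model(models, id="MODEL1014", cacheMillis=5000),
         search(COL,
                df="FULL_DOCUMENT",
                q="Collection:(COLLECT2000) AND DocTimestamp:[2017-08-14T04:00:00Z TO 2017-08-15T03:59:00Z]",
                fl="id,score",
                sort="id asc",
                rows="100000"),
         field="ClusterText")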
You can parallelize the whole expression by wrapping it in a parallel
expression. You'll need to set the partitionKeys parameter in the search
expression to do this (see the sketch below the quoted message). If you
have a large number of records to process, I would recommend batch
processing. This blog post explains the parallel batch framework:

http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html


Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Aug 14, 2017 at 7:53 PM, Joe Obernberger <joseph.obernber...@gmail.com> wrote:

> Hi All - I'm using the classify stream expression and the results returned
> are always limited to 1,000. Where do I specify the number to return? The
> stream expression that I'm using looks like:
>
> classify(model(models,id="MODEL1014",cacheMillis=5000),search(COL,df="FULL_DOCUMENT",q="Collection:(COLLECT2000) AND DocTimestamp:[2017-08-14T04:00:00Z TO 2017-08-15T03:59:00Z]",fl="id,score",sort="id asc"),field="ClusterText")
>
> When I read this (code snippet):
>
> stream.open();
> while (true) {
>     Tuple tuple = stream.read();
>     if (tuple.EOF) {
>         break;
>     }
>     Double probability = (Double) tuple.fields.get("probability_d");
>     String docID = (String) tuple.fields.get("id");
>
> I get back 1,000 results. Another question is if there is a way to
> parallelize the classify call to other worker nodes? Thank you!
>
> -Joe
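Here is the rough sketch of the parallel form mentioned above. The worker collection, workers count, and rows value are only placeholders for illustration; I'm reusing COL as the worker collection and "id" as the partition key, so adjust these to your setup:

parallel(COL,
         classify(model(models, id="MODEL1014", cacheMillis=5000),
                  search(COL,
                         df="FULL_DOCUMENT",
                         q="Collection:(COLLECT2000) AND DocTimestamp:[2017-08-14T04:00:00Z TO 2017-08-15T03:59:00Z]",
                         fl="id,score",
                         sort="id asc",
                         partitionKeys="id",
                         rows="100000"),
                  field="ClusterText"),
         workers="4",
         sort="id asc")

Each worker pulls its own slice of the results (based on partitionKeys) and runs the classify step on that slice, so the classification work is spread across the worker nodes.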