It looks like you just need to set the rows parameter in the search expression. If you don't set rows, the default will be 20 I believe, which will pull the top 20 docs from each shard. If you have 5 shards, then the 1,000 results would make sense.
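For example, taking the expression from your message and just adding rows to the inner search (the 100000 here is only a placeholder; set it to whatever upper bound you need):

classify(model(models, id="MODEL1014", cacheMillis=5000),
         search(COL,
                df="FULL_DOCUMENT",
                q="Collection:(COLLECT2000) AND DocTimestamp:[2017-08-14T04:00:00Z TO 2017-08-15T03:59:00Z]",
                fl="id,score",
                sort="id asc",
                rows="100000"),
         field="ClusterText")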
You can parallelize the whole expression by wrapping it in a parallel
expression. You'll need to set the partitionKeys parameter in the search
expression to do this (see the sketch below the quoted message). If you
have a large number of records to process, I would recommend batch
processing. This blog post explains the parallel batch framework:

http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html


Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Aug 14, 2017 at 7:53 PM, Joe Obernberger <joseph.obernber...@gmail.com> wrote:

> Hi All - I'm using the classify stream expression and the results returned
> are always limited to 1,000. Where do I specify the number to return? The
> stream expression that I'm using looks like:
>
> classify(model(models,id="MODEL1014",cacheMillis=5000),search(COL,df="FULL_DOCUMENT",q="Collection:(COLLECT2000) AND DocTimestamp:[2017-08-14T04:00:00Z TO 2017-08-15T03:59:00Z]",fl="id,score",sort="id asc"),field="ClusterText")
>
> When I read this (code snippet):
>
> stream.open();
> while (true) {
>     Tuple tuple = stream.read();
>     if (tuple.EOF) {
>         break;
>     }
>     Double probability = (Double) tuple.fields.get("probability_d");
>     String docID = (String) tuple.fields.get("id");
>
> I get back 1,000 results. Another question is if there is a way to
> parallelize the classify call to other worker nodes? Thank you!
>
> -Joe
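Here is the rough sketch of the parallel form mentioned above. The worker collection, workers count, and rows value are only placeholders for illustration; I'm reusing COL as the worker collection and "id" as the partition key, so adjust these to your setup:

parallel(COL,
         classify(model(models, id="MODEL1014", cacheMillis=5000),
                  search(COL,
                         df="FULL_DOCUMENT",
                         q="Collection:(COLLECT2000) AND DocTimestamp:[2017-08-14T04:00:00Z TO 2017-08-15T03:59:00Z]",
                         fl="id,score",
                         sort="id asc",
                         partitionKeys="id",
                         rows="100000"),
                  field="ClusterText"),
         workers="4",
         sort="id asc")

Each worker pulls its own slice of the results (based on partitionKeys) and runs the classify step on that slice, so the classification work is spread across the worker nodes.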