Re: Classify stream expression questions

Joel Bernstein Mon, 14 Aug 2017 19:18:06 -0700

Actually my math was off. You would need 200 shards to get to 1000 results.
How many shards do you have?


The expression you provided also didn't include the ClusterText field in
the field list (fl) of the search, so perhaps other parameters are missing too.

If you include all the parameters I may be able to spot the issue.
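
As a rough illustration of the two points above (setting rows and listing ClusterText in fl), here is a minimal sketch in Java that only assembles the expression string. The class name, the method names, and the rows value of 5000 are placeholders for illustration and are not taken from the original setup; the other values are copied from the expression quoted below.

```java
public class ClassifyExpressionSketch {

    // Builds the inner search(...) expression as a string.
    // rows is the per-shard limit; the value passed in is illustrative only.
    public static String search(String collection, String query,
                                String fl, String sort, int rows) {
        return "search(" + collection
                + ",df=\"FULL_DOCUMENT\""
                + ",q=\"" + query + "\""
                + ",fl=\"" + fl + "\""
                + ",sort=\"" + sort + "\""
                + ",rows=\"" + rows + "\")";
    }

    // Wraps a search expression in classify(...), using the model settings
    // from the expression quoted in this thread.
    public static String classify(String modelId, String searchExpr, String field) {
        return "classify(model(models,id=\"" + modelId + "\",cacheMillis=5000),"
                + searchExpr
                + ",field=\"" + field + "\")";
    }

    public static void main(String[] args) {
        String q = "Collection:(COLLECT2000) AND "
                + "DocTimestamp:[2017-08-14T04:00:00Z TO 2017-08-15T03:59:00Z]";
        // ClusterText is added to fl so the classifier can read it;
        // 5000 is an assumed rows value, not a recommendation.
        String expr = classify("MODEL1014",
                search("COL", q, "id,score,ClusterText", "id asc", 5000),
                "ClusterText");
        System.out.println(expr);
    }
}
```

The printed string can then be sent as the expr parameter of the stream request in place of the original expression.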

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Aug 14, 2017 at 10:10 PM, Joel Bernstein <joels...@gmail.com> wrote:

> It looks like you just need to set the rows parameter in the search
> expression. If you don't set rows, the default will be 20, I believe, which
> will pull the top 20 docs from each shard. If you have 5 shards, then the
> 1000 results would make sense.
>
> You can parallelize the whole expression by wrapping it in a parallel
> expression. You'll need to set the partitionKeys in the search expression
> to do this.
>
> If you have a large number of records to process I would recommend batch
> processing. This blog explains the parallel batch framework:
>
> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Aug 14, 2017 at 7:53 PM, Joe Obernberger <
> joseph.obernber...@gmail.com> wrote:
>
>> Hi All - I'm using the classify stream expression and the results
>> returned are always limited to 1,000.  Where do I specify the number to
>> return?  The stream expression that I'm using looks like:
>>
>> classify(model(models,id="MODEL1014",cacheMillis=5000),
>>   search(COL,df="FULL_DOCUMENT",
>>     q="Collection:(COLLECT2000) AND DocTimestamp:[2017-08-14T04:00:00Z TO 2017-08-15T03:59:00Z]",
>>     fl="id,score",sort="id asc"),
>>   field="ClusterText")
>>
>> When I read this (code snippet):
>>
>>     stream.open();
>>     while (true) {
>>         Tuple tuple = stream.read();
>>         if (tuple.EOF) {
>>             break;
>>         }
>>         Double probability = (Double) tuple.fields.get("probability_d");
>>         String docID = (String) tuple.fields.get("id");
>>     }
>>
>> I get back 1,000 results.  Is there also a way to parallelize the
>> classify call across other worker nodes?  Thank you!
>>
>> -Joe
>>
>>
>
