You'll need to have this field in your schema:

<dynamicField name="random_*" type="random" />

I'll check whether the default schema used with solr start -c has this
field; if not, I'll add it. Thanks for pointing this out.
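
If a collection's schema lacks the field, one way to add it is through the
Schema API. The sketch below is only an illustration. It assumes a managed
schema, a Solr node at http://localhost:8983/solr (a placeholder), and the
tx_header collection from the question; the helper names are made up for this
example. It also defines the "random" field type as solr.RandomSortField,
which the dynamic field depends on. Drop that command if the type already
exists in the schema.

```python
import json
import urllib.request


def build_schema_commands(field_type_name="random", pattern="random_*"):
    """Return Schema API commands for a random sort type and its dynamic field."""
    return {
        # Assumption: the field type is not yet defined in the schema.
        "add-field-type": {
            "name": field_type_name,
            "class": "solr.RandomSortField",
            "indexed": True,
        },
        # Equivalent to <dynamicField name="random_*" type="random" />
        "add-dynamic-field": {
            "name": pattern,
            "type": field_type_name,
        },
    }


def build_request(base_url, collection, commands):
    """Build (but do not send) the POST request for the Schema API endpoint."""
    url = f"{base_url}/{collection}/schema"
    data = json.dumps(commands).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host; replace with the address of a real Solr node.
    req = build_request(
        "http://localhost:8983/solr", "tx_header", build_schema_commands()
    )
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To apply the change against a live node: urllib.request.urlopen(req)
```

The request is only built and printed here, so the payload can be checked
before anything is sent to a live node.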

I checked, and right now the random expression accepts only one fq, but
I consider this a bug; it should accept multiple. I'll create a ticket to
get this fixed.
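
Until multiple fq parameters are supported, a possible workaround is to fold
the filters into a single fq by ANDing them together. The sketch below is an
assumption-laden illustration, not part of Solr: the helper names and the
state/year filters are hypothetical, it assumes plain Lucene-syntax filters
without double quotes or local params, and purely negative filters may need
different handling.

```python
def combine_filters(filters):
    """Join several filter queries into one parenthesized AND clause."""
    cleaned = [f.strip() for f in filters if f and f.strip()]
    if not cleaned:
        raise ValueError("at least one filter is required")
    for f in cleaned:
        # Double quotes would clash with the quoting of the expression itself.
        if '"' in f:
            raise ValueError(f"filter contains a double quote: {f}")
    return " AND ".join(f"({f})" for f in cleaned)


def random_expression(collection, filters, rows, fl, q="*:*"):
    """Build a random() streaming expression with a single combined fq."""
    fq = combine_filters(filters)
    return f'random({collection}, q="{q}", fq="{fq}", rows="{rows}", fl="{fl}")'


if __name__ == "__main__":
    # Hypothetical filters; substitute the fq values that define the hitlist.
    print(
        random_expression(
            "tx_header", ["state:TX", "year:[2010 TO 2018]"], 100, "countyname"
        )
    )
```

A single combined fq is cached as one filter rather than as separate ones, so
filter cache behavior can differ from passing each fq on its own.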



Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Mar 1, 2018 at 4:55 PM, John Smith <localde...@gmail.com> wrote:

> Joel, thanks for the pointers to the streaming feature. I had no idea solr
> had that (and also just discovered the very interesting sql feature! I will
> be sure to investigate that in more detail in the future).
>
> However I'm having some trouble getting basic streaming functions working.
> I've already figured out that I had to move to "solr cloud" instead of
> "solr standalone" because I was getting errors about "cannot find zk
> instance" or whatever which went away when using "solr start -c" instead.
>
> But now I'm trying to use the random function since that was one of the
> functions used in your example.
>
> random(tx_header, q="*:*", rows="100", fl="countyname")
>
> I posted that directly in the "stream" section of the solr admin UI. This
> is all on linux, with solr 7.1.0 and 7.2.1 (tried several versions in case
> it was a bug in one).
>
> I get back an error message:
> *sort param could not be parsed as a query, and is not a field that exists
> in the index: random_-255009774*
>
> I'm not passing in any sort field anywhere. But the solr logs show these
> three log entries:
>
> 2018-03-01 21:41:18.954 INFO  (qtp257513673-21) [c:tx_header s:shard1
> r:core_node2 x:tx_header_shard1_replica_n1] o.a.s.c.S.Request
> [tx_header_shard1_replica_n1]  webapp=/solr path=/select
> params={q=*:*&_stateVer_=tx_header:6&fl=countyname
> *&sort=random_-255009774+asc*&rows=100&wt=javabin&version=2} status=400
> QTime=19
>
> 2018-03-01 21:41:18.966 ERROR (qtp257513673-17) [c:tx_header s:shard1
> r:core_node2 x:tx_header_shard1_replica_n1] o.a.s.c.s.i.CloudSolrClient
> Request to collection [tx_header] failed due to (400)
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> Error from server at http://192.168.13.31:8983/solr/tx_header: sort param could
> not be parsed as a query, and is not a field that exists in the index:
> random_-255009774, retry? 0
>
> 2018-03-01 21:41:18.968 ERROR (qtp257513673-17) [c:tx_header s:shard1
> r:core_node2 x:tx_header_shard1_replica_n1] o.a.s.c.s.i.s.ExceptionStream
> java.io.IOException:
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> Error from server at http://192.168.13.31:8983/solr/tx_header: sort param could
> not be parsed as a query, and is not a field that exists in the index:
> random_-255009774
>
>
> So basically it looks like solr is injecting the "sort=random_" stuff into
> my query and of course that is failing on the search since that
> field/column doesn't exist in my schema. Every time I run the random
> function, I get a slightly different field name that it injects, but they
> all start with "random_".
>
> I have tried adding my own sort field instead, hoping solr wouldn't inject
> one for me, but it still injected a random sort fieldname:
> random(tx_header, q="*:*", rows="100", fl="countyname", sort="countyname
> asc")
>
>
> Assuming I can fix that whole problem, my second question is: can I add
> multiple "fq=" parameters to the random function? I build a pretty
> complicated query using many fq= fields, and then want to run some stats on
> that hitlist; so somehow I have to pass in the query that made up the exact
> hitlist to these various functions. But when I used multiple "fq=" values,
> it only seemed to use the last one I specified and ignored all the
> previous ones.
>
> Thanks in advance for any comments/suggestions...!
>
>
>
>
> On Fri, Feb 23, 2018 at 5:59 PM, Joel Bernstein <joels...@gmail.com>
> wrote:
>
> > This is going to be a complex answer because Solr actually now has
> > multiple ways of doing regression analysis as part of the Streaming
> > Expression statistical programming library. The basic documentation is
> > here:
> >
> > https://lucene.apache.org/solr/guide/7_2/statistical-programming.html
> >
> > Here is a sample expression that performs a simple linear regression in
> > Solr 7.2:
> >
> > let(a=random(collection1, q="any query", rows="15000", fl="fieldA, fieldB"),
> >     b=col(a, fieldA),
> >     c=col(a, fieldB),
> >     d=regress(b, c))
> >
> >
> > The expression above takes a random sample of 15000 results from
> > collection1. The result set will include fieldA and fieldB in each
> > record. The result set is stored in variable "a".
> >
> > Then the "col" function creates arrays of numbers from the results stored
> > in variable a. The values in fieldA are stored in the variable "b". The
> > values in fieldB are stored in variable "c".
> >
> > Then the regress function performs a simple linear regression on arrays
> > stored in variables "b" and "c".
> >
> > The output of the regress function is a map containing the regression
> > result. This result includes RSquared and other attributes of the
> > regression model such as R (correlation), slope, and y-intercept.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Fri, Feb 23, 2018 at 3:10 PM, John Smith <localde...@gmail.com> wrote:
> >
> > > Hi Joel, thanks for the answer. I'm not really a stats guy, but the end
> > > result of all this is supposed to be obtaining R^2. Is there no way of
> > > obtaining this value, then (short of iterating over all the results in
> > > the hitlist and calculating it myself)?
> > >
> > > On Fri, Feb 23, 2018 at 12:26 PM, Joel Bernstein <joels...@gmail.com>
> > > wrote:
> > >
> > > > Typically SSE is the sum of the squared errors of the prediction in a
> > > > regression analysis. The stats component doesn't perform regression,
> > > > although it might be a nice feature.
> > > >
> > > >
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > > On Fri, Feb 23, 2018 at 12:17 PM, John Smith <localde...@gmail.com> wrote:
> > > >
> > > > > I'm using solr, and enabling stats as per this page:
> > > > > https://lucene.apache.org/solr/guide/6_6/the-stats-component.html
> > > > >
> > > > > > I want to get more stat values though. Specifically I'm looking for
> > > > > > r-squared (coefficient of determination). This value is not present
> > > > > > in solr; however, some of the pieces used to calculate r^2 are in the
> > > > > > stats element, for example:
> > > > >
> > > > > <double name="min">0.0</double>
> > > > > <double name="max">10.0</double>
> > > > > <long name="count">15</long>
> > > > > <long name="missing">17</long>
> > > > > <double name="sum">85.0</double>
> > > > > <double name="sumOfSquares">603.0</double>
> > > > > <double name="mean">5.666666666666667</double>
> > > > > <double name="stddev">2.943920288775949</double>
> > > > >
> > > > >
> > > > > > So I have the sumOfSquares available (SST), and using this
> > > > > > calculation, I can get R^2:
> > > > >
> > > > > R^2 = 1 - SSE/SST
> > > > >
> > > > > > All I need then is SSE. Is there any way I can get SSE from those
> > > > > > other stats in solr?
> > > > >
> > > > > Thanks in advance!
> > > > >
> > > >
> > >
> >
>
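
A note on the original question quoted above: the stats component's
sumOfSquares is the raw sum of x^2, not the total sum of squares about the
mean. SST can still be derived from count, sum and sumOfSquares, but SSE needs
the paired values and a fitted line, which single-field stats cannot supply.
The sketch below illustrates both points in plain Python. The function names
are hypothetical, and this is not how Solr's regress function is implemented.

```python
import math


def sst_from_stats(count, total, sum_of_squares):
    """Total sum of squares about the mean: sum(x^2) - (sum(x))^2 / n."""
    return sum_of_squares - (total * total) / count


def r_squared(xs, ys):
    """R^2 = 1 - SSE/SST for a least-squares line fitted to paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values are constant; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # SSE: squared residuals between observed and predicted y values.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # SST: squared deviations of y from its own mean.
    sst = sum((y - mean_y) ** 2 for y in ys)
    if sst == 0:
        raise ValueError("y values are constant; R^2 is undefined")
    return 1 - sse / sst


if __name__ == "__main__":
    # Values taken from the stats output quoted in the thread.
    sst = sst_from_stats(count=15, total=85.0, sum_of_squares=603.0)
    print("SST:", sst)
    # Dividing by n - 1 and taking the root should reproduce the quoted stddev.
    print("stddev:", math.sqrt(sst / (15 - 1)))
    # Made-up paired data, only to exercise r_squared.
    print("R^2:", r_squared([1, 2, 3], [1, 2, 2]))
```

Recovering the quoted stddev from the derived SST is a quick check that
sumOfSquares is the uncentered sum.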
