Re: upgrading from solr4 to solr8 searches taking 4 to 10 times as long to return

2019-09-08 Thread Günter Hipler

Hi,

What about this:
https://issues.apache.org/jira/browse/SOLR-8096
It seems to still be an unresolved issue.

With our migration from version 4 to 7 last year we experienced similar
problems.


Günter

On 08.09.19 06:09, Russell Bahr wrote:

Hi David and Toke,
Thank you both for your input.  I will be in DC tomorrow evening and will
try your suggestions, and read the ref guide again on the parts that you
have pointed out.  I will let you know the results, and will share your
feedback with my team to see what we can change and still bring back the
result sets that are needed for our system.
Thanks again,
Russ

*Manzama*a MODERN GOVERNANCE company

Russell Bahr
Lead Infrastructure Engineer

USA & CAN Office: +1 (541) 306 3271
USA & CAN Support: +1 (541) 706 9393
UK Office & Support: +44 (0)203 282 1633
AUS Office & Support: +61 (0) 2 8417 2339

543 NW York Drive, Suite 100, Bend, OR 97703




On Sat, Sep 7, 2019 at 2:43 PM David Smiley 
wrote:


Also consider substituting grouping with expand/collapse (see the ref
guide).  The latter performs much better in general, although grouping does
have certain options that are uniquely valuable like ensuring that facet
counts look at the aggregate (if you want that).  I wish we could outright
remove grouping; it's a complexity weight on our codebase.
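For anyone making that swap, the two request styles can be sketched as plain parameter sets (a hypothetical sketch; the `fingerprint` field name is borrowed from the schema discussed later in this thread):

```python
from urllib.parse import urlencode

# Grouping-based request: one group per distinct fingerprint value.
grouping_params = {
    "q": "*:*",
    "group": "true",
    "group.field": "fingerprint",
    "group.limit": 1,
}

# Equivalent collapse/expand request: collapse to one doc per fingerprint,
# then expand each group so the other members remain retrievable.
collapse_params = {
    "q": "*:*",
    "fq": "{!collapse field=fingerprint}",
    "expand": "true",
    "expand.rows": 1,
}

print(urlencode(grouping_params))
print(urlencode(collapse_params))
```

Note that collapse is a post filter, so it composes with the rest of the query like any other fq.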

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sat, Sep 7, 2019 at 5:15 PM David Smiley 
wrote:


10s of seconds to respond to a simple match-all query, especially against just
a single shard using distrib=false, is very bizarre.  What's the
"QTime" on one of these -- also super long, or sub-second?

I took a brief look at your schema with a hunch.  I see you have
docValues=true on your ID field -- makes sense to me.  You also have
version=1.5 on the schema instead of 1.6.  Why did you not use 1.6?  With
1.5, useDocValuesAsStored is false by default.  Try toggling the version
number to 1.6.  And try your query with "fl=id" and see how that changes
the times.
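A quick way to run those two checks (single-core query, then an id-only field list) is to build the URLs directly; the host and core name below are just the ones that appear later in this thread:

```python
from urllib.parse import urlencode

base = "http://p-solr-8-16.obscured.com:8983/solr/content_shard1_replica_n2/select"

# 1) Bypass distributed search and hit a single core directly.
single_core = urlencode({"q": "*:*", "distrib": "false", "rows": 10})

# 2) Same query, but fetch only the id field; if this is much faster,
#    stored-field / useDocValuesAsStored retrieval is the suspect.
id_only = urlencode({"q": "*:*", "distrib": "false", "fl": "id", "rows": 10})

print(f"{base}?{single_core}")
print(f"{base}?{id_only}")
```

Compare the QTime reported in the response header of each, not just wall-clock time, to separate query cost from transport cost.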

I also took a look at your solrconfig.xml with a hunch, and now think I
found the smoking gun.  I see you've modified the /select request handler
to add a bunch of defaults, including, of all things, grouping.  Thus when
you report to us your queries are simple *:* queries, the reality is far
different.  I wish people would treat /select as immutable and instead
create request handlers for their apps' needs.

Nonetheless my investigation here only reveals that your test queries are
actually very complex and thus explains their overall slowness.  We don't
know why Solr 8 performs slower than Solr 4 here.  For that I think we've
given you some tips.  Get back to a simple query and compare that.  Try
certain features in isolation (e.g. *just* the grouping).  Maybe it's
that.  You might experiment with switching "fingerprint" (the string field
you group on) from docValues=true to false to see if it's a docValues perf
issue compared to uninverting.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sat, Sep 7, 2019 at 3:06 PM Russell Bahr  wrote:


Hi David,
I ran the *:* query 10 times against all 30 servers and the results (below)
were similar across all of them. I agree working against a single server is
easier troubleshooting, but I do not know where to start.

Server shard replica, Matches, Time, Pass
16 1 n2 2989421 78800 1
20 1 n1 2989559 63246 1
23 1 n8 2989619 55141 1
28 1 n6 2989619 65536 1
17 1 n4 2989818 56694 1
26 2 n10 2990088 63485 1
21 2 n18 2990145 68077 1
11 2 n16 2990145 62271 1
13 2 n12 2990242 68564 1
27 2 n14 2990242 63739 1
10 3 n26 2988056 69117 1
25 3 n24 2988056 73750 1
12 3 n28 2988096 61948 1
6 3 n20 2988123 62174 1
19 3 n22 2988123 65826 1
1 4 n30 2985457 60404 1
29 4 n34 2985457 68498 1
30 4 n38 2985604 72034 1
9 4 n36 2902757 65943 1
15 4 n32 2985948 67208 1
7 5 n48 2992278 63098 1
5 5 n42 2992363 69503 1
8 5 n44 2992363 66818 1
4 5 n40 2992397 66784 1
14 5 n46 2883495 58759 1
3 6 n56 2878221 52265 1
22 6 n58 2878221 53768 1
24 6 n52 2878326 62174 1
2 6 n50 2878326 53143 1
18 6 n54 2878326 59044 1

Results from 10 passes
p-solr-8-16.obscured.com:8983/solr/content_shard1_replica_n2/ 69697.8 4599.8171896
Query time milliseconds [78800, 65549, 68045, 72151, 62774, 69168, 66459, 74336, 69028, 70668]
p-solr-8-20.obscured.com:8983/solr/content_shard1_replica_n1/ 58310.5 4531.23621224
Query time milliseconds [63246, 59626, 61001, 59366, 53028, 58693, 58832, 64633, 54659, 50021]
p-solr-8-23.obscured.com:8983/solr/content_shard1_replica_n8/ 57778.5 4659.23933348
Query time milliseconds [55141, 55194, 59100, 62614, 65425, 59261, 58961, 59259, 53799, 49031]
p-solr-8-28.obscured.com:8983/solr/content_shard1_replica_n6/ 64944.1 3382.61379705
Query time 

Re: upgrading from solr4 to solr8 searches taking 4 to 10 times as long to return

2019-09-08 Thread Toke Eskildsen
Günter Hipler  wrote:
> what about this
> https://issues.apache.org/jira/browse/SOLR-8096
> seems still unresolved issue?

Unfortunately Russell has de-shared the solrconfig.xml, but as far as I 
remember it does not trigger faceting.

> With our migration from version 4 to 7 last year we experienced similar
> problems.

The iterator-based DocValues implementation in Solr 7 has a performance issue 
with large segments, with symptoms akin to SOLR-8096. If you have not already 
solved your problems, Solr 8 (with an upgraded index) might help.

- Toke Eskildsen


Re: upgrading from solr4 to solr8 searches taking 4 to 10 times as long to return

2019-09-08 Thread Günter Hipler
Thanks for this information, Toke - the library community, your domain
too, will be happy to hear it.


I have seen that you did a lot of work toward the end of version 7 and for
version 8, but I was not sure whether it was related to this issue.


Best wishes from Basel, Günter

On 08.09.19 19:42, Toke Eskildsen wrote:

Günter Hipler  wrote:

what about this
https://issues.apache.org/jira/browse/SOLR-8096
seems still unresolved issue?

Unfortunately Russell has de-shared the solrconfig.xml, but as far as I 
remember it does not trigger faceting.


With our migration from version 4 to 7 last year we experienced similar
problems.

The iterator-based DocValues implementation in Solr 7 has a performance issue 
with large segments, with symptoms akin to SOLR-8096. If you have not already 
solved your problems, Solr 8 (with an upgraded index) might help.

- Toke Eskildsen



Sample JWT Solr configuration

2019-09-08 Thread Tyrone
I have Solr 8.1 installed, and I have this sample JWT

HEADER (ALGORITHM & TOKEN TYPE):
{ "alg": "HS256", "typ": "JWT" }

PAYLOAD (DATA):
{ "sub": "1234567890", "name": "John Doe", "iat": 1516239022 }

The secret key is "your-256-bit-secret".

Which generates the encoded JWT of

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
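For what it's worth, that encoded token is reproducible with nothing but the standard library, which makes it easy to verify a secret independently of Solr (this sketches JWT construction itself, not any Solr API):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

header = {"alg": "HS256", "typ": "JWT"}
payload = {"sub": "1234567890", "name": "John Doe", "iat": 1516239022}
secret = b"your-256-bit-secret"

# Compact JSON (no spaces), each part base64url-encoded and dot-joined.
signing_input = ".".join(
    b64url(json.dumps(part, separators=(",", ":")).encode())
    for part in (header, payload)
)
# HS256 signature = HMAC-SHA256 over header.payload with the shared secret.
signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
token = signing_input + "." + b64url(signature)
print(token)
```

Running this yields exactly the token quoted above, confirming the secret and the encoding.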

I am following the documentation for Solr 8.1 about how to configure JWT
authentication:

https://lucene.apache.org/solr/guide/8_1/jwt-authentication-plugin.html#editing-jwt-authentication-plugin-configuration

which says that the security.json file will have the following JSON object

{ "authentication": { "class":"solr.JWTAuthPlugin" } }

which can have a lot more fields, like jwk.

Can someone show me an example of how the information for the JWT, e.g.

HEADER (ALGORITHM & TOKEN TYPE):
{ "alg": "HS256", "typ": "JWT" }

PAYLOAD (DATA):
{ "sub": "1234567890", "name": "John Doe", "iat": 1516239022 }

can be put into this object, and which field it should use?

{ "authentication": { "class":"solr.JWTAuthPlugin" } }

Sent from my iPhone

Re: Query field alias - issue with circular reference

2019-09-08 Thread David Smiley
No but this seems like a decent enhancement request.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Aug 9, 2019 at 3:07 AM Jaroslaw Rozanski 
wrote:

> Hi Folks,
>
>
>
> Question about query field aliases.
>
>
>
> Assuming one has fields:
>
>  * foo1
>  * foo2
> Sending "defType=edismax&q=foo:hello&f.foo.qf=foo1 foo2" will work.
>
>
>
> But what in case of, when one has fields:
>
>  * foo
>  * foo1
> Say we want to add behaviour to queries that are already in use. We want
> to search in existing "foo" and "foo1" without making query changes.
>
>
>
> Sending "defType=edismax&q=foo:hello&f.foo.qf=foo foo1" will *not* work.
> The error is "org.apache.solr.search.SyntaxError: Field aliases lead to a
> cycle".
>
>
>
> So, is there any way to extend the search query for the existing field
> without modifying the index?
>
>
> --
> Jaroslaw Rozanski | m...@jarekrozanski.eu
>


Issue with delete

2019-09-08 Thread Jayadevan Maymala
Hello All,

I have a 3-node Solr cluster using a 3-node ZooKeeper ensemble. Solr version
is 7.3.0. We have batch deletes which were working a few days ago. All of a
sudden, they stopped working (I did run a yum update on the client machine
- not sure if it did anything to the Guzzle client). The delete is sent via
GuzzleHttp client from Lumen (php Microservices) framework. The delete
request reaches the Solr servers all right - here is from the log -
(qtp434091818-167594) [c:paymetryproducts s:shard1 r:core_node4
x:paymetryproducts_shard1_replica_n2] o.a.s.u.p.LogUpdateProcessorFactory
[paymetryproducts_shard1_replica_n2]  webapp=/solr path=/update
params={stream.body=category_id:"*5812b8c81874e142b86fbb0e*"&commit=true&wt=json}{deleteByQuery=category_id:"5812b8c81874e142b86fbb0e"
(-1644171174075170816),commit=} 0 3695
I tried setting both 'json' and 'xml' wt types. A dump of the response on
the client gives me only this -

(
[stream:GuzzleHttp\Psr7\Stream:private] => Resource id #403
[size:GuzzleHttp\Psr7\Stream:private] =>
[seekable:GuzzleHttp\Psr7\Stream:private] => 1
[readable:GuzzleHttp\Psr7\Stream:private] => 1
[writable:GuzzleHttp\Psr7\Stream:private] => 1
[uri:GuzzleHttp\Psr7\Stream:private] => php://temp
[customMetadata:GuzzleHttp\Psr7\Stream:private] => Array
(
)

)

If I execute the delete from Solr Admin panel, it works.  The query I am
executing from Admin to check if the data was deleted is this  (I am
forwarding Solr port to local machine).
http://127.0.0.1:8993/solr/paymetryproducts/select?q=category_id%20:%22
*5812b8c81874e142b86fbb0e*%22

Regards,
Jayadevan


Re: Issue with delete

2019-09-08 Thread Jörn Franke
Do you commit after running the delete?

> Am 09.09.2019 um 06:59 schrieb Jayadevan Maymala :
> 
> Hello All,
> 
> I have a 3-node Solr cluster using a 3-node Zoookeeper system. Solr Version
> is 7.3.0. We have batch deletes which were working a few days ago. All of a
> sudden, they stopped working (I did run a yum update on the client machine
> - not sure if it did anything to the Guzzle client). The delete is sent via
> GuzzleHttp client from Lumen (php Microservices) framework. The delete
> request reaches the Solr servers all right - here is from the log -
> (qtp434091818-167594) [c:paymetryproducts s:shard1 r:core_node4
> x:paymetryproducts_shard1_replica_n2] o.a.s.u.p.LogUpdateProcessorFactory
> [paymetryproducts_shard1_replica_n2]  webapp=/solr path=/update
> params={stream.body=category_id:"*5812b8c81874e142b86fbb0e*"&commit=true&wt=json}{deleteByQuery=category_id:"5812b8c81874e142b86fbb0e"
> (-1644171174075170816),commit=} 0 3695
> I tried setting both 'json' and 'xml' wt types. A dump of the response on
> the client gives me only this -
> 
> (
>[stream:GuzzleHttp\Psr7\Stream:private] => Resource id #403
>[size:GuzzleHttp\Psr7\Stream:private] =>
>[seekable:GuzzleHttp\Psr7\Stream:private] => 1
>[readable:GuzzleHttp\Psr7\Stream:private] => 1
>[writable:GuzzleHttp\Psr7\Stream:private] => 1
>[uri:GuzzleHttp\Psr7\Stream:private] => php://temp
>[customMetadata:GuzzleHttp\Psr7\Stream:private] => Array
>(
>)
> 
> )
> 
> If I execute the delete from Solr Admin panel, it works.  The query I am
> executing from Admin to check if the data was deleted is this  (I am
> forwarding Solr port to local machine).
> http://127.0.0.1:8993/solr/paymetryproducts/select?q=category_id%20:%22
> *5812b8c81874e142b86fbb0e*%22
> 
> Regards,
> Jayadevan
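As an illustration of Jörn's question: a delete-by-query that carries the query in a JSON body with an explicit commit parameter could be built like this (collection name and port are taken from the thread; this is a sketch of the request, not the poster's actual Guzzle client code):

```python
import json
from urllib.parse import urlencode

collection = "paymetryproducts"

# Solr's JSON update format for delete-by-query.
body = {"delete": {"query": 'category_id:"5812b8c81874e142b86fbb0e"'}}

# commit=true makes the deletion visible to searchers immediately;
# without it, deleted docs remain visible until the next (auto)commit.
url = (
    f"http://127.0.0.1:8993/solr/{collection}/update?"
    + urlencode({"commit": "true", "wt": "json"})
)
payload = json.dumps(body)
print(url)
print(payload)
```

The body would be POSTed with a Content-Type of application/json; the deprecated stream.body parameter seen in the log above can be disabled on newer Solr versions, which is one possible reason a previously working client silently stops deleting.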


Re: Query terms and the match state

2019-09-08 Thread Scott Stults
Lucene has a SynonymQuery and a BlendedTermQuery that do something like you
want in different ways. However, if you want to keep your existing schema
and do this through Solr you can use the constant score syntax in edismax
on each term:

q=name:(corsair)^=1.0 name:(ddr)^=1.0 manu:(corsair)^=1.0 manu:(ddr)^=1.0

The resulting score will be the total number of times each term matched in
either field. (Note, if you group the terms together in the parentheses
like "name:(corsair ddr)^=1.0" you'll only know if either term matched --
the whole clause gets a score of 1.0). For the techproducts example corpus:

[
  {
"name":"CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM
Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail",
"manu":"Corsair Microsystems Inc.",
"score":3.0},
  {
"name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered
DDR 400 (PC 3200) System Memory - Retail",
"manu":"Corsair Microsystems Inc.",
"score":3.0},
  {
"name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR
400 (PC 3200) System Memory - OEM",
"manu":"A-DATA Technology Inc.",
"score":1.0}]


You could use this as the basis for a function query to gain more control
over your scoring.
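The scoring rule above (each matching constant-score clause contributes exactly 1.0) can be mimicked offline to sanity-check expectations. This toy matcher uses case-insensitive substring checks, which only approximates Solr's analysis chain:

```python
def constant_score(doc, terms=("corsair", "ddr"), fields=("name", "manu")):
    """Sum 1.0 for every (field, term) clause that matches, mirroring
    the four ^=1.0 clauses in the example query."""
    score = 0.0
    for field in fields:
        text = (doc.get(field) or "").lower()
        for term in terms:
            if term in text:
                score += 1.0
    return score

doc = {
    "name": "CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM - Retail",
    "manu": "Corsair Microsystems Inc.",
}
# corsair and ddr match in name, corsair matches in manu -> 3.0
print(constant_score(doc))
```

A score of 3.0 here matches the first two documents in the techproducts results above.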

Hope that helps!

-Scott


On Tue, Sep 3, 2019 at 1:35 PM Kumaresh AK  wrote:

> Hello Solr Community!
>
> *Problem*: I wish to know if the result document matched all the terms in
> the query. The ranking used in Solr works most of the time. But in some cases
> where one of the terms is rare and occurs in a couple of fields, such
> documents trump a document which matches all the terms. Ideally I wish
> such a document (one that matches all terms) to trump a document that
> matches only 9/10 terms but matches one of the rare terms twice.
> eg:
> *query1*
> field1:(a b c d) field2:(a b c d)
> Results of the above query looks good.
>
> *query2*
> field1:(a b c 5) field2:(a b c 5)
> result:
> doc1: {field1: b c 5 field2: b c 5}
> 
> doc21: {field1: a b c 5 field: null}
>
> Results are almost good except that doc21 is trailing doc1. There are a few
> documents similar to doc1 and pushes doc21 to next page (I use default page
> size = 10)
>
> I understand that this is how tf-idf works. I tried boosting certain fields
> to solve this problem, but that breaks the normal cases (query1). So I set
> out to just solve this case, where I wish to boost (or augment a field with)
> that information (as a ratio of matched-terms/total-terms).
>
> *Ask:* Is it possible to get back the terms of the query and the matched
> state ?
>
> I tried
>
>- debug=query option (with the default select handler)
>- with terms in the debug response I could write a function query to
>know its match state
>
> Is this approach safe/performant for production use ? Is there a better
> approach to solve this problem ?
>
> Regards,
> Kumaresh
>


-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


Re: Sample JWT Solr configuration

2019-09-08 Thread Jan Høydahl
In your security.json, add a JWK matching your signing algorithm, using the 
“jwk” JSON key.

Example:
“jwk” : { "kty" : "oct", "kid" : "0afee142-a0af-4410-abcc-9f2d44ff45b5", "alg" 
: "HS256", "k" : "FdFYFzERwC2uCBB46pZQi4GG85LujR8obt-KWRBICVQ" }

Of course you need to find a way to encode your particular secret in JWK
format; there should be plenty of tools available for that. If you intend to
use a symmetric key in prod, you have to configure Solr so that security.json
is not readable by anyone but the admin!
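Putting the two fragments together, a minimal security.json for a symmetric HS256 setup might look like the following (the jwk values are the sample ones from Jan's message; a sketch only, not a vetted production config):

```python
import json

# Combine the JWTAuthPlugin declaration from the ref guide with the
# symmetric (kty: oct) JWK that matches the HS256-signed token.
security = {
    "authentication": {
        "class": "solr.JWTAuthPlugin",
        "jwk": {
            "kty": "oct",  # octet sequence = symmetric key
            "kid": "0afee142-a0af-4410-abcc-9f2d44ff45b5",
            "alg": "HS256",
            "k": "FdFYFzERwC2uCBB46pZQi4GG85LujR8obt-KWRBICVQ",
        },
    }
}
print(json.dumps(security, indent=2))
```

The printed JSON is what would be uploaded as security.json; per Jan's warning, with a symmetric key this file is itself a secret and must be locked down.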

Jan Høydahl

> 9. sep. 2019 kl. 05:46 skrev Tyrone :
> 
> HS256