TikaEntityProcessor with DIH

2020-04-20 Thread Srinivas Kashyap
Hi,

We were on Solr 5.2.1, using TikaEntityProcessor to index PDF documents through
DIH, and it was working fine. The jars were tika-core-1.4.jar and
tika-parsers-1.4.jar.

Below is my schema.xml (P.S. all field types have been defined):


   
   
   
   
   
   
   

And my tika-data-config.xml:





 
 
 
 









Now we have upgraded to Solr 8.4.1. When I put the above jars in place and
index, only the following gets indexed:

{
  "fileName":"01 - System-Wide Functions.pdf",
  "size":"2524884",
  "lastmodified":"Mon Jul 15 06:26:52 UTC 2019",
  "path":"D:\\tssindex\\server\\solr\\help\\help\\01 - System-Wide Functions.pdf",
  "text":"",
  "_version_":1664474933885927424},
{

As you can see, the text field is empty, the author and title fields are not
indexed at all, and no search on the text field returns these documents.

Please help me in this regard.


Thanks,
Srinivas





Re: Refresh doesn't work in the new Nodes view in Admin UI on Windows

2020-04-20 Thread Colvin Cowie
I have opened https://issues.apache.org/jira/browse/SOLR-14416 for this

On Thu, 20 Jun 2019 at 17:01, Colvin Cowie 
wrote:

> On Solr 8.1.1 / 7.7.2 with Oracle 1.8.0_191 25.191-b12 with Solr running
> on Windows 10
>
> In the Nodes view of the Admin UI,
> http://localhost:8983/solr/#/~cloud?view=nodes there is a refresh button.
> However when you click it, the only thing that gets visibly refreshed is
> the 'bar chart' (not sure what to call it - it's shown when you choose show
> details) of the index shard size on disk. The other stats do not update.
>
> Firefox dev console shows:
>
> Error: s.system.uptime is undefined
> nodesSubController/$scope.reload/<@http://localhost:8983/solr/js/angular/controllers/cloud.js:384:11
> v/http://localhost:8983/solr/libs/angular-resource.min.js:33:133
> processQueue@http://localhost:8983/solr/libs/angular.js:13193:27
> scheduleProcessQueue/<@http://localhost:8983/solr/libs/angular.js:13209:27
> $eval@http://localhost:8983/solr/libs/angular.js:14406:16
> $digest@http://localhost:8983/solr/libs/angular.js:14222:15
> $apply@http://localhost:8983/solr/libs/angular.js:14511:13
> done@http://localhost:8983/solr/libs/angular.js:9669:36
> completeRequest@http://localhost:8983/solr/libs/angular.js:9859:7
> requestLoaded@http://localhost:8983/solr/libs/angular.js:9800:9
>
> The system response has upTimeMS in it for the JVM/JMX properties, but no
> system/uptime
>
> {
>   "responseHeader":{"status":0,"QTime":63},
>   "localhost:8983_solr":{
>     "responseHeader":{"status":0,"QTime":49},
>     "mode":"solrcloud",
>     "zkHost":"localhost:9983",
>     "solr_home":"...",
>     "lucene":{
>       "solr-spec-version":"8.1.1",
>       "solr-impl-version":"8.1.1 fcbe46c28cef11bc058779afba09521de1b19bef - ab - 2019-05-22 15:20:01",
>       "lucene-spec-version":"8.1.1",
>       "lucene-impl-version":"8.1.1 fcbe46c28cef11bc058779afba09521de1b19bef - ab - 2019-05-22 15:15:24"},
>     "jvm":{
>       "version":"1.8.0_211 25.211-b12",
>       "name":"Oracle Corporation Java HotSpot(TM) 64-Bit Server VM",
>       "spec":{"vendor":"Oracle Corporation","name":"Java Platform API Specification","version":"1.8"},
>       "jre":{"vendor":"Oracle Corporation","version":"1.8.0_211"},
>       "vm":{"vendor":"Oracle Corporation","name":"Java HotSpot(TM) 64-Bit Server VM","version":"25.211-b12"},
>       "processors":8,
>       "memory":{
>         "free":"1.4 GB","total":"2 GB","max":"2 GB","used":"566.7 MB (%27.7)",
>         "raw":{"free":1553268432,"total":2147483648,"max":2147483648,"used":594215216,"used%":27.670302242040634}},
>       "jmx":{
>         "bootclasspath":"...",
>         "classpath":"start.jar",
>         "commandLineArgs":[...],
>         "startTime":"2019-06-20T11:41:58.955Z",
>         "upTimeMS":516602}},
>     "system":{
>       "name":"Windows 10",
>       "arch":"amd64",
>       "availableProcessors":8,
>       "systemLoadAverage":-1.0,
>       "version":"10.0",
>       "committedVirtualMemorySize":2709114880,
>       "freePhysicalMemorySize":16710127616,
>       "freeSwapSpaceSize":16422531072,
>       "processCpuLoad":0.13941671744473663,
>       "processCpuTime":194609375000,
>       "systemCpuLoad":0.25816002967796037,
>       "totalPhysicalMemorySize":34261250048,
>       "totalSwapSpaceSize":39361523712},
>     "node":"localhost:8983_solr"}}
>
> The SystemInfoHandler does this:
>
> // Try some command line things:
> try {
>   if (!Constants.WINDOWS) {
>     info.add( "uname",  execute( "uname -a" ) );
>     info.add( "uptime", execute( "uptime" ) );
>   }
> } catch( Exception ex ) {
>   log.warn("Unable to execute command line tools to get operating system properties.", ex);
> }
>
> Which appears to be the problem.
>
> If I run uptime from my Ubuntu shell in WSL the output is like "16:41:40
> up 7 min,  0 users,  load average: 0.52, 0.58, 0.59". If I make the System
> handler return that then there are no further dev console errors...
> However, even with that "fixed", refresh doesn't actually seem to refresh
> anything other than the graph.
>
> In contrast, refreshing the System (e.g. memory) section on the main
> dashboard does correctly update.
>
> The missing "uptime" from the response looks like the problem, but isn't
> actually stopping refresh from doi
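To make the failure mode concrete: on Windows the handler never adds `uptime`, so any client that reads `system.uptime` unconditionally gets `undefined`. Below is a minimal Python sketch of a defensive lookup, assuming the node-entry layout shown in the response quoted above: prefer `system.uptime` when present, otherwise fall back to `jvm.jmx.upTimeMS`. The helper `node_uptime` is hypothetical, it is not the Admin UI's actual `cloud.js` code, and JVM uptime is not the same thing as OS uptime.

```python
def node_uptime(node_info: dict) -> str:
    """Return a human-readable uptime for one node entry of the system info response.

    Hypothetical helper: prefers the OS `uptime` string (only present when the
    handler could run the `uptime` command), else falls back to JVM uptime.
    """
    system = node_info.get("system", {})
    if "uptime" in system:
        return system["uptime"]
    up_ms = node_info.get("jvm", {}).get("jmx", {}).get("upTimeMS")
    if up_ms is None:
        return "unknown"
    total_seconds = up_ms // 1000
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"JVM up {hours}h {minutes}m {seconds}s"


# Trimmed-down node entry from the Windows response above (no "uptime" key).
windows_node = {
    "jvm": {"jmx": {"upTimeMS": 516602}},
    "system": {"name": "Windows 10", "arch": "amd64"},
}
print(node_uptime(windows_node))  # JVM up 0h 8m 36s
```

The fallback keeps the view rendering something sensible instead of throwing, which is the behaviour the dashboard's System section already shows.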

Solr facet order same as result set

2020-04-20 Thread Venu
Hi
For a given query and sort order, Solr returns the result set (ordered by
score and the sort order) along with facets (ordered by descending bucket
count).

Is there any way to get the facets in the same order as the results/docs? I
tried JSON facets, but I was not able to make it work.

In the example below, I sorted on multiple fields, 'rank' and 'sales', and
the first doc is sku: 123456, but that SKU is not returned in the facets. I
want those SKUs to be part of the facets.

*Sample query:*
http://localhost:8983/solr/samplecollection/select
  ?q=((group_id: ("g.0" OR "g.1")) OR (!group_id: "g.46")) AND !(sku: 1000422 OR group_id: g.13)
  &json.facet={ sku: { type: terms, field: sku, facet: { fc_ids: { type: terms, field: fc_id } } } }
  &sort=rank desc, sales desc
  &fl=score *
  &rows=3
*Sample Response:*
"response": {
  "numFound": 1998,
  "start": 0,
  "maxScore": 1.7779741,
  "docs": [
    {
      "sku": "123456",
      "group_id": "g.0",
      "id": "123456.0",
      "fc_id": "0",
      "_version_": 1664396609960542200,
      "score": 1.7779741
    },
    {
      "sku": "366222",
      "group_id": "g.0",
      "id": "366222.0",
      "fc_id": "0",
      "_version_": 1664396609963688000,
      "score": 1.7779741
    },
    {
      "sku": "1000425",
      "group_id": "g.0",
      "id": "1000425.0",
      "fc_id": "0",
      "_version_": 1664396609964736500,
      "score": 1.7779741
    }
  ]
},
"facets": {
  "count": 1998,
  "sku": {
    "buckets": [
      {
        "val": "1000425",
        "count": 2,
        "fc_ids": {
          "buckets": [
            {
              "val": "0",
              "count": 1
            },
            {
              "val": "1",
              "count": 1
            }
          ]
        }
      },
      {
        "val": "100253356",
        "count": 2,
        "fc_ids": {
          "buckets": [
            {
              "val": "0",
              "count": 1
            },
            {
              "val": "1",
              "count": 1
            }
          ]
        }
      }




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr facet order same as result set

2020-04-20 Thread Erick Erickson
I have no idea what “getting the facets in the same order as the sort docs” 
would mean. 
Even in the single-valued case, say 

id     rank  value_in_facet_field
doc1   1     32
doc2   2     76
doc3   3     3
doc4   4     76

How would facets be ordered in the same sort order as the docs?

It’s much worse with multivalued fields, since a single doc can
contribute to more than one facet.

Best,
Erick

> On Apr 20, 2020, at 5:53 AM, Venu  wrote:



Re: Solr facet order same as result set

2020-04-20 Thread Venu
Perhaps I haven't framed my question properly.

Consider a schema with the fields id, sku, fc_id and group_id.
The same SKU can be part of multiple documents with different fc_id and
group_id values.

For a given search query, multiple documents with the same SKU will be
returned. Is there any way to get all the fc_ids for the SKUs returned
in the result set? Or do I have to run a separate query with those SKUs to
fetch the fc_ids through JSON facets?

I am currently fetching the fc_ids through JSON facets, but the order of the
returned buckets differs from the order of the result set.



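The "separate query" option can be sketched as a two-request, client-side workaround: collect the distinct SKUs from the sorted page, send a second request whose `fq` restricts to those SKUs, facet `fc_id` under `sku`, and then reorder the buckets to follow the page. A minimal Python sketch, assuming the field names from the example (`sku`, `fc_id`), a `q=*:*` second request, and SKU values without commas (the `{!terms}` parser splits on commas by default). The helper names are hypothetical, and this only builds the query string; it does not call Solr.

```python
import json
from urllib.parse import urlencode, parse_qs


def skus_in_result_order(docs):
    """Distinct SKUs in the order they first appear in the sorted result page."""
    seen = []
    for doc in docs:
        if doc["sku"] not in seen:
            seen.append(doc["sku"])
    return seen


def followup_facet_params(skus):
    """Query string for a second request that facets fc_id under sku,
    restricted by the terms query parser to the given SKUs."""
    facet = {
        "sku": {
            "type": "terms",
            "field": "sku",
            "limit": -1,  # all SKU buckets, not just the default top 10
            "facet": {"fc_ids": {"type": "terms", "field": "fc_id", "limit": -1}},
        }
    }
    params = {
        "q": "*:*",
        "fq": "{!terms f=sku}" + ",".join(skus),
        "rows": 0,  # only the facets are needed
        "json.facet": json.dumps(facet),
    }
    return urlencode(params)


def buckets_in_result_order(skus, buckets):
    """Reorder facet buckets to follow the SKU order of the result page."""
    by_val = {bucket["val"]: bucket for bucket in buckets}
    return [by_val[sku] for sku in skus if sku in by_val]


# SKUs of the three docs in the sample response above.
page = [{"sku": "123456"}, {"sku": "366222"}, {"sku": "1000425"}]
skus = skus_in_result_order(page)
query = followup_facet_params(skus)
print(parse_qs(query)["fq"][0])  # {!terms f=sku}123456,366222,1000425
```

The reordering happens on the client because a terms facet sorts buckets by its own criteria (count, index order or a stat), not by the position of documents in the main result list.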


Re: Indexing data from multiple data sources

2020-04-20 Thread Charlie Hull
The link you quote is Sematext's mirror of the Apache solr-user mailing 
list. There are others also providing copies of this list. As the cat is 
very much out of the bag your best course of action is to change all the 
logins and passwords that have been leaked and review your security 
procedures.


Cheers

Charlie

On 18/04/2020 13:27, RaviKiran Moola wrote:

Hi,
Greetings of the day!!!

Unfortunately, we included our database connection details in a post to the
Solr community mailing list when sending our queries to Solr support, as
shown in the mail below.


We found that it has been published at this link:
https://sematext.com/opensee/m/Solr/eHNlswSd1vD6AF?subj=RE+Indexing+data+from+multiple+data+sources


As it is open to the world, we request that you please remove that post as
soon as possible, before it creates any security issues for us.


Your help would be much appreciated!

FYI, I'm attaching the screenshot below.




Thanks & Regards,

Ravikiran Moola



*From:* RaviKiran Moola
*Sent:* Friday, April 17, 2020 9:13 PM
*To:* solr-user@lucene.apache.org 
*Subject:* RE: Indexing data from multiple data sources
Hi,

Greetings!!!

We are working on indexing data from multiple data sources (MySQL and
MSSQL) into a single collection. In a single data config file we specified
the connection details for both data sources along with the required fields,
defined those fields in the managed schema, and fetch the same columns from
both data sources using a common "unique key".


However, we are unable to index the data from these data sources using Solr.

Here I’m attaching the data config file and screenshot.

Data config file:

 url="jdbc:mysql://182.74.133.92:3306/ra_dev" user="devuser" 
password="Welcome_009" batchSize="1" />
 driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" 
url="jdbc:sqlserver://182.74.133.92;databasename=BB_SOLR" 
user="matuser" password="MatDev:07"/>

  
  

   
   
  

   
   
  

 



Thanks & Regards,

Ravikiran Moola

+91-9494924492




--
Charlie Hull
OpenSource Connections, previously Flax

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.o19s.com



"SolrCore Initialization Failures" error message appears briefly in Solr 8.5.1 Admin UI

2020-04-20 Thread Colvin Cowie
Sorry if this has already been raised, but I didn't see it.

When loading / refreshing the Admin UI in 8.5.1, it briefly but *visibly*
shows a placeholder for the "SolrCore Initialization Failures" error
message, with a lot of redness. It looks like there is a real problem.
Obviously the message then disappears, and it can be ignored.
However, if I were a first-time user, it would not give me confidence that
everything is okay. In a way, an error message that appears briefly and then
disappears before I can finish reading it is worse than one that just
stays there.

Here's a screenshot of what I mean
https://drive.google.com/open?id=1eK4HNprEuEua08_UwtEoDQuRwFgqbGjU
and a gif:
https://drive.google.com/open?id=1Rw3z03MzAqFpfZFU4uVv4G158vk66QVx

I assume that this is connected to the UI updates discussed in
https://issues.apache.org/jira/browse/SOLR-14359

Cheers,
Colvin


Solr filter cache hits not reflecting

2020-04-20 Thread Rahul Goswami
Hello,

I was trying to analyze the filter cache performance and noticed a strange
thing. Upon searching with fq, the entry gets added to the cache the first
time. Observing from the "Stats/Plugins" tab on the Solr admin UI, the 'lookups'
and 'inserts' counts are incremented.
However, if I search with the same fq again, I expect the lookups and hits
counts to increase, but they don't. This ultimately results in an incorrect
hitratio.
I tried this scenario on Solr 7.2.1, 7.7.2 and 8.5 and observe the same
behavior on all three versions.

Is this a bug or am I missing something here?

Thanks,
Rahul


Re: Solr filter cache hits not reflecting

2020-04-20 Thread Chris Hostetter


: I was trying to analyze the filter cache performance and noticed a strange
: thing. Upon searching with fq, the entry gets added to the cache the first
: time. Observing from the "Stats/Plugins" tab on Solr admin UI, the 'lookup'
: and 'inserts' count gets incremented.
: However, if I search with the same fq again, I expect the lookup and hits
: count to increase, but it doesn't. This ultimately results in an incorrect
: hitratio.

We'll need to see the actual specifics of the requests you're executing & 
stats you're seeing in order to make any guesses as to why you're not 
seeing the expected outcome.

Wild guesses: 
- Are you using date math based fq params that don't round?
- Are you using SolrCloud, with some of your requests getting routed to
different replicas?
- Are you using some complex/custom filter impl that may have a bug in
its equals/hashCode impl that prevents it from being a cache hit?


Here's an example showing that the basics of filterCache work fine with
8.5 for trivial examples...

$ bin/solr -e techproducts
...
$ curl -sS 'http://localhost:8983/solr/techproducts/admin/mbeans?wt=json&indent=true&category=CACHE&stats=true&key=filterCache' \
  | grep 'CACHE.searcher.filterCache'
  "CACHE.searcher.filterCache.hits":0,
  "CACHE.searcher.filterCache.cumulative_evictions":0,
  "CACHE.searcher.filterCache.cleanupThread":false,
  "CACHE.searcher.filterCache.size":0,
  "CACHE.searcher.filterCache.maxRamMB":-1,
  "CACHE.searcher.filterCache.hitratio":0.0,
  "CACHE.searcher.filterCache.warmupTime":0,
  "CACHE.searcher.filterCache.idleEvictions":0,
  "CACHE.searcher.filterCache.evictions":0,
  "CACHE.searcher.filterCache.cumulative_hitratio":0.0,
  "CACHE.searcher.filterCache.lookups":0,
  "CACHE.searcher.filterCache.cumulative_hits":0,
  "CACHE.searcher.filterCache.cumulative_inserts":0,
  "CACHE.searcher.filterCache.ramBytesUsed":1328,
  "CACHE.searcher.filterCache.cumulative_idleEvictions":0,
  "CACHE.searcher.filterCache.inserts":0,
  "CACHE.searcher.filterCache.cumulative_lookups":0}}},
$ curl -sS 'http://localhost:8983/solr/techproducts/query?q=*:*&fq=inStock:true' > /dev/null
$ curl -sS 'http://localhost:8983/solr/techproducts/admin/mbeans?wt=json&indent=true&category=CACHE&stats=true&key=filterCache' \
  | grep 'CACHE.searcher.filterCache'
  "CACHE.searcher.filterCache.hits":0,
  "CACHE.searcher.filterCache.cumulative_evictions":0,
  "CACHE.searcher.filterCache.cleanupThread":false,
  "CACHE.searcher.filterCache.size":1,
  "CACHE.searcher.filterCache.maxRamMB":-1,
  "CACHE.searcher.filterCache.hitratio":0.0,
  "CACHE.searcher.filterCache.warmupTime":0,
  "CACHE.searcher.filterCache.idleEvictions":0,
  "CACHE.searcher.filterCache.evictions":0,
  "CACHE.searcher.filterCache.cumulative_hitratio":0.0,
  "CACHE.searcher.filterCache.lookups":1,
  "CACHE.searcher.filterCache.cumulative_hits":0,
  "CACHE.searcher.filterCache.cumulative_inserts":1,
  "CACHE.searcher.filterCache.ramBytesUsed":4808,
  "CACHE.searcher.filterCache.cumulative_idleEvictions":0,
  "CACHE.searcher.filterCache.inserts":1,
  "CACHE.searcher.filterCache.cumulative_lookups":1}}},
$ curl -sS 'http://localhost:8983/solr/techproducts/query?q=name:solr&fq=inStock:true' > /dev/null
$ curl -sS 'http://localhost:8983/solr/techproducts/admin/mbeans?wt=json&indent=true&category=CACHE&stats=true&key=filterCache' \
  | grep 'CACHE.searcher.filterCache'
  "CACHE.searcher.filterCache.hits":1,
  "CACHE.searcher.filterCache.cumulative_evictions":0,
  "CACHE.searcher.filterCache.cleanupThread":false,
  "CACHE.searcher.filterCache.size":1,
  "CACHE.searcher.filterCache.maxRamMB":-1,
  "CACHE.searcher.filterCache.hitratio":0.5,
  "CACHE.searcher.filterCache.warmupTime":0,
  "CACHE.searcher.filterCache.idleEvictions":0,
  "CACHE.searcher.filterCache.evictions":0,
  "CACHE.searcher.filterCache.cumulative_hitratio":0.5,
  "CACHE.searcher.filterCache.lookups":2,
  "CACHE.searcher.filterCache.cumulative_hits":1,
  "CACHE.searcher.filterCache.cumulative_inserts":1,
  "CACHE.searcher.filterCache.ramBytesUsed":4808,
  "CACHE.searcher.filterCache.cumulative_idleEvictions":0,
  "CACHE.searcher.filterCache.inserts":1,
  "CACHE.searcher.filterCache.cumulative_lookups":2}}},

...so the first time we use 'fq=inStock:true' we get a single lookup and a
single insert.  The second time we use it (even with a different 'q' param)
we get our 2nd lookup and our 1st hit -- no new inserts -- and now we have
a 50% hitratio.

How does that compare with what you see?  What do similar commands show
you with your fq?
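The bookkeeping done by eye above (diff the counters before and after each request, then hits divided by lookups) can be scripted. A minimal Python sketch, assuming you have already extracted the flat `CACHE.searcher.filterCache.*` stats map that the `grep` prints; fetching and unwrapping the `mbeans` JSON is left out because its exact nesting is not shown in this thread. The function names are hypothetical.

```python
PREFIX = "CACHE.searcher.filterCache."


def cache_counters(stats: dict) -> dict:
    """Strip the metric prefix from a flat map of filterCache stats."""
    return {k[len(PREFIX):]: v for k, v in stats.items() if k.startswith(PREFIX)}


def counter_delta(before: dict, after: dict, names=("lookups", "hits", "inserts")) -> dict:
    """How much each counter moved between two stats snapshots."""
    b, a = cache_counters(before), cache_counters(after)
    return {name: a[name] - b[name] for name in names}


def hit_ratio(stats: dict) -> float:
    """hits / lookups, or 0.0 before the first lookup."""
    c = cache_counters(stats)
    return c["hits"] / c["lookups"] if c["lookups"] else 0.0


# Counters copied from the second and third snapshots above.
after_first_fq = {PREFIX + "lookups": 1, PREFIX + "hits": 0, PREFIX + "inserts": 1}
after_second_fq = {PREFIX + "lookups": 2, PREFIX + "hits": 1, PREFIX + "inserts": 1}
print(counter_delta(after_first_fq, after_second_fq))  # {'lookups': 1, 'hits': 1, 'inserts': 0}
print(hit_ratio(after_second_fq))  # 0.5
```

Looking at per-request deltas rather than the absolute counters makes it obvious when a request did not touch the filterCache at all.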




-Hoss
http://www.lucidworks.com/


Re: Solr facet order same as result set

2020-04-20 Thread Chris Hostetter


The goal you are describing doesn't really sound like faceting at all --
it sounds like what you want might be "grouping" (or collapse/expand)
... OR: depending on how you index your data, perhaps what you really
want is "nested documents" ... or maybe, if your use case is simple
enough, just the "subquery" DocTransformer, without needing explicit
relationships between the docs at indexing time.

I would suggest you read the docs on each of these features and see what
sounds best to you...

https://lucene.apache.org/solr/guide/8_5/result-grouping.html
https://lucene.apache.org/solr/guide/8_5/collapse-and-expand-results.html

https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html
https://lucene.apache.org/solr/guide/8_5/searching-nested-documents.html
https://lucene.apache.org/solr/guide/8_5/transforming-result-documents.html#child-childdoctransformerfactory

https://lucene.apache.org/solr/guide/8_5/transforming-result-documents.html#subquery


: Date: Mon, 20 Apr 2020 04:37:06 -0700 (MST)
: From: Venu 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: Solr facet order same as result set

-Hoss
http://www.lucidworks.com/


Re: Solr filter cache hits not reflecting

2020-04-20 Thread Rahul Goswami
Hi Hoss,

Thanks for your detailed response. If you take your steps one step further
and search again with the same fq, you should be able to reproduce the
problem. Here are the step-by-step observations on Solr 8.5 (7.2.1 and
7.7.2 have the same issue):


1) Before any queries:

http://localhost:8984/solr/admin/metrics?group=core&prefix=CACHE.searcher.filterCache

   "solr.core.techproducts":{
  "CACHE.searcher.filterCache":{
"lookups":0,
"idleEvictions":0,
"evictions":0,
"cumulative_inserts":0,
"ramBytesUsed":1328,
"cumulative_hits":0,
"cumulative_idleEvictions":0,
"hits":0,
"cumulative_evictions":0,
"cleanupThread":false,
"size":0,
"hitratio":0.0,
"cumulative_lookups":0,
"cumulative_hitratio":0.0,
"warmupTime":0,
"maxRamMB":-1,
"inserts":0}},


2) With fq=manu:samsung OR manu:apple

http://localhost:8984/solr/techproducts/select?q=*:*&fq=manu:samsung%20OR%20manu:apple

"solr.core.techproducts":{
  "CACHE.searcher.filterCache":{
"lookups":1,
"idleEvictions":0,
"evictions":0,
"cumulative_inserts":1,
"ramBytesUsed":4800,
"cumulative_hits":0,
"cumulative_idleEvictions":0,
"hits":0,
"cumulative_evictions":0,
"cleanupThread":false,
"size":1,
"hitratio":0.0,
"cumulative_lookups":1,
"cumulative_hitratio":0.0,
"item_manu:samsung manu:apple":"SortedIntDocSet{size=2,ramUsed=40 bytes}",
"warmupTime":0,
"maxRamMB":-1,
"inserts":1}},

3) q changed but same fq... the hits and lookups are updated as expected:
http://localhost:8984/solr/techproducts/select?q=popularity:[5%20TO%2012]&fq=manu:samsung%20OR%20manu:apple

   "solr.core.techproducts":{
  "CACHE.searcher.filterCache":{
"lookups":2,
"idleEvictions":0,
"evictions":0,
"cumulative_inserts":1,
"ramBytesUsed":4800,
"cumulative_hits":1,
"cumulative_idleEvictions":0,
"hits":1,
"cumulative_evictions":0,
"cleanupThread":false,
"size":1,
"hitratio":0.5,
"cumulative_lookups":2,
"cumulative_hitratio":0.5,
"item_manu:samsung manu:apple":"SortedIntDocSet{size=2,ramUsed=40 bytes}",
"warmupTime":0,
"maxRamMB":-1,
"inserts":1}},

4) A query with a different fq:
http://localhost:8984/solr/techproducts/select?q=popularity:[5%20TO%2012]&fq=manu:samsung

"solr.core.techproducts":{
  "CACHE.searcher.filterCache":{
"lookups":3,
"idleEvictions":0,
"evictions":0,
"cumulative_inserts":2,
"ramBytesUsed":6076,
"cumulative_hits":1,
"cumulative_idleEvictions":0,
"hits":1,
"cumulative_evictions":0,
"cleanupThread":false,
"size":2,
"item_manu:samsung":"SortedIntDocSet{size=1,ramUsed=36 bytes}",
"hitratio":0.33,
"cumulative_lookups":3,
"cumulative_hitratio":0.33,
"item_manu:samsung manu:apple":"SortedIntDocSet{size=2,ramUsed=40 bytes}",
"warmupTime":0,
"maxRamMB":-1,

5) A query with the same fq again (fq=manu:samsung OR manu:apple): the
numbers for this fq don't get updated on this or any subsequent search.

http://localhost:8984/solr/techproducts/select?q=popularity:[5%20TO%2012]&fq=manu:samsung%20OR%20manu:apple

"solr.core.techproducts":{
  "CACHE.searcher.filterCache":{
"lookups":3,
"idleEvictions":0,
"evictions":0,
"cumulative_inserts":2,
"ramBytesUsed":6076,
"cumulative_hits":1,
"cumulative_idleEvictions":0,
"hits":1,
"cumulative_evictions":0,
"cleanupThread":false,
"size":2,
"item_manu:samsung":"SortedIntDocSet{size=1,ramUsed=36 bytes}",
"hitratio":0.33,
"cumulative_lookups":3,
"cumulative_hitratio":0.33,
"item_manu:samsung manu:apple":"SortedIntDocSet{size=2,ramUsed=40 bytes}",
"warmupTime":0,
"maxRamMB":-1,
"inserts":2}},

Thanks,

Rahul


On Mon, Apr 20, 2020 at 2:48 PM Chris Hostetter 
wrote:

>
> : I was trying to analyze the filter cache performance and noticed a
> strange
> : thing. Upon searching with fq, the entry gets added to the cache the
> first
> : time. Observing from the "Stats/Plugins" tab on Solr admin UI, the
> 'lookup'
> : and 'inserts' count gets incremented.
> : However, if I search with the same fq again, I expect the lookup and hits
> : count to increase, but it doesn't. This ultimately results in an
> incorrect
> : hitratio.
>
> We'll need to see the actual specifics of the requests you're executing &
> stats you're seeing in order to make any guesses as to why you're not
> seeing the expected outcome.
>
> Wild guesses:
> - Are you use Date math based fq params that don't round?
> - Are you using SolrCloud and some of your reque

Re: Solr filter cache hits not reflecting

2020-04-20 Thread Chris Hostetter
: 4) A query with different fq.
: 
http://localhost:8984/solr/techproducts/select?q=popularity:[5%20TO%2012]&fq=manu:samsung
...
: 5) A query with the same fq again (fq=manu:samsung OR manu:apple)the
: numbers don't get update for this fq hereafter for subsequent searches
: 
: 
http://localhost:8984/solr/techproducts/select?q=popularity:[5%20TO%2012]&fq=manu:samsung%20OR%20manu:apple

that's not just *A* query with the same fq, it's the *exact* same request 
(q + sort + pagination + all filters)

Which means that everything Solr needs to reply to this request is
available in the *queryResultCache* -- no filterCache needed at all (if
you had faceting enabled that would be a different issue: then the
filterCache would still be needed in order to compute facet counts over
the entire DocSet matching the query, not just the current page window)...
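The lookup order described here can be modelled in a few lines. This is a toy Python model, not Solr's implementation: it assumes the queryResultCache key is simply (q, fq, sort, start, rows) and ignores faceting, result windowing and autowarming. The class and function names are hypothetical. Replaying Rahul's steps 2 to 5 reproduces his final filterCache counters (3 lookups, 1 hit, 2 inserts).

```python
class CountingCache:
    """Dict-backed cache that tracks lookups, hits and inserts like Solr's stats."""

    def __init__(self):
        self.store = {}
        self.lookups = self.hits = self.inserts = 0

    def get(self, key):
        self.lookups += 1
        value = self.store.get(key)
        if value is not None:
            self.hits += 1
        return value

    def put(self, key, value):
        self.inserts += 1
        self.store[key] = value


def search(q, fq, query_results, filters, sort="score desc", start=0, rows=10):
    """Toy request path: the whole request is looked up first; the filter
    cache is only consulted when the result page is not already cached."""
    key = (q, fq, sort, start, rows)
    page = query_results.get(key)
    if page is not None:
        return page  # queryResultCache hit: filterCache never touched
    docset = filters.get(fq)
    if docset is None:
        docset = frozenset({"docs matching " + fq})  # stand-in for a real DocSet
        filters.put(fq, docset)
    page = ("page", q, docset)
    query_results.put(key, page)
    return page


qrc, fc = CountingCache(), CountingCache()
apple_or_samsung = "manu:samsung OR manu:apple"
search("*:*", apple_or_samsung, qrc, fc)                   # step 2
search("popularity:[5 TO 12]", apple_or_samsung, qrc, fc)  # step 3
search("popularity:[5 TO 12]", "manu:samsung", qrc, fc)    # step 4
search("popularity:[5 TO 12]", apple_or_samsung, qrc, fc)  # step 5
print(fc.lookups, fc.hits, fc.inserts)     # 3 1 2
print(qrc.lookups, qrc.hits, qrc.inserts)  # 4 1 3
```

Step 5 repeats step 3 exactly, so it is answered from the queryResultCache and the filterCache counters stay put, matching the behaviour reported.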


$ bin/solr -e techproducts
...

# mostly empty caches (techproducts has a single static warming query)

$ curl -sS 'http://localhost:8983/solr/techproducts/admin/mbeans?wt=json&indent=true&category=CACHE&stats=true' \
  | grep -E 'CACHE.searcher.(queryResultCache|filterCache).(inserts|hits|lookups)'
  "CACHE.searcher.queryResultCache.lookups":0,
  "CACHE.searcher.queryResultCache.inserts":1,
  "CACHE.searcher.queryResultCache.hits":0}},
  "CACHE.searcher.filterCache.hits":0,
  "CACHE.searcher.filterCache.lookups":0,
  "CACHE.searcher.filterCache.inserts":0,

# new q and fq: lookup & insert into both caches...

$ curl -sS -g 'http://localhost:8983/solr/techproducts/select?q=popularity:[5%20TO%2012]&fq=manu:samsung%20OR%20manu:apple' > /dev/null
$ curl -sS 'http://localhost:8983/solr/techproducts/admin/mbeans?wt=json&indent=true&category=CACHE&stats=true' \
  | grep -E 'CACHE.searcher.(queryResultCache|filterCache).(inserts|hits|lookups)'
  "CACHE.searcher.queryResultCache.lookups":1,
  "CACHE.searcher.queryResultCache.inserts":2,
  "CACHE.searcher.queryResultCache.hits":0}},
  "CACHE.searcher.filterCache.hits":0,
  "CACHE.searcher.filterCache.lookups":1,
  "CACHE.searcher.filterCache.inserts":1,

# new q, same fq: 
# lookup on both caches, hit on filter, insert on queryResultCache

$ curl -sS 'http://localhost:8983/solr/techproducts/select?q=*:*&fq=manu:samsung%20OR%20manu:apple' > /dev/null
$ curl -sS 'http://localhost:8983/solr/techproducts/admin/mbeans?wt=json&indent=true&category=CACHE&stats=true' \
  | grep -E 'CACHE.searcher.(queryResultCache|filterCache).(inserts|hits|lookups)'
  "CACHE.searcher.queryResultCache.lookups":2,
  "CACHE.searcher.queryResultCache.inserts":3,
  "CACHE.searcher.queryResultCache.hits":0}},
  "CACHE.searcher.filterCache.hits":1,
  "CACHE.searcher.filterCache.lookups":2,
  "CACHE.searcher.filterCache.inserts":1,

# same q & fq as before:
# hit on queryResultCache means no filterCache needed...

$ curl -sS -g 'http://localhost:8983/solr/techproducts/select?q=popularity:[5%20TO%2012]&fq=manu:samsung%20OR%20manu:apple' > /dev/null
$ curl -sS 'http://localhost:8983/solr/techproducts/admin/mbeans?wt=json&indent=true&category=CACHE&stats=true' \
  | grep -E 'CACHE.searcher.(queryResultCache|filterCache).(inserts|hits|lookups)'
  "CACHE.searcher.queryResultCache.lookups":3,
  "CACHE.searcher.queryResultCache.inserts":3,
  "CACHE.searcher.queryResultCache.hits":1}},
  "CACHE.searcher.filterCache.hits":1,
  "CACHE.searcher.filterCache.lookups":2,
  "CACHE.searcher.filterCache.inserts":1,



-Hoss
http://www.lucidworks.com/


solr as a general search engine

2020-04-20 Thread matthew sporleder
Is there a comprehensive/big set of tips for making solr into a
search-engine as a human would expect one to behave?  I poked around
in the nutch github for a minute and found this:
https://github.com/apache/nutch/blob/9e5ae7366f7dd51eaa76e77bee6eb69f812bd29b/src/plugin/indexer-solr/schema.xml
 but I was wondering if I was missing a very obvious document
somewhere.

I guess I'm looking for things like:
use suggester here, use spelling there, use DocValues around here, DIY
pagerank, etc

Thanks,
Matt


Re: Solr filter cache hits not reflecting

2020-04-20 Thread Rahul Goswami
Hoss,
Thank you for such a succinct explanation! I was not aware of the order of
lookups (queryResultCache  followed by filterCache). Makes sense now. Sorry
for the false alarm!

Rahul

On Mon, Apr 20, 2020 at 4:04 PM Chris Hostetter 
wrote:

> : 4) A query with different fq.
> :
> http://localhost:8984/solr/techproducts/select?q=popularity:[5%20TO%2012]&fq=manu:samsung
> ...
> : 5) A query with the same fq again (fq=manu:samsung OR manu:apple)the
> : numbers don't get update for this fq hereafter for subsequent searches
> :
> :
> http://localhost:8984/solr/techproducts/select?q=popularity:[5%20TO%2012]&fq=manu:samsung%20OR%20manu:apple
>
> that's not just *A* query with the same fq, it's the *exact* same request
> (q + sort + pagination + all filters)
>
> Whch means that everything solr needs to reply to this request is
> available in the *queryResultCache* -- no filterCache needed at all (if
> you had faceting enabled that would be a different issue: then the
> filterCache would still be needed in order to compute facet counts over
> the entire DocSet matching the query, not just the current page window)...
>
>
> $ bin/solr -e techproducts
> ...
>
> # mostly empty caches (techproudct has a single static warming query)
>
> $ curl -sS '
> http://localhost:8983/solr/techproducts/admin/mbeans?wt=json&indent=true&category=CACHE&stats=true'
> | grep -E
> 'CACHE.searcher.(queryResultCache|filterCache).(inserts|hits|lookups)'
>   "CACHE.searcher.queryResultCache.lookups":0,
>   "CACHE.searcher.queryResultCache.inserts":1,
>   "CACHE.searcher.queryResultCache.hits":0}},
>   "CACHE.searcher.filterCache.hits":0,
>   "CACHE.searcher.filterCache.lookups":0,
>   "CACHE.searcher.filterCache.inserts":0,
>
> # new q and fq: lookup & insert into both caches...
>
> $ curl -sS '
> http://localhost:8983/solr/techproducts/select?q=popularity:[5%20TO%2012]&fq=manu:samsung%20OR%20manu:apple'
> > /dev/null
> $ curl -sS '
> http://localhost:8983/solr/techproducts/admin/mbeans?wt=json&indent=true&category=CACHE&stats=true'
> | grep -E
> 'CACHE.searcher.(queryResultCache|filterCache).(inserts|hits|lookups)'
>   "CACHE.searcher.queryResultCache.lookups":1,
>   "CACHE.searcher.queryResultCache.inserts":2,
>   "CACHE.searcher.queryResultCache.hits":0}},
>   "CACHE.searcher.filterCache.hits":0,
>   "CACHE.searcher.filterCache.lookups":1,
>   "CACHE.searcher.filterCache.inserts":1,
>
> # new q, same fq:
> # lookup on both caches, hit on filter, insert on queryResultCache
>
> $ curl -sS '
> http://localhost:8983/solr/techproducts/select?q=*:*&fq=manu:samsung%20OR%20manu:apple'
> > /dev/null
> $ curl -sS '
> http://localhost:8983/solr/techproducts/admin/mbeans?wt=json&indent=true&category=CACHE&stats=true'
> | grep -E
> 'CACHE.searcher.(queryResultCache|filterCache).(inserts|hits|lookups)'
>   "CACHE.searcher.queryResultCache.lookups":2,
>   "CACHE.searcher.queryResultCache.inserts":3,
>   "CACHE.searcher.queryResultCache.hits":0}},
>   "CACHE.searcher.filterCache.hits":1,
>   "CACHE.searcher.filterCache.lookups":2,
>   "CACHE.searcher.filterCache.inserts":1,
>
> # same q & fq as before:
> # hit on queryresultCache means no filterCache needed...
>
> $ curl -sS '
> http://localhost:8983/solr/techproducts/select?q=popularity:[5%20TO%2012]&fq=manu:samsung%20OR%20manu:apple'
> > /dev/null
> $ curl -sS '
> http://localhost:8983/solr/techproducts/admin/mbeans?wt=json&indent=true&category=CACHE&stats=true'
> | grep -E
> 'CACHE.searcher.(queryResultCache|filterCache).(inserts|hits|lookups)'
>   "CACHE.searcher.queryResultCache.lookups":3,
>   "CACHE.searcher.queryResultCache.inserts":3,
>   "CACHE.searcher.queryResultCache.hits":1}},
>   "CACHE.searcher.filterCache.hits":1,
>   "CACHE.searcher.filterCache.lookups":2,
>   "CACHE.searcher.filterCache.inserts":1,
>
>
>
> -Hoss
> http://www.lucidworks.com/
>