Help with complex boolean search queries

2017-10-28 Thread Ankit Shah
Hi,
I am new to the Solr community and have a strange problem with highlighting
in my search results.
Here is what is going on. I have a log file that is indexed into Solr with
the following config:


   <tokenizer class="solr.StandardTokenizerFactory"/>


Here is a sample for demonstration purposes. Assume the following log file
text is indexed into Solr in the field "log":

AppleCare+ extends the basic warranty that covers non-accidental iPhone
mishaps -- such as battery issues or a faulty headphone jack -- from one
year to two. The iPhone X was unveiled to much fanfare last month. It
boasts a radical update to the iPhone models of years past, with an
all-glass display and an option to unlock with facial recognition. It also
has an all-glass back, so owners run the risk of cracking either side of
the phone. However, Apple has claimed the glass on the iPhone 8 and iPhone
X is much stronger than earlier models, so it could be harder to break.
Pre-orders for the phone began online Friday, and units were selling out
quickly. The U.S. Apple Store site said it would take

Now, the query that I run is as follows:

q=("warranty that covers non-accidental") OR ("risking it all" AND "harder to break")
hl.q=("warranty that covers non-accidental") OR ("risking it all" AND "harder to break")
hl=true
hl.fl=log
hl.usePhraseHighlighter=true
hl.fragsize=2000
hl.maxAnalyzedChars=2097152
indent=on

or as a URL
http://localhost:8983/solr/mycore/select?hl.usePhraseHighlighter=true&hl.fl=log&hl=true&hl.fragsize=2000&indent=on&wt=json&hl.q=(%22warranty%20that%20covers%20non-accidental%22)%20OR%20(%22risking%20it%20all%22%20AND%20%22harder%20to%20break%22)&q=(%22warranty%20that%20covers%20non-accidental%22)%20OR%20(%22risking%20it%20all%22%20AND%20%22harder%20to%20break%22)
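
For anyone who wants to reproduce the request, here is a minimal Python
sketch that assembles an equivalent URL from the parameters listed above.
The helper name build_select_url is made up for illustration, and the base
URL is the one from this post. Note that urlencode also percent-encodes the
parentheses (%28 and %29), so the result is not character-for-character
identical to the hand-written URL, though it should decode to the same
parameters.

```python
from urllib.parse import quote, urlencode

# The boolean query from the post: condition 1 OR (phrase AND phrase).
QUERY = ('("warranty that covers non-accidental") OR '
         '("risking it all" AND "harder to break")')


def build_select_url(base_url, query):
    """Return a /select URL that searches and highlights the "log" field.

    Hypothetical helper; the parameter values are the ones from the post.
    """
    params = {
        "q": query,
        "hl": "true",
        "hl.q": query,
        "hl.fl": "log",
        "hl.usePhraseHighlighter": "true",
        "hl.fragsize": "2000",
        "hl.maxAnalyzedChars": "2097152",
        "indent": "on",
        "wt": "json",
    }
    # quote_via=quote encodes spaces as %20 instead of the default '+'.
    return base_url + "?" + urlencode(params, quote_via=quote)


url = build_select_url("http://localhost:8983/solr/mycore/select", QUERY)
print(url)
```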

The response is as follows:
{
  "responseHeader":{
    "status":0,
    "QTime":24,
    "params":{
      "q":"(\"warranty that covers non-accidental\") OR (\"risking it all\" AND \"harder to break\")",
      "hl":"true",
      "indent":"on",
      "hl.q":"(\"warranty that covers non-accidental\") OR (\"risking it all\" AND \"harder to break\")",
      "hl.usePhraseHighlighter":"true",
      "hl.fragsize":"2000",
      "hl.fl":"log",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "logid":5487941,
        "log":"AppleCare+ extends the basic warranty that covers non-accidental iPhone mishaps -- such as battery issues or a faulty headphone jack -- from one year to two.\nThe iPhone X was unveiled to much fanfare last month. It boasts a radical update to the iPhone models of years past, with an all-glass display and an option to unlock with facial recognition.\nIt also has an all-glass back, so owners run the risk of cracking either side of the phone.\nHowever, Apple has claimed the glass on the iPhone 8 and iPhone X is much stronger than earlier models, so it could be harder to break.\nPre-orders for the phone began online Friday, and units were selling out quickly. The U.S. Apple Store site said it would take \n",
        "_version_":1582439847966015488}]
  },
  "highlighting":{
    "5487941":{
      "log":["AppleCare+ extends the basic *warranty that covers non-accidental* iPhone mishaps -- such as battery issues or a faulty headphone jack -- from one year to two.\nThe iPhone X was unveiled to much fanfare last month. It boasts a radical update to the iPhone models of years past, with an all-glass display and an option to unlock with facial recognition.\nIt also has an all-glass back, so owners run the risk of cracking either side of the phone.\nHowever, Apple has claimed the glass on the iPhone 8 and iPhone X is much stronger than earlier models, so it could be *harder to break*.\nPre-orders for the phone began online Friday, and units were selling out quickly. The U.S. Apple Store site said it would take \n"]
    }
  }
}

I get the correct document as a hit, but the highlighted text is wrong. As
I understand it, the query is straightforward: match either condition 1 or
condition 2,
where condition 1 = "warranty that covers non-accidental"
and condition 2 = "risking it all" AND "harder to break"
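
To make the expectation concrete, here is a small Python sketch that
evaluates the two conditions against an excerpt of the sample text using
naive, case-insensitive phrase matching. This is only a toy model of the
boolean logic I expect. It is not how Solr's analyzer or highlighter is
implemented, and the helper names are made up for illustration.

```python
import re

# Excerpt of the sample text from the post (two sentences, copied verbatim).
SAMPLE = (
    "AppleCare+ extends the basic warranty that covers non-accidental iPhone "
    "mishaps -- such as battery issues or a faulty headphone jack -- from one "
    "year to two. However, Apple has claimed the glass on the iPhone 8 and "
    "iPhone X is much stronger than earlier models, so it could be harder to "
    "break."
)


def tokens(text):
    """Lowercase word tokens; a rough stand-in for the real analyzer."""
    return re.findall(r"\w+", text.lower())


def has_phrase(text, phrase):
    """True if the phrase's tokens occur consecutively in the text."""
    doc, want = tokens(text), tokens(phrase)
    return any(doc[i:i + len(want)] == want
               for i in range(len(doc) - len(want) + 1))


cond1 = has_phrase(SAMPLE, "warranty that covers non-accidental")
risking = has_phrase(SAMPLE, "risking it all")
harder = has_phrase(SAMPLE, "harder to break")
cond2 = risking and harder  # both phrases are required

is_hit = cond1 or cond2

# Phrases that belong to a clause which is true as a whole.
expected_highlights = []
if cond1:
    expected_highlights.append("warranty that covers non-accidental")
if cond2:
    expected_highlights += ["risking it all", "harder to break"]

print(is_hit, expected_highlights)
```

Under this reading only condition 1 is satisfied, so "harder to break" is
present in the text but belongs to a clause that is false as a whole.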

The hit is correct because condition 1 matched, but why does the highlight
indicate that part of condition 2 ("harder to break") also matched? Why is
the AND operator in condition 2 being ignored?
Am I missing something here? I am hoping to get this resolved, as I am
planning to use even more complex queries like
(condition 1) OR (condition 2) OR (condition 3) OR (condition 4 AND
condition 5) OR (condition 6 AND condition 5 NOT condition 7)
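
As a rough sketch of how such queries could be assembled without quoting
mistakes, here is a small Python helper set. The function names are
hypothetical, and the phrases "condition 1" through "condition 7" are
placeholders rather than real search terms. It only builds the query string
in the AND / OR / NOT syntax used above and does not talk to Solr.

```python
def phrase(text):
    """Quote text as a phrase, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def any_of(*clauses):
    """Join clauses with OR inside one pair of parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses, exclude=()):
    """Join clauses with AND, then append NOT for each excluded clause."""
    body = " AND ".join(clauses)
    for clause in exclude:
        body += " NOT " + clause
    return "(" + body + ")"


# Placeholder phrases standing in for the seven conditions.
c = {n: phrase("condition %d" % n) for n in range(1, 8)}

query = any_of(
    c[1],
    c[2],
    c[3],
    all_of(c[4], c[5]),
    all_of(c[6], c[5], exclude=[c[7]]),
)
print(query)
```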

Any pointers would be appreciated.

Thanks
Ankit


Query performance degrades when TLOG replica

2020-09-03 Thread Ankit Shah
We have the following setup: Solr 7.7.2 with 1 TLOG leader and 1 TLOG
replica on a single shard. We have about 34.5 million documents with an
approximate index size of 600GB. I have noticed degraded query performance
whenever the replica is (I am guessing here) syncing or performing actual
replication. To test this, I fire a very basic query using the SolrJ
client. The query normally comes back right away, but whenever the replica
is checking how far behind it is by comparing generation IDs, the same
query takes longer. In production we do not make these simple queries, but
rather complex queries with filter queries and sorting. These queries take
too long compared to our previous setup (standalone Solr 6.1.0).

Any help here is appreciated.

20-09-02 16:35:30 INFO  [db_shard1_replica_t3]  webapp=/solr path=/select params={q=*:*&fl=id&sort=id+desc&rows=1&wt=xml&version=2.2} hits=34458909 status=0 QTime=0
2020-09-02 16:35:30 INFO  [db_shard1_replica_t3]  webapp=/solr path=/select params={q=*:*&fl=id&sort=id+desc&rows=1&wt=xml&version=2.2} hits=34458909 status=0 QTime=0
2020-09-02 16:36:00 INFO  [db_shard1_replica_t3]  webapp=/solr path=/select params={q=*:*&fl=id&sort=id+desc&rows=1&wt=xml&version=2.2} hits=34458909 status=0 QTime=0
2020-09-02 16:36:00 INFO  [db_shard1_replica_t3]  webapp=/solr path=/select params={q=*:*&fl=id&sort=id+desc&rows=1&wt=xml&version=2.2} hits=34458909 status=0 QTime=0
2020-09-02 16:36:30 INFO  [db_shard1_replica_t3]  webapp=/solr path=/select params={q=*:*&fl=id&sort=id+desc&rows=1&wt=xml&version=2.2} hits=34458909 status=0 QTime=0
2020-09-02 16:36:30 INFO  [db_shard1_replica_t3]  webapp=/solr path=/select params={q=*:*&fl=id&sort=id+desc&rows=1&wt=xml&version=2.2} hits=34458909 status=0 QTime=0
*2020-09-02 16:37:01* INFO  [db_shard1_replica_t3]  webapp=/solr path=/select params={q=*:*&fl=id&sort=id+desc&rows=1&wt=xml&version=2.2} hits=34458909 status=0 QTime=*1011*
*2020-09-02 16:37:01* INFO  [db_shard1_replica_t3]  webapp=/solr path=/select params={q=*:*&fl=id&sort=id+desc&rows=1&wt=xml&version=2.2} hits=34458909 status=0 QTime=*758*
*2020-09-02 16:37:32* INFO  [db_shard1_replica_t3]  webapp=/solr path=/select params={q=*:*&fl=id&sort=id+desc&rows=1&wt=xml&version=2.2} hits=34458957 status=0 QTime=*1077*
*2020-09-02 16:37:32* INFO  [db_shard1_replica_t3]  webapp=/solr path=/select params={q=*:*&fl=id&sort=id+desc&rows=1&wt=xml&version=2.2} hits=34458957 status=0 QTime=*1081*
2020-09-02 16:38:02 INFO  [db_shard1_replica_t3]  webapp=/solr path=/select params={q=*:*&fl=id&sort=id+desc&rows=1&wt=xml&version=2.2} hits=34458957 status=0 QTime=*668*
2020-09-02 16:38:03 INFO  [db_shard1_replica_t3]  webapp=/solr path=/select params={q=*:*&fl=id&sort=id+desc&rows=1&wt=xml&version=2.2} hits=34458957 status=0 QTime=*1001*
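
To line the slow requests up against the replication log timestamps, a
small Python sketch like the one below can pull the timestamp and QTime out
of each request log entry and list the slow ones. The 500 ms threshold and
the function name are arbitrary choices for illustration. The sample entries
are copied from the log above, joined onto single lines, and the regular
expressions tolerate the * markers I used for emphasis.

```python
import re

# Four entries copied from the request log above, one entry per line.
PARAMS = "params={q=*:*&fl=id&sort=id+desc&rows=1&wt=xml&version=2.2}"
PREFIX = "INFO  [db_shard1_replica_t3]  webapp=/solr path=/select " + PARAMS
LINES = [
    "2020-09-02 16:36:30 " + PREFIX + " hits=34458909 status=0 QTime=0",
    "*2020-09-02 16:37:01* " + PREFIX + " hits=34458909 status=0 QTime=*1011*",
    "*2020-09-02 16:37:01* " + PREFIX + " hits=34458909 status=0 QTime=*758*",
    "2020-09-02 16:38:02 " + PREFIX + " hits=34458957 status=0 QTime=*668*",
]

# Optional '*' before and after each value covers the emphasis markers.
TIMESTAMP = re.compile(r"^\*?(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\*?\s")
QTIME = re.compile(r"QTime=\*?(\d+)")


def slow_queries(lines, threshold_ms=500):
    """Return (timestamp, qtime_ms) for entries at or above the threshold."""
    slow = []
    for line in lines:
        ts, qt = TIMESTAMP.search(line), QTIME.search(line)
        if ts and qt and int(qt.group(1)) >= threshold_ms:
            slow.append((ts.group(1), int(qt.group(1))))
    return slow


for timestamp, qtime in slow_queries(LINES):
    print(timestamp, qtime)
```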


*2020-09-02 16:37:01* INFO  Master's generation: 263116
*2020-09-02 16:37:01* INFO  Master's version: 1599064577322
*2020-09-02 16:37:01* INFO  Slave's generation: 263116
*2020-09-02 16:37:01* INFO  Slave's version: 1599064577322
*2020-09-02 16:37:01* INFO  Slave in sync with master.
2020-09-02 16:37:02 INFO  Master's generation: 104189
2020-09-02 16:37:02 INFO  Master's version: 1599064620532
2020-09-02 16:37:02 INFO  Slave's generation: 104188
2020-09-02 16:37:02 INFO  Slave's version: 1599064560341
2020-09-02 16:37:02 INFO  Starting replication process
2020-09-02 16:37:02 INFO  Number of files in latest index in master: 1010
2020-09-02 16:37:02 INFO  Starting download (fullCopy=false) to NRTCachingDirectory(MMapDirectory@/opt/solr-7.7.2/server/solr/test_shard1_replica_t3/data/index.20200902163702345 lockFactory=org.apache.lucene.store.NativeFSLockFactory@77247ee; maxCacheMB=48.0 maxMergeSizeMB=4.0)
2020-09-02 16:37:02 INFO  Bytes downloaded: 837587, Bytes skipped downloading: 0
2020-09-02 16:37:02 INFO  Total time taken for download (fullCopy=false,bytesDownloaded=837587) : 0 secs (null bytes/sec) to NRTCachingDirectory(MMapDirectory@/opt/solr-7.7.2/server/solr/test_shard1_replica_t3/data/index.20200902163702345 lockFactory=org.apache.lucene.store.NativeFSLockFactory@77247ee; maxCacheMB=48.0 maxMergeSizeMB=4.0)
2020-09-02 16:37:03 INFO  New IndexWriter is ready to be used.
2020-09-02 16:37:03 INFO  Master's generation: 124002
2020-09-02 16:37:03 INFO  Master's version: 1599064617242
2020-09-02 16:37:03 INFO  Slave's generation: 124000
2020-09-02 16:37:03 INFO  Slave's version: 1599064492914
2020-09-02 16:37:03 INFO  Starting replication process
2020-09-02 16:37:04 INFO  [db_shard1_replica_t3]  webapp=/solr path=/update
params={update.distrib=FROMLEADER&distrib.from=
http://178.33.234.1:8983/solr/db_shard1_replica_t25/&wt=javabin&version=2}{add=[11907382419
(1676740784884285440), 11907383701 (1676740784889528320), 11907383253
(1676740784900014080), 11907379290 (1676740785002774528), 11907382623
(1676740785005920256), 11907378461 (1676740785011163136), 11907382429
(1676740785012211712), 11907380739 (1676740785023746048), 11907381184
(1676740785038426112), 11907380614 (1