How to handle List in Solr 6.6

2018-11-06 Thread waseem-farooqui
I am new to Solr and am using Spring-data-solr to store my complete **pdf**
files with their contents. A situation has now come up in which I want to
store file ratings. A file can be rated by a list of users, which means I
would have something like a `List<FileRating>` in my **DataModel**, in which
each `FileRating` would have `user, comments, date, rating`. The response JSON
structure should look like this:

{
  "document": "Fuzzy based semantic search.pdf",
  "md5Hash": "md5",
  "rated": [
    {
      "user": "John",
      "comments": "Not Very useful",
      "rating": 2,
      "date": "20/10/2018"
    },
    {
      "user": "Terrion",
      "comments": "Useful with context to semantic based fuzzy logics.",
      "rating": 6,
      "date": "20/10/2018"
    }
  ]
}
I have no idea how to do this in Solr. I have looked at the `multivalued`
type, but I don't think it would work in my scenario, because at the end of
the day I want to search all documents along with their ratings, and possibly
for files rated by specific users.

`Solr 6.6`



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Java Advanced Imaging (JAI) Image I/O Tools are not installed

2018-11-06 Thread Yasufumi Mizoguchi
Hi,

This looks like a PDFBox issue, I think.
( https://pdfbox.apache.org/2.0/dependencies.html )

Thanks,
Yasufumi


On Tue, Nov 6, 2018 at 16:10, Furkan KAMACI wrote:

> Hi All,
>
> I use Solr 6.5.0 and am testing its OCR capabilities. It OCRs PDF files,
> although it is very slow. However, I see this error when I check the logs:
>
> o.a.p.c.PDFStreamEngine Cannot read JPEG2000 image: Java Advanced Imaging
> (JAI) Image I/O Tools are not installed
>
> Any idea how to fix this?
>
> Kind  Regards,
> Furkan KAMACI
>


Solr suggestions, best practices

2018-11-06 Thread Clemens Wyss DEV
At the moment we are using the spellchecking component for suggestions, which is
suboptimal, to say the least. What are the best practices for suggestions using
Solr?
While googling (with excellent suggestions 😉) I came across
https://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/
and
https://grokbase.com/t/lucene/solr-user/14bayc6jkc/best-practice-autosuggest-autocomplete-vs-real-search

Any other valuable reads/links regarding suggestions?

Thx in advance
- Clemens


AW: AW: AW: 6.6 -> 7.5 SolrJ, seeing many "Connection evictor"-Threads

2018-11-06 Thread Clemens Wyss DEV
Hi Shalin,
> You can expect as many connection evictor threads
I have (for whatever reason (*)) 27 SolrClient instances instantiated, but I see ~95
"Connection Evictor" threads ...

>It turns out that I made a mistake in the patch I committed in...which names 
>threads like pool-123-thread-1282. 
>So if you take a thread dump from Solr 6.6
Also, I cannot prove it, but I do not recall seeing many pool-xxx-thread- names in my
stack traces. In one I have at hand I see
2 "pool-x-thread-y"-threads
27 "ForkJoinPool.commonPool-worker-xx"-threads
So I guess it is/was the ForkJoinPool.commonPool-worker threads, but 27 is not >90

Thx
- Clemens

(*) I will follow Shawn's advice in this thread asap

-Original Message-
From: Shalin Shekhar Mangar
Sent: Tuesday, October 23, 2018 10:30
To: solr-user@lucene.apache.org
Subject: Re: AW: AW: 6.6 -> 7.5 SolrJ, seeing many "Connection evictor"-Threads

You can expect as many connection evictor threads as the number of http client 
instances. This is true for both Solr 6.6 and 7.x.

I was intrigued as to why you were not seeing the same threads in both 
versions. It turns out that I made a mistake in the patch I committed in
SOLR-9290 where instead of using Solr's DefaultSolrThreadFactory which names 
threads with a proper prefix, I used Java's DefaultThreadFactory which names 
threads like pool-123-thread-1282. So if you take a thread dump from Solr 6.6, 
you should be able to find threads named like these which are sleeping at a 
similar place in the stack.
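
One way to compare thread dumps across versions is to tally thread names by pattern, with the numeric ids collapsed. The following is a minimal Python sketch of that idea. The sample names are made up for illustration, and `normalize` and `tally` are hypothetical helpers, not part of Solr or SolrJ.

```python
import re
from collections import Counter


def normalize(thread_name):
    """Collapse runs of digits so names that differ only by id group together."""
    return re.sub(r"\d+", "N", thread_name)


def tally(thread_names):
    """Count threads per normalized name pattern."""
    return Counter(normalize(name) for name in thread_names)


# Hypothetical thread names, as they might appear in a thread dump.
sample = [
    "pool-123-thread-1282",
    "pool-7-thread-1",
    "ForkJoinPool.commonPool-worker-3",
    "ForkJoinPool.commonPool-worker-12",
    "Connection evictor",
]

counts = tally(sample)
for pattern, count in counts.most_common():
    print(count, pattern)
```

Grouping this way makes it easier to see whether the evictor threads are hiding behind generic `pool-N-thread-N` names in one version and behind a descriptive name in the other.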




Re: Retrieve field from docValues

2018-11-06 Thread Yasufumi Mizoguchi
Hi,

> 1. For schema version 1.6, useDocValuesAsStored=true is default, so there
> is no need to explicitly set it in schema.xml?

Yes.

> 2.  With useDocValuesAsStored=true and the following definition, will Solr
> retrieve id from docValues instead of stored field?

No.
AFAIK, if you define both docValues="true" and stored="true" in your schema,
Solr tries to retrieve the stored value.
(Except when using streaming expressions, the /export handler, etc.
See:
https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-EnablingDocValues
)

Thanks,
Yasufumi


On Tue, Nov 6, 2018 at 9:54, Wei wrote:

> Hi,
>
> I have a few questions about using the useDocValuesAsStored option to
> retrieve field from docValues:
>
> 1. For schema version 1.6, useDocValuesAsStored=true is default, so there
> is no need to explicitly set it in schema.xml?
>
> 2.  With useDocValuesAsStored=true and the following definition, will Solr
> retrieve id from docValues instead of stored field? if fl= id, title,
> score,   both id and title are single value field:
>
>docValues="true" required="true"/>
>
>   docValues="true" required="true"/>
>
>   Do I need to have all fields stored="false" docValues="true" to make solr
> retrieve from docValues only? I am using Solr 6.6.
>
> Thanks,
> Wei
>


is SearchComponent the correct way?

2018-11-06 Thread John Thorhauer
We have a need to check the results of a search against a set of security
lists that are maintained in a redis cache.  I need to be able to take each
document that is returned for a search and check the redis cache to see if
the document should be displayed or not.

I am attempting to do this by creating a SearchComponent.  I am able to
iterate thru the results and identify the items I want to remove from the
results but I am not sure how to proceed in removing them.

Is SearchComponent the best way to do this?  If so, any thoughts on how to
proceed?


Thanks,
John Thorhauer


Re: is SearchComponent the correct way?

2018-11-06 Thread Mikhail Khludnev
I believe this should be a post filter; see
https://lucidworks.com/2017/11/27/caching-and-filters-and-post-filters/


On Tue, Nov 6, 2018 at 2:24 PM John Thorhauer 
wrote:

> We have a need to check the results of a search against a set of security
> lists that are maintained in a redis cache.  I need to be able to take each
> document that is returned for a search and check the redis cache to see if
> the document should be displayed or not.
>
> I am attempting to do this by creating a SearchComponent.  I am able to
> iterate thru the results and identify the items I want to remove from the
> results but I am not sure how to proceed in removing them.
>
> Is SearchComponent the best way to do this?  If so, any thoughts on how to
> proceed?
>
>
> Thanks,
> John Thorhauer
>


-- 
Sincerely yours
Mikhail Khludnev


Re: SolrCloud scaling/optimization for high request rate

2018-11-06 Thread Sofiya Strochyk

Hi Toke,

sorry for the late reply. The query I wrote here is edited to hide
production details, but I can post additional info if this helps.


I have tested all of the suggested changes, but none of them seems to make a
noticeable difference (response time and other metrics usually fluctuate
over time, and the changes caused by different parameters are smaller
than the fluctuations). What this probably means is that the heaviest
task is retrieving IDs by query and not fields by ID. I've also checked
the QTime logged for these types of operations, and it is much higher for
"get IDs by query" than for "get fields by IDs list". What could be done
about this?


On 05.11.18 14:43, Toke Eskildsen wrote:

So far no answer from Sofiya. That's fair enough: My suggestions might
have seemed random. Let me try to qualify them a bit.


What we have to work with is the redacted query
q=&fl=&start=0&sort=&fq=&rows=24&version=2.2&wt=json
and an earlier mention that sorting was complex.

My suggestions were to try

1) Only request simple sorting by score

If this improves performance substantially, we could try and see if
sorting could be made more efficient: Reducing complexity, pre-
calculating numbers etc.

2) Reduce rows to 0
3) Increase rows to 100

This measures one aspect of retrieval. If there is a big performance
difference between these two, we can further probe if the problem is
the number or size of fields - perhaps there is a ton of stored text,
perhaps there is a bunch of DocValued fields?

4) Set fl=id only

This is a variant of 2+3 to do a quick check if it is the resolving of
specific field values that is the problem. If using fl=id speeds up
substantially, the next step would be to add fields gradually until
(hopefully) there is a sharp performance decrease.
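
The four probes above can be generated from one baseline so that each run changes exactly one thing. This is a small Python sketch under assumptions: the `q` value and the `/select` path are placeholders (the real query was redacted), and `variant` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Baseline parameters. The q value is a placeholder, since the real query
# was redacted in the thread.
BASE = {"q": "*:*", "start": 0, "rows": 24, "wt": "json"}


def variant(**overrides):
    """Return a query string for the baseline with some parameters replaced."""
    params = dict(BASE)
    params.update(overrides)
    return urlencode(params)


variants = {
    "1) sort by score only": variant(sort="score desc"),
    "2) rows=0": variant(rows=0),
    "3) rows=100": variant(rows=100),
    "4) fl=id only": variant(fl="id"),
}

for label, query_string in variants.items():
    print(label, "->", "/select?" + query_string)
```

Comparing timings per variant against the unchanged baseline keeps the measurement about one factor at a time, which matters when normal fluctuations are large.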

- Toke Eskildsen, Royal Danish Library




--
Sofiia Strochyk
s...@interlogic.com.ua
InterLogic
www.interlogic.com.ua




Re: How to handle List in Solr 6.6

2018-11-06 Thread Tim Underwood
Hi,

It sounds like you are looking for the "Nested Child Documents"[1] and
"Block Join Query Parsers"[2] features in Solr.  The terminology is weird
(block join, child/of, parent/which) but it should do what you want.

Do take note of the warning in the docs:

> One limitation of indexing nested documents is that the whole block of
> parent-children documents must be updated together whenever any changes are
> required. In other words, even if a single child document or the parent
> document is changed, the whole block of parent-child documents must be
> indexed together.


What this note does not mention is that if you delete a parent document, you
must also explicitly delete the child documents; otherwise they end up being
attached to another parent document.  I forget whether this also applies when
you re-index a document, but to be safe I always explicitly delete the
parent and child documents.  There are a number of JIRA tickets floating
around relating to cleaning up the user experience for this.

-Tim

[1]
https://lucene.apache.org/solr/guide/6_6/uploading-data-with-index-handlers.html#UploadingDatawithIndexHandlers-NestedChildDocuments
[2]
https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-BlockJoinQueryParsers
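
As a rough illustration of the nested-document approach, the Python sketch below reshapes the JSON from the question into one parent document with child documents under `_childDocuments_`, the key Solr's JSON update format uses for children. Everything else is an assumption made for the example: the field names with `_s`, `_t`, `_i` and `_dt` suffixes presume matching dynamic fields in the schema, `doc_type_s` is a made-up marker field for telling parents from children, and the id scheme is arbitrary.

```python
import json
from datetime import datetime


def rating_date_to_solr(ddmmyyyy):
    """Convert a dd/mm/yyyy string to the ISO-8601 UTC form used by Solr date fields."""
    return datetime.strptime(ddmmyyyy, "%d/%m/%Y").strftime("%Y-%m-%dT%H:%M:%SZ")


def to_block(file_doc):
    """Turn one file with its ratings into a parent document plus child documents."""
    parent_id = file_doc["md5Hash"]
    children = [
        {
            "id": f"{parent_id}-rating-{i}",  # hypothetical id scheme
            "doc_type_s": "rating",           # hypothetical marker field
            "user_s": rating["user"],
            "comments_t": rating["comments"],
            "rating_i": rating["rating"],
            "date_dt": rating_date_to_solr(rating["date"]),
        }
        for i, rating in enumerate(file_doc["rated"])
    ]
    return {
        "id": parent_id,
        "doc_type_s": "file",
        "document_s": file_doc["document"],
        "_childDocuments_": children,
    }


file_doc = {
    "document": "Fuzzy based semantic search.pdf",
    "md5Hash": "md5",
    "rated": [
        {"user": "John", "comments": "Not Very useful", "rating": 2, "date": "20/10/2018"},
        {"user": "Terrion", "comments": "Useful with context to semantic based fuzzy logics.",
         "rating": 6, "date": "20/10/2018"},
    ],
}

block = to_block(file_doc)
print(json.dumps([block], indent=2))

# Block join query string: parent files that have a child rating by user John.
parents_rated_by_john = '{!parent which="doc_type_s:file"}user_s:John'
print(parents_rated_by_john)
```

The `which` clause in the block join query has to match all parent documents and only parent documents, which is why the sketch gives parents and children a distinguishing field.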

On Tue, Nov 6, 2018 at 12:01 AM waseem-farooqui 
wrote:

> I am new with Solr and using Spring-data-solr to store my complete **pdf**
> files with its contents now there raise a situation in which I want to
> store
> the file rating, that can be rate by list of users means I would have
> object
> something like this in my **DataModel** `List` in which
> `FileRating` would have `user, comments, date, rating` the response json
> structure should be like this
>
> {
>   "document": "Fuzzy based semantic search.pdf",
>   "md5Hash": "md5",
>   "rated": [
> {
>   "user": "John",
>   "comments": "Not Very useful",
>   "rating": 2,
>   "date": "20/10/2018"
> },
> {
>   "user": "Terrion",
>   "comments": "Useful with context to semantic based fuzzy
> logics.",
>   "rating": 6,
>   "date": "20/10/2018"
> }
>   ]
> }
>   and I not getting any idea how is this possible in solr have looked
> `multivalued` type but I don't think it would work in my scenario because
> at
> the end of the day I want to search all documents with its rating and could
> be file rated by specific users.
>
> `Solr 6.6`
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Zimmermann, Thomas
Question about CloudSolrClient and CLUSTERSTATUS. We just deployed a 3 server 
ZK cluster and a 5 node solr cluster using the CloudSolrClient in Solr 7.4.

We're seeing a TON of traffic going to one server with just cluster status 
commands. Every single query seems to be hitting this box for status, but the 
rest of the query load is divided evenly amongst the servers. Is this an 
expected interaction in this client?

For example - 75k requests per minute going to this one box, and 3.5k RPM to all
other nodes in the cloud.

All of those extra requests on the one box are 
"/solr/admin/collections?collection=collectionName&action=CLUSTERSTATUS&wt=javabin&version=2"

Our plan right now is to roll back to the basic HTTP client and pass all
traffic through our load balancer, but we would like to understand whether this
is an expected interaction for the Cloud Client, a misconfiguration on our end,
or a bug.


Re: How to handle List in Solr 6.6

2018-11-06 Thread Shawn Heisey

On 11/6/2018 12:52 AM, waseem-farooqui wrote:

{
  "document": "Fuzzy based semantic search.pdf",
  "md5Hash": "md5",
  "rated": [
    {
      "user": "John",
      "comments": "Not Very useful",
      "rating": 2,
      "date": "20/10/2018"
    },
    {
      "user": "Terrion",
      "comments": "Useful with context to semantic based fuzzy logics.",
      "rating": 6,
      "date": "20/10/2018"
    }
  ]
}


Solr documents have a flat structure.  There is no normalization like 
you have with a relational database.  Think of it like a single database 
table with many columns, instead of several database tables working 
together.  A complex nested structure in a single document is not possible.


Solr has one feature that might be what you need -- parent/child 
documents.  In that scenario, your rating structures would be completely 
separate documents, indexed together with the parent document.  To query 
it and make use of the dependent structure, you would use the blockjoin 
query parser.  I have never used this functionality, so this paragraph 
is all I know about it and I cannot help any further.


Thanks,
Shawn



distributed grouping by date

2018-11-06 Thread Tomáš Hampl
Hi,

I get an error when running a grouping query by date on a collection with 5
shards. When I try the same query on a collection with only one shard,
everything works.

*query:*

/solr/search_cz/select?q=*:*&group=true&group.field=odjezd

*part of schema.xml*



...



*collection create *

/solr/admin/collections\?action\=CREATE\&autoAddReplicas\=true\&collection.configName\=search_cz\&maxShardsPerNode\=5\&name\=search_cz\&numShards\=5\&replicationFactor\=1\&router.field\=routing_key\&router.name\=compositeId\&rule\=shard:\*,replica:\<2,node:\*


*stacktrace:*

org.apache.solr.common.SolrException: Invalid Date String:'Wed Nov 21 12:45:00 UTC 2018'
    at org.apache.solr.util.DateMathParser.parseMath(DateMathParser.java:247)
    at org.apache.solr.util.DateMathParser.parseMath(DateMathParser.java:226)
    at org.apache.solr.schema.DatePointField.toNativeType(DatePointField.java:113)
    at org.apache.solr.schema.DatePointField.readableToIndexed(DatePointField.java:184)
    at org.apache.solr.search.grouping.distributed.command.GroupConverter.fromMutable(GroupConverter.java:57)
    at org.apache.solr.search.grouping.distributed.command.SearchGroupsFieldCommand.result(SearchGroupsFieldCommand.java:128)
    at org.apache.solr.search.grouping.distributed.shardresultserializer.SearchGroupsResultTransformer.transform(SearchGroupsResultTransformer.java:55)
    at org.apache.solr.search.grouping.distributed.shardresultserializer.SearchGroupsResultTransformer.transform(SearchGroupsResultTransformer.java:37)
    at org.apache.solr.search.grouping.CommandHandler.processResult(CommandHandler.java:208)
    at org.apache.solr.handler.component.QueryComponent.doProcessGroupedDistributedSearchFirstPhase(QueryComponent.java:1282)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:360)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:531)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
    at org.eclipse.jetty.util.thread.QueuedThre

Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Erick Erickson
Is the box you're seeing this on the Overseer? Or is it in any other
way "special", like has all the leaders? And I'm assuming all these
are NRT replicas, not TLOG or PULL.

What are you doing when these occur? Queries? Updates? If you're doing
updates, are these coincident with each request? Each commit (which
you shouldn't be doing from the client anyway)? If they're coincident
with updating, are you updating in batches or a single doc at a time?

I can imagine each update or commit gets the status, although even
that seems questionable.

If you can pin down a bit what actions trigger the request that'd help a lot.

Best,
Erick



On Tue, Nov 6, 2018 at 8:06 AM Zimmermann, Thomas
 wrote:
>
> Question about CloudSolrClient and CLUSTERSTATUS. We just deployed a 3 server 
> ZK cluster and a 5 node solr cluster using the CloudSolrClient in Solr 7.4.
>
> We're seeing a TON of traffic going to one server with just cluster status 
> commands. Every single query seems to be hitting this box for status, but the 
> rest of the query load is divided evenly amongst the servers. Is this an 
> expected interaction in this client?
>
> For example - 75k request per minute going to this one box, and 3.5k RPM to 
> all other nodes in the cloud.
>
> All of those extra requests on the one box are 
> "/solr/admin/collections?collection=collectionName&action=CLUSTERSTATUS&wt=javabin&version=2"
>
> Our plan right now is to roll back to the basic HTTP client and pass all 
> traffic through our load balancer, but would like to understand if this is an 
> expected interaction for the Cloud Client, a misconfiguration on our end, or 
> a bug


Re: distributed grouping by date

2018-11-06 Thread Erick Erickson
Looks like: https://issues.apache.org/jira/browse/SOLR-11086
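
For context on the error message itself: the rejected string has the shape of Java's `Date.toString()` output, whereas Solr date fields expect ISO-8601 UTC values such as `2018-11-21T12:45:00Z`. The Python sketch below only illustrates that format mismatch; it is not a fix for the server-side error, and the strict pattern check is a simplification (for instance, it ignores fractional seconds). The helper names are made up for this example.

```python
from datetime import datetime, timezone

# Shape of the rejected value, e.g. 'Wed Nov 21 12:45:00 UTC 2018'.
JAVA_DATE_TOSTRING = "%a %b %d %H:%M:%S UTC %Y"
# Canonical form for Solr date fields, e.g. '2018-11-21T12:45:00Z'.
SOLR_DATE = "%Y-%m-%dT%H:%M:%SZ"


def is_solr_date(text):
    """Return True if text matches the simplified canonical Solr date pattern."""
    try:
        datetime.strptime(text, SOLR_DATE)
        return True
    except ValueError:
        return False


def to_solr_date(text):
    """Re-render a Date.toString()-style UTC string in the canonical Solr form."""
    parsed = datetime.strptime(text, JAVA_DATE_TOSTRING).replace(tzinfo=timezone.utc)
    return parsed.strftime(SOLR_DATE)


rejected = "Wed Nov 21 12:45:00 UTC 2018"
print(is_solr_date(rejected))
print(to_solr_date(rejected))
```
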
On Tue, Nov 6, 2018 at 8:19 AM Tomáš Hampl  wrote:
>
> Hi,
>
> i have error while running grouping query by date in collection with 5
> shards. When i try same query on collection with only one shard everything
> works.
>
> *query:*
>
> /solr/search_cz/select?q=*:*&group=true&group.field=odjezd
>
> *part of schema.xml*
>
>  positionIncrementGap="0"/>
>
> ...
>
> 
>
> *collection create *
>
> /solr/admin/collections\?action\=CREATE\&autoAddReplicas\=true\&collection.configName\=search_cz\&maxShardsPerNode\=5\&name\=search_cz\&numShards\=5\&replicationFactor\=1\&router.field\=routing_key\&
> router.name\=compositeId\&rule\=shard:\*,replica:\<2,node:\*
>
>
> *stacktrace:*
>
> org.apache.solr.common.SolrException: Invalid Date String:'Wed Nov 21 12:45:00 UTC 2018'
>     at org.apache.solr.util.DateMathParser.parseMath(DateMathParser.java:247)
>     at org.apache.solr.util.DateMathParser.parseMath(DateMathParser.java:226)
>     at org.apache.solr.schema.DatePointField.toNativeType(DatePointField.java:113)
>     at org.apache.solr.schema.DatePointField.readableToIndexed(DatePointField.java:184)
>     at org.apache.solr.search.grouping.distributed.command.GroupConverter.fromMutable(GroupConverter.java:57)
>     at org.apache.solr.search.grouping.distributed.command.SearchGroupsFieldCommand.result(SearchGroupsFieldCommand.java:128)
>     at org.apache.solr.search.grouping.distributed.shardresultserializer.SearchGroupsResultTransformer.transform(SearchGroupsResultTransformer.java:55)
>     at org.apache.solr.search.grouping.distributed.shardresultserializer.SearchGroupsResultTransformer.transform(SearchGroupsResultTransformer.java:37)
>     at org.apache.solr.search.grouping.CommandHandler.processResult(CommandHandler.java:208)
>     at org.apache.solr.handler.component.QueryComponent.doProcessGroupedDistributedSearchFirstPhase(QueryComponent.java:1282)
>     at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:360)
>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)
>     at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
>     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>     at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
>     at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
>     at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
>     at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
>     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>     at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>     at org.eclipse.jetty.server.Server.handle(Server.java:531)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
>     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
>     at org.eclipse.jetty.io.AbstractConnection$ReadCallbac

Re: Retrieve field from docValues

2018-11-06 Thread Erick Erickson
2. "it depends". Solr  will try to do the most efficient thing
possible. If _all_ the fields are docValues, it will return the stored
values from the docValues  structure. This prevents a disk seek and
decompress cycle.

However, if even one field is docValues=false Solr will by default
return the stored values. For the multiValued case, you can explicitly
tell Solr to return the docValues field.
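
The rule as stated in this message can be written down as a toy function. This Python sketch only restates the description above; it is not Solr's actual code path, and the schema dictionary and the `body` field are invented for the example.

```python
def value_source(requested_fields, schema):
    """Restate the rule from the message: docValues only if every requested field has them."""
    if all(schema[name]["docValues"] for name in requested_fields):
        return "docValues"
    return "stored"


# Hypothetical schema summary; "body" is an invented field without docValues.
schema = {
    "id": {"stored": True, "docValues": True},
    "title": {"stored": True, "docValues": True},
    "body": {"stored": True, "docValues": False},
}

print(value_source(["id", "title"], schema))
print(value_source(["id", "title", "body"], schema))
```

Under this reading, a request for `fl=id,title` would be served from docValues, while adding a single field without docValues would switch the whole request back to stored values.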

Best,
Erick
On Tue, Nov 6, 2018 at 1:46 AM Yasufumi Mizoguchi
 wrote:
>
> Hi,
>
> > 1. For schema version 1.6, useDocValuesAsStored=true is default, so there
> > is no need to explicitly set it in schema.xml?
>
> Yes.
>
> > 2.  With useDocValuesAsStored=true and the following definition, will Solr
> > retrieve id from docValues instead of stored field?
>
> No.
> AFAIK, if you define both docValues="true" and stored="true" in your
> schema,
> Solr tries to retrieve stored value.
> (Except using streaming expressions or /export handler etc...
> See:
> https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-EnablingDocValues
> )
>
> Thanks,
> Yasufumi
>
>
> On Tue, Nov 6, 2018 at 9:54, Wei wrote:
>
> > Hi,
> >
> > I have a few questions about using the useDocValuesAsStored option to
> > retrieve field from docValues:
> >
> > 1. For schema version 1.6, useDocValuesAsStored=true is default, so there
> > is no need to explicitly set it in schema.xml?
> >
> > 2.  With useDocValuesAsStored=true and the following definition, will Solr
> > retrieve id from docValues instead of stored field? if fl= id, title,
> > score,   both id and title are single value field:
> >
> >> docValues="true" required="true"/>
> >
> >   > docValues="true" required="true"/>
> >
> >   Do I need to have all fields stored="false" docValues="true" to make solr
> > retrieve from docValues only? I am using Solr 6.6.
> >
> > Thanks,
> > Wei
> >


Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Jason Gerlowski
My understanding was that we always tried to use the cached version of
this information until either (a) Solr responds in a way that
indicates our cache is out of date, or (b) the TTL on the cache entry
expires.  Though there might very well be a code path that behaves
differently as Erick suggests above.

A few more questions that might shed light on this for you (or for us):
1. How are you creating your CloudSolrClient?  Can you share the
2. Did you modify the TTL on your cache via CloudSolrClient's
"setCollectionCacheTTl" method?
3. Are all of the CLUSTERSTATUS requests you're seeing for the same
collection, or different collections?  How many collections do you
have on your cluster?
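
To answer the third question, the CLUSTERSTATUS requests can be tallied per collection from their request URIs. The sketch below is a minimal Python example and makes no assumption about the log format: it starts from URIs that have already been extracted. The sample URIs follow the shape quoted in this thread, `otherCollection` is an invented name, and `clusterstatus_by_collection` is a hypothetical helper.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def clusterstatus_by_collection(request_uris):
    """Count CLUSTERSTATUS requests per value of the collection parameter."""
    counts = Counter()
    for uri in request_uris:
        params = parse_qs(urlsplit(uri).query)
        if params.get("action") == ["CLUSTERSTATUS"]:
            collection = params.get("collection", ["(none)"])[0]
            counts[collection] += 1
    return counts


# Sample request URIs; "otherCollection" is an invented name.
sample = [
    "/solr/admin/collections?collection=collectionName&action=CLUSTERSTATUS&wt=javabin&version=2",
    "/solr/admin/collections?collection=otherCollection&action=CLUSTERSTATUS&wt=javabin&version=2",
    "/solr/admin/collections?collection=collectionName&action=CLUSTERSTATUS&wt=javabin&version=2",
    "/solr/collectionName/select?q=*:*&wt=javabin&version=2",
]

counts = clusterstatus_by_collection(sample)
for collection, count in counts.most_common():
    print(collection, count)
```
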

Best,

Jason

On Tue, Nov 6, 2018 at 11:25 AM Erick Erickson  wrote:
>
> Is the box you're seeing this on the Overseer? Or is it in any other
> way "special", like has all the leaders? And I'm assuming all these
> are NRT replicas, not TLOG or PULL.
>
> What are you doing when these occur? Queries? Updates? If you're doing
> updates, are these coincident with each request? Each commit (which
> you shouldn't be doing from the client anyway)? If they're coincident
> with updating, are you updating in batches or a single doc at a time?
>
> I can imagine each update or commit gets the status, although even
> that seems questionable.
>
> If you can pin down a bit what actions trigger the request that'd help a lot.
>
> Best,
> Erick
>
>
>
> On Tue, Nov 6, 2018 at 8:06 AM Zimmermann, Thomas
>  wrote:
> >
> > Question about CloudSolrClient and CLUSTERSTATUS. We just deployed a 3 
> > server ZK cluster and a 5 node solr cluster using the CloudSolrClient in 
> > Solr 7.4.
> >
> > We're seeing a TON of traffic going to one server with just cluster status 
> > commands. Every single query seems to be hitting this box for status, but 
> > the rest of the query load is divided evenly amongst the servers. Is this 
> > an expected interaction in this client?
> >
> > For example - 75k request per minute going to this one box, and 3.5k RPM to 
> > all other nodes in the cloud.
> >
> > All of those extra requests on the one box are 
> > "/solr/admin/collections?collection=collectionName&action=CLUSTERSTATUS&wt=javabin&version=2"
> >
> > Our plan right now is to roll back to the basic HTTP client and pass all 
> > traffic through our load balancer, but would like to understand if this is 
> > an expected interaction for the Cloud Client, a misconfiguration on our 
> > end, or a bug


Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Shawn Heisey

On 11/6/2018 9:06 AM, Zimmermann, Thomas wrote:

For example - 75k request per minute going to this one box, and 3.5k RPM to all 
other nodes in the cloud.

All of those extra requests on the one box are 
"/solr/admin/collections?collection=collectionName&action=CLUSTERSTATUS&wt=javabin&version=2"


That sounds like either a bug or some kind of problem in your setup.  
Over a thousand requests per second will overwhelm a single Solr node, 
even if the info can be satisfied entirely from memory and doesn't 
require complex calculations or large-scale data retrieval like a 
regular query does.


If you manually execute that request, do you get a response, and does it 
return quickly or take a significant amount of time?  If the request 
itself has problems, maybe CloudSolrClient is repeating it frequently 
because it's not getting the info it's after.  Can you share the full 
log entry from solr.log for one of those requests?


I try to keep an eye on things with CloudSolrClient, but I have very 
limited experience with it.  I cannot imagine that the behavior you're 
seeing is normal.  It sounds very wrong to me.


Since I do not know all that much about how CloudSolrClient's background 
threads work, I cannot say for sure whether it's a bug or a problem with 
your setup.  Can you try upgrading the Solr jars in your client app to 
7.5.0 and see if that makes any difference?  What version of Solr are 
you running on the server side?



Our plan right now is to roll back to the basic HTTP client and pass all 
traffic through our load balancer, but would like to understand if this is an 
expected interaction for the Cloud Client, a misconfiguration on our end, or a 
bug


At least you have that as an option!  Some people might not be able to 
do that.


Thanks,
Shawn



Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Zimmermann, Thomas
Erick -

This box did have all the leaders for the dozen or so collections we have
when the cloud spun up. We were able to force the leaders for other cores
onto other nodes using the apis, but did not see this traffic load migrate
to the new hosts when leadership changed. All nodes are NRT. The requests
are 99% queries to load content on the web front ends, a few intermittent
updates with comments, new content creation, etc.

Jason - 

1. We are instantiating the cloud client with our VIP Load Balancer url.
We ran into a memory leak issue when passing in ZK server addresses that
forced this path.
2. No we did not tweak any cache TTLs
3. This codebase interacts with three collections in our cloud, and we are
seeing CLUSTERSTATUS checks for all 3.

Shawn -

Server performance is fine and request times are great. We are tolerating
the level of traffic, but the server that is taking all the hits is
obviously performing a bit slower than the others. Response times are
under 5 ms avg for queries on all servers, which is within our perf
thresholds.

We are running 7.4 on the client and server side, moving to 7.5 was
troublesome for us so we are holding off for the time being.

Thanks,
TZ



On 11/6/18, 11:39 AM, "Shawn Heisey"  wrote:

>On 11/6/2018 9:06 AM, Zimmermann, Thomas wrote:
>> For example - 75k request per minute going to this one box, and 3.5k
>>RPM to all other nodes in the cloud.
>>
>> All of those extra requests on the one box are
>>"/solr/admin/collections?collection=collectionName&action=CLUSTERSTATUS&w
>>t=javabin&version=2"
>
>That sounds like either a bug or some kind of problem in your setup.
>Over a thousand requests per second will overwhelm a single Solr node,
>even if the info can be satisfied entirely from memory and doesn't
>require complex calculations or large-scale data retrieval like a
>regular query does.
>
>If you manually execute that request, do you get a response, and does it
>return quickly or take a significant amount of time?  If the request
>itself has problems, maybe CloudSolrClient is repeating it frequently
>because it's not getting the info it's after.  Can you share the full
>log entry from solr.log for one of those requests?
>
>I try to keep an eye on things with CloudSolrClient, but I have very
>limited experience with it.  I cannot imagine that the behavior you're
>seeing is normal.  It sounds very wrong to me.
>
>Since I do not know all that much about how CloudSolrClient's background
>threads work, I cannot say for sure whether it's a bug or a problem with
>your setup.  Can you try upgrading the Solr jars in your client app to
>7.5.0 and see if that makes any difference?  What version of Solr are
>you running on the server side?
>
>> Our plan right now is to roll back to the basic HTTP client and pass
>>all traffic through our load balancer, but would like to understand if
>>this is an expected interaction for the Cloud Client, a misconfiguration
>>on our end, or a bug
>
>At least you have that as an option!  Some people might not be able to
>do that.
>
>Thanks,
>Shawn
>



Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Zimmermann, Thomas
I should mention I'm also hanging out in the Solr IRC channel today under
the nick "apatheticnow" if anyone wants to follow up in real time during
business hours EST.

On 11/6/18, 11:39 AM, "Shawn Heisey"  wrote:

>On 11/6/2018 9:06 AM, Zimmermann, Thomas wrote:
>> For example - 75k request per minute going to this one box, and 3.5k
>>RPM to all other nodes in the cloud.
>>
>> All of those extra requests on the one box are
>>"/solr/admin/collections?collection=collectionName&action=CLUSTERSTATUS&w
>>t=javabin&version=2"
>
>That sounds like either a bug or some kind of problem in your setup.
>Over a thousand requests per second will overwhelm a single Solr node,
>even if the info can be satisfied entirely from memory and doesn't
>require complex calculations or large-scale data retrieval like a
>regular query does.
>
>If you manually execute that request, do you get a response, and does it
>return quickly or take a significant amount of time?  If the request
>itself has problems, maybe CloudSolrClient is repeating it frequently
>because it's not getting the info it's after.  Can you share the full
>log entry from solr.log for one of those requests?
>
>I try to keep an eye on things with CloudSolrClient, but I have very
>limited experience with it.  I cannot imagine that the behavior you're
>seeing is normal.  It sounds very wrong to me.
>
>Since I do not know all that much about how CloudSolrClient's background
>threads work, I cannot say for sure whether it's a bug or a problem with
>your setup.  Can you try upgrading the Solr jars in your client app to
>7.5.0 and see if that makes any difference?  What version of Solr are
>you running on the server side?
>
>> Our plan right now is to roll back to the basic HTTP client and pass
>>all traffic through our load balancer, but would like to understand if
>>this is an expected interaction for the Cloud Client, a misconfiguration
>>on our end, or a bug
>
>At least you have that as an option!  Some people might not be able to
>do that.
>
>Thanks,
>Shawn
>



Re: is SearchComponent the correct way?

2018-11-06 Thread John Thorhauer
Mikhail,

Thanks for the suggestion.  After looking over the PostFilter interface and
the DelegatingCollector, it appears that this would require me to query my
outside datastore (redis) for security information once for each document.
This would be a big performance issue.  I would like to be able to iterate
through the documents, gathering all the critical ID's and then send a
single query to redis, getting back my security related data, and then
iterate through the documents, pulling out the ones that the user should
not see.

Is this possible?

Thanks again for your help!
John


On Tue, Nov 6, 2018 at 6:24 AM John Thorhauer 
wrote:

> We have a need to check the results of a search against a set of security
> lists that are maintained in a redis cache.  I need to be able to take each
> document that is returned for a search and check the redis cache to see if
> the document should be displayed or not.
>
> I am attempting to do this by creating a SearchComponent.  I am able to
> iterate thru the results and identify the items I want to remove from the
> results but I am not sure how to proceed in removing them.
>
> Is SearchComponent the best way to do this?  If so, any thoughts on how to
> proceed?
>
>
> Thanks,
> John Thorhauer
>
>

-- 
John Thorhauer
Vice President, Software Development
Yakabod, Inc.
Cell: 240-818-9050
Office: 301-662-4554 x2105


Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Shawn Heisey

On 11/6/2018 10:12 AM, Zimmermann, Thomas wrote:

> Shawn -
>
> Server performance is fine and request time are great. We are tolerating
> the level of traffic, but the server that is taking all the hits is
> obviously performing a bit slower than the others. Response times are
> under 5MS avg for queries on all servers, which is within our perf
> thresholds.


I was asking specifically about the clusterstatus requests -- whether 
the response looks complete if you manually execute the same request and 
whether it returns quickly.  And I'd like to see the solr.log where 
these are happening.


Knowing that requests in general are performing well is good info, 
although I have no idea how that is possible on the node that is getting 
over a thousand clusterstatus requests per second.  I would expect that 
node to be essentially dead under that much load.  Since it's apparently 
handling it fine ... that's really impressive.



> We are running 7.4 on the client and server side, moving to 7.5 was
> troublesome for us so we are holding off for the time being.


I was hoping you could just upgrade the SolrJ client, which would 
involve either replacing the solrj jar or bumping the version number in 
the config for a dependency manager (things like ivy, maven, gradle, 
etc).  A 7.5 client should be pretty safe against 7.4 servers.  The 
client would be newer than the server and very close to the same 
version, which is the general recommendation for CloudSolrClient when 
the two versions cannot be identical for some reason.


Are you absolutely sure that those requests are coming from the program 
with CloudSolrClient?  To find out, you'll need to enable the request 
log in jetty.xml (it just needs to be un-commented) and restart the 
server.  The source address is not logged in solr.log.  It's very 
important to be absolutely sure where the requests are coming from.  If 
you're running the client code on the same machine as one of your Solr 
servers, it will be difficult to be sure about the source, so I would 
definitely suggest running the client code on a completely different 
machine, so the source addresses in the request log are useful.


Thanks,
Shawn
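
Once the request log is enabled, the source addresses can be tallied with a
few lines of script. The sketch below assumes the log is in the NCSA-style
format where the client address is the first field on each line; the sample
lines and addresses are invented for illustration and are not taken from any
real log.

```python
import re
from collections import Counter

# client address, two ignored fields, [timestamp], then "METHOD path ..."
LOG_LINE = re.compile(
    r'^(?P<addr>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)'
)


def clusterstatus_sources(lines):
    """Count CLUSTERSTATUS requests per client address."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "action=CLUSTERSTATUS" in match.group("path"):
            counts[match.group("addr")] += 1
    return counts


# Invented sample lines in the assumed format.
SAMPLE = [
    '10.0.0.11 - - [06/Nov/2018:17:35:01 +0000] "GET /solr/admin/collections'
    '?collection=collectionName&action=CLUSTERSTATUS&wt=javabin&version=2'
    ' HTTP/1.1" 200 5120',
    '10.0.0.12 - - [06/Nov/2018:17:35:01 +0000] "GET /solr/collectionName/'
    'select?q=*:* HTTP/1.1" 200 830',
    '10.0.0.11 - - [06/Nov/2018:17:35:02 +0000] "GET /solr/admin/collections'
    '?collection=collectionName&action=CLUSTERSTATUS&wt=javabin&version=2'
    ' HTTP/1.1" 200 5120',
]

if __name__ == "__main__":
    for addr, hits in clusterstatus_sources(SAMPLE).most_common():
        print(addr, hits)
```

If the real log uses a different layout, only the regular expression needs
to change.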



Re: is SearchComponent the correct way?

2018-11-06 Thread Mikhail Khludnev
Not really. It expects to work segment by segment, so it can buffer all docs
from one segment, hit redis, and push all results into the delegating collector.

On Tue, Nov 6, 2018 at 8:29 PM John Thorhauer 
wrote:

> Mikhail,
>
> Thanks for the suggestion.  After looking over the PostFilter interface and
> the DelegatingCollector, it appears that this would require me to query my
> outside datastore (redis) for security information once for each document.
> This would be a big performance issue.  I would like to be able to iterate
> through the documents, gathering all the critical ID's and then send a
> single query to redis, getting back my security related data, and then
> iterate through the documents, pulling out the ones that the user should
> not see.
>
> Is this possible?
>
> Thanks again for your help!
> John
>
>
> On Tue, Nov 6, 2018 at 6:24 AM John Thorhauer 
> wrote:
>
> > We have a need to check the results of a search against a set of security
> > lists that are maintained in a redis cache.  I need to be able to take
> each
> > document that is returned for a search and check the redis cache to see
> if
> > the document should be displayed or not.
> >
> > I am attempting to do this by creating a SearchComponent.  I am able to
> > iterate thru the results and identify the items I want to remove from the
> > results but I am not sure how to proceed in removing them.
> >
> > Is SearchComponent the best way to do this?  If so, any thoughts on how
> to
> > proceed?
> >
> >
> > Thanks,
> > John Thorhauer
> >
> >
>
> --
> John Thorhauer
> Vice President, Software Development
> Yakabod, Inc.
> Cell: 240-818-9050
> Office: 301-662-4554 x2105
>


-- 
Sincerely yours
Mikhail Khludnev


Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Tomáš Hampl
This error occurs on every request, whether from the Solr client, from a URL
opened in the Chrome browser, or from curl on the console.
I currently have no replicas for this test, but the replica type is NRT.
There are no writes or other reads on this server (Solr Cloud); it is completely
isolated (version 7.5, single Docker container).
I have 6 collections on the server.

--
Tomáš Hampl
Mob.: +420774850702
E-Mail: to...@hampl.biz


út 6. 11. 2018 v 18:35 odesílatel Shawn Heisey  napsal:

> On 11/6/2018 10:12 AM, Zimmermann, Thomas wrote:
> > Shawn -
> >
> > Server performance is fine and request time are great. We are tolerating
> > the level of traffic, but the server that is taking all the hits is
> > obviously performing a bit slower than the others. Response times are
> > under 5MS avg for queries on all servers, which is within our perf
> > thresholds.
>
> I was asking specifically about the clusterstatus requests -- whether
> the response looks complete if you manually execute the same request and
> whether it returns quickly.  And I'd like to see the solr.log where
> these are happening.
>
> Knowing that requests in general are performing well is good info,
> although I have no idea how that is possible on the node that is getting
> over a thousand clusterstatus requests per second.  I would expect that
> node to be essentially dead under that much load.  Since it's apparently
> handling it fine ... that's really impressive.
>
> > We are running 7.4 on the client and server side, moving to 7.5 was
> > troublesome for us so we are holding off for the time being.
>
> I was hoping you could just upgrade the SolrJ client, which would
> involve either replacing the solrj jar or bumping the version number in
> the config for a dependency manager (things like ivy, maven, gradle,
> etc).  A 7.5 client should be pretty safe against 7.4 servers.  The
> client would be newer than the server and very close to the same
> version, which is the general recommendation for CloudSolrClient when
> the two versions cannot be identical for some reason.
>
> Are you absolutely sure that those requests are coming from the program
> with CloudSolrClient?  To find out, you'll need to enable the request
> log in jetty.xml (it just needs to be un-commented) and restart the
> server.  The source address is not logged in solr.log.  It's very
> important to be absolutely sure where the requests are coming from.  If
> you're running the client code on the same machine as one of your Solr
> servers, it will be difficult to be sure about the source, so I would
> definitely suggest running the client code on a completely different
> machine, so the source addresses in the request log are useful.
>
> Thanks,
> Shawn
>
>


Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Zimmermann, Thomas
Hi Shawn,

We're equally impressed by how well the server is handling it. We're using
Sematext for monitoring and the load on the box has been steady under 1
and not entering a swap state memory wise.

We are 100% certain the traffic is coming from the 3 web hosts running
this code. We have put some custom logging in place that logs all requests
to an access style log and stores that data in kibana/logstash. In
logstash we are able to confirm that all these requests (~40million in the
last 12 hours) are coming from our web front ends directly to a single box
in the cluster.

Our client code is on separate servers from our Solr servers, and zk has
its own boxes as well.

Here's a scrubbed pastebin of our cluster status response from the machine
that is getting all the traffic. I pulled this via browser on my local
machine.
https://pastebin.com/42haKVME

We can attempt to update the SolrJ dependency on our lower env and see if
that fixes the problem if you think that is a good course of action, but we
are also in the midst of switching over to HTTP Client to resolve the
production issues we are seeing ASAP, so I can't promise a timeline. If
you think there's a chance that will fix this, we could of course give it
a quick go.


-TZ
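
For scale, the figures quoted in this thread can be converted to per-second
rates. The numbers come from the messages themselves (75k requests per
minute, and roughly 40 million requests in 12 hours); the helper name is
just for illustration.

```python
def per_second(requests, seconds):
    """Average request rate over the given window."""
    return requests / seconds


# 75k requests per minute, as reported for the busy node.
peak = per_second(75_000, 60)

# ~40 million requests over the last 12 hours, as seen in logstash.
sustained = per_second(40_000_000, 12 * 3600)

if __name__ == "__main__":
    print(f"{peak:.0f} req/s peak, {sustained:.0f} req/s sustained")
```

Both figures sit near or above a thousand requests per second, which
matches the rate discussed earlier in the thread.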



On 11/6/18, 12:35 PM, "Shawn Heisey"  wrote:

>On 11/6/2018 10:12 AM, Zimmermann, Thomas wrote:
>> Shawn -
>>
>> Server performance is fine and request time are great. We are tolerating
>> the level of traffic, but the server that is taking all the hits is
>> obviously performing a bit slower than the others. Response times are
>> under 5MS avg for queries on all servers, which is within our perf
>> thresholds.
>
>I was asking specifically about the clusterstatus requests -- whether
>the response looks complete if you manually execute the same request and
>whether it returns quickly.  And I'd like to see the solr.log where
>these are happening.
>
>Knowing that requests in general are performing well is good info,
>although I have no idea how that is possible on the node that is getting
>over a thousand clusterstatus requests per second.  I would expect that
>node to be essentially dead under that much load.  Since it's apparently
>handling it fine ... that's really impressive.
>
>> We are running 7.4 on the client and server side, moving to 7.5 was
>> troublesome for us so we are holding off for the time being.
>
>I was hoping you could just upgrade the SolrJ client, which would
>involve either replacing the solrj jar or bumping the version number in
>the config for a dependency manager (things like ivy, maven, gradle,
>etc).  A 7.5 client should be pretty safe against 7.4 servers.  The
>client would be newer than the server and very close to the same
>version, which is the general recommendation for CloudSolrClient when
>the two versions cannot be identical for some reason.
>
>Are you absolutely sure that those requests are coming from the program
>with CloudSolrClient?  To find out, you'll need to enable the request
>log in jetty.xml (it just needs to be un-commented) and restart the
>server.  The source address is not logged in solr.log.  It's very
>important to be absolutely sure where the requests are coming from.  If
>you're running the client code on the same machine as one of your Solr
>servers, it will be difficult to be sure about the source, so I would
>definitely suggest running the client code on a completely different
>machine, so the source addresses in the request log are useful.
>
>Thanks,
>Shawn
>



Negative CDCR Queue Size?

2018-11-06 Thread Webster Homer
Several times I have noticed that the CDCR action=QUEUES will return a negative 
queueSize. When this happens we seem to be missing data in the target 
collection. How can this happen? What does a negative Queue size mean? The 
timestamp is an empty string.

We have two targets for a source. One looks like this, with a negative queue 
size
queues": 
["uc1f-ecom-mzk01.sial.com:2181,uc1f-ecom-mzk02.sial.com:2181,uc1f-ecom-mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize",-1,"lastTimestamp",""]],

The other is healthy
"ae1b-ecom-mzk01.sial.com:2181,ae1b-ecom-mzk02.sial.com:2181,ae1b-ecom-mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize",246980,"lastTimestamp","2018-11-06T16:21:53.265Z"]]

We are not seeing CDCR errors.

What could cause this behavior?
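
For monitoring, the QUEUES output can be scanned for negative sizes with a
small script. The sketch below assumes the "queues" value is the flat,
alternating name/value list shown above (ZooKeeper address, then a list of
collection name and stats); the helper names are hypothetical, and the
sample data is copied from the two entries in this message.

```python
def pairs(flat):
    """Turn [k1, v1, k2, v2, ...] into (k1, v1), (k2, v2), ..."""
    return zip(flat[0::2], flat[1::2])


def negative_queues(queues):
    """Return (zk_host, collection, queueSize) for every negative queue."""
    problems = []
    for zk_host, collections in pairs(queues):
        for collection, stats in pairs(collections):
            info = dict(pairs(stats))
            if info.get("queueSize", 0) < 0:
                problems.append((zk_host, collection, info["queueSize"]))
    return problems


UC1F = ("uc1f-ecom-mzk01.sial.com:2181,uc1f-ecom-mzk02.sial.com:2181,"
        "uc1f-ecom-mzk03.sial.com:2181/solr")
AE1B = ("ae1b-ecom-mzk01.sial.com:2181,ae1b-ecom-mzk02.sial.com:2181,"
        "ae1b-ecom-mzk03.sial.com:2181/solr")

SAMPLE_QUEUES = [
    UC1F,
    ["ucb-catalog-material-180317",
     ["queueSize", -1, "lastTimestamp", ""]],
    AE1B,
    ["ucb-catalog-material-180317",
     ["queueSize", 246980, "lastTimestamp", "2018-11-06T16:21:53.265Z"]],
]

if __name__ == "__main__":
    for zk_host, collection, size in negative_queues(SAMPLE_QUEUES):
        print(f"{collection} -> {zk_host}: queueSize={size}")
```

This only detects the condition; it says nothing about why the size went
negative.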


Re: Negative CDCR Queue Size?

2018-11-06 Thread Erick Erickson
What version of Solr? CDCR has changed quite a bit in the 7x  code
line so it's important to know the version.

On Tue, Nov 6, 2018 at 10:32 AM Webster Homer
 wrote:
>
> Several times I have noticed that the CDCR action=QUEUES will return a 
> negative queueSize. When this happens we seem to be missing data in the 
> target collection. How can this happen? What does a negative Queue size mean? 
> The timestamp is an empty string.
>
> We have two targets for a source. One looks like this, with a negative queue 
> size
> queues": 
> ["uc1f-ecom-mzk01.sial.com:2181,uc1f-ecom-mzk02.sial.com:2181,uc1f-ecom-mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize",-1,"lastTimestamp",""]],
>
> The other is healthy
> "ae1b-ecom-mzk01.sial.com:2181,ae1b-ecom-mzk02.sial.com:2181,ae1b-ecom-mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize",246980,"lastTimestamp","2018-11-06T16:21:53.265Z"]]
>
> We are not seeing CDCR errors.
>
> What could cause this behavior?


Re: Retrieve field from docValues

2018-11-06 Thread Wei
Thanks Yasufumi and Erick.

---. 2. "it depends". Solr  will try to do the most efficient thing
possible. If _all_ the fields are docValues, it will return the stored
values from the docValues  structure.

I found this JIRA: https://issues.apache.org/jira/browse/SOLR-8344
Does this mean "Solr will try to do the most efficient thing possible" only
works for 7.x?  Is the behavior available for 6.6?

-- This prevents a disk seek and  decompress cycle.

Does this still hold if the whole index is loaded into memory?  Also, for the
performance benefit, does the uniqueKey field always need to be docValues,
since it is used in the first phase of distributed search?

Thanks,
Wei



On Tue, Nov 6, 2018 at 8:30 AM Erick Erickson 
wrote:

> 2. "it depends". Solr  will try to do the most efficient thing
> possible. If _all_ the fields are docValues, it will return the stored
> values from the docValues  structure. This prevents a disk seek and
> decompress cycle.
>
> However, if even one field is docValues=false Solr will by default
> return the stored values. For the multiValued case, you can explicitly
> tell Solr to return the docValues field.
>
> Best,
> Erick
> On Tue, Nov 6, 2018 at 1:46 AM Yasufumi Mizoguchi
>  wrote:
> >
> > Hi,
> >
> > > 1. For schema version 1.6, useDocValuesAsStored=true is default, so
> there
> > > is no need to explicitly set it in schema.xml?
> >
> > Yes.
> >
> > > 2.  With useDocValuesAsStored=true and the following definition, will
> Solr
> > > retrieve id from docValues instead of stored field?
> >
> > No.
> > AFAIK, if you define both docValues="true" and stored="true" in your
> > schema,
> > Solr tries to retrieve stored value.
> > (Except using streaming expressions or /export handler etc...
> > See:
> >
> https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-EnablingDocValues
> > )
> >
> > Thanks,
> > Yasufumi
> >
> >
> > 2018年11月6日(火) 9:54 Wei :
> >
> > > Hi,
> > >
> > > I have a few questions about using the useDocValuesAsStored option to
> > > retrieve field from docValues:
> > >
> > > 1. For schema version 1.6, useDocValuesAsStored=true is default, so
> there
> > > is no need to explicitly set it in schema.xml?
> > >
> > > 2.  With useDocValuesAsStored=true and the following definition, will
> Solr
> > > retrieve id from docValues instead of stored field? if fl= id, title,
> > > score,   both id and title are single value field:
> > >
> > >> > docValues="true" required="true"/>
> > >
> > >   > > docValues="true" required="true"/>
> > >
> > >   Do I need to have all fields stored="false" docValues="true" to make
> solr
> > > retrieve from docValues only? I am using Solr 6.6.
> > >
> > > Thanks,
> > > Wei
> > >
>


Re: SolrCloud Replication Failure

2018-11-06 Thread Kevin Risden
Erick Erickson - I don't have much time to chase this down. Do you think
this is a blocker for 7.6? It seems pretty serious.

Jeremy - This would be a good JIRA to create - we can move the conversation
there to try to get the right people involved.

Kevin Risden


On Fri, Nov 2, 2018 at 7:57 AM Jeremy Smith  wrote:

> Hi Susheel,
>
>  Yes, it appears that under certain conditions, if a follower is down
> when the leader gets an update, the follower will not receive that update
> when it comes back (or maybe it receives the update and it's then
> overwritten by its own transaction logs, I'm not sure).  Furthermore, if
> that follower then becomes the leader, it will replicate its own out of
> date value back to the former leader, even though the version number is
> lower.
>
>
>-Jeremy
>
> 
> From: Susheel Kumar 
> Sent: Thursday, November 1, 2018 2:57:00 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SolrCloud Replication Failure
>
> Are we saying it has to do something with stop and restarting replica's
> otherwise I haven't seen/heard any issues with document updates and
> forwarding to replica's...
>
> Thanks,
> Susheel
>
> On Thu, Nov 1, 2018 at 12:58 PM Erick Erickson 
> wrote:
>
> > So  this seems like it absolutely needs a JIRA
> > On Thu, Nov 1, 2018 at 9:39 AM
> Kevin Risden
>  wrote:
> > >
> > > I pushed 3 branches that modifies test.sh to test 5.5, 6.6, and 7.5
> > locally
> > > without docker. I still see the same behavior where the latest updates
> > > aren't on the replicas. I still don't know what is happening but it
> > happens
> > > without Docker :(
> > >
> > >
> >
> https://github.com/risdenk/test-solr-start-stop-replica-consistency/branches
> > >
> > > Kevin Risden
> > >
> > >
> > > On Thu, Nov 1, 2018 at 11:41 AM Kevin Risden 
> wrote:
> > >
> > > > Erick - Yea thats a fair point. Would be interesting to see if this
> > fails
> > > > without Docker.
> > > >
> > > > Kevin Risden
> > > >
> > > >
> > > > On Thu, Nov 1, 2018 at 11:06 AM Erick Erickson <
> > erickerick...@gmail.com>
> > > > wrote:
> > > >
> > > >> Kevin:
> > > >>
> > > >> You're also using Docker, right? Docker is not "officially"
> supported
> > > >> although there's some movement in that direction and if this is only
> > > >> reproducible in Docker than it's a clue where to look
> > > >>
> > > >> Erick
> > > >> On Wed, Oct 31, 2018 at 7:24 PM
> > > >> Kevin Risden
> > > >>  wrote:
> > > >> >
> > > >> > I haven't dug into why this is happening but it definitely
> > reproduces. I
> > > >> > removed the local requirements (port mapping and such) from the
> > gist you
> > > >> > posted (very helpful). I confirmed this fails locally and on
> Travis
> > CI.
> > > >> >
> > > >> >
> https://github.com/risdenk/test-solr-start-stop-replica-consistency
> > > >> >
> > > >> > I don't even see the first update getting applied from num 10 ->
> 20.
> > > >> After
> > > >> > the first update there is no more change.
> > > >> >
> > > >> > Kevin Risden
> > > >> >
> > > >> >
> > > >> > On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith  >
> > > >> wrote:
> > > >> >
> > > >> > > Thanks Erick, this is 7.5.0.
> > > >> > > 
> > > >> > > From: Erick Erickson 
> > > >> > > Sent: Wednesday, October 31, 2018 8:20:18 PM
> > > >> > > To: solr-user
> > > >> > > Subject: Re: SolrCloud Replication Failure
> > > >> > >
> > > >> > > What version of solr? This code was pretty much rewriten in 7.3
> > IIRC
> > > >> > >
> > > >> > > On Wed, Oct 31, 2018, 10:47 Jeremy Smith  > wrote:
> > > >> > >
> > > >> > > > Hi all,
> > > >> > > >
> > > >> > > >  We are currently running a moderately large instance of
> > > >> standalone
> > > >> > > > solr and are preparing to switch to solr cloud to help us
> scale
> > > >> up.  I
> > > >> > > have
> > > >> > > > been running a number of tests using docker locally and ran
> > into an
> > > >> issue
> > > >> > > > where replication is consistently failing.  I have pared down
> > the
> > > >> test
> > > >> > > case
> > > >> > > > as minimally as I could.  Here's a link for the
> > docker-compose.yml
> > > >> (I put
> > > >> > > > it in a directory called solrcloud_simple) and a script to run
> > the
> > > >> test:
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489
> > > >> > > >
> > > >> > > >
> > > >> > > > Here's the basic idea behind the test:
> > > >> > > >
> > > >> > > >
> > > >> > > > 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard,
> > and 2
> > > >> > > > replicas (each node gets a replica).  Just use the default
> > schema,
> > > >> > > although
> > > >> > > > I've also tried our schema and got the same result.
> > > >> > > >
> > > >> > > >
> > > >> > > > 2) Shut down solr-2
> > > >> > > >
> > > >> > > >
> > > >> > > > 3) Add 100 simple docs, just id and a field called num.
> > > >> > > >
> > > >> > > >
> > > >> > > > 4) Start solr-2 and check that it received t

RE: Negative CDCR Queue Size?

2018-11-06 Thread Webster Homer
I'm sorry, I should have included that. We are running Solr 7.2. We use CDCR for 
almost all of our collections. We have experienced several intermittent 
problems with CDCR; this one seems to be new, or at least I hadn't seen it before.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, November 06, 2018 12:36 PM
To: solr-user 
Subject: Re: Negative CDCR Queue Size?

What version of Solr? CDCR has changed quite a bit in the 7x  code line so it's 
important to know the version.

On Tue, Nov 6, 2018 at 10:32 AM Webster Homer 
 wrote:
>
> Several times I have noticed that the CDCR action=QUEUES will return a 
> negative queueSize. When this happens we seem to be missing data in the 
> target collection. How can this happen? What does a negative Queue size mean? 
> The timestamp is an empty string.
>
> We have two targets for a source. One looks like this, with a negative 
> queue size
> queues": 
> ["uc1f-ecom-mzk01.sial.com:2181,uc1f-ecom-mzk02.sial.com:2181,uc1f-eco
> m-mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize
> ",-1,"lastTimestamp",""]],
>
> The other is healthy
> "ae1b-ecom-mzk01.sial.com:2181,ae1b-ecom-mzk02.sial.com:2181,ae1b-ecom
> -mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize"
> ,246980,"lastTimestamp","2018-11-06T16:21:53.265Z"]]
>
> We are not seeing CDCR errors.
>
> What could cause this behavior?


Re: SolrCloud Replication Failure

2018-11-06 Thread Erick Erickson
Kevin:

Well, let's certainly raise it as a JIRA; whether it's a blocker or not, I'm not sure.
I _think_ the new LIR work done in Solr 7.3 might make it possible to
detect this condition but I'm not totally sure what to do about it.

So let's say the leader gets an update while a follower is down. (one
leader and one follower for simplicity). Now say the leader dies and
the follower is restarted. What should happen? Should Solr refuse to
start? Would FORCELEADER work if the user was willing to lose data?

Let's move the discussion to the JIRA though.
On Tue, Nov 6, 2018 at 10:58 AM Kevin Risden  wrote:
>
> Erick Erickson - I don't have much time to chase this down. Do you think
> this a blocker for 7.6? It seems pretty serious.
>
> Jeremy - This would be a good JIRA to create - we can move the conversation
> there to try to get the right people involved.
>
> Kevin Risden
>
>
> On Fri, Nov 2, 2018 at 7:57 AM Jeremy Smith  wrote:
>
> > Hi Susheel,
> >
> >  Yes, it appears that under certain conditions, if a follower is down
> > when the leader gets an update, the follower will not receive that update
> > when it comes back (or maybe it receives the update and it's then
> > overwritten by its own transaction logs, I'm not sure).  Furthermore, if
> > that follower then becomes the leader, it will replicate its own out of
> > date value back to the former leader, even though the version number is
> > lower.
> >
> >
> >-Jeremy
> >
> > 
> > From: Susheel Kumar 
> > Sent: Thursday, November 1, 2018 2:57:00 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: SolrCloud Replication Failure
> >
> > Are we saying it has to do something with stop and restarting replica's
> > otherwise I haven't seen/heard any issues with document updates and
> > forwarding to replica's...
> >
> > Thanks,
> > Susheel
> >
> > On Thu, Nov 1, 2018 at 12:58 PM Erick Erickson 
> > wrote:
> >
> > > So  this seems like it absolutely needs a JIRA
> > > On Thu, Nov 1, 2018 at 9:39 AM
> > Kevin Risden
> >  wrote:
> > > >
> > > > I pushed 3 branches that modifies test.sh to test 5.5, 6.6, and 7.5
> > > locally
> > > > without docker. I still see the same behavior where the latest updates
> > > > aren't on the replicas. I still don't know what is happening but it
> > > happens
> > > > without Docker :(
> > > >
> > > >
> > >
> > https://github.com/risdenk/test-solr-start-stop-replica-consistency/branches
> > > >
> > > > Kevin Risden
> > > >
> > > >
> > > > On Thu, Nov 1, 2018 at 11:41 AM Kevin Risden 
> > wrote:
> > > >
> > > > > Erick - Yea thats a fair point. Would be interesting to see if this
> > > fails
> > > > > without Docker.
> > > > >
> > > > > Kevin Risden
> > > > >
> > > > >
> > > > > On Thu, Nov 1, 2018 at 11:06 AM Erick Erickson <
> > > erickerick...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Kevin:
> > > > >>
> > > > >> You're also using Docker, right? Docker is not "officially"
> > supported
> > > > >> although there's some movement in that direction and if this is only
> > > > >> reproducible in Docker than it's a clue where to look
> > > > >>
> > > > >> Erick
> > > > >> On Wed, Oct 31, 2018 at 7:24 PM
> > > > >> Kevin Risden
> > > > >>  wrote:
> > > > >> >
> > > > >> > I haven't dug into why this is happening but it definitely
> > > reproduces. I
> > > > >> > removed the local requirements (port mapping and such) from the
> > > gist you
> > > > >> > posted (very helpful). I confirmed this fails locally and on
> > Travis
> > > CI.
> > > > >> >
> > > > >> >
> > https://github.com/risdenk/test-solr-start-stop-replica-consistency
> > > > >> >
> > > > >> > I don't even see the first update getting applied from num 10 ->
> > 20.
> > > > >> After
> > > > >> > the first update there is no more change.
> > > > >> >
> > > > >> > Kevin Risden
> > > > >> >
> > > > >> >
> > > > >> > On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith  > >
> > > > >> wrote:
> > > > >> >
> > > > >> > > Thanks Erick, this is 7.5.0.
> > > > >> > > 
> > > > >> > > From: Erick Erickson 
> > > > >> > > Sent: Wednesday, October 31, 2018 8:20:18 PM
> > > > >> > > To: solr-user
> > > > >> > > Subject: Re: SolrCloud Replication Failure
> > > > >> > >
> > > > >> > > What version of solr? This code was pretty much rewriten in 7.3
> > > IIRC
> > > > >> > >
> > > > >> > > On Wed, Oct 31, 2018, 10:47 Jeremy Smith  > > wrote:
> > > > >> > >
> > > > >> > > > Hi all,
> > > > >> > > >
> > > > >> > > >  We are currently running a moderately large instance of
> > > > >> standalone
> > > > >> > > > solr and are preparing to switch to solr cloud to help us
> > scale
> > > > >> up.  I
> > > > >> > > have
> > > > >> > > > been running a number of tests using docker locally and ran
> > > into an
> > > > >> issue
> > > > >> > > > where replication is consistently failing.  I have pared down
> > > the
> > > > >> test
> > > > >> > > case
> > > > >> > > > as minimally as I could.  Here's a link for the
> > > docker-compose.

Re: Retrieve field from docValues

2018-11-06 Thread Erick Erickson
Yes, "the most efficient possible" is associated with that JIRA, so only in 7x.

"Does this still hold if whole index is loaded into memory?"
The decompression part yes, the disk seek part no. And it's also
sensitive to whether the documentCache already has the document.

I'd also make uniqueKey and the _version_ fields docValues.

Best,
Erick
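
To illustrate the suggestion above, here is a minimal schema sketch. The
field name `id` and the `string` and `long` field types are assumptions
made for the example; they are not taken from the poster's schema:

```xml
<!-- Hypothetical schema fragment: uniqueKey and _version_ both carry docValues. -->
<!-- Assumes fieldTypes named "string" and "long" are defined elsewhere in the schema. -->
<field name="id" type="string" indexed="true" stored="true" docValues="true" required="true"/>
<field name="_version_" type="long" indexed="false" stored="false" docValues="true"/>
<uniqueKey>id</uniqueKey>
```

Changing docValues on an existing field generally requires a full reindex.
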
On Tue, Nov 6, 2018 at 10:44 AM Wei  wrote:
>
> Thanks Yasufumi and Erick.
>
> ---. 2. "it depends". Solr  will try to do the most efficient thing
> possible. If _all_ the fields are docValues, it will return the stored
> values from the docValues  structure.
>
> I find this jira:   https://issues.apache.org/jira/browse/SOLR-8344 Does
> this mean "Solr  will try to do the most efficient thing possible" only
> working for 7.x?  Is the behavior available for 6.6?
>
> -- This prevents a disk seek and  decompress cycle.
>
> Does this still hold if whole index is loaded into memory?  Also for the
> benefit of performance improvement,  does the uniqueKey field need to be
> always docValues? Since it is used in the first phase of distributed
> search.
>
> Thanks,
> Wei
>
>
>
> On Tue, Nov 6, 2018 at 8:30 AM Erick Erickson 
> wrote:
>
> > 2. "it depends". Solr  will try to do the most efficient thing
> > possible. If _all_ the fields are docValues, it will return the stored
> > values from the docValues  structure. This prevents a disk seek and
> > decompress cycle.
> >
> > However, if even one field is docValues=false Solr will by default
> > return the stored values. For the multiValued case, you can explicitly
> > tell Solr to return the docValues field.
> >
> > Best,
> > Erick
> > On Tue, Nov 6, 2018 at 1:46 AM Yasufumi Mizoguchi
> >  wrote:
> > >
> > > Hi,
> > >
> > > > 1. For schema version 1.6, useDocValuesAsStored=true is default, so
> > there
> > > > is no need to explicitly set it in schema.xml?
> > >
> > > Yes.
> > >
> > > > 2.  With useDocValuesAsStored=true and the following definition, will
> > Solr
> > > > retrieve id from docValues instead of stored field?
> > >
> > > No.
> > > AFAIK, if you define both docValues="true" and stored="true" in your
> > > schema,
> > > Solr tries to retrieve stored value.
> > > (Except using streaming expressions or /export handler etc...
> > > See:
> > >
> > https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-EnablingDocValues
> > > )
> > >
> > > Thanks,
> > > Yasufumi
> > >
> > >
> > > 2018年11月6日(火) 9:54 Wei :
> > >
> > > > Hi,
> > > >
> > > > I have a few questions about using the useDocValuesAsStored option to
> > > > retrieve field from docValues:
> > > >
> > > > 1. For schema version 1.6, useDocValuesAsStored=true is default, so
> > there
> > > > is no need to explicitly set it in schema.xml?
> > > >
> > > > 2.  With useDocValuesAsStored=true and the following definition, will
> > Solr
> > > > retrieve id from docValues instead of stored field? if fl= id, title,
> > > > score,   both id and title are single value field:
> > > >
> > > >> > > docValues="true" required="true"/>
> > > >
> > > >   > > > docValues="true" required="true"/>
> > > >
> > > >   Do I need to have all fields stored="false" docValues="true" to make
> > solr
> > > > retrieve from docValues only? I am using Solr 6.6.
> > > >
> > > > Thanks,
> > > > Wei
> > > >
> >


Re: SolrCloud Replication Failure

2018-11-06 Thread Jeremy Smith
Thanks everyone.  I added SOLR-12969.


Erick - those sound like important questions, but I think this issue is 
slightly different.  In this case, replication is failing even if the leader 
never goes down.


From: Erick Erickson 
Sent: Tuesday, November 6, 2018 2:52:30 PM
To: solr-user
Subject: Re: SolrCloud Replication Failure

Kevin:

Well, let's certainly raise it as a JIRA, blocker or not I'm not sure.
I _think_ the new LIR work done in Solr 7.3 might make it possible to
detect this condition but I'm not totally sure what to do about it.

So let's say the leader gets an update while a follower is down. (one
leader and one follower for simplicity). Now say the leader dies and
the follower is restarted. What should happen? Should Solr refuse to
start? Would FORCELEADER work if the user was willing to lose data?

Let's move the discussion to the JIRA though.
On Tue, Nov 6, 2018 at 10:58 AM Kevin Risden  wrote:
>
> Erick Erickson - I don't have much time to chase this down. Do you think
> this is a blocker for 7.6? It seems pretty serious.
>
> Jeremy - This would be a good JIRA to create - we can move the conversation
> there to try to get the right people involved.
>
> Kevin Risden
>
>
> On Fri, Nov 2, 2018 at 7:57 AM Jeremy Smith  wrote:
>
> > Hi Susheel,
> >
> >  Yes, it appears that under certain conditions, if a follower is down
> > when the leader gets an update, the follower will not receive that update
> > when it comes back (or maybe it receives the update and it's then
> > overwritten by its own transaction logs, I'm not sure).  Furthermore, if
> > that follower then becomes the leader, it will replicate its own out of
> > date value back to the former leader, even though the version number is
> > lower.
> >
> >
> >-Jeremy
> >
> > 
> > From: Susheel Kumar 
> > Sent: Thursday, November 1, 2018 2:57:00 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: SolrCloud Replication Failure
> >
> > Are we saying it has to do something with stop and restarting replica's
> > otherwise I haven't seen/heard any issues with document updates and
> > forwarding to replica's...
> >
> > Thanks,
> > Susheel
> >
> > On Thu, Nov 1, 2018 at 12:58 PM Erick Erickson 
> > wrote:
> >
> > > So  this seems like it absolutely needs a JIRA
> > > On Thu, Nov 1, 2018 at 9:39 AM
> > Kevin Risden
> >  wrote:
> > > >
> > > > I pushed 3 branches that modify test.sh to test 5.5, 6.6, and 7.5
> > > locally
> > > > without docker. I still see the same behavior where the latest updates
> > > > aren't on the replicas. I still don't know what is happening but it
> > > happens
> > > > without Docker :(
> > > >
> > > >
> > >
> > https://github.com/risdenk/test-solr-start-stop-replica-consistency/branches
> > > >
> > > > Kevin Risden
> > > >
> > > >
> > > > On Thu, Nov 1, 2018 at 11:41 AM Kevin Risden 
> > wrote:
> > > >
> > > > > Erick - Yea thats a fair point. Would be interesting to see if this
> > > fails
> > > > > without Docker.
> > > > >
> > > > > Kevin Risden
> > > > >
> > > > >
> > > > > On Thu, Nov 1, 2018 at 11:06 AM Erick Erickson <
> > > erickerick...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Kevin:
> > > > >>
> > > > >> You're also using Docker, right? Docker is not "officially"
> > supported
> > > > >> although there's some movement in that direction and if this is only
> > > > > >> reproducible in Docker then it's a clue where to look
> > > > >>
> > > > >> Erick
> > > > >> On Wed, Oct 31, 2018 at 7:24 PM
> > > > >> Kevin Risden
> > > > >>  wrote:
> > > > >> >
> > > > >> > I haven't dug into why this is happening but it definitely
> > > reproduces. I
> > > > >> > removed the local requirements (port mapping and such) from the
> > > gist you
> > > > >> > posted (very helpful). I confirmed this fails locally and on
> > Travis
> > > CI.
> > > > >> >
> > > > >> >
> > https://github.com/risdenk/test-solr-start-stop-replica-consistency
> > > > >> >
> > > > >> > I don't even see the first update getting applied from num 10 ->
> > 20.
> > > > >> After
> > > > >> > the first update there is no more change.
> > > > >> >
> > > > >> > Kevin Risden
> > > > >> >
> > > > >> >
> > > > >> > On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith  > >
> > > > >> wrote:
> > > > >> >
> > > > >> > > Thanks Erick, this is 7.5.0.
> > > > >> > > 
> > > > >> > > From: Erick Erickson 
> > > > >> > > Sent: Wednesday, October 31, 2018 8:20:18 PM
> > > > >> > > To: solr-user
> > > > >> > > Subject: Re: SolrCloud Replication Failure
> > > > >> > >
> > > > > >> > > What version of solr? This code was pretty much rewritten in 7.3
> > > IIRC
> > > > >> > >
> > > > >> > > On Wed, Oct 31, 2018, 10:47 Jeremy Smith  > > wrote:
> > > > >> > >
> > > > >> > > > Hi all,
> > > > >> > > >
> > > > >> > > >  We are currently running a moderately large instance of
> > > > >> standalone
> > > > >> > > > solr and are preparing to switch to solr cloud to help us
> 

Re: SolrCloud Replication Failure

2018-11-06 Thread Erick Erickson
Hmmm, ok. The replication failure could lead to the scenario I
outlined, but that's a secondary issue to the update not getting to
the follower in the first place as you say.
On Tue, Nov 6, 2018 at 12:19 PM Jeremy Smith  wrote:
>
> Thanks everyone.  I added SOLR-12969.
>
>
> Erick - those sound like important questions, but I think this issue is 
> slightly different.  In this case, replication is failing even if the leader 
> never goes down.
>
> 
> From: Erick Erickson 
> Sent: Tuesday, November 6, 2018 2:52:30 PM
> To: solr-user
> Subject: Re: SolrCloud Replication Failure
>
> Kevin:
>
> Well, let's certainly raise it as a JIRA, blocker or not I'm not sure.
> I _think_ the new LIR work done in Solr 7.3 might make it possible to
> detect this condition but I'm not totally sure what to do about it.
>
> So let's say the leader gets an update while a follower is down. (one
> leader and one follower for simplicity). Now say the leader dies and
> the follower is restarted. What should happen? Should Solr refuse to
> start? Would FORCELEADER work if the user was willing to lose data?
>
> Let's move the discussion to the JIRA though.
> On Tue, Nov 6, 2018 at 10:58 AM Kevin Risden  wrote:
> >
> > Erick Erickson - I don't have much time to chase this down. Do you think
> > this is a blocker for 7.6? It seems pretty serious.
> >
> > Jeremy - This would be a good JIRA to create - we can move the conversation
> > there to try to get the right people involved.
> >
> > Kevin Risden
> >
> >
> > On Fri, Nov 2, 2018 at 7:57 AM Jeremy Smith  wrote:
> >
> > > Hi Susheel,
> > >
> > >  Yes, it appears that under certain conditions, if a follower is down
> > > when the leader gets an update, the follower will not receive that update
> > > when it comes back (or maybe it receives the update and it's then
> > > overwritten by its own transaction logs, I'm not sure).  Furthermore, if
> > > that follower then becomes the leader, it will replicate its own out of
> > > date value back to the former leader, even though the version number is
> > > lower.
> > >
> > >
> > >-Jeremy
> > >
> > > 
> > > From: Susheel Kumar 
> > > Sent: Thursday, November 1, 2018 2:57:00 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: SolrCloud Replication Failure
> > >
> > > Are we saying it has to do something with stop and restarting replica's
> > > otherwise I haven't seen/heard any issues with document updates and
> > > forwarding to replica's...
> > >
> > > Thanks,
> > > Susheel
> > >
> > > On Thu, Nov 1, 2018 at 12:58 PM Erick Erickson 
> > > wrote:
> > >
> > > > So  this seems like it absolutely needs a JIRA
> > > > On Thu, Nov 1, 2018 at 9:39 AM
> > > Kevin Risden
> > >  wrote:
> > > > >
> > > > > I pushed 3 branches that modify test.sh to test 5.5, 6.6, and 7.5
> > > > locally
> > > > > without docker. I still see the same behavior where the latest updates
> > > > > aren't on the replicas. I still don't know what is happening but it
> > > > happens
> > > > > without Docker :(
> > > > >
> > > > >
> > > >
> > > https://github.com/risdenk/test-solr-start-stop-replica-consistency/branches
> > > > >
> > > > > Kevin Risden
> > > > >
> > > > >
> > > > > On Thu, Nov 1, 2018 at 11:41 AM Kevin Risden 
> > > wrote:
> > > > >
> > > > > > Erick - Yea thats a fair point. Would be interesting to see if this
> > > > fails
> > > > > > without Docker.
> > > > > >
> > > > > > Kevin Risden
> > > > > >
> > > > > >
> > > > > > On Thu, Nov 1, 2018 at 11:06 AM Erick Erickson <
> > > > erickerick...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Kevin:
> > > > > >>
> > > > > >> You're also using Docker, right? Docker is not "officially"
> > > supported
> > > > > >> although there's some movement in that direction and if this is 
> > > > > >> only
> > > > > > >> reproducible in Docker then it's a clue where to look
> > > > > >>
> > > > > >> Erick
> > > > > >> On Wed, Oct 31, 2018 at 7:24 PM
> > > > > >> Kevin Risden
> > > > > >>  wrote:
> > > > > >> >
> > > > > >> > I haven't dug into why this is happening but it definitely
> > > > reproduces. I
> > > > > >> > removed the local requirements (port mapping and such) from the
> > > > gist you
> > > > > >> > posted (very helpful). I confirmed this fails locally and on
> > > Travis
> > > > CI.
> > > > > >> >
> > > > > >> >
> > > https://github.com/risdenk/test-solr-start-stop-replica-consistency
> > > > > >> >
> > > > > >> > I don't even see the first update getting applied from num 10 ->
> > > 20.
> > > > > >> After
> > > > > >> > the first update there is no more change.
> > > > > >> >
> > > > > >> > Kevin Risden
> > > > > >> >
> > > > > >> >
> > > > > >> > On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith  > > >
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > Thanks Erick, this is 7.5.0.
> > > > > >> > > 
> > > > > >> > > From: Erick Erickson 
> > > > > >> > > Sent: Wednesday, October 31, 2018 8:20:

Re: Retrieve field from docValues

2018-11-06 Thread Wei
I see there is also a docValuesFormat option, what's the default for this
setting? Performance wise is it good to set docValuesFormat="Memory" ?

Best,
Wei


On Tue, Nov 6, 2018 at 11:55 AM Erick Erickson 
wrote:

> Yes, "the most efficient possible" is associated with that JIRA, so only
> in 7x.
>
> "Does this still hold if whole index is loaded into memory?"
> The decompression part yes, the disk seek part no. And it's also
> sensitive to whether the documentCache already has the document.
>
> I'd also make uniqueKey and the _version_ fields docValues.
>
> Best,
> Erick
> On Tue, Nov 6, 2018 at 10:44 AM Wei  wrote:
> >
> > Thanks Yasufumi and Erick.
> >
> > ---. 2. "it depends". Solr  will try to do the most efficient thing
> > possible. If _all_ the fields are docValues, it will return the stored
> > values from the docValues  structure.
> >
> > I find this jira:   https://issues.apache.org/jira/browse/SOLR-8344
> Does
> > this mean "Solr  will try to do the most efficient thing possible" only
> > working for 7.x?  Is the behavior available for 6.6?
> >
> > -- This prevents a disk seek and  decompress cycle.
> >
> > Does this still hold if whole index is loaded into memory?  Also for the
> > benefit of performance improvement,  does the uniqueKey field need to be
> > always docValues? Since it is used in the first phase of distributed
> > search.
> >
> > Thanks,
> > Wei
> >
> >
> >
> > On Tue, Nov 6, 2018 at 8:30 AM Erick Erickson 
> > wrote:
> >
> > > 2. "it depends". Solr  will try to do the most efficient thing
> > > possible. If _all_ the fields are docValues, it will return the stored
> > > values from the docValues  structure. This prevents a disk seek and
> > > decompress cycle.
> > >
> > > However, if even one field is docValues=false Solr will by default
> > > return the stored values. For the multiValued case, you can explicitly
> > > tell Solr to return the docValues field.
> > >
> > > Best,
> > > Erick
> > > On Tue, Nov 6, 2018 at 1:46 AM Yasufumi Mizoguchi
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > > 1. For schema version 1.6, useDocValuesAsStored=true is default, so
> > > there
> > > > > is no need to explicitly set it in schema.xml?
> > > >
> > > > Yes.
> > > >
> > > > > 2.  With useDocValuesAsStored=true and the following definition,
> will
> > > Solr
> > > > > retrieve id from docValues instead of stored field?
> > > >
> > > > No.
> > > > AFAIK, if you define both docValues="true" and stored="true" in your
> > > > schema,
> > > > Solr tries to retrieve stored value.
> > > > (Except using streaming expressions or /export handler etc...
> > > > See:
> > > >
> > >
> https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-EnablingDocValues
> > > > )
> > > >
> > > > Thanks,
> > > > Yasufumi
> > > >
> > > >
> > > > 2018年11月6日(火) 9:54 Wei :
> > > >
> > > > > Hi,
> > > > >
> > > > > I have a few questions about using the useDocValuesAsStored option
> to
> > > > > retrieve field from docValues:
> > > > >
> > > > > 1. For schema version 1.6, useDocValuesAsStored=true is default, so
> > > there
> > > > > is no need to explicitly set it in schema.xml?
> > > > >
> > > > > 2.  With useDocValuesAsStored=true and the following definition,
> will
> > > Solr
> > > > > retrieve id from docValues instead of stored field? if fl= id,
> title,
> > > > > score,   both id and title are single value field:
> > > > >
> > > > >> > > > docValues="true" required="true"/>
> > > > >
> > > > >   > > > > docValues="true" required="true"/>
> > > > >
> > > > >   Do I need to have all fields stored="false" docValues="true" to
> make
> > > solr
> > > > > retrieve from docValues only? I am using Solr 6.6.
> > > > >
> > > > > Thanks,
> > > > > Wei
> > > > >
> > >
>


Re: Retrieve field from docValues

2018-11-06 Thread Erick Erickson
docValuesFormat="Memory" has been deprecated, so you shouldn't use it.
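
For context, docValuesFormat is an attribute on a fieldType, and it only
takes effect when a schema-aware codec factory is configured. A minimal
sketch of leaving it at the codec default follows; the type name
`string_dv` is hypothetical:

```xml
<!-- solrconfig.xml: per-fieldType formats are only honored with a schema-aware codec factory. -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- schema: hypothetical field type; omitting docValuesFormat keeps the codec's default format. -->
<fieldType name="string_dv" class="solr.StrField" docValues="true"/>
```
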
On Tue, Nov 6, 2018 at 2:14 PM Wei  wrote:
>
> I see there is also a docValuesFormat option, what's the default for this
> setting? Performance wise is it good to set docValuesFormat="Memory" ?
>
> Best,
> Wei
>
>
> On Tue, Nov 6, 2018 at 11:55 AM Erick Erickson 
> wrote:
>
> > Yes, "the most efficient possible" is associated with that JIRA, so only
> > in 7x.
> >
> > "Does this still hold if whole index is loaded into memory?"
> > The decompression part yes, the disk seek part no. And it's also
> > sensitive to whether the documentCache already has the document.
> >
> > I'd also make uniqueKey and the _version_ fields docValues.
> >
> > Best,
> > Erick
> > On Tue, Nov 6, 2018 at 10:44 AM Wei  wrote:
> > >
> > > Thanks Yasufumi and Erick.
> > >
> > > ---. 2. "it depends". Solr  will try to do the most efficient thing
> > > possible. If _all_ the fields are docValues, it will return the stored
> > > values from the docValues  structure.
> > >
> > > I find this jira:   https://issues.apache.org/jira/browse/SOLR-8344
> > Does
> > > this mean "Solr  will try to do the most efficient thing possible" only
> > > working for 7.x?  Is the behavior available for 6.6?
> > >
> > > -- This prevents a disk seek and  decompress cycle.
> > >
> > > Does this still hold if whole index is loaded into memory?  Also for the
> > > benefit of performance improvement,  does the uniqueKey field need to be
> > > always docValues? Since it is used in the first phase of distributed
> > > search.
> > >
> > > Thanks,
> > > Wei
> > >
> > >
> > >
> > > On Tue, Nov 6, 2018 at 8:30 AM Erick Erickson 
> > > wrote:
> > >
> > > > 2. "it depends". Solr  will try to do the most efficient thing
> > > > possible. If _all_ the fields are docValues, it will return the stored
> > > > values from the docValues  structure. This prevents a disk seek and
> > > > decompress cycle.
> > > >
> > > > However, if even one field is docValues=false Solr will by default
> > > > return the stored values. For the multiValued case, you can explicitly
> > > > tell Solr to return the docValues field.
> > > >
> > > > Best,
> > > > Erick
> > > > On Tue, Nov 6, 2018 at 1:46 AM Yasufumi Mizoguchi
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > > 1. For schema version 1.6, useDocValuesAsStored=true is default, so
> > > > there
> > > > > > is no need to explicitly set it in schema.xml?
> > > > >
> > > > > Yes.
> > > > >
> > > > > > 2.  With useDocValuesAsStored=true and the following definition,
> > will
> > > > Solr
> > > > > > retrieve id from docValues instead of stored field?
> > > > >
> > > > > No.
> > > > > AFAIK, if you define both docValues="true" and stored="true" in your
> > > > > schema,
> > > > > Solr tries to retrieve stored value.
> > > > > (Except using streaming expressions or /export handler etc...
> > > > > See:
> > > > >
> > > >
> > https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-EnablingDocValues
> > > > > )
> > > > >
> > > > > Thanks,
> > > > > Yasufumi
> > > > >
> > > > >
> > > > > 2018年11月6日(火) 9:54 Wei :
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I have a few questions about using the useDocValuesAsStored option
> > to
> > > > > > retrieve field from docValues:
> > > > > >
> > > > > > 1. For schema version 1.6, useDocValuesAsStored=true is default, so
> > > > there
> > > > > > is no need to explicitly set it in schema.xml?
> > > > > >
> > > > > > 2.  With useDocValuesAsStored=true and the following definition,
> > will
> > > > Solr
> > > > > > retrieve id from docValues instead of stored field? if fl= id,
> > title,
> > > > > > score,   both id and title are single value field:
> > > > > >
> > > > > >> > > > > docValues="true" required="true"/>
> > > > > >
> > > > > >   > > > > > docValues="true" required="true"/>
> > > > > >
> > > > > >   Do I need to have all fields stored="false" docValues="true" to
> > make
> > > > solr
> > > > > > retrieve from docValues only? I am using Solr 6.6.
> > > > > >
> > > > > > Thanks,
> > > > > > Wei
> > > > > >
> > > >
> >


Re: Retrieve field from docValues

2018-11-06 Thread Wei
Also I notice this issue is still open:
https://issues.apache.org/jira/browse/SOLR-10816
Does that mean we still need to have stored=true for uniqueKey?

On Tue, Nov 6, 2018 at 2:14 PM Wei  wrote:

> I see there is also a docValuesFormat option, what's the default for this
> setting? Performance wise is it good to set docValuesFormat="Memory" ?
>
> Best,
> Wei
>
>
> On Tue, Nov 6, 2018 at 11:55 AM Erick Erickson 
> wrote:
>
>> Yes, "the most efficient possible" is associated with that JIRA, so only
>> in 7x.
>>
>> "Does this still hold if whole index is loaded into memory?"
>> The decompression part yes, the disk seek part no. And it's also
>> sensitive to whether the documentCache already has the document.
>>
>> I'd also make uniqueKey and the _version_ fields docValues.
>>
>> Best,
>> Erick
>> On Tue, Nov 6, 2018 at 10:44 AM Wei  wrote:
>> >
>> > Thanks Yasufumi and Erick.
>> >
>> > ---. 2. "it depends". Solr  will try to do the most efficient thing
>> > possible. If _all_ the fields are docValues, it will return the stored
>> > values from the docValues  structure.
>> >
>> > I find this jira:   https://issues.apache.org/jira/browse/SOLR-8344
>> Does
>> > this mean "Solr  will try to do the most efficient thing possible" only
>> > working for 7.x?  Is the behavior available for 6.6?
>> >
>> > -- This prevents a disk seek and  decompress cycle.
>> >
>> > Does this still hold if whole index is loaded into memory?  Also for the
>> > benefit of performance improvement,  does the uniqueKey field need to be
>> > always docValues? Since it is used in the first phase of distributed
>> > search.
>> >
>> > Thanks,
>> > Wei
>> >
>> >
>> >
>> > On Tue, Nov 6, 2018 at 8:30 AM Erick Erickson 
>> > wrote:
>> >
>> > > 2. "it depends". Solr  will try to do the most efficient thing
>> > > possible. If _all_ the fields are docValues, it will return the stored
>> > > values from the docValues  structure. This prevents a disk seek and
>> > > decompress cycle.
>> > >
>> > > However, if even one field is docValues=false Solr will by default
>> > > return the stored values. For the multiValued case, you can explicitly
>> > > tell Solr to return the docValues field.
>> > >
>> > > Best,
>> > > Erick
>> > > On Tue, Nov 6, 2018 at 1:46 AM Yasufumi Mizoguchi
>> > >  wrote:
>> > > >
>> > > > Hi,
>> > > >
>> > > > > 1. For schema version 1.6, useDocValuesAsStored=true is default,
>> so
>> > > there
>> > > > > is no need to explicitly set it in schema.xml?
>> > > >
>> > > > Yes.
>> > > >
>> > > > > 2.  With useDocValuesAsStored=true and the following definition,
>> will
>> > > Solr
>> > > > > retrieve id from docValues instead of stored field?
>> > > >
>> > > > No.
>> > > > AFAIK, if you define both docValues="true" and stored="true" in your
>> > > > schema,
>> > > > Solr tries to retrieve stored value.
>> > > > (Except using streaming expressions or /export handler etc...
>> > > > See:
>> > > >
>> > >
>> https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-EnablingDocValues
>> > > > )
>> > > >
>> > > > Thanks,
>> > > > Yasufumi
>> > > >
>> > > >
>> > > > 2018年11月6日(火) 9:54 Wei :
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > I have a few questions about using the useDocValuesAsStored
>> option to
>> > > > > retrieve field from docValues:
>> > > > >
>> > > > > 1. For schema version 1.6, useDocValuesAsStored=true is default,
>> so
>> > > there
>> > > > > is no need to explicitly set it in schema.xml?
>> > > > >
>> > > > > 2.  With useDocValuesAsStored=true and the following definition,
>> will
>> > > Solr
>> > > > > retrieve id from docValues instead of stored field? if fl= id,
>> title,
>> > > > > score,   both id and title are single value field:
>> > > > >
>> > > > >   > > > > > docValues="true" required="true"/>
>> > > > >
>> > > > >  > > > > > docValues="true" required="true"/>
>> > > > >
>> > > > >   Do I need to have all fields stored="false" docValues="true" to
>> make
>> > > solr
>> > > > > retrieve from docValues only? I am using Solr 6.6.
>> > > > >
>> > > > > Thanks,
>> > > > > Wei
>> > > > >
>> > >
>>
>


Re: Retrieve field from docValues

2018-11-06 Thread Erick Erickson
You should until this is resolved. The original purpose of that JIRA
(the speedup) no longer applies, though, since that's already been
taken care of.
On Tue, Nov 6, 2018 at 3:50 PM Wei  wrote:
>
> Also I notice this issue is still open:
> https://issues.apache.org/jira/browse/SOLR-10816
> Does that mean we still need to have stored=true for uniqueKey?
>
> On Tue, Nov 6, 2018 at 2:14 PM Wei  wrote:
>
> > I see there is also a docValuesFormat option, what's the default for this
> > setting? Performance wise is it good to set docValuesFormat="Memory" ?
> >
> > Best,
> > Wei
> >
> >
> > On Tue, Nov 6, 2018 at 11:55 AM Erick Erickson 
> > wrote:
> >
> >> Yes, "the most efficient possible" is associated with that JIRA, so only
> >> in 7x.
> >>
> >> "Does this still hold if whole index is loaded into memory?"
> >> The decompression part yes, the disk seek part no. And it's also
> >> sensitive to whether the documentCache already has the document.
> >>
> >> I'd also make uniqueKey and the _version_ fields docValues.
> >>
> >> Best,
> >> Erick
> >> On Tue, Nov 6, 2018 at 10:44 AM Wei  wrote:
> >> >
> >> > Thanks Yasufumi and Erick.
> >> >
> >> > ---. 2. "it depends". Solr  will try to do the most efficient thing
> >> > possible. If _all_ the fields are docValues, it will return the stored
> >> > values from the docValues  structure.
> >> >
> >> > I find this jira:   https://issues.apache.org/jira/browse/SOLR-8344
> >> Does
> >> > this mean "Solr  will try to do the most efficient thing possible" only
> >> > working for 7.x?  Is the behavior available for 6.6?
> >> >
> >> > -- This prevents a disk seek and  decompress cycle.
> >> >
> >> > Does this still hold if whole index is loaded into memory?  Also for the
> >> > benefit of performance improvement,  does the uniqueKey field need to be
> >> > always docValues? Since it is used in the first phase of distributed
> >> > search.
> >> >
> >> > Thanks,
> >> > Wei
> >> >
> >> >
> >> >
> >> > On Tue, Nov 6, 2018 at 8:30 AM Erick Erickson 
> >> > wrote:
> >> >
> >> > > 2. "it depends". Solr  will try to do the most efficient thing
> >> > > possible. If _all_ the fields are docValues, it will return the stored
> >> > > values from the docValues  structure. This prevents a disk seek and
> >> > > decompress cycle.
> >> > >
> >> > > However, if even one field is docValues=false Solr will by default
> >> > > return the stored values. For the multiValued case, you can explicitly
> >> > > tell Solr to return the docValues field.
> >> > >
> >> > > Best,
> >> > > Erick
> >> > > On Tue, Nov 6, 2018 at 1:46 AM Yasufumi Mizoguchi
> >> > >  wrote:
> >> > > >
> >> > > > Hi,
> >> > > >
> >> > > > > 1. For schema version 1.6, useDocValuesAsStored=true is default,
> >> so
> >> > > there
> >> > > > > is no need to explicitly set it in schema.xml?
> >> > > >
> >> > > > Yes.
> >> > > >
> >> > > > > 2.  With useDocValuesAsStored=true and the following definition,
> >> will
> >> > > Solr
> >> > > > > retrieve id from docValues instead of stored field?
> >> > > >
> >> > > > No.
> >> > > > AFAIK, if you define both docValues="true" and stored="true" in your
> >> > > > schema,
> >> > > > Solr tries to retrieve stored value.
> >> > > > (Except using streaming expressions or /export handler etc...
> >> > > > See:
> >> > > >
> >> > >
> >> https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-EnablingDocValues
> >> > > > )
> >> > > >
> >> > > > Thanks,
> >> > > > Yasufumi
> >> > > >
> >> > > >
> >> > > > 2018年11月6日(火) 9:54 Wei :
> >> > > >
> >> > > > > Hi,
> >> > > > >
> >> > > > > I have a few questions about using the useDocValuesAsStored
> >> option to
> >> > > > > retrieve field from docValues:
> >> > > > >
> >> > > > > 1. For schema version 1.6, useDocValuesAsStored=true is default,
> >> so
> >> > > there
> >> > > > > is no need to explicitly set it in schema.xml?
> >> > > > >
> >> > > > > 2.  With useDocValuesAsStored=true and the following definition,
> >> will
> >> > > Solr
> >> > > > > retrieve id from docValues instead of stored field? if fl= id,
> >> title,
> >> > > > > score,   both id and title are single value field:
> >> > > > >
> >> > > > >>> > > > > docValues="true" required="true"/>
> >> > > > >
> >> > > > >   >> > > > > docValues="true" required="true"/>
> >> > > > >
> >> > > > >   Do I need to have all fields stored="false" docValues="true" to
> >> make
> >> > > solr
> >> > > > > retrieve from docValues only? I am using Solr 6.6.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Wei
> >> > > > >
> >> > >
> >>
> >


Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Gus Heck
Tomáš,

One thing that causes a clusterstatus call is alias resolution if the
HttpClusterStateProvider is in use instead of the ZkClusterStateProvider.
I've just been fixing spurious error messages generated by this
in SOLR-12938.

-Gus

On Tue, Nov 6, 2018 at 1:08 PM Zimmermann, Thomas <
tzimmerm...@techtarget.com> wrote:

> Hi Shawn,
>
> We're equally impressed by how well the server is handling it. We're using
> Sematext for monitoring and the load on the box has been steady under 1
> and not entering a swap state memory wise.
>
> We are 100% certain the traffic is coming from the 3 web hosts running
> this code. We have put some custom logging in place that logs all requests
> to an access style log and stores that data in kibana/logstash. In
> logstash we are able to confirm that all these requests (~40million in the
> last 12 hours) are coming from our web front ends directly to a single box
> in the cluster.
>
> Our client code is on separate servers from our Solr servers, and zk has
> its own boxes as well.
>
> Here's a scrubbed pastebin of our cluster status response from that machine
> that is getting all the traffic, I pulled this via browser on my local
> machine.
> https://pastebin.com/42haKVME
>
> We can attempt to update the SolrJ dependency on our lower env and see if
> that fixes the problem if you think that's a good course of action, but we
> are also in the midst of switching over to HTTP Client to resolve the
> production issues we are seeing ASAP, so I can¹t promise a timeline. If
> you think there¹s a chance that will fix this, we could of course give it
> a quick go.
>
>
> -TZ
>
>
>
> On 11/6/18, 12:35 PM, "Shawn Heisey"  wrote:
>
> >On 11/6/2018 10:12 AM, Zimmermann, Thomas wrote:
> >> Shawn -
> >>
> >> Server performance is fine and request times are great. We are tolerating
> >> the level of traffic, but the server that is taking all the hits is
> >> obviously performing a bit slower than the others. Response times are
> >> under 5 ms avg for queries on all servers, which is within our perf
> >> thresholds.
> >
> >I was asking specifically about the clusterstatus requests -- whether
> >the response looks complete if you manually execute the same request and
> >whether it returns quickly.  And I'd like to see the solr.log where
> >these are happening.
> >
> >Knowing that requests in general are performing well is good info,
> >although I have no idea how that is possible on the node that is getting
> >over a thousand clusterstatus requests per second.  I would expect that
> >node to be essentially dead under that much load.  Since it's apparently
> >handling it fine ... that's really impressive.
> >
> >> We are running 7.4 on the client and server side, moving to 7.5 was
> >> troublesome for us so we are holding off for the time being.
> >
> >I was hoping you could just upgrade the SolrJ client, which would
> >involve either replacing the solrj jar or bumping the version number in
> >the config for a dependency manager (things like ivy, maven, gradle,
> >etc).  A 7.5 client should be pretty safe against 7.4 servers.  The
> >client would be newer than the server and very close to the same
> >version, which is the general recommendation for CloudSolrClient when
> >the two versions cannot be identical for some reason.
> >
> >Are you absolutely sure that those requests are coming from the program
> >with CloudSolrClient?  To find out, you'll need to enable the request
> >log in jetty.xml (it just needs to be un-commented) and restart the
> >server.  The source address is not logged in solr.log.  It's very
> >important to be absolutely sure where the requests are coming from.  If
> >you're running the client code on the same machine as one of your Solr
> >servers, it will be difficult to be sure about the source, so I would
> >definitely suggest running the client code on a completely different
> >machine, so the source addresses in the request log are useful.
> >
> >Thanks,
> >Shawn
> >
>
>
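
For the question above about where the clusterstatus requests come from, here is a minimal sketch of tallying them per source address once a request log is available. It assumes NCSA-style log lines in which the first whitespace-separated token is the client address and the query string contains action=CLUSTERSTATUS; the class name, method name, addresses and sample lines are made up for illustration and are not taken from the actual cluster.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClusterStatusSources {

    // Counts CLUSTERSTATUS requests per source address.
    // Assumption: each line starts with the client address, followed by a space.
    static Map<String, Integer> countBySource(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            if (!line.contains("action=CLUSTERSTATUS")) {
                continue; // skip ordinary queries and updates
            }
            String trimmed = line.trim();
            int space = trimmed.indexOf(' ');
            String source = space < 0 ? trimmed : trimmed.substring(0, space);
            counts.merge(source, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not real log output.
        List<String> sample = Arrays.asList(
            "10.0.0.11 - - [06/Nov/2018:12:00:00 +0000] \"GET /solr/admin/collections?action=CLUSTERSTATUS HTTP/1.1\" 200 512",
            "10.0.0.12 - - [06/Nov/2018:12:00:00 +0000] \"GET /solr/admin/collections?action=CLUSTERSTATUS HTTP/1.1\" 200 512",
            "10.0.0.11 - - [06/Nov/2018:12:00:01 +0000] \"GET /solr/admin/collections?action=CLUSTERSTATUS HTTP/1.1\" 200 512",
            "10.0.0.13 - - [06/Nov/2018:12:00:01 +0000] \"GET /solr/mycoll/select?q=*:* HTTP/1.1\" 200 2048");
        System.out.println(countBySource(sample));
    }
}
```

If one or two addresses account for nearly all of the counts, that narrows down which client instances to look at first.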

-- 
http://www.the111shift.com


Re: Solr suggestions, best practices

2018-11-06 Thread Zheng Lin Edwin Yeo
Maybe you can look into this:
https://lucidworks.com/2015/03/04/solr-suggester/

Which version of Solr are you using?

Regards,
Edwin

On Tue, 6 Nov 2018 at 17:00, Clemens Wyss DEV  wrote:

> At the moment we are using spellchecking-component for suggestions which
> is suboptimal, to say the least. What are best practices for suggestions
> using Solr?
> googling (with excellent suggestions 😉) I came across
>
> https://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/
> and
>
> https://grokbase.com/t/lucene/solr-user/14bayc6jkc/best-practice-autosuggest-autocomplete-vs-real-search
>
> Any other valuable reads/links regarding suggestions?
>
> Thx in advance
> - Clemens
>