In-place re-indexing after DocValue schema change

2020-01-29 Thread moscovig
Hi all

We are about to alter our schema with some DocValue annotations. 
According to the docs, we should either delete all docs and re-insert, or
create a new collection with the new schema.

1. Is it valid to modify the schema in the current collection, where all
documents were created without docValues, and have docValues only for new docs?

2. Is it valid to upsert all documents into the same collection, having all
docs re-indexed in-place? It does sound risky, but would it work if we make
sure to re-index *all* documents?

Thanks!



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: In-place re-indexing after DocValue schema change

2020-01-29 Thread Emir Arnautović
Hi,
1. No, it’s not valid. Solr looks at the schema to see if it can use docValues 
or if it has to uninvert the field, and it assumes that all documents will have 
doc values for that field. You can expect anything from wrong results to errors 
if you do something like that.
2. Not sure if it would work, but it is no better than reindexing everything. 
Lucene segments are immutable, so Solr needs to create a new document, flag the 
existing one as deleted, and purge it at segment merge time. If you are trying to 
avoid changing the collection name, maybe you could do something like that by using 
aliases: index into a new collection, delete the existing collection, and create an 
alias with the old collection name pointing to the new collection.
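
A sketch of that last step, assuming hypothetical collection names: the old collection 
is "products" and the reindexed one is "products_v2":

  curl "http://localhost:8983/solr/admin/collections?action=DELETE&name=products"
  curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_v2"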

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 29 Jan 2020, at 09:37, moscovig  wrote:
> 
> Hi all
> 
> We are about to alter our schema with some DocValue annotations. 
> According to docs, we should whether delete all docs and re-insert, or
> create a new collection with the new schema.
> 
> 1. Is it valid to modify the schema in the current collection, where all
> documents were created without docValue, and having docValue for new docs?
> 
> 2. Is it valid to upsert all documents onto the same collection, having all
> docs re-indexed in-place? It does sound risky, but would it work if we will
> take care of *all* documents?
> 
> Thanks!
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: In-place re-indexing after DocValue schema change

2020-01-29 Thread moscovig
Thank you, Emir.

I tried this locally (changing the schema, re-indexing everything in-place)
and I wasn't able to sort on the docValues fields anymore (someone actually
mentioned this before on that forum -
https://lucene.472066.n3.nabble.com/DocValues-error-td4240116.html)
with the following error:
"Error from server at
http://10.150.197.29:8961/solr/accountmaster_shard1_replica1: unexpected
docvalues type NONE for field 'key' (expected=SORTED). Re-index with correct
docvalues type."

Also, the large overhead you mention is another reason not to
re-index in-place.

Thanks!








--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Operation backup caused exception : AccessDeniedException

2020-01-29 Thread Salmaan Rashid Syed
Hi Shawn,

I was trying to execute the backup command using curl on my work
computer to see why the EC2 instance was giving the previous error. On my
current computer, I have root privileges. But when I execute the command on
my work computer, I get a different problem: it states that the
path/folder doesn't exist, as follows.

sudo curl --user "solr-admin:#" "
http://solr.mroads.com:8983/solr/admin/collections?action=BACKUP&name=panna_backup&collection=PANNA&location=/Users/salmaan/
"

{

  "responseHeader":{

"status":500,

"QTime":0},

  "error":{

"metadata":[

  "error-class","org.apache.solr.common.SolrException",

  "root-error-class","org.apache.solr.common.SolrException"],

"msg":"specified location file:///Users/salmaan does not exist.",

org.apache.solr.common.SolrException: specified location
file:///Users/salmaan does not exist.
at 
org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation.lambda$static$30(CollectionsHandler.java:980)
at 
org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation.execute(CollectionsHandler.java:1177)
at 
org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:258)
at 
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at 
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)
at 
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:531)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)
at java.base/java.lang.Thread.run(Thread.java:834

Re: Performance Issue since Solr 7.7 with wt=javabin

2020-01-29 Thread Florent Sithi
Hi Paras,

Thanks for your answer and your ideas ;)

I have the exact same issue as Andy: "wt=javabin&version=2" has really
poor performance compared to wt=json.
I'm using:
- Solr 7.7.2
- OpenJDK8U-jdk_x64_linux_hotspot_8u222b10 or jdk-8u241-linux-x64 (same
behaviour)

The server has plenty of RAM and GC is not triggered during the test.
I alternately run the stress test with wt=javabin and then wt=json without
restarting Solr, so I presume warmup is not an issue there.
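
For reference, the kind of comparison I mean (host and collection are placeholders, and
the real test runs many concurrent requests):

  curl -s -o /dev/null -w "javabin: %{time_total}s\n" "http://localhost:8983/solr/<collection>/select?q=*:*&wt=javabin&version=2"
  curl -s -o /dev/null -w "json:    %{time_total}s\n" "http://localhost:8983/solr/<collection>/select?q=*:*&wt=json"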

What do you mean by "rebuild the performance matrix"?

Thanks
Florent







--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Spell check with data from database and not from english dictionary

2020-01-29 Thread seeteshh
Hello Jan

Let me work on your suggestions too.

Also, I had one query:

While working on the spellcheck component, I don't get any suggestions for the
incorrect word typed.

For example: in spellcheck.q, I type "Teh" instead of "The", or "saa" instead
of "sea".

  "responseHeader":{
"status":0,
"QTime":0,
"params":{
  "spellcheck.q":"Teh",
  "spellcheck":"on",
  "spellcheck.reload":"true",
  "spellcheck.build":"true",
  "_":"1580287370193",
  "spellcheck.collate":"true"}},
  "command":"build",
  "response":{"numFound":0,"start":0,"docs":[]
  },
  "spellcheck":{
"suggestions":[],
"collations":[]}}

I have to create an entry in the synonyms.txt file for teh => The to work around
this issue.

Does Solr require the term in spellcheck.q to be at least 4 characters long to
provide a proper suggestion for the misspelt word? Is there any section in the
Reference Guide where this is documented? These are my findings/observations, but
I need to know the rationale behind this.

Regards,

Seetesh Hindlekar





--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Performance Issue since Solr 7.7 with wt=javabin

2020-01-29 Thread Jan Høydahl
Check out SOLR-14013  which I 
believe is what you are looking for

Jan

> 29. jan. 2020 kl. 11:46 skrev Florent Sithi :
> 
> Hi Paras,
> 
> Thanks for your answer and your ideas ;)
> 
> I have the exact same issue than Andy  "wt=javabin&version=2" have really
> poor performances comprared to wt=json
> I'm using : 
> - solr 7.7.2 
> - OpenJDK8U-jdk_x64_linux_hotspot_8u222b10 or jdk-8u241-linux-x64 (same
> behaviour)
> 
> The server have much RAM and GC are not triggered during the test.
> I alternatively perform stress test with wt=javabin then wt=json without
> restarting solr. I presume warmups is not an issue there.
> 
> What do you mean by "rebuild the performance matrix"
> 
> Thanks
> Florent
> 
> 
> 
> 
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Query Regarding SOLR cross collection join

2020-01-29 Thread Mikhail Khludnev
It's time to enforce and document field type constraints
https://issues.apache.org/jira/browse/SOLR-14230.

On Mon, Jan 27, 2020 at 4:12 PM Doss  wrote:

> @ Alessandro Benedetti , Thanks for your input!
>
> @ Mikhail Khludnev , I made docValues="true" for from & to and did a index
> rotation, now the score join works perfectly!  Saw 7x performance increase.
> Thanks!
>
>
> On Thu, Jan 23, 2020 at 9:53 PM Mikhail Khludnev  wrote:
>
> > On Wed, Jan 22, 2020 at 4:27 PM Doss  wrote:
> >
> > > HI,
> > >
> > > SOLR version 8.3.1 (10 nodes), zookeeper ensemble (3 nodes)
> > >
> > > Read somewhere that the score join parser will be faster, but for me it
> > > produces no results. I am using string type fields for from and to.
> > >
> >
> > That's odd. Can you try to enable docValues on from side and reindex
> small
> > portion of data just to check if it works.
> >
> >
> > >
> > >
> > > Thanks!
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Solr 7.7: Using Tika in Production

2020-01-29 Thread Erick Erickson
I doubt that’d work. When Solr gets an update, it forwards the document to the 
leader of the shard it’s going to eventually reside on. Among other things, the 
Solr node hosting no replicas would need to go to ZK and pull down the config 
you've created for Tika to know what to do. There’s no technical reason this 
couldn’t be done but I’m 99.9% certain nobody has, especially since running 
Tika inside Solr is intended for PoC purposes rather than production.

The article you linked to has some SolrJ code, which is usually a better approach;
or you can run Tika in server mode.
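
If you go the server-mode route, a minimal sketch of calling a standalone Tika server
(assuming tika-server is running on its default port 9998; the file name is just a placeholder):

  curl -T mydoc.pdf http://localhost:9998/tika --header "Accept: text/plain"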

Best,
Erick

> On Jan 28, 2020, at 5:02 PM, Dustin Lebsock  
> wrote:
> 
> Hi!
>  
> First off, thank you for the help!
>  
> I’m currently running SolrCloud based off the helm chart found here: 
> https://github.com/helm/charts/tree/master/incubator/solr
>  
> Everything works great but I’d like to now use Tika to start indexing PDF’s 
> as well. In the documentation, its recommended to not use Solr Cell in a 
> production environment: 
> https://lucene.apache.org/solr/guide/7_7/uploading-data-with-solr-cell-using-apache-tika.html#solr-cell-performance-implications
>  
> So I have been trying to figure out a solution to have a Tika service to 
> extract the contents of the possible files and came up with an idea. I could 
> scale the amount of solr pods, have a dedicated service point to specific 
> solr-pods that do not contain any shards on them and that will only be used 
> for content extraction. That way if content-extraction goes wrong, it doesn’t 
> matter if the pod crashes. However, these nodes will still be connected to 
> ZooKeeper for the entire cluster, that way they may index the file to the 
> correct collection immediately after extraction. I’m not sure if this is how 
> SolrCloud works though.
>  
> If I send an extraction and Index request to a pod that doesn’t contain the 
> specified collection, is it extracted before being sent to the correct pod 
> for indexing? Or is it sent to a pod with the collection and then extracted? 
> If it’s the later, do you have any advice?
>  
> Thanks for the help! 
>  
> Dustin Pilkington
> Associate Software Engineer
> dustin.pilking...@bentley.com
>  
> 



Re: Can I create 1000 cores in SOLR CLOUD

2020-01-29 Thread Vignan Malyala
Guys,
Did anyone work on this type of thing?
Can you please help with this? For real time deployment and issues?

On Mon, Jan 27, 2020 at 5:29 PM Vignan Malyala  wrote:

> Hi all,
>
> We are currently using solr without cloud with 500 cores. It works good.
>
> Now we are planning to expand it using solr cloud with 1000 cores, (2
> cores for each of my client with different domain data).
>
> I'm planning to put all fields as "stored".
>
> Is it the right thought? Will it have any issues?  Will it become slow??
> How should I take care in production?
> Please help!
>
> Thanks in advance!
>
> Regards,
> VIgnan
>


Easiest way to export the entire index

2020-01-29 Thread Amanda Shuman
Dear all:

I've been asked to produce a JSON file of our index so it can be combined
and indexed with other records. (We run solr 5.3.1 on this project; we're
not going to upgrade, in part because funding has ended.) The index has
several thousand rows, but nothing too drastic. Unfortunately, this is too
much to handle for a simple query dump from the admin console. I tried to
follow instructions related to running /export directly but I guess the
export handler isn't installed. I tried to divide the query into rows, but
after a certain amount it freezes, and it also freezes when I try to limit
rows (e.g., rows 501-551 freezes the console). Is there any other way to
export the index short of having to install the export handler considering
we're not working on this project anymore?

Thanks,
Amanda

--
Dr. Amanda Shuman
Researcher and Lecturer, Institute of Chinese Studies, University of
Freiburg
Coordinator for the MA program in Modern China Studies
Database Administrator, The Maoist Legacy 
PhD, University of California, Santa Cruz
http://www.amandashuman.net/
http://www.prchistoryresources.org/
Office: +49 (0) 761 203 96748


Re: Solr Cloud on Docker?

2020-01-29 Thread Scott Stults
One of our clients has been running a big Solr Cloud (100-ish nodes, TB
index, billions of docs) in kubernetes for over a year and it's been
wonderful. I think during that time the biggest scrapes we got were when we
ran out of disk space. Performance and reliability has been solid
otherwise. Like Dwane alluded to, a lot of operations pitfalls can be
avoided if you do your Docker orchestration through kubernetes.


k/r,
Scott

On Tue, Jan 28, 2020 at 3:34 AM Dominique Bejean 
wrote:

> Hi  Dwane,
>
> Thank you for sharing this great solr/docker user story.
>
> According to your Solr/JVM memory requirements (Heap size + MetaSpace +
> OffHeap size) are you specifying specific settings in docker-compose files
> (mem_limit, mem_reservation, mem_swappiness, ...) ?
> I suppose you are limiting total memory used by all dockerised Solr in
> order to keep free memory on host for MMAPDirectory ?
>
> In short can you explain the memory management ?
>
> Regards
>
> Dominique
>
>
>
>
> Le lun. 23 déc. 2019 à 00:17, Dwane Hall  a écrit :
>
> > Hey Walter,
> >
> > I recently migrated our Solr cluster to Docker and am very pleased I did
> > so. We run relatively large servers and run multiple Solr instances per
> > physical host and having managed Solr upgrades on bare metal installs
> since
> > Solr 5, containerisation has been a blessing (currently Solr 7.7.2). In
> our
> > case we run 20 Solr nodes per host over 5 hosts totalling 100 Solr
> > instances. Here I host 3 collections of varying size. The first contains
> > 60m docs (8 shards), the second 360m (12 shards) , and the third 1.3b (30
> > shards) all with 2 NRT replicas. The docs are primarily database sourced
> > but are not tiny by any means.
> >
> > Here are some of my comments from our migration journey:
> > - Running Solr on Docker should be no different to bare metal. You still
> > need to test for your environment and conditions and follow the guides
> and
> > best practices outlined in the excellent Lucidworks blog post
> >
> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> > .
> > - The recent Solr Docker images are built with Java 11 so if you store
> > your indexes in hdfs you'll have to build your own Docker image as Hadoop
> > is not yet certified with Java 11 (or use an older Solr version image
> built
> > with Java 8)
> > - As Docker will be responsible for quite a few Solr nodes it becomes
> > important to make sure the Docker daemon is configured in systemctl to
> > restart after failure or reboot of the host. Additionally the Docker
> > restart=always setting is useful for restarting failed containers
> > automatically if a single container dies (i.e. JVM explosions). I've
> > deliberately blown up the JVM in test conditions and found the
> > containers/Solr recover really well under Docker.
> > - I use Docker Compose to spin up our environment and it has been
> > excellent for maintaining consistent settings across Solr nodes and
> hosts.
> > Additionally using a .env file makes most of the Solr environment
> variables
> > per node configurable in an external file.
> > - I'd recommend Docker Swarm if you plan on running Solr over multiple
> > physical hosts. Unfortunately we had an incompatible OS so I was unable
> to
> > utilise this approach. The same incompatibility existed for K8s but
> > Lucidworks has another great article on this approach if you're more
> > fortunate with your environment than us
> > https://lucidworks.com/post/running-solr-on-kubernetes-part-1/.
> > - Our Solr instances are TLS secured and use the basic auth plugin and
> > rules based authentication provider. There's nothing I have not been able
> > to configure with the default Docker images using environment variables
> > passed into the container. This makes upgrades to Solr versions really
> easy
> > as you just need to grab the image and pass in your environment details
> to
> > the container for any new Solr version.
> > - If possible I'd start with the Solr 8 Docker image. The project
> > underwent a large refactor to align it with the install script based on
> > community feedback. If you start with an earlier version you'll need to
> > refactor when you eventually move to Solr version 8. The Solr Docker page
> > has more details on this.
> > - Martijn Koster (the project lead) is excellent and very responsive to
> > questions on the project page. Read through the q&a page before reaching
> > out I found a lot of my questions already answered there.  Additionally,
> he
> > provides a number of example Docker configurations from command line
> > parameters to docker-compose files running multiple instances and
> zookeeper
> > quorums.
> > - The Docker extra hosts parameter is useful for adding extra hosts to
> > your containers hosts file particularly if you have multiple nic cards
> with
> > internal and external interfaces and you want to force communication
> over a
> > specific one.
> > - We use the Solr Prome

Re: Bug in scoreNodes function of streaming expressions?

2020-01-29 Thread Joel Bernstein
Hi Pratik,

I'll create the ticket now and report back. If you've got a fix please post
it to the ticket and I'll try to get this in for the next release.

Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Jan 28, 2020 at 11:52 AM pratik@semandex 
wrote:

> Joel Bernstein wrote
> > Ok, that sounds like a bug. I can create a ticket for this.
> >
> > On Mon, Jul 1, 2019 at 5:57 PM Pratik Patel <
>
> > pratik@
>
> > > wrote:
> >
> >> I think the problem was that my streaming expression was always
> returning
> >> just one node. When I added more data so that I can have more than one
> >> node, I started seeing the result.
> >>
> >> On Mon, Jul 1, 2019 at 11:21 AM Pratik Patel <
>
> > pratik@
>
> > > wrote:
> >>
> >>> Hello Everyone,
> >>>
> >>> I am trying to execute following streaming expression with "scoreNodes"
> >>> function in it. This is taken from the documentation.
> >>>
> >>> scoreNodes(top(n="50",
> >>>sort="count(*) desc",
> >>>nodes(baskets,
> >>>  random(baskets, q="productID:ABC",
> >>> fl="basketID", rows="500"),
> >>>  walk="basketID->basketID",
> >>>  fq="-productID:ABC",
> >>>  gather="productID",
> >>>  count(*
> >>>
> >>> I have ensured that I have the collection and data present for it.
> >>> Upon executing this, I am getting an error message as follows.
> >>>
> >>> "No collection param specified on request and no default collection has
> >>> been set: []"
> >>>
> >>> Upon digging into the source code I found that there is a possible bug
> >>> in
> >>> ScoreNodesStream.java
> >>>
> >>> StringBuilder instance is never appended any string and the block which
> >>> initializes collection, needs the length of that instance to be more
> >>> than
> >>> zero. This condition will always be false and hence the collection will
> >>> never be set.
> >>>
> >>> I checked this file in solr version 8.1 and that also has the same
> >>> issue.
> >>> Is there any JIRA open for this or any patch available?
> >>>
> >>> [image: image.png]
> >>>
> >>> Thanks,
> >>> Pratik
> >>>
> >>
>
>
> Hi Joel,
>
> You mentioned creating a ticket for this bug, I can't find any, was it
> created? If not then I can create one. Currently, ScoreNodes has two
> issues.
>
> 1. It fails when result has only one node.
> 2. It triggers a GET request instead of POST. GET fails if number of nodes
> is large.
>
> I have been using a custom class as workaround for #2, it would be good to
> use the original SolrJ class.
>
> Thanks,
> Pratik
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Bug in scoreNodes function of streaming expressions?

2020-01-29 Thread Joel Bernstein
Here is the ticket:
https://issues.apache.org/jira/browse/SOLR-14231


Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Jan 29, 2020 at 10:03 AM Joel Bernstein  wrote:

> Hi Pratik,
>
> I'll create the ticket now and report back. If you've got a fix please
> post it to the ticket and I'll try to get this in for the next release.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Tue, Jan 28, 2020 at 11:52 AM pratik@semandex 
> wrote:
>
>> Joel Bernstein wrote
>> > Ok, that sounds like a bug. I can create a ticket for this.
>> >
>> > On Mon, Jul 1, 2019 at 5:57 PM Pratik Patel <
>>
>> > pratik@
>>
>> > > wrote:
>> >
>> >> I think the problem was that my streaming expression was always
>> returning
>> >> just one node. When I added more data so that I can have more than one
>> >> node, I started seeing the result.
>> >>
>> >> On Mon, Jul 1, 2019 at 11:21 AM Pratik Patel <
>>
>> > pratik@
>>
>> > > wrote:
>> >>
>> >>> Hello Everyone,
>> >>>
>> >>> I am trying to execute following streaming expression with
>> "scoreNodes"
>> >>> function in it. This is taken from the documentation.
>> >>>
>> >>> scoreNodes(top(n="50",
>> >>>sort="count(*) desc",
>> >>>nodes(baskets,
>> >>>  random(baskets, q="productID:ABC",
>> >>> fl="basketID", rows="500"),
>> >>>  walk="basketID->basketID",
>> >>>  fq="-productID:ABC",
>> >>>  gather="productID",
>> >>>  count(*
>> >>>
>> >>> I have ensured that I have the collection and data present for it.
>> >>> Upon executing this, I am getting an error message as follows.
>> >>>
>> >>> "No collection param specified on request and no default collection
>> has
>> >>> been set: []"
>> >>>
>> >>> Upon digging into the source code I found that there is a possible bug
>> >>> in
>> >>> ScoreNodesStream.java
>> >>>
>> >>> StringBuilder instance is never appended any string and the block
>> which
>> >>> initializes collection, needs the length of that instance to be more
>> >>> than
>> >>> zero. This condition will always be false and hence the collection
>> will
>> >>> never be set.
>> >>>
>> >>> I checked this file in solr version 8.1 and that also has the same
>> >>> issue.
>> >>> Is there any JIRA open for this or any patch available?
>> >>>
>> >>> [image: image.png]
>> >>>
>> >>> Thanks,
>> >>> Pratik
>> >>>
>> >>
>>
>>
>> Hi Joel,
>>
>> You mentioned creating a ticket for this bug, I can't find any, was it
>> created? If not then I can create one. Currently, ScoreNodes has two
>> issues.
>>
>> 1. It fails when result has only one node.
>> 2. It triggers a GET request instead of POST. GET fails if number of nodes
>> is large.
>>
>> I have been using a custom class as workaround for #2, it would be good to
>> use the original SolrJ class.
>>
>> Thanks,
>> Pratik
>>
>>
>>
>> --
>> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>
>


Re: Easiest way to export the entire index

2020-01-29 Thread Emir Arnautović
Hi Amanda,
I assume that you have all the fields stored so you will be able to export full 
documents.

Several thousand records should not be too much for regular start+rows 
pagination, but the proper way of doing that would be to use cursors. 
Adjust the page size to avoid creating huge responses, and you can use curl or some 
similar tool to avoid using the admin console. I did a quick search and there are 
several blog posts with scripts that do what you need.
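
A minimal sketch of cursor-based paging (host and collection are placeholders; the sort 
must include the uniqueKey field):

  curl "http://localhost:8983/solr/<collection>/select?q=*:*&rows=500&sort=id+asc&cursorMark=*&wt=json" > page1.json

Take nextCursorMark from each response and pass it as cursorMark on the next request, 
until it stops changing.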

HTH,
Emir

--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 29 Jan 2020, at 15:43, Amanda Shuman  wrote:
> 
> Dear all:
> 
> I've been asked to produce a JSON file of our index so it can be combined
> and indexed with other records. (We run solr 5.3.1 on this project; we're
> not going to upgrade, in part because funding has ended.) The index has
> several thousand rows, but nothing too drastic. Unfortunately, this is too
> much to handle for a simple query dump from the admin console. I tried to
> follow instructions related to running /export directly but I guess the
> export handler isn't installed. I tried to divide the query into rows, but
> after a certain amount it freezes, and it also freezes when I try to limit
> rows (e.g., rows 501-551 freezes the console). Is there any other way to
> export the index short of having to install the export handler considering
> we're not working on this project anyone?
> 
> Thanks,
> Amanda
> 
> --
> Dr. Amanda Shuman
> Researcher and Lecturer, Institute of Chinese Studies, University of
> Freiburg
> Coordinator for the MA program in Modern China Studies
> Database Administrator, The Maoist Legacy 
> PhD, University of California, Santa Cruz
> http://www.amandashuman.net/
> http://www.prchistoryresources.org/
> Office: +49 (0) 761 203 96748



Re: Easiest way to export the entire index

2020-01-29 Thread David Hastings
I do this often and just create a 30 GB file using wget.
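
Roughly like this (collection name and row count are placeholders; list the fields you
need with fl):

  wget -O export.json "http://localhost:8983/solr/<collection>/select?q=*:*&wt=json&rows=1000000&fl=id,field1,field2"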

On Wed, Jan 29, 2020 at 10:21 AM Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Amanda,
> I assume that you have all the fields stored so you will be able to export
> full document.
>
> Several thousands records should not be too much to use regular start+rows
> to paginate results, but the proper way of doing that would be to use
> cursors. Adjust page size to avoid creating huge responses and you can use
> curl or some similar tool to avoid using admin console. I did a quick
> search and there are several blog posts with scripts that does what you
> need.
>
> HTH,
> Emir
>
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 29 Jan 2020, at 15:43, Amanda Shuman  wrote:
> >
> > Dear all:
> >
> > I've been asked to produce a JSON file of our index so it can be combined
> > and indexed with other records. (We run solr 5.3.1 on this project; we're
> > not going to upgrade, in part because funding has ended.) The index has
> > several thousand rows, but nothing too drastic. Unfortunately, this is
> too
> > much to handle for a simple query dump from the admin console. I tried to
> > follow instructions related to running /export directly but I guess the
> > export handler isn't installed. I tried to divide the query into rows,
> but
> > after a certain amount it freezes, and it also freezes when I try to
> limit
> > rows (e.g., rows 501-551 freezes the console). Is there any other way to
> > export the index short of having to install the export handler
> considering
> > we're not working on this project anyone?
> >
> > Thanks,
> > Amanda
> >
> > --
> > Dr. Amanda Shuman
> > Researcher and Lecturer, Institute of Chinese Studies, University of
> > Freiburg
> > Coordinator for the MA program in Modern China Studies
> > Database Administrator, The Maoist Legacy 
> > PhD, University of California, Santa Cruz
> > http://www.amandashuman.net/
> > http://www.prchistoryresources.org/
> > Office: +49 (0) 761 203 96748
>
>


Re: Bug in scoreNodes function of streaming expressions?

2020-01-29 Thread Pratik Patel
Thanks a lot. I will update the ticket with more details if appropriate.

Pratik

On Wed, Jan 29, 2020 at 10:07 AM Joel Bernstein  wrote:

> Here is the ticket:
> https://issues.apache.org/jira/browse/SOLR-14231
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Wed, Jan 29, 2020 at 10:03 AM Joel Bernstein 
> wrote:
>
> > Hi Pratik,
> >
> > I'll create the ticket now and report back. If you've got a fix please
> > post it to the ticket and I'll try to get this in for the next release.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Tue, Jan 28, 2020 at 11:52 AM pratik@semandex 
> > wrote:
> >
> >> Joel Bernstein wrote
> >> > Ok, that sounds like a bug. I can create a ticket for this.
> >> >
> >> > On Mon, Jul 1, 2019 at 5:57 PM Pratik Patel <
> >>
> >> > pratik@
> >>
> >> > > wrote:
> >> >
> >> >> I think the problem was that my streaming expression was always
> >> returning
> >> >> just one node. When I added more data so that I can have more than
> one
> >> >> node, I started seeing the result.
> >> >>
> >> >> On Mon, Jul 1, 2019 at 11:21 AM Pratik Patel <
> >>
> >> > pratik@
> >>
> >> > > wrote:
> >> >>
> >> >>> Hello Everyone,
> >> >>>
> >> >>> I am trying to execute following streaming expression with
> >> "scoreNodes"
> >> >>> function in it. This is taken from the documentation.
> >> >>>
> >> >>> scoreNodes(top(n="50",
> >> >>>sort="count(*) desc",
> >> >>>nodes(baskets,
> >> >>>  random(baskets, q="productID:ABC",
> >> >>> fl="basketID", rows="500"),
> >> >>>  walk="basketID->basketID",
> >> >>>  fq="-productID:ABC",
> >> >>>  gather="productID",
> >> >>>  count(*
> >> >>>
> >> >>> I have ensured that I have the collection and data present for it.
> >> >>> Upon executing this, I am getting an error message as follows.
> >> >>>
> >> >>> "No collection param specified on request and no default collection
> >> has
> >> >>> been set: []"
> >> >>>
> >> >>> Upon digging into the source code I found that there is a possible
> bug
> >> >>> in
> >> >>> ScoreNodesStream.java
> >> >>>
> >> >>> StringBuilder instance is never appended any string and the block
> >> which
> >> >>> initializes collection, needs the length of that instance to be more
> >> >>> than
> >> >>> zero. This condition will always be false and hence the collection
> >> will
> >> >>> never be set.
> >> >>>
> >> >>> I checked this file in solr version 8.1 and that also has the same
> >> >>> issue.
> >> >>> Is there any JIRA open for this or any patch available?
> >> >>>
> >> >>> [image: image.png]
> >> >>>
> >> >>> Thanks,
> >> >>> Pratik
> >> >>>
> >> >>
> >>
> >>
> >> Hi Joel,
> >>
> >> You mentioned creating a ticket for this bug, I can't find any, was it
> >> created? If not then I can create one. Currently, ScoreNodes has two
> >> issues.
> >>
> >> 1. It fails when result has only one node.
> >> 2. It triggers a GET request instead of POST. GET fails if number of
> nodes
> >> is large.
> >>
> >> I have been using a custom class as workaround for #2, it would be good
> to
> >> use the original SolrJ class.
> >>
> >> Thanks,
> >> Pratik
> >>
> >>
> >>
> >> --
> >> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >>
> >
>


Re: Solr fact response strange behaviour

2020-01-29 Thread Jason Gerlowski
Hey Adi,

There was a separate JIRA for this on the SolrJ objects it sounds like
you're using: SOLR-13780.  That JIRA was fixed, apparently in 8.3, so
I'm surprised you're still seeing the issue.  If you include the full
stacktrace and a snippet of code to reproduce, I'm curious to take a
look.

That won't help you in the short term though.  For that, yes, you'll
have to use ((Number)count).longValue() in the interim.

Best,

Jason

On Tue, Jan 28, 2020 at 2:20 AM Kaminski, Adi  wrote:
>
> Thanks Mikhail  !
>
> In the issue comments you shared it seems that Yonik S doesn't agree
> it's a defect... so it will probably remain open for a while.
>
>
>
> So meanwhile, is it recommended to perform casting 
> ((Number)count).longValue()  to our relevant logic ?
>
>
>
> Thanks,
> Adi
>
>
>
> -Original Message-
> From: Mikhail Khludnev 
> Sent: Tuesday, January 28, 2020 9:14 AM
> To: solr-user 
> Subject: Re: Solr fact response strange behaviour
>
>
>
> https://issues.apache.org/jira/browse/SOLR-11775
>
>
>
> On Tue, Jan 28, 2020 at 10:00 AM Kaminski, Adi 
> mailto:adi.kamin...@verint.com>>
>
> wrote:
>
>
>
> > Is it existing issue and tracked for future fix consideration ?
>
> >
>
> > What's the suggestion as a W/A until the fix - to cast every related
>
> > response with ((Number)count).longValue() ?
>
> >
>
> > -Original Message-
>
> > From: Mikhail Khludnev mailto:m...@apache.org>>
>
> > Sent: Tuesday, January 28, 2020 8:53 AM
>
> > To: solr-user 
> > mailto:solr-user@lucene.apache.org>>
>
> > Subject: Re: Solr fact response strange behaviour
>
> >
>
> > I suppose there's an issue, which no one ever took a look.
>
> >
>
> > https://lucene.472066.n3.nabble.com/JSON-facets-count-a-long-or-an-int
>
> > eger-in-cloud-and-non-cloud-modes-td4265291.html
>
> >
>
> >
>
> > On Mon, Jan 27, 2020 at 11:47 PM Kaminski, Adi
>
> > mailto:adi.kamin...@verint.com>>
>
> > wrote:
>
> >
>
> > > SolrJ client is used of SolrCloud of Solr 8.3 version for JSON
>
> > > Facets requests...any idea why not consistent ?
>
> > >
>
> > > Sent from Workspace ONE Boxer
>
> > >
>
> > > On Jan 27, 2020 22:13, Mikhail Khludnev 
> > > mailto:m...@apache.org>> wrote:
>
> > > Hello,
>
> > > It might be different between SolrCloud and standalone mode. No data
>
> > > enough to make a conclusion.
>
> > >
>
> > > On Mon, Jan 27, 2020 at 5:40 PM Rudenko, Artur
>
> > > mailto:artur.rude...@verint.com>>
>
> > > wrote:
>
> > >
>
> > > > I'm trying to parse facet response, but sometimes the count
>
> > > > returns as Long type and sometimes as Integer type(on different
>
> > > > environments), The error is:
>
> > > > "java.lang.ClassCastException: java.lang.Integer cannot be cast to
>
> > > > java.lang.Long"
>
> > > >
>
> > > > Can you please explain why this happenes? Why it not consistent?
>
> > > >
>
> > > > I know the workaround to use Number class and longValue method but
>
> > > > I want to to the root cause before using this workaround
>
> > > >
>
> > > > Artur Rudenko
>
> > > >
>
> > > >
>
> > > >
>
> > > > This electronic message may contain proprietary and confidential
>
> > > > information of Verint Systems Inc., its affiliates and/or subsidiaries.
>
> > > The
>
> > > > information is intended to be for the use of the individual(s) or
>
> > > > entity(ies) named above. If you are not the intended recipient (or
>
> > > > authorized to receive this e-mail for the intended recipient), you
>
> > > > may
>
> > > not
>
> > > > use, copy, disclose or distribute to anyone this message or any
>
> > > information
>
> > > > contained in this message. If you have received this electronic
>
> > > > message
>
> > > in
>
> > > > error, please notify us by replying to this e-mail.
>
> > > >
>
> > >
>
> > >
>
> > > --
>
> > > Sincerely yours
>
> > > Mikhail Khludnev
>
> > >
>
> > >
>
> > > This electronic message may contain proprietary and confidential
>
> > > information of Verint Systems Inc., its affiliates and/or
>
> > > subsidiaries. The information is intended to be for the use of the
>
> > > individual(s) or
>
> > > entity(ies) named above. If you are not the intended recipient (or
>
> > > authorized to receive this e-mail for the intended recipient), you
>
> > > may not use, copy, disclose or distribute to anyone this message or
>
> > > any information contained in this message. If you have received this
>
> > > electronic message in error, please notify us by replying to this e-mail.
>
> > >
>
> >
>
> >
>
> > --
>
> > Sincerely yours
>
> > Mikhail Khludnev
>
> >
>
> >
>
> > This electronic message may contain proprietary and confidential
>
> > information of Verint Systems Inc., its affiliates and/or
>
> > subsidiaries. The information is intended to be for the use of the
>
> > individual(s) or
>
> > entity(ies) named above. If you are not the intended recipient (or
>
> > authorized to receive this e-mail for the intended recipient), you may
>
> > not use, copy, disclose or distribute to anyone this message or any
>
> > information conta

Solr Nested Documents not properly working

2020-01-29 Thread Yirmiyahu Fischer
Could you please answer my question on
https://stackoverflow.com/questions/59566421/solr-nested-documents-not-properly-setup

Thank you.
Yirmiyahu Fischer
Senior Developer
Signature IT


Re: Performance Issue since Solr 7.7 with wt=javabin

2020-01-29 Thread Florent Sithi
yes thanks so much, fixed in 8.4.0



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Easiest way to export the entire index

2020-01-29 Thread Steve Ge
@Amanda
You can try using curl and writing the output to a file:
  curl "http://localhost:8983/solr/<collection>/select?q=<theSolrQuery>" > out.json
  theSolrQuery - you need to specify all the attributes you want exported, not just *
If you are on Windows, there is a Windows curl tool you can download and use.




Steve  
 
  On Wed, Jan 29, 2020 at 10:21 AM, Emir Arnautović wrote:
Hi Amanda,
I assume that you have all the fields stored so you will be able to export full 
document.

Several thousands records should not be too much to use regular start+rows to 
paginate results, but the proper way of doing that would be to use cursors. 
Adjust page size to avoid creating huge responses and you can use curl or some 
similar tool to avoid using admin console. I did a quick search and there are 
several blog posts with scripts that does what you need.

HTH,
Emir

--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 29 Jan 2020, at 15:43, Amanda Shuman  wrote:
> 
> Dear all:
> 
> I've been asked to produce a JSON file of our index so it can be combined
> and indexed with other records. (We run solr 5.3.1 on this project; we're
> not going to upgrade, in part because funding has ended.) The index has
> several thousand rows, but nothing too drastic. Unfortunately, this is too
> much to handle for a simple query dump from the admin console. I tried to
> follow instructions related to running /export directly but I guess the
> export handler isn't installed. I tried to divide the query into rows, but
> after a certain amount it freezes, and it also freezes when I try to limit
> rows (e.g., rows 501-551 freezes the console). Is there any other way to
> export the index short of having to install the export handler considering
> we're not working on this project anyone?
> 
> Thanks,
> Amanda
> 
> --
> Dr. Amanda Shuman
> Researcher and Lecturer, Institute of Chinese Studies, University of
> Freiburg
> Coordinator for the MA program in Modern China Studies
> Database Administrator, The Maoist Legacy 
> PhD, University of California, Santa Cruz
> http://www.amandashuman.net/
> http://www.prchistoryresources.org/
> Office: +49 (0) 761 203 96748
  


KeeperErrorCode= BadVersion

2020-01-29 Thread Rajeswari Natarajan
Hi,

Getting the below exception. We have SolrCloud 7.6 installed and have commented
out the below in solrconfig.xml:



What could be the reason?

Thanks,
Rajeswari


2020-01-17T13:03:40.84206185Z 2020-01-17 13:03:40,841 [myid:5] - INFO
[ProcessThread(sid:5 cport:-1)::PrepRequestProcessor@653] - Got user-level
KeeperException when processing sessionid:0x4065b74bfde04ef type:setData
cxid:0x559 zxid:0x500551395 txntype:-1 reqpath:n/a Error
Path:/collections/testcollection/terms/shard1 Error:KeeperErrorCode =
BadVersion for /collections/testcollection/terms/shard1


Unable to get ICUFoldingFilterFactory class loaded in unsecured 8.4.1 SolrCloud

2020-01-29 Thread Andy C
I have a schema currently used with Solr 7.3.1 that uses the ICU contrib
extensions. Previously I used a <lib> directive in solrconfig.xml to
load the icu4j and lucene-analyzers-icu jars.

The 8.4 upgrade notes indicate that this approach is no longer supported
for SolrCloud unless you enable authentication. So I removed the <lib>
directive from solrconfig.xml.

I tried creating a 'lib' directory underneath solr-8.4.1\server\solr and
copying the jars there. However I get a ClassNotFoundException for
ICUFoldingFilterFactory class when I try creating a collection using the
uploaded configset. Adding an explicit "lib"
entry to the solr.xml (kept outside zookeeper), didn't help. (Note: both
these approaches work with a standalone 8.4.1 Solr instance).

I tried copying the 2 jars into the one of directories that are part of the
standard classpath, but that seems to cause problems with the class loader,
as I start getting a NoClassDefFoundError:
org/apache/lucene/analysis/util/ResourceLoaderAware exception.

Any suggestions?

Thanks,
- Andy -


Clarity on Stable Release

2020-01-29 Thread Jeff
TL;DR: I am having difficulty deciding on a release that is stable to
use and would like this to be easier.

Recently it has been rather difficult to figure out what release to use
based on its stability. This is probably in part because of the rapid
release cadence and also the versioning being employed upon a release.

To demonstrate what I mean, let me walk through some of the process we've
had for determining what version to use starting at version 8.1.0:
1) 8.1.0 could not be used because of an NPE (SOLR-13475), so we upgraded to
8.1.1
2) 8.1.1 could not be used because of intermittent 401s (SOLR-13510) so we
looked for a patch version 8.1.2 - which does not exist. So instead we
looked into upgrading to 8.2.0 (which includes new features and
improvements alongside bug fixes).
3) 8.2.0 is fine except for CVE-2019-12409 caused by a bad configuration.
This is still a good stable candidate if the configuration is simply
changed (or Solr is properly secured through networking measures anyway).
4) 8.3.0 contains a bug that causes data loss during inter-node updates
(SOLR-13963), so we must use patch version 8.3.1
5) Versions 8.4.0 and 8.4.1 have since been released and they seem stable
so far.

Now, we are considering 8.2.0, 8.3.1, or 8.4.1 to use as they seem to be
stable. But it is hard to determine if we should be using the bleeding edge
or a few minor versions back, since each of these includes many bug fixes.
It is unclear to me why some fixes get back-patched and why some are
released under new minor version changes (which include some hefty
improvements and features).

To clarify, I am mostly asking for some clarity on which versions *should*
be used for a stable system, and whether we can somehow make this clearer in
the future. I am not trying to point the finger at specific bugs, but am
simply using them as examples as to why it is hard to determine a release
as stable.

If anybody has insight on this, please let me know.


RE: Solr fact response strange behaviour

2020-01-29 Thread Kaminski, Adi
Sure, thanks for the guidance and the assistance anyway.

Here is the stack trace:
[29/01/20 08:09:41:041 IST] [http-nio-8080-exec-2] ERROR api.BaseAPI: There was 
an Exception calling Solr
java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
at 
com.productcore.analytics.api.AutoCompleteAPI.lambda$mapSolrResponse$0(AutoCompleteAPI.java:170)
 ~[classes/:?]
at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) 
~[?:1.8.0_201]
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) 
~[?:1.8.0_201]
at 
com.productcore.analytics.api.AutoCompleteAPI.mapSolrResponse(AutoCompleteAPI.java:167)
 ~[classes/:?]
at com.productcore.analytics.api.BaseAPI.execute(BaseAPI.java:48) [classes/:?]
at 
com.productcore.analytics.controllers.DalController.getAutocomplete(DalController.java:205)
 [classes/:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_201]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_201]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_201]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_201]
at 
org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:189)
 [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at 
org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138)
 [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at 
org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
 [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at 
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:892)
 [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at 
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:797)
 [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at 
org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
 [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at 
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1038)
 [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at 
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:942)
 [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at 
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1005)
 [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at 
org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:908)
 [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:660) 
[tomcat-embed-core-9.0.17.jar:9.0.17]
at 
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:882)
 [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:741) 
[tomcat-embed-core-9.0.17.jar:9.0.17]
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
 [tomcat-embed-core-9.0.17.jar:9.0.17]
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
 [tomcat-embed-core-9.0.17.jar:9.0.17]
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53) 
[tomcat-embed-websocket-9.0.17.jar:9.0.17]
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
 [tomcat-embed-core-9.0.17.jar:9.0.17]
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
 [tomcat-embed-core-9.0.17.jar:9.0.17]
at 
org.springframework.boot.actuate.web.trace.servlet.HttpTraceFilter.doFilterInternal(HttpTraceFilter.java:90)
 [spring-boot-actuator-2.1.4.RELEASE.jar:2.1.4.RELEASE]
at 
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
 [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
 [tomcat-embed-core-9.0.17.jar:9.0.17]
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
 [tomcat-embed-core-9.0.17.jar:9.0.17]
at 
org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99)
 [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at 
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
 [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
 [tomcat-embed-core-9.0.17.jar:9.0.17]
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)

For Hierarchical data structure is Graph Query a good option ?

2020-01-29 Thread sambasivarao giddaluri
Hi ,


I have data in a hierarchical structure, e.g.:

parent --> children --> grandchildren


Usecase:

Get parent docs by adding filters on children and grandchildren, or get
grandchildren docs by adding filters on parent and children.


To accommodate this use case I have flattened the docs by adding a
reference (parent) in the children, and similarly (parent and children) in
the grandchildren docs, and used the graph query parser to join the data using
from/to fields (a rough sketch of the kind of query I mean is below).
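
Field and collection names here are only placeholders for illustration:

  curl "http://localhost:8983/solr/<collection>/select" \
    --data-urlencode 'q={!graph from=parentRef to=id}docType:grandchild' \
    --data-urlencode 'fq=docType:parent'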


But this gets complicated as we add filters with AND and OR conditions.


Is there any other approach which can solve these kinds of use cases?


Regards

sam


Re: Solr fact response strange behaviour

2020-01-29 Thread Jason Gerlowski
Thanks Adi,

There's no SolrJ code in your stacktrace, so this was something other
than SOLR-13780 apparently.  Best of luck!

Jason

On Wed, Jan 29, 2020 at 1:28 PM Kaminski, Adi  wrote:
>
> Sure, thanks for the guidance and the assistance anyway.
>
> Here is the stack trace:
> [29/01/20 08:09:41:041 IST] [http-nio-8080-exec-2] ERROR api.BaseAPI: There 
> was an Exception calling Solr
> java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Long
> at 
> com.productcore.analytics.api.AutoCompleteAPI.lambda$mapSolrResponse$0(AutoCompleteAPI.java:170)
>  ~[classes/:?]
> at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>  ~[?:1.8.0_201]
> at 
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) 
> ~[?:1.8.0_201]
> at 
> com.productcore.analytics.api.AutoCompleteAPI.mapSolrResponse(AutoCompleteAPI.java:167)
>  ~[classes/:?]
> at com.productcore.analytics.api.BaseAPI.execute(BaseAPI.java:48) [classes/:?]
> at 
> com.productcore.analytics.controllers.DalController.getAutocomplete(DalController.java:205)
>  [classes/:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_201]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_201]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_201]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_201]
> at 
> org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:189)
>  [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at 
> org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138)
>  [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at 
> org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
>  [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at 
> org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:892)
>  [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at 
> org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:797)
>  [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at 
> org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
>  [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at 
> org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1038)
>  [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at 
> org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:942)
>  [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at 
> org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1005)
>  [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at 
> org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:908)
>  [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:660) 
> [tomcat-embed-core-9.0.17.jar:9.0.17]
> at 
> org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:882)
>  [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:741) 
> [tomcat-embed-core-9.0.17.jar:9.0.17]
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
>  [tomcat-embed-core-9.0.17.jar:9.0.17]
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
>  [tomcat-embed-core-9.0.17.jar:9.0.17]
> at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53) 
> [tomcat-embed-websocket-9.0.17.jar:9.0.17]
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
>  [tomcat-embed-core-9.0.17.jar:9.0.17]
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
>  [tomcat-embed-core-9.0.17.jar:9.0.17]
> at 
> org.springframework.boot.actuate.web.trace.servlet.HttpTraceFilter.doFilterInternal(HttpTraceFilter.java:90)
>  [spring-boot-actuator-2.1.4.RELEASE.jar:2.1.4.RELEASE]
> at 
> org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
>  [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
>  [tomcat-embed-core-9.0.17.jar:9.0.17]
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
>  [tomcat-embed-core-9.0.17.jar:9.0.17]
> at 
> org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99)
>  [spring-web-5.1.6.RELEASE.jar:5.1.

Solr Searcher 100% Latency Spike

2020-01-29 Thread Karl Stoney
Hi All,
Looking for a bit of support here.  When we soft commit (every 10 minutes), we 
get a latency spike that means response times for Solr are roughly double, as 
you can see in this screenshot:

[screenshot]

These do correlate to GC spikes (albeit not particularly bad):
[screenshot]

But don't really correlate to disk or cpu/ram stress:
[screenshot]

They do correlate to filterCache warmup, which seems to take between 10s and 30s:

We don't have any other caches enabled, due to the high level of cardinality of 
the queries.

The spikes are specifically on /select

We have the following autowarm configuration for the filterCache:

<filterCache size="8192"
             initialSize="8192"
             cleanupThread="true"
             autowarmCount="900"/>

And some suitable queries in our newSearcher warmup config.

I'm at a loss as to what else to do to try and minimise these spikes.  Does anyone 
have any ideas?

Thanks
This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 
Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 
9439967). This email and any files transmitted with it are confidential and may 
be legally privileged, and intended solely for the use of the individual or 
entity to whom they are addressed. If you have received this email in error 
please notify the sender. This email message has been swept for the presence of 
computer viruses.


Re: Performance Issue since Solr 7.7 with wt=javabin

2020-01-29 Thread Karl Stoney
Could anyone produce a patch for 7.7 please?

From: Florent Sithi 
Sent: 29 January 2020 14:34
To: solr-user@lucene.apache.org 
Subject: Re: Performance Issue since Solr 7.7 with wt=javabin

yes thanks so much, fixed in 8.4.0



--
Sent from: 
https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 
Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 
9439967). This email and any files transmitted with it are confidential and may 
be legally privileged, and intended solely for the use of the individual or 
entity to whom they are addressed. If you have received this email in error 
please notify the sender. This email message has been swept for the presence of 
computer viruses.


Re: Operation backup caused exception : AccessDeniedException

2020-01-29 Thread Shawn Heisey

On 1/29/2020 3:26 AM, Salmaan Rashid Syed wrote:

I was trying to execute the backup command using curl command on my work
computer to see why EC2 instance was giving the previous error. On my
current computer, I have root privileges. But when I execute the command on
my work computer, I have a different problem. It states that the
path/folder doesn't exist as follows.


It makes no difference what user makes the HTTP request.  OS user 
information is not transmitted with the request.



I was reading the documentation for Backup and restore of solr. The
location should be a shared drive for the backup. Can I not store it on my
hard drive?.


If you have only one Solr server, then you can use its hard drive.  But 
if you have more than one Solr instance running on different machines, 
they all must access the same storage at the same mount point, so that 
when multiple servers execute the backup, the data is all sent to the 
same location.



But, when I tried downloading the backup to my drive on computer, it worked
fine previously. But, now it refuses to download the solr backup file to my
hard drive.


The backup location must exist on the Solr server(s).  The data will NOT 
be sent to a location on a workstation that makes the HTTP request.
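
For example, a minimal sketch of a backup call where the target directory
already exists on the Solr host (collection name, credentials and path here are
hypothetical):

  # /mnt/solr-backups must exist on, and be writable by, every Solr node,
  # typically a shared NFS mount when more than one server is involved.
  curl --user solr-admin:password "http://localhost:8983/solr/admin/collections?action=BACKUP&name=mybackup&collection=mycollection&location=/mnt/solr-backups"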



The only change that the Solr underwent from the previous time is that I
have enabled authentication on it.

Is authentication causing all these problems?.


I do not know.  I would not expect it to.  I have never used Solr's 
authentication mechanisms.


Thanks,
Shawn


Re: Can I create 1000 cores in SOLR CLOUD

2020-01-29 Thread Shawn Heisey

On 1/27/2020 4:59 AM, Vignan Malyala wrote:

We are currently using solr without cloud with 500 cores. It works good.

Now we are planning to expand it using solr cloud with 1000 cores, (2 cores
for each of my client with different domain data).


SolrCloud starts having scalability issues once you reach a few hundred 
collections, regardless of how many servers are in the cloud.


I explored this in SOLR-7191.  You might notice that the issue is in a 
"Resolved/Fixed" state ... but there were no changes committed, and when 
I tested it again with a later version, I saw evidence that the 
situation has gotten worse, not better.


https://issues.apache.org/jira/browse/SOLR-7191

If you already have mechanisms in place to handle high availability, you 
would be far better off NOT using SolrCloud mode.


Thanks,
Shawn


Re: Solr Searcher 100% Latency Spike

2020-01-29 Thread Shawn Heisey

On 1/29/2020 12:44 PM, Karl Stoney wrote:
Looking for a bit of support here.  When we soft commit (every 10 
minutes), we get a latency spike that means response times for solr are 
loosely double, as you can see in this screenshot:


Attachments almost never make it to the list.  We cannot see any of your 
screenshots.


They do correlate to filterCache warmup, which seem to take between 10s 
and 30s:


We don't have any other caches enabled, due to the high level of 
cardinality of the queries.


The spikes are specifically on /select


We have the following autowarm configuration for the filterCache:

         <filterCache size="8192" initialSize="8192" cleanupThread="true" autowarmCount="900"/>


Autowarm, especially on filterCache, can be an extremely lengthy 
process.  What Solr must do in order to warm the cache here is execute 
up to 900 queries, sequentially, on the new index.  That can take a lot 
of time and use a lot of resources like CPU and I/O.


In order to reduce the impact of cache warming, I had to reduce my own 
autowarmCount on the filterCache to 4.


Thanks,
Shawn


Re: Solr Searcher 100% Latency Spike

2020-01-29 Thread Karl Stoney
Hey Shawn,
Thanks for the reply - funnily enough that is exactly what I'm trialing now.  
I've significantly lowered the autoWarm (as well as the size) and still have a 
0.95+ cache hit rate through searcher loads.

I'm going to continue to tweak these values down so long as I keep the hit rate 
above 0.90, which should reduce some memory pressure at least.
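
One quick way to keep an eye on those numbers while tuning is the mbeans
endpoint (a rough sketch; the core name here is hypothetical):

  # Dump cache stats for one core and pick out the filterCache hit ratio and warmup time.
  curl -s "http://localhost:8983/solr/mycore/admin/mbeans?cat=CACHE&stats=true&wt=json&indent=true" \
    | grep -E 'hitratio|warmupTime'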

Thanks
Karl

From: Shawn Heisey 
Sent: 29 January 2020 21:01
To: solr-user@lucene.apache.org 
Subject: Re: Solr Searcher 100% Latency Spike

On 1/29/2020 12:44 PM, Karl Stoney wrote:
> Looking for a bit of support here.  When we soft commit (every 10
> minutes), we get a latency spike that means response times for solr are
> loosely double, as you can see in this screenshot:

Attachments almost never make it to the list.  We cannot see any of your
screenshots.

> They do correlate to filterCache warmup, which seem to take between 10s
> and 30s:
>
> We don't have any other caches enabled, due to the high level of
> cardinality of the queries.
>
> The spikes are specifically on /select
>
>
> We have the following autowarm configuration for the filterCache:
>
> <filterCache size="8192"
>   initialSize="8192"
>   cleanupThread="true"
>   autowarmCount="900"/>

Autowarm, especially on filterCache, can be an extremely lengthy
process.  What Solr must do in order to warm the cache here is execute
up to 900 queries, sequentially, on the new index.  That can take a lot
of time and use a lot of resources like CPU and I/O.

In order to reduce the impact of cache warming, I had to reduce my own
autowarmCount on the filterCache to 4.

Thanks,
Shawn

This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 
Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 
9439967). This email and any files transmitted with it are confidential and may 
be legally privileged, and intended solely for the use of the individual or 
entity to whom they are addressed. If you have received this email in error 
please notify the sender. This email message has been swept for the presence of 
computer viruses.


Re: Solr Searcher 100% Latency Spike

2020-01-29 Thread Walter Underwood
I use a static set of warming queries, about 20 of them. That is fast and gets 
a decent amount of the index into file buffers. Your top queries won’t change 
much unless you have a news site or a seasonal business.

Like this:


  

  
  introduction
  intermediate
  fundamentals
  understanding
  introductory
  precalculus
  foundations
  microeconomics
  microbiology
  macroeconomics
  discovering
  international
  mathematics
  organizational
  criminology
  developmental
  engineering

  


wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 29, 2020, at 1:01 PM, Shawn Heisey  wrote:
> 
> On 1/29/2020 12:44 PM, Karl Stoney wrote:
>> Looking for a bit of support here.  When we soft commit (every 10 minutes), 
>> we get a latency spike that means response times for solr are loosely 
>> double, as you can see in this screenshot:
> 
> Attachments almost never make it to the list.  We cannot see any of your 
> screenshots.
> 
>> They do correlate to filterCache warmup, which seem to take between 10s and 
>> 30s:
>> We don't have any other caches enabled, due to the high level of cardinality 
>> of the queries.
>> The spikes are specifically on /select
>> We have the following autowarm configuration for the filterCache:
>> <filterCache size="8192"
>>  initialSize="8192"
>>  cleanupThread="true"
>>  autowarmCount="900"/>
> 
> Autowarm, especially on filterCache, can be an extremely lengthy process.  
> What Solr must do in order to warm the cache here is execute up to 900 
> queries, sequentially, on the new index.  That can take a lot of time and use 
> a lot of resources like CPU and I/O.
> 
> In order to reduce the impact of cache warming, I had to reduce my own 
> autowarmCount on the filterCache to 4.
> 
> Thanks,
> Shawn



Re: Solr Searcher 100% Latency Spike

2020-01-29 Thread Karl Stoney
Out of curiosity, could you define "fast"?
I'm wondering what sort of figures people target their searcher warm time at

From: Walter Underwood 
Sent: 29 January 2020 21:13
To: solr-user@lucene.apache.org 
Subject: Re: Solr Searcher 100% Latency Spike

I use a static set of warming queries, about 20 of them. That is fast and gets 
a decent amount of the index into file buffers. Your top queries won’t change 
much unless you have a news site or a seasonal business.

Like this:


  

  
  introduction
  intermediate
  fundamentals
  understanding
  introductory
  precalculus
  foundations
  microeconomics
  microbiology
  macroeconomics
  discovering
  international
  mathematics
  organizational
  criminology
  developmental
  engineering

  


wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 29, 2020, at 1:01 PM, Shawn Heisey  wrote:
>
> On 1/29/2020 12:44 PM, Karl Stoney wrote:
>> Looking for a bit of support here.  When we soft commit (every 10 minutes), 
>> we get a latency spike that means response times for solr are loosely 
>> double, as you can see in this screenshot:
>
> Attachments almost never make it to the list.  We cannot see any of your 
> screenshots.
>
>> They do correlate to filterCache warmup, which seem to take between 10s and 
>> 30s:
>> We don't have any other caches enabled, due to the high level of cardinality 
>> of the queries.
>> The spikes are specifically on /select
>> We have the following autowarm configuration for the filterCache:
>> <filterCache size="8192"
>>  initialSize="8192"
>>  cleanupThread="true"
>>  autowarmCount="900"/>
>
> Autowarm, especially on filterCache, can be an extremely lengthy process.  
> What Solr must do in order to warm the cache here is execute up to 900 
> queries, sequentially, on the new index.  That can take a lot of time and use 
> a lot of resources like CPU and I/O.
>
> In order to reduce the impact of cache warming, I had to reduce my own 
> autowarmCount on the filterCache to 4.
>
> Thanks,
> Shawn

This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 
Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 
9439967). This email and any files transmitted with it are confidential and may 
be legally privileged, and intended solely for the use of the individual or 
entity to whom they are addressed. If you have received this email in error 
please notify the sender. This email message has been swept for the presence of 
computer viruses.


Re: Solr Searcher 100% Latency Spike

2020-01-29 Thread Walter Underwood
Looking at the log, that takes one or two seconds after a complete batch reload 
(master/slave). So that is loading a cold index, all new files. This is not a 
big index, about a half million book titles.
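
If it helps for comparison, one way to read the warm time of the current
searcher is the metrics API (a sketch, assuming Solr 7 or later; host, port and
core names will differ):

  # Reports SEARCHER.searcher.warmupTime (in milliseconds) per core registry.
  curl -s "http://localhost:8983/solr/admin/metrics?group=core&prefix=SEARCHER.searcher.warmupTime&wt=json&indent=true"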

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 29, 2020, at 1:21 PM, Karl Stoney 
>  wrote:
> 
> Out of curiosity, could you define "fast"?
> I'm wondering what sort of figures people target their searcher warm time at
> 
> From: Walter Underwood 
> Sent: 29 January 2020 21:13
> To: solr-user@lucene.apache.org 
> Subject: Re: Solr Searcher 100% Latency Spike
> 
> I use a static set of warming queries, about 20 of them. That is fast and 
> gets a decent amount of the index into file buffers. Your top queries won’t 
> change much unless you have a news site or a seasonal business.
> 
> Like this:
> 
>
>  
>
>  
>  introduction
>  intermediate
>  fundamentals
>  understanding
>  introductory
>  precalculus
>  foundations
>  microeconomics
>  microbiology
>  macroeconomics
>  discovering
>  international
>  mathematics
>  organizational
>  criminology
>  developmental
>  engineering
>
>  
>
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Jan 29, 2020, at 1:01 PM, Shawn Heisey  wrote:
>> 
>> On 1/29/2020 12:44 PM, Karl Stoney wrote:
>>> Looking for a bit of support here.  When we soft commit (every 10 minutes), 
>>> we get a latency spike that means response times for solr are loosely 
>>> double, as you can see in this screenshot:
>> 
>> Attachments almost never make it to the list.  We cannot see any of your 
>> screenshots.
>> 
>>> They do correlate to filterCache warmup, which seem to take between 10s and 
>>> 30s:
>>> We don't have any other caches enabled, due to the high level of 
>>> cardinality of the queries.
>>> The spikes are specifically on /select
>>> We have the following autowarm configuration for the filterCache:
>>> <filterCache size="8192"
>>> initialSize="8192"
>>> cleanupThread="true"
>>> autowarmCount="900"/>
>> 
>> Autowarm, especially on filterCache, can be an extremely lengthy process.  
>> What Solr must do in order to warm the cache here is execute up to 900 
>> queries, sequentially, on the new index.  That can take a lot of time and 
>> use a lot of resources like CPU and I/O.
>> 
>> In order to reduce the impact of cache warming, I had to reduce my own 
>> autowarmCount on the filterCache to 4.
>> 
>> Thanks,
>> Shawn
> 
> This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 
> Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 
> 9439967). This email and any files transmitted with it are confidential and 
> may be legally privileged, and intended solely for the use of the individual 
> or entity to whom they are addressed. If you have received this email in 
> error please notify the sender. This email message has been swept for the 
> presence of computer viruses.



Re: Solr Searcher 100% Latency Spike

2020-01-29 Thread Karl Stoney
So interestingly, tweaking my filter cache I've got the warming time down to 1s 
(from 10s!) and also reduced my memory footprint due to the smaller cache size.

However, I still get these latency spikes (these changes have made no 
difference to them).

So the theory about them being due to the warming being too intensive is wrong.

I know the images didn't load btw so when I say spike I mean p95th response 
time going from 50ms to 100-120ms momentarily.

From: Walter Underwood 
Sent: 29 January 2020 21:30
To: solr-user@lucene.apache.org 
Subject: Re: Solr Searcher 100% Latency Spike

Looking at the log, that takes one or two seconds after a complete batch reload 
(master/slave). So that is loading a cold index, all new files. This is not a 
big index, about a half million book titles.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 29, 2020, at 1:21 PM, Karl Stoney 
>  wrote:
>
> Out of curiosity, could you define "fast"?
> I'm wondering what sort of figures people target their searcher warm time at
> 
> From: Walter Underwood 
> Sent: 29 January 2020 21:13
> To: solr-user@lucene.apache.org 
> Subject: Re: Solr Searcher 100% Latency Spike
>
> I use a static set of warming queries, about 20 of them. That is fast and 
> gets a decent amount of the index into file buffers. Your top queries won’t 
> change much unless you have a news site or a seasonal business.
>
> Like this:
>
>
>  
>
>  
>  introduction
>  intermediate
>  fundamentals
>  understanding
>  introductory
>  precalculus
>  foundations
>  microeconomics
>  microbiology
>  macroeconomics
>  discovering
>  international
>  mathematics
>  organizational
>  criminology
>  developmental
>  engineering
>
>  
>
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>> On Jan 29, 2020, at 1:01 PM, Shawn Heisey  wrote:
>>
>> On 1/29/2020 12:44 PM, Karl Stoney wrote:
>>> Looking for a bit of support here.  When we soft commit (every 10 minutes), 
>>> we get a latency spike that means response times for solr are loosely 
>>> double, as you can see in this screenshot:
>>
>> Attachments almost never make it to the list.  We cannot see any of your 
>> screenshots.
>>
>>> They do correlate to filterCache warmup, which seem to take between 10s and 
>>> 30s:
>>> We don't have any other caches enabled, due to the high level of 
>>> cardinality of the queries.
>>> The spikes are specifically on /select
>>> We have the following autowarm configuration for the filterCache:
>>> <filterCache size="8192"
>>> initialSize="8192"
>>> cleanupThread="true"
>>> autowarmCount="900"/>
>>
>> Autowarm, especially on filterCache, can be an extremely lengthy process.  
>> What Solr must do in order to warm the cache here is execute up to 900 
>> queries, sequentially, on the new index.  That can take a lot of time and 
>> use a lot of resources like CPU and I/O.
>>
>> In order to reduce the impact of cache warming, I had to reduce my own 
>> autowarmCount on the filterCache to 4.
>>
>> Thanks,
>> Shawn
>
> This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 
> Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 
> 9439967). This email and any files transmitted with it are confidential and 
> may be legally privileged, and intended solely for the use of the individual 
> or entity to whom they are addressed. If you have received this email in 
> error please notify the sender. This email message has been swept for the 
> presence of computer viruses.

This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 
Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 
9439967). This email and any files transmitted with it are confidential and may 
be legally privileged, and intended solely for the use of the individual or 
entity to whom they are addressed. If you have received this email in error 
please notify the sender. This email message has been swept for the presence of 
computer viruses.


How expensive is core loading?

2020-01-29 Thread Rahul Goswami
Hello,
I am using Solr 7.2.1 on a Solr node running in standalone mode (-Xmx 8
GB). I wish to implement a service to monitor the server stats (like number
of docs per core, index size, etc.). This would require me to load the core
and my concern is that for a node hosting 100+ cores, this could be
expensive. So here are my questions:

1) How expensive is core loading if I am only getting stats like the total
docs and size of the index (no expensive queries)?
2) Does the memory consumption on core loading depend on the index size ?
3) What is a reasonable value for transient cache size in a production
setup with above configuration?

Thanks,
Rahul


Re: How expensive is core loading?

2020-01-29 Thread Walter Underwood
You might use Luke to get that info from the index files without loading them
into Solr.

https://code.google.com/archive/p/luke/

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 29, 2020, at 2:01 PM, Rahul Goswami  wrote:
> 
> Hello,
> I am using Solr 7.2.1 on a Solr node running in standalone mode (-Xmx 8
> GB). I wish to implement a service to monitor the server stats (like number
> of docs per core, index size etc) .This would require me to load the core
> and my concern is that for a node hosting 100+ cores, this could be
> expensive. So here are my questions:
> 
> 1) How expensive is core loading if I am only getting stats like the total
> docs and size of the index (no expensive queries)?
> 2) Does the memory consumption on core loading depend on the index size ?
> 3) What is a reasonable value for transient cache size in a production
> setup with above configuration?
> 
> Thanks,
> Rahul



Re: Easiest way to export the entire index

2020-01-29 Thread Edward Ribeiro
HI Amanda,

Below is a crude prototype in Bash that fetches documents from Solr using
cursorMark:
https://gist.github.com/eribeiro/de1588aaa1759c02ea40cc281e8aedc8

This is a crude prototype, but should shed some light for your use case (I
copied the code below too):

Best,
Edward

- fetcher.sh
--

#!/bin/bash

## Usage:
## $ chmod +x fetcher.sh
## $ ./fetcher.sh output.json

SOLR_URL="http://localhost:8983/solr";
COLLECTION="teste"
ROWS=10
CURSORMARK=*
Q=*:*
SORT=id%20desc

NEXT_CURSORMARK=

FILENAME=$1

cat /dev/null > $FILENAME   ## truncate file content, if file exists

echo "[" >> $FILENAME   ## open bracket so file content is a valid
json (list of lists of records)

counter=0
while [[ True ]]; do


url="$SOLR_URL/$COLLECTION/select?q=$Q&rows=$ROWS&cursorMark=$CURSORMARK&sort=$SORT"
resp=$(curl -s "$url")

## jq '.' <<< "$resp"

NEXT_CURSORMARK=$(jq '.nextCursorMark' <<< "$resp")
NEXT_CURSORMARK=$(echo $NEXT_CURSORMARK | sed -e 's/\"//g')

docs=$(jq '.response.docs' <<< "$resp")
num_docs=$(echo $docs | jq '. | length')
echo $docs
counter=$((counter + num_docs))

echo $docs >> $FILENAME

if [[ "$CURSORMARK" == "$NEXT_CURSORMARK" ]]; then
   echo "]" >> $FILENAME   ## make content a valid json file
   # echo "Num docs: "$counter
   echo "Finished."
   exit
else
   echo "," >> $FILENAME  ## make content a valid json file
fi

CURSORMARK=$NEXT_CURSORMARK

# sleep 1 ## optional, sleep a bit before fetching the next page
done;

--end fetcher.sh
---
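
A possible usage sketch (assumes jq is installed; the output file is a JSON
list of per-page lists, hence the flatten):

  chmod +x fetcher.sh
  ./fetcher.sh export.json
  jq 'flatten | length' export.json   # total number of exported documents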



On Wed, Jan 29, 2020 at 1:12 PM Steve Ge  wrote:

> @Amanda
> You can try using curl and write output to a file
>   curl "http://localhost:8983/solr/<collection>/select?q={theSolrQuery}" > out.json
>   theSolrQuery - you need to specify all attrs you want exported, not just
> *
> If you are on Windows, there is a Windows curl tool you can download to use
>
>
>
>
> Steve
>
>   On Wed, Jan 29, 2020 at 10:21 AM, Emir Arnautović<
> emir.arnauto...@sematext.com> wrote:   Hi Amanda,
> I assume that you have all the fields stored so you will be able to export
> full document.
>
> Several thousands records should not be too much to use regular start+rows
> to paginate results, but the proper way of doing that would be to use
> cursors. Adjust page size to avoid creating huge responses and you can use
> curl or some similar tool to avoid using admin console. I did a quick
> search and there are several blog posts with scripts that does what you
> need.
>
> HTH,
> Emir
>
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 29 Jan 2020, at 15:43, Amanda Shuman  wrote:
> >
> > Dear all:
> >
> > I've been asked to produce a JSON file of our index so it can be combined
> > and indexed with other records. (We run solr 5.3.1 on this project; we're
> > not going to upgrade, in part because funding has ended.) The index has
> > several thousand rows, but nothing too drastic. Unfortunately, this is
> too
> > much to handle for a simple query dump from the admin console. I tried to
> > follow instructions related to running /export directly but I guess the
> > export handler isn't installed. I tried to divide the query into rows,
> but
> > after a certain amount it freezes, and it also freezes when I try to
> limit
> > rows (e.g., rows 501-551 freezes the console). Is there any other way to
> > export the index short of having to install the export handler
> considering
> > we're not working on this project anymore?
> >
> > Thanks,
> > Amanda
> >
> > --
> > Dr. Amanda Shuman
> > Researcher and Lecturer, Institute of Chinese Studies, University of
> > Freiburg
> > Coordinator for the MA program in Modern China Studies
> > Database Administrator, The Maoist Legacy 
> > PhD, University of California, Santa Cruz
> > http://www.amandashuman.net/
> > http://www.prchistoryresources.org/
> > Office: +49 (0) 761 203 96748
>
>


Re: How expensive is core loading?

2020-01-29 Thread Rahul Goswami
Thanks for your response Walter. But I could not find a Java API for Luke
for writing my tool. Is there one? I also tried using the LukeRequestHandler
that comes with Solr, but invoking it causes the Solr core to be loaded.

Rahul

On Wed, Jan 29, 2020 at 5:20 PM Walter Underwood 
wrote:

> You might use Luke to get that info from the index files without loading
> them
> into Solr.
>
> https://code.google.com/archive/p/luke/
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Jan 29, 2020, at 2:01 PM, Rahul Goswami 
> wrote:
> >
> > Hello,
> > I am using Solr 7.2.1 on a Solr node running in standalone mode (-Xmx 8
> > GB). I wish to implement a service to monitor the server stats (like
> number
> > of docs per core, index size etc) .This would require me to load the core
> > and my concern is that for a node hosting 100+ cores, this could be
> > expensive. So here are my questions:
> >
> > 1) How expensive is core loading if I am only getting stats like the
> total
> > docs and size of the index (no expensive queries)?
> > 2) Does the memory consumption on core loading depend on the index size ?
> > 3) What is a reasonable value for transient cache size in a production
> > setup with above configuration?
> >
> > Thanks,
> > Rahul
>
>


Re: Clarity on Stable Release

2020-01-29 Thread Shawn Heisey

On 1/29/2020 11:24 AM, Jeff wrote:

Now, we are considering 8.2.0, 8.3.1, or 8.4.1 to use as they seem to be
stable. But it is hard to determine if we should be using the bleeding edge
or a few minor versions back since each of  these includes many bug fixes.
It is unclear to me why some fixes get back-patched and why some are
released under new minor version changes (which include some hefty
improvements and features).






To clarify, I am mostly asking for some clarity on which versions *should*
be used for a stable system and that we somehow can make it more clear in
the future. I am not trying to point the finger at specific bugs, but am
simply using them as examples as to why it is hard to determine a release
as stable.

If anybody has insight on this, please let me know.


My personal thought about any particular major version is that before 
using that version, it's a good idea to wait for a few releases, so that 
somebody braver than me can find the really big problems.


If 8.x were still brand new, I'd run the latest version of 7.x.  Since 
8.x has had a number of releases, my current thought for a new 
deployment would be to run the latest version of 8.x.  I would also plan 
on watching for new issues and being aggressive about upgrading to 
future 8.x versions.  I would maintain a test environment to qualify 
those releases.


All releases are called "stable".  That is the intent with any release 
-- for it to be good enough for anyone to use in production.  Sometimes 
we find problems after release.  When a problem is noted, we almost 
always create a test that will alert us if that problem should resurface.


What you refer to as "bleeding edge" is the master branch, and that 
branch is never used to create releases.


Thanks,
Shawn


Re: How expensive is core loading?

2020-01-29 Thread Shawn Heisey

On 1/29/2020 3:01 PM, Rahul Goswami wrote:

1) How expensive is core loading if I am only getting stats like the total
docs and size of the index (no expensive queries)?
2) Does the memory consumption on core loading depend on the index size ?
3) What is a reasonable value for transient cache size in a production
setup with above configuration?


What I would do is issue a RELOAD command.  For non-cloud deployments, 
I'd use the CoreAdmin API.  For cloud deployments, I'd use the 
Collections API.  To discover the answer, see how long it takes for the 
response to come back.


The time interval for a RELOAD is likely different than when Solr starts 
... but it sounds like you're more interested in the numbers for core 
loading after Solr starts than the ones during startup.
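
As a rough sketch (standalone mode, hypothetical core name), you could time the
RELOAD and also pull document counts and index size from the core STATUS call:

  # How long does (re)loading this core take?
  time curl -s "http://localhost:8983/solr/admin/cores?action=RELOAD&core=mycore"

  # numDocs and sizeInBytes are reported for cores that are already loaded.
  curl -s "http://localhost:8983/solr/admin/cores?action=STATUS&core=mycore&indexInfo=true&wt=json&indent=true"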


Thanks,
Shawn


Re: How expensive is core loading?

2020-01-29 Thread Rahul Goswami
Hi Shawn,
Thanks for the inputs. I realize I could have been clearer. By "expensive",
I mean expensive in terms of memory utilization. Eg: Let's say I have a
core with an index size of 10 GB and is not loaded on startup as per
configuration. If I load it in order to know the total documents and the
index size (to gather stats about the Solr server), is the amount of memory
consumed proportional to the index size in some way?

Thanks,
Rahul

On Wed, Jan 29, 2020 at 6:43 PM Shawn Heisey  wrote:

> On 1/29/2020 3:01 PM, Rahul Goswami wrote:
> > 1) How expensive is core loading if I am only getting stats like the
> total
> > docs and size of the index (no expensive queries)?
> > 2) Does the memory consumption on core loading depend on the index size ?
> > 3) What is a reasonable value for transient cache size in a production
> > setup with above configuration?
>
> What I would do is issue a RELOAD command.  For non-cloud deployments,
> I'd use the CoreAdmin API.  For cloud deployments, I'd use the
> Collections API.  To discover the answer, see how long it takes for the
> response to come back.
>
> The time interval for a RELOAD is likely different than when Solr starts
> ... but it sounds like you're more interested in the numbers for core
> loading after Solr starts than the ones during startup.
>
> Thanks,
> Shawn
>


Re: Solr Searcher 100% Latency Spike

2020-01-29 Thread Erick Erickson
Autowarming is significantly misunderstood. One of its purposes in “the bad 
old days” was to rebuild very expensive on-heap structures for 
searching, sorting, grouping, and function queries.

These are exactly what docValues are designed to make much, much faster.

If you are still using spinning disks, the other benefit of warming queries is 
to read the index off disk and into MMapDirectory space. SSDs make this much 
faster too.


I often see two common mistakes:
1> no autowarming
2> excessive autowarming

I usually recommend people start with, say, autowarm counts in the 10-20 range.

One implication of what you’ve said so far is that the additional 9 seconds 
your old autowarming took didn’t get you any benefit either, so putting it back 
isn’t indicated. I’m not quite clear why you say your memory footprint is 
lower; it’s unrelated to autowarming unless you also decreased your size 
parameter. If you’re saying that your reduced cache size hasn’t changed your 
95th percentile, I’d keep reducing it until it _did_ have a measurable effect.

The hit ratio is only loosely related to autowarming. So focusing on 
autowarming as a way to improve the hit ratio is probably the wrong focus.

So the first thing I’d do is make very, very sure that all the fields used 
for grouping/sorting/faceting/function operations have docValues. Second, a 
static warming query that ensures this, rather than relying on autowarming of the 
queryResultCache happening to exercise those functions, would be another step. 
NOTE: you don’t have to do all those operations on every field; just sorting on 
each field would suffice. NOTE: as of Solr 7.6, you can set “uninvertible=false” 
on your field types to ensure that fields without docValues fail fast rather than 
being uninverted on the heap, see: SOLR-12962
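
For what it’s worth, a hedged sketch of making that change through the Schema API
(the field name and type here are made up; existing documents still need a
reindex before the docValues actually exist on disk):

  curl -X POST -H 'Content-type:application/json' "http://localhost:8983/solr/mycore/schema" -d '{
    "replace-field": {"name": "price", "type": "pdouble", "docValues": true}
  }'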

And then I’d ask how much effort is smoothing out that kind of spike worth? You 
certainly see it with monitoring tools, but do users notice at all? If not, I 
wouldn’t spend all that much effort pursuing it…

Best,
Erick


> On Jan 29, 2020, at 4:48 PM, Karl Stoney 
>  wrote:
> 
> So interestingly tweaking my filter cache i've got the warming time down to 
> 1s (from 10!) and also reduced my memory footprint due to the smaller cache 
> size.
> 
> However, I still get these latency spikes (these changes have made no 
> difference to them).
> 
> So the theory about them being due to the warming being too intensive is 
> wrong.
> 
> I know the images didn't load btw so when I say spike I mean p95th response 
> time going from 50ms to 100-120ms momentarily.
> 
> From: Walter Underwood 
> Sent: 29 January 2020 21:30
> To: solr-user@lucene.apache.org 
> Subject: Re: Solr Searcher 100% Latency Spike
> 
> Looking at the log, that takes one or two seconds after a complete batch 
> reload (master/slave). So that is loading a cold index, all new files. This 
> is not a big index, about a half million book titles.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Jan 29, 2020, at 1:21 PM, Karl Stoney 
>>  wrote:
>> 
>> Out of curiosity, could you define "fast"?
>> I'm wondering what sort of figures people target their searcher warm time at
>> 
>> From: Walter Underwood 
>> Sent: 29 January 2020 21:13
>> To: solr-user@lucene.apache.org 
>> Subject: Re: Solr Searcher 100% Latency Spike
>> 
>> I use a static set of warming queries, about 20 of them. That is fast and 
>> gets a decent amount of the index into file buffers. Your top queries won’t 
>> change much unless you have a news site or a seasonal business.
>> 
>> Like this:
>> 
>>   
>> 
>>   
>> 
>> introduction
>> intermediate
>> fundamentals
>> understanding
>> introductory
>> precalculus
>> foundations
>> microeconomics
>> microbiology
>> macroeconomics
>> discovering
>> international
>> mathematics
>> organizational
>> criminology
>> developmental
>> engineering
>>   
>> 
>>   
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Jan 29, 2020, at 1:01 PM, Shawn Heisey  wrote:
>>> 
>>> On 1/29/2020 12:44 PM, Karl Stoney wrote:
 Looking for a bit of support here.  When we soft commit (every 10 
 minutes), we get a latency spike that means resp

Re: Clarity on Stable Release

2020-01-29 Thread Jeff
Thanks Shawn! Your answer is very helpful. Especially your note about
keeping up to date with the latest major version after a number of releases.

On Wed, Jan 29, 2020 at 6:35 PM Shawn Heisey  wrote:

> On 1/29/2020 11:24 AM, Jeff wrote:
> > Now, we are considering 8.2.0, 8.3.1, or 8.4.1 to use as they seem to be
> > stable. But it is hard to determine if we should be using the bleeding
> edge
> > or a few minor versions back since each of  these includes many bug
> fixes.
> > It is unclear to me why some fixes get back-patched and why some are
> > released under new minor version changes (which include some hefty
> > improvements and features).
>
> 
>
> >
> > To clarify, I am mostly asking for some clarity on which versions
> *should*
> > be used for a stable system and that we somehow can make it more clear in
> > the future. I am not trying to point the finger at specific bugs, but am
> > simply using them as examples as to why it is hard to determine a release
> > as stable.
> >
> > If anybody has insight on this, please let me know.
>
> My personal thought about any particular major version is that before
> using that version, it's a good idea to wait for a few releases, so that
> somebody braver than me can find the really big problems.
>
> If 8.x were still brand new, I'd run the latest version of 7.x.  Since
> 8.x has had a number of releases, my current thought for a new
> deployment would be to run the latest version of 8.x.  I would also plan
> on watching for new issues and being aggressive about upgrading to
> future 8.x versions.  I would maintain a test environment to qualify
> those releases.
>
> All releases are called "stable".  That is the intent with any release
> -- for it to be good enough for anyone to use in production.  Sometimes
> we find problems after release.  When a problem is noted, we almost
> always create a test that will alert us if that problem should resurface.
>
> What you refer to as "bleeding edge" is the master branch, and that
> branch is never used to create releases.
>
> Thanks,
> Shawn
>


Re: Solr Searcher 100% Latency Spike

2020-01-29 Thread Shawn Heisey

On 1/29/2020 2:48 PM, Karl Stoney wrote:

I know the images didn't load btw so when I say spike I mean p95th response 
time going from 50ms to 100-120ms momentarily.


I agree with Erick on looking at what users can actually notice.

When the normal response time is 50 milliseconds, even if that doubles 
or triples briefly, it's going to be barely noticeable to users.


Thanks,
Shawn


Re: Clarity on Stable Release

2020-01-29 Thread Dave
But!

If we don’t have people throwing a new release into production and finding real-world 
problems, we can’t trust that the current release’s problems will be exposed and then 
remedied, so it’s a double-edged sword. I personally agree with 
staying a major version back, but that’s because it takes a long time to 
reindex another terabyte in combined indexes when a bug is found. However, 
that’s not the norm, and I’m on an edge case where a full reindex is a few 
weeks or longer; if it was less than a day or so I would be on 8.x.

> On Jan 29, 2020, at 7:43 PM, Jeff  wrote:
> 
> Thanks Shawn! Your answer is very helpful. Especially your note about
> keeping up to date with the latest major version after a number of releases.
> 
>> On Wed, Jan 29, 2020 at 6:35 PM Shawn Heisey  wrote:
>> 
>>> On 1/29/2020 11:24 AM, Jeff wrote:
>>> Now, we are considering 8.2.0, 8.3.1, or 8.4.1 to use as they seem to be
>>> stable. But it is hard to determine if we should be using the bleeding
>> edge
>>> or a few minor versions back since each of  these includes many bug
>> fixes.
>>> It is unclear to me why some fixes get back-patched and why some are
>>> released under new minor version changes (which include some hefty
>>> improvements and features).
>> 
>> 
>> 
>>> 
>>> To clarify, I am mostly asking for some clarity on which versions
>> *should*
>>> be used for a stable system and that we somehow can make it more clear in
>>> the future. I am not trying to point the finger at specific bugs, but am
>>> simply using them as examples as to why it is hard to determine a release
>>> as stable.
>>> 
>>> If anybody has insight on this, please let me know.
>> 
>> My personal thought about any particular major version is that before
>> using that version, it's a good idea to wait for a few releases, so that
>> somebody braver than me can find the really big problems.
>> 
>> If 8.x were still brand new, I'd run the latest version of 7.x.  Since
>> 8.x has had a number of releases, my current thought for a new
>> deployment would be to run the latest version of 8.x.  I would also plan
>> on watching for new issues and being aggressive about upgrading to
>> future 8.x versions.  I would maintain a test environment to qualify
>> those releases.
>> 
>> All releases are called "stable".  That is the intent with any release
>> -- for it to be good enough for anyone to use in production.  Sometimes
>> we find problems after release.  When a problem is noted, we almost
>> always create a test that will alert us if that problem should resurface.
>> 
>> What you refer to as "bleeding edge" is the master branch, and that
>> branch is never used to create releases.
>> 
>> Thanks,
>> Shawn
>> 


Re: Can I create 1000 cores in SOLR CLOUD

2020-01-29 Thread Natarajan, Rajeswari
Good to know Shawn.

Thanks,
Rajeswari

On 1/29/20, 12:52 PM, "Shawn Heisey"  wrote:

On 1/27/2020 4:59 AM, Vignan Malyala wrote:
> We are currently using solr without cloud with 500 cores. It works good.
> 
> Now we are planning to expand it using solr cloud with 1000 cores, (2 
cores
> for each of my client with different domain data).

SolrCloud starts having scalability issues once you reach a few hundred 
collections, regardless of how many servers are in the cloud.

I explored this in SOLR-7191.  You might notice that the issue is in a 
"Resolved/Fixed" state ... but there were no changes committed, and when 
I tested it again with a later version, I saw evidence that the 
situation has gotten worse, not better.

https://issues.apache.org/jira/browse/SOLR-7191

If you already have mechanisms in place to handle high availability, you 
would be far better off NOT using SolrCloud mode.

Thanks,
Shawn




Re: How expensive is core loading?

2020-01-29 Thread Edward Ribeiro
Hi,

Luke was a standalone app and is now a Lucene module. Read here:
https://github.com/DmitryKey/luke

You don't need Solr to use it (LukeRequestHandler is a plus).

Best,
Edward


Em qua, 29 de jan de 2020 20:35, Rahul Goswami 
escreveu:

> Thanks for your response Walter. But I could not find a Java api for Luke
> for writing my tool. Is there one? I also tried using the
> LukeRequestHandler
> that comes with Solr, but invoking it causes the Solr core to be loaded.
>
> Rahul
>
> On Wed, Jan 29, 2020 at 5:20 PM Walter Underwood 
> wrote:
>
> > You might use Luke to get that info from the index files without loading
> > them
> > into Solr.
> >
> > https://code.google.com/archive/p/luke/
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> > > On Jan 29, 2020, at 2:01 PM, Rahul Goswami 
> > wrote:
> > >
> > > Hello,
> > > I am using Solr 7.2.1 on a Solr node running in standalone mode (-Xmx 8
> > > GB). I wish to implement a service to monitor the server stats (like
> > number
> > > of docs per core, index size etc) .This would require me to load the
> core
> > > and my concern is that for a node hosting 100+ cores, this could be
> > > expensive. So here are my questions:
> > >
> > > 1) How expensive is core loading if I am only getting stats like the
> > total
> > > docs and size of the index (no expensive queries)?
> > > 2) Does the memory consumption on core loading depend on the index
> size ?
> > > 3) What is a reasonable value for transient cache size in a production
> > > setup with above configuration?
> > >
> > > Thanks,
> > > Rahul
> >
> >
>


Re: Solr fact response strange behaviour

2020-01-29 Thread Mikhail Khludnev
What's happening at AutoCompleteAPI.java:170?

On Wed, Jan 29, 2020 at 9:28 PM Kaminski, Adi 
wrote:

> Sure, thanks for the guidance and the assistance anyway.
>
> Here is the stack trace:
> [29/01/20 08:09:41:041 IST] [http-nio-8080-exec-2] ERROR api.BaseAPI:
> There was an Exception calling Solr
> java.lang.ClassCastException: java.lang.Integer cannot be cast to
> java.lang.Long
> at
> com.productcore.analytics.api.AutoCompleteAPI.lambda$mapSolrResponse$0(AutoCompleteAPI.java:170)
> ~[classes/:?]
> at
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> ~[?:1.8.0_201]
> at
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
> ~[?:1.8.0_201]
> at
> com.productcore.analytics.api.AutoCompleteAPI.mapSolrResponse(AutoCompleteAPI.java:167)
> ~[classes/:?]
> at com.productcore.analytics.api.BaseAPI.execute(BaseAPI.java:48)
> [classes/:?]
> at
> com.productcore.analytics.controllers.DalController.getAutocomplete(DalController.java:205)
> [classes/:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ~[?:1.8.0_201]
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:1.8.0_201]
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_201]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_201]
> at
> org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:189)
> [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at
> org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138)
> [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at
> org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
> [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at
> org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:892)
> [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at
> org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:797)
> [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at
> org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
> [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at
> org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1038)
> [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at
> org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:942)
> [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at
> org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1005)
> [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at
> org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:908)
> [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:660)
> [tomcat-embed-core-9.0.17.jar:9.0.17]
> at
> org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:882)
> [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:741)
> [tomcat-embed-core-9.0.17.jar:9.0.17]
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
> [tomcat-embed-core-9.0.17.jar:9.0.17]
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
> [tomcat-embed-core-9.0.17.jar:9.0.17]
> at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
> [tomcat-embed-websocket-9.0.17.jar:9.0.17]
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
> [tomcat-embed-core-9.0.17.jar:9.0.17]
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
> [tomcat-embed-core-9.0.17.jar:9.0.17]
> at
> org.springframework.boot.actuate.web.trace.servlet.HttpTraceFilter.doFilterInternal(HttpTraceFilter.java:90)
> [spring-boot-actuator-2.1.4.RELEASE.jar:2.1.4.RELEASE]
> at
> org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
> [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
> [tomcat-embed-core-9.0.17.jar:9.0.17]
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
> [tomcat-embed-core-9.0.17.jar:9.0.17]
> at
> org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99)
> [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at
> org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
> [spring-web-5.1.6.RELEASE.jar:5.1

Re: How expensive is core loading?

2020-01-29 Thread Emir Arnautović
Hi Rahul,
It depends. You might have warm-up queries that would populate caches. For each 
core, Solr exposes JMX stats, so you can read just those without “touching” the core. 
You can also try using some of the existing tools for monitoring Solr, but I don’t 
think that any of them provides info about cores that are not loaded. You 
would just see them as occupied disk.
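
For example, the metrics API can return per-core document counts and on-disk
index size without opening each core's admin page (a sketch, assuming Solr 7.x;
unloaded cores simply will not appear in the output):

  curl -s "http://localhost:8983/solr/admin/metrics?group=core&prefix=SEARCHER.searcher.numDocs,INDEX.sizeInBytes&wt=json&indent=true"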

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 30 Jan 2020, at 01:01, Rahul Goswami  wrote:
> 
> Hi Shawn,
> Thanks for the inputs. I realize I could have been clearer. By "expensive",
> I mean expensive in terms of memory utilization. Eg: Let's say I have a
> core with an index size of 10 GB and is not loaded on startup as per
> configuration. If I load it in order to know the total documents and the
> index size (to gather stats about the Solr server), is the amount of memory
> consumed proportional to the index size in some way?
> 
> Thanks,
> Rahul
> 
> On Wed, Jan 29, 2020 at 6:43 PM Shawn Heisey  wrote:
> 
>> On 1/29/2020 3:01 PM, Rahul Goswami wrote:
>>> 1) How expensive is core loading if I am only getting stats like the
>> total
>>> docs and size of the index (no expensive queries)?
>>> 2) Does the memory consumption on core loading depend on the index size ?
>>> 3) What is a reasonable value for transient cache size in a production
>>> setup with above configuration?
>> 
>> What I would do is issue a RELOAD command.  For non-cloud deployments,
>> I'd use the CoreAdmin API.  For cloud deployments, I'd use the
>> Collections API.  To discover the answer, see how long it takes for the
>> response to come back.
>> 
>> The time interval for a RELOAD is likely different than when Solr starts
>> ... but it sounds like you're more interested in the numbers for core
>> loading after Solr starts than the ones during startup.
>> 
>> Thanks,
>> Shawn
>>