Re: Caching Solr Grouping Results

2018-05-21 Thread Yasufumi Mizoguchi
Hi, Have you already tried the "group.cache.percent" parameter? It might improve grouping performance. Or, if you try CollapsingQParser, you can use the expand component to retrieve all values in the groups, I think. ( https://lucene.apache.org/solr/guide/6_6/collapse-and-expand-results.html#collapse-and-expand
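
The two options mentioned in this reply can be sketched as request parameters. This is a minimal Python sketch, not a tested configuration: the field name "category", the match-all query, and the 20 percent cache value are hypothetical placeholders. Only the parameter names (group, group.field, group.cache.percent, the collapse filter query, and expand) are taken from Solr's grouping and collapse/expand features.

```python
from urllib.parse import urlencode


def grouping_params(field, cache_percent=20):
    """Parameters for classic result grouping with the grouping cache enabled.

    cache_percent is a hypothetical starting value; it needs tuning per index.
    """
    return {
        "q": "*:*",
        "group": "true",
        "group.field": field,
        "group.cache.percent": str(cache_percent),
    }


def collapse_expand_params(field):
    """Parameters for the CollapsingQParser plus the expand component."""
    return {
        "q": "*:*",
        "fq": "{!collapse field=%s}" % field,  # keep one document per group
        "expand": "true",  # return the collapsed group members as well
    }


if __name__ == "__main__":
    # "category" is an assumed field name used only for illustration.
    print(urlencode(grouping_params("category")))
    print(urlencode(collapse_expand_params("category")))
```

Either parameter set would be appended to a select request; which one performs better depends on the index and is worth measuring rather than assuming.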

Navigating through Solr Source Code

2018-05-21 Thread Greenhorn Techie
Hi, As the documentation around Solr is limited, I am thinking to go through the source code and understand the various bits and pieces. However, I am a bit confused on where to start as my development skills are a bit limited. Any thoughts on how best to start / where to start looking into Solr

Re: Navigating through Solr Source Code

2018-05-21 Thread Emir Arnautović
Hi, I would start with the feature or concept whose documentation I find vague. If you think that everything is like that, I would not start with the code just yet and would focus on understanding the high-level concepts first. Also, you need to figure out whether a given feature belongs to Solr or Lucene and if it i

[SECURITY] CVE-2018-8010: XXE vulnerability due to Apache Solr configset upload

2018-05-21 Thread Uwe Schindler
CVE-2018-8010: XXE vulnerability due to Apache Solr configset upload Severity: High Vendor: The Apache Software Foundation Versions Affected: Solr 6.0.0 to 6.6.3 Solr 7.0.0 to 7.3.0 Description: The details of this vulnerability were reported internally by one of Apache Solr's committers. This

Re: Navigating through Solr Source Code

2018-05-21 Thread Erick Erickson
Another useful trick is to use the class hierarchy displays that most modern IDEs provide to get a sense of what class is where. And I second Emir's comment about picking some feature. _Nobody_ knows all the Solr code, and that's not even including Lucene. It's big, very big. So pick a feature you want

Re: Navigating through Solr Source Code

2018-05-21 Thread Deepak Goel
If you can find out how Solr evolved over the years, you can perhaps follow that same path On Mon, 21 May 2018, 18:35 Erick Erickson, wrote: > Another useful trick is the class hierarchy displays most modern IDE's > have available to get a sense of what class is where. And I second > Emir's comm

Thoughts on scaling strategy for Solr deployed on AWS EC2 instances - Scale up / out and which instance type?

2018-05-21 Thread Kelly, Frank
Using Solr 5.3.1 - index We have an indexing heavy workload (we do more indexing than searching) and for those searches we do perform we have very few cache hits (25% of our index is in memory and the hit rate is < 0.1%) We are currently using r3.xlarge (memory optimized instances as we origina

Re: Thoughts on scaling strategy for Solr deployed on AWS EC2 instances - Scale up / out and which instance type?

2018-05-21 Thread Erick Erickson
"replication falls behind and then starts to recover which causes more usage" I'm not quite sure what you mean by this. Are you using TLOG or PULL replica types? Or stand-alone Solr? There shouldn't really be any replication in the ideal state for NRT replicas. If you're using SolrCloud, the usua

Re: Thoughts on scaling strategy for Solr deployed on AWS EC2 instances - Scale up / out and which instance type?

2018-05-21 Thread Kelly, Frank
Thanks Erick, I am using TLOG replicas in this SolrCloud cluster - 3 shards, each with 3 replicas. Here's my decision logic based on my (limited) understanding - All shards seem to be equally used so to improve performance by adding shards I think I'd have to double from 3 shards to 6 (as indexi

Re: Thoughts on scaling strategy for Solr deployed on AWS EC2 instances - Scale up / out and which instance type?

2018-05-21 Thread Shawn Heisey
On 5/21/2018 8:25 AM, Kelly, Frank wrote: We have an indexing heavy workload (we do more indexing than searching) and for those searches we do perform we have very few cache hits (25% of our index is in memory and the hit rate is < 0.1%) Which cache are you looking at for that hit rate? How a

analyzing infix suggester building in near real time LUCENE-5477

2018-05-21 Thread Matteo Grolla
Hi everyone, I'm evaluating suggesters that can be built in near real time and I came across https://issues.apache.org/jira/browse/LUCENE-5477. Is there a way to use this functionality from Solr? Thanks very much Matteo Grolla

Re: Thoughts on scaling strategy for Solr deployed on AWS EC2 instances - Scale up / out and which instance type?

2018-05-21 Thread Deepak Goel
On Mon, May 21, 2018 at 7:55 PM, Kelly, Frank wrote: > Using Solr 5.3.1 - index > > We have an indexing heavy workload (we do more indexing than searching) > and for those searches we do perform we have very few cache hits (25% of > our index is in memory and the hit rate is < 0.1%) > > We are cu

Re: Navigating through Solr Source Code

2018-05-21 Thread Shawn Heisey
On 5/21/2018 4:35 AM, Greenhorn Techie wrote: As the documentation around Solr is limited, I am thinking to go through the source code and understand the various bits and pieces. However, I am a bit confused on where to start as my development skills are a bit limited. Any thoughts on how best

Re: CDCR setup with Custom Document Routing

2018-05-21 Thread Shalin Shekhar Mangar
Setups using implicit routers are not considered in the design so I don't think they will work today. That being said, it should be a simple enhancement to the CdcrReplicator to add the shard name to the UpdateRequest object. But ensure that both target and source have the exact same number and nam

Re: analyzing infix suggester building in near real time LUCENE-5477

2018-05-21 Thread Mikhail Khludnev
There was nothing like that a year ago. Patches are welcome. On Mon, May 21, 2018 at 6:35 PM, Matteo Grolla wrote: > Hi everyone, > I'm evaluating suggesters that can be built in near real time and I came > across > https://issues.apache.org/jira/browse/LUCENE-5477. > Is there a way to use this

Re: Navigating through Solr Source Code

2018-05-21 Thread Greenhorn Techie
Thanks for your responses. Best Regards! On 21 May 2018 at 16:40:10, Shawn Heisey (apa...@elyograg.org) wrote: On 5/21/2018 4:35 AM, Greenhorn Techie wrote: > As the documentation around Solr is limited, I am thinking to go through > the source code and understand the various bits and pieces. H

Atomic update error with JSON handler

2018-05-21 Thread Nándor Mátravölgyi
Hi, I'm trying to build a simple document search core with SolrCloud. I've run into an issue when trying to partially update documents (aka atomic updates). It appears to be a bug, because the semantically identical request succeeds in XML format, while it fails as JSON. The body of the XML request: t

Re: Index filename while indexing JSON file

2018-05-21 Thread S.Ashwath
Thanks Raymond. As I was doing the indexing of other delimited files directly with Solr and the terminal (without a client), I thought it would be possible to index the filename of JSON files this way as well. But like you say, I'm parsing the search results in Python. So I might as well build the
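
A client-side approach like the one discussed here could look roughly as follows. This is only a sketch under stated assumptions: each JSON file is assumed to hold a single object, and the field name "filename" is hypothetical and would have to exist in the schema. The helper only prepares the documents; sending them to Solr is left out.

```python
import json
from pathlib import Path


def docs_with_filename(directory, field="filename"):
    """Load every *.json file in `directory` and record its file name.

    Assumes each file contains one JSON object. `field` is a hypothetical
    field name chosen for illustration.
    """
    docs = []
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            doc = json.load(handle)
        doc[field] = path.name  # add the file name as an extra field
        docs.append(doc)
    return docs
```

The returned list can then be serialized with json.dumps and posted to the JSON update handler in one request.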

How to maintain fast query speed during heavy indexing?

2018-05-21 Thread Nguyen Nguyen
Hello everyone, I'm running a SolrCloud cluster of 5 nodes with 5 shards and 3 replicas per shard. I usually see spikes in query response time during periods of heavy indexing. I would like to have stable query response times even during periods of heavy indexing. I recently upgraded to Solr 7.3 and running wi

Re: Atomic update error with JSON handler

2018-05-21 Thread Yasufumi Mizoguchi
Hi, At least, it is better to enclose your JSON body in '[ ]', I think. The following is the result I got using curl. $ curl -XPOST "localhost:8983/solr/test_core/update/json?commit=true" --data-binary '{"id":"test1","title":{"set":"Solr Rocks"}}' { "responseHeader":{ "status":400, "QT
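
The suggestion to wrap the body in '[ ]' can be sketched in Python as below. The document values and the core name test_core come from the curl command above; the "http://" scheme, the Content-Type header, and both helper function names are assumptions made for illustration. The request object is only built here, not sent.

```python
import json
from urllib.request import Request


def atomic_set_body(doc_id, field, value):
    """Build an atomic "set" update, wrapped in a JSON array as suggested."""
    return json.dumps([{"id": doc_id, field: {"set": value}}])


def build_update_request(base_url, core, body):
    """Prepare (but do not send) a POST to the JSON update handler."""
    url = "%s/solr/%s/update/json?commit=true" % (base_url, core)
    return Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    body = atomic_set_body("test1", "title", "Solr Rocks")
    request = build_update_request("http://localhost:8983", "test_core", body)
    print(request.full_url)
    print(body)
```

Passing the prepared request to urllib.request.urlopen would send it; that step is omitted because it needs a running Solr instance.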

Re: Index filename while indexing JSON file

2018-05-21 Thread Bernd Fehling
I don't know if DIH can solve your problem, but I would go for a simple self-written ETL in Java and use SolrJ for loading. Best regards, Bernd On 18.05.2018 at 21:47, S.Ashwath wrote: Hello, I have 2 directories: 1 with txt files and the other with corresponding JSON (metadata) files (aro