How to handle nested documents in solr (SolrJ)

2017-05-23 Thread prasad chowdary
Dear All, I have a requirement to index documents in Solr using Java code. Each document contains sub-documents like below (it's just for understanding my question). student id : 123 student name : john marks : maths: 90 English :95 student id : 124 student na

Re: solr 6 at scale

2017-05-23 Thread Erick Erickson
I'll quibble a little with Walter and say that 6.4.2 fixes the perf problem in 6.4.0 and 6.4.1. Which doesn't change his recommendation at all, I'd go with 6.5.1. Best, Erick On Tue, May 23, 2017 at 5:49 PM, Walter Underwood wrote: > We are running 6.5.1 in a 16 node cluster, four shards and fou

Re: Solr in NAS or Network Shared Drive

2017-05-23 Thread Shawn Heisey
On 5/19/2017 8:33 AM, Ravi Kumar Taminidi wrote: > Hello, Scenario: Currently we have 2 Solr Servers running in 2 different > servers (linux), Is there any way can we make the Core to be located in NAS > or Network shared Drive so both the solrs using the same Index. > > Let me know if any perfo

Re: Disable All kind of caching in Solr/Lucene

2017-05-23 Thread Nilesh Kamani
Thanks Pushkar. I will upgrade to the latest Solr version and check if it is working now. On Tue, May 23, 2017 at 7:13 PM Pushkar Raste wrote: > What version are you on? There was a bug where if you use cache size 0, it > would still create a cache with size 2 (or maybe just 1). It was fixed > un

Re: solr 6 at scale

2017-05-23 Thread Walter Underwood
We are running 6.5.1 in a 16 node cluster, four shards and four replicas. It is performing brilliantly. Our index is 18 million documents, but we have very heavy queries. Students are searching for homework help, so they paste in the entire problem. We truncate queries at 40 terms to limit the
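The query truncation described above can be sketched in client code. This is a minimal illustration only, assuming terms are separated by whitespace; the class name `QueryTruncator` and the `truncate` method are hypothetical and not taken from the poster's system, and the 40-term limit is the only figure that comes from the message.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper: caps a pasted-in query at a fixed number of terms
// before it is sent to the search engine.
final class QueryTruncator {
    // The limit mentioned in the message above.
    static final int MAX_TERMS = 40;

    // Splits on runs of whitespace (an assumption about what counts as a
    // "term") and keeps only the first maxTerms terms.
    static String truncate(String query, int maxTerms) {
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        return Arrays.stream(trimmed.split("\\s+"))
                .limit(maxTerms)
                .collect(Collectors.joining(" "));
    }
}
```

Calling `QueryTruncator.truncate(userInput, QueryTruncator.MAX_TERMS)` before building the Solr request keeps very long pasted problems from turning into very expensive queries.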

solr 6 at scale

2017-05-23 Thread Nawab Zada Asad Iqbal
Hi all, I am planning to upgrade my Solr 4.x installation to a recent stable version. Should I get the latest 6.5.1 bits or will a little older release be better in terms of stability? I am curious if there is a way to see Solr 6.x adoption in large companies. I have talked to a few people and they ar

Re: Rule-based Replica Placement not working with Solr 6.5.1

2017-05-23 Thread Damien Kamerman
I'm not sure I fully understand what you're trying to do but this is what I do to ensure replicas are not on the same rack: rule=shard:*,replica:<2,sysprop.rack:* On 23 May 2017 at 22:37, Bernd Fehling wrote: > Yes, I tried that already. > Sure, it assigns 2 nodes with port 8983 to shard1 (e.g.

Re: Disable All kind of caching in Solr/Lucene

2017-05-23 Thread Pushkar Raste
What version are you on? There was a bug where if you use cache size 0, it would still create a cache with size 2 (or maybe just 1). It was fixed under https://issues.apache.org/jira/browse/SOLR-9886?filter=-2 On Apr 3, 2017 9:26 AM, "Nilesh Kamani" wrote: > @Yonik even though the code change

JSON Facet API : numBuckets and count clarification

2017-05-23 Thread Varun Thacker
Here is my current understanding of how these counts work. numBuckets : Is supposed to tell the user how many unique buckets were seen in the facet calculation. Given that we currently don't do refinements, this number can be equal to or less than the actual number of unique buckets. count : The total

Re: Indexing word with plus sign

2017-05-23 Thread Walter Underwood
That was on Solr 1.3, so I’m pretty sure it was the whitespace tokenizer. The synonym substitution for “+/-” was done in client code and indexing code, outside of Solr. We also sanitized queries to remove all query syntax characters. wunder Walter Underwood wun...@wunderwood.org http://observe

Re: Indexing word with plus sign

2017-05-23 Thread Fundera Developer
Thanks Walter!! For the sake of curiosity, do you remember which Tokenizer you were using in that case? Thanks! On 23/05/17 at 20:02, Walter Underwood wrote: Years ago at Netflix, I had to deal with a DVD from a band named “+/-”. I gave up and translated that to “plusminus” at index an

Re: Indexing word with plus sign

2017-05-23 Thread Walter Underwood
Years ago at Netflix, I had to deal with a DVD from a band named “+/-”. I gave up and translated that to “plusminus” at index and query time. http://plusmin.us/ Luckily, “.hack//Sign” and other related dot-hack anime matched if I just deleted all the punctuation. And everyo
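The normalization described above, applied identically at index and query time, might look like the following sketch. It is plain Java run outside Solr, as the message describes; the class and method names are hypothetical, and treating "punctuation" as ASCII punctuation is an assumption.

```java
// Hypothetical client-side normalizer, applied to both indexed titles and
// incoming queries so that the two sides always agree.
final class TitleNormalizer {
    static String normalize(String text) {
        // Special case first: the band name "+/-" becomes a searchable word.
        String replaced = text.replace("+/-", "plusminus");
        // Then delete all remaining ASCII punctuation. In java.util.regex,
        // \p{Punct} matches !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
        return replaced.replaceAll("\\p{Punct}", "");
    }
}
```

The order matters: the "+/-" substitution has to run before the punctuation is stripped, otherwise the name disappears entirely.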

Re: solrcloud replicas not in sync

2017-05-23 Thread Erick Erickson
This is all quite strange. Optimize (BTW, it's rarely necessary/desirable on an index that changes, despite its name) shouldn't matter here. CDCR forwards the raw documents to the target cluster. Ample time indeed. With a soft commit of 15 seconds, that's your window (with some slop for how long C

Re: Indexing word with plus sign

2017-05-23 Thread Erick Erickson
You need to distinguish between PatternReplaceCharFilterFactory and PatternReplaceFilterFactory. The first one is applied to the entire input _before_ tokenization. The second is applied _after_ tokenization to individual tokens; by that time it's too late. It's an easy thing to miss. And at q
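The difference in ordering can be illustrated without Solr at all. The sketch below is a plain-Java simulation, not the Lucene API: the tokenizer is a stand-in that splits on non-alphanumeric characters, and the replacement text "iplusd" is a hypothetical choice. It uses the "i+d" example from elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Simulation only: shows why a replacement applied before tokenization can
// see "i+d" while one applied after tokenization cannot.
final class FilterOrderDemo {
    // Stand-in tokenizer: splits on anything that is not an ASCII letter or digit.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String t : input.split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Char-filter style: rewrite the raw input, then tokenize.
    static List<String> charFilterThenTokenize(String input) {
        return tokenize(input.replaceAll("i\\+d", "iplusd"));
    }

    // Token-filter style: tokenize first, then rewrite each token.
    // By now "i+d" has already been split, so the pattern never matches.
    static List<String> tokenizeThenTokenFilter(String input) {
        List<String> out = new ArrayList<>();
        for (String t : tokenize(input)) {
            out.add(t.replaceAll("i\\+d", "iplusd"));
        }
        return out;
    }
}
```

With the input `"proyecto i+d"`, the char-filter ordering yields the tokens `proyecto` and `iplusd`, while the token-filter ordering yields `proyecto`, `i` and `d`.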

RE: Using the Data Import Handler with SQLite

2017-05-23 Thread Dheeraj Kumar Karnati
Hi Zac, I think you have added the entity closing tag twice; that might be causing the issue. It has been a long time, so I'm not sure whether you are still working on it. -- View this message in context: http://lucene.472066.n3.nabble.com/Using-the-Data-Import-Handler-with-SQLite-tp276565

Re: Indexing word with plus sign

2017-05-23 Thread Fundera Developer
I have also tried this option, by using a PatternReplaceFilterFactory, like this: but it gets processed AFTER the Tokenizer, so when it executes there is no longer an "i+d" token, but two "i" and "d" independent tokens. Is there a way I could make the filter execute before the Tokenizer? I ha

Re: solrcloud replicas not in sync

2017-05-23 Thread Webster Homer
We see a pretty consistent issue where the replicas show in the admin console as not current, indicating that our auto commit isn't committing. In one case we loaded the data to the source, CDCR replicated it to the targets, and we see the source and the target as having current = false. It is search

Re: Spread SolrCloud across two locations

2017-05-23 Thread Shawn Heisey
On 5/23/2017 10:12 AM, Susheel Kumar wrote: Hi Jan, FYI - Since last year, I have been running a Solr 6.0 cluster in one of lower env with 6 shards/replica in dc1 & 6 shard/replica in dc2 (each shard replicated cross data center) with 3 ZK in dc1 and 2 ZK in dc2. (I didn't have the availability

Re: Spread SolrCloud across two locations

2017-05-23 Thread Susheel Kumar
Agree, Erick. Since this setup is in our test env, haven't really invested to add another DC but for Prod sure, will go by DC3 if we do go with this setup. On Tue, May 23, 2017 at 12:38 PM, Erick Erickson wrote: > Susheel: > > The issue is that if, for any reason at all, the connection between

Re: Spread SolrCloud across two locations

2017-05-23 Thread Erick Erickson
Susheel: The issue is that if, for any reason at all, the connection between dc1 and dc2 is broken, there will be no indexing on dc2 since the Solr servers there will not sense ZK quorum. You'll have to do something manual to reconfigure. That's not a flaw in your setup, just the way things w
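The quorum arithmetic behind this point is small enough to write out. The sketch below assumes the 3 + 2 ZooKeeper layout described elsewhere in this thread; the class and method names are hypothetical, and the only rule encoded is that an ensemble needs a strict majority of its members to form a quorum.

```java
// Hypothetical helper for reasoning about ZooKeeper ensemble layouts.
final class ZkQuorum {
    // Smallest number of servers that forms a strict majority.
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // True if the servers still reachable from one side of a partition
    // are enough to form a quorum.
    static boolean hasQuorum(int reachable, int ensembleSize) {
        return reachable >= majority(ensembleSize);
    }
}
```

For a five-node ensemble split 3/2 across two data centers, `hasQuorum(3, 5)` is true and `hasQuorum(2, 5)` is false, so the side with two nodes stops accepting updates when the link breaks. Spreading the same five nodes 2/2/1 across three locations leaves at least three reachable nodes whenever any single location is lost.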

Re: Spread SolrCloud across two locations

2017-05-23 Thread Susheel Kumar
Hi Jan, FYI - Since last year, I have been running a Solr 6.0 cluster in one of lower env with 6 shards/replica in dc1 & 6 shard/replica in dc2 (each shard replicated cross data center) with 3 ZK in dc1 and 2 ZK in dc2. (I didn't have the availability of 3rd data center for ZK so went with only 2

Re: SOLR Index and Schema.xml file corruption

2017-05-23 Thread Erick Erickson
If you have classic schema factory configured, then Solr will not write the schema.xml file out. So either something's strange with SiteCore or someone inadvertently hand-edited the schema. I suggest contacting the SiteCore people to see how it would get that way. You should be able to shut Solr/S

Re: High CPU when use grouping group.ngroups=true

2017-05-23 Thread Nguyen Manh Tien
The collapse field is a high-cardinality field. I haven't profiled it yet but will do so. Thanks, Tien On Tue, May 23, 2017 at 9:48 PM, Erick Erickson wrote: > How many unique values in your group field? For high-cardinality > fields there's quite a bit of bookkeeping that needs to be done. > > Hav

Re: High CPU when use grouping group.ngroups=true

2017-05-23 Thread Erick Erickson
How many unique values in your group field? For high-cardinality fields there's quite a bit of bookkeeping that needs to be done. Have you tried profiling to see where the CPU time is being spent? Best, Erick On Tue, May 23, 2017 at 7:46 AM, Nguyen Manh Tien wrote: > Hi All, > > I recently swit

High CPU when use grouping group.ngroups=true

2017-05-23 Thread Nguyen Manh Tien
Hi All, I recently switched from Solr field collapse/expand to grouping to collapse search results. All seems good, but CPU is always high (80-100%) when I set the param group.ngroups=true. We set ngroups=true to get the number of groups so that we can paginate search results correctly. Due to CPU issue we nee

Re: Spread SolrCloud across two locations

2017-05-23 Thread Jan Høydahl
I.e. tell the customer that in order to have automatic failover and recovery in a 2-location setup we require at least one ZK instance in a separate third location. Kind of a tough requirement but necessary to safeguard against split brain during a network partition. If a third location is not

Re: Rule-based Replica Placement not working with Solr 6.5.1

2017-05-23 Thread Bernd Fehling
Yes, I tried that already. Sure, it assigns 2 nodes with port 8983 to shard1 (e.g. server1:8983,server2:8983). But due to no replica rule (which defaults to wildcard) I also get shard3 --> server2:8983,server2:7574 shard2 --> server1:7574,server3:8983 The result is 3 replicas on server2 and also

SOLR Index and Schema.xml file corruption

2017-05-23 Thread LAD, SAGAR
Hi SOLR team, We are using SOLR 4.6.0 with Sitecore CMS 7.2. We observe that the search indexes and sometimes the schema.xml file get corrupted. A field tag in schema.xml gets an extra forward slash, which results in SOLR stopping. We have " " therefore only manual update is allowed. Please guide us

Re: Rule-based Replica Placement not working with Solr 6.5.1

2017-05-23 Thread Noble Paul
Did you try the rule shard:shard1,port:8983? This ensures that all replicas of shard1 are allocated on the nodes with port 8983. If it doesn't, it's a bug. Please open a ticket. On Tue, May 23, 2017 at 7:10 PM, Bernd Fehling wrote: > After some analysis it turns out that they compare apples with ora

Re: the problem on CDCR of solrCloud

2017-05-23 Thread Rick Leir
Hi 魏晓峰 What in particular is not working? Cheers -- Rick On May 22, 2017 11:00:24 PM EDT, "魏晓峰" wrote: >hello,my name is weixiaofeng. I'm from China, I'm a java >developer.Recently we use the technology of solr to complete search >big data.we were in trouble in module CDCR(Cross Data Center >Re

Re: the problem on CDCR of solrCloud

2017-05-23 Thread alessandro.benedetti
What doesn't work? Can you specify exactly what is happening? Can you add stacktraces as evidence of something bad happening? We can then try to help! - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io -- View this message in

Re: different length/size of unique 'id' field value in a collection.

2017-05-23 Thread Rick Leir
Derek, If your algorithm is guaranteed to always provide unique id's then fine. I say incorrectly in that, after a few years in software development, I have seen bugs in the most careful code. A bug causing ID collisions could be hard to track down. Solr can generate unique ID's for you, and you

Re: Rule-based Replica Placement not working with Solr 6.5.1

2017-05-23 Thread Bernd Fehling
After some analysis it turns out that they compare apples with oranges :-( Inside "tryAPermutationOfRules" the rule is called with rules.get() and the next step is calling rule.compare(), but they don't compare the nodes against the rule (or rules). They compare the nodes against each other. E.g.

RE: Spread SolrCloud across two locations

2017-05-23 Thread Markus Jelsma
I would probably start by renting a VM at a third location to run Zookeeper. Markus -Original message- > From:Jan Høydahl > Sent: Tuesday 23rd May 2017 11:09 > To: solr-user > Subject: Spread SolrCloud across two locations > > Hi, > > A customer has two locations (a few km apart) wit

Spread SolrCloud across two locations

2017-05-23 Thread Jan Høydahl
Hi, A customer has two locations (a few km apart) with super-fast networking in-between, so for day-to-day operation they view all VMs in both locations as a pool of servers. They typically spin up redundant servers for various services in each zone and if a zone should fail (they are a few km