mm with sow set to false bug
Hello,

We decided to upgrade Solr to use the Synonym Graph Filter, which requires sow (split-on-whitespace) to be set to false. But after setting sow to false we started to see some unexpected behaviour, and after digging into the problem we found that mm gives wrong results when sow is false, and that mm.autoRelax does not work properly.

The tests were as follows:

Solr setup:
Solr version: 6.6.0 and 6.6.1

Index:
id, value
doc1, "This is the first sentence"
doc2, "This is the second sentence"
doc3, "This is the third one"
doc4, "This is the last one"

Field type:

stopwords.txt file:
This
Is
The
random_word

Tests:

Test 1:
Query string: This one
mm value: 2
mm.autoRelax: TRUE
Expected result: doc1
Actual result: doc1

Test 2:
Query string: This one
mm value: 2
mm.autoRelax: FALSE
Expected result: nothing
Actual result: doc1

Test 3:
Query string: This one
mm value: 1
mm.autoRelax: TRUE
Expected result: doc1
Actual result: doc1

Test 4:
Query string: This one
mm value: 1
mm.autoRelax: FALSE
Expected result: nothing
Actual result: doc1

As the tests show, mm.autoRelax is not working properly: it behaves as if it were always true in the cases above. This behaviour is present in both Solr 6.6.0 and 6.6.1.

What can we do to keep using SynonymGraphFilter without introducing the above issues with autoRelax and mm?
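For completeness, the queries were issued roughly like the following SolrJ sketch. defType=edismax is an assumption here (mm and mm.autoRelax are (e)dismax parameters); the collection name "test" and the field name "value" match the setup above.

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class MmSowTest {
        public static void main(String[] args) throws Exception {
            try (SolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/test").build()) {
                SolrQuery q = new SolrQuery("This one");
                q.set("defType", "edismax");    // assumed; mm is an (e)dismax param
                q.set("qf", "value");
                q.set("sow", "false");          // required for SynonymGraphFilter
                q.set("mm", "2");
                q.set("mm.autoRelax", "false"); // test 2: expect 0 hits
                QueryResponse rsp = client.query(q);
                // On 6.6.0/6.6.1 this still prints 1 (doc1), despite mm=2
                System.out.println("numFound: " + rsp.getResults().getNumFound());
            }
        }
    }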
Re: Solr memory leak
Hi,

looks like SOLR-10506 didn't make it into 6.6.1. I do, however, also not see it listed in the current release notes for 6.7 or 7.0:
https://issues.apache.org/jira/projects/SOLR/versions/12340568
https://issues.apache.org/jira/projects/SOLR/versions/12335718

Is there any rough idea yet of when 6.7 or 7.0 will be released?

thanks,
Hendrik

On 28.08.2017 18:31, Erick Erickson wrote:
> Varun Thacker is the RM for Solr 6.6.1, I've pinged him about including it.
>
> On Mon, Aug 28, 2017 at 8:52 AM, Walter Underwood wrote:
>> That would be a really good reason for a 6.7.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>> On Aug 28, 2017, at 8:48 AM, Markus Jelsma wrote:
>>> It is, unfortunately, not committed for 6.7.
>>>
>>> -Original message-
>>> From: Markus Jelsma
>>> Sent: Monday 28th August 2017 17:46
>>> To: solr-user@lucene.apache.org
>>> Subject: RE: Solr memory leak
>>>
>>> See https://issues.apache.org/jira/browse/SOLR-10506
>>> Fixed for 7.0
>>>
>>> Markus
>>>
>>> -Original message-
>>>> From: Hendrik Haddorp
>>>> Sent: Monday 28th August 2017 17:42
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Solr memory leak
>>>>
>>>> Hi,
>>>>
>>>> we noticed that triggering collection reloads on many collections has
>>>> a good chance to result in an OOM-Error. To investigate that further
>>>> I did a simple test:
>>>> - Start Solr with a 2GB heap and 1GB Metaspace
>>>> - create a trivial collection with a few documents (I used only 2
>>>>   fields and 100 documents)
>>>> - trigger a collection reload in a loop (I used SolrJ for this)
>>>>
>>>> Using Solr 6.3 the test started to fail after about 250 loops. Solr
>>>> 6.6 worked better but also failed after 1100 loops.
>>>>
>>>> When looking at the memory usage on the Solr dashboard it looks like
>>>> the space left after GC cycles gets less and less. Then Solr gets
>>>> very slow, as the JVM is busy with GC. A bit later Solr gets an
>>>> OOM-Error. In my last run this was actually for the Metaspace. So it
>>>> looks like more and more heap and metaspace is being used by just
>>>> constantly reloading a trivial collection.
>>>>
>>>> regards,
>>>> Hendrik
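For reference, the reload loop from my test is essentially the following SolrJ sketch (the ZooKeeper address and the collection name "test" are placeholders):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class ReloadLoop {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("localhost:9983").build()) {
                for (int i = 0; i < 2000; i++) {
                    // each iteration triggers a full reload of the collection
                    CollectionAdminRequest.reloadCollection("test").process(client);
                }
            }
        }
    }

With a 2GB heap the loop slows down more and more and eventually dies with an OOM-Error, as described above.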
multi language search engine in solr
Hi

I am working on a multi-language search engine for the English, Bangla, Hindi and Indonesian languages. Can anybody guide me on how to configure the Solr schema?

1.) Should I configure all the languages in a single shard/collection?
2.) Should I configure a separate shard/collection for each language?

I am looking for suggestions at the architecture level for this project. Please suggest and guide me in defining the schema and architecture.
Re: Solr memory leak
There will be no 6.7. Once the X+1 version is released, all past fixes are applied as minor releases to the last released version of the previous major release. So now that 7.0 has been cut, there might be a 6.6.2 (6.6.1 was just released) but no 6.7. Current unreleased JIRAs are parked on the 6.x branch (as opposed to branch_6_6) for convenience. If anyone steps up to release 6.6.2, they can include this.

Why do you say this isn't in 7.0? The "Fix Versions" field clearly states so, as does CHANGES.txt for 7.0. The new file is in the 7.0 branch.

If you need it in 6.x you have a couple of options:
1> agitate for a 6.6.2 with this included
2> apply the patch yourself and compile it locally

Best,
Erick

On Sun, Sep 10, 2017 at 6:04 AM, Hendrik Haddorp wrote:
> Hi,
>
> looks like SOLR-10506 didn't make it into 6.6.1. I do, however, also not
> see it listed in the current release notes for 6.7 or 7.0:
> https://issues.apache.org/jira/projects/SOLR/versions/12340568
> https://issues.apache.org/jira/projects/SOLR/versions/12335718
>
> Is there any rough idea yet of when 6.7 or 7.0 will be released?
>
> thanks,
> Hendrik
>
> On 28.08.2017 18:31, Erick Erickson wrote:
>> Varun Thacker is the RM for Solr 6.6.1, I've pinged him about including
>> it.
>>
>> On Mon, Aug 28, 2017 at 8:52 AM, Walter Underwood wrote:
>>> That would be a really good reason for a 6.7.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/ (my blog)
>>>
>>> On Aug 28, 2017, at 8:48 AM, Markus Jelsma wrote:
>>>> It is, unfortunately, not committed for 6.7.
>>>>
>>>> -Original message-
>>>> From: Markus Jelsma
>>>> Sent: Monday 28th August 2017 17:46
>>>> To: solr-user@lucene.apache.org
>>>> Subject: RE: Solr memory leak
>>>>
>>>> See https://issues.apache.org/jira/browse/SOLR-10506
>>>> Fixed for 7.0
>>>>
>>>> Markus
>>>>
>>>> -Original message-
>>>>> From: Hendrik Haddorp
>>>>> Sent: Monday 28th August 2017 17:42
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Solr memory leak
>>>>>
>>>>> Hi,
>>>>>
>>>>> we noticed that triggering collection reloads on many collections
>>>>> has a good chance to result in an OOM-Error. To investigate that
>>>>> further I did a simple test:
>>>>> - Start Solr with a 2GB heap and 1GB Metaspace
>>>>> - create a trivial collection with a few documents (I used only 2
>>>>>   fields and 100 documents)
>>>>> - trigger a collection reload in a loop (I used SolrJ for this)
>>>>>
>>>>> Using Solr 6.3 the test started to fail after about 250 loops. Solr
>>>>> 6.6 worked better but also failed after 1100 loops.
>>>>>
>>>>> When looking at the memory usage on the Solr dashboard it looks like
>>>>> the space left after GC cycles gets less and less. Then Solr gets
>>>>> very slow, as the JVM is busy with GC. A bit later Solr gets an
>>>>> OOM-Error. In my last run this was actually for the Metaspace. So it
>>>>> looks like more and more heap and metaspace is being used by just
>>>>> constantly reloading a trivial collection.
>>>>>
>>>>> regards,
>>>>> Hendrik
Re: multi language search engine in solr
Mugeesh,

One important question: will the typical document have a mix of English and Bangla and Hindi? If so, you would probably have them all in one collection.

Another thing to think about is the tokenizer. Are all words separated by white space? If not, then you might need to think about which tokenizer to use.

As for character sets, I think you should make sure all the inputs are in UTF-8; then there should be no problem.

There will be other things to consider, but this is a start.

Cheers -- Rick

On September 10, 2017 9:32:11 AM EDT, Mugeesh Husain wrote:
> Hi
>
> I am working on a multi-language search engine for the English, Bangla,
> Hindi and Indonesian languages. Can anybody guide me on how to configure
> the Solr schema?
>
> 1.) Should I configure all the languages in a single shard/collection?
> 2.) Should I configure a separate shard/collection for each language?
>
> I am looking for suggestions at the architecture level for this project.
> Please suggest and guide me in defining the schema and architecture.

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com
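If everything does end up in one collection, a common pattern is one analyzed field per language, queried together. A purely hypothetical SolrJ sketch (the field names value_en, value_bn, value_hi and value_id are made up for illustration):

    import org.apache.solr.client.solrj.SolrQuery;

    public class MultiLangQuery {
        public static void main(String[] args) {
            // Each document carries its text in the field for its language;
            // a single edismax query fans out over all per-language fields.
            SolrQuery q = new SolrQuery("some user input");
            q.set("defType", "edismax");
            q.set("qf", "value_en value_bn value_hi value_id");
            System.out.println(q); // prints the encoded query parameters
        }
    }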
Re: Solr memory leak
I didn't mean to say that the fix is not in 7.0. I just stated that I do not see it listed in the release notes (https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310230&version=12335718).

Thanks for explaining the release process.

regards,
Hendrik

On 10.09.2017 17:32, Erick Erickson wrote:
> There will be no 6.7. Once the X+1 version is released, all past fixes
> are applied as minor releases to the last released version of the
> previous major release. So now that 7.0 has been cut, there might be a
> 6.6.2 (6.6.1 was just released) but no 6.7. Current unreleased JIRAs are
> parked on the 6.x branch (as opposed to branch_6_6) for convenience. If
> anyone steps up to release 6.6.2, they can include this.
>
> Why do you say this isn't in 7.0? The "Fix Versions" field clearly
> states so, as does CHANGES.txt for 7.0. The new file is in the 7.0
> branch.
>
> If you need it in 6.x you have a couple of options:
> 1> agitate for a 6.6.2 with this included
> 2> apply the patch yourself and compile it locally
>
> Best,
> Erick
>
> On Sun, Sep 10, 2017 at 6:04 AM, Hendrik Haddorp wrote:
>> Hi,
>>
>> looks like SOLR-10506 didn't make it into 6.6.1. I do, however, also
>> not see it listed in the current release notes for 6.7 or 7.0:
>> https://issues.apache.org/jira/projects/SOLR/versions/12340568
>> https://issues.apache.org/jira/projects/SOLR/versions/12335718
>>
>> Is there any rough idea yet of when 6.7 or 7.0 will be released?
>>
>> thanks,
>> Hendrik
>>
>> On 28.08.2017 18:31, Erick Erickson wrote:
>>> Varun Thacker is the RM for Solr 6.6.1, I've pinged him about
>>> including it.
>>>
>>> On Mon, Aug 28, 2017 at 8:52 AM, Walter Underwood wrote:
>>>> That would be a really good reason for a 6.7.
>>>>
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/ (my blog)
>>>>
>>>> On Aug 28, 2017, at 8:48 AM, Markus Jelsma wrote:
>>>>> It is, unfortunately, not committed for 6.7.
>>>>>
>>>>> -Original message-
>>>>> From: Markus Jelsma
>>>>> Sent: Monday 28th August 2017 17:46
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: RE: Solr memory leak
>>>>>
>>>>> See https://issues.apache.org/jira/browse/SOLR-10506
>>>>> Fixed for 7.0
>>>>>
>>>>> Markus
>>>>>
>>>>> -Original message-
>>>>>> From: Hendrik Haddorp
>>>>>> Sent: Monday 28th August 2017 17:42
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Subject: Solr memory leak
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> we noticed that triggering collection reloads on many collections
>>>>>> has a good chance to result in an OOM-Error. To investigate that
>>>>>> further I did a simple test:
>>>>>> - Start Solr with a 2GB heap and 1GB Metaspace
>>>>>> - create a trivial collection with a few documents (I used only 2
>>>>>>   fields and 100 documents)
>>>>>> - trigger a collection reload in a loop (I used SolrJ for this)
>>>>>>
>>>>>> Using Solr 6.3 the test started to fail after about 250 loops.
>>>>>> Solr 6.6 worked better but also failed after 1100 loops.
>>>>>>
>>>>>> When looking at the memory usage on the Solr dashboard it looks
>>>>>> like the space left after GC cycles gets less and less. Then Solr
>>>>>> gets very slow, as the JVM is busy with GC. A bit later Solr gets
>>>>>> an OOM-Error. In my last run this was actually for the Metaspace.
>>>>>> So it looks like more and more heap and metaspace is being used by
>>>>>> just constantly reloading a trivial collection.
>>>>>>
>>>>>> regards,
>>>>>> Hendrik
Re: multi language search engine in solr
Thank you Rick for your response.

The documents are separate per language, rather than a mix of Arabic, English, Bengali, Hindi and Malay. I could not find any tokenizer for Malay; can you suggest one if you know of any, please?
ways to check if document is in a huge search result set
Hi

I have a collection of product documents. Each product document has supplier information in it.

I need to check whether a supplier's products are returned in a search result containing over 100,000 products, and on which page (assuming pagination of 20 products per page). It is time-consuming and "labour-intensive" to go through each page to look for the products of the supplier.

Would like to know if you guys have any better and easier ways to do this?

Derek
Re: ways to check if document is in a huge search result set
Maybe I don't understand your problem, but why don't you just filter by the supplier information?

-Michael

On 11.09.2017 04:12, Derek Poh wrote:
> Hi
>
> I have a collection of product documents. Each product document has
> supplier information in it.
>
> I need to check whether a supplier's products are returned in a search
> result containing over 100,000 products, and on which page (assuming
> pagination of 20 products per page). It is time-consuming and
> "labour-intensive" to go through each page to look for the products of
> the supplier.
>
> Would like to know if you guys have any better and easier ways to do
> this?
>
> Derek
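In SolrJ that could look roughly like this (the field name supplier_id and its value are made up for illustration):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class SupplierFilter {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/products").build()) {
                SolrQuery q = new SolrQuery("sensors");  // the original search
                q.addFilterQuery("supplier_id:12345");   // keep only this supplier
                QueryResponse rsp = client.query(q);
                // any hits here are the supplier's products within the result set
                System.out.println("supplier hits: "
                        + rsp.getResults().getNumFound());
            }
        }
    }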
Re: ways to check if document is in a huge search result set
You can request a facet field, a facet query, a filter query, or even explainOther.

On Mon, Sep 11, 2017 at 5:12 AM, Derek Poh wrote:
> Hi
>
> I have a collection of product documents. Each product document has
> supplier information in it.
>
> I need to check whether a supplier's products are returned in a search
> result containing over 100,000 products, and on which page (assuming
> pagination of 20 products per page). It is time-consuming and
> "labour-intensive" to go through each page to look for the products of
> the supplier.
>
> Would like to know if you guys have any better and easier ways to do
> this?
>
> Derek

--
Sincerely yours
Mikhail Khludnev
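For example, a facet query returns the count without paging through results. A sketch (again, supplier_id is an assumed field name):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class SupplierFacet {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/products").build()) {
                SolrQuery q = new SolrQuery("sensors"); // the original search
                q.setRows(0);                           // counts only, no documents
                q.addFacetQuery("supplier_id:12345");   // hypothetical field:value
                QueryResponse rsp = client.query(q);
                Integer hits = rsp.getFacetQuery().get("supplier_id:12345");
                // hits > 0 means the supplier appears somewhere in the result set
                System.out.println("supplier hits in result set: " + hits);
            }
        }
    }

This tells you whether, and how often, the supplier occurs in the full result set; pinning down the exact page would additionally require the rank of the first matching document.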