Partial Match with DF
Forgive me if I'm missing something obvious -- I'm new to Solr, but I can't seem to find an explanation for the behavior I'm seeing.

If I have a document that looks like this:
{
  field1: "aaa bbb",
  field2: "ccc ddd",
  field3: "eee fff"
}

And I do a search where "q" is "aaa ccc", I get the document in the results. This is because (please correct me if I'm wrong) the default "df" is set to the "_text_" field, which contains the text values from all fields.

However, if I do a search where "df" is "field1" and "field2" and "q" is "aaa ccc" (words from field1 and field2), I get no results.

In a simpler example, if I do a search where "df" is "field1" and "q" is "aaa" (a word from field1), I still get no results.

If I do a search where "df" is "field1" and "q" is "aaa bbb" (the full value of field1), then I get the document in the results.

So I'm concluding that when using "df" to specify which fields to search, only an exact match on the full field value will return a document.

Is that a correct conclusion? Is there another way to specify which fields to search without requiring an exact match? The results I'd like to achieve are:

Would Match:
q=aaa
q=aaa bbb
q=aaa ccc
q=aaa fff

Would Not Match:
q=eee
q=fff
q=eee fff

-- 
*This message is intended only for the use of the individual or entity to which it is addressed and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If you have received this message in error, you are hereby notified that any use, dissemination, distribution or copying of this message is prohibited. If you have received this communication in error, please notify the sender immediately and destroy the transmitted information.*
Re: Partial Match with DF
Oh, great! Thank you!

So if I switch over to eDisMax, I'd specify the fields to query via the "qf" parameter, right? That seems to have the same result (it only matches when I specify the exact phrase in the field, not just certain words from it).

On Thu, Mar 16, 2017 at 9:33 AM, Alexandre Rafalovitch wrote:
> df is the default field - you can only give one. To search over multiple
> fields, you switch to the eDisMax query parser and the qf parameter.
>
> Then, the question will be what type definition your fields have. When you
> search the _text_ field, you are using its definition, because your fields
> are copied into it via copyField. Your original fields may be strings.
>
> Remember to reload the core and reindex when you change definitions.
>
> Regards,
> Alex

-- 
Best Regards, *Mark Johnson* | .NET Software Engineer Office: 603-392-7017 Emerson Ecologics, LLC | 1230 Elm Street | Suite 301 | Manchester NH | 03101 <http://www.emersonecologics.com/> <https://wellevate.me/#/> *Supporting The Practice Of Healthy Living* <http://blog.emersonecologics.com/> <https://www.linkedin.com/company/emerson-ecologics> <https://www.facebook.com/emersonecologics/> <https://twitter.com/EmersonEcologic> <https://www.instagram.com/emerson_ecologics/> <https://www.pinterest.com/emersonecologic/> <https://www.glassdoor.com/Overview/Working-at-Emerson-Ecologics-EI_IE388367.11,28.htm>
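To make the eDisMax suggestion concrete, here is a minimal Python sketch that builds a select URL with the standard library. `defType`, `qf` and `debug=query` are the Solr parameters discussed in this thread; the host, port and core name (`mycore`) are placeholders, not values from the original messages. As Alex points out, switching to `qf` alone won't help while the fields are still "string" typed.

```python
from urllib.parse import urlencode

# Placeholder base URL and core name; adjust to your own installation.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def edismax_url(query: str, fields: list[str]) -> str:
    """Build a select URL that searches `fields` with the eDisMax parser."""
    params = {
        "q": query,
        "defType": "edismax",    # use eDisMax instead of the standard parser
        "qf": " ".join(fields),  # fields to search, space-separated
        "debug": "query",        # include the parsed query in the response
    }
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(edismax_url("aaa ccc", ["field1", "field2"]))
```

Unlike `df`, which takes a single field name, `qf` accepts a space-separated list of fields.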
Re: Partial Match with DF
You're right! The fields I'm searching are all "string" type. I switched to "text_en" and now it's working exactly as I need it to! I'll do some research to see if "text_en" or another "text" type field is best for our needs.

Also, those debug options are amazing! They'll help tremendously in the future.

Thank you much!

On Thu, Mar 16, 2017 at 10:02 AM, Erick Erickson wrote:
> My guess: your analysis chain for the fields is different, i.e. they
> have a different fieldType. In particular, watch out for the "string"
> type; people are often confused about it. It does _not_ break input
> into tokens. You need a text-based field type; text_en is one example
> that is usually in the configs by default.
>
> Two tools that'll help you enormously:
>
> admin UI >> select core (or collection) from the drop-down >> analysis.
> That shows you exactly how Solr/Lucene breaks up text at query and index
> time.
>
> Add &debug=query to the URL. That'll show you how the query was parsed.
>
> Best,
> Erick
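The difference Erick describes can be illustrated with a toy Python model. This is not Solr's analysis code. It assumes the text analyzer behaves roughly like "lowercase, then split on non-alphanumerics", that the query is analyzed with the same function as the field, and that query terms are ORed (as with the default `q.op=OR`). Under those assumptions it reproduces both the exact-value-only behavior seen on string fields and the Would Match / Would Not Match table from the original question.

```python
import re

# The example document from the original question.
DOC = {"field1": "aaa bbb", "field2": "ccc ddd", "field3": "eee fff"}
# Fields to search, as you would list them in qf.
SEARCHED = ["field1", "field2"]


def string_terms(value: str) -> set[str]:
    """Model of a "string" field: the whole value is one untokenized term."""
    return {value}


def text_terms(value: str) -> set[str]:
    """Crude model of a text field: lowercase, then split on non-alphanumerics."""
    return set(re.findall(r"[a-z0-9]+", value.lower()))


def matches(query: str, doc: dict[str, str], fields: list[str], analyze) -> bool:
    """OR semantics: match if any analyzed query term occurs in any searched field."""
    indexed = set().union(*(analyze(doc[f]) for f in fields))
    return bool(analyze(query) & indexed)


if __name__ == "__main__":
    for q in ["aaa", "aaa bbb", "aaa ccc", "aaa fff", "eee", "fff", "eee fff"]:
        print(f"q={q!r:12} string={matches(q, DOC, SEARCHED, string_terms)} "
              f"text={matches(q, DOC, SEARCHED, text_terms)}")
```

With `string_terms` only the full value "aaa bbb" can match; with `text_terms` any single word from field1 or field2 is enough, and words that only occur in field3 never match.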
Re: Partial Match with DF
Wow, that's really powerful! Thank you!

On Thu, Mar 16, 2017 at 11:19 AM, Charlie Hull wrote:
> Hi Mark,
>
> OpenSource Connections' excellent www.splainer.io might also be useful to
> help you break down exactly what your query is doing.
>
> Cheers
>
> Charlie
>
> P.S. Planning a blog post soon listing 'useful Solr tools'.
Re: Partial Match with DF
Thank you for the heads up! I think in some cases we will want to strip out punctuation, but in others we might need it (for example, "liquid courage." should tokenize to "liquid" and "courage", while "1.5 oz liquid courage" should tokenize to "1.5", "oz", "liquid" and "courage"). I'll have to do some experimenting to see which one will work best for us.

On Thu, Mar 16, 2017 at 11:09 AM, Erick Erickson wrote:
> Yeah, they've saved me on numerous occasions, glad to see they helped.
>
> One caution BTW: when you start changing fieldTypes, you have to
> watch punctuation. StandardTokenizerFactory won't pass through most
> punctuation.
>
> WordDelimiterFilterFactory breaks on non-alphanumeric characters, including
> punctuation, effectively throwing it out.
>
> But WhitespaceTokenizer splits only on whitespace and spits out punctuation
> as part of tokens, e.g. "my words." (note the period) is broken up as "my"
> "words." and wouldn't match a search on "words".
>
> One other note: there's a tokenizer/filter for a zillion different
> cases; you can go wild. Here's a partial list:
> https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers%2C+Tokenizers%2C+and+Filters
> See the "Tokenizer", "Filters" and "CharFilters" links. There are 12
> tokenizers listed and 40 or so filters... and the list is not
> guaranteed to be complete.
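A small Python sketch can make the punctuation trade-off concrete. `whitespace_tokens` mimics what Erick says WhitespaceTokenizer does; `punctuation_aware_tokens` is a hypothetical regex tokenizer (not one of Solr's) that keeps "." or "," only when they sit between word characters, which is one way to get the "1.5" vs "courage." split described above. Whatever you choose in Solr, the admin/analysis screen is the place to confirm how the real tokenizers treat your data.

```python
import re

# Hypothetical rule: a token is a run of word characters, optionally joined
# by "." or "," *between* word characters, so "1.5" stays whole but a
# trailing period is dropped.
TOKEN = re.compile(r"\w+(?:[.,]\w+)*")


def whitespace_tokens(text: str) -> list[str]:
    """Like WhitespaceTokenizer: split on whitespace only, keep punctuation."""
    return text.split()


def punctuation_aware_tokens(text: str) -> list[str]:
    """Keep internal '.'/',' (e.g. decimals), drop leading/trailing punctuation."""
    return TOKEN.findall(text)


if __name__ == "__main__":
    for s in ["liquid courage.", "1.5 oz liquid courage", "my words."]:
        print(s, whitespace_tokens(s), punctuation_aware_tokens(s))
```

The downside of this rule is that text with a missing space after a period (for example "end.Start") would stay glued together as one token.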
Regex Phrases
Is it possible to configure Solr to treat text that matches a regex as a phrase?

I have a database full of products, and the Title and Description fields are text_en, tokenized via the StandardTokenizerFactory. This works in most cases, but a number of products have names like:

- Vitamin A
- Vitamin-A
- Vitamin B12
- Vitamin B-12
...and so on

I have a regex that will match all of the permutations and would like to configure the field type so that anything that matches the regex pattern is treated as a single token, instead of being broken up by spaces, etc. Is that possible?
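The question doesn't include the actual regex, so the pattern below is a hypothetical stand-in. This minimal Python sketch shows one way a single pattern can cover all four permutations and normalize each match to one canonical token (the `vitamin_b12` form is an illustrative choice, not a Solr convention). In Solr the same idea would live in the field type's analysis chain rather than in application code.

```python
import re

# Hypothetical pattern: "vitamin", an optional space or hyphen, one letter,
# an optional hyphen, optional digits, then a word boundary.
VITAMIN = re.compile(r"\bvitamin[\s-]?([a-z])-?(\d*)\b", re.IGNORECASE)


def canonical_vitamin_tokens(text: str) -> list[str]:
    """Return one normalized token (e.g. "vitamin_b12") per match in `text`."""
    return [f"vitamin_{letter.lower()}{digits}"
            for letter, digits in VITAMIN.findall(text)]


if __name__ == "__main__":
    for name in ["Vitamin A", "Vitamin-A", "Vitamin B12", "Vitamin B-12"]:
        print(name, canonical_vitamin_tokens(name))
```

The trailing `\b` keeps names such as "Vitamin Caps" from being read as "Vitamin C" followed by "aps".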
Re: Regex Phrases
Awesome, thank you much!

On Wed, Mar 22, 2017 at 2:38 PM, Erick Erickson wrote:
> Take a close look at WordDelimiterFilterFactory. It's designed to deal
> with things like part numbers, phone numbers and the like, and the
> example you gave is in the same class of problem, I think. It'll take
> a bit to get your head around what it does, but it'll perform better
> than regexes, assuming you can get what you need out of it.
>
> And the admin/analysis page will help you _greatly_ in understanding
> what the effects of the various parameters are.
>
> Best,
> Erick
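To give a feel for why WordDelimiterFilterFactory fits this class of problem, here is a very rough Python imitation of it, applied after whitespace tokenization. It is not the real filter: it only splits on non-alphanumerics and letter/digit transitions (similar in spirit to `generateWordParts`, `generateNumberParts` and `splitOnNumerics`) and optionally appends the concatenation of all parts (like `catenateAll`). It ignores case-change splitting, protected words and token positions.

```python
import re


def word_delimiter_approx(token: str, catenate_all: bool = True) -> list[str]:
    """Rough imitation of WordDelimiterFilter on one token: split on
    non-alphanumerics and letter/digit boundaries, optionally add the
    concatenation of all parts."""
    parts = re.findall(r"[A-Za-z]+|\d+", token)
    if catenate_all and len(parts) > 1:
        parts.append("".join(parts))
    return parts


def analyze(text: str) -> list[str]:
    """Whitespace tokenization followed by the approximate word-delimiter step."""
    return [t for tok in text.split() for t in word_delimiter_approx(tok)]


if __name__ == "__main__":
    for name in ["Vitamin A", "Vitamin-A", "Vitamin B12", "Vitamin B-12"]:
        print(name, analyze(name))
```

In this model "Vitamin B12" and "Vitamin B-12" produce identical terms, which is the kind of normalization that lets different spellings of a part number match each other.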
Re: Regex Phrases
So I managed to get the tokenizing to work with both PatternTokenizerFactory and WordDelimiterFilterFactory (used in combination with WhitespaceTokenizerFactory). For the PatternTokenizer I used a regex that matches the various permutations of the phrases, and for WordDelimiterFilter/WhitespaceTokenizer I used protected words with every permutation (there are only 40 or 50).

In both cases, via the admin/analysis screen, the Index and Query values were tokenized correctly (for example, "Super Vitamin C" was tokenized as "Super" and "Vitamin C").

However, when I do a query like "DisplayName:(Super Vitamin C)" with "debug=query", I see that the parsed query is "DisplayName:Super DisplayName:Vitamin DisplayName:C" ("DisplayName" is the field I'm working on here). Shouldn't that instead be parsed as something like 'DisplayName:Super DisplayName:"Vitamin C"'? Or am I not understanding how query parsing works?

In either case, I'm seeing results where DisplayName contains things like "Vitamin B 90 Caps" or "Super Orange 30 pkts", neither of which contains the phrase "Vitamin C", so I suspect something is wrong.

On Thu, Mar 23, 2017 at 8:08 AM, Joel Bernstein wrote:
> You can also check out
> https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-RegularExpressionPatternTokenizer
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Mar 22, 2017 at 7:52 PM, Erick Erickson wrote:
>
> > Susheel:
> >
> > That'll work, but the options you've specified for
> > WordDelimiterFilterFactory pretty much make it so it's doing nothing.
> > I realize it's commented out...
> >
> > That said, it's true that if you have a very specific pattern you want
> > to recognize, a regex can do the trick. WDFF is a bit more generic,
> > though, when you have less specific requirements.
> >
> > Best,
> > Erick
> >
> > On Wed, Mar 22, 2017 at 12:56 PM, Susheel Kumar wrote:
> > > I have used PatternReplaceFilterFactory in some of these situations,
> > > e.g. below:
> > >
> > > <filter class="solr.PatternReplaceFilterFactory" pattern="(\d+)-(\d+)-?(\d+)$"
> > > replacement="$1$2$3"/>
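The thread doesn't answer the parsing question. One likely factor is that the classic Lucene/Solr query parser splits the query text on whitespace *before* passing each piece to the field's analyzer, so the analyzer never sees "Vitamin C" as one string. Each single-word term is then ORed, so any product containing "Super" can match. The Python sketch below contrasts the two orders of operations. The tokenizer regex is a hypothetical stand-in for the PatternTokenizer configuration described above, not the actual one.

```python
import re

# Hypothetical stand-in for the PatternTokenizer regex: a vitamin name
# (letter plus optional hyphen/digits) is one token; any other run of
# non-space characters is a token of its own.
TOKEN = re.compile(r"vitamin[\s-]?[a-z]-?\d*\b|\S+", re.IGNORECASE)


def analyze(text: str) -> list[str]:
    """Field analysis: run the tokenizer over the whole string it is given."""
    return TOKEN.findall(text)


def analyze_split_first(query: str) -> list[str]:
    """Parser splits on whitespace first, then analyzes each chunk separately."""
    return [tok for chunk in query.split() for tok in analyze(chunk)]


if __name__ == "__main__":
    q = "Super Vitamin C"
    print("whole string:", analyze(q))
    print("split first: ", analyze_split_first(q))
```

Quoting the query (`DisplayName:"Super Vitamin C"`) hands the whole string to the analyzer in one piece, which is one way to check whether this is what's happening. Newer Solr versions (6.5 and later, if memory serves) also expose a `sow` (split on whitespace) parameter for this; check the reference guide for your version before relying on it.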
Concurrent Updates
We have a SolrCloud cluster (of 3 nodes) running Solr 6.4.2. Every night, we delete and recreate our whole catalog. In this process, we're simultaneously running a query that recreates the product catalog (which includes child documents of a different type) and a query that creates a third document type that we use for joining.

When we issue a search against one shard, we see the response we expect. But when we issue the same search against another shard, instead of the prescribed child documents, we get children that are this third type of document. This only seems to affect the occasional document.

We're wondering if anybody out there has experience with this and might have some ideas as to why it is happening. Thanks so much.