Doug,

A couple things quickly:

- I'll look into that. How would you go about testing things: a direct URL? If so, how would you compose one of the examples above?
- Yup, I used it extensively before testing scores to make sure things were being parsed appropriately (segmenting off the unit of measure [mm] whilst still keeping the decimal intact, instead of breaking it up, has been my biggest concern lately).
- To that point, though, it looks like one of my blunders was in the synonyms file. I just checked /analysis/ again and realized "CANN" was being mapped to "cannula" instead of "cannulated". #facepalm
- I'll be GLAD to use that! I'd been trying to use http://explain.solr.pl/ previously, but it kept erroring out on me :\
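On the direct-URL question: the Python sketch below composes one of Doug's example queries (quoted further down in this thread) into a Solr select URL. It is a minimal sketch under assumptions: the base URL `http://localhost:8983/solr/products` and the helper names are hypothetical placeholders, and the set of escaped characters is assumed from the Lucene standard query parser's special characters rather than taken from any particular Solr version. Each term list is backslash-escaped and grouped under its field, then the whole `q` parameter is percent-encoded, so the "&" in "Ben & Jerry's" cannot end the parameter early and the "+" operator is sent as `%2B` instead of being decoded as a space. `debugQuery=true` asks Solr to include the score explain in the response.

```python
from urllib.parse import urlencode

# Assumed set of Lucene/Solr standard query parser special characters.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_terms(text: str) -> str:
    """Backslash-escape query-syntax characters one by one; spaces are kept,
    so the terms stay separate inside a field:( ... ) group."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def field_group(field: str, text: str, required: bool = False) -> str:
    """Attach every term to one field, e.g. +descript1:(a b c)."""
    prefix = "+" if required else ""
    return f"{prefix}{field}:({escape_terms(text)})"


def select_url(base: str, q: str, rows: int = 10) -> str:
    """Build a select URL; urlencode turns '+' into %2B, '&' into %26 and
    spaces into '+', which Solr decodes back into spaces."""
    params = {"q": q, "rows": rows, "debugQuery": "true"}
    return f"{base}/select?{urlencode(params)}"


if __name__ == "__main__":
    q = " ".join([
        field_group("mfgname2", "Ben & Jerry's"),
        field_group("descript1", "Strawberry Shortcake Ice Cream 1.5pt", required=True),
        field_group("productnumber", "001-029-1298", required=True),
    ])
    print(q)
    print(select_url("http://localhost:8983/solr/products", q))
```

Escaping every special character is the conservative choice: a lone "&" is not an operator (only "&&" is), but escaping it is harmless. The printed URL can be pasted into a browser, or into Splainer per Doug's suggestion.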

Thanks again, will report back!

--
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Mon, May 18, 2015 at 4:47 PM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:

> Hey John,
>
> I think you likely do need to think about escaping the query operators. I
> doubt the Solr admin could tell the difference.
>
> For analysis, have you looked at the handy analysis tool in the Solr Admin
> UI? It's pretty indispensable for figuring out whether an analyzed query
> matches an analyzed field.
>
> Outside of that, I can selfishly plug Splainer (http://splainer.io), which
> gives you more insight into the Solr relevance explain. You would paste in
> something like
> http://solr.quepid.com/solr/statedecoded/select?q=text:(deer%20hunting)
>
> Cheers!
> -Doug
>
> On Mon, May 18, 2015 at 3:02 PM, John Blythe <j...@curvolabs.com> wrote:
>
> > Thanks again for the speediness, Doug.
> >
> > Good to know on some of those things, not least the + indicating a
> > mandatory field and the parentheses. The escaping seems pretty robust
> > in light of the product number.
> >
> > I'm thinking it has to be largely related to the analyzer. Check this out,
> > this time with more of a real-world case for us. Searching for
> > "descript2: CANN SCREW PT 3.5X50MM" produces a top result that has
> > "Cannulated screw PT 4.0x40mm" as its description. There is a document,
> > though, with the description "Cannulated screw PT 3.5x50mm", which is
> > exactly the rendering the analyzer produces for my query (minus the
> > lowercasing), per the /analysis page. Why would 4.0x40 come up first? The
> > top four results all have 4.0x[something]. It's not until the fifth
> > result that you see a 3.5 something: "Cannulated screw PT 3.5x105mm", at
> > which point I'm saying WTF. So close, but then it ignores the "50" for a
> > "105" instead.
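The tokenization concern from the top message (splitting off the "mm" unit while keeping "3.5" intact) can be illustrated outside Solr. This Python sketch is an illustration under assumptions, not the thread's actual analyzer: it lowercases the text, splits it with a hypothetical regex into decimal numbers and letter runs, and lists which tokens the query shares with each of the three descriptions mentioned above. The query already uses "cannulated" in place of "CANN", as a corrected synonym mapping would produce. The shared-token set is a crude stand-in, not Solr's scoring.

```python
import re

# Hypothetical token pattern (an assumption, not a Solr analyzer):
# a decimal or integer number, or a run of letters. The "x" between the
# dimensions and a trailing unit such as "mm" come out as separate tokens.
TOKEN = re.compile(r"\d+(?:\.\d+)?|[a-z]+")


def tokens(text: str) -> list[str]:
    """Lowercase, then split into number tokens and word tokens."""
    return TOKEN.findall(text.lower())


def shared(query: str, doc: str) -> set[str]:
    """Tokens the query and a document have in common (crude overlap only)."""
    return set(tokens(query)) & set(tokens(doc))


if __name__ == "__main__":
    query = "cannulated screw PT 3.5X50MM"
    for doc in ["Cannulated screw PT 3.5x50mm",
                "Cannulated screw PT 3.5x105mm",
                "Cannulated screw PT 4.0x40mm"]:
        print(doc, sorted(shared(query, doc)))
```

With this token stream, 3.5x105mm shares "3.5" but not "50", and 4.0x40mm shares neither number, so a field type that really emits "3.5" and "50" as separate tokens gives the exact match an advantage. In Solr this behavior would come from the field type's tokenizer and filter chain (PatternTokenizerFactory and WordDelimiterFilterFactory are two candidates); their exact options are worth checking in the reference guide and on the /analysis page rather than assuming they match this sketch.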
> >
> > Further, adding parentheses around the phrase, "descript2: (CANN SCREW PT
> > 3.5X50MM)", produces top results that have the correct dimensions
> > (3.5x50) but the wrong type. Instead of "cannulated" screws we see
> > "cortical." I'm convinced Solr is trolling me at this point :p
> >
> > On Mon, May 18, 2015 at 2:34 PM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:
> >
> > > You might just need some syntax help. I'm not sure what the Solr admin
> > > escapes, but many of the characters in your query have reserved
> > > meaning. Also, when a term appears without a fieldName: prefix directly
> > > in front of it, I believe it's going to search the default field (it's
> > > no longer attached to the field). You need to use parens to attach
> > > multiple terms to that field for search.
> > >
> > > I'd try to see if doing any of the following helps:
> > >
> > > Add parens to group terms to the field:
> > >
> > > mfgname2:(Ben & Jerry's) +descript1:(Strawberry Shortcake Ice Cream 1.5pt) +productnumber:(001-029-1298)
> > >
> > > Also keep in mind "+" means mandatory, and it applies only to the one
> > > field clause it precedes. So in the above you're requiring that the
> > > description and the product number match the provided terms.
> > >
> > > Further, you may need to escape the "-", as that means "NOT".
> > > You can do that with the following:
> > >
> > > mfgname2:(Ben & Jerry's) +descript1:(Strawberry Shortcake Ice Cream 1.5pt) +productnumber:(001\-029\-1298)
> > >
> > > You can read more in the article on Solr query syntax:
> > > https://wiki.apache.org/solr/SolrQuerySyntax
> > >
> > > Hope that helps. For all I know your cut and paste didn't work and I'm
> > > assuming you have syntax issues :)
> > >
> > > -Doug
> > >
> > > On Mon, May 18, 2015 at 2:25 PM, John Blythe <j...@curvolabs.com> wrote:
> > >
> > > > Hey Doug,
> > > >
> > > > Thanks for the quick reply.
> > > >
> > > > No edismax just yet. I'm planning on getting there, but I've spent the
> > > > last week or so trying to fine-tune the 3 primary fields we use before
> > > > jumping into edismax and its nifty toolset to push our accuracy and
> > > > precision even further (aside: is this a good strategy?)
> > > >
> > > > For now I'm querying directly in the admin interface, doing something
> > > > like this:
> > > >
> > > > mfgname2: Ben & Jerry's + descript1: Strawberry Shortcake Ice Cream 1.5pt + productnumber: 001-029-1298
> > > >
> > > > versus
> > > >
> > > > mfgname2: Ben & Jerry's + descript1: Strawberry Shortcake Ice Cream 1.5pt
> > > >
> > > > Another interesting and likely related factor is how little the
> > > > description helps. With the product number in place, the match gets
> > > > nailed even with stray zeros, 4's instead of 1's, etc.
> > > >
> > > > Without it, though, the querying just flat out sucks. For instance, I
> > > > just saw something akin to this:
> > > >
> > > > mfgname2: Ben & Jerry's + descript1: Straw Shortcake Ice Cream 1.5pt
> > > >
> > > > that got nowhere near what it should have.
> > > > Straw would have a synonym mapping it to strawberry and would match the
> > > > document's description *exactly*, yet Solr would push out all sorts of
> > > > peripheral suggestions that didn't match strawberry or were a different
> > > > amount (.75pt, for instance). I know I'm no expert, but I was thinking
> > > > my analyzer was a bit better than that :p
> > > >
> > > > On Mon, May 18, 2015 at 2:18 PM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:
> > > >
> > > > > > The maxScore is 772 when I remove the description.
> > > > > > I suppose the actual question, then, is whether a low relevancy
> > > > > > score on one field hurts the rest of them / the cumulative score,
> > > > >
> > > > > This depends a lot on how you're searching over these fields. Is this
> > > > > an (e)dismax query? Or a Lucene query? Something else?
> > > > >
> > > > > Across fields there's query normalization, which is computed from the
> > > > > sum of squares of the IDFs of the search terms across the fields being
> > > > > searched. Adding/removing a field could impact query normalization.
> > > > >
> > > > > By removing a field, you also likely remove a boolean clause. With
> > > > > that clause gone, there's less of a chance the coordination factor
> > > > > (known as coord) would punish your relevancy score.
> > > > >
> > > > > Otherwise, I don't know. Perhaps you could give us more information on
> > > > > how you're searching your documents? Perhaps a sample Solr URL that
> > > > > shows how you're querying?
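Doug's two points, query normalization and coord, can be made concrete with the classic Lucene TF-IDF formulas (DefaultSimilarity, the default at the time of this thread): queryNorm = 1 / sqrt(sum of (idf * boost)^2 over the query's terms), and a BooleanQuery multiplies the sum of its matching clause scores by coord = matching clauses / total clauses. The Python sketch below is deliberately simplified: it omits tf, field norms and per-field boosts, treats a clause score of 0.0 as "did not match", and the document counts and clause scores are made-up numbers, not taken from John's index.

```python
import math

# Classic (pre-BM25) Lucene TF-IDF pieces, heavily simplified.


def classic_idf(num_docs: int, doc_freq: int) -> float:
    """DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def query_norm(idfs: list[float], boost: float = 1.0) -> float:
    """1 / sqrt(sum of squared term weights), with weight = idf * boost.
    It has the same value for every document in one query."""
    return 1.0 / math.sqrt(sum((idf * boost) ** 2 for idf in idfs))


def coord(matched: int, total: int) -> float:
    """Fraction of the BooleanQuery's clauses that matched the document."""
    return matched / total


def boolean_score(clause_scores: list[float]) -> float:
    """coord * sum of matching clause scores (0.0 means no match here)."""
    matched = [s for s in clause_scores if s > 0.0]
    return coord(len(matched), len(clause_scores)) * sum(matched)


if __name__ == "__main__":
    n = 100_000                      # hypothetical index size
    product_no = classic_idf(n, 1)   # rare term: high idf
    mfg_name = classic_idf(n, 500)
    screw = classic_idf(n, 20_000)   # common description term: low idf
    print("queryNorm, 2 terms:", query_norm([product_no, mfg_name]))
    print("queryNorm, 4 terms:", query_norm([product_no, mfg_name, screw, screw]))
    # Hypothetical clause scores: mfgname2, productnumber, descript1
    print("all three match:   ", boolean_score([5.0, 3.0, 0.1]))
    print("description misses:", boolean_score([5.0, 3.0, 0.0]))
    print("two-clause query:  ", boolean_score([5.0, 3.0]))
```

Two things fall out of this. queryNorm is identical for every document in a query, so adding the description terms lowers all scores (consistent with the lower maxScore John saw with the description included) without by itself changing the ranking. And a description clause that matches weakly still adds a little to the score, while one that does not match at all costs a full coord penalty (2/3 of the score here). Later Lucene versions switched the default similarity to BM25 and removed coord and queryNorm, so this applies to classic scoring only.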
> > > > >
> > > > > Cheers,
> > > > > --
> > > > > *Doug Turnbull* | Search Relevance Consultant | OpenSource Connections,
> > > > > LLC | 240.476.9983 | http://www.opensourceconnections.com
> > > > > Author: Relevant Search <http://manning.com/turnbull> from Manning
> > > > > Publications
> > > > > This e-mail and all contents, including attachments, is considered to be
> > > > > Company Confidential unless explicitly stated otherwise, regardless
> > > > > of whether attachments are marked as such.
> > > > >
> > > > > On Mon, May 18, 2015 at 1:57 PM, John Blythe <j...@curvolabs.com> wrote:
> > > > >
> > > > > > Background:
> > > > > > I'm using Solr as a search mechanism for users, but also, before it
> > > > > > ever gets to that point, as a means of (more or less) intelligent
> > > > > > inference. Product data comes in and we're hoping to match it to the
> > > > > > correct known product without having to involve the user for
> > > > > > confirmation/search.
> > > > > >
> > > > > > Problem:
> > > > > > I get a maxScore (with the correct result at the top) of 618.22626
> > > > > > using the manufacturer's name, the product number, and the product
> > > > > > description. All of these items come from a previous purchaser, so
> > > > > > we have to account for manufacturer name variations, miskeyed
> > > > > > product numbers, and variations in descriptions. The maxScore is 772
> > > > > > when I remove the description.
> > > > > >
> > > > > > My initial question is about relevancy scoring
> > > > > > (https://wiki.apache.org/solr/SolrRelevancyFAQ). I get that many of
> > > > > > the description's tokens will be found throughout the other
> > > > > > documents, thus keeping the relevancy low via the IDF portion of the
> > > > > > relevancy score.
> > > > > > I suppose the actual question, then, is whether a low relevancy
> > > > > > score on one field hurts the rest of them / the cumulative score, or
> > > > > > whether it simply keeps that field's contribution lower than it'd
> > > > > > otherwise be. I thought it was the latter, but the results I mention
> > > > > > above are making me think the former is actually the case.
> > > > > >
> > > > > > Based on what I hear about the above, a follow-up question may be
> > > > > > what in the world is wrong with my analyzer :)
> > > > > >
> > > > > > Thanks for any thoughts!
> > > > > >
> > > > > > Best,
> > > > > > John