solr custom boost
Hi,

I have an index with contacts. I also have a graph with triples. When a person logs in, I can use their information to query the graph and get the node lengths from that person to various contents such as job, location, etc.

Example: PersonA, Job: Programmer, Location: Floor2

Node lengths:
Job: Programmer>Programmer: 0, Programmer>Architect: 1, Programmer>Manager: 2
Location: Floor2>Floor1: 1, Floor2>Floor2: 0, Floor2>Floor3: 1

Weights:
nodeLength: 0, weight: 10
nodeLength: 1, weight: 8
nodeLength: 2, weight: 6

I want a query-time boost with a custom algorithm in which the node length values determine the boost weight, e.g. JobWeight+LocationWeight:

job:programmer^10 + location:floor1^8
job:architect^8 + location:floor1^8
job:manager + location:floor1^8
job:programmer^10 + location:floor2^10

A query like the one above, written with function queries, would become too long as more data is added to the graph, especially if I need a custom boost algorithm.

Any advice on how to achieve this would be most appreciated.

Thanks,
Sharnel
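One way to keep the query short is to generate one boost clause per graph value instead of one clause per job/location combination. The sketch below is only an illustration under stated assumptions: the weight table and PersonA's node lengths are taken from the post, while the function names and the idea of sending the result as an edismax `bq` parameter are assumptions, not something the thread confirms.

```python
# Weight table from the post: graph node length -> boost weight.
WEIGHT_BY_DISTANCE = {0: 10, 1: 8, 2: 6}


def boost_clauses(field, distances):
    """Turn {value: node_length} into clauses such as 'job:programmer^10'."""
    clauses = []
    for value, distance in sorted(distances.items()):
        weight = WEIGHT_BY_DISTANCE.get(distance)
        if weight is not None:  # skip values farther away than the table covers
            clauses.append(f"{field}:{value.lower()}^{weight}")
    return clauses


def build_boost_query(distances_by_field):
    """Join the per-field clauses into a single boost query string."""
    clauses = []
    for field, distances in distances_by_field.items():
        clauses.extend(boost_clauses(field, distances))
    return " ".join(clauses)


# Node lengths for PersonA, as listed in the post.
person_a = {
    "job": {"Programmer": 0, "Architect": 1, "Manager": 2},
    "location": {"Floor1": 1, "Floor2": 0, "Floor3": 1},
}

boost_query = build_boost_query(person_a)
print(boost_query)
```

The string grows linearly with the number of graph values rather than with the number of combinations. Note that a boost scales each clause's relevance score, so the combined score is only approximately JobWeight+LocationWeight; an exact custom formula would likely need a function query or a custom component.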
Re: Wildcard searches with space in TextField/StrField
Thanks, Erick.

I am actually not trying to use the String field (prefer a TextField here). But, in my comparisons with TextField, it seems that something like phrase matching with whitespace and wildcard (like, 'my do*' or say, 'my dog*', or say, 'my dog has*') can only be accomplished with a string type field, especially because, with a WhitespaceTokenizer in TextField, the space will be lost, and all tokens will be individually considered. Am I missing something?

SRK

On Friday, November 11, 2016 10:05 PM, Erick Erickson wrote:

You have to query text and string fields differently, that's just the way it works. The problem is getting the query string through the parser as a _single_ token or as multiple tokens.

Let's say you have a string field with the "a b" example. You have a single token
a b
that starts at offset 0.

But with a text field, you have two tokens,
a at position 0
b at position 1

But when the query parser sees "a b" (without quotes) it splits it into two tokens, and only the text field has both tokens so the string field won't match.

OTOH, when the query parser sees "a\ b" it passes this through as a single token, which only matches the string field as there's no _single_ token "a b" in the text field.

But a more interesting question is why you want to search this way. String fields are intended for keywords, machine-generated IDs and the like. They're pretty useless for searching anything except
1> exact tokens
2> prefixes

While if you have "my dog has fleas" in a string field, you _can_ search "*dog*" and get a hit but the performance is poor when you get a large corpus. Performance for "my*" will be pretty good though.

In all this sounds like an XY problem, what's the use-case you're trying to solve?

Best,
Erick

On Thu, Nov 10, 2016 at 10:11 PM, Sandeep Khanzode wrote:
> Hi Erick, Reth,
>
> The 'a\ b*' as well as the q.op=AND approach worked (successfully) only for
> StrField for me.
>
> Any attempt at creating a 'a\ b*' for a TextField does not match any
> documents. The parsedQuery in debug mode does show 'field:a b*'. I am sure
> there are documents that should match.
> Another (maybe unrelated) observation is if I have 'field:a\ b', then the
> parsedQuery is field:a field:b. Which does not match as expected (matches
> individually).
>
> Can you please provide an example that I can use in Solr Query dashboard?
> That will be helpful.
>
> I have also seen that wildcard queries work irrespective of field type i.e.
> StrField as well as TextField. That makes sense because with a
> WhitespaceTokenizer only creates word boundaries when we do not use a
> EdgeNGramFilter. If I am not wrong, that is. SRK
>
> On Friday, November 11, 2016 5:00 AM, Erick Erickson
> wrote:
>
>
> You can escape the space with a backslash as 'a\ b*'
>
> Best,
> Erick
>
> On Thu, Nov 10, 2016 at 2:37 PM, Reth RM wrote:
>> I don't think you can do wildcard on StrField. For text field, if your
>> query is "category:(test m*)" the parsed query will be "category:test OR
>> category:m*"
>> You can add q.op=AND to make an AND between those terms.
>>
>> For phrase type wild card query support, as per docs, it
>> is ComplexPhraseQueryParser that supports it. (I haven't tested it myself)
>>
>> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
>>
>> On Thu, Nov 10, 2016 at 11:40 AM, Sandeep Khanzode <
>> sandeep_khanz...@yahoo.com.invalid> wrote:
>>
>>> Hi,
>>> How does a search like abc* work in StrField. Since the entire thing is
>>> stored as a single token, is it a type of a trie structure that allows such
>>> wildcard matching?
>>> How can searches with space like 'a b*' be executed for text fields
>>> (tokenized on whitespace)? If we specify this type of query, it is broken
>>> down into two queries with field:a and field:b*. I would like them to be
>>> contiguous, sort of, like a phrase search with wild card.
>>> SRK
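The two approaches discussed in this thread can be sketched as query strings. This is a minimal illustration, not a tested recipe: the field names `title_s` (string field) and `title_t` (whitespace-tokenized text field) are hypothetical, and the complexphrase form follows the parser Reth linked to, which nobody in the thread reports having verified.

```python
def escape_spaces(text):
    # Backslash-escape spaces so the query parser passes the text through
    # as a single token. Other special characters are not handled here.
    return text.replace(" ", "\\ ")


def strfield_prefix_query(field, prefix):
    # Prefix query against an untokenized string field: escaped text plus '*'.
    return f"{field}:{escape_spaces(prefix)}*"


def textfield_phrase_prefix_query(field, prefix):
    # Phrase-with-trailing-wildcard query for a tokenized field, using the
    # complexphrase parser via local params.
    return f'{{!complexphrase inOrder=true}}{field}:"{prefix}*"'


# Hypothetical field names, for illustration only.
str_query = strfield_prefix_query("title_s", "my do")
text_query = textfield_phrase_prefix_query("title_t", "my do")

print(str_query)
print(text_query)
```

The first form relies on the whole value being indexed as one token, which is why it only matched StrField in Sandeep's tests. The second keeps the tokens separate but asks the parser to match them in order as a phrase.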
collection creation and recovery
Hi,

I have a SolrCloud 6.2.1 setup with 5 nodes. I occasionally restart my nodes, one node at a time. I have quite a few collections, let's say 2000, with a replication factor of 3. When a node comes up again, it looks like I get the same issue as described in SOLR-5796. According to Jira this should be fixed in 6.0. Is there now a setting to increase the conflict resolution time? I also saw some leader conflict exceptions in some logs. If so, could somebody point me to those settings?

A second thing is that when a new collection is being created, it looks like data is first written into /clusterstate.json in ZK. I thought this was a legacy file that is no longer used, but that does not seem to be the case. The problem is that if a new collection is being created, its first replica is being assigned to a node, and I happen to stop exactly that node, the collection does not seem to recover after the restart. The admin UI shows the new collection with one down replica, but it never recovers. In this state I cannot create any further collections. The only solution I have found so far is to set the contents of /clusterstate.json to "{}", but this kills the collection. Is that a known issue?

The release notes of Solr 6.3 state: "Many bug fixes related to SolrCloud recovery for data safety and faster recovery times". Any chance that those could fix the issues I'm seeing?

Thanks,
Hendrik
Re: Edismax query parsing in Solr 4 vs Solr 6
This has come up a lot on the lists lately. Keep in mind that edismax parses your query using additional parameters such as 'mm' and 'q.op'. It is the handling of these parameters (and the selection of default values) which has changed between versions to address a few functionality gaps.

The most common issue I've seen is where users were not setting those values and were relying on the defaults. You might now need to set them explicitly to return to desired behaviour.

I can't see all of your configuration, but I'm guessing the important one here is 'q.op', which was previously hard coded to 'OR', irrespective of either parameters or solrconfig. Try setting that to 'OR' explicitly... maybe you have your default operator set to 'AND' in solrconfig and that is now being applied? The other option is 'mm', which I suspect should be set to '0' unless you have some reason to want it. If it was set to '100%' it might insert the additional '+' flags, but it can also show up as a '~' operator on the end.

Ta,
Greg

On 8 November 2016 at 22:13, Max Bridgewater wrote:
> I am migrating a solr based app from Solr 4 to Solr 6. One of the
> discrepancies I am noticing is around edismax query parsing. My code makes
> the following call:
>
>
> userQuery="+(title:shirts isbn:shirts) +(id:20446 id:82876)"
> Query query=QParser.getParser(userQuery, "edismax", req).getQuery();
>
>
> With Solr 4, query becomes:
>
> +(+(title:shirt isbn:shirts) +(id:20446 id:82876))
>
> With Solr 6 it however becomes:
>
> +(+(+title:shirt +isbn:shirts) +(+id:20446 +id:82876))
>
> Digging deeper, it appears that parseOriginalQuery() in
> ExtendedDismaxQParser is adding those additional + signs.
>
>
> Is there a way to prevent this altering of queries?
>
> Thanks,
> Max.
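The advice to set 'q.op' and 'mm' explicitly rather than rely on defaults can be sketched as request parameters. This is an illustration only: the helper name is hypothetical, and Max's code calls QParser from Java with an existing request, so in his case the same parameters would have to be present on that request or in the handler defaults.

```python
from urllib.parse import urlencode


def edismax_params(user_query, q_op="OR", mm="0"):
    # Spell out q.op and mm so the parsed query does not depend on whatever
    # defaults the running Solr version or solrconfig would apply.
    return {
        "defType": "edismax",
        "q": user_query,
        "q.op": q_op,
        "mm": mm,
    }


params = edismax_params("+(title:shirts isbn:shirts) +(id:20446 id:82876)")
query_string = urlencode(params)  # ready to append to a /select URL
print(query_string)
```

Adding debugQuery=true to such a request shows the parsed query, which makes it easy to check whether the extra '+' flags have gone away.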
Re: Edismax query parsing in Solr 4 vs Solr 6
Hi Greg,

Your analysis is SPOT ON. I did some debugging and found out that we had q.op in the default set to AND. And when I changed that to OR, things worked exactly as in Solr 4. So, it seemed Solr 6 was behaving as it should. What I could not explain was whether Solr 4 was using the configured q.op that was set in the default or not. But your explanation makes sense now.

Thanks,
Max.

On Sat, Nov 12, 2016 at 4:54 PM, Greg Pendlebury wrote:
> This has come up a lot on the lists lately. Keep in mind that edismax
> parses your query uses additional parameters such as 'mm' and 'q.op'. It is
> the handling of these parameters (and the selection of default values)
> which has changed between versions to address a few functionality gaps.
>
> The most common issue I've seen is where users were not setting those
> values and relying on the defaults. You might now need to set them
> explicitly to return to desired behaviour.
>
> I can't see all of your configuration, but I'm guessing the important one
> here is 'q.op', which was previously hard coded to 'OR', irrespective of
> either parameters or solrconfig. Try setting that to 'OR' explicitly...
> maybe you have your default operator set to 'AND' in solrconfig and that is
> now being applied? The other option is 'mm', which I suspect should be set
> to '0' unless you have some reason to want it. If it was set to '100%' it
> might insert the additional '+' flags, but it can also show up as a '~'
> operator on the end.
>
> Ta,
> Greg
>
> On 8 November 2016 at 22:13, Max Bridgewater
> wrote:
>
> > I am migrating a solr based app from Solr 4 to Solr 6. One of the
> > discrepancies I am noticing is around edismax query parsing. My code makes
> > the following call:
> >
> >
> > userQuery="+(title:shirts isbn:shirts) +(id:20446 id:82876)"
> > Query query=QParser.getParser(userQuery, "edismax", req).getQuery();
> >
> >
> > With Solr 4, query becomes:
> >
> > +(+(title:shirt isbn:shirts) +(id:20446 id:82876))
> >
> > With Solr 6 it however becomes:
> >
> > +(+(+title:shirt +isbn:shirts) +(+id:20446 +id:82876))
> >
> > Digging deeper, it appears that parseOriginalQuery() in
> > ExtendedDismaxQParser is adding those additional + signs.
> >
> >
> > Is there a way to prevent this altering of queries?
> >
> > Thanks,
> > Max.