solr custom boost

2016-11-12 Thread sharnel pereira
Hi,

I have an index with contacts. I also have a graph with triples.

When a person logs in, I can use their information to query the graph
and get the node lengths from that person to various contents such as Job,
Location, etc.

ex. PersonA, Job:Programmer, Location:Floor2
Node lengths:
Job: Programmer>Programmer: 0, Programmer>Architect: 1, Programmer>Manager: 2
Location: Floor2>Floor1: 1, Floor2>Floor2: 0, Floor2>Floor3: 1

nodeLength: 0, weight: 10
nodeLength: 1, weight: 8
nodeLength: 2, weight: 6

I want a query-time boost with a custom algorithm where the node
length values are used as the boost weights,
e.g. JobWeight + LocationWeight.
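A minimal sketch of the JobWeight + LocationWeight rule, assuming the node-length-to-weight table above and treating any node length the table does not list as weight 0 (that fallback, and all class and method names here, are hypothetical). It is plain Java for illustration, not a Solr plugin API:

```java
import java.util.HashMap;
import java.util.Map;

public class GraphBoostWeights {

    // Node length -> weight, per the table in the post (0 -> 10, 1 -> 8, 2 -> 6).
    // ASSUMPTION: node lengths not in the table get weight 0.
    static int weightFor(int nodeLength) {
        switch (nodeLength) {
            case 0: return 10;
            case 1: return 8;
            case 2: return 6;
            default: return 0;
        }
    }

    // JobWeight + LocationWeight for one contact, using the logged-in person's
    // node lengths from the graph. Values missing from a map contribute 0 (assumption).
    static int combinedWeight(Map<String, Integer> jobLengths, String job,
                              Map<String, Integer> locationLengths, String location) {
        Integer jobLen = jobLengths.get(job);
        Integer locLen = locationLengths.get(location);
        int jobWeight = (jobLen == null) ? 0 : weightFor(jobLen);
        int locationWeight = (locLen == null) ? 0 : weightFor(locLen);
        return jobWeight + locationWeight;
    }

    public static void main(String[] args) {
        // Node lengths for PersonA (Job: Programmer, Location: Floor2).
        Map<String, Integer> jobLengths = new HashMap<>();
        jobLengths.put("programmer", 0);
        jobLengths.put("architect", 1);
        jobLengths.put("manager", 2);

        Map<String, Integer> locationLengths = new HashMap<>();
        locationLengths.put("floor1", 1);
        locationLengths.put("floor2", 0);
        locationLengths.put("floor3", 1);

        // An architect on floor1: 8 + 8
        System.out.println(combinedWeight(jobLengths, "architect", locationLengths, "floor1")); // prints 16
    }
}
```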

job:programmer^10 + location:floor1^8 job:architect^8 + location:floor1^8
job:manager + location:floor1^8 job:programmer^10 + location:floor2^10

Writing a query like the one above with function queries would become too
long as more data is added to the graph, especially if I need a custom
boost algorithm.
Any advice on how to achieve this would be most appreciated.
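One possible way to keep the query from growing with every job/location combination, sketched under assumptions: emit one constant-score clause per graph node (the standard query parser's `^=` syntax) instead of one clause per job/location pair. Assuming OR semantics between clauses (q.op=OR), the scores of matching optional clauses are added, so a contact matching one job clause and one location clause should score roughly JobWeight + LocationWeight; the exact value can depend on the similarity in use. The field names `job` and `location` come from the example above; the class and helper names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class GraphBoostQuery {

    // Same node length -> weight table as in the post; other lengths -> 0 (assumption).
    static int weightFor(int nodeLength) {
        switch (nodeLength) {
            case 0: return 10;
            case 1: return 8;
            case 2: return 6;
            default: return 0;
        }
    }

    // One constant-score clause per node, e.g. "job:architect^=8".
    // Values are assumed to be single lowercase terms that need no escaping.
    static String clausesFor(String field, Map<String, Integer> nodeLengths) {
        StringJoiner clauses = new StringJoiner(" ");
        for (Map.Entry<String, Integer> e : nodeLengths.entrySet()) {
            clauses.add(field + ":" + e.getKey() + "^=" + weightFor(e.getValue()));
        }
        return clauses.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> jobLengths = new LinkedHashMap<>();
        jobLengths.put("programmer", 0);
        jobLengths.put("architect", 1);
        jobLengths.put("manager", 2);

        Map<String, Integer> locationLengths = new LinkedHashMap<>();
        locationLengths.put("floor1", 1);
        locationLengths.put("floor2", 0);
        locationLengths.put("floor3", 1);

        // The clause count grows with the number of nodes (3 + 3),
        // not with the number of job/location pairs (3 x 3).
        String q = clausesFor("job", jobLengths) + " " + clausesFor("location", locationLengths);
        System.out.println(q);
    }
}
```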

Thanks
Sharnel


Re: Wildcard searches with space in TextField/StrField

2016-11-12 Thread Sandeep Khanzode
Thanks, Erick.
I am actually not trying to use the String field (I prefer a TextField here).
But in my comparisons with TextField, it seems that phrase matching with
whitespace and a wildcard (like 'my do*', 'my dog*', or 'my dog has*') can
only be accomplished with a string field, especially because with a
WhitespaceTokenizer in a TextField the space is lost and each token is
considered individually. Am I missing something?
SRK

On Friday, November 11, 2016 10:05 PM, Erick Erickson 
 wrote:
 

 You have to query text and string fields differently; that's just the
way it works. The problem is getting the query string through the
parser as a _single_ token or as multiple tokens.

Let's say you have a string field with the "a b" example. You have a
single token "a b" that starts at offset 0.

But with a text field, you have two tokens,
a at position 0
b at position 1

But when the query parser sees "a b" (without quotes) it splits it
into two tokens, and only the text field has both tokens so the string
field won't match.

OTOH, when the query parser sees "a\ b" it passes this through as a
single token, which only matches the string field as there's no
_single_ token "a b" in the text field.
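A toy model of this explanation, assuming a string field keeps the whole value as one token and a whitespace-tokenized text field keeps one token per word (the list index standing in for the position). The unescaped query is modelled as two separate term lookups and the escaped one as a single lookup; everything else a real Lucene analysis chain does is ignored, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TokenModel {

    // String field: the whole value is indexed as a single token.
    static List<String> stringFieldTokens(String value) {
        return Collections.singletonList(value);
    }

    // Whitespace-tokenized text field: one token per word; list index ~ position.
    static List<String> textFieldTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.trim().split("\\s+")) {
            tokens.add(t);
        }
        return tokens;
    }

    // A term query matches only if the field holds exactly that token.
    static boolean matchesTerm(List<String> tokens, String term) {
        return tokens.contains(term);
    }

    public static void main(String[] args) {
        List<String> str = stringFieldTokens("a b");   // ["a b"]
        List<String> txt = textFieldTokens("a b");     // ["a", "b"]

        // Unescaped a b: the parser produces two terms, "a" and "b".
        System.out.println(matchesTerm(str, "a") || matchesTerm(str, "b")); // false
        System.out.println(matchesTerm(txt, "a") && matchesTerm(txt, "b")); // true

        // Escaped a\ b: a single term "a b".
        System.out.println(matchesTerm(str, "a b")); // true
        System.out.println(matchesTerm(txt, "a b")); // false
    }
}
```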

But a more interesting question is why you want to search this way.
String fields are intended for keywords, machine-generated IDs and the
like. They're pretty useless for searching anything except
1> exact tokens
2> prefixes

If you have "my dog has fleas" in a string field, you _can_
search "*dog*" and get a hit, but performance is poor once you have
a large corpus. Performance for "my*" will be pretty good, though.
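A simplified illustration of why "my*" stays fast while "*dog*" gets slow: the terms are kept sorted, so a prefix query can jump to one contiguous range, while a leading wildcard gives no starting point and every term has to be examined. Lucene's real term dictionary and wildcard automata are more sophisticated than this `TreeSet` model, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class TermScan {

    // Prefix query ("my*"): seek to the start of the sorted range and stop at its end.
    // The upper bound prefix + '\uffff' is a simplification that covers ordinary text.
    static List<String> prefixMatches(NavigableSet<String> terms, String prefix) {
        return new ArrayList<>(terms.subSet(prefix, true, prefix + Character.MAX_VALUE, false));
    }

    // Infix query ("*dog*"): no usable range, so every term is checked.
    static List<String> infixMatches(NavigableSet<String> terms, String infix) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {   // full scan: cost grows with the number of terms
            if (term.contains(infix)) {
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Each value is a single token, as in a string field.
        NavigableSet<String> terms = new TreeSet<>();
        terms.add("my dog has fleas");
        terms.add("my cat");
        terms.add("your dog");
        terms.add("zebra");

        System.out.println(prefixMatches(terms, "my"));  // [my cat, my dog has fleas]
        System.out.println(infixMatches(terms, "dog"));  // [my dog has fleas, your dog]
    }
}
```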

All in all, this sounds like an XY problem. What's the use case you're
trying to solve?

Best,
Erick



On Thu, Nov 10, 2016 at 10:11 PM, Sandeep Khanzode
 wrote:
> Hi Erick, Reth,
>
> The 'a\ b*' as well as the q.op=AND approach worked (successfully) only for 
> StrField for me.
>
> Any attempt at creating an 'a\ b*' query for a TextField does not match any
> documents. The parsedQuery in debug mode does show 'field:a b*'. I am sure
> there are documents that should match.
> Another (maybe unrelated) observation: if I have 'field:a\ b', the
> parsedQuery is field:a field:b, which does not match as expected (each
> term matches individually).
>
> Can you please provide an example that I can use in Solr Query dashboard? 
> That will be helpful.
>
> I have also seen that wildcard queries work irrespective of field type, i.e.
> on StrField as well as TextField. That makes sense because a
> WhitespaceTokenizer only creates word boundaries when we do not use an
> EdgeNGramFilter. If I am not wrong, that is. SRK
>
>    On Friday, November 11, 2016 5:00 AM, Erick Erickson 
> wrote:
>
>
>  You can escape the space with a backslash as  'a\ b*'
>
> Best,
> Erick
>
> On Thu, Nov 10, 2016 at 2:37 PM, Reth RM  wrote:
>> I don't think you can do wildcard on StrField. For text field, if your
>> query is "category:(test m*)"  the parsed query will be  "category:test OR
>> category:m*"
>> You can add q.op=AND to make an AND between those terms.
>>
>> For phrase type wild card query support, as per docs, it
>> is ComplexPhraseQueryParser that supports it. (I haven't tested it myself)
>>
>> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
>>
>> On Thu, Nov 10, 2016 at 11:40 AM, Sandeep Khanzode <
>> sandeep_khanz...@yahoo.com.invalid> wrote:
>>
>>> Hi,
>>> How does a search like abc* work on a StrField? Since the entire value is
>>> stored as a single token, is it some kind of trie structure that allows such
>>> wildcard matching?
>>> How can searches with a space, like 'a b*', be executed for text fields
>>> (tokenized on whitespace)? If we specify this type of query, it is broken
>>> down into two queries, field:a and field:b*. I would like them to be
>>> contiguous, sort of like a phrase search with a wildcard.
>>> SRK
>
>
>
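The two suggestions in this thread, written out as query strings: Erick's escaped-space prefix query (which, per Sandeep's tests, only matched on the StrField) and Reth's ComplexPhraseQueryParser route for a whitespace-tokenized TextField, which nobody in the thread reports having tested. The field names `name_s` and `name_t` and the helper methods are hypothetical; the sketch only builds strings and does not call Solr.

```java
public class WildcardPhraseQueries {

    // StrField: escape each space so the parser keeps the text plus wildcard as ONE term.
    // Assumes the input contains no other query-syntax characters that need escaping.
    static String escapedPrefixQuery(String field, String text) {
        return field + ":" + text.replace(" ", "\\ ") + "*";
    }

    // TextField: ask the complexphrase parser for an in-order phrase
    // whose last term is treated as a prefix.
    static String complexPhraseQuery(String field, String text) {
        return "{!complexphrase inOrder=true}" + field + ":\"" + text + "*\"";
    }

    public static void main(String[] args) {
        System.out.println(escapedPrefixQuery("name_s", "my dog"));  // name_s:my\ dog*
        System.out.println(complexPhraseQuery("name_t", "my dog"));  // {!complexphrase inOrder=true}name_t:"my dog*"
    }
}
```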

   

collection creation and recovery

2016-11-12 Thread Hendrik Haddorp

Hi,

I have a SolrCloud 6.2.1 setup with 5 nodes. Occasionally I do a rolling
restart, one node at a time. I have quite a few collections, let's say
2000 with a replication factor of 3. When a node comes up again, it looks
like I get the same issue as described in SOLR-5796. According to Jira,
this should be fixed in 6.0. I also saw some leader conflict exceptions
in some logs, so is there now a setting to increase the conflict
resolution time? If so, could somebody point me to it?


A second thing is that when a new collection is created, it looks like
data is first written into /clusterstate.json in ZK. I thought this was a
legacy file that is no longer used, but that does not seem to be the
case. The problem is that if a new collection is being created, its first
replica is being assigned to a node, and I happen to stop exactly that
node, the collection does not seem to recover after the restart. The
admin UI shows the new collection with one down replica, but it never
recovers. In this state I cannot create any further collections. The
only solution I have found so far is to set the contents of
/clusterstate.json to "{}", but this kills the collection. Is that a
known issue?


The release notes of Solr 6.3 stated "Many bug fixes related to 
SolrCloud recovery for data safety and faster recovery times". Any 
chance that those could fix the issues I'm seeing?


thanks,
Hendrik


Re: Edismax query parsing in Solr 4 vs Solr 6

2016-11-12 Thread Greg Pendlebury
This has come up a lot on the lists lately. Keep in mind that when edismax
parses your query, it uses additional parameters such as 'mm' and 'q.op'.
It is the handling of these parameters (and the selection of default
values) that has changed between versions to address a few functionality gaps.

The most common issue I've seen is where users were not setting those
values and relying on the defaults. You might now need to set them
explicitly to return to desired behaviour.

I can't see all of your configuration, but I'm guessing the important one
here is 'q.op', which was previously hard-coded to 'OR', irrespective of
either the request parameters or solrconfig. Try setting that to 'OR' explicitly...
maybe you have your default operator set to 'AND' in solrconfig and that is
now being applied? The other option is 'mm', which I suspect should be set
to '0' unless you have some reason to want it. If it was set to '100%' it
might insert the additional '+' flags, but it can also show up as a '~'
operator on the end.
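Following this advice, a sketch of pinning down q.op and mm explicitly on each HTTP request instead of relying on defaults (request parameters override a handler's `defaults`). It only builds the URL-encoded parameter string with the JDK's `URLEncoder`; the values OR and 0 are the ones suggested above, `debugQuery=true` is included so the parsed query can be compared between Solr 4 and Solr 6, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class EdismaxParams {

    // Request parameters with the default operator and minimum-match set explicitly,
    // so a q.op of AND in the solrconfig defaults is not silently applied.
    static Map<String, String> explicitParams(String userQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userQuery);
        params.put("defType", "edismax");
        params.put("q.op", "OR");
        params.put("mm", "0");
        params.put("debugQuery", "true");   // returns the parsed query for comparison
        return params;
    }

    // URL-encode the parameters as key=value pairs joined with '&'.
    static String encode(Map<String, String> params) {
        StringJoiner joined = new StringJoiner("&");
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                joined.add(URLEncoder.encode(e.getKey(), "UTF-8") + "="
                        + URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        String q = "+(title:shirts isbn:shirts) +(id:20446 id:82876)";
        System.out.println(encode(explicitParams(q)));
    }
}
```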

Ta,
Greg

On 8 November 2016 at 22:13, Max Bridgewater 
wrote:

> I am migrating a solr based app from Solr 4 to Solr 6.  One of the
> discrepancies I am noticing is around edismax query parsing. My code makes
> the following call:
>
>
>  String userQuery = "+(title:shirts isbn:shirts) +(id:20446 id:82876)";
>  // req is the current SolrQueryRequest
>  Query query = QParser.getParser(userQuery, "edismax", req).getQuery();
>
>
> With Solr 4, query becomes:
>
> +(+(title:shirt isbn:shirts) +(id:20446 id:82876))
>
> With Solr 6 it however becomes:
>
> +(+(+title:shirt +isbn:shirts) +(+id:20446 +id:82876))
>
> Digging deeper, it appears that parseOriginalQuery() in
> ExtendedDismaxQParser is adding those additional + signs.
>
>
> Is there a way to prevent this altering of queries?
>
> Thanks,
> Max.
>


Re: Edismax query parsing in Solr 4 vs Solr 6

2016-11-12 Thread Max Bridgewater
Hi Greg,

Your analysis is SPOT ON. I did some debugging and found out that we had
q.op set to AND in the defaults. When I changed that to OR, things
worked exactly as in Solr 4. So it seems Solr 6 was behaving as it
should. What I could not explain was whether Solr 4 was using the
q.op configured in the defaults or not, but your explanation makes
sense now.

Thanks,
Max.


