pf doesn't work like normal phrase query

2015-01-11 Thread Michael Lackhoff
My aim is to boost "exactish" matches similar to the recipe described in [1]. The anchoring works in q but not in pf, where I need it. Here is an example that shows the effect: q=title_exact:"anatomie"&pf=title_exact^2000 debugQuery says it is interpreted this way: +title_exact:" anatomie "
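For context, this is roughly how pf is normally wired up with the (e)dismax parser. The field names and boost come from the example above; the rest is illustrative, not from the original mail:

```
q=anatomie
&defType=edismax
&qf=title title_exact
&pf=title_exact^2000
&debugQuery=true
```

Note that pf is applied to the free-text portion of q, so an explicitly fielded query such as q=title_exact:"anatomie" interacts with it differently — which is the behavior discussed in this thread.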

Re: Frequent deletions

2015-01-11 Thread ig01
Thank you all for your responses. The thing is that we have a 180G index and half of it is deleted documents. We tried to run an optimization in order to shrink the index size, but it crashes with 'out of memory' when the process reaches 120G. Is it possible to optimize parts of the index? Please adv

Re: Frequent deletions

2015-01-11 Thread Jürgen Wagner (DVT)
Maybe you should consider creating different generations of indexes and not keep everything in one index. If the likelihood of documents being deleted is rather high in, e.g., the first week or so, you could have one index for the high-probability of deletion documents (the fresh ones) and a second

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Ahmet Arslan
What happens when you do not use fielded query? q=anatomie&qf=title_exact instead of q=title_exact:"anatomie" Ahmet On Sunday, January 11, 2015 12:05 PM, Michael Lackhoff wrote: My aim is to boost "exactish" matches similar to the recipe described in [1]. The anchoring works in q but not in

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Michael Lackhoff
Am 11.01.2015 um 14:01 schrieb Ahmet Arslan: > What happens when you do not use fielded query? > > q=anatomie&qf=title_exact > instead of > > q=title_exact:"anatomie" Then it works (with qf=title): +(title:anatomie) (title_exact:" anatomie "^20.0) Only problem is that my frontend alway

Re: Frequent deletions

2015-01-11 Thread Michał B . .
Not directly on your subject, but you could look at this patch: https://issues.apache.org/jira/browse/SOLR-6841 It implements visualization of Solr (Lucene) segments with exact information on how many deletions are present in each segment. Looking at this one you could - of course next time - react li

Re: Frequent deletions

2015-01-11 Thread ig01
Hi, It's not an option for us, all the documents in our index have the same deletion probability. Is there any other solution to perform an optimization in order to reduce index size? Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Frequent-deletions-tp41766

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Michael Lackhoff
Am 11.01.2015 um 14:19 schrieb Michael Lackhoff: > Or put another way: How can I do this boost in more complex queries like: > title:foo AND author:miller AND year:[2010 TO *] > It would be nice to have a title "foo" before another title "some foo > and bar" (given the other criteria also match bo

Extending solr analysis in index time

2015-01-11 Thread Ali Nazemian
Hi everybody, I am going to add some analysis to Solr at index time. Here is what I am considering: suppose I have two different fields in my Solr schema, field "a" and field "b". I am going to use the created inverted index in a way that some terms are considered as important ones and

Re: Frequent deletions

2015-01-11 Thread Alexandre Rafalovitch
I believe if you delete all documents in a segment, that segment as a whole goes away. A segment is created on every commit whether you reopen the searcher or not. Do you know which documents would be deleted later (are there natural clusters)? If yes, perhaps there is a way to index them so th

Re: Extending solr analysis in index time

2015-01-11 Thread Jack Krupansky
You would do that with a custom similarity (scoring) class. That's an expert feature. In fact a SUPER-expert feature. Start by completely familiarizing yourself with how TF*IDF similarity already works: http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/similarities/TFIDFSimilarit

Re: Frequent deletions

2015-01-11 Thread Jack Krupansky
Usually, Lucene will be optimizing (merging) segments on the fly so that you should only have a fraction of your total deletions present in the index and should never have an absolute need to do an old-fashioned full optimize. What merge policy are you using? Is Solr otherwise running fine other
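For reference, deleted documents can be reclaimed more aggressively by tuning TieredMergePolicy in solrconfig.xml. This is a hedged sketch in the Solr 4.x config style; the values are illustrative, not recommendations for this index:

```
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
  <!-- values above 1.0 bias merge selection toward segments with many deletes -->
  <double name="reclaimDeletesWeight">3.0</double>
</mergePolicy>
```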

Re: Extending solr analysis in index time

2015-01-11 Thread Ali Nazemian
Dear Jack, Hi, I think you misunderstood my need. I don't want to change the default scoring behavior of Lucene (tf-idf); I just want to have another field to do sorting for some specific queries (not all the search business). However, I am aware of Lucene payloads. Thank you very much. On Sun, Jan 11

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Ahmet Arslan
Hi, You might find this useful : https://lucidworks.com/blog/whats-a-dismax/ Regarding your example : title:foo AND author:miller AND year:[2010 TO *] last two clauses better served as a filter query. http://wiki.apache.org/solr/CommonQueryParameters#fq By the way it is possible to combine di

Re: Frequent deletions

2015-01-11 Thread Erick Erickson
OK, why can't you give the JVM more memory, perhaps on a one-time basis to get past this problem? You've never told us how much memory you give the JVM in the first place. Best, Erick On Sun, Jan 11, 2015 at 7:54 AM, Jack Krupansky wrote: > Usually, Lucene will be optimizing (merging) segments o

Re: Extending solr analysis in index time

2015-01-11 Thread Alexandre Rafalovitch
Your description uses the terms Solr/Lucene uses but perhaps not in the same way we do. That might explain the confusion. It sounds - on a high level - that you want to create a field based on a combination of a couple of other fields during indexing stage. Have you tried UpdateRequestProcessors?
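As a hedged illustration of that suggestion, a chain that derives a combined field from fields "a" and "b" at index time might look like this in solrconfig.xml (the chain name and destination field are made up for the example):

```
<updateRequestProcessorChain name="derive-ab">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">a</str>
    <str name="source">b</str>
    <str name="dest">ab_derived</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain is then selected per update request with update.chain=derive-ab, or set as the default on the update handler.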

Re: How large is your solr index?

2015-01-11 Thread Bram Van Dam
Do note that one strategy is to create more shards than you need at the beginning. Say you determine that 10 shards will work fine, but you expect to grow your corpus by 2x. _Start_ with 20 shards (multiple shards can be hosted in the same JVM, no problem, see maxShardsPerNode in the collections
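A sketch of that oversharding setup with the Collections API (the collection and config names are placeholders):

```
http://localhost:8983/solr/admin/collections?action=CREATE
    &name=mycollection
    &numShards=20
    &maxShardsPerNode=4
    &collection.configName=myconfig
```

With more shards than nodes, shards can later be moved to new hardware as the corpus grows, instead of resharding.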

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Michael Lackhoff
Hi Ahmet, > You might find this useful : > https://lucidworks.com/blog/whats-a-dismax/ I have a basic understanding but will do further reading... > Regarding your example : title:foo AND author:miller AND year:[2010 TO *] > last two clauses better served as a filter query. > > http://wiki.apa

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Jack Krupansky
It's still not quite clear to me what your specific goal is. From your vague description it seems somewhat different from the blog post that you originally cited. So, let's try one more time... explain in plain English what use case you are trying to satisfy. You mention fielded queries, but in my

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Michael Lackhoff
Am 11.01.2015 um 18:30 schrieb Jack Krupansky: > It's still not quite clear to me what your specific goal is. From your > vague description it seems somewhat different from the blog post that you > originally cited. So, let's try one more time... explain in plain English > what use case you are tr

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Alexandre Rafalovitch
For the title searches, Doug Turnbull wrote a really interesting in-depth article: http://opensourceconnections.com/blog/solr/using-solr-cloud-for-robustness-but-returning-json-format/ I don't know if that's the one you read already. For the fielded query, you get more flexibility if you use multi

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Jack Krupansky
Thanks for the clarification. The issue still remains that you need to distill all of the competing requirements into a single, concise, and consistent model, and whether that adequately aligns with existing Solr features remains problematic. The general guidance is to stick with the existing Solr

Re: Extending solr analysis in index time

2015-01-11 Thread Ali Nazemian
Dear Alexandre, I have not tried UpdateRequestProcessor yet. Can I access term frequencies at that level? I don't want to calculate term frequencies again when Lucene already calculates them in the inverted index. Thank you very much. On Jan 11, 2015 7:49 PM, "Alexandre Rafalovitch" wrote: > Yo

Re: Extending solr analysis in index time

2015-01-11 Thread Alexandre Rafalovitch
No, you cannot access anything outside the specific document being indexed at that point. What are you actually trying to achieve on the business level? Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 11 January 2015 at 14:59, Ali Nazemian wrote: > Dear Ale

Re: How large is your solr index?

2015-01-11 Thread Erick Erickson
bq: I still feel like shard management could be made easier. I'll see if I can have a look at JIRA and try to pitch in. _nobody_ will disagree here! It's been a time and interest thing so far... Automating it all is tempting, but my experience so far indicates that once people get to large sc

Re: Extending solr analysis in index time

2015-01-11 Thread Alexandre Rafalovitch
Actually, let me take that back. I seem to remember an example where somebody used URP to do a pre-analysis of the field. That implies access to Solr core. So it might be possible. But I still think you need to review the business level issues, as you are going into increasingly hacky territory.

Re: Extending solr analysis in index time

2015-01-11 Thread Jack Krupansky
Won't function queries do the job at query time? You can add or multiply the tf*idf score by a function of the term frequency of arbitrary terms, using the tf, mul, and add functions. See: https://cwiki.apache.org/confluence/display/solr/Function+Queries -- Jack Krupansky On Sun, Jan 11, 2015 at
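As a hedged sketch of that approach — tf, mul, and add are real Solr function names, while field "a" and the term 'important' are placeholders carried over from this thread:

```
q=foo
&sort=mul(tf(a,'important'),2.0) desc, score desc
```

The same function can also be used as a multiplicative boost (e.g. via the boost query parser) rather than as a sort criterion.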

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Ahmet Arslan
Hi Michael, I had to deal with such expert users in the past :) I suggest you create a new syntax for exact match. Since he is an expert, he will love it. Either i) ask the user to enter the number of tokens, e.g. q=title:Anatomie AND length:1, or ii) use a dollar sign (or something else) for art

Re: Frequent deletions

2015-01-11 Thread David Santamauro
[ disclaimer: this worked for me, ymmv ... ] I just battled this. Turns out incrementally optimizing using the maxSegments attribute was the most efficient solution for me. In particular when you are actually running out of disk space. #!/bin/bash # n-segments I started with high=400 # n-segme
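The truncated script above appears to step maxSegments down gradually so that each optimize pass rewrites less data at once. A self-contained sketch of that idea follows; the URL, core name, and step size are assumptions, since the original values are not fully visible:

```shell
# Incrementally optimize by lowering maxSegments in steps, to keep
# peak transient disk usage below that of a single full optimize.
# SOLR_URL, high, and step are assumptions, not from the original post.
SOLR_URL="http://localhost:8983/solr/mycore/update"
high=400   # roughly the current segment count
low=1      # final target segment count
step=50

seg=$high
while [ "$seg" -gt "$low" ]; do
  # echo the command instead of executing it; swap echo for a real
  # curl call to actually issue the optimize requests
  echo "curl '${SOLR_URL}?optimize=true&maxSegments=${seg}'"
  seg=$((seg - step))
done
echo "curl '${SOLR_URL}?optimize=true&maxSegments=${low}'"
```

Each pass merges down to the given segment count, so deletions are reclaimed piecewise instead of in one large, disk-hungry merge.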

Re: Extending solr analysis in index time

2015-01-11 Thread Ali Nazemian
Dear Jack, Thank you very much. Yeah, I was thinking of a function query for sorting, but I have two problems in this case: 1) function queries do the processing at query time, which I don't want; 2) I also want to have the score field for retrieving and showing to users. Dear Alexandre, Here is some more

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Michael Lackhoff
Thanks everyone for all the advice! To sum up, there seems to be no easy solution. I only have the option to either - make things really complicated - only help some users/query structures - accept the status quo. What could help is an analogue of field aliases: if it were possible to say f.title.pf