My aim is to boost "exactish" matches similar to the recipe described in
[1]. The anchoring works in q but not in pf, where I need it. Here is an
example that shows the effect:
q=title_exact:"anatomie"&pf=title_exact^2000
debugQuery says it is interpreted this way:
+title_exact:" anatomie "
Thank you all for your response,
The thing is that we have a 180G index and half of it is deleted documents.
We tried to run an optimization in order to shrink the index size, but it
crashes with 'out of memory' when the process reaches 120G.
Is it possible to optimize parts of the index?
Please advise.
Maybe you should consider creating different generations of indexes and
not keep everything in one index. If the likelihood of documents being
deleted is rather high in, say, the first week or so, you could have
one index for the documents with a high probability of deletion (the fresh
ones) and a second
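A sketch of that generations idea, assuming SolrCloud (the collection and alias names here are made up):
# fresh documents go into their own, small collection
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=docs_fresh&numShards=2"
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=docs_stable&numShards=2"
# search both generations through one alias
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=docs&collections=docs_fresh,docs_stable"
Rebuilding or dropping the small fresh collection then reclaims the space of heavily-deleted documents without optimizing the big one.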
What happens when you do not use a fielded query?
q=anatomie&qf=title_exact
instead of
q=title_exact:"anatomie"
Ahmet
On Sunday, January 11, 2015 12:05 PM, Michael Lackhoff
wrote:
My aim is to boost "exactish" matches similar to the recipe described in
[1]. The anchoring works in q but not in
On 11.01.2015 at 14:01, Ahmet Arslan wrote:
> What happens when you do not use fielded query?
>
> q=anatomie&qf=title_exact
> instead of
>
> q=title_exact:"anatomie"
Then it works (with qf=title):
+(title:anatomie) (title_exact:" anatomie "^20.0)
The only problem is that my frontend always
Not directly related to your subject, but you could look at this patch:
https://issues.apache.org/jira/browse/SOLR-6841 - it implements visualization
of Solr (Lucene) segments with exact information about how many deletions are
present in each segment. Looking at this one you could - of course next
time - react li
Hi,
It's not an option for us; all the documents in our index have the same
deletion probability.
Is there any other solution to perform an optimization in order to reduce
index size?
Thanks in advance.
On 11.01.2015 at 14:19, Michael Lackhoff wrote:
> Or put another way: How can I do this boost in more complex queries like:
> title:foo AND author:miller AND year:[2010 TO *]
> It would be nice to have a title "foo" before another title "some foo
> and bar" (given the other criteria also match bo
Hi everybody,
I am going to add some analysis to Solr at index time. Here is what I
have in mind:
Suppose I have two different fields in the Solr schema, field "a" and field
"b". I am going to use the created inverted index in a way that some terms
are considered as important ones and
I believe if you delete all documents in a segment, that segment as a
whole goes away.
A segment is created on every commit whether you reopen the searcher
or not. Do you know which documents would be deleted later (are there
natural clusters)? If yes, perhaps there is a way to index them so
th
You would do that with a custom similarity (scoring) class. That's an
expert feature. In fact a SUPER-expert feature.
Start by completely familiarizing yourself with how TF*IDF similarity
already works:
http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/similarities/TFIDFSimilarit
Usually, Lucene will be optimizing (merging) segments on the fly so that
you should only have a fraction of your total deletions present in the
index and should never have an absolute need to do an old-fashioned full
optimize.
What merge policy are you using?
Is Solr otherwise running fine other
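One knob worth checking in this context, assuming the default TieredMergePolicy (the value below is only illustrative), is reclaimDeletesWeight in the indexConfig section of solrconfig.xml; a higher weight biases merging toward segments with many deleted documents:
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <!-- default is 2.0; higher favors merging away deletions -->
  <double name="reclaimDeletesWeight">4.0</double>
</mergePolicy>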
Dear Jack,
Hi,
I think you misunderstood my need. I don't want to change the default
scoring behavior of Lucene (TF-IDF); I just want to have another field to do
sorting for some specific queries (not all the search business). However, I
am aware of Lucene payloads.
Thank you very much.
On Sun, Jan 11
Hi,
You might find this useful :
https://lucidworks.com/blog/whats-a-dismax/
Regarding your example: title:foo AND author:miller AND year:[2010 TO *]
the last two clauses are better served as filter queries.
http://wiki.apache.org/solr/CommonQueryParameters#fq
By the way it is possible to combine di
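A sketch of the combined request (edismax and the boost value are assumptions; the field names come from this thread):
q=foo&defType=edismax&qf=title&pf=title_exact^2000&fq=author:miller&fq=year:[2010 TO *]
The fq clauses restrict the result set without influencing the score, so the exact-match boost only competes with the title relevance.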
OK, why can't you give the JVM more memory, perhaps on
a one-time basis to get past this problem? You've never
told us how much memory you give the JVM in the first place.
Best,
Erick
On Sun, Jan 11, 2015 at 7:54 AM, Jack Krupansky
wrote:
> Usually, Lucene will be optimizing (merging) segments o
Your description uses the terms Solr/Lucene uses, but perhaps not in
the same way we do. That might explain the confusion.
It sounds - on a high level - like you want to create a field based on
a combination of a couple of other fields during the indexing stage. Have
you tried UpdateRequestProcessors?
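A minimal sketch of such a chain for solrconfig.xml, using the stock clone and concat processors (the chain name and destination field are made up; "a" and "b" are the fields from the question):
<updateRequestProcessorChain name="combine-fields">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">a</str>
    <str name="source">b</str>
    <!-- hypothetical destination field -->
    <str name="dest">ab_combined</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">ab_combined</str>
    <str name="delimiter"> </str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>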
Do note that one strategy is to create more shards than you need at
the beginning. Say you determine that 10 shards will work fine, but
you expect to grow your corpus by 2x. _Start_ with 20 shards
(multiple shards can be hosted in the same JVM, no problem, see
maxShardsPerNode in the collections
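For reference, a sketch of such an oversharded create call (the collection name and counts are illustrative):
# 20 shards up front, packed onto a few nodes for now
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=20&maxShardsPerNode=10"
As the corpus grows, replicas of individual shards can be moved to new hardware instead of resharding.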
Hi Ahmet,
> You might find this useful :
> https://lucidworks.com/blog/whats-a-dismax/
I have a basic understanding but will do further reading...
> Regarding your example : title:foo AND author:miller AND year:[2010 TO *]
> last two clauses better served as a filter query.
>
> http://wiki.apa
It's still not quite clear to me what your specific goal is. From your
vague description it seems somewhat different from the blog post that you
originally cited. So, let's try one more time... explain in plain English
what use case you are trying to satisfy.
You mention fielded queries, but in my
On 11.01.2015 at 18:30, Jack Krupansky wrote:
> It's still not quite clear to me what your specific goal is. From your
> vague description it seems somewhat different from the blog post that you
> originally cited. So, let's try one more time... explain in plain English
> what use case you are tr
For the title searches, Doug Turnbull wrote a really interesting
in-depth article:
http://opensourceconnections.com/blog/solr/using-solr-cloud-for-robustness-but-returning-json-format/
I don't know if that's the one you read already.
For the fielded query, you get more flexibility if you use multi
Thanks for the clarification. The issue still remains that you need to
distill all of the competing requirements into a single, concise, and
consistent model, and whether that model aligns adequately with existing Solr
features remains an open question.
The general guidance is to stick with the existing Solr
Dear Alexandre,
I have not tried an UpdateRequestProcessor yet. Can I access term
frequencies at that level? I don't want to calculate term frequencies once
more when Lucene has already calculated them in the inverted index.
Thank you very much.
On Jan 11, 2015 7:49 PM, "Alexandre Rafalovitch"
wrote:
> Yo
No, you cannot access anything outside the specific document being indexed at that point.
What are you actually trying to achieve on the business level?
Regards,
Alex.
Sign up for my Solr resources newsletter at http://www.solr-start.com/
On 11 January 2015 at 14:59, Ali Nazemian wrote:
> Dear Ale
bq: I still feel like shard management could be made easier. I'll see
if I can have a look at JIRA and try to pitch in.
_nobody_ will disagree here! It's been a time and interest thing so far...
Automating it all is tempting, but my experience so far indicates
that once people get to large sc
Actually, let me take that back. I seem to remember an example where
somebody used a URP to do a pre-analysis of the field. That implies
access to the Solr core, so it might be possible.
But I still think you need to review the business-level issues, as you
are going into increasingly hacky territory.
Won't function queries do the job at query time? You can add or multiply
the tf*idf score by a function of the term frequency of arbitrary terms,
using the tf, mul, and add functions.
See:
https://cwiki.apache.org/confluence/display/solr/Function+Queries
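For instance, a sketch along those lines (field "a", the term, and the weights are placeholders, not anything from this thread): sort directly by a term-frequency expression, or fold it into the score with the boost parser:
sort=mul(tf(a,'important'),2) desc
q={!boost b=add(1,tf(a,'important'))}foo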
-- Jack Krupansky
On Sun, Jan 11, 2015 at
Hi Michael,
I had to deal with such expert users in the past :)
I suggest you create a new syntax for exact match.
Since he is an expert, he will love it.
Either:
i) ask the user to enter the number of tokens, e.g. q=title:Anatomie AND length:1
or
ii) use a dollar sign (or something else) for art
[ disclaimer: this worked for me, ymmv ... ]
I just battled this. It turns out that incrementally optimizing using the
maxSegments attribute was the most efficient solution for me, in
particular when you are actually running out of disk space.
#!/bin/bash
# n-segments I started with
high=400
# n-segments to end with (halving each pass; the core URL is a placeholder)
low=1
for (( n=high; n>=low; n/=2 )); do
  curl "http://localhost:8983/solr/core1/update?optimize=true&maxSegments=$n"
done
Dear Jack,
Thank you very much.
Yeah, I was thinking of a function query for sorting, but I have two problems
in this case: 1) a function query does the processing at query time, which I
don't want; 2) I also want to have the score field for retrieving and showing
to users.
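Regarding the second point, one possible sketch (field, term, and weight are placeholders): Solr can return a function value as a pseudo-field next to the stored fields:
fl=*,score,myscore:mul(tf(a,'important'),2)
though this too is computed at query time rather than stored at index time.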
Dear Alexandre,
Here is some more
Thanks everyone for all the advice!
To sum up there seems to be no easy solution. I only have the option to
either
- make things really complicated
- only help some users/query structures
- accept the status quo
What could help is an analogue of field aliases:
If it were possible to say
f.title.pf