I attached a patch to the JIRA issue.
Reviews are welcome.
On Thu, Dec 19, 2013 at 7:24 PM, Isaac Hebsh wrote:
Roman, do you have any results?
created SOLR-5561
Robert, if I'm wrong, you are welcome to close that issue.
On Mon, Dec 9, 2013 at 10:50 PM, Isaac Hebsh wrote:
You can see the norm value, in the "explain" text, when setting
debugQuery=true.
If the same item gets different norm before/after, that's it.
Note that this configuration is in schema.xml (not solrconfig.xml...)
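A minimal sketch of the check described above: build a select URL with debugQuery=true, then keep only the fieldNorm lines of an explain string so they can be compared before and after reindexing. The host, core name, and field name are hypothetical placeholders, and both helper functions are invented for illustration; nothing here contacts a server.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, query, rows=10):
    """Return a select URL that asks Solr to include explain output."""
    params = {"q": query, "rows": rows, "wt": "json", "debugQuery": "true"}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def field_norm_lines(explain_text):
    """Keep only the lines of an explain string that mention fieldNorm."""
    return [line.strip() for line in explain_text.splitlines()
            if "fieldNorm" in line]


if __name__ == "__main__":
    # Hypothetical host, core, and field; adjust to your own setup.
    print(build_debug_url("http://localhost:8983/solr/collection1",
                          "text:example"))
```

Running the same query against the index before and after the change and comparing the output of field_norm_lines for one document should show whether its norm moved.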
On Monday, December 9, 2013, Roman Chyla wrote:
Isaac, is there an easy way to recognize this problem? We also index
synonym tokens in the same position (like you do, and I'm sure that our
positions are set correctly). I could test whether the default similarity
factory in solrconfig.xml had any effect (before/after reindexing).
--roman
On Mo
Hi Robert and Manuel.
The DefaultSimilarity indeed sets discountOverlaps to true by default.
BUT, the *factory*, aka DefaultSimilarityFactory, when called by
IndexSchema (the getSimilarity method), explicitly sets this value to the
value of its corresponding class member.
This class member is initi
No, it's turned on by default in the default similarity.
As I said, all that is necessary is to fix your analyzer to emit the
proper position increments.
On Mon, Dec 9, 2013 at 12:27 PM, Manuel Le Normand wrote:
In order to set discountOverlaps to true you must have added the
to the schema.xml, which
is commented out by default!
As by default this param is false, the above situation is expected with
correct positioning, as said.
In order to fix the field norms you'd have to reindex with the similarity
c
It's accurate; you are wrong.
Please look at setDiscountOverlaps in your similarity. This is really
easy to understand.
On Sun, Dec 8, 2013 at 7:23 AM, Manuel Le Normand wrote:
Robert, your last reply is not accurate.
It's true that the field norms and termVectors are independent. But this
issue of higher norms for this case is expected with well assigned
positions. The LengthNorm is assigned as FieldInvertState.length which is
the count of incrementToken and not num of po
termvectors have nothing to do with any of this.
Please fix your analyzer first. If you want to add a synonym, it
should have a position increment of zero.
i bet exact phrase queries aren't working correctly either.
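The interplay between position increments and the length norm argued about in this thread can be modelled in a few lines. This is a rough sketch, not Lucene code: it mimics how, as far as I understand Lucene 4.x, FieldInvertState counts every token as length and every token with a position increment of zero as an overlap, and how DefaultSimilarity.lengthNorm uses them. The function names and the sample tokens are hypothetical.

```python
import math


def invert(tokens):
    """Walk (term, position_increment) pairs like an indexing pass.

    Returns (length, num_overlap, last_position), where length counts
    every token and num_overlap counts tokens with increment 0.
    """
    length = 0
    num_overlap = 0
    position = -1
    for _term, pos_inc in tokens:
        position += pos_inc
        length += 1
        if pos_inc == 0:
            num_overlap += 1
    return length, num_overlap, position


def length_norm(length, num_overlap, discount_overlaps=True, boost=1.0):
    """boost / sqrt(numTerms); assumes at least one counted term."""
    num_terms = length - num_overlap if discount_overlaps else length
    return boost / math.sqrt(num_terms)


if __name__ == "__main__":
    # Original word at increment 1, its stem stacked at increment 0.
    tokens = [("running", 1), ("run", 0), ("dogs", 1), ("dog", 0)]
    length, overlaps, _last = invert(tokens)
    print(length_norm(length, overlaps, discount_overlaps=True))
    print(length_norm(length, overlaps, discount_overlaps=False))
```

With the stems stacked at increment zero, discounting overlaps makes the field look two terms long instead of four, so the norm is higher. If the analyzer emitted the stems with an increment of one, nothing would be counted as an overlap and the setting would make no difference. The real index stores the norm in a lossy compact form, so values seen in explain output are quantized versions of what this sketch prints.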
On Fri, Dec 6, 2013 at 12:50 AM, Isaac Hebsh wrote:
1) positions look all right (for me).
2) fieldNorm is determined by the size of the termVector, isn't it? the
termVector size isn't affected by the positions.
On Fri, Dec 6, 2013 at 10:46 AM, Robert Muir wrote:
Your analyzer needs to set positionIncrement correctly: sounds like it's broken.
On Thu, Dec 5, 2013 at 1:53 PM, Isaac Hebsh wrote:
> Hi,
> we implemented a morphologic analyzer, which stems words on index time.
> For some reasons, we index both the original word and the stem (on the same
> positi
The field is our main textual field. In the standard case, length
normalization contributes significantly to tf-idf scoring, so we don't want
to give it up.
Removing duplicates won't help here, because the terms are not duplicates.
One term is stemmed, and the other is not.
On Fri, Dec 6, 2013 at 9:48 AM, Ahme
Hi Isaac,
Did you consider omitting norms completely for that field? omitNorms="true"
Are you using solr.RemoveDuplicatesTokenFilterFactory?
On Thursday, December 5, 2013 8:55 PM, Isaac Hebsh wrote:
Hi,
we implemented a morphologic analyzer, which stems words on index time.
For some reasons