OK, I think I understand your points there. Just to clarify: say the term was
"Large increased" and my filters went something like:
Large|increased
Large|increase|increased
large|increase|increased
Would the final tokens indexed be large|increase|increased?
Once again thanks for all the help.
Right, prior to 3.6, the standard way to handle wildcards was to,
essentially, pre-analyze the terms that had wildcards. This works
fine for simple filters, things like lowercasing for instance, but
doesn't work so well for things like stemming.
So you're doing what can be done at this point, but
On 10/2/2014 4:33 AM, waynemailinglist wrote:
> Something that is still not clear in my mind is how this tokenising works.
> For example with the filters I have when I run the analyser I get:
> Field: Hello You
>
> Hello|You
> Hello|You
> Hello|You
> hello|you
> hello|you
>
>
> Does this mean th
Many many thanks for the replies - it was helpful for me to start
understanding how this works.
I'm using 3.5, so this explains a lot. What I have done is: if the
query contains a *, I lowercase the query before sending it to Solr. This
seems to have solved the issue, given your explanation.
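
The client-side workaround described above might look roughly like the following in Python. This is only a sketch: the function name is made up for illustration, it assumes terms are separated by whitespace and carry no field prefix, and it lowercases only the terms that contain a wildcard so that operators such as AND and OR keep their case.

```python
def lowercase_wildcard_terms(query: str) -> str:
    """Lowercase only the whitespace-separated terms that contain a wildcard.

    Hypothetical helper: assumes the indexed tokens are lowercased and that
    the server does not analyze wildcard terms (as described for Solr < 3.6).
    """
    rewritten = []
    for term in query.split():
        if "*" in term or "?" in term:
            # Wildcard terms skip analysis, so match the index's casing here.
            rewritten.append(term.lower())
        else:
            # Leave everything else alone (operators, ordinary terms).
            rewritten.append(term)
    return " ".join(rewritten)


if __name__ == "__main__":
    print(lowercase_wildcard_terms("Κώστα*"))
    print(lowercase_wildcard_terms("Capit* AND Health"))
```

Terms without a wildcard are passed through unchanged because they still go through the normal query-time analysis chain on the server.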
Two things:
1> what version of Solr are you using? If it's prior to 3.6, then the
bits that handle applying lowercaseFilter to wildcards aren't in the
code.
2> what do you see if you add &debug=query?
I just tried it with your analysis chain and it seemed to work. Did
you completely blow your ind
If you use "*" you use the multiterm analysis path, which is semi-hidden
and is a lot more limited than what is done with normal tokens:
https://wiki.apache.org/solr/MultitermQueryAnalysis
The Analyzer components that are NOT multiterm aware cannot be used
that way. Looking at: http://www.solr-start.
I'm still stuck on this actually. I would really appreciate any pointers.
If I search for:
query 1: Κώστας
result: Κώστας
query 2: Κώστα*
result:
I've looked at the analyser but I don't really understand what I'm looking
at if I'm honest. It gives the output:
Field (name): title
Field value: Κ
Ahmet - many thanks - I removed the EnglishPorterFilterFactory and reindexed,
and this seems to behave as expected now.
Jack - thanks as well - I'm very much a noob with this, and that's a great
tip.
The presence of a wildcard in a query term short-circuits some portions of
the analysis process. Some token filters, like lowercasing, can still be
performed on the query terms, but others, like stemming, cannot. So, either
simplify the analysis (be more selective of what token filters you use), or
On Wed, 2014-10-01 at 13:16 +0200, Wayne W wrote:
> query 2: capit*
> result: Capital Health
>
> query 3: capita*
> result:
You are likely using a stemmer for the field: "Capital Health" gets
indexed as "capit" and "health", so there are no tokens starting with
"capita".
Turn off the stemmer or
Hi,
You probably have a stemmer, and it is reducing "Capital" to "capit". That's the
reason.
Either remove the stemmer from the analyser chain or add a keyword repeat filter.
Ahmet
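
The effect described in this thread can be illustrated with a toy model in Python. Everything here is an assumption for illustration: `TOY_STEMS` hardcodes only the one mapping quoted in the thread (it is not a real stemmer), `keep_original` loosely imitates keeping the unstemmed token alongside the stemmed one, and `prefix_matches` is a simplified stand-in for a trailing-`*` query whose term is not stemmed.

```python
# Toy stand-in for a stemmer: only the mapping quoted in the thread.
TOY_STEMS = {"capital": "capit"}


def analyze(text, stem=True, keep_original=False):
    """Lowercase, split on whitespace, and optionally apply the toy stemmer."""
    tokens = []
    for word in text.lower().split():
        stemmed = TOY_STEMS.get(word, word) if stem else word
        if keep_original and stemmed != word:
            # Keep the unstemmed form next to the stemmed one.
            tokens.append(word)
        tokens.append(stemmed)
    return tokens


def prefix_matches(indexed_tokens, wildcard_query):
    """True if any indexed token starts with the query minus its trailing '*'."""
    prefix = wildcard_query.rstrip("*").lower()
    return any(token.startswith(prefix) for token in indexed_tokens)


if __name__ == "__main__":
    stemmed_only = analyze("Capital Health")
    with_original = analyze("Capital Health", keep_original=True)
    unstemmed = analyze("Capital Health", stem=False)
    for label, tokens in [("stemmed", stemmed_only),
                          ("stemmed + original", with_original),
                          ("no stemmer", unstemmed)]:
        print(label, tokens,
              prefix_matches(tokens, "capit*"),
              prefix_matches(tokens, "capita*"))
```

With only the stemmed token in the index, `capit*` finds a match and `capita*` does not; once the original token is also kept, or the stemmer is removed, both prefixes match.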
On Wednesday, October 1, 2014 2:16 PM, Wayne W
wrote:
Hi,
I don't understand this at all. We are indexing some contact names.