Jack, Emir,
Thanks for your answers. Moving the ngram logic to the client side would be
a fast and easy way to test the solution and compare it with the phonetic one.
Best regards,
Elisabeth
2016-03-11 10:52 GMT+01:00 Emir Arnautovic :
Hi Elizabeth,
To see whether you get better results, you can move the ngram logic
outside of the analysis chain; the simplest solution is to move it to the
client. In such a setup, you should be able to use pf2 and pf3 and see
whether that produces the desired result.
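A minimal sketch of that client-side approach, in Python. The field name
name_grams, the "_" padding character and both helper names are assumptions
made for illustration; defType, q, qf, pf2 and pf3 are the standard edismax
parameters. It also assumes the field was indexed with the same padded
trigrams, one token per gram with positions kept, and that query-time
analysis only splits on whitespace.

```python
from urllib.parse import urlencode


def padded_ngrams(word, n=3, pad="_"):
    """Return fixed-length n-grams of `word`, padded with n - 1 pad
    characters in front and one at the end (as in the amsterdam example)."""
    padded = pad * (n - 1) + word.lower() + pad
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_params(user_input, gram_field="name_grams"):
    """Build edismax parameters whose query terms are the grams themselves,
    so that pf2/pf3 act on adjacent grams rather than adjacent words."""
    grams = []
    for word in user_input.split():
        grams.extend(padded_ngrams(word))
    return {
        "defType": "edismax",
        "q": " ".join(grams),
        "qf": gram_field,   # hypothetical field holding the grams
        "pf2": gram_field,  # boost documents with gram pairs in order
        "pf3": gram_field,  # boost documents with gram triples in order
    }


params = build_params("Asmtreadm")
print(urlencode(params))
```

Because the grams are sent as separate query terms, the parser sees each
gram as a "word", which is what lets pf2 and pf3 reward grams that appear
in the same order in the document.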
Regards,
Emir
On 10.03.2016 13:47, elisa
I suspect that what you really want is analogous to PF2/PF3, but based on
the ngram terms that come out of query token analysis rather than using
pairs/triples of source terms before analysis that are then analyzed as
phrases so that all of the ngrams for a PF2/PF3 phrase must be in order
rather po
oh yeah, now that you're saying it, yeah you're right, pf2 pf3 will boost
proximity between words, not between ngrams.
Thanks again,
Elisabeth
2016-03-10 12:31 GMT+01:00 Alessandro Benedetti :
The reason pf2 and pf3 do not seem a good solution to me is that the
edismax query parser calculates those grams on top of word shingles.
So it takes the query as input and produces the shingles based on the
whitespace separator.
i.e. if you search :
"white tiger jumping"
and pf2 configur
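
The word-shingle behaviour described above can be illustrated with a short
Python sketch. This is not edismax's own code, only an imitation of how
adjacent whitespace-separated words are grouped into pairs (pf2) and
triples (pf3); the function name word_shingles is hypothetical.

```python
def word_shingles(query, size):
    """Return every run of `size` adjacent whitespace-separated words."""
    words = query.split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


print(word_shingles("white tiger jumping", 2))  # ['white tiger', 'tiger jumping']
print(word_shingles("white tiger jumping", 3))  # ['white tiger jumping']
```

A single-word query such as Asmtreadm yields no pairs or triples at all,
which is why pf2 and pf3 cannot help while the ngrams are produced inside
the analysis chain.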
That's the use case, yes. Find Amsterdam with Asmtreadm.
And yes, we're only doing approximate search if we get 0 results.
I don't quite get why pf2 and pf3 are not a good solution.
We're actually testing a solution close to phonetic. Some kind of word
reduction.
Thanks for the suggestion (and the link
If I followed, your use case is:
I type Asmtreadm and I want documents matching Amsterdam (even if the edit
distance is greater than 2).
First of all, this is something I hope you do only if you get 0 results; if
not, the overhead can be great and you are going to lose a lot of precision
causing con
I am trying to do approximate search with Solr. We've tried fuzzy search
and spellcheck search; they work OK, but the edit distance is limited (to 2
for DirectSolrSpellChecker in Solr 4.10.1). With the fuzzy operator, we've had
performance issues, and I don't think you can have an edit distance more
t
If you store the positions for your tokens (and they are stored by default
if you don't omit them), you have the relative position in the index. [1]
I attach a blog post of mine, describing the Lucene internals in a little
more detail.
Apart from that, can you explain the problem you are trying to sol
Hello Alessandro,
You may be right. What would you use to keep relative order between, for
instance, grams
__a
_am
ams
mst
ste
ter
erd
rda
dam
am_
of amsterdam? pf2 and pf3? That's all I can think of. Please let me know
if you have more insights.
Best regards,
Elisabeth
2016-03-08 17:46 GMT
Elizabeth,
out of curiosity, could we know what you are trying to solve with that
complex way of tokenisation?
Solr is really good at storing positions along with tokens, so I am curious
to know why you are mixing things up.
Cheers
On 8 March 2016 at 10:08, elisabeth benoit
wrote:
Thanks for your answer Emir,
I'll check that out.
Best regards,
Elisabeth
2016-03-08 10:24 GMT+01:00 Emir Arnautovic :
Hi Elisabeth,
I don't think there is such a token filter, so you would have to create
your own token filter that takes a token and emits ngram tokens of a specific
length. It should not be too hard to create such a filter - you can take a
look at how the ngram filter is coded - yours should be simpler than th
Hello,
I'm using Solr 4.10.1. I'd like to index words as ngrams of fixed length
with a position at the end.
For instance, with fixed length 3, Amsterdam would be something like:
a0 (two spaces added at the beginning)
am1
ams2
mst3
ste4
ter5
erd6
rda7
dam8
am9 (one more space at the end)
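
A minimal Python sketch of the token scheme listed above, assuming the
padding is done with spaces (two in front, one at the end) and that the
position appended to each gram is its zero-based offset in the padded word.
The function name positional_ngrams is hypothetical; inside Solr this logic
would have to live in a custom token filter.

```python
def positional_ngrams(word, n=3, pad=" "):
    """Return fixed-length n-grams of `word`, each suffixed with its
    zero-based position in the padded, lower-cased word."""
    padded = pad * (n - 1) + word.lower() + pad
    return [padded[i:i + n] + str(i) for i in range(len(padded) - n + 1)]


for token in positional_ngrams("Amsterdam"):
    print(repr(token))  # repr() makes the padding spaces visible
```

With these assumptions the first token is two spaces followed by a0 and the
last is am, a space, then 9, so each gram only matches when it occurs at the
same offset in both the query and the indexed word.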
The number a