I think in my case LowerCaseTokenizerFactory will be sufficient because
there will never be spaces in this particular field. But thank you for the
useful link!
Thanks,
Brian Lamb
On Wed, Jun 1, 2011 at 11:44 AM, Erick Erickson wrote:
Be a little careful here. LowerCaseTokenizerFactory is different than
KeywordTokenizerFactory.
LowerCaseTokenizerFactory will give you more than one term. e.g.
the string "Intelligence can't be MeaSurEd" will give you 5 terms,
any of which may match. i.e.
"intelligence", "can", "t", "be", "measure
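The difference Erick describes can be sketched in plain Python. This is only an approximation of the two tokenizers, not Solr code: the helper names `lowercase_tokenize` and `keyword_tokenize` are made up for illustration, and the regular expression assumes that "split on anything that is not a letter" is a close enough model of LowerCaseTokenizerFactory.

```python
import re


def lowercase_tokenize(text: str) -> list[str]:
    """Rough model of a lower-casing letter tokenizer (assumption):
    every run of letters becomes its own lower-cased term."""
    return [m.group(0).lower() for m in re.finditer(r"[^\W\d_]+", text)]


def keyword_tokenize(text: str) -> list[str]:
    """Rough model of a keyword tokenizer (assumption):
    the whole input is kept as one single term, unchanged."""
    return [text]


sample = "Intelligence can't be MeaSurEd"
print(lowercase_tokenize(sample))  # five separate terms
print(keyword_tokenize(sample))    # one term, case preserved
```

For a field value without spaces or punctuation, such as "abcdefg", both helpers return a single term, which is why the choice does not matter much in Brian's case.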
Hi Tomás,
Thank you very much for your suggestion. I took another crack at it using
your recommendation and it worked ideally. The only thing I had to change
was
to
The first did not produce any results but the second worked beautifully.
Thanks!
Brian Lamb
2011/5/31 Tomás Fernánde
...or also use the LowerCaseTokenizerFactory at query time for consistency,
but not the edge ngram filter.
2011/5/31 Tomás Fernández Löbbe
Hi Brian, I don't know if I understand what you are trying to achieve. You
want the term query "abcdefg" to have an idf of 1 instead of 7? I think using
the KeywordTokenizerFactory at query time should work. It would be
something like:
this way, at query time "abcde
I believe I used that link when I initially set up the field and it worked
great (and I'm still using it in other places). In this particular example
however it does not appear to be practical for me. I mentioned that I have a
similarity class that returns 1 for the idf and i
Can you specify the analyzer you are using for your queries?
Maybe you could use a KeywordAnalyzer for your queries so you don't end up
matching parts of your query.
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
This should help you.
On Tue,
In this particular case, I will be doing a solr search based on user
preferences. So I will not be depending on the user to type "abcdefg". That
will be automatically generated based on user selections.
The contents of the field do not contain spaces and since I am creating the
search parameters, c
That'll work for your case, although be aware that string types aren't
analyzed at all,
so case matters, as do spaces etc.
What is the use-case here? If you explain it a bit there might be
better answers
Best
Erick
On Fri, May 27, 2011 at 9:17 AM, Brian Lamb wrote:
For this, I ended up just changing it to string and using "abcdefg*" to
match. That seems to work so far.
Thanks,
Brian Lamb
On Wed, May 25, 2011 at 4:53 PM, Brian Lamb wrote:
> Hi all,
>
> I'm running into some confusion with the way edgengram works. I have the
> field set up as:
>
> position
I'm afraid I'll have to pass, I'm absolutely swamped at the moment. Perhaps
someone else can pick it up.
I will say that you should be getting terms back when you pre-lower-case
them, so look in your index via the admin page or Luke to see if what's
really in your index is what you think in the "n
Hi Erick,
If you have time, can you please take a look and provide your comments or
suggestions for this problem?
Please let me know if you need any more information.
Thanks,
Johnny
--
View this message in context:
http://lucene.472066.n3.nabble.com/EdgeNgram-Auto-suggest-doubles-ignore-tp
Hi Erick,
I tried to use the terms component, but I ended up with the following problems.
Problem: 1
Custom Sort not working in terms component:
http://lucene.472066.n3.nabble.com/Term-component-sort-is-not-working-td1905059.html#a1909386
I want to sort using one of my custom fiel
OK, try this.
Use some analysis chain for your field like:
This can be a multiValued field, BTW.
now use the TermsComponent to fetch your data. See:
http://wiki.apache.org/solr/TermsComponent
and specify terms.prefix=apple e.g.
http://localhost:8983/solr/terms?terms.prefix=app&terms.fl=bli
Right now our configuration says multivalues=true. But that need not be
"true" in our case. Will make it false and try and update this thread with
more details..
--
View this message in context:
http://lucene.472066.n3.nabble.com/EdgeNgram-Auto-suggest-doubles-ignore-tp2321919p2334627.html
Sent
Ah, sorry, I got confused about your requirements. If you just want to
match at the beginning of the field, it may be more feasible, using
edgegrams or a wildcard, if you have a single-valued field. Do you have a
single-valued or a multi-valued field? That is, does each document have
just one v
The index contains around 1.5 million documents. As this is used for
autosuggest feature, performance is an important factor.
So it looks like, using edgeNgram, it is difficult to achieve the
following
Result should return only those terms where search letter is matching with
the first word
Oh, I should perhaps mention that EdgeNGrams will yield results a lot quicker
than using wildcards at the cost of a larger index. You should, of course, use
EdgeNGrams if you worry about performance and have a huge index and a number
of queries per second.
Then you don't need NGrams at all. A wildcard will suffice or you can use the
TermsComponent.
If these strings are indexed as single tokens (KeywordTokenizer with
LowercaseFilter) you can simply do field:app* to retrieve the "apple milk
shake". You can also use the string field type but then yo
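As a rough illustration of the single-token idea above, the following Python sketch models a field whose whole value is kept as one lower-cased term and then matched by prefix, similar in spirit to field:app*. It is not Solr code; the function name `prefix_suggest` is hypothetical, and the three sample strings are the ones used elsewhere in this thread.

```python
def prefix_suggest(prefix: str, values: list[str]) -> list[str]:
    """Return the values whose whole, lower-cased text starts with the prefix.

    This models a keyword-tokenized, lower-cased field queried with a
    trailing wildcard (an assumption made for illustration only).
    """
    needle = prefix.lower()
    return [value for value in values if value.lower().startswith(needle)]


documents = ["pineapple vers apple", "milk with apple", "apple milk shake"]
print(prefix_suggest("app", documents))
print(prefix_suggest("apple", documents))
```

Only the value that begins with the typed text is returned, because the other two values contain "apple" in the middle or at the end of the single term.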
I haven't figured out any way to achieve that AT ALL without making a
separate Solr index just to serve autosuggest queries. At least when you
want to auto-suggest on a multi-value field. Someone posted a crazy
tricky way to do it with a single-valued field a while ago. If you
can/are willing
Hi Erick,
What I want here is, let's say I have 3 documents like
["pineapple vers apple", "milk with apple", "apple milk shake" ]
and if I search for "apple", it should return only "apple milk shake"
because that term alone starts with the letters "apple" which I typed in. It
should not bring oth
Let's back up here because now I'm not clear what you actually want.
EdgeNGrams are a way of matching substrings, which is what's happening here.
Of course searching for "apple" matches all three examples, just as searching
for "apple" without grams would; that's the expected behavior.
So
Hi Erick,
You are right, there is a copy field to EdgeNgram. I tried the configuration
but it is not working as expected.
Configuration I tried:
edgy_user_query
==
When I search for th
See below.
On Mon, Jan 24, 2011 at 1:51 PM, johnnyisrael wrote:
>
> Hi,
>
> I am trying out the auto suggest using EdgeNgram.
>
> Using the following tutorial as a reference.
>
>
> http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
>
> In the above
it seems adding the '+' (required) operator to each term in a multi-term query
does the trick:
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#+
ie: edgytext2:(+Martin +Sco)
-robert
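A client could build such a query string along these lines. This is a minimal Python sketch, not part of any Solr client library: the function name `require_all_terms` is hypothetical, and the sketch assumes the user input contains no query-parser special characters that would need escaping.

```python
def require_all_terms(field: str, user_input: str) -> str:
    """Prefix every whitespace-separated term with '+' so that each term
    is required, and scope the whole group to one field."""
    terms = user_input.split()
    required = " ".join(f"+{term}" for term in terms)
    return f"{field}:({required})"


print(require_all_terms("edgytext2", "Martin Sco"))
```

Real input would need escaping and a check for empty input before the string is sent to Solr.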
On Nov 16, 2010, at 8:52 PM, Robert Gründler wrote:
thanks for the explanation.
the results for the autocompletion are pretty good now, but we still have a
small problem.
When there are hits in the "edgytext2" fields, results which only have hits in
the "edgytext" field
should not be returned at all.
Example:
Query: "Martin Sco"
Current Resu
Without the parens, the "edgytext:" only applied to "Mr", the default
field still applied to "Scorcese".
The double quotes are necessary in the second case (rather than parens),
because on a non-tokenized field the standard query parser will
"pre-tokenize" on whitespace before sending
>
> Did you run your query without using () and "" operators? If yes can you try
> this?
> &q=edgytext:(Mr Scorsese) OR edgytext2:"Mr Scorsese"^2.0
I didn't use () and "" in my query before. Using the query with those operators
works now, stopwords are thrown out as they should, thanks.
However,
Ah I see. Thanks for the explanation.
Could you set the defaultOperator to "AND"? That way both "Bill" and "Cl" must
be a match and that would exclude "Clyde Phillips".
--- On Thu, 11/11/10, Robert Gründler wrote:
> From: Robert Gründler
> Su
according to the fieldtype i posted previously, i think it's because of:
1. WhiteSpaceTokenizer splits the String "Clyde Phillips" into 2 tokens:
"Clyde" and "Phillips"
2. EdgeNGramFilter gets the 2 tokens, and creates an EdgeNGram for each token:
"C" "Cl" "Cly" ... AND "P" "Ph" "Phi" ...
Th
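The two steps above can be modelled in a few lines of Python to see why "Clyde Phillips" shows up for "Bill Cl". This is a simplified sketch, not the Solr analysis chain: the function names, the lower-casing step, the gram size limits, and the assumption that query terms are not n-grammed are all made up for illustration, since the actual fieldtype is not shown here.

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 25) -> list[str]:
    """Front-edge n-grams of one token: 'clyde' -> 'c', 'cl', 'cly', ..."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_terms(value: str) -> set[str]:
    """Split on whitespace, lower-case, then edge-n-gram every token."""
    terms: set[str] = set()
    for token in value.split():
        terms.update(edge_ngrams(token.lower()))
    return terms


def matches(query: str, value: str, operator: str = "OR") -> bool:
    """OR: any query term is an indexed gram. AND: every term must be."""
    terms = index_terms(value)
    hits = [term.lower() in terms for term in query.split()]
    return any(hits) if operator == "OR" else all(hits)


print(matches("Bill Cl", "Clyde Phillips", "OR"))   # "cl" is a gram of "clyde"
print(matches("Bill Cl", "Clyde Phillips", "AND"))  # "bill" matches nothing
```

With OR semantics a single matching gram is enough, which is consistent with the suggestion elsewhere in this thread to require all terms.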
Could anyone help me understand why "Clyde Phillips" appears in the
results for "Bill Cl"?
"Clyde Phillips" doesn't produce any EdgeNGram that would match "Bill Cl", so
why is it even in the results?
Thanks.
--- On Thu, 11/11/10, Ahmet Arslan wrote:
> You can add an additional field, w
On 12 Nov 2010, at 01:46, Ahmet Arslan wrote:
>> This setup now makes troubles regarding StopWords, here's
>> an example:
>>
>> Let's say the index contains 2 Strings: "Mr Martin
>> Scorsese" and "Martin Scorsese". "Mr" is in the stopword
>> list.
>>
>> Query: edgytext:Mr Scorsese OR edgytext2
> This setup now makes troubles regarding StopWords, here's
> an example:
>
> Let's say the index contains 2 Strings: "Mr Martin
> Scorsese" and "Martin Scorsese". "Mr" is in the stopword
> list.
>
> Query: edgytext:Mr Scorsese OR edgytext2:Mr Scorsese^2.0
>
> This way, the only result i get is
thanks a lot, that setup works pretty well now.
the only problem now is that the StopWords do not work that well anymore. I'll
provide an example, but first the 2 fieldtypes:
You can add an additional field, using KeywordTokenizerFactory instead of
WhitespaceTokenizerFactory, and query both these fields with an OR operator.
edgytext:(Bill Cl) OR edgytext2:"Bill Cl"
You can even apply a boost so that begins-with matches come first.
--- On Thu, 11/11/10, Robert G