At Netflix, we load the completion lexicon with movie titles, person
names, and a few aliases. Even then, we find a few misspellings in
our metadata (is it "NWA" or "N.W.A."?). Extracting terms from
documents will find a lot of misspellings.
You really do not want to rely on random users to correc
I'd add that you probably don't want just the query logs; people may
search for things that aren't in the index, too. Your call as to whether
that is useful or not. Also, have a look at the TermsComponent, as it
will tell you the doc freq for terms.
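
A rough sketch of that idea, in Python: ask the TermsComponent for terms and their document frequencies, then drop query-log terms that never occur in the index. The `/terms` handler path, the base URL, the field name, and the helper names are all assumptions for illustration. The parser assumes Solr's default JSON layout for this component, a flat `[term, count, term, count, ...]` list per field, so check it against your own response before relying on it.

```python
from urllib.parse import urlencode


def terms_url(base_url, field, prefix="", limit=10):
    """Build a TermsComponent request URL (the /terms path is an assumption)."""
    params = {
        "terms": "true",
        "terms.fl": field,       # field to read terms from
        "terms.limit": limit,    # maximum number of terms returned
        "wt": "json",
    }
    if prefix:
        params["terms.prefix"] = prefix  # only terms starting with this prefix
    return base_url.rstrip("/") + "/terms?" + urlencode(params)


def parse_terms(response, field):
    """Convert the flat [term, count, term, count, ...] list into a dict."""
    flat = response["terms"][field]
    return dict(zip(flat[0::2], flat[1::2]))


def keep_indexed(query_log_terms, doc_freqs):
    """Keep only query-log terms that occur in at least one document."""
    return [t for t in query_log_terms if doc_freqs.get(t, 0) > 0]
```

Fetching the URL (for example with `urllib.request.urlopen`) and decoding the JSON is left out so the sketch stays independent of a running Solr instance.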
On Dec 19, 2008, at 10:08 AM, roberto wrote:
Erick,
Thanks, this sounds good, I'll try.
Mike,
Could you give more details about query logs?
Thanks
On Fri, Dec 19, 2008 at 12:02 AM, Mike Klaas wrote:
>
> On 18-Dec-08, at 10:53 AM, roberto wrote:
>
> Erick,
>>
>> Thanks for the answer, let me clarify the thing, we would like to have a
>>
On 18-Dec-08, at 10:53 AM, roberto wrote:
Erick,
Thanks for the answer, let me clarify the thing: we would like to have a
combobox with the terms to guide the user in the search. I mean, if I have
thousands of documents and want to tell them how many documents in the base
have the partic
How do you get the word in the first place? If the combobox
is for all words in your index, it's probably completely useless
to provide this information because there is too much data to
guide the user at all. I mean a list of 10,000 words with some sort
of document frequency seems to me to require
Erick,
Thanks for the answer, let me clarify the thing: we would like to have a
combobox with the terms to guide the user in the search. I mean, if I have
thousands of documents and want to tell them how many documents in the base
have the particular word, how can I do that?
thanks
On Thu, Dec 18
I think I'd pin the user down and have him give me the real-world
use-cases that require this, then see if there's a more reasonable
way to satisfy that use-case. Do they want type-ahead? What
is the user of the system going to see? Because, for instance,
a drop-down of 10,000 terms is totally use
Grant
It's completely crazy to do something like this, I know, but the customer wants it.
I'm really trying to figure out how to do it in a better way, maybe using
the (auto suggest) filter from Solr 1.3 to get all the words starting with
some letter and cache the letter on the client side, out client is
All terms from all docs? Really?
At any rate, see http://wiki.apache.org/solr/TermsComponent May need
a mod to not require any field, but for now you can enter all fields
(which you can get from LukeRequestHandler)
-Grant
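
As a rough illustration of the "enter all fields" workaround, the Python sketch below builds one TermsComponent request that repeats `terms.fl` once per field. The field names are passed in as a plain list, which you would obtain from the LukeRequestHandler yourself. The `/terms` handler path, the base URL, and the function name are assumptions rather than anything stated in this thread.

```python
from urllib.parse import urlencode


def all_fields_terms_url(base_url, fields, limit=20):
    """Build a TermsComponent URL that lists terms for every given field."""
    params = [("terms", "true"), ("terms.limit", limit), ("wt", "json")]
    # terms.fl may be repeated, once per field to read terms from
    params += [("terms.fl", name) for name in fields]
    return base_url.rstrip("/") + "/terms?" + urlencode(params)
```

With a large index this can return a great deal of data, so keeping `terms.limit` small, or adding `terms.prefix`, is probably wise.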
On Dec 17, 2008, at 2:17 PM, roberto wrote:
Hello,
I need to g