Re: Get All terms from all documents

2008-12-19 Thread Walter Underwood
At Netflix, we load the completion lexicon with movie titles, person names, and a few aliases. Even then, we find a few misspellings in our metadata (is it "NWA" or "N.W.A."?). Extracting terms from documents will find a lot of misspellings. You really do not want to rely on random users to correc
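A minimal sketch of the curated-lexicon idea above, assuming a small in-memory list is enough. The function names, the sample titles, and the alias table are hypothetical illustrations, not Netflix's actual implementation. Aliases map to one canonical display form, so "NWA" and "N.W.A." both complete to the same entry.

```python
from bisect import bisect_left

def build_lexicon(canonical, aliases):
    """Return sorted (lowercased key, display form) pairs.

    `canonical` is a list of display forms; `aliases` maps an alternate
    spelling to the display form it should complete to.
    """
    pairs = {(name.lower(), name) for name in canonical}
    pairs |= {(alias.lower(), target) for alias, target in aliases.items()}
    return sorted(pairs)

def complete(lexicon, prefix, limit=10):
    """Return up to `limit` distinct display forms whose key starts with `prefix`."""
    key = prefix.lower()
    # A 1-tuple sorts before every pair sharing that first element,
    # so this finds the first key >= the typed prefix.
    start = bisect_left(lexicon, (key,))
    matches = []
    for lowered, display in lexicon[start:]:
        if not lowered.startswith(key):
            break
        if display not in matches:
            matches.append(display)
        if len(matches) == limit:
            break
    return matches

# Hypothetical curated entries: titles, a name, and one alias.
LEXICON = build_lexicon(
    ["N.W.A.", "The Matrix", "The Matrix Reloaded"],
    {"NWA": "N.W.A."},
)
```

Calling `complete(LEXICON, "nw")` resolves the alias to the canonical "N.W.A." spelling, which is the point of curating the list by hand rather than extracting raw terms.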

Re: Get All terms from all documents

2008-12-19 Thread Grant Ingersoll
I'd add that you probably don't want just the query logs; people may search for things that aren't in the index, too. Your call as to whether that is useful or not. Also, have a look at the TermsComponent, as it will tell you the doc freq for terms.

On Dec 19, 2008, at 10:08 AM, roberto wrote
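One way to combine the two points above (query logs plus document frequency) might look like the sketch below. It assumes the document frequencies have already been fetched, for example from the TermsComponent, into a plain dict; the function name, the `min_doc_freq` parameter, and the sample data are hypothetical.

```python
from collections import Counter

def suggestion_candidates(logged_queries, doc_freq, min_doc_freq=1):
    """Keep logged queries that also occur in the index.

    Returns (term, times_searched, doc_freq) rows, most-searched first.
    Queries whose document frequency is below `min_doc_freq` are dropped,
    since suggesting them would lead the user to an empty result page.
    """
    counts = Counter(
        query.strip().lower() for query in logged_queries if query.strip()
    )
    kept = [
        (term, searched, doc_freq.get(term, 0))
        for term, searched in counts.items()
        if doc_freq.get(term, 0) >= min_doc_freq
    ]
    # Sort by popularity, then alphabetically for a stable order.
    kept.sort(key=lambda row: (-row[1], row[0]))
    return kept

# Hypothetical sample data.
SAMPLE_LOG = ["Solr", "solr ", "lucene", "nutch", ""]
SAMPLE_DOC_FREQ = {"solr": 42, "lucene": 7}
```

Whether to drop zero-hit queries entirely or keep them for later analysis is a design choice; this sketch simply drops them.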

Re: Get All terms from all documents

2008-12-19 Thread roberto
Erick,
Thanks, this sounds good. I'll try.

Mike,
Could you give more details about query logs?

Thanks

On Fri, Dec 19, 2008 at 12:02 AM, Mike Klaas wrote:
> On 18-Dec-08, at 10:53 AM, roberto wrote:
>> Erick,
>>
>> Thanks for the answer, let me clarify the thing, we would like to have a

Re: Get All terms from all documents

2008-12-18 Thread Mike Klaas
On 18-Dec-08, at 10:53 AM, roberto wrote:
Erick, Thanks for the answer. Let me clarify: we would like to have a combobox with the terms to guide the user in the search. I mean, if I have thousands of documents and want to tell them how many documents in the base have the partic

Re: Get All terms from all documents

2008-12-18 Thread Erick Erickson
How do you get the word in the first place? If the combobox is for all words in your index, it's probably completely useless to provide this information because there is too much data to guide the user at all. I mean a list of 10,000 words with some sort of document frequency seems to me to require

Re: Get All terms from all documents

2008-12-18 Thread roberto
Erick,
Thanks for the answer. Let me clarify: we would like to have a combobox with the terms to guide the user in the search. I mean, if I have thousands of documents and want to tell them how many documents in the base have a particular word, how can I do that?

Thanks

On Thu, Dec 18

Re: Get All terms from all documents

2008-12-18 Thread Erick Erickson
I think I'd pin the user down and have him give me the real-world use-cases that require this, then see if there's a more reasonable way to satisfy that use-case. Do they want type-ahead? What is the user of the system going to see? Because, for instance, a drop-down of 10,000 terms is totally use

Re: Get All terms from all documents

2008-12-17 Thread roberto
Grant, it is completely crazy to do something like this, I know, but the customer wants it. I'm really trying to figure out how to do it in a better way, maybe using the (auto suggest) filter from Solr 1.3 to get all the words starting with some letter and cache the letter on the client side. Our client is
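The "fetch by first letter, cache on the client" idea above could be sketched roughly as follows. This is an illustration only: the class name and the injected `fetch_terms` callable are hypothetical, and a real client would call the search server where this sketch uses a stand-in function.

```python
class PrefixSuggestionCache:
    """Cache suggestion terms per first letter and filter them locally."""

    def __init__(self, fetch_terms):
        # `fetch_terms(letter)` is assumed to return every term that
        # starts with `letter`; in practice it would query the server.
        self._fetch_terms = fetch_terms
        self._by_letter = {}

    def suggest(self, typed, limit=10):
        typed = typed.lower()
        if not typed:
            return []
        letter = typed[0]
        if letter not in self._by_letter:
            # One server round trip per first letter; later keystrokes
            # for the same letter are answered from the cache.
            self._by_letter[letter] = sorted(self._fetch_terms(letter))
        cached = self._by_letter[letter]
        return [term for term in cached if term.startswith(typed)][:limit]

# Stand-in for the server, with a counter to show how often it is hit.
FETCH_CALLS = []

def fake_fetch(letter):
    FETCH_CALLS.append(letter)
    terms = ["search", "solr", "suggest", "lucene"]
    return [term for term in terms if term.startswith(letter)]

cache = PrefixSuggestionCache(fake_fetch)
```

Typing "s", then "so", then "sol" triggers a single fetch for "s". The trade-off is that a letter with very many terms makes that first fetch expensive.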

Re: Get All terms from all documents

2008-12-17 Thread Grant Ingersoll
All terms from all docs? Really?

At any rate, see http://wiki.apache.org/solr/TermsComponent

May need a mod to not require any field, but for now you can enter all fields (which you can get from LukeRequestHandler).

-Grant

On Dec 17, 2008, at 2:17 PM, roberto wrote:
Hello, I need to g
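A rough sketch of what a TermsComponent request over several fields might look like from a client. Several things here are assumptions rather than facts from the thread: the base URL, the `/terms` handler path (it depends on how the component is registered in solrconfig.xml), and the flat `[term, count, term, count, ...]` JSON layout, which varies with the Solr version and the `json.nl` setting. The field list would come from LukeRequestHandler, as suggested above.

```python
from urllib.parse import urlencode

def terms_url(base_url, fields, prefix=None, limit=10):
    """Build a TermsComponent request URL for one or more fields."""
    params = [("terms", "true"), ("terms.limit", str(limit)), ("wt", "json")]
    # terms.fl is repeated once per field.
    params += [("terms.fl", field) for field in fields]
    if prefix:
        params.append(("terms.prefix", prefix))
    # "/terms" is an assumed handler path; adjust to the local config.
    return f"{base_url}/terms?{urlencode(params)}"

def parse_flat_terms(payload):
    """Turn {"terms": {field: [term, df, term, df, ...]}} into nested dicts.

    The flat layout is an assumption about the response shape.
    """
    result = {}
    for field, flat in payload.get("terms", {}).items():
        result[field] = dict(zip(flat[0::2], flat[1::2]))
    return result

# Hypothetical response body, already decoded from JSON.
SAMPLE_PAYLOAD = {"terms": {"title": ["matrix", 3, "reloaded", 1]}}
```

For example, `terms_url("http://localhost:8983/solr", ["title", "body"], prefix="ma")` yields a URL with two `terms.fl` parameters, and `parse_flat_terms(SAMPLE_PAYLOAD)` gives a per-field mapping from term to document frequency.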