On Jan 15, 2010, at 8:19 AM, MitchK wrote:
>
> Hello,
>
> I have searched the wiki and the mailing lists, but I can't find any
> postings for the following training use cases.
>
> First:
> I want to create a Term Dictionary, which I can return to my client. The
> client should be able to manipulate this response in any way he wants - so I
> really need a human-readable dictionary, which I can export to a
> database, if I need to do so.
> I know that Lucene has a Term Dictionary, but I don't know how to access
> it.
>
http://wiki.apache.org/solr/TermsComponent

> Second:
> I want to manipulate the scoring of a document. Sure, there are some good
> ways to do so out-of-the-box, but I want to do so in a special way:
> For example, my index contains only these stored documents:
> 1. "Star Wars - Episode I - Phantom Menace DVD Extended Edition"
> 2. "Star Wars - Episode I - Phantom Menace Video-box"
> 3. "Star Wars - Episode V - The Empire Strikes Back Special Edition DVD"
> 4. "Star Wars - Heir to the Empire by Timothy Zahn"
>
> There have been only three queries since the index was built:
> 1. "Star Wars Episode I" -> the user clicked document 1
> 2. "Star Wars" -> the user clicked document 3
> 3. the user can't remember the title of a particular Star Wars book, so he
> searched for "Star Wars Empire"
> and clicked document 4.
>
> Now, I want to do the following:
> If someone queries for "Star Wars" again, document 3 should be the first
> document returned, because whenever someone has searched for "Star Wars" in
> the past, document 3 was the most popular document.
> If someone queries for "Star Wars Episode I", document 1 should be
> returned first, for the same reason.
>
> In simple terms, I want to boost some documents by query.
> I can't do so with the help of a popularity category, because if 1,000
> queries were "Star Wars Episode I" and all 1,000 people clicked on document
> 1, the popularity of document 1 would be 1,000.
> If 500 people searched for "Star Wars" and clicked document 3, the
> popularity of document 3 would be 500. However, the first result in the
> response would be document 1 instead of 3.
>
> I have absolutely no idea how to do so without creating a separate file per
> document with the needed information. So, if anyone has experience
> with such a use case, feel free to tell us your thoughts and ideas. Having a
> term dictionary, one could use an external database to solve this problem
> for the moment.

Generally, this clickthrough tracking is tied to the query, so you need a layer above just popularity. You need popularity per query (or in all likelihood a subset of the queries, since you likely only care about this where you have a certain level of clickthroughs/queries).
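
To make the "popularity per query" idea concrete, here is a rough sketch in Python of a layer that sits outside Solr and reorders a result list by clicks recorded for that specific query. Everything in it is hypothetical and only for illustration: the ClickLog class, the normalize helper, and the min_clicks threshold are my own names, not part of Solr or Lucene, and a real setup would keep the counts in a database rather than in memory.

```python
from collections import Counter, defaultdict


def normalize(query):
    """Lowercase and collapse whitespace so equivalent queries share one key."""
    return " ".join(query.lower().split())


class ClickLog:
    """Hypothetical in-memory store of click counts, keyed by query."""

    def __init__(self, min_clicks=1):
        # Counts below this threshold are ignored, so only queries with
        # enough clickthrough data influence the ordering.
        self.min_clicks = min_clicks
        self.clicks = defaultdict(Counter)

    def record(self, query, doc_id):
        """Register one click on doc_id for the given query."""
        self.clicks[normalize(query)][doc_id] += 1

    def rerank(self, query, doc_ids):
        """Move documents clicked for this query to the front.

        sorted() is stable, so documents with equal (or no) click counts
        keep the order the search engine returned them in.
        """
        counts = self.clicks.get(normalize(query), Counter())

        def sort_key(doc_id):
            n = counts[doc_id]
            return -(n if n >= self.min_clicks else 0)

        return sorted(doc_ids, key=sort_key)


# The scenario from the question: 1,000 clicks on document 1 for
# "Star Wars Episode I" and 500 clicks on document 3 for "Star Wars".
log = ClickLog()
for _ in range(1000):
    log.record("Star Wars Episode I", 1)
for _ in range(500):
    log.record("Star Wars", 3)
log.record("Star Wars Empire", 4)

star_wars_order = log.rerank("Star Wars", [1, 2, 3, 4])
episode_one_order = log.rerank("star wars episode i", [1, 2, 3, 4])
```

Because the counts are keyed by query, the 1,000 clicks on document 1 never affect the plain "Star Wars" query, which is the problem a single global popularity field runs into. Raising min_clicks restricts the effect to queries where you have a meaningful number of clickthroughs.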