Re: TermVectorComponent for tag generation?

Vaijanath N. Rao Fri, 31 Oct 2008 22:49:53 -0700

Hi Jon,

Isn't it similar to what Grant just said: the topmost terms (after removing the stop words)?

You would need to get how many terms there are and their related frequency, and any term which is beyond a certain threshold you would mark as a member of the tag set.
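A minimal sketch of that threshold idea, assuming "frequency" is read as each term's share of the non-stop-word tokens in the field. The stop-word list, the function name `frequency_tags`, and the default threshold are all hypothetical; a real setup would reuse the field's analyzer and its stop-word list.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "in", "to", "for", "as", "which"}

def frequency_tags(text, threshold=0.05, stop_words=STOP_WORDS):
    """Return the terms whose share of the non-stop-word tokens is >= threshold."""
    # Lowercase and split on anything that is not a letter, digit, or apostrophe.
    tokens = [t for t in re.findall(r"[a-z0-9']+", text.lower())
              if t not in stop_words]
    if not tokens:
        return []
    counts = Counter(tokens)
    total = len(tokens)
    # Keep every term at or above the threshold, sorted for a stable result.
    return sorted(t for t, c in counts.items() if c / total >= threshold)
```

On a short field a low threshold lets nearly every term through, so the cut-off would have to be tuned against real documents.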

One can also build a set of related entities or terms which follow the current term, and then decide which of them can become part of the tag set.


Is that the requirement, or am I missing something here?

-- Thanks and Regards
Vaijanath N. Rao

Jon Baer wrote:

Well, for example, in any given text (which is a field on a document):
"While suitable for any application which requires full text indexing and searching capability, Lucene has been widely recognized for its utility in the implementation of Internet search engines and local, single-site searching.
At the core of Lucene's logical architecture is the idea of a document containing fields of text. This flexibility allows Lucene's API to be independent of file format. Text from PDFs, HTML, Microsoft Word documents, as well as many others can all be indexed so long as their textual information can be extracted."
I'd like to be able to say the tags for this article should be [Lucene, PDF, HTML, Microsoft Word] because they are in field values from other documents. Basically, how to generate tags for just a single document based on other documents' field values.
- Jon
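The request above can be sketched outside of Solr as a plain dictionary match: collect the names stored in the other documents' fields, then report which of them occur in the article text. This is only an illustration under stated assumptions, not a Solr component. `entity_tags` and `known_names` are hypothetical names, and the optional trailing "s" is a naive stand-in for the stemming an analyzer would normally provide (it is what lets "PDFs" match "PDF").

```python
import re

def entity_tags(text, known_names):
    """Return the known names that occur in `text` as whole words or phrases."""
    found = []
    for name in known_names:
        # Whole-word match, case-insensitive, tolerating a plural "s".
        pattern = r"(?<!\w)" + re.escape(name) + r"s?(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(name)
    return found

article = ("Text from PDFs, HTML, Microsoft Word documents can all be "
           "indexed by Lucene.")
names_from_other_docs = ["Lucene", "PDF", "HTML", "Microsoft Word", "Solr"]
print(entity_tags(article, names_from_other_docs))
```

Scanning every known name against every article is linear in the size of the name list, so for a large set of Players and Teams an index lookup would be the more practical route.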


On Oct 31, 2008, at 6:17 PM, Grant Ingersoll wrote:
Hey Jon,
Not following how the TVC (TermVectorComponent) would help here. I suppose you could use the "most important" terms, as defined by TF-IDF, as suggested tags. The MLT (MoreLikeThis) uses this to generate query terms.
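As a rough illustration of ranking a document's terms by TF-IDF, here is a sketch that uses a textbook smoothed formula. It is not Lucene's scoring and not the MoreLikeThis implementation; `top_tfidf_terms`, `tokenize`, and the parameter `k` are hypothetical names chosen for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def top_tfidf_terms(doc, corpus, k=3):
    """Return the k terms of `doc` with the highest tf * idf against `corpus`."""
    corpus_terms = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus)
    scores = {}
    for term, freq in Counter(tokenize(doc)).items():
        # Document frequency: how many corpus documents contain the term.
        df = sum(1 for terms in corpus_terms if term in terms)
        # Smoothed idf, so a term unseen in the corpus does not divide by zero.
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0
        scores[term] = freq * idf
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Terms that appear in nearly every document score low and drop out of the top k, which is why this tends to surface names and topical words rather than common ones.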
However, I'm not following the different filter query piece. Can you provide a bit more detail?
One thing you did make me think of, though, is that it might be interesting to extend TermVectorMapper so that it can output a NamedList and then allow people to implement their own SolrTermVectorMapper and have it customize the TV output...
Thanks,
Grant

On Oct 31, 2008, at 5:20 PM, Jon Baer wrote:
Hi,
So I'm looking to either use this or build a component which might do what I'm looking for. I'd like to figure out if it's possible to use a single doc to get tag generation based on the matches within that document, for example:
1 News Doc -> contains 5 Players and 8 Teams (show them as possible tags for this article)
In this case Players and Teams are also docs. It's almost like I want to use MoreLikeThis w/ a different filter query than what I'm using.
Is there any easy hack to get this going?

Thanks.

- Jon
--------------------------
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com


Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
