Hi Jon,
Isn't this similar to what Grant just said: take the topmost terms
(after removing the stop words)?
You would need to get the terms and their frequencies, and mark any
term whose frequency is above a certain threshold as a member of the
tag set.
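A minimal sketch of that threshold idea, in plain Python rather than
against the Lucene/Solr API. The stop-word list, the tokenizer regex,
and the names frequent_terms and min_count are illustrative
assumptions, not anything from Lucene:

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real analyzer
# would use a much fuller one).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}

def frequent_terms(text, min_count=2):
    """Return (term, count) pairs with count >= min_count, most frequent first."""
    # Lowercase and split into alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Count every token that is not a stop word.
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    # Keep only the terms that reach the threshold.
    return [(term, n) for term, n in counts.most_common() if n >= min_count]
```

The threshold here is an absolute count; a relative one (count divided
by the number of tokens) would work the same way.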
One can also build a set of related entities or terms that follow the
current term, and then decide which of them should become part of the
tag set.
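One possible reading of "terms which are following the current term"
is a simple next-term count. The sketch below assumes that reading;
the function name following_terms is hypothetical, and stop words are
dropped before pairing, so two terms separated only by a stop word
count as adjacent:

```python
import re
from collections import Counter, defaultdict

def following_terms(text, stop_words=frozenset()):
    """Map each term to a Counter of the terms that directly follow it."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower())
              if t not in stop_words]
    followers = defaultdict(Counter)
    # Walk over adjacent token pairs and count each (current, next) pair.
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current][nxt] += 1
    return followers
```

A term whose followers are dominated by one other term (for example
"full" followed by "text") is a candidate for a multi-word tag.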
Is that the requirement, or am I missing something here?
-- Thanks and Regards
Vaijanath N. Rao
Jon Baer wrote:
Well, for example, in any given text (which is a field on a document):
"While suitable for any application which requires full text indexing
and searching capability, Lucene has been widely recognized for its
utility in the implementation of Internet search engines and local,
single-site searching.
At the core of Lucene's logical architecture is the idea of a document
containing fields of text. This flexibility allows Lucene's API to be
independent of file format. Text from PDFs, HTML, Microsoft Word
documents, as well as many others can all be indexed so long as their
textual information can be extracted."
I'd like to be able to say the tags for this article should be [Lucene,
PDF, HTML, Microsoft Word] because they are in field values from other
documents. Basically, how do I generate tags for a single document
based on other documents' field values?
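To make the requirement concrete, here is a rough sketch in plain
Python, not Solr. It assumes the field values from the other documents
have already been fetched into a list (known_values); the function
name suggest_tags and the optional trailing "s" for plurals are
illustrative assumptions:

```python
import re

def suggest_tags(text, known_values):
    """Return the known values that occur in text, in order of first appearance."""
    hits = []
    for value in known_values:
        # Whole-word, case-insensitive match; the optional "s" lets
        # "PDF" match "PDFs".
        pattern = r"\b" + re.escape(value) + r"s?\b"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            hits.append((match.start(), value))
    # Sort by position of the first match so tags follow the text order.
    return [value for _, value in sorted(hits)]
```

This scans the text once per known value, which is fine for a small
dictionary but would need an index-side lookup for a large one.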
- Jon
On Oct 31, 2008, at 6:17 PM, Grant Ingersoll wrote:
Hey Jon,
Not following how the TVC (TermVectorComp) would help here. I
suppose you could use the "most important" terms, as defined by
TF-IDF, as suggested tags. The MLT (MoreLikeThis) uses this to
generate query terms.
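As a rough illustration of "most important terms by TF-IDF", here is
a plain-Python sketch. It uses one common smoothed IDF variant, which
is an assumption and not necessarily what Lucene or MoreLikeThis
computes; the names tokenize and top_tfidf_terms are hypothetical:

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def top_tfidf_terms(doc, corpus, k=5):
    """Rank the terms of doc by TF-IDF against corpus (a list of texts)."""
    tf = Counter(tokenize(doc))
    n_docs = len(corpus)
    doc_sets = [set(tokenize(d)) for d in corpus]
    scores = {}
    for term, count in tf.items():
        # Document frequency: number of corpus texts containing the term.
        df = sum(1 for s in doc_sets if term in s)
        # Smoothed IDF (assumed variant): rarer terms get a higher weight.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[term] = count * idf
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

The top k terms returned for a document would then be its suggested
tags.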
However, I'm not following the different filter query piece. Can you
provide a bit more detail?
One thing you did make me think of, though, is that it might be interesting
to extend TermVectorMapper so that it can output a NamedList and then
allow people to implement their own SolrTermVectorMapper and have it
customize the TV output...
Thanks,
Grant
On Oct 31, 2008, at 5:20 PM, Jon Baer wrote:
Hi,
So I'm looking to either use this or build a component that might do
what I'm looking for. I'd like to figure out if it's possible to use a
single doc to get tag generation based on the matches within that
document, for example:
1 News Doc -> contains 5 Players and 8 Teams (show them as possible
tags for this article)
In this case Players and Teams are also docs. It's almost like I
want to use MoreLikeThis w/ a different filter query than what I'm
using.
Is there any easy hack to get this going?
Thanks.
- Jon
--------------------------
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ