Well, for example, in any given text (which is a field on a document):

"While suitable for any application which requires full text indexing and searching capability, Lucene has been widely recognized for its utility in the implementation of Internet search engines and local, single-site searching.

At the core of Lucene's logical architecture is the idea of a document containing fields of text. This flexibility allows Lucene's API to be independent of file format. Text from PDFs, HTML, Microsoft Word documents, as well as many others can all be indexed so long as their textual information can be extracted."

I'd like to be able to say the tags for this article should be [Lucene, PDF, HTML, Microsoft Word] because they are in field values from other documents. Basically, how do I generate tags from just a single document based on other documents' field values?
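
To make the idea concrete, here is a minimal sketch in plain Java of that kind of matching. It assumes the candidate tag names (the field values of the other documents) have already been loaded into a list. `DictionaryTagger` and `suggestTags` are hypothetical names, not Lucene or Solr APIs, and the plural handling is only a crude stand-in for what an analyzer with stemming would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Lucene or Solr.
class DictionaryTagger {

    /**
     * Returns the candidates that occur in the text, in candidate order.
     * Matching is case-insensitive and requires word boundaries.
     */
    static List<String> suggestTags(String text, List<String> candidates) {
        String haystack = text.toLowerCase(Locale.ROOT);
        List<String> tags = new ArrayList<>();
        for (String candidate : candidates) {
            String needle = candidate.toLowerCase(Locale.ROOT);
            if (!needle.isEmpty() && containsPhrase(haystack, needle)
                    && !tags.contains(candidate)) {
                tags.add(candidate);
            }
        }
        return tags;
    }

    // True if needle occurs in haystack with a non-alphanumeric character
    // (or the start/end of the text) on both sides.
    private static boolean containsPhrase(String haystack, String needle) {
        int from = 0;
        while (true) {
            int start = haystack.indexOf(needle, from);
            if (start < 0) {
                return false;
            }
            int end = start + needle.length();
            // Assumption: tolerate one trailing plural "s" (e.g. "PDFs").
            if (end < haystack.length() && haystack.charAt(end) == 's') {
                end++;
            }
            boolean leftOk = start == 0
                    || !Character.isLetterOrDigit(haystack.charAt(start - 1));
            boolean rightOk = end == haystack.length()
                    || !Character.isLetterOrDigit(haystack.charAt(end));
            if (leftOk && rightOk) {
                return true;
            }
            from = start + 1;
        }
    }
}
```

Calling `DictionaryTagger.suggestTags(articleText, candidateNames)` returns the subset of `candidateNames` found in `articleText`. Scanning every candidate per document is linear in the number of candidates, so this only illustrates the idea for a small tag vocabulary.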

- Jon


On Oct 31, 2008, at 6:17 PM, Grant Ingersoll wrote:

Hey Jon,

Not following how the TVC (TermVectorComp) would help here. I suppose you could use the "most important" terms, as defined by TF-IDF, as suggested tags. The MLT (MoreLikeThis) uses this to generate query terms.
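
As a rough illustration of picking the "most important" terms by TF-IDF, here is a minimal sketch in plain Java. It works on in-memory strings rather than a Lucene index. The class `TfIdfTagger`, its methods, and the smoothed IDF formula are assumptions made for this example; they are not Lucene's Similarity and not the MoreLikeThis implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of Lucene or Solr.
class TfIdfTagger {

    // Lowercases the text and splits it on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Returns up to 'limit' terms of 'doc' with the highest TF-IDF score. */
    static List<String> topTerms(String doc, List<String> corpus, int limit) {
        // Term frequency within the single document.
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokenize(doc)) {
            tf.merge(t, 1, Integer::sum);
        }
        // Document frequency across the corpus (each term counted once per doc).
        Map<String, Integer> df = new HashMap<>();
        for (String other : corpus) {
            for (String t : new HashSet<>(tokenize(other))) {
                df.merge(t, 1, Integer::sum);
            }
        }
        int n = corpus.size();
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            // Assumed smoothed IDF: log((N + 1) / (df + 1)) + 1.
            double idf = Math.log((n + 1.0) / (df.getOrDefault(e.getKey(), 0) + 1.0)) + 1.0;
            scores.put(e.getKey(), e.getValue() * idf);
        }
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(scores.entrySet());
        // Highest score first; ties broken alphabetically for a stable order.
        ranked.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < ranked.size() && i < limit; i++) {
            top.add(ranked.get(i).getKey());
        }
        return top;
    }
}
```

Terms that are frequent in the one document but rare across the corpus rank first, which is the property that makes them plausible tag suggestions. Against a real index the frequencies would come from stored term vectors and index statistics rather than from re-tokenizing strings.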

However, I'm not following the different filter query piece. Can you provide a bit more detail?

One thing you did make me think of, though, is that it might be interesting to extend TermVectorMapper so that it can output a NamedList and then allow people to implement their own SolrTermVectorMapper and have it customize the TV output...

Thanks,
Grant

On Oct 31, 2008, at 5:20 PM, Jon Baer wrote:

Hi,

So I'm looking to either use this or build a component that might do what I'm looking for. I'd like to figure out if it's possible to use a single doc to get tag generation based on the matches within that document, for example:

1 News Doc -> contains 5 Players and 8 Teams (show them as possible tags for this article)

In this case, Players and Teams are also docs. It's almost like I want to use MoreLikeThis with a different filter query than the one I'm using.

Is there any easy hack to get this going?

Thanks.

- Jon

--------------------------
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com


Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









