On Tue, Apr 29, 2003 at 12:38:18PM +0200, Enrico Zini wrote:
> On Mon, Apr 28, 2003 at 03:23:24PM +0200, Javier Fernández-Sanguino Peña wrote:
>
> > [1] One of the difficult things in the future might be to generate new tags
> > or associate new packages to tags already available. Automatising (sp?)
> > this would be useful and TFIDF (and similar IA-related techniques) help
> > with this quite a lot.
>
> I've never tried the tools you are suggesting me, but I definitely will.
> I have some immediate additions to make to tagcoll and debtags based on
> the many suggestions I have received.

Note that rainbow (libbow) is no longer actively updated upstream. The
library works and the tool works, but some documentation is still lacking.

> You're showing me a whole new world to explore, and I'll be sure do it
> asap.

Glad to help. Just FYI, TFIDF is a very simple "technology" (as a matter of
fact it's just an equation) used to determine the 'weight' of words in
free-form text. It's useful for document clustering, because you can decide
that documents belong to the same 'group' if they have similar word weights.

The application I found worth testing in Debian (which prompted me to
develop the hack that 'dpkg-iasearch' is) is to use document clustering and
TFIDF to find packages. If you have a set of package descriptions (let's say
4000) you can parse all the words in the descriptions, compare them across
all the descriptions and determine which words are 'appropriate' to describe
a given package. Naturally, these same words are keywords that can describe
a set of related packages, and thus this approach could be useful to
automatically tag new packages when they get into Debian.

Just my 2c.

Regards

Javi
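
To make the TFIDF idea above a bit more concrete, here is a minimal sketch
in plain Python. It is not how rainbow or dpkg-iasearch do it; the function
names (tokenize, tfidf, top_keywords) and the three sample descriptions are
made up for illustration, and it assumes the most common textbook variant of
the equation: weight = (term count / words in the description) * log(number
of descriptions / number of descriptions containing the term).

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep only runs of ASCII letters as words.
    return re.findall(r"[a-z]+", text.lower())


def tfidf(descriptions):
    """Map each package name to a {word: weight} dict.

    `descriptions` is a dict of package name -> description text.
    """
    tokens = {pkg: tokenize(desc) for pkg, desc in descriptions.items()}
    n_docs = len(tokens)

    # Document frequency: in how many descriptions does each word occur?
    df = Counter()
    for words in tokens.values():
        df.update(set(words))

    weights = {}
    for pkg, words in tokens.items():
        tf = Counter(words)
        total = len(words) or 1  # avoid dividing by zero on empty text
        weights[pkg] = {
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in tf.items()
        }
    return weights


def top_keywords(weights, pkg, k=3):
    # Highest weight first; ties are broken alphabetically.
    ranked = sorted(weights[pkg].items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical descriptions, not the real ones from the Debian archive.
descriptions = {
    "mutt": "text based mail reader for the terminal",
    "pine": "mail reader and news reader for the terminal",
    "gimp": "image editor for photo retouching",
}
weights = tfidf(descriptions)
```

A word that shows up in every description (here "for") gets weight zero,
while words specific to one package get the highest weights, so
top_keywords(weights, "gimp") gives candidate keywords for that package. On
a real set of descriptions you would probably also want a stop-word list and
some stemming, which this sketch leaves out.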