Thanks for the link, Shalin. I played with that a while back. It may have some indirect uses here.
On Mon, Jan 26, 2009 at 10:46 AM, Hannes Carl Meyer <m...@hcmeyer.com> wrote:

> I didn't understand, is the corpus of documents you want to use to classify
> fix?

Assume the 'documents' are not stored in the same index and I want to store only the taxonomy or ontology in this index. Instead of indexing documents about 'sports' and searching for hits based upon 'basketball', 'football', etc., I simply want to index the taxonomy and classify documents into it.

This is an ancient AI/data-mining discipline, but the standard methods of 'indexing' the taxonomy are/were primitive compared to what one /could/ do with something like Lucene.

Here's a 2007 research paper that used Lucene directly for classification, but doing the inverse of what I described:
http://www.cs.ucl.ac.uk/staff/R.Hirsch/papers/gecco_HHS.pdf

>>>previously suggested procedure of 1) store document 2) execute
>>>more-like-this and 3) delete document would be too slow.

> Do you mean the document to classify?
> Why do you then want to put it into the index (very expensive), you just
> need the contents of it to build a query!

Exactly. In the December Taxonomy thread, Walter Underwood outlined a store/classify/delete procedure. That is too slow if you have no need to index the document itself.

- Neal
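
For concreteness, here is a minimal sketch of the idea above: index only the taxonomy, then use the incoming document's terms as the query, so the document itself is never stored or deleted. It is plain Python rather than Lucene, and everything in it is an assumption made for illustration: the toy `TAXONOMY` entries, the `tokenize`, `build_index` and `classify` helpers, and the simple tf * idf overlap score (which is not Lucene's scoring formula).

```python
import math
import re
from collections import Counter

# Hypothetical toy taxonomy: each category is described by a few terms.
TAXONOMY = {
    "sports/basketball": "basketball hoop dribble court nba rebound",
    "sports/football": "football touchdown quarterback nfl field goal",
    "science/astronomy": "telescope planet galaxy orbit star",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(taxonomy):
    """Build an inverted index: term -> {category: term frequency}."""
    index = {}
    for category, description in taxonomy.items():
        for term, tf in Counter(tokenize(description)).items():
            index.setdefault(term, {})[category] = tf
    return index


def classify(text, index, n_categories, top_n=3):
    """Score categories using the document's terms as the query.

    The document is only read, never added to the index.
    """
    scores = Counter()
    for term, query_tf in Counter(tokenize(text)).items():
        postings = index.get(term)
        if not postings:
            continue  # term does not occur anywhere in the taxonomy
        # Terms that appear in fewer categories count for more.
        idf = math.log(1 + n_categories / len(postings))
        for category, tf in postings.items():
            scores[category] += query_tf * tf * idf
    return scores.most_common(top_n)


index = build_index(TAXONOMY)
doc = "The quarterback threw a touchdown in the NFL game"
print(classify(doc, index, len(TAXONOMY)))
```

The index is built once from the taxonomy and each document costs one query, which is the point of skipping the store/classify/delete cycle.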