Don't we need vectors of the same size to calculate the cosine similarity? Maybe I missed something, but following that example, it looks like I have to rebuild the sparse vectors manually, because the term vector of a document should (I may be wrong) contain only the terms that appear in that document. Am I wrong?
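As a point of comparison, cosine similarity does not strictly require both vectors to be padded to the same length. A term missing from a document's vector has an implicit weight of 0. It therefore adds nothing to the dot product and nothing to that document's norm. The sketch below assumes each document's vector is already available as a plain dict mapping term to weight (for example, a tf-idf value), built only from the terms that appear in that document. The function name `cosine_similarity` is hypothetical and does not call any Solr or Lucene API.

```python
import math


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse term -> weight vectors.

    Terms absent from a dict are treated as weight 0, so the vectors
    never need to be padded to a shared vocabulary.
    """
    # Iterate over the smaller vector: only terms present in both
    # documents contribute to the dot product.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b[t] for t, w in a.items() if t in b)

    # Each norm only needs that document's own (non-zero) terms.
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: an empty vector has similarity 0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc1 = {"solr": 1.2, "cosine": 0.5}
    doc2 = {"solr": 0.8, "vector": 1.0}
    print(cosine_similarity(doc1, doc2))
```

This sketch gives the same result as padding both vectors to the full vocabulary with zeros. It simply skips the zero entries, so the cost depends on the number of terms in each document rather than on the size of the whole vocabulary.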
Given that assumption (and that example points in the same direction), we have to build the sparse vectors manually: first collect all the terms, then calculate the tf-idf weight of each term in each document. That's what I did, and I obtained vectors of the same dimension for every document. I was just wondering whether there is a better, more optimized way to obtain those sparse vectors.

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html