tf.idf was invented because cosine similarity takes too much computation. tf.idf gives similar results much, much faster than cosine similarity.
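
One way to see the difference is a toy sketch in Python. This is only an illustration under my own assumptions, not what Solr or Lucene actually does: the documents, the function names, and the idf formula (one common variant) are all made up for the example. The brute-force version computes a cosine against every record, while the tf.idf version walks an inverted index and only touches records that contain a query term.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus, purely for illustration.
docs = {
    "d1": "solr scoring with term vectors",
    "d2": "cosine similarity of term vectors",
    "d3": "fast retrieval with an inverted index",
}


def tokenize(text):
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def brute_force_cosine(query, docs):
    """Score every record: the work grows with the total number of records."""
    q = Counter(tokenize(query))
    return {doc_id: cosine(q, Counter(tokenize(text)))
            for doc_id, text in docs.items()}


def build_index(docs):
    """Inverted index built once at index time: term -> {doc_id: tf}."""
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            postings[term][doc_id] = tf
    return postings


def tfidf_scores(query, postings, n_docs):
    """Score only the records that appear in the query terms' postings lists."""
    scores = defaultdict(float)
    for term in set(tokenize(query)):
        plist = postings.get(term, {})
        if not plist:
            continue
        # Assumed idf variant; real engines use their own formulas.
        idf = math.log(1 + n_docs / len(plist))
        for doc_id, tf in plist.items():
            scores[doc_id] += tf * idf
    return dict(scores)
```

Calling brute_force_cosine("term vectors", docs) does work for all three records, including d3, which shares no terms with the query. Calling tfidf_scores("term vectors", build_index(docs), len(docs)) never visits d3 at all. With 1 million records and a selective query, that difference is most of the cost.
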
I would expect cosine similarity to be slow. I would also expect retrieving 1 million records to be slow. Doing both of those in one minute is pretty good.

As Kernighan and Plauger said in 1978, “Don’t diddle code to make it faster—find a better algorithm.”

https://en.wikipedia.org/wiki/The_Elements_of_Programming_Style

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Aug 11, 2019, at 10:40 AM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:
>
> Hi Vignan,
>
> We need to see more details / code of what your query parser plugin does
> exactly with term vectors, we can't really help you without more details.
> Is it open source? Can you share a minimal example that recreates the
> problem?
>
> On Sun, Aug 11, 2019 at 1:19 PM Vignan Malyala <dsmsvig...@gmail.com> wrote:
>
>> Hi guys,
>>
>> I made my custom qparser plugin in Solr for scoring. The plugin only does
>> cosine similarity of vectors for each record. I use term vectors here.
>> Results are fine!
>>
>> BUT, Solr response is very slow with term vectors. It takes around 55
>> seconds for each request for 1000000 records.
>> How do I make it faster to get my results in ms ?
>> Please respond soon as its lil urgent.
>>
>> Note: All my values are stored and indexed. I am not using Solr Cloud.
>>
>
>
> --
> *Doug Turnbull **| CTO* | OpenSource Connections
> <http://opensourceconnections.com>, LLC | 240.476.9983
> Author: Relevant Search <http://manning.com/turnbull>
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless
> of whether attachments are marked as such.