jmazanec15 commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1668914911
> Second, great you remembered it, but I think there's no difference between cosine and L2 (i.e., search results are the same) if queries and documents have constant norms. They don't have to be normalized to the unit norm, I think any constant would suffice:
>
> L2=(a-b)^2 = |a|^2 - 2ab + |b|^2 = const1 - cosine_similarity(a,b) * const2
>
> What do you think? cc @jmazanec15 @searchivarius

Yes, I believe you are correct. Attached is a small proof that the ordering is the same between Euclidean distance and negative dot product when the norm of all vectors is the same (please double-check that I did not make any mistakes).

<details>
<summary> Proof </summary>

Prove: Given a set of points $S \subset \mathbb{R}^d$ where $\exists x \in \mathbb{R} \; \forall v \in S : \|v\|^2 = x$. Then, $\forall q \in \mathbb{R}^d$, the ordering produced by $\|q - v_1\|^2 \le \|q - v_2\|^2 \iff v_1 \le v_2$ is the same as the ordering produced by $-\langle q, v_1 \rangle \le -\langle q, v_2 \rangle \iff v_1 \le v_2$.

Starting with:

$$\|q - v_1\|^2 \le \|q - v_2\|^2$$

By the parallelogram law:

$$\Rightarrow 2(\|q\|^2 + \|v_1\|^2) - \|q + v_1\|^2 \le 2(\|q\|^2 + \|v_2\|^2) - \|q + v_2\|^2$$

Removing equal terms (using $\|v_1\|^2 = \|v_2\|^2 = x$):

$$\Rightarrow -\|q + v_1\|^2 \le -\|q + v_2\|^2$$

Flipping sign:

$$\Rightarrow \|q + v_1\|^2 \ge \|q + v_2\|^2$$

Definition of norm:

$$\Rightarrow \langle q + v_1, q + v_1 \rangle \ge \langle q + v_2, q + v_2 \rangle$$

Expanding:

$$\Rightarrow \langle q, q \rangle + \langle v_1, v_1 \rangle + 2\langle q, v_1 \rangle \ge \langle q, q \rangle + \langle v_2, v_2 \rangle + 2\langle q, v_2 \rangle$$

Removing equal terms (again using $\langle v_1, v_1 \rangle = \langle v_2, v_2 \rangle = x$):

$$\Rightarrow 2\langle q, v_1 \rangle \ge 2\langle q, v_2 \rangle$$

Dividing and flipping sign:

$$\Rightarrow -\langle q, v_1 \rangle \le -\langle q, v_2 \rangle$$

</details>
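The ordering claim in the proof can also be sanity-checked numerically. The snippet below is only a minimal sketch, not anything from Lucene: the helper names (`normalize_to`, `squared_l2`, `neg_dot`, `rankings_match`) and the parameters are hypothetical, chosen for illustration. It rescales random document vectors to one shared norm, leaves the query norm arbitrary, and compares the ranking by squared Euclidean distance with the ranking by negative dot product.

```python
import math
import random


def normalize_to(v, norm):
    """Rescale v so that its Euclidean norm equals `norm`."""
    length = math.sqrt(sum(x * x for x in v))
    return [x * norm / length for x in v]


def squared_l2(q, v):
    """Squared Euclidean distance ||q - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(q, v))


def neg_dot(q, v):
    """Negative dot product -<q, v>."""
    return -sum(a * b for a, b in zip(q, v))


def rankings_match(num_vectors=50, dim=16, norm=3.0, seed=0):
    """Return True if both scores rank the documents identically.

    All documents share the same norm (an assumption of the proof);
    the query is left unnormalized on purpose.
    """
    rng = random.Random(seed)
    docs = [
        normalize_to([rng.gauss(0, 1) for _ in range(dim)], norm)
        for _ in range(num_vectors)
    ]
    q = [rng.gauss(0, 1) for _ in range(dim)]
    by_l2 = sorted(range(num_vectors), key=lambda i: squared_l2(q, docs[i]))
    by_dot = sorted(range(num_vectors), key=lambda i: neg_dot(q, docs[i]))
    return by_l2 == by_dot


if __name__ == "__main__":
    print(rankings_match())
```

The equal-norm assumption matters. With `q = [1, 0]`, `v1 = [1, 0]` and `v2 = [3, 0]`, squared L2 ranks `v1` first (0 versus 4), while negative dot product ranks `v2` first (-3 versus -1). Exact ties or near-ties between scores could also reorder results under floating-point rounding, which this sketch does not try to handle.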