You may be overthinking this. There is a “More Like This” feature that basically does this. Give that a try before digging deeper into the LTR methods. It may be good enough for rock and roll.
--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 28, 2018, 12:25 PM -0400, Xavier Schepler <xavier.schep...@recommerce.com>, wrote:
> Hello,
>
> I'm considering using Solr with learning to rank to build a product matcher.
> For example, it should match the titles:
> - Apple iPhone 6 16 Gb,
> - iPhone 6 16 Gb,
> - Smartphone IPhone 6 16 Gb,
> - iPhone 6 black 16 Gb,
> to the same internal reference, a unique identifier.
>
> With Solr, each document would then have a field for the product title and
> one for its class, which is the unique identifier of the product.
> Solr would then be used to perform matching as follows.
>
> 1. A search is performed with a given product title.
> 2. The first three results are considered (this requires an initial
> product title database).
> 3. The most frequent identifier is returned.
>
> This method corresponds roughly to a k-Nearest Neighbor approach with the
> cosine metric, k = 3, and a TF-IDF model.
>
> I've done some preliminary tests with scikit-learn and the results are
> good, but not as good as those of more sophisticated learning algorithms.
>
> Then, I noticed that learning to rank is available in Solr.
>
> First, do you think that such a use of Solr makes sense?
> Second, is there a relatively simple way to build a learning model using a
> sparse representation of the query TF-IDF vector?
>
> Kind regards,
>
> Xavier Schepler
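
For reference, the three-step matching described in the quoted message (search by title, take the top three hits, return the most frequent identifier) can be sketched in plain Python. This is only an illustration of the k-NN idea with TF-IDF weights and cosine similarity. It is not Solr code, and Solr's own relevance scoring will differ. The catalog entries, the identifiers such as `IPHONE6-16`, the whitespace tokenizer, the smoothed IDF formula and the function names are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical labeled titles: (product title, internal identifier).
CATALOG = [
    ("Apple iPhone 6 16 Gb", "IPHONE6-16"),
    ("iPhone 6 16 Gb", "IPHONE6-16"),
    ("Smartphone IPhone 6 16 Gb", "IPHONE6-16"),
    ("Samsung Galaxy S7 32 Gb", "GALAXYS7-32"),
    ("Galaxy S7 32 Gb black", "GALAXYS7-32"),
]


def tokenize(title):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return title.lower().split()


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0:
        return vec
    return {term: w / norm for term, w in vec.items()}


def build_index(labeled_titles):
    """Return (idf, vectors), where vectors is a list of (unit vector, label)."""
    docs = [(Counter(tokenize(title)), label) for title, label in labeled_titles]
    n_docs = len(docs)
    doc_freq = Counter()
    for counts, _ in docs:
        doc_freq.update(counts.keys())
    # Smoothed IDF; one common variant, chosen here as an assumption.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [
        (normalize({t: c * idf[t] for t, c in counts.items()}), label)
        for counts, label in docs
    ]
    return idf, vectors


def match(title, idf, vectors, k=3):
    """Return the most frequent label among the k most similar titles, or None."""
    counts = Counter(tokenize(title))
    # Terms unseen in the catalog carry no weight and are dropped.
    query = normalize({t: c * idf[t] for t, c in counts.items() if t in idf})
    scored = [
        (sum(w * vec.get(t, 0.0) for t, w in query.items()), label)
        for vec, label in vectors
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top_labels = [label for score, label in scored[:k] if score > 0]
    if not top_labels:
        return None
    # On a tie, the label of the higher-ranked hit wins.
    return Counter(top_labels).most_common(1)[0][0]


IDF, VECTORS = build_index(CATALOG)
```

Calling `match("iPhone 6 black 16 Gb", IDF, VECTORS)` votes among the three nearest catalog titles. A title that shares no terms with the catalog returns `None` rather than a forced match.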