I would suggest you look at the MLT (More Like This) query parser. It lets you find documents similar to a particular document, and also lets you specify the field to use for similarity purposes.
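As a rough sketch of what such a request could look like, the snippet below only builds the request parameters and does not contact Solr. The collection name `blogposts`, the fields `topics` and `published_date`, the document id, and the host/port are all assumptions made for illustration, not details from your setup. The `fq` date-math filter is one possible way to restrict candidates to a user-defined number of days.

```python
from urllib.parse import urlencode


def build_mlt_params(doc_id, field, days, mintf=1, mindf=1, rows=10):
    """Build request parameters for a Solr MLT query parser request.

    doc_id : uniqueKey of the document to find similar documents for
    field  : field used for similarity (the parser's `qf` parameter)
    days   : user-defined window, applied as a filter query
    """
    return {
        # {!mlt qf=<field> mintf=<n> mindf=<n>}<doc id>
        "q": f"{{!mlt qf={field} mintf={mintf} mindf={mindf}}}{doc_id}",
        # Hypothetical date field; restricts results to the last `days` days.
        "fq": f"published_date:[NOW-{days}DAYS TO NOW]",
        "rows": rows,
    }


# Hypothetical document id, field, and collection names.
params = build_mlt_params("blogpost-123", "topics", days=30)
url = "http://localhost:8983/solr/blogposts/select?" + urlencode(params)
print(url)
```

Changing `days` from 30 to 60 only changes the filter, so the same indexed documents can serve both windows. Note that this returns similar documents rather than a ranked list of co-occurring topics, so it may only be a starting point for the relatedness factor you describe.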
https://lucene.apache.org/solr/guide/7_0/other-parsers.html#more-like-this-query-parser

-Anshum

> On Oct 26, 2017, at 1:16 AM, Atita Arora <atitaar...@gmail.com> wrote:
>
> Hi,
>
> We're working with a product where the idea is to present users with the
> related documents in a particular time series.
>
> For an overview, think of this as an application which picks up the top
> trending blogpost "topics", which are picked and ingested from various
> social sites.
> Further, when you look into a topic from the trending list, it shows the
> related topics which happen to occur on the blogposts.
> So to be marked as related, topics should have occurred on the same
> blogpost; to add, the more such occurrences there are, the higher the
> relatedness factor.
>
> The complexity is that the related topics change with the user-defined date
> spread, which means that if x & y were the topmost related topics in the
> blogposts made in the last 30 days,
> there is an equal possibility that x could be more related to z if the user
> wanted to see related topics in the last 60 days.
> So the number of days is user-defined and it impacts the related topics.
>
> For now, every blogpost goes into the index as a separate document, and the
> topic extraction happens alongside indexing, which extracts the topics from
> the blogposts and stores them in a different collection.
> Because of this we have a lot of duplicates in the index too, e.g. a
> topicname search for "football" returns around 80K documents, all of them
> with topicname="football".
>
> I wonder if someone can help me with:
> 1. How to structure the documents in such a way that the queries could be
> more performant
> 2. How we can detect the RELATED topics.
>
> Any help on this would be highly appreciated.
>
> Thanks in advance.
>
> Atita