Hi, I think you may leverage and/or improve the MLT (MoreLikeThis) component [1].
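
As a rough illustration of what querying the MLT component could look like, here is a minimal sketch that only builds the request URL and does not contact a server. The `mlt`, `mlt.fl`, `mlt.mintf`, `mlt.mindf` and `mlt.count` parameters are the ones described on the wiki page [1]; the core URL, the `id:doc42` query, the `content` field name and the `build_mlt_url` helper are assumptions made up for the example. Note that MLT ranks documents by shared "interesting" terms, so it would likely serve as a way to retrieve candidate documents, not as a plagiarism detector in itself.

```python
from urllib.parse import urlencode


def build_mlt_url(solr_core_url, doc_query, fields, count=5):
    """Build a /select URL that asks Solr's MLT component for similar documents.

    solr_core_url, doc_query and fields are placeholders for this sketch;
    adapt them to your own core and schema.
    """
    params = {
        "q": doc_query,              # selects the source document(s)
        "mlt": "true",               # switch on the MoreLikeThis component
        "mlt.fl": ",".join(fields),  # fields used to compute similarity
        "mlt.mintf": 1,              # minimum term frequency in the source doc
        "mlt.mindf": 1,              # minimum document frequency in the index
        "mlt.count": count,          # similar documents returned per result
        "wt": "json",
    }
    return solr_core_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical core name and document id.
    print(build_mlt_url("http://localhost:8983/solr/collection1", "id:doc42", ["content"]))
```

The documents returned this way would still need a stricter comparison (for example, overlap of word n-grams) to confirm that a large part of one document really appears in another.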

HTH,
Tommaso

[1] : http://wiki.apache.org/solr/MoreLikeThis

2013/7/23 Furkan KAMACI <furkankam...@gmail.com>

> Hi;
>
> Sometimes a huge part of a document may exist in another document, as
> in student plagiarism or the quotation of one blog post in another.
> Do Solr/Lucene or their libraries (UIMA, OpenNLP, etc.) have any class to
> detect it?
>