"Content-based recommender", so it's not CF etc., and it's a project, so it's whatever his supervisor wants.
Take a look at SolrJ; it should be more natural to integrate with your Java code (although I'm not sure whether it supports the TermVectorComponent).

Good luck.

On 26 January 2012 17:27, Walter Underwood <wun...@wunderwood.org> wrote:
> Why are you using a search engine to build a recommender? None of the leading
> teams in the Netflix Prize used search engines as a base technology.
>
> Start with the recommender algorithms in Mahout: http://mahout.apache.org/
>
> wunder
>
> On Jan 26, 2012, at 9:18 AM, Nejla Karacan wrote:
>
>> Hey there,
>>
>> I'm using Solr for my thesis, where I have to implement a content-based
>> recommender system for movies.
>>
>> I have indexed about 20 thousand movies with their information:
>> movie-id
>> title
>> genre
>> plot/movie-description <- !!!
>> cast
>>
>> I've enabled the TermVectorComponent for the fields genre, description and
>> cast, so I can get the tf-idf values for the terms of every movie.
>>
>> With these term/tf-idf pairs I have to compute the similarities
>> between movies using cosine similarity.
>> I know about the Solr feature MLT (MoreLikeThis), but that's not the
>> solution; I have to implement the cosine similarity in Java myself.
>>
>> Now I have some problems/questions:
>> I get the responses in XML format, which I read with an XML reader in
>> Java that has to work its way through every child node to reach the right node.
>> Is there a better way to get these values from node attributes or node texts?
>> I have tried it with wt=csv, but for those requests I get
>> responses with only the movie IDs, nothing more.
>> With the XML response writer my request is, for example, this:
>> http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true
>> I get the right response with all terms and tf-idf values, in XML.
>>
>> And if I add the csv parameter
>> http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true&wt=csv
>> I get only this:
>> id
>> 1800180382
>>
>> Maybe my request is wrong?
>>
>> Another problem: when I get the terms and their tf-idf values, I store
>> them in a map, but the values come in no particular order. I want to store
>> only, e.g., the 10 most important terms, i.e. the 10 terms with the highest
>> tf-idf values. Can I get them sorted in descending order?
>> I haven't found anything for that. If it's not possible, I must sort them
>> later in the map.
>>
>> My last question:
>> every movie has a genre, often more than one.
>> It's like the "cat" field (category) in the exampledocs with ipod/monitor
>> etc., and it's an important point for the movies.
>> How can I integrate this factor?
>> I changed the boost attribute in the Solr XML schema like this:
>> <field name="genre" type="string" indexed="true" stored="true"
>> multiValued="true" omitNorms="false" boost="3" termVectors="true"
>> termPositions="true" termOffsets="true"/>
>> Is that enough, or is there any other possibility?
>>
>> As you can probably see, I am a beginner in Solr;
>> a few weeks ago, at the beginning, it was even more difficult for me, but now
>> it is going better.
>> I would be very grateful for any help, ideas, tips or suggestions!
>>
>> Many regards
>> Nejla
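
On the two computational questions quoted above (keeping only the 10 highest-weighted terms, and computing cosine similarity between two movies), here is a minimal sketch in plain Java. It assumes the term weights have already been read out of the Solr response into a Map<String, Double> per movie; how they get there is not shown. The class and method names (TfIdfSimilarity, topTerms, cosine) and the sample terms are made up for illustration and are not part of Solr or SolrJ. Map.of needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; nothing here comes from Solr or SolrJ.
final class TfIdfSimilarity {

    /** Returns the n highest-weighted terms, ordered by descending weight. */
    static Map<String, Double> topTerms(Map<String, Double> weights, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(weights.entrySet());
        // Sort by weight, largest first.
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        // LinkedHashMap preserves the sorted insertion order.
        Map<String, Double> top = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : entries.subList(0, Math.min(n, entries.size()))) {
            top.put(e.getKey(), e.getValue());
        }
        return top;
    }

    /** Cosine similarity of two sparse term vectors; 0.0 if either is all zeros. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        // Only terms present in both vectors contribute to the dot product.
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    /** Euclidean length of a sparse vector. */
    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Made-up weights standing in for values read from the term vector response.
        Map<String, Double> movieA = Map.of("space", 0.8, "alien", 0.6, "ship", 0.1);
        Map<String, Double> movieB = Map.of("space", 0.5, "robot", 0.7);
        System.out.println(topTerms(movieA, 2));
        System.out.println(cosine(movieA, movieB));
    }
}
```

Trimming each movie to its top terms before calling cosine keeps the maps small, at the cost of ignoring low-weight terms. One possible way to make genre count more, in this sketch only, would be to multiply the genre term weights by a constant before computing the similarity; whether that is appropriate depends on the thesis requirements.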