Re: Solr and TF-IDF

Lee Carroll Thu, 26 Jan 2012 12:01:39 -0800

It's a "content-based recommender", so it's not CF (collaborative filtering) etc.,
and it's a thesis project, so it's whatever the supervisor wants.


Take a look at SolrJ; it should be a more natural way to integrate with your Java code.

(Although I'm not sure whether it supports the TermVectorComponent.)

Good luck.



On 26 January 2012 17:27, Walter Underwood <wun...@wunderwood.org> wrote:
> Why are you using a search engine to build a recommender? None of the leading
> teams in the Netflix Prize used search engines as a base technology.
>
> Start with the recommender algorithms in Mahout: http://mahout.apache.org/
>
> wunder
>
> On Jan 26, 2012, at 9:18 AM, Nejla Karacan wrote:
>
>> Hey there,
>>
>> I'm using Solr for my thesis, where I have to implement a content-based
>> recommender system for movies.
>>
>> I have indexed about 20,000 movies with the following information:
>> movie-id
>> title
>> genre
>> plot/movie-description <- !!!
>> cast
>>
>> I've enabled the TermVectorComponent for the genre, description and cast
>> fields, so I can get the tf-idf values for the terms of every movie.
>>
>> With these term/tf-idf pairs I have to compute the similarities
>> between movies using cosine similarity.
>> I know about the Solr MoreLikeThis (MLT) feature, but that's not the
>> solution; I have to implement the cosine similarity in Java myself.
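For reference, the cosine similarity of two tf-idf vectors a and b is (a · b) / (|a| |b|), and only terms that occur in both movies contribute to the dot product. A minimal Java sketch, assuming each movie's terms are already held in a `Map<String, Double>` of term to tf-idf value (the class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

/** Cosine similarity between two sparse tf-idf vectors stored as term -> weight maps. */
public class CosineSimilarity {

    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        // Iterate over the smaller map; only terms present in both contribute to the dot product.
        Map<String, Double> small = a.size() <= b.size() ? a : b;
        Map<String, Double> large = (small == a) ? b : a;
        double dot = 0.0;
        for (Map.Entry<String, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // treat an empty vector as dissimilar to everything
        }
        return dot / (normA * normB);
    }

    /** Euclidean length of a sparse vector. */
    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Double> movie1 = new HashMap<>();
        movie1.put("space", 2.0);
        movie1.put("alien", 1.0);
        Map<String, Double> movie2 = new HashMap<>();
        movie2.put("space", 1.0);
        movie2.put("robot", 1.0);
        System.out.println(cosine(movie1, movie2));
    }
}
```

With the sample maps in `main`, the shared term "space" gives a dot product of 2, so it prints roughly 0.632 (that is, 2 / sqrt(10)).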
>>
>> Now I have some problems/questions:
>> I get the responses in XML format, which I read with an XML reader in Java,
>> walking through every child node to reach the right one.
>> Is there a better way to get these values out of the node attributes or node text?
>> I have tried wt=csv, but for these requests I get
>> responses containing only the movie IDs, nothing more.
>> With the XML response writer, my request is for example:
>> http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true
>> I get the right response with all terms and tf-idf values, in XML.
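One alternative to walking the child nodes by hand is the JDK's built-in XPath API (`javax.xml.xpath`). The sketch below assumes the term vectors are nested as `lst[@name='termVectors']`, then one `lst` per document containing a `uniqueKey` entry, then one `lst` per field, then one `lst` per term holding a `<double name="tf-idf">`. That layout is an assumption and should be checked against the actual tvrh output; the sample XML, its values, and the class name are hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TermVectorXPath {

    // Hypothetical sample shaped like the assumed tvrh layout; the values are made up.
    public static final String SAMPLE_XML = "<response>"
            + "<lst name=\"termVectors\">"
            + "<lst name=\"doc-0\">"
            + "<str name=\"uniqueKey\">1800180382</str>"
            + "<lst name=\"description\">"
            + "<lst name=\"alien\"><double name=\"tf-idf\">0.25</double></lst>"
            + "<lst name=\"space\"><double name=\"tf-idf\">0.5</double></lst>"
            + "</lst></lst></lst></response>";

    /** Returns term -> tf-idf for one field of the document with the given unique key. */
    public static Map<String, Double> tfIdf(String xml, String docId, String field) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Select the per-term <lst> elements under the requested document and field.
        // docId and field are inserted unescaped, which is fine for plain IDs and field names.
        String expr = "/response/lst[@name='termVectors']"
                + "/lst[str[@name='uniqueKey']='" + docId + "']"
                + "/lst[@name='" + field + "']/lst";
        NodeList terms = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        Map<String, Double> result = new HashMap<>();
        for (int i = 0; i < terms.getLength(); i++) {
            Element term = (Element) terms.item(i);
            // The term is the "name" attribute; its tf-idf is the text of a <double> child.
            String value = xpath.evaluate("double[@name='tf-idf']", term);
            if (!value.isEmpty()) {
                result.put(term.getAttribute("name"), Double.parseDouble(value));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(tfIdf(SAMPLE_XML, "1800180382", "description"));
    }
}
```

The XPath expressions stay the same whether the XML comes from a string or from the HTTP response stream; only the `InputSource` changes.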
>>
>> And if I add wt=csv:
>> http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true&wt=csv
>> I get only this:
>> id
>> 1800180382
>>
>> Maybe my request is wrong?
>>
>> Another problem: once I get the terms and their tf-idf values, I store
>> them in a map, but the values aren't in any particular order.
>> I want, e.g., to store only the 10 top terms,
>> i.e. the 10 terms with the highest tf-idf values. Can I get them sorted in
>> descending order?
>> I haven't found anything for this. If it's not possible, I'll have to sort
>> them in the map afterwards.
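Sorting on the Java side is straightforward in any case. A minimal sketch, assuming the terms are already in a `Map<String, Double>` of term to tf-idf value (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopTerms {

    /** Returns the n entries with the highest tf-idf values, in descending order. */
    public static LinkedHashMap<String, Double> topN(Map<String, Double> tfIdf, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(tfIdf.entrySet());
        // Sort by value, highest first.
        entries.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()));
        // LinkedHashMap keeps insertion order, so iteration follows the sorted order.
        LinkedHashMap<String, Double> top = new LinkedHashMap<>();
        int limit = Math.max(0, Math.min(n, entries.size()));
        for (Map.Entry<String, Double> e : entries.subList(0, limit)) {
            top.put(e.getKey(), e.getValue());
        }
        return top;
    }

    public static void main(String[] args) {
        Map<String, Double> tfIdf = new HashMap<>();
        tfIdf.put("space", 0.9);
        tfIdf.put("alien", 0.5);
        tfIdf.put("the", 0.01);
        tfIdf.put("ship", 0.3);
        System.out.println(topN(tfIdf, 2)); // prints {space=0.9, alien=0.5}
    }
}
```

Sorting all entries costs O(m log m) for m terms, which is cheap for a single movie description; a bounded `PriorityQueue` would avoid the full sort if that ever mattered.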
>>
>> My last question:
>> every movie has a genre, often more than one.
>> It's like the "cat" (category) field in the exampledocs with ipod/monitor
>> etc., and it's an important aspect of the movies.
>> How can I integrate this factor?
>> I set the boost attribute in the Solr XML schema like this:
>> <field name="genre" type="string" indexed="true" stored="true"
>> multiValued="true" omitNorms="false" boost="3" termVectors="true"
>> termPositions="true" termOffsets="true"/>
>> Is that enough, or is there any other possibility?
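Since the similarity is computed in custom Java code anyway, one possible alternative to an index-time boost (an illustrative approach, not a Solr feature) is to blend a genre-overlap score into the final similarity. The sketch uses Jaccard overlap of the genre sets and a hypothetical `genreWeight` parameter; the text similarity is assumed to be computed separately, e.g. as the cosine over the description tf-idf values.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class GenreWeightedSimilarity {

    /** Jaccard overlap: size of intersection / size of union; 0 when both sets are empty. */
    public static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    /** Weighted blend of genre overlap and text similarity; genreWeight is a hypothetical knob in [0, 1]. */
    public static double combined(double textCosine, Set<String> genresA, Set<String> genresB,
                                  double genreWeight) {
        return genreWeight * jaccard(genresA, genresB) + (1.0 - genreWeight) * textCosine;
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("Sci-Fi", "Action"));
        Set<String> b = new HashSet<>(Arrays.asList("Sci-Fi", "Drama"));
        System.out.println(combined(0.6, a, b, 0.5));
    }
}
```

With `genreWeight = 0.5`, two movies sharing one of three distinct genres (Jaccard 1/3) and a text cosine of 0.6 score about 0.467. The weight would need tuning against whatever evaluation the thesis uses.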
>>
>> As you can probably tell, I'm a beginner with Solr;
>> when I started a few weeks ago it was even more difficult for me, but now
>> it's going better.
>> I would be very grateful for any help, ideas, tips or suggestions!
>>
>> Many regards
>> Nejla
>>
>
>
>
