I have an interesting situation of searching Business Names where results 
should be partially sorted by position.

Searching for "Kramer Tractors" will not return any matches, as there are no 
business names that exactly match this. However, there are business names that 
start with Kramer and there are also business names which contain the word 
Tractor. One important item to note is that we don't want the document 
frequency to influence the score.

Ideally we'd like the Kramer matches to appear before the Tractor matches. At 
the moment I'm simply boosting the terms, as in Kramer^4 Tractor^2. I've also 
started playing with the Term Vector Component (TVC), but suspect, from the 
documentation, that document frequency is causing my results to be ordered not 
to my liking.
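
To make the desired ordering concrete, here is a minimal client-side sketch in 
Python. It is not how Solr scores documents; it only illustrates the target 
order: names matching the first query term come first, then names matching the 
second, with ties broken by the position of the match. The function name 
rank_by_query_order, the prefix matching, the trailing-"s" stripping, and the 
third business name are all assumptions made for illustration.

```python
def rank_by_query_order(query, names):
    """Order names by the earliest query term they contain, then by position.

    Document frequency plays no part in the sort key.
    """
    # Assumed normalisation: lowercase and strip trailing "s" characters,
    # a crude stand-in for stemming ("Tractors" -> "tractor").
    terms = [t.lower().rstrip("s") for t in query.split()]
    ranked = []
    for name in names:
        tokens = name.lower().split()
        best = None
        for term_rank, term in enumerate(terms):
            for pos, tok in enumerate(tokens):
                if tok.startswith(term):
                    key = (term_rank, pos)
                    if best is None or key < best:
                        best = key
        if best is not None:
            ranked.append((best, name))
    # list.sort is stable, so names with equal keys keep their input order.
    ranked.sort(key=lambda pair: pair[0])
    return [name for _, name in ranked]


names = [
    "Korpan Tractor",       # from the response in this post
    "Kramer Ltd",           # from the response in this post
    "Big Tractor Salvage",  # hypothetical name, for illustration only
]
ordered = rank_by_query_order("Kramer Tractors", names)
print(ordered)
# ['Kramer Ltd', 'Korpan Tractor', 'Big Tractor Salvage']
```

The sort key here is (index of the query term, token position), which is one 
possible reading of "Kramer then Tractor"; other tie-breaking rules would be 
equally easy to plug in.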

If I read the following correctly, Korpan Tractor appears first due to Tractors 
having df=35.

<lst name="103503">
  <str name="uniqueKey">103503</str>
  <lst name="BUS_BUSINESS_NAME">
    <lst name="korpan">
      <int name="tf">1</int>
      <lst name="positions">
        <int name="position">0</int>
      </lst>
      <lst name="offsets">
        <int name="start">0</int>
        <int name="end">6</int>
      </lst>
      <int name="df">6</int>
      <double name="tf-idf">0.16666666666666666</double>
    </lst>
    <lst name="tractor">
      <int name="tf">1</int>
      <lst name="positions">
        <int name="position">1</int>
      </lst>
      <lst name="offsets">
        <int name="start">7</int>
        <int name="end">14</int>
      </lst>
      <int name="df">35</int>
      <double name="tf-idf">0.02857142857142857</double>
    </lst>
  </lst>
</lst>
<lst name="503457">
  <str name="uniqueKey">503457</str>
  <lst name="BUS_BUSINESS_NAME">
    <lst name="salvage">
      <int name="tf">1</int>
      <lst name="positions">
        <int name="position">3</int>
      </lst>
      <lst name="offsets">
        <int name="start">12</int>
        <int name="end">19</int>
      </lst>
      <int name="df">61</int>
      <double name="tf-idf">0.01639344262295082</double>
    </lst>
    <lst name="tractor">
      <int name="tf">1</int>
      <lst name="positions">
        <int name="position">2</int>
      </lst>
      <lst name="offsets">
        <int name="start">4</int>
        <int name="end">11</int>
      </lst>
      <int name="df">35</int>
      <double name="tf-idf">0.02857142857142857</double>
    </lst>
  </lst>
</lst>
<lst name="903">
  <str name="uniqueKey">903</str>
  <lst name="BUS_BUSINESS_NAME">
    <lst name="kramer">
      <int name="tf">1</int>
      <lst name="positions">
        <int name="position">0</int>
      </lst>
      <lst name="offsets">
        <int name="start">0</int>
        <int name="end">6</int>
      </lst>
      <int name="df">72</int>
      <double name="tf-idf">0.013888888888888888</double>
    </lst>
    <lst name="ltd">
      <int name="tf">1</int>
      <lst name="positions">
        <int name="position">1</int>
      </lst>
      <lst name="offsets">
        <int name="start">7</int>
        <int name="end">10</int>
      </lst>
      <int name="df">9798</int>
      <double name="tf-idf">1.0206164523372117E-4</double>
    </lst>
  </lst>
</lst>
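
The tf-idf figures in the response above look consistent with tf divided by df 
(for example 1/6 for korpan and 1/72 for kramer). The Python sketch below 
checks that arithmetic on a trimmed copy of the response. The outer 
"termVectors" wrapper element and the function name tf_idf_matches_ratio are 
assumptions added so the fragment parses on its own, and the sketch says 
nothing about whether this value is what actually drives the result ordering.

```python
import math
import xml.etree.ElementTree as ET

# Trimmed copy of two documents from the response: positions and offsets are
# dropped, and an assumed wrapper element is added to make it one XML tree.
TVC_XML = """
<lst name="termVectors">
  <lst name="103503">
    <lst name="BUS_BUSINESS_NAME">
      <lst name="korpan">
        <int name="tf">1</int>
        <int name="df">6</int>
        <double name="tf-idf">0.16666666666666666</double>
      </lst>
      <lst name="tractor">
        <int name="tf">1</int>
        <int name="df">35</int>
        <double name="tf-idf">0.02857142857142857</double>
      </lst>
    </lst>
  </lst>
  <lst name="903">
    <lst name="BUS_BUSINESS_NAME">
      <lst name="kramer">
        <int name="tf">1</int>
        <int name="df">72</int>
        <double name="tf-idf">0.013888888888888888</double>
      </lst>
      <lst name="ltd">
        <int name="tf">1</int>
        <int name="df">9798</int>
        <double name="tf-idf">1.0206164523372117E-4</double>
      </lst>
    </lst>
  </lst>
</lst>
"""


def tf_idf_matches_ratio(xml_text):
    """Map (doc id, term) to whether the reported tf-idf equals tf / df."""
    root = ET.fromstring(xml_text)
    result = {}
    for doc in root.findall("lst"):            # one <lst> per document
        for field in doc.findall("lst"):       # one <lst> per field
            for term in field.findall("lst"):  # one <lst> per term
                values = {child.get("name"): child.text for child in term}
                tf = int(values["tf"])
                df = int(values["df"])
                reported = float(values["tf-idf"])
                key = (doc.get("name"), term.get("name"))
                result[key] = math.isclose(tf / df, reported, rel_tol=1e-9)
    return result


checks = tf_idf_matches_ratio(TVC_XML)
for key, ok in checks.items():
    print(key, ok)
```

If every entry comes back True, the tf-idf value is simply tf/df, which 
would explain why a rarer term such as korpan (df=6) shows a larger figure 
than kramer (df=72).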

Am I going in the wrong direction with trying to use the Term Vector Component 
to accomplish Kramer then Tractor?

Thanks,

Corey
