Controlling the order of partial matches based on the position

2011-10-11 Thread aronitin
Hi All, 

I'm using SOLR/Lucene to index a few keywords in a "multivalued" field. The
data stored in the index has already been mined to remove noise and
duplicate occurrences, so it is very precise. All the text mining and
filtering steps are performed before indexing.

Whenever a user searches for a specific keyword, e.g. the query "user feedback",
and a partial match occurs, such as "user testing" in one document and
"search user" in another, I want to control the rank of the documents based
on the position where the match happened.

"user testing" should always appear before "search user".

I would also like to know whether it is possible to drop partial matches
that occur at any position other than the first.

Can somebody point out how we can achieve both of these things?
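A minimal Python sketch of the two behaviours asked for, computed outside Solr. It assumes each document's multivalued field is available as a list of plain strings and that matching is a lowercase whitespace split; the helper name rank_by_match_position and its first_only flag are hypothetical, not a Solr/Lucene API. Documents are ordered by the earliest token position at which any query term matches within a value, and with first_only=True matches at any other position are dropped. Inside Lucene, SpanFirstQuery (which restricts a span match to the first N positions of a field) is the closest built-in analogue, but a multivalued field is indexed as one position stream, so it would only cover the start of the first value.

```python
def rank_by_match_position(docs, query, first_only=False):
    """Order documents by the earliest position at which a query term matches.

    docs: dict mapping a document id to the values of its multivalued field.
    A match at token position 0 of any value outranks a match at position 1,
    and so on. With first_only=True, documents whose best match is not at
    position 0 are dropped entirely.
    """
    terms = set(query.lower().split())
    ranked = []
    for doc_id, values in docs.items():
        best = None  # earliest matching position seen across all values
        for value in values:
            for pos, token in enumerate(value.lower().split()):
                if token in terms:
                    if best is None or pos < best:
                        best = pos
                    break  # later tokens in this value can only be worse
        if best is None or (first_only and best != 0):
            continue
        ranked.append((best, doc_id))
    ranked.sort()  # lower position first; ties broken by document id
    return [doc_id for _, doc_id in ranked]


docs = {"D1": ["search user"], "D2": ["user testing"]}
print(rank_by_match_position(docs, "user feedback"))                   # ['D2', 'D1']
print(rank_by_match_position(docs, "user feedback", first_only=True))  # ['D2']
```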

Thanks 
Nitin

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Controlling-the-order-of-partial-matches-based-on-the-position-tp3413867p3413867.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Controlling the order of partial matches based on the position

2011-10-17 Thread aronitin
Guys,

It's been almost a week but there are no replies to the question that I
posted. 

If it's a small problem that has already been answered somewhere, please
point me to that post. Otherwise, please suggest any pointers for handling
the requirements described in the question.

Nitin

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Controlling-the-order-of-partial-matches-based-on-the-position-tp3413867p3429823.html


Incorrect Search Results showing up

2011-10-25 Thread aronitin
Hi Group,

I've defined a type "text" in the SOLR schema as shown below.

[schema XML not preserved in the archive]

A multivalued field is defined that uses the type above.


I index content such as:
- Google REST API
- Facebook REST API
- Software Architecture
- Design Documents
- Xml Web Services
- Web API design

When I issue a search query such as content:"rest api"~4, the matches I
get are:
- Google REST API (which is fine)
- Facebook REST API (which is fine)
- *Web API design* (which is not fine, because the query is a sloppy phrase
query, so "rest" and "api" should be within 4 words of each other)

Does anybody consider the 3rd result correct? If so, what is the
explanation for that result, given the schema defined?

In my view, the 3rd result should not be returned. If somebody can point
out anything wrong in my schema, it would be a great help.
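One cause worth ruling out, offered as a hypothesis rather than a diagnosis: when several values live in the same multivalued field of one document, Lucene numbers their token positions as a single stream, separated only by the fieldType's positionIncrementGap. If that gap is small, a sloppy phrase such as "rest api"~4 can match with "rest" at the end of one value and "api" in the next. The Python sketch below models this for a two-term phrase. The field values are hypothetical, the tokenizer is a plain lowercase whitespace split, and the distance test follows Lucene's sloppy-phrase rule for two terms (so reversed adjacent terms need a slop of 2).

```python
def token_positions(values, position_increment_gap):
    """Assign positions to the tokens of a multivalued field, the way Lucene
    concatenates values: each new value starts `position_increment_gap`
    positions after the position that would otherwise come next."""
    positions = {}
    next_pos = 0
    for i, value in enumerate(values):
        if i > 0:
            next_pos += position_increment_gap
        for token in value.lower().split():
            positions.setdefault(token, []).append(next_pos)
            next_pos += 1
    return positions


def two_term_sloppy_match(positions, first, second, slop):
    """True if the phrase "first second"~slop matches: the two terms must be
    within `slop` moves of being adjacent and in order."""
    return any(
        abs((p2 - 1) - p1) <= slop
        for p1 in positions.get(first, [])
        for p2 in positions.get(second, [])
    )


values = ["Mobile REST", "Web API design"]  # hypothetical values of one document
print(two_term_sloppy_match(token_positions(values, 0), "rest", "api", 4))    # True
print(two_term_sloppy_match(token_positions(values, 100), "rest", "api", 4))  # False
```

If this turns out to be the cause, a positionIncrementGap larger than the slop on the "text" fieldType (followed by reindexing) keeps phrase matches inside a single value.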

Thanks
Nitin



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Incorrect-Search-Results-showing-up-tp3452810p3452810.html


Sorting results within the fields

2012-01-13 Thread aronitin
I need to sort search results based on which fields matched the query and
on a score, generated by application logic, that is associated with each
term in the field.

For example, suppose 3 fields are queried and the final query, after
applying the boosts, is
field1^8 OR field2^4 OR field3^2 

and 2 documents (D1, D2) match field1, 2 documents (D3, D4) match field2,
and 3 documents (D5, D6, D7) match field3.

The 2 search results matching field1 need to be sorted by the score
generated by my application, then the next 2 results matching field2 need
to be sorted by the application score, and the 3 results matching field3
need to be sorted by the application score, so that the final results of
the query look like

(D1, D2) (D3, D4) (D5, D6, D7).

The fields being matched are dynamic fields.
The score generated by the application is a per-term score for each term
added to the document's dynamic field.

Can somebody suggest how this can be achieved using SOLR and Lucene?
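The ordering described above amounts to a two-level sort: first by the boost of the field a document matched (field1^8 before field2^4 before field3^2), then by the application-generated score within each group. A minimal Python sketch of that ordering, computed outside Solr, assuming each document has been reduced to the single field it matched plus its application score; the Match tuple, tiered_sort, and the scores in the example are hypothetical.

```python
from typing import NamedTuple


class Match(NamedTuple):
    doc_id: str
    field: str        # the field this document matched
    app_score: float  # score produced by the application for the matched term


FIELD_BOOSTS = {"field1": 8, "field2": 4, "field3": 2}  # from the example query


def tiered_sort(matches):
    """Higher field boost first; within the same field, higher app score first."""
    return sorted(matches, key=lambda m: (-FIELD_BOOSTS[m.field], -m.app_score))


matches = [  # hypothetical application scores
    Match("D5", "field3", 0.9), Match("D1", "field1", 0.7),
    Match("D3", "field2", 0.4), Match("D2", "field1", 0.2),
    Match("D4", "field2", 0.3), Match("D6", "field3", 0.5),
    Match("D7", "field3", 0.1),
]
print([m.doc_id for m in tiered_sort(matches)])
# ['D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7']
```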

Thanks
Nitin




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-results-within-the-fields-tp3656049p3656049.html


Re: Sorting results within the fields

2012-01-17 Thread aronitin
It's been almost a week and there has been no response to the question
that I asked.

Does the question lack details, or is there no way to achieve this in
Lucene?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-results-within-the-fields-tp3656049p3666983.html


Re: Sorting results within the fields

2012-01-17 Thread aronitin
Hi Jan,

Thanks for the reply. 

Here is the concrete explanation of the problem that I'm trying to solve.

*SOLR Schema*

Here is the definition of the SOLR schema

*There are 3 dynamic fields*

[field definitions not preserved in the archive]

*There are 4 searchable fields*


*Description*: Data in this field is Whitespace Tokenized, Stemmed,
Lowercased

 
*Description*: Data in this field is only lowercased, and the Keyword
Tokenizer is applied, so the data is otherwise unchanged when stored in
this field.


*Description*: Head terms are encoded in the format HEAD$Value


*Description*: Tail terms are encoded in the format TAIL$Value

The data we store in these fields is cleaned-up data extracted from large
text: generally values of 1, 2, or 3 words. For example:

D1 -> UI, UI Design, UI Programming, UI Design Document
D2 -> UI Mockup, UI development
D3 -> UI

When somebody queries *UI*, the internal query that is generated is
concepts_headtermencoded_concept:HEAD$ui^100.0 concepts:ui^50.0
concepts_tailtermencoded_concept:TAIL$ui^10.0

This way, a document that matches on the head term is ranked higher than a
partial match.
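The index-side encoding and the query expansion described above can be sketched as two small Python helpers. This assumes, as the examples suggest, that the head term is the first word of a concept and the tail term is the last word, that values are lowercased, and that the user query is a single word. The function names are hypothetical; the field names, the HEAD$/TAIL$ prefixes, and the boosts are the ones shown above.

```python
def encode_concept(concept):
    """Index side: the HEAD$ / TAIL$ values stored for one concept."""
    words = concept.lower().split()
    return {
        "concepts_headtermencoded_concept": "HEAD$" + words[0],
        "concepts_tailtermencoded_concept": "TAIL$" + words[-1],
    }


def build_query(term, head_boost=100.0, exact_boost=50.0, tail_boost=10.0):
    """Query side: expand a single-word user query into the boosted clauses."""
    term = term.lower()
    return (
        f"concepts_headtermencoded_concept:HEAD${term}^{head_boost} "
        f"concepts:{term}^{exact_boost} "
        f"concepts_tailtermencoded_concept:TAIL${term}^{tail_boost}"
    )


print(encode_concept("UI Design")["concepts_tailtermencoded_concept"])  # TAIL$design
print(build_query("UI"))
# concepts_headtermencoded_concept:HEAD$ui^100.0 concepts:ui^50.0 concepts_tailtermencoded_concept:TAIL$ui^10.0
```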

The current implementation, without the application score, ranks the
documents as D1 > D2 > D3 (because Lucene uses TF and IDF when scoring
documents).

Now we have created an *application-specific score* for each concept and
want to sort the results by that score while preserving the field boosts
defined in the query. For example:
D1 ->  UI=90, UI Design = 45, UI Programming = 40, UI Design Document = 85,
Project Wolverine=40
D2 -> UI Mockup=55, UI Development=74, Project Management=39
D3 -> UI=95, Project Wolverine=35
D4 -> UI Dev = 75, Video Project=42

1. If only exact matches are found, the results should be sorted by the
score value we have defined for the matched term.
2. If both exact and partial matches are found, the exactly matched
documents should come first, followed by the partially matched documents,
with each group sorted by score.

*Examples*
*Search*: UI
*Desired Results*: D3 > D1 > D4 > D2, where D3 and D1 contain an exact match
and are therefore sorted by score between themselves, and D4 and D2 both
have a head match, but the head-match score in D4 (75) is higher than in D2 (74).

*Search*: Project
*Desired Results*: D1 > D2 > D3 > D4, where D1, D2 and D3 are head-term
matches sorted among themselves by score, and D4 is a tail-term match
(even though D4 has a better score, the tail-term boost is 1/10th of the
head-term boost).

So, in short, we want to override Lucene's TF-IDF scoring and score
documents by our concept-specific score, while still giving the highest
preference to exact matches and then to partial matches.

I hope I have explained the problem. Let me know if you have any specific questions.

Thanks
Nitin

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-results-within-the-fields-tp3656049p3668047.html


Re: Highlighting more than 1 term

2012-01-18 Thread aronitin
Hi Tim,

Can you share the "text_en" type definition? Check whether you have a
stemmer configured in the type definition.

If not, that might be the reason "scheduled" does not match
"scheduling".
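To illustrate why a stemmer matters here: stemming reduces inflected forms to a common stem at both index and query time, so "scheduled" and "scheduling" end up as the same term. The Python snippet below is a deliberately tiny suffix stripper written only for this illustration; the name toy_stem is hypothetical, and it is not Porter or any filter that Solr ships, which handle far more cases.

```python
def toy_stem(word):
    """Strip a few English suffixes; a toy, not a real stemming algorithm."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # keep at least 3 characters so short words are left alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


print(toy_stem("scheduled"), toy_stem("scheduling"))  # schedul schedul
```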

Thanks
Nitin



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-more-than-1-term-tp3670862p3671004.html


Re: How to boost the relevancy of a field

2012-01-18 Thread aronitin
Hi Dean,

You can use query-time boosting, where you specify the boost value in the
query itself, e.g. title:solr^2 OR body:solr
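If the query is assembled in code, the boosts can be kept in one place. A minimal Python sketch, assuming a plain Lucene-syntax q parameter; the helper name boosted_query, the endpoint http://localhost:8983/solr/select, and the field weights are illustrative, not from the thread.

```python
from urllib.parse import urlencode


def boosted_query(term, field_boosts):
    """Build e.g. 'title:solr^2 OR body:solr' from {'title': 2, 'body': 1}."""
    clauses = []
    for field, boost in field_boosts.items():
        clause = f"{field}:{term}"
        if boost != 1:  # a boost of 1 is the default, so leave it off
            clause += f"^{boost}"
        clauses.append(clause)
    return " OR ".join(clauses)


q = boosted_query("solr", {"title": 2, "body": 1})
print(q)  # title:solr^2 OR body:solr
# Hypothetical Solr endpoint; adjust host, port and core to your installation.
print("http://localhost:8983/solr/select?" + urlencode({"q": q}))
```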

Thanks
Nitin

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-boost-the-relevancy-of-a-field-tp3671020p3671118.html