AnalyzingSuggester returning index value instead of field value?

2013-02-07 Thread Sebastian Saip
I'm looking into a way to implement an autosuggest and for my special needs
(I'm doing a "startsWith"-search that should retrieve the full name, which
may have accents - However, I want to search with/without accents and in
any upper/lowercase for comfort)

Here's part of my configuration: http://pastebin.com/20vSGJ1a

So I have a name="Têst Námè" and I query for "test", "tést", "TÈST", or
similar. This gives me back "test name" as a suggestion, which looks like
the indexed (analyzed) value rather than the actual field value.

Furthermore, when I fed the document without index-analyzers, then added
the index-analyzers, restarted without refeeding and queried, it returned
the right value (so this seems to retrieve the index, rather than the
actual stored value?)

Or maybe I just configured it the wrong way :?
There's not really much documentation about this yet :(

BR Sebastian Saip


Re: AnalyzingSuggester returning index value instead of field value?

2013-02-07 Thread Sebastian Saip
It's the same with whitespace removed unfortunately - still getting back
"testname" then.
I'm not quite sure how to test this via the Lucene API - in particular, how
to define the KeywordTokenizer with ASCII+LowerCase, so I can't test this
atm :/

BR Sebastian Saip


On 7 February 2013 16:19, Michael McCandless wrote:

> I'm not very familiar with how AnalyzingSuggester works inside Solr
> ... if you try this directly with the Lucene APIs does it still
> happen?
>
> Hmm maybe one idea: if you remove whitespace from your suggestion does
> it work?  I wonder if there's a whitespace / multi-token issue ... if
> so then maybe see how TestPhraseSuggestions.java (in Solr) does this?
>
> Mike McCandless
>
> http://blog.mikemccandless.com


Re: AnalyzingSuggester returning index value instead of field value?

2013-02-07 Thread Sebastian Saip
The solution, as pointed out on
http://stackoverflow.com/questions/14732713/solr-autosuggest-with-diacritics/14743278,
is not to use a copyField but instead to use the AnalyzingSuggester on the
StrField directly.
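
For anyone finding this later, the behaviour I was after (match on the
folded form, hand back the original value) looks roughly like this in plain
Java. It is only an illustration: the class and method names are made up,
java.text.Normalizer stands in for the ASCII-folding + lowercase filters,
and none of it is the actual AnalyzingSuggester code.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-alone illustration; this is not Solr or Lucene code.
class FoldingSuggestSketch {

    // Approximates ASCII folding plus lowercasing: decompose accented
    // characters, drop the combining marks, then lowercase.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // Compare on the folded form, but return the untouched original value.
    static List<String> suggest(List<String> originals, String prefix) {
        String foldedPrefix = fold(prefix);
        List<String> hits = new ArrayList<>();
        for (String original : originals) {
            if (fold(original).startsWith(foldedPrefix)) {
                hits.add(original);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "T\u00EAst N\u00E1m\u00E8" is the name from my example, written
        // with escapes so the source file encoding does not matter.
        List<String> names = Arrays.asList("T\u00EAst N\u00E1m\u00E8", "Other");
        System.out.println(suggest(names, "T\u00C8ST"));
    }
}
```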

Cheers!




Re: what do you use for testing relevance?

2013-02-12 Thread Sebastian Saip
What do you want to achieve with these tests?

Is it meant as a regression test, to make sure that only the queries/boosts
you changed are affected?
Then you will have to implement tests that cover your specific
schema/boosts. I'm not aware of any frameworks that do this - we're using
Java-based tests that retrieve documents from Solr, map them to our domain
model (objects representing a document) and do assertions on "debug values"
(e.g. score).

Or is it more about "what's more relevant for the user?" Then you will need
some kind of user tracking, as Markus described already.
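
If you go the click-logging route, the mean reciprocal rank Markus mentions
is cheap to compute yourself. A minimal sketch, assuming you log one 1-based
click position per query and use 0 for "no click" (the class and method names
are made up for this mail):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Solr: mean reciprocal rank from click logs.
class MrrSketch {

    // clickedRanks holds, per query, the 1-based position of the first
    // clicked result; 0 means the user clicked nothing and contributes 0.
    static double meanReciprocalRank(List<Integer> clickedRanks) {
        if (clickedRanks.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (int rank : clickedRanks) {
            if (rank > 0) {
                sum += 1.0 / rank;
            }
        }
        return sum / clickedRanks.size();
    }

    public static void main(String[] args) {
        // Three queries, clicked at positions 1, 2 and 4.
        System.out.println(meanReciprocalRank(Arrays.asList(1, 2, 4)));
    }
}
```

Comparing this number before and after a scoring change gives you a trend,
with the noise Markus describes still in it.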

BR


On 12 February 2013 23:16, Markus Jelsma  wrote:

> Roman,
>
> Logging clicks and their position in the result list is one useful method
> to measure the relevance. Using the position you can calculate the mean
> reciprocal rank, a value near 1.0 is very good so over time you can clearly
> see whether changes actually improve user experience/expectations. Keep in
> mind that there is some noise because users tend to click one or more of
> the first few results anyway.
>
> You may also be interested in A/B testing.
>
> http://en.wikipedia.org/wiki/Mean_reciprocal_rank
> http://en.wikipedia.org/wiki/A/B_testing
>
> Cheers
> Markus
>
>
> -Original message-
> > From:Roman Chyla 
> > Sent: Tue 12-Feb-2013 23:04
> > To: solr-user@lucene.apache.org
> > Subject: what do you use for testing relevance?
> >
> > Hi,
> > I do realize this is a very broad question, but still I need to ask it.
> > Suppose you make a change into the scoring formula. How do you
> > test/know/see what impact it had? Any framework out there?
> >
> > It seems like people are writing their own tools to measure relevancy.
> >
> > Thanks for any pointers,
> >
> >   roman
> >
>


Re: Send Input Through Json into solr

2013-02-13 Thread Sebastian Saip
I'm not sure I understood you.

You want to send a request like
http://localhost/solr/select?q=*:*&wt=json&start=0&fq=course_id:"18"
and get back only parts of the response for further processing?
Then the easiest way is to retrieve the whole JSON and post-process only
"responseHeader.params".

BR



On 13 February 2013 11:45, anurag.jain  wrote:

> hey,
>
> I want to send query input through json file do not want to give query
> parameter. so is there any way to send.
>
>
> Like if i give query parameter it give response and in response there is a
> key call as parameter. so if i send that parameter through json. it will
> easy for me.
>
> let say input parameter is
> http://localhost/solr/select?q=*:*&wt=json&start=0&fq=course_id:\"18\";
> it give me response.
>
>
> responseHeader":{
> "status":0,
> "QTime":2,
> "params":{
>   "indent":"on",
>   "start":"0",
>   "q":"*:*",
>   "wt":"json",
>   "fq":"course_id:\\\"18\\\""}
> },
>   "response":{"numFound":729,"start":0,"docs":[
>   {
>   ...
>   },
>   {
>   ...
>   }
>   ]
>
>
>
>
> i want to send this json file
>
> {
>   "indent":"on",
>   "start":"0",
>   "q":"*:*",
>   "wt":"json",
>   "fq":"course_id:\\\"18\\\""
> }
>
> is there any way to do this ? ?
>
> please reply. it will help me out with lots of problems
>
>
> Thanks ---
>
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Send-Input-Through-Json-into-solr-tp4040186.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Send Input Through Json into solr

2013-02-13 Thread Sebastian Saip
Ok, I see - you want to send a JSON Object which contains the query
parameters.

As far as I know, that's not possible out-of-the-box, so you'll have to
create a custom SearchHandler for that:
http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/handler/component/SearchHandler.html

In the end it would be easier to do the mapping from JSON to HTTP params in
your application (looping through the JSON fields and appending them to the
request URL).
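
Something along these lines, as a rough sketch. I'm assuming the JSON has
already been parsed into a Map with whatever JSON library you use; the class
and method names are invented for this mail:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side helper: turns parsed JSON fields into a request URL.
class ParamsToUrlSketch {

    // Appends every key/value pair as a URL-encoded query parameter.
    static String toUrl(String baseUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            url.append(separator)
               .append(encode(entry.getKey()))
               .append('=')
               .append(encode(entry.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the fields of the JSON file from your mail.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("wt", "json");
        params.put("start", "0");
        params.put("fq", "course_id:\"18\"");
        System.out.println(toUrl("http://localhost/solr/select", params));
    }
}
```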

BR


On 13 February 2013 12:09, anurag.jain  wrote:

> No, I want to post the parameters to solr in json format.
>
> you got me wrong. Actually it is little bit difficult to me to explain
> correctly :(
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Send-Input-Through-Json-into-solr-tp4040186p4040190.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


difference between q=field:"value1 value2" and q=field:value1 value2

2013-02-19 Thread Sebastian Saip
Hi there,

I'm implementing a didYouMean in Java, which will return collated terms.
Unfortunately, the only way (?) to retrieve those collated terms is by
"getCollationQueryString()", which will look something like this:
input for my query: atricle test
my final query: (field1:atricle test OR field2:atricle test OR ...)
getCollationQueryString: (field1:article test OR field2:article test OR ...
)

So far so good, I could just regex/match the "article test" out of it, but
my queries are not always the same (e.g. field1 or field2 are not searched
on every time), so this requires a bit of work.


Therefore (and to come to my actual problem), I thought I'll just add
double-quotes around my searchterm, making it easier to extract the
corrected terms from the getCollationQueryString():

input for my query: atricle test
my final query: (field1:"atricle test" OR field2:"atricle test" OR ...)
getCollationQueryString: (field1:"article test" OR field2:"article test" OR
... )
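
With the quotes in place, pulling the corrected phrase out is a small regex
job. A sketch of what I have in mind (class and method names are just for
this mail; it assumes field names consist of word characters and the phrases
contain no escaped quotes):

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the quoted phrases from a collation query string.
class CollationPhraseSketch {

    // Matches fieldname:"some phrase" and captures the phrase.
    private static final Pattern QUOTED = Pattern.compile("\\w+:\"([^\"]*)\"");

    // Returns the distinct phrases in order of first appearance.
    static Set<String> quotedPhrases(String collationQuery) {
        Set<String> phrases = new LinkedHashSet<>();
        Matcher matcher = QUOTED.matcher(collationQuery);
        while (matcher.find()) {
            phrases.add(matcher.group(1));
        }
        return phrases;
    }

    public static void main(String[] args) {
        String collation = "(field1:\"article test\" OR field2:\"article test\")";
        System.out.println(quotedPhrases(collation));
    }
}
```

Because the result is a set, it does not matter how many fields are searched
in a given query.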

---

Now the problem is that I'm getting different results for:
>> (field1:article test OR field2:article test OR ... )
and
>> (field1:"article test" OR field2:"article test" OR ... )

All fields are of "text_en" and I want to have partial matches (so the
terms don't have to be consecutive)

Is there any way (parameter or whatever) to get this behaviour although I'm
using double-quotes around it?
Or even easier/better .. Is there any way to get the collation term
("article test"), rather than the whole query?

Cheers!


Re: difference between q=field:"value1 value2" and q=field:value1 value2

2013-02-20 Thread Sebastian Saip
I see.. this is exactly as in Lucene then.

Thanks Erick!


On 20 February 2013 03:04, Erick Erickson  wrote:

> Of course you are 
> (field1:article test OR field2:article test OR ...)
>
> parses as
>
> field1:article defaultfield:test OR field2:article defaultfield:test
> probably with an implied SHOULD
>
> whereas
> (field1:"article test" OR field2:"article test" OR ... )
> is parsing as phrase queries. That is, test must appear immediately after
> article in either field1 or field2
>
> Best
> Erick


"synonym replacement" in AnalyzingSuggester?

2013-02-21 Thread Sebastian Saip
I'm using the new AnalyzingSuggester (my code is available at
http://pastebin.com/tN9yXHB0)
and I have the synonyms "whisky,whiskey" (they are bi-directional).

So whether the user searches for whiskey or whisky, I want to retrieve all
documents that have any of them.

However, for autosuggest, I would like to prefer (or rather: only show!)
"whisky".
E.g. I have the document "Whiskey Bottle",
but autosuggest for "whi" should return "Whisky Bottle".

The only way I can think of is replacing "Whiskey" with "Whisky" on feeding,
but that would also mean an additional field in Solr (since I do want to
keep "Whiskey" in the original field).
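
To make the feeding-time variant concrete, this is roughly what I would run
on the value before writing it to the extra suggest field. It is a sketch
with invented names and a hard-coded one-entry mapping, and it only handles
words separated by single spaces:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical feed-time helper: rewrites synonyms to the preferred spelling.
class PreferredSpellingSketch {

    // Lowercased variant -> preferred spelling (example entry only).
    private static final Map<String, String> PREFERRED = new HashMap<>();
    static {
        PREFERRED.put("whiskey", "whisky");
    }

    // Replaces each known variant, keeping a leading capital if there was one.
    static String canonicalize(String text) {
        StringBuilder out = new StringBuilder();
        for (String word : text.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            String replacement = PREFERRED.get(word.toLowerCase(Locale.ROOT));
            if (replacement == null) {
                out.append(word);
            } else if (Character.isUpperCase(word.charAt(0))) {
                out.append(Character.toUpperCase(replacement.charAt(0)))
                   .append(replacement.substring(1));
            } else {
                out.append(replacement);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("Whiskey Bottle"));
    }
}
```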

Is there any way to do some kind of "synonym replacement" on-the-fly for
these suggestions?
Has anyone ever done that or has an idea how to do that?

Cheers.
Sebastian


Re: Matching an exact word

2013-02-21 Thread Sebastian Saip
And keep in mind you do need quotes around your searchTerm if it consists
of multiple words - q=text_exact_field:"your_unquoted_query"
otherwise Solr will interpret "two words" as: "exact_field:two
defaultfield:words"

(Maybe not directly applicable for your problem Kristian, but I just want
to mention that there are a few StemFilters available, maybe another one
acts differently!)
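
Putting Sujit's client-side idea and the quoting caveat together, the
routing could look something like this. It is a sketch only: the field names
"text" and "text_exact" are taken from Kristian's mail, the class and method
names are invented, and escaping of special query characters is left out:

```java
// Hypothetical client-side helper: picks the field based on surrounding quotes.
class ExactFieldRoutingSketch {

    // Quoted input goes to the unstemmed field as a phrase; everything
    // else goes to the stemmed field, grouped so all words stay on it.
    static String buildQ(String userInput) {
        String trimmed = userInput.trim();
        boolean quoted = trimmed.length() >= 2
                && trimmed.startsWith("\"")
                && trimmed.endsWith("\"");
        if (quoted) {
            // Keep the quotes so multi-word input is not split across fields.
            return "text_exact:" + trimmed;
        }
        // Parentheses keep every word on "text" instead of the default field.
        return "text:(" + trimmed + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildQ("\"created\""));
        System.out.println(buildQ("two words"));
    }
}
```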


On 21 February 2013 21:52, SUJIT PAL  wrote:

> You could also do this outside Solr, in your client. If your query is
> surrounded by quotes, then strip away the quotes and make
> q=text_exact_field:your_unquoted_query. Probably better to do outside Solr
> in general keeping in mind the upgrade path.
>
> -sujit
>
> On Feb 21, 2013, at 12:20 PM, Van Tassell, Kristian wrote:
>
> > Thank you.
> >
> > So essentially I need to write a custom query parser (extending upon
> > something like the QParser)?
> >
> > -Original Message-
> > From: Upayavira [mailto:u...@odoko.co.uk]
> > Sent: Thursday, February 21, 2013 12:22 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Matching an exact word
> >
> > Solr will only match on the terms as they are in the index. If it is
> > stemmed in the index, it will match that. If it isn't, it'll match that.
> >
> > All term matches are (by default at least) exact matches. Only with
> > stemming you are doing an exact match against the stemmed term.
> > Therefore, there really is no way to do what you are looking for within
> > Solr. I'd suggest you'll need to do some parsing at your side and, if you
> > find quotes, do the query against a different field.
> >
> > Upayavira
> >
> > On Thu, Feb 21, 2013, at 06:17 PM, Van Tassell, Kristian wrote:
> >> I'm trying to match the word "created". Given that it is surrounded by
> >> quotes, I would expect an exact match to occur, but instead the entire
> >> stemming results show for words such as create, creates, created, etc.
> >>
> >> q="created"&wt=xml&rows=1000&qf=text&defType=edismax
> >>
> >> If I copy the text field to a new one that does not stem words,
> >> "text_exact" for example, I get the expected results:
> >>
> >> q="created"&wt=xml&rows=1000&qf=text_exact&defType=edismax
> >>
> >> I would like the decision whether to match exact or not to be
> >> determined by the quotes rather than the qf parameter (eg, not have to
> >> use it at all). What topic do I need to look into more to understand
> >> this? Thanks in advance!
> >>
>
>