Re: pf2 pf3 and stopwords

2015-12-17 Thread elisabeth benoit
Inversion (paris charonne vs. charonne paris) cannot be scored the same as the original order.

2015-12-16 11:08 GMT+01:00 Binoy Dalal :

> What is your exact use case?
>


Re: pf2 pf3 and stopwords

2015-12-18 Thread elisabeth benoit
OK, thanks a lot for your advice.

I'll try that.



2015-12-17 10:05 GMT+01:00 Binoy Dalal :

> For this case of inversion in particular, a slop of 1 won't cause any
> issues, since such a reverse match would require a slop of 2.


solr 4.10 I change slop in pf2 pf3 and query norm changes

2015-12-21 Thread elisabeth benoit
Hello all,

I am using solr 4.10.1 and I have configured my pf2 pf3 like this

pf2=catchall~0^0.2 name~0^0.21 synonyms^0.2
pf3=catchall~0^0.2 name~0^0.21 synonyms^0.2

my search field (qf) is my catchall field

I've been trying to change the slop in pf2, pf3 for catchall and synonyms (going
from 0, or the default value for synonyms, to 1)

pf2=catchall~1^0.2 name~0^0.21 synonyms~1^0.2
pf3=catchall~1^0.2 name~0^0.21 synonyms~1^0.2

but some results are not ordered the same way anymore, even though I get the
same MATCH values in the debugQuery output.

For instance, for a doc matching bastill in the catchall field (and nothing to
do with pf2, pf3!):

With the first pf2, pf3:

0.5163083 = (MATCH) weight(catchall:bastill in 105256) [NoTFIDFSimilarity], result of:
  0.5163083 = score(doc=105256,freq=2.0 = termFreq=2.0), product of:
    0.5163083 = queryWeight, product of:
      1.0 = idf(docFreq=134, maxDocs=12258543)
      0.5163083 = queryNorm
    1.0 = fieldWeight in 105256, product of:
      1.0 = tf(freq=2.0), with freq of:
        2.0 = termFreq=2.0
      1.0 = idf(docFreq=134, maxDocs=12258543)
      1.0 = fieldNorm(doc=105256)
0.5163083 = (MATCH) weight(catchall:paris in 105256) [NoTFIDFSimilarity], result of:
  0.5163083 = score(doc=105256,freq=6.0 = termFreq=6.0

and when I change pf2 pf3 (the only change, same query, same docs):

0.47504464 = (MATCH) weight(catchall:paris in 105256) [NoTFIDFSimilarity], result of:
  0.47504464 = score(doc=105256,freq=6.0 = termFreq=6.0), product of:
    0.47504464 = queryWeight, product of:
      1.0 = idf(docFreq=10958, maxDocs=12258543)
      0.47504464 = queryNorm
    1.0 = fieldWeight in 105256, product of:
      1.0 = tf(freq=6.0), with freq of:
        6.0 = termFreq=6.0
      1.0 = idf(docFreq=10958, maxDocs=12258543)
      1.0 = fieldNorm(doc=105256)

So in the end, with the same MATCH results, in the first case I get two documents
with the same score, and in the second case one document has a higher score.

This seems very strange. Does anyone have a clue what's going on?

Thanks
Elisabeth
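
For reference, not from the thread: with Lucene's classic scoring, the query
norm is computed once from the whole rewritten query, not per document:

    queryNorm(q) = 1 / sqrt(sumOfSquaredWeights(q))

where sumOfSquaredWeights sums (idf * boost)^2 over all clauses of the query.
Working backwards from the two norms above, 1/0.5163083^2 is about 3.75 and
1/0.47504464^2 is about 4.43, so changing the pf2/pf3 slops enlarged the sum of
squared weights - i.e. it changed the set of phrase clauses in the rewritten
query - which rescales every clause's score, including catchall:bastill.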


Re: solr 4.10 I change slop in pf2 pf3 and query norm changes

2015-12-21 Thread elisabeth benoit
Hello,

I don't think the query is important in this case.

After checking out solr's debug output, I don't think the query norm is
relevant either.

I think the scoring changes because

1) In the first case, I have the same slop for the catchall and name fields.
Both match pf2 pf3. In this case, solr uses the max of both for scoring pf2 pf3
results.

2) In the second case, I have different slops, so solr uses the sum of the
values instead of the max.



If anyone knows how to work around this, please let me know.

Elisabeth

2015-12-21 11:22 GMT+01:00 Binoy Dalal :

> What is your query?
>


Re: solr 4.10 I change slop in pf2 pf3 and query norm changes

2015-12-21 Thread elisabeth benoit
Hello,

Yes, in the second case I get one document with a higher score. The relative
scoring between documents is not the same anymore.

Best regards,
Elisabeth

2015-12-22 4:39 GMT+01:00 Binoy Dalal :

> I have one query.
> In the second case do you get two records with the same lower scores or
> just one record with a lower score and the other with a higher one?


Re: solr 4.10 I change slop in pf2 pf3 and query norm changes

2015-12-21 Thread elisabeth benoit
Hello,

That's what I did, as I wrote in my mail yesterday. In the first case, solr
computes the max. In the second case, it sums both results.

That's why I don't get the same relative scoring between docs with the same
query.

2015-12-22 8:30 GMT+01:00 Binoy Dalal :

> Unless the content for both the docs is exactly the same it is highly
> unlikely that you will get the same score for the docs under different
> querying conditions. What you saw in the first case may have been a happy
> coincidence.
> Other than that it is very difficult to say why the scoring is different
> without getting a look at the actual query and the doc content.
>
> If you still wish to dig deeper, try to understand how solr actually scores
> documents that match your query. It takes into account a variety of factors
> to compute the cosine similarity to find the best match.
> You can find this formula and a decent explanation for it in the book Solr
> in Action, or online in the lucene docs:
>
> https://lucene.apache.org/core/3_5_0/api/core/org/apache/lucene/search/Similarity.html

Re: Boost exact search

2016-02-22 Thread elisabeth benoit
Hello,

There was a discussion on this thread about exact match

http://www.mail-archive.com/solr-user%40lucene.apache.org/msg118115.html


They mention an example on this page


https://github.com/cominvent/exactmatch


Best regards,
Elisabeth

2016-02-19 18:01 GMT+01:00 Loïc Stéphan :

> Hello,
>
>
>
> We are trying to boost exact matches to improve relevance.
>
> We followed this article :
> http://everydaydeveloper.blogspot.fr/2012/02/solr-improve-relevancy-by-boosting.html
> and this
> http://stackoverflow.com/questions/29103155/solr-exact-match-boost-over-text-containing-the-exact-match
>  but it doesn’t work for us.
>
>
>
> What is the best way to do this ?
>
>
>
> Thanks in advance


ngrams with position

2016-03-07 Thread elisabeth benoit
Hello,

I'm using solr 4.10.1. I'd like to index words as ngrams of fixed length,
with a position at the end.

For instance, with fixed length 3, Amsterdam would be something like:


__a0 (two padding spaces at the beginning, shown here as underscores)
_am1
ams2
mst3
ste4
ter5
erd6
rda7
dam8
am_9 (one more padding space at the end)

The number at the end being the position.

Does anyone have a clue how to achieve this?

Best regards,
Elisabeth


Re: ngrams with position

2016-03-08 Thread elisabeth benoit
Thanks for your answer Emir,

I'll check that out.

Best regards,
Elisabeth

2016-03-08 10:24 GMT+01:00 Emir Arnautovic :

> Hi Elisabeth,
> I don't think there is such a token filter, so you would have to create your
> own token filter that takes a token and emits ngram tokens of a specific
> length. It should not be too hard to create such a filter - you can take a
> look at how the ngram filter is coded - yours should be simpler than that.
>
> Regards,
> Emir
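
A minimal sketch of the kind of filter Emir describes (hypothetical code, not
from this thread; it assumes the Lucene 4.x TokenFilter API and uses '_'
padding as in the Amsterdam example above):

import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

/** Emits fixed-length grams of each token with the gram's offset appended,
 *  e.g. for n=3: "amsterdam" -> __a0, _am1, ams2, ..., dam8, am_9. */
public final class PositionedNGramFilter extends TokenFilter {
  private final int n;
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posIncAtt =
      addAttribute(PositionIncrementAttribute.class);
  private final Deque<String> pending = new ArrayDeque<String>();

  public PositionedNGramFilter(TokenStream input, int n) {
    super(input);
    this.n = n;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (pending.isEmpty()) {
      if (!input.incrementToken()) return false;
      // Pad with n-1 leading markers and 1 trailing marker, then cut grams.
      StringBuilder padded = new StringBuilder();
      for (int i = 0; i < n - 1; i++) padded.append('_');
      padded.append(termAtt).append('_');
      for (int i = 0; i + n <= padded.length(); i++) {
        pending.add(padded.substring(i, i + n) + i); // gram + its offset
      }
      // The first gram keeps the original token's position increment.
    } else {
      // The remaining grams stack on the same position as the original token.
      posIncAtt.setPositionIncrement(0);
    }
    termAtt.setEmpty().append(pending.poll());
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pending.clear();
  }
}

It would then be wrapped in a TokenFilterFactory and declared like any other
<filter .../> in the field type's analysis chain.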


Re: ngrams with position

2016-03-09 Thread elisabeth benoit
Hello Alessandro,

You may be right. What would you use to keep the relative order between, for
instance, the grams

__a
_am
ams
mst
ste
ter
erd
rda
dam
am_

of amsterdam? pf2 and pf3? That's all I can think about. Please let me know
if you have more insights.

Best regards,
Elisabeth

2016-03-08 17:46 GMT+01:00 Alessandro Benedetti :

> Elisabeth,
> out of curiosity, could we know what you are trying to solve with that
> complex way of tokenisation ?
> Solr is really good at storing positions along with tokens, so I am curious
> to know why you are mixing things up.
>
> Cheers


Re: ngrams with position

2016-03-10 Thread elisabeth benoit
I am trying to do approximate search with solr. We've tried fuzzy search
and spellcheck search; it's working OK but the edit distance is limited (to 2
for DirectSolrSpellChecker in solr 4.10.1). With the fuzzy operator, we've had
performance issues, and I don't think you can have an edit distance of more
than 2.

What we used to do with a database was more efficient: storing trigrams
with a position, and then searching around that position (not precisely at
that position, since it's approximate search).

The position is there to avoid, for a trigram like ams (amsterdam), getting
answers where the same trigram is, for instance, at the end of the word. I
would like answers with the same relative position between trigrams to score
higher. Maybe using edismax's pf2 and pf3 is a way to do this. I don't see any
other way. Please tell me if you do.

From your answer, I get that the position is stored, but I don't understand
how I can preserve the relative order between trigrams, apart from using pf2
pf3.

Best regards,
Elisabeth

2016-03-10 0:02 GMT+01:00 Alessandro Benedetti :

> if you store the positions for your tokens (and they are, by default, if you
> don't omit them), you have the relative position in the index. [1]
> I attach a blog post of mine, describing the lucene internals in a little
> more detail.
>
> Apart from that, can you explain the problem you are trying to solve ?
> The high-level user experience ?
> What kind of search/autocompletion/relevancy tuning are you trying to
> achieve ?
> Maybe we can help better if we start from the problem :)
>
> Cheers
>
> [1]
>
> http://alexbenedetti.blogspot.co.uk/2015/07/exploring-solr-internals-lucene.html


Re: ngrams with position

2016-03-10 Thread elisabeth benoit
That's the use case, yes. Find Amsterdam with Asmtreadm.

And yes, we're only doing approximate search if we get 0 results.

I don't quite get why pf2 pf3 is not a good solution.

We're actually testing a solution close to phonetic. Some kind of word
reduction.

Thanks for the suggestion (and the link), this makes me think maybe
phonetic is the good solution.

Thanks for your help,
Elisabeth

2016-03-10 11:32 GMT+01:00 Alessandro Benedetti :

> If I followed, your use case is:
>
> I type Asmtreadm and I want documents matching Amsterdam (even if the edit
> distance is greater than 2).
> First of all, this is something I hope you do only if you get 0 results; if
> not, the overhead can be great and you are going to lose a lot of precision,
> causing confusion for the customer.
>
> Pf2 and pf3 are ngrams of whitespace-separated tokens, used to make partial
> phrase queries affect the scoring.
> Not a good fit for your problem.
>
> More than grams, have you considered using some sort of phonetic matching ?
> Could this help :
> https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching
>
> Cheers
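
For reference, a phonetic field type along the lines Alessandro suggests might
look like this (illustrative only; DoubleMetaphoneFilterFactory ships with Solr
4.x, and inject="false" indexes only the phonetic codes instead of adding them
alongside the original tokens):

<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>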

Re: ngrams with position

2016-03-10 Thread elisabeth benoit
Oh yeah, now that you're saying it, you're right: pf2 pf3 will boost
proximity between words, not between ngrams.

Thanks again,
Elisabeth

2016-03-10 12:31 GMT+01:00 Alessandro Benedetti :

> The reason pf2 and pf3 seem not a good solution to me is the fact that the
> edismax query parser calculates those grams on top of word shingles.
> So it takes the input query and produces the shingles based on the
> whitespace separator.
>
> I.e. if you search :
> "white tiger jumping"
> with pf2 configured on field1,
> you are going to end up searching in field1 :
> "white tiger", "tiger jumping".
> This is really useful in full-text search oriented to phrase and partial
> phrase matching.
> But it has nothing to do with the analysis type associated at query time at
> this moment.
> First the query parser's tokenisation is used to build the grams, and then
> the query-time analysis is applied.
> This is according to my recollection;
> I will double check in the code and let you know.
>
> Cheers

Re: ngrams with position

2016-03-11 Thread elisabeth benoit
Jack, Emir,

Thanks for your answers. Moving the ngram logic to the client side would be a
fast and easy way to test the solution and compare it with the phonetic one.

Best regards,
Elisabeth

2016-03-11 10:52 GMT+01:00 Emir Arnautovic :

> Hi Elisabeth,
> In order to see if you will get better results, you can move the ngram logic
> outside of the analysis chain - the simplest solution is to move it to the
> client. In such a setup, you should be able to use pf2 and pf3 and see if
> that produces the desired result.
>
> Regards,
> Emir

deactivate coord scoring factor in pf2 pf3

2016-04-28 Thread elisabeth benoit
Hello all,

I am using Solr 4.10.1. I use edismax, with pf2 to boost documents that start
with the query. I use a start-with token (b), automatically added at index time
and added to the request at query time.

I have a problem at this point.

The request is q=b saint denis rer

The start-with field is name_sw.

First document:  name_sw: Saint-Denis-Université
Second document: name_sw: RER Saint-Denis

So one will get the pf2 starts-with boost and the other will not. The problem
is that this has an effect on the pf2 scoring for all the other words.

In other words, my problem is that the proximity between "saint" and "denis"
is not scored the same for those two documents.

From what I get, this is because of the coord scoring factor used for pf2.

In the explain output, for the first document:

0.52612317 Matches Punished by 0.667 (not all query terms matched)
  0.78918475 Sum of the following:
    0.39459237 names_sw:"b saint"^0.21
    0.39459237 Dismax (take winner of below)
      0.39459237 names_sw:"saint denis"^0.21
      0.37580228 catchall:"saint den"^0.2

So here, matches are punished by 0.66, which corresponds to coord(2/3),
and the final pf2 score for the proximity between saint and denis is

0.263061593153079 names_sw:"saint denis"^0.21


In the explain output, for the second document:

0.13153079 Matches Punished by 0.3334 (not all query terms matched)
  0.39459237 Dismax (take winner of below)
    0.39459237 names_sw:"saint denis"^0.21
    0.37580228 catchall:"saint den"^0.2

So here, matches are punished by 0.33, which corresponds to coord(1/3),
and the final pf2 score for the proximity between saint and denis is

0.1315307926306158 names_sw:"saint denis"^0.21


I would like to deactivate coord for pf2 pf3. Does anyone know how I
could do this?


Best regards,

Elisabeth
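
One possible approach (a sketch, not from this thread): in Lucene/Solr 4.x the
coord factor comes from the Similarity, so a custom Similarity can neutralize
it. Note that this disables coord for the whole query, not just for pf2/pf3:

import org.apache.lucene.search.similarities.DefaultSimilarity;

// Hypothetical similarity that turns off the coord factor entirely.
public class NoCoordSimilarity extends DefaultSimilarity {
  @Override
  public float coord(int overlap, int maxOverlap) {
    // Never punish a document for matching fewer of the query clauses.
    return 1.0f;
  }
}

It would be declared globally in schema.xml with something like
<similarity class="com.example.NoCoordSimilarity"/> (the package name is an
assumption).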


catchall fields or multiple fields

2015-10-12 Thread elisabeth benoit
Hello,

We're using solr 4.10 and storing all data in a catchall field. It seems to
me that one good reason for using a catchall field is when using scoring
with idf (with idf, a word might not have the same score in all fields). We
got rid of idf and are now considering using multiple fields. I remember
reading somewhere that using a catchall field might speed up search
time. I was wondering if some of you have any opinion (or experience)
related to this subject.

Best regards,
Elisabeth


Re: catchall fields or multiple fields

2015-10-13 Thread elisabeth benoit
Thanks to you all for the informed advice.

Thanks Trey for your very detailed point of view. It is now very clear to
me how a search on multiple fields can grow slower than a search on a
catchall field.

Our current search model is problematic: we search on a catchall field, but
need to know which fields match, so we do highlighting on multiple fields (not
indexed, but stored). To improve performance, we want to get rid of
highlighting and use the solr explain output. To get the explain output on
those fields, we need to search on those fields.

So I guess we have to test whether removing highlighting and adding
multi-field search will improve performance or not.

Best regards,
Elisabeth



2015-10-12 17:55 GMT+02:00 Jack Krupansky :

> I think it may all depend on the nature of your application and how much
> commonality there is between fields.
>
> One interesting area is auto-suggest, where you can certainly suggest from
> the union of all fields, you may want to give priority to suggestions from
> preferred fields. For example, for actual product names or important
> keywords rather than random words from the English language that happen to
> occur in descriptions, all of which would occur in a catchall.
>
> -- Jack Krupansky
>


Re: catchall fields or multiple fields

2015-10-14 Thread elisabeth benoit
Thanks for your suggestion, Jack. In fact we're doing geographic search
(the fields are country, state, county, town, hamlet, district).

So it's difficult to split.

Best regards,
Elisabeth
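
For reference, the two setups being weighed look roughly like this (the field
names are from this thread; the weights are illustrative):

# single catchall field
qf=catchall

# multiple fields
qf=country^0.5 state^0.6 county^0.7 town^1.0 hamlet^1.0 district^0.9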

2015-10-13 16:01 GMT+02:00 Jack Krupansky :

> Performing a sequence of queries can help too. For example, if users
> commonly search for a product name, you could do an initial query on just
> the product name field which should be much faster than searching the text
> of all product descriptions, and highlighting would be less problematic. If
> that initial query comes up empty, then you could move on to the next
> highest most likely field, maybe product title (short one line
> description), and query voluminous fields like detailed product
> descriptions, specifications, and user comments/reviews only as a last
> resort.
>
> -- Jack Krupansky
>


pf2 pf3 and stopwords

2015-12-14 Thread elisabeth benoit
Hello,

I am using solr 4.10.1. I have a field with stopwords:

<filter class="solr.StopFilterFactory" words="stopwords.txt"
        enablePositionIncrements="true"/>

And I use pf2 pf3 on that field with a slop of 0.

If the request is "Gare Saint Lazare", and I have a document "Gare de Saint
Lazare", "de" being a stopword, this document doesn't get the pf3 boost,
because of "de".

I was wondering: is this normal? Is this a bug? Is something wrong with my
configuration?

Best regards,
Elisabeth
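
For reference, a request of the kind described here might look like this (the
field names and weights are illustrative; the field~slop^boost syntax is the
one used elsewhere in this archive):

q=Gare Saint Lazare&defType=edismax&qf=catchall
  &pf2=catchall~0^0.2&pf3=catchall~0^0.2&debugQuery=true

With enablePositionIncrements="true", removing "de" at index time leaves a
position gap (gare=0, saint=2, lazare=3), so the slop-0 pf3 phrase
"gare saint lazare" (expecting positions 0, 1, 2) cannot match - which is the
behaviour described above and explained elsewhere in this thread.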


Re: pf2 pf3 and stopwords

2015-12-16 Thread elisabeth benoit
Thanks for your answer.

Actually, using a slop of 1 is something I can't do (because of other
specifications).

I guess I'll index differently.

Best regards,
Elisabeth

2015-12-14 16:24 GMT+01:00 Binoy Dalal :

> Moreover, the stopword de will work on your queries and not on your
> documents, meaning if you query 'Gare de Saint Lazare', the terms actually
> searched for will be Gare, Saint and Lazare; 'de' will be filtered out.
>
> On Mon, Dec 14, 2015 at 8:49 PM Binoy Dalal 
> wrote:
>
> > This isn't a bug. During pf3 matching, since your query has only three
> > tokens, the entire query will be treated as a single phrase, and with slop
> > = 0, any word that comes in the middle of your query - 'de' in this case -
> > will cause the phrase not to be matched. If you want to get around this,
> > try setting your slop = 1, in which case it should match Gare Saint Lazare
> > even with the de in it.


using DirectSpellChecker and FileBasedSpellChecker with Solr 4.10.1

2015-04-14 Thread elisabeth benoit
Hello,

I am using Solr 4.10.1 and trying to use DirectSolrSpellChecker and
FileBasedSpellChecker in the same request.

I've applied the change from patch 135.patch (cf. SOLR-6271). I've tried
running the command "patch -p1 -i 135.patch --dry-run" but it didn't work,
maybe because the patch was a fix for Solr 4.9, so I just replaced this line
in ConjunctionSolrSpellChecker

else if (!stringDistance.equals(checker.getStringDistance())) {
  throw new IllegalArgumentException(
      "All checkers need to use the same StringDistance.");
}


by

else if (!stringDistance.equals(checker.getStringDistance())) {
  throw new IllegalArgumentException(
      "All checkers need to use the same StringDistance!!! 1:" +
      checker.getStringDistance() + " 2: " + stringDistance);
}

as it was done in the patch

But still, when I send a spellcheck request, I get the error:

msg": "All checkers need to use the same StringDistance!!!
1:org.apache.lucene.search.spell.LuceneLevenshteinDistance@15f57db32:
org.apache.lucene.search.spell.LuceneLevenshteinDistance@280f7e08"

From the error message I gather both spellcheckers use the same distance
measure, LuceneLevenshteinDistance, but they're not the same instance of
LuceneLevenshteinDistance.

Is the condition all right? What should be done to fix this properly?

Thanks,
Elisabeth
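
For what it's worth, the @15f57db3 / @280f7e08 suffixes in the message are
identity hash codes: if LuceneLevenshteinDistance does not override equals(),
the equals() call above falls back to Object.equals, i.e. reference equality,
so two separately constructed checkers can never pass the check (the reply
from James Dyer below confirms the reference comparison is the bug). A
hypothetical relaxation - a sketch, not the official fix - would compare
classes instead of instances:

else if (stringDistance.getClass() != checker.getStringDistance().getClass()) {
  throw new IllegalArgumentException(
      "All checkers need to use the same StringDistance.");
}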


Re: using DirectSpellChecker and FileBasedSpellChecker with Solr 4.10.1

2015-04-14 Thread elisabeth benoit
Thanks for your answer!

I didn't realize this was not supposed to be done (a conjunction of
DirectSolrSpellChecker and FileBasedSpellChecker). I got the idea from the
mailing list while searching for a solution to get a list of words to
ignore for the DirectSolrSpellChecker.

Well well well, I'll try removing the check and see what happens. I'm not a
Java programmer, but if I can find a simple solution I'll let you know.

Thanks again,
Elisabeth

2015-04-14 16:29 GMT+02:00 Dyer, James :

> Elisabeth,
>
> Currently ConjunctionSolrSpellChecker only supports adding
> WordBreakSolrSpellchecker to IndexBased- FileBased- or
> DirectSolrSpellChecker.  In the future, it would be great if it could
> handle other Spell Checker combinations.  For instance, if you had a
> (e)dismax query that searches multiple fields, to have a separate
> spellchecker for each of them.
>
> But CSSC is not hardened for this more general usage, as hinted in the API
> doc.  The check done to ensure all spellcheckers use the same
> stringdistance object, I believe, is a safeguard against using this class
> for functionality it is not able to correctly support.  It looks to me that
> SOLR-6271 was opened to fix the bug in that it is comparing references on
> the stringdistance.  This is not a problem with WBSSC because this one does
> not support string distance at all.
>
> What you're hoping for, however, is that the requirement for the string
> distances be the same to be removed entirely.  You could try modifying the
> code by removing the check.  However beware that you might not get the
> results you desire!  But should this happen, please, go ahead and fix it
> for your use case and then donate the code.  This is something I've
> personally wanted for a long time.
>
> James Dyer
> Ingram Content Group


Re: using DirectSpellChecker and FileBasedSpellChecker with Solr 4.10.1

2015-04-16 Thread elisabeth benoit
For the record, what I finally did is place the words I want spellcheck
to ignore in spellcheck.collateParam.fq and the words I'd like to be
checked in spellcheck.q. The collation query uses spellcheck.collateParam.fq,
so all did_you_mean queries return results containing the words in
spellcheck.collateParam.fq.

Best regards,
Elisabeth
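
An illustrative request along those lines (the field name and words are
hypothetical):

q=...&spellcheck=true
  &spellcheck.q=asmtreadm
  &spellcheck.collate=true
  &spellcheck.collateParam.fq=catchall:(gare rer)

Here asmtreadm is the word to be checked, while the words in the
spellcheck.collateParam.fq filter are the ones the collation queries must
match but spellcheck should leave alone.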



2015-04-14 17:05 GMT+02:00 elisabeth benoit :

> Thanks for your answer!
>
> I didn't realize this what not supposed to be done (conjunction of
> DirectSolrSpellChecker and FileBasedSpellChecker). I got this idea in the
> mailing list while searching for a solution to get a list of words to
> ignore for the DirectSolrSpellChecker.
>
> Well well well, I'll try removing the check and see what happens. I'm not
> a java programmer, but if I can find a simple solution I'll let you know.
>
> Thanks again,
> Elisabeth
>
> 2015-04-14 16:29 GMT+02:00 Dyer, James :
>
>> Elisabeth,
>>
>> Currently ConjunctionSolrSpellChecker only supports adding
>> WordBreakSolrSpellchecker to IndexBased- FileBased- or
>> DirectSolrSpellChecker.  In the future, it would be great if it could
>> handle other Spell Checker combinations.  For instance, if you had a
>> (e)dismax query that searches multiple fields, to have a separate
>> spellchecker for each of them.
>>
>> But CSSC is not hardened for this more general usage, as hinted in the
>> API doc.  The check done to ensure all spellcheckers use the same
>> stringdistance object, I believe, is a safeguard against using this class
>> for functionality it is not able to correctly support.  It looks to me that
>> SOLR-6271 was opened to fix the bug in that it is comparing references on
>> the stringdistance.  This is not a problem with WBSSC because this one does
>> not support string distance at all.
>>
>> What you're hoping for, however, is that the requirement for the string
>> distances be the same to be removed entirely.  You could try modifying the
>> code by removing the check.  However beware that you might not get the
>> results you desire!  But should this happen, please, go ahead and fix it
>> for your use case and then donate the code.  This is something I've
>> personally wanted for a long time.
>>
>> James Dyer
>> Ingram Content Group
>>
>>
>> -Original Message-
>> From: elisabeth benoit [mailto:elisaelisael...@gmail.com]
>> Sent: Tuesday, April 14, 2015 7:37 AM
>> To: solr-user@lucene.apache.org
>> Subject: using DirectSpellChecker and FileBasedSpellChecker with Solr
>> 4.10.1
>>
>> Hello,
>>
>> I am using Solr 4.10.1 and trying to use DirectSolrSpellChecker and
>> FileBasedSpellchecker in same request.
>>
>> I've applied change from patch 135.patch (cf Solr-6271). I've tried
>> running
>> the command "patch -p1 -i 135.patch --dry-run" but it didn't work, maybe
>> because the patch was a fix to Solr 4.9, so I just replaced line in
>> ConjunctionSolrSpellChecker
>>
>> else if (!stringDistance.equals(checker.getStringDistance())) {
>>  throw new IllegalArgumentException(
>>  "All checkers need to use the same StringDistance.");
>>}
>>
>>
>> by
>>
>> else if (!stringDistance.equals(checker.getStringDistance())) {
>> throw new IllegalArgumentException(
>> "All checkers need to use the same StringDistance!!! 1:" +
>> checker.getStringDistance() + " 2: " + stringDistance);
>>   }
>>
>> as it was done in the patch
>>
>> but still, when I send a spellcheck request, I get the error
>>
>> msg": "All checkers need to use the same StringDistance!!!
>> 1: org.apache.lucene.search.spell.LuceneLevenshteinDistance@15f57db3 2:
>> org.apache.lucene.search.spell.LuceneLevenshteinDistance@280f7e08"
>>
>> From the error message I gather both spellcheckers use the same
>> distanceMeasure, LuceneLevenshteinDistance, but they're not the same
>> instance of LuceneLevenshteinDistance.
>>
>> Is the condition all right? What should be done to fix this properly?
>>
>> Thanks,
>> Elisabeth
>>
>
>


Re: spellcheck enabled but not getting any suggestions.

2015-04-17 Thread elisabeth benoit
Shouldn't you specify a spellcheck.dictionary in your request handler?
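
Something like this in the /select defaults (assuming your spellcheck
component defines a dictionary named "default"):

<str name="spellcheck.dictionary">default</str>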

Best regards,
Elisabeth

2015-04-17 11:24 GMT+02:00 Derek Poh :

> Hi
>
> I have enabled spellcheck but am not getting any suggestions with
> incorrectly spelled keywords.
> I added the spellcheck into the /select request handler.
>
> What steps did I miss out?
>
> spellcheck list in return result:
>
> <lst name="spellcheck">
>   <lst name="suggestions"/>
> </lst>
>
>
> solrconfig.xml:
>
> <requestHandler name="/select" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <int name="rows">10</int>
>     <str name="df">text</str>
>
>     <str name="spellcheck">on</str>
>     <str name="spellcheck.extendedResults">false</str>
>     <str name="spellcheck.count">5</str>
>     <str name="spellcheck.alternativeTermCount">2</str>
>     <str name="spellcheck.maxResultsForSuggest">5</str>
>     <str name="spellcheck.collate">true</str>
>     <str name="spellcheck.collateExtendedResults">true</str>
>     <str name="spellcheck.maxCollationTries">5</str>
>     <str name="spellcheck.maxCollations">3</str>
>   </lst>
>
>   <arr name="last-components">
>     <str>spellcheck</str>
>   </arr>
> </requestHandler>
>
>
>


Solr join between documents

2016-05-19 Thread elisabeth benoit
Hello all,

I was wondering if there was a solr solution for a problem I have (and I'm
not the only one I guess)

We use solr as a search engine for addresses. We sometimes have requests
with let's say for instance

street A close to street B City postcode

I was wondering if some kind of join between two documents is possible in
solr?

The query would be: find a union of two documents that together match all the
words in the query.

Those documents have a latitude and a longitude, and we would fix a max
distance between two documents to be eligible for a join.

Is there a way to do this?

Best regards,
Elisabeth


Re: Solr join between documents

2016-05-21 Thread elisabeth benoit
Ok, thanks for your answer! That's what I thought but just wanted to be
sure.

Best regards,
Elisabeth

2016-05-21 2:02 GMT+02:00 Erick Erickson :

> Gosh, I'm not even sure how to start to form such a query.
>
> Let's see, you have StreetB in some city identified by postal code P.
>
> Is what you're wanting "return me all pairs of documents within that
> postal code that have all the terms matching and the polygons enclosing
> those streets plus some distance intersect"?
>
> Seems difficult.
>
> Best,
> Erick
>
> On Thu, May 19, 2016 at 8:35 AM, elisabeth benoit
>  wrote:
> > Hello all,
> >
> > I was wondering if there was a solr solution for a problem I have (and
> I'm
> > not the only one I guess)
> >
> > We use solr as a search engine for addresses. We sometimes have requests
> > with let's say for instance
> >
> > street A close to street B City postcode
> >
> > I was wondering if some kind of join between two documents is possible in
> > solr?
> >
> > The query would be: find union of two documents matching all words in
> query.
> >
> > Those documents have a latitude and a longitude, and we would fix a max
> > distance between two documents to be eligible for a join.
> >
> > Is there a way to do this?
> >
> > Best regards,
> > Elisabeth
>


Re: Boosting exact match fields.

2016-06-16 Thread elisabeth benoit
In addition to what was proposed

We use the technique described here

https://github.com/cominvent/exactmatch

and it works quite well.
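
In the same spirit as what Erick describes below, the gist is a copyField into
a lightly analyzed "exact" field that gets a big boost (all names here are
made up):

<fieldType name="text_exactish" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="exact_match" type="text_exactish" indexed="true" stored="false"/>
<copyField source="name" dest="exact_match"/>

and then, with edismax, something like qf=name exact_match^100.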

Best regards
Elisabeth

2016-06-15 16:32 GMT+02:00 Alessandro Benedetti :

> In addition to what Erick correctly proposed,
> are you storing norms for your field of interest ( to boost documents with
> shorter field values )?
> If you are, I find suspicious "Sony Ear Phones" to win over "Ear Phones"
> for your "Ear Phones" query.
> What are the other factors currently involved in your relevancy score
> calculus ?
>
> Cheers
>
> On Tue, Jun 14, 2016 at 4:48 PM, Erick Erickson 
> wrote:
>
> > If these are the complete field, i.e. your document
> > contains exactly "ear phones" and not "ear phones
> > are great" use a copyField to put it into an "exact_match"
> > field that uses a much simpler analysis chain based
> > on KeywordTokenizer (plus, perhaps things like
> > lowercaseFilter, maybe strip punctuation and the like".
> > Then you add a clause on exact_match boosted
> > really high.
> >
> > Best,
> > Erick
> >
> > On Tue, Jun 14, 2016 at 1:01 AM, Naveen Pajjuri
> >  wrote:
> > > Hi,
> > >
> > > I have documents with a field (data type definition for that field is
> > > below) values as ear phones, sony ear phones, philips ear phones. when
> i
> > > query for earphones sony ear phones is the top result where as i want
> ear
> > > phones as top result. please suggest how to boost exact matches. PS: I
> > have
> > > earphones => ear phones in my synonyms.txt and the datatype definition
> > for
> > > that field keywords is:
> > >
> > > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
> > >   <analyzer type="index">
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> > >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> > >   </analyzer>
> > >   <analyzer type="query">
> > >     <tokenizer class="..."/>
> > >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> > >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> > >     <filter class="..."/>
> > >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> > >   </analyzer>
> > > </fieldType>
> > REGARDS,
> > > Naveen
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


solr 5.5.2 loadOnStartUp does not work

2016-07-25 Thread elisabeth benoit
Hello,

I have a core.properties with content

name=indexer
loadOnStartup=false


but the core is loaded on start up (it appears on the admin interface).

I thought the core would not be loaded at startup. Did I miss something?


best regards,

elisabeth


Re: solr 5.5.2 loadOnStartUp does not work

2016-07-26 Thread elisabeth benoit
Hello,

Thanks for your answer.

Yes, it seems a little tricky to me.

Best regards,
Elisabeth

2016-07-25 18:06 GMT+02:00 Erick Erickson :

> "Load" is a little tricky here, it means "load the core and open a
> searcher.
> The core _descriptor_ which is the internal structure of
> core.properties (plus some other info) _is_ loaded and is what's
> used to show the list of available cores. Else how would you
> even know the core existed?
>
> It's not until you actually try to do anything (even click on the
> item in the "cores" drop-down) that the heavy-duty
> work of opening the core actually executes.
>
> So I think it's working as expected,. But do note
> that this whole area (transient cores, loading on
> startup true/false) is intended for stand-alone
> Solr and is unsupported in SolrCloud.
>
> Best,
> Erick
>
> On Mon, Jul 25, 2016 at 6:09 AM, elisabeth benoit
>  wrote:
> > Hello,
> >
> > I have a core.properties with content
> >
> > name=indexer
> > loadOnStartup=false
> >
> >
> > but the core is loaded on start up (it appears on the admin interface).
> >
> > I thougth the core would be unloaded on startup. did I miss something?
> >
> >
> > best regards,
> >
> > elisabeth
>


Solr 5.5.2 mm parameter not working the same

2016-07-27 Thread elisabeth benoit
Hello,

We are migrating from solr 4.10.1 to solr 5.5.2, and it seems that the mm
parameter is not working the same anymore.

In fact, as soon as there is a word in the query that is not in the index, no
matter what mm value I send, I get no results, as if my query were a pure AND
query.

Does anyone have a clue?

Best regards,
Elisabeth


Re: Solr 5.5.2 mm parameter not working the same

2016-07-27 Thread elisabeth benoit
Oh sorry, wrote too fast. I had to change the defaultOperator to OR.
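
(We use the <solrQueryParser defaultOperator="OR"/> element in schema.xml; the
documented equivalent is setting q.op in the handler defaults, since the
schema element is deprecated:

<str name="q.op">OR</str>)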

Elisabeth

2016-07-27 10:11 GMT+02:00 elisabeth benoit :

>
> Hello,
>
> We are migrating from solr 4.10.1 to solr 5.5.2, and it seems that the mm
> parameter is not working the same anymore.
>
> In fact, as soon as there is a word not in the index in the query, no
> matter what mm value I send, I get no answer as if my query is a pure AND
> query.
>
> Does anyone have a clue?
>
> Best regards,
> Elisabeth
>
>


equivalent of localhost_access_log for solr 5

2016-08-24 Thread elisabeth benoit
Hello,

I'd like to know what is the best way to have the equivalent of
tomcat localhost_access_log  for solr 5?

Best regards,
Elisabeth


another log question about solr 5

2016-08-24 Thread elisabeth benoit
Hello again,

We're planning on using solr 5.5.2 in production, using the installation
script install_solr_service.sh.

I was wondering what is the right way to prevent solr 5 from creating a new
log file at every startup (and renaming the current file: mv
"$SOLR_LOGS_DIR/solr_gc.log" "$SOLR_LOGS_DIR/solr_gc_log_$(date
+"%Y%m%d_%H%M")").

Thanks,
Elisabeth


Re: equivalent of localhost_access_log for solr 5

2016-08-24 Thread elisabeth benoit
Thanks a lot for your answer.

Best regards,
elisabeth

2016-08-24 16:16 GMT+02:00 Shawn Heisey :

> On 8/24/2016 5:44 AM, elisabeth benoit wrote:
> > I'd like to know what is the best way to have the equivalent of tomcat
> > localhost_access_log for solr 5?
>
> I don't know for sure what that is, but it sounds like a request log.
> If you edit server/etc/jetty.xml you will find a commented out section
> of configuration that enables a request log.  The header says "Configure
> Request Log".  If that's what you want, just uncomment that section and
> restart Solr.
>
> Thanks,
> Shawn
>
>


Re: another log question about solr 5

2016-08-25 Thread elisabeth benoit
Thanks! This is very helpful!

Best regards,
Elisabeth

2016-08-25 17:07 GMT+02:00 Shawn Heisey :

> On 8/24/2016 6:01 AM, elisabeth benoit wrote:
> > I was wondering what is the right way to prevent solr 5 from creating a
> new
> > log file at every startup  (and renaming the actual file mv
> > "$SOLR_LOGS_DIR/solr_gc.log" "$SOLR_LOGS_DIR/solr_gc_log_$(date
> > +"%Y%m%d_%H%M")"
>
> I think if you find and comment/remove the command in the startup script
> that renames the logfile, that would do it.  The default log4j config
> will rotate the logfiles.  You can comment the first part of the
> bin/solr section labeled "backup the log files before starting".  I
> would recommend NOT commenting the next part, which rotates the garbage
> collection log.
>
> You should also modify server/resources/log4j.properties to remove all
> mention of the CONSOLE output.  The console logfile is created by shell
> redirection, which means it is never rotated and can fill up your disk.
> It's a duplicate of information that goes into solr.log, so you don't
> need it.  This means removing ", CONSOLE" from the log4j.rootLogger line
> and entirely removing the lines that start with log4j.appender.CONSOLE.
>
> You might also want to adjust the log4j.appender.file.MaxFileSize line
> in log4j.properties -- 4 megabytes is very small, which means that your
> logfile history might not cover enough time to be useful.
>
> Dev note:I think we really need to include gc logfile rotation in the
> startup script.  If the java heap is properly sized, this file won't
> grow super-quickly, but it WILL grow, and that might cause issues.  I
> also think that the MaxFileSize default in log4j.properties needs to be
> larger.
>
> Thanks,
> Shawn
>
>


threads blocked in LRUcache.get() in solr 5.5.2

2016-08-31 Thread elisabeth benoit
Hello,

We are migrating from solr 4.10.1 to solr 5.5.2. We don't use solr cloud.

We installed the service with installation script and kept the default
configuration, except for a few settings about logs and the gc config (the
same used with solr 4.10.1).

We tested the performance of solr 5.5.2 today with a limit test, and got
really really bad performance, some queries taking up to 29 ms (on our
dev servers, which are under-dimensioned; with no perf test running, the
query time is still higher, but not THAT much).

The server has three cores, one of 8g, one of 3g and one of less than 1g.
The machine has 64g of ram and xmx and xms are set to 16g.

We checked the JVM in VisualVM and noticed too many threads were created by
jetty. The max threads was set to 10000 in jetty.xml, so we lowered it to
400 (the same number we used with tomcat7).

Then we perf tested again; the queries were still very slow, with not much
of the cpu used: as we saw with top, the 16 cores were all used at most at 20%
(some really just at 5%). After 30 minutes of testing, we could see in
visualvm that the threads were spending 65% of their time in
LRUCache.get() and 25% in LRUCache.put(). We noticed in visualvm that the
solr threads were mostly blocked, and then checked the thread dumps in the
solr admin interface, and the blocked ones were waiting in LRUCache.get().

We have queries with filters (fq parameter). We use FastLRUCache for filter
cache and LRUCache for document cache, with a min/max size of 512 for
filter and 15000 for document cache. This may seem small but these are the
values we use with solr 4.10.1 in production, with what we consider good
enough performance (less than 40 ms).
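
For reference, the cache definitions in our solrconfig.xml look roughly like
this (sizes as above; autowarmCount is an assumption):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="15000" initialSize="15000" autowarmCount="0"/>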

Does anyone have an idea what is wrong? Our configuration is fine with solr
4.10.1.

Best regards,
Elisabeth


solr 5.5.2 dump threads - threads blocked in org.eclipse.jetty.util.BlockingArrayQueue

2016-09-08 Thread elisabeth benoit
Hello,


We are perf testing solr 5.5.2 (with a limit test, i.e. sending as many
queries/sec as possible) and we see the cpu never goes over 20%, and
threads are blocked in org.eclipse.jetty.util.BlockingArrayQueue, as we can
see in solr admin interface thread dumps

qtp706277948-757 (757)

java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2c4a56cb

   - sun.misc.Unsafe.park​(Native Method)
   - java.util.concurrent.locks.LockSupport.parkNanos​(LockSupport.java:215)
   -
   
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos​(AbstractQueuedSynchronizer.java:2078)
   -
   org.eclipse.jetty.util.BlockingArrayQueue.poll​(BlockingArrayQueue.java:389)
   -
   
org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll​(QueuedThreadPool.java:531)
   -
   
org.eclipse.jetty.util.thread.QueuedThreadPool.access$700​(QueuedThreadPool.java:47)
   -
   
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run​(QueuedThreadPool.java:590)
   - java.lang.Thread.run​(Thread.java:745)


We changed two things in jetty configuration:

maxThreads value in /opt/solr/server/solr/jetty.xml

<Set name="maxThreads"><Property name="solr.jetty.threads.max" default="400"/></Set>

and we activated the request log, i.e. uncommented the lines

<Ref id="Handlers">
  <Call name="addHandler">
    <Arg>
      <New id="RequestLog" class="org.eclipse.jetty.server.handler.RequestLogHandler">
        <Set name="requestLog">
          <New id="RequestLogImpl" class="org.eclipse.jetty.server.AsyncNCSARequestLog">
            <Set name="filename">/var/solr/logs/requests.log</Set>
            <Set name="filenameDateFormat">yyyy_MM_dd</Set>
            <Set name="retainDays">90</Set>
            <Set name="append">true</Set>
            <Set name="extended">false</Set>
            <Set name="logCookies">false</Set>
            <Set name="LogTimeZone">UTC</Set>
            <Set name="preferProxiedForAddress">true</Set>
          </New>
        </Set>
      </New>
    </Arg>
  </Call>
</Ref>

in jetty.xml


We had the same result with maxThreads=10000 (the default value in the solr
install).


Did anyone experience the same issue with solr 5?


Best regards,

Elisabeth


Re: solr 5.5.2 dump threads - threads blocked in org.eclipse.jetty.util.BlockingArrayQueue

2016-09-08 Thread elisabeth benoit
Well, we rekicked the machine with puppet, restarted solr and now it seems
ok. dont know what happened.

2016-09-08 11:38 GMT+02:00 elisabeth benoit :

>
> Hello,
>
>
> We are perf testing solr 5.5.2 (with a limit test, i.e. sending as many
> queries/sec as possible) and we see the cpu never goes over 20%, and
> threads are blocked in org.eclipse.jetty.util.BlockingArrayQueue, as we
> can see in solr admin interface thread dumps
>
> qtp706277948-757 (757)
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$
> ConditionObject@2c4a56cb
>
>- sun.misc.Unsafe.park​(Native Method)
>- java.util.concurrent.locks.LockSupport.parkNanos​(
>LockSupport.java:215)
>- java.util.concurrent.locks.AbstractQueuedSynchronizer$
>ConditionObject.awaitNanos​(AbstractQueuedSynchronizer.java:2078)
>- org.eclipse.jetty.util.BlockingArrayQueue.poll​(
>BlockingArrayQueue.java:389)
>- org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll​(
>QueuedThreadPool.java:531)
>- org.eclipse.jetty.util.thread.QueuedThreadPool.access$700​(
>QueuedThreadPool.java:47)
>- org.eclipse.jetty.util.thread.QueuedThreadPool$3.run​(
>QueuedThreadPool.java:590)
>- java.lang.Thread.run​(Thread.java:745)
>
>
> We changed two things in jetty configuration:
>
> maxThreads value in /opt/solr/server/solr/jetty.xml
>
> <Set name="maxThreads"><Property name="solr.jetty.threads.max" default="400"/></Set>
>
> and we activated the request log, i.e. uncommented the lines
>
> <Ref id="Handlers">
>   <Call name="addHandler">
>     <Arg>
>       <New id="RequestLog" class="org.eclipse.jetty.server.handler.RequestLogHandler">
>         <Set name="requestLog">
>           <New id="RequestLogImpl" class="org.eclipse.jetty.server.AsyncNCSARequestLog">
>             <Set name="filename">/var/solr/logs/requests.log</Set>
>             <Set name="filenameDateFormat">yyyy_MM_dd</Set>
>             <Set name="retainDays">90</Set>
>             <Set name="append">true</Set>
>             <Set name="extended">false</Set>
>             <Set name="logCookies">false</Set>
>             <Set name="LogTimeZone">UTC</Set>
>             <Set name="preferProxiedForAddress">true</Set>
>           </New>
>         </Set>
>       </New>
>     </Arg>
>   </Call>
> </Ref>
>
>
> in jetty.xml
>
>
> We had the same result with maxThreads=10000 (the default value in the solr
> install).
>
>
> Did anyone experience the same issue with solr 5?
>
>
> Best regards,
>
> Elisabeth
>


migration to solr 5.5.2 highlight on ngrams not working

2016-09-22 Thread elisabeth benoit
Hello

After migrating from solr 4.10.1 to solr 5.5.2, we don't have the same
behaviour with highlighting on edge ngram fields.

We're using it for an autocomplete component. With Solr 4.10.1, if the request
is sol, the highlighting on solr is <em>sol</em>r

with solr 5.5.2, we have <em>solr</em>.

Same problem as described in
http://grokbase.com/t/lucene/solr-user/154m4jzv2f/solr-5-hit-highlight-with-ngram-edgengram-fields

but nobody answered the post.

Does anyone know we can fix this?

Best regards,
Elisabeth

Field definition

<fieldType name="..." class="solr.TextField" ...>
  <analyzer type="index">
    <tokenizer class="..." pattern="[\s,;:\-\']"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnNumerics="0"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="1"
            preserveOriginal="1"
            types="wdfftypes.txt"/>
    <filter class="..."/>
    <filter class="..." minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="..." pattern="[\s,;:\-\']"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnNumerics="0"
            generateWordParts="1"
            generateNumberParts="0"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="0"
            preserveOriginal="1"
            types="wdfftypes.txt"/>
    <filter class="..."/>
  </analyzer>
</fieldType>



Re: migration to solr 5.5.2 highlight on ngrams not working

2016-09-22 Thread elisabeth benoit
and as was said in the previous post, we can clearly see in the analysis
output that the end offsets for edge ngrams are correct for solr 4.10.1 and
wrong for solr 5.5.2


solr 5.5.2

text   | raw_bytes        | start | end | positionLength | type | position
p      | [70]             | 0     | 5   | 1              | word | 1
pa     | [70 61]          | 0     | 5   | 1              | word | 1
par    | [70 61 72]       | 0     | 5   | 1              | word | 1
pari   | [70 61 72 69]    | 0     | 5   | 1              | word | 1
paris  | [70 61 72 69 73] | 0     | 5   | 1              | word | 1

end is always set to 5, which is wrong


solr 4.10.1

text   | raw_bytes        | start | end | positionLength | type | position
p      | [70]             | 0     | 1   | 1              | word | 1
pa     | [70 61]          | 0     | 2   | 1              | word | 1
par    | [70 61 72]       | 0     | 3   | 1              | word | 1
pari   | [70 61 72 69]    | 0     | 4   | 1              | word | 1
paris  | [70 61 72 69 73] | 0     | 5   | 1              | word | 1

end is set to 1, 2, 3 or 4 depending on the edge ngram's length


2016-09-22 14:57 GMT+02:00 elisabeth benoit :

>
> Hello
>
> After migrating from solr 4.10.1 to solr 5.5.2, we don't have the same
> behaviour with highlighting on edge ngram fields.
>
> We're using it for an autocomplete component. With Solr 4.10.1, if the
> request is sol, the highlighting on solr is <em>sol</em>r
>
> with solr 5.5.2, we have <em>solr</em>.
>
> Same problem as described in http://grokbase.com/t/
> lucene/solr-user/154m4jzv2f/solr-5-hit-highlight-with-
> ngram-edgengram-fields
>
> but nobody answered the post.
>
> Does anyone know we can fix this?
>
> Best regards,
> Elisabeth
>
> Field definition
>
> <fieldType name="..." class="solr.TextField" ...>
>   <analyzer type="index">
>     <tokenizer class="..." pattern="[\s,;:\-\']"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             splitOnNumerics="0"
>             generateWordParts="1"
>             generateNumberParts="1"
>             catenateWords="0"
>             catenateNumbers="0"
>             catenateAll="0"
>             splitOnCaseChange="1"
>             preserveOriginal="1"
>             types="wdfftypes.txt"/>
>     <filter class="..."/>
>     <filter class="..." minGramSize="1"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="..." pattern="[\s,;:\-\']"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             splitOnNumerics="0"
>             generateWordParts="1"
>             generateNumberParts="0"
>             catenateWords="0"
>             catenateNumbers="0"
>             catenateAll="0"
>             splitOnCaseChange="0"
>             preserveOriginal="1"
>             types="wdfftypes.txt"/>
>     <filter class="..."/>
>   </analyzer>
> </fieldType>
>


solr 5.5.2 using omitNorms=False on multivalued fields

2016-10-18 Thread elisabeth benoit
Hello,

I would like to score higher or, even better, to sort documents with the same
text score based on the norm.

for instance, with query "a b d"

document with

a b d

should score higher than (or appear before) a document with

a b c d

The problem is my field is multivalued, so omitNorms=false is not working.

Does anyone know how to achieve this with a multivalued field on solr 5.5.2?


Best regards,
Elisabeth


in-places update solr 5.5.2

2017-07-26 Thread elisabeth benoit
Are in-place updates available in solr 5.5.2? I find atomic updates in the
doc
https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-5.5.pdf,
which redirects me to the page
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-AtomicUpdates
.

On that page, for in-place updates, it says

the _version_ field is also a non-indexed, non-stored single valued
docValues field
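
i.e., as I read that page, a definition like:

<field name="_version_" type="long" indexed="false" stored="false" docValues="true"/>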

when I try this with solr 5.5.2 I get an error message

org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Unable to use updateLog: _version_ field must exist in schema, using
indexed=\"true\" or docValues=\"true\", stored=\"true\" and
multiValued=\"false\" (_version_ is not stored)


What I'm looking for is a way to update one field of a doc without erasing
the non stored fields. Is this possible in solr 5.5.2?

best regards,
Elisabeth


Re: in-places update solr 5.5.2

2017-07-26 Thread elisabeth benoit
Thanks a lot for your answer

2017-07-26 16:35 GMT+02:00 Cassandra Targett :

> The in-place update section you referenced was added in Solr 6.5. On
> p. 224 of the PDF for 5.5, note it says there are only two available
> approaches and the section on in-place updates you see online isn't
> mentioned. I looked into the history of the online page and the
> section on in-place updates was added for Solr 6.5, when SOLR-5944 was
> released.
>
> So, unfortunately, unless someone else has a better option for
> pre-6.5, I believe it was not possible in 5.5.2.
>
> Cassandra
>
> On Wed, Jul 26, 2017 at 2:30 AM, elisabeth benoit
>  wrote:
> > Are in place updates available in solr 5.5.2, I find atomic updates in
> the
> > doc
> > https://archive.apache.org/dist/lucene/solr/ref-guide/
> apache-solr-ref-guide-5.5.pdf,
> > which redirects me to the page
> > https://cwiki.apache.org/confluence/display/solr/
> Updating+Parts+of+Documents#UpdatingPartsofDocuments-AtomicUpdates
> > .
> >
> > On that page, for in-place updates, it says
> >
> > the _*version*_ field is also a non-indexed, non-stored single valued
> > docValues field
> >
> > when I try this with solr 5.5.2 I get an error message
> >
> > org.apache.solr.common.SolrException:org.apache.solr.
> common.SolrException:
> > Unable to use updateLog: _version_ field must exist in schema, using
> > indexed=\"true\" or docValues=\"true\", stored=\"true\" and
> > multiValued=\"false\" (_version_ is not stored
> >
> >
> > What I'm looking for is a way to update one field of a doc without
> erasing
> > the non stored fields. Is this possible in solr 5.5.2?
> >
> > best regards,
> > Elisabeth
>


solr 5.5.2 bug in edismax pf2 when boosting term

2017-05-18 Thread elisabeth benoit
Hello,

I am using solr 5.5.2.

I am trying to give a lower score to frequent words in query.

The only way I've found so far is to do like

q=avenue^0.1 de champaubert village suisse 75015 paris

where avenue is a frequent word.

The problem is I'm using edismax, and when I add ^0.1 to avenue, it is not
considered anymore in pf2.

I am looking for a workaround, or another way to give a lower score to
frequent words in solr.

If anyone could help it would be great.

Elisabeth


solr 4.2.1 index gets slower over time

2014-03-31 Thread elisabeth benoit
Hello,

We are currently using solr 4.2.1. Our index is updated on a daily basis.
After noticing solr query time has increased (to twice its initial value)
without any change in index size or in solr configuration, we tried an
optimize on the index but it didn't fix our problem. We checked the garbage
collector, but everything seemed fine. What did in fact fix our problem was
to delete all documents and reindex from scratch.

It looks like over time our index gets "corrupted" and optimize doesn't fix
it. Does anyone have a clue how to investigate further this situation?


Elisabeth


Re: solr 4.2.1 index gets slower over time

2014-03-31 Thread elisabeth benoit
Hello,

Thanks for your answer.

We use JVisualVM. The CPU usage is very high (90%), but the GC activity
shows less than 0.01% average activity. Plus the heap usage stays low
(below 4G while the max heap size is 16G).

Do you have a different tool to suggest to check the GC? Do you think there
is something else me might not see?

Thanks again,
Elisabeth


2014-03-31 16:26 GMT+02:00 Shawn Heisey :

> On 3/31/2014 6:57 AM, elisabeth benoit wrote:
> > We are currently using solr 4.2.1. Our index is updated on a daily basis.
> > After noticing solr query time has increased (two times the initial size)
> > without any change in index size or in solr configuration, we tried an
> > optimize on the index but it didn't fix our problem. We checked the
> garbage
> > collector, but everything seemed fine. What did in fact fix our problem
> was
> > to delete all documents and reindex from scratch.
> >
> > It looks like over time our index gets "corrupted" and optimize doesn't
> fix
> > it. Does anyone have a clue how to investigate further this situation?
>
> That seems very odd.  I have one production copy of my index using
> 4.2.1, and it has been working fine for quite a long time.  We are
> transitioning to Solr 4.6.1 now, so the other copy is running that
> version.  We do occasionally do a full rebuild, but that is for index
> content, not for any problems.
>
> When you say you checked your garbage collector, what tools did you use?
>  I was having GC pause problems, but I didn't know it until I started
> using different tools.
>
> Thanks,
> Shawn
>
>


Re: solr 4.2.1 index gets slower over time

2014-04-01 Thread elisabeth benoit
Thanks a lot for your answers!

Shawn: our GC configuration has far fewer parameters defined, so we'll check
this out.

Dmitry, about the expungeDeletes option, we'll add that in the delete
process. But from what I read, this is done in the optimize process (cf.
http://lucene.472066.n3.nabble.com/Does-expungeDeletes-need-calling-during-an-optimize-td1214083.html).
Or maybe not?
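
(If I understand correctly, expungeDeletes can also be passed directly on a
commit, something like

curl 'http://localhost:8983/solr/core1/update?commit=true&expungeDeletes=true'

with our own host, port and core name, of course.)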

Thanks again,
Elisabeth


2014-04-01 7:52 GMT+02:00 Dmitry Kan :

> Hi,
>
> We have noticed something like this as well, but with older versions of
> solr, 3.4. In our setup we delete documents pretty often. Internally in
> Lucene, when a document is client requested to be deleted, it is not
> physically deleted, but only marked as "deleted". Our original optimization
> assumption was such that the "deleted" documents would get physically
> removed on each optimize command issued. We started to suspect it wasn't
> always true as the shards (especially relatively large shards) became
> slower over time. So we found out about the expungeDeletes option, which
> purges the "deleted" docs and is by default false. We have set it to true.
> If your solr update lifecycle includes frequent deletes, try this out.
>
> This of course does not override working towards finding better
> GCparameters.
>
> https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
>
>
> On Mon, Mar 31, 2014 at 3:57 PM, elisabeth benoit <
> elisaelisael...@gmail.com
> > wrote:
>
> > Hello,
> >
> > We are currently using solr 4.2.1. Our index is updated on a daily basis.
> > After noticing solr query time has increased (two times the initial size)
> > without any change in index size or in solr configuration, we tried an
> > optimize on the index but it didn't fix our problem. We checked the
> garbage
> > collector, but everything seemed fine. What did in fact fix our problem
> was
> > to delete all documents and reindex from scratch.
> >
> > It looks like over time our index gets "corrupted" and optimize doesn't
> fix
> > it. Does anyone have a clue how to investigate further this situation?
> >
> >
> > Elisabeth
> >
>
>
>
> --
> Dmitry
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
>


Re: Re: solr 4.2.1 index gets slower over time

2014-04-02 Thread elisabeth benoit
This sounds interesting, I'll check this out.
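
If I understand Markus correctly, that would be something along these lines in
solrconfig.xml (the value 3.0 being just an example):

<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <double name="reclaimDeletesWeight">3.0</double>
  </mergePolicy>
</indexConfig>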

Thanks!
Elisabeth


2014-04-02 8:54 GMT+02:00 Dmitry Kan :

> Thanks, Markus, that is useful.
> I'm guessing the higher the weight, the longer the op takes?
>
>
> On Tue, Apr 1, 2014 at 10:39 PM, Markus Jelsma
> wrote:
>
> > You may want to increase reclaimDeletesWeight for TieredMergePolicy from 2
> > to 3 or 4. By default it may keep too many deleted or updated docs in the
> > index. This can increase index size by 50%!! Dmitry Kan <
> > solrexp...@gmail.com> schreef: Elisabeth,
> >
> > Yes, I believe you are right in that the deletes are part of the optimize
> > process. If you delete often, you may consider (if not already) the
> > TieredMergePolicy, which is suited for this scenario. Check out this
> > relevant discussion I had with Lucene committers:
> > https://twitter.com/DmitryKan/status/399820408444051456
> >
> > HTH,
> >
> > Dmitry
> >
> >
> > On Tue, Apr 1, 2014 at 11:34 AM, elisabeth benoit <
> > elisaelisael...@gmail.com
> > > wrote:
> >
> > > Thanks a lot for your answers!
> > >
> > > Shawn. Our GC configuration has far less parameters defined, so we'll
> > check
> > > this out.
> > >
> > > Dimitry, about the expungeDeletes option, we'll add that in the delete
> > > process. But from what I read, this is done in the optimize process
> (cf.
> > >
> > >
> >
> http://lucene.472066.n3.nabble.com/Does-expungeDeletes-need-calling-during-an-optimize-td1214083.html
> > > ).
> > > Or maybe not?
> > >
> > > Thanks again,
> > > Elisabeth
> > >
> > >
> > > 2014-04-01 7:52 GMT+02:00 Dmitry Kan :
> > >
> > > > Hi,
> > > >
> > > > We have noticed something like this as well, but with older versions
> of
> > > > solr, 3.4. In our setup we delete documents pretty often. Internally
> in
> > > > Lucene, when a document is client requested to be deleted, it is not
> > > > physically deleted, but only marked as "deleted". Our original
> > > optimization
> > > > assumption was such that the "deleted" documents would get physically
> > > > removed on each optimize command issued. We started to suspect it
> > wasn't
> > > > always true as the shards (especially relatively large shards) became
> > > > slower over time. So we found out about the expungeDeletes option,
> > which
> > > > purges the "deleted" docs and is by default false. We have set it to
> > > true.
> > > > If your solr update lifecycle includes frequent deletes, try this
> out.
> > > >
> > > > This of course does not override working towards finding better
> > > > GCparameters.
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
> > > >
> > > >
> > > > On Mon, Mar 31, 2014 at 3:57 PM, elisabeth benoit <
> > > > elisaelisael...@gmail.com
> > > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > We are currently using solr 4.2.1. Our index is updated on a daily
> > > basis.
> > > > > After noticing solr query time has increased (two times the initial
> > > size)
> > > > > without any change in index size or in solr configuration, we tried
> > an
> > > > > optimize on the index but it didn't fix our problem. We checked the
> > > > garbage
> > > > > collector, but everything seemed fine. What did in fact fix our
> > problem
> > > > was
> > > > > to delete all documents and reindex from scratch.
> > > > >
> > > > > It looks like over time our index gets "corrupted" and optimize
> > doesn't
> > > > fix
> > > > > it. Does anyone have a clue how to investigate further this
> > situation?
> > > > >
> > > > >
> > > > > Elisabeth
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Dmitry
> > > > Blog: http://dmitrykan.blogspot.com
> > > > Twitter: http://twitter.com/dmitrykan
> > > >
> > >
> >
> >
> >
> > --
> > Dmitry
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan
> >
>
>
>
> --
> Dmitry
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
>


permissive mm value and efficient spellchecking

2014-05-14 Thread elisabeth benoit
Hello,

I'm using solr 4.2.1.

I use a very permissive value for mm, to be able to find results even if the
request contains non-relevant words.

At the same time, I'd like to be able to do some efficient spellchecking
with the DirectSolrSpellChecker.

So for instance, if the user searches for "rue de Chraonne Paris", where
Chraonne is misspelled, because of my permissive mm value I get more than
100 000 results containing words "rue" and "Paris" ("de" is a stopword),
which are very frequent terms in my index, but no spellcheck correction for
Chraonne. If I set mm=3, then I get the expected spellcheck correction
value: "rue de Charonne Paris".

Is there a way to achieve my two goals in a single solr request?

Thanks,
Elisabeth


Re: permissive mm value and efficient spellchecking

2014-05-16 Thread elisabeth benoit
ok, thanks a lot, I'll check that out.
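
For the archives: as I understand SOLR-3211, the idea is to keep the
permissive mm on the main query and override it only for the collation test
queries, e.g.

spellcheck.collate=true
&spellcheck.maxCollationTries=5
&spellcheck.collateParam.mm=100%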


2014-05-14 14:20 GMT+02:00 Markus Jelsma :

> Elisabeth, i think you are looking for SOLR-3211 that introduced
> spellcheck.collateParam.* to override e.g. dismax settings.
>
> Markus
>
> -Original message-
> From:elisabeth benoit 
> Sent:Wed 14-05-2014 14:01
> Subject:permissive mm value and efficient spellchecking
> To:solr-user@lucene.apache.org;
> Hello,
>
> I'm using solr 4.2.1.
>
> I use a very permissive value for mm, to be able to find results even if
> request contains non relevant words.
>
> At the same time, I'd like to be able to do some efficient spellcheking
> with solrdirectspellchecker.
>
> So for instance, if user searches for "rue de Chraonne Paris", where
> Chraonne is mispelled, because of my permissive mm value I get more than
> 100 000 results containing words "rue" and "Paris" ("de" is a stopword),
> which are very frequent terms in my index, but no spellcheck correction for
> Chraonne. If I set mm=3, then I get the expected spellcheck correction
> value: "rue de Charonne Paris".
>
> Is there a way to achieve my two goals in a single solr request?
>
> Thanks,
> Elisabeth
>


split field on json update

2014-06-12 Thread elisabeth benoit
Hello,

Is it possible, in solr 4.2.1, to split a multivalued field with a json
update as it is possible to do with a csv update?

with csv
/update/csv?f.address.split=true&f.address.separator=%2C&commit=true

with json (using a post)
/update/json

Thanks,
Elisabeth


Re: split field on json update

2014-06-12 Thread elisabeth benoit
Thanks for your answer,
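
For the archives, my understanding is a chain along these lines, where
split-address.js is a hypothetical script that splits the incoming address
value on commas before indexing:

<updateRequestProcessorChain name="split-address">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">split-address.js</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>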

best regards,
Elisabeth


2014-06-12 14:07 GMT+02:00 Alexandre Rafalovitch :

> There is always UpdateRequestProcessor.
>
> Regards,
> Alex
> On 12/06/2014 7:05 pm, "elisabeth benoit" 
> wrote:
>
> > Hello,
> >
> > Is it possible, in solr 4.2.1, to split a multivalued field with a json
> > update as it is possible do to with a csv update?
> >
> > with csv
> > /update/csv?f.address.split=true&f.address.separator=%2C&commit=true
> >
> > with json (using a post)
> > /update/json
> >
> > Thanks,
> > Elisabeth
> >
>


spatial search: find result in bbox OR first result outside bbox

2014-07-22 Thread elisabeth benoit
Hello,

I am using solr 4.2.1. I have the following use case.

I should find results inside the bbox OR, if there are none, the first result
outside the bbox within a 1000 km distance. I was wondering what is the best
way to
proceed.

I was considering doing a geofilt search from the center of my bounding box
and post filtering results.

fq={!geofilt sfield=store}&pt=45.15,-93.85&d=1000

From a performance point of view I don't think it's a good solution though,
since solr will have to calculate the distance for every document, then sort.

I was wondering if there was another way to do this and avoid sending more
than one request to solr.

Thanks,
Elisabeth


Re: spatial search: find result in bbox OR first result outside bbox

2014-07-25 Thread elisabeth benoit
Thanks a lot for your answer David!

I'll check that out.
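
Something like this, if I follow (sfield and point taken from my example
above; the bbox corners are made up):

1) q=...&fq=store:[44.9,-94.0 TO 45.4,-93.7]
2) if that returns nothing:
   q=...&fq={!geofilt sfield=store pt=45.15,-93.85 d=1000}
      &sort=geodist(store,45.15,-93.85) asc&rows=1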

Elisabeth


2014-07-24 20:28 GMT+02:00 david.w.smi...@gmail.com <
david.w.smi...@gmail.com>:

> Hi Elisabeth,
>
> Sorry for not responding sooner; I forgot.
>
> You’re in need of some spatial nearest-neighbor code I wrote but it isn’t
> open-sourced yet.  It works on the RPT grid.
>
> Any way, you should consider doing this in two searches: the first query
> tries the bbox provided, and if that returns nothing then issue a second
> for the closest within the a 1000km distance.  The first query is
> straight-forward as documented.  The second would be close to what you gave
> in your example but sort by distance and return rows=1.  It will *not*
> compute the distance to every document, just those within the 1000km radius
> plus some grid internal grid squares *if* you use spatial RPT
> (“location_rpt” in the example schema).  But use LatLonType for optimal
> sorting performance, not RPT.
>
> With respect to doing this in one search vs two, that would involve writing
> a custom request handler.  I have a patch to make this easier:
> https://issues.apache.org/jira/browse/SOLR-5005.  If in your case there
> are
> absolutely no other filters and it’s not a distributed search (no
> sharding), then you could approach this with a custom query parser that
> generates and executes one query to know if it should return that query or
> return the fallback.
>
> Please let me know how this goes.
>
> ~ David Smiley
> Freelance Apache Lucene/Solr Search Consultant/Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Jul 22, 2014 at 3:12 AM, elisabeth benoit <
> elisaelisael...@gmail.com
> > wrote:
>
> > Hello,
> >
> > I am using solr 4.2.1. I have the following use case.
> >
> > I should find results inside bbox OR if there is none, first result
> outside
> > bbox within a 1000 km distance. I was wondering what is the best way to
> > proceed.
> >
> > I was considering doing a geofilt search from the center of my bounding
> box
> > and post filtering results.
> >
> > fq={!geofilt sfield=store}&pt=45.15,-93.85&d=1000
> >
> > From a performance point of view I don't think it's a good solution
> though,
> > since solr will have to calculate every document distance, then sort.
> >
> > I was wondering if there was another way to do this and avoid sending
> more
> > than one request to solr.
> >
> > Thanks,
> > Elisabeth
> >
>


Re: How to handle multiple sub second updates to same SOLR Document

2014-01-26 Thread Elisabeth Benoit
Sent from my iPhone

Le 26 janv. 2014 à 06:13, Shalin Shekhar Mangar  a 
écrit :

> There is no timestamp versioning as such in Solr but there is a new
> document based versioning which will allow you to specify your own
> (externally assigned) versions.
> 
> See the "Document Centric Versioning Constraints" section at
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
> 
> Sub-second soft auto commit can be expensive but it is hard to say if
> it will be too expensive for your use-case. You must benchmark it
> yourself.
> 
> On Sat, Jan 25, 2014 at 11:51 PM, christopher palm  wrote:
>> I have a scenario where the same SOLR document is being updated several
>> times within a few ms of each other due to how the source system is sending
>> in field updates on the document.
>> 
>> The problem I am trying to solve is that the order of these updates isn’t
>> guaranteed once the multi threaded SOLRJ client starts sending them to
>> SOLR, and older updates are overlaying the newer updates on the same
>> document.
>> 
>> I would like to use a timestamp versioning so that the older document
>> change won’t be sent into SOLR, but I didn’t see any automated way of doing
>> this based on the document timestamp.
>> 
>> Is there a good way to handle this scenario in SOLR 4.6?
>> 
>> It seems that we would have to be soft auto committing with a  subsecond
>> level as well, is that even possible?
>> 
>> Thanks,
>> 
>> Chris
> 
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.


looking for a solr/search expert in Paris

2014-09-03 Thread elisabeth benoit
Hello,


We are looking for a solr consultant to help us with our devs using solr.
We've been working on this for a little while, and we feel we need an expert's
point of view on what we're doing, someone who could give us insights about
our solr conf, performance issues, error handling issues (a big thing). Well,
everything.

The enterprise is in the Paris (France) area. Any suggestion is welcome.

Thanks,
Elisabeth


Re: looking for a solr/search expert in Paris

2014-09-03 Thread elisabeth benoit
Thanks a lot for your answers.

Best regards,
Elisabeth


2014-09-03 17:18 GMT+02:00 Jack Krupansky :

> Don't forget to check out the Solr Support wiki where consultants
> advertise their services:
> http://wiki.apache.org/solr/Support
>
> And any Solr or Lucene consultants on this mailing list should be sure
> that they are "registered" on that support wiki. Hey, it's free! And be
> sure to keep your listing up to date, including regional availability and
> any specialties.
>
> -- Jack Krupansky
>
> -Original Message- From: elisabeth benoit
> Sent: Wednesday, September 3, 2014 4:02 AM
> To: solr-user@lucene.apache.org
> Subject: looking for a solr/search expert in Paris
>
>
> Hello,
>
>
> We are looking for a solr consultant to help us with our devs using solr.
> We've been working on this for a little while, and we feel we need an
> expert point of view on what we're doing, who could give us insights about
> our solr conf, performance issues, error handling issues (big thing). Well
> everything.
>
> The entreprise is in the Paris (France) area. Any suggestion is welcomed.
>
> Thanks,
> Elisabeth
>


per field similarity not working with solr 4.2.1

2014-10-09 Thread elisabeth benoit
Hello,

I am using Solr 4.2.1 and I've tried to use a per-field similarity, as
described in

https://apache.googlesource.com/lucene-solr/+/c5bb5cd921e1ce65e18eceb55e738f40591214f0/solr/core/src/test-files/solr/collection1/conf/schema-sim.xml

so in my schema I have

<similarity class="solr.SchemaSimilarityFactory"/>

and a custom similarity in the fieldtype definition

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <similarity class="com.company.lbs.solr.search.similarity.NoTFSimilarity"/>
  ...

but it is not working

when I send a request with debugQuery=on, instead of [NoTFSimilarity],
I see []

or to give an example, I have


weight(catchall:bretagn in 2575) []

instead of weight(catchall:bretagn in 2575) [NoTFSimilarity]

Anyone has a clue what I am doing wrong?

Best regards,
Elisabeth


does one need to reindex when changing similarity class

2014-10-09 Thread elisabeth benoit
I've read somewhere that we do have to reindex when changing similarity
class. Is that right?

Thanks again,
Elisabeth


Re: per field similarity not working with solr 4.2.1

2014-10-09 Thread elisabeth benoit
Thanks for the information!

I've been struggling with that debug output. Any other way to know for sure
my similarity class is being used?

Thanks again,
Elisabeth

2014-10-09 13:03 GMT+02:00 Markus Jelsma :

> Hi - it should work, not seeing your implemenation in the debug output is
> a known issue.
>
>
> -Original message-
> > From:elisabeth benoit 
> > Sent: Thursday 9th October 2014 12:22
> > To: solr-user@lucene.apache.org
> > Subject: per field similarity not working with solr 4.2.1
> >
> > Hello,
> >
> > I am using Solr 4..2.1 and I've tried to use a per field similarity, as
> > described in
> >
> >
> https://apache.googlesource.com/lucene-solr/+/c5bb5cd921e1ce65e18eceb55e738f40591214f0/solr/core/src/test-files/solr/collection1/conf/schema-sim.xml
> >
> > so in my schema I have
> >
> > <similarity class="solr.SchemaSimilarityFactory"/>
> >
> > and a custom similarity in the fieldtype definition
> >
> > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
> >   <similarity class="com.company.lbs.solr.search.similarity.NoTFSimilarity"/>
> >
> > ...
> >
> > but it is not working
> >
> > when I send a request with debugQuery=on, instead of [
> > NoTFSimilarity], I see []
> >
> > or to give an example, I have
> >
> >
> > weight(catchall:bretagn in 2575) []
> >
> > instead of weight(catchall:bretagn in 2575) [NoTFSimilarity]
> >
> > Anyone has a clue what I am doing wrong?
> >
> > Best regards,
> > Elisabeth
> >
>


Re: per field similarity not working with solr 4.2.1

2014-10-09 Thread elisabeth benoit
ok thanks.


I think something is not working here (I'm quite sure my similarity class
is not being used, because when I use
SchemaSimilarityFactory and a custom fieldtype similarity definition with
NoTFSimilarity, I don't get the same scoring as when I use NoTFSimilarity
as the global similarity; but I'll try to gather more evidence).
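
If it helps, a no-tf similarity is roughly this sketch (an assumption on my
part; our real class may differ), so the quickest check is to drop a
System.out.println in tf() as Markus suggests:

import org.apache.lucene.search.similarities.DefaultSimilarity;

public class NoTFSimilarity extends DefaultSimilarity {
    @Override
    public float tf(float freq) {
        // ignore term frequency: any positive freq scores like one occurrence
        return freq > 0 ? 1.0f : 0.0f;
    }
}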

Thanks again,
Elisabeth

2014-10-09 15:05 GMT+02:00 Markus Jelsma :

> Well, it is either the output of your calculation or writing something to
> System.out
> Markus
>
>
>
> -Original message-
> > From:elisabeth benoit 
> > Sent: Thursday 9th October 2014 13:31
> > To: solr-user@lucene.apache.org
> > Subject: Re: per field similarity not working with solr 4.2.1
> >
> > Thanks for the information!
> >
> > I've been struggling with that debug output. Any other way to know for
> sure
> > my similarity class is being used?
> >
> > Thanks again,
> > Elisabeth
> >
> > 2014-10-09 13:03 GMT+02:00 Markus Jelsma :
> >
> > > Hi - it should work, not seeing your implemenation in the debug output
> is
> > > a known issue.
> > >
> > >
> > > -Original message-
> > > > From:elisabeth benoit 
> > > > Sent: Thursday 9th October 2014 12:22
> > > > To: solr-user@lucene.apache.org
> > > > Subject: per field similarity not working with solr 4.2.1
> > > >
> > > > Hello,
> > > >
> > > > I am using Solr 4..2.1 and I've tried to use a per field similarity,
> as
> > > > described in
> > > >
> > > >
> > >
> https://apache.googlesource.com/lucene-solr/+/c5bb5cd921e1ce65e18eceb55e738f40591214f0/solr/core/src/test-files/solr/collection1/conf/schema-sim.xml
> > > >
> > > > so in my schema I have
> > > >
> > > > 
> > > > 
> > > >
> > > > and a custom similarity in fieldtype definition
> > > >
> > > >  > > > positionIncrementGap="100">
> > > >   > > > class="com.company.lbs.solr.search.similarity.NoTFSimilarity"/>
> > > >
> > > > ...
> > > >
> > > > but it is not working
> > > >
> > > > when I send a request with debugQuery=on, instead of [
> > > > NoTFSimilarity], I see []
> > > >
> > > > or to give an example, I have
> > > >
> > > >
> > > > weight(catchall:bretagn in 2575) []
> > > >
> > > > instead of weight(catchall:bretagn in 2575) [NoTFSimilarity]
> > > >
> > > > Anyone has a clue what I am doing wrong?
> > > >
> > > > Best regards,
> > > > Elisabeth
> > > >
> > >
> >
>


Re: does one need to reindex when changing similarity class

2014-10-14 Thread elisabeth benoit
thanks a lot for your answers!

2014-10-14 6:10 GMT+02:00 Jack Krupansky :

> To correct myself, the selected Similarity class can have a computeNorm
> method that calculates the "norm" value that will be stored in the index
> when the document is indexed, so changing the Similarity class will require
> reindexing if the implementation of the computeNorm method is different.
>
> -- Jack Krupansky
>
> -Original Message- From: Markus Jelsma
> Sent: Monday, October 13, 2014 5:06 PM
>
> To: solr-user@lucene.apache.org
> Subject: RE: does one need to reindex when changing similarity class
>
> Yes, if the replacing similarity has a different implementation on norms,
> you should reindex or gradually update all documents within decent time.
>
>
>
> -Original message-
>
>> From:Ahmet Arslan 
>> Sent: Thursday 9th October 2014 18:27
>> To: solr-user@lucene.apache.org
>> Subject: Re: does one need to reindex when changing similarity class
>>
>> How about SweetSpotSimilarity? Length norm is saved at index time?
>>
>>
>>
>> On Thursday, October 9, 2014 5:44 PM, Jack Krupansky <
>> j...@basetechnology.com> wrote:
>> The similarity class is only invoked at query time, so it doesn't
>> participate in indexing.
>>
>> -- Jack Krupansky
>>
>>
>>
>>
>> -Original Message- From: Markus Jelsma
>> Sent: Thursday, October 9, 2014 6:59 AM
>> To: solr-user@lucene.apache.org
>> Subject: RE: does one need to reindex when changing similarity class
>>
>> Hi - no you don't have to, although maybe if you changed on how norms are
>> encoded.
>> Markus
>>
>>
>>
>> -Original message-
>> > From:elisabeth benoit 
>> > Sent: Thursday 9th October 2014 12:26
>> > To: solr-user@lucene.apache.org
>> > Subject: does one need to reindex when changing similarity class
>> >
>> > I've read somewhere that we do have to reindex when changing similarity
>> > class. Is that right?
>> >
>> > Thanks again,
>> > Elisabeth
>> >
>>
>>
>


fuzzy search and edismax: how to do not sum up

2014-10-15 Thread elisabeth benoit
Hello all,

We are using solr 4.2.1 (but planning to switch to solr 4.10 very soon).

We are trying to do approximate search using the ~ operator.

We use a catchall_light field without stemming (so as not to mix fuzzy
matching and stemming).

We send a request to solr using the fuzzy operator on non-"frequent" words,

for instance

q=catchall_light:(lyon 69002~1)

our handler uses edismax

that query gives a higher score to the document Lyon, which has postal codes
69001, 69002, 69003, 69004, ...

than to other documents having only Lyon and the postal code 69002 (the debug
output is below),

but we do not want to sum up all the scores for the Lyon document.

Does anyone know if it is possible to change that?

Best regards,
Elisabeth


here is the debug output for Lyon
(we use idf for that field but want to get rid of it)

15.728481 = (MATCH) sum of:
  1.2349477 = (MATCH) weight(catchall_light:lyon in 707758)
[NoTFSimilarity], result of:
1.2349477 = score(doc=707758,freq=1.0 = termFreq=1.0
), product of:
  0.13427915 = queryWeight, product of:
9.196869 = idf(docFreq=2924, maxDocs=10616483)
0.014600528 = queryNorm
  9.196869 = fieldWeight in 707758, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
9.196869 = idf(docFreq=2924, maxDocs=10616483)
1.0 = fieldNorm(doc=707758)
  14.493534 = (MATCH) sum of:
1.576392 = (MATCH) weight(catchall_light:69001^0.8 in 707758)
[NoTFSimilarity], result of:
  1.576392 = score(doc=707758,freq=1.0 = termFreq=1.0
), product of:
0.13569424 = queryWeight, product of:
  0.8 = boost
  11.617237 = idf(docFreq=259, maxDocs=10616483)
  0.014600528 = queryNorm
11.617237 = fieldWeight in 707758, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  11.617237 = idf(docFreq=259, maxDocs=10616483)
  1.0 = fieldNorm(doc=707758)
1.8904426 = (MATCH) weight(catchall_light:69002 in 707758)
[NoTFSimilarity], result of:
  1.8904426 = score(doc=707758,freq=1.0 = termFreq=1.0
), product of:
0.16613688 = queryWeight, product of:
  11.378826 = idf(docFreq=329, maxDocs=10616483)
  0.014600528 = queryNorm
11.378826 = fieldWeight in 707758, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  11.378826 = idf(docFreq=329, maxDocs=10616483)
  1.0 = fieldNorm(doc=707758)
1.460347 = (MATCH) weight(catchall_light:69003^0.8 in 707758)
[NoTFSimilarity], result of:
  1.460347 = score(doc=707758,freq=1.0 = termFreq=1.0
), product of:
0.13060425 = queryWeight, product of:
  0.8 = boost
  11.181466 = idf(docFreq=401, maxDocs=10616483)
  0.014600528 = queryNorm
11.181466 = fieldWeight in 707758, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  11.181466 = idf(docFreq=401, maxDocs=10616483)
  1.0 = fieldNorm(doc=707758)
1.7109065 = (MATCH) weight(catchall_light:69004^0.8 in 707758)
[NoTFSimilarity], result of:
  1.7109065 = score(doc=707758,freq=1.0 = termFreq=1.0
), product of:
0.14136517 = queryWeight, product of:
  0.8 = boost
  12.102744 = idf(docFreq=159, maxDocs=10616483)
  0.014600528 = queryNorm
12.102744 = fieldWeight in 707758, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  12.102744 = idf(docFreq=159, maxDocs=10616483)
  1.0 = fieldNorm(doc=707758)
1.5255939 = (MATCH) weight(catchall_light:69005^0.8 in 707758)
[NoTFSimilarity], result of:
  1.5255939 = score(doc=707758,freq=1.0 = termFreq=1.0
), product of:
0.13349001 = queryWeight, product of:
  0.8 = boost
  11.428525 = idf(docFreq=313, maxDocs=10616483)
  0.014600528 = queryNorm
11.428525 = fieldWeight in 707758, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  11.428525 = idf(docFreq=313, maxDocs=10616483)
  1.0 = fieldNorm(doc=707758)
1.6497903 = (MATCH) weight(catchall_light:69006^0.8 in 707758)
[NoTFSimilarity], result of:
  1.6497903 = score(doc=707758,freq=1.0 = termFreq=1.0
), product of:
0.13881733 = queryWeight, product of:
  0.8 = boost
  11.884614 = idf(docFreq=198, maxDocs=10616483)
  0.014600528 = queryNorm
11.884614 = fieldWeight in 707758, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  11.884614 = idf(docFreq=198, maxDocs=10616483)
  1.0 = fieldNorm(doc=707758)
1.5892421 = (MATCH) weight(catchall_light:69007^0.8 in 707758)
[NoTFSimilarity], result of:
  1.5892421 = score(doc=707758,freq=1.0 = termFreq=1.0
), product of:
0.13624617 = queryWeight, product of:
  0.8 = boost
  11.66449 = idf(docFreq=247, maxDocs=10616483)
  0.014600528 = queryNorm
11.66449 = 

prefix length in fuzzy search solr 4.10.1

2014-10-30 Thread elisabeth benoit
Hello all,

Is there a parameter in the solr 4.10.1 api allowing the user to fix the
prefix length in fuzzy search?

Best regards,
Elisabeth


Re: prefix length in fuzzy search solr 4.10.1

2014-11-01 Thread elisabeth benoit
ok, thanks for the answer.

best regards,
Elisabeth

2014-10-31 22:04 GMT+01:00 Jack Krupansky :

> No, but it is a reasonable request, as a global default, a
> collection-specific default, a request-specific default, and on an
> individual fuzzy term.
>
> -- Jack Krupansky
>
> -Original Message- From: elisabeth benoit
> Sent: Thursday, October 30, 2014 6:07 AM
> To: solr-user@lucene.apache.org
> Subject: prefix length in fuzzy search solr 4.10.1
>
>
> Hello all,
>
> Is there a parameter in solr 4.10.1 api allowing user to fix prefix length
> in fuzzy search.
>
> Best regards,
> Elisabeth
>


autocomplete_edge type split words

2013-09-25 Thread elisabeth benoit
Hello,

I am using solr 4.2.1 and I have an autocomplete_edge type defined in
schema.xml

<fieldType name="autocomplete_edge" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="..."/>
    <filter class="..." pattern="..." replacement=" " replace="all"/>
    <filter class="..." minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="..."/>
    <filter class="..." pattern="..." replacement=" " replace="all"/>
    <filter class="..." pattern="^(.{30})(.*)?" replacement="$1" replace="all"/>
  </analyzer>
</fieldType>


When I have a request with more than one word, for instance "rue de la", my
request doesn't match my autocomplete_edge field unless I use quotes
around the query. In other words q=rue de la doesn't work and q="rue de la"
works.

I've checked the request with debugQuery=on, and I can see that in the first
case the query is split into words, and I don't understand why, since my field
type uses KeywordTokenizerFactory.

Does anyone have a clue on how I can request my field without using quotes?

Thanks,
Elisabeth


Re: autocomplete_edge type split words

2013-09-27 Thread elisabeth benoit
Thanks for your answer.

So I guess if someone wants to search on two fields, one with a phrase query
and one with a "normal" query (split into words), one has to find a way to
send the query twice: once with quotes and once without...

Best regards,
Elisabeth


2013/9/27 Erick Erickson 

> This is a classic issue where there's confusion between
> the query parser and field analysis.
>
> Early in the process the query parser has to take the input
> and break it up. that's how, for instance, a query like
> text:term1 term2
> gets parsed as
> text:term1 defaultfield:term2
> This happens long before the terms get to the analysis chain
> for the field.
>
> So your only options are to either quote the string or
> escape the spaces.
>
> Best,
> Erick
>
> On Wed, Sep 25, 2013 at 9:24 AM, elisabeth benoit
>  wrote:
> > Hello,
> >
> > I am using solr 4.2.1 and I have an autocomplete_edge type defined in
> > schema.xml
> >
> > <fieldType name="autocomplete_edge" class="solr.TextField">
> >   <analyzer type="index">
> >     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="..."/>
> >     <filter class="..." pattern="..." replacement=" " replace="all"/>
> >     <filter class="..." minGramSize="1"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="..."/>
> >     <filter class="..." pattern="..." replacement=" " replace="all"/>
> >     <filter class="..." pattern="^(.{30})(.*)?" replacement="$1" replace="all"/>
> >   </analyzer>
> > </fieldType>
> >
> > When I have a request with more then one word, for instance "rue de la",
> my
> > request doesn't match with my autocomplete_edge field unless I use quotes
> > around the query. In other words q=rue de la doesnt work and q="rue de
> la"
> > works.
> >
> > I've check the request with debugQuery=on, and I can see in first case,
> the
> > query is splitted into words, and I don't understand why since my field
> > type uses KeywordTokenizerFactory.
> >
> > Does anyone have a clue on how I can request my field without using
> quotes?
> >
> > Thanks,
> > Elisabeth
>


Re: autocomplete_edge type split words

2013-09-27 Thread elisabeth benoit
Yes!

what I've done is set autoGeneratePhraseQueries to true for my field, then
give it a boost (bq=myAutompleteEdgeNGramField="my query with spaces"^50).
This only worked with autoGeneratePhraseQueries=true, for a reason I didn't
understand.

since when I did

q=myAutompleteEdgeNGramField:"my query with spaces", I didn't need
autoGeneratePhraseQueries set to true.

and, another thing is when I tried

q=myAutocompleteNGramField:(my query with spaces) OR
myAutompleteEdgeNGramField="my
query with spaces"

(with a request handler with edismax and default operator field = AND), the
request on myAutocompleteNGramField would OR the grams, so I had to put an
AND (myAutocompleteNGramField:(my AND query AND with AND spaces)), which
was pretty ugly.

I don't always understand exactly what is going on. If you have a pointer
to some text I could read to get more insights about this, please let me
know.

Thanks again,
Best regards,
Elisabeth
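
The approach above can be sketched as a single edismax request: the main query stays word-split, and the bq parameter adds the quoted phrase boost on the edge-ngram field. The boost value and core URL are assumptions:

import requests

params = {
    "defType": "edismax",
    "q": "my query with spaces",  # normal, word-split query
    "bq": 'myAutompleteEdgeNGramField:"my query with spaces"^50',  # phrase boost
    "wt": "json",
}
requests.get("http://localhost:8983/solr/collection1/select", params=params)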




2013/9/27 Erick Erickson 

> Have you looked at "autoGeneratePhraseQueries"? That might help.
>
> If that doesn't work, you can always do something like add an OR clause
> like
> OR "original query"
> and optionally boost it high. But I'd start with the autoGenerate bits.
>
> Best,
> Erick
>
>
> On Fri, Sep 27, 2013 at 7:37 AM, elisabeth benoit
>  wrote:
> > Thanks for your answer.
> >
> > So I guess if someone wants to search on two fields, one with a phrase query
> > and one with a "normal" query (split into words), one has to find a way to
> > send the query twice: once with quotes and once without...
> >
> > Best regards,
> > Elisabeth
> >
> >
> > 2013/9/27 Erick Erickson 
> >
> >> This is a classic issue where there's confusion between
> >> the query parser and field analysis.
> >>
> >> Early in the process the query parser has to take the input
> >> and break it up. that's how, for instance, a query like
> >> text:term1 term2
> >> gets parsed as
> >> text:term1 defaultfield:term2
> >> This happens long before the terms get to the analysis chain
> >> for the field.
> >>
> >> So your only options are to either quote the string or
> >> escape the spaces.
> >>
> >> Best,
> >> Erick
> >>
> >> On Wed, Sep 25, 2013 at 9:24 AM, elisabeth benoit
> >>  wrote:
> >> > Hello,
> >> >
> >> > I am using solr 4.2.1 and I have a autocomplete_edge type defined in
> >> > schema.xml
> >> >
> >> >
> >> > 
> >> >   
> >> >  >> > mapping="mapping-ISOLatin1Accent.txt"/>
> >> > 
> >> > 
> >> >  >> > replacement=" " replace="all"/>
> >> >  >> > minGramSize="1"/>
> >> >
> >> >   
> >> >  >> > mapping="mapping-ISOLatin1Accent.txt"/>
> >> > 
> >> > 
> >> >  >> > replacement=" " replace="all"/>
> >> >   >> > pattern="^(.{30})(.*)?" replacement="$1" replace="all"/>
> >> >   
> >> > 
> >> >
> >> > When I have a request with more than one word, for instance "rue de la",
> >> > my request doesn't match on my autocomplete_edge field unless I use
> >> > quotes around the query. In other words q=rue de la doesn't work and
> >> > q="rue de la" works.
> >> >
> >> > I've checked the request with debugQuery=on, and I can see that in the
> >> > first case the query is split into words, and I don't understand why,
> >> > since my field type uses KeywordTokenizerFactory.
> >> >
> >> > Does anyone have a clue on how I can request my field without using
> >> quotes?
> >> >
> >> > Thanks,
> >> > Elisabeth
> >>
>


Re: autocomplete_edge type split words

2013-09-30 Thread elisabeth benoit
In fact, I've removed autoGeneratePhraseQueries=true, and it doesn't
change anything. The behaviour is the same with or without it (i.e. the request
with debugQuery=on is the same).

Thanks for your comments.

Best,
Elisabeth


2013/9/28 Erick Erickson 

> You've probably been doing this right along, but adding
> debug=query will show the parsed query.
>
> I really question though, your apparent combination of
> autoGeneratePhraseQuery what looks like an ngram field.
> I'm not at all sure how those would interact...
>
> Best,
> Erick
>
> On Fri, Sep 27, 2013 at 10:12 AM, elisabeth benoit
>  wrote:
> > Yes!
> >
> > what I've done is set autoGeneratePhraseQueries to true for my field,
> then
> > give it a boost (bq=myAutompleteEdgeNGramField="my query with
> spaces"^50).
> > This only worked with autoGeneratePhraseQueries=true, for a reason I
> didn't
> > understand.
> >
> > since when I did
> >
> > q= myAutompleteEdgeNGramField="my query with spaces", I didn't need
> > autoGeneratePhraseQueries
> > set to true.
> >
> > and, another thing is when I tried
> >
> > q=myAutocompleteNGramField:(my query with spaces) OR
> > myAutompleteEdgeNGramField="my
> > query with spaces"
> >
> > (with a request handler with edismax and default operator field = AND),
> the
> > request on myAutocompleteNGramField would OR the grams, so I had to put
> an
> > AND (myAutocompleteNGramField:(my AND query AND with AND spaces)), which
> > was pretty ugly.
> >
> > I don't always understand what is exactly going on. If you have a pointer
> > to some text I could read to get more insights about this, please let me
> > know.
> >
> > Thanks again,
> > Best regards,
> > Elisabeth
> >
> >
> >
> >
> > 2013/9/27 Erick Erickson 
> >
> >> Have you looked at "autoGeneratePhraseQueries"? That might help.
> >>
> >> If that doesn't work, you can always do something like add an OR clause
> >> like
> >> OR "original query"
> >> and optionally boost it high. But I'd start with the autoGenerate bits.
> >>
> >> Best,
> >> Erick
> >>
> >>
> >> On Fri, Sep 27, 2013 at 7:37 AM, elisabeth benoit
> >>  wrote:
> >> > Thanks for your answer.
> >> >
> >> > So I guess if someone wants to search on two fields, one with a phrase
> >> > query and one with a "normal" query (split into words), one has to find
> >> > a way to send the query twice: once with quotes and once without...
> >> >
> >> > Best regards,
> >> > Elisabeth
> >> >
> >> >
> >> > 2013/9/27 Erick Erickson 
> >> >
> >> >> This is a classic issue where there's confusion between
> >> >> the query parser and field analysis.
> >> >>
> >> >> Early in the process the query parser has to take the input
> >> >> and break it up. that's how, for instance, a query like
> >> >> text:term1 term2
> >> >> gets parsed as
> >> >> text:term1 defaultfield:term2
> >> >> This happens long before the terms get to the analysis chain
> >> >> for the field.
> >> >>
> >> >> So your only options are to either quote the string or
> >> >> escape the spaces.
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Wed, Sep 25, 2013 at 9:24 AM, elisabeth benoit
> >> >>  wrote:
> >> >> > Hello,
> >> >> >
> >> >> > I am using solr 4.2.1 and I have a autocomplete_edge type defined
> in
> >> >> > schema.xml
> >> >> >
> >> >> >
> >> >> > 
> >> >> >   
> >> >> >  >> >> > mapping="mapping-ISOLatin1Accent.txt"/>
> >> >> > 
> >> >> > 
> >> >> >  >> >> > replacement=" " replace="all"/>
> >> >> >  >> >> > minGramSize="1"/>
> >> >> >
> >> >> >   
> >> >> >  >> >> > mapping="mapping-ISOLatin1Accent.txt"/>
> >> >> > 
> >> >> > 
> >> >> >  >> >> > replacement=" " replace="all"/>
> >> >> >   >> >> > pattern="^(.{30})(.*)?" replacement="$1" replace="all"/>
> >> >> >   
> >> >> > 
> >> >> >
> >> >> > When I have a request with more than one word, for instance "rue de
> >> >> > la", my request doesn't match on my autocomplete_edge field unless I
> >> >> > use quotes around the query. In other words q=rue de la doesn't work
> >> >> > and q="rue de la" works.
> >> >> >
> >> >> > I've checked the request with debugQuery=on, and I can see that in
> >> >> > the first case the query is split into words, and I don't understand
> >> >> > why, since my field type uses KeywordTokenizerFactory.
> >> >> >
> >> >> > Does anyone have a clue on how I can request my field without using
> >> >> quotes?
> >> >> >
> >> >> > Thanks,
> >> >> > Elisabeth
> >> >>
> >>
>


no result with q

2011-09-08 Thread elisabeth benoit
Hello,

I have a query

/select?&q=49&q.alt=*:*&fq=NAME_ANALYZED:decorasol AND WAY_ANALYZED:rue
charonne AND (TOWN_ANALYZED:paris OR
DEPARTMENT_ANALYZED:paris)&rows=50&fl=*,score&

returning no answer because of the q=49 parameter.

The query

/select?&q=&q.alt=*:*&fq=NAME_ANALYZED:decorasol AND WAY_ANALYZED:rue
charonne AND (TOWN_ANALYZED:paris OR
DEPARTMENT_ANALYZED:paris)&rows=50&fl=*,score&


(the only difference is "q=" instead of "q=49")

returns the answer I'm expecting

Is there a way to tell Solr to only use fq and ignore q if there is no answer?
Or am I doomed to send a first request, realize I've got no answer, and then
send a second request?

I guess so, but just checking, just in case.

Thanks,
Elisabeth


Re: no result with q

2011-09-08 Thread elisabeth benoit
ok, I guess I found how

q=49 OR *
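
The two-request fallback described above can also be done client-side: send the query with q, and if numFound is 0, resend with an empty q so q.alt=*:* takes over and only fq filters. A sketch, assuming a local core (an assumption):

import requests

base = "http://localhost:8983/solr/collection1/select"
fq = ("NAME_ANALYZED:decorasol AND WAY_ANALYZED:rue charonne "
      "AND (TOWN_ANALYZED:paris OR DEPARTMENT_ANALYZED:paris)")
params = {"q": "49", "q.alt": "*:*", "fq": fq, "rows": 50, "fl": "*,score", "wt": "json"}
result = requests.get(base, params=params).json()["response"]
if result["numFound"] == 0:
    params["q"] = ""  # fall back to q.alt=*:* and let fq do the filtering
    result = requests.get(base, params=params).json()["response"]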

2011/9/8 elisabeth benoit 

>
> Hello,
>
> I have a query
>
> /select?&q=49&q.alt=*:*&fq=NAME_ANALYZED:decorasol AND WAY_ANALYZED:rue
> charonne AND (TOWN_ANALYZED:paris OR
> DEPARTMENT_ANALYZED:paris)&rows=50&fl=*,score&
>
> returning no answer because of the q=49 parameter.
>
> The query
>
> /select?&q=&q.alt=*:*&fq=NAME_ANALYZED:decorasol AND WAY_ANALYZED:rue
> charonne AND (TOWN_ANALYZED:paris OR
> DEPARTMENT_ANALYZED:paris)&rows=50&fl=*,score&
>
>
> (the only difference is "q=" instead of "q=49")
>
> returns the answer I'm expecting
>
> Is there a way to tell Solr to only use fq and ignore q if there is no
> answer? Or am I doomed to send a first request, realize I've got no answer,
> and then send a second request?
>
> I guess so, but just checking, just in case.
>
> Thanks,
> Elisabeth
>
>


getting answers starting with a requested string first

2011-09-16 Thread elisabeth benoit
Hello,

If I have a request with

fq=NAME_ANALYZED:tour eiffel

and I have different answers like

Restaurant la tour Eiffel
Hotel la tour Eiffel
Tour Eiffel
...

Is there a way to get answers with NAME_ANALYZED beginning with "tour
Eiffel" first?

Thanks,
Elisabeth


fuzzy search by default

2011-09-20 Thread elisabeth benoit
Hello,

Does anyone know if it is possible to configure Solr to do fuzzy search by
default on every query word? All the examples I've seen are one-off (i.e. the
tilde operator follows one specific word in the q parameter).

Best regards,
Elisabeth


Re: getting answers starting with a requested string first

2011-09-20 Thread elisabeth benoit
Hello all,

I'm answering my own post, hoping someone will comment.

I thought about two possibilities to solve my problem:

1) giving NAME_ANALYZED a type where omitNorms=false: I thought this would
give answers with a shorter NAME_ANALYZED field a higher score. I've tested
that solution, but it's not working. I guess this is because there is no
score for the fq parameter (all my answers have the same score)

2) sorting my answers by length desc, and I guess in this case I would need
to store the length of the NAME_ANALYZED field to avoid having to compute it
on the fly. At this point, this is the only solution I can think of.

Any comment would be appreciated,
Thanks,
Elisabeth
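
Option 2 can be sketched end to end: compute the length once at index time, store it, and sort ascending so the shortest matching names ("Tour Eiffel") come first. The NAME_LENGTH field and core URL are assumptions:

import requests

update = "http://localhost:8983/solr/collection1/update?commit=true"
doc = {"id": "1", "NAME": "Tour Eiffel"}
doc["NAME_LENGTH"] = len(doc["NAME"])  # stored once, not computed on the fly
requests.post(update, json=[doc])

select = "http://localhost:8983/solr/collection1/select"
params = {"q": "*:*", "fq": "NAME_ANALYZED:(tour eiffel)",
          "sort": "NAME_LENGTH asc", "rows": 50, "wt": "json"}
hits = requests.get(select, params=params).json()["response"]["docs"]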



2011/9/16 elisabeth benoit 

>
> Hello,
>
> If I have a request with
>
> fq=NAME_ANALYZED:tour eiffel
>
> and I have different answers like
>
> Restaurant la tour Eiffel
> Hotel la tour Eiffel
> Tour Eiffel
> ...
>
> Is there a way to get answers with NAME_ANALYZED beginning with "tour
> Eiffel" first?
>
> Thanks,
> Elisabeth
>


Re: getting answers starting with a requested string first

2011-09-28 Thread elisabeth benoit
Thanks a lot for your advice.

What really matters to me is that answers with NAME_ANALYZED=Tour Eiffel
appear first. Then, whether "Tour Eiffel Tower By Helicopter" appears before
or after "Hotel la tour Eiffel" doesn't really matter.

Since I send fq=NAME_ANALYZED:tour eiffel, I am sure NAME_ANALYZED will at
least contain those two words. So I figured that if I sort answers by this
field's length, I'll get those called "Tour Eiffel" first.

But I'll check the QParser anyway since it seems to be an interesting one.

Best regards,
Elisabeth

2011/9/28 Chris Hostetter 

>
> : 1) giving NAME_ANALYZED a type where omitNorms=false: I thought this
> would
> : give answers with shorter NAME_ANALYZED field a higher score. I've tested
> : that solution, but it's not working. I guess this is because there is no
> : score for fq parameter (all my answers have same score)
>
> both of those statements are correct.  omitNorms=false will cause length
> normalization to apply, so with the default similarity, shorter field
> values will generally score higher, but norms are very coarse, so it
> won't be very precise; and "fq" queries filter the results,
> but do not affect the score.
>
> : 2) sorting my answers by length desc, and I guess in this case I would
> need
> : to store the length of NAME_ANALYZED field to avoid having to compute it
> on
> : the fly. at this point, this is the only solution I can think of.
>
> that will also be a good way to sort on the length of the field, and will
> give you a lot of precise control.
>
> but sorting on length isn't what you asked about...
>
> : > and I have different answers like
> : >
> : > Restaurant la tour Eiffel
> : > Hotel la tour Eiffel
> : > Tour Eiffel
>...
> : > Is there a way to get answers with NAME_ANALYZED beginning with "tour
> : > Eiffel" first?
>
> If you want to score documents higher because they appear at the begining
> of the field value, that is a differnet problem then scoring documents
> higher because they are shorter -- ie: "Tour Eiffel Tower By Helicopter"
> is longer then "Hotel la tour Eiffel", which one do you want to come
> first?
>
> If you want documents to score higher if they appear "early" in the field
> value, you can either index a "marker" token at the begining of the field
> (ie: "S_T_A_R_T Tour Eiffel") and then do all queries on that field as
> phrase queries including that token (shorter matches score higher in
> phrase queries); or you can look into using the "surround" QParser that
> was recently commited to the trunk.  the surround parser has special
> syntax for generting "Span" Queries, which support a "SpanFirst" query
> that scores documents higher based on how close to the begining of a field
> value the match is.
>
>
> -Hoss
>
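
The marker-token trick Hoss describes is easy to prototype: prepend the marker at index time, then issue phrase queries that include it, so documents whose field value begins with the searched words match the phrase and score higher. A sketch, with the core URL and field name as assumptions:

import requests

MARKER = "S_T_A_R_T"
doc = {"id": "1", "NAME_MARKED": MARKER + " " + "Tour Eiffel"}  # index "S_T_A_R_T Tour Eiffel"
requests.post("http://localhost:8983/solr/collection1/update?commit=true", json=[doc])

# Only values that *begin* with "tour eiffel" match this phrase exactly
params = {"q": 'NAME_MARKED:"%s tour eiffel"' % MARKER, "wt": "json"}
requests.get("http://localhost:8983/solr/collection1/select", params=params)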


is there a way to know which mm value was used?

2011-10-05 Thread elisabeth benoit
Hello,

I'd like to be able to know programmatically what value mm was set to for a
given request (to avoid having to parse the query, identify stopwords, and
calculate mm based on solrconfig.xml). Is there a way to get the mm value in
the Solr response?

Thanks,
Elisabeth


Re: is there a way to know which mm value was used?

2011-10-05 Thread elisabeth benoit
thanks for answering.

echoParams just echoes the mm value from solrconfig.xml (in my case mm = 4<-1
6<-2), not the actual value of mm for one particular request.

I think it would be very useful to be able to know which mm value was
effectively used, in particular for requests with stopwords.

It's of course possible to calculate mm in my own code, but this would
require staying synchronized with the mm default value in solrconfig.xml, with
stopwords.txt, and identifying all stopwords in the request.

Best regards,
Elisabeth
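
For the simple conditional specs discussed here, the effective mm can be recomputed client-side. A minimal sketch of the documented semantics of a spec like "4<-1 6<-2" (not Solr's actual parser, and it still requires knowing the clause count after stopword removal):

def effective_mm(spec, num_clauses):
    # Below the lowest threshold every clause is required; each "N<v"
    # condition overrides the result once num_clauses exceeds N.
    required = num_clauses
    for cond in spec.split():
        threshold, value = cond.split("<")
        if num_clauses > int(threshold):
            v = int(value.rstrip("%"))
            if value.endswith("%"):
                v = num_clauses * v // 100
            required = num_clauses + v if v < 0 else v
    return max(0, required)

assert effective_mm("4<-1 6<-2", 3) == 3  # few clauses: all required
assert effective_mm("4<-1 6<-2", 5) == 4  # more than 4 clauses: all but one
assert effective_mm("4<-1 6<-2", 8) == 6  # more than 6 clauses: all but two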





2011/10/5 Shawn Heisey 

> On 10/5/2011 1:01 AM, elisabeth benoit wrote:
>
>> Hello,
>>
>> I'd like to be able to know programmaticaly what value mm was set to for
>> one
>> request (to avoid having to parse the query, identify stopwords, calculate
>> mm based on solrconfig.xml). Is there a way to get mm value in solr
>> response?
>>
>
> To supplement the other answers you've gotten:
>
> If you set echoParams to all, either on the URL or in the solrconfig.xml
> request handler definition, each request should give you whatever value of
> mm is used, along with all the other parameters, which might be useful
> information.  If mm is not present in the response when you do this, then it
> probably was not specified anywhere.  That would indicate that it is set to
> the default, 100%.
>
> http://wiki.apache.org/solr/CoreQueryParameters#echoParams
> http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
>
> Thanks,
> Shawn
>
>


Re: How do i get results for quering with separated words?

2011-10-05 Thread elisabeth benoit
I think you could define star wars and starwars as synonyms in
synonyms.txt...

maybe not generic enough?

2011/10/5 Mike Mander 

> Isn't this more a problem of the query string?
>
> Let's assume I have a game name like "Nintendo 3DS - 'Star Wars - Clone
> Wars'".
> Can I copy that name to a field, cutting the - and ', lowercasing the result
> string, and removing the whitespaces? So that I have
> "nintendo3dsstarwarsclonewars".
> Is that "findable" with my "starwars" query string?
>
> Thanks for helping me
> Mike
>
>
>  index this field without whitespaces ? XD
>>
>> -
>> ----- System -----
>> --
>>
>> One Server, 12 GB RAM, 2 Solr Instances, 8 Cores,
>> 1 Core with 45 Million Documents other Cores<  200.000
>>
>> - Solr1 for Search-Requests - commit every Minute  - 5GB Xmx
>> - Solr2 for Update-Request  - delta every Minute - 4GB Xmx
>>
>>
>


Re: is there a way to know which mm value was used?

2011-10-05 Thread elisabeth benoit
I would use that mm value to decrease it in case the user's request gets no
answer.

I deal with requests potentially containing a lot of parasite words, and I
want to programmatically lower mm in a second-try request if necessary. But I
don't want to decrease it too much, to avoid getting too many irrelevant
answers. That mm value information would be useful for calculating a new mm
value.

Best,
Elisabeth
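
That retry logic is straightforward to do client-side while the computed value is not exposed. A sketch, assuming an edismax handler on a local core (both assumptions):

import requests

def search(q, mm=None):
    params = {"defType": "edismax", "q": q, "wt": "json"}
    if mm is not None:
        params["mm"] = mm  # override the default from solrconfig.xml
    url = "http://localhost:8983/solr/collection1/select"
    return requests.get(url, params=params).json()["response"]

result = search("some request with parasite words")
if result["numFound"] == 0:
    result = search("some request with parasite words", mm="2")  # relaxed second try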

2011/10/5 Chris Hostetter 

>
> : the response.  When I add "&mm=50%25" to the URL in my browser (%25 being
> the
> : URL encoding for the percent symbol), the response changes the mm value
> to
> : "50%" as expected, overriding the value in solrconfig.xml.  I have not
> tried
>
> that is the value of the mm param, but elisabeth seems to be asking about
> the actual numeric value computed from the mm param (ie: if there are 4
> clauses, and mm=50%, then the final value is "2")
>
> it's not entirely clear to me *how* this value would be useful to clients,
> so there may be an XY Problem here we should discuss -- but more
> specifically there is no generic way that we could add the final computed mm
> value in the response, since there is no guarantee that there will be only
> one final computed value.  dismax is a QParser, and using nested QParsers
> there could be multiple instances of dismax used in a single request, with
> distinct mm values computed for each of them.
>
> But like i said: there may be an XY Problem here ... what is the end
> goal?  how would you use this value if you had it?
>
> https://people.apache.org/~hossman/#xyproblem
> XY Problem
>
> Your question appears to be an "XY Problem" ... that is: you are dealing
> with "X", you are assuming "Y" will help you, and you are asking about "Y"
> without giving more details about the "X" so that we can understand the
> full issue.  Perhaps the best solution doesn't involve "Y" at all?
> See Also: http://www.perlmonks.org/index.pl?node_id=542341
>
>
> -Hoss
>


Re: help with phrase query

2011-10-18 Thread elisabeth benoit
I think you can use pf2 and pf3 in your requestHandler.

Best regards,
Elisabeth
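
For illustration, pf2 and pf3 sit alongside pf in an edismax request; they boost documents where pairs and triples of adjacent query words occur together. A sketch reusing the field names from the handler quoted below; the boost values are made up:

params = {
    "defType": "edismax",
    "q": "should I buy a house now while the rates are low",
    "qf": "kw_stopped",
    "pf": "kw_phrases^50",   # whole-phrase match, as in the handler below
    "pf2": "kw_stopped^20",  # boost adjacent word pairs
    "pf3": "kw_stopped^30",  # boost adjacent word triples
    "wt": "json",
}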

2011/10/16 Vijay Ramachandran 

> Hello. I have an application where I try to match longer queries
> (sentences)
> to short documents (search phrases). Typically, the documents are 3-5 terms
> in length. I am facing a problem where phrase match in the indicated phrase
> fields via "pf" doesn't seem to match in most cases, and I am stumped.
> Please help!
>
> For instance, when my query is "should I buy a house now while the rates
> are
> low. We filed BR 2 yrs ago. Rent now, w/ some sch loan debt"
>
> I expect the document "buy a house" to match much higher than "house
> loan rates".
> However, the latter is the document which always matches higher.
>
>
> I tried to do this the following way (solr 3.1):
> 1. Score phrase matches high
> 2. Score single word matches lower
> 3. Use dismax with a "mm" of 1, and very high boost for exact phrase match.
>
> I used the s "text" definition in the schema for the single words, and the
> following for the phrase:
>
> positionIncrementGap="100">
>  
>
> generateWordParts="1" generateNumberParts="1"
>catenateWords="1" catenateNumbers="1" catenateAll="0"
> splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
> outputUnigrams="false"/>
>  
>  
>
> ignoreCase="true" expand="true"/>
> generateWordParts="1" generateNumberParts="1"
>catenateWords="0" catenateNumbers="0" catenateAll="0"
> splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
> outputUnigrams="false"/>
>  
>
>
> and my schema fields look like this:
>
>/>
>
>   
>/>
>
> This is my search handler config:
>
>  
>
> edismax
> explicit
> 0.1
> 
>   kpid,advid,campaign,keywords
> 
> 1
> 
>   kw_stopped^1.0
> 
> 
>   kw_phrases^50.0
> 
> 3
> 3
> *:*
> 
> keywords
> 
> 0
> 
> title
> regex 
>
>  
>
> These are the match score debugQuery explanations:
>
> 8.480054E-4 = (MATCH) sum of:
>  8.480054E-4 = (MATCH) product of:
>0.0031093531 = (MATCH) sum of:
>  0.0015556295 = (MATCH) weight(kw_stopped:hous in 1812), product of:
>2.8209004E-4 = queryWeight(kw_stopped:hous), product of:
>  5.514656 = idf(docFreq=25, maxDocs=2375)
>  5.1152787E-5 = queryNorm
>5.514656 = (MATCH) fieldWeight(kw_stopped:hous in 1812), product of:
>  1.0 = tf(termFreq(kw_stopped:hous)=1)
>  5.514656 = idf(docFreq=25, maxDocs=2375)
>  1.0 = fieldNorm(field=kw_stopped, doc=1812)
>  8.192911E-4 = (MATCH) weight(kw_stopped:rate in 1812), product of:
>2.0471694E-4 = queryWeight(kw_stopped:rate), product of:
>  4.002068 = idf(docFreq=117, maxDocs=2375)
>  5.1152787E-5 = queryNorm
>4.002068 = (MATCH) fieldWeight(kw_stopped:rate in 1812), product of:
>  1.0 = tf(termFreq(kw_stopped:rate)=1)
>  4.002068 = idf(docFreq=117, maxDocs=2375)
>  1.0 = fieldNorm(field=kw_stopped, doc=1812)
>  7.344327E-4 = (MATCH) weight(kw_stopped:loan in 1812), product of:
>1.9382538E-4 = queryWeight(kw_stopped:loan), product of:
>  3.7891462 = idf(docFreq=145, maxDocs=2375)
>  5.1152787E-5 = queryNorm
>3.7891462 = (MATCH) fieldWeight(kw_stopped:loan in 1812), product
> of:
>  1.0 = tf(termFreq(kw_stopped:loan)=1)
>  3.7891462 = idf(docFreq=145, maxDocs=2375)
>  1.0 = fieldNorm(field=kw_stopped, doc=1812)
>0.27272728 = coord(3/11)
>
> for "house loan rates" vs
>
> 8.480054E-4 = (MATCH) sum of:
>  8.480054E-4 = (MATCH) product of:
>0.0031093531 = (MATCH) sum of:
>  0.0015556295 = (MATCH) weight(kw_stopped:hous in 1812), product of:
>2.8209004E-4 = queryWeight(kw_stopped:hous), product of:
>  5.514656 = idf(docFreq=25, maxDocs=2375)
>  5.1152787E-5 = queryNorm
>5.514656 = (MATCH) fieldWeight(kw_stopped:hous in 1812), product of:
>  1.0 = tf(termFreq(kw_stopped:hous)=1)
>  5.514656 = idf(docFreq=25, maxDocs=2375)
>  1.0 = fieldNorm(field=kw_stopped, doc=1812)
>  8.192911E-4 = (MATCH) weight(kw_stopped:rate in 1812), product of:
>2.0471694E-4 = queryWeight(kw_stopped:rate), product of:
>  4.002068 = idf(docFreq=117, maxDocs=2375)
>  5.1152787E-5 = queryNorm
>4.002068 = (MATCH) fieldWeight(kw_stopped:rate in 1812), product of:
>  1.0 = tf(termFreq(kw_stopped:rate)=1)
>  4.002068 = idf(docFreq=117, maxDocs=2375)
>  1.0 = fieldNorm(field=kw_stopped, doc=1812)
>  7.344327E-4 = (MATCH) weight(kw_stopped:loan in 1812), product of:
>1.9382538E-4 = queryWeight(kw_stopped:loan), product of:
>  3.7891462 = idf(docFreq=145, maxDocs=2375)
>  5.1152787E-5 = queryNorm
>3.7891462 = (MATCH) fieldWeight(kw_stopped:loan in 1812), product
> of:
>   

Solr 4.0 indexing NoSuchMethodError

2011-11-09 Thread elisabeth benoit
Hello,

I've just installed Solr 4.0, and I am getting an error when indexing.

*GRAVE: java.lang.NoSuchMethodError:
org.apache.lucene.util.CodecUtil.writeHeader(Lorg/apache/lucene/store/DataOutput;Ljava/lang/String;I)Lorg/apache/lucene/store/DataOutput;
at org.apache.lucene.util.fst.FST.save(FST.java:311)*.

Does anybody know what I've done wrong?

Thanks,
Elisabeth


Re: Solr 4.0 indexing NoSuchMethodError

2011-11-10 Thread elisabeth benoit
Thank you very much, Frédéric.

On November 9, 2011 at 21:52, Frédéric Cons wrote:

> The CodecUtil.writeHeader signature has changed from
>
> public static DataOutput writeHeader(DataOutput out, String codec, int
> version)
>
> in lucene 3.4 (which is the method not found) to
>
> public static void writeHeader(DataOutput out, String codec, int version)
>
> in lucene 4.0
>
> It means that while you're using solr 4.0, some 3.4 jars are stuck
> somewhere in the java classpath. Obviously some code is looking for this
> 3.4 method.
>
> If you're using the start.jar executable, you should have a look at your
> system-wide classpath
> If you're using tomcat (and that sounds plausible in this situation), you
> should trash the "work" directory sub-folders of your tomcat installation;
> and restart it. Tomcat unpacks war archives in this directory, and it may
> have kept a 3.4 solr war deployed here.
>
> 2011/11/9 elisabeth benoit 
>
> > Hello,
> >
> > I've just installed Solr 4.0, and I am getting an error when indexing.
> >
> > *GRAVE: java.lang.NoSuchMethodError:
> >
> >
> org.apache.lucene.util.CodecUtil.writeHeader(Lorg/apache/lucene/store/DataOutput;Ljava/lang/String;I)Lorg/apache/lucene/store/DataOutput;
> >at org.apache.lucene.util.fst.FST.save(FST.java:311)*.
> >
> > Does anybody know what I've done wrong?
> >
> > Thanks,
> > Elisabeth
> >
>


NGramFilterFactory - proximity and percentage of ngrams found

2011-11-15 Thread elisabeth benoit
Hello,

I'm trying to use NGramFilterFactory for spell correction. I have three
questions.

1) I use an edismax request handler. In this case, what is the relation
between my ngrams and my default operator (q.op), if there is any?

2) Is there a way to control the proximity and percentage of ngrams found?
I figured I could use pf, pf2 and pf3 parameters, but is there something
more general?

3) If I want to favor the beginning of words, is there a way to do it with
ngrams (for instance, if there were an option to add two spaces at the
beginning of every word, with ngram size 3, "paris" would result in "  p",
" pa", "par", "ari", "ris") or should I use the Edge factory?

Thanks,
Elisabeth


Solr 4.0 Levenshtein distance algorithm for DirectSpellChecker

2011-11-29 Thread elisabeth benoit
Hello,

I'd like to know if the Levenshtein distance algorithm used by the Solr 4.0
DirectSpellChecker (which is working quite well, I must say) considers an
inversion as distance = 1 or distance = 2?

For instance, if I write Monteruil and I meant Montreuil, is the distance 1
or 2?

Thanks,
Elisabeth


Re: Solr 4.0 Levenshtein distance algorithm for DirectSpellChecker

2011-11-29 Thread elisabeth benoit
ok, thanks.

I think it would be a nice improvement to consider an inversion as distance =
1, since it's such a common mistake. The distance = 2 makes it difficult to
correct transpositions in small words (for instance, the DirectSpellChecker
couldn't make the right suggestion of "jolie" for "joile").

Best,
Elisabeth
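
The difference is easy to see with a small dynamic-programming sketch: plain Levenshtein (what DirectSpellChecker uses, per the answer quoted below) counts a swap of two adjacent letters as two edits, while the transposition-aware variant counts it as one:

def distance(a, b, transpositions=False):
    # classic Levenshtein DP; transpositions=True adds the adjacent-swap case
    d = [[max(i, j) if 0 in (i, j) else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

print(distance("joile", "jolie"))                       # 2
print(distance("joile", "jolie", transpositions=True))  # 1
print(distance("monteruil", "montreuil"))               # 2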

2011/11/29 Robert Muir 

> On Tue, Nov 29, 2011 at 8:07 AM, elisabeth benoit
>  wrote:
> > Hello,
> >
> > I'd like to know if the Levenshtein distance algorithm used by Solr 4.0
> > DirectSpellChecker (working quite well I must say) is considering an
> > inversion as distance = 1 or distance = 2?
> >
> > For instance, if I write Monteruil and I meant Montreuil, is the
> distance 1
> > or 2?
> >
>
> the algorithm is just levenshtein, so 2. its possible to also support
> a modified form where transpositions count as 1, but its not
> implemented.
>
> --
> lucidimagination.com
>


Solr cache size information

2011-12-01 Thread elisabeth benoit
Hello,

If anybody can help, I'd like to confirm a few things about Solr's cache
configuration.

If I want to calculate the cache size in memory relative to the cache size in
solrconfig.xml:

For the document cache

size in memory = size in solrconfig.xml * average size of all fields
defined in the fl parameter   ???

For the filter cache

size in memory = size in solrconfig.xml * WHAT (the size of an id) ??? (I
don't use the facet.enum method)

For the query result cache

size in memory = size in solrconfig.xml * the size of an id ???


I would also like to know the relation between Solr's cache sizes and the JVM
max size.

If anyone has an answer or a link for further reading to suggest, it would
be greatly appreciated.

Thanks,
Elisabeth


Re: Solr cache size information

2011-12-04 Thread elisabeth benoit
Thanks a lot for these answers!

Elisabeth

2011/12/4 Erick Erickson 

> See below:
>
> On Thu, Dec 1, 2011 at 10:57 AM, elisabeth benoit
>  wrote:
> > Hello,
> >
> > If anybody can help, I'd like to confirm a few things about Solr's caches
> > configuration.
> >
> > If I want to calculate cache size in memory relativly to cache size in
> > solrconfig.xml
> >
> > For Document cache
> >
> > size in memory = size in solrconfig.xml * average size of all fields
> > defined in fl parameter   ???
>
> pretty much.
>
> >
> > For Filter cache
> >
> > size in memory = size in solrconfig.xml * WHAT (the size of an id) ??? (I
> > don't use facet.enum method)
> >
>
> It Depends(tm). Solr tries to do the best thing here, depending upon
> how many docs match the filter query. One method puts in a bitset for
> each
> entry, which is (maxDocs/8) bytes. maxDocs is reported on the admin/stats
> page.
>
> If the filter cache only hits a few documents, the size is smaller than
> that.
>
> You can think of this cache as a map where the key is the
> filter query (which is how they're re-used and how autowarm
> works) and the value for each key is the bitset or list. The
> size of the map is bounded by the size in solrconfig.xml.
>
> > For Query result cache
> >
> > size in memory = size in solrconfig.xml * the size of an id ???
> >
> Pretty much. This is the maximum size, but each entry is
> the query plus a list of IDs that's up to 
> long. This cache is, by and large, the least of your worries.
>
>
> >
> > I would also like to know relation between solr's caches sizes and JVM
> max
> > size?
>
> Don't quite know what you're asking for here. There's nothing automatic
> that's sensitive to whether the JVM memory limits are about to be exceeded.
> If the caches get too big, OOMs happen.
>
> >
> > If anyone has an answer or a link for further reading to suggest, it
> would
> > be greatly appreciated.
> >
> There's some information here: http://wiki.apache.org/solr/SolrCaching,
> but
> it often comes down to "try your app and monitor"
>
> Here's a work-in-progress that Grant is working on, be aware that it's
> for trunk, not 3x.
> http://java.dzone.com/news/estimating-memory-and-storage
>
>
> Best
> Erick
>
> > Thanks,
> > Elisabeth
>
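
Erick's maxDocs/8 figure makes a quick back-of-envelope estimate possible. A sketch with example numbers (the numbers themselves are made up):

max_docs = 10_000_000            # from the admin/stats page
filter_cache_size = 512          # 'size' attribute of filterCache in solrconfig.xml
bytes_per_entry = max_docs / 8   # one bit per document in a full bitset
worst_case_mib = filter_cache_size * bytes_per_entry / 2**20
print(f"filterCache worst case: {worst_case_mib:.0f} MiB")  # ~610 MiB here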


Solr 3.4 problem with words separated by comma without space

2011-12-08 Thread elisabeth benoit
Hello,

I'm using Solr 3.4, and I'm having a problem with a request returning
different results depending on whether or not there is a space after a comma.

The request "name, number rue taine paris" returns results with 4 words out
of 5 matching ("name", "number", "rue", "paris").

The request "name,number rue taine paris" (no space between the comma and
"number") returns no results, unless I set mm=3, and then the matching words
are "rue", "taine", "paris".

If I check in the Solr admin analyzer, I get the same analysis for the two
different requests. But it seems, in fact, that the missing space after the
comma prevents "name" and "number" from matching.


My field type is


  























  

Anyone has a clue?

Thanks,
Elisabeth


Re: Solr 3.4 problem with words separated by comma without space

2011-12-08 Thread elisabeth benoit
same problem with Solr 4.0

2011/12/8 elisabeth benoit 

>
>
> Hello,
>
> I'm using Solr 3.4, and I'm having a problem with a request returning
> different results depending on whether or not there is a space after a comma.
>
> The request "name, number rue taine paris" returns results with 4 words
> out of 5 matching ("name", "number", "rue", "paris").
>
> The request "name,number rue taine paris" (no space between the comma and
> "number") returns no results, unless I set mm=3, and then the matching words
> are "rue", "taine", "paris".
>
> If I check in the Solr admin analyzer, I get the same analysis for the two
> different requests. But it seems, in fact, that the missing space after the
> comma prevents "name" and "number" from matching.
>
>
> My field type is
>
>
>   
> 
> 
> 
>  mapping="mapping-ISOLatin1Accent.txt"/>
> 
> 
> 
> 
> 
> 
>  pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
> 
> 
> 
>  splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1"
> generateWordParts="1"
>
> generateNumberParts="1" catenateWords="0" catenateNumbers="1"
> catenateAll="0" preserveOriginal="1"/>
> 
>  articles="elisionwords.txt"/>
> 
>  words="stopwords.txt" enablePositionIncrements="true"/>
> 
>  protected="protwords.txt"/>
> 
> 
>   
>
> Anyone has a clue?
>
> Thanks,
> Elisabeth
>


Re: Solr 3.4 problem with words separated by comma without space

2011-12-12 Thread elisabeth benoit
Thanks for the answer.

Yes, in fact, when I look at the debugQuery output, I notice that name and
number are never treated as single entries.

I have

(((text:name text:number)) (text:ru) (text:tain) (text:paris)))

so name and number are inside the same parentheses, but not exactly treated as
a phrase, as far as I know, since a phrase would look more like text:"name
number".

Could you tell me what the difference is between (text:name text:number)
and (text:"name number")?

I'll check autoGeneratePhraseQueries.

Best regards,
Elisabeth




2011/12/8 Chris Hostetter 

>
> : If I check in the solr.admin.analyzer, I get the same analysis for the
> two
> : different requests. But it seems, if fact, that the lacking space after
> : coma prevents name and number from matching.
>
> query analysis is only part of hte picture ... Did you look at the
> debuqQuery output? ...  i believe you are seeing the effects of the
> QueryParser analyzing "name," distinctly from "number" in one case, vs
> analyzing the entire string "name,number" in the second case, an treating
> the later as a phrase query (because one input clause produces multiple
> tokens)
>
> there is a recently added autoGeneratePhraseQueries option that affects
> this.
>
>
> -Hoss
>


catchall field minus one field

2012-01-11 Thread elisabeth benoit
Hello,

I have a catchall field, and I need to run some requests on all the fields of
that catchall field minus one. To avoid duplicating my index, I'd like to
know if there is a way to use my catchall field while excluding that one field.

Thanks,
Elisabeth


Re: catchall field minus one field

2012-01-12 Thread elisabeth benoit
thanks a lot for your advice, I'll try that.

Best regards,
Elisabeth

2012/1/11 Erick Erickson 

> Hmmm, Once the data is included in the catch-all, it's indistinguishable
> from
> all the rest of the data, so I don't see how you could do this. A clause
> like:
> -excludeField:[* TO *] would exclude all documents that had any data in
> the field so that's probably not what you want.
>
> Could you approach it the other way? Do NOT put the special field in
> the catch-all field in the first place, but massage the input to add
> a clause there? I.e. your "usual" case would have
> catchall: exclude_field:, but your
> special one would just be catchall:.
>
> You could set up request handlers to do this under the covers, so your
> queries would really be
> ...solr/usual?q=
> ...solr/special?q=
> and two different request handlers (edismax-style I'm thinking)
> would differ only by the "qf" field containing or not containing
> your special field.
>
> the other way, of course, would be to have a second catch-all
> field that didn't have your special field, then use one or the other
> depending, but as you say that would increase the size of your
> index...
>
> Best
> Erick
>
> On Wed, Jan 11, 2012 at 9:47 AM, elisabeth benoit
>  wrote:
> > Hello,
> >
> > I have a catchall field, and I need to run some requests on all the fields
> > of that catchall field minus one. To avoid duplicating my index, I'd like
> > to know if there is a way to use my catchall field while excluding that
> > one field.
> >
> > Thanks,
> > Elisabeth
>
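
Erick's two-handler idea can also be tried per request, by varying qf on an edismax query before baking it into solrconfig.xml. A sketch with made-up field names:

# 'catchall' no longer copies the special field; the usual case adds it back via qf
usual   = {"defType": "edismax", "q": "some query", "qf": "catchall special_field", "wt": "json"}
special = {"defType": "edismax", "q": "some query", "qf": "catchall", "wt": "json"}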


homogeneous dispersion in a bbox

2013-03-05 Thread elisabeth benoit
Hello,

I'd like to know if there is some specific way, in Solr 3.6.1, to get
something like a homogeneous dispersion of documents in a bbox.

My use case: I have a request returning, let's say, 1000 documents in a
bbox (they all have the same Solr score), and I want only 50 documents, but
not all heaped (gathered) in one specific geographical location.

We were thinking of adding a random field to our index and sorting on that
field, but I'm wondering whether Solr already has a solution for that kind
of use case.


best regards,
Elisabeth
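
For what it's worth, Solr does ship a solr.RandomSortField type; with a dynamic field such as random_* declared in schema.xml, a seeded sort gives a stable pseudo-random order without storing anything per document. A sketch of the query side, where the bbox filter and field names are assumptions:

params = {
    "q": "*:*",
    "fq": "position:[45.0,1.0 TO 46.0,2.0]",  # bounding-box filter on a LatLonType field
    "sort": "random_1234 asc",                # change the seed (1234) for a new sample
    "rows": 50,
    "wt": "json",
}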


disadvantage one field in a catchall field

2012-03-29 Thread elisabeth benoit
Hi all,

I'm using Solr 3.4 with a catchall field and an edismax request handler.
I'd like answers matching on one particular field to be scored lower than
answers matching on the other fields copied into my catchall field.

So my catchallfield is called catchall. It contains, let's say, fields
NAME, CATEGORY, TOWN, WAY and DESCRIPTION.

For one query, I would like to have answers matching NAME, CATEGORY, TOWN
and WAY scored higher, but I still want to search in DESCRIPTION.

I tried

qf=catchall DESCRIPTION^0.001,

but this doesn't seem to change the scoring. When I set debugQuery=on,
parsedquery_toString looks like

(text:paus | DESCRIPTION:pause^0.001) (this seems like an OR to me)

but I see no trace of DESCRIPTION in explain

One solution I guess would be to keep DESCRIPTION in a separate field, and
not include it in my catchall field. But I wonder if there is a solution
with the catchall field?

Thanks for your help,
Elisabeth


Multi-words synonyms matching

2012-04-10 Thread elisabeth benoit
Hello,

I've read several posts on this issue, but can't find a real solution to my
multi-word synonyms matching problem.

I have in my synonyms.txt an entry like

mairie, hotel de ville

and my index time analyzer is configured as follows for synonyms:

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>

The problem I have is that now "mairie" matches with "hotel" and I would
only want "mairie" to match with "hotel de ville" and "mairie".

When I look into the analyzer, I see that "mairie" is mapped into "hotel",
and the words "de ville" are added in second and third position. To change
that, I tried

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"
tokenizerFactory="solr.KeywordTokenizerFactory"/> (as I read in one post)

and I can now see in the analyzer that "mairie" is mapped to "hotel de
ville", but now when I have the query "hotel de ville", it doesn't match at
all with "mairie".

Does anyone have a clue what I'm doing wrong?

I'm using Solr 3.4.

Thanks,
Elisabeth


Re: Multi-words synonyms matching

2012-04-11 Thread elisabeth benoit
> Have you tried the "=>" mapping instead? Something like
> hotel de ville => mairie
> might work for you.

Yes, thanks, I've tried it, but from what I understand it doesn't solve my
problem, since this means "hotel de ville" will be replaced by "mairie" at
index time (I use synonyms only at index time). So when a user asks for
"hôtel de ville", it won't match.

In fact, at index time I have "mairie" in my data, but I want the user to be
able to request "mairie" or "hôtel de ville" and get "mairie" as an answer,
and not get "mairie" as an answer when requesting "hôtel".

Ok, I guess this means I have a problem. No simple solution, since at query
time my tokenizer does split on white spaces.

I guess my problem is more or less one of the problems discussed in
http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215


Thanks a lot for your answers,
Elisabeth





2012/4/10 Erick Erickson 

> Have you tried the "=>' mapping instead? Something
> like
> hotel de ville => mairie
> might work for you.
>
> Best
> Erick
>
> On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit
>  wrote:
> > Hello,
> >
> > I've read several post on this issue, but can't find a real solution to
> my
> > multi-words synonyms matching problem.
> >
> > I have in my synonyms.txt an entry like
> >
> > mairie, hotel de ville
> >
> > and my index time analyzer is configured as followed for synonyms.
> >
> >  > ignoreCase="true" expand="true"/>
> >
> > The problem I have is that now "mairie" matches with "hotel" and I would
> > only want "mairie" to match with "hotel de ville" and "mairie".
> >
> > When I look into the analyzer, I see that "mairie" is mapped into
> "hotel",
> > and words "de ville" are added in second and third position. To change
> > that, I tried to do
> >
> >  > ignoreCase="true" expand="true"
> > tokenizerFactory="solr.KeywordTokenizerFactory"/> (as I read in one post)
> >
> > and I can see now in the analyzer that "mairie" is mapped to "hotel de
> > ville", but now when I have query "hotel de ville", it doesn't match at
> all
> > with "mairie".
> >
> > Anyone has a clue of what I'm doing wrong?
> >
> > I'm using Solr 3.4.
> >
> > Thanks,
> > Elisabeth
>


Re: Multi-words synonyms matching

2012-04-11 Thread elisabeth benoit
oh, that's right.

thanks a lot,
Elisabeth

2012/4/11 Jeevanandam Madanagopal 

> Elisabeth -
>
> As you described, below mapping might suit for your need.
> mairie => hotel de ville, mairie
>
> mairie gets expanded to "hotel de ville" and "mairie" at index time.  So
> "mairie" and "hotel de ville" searchable on document.
>
> However, the whitespace tokenizer splitting at query time will still be a
> problem, as described by Markus.
>
> --Jeevanandam
>
> On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote:
>
> > <' mapping instead? Something
> > < > < mairie
> > < >
> > Yes, thanks, I've tried it but from what I undestand it doesn't solve my
> > problem, since this means hotel de ville will be replace by mairie at
> > index time (I use synonyms only at index time). So when user will ask
> > "hôtel de ville", it won't match.
> >
> > In fact, at index time I have mairie in my data, but I want user to be
> able
> > to request "mairie" or "hôtel de ville" and have mairie as answer, and
> not
> > have mairie as an answer when requesting "hôtel".
> >
> >
> > < your
> > white
> > < >
> > < >
> > < > query
> > < >
> > Ok, I guess this means I have a problem. No simple solution since at
> query
> > time my tokenizer do split on white spaces.
> >
> > I guess my problem is more or less one of the problems discussed in
> >
> >
> http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215
> >
> >
> > Thanks a lot for your answers,
> > Elisabeth
> >
> >
> >
> >
> >
> > 2012/4/10 Erick Erickson 
> >
> >> Have you tried the "=>' mapping instead? Something
> >> like
> >> hotel de ville => mairie
> >> might work for you.
> >>
> >> Best
> >> Erick
> >>
> >> On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit
> >>  wrote:
> >>> Hello,
> >>>
> >>> I've read several post on this issue, but can't find a real solution to
> >> my
> >>> multi-words synonyms matching problem.
> >>>
> >>> I have in my synonyms.txt an entry like
> >>>
> >>> mairie, hotel de ville
> >>>
> >>> and my index time analyzer is configured as followed for synonyms.
> >>>
> >>>  >>> ignoreCase="true" expand="true"/>
> >>>
> >>> The problem I have is that now "mairie" matches with "hotel" and I
> would
> >>> only want "mairie" to match with "hotel de ville" and "mairie".
> >>>
> >>> When I look into the analyzer, I see that "mairie" is mapped into
> >> "hotel",
> >>> and words "de ville" are added in second and third position. To change
> >>> that, I tried to do
> >>>
> >>>  >>> ignoreCase="true" expand="true"
> >>> tokenizerFactory="solr.KeywordTokenizerFactory"/> (as I read in one
> post)
> >>>
> >>> and I can see now in the analyzer that "mairie" is mapped to "hotel de
> >>> ville", but now when I have query "hotel de ville", it doesn't match at
> >> all
> >>> with "mairie".
> >>>
> >>> Anyone has a clue of what I'm doing wrong?
> >>>
> >>> I'm using Solr 3.4.
> >>>
> >>> Thanks,
> >>> Elisabeth
> >>
>
>


Re: Multi-words synonyms matching

2012-04-24 Thread elisabeth benoit
Hello,

I'd like to come back to this thread.

The only way I found to not split synonyms into words in synonyms.txt is
to use the line

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/>

in schema.xml,

where tokenizerFactory="solr.KeywordTokenizerFactory"

instructs the SynonymFilterFactory not to break synonyms into words on white
space when parsing the synonyms file.

So now it works fine: "mairie" is mapped to "hotel de ville", and when I
send the request q="hotel de ville" (the quotes are mandatory, to prevent the
analyzer from splitting hotel de ville on white space), I get answers with
the word "mairie".

But when I use fq parameter (fq=CATEGORY_ANALYZED:"hotel de ville"), it
doesn't work!!!

CATEGORY_ANALYZED has the same field type as the default search field. This
means that when I send q="hotel de ville" and fq=CATEGORY_ANALYZED:"hotel de
ville", Solr uses the same analyzer, the one with the line

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/>.

Does anyone have a clue what is different between q analysis behaviour and fq
analysis behaviour?

Thanks a lot
Elisabeth

2012/4/12 elisabeth benoit 

> oh, that's right.
>
> thanks a lot,
> Elisabeth
>
>
> 2012/4/11 Jeevanandam Madanagopal 
>
>> Elisabeth -
>>
>> As you described, below mapping might suit for your need.
>> mairie => hotel de ville, mairie
>>
>> mairie gets expanded to "hotel de ville" and "mairie" at index time.  So
>> "mairie" and "hotel de ville" searchable on document.
>>
>> However, still white space tokenizer splits at query time will be a
>> problem as described by Markus.
>>
>> --Jeevanandam
>>
>> On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote:
>>
>> > <' mapping instead? Something
>> > <> > < mairie
>> > <> >
>> > Yes, thanks, I've tried it but from what I undestand it doesn't solve my
>> > problem, since this means hotel de ville will be replace by mairie at
>> > index time (I use synonyms only at index time). So when user will ask
>> > "hôtel de ville", it won't match.
>> >
>> > In fact, at index time I have mairie in my data, but I want user to be
>> able
>> > to request "mairie" or "hôtel de ville" and have mairie as answer, and
>> not
>> > have mairie as an answer when requesting "hôtel".
>> >
>> >
>> > <> your
>> > white
>> > <> >
>> > <> >
>> > <> > query
>> > <> >
>> > Ok, I guess this means I have a problem. No simple solution since at
>> query
>> > time my tokenizer do split on white spaces.
>> >
>> > I guess my problem is more or less one of the problems discussed in
>> >
>> >
>> http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215
>> >
>> >
>> > Thanks a lot for your answers,
>> > Elisabeth
>> >
>> >
>> >
>> >
>> >
>> > 2012/4/10 Erick Erickson 
>> >
>> >> Have you tried the "=>' mapping instead? Something
>> >> like
>> >> hotel de ville => mairie
>> >> might work for you.
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit
>> >>  wrote:
>> >>> Hello,
>> >>>
>> >>> I've read several post on this issue, but can't find a real solution
>> to
>> >> my
>> >>> multi-words synonyms matching problem.
>> >>>
>> >>> I have in my synonyms.txt an entry like
>> >>>
>> >>> mairie, hotel de ville
>> >>>
>> >>> and my index time analyzer is configured as followed for synonyms.
>> >>>
>> >>> > >>> ignoreCase="true" expand="true"/>
>> >>>
>> >>> The problem I have is that now "mairie" matches with "hotel" and I
>> would
>> >>> only want "mairie" to match with "hotel de ville" and "mairie".
>> >>>
>> >>> When I look into the analyzer, I see that "mairie" is mapped into
>> >> "hotel",
>> >>> and words "de ville" are added in second and third position. To change
>> >>> that, I tried to do
>> >>>
>> >>> > >>> ignoreCase="true" expand="true"
>> >>> tokenizerFactory="solr.KeywordTokenizerFactory"/> (as I read in one
>> post)
>> >>>
>> >>> and I can see now in the analyzer that "mairie" is mapped to "hotel de
>> >>> ville", but now when I have query "hotel de ville", it doesn't match
>> at
>> >> all
>> >>> with "mairie".
>> >>>
>> >>> Anyone has a clue of what I'm doing wrong?
>> >>>
>> >>> I'm using Solr 3.4.
>> >>>
>> >>> Thanks,
>> >>> Elisabeth
>> >>
>>
>>
>


Re: Multi-words synonyms matching

2012-04-24 Thread elisabeth benoit
yes, thanks, but this is NOT my question.

I was wondering why I have multiple matches with q="hotel de ville" and no
match with fq=CATEGORY_ANALYZED:"hotel de ville", since in both cases I'm
searching the same Solr fieldType.

Why is the q parameter behaving differently in that case? Why do the quotes
work in one case and not in the other?

Does anyone know?

Thanks,
Elisabeth

2012/4/24 Jeevanandam 

>
> usage of q and fq
>
> q => is typically the main query for the search request
>
> fq => is Filter Query; generally used to restrict the super set of
> documents without influencing score (more info.
> http://wiki.apache.org/solr/CommonQueryParameters#q
> )
>
> For example:
> 
> q="hotel de ville" ===> returns 100 documents
>
> q="hotel de ville"&fq=price:[100 To *]&fq=roomType:"King size Bed" ===>
> returns 40 documents from super set of 100 documents
>
>
> hope this helps!
>
> - Jeevanandam
>
>
>
> On 24-04-2012 3:08 pm, elisabeth benoit wrote:
>
>> Hello,
>>
>> I'd like to resume this post.
>>
>> The only way I found to do not split synonyms in words in synonyms.txt it
>> to use the line
>>
>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/>
>>
>> in schema.xml
>>
>> where tokenizerFactory="solr.KeywordTokenizerFactory"
>>
>> instructs SynonymFilterFactory not to break synonyms into words on white
>> spaces when parsing synonyms file.
>>
>> So now it works fine, "mairie" is mapped into "hotel de ville" and when I
>> send request q="hotel de ville" (quotes are mandatory to prevent analyzer
>> to split hotel de ville on white spaces), I get answers with word
>> "mairie".
>>
>> But when I use fq parameter (fq=CATEGORY_ANALYZED:"hotel de ville"), it
>> doesn't work!!!
>>
>> CATEGORY_ANALYZED is same field type as default search field. This means
>> that when I send q="hotel de ville" and fq=CATEGORY_ANALYZED:"hotel de
>> ville", solr uses the same analyzer, the one with the line
>>
>> > ignoreCase="true" expand="true"
>> tokenizerFactory="solr.**KeywordTokenizerFactory"/>.
>>
>> Anyone as a clue what is different between q analysis behaviour and fq
>> analysis behaviour?
>>
>> Thanks a lot
>> Elisabeth
>>
>> 2012/4/12 elisabeth benoit 
>>
>>  oh, that's right.
>>>
>>> thanks a lot,
>>> Elisabeth
>>>
>>>
>>> 2012/4/11 Jeevanandam Madanagopal 
>>>
>>>  Elisabeth -
>>>>
>>>> As you described, below mapping might suit for your need.
>>>> mairie => hotel de ville, mairie
>>>>
>>>> mairie gets expanded to "hotel de ville" and "mairie" at index time.  So
>>>> "mairie" and "hotel de ville" searchable on document.
>>>>
>>>> However, still white space tokenizer splits at query time will be a
>>>> problem as described by Markus.
>>>>
>>>> --Jeevanandam
>>>>
>>>> On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote:
>>>>
>>>> > <' mapping instead? Something
>>>> > <>>> > < mairie
>>>> > <>>> >
>>>> > Yes, thanks, I've tried it but from what I undestand it doesn't solve
>>>> my
>>>> > problem, since this means hotel de ville will be replace by mairie at
>>>> > index time (I use synonyms only at index time). So when user will ask
>>>> > "hôtel de ville", it won't match.
>>>> >
>>>> > In fact, at index time I have mairie in my data, but I want user to be
>>>> able
>>>> > to request "mairie" or "hôtel de ville" and have mairie as answer, and
>>>> not
>>>> > have mairie as an answer when requesting "hôtel".
>>>> >
>>>> >
>>>> > <>>> your
>>>> > white
>>>> > <>>> >
>>>> > <>>> >
>>>> > <>>> at
>>>> > query
>>>> > <>>> >
>>>> > Ok, I guess this means I have a problem. No

Re: Multi-words synonyms matching

2012-04-25 Thread elisabeth benoit
I'm not at the office until next Wednesday, and I don't have my Solr at
hand, but isn't debugQuery=on giving information only about q parameter
matching and nothing about the fq parameter? Or do you mean
"parsed_filter_queries" gives information about fq?

CATEGORY_ANALYZED is being populated by a copyField instruction in
schema.xml, and has the same field type as my catchall field, the search
field for my searchHandler (the one being used by q parameter).

CATEGORY (a string) is copied in CATEGORY_ANALYZED (field type is text)

CATEGORY (a string) is copied in catchall field (field type is text), and a
lot of other fields are copied too in that catchall field.

So as far as I can see, the same analysis should be done in both cases, but
obviously I'm missing something, and the only thing I can think of is a
different behavior between the q and fq parameters.

I'll check that parsed_filter_queries first thing in the morning next
Wednesday.

Thanks a lot for your help.

Elisabeth
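
When the debug section does include it, parsed_filter_queries shows exactly how each fq was analyzed, which is the quickest way to compare it against the parsed q. A sketch, assuming a local core:

import requests

params = {
    "q": '"hotel de ville"',
    "fq": 'CATEGORY_ANALYZED:"hotel de ville"',
    "debugQuery": "on",
    "wt": "json",
}
debug = requests.get("http://localhost:8983/solr/collection1/select",
                     params=params).json()["debug"]
print(debug.get("parsed_filter_queries"))  # analysis of fq
print(debug.get("parsedquery"))            # analysis of q, for comparison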


2012/4/24 Erick Erickson 

> Elisabeth:
>
> What shows up in the debug section of the response when you add
> &debugQuery=on? There should be some bit of that section like:
> "parsed_filter_queries"
>
> My other question is "are you absolutely sure that your
> CATEGORY_ANALYZED field has the correct content?". How does it
> get populated?
>
> Nothing jumps out at me here
>
> Best
> Erick
>
> On Tue, Apr 24, 2012 at 9:55 AM, elisabeth benoit
>  wrote:
> > yes, thanks, but this is NOT my question.
> >
> > I was wondering why I have multiple matches with q="hotel de ville" and
> no
> > match with fq=CATEGORY_ANALYZED:"hotel de ville", since in both case I'm
> > searching in the same solr fieldType.
> >
> > Why is q parameter behaving differently in that case? Why do the quotes
> > work in one case and not in the other?
> >
> > Does anyone know?
> >
> > Thanks,
> > Elisabeth
> >
> > 2012/4/24 Jeevanandam 
> >
> >>
> >> usage of q and fq
> >>
> >> q => is typically the main query for the search request
> >>
> >> fq => is Filter Query; generally used to restrict the super set of
> >> documents without influencing score (more info.
> >> http://wiki.apache.org/solr/CommonQueryParameters#q
> >> )
> >>
> >> For example:
> >> 
> >> q="hotel de ville" ===> returns 100 documents
> >>
> >> q="hotel de ville"&fq=price:[100 To *]&fq=roomType:"King size Bed" ===>
> >> returns 40 documents from super set of 100 documents
> >>
> >>
> >> hope this helps!
> >>
> >> - Jeevanandam
> >>
> >>
> >>
> >> On 24-04-2012 3:08 pm, elisabeth benoit wrote:
> >>
> >>> Hello,
> >>>
> >>> I'd like to resume this post.
> >>>
> >>> The only way I found to do not split synonyms in words in synonyms.txt it
> >>> to use the line
> >>>
> >>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>> ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/>
> >>>
> >>> in schema.xml
> >>>
> >>> where tokenizerFactory="solr.KeywordTokenizerFactory"
> >>>
> >>> instructs SynonymFilterFactory not to break synonyms into words on
> white
> >>> spaces when parsing synonyms file.
> >>>
> >>> So now it works fine, "mairie" is mapped into "hotel de ville" and
> when I
> >>> send request q="hotel de ville" (quotes are mandatory to prevent
> analyzer
> >>> to split hotel de ville on white spaces), I get answers with word
> >>> "mairie".
> >>>
> >>> But when I use fq parameter (fq=CATEGORY_ANALYZED:"hotel de ville"), it
> >>> doesn't work!!!
> >>>
> >>> CATEGORY_ANALYZED is same field type as default search field. This
> means
> >>> that when I send q="hotel de ville" and fq=CATEGORY_ANALYZED:"hotel de
> >>> ville", solr uses the same analyzer, the one with the line
> >>>
> >>>  >>> ignoreCase="true" expand="true"
> >>> tokenizerFactory="solr.**KeywordTokenizerFactory"/>.
> >>>
> >>> Anyone as a clue what is different between q analysis behav
