Re: Weird behavior of stopwords in search query

2014-02-19 Thread Ahmet Arslan
Hi Samik,

Please see the lowercaseOperators parameter of edismax.
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
If lowercaseOperators=true, then "and" is treated as AND. The stopwords parameter
could also be used.

Stopwords and edismax have had issues (with mm=100%) in the past. I'm not sure about
the current situation, but you may need to apply the same set of stopwords to all
fields listed in the qf parameter, even the string types. The string type would have
to be replaced with a KeywordTokenizer + StopFilter combo.
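
For reference, a rough sketch of what such a field type could look like in
schema.xml (the type name and stopword file name are just placeholders):

  <fieldType name="string_stopped" class="solr.TextField" sortMissingLast="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    </analyzer>
  </fieldType>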





On Wednesday, February 19, 2014 7:48 AM, shamik  wrote:
Jack, thanks for the pointer. I should have checked this more closely. I'm using
edismax and here's my qf entry:


          <str name="qf">
             id^10.0 cat^1.4 text^0.5 features^1.0 name^1.2 sku^1.5 manu^1.1
             title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
          </str>

As you can see, I was boosting id and cat, which are of type string and of
course don't go through the stopwords filter. Removing them returned one
result, which is based on the AND operator.

The part I'm not clear on is how "and" is being treated even though it's a
stopword and the default operator is OR. Shouldn't it be ignored?






Re: block join and atomic updates

2014-02-19 Thread Mikhail Khludnev
Just a side note. A sidecar index might be really useful for updating blocked
docs, but it's in an experimental stage, IIRC.
http://www.lucenerevolution.org/2013/Sidecar-Index-Solr-Components-for-Parallel-Index-Management


On Wed, Feb 19, 2014 at 10:42 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Colleagues,
> You are definitely right regarding denorm&collapse. It works fine in most
> cases, but look at this case more closely. Moritz needs to update the
> parent's fields; if they are copied during denormalization, the price of an
> update is the same as with block join. With q-time join, updates are way
> cheaper, but search time suffers, as you know.
> On 19.02.2014 8:15, "Walter Underwood" 
> wrote:
>
>  Listen to that advice. Denormalize, denormalize, denormalize. Think about
>> the results page and work backwards from that. Flat data model.
>>
>> wunder
>> Search guy at Infoseek, Inktomi, Verity, Autonomy, Netflix, and Chegg
>>
>> On Feb 18, 2014, at 7:37 PM, Jason Hellman <
>> jhell...@innoventsolutions.com> wrote:
>>
>> > Thinking in terms of normalized data in the context of a Lucene index
>> is dangerous.  It is not a relational data model technology, and the join
>> behaviors available to you have limited use.  Each approach requires
>> compromises that are likely impermissible for certain use cases.
>> >
>> > If it is at all reasonable to consider you will likely be best served
>> de-normalizing the data.  Of course, your specific details may prove an
>> exception to this rule...but generally the approach works very well.
>> >
>> > On Feb 18, 2014, at 4:19 AM, Mikhail Khludnev <
>> mkhlud...@griddynamics.com> wrote:
>> >
>> >> absolutely.
>> >>
>> >>
>> >> On Tue, Feb 18, 2014 at 1:20 PM,  wrote:
>> >>
>> >>> But isn't query time join much slower when it comes to a large amount
>> of
>> >>> documents?
>> >>>
>> >>> Zitat von Mikhail Khludnev :
>> >>>
>> >>>
>> >>> Hello,
>> 
>>  It sounds like you need to switch to query time join.
>>  15.02.2014 21:57 пользователь  написал:
>> 
>>  Any suggestions?
>> >
>> >
>> > Zitat von m...@preselect-media.com:
>> >
>> > Yonik Seeley :
>> >
>> >>
>> >> On Thu, Feb 13, 2014 at 8:25 AM,   wrote:
>> >>>
>> >>> Is there any workaround to perform atomic updates on blocks or do
>> I
>>  have to
>>  re-index the parent document and all its children always again
>> if I
>>  want to
>>  update a field?
>> 
>> 
>> >>> The latter, unfortunately.
>> >>>
>> >>>
>> >> Is there any plan to change this behavior in near future?
>> >>
>> >> So, I'm thinking of alternatives without losing the benefit of
>> block
>> >> join.
>> >> I try to explain an idea I just thought about:
>> >>
>> >> Let's say I have a parent document A with a number of fields I
>> want to
>> >> update regularly and a number of child documents AC_1 ... AC_n
>> which are
>> >> only indexed once and aren't going to change anymore.
>> >> So, if I index A and AC_* in a block and I update A, the block is
>> gone.
>> >> But if I create an additional document AF which only contains
>> something
>> >> like a foreign key to A and indexing AF + AC_* as a block (not A
>> + AC_*
>> >> anymore), could I perform a {!parent ... } query on AF + AC_* and
>> make
>> >> an
>> >> join from the results to get A?
>> >> Does this make any sense and is it even possible? ;-)
>> >> And if it's possible, how can I do it?
>> >>
>> >> Thanks,
>> >> - Moritz
>> >>
>> >>
>> >
>> >
>> >
>> >
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >> --
>> >> Sincerely yours
>> >> Mikhail Khludnev
>> >> Principal Engineer,
>> >> Grid Dynamics
>> >>
>> >> 
>> >> 
>> >
>>
>> --
>> Walter Underwood
>> wun...@wunderwood.org
>>
>>
>>
>>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Problems with ICUCollationField

2014-02-19 Thread Thomas Fischer
Hello,

I'm migrating to solr 4.6.1 and have problems with the ICUCollationField 
(apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).

I consistently get the error message
Error loading class 'solr.ICUCollationField'.
even after
INFO: Adding 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' 
to classloader
and
INFO: Adding 
'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
 to classloader.

Am I missing something?

In Solr's subversion I found
/SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
but no corresponding class in solr4.6.1's contrib folder.

Best
Thomas



Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
you need the solr analysis-extras jar in your classpath, too.



On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer  wrote:

> Hello,
>
> I'm migrating to solr 4.6.1 and have problems with the ICUCollationField
> (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).
>
> I get consistently the error message
> Error loading class 'solr.ICUCollationField'.
> even after
> INFO: Adding
> 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' to
> classloader
> and
> INFO: Adding
> 'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
> to classloader.
>
> Am I missing something?
>
> I solr's subversion I found
>
> /SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
> but no corresponding class in solr4.6.1's contrib folder.
>
> Best
> Thomas
>
>


Re: Problems with ICUCollationField

2014-02-19 Thread Thomas Fischer
Hello Robert,

I already added
contrib/analysis-extras/lib/
and
contrib/analysis-extras/lucene-libs/
via lib directives in solrconfig; this is why the classes mentioned are loaded.

Do you know which jar is supposed to contain the ICUCollationField?

Best regards
Thomas



On 19.02.2014 at 13:54, Robert Muir wrote:

> you need the solr analysis-extras jar in your classpath, too.
> 
> 
> 
> On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer  wrote:
> 
>> Hello,
>> 
>> I'm migrating to solr 4.6.1 and have problems with the ICUCollationField
>> (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).
>> 
>> I get consistently the error message
>> Error loading class 'solr.ICUCollationField'.
>> even after
>> INFO: Adding
>> 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' to
>> classloader
>> and
>> INFO: Adding
>> 'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
>> to classloader.
>> 
>> Am I missing something?
>> 
>> I solr's subversion I found
>> 
>> /SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
>> but no corresponding class in solr4.6.1's contrib folder.
>> 
>> Best
>> Thomas
>> 
>> 



Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
you need the solr analysis-extras jar itself, too.



On Wed, Feb 19, 2014 at 8:25 AM, Thomas Fischer  wrote:

> Hello Robert,
>
> I already added
> contrib/analysis-extras/lib/
> and
> contrib/analysis-extras/lucene-libs/
> via lib directives in solrconfig, this is why the classes mentioned are
> loaded.
>
> Do you know which jar is supposed to contain the ICUCollationField?
>
> Best regards
> Thomas
>
>
>
> Am 19.02.2014 um 13:54 schrieb Robert Muir:
>
> > you need the solr analysis-extras jar in your classpath, too.
> >
> >
> >
> > On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer 
> wrote:
> >
> >> Hello,
> >>
> >> I'm migrating to solr 4.6.1 and have problems with the ICUCollationField
> >> (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).
> >>
> >> I get consistently the error message
> >> Error loading class 'solr.ICUCollationField'.
> >> even after
> >> INFO: Adding
> >> 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' to
> >> classloader
> >> and
> >> INFO: Adding
> >>
> 'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
> >> to classloader.
> >>
> >> Am I missing something?
> >>
> >> I solr's subversion I found
> >>
> >>
> /SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
> >> but no corresponding class in solr4.6.1's contrib folder.
> >>
> >> Best
> >> Thomas
> >>
> >>
>
>


Re: Weird behavior of stopwords in search query

2014-02-19 Thread Jack Krupansky
Simply add the lowercaseOperators=false parameter, or add it to the 
"defaults" section of the request handler in solrconfig, and then "and" will 
not be treated as "AND".
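
For reference, a rough sketch of the solrconfig.xml "defaults" approach (the
handler name and the other defaults shown are just placeholders):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="lowercaseOperators">false</str>
    </lst>
  </requestHandler>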


The wiki is confusing - it shouldn't be advising you how to set the 
parameter to achieve the default setting! Rather, it should tell you how to 
override the default setting.


-- Jack Krupansky

-Original Message- 
From: Ahmet Arslan

Sent: Wednesday, February 19, 2014 4:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Weird behavior of stopwords in search query

Hi Samik,

Please see parameter of edismax. 
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
If lowercaseOperators=true then and is treated as AND. Also stopwords 
parameter could be used.


Stopwords and edismax had issues (when mm=100%) in history. Not sure current 
situation but you may need to apply same set of stopwords to all fields 
listed in qf parameter. Even to string types. String type should be replaced 
with KeywordTokenizer + StopwordFilter combo.






On Wednesday, February 19, 2014 7:48 AM, shamik  wrote:
Jack, thanks for the pointer. I should have checked this closely. I'm using
edismax and here's my qf entry :


 id^10.0 cat^1.4 text^0.5 features^1.0 name^1.2 sku^1.5 manu^1.1
title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0

  

As you can see, I was boosting id and cat which are of type string and of
course doesn't go through the stopwords filter. Removing them returned one
result which is based on AND operator.

The part what I'm not clear is how "and" is being treated even through its a
stopword and the default operator is OR. Shouldn't this be ignored ?






Re: Indexed a new big database while the old is running?

2014-02-19 Thread Bruno Mannina

Hi Shawn,

Thanks for your answer.

Actually we don't have a performance problem because we only do select requests.
We have 4 CPUs, 8 cores, and 24 GB of RAM.

I know how to create an alias; my question was just concerning performance,
and you are right,
it's impossible to answer this question without more information about my
system, sorry.


I will do a real test and check whether performance goes down; if it does, I will
stop the new indexing.


If you have more information concerning indexing performance with my
server config, don't hesitate to
write me. :)

Have a nice day,

Regards,
Bruno


On 18/02/2014 16:30, Shawn Heisey wrote:

On 2/18/2014 5:28 AM, Bruno Mannina wrote:

We have actually a SOLR db with around 88 000 000 docs.
All work fine :)

We receive each year a new backfile with the same content (but improved).

Index these docs takes several days on SOLR,
So is it possible to create a new collection (restart SOLR) and
Index these new 88 000 000 docs without stopping the current collection ?

We have around 1 million connections by month.

Do you think that this new indexation may cause problem to SOLR using?
Note: new database will not be used until the current collection will be
stopped.

You can instantly switch between collections by using the alias feature.
  To do this, you would have collections named something like test201302
and test201402, then you would create an alias named 'test' that points
to one of these collections.  Your code can use 'test' as the collection
name.
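
For reference, creating such an alias can be done with the Collections API,
roughly like this (host, port, and the collection names from the example above
are just placeholders):

  http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=test&collections=test201402

Re-running the same call with a different "collections" value should re-point
the alias at the new collection.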

Without a lot more information, it's impossible to say whether building
a new collection will cause performance problems for the existing
collection.

It does seem like a problem that rebuilding the index takes several
days.  You might already be having performance problems.  It's also
possible that there's an aspect to this that I am not seeing, and that
several days is perfectly normal for YOUR index.

Not enough RAM is the most common reason for performance issues on a
large index:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn







Re: Problems with ICUCollationField

2014-02-19 Thread Thomas Fischer
Thanks, that helps!

I'm trying to migrate from the now deprecated ICUCollationKeyFilterFactory I 
used before to the ICUCollationField.
Is there any description how to achieve this?

First tries now yield

ICUCollationField does not support specifying an analyzer.

which makes it complicated since I used the ICUCollationKeyFilterFactory to 
standardize my text fields (in particular because of German Umlauts).
But an ICUCollationField without LowerCaseFilter, a WhitespaceTokenizer, a 
LetterTokenizer, etc. doesn't do me much good, I'm afraid.
Or is this somehow wrapped into the ICUCollationField?

I didn't find ICUCollationField in the Solr wiki, and there's not much information in
the reference guide.
And the hint

"solr.ICUCollationField is included in the Solr analysis-extras contrib - see 
solr/contrib/analysis-extras/README.txt for instructions on which jars you need 
to add to your SOLR_HOME/lib in order to use it."

is misleading insofar as this README.txt doesn't mention the 
solr-analysis-extras-4.6.1.jar in dist.

Best
Thomas


On 19.02.2014 at 14:27, Robert Muir wrote:

> you need the solr analysis-extras jar itself, too.
> 
> 
> 
> On Wed, Feb 19, 2014 at 8:25 AM, Thomas Fischer  wrote:
> 
>> Hello Robert,
>> 
>> I already added
>> contrib/analysis-extras/lib/
>> and
>> contrib/analysis-extras/lucene-libs/
>> via lib directives in solrconfig, this is why the classes mentioned are
>> loaded.
>> 
>> Do you know which jar is supposed to contain the ICUCollationField?
>> 
>> Best regards
>> Thomas
>> 
>> 
>> 
>> Am 19.02.2014 um 13:54 schrieb Robert Muir:
>> 
>>> you need the solr analysis-extras jar in your classpath, too.
>>> 
>>> 
>>> 
>>> On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer 
>> wrote:
>>> 
 Hello,
 
 I'm migrating to solr 4.6.1 and have problems with the ICUCollationField
 (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).
 
 I get consistently the error message
 Error loading class 'solr.ICUCollationField'.
 even after
 INFO: Adding
 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' to
 classloader
 and
 INFO: Adding
 
>> 'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
 to classloader.
 
 Am I missing something?
 
 I solr's subversion I found
 
 
>> /SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
 but no corresponding class in solr4.6.1's contrib folder.
 
 Best
 Thomas
 
 
>> 
>> 



Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
Hmm, for standardization of text fields, collation might be a little
awkward.

For your German umlauts, what do you mean by standardize? Is this to
achieve equivalency of e.g. oe to ö in your search terms?

In that case, a simpler approach would be to put
GermanNormalizationFilterFactory in your chain:
http://lucene.apache.org/core/4_6_1/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html
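
For reference, a rough sketch of such a chain in schema.xml (the field type
name and tokenizer choice are just placeholders):

  <fieldType name="text_de" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.GermanNormalizationFilterFactory"/>
    </analyzer>
  </fieldType>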


On Wed, Feb 19, 2014 at 9:16 AM, Thomas Fischer  wrote:

> Thanks, that helps!
>
> I'm trying to migrate from the now deprecated ICUCollationKeyFilterFactory
> I used before to the ICUCollationField.
> Is there any description how to achieve this?
>
> First tries now yield
>
> ICUCollationField does not support specifying an analyzer.
>
> which makes it complicated since I used the ICUCollationKeyFilterFactory
> to standardize my text fields (in particular because of German Umlauts).
> But an ICUCollationField without LowerCaseFilter, a WhitespaceTokenizer, a
> LetterTokenizer, etc. doesn't do me much good, I'm afraid.
> Or is this somehow wrapped into the ICUCollationField?
>
> I didn't find ICUCollationField  in the solr wiki and not much information
> in the reference.
> And the hint
>
> "solr.ICUCollationField is included in the Solr analysis-extras contrib -
> see solr/contrib/analysis-extras/README.txt for instructions on which jars
> you need to add to your SOLR_HOME/lib in order to use it."
>
> is misleading insofar as this README.txt doesn't mention the
> solr-analysis-extras-4.6.1.jar in dist.
>
> Best
> Thomas
>
>
> Am 19.02.2014 um 14:27 schrieb Robert Muir:
>
> > you need the solr analysis-extras jar itself, too.
> >
> >
> >
> > On Wed, Feb 19, 2014 at 8:25 AM, Thomas Fischer 
> wrote:
> >
> >> Hello Robert,
> >>
> >> I already added
> >> contrib/analysis-extras/lib/
> >> and
> >> contrib/analysis-extras/lucene-libs/
> >> via lib directives in solrconfig, this is why the classes mentioned are
> >> loaded.
> >>
> >> Do you know which jar is supposed to contain the ICUCollationField?
> >>
> >> Best regards
> >> Thomas
> >>
> >>
> >>
> >> Am 19.02.2014 um 13:54 schrieb Robert Muir:
> >>
> >>> you need the solr analysis-extras jar in your classpath, too.
> >>>
> >>>
> >>>
> >>> On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer 
> >> wrote:
> >>>
>  Hello,
> 
>  I'm migrating to solr 4.6.1 and have problems with the
> ICUCollationField
>  (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).
> 
>  I get consistently the error message
>  Error loading class 'solr.ICUCollationField'.
>  even after
>  INFO: Adding
>  'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' to
>  classloader
>  and
>  INFO: Adding
> 
> >>
> 'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
>  to classloader.
> 
>  Am I missing something?
> 
>  I solr's subversion I found
> 
> 
> >>
> /SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
>  but no corresponding class in solr4.6.1's contrib folder.
> 
>  Best
>  Thomas
> 
> 
> >>
> >>
>
>


Re: Problems with ICUCollationField

2014-02-19 Thread Thomas Fischer

> Hmm, for standardization of text fields, collation might be a little
> awkward.

I arrived there after using custom rules for a while (see "RuleBasedCollator" 
on http://wiki.apache.org/solr/UnicodeCollation) and then being told
"For better performance, less memory usage, and support for more locales, you 
can add the analysis-extras contrib and use ICUCollationKeyFilterFactory 
instead." (on the same page under "ICU Collation").

> For your german umlauts, what do you mean by standardize? is this to
> achieve equivalency of e.g. oe to ö in your search terms?

That is the main point, but I might also need the additional normalization of
combined characters, like "o" + combining diaeresis (U+0308) = "ö", and probably
similar constructions for other languages (like Hungarian).

> In that case, a simpler approach would be to put
> GermanNormalizationFilterFactory in your chain:
> http://lucene.apache.org/core/4_6_1/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html

I'll see how far I get with this, but from the description
• 'ä', 'ö', 'ü' are replaced by 'a', 'o', 'u', respectively.
• 'ae' and 'oe' are replaced by 'a' and 'o', respectively.
this seems to be too far-reaching a reduction: while the identification "ä=ae"
is not very serious and rarely misleading, "ä=a" might lump together words that
shouldn't be; "Äsen" and "Asen" are quite different concepts.

In general, the deprecation of ICUCollationKeyFilterFactory doesn't seem to be 
really thought through.

Thanks anyway, best
Thomas

> 
> On Wed, Feb 19, 2014 at 9:16 AM, Thomas Fischer  wrote:
> 
>> Thanks, that helps!
>> 
>> I'm trying to migrate from the now deprecated ICUCollationKeyFilterFactory
>> I used before to the ICUCollationField.
>> Is there any description how to achieve this?
>> 
>> First tries now yield
>> 
>> ICUCollationField does not support specifying an analyzer.
>> 
>> which makes it complicated since I used the ICUCollationKeyFilterFactory
>> to standardize my text fields (in particular because of German Umlauts).
>> But an ICUCollationField without LowerCaseFilter, a WhitespaceTokenizer, a
>> LetterTokenizer, etc. doesn't do me much good, I'm afraid.
>> Or is this somehow wrapped into the ICUCollationField?
>> 
>> I didn't find ICUCollationField  in the solr wiki and not much information
>> in the reference.
>> And the hint
>> 
>> "solr.ICUCollationField is included in the Solr analysis-extras contrib -
>> see solr/contrib/analysis-extras/README.txt for instructions on which jars
>> you need to add to your SOLR_HOME/lib in order to use it."
>> 
>> is misleading insofar as this README.txt doesn't mention the
>> solr-analysis-extras-4.6.1.jar in dist.
>> 
>> Best
>> Thomas
>> 
>> 
>> Am 19.02.2014 um 14:27 schrieb Robert Muir:
>> 
>>> you need the solr analysis-extras jar itself, too.
>>> 
>>> 
>>> 
>>> On Wed, Feb 19, 2014 at 8:25 AM, Thomas Fischer 
>> wrote:
>>> 
 Hello Robert,
 
 I already added
 contrib/analysis-extras/lib/
 and
 contrib/analysis-extras/lucene-libs/
 via lib directives in solrconfig, this is why the classes mentioned are
 loaded.
 
 Do you know which jar is supposed to contain the ICUCollationField?
 
 Best regards
 Thomas
 
 
 
 Am 19.02.2014 um 13:54 schrieb Robert Muir:
 
> you need the solr analysis-extras jar in your classpath, too.
> 
> 
> 
> On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer 
 wrote:
> 
>> Hello,
>> 
>> I'm migrating to solr 4.6.1 and have problems with the
>> ICUCollationField
>> (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).
>> 
>> I get consistently the error message
>> Error loading class 'solr.ICUCollationField'.
>> even after
>> INFO: Adding
>> 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' to
>> classloader
>> and
>> INFO: Adding
>> 
 
>> 'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
>> to classloader.
>> 
>> Am I missing something?
>> 
>> I solr's subversion I found
>> 
>> 
 
>> /SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
>> but no corresponding class in solr4.6.1's contrib folder.
>> 
>> Best
>> Thomas
>> 
>> 
 
 
>> 
>> 



Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
On Wed, Feb 19, 2014 at 10:33 AM, Thomas Fischer  wrote:

>
> > Hmm, for standardization of text fields, collation might be a little
> > awkward.
>
> I arrived there after using custom rules for a while (see
> "RuleBasedCollator" on http://wiki.apache.org/solr/UnicodeCollation) and
> then being told
> "For better performance, less memory usage, and support for more locales,
> you can add the analysis-extras contrib and use
> ICUCollationKeyFilterFactory instead." (on the same page under "ICU
> Collation").
>
> > For your german umlauts, what do you mean by standardize? is this to
> > achieve equivalency of e.g. oe to ö in your search terms?
>
> That is the main point, but I might also need the additional normalization
> of combined characters like
> o+  ̈ = ö and probably similar constructions for other languages (like
> Hungarian).
>

Sure, but using collation to get normalization is pretty overkill too. Maybe
try ICUNormalizer2Filter? This gives you better control over the
normalization anyway.
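
For reference, a rough sketch of using it in schema.xml (it lives in the same
analysis-extras contrib already discussed; the field type name and the chosen
normalization form are just placeholders):

  <fieldType name="text_icu_norm" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc_cf" mode="compose"/>
    </analyzer>
  </fieldType>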


>
> > In that case, a simpler approach would be to put
> > GermanNormalizationFilterFactory in your chain:
> >
> http://lucene.apache.org/core/4_6_1/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html
>
> I'll see how far I get with this, but from the description
> • 'ä', 'ö', 'ü' are replaced by 'a', 'o', 'u', respectively.
> • 'ae' and 'oe' are replaced by 'a', and 'o', respectively.
> this seems to be too far-reaching a reduction: while the identification
> "ä=ae" is not very serious and rarely misleading, "ä=a" might pack words
> together that shouldn't be, "Äsen" and "Asen" are quite different concepts,
>

I'm not sure that's a mainstream opinion: not only do the default German
collation rules conflate these two characters as equivalent at the primary
level, but so do many German stemming algorithms. Similar arguments could
be made for 'résumé' versus 'resume' and so on. Search isn't an exact
science.


Exact fragment length in highlighting

2014-02-19 Thread Juan Carlos Serrano
Hello everybody,

I'm using Solr 4.6.1 and I'd like to know if there's a way to determine
exactly the number of characters of a fragment used in highlights. If I use
hl.fragsize=70, the length of the fragments I get often varies,
and I get results 90 characters long.

Regards and thanks in advance,

Juan Carlos


Re: Preventing multiple on-deck searchers without causing failed commits

2014-02-19 Thread Greg Walters
I believe that there's a configuration option that'll make on-deck searchers be 
used if they're needed even if they're not fully warmed yet. You might try that 
option and see if it doesn't solve your 503 errors.

Thanks,
Greg

On Feb 18, 2014, at 9:05 PM, Erick Erickson  wrote:

> Colin:
> 
> Stop. Back up. The automatic soft commits will make updates available to
> your users every second. Those documents _include_ anything from your "hard
> commit" jobs. What could be faster? Parenthetically I'll add that 1 second
> soft commits are rarely an actual requirement, but that's your decision.
> 
> For the hard commits. Fine. Do them if you insist. Just set
> openSearcher=false. The documents will be searchable the next time the soft
> commit happens, within one second. The key is openSearcher=false. That
> prevents starting a brand new searcher.
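
For reference, a rough sketch of the solrconfig.xml settings being described
here (the times are just placeholders):

  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>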
> 
> BTW, your commits are not failing. It's just that _after_ the commit
> happens, the warming searcher limit is exceeded.
> 
> You can even wait until the segments are flushed to disk. All without
> opening a searcher.
> 
> Shawn is spot on in his recommendations to not fixate on the commits. Solr
> handles that. Here's a long blog about all the details of durability .vs.
> visibility.
> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> 
> You're over-thinking the problem here, trying to control commits with a
> sledgehammer when you don't need to, just use the built-in capabilities.
> 
> Best,
> Erick
> 
> 
> 
> On Tue, Feb 18, 2014 at 10:33 AM, Colin Bartolome  wrote:
> 
>> On 02/18/2014 10:15 AM, Shawn Heisey wrote:
>> 
>>> If you want to be completely in control like that, get rid of the
>>> automatic soft commits and just do the hard commits.
>>> 
>>> I would personally choose another option for your setup -- get rid of
>>> *all* explicit commits entirely, and just configure autoCommit and
>>> autoSoftCommit in the server config.  Since you're running 4.x, you really
>>> should have the transaction log (updateLog in the config) enabled.  You
>>> can rely on the transaction log to replay updates since the last hard
>>> commit if there's ever a crash.
>>> 
>>> I would also recommend upgrading to 4.6.1, but that's a completely
>>> separate item.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>>> 
>> We use the automatic soft commits to get search index updates to our users
>> faster, via Near Realtime Searching. We have the updateLog enabled. I'm not
>> worried that the Solr side of the equation will lose data; I'm worried that
>> the communication from our web servers and scheduled jobs to the Solr
>> servers will break down and nothing will come along to make sure everything
>> is up to date. It sounds like what we're picturing is not currently
>> supported, so I'll file the RFE.
>> 
>> Will upgrading to 4.6.1 help at all with this issue?
>> 



Re: Exact fragment length in highlighting

2014-02-19 Thread Ahmet Arslan
Hi Juan,

Are you counting the number of characters of the HTML-rendered snippet?

I think pre and post strings (html markup which are not displayed) are causing 
that difference.

Ahmet


On Wednesday, February 19, 2014 5:53 PM, Juan Carlos Serrano 
 wrote:
Hello everybody,

I'm using Solr 4.6.1. and I'd like to know if there's a way to determine
exactly the number of characters of a fragment used in highlights. If I use
hl.fragsize=70 the length of the fragments that I get is variable (often)
and I get results of 90 characters length.

Regards and thanks in advance,

Juan Carlos



Re: Fault Tolerant Technique of Solr Cloud

2014-02-19 Thread Per Steffensen

On 19/02/14 07:57, Vineet Mishra wrote:

Thanks for all your responses, but my doubt is which *Server:Port* the
query should be made to, as we don't know the crashed server or which server might
crash in the future (as any server can go down).
That is what CloudSolrServer will deal with for you. It knows which
servers are down and makes sure not to send requests to those servers.


The only intention for writing this doubt is to get an idea about how the
query format for distributed search might work if any of the shard or
replica goes down.


// Setting up your CloudSolrServer-client
CloudSolrServer client = new CloudSolrServer("<zkHost>");  // "<zkHost>" being the
// same string as you provide in -DzkHost when starting your servers
client.setDefaultCollection("collection1");
client.connect();

// Creating and firing queries (you can do it in different ways, but at least
// this is an option)
SolrQuery query = new SolrQuery("*:*");
QueryResponse results = client.query(query);


Because you are using CloudSolrServer, you do not have to worry about
sending requests to a crashed server.


In your example I believe the situation is as follows:
* One collection called "collection1" with two shards "shard1" and 
"shard2" each having two replica "replica1" and "replica2" (a replica is 
an "instance" of a shard, and when you have one replica you are not 
having replication).
* collection1.shard1.replica1 is running on localhost:8983 and 
collection1.shard1.replica2 is running on localhost:8900 (or maybe switched)
* collection1.shard2.replica1 is running on localhost:7574 and 
collection1.shard2.replica2 is running on localhost:7500 (or maybe switched)
If localhost:8900 is the only server that is down, all data is still 
available for search, because every shard has at least one replica 
running. In that case I believe setting "shards.tolerant" will not make 
a difference. You will get your response no matter what. But if 
localhost:8983 was also down, there would be no live replica of shard1. In 
that case you will get an exception from your query, indicating that the 
query cannot be carried out over the complete data-set. In that case if 
you set "shards.tolerant" that behaviour will change, and you will not 
get an exception - you will get a real response, but it will just not 
include data from shard1, because it is not available at the moment. 
That is just the way I believe "shards.tolerant" works, but you might 
want to verify that.


To set "shards.tolerant":

SolrQuery query = new SolrQuery("*:*");
query.set("shards.tolerant", true);
QueryResponse results = client.query(query);


I believe distributed search is the default, but you can explicitly require it by

query.setDistrib(true);

or

query.set("distrib", true);



Thanks




Re: Preventing multiple on-deck searchers without causing failed commits

2014-02-19 Thread Shawn Heisey

On 2/19/2014 8:59 AM, Greg Walters wrote:

I believe that there's a configuration option that'll make on-deck searchers be 
used if they're needed even if they're not fully warmed yet. You might try that 
option and see if it doesn't solve your 503 errors.


I'm fairly sure that this option (useColdSearcher) only applies to 
warming queries defined in solrconfig.xml, and that it only applies to 
situations when the searcher that is warming up is the *ONLY* searcher 
that exists.  The only time that should happen is at Solr startup and 
core reload.  At that time, the only warming queries that will be 
executed are those configured for the firstSearcher event.
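
For reference, the option lives in the <query> section of solrconfig.xml,
roughly like this (the values shown are just the usual example defaults):

  <query>
    <useColdSearcher>false</useColdSearcher>
    <maxWarmingSearchers>2</maxWarmingSearchers>
  </query>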


A quick peek at the code (branch_4x, SolrCore.java, starting at line 
1647) seems to confirm this.  I did not do an in-depth analysis.


Thanks,
Shawn



Re: Preventing multiple on-deck searchers without causing failed commits

2014-02-19 Thread Greg Walters
> A quick peek at the code (branch_4x, SolrCore.java, starting at line 1647) 
> seems to confirm this.

It seems my understanding of that option was wrong! Thanks for correcting me 
Shawn.

Greg 

On Feb 19, 2014, at 11:19 AM, Shawn Heisey  wrote:

> On 2/19/2014 8:59 AM, Greg Walters wrote:
>> I believe that there's a configuration option that'll make on-deck searchers 
>> be used if they're needed even if they're not fully warmed yet. You might 
>> try that option and see if it doesn't solve your 503 errors.
> 
> I'm fairly sure that this option (useColdSearcher) only applies to warming 
> queries defined in solrconfig.xml, and that it only applies to situations 
> when the searcher that is warming up is the *ONLY* searcher that exists.  The 
> only time that should happen is at Solr startup and core reload.  At that 
> time, the only warming queries that will be executed are those configured for 
> the firstSearcher event.
> 
> A quick peek at the code (branch_4x, SolrCore.java, starting at line 1647) 
> seems to confirm this.  I did not do an in-depth analysis.
> 
> Thanks,
> Shawn
> 



Re: Exact fragment length in highlighting

2014-02-19 Thread Jason Hellman
Juan,

Pay close attention to the boundary scanner you’re employing:

http://wiki.apache.org/solr/HighlightingParameters#hl.boundaryScanner

You can be explicit to indicate a type (hl.bs.type) with options such as 
CHARACTER, WORD, SENTENCE, and LINE.  The default is WORD (as the wiki 
indicates) and I presume this is what you are employing.
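
For reference, a rough sketch of a request exercising these knobs (host,
collection, and field names are placeholders; the breakIterator boundary
scanner also assumes the highlighted field has termVectors with positions and
offsets so the FastVectorHighlighter can be used):

  http://localhost:8983/solr/collection1/select?q=description:solr&hl=true&hl.fl=description&hl.fragsize=70&hl.useFastVectorHighlighter=true&hl.boundaryScanner=breakIterator&hl.bs.type=SENTENCE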

Be careful about using explicit characters.  I had an interesting case of 
highlight returns that looked like this:

> This is a highlight
> Here is another highlight
> Yes, another one, etc…

It was a bit maddening trying to figure out why “>” was in the highlight…turned 
out it was XML content and the character boundary clipped the trailing “>” 
based on the boundary rules.

In any case, you should be able to achieve a pretty flexible result depending 
on what you’re really after with the right combination of settings.

Jason

On Feb 19, 2014, at 7:53 AM, Juan Carlos Serrano  wrote:

> Hello everybody,
> 
> I'm using Solr 4.6.1. and I'd like to know if there's a way to determine
> exactly the number of characters of a fragment used in highlights. If I use
> hl.fragsize=70 the length of the fragments that I get is variable (often)
> and I get results of 90 characters length.
> 
> Regards and thanks in advance,
> 
> Juan Carlos



Does SolrCloud Improves Indexing or Slows it down

2014-02-19 Thread Susheel Kumar
Hi,

If we set up a Solr cloud with 3 nodes and then have something like 100+ million 
documents to index, how should we be indexing: a) will the indexing requests be 
going to each machine, assuming we are able to divide the data based on some field, 
or b) should we be sending the requests to one endpoint, and what should that 
endpoint be? 

Can you please clarify? Reading this article, it says indexing may become 
slower. 

http://stackoverflow.com/questions/13500955/does-solrclouds-scalability-extend-to-indexing
  


Please suggest & let me know if you need more info.

Thnx


Re: Slow 95th-percentile

2014-02-19 Thread Allan Carroll
Thanks, Chris. Adding autoWarming to the filter cache made another big  
improvement.

Between increasing the soft commit to 60s, fixing the q:* query, and 
autowarming the filter caches my 95% latencies are down to a very acceptable 
range — almost an order of magnitude improvement. :-)

-Allan

On February 18, 2014 at 5:32:51 PM, Chris Hostetter (hossman_luc...@fucit.org) 
wrote:


: Slowing the soft commits to every 100 seconds helped. The main culprit  
: was a bad query that was coming through every few seconds. Something  
: about the empty fq param and the q=* slowed everything else down.  
:  
: INFO: [event] webapp=/solr path=/select  
: params={start=0&q=*&wt=javabin&fq=&fq=startTime:139283643&version=2}  
: hits=1894 status=0 QTime=6943  

1) if you are using Solr 4.1 or earlier, then q=* is an expensive &  
useless query that doesn't mean what you think it does...  

https://issues.apache.org/jira/browse/SOLR-2996  

2) an empty "fq" doesn't cost anything -- if you use debugQuery=true you  
should see that it's not even included in "parsed_filter_queries" because  
it's totally ignored.  

3) if that "startTime" value changes at some fixed and regular  
interval, that could explain some anomalies if it's normally the  
same and cached, but changes once a day/hour/minute or whatever and is a  
bit slow to cache.  


bottom line: a softCommit is going to re-open a searcher, which is going  
to wipe your caches. if you don't have any (auto)warming configured, that  
means any "fq"s, or "q"s that you run regularly are going to pay the  
price of being "slow" the first time they are run against a new searcher  
is opened.  

If your priority is low response time, you really want to open new  
searchers as infrequently as your SLA for visibility allows, and use  
(auto)warming for those common queries.  
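
For reference, a rough sketch of the relevant solrconfig.xml pieces (cache
classes, sizes, autowarm counts, and the soft-commit interval here are only
placeholders to illustrate the idea):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>

  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>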



-Hoss  
http://www.lucidworks.com/  


Re: Does SolrCloud Improves Indexing or Slows it down

2014-02-19 Thread Kranti Parisa
Why don't you do parallel indexing and then merge everything into one and
replicate that from the master to the slaves in SolrCloud?

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Wed, Feb 19, 2014 at 3:04 PM, Susheel Kumar <
susheel.ku...@thedigitalgroup.net> wrote:

> Hi,
>
> If we setup a solr cloud with 3 nodes and then we have like 100+ million
> documents to index. How we should be indexing a) will the indexing request
> be going to each machine assuming we are able to divide data based on some
> field or b) we should be sending the request to one end point and what
> should be end point?
>
> Can you please clarify and reading this article it says indexing may
> become slower.
>
>
> http://stackoverflow.com/questions/13500955/does-solrclouds-scalability-extend-to-indexing
>
>
> Please suggest & let me know if you need more info.
>
> Thnx
>


Getting fields from query

2014-02-19 Thread Jamie Johnson
Is there a way to get all the fields that are in a particular query?
 Ultimately I'd like to restrict the fields that a user can use to search
so I want to make sure that there aren't any fields in the query that they
should not be allowed to search.


Re: Getting fields from query

2014-02-19 Thread Ahmet Arslan
Hi Jamie,

This may not be a direct answer to your question, but your question reminded me of
edismax's uf parameter.

http://wiki.apache.org/solr/ExtendedDisMax#uf_.28User_Fields.29
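
For reference, a rough sketch of how it is used (host, collection, and field
names are just placeholders):

  http://localhost:8983/solr/collection1/select?defType=edismax&q=title:solr+body:secret&uf=title

Roughly speaking, with uf=title the "body:secret" clause is no longer honored
as a fielded search against the body field; edismax treats it as plain query
text against the qf fields instead.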





On Wednesday, February 19, 2014 11:18 PM, Jamie Johnson  
wrote:
Is there a way to get all the fields that are in a particular query?
Ultimately I'd like to restrict the fields that a user can use to search
so I want to make sure that there aren't any fields in the query that they
should not be allowed to search.



RE: Does SolrCloud Improves Indexing or Slows it down

2014-02-19 Thread Susheel Kumar
Thanks for your reply, Kranti. If we want to shard the index across 3 nodes, will 
the master/slave concept help? We are using Solr 4.6, so should we 
utilize the master/slave concept or move to sharding?

-Original Message-
From: Kranti Parisa [mailto:kranti.par...@gmail.com] 
Sent: Wednesday, February 19, 2014 2:12 PM
To: solr-user@lucene.apache.org
Subject: Re: Does SolrCloud Improves Indexing or Slows it down

Why don't you do parallel indexing and then merge everything into one and 
replicate that from the master to the slaves in SolrCloud?

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Wed, Feb 19, 2014 at 3:04 PM, Susheel Kumar < 
susheel.ku...@thedigitalgroup.net> wrote:

> Hi,
>
> If we setup a solr cloud with 3 nodes and then we have like 100+ 
> million documents to index. How we should be indexing a) will the 
> indexing request be going to each machine assuming we are able to 
> divide data based on some field or b) we should be sending the request 
> to one end point and what should be end point?
>
> Can you please clarify and reading this article it says indexing may 
> become slower.
>
>
> http://stackoverflow.com/questions/13500955/does-solrclouds-scalabilit
> y-extend-to-indexing
>
>
> Please suggest & let me know if you need more info.
>
> Thnx
>


Re: Getting fields from query

2014-02-19 Thread Jamie Johnson
This actually may do what I want, I'll have to check.  Right now we are
using Lucene directly and not Solr for this particular project, but if this
fits the bill we may be able to use just the query parser.


On Wed, Feb 19, 2014 at 4:30 PM, Ahmet Arslan  wrote:

> Hi Jamie,
>
> May not be direct answer to your question but your Q reminded me edismax's
> uf parameter.
>
> http://wiki.apache.org/solr/ExtendedDisMax#uf_.28User_Fields.29
>
>
>
>
>
> On Wednesday, February 19, 2014 11:18 PM, Jamie Johnson 
> wrote:
> Is there a way to get all the fields that are in a particular query?
> Ultimately I'd like to restrict the fields that a user can use to search
> so I want to make sure that there aren't any fields in the query that they
> should not be allowed to search.
>
>


Re: Getting fields from query

2014-02-19 Thread Jamie Johnson
On closer inspection this isn't quite what I'm looking for. The
functionality is spot on, but I'm looking for a way to do this using a
Query Parser in Lucene core, i.e. StandardQueryParser unless folks have
experience with using the Solr query parsers with vanilla lucene?  Though
I'd prefer to stick to just Lucene.

If there was a way to just get all of the fields that were part of a query
I could easily do this myself, but it looks as if there is no standard way
to get a field given a query i.e. TermQuery you need to getTerm, SpanQuery
you get field, BooleanQuery you need to check all the clauses, etc.  It's
definitely possible for me to go through each of these and determine the
proper way to do it, but I'd have thought that this was something already
done as a utility somewhere.  I had thought that the
Query.extractTerms(Set<Term> terms) would have done this, but it doesn't
appear to be implemented for range queries.  Any ideas how I can get this
information?
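
For what it's worth, a rough sketch of walking a Lucene 4.x query tree by hand
to collect field names (only a few Query subclasses are handled here; the rest
would need similar instanceof branches):

  import java.util.Set;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.BooleanClause;
  import org.apache.lucene.search.BooleanQuery;
  import org.apache.lucene.search.MultiTermQuery;
  import org.apache.lucene.search.PhraseQuery;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.TermQuery;
  import org.apache.lucene.search.spans.SpanQuery;

  public final class QueryFieldCollector {
    // Recursively gather the field names referenced by a query.
    public static void collectFields(Query q, Set<String> fields) {
      if (q instanceof TermQuery) {
        fields.add(((TermQuery) q).getTerm().field());
      } else if (q instanceof BooleanQuery) {
        for (BooleanClause clause : ((BooleanQuery) q).clauses()) {
          collectFields(clause.getQuery(), fields);
        }
      } else if (q instanceof PhraseQuery) {
        for (Term t : ((PhraseQuery) q).getTerms()) {
          fields.add(t.field());
        }
      } else if (q instanceof MultiTermQuery) {
        // covers range, prefix, wildcard, fuzzy, etc.
        fields.add(((MultiTermQuery) q).getField());
      } else if (q instanceof SpanQuery) {
        fields.add(((SpanQuery) q).getField());
      }
      // ... other Query subclasses (DisjunctionMaxQuery, boosts, ...) as needed
    }
  }

A caller would pass in e.g. a HashSet<String> and then check the collected
names against the allowed list.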


On Wed, Feb 19, 2014 at 7:37 PM, Jamie Johnson  wrote:

> This actually may do what I want, I'll have to check.  Right now we are
> using Lucene directly and not Solr for this particular project, but if this
> fits the bill we may be able to use just the query parser.
>
>
> On Wed, Feb 19, 2014 at 4:30 PM, Ahmet Arslan  wrote:
>
>> Hi Jamie,
>>
>> May not be direct answer to your question but your Q reminded me
>> edismax's uf parameter.
>>
>> http://wiki.apache.org/solr/ExtendedDisMax#uf_.28User_Fields.29
>>
>>
>>
>>
>>
>> On Wednesday, February 19, 2014 11:18 PM, Jamie Johnson <
>> jej2...@gmail.com> wrote:
>> Is there a way to get all the fields that are in a particular query?
>> Ultimately I'd like to restrict the fields that a user can use to search
>> so I want to make sure that there aren't any fields in the query that they
>> should not be allowed to search.
>>
>>
>


Re: Getting fields from query

2014-02-19 Thread Jack Krupansky
Try asking the question on the Lucene user list - this is the Solr user 
list.


Also, clarify whether you are trying to get the list of fields used in a 
query or trying to limit the fields that can be used in a query. uf does the 
latter, but your latest message suggested the former. You've confused us - 
or at least me!


Look at how "uf" is implemented in Solr and then just replicate that in the 
Lucene standard query parser. The solr standard query parser was originally 
just a subclass of the Lucene standard query parser, but then they diverged 
and Solr now has a copy of the Lucene query parser.


-- Jack Krupansky

-Original Message- 
From: Jamie Johnson

Sent: Wednesday, February 19, 2014 8:05 PM
To: solr-user@lucene.apache.org ; Ahmet Arslan
Subject: Re: Getting fields from query

On closer inspection this isn't quite what I'm looking for. The
functionality is spot on, but I'm looking for a way to do this using a
Query Parser in Lucene core, i.e. StandardQueryParser unless folks have
experience with using the Solr query parsers with vanilla lucene?  Though
I'd prefer to stick to just Lucene.

If there was a way to just get all of the fields that were part of a query
I could easily do this myself, but it looks as if there is no standard way
to get a field given a query i.e. TermQuery you need to getTerm, SpanQuery
you get field, BooleanQuery you need to check all the clauses, etc.  It's
definitely possible for me to go through each of these and determine the
proper way to do it, but I'd have thought that this was something already
done as a utility somewhere.  I had thought that the
Query.extractTerms(Set terms) would have done this, but it doesn't
appear to be implemented for range queries.  Any ideas how I can get this
information?


On Wed, Feb 19, 2014 at 7:37 PM, Jamie Johnson  wrote:


This actually may do what I want, I'll have to check.  Right now we are
using Lucene directly and not Solr for this particular project, but if 
this

fits the bill we may be able to use just the query parser.


On Wed, Feb 19, 2014 at 4:30 PM, Ahmet Arslan  wrote:


Hi Jamie,

May not be direct answer to your question but your Q reminded me
edismax's uf parameter.

http://wiki.apache.org/solr/ExtendedDisMax#uf_.28User_Fields.29





On Wednesday, February 19, 2014 11:18 PM, Jamie Johnson <
jej2...@gmail.com> wrote:
Is there a way to get all the fields that are in a particular query?
Ultimately I'd like to restrict the fields that a user can use to search
so I want to make sure that there aren't any fields in the query that 
they

should not be allowed to search.








Re: block join and atomic updates

2014-02-19 Thread Michael Sokolov
Maybe he can use updateable docvalues (LUCENE-5189)?  I heard that was a 
thing. Has it made its way into Solr in some way?


-Mike


On 2/19/2014 4:23 AM, Mikhail Khludnev wrote:

Just a side note. Sidecar index might be really useful for updating blocked
docs, but it's in experimenting stage iirc.
http://www.lucenerevolution.org/2013/Sidecar-Index-Solr-Components-for-Parallel-Index-Management


On Wed, Feb 19, 2014 at 10:42 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:


Colleagues,
You are definitely right regarding denorm&collapse. It works fine in most
cases, but look at the case more precisely. Moritz needs to update the
parent's fields, if they are copied during denormalization, the price of
update is the same as block join's. With q-time join updates are way
cheaper, but searching time, you know.
19.02.2014 8:15 пользователь "Walter Underwood" 
написал:

  Listen to that advice. Denormalize, denormalize, denormalize. Think about

the results page and work backwards from that. Flat data model.

wunder
Search guy at Infoseek, Inktomi, Verity, Autonomy, Netflix, and Chegg

On Feb 18, 2014, at 7:37 PM, Jason Hellman <
jhell...@innoventsolutions.com> wrote:


Thinking in terms of normalized data in the context of a Lucene index

is dangerous.  It is not a relational data model technology, and the join
behaviors available to you have limited use.  Each approach requires
compromises that are likely impermissible for certain uses cases.

If it is at all reasonable to consider you will likely be best served

de-normalizing the data.  Of course, your specific details may prove an
exception to this rule...but generally approach works very well.

On Feb 18, 2014, at 4:19 AM, Mikhail Khludnev <

mkhlud...@griddynamics.com> wrote:

absolutely.


On Tue, Feb 18, 2014 at 1:20 PM,  wrote:


But isn't query time join much slower when it comes to a large amount

of

documents?

Zitat von Mikhail Khludnev :


Hello,

It sounds like you need to switch to query time join.
15.02.2014 21:57 пользователь  написал:

Any suggestions?


Zitat von m...@preselect-media.com:

Yonik Seeley :


On Thu, Feb 13, 2014 at 8:25 AM,   wrote:

Is there any workaround to perform atomic updates on blocks or do

I

have to
re-index the parent document and all its children always again

if I

want to
update a field?



The latter, unfortunately.



Is there any plan to change this behavior in near future?

So, I'm thinking of alternatives without loosing the benefit of

block

join.
I try to explain an idea I just thought about:

Let's say I have a parent document A with a number of fields I

want to

update regularly and a number of child documents AC_1 ... AC_n

which are

only indexed once and aren't going to change anymore.
So, if I index A and AC_* in a block and I update A, the block is

gone.

But if I create an additional document AF which only contains

something

like an foreign key to A and indexing AF + AC_* as a block (not A

+ AC_*

anymore), could I perform a {!parent ... } query on AF + AC_* and

make

an
join from the results to get A?
Does this makes any sense and is it even possible? ;-)
And if it's possible, how can I do it?

Thanks,
- Moritz











--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics




--
Walter Underwood
wun...@wunderwood.org










Re: Getting fields from query

2014-02-19 Thread Jamie Johnson
Thanks Jack, I ultimately want to limit the fields, but I'd take getting them if that
was available.  I'll post to the Lucene list, though.
On Feb 19, 2014 8:22 PM, "Jack Krupansky"  wrote:

> Try asking the question on the Lucene user list - this is the Solr user
> list.
>
> Also, clarify whether you are trying to get the list of fields used in a
> query or trying to limit the fields that can be used in a query. uf does
> the latter, but your latest message suggested the former. You're confused
> us - or at least me!
>
> Look at how "uf" is implemented in Solr and then just replicate that in
> the Lucene standard query parser. The solr standard query parser was
> originally just a subclass of the Lucene standard query parser, but then
> they diverged and Solr now has a copy of the Lucene query parser.
>
> -- Jack Krupansky
>
> -Original Message- From: Jamie Johnson
> Sent: Wednesday, February 19, 2014 8:05 PM
> To: solr-user@lucene.apache.org ; Ahmet Arslan
> Subject: Re: Getting fields from query
>
> On closer inspection this isn't quite what I'm looking for. The
> functionality is spot on, but I'm looking for a way to do this using a
> Query Parser in Lucene core, i.e. StandardQueryParser unless folks have
> experience with using the Solr query parsers with vanilla lucene?  Though
> I'd prefer to stick to just Lucene.
>
> If there was a way to just get all of the fields that were part of a query
> I could easily do this myself, but it looks as if there is no standard way
> to get a field given a query i.e. TermQuery you need to getTerm, SpanQuery
> you get field, BooleanQuery you need to check all the clauses, etc.  It's
> definitely possible for me to go through each of these and determine the
> proper way to do it, but I'd have thought that this was something already
> done as a utility somewhere.  I had thought that the
> Query.extractTerms(Set terms) would have done this, but it doesn't
> appear to be implemented for range queries.  Any ideas how I can get this
> information?
>
>
> On Wed, Feb 19, 2014 at 7:37 PM, Jamie Johnson  wrote:
>
>  This actually may do what I want, I'll have to check.  Right now we are
>> using Lucene directly and not Solr for this particular project, but if
>> this
>> fits the bill we may be able to use just the query parser.
>>
>>
>> On Wed, Feb 19, 2014 at 4:30 PM, Ahmet Arslan  wrote:
>>
>>  Hi Jamie,
>>>
>>> May not be direct answer to your question but your Q reminded me
>>> edismax's uf parameter.
>>>
>>> http://wiki.apache.org/solr/ExtendedDisMax#uf_.28User_Fields.29
>>>
>>>
>>>
>>>
>>>
>>> On Wednesday, February 19, 2014 11:18 PM, Jamie Johnson <
>>> jej2...@gmail.com> wrote:
>>> Is there a way to get all the fields that are in a particular query?
>>> Ultimately I'd like to restrict the fields that a user can use to search
>>> so I want to make sure that there aren't any fields in the query that
>>> they
>>> should not be allowed to search.
>>>
>>>
>>>
>>
>


Re: Solr Hot Cpu and high load

2014-02-19 Thread Nitin Sharma
Thanks, Erick. I will try that




On Sun, Feb 16, 2014 at 5:07 PM, Erick Erickson wrote:

> Stored fields are what the Solr DocumentCache in solrconfig.xml
> is all about.
>
> My general feeling is that stored fields are mostly irrelevant for
> search speed, especially if lazy-loading is enabled. The only time
> stored fields come in to play is when assembling the final result
> list, i.e. the 10 or 20 documents that you return. That does imply
> disk I/O, and if you have massive fields theres also decompression
> to add to the CPU load.
>
> So, as usual, "it depends". Try measuring where you restrict the returned
> fields to whatever your  field is for one set of tests, then
> try returning _everything_ for another?
>
> Best,
> Erick
>
>
> On Sun, Feb 16, 2014 at 12:18 PM, Nitin Sharma
> wrote:
>
> > Thanks Tri
> >
> >
> > *a. Are you docs distributed evenly across shards: number of docs and
> size
> > of the shards*
> > >> Yes the size of all the shards is equal (an ignorable delta in the
> order
> > of KB) and so are the # of docs
> >
> > *b. Is your test client querying all nodes, or all the queries go to
> those
> > 2 busy nodes?*
> > *>> *Yes all nodes are receiving exactly the same amount of queries
> >
> >
> > I have one more question. Do stored fields have significant impact on
> > > performance of Solr queries? Is having 50% of the fields stored (out of 100
> > > fields) significantly worse than having 20% of the fields stored?
> > (signficantly == orders of 100s of milliseconds assuming all fields are
> of
> > the same size and type)
> >
> > How are stored fields retrieved in general (always from disk or loaded
> into
> > memory in the first query and then going forward read from memory?)
> >
> > Thanks
> > Nitin
> >
> >
> >
> > On Fri, Feb 14, 2014 at 11:45 AM, Tri Cao  wrote:
> >
> > > 1. Yes, that's the right way to go, well, in theory at least :)
> > > 2. Yes, queries are alway fanned to all shards and will be as slow as
> the
> > > slowest shard. When I looked into
> > > Solr distributed querying implementation a few months back, the support
> > > for graceful degradation for things
> > > like network failures and slow shards was not there yet.
> > > 3. I doubt mmap settings would impact your read-only load, and it seems
> > > you can easily
> > > fit your index in RAM. You could try to warm the file cache to make
> sure
> > > with "cat $sorl_dir > /dev/null".
> > >
> > > It's odd that only 2 nodes are at 100% in your set up. I would check a
> > > couple of things:
> > > a. Are you docs distributed evenly across shards: number of docs and
> size
> > > of the shards
> > > b. Is your test client querying all nodes, or all the queries go to
> those
> > > 2 busy nodes?
> > >
> > > Regards,
> > > Tri
> > >
> > > On Feb 14, 2014, at 10:52 AM, Nitin Sharma <
> nitin.sha...@bloomreach.com>
> > > wrote:
> > >
> > > Hello folks
> > >
> > > We are currently using solrcloud 4.3.1. We have 8 node solrcloud
> cluster
> > > with 32 cores, 60Gb of ram and SSDs.We are using zk to manage the
> > > solrconfig used by our collections
> > >
> > > We have many collections and some of them are relatively very large
> > > compared to the other. The size of the shard of these big collections
> are
> > > in the order of Gigabytes.We decided to split the bigger collection
> > evenly
> > > across all nodes (8 shards and 2 replicas) with maxNumShards > 1.
> > >
> > > We did a test with a read load only on one big collection and we still
> > see
> > > only 2 nodes running 100% CPU and the rest are blazing through the
> > queries
> > > way faster (under 30% cpu). [Despite all of them being sharded across
> all
> > > nodes]
> > >
> > > I checked the JVM usage and found that none of the pools have high
> > > utilization (except Survivor space which is 100%). The GC cycles are in
> > > the order of ms and mostly doing scavenge. Mark and sweep occurs once
> > every
> > > 30 minutes
> > >
> > > Few questions:
> > >
> > > 1. Sharding all collections (small and large) across all nodes evenly
> > >
> > > distributes the load and makes the system characteristics of all
> machines
> > > similar. Is this a recommended way to do it?
> > > 2. Solr Cloud does a distributed query by default. So if a node is at
> > >
> > > 100% CPU does it slow down the response time for the other nodes
> waiting
> > > for this query? (or does it have a timeout if it cannot get a response
> > from
> > > a node within x seconds?)
> > > 3. Our collections use Mmap directory but i specifically haven't
> enabled
> > >
> > > anything related to mmaps (locked pages under ulimit). Does it adversely
> > > affect performance? Or can it lock pages even without this?
> > >
> > > Thanks a lot in advance.
> > > Nitin
> > >
> > >
> >
> >
> > --
> > - N
> >
>



-- 
- N


Re: Caching Solr boost functions?

2014-02-19 Thread Jason Hellman
Gregg,

The QueryResultCache caches a sorted int array of results matching a query. 
This should overlap very nicely with your desired behavior, as a hit in this 
cache will not perform a Lucene query, nor need to recalculate scores.  

Now, ‘for the life of the Searcher’ is the trick here.  You can size your cache 
large enough to ensure it can fit every possible query, but at some point this 
is untenable.  I would argue that high volatility of query parameters would 
invalidate the need for caching anyway, but that’s clearly debatable.  
Nevertheless, this should work admirably well to solve your needs.

Jason

On Feb 18, 2014, at 11:32 AM, Gregg Donovan  wrote:

> We're testing out a new handler that uses edismax with three different
> "boost" functions. One has a random() function in it, so is not very
> cacheable, but the other two boost functions do not change from query to
> query.
> 
> I'd like to tell Solr to cache those boost queries for the life of the
> Searcher so they don't get recomputed every time. Is there any way to do
> that out of the box?
> 
> In a different custom QParser we have we wrote a CachingValueSource that
> wrapped a ValueSource with a custom ValueSource cache. Would it make sense
> to implement that as a standard Solr function so that one could do:
> 
> boost=cache(expensiveFunctionQuery())
> 
> Thanks.
> 
> --Gregg



Re: Solr Autosuggest - Strange issue with leading numbers in query

2014-02-19 Thread Jason Hellman
Here’s a rather obvious question:  have you rebuilt your spell index recently?  
Is it possible the offending numbers snuck into the spell dictionary?  The 
terms component will show you what’s in your current, searchable field…but not 
the dictionary.

If my memory serves correctly, with collate=true this would allow for such 
behavior to occur, especially with onlyMorePopular set to false (which would 
ensure the resulting collation has a query count greater than the current 
query).  Have you flipped onlyMorePopular to true to confirm?
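
For reference, a rough sketch of a request that rebuilds the dictionary and
tests that flag (host, collection, handler name, and query are placeholders):

  http://localhost:8983/solr/collection1/spell?q=9000k&spellcheck=true&spellcheck.build=true&spellcheck.collate=true&spellcheck.onlyMorePopular=true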




On Feb 18, 2014, at 10:16 AM, bbi123  wrote:

> Thanks a lot for your response Erik.
> 
> I was trying to find if I have any suggestion starting with numbers using
> terms component but I couldn't find any.. Its very strange!!!
> 
> Anyways, thanks again for your response.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Autosuggest-Strange-issue-with-leading-numbers-in-query-tp4116751p4118072.html
> Sent from the Solr - User mailing list archive at Nabble.com.