Re: Weird behavior of stopwords in search query

2014-02-19 Thread Ahmet Arslan
Hi Samik,

Please see the lowercaseOperators parameter of edismax.
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
If lowercaseOperators=true, then "and" is treated as AND. The stopwords parameter
could also be used.

Stopwords and edismax have had issues (with mm=100%) in the past. I'm not sure about
the current situation, but you may need to apply the same set of stopwords to all
fields listed in the qf parameter, even the string types. The string type would have
to be replaced with a KeywordTokenizer + StopFilter combo.
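
For reference, a rough sketch of what such a field type could look like in
schema.xml (the type name and stopword file name are just placeholders):

  <fieldType name="string_stopped" class="solr.TextField" sortMissingLast="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    </analyzer>
  </fieldType>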





On Wednesday, February 19, 2014 7:48 AM, shamik  wrote:
Jack, thanks for the pointer. I should have checked this more closely. I'm using
edismax and here's my qf entry:


          <str name="qf">
             id^10.0 cat^1.4 text^0.5 features^1.0 name^1.2 sku^1.5 manu^1.1
             title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
          </str>

As you can see, I was boosting id and cat, which are of type string and of
course don't go through the stopwords filter. Removing them returned one
result, which is based on the AND operator.

The part I'm not clear on is how "and" is being treated even though it's a
stopword and the default operator is OR. Shouldn't it be ignored?






Re: block join and atomic updates

2014-02-19 Thread Mikhail Khludnev
Just a side note. A sidecar index might be really useful for updating blocked
docs, but it's in an experimental stage, IIRC.
http://www.lucenerevolution.org/2013/Sidecar-Index-Solr-Components-for-Parallel-Index-Management


On Wed, Feb 19, 2014 at 10:42 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Colleagues,
> You are definitely right regarding denorm&collapse. It works fine in most
> cases, but look at this case more closely. Moritz needs to update the
> parent's fields; if they are copied during denormalization, the price of an
> update is the same as with block join. With q-time join, updates are way
> cheaper, but search time suffers, as you know.
> On 19.02.2014 8:15, "Walter Underwood" 
> wrote:
>
>  Listen to that advice. Denormalize, denormalize, denormalize. Think about
>> the results page and work backwards from that. Flat data model.
>>
>> wunder
>> Search guy at Infoseek, Inktomi, Verity, Autonomy, Netflix, and Chegg
>>
>> On Feb 18, 2014, at 7:37 PM, Jason Hellman <
>> jhell...@innoventsolutions.com> wrote:
>>
>> > Thinking in terms of normalized data in the context of a Lucene index
>> is dangerous.  It is not a relational data model technology, and the join
>> behaviors available to you have limited use.  Each approach requires
>> compromises that are likely impermissible for certain use cases.
>> >
>> > If it is at all reasonable to consider you will likely be best served
>> de-normalizing the data.  Of course, your specific details may prove an
>> exception to this rule...but generally the approach works very well.
>> >
>> > On Feb 18, 2014, at 4:19 AM, Mikhail Khludnev <
>> mkhlud...@griddynamics.com> wrote:
>> >
>> >> absolutely.
>> >>
>> >>
>> >> On Tue, Feb 18, 2014 at 1:20 PM,  wrote:
>> >>
>> >>> But isn't query time join much slower when it comes to a large amount
>> of
>> >>> documents?
>> >>>
>> >>> Zitat von Mikhail Khludnev :
>> >>>
>> >>>
>> >>> Hello,
>> 
>>  It sounds like you need to switch to query time join.
>>  15.02.2014 21:57 пользователь  написал:
>> 
>>  Any suggestions?
>> >
>> >
>> > Zitat von m...@preselect-media.com:
>> >
>> > Yonik Seeley :
>> >
>> >>
>> >> On Thu, Feb 13, 2014 at 8:25 AM,   wrote:
>> >>>
>> >>> Is there any workaround to perform atomic updates on blocks or do
>> I
>>  have to
>>  re-index the parent document and all its children always again
>> if I
>>  want to
>>  update a field?
>> 
>> 
>> >>> The latter, unfortunately.
>> >>>
>> >>>
>> >> Is there any plan to change this behavior in near future?
>> >>
>> >> So, I'm thinking of alternatives without losing the benefit of
>> block
>> >> join.
>> >> I try to explain an idea I just thought about:
>> >>
>> >> Let's say I have a parent document A with a number of fields I
>> want to
>> >> update regularly and a number of child documents AC_1 ... AC_n
>> which are
>> >> only indexed once and aren't going to change anymore.
>> >> So, if I index A and AC_* in a block and I update A, the block is
>> gone.
>> >> But if I create an additional document AF which only contains
>> something
>> >> like a foreign key to A and indexing AF + AC_* as a block (not A
>> + AC_*
>> >> anymore), could I perform a {!parent ... } query on AF + AC_* and
>> make
>> >> an
>> >> join from the results to get A?
>> >> Does this make any sense and is it even possible? ;-)
>> >> And if it's possible, how can I do it?
>> >>
>> >> Thanks,
>> >> - Moritz
>> >>
>> >>
>> >
>> >
>> >
>> >
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >> --
>> >> Sincerely yours
>> >> Mikhail Khludnev
>> >> Principal Engineer,
>> >> Grid Dynamics
>> >>
>> >> 
>> >> 
>> >
>>
>> --
>> Walter Underwood
>> wun...@wunderwood.org
>>
>>
>>
>>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Problems with ICUCollationField

2014-02-19 Thread Thomas Fischer
Hello,

I'm migrating to solr 4.6.1 and have problems with the ICUCollationField 
(apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).

I consistently get the error message
Error loading class 'solr.ICUCollationField'.
even after
INFO: Adding 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' 
to classloader
and
INFO: Adding 
'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
 to classloader.

Am I missing something?

In Solr's subversion I found
/SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
but no corresponding class in solr4.6.1's contrib folder.

Best
Thomas



Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
you need the solr analysis-extras jar in your classpath, too.



On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer  wrote:

> Hello,
>
> I'm migrating to solr 4.6.1 and have problems with the ICUCollationField
> (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).
>
> I get consistently the error message
> Error loading class 'solr.ICUCollationField'.
> even after
> INFO: Adding
> 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' to
> classloader
> and
> INFO: Adding
> 'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
> to classloader.
>
> Am I missing something?
>
> I solr's subversion I found
>
> /SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
> but no corresponding class in solr4.6.1's contrib folder.
>
> Best
> Thomas
>
>


Re: Problems with ICUCollationField

2014-02-19 Thread Thomas Fischer
Hello Robert,

I already added
contrib/analysis-extras/lib/
and
contrib/analysis-extras/lucene-libs/
via lib directives in solrconfig; this is why the classes mentioned are loaded.

Do you know which jar is supposed to contain the ICUCollationField?

Best regards
Thomas



On 19.02.2014 at 13:54, Robert Muir wrote:

> you need the solr analysis-extras jar in your classpath, too.
> 
> 
> 
> On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer  wrote:
> 
>> Hello,
>> 
>> I'm migrating to solr 4.6.1 and have problems with the ICUCollationField
>> (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).
>> 
>> I get consistently the error message
>> Error loading class 'solr.ICUCollationField'.
>> even after
>> INFO: Adding
>> 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' to
>> classloader
>> and
>> INFO: Adding
>> 'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
>> to classloader.
>> 
>> Am I missing something?
>> 
>> I solr's subversion I found
>> 
>> /SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
>> but no corresponding class in solr4.6.1's contrib folder.
>> 
>> Best
>> Thomas
>> 
>> 



Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
you need the solr analysis-extras jar itself, too.



On Wed, Feb 19, 2014 at 8:25 AM, Thomas Fischer  wrote:

> Hello Robert,
>
> I already added
> contrib/analysis-extras/lib/
> and
> contrib/analysis-extras/lucene-libs/
> via lib directives in solrconfig, this is why the classes mentioned are
> loaded.
>
> Do you know which jar is supposed to contain the ICUCollationField?
>
> Best regards
> Thomas
>
>
>
> Am 19.02.2014 um 13:54 schrieb Robert Muir:
>
> > you need the solr analysis-extras jar in your classpath, too.
> >
> >
> >
> > On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer 
> wrote:
> >
> >> Hello,
> >>
> >> I'm migrating to solr 4.6.1 and have problems with the ICUCollationField
> >> (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).
> >>
> >> I get consistently the error message
> >> Error loading class 'solr.ICUCollationField'.
> >> even after
> >> INFO: Adding
> >> 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' to
> >> classloader
> >> and
> >> INFO: Adding
> >>
> 'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
> >> to classloader.
> >>
> >> Am I missing something?
> >>
> >> I solr's subversion I found
> >>
> >>
> /SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
> >> but no corresponding class in solr4.6.1's contrib folder.
> >>
> >> Best
> >> Thomas
> >>
> >>
>
>


Re: Weird behavior of stopwords in search query

2014-02-19 Thread Jack Krupansky
Simply add the lowercaseOperators=false parameter, or add it to the 
"defaults" section of the request handler in solrconfig, and then "and" will 
not be treated as "AND".
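
For reference, a rough sketch of the solrconfig.xml "defaults" approach (the
handler name and the other defaults shown are just placeholders):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="lowercaseOperators">false</str>
    </lst>
  </requestHandler>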


The wiki is confusing - it shouldn't be advising you how to set the 
parameter to achieve the default setting! Rather, it should tell you how to 
override the default setting.


-- Jack Krupansky

-Original Message- 
From: Ahmet Arslan

Sent: Wednesday, February 19, 2014 4:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Weird behavior of stopwords in search query

Hi Samik,

Please see parameter of edismax. 
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
If lowercaseOperators=true then and is treated as AND. Also stopwords 
parameter could be used.


Stopwords and edismax had issues (when mm=100%) in history. Not sure current 
situation but you may need to apply same set of stopwords to all fields 
listed in qf parameter. Even to string types. String type should be replaced 
with KeywordTokenizer + StopwordFilter combo.






On Wednesday, February 19, 2014 7:48 AM, shamik  wrote:
Jack, thanks for the pointer. I should have checked this closely. I'm using
edismax and here's my qf entry :


 id^10.0 cat^1.4 text^0.5 features^1.0 name^1.2 sku^1.5 manu^1.1
title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0

  

As you can see, I was boosting id and cat which are of type string and of
course doesn't go through the stopwords filter. Removing them returned one
result which is based on AND operator.

The part what I'm not clear is how "and" is being treated even through its a
stopword and the default operator is OR. Shouldn't this be ignored ?






Re: Indexed a new big database while the old is running?

2014-02-19 Thread Bruno Mannina

Hi Shawn,

Thanks for your answer.

Actually we don't have a performance problem because we only do select requests.
We have 4 CPUs, 8 cores, and 24 GB of RAM.

I know how to create an alias; my question was just concerning performance,
and you are right,
it's impossible to answer this question without more information about my
system, sorry.


I will do a real test and check whether performance goes down; if it does, I will
stop the new indexing.


If you have more information concerning indexing performance with my
server config, don't hesitate to
write me. :)

Have a nice day,

Regards,
Bruno


On 18/02/2014 16:30, Shawn Heisey wrote:

On 2/18/2014 5:28 AM, Bruno Mannina wrote:

We have actually a SOLR db with around 88 000 000 docs.
All work fine :)

We receive each year a new backfile with the same content (but improved).

Index these docs takes several days on SOLR,
So is it possible to create a new collection (restart SOLR) and
Index these new 88 000 000 docs without stopping the current collection ?

We have around 1 million connections by month.

Do you think that this new indexation may cause problem to SOLR using?
Note: new database will not be used until the current collection will be
stopped.

You can instantly switch between collections by using the alias feature.
  To do this, you would have collections named something like test201302
and test201402, then you would create an alias named 'test' that points
to one of these collections.  Your code can use 'test' as the collection
name.
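
For reference, creating such an alias can be done with the Collections API,
roughly like this (host, port, and the collection names from the example above
are just placeholders):

  http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=test&collections=test201402

Re-running the same call with a different "collections" value should re-point
the alias at the new collection.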

Without a lot more information, it's impossible to say whether building
a new collection will cause performance problems for the existing
collection.

It does seem like a problem that rebuilding the index takes several
days.  You might already be having performance problems.  It's also
possible that there's an aspect to this that I am not seeing, and that
several days is perfectly normal for YOUR index.

Not enough RAM is the most common reason for performance issues on a
large index:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn







Re: Problems with ICUCollationField

2014-02-19 Thread Thomas Fischer
Thanks, that helps!

I'm trying to migrate from the now deprecated ICUCollationKeyFilterFactory I 
used before to the ICUCollationField.
Is there any description how to achieve this?

First tries now yield

ICUCollationField does not support specifying an analyzer.

which makes it complicated since I used the ICUCollationKeyFilterFactory to 
standardize my text fields (in particular because of German Umlauts).
But an ICUCollationField without LowerCaseFilter, a WhitespaceTokenizer, a 
LetterTokenizer, etc. doesn't do me much good, I'm afraid.
Or is this somehow wrapped into the ICUCollationField?

I didn't find ICUCollationField in the Solr wiki, and there's not much information in
the reference guide.
And the hint

"solr.ICUCollationField is included in the Solr analysis-extras contrib - see 
solr/contrib/analysis-extras/README.txt for instructions on which jars you need 
to add to your SOLR_HOME/lib in order to use it."

is misleading insofar as this README.txt doesn't mention the 
solr-analysis-extras-4.6.1.jar in dist.

Best
Thomas


On 19.02.2014 at 14:27, Robert Muir wrote:

> you need the solr analysis-extras jar itself, too.
> 
> 
> 
> On Wed, Feb 19, 2014 at 8:25 AM, Thomas Fischer  wrote:
> 
>> Hello Robert,
>> 
>> I already added
>> contrib/analysis-extras/lib/
>> and
>> contrib/analysis-extras/lucene-libs/
>> via lib directives in solrconfig, this is why the classes mentioned are
>> loaded.
>> 
>> Do you know which jar is supposed to contain the ICUCollationField?
>> 
>> Best regards
>> Thomas
>> 
>> 
>> 
>> Am 19.02.2014 um 13:54 schrieb Robert Muir:
>> 
>>> you need the solr analysis-extras jar in your classpath, too.
>>> 
>>> 
>>> 
>>> On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer 
>> wrote:
>>> 
 Hello,
 
 I'm migrating to solr 4.6.1 and have problems with the ICUCollationField
 (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).
 
 I get consistently the error message
 Error loading class 'solr.ICUCollationField'.
 even after
 INFO: Adding
 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' to
 classloader
 and
 INFO: Adding
 
>> 'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
 to classloader.
 
 Am I missing something?
 
 I solr's subversion I found
 
 
>> /SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
 but no corresponding class in solr4.6.1's contrib folder.
 
 Best
 Thomas
 
 
>> 
>> 



Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
Hmm, for standardization of text fields, collation might be a little
awkward.

For your German umlauts, what do you mean by standardize? Is this to
achieve equivalency of e.g. oe to ö in your search terms?

In that case, a simpler approach would be to put
GermanNormalizationFilterFactory in your chain:
http://lucene.apache.org/core/4_6_1/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html
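
For reference, a rough sketch of such a chain in schema.xml (the field type
name and tokenizer choice are just placeholders):

  <fieldType name="text_de" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.GermanNormalizationFilterFactory"/>
    </analyzer>
  </fieldType>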


On Wed, Feb 19, 2014 at 9:16 AM, Thomas Fischer  wrote:

> Thanks, that helps!
>
> I'm trying to migrate from the now deprecated ICUCollationKeyFilterFactory
> I used before to the ICUCollationField.
> Is there any description how to achieve this?
>
> First tries now yield
>
> ICUCollationField does not support specifying an analyzer.
>
> which makes it complicated since I used the ICUCollationKeyFilterFactory
> to standardize my text fields (in particular because of German Umlauts).
> But an ICUCollationField without LowerCaseFilter, a WhitespaceTokenizer, a
> LetterTokenizer, etc. doesn't do me much good, I'm afraid.
> Or is this somehow wrapped into the ICUCollationField?
>
> I didn't find ICUCollationField  in the solr wiki and not much information
> in the reference.
> And the hint
>
> "solr.ICUCollationField is included in the Solr analysis-extras contrib -
> see solr/contrib/analysis-extras/README.txt for instructions on which jars
> you need to add to your SOLR_HOME/lib in order to use it."
>
> is misleading insofar as this README.txt doesn't mention the
> solr-analysis-extras-4.6.1.jar in dist.
>
> Best
> Thomas
>
>
> Am 19.02.2014 um 14:27 schrieb Robert Muir:
>
> > you need the solr analysis-extras jar itself, too.
> >
> >
> >
> > On Wed, Feb 19, 2014 at 8:25 AM, Thomas Fischer 
> wrote:
> >
> >> Hello Robert,
> >>
> >> I already added
> >> contrib/analysis-extras/lib/
> >> and
> >> contrib/analysis-extras/lucene-libs/
> >> via lib directives in solrconfig, this is why the classes mentioned are
> >> loaded.
> >>
> >> Do you know which jar is supposed to contain the ICUCollationField?
> >>
> >> Best regards
> >> Thomas
> >>
> >>
> >>
> >> Am 19.02.2014 um 13:54 schrieb Robert Muir:
> >>
> >>> you need the solr analysis-extras jar in your classpath, too.
> >>>
> >>>
> >>>
> >>> On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer 
> >> wrote:
> >>>
>  Hello,
> 
>  I'm migrating to solr 4.6.1 and have problems with the
> ICUCollationField
>  (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).
> 
>  I get consistently the error message
>  Error loading class 'solr.ICUCollationField'.
>  even after
>  INFO: Adding
>  'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' to
>  classloader
>  and
>  INFO: Adding
> 
> >>
> 'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
>  to classloader.
> 
>  Am I missing something?
> 
>  I solr's subversion I found
> 
> 
> >>
> /SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
>  but no corresponding class in solr4.6.1's contrib folder.
> 
>  Best
>  Thomas
> 
> 
> >>
> >>
>
>


Re: Problems with ICUCollationField

2014-02-19 Thread Thomas Fischer

> Hmm, for standardization of text fields, collation might be a little
> awkward.

I arrived there after using custom rules for a while (see "RuleBasedCollator" 
on http://wiki.apache.org/solr/UnicodeCollation) and then being told
"For better performance, less memory usage, and support for more locales, you 
can add the analysis-extras contrib and use ICUCollationKeyFilterFactory 
instead." (on the same page under "ICU Collation").

> For your german umlauts, what do you mean by standardize? is this to
> achieve equivalency of e.g. oe to ö in your search terms?

That is the main point, but I might also need the additional normalization of
combined characters, like "o" + combining diaeresis (U+0308) = "ö", and probably
similar constructions for other languages (like Hungarian).

> In that case, a simpler approach would be to put
> GermanNormalizationFilterFactory in your chain:
> http://lucene.apache.org/core/4_6_1/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html

I'll see how far I get with this, but from the description
• 'ä', 'ö', 'ü' are replaced by 'a', 'o', 'u', respectively.
• 'ae' and 'oe' are replaced by 'a' and 'o', respectively.
this seems to be too far-reaching a reduction: while the identification "ä=ae"
is not very serious and rarely misleading, "ä=a" might lump together words that
shouldn't be; "Äsen" and "Asen" are quite different concepts.

In general, the deprecation of ICUCollationKeyFilterFactory doesn't seem to be 
really thought through.

Thanks anyway, best
Thomas

> 
> On Wed, Feb 19, 2014 at 9:16 AM, Thomas Fischer  wrote:
> 
>> Thanks, that helps!
>> 
>> I'm trying to migrate from the now deprecated ICUCollationKeyFilterFactory
>> I used before to the ICUCollationField.
>> Is there any description how to achieve this?
>> 
>> First tries now yield
>> 
>> ICUCollationField does not support specifying an analyzer.
>> 
>> which makes it complicated since I used the ICUCollationKeyFilterFactory
>> to standardize my text fields (in particular because of German Umlauts).
>> But an ICUCollationField without LowerCaseFilter, a WhitespaceTokenizer, a
>> LetterTokenizer, etc. doesn't do me much good, I'm afraid.
>> Or is this somehow wrapped into the ICUCollationField?
>> 
>> I didn't find ICUCollationField  in the solr wiki and not much information
>> in the reference.
>> And the hint
>> 
>> "solr.ICUCollationField is included in the Solr analysis-extras contrib -
>> see solr/contrib/analysis-extras/README.txt for instructions on which jars
>> you need to add to your SOLR_HOME/lib in order to use it."
>> 
>> is misleading insofar as this README.txt doesn't mention the
>> solr-analysis-extras-4.6.1.jar in dist.
>> 
>> Best
>> Thomas
>> 
>> 
>> Am 19.02.2014 um 14:27 schrieb Robert Muir:
>> 
>>> you need the solr analysis-extras jar itself, too.
>>> 
>>> 
>>> 
>>> On Wed, Feb 19, 2014 at 8:25 AM, Thomas Fischer 
>> wrote:
>>> 
 Hello Robert,
 
 I already added
 contrib/analysis-extras/lib/
 and
 contrib/analysis-extras/lucene-libs/
 via lib directives in solrconfig, this is why the classes mentioned are
 loaded.
 
 Do you know which jar is supposed to contain the ICUCollationField?
 
 Best regards
 Thomas
 
 
 
 Am 19.02.2014 um 13:54 schrieb Robert Muir:
 
> you need the solr analysis-extras jar in your classpath, too.
> 
> 
> 
> On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer 
 wrote:
> 
>> Hello,
>> 
>> I'm migrating to solr 4.6.1 and have problems with the
>> ICUCollationField
>> (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).
>> 
>> I get consistently the error message
>> Error loading class 'solr.ICUCollationField'.
>> even after
>> INFO: Adding
>> 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' to
>> classloader
>> and
>> INFO: Adding
>> 
 
>> 'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
>> to classloader.
>> 
>> Am I missing something?
>> 
>> I solr's subversion I found
>> 
>> 
 
>> /SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
>> but no corresponding class in solr4.6.1's contrib folder.
>> 
>> Best
>> Thomas
>> 
>> 
 
 
>> 
>> 



Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
On Wed, Feb 19, 2014 at 10:33 AM, Thomas Fischer  wrote:

>
> > Hmm, for standardization of text fields, collation might be a little
> > awkward.
>
> I arrived there after using custom rules for a while (see
> "RuleBasedCollator" on http://wiki.apache.org/solr/UnicodeCollation) and
> then being told
> "For better performance, less memory usage, and support for more locales,
> you can add the analysis-extras contrib and use
> ICUCollationKeyFilterFactory instead." (on the same page under "ICU
> Collation").
>
> > For your german umlauts, what do you mean by standardize? is this to
> > achieve equivalency of e.g. oe to ö in your search terms?
>
> That is the main point, but I might also need the additional normalization
> of combined characters like
> o+  ̈ = ö and probably similar constructions for other languages (like
> Hungarian).
>

Sure, but using collation to get normalization is pretty overkill too. Maybe
try ICUNormalizer2Filter? This gives you better control over the
normalization anyway.
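
For reference, a rough sketch of using it in schema.xml (it lives in the same
analysis-extras contrib already discussed; the field type name and the chosen
normalization form are just placeholders):

  <fieldType name="text_icu_norm" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc_cf" mode="compose"/>
    </analyzer>
  </fieldType>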


>
> > In that case, a simpler approach would be to put
> > GermanNormalizationFilterFactory in your chain:
> >
> http://lucene.apache.org/core/4_6_1/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html
>
> I'll see how far I get with this, but from the description
> • 'ä', 'ö', 'ü' are replaced by 'a', 'o', 'u', respectively.
> • 'ae' and 'oe' are replaced by 'a', and 'o', respectively.
> this seems to be too far-reaching a reduction: while the identification
> "ä=ae" is not very serious and rarely misleading, "ä=a" might pack words
> together that shouldn't be, "Äsen" and "Asen" are quite different concepts,
>

I'm not sure that's a mainstream opinion: not only do the default German
collation rules conflate these two characters as equivalent at the primary
level, but so do many German stemming algorithms. Similar arguments could
be made for 'résumé' versus 'resume' and so on. Search isn't an exact
science.


Exact fragment length in highlighting

2014-02-19 Thread Juan Carlos Serrano
Hello everybody,

I'm using Solr 4.6.1 and I'd like to know if there's a way to determine
exactly the number of characters of a fragment used in highlights. If I use
hl.fragsize=70, the length of the fragments I get often varies,
and I get results 90 characters long.

Regards and thanks in advance,

Juan Carlos


Re: Preventing multiple on-deck searchers without causing failed commits

2014-02-19 Thread Greg Walters
I believe that there's a configuration option that'll make on-deck searchers be 
used if they're needed even if they're not fully warmed yet. You might try that 
option and see if it doesn't solve your 503 errors.

Thanks,
Greg

On Feb 18, 2014, at 9:05 PM, Erick Erickson  wrote:

> Colin:
> 
> Stop. Back up. The automatic soft commits will make updates available to
> your users every second. Those documents _include_ anything from your "hard
> commit" jobs. What could be faster? Parenthetically I'll add that 1 second
> soft commits are rarely an actual requirement, but that's your decision.
> 
> For the hard commits. Fine. Do them if you insist. Just set
> openSearcher=false. The documents will be searchable the next time the soft
> commit happens, within one second. The key is openSearcher=false. That
> prevents starting a brand new searcher.
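
For reference, a rough sketch of the solrconfig.xml settings being described
here (the times are just placeholders):

  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>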
> 
> BTW, your commits are not failing. It's just that _after_ the commit
> happens, the warming searcher limit is exceeded.
> 
> You can even wait until the segments are flushed to disk. All without
> opening a searcher.
> 
> Shawn is spot on in his recommendations to not fixate on the commits. Solr
> handles that. Here's a long blog about all the details of durability .vs.
> visibility.
> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> 
> You're over-thinking the problem here, trying to control commits with a
> sledgehammer when you don't need to, just use the built-in capabilities.
> 
> Best,
> Erick
> 
> 
> 
> On Tue, Feb 18, 2014 at 10:33 AM, Colin Bartolome  wrote:
> 
>> On 02/18/2014 10:15 AM, Shawn Heisey wrote:
>> 
>>> If you want to be completely in control like that, get rid of the
>>> automatic soft commits and just do the hard commits.
>>> 
>>> I would personally choose another option for your setup -- get rid of
>>> *all* explicit commits entirely, and just configure autoCommit and
>>> autoSoftCommit in the server config.  Since you're running 4.x, you really
>>> should have the transaction log (updateLog in the config) enabled.  You
>>> can rely on the transaction log to replay updates since the last hard
>>> commit if there's ever a crash.
>>> 
>>> I would also recommend upgrading to 4.6.1, but that's a completely
>>> separate item.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>>> 
>> We use the automatic soft commits to get search index updates to our users
>> faster, via Near Realtime Searching. We have the updateLog enabled. I'm not
>> worried that the Solr side of the equation will lose data; I'm worried that
>> the communication from our web servers and scheduled jobs to the Solr
>> servers will break down and nothing will come along to make sure everything
>> is up to date. It sounds like what we're picturing is not currently
>> supported, so I'll file the RFE.
>> 
>> Will upgrading to 4.6.1 help at all with this issue?
>> 



Re: Exact fragment length in highlighting

2014-02-19 Thread Ahmet Arslan
Hi Juan,

Are you counting the number of characters of the HTML-rendered snippet?

I think pre and post strings (html markup which are not displayed) are causing 
that difference.

Ahmet


On Wednesday, February 19, 2014 5:53 PM, Juan Carlos Serrano 
 wrote:
Hello everybody,

I'm using Solr 4.6.1. and I'd like to know if there's a way to determine
exactly the number of characters of a fragment used in highlights. If I use
hl.fragsize=70 the length of the fragments that I get is variable (often)
and I get results of 90 characters length.

Regards and thanks in advance,

Juan Carlos



Re: Fault Tolerant Technique of Solr Cloud

2014-02-19 Thread Per Steffensen

On 19/02/14 07:57, Vineet Mishra wrote:

Thanks for all your responses, but my doubt is which *Server:Port* the
query should be made to, as we don't know the crashed server or which server might
crash in the future (as any server can go down).
That is what CloudSolrServer will deal with for you. It knows which
servers are down and makes sure not to send requests to those servers.


The only intention for writing this doubt is to get an idea about how the
query format for distributed search might work if any of the shard or
replica goes down.


// Setting up your CloudSolrServer-client
CloudSolrServer client = new CloudSolrServer("<zkHost>");  // "<zkHost>" being the
// same string as you provide in -DzkHost when starting your servers
client.setDefaultCollection("collection1");
client.connect();

// Creating and firing queries (you can do it in different ways, but at least
// this is an option)
SolrQuery query = new SolrQuery("*:*");
QueryResponse results = client.query(query);


Because you are using CloudSolrServer, you do not have to worry about
sending requests to a crashed server.


In your example I believe the situation is as follows:
* One collection called "collection1" with two shards "shard1" and 
"shard2" each having two replica "replica1" and "replica2" (a replica is 
an "instance" of a shard, and when you have one replica you are not 
having replication).
* collection1.shard1.replica1 is running on localhost:8983 and 
collection1.shard1.replica2 is running on localhost:8900 (or maybe switched)
* collection1.shard2.replica1 is running on localhost:7574 and 
collection1.shard2.replica2 is running on localhost:7500 (or maybe switched)
If localhost:8900 is the only server that is down, all data is still 
available for search, because every shard has at least one replica 
running. In that case I believe setting "shards.tolerant" will not make 
a difference. You will get your response no matter what. But if 
localhost:8983 was also down, there would be no live replica of shard1. In 
that case you will get an exception from your query, indicating that the 
query cannot be carried out over the complete data-set. In that case if 
you set "shards.tolerant" that behaviour will change, and you will not 
get an exception - you will get a real response, but it will just not 
include data from shard1, because it is not available at the moment. 
That is just the way I believe "shards.tolerant" works, but you might 
want to verify that.


To set "shards.tolerant":

SolrQuery query = new SolrQuery("*:*");
query.set("shards.tolerant", true);
QueryResponse results = client.query(query);


I believe distributed search is the default, but you can explicitly require it by

query.setDistrib(true);

or

query.set("distrib", true);



Thanks




Re: Preventing multiple on-deck searchers without causing failed commits

2014-02-19 Thread Shawn Heisey

On 2/19/2014 8:59 AM, Greg Walters wrote:

I believe that there's a configuration option that'll make on-deck searchers be 
used if they're needed even if they're not fully warmed yet. You might try that 
option and see if it doesn't solve your 503 errors.


I'm fairly sure that this option (useColdSearcher) only applies to 
warming queries defined in solrconfig.xml, and that it only applies to 
situations when the searcher that is warming up is the *ONLY* searcher 
that exists.  The only time that should happen is at Solr startup and 
core reload.  At that time, the only warming queries that will be 
executed are those configured for the firstSearcher event.
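
For reference, the option lives in the <query> section of solrconfig.xml,
roughly like this (the values shown are just the usual example defaults):

  <query>
    <useColdSearcher>false</useColdSearcher>
    <maxWarmingSearchers>2</maxWarmingSearchers>
  </query>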


A quick peek at the code (branch_4x, SolrCore.java, starting at line 
1647) seems to confirm this.  I did not do an in-depth analysis.


Thanks,
Shawn



Re: Preventing multiple on-deck searchers without causing failed commits

2014-02-19 Thread Greg Walters
> A quick peek at the code (branch_4x, SolrCore.java, starting at line 1647) 
> seems to confirm this.

It seems my understanding of that option was wrong! Thanks for correcting me 
Shawn.

Greg 

On Feb 19, 2014, at 11:19 AM, Shawn Heisey  wrote:

> On 2/19/2014 8:59 AM, Greg Walters wrote:
>> I believe that there's a configuration option that'll make on-deck searchers 
>> be used if they're needed even if they're not fully warmed yet. You might 
>> try that option and see if it doesn't solve your 503 errors.
> 
> I'm fairly sure that this option (useColdSearcher) only applies to warming 
> queries defined in solrconfig.xml, and that it only applies to situations 
> when the searcher that is warming up is the *ONLY* searcher that exists.  The 
> only time that should happen is at Solr startup and core reload.  At that 
> time, the only warming queries that will be executed are those configured for 
> the firstSearcher event.
> 
> A quick peek at the code (branch_4x, SolrCore.java, starting at line 1647) 
> seems to confirm this.  I did not do an in-depth analysis.
> 
> Thanks,
> Shawn
> 



Re: Exact fragment length in highlighting

2014-02-19 Thread Jason Hellman
Juan,

Pay close attention to the boundary scanner you’re employing:

http://wiki.apache.org/solr/HighlightingParameters#hl.boundaryScanner

You can be explicit to indicate a type (hl.bs.type) with options such as 
CHARACTER, WORD, SENTENCE, and LINE.  The default is WORD (as the wiki 
indicates) and I presume this is what you are employing.
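
For reference, a rough sketch of a request exercising these knobs (host,
collection, and field names are placeholders; the breakIterator boundary
scanner also assumes the highlighted field has termVectors with positions and
offsets so the FastVectorHighlighter can be used):

  http://localhost:8983/solr/collection1/select?q=description:solr&hl=true&hl.fl=description&hl.fragsize=70&hl.useFastVectorHighlighter=true&hl.boundaryScanner=breakIterator&hl.bs.type=SENTENCE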

Be careful about using explicit characters.  I had an interesting case of 
highlight returns that looked like this:

> This is a highlight
> Here is another highlight
> Yes, another one, etc…

It was a bit maddening trying to figure out why “>” was in the highlight…turned 
out it was XML content and the character boundary clipped the trailing “>” 
based on the boundary rules.

In any case, you should be able to achieve a pretty flexible result depending 
on what you’re really after with the right combination of settings.

Jason

On Feb 19, 2014, at 7:53 AM, Juan Carlos Serrano  wrote:

> Hello everybody,
> 
> I'm using Solr 4.6.1. and I'd like to know if there's a way to determine
> exactly the number of characters of a fragment used in highlights. If I use
> hl.fragsize=70 the length of the fragments that I get is variable (often)
> and I get results of 90 characters length.
> 
> Regards and thanks in advance,
> 
> Juan Carlos



Does SolrCloud Improves Indexing or Slows it down

2014-02-19 Thread Susheel Kumar
Hi,

If we set up a Solr cloud with 3 nodes and then have something like 100+ million 
documents to index, how should we be indexing: a) will the indexing requests be 
going to each machine, assuming we are able to divide the data based on some field, 
or b) should we be sending the requests to one endpoint, and what should that 
endpoint be? 

Can you please clarify? Reading this article, it says indexing may become 
slower. 

http://stackoverflow.com/questions/13500955/does-solrclouds-scalability-extend-to-indexing
  


Please suggest & let me know if you need more info.

Thnx


Re: Slow 95th-percentile

2014-02-19 Thread Allan Carroll
Thanks, Chris. Adding autoWarming to the filter cache made another big  
improvement.

Between increasing the soft commit to 60s, fixing the q:* query, and 
autowarming the filter caches my 95% latencies are down to a very acceptable 
range — almost an order of magnitude improvement. :-)

-Allan

On February 18, 2014 at 5:32:51 PM, Chris Hostetter (hossman_luc...@fucit.org) 
wrote:


: Slowing the soft commits to every 100 seconds helped. The main culprit  
: was a bad query that was coming through every few seconds. Something  
: about the empty fq param and the q=* slowed everything else down.  
:  
: INFO: [event] webapp=/solr path=/select  
: params={start=0&q=*&wt=javabin&fq=&fq=startTime:139283643&version=2}  
: hits=1894 status=0 QTime=6943  

1) if you are using Solr 4.1 or earlier, then q=* is an expensive &  
useless query that doesn't mean what you think it does...  

https://issues.apache.org/jira/browse/SOLR-2996  

2) an empty "fq" doesn't cost anything -- if you use debugQuery=true you  
should see that it's not even included in "parsed_filter_queries" because  
it's totally ignored.  

3) if that "startTime" value changes at some fixed and regular  
interval, that could explain some anomalies if it's normally the  
same and cached, but changes once a day/hour/minute or whatever and is a  
bit slow to cache.  


bottom line: a softCommit is going to re-open a searcher, which is going  
to wipe your caches. if you don't have any (auto)warming configured, that  
means any "fq"s, or "q"s that you run regularly are going to pay the  
price of being "slow" the first time they are run against a new searcher  
is opened.  

If your priority is low response time, you really want to open new  
searchers as infrequently as your SLA for visibility allows, and use  
(auto)warming for those common queries.  
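
For reference, a rough sketch of the relevant solrconfig.xml pieces (cache
classes, sizes, autowarm counts, and the soft-commit interval here are only
placeholders to illustrate the idea):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>

  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>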



-Hoss  
http://www.lucidworks.com/  


Re: Does SolrCloud Improves Indexing or Slows it down

2014-02-19 Thread Kranti Parisa
Why don't you do parallel indexing and then merge everything into one and
replicate that from the master to the slaves in SolrCloud?

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Wed, Feb 19, 2014 at 3:04 PM, Susheel Kumar <
susheel.ku...@thedigitalgroup.net> wrote:

> Hi,
>
> If we setup a solr cloud with 3 nodes and then we have like 100+ million
> documents to index. How we should be indexing a) will the indexing request
> be going to each machine assuming we are able to divide data based on some
> field or b) we should be sending the request to one end point and what
> should be end point?
>
> Can you please clarify and reading this article it says indexing may
> become slower.
>
>
> http://stackoverflow.com/questions/13500955/does-solrclouds-scalability-extend-to-indexing
>
>
> Please suggest & let me know if you need more info.
>
> Thnx
>


Getting fields from query

2014-02-19 Thread Jamie Johnson
Is there a way to get all the fields that are in a particular query?
 Ultimately I'd like to restrict the fields that a user can use to search
so I want to make sure that there aren't any fields in the query that they
should not be allowed to search.


Re: Getting fields from query

2014-02-19 Thread Ahmet Arslan
Hi Jamie,

This may not be a direct answer to your question, but your question reminded me of
edismax's uf parameter.

http://wiki.apache.org/solr/ExtendedDisMax#uf_.28User_Fields.29
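
For reference, a rough sketch of how it is used (host, collection, and field
names are just placeholders):

  http://localhost:8983/solr/collection1/select?defType=edismax&q=title:solr+body:secret&uf=title

Roughly speaking, with uf=title the "body:secret" clause is no longer honored
as a fielded search against the body field; edismax treats it as plain query
text against the qf fields instead.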





On Wednesday, February 19, 2014 11:18 PM, Jamie Johnson  
wrote:
Is there a way to get all the fields that are in a particular query?
Ultimately I'd like to restrict the fields that a user can use to search
so I want to make sure that there aren't any fields in the query that they
should not be allowed to search.



RE: Does SolrCloud Improves Indexing or Slows it down

2014-02-19 Thread Susheel Kumar
Thanks for your reply, Kranti. If we want to shard the index across 3 nodes, will 
the master/slave concept help? We are using Solr 4.6, so should we 
utilize the master/slave concept or move to sharding?

-Original Message-
From: Kranti Parisa [mailto:kranti.par...@gmail.com] 
Sent: Wednesday, February 19, 2014 2:12 PM
To: solr-user@lucene.apache.org
Subject: Re: Does SolrCloud Improves Indexing or Slows it down

Why don't you do parallel indexing and then merge everything into one and 
replicate that from the master to the slaves in SolrCloud?

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Wed, Feb 19, 2014 at 3:04 PM, Susheel Kumar < 
susheel.ku...@thedigitalgroup.net> wrote:

> Hi,
>
> If we setup a solr cloud with 3 nodes and then we have like 100+ 
> million documents to index. How we should be indexing a) will the 
> indexing request be going to each machine assuming we are able to 
> divide data based on some field or b) we should be sending the request 
> to one end point and what should be end point?
>
> Can you please clarify and reading this article it says indexing may 
> become slower.
>
>
> http://stackoverflow.com/questions/13500955/does-solrclouds-scalabilit
> y-extend-to-indexing
>
>
> Please suggest & let me know if you need more info.
>
> Thnx
>


Re: Getting fields from query

2014-02-19 Thread Jamie Johnson
This actually may do what I want, I'll have to check.  Right now we are
using Lucene directly and not Solr for this particular project, but if this
fits the bill we may be able to use just the query parser.


On Wed, Feb 19, 2014 at 4:30 PM, Ahmet Arslan  wrote:

> Hi Jamie,
>
> May not be direct answer to your question but your Q reminded me edismax's
> uf parameter.
>
> http://wiki.apache.org/solr/ExtendedDisMax#uf_.28User_Fields.29
>
>
>
>
>
> On Wednesday, February 19, 2014 11:18 PM, Jamie Johnson 
> wrote:
> Is there a way to get all the fields that are in a particular query?
> Ultimately I'd like to restrict the fields that a user can use to search
> so I want to make sure that there aren't any fields in the query that they
> should not be allowed to search.
>
>


Re: Getting fields from query

2014-02-19 Thread Jamie Johnson
On closer inspection this isn't quite what I'm looking for. The
functionality is spot on, but I'm looking for a way to do this using a
Query Parser in Lucene core, i.e. StandardQueryParser unless folks have
experience with using the Solr query parsers with vanilla lucene?  Though
I'd prefer to stick to just Lucene.

If there was a way to just get all of the fields that were part of a query
I could easily do this myself, but it looks as if there is no standard way
to get a field given a query i.e. TermQuery you need to getTerm, SpanQuery
you get field, BooleanQuery you need to check all the clauses, etc.  It's
definitely possible for me to go through each of these and determine the
proper way to do it, but I'd have thought that this was something already
done as a utility somewhere.  I had thought that the
Query.extractTerms(Set<Term> terms) would have done this, but it doesn't
appear to be implemented for range queries.  Any ideas how I can get this
information?
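
For what it's worth, a rough sketch of walking a Lucene 4.x query tree by hand
to collect field names (only a few Query subclasses are handled here; the rest
would need similar instanceof branches):

  import java.util.Set;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.BooleanClause;
  import org.apache.lucene.search.BooleanQuery;
  import org.apache.lucene.search.MultiTermQuery;
  import org.apache.lucene.search.PhraseQuery;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.TermQuery;
  import org.apache.lucene.search.spans.SpanQuery;

  public final class QueryFieldCollector {
    // Recursively gather the field names referenced by a query.
    public static void collectFields(Query q, Set<String> fields) {
      if (q instanceof TermQuery) {
        fields.add(((TermQuery) q).getTerm().field());
      } else if (q instanceof BooleanQuery) {
        for (BooleanClause clause : ((BooleanQuery) q).clauses()) {
          collectFields(clause.getQuery(), fields);
        }
      } else if (q instanceof PhraseQuery) {
        for (Term t : ((PhraseQuery) q).getTerms()) {
          fields.add(t.field());
        }
      } else if (q instanceof MultiTermQuery) {
        // covers range, prefix, wildcard, fuzzy, etc.
        fields.add(((MultiTermQuery) q).getField());
      } else if (q instanceof SpanQuery) {
        fields.add(((SpanQuery) q).getField());
      }
      // ... other Query subclasses (DisjunctionMaxQuery, boosts, ...) as needed
    }
  }

A caller would pass in e.g. a HashSet<String> and then check the collected
names against the allowed list.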


On Wed, Feb 19, 2014 at 7:37 PM, Jamie Johnson  wrote:

> This actually may do what I want, I'll have to check.  Right now we are
> using Lucene directly and not Solr for this particular project, but if this
> fits the bill we may be able to use just the query parser.
>
>
> On Wed, Feb 19, 2014 at 4:30 PM, Ahmet Arslan  wrote:
>
>> Hi Jamie,
>>
>> May not be direct answer to your question but your Q reminded me
>> edismax's uf parameter.
>>
>> http://wiki.apache.org/solr/ExtendedDisMax#uf_.28User_Fields.29
>>
>>
>>
>>
>>
>> On Wednesday, February 19, 2014 11:18 PM, Jamie Johnson <
>> jej2...@gmail.com> wrote:
>> Is there a way to get all the fields that are in a particular query?
>> Ultimately I'd like to restrict the fields that a user can use to search
>> so I want to make sure that there aren't any fields in the query that they
>> should not be allowed to search.
>>
>>
>


Re: Getting fields from query

2014-02-19 Thread Jack Krupansky
Try asking the question on the Lucene user list - this is the Solr user 
list.


Also, clarify whether you are trying to get the list of fields used in a 
query or trying to limit the fields that can be used in a query. uf does the 
latter, but your latest message suggested the former. You've confused us - 
or at least me!


Look at how "uf" is implemented in Solr and then just replicate that in the 
Lucene standard query parser. The solr standard query parser was originally 
just a subclass of the Lucene standard query parser, but then they diverged 
and Solr now has a copy of the Lucene query parser.


-- Jack Krupansky

-Original Message- 
From: Jamie Johnson

Sent: Wednesday, February 19, 2014 8:05 PM
To: solr-user@lucene.apache.org ; Ahmet Arslan
Subject: Re: Getting fields from query

On closer inspection this isn't quite what I'm looking for. The
functionality is spot on, but I'm looking for a way to do this using a
Query Parser in Lucene core, i.e. StandardQueryParser unless folks have
experience with using the Solr query parsers with vanilla lucene?  Though
I'd prefer to stick to just Lucene.

If there was a way to just get all of the fields that were part of a query
I could easily do this myself, but it looks as if there is no standard way
to get a field given a query i.e. TermQuery you need to getTerm, SpanQuery
you get field, BooleanQuery you need to check all the clauses, etc.  It's
definitely possible for me to go through each of these and determine the
proper way to do it, but I'd have thought that this was something already
done as a utility somewhere.  I had thought that the
Query.extractTerms(Set terms) would have done this, but it doesn't
appear to be implemented for range queries.  Any ideas how I can get this
information?


On Wed, Feb 19, 2014 at 7:37 PM, Jamie Johnson  wrote:


This actually may do what I want, I'll have to check.  Right now we are
using Lucene directly and not Solr for this particular project, but if 
this

fits the bill we may be able to use just the query parser.


On Wed, Feb 19, 2014 at 4:30 PM, Ahmet Arslan  wrote:


Hi Jamie,

May not be direct answer to your question but your Q reminded me
edismax's uf parameter.

http://wiki.apache.org/solr/ExtendedDisMax#uf_.28User_Fields.29





On Wednesday, February 19, 2014 11:18 PM, Jamie Johnson <
jej2...@gmail.com> wrote:
Is there a way to get all the fields that are in a particular query?
Ultimately I'd like to restrict the fields that a user can use to search
so I want to make sure that there aren't any fields in the query that 
they

should not be allowed to search.








Re: block join and atomic updates

2014-02-19 Thread Michael Sokolov
Maybe he can use updateable docvalues (LUCENE-5189)?  I heard that was a 
thing. Has it made its way into Solr in some way?


-Mike


On 2/19/2014 4:23 AM, Mikhail Khludnev wrote:

Just a side note. Sidecar index might be really useful for updating blocked
docs, but it's in experimenting stage iirc.
http://www.lucenerevolution.org/2013/Sidecar-Index-Solr-Components-for-Parallel-Index-Management


On Wed, Feb 19, 2014 at 10:42 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:


Colleagues,
You are definitely right regarding denorm&collapse. It works fine in most
cases, but look at the case more precisely. Moritz needs to update the
parent's fields, if they are copied during denormalization, the price of
update is the same as block join's. With q-time join updates are way
cheaper, but searching time, you know.
19.02.2014 8:15 пользователь "Walter Underwood" 
написал:

  Listen to that advice. Denormalize, denormalize, denormalize. Think about

the results page and work backwards from that. Flat data model.

wunder
Search guy at Infoseek, Inktomi, Verity, Autonomy, Netflix, and Chegg

On Feb 18, 2014, at 7:37 PM, Jason Hellman <
jhell...@innoventsolutions.com> wrote:


Thinking in terms of normalized data in the context of a Lucene index

is dangerous.  It is not a relational data model technology, and the join
behaviors available to you have limited use.  Each approach requires
compromises that are likely impermissible for certain uses cases.

If it is at all reasonable to consider you will likely be best served

de-normalizing the data.  Of course, your specific details may prove an
exception to this rule...but generally approach works very well.

On Feb 18, 2014, at 4:19 AM, Mikhail Khludnev <

mkhlud...@griddynamics.com> wrote:

absolutely.


On Tue, Feb 18, 2014 at 1:20 PM,  wrote:


But isn't query time join much slower when it comes to a large amount

of

documents?

Zitat von Mikhail Khludnev :


Hello,

It sounds like you need to switch to query time join.
15.02.2014 21:57 пользователь  написал:

Any suggestions?


Zitat von m...@preselect-media.com:

Yonik Seeley :


On Thu, Feb 13, 2014 at 8:25 AM,   wrote:

Is there any workaround to perform atomic updates on blocks or do

I

have to
re-index the parent document and all its children always again

if I

want to
update a field?



The latter, unfortunately.



Is there any plan to change this behavior in near future?

So, I'm thinking of alternatives without loosing the benefit of

block

join.
I try to explain an idea I just thought about:

Let's say I have a parent document A with a number of fields I

want to

update regularly and a number of child documents AC_1 ... AC_n

which are

only indexed once and aren't going to change anymore.
So, if I index A and AC_* in a block and I update A, the block is

gone.

But if I create an additional document AF which only contains

something

like an foreign key to A and indexing AF + AC_* as a block (not A

+ AC_*

anymore), could I perform a {!parent ... } query on AF + AC_* and

make

an
join from the results to get A?
Does this makes any sense and is it even possible? ;-)
And if it's possible, how can I do it?

Thanks,
- Moritz











--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics




--
Walter Underwood
wun...@wunderwood.org










Re: Getting fields from query

2014-02-19 Thread Jamie Johnson
Thanks Jack, I ultimately want to limit the fields, but I'd take getting them if that
was available.  I'll post to the Lucene list, though.
On Feb 19, 2014 8:22 PM, "Jack Krupansky"  wrote:

> Try asking the question on the Lucene user list - this is the Solr user
> list.
>
> Also, clarify whether you are trying to get the list of fields used in a
> query or trying to limit the fields that can be used in a query. uf does
> the latter, but your latest message suggested the former. You're confused
> us - or at least me!
>
> Look at how "uf" is implemented in Solr and then just replicate that in
> the Lucene standard query parser. The solr standard query parser was
> originally just a subclass of the Lucene standard query parser, but then
> they diverged and Solr now has a copy of the Lucene query parser.
>
> -- Jack Krupansky
>
> -Original Message- From: Jamie Johnson
> Sent: Wednesday, February 19, 2014 8:05 PM
> To: solr-user@lucene.apache.org ; Ahmet Arslan
> Subject: Re: Getting fields from query
>
> On closer inspection this isn't quite what I'm looking for. The
> functionality is spot on, but I'm looking for a way to do this using a
> Query Parser in Lucene core, i.e. StandardQueryParser unless folks have
> experience with using the Solr query parsers with vanilla lucene?  Though
> I'd prefer to stick to just Lucene.
>
> If there was a way to just get all of the fields that were part of a query
> I could easily do this myself, but it looks as if there is no standard way
> to get a field given a query i.e. TermQuery you need to getTerm, SpanQuery
> you get field, BooleanQuery you need to check all the clauses, etc.  It's
> definitely possible for me to go through each of these and determine the
> proper way to do it, but I'd have thought that this was something already
> done as a utility somewhere.  I had thought that the
> Query.extractTerms(Set terms) would have done this, but it doesn't
> appear to be implemented for range queries.  Any ideas how I can get this
> information?
>
>
> On Wed, Feb 19, 2014 at 7:37 PM, Jamie Johnson  wrote:
>
>  This actually may do what I want, I'll have to check.  Right now we are
>> using Lucene directly and not Solr for this particular project, but if
>> this
>> fits the bill we may be able to use just the query parser.
>>
>>
>> On Wed, Feb 19, 2014 at 4:30 PM, Ahmet Arslan  wrote:
>>
>>  Hi Jamie,
>>>
>>> May not be direct answer to your question but your Q reminded me
>>> edismax's uf parameter.
>>>
>>> http://wiki.apache.org/solr/ExtendedDisMax#uf_.28User_Fields.29
>>>
>>>
>>>
>>>
>>>
>>> On Wednesday, February 19, 2014 11:18 PM, Jamie Johnson <
>>> jej2...@gmail.com> wrote:
>>> Is there a way to get all the fields that are in a particular query?
>>> Ultimately I'd like to restrict the fields that a user can use to search
>>> so I want to make sure that there aren't any fields in the query that
>>> they
>>> should not be allowed to search.
>>>
>>>
>>>
>>
>


Re: Solr Hot Cpu and high load

2014-02-19 Thread Nitin Sharma
Thanks, Erick. I will try that




On Sun, Feb 16, 2014 at 5:07 PM, Erick Erickson wrote:

> Stored fields are what the Solr DocumentCache in solrconfig.xml
> is all about.
>
> My general feeling is that stored fields are mostly irrelevant for
> search speed, especially if lazy-loading is enabled. The only time
> stored fields come in to play is when assembling the final result
> list, i.e. the 10 or 20 documents that you return. That does imply
> disk I/O, and if you have massive fields theres also decompression
> to add to the CPU load.
>
> So, as usual, "it depends". Try measuring where you restrict the returned
> fields to whatever your  field is for one set of tests, then
> try returning _everything_ for another?
>
> Best,
> Erick
>
>
> On Sun, Feb 16, 2014 at 12:18 PM, Nitin Sharma
> wrote:
>
> > Thanks Tri
> >
> >
> > *a. Are you docs distributed evenly across shards: number of docs and
> size
> > of the shards*
> > >> Yes the size of all the shards is equal (an ignorable delta in the
> order
> > of KB) and so are the # of docs
> >
> > *b. Is your test client querying all nodes, or all the queries go to
> those
> > 2 busy nodes?*
> > *>> *Yes all nodes are receiving exactly the same amount of queries
> >
> >
> > I have one more question. Do stored fields have significant impact on
> > > performance of Solr queries? Is having 50% of the fields stored (out of 100
> > > fields) significantly worse than having 20% of the fields stored?
> > (signficantly == orders of 100s of milliseconds assuming all fields are
> of
> > the same size and type)
> >
> > How are stored fields retrieved in general (always from disk or loaded
> into
> > memory in the first query and then going forward read from memory?)
> >
> > Thanks
> > Nitin
> >
> >
> >
> > On Fri, Feb 14, 2014 at 11:45 AM, Tri Cao  wrote:
> >
> > > 1. Yes, that's the right way to go, well, in theory at least :)
> > > 2. Yes, queries are alway fanned to all shards and will be as slow as
> the
> > > slowest shard. When I looked into
> > > Solr distributed querying implementation a few months back, the support
> > > for graceful degradation for things
> > > like network failures and slow shards was not there yet.
> > > 3. I doubt mmap settings would impact your read-only load, and it seems
> > > you can easily
> > > fit your index in RAM. You could try to warm the file cache to make
> sure
> > > with "cat $sorl_dir > /dev/null".
> > >
> > > It's odd that only 2 nodes are at 100% in your set up. I would check a
> > > couple of things:
> > > a. Are you docs distributed evenly across shards: number of docs and
> size
> > > of the shards
> > > b. Is your test client querying all nodes, or all the queries go to
> those
> > > 2 busy nodes?
> > >
> > > Regards,
> > > Tri
> > >
> > > On Feb 14, 2014, at 10:52 AM, Nitin Sharma <
> nitin.sha...@bloomreach.com>
> > > wrote:
> > >
> > > Hello folks
> > >
> > > We are currently using solrcloud 4.3.1. We have 8 node solrcloud
> cluster
> > > with 32 cores, 60Gb of ram and SSDs.We are using zk to manage the
> > > solrconfig used by our collections
> > >
> > > We have many collections and some of them are relatively very large
> > > compared to the other. The size of the shard of these big collections
> are
> > > in the order of Gigabytes.We decided to split the bigger collection
> > evenly
> > > across all nodes (8 shards and 2 replicas) with maxNumShards > 1.
> > >
> > > We did a test with a read load only on one big collection and we still
> > see
> > > only 2 nodes running 100% CPU and the rest are blazing through the
> > queries
> > > way faster (under 30% cpu). [Despite all of them being sharded across
> all
> > > nodes]
> > >
> > > I checked the JVM usage and found that none of the pools have high
> > > utilization (except Survivor space which is 100%). The GC cycles are in
> > > the order of ms and mostly doing scavenge. Mark and sweep occurs once
> > every
> > > 30 minutes
> > >
> > > Few questions:
> > >
> > > 1. Sharding all collections (small and large) across all nodes evenly
> > >
> > > distributes the load and makes the system characteristics of all
> machines
> > > similar. Is this a recommended way to do it?
> > > 2. Solr Cloud does a distributed query by default. So if a node is at
> > >
> > > 100% CPU does it slow down the response time for the other nodes
> waiting
> > > for this query? (or does it have a timeout if it cannot get a response
> > from
> > > a node within x seconds?)
> > > 3. Our collections use Mmap directory but i specifically haven't
> enabled
> > >
> > > anything related to mmaps (locked pages under ulimit). Does it adversely
> > > affect performance? Or can it lock pages even without this?
> > >
> > > Thanks a lot in advance.
> > > Nitin
> > >
> > >
> >
> >
> > --
> > - N
> >
>



-- 
- N


Re: Caching Solr boost functions?

2014-02-19 Thread Jason Hellman
Gregg,

The QueryResultCache caches a sorted int array of results matching a query. 
This should overlap very nicely with your desired behavior, as a hit in this 
cache will not perform a Lucene query, nor need to recalculate scores.  

Now, ‘for the life of the Searcher’ is the trick here.  You can size your cache 
large enough to ensure it can fit every possible query, but at some point this 
is untenable.  I would argue that high volatility of query parameters would 
invalidate the need for caching anyway, but that’s clearly debatable.  
Nevertheless, this should work admirably well to solve your needs.

Jason

On Feb 18, 2014, at 11:32 AM, Gregg Donovan  wrote:

> We're testing out a new handler that uses edismax with three different
> "boost" functions. One has a random() function in it, so is not very
> cacheable, but the other two boost functions do not change from query to
> query.
> 
> I'd like to tell Solr to cache those boost queries for the life of the
> Searcher so they don't get recomputed every time. Is there any way to do
> that out of the box?
> 
> In a different custom QParser we have we wrote a CachingValueSource that
> wrapped a ValueSource with a custom ValueSource cache. Would it make sense
> to implement that as a standard Solr function so that one could do:
> 
> boost=cache(expensiveFunctionQuery())
> 
> Thanks.
> 
> --Gregg



Re: Solr Autosuggest - Strange issue with leading numbers in query

2014-02-19 Thread Jason Hellman
Here’s a rather obvious question:  have you rebuilt your spell index recently?  
Is it possible the offending numbers snuck into the spell dictionary?  The 
terms component will show you what’s in your current, searchable field…but not 
the dictionary.

If my memory serves correctly, with collate=true this would allow for such 
behavior to occur, especially with onlyMorePopular set to false (which would 
ensure the resulting collation has a query count greater than the current 
query).  Have you flipped onlyMorePopular to true to confirm?
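
For reference, a rough sketch of a request that rebuilds the dictionary and
tests that flag (host, collection, handler name, and query are placeholders):

  http://localhost:8983/solr/collection1/spell?q=9000k&spellcheck=true&spellcheck.build=true&spellcheck.collate=true&spellcheck.onlyMorePopular=true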




On Feb 18, 2014, at 10:16 AM, bbi123  wrote:

> Thanks a lot for your response Erik.
> 
> I was trying to find if I have any suggestion starting with numbers using
> terms component but I couldn't find any.. Its very strange!!!
> 
> Anyways, thanks again for your response.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Autosuggest-Strange-issue-with-leading-numbers-in-query-tp4116751p4118072.html
> Sent from the Solr - User mailing list archive at Nabble.com.