Slow query response.

2015-12-17 Thread Modassar Ather
Hi,

I have a field f which is defined as follows.


Solr-5.2.1 is used. The index is spread across 12 shards (no replica) and
the index size on each node is around 100 GB.

When I search for 50 thousand values (ORed) in the field f, it takes
around 45 to 55 seconds.
Per my understanding, this is too slow. Kindly share your thoughts on this
behavior and provide your suggestions.

Thanks,
Modassar


Re: Slow query response.

2015-12-21 Thread Modassar Ather
Thanks Jack for your response.

The users of our application can enter a list of ids, which the UI caps at
50k. All the ids are valid and match documents. We do faceting, grouping,
etc. on the result set of up to 50k documents.
I checked and found that the query is not very resource intensive. It is
not eating up a lot of CPU, I/O or memory.

Regards,
Modassar

On Thu, Dec 17, 2015 at 8:44 PM, Jack Krupansky 
wrote:

> A single query with tens of thousands of terms is very clearly a misuse of
> Solr. If it happens to work at all, consider yourself lucky. Are you using
> a standard Solr query parser or the terms query parser that lets you write
> a raw list of terms to OR?
>
> Are your nodes CPU-bound or I/O-bound during those 50-second intervals? My
> bet is that your index does not fit fully in memory, causing lots of I/O to
> repeatedly page in portions of the index and probably additional CPU usage
> as well.
>
> How many rows are you returning on each query? Are you using all these
> terms just to filter a smaller query or to return a large bulk of
> documents?
>
>
> -- Jack Krupansky
>
> On Thu, Dec 17, 2015 at 7:01 AM, Modassar Ather 
> wrote:
>
> > Hi,
> >
> > I have a field f which is defined as follows.
> >  > omitNorms="true"/>
> >
> > Solr-5.2.1 is used. The index is spread across 12 shards (no replica) and
> > the index size on each node is around 100 GB.
> >
> > When I search for 50 thousand values (ORed) in the field f, it takes
> > around 45 to 55 seconds.
> > Per my understanding, this is too slow. Kindly share your thoughts on this
> > behavior and provide your suggestions.
> >
> > Thanks,
> > Modassar
> >
>
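For illustration, a minimal sketch of the terms query parser Jack mentions
(the {!terms} parser, available since Solr 4.10); the field name f and the
id values are placeholders:

q=*:*&fq={!terms f=f}id1,id2,id3,...,id50000

Because {!terms} builds a single set-membership filter instead of tens of
thousands of Boolean clauses, it avoids the per-clause overhead and the
maxBooleanClauses limit, and the resulting filter is cacheable in the
filterCache.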


Re: Best practices on monitoring Solr

2015-12-22 Thread Modassar Ather
Last week our Solr search was unresponsive and we needed to reboot the
server, but we only found out after a customer complained about it.
What's the best way to monitor that search is working?

It may not be the best way, but you can write a class which keeps checking
the status of all the nodes of the SolrCloud cluster (assuming you are using
SolrCloud) at a certain interval.
If a node's status is down, then you can take the decision to restart the
Solr cluster. The restart of SolrCloud can be automated using a script.
If the state of a node is recovering, then wait for some time and check
whether it has recovered and has an active state. If it has not recovered
even after waiting for some time, then take the decision to restart the
SolrCloud instances.
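
A minimal sketch of such a checker in Java, polling Solr's standard ping
handler over HTTP; the URL (host, port, collection name) is a placeholder
assumption, and it assumes the default /admin/ping handler is configured:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class SolrPingCheck {

    // Placeholder endpoint; adjust host, port and collection name.
    private static final String PING_URL =
        "http://localhost:8983/solr/mycollection/admin/ping?wt=json";

    static boolean isHealthy() {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) new URL(PING_URL).openConnection();
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            if (conn.getResponseCode() != 200) {
                return false;
            }
            StringBuilder body = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line);
                }
            }
            // The ping handler reports "status":"OK" when the core is up.
            return body.toString().contains("\"status\":\"OK\"");
        } catch (Exception e) {
            return false; // unreachable or timed out counts as down
        }
    }

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            if (!isHealthy()) {
                System.err.println("Solr ping failed - alert or restart here");
            }
            Thread.sleep(60000L); // check once a minute
        }
    }
}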

Hope this helps.

Regards,
Modassar



On Wed, Dec 23, 2015 at 12:45 AM, Tiwari, Shailendra <
shailendra.tiw...@macmillan.com> wrote:

> Hi,
>
> Last week our Solr search was unresponsive and we needed to reboot the
> server, but we only found out after a customer complained about it.
> What's the best way to monitor that search is working?
> We can always add Gomez alerts from UI.
> What are the best practices?
>
> Thanks
>
> Shail


Query behavior difference.

2016-01-04 Thread Modassar Ather
Hi,

Kindly help me understand how relevance ranking will differ in the following
searches.

query : fl:network
query : fl:networ*

What I am observing is that the results returned differ, in that the top
documents returned for q=fl:network are not present in the top results of
q=fl:networ*.
For example, for q=fl:network I get top documents having around 20
occurrences of network, whereas the top result of q=fl:networ* has only a
couple of occurrences of network.
I am aware that the underlying normalization process participates in the
relevance ranking of documents, but I am not able to understand such a
difference in the ranking of results for the two queries.

Thanks,
Modassar


solr-user@lucene.apache.org

2016-01-05 Thread Modassar Ather
Hi,

q=fl1:net*&facet.field=fl&facet.limit=50&stats=true&stats.field={!cardinality=1.0}fl
is returning a cardinality of around 15 million, and it takes around 4
minutes. Similar response times are seen with different queries which yield
a high cardinality. Kindly note that cardinality=1.0 is the desired goal.
Here in the above example fl1 is a text field, whereas fl is a
docValues-enabled, non-stored, non-indexed field.
Kindly let me know whether such a response time is expected or I am missing
something about this feature in my query.

Thanks,
Modassar


Re: Query behavior difference.

2016-01-05 Thread Modassar Ather
Thanks for your response Ahmet.

Best,
Modassar

On Mon, Jan 4, 2016 at 5:07 PM, Ahmet Arslan 
wrote:

> Hi,
>
> I think wildcard queries like fl:networ* are re-written into a Constant
> Score Query.
> fl=*,score should return the same score for all documents that are retrieved.
>
> Ahmet
>
>
>
> On Monday, January 4, 2016 12:22 PM, Modassar Ather <
> modather1...@gmail.com> wrote:
> Hi,
>
> Kindly help me understand how will relevance ranking differ int following
> searches.
>
> query : fl:network
> query : fl:networ*
>
> What I am observing that the results returned are different in both of them
> in a way that the top documents returned for q=fl:network is not present in
> the top results of q=fl:networ*.
> For example for q=fl:network I am getting top documents having around 20
> occurrence of network whereas the top result of q=fl:networ* has only
> couple of occurrence of network.
> I am aware of the underlying normalization process participation in
> relevance ranking of documents but not able to understand such a difference
> in the ranking of result for the queries.
>
> Thanks,
> Modassar
>


Re: Query behavior difference.

2016-01-06 Thread Modassar Ather
Please help me understand why queries like wildcard, prefix and a few others
are rewritten into a constant score query.
Why are the scoring factors not taken into consideration for such queries?

Please correct me if I am wrong that this behavior is per query type,
irrespective of the parser used.

Thanks,
Modassar

On Wed, Jan 6, 2016 at 12:56 PM, Modassar Ather 
wrote:

> Thanks for your response Ahmet.
>
> Best,
> Modassar
>
> On Mon, Jan 4, 2016 at 5:07 PM, Ahmet Arslan 
> wrote:
>
>> Hi,
>>
>> I think wildcard queries fl:networ* are re-written into Constant Score
>> Query.
>> fl=*,score should returns same score for all documents that are retrieved.
>>
>> Ahmet
>>
>>
>>
>> On Monday, January 4, 2016 12:22 PM, Modassar Ather <
>> modather1...@gmail.com> wrote:
>> Hi,
>>
>> Kindly help me understand how will relevance ranking differ int following
>> searches.
>>
>> query : fl:network
>> query : fl:networ*
>>
>> What I am observing that the results returned are different in both of
>> them
>> in a way that the top documents returned for q=fl:network is not present
>> in
>> the top results of q=fl:networ*.
>> For example for q=fl:network I am getting top documents having around 20
>> occurrence of network whereas the top result of q=fl:networ* has only
>> couple of occurrence of network.
>> I am aware of the underlying normalization process participation in
>> relevance ranking of documents but not able to understand such a
>> difference
>> in the ranking of result for the queries.
>>
>> Thanks,
>> Modassar
>>
>
>


Re: Query behavior difference.

2016-01-06 Thread Modassar Ather
Thanks for your responses.

Best,
Modassar

On Wed, Jan 6, 2016 at 9:27 PM, Jack Krupansky 
wrote:

> The motivation for the constant-score rewrite is simply performance. As per
> the Javadoc:
>
> "*This method is faster than the BooleanQuery rewrite methods when the
> number of matched terms or matched documents is non-trivial. Also, it will
> never hit an errant BooleanQuery.TooManyClauses exception.*"
>
> So that's a second reason - to avoid the max clause count limitation of
> Boolean Query.
>
> See:
>
> https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/search/MultiTermQuery.html#CONSTANT_SCORE_REWRITE
>
> https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/search/WildcardQuery.html
>
>
> -- Jack Krupansky
>
> On Wed, Jan 6, 2016 at 6:07 AM, Modassar Ather 
> wrote:
>
> > Please help me understand why queries like wildcard, prefix and few
> others
> > are re-written into constant score query?
> > Why the scoring factors are not taken into consideration in such queries?
> >
> > Please correct me if I am wrong that this behavior is per the query type
> > irrespective of the parser used.
> >
> > Thanks,
> > Modassar
> >
> > On Wed, Jan 6, 2016 at 12:56 PM, Modassar Ather 
> > wrote:
> >
> > > Thanks for your response Ahmet.
> > >
> > > Best,
> > > Modassar
> > >
> > > On Mon, Jan 4, 2016 at 5:07 PM, Ahmet Arslan  >
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> I think wildcard queries fl:networ* are re-written into Constant Score
> > >> Query.
> > >> fl=*,score should returns same score for all documents that are
> > retrieved.
> > >>
> > >> Ahmet
> > >>
> > >>
> > >>
> > >> On Monday, January 4, 2016 12:22 PM, Modassar Ather <
> > >> modather1...@gmail.com> wrote:
> > >> Hi,
> > >>
> > >> Kindly help me understand how will relevance ranking differ int
> > following
> > >> searches.
> > >>
> > >> query : fl:network
> > >> query : fl:networ*
> > >>
> > >> What I am observing that the results returned are different in both of
> > >> them
> > >> in a way that the top documents returned for q=fl:network is not
> present
> > >> in
> > >> the top results of q=fl:networ*.
> > >> For example for q=fl:network I am getting top documents having around
> 20
> > >> occurrence of network whereas the top result of q=fl:networ* has only
> > >> couple of occurrence of network.
> > >> I am aware of the underlying normalization process participation in
> > >> relevance ranking of documents but not able to understand such a
> > >> difference
> > >> in the ranking of result for the queries.
> > >>
> > >> Thanks,
> > >> Modassar
> > >>
> > >
> > >
> >
>
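For illustration, a sketch of how the rewrite method can be changed at the
Lucene level if scored wildcard matches are really needed; this is not a
plain Solr request parameter, and the constant name is an assumption that
varies across Lucene releases (SCORING_BOOLEAN_QUERY_REWRITE in older ones):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;

public class ScoredWildcardExample {
    public static Query scoredWildcard(String field, String pattern) {
        WildcardQuery wq = new WildcardQuery(new Term(field, pattern));
        // By default the query uses CONSTANT_SCORE_REWRITE, so every match
        // gets the same score. A scoring Boolean rewrite expands each
        // matching term into a clause and scores them normally, but broad
        // patterns can then hit BooleanQuery.TooManyClauses.
        wq.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_REWRITE);
        return wq;
    }
}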


solr-user@lucene.apache.org

2016-01-08 Thread Modassar Ather
Hi,

Any input will be helpful.

Thanks,
Modassar

On Wed, Jan 6, 2016 at 12:39 PM, Modassar Ather 
wrote:

> Hi,
>
>
> *q=fl1:net*&facet.field=fl&facet.limit=50&stats=true&stats.field={!cardinality=1.0}fl*
> is returning cardinality around 15 million. It is taking around 4 minutes.
> Similar response time is seen with different queries which yields high
> cardinality. Kindly note that the cardinality=1.0 is the desired goal.
> Here in the above example the fl1 is a text field whereas fl is a docValue
> enabled, non-stored, non-indexed field.
> Kindly let me know if such response time is expected or I am missing
> something about this feature in my query.
>
> Thanks,
> Modassar
>


solr-user@lucene.apache.org

2016-01-08 Thread Modassar Ather
Hi Toke,

Is this a single shard or multiple?
It is a 12-shard cluster without replicas, with around 90+ GB on each shard.

Thanks for sharing the link. I will look into that.
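
For reference, a minimal sketch of the 'unique' aggregation in the JSON
Facet API that Toke points to, expressed as request parameters in the style
of the query above:

q=fl1:net*&rows=0&json.facet={ "fl_unique" : "unique(fl)" }

Note that in a multi-shard setup the merged unique count is an estimate once
per-shard cardinality gets large, much as {!cardinality=1.0} is itself an
approximation.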

Regards,
Modassar

On Fri, Jan 8, 2016 at 4:28 PM, Toke Eskildsen 
wrote:

> On Wed, 2016-01-06 at 12:39 +0530, Modassar Ather wrote:
> >
> *q=fl1:net*&facet.field=fl&facet.limit=50&stats=true&stats.field={!cardinality=1.0}fl*
> > is returning cardinality around 15 million. It is taking around 4
> minutes.
>
> Is this a single shard or multiple?
>
> Anyway, you might have better luck trying the 'unique' request in JSON
> faceting:
> https://cwiki.apache.org/confluence/display/solr/Faceted+Search
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>


Position increment in WordDelimiterFilter.

2016-01-14 Thread Modassar Ather
Hi,

I have the following definition for WordDelimiterFilter.

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>

The analysis of 3d shows the following four tokens and their positions.

token   position
3d      1
3       1
3d      1
d       2

Please help me understand why d is at position 2. Should it not also be at
position 1?
Is it a bug, and if not, is there any attribute which I can use to restrict
the position increment?

Thanks,
Modassar


Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Modassar Ather
Thanks for your responses.

Why do you think it should be at position 1? In that case searching for "3
d" would not find anything. Is it what you expect?
During search, some of the results returned are not wanted. The following is
an example.
Search query: "3d image"
Search results with 3-d image/3 d image/1d image are also returned. This is
happening because of the position increment.
Another example is "1d obj*" returning results related to "d-object". This
can bring up a completely different search item: here the token d matches
the d of d-object, as that term is split the same way.
The position increment will also cause the "3d image" search to fail on a
document containing "3d image", as the "d" comes at position 2.

1) can you confirm if you've made a typo while typing out your results?
I have confirmed the position attribute displayed on the analysis page and
found there is no typo.
2 ) you'll get the d and 3d as 2 since they're the 2nd token once 3d is
split.
Irrespective of that, what I want to understand is why there is an increment
in position. Should not all the terms be at the same position, as they are
yielded from the same term/token?

Best,
Modassar

On Thu, Jan 14, 2016 at 3:25 PM, Binoy Dalal  wrote:

> I've tried out your settings and here's what I get:
> 3d 1
> 3   1
> d   2
> 3d 2
>
> 1) can you confirm if you've made a typo while typing out your results?
> 2 ) you'll get the d and 3d as 2 since they're the 2nd token once 3d is
> split.
> Try the same thing with d3 and you'll get 3 and d3 at position 2
>
> On Thu, 14 Jan 2016, 15:11 Emir Arnautovic 
> wrote:
>
> > Hi Modassar,
> > Why do you think it should be at position 1? In that case searching for
> > "3 d" would not find anything. Is it what you expect?
> >
> > Thanks,
> > Emir
> >
> > On 14.01.2016 10:15, Modassar Ather wrote:
> > > Hi,
> > >
> > > I have following definition for WordDelimiterFilter.
> > >
> > > <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > > catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
> > >
> > > The analysis of 3d shows following four tokens and their positions.
> > >
> > > token position
> > > 3d 1
> > > 3   1
> > > 3d 1
> > > d   2
> > >
> > > Please help me understand why d is at 2? Should not it also be at
> > position
> > > 1.
> > > Is it a bug and if not is there any attribute which I can use to
> restrict
> > > the position increment?
> > >
> > > Thanks,
> > > Modassar
> > >
> >
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> > --
> Regards,
> Binoy Dalal
>


Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Modassar Ather
Thanks for your responses.

It seems to me that you don't want to split on numbers.
It is not with number only. Even if you try to analyze WiFi it will create
4 token one of which will be at position 2. So basically the issue is with
position increment which causes few of the queries behave unexpectedly.

Which release of Solr are you using?
I am using Lucene/Solr-5.4.0.

Best,
Modassar

On Thu, Jan 14, 2016 at 9:44 PM, Jack Krupansky 
wrote:

> Which release of Solr are you using? Last year (or so) there was a Lucene
> change that had the effect of keeping all terms for WDF at the same
> position. There was also some discussion about whether this was either a
> bug or a bug fix, but I don't recall any resolution.
>
> -- Jack Krupansky
>
> On Thu, Jan 14, 2016 at 4:15 AM, Modassar Ather 
> wrote:
>
> > Hi,
> >
> > I have following definition for WordDelimiterFilter.
> >
> > <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
> >
> > The analysis of 3d shows following four tokens and their positions.
> >
> > token position
> > 3d 1
> > 3   1
> > 3d 1
> > d   2
> >
> > Please help me understand why d is at 2? Should not it also be at
> position
> > 1.
> > Is it a bug and if not is there any attribute which I can use to restrict
> > the position increment?
> >
> > Thanks,
> > Modassar
> >
>


Re: Position increment in WordDelimiterFilter.

2016-01-15 Thread Modassar Ather
Are you saying that WiFi Wi-Fi and Wi Fi should not match each other?
I am using WhitespaceTokenizer in my analysis chain, so wi fi becomes two
different tokens. Please refer to my examples given in the previous mail
about the issues faced.
Wi Fi are two terms which will match, but what happens if content having
*WiFi device* is searched with *"WiFi device"*? It will not match, as there
is a position increment by WordDelimiterFilter for WiFi.
"WiFi device"~1 will match, which is confusing: there is no gap in the
content, so why is a slop required?

Why do you use WordDelimiterFilter? Can you give us few examples where it
is useful?
It is useful when a term like *lucene-search* in *lucene-search
documentation* is indexed with WordDelimiterFilter and broken into the two
terms lucene and search; it then helps to get the documents containing it
for queries like lucene documentation or search documentation.

Best,
Modassar

On Fri, Jan 15, 2016 at 2:14 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Modassar,
> Are you saying that WiFi Wi-Fi and Wi Fi should not match each other? Why
> do you use WordDelimiterFilter? Can you give us few examples where it is
> useful?
>
> Thanks,
> Emir
>
>
> On 15.01.2016 05:13, Modassar Ather wrote:
>
>> Thanks for your responses.
>>
>> It seems to me that you don't want to split on numbers.
>> It is not with number only. Even if you try to analyze WiFi it will create
>> 4 token one of which will be at position 2. So basically the issue is with
>> position increment which causes few of the queries behave unexpectedly.
>>
>> Which release of Solr are you using?
>> I am using Lucene/Solr-5.4.0.
>>
>> Best,
>> Modassar
>>
>> On Thu, Jan 14, 2016 at 9:44 PM, Jack Krupansky > >
>> wrote:
>>
>> Which release of Solr are you using? Last year (or so) there was a Lucene
>>> change that had the effect of keeping all terms for WDF at the same
>>> position. There was also some discussion about whether this was either a
>>> bug or a bug fix, but I don't recall any resolution.
>>>
>>> -- Jack Krupansky
>>>
>>> On Thu, Jan 14, 2016 at 4:15 AM, Modassar Ather 
>>> wrote:
>>>
>>> Hi,
>>>>
>>>> I have following definition for WordDelimiterFilter.
>>>>
>>>> >>> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>>>> catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
>>>>
>>>> The analysis of 3d shows following four tokens and their positions.
>>>>
>>>> token position
>>>> 3d 1
>>>> 3   1
>>>> 3d 1
>>>> d   2
>>>>
>>>> Please help me understand why d is at 2? Should not it also be at
>>>>
>>> position
>>>
>>>> 1.
>>>> Is it a bug and if not is there any attribute which I can use to
>>>> restrict
>>>> the position increment?
>>>>
>>>> Thanks,
>>>> Modassar
>>>>
>>>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


Re: Position increment in WordDelimiterFilter.

2016-01-18 Thread Modassar Ather
Can you please send us tokens you get (and positions) when you analyze
*WiFi device*

Tokens generated and their respective positions.

WiFi     1
Wi       1
WiFi     1
Fi       2
device   3

Best,
Modassar

On Fri, Jan 15, 2016 at 6:25 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Can you please send us tokens you get (and positions) when you analyze
> *WiFi device*
>
> On 15.01.2016 13:15, Modassar Ather wrote:
>
>> Are you saying that WiFi Wi-Fi and Wi Fi should not match each other?
>> I am using WhiteSpaceTokenizer in my analysis chain so wi fi becomes two
>> different token. Please refer to my examples given in previous mail about
>> the issues faced.
>> Wi Fi are two term which will match but what happens if for a content
>> having *WiFi device* is searched with *"WiFi device"*. It will not match
>> as
>> there is a position increment by WordDelimiterFilter for WiFi.
>> "WiFi device"~1 will match which is confusing that there is no gap in the
>> content why a slop is required.
>>
>> Why do you use WordDelimiterFilter? Can you give us few examples where it
>> is useful?
>> It is useful when a word like* lucene-search documentation *is indexed
>> with
>>
>> WordDelimiterFilter and it is broken in two terms like lucene and search
>> then it will be helpful to get the documents containing it for queries
>> like
>> lucene documentation or search documentation.
>>
>> Best,
>> Modassar
>>
>> On Fri, Jan 15, 2016 at 2:14 PM, Emir Arnautovic <
>> emir.arnauto...@sematext.com> wrote:
>>
>> Modassar,
>>> Are you saying that WiFi Wi-Fi and Wi Fi should not match each other? Why
>>> do you use WordDelimiterFilter? Can you give us few examples where it is
>>> useful?
>>>
>>> Thanks,
>>> Emir
>>>
>>>
>>> On 15.01.2016 05:13, Modassar Ather wrote:
>>>
>>> Thanks for your responses.
>>>>
>>>> It seems to me that you don't want to split on numbers.
>>>> It is not with number only. Even if you try to analyze WiFi it will
>>>> create
>>>> 4 token one of which will be at position 2. So basically the issue is
>>>> with
>>>> position increment which causes few of the queries behave unexpectedly.
>>>>
>>>> Which release of Solr are you using?
>>>> I am using Lucene/Solr-5.4.0.
>>>>
>>>> Best,
>>>> Modassar
>>>>
>>>> On Thu, Jan 14, 2016 at 9:44 PM, Jack Krupansky <
>>>> jack.krupan...@gmail.com
>>>> wrote:
>>>>
>>>> Which release of Solr are you using? Last year (or so) there was a
>>>> Lucene
>>>>
>>>>> change that had the effect of keeping all terms for WDF at the same
>>>>> position. There was also some discussion about whether this was either
>>>>> a
>>>>> bug or a bug fix, but I don't recall any resolution.
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Thu, Jan 14, 2016 at 4:15 AM, Modassar Ather <
>>>>> modather1...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>> I have following definition for WordDelimiterFilter.
>>>>>>
>>>>>> >>>>> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>>>>>> catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
>>>>>>
>>>>>> The analysis of 3d shows following four tokens and their positions.
>>>>>>
>>>>>> token position
>>>>>> 3d 1
>>>>>> 3   1
>>>>>> 3d 1
>>>>>> d   2
>>>>>>
>>>>>> Please help me understand why d is at 2? Should not it also be at
>>>>>>
>>>>>> position
>>>>>
>>>>> 1.
>>>>>> Is it a bug and if not is there any attribute which I can use to
>>>>>> restrict
>>>>>> the position increment?
>>>>>>
>>>>>> Thanks,
>>>>>> Modassar
>>>>>>
>>>>>>
>>>>>> --
>>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>>> Solr & Elasticsearch Support * http://sematext.com/
>>>
>>>
>>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


Re: Position increment in WordDelimiterFilter.

2016-01-18 Thread Modassar Ather
Thanks Shawn for your explanation.

Everything else about the analysis looks
correct to me, and the positions you see are needed for a phrase query
to work correctly.

Here the "WiFi device" will not be searched as there is a gap in between
because Fi is at position 2. The document containing WiFi device will be
seen as a phrase with no word in between hence it should match phrase "WiFi
device" but it will not whereas "WiFi device"~1 will matched.

Best,
Modassar

On Mon, Jan 18, 2016 at 7:57 PM, Shawn Heisey  wrote:

> On 1/18/2016 6:21 AM, Modassar Ather wrote:
> > Can you please send us tokens you get (and positions) when you analyze
> > *WiFi device*
> >
> > Tokens generated and their respective positions.
> >
> > WiFi     1
> > Wi       1
> > WiFi     1
> > Fi       2
> > device   3
>
> It seems very odd to me that the original value would show up twice with
> the preserveOriginal parameter set, but I am seeing the same behavior on
> 4.7 and 5.3.  Because both copies are at the same position, this will
> not affect search, but will slightly affect relevance if you are not
> specifying a sort parameter.  Everything else about the analysis looks
> correct to me, and the positions you see are needed for a phrase query
> to work correctly.
>
> I have seen working configurations where preserveOriginal is set on the
> index analysis but NOT set on query analysis.  This is how my own schema
> is configured.  One of the reasons for this configuration is to reduce
> the number of terms in the query so it is faster than it would be if
> preserveOriginal were present and generated additional terms.  The
> preserveOriginal on the index side ensures a match whether mixed case is
> used or not.
>
> Thanks,
> Shawn
>
>
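A sketch of the configuration Shawn describes, with preserveOriginal only in
the index-time analyzer; the field type name and the surrounding
tokenizer/filters are illustrative assumptions:

<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- No preserveOriginal here: fewer query-time terms means faster
         queries; the index-time original still guarantees a match. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>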


CorruptIndexException during optimize.

2016-01-31 Thread Modassar Ather
Hi,

Got the following error during optimize of the index on 2 nodes of a 12-node
cluster. Please let me know if the index can be recovered, how, and what
could be the reason.
Total number of nodes: 12
No replica.
Solr version - 5.4.0
Java version - 1.7.0_91 (Open JDK 64 bit)
Ubuntu version : Ubuntu 14.04.3 LTS

2016-01-31 20:00:31.211 ERROR (qtp1698904557-9710) [c:core s:shard4
r:core_node3 x:core] o.a.s.h.RequestHandlerBase java.io.IOException:
Invalid vInt detected (too many bits)
at org.apache.lucene.store.DataInput.readVInt(DataInput.java:141)
at
org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer.readNumericEntry(Lucene54DocValuesProducer.java:355)
at
org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer.readFields(Lucene54DocValuesProducer.java:243)
at
org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer.<init>(Lucene54DocValuesProducer.java:122)
at
org.apache.lucene.codecs.lucene54.Lucene54DocValuesFormat.fieldsProducer(Lucene54DocValuesFormat.java:113)
at
org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.<init>(PerFieldDocValuesFormat.java:268)
at
org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:358)
at
org.apache.lucene.index.SegmentDocValues.newDocValuesProducer(SegmentDocValues.java:51)
at
org.apache.lucene.index.SegmentDocValues.getDocValuesProducer(SegmentDocValues.java:67)
at
org.apache.lucene.index.SegmentReader.initDocValuesProducer(SegmentReader.java:147)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:81)
at
org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
at
org.apache.lucene.index.BufferedUpdatesStream$SegmentState.<init>(BufferedUpdatesStream.java:384)
at
org.apache.lucene.index.BufferedUpdatesStream.openSegmentStates(BufferedUpdatesStream.java:416)
at
org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:261)
at
org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3161)
at
org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:3147)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3124)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3087)
at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1741)
at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1721)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:590)
at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:62)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1612)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1589)
at
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:64)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:457)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:222)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at
org.eclipse.jetty.server

Re: CorruptIndexException during optimize.

2016-02-09 Thread Modassar Ather
Hi,

Kindly provide your inputs on the issue.

Thanks,
Modassar

On Mon, Feb 1, 2016 at 12:40 PM, Modassar Ather 
wrote:

> Hi,
>
> Got the following error during optimize of the index on 2 nodes of a
> 12-node cluster. Please let me know if the index can be recovered, how, and
> what could be the reason.
> Total number of nodes: 12
> No replica.
> Solr version - 5.4.0
> Java version - 1.7.0_91 (Open JDK 64 bit)
> Ubuntu version : Ubuntu 14.04.3 LTS
>
> 2016-01-31 20:00:31.211 ERROR (qtp1698904557-9710) [c:core s:shard4
> r:core_node3 x:core] o.a.s.h.RequestHandlerBase java.io.IOException:
> Invalid vInt detected (too many bits)
> at org.apache.lucene.store.DataInput.readVInt(DataInput.java:141)
> at
> org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer.readNumericEntry(Lucene54DocValuesProducer.java:355)
> at
> org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer.readFields(Lucene54DocValuesProducer.java:243)
> at
> org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer.<init>(Lucene54DocValuesProducer.java:122)
> at
> org.apache.lucene.codecs.lucene54.Lucene54DocValuesFormat.fieldsProducer(Lucene54DocValuesFormat.java:113)
> at
> org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.<init>(PerFieldDocValuesFormat.java:268)
> at
> org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:358)
> at
> org.apache.lucene.index.SegmentDocValues.newDocValuesProducer(SegmentDocValues.java:51)
> at
> org.apache.lucene.index.SegmentDocValues.getDocValuesProducer(SegmentDocValues.java:67)
> at
> org.apache.lucene.index.SegmentReader.initDocValuesProducer(SegmentReader.java:147)
> at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:81)
> at
> org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
> at
> org.apache.lucene.index.BufferedUpdatesStream$SegmentState.<init>(BufferedUpdatesStream.java:384)
> at
> org.apache.lucene.index.BufferedUpdatesStream.openSegmentStates(BufferedUpdatesStream.java:416)
> at
> org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:261)
> at
> org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3161)
> at
> org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:3147)
> at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3124)
> at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3087)
> at
> org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1741)
> at
> org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1721)
> at
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:590)
> at
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:62)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1612)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1589)
> at
> org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:64)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:457)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:222)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at
> org.eclipse

Re: Multi-lingual search

2016-02-09 Thread Modassar Ather
And what does proximity search exactly mean?

A proximity search means searching for terms within a given distance of each
other. E.g. search for a document which has java within 3 words of network:
field:"java network"~3
So the above query will match any document where java and network occur
within 3 positions of each other.

Can i implement proximity search if i use
>seperate core per language
>field per language
>multilingual field that supports all languages.
A proximity search operates on a single field, so it does not matter whether
that field lives in the same or a different core.

searching for walk word when walking is indexed,should fetch and display the
record?
It will be included in stemming filter.right?
Stemming reduces a word to its root form. So yes, if the root form derived
from the given word matches, the search will find it.

Hope this helps.

Best,
Modassar


On Tue, Feb 9, 2016 at 12:58 PM, vidya  wrote:

> Hi
>   Can i implement proximity search if i use
> >seperate core per language
> >field per language
> >multilingual field that supports all languages.
>
> And what does proximity search exactly mean?
>
> searching for walk word when walking is indexed,should fetch and display
> the
> record?
> It will be included in stemming filter.right?
>
> Thanks in advance
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Multi-lingual-search-tp4254398p4256094.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Searching special characters

2016-02-12 Thread Modassar Ather
You can search for them by escaping them with a backslash.
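
For example (hedged, assuming a field named fl and the standard lucene query
parser):

fl:\"Audit      (a leading double quote searched literally)
fl:C\+\+        (plus signs searched literally)

Whether the escaped character actually matches then depends on whether your
analysis chain keeps that character in the indexed tokens.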

Best,
Modassar


Re: Searching special characters

2016-02-12 Thread Modassar Ather
These special characters can be removed if they are at the beginning or end,
or can be taken care of by the relevant filters, depending on the schema
defined.
E.g. "Audit"/*Audit should be searchable by the query Audit, so I see no
reason to index the "/* part of the content. You can use PatternReplaceFilter
to replace these special characters.
If the special character is inside a word, e.g. Wi-Fi, then such terms can be
taken care of by WordDelimiterFilter.

Note that the special character handling may vary based on use cases.
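
A sketch of such a PatternReplaceFilter, stripping leading and trailing
quote, slash, and asterisk characters; the exact pattern is an assumption to
adapt per use case:

<filter class="solr.PatternReplaceFilterFactory"
        pattern='^["/*]+|["/*]+$' replacement="" replace="all"/>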

Best,
Modassar

On Fri, Feb 12, 2016 at 3:09 PM, Anil  wrote:

> Thanks for quick response.
>
> Should these be treated differently during index ?
>
> I have tried *\"Audit* which is returning results of *Audit *also which is
> incorrect. what do you say ?
>
> On 12 February 2016 at 15:07, Modassar Ather 
> wrote:
>
> > You can search them by escaping with backslash.
> >
> > Best,
> > Modassar
> >
>


Re: SOLR ranking

2016-02-15 Thread Modassar Ather
First it will search for "Eating Disorders" together and then the individual
words "Eating" and "Disorders"

I don't think the phrase will be searched as individual ANDed terms unless
the query spells it out explicitly, like below:
"Eating Disorders" OR (Eating AND Disorders)

Best,
Modassar

On Tue, Feb 16, 2016 at 10:48 AM, Nitin.K  wrote:

> I am using edismax parser with the following query:
>
>
> localhost:8983/solr/tgl/select?q=eating%20disorders&wt=xml&tie=1.0&rows=200&q.op=AND&indent=true&defType=edismax&stopwords=true&lowercaseOperators=true&debugQuery=true&qf=topic_title%5E100+subtopic_title%5E40+index_term%5E20+drug%5E15+content%5E3&pf2=topTitle%5E200+subTopTitle%5E80+indTerm%5E40+drugString%5E30+content%5E6
>
> Configuration of schema.xml
>
>  />
> 
>
>  stored="true"/>
> 
>
>  multiValued="true"/>
>  multiValued="true"/>
>
>  multiValued="true"/>
>  multiValued="true"/>
>
> 
>
> 
> 
> 
> 
>
>  positionIncrementGap="100" omitNorms="true">
> 
> 
>  ignoreCase="true"
> words="stopwords.txt" />
> 
> 
> 
> 
>  ignoreCase="true"
> words="stopwords.txt" />
>  synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
> 
> 
> 
>  omitTermFreqAndPositions="true" omitNorms="true">
> 
>  class="solr.WhitespaceTokenizerFactory"/>
>  ignoreCase="true"
> words="stopwords.txt" />
> 
> 
> 
>
>
> I want , if user will search for a phrase then that pharse should always
> takes the priority in comaprison to the individual words;
>
> Example: "Eating Disorders"
>
> First it will search for "Eating Disorders" together and then the
> individual
> words "Eating" and "Disorders"
> but while searching for individual words, it will always return those
> documents where both the words should exist for which i am already using
> q.op="AND" in my query.
>
> Thanks,
> Nitin
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257510.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: SOLR ranking

2016-02-16 Thread Modassar Ather
Actually you can get it with the edismax.
Just set mm to 100% and then configure a pf field ( or more) .
You are going to search all the search terms mandatory and boost phrases
match .

@Alessandro Thanks for your insight.
I thought that, with pf set, a document would be boosted if all of the terms
appear in close proximity. I am not sure how much is meant by close
proximity; I checked the dismax query parser wiki too.
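
For reference, a hedged example of the request Alessandro describes, reusing
the field names from Nitin's query (boost values are illustrative):

q=eating disorders&defType=edismax&mm=100%&qf=topic_title^100 content^3&pf=topic_title^200 content^6

With mm=100% every term becomes mandatory, while pf adds an implicit phrase
query over the whole input that boosts (but does not filter on) documents
where the terms appear together as a phrase.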

Best,
Modassar

On Tue, Feb 16, 2016 at 3:36 PM, Alessandro Benedetti  wrote:

> Binoy, the omitTermFreqAndPositions is set only for text_ws which is used
> only on the "indexed_terms" field.
> The text_general fields seem fine to me.
>
> Are you omitting norms on purpose ? To be fair it could be relevant in
> title or short topic searches to boost up short field values, containing a
> lot of terms from the searched query.
>
> To respond Modassar :
>
> I don't think the phrase will be searched as individual ANDed terms until
> > the query has it like below.
> > "Eating Disorders" OR (Eating AND Disorders).
> >
>
> Actually you can get it with the edismax.
> Just set mm to 100% and then configure a pf field ( or more) .
> You are going to search all the search terms mandatory and boost phrases
> match .
>
> Cheers
>
> On 16 February 2016 at 07:57, Emir Arnautovic <
> emir.arnauto...@sematext.com>
> wrote:
>
> > Hi Nitin,
> > You can use pf parameter to boost results with exact phrase. You can also
> > use pf2 and pf3 to boost results with bigrams (phrase matches with 2 or 3
> > words in case input is with more than 3 words)
> >
> > Regards,
> > Emir
> >
> >
> > On 16.02.2016 06:18, Nitin.K wrote:
> >
> >> I am using edismax parser with the following query:
> >>
> >>
> >>
> localhost:8983/solr/tgl/select?q=eating%20disorders&wt=xml&tie=1.0&rows=200&q.op=AND&indent=true&defType=edismax&stopwords=true&lowercaseOperators=true&debugQuery=true&qf=topic_title%5E100+subtopic_title%5E40+index_term%5E20+drug%5E15+content%5E3&pf2=topTitle%5E200+subTopTitle%5E80+indTerm%5E40+drugString%5E30+content%5E6
> >>
> >> Configuration of schema.xml
> >>
> >>  stored="true"
> >> />
> >> 
> >>
> >>  >> stored="true"/>
> >> 
> >>
> >>  >> multiValued="true"/>
> >>  >> multiValued="true"/>
> >>
> >>  >> multiValued="true"/>
> >>  >> multiValued="true"/>
> >>
> >> 
> >>
> >> 
> >> 
> >> 
> >> 
> >>
> >>  >> positionIncrementGap="100" omitNorms="true">
> >> 
> >>  class="solr.StandardTokenizerFactory"/>
> >>  >> ignoreCase="true"
> >> words="stopwords.txt" />
> >> 
> >> 
> >> 
> >>  class="solr.StandardTokenizerFactory"/>
> >>  >> ignoreCase="true"
> >> words="stopwords.txt" />
> >>  >> synonyms="synonyms.txt"
> >> ignoreCase="true" expand="true"/>
> >> 
> >> 
> >> 
> >>  >> positionIncrementGap="100"
> >> omitTermFreqAndPositions="true" omitNorms="true">
> >> 
> >>  >> class="solr.WhitespaceTokenizerFactory"/>
> >>  >> ignoreCase="true"
> >> words="stopwords.txt" />
> >> 
> >> 
> >> 
> >>
> >>
> >> I want , if user will search for a phrase then that pharse should always
> >> takes the priority in comaprison to the individual words;
> >>
> >> Example: "Eating Disorders"
> >>
> >> First it will search for "Eating Disorders" together and then the
> >> individual
> >> words "Eating" and "Disorders"
> >> but while searching for individual words, it will always return those
> >> documents where both the words should exist for which i am already using
> >> q.op="AND" in my query.
> >>
> >> Thanks,
> >> Nitin
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257510.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: SOLR ranking

2016-02-16 Thread Modassar Ather
In that case, with a pf field and mm=100%, will a phrase with a given slop
match a document having the terms of the given phrase with more than the
given slop between them? Per my understanding, as a phrase it will surely
not match.

Best,
Modassar


On Tue, Feb 16, 2016 at 5:26 PM, Alessandro Benedetti  wrote:

> If I remember well , it is going to be as a phrase query ( when you use the
> "quotes") .
> So the close proximity means a match of the phrase with 0 tolerance ( so
> the terms must respect the position distance in the query).
> If I remember well I debugged that recently.
>
> Cheers
>
> On 16 February 2016 at 11:42, Modassar Ather 
> wrote:
>
> > Actually you can get it with the edismax.
> > Just set mm to 100% and then configure a pf field ( or more) .
> > You are going to search all the search terms mandatory and boost phrases
> > match .
> >
> > @Alessandro Thanks for your insight.
> > I thought that the document will be boosted if all of the terms appear in
> > close proximity by setting pf. Not sure how much is meant by the close
> > proximity. Checked it on dismax query parser wiki too.
> >
> > Best,
> > Modassar
> >
> > On Tue, Feb 16, 2016 at 3:36 PM, Alessandro Benedetti <
> > abenede...@apache.org
> > > wrote:
> >
> > > Binoy, the omitTermFreqAndPositions is set only for text_ws which is
> used
> > > only on the "indexed_terms" field.
> > > The text_general fields seem fine to me.
> > >
> > > Are you omitting norms on purpose ? To be fair it could be relevant in
> > > title or short topic searches to boost up short field values,
> containing
> > a
> > > lot of terms from the searched query.
> > >
> > > To respond Modassar :
> > >
> > > I don't think the phrase will be searched as individual ANDed terms
> until
> > > > the query has it like below.
> > > > "Eating Disorders" OR (Eating AND Disorders).
> > > >
> > >
> > > Actually you can get it with the edismax.
> > > Just set mm to 100% and then configure a pf field ( or more) .
> > > You are going to search all the search terms mandatory and boost
> phrases
> > > match .
> > >
> > > Cheers
> > >
> > > On 16 February 2016 at 07:57, Emir Arnautovic <
> > > emir.arnauto...@sematext.com>
> > > wrote:
> > >
> > > > Hi Nitin,
> > > > You can use pf parameter to boost results with exact phrase. You can
> > also
> > > > use pf2 and pf3 to boost results with bigrams (phrase matches with 2
> > or 3
> > > > words in case input is with more than 3 words)
> > > >
> > > > Regards,
> > > > Emir
> > > >
> > > >
> > > > On 16.02.2016 06:18, Nitin.K wrote:
> > > >
> > > >> I am using edismax parser with the following query:
> > > >>
> > > >>
> > > >>
> > >
> >
> localhost:8983/solr/tgl/select?q=eating%20disorders&wt=xml&tie=1.0&rows=200&q.op=AND&indent=true&defType=edismax&stopwords=true&lowercaseOperators=true&debugQuery=true&qf=topic_title%5E100+subtopic_title%5E40+index_term%5E20+drug%5E15+content%5E3&pf2=topTitle%5E200+subTopTitle%5E80+indTerm%5E40+drugString%5E30+content%5E6
> > > >>
> > > >> Configuration of schema.xml
> > > >>
> > > >>  > > stored="true"
> > > >> />
> > > >> 
> > > >>
> > > >>  > > >> stored="true"/>
> > > >>  > stored="false"/>
> > > >>
> > > >>  > > >> multiValued="true"/>
> > > >>  > > >> multiValued="true"/>
> > > >>
> > > >>  > > >> multiValued="true"/>
> > > >>  > > >> multiValued="true"/>
> > > >>
> > > >>  > stored="true"/>
> > > >>
> > > >> 
> > > >> 
> > > >> 
> > > >> 
> > > >>
> > > >>  > > >> positionIncrementGap="100" omitNorms="true">
> > > >> 
> > > >>  > > class="solr.StandardTokenizerFactory"/>
> > > >>  > > >> ignoreCase=&q

Query behavior.

2016-03-08 Thread Modassar Ather
Hi,

Kindly help me understand the parsing of the following query. I am using the
edismax parser and Solr-5.5.0.
q.op is set to AND and there is no explicit mm value set.

fl:(java OR book) => "boost(+((fl:java fl:book)~2),int(val))"

When the query has an explicit OR, why is the ~2 present in the parsed
query?

How can I achieve the following?
"boost(+((fl:java fl:book)),int(val))"

The reason for asking is that the ANDed and ORed queries both return the
same number of documents, whereas the expectation is that the ORed query
should match more documents.
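
For reference, the two requests below show the difference; treating mm=0% as
the explicit override (a workaround discussed later in this thread around
SOLR-8812) is an assumption to verify on your version:

q=fl:(java OR book)&q.op=AND&defType=edismax&debug=query
q=fl:(java OR book)&q.op=AND&mm=0%&defType=edismax&debug=query

Setting mm explicitly prevents q.op=AND from being translated into mm=100%,
which is what produces the ~2 in the parsed query.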

Thanks,
Modassar


Re: Query behavior.

2016-03-09 Thread Modassar Ather
Hi,

Any suggestion will be very helpful.

Thanks,
Modassar

On Wed, Mar 9, 2016 at 12:37 PM, Modassar Ather 
wrote:

> Hi,
>
> Kindly help me understand the parsing of following query. I am using
> edismax parser and Solr-5.5.0.
> q.op is set to AND and there is no explicit mm value set.
>
> fl:(java OR book) => "boost(+((fl:java fl:book)~2),int(val))"
>
> When the query has explicit OR then why the ~2 is present in the parsed
> query?
>
> How can I achieve following?
> "boost(+((fl:java fl:book)),int(val))"
>
> The reason being the ANDed and ORed queries both returns the same number
> of documents. But what expected is that the ORed query should have more
> number of documents.
>
> Thanks,
> Modassar
>


Re: Query behavior.

2016-03-10 Thread Modassar Ather
Thanks Shawn for pointing to the Jira issue. I was not sure whether it is
expected behavior or a bug, or whether there could have been a way to get
the desired result.

Best,
Modassar

On Thu, Mar 10, 2016 at 11:32 AM, Shawn Heisey  wrote:

> On 3/9/2016 10:55 PM, Shawn Heisey wrote:
> > The ~2 syntax, when not attached to a phrase query (quotes) is the way
> > you express a fuzzy query. If it's attached to a query in quotes, then
> > it is a proximity query. I'm not sure whether it means something
> > different when it's attached to a query clause in parentheses, someone
> > with more knowledge will need to comment.
> 
> > https://issues.apache.org/jira/browse/SOLR-8812
>
> After I read SOLR-8812 more closely, it seems that the ~2 syntax with
> parentheses is the way that the effective mm value is expressed for a
> particular query clause in the parsed query.  I've learned something new
> today.
>
> Thanks,
> Shawn
>
>


Re: Query behavior.

2016-03-14 Thread Modassar Ather
Thanks Jack for your response.
The following Jira issue for this problem is already present, so I have not
created a new one.
https://issues.apache.org/jira/browse/SOLR-8812

Kindly help me understand whether it is possible to achieve search on ORed
terms as it was done in earlier Solr versions.
Is this behavior intentional, or is it a bug? I need to migrate to
Solr-5.5.0 but am not doing so due to this behavior.

Thanks,
Modassar


On Fri, Mar 11, 2016 at 3:18 AM, Jack Krupansky 
wrote:

> We probably need a Jira to investigate whether this really is an explicitly
> intentional feature change, or whether it really is a bug. And if it truly
> was intentional, how people can work around the change to get the desired,
> pre-5.5 behavior. Personally, I always thought it was a mistake that q.op
> and mm were so tightly linked in Solr even though they are independent in
> Lucene.
>
> In short, I think people want to be able to set the default behavior for
> individual terms (MUST vs. SHOULD) if explicit operators are not used, and
> that OR is an explicit operator. And that mm should control only how many
> SHOULD terms are required (Lucene MinShouldMatch.)
>
>
> -- Jack Krupansky
>
> On Thu, Mar 10, 2016 at 3:41 AM, Modassar Ather 
> wrote:
>
> > Thanks Shawn for pointing to the jira issue. I was not sure that if it is
> > an expected behavior or a bug or there could have been a way to get the
> > desired result.
> >
> > Best,
> > Modassar
> >
> > On Thu, Mar 10, 2016 at 11:32 AM, Shawn Heisey 
> > wrote:
> >
> > > On 3/9/2016 10:55 PM, Shawn Heisey wrote:
> > > > The ~2 syntax, when not attached to a phrase query (quotes) is the
> way
> > > > you express a fuzzy query. If it's attached to a query in quotes,
> then
> > > > it is a proximity query. I'm not sure whether it means something
> > > > different when it's attached to a query clause in parentheses,
> someone
> > > > with more knowledge will need to comment.
> > > 
> > > > https://issues.apache.org/jira/browse/SOLR-8812
> > >
> > > After I read SOLR-8812 more closely, it seems that the ~2 syntax with
> > > parentheses is the way that the effective mm value is expressed for a
> > > particular query clause in the parsed query.  I've learned something
> new
> > > today.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>


Understanding parsed queries.

2016-03-15 Thread Modassar Ather
Hi,

Kindly help me understand the parsed queries of the following three queries,
and how these parsed queries can be interpreted in terms of Boolean logic.
Please ignore the boost part.

*Query : *fl:term1 OR fl:term2 AND fl:term3
*"parsedquery_toString" : *"boost(+(fl:term1 +fl:term2
+fl:term3),int(doc_wt))",
*matches : *50685

The above query seems to be ignoring fl:term1, as the count for fl:term2
AND fl:term3 alone is exactly 50685.

*Query : *fl:term1 OR (fl:term2 AND fl:term3)
*parsedquery_toString:* "boost(+(fl:term1 (+fl:term2
+fl:term3)),int(doc_wt))",
*matches : *809006

*Query : *(fl:term1 OR fl:term2) AND fl:term3
*parsedquery_toString:* "boost(+(+(fl:term1 fl:term2)
+fl:term3),int(doc_wt))",
*matches : *293949

Per my understanding, a term having + is a MUST and must be present in the
document, whereas a term without it may or may not be present; but query one
seems to be ignoring the first term completely.
How does the outer plus define the behavior? E.g. the *outer +* in the query
+(fl:term1 +fl:term2 +fl:term3).

Thanks,
Modassar


Re: Understanding parsed queries.

2016-03-15 Thread Modassar Ather
The query parsing is not strict Boolean logic, here's a great
writeup on the topic:
https://lucidworks.com/blog/2011/12/28/why-not-and-or-and-not/

Thanks for pointing to the link. I have gone through this post. The
following is mentioned in the post:
Practically speaking this means that NOT takes precedence over AND which
takes precedence over OR — but only if the default operator for the query
parser has not been changed from the default (“Or”). If the default
operator is set to “And” then the behavior is just plain weird.

I have q.op set to AND. I am not sure how it will behave. Kindly provide
your inputs.

My guess as to why the counts are the same with and without the fl
term is that it's present only in docs with term2 and term3 in them
perhaps?

I have checked fl:term1 alone and found many more documents containing it
than documents having all three terms, so the results should include the
documents containing only term1.
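
For reference, a worked reading of the first parsed query under Lucene's
MUST/SHOULD semantics (a SHOULD clause alongside MUST clauses affects only
scoring, not the match set), which would explain the identical counts:

+(fl:term1 +fl:term2 +fl:term3)
   fl:term1   -> SHOULD (no +): optional, contributes only to scoring
   +fl:term2  -> MUST: the document must contain term2
   +fl:term3  -> MUST: the document must contain term3
   => matches = docs with term2 AND term3 (50685); term1 affects ranking only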

Thanks,
Modassar

On Tue, Mar 15, 2016 at 9:16 PM, Erick Erickson 
wrote:

> The query parsing is not strict Boolean logic, here's a great
> writeup on the topic:
> https://lucidworks.com/blog/2011/12/28/why-not-and-or-and-not/
>
> The outer "+" is simply the entire clause (of which there is only one)
> must be present, i.e. it's the whole query.
>
> My guess as to why the counts are the same with and without the fl
> term is that it's present only in docs with term2 and term3 in them
> perhaps?
>
> Best,
> Erick
>
> Best,
> Erick
>
> On Tue, Mar 15, 2016 at 12:22 AM, Modassar Ather 
> wrote:
> > Hi,
> >
> > Kindly help me understand the parsed queries of following three queries.
> > How these parsed queries can be interpreted for boolean logic.
> > Please ignore the boost part.
> >
> > *Query : *fl:term1 OR fl:term2 AND fl:term3
> > *"parsedquery_toString" : *"boost(+(fl:term1 +fl:term2
> > +fl:term3),int(doc_wt))",
> > *matches : *50685
> >
> > The above query seems to be ignoring the fl:term1 as the result of
> fl:term2
> > AND fl:term3 is exactly 50685.
> >
> > *Query : *fl:term1 OR (fl:term2 AND fl:term3)
> > *parsedquery_toString:* "boost(+(fl:term1 (+fl:term2
> > +fl:term3)),int(doc_wt))",
> > *matches : *809006
> >
> > *Query : *(fl:term1 OR fl:term2) AND fl:term3
> > *parsedquery_toString:* "boost(+(+(fl:term1 fl:term2)
> > +fl:term3),int(doc_wt))",
> > *matches : *293949
> >
> > Per my understanding the terms having + is a must and must be present in
> > the document whereas a term without it may or may not be present but
> query
> > one seems to be ignoring the first term completely.
> > How the outer plus defines the behavior. E.g. *outer +* in query
> +(fl:term1
> > +fl:term2 +fl:term3)
> >
> > Thanks,
> > Modassar
>


Re: Query behavior.

2016-03-15 Thread Modassar Ather
Jack, as suggested I have created the following Jira issue.

https://issues.apache.org/jira/browse/SOLR-8853

Thanks,
Modassar


On Tue, Mar 15, 2016 at 8:15 PM, Jack Krupansky 
wrote:

> That was precisely the point of the need for a new Jira - to answer exactly
> the questions that you have posed - and that I had proposed as well. Until
> some of the senior committers comment on that Jira you won't have answers.
> They've painted themselves into a corner and now I am curious how they will
> unpaint themselves out of that corner.
>
> -- Jack Krupansky
>
> On Tue, Mar 15, 2016 at 1:46 AM, Modassar Ather 
> wrote:
>
> > Thanks Jack for your response.
> > The following jira bug for this issue is already present so I have not
> > created a new one.
> > https://issues.apache.org/jira/browse/SOLR-8812
> >
> > Kindly help me understand that whether it is possible to achieve search
> on
> > ORed terms as it was done in earlier Solr version.
> > Is this behavior intentional or is it a bug? I need to migrate to
> > Solr-5.5.0 but not doing so due to this behavior.
> >
> > Thanks,
> > Modassar
> >
> >
> > On Fri, Mar 11, 2016 at 3:18 AM, Jack Krupansky <
> jack.krupan...@gmail.com>
> > wrote:
> >
> > > We probably need a Jira to investigate whether this really is an
> > explicitly
> > > intentional feature change, or whether it really is a bug. And if it
> > truly
> > > was intentional, how people can work around the change to get the
> > desired,
> > > pre-5.5 behavior. Personally, I always thought it was a mistake that
> q.op
> > > and mm were so tightly linked in Solr even though they are independent
> in
> > > Lucene.
> > >
> > > In short, I think people want to be able to set the default behavior
> for
> > > individual terms (MUST vs. SHOULD) if explicit operators are not used,
> > and
> > > that OR is an explicit operator. And that mm should control only how
> many
> > > SHOULD terms are required (Lucene MinShouldMatch.)
> > >
> > >
> > > -- Jack Krupansky
> > >
> > > On Thu, Mar 10, 2016 at 3:41 AM, Modassar Ather <
> modather1...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Shawn for pointing to the jira issue. I was not sure that if
> it
> > is
> > > > an expected behavior or a bug or there could have been a way to get
> the
> > > > desired result.
> > > >
> > > > Best,
> > > > Modassar
> > > >
> > > > On Thu, Mar 10, 2016 at 11:32 AM, Shawn Heisey 
> > > > wrote:
> > > >
> > > > > On 3/9/2016 10:55 PM, Shawn Heisey wrote:
> > > > > > The ~2 syntax, when not attached to a phrase query (quotes) is
> the
> > > way
> > > > > > you express a fuzzy query. If it's attached to a query in quotes,
> > > then
> > > > > > it is a proximity query. I'm not sure whether it means something
> > > > > > different when it's attached to a query clause in parentheses,
> > > someone
> > > > > > with more knowledge will need to comment.
> > > > > 
> > > > > > https://issues.apache.org/jira/browse/SOLR-8812
> > > > >
> > > > > After I read SOLR-8812 more closely, it seems that the ~2 syntax
> with
> > > > > parentheses is the way that the effective mm value is expressed
> for a
> > > > > particular query clause in the parsed query.  I've learned
> something
> > > new
> > > > > today.
> > > > >
> > > > > Thanks,
> > > > > Shawn
> > > > >
> > > > >
> > > >
> > >
> >
>


Re: Query behavior.

2016-03-19 Thread Modassar Ather
What I understand by q.op is the default operator. If there is no AND/OR
in-between the terms the default will be AND as per my setting of q.op=AND.
But what if the query has AND/OR explicitly put in-between the query terms?
I just think that if (A OR B) is the query then the result should be based
on either of the terms or on both of the terms, and not only on both of the
terms. Please correct me if my understanding is wrong.
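
To make the difference concrete with a two-term query (parsed forms roughly
as reported in SOLR-8812; fl:A and fl:B are placeholder clauses):

q=fl:A OR fl:B&q.op=OR     parses to  +(fl:A fl:B)       either term matches
q=fl:A OR fl:B&q.op=AND    parses to  +((fl:A fl:B)~2)   both terms required

The ~2 is the mm value implied by q.op=AND, and it is what turns the
explicit OR into an effective AND.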

Thanks,
Modassar

On Wed, Mar 16, 2016 at 7:34 PM, Jack Krupansky 
wrote:

> Now you've confused me... Did you actually intend that q.op=AND was going
> to perform some function in a query with only two terms and and OR
> operator? I mean, why not just drop the q.op=AND?
>
> -- Jack Krupansky
>
> On Wed, Mar 16, 2016 at 1:31 AM, Modassar Ather 
> wrote:
>
> > Jack as suggested I have created following jira issue.
> >
> > https://issues.apache.org/jira/browse/SOLR-8853
> >
> > Thanks,
> > Modassar
> >
> >
> > On Tue, Mar 15, 2016 at 8:15 PM, Jack Krupansky <
> jack.krupan...@gmail.com>
> > wrote:
> >
> > > That was precisely the point of the need for a new Jira - to answer
> > exactly
> > > the questions that you have posed - and that I had proposed as well.
> > Until
> > > some of the senior committers comment on that Jira you won't have
> > answers.
> > > They've painted themselves into a corner and now I am curious how they
> > will
> > > unpaint themselves out of that corner.
> > >
> > > -- Jack Krupansky
> > >
> > > On Tue, Mar 15, 2016 at 1:46 AM, Modassar Ather <
> modather1...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Jack for your response.
> > > > The following jira bug for this issue is already present so I have
> not
> > > > created a new one.
> > > > https://issues.apache.org/jira/browse/SOLR-8812
> > > >
> > > > Kindly help me understand that whether it is possible to achieve
> search
> > > on
> > > > ORed terms as it was done in earlier Solr version.
> > > > Is this behavior intentional or is it a bug? I need to migrate to
> > > > Solr-5.5.0 but not doing so due to this behavior.
> > > >
> > > > Thanks,
> > > > Modassar
> > > >
> > > >
> > > > On Fri, Mar 11, 2016 at 3:18 AM, Jack Krupansky <
> > > jack.krupan...@gmail.com>
> > > > wrote:
> > > >
> > > > > We probably need a Jira to investigate whether this really is an
> > > > explicitly
> > > > > intentional feature change, or whether it really is a bug. And if
> it
> > > > truly
> > > > > was intentional, how people can work around the change to get the
> > > > desired,
> > > > > pre-5.5 behavior. Personally, I always thought it was a mistake
> that
> > > q.op
> > > > > and mm were so tightly linked in Solr even though they are
> > independent
> > > in
> > > > > Lucene.
> > > > >
> > > > > In short, I think people want to be able to set the default
> behavior
> > > for
> > > > > individual terms (MUST vs. SHOULD) if explicit operators are not
> > used,
> > > > and
> > > > > that OR is an explicit operator. And that mm should control only
> how
> > > many
> > > > > SHOULD terms are required (Lucene MinShouldMatch.)
> > > > >
> > > > >
> > > > > -- Jack Krupansky
> > > > >
> > > > > On Thu, Mar 10, 2016 at 3:41 AM, Modassar Ather <
> > > modather1...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks Shawn for pointing to the jira issue. I was not sure that
> if
> > > it
> > > > is
> > > > > > an expected behavior or a bug or there could have been a way to
> get
> > > the
> > > > > > desired result.
> > > > > >
> > > > > > Best,
> > > > > > Modassar
> > > > > >
> > > > > > On Thu, Mar 10, 2016 at 11:32 AM, Shawn Heisey <
> > apa...@elyograg.org>
> > > > > > wrote:
> > > > > >
> > > > > > > On 3/9/2016 10:55 PM, Shawn Heisey wrote:
> > > > > > > > The ~2 syntax, when not attached to a phrase query (quotes)
> is
> > > the
> > > > > way
> > > > > > > > you express a fuzzy query. If it's attached to a query in
> > quotes,
> > > > > then
> > > > > > > > it is a proximity query. I'm not sure whether it means
> > something
> > > > > > > > different when it's attached to a query clause in
> parentheses,
> > > > > someone
> > > > > > > > with more knowledge will need to comment.
> > > > > > > 
> > > > > > > > https://issues.apache.org/jira/browse/SOLR-8812
> > > > > > >
> > > > > > > After I read SOLR-8812 more closely, it seems that the ~2
> syntax
> > > with
> > > > > > > parentheses is the way that the effective mm value is expressed
> > > for a
> > > > > > > particular query clause in the parsed query.  I've learned
> > > something
> > > > > new
> > > > > > > today.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Shawn
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Query behavior.

2016-03-19 Thread Modassar Ather
What I understand by "+((fl:java fl:book))" is that any of the terms should
be present for the whole query to match. Please correct me if I am wrong.
What I want to achieve is (A OR B), where either of the terms or both of
them will cause a match.

Thanks,
Modassar

On Thu, Mar 17, 2016 at 10:32 AM, Jack Krupansky 
wrote:

> That's what I thought you had meant before, but the Jira ticket indicates
> that you are looking for some extra level of AND/MUST outside of the OR,
> which is different from what you just indicated. In the ticket you say:
> "How
> can I achieve following? "+((fl:java fl:book))"", which has an extra AND
> outside of the inner sub-query, which is a little different than just
> "(fl:java
> fl:book)". Sure, the results should be the same, but why insist on the
> extra level of nested boolean query?
>
> -- Jack Krupansky
>
> On Thu, Mar 17, 2016 at 12:50 AM, Modassar Ather 
> wrote:
>
> > What I understand by q.op is the default operator. If there is no AND/OR
> > in-between the terms the default will be AND as per my setting of
> q.op=AND.
> > But what if the query has AND/OR explicitly put in-between the query
> terms?
> > I just think that if (A OR B) is the query then the result should be
> based
> > on any of the term's or both of the terms and not only both of the terms.
> > Please correct me if my understanding is wrong.
> >
> > Thanks,
> > Modassar
> >
> > On Wed, Mar 16, 2016 at 7:34 PM, Jack Krupansky <
> jack.krupan...@gmail.com>
> > wrote:
> >
> > > Now you've confused me... Did you actually intend that q.op=AND was
> going
> > > to perform some function in a query with only two terms and and OR
> > > operator? I mean, why not just drop the q.op=AND?
> > >
> > > -- Jack Krupansky
> > >
> > > On Wed, Mar 16, 2016 at 1:31 AM, Modassar Ather <
> modather1...@gmail.com>
> > > wrote:
> > >
> > > > Jack as suggested I have created following jira issue.
> > > >
> > > > https://issues.apache.org/jira/browse/SOLR-8853
> > > >
> > > > Thanks,
> > > > Modassar
> > > >
> > > >
> > > > On Tue, Mar 15, 2016 at 8:15 PM, Jack Krupansky <
> > > jack.krupan...@gmail.com>
> > > > wrote:
> > > >
> > > > > That was precisely the point of the need for a new Jira - to answer
> > > > exactly
> > > > > the questions that you have posed - and that I had proposed as
> well.
> > > > Until
> > > > > some of the senior committers comment on that Jira you won't have
> > > > answers.
> > > > > They've painted themselves into a corner and now I am curious how
> > they
> > > > will
> > > > > unpaint themselves out of that corner.
> > > > >
> > > > > -- Jack Krupansky
> > > > >
> > > > > On Tue, Mar 15, 2016 at 1:46 AM, Modassar Ather <
> > > modather1...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks Jack for your response.
> > > > > > The following jira bug for this issue is already present so I
> have
> > > not
> > > > > > created a new one.
> > > > > > https://issues.apache.org/jira/browse/SOLR-8812
> > > > > >
> > > > > > Kindly help me understand that whether it is possible to achieve
> > > search
> > > > > on
> > > > > > ORed terms as it was done in earlier Solr version.
> > > > > > Is this behavior intentional or is it a bug? I need to migrate to
> > > > > > Solr-5.5.0 but not doing so due to this behavior.
> > > > > >
> > > > > > Thanks,
> > > > > > Modassar
> > > > > >
> > > > > >
> > > > > > On Fri, Mar 11, 2016 at 3:18 AM, Jack Krupansky <
> > > > > jack.krupan...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > We probably need a Jira to investigate whether this really is
> an
> > > > > > explicitly
> > > > > > > intentional feature change, or whether it really is a bug. And
> if
> > > it
> > > > > > truly
> > > > > > > was intentional, how people can work around the change to get
> the
> > > > > > desired,

Wildcard query behavior.

2016-04-18 Thread Modassar Ather
Hi,

Please help me understand following.

I have analysis chain which uses KStemFilterFactory for a field. Solr
version is 5.4.0

When I search for f:validator I get 80K+ documents whereas if I search for
f:validator* I get only around 150 results.

When I checked on the analysis page I saw that validator is changed to
validate. Per my understanding, both of the above cases should at least
give the exact same result of around 80K+ documents.

I understand in some cases wildcards can result in sub-optimal results for
stemmed content. Please correct me if I am wrong.

Thanks,
Modassar


Re: Wildcard query behavior.

2016-04-18 Thread Modassar Ather
Thanks Reth for your response.

When validator is changed to validate, both at query time and index time,
should not validator* and validator return the same results at least?

E.g. 5 documents contain validator. At index time validator got changed to
validate.
Now when validator* is searched I expect it to also change to validate and
match all 5 documents. In this case I am not sure how the wildcard is
handled internally, meaning what the query will transform to.

Please help me understand the internals of wildcard with stemming or point
me to some documents as I could not find any details on it.
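
To lay out what seems to be happening in the two cases (an assumption on my
part, pending correction; it presumes the stemmer maps validator to
validate, as the analysis page shows):

indexing:     validator  -> KStemFilter ->  validate      (term in the index)
f:validator   -> analyzed at query time to f:validate     (matches all 5 docs)
f:validator*  -> prefix query on the raw text "validator" (misses "validate")
f:validat*    -> prefix query on "validat", a prefix of "validate"
                 (matches again)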

Best,
Modassar

On Mon, Apr 18, 2016 at 1:04 PM, Reth RM  wrote:

> If you search for f:validat*, then I believe you will get same number of
> results. Please check.
>
> f:validator* is searching for records that have prefix "validator" where as
> field with stemmer which stems "validator" to "validate" (if this stemming
> was applied at index time as well as query time) its looking for records
> that have "validate" or "validator", so for obvious reasons, numFound might
> have been different.
>
>
>
> On Mon, Apr 18, 2016 at 12:48 PM, Modassar Ather 
> wrote:
>
> > Hi,
> >
> > Please help me understand following.
> >
> > I have analysis chain which uses KStemFilterFactory for a field. Solr
> > version is 5.4.0
> >
> > When I search for f:validator I get 80K+ documents whereas if I search
> for
> > f:validator* I get only around 150 results.
> >
> > When I checked on analysis page I see that validator is changed to
> > validate. Per my understanding in both the above cases it should at-least
> > give the exact same result of around 80K+ documents.
> >
> > I understand in some cases wildcards can result in sub-optimal results
> for
> > stemmed content. Please correct me if I am wrong.
> >
> > Thanks,
> > Modassar
> >
>


Re: Wildcard query behavior.

2016-04-19 Thread Modassar Ather
Yes! Wildcards are not analyzed. Thanks Shawn for reminding me.
Thanks Erick for your response.

Best,
Modassar

On Mon, Apr 18, 2016 at 8:53 PM, Erick Erickson 
wrote:

> Here's a blog on the subject:
>
> https://lucidworks.com/blog/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
>
> bq: When validator is changed to validate, both at query time and index
> time,
> then should not validator*/validator return the same results at-least?
>
> This is one of those problems that's easy to state, but hard to solve. And
> there are so many variations that any attempt to solve it will _always_
> have lots of surprises. Simple example (and remember that the
> stemming is usually algorithmic). "validator" probably stems to "validat".
> However, "validato" (note the 'o') may not stem
> the same way at all, so searching for "validato*" wouldn't produce the
> expected response.
>
> Best,
> Erick
>
> On Mon, Apr 18, 2016 at 6:23 AM, Shawn Heisey  wrote:
> > On 4/18/2016 1:18 AM, Modassar Ather wrote:
> >> When I search for f:validator I get 80K+ documents whereas if I search
> for
> >> f:validator* I get only around 150 results.
> >>
> >> When I checked on analysis page I see that validator is changed to
> >> validate. Per my understanding in both the above cases it should
> at-least
> >> give the exact same result of around 80K+ documents.
> >
> > What Reth was trying to tell you, but did not state clearly, is that
> > when you use wildcards, your query is NOT analyzed -- none of your
> > filters, including the stemmer, are used.
> >
> > Thanks,
> > Shawn
> >
>


Results of facet differs with change in facet.limit.

2016-05-02 Thread Modassar Ather
Hi,

I have a field f which is defined as follows on solr 5.x. It is a 12 shard
cluster with no replica.

<field name="f" sortMissingLast="true" stored="false" indexed="false" docValues="true"/>

When I facet on this field with different facet.limit I get different facet
count.

E.g.
Query : text_field:term&facet.field=f&facet.limit=100
Result :
1225
1082
1076

Query : text_field:term&facet.field=f&facet.limit=200
1366
1321
1315

I am noticing fewer documents in the facets whereas the numFound during
search is higher. Please refer to the following query for details.

Query : text_field:term&facet.field=f
Result :
1225
1082
1076

Query : text_field:term AND f:val1
Result: numFound=1366

Kindly help me understand this behavior or let me know if it is an issue.

Thanks,
Modassar


Re: Results of facet differs with change in facet.limit.

2016-05-02 Thread Modassar Ather
Hi,

Kindly share your inputs on this issue.

Thanks,
Modassar

On Mon, May 2, 2016 at 3:53 PM, Modassar Ather 
wrote:

> Hi,
>
> I have a field f which is defined as follows on solr 5.x. It is 12 shard
> cluster with no replica.
>
> <field name="f" sortMissingLast="true" stored="false" indexed="false" docValues="true"/>
>
> When I facet on this field with different facet.limit I get different
> facet count.
>
> E.g.
> Query : text_field:term&facet.field=f&facet.limit=100
> Result :
> 1225
> 1082
> 1076
>
> Query : text_field:term&facet.field=f&facet.limit=200
> 1366
> 1321
> 1315
>
> I am noticing lesser document in facets whereas the numFound during search
> is more. Please refer to following query for details.
>
> Query : text_field:term&facet.field=f
> Result :
> 1225
> 1082
> 1076
>
> Query : text_field:term AND f:val1
> Result: numFound=1366
>
> Kindly help me understand this behavior or let me know if it is an issue.
>
> Thanks,
> Modassar
>


Re: Results of facet differs with change in facet.limit.

2016-05-03 Thread Modassar Ather
I tried to reproduce the same issue with a field of following type but
could not.


Please share your inputs.

Best,
Modassar

On Tue, May 3, 2016 at 10:32 AM, Modassar Ather 
wrote:

> Hi,
>
> Kindly share your inputs on this issue.
>
> Thanks,
> Modassar
>
> On Mon, May 2, 2016 at 3:53 PM, Modassar Ather 
> wrote:
>
>> Hi,
>>
>> I have a field f which is defined as follows on solr 5.x. It is 12 shard
>> cluster with no replica.
>>
>> <field name="f" sortMissingLast="true" stored="false" indexed="false" docValues="true"/>
>>
>> When I facet on this field with different facet.limit I get different
>> facet count.
>>
>> E.g.
>> Query : text_field:term&facet.field=f&facet.limit=100
>> Result :
>> 1225
>> 1082
>> 1076
>>
>> Query : text_field:term&facet.field=f&facet.limit=200
>> 1366
>> 1321
>> 1315
>>
>> I am noticing lesser document in facets whereas the numFound during
>> search is more. Please refer to following query for details.
>>
>> Query : text_field:term&facet.field=f
>> Result :
>> 1225
>> 1082
>> 1076
>>
>> Query : text_field:term AND f:val1
>> Result: numFound=1366
>>
>> Kindly help me understand this behavior or let me know if it is an issue.
>>
>> Thanks,
>> Modassar
>>
>
>


Re: Results of facet differs with change in facet.limit.

2016-05-03 Thread Modassar Ather
Thanks Erick for your response.

I checked with distrib=false. I tried with a smaller result set.

*Search*
E.g. text_field:term AND f:val1
Number of matches : 49

*Facet:* (distrib=true)
text_field:term AND f:val1
*Result*
Shard1 : 47

*Facet:* (distrib=false)
text_field:term AND f:val1&distrib=false
*Result*
Shard1 : 44
Shard3 : 2
Shard8 : 3

All the other shards out of the 12 show a 0 count against val1. It seems
that the result of shard3 is not being added to the main result.
Kindly comment.
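
For reference, the per-shard numbers above came from requests of this form,
run against each shard in turn (host and collection names are placeholders):

curl "http://shard1:8983/solr/collection1/select?q=text_field:term+AND+f:val1&facet=true&facet.field=f&distrib=false&rows=0"

The distrib=false counts sum to 44 + 2 + 3 = 49, matching the number of
search matches, while the distributed facet reports only 47, so shard3's
contribution of 2 appears to be lost during merging.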

Best,
Modassar

On Wed, May 4, 2016 at 5:33 AM, Erick Erickson 
wrote:

> Hmm, I'd be interested what you get if you restrict your
> queries to individual shards using &distrib=false. This
> will go to the individual shard you address and no others.
>
> Does the facet count change in those circumstances?
>
> Best,
> Erick
>
> On Tue, May 3, 2016 at 4:48 AM, Modassar Ather 
> wrote:
> > I tried to reproduce the same issue with a field of following type but
> > could not.
> >  > stored="false" omitNorms="true"/>
> >
> > Please share your inputs.
> >
> > Best,
> > Modassar
> >
> > On Tue, May 3, 2016 at 10:32 AM, Modassar Ather 
> > wrote:
> >
> >> Hi,
> >>
> >> Kindly share your inputs on this issue.
> >>
> >> Thanks,
> >> Modassar
> >>
> >> On Mon, May 2, 2016 at 3:53 PM, Modassar Ather 
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> I have a field f which is defined as follows on solr 5.x. It is 12
> shard
> >>> cluster with no replica.
> >>>
> >>> <field name="f" sortMissingLast="true"
> >>> stored="false" indexed="false" docValues="true"/>
> >>>
> >>> When I facet on this field with different facet.limit I get different
> >>> facet count.
> >>>
> >>> E.g.
> >>> Query : text_field:term&facet.field=f&facet.limit=100
> >>> Result :
> >>> 1225
> >>> 1082
> >>> 1076
> >>>
> >>> Query : text_field:term&facet.field=f&facet.limit=200
> >>> 1366
> >>> 1321
> >>> 1315
> >>>
> >>> I am noticing lesser document in facets whereas the numFound during
> >>> search is more. Please refer to following query for details.
> >>>
> >>> Query : text_field:term&facet.field=f
> >>> Result :
> >>> 1225
> >>> 1082
> >>> 1076
> >>>
> >>> Query : text_field:term AND f:val1
> >>> Result: numFound=1366
> >>>
> >>> Kindly help me understand this behavior or let me know if it is an
> issue.
> >>>
> >>> Thanks,
> >>> Modassar
> >>>
> >>
> >>
>


Re: Results of facet differs with change in facet.limit.

2016-05-04 Thread Modassar Ather
The "val1" is same for both the test with limit 100 and 200 so the
following is true.

limit=100
1225
1082
1076

limit=200
1366
1321
1315

This I have noticed irrespective of facet.limit too. Please refer to my
previous mail for the example.

Thanks,
Modassar

On Wed, May 4, 2016 at 3:01 PM, Toke Eskildsen 
wrote:

> On Mon, 2016-05-02 at 15:53 +0530, Modassar Ather wrote:
> > E.g.
> > Query : text_field:term&facet.field=f&facet.limit=100
> > Result :
> > 1225
> > 1082
> > 1076
> >
> > Query : text_field:term&facet.field=f&facet.limit=200
> > 1366
> > 1321
> > 1315
>
> Is the "val1" in your limit=100 test the same term as your "val1" in
> your limit=200-test?
>
>
> Or to phrase it another way: Do you have
>
> limit=100
> 1225
> 1082
> 1076
>
> limit=200
> 1366
> 1321
> 1315
>
>
> or
>
> limit=100
> 1225
> 1082
> 1076
>
> limit=200
> 1366
> 1321
> 1315
>
>
> - Toke Eskildsen, State and University Library, Denmark
>
>


Zookeeper state and its effect on Solr cluster.

2015-07-27 Thread Modassar Ather
Hi,

Kindly help me understand following with respect to Solr version 5.2.1.

1. What happens to the solr cluster if the standalone external zookeeper is
stopped/restarted with some changes done in zoo_data during the restart?
E.g. After restarting the zookeeper the solr configs are reloaded with
changes. Please note that solr cluster is not restarted.
2. In what conditions of zookeeper restart the solr nodes are required to
be restarted?

Thanks,
Modassar


Re: Zookeeper state and its effect on Solr cluster.

2015-07-27 Thread Modassar Ather
Thanks for your response Erick and Shawn.

We had automated future solr/zookeeper upgrades using scripts, so for any
new version of solr/zookeeper we use those scripts.
While upgrading zookeeper we do stop it to install it as a service, then
apply the new distribution (which is currently 3.4.6) and restart. The
content of zoo_data is not deleted.
After that the solr configs are uploaded. In this process of zookeeper
upgrade the solr nodes are not restarted.
After this upgrade process I have seen all the nodes active. There are
connection related exceptions in the solr log for the time the zookeeper
was stopped.

Our indexer again uploads the configs to accommodate any possible changes
in schema or solrconfig, which passes every time, and then during reload of
the collection we get the following exception intermittently.

{"responseHeader":{"status":500,"QTime":180028},"error":{"msg":"reload the
collection time out:180s","trace":"org.apache.solr.common.SolrException:
reload the collection time out:180s\n\tat
org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:237)\n\tat
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:168)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\n\tat
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:660)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:431)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:497)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\n\tat
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\n\tat
java.lang.Thread.run(Thread.java:745)\n","code":500}}

Regards,
Modassar



On Mon, Jul 27, 2015 at 8:45 PM, Shawn Heisey  wrote:

> On 7/27/2015 6:17 AM, Modassar Ather wrote:
> > Kindly help me understand following with respect to Solr version 5.2.1.
> >
> > 1. What happens to the solr cluster if the standalone external zookeeper
> is
> > stopped/restarted with some changes done in zoo_data during the restart?
> > E.g After restarting the zookeeper the solr configs are reloaded with
> > changes. Please note that solr cluster is not restarted.
> > 2. In what conditions of zookeeper restart the solr nodes are required to
> > be restarted?
>
> If zookeeper loses quorum, SolrCloud goes read-only.  Updates won't be
> possible until zookeeper has quorum again.  If zookeeper goes away
> completely, I think the result might be the same, but I do not know.
>
> For changes in zookeeper related to core configuration, simply reloading
> affected collections with the Collections API is enough.  For more
> extensive changes, especially to things like the clusterstate,
> restarting all Solr nodes might be required.  If you give us specifics
> about what you want to change, we can figure out exactly what actions
> are needed.
>
> Thanks,
> Shawn
>
>


Re: Zookeeper state and its effect on Solr cluster.

2015-07-27 Thread Modassar Ather
Erick, I am using the ZK upload process only; it is just that it is wrapped
in a script.
The exception comes when I do a RELOAD of the collection after the ZK
restart, once the fresh schema/solrconfig is uploaded.
And once this exception occurs I have to restart the Solr nodes to get them
working.
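
For reference, the sequence the script wraps is the standard one; the paths,
host and names below are placeholders:

server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
    -cmd upconfig -confdir /path/to/conf -confname myconf
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1"

It is the second step, the RELOAD, that intermittently times out with the
exception quoted below.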

Thanks,
Modassar

On Tue, Jul 28, 2015 at 1:05 AM, Erick Erickson 
wrote:

> Why are you doing this? It seems like you're making it
> _much_ more difficult than necessary. Sure, automate
> all the non-solr stuff, but why not make your scripts use
> the ZK upload/download process that's well established
> and tested for maintaining the Solr specific data?
>
> Best,
> Erick
>
> On Mon, Jul 27, 2015 at 9:48 AM, Modassar Ather 
> wrote:
> > Thanks for your response Erick and Shawn.
> >
> > We had automated the solr/zookeeper future upgrades using scripts. So for
> > any new version of solr/zookeeper we use those script.
> > While upgrading zookeeper we do stop it to install it as a service and
> then
> > apply the new distribution(which is currently 3.4.6) and restart. Content
> > of zoo_data is not deleted.
> > After that the solr configs are uploaded. In this process of zookeeper
> > upgrade solr nodes are not restarted.
> > After this upgrade process I have seen all the nodes active. There are
> > connection related exception in solr log for the time the zookeeper was
> > stopped.
> >
> > Our indexer again uploads the configs to accommodate any possible changes
> > in schema or solrconfig which passes every time and then during reload of
> > collection we are getting following exception intermittently.
> >
> > {"responseHeader":{"status":500,"QTime":180028},"error":{"msg":"reload
> the
> > collection time out:180s","trace":"org.apache.solr.common.SolrException:
> > reload the collection time out:180s\n\tat
> >
> org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:237)\n\tat
> >
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:168)\n\tat
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\n\tat
> >
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:660)\n\tat
> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:431)\n\tat
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)\n\tat
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)\n\tat
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat
> >
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\n\tat
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
> >
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\n\tat
> >
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\n\tat
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat
> > org.eclipse.jetty.server.Server.handle(Server.java:497)\n\tat
> > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\n\tat
> >
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\n\tat
> >
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\n\tat
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\n\tat
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\n\tat
> > java.lang.Thread.run(Thread.java:745)\n","code":500}}
> >
> > Regards,
> > Modassar
> >
> >
> >
> > On Mon, Jul 27, 2015 at 8:45 PM, Shawn Heisey 
> wrote:
> >
> >> On 7/27/2015 6:17 AM, Modassar Ather wrote:
> >> > Kin

Re: Zookeeper state and its effect on Solr cluster.

2015-07-27 Thread Modassar Ather
If we upgrade zookeeper we need to restart it. This upgrade process is
automated for future releases/changes of zookeeper.
This is a single external zookeeper which is completely stopped/shut down.
No Solr nodes are restarted/shut down.
My understanding is that even if the zookeeper shuts down, after a restart
the Solr nodes should come in sync with the ZK state. Please correct me if
I am wrong.
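
For comparison, a fully redundant setup would run a three-node ensemble
rather than a single standalone instance, e.g. in each zoo.cfg (host names
are placeholders):

server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888

Nodes in an ensemble can then be upgraded one at a time without the cluster
ever losing quorum.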

On Tue, Jul 28, 2015 at 10:22 AM, Shawn Heisey  wrote:

> On 7/27/2015 10:21 PM, Modassar Ather wrote:
> > Erick I am using the ZK upload process only. It is just that it is added
> > into a script.
> > The exception is coming when I am doing a RELOAD of collection after the
> ZK
> > restart and fresh schema/solrconfig is uploaded.
> > And once this exception occurs I have to restart the Solr nodes to get
> them
> > working.
>
> Why are you restarting ZK?  Uploading a new config doesn't require that.
>
> Are you completely shutting down the entire zookeeper ensemble, or just
> restarting each node in turn?  If zookeeper completely disappears, I
> actually wouldn't be too surprised to see a problem in Solr.  That would
> be a bug, of course.
>
> Thanks,
> Shawn
>
>


Re: Zookeeper state and its effect on Solr cluster.

2015-07-30 Thread Modassar Ather
Hi,

Before starting, our indexer does an upload/reload of the Solr configuration
files using the ZK upload and RELOAD APIs. In this process zookeeper is not
stopped/restarted; ZK is alive and so are the Solr nodes.
Doing this often causes the following exception. Kindly note that the ZK
instance is standalone, not an ensemble. The exception happens only at
RELOAD.

{"responseHeader":{"status":500,"QTime":180028},"error":{"msg":"reload the
collection time out:180s","trace":"org.apache.solr.common.SolrException:
reload the collection time out:180s\n\tat
org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:237)\n\tat
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:168)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\n\tat
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:660)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:431)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:497)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\n\tat
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\n\tat
java.lang.Thread.run(Thread.java:745)\n","code":500}}

Kindly help as this is blocking our smooth process of indexing.

Regards,
Modassar

On Tue, Jul 28, 2015 at 11:40 AM, Shawn Heisey  wrote:

> On 7/27/2015 10:59 PM, Modassar Ather wrote:
> > If we upgrade zookeeper we need to restart. This upgrade process is
> > automated for future releases/changes of zookeeper.
> > This is a single external zookeeper which is completely stopped/shutdown.
> > No Solr node are restarted/shutdown.
> > What I have understanding that even if the zookeeper shuts down, after
> > restart the Solr nodes should come insync with the ZK state. Please
> correct
> > me if I am wrong.
>
> Disclaimer:  I do not have a ton of concrete experience with SolrCloud.
>  I do have a cloud setup, but it is running Solr 4.2.1, which at this
> point is ancient.  I haven't needed to do much to maintain it ... it
> takes care of itself.
>
> Recovering correctly from a complete zookeeper failure is what I would
> hope for, but it's a scenario that I've never tried.  I hope there's a
> unit test for it, but I haven't checked.
>
> A fully redundant zookeeper ensemble requires a minimum of three hosts.
>  If you need to upgrade ZK, then you upgrade them one at a time, and the
> ensemble never loses quorum.
>
> http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A6
>
> Thanks,
> Shawn
>
>


Clarification on WordDelimiterFilter.

2015-08-06 Thread Modassar Ather
I am using WordDelimiterFilter while indexing and searching both with the
following attributes. Parser used is edismax. Solr version is 5.2.1.

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>

During search some of the results returned are not wanted. Following is an
example.

Search query: "3d image"
Search results with 3-d image / 3 d image / 1d image are also returned. As
per the analysis page this is happening because of a position increment in
the tokens, as explained below.

On the analysis page it shows the following four tokens for 3d and their
positions.
token   position
3d      1
3       1
3d      1
d       2
image   3

Another example is "1d obj*" returning "d-object" related results, which
can bring up a completely different search item.

Here the token d is at position 2, which is causing the above matches.
Please help me understand why this position increment is done.
The position increment will also cause the "3d image" search to fail on a
document containing "3d image", as the "d" comes at position 2.

Kindly help me understand the best practices of using WordDelimiterFilter
or provide your inputs how we can resolve the issue.

Thanks,
Modassar


Re: Clarification on WordDelimiterFilter.

2015-08-06 Thread Modassar Ather
Hi,

Any suggestion will be really helpful. Kindly provide your inputs.

Thanks,
Modassar

On Thu, Aug 6, 2015 at 2:06 PM, Modassar Ather 
wrote:

> I am using WordDelimiterFilter while indexing and searching both with the
> following attributes. Parser used is edismax. Solr version is 5.2.1.
>
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
>
> During search some of the results returned are not wanted. Following is
> the example.
>
> Search query: "3d image"
> Search results with 3-d image/3 d image/1d image are also returned. As per
> analysis page this is happening because of position increment in the token
> as explained below.
>
> On the analysis page it shows the following four tokens for 3d and their
> positions.
> token   position
> 3d      1
> 3       1
> 3d      1
> d       2
> image   3
>
> Another example is "1d obj*" returning results containing "d-object"
> related result. This can bring a completely different search item.
>
> Here the token d is at position 2 which is causing the above matches.
> Please help me understand why this position increment is done?
> The position increment will also cause the "3d image" search fail on a
> document containing "3d image" as the "d" comes at position 2.
>
> Kindly help me understand the best practices of using WordDelimiterFilter
> or provide your inputs how we can resolve the issue.
>
> Thanks,
> Modassar
>


Re: Renaming Solr webapp in release 5.2.1

2015-08-10 Thread Modassar Ather
Maybe you can look at solr-jetty-context.xml present under the
server/contexts folder of Solr.
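
The contextPath set in that file is what makes the app answer under /solr.
A sketch of the relevant element (from memory; verify against the copy in
your distribution, which may read the value from a property instead):

<Configure class="org.eclipse.jetty.webapp.WebAppContext">
  <Set name="contextPath">/solr</Set>
</Configure>

Changing /solr there renames the webapp path, though 5.x increasingly
treats /solr as fixed.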

Regards,
Modassar

On Mon, Aug 10, 2015 at 2:42 PM, saurabh tewari 
wrote:

> Hi,
>
> Is there any way through which one can change the name of solr webapp? I
> used to name my instances differently according to their usage and I'd like
> to keep doing that, since I've mapped the solr APIs in my code using those
> names only.
>
> Regards,
> Saurabh Tewari
>


Count of distinct values in faceting.

2015-08-11 Thread Modassar Ather
Hi,

The count of distinct values can be retrieved in the following ways. Please
note that the Solr version is 5.2.1.
1. Using cardinality=true.
2. Using the hll() facet function.

Kindly help me understand:
 1. How accurate are they comparatively, and which performs better with
millions of documents?
 2. Per my understanding the {!cardinality=1.0} option returns the most
accurate result. Is my understanding correct, and if yes is it 100%
accurate?
 3. How accurate is the result returned by the hll() function? (A sketch of
both request forms follows the stack trace below.)
 4. I am getting the following exception for the query:
q=field:query&stats=true&stats.field={!cardinality=1.0}field. The exception
is not seen once the cardinality is set to 0.9 or less.
The field is docVlaues enabled and indexed=false. The same exception I
tried to reproduce on a non docVlaues field but could not. Please help me
resolve the issue.
 ERROR - 2015-08-11 12:24:00.222; [core]
org.apache.solr.common.SolrException;
null:java.lang.ArrayIndexOutOfBoundsException: 3
at
net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
at
net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
at net.agkn.hll.HLL.toBytes(HLL.java:917)
at net.agkn.hll.HLL.toBytes(HLL.java:869)
at
org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
at
org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
at
org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
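
For reference, the two request forms being compared are (field and query
here are placeholders; both approaches are HyperLogLog-based estimates, and
the cardinality option takes a value from 0.0 to 1.0 that trades memory for
accuracy):

q=field:query&stats=true&stats.field={!cardinality=0.9}f
q=field:query&json.facet={distinct:"hll(f)"}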

Thanks,
Modassar


Re: Count of distinct values in faceting.

2015-08-11 Thread Modassar Ather
Please read docVlaues as docValues in my mail above.

Regards,
Modassar

On Tue, Aug 11, 2015 at 4:01 PM, Modassar Ather 
wrote:

> Hi,
>
> Count of distinct values can be retrieved by following ways. Please note
> that the Solr version is 5.2.1.
> 1. Using cardinality=true.
> 2. Using hll() facet function.
>
> Kindly help me understand:
>  1. How accurate are them comparatively and better performance wise with
> millions of documents?
>  2. Per my understanding the {!cardinality=1.0} returns the most accurate
> result. Is my understanding correct and if yes is it 100% accurate?
>  3. How accurate result is returned by hll() function?
>  4. I am getting following exception for the query :
> q=field:query&stats=true&stats.field={!cardinality=1.0}field. The exception
> is not seen once the cardinality is set to 0.9 or less.
> The field is docVlaues enabled and indexed=false. The same exception I
> tried to reproduce on non docVlaues field but could not. Please help me
> resolve the issue.
>  ERROR - 2015-08-11 12:24:00.222; [core]
> org.apache.solr.common.SolrException;
> null:java.lang.ArrayIndexOutOfBoundsException: 3
> at
> net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
> at
> net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
> at net.agkn.hll.HLL.toBytes(HLL.java:917)
> at net.agkn.hll.HLL.toBytes(HLL.java:869)
> at
> org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
> at
> org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
> at
> org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
> at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> at org.eclipse.jetty.server.Server.handle(Server.java:497)
> at
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
> at
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
> at java.lang.Thread.run(Thread.java:745)
>
> Thanks,
> Modassar
>


Re: Clarification on WordDelimiterFilter.

2015-08-13 Thread Modassar Ather
Thanks for your response Cario.
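
Applying Elaine's suggestion below, the filter from the first mail in this
thread would gain splitOnNumerics="0" so that "3d" stays a single token;
reindexing is required for it to take effect:

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0"
        preserveOriginal="1"/>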

On Wed, Aug 12, 2015 at 10:20 PM, Cario, Elaine <
elaine.ca...@wolterskluwer.com> wrote:

> Modassar,
>
> There are additional settings in WDFF that you can experiment with (google
> around for the javadocs for the filter).  Specific to your question, there
> is splitOnNumerics param, which might be defaulting to true ("1") causing
> terms like "3d" to get tokenized as "3" and "d".  If you set it to 0 it may
> correct the behavior you're seeing. (You'll need to re-index your content
> to see the effect).
>
> Also, the standard practice that I've seen is that settings which create
> additional tokens are usually only applied at index time, and not applied
> during query time analysis (on the theory that you've indexed all the
> different ways the user can search for a term, so there's no need to
> actually modify the query to get a match).
>
> -Original Message-
> From: Modassar Ather [mailto:modather1...@gmail.com]
> Sent: Friday, August 07, 2015 12:21 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Clarification on WordDelimiterFilter.
>
> Hi,
>
> Any suggestion will be really helpful. Kindly provide your inputs.
>
> Thanks,
> Modassar
>
> On Thu, Aug 6, 2015 at 2:06 PM, Modassar Ather 
> wrote:
>
> > I am using WordDelimiterFilter while indexing and searching both with
> > the following attributes. Parser used is edismax. Solr version is 5.2.1.
> >
> > <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
> >
> > During search some of the results returned are not wanted. Following
> > is the example.
> >
> > Search query: "3d image"
> > Search results with 3-d image/3 d image/1d image are also returned. As
> > per analysis page this is happening because of position increment in
> > the token as explained below.
> >
> > On the analysis page it shows the following four tokens for 3d and their
> > positions.
> > token   position
> > 3d      1
> > 3       1
> > 3d      1
> > d       2
> > image   3
> >
> > Another example is "1d obj*" returning results containing "d-object"
> > related result. This can bring a completely different search item.
> >
> > Here the token d is at position 2 which is causing the above matches.
> > Please help me understand why this position increment is done?
> > The position increment will also cause the "3d image" search fail on a
> > document containing "3d image" as the "d" comes at position 2.
> >
> > Kindly help me understand the best practices of using
> > WordDelimiterFilter or provide your inputs how we can resolve the issue.
> >
> > Thanks,
> > Modassar
> >
>


Query time out. Solr node goes down.

2015-08-17 Thread Modassar Ather
Hi,

I have a Solr cluster of 6 nodes which hosts around 200 GB of index on each
node. Solr version is 5.2.1.
When a huge query is fired, it times out (*The request took too long to
iterate over terms.*), which I can see in the log, but at the same time one
of the Solr nodes goes down and the logs on the Solr nodes start showing
the following exception:

*org.apache.solr.common.SolrException: no servers hosting shard.*

For some time the shards are not responsive and other queries are not
served till the node(s) are back again. This is fine, but what could be the
possible cause of the Solr node going down?
The other exception after the Solr node goes down is leader election
related, which is not a concern as there is no replica of the nodes.

Please provide your suggestions.

Thanks,
Modassar


Re: Query time out. Solr node goes down.

2015-08-17 Thread Modassar Ather
The servers have 32g memory each. Solr JVM memory is set to -Xms20g
-Xmx24g. There are no OOM in logs.

Regards,
Modassar

On Mon, Aug 17, 2015 at 5:06 PM, Upayavira  wrote:

> How much memory does each server have? How much of that memory is
> assigned to the JVM? Is anything reported in the logs (e.g.
> OutOfMemoryError)?
>
> On Mon, Aug 17, 2015, at 12:29 PM, Modassar Ather wrote:
> > Hi,
> >
> > I have a Solr cluster which hosts around 200 GB of index on each node and
> > are 6 nodes. Solr version is 5.2.1.
> > When a huge query is fired, it times out *(The request took too long to
> > iterate over terms.)*, which I can see in the log but at same time the
> > one
> > of the Solr node goes down and the logs on the Solr nodes starts showing
> >
> >
> > *following exception.org.apache.solr.common.SolrException: no servers
> > hosting shard.*
> > For sometime the shards are not responsive and other queries are not
> > searched till the node(s) are back again. This is fine but what could be
> > the possible cause of solr node going down.
> > The other exception after the solr node goes down is leader election
> > related which is not a concern as there is no replica of the nodes.
> >
> > Please provide your suggestions.
> >
> > Thanks,
> > Modassar
>


Re: Query time out. Solr node goes down.

2015-08-17 Thread Modassar Ather
Thanks Upayavira for your inputs. The Java version is 1.7.0_79.

On Mon, Aug 17, 2015 at 5:57 PM, Upayavira  wrote:

> Hoping that others will chime in here with other ideas. Have you,
> though, tried reducing the JVM memory, leaving more available for the OS
> disk cache? Having said that, I'd expect that to improve performance,
> not to cause JVM crashes.
>
> It might also help to know what version of Java you are running.
>
> Upayavira
>
> On Mon, Aug 17, 2015, at 12:45 PM, Modassar Ather wrote:
> > The servers have 32g memory each. Solr JVM memory is set to -Xms20g
> > -Xmx24g. There are no OOM in logs.
> >
> > Regards,
> > Modassar
> >
> > On Mon, Aug 17, 2015 at 5:06 PM, Upayavira  wrote:
> >
> > > How much memory does each server have? How much of that memory is
> > > assigned to the JVM? Is anything reported in the logs (e.g.
> > > OutOfMemoryError)?
> > >
> > > On Mon, Aug 17, 2015, at 12:29 PM, Modassar Ather wrote:
> > > > Hi,
> > > >
> > > > I have a Solr cluster which hosts around 200 GB of index on each
> node and
> > > > are 6 nodes. Solr version is 5.2.1.
> > > > When a huge query is fired, it times out *(The request took too long
> to
> > > > iterate over terms.)*, which I can see in the log but at same time
> the
> > > > one
> > > > of the Solr node goes down and the logs on the Solr nodes starts
> showing
> > > >
> > > >
> > > > *following exception.org.apache.solr.common.SolrException: no servers
> > > > hosting shard.*
> > > > For sometime the shards are not responsive and other queries are not
> > > > searched till the node(s) are back again. This is fine but what
> could be
> > > > the possible cause of solr node going down.
> > > > The other exception after the solr node goes down is leader election
> > > > related which is not a concern as there is no replica of the nodes.
> > > >
> > > > Please provide your suggestions.
> > > >
> > > > Thanks,
> > > > Modassar
> > >
>


Re: Query time out. Solr node goes down.

2015-08-17 Thread Modassar Ather
Shawn! The container I am using is the Jetty that ships with Solr, and the
JVM settings I am using are the defaults from the Solr startup scripts.
Yes, I have changed the JVM memory setting as mentioned.
Kindly help me understand: even if there is a GC pause, why would the Solr
node go down? At least for other queries it should not throw
*org.apache.solr.common.SolrException: no servers hosting shard.*
Why would the node throw the above exception just because a huge query
timed out or took a lot of resources? Kindly help me understand in what
conditions such an exception can arise, as I am not fully aware of it.

Daniel! The error logs do not say if it was a JVM crash or just Solr. But
from the exception I understand that it might have gone into a state from
which it recovered after some time. I did not restart Solr.
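
One thing worth checking (an assumption on my part, consistent with the
SessionExpiredException reported later in this thread): a GC pause longer
than zkClientTimeout makes ZooKeeper expire the node's session, the node is
then marked down, and other queries fail with "no servers hosting shard"
until it re-registers. The timeout lives in solr.xml:

<solrcloud>
  <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
</solrcloud>

Raising it only masks the symptom; keeping GC pauses short is the real fix.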

On Mon, Aug 17, 2015 at 10:12 PM, Daniel Collins 
wrote:

> When you say "the solr node goes down", what do you mean by that? From your
> comment on the logs, you obviously lose the solr core at best (you do
> realize only having a single replica is inherently susceptible to failure,
> right?)
> But do you mean the Solr Core drops out of the collection (ZK timeout), the
> JVM stops, the whole machine crashes?
>
> On 17 August 2015 at 14:17, Shawn Heisey  wrote:
>
> > On 8/17/2015 5:45 AM, Modassar Ather wrote:
> > > The servers have 32g memory each. Solr JVM memory is set to -Xms20g
> > > -Xmx24g. There are no OOM in logs.
> >
> > Are you starting Solr 5.2.1 with the included start script, or have you
> > installed it into another container?
> >
> > Assuming you're using the download's "bin/solr" script, that will
> > normally set Xms and Xmx to the same value, so if you have overridden
> > the memory settings such that you can have different values in Xms and
> > Xmx, have you also overridden the garbage collection parameters?  If you
> > have, what are they set to now?  You can see all arguments used on
> > startup in the "JVM" section of the admin UI dashboard.
> >
> > If you've installed in an entirely different container, or you have
> > overridden the garbage collection settings, then a 24GB heap might have
> > extreme garbage collection pauses, lasting long enough to exceed the
> > timeout.
> >
> > Giving 24 out of 32GB to Solr will mean that there is only (at most) 8GB
> > left over for caching the index.  With 200GB of index, this is nowhere
> > near enough, and is another likely source of Solr performance problems
> > that cause timeouts.  This is what Upayavira was referring to in his
> > reply.  For good performance with 200GB of index, you may need a lot
> > more than 32GB of total RAM.
> >
> > https://wiki.apache.org/solr/SolrPerformanceProblems
> >
> > This wiki page also describes how you can use jconsole to judge how much
> > heap you actually need.  24GB may be too much.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: Query time out. Solr node goes down.

2015-08-17 Thread Modassar Ather
I tried to profile the memory of each Solr node. I can see GC activity
rising as high as 98%, and there are many instances where it goes above 10%.
On one of the Solr nodes I can see it reaching 45%.
Memory is fully used and heap usage has reached its maximum, which is
set to 24g. During other searches I can see the error
*org.apache.solr.common.SolrException: no servers hosting shard.*
A few nodes are in the gone state. There are many instances of
*org.apache.solr.common.SolrException:
org.apache.zookeeper.KeeperException$SessionExpiredException.*
The GC logs show very busy garbage collection. Please provide your inputs.
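
For reference, a minimal sketch of how the GC activity above can be sampled
with stock JDK tooling (the PID and log path are placeholders):

# jstat ships with the JDK; sample heap occupancy and GC time every second
jstat -gcutil <solr-pid> 1000

# standard HotSpot flags to capture GC logs for later analysis
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/path/to/solr_gc.log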

On Tue, Aug 18, 2015 at 10:38 AM, Modassar Ather 
wrote:

> Shawn! The container I am using is jetty only and the JVM setting I am
> using is the default one which comes with Solr startup scripts. Yes I have
> changed the JVM memory setting as mentioned.
> Kindly help me understand: even if there is a GC pause, why would the solr
> node go down. At least for other queries it should not throw exception of
> *org.apache.solr.common.SolrException: no servers hosting shard.*
> Why would the node throw the above exception just because a huge query timed
> out or may have taken a lot of resources. Kindly help me understand in what
> conditions such exception can arise as I am not fully aware of it.
>
> Daniel! The error logs do not say if it was JVM crash or just solr. But by
> the exception I understand that it might have gone to a state from where it
> recovered after sometime. I did not restart the Solr.
>
> On Mon, Aug 17, 2015 at 10:12 PM, Daniel Collins 
> wrote:
>
>> When you say "the solr node goes down", what do you mean by that? From
>> your
>> comment on the logs, you obviously lose the solr core at best (you do
>> realize only having a single replica is inherently susceptible to failure,
>> right?)
>> But do you mean the Solr Core drops out of the collection (ZK timeout),
>> the
>> JVM stops, the whole machine crashes?
>>
>> On 17 August 2015 at 14:17, Shawn Heisey  wrote:
>>
>> > On 8/17/2015 5:45 AM, Modassar Ather wrote:
>> > > The servers have 32g memory each. Solr JVM memory is set to -Xms20g
>> > > -Xmx24g. There are no OOM in logs.
>> >
>> > Are you starting Solr 5.2.1 with the included start script, or have you
>> > installed it into another container?
>> >
>> > Assuming you're using the download's "bin/solr" script, that will
>> > normally set Xms and Xmx to the same value, so if you have overridden
>> > the memory settings such that you can have different values in Xms and
>> > Xmx, have you also overridden the garbage collection parameters?  If you
>> > have, what are they set to now?  You can see all arguments used on
>> > startup in the "JVM" section of the admin UI dashboard.
>> >
>> > If you've installed in an entirely different container, or you have
>> > overridden the garbage collection settings, then a 24GB heap might have
>> > extreme garbage collection pauses, lasting long enough to exceed the
>> > timeout.
>> >
>> > Giving 24 out of 32GB to Solr will mean that there is only (at most) 8GB
>> > left over for caching the index.  With 200GB of index, this is nowhere
>> > near enough, and is another likely source of Solr performance problems
>> > that cause timeouts.  This is what Upayavira was referring to in his
>> > reply.  For good performance with 200GB of index, you may need a lot
>> > more than 32GB of total RAM.
>> >
>> > https://wiki.apache.org/solr/SolrPerformanceProblems
>> >
>> > This wiki page also describes how you can use jconsole to judge how much
>> > heap you actually need.  24GB may be too much.
>> >
>> > Thanks,
>> > Shawn
>> >
>> >
>>
>
>


Re: Exception while using {!cardinality=1.0}.

2015-08-17 Thread Modassar Ather
Any suggestions please.

Regards,
Modassar

On Thu, Aug 13, 2015 at 4:25 PM, Modassar Ather 
wrote:

> Hi,
>
> I am getting following exception for the query :
> *q=field:query&stats=true&stats.field={!cardinality=1.0}field*. The
> exception is not seen once the cardinality is set to 0.9 or less.
> The field is *docValues enabled* and *indexed=false*. The same exception
> I tried to reproduce on non docValues field but could not. Please help me
> resolve the issue.
>
> ERROR - 2015-08-11 12:24:00.222; [core]
> org.apache.solr.common.SolrException;
> null:java.lang.ArrayIndexOutOfBoundsException: 3
> at
> net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
> at
> net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
> at net.agkn.hll.HLL.toBytes(HLL.java:917)
> at net.agkn.hll.HLL.toBytes(HLL.java:869)
> at
> org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
> at
> org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
> at
> org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
> at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> at org.eclipse.jetty.server.Server.handle(Server.java:497)
> at
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
> at
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
> at java.lang.Thread.run(Thread.java:745)
>
> Kindly let me know if I need to ask this on any of the related jira issue.
>
> Thanks,
> Modassar
>


Re: Query time out. Solr node goes down.

2015-08-18 Thread Modassar Ather
So Toke/Daniel, is the node showing *gone* on the SolrCloud dashboard
because of a GC pause, and is it actually not gone, just that ZK is unable
to get its correct state?
The issue is caused by a huge query with many wildcards and phrases in it.
As you can see, I mentioned (*The request took too long to iterate
over terms.*). So does that mean the terms being expanded are what
consumed the memory? I am just trying to understand what consumes so
much memory.
I am trying to reproduce the OOM by executing multiple queries in parallel
but have not been able to, although I am seeing memory usage for the Solr
JVM go above 90%. So what happens to the queries executed in parallel?
Do they wait for such a query, which is taking a lot of time and resources,
to time out or complete?
We also have migration to Java 8 on our to-do list and will try with
different GC settings.
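
For those GC experiments: the 5.x start scripts let the collector flags be
overridden through GC_TUNE in solr.in.sh; a minimal sketch, with
illustrative values rather than a recommendation:

# solr.in.sh -- illustrative CMS settings; tune against your own GC logs
GC_TUNE="-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:+UseCMSInitiatingOccupancyOnly"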



On Tue, Aug 18, 2015 at 2:08 PM, Daniel Collins 
wrote:

> Ah ok, its ZK timeout then
> (org.apache.zookeeper.KeeperException$SessionExpiredException)
> which is because of your GC pause.
>
> The page Shawn mentioned earlier has several links on how to investigate GC
> issues and some common GC settings, sounds like you need to tweak those.
>
> Generally speaking, I believe Java 8 is considered better for GC
> performance than 7, so you probably want to investigate that.  GC tuning is
> very dependent on the load on your system. You may be running close to the
> limit under normal load, and that 1 big query is enough to tip it over the
> edge.  We have seen similar issues from time to time. We are still running
> an older Java 7 build with G1GC which we found worked well for us (though
> CMS seems to be the general consensus on the list here), migrating to Java
> 8 is on our "list of things to do", so our settings are probably not that
> relevant.
>
>
> On 18 August 2015 at 09:04, Toke Eskildsen  wrote:
>
> > On Tue, 2015-08-18 at 10:38 +0530, Modassar Ather wrote:
> > > Kindly help me understand, even if there is a GC pause why the solr
> > > node will go down.
> >
> > If a stop-the-world GC is in progress, it is not possible for an
> > external service to know if this is because a GC is in progress or the
> > node is dead. If the GC takes longer than the relevant timeouts, the
> > external conclusion is that it is dead.
> >
> > In your next post you state that there is very heavy GC going on, so it
> > would seem that your main problem is that your heap is too small for
> > your setup.
> >
> > Getting OOM for a 200GB index with 24GB heap is not at all impossible,
> > but it is a bit of a red flag. If you have very high values for your
> > caches or perform faceting on a lot of different fields, that might be
> > the cause. If you describe your setup in more detail, we might be able
> > to help find the cause for your relatively high heap requirement.
> >
> > - Toke Eskildsen, State and University Library, Denmark
> >
> >
> >
>


Re: Exception while using {!cardinality=1.0}.

2015-08-18 Thread Modassar Ather
Ahmet/Chris! Thanks for your replies.

Ahmet, I think "net.agkn.hll.serialization" is used by Solr's hll() function
implementation.

Chris I will try to create sample data and create a jira ticket with
details.

Regards,
Modassar


On Tue, Aug 18, 2015 at 9:58 PM, Chris Hostetter 
wrote:

>
> : > I am getting following exception for the query :
> : > *q=field:query&stats=true&stats.field={!cardinality=1.0}field*. The
> : > exception is not seen once the cardinality is set to 0.9 or less.
> : > The field is *docValues enabled* and *indexed=false*. The same
> exception
> : > I tried to reproduce on non docValues field but could not. Please help
> me
> : > resolve the issue.
>
> Hmmm... this is a weird error ... but you haven't really given us enough
> information to really guess what the root cause is
>
> - What was the datatype of the field(s)?
> - Did you have the exact same data in both fields?
> - Are these multivalued fields?
> - Did your "real" query actually compute stats on the same field you had
>   done your main term query on?
>
> I know we have some tests of this basic situation, and i tried to do some
> more manual testing to spot check, but i can't reproduce.
>
> If you can please provide a full copy of the data (as csv or xml or
> whatever) to build your index along with all solr configs and the exact
> queries to reproduce that would really help get to the bottom of this --
> if you can't provide all the data, then can you at least reproduce with a
> small set of sample data?
>
> either way: please file a new jira issue and attach as much detail as you
> can -- this URL has a lot of great tips on the types of data we need to be
> able to get to the bottom of bugs...
>
> https://wiki.apache.org/solr/UsingMailingLists
>
>
>
>
>
> : > ERROR - 2015-08-11 12:24:00.222; [core]
> : > org.apache.solr.common.SolrException;
> : > null:java.lang.ArrayIndexOutOfBoundsException: 3
> : > at
> : >
> net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
> : > at
> : > net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
> : > at net.agkn.hll.HLL.toBytes(HLL.java:917)
> : > at net.agkn.hll.HLL.toBytes(HLL.java:869)
> : > at
> : >
> org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
> : > at
> : >
> org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
> : > at
> : >
> org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
> : > at
> : >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
> : > at
> : >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> : > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
> : > at
> : > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
> : > at
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
> : > at
> : >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
> : > at
> : >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
> : > at
> : >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> : > at
> : >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> : > at
> : >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> : > at
> : >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> : > at
> : >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> : > at
> : >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> : > at
> : >
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> : > at
> : >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> : > at
> : >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
> : > at
> : >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> : > at
> : >
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
> : > at
> : >
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
> : > at
> : >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> : > at org.eclipse.jetty.server.Server.handle(Server.java:497)
> : > at
> : > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
> : > at
> : >
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
> : > at
> : >
> org.eclipse.jetty.io.AbstractConnecti

Re: How to add second Zookeeper to same machine?

2015-08-20 Thread Modassar Ather
You might want to look into the following documentation. These documents
explain how to set up a ZooKeeper ensemble and cover ZooKeeper
administration.

https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble
http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html
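
With only two machines, two of the three ZooKeeper instances have to share
one box, each with its own dataDir, clientPort, and myid; a hedged zoo.cfg
sketch (hostnames and paths are placeholders):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper1   # the second instance on host1 needs its own dataDir
clientPort=2181               # and its own clientPort, e.g. 2182
server.1=host1:2888:3888
server.2=host1:2889:3889
server.3=host2:2888:3888

Note that with two of the three instances on one machine, losing that
machine still loses quorum, so this mainly protects against process-level
failures.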

Regards,
Modassar

On Thu, Aug 20, 2015 at 1:19 PM, Merlin Morgenstern <
merlin.morgenst...@gmail.com> wrote:

> I am running 2 dedicated servers on which I plan to install Solrcloud with
> 2 solr nodes and 3 ZK.
>
> From Stackoverflow I learned that the best method for autostarting
> zookeeper on ubuntu 14.04 is to install it via "apt-get install
> zookeeperd". I have that running now.
>
> How could I add a second zookeeper to one machine? The config only allows
> one. Or if this is not possible, what would be the recommended way to get 3
> ZK on 2 dedicated running?
>
> I have followed a tutorial where I have that setup available va bash
> script, but it seems that the ubuntu zookeeper setup is robust as it offers
> zombie processes and a startup script as well.
>
> Thank you for any help on this.
>


Re: How to configure solr to not bind at 8983

2015-08-20 Thread Modassar Ather
I think you need to add the port number in solr.xml too, under the hostPort
attribute.

STOP.PORT is SOLR_PORT minus 1000 and is set in the bin/solr script.
As far as I understand this cannot be changed, but I am not sure.
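
A hedged sketch of the pieces involved (the stop-port arithmetic mirrors
roughly what the 5.x bin/solr script does; exact variable names may differ
by version):

# solr.in.sh
SOLR_PORT=9000

# bin/solr derives the stop port from the Solr port, roughly:
STOP_PORT=`expr $SOLR_PORT - 1000`

# solr.xml picks the port up through the standard indirection
<int name="hostPort">${jetty.port:8983}</int>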

Regards,
Modassar

On Thu, Aug 20, 2015 at 11:39 AM, Samy Ateia  wrote:

> I changed the solr listen port in the solr.in.sh file in my solr home
> directory by setting the variable: SOLR_PORT=.
> But Solr is still trying to also listen on 8983 because it gets started
> with the -DSTOP.PORT=8983 variable.
>
> What is this -DSTOP.PORT variable for and where should I configure it?
>
> I ran the install_solr_service.sh script to setup solr and changed the
> SOLR_PORT afterwards.
>
> best regards.
>
> Samy
>


Re: Exception while using {!cardinality=1.0}.

2015-08-20 Thread Modassar Ather
Hi Chris,

I have raised https://issues.apache.org/jira/browse/SOLR-7954 for the issue.

- What was the datatype of the field(s)?
The fields that pass are of type string; the fields that fail are of type
string with docValues enabled. The definitions themselves were stripped,
and a reconstruction of both appears after these answers.

- Did you have the exact same data in both fields?
Both the fields are of string type.

- Are these multivalued fields?
Both the fields are multivalued.

- Did your "real" query actually compute stats on the same field you had
  done your main term query on?
I did not fully get the question, but as far as I understood and verified in
the Solr log, the stats are computed on the field given with
stats.field={!cardinality=1.0}field.
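
A hedged reconstruction of the two stripped field definitions, built from
the attribute fragments that survive in the quotes below; the names "field"
and "field1" are illustrative, with field1 being the one that fails:

<!-- the fields that pass (attributes as quoted) -->
<field name="field" type="string" multiValued="true" stored="false"
       omitNorms="true"/>
<!-- the field that fails: docValues enabled, not indexed -->
<field name="field1" type="string" multiValued="true" stored="false"
       indexed="false" docValues="true"/>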

Regards,
Modassar

On Wed, Aug 19, 2015 at 10:24 AM, Modassar Ather 
wrote:

> Ahmet/Chris! Thanks for your replies.
>
> Ahmet I think "net.agkn.hll.serialization" is used by hll() function
> implementation of Solr.
>
> Chris I will try to create sample data and create a jira ticket with
> details.
>
> Regards,
> Modassar
>
>
> On Tue, Aug 18, 2015 at 9:58 PM, Chris Hostetter  > wrote:
>
>>
>> : > I am getting following exception for the query :
>> : > *q=field:query&stats=true&stats.field={!cardinality=1.0}field*. The
>> : > exception is not seen once the cardinality is set to 0.9 or less.
>> : > The field is *docValues enabled* and *indexed=false*. The same
>> exception
>> : > I tried to reproduce on non docValues field but could not. Please
>> help me
>> : > resolve the issue.
>>
>> Hmmm... this is a weird error ... but you haven't really given us enough
>> information to really guess what the root cause is
>>
>> - What was the datatype of the field(s)?
>> - Did you have the exact same data in both fields?
>> - Are these multivalued fields?
>> - Did your "real" query actually compute stats on the same field you had
>>   done your main term query on?
>>
>> I know we have some tests of this basic situation, and i tried to do some
>> more manual testing to spot check, but i can't reproduce.
>>
>> If you can please provide a full copy of the data (as csv or xml or
>> whatever) to build your index along with all solr configs and the exact
>> queries to reproduce that would really help get to the bottom of this --
>> if you can't provide all the data, then can you at least reproduce with a
>> small set of sample data?
>>
>> either way: please file a new jira issue and attach as much detail as you
>> can -- this URL has a lot of great tips on the types of data we need to be
>> able to get to the bottom of bugs...
>>
>> https://wiki.apache.org/solr/UsingMailingLists
>>
>>
>>
>>
>>
>> : > ERROR - 2015-08-11 12:24:00.222; [core]
>> : > org.apache.solr.common.SolrException;
>> : > null:java.lang.ArrayIndexOutOfBoundsException: 3
>> : > at
>> : >
>> net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
>> : > at
>> : > net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
>> : > at net.agkn.hll.HLL.toBytes(HLL.java:917)
>> : > at net.agkn.hll.HLL.toBytes(HLL.java:869)
>> : > at
>> : >
>> org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
>> : > at
>> : >
>> org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
>> : > at
>> : >
>> org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
>> : > at
>> : >
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
>> : > at
>> : >
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>> : > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
>> : > at
>> : > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
>> : > at
>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
>> : > at
>> : >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
>> : > at
>> : >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
>> : > at
>> : >
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler

Re: Exception while using {!cardinality=1.0}.

2015-08-21 Thread Modassar Ather
Hi Chris,

I have added a method in SOLR-7954 jira ticket to index sample data which
can be used to reproduce the issue.

Regards,
Modassar

On Fri, Aug 21, 2015 at 11:49 AM, Modassar Ather 
wrote:

> Hi Chris,
>
> I have raised https://issues.apache.org/jira/browse/SOLR-7954 for the
> issue.
>
> - What was the datatype of the field(s)?
> The data type of fields which passes are of type string with following
> attributes.
>  stored="false" omitNorms="true"/>
> The data type of the fields which fails is of type string with docvalues
> enabled and have following attributes.
>  stored="false" indexed="false" docValues="true"/>
>
> - Did you have the exact same data in both fields?
> Both the field are string type.
>
> - Are these multivalued fields?
> Both the fields are multivalued.
>
> - Did your "real" query actually compute stats on the same field you had
>   done your main term query on?
> I did not get the question but as much I understood and verified in the
> Solr log the stat is computed on the field given with
> stats.field={!cardinality=1.0}field.
>
> Regards,
> Modassar
>
> On Wed, Aug 19, 2015 at 10:24 AM, Modassar Ather 
> wrote:
>
>> Ahmet/Chris! Thanks for your replies.
>>
>> Ahmet I think "net.agkn.hll.serialization" is used by hll() function
>> implementation of Solr.
>>
>> Chris I will try to create sample data and create a jira ticket with
>> details.
>>
>> Regards,
>> Modassar
>>
>>
>> On Tue, Aug 18, 2015 at 9:58 PM, Chris Hostetter <
>> hossman_luc...@fucit.org> wrote:
>>
>>>
>>> : > I am getting following exception for the query :
>>> : > *q=field:query&stats=true&stats.field={!cardinality=1.0}field*. The
>>> : > exception is not seen once the cardinality is set to 0.9 or less.
>>> : > The field is *docValues enabled* and *indexed=false*. The same
>>> exception
>>> : > I tried to reproduce on non docValues field but could not. Please
>>> help me
>>> : > resolve the issue.
>>>
>>> Hmmm... this is a weird error ... but you haven't really given us enough
>>> information to really guess what the root cause is
>>>
>>> - What was the datatype of the field(s)?
>>> - Did you have the exact same data in both fields?
>>> - Are these multivalued fields?
>>> - Did your "real" query actually compute stats on the same field you had
>>>   done your main term query on?
>>>
>>> I know we have some tests of this basic situation, and i tried to do some
>>> more manual testing to spot check, but i can't reproduce.
>>>
>>> If you can please provide a full copy of the data (as csv or xml or
>>> whatever) to build your index along with all solr configs and the exact
>>> queries to reproduce that would really help get to the bottom of this --
>>> if you can't provide all the data, then can you at least reproduce with a
>>> small set of sample data?
>>>
>>> either way: please file a new jira issue and attach as much detail as you
>>> can -- this URL has a lot of great tips on the types of data we need to
>>> be
>>> able to get to the bottom of bugs...
>>>
>>> https://wiki.apache.org/solr/UsingMailingLists
>>>
>>>
>>>
>>>
>>>
>>> : > ERROR - 2015-08-11 12:24:00.222; [core]
>>> : > org.apache.solr.common.SolrException;
>>> : > null:java.lang.ArrayIndexOutOfBoundsException: 3
>>> : > at
>>> : >
>>> net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
>>> : > at
>>> : > net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
>>> : > at net.agkn.hll.HLL.toBytes(HLL.java:917)
>>> : > at net.agkn.hll.HLL.toBytes(HLL.java:869)
>>> : > at
>>> : >
>>> org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
>>> : > at
>>> : >
>>> org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
>>> : > at
>>> : >
>>> org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
>>> : > at
>>> : >
>>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
>>> : &g

Re: Exception while using {!cardinality=1.0}.

2015-08-23 Thread Modassar Ather
- Did you have the exact same data in both fields?
No, the data is not the same.

- Did your "real" query actually compute stats on the same field you had
  done your main term query on?

The query field is different; I missed stating that clearly and will modify
the jira accordingly.
So the query can be
q=anyfield:query&stats=true&stats.field={!cardinality=1.0}field
For my better understanding of this feature, can you please explain how
having the same field for the query and the stats can cause an issue?

I haven't had a chance to review the jira in depth or actually run your
code with those configs -- but if you get a chance before i do, please
re-review the code & configs you posted and see if you can reproduce using
the *exact* same data in two different fields, and if the choice of query
makes a difference in the behavior you see.

Will try to reproduce the same as you have mentioned and revert with
details.

Thanks,
Modassar

On Sat, Aug 22, 2015 at 3:43 AM, Chris Hostetter 
wrote:

>
> : - Did you have the exact same data in both fields?
> : Both the field are string type.
>
> that's not the question i asked.
>
> is the data *in* these fields (ie: the actual value of each field for each
> document) the same for both of the fields?  This is important to figuring
> out if the root problem that having docValues (or not having docValues)
> causes a problem, or is the root problem that having certain kinds of
> *data* in a string field (regardless of docValues) can cause this problem.
>
> Skimming the sample code you posted to SOLR-7954 you are definitely
> putting different data into "field" than you put into "field1" so it's
> still not clear what the problem is.
>
> : - Did your "real" query actually compute stats on the same field you had
> :   done your main term query on?
> : I did not get the question but as much I understood and verified in the
> : Solr log the stat is computed on the field given with
> : stats.field={!cardinality=1.0}field.
>
> the question is specific to the example query you mentioned before and
> again in your description in SOLR-7954.  They show that the same field
> name you are computing stats on ("field") is also used in your main query
> as a constraint on the documents ("q=field:query") which is an odd and
> very special edge case that may be pertinent to the problem you are
> seeing.  Depending on what data you index, that might easily only match 1
> document -- in the case of the test code you put in jira, exactly 0
> documents since you never index the text "query" into field "field" for
> any document)
>
>
> I haven't had a chance to review the jira in depth or actually run your
> code with those configs -- but if you get a chance before i do, please
> re-review the code & configs you posted and see if you can reproduce using
> the *exact* same data in two different fields, and if the choice of query
> makes a difference in the behavior you see.
>
>
> :
> : Regards,
> : Modassar
> :
> : On Wed, Aug 19, 2015 at 10:24 AM, Modassar Ather  >
> : wrote:
> :
> : > Ahmet/Chris! Thanks for your replies.
> : >
> : > Ahmet I think "net.agkn.hll.serialization" is used by hll() function
> : > implementation of Solr.
> : >
> : > Chris I will try to create sample data and create a jira ticket with
> : > details.
> : >
> : > Regards,
> : > Modassar
> : >
> : >
> : > On Tue, Aug 18, 2015 at 9:58 PM, Chris Hostetter <
> hossman_luc...@fucit.org
> : > > wrote:
> : >
> : >>
> : >> : > I am getting following exception for the query :
> : >> : > *q=field:query&stats=true&stats.field={!cardinality=1.0}field*.
> The
> : >> : > exception is not seen once the cardinality is set to 0.9 or less.
> : >> : > The field is *docValues enabled* and *indexed=false*. The same
> : >> exception
> : >> : > I tried to reproduce on non docValues field but could not. Please
> : >> help me
> : >> : > resolve the issue.
> : >>
> : >> Hmmm... this is a weird error ... but you haven't really given us
> enough
> : >> information to really guess what the root cause is
> : >>
> : >> - What was the datatype of the field(s)?
> : >> - Did you have the exact same data in both fields?
> : >> - Are these multivalued fields?
> : >> - Did your "real" query actually compute stats on the same field you
> had
> : >>   done your main term query on?
> : >>
> : >> I know we have some tests of this basic situation, and i tried to do
> some
> : >

Query timeAllowed and its behavior.

2015-08-25 Thread Modassar Ather
Hi,

Kindly help me understand the timeAllowed query attribute. The following
is set in solrconfig.xml.
30

Does this setting stop the query from running after timeAllowed is
reached? If not, is there a way to stop it, as it will otherwise occupy
resources in the background for no benefit.
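
For reference, a hedged sketch of the two usual places timeAllowed is set
(the stripped element above presumably carried a millisecond value; 30000
is an assumption):

<!-- solrconfig.xml: a default for the request handler -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <int name="timeAllowed">30000</int>
  </lst>
</requestHandler>

<!-- or per request -->
q=field:query&timeAllowed=30000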

Thanks,
Modassar


Re: Query timeAllowed and its behavior.

2015-08-25 Thread Modassar Ather
Thanks for your response, Jonathon.

Please correct me if I am wrong on the following points (see the sketch
after this list).
   - the query actually ceases to run once timeAllowed is reached and
releases all the resources.
   - query expansion is stopped and the query's execution is terminated,
releasing all the resources.
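
For the mechanics behind the first point, a minimal Lucene-level sketch of
the collector Jonathon mentions; this is a simplification of what Solr does
internally per shard, not its actual code path:

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TimeLimitingCollector;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.util.Counter;

static TopDocs searchWithTimeout(IndexSearcher searcher, Query query,
                                 long millisAllowed) throws IOException {
    // Wrap a normal collector; the global clock ticks in milliseconds.
    Counter clock = TimeLimitingCollector.getGlobalCounter();
    TopScoreDocCollector top = TopScoreDocCollector.create(10);
    TimeLimitingCollector limited =
        new TimeLimitingCollector(top, clock, millisAllowed);
    try {
        searcher.search(query, limited);
    } catch (TimeLimitingCollector.TimeExceededException e) {
        // Collection stops here; whatever was gathered so far is kept.
    }
    return top.topDocs(); // possibly partial results
}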

Thanks,
Modassar

On Tue, Aug 25, 2015 at 4:46 PM, Jonathon Marks (BLOOMBERG/ LONDON) <
jmark...@bloomberg.net> wrote:

> timeAllowed applies to the time taken by the collector in each shard
> (TimeLimitingCollector). Once timeAllowed is exceeded the collector
> terminates early, returning any partial results it has and freeing the
> resources it was using.
> From Solr 5.0 timeAllowed also applies to the query expansion phase and
> SolrClient request retry.
>
> From: solr-user@lucene.apache.org At: Aug 25 2015 10:18:07
> Subject: Re:Query timeAllowed and its behavior.
>
> Hi,
>
> Kindly help me understand the query time allowed attribute. The following
> is set in solrconfig.xml.
> 30
>
> Does this setting stop the query from running after the timeAllowed is
> reached? If not is there a way to stop it as it will occupy resources in
> background for no benefit.
>
> Thanks,
> Modassar
>
>
>


Behavior of grouping on a field with same value spread across shards.

2015-08-25 Thread Modassar Ather
Hi,

As per my understanding, to group on a field all documents with the same
value in the field have to be in the same shard.

Can we group by a field where the documents with the same value in that
field will be distributed across shards?
Please let me know the limitations for such fields: features that are not
available, or performance issues?

Thanks,
Modassar


Re: Behavior of grouping on a field with same value spread across shards.

2015-08-25 Thread Modassar Ather
Thanks Erick,

I saw the link. So does the grouping functionality work fine in
distributed search except for the two cases mentioned in the link?
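
For those two cases (group.ngroups and group.facet), all documents of a
group must live on one shard; one hedged way to arrange that is compositeId
routing, using the group value as the uniqueKey prefix. This assumes the
collection uses the default compositeId router, and the field names and id
values below are illustrative:

# index time: documents sharing a prefix land on the same shard
id = IBM!doc1
id = IBM!doc2

# query time
q=network&group=true&group.field=company&group.ngroups=true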

Regards,
Modassar

On Tue, Aug 25, 2015 at 10:40 PM, Erick Erickson 
wrote:

> That's not really the case. Perhaps you're confusing
> group.ngroups and group.facet with just grouping?
>
> See the ref guide:
>
> https://cwiki.apache.org/confluence/display/solr/Result+Grouping#ResultGrouping-DistributedResultGroupingCaveats
>
> Best,
> Erick
>
> On Tue, Aug 25, 2015 at 4:51 AM, Modassar Ather 
> wrote:
> > Hi,
> >
> > As per my understanding, to group on a field all documents with the same
> > value in the field have to be in the same shard.
> >
> > Can we group by a field where the documents with the same value in that
> > field will be distributed across shards?
> > Please let me know what are the limitations, feature not available or
> > performance issues for such fields?
> >
> > Thanks,
> > Modassar
>


Re: Behavior of grouping on a field with same value spread across shards.

2015-08-26 Thread Modassar Ather
Thanks Erick.

On Wed, Aug 26, 2015 at 12:11 PM, Erick Erickson 
wrote:

> That should be the case.
>
> Best,
> Erick
>
> On Tue, Aug 25, 2015 at 8:55 PM, Modassar Ather 
> wrote:
> > Thanks Erick,
> >
> > I saw the link. So is it that the grouping functionality works fine in
> > distributed search except the two cases mentioned in the link?
> >
> > Regards,
> > Modassar
> >
> > On Tue, Aug 25, 2015 at 10:40 PM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> >> That's not really the case. Perhaps you're confusing
> >> group.ngroups and group.facet with just grouping?
> >>
> >> See the ref guide:
> >>
> >>
> https://cwiki.apache.org/confluence/display/solr/Result+Grouping#ResultGrouping-DistributedResultGroupingCaveats
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Aug 25, 2015 at 4:51 AM, Modassar Ather  >
> >> wrote:
> >> > Hi,
> >> >
> >> > As per my understanding, to group on a field all documents with the
> same
> >> > value in the field have to be in the same shard.
> >> >
> >> > Can we group by a field where the documents with the same value in
> that
> >> > field will be distributed across shards?
> >> > Please let me know what are the limitations, feature not available or
> >> > performance issues for such fields?
> >> >
> >> > Thanks,
> >> > Modassar
> >>
>


Search results differs with sorting on pagination.

2015-09-09 Thread Modassar Ather
Hi,

Search results change every time the following query is hit. Please
note that it is a 7-shard cluster of Solr-5.2.1.

Query: q=network&start=50&rows=50&sort=f_sort asc&group=true&group.field=id

Following are the fields and their types in my schema.xml.







As per my understanding it seems to be an issue of ties among documents,
as when I added a new sort field like below, the result never changed across
multiple hits.
q=network&start=50&rows=50&sort=f_sort asc, score
asc&group=true&group.field=id

Kindly let me know if this is an issue or how this can be fixed.

Thanks,
Modassar


Re: Search results differs with sorting on pagination.

2015-09-09 Thread Modassar Ather
Thanks Erick. There are no replicas on my cluster and the indexing is one
time. No updates or additions are done to the index and the segments are
optimized at the end of indexing.
So is adding a secondary sort criterion the only solution for such a sorting
issue?
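
For reference, the deterministic variant breaks ties on the uniqueKey field,
as Erick suggests; a hedged example, assuming id is the uniqueKey:

q=network&start=50&rows=50&sort=f_sort asc,id asc&group=true&group.field=id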

Regards,
Modassar

On Wed, Sep 9, 2015 at 8:21 PM, Erick Erickson 
wrote:

> When the primary sort criteria is identical for two documents,
> then the _internal_ Lucene document ID is used to break the
> tie. The internal ID for two docs can be not only different, but
> in different _order_ on two separate shards. I'm assuming here
> that  each of your shards has multiple replicas and/or you're
> continuing to index to your cluster.
>
> The relative internal doc IDs for may change even relative to
> each other when segments get merged.
>
> So yes, if you are sorting by something that can be identical
> in documents, it's always best to specify a secondary sort
> criteria. It's not referenced unless there's a tie so it's
> not that expensive. People often use whatever field
> is defined for the uniqueKey since that's _guaranteed_ to
> never be the same for two docs.
>
> Best,
> Erick
>
> On Wed, Sep 9, 2015 at 1:45 AM, Modassar Ather 
> wrote:
> > Hi,
> >
> > Search results are changed every time the following query is hit. Please
> > note that it is 7 shard cluster of Solr-5.2.1.
> >
> > Query: q=network&start=50&rows=50&sort=f_sort
> asc&group=true&group.field=id
> >
> > Following are the fields and their types in my schema.xml.
> >
> >  > stored="false" omitNorms="true"/>
> >  > stored="false" indexed="true" docValues="true"/>
> >
> > 
> > 
> >
> > As per my understanding it seems to be the issue of tie among the
> document
> > as when I added a new sort field like below the result never changed
> across
> > multiple hits.
> > q=network&start=50&rows=50&sort=f_sort asc, score
> > asc&group=true&group.field=id
> >
> > Kindly let me know if this is an issue or how this can be fixed.
> >
> > Thanks,
> > Modassar
>


Re: Search results differs with sorting on pagination.

2015-09-10 Thread Modassar Ather
Upayavira! I added fl=id,score,[shard] and saw the shards changing in the
response every time; the response changes for different shards, but for
the same shard the result is the same across multiple hits.
When I add a secondary sort field, e.g. score, the shard remains the same
across hits.

On Thu, Sep 10, 2015 at 12:52 PM, Upayavira  wrote:

> Add fl=id,score,[shard] to your query, and show us the results of two
> differing executions.
>
> Perhaps we will be able to see the cause of the difference.
>
> Upayavira
>
> On Thu, Sep 10, 2015, at 05:35 AM, Modassar Ather wrote:
> > Thanks Erick. There are no replicas on my cluster and the indexing is one
> > time. No updates or additions are done to the index and the segments are
> > optimized at the end of indexing.
> > So adding a secondary sort criteria is the only solution for such issue
> > in
> > sort?
> >
> > Regards,
> > Modassar
> >
> > On Wed, Sep 9, 2015 at 8:21 PM, Erick Erickson 
> > wrote:
> >
> > > When the primary sort criteria is identical for two documents,
> > > then the _internal_ Lucene document ID is used to break the
> > > tie. The internal ID for two docs can be not only different, but
> > > in different _order_ on two separate shards. I'm assuming here
> > > that  each of your shards has multiple replicas and/or you're
> > > continuing to index to your cluster.
> > >
> > > The relative internal doc IDs for may change even relative to
> > > each other when segments get merged.
> > >
> > > So yes, if you are sorting by something that can be identical
> > > in documents, it's always best to specify a secondary sort
> > > criteria. It's not referenced unless there's a tie so it's
> > > not that expensive. People often use whatever field
> > > is defined for the uniqueKey since that's _guaranteed_ to
> > > never be the same for two docs.
> > >
> > > Best,
> > > Erick
> > >
> > > On Wed, Sep 9, 2015 at 1:45 AM, Modassar Ather  >
> > > wrote:
> > > > Hi,
> > > >
> > > > Search results are changed every time the following query is hit.
> Please
> > > > note that it is 7 shard cluster of Solr-5.2.1.
> > > >
> > > > Query: q=network&start=50&rows=50&sort=f_sort
> > > asc&group=true&group.field=id
> > > >
> > > > Following are the fields and their types in my schema.xml.
> > > >
> > > >  > > > stored="false" omitNorms="true"/>
> > > >  sortMissingLast="true"
> > > > stored="false" indexed="true" docValues="true"/>
> > > >
> > > > 
> > > > 
> > > >
> > > > As per my understanding it seems to be the issue of tie among the
> > > document
> > > > as when I added a new sort field like below the result never changed
> > > across
> > > > multiple hits.
> > > > q=network&start=50&rows=50&sort=f_sort asc, score
> > > > asc&group=true&group.field=id
> > > >
> > > > Kindly let me know if this is an issue or how this can be fixed.
> > > >
> > > > Thanks,
> > > > Modassar
> > >
>


Re: Search results differs with sorting on pagination.

2015-09-10 Thread Modassar Ather
To add to my previous observation: when the secondary sort field is added,
I see results from multiple shards in the response, and they remain the same
across hits.
Kindly help me understand this behavior. Why do the results change? My
understanding is that the results should first be merged from all shards
and then sorted by score.
But here I see that every time I hit the sort query I get results
from a different shard with different scores.

Thanks,
Modassar

On Thu, Sep 10, 2015 at 2:59 PM, Modassar Ather 
wrote:

> Upayavira! I add the fl=id,score,[shard] and saw the shards changing in
> the response every time and for different shards the response changes but
> for the same shard result is same on multiple hits.
> When I add secondary sort field e.g. score the shard remains same across
> hits.
>
> On Thu, Sep 10, 2015 at 12:52 PM, Upayavira  wrote:
>
>> Add fl=id,score,[shard] to your query, and show us the results of two
>> differing executions.
>>
>> Perhaps we will be able to see the cause of the difference.
>>
>> Upayavira
>>
>> On Thu, Sep 10, 2015, at 05:35 AM, Modassar Ather wrote:
>> > Thanks Erick. There are no replicas on my cluster and the indexing is
>> one
>> > time. No updates or additions are done to the index and the segments are
>> > optimized at the end of indexing.
>> > So adding a secondary sort criteria is the only solution for such issue
>> > in
>> > sort?
>> >
>> > Regards,
>> > Modassar
>> >
>> > On Wed, Sep 9, 2015 at 8:21 PM, Erick Erickson > >
>> > wrote:
>> >
>> > > When the primary sort criteria is identical for two documents,
>> > > then the _internal_ Lucene document ID is used to break the
>> > > tie. The internal ID for two docs can be not only different, but
>> > > in different _order_ on two separate shards. I'm assuming here
>> > > that  each of your shards has multiple replicas and/or you're
>> > > continuing to index to your cluster.
>> > >
>> > > The relative internal doc IDs for may change even relative to
>> > > each other when segments get merged.
>> > >
>> > > So yes, if you are sorting by something that can be identical
>> > > in documents, it's always best to specify a secondary sort
>> > > criteria. It's not referenced unless there's a tie so it's
>> > > not that expensive. People often use whatever field
>> > > is defined for the uniqueKey since that's _guaranteed_ to
>> > > never be the same for two docs.
>> > >
>> > > Best,
>> > > Erick
>> > >
>> > > On Wed, Sep 9, 2015 at 1:45 AM, Modassar Ather <
>> modather1...@gmail.com>
>> > > wrote:
>> > > > Hi,
>> > > >
>> > > > Search results are changed every time the following query is hit.
>> Please
>> > > > note that it is 7 shard cluster of Solr-5.2.1.
>> > > >
>> > > > Query: q=network&start=50&rows=50&sort=f_sort
>> > > asc&group=true&group.field=id
>> > > >
>> > > > Following are the fields and their types in my schema.xml.
>> > > >
>> > > > > sortMissingLast="true"
>> > > > stored="false" omitNorms="true"/>
>> > > > > sortMissingLast="true"
>> > > > stored="false" indexed="true" docValues="true"/>
>> > > >
>> > > > 
>> > > > 
>> > > >
>> > > > As per my understanding it seems to be the issue of tie among the
>> > > document
>> > > > as when I added a new sort field like below the result never changed
>> > > across
>> > > > multiple hits.
>> > > > q=network&start=50&rows=50&sort=f_sort asc, score
>> > > > asc&group=true&group.field=id
>> > > >
>> > > > Kindly let me know if this is an issue or how this can be fixed.
>> > > >
>> > > > Thanks,
>> > > > Modassar
>> > >
>>
>
>


Re: Search results differs with sorting on pagination.

2015-09-10 Thread Modassar Ather
If two documents come back from different
shards with the same score, the order would not be predictable

This is fine.

What I am not able to understand is that when I do not give a secondary
sort field, I get the results from one shard, which changes to another
shard on other hits. Here the results are always from a single shard.
E.g., in the first hit all the results are from shard1 and in the next hit
all the results are from shard2.

But when I add the secondary sort field I see results from multiple
shards, e.g. from both shard1 and shard2. This does not change across
multiple hits.

So please help me understand why the same result merging and aggregation
is not happening when a single sort field is given?

Regards,
Modassar



On Thu, Sep 10, 2015 at 5:03 PM, Upayavira  wrote:

> What scores are you getting? If two documents come back from different
> shards with the same score, the order would not be predictable -
> probably down to which shard responds first.
>
> Fix it with something like sort=score,timestamp or some other time
> related field.
>
> Upayavira
>
> On Thu, Sep 10, 2015, at 11:01 AM, Modassar Ather wrote:
> > To add to my previous observation I saw the response having results from
> > multiple shards when the secondary sort field is added and they remain
> > same
> > across hits.
> > Kindly help me understand this behavior. Why the results are changing as
> > I
> > understand that the result should be first clubbed together from all
> > shard
> > and then based on their score it should be sorted.
> > But here I see that every time I hit the sort query I am getting results
> > from different shard which has different scores.
> >
> > Thanks,
> > Modassar
> >
> > On Thu, Sep 10, 2015 at 2:59 PM, Modassar Ather 
> > wrote:
> >
> > > Upayavira! I add the fl=id,score,[shard] and saw the shards changing in
> > > the response every time and for different shards the response changes
> but
> > > for the same shard result is same on multiple hits.
> > > When I add secondary sort field e.g. score the shard remains same
> across
> > > hits.
> > >
> > > On Thu, Sep 10, 2015 at 12:52 PM, Upayavira  wrote:
> > >
> > >> Add fl=id,score,[shard] to your query, and show us the results of two
> > >> differing executions.
> > >>
> > >> Perhaps we will be able to see the cause of the difference.
> > >>
> > >> Upayavira
> > >>
> > >> On Thu, Sep 10, 2015, at 05:35 AM, Modassar Ather wrote:
> > >> > Thanks Erick. There are no replicas on my cluster and the indexing
> is
> > >> one
> > >> > time. No updates or additions are done to the index and the
> segments are
> > >> > optimized at the end of indexing.
> > >> > So adding a secondary sort criteria is the only solution for such
> issue
> > >> > in
> > >> > sort?
> > >> >
> > >> > Regards,
> > >> > Modassar
> > >> >
> > >> > On Wed, Sep 9, 2015 at 8:21 PM, Erick Erickson <
> erickerick...@gmail.com
> > >> >
> > >> > wrote:
> > >> >
> > >> > > When the primary sort criteria is identical for two documents,
> > >> > > then the _internal_ Lucene document ID is used to break the
> > >> > > tie. The internal ID for two docs can be not only different, but
> > >> > > in different _order_ on two separate shards. I'm assuming here
> > >> > > that  each of your shards has multiple replicas and/or you're
> > >> > > continuing to index to your cluster.
> > >> > >
> > >> > > The relative internal doc IDs for may change even relative to
> > >> > > each other when segments get merged.
> > >> > >
> > >> > > So yes, if you are sorting by something that can be identical
> > >> > > in documents, it's always best to specify a secondary sort
> > >> > > criteria. It's not referenced unless there's a tie so it's
> > >> > > not that expensive. People often use whatever field
> > >> > > is defined for  since that's _guaranteed_ to
> > >> > > never be the same for two docs.
> > >> > >
> > >> > > Best,
> > >> > > Erick
> > >> > >
> > >> > > On Wed, Sep 9, 2015 at 1:45 AM, Modassar Ather <
> > >> modather1...@gmail

Re: Search results differs with sorting on pagination.

2015-09-10 Thread Modassar Ather
Thanks Erick and Upayavira for the responses. One thing I noticed in the
context of a single sort field is that the scores differ in each shard's
response. No score is identical within the response of one shard, and they
differ from the responses of other shards too. I got the scores using
fl=score.

Regards,
Modassar

On Thu, Sep 10, 2015 at 8:45 PM, Erick Erickson 
wrote:

> First, if Upayavira's intuition is correct (and I'm guessing it is),
> then the behavior you're seeing is probably an accident of
> coding rather than intentional. I think the algorithm is something
> like this:
>
> Node1 gets the original query
> Node1 sends sub-queries out to each shard.
> As the results come back, they're sorted one by one into a final
> list.
>
> For simplicity, let's claim _all_ the docs have the exact same score.
> The _first_
> shard's response will completely fill up the final list. The rest will
> be thrown on
> the floor as none of the docs from the other 6 shards will have a
> higher score than
> any doc currently in the list.
>
> Here's the important part. The order that the sub-requests come back varies
> due to a zillion possible causes, network latency, a minor GC pause on one
> of the shards, whether all the caches are loaded, whatever. So subsequent
> calls will happen to get some _other_ shards docs in the list first.
>
> Does that make sense?
>
> On Thu, Sep 10, 2015 at 4:48 AM, Modassar Ather 
> wrote:
> > If two documents come back from different
> > shards with the same score, the order would not be predictable
> >
> > This is fine.
> >
> > What I am not able to understand is that when I do not give a secondary
> > field for sort I am getting the result from one shard which changes to
> > other shard in other hits. Here the results are always from one shard.
> > E.g In first hit all the results are from shard1 and in next hit all the
> > results are from shard2.
> >
> > But when I add the secondary sort field I see the results from multiple
> > shards. E.g It has results from shard1 and shard2 both. This does not
> > change in multiple hits.
> >
> > So please help me understand why the similar result merge and aggregation
> > in not happening in when a single sort field is given?
> >
> > Regards,
> > Modassar
> >
> >
> >
> > On Thu, Sep 10, 2015 at 5:03 PM, Upayavira  wrote:
> >
> >> What scores are you getting? If two documents come back from different
> >> shards with the same score, the order would not be predictable -
> >> probably down to which shard responds first.
> >>
> >> Fix it with something like sort=score,timestamp or some other time
> >> related field.
> >>
> >> Upayavira
> >>
> >> On Thu, Sep 10, 2015, at 11:01 AM, Modassar Ather wrote:
> >> > To add to my previous observation I saw the response having results
> from
> >> > multiple shards when the secondary sort field is added and they remain
> >> > same
> >> > across hits.
> >> > Kindly help me understand this behavior. Why the results are changing
> as
> >> > I
> >> > understand that the result should be first clubbed together from all
> >> > shard
> >> > and then based on their score it should be sorted.
> >> > But here I see that every time I hit the sort query I am getting
> results
> >> > from different shard which has different scores.
> >> >
> >> > Thanks,
> >> > Modassar
> >> >
> >> > On Thu, Sep 10, 2015 at 2:59 PM, Modassar Ather <
> modather1...@gmail.com>
> >> > wrote:
> >> >
> >> > > Upayavira! I add the fl=id,score,[shard] and saw the shards
> changing in
> >> > > the response every time and for different shards the response
> changes
> >> but
> >> > > for the same shard result is same on multiple hits.
> >> > > When I add secondary sort field e.g. score the shard remains same
> >> across
> >> > > hits.
> >> > >
> >> > > On Thu, Sep 10, 2015 at 12:52 PM, Upayavira  wrote:
> >> > >
> >> > >> Add fl=id,score,[shard] to your query, and show us the results of
> two
> >> > >> differing executions.
> >> > >>
> >> > >> Perhaps we will be able to see the cause of the difference.
> >> > >>
> >> > >> Upayavira
> >> > >>
> >> > >> On Thu, Sep 10, 2015, at 05:35 AM, Moda

Re: [Help]Solr_Not_Responding

2015-10-30 Thread Modassar Ather
The information given is not sufficient to conclude a cause. You can check
the Solr logs for the details of any exception.

Regards,
Modassar


On Fri, Oct 30, 2015 at 10:12 AM, Franky Parulian Silalahi <
fra...@telunjuk.com> wrote:

> I have a problem with my Solr, which I run on CentOS 7.
> Sometimes my Solr is detected as down, but when I check Solr's service,
> the service is running. How does this happen, and why?
>


Very high memory and CPU utilization.

2015-11-01 Thread Modassar Ather
Hi,

I have a setup of a 12-shard cluster, each shard started with 28gb of
memory, on a single server. There are no replicas. The size of the index is
around 90gb on each shard. The Solr version is 5.2.1.
When I query "network se*", memory utilization goes up to 24-26 gb and the
query takes around 3+ minutes to execute. CPU utilization also goes up to
400% on a few of the nodes.

Kindly note that the use of the wildcard in the above query cannot be
restricted.

Please help me understand why there is so much memory utilization. Please
correct me if I am wrong that it is because of the term expansion of *se**.
Why is the CPU utilization so high, with more than one core used? As far
as I understand, querying is single threaded.

Help me understand the behavior of query timeouts. How is the client
notified about a query timeout?
How can I disable replication (as it is implicitly enabled) permanently,
since in our case we are not using it but can see warnings related to
leader election?
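
If the trailing wildcard cannot be removed from the query itself, one common
way to sidestep the runtime term expansion is to index edge n-grams instead;
a hedged schema sketch, where the type name and gram sizes are placeholders:

<fieldType name="text_prefix" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "search" is indexed as se, sea, sear, ... so q=f:se matches as a plain term -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The trade-off is a larger index in exchange for prefix lookups that behave
like ordinary term queries.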

Thanks,
Modassar


Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
Hi Toke,
Thanks for your response. My comments in-line.

That is 12 machines, running a shard each?
No! This is a single big machine with 12 shards on it.

What is the total amount of physical memory on each machine?
Around 370 gb on the single machine.

Well, se* probably expands to a great deal of documents, but a huge bump
in memory utilization and 3 minutes+ sounds strange.

- What are your normal query times?
A few simple queries return within a couple of seconds. But more complex
queries with proximity and wildcards have taken more than 3-4 minutes, and
some queries have even timed out, with the timeout set to 5 minutes.
- How many hits do you get from 'network se*'?
More than a million records.
- How many results do you return (the rows-parameter)?
It is the default, 10. Grouping is enabled on a field.
- If you issue a query without wildcards, but with approximately the
same amount of hits as 'network se*', how long does it take?
A query resulting in around half a million records returns within a couple
of seconds.

That is strange, yes. Have you checked the logs to see if something
unexpected is going on while you test?
I have not seen anything in particular; will try to check again.

If you are using spinning drives and only have 32GB of RAM in total in
each machine, you are probably struggling just to keep things running.
As mentioned above, this is a big machine with 370+ gb of RAM, and Solr (12
nodes total) is assigned 336 GB. The rest is still good enough for other
system activities.

Thanks,
Modassar

On Mon, Nov 2, 2015 at 1:30 PM, Toke Eskildsen 
wrote:

> On Mon, 2015-11-02 at 12:00 +0530, Modassar Ather wrote:
> > I have a setup of 12 shard cluster started with 28gb memory each on a
> > single server. There are no replica. The size of index is around 90gb on
> > each shard. The Solr version is 5.2.1.
>
> That is 12 machines, running a shard each?
>
> What is the total amount of physical memory on each machine?
>
> > When I query "network se*", the memory utilization goes upto 24-26 gb and
> > the query takes around 3+ minutes to execute. Also the CPU utilization
> goes
> > upto 400% in few of the nodes.
>
> Well, se* probably expands to a great deal of documents, but a huge bump
> in memory utilization and 3 minutes+ sounds strange.
>
> - What are your normal query times?
> - How many hits do you get from 'network se*'?
> - How many results do you return (the rows-parameter)?
> - If you issue a query without wildcards, but with approximately the
> same amount of hits as 'network se*', how long does it take?
>
> > Why the CPU utilization is so high and more than one core is used.
> > As far as I understand querying is single threaded.
>
> That is strange, yes. Have you checked the logs to see if something
> unexpected is going on while you test?
>
> > How can I disable replication(as it is implicitly enabled) permanently as
> > in our case we are not using it but can see warnings related to leader
> > election?
>
> If you are using spinning drives and only have 32GB of RAM in total in
> each machine, you are probably struggling just to keep things running.
>
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>


Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
Just to add one more point: one external Zookeeper instance is also
running on this particular machine.

Regards,
Modassar

On Mon, Nov 2, 2015 at 2:34 PM, Modassar Ather 
wrote:

> Hi Toke,
> Thanks for your response. My comments in-line.
>
> That is 12 machines, running a shard each?
> No! This is a single big machine with 12 shards on it.
>
> What is the total amount of physical memory on each machine?
> Around 370 gb on the single machine.
>
> Well, se* probably expands to a great deal of documents, but a huge bump
> in memory utilization and 3 minutes+ sounds strange.
>
> - What are your normal query times?
> Few simple queries are returned with in a couple of seconds. But the more
> complex queries with proximity and wild cards have taken more than 3-4
> minutes and some times some queries have timed out too where time out is
> set to 5 minutes.
> - How many hits do you get from 'network se*'?
> More than a million records.
> - How many results do you return (the rows-parameter)?
> It is the default one 10. Grouping is enabled on a field.
> - If you issue a query without wildcards, but with approximately the
> same amount of hits as 'network se*', how long does it take?
> A query resulting in around half a million record return within a couple
> of seconds.
>
> That is strange, yes. Have you checked the logs to see if something
> unexpected is going on while you test?
> Have not seen anything particularly. Will try to check again.
>
> If you are using spinning drives and only have 32GB of RAM in total in
> each machine, you are probably struggling just to keep things running.
> As mentioned above this is a big machine with 370+ gb of RAM and Solr (12
> nodes total) is assigned 336 GB. The rest is still a good for other system
> activities.
>
> Thanks,
> Modassar
>
> On Mon, Nov 2, 2015 at 1:30 PM, Toke Eskildsen 
> wrote:
>
>> On Mon, 2015-11-02 at 12:00 +0530, Modassar Ather wrote:
>> > I have a setup of 12 shard cluster started with 28gb memory each on a
>> > single server. There are no replica. The size of index is around 90gb on
>> > each shard. The Solr version is 5.2.1.
>>
>> That is 12 machines, running a shard each?
>>
>> What is the total amount of physical memory on each machine?
>>
>> > When I query "network se*", the memory utilization goes upto 24-26 gb
>> and
>> > the query takes around 3+ minutes to execute. Also the CPU utilization
>> goes
>> > upto 400% in few of the nodes.
>>
>> Well, se* probably expands to a great deal of documents, but a huge bump
>> in memory utilization and 3 minutes+ sounds strange.
>>
>> - What are your normal query times?
>> - How many hits do you get from 'network se*'?
>> - How many results do you return (the rows-parameter)?
>> - If you issue a query without wildcards, but with approximately the
>> same amount of hits as 'network se*', how long does it take?
>>
>> > Why the CPU utilization is so high and more than one core is used.
>> > As far as I understand querying is single threaded.
>>
>> That is strange, yes. Have you checked the logs to see if something
>> unexpected is going on while you test?
>>
>> > How can I disable replication(as it is implicitly enabled) permanently
>> as
>> > in our case we are not using it but can see warnings related to leader
>> > election?
>>
>> If you are using spinning drives and only have 32GB of RAM in total in
>> each machine, you are probably struggling just to keep things running.
>>
>>
>> - Toke Eskildsen, State and University Library, Denmark
>>
>>
>>
>


Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
Thanks Jim for your response.

The remaining size after you removed the heap usage should be reserved for
the index (not only the other system activities).
I am not able to get the above point. So when I start Solr with 28g RAM,
all the activities related to Solr should not go beyond 28g, and the
remaining memory will be used for activities other than Solr. Please help me
understand.

*Also the CPU utilization goes upto 400% in few of the nodes:*
You said that only machine is used so I assumed that 400% cpu is for a
single process (one solr node), right ?
Yes, you are right that 400% is for a single process.
The disks are SSDs.

Regards,
Modassar

On Mon, Nov 2, 2015 at 4:09 PM, jim ferenczi  wrote:

> *if it correlates with the bad performance you're seeing. One important
> thing to notice is that a significant part of your index needs to be in RAM
> (especially if you're using SSDs) in order to achieve good performance.*
>
> Especially if you're not using SSDs, sorry ;)
>
> 2015-11-02 11:38 GMT+01:00 jim ferenczi :
>
> > 12 shards with 28GB for the heap and 90GB for each index means that you
> > need at least 336GB for the heap (assuming you're using all of it which
> may
> > be easily the case considering the way the GC is handling memory) and ~=
> > 1TO for the index. Let's say that you don't need your entire index in
> RAM,
> > the problem as I see it is that you don't have enough RAM for your index
> +
> > heap. Assuming your machine has 370GB of RAM there are only 34GB left for
> > your index, 1TO/34GB means that you can only have 1/30 of your entire
> index
> > in RAM. I would advise you to check the swap activity on the machine and
> > see if it correlates with the bad performance you're seeing. One
> important
> > thing to notice is that a significant part of your index needs to be in
> RAM
> > (especially if you're using SSDs) in order to achieve good performance:
> >
> >
> >
> > *As mentioned above this is a big machine with 370+ gb of RAM and Solr
> (12
> > nodes total) is assigned 336 GB. The rest is still a good for other
> system
> > activities.*
> > The remaining size after you removed the heap usage should be reserved
> for
> > the index (not only the other system activities).
> >
> >
> > *Also the CPU utilization goes upto 400% in few of the nodes:*
> > You said that only machine is used so I assumed that 400% cpu is for a
> > single process (one solr node), right ?
> > This seems impossible if you are sure that only one query is played at a
> > time and no indexing is performed. Best thing to do is to dump stack
> trace
> > of the solr nodes during the query and to check what the threads are
> doing.
> >
> > Jim
> >
> >
> >
> > 2015-11-02 10:38 GMT+01:00 Modassar Ather :
> >
> >> Just to add one more point that one external Zookeeper instance is also
> >> running on this particular machine.
> >>
> >> Regards,
> >> Modassar
> >>
> >> On Mon, Nov 2, 2015 at 2:34 PM, Modassar Ather 
> >> wrote:
> >>
> >> > Hi Toke,
> >> > Thanks for your response. My comments in-line.
> >> >
> >> > That is 12 machines, running a shard each?
> >> > No! This is a single big machine with 12 shards on it.
> >> >
> >> > What is the total amount of physical memory on each machine?
> >> > Around 370 gb on the single machine.
> >> >
> >> > Well, se* probably expands to a great deal of documents, but a huge
> bump
> >> > in memory utilization and 3 minutes+ sounds strange.
> >> >
> >> > - What are your normal query times?
> >> > Few simple queries are returned with in a couple of seconds. But the
> >> more
> >> > complex queries with proximity and wild cards have taken more than 3-4
> >> > minutes and some times some queries have timed out too where time out
> is
> >> > set to 5 minutes.
> >> > - How many hits do you get from 'network se*'?
> >> > More than a million records.
> >> > - How many results do you return (the rows-parameter)?
> >> > It is the default one 10. Grouping is enabled on a field.
> >> > - If you issue a query without wildcards, but with approximately the
> >> > same amount of hits as 'network se*', how long does it take?
> >> > A query resulting in around half a million record return within a
> couple
> >> > of seconds.
> >> >
> >> > That is strange, yes. 

Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
Okay. I guess your observation of 400% for a single core is with top and
looking at that core's entry? If so, the 400% can be explained by
excessive garbage collection. You could turn GC-logging on to check
that. With a bit of luck GC would be the cause of the slow down.

Yes, it is with the top command. I will check GC activities and try to
relate them with CPU usage.
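
For that check, here is a minimal sketch (plain JDK JMX beans, nothing
Solr-specific; untested, just to illustrate) that reads the cumulative GC
counters. Comparing the deltas taken before and after the slow query should
show whether GC explains the CPU spike.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCheck {
    public static void main(String[] args) {
        // Cumulative collection count and time for each collector in this
        // JVM; a running Solr node can be polled the same way via remote JMX.
        for (GarbageCollectorMXBean gc :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d, time=%dms%n",
                    gc.getName(), gc.getCollectionCount(),
                    gc.getCollectionTime());
        }
    }
}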

The query q=network se* is quick enough in our system too. It takes around
3-4 seconds for around 8 million records.
The problem is with the same query as a phrase: q="network se*".
Can you please share your experience with such queries where the wildcard
expansion is huge, like in the query above?

I changed my SolrCloud setup from 12 shards to 8 shards and gave each shard
30 GB of RAM on the same machine with the same index size (re-indexed), but
could not see a significant improvement for the given query.

I will check the swap activity.

Also, can you please share your experiences with respect to RAM, GC, Solr
cache setup, etc., as it seems from your comment that the SolrCloud
environment you have is similar to the one I work on?

Regards,
Modassar

On Mon, Nov 2, 2015 at 5:20 PM, Toke Eskildsen 
wrote:

> On Mon, 2015-11-02 at 16:25 +0530, Modassar Ather wrote:
> > The remaining size after you removed the heap usage should be reserved
> for
> > the index (not only the other system activities).
> > I am not able to get  the above point. So when I start Solr with 28g RAM,
> > for all the activities related to Solr it should not go beyond 28g. And
> the
> > remaining heap will be used for activities other than Solr. Please help
> me
> > understand.
>
> It is described here:
> https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache
>
> I will be quick to add that I do not agree with Shawn (the primary
> author of the page) on the stated limits and find that the page in
> general ignores that performance requirements differ a great deal.
> Nevertheless, it is very true that Solr performance is tied to the
> amount of OS disk cache:
>
> You can have a machine with 10TB of RAM, but Solr performance will still
> be poor if you use it all for JVMs.
>
> Practically all modern operating system uses free memory for disk cache.
> Free memory is the memory not used for JVMs or other programs. It might
> be that you have a lot less than 30-40GB free: If you are on a Linux
> server, try calling 'top' and see what is says under 'cached'.
>
> Related, I support jim's suggestion to inspect the swap activity:
> In the past we had problem with a machine that insisted on swapping
> excessively, although there were high IO and free memory.
>
> > The disks are SSDs.
>
> That makes your observations stranger still.
>
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>


Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
I monitored swap activity for the query using vmstat. The *so* and *si*
columns show 0 till the completion of the query. Also, top showed 0 against
swap. This means there was no scarcity of physical memory; swap activity
does not seem to be a bottleneck.
Kindly note that I ran this on an 8 node cluster with 30 gb of RAM and 140
gb of index on each node.

Regards,
Modassar

On Mon, Nov 2, 2015 at 5:27 PM, Modassar Ather 
wrote:

> Okay. I guess your observation of 400% for a single core is with top and
> looking at that core's entry? If so, the 400% can be explained by
> excessive garbage collection. You could turn GC-logging on to check
> that. With a bit of luck GC would be the cause of the slow down.
>
> Yes it is with top command. I will check GC activities and try to relate
> with CPU usage.
>
> The query q=network se* is quick enough in our system too. It takes around
> 3-4 seconds for around 8 million records.
> The problem is with the same query as phrase. q="network se*".
> Can you please share your experience with such query where the wild card
> expansion is huge like in the query above?
>
> I changed my SolrCloud setup from 12 shard to 8 shard and given each shard
> 30 GB of RAM on the same machine with same index size (re-indexed) but
> could not see the significant improvement for the query given.
>
> I will check the swap activity.
>
> Also can you please share your experiences with respect to RAM, GC, solr
> cache setup etc as it seems by your comment that the SolrCloud environment
> you have is kind of similar to the one I work on?
>
> Regards,
> Modassar
>
> On Mon, Nov 2, 2015 at 5:20 PM, Toke Eskildsen 
> wrote:
>
>> On Mon, 2015-11-02 at 16:25 +0530, Modassar Ather wrote:
>> > The remaining size after you removed the heap usage should be reserved
>> for
>> > the index (not only the other system activities).
>> > I am not able to get  the above point. So when I start Solr with 28g
>> RAM,
>> > for all the activities related to Solr it should not go beyond 28g. And
>> the
>> > remaining heap will be used for activities other than Solr. Please help
>> me
>> > understand.
>>
>> It is described here:
>> https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache
>>
>> I will be quick to add that I do not agree with Shawn (the primary
>> author of the page) on the stated limits and find that the page in
>> general ignores that performance requirements differ a great deal.
>> Nevertheless, it is very true that Solr performance is tied to the
>> amount of OS disk cache:
>>
>> You can have a machine with 10TB of RAM, but Solr performance will still
>> be poor if you use it all for JVMs.
>>
>> Practically all modern operating system uses free memory for disk cache.
>> Free memory is the memory not used for JVMs or other programs. It might
>> be that you have a lot less than 30-40GB free: If you are on a Linux
>> server, try calling 'top' and see what is says under 'cached'.
>>
>> Related, I support jim's suggestion to inspect the swap activity:
>> In the past we had problem with a machine that insisted on swapping
>> excessively, although there were high IO and free memory.
>>
>> > The disks are SSDs.
>>
>> That makes your observations stranger still.
>>
>>
>> - Toke Eskildsen, State and University Library, Denmark
>>
>>
>>
>


Re: warning

2015-11-02 Thread Modassar Ather
Normally the tlog is replayed if the Solr server crashes for some reason;
when restarted, it tries to recover from the crash gracefully.
You can look into the following documentation, which explains transaction
logs and related aspects of Solr.

http://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

regards,
Modassar

On Mon, Nov 2, 2015 at 12:22 PM, Midas A  wrote:

> Please explain following warning
>
> Starting log replay
> tlog{file=/mnt/vol1/path/data/tlog/tlog.0060544 refcount=2}
> active=false starting pos=0
>
> Is there any harm with this error ?
>


Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
The problem is with the same query as phrase. q="network se*".

The last '.' is a full stop ending the sentence; the query is
q=field:"network se*"

Best,
Modassar

On Mon, Nov 2, 2015 at 6:10 PM, jim ferenczi  wrote:

> Oups I did not read the thread carrefully.
> *The problem is with the same query as phrase. q="network se*".*
> I was not aware that you could do that with Solr ;). I would say this is
> expected because in such case if the number of expansions for "se*" is big
> then you would have to check the positions for a significant words. I don't
> know if there is a limitation in the number of expansions for a prefix
> query contained into a phrase query but I would look at this parameter
> first (limit the number of expansion per prefix search, let's say the N
> most significant words based on the frequency of the words for instance).
>
> 2015-11-02 13:36 GMT+01:00 jim ferenczi :
>
> >
> >
> >
> > *I am not able to get  the above point. So when I start Solr with 28g
> RAM,
> > for all the activities related to Solr it should not go beyond 28g. And
> the
> > remaining heap will be used for activities other than Solr. Please help
> me
> > understand.*
> >
> > Well those 28GB of heap are the memory "reserved" for your Solr
> > application, though some parts of the index (not to say all) are
> retrieved
> > via MMap (if you use the default MMapDirectory) which do not use the heap
> > at all. This is a very important part of Lucene/Solr, the heap should be
> > sized in a way that let a significant amount of RAM available for the
> > index. If not then you rely on the speed of your disk, if you have SSDs
> > it's better but reads are still significantly slower with SSDs than with
> > direct RAM access. Another thing to keep in mind is that mmap will always
> > tries to put things in RAM, this is why I suspect that you swap activity
> is
> > killing your performance.
> >
> > 2015-11-02 11:55 GMT+01:00 Modassar Ather :
> >
> >> Thanks Jim for your response.
> >>
> >> The remaining size after you removed the heap usage should be reserved
> for
> >> the index (not only the other system activities).
> >> I am not able to get  the above point. So when I start Solr with 28g
> RAM,
> >> for all the activities related to Solr it should not go beyond 28g. And
> >> the
> >> remaining heap will be used for activities other than Solr. Please help
> me
> >> understand.
> >>
> >> *Also the CPU utilization goes upto 400% in few of the nodes:*
> >> You said that only machine is used so I assumed that 400% cpu is for a
> >> single process (one solr node), right ?
> >> Yes you are right that 400% is for single process.
> >> The disks are SSDs.
> >>
> >> Regards,
> >> Modassar
> >>
> >> On Mon, Nov 2, 2015 at 4:09 PM, jim ferenczi 
> >> wrote:
> >>
> >> > *if it correlates with the bad performance you're seeing. One
> important
> >> > thing to notice is that a significant part of your index needs to be
> in
> >> RAM
> >> > (especially if you're using SSDs) in order to achieve good
> performance.*
> >> >
> >> > Especially if you're not using SSDs, sorry ;)
> >> >
> >> > 2015-11-02 11:38 GMT+01:00 jim ferenczi :
> >> >
> >> > > 12 shards with 28GB for the heap and 90GB for each index means that
> >> you
> >> > > need at least 336GB for the heap (assuming you're using all of it
> >> which
> >> > may
> >> > > be easily the case considering the way the GC is handling memory)
> and
> >> ~=
> >> > > 1TO for the index. Let's say that you don't need your entire index
> in
> >> > RAM,
> >> > > the problem as I see it is that you don't have enough RAM for your
> >> index
> >> > +
> >> > > heap. Assuming your machine has 370GB of RAM there are only 34GB
> left
> >> for
> >> > > your index, 1TO/34GB means that you can only have 1/30 of your
> entire
> >> > index
> >> > > in RAM. I would advise you to check the swap activity on the machine
> >> and
> >> > > see if it correlates with the bad performance you're seeing. One
> >> > important
> >> > > thing to notice is that a significant part of your index needs to be
> >> > > in RAM

Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
Thanks Walter for your response,

It is around 90GB of index (around 8 million documents) on one shard, and
there are 12 such shards. As per my understanding, sharding is required in
this case. Please help me understand if it is not.

We have requirements where we need to provide full wildcard support to our
users.
I will try using EdgeNgramFilter. Can you please help me understand if
EdgeNgramFilter can be a replacement for wildcards?
There are situations where the words may be extended with some special
characters, e.g. for se* there can be a match like secondary-school which
also needs to be considered.
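
As I understand it (a rough, untested sketch assuming Lucene 5.x analysis
classes), EdgeNGram would emit every prefix of each token at index time, so
a plain term query for "se" would match any word starting with "se":

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class EdgeNGramDemo {
    public static void main(String[] args) throws Exception {
        WhitespaceTokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader("secondary school"));
        // Emit index-time prefixes of 2 to 10 characters per token.
        TokenStream stream = new EdgeNGramTokenFilter(tokenizer, 2, 10);
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        stream.reset();
        while (stream.incrementToken()) {
            System.out.println(term); // se, sec, seco, ..., sc, sch, ...
        }
        stream.end();
        stream.close();
    }
}

The caveats being that the filter should be applied at index time only, it
grows the index, and it only covers trailing wildcards (se*), not leading or
embedded ones.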

Regards,
Modassar



On Mon, Nov 2, 2015 at 10:17 PM, Walter Underwood 
wrote:

> To back up a bit, how many documents are in this 90GB index? You might not
> need to shard at all.
>
> Why are you sending a query with a trailing wildcard? Are you matching the
> prefix of words, for query completion? If so, look at the suggester, which
> is designed to solve exactly that. Or you can use the EdgeNgramFilter to
> index prefixes. That will make your index larger, but prefix searches will
> be very fast.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Nov 2, 2015, at 5:17 AM, Toke Eskildsen 
> wrote:
> >
> > On Mon, 2015-11-02 at 17:27 +0530, Modassar Ather wrote:
> >
> >> The query q=network se* is quick enough in our system too. It takes
> >> around 3-4 seconds for around 8 million records.
> >>
> >> The problem is with the same query as phrase. q="network se*".
> >
> > I misunderstood your query then. I tried replicating it with
> > q="der se*"
> >
> > http://rosalind:52300/solr/collection1/select?q=%22der+se*%
> > 22&wt=json&indent=true&facet=false&group=true&group.field=domain
> >
> > gets expanded to
> >
> > parsedquery": "(+DisjunctionMaxQuery((content_text:\"kan svane\" |
> > author:kan svane* | text:\"kan svane\" | title:\"kan svane\" | url:kan
> > svane* | description:\"kan svane\")) ())/no_coord"
> >
> > The result was 1,043,258,271 hits in 15,211 ms
> >
> >
> > Interestingly enough, a search for
> > q="kan svane*"
> > resulted in 711 hits in 12,470 ms. Maybe because 'kan' alone matches 1
> > billion+ documents. On that note,
> > q=se*
> > resulted in -951812427 hits in 194,276 ms.
> >
> > Now this is interesting. The negative number seems to be caused by
> > grouping, but I finally got the response time up in the minutes. Still
> > no memory problems though. Hits without grouping were 3,343,154,869.
> >
> > For comparison,
> > q=http
> > resulted in -1527418054 hits in 87,464 ms. Without grouping the hit
> > count was 7,062,516,538. Twice the hits of 'se*' in half the time.
> >
> >> I changed my SolrCloud setup from 12 shard to 8 shard and given each
> >> shard 30 GB of RAM on the same machine with same index size
> >> (re-indexed) but could not see the significant improvement for the
> >> query given.
> >
> > Strange. I would have expected the extra free memory for disk space to
> > help performance.
> >
> >> Also can you please share your experiences with respect to RAM, GC,
> >> solr cache setup etc as it seems by your comment that the SolrCloud
> >> environment you have is kind of similar to the one I work on?
> >>
> > There is a short write up at
> > https://sbdevel.wordpress.com/net-archive-search/
> >
> > - Toke Eskildsen, State and University Library, Denmark
> >
> >
> >
>
>


Re: warning

2015-11-02 Thread Modassar Ather
The information is not sufficient to say anything conclusive. You can refer
to the Solr log to find the reason for the log replay.
You can also check if the index is as per expectation, e.g. the number of
documents indexed.
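
For the document count, a quick sanity check with SolrJ could look like this
(a sketch; the URL and core name are placeholders for your setup):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class IndexCount {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                new HttpSolrClient("http://localhost:8983/solr/collection1")) {
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0); // only the hit count is needed, not the documents
            QueryResponse rsp = client.query(q);
            System.out.println("Documents indexed: "
                    + rsp.getResults().getNumFound());
        }
    }
}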

Regards,
Modassar

On Tue, Nov 3, 2015 at 11:11 AM, Midas A  wrote:

> Thanks Modassar for replying ,
>
> could you please elaborate on what would have happened when we were
> getting this kind of warning?
>
> Regards,
> Abhishek Tiwari
>
> On Mon, Nov 2, 2015 at 6:00 PM, Modassar Ather 
> wrote:
>
> > Normally tlog is replayed in case if solr server crashes for some reason
> > and when restarted it tries to recover from the crash gracefully.
> > You can look into following documentation which explains about
> transaction
> > logs and related stuff of Solr.
> >
> >
> >
> http://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >
> > regards,
> > Modassar
> >
> > On Mon, Nov 2, 2015 at 12:22 PM, Midas A  wrote:
> >
> > > Please explain following warning
> > >
> > > Starting log replay
> > > tlog{file=/mnt/vol1/path/data/tlog/tlog.0060544 refcount=2}
> > > active=false starting pos=0
> > >
> > > Is there any harm with this error ?
> > >
> >
>


Re: Solr Cloud and Multiple Indexes

2015-11-05 Thread Modassar Ather
What is your index size? How much memory is used? What type of queries are
slow?
Are there GC pauses as they can be a cause of slowness?
Are document updates/additions happening in parallel?

The queries are very slow to run so I was thinking to distribute
the indexes into multiple indexes and consequently distributed search. Can
anyone guide me to some sources (articles) that discuss this in Solr Cloud?

This is what you are already doing. Did you mean that you want to add more
shards?

Regards,
Modassar

On Thu, Nov 5, 2015 at 1:51 PM, Salman Ansari 
wrote:

> Hi,
>
> I am using Solr cloud and I have created a single index that host around
> 70M documents distributed into 2 shards (each having 35M documents) and 2
> replicas. The queries are very slow to run so I was thinking to distribute
> the indexes into multiple indexes and consequently distributed search. Can
> anyone guide me to some sources (articles) that discuss this in Solr Cloud?
>
> Appreciate your feedback regarding this.
>
> Regards,
> Salman
>


Re: Solr Cloud and Multiple Indexes

2015-11-05 Thread Modassar Ather
SolrCloud makes distributed search easier. You can find details about it at
the following link.
https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works

You can also refer to following link:
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

By the size of your index I meant the index size on disk and not the total
document count alone.
How many segments are there in the index? The more segments there are, the
slower the search.
What are the Xms and Xmx you are allocating to Solr, and how much is
actually used by your Solr?

I doubt this as the slowness was happening for a long period of time.
I mentioned this point as I have seen GC pauses of 30 seconds and more with
some complex queries.

I am facing delay of 2-3 seconds but previously I
had delays of around 28 seconds.
Is this after you moved to SolrCloud?

Regards,
Modassar


On Thu, Nov 5, 2015 at 3:09 PM, Salman Ansari 
wrote:

> Here is the current info
>
> How much memory is used?
> Physical memory consumption: 5.48 GB out of 14 GB.
> Swap space consumption: 5.83 GB out of 15.94 GB.
> JVM-Memory consumption: 1.58 GB out of 3.83 GB.
>
> What is your index size?
> I have around 70M documents distributed on 2 shards (so each shard has 35M
> document)
>
> What type of queries are slow?
> I am running normal queries (queries on a field) no faceting or highlights
> are requested. Currently, I am facing delay of 2-3 seconds but previously I
> had delays of around 28 seconds.
>
> Are there GC pauses as they can be a cause of slowness?
> I doubt this as the slowness was happening for a long period of time.
>
> Are document updates/additions happening in parallel?
> No, I have stopped adding/updating documents and doing queries only.
>
> This is what you are already doing. Did you mean that you want to add more
> shards?
> No, what I meant is that I read that previously there was a way to chunk a
> large index into multiple and then do distributed search on that as in this
> article https://wiki.apache.org/solr/DistributedSearch. What I was looking
> for how this is handled in Solr Cloud?
>
>
> Regards,
> Salman
>
>
>
>
>
> On Thu, Nov 5, 2015 at 12:06 PM, Modassar Ather 
> wrote:
>
> > What is your index size? How much memory is used? What type of queries
> are
> > slow?
> > Are there GC pauses as they can be a cause of slowness?
> > Are document updates/additions happening in parallel?
> >
> > The queries are very slow to run so I was thinking to distribute
> > the indexes into multiple indexes and consequently distributed search.
> Can
> > anyone guide me to some sources (articles) that discuss this in Solr
> Cloud?
> >
> > This is what you are already doing. Did you mean that you want to add
> more
> > shards?
> >
> > Regards,
> > Modassar
> >
> > On Thu, Nov 5, 2015 at 1:51 PM, Salman Ansari 
> > wrote:
> >
> > > Hi,
> > >
> > > I am using Solr cloud and I have created a single index that host
> around
> > > 70M documents distributed into 2 shards (each having 35M documents)
> and 2
> > > replicas. The queries are very slow to run so I was thinking to
> > distribute
> > > the indexes into multiple indexes and consequently distributed search.
> > Can
> > > anyone guide me to some sources (articles) that discuss this in Solr
> > Cloud?
> > >
> > > Appreciate your feedback regarding this.
> > >
> > > Regards,
> > > Salman
> > >
> >
>


Re: Solr Cloud and Multiple Indexes

2015-11-05 Thread Modassar Ather
Thanks for your response. I have already gone through those documents
before. My point was that if I am using Solr Cloud the only way to
distribute my indexes is by adding shards? and I don't have to do anything
manually (because all the distributed search is handled by Solr Cloud).

Yes as per my knowledge.

How do I check how many segments are there in the index?
You can look into the index folder manually. Which version of Solr are you
using? I don't remember exactly which version it started in, but in the
latest versions, including Solr-5.2.1, there is a "Segments info" link
available where you can see the number of segments.
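
If you prefer to check it programmatically instead of through the admin UI,
here is a sketch using the Lucene API (assuming Lucene/Solr 5.x; the index
path is a placeholder for a core's data/index directory):

import java.nio.file.Paths;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SegmentCount {
    public static void main(String[] args) throws Exception {
        try (Directory dir =
                FSDirectory.open(Paths.get("/path/to/core/data/index"))) {
            SegmentInfos infos = SegmentInfos.readLatestCommit(dir);
            System.out.println("Segments in latest commit: " + infos.size());
        }
    }
}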

Regards,
Modassar

On Thu, Nov 5, 2015 at 5:41 PM, Salman Ansari 
wrote:

> Thanks for your response. I have already gone through those documents
> before. My point was that if I am using Solr Cloud the only way to
> distribute my indexes is by adding shards? and I don't have to do anything
> manually (because all the distributed search is handled by Solr Cloud).
>
> What is the Xms and Xmx you are allocating to Solr and how much max is
> used by
> your solr?
> Xms and Xmx are both 4G. My current JVM-Memory consumption is 1.58 GB
>
> How many segments are there in the index? The more the segment the slower
> is
> the search.
> How do I check how many segments are there in the index?
>
> Is this after you moved to solrcloud?
> I have been using SolrCloud from the beginning.
>
> Regards,
> Salman
>
>
> On Thu, Nov 5, 2015 at 1:21 PM, Modassar Ather 
> wrote:
>
> > SolrCloud makes the distributed search easier. You can find details about
> > it under following link.
> > https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works
> >
> > You can also refer to following link:
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
> >
> > From size of your index I meant index size and not the total document
> > alone.
> > How many segments are there in the index? The more the segment the slower
> > is the search.
> > What is the Xms and Xmx you are allocating to Solr and how much max is
> used
> > by your solr?
> >
> > I doubt this as the slowness was happening for a long period of time.
> > I mentioned this point as I have seen gc pauses of 30 seconds and more in
> > some complex queries.
> >
> > I am facing delay of 2-3 seconds but previously I
> > had delays of around 28 seconds.
> > Is this after you moved to solrcloud?
> >
> > Regards,
> > Modassar
> >
> >
> > On Thu, Nov 5, 2015 at 3:09 PM, Salman Ansari 
> > wrote:
> >
> > > Here is the current info
> > >
> > > How much memory is used?
> > > Physical memory consumption: 5.48 GB out of 14 GB.
> > > Swap space consumption: 5.83 GB out of 15.94 GB.
> > > JVM-Memory consumption: 1.58 GB out of 3.83 GB.
> > >
> > > What is your index size?
> > > I have around 70M documents distributed on 2 shards (so each shard has
> > 35M
> > > document)
> > >
> > > What type of queries are slow?
> > > I am running normal queries (queries on a field) no faceting or
> > highlights
> > > are requested. Currently, I am facing delay of 2-3 seconds but
> > previously I
> > > had delays of around 28 seconds.
> > >
> > > Are there GC pauses as they can be a cause of slowness?
> > > I doubt this as the slowness was happening for a long period of time.
> > >
> > > Are document updates/additions happening in parallel?
> > > No, I have stopped adding/updating documents and doing queries only.
> > >
> > > This is what you are already doing. Did you mean that you want to add
> > more
> > > shards?
> > > No, what I meant is that I read that previously there was a way to
> chunk
> > a
> > > large index into multiple and then do distributed search on that as in
> > this
> > > article https://wiki.apache.org/solr/DistributedSearch. What I was
> > looking
> > > for how this is handled in Solr Cloud?
> > >
> > >
> > > Regards,
> > > Salman
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Nov 5, 2015 at 12:06 PM, Modassar Ather <
> modather1...@gmail.com>
> > > wrote:
> > >
> > > > What is your index size? How much memory is used? What type of
> > > > queries are slow?
> > > > Are there GC pauses as they can be a cause of slowness?
> > > > Are document updates/additions happening in parallel?

Exception in grouping with docValues enable field.

2015-11-05 Thread Modassar Ather
Hi,

I have following docValues enabled field.

*Field : *
*Type:  *

When I am grouping on this field I am getting the following exception.
Kindly let me know if I am missing something or if it is an issue.
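
For reference, a sketch of the kind of grouping request involved (SolrJ;
the field name is a placeholder for the docValues-only field above):

import org.apache.solr.client.solrj.SolrQuery;

public class GroupingRequest {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("*:*");
        q.set("group", true);
        // Placeholder for the indexed="false", docValues="true" field.
        q.set("group.field", "my_docvalues_field");
    }
}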

  org.apache.solr.common.SolrException; java.lang.NullPointerException
at org.apache.solr.schema.FieldType.toExternal(FieldType.java:346)
at org.apache.solr.schema.FieldType.toObject(FieldType.java:355)
at
org.apache.solr.search.grouping.endresulttransformer.GroupedEndResultTransformer.transform(GroupedEndResultTransformer.java:72)
at
org.apache.solr.handler.component.QueryComponent.groupedFinishStage(QueryComponent.java:810)
at
org.apache.solr.handler.component.QueryComponent.finishStage(QueryComponent.java:768)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:394)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)

Thanks,
Modassar


Re: Exception in grouping with docValues enable field.

2015-11-08 Thread Modassar Ather
Hi,

Any input will be really helpful.

Regards,
Modassar

On Fri, Nov 6, 2015 at 11:20 AM, Modassar Ather 
wrote:

> Hi,
>
> I have following docValues enabled field.
>
> *Field : *
> *Type:  * sortMissingLast="true" stored="false" indexed="false" docValues="true"/>
>
> When I am grouping on this field I am getting following exception. Kindly
> let me know if I am missing something or it is an issue.
>
>   org.apache.solr.common.SolrException; java.lang.NullPointerException
> at org.apache.solr.schema.FieldType.toExternal(FieldType.java:346)
> at org.apache.solr.schema.FieldType.toObject(FieldType.java:355)
> at
> org.apache.solr.search.grouping.endresulttransformer.GroupedEndResultTransformer.transform(GroupedEndResultTransformer.java:72)
> at
> org.apache.solr.handler.component.QueryComponent.groupedFinishStage(QueryComponent.java:810)
> at
> org.apache.solr.handler.component.QueryComponent.finishStage(QueryComponent.java:768)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:394)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
> at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> at org.eclipse.jetty.server.Server.handle(Server.java:497)
> at
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
> at
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
> at java.lang.Thread.run(Thread.java:745)
>
> Thanks,
> Modassar
>


Re: Exception in grouping with docValues enable field.

2015-11-08 Thread Modassar Ather
Thanks Alexandre,
It seems to be the same problem. The version of Solr I am using is 5.2.1.

Regards,
Modassar

On Mon, Nov 9, 2015 at 9:37 AM, Alexandre Rafalovitch 
wrote:

> SOLR-4647 ?
>
> But your name is already in that JIRA, so perhaps something else
> similar. You also did not mention the version of Solr.
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 8 November 2015 at 22:59, Modassar Ather 
> wrote:
> > Hi,
> >
> > Any input will be really helpful.
> >
> > Regards,
> > Modassar
> >
> > On Fri, Nov 6, 2015 at 11:20 AM, Modassar Ather 
> > wrote:
> >
> >> Hi,
> >>
> >> I have following docValues enabled field.
> >>
> >> *Field : *
> >> *Type:  * >> sortMissingLast="true" stored="false" indexed="false" docValues="true"/>
> >>
> >> When I am grouping on this field I am getting following exception.
> Kindly
> >> let me know if I am missing something or it is an issue.
> >>
> >>   org.apache.solr.common.SolrException;
> java.lang.NullPointerException
> >> at
> org.apache.solr.schema.FieldType.toExternal(FieldType.java:346)
> >> at org.apache.solr.schema.FieldType.toObject(FieldType.java:355)
> >> at
> >>
> org.apache.solr.search.grouping.endresulttransformer.GroupedEndResultTransformer.transform(GroupedEndResultTransformer.java:72)
> >> at
> >>
> org.apache.solr.handler.component.QueryComponent.groupedFinishStage(QueryComponent.java:810)
> >> at
> >>
> org.apache.solr.handler.component.QueryComponent.finishStage(QueryComponent.java:768)
> >> at
> >>
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:394)
> >> at
> >>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
> >> at
> >> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
> >> at
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
> >> at
> >>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
> >> at
> >>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
> >> at
> >>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> >> at
> >>
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> >> at
> >>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> >> at
> >>
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> >> at
> >>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> >> at
> >>
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> >> at
> >>
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> >> at
> >>
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> >> at
> >>
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
> >> at
> >>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> >> at
> >>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
> >> at
> >>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
> >> at
> >>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> >> at org.eclipse.jetty.server.Server.handle(Server.java:497)
> >> at
> >> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
> >> at
> >>
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
> >> at
> >>
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
> >> at
> >>
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
> >> at
> >>
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
> >> at java.lang.Thread.run(Thread.java:745)
> >>
> >> Thanks,
> >> Modassar
> >>
>


Re: Solr Cloud and Multiple Indexes

2015-11-08 Thread Modassar Ather
As per my understanding, if the data getting indexed is completely different
and does not fall into the same schema, it can be segregated for indexing.
But if it fits into the same schema, then it is better to keep it in the
same index, and if the index size grows, switch to SolrCloud as it has lots
of benefits.

Ours is a 12 shard cluster, and each shard has around 100 gb of index. The
simple query response is very fast.

Currently, I am using Solr 5.3. btw, I could not find segment info link. Is
it under Core Admin?
Select your core on the dashboard. The last link is the segment info link.

Regards,
Modassar

On Sun, Nov 8, 2015 at 3:07 PM, Salman Ansari 
wrote:

> Just to give you a context of what I am talking about, I am collecting data
> from different sources (such as articles, videos etc.). Moreover, I will be
> doing enrichment on the data such as Entity Extraction. From my previous
> experiment with Solr what I was doing is dumping all articles, videos meta
> data into a single index (distributed into multiple shards). Now that made
> the whole query very slow. So for entity extraction, I created another
> index on the same shards and pushed entities there. This actually made
> querying entities very quick as there was very little data on that index
> (although it was residing on the same machine as the main index).
>
> Based on that quick experiment, I was thinking if I  need to use another
> approach for my data. For example, instead of just relying on Solr Cloud to
> distribute my data on different shards, why don't I create another index
> for each type of data I have, such as articles, videos and then perform
> some sort of distributed search over them. Will that be better in some
> sense, such as performance?
>
> Which version of solr are you using?
> Currently, I am using Solr 5.3. btw, I could not find segment info link. Is
> it under Core Admin?
>
> Regards,
> Salman
>
>
> On Fri, Nov 6, 2015 at 7:26 AM, Modassar Ather 
> wrote:
>
> > Thanks for your response. I have already gone through those documents
> > before. My point was that if I am using Solr Cloud the only way to
> > distribute my indexes is by adding shards? and I don't have to do
> anything
> > manually (because all the distributed search is handled by Solr Cloud).
> >
> > Yes as per my knowledge.
> >
> > How do I check how many segments are there in the index?
> > You can see into the index folder manually. Which version of solr are you
> > using? I don't remember exactly the start version but in the latest and
> > Solr-5.2.1 there is a "Segments info" link available where you can see
> > number of segments.
> >
> > Regards,
> > Modassar
> >
> > On Thu, Nov 5, 2015 at 5:41 PM, Salman Ansari 
> > wrote:
> >
> > > Thanks for your response. I have already gone through those documents
> > > before. My point was that if I am using Solr Cloud the only way to
> > > distribute my indexes is by adding shards? and I don't have to do
> > anything
> > > manually (because all the distributed search is handled by Solr Cloud).
> > >
> > > What is the Xms and Xmx you are allocating to Solr and how much max is
> > > used by
> > > your solr?
> > > Xms and Xmx are both 4G. My current JVM-Memory consumption is 1.58 GB
> > >
> > > How many segments are there in the index? The more the segment the
> slower
> > > is
> > > the search.
> > > How do I check how many segments are there in the index?
> > >
> > > Is this after you moved to solrcloud?
> > > I have been using SolrCloud from the beginning.
> > >
> > > Regards,
> > > Salman
> > >
> > >
> > > On Thu, Nov 5, 2015 at 1:21 PM, Modassar Ather  >
> > > wrote:
> > >
> > > > SolrCloud makes the distributed search easier. You can find details
> > about
> > > > it under following link.
> > > > https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works
> > > >
> > > > You can also refer to following link:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
> > > >
> > > > From size of your index I meant index size and not the total document
> > > > alone.
> > > > How many segments are there in the index? The more the segment the
> > slower
> > > > is the search.
> > > > What is the Xms and Xmx you are allocating to Solr and how much max
> > > > is used

Difference in query behavior.

2015-11-30 Thread Modassar Ather
Hi,

I have a query title:(solr lucene api). The mm is set to 100% using q.op as
AND.
When the query is executed it returns documents having all the terms. It
parses to the following:
+(title:solr title:lucene title:api)~3

Similarly, I have another query, topic:facet AND title:(solr lucene api),
which is parsed as:
+(+topic:facet +(title:solr title:lucene title:api)

The second query should be a subset of the first query, but it returns more
results than the first.
Per my understanding, the reason is that there are two clauses in the second
query: 1) topic:facet, which MUST occur, and 2) (title:solr title:lucene
title:api), in which any one of the terms MUST occur.
In the first query there are 3 SHOULD clauses between the terms, but due to
100% mm all the terms are matched.

Kindly help me understand how I can get the subset of the results of query 1
with query 2.
I understand that if I put +/AND between the clauses it will work, but the
same is not required in query one.
Is there a way I can group the clauses which ensures that the first clause
and all the terms of the other clause match, as in the first query where all
the clauses are matched?
Also let me know how ~ is different from phrase slop in the case of the
first query.

Thanks,
Modassar


Re: Difference in query behavior.

2015-11-30 Thread Modassar Ather
Thanks for your response.

Upayavira : The missing bracket is a copy paste error. The correct parsed
query is: +(+topic:facet +(title:solr title:lucene title:api)). Use of fq is
not an option as these are user queries.
Alexandre : That is just an example query. The terms used are just to
explain the behavior. Basically, the query forms can be seen as field:(term1
term2 term3) and field1:term4 AND field:(term1 term2 term3).
The second query should bring back a subset of the first query, but that is
not happening.

Thanks Jack for your input.

Please let me know if there is a way to achieve the subset of the first
query from the second query. As per my understanding of the code, unless
there is an OR between clauses the mm is not considered. So for the query
field1:term4 AND field:(term1 term2 term3), mm is not considered at all.
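
As noted, making each term of the sub-clause explicitly required does give
the subset behavior. A SolrJ sketch (with the field and term names from the
example above):

import org.apache.solr.client.solrj.SolrQuery;

public class ExplicitOperators {
    public static void main(String[] args) {
        // Each '+' marks the term as required, reproducing the mm=100%
        // behavior inside the sub-clause, so the result set is a true
        // subset of query 1.
        SolrQuery q =
                new SolrQuery("topic:facet AND title:(+solr +lucene +api)");
        System.out.println(q.getQuery());
    }
}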

Regards,
Modassar

On Tue, Dec 1, 2015 at 10:14 AM, Modassar Ather 
wrote:

> Thanks for your response.
>
> Upayavira : The missing bracket is a copy paste error. Correct parsed
> query : +(+topic:facet +(title:solr title:lucene title:api)). Use of fq is
> not an option as these are user queries.
> Alexandre : That is just an example query. Those terms used are just to
> explain the behavior. Basically the query forms can be seen as field:(term1
> term2 term3) and field1:term4 AND field:(term1 term2 term3)
>   The second query should bring the subset of the first
> query but that is not happening.
>
> Thanks Jack for your input.
>
> Please let me know if there is a way to achieve the subset of first query
> from second query.
>
> Regards,
> Modassar
>
>
> On Tue, Dec 1, 2015 at 9:57 AM, Modassar Ather 
> wrote:
>
>> Hi Tim,
>>
>> I am using the SpanQueryParser for phrases particularly.
>>
>> Thanks,
>> Modassar
>>
>> On Mon, Nov 30, 2015 at 6:27 PM, Allison, Timothy B. 
>> wrote:
>>
>>> Out of curiosity, how does the SpanQueryParser work on this?  Or have
>>> you stopped using that?
>>>
>>> Cheers,
>>>
>>>   Tim
>>>
>>> -Original Message-
>>> From: Modassar Ather [mailto:modather1...@gmail.com]
>>> Sent: Monday, November 30, 2015 5:46 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Difference in query behavior.
>>>
>>> Hi,
>>>
>>> I have a query title:(solr lucene api). The mm is set to 100% using q.op
>>> as AND.
>>> When the query is executed it returns documnets having all the terms. It
>>> parses to following:
>>> +(title:solr title:faceting title:api)~3
>>>
>>> Similarlly I have another query like this topic:facet AND title:(solr
>>> lucene api) which is parsed as:
>>> +(+topic:facet +(title:solr title:lucene title:api)
>>>
>>> The second query is a subset of first query but it returns more results
>>> than the first.
>>> Per my understanding reason being that there are two clauses in second
>>> query 1) topic:facet which MUST occur and 2) (title:solr title:lucene
>>> title:api) any of the terms MUST occur.
>>> In first query there are 3 clauses which has SHOULD occur in between
>>> terms but due to 100% mm all terms are matched.
>>>
>>> Kindly help me understand how I can get the subset of result of query 1
>>> by query 2.
>>> I understand if I put +/AND in between the clauses it will work but the
>>> same is not required in query one.
>>> Is there a way I can group the clauses which ensures that the first
>>> clause and the terms of other clause all should match as in the query first
>>> all the clauses are matched.
>>> Also let me know how ~ is different from phrase slop in the case of
>>> first query.
>>>
>>> Thanks,
>>> Modassar
>>>
>>
>>
>


Regarding behavior of docValues.

2015-02-24 Thread Modassar Ather
Hi,

Kindly help me understand the behavior of the following field.



For a field like above where indexed="true" and docValues="true", is it
that:
 1) For sorting/faceting on *manu_exact* the docValues will be used.
 2) For querying on *manu_exact* the inverted index will be used.
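
For illustration, here is the kind of usage I have in mind (a SolrJ sketch;
the query value is just a placeholder):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.ORDER;

public class ManuExactUsage {
    public static void main(String[] args) {
        // The term query is served by the inverted index, while sorting
        // and faceting can be served from the docValues of the same field.
        SolrQuery q = new SolrQuery("manu_exact:Belkin");
        q.setSort("manu_exact", ORDER.asc);
        q.setFacet(true);
        q.addFacetField("manu_exact");
    }
}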

Thanks,
Modassar


Re: Regarding behavior of docValues.

2015-02-24 Thread Modassar Ather
Thanks for your response Mikhail.

On Tue, Feb 24, 2015 at 5:35 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Both statements seem true to me.
>
> On Tue, Feb 24, 2015 at 2:49 PM, Modassar Ather 
> wrote:
>
> > Hi,
> >
> > Kindly help me understand the behavior of following field.
> >
> >  > docValues="true" />
> >
> > For a field like above where indexed="true" and docValues="true", is it
> > that:
> >  1) For sorting/faceting on *manu_exact* the docValues will be used.
> >  2) For querying on *manu_exact* the inverted index will be used.
> >
> > Thanks,
> > Modassar
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> 
>


Re: Regarding behavior of docValues.

2015-02-24 Thread Modassar Ather
So for a requirement where I have a field which is used for sorting,
faceting and searching, what would be the better field definition?

Can it be **
or
two fields, one for sorting+faceting and one for searching, like the following.


**

Kindly note that it would be better if I can use the existing field for
sorting and faceting and add searching on it, as in example one above.

Regards,
Modassar

On Tue, Feb 24, 2015 at 11:15 PM, Erick Erickson 
wrote:

> Hmmm, that's not my understanding. docValues are simply a different
> layout for storing
> the _indexed_ values that facilitates rapid loading of the field from
> disk, essentially
> putting the uninverted field value in a conveniently-loadable form.
>
> So AFAIK, the field is stored only once and used for all three,
> sorting, faceting and
> searching.
>
> Best,
> Erick
>
> On Tue, Feb 24, 2015 at 4:13 AM, Modassar Ather 
> wrote:
> > Thanks for your response Mikhail.
> >
> > On Tue, Feb 24, 2015 at 5:35 PM, Mikhail Khludnev <
> > mkhlud...@griddynamics.com> wrote:
> >
> >> Both statements seem true to me.
> >>
> >> On Tue, Feb 24, 2015 at 2:49 PM, Modassar Ather  >
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > Kindly help me understand the behavior of following field.
> >> >
> >> >  >> > docValues="true" />
> >> >
> >> > For a field like above where indexed="true" and docValues="true", is
> it
> >> > that:
> >> >  1) For sorting/faceting on *manu_exact* the docValues will be used.
> >> >  2) For querying on *manu_exact* the inverted index will be used.
> >> >
> >> > Thanks,
> >> > Modassar
> >> >
> >>
> >>
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >> Principal Engineer,
> >> Grid Dynamics
> >>
> >> <http://www.griddynamics.com>
> >> 
> >>
>


Re: Regarding behavior of docValues.

2015-02-24 Thread Modassar Ather
Thanks Erick for your detailed response.

Sorry! I missed mentioning that I was trying to understand it in the
context of Solr-5.0.0, where the FieldCache is no longer available.

Regards,
Modassar

On Wed, Feb 25, 2015 at 11:26 AM, Erick Erickson 
wrote:

> You're making it too complicated. Both a docValues field and
> an indexed (not docValues) field will give you the same
> functionality. For rapidly changing indexes, docValues will
> load more quickly when a new searcher is opened.
>
> Your question below is not really relevant.
> 
> Can it be * stored="false" docValues="true" />*
> or
> Two fields each for sorting+faceting and for searching like following.
>
>
> * /> stored="false" docValues="true" />*
>
> *
> You simply cannot sort, search, or facet on any field for which
> indexed="false". You can do all three on any field where
> indexed="true" (assuming it's not multiValued and only has one token
> since sorting only really makes sense for single-valued fields).
>
> It doesn't matter whether the field is docValues="true" or not.
> So if you want a "rule of thumb", make it a docValues field
> if you're updating your index rapidly. Otherwise whether a field is
> docValues or not is largely irrelevant.
>
> Best,
> Erick
>
> On Tue, Feb 24, 2015 at 9:09 PM, Modassar Ather 
> wrote:
> > So for a requirement where I have a field which is used for sorting,
> > faceting and searching what should be the better field definition.
> >
> > Can it be * > stored="false" docValues="true" />*
> > or
> > Two fields each for sorting+faceting and for searching like following.
> >
> >
> > * > /> > stored="false" docValues="true" />*
> >
> > Kindly note that it will be better if can use existing field for sorting,
> > faceting and add searching on it like in example one above.
> >
> > Regards,
> > Modassar
> >
> > On Tue, Feb 24, 2015 at 11:15 PM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> >> Hmmm, that's not my understanding. docValues are simply a different
> >> layout for storing
> >> the _indexed_ values that facilitates rapid loading of the field from
> >> disk, essentially
> >> putting the uninverted field value in a conveniently-loadable form.
> >>
> >> So AFAIK, the field is stored only once and used for all three,
> >> sorting, faceting and
> >> searching.
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Feb 24, 2015 at 4:13 AM, Modassar Ather  >
> >> wrote:
> >> > Thanks for your response Mikhail.
> >> >
> >> > On Tue, Feb 24, 2015 at 5:35 PM, Mikhail Khludnev <
> >> > mkhlud...@griddynamics.com> wrote:
> >> >
> >> >> Both statements seem true to me.
> >> >>
> >> >> On Tue, Feb 24, 2015 at 2:49 PM, Modassar Ather <
> modather1...@gmail.com
> >> >
> >> >> wrote:
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > Kindly help me understand the behavior of following field.
> >> >> >
> >> >> >  stored="false"
> >> >> > docValues="true" />
> >> >> >
> >> >> > For a field like above where indexed="true" and docValues="true",
> is
> >> it
> >> >> > that:
> >> >> >  1) For sorting/faceting on *manu_exact* the docValues will be
> used.
> >> >> >  2) For querying on *manu_exact* the inverted index will be used.
> >> >> >
> >> >> > Thanks,
> >> >> > Modassar
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Sincerely yours
> >> >> Mikhail Khludnev
> >> >> Principal Engineer,
> >> >> Grid Dynamics
> >> >>
> >> >> <http://www.griddynamics.com>
> >> >> 
> >> >>
> >>
>

