language identification during solrj indexing

2015-07-02 Thread vineet yadav
Hi,

I want to perform language identification during SolrJ indexing. I have
made the configuration changes required for language identification on the
basis of the Solr wiki (
https://cwiki.apache.org/confluence/display/solr/Detecting+Languages+During+Indexing
).
The language detection update chain is working when I index externally, but
the language field is not added in Solr when I am indexing documents through
SolrJ.

I want to ask if it is possible to identify the language during SolrJ
indexing.

Thanks
Vineet Yadav


accent insensitive field-type

2015-07-02 Thread Søren

Hi Solr users

I'm new to Solr and I need to be able to search structured data in a
case- and accent-insensitive manner, e.g. find "Crème brûlée" both when
querying with "Crème brûlée" and with "creme brulee".

It seems that none of the built-in text types support this, or am I wrong?
So I tried to add my own, inspired by another (albeit old) post.

I'm running solr-5.2.1.

Curl to http://localhost:8983/solr/mycore/schema
{
"add-field-type":{
 "name":"myTxtField",
 "class":"solr.TextField",
 "positionIncrementGap":"100",
 "analyzer":{
"charFilter": {"class":"solr.MappingCharFilterFactory", 
"mapping":"mapping-ISOLatin1Accent.txt"},

"filter": {"class":"solr.LowerCaseFilterFactory"},
"tokenizer": {"class":"solr.StandardTokenizerFactory"}
}
}
}

But it doesn't work, and when I look in '[...
]\solr-5.2.1\server\solr\mycore\conf\managed-schema'

the analyzer section is reduced to this:

<fieldType name="myTxtField" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
Am I almost there, or am I on a completely wrong track?

Thanks in advance
Søren



Location of config files in Zoo Keeper

2015-07-02 Thread dinesh naik
Hi all,
For Solr version 5.1.0, where does ZooKeeper keep all the config files,
and how do we access them?

From the Admin console (Cloud --> Tree --> config) we are able to see them, but
where does ZooKeeper store them (location)?
-- 
Best Regards,
Dinesh Naik


Re: accent insensitive field-type

2015-07-02 Thread Ahmet Arslan
Hi Soren,

I am not familiar with the managed schema part, but there are built-in filters
for this task.

ASCIIFoldingFilter and ICUFoldingFilter are two examples.

Also, Solr provides two files, mapping-FoldToASCII.txt and
mapping-ISOLatin1Accent.txt, to be used with
MappingCharFilter as you did.
You are probably hitting a problem with the managed schema.

Ahmet




Re: language identification during solrj indexing

2015-07-02 Thread Alessandro Benedetti
SolrJ is simply a Java client for accessing the Solr REST API.
This means that "indexing through SolrJ" is not a separate kind of indexing.
You simply need to add the proper chain to the update request handler you
are using.
Taking a look at the code, by default the SolrJ UpdateRequest refers to the
"/update" endpoint.
Have you checked whether your custom chain is configured for that?
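
For reference, here is a minimal SolrJ sketch, assuming a language-detection
update chain named "langid" has been defined in solrconfig.xml (the chain
name, core URL and field names are illustrative, not taken from this thread):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class LangIdIndexing {
  public static void main(String[] args) throws Exception {
    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    doc.addField("text", "Bonjour tout le monde");

    UpdateRequest req = new UpdateRequest();
    // Route this update through the language-detection chain explicitly;
    // alternatively, make the chain the default on the /update handler.
    req.setParam("update.chain", "langid");
    req.add(doc);
    req.process(client);

    client.commit();
    client.close();
  }
}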


Cheers




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: how to

2015-07-02 Thread Alessandro Benedetti
Your request is very cryptic; actually, I really discourage this kind of
request …

Always give at least basic information on:
1) Environment (which Solr version? what architecture?)
2) Domain (what problem are you trying to solve? what is the data model?)
3) The specific problem, with a generic and detailed description
4) Some real-world examples of your problem

Trying to decipher, do you mean:
1) I want all the documents containing the term "iphone" in a field but not
the term "test".
-> field:(+iphone -test)

2) I want only documents containing the term "iphone" in a field and
nothing else.
In this case I would suggest not tokenizing the field of interest, perhaps
simply lowercasing it; a simple term query should then solve the problem (see
the sketch below).
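
A sketch of such an untokenized, lowercased field type (the type name is
illustrative, not from this thread):

<!-- Matches the whole field value as a single token, case-insensitively -->
<fieldType name="string_lc" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>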

Please try to reformulate this in a more "human readable" form so we can
help better and avoid having to decipher :)

Cheers



2015-07-02 2:51 GMT+01:00 rulinma :

> search "iphone"
>
>
> but I don't want "iphone test" content to be the first record; I want to
> reduce the weight of "test". How can I do this?
>
> thanks.
>
>
>
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Bug: replies mixed up with concurrent requests from the same host

2015-07-02 Thread Kevin Perros

Thanks for the answers,

I also found that blog post about such issues:
http://techbytes.anuragkapur.com/2014/08/potential-jetty-concurrency-bug-seen-in.html

On 01/07/15 20:26, Chris Hostetter wrote:

: Hmm, interesting. That particular bug was fixed by upgrading to Jetty
: 4.1.7 in https://issues.apache.org/jira/browse/SOLR-4031

1st) Typo - Shalin meant 8.1.7 above.

2nd) If you note the details of both issues, no root cause was ever
identified as being "fixed" -- all that happened was that Per tried
upgrading to 8.1.7 and found he could no longer reproduce with his
particular test cases.

That doesn't mean the bug went away in 8.1.7; it means something
changed in 8.1.7 that caused the bug to no longer surface in the same way
for the same person.

It's very possible this is in fact the same bug, but some other minor
change in 8.1.7 just changed the input needed to trigger it (eg:
maybe a buffer size increase/decrease, or a change in the default size of
a HashMap ... anything like that could tweak the necessary input
size / request count / etc. needed to trigger the bug)

: > When I query solr with either curl or wget, with multiple parallel requests
: > from the same client host to the server, the answers come mixed up. From my
: > logs, I've seen that if I send 1 requests, with a 24 fold parallelism, I
: > often get as an answer to a request, the answer to the first one.

can you reproduce this against a controlled set of data/configs/queries
that you can bundle up in a zip file and make available to other people
for testing? (ie: non-proprietary/confidential configs + data + queries,
preferably with a data set small enough that it can be downloaded
quickly, ideally under 10MB so it can be attached to JIRA)


-Hoss
http://www.lucidworks.com/






Re: Suggester duplicating values

2015-07-02 Thread Alessandro Benedetti
Hi Rafael,
Your problem is clear, and it has actually been explored a few times in the
past.
At first glance I agree with you.

A Suggester's basic unit of information is a term, not a document.
This means that it does not actually make much sense to return duplicate
terms (just because they come from different docs).
The term's id should be the term itself, as there is no way for a human to
perceive any difference between two identical terms returned by the
Suggester.

So, this consideration apart, are you using an intermediate API to query
Solr (you definitely should)?
If you are using any client, your client language should provide a data
structure implementation you can use to avoid duplicates.
Java, for example, gives you HashSet, TreeSet and all the related
classes (see the sketch below).
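
A minimal Java sketch of that idea (the suggestion values are hard-coded for
illustration):

import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

public class DedupeSuggestions {
  public static void main(String[] args) {
    // Suggestions as they might come back from the Suggester, with duplicates
    List<String> raw = Arrays.asList(
        "J. R. R. Tolkien", "J. R. R. Tolkien",
        "J. R. R. Tolkien", "J. R. R. Tolkien");

    // LinkedHashSet drops duplicates while preserving the original order
    Collection<String> unique = new LinkedHashSet<>(raw);
    System.out.println(unique); // [J. R. R. Tolkien]
  }
}

The same applies in Ruby or any other client language: deduplicate in an
order-preserving set before rendering the suggestions.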

Hope this helps,

Cheers

2015-07-01 18:40 GMT+01:00 Rafael :

> Hi, I'm building an autocomplete solution on top of Solr for an ebook
> seller, but my database is completely denormalized; for example, I have this
> kind of records:
>
> *author   | title   | price*
> -+-+-
> J. R. R. Tolkien | Lord of the Rings   | $10.0
> J. R. R. Tolkien | Lord of the Rings Vol. 3| $12.0
> J. R. R. Tolkien | Lord of the Rings   | $11.0
> J. R. R. Tolkien | Lord of the Rings Vol. 3| $7.5
> J. R. R. Tolkien | Lord of the Rings Hardcover | $30.5
>
> We are already spending effort to normalize the database, but it will
> take a while*
>
>
> Thus, when I try to implement suggest on the author field, for example, if I
> type "*J.*" I'd get "*J. R. R. Tolkien*" 4 times.
>
> My Suggester Configuration is pretty standard:
>
> <fieldType name="textSuggest" class="solr.TextField"
> positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
>
> <searchComponent name="suggest" class="solr.SuggestComponent">
>   <lst name="suggester">
>     <str name="name">mySuggester</str>
>     <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>     <str name="field">author</str>
>     <str name="suggestAnalyzerFieldType">textSuggest</str>
>   </lst>
> </searchComponent>
>
> <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
>   <lst name="defaults">
>     <str name="suggest">true</str>
>     <str name="suggest.count">20</str>
>     <str name="suggest.dictionary">mySuggester</str>
>   </lst>
>   <arr name="components">
>     <str>suggest</str>
>   </arr>
> </requestHandler>
>
>
> And I'm using Solr 5.2.1.
>
> *Question:* Is there a way to get only unique values for suggestions? Or
> would it be simpler to export a file (or even a new table in the database)
> without duplicated values?
>
> Thanks.
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: DocValues: Which format is better Default or Memory?

2015-07-02 Thread Alessandro Benedetti
So, first of all:
DocValues is a strategy for storing the un-inverted index for the fields of
interest on disk (or in memory).
This was done to SPEED UP the facet computation using the "fc"
algorithm, and to improve memory usage.
It is really weird that this would be the cause of degraded performance.

Building DocValues should improve the query time needed to build facets,
at the cost of increased indexing time.
Are you sure nothing else could be affecting your times?

Let's try to help you out!

2015-07-02 4:19 GMT+01:00 Aman Tandon :

> Hi,
>
> I tried to use docValues to reduce the search time, but when I am using
> the default format for docValues it is taking more time compared to
> the normal faceting technique (without docValues).
>
> Should I go for the Memory format, or is there something missing?
>
> *Note:* I am doing indexing every 10 minutes, and I am using Solr
> 4.8.1
>
> With Regards
> Aman Tandon
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Problem XY - X = SolrCloud 4.8 replicas down, Y = SolrCloud upgrade to a new version

2015-07-02 Thread Vincenzo D'Amore
Hi All,

In recent months my SolrCloud clusters sometimes (one or two times a
week) have a few replicas down.
Usually all the replicas go down on the same node.
I'm unable to understand why a 3-node cluster with 8 cores/32 GB and
high-performance disks has this problem. The main index is small, about 1.5 M
documents with very little text inside.
I don't know if having 3 shards with 3 replicas is too much; to me it seems
a fairly high level of availability, but anyway this should not compromise
cluster stability.
All the queries come back in under a second, so the cluster is responsive.

A few months ago I began to think the problem was related to an old and
buggy version of SolrCloud that we had to upgrade.
But reading in this list about the classic XY problem, I changed my mind;
maybe there is a much better solution.

Last night I had, again, a couple of replicas down around 1:07 AM; this is
the SolrCloud log file:

http://pastebin.com/raw.php?i=bCHnqnXD

At the end of the exception list there are a few "cancelElection did not find
election node to remove" errors, and this morning I found the replicas down.

Looking at the GC log file, I found that at the same moment there was a GC
that took about 20 seconds. I'm using the CMS (ConcurrentMarkSweep) collector,
taken from Shawn Heisey's suggestions:
https://wiki.apache.org/solr/ShawnHeisey#CMS_.28ConcurrentMarkSweep.29_Collector


http://pastebin.com/raw.php?i=VuSrg4uz

At last, looking around over the past months, I found this bug, which seems
to me to be related to this problem.
So I began to think that I need an upgrade; am I right? What do you think
about it?

https://issues.apache.org/jira/browse/SOLR-6159

Any help is very much appreciated.

Thanks,
Vincenzo


RE: language identification during solrj indexing

2015-07-02 Thread Markus Jelsma
https://wiki.apache.org/solr/LanguageDetection

 
 


Re: DocValues: Which format is better Default or Memory?

2015-07-02 Thread Aman Tandon
Hi,

I tried to query with and without docValues, and the query with docValues
was taking more time. Could that be because I/O gets involved, since some of
the data now lives in a file?

> Are you sure nothing else could be affecting your times?

Yes, I am sure. We re-indexed the whole index of 40 million records to
implement docValues and improve the speed. I managed to run
simultaneous queries with and without docValues, and I am getting higher
times with docValues by approx 200 ms. As far as I can see, the gap increases
as the number of hits increases.

*My configuration for docValue is:*

<field ... ="false" omitNorms="true" multiValued="false" />

With Regards
Aman Tandon

On Thu, Jul 2, 2015 at 3:15 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> So first of all,
> DocValues is a strategy to store on the disk ( or in memory) the
> Un-inverted index for the field of interests.
> This has been done to SPEED UP the faceting calculus using the "fc"
> algorithm, and improve the memory usage.
> It is really weird that this is the cause of a degrading of performances.
>
> Building the DocValues should improve the query time to build facets,
> increasing the indexing time.
> Are you sure anything else could affect your times ?
>
> let's try to help you out !
>
> 2015-07-02 4:19 GMT+01:00 Aman Tandon :
>
> > Hi,
> >
> > I tried to use the docValues to reduce the search time, but when I am
> using
> > the default format for docValues it is taking more time as compared to
> > normal faceting technique (without docValues).
> >
> > Should I go for Memory format or there is something missing?
> >
> > *Note:-* I am doing the indexing at every 10 minutes and I am using solr
> > 4.8.1
> >
> > With Regards
> > Aman Tandon
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: Suggester configuration queries.

2015-07-02 Thread ssharma7...@gmail.com
Erick,
We actually have a working version of the Solr 4.6 spellchecker; the
configuration details are as mentioned below:

*Solr 4.6 - schema.xml*






















*Solr 4.6 - solrconfig.xml*


none
json
false
true
suggestDictionary
true
5
false


suggest





suggestDictionary
org.apache.solr.spelling.suggest.Suggester
org.apache.solr.spelling.suggest.fst.FSTLookupFactory
suggest
0.
true



*Solr 4.6 Spellcheck query*
http://localhost:8983/solr/portal_documents/suggest?&wt=xml&spellcheck.q=wh

*Solr 4.6 Spellcheck results*


0
0




5
0
2

*when
what
where
which
who*






Now, we are migrating to Solr 5.1 & have the following configuration
details:
*Solr 5.1 - schema.xml*























*Solr 5.1 - solrconfig.xml*

  
 c_suggest

  default
  suggest
  solr.DirectSolrSpellChecker
  internal
  0.01
  2
  1
  5
  1
  0.01
  .01

  

  

default
on
5
false
xml
true
false


  spellcheck

  

*Solr 5.1 SPellcheck query (same as Solr 4.6)*
http://localhost:8983/solr/portal_documents/spell?&wt=xml&spellcheck.q=wh

*Solr 5.1 Spellcheck results*


0
62





2
0
2

*we
who*






Both Solr versions have the same data, and the spellcheck index is also built.
I want to get the same results from the spellchecker in Solr 5.1 as I am
getting in 4.6, but I am not able to.

Can you please suggest an appropriate fix?
Is there some problem in my Solr 5.1 configuration?


Regards,
Sachin Vyas.






Re: How to do a Data sharding for data in a database table

2015-07-02 Thread wwang525
Hi,

I worked with other search solutions before, and cache management is
important in boosting performance. Apart from the cache generated due to
user's requests, loading the search index into memory is the very initial
step after the index is built. This is to ensure search results to be
retrieved from memory, and not from disk I/O.

The observation is that if the search index has not been accessed for a long
time, the performance will be degraded greatly due to the swap of the search
index from memory to disk by OS.

Does Solr automatically loads search index into memory after the index is
built? Otherwise, is there any tool or command that can accomplish this
task. 

Regards






Re: how to

2015-07-02 Thread Jack Krupansky
Use a fractional boost for the test term, and make test optional: +iphone
test^0.5

-- Jack Krupansky



Re: Suggester duplicating values

2015-07-02 Thread Rafael
Thanks, Alessandro!

Well, I'm using Ruby with r-solr as the client library. I didn't get what
you said about the term id. Do I have to create this field? Or is it a "hidden
field" used by Solr under the hood?

[]'s
Rafael



Re: accent insensitive field-type

2015-07-02 Thread Søren

Thanks Ahmet

I'm trying to add the ICUFoldingFilterFactory in the analyzer.
I suspect that my problem is that the filter class doesn't load.
The managed-schema file shows the same info as the schema
browser in the web GUI.


Cheers





Re: Location of config files in Zoo Keeper

2015-07-02 Thread Shawn Heisey
On 7/2/2015 2:31 AM, dinesh naik wrote:
> For solr version 5.1.0, Where does Zoo keeper keep all the config files
> ?How do we access them ?
> 
> From Admin console , Cloud-->Tree-->config , we are able to see them but
> where does Zoo Keeper store them(location)?

The information you can see in Cloud->Tree is a visual representation of
the entire config tree within Zookeeper.  Zookeeper is organized quite a
lot like a filesystem, with structures that look a lot like directories.

If you have provided a chroot with your zkHost string, then the Solr
config tree will begin at the path indicated in the chroot.  If you did
not include a chroot, then it will be at the root of the zookeeper
"filesystem."

Thanks,
Shawn



Re: accent insensitive field-type

2015-07-02 Thread Shawn Heisey
On 7/2/2015 8:53 AM, Søren wrote:
> I'm trying the add the ICUFoldingFilterFactory in the analyzer.
> I suspect that my problem is that the filter class doesn't load.
> The managed-schema file is the same info as when looking in the schema
> browser in the web gui.

The ICU analysis components are considered "contrib" and therefore
require adding jars to Solr for them to work.  The notes on the
AnalyzersTokenizersTokenFilters wiki page tell you where to look for
more information about those jars.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory

The best place to add additional jars is ${solr.solr.home}/lib ... a
directory which usually must be created.  The solr home is where
solr.xml lives, providing global config options to Solr.  The default
location for the solr home in the 5.x download is server/solr, but that
can be overridden.
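
As a sketch, for the 5.2.1 download layout that might look like the following
(the jar names/versions and contrib paths should be checked against your own
download):

mkdir -p server/solr/lib
cp dist/solr-analysis-extras-5.2.1.jar server/solr/lib/
cp contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-5.2.1.jar server/solr/lib/
cp contrib/analysis-extras/lib/icu4j-*.jar server/solr/lib/

Restart Solr afterwards so the new jars are picked up.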

Thanks,
Shawn



Re: Suggester duplicating values

2015-07-02 Thread Alessandro Benedetti
No, I was referring to the fact that a Suggester's unit of information is the
simple term, which is identified simply by itself.

What you need to do is use some Ruby data structure that prevents
duplicates from being inserted, and then serve the suggestions from there.

Cheers




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: accent insensitive field-type

2015-07-02 Thread Steve Rowe
Hi Søren,

"charFilter" should be "charFilters", and "filter" should be "filters"; and
both their values should be arrays - try this:

{
  "add-field-type": {
    "name":"myTxtField",
    "class":"solr.TextField",
    "positionIncrementGap":"100",
    "analyzer": {
      "charFilters": [ {"class":"solr.MappingCharFilterFactory",
                        "mapping":"mapping-ISOLatin1Accent.txt"} ],
      "tokenizer": {"class":"solr.StandardTokenizerFactory"},
      "filters": [ {"class":"solr.LowerCaseFilterFactory"} ]
    }
  }
}

There should be better error messages for misspellings here. I'll file a JIRA
issue.

(I also moved "filters" after "tokenizer" since that's the order in which
they're executed in an analysis pipeline, but Solr will interpret the
out-of-order version correctly.)

FYI, if you want to *correct* a field type, rather than create a new one, you
should use the "replace-field-type" command instead of the "add-field-type"
command. You'll get an error if you attempt to add a field type that already
exists in the schema.
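
For reference, a minimal sketch of posting such a command with curl, assuming
the JSON above is saved as myTxtField.json and the core is named mycore as in
the original post:

curl -X POST -H 'Content-Type: application/json' \
    --data-binary @myTxtField.json \
    'http://localhost:8983/solr/mycore/schema'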

Steve




Re: Suggester duplicating values

2015-07-02 Thread Rafael
Just double checking:

In my Ruby backend I ask for (using the given example) all suggested terms
that start with "J.", then I (probably) add all the terms to a Set, and
then return the Set to the view. Right?

[]'s
Rafael



Re: Problem XY - X = SolrCloud 4.8 replicas down, Y = SolrCloud upgrade to a new version

2015-07-02 Thread Erick Erickson
Vincenzo:

First and foremost, figure out why you're having 20-second GC pauses. For
indexes like you're describing, this is unusual. How big is the heap
you allocate to the JVM?

Check your Zookeeper timeout. In earlier versions of SolrCloud it defaulted to
15 seconds. Going into leader election would happen for no obvious reason,
and lengthening it to 30-60 seconds seemed to help a lot of people.
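
As a sketch, the timeout can be raised in solr.xml (the 30-second value here
is illustrative):

<solrcloud>
  <!-- Session timeout between Solr and ZooKeeper, in milliseconds -->
  <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
</solrcloud>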

The disks should be largely irrelevant to the origin or cure for this problem...

Here's a good article on why you want to allocate "just enough" heap
for your app. Of course, "just enough" can be interesting to actually
define:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick



Re: DocValues: Which format is better Default or Memory?

2015-07-02 Thread Aman Tandon
Anything wrong?

With Regards
Aman Tandon



Re: DocValues: Which format is better Default or Memory?

2015-07-02 Thread Erick Erickson
How are you testing? I'd do a couple of things:
1> turn off your queryResultCache (set its size to 0).
2> run multiple queries through something like JMeter
3> ensure you've run enough warmup queries to load
 all your fields into memory.

Basically, if this were always the case, I'd expect a
_lot_ of people to be talking about it, so I suspect there's
something in your test methodology that's giving you
inaccurate results.
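
For point 1>, a sketch of what that looks like in solrconfig.xml (the stock
element, with the sizes zeroed out for benchmarking):

<queryResultCache class="solr.LRUCache"
                  size="0"
                  initialSize="0"
                  autowarmCount="0"/>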



Re: Suggester duplicating values

2015-07-02 Thread Alessandro Benedetti
That is what I was saying :)
Hope it helps


AND for multiple faceted queries

2015-07-02 Thread Aki Balogh
I'm trying to specify multiple fq parameters and get the intersection (lines
separated for readability):

query?
q=webCrawlId:36&
fq=(body:"crib bedding" OR title:"crib bedding")&
fq={!frange l=0 u=0}termfreq(body,"crib bedding")&
fq={!frange l=0 u=0}termfreq(title,"crib bedding")&
rows=25000&
tv=false&
start=0&
wt=json

this should return 0 records, but it comes back with results. It turns out it
is returning records that match ANY of the fqs, not ALL of them.

How can I force Solr to return only records that match ALL?

thanks,
Aki


Re: How to do a Data sharding for data in a database table

2015-07-02 Thread Erick Erickson
bq: Does Solr automatically load the search index into memory after the index
is built?

No. That's what the autowarm counts on your queryResultCache
and filterCache are intended to facilitate. Also, after every commit,
a newSearcher event is fired, and any warmup queries you have configured
in the newSearcher section of your solrconfig.xml are run; you should
configure those so as to load whatever low-level caches you expect
to be loaded.
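
A sketch of such a warmup listener in solrconfig.xml (the query and field name
are illustrative):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- Touches the index and primes the facet structures for one field -->
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
    </lst>
  </arr>
</listener>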

What did you look at to try to answer this question before you posted it?
The top two Google responses outline this in some detail.

Best,
Erick



Re: DocValues: Which format is better Default or Memory?

2015-07-02 Thread Toke Eskildsen
Alessandro Benedetti  wrote:
> DocValues is a strategy to store on the disk ( or in memory) the
> Un-inverted index for the field of interests.

True.

> This has been done to SPEED UP the faceting calculus using the "fc"
> algorithm, and improve the memory usage.

Part of the reason was to speed up the _startup_ time for faceting.

This is not the first time I have read about people getting poorer
query-performance with DocValues. It does make sense: DocValues in the index
means that they compete with other files for disk caching, and even when they
are fully cached, the UnInverted structure has a speed edge due to being
directly accessible as a standard on-heap memory structure.

The difference is likely to vary a great deal depending on the concrete corpus
and hardware.

- Toke Eskildsen


Re: Location of config files in Zoo Keeper

2015-07-02 Thread Erick Erickson
_Why_ do you want to access the raw files? In the
"normal" case you shouldn't have to care a whit
about where ZK keeps them.

The normal pattern for changing these files is to use
the upconfig/downconfig zkcli commands to replace
the configs wholesale.

For production situations, it's often the pattern to keep
the configs in some version control system (svn, git,
whatever). Then, to change them, you check them out
of source control and upconfig them (which replaces
an old configset or adds a new one).

You can use the downconfig command to fetch the files
from ZK to your local machine.
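
As a sketch, with the zkcli script bundled under the 5.x layout (the paths,
config name and ZooKeeper address are illustrative):

# Push a local config directory up to ZooKeeper as configset "myconf"
server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181 \
    -cmd upconfig -confdir /path/to/conf -confname myconf

# Pull it back down for local editing
server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181 \
    -cmd downconfig -confdir /tmp/myconf -confname myconf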

And if you really want to, there's a nifty IntelliJ plugin
that will allow you to edit them directly on ZK, but
that is something I'd never use for production because
your configs had _really_ better be under some
versioning system. It's great for development though.

Best,
Erick



Re: AND for multiple faceted queries

2015-07-02 Thread Erick Erickson
What have you done to try to track this down? What
proof do you have that the intersection of all those
sets is indeed empty? Have you tried the fq clauses
one at a time? If my guess is correct, you'll see the
last two returning all documents.

This certainly isn't the way fq's work,
and if it were fundamental to fq's, lots of tests would break
and you'd be seeing a zillion bug reports.

So I suspect it's
1> something here you're not showing us
or
2> something really strange with frange
or
3> you are misunderstanding something. To wit:

fq={!frange l=0 u=0}termfreq(body,"crib bedding")

will return all documents that do _not_ mention "crib bedding"
as a single term (since you're specifying a term frequency of 0).
Which, if the field is analyzed, means
all documents, since there is no _single_ term "crib bedding";
there are, perhaps, two single terms "crib" and "bedding".
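
For contrast, a sketch of filters that require a match instead of excluding
one (assuming analyzed body/title fields; note that termfreq() counts a single
indexed term, not a phrase):

fq={!frange l=1}termfreq(body,"crib")   keeps docs where the term occurs at least once
fq=body:"crib bedding"                  keeps docs matching the phrase on an analyzed field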

BTW, returning 25,000 rows is something of an anti-pattern
in Solr, usually you do that with the export handler.

If that's not what's happening, let's see:
1> results of adding &debug=all to the query
2> what version of Solr?
3> any changes you've made to your solrconfig that might be important.

Best,
Erick



Re: DocValues: Which format is better Default or Memory?

2015-07-02 Thread Aman Tandon
So should I use Memory format?

With Regards
Aman Tandon

On Thu, Jul 2, 2015 at 9:20 PM, Toke Eskildsen 
wrote:

> Alessandro Benedetti  wrote:
> > DocValues is a strategy to store on the disk ( or in memory) the
> > Un-inverted index for the field of interests.
>
> True.
>
> > This has been done to SPEED UP the faceting calculus using the "fc"
> > algorithm, and improve the memory usage.
>
> Part of the reason was to speed up the _startup_ time for faceting.
>
> This is not the first time I read about people getting poorer
> query-performance with DocValues. It does make sense: DocValues in the
> index means that they compete with other files for disk caching and even
> when they are fully cached, the UnInverted structure has a speed edge due
> to being directly accessible as standard on-heap memory structures.
>
> The difference is likely to vary a great deal depending on concrete corpus
> & hardware.
>
> - Toke Eskildsen
>
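
For reference, the format is chosen per field type in the schema. A
sketch, assuming solr.SchemaCodecFactory is enabled in solrconfig.xml
(the field type name here is made up):

<!-- solrconfig.xml -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- schema.xml -->
<fieldType name="string_dv_mem" class="solr.StrField"
           docValues="true" docValuesFormat="Memory"/>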


Re: Problem XY - X = SolrCloud 4.8 replicas down, Y = SolrCloud upgrade to a new version

2015-07-02 Thread Vincenzo D'Amore
Hi Erick,

thanks for your answer.

We use java 8 and allocate a 16GB heap size

 -Xms2g -Xmx16g

There are 1.5M docs and about 16 GB index size on disk.

Let me also say, during the day we have a lot of little updates, from 1k to
50k docs each time, and we do a full update of all documents during the
night. It is during this full update that the 20-second GC pauses happen.

I haven't read Uwe's post completely, just because it was too long; all I
got was that I have to use MMapDirectory.
But I was still unable to restart production with this new component.
After the change it is not clear whether we only need to restart the
core/node or whether a full reindex must be done.

Thanks for your time, I'll read very carefully Uwe's post.


On Thu, Jul 2, 2015 at 5:39 PM, Erick Erickson 
wrote:

> Vincenzo:
>
> First and foremost, figure out why you're having 20 second GC pauses. For
> indexes like you're describing, this is unusual. How big is the heap
> you allocate to the JVM?
>
> Check your Zookeeper timeout. In earlier versions of SolrCloud it
> defaulted to
> 15 seconds. Going into leader election would happen for no obvious reason,
> and lengthening it to 30-60 seconds seemed to help a lot of people.
>
> The disks should be largely irrelevant to the origin or cure for this
> problem...
>
> Here's a good article on why you want to allocate "just enough" heap
> for your app. Of course, "just enough" can be interesting to actually
> define:
>
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Best,
> Erick
>
> On Thu, Jul 2, 2015 at 5:45 AM, Vincenzo D'Amore 
> wrote:
> > Hi All,
> >
> > In the latest months my SolrCloud clusters, sometimes (one/two times a
> > week), have few replicas down.
> > Usually all the replicas goes down on the same node.
> > I'm unable to understand why a 3 nodes cluster with 8 core/32 GB and high
> > performance disks have this problem. The main index is small, about 1.5 M
> > of documents with very small text inside.
> > I don't know if having 3 shards with 3 replicas is too much, to me it
> seems
> > a fair high high availability, but anyway this should not compromise the
> > cluster stability.
> > All the queries are under the second, so it is responsive.
> >
> > Few months ago I begun to think the problem was related to an old and
> > bugged version of SolrCloud that we have to upgrade.
> > But reading in this list about the classic XY problem I changed my mind,
> > maybe there a much better solution.
> >
> > This night I had, again, a couple of replicas down around 1.07 AM, this
> is
> > the SolrCloud log file:
> >
> > http://pastebin.com/raw.php?i=bCHnqnXD
> >
> > At end of exceptions list there are few "cancelElection did not find
> > election node to remove" errors and this morning I found the replicas
> down.
> >
> > Looking GC log file I found that at same moment there is a GC that takes
> > about 20 seconds. Now I'm using CMS (ConcurrentMarkSweep) Collector taken
> > from Shawn Hensey suggestions:
> >
> https://wiki.apache.org/solr/ShawnHeisey#CMS_.28ConcurrentMarkSweep.29_Collector
> >
> >
> > http://pastebin.com/raw.php?i=VuSrg4uz
> >
> > At last, looking around in the latest months I found this bug, that seems
> > to me be related to with this problems.
> > So I begun to think that I need an upgrade, am I right? What do you think
> > about ?
> >
> > https://issues.apache.org/jira/browse/SOLR-6159
> >
> > Any help is very appreciated.
> >
> > Thanks,
> > Vincenzo
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Suggester duplicating values

2015-07-02 Thread Rafael
Absolutely!

Thanks man.

[]'s
Rafael

On Thu, Jul 2, 2015 at 12:42 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> That is what I was saying :)
> Hope it helps
>
> 2015-07-02 16:32 GMT+01:00 Rafael :
>
> > Just double checking:
> >
> > In my ruby backend I ask for (using the given example) all suggested
> terms
> > that starts with "J." , then I (probably) add all the terms to a Set, and
> > then return the Set to the view. Right ?
> >
> > []'s
> > Rafael
> >
> > On Thu, Jul 2, 2015 at 12:12 PM, Alessandro Benedetti <
> > benedetti.ale...@gmail.com> wrote:
> >
> > > No, I was referring to the fact that a Suggester as a unit of
> information
> > > manages simple terms which are identified simply by themselves.
> > >
> > > What you need to do is to use some Ruby data structure that prevents
> > > duplicates from being inserted, and then offer the suggestions from there.
> > >
> > > Cheers
> > >
> > > 2015-07-02 15:42 GMT+01:00 Rafael :
> > >
> > > > Thanks, Alessandro!
> > > >
> > > > Well, I'm using Ruby and the r-solr as a client library. I didn't get
> > > what
> > > > you said about term id. Do I have to create this field ? Or is it a
> > > "hidden
> > > > field" utilized by solr under the hood ?
> > > >
> > > > []'s
> > > > Rafael
> > > >
> > > > On Thu, Jul 2, 2015 at 6:41 AM, Alessandro Benedetti <
> > > > benedetti.ale...@gmail.com> wrote:
> > > >
> > > > > Hi Rafael,
> > > > > Your problem is clear and it has actually been explored few times
> in
> > > the
> > > > > past.
> > > > > I agree with you in a first instance.
> > > > >
> > > > > A Suggester basic unit of information is a term. Not a document.
> > > > > This means that actually it does not make a lot of sense to return
> > > > > duplicates terms ( because they are coming from different docs).
> > > > > The term id should be the term itself as there is no way for a
> human
> > to
> > > > > perceive any difference between two different terms returned by the
> > > > > Suggester.
> > > > >
> > > > > So, this consideration apart, are you using an intermediate API to
> > > query
> > > > > Solr ( you should definitely do) .
> > > > > If you are using any client, your client language should provide
> you
> > a
> > > > data
> > > > > structure implementation to use to avoid duplicates.
> > > > > Java for example is giving you HashSet , TreeSet and all the
> related
> > > > > classes.
> > > > >
> > > > > Hope this helps,
> > > > >
> > > > > Cheers
> > > > >
> > > > > 2015-07-01 18:40 GMT+01:00 Rafael :
> > > > >
> > > > > > Hi, I'm building a autocomplete solution on top of Solr for an
> > ebook
> > > > > > seller, but my database is complete denormalized, for example, I
> > have
> > > > > this
> > > > > > kind of records:
> > > > > >
> > > > > > *author   | title   | price*
> > > > > > -+-+-
> > > > > > J. R. R. Tolkien | Lord of the Rings   | $10.0
> > > > > > J. R. R. Tolkien | Lord of the Rings Vol. 3| $12.0
> > > > > > J. R. R. Tolkien | Lord of the Rings   | $11.0
> > > > > > J. R. R. Tolkien | Lord of the Rings Vol. 3| $7.5
> > > > > > J. R. R. Tolkien | Lord of the Rings Hardcover | $30.5
> > > > > >
> > > > > > We are already spending effort to normalize the database, but
> > it
> > > > will
> > > > > > take a while*
> > > > > >
> > > > > >
> > > > > > Thus, when I try to implement a suggest on author field, for
> > example,
> > > > if
> > > > > I
> > > > > > type "*J.*" I'd get "*J. R. R. Tolkien*" 4 times.
> > > > > >
> > > > > > My Suggester Configuration is pretty standard:
> > > > > >
> > > > > > 
> > > > > >  > > > > > positionIncrementGap="100">
> > > > > >   
> > > > > > 
> > > > > > 
> > > > > >   
> > > > > >   
> > > > > > 
> > > > > > 
> > > > > >   
> > > > > > 
> > > > > >
> > > > > >
> > > > > > 
> > > > > >   
> > > > > > 
> > > > > >   mySuggester
> > > > > >   AnalyzingInfixLookupFactory
> > > > > >   DocumentDictionaryFactory
> > > > > >   author
> > > > > >   textSuggest
> > > > > > 
> > > > > >   
> > > > > >
> > > > > >> > > > > startup="lazy">
> > > > > > 
> > > > > >   true
> > > > > >   20
> > > > > >   mySuggester
> > > > > > 
> > > > > > 
> > > > > >   suggest
> > > > > > 
> > > > > >   
> > > > > >
> > > > > >
> > > > > > And I'm using Solr 5.2.1.
> > > > > >
> > > > > > *Question:* Is there a way to get only unique values for
> > suggestion ?
> > > > Or,
> > > > > > would be simpler to export a file (or even a nem table in
> database)
> > > > > without
> > > > > > duplicated values ?
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > --
> > > > >
> > > > > Benedetti Alessandro
> > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > >
> > > > > "Tyger, tyger burning bright
> > > > > In t

Re: Problem XY - X = SolrCloud 4.8 replicas down, Y = SolrCloud upgrade to a new version

2015-07-02 Thread Erick Erickson
bq: and we do a full update of all documents during the night.

How fast are you sending documents? Prior to Solr 5.2 the replicas
would do twice the amount of indexing work that the leader
did (odd, but...). See:

http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/

Still, focusing on the GC pauses is probably the most fruitful avenue. You just
shouldn't be getting pauses that long with 16G heaps. How long does it
take you to re-index? I've seen situations where indexing at an
_extremely_ high rate will force replicas into recovery. It took 150 threads
all firing requests as fast as possible to hit that, but I thought I'd mention it.

Best,
Erick

On Thu, Jul 2, 2015 at 12:56 PM, Vincenzo D'Amore  wrote:
> Hi Erick,
>
> thanks for your answer.
>
> We use java 8 and allocate a 16GB heap size
>
>  -Xms2g -Xmx16g
>
> There are 1.5M docs and about 16 GB index size on disk.
>
> Let me also say, during the day we have a lot of little update, from 1k to
> 50k docs every time, and we do a full update of all documents during the
> night. And during this full update the 20 seconds GC happened.
>
> I haven't read completely the Uwe's post just because was too long, all I
> got was that I have to use MMapDirectory.
> But I was still unable to restart the production with this new component.
> After the change it is not clear if we only need to restart the core/node
> or if a full reindex must be done.
>
> Thanks for your time, I'll read very carefully Uwe's post.
>
>
> On Thu, Jul 2, 2015 at 5:39 PM, Erick Erickson 
> wrote:
>
>> Vincenzo:
>>
>> First and foremost, figure out why you're having 20 second GC pauses. For
>> indexes like you're describing, this is unusual. How big is the heap
>> you allocate to the JVM?
>>
>> Check your Zookeeper timeout. In earlier versions of SolrCloud it
>> defaulted to
>> 15 seconds. Going into leader election would happen for no obvious reason,
>> and lengthening it to 30-60 seconds seemed to help a lot of people.
>>
>> The disks should be largely irrelevant to the origin or cure for this
>> problem...
>>
>> Here's a good article on why you want to allocate "just enough" heap
>> for your app. Of course, "just enough" can be interesting to actually
>> define:
>>
>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>>
>> Best,
>> Erick
>>
>> On Thu, Jul 2, 2015 at 5:45 AM, Vincenzo D'Amore 
>> wrote:
>> > Hi All,
>> >
>> > In the latest months my SolrCloud clusters, sometimes (one/two times a
>> > week), have few replicas down.
>> > Usually all the replicas goes down on the same node.
>> > I'm unable to understand why a 3 nodes cluster with 8 core/32 GB and high
>> > performance disks have this problem. The main index is small, about 1.5 M
>> > of documents with very small text inside.
>> > I don't know if having 3 shards with 3 replicas is too much, to me it
>> seems
>> > a fair high high availability, but anyway this should not compromise the
>> > cluster stability.
>> > All the queries are under the second, so it is responsive.
>> >
>> > Few months ago I begun to think the problem was related to an old and
>> > bugged version of SolrCloud that we have to upgrade.
>> > But reading in this list about the classic XY problem I changed my mind,
>> > maybe there a much better solution.
>> >
>> > This night I had, again, a couple of replicas down around 1.07 AM, this
>> is
>> > the SolrCloud log file:
>> >
>> > http://pastebin.com/raw.php?i=bCHnqnXD
>> >
>> > At end of exceptions list there are few "cancelElection did not find
>> > election node to remove" errors and this morning I found the replicas
>> down.
>> >
>> > Looking GC log file I found that at same moment there is a GC that takes
>> > about 20 seconds. Now I'm using CMS (ConcurrentMarkSweep) Collector taken
>> > from Shawn Hensey suggestions:
>> >
>> https://wiki.apache.org/solr/ShawnHeisey#CMS_.28ConcurrentMarkSweep.29_Collector
>> >
>> >
>> > http://pastebin.com/raw.php?i=VuSrg4uz
>> >
>> > At last, looking around in the latest months I found this bug, that seems
>> > to me be related to with this problems.
>> > So I begun to think that I need an upgrade, am I right? What do you think
>> > about ?
>> >
>> > https://issues.apache.org/jira/browse/SOLR-6159
>> >
>> > Any help is very appreciated.
>> >
>> > Thanks,
>> > Vincenzo
>>
>
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251


Re: accent insensitive field-type

2015-07-02 Thread Steve Rowe
See https://issues.apache.org/jira/browse/SOLR-7749

> On Jul 2, 2015, at 8:31 AM, Steve Rowe  wrote:
> 
> Hi Søren,
> 
> “charFilter” should be “charFilters”, and “filter” should be “filters”; and 
> both their values should be arrays - try this:
> 
> {
>  "add-field-type”: {
>"name":"myTxtField",
>"class":"solr.TextField",
>"positionIncrementGap":"100",
>"analyzer”: {
>  "charFilters": [ {"class":"solr.MappingCharFilterFactory", 
> "mapping":"mapping-ISOLatin1Accent.txt”} ],
>  "tokenizer": [ {"class":"solr.StandardTokenizerFactory”} ],
>  "filters": {"class":"solr.LowerCaseFilterFactory"}
>}
>  }
> }
> 
> There should be better error messages for misspellings here.  I’ll file a 
> JIRA issue.
> 
> (I also moved “filters” after “tokenizer” since that’s the order in which 
> they’re executed in an analysis pipeline, but Solr will interpret the 
> out-of-order version correctly.)
> 
> FYI, if you want to *correct* a field type, rather than create a new one, you 
> should use the “replace-field-type” command instead of the “add-field-type” 
> command.  You’ll get an error if you attempt to add a field type that already 
> exists in the schema.
> 
> Steve
> 
>> On Jul 2, 2015, at 1:17 AM, Søren  wrote:
>> 
>> Hi Solr users
>> 
>> I'm new to Solr and I need to be able to search in structured data in a case 
>> and accent insensitive manner. E.g. find "Crème brûlée", both when quering 
>> with "Crème brûlée" and "creme brulee".
>> 
>> It seems that none of the build-in text types support this, or am I wrong?
>> So I try to add my own inspired by another post, although it was old.
>> 
>> I'm running solr-5.2.1.
>> 
>> Curl to http://localhost:8983/solr/mycore/schema
>> {
>> "add-field-type":{
>>"name":"myTxtField",
>>"class":"solr.TextField",
>>"positionIncrementGap":"100",
>>"analyzer":{
>>   "charFilter": {"class":"solr.MappingCharFilterFactory", 
>> "mapping":"mapping-ISOLatin1Accent.txt"},
>>   "filter": {"class":"solr.LowerCaseFilterFactory"},
>>   "tokenizer": {"class":"solr.StandardTokenizerFactory"}
>>   }
>>   }
>> }
>> 
>> But it doesn't work and when I look in '[... 
>> ]\solr-5.2.1\server\solr\mycore\conf\managed-schema'
>> the analyzer section is reduced to this:
>> > positionIncrementGap="100">
>>   
>> 
>>   
>> 
>> 
>> I'm I almost there or am I on a completely wrong track?
>> 
>> Thanks in advance
>> Søren
>> 
> 



Re: Problem XY - X = SolrCloud 4.8 replicas down, Y = SolrCloud upgrade to a new version

2015-07-02 Thread Vincenzo D'Amore
We are trying to send documents as fast as we can; we wrote a multi-threaded
SolrJ application that reads from files, Solr, or an RDBMS and updates a
collection.
But with too many threads during the day, the servers become unresponsive.
Now, at night, when there are few searches, we reindex the entire
collection (1.5M docs) with 2 threads in about 1 hour.

As I wrote, I'm now using the CMS (ConcurrentMarkSweep) collector, and I
assumed that following Shawn's GC suggestions was enough to have the right
configuration.




On Thu, Jul 2, 2015 at 7:05 PM, Erick Erickson 
wrote:

> bq: and we do a full update of all documents during the night.
>
> How fast are you sending documents? Prior to Solr 5.2 the replicas
> would do a twice the amount of work for indexing that the leader
> did (odd, but...) See:
>
> http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
>
> Still, focusing on the GC pauses is probably the most fruitful. You just
> shouldn't be getting pauses that long with 16G heaps. How long does it
> take you to re-index? I've seen situations where indexing at an
> _extremely_ high rate will force replicas into recovery. This took 150
> threads
> all firing queries as fast as possible to hit, but I thought I'd mention
> it.
>
> Best,
> Erick
>
> On Thu, Jul 2, 2015 at 12:56 PM, Vincenzo D'Amore 
> wrote:
> > Hi Erick,
> >
> > thanks for your answer.
> >
> > We use java 8 and allocate a 16GB heap size
> >
> >  -Xms2g -Xmx16g
> >
> > There are 1.5M docs and about 16 GB index size on disk.
> >
> > Let me also say, during the day we have a lot of little update, from 1k
> to
> > 50k docs every time, and we do a full update of all documents during the
> > night. And during this full update the 20 seconds GC happened.
> >
> > I haven't read completely the Uwe's post just because was too long, all I
> > got was that I have to use MMapDirectory.
> > But I was still unable to restart the production with this new component.
> > After the change it is not clear if we only need to restart the core/node
> > or if a full reindex must be done.
> >
> > Thanks for your time, I'll read very carefully Uwe's post.
> >
> >
> > On Thu, Jul 2, 2015 at 5:39 PM, Erick Erickson 
> > wrote:
> >
> >> Vincenzo:
> >>
> >> First and foremost, figure out why you're having 20 second GC pauses.
> For
> >> indexes like you're describing, this is unusual. How big is the heap
> >> you allocate to the JVM?
> >>
> >> Check your Zookeeper timeout. In earlier versions of SolrCloud it
> >> defaulted to
> >> 15 seconds. Going into leader election would happen for no obvious
> reason,
> >> and lengthening it to 30-60 seconds seemed to help a lot of people.
> >>
> >> The disks should be largely irrelevant to the origin or cure for this
> >> problem...
> >>
> >> Here's a good article on why you want to allocate "just enough" heap
> >> for your app. Of course, "just enough" can be interesting to actually
> >> define:
> >>
> >> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Jul 2, 2015 at 5:45 AM, Vincenzo D'Amore 
> >> wrote:
> >> > Hi All,
> >> >
> >> > In the latest months my SolrCloud clusters, sometimes (one/two times a
> >> > week), have few replicas down.
> >> > Usually all the replicas goes down on the same node.
> >> > I'm unable to understand why a 3 nodes cluster with 8 core/32 GB and
> high
> >> > performance disks have this problem. The main index is small, about
> 1.5 M
> >> > of documents with very small text inside.
> >> > I don't know if having 3 shards with 3 replicas is too much, to me it
> >> seems
> >> > a fair high high availability, but anyway this should not compromise
> the
> >> > cluster stability.
> >> > All the queries are under the second, so it is responsive.
> >> >
> >> > Few months ago I begun to think the problem was related to an old and
> >> > bugged version of SolrCloud that we have to upgrade.
> >> > But reading in this list about the classic XY problem I changed my
> mind,
> >> > maybe there a much better solution.
> >> >
> >> > This night I had, again, a couple of replicas down around 1.07 AM,
> this
> >> is
> >> > the SolrCloud log file:
> >> >
> >> > http://pastebin.com/raw.php?i=bCHnqnXD
> >> >
> >> > At end of exceptions list there are few "cancelElection did not find
> >> > election node to remove" errors and this morning I found the replicas
> >> down.
> >> >
> >> > Looking GC log file I found that at same moment there is a GC that
> takes
> >> > about 20 seconds. Now I'm using CMS (ConcurrentMarkSweep) Collector
> taken
> >> > from Shawn Hensey suggestions:
> >> >
> >>
> https://wiki.apache.org/solr/ShawnHeisey#CMS_.28ConcurrentMarkSweep.29_Collector
> >> >
> >> >
> >> > http://pastebin.com/raw.php?i=VuSrg4uz
> >> >
> >> > At last, looking around in the latest months I found this bug, that
> seems
> >> > to me be related to with this problems.
> >> > So I begun to think that I need an upgrade, am I right? What do you
> think
> >> > abou

Re: Problem XY - X = SolrCloud 4.8 replicas down, Y = SolrCloud upgrade to a new version

2015-07-02 Thread Erick Erickson
1.5M docs in an hour isn't near the rates I saw that trigger the
LIR problem, so I strongly doubt that's the issue, never mind ;)

On Thu, Jul 2, 2015 at 1:47 PM, Vincenzo D'Amore  wrote:
> We are trying to send documents as fast as we can, we wrote a multi-thread
> Solrj application that read from file, solr, or rdbms and update a
> collection.
> But if we have too much threads during the day servers become unresponsive.
> Now, in the night, with a low number of search, we reindex the entire
> collection (1.5M docs) with 2 threads in about 1 h.
>
> As I wrote, now I'm using CMS (ConcurrentMarkSweep), and I supposed that
> use Shawn's suggestions about GC was enough to have the right
> configuration.
>
>
>
>
> On Thu, Jul 2, 2015 at 7:05 PM, Erick Erickson 
> wrote:
>
>> bq: and we do a full update of all documents during the night.
>>
>> How fast are you sending documents? Prior to Solr 5.2 the replicas
>> would do a twice the amount of work for indexing that the leader
>> did (odd, but...) See:
>>
>> http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
>>
>> Still, focusing on the GC pauses is probably the most fruitful. You just
>> shouldn't be getting pauses that long with 16G heaps. How long does it
>> take you to re-index? I've seen situations where indexing at an
>> _extremely_ high rate will force replicas into recovery. This took 150
>> threads
>> all firing queries as fast as possible to hit, but I thought I'd mention
>> it.
>>
>> Best,
>> Erick
>>
>> On Thu, Jul 2, 2015 at 12:56 PM, Vincenzo D'Amore 
>> wrote:
>> > Hi Erick,
>> >
>> > thanks for your answer.
>> >
>> > We use java 8 and allocate a 16GB heap size
>> >
>> >  -Xms2g -Xmx16g
>> >
>> > There are 1.5M docs and about 16 GB index size on disk.
>> >
>> > Let me also say, during the day we have a lot of little update, from 1k
>> to
>> > 50k docs every time, and we do a full update of all documents during the
>> > night. And during this full update the 20 seconds GC happened.
>> >
>> > I haven't read completely the Uwe's post just because was too long, all I
>> > got was that I have to use MMapDirectory.
>> > But I was still unable to restart the production with this new component.
>> > After the change it is not clear if we only need to restart the core/node
>> > or if a full reindex must be done.
>> >
>> > Thanks for your time, I'll read very carefully Uwe's post.
>> >
>> >
>> > On Thu, Jul 2, 2015 at 5:39 PM, Erick Erickson 
>> > wrote:
>> >
>> >> Vincenzo:
>> >>
>> >> First and foremost, figure out why you're having 20 second GC pauses.
>> For
>> >> indexes like you're describing, this is unusual. How big is the heap
>> >> you allocate to the JVM?
>> >>
>> >> Check your Zookeeper timeout. In earlier versions of SolrCloud it
>> >> defaulted to
>> >> 15 seconds. Going into leader election would happen for no obvious
>> reason,
>> >> and lengthening it to 30-60 seconds seemed to help a lot of people.
>> >>
>> >> The disks should be largely irrelevant to the origin or cure for this
>> >> problem...
>> >>
>> >> Here's a good article on why you want to allocate "just enough" heap
>> >> for your app. Of course, "just enough" can be interesting to actually
>> >> define:
>> >>
>> >> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Thu, Jul 2, 2015 at 5:45 AM, Vincenzo D'Amore 
>> >> wrote:
>> >> > Hi All,
>> >> >
>> >> > In the latest months my SolrCloud clusters, sometimes (one/two times a
>> >> > week), have few replicas down.
>> >> > Usually all the replicas goes down on the same node.
>> >> > I'm unable to understand why a 3 nodes cluster with 8 core/32 GB and
>> high
>> >> > performance disks have this problem. The main index is small, about
>> 1.5 M
>> >> > of documents with very small text inside.
>> >> > I don't know if having 3 shards with 3 replicas is too much, to me it
>> >> seems
>> >> > a fair high high availability, but anyway this should not compromise
>> the
>> >> > cluster stability.
>> >> > All the queries are under the second, so it is responsive.
>> >> >
>> >> > Few months ago I begun to think the problem was related to an old and
>> >> > bugged version of SolrCloud that we have to upgrade.
>> >> > But reading in this list about the classic XY problem I changed my
>> mind,
>> >> > maybe there a much better solution.
>> >> >
>> >> > This night I had, again, a couple of replicas down around 1.07 AM,
>> this
>> >> is
>> >> > the SolrCloud log file:
>> >> >
>> >> > http://pastebin.com/raw.php?i=bCHnqnXD
>> >> >
>> >> > At end of exceptions list there are few "cancelElection did not find
>> >> > election node to remove" errors and this morning I found the replicas
>> >> down.
>> >> >
>> >> > Looking GC log file I found that at same moment there is a GC that
>> takes
>> >> > about 20 seconds. Now I'm using CMS (ConcurrentMarkSweep) Collector
>> taken
>> >> > from Shawn Hensey suggestions:
>> >> >
>> >>
>> https://wiki.apache.org/solr/ShawnHeise

Distributed queries hang in a non-SolrCloud environment, Solr 4.10.4

2015-07-02 Thread Ronald Wood

We are running into an issue when doing distributed queries on Solr 4.10.4. We 
do not use SolrCloud but instead keep track of shards that need to be searched 
based on date ranges.

We have been running distributed queries without incident for several years 
now, but we only recently upgraded to 4.10.4 from 4.8.1.

The query is relatively simple and involves 4 shards, including the aggregator 
itself.

For a while the server that is acting as the aggregator for the distributed
query handles the requests fine, but after an indefinite amount of usage (in
the range of 2-4 hours) it starts hanging on all distributed queries, while
still serving non-distributed versions (no shards list included) of the same
query quickly (9 ms).

CPU, Heap and System Memory Usage do not seem unusual compared to other servers.

I had initially suspected that distributed searches combined with faceting might
be part of the issue, since I had seen some long-running threads that seemed to
spend a long time in the FastLRUCache when getting facets for a single field.
However, in the latest case of blocked queries, I am not seeing that.

We have two slaves that replicate from a master, and we saw the issue
recur after a while of client usage, ruling out a hardware issue.

Does anyone have any suggestions for potential avenues of attack for getting to 
the bottom of this? Or are there any known issues that could be implicated in 
this?

- Ronald S. Wood


RE: Distributed queries hang in a non-SolrCloud environment, Solr 4.10.4

2015-07-02 Thread Ryan, Michael F. (LNG-DAY)
Try running jstack on the aggregator - that will show you where the threads are 
hanging.
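
A minimal sketch with the standard JDK tools (find the pid via jps):

jps -l                          # locate the Solr JVM's pid
jstack -l <pid> > threads.txt   # -l adds lock/synchronizer detail

Taking a few dumps 10-20 seconds apart makes it easier to spot threads
that stay stuck in the same place.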

-Michael

-Original Message-
From: Ronald Wood [mailto:rw...@smarsh.com] 
Sent: Thursday, July 02, 2015 3:37 PM
To: solr-user@lucene.apache.org
Subject: Distributed queries hang in a non-SolrCloud environment, Solr 4.10.4


We are running into an issue when doing distributed queries on Solr 4.10.4. We 
do not use SolrCloud but instead keep track of shards that need to be searched 
based on date ranges.

We have been running distributed queries without incident for several years 
now, but we only recently upgraded to 4.10.4 from 4.8.1.

The query is relatively simple and involves 4 shards, including the aggregator 
itself.

For a while the server that is acting as the aggregator for the distributed 
query handles the requests fine, but after an indefinite amount of usage (in 
the range of 2-4 hours) it starts hanging on all distributed queries while 
serving non-distributed versions  (no shards list is included) of the same 
query quickly (9 ms).

CPU, Heap and System Memory Usage do not seem unusual compared to other servers.

I had initially suspect that distributed searches combined with faceting might 
be part of the issue, since I had seen some long-running threads that seemed to 
spend a long time in the FastLRUCache when getting facets for a single field. 
However, in the latest case of blocked queries, I am not seeing that.

We have two slaves that replicate from a master, and we were saw the issue 
recur after a while of client usage, ruling out a hardware issue.

Does anyone have any suggestions for potential avenues of attack for getting to 
the bottom of this? Or are there any known issues that could be implicated in 
this?

- Ronald S. Wood


Re: Distributed queries hang in a non-SolrCloud environment, Solr 4.10.4

2015-07-02 Thread Ronald Wood
Thanks, I'll try that. Is the Thread Dump view in the Solr Admin panel not
reliable for diagnosing thread hangs?

On a different note, I am considering introducing a dedicated aggregator to 
avoid using a shard both for search and aggregation, in case there is an issue 
there.

Ronald S. Wood | Senior Software Developer
857-991-7681 (mobile)
 
Smarsh
100 Franklin St. Suite 903 | Boston, MA 02210
1-866-SMARSH-1 | 971-998-9967 (fax)
www.smarsh.com 
 
Immediate customer support:
Call 1-866-762-7741 (x2) or visit www.smarsh.com/support 








On 7/2/15, 3:56 PM, "Ryan, Michael F. (LNG-DAY)"  
wrote:

>Try running jstack on the aggregator - that will show you where the threads 
>are hanging.
>
>-Michael
>
>-Original Message-
>From: Ronald Wood [mailto:rw...@smarsh.com] 
>Sent: Thursday, July 02, 2015 3:37 PM
>To: solr-user@lucene.apache.org
>Subject: Distributed queries hang in a non-SolrCloud environment, Solr 4.10.4
>
>
>We are running into an issue when doing distributed queries on Solr 4.10.4. We 
>do not use SolrCloud but instead keep track of shards that need to be searched 
>based on date ranges.
>
>We have been running distributed queries without incident for several years 
>now, but we only recently upgraded to 4.10.4 from 4.8.1.
>
>The query is relatively simple and involves 4 shards, including the aggregator 
>itself.
>
>For a while the server that is acting as the aggregator for the distributed 
>query handles the requests fine, but after an indefinite amount of usage (in 
>the range of 2-4 hours) it starts hanging on all distributed queries while 
>serving non-distributed versions  (no shards list is included) of the same 
>query quickly (9 ms).
>
>CPU, Heap and System Memory Usage do not seem unusual compared to other 
>servers.
>
>I had initially suspect that distributed searches combined with faceting might 
>be part of the issue, since I had seen some long-running threads that seemed 
>to spend a long time in the FastLRUCache when getting facets for a single 
>field. However, in the latest case of blocked queries, I am not seeing that.
>
>We have two slaves that replicate from a master, and we were saw the issue 
>recur after a while of client usage, ruling out a hardware issue.
>
>Does anyone have any suggestions for potential avenues of attack for getting 
>to the bottom of this? Or are there any known issues that could be implicated 
>in this?
>
>- Ronald S. Wood


Re: Distributed queries hang in a non-SolrCloud environment, Solr 4.10.4

2015-07-02 Thread Chris Hostetter

: Thanks I’ll try that. Is the Thread Dump view in the Solr Admin panel not 
reliable for diagnosing thread hangs?

If the JVM is totally hung, you might not be able to connect to Solr to
even ask it to generate the thread dump itself -- but jstack may still be
able to.
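
If jstack can't attach either, a last-resort sketch: kill -3 <pid> sends
SIGQUIT, which makes the JVM print a thread dump to its stdout/console log.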



-Hoss
http://www.lucidworks.com/


Re: heatmaps

2015-07-02 Thread Joseph Obernberger
Hi - perhaps you do not have enough geospatial data in your index to
generate a larger image? Try setting facet.heatmap.gridLevel to
something higher, like 4.

I've run queries like:
q=insert whatever 
here&wt=json&indent=true&facet=true&facet.heatmap=geo&facet.heatmap.gridLevel=4&facet.heatmap.format=ints2D&rows=0&facet.heatmap.maxCells=35554432


I'm not familiar with the notation you have 
facet.heatmap=store_geohash.  What is that?


I'm very interested in working with heatmaps and have tried integrating 
the results from Solr with GeoTools.  I have access to a very sizable 
index with geospatial content in it, and I've been able to generate 
images like this:

http://lovehorsepower.com/images/Solr/AllDataSinceJan01_2015_Map.jpg
The above is a 1024x1024 raster.

Doing a geospatial filter query around the UK results in an image like this:
http://lovehorsepower.com/images/Solr/TwitterDataOneDayUKGeoSearchMap.jpg

I would love to be able to apply some heatmap visualization to this.  
Anyone have any ideas on what tools to use?
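
A rough, generic starting-point sketch (not a GeoTools integration):
render the ints2D grid as a grayscale raster. Here columns, rows and the
counts array are assumed to be parsed out of the facet.heatmap response,
and null rows are treated as all-zero:

import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class HeatmapSketch {
    // counts: "counts_ints2D" from the response; rows run top-down (row 0 = maxY)
    static void render(int[][] counts, int columns, int rows, File out)
            throws Exception {
        int max = 1; // avoid divide-by-zero on an empty grid
        for (int[] row : counts)
            if (row != null) for (int c : row) max = Math.max(max, c);
        BufferedImage img =
            new BufferedImage(columns, rows, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < rows; y++)
            for (int x = 0; x < columns; x++) {
                int c = (counts[y] == null) ? 0 : counts[y][x];
                int v = (int) (255.0 * c / max); // linear ramp; a log scale suits skewed counts
                img.setRGB(x, y, (v << 16) | (v << 8) | v); // gray pixel
            }
        ImageIO.write(img, "png", out);
    }
}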


Thank you.

-Joe


On 6/13/2015 4:29 PM, William Bell wrote:

OK. I was able to get a PNG - but it looks tiny and a blob. Not sure how to
use it.

echo iVBORw0KGgoNSUhEUgAAACAgCAYAAABzenr00UlEQVR42mNg
YGD4P8B41AEwBuNAOoD7PwMbP5BmGygHyP9nENf5z8C45D8Dk+ZAp4FNA+
wA5pn/GZfz/GeQzv3PkL3hP4Pw+f8MLGuBctl0cgCv1n+Gton/GSUs/jPGL/jP+
uvCf467IAdMoFcUyPxnaS78zzCN6T+TKMt/5g+T/jNJraZzOXBc6T/D9bv/
mfp//2f8GDqABZFCzX/GQh86OED1338Grm34FVrMoEcITKdDt
msZrYxGHTDqAFIdwDgaBaMOGHXAqANGHTC8HQAAQf8YdbR5KkIASUVORK5CYII= |
base64 --decode > test.png

Very strange.

Maybe I need to zoom in?



On Sat, Jun 13, 2015 at 2:18 PM, William Bell  wrote:


How do you use the heatmaps feature to show the PNG or use the int2D to
show the heatmap easily just for testing?


http://localhost:8983/solr/select?q=*%3A*&wt=json&rows=0&indent=true&facet=true&facet.heatmap=store_geohash&facet.heatmap.format=png

{
  "responseHeader": {
    "status": 0,
    "QTime": 163,
    "params": {
      "facet.heatmap.format": "png",
      "q": "*:*",
      "facet.heatmap": "store_geohash",
      "indent": "true",
      "rows": "0",
      "wt": "json",
      "facet": "true"
    }
  },
  "response": {
    "numFound": 2664396,
    "start": 0,
    "docs": []
  },
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {},
    "facet_dates": {},
    "facet_ranges": {},
    "facet_intervals": {},
    "facet_heatmaps": {
      "store_geohash": [
        "gridLevel", 2,
        "columns", 32,
        "rows", 32,
        "minX", -180,
        "maxX", 180,
        "minY", -90,
        "maxY", 90,
        "counts_png",
        "iVBORw0KGgoNSUhEUgAAACAgCAYAAABzenr00UlEQVR42mNgYGD4P8B41AEwBuNAOoD7PwMbP5BmGygHyP9nENf5z8C45D8Dk+ZAp4FNA+wA5pn/GZfz/GeQzv3PkL3hP4Pw+f8MLGuBctl0cgCv1n+Gton/GSUs/jPGL/jP+uvCf467IAdMoFcUyPxnaS78zzCN6T+TKMt/5g+T/jNJraZzOXBc6T/D9bv/mfp//2f8GDqABZFCzX/GQh86OED1338Grm34FVrMoEcITKdDtmsZrYxGHTDqAFIdwDgaBaMOGHXAqANGHTC8HQAAQf8YdbR5KkIASUVORK5CYII="
      ]
    }
  }
}

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076








Re: optimize status

2015-07-02 Thread Summer Shire
Upayavira:
I am using Solr 4.7, and yes, I am using the TieredMergePolicy.

Erick:
All my boxes have SSDs and there isn't a big disparity between qTime and
response time.
The performance hit on my end is because of the fragmented index files causing
more disk seeks, as you mentioned.
And I tried requesting fewer docs too, but that did not help.
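
For reference, a sketch of the optimize call with a segment cap (core
name assumed):

curl 'http://localhost:8983/solr/mycore/update?optimize=true&maxSegments=2'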



> On Jun 30, 2015, at 5:23 AM, Erick Erickson  wrote:
> 
> I've actually seen this happen right in front of my eyes "in the
> field". However, that was a very high-performance environment. My
> assumption was that fragmented index files were causing more disk
> seeks especially for the first-pass query response in distributed
> mode. So, if the problem is similar, it should go away if you test
> requesting fewer docs. Note: This is not a cure for your problem, but
> would be useful for identifying if it's similar to what I saw.
> 
> NOTE: the symptom was a significant disparity between the QTime (which
> does not measure assembling the document) and the response time. _If_
> that's the case and _if_ my theory that disk access is the culprit,
> then SOLR-5478 and SOLR-6810 should be a big help as they remove the
> first-pass decompression for distributed searches.
> 
> If that hypothesis has any validity, I'd expect you're running on
> spinning-disks rather than SSDs, is that so?
> 
> Best,
> Erick
> 
> On Tue, Jun 30, 2015 at 2:07 AM, Upayavira  wrote:
>> We need to work out why your performance is bad without optimise. What
>> version of Solr are you using? Can you confirm that your config is using
>> the TieredMergePolicy?
>> 
>> Upayavira
>> 
>> Oe, Jun 30, 2015, at 04:48 AM, Summer Shire wrote:
>>> Hi Upayavira and Erick,
>>> 
>>> There are two things we are talking about here.
>>> 
>>> First: Why am I optimizing? If I don’t our SEARCH (NOT INDEXING)
>>> performance is 100% worst.
>>> The problem lies in the number of total segments. We have to have max
>>> segments 1 or 2.
>>> I have done intensive performance related tests around number of
>>> segments, merge factor or changing the Merge policy.
>>> 
>>> Second: Solr does not perform better for me without an optimize. So now
>>> that I have to optimize the second issue
>>> is updating concurrently during an optimize. If I update when an optimize
>>> is happening the optimize takes 5 times as long as
>>> the normal optimize.
>>> 
>>> So is there any way other than creating a postOptimize hook and writing
>>> the status in a file and somehow making it available to the indexer.
>>> All of this just sounds traumatic :)
>>> 
>>> Thanks
>>> Summer
>>> 
>>> 
 On Jun 29, 2015, at 5:40 AM, Erick Erickson  
 wrote:
 
 Steven:
 
 Yes, but
 
 First, here's Mike McCandles' excellent blog on segment merging:
 http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
 
 I think the third animation is the TieredMergePolicy. In short, yes an
 optimize will reclaim disk space. But as you update, this is done for
 you anyway. About the only time optimizing is at all beneficial is
 when you have a relatively static index. If you're continually
 updating documents, and by that I mean replacing some existing
 documents, then you'll immediately start generating "holes" in your
 index.
 
 And if you _do_ optimize, you wind up with a huge segment. And since
 the default policy tries to merge segments of roughly the same size,
 it accumulates deletes for quite a while before they merged away.
 
 And if you don't update existing docs or delete docs, then there's no
 wasted space anyway.
 
 Summer:
 
 First off, why do you care about not updating during optimizing?
 There's no good reason you have to worry about that, you can freely
 update while optimizing.
 
 But frankly I have to agree with Upayavira that on the face of it
 you're doing a lot of extra work. See above, but you optimize while
 indexing, so immediately you're rather defeating the purpose.
 Personally I'd only optimize relatively static indexes and, by
 definition, you're index isn't static since the second process is just
 waiting to modify it.
 
 Best,
 Erick
 
 On Mon, Jun 29, 2015 at 8:15 AM, Steven White  wrote:
> Hi Upayavira,
> 
> This is news to me that we should not optimize and index.
> 
> What about disk space saving, isn't optimization to reclaim disk space or
> is Solr somehow does that?  Where can I read more about this?
> 
> I'm on Solr 5.1.0 (may switch to 5.2.1)
> 
> Thanks
> 
> Steve
> 
> On Mon, Jun 29, 2015 at 4:16 AM, Upayavira  wrote:
> 
>> I'm afraid I don't understand. You're saying that optimising is causing
>> performance issues?
>> 
>> Simple solution: DO NOT OPTIMIZE!
>> 
>> Optimisation is very badly named. What it does is squashes all segments
>> in your index into on

Re: Distributed queries hang in a non-SolrCloud environment, Solr 4.10.4

2015-07-02 Thread Shalin Shekhar Mangar
On Fri, Jul 3, 2015 at 1:06 AM, Ronald Wood  wrote:
>
>
> I had initially suspect that distributed searches combined with faceting 
> might be part of the issue, since I had seen some long-running threads that 
> seemed to spend a long time in the FastLRUCache when getting facets for a 
> single field. However, in the latest case of blocked queries, I am not seeing 
> that.

That happens because, by default, FastLRUCache will steal the
request thread to perform cleanup when the number of items exceeds a
certain limit. You can avoid this by setting cleanupThread="true" as an
attribute in the cache's configuration. That will spawn a new thread
to clean up if and when required, so that request threads aren't
blocked for a long time.
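
A sketch of the relevant solrconfig.xml bit -- filterCache shown as an
example, sizes are just the stock values; apply it to whichever
FastLRUCache is implicated:

<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="128"
             cleanupThread="true"/>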

>
> We have two slaves that replicate from a master, and we were saw the issue 
> recur after a while of client usage, ruling out a hardware issue.
>
> Does anyone have any suggestions for potential avenues of attack for getting 
> to the bottom of this? Or are there any known issues that could be implicated 
> in this?
>
> - Ronald S. Wood



-- 
Regards,
Shalin Shekhar Mangar.


Problem in facet.contains

2015-07-02 Thread Pritam Kute
Hello All,

I am a new Solr user on a Solr 5.2.0 setup. I am trying to create
multiple types of facets on the same field. I am filtering the facets by
using "*facet.contains*". The following is the data in the field.

roles : {
 "0/Student Name/",
 "1/Student Name/1000/",
 "0/Center Name/",
 "1/Center Name/1000/"
}

I am trying to add a facet field like the following:

query.addFacetField("{!ex=role"+i+" key=role"+i+"
facet.contains=/"+roleType+"/}roles");

where roleType is iterated over and contains values like "Student Name" and
"Center Name", and the value of i is 1.

But I am getting error as:

org.apache.solr.search.SyntaxError: Expected identifier at pos 63
str='{!key=role1 facet.contains=/Student Name/}roles'

It works nicely if there is no space in the string, i.e. if I index the doc
as "1/StudentName/1000/".

It would be a great help if somebody could help me out with this issue.
Please reply if I am missing something, or point me to the best practice
for hierarchical faceting in Solr.
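
A possible workaround sketch, assuming the usual local-params quoting
rules apply here -- values containing spaces wrapped in single quotes:

query.addFacetField("{!ex=role" + i + " key=role" + i + " facet.contains='/" + roleType + "/'}roles");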

Thanks in advance.

Thanks & Regards,
--
Pritam Kute