Question regarding enablePositionIncrements

2015-04-02 Thread Aman Tandon
Hi,

I was using enablePositionIncrements in my Solr 4.8.1 schema, but when I
try to use it in Solr 5.0.0 it gives an error when creating the collection.

If I am correct, it was useful for phrase queries. Is there any particular
reason this option is not supported in Solr 5? If so, please explain it to
me. Thanks in advance.

With Regards
Aman Tandon


Re: SolrCloud 5.0 cluster RAM requirements

2015-04-02 Thread Toke Eskildsen
Ryan Steele  wrote:
> Does a SolrCloud 5.0 cluster need enough RAM across the cluster to load
> all the collections into RAM at all times?

Although Shawn is right that we cannot answer this properly, we can sometimes
give qualified suggestions and guesses, at least as to the direction you
should be looking. The quality of the guesses goes up with the amount of
information provided, and "1TB" is really not much information.

- Are you indexing while searching? How much?
- How many documents in the index?
- What is a typical query? What about faceting?
- How many concurrent queries?
- Expected median response time?

> I'm building a SolrCloud cluster that may have approximately 1 TB of
> data spread across the collections.

We're running a 22TB SolrCloud off a single 16-core server with 256GB RAM. We've
also had performance problems serving a 100GB index from a same-sized machine.

The one piece of hardware advice I will give is to start with SSDs and scale
from there. With present-day price/performance, using spinning drives for
anything IO-intensive makes little sense.

- Toke Eskildsen


Alphanumeric Wild card search

2015-04-02 Thread Palagiri, Jayasankar
Hello Team,

Below is my field type

<fieldType name="..." class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

And my field is

<field name="Name" type="..."/>

I have a few documents in my index, like 1234-305, 1234-308, 1234-318.

When I search Name:"1234-*" I get the desired results, but when I search
Name:"123-3*" I get 0 results.

Can someone help me find what is wrong with my indexing?

Thanks and Regards,
Jaya



Re: Restart solr failed after applied the patch in https://issues.apache.org/jira/browse/SOLR-6359

2015-04-02 Thread forest_soup
Thanks Ramkumar!

Understood. We will try 100, 10. 

But given our original steps, with which we hit the exception, can we say
that the patch has an issue?
1. Put the patch on all 5 running Solr servers (Tomcat) by replacing
tomcat/webapps/solr/WEB-INF/lib/solr-core-4.7.0.jar with the patched
solr-core-4.7-SNAPSHOT.jar I built, and kept them all running.
2. Uploaded the solrconfig.xml to ZooKeeper with the changes below:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>
3. Restarted Solr server 1 (Tomcat); after it restarted, it hit the
exception in my first post.
4. Restarted Solr server 1 again; it still had the same issue.
5. Reverted the patch by replacing
tomcat/webapps/solr/WEB-INF/lib/solr-core-4.7-SNAPSHOT.jar with the original
4.7.0 jar.
6. Restarted Solr server 1 again; the issue was gone.

So we are wondering: if this change lands in version 5.1 and we do a rolling
restart after upgrading Solr, will the issue show up again and force a full
restart, which causes a service outage?

Thanks! 





sort on facet.index?

2015-04-02 Thread Derek Poh

Is sorting on facet index supported?

I would like to sort on the below facet index

<lst name="...">
  <int name="...">14</int>
  <int name="...">8</int>
  <int name="...">12</int>
  <int name="...">349</int>
  <int name="...">81</int>
  <int name="...">8</int>
  <int name="...">12</int>
</lst>

to

<lst name="...">
  <int name="...">12</int>
  <int name="...">8</int>
  <int name="...">81</int>
  <int name="...">349</int>
  ...
  ...
  ...
</lst>

-Derek


Re: Database vs Solr : ID based filtering

2015-04-02 Thread Aman Tandon
Thanks Mikhail for the explanation.

With Regards
Aman Tandon

On Fri, Mar 27, 2015 at 3:40 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> For a single where clause, an RDBMS with an index performs comparably to an
> inverted index. The inverted index wins on multiple 'where' clauses, where
> it doesn't need composite indices; multivalued fields are also an intrinsic
> advantage. More details at
> http://www.slideshare.net/lucenerevolution/what-is-inaluceneagrandfinal
>
>
> On Fri, Mar 27, 2015 at 9:56 AM, Aman Tandon 
> wrote:
>
> > Hi,
> >
> > Will ID-based filtering in Solr perform worse than in a DB?
> >
> > 
> >
> >- http://localhost:8983/solr/select?q=*&fq=id:153
> >
> >*OR*
> >
> >- select * from TABLE where id=153
> >
> >
> > With Regards
> > Aman Tandon
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Facet sorting algorithm for index

2015-04-02 Thread yriveiro
Hi,

I have an external application that uses the output of a facet to join
another dataset using the keys of the facet result.

The facet query uses index sort, but at some point my application crashes
because the order of the keys "is not correct". A Unix sort over the keys
of the result with LC_ALL=C doesn't output the same order.

I identified a case like this:

760d1f833b764591161\"84b20f28242a0
760d1f833b76459116184b20f2

Why does the line with the '\"' come first? Is that sequence the single
character '"', or is it raw and actually 2 chars?

In ASCII '"' has a lower ordinal than the character '8', so if \" is "
then this sort makes sense ...

My question here is how index sort works and how I can replicate it in C++.





Best regards


RE: Alphanumeric Wild card search

2015-04-02 Thread Palagiri, Jayasankar
Hello Team,

Below is my field type

<fieldType name="..." class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

And my field is

<field name="Name" type="..."/>

I have a few documents in my index, like 1234-305, 1234-308, 1234-318.

When I search Name:"1234-*" I get the desired results, but when I search
Name:"123-3*" I get 0 results.

Can someone help me find what is wrong with my indexing?

Thanks and Regards,
Jaya



Re: Alphanumeric Wild card search

2015-04-02 Thread Simon Martinelli
Hi,

Have a look at the generated terms to see how they look.
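For example, the TermsComponent will list the raw indexed terms (core name
hypothetical; the /terms handler has to be registered in solrconfig.xml):

  http://localhost:8983/solr/yourcore/terms?terms.fl=Name&terms.prefix=123&terms.limit=20

Those raw terms are what a wildcard query has to match against. The Admin UI
analysis page gives the same insight interactively.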

Simon

On Thu, Apr 2, 2015 at 9:43 AM, Palagiri, Jayasankar <
jayashankar.palag...@honeywell.com> wrote:

> Hello Team,
>
> Below is my field type
>
> [fieldType definition snipped - see the original post above]
>
> And my field is
>
> [field definition snipped]
>
> I have a few documents in my index, like 1234-305, 1234-308, 1234-318.
>
> When I search Name:"1234-*" I get the desired results, but when I search
> Name:"123-3*" I get 0 results.
>
> Can someone help me find what is wrong with my indexing?
>
> Thanks and Regards,
> Jaya
>
>


edismax operators

2015-04-02 Thread Mahmoud Almokadem
Hello,

I'm seeing strange behaviour using edismax with multiple words. When
passing q=+(word1 word2) I get

"rawquerystring": "+(word1 word2)", "querystring": "+(word1 word2)", "
parsedquery": "(+(+(DisjunctionMaxQuery((title:word1))
DisjunctionMaxQuery((title:word2)/no_coord",
"parsedquery_toString": "+(+((title:word1)
(title:word2)))",

I expected both words to be mandatory ("must"), since I added "+" before the
parentheses, so it should be applied to all terms inside them.

How can I apply the default operator AND to all words?

Thanks,
Mahmoud


Re: Facet sorting algorithm for index

2015-04-02 Thread Yonik Seeley
On Thu, Apr 2, 2015 at 6:36 AM, yriveiro  wrote:
> Hi,
>
> I have an external application that uses the output of a facet to join
> another dataset using the keys of the facet result.
>
> The facet query uses index sort, but at some point my application crashes
> because the order of the keys "is not correct". A Unix sort over the keys
> of the result with LC_ALL=C doesn't output the same order.
>
> I identified a case like this:
>
> 760d1f833b764591161\"84b20f28242a0
> 760d1f833b76459116184b20f2
>
> Why does the line with the '\"' come first? Is that sequence the single
> character '"', or is it raw and actually 2 chars?
>
> In ASCII '"' has a lower ordinal than the character '8', so if \" is "
> then this sort makes sense ...

How are you viewing the results?  If it's JSON, then yes the backslash
double quote would mean that there is just a literal double quote in
the string.

-Yonik


Re: Alphanumeric Wild card search

2015-04-02 Thread Jack Krupansky
This is caused by the word delimiter filter - it breaks multi-part terms
(the hyphens trigger it) into multiple terms. Wildcards simply don't work
consistently well in such a situation. The basic problem is that the
presence of the wildcard causes all but the simplest token filtering stages
to be bypassed, particularly the word delimiter filter (because it would
have stripped out the wildcard asterisk), so your wildcard term is analyzed
differently than it was indexed, so it fails to match. In other cases it
may match, but that happens only if the abbreviated query-time filtering
happens to produce a term compatible with the full index-time filtering.

This is a limitation of Solr. You just have to learn to live with it. Or...
don't use the word delimiter filter when you need to be able to do
wildcards of multi-part terms.
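As a rough illustration (assuming the whitespace-tokenizer-plus-word-delimiter
chain quoted above; verify the exact terms for your own schema on the Admin UI
analysis page):

  index time:  "1234-305"   ->  terms: 1234, 305, 1234305
                                (generate* and catenate* in the index analyzer)
  query time:  Name:123-3*  ->  the wildcard term bypasses the word delimiter
                                filter, so Solr looks for terms starting with
                                the literal prefix "123-3", and no such term
                                was ever indexed.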

-- Jack Krupansky

On Thu, Apr 2, 2015 at 3:43 AM, Palagiri, Jayasankar <
jayashankar.palag...@honeywell.com> wrote:

> Hello Team,
>
> Below is my field type
>
> [fieldType definition snipped - see the original post above]
>
> And my field is
>
> [field definition snipped]
>
> I have a few documents in my index, like 1234-305, 1234-308, 1234-318.
>
> When I search Name:"1234-*" I get the desired results, but when I search
> Name:"123-3*" I get 0 results.
>
> Can someone help me find what is wrong with my indexing?
>
> Thanks and Regards,
> Jaya
>
>


Re: edismax operators

2015-04-02 Thread Jack Krupansky
The parentheses signal a nested query. Your plus operator applies to the
overall nested query - that the nested query must match something. Use the
plus operator on each of the discrete terms if each of them is mandatory.
The plus and minus operators apply to the overall nested query - they do
not distribute to each term within the nested query. They don't magically
distribute to all nested queries.
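For example, to require each word inside the parentheses:

  q=(+word1 +word2)

rather than relying on the outer + to distribute.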

Let's see your full set of query parameters, both on the request and in
solrconfig.

-- Jack Krupansky

On Thu, Apr 2, 2015 at 7:12 AM, Mahmoud Almokadem 
wrote:

> Hello,
>
> I'm seeing strange behaviour using edismax with multiple words. When
> passing q=+(word1 word2) I get
>
> "rawquerystring": "+(word1 word2)", "querystring": "+(word1 word2)", "
> parsedquery": "(+(+(DisjunctionMaxQuery((title:word1))
> DisjunctionMaxQuery((title:word2)/no_coord",
> "parsedquery_toString": "+(+((title:word1)
> (title:word2)))",
>
> I expected both words to be mandatory ("must"), since I added "+" before
> the parentheses, so it should be applied to all terms inside them.
>
> How can I apply the default operator AND to all words?
>
> Thanks,
> Mahmoud
>


Re: Question regarding enablePositionIncrements

2015-04-02 Thread Jack Krupansky
Position increments were considered problematic, especially for
highlighting. Did you get this for the stop filter? There was a Jira for
this - check CHANGES.TXT and the Jira for details.

For some discussion, see:
https://issues.apache.org/jira/browse/SOLR-6468


-- Jack Krupansky

On Thu, Apr 2, 2015 at 3:01 AM, Aman Tandon  wrote:

> Hi,
>
> I was using enablePositionIncrements in my Solr 4.8.1 schema, but when I
> try to use it in Solr 5.0.0 it gives an error when creating the collection.
>
> If I am correct, it was useful for phrase queries. Is there any particular
> reason this option is not supported in Solr 5? If so, please explain it to
> me. Thanks in advance.
>
> With Regards
> Aman Tandon
>


Re: Question regarding enablePositionIncrements

2015-04-02 Thread Aman Tandon
Hi Jack,

I read that Jira, and I understand the concern.

So does it mean that no hole will be left when we use the stop filter?

With Regards
Aman Tandon

On Thu, Apr 2, 2015 at 6:01 PM, Jack Krupansky 
wrote:

> Position increments were considered problematic, especially for
> highlighting. Did you get this for the stop filter? There was a Jira for
> this - check CHANGES.TXT and the Jira for details.
>
> For some discussion, see:
> https://issues.apache.org/jira/browse/SOLR-6468
>
>
> -- Jack Krupansky
>
> On Thu, Apr 2, 2015 at 3:01 AM, Aman Tandon 
> wrote:
>
> > Hi,
> >
> > I was using enablePositionIncrements in my Solr 4.8.1 schema, but when I
> > try to use it in Solr 5.0.0 it gives an error when creating the
> > collection.
> >
> > If I am correct, it was useful for phrase queries. Is there any
> > particular reason this option is not supported in Solr 5? If so, please
> > explain it to me. Thanks in advance.
> >
> > With Regards
> > Aman Tandon
> >
>


Re: Facet sorting algorithm for index

2015-04-02 Thread Yago Riveiro
The result comes from a custom responseWriter; I found a bug in my code that
appends the \ to the ".

The JSON response shows the data without the \.

Where can I find the source code used for index sorting? I need to ensure
that the external data has the same ordering as the facet result.


—
/Yago Riveiro

On Thu, Apr 2, 2015 at 12:26 PM, Yonik Seeley  wrote:

> On Thu, Apr 2, 2015 at 6:36 AM, yriveiro  wrote:
>> [original question snipped - see earlier in the thread]
> How are you viewing the results?  If it's JSON, then yes the backslash
> double quote would mean that there is just a literal double quote in
> the string.
> -Yonik

"Taking Solr 5.0 to Production" on Windows

2015-04-02 Thread Steven White
Hi folks,

I'm reading "Taking Solr 5.0 to Production"
https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production
but I cannot find anything about Windows. Is there some other link I'm
missing?

This section of the doc is an important part of a successful Solr
deployment, but it is missing Windows instructions.  Without them, there
will either be scattered deployments, or Windows folks (like me) will miss
out on some key aspects that Solr experts know.

Any feedback on this?

Thanks

Steve


Re: sort on facet.index?

2015-04-02 Thread Ryan Josal
Sorting the result set or the facets?  For the facets there is
facet.sort=index (lexicographically) and facet.sort=count.  So maybe you
are asking if you can sort by index, but reversed?  I don't think this is
possible, and it's a good question.  I wanted to chime in on this one
because I wanted my own facet.sort=rank, but there is no nice pluggable way
to implement a new sort.  I'd love to be able to add a Comparator for a new
sort.  I ended up subclassing FacetComponent to sort of hack on the rank
sort implementation but it isn't very pretty and I'm sure not as efficient
as it could be if FacetComponent was designed for more sorts.

Ryan

On Thursday, April 2, 2015, Derek Poh  wrote:

> Is sorting on facet index supported?
>
> I would like to sort on the below facet index
>
> [facet counts snipped - see the original post above]
>
> -Derek
>


Re: edismax operators

2015-04-02 Thread Mahmoud Almokadem
Thank you, Jack, for your clarifications. I used the regular defType and set
q.op=AND, so all terms without operators are mandatory. How can I do this
with edismax?

Thanks,
Mahmoud

On Thu, Apr 2, 2015 at 2:14 PM, Jack Krupansky 
wrote:

> The parentheses signal a nested query. Your plus operator applies to the
> overall nested query - that the nested query must match something. Use the
> plus operator on each of the discrete terms if each of them is mandatory.
> The plus and minus operators apply to the overall nested query - they do
> not distribute to each term within the nested query. They don't magically
> distribute to all nested queries.
>
> Let's see your full set of query parameters, both on the request and in
> solrconfig.
>
> -- Jack Krupansky
>
> On Thu, Apr 2, 2015 at 7:12 AM, Mahmoud Almokadem 
> wrote:
>
> > Hello,
> >
> > I'm seeing strange behaviour using edismax with multiple words. When
> > passing q=+(word1 word2) I get
> >
> > "rawquerystring": "+(word1 word2)", "querystring": "+(word1 word2)", "
> > parsedquery": "(+(+(DisjunctionMaxQuery((title:word1))
> > DisjunctionMaxQuery((title:word2)/no_coord",
> > "parsedquery_toString": "+(+((title:word1)
> > (title:word2)))",
> >
> > I expected both words to be mandatory ("must"), since I added "+" before
> > the parentheses, so it should be applied to all terms inside them.
> >
> > How can I apply the default operator AND to all words?
> >
> > Thanks,
> > Mahmoud
> >
>


Re: Question regarding enablePositionIncrements

2015-04-02 Thread Jack Krupansky
That's my understanding - but use the Solr Admin UI analysis page to
confirm exactly what happens, for both index and query analysis.

-- Jack Krupansky

On Thu, Apr 2, 2015 at 10:04 AM, Aman Tandon 
wrote:

> Hi Jack,
>
> I read that Jira, and I understand the concern.
>
> So does it mean that no hole will be left when we use the stop filter?
>
> With Regards
> Aman Tandon
>
> On Thu, Apr 2, 2015 at 6:01 PM, Jack Krupansky 
> wrote:
>
> > Position increments were considered problematic, especially for
> > highlighting. Did you get this for the stop filter? There was a Jira for
> > this - check CHANGES.TXT and the Jira for details.
> >
> > For some discussion, see:
> > https://issues.apache.org/jira/browse/SOLR-6468
> >
> >
> > -- Jack Krupansky
> >
> > On Thu, Apr 2, 2015 at 3:01 AM, Aman Tandon 
> > wrote:
> >
> > > Hi,
> > >
> > > I was using enablePositionIncrements in my Solr 4.8.1 schema, but
> > > when I try to use it in Solr 5.0.0 it gives an error when creating the
> > > collection.
> > >
> > > If I am correct, it was useful for phrase queries. Is there any
> > > particular reason this option is not supported in Solr 5? If so, then
> > > please explain it to me. Thanks in advance.
> > >
> > > With Regards
> > > Aman Tandon
> > >
> >
>


Re: Facet sorting algorithm for index

2015-04-02 Thread Yonik Seeley
On Thu, Apr 2, 2015 at 9:44 AM, Yago Riveiro  wrote:
> Where can I found the source code used in index sorting? I need to ensure 
> that the external data has the same sorting that the facet result.

If you step over the indexed terms of a field you get them in sorted
order (hence for a single node, the sorting is done at indexing time).
Lucene index order for text will essentially be unicode code point order.
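If you need to replicate that outside of Solr (the original question mentioned
C++), the rule of thumb is: compare the UTF-8 bytes unsigned, which is the
same as Unicode code point order. A minimal, self-contained sketch in Java
(not Solr/Lucene API, just the comparison rule):

  import java.nio.charset.StandardCharsets;
  import java.util.Arrays;
  import java.util.List;

  public class IndexOrder {
      // Lucene-style term order: unsigned UTF-8 byte comparison,
      // equivalent to Unicode code point order.
      static int termCompare(String a, String b) {
          byte[] x = a.getBytes(StandardCharsets.UTF_8);
          byte[] y = b.getBytes(StandardCharsets.UTF_8);
          int n = Math.min(x.length, y.length);
          for (int i = 0; i < n; i++) {
              int cmp = (x[i] & 0xFF) - (y[i] & 0xFF);  // unsigned bytes
              if (cmp != 0) return cmp;
          }
          return x.length - y.length;  // shorter key sorts first on a tie
      }

      public static void main(String[] args) {
          // '"' is 0x22 and '8' is 0x38, so the key containing the quote
          // sorts first - exactly what the facet output showed.
          List<String> keys = Arrays.asList(
              "760d1f833b76459116184b20f2",
              "760d1f833b764591161\"84b20f28242a0");
          keys.sort(IndexOrder::termCompare);
          keys.forEach(System.out::println);
      }
  }

For pure ASCII this matches what LC_ALL=C sort does, so if the unix sort and
the facet order disagreed, the data really did contain a literal quote
character.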

-Yonik


Re: edismax operators

2015-04-02 Thread Shawn Heisey
On 4/2/2015 8:35 AM, Mahmoud Almokadem wrote:
> Thank you, Jack, for your clarifications. I used the regular defType and
> set q.op=AND, so all terms without operators are mandatory. How can I do
> this with edismax?

The edismax parser is capable of much more granularity than simply
AND/OR on the default operator, through the mm parameter.  If you set
q.op to AND, the mm parameter will be set to 100%.  The mm parameter is
EXTREMELY flexible.

https://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29
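For instance (illustrative values; the wiki page has the full syntax):

  mm=100%    every clause must match (what q.op=AND maps to)
  mm=1       at least one clause must match (classic OR behaviour)
  mm=2<75%   all clauses required up to 2 clauses, 75% of them above that

A default can be put in the request handler definition in solrconfig.xml,
for example:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="mm">2&lt;75%</str>
    </lst>
  </requestHandler>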

Thanks,
Shawn



Re: "Taking Solr 5.0 to Production" on Windows

2015-04-02 Thread Shawn Heisey
On 4/2/2015 8:20 AM, Steven White wrote:
> I'm reading "Taking Solr 5.0 to Production"
> https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production
> but I cannot find anything about Windows, is there some other link I'm
> missing?
>
> This section in the doc is an important part for a successful Solr
> deployment, but it is missing Windows instructions.  Without one, there
> will either be scattered deployment or Windows folks (like me) will miss
> out on some key aspects that Solr expert know.

We are aware that the documentation is missing step-by-step information
for Windows.  We are all volunteers, and there's a limited amount of
free time available.  The hole in the documentation will eventually get
filled, but it's not going to happen immediately.  The available
solutions must be studied so the best option can be determined, and it
probably will require some development work to automate the install.

You might get the sense that Windows is treated as a second class
citizen around here ... and I think you'd probably be right to feel that
way.  There are no technical advantages in a Windows server over the
free operating systems like Linux.  The biggest disadvantage is in
Microsoft's licensing model.  A Windows Server OS has a hefty price tag,
and client operating systems like Windows 7 and 8 are intentionally
crippled by Microsoft so they run heavy-duty server programs poorly, in
order to sell more Server licenses.  If a Solr install is not very busy,
Windows 7 or 8 would probably run it just fine, but a very busy install
will run into problems if it's on a client OS.  Unfortunately I cannot
find any concrete information about the precise limitations in client
operating systems.

Thanks,
Shawn



Re: sort on facet.index?

2015-04-02 Thread Toke Eskildsen
Ryan Josal  wrote:
> So maybe you are asking if you can sort by index, but reversed?
> I don't think this is possible, and it's a good question.

It is not currently possible and the JIRA for the issue 
  https://issues.apache.org/jira/browse/SOLR-1672
is 5 years old. On the plus side, there seems to be renewed interest in it.

- Toke Eskildsen


Re: How to recover a Shard

2015-04-02 Thread Erick Erickson
Matt:

This seems dangerous, but you might be able to use the Collections API to
1> DELETEREPLICA all but one.
2> RELOAD the collection.
3> ADDREPLICA back.

I don't _like_ this much mind you as when you added the replicas back
it'd replicate the index from the leader, but at least you might not
have to take Solr down.

I'm not completely sure that this'll work, mind you but

Erick

On Wed, Apr 1, 2015 at 8:04 PM, Matt Kuiper  wrote:
> Maybe I have been working too many long hours as I missed the obvious 
> solution of bringing down/up one of the Solr nodes backing one of the 
> replicas, and then the same for the second node.  This did the trick.
>
> Since I brought this topic up, I will narrow the question a bit:  Would there 
> be a way to recover without restarting the Solr node?  Basically to delete 
> one replica and then somehow declare the other replica the leader and break 
> it out of its recovery process?
>
> Thanks,
> Matt
>
>
> From: Matt Kuiper
> Sent: Wednesday, April 01, 2015 8:43 PM
> To: solr-user@lucene.apache.org
> Subject: How to recover a Shard
>
> Hello,
>
> I have a SolrCloud (4.10.1) where for one of the shards, both replicas are in 
> a "Recovery Failed" state per the Solr Admin Cloud page.  The logs contains 
> the following type of entries for the two Solr nodes involved, including 
> statements that it will retry.
>
> Is there a way to recover from this state?
>
> Maybe bring down one replica, and then somehow declare that the remaining 
> replica is to be the leader?  Understand this would not be ideal as the new 
> leader may be missing documents that were sent its way to be indexed while it 
> was down, but would be better than having to rebuild the whole cloud.
>
> Any tips or suggestions would be appreciated.
>
> Thanks,
> Matt
>
> Solr node .65
> Error while trying to recover. 
> core=kla_collection_shard6_replica5:org.apache.solr.common.SolrException: No 
> registered leader was found after waiting for 4000ms , collection: 
> kla_collection slice: shard6
>  at 
> org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:568)
>  at 
> org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:551)
>  at 
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)
>  at 
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
> Solr node .64
>
> Error while trying to recover. 
> core=kla_collection_shard6_replica2:org.apache.solr.common.SolrException: No 
> registered leader was found after waiting for 4000ms , collection: 
> kla_collection slice: shard6
>
>  at 
> org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:568)
>
>  at 
> org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:551)
>
>  at 
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)
>
>  at 
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
>


RE: How to recover a Shard

2015-04-02 Thread Matt Kuiper
Thanks Erick!  Understand your warning.  Next time it occurs, I will plan to 
give it a try.  I am currently in a dev environment, so it is a safe place to 
experiment.

Matt

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, April 02, 2015 9:40 AM
To: solr-user@lucene.apache.org
Subject: Re: How to recover a Shard

Matt:

This seems dangerous, but you might be able to use the Collections API to
1> DELETEREPLICA all but one.
2> RELOAD the collection.
3> ADDREPLICA back.

I don't _like_ this much mind you as when you added the replicas back it'd 
replicate the index from the leader, but at least you might not have to take 
Solr down.

I'm not completely sure that this'll work, mind you but

Erick

On Wed, Apr 1, 2015 at 8:04 PM, Matt Kuiper  wrote:
> Maybe I have been working too many long hours as I missed the obvious 
> solution of bringing down/up one of the Solr nodes backing one of the 
> replicas, and then the same for the second node.  This did the trick.
>
> Since I brought this topic up, I will narrow the question a bit:  Would there 
> be a way to recover without restarting the Solr node?  Basically to delete 
> one replica and then somehow declare the other replica the leader and break 
> it out of its recovery process?
>
> Thanks,
> Matt
>
>
> From: Matt Kuiper
> Sent: Wednesday, April 01, 2015 8:43 PM
> To: solr-user@lucene.apache.org
> Subject: How to recover a Shard
>
> Hello,
>
> I have a SolrCloud (4.10.1) where for one of the shards, both replicas are in 
> a "Recovery Failed" state per the Solr Admin Cloud page.  The logs contains 
> the following type of entries for the two Solr nodes involved, including 
> statements that it will retry.
>
> Is there a way to recover from this state?
>
> Maybe bring down one replica, and then somehow declare that the remaining 
> replica is to be the leader?  Understand this would not be ideal as the new 
> leader may be missing documents that were sent its way to be indexed while it 
> was down, but would be better than having to rebuild the whole cloud.
>
> Any tips or suggestions would be appreciated.
>
> Thanks,
> Matt
>
> Solr node .65
> Error while trying to recover. 
> core=kla_collection_shard6_replica5:org.apache.solr.common.SolrException: No 
> registered leader was found after waiting for 4000ms , collection: 
> kla_collection slice: shard6
>  at 
> org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:568)
>  at 
> org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:551)
>  at 
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)
>  at 
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
> Solr node .64
>
> Error while trying to recover. 
> core=kla_collection_shard6_replica2:org.apache.solr.common.SolrExcepti
> on: No registered leader was found after waiting for 4000ms , 
> collection: kla_collection slice: shard6
>
>  at 
> org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReade
> r.java:568)
>
>  at 
> org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReade
> r.java:551)
>
>  at 
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.jav
> a:332)
>
>  at 
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
>


RE: edismax operators

2015-04-02 Thread Davis, Daniel (NIH/NLM) [C]
Can the mm parameter be set per clause? I guess I've ignored it in the past,
aside from setting it once to what seemed like a reasonable value. That value
is probably replicated across every collection, which cannot be ideal for
relevance.

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Thursday, April 02, 2015 11:13 AM
To: solr-user@lucene.apache.org
Subject: Re: edismax operators

On 4/2/2015 8:35 AM, Mahmoud Almokadem wrote:
> Thank you, Jack, for your clarifications. I used the regular defType and
> set q.op=AND, so all terms without operators are mandatory. How can I do
> this with edismax?

The edismax parser is capable of much more granularity than simply AND/OR on 
the default operator, through the mm parameter.  If you set q.op to AND, the mm 
parameter will be set to 100%.  The mm parameter is EXTREMELY flexible.

https://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29

Thanks,
Shawn



Re: sort on facet.index?

2015-04-02 Thread Yonik Seeley
On Thu, Apr 2, 2015 at 10:25 AM, Ryan Josal  wrote:
> Sorting the result set or the facets?  For the facets there is
> facet.sort=index (lexicographically) and facet.sort=count.  So maybe you
> are asking if you can sort by index, but reversed?  I don't think this is
> possible, and it's a good question.

The new facet module that will be in Solr 5.1 supports sorting both
directions on both count and index order (as well as by statistics /
bucket aggregations).
http://yonik.com/json-facet-api/
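For example (field names hypothetical):

  json.facet={ categories : { type:terms, field:cat, sort:"index desc" } }

or sorted on a bucket aggregation:

  json.facet={ categories : { type:terms, field:cat,
                              facet:{ x : "avg(price)" }, sort:"x desc" } }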

-Yonik


Re: sort on facet.index?

2015-04-02 Thread Ryan Josal
Awesome, I didn't know this feature was going to add so much power!
Looking forward to using it.

On Thursday, April 2, 2015, Yonik Seeley  wrote:

> On Thu, Apr 2, 2015 at 10:25 AM, Ryan Josal  > wrote:
> > Sorting the result set or the facets?  For the facets there is
> > facet.sort=index (lexicographically) and facet.sort=count.  So maybe you
> > are asking if you can sort by index, but reversed?  I don't think this is
> > possible, and it's a good question.
>
> The new facet module that will be in Solr 5.1 supports sorting both
> directions on both count and index order (as well as by statistics /
> bucket aggregations).
> http://yonik.com/json-facet-api/
>
> -Yonik
>


Re: edismax operators

2015-04-02 Thread Erick Erickson
The MM parameter is specific to the handler you set up/use, so it's
really on a per collection basis. Different collections can specify
this however they want.

Or maybe I misunderstand what you're asking...

Best,
Erick

On Thu, Apr 2, 2015 at 8:59 AM, Davis, Daniel (NIH/NLM) [C]
 wrote:
> Can the mm parameter be set per clause? I guess I've ignored it in the
> past aside from setting it once to what seemed like a reasonable value.
> That is probably replicated across every collection, which cannot be ideal 
> for relevance.
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Thursday, April 02, 2015 11:13 AM
> To: solr-user@lucene.apache.org
> Subject: Re: edismax operators
>
> On 4/2/2015 8:35 AM, Mahmoud Almokadem wrote:
>> Thank you, Jack, for your clarifications. I used the regular defType and
>> set q.op=AND, so all terms without operators are mandatory. How can I do
>> this with edismax?
>
> The edismax parser is capable of much more granularity than simply AND/OR on 
> the default operator, through the mm parameter.  If you set q.op to AND, the 
> mm parameter will be set to 100%.  The mm parameter is EXTREMELY flexible.
>
> https://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29
>
> Thanks,
> Shawn
>


Re: edismax operators

2015-04-02 Thread Shawn Heisey
On 4/2/2015 9:59 AM, Davis, Daniel (NIH/NLM) [C] wrote:
> Can the mm parameter be set per clause? I guess I've ignored it in the
> past aside from setting it once to what seemed like a reasonable value.
> That is probably replicated across every collection, which cannot be ideal 
> for relevance.

It applies to the whole query.  You can have a different value on every
query you send.  Just like with other parameters, defaults can be
configured in the solrconfig.xml request handler definition.

Thanks,
Shawn




RE: edismax operators

2015-04-02 Thread Davis, Daniel (NIH/NLM) [C]
Thanks Shawn,

This is what I thought, but Solr often has features I don't anticipate.

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Thursday, April 02, 2015 12:54 PM
To: solr-user@lucene.apache.org
Subject: Re: edismax operators

On 4/2/2015 9:59 AM, Davis, Daniel (NIH/NLM) [C] wrote:
> Can the mm parameter be set per clause? I guess I've ignored it in the
> past aside from setting it once to what seemed like a reasonable value.
> That is probably replicated across every collection, which cannot be ideal 
> for relevance.

It applies to the whole query.  You can have a different value on every query 
you send.  Just like with other parameters, defaults can be configured in the 
solrconfig.xml request handler definition.

Thanks,
Shawn



Generating json response in custom requestHandler (xml is working)

2015-04-02 Thread Christian Reuschling
Hi,

I managed to create a small custom requestHandler and filled the response
with some static values in the structure I want to have later.

I can invoke the requestHandler from the browser and get nice XML with the
data and structure I had specified - so far, so good. Here is the XML
response:

[XML response; the element names were stripped by the archive. It is a
standard responseHeader (status 0, QTime 17) followed by the custom payload:
blocks of float values 13.0/14.0/15/16.0/17.0 and lists of doc ids id1-id4.]


Now I simply add &wt=json to the invocation. Sadly I get a

HTTP ERROR 404

Problem accessing /solr/etr_base_core/trends&wt=json. Reason:

Not Found


I had the feeling that the response format is transparent for me when I
write a custom requestHandler, but it seems I've overlooked something.

Does anybody have an idea?


Regards

Christian


RE: Generating json response in custom requestHandler (xml is working)

2015-04-02 Thread Davis, Daniel (NIH/NLM) [C]
Use XSLT to generate JSON? But you probably actually do want both, plus
ruby/python, etc.

-Original Message-
From: Christian Reuschling [mailto:christian.reuschl...@gmail.com] 
Sent: Thursday, April 02, 2015 12:51 PM
To: solr-user@lucene.apache.org
Subject: Generating json response in custom requestHandler (xml is working)

[original message snipped - see above]


Re: edismax operators

2015-04-02 Thread Mahmoud Almokadem
Thanks all for your responses.

But the parsed query and the number of results stay the same when changing
the mm parameter. The following results are for mm=100% and mm=0%:

http://solrserver/solr/collection1/select?q=%2B(word1+word2)&rows=0&fl=Title&wt=json&indent=true&debugQuery=true&defType=edismax&qf=title&mm=100%25&stopwords=true&lowercaseOperators=true


"rawquerystring": "+(word1 word2)", "querystring": "+(word1 word2)",
"parsedquery": "(+(+(DisjunctionMaxQuery((title:word1))
DisjunctionMaxQuery((title:word2)/no_coord",
"parsedquery_toString": "+(+((title:word1) (title:word2)))”,



http://solrserver/solr/collection1/select?q=%2B(word1+word2)&rows=0&fl=Title&wt=json&indent=true&debugQuery=true&defType=edismax&qf=title&mm=0%25&stopwords=true&lowercaseOperators=true


"rawquerystring": "+(word1 word2)", "querystring": "+(word1 word2)",
"parsedquery": "(+(+(DisjunctionMaxQuery((title:word1))
DisjunctionMaxQuery((title:word2)/no_coord",
"parsedquery_toString": "+(+((title:word1) (title:word2)))",

There are no changes between the two queries.

solr version 4.8.1

Thanks,
Mahmoud

On Thu, Apr 2, 2015 at 6:56 PM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> Thanks Shawn,
>
> This is what I thought, but Solr often has features I don't anticipate.
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Thursday, April 02, 2015 12:54 PM
> To: solr-user@lucene.apache.org
> Subject: Re: edismax operators
>
> On 4/2/2015 9:59 AM, Davis, Daniel (NIH/NLM) [C] wrote:
> > Can the mm parameter be set per clause? I guess I've ignored it in
> the past aside from setting it once to what seemed like a reasonable value.
> > That is probably replicated across every collection, which cannot be
> ideal for relevance.
>
> It applies to the whole query.  You can have a different value on every
> query you send.  Just like with other parameters, defaults can be
> configured in the solrconfig.xml request handler definition.
>
> Thanks,
> Shawn
>
>


RE: Generating json response in custom requestHandler (xml is working)

2015-04-02 Thread Davis, Daniel (NIH/NLM) [C]
I mean that you could use XSLTResponseWriter to generate exactly the format you 
want.   However, I anticipate that if you already have a custom response, 
getting it to automatically generate XML/JSON/Python/Ruby was an expectation, 
and may be a requirement.

Maybe you should look at the code - it could be that the standard response 
writer looks explicitly at the "wt" parameter and does something using these 
other response writers that you should copy.

-Original Message-
From: Davis, Daniel (NIH/NLM) [C] 
Sent: Thursday, April 02, 2015 1:00 PM
To: solr-user@lucene.apache.org
Subject: RE: Generating json response in custom requestHandler (xml is working)

Use XSLT to generate JSON? But you probably actually do want both, plus
ruby/python, etc.

-Original Message-
From: Christian Reuschling [mailto:christian.reuschl...@gmail.com] 
Sent: Thursday, April 02, 2015 12:51 PM
To: solr-user@lucene.apache.org
Subject: Generating json response in custom requestHandler (xml is working)

[original message snipped - see above]


Re: edismax operators

2015-04-02 Thread Jack Krupansky
Personally, I am not convinced I know how the q.op and mm parameters are
really handled within nested queries. There have been bugs in edismax and
some oddities in how it works, and I have personally given up on figuring
out the code. At one stage, back in the days when I did feel that I had a
handle on it, the q.op/mm logic seemed to apply only to the outer, top level
of the query, not to the nested terms, but my recollection could be faulty
on that specific point, and it may have changed as some bugs have been fixed.

So, I would suggest that you file a Jira and let the committers sort out
whether it is really a bug or simply needs better doc for its expected
behavior on this specific issue.

-- Jack Krupansky

On Thu, Apr 2, 2015 at 1:02 PM, Mahmoud Almokadem 
wrote:

> Thanks all for you response,
>
> But the parsed_query and number of results still when changing MM parameter
>
> the following results for mm=100% and mm=0%
>
>
> http://solrserver/solr/collection1/select?q=%2B(word1+word2)&rows=0&fl=Title&wt=json&indent=true&debugQuery=true&defType=edismax&qf=title&mm=100%25&stopwords=true&lowercaseOperators=true
> <
> http://10.1.1.118:8090/solr/PAEB/select?q=%2B(word1+word2)&rows=0&fl=Title&wt=json&indent=true&debugQuery=true&defType=edismax&qf=title&mm=100%25&stopwords=true&lowercaseOperators=true
> >
>
> "rawquerystring": "+(word1 word2)", "querystring": "+(word1 word2)",
> "parsedquery": "(+(+(DisjunctionMaxQuery((title:word1))
> DisjunctionMaxQuery((title:word2)/no_coord",
> "parsedquery_toString": "+(+((title:word1) (title:word2)))”,
>
>
>
>
> http://solrserver/solr/collection1/select?q=%2B(word1+word2)&rows=0&fl=Title&wt=json&indent=true&debugQuery=true&defType=edismax&qf=title&mm=0%25&stopwords=true&lowercaseOperators=true
> <
> http://10.1.1.118:8090/solr/PAEB/select?q=%2B(word1+word2)&rows=0&fl=Title&wt=json&indent=true&debugQuery=true&defType=edismax&qf=title&mm=100%25&stopwords=true&lowercaseOperators=true
> >
>
> "rawquerystring": "+(word1 word2)", "querystring": "+(word1 word2)",
> "parsedquery": "(+(+(DisjunctionMaxQuery((title:word1))
> DisjunctionMaxQuery((title:word2)/no_coord",
> "parsedquery_toString": "+(+((title:word1) (title:word2)))",
>
> There are any changes on two queries
>
> solr version 4.8.1
>
> Thanks,
> Mahmoud
>
> On Thu, Apr 2, 2015 at 6:56 PM, Davis, Daniel (NIH/NLM) [C] <
> daniel.da...@nih.gov> wrote:
>
> > Thanks Shawn,
> >
> > This is what I thought, but Solr often has features I don't anticipate.
> >
> > -Original Message-
> > From: Shawn Heisey [mailto:apa...@elyograg.org]
> > Sent: Thursday, April 02, 2015 12:54 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: edismax operators
> >
> > On 4/2/2015 9:59 AM, Davis, Daniel (NIH/NLM) [C] wrote:
> > > Can the mm parameter be set per clause? I guess I've ignored it in
> > the past aside from setting it once to what seemed like a reasonable
> value.
> > > That is probably replicated across every collection, which cannot be
> > ideal for relevance.
> >
> > It applies to the whole query.  You can have a different value on every
> > query you send.  Just like with other parameters, defaults can be
> > configured in the solrconfig.xml request handler definition.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: Generating json response in custom requestHandler (xml is working)

2015-04-02 Thread Shalin Shekhar Mangar
The URL you are trying to access is wrong. You are using
/solr/etr_base_core/trends&wt=json but you should be using
/solr/etr_base_core/trends?wt=json

On Thu, Apr 2, 2015 at 9:51 AM, Christian Reuschling <
christian.reuschl...@gmail.com> wrote:

> Hi,
>
> I managed to create a small custom requestHandler and filled the response
> with some static values in the structure I want to have later.
>
> I can invoke the requestHandler from the browser and get nice XML with the
> data and structure I had specified - so far, so good. Here is the XML
> response:
>
>
> [XML response snipped - see the original message above]
>
>
> Now I simply add &wt=json to the invocation. Sadly I get a
>
> HTTP ERROR 404
>
> Problem accessing /solr/etr_base_core/trends&wt=json. Reason:
>
> Not Found
>
>
> I had the feeling that the response format is transparent for me when I
> write a custom requestHandler, but it seems I've overlooked something.
>
> Does anybody have an idea?
>
>
> Regards
>
> Christian
>



-- 
Regards,
Shalin Shekhar Mangar.


newbie questions regarding solr cloud

2015-04-02 Thread Ben Hsu
Hello

I am playing with solr5 right now, to see if its cloud features can replace
what we have with solr 3.6, and I have some questions, some newbie, and
some not so newbie

Background: the documents we are putting in solr have a date field. the
majority of our searches are restricted to documents created within the
last week, but searches do go back 60 days. documents older than 60 days
are removed from the repo. we also want high availability in case a machine
becomes unavailable

our current method, using solr 3.6, is to split the data into 1 day chunks,
within each day the data is split into several shards, and each shard has 2
replicas. Our code generates the list of cores to be queried on based on
the time ranged in the query. Cores that fall off the 60 day range are
deleted through solr's RESTful API.

This all sounds a lot like what Solr Cloud provides, so I started looking
at Solr Cloud's features.

My newbie questions:

 - it looks like the way to write a document is to pick a node (possibly
using a LB), send it to that node, and let solr figure out which nodes the
document is supposed to go to. Is this the recommended way?
 - similarly, can I just randomly pick a core (using the demo example:
http://localhost:7575/solr/#/gettingstarted_shard1_replica2/query ), query
it, and let it scatter out the queries to the appropriate cores, and send
me the results back? will it give me back results from all the shards?
 - is there a recommended Python library?

My hopefully less newbie questions:
 - does solr auto detect when node become unavailable, and stop sending
queries to them?
 - when the master node dies and the cluster elects a new master, what
happens to writes?
 - what happens when a node is unavailable
 - what is the procedure when a shard becomes too big for one machine, and
needs to be split?
 - what is the procedure when we lose a machine and the node needs replacing
 - how would we quickly bulk delete data within a date range?


Re: newbie questions regarding solr cloud

2015-04-02 Thread Erick Erickson
See inline:

On Thu, Apr 2, 2015 at 12:36 PM, Ben Hsu  wrote:
> Hello
>
> I am playing with solr5 right now, to see if its cloud features can replace
> what we have with solr 3.6, and I have some questions, some newbie, and
> some not so newbie
>
> Background: the documents we are putting in solr have a date field. the
> majority of our searches are restricted to documents created within the
> last week, but searches do go back 60 days. documents older than 60 days
> are removed from the repo. we also want high availability in case a machine
> becomes unavailable
>
> our current method, using solr 3.6, is to split the data into 1 day chunks,
> within each day the data is split into several shards, and each shard has 2
> replicas. Our code generates the list of cores to be queried on based on
> the time ranged in the query. Cores that fall off the 60 day range are
> deleted through solr's RESTful API.
>
> This all sounds a lot like what Solr Cloud provides, so I started looking
> at Solr Cloud's features.
>
> My newbie questions:
>
>  - it looks like the way to write a document is to pick a node (possibly
> using a LB), send it to that node, and let solr figure out which nodes the
> document is supposed to go to. Is this the recommended way?

[EOE] That's totally fine. If you're using SolrJ a better way is to
use CloudSolrClient
which sends the docs to the proper leader, thus saving one hop.
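A minimal SolrJ sketch (ZooKeeper address and collection name are made-up
values):

  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class IndexOne {
      public static void main(String[] args) throws Exception {
          // Connecting via ZooKeeper means the client always knows the leaders.
          CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
          client.setDefaultCollection("mycollection");

          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "doc-1");
          doc.addField("created_date", "2015-04-02T00:00:00Z");
          client.add(doc);   // routed directly to the correct shard leader
          client.commit();
          client.close();
      }
  }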

>  - similarly, can I just randomly pick a core (using the demo example:
> http://localhost:7575/solr/#/gettingstarted_shard1_replica2/query ), query
> it, and let it scatter out the queries to the appropriate cores, and send
> me the results back? will it give me back results from all the shards?

[EOE] Yes. Actually, you don't even have to pick a core, just a collection.
The # is totally unneeded, it's just part of navigating around the UI. So this
should work:
http://localhost:7575/solr/gettingstarted/query?q=*:*

>  - is there a recommended Python library?
[EOE] Unsure. If you do find one, check that it has the
CloudSolrClient support as
I expect that would take the most effort

>
> My hopefully less newbie questions:
>  - does solr auto detect when node become unavailable, and stop sending
> queries to them?

[EOE] Yes, that's what Zookeeper is all about. As each Solr node comes up it
registers itself as a listener for collection state changes. ZK
detects a node dying and
notifies all the remaining nodes that nodeX is out of commission and
they adjust accordingly.

>  - when the master node dies and the cluster elects a new master, what
> happens to writes?
[EOE] Stop thinking master/slave! It's "leaders" and "replicas"
(although I'm trying
to use "leaders" and "followers"). The critical bit is that on an
update, the raw document
is forwarded from the leader to all followers so they can come and go.
You simply cannot
rely on a particular node that is a leader remaining the leader. For
instance, if you bring up
your nodes in a different order tomorrow, the leaders and followers
won't be the same.


>  - what happens when a node is unavailable
[EOE] SolrCloud "does the right thing" and keeps on chugging. See the
comments about
auto-detect. The exception is that if _all_ the nodes hosting a shard
go down, you cannot
add to the index and queries will fail unless you set shards.tolerant=true.

>  - what is the procedure when a shard becomes too big for one machine, and
> needs to be split?
There is the Collections API SPLITSHARD command you can use. This means that
you increase by powers of two, though; there's no such thing as adding, say,
one new shard to a 4-shard cluster.

You can also reindex from scratch.

You can also "overshard" when you initially create your collection and
host multiple
shards and/or replicas on a single machine, then physically move them when the
aggregate size exceeds your boundaries.
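For example (collection name hypothetical):

  http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1

This splits shard1 into two sub-shards on the same node; they can then be
moved elsewhere with ADDREPLICA/DELETEREPLICA.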

>  - what is the procedure when we lose a machine and the node needs replacing
Use the Collections API to DELETEREPLICA on the replicas on the dead node.
Use the Collections API to ADDREPLICA on new machines.
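For example (names hypothetical):

  .../admin/collections?action=DELETEREPLICA&collection=mycollection&shard=shard1&replica=core_node3
  .../admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=192.168.1.12:8983_solr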

>  - how would we quickly bulk delete data within a date range?
[EOE] Delete-by-query, e.g.:
...solr/update?commit=true&stream.body=<delete><query>date_field:[DATE1 TO DATE2]</query></delete>

ShardHandler semantics

2015-04-02 Thread Gregg Donovan
We're starting work on adding backup requests to the ShardHandler. Roughly
something like:

1. Send requests to 100 shards.
2. Wait for results from 75 to come back.
3. Wait for either a) the other 25 to come back or b) 20% more time to
elapse
4. If any shards have still not returned results, send a second request to
a different server for each of the missing shards.

I want to be sure I understand the ShardHandler contract correctly before
getting started. My understanding is:

-- ShardHandler#take methods can be called with different ShardRequests
having been submitted.
-- ShardHandler#takeXXX is then called in a loop, returning a ShardResponse
from the last shard returning for a given ShardRequest.
-- When ShardHandler#takeXXX returns null, the SearchHandler proceeds.

For example, the flow could look like:

shardHandler.submit(slowGroupingRequest, "shard1", groupingParams);
shardHandler.submit(slowGroupingRequest, "shard2", groupingParams);
shardHandler.submit(fastFacetRefinementRequest, "shard1", facetParams);
shardHandler.submit(fastFacetRefinementRequest, "shard2", facetParams);
shardHandler.takeCompletedOrError(); // returns fastFacetRefinementRequest
with responses
shardHandler.takeCompletedOrError(); // returns slowGroupingRequest with
responses
shardHandler.takeCompletedOrError(); // return null, SearchHandler exits
take loop

Does that seem like a correct understanding of the
SearchHandler->ShardHandler interaction?

If so, it seems that to make backup requests work we'd need to fan out
individual ShardRequests independently, each with its own completion
service and pending queue. Does that sound right?
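Not the actual ShardHandler API - just a self-contained sketch of the
backup-request pattern itself, with made-up shard names, latencies, and
timeouts:

  import java.util.concurrent.*;

  public class BackupRequests {
      static final ExecutorService pool = Executors.newCachedThreadPool();

      // Stand-in for a shard request: returns the shard name after some latency.
      static Callable<String> query(String shard, long latencyMs) {
          return () -> { Thread.sleep(latencyMs); return shard; };
      }

      public static void main(String[] args) throws Exception {
          CompletionService<String> cs = new ExecutorCompletionService<>(pool);
          cs.submit(query("shard1", 50));
          cs.submit(query("shard2", 5000));   // the straggler

          int pending = 2;
          Future<String> f;
          // Collect whatever the primaries return within the grace period...
          while (pending > 0 && (f = cs.poll(200, TimeUnit.MILLISECONDS)) != null) {
              System.out.println("got " + f.get());
              pending--;
          }
          // ...then hedge: fire a backup request for the shard still missing
          // and take whichever copy (primary or backup) answers first.
          if (pending > 0) {
              cs.submit(query("shard2-backup", 50));
              System.out.println("got " + cs.take().get());
          }
          pool.shutdownNow();   // abandon the straggler
      }
  }

Doing this per-ShardRequest inside SearchHandler would indeed need its own
completion service and pending count for each request, as you suggest.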

Thanks!

--Gregg


Re: "Taking Solr 5.0 to Production" on Windows

2015-04-02 Thread Upayavira


On Thu, Apr 2, 2015, at 04:23 PM, Shawn Heisey wrote:
> On 4/2/2015 8:20 AM, Steven White wrote:
> > I'm reading "Taking Solr 5.0 to Production"
> > https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production
> > but I cannot find anything about Windows, is there some other link I'm
> > missing?
> >
> > This section in the doc is an important part for a successful Solr
> > deployment, but it is missing Windows instructions.  Without one, there
> > will either be scattered deployment or Windows folks (like me) will miss
> > out on some key aspects that Solr expert know.
> 
> We are aware that the documentation is missing step-by-step information
> for Windows.  We are all volunteers, and there's a limited amount of
> free time available.  The hole in the documentation will eventually get
> filled, but it's not going to happen immediately.  The available
> solutions must be studied so the best option can be determined, and it
> probably will require some development work to automate the install.
> 
> You might get the sense that Windows is treated as a second class
> citizen around here ... and I think you'd probably be right to feel that
> way.  There are no technical advantages in a Windows server over the
> free operating systems like Linux.  The biggest disadvantage is in
> Microsoft's licensing model.  A Windows Server OS has a hefty price tag,
> and client operating systems like Windows 7 and 8 are intentionally
> crippled by Microsoft so they run heavy-duty server programs poorly, in
> order to sell more Server licenses.  If a Solr install is not very busy,
> Windows 7 or 8 would probably run it just fine, but a very busy install
> will run into problems if it's on a client OS.  Unfortunately I cannot
> find any concrete information about the precise limitations in client
> operating systems.

I think the point is more that the majority of developers use a Unix
based system, and the majority of testing is done on Unix based systems.

Also, there are ways in which the Windows memory model differs from a
Unix one, meaning certain memory optimisations have not been possible
under Windows. A Lucene index is accessed via a Directory object, and
Solr/Lucene will, by default, choose one according to your architecture:
Windows/Unix, 32/64 bit, etc. 64 bit Unix gives you the best options.

My unconfirmed understanding is that this is to do with the
MemoryMappedDirectory implementation which will only work on Unix. This
implementation uses the OS disk cache directly, rather than reading
files from the disk cache into the heap, and is therefore much more
efficient. I’m sure there are some folks here who can clarify if I got
my implementation names or other details wrong.

So, Solr *will* run on Windows, whether desktop (for development) or
server. However, it is much less tested, and you will find some things,
such as new init scripts, and so on, that maybe have not yet been ported
over to Windows.

Upayavira


Re: newbie questions regarding solr cloud

2015-04-02 Thread Upayavira
A couple of additions:

I had a system that indexed log files. I created a new core each day
(some 20m log events/day). I created collection aliases called today,
week and month that aggregated the relevant collections. That way,
accessing the “today” collection would always get you to the right
place. And I could unload, or delete, collections over a certain age.
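For example (collection names hypothetical):

  .../admin/collections?action=CREATEALIAS&name=today&collections=logs_20150402
  .../admin/collections?action=CREATEALIAS&name=week&collections=logs_20150327,logs_20150328,logs_20150329,logs_20150330,logs_20150331,logs_20150401,logs_20150402

Re-running CREATEALIAS with the same name repoints the alias, so a nightly
job can roll the windows forward.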

Second thing - some months ago, I created a pull request against pysolr
that added Zookeeper support. Please use it, try it, and comment on the
PR, as it hasn’t been merged yet. I’m keen to get feedback on whether it
works for you. When testing it, I had it happily notice a node going
down and redirect traffic to another host within 200ms, and did so
transparently. I will likely be starting to use it in a project in the
next few weeks myself.

Upayavira

On Thu, Apr 2, 2015, at 09:00 PM, Erick Erickson wrote:
> See inline:
> 
> On Thu, Apr 2, 2015 at 12:36 PM, Ben Hsu 
> wrote:
> > Hello
> >
> > I am playing with solr5 right now, to see if its cloud features can replace
> > what we have with solr 3.6, and I have some questions, some newbie, and
> > some not so newbie
> >
> > Background: the documents we are putting in solr have a date field. the
> > majority of our searches are restricted to documents created within the
> > last week, but searches do go back 60 days. documents older than 60 days
> > are removed from the repo. we also want high availability in case a machine
> > becomes unavailable
> >
> > our current method, using solr 3.6, is to split the data into 1 day chunks,
> > within each day the data is split into several shards, and each shard has 2
> > replicas. Our code generates the list of cores to be queried on based on
> > the time ranged in the query. Cores that fall off the 60 day range are
> > deleted through solr's RESTful API.
> >
> > This all sounds a lot like what Solr Cloud provides, so I started looking
> > at Solr Cloud's features.
> >
> > My newbie questions:
> >
> >  - it looks like the way to write a document is to pick a node (possibly
> > using a LB), send it to that node, and let solr figure out which nodes that
> > document is supposed to go to. Is this the recommended way?
> 
> [EOE] That's totally fine. If you're using SolrJ a better way is to
> use CloudSolrClient
> which sends the docs to the proper leader, thus saving one hop.
> 
> >  - similarly, can I just randomly pick a core (using the demo example:
> > http://localhost:7575/solr/#/gettingstarted_shard1_replica2/query ), query
> > it, and let it scatter out the queries to the appropriate cores, and send
> > me the results back? will it give me back results from all the shards?
> 
> [EOE] Yes. Actually, you don't even have to pick a core, just a
> collection.
> The # is totally unneeded, it's just part of navigating around the UI. So
> this
> should work:
> http://localhost:7575/solr/gettingstarted/query?q=*:*
> 
> >  - is there a recommended Python library?
> [EOE] Unsure. If you do find one, check whether it has CloudSolrClient
> support, as I expect that would take the most effort.
> 
> >
> > My hopefully less newbie questions:
> >  - does solr auto detect when node become unavailable, and stop sending
> > queries to them?
> 
> [EOE] Yes, that's what Zookeeper is all about. As each Solr node comes up
> it
> registers itself as a listener for collection state changes. ZK
> detects a node dying and
> notifies all the remaining nodes that nodeX is out of commission and
> they adjust accordingly.
> 
> >  - when the master node dies and the cluster elects a new master, what
> > happens to writes?
> [EOE] Stop thinking master/slave! It's "leaders" and "replicas"
> (although I'm trying
> to use "leaders" and "followers"). The critical bit is that on an
> update, the raw document
> is forwarded from the leader to all followers so they can come and go.
> You simply cannot
> rely on a particular node that is a leader remaining the leader. For
> instance, if you bring up
> your nodes in a different order tomorrow, the leaders and followers
> won't be the same.
> 
> 
> >  - what happens when a node is unavailable
> [EOE] SolrCloud "does the right thing" and keeps on chugging. See the
> comments about
> auto-detect. The exception is that if _all_ the nodes hosting a shard
> go down, you cannot
> add to the index and queries will fail unless you set
> shards.tolerant=true.
> 
> >  - what is the procedure when a shard becomes too big for one machine, and
> > needs to be split?
> There is the Collections API SPLITSHARD command you can use. This means
> that you increase by powers of two, though; there's no such thing as
> adding, say, one new shard to a 4-shard cluster.
> 
> You can also reindex from scratch.
> 
> You can also "overshard" when you initially create your collection and
> host multiple
> shards and/or replicas on a single machine, then physically move them
> when the
> aggregate size exceeds your boundaries.
> 
> >  - what is the procedure when we lose a
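
For reference, a minimal sketch of the SPLITSHARD and shards.tolerant
usage Erick mentions above, with hypothetical collection and shard names:

# split shard1 of "mycollection" into two sub-shards
curl "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1"

# keep queries working even if a whole shard is down
curl "http://localhost:8983/solr/mycollection/select?q=*:*&shards.tolerant=true"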

solr query latency spike when replicating index

2015-04-02 Thread wei
I noticed a solr query latency spike on the slave node when it replicates
the index from the master. Especially when the master has just finished
optimization, the slave node will copy the whole index, and the latency is
really bad.
Is there some way to fix it?

Thanks,
Wei


sort param could not be parsed as a query, and is not a field that exists in the index: geodist()

2015-04-02 Thread Niraj
*Objective: To find out all locations that are present within 1 KM of the
specified reference point, sorted by distance from the reference*

curl -i --globoff --negotiate -u XXX:XXX -XGET  -H "Accept:
application/json" \
-X GET
"http://xx:8983/solr/loc_data/select?q=*:*&wt=json&indent=true&start=0&rows=1000&fq=%7B!geofilt%7D&sfield=GEO_LOCATION&pt=25.8227920532,-80.1314697266&d=1&sort=geodist()+asc"


--
{
  "responseHeader":{
"status":400,
"QTime":1,
"params":{
  "d":"1",
  "sort":"geodist() asc",
  "indent":"true",
  "start":"0",
  "q":"*:*",
  "sfield":"GEO_LOCATION",
  "pt":"25.8227920532,-80.1314697266",
  "doAs":"*",
  "wt":"json",
  "fq":"{!geofilt}",
  "rows":"1000"}},
  "error":{
"msg":"*sort param could not be parsed as a query, and is not a field
that exists in the index: geodist()"*,
"code":400}}

Please note that the query works properly without the geodist() function.
I am a newbie to Solr. Please help.

Regards,
Niraj








Re: "Taking Solr 5.0 to Production" on Windows

2015-04-02 Thread Shawn Heisey
On 4/2/2015 2:23 PM, Upayavira wrote:
> My unconfirmed understanding is that this is to do with the
> MemoryMappedDirectory implementation which will only work on Unix. This
> implementation uses the OS disk cache directly, rather than reading
> files from the disk cache into the heap, and is therefore much more
> efficient. I’m sure there are some folks here who can clarify if I got
> my implementation names or other details wrong.

MMap seems to work perfectly fine on Windows.

Uwe Schindler indicates that MMap is used by default on 64-bit Windows
JVMs since Lucene/Solr 3.1:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

For various reasons, MMap being only one of them, Solr should always be
run on 64-bit operating systems with a 64-bit Java.
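
If you would rather pin the directory implementation than rely on
auto-selection, it can be set explicitly in solrconfig.xml; a minimal
sketch:

<!-- force memory-mapped index access instead of the auto-selected default -->
<directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory"/>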

There are no major *disadvantages* to running Solr on Windows, as long
as it's a 64-bit Server OS.  NTFS cannot compare to the best filesystems
on a recent Linux kernel, but it's not horrible.  If you've sized your
RAM appropriately, Solr will hardly ever hit the disk, so the filesystem
may not make much difference.

Thanks,
Shawn



Re: sort param could not be parsed as a query, and is not a field that exists in the index: geodist()

2015-04-02 Thread Erick Erickson
What comes out in the Solr logs? Nothing's jumping out at me here.

What version of Solr are you using? What is your GEO_LOCATION field type?
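
For comparison, a known-good sketch, assuming GEO_LOCATION is a
LatLonType field (the type and dynamic field names here are illustrative):

<!-- schema.xml -->
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
<field name="GEO_LOCATION" type="location" indexed="true" stored="true"/>

# query: geodist() picks up sfield, pt and d from the request parameters
curl --globoff "http://xx:8983/solr/loc_data/select?q=*:*&fq=%7B!geofilt%7D&sfield=GEO_LOCATION&pt=25.8227920532,-80.1314697266&d=1&sort=geodist()%20asc"

If GEO_LOCATION is instead an RPT (location_rpt) field, geodist() may not
be usable for sorting, which would produce exactly this kind of parse error.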

Best,
Erick

On Thu, Apr 2, 2015 at 2:20 PM, Niraj  wrote:
> *Objective: To find out all locations that are present within 1 KM of the
> specified reference point, sorted by distance from the reference*


Re: SolrCloud 5.0 cluster RAM requirements

2015-04-02 Thread Ryan Steele
Thank you Shawn and Toke for the information and links!

No, I was not the one on #solr IRC channel. :/

Here are the details I have right now:

I'm building/running the operations side of this new SolrCloud cluster. 
It will be in Amazon; the initial cluster I'm planning to start with is 
5 r3.xlarge instances, each using a general purpose SSD EBS volume for 
the SolrCloud related data (this will be separate from the EBS volume 
used by the OS). Each instance has 30.5 GiB RAM--152.5 GiB cluster 
wide--and each instance has 4 vCPU's. I'm using Oracle Java 1.8.0_31 and 
the G1 GC.

The data will be indexed on a separate machine and added to the 
SolrCloud cluster while searching is taking place. Unfortunately I don't 
have numbers at this time on how much data will be indexed. I do know 
that we will have over 2000 collections--some will be small (a few 
hundred documents and only a few megabytes at most), and a few will be 
very large (somewhere in the gigabytes). Our old Solr Master/Slave 
system isn't broken up this way, so we aren't certain about how exactly 
things will map out in SolrCloud.

I'll continue researching, but I expect I'll just have to monitor the 
cluster as data gets imported into it and make adjustments as needed.

Ryan

On 4/2/15 12:06 AM, Toke Eskildsen wrote:
> - Are you indexing while searching? How much?
> - How many documents in the index?
> - What is a typical query? What about faceting?
> - How many concurrent queries?
> - Expected median response time?



Problems with solr-cloud 4.8.0 and zookeeper 3.4.6

2015-04-02 Thread Vincenzo D'Amore
Hi,

In my development environment I have 3 servers.
Inside every server there are two running instances of zookeeper and
solrcloud.
There aren't connections or any other clients running, but I have the
zookeeper logs flooded by these annoying exceptions, coming only from
servers 1 and 3.
All solrcloud and zookeeper instances seem to be healthy.
I'm really unable to understand why this is happening.

Any help is really appreciated.


2015-04-03 01:27:18,899 [myid:1] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection
from /192.168.0.13:51675
2015-04-03 01:27:18,900 [myid:1] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x0, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:744)
2015-04-03 01:27:18,900 [myid:1] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for
client /192.168.0.13:51675 (no session established for client)


-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Unable to update config file using zkcli or RELOAD

2015-04-02 Thread Shamik Bandopadhyay
Hi,

  I'm facing a weird issue. I have a solr cloud cluster with 2 shards, each
with a replica. I started the cluster
using -Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf. After the cluster is up and running, I
added a new request handler (newhandler) and wanted to push it out without
restarting the server. First, I tried the RELOAD option. I ran

http://54.151.xx.xxx:8983/solr/admin/cores?action=RELOAD&core=collection1

The command was successful, but when I logged in to the admin screen, the
solrconfig didn't show the request handler. Next I tried the zkcli script
on shard 1.

sh zkcli.sh -cmd upconfig -zkhost  zoohost1:2181 -confname myconf -solrhome
/mnt/opt/solrhome/ -confdir /mnt/opt/solrhome/solr/collection1/conf/

The script ran successfully and I could see the updated solrconfig file in
Solr admin. But then, when I tried

http://54.151.xx.xxx:8983/solr/collection1/newhandler

I got a 404. Not sure what I'm doing wrong. Do I need to run the zkcli
script on each node? I'm using Solr 5.0.

Regards,
Shamik


DocValues

2015-04-02 Thread William Bell
If I set indexed=true and docValues=true, when I facet with
facet=true&facet.field=manu_exact, will it use docValues or the indexed
version?
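
For reference, the field setup being described would presumably look
something like this in schema.xml:

<field name="manu_exact" type="string" indexed="true" stored="false" docValues="true"/>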

Also, does it help with "Too many values for UnInvertedField faceting"?

Do I need to set facet.method when using docValues?



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: SolrCloud 5.0 cluster RAM requirements

2015-04-02 Thread Shawn Heisey
On 4/2/2015 4:46 PM, Ryan Steele wrote:
> Thank you Shawn and Toke for the information and links! No, I was not
> the one on #solr IRC channel. :/ Here are the details I have right
> now: I'm building/running the operations side of this new SolrCloud
> cluster. It will be in Amazon, the initial cluster I'm planning to
> start with is 5 r3.xlarge instances each using a general purpose SSD
> EBS volume for the SolrCloud related data (this will be separate from
> the EBS volume used by the OS). Each instance has 30.5 GiB RAM--152.5
> GiB cluster wide--and each instance has 4 vCPU's. I'm using Oracle
> Java 1.8.0_31 and the G1 GC. 

Java 8u40 is supposed to have some significant improvements to G1
garbage collection, so I would recommend an upgrade from 8u31.  I heard
this directly from Oracle engineers on a mailing list for GC issues.

> The data will be indexed on a separate machine and added to the
> SolrCloud cluster while searching is taking place. Unfortunately I
> don't have numbers at this time on how much data will be indexed. I do
> know that we will have over 2000 collections--some will be small (a
> few hundred documents and only a few megabytes at most), and a few
> will be very large (somewhere in the gigabytes). Our old Solr
> Master/Slave systems isn't broken up this way, so we aren't certain
> about how exactly things will map out in SolrCloud. 

If it is a viable option to combine collections that use the same or
similar schemas and do filtering on the query side to reduce the total
number of collections to only a few hundred, your SolrCloud experience
will probably be better.  See this issue:

https://issues.apache.org/jira/browse/SOLR-7191
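
As an illustration of the query-side filtering idea, assuming a
hypothetical discriminator field named doc_group is added to every
document at index time:

# one combined collection; each query narrows to one logical collection
curl "http://localhost:8983/solr/combined/select?q=*:*&fq=doc_group:groupA"

Since fq results are cached in the filterCache, the per-group filter is
cheap after the first use.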

General SolrCloud stability is not very good with thousands of
collections, but I would imagine that SSD storage will improve that,
especially if the zookeeper database is also on SSD.

In a perfect world, for the best performance, you would have enough
memory across the cluster so that you can cache all of the index data
present on the cluster, including all replicas ... but for terabyte
scale indexes, that's either a huge amount of RAM on a modest number of
servers or a huge number of servers, each with a big chunk of RAM. 
Either way it's very expensive, especially on Amazon.  Usually you can
achieve very good performance without a perfect one-to-one relationship
between index size and RAM.

The fact that you will have a lot of smaller indexes will hopefully mean
only some of them are needed at any given time.  If that's the case,
your overall memory requirements will be lower than if you had a single
1TB index, and I think the SSD storage will help the performance of
those smaller indexes a lot more than it would for very large indexes.

Thanks,
Shawn



multi core faceting

2015-04-02 Thread Aman Tandon
Hi,

I have two cores: one contains the data for jeans and the other contains
data for shirts available to the user. I want to show counts of shirts and
jeans on my website from one solr request.

Is there any functionality available in solr by which I can get the
combined facet counts from both cores (jeans & shirts)?

With Regards
Aman Tandon


Re: SolrCloud 5.0 cluster RAM requirements

2015-04-02 Thread Shawn Heisey
On 4/2/2015 11:18 PM, Shawn Heisey wrote:
> On 4/2/2015 4:46 PM, Ryan Steele wrote:
>> cluster. It will be in Amazon, the initial cluster I'm planning to
>> start with is 5 r3.xlarge instances each using a general purpose SSD
>> EBS volume for the SolrCloud related data (this will be separate from
>> the EBS volume used by the OS). Each instance has 30.5 GiB RAM--152.5
>> GiB cluster wide--and each instance has 4 vCPU's. I'm using Oracle
>> Java 1.8.0_31 and the G1 GC. 

Followup on the RAM:  Depending on your query characteristics, 1TB of
index data might require a significant amount of heap memory.  I would
imagine that you'll need to allocate at least half of your 30GB RAM to
the Java heap on each server, and possibly more, which will reduce the
amount available for disk caching.
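
In Solr 5 the heap can be set when starting each node; a minimal sketch,
assuming roughly half of a 30GB instance goes to the JVM (the zookeeper
hosts are illustrative):

# -c starts in SolrCloud mode, -m sets min/max heap
bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181 -m 15g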

There's a very good chance that you'll either need more EC2 instances,
and/or that you will need instances with more memory.  Before committing
more resources, you will need to find out whether performance is
acceptable or not with what you have already planned.

Thanks,
Shawn



Re: multi core faceting

2015-04-02 Thread Shawn Heisey
On 4/2/2015 11:30 PM, Aman Tandon wrote:
> I have two cores: one contains the data for jeans and the other contains
> data for shirts available to the user. I want to show counts of shirts and
> jeans on my website from one solr request.
> 
> Is there any functionality available in solr by which I can get the
> combined facet counts from both cores (jeans & shirts)?

Are the schemas of the two cores the same, or at least very similar?  At
a bare minimum, they would need to use the same field name for the
uniqueKey, but substantial similarity, at least on the fields that you
will be querying and faceting, is usually required.

If the answer to that question is yes, you may be able to do a
distributed search.

Your message history on this list mentions SolrCloud quite frequently,
but your message specifically says "cores" ... which would tend to mean
that it's NOT SolrCloud.

If it is cloud, you could create a collection alias that points at both
collections, then use the alias in your queries to query them both.

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api4
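
A minimal sketch of the alias approach, assuming the collections are
literally named jeans and shirts and share a hypothetical product_type
field:

curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=apparel&collections=jeans,shirts"
curl "http://localhost:8983/solr/apparel/select?q=*:*&rows=0&facet=true&facet.field=product_type"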

If it's not SolrCloud, then you can use the older method for distributed
searching:

https://wiki.apache.org/solr/DistributedSearch
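
For the non-cloud case, a sketch of the shards parameter, again with
hypothetical core names and a single host:

curl "http://localhost:8983/solr/jeans/select?q=*:*&rows=0&shards=localhost:8983/solr/jeans,localhost:8983/solr/shirts&facet=true&facet.field=product_type"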

Thanks,
Shawn



Re: Unable to update config file using zkcli or RELOAD

2015-04-02 Thread shamik
Ok, I figured out the steps in case someone needs a reference. It required
both zkcli and RELOAD to update the changes.

1. Use zkcli to load the changes. I ran it from the node which used the
bootstrapping.

sh zkcli.sh -cmd upconfig -zkhost  zoohost1:2181 -confname myconf -solrhome 
/mnt/opt/solrhome/ -confdir /mnt/opt/solrhome/solr/collection1/conf/ 

2. Use the same node to run the RELOAD

http://54.151.xx.xxx:8983/solr/admin/cores?action=RELOAD&core=collection1





Re: Unable to update config file using zkcli or RELOAD

2015-04-02 Thread Shawn Heisey
On 4/3/2015 12:28 AM, shamik wrote:
> Ok, I figured out the steps in case someone needs a reference. It required
> both zkcli and RELOAD to update the changes.
> 
> 1. Use zkcli to load the changes. I ran it from the node which used the
> bootstrapping.
> 
> sh zkcli.sh -cmd upconfig -zkhost  zoohost1:2181 -confname myconf -solrhome 
> /mnt/opt/solrhome/ -confdir /mnt/opt/solrhome/solr/collection1/conf/ 
> 
> 2. Use the same node to run the RELOAD
> 
> http://54.151.xx.xxx:8983/solr/admin/cores?action=RELOAD&core=collection1

I was about to reply to your first message, then I read the second one.
 You've found the essence of the correct procedure, which is to upload
the new config and then RELOAD.

With SolrCloud, you should reload using the Collections API, not the
CoreAdmin API.  It will reload all the shard replicas (cores) in the
collection with one HTTP call.  Since a fault tolerant collection will
involve at least two cores located on different servers, the Collections
API is a lot easier to manage.

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api2
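
A sketch of the collection-level reload, using the collection name from
earlier in this thread:

curl "http://54.151.xx.xxx:8983/solr/admin/collections?action=RELOAD&name=collection1"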

Thanks,
Shawn