MiniSolrCloudCluster usage in solr 7.0.0

2016-04-14 Thread Rohana Rajapakse

Can someone give a sample code snippet to create a MiniSolrCloudCluster from a 
separate Java application (outside of the Solr codebase)? I'd also like to know 
which dependency jars and config files are needed.

Thanks

Rohana




Registered Office: 24 Darklake View, Estover, Plymouth, PL6 7TL.
Company Registration No: 3553908

This email contains proprietary information, some or all of which may be 
legally privileged. It is for the intended recipient only. If an addressing or 
transmission error has misdirected this email, please notify the author by 
replying to this email. If you are not the intended recipient you may not use, 
disclose, distribute, copy, print or rely on this email.

Email transmission cannot be guaranteed to be secure or error free, as 
information may be intercepted, corrupted, lost, destroyed, arrive late or 
incomplete or contain viruses. This email and any files attached to it have 
been checked with virus detection software before transmission. You should 
nonetheless carry out your own virus check before opening any attachment. GOSS 
Interactive Ltd accepts no liability for any loss or damage that may be caused 
by software viruses.




Re: Optimal indexing speed in Solr

2016-04-14 Thread Emir Arnautovic

Hi Edwin,
Indexing speed depends on multiple factors: hardware, Solr configuration and 
load, the documents themselves, and the indexing client. The more complex the 
documents, the more CPU time is needed to process each one before the index 
structure is written to disk. The bigger the index, the more heap is used and 
the more frequent GCs become. Maybe you are simply not sending enough docs to 
Solr to reach higher throughput.
The best way to pinpoint the bottleneck is to use a monitoring tool. One 
such tool is our SPM (http://sematext.com/spm) - it allows you to 
monitor both Solr and OS metrics.


HTH,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On 14.04.2016 05:29, Zheng Lin Edwin Yeo wrote:

Hi,

Would like to find out, what is the optimal indexing speed in Solr?

Previously, I managed to get more than 3GB/hour, but now the speed has dropped
to 0.7GB/hr. What could be the potential reason behind this?

Besides the index size getting bigger, I have only added more
collections into the core and added another field. Other than that, nothing
else has been changed.

Could the source file which I'm indexing make a difference to the indexing
speed?

I'm using Solr 5.4.0 for now, but will be planning to migrate to Solr 6.0.0.

Regards,
Edwin





Solr 5.5 timeout of solrj client

2016-04-14 Thread Novin Novin
Hi guys,

I'm getting the following error when sending a Solr doc:
org.apache.solr.client.solrj.SolrServerException: Timeout occured
while waiting response from server at:
http://localhost.com:8983/solr/analysis
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:585)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:240)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:229)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:106)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:71)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:85)
at com.temetra.wms.textindexer.TextIndexer$7.run(TextIndexer.java:544)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at 
org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:139)
at 
org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:155)
at 
org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:284)
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
at 
org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:165)
at 
org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167)
at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
at 
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
at 
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
at 
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at 
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:481)
... 10 more


I'm not really able to find out why this is happening.

Does anybody know what could have caused this error?


Thanks in advance.

Novin


dynamicField and type solr.LatLonType

2016-04-14 Thread Vangelis Katsikaros

Hi

I use Solr [1] on Ubuntu 14.04. I am trying to define a dynamicField with a custom 
type (i.e. not a built-in type like "int"). I don't see anything in the 
documentation that prohibits it, but I can't seem to make it work.


For a built-in type, my code can index the following (defined in schema.xml) fine:




but can't index with









I get the following:
HTTP Status 400 - ERROR: [doc=123] Error adding field 
'lala_1'='50.657398,-2.366020'

If you need any more info let me know.

Regards
Vangelis


[1]
Solr Specification Version: 3.5.0.2011.11.22.14.54.38
Solr Implementation Version: 3.5.0 1204988 - simon - 2011-11-22 14:54:38
Lucene Specification Version: 3.5.0
Lucene Implementation Version: 3.5.0 1204988 - simon - 2011-11-22 14:46:51


Re: How to declare field type for IntPoint field in solr 6.0 schema?

2016-04-14 Thread Shawn Heisey
On 4/13/2016 8:57 PM, Rafis Ismagilov wrote:
> Should it be PointType, BinaryField, or something else. All examples use 
> TrieIntField for int.

Solr doesn't have support for the new Point types in Lucene yet.  They
are a recent introduction, and Solr was caught a little off guard by how
fast they were pushed into becoming the primary numeric type for Lucene 6.

This is the issue that will most likely add them to a later 6.x release:

https://issues.apache.org/jira/browse/SOLR-8396

For reasons that might not be apparent, getting support for these new types 
into an early 6.x release is a high priority for us.

Thanks,
Shawn



Re: MiniSolrCloudCluster usage in solr 7.0.0

2016-04-14 Thread Shawn Heisey
On 4/14/2016 2:01 AM, Rohana Rajapakse wrote:
> Can someone give a sample code snippet to create MiniSolrCloudCluster from a 
> separate java application (outside of solr codebase).  Wants to know 
> dependency jars and config files you need.

I would imagine that you need to start with solr-test-framework.  I have
looked at this jar on maven central (also used by ivy), and the 6.0
version has *eighty* direct dependencies, one of which is solr-core. 
These dependencies probably represent most of a full Solr install.

As for sample code ... the class is heavily used in Solr tests, all of
which you can obtain by cloning the master branch with git.
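
For what it's worth, here is a minimal sketch of the kind of snippet being
asked for, pieced together from how the 6.x tests use the class. Treat the
constructor signature, uploadConfigSet, and the configset path as assumptions
to verify against your checked-out version:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.solr.client.solrj.embedded.JettyConfig;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.cloud.MiniSolrCloudCluster;
import org.apache.solr.common.SolrInputDocument;

public class MiniClusterDemo {
    public static void main(String[] args) throws Exception {
        // Scratch directory; the cluster starts its own embedded ZooKeeper
        Path baseDir = Files.createTempDirectory("mini-solr");
        MiniSolrCloudCluster cluster =
                new MiniSolrCloudCluster(2, baseDir, JettyConfig.builder().build());
        try {
            // Upload a configset (a directory holding solrconfig.xml/schema)
            cluster.uploadConfigSet(Paths.get("path/to/conf"), "myconf");
            CollectionAdminRequest.createCollection("test", "myconf", 1, 1)
                    .process(cluster.getSolrClient());

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1");
            cluster.getSolrClient().add("test", doc);
            cluster.getSolrClient().commit("test");
        } finally {
            cluster.shutdown();
        }
    }
}
```

Compiling this requires solr-test-framework and its (many) transitive
dependencies on the classpath, as described above.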

Thanks,
Shawn



Re: Solr 5.5 timeout of solrj client

2016-04-14 Thread Shawn Heisey
On 4/14/2016 4:40 AM, Novin Novin wrote:
> I'm having error
>
>  when sending solr doc
> mid15955728
> org.apache.solr.client.solrj.SolrServerException: Timeout occured
> while waiting response from server at:
> http://localhost.com:8983/solr/analysis



> Caused by: java.net.SocketTimeoutException: Read timed out

You encountered a socket timeout.  This is a low-level TCP timeout. 
It's effectively an idle timeout -- no activity for X seconds and the
TCP connection is severed.

I believe the Jetty included with Solr has a socket timeout of 50
seconds configured.  You can also configure a socket timeout on the
HttpClient used by various SolrClient implementations.
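
A sketch of setting those timeouts from SolrJ 5.x (the URL is the one from the
stack trace; both setters exist on HttpSolrClient in that line of releases, but
verify against your version):

```java
import org.apache.solr.client.solrj.impl.HttpSolrClient;

// Build a client with explicit timeouts instead of relying on defaults
HttpSolrClient client = new HttpSolrClient("http://localhost.com:8983/solr/analysis");
client.setConnectionTimeout(5000); // ms allowed to establish the TCP connection
client.setSoTimeout(120000);       // ms of read inactivity before SocketTimeoutException
```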

The operating system (on either end of the connection) may also have a
default socket timeout configured, but I believe that these defaults are
normally measured in hours, not seconds.

Thanks,
Shawn



Re: Solr 5.5 timeout of solrj client

2016-04-14 Thread Novin Novin
Thanks for reply Shawn.

Below is snippet of jetty.xml and jetty-https.xml

jetty.xml:38: <Property name="solr.jetty.threads.idle.timeout" default="5000"/>
/// I presume this is the one I should increase, but I believe 5 seconds is enough
time for 250 docs to be added to Solr.

jetty.xml:39:

jetty-https.xml:45:

I'm also seeing "DirectUpdateHandler2 Starting optimize... Reading and
rewriting the entire index! Use with care". Would this be causing delayed
responses from Solr?

Thanks in advance,
Novin


On 14 April 2016 at 14:05, Shawn Heisey  wrote:

> On 4/14/2016 4:40 AM, Novin Novin wrote:
> > I'm having error
> >
> >  when sending solr doc
> > mid15955728
> > org.apache.solr.client.solrj.SolrServerException: Timeout occured
> > while waiting response from server at:
> > http://localhost.com:8983/solr/analysis
>
> 
>
> > Caused by: java.net.SocketTimeoutException: Read timed out
>
> You encountered a socket timeout.  This is a low-level TCP timeout.
> It's effectively an idle timeout -- no activity for X seconds and the
> TCP connection is severed.
>
> I believe the Jetty included with Solr has a socket timeout of 50
> seconds configured.  You can also configure a socket timeout on the
> HttpClient used by various SolrClient implementations.
>
> The operating system (on either end of the connection) may also have a
> default socket timeout configured, but I believe that these defaults are
> normally measured in hours, not seconds.
>
> Thanks,
> Shawn
>
>


Re: dynamicField and type solr.LatLonType

2016-04-14 Thread Shawn Heisey
On 4/14/2016 4:56 AM, Vangelis Katsikaros wrote:
> but can't index with
>
> 
>  subFieldSuffix="_coordinate"/>
> 
>
> 
> 
> 
>
> I get the following:
> HTTP Status 400 - ERROR: [doc=123] Error adding field
> 'lala_1'='50.657398,-2.366020'

When a LatLonType field is indexed, it will create two additional fields
containing the latitude and longitude as double-precision floating point
numbers.  These extra fields are normally handled by a dynamicField
definition for *_coordinate".  That dynamic field definition has a type
of tdouble.  It is indexed and not stored.  The "*_coordinate"
dynamicField and its associated fieldType can be found in Solr examples.
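
For reference, the stock definitions look roughly like this (reproduced from
memory of the example schema.xml; names other than *_coordinate, such as
lala_*, are illustrative, so check the example shipped with your version):

```xml
<fieldType name="tdouble" class="solr.TrieDoubleField"
           precisionStep="8" positionIncrementGap="0"/>
<fieldType name="location" class="solr.LatLonType"
           subFieldSuffix="_coordinate"/>

<dynamicField name="lala_*" type="location" indexed="true" stored="true"/>
<!-- Receives the lat/lon sub-fields that LatLonType generates -->
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
```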

Adding that definition to your schema might fix this problem.  The full
stacktrace may mention a field name starting with lala_1 and ending with
_coordinate.  For further help, we will need to see that full
stacktrace, including any "Caused by" sections.

http://stackoverflow.com/a/12530639/2665648

> Solr Specification Version: 3.5.0.2011.11.22.14.54.38
> Solr Implementation Version: 3.5.0 1204988 - simon - 2011-11-22 14:54:38
> Lucene Specification Version: 3.5.0
> Lucene Implementation Version: 3.5.0 1204988 - simon - 2011-11-22
> 14:46:51

That's an ancient version of Solr -- over four years old now.  The 3.x
versions are VERY solid, but are receiving zero developer attention. 
I'm not telling you to upgrade, but if you find bugs, they have probably
already been fixed in a newer version, and will not be fixed in your
version.  I don't think the problem you're having is a bug.

Thanks,
Shawn



Re: Solr 5.5 timeout of solrj client

2016-04-14 Thread Shawn Heisey
On 4/14/2016 7:23 AM, Novin Novin wrote:
> Thanks for reply Shawn.
>
> Below is snippet of jetty.xml and jetty-https.xml
>
> jetty.xml:38: <Property name="solr.jetty.threads.idle.timeout" default="5000"/>
> /// I presume this one I should increase, But I believe 5 second is enough
> time for 250 docs to add to solr.

5 seconds might not be enough time.  The *add* probably completes in
time, but the entire request might take longer, especially if you use
commit=true with the request.  I would definitely NOT set this timeout
so low -- requests that take longer than 5 seconds are very likely going
to happen.

> I'm also seeing "DirectUpdateHandler2 Starting optimize... Reading and
> rewriting the entire index! Use with care". Would this be causing delay
> response from solr?

Exactly how long an optimize takes is dependent on the size of your
index.  Rewriting an index that's a few hundred megabytes may take 30
seconds to a minute.  Rewriting an index that's several gigabytes will
take a few minutes.  Performance is typically lower during an optimize,
because the CPU and disks are very busy.

Thanks,
Shawn



DIH with Nested Documents - Configuration Issue

2016-04-14 Thread Jeff Chastain
I am working on a project where the specification requires a parent - child 
relationship within the Solr data collection ... i.e. a user and the collection 
of languages they speak (each of which is made up of multiple data fields).  My 
production system is a 4.10 Solr implementation, but I have a 5.5 implementation 
at my disposal as well.  Thus far, I am not getting this to work on either one 
and I have yet to find a complete documentation source on how to implement this.

The goal is to get a resulting document from Solr that looks like this:

   {
   "id": 123,
   "firstName": "John",
   "lastName": "Doe",
   "languagesSpoken": [
  {
 "id": 243,
 "abbreviation": "en",
 "name": "English"
  },
  {
 "id": 442,
 "abbreviation": "fr",
 "name": "French"
  }
   ]
}

In my schema.xml, I have flattened out all of the fields as follows:

   
   
   
   
   
   
   

The latest rendition of my db-data-config.xml looks like this:


   
   
  

 
 
 

 



 
  
   
   ...

On the 4.10 server, when the data comes out of Solr, I get one flat document 
record, with the fields for one language inline with the firstName and lastName, 
like this:

   {
   "id": 123,
   "firstName": "John",
   "lastName": "Doe",
   "languagesSpoken_id": 243,
   "languagesSpoken_abbreviation ": "en",
   "languagesSpoken_name": "English"
}

On the 5.5 server, when the data comes out, I get separate documents for the 
root client document and the child language documents with no relationship 
between them like this:

   {
   "id": 123,
   "firstName": "John",
   "lastName": "Doe"
},
{
   "languagesSpoken_id": 243,
   "languagesSpoken_abbreviation": "en",
   "languagesSpoken_name": "English"
},
{
   "languagesSpoken_id": 442,
   "languagesSpoken_abbreviation": "fr",
   "languagesSpoken_name": "French"
}

I have spent several days now trying to figure out what is going on here to no 
avail.  Can anybody provide me with a pointer as to what I am missing here?

Thanks,
-- Jeff
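
For anyone hitting the same issue: the 5.x DataImportHandler can emit true
parent/child blocks when the inner entity carries child="true" (SOLR-5147);
4.10 predates that. A sketch against the data implied above (driver, URL,
table and column names are guesses):

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/people" user="user" password="pass"/>
  <document>
    <entity name="person" query="SELECT id, firstName, lastName FROM people">
      <!-- child="true" indexes these as nested child documents -->
      <entity name="languagesSpoken" child="true"
              query="SELECT id, abbreviation, name FROM languages
                     WHERE personId = '${person.id}'"/>
    </entity>
  </document>
</dataConfig>
```

Note that nested documents still come back as separate documents unless queried
with the block-join query parsers (e.g. {!parent ...}) and a child doc transformer.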



Re: Optimal indexing speed in Solr

2016-04-14 Thread John Bickerstaff
If you delete a lot of documents over time, or if you add updated documents
of the same I'd over time, optimizing your collection(s) may help.
On Apr 14, 2016 3:52 AM, "Emir Arnautovic" 
wrote:

> Hi Edwin,
> Indexing speed depends on multiple factors: HW, Solr configurations and
> load, documents, indexing client: More complex documents, more CPU time to
> process each document before indexing structure is written down to disk.
> Bigger the index, more heap is used, more frequent GCs. Maybe you are just
> not sending enough doc to Solr to have such throughput.
> The best way to pinpoint bottleneck is to use some monitoring tool. One
> such tool is our SPM (http://sematext.com/spm) - it allows you to monitor
> both Solr and OS metrics.
>
> HTH,
> Emir
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On 14.04.2016 05:29, Zheng Lin Edwin Yeo wrote:
>
>> Hi,
>>
>> Would like to find out, what is the optimal indexing speed in Solr?
>>
>> Previously, I managed to get more than 3GB/hour, but now the speed has
>> drop
>> to 0.7GB/hr. What could be the potential reason behind this?
>>
>> Besides the index size getting bigger, I have only added in more
>> collections into the core and added another field. Other than that nothing
>> else has been changed..
>>
>> Could the source file which I'm indexing made a difference in the indexing
>> speed?
>>
>> I'm using Solr 5.4.0 for now, but will be planning to migrate to Solr
>> 6.0.0.
>>
>> Regards,
>> Edwin
>>
>>
>


Solr best practices for many to many relations...

2016-04-14 Thread Bastien Latard - MDPI AG

Hi Guys,

I am upgrading from Solr 4.2 to 6.0.
I successfully (after some time) migrated the config files and other 
parameters...


Now I'm just wondering if my indexes are following the best 
practices...(and they are probably not :-) )


What would be the best approach if we have this kind of SQL data to write into Solr:


I have several different services which need (more or less) different 
data based on these JOINs...


e.g.:
Service A needs lots of data (but not all),
Service B needs a few fields (some already included in A),
Service C needs a bit more data than B (some fields already included in 
A/B)...


1. Would it be better to create one single index?
-> i.e.: this will duplicate journal info for every single article

2. Would it be better to create several specific indexes for each 
similar service?
-> i.e.: this will use more space on the disks (and there are 
~70 million documents to join)

3. Would it be better to create an index per table and make a join?
-> if yes, how??

Kind regards,
Bastien



Re: Solr best practices for many to many relations...

2016-04-14 Thread Jack Krupansky
Solr is a search engine, not a database.

JOINs? Although Solr does have some limited JOIN capabilities, they are
more for special situations, not the front-line go-to technique for data
modeling for search.

Rather, denormalization is the front-line go-to technique for data modeling
in Solr.

In any case, the first step in data modeling is always to focus on your
queries - what information will be coming into your apps and what
information will the apps want to access based on those inputs.

But wait... you say you are upgrading, which suggests that you have an
existing Solr data model, and probably queries as well. So...

1. Share at least a summary of your existing Solr data model as well as at
least a summary of the kinds of queries you perform today.
2. Tell us what exactly is driving your inquiry - are queries too slow,
too cumbersome, not sufficiently powerful, or... what exactly is the
problem you need to solve?
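
To make the denormalization suggestion concrete: rather than joining an
article index against a journal index at query time, each article document
simply repeats the journal fields it needs (all names invented for
illustration):

```json
{
  "article_id": "a-101",
  "title": "Some article title",
  "journal_id": "j-7",
  "journal_name": "Example Journal",
  "journal_issn": "1234-5678"
}
```

The journal info is duplicated across the ~70 million articles, but queries
stay single-index; that trade-off is usually the right one for search.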


-- Jack Krupansky

On Thu, Apr 14, 2016 at 10:12 AM, Bastien Latard - MDPI AG <
lat...@mdpi.com.invalid> wrote:

> Hi Guys,
>
> *I am upgrading from solr 4.2 to 6.0.*
> *I successfully (after some time) migrated the config files and other
> parameters...*
>
> Now I'm just wondering if my indexes are following the best
> practices...(and they are probably not :-) )
>
> What would be the best if we have this kind of sql data to write in Solr:
>
>
> I have several different services which need (more or less), different
> data based on these JOINs...
>
> e.g.:
> Service A needs lots of data (but bot all),
> Service B needs a few data (some fields already included in A),
> Service C needs a bit more data than B(some fields already included in
> A/B)...
>
> *1. Would it be better to create one single index?*
> *-> i.e.: this will duplicate journal info for every single article*
>
> *2. Would it be better to create several specific indexes for each similar
> services?*
>
>
>
>
>
> *-> i.e.: this will use more space on the disks (and there are ~70millions
> of documents to join) 3. Would it be better to create an index per table
> and make a join? -> if yes, how?? *
>
> Kind regards,
> Bastien
>
>


RE: MiniSolrCloudCluster usage in solr 7.0.0

2016-04-14 Thread Rohana Rajapakse
Thanks Shawn.

I have added a few dependency jars to my project. There are no compilation 
errors or ClassNotFound exceptions, but I get a Zookeeper exception: 
"KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for 
/solr/solr.xml". My temporary solrHome folder has a solr.xml.  No other files 
(solrconfig.xml, schema.xml) are provided. I thought it would start the Solr Cloud 
server with defaults, but it doesn't. There are no other Solr or ZooKeeper 
servers running on my machine. 

 I have had a look at the unit tests in the solr-7.0 codebase and tried the tests in 
TestMiniSolrCloudCluster by copying that test file across to my app's project. 
The tests fail, possibly due to differences in the environment. I am running 
the tests in Eclipse.



-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: 14 April 2016 14:00
To: solr-user@lucene.apache.org
Subject: Re: MiniSolrCloudCluster usage in solr 7.0.0

On 4/14/2016 2:01 AM, Rohana Rajapakse wrote:
> Can someone give a sample code snippet to create MiniSolrCloudCluster from a 
> separate java application (outside of solr codebase).  Wants to know 
> dependency jars and config files you need.

I would imagine that you need to start with solr-test-framework.  I have looked 
at this jar on maven central (also used by ivy), and the 6.0 version has 
*eighty* direct dependencies, one of which is solr-core. 
These dependencies probably represent most of a full Solr install.

As for sample code ... the class is heavily used in Solr tests, all of which 
you can obtain by cloning the master branch with git.

Thanks,
Shawn







Re: Optimal indexing speed in Solr

2016-04-14 Thread John Bickerstaff
Stupid phone autocorrect...

If you add updated documents of the same ID over time, optimizing your
collection(s) may help.

On Thu, Apr 14, 2016 at 7:50 AM, John Bickerstaff 
wrote:

> If you delete a lot of documents over time, or if you add updated
> documents of the same I'd over time, optimizing your collection(s) may help.
> On Apr 14, 2016 3:52 AM, "Emir Arnautovic" 
> wrote:
>
>> Hi Edwin,
>> Indexing speed depends on multiple factors: HW, Solr configurations and
>> load, documents, indexing client: More complex documents, more CPU time to
>> process each document before indexing structure is written down to disk.
>> Bigger the index, more heap is used, more frequent GCs. Maybe you are just
>> not sending enough doc to Solr to have such throughput.
>> The best way to pinpoint bottleneck is to use some monitoring tool. One
>> such tool is our SPM (http://sematext.com/spm) - it allows you to
>> monitor both Solr and OS metrics.
>>
>> HTH,
>> Emir
>>
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
>> On 14.04.2016 05:29, Zheng Lin Edwin Yeo wrote:
>>
>>> Hi,
>>>
>>> Would like to find out, what is the optimal indexing speed in Solr?
>>>
>>> Previously, I managed to get more than 3GB/hour, but now the speed has
>>> drop
>>> to 0.7GB/hr. What could be the potential reason behind this?
>>>
>>> Besides the index size getting bigger, I have only added in more
>>> collections into the core and added another field. Other than that
>>> nothing
>>> else has been changed..
>>>
>>> Could the source file which I'm indexing made a difference in the
>>> indexing
>>> speed?
>>>
>>> I'm using Solr 5.4.0 for now, but will be planning to migrate to Solr
>>> 6.0.0.
>>>
>>> Regards,
>>> Edwin
>>>
>>>
>>


Re: Solr 5.5 timeout of solrj client

2016-04-14 Thread Novin Novin
How can I stop "DirectUpdateHandler2 Starting optimize... Reading
and rewriting the entire index! Use with care" from happening?

Thanks
novin

On 14 April 2016 at 14:36, Shawn Heisey  wrote:

> On 4/14/2016 7:23 AM, Novin Novin wrote:
> > Thanks for reply Shawn.
> >
> > Below is snippet of jetty.xml and jetty-https.xml
> >
> > jetty.xml:38: > name="solr.jetty.threads.idle.timeout" default="5000"/>
> > /// I presume this one I should increase, But I believe 5 second is
> enough
> > time for 250 docs to add to solr.
>
> 5 seconds might not be enough time.  The *add* probably completes in
> time, but the entire request might take longer, especially if you use
> commit=true with the request.  I would definitely NOT set this timeout
> so low -- requests that take longer than 5 seconds are very likely going
> to happen.
>
> > I'm also seeing "DirectUpdateHandler2 Starting optimize... Reading and
> > rewriting the entire index! Use with care". Would this be causing delay
> > response from solr?
>
> Exactly how long an optimize takes is dependent on the size of your
> index.  Rewriting an index that's a few hundred megabytes may take 30
> seconds to a minute.  Rewriting an index that's several gigabytes will
> take a few minutes.  Performance is typically lower during an optimize,
> because the CPU and disks are very busy.
>
> Thanks,
> Shawn
>
>


RE: Multiple data-config.xml in one collection?

2016-04-14 Thread Jay Parashar
You have to specify which one to run. Each DIH will run only one XML (e.g. 
health-topics-conf.xml)

One thing (please correct me if I'm wrong): I have noticed that running a DataImport 
for a particular config overwrites the existing data for a document - that is, there 
is no way to preserve the existing data.
For example, if you have a schema of 5 fields, running the 
health-topics-conf.xml DIH loads 3 of those fields for a document (id=XYZ),
and then running the encyclopedia-conf.xml DIH will overwrite those 3 fields 
for the same document id=XYZ.

-Original Message-
From: Yangrui Guo [mailto:guoyang...@gmail.com] 
Sent: Tuesday, April 05, 2016 2:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Multiple data-config.xml in one collection?

Hi Daniel,

So if I implement multiple DataImportHandlers and do a full import, does Solr 
perform the import for all handlers at once, or can I just specify which handler to 
import with? Thank you

Yangrui

On Tuesday, April 5, 2016, Davis, Daniel (NIH/NLM) [C] 
wrote:

> If Shawn is correct, and you are using DIH, then I have done this by 
> implementing multiple requestHandlers each of them using Data Import 
> Handler, and have each specify a different XML file for the data config.
> Instead of using data-config.xml, I've used a large number of files such as:
> health-topics-conf.xml
> encyclopedia-conf.xml
> ...
> I tend to index a single valued, required field named "source" that I 
> can use in the delete query, and I use the TemplateTranformer to make this 
> easy:
>
>  ...
>transformer="TemplateTransformer">
>
>...
>
> Hope this helps,
>
> -Dan
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org ]
> Sent: Tuesday, April 05, 2016 10:50 AM
> To: solr-user@lucene.apache.org 
> Subject: Re: Multiple data-config.xml in one collection?
>
> On 4/5/2016 8:12 AM, Yangrui Guo wrote:
> > I'm using Solr Cloud to index a number of databases. The problem is 
> > there is unknown number of databases and each database has its own
> configuration.
> > If I create a single collection for every database the query would 
> > eventually become insanely long. Is it possible to upload different 
> > config to zookeeper for each node in a single collection?
>
> Every shard replica (core) in a collection shares the same 
> configuration, which it gets from zookeeper.  This is one of 
> SolrCloud's guarantees, to prevent problems found with old-style 
> sharding when the configuration is different on each machine.
>
> If you're using the dataimport handler, which you probably are since 
> you mentioned databases, you can parameterize pretty much everything 
> in the DIH config file so it comes from URL parameters on the 
> full-import or delta-import command.
>
> Below is a link to the DIH config that I'm using, redacted slightly.
> I'm not running SolrCloud, but the same thing should work in cloud.  
> It should give you some idea of how to use variables in your config, 
> set by parameters on the URL.
>
> http://apaste.info/jtq
>
> Thanks,
> Shawn
>
>
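
The two ideas quoted above - a constant "source" field filled in by
TemplateTransformer, and request-parameter substitution in the DIH config -
might look roughly like this (entity, table, and parameter names are invented
for illustration):

```xml
<entity name="healthTopics"
        transformer="TemplateTransformer"
        query="SELECT id, title FROM topics
               WHERE site = '${dataimporter.request.site}'">
  <!-- Constant per-config value, usable in a deleteQuery like source:health-topics -->
  <field column="source" template="health-topics"/>
</entity>
```

A full-import would then pass the value on the URL, e.g.
.../dataimport?command=full-import&site=health.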


Re: Solr 5.5 timeout of solrj client

2016-04-14 Thread Erick Erickson
Don't issue an optimize command... either you have a SolrJ client that
issues a client.optimize() call, or you pressed the "optimize now" button
in the admin UI. Solr doesn't do this by itself.

Best,
Erick

On Thu, Apr 14, 2016 at 8:30 AM, Novin Novin  wrote:
> How can I stop happening "DirectUpdateHandler2 Starting optimize... Reading
> and rewriting the entire index! Use with care"
>
> Thanks
> novin
>
> On 14 April 2016 at 14:36, Shawn Heisey  wrote:
>
>> On 4/14/2016 7:23 AM, Novin Novin wrote:
>> > Thanks for reply Shawn.
>> >
>> > Below is snippet of jetty.xml and jetty-https.xml
>> >
>> > jetty.xml:38:> > name="solr.jetty.threads.idle.timeout" default="5000"/>
>> > /// I presume this one I should increase, But I believe 5 second is
>> enough
>> > time for 250 docs to add to solr.
>>
>> 5 seconds might not be enough time.  The *add* probably completes in
>> time, but the entire request might take longer, especially if you use
>> commit=true with the request.  I would definitely NOT set this timeout
>> so low -- requests that take longer than 5 seconds are very likely going
>> to happen.
>>
>> > I'm also seeing "DirectUpdateHandler2 Starting optimize... Reading and
>> > rewriting the entire index! Use with care". Would this be causing delay
>> > response from solr?
>>
>> Exactly how long an optimize takes is dependent on the size of your
>> index.  Rewriting an index that's a few hundred megabytes may take 30
>> seconds to a minute.  Rewriting an index that's several gigabytes will
>> take a few minutes.  Performance is typically lower during an optimize,
>> because the CPU and disks are very busy.
>>
>> Thanks,
>> Shawn
>>
>>


Re: MiniSolrCloudCluster usage in solr 7.0.0

2016-04-14 Thread Erick Erickson
Rohana:

Let's back up a bit; this really feels like an XY problem. Why do you
want to do this? MiniSolrCloudCluster is designed as a test mechanism;
it is not intended (AFAIK) for any kind of stand-alone operation, so
you'd be on your own if that's your goal...

Best,
Erick

On Thu, Apr 14, 2016 at 7:32 AM, Rohana Rajapakse
 wrote:
> Thanks Shawn.
>
> I have added few dependency jars into my project. There are no compilation 
> errors or ClassNotFound exceptions, but Zookeeper exception " 
> KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for 
> /solr/solr.xml ". My temporary solrHome folder has a solr.xml.  No other 
> files (solrconfig.xml , schema.xml) are provided. Thought it should start 
> solr cloud server with defaults, but it doesn't. There are no other solr or 
> zookeeper servers running on my machine.
>
>  I have had a look at the unit tests in the solr-7.0 code base and tried the tests
> in TestMiniSolrCloudCluster by copying the test file across to my app's project.
> The tests fail, possibly due to the difference in environment. I am running the
> tests in Eclipse.
>
>
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: 14 April 2016 14:00
> To: solr-user@lucene.apache.org
> Subject: Re: MiniSolrCloudCluster usage in solr 7.0.0
>
> On 4/14/2016 2:01 AM, Rohana Rajapakse wrote:
>> Can someone give a sample code snippet to create MiniSolrCloudCluster from a 
>> separate java application (outside of solr codebase).  Wants to know 
>> dependency jars and config files you need.
>
> I would imagine that you need to start with solr-test-framework.  I have 
> looked at this jar on maven central (also used by ivy), and the 6.0 version 
> has *eighty* direct dependencies, one of which is solr-core.
> These dependencies probably represent most of a full Solr install.
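
For reference, pulling solr-test-framework into a Maven build looks roughly like the excerpt below; the version shown is illustrative only and must match your Solr release:

```xml
<!-- Hypothetical pom.xml excerpt: solr-test-framework brings in
     MiniSolrCloudCluster plus its (many) transitive dependencies,
     including solr-core. Match the version to your Solr release. -->
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-test-framework</artifactId>
  <version>6.0.0</version>
  <scope>test</scope>
</dependency>
```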
>
> As for sample code ... the class is heavily used in Solr tests, all of which 
> you can obtain by cloning the master branch with git.
>
> Thanks,
> Shawn
>
>
>
>
>


RE: Multiple data-config.xml in one collection?

2016-04-14 Thread Davis, Daniel (NIH/NLM) [C]
Jay Parashar wrote:
> One thing, and please correct if wrong, I have noticed running DataImport for 
> a particular config overwrites the existing data  for a document...that is, 
> there is 
> no way to preserve the existing data.
> 
> For example if you have a schema of 5 fields and running the 
> health-topics-conf.xml  
> DIH  loads 3 of those fields of a document (id=XYZ) And then running the 
> encyclopedia-conf.xml 
> DIH will overwrite those 3 fields for the same  document id = XYZ.

Not quite so.  You're right that each RequestHandler has a *default* data
config, specified in solrconfig.xml.  As with most things in Solr, this can
be overridden, but keeping one config per handler is still a good practice.
You are right that if one DataImport imports the same ID as another, it will
overwrite the older copy completely.  However, you can control the overlap
so that indexing is independent even into the same collection.

Suppose you have two configured request handlers:

/dataimport/healthtopics - this uses health-topics-conf.xml
/dataimport/encyclopedia - this uses encyclopedia-conf.xml

These two files can load *completely separate records* with different ids, and 
they can 
have different delete queries configured.   An excerpt from my 
health-topics-conf.xml:

[XML excerpt stripped by the list archiver]

Hope this helps,

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH
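
The XML excerpt Dan attached did not survive the list archiver, so here is a hedged sketch of what a DIH config along these lines typically looks like. The data source, table/column names, and the "health-topics" template value are invented for illustration; `preImportDeleteQuery` is one way to scope deletes to a single source:

```xml
<!-- Illustrative sketch of a config like health-topics-conf.xml.
     The constant "source" field plus preImportDeleteQuery let this
     import coexist with others in the same collection. All names
     here are made up. -->
<dataConfig>
  <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/healthdb" user="solr"/>
  <document>
    <!-- Before a full-import, delete only documents this import owns -->
    <entity name="topic"
            query="SELECT id, title, summary FROM health_topics"
            preImportDeleteQuery="source:health-topics"
            transformer="TemplateTransformer">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="summary" name="summary"/>
      <field column="source" template="health-topics"/>
    </entity>
  </document>
</dataConfig>
```

The companion encyclopedia-conf.xml would use a different `source` template and delete query, which is what keeps the two imports independent.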



-Original Message-
From: Jay Parashar [mailto:bparas...@slb.com] 
Sent: Thursday, April 14, 2016 11:43 AM
To: solr-user@lucene.apache.org
Subject: RE: Multiple data-config.xml in one collection?

You have to specify which one to run. Each DIH will run only one XML (e.g. 
health-topics-conf.xml)


-Original Message-
From: Yangrui Guo [mailto:guoyang...@gmail.com]
Sent: Tuesday, April 05, 2016 2:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Multiple data-config.xml in one collection?

Hi Daniel,

So if I implement multiple DataImportHandlers and do a full import, does Solr 
run the import for all handlers at once, or can I specify which handler to 
import? Thank you

Yangrui

On Tuesday, April 5, 2016, Davis, Daniel (NIH/NLM) [C] 
wrote:

> If Shawn is correct, and you are using DIH, then I have done this by 
> implementing multiple requestHandlers each of them using Data Import 
> Handler, and have each specify a different XML file for the data config.
> Instead of using data-config.xml, I've used a large number of files such as:
> health-topics-conf.xml
> encyclopedia-conf.xml
> ...
> I tend to index a single valued, required field named "source" that I 
> can use in the delete query, and I use the TemplateTranformer to make this 
> easy:
>
>  ...
>  <entity ... transformer="TemplateTransformer">
>    <field column="source" template="..."/>
>  ...
>
> Hope this helps,
>
> -Dan
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org ]
> Sent: Tuesday, April 05, 2016 10:50 AM
> To: solr-user@lucene.apache.org 
> Subject: Re: Multiple data-config.xml in one collection?
>
> On 4/5/2016 8:12 AM, Yangrui Guo wrote:
> > I'm using Solr Cloud to index a number of databases. The problem is 
> > there is unknown number of databases and each database has its own
> configuration.
> > If I create a single collection for every database the query would 
> > eventually become insanely long. Is it possible to upload different 
> > config to zookeeper for each node in a single collection?
>
> Every shard replica (core) in a collection shares the same 
> configuration, which it gets from zookeeper.  This is one of 
> SolrCloud's guarantees, to prevent problems found with old-style 
> sharding when the configuration is different on each machine.
>
> If you're using the dataimport handler, which you probably are since 
> you mentioned databases, you can parameterize pretty much everything 
> in the DIH config file so it comes from URL parameters on the 
> full-import or delta-import command.
>
> Below is a link to the DIH config that I'm using, redacted slightly.
> I'm not running SolrCloud, but the same thing should work in cloud.  
> It should give you some idea of how to use variables in your config, 
> set by parameters on the URL.
>
> http://apaste.info/jtq
>
> Thanks,
> Shawn
>
>
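
The parameterization Shawn describes relies on `${dataimporter.request.*}` placeholders in the DIH config, which are filled in from the import URL. A minimal sketch (parameter, host, and table names are invented for illustration):

```xml
<!-- Hypothetical parameterized DIH config: dbHost, dbName, and
     tableName come from the request, e.g.
     /dataimport?command=full-import&dbHost=dbserver&dbName=prod&tableName=items -->
<dataConfig>
  <dataSource type="JdbcDataSource"
              url="jdbc:mysql://${dataimporter.request.dbHost}/${dataimporter.request.dbName}"/>
  <document>
    <entity name="item"
            query="SELECT id, name FROM ${dataimporter.request.tableName}"/>
  </document>
</dataConfig>
```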


Solr Support for BM25F

2016-04-14 Thread David Cawley
Hello,
I am developing an enterprise search engine for a project, and I was hoping
to implement the BM25F ranking algorithm so I can configure the tuning
parameters on a per-field basis. I understand BM25 similarity is now
supported in Solr, but I was hoping to be able to configure k1 and b for
different fields such as title, description, anchor, etc., as these are
structured documents.
I am fairly new to Solr, so any help would be appreciated. If this is
possible, any steps on how to go about implementing it would be greatly
appreciated.

Regards,

David

Current Solr Version 5.4.1


Re: Solr Support for BM25F

2016-04-14 Thread Doug Turnbull
Hey David

You can configure BM25 differently for each field by configuring the
similarity per field type, as shown here in this example from the Solr tests

https://github.com/sudarshang/lucene-solr/blob/master/solr/core/src/test-files/solr/conf/schema-bm25.xml#L32

On Thu, Apr 14, 2016 at 12:41 PM, David Cawley 
wrote:

> Hello,
> I am developing an enterprise search engine for a project and I was hoping
> to implement BM25F ranking algorithm to configure the tuning parameters on
> a per field basis. I understand BM25 similarity is now supported in Solr
> but I was hoping to be able to configure k1 and b for different fields such
> as title, description, anchor etc, as they are structured documents.
> I am fairly new to Solr so any help would be appreciated. If this is
> possible or any steps as to how I can go about implementing this it would
> be greatly appreciated.
>
> Regards,
>
> David
>
> Current Solr Version 5.4.1
>



-- 
Doug Turnbull | Search Relevance Consultant | OpenSource Connections, LLC | 240.476.9983
Author: Relevant Search 
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.


Re: Solr Support for BM25F

2016-04-14 Thread Chris Hostetter

: a per field basis. I understand BM25 similarity is now supported in Solr

BM25 has been supported for a while; the major change recently is that it 
is now the underlying default in Solr 6.

: but I was hoping to be able to configure k1 and b for different fields such
: as title, description, anchor etc, as they are structured documents.

What you can do in Solr is configure different Similarity instances on a 
per-fieldType basis -- but you can have as many fieldTypes in your schema 
as you want, so you could have one type used just by your title field, and 
a different type used just by your description field, etc...

: Current Solr Version 5.4.1

You can download the Solr reference guide for 5.4 from here...

http://archive.apache.org/dist/lucene/solr/ref-guide/

You'll want to search for Similarity and in particular 
"SchemaSimilarityFactory", which (in 5.4) you'll have to configure 
explicitly in order to use different BM25Similarity instances for each 
fieldType.

In 6.0, SchemaSimilarityFactory is the global default, with BM25 as 
the per-field default...

The current (draft) guide for 6.0 (not yet released) has info on that...
https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements




-Hoss
http://www.lucidworks.com/


Re: Solr Support for BM25F

2016-04-14 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi David, 

I implemented bm25f for Europeana on Solr 4.x a couple of years ago,
you can find it here:

https://github.com/europeana/contrib/tree/master/bm25f-ranking

maybe I should contribute it back.. 
Please do not hesitate to contact me if you need help :) 
Cheers,
Diego

From: solr-user@lucene.apache.org At: Apr 14 2016 17:48:50
To: solr-user@lucene.apache.org
Subject: Re: Solr Support for BM25F

Hey David

You can configure BM25 differently for each field by configuring the
similarity per field type, as shown here in this example from the Solr tests

https://github.com/sudarshang/lucene-solr/blob/master/solr/core/src/test-files/solr/conf/schema-bm25.xml#L32

On Thu, Apr 14, 2016 at 12:41 PM, David Cawley 
wrote:

> Hello,
> I am developing an enterprise search engine for a project and I was hoping
> to implement BM25F ranking algorithm to configure the tuning parameters on
> a per field basis. I understand BM25 similarity is now supported in Solr
> but I was hoping to be able to configure k1 and b for different fields such
> as title, description, anchor etc, as they are structured documents.
> I am fairly new to Solr so any help would be appreciated. If this is
> possible or any steps as to how I can go about implementing this it would
> be greatly appreciated.
>
> Regards,
>
> David
>
> Current Solr Version 5.4.1
>






RE: Multiple data-config.xml in one collection?

2016-04-14 Thread Jay Parashar
Thanks a lot Daniel.


-Original Message-
From: Davis, Daniel (NIH/NLM) [C] [mailto:daniel.da...@nih.gov] 
Sent: Thursday, April 14, 2016 11:41 AM
To: solr-user@lucene.apache.org
Subject: RE: Multiple data-config.xml in one collection?

Jay Parashar wrote:
> One thing, and please correct if wrong, I have noticed running 
> DataImport for a particular config overwrites the existing data  for a 
> document...that is, there is no way to preserve the existing data.
> 
> For example if you have a schema of 5 fields and running the 
> health-topics-conf.xml DIH  loads 3 of those fields of a document 
> (id=XYZ) And then running the encyclopedia-conf.xml DIH will overwrite those 
> 3 fields for the same  document id = XYZ.

Not quite so.  You're right that each RequestHandler has a *default* data
config, specified in solrconfig.xml.  As with most things in Solr, this can
be overridden, but keeping one config per handler is still a good practice.
You are right that if one DataImport imports the same ID as another, it will
overwrite the older copy completely.  However, you can control the overlap
so that indexing is independent even into the same collection.

Suppose you have two configured request handlers:

/dataimport/healthtopics - this uses health-topics-conf.xml
/dataimport/encyclopedia - this uses encyclopedia-conf.xml

These two files can load *completely separate records* with different ids, and 
they can 
have different delete queries configured.   An excerpt from my 
health-topics-conf.xml:

[XML excerpt stripped by the list archiver]

Hope this helps,

Dan Davis, Systems/Applications Architect (Contractor), Office of Computer and 
Communications Systems, National Library of Medicine, NIH



-Original Message-
From: Jay Parashar [mailto:bparas...@slb.com]
Sent: Thursday, April 14, 2016 11:43 AM
To: solr-user@lucene.apache.org
Subject: RE: Multiple data-config.xml in one collection?

You have to specify which one to run. Each DIH will run only one XML (e.g. 
health-topics-conf.xml)


-Original Message-
From: Yangrui Guo [mailto:guoyang...@gmail.com]
Sent: Tuesday, April 05, 2016 2:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Multiple data-config.xml in one collection?

Hi Daniel,

So if I implement multiple DataImportHandlers and do a full import, does Solr 
run the import for all handlers at once, or can I specify which handler to 
import? Thank you

Yangrui

On Tuesday, April 5, 2016, Davis, Daniel (NIH/NLM) [C] 
wrote:

> If Shawn is correct, and you are using DIH, then I have done this by 
> implementing multiple requestHandlers each of them using Data Import 
> Handler, and have each specify a different XML file for the data config.
> Instead of using data-config.xml, I've used a large number of files such as:
> health-topics-conf.xml
> encyclopedia-conf.xml
> ...
> I tend to index a single valued, required field named "source" that I 
> can use in the delete query, and I use the TemplateTranformer to make this 
> easy:
>
>  ...
>  <entity ... transformer="TemplateTransformer">
>    <field column="source" template="..."/>
>  ...
>
> Hope this helps,
>
> -Dan
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org ]
> Sent: Tuesday, April 05, 2016 10:50 AM
> To: solr-user@lucene.apache.org 
> Subject: Re: Multiple data-config.xml in one collection?
>
> On 4/5/2016 8:12 AM, Yangrui Guo wrote:
> > I'm using Solr Cloud to index a number of databases. The problem is 
> > there is unknown number of databases and each database has its own
> configuration.
> > If I create a single collection for every database the query would 
> > eventually become insanely long. Is it possible to upload different 
> > config to zookeeper for each node in a single collection?
>
> Every shard replica (core) in a collection shares the same 
> configuration, which it gets from zookeeper.  This is one of 
> SolrCloud's guarantees, to prevent problems found with old-style 
> sharding when the configuration is different on each machine.
>
> If you're using the dataimport handler, which you probably are since 
> you mentioned databases, you can parameterize pretty much everything 
> in the DIH config file so it comes from URL parameters on the 
> full-import or delta-import command.
>
> Below is a link to the DIH config that I'm using, redacted slightly.
> I'm not running SolrCloud, but the same thing should work in cloud.  
> It should give you some idea of how to use variables in your config, 
> set by parameters on the URL.
>
> http://apaste.info/jtq
>
> Thanks,
> Shawn
>
>


Re: Solr 5.5 timeout of solrj client

2016-04-14 Thread Novin Novin
Thanks, Erick, for pointing that out. You are right, I was optimizing every 10
minutes. I have changed this to once a day, at night.
On 14-Apr-2016 5:20 pm, "Erick Erickson"  wrote:

> don't issue an optimize command... either you have a solrj client that
> issues a client.optimize() command or you pressed the "optimize now"
> in the admin UI. Solr doesn't do this by itself.
>
> Best,
> Erick
>
> On Thu, Apr 14, 2016 at 8:30 AM, Novin Novin  wrote:
> > How can I stop happening "DirectUpdateHandler2 Starting optimize...
> Reading
> > and rewriting the entire index! Use with care"
> >
> > Thanks
> > novin
> >
> > On 14 April 2016 at 14:36, Shawn Heisey  wrote:
> >
> >> On 4/14/2016 7:23 AM, Novin Novin wrote:
> >> > Thanks for reply Shawn.
> >> >
> >> > Below is snippet of jetty.xml and jetty-https.xml
> >> >
> >> > jetty.xml:38: <Property name="solr.jetty.threads.idle.timeout" default="5000"/>
> >> > /// I presume this one I should increase, But I believe 5 second is
> >> enough
> >> > time for 250 docs to add to solr.
> >>
> >> 5 seconds might not be enough time.  The *add* probably completes in
> >> time, but the entire request might take longer, especially if you use
> >> commit=true with the request.  I would definitely NOT set this timeout
> >> so low -- requests that take longer than 5 seconds are very likely going
> >> to happen.
> >>
> >> > I'm also seeing "DirectUpdateHandler2 Starting optimize... Reading and
> >> > rewriting the entire index! Use with care". Would this be causing
> delay
> >> > response from solr?
> >>
> >> Exactly how long an optimize takes is dependent on the size of your
> >> index.  Rewriting an index that's a few hundred megabytes may take 30
> >> seconds to a minute.  Rewriting an index that's several gigabytes will
> >> take a few minutes.  Performance is typically lower during an optimize,
> >> because the CPU and disks are very busy.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>


RE: Solr Support for BM25F

2016-04-14 Thread Jay Parashar
To use per-field similarity you have to add <similarity class="solr.SchemaSimilarityFactory"/> to your schema.xml file.
Then, on the individual field types, you can use BM25 with different k1 and b values.
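
Concretely, the per-fieldType configuration looks roughly like the sketch below; the field type names, analyzers, and k1/b values are illustrative only:

```xml
<!-- schema.xml sketch: global SchemaSimilarityFactory enables
     per-fieldType similarity; each type then tunes BM25 itself.
     Type names and k1/b values are made up for illustration. -->
<similarity class="solr.SchemaSimilarityFactory"/>

<fieldType name="text_title" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <similarity class="solr.BM25SimilarityFactory">
    <float name="k1">1.2</float>
    <float name="b">0.9</float>
  </similarity>
</fieldType>

<fieldType name="text_body" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <similarity class="solr.BM25SimilarityFactory">
    <float name="k1">1.2</float>
    <float name="b">0.4</float>
  </similarity>
</fieldType>
```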

-Original Message-
From: David Cawley [mailto:david.cawl...@mail.dcu.ie] 
Sent: Thursday, April 14, 2016 11:42 AM
To: solr-user@lucene.apache.org
Subject: Solr Support for BM25F

Hello,
I am developing an enterprise search engine for a project, and I was hoping to 
implement the BM25F ranking algorithm so I can configure the tuning parameters 
on a per-field basis. I understand BM25 similarity is now supported in Solr, 
but I was hoping to be able to configure k1 and b for different fields such as 
title, description, anchor, etc., as these are structured documents.
I am fairly new to Solr, so any help would be appreciated. If this is possible, 
any steps on how to go about implementing it would be greatly appreciated.

Regards,

David

Current Solr Version 5.4.1


Re: Solr 5.5 timeout of solrj client

2016-04-14 Thread Erick Erickson
Unless you have somewhat unusual circumstances, I wouldn't optimize at
all, despite the name it really doesn't help all that much in _most_
cases.

If your percentage deleted docs doesn't exceed, say, 15-20% I wouldn't
bother. Most of what optimize does is reclaim resources from deleted
docs. This happens as part of general background merging anyway.

There have been some reports of 10-15% query performance gains after
optimizing, but I would measure on your system before expending the
resources to optimize.

Best,
Erick

On Thu, Apr 14, 2016 at 9:56 AM, Novin Novin  wrote:
> Thanks Erick,  for pointing out.  You are right.  I was optimizing every 10
> mins.  And I have change this to every day in night.
> On 14-Apr-2016 5:20 pm, "Erick Erickson"  wrote:
>
>> don't issue an optimize command... either you have a solrj client that
>> issues a client.optimize() command or you pressed the "optimize now"
>> in the admin UI. Solr doesn't do this by itself.
>>
>> Best,
>> Erick
>>
>> On Thu, Apr 14, 2016 at 8:30 AM, Novin Novin  wrote:
>> > How can I stop happening "DirectUpdateHandler2 Starting optimize...
>> Reading
>> > and rewriting the entire index! Use with care"
>> >
>> > Thanks
>> > novin
>> >
>> > On 14 April 2016 at 14:36, Shawn Heisey  wrote:
>> >
>> >> On 4/14/2016 7:23 AM, Novin Novin wrote:
>> >> > Thanks for reply Shawn.
>> >> >
>> >> > Below is snippet of jetty.xml and jetty-https.xml
>> >> >
>> >> > jetty.xml:38: <Property name="solr.jetty.threads.idle.timeout" default="5000"/>
>> >> > /// I presume this one I should increase, But I believe 5 second is
>> >> enough
>> >> > time for 250 docs to add to solr.
>> >>
>> >> 5 seconds might not be enough time.  The *add* probably completes in
>> >> time, but the entire request might take longer, especially if you use
>> >> commit=true with the request.  I would definitely NOT set this timeout
>> >> so low -- requests that take longer than 5 seconds are very likely going
>> >> to happen.
>> >>
>> >> > I'm also seeing "DirectUpdateHandler2 Starting optimize... Reading and
>> >> > rewriting the entire index! Use with care". Would this be causing
>> delay
>> >> > response from solr?
>> >>
>> >> Exactly how long an optimize takes is dependent on the size of your
>> >> index.  Rewriting an index that's a few hundred megabytes may take 30
>> >> seconds to a minute.  Rewriting an index that's several gigabytes will
>> >> take a few minutes.  Performance is typically lower during an optimize,
>> >> because the CPU and disks are very busy.
>> >>
>> >> Thanks,
>> >> Shawn
>> >>
>> >>
>>


Singular Plural Results Inconsistent - SOLR v3.6 and EnglishMinimalStemFilterFactor

2016-04-14 Thread Sara Woodmansee
Hello all,

I posted yesterday; however, I never received my own post, so I worried it did not 
go through (?) Also, I am not a coder, so apologies if it is not appropriate to post 
here. I honestly don't know where else to turn, and am determined to find a 
solution, as search is essential to our site.

We are having a website built with a search engine based on SOLR v3.6. For 
stemming, the developer uses EnglishMinimalStemFilterFactory. They were 
previously using PorterStemFilterFactory which worked better with plural forms, 
however PorterStemFilterFactory was not working correctly with –ing endings. 
“icing” becoming "ic", for example.

Most search terms work fine, but we have inconsistent results (singular vs 
plural) with terms that end in -ee, -oe, -ie, -ae,  and words that end in -s.  
In comparison, the following work fine: words that end with -oo, -ue, -e, -a.

The developers have been unable to find a solution ("Unfortunately we tried to 
apply all the filters for stemming but this problem is not resolved"), but this 
has to be a common issue (?) Someone surely has found a solution to this 
problem?? 

Any suggestions greatly appreciated.

Many thanks!
Sara 
_

DO NOT WORK:  Plural terms that end in -ee, -oe, -ie, -ae,  and words that end 
in -s.  

Examples: 

tree = 0 results
trees = 21 results

dungaree = 0 results
dungarees = 1 result

shoe = 0 results
shoes = 1 result

toe = 1 result
toes = 0 results

tie = 1 result
ties = 0 results

Cree = 0 results
Crees = 1 result

dais = 1 result
daises = 0 results

bias = 1 result
biases = 0 results

dress = 1 result
dresses = 0 results
_

WORKS:  Words that end with -oo, -ue, -e, -a

Examples: 

tide = 1 result
tides = 1 results

hue = 2 results
hues = 2 results

dakota = 1 result
dakotas = 1 result

loo = 1 result
loos = 1 result
_



Re: Solr 5.5 timeout of solrj client

2016-04-14 Thread Erick Erickson
BTW, where optimize does seem worthwhile is when the index isn't
updated very often. I've seen a pattern where the index is updated
once a night (or even less). In that situation, optimization makes
more sense. But when an index is continually updated, it's mostly
wasted effort.

Best,
Erick

On Thu, Apr 14, 2016 at 10:17 AM, Erick Erickson
 wrote:
> Unless you have somewhat unusual circumstances, I wouldn't optimize at
> all, despite the name it really doesn't help all that much in _most_
> cases.
>
> If your percentage deleted docs doesn't exceed, say, 15-20% I wouldn't
> bother. Most of what optimize does is reclaim resources from deleted
> docs. This happens as part of general background merging anyway.
>
> There have been some reports of 10-15% query performance after
> optimizing, but I would measure on your system before expending the
> resources optimizing.
>
> Best,
> Erick
>
> On Thu, Apr 14, 2016 at 9:56 AM, Novin Novin  wrote:
>> Thanks Erick,  for pointing out.  You are right.  I was optimizing every 10
>> mins.  And I have change this to every day in night.
>> On 14-Apr-2016 5:20 pm, "Erick Erickson"  wrote:
>>
>>> don't issue an optimize command... either you have a solrj client that
>>> issues a client.optimize() command or you pressed the "optimize now"
>>> in the admin UI. Solr doesn't do this by itself.
>>>
>>> Best,
>>> Erick
>>>
>>> On Thu, Apr 14, 2016 at 8:30 AM, Novin Novin  wrote:
>>> > How can I stop happening "DirectUpdateHandler2 Starting optimize...
>>> Reading
>>> > and rewriting the entire index! Use with care"
>>> >
>>> > Thanks
>>> > novin
>>> >
>>> > On 14 April 2016 at 14:36, Shawn Heisey  wrote:
>>> >
>>> >> On 4/14/2016 7:23 AM, Novin Novin wrote:
>>> >> > Thanks for reply Shawn.
>>> >> >
>>> >> > Below is snippet of jetty.xml and jetty-https.xml
>>> >> >
>>> >> > jetty.xml:38: <Property name="solr.jetty.threads.idle.timeout" default="5000"/>
>>> >> > /// I presume this one I should increase, But I believe 5 second is
>>> >> enough
>>> >> > time for 250 docs to add to solr.
>>> >>
>>> >> 5 seconds might not be enough time.  The *add* probably completes in
>>> >> time, but the entire request might take longer, especially if you use
>>> >> commit=true with the request.  I would definitely NOT set this timeout
>>> >> so low -- requests that take longer than 5 seconds are very likely going
>>> >> to happen.
>>> >>
>>> >> > I'm also seeing "DirectUpdateHandler2 Starting optimize... Reading and
>>> >> > rewriting the entire index! Use with care". Would this be causing
>>> delay
>>> >> > response from solr?
>>> >>
>>> >> Exactly how long an optimize takes is dependent on the size of your
>>> >> index.  Rewriting an index that's a few hundred megabytes may take 30
>>> >> seconds to a minute.  Rewriting an index that's several gigabytes will
>>> >> take a few minutes.  Performance is typically lower during an optimize,
>>> >> because the CPU and disks are very busy.
>>> >>
>>> >> Thanks,
>>> >> Shawn
>>> >>
>>> >>
>>>


DIH error - Bad Request

2016-04-14 Thread Brian Narsi
We have had Solr 5.1.0 running for several months, indexing about 10.5
million records with no issues and no errors or warnings in the logs. I
checked several times, and the number of records reported as processed by
DIH was exactly the number in the collection.

Recently I reviewed logs and found out the following:

DistributedUpdateProcessor
Error sending update to http://ip/solr
org.apache.solr.common.SolrException: Bad Request


StreamingSolrClients
error
org.apache.solr.common.SolrException: Bad Request

request:
http://ip:7574/solr/collection_shard2_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F1ip%3A8983%2Fsolr%2Fcollection_shard1_replica2%2F&wt=javabin&version=2
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


I also found out that although DIH reports having processed the full
10.5 million records, the actual number of records in the collection is
a couple of hundred less.

Any ideas/suggestions on what can be wrong?

Thanks


Re: How to search for a First, Last of contact which are stored in differnet multivalued fields

2016-04-14 Thread Thrinadh Kuppili
Thank you Eric

Will try it and let you know.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-search-for-a-First-Last-of-contact-which-are-stored-in-differnet-multivalued-fields-tp4269901p4270192.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Singular Plural Results Inconsistent - SOLR v3.6 and EnglishMinimalStemFilterFactor

2016-04-14 Thread Walter Underwood
Solr 3.6 is a VERY old release. You won’t see any fixes for that.

I would recommend starting with Solr 5.5 and keeping an eye on Solr 6.x, which 
has just started releases.

Removing -ing endings is pretty aggressive. That changes “tracking meeting” 
into “track meet”. Most of the time, you’ll be better off with an inflectional 
stemmer that just converts plurals to singulars and other similar changes.

The Porter stemmer does not produce dictionary words. It produces “stems”. 
Those are the same for the singular and plural forms of a word, but the stem 
might not be a word.

1. Start using Solr 5.5. That automatically gets you four years of bug fixes 
and performance improvements.
2. Look at the options for language analysis in the current release of Solr: 
https://cwiki.apache.org/confluence/display/solr/Language+Analysis 

3. Learn the analysis tool in the Solr admin UI. That allows you to explore the 
behavior.
4. If you really need a high grade morphological analyzer, consider purchasing 
one from Basis Technology: http://www.rosette.com/solr/ 
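
As a concrete starting point for item 2, a gentler analysis chain might look like the sketch below. The field type name is invented, and whether EnglishMinimalStemFilterFactory or the dictionary-based KStemFilterFactory handles the failing -ee/-oe/-ie/-s terms better should be checked in the admin UI analysis screen:

```xml
<!-- Sketch of a lightly-stemmed English fieldType. Swap
     KStemFilterFactory for EnglishMinimalStemFilterFactory (which
     only folds regular plurals) and compare both in the analysis
     screen against the problem terms. -->
<fieldType name="text_en_light" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
```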


wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 14, 2016, at 10:17 AM, Sara Woodmansee  wrote:
> 
> Hello all,
> 
> I posted yesterday, however I never received my own post, so worried it did 
> not go through (?) Also, I am not a coder, so apologies if not appropriate to 
> post here. I honestly don't know where else to turn, and am determined to 
> find a solution, as search is essential to our site.
> 
> We are having a website built with a search engine based on SOLR v3.6. For 
> stemming, the developer uses EnglishMinimalStemFilterFactory. They were 
> previously using PorterStemFilterFactory which worked better with plural 
> forms, however PorterStemFilterFactory was not working correctly with –ing 
> endings. “icing” becoming "ic", for example.
> 
> Most search terms work fine, but we have inconsistent results (singular vs 
> plural) with terms that end in -ee, -oe, -ie, -ae,  and words that end in -s. 
>  In comparison, the following work fine: words that end with -oo, -ue, -e, -a.
> 
> The developers have been unable to find a solution ("Unfortunately we tried 
> to apply all the filters for stemming but this problem is not resolved"), but 
> this has to be a common issue (?) Someone surely has found a solution to this 
> problem?? 
> 
> Any suggestions greatly appreciated.
> 
> Many thanks!
> Sara 
> _
> 
> DO NOT WORK:  Plural terms that end in -ee, -oe, -ie, -ae,  and words that 
> end in -s.  
> 
> Examples: 
> 
> tree = 0 results
> trees = 21 results
> 
> dungaree = 0 results
> dungarees = 1 result
> 
> shoe = 0 results
> shoes = 1 result
> 
> toe = 1 result
> toes = 0 results
> 
> tie = 1 result
> ties = 0 results
> 
> Cree = 0 results
> Crees = 1 result
> 
> dais = 1 result
> daises = 0 results
> 
> bias = 1 result
> biases = 0 results
> 
> dress = 1 result
> dresses = 0 results
> _
> 
> WORKS:  Words that end with -oo, -ue, -e, -a
> 
> Examples: 
> 
> tide = 1 result
> tides = 1 results
> 
> hue = 2 results
> hues = 2 results
> 
> dakota = 1 result
> dakotas = 1 result
> 
> loo = 1 result
> loos = 1 result
> _
> 



SolrTestCaseJ4 errors with SOLR 4.9 (works with SOLR 4.8.1)

2016-04-14 Thread vsrikanthp
Hi,

I'm trying to upgrade from SOLR 4.8.1 to SOLR 4.9. Some of our test cases
(using SolrTestCaseJ4 framework) which work with 4.8.1 are failing when I
try to run them with SOLR 4.9.
I'm trying to figure out (and fix) the test cases. 

We are using maven surefire plugin with JUNIT to run the tests.

Error:
java.lang.IllegalAccessError: class
org.apache.lucene.codecs.diskdv.DiskDocValuesFormat$1 cannot access its
superclass org.apache.lucene.codecs.lucene45.Lucene45DocValuesConsumer
at __randomizedtesting.SeedInfo.seed([3C587EB0DE791716]:0)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2585)
at java.lang.Class.getConstructor0(Class.java:2885)
at java.lang.Class.newInstance(Class.java:350)
at
org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:67)
at
org.apache.lucene.codecs.DocValuesFormat.reloadDocValuesFormats(DocValuesFormat.java:121)
at
org.apache.solr.core.SolrResourceLoader.reloadLuceneSPI(SolrResourceLoader.java:205)
at org.apache.solr.core.SolrConfig.initLibs(SolrConfig.java:587)
at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:162)
at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:139)
at
org.apache.solr.util.TestHarness.createConfig(TestHarness.java:74)
at
org.apache.solr.SolrTestCaseJ4.createCore(SolrTestCaseJ4.java:553)
at org.apache.solr.SolrTestCaseJ4.initCore(SolrTestCaseJ4.java:546)
at org.apache.solr.SolrTestCaseJ4.initCore(SolrTestCaseJ4.java:366)


From the SOLR 4.9 Release Notes, in "Upgrading from Solr 4.8", I see:
Support for DiskDocValuesFormat (ie: fieldTypes configured with
docValuesFormat="Disk") has been removed due to poor performance.

As a result of that, the lucene-codecs-4.9.0.jar doesnt have
"DiskDocValuesFormat$1" class anymore, which somewhat makes sense.

Question is: What should I do to get rid of this error? since it seems to be
coming from inside of SolrTestCaseJ4. Our code base doesn't use
DiskDocValuesFormat for any fieldType.

From my observations, SolrTestCaseJ4 seems to be passing different codecs
(randomly) to the test cases but regardless of the codec chosen, the error
is always the same mentioned above.

I found the following on the lucene forum that mentions same error that I'm
getting but in a different context. No solution is mentioned, however.

http://lucene.472066.n3.nabble.com/Have-anyone-used-Automatic-Phrase-Tokenization-AutoPhrasingTokenFilterFactory-td4173808i20.html#a4195182

Please advise as to what I'm doing wrong.

Thanks,
Sri




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrTestCaseJ4-errors-with-SOLR-4-9-works-with-SOLR-4-8-1-tp4270200.html
Sent from the Solr - User mailing list archive at Nabble.com.


Referencing incoming search terms in searchHandler XML

2016-04-14 Thread John Bickerstaff
I have the following (essentially hard-coded) line in the Solr Admin Query
UI

=
bq: contentType:(searchTerm1 searchTerm2 searchTerm3)^1000
=

The "searchTerm" entries represent whatever the user typed into the search
box.  This can be one or more words.  Usually less than 5.

I want to put the search parameters I've built in the Admin UI into a
requestHandler.

I think that means I need a line like this in the searchHandler in
solrconfig.xml

=
<str name="bq">contentType:(magic_reference_to_incoming_search)^1000</str>
=

Am I oversimplifying?

How can I accurately reference the incoming search terms as a "variable" or
parameter in the requestHandler XML?

Is it as simple as $q?  Something more complex?

Is there any choice besides the somewhat arcane local params?  If not, what
is the simplest, most straightforward way to reference incoming query terms
using local params?

Thanks...


Growing memory?

2016-04-14 Thread Betsey Benagh
X-posted from stack overflow...

I'm running solr 6.0.0 in server mode. I have one core. I loaded about 2000 
documents in, and it was using about 54 MB of memory. No problem. Nobody was 
issuing queries or doing anything else, but over the course of about 4 hours, 
the memory usage had tripled to 152 MB. I shut solr down and restarted it, and 
saw the memory usage back at 54 MB. Again, with no queries or anything being 
executed against the core, the memory usage is creeping up - after 17 minutes, 
it was up to 60 MB. I've looked at the documentation for how to limit memory 
usage, but I want to understand why it's creeping up when nothing is happening, 
lest it run out of memory when I limit the usage. The machine is running CentOS 
6.6, if that matters, with Java 1.8.0_65.

Thanks!



Re: Referencing incoming search terms in searchHandler XML

2016-04-14 Thread John Bickerstaff
Maybe I'm overdoing it...

It seems to me that qf= text contentType^1000 would do this for me more
easily - as it appears to assume the incoming search terms...

However, I'd still like to know the simplest way to reference the search
terms in the XML - or possibly get a URL that points the way.

Thanks.

On Thu, Apr 14, 2016 at 12:34 PM, John Bickerstaff  wrote:

> I have the following (essentially hard-coded) line in the Solr Admin Query
> UI
>
> =
> bq: contentType:(searchTerm1 searchTerm2 searchTerm2)^1000
> =
>
> The "searchTerm" entries represent whatever the user typed into the search
> box.  This can be one or more words.  Usually less than 5.
>
> I want to put the search parameters I've built in the Admin UI into a
> requestHandler.
>
> I think that means I need a like like this in the searchHandler in
> solrconfig.xml
>
> =
> contentType:(magic_reference_to_incoming_search)^1000
> -->
> =
>
> Am I oversimplifying?
>
> How can I accurately reference the incoming search terms as a "variable"
> or parameter in the requestHandler XML?
>
> Is it as simple as $q?  Something more complex?
>
> Is there any choice besides the somewhat arcane local params?  If not,
> what is the simplest, most straightforward way to reference incoming query
> terms using local params?
>
> Thanks...
>


Re: Growing memory?

2016-04-14 Thread Erick Erickson
well, things _are_ running, specifically the communications channels
are looking for incoming messages and the like, generating garbage
etc.

Try attaching jconsole to the process and hitting the GC button to
force a garbage collection. As long as your memory gets to some level
and drops back to that level after forcing GCs, you'll be fine.

Best,
Erick

On Thu, Apr 14, 2016 at 11:45 AM, Betsey Benagh
 wrote:
> X-posted from stack overflow...
>
> I'm running solr 6.0.0 in server mode. I have one core. I loaded about 2000 
> documents in, and it was using about 54 MB of memory. No problem. Nobody was 
> issuing queries or doing anything else, but over the course of about 4 hours, 
> the memory usage had tripled to 152 MB. I shut solr down and restarted it, 
> and saw the memory usage back at 54 MB. Again, with no queries or anything 
> being executed against the core, the memory usage is creeping up - after 17 
> minutes, it was up to 60 MB. I've looked at the documentation for how to 
> limit memory usage, but I want to understand why it's creeping up when 
> nothing is happening, lest it run out of memory when I limit the usage. The 
> machine is running CentOS 6.6, if that matters, with Java 1.8.0_65.
>
> Thanks!
>


Re: Singular Plural Results Inconsistent - SOLR v3.6 and EnglishMinimalStemFilterFactor

2016-04-14 Thread Jack Krupansky
Yes, this is the intended behavior. All of the Solr stemmers are based on
heuristics that are not perfect, and are not based on the real dictionary.
You can solve one problem by switching to another stemmer, but then you run
into a different problem, rinse and repeat.

The code has a specific rule that refrains from stemming a pattern that
also happens to match your specified cases:

    if (s[len-3] == 'i' || s[len-3] == 'a' || s[len-3] == 'o' ||
        s[len-3] == 'e')
      return len;

See:
https://github.com/apache/lucene-solr/blob/branch_3x/lucene/contrib/analyzers/common/src/java/org/apache/lucene/analysis/en/EnglishMinimalStemmer.java

So, xxxies, xxxaes, xxxoes, and xxxees will all remain unstemmed. Exactly
what the rationale for that rule was is unspecified in the code - no
comments, other than to point to this research document:
https://www.researchgate.net/publication/220433848_How_effective_is_suffixing
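The rule quoted above can be exercised outside Solr. The sketch below is a plain-Java re-reading of the minimal plural stemmer ("s-stemmer") logic; the class and method names are mine, and it is an illustration, not the actual Lucene class:

```java
public class MinimalStemDemo {
    // A plain-Java re-reading of the minimal English plural stemmer rule
    // quoted above. Illustrative only; not the Lucene EnglishMinimalStemmer.
    static String stem(String w) {
        int len = w.length();
        if (len < 3 || w.charAt(len - 1) != 's') {
            return w;                                // no trailing s: nothing to do
        }
        switch (w.charAt(len - 2)) {
            case 'u':
            case 's':
                return w;                            // -us / -ss: leave alone
            case 'e':
                if (len > 3 && w.charAt(len - 3) == 'i'
                        && w.charAt(len - 4) != 'a' && w.charAt(len - 4) != 'e') {
                    return w.substring(0, len - 3) + "y";   // ponies -> pony
                }
                char c = w.charAt(len - 3);
                if (c == 'i' || c == 'a' || c == 'o' || c == 'e') {
                    return w;                        // the quoted rule: -ies/-aes/-oes/-ees stay
                }
                return w.substring(0, len - 1);      // plain -es: drop the s
            default:
                return w.substring(0, len - 1);      // plain -s: drop the s
        }
    }

    public static void main(String[] args) {
        System.out.println(stem("trees"));   // trees  (unstemmed, so a "tree" query misses)
        System.out.println(stem("shoes"));   // shoes  (unstemmed)
        System.out.println(stem("tides"));   // tide   (stemmed, so singular matches)
        System.out.println(stem("dakotas")); // dakota (stemmed)
    }
}
```

Running it against the word lists in Sara's message reproduces the reported split: the -ee/-oe/-ie/-ae plurals all come back unchanged, while -de/-ue/-a/-oo plurals lose their s.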



-- Jack Krupansky

On Thu, Apr 14, 2016 at 1:17 PM, Sara Woodmansee  wrote:

> Hello all,
>
> I posted yesterday, however I never received my own post, so worried it
> did not go through (?) Also, I am not a coder, so apologies if not
> appropriate to post here. I honestly don't know where else to turn, and am
> determined to find a solution, as search is essential to our site.
>
> We are having a website built with a search engine based on SOLR v3.6. For
> stemming, the developer uses EnglishMinimalStemFilterFactory. They were
> previously using PorterStemFilterFactory which worked better with plural
> forms, however PorterStemFilterFactory was not working correctly with –ing
> endings. “icing” becoming "ic", for example.
>
> Most search terms work fine, but we have inconsistent results (singular vs
> plural) with terms that end in -ee, -oe, -ie, -ae,  and words that end in
> -s.  In comparison, the following work fine: words that end with -oo, -ue,
> -e, -a.
>
> The developers have been unable to find a solution ("Unfortunately we
> tried to apply all the filters for stemming but this problem is not
> resolved"), but this has to be a common issue (?) Someone surely has found
> a solution to this problem??
>
> Any suggestions greatly appreciated.
>
> Many thanks!
> Sara
> _
>
> DO NOT WORK:  Plural terms that end in -ee, -oe, -ie, -ae,  and words that
> end in -s.
>
> Examples:
>
> tree = 0 results
> trees = 21 results
>
> dungaree = 0 results
> dungarees = 1 result
>
> shoe = 0 results
> shoes = 1 result
>
> toe = 1 result
> toes = 0 results
>
> tie = 1 result
> ties = 0 results
>
> Cree = 0 results
> Crees = 1 result
>
> dais = 1 result
> daises = 0 results
>
> bias = 1 result
> biases = 0 results
>
> dress = 1 result
> dresses = 0 results
> _
>
> WORKS:  Words that end with -oo, -ue, -e, -a
>
> Examples:
>
> tide = 1 result
> tides = 1 results
>
> hue = 2 results
> hues = 2 results
>
> dakota = 1 result
> dakotas = 1 result
>
> loo = 1 result
> loos = 1 result
> _
>
>


Re: Growing memory?

2016-04-14 Thread Betsey Benagh
Thanks for the quick response.  Forgive the naïve question, but shouldn't
it be doing garbage collection automatically? Having to manually force GC
via jconsole isn't a sustainable solution.

Thanks again,
betsey

On 4/14/16, 2:54 PM, "Erick Erickson"  wrote:

>well, things _are_ running, specifically the communications channels
>are looking for incoming messages and the like, generating garbage
>etc.
>
>Try attaching jconsole to the process and hitting the GC button to
>force a garbage collection. As long as your memory gets to some level
>and drops back to that level after forcing GCs, you'll be fine.
>
>Best,
>Erick
>
>On Thu, Apr 14, 2016 at 11:45 AM, Betsey Benagh
> wrote:
>> X-posted from stack overflow...
>>
>> I'm running solr 6.0.0 in server mode. I have one core. I loaded about
>>2000 documents in, and it was using about 54 MB of memory. No problem.
>>Nobody was issuing queries or doing anything else, but over the course
>>of about 4 hours, the memory usage had tripled to 152 MB. I shut solr
>>down and restarted it, and saw the memory usage back at 54 MB. Again,
>>with no queries or anything being executed against the core, the memory
>>usage is creeping up - after 17 minutes, it was up to 60 MB. I've looked
>>at the documentation for how to limit memory usage, but I want to
>>understand why it's creeping up when nothing is happening, lest it run
>>out of memory when I limit the usage. The machine is running CentOS 6.6,
>>if that matters, with Java 1.8.0_65.
>>
>> Thanks!
>>



Re: Referencing incoming search terms in searchHandler XML

2016-04-14 Thread Erick Erickson
You really don't do that in solrconfig.xml.

This seems like an XY problem. You're trying
to solve some particular use-case and accessing the
terms in solrconfig.xml. You've already found the ability
to configure edismax as your defType and apply boosts
to particular fields...

Best,
Erick

On Thu, Apr 14, 2016 at 11:53 AM, John Bickerstaff
 wrote:
> Maybe I'm overdoing it...
>
> It seems to me that qf= text contentType^1000 would do this for me more
> easily - as it appears to assume the incoming search terms...
>
> However, I'd still like to know the simplest way to reference the search
> terms in the XML - or possibly get a URL that points the way.
>
> Thanks.
>
> On Thu, Apr 14, 2016 at 12:34 PM, John Bickerstaff > wrote:
>
>> I have the following (essentially hard-coded) line in the Solr Admin Query
>> UI
>>
>> =
>> bq: contentType:(searchTerm1 searchTerm2 searchTerm2)^1000
>> =
>>
>> The "searchTerm" entries represent whatever the user typed into the search
>> box.  This can be one or more words.  Usually less than 5.
>>
>> I want to put the search parameters I've built in the Admin UI into a
>> requestHandler.
>>
>> I think that means I need a like like this in the searchHandler in
>> solrconfig.xml
>>
>> =
>> contentType:(magic_reference_to_incoming_search)^1000
>> -->
>> =
>>
>> Am I oversimplifying?
>>
>> How can I accurately reference the incoming search terms as a "variable"
>> or parameter in the requestHandler XML?
>>
>> Is it as simple as $q?  Something more complex?
>>
>> Is there any choice besides the somewhat arcane local params?  If not,
>> what is the simplest, most straightforward way to reference incoming query
>> terms using local params?
>>
>> Thanks...
>>


Re: Growing memory?

2016-04-14 Thread Shawn Heisey
On 4/14/2016 12:45 PM, Betsey Benagh wrote:
> I'm running solr 6.0.0 in server mode. I have one core. I loaded about 2000 
> documents in, and it was using about 54 MB of memory. No problem. Nobody was 
> issuing queries or doing anything else, but over the course of about 4 hours, 
> the memory usage had tripled to 152 MB. I shut solr down and restarted it, 
> and saw the memory usage back at 54 MB. Again, with no queries or anything 
> being executed against the core, the memory usage is creeping up - after 17 
> minutes, it was up to 60 MB. I've looked at the documentation for how to 
> limit memory usage, but I want to understand why it's creeping up when 
> nothing is happening, lest it run out of memory when I limit the usage. The 
> machine is running CentOS 6.6, if that matters, with Java 1.8.0_65.

When you start Solr 5.0 or later directly from the download or directly
after installing it with the service installer script (on *NIX
platforms), Solr starts with a 512MB Java heap.  You can change this if
you need to -- most Solr users do need to increase the heap size to a
few gigabytes.

Java uses a garbage collection memory model.  It's perfectly normal
during the operation of a Java program, even one that is not doing
anything you can see, for the memory utilization to rise up to the
configured heap size.  This is simply how things work in systems using a
garbage collection memory model.

Where exactly are you looking to find the memory utilization?  In the
admin UI, that number will go up over time, until one of the memory
pools gets full and Java does a garbage collection, and then it will
likely go down again.  From the operating system point of view, the
resident memory usage will increase up to a point (when the entire heap
has been allocated) and probably never go back down -- but it also
shouldn't go up either.

Thanks,
Shawn
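As a follow-up to the heap-size point: on Solr 5.x and later the start script accepts a memory option. A sketch (pick a value to suit the machine; check `bin/solr start -help` on your install):

```shell
# Start Solr with a 2 GB heap instead of the default 512 MB
bin/solr start -m 2g

# Or set SOLR_HEAP in bin/solr.in.sh (solr.in.cmd on Windows):
# SOLR_HEAP="2g"
```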



Re: Growing memory?

2016-04-14 Thread Erick Erickson
Yes, it will do GC automatically, but only after some threshold
has been reached. It doesn't collect as soon as something is
no longer referenced.

So you typically see a sawtooth pattern where memory increases
for a while, then drops back when a GC happens, then increases,
then drops back.

Problem is that you never quite know when the GC has kicked in and
whether it's collected everything from all the spaces or not. Usually
tools like jConsole collect everything that can be collected

I'm not recommending you manually force GCs as a regular thing,
just to answer whether memory is really creeping up when you're not
doing anything by being able to set your expectations.

NOTE: I expect the permanently-used memory to increase from a
fresh start for a while, but level out pretty soon.

Best,
Erick
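The sawtooth described above can be reproduced in a few lines of plain Java. This is a sketch under the assumption that System.gc() is honored, which stock JVMs normally do:

```java
public class GcSawtooth {
    // Allocates ~16 MB of short-lived objects, records used heap, then
    // requests a collection and measures again. Returns true if usage
    // dropped back (the downstroke of the sawtooth).
    static boolean sawtoothStep() {
        Runtime rt = Runtime.getRuntime();
        byte[][] junk = new byte[2048][];
        for (int j = 0; j < junk.length; j++) {
            junk[j] = new byte[8192];
        }
        long before = rt.totalMemory() - rt.freeMemory();
        junk = null;     // drop the only reference; the arrays are now garbage
        System.gc();     // a hint, but stock JVMs normally run a full GC here
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("used before GC: " + (before >> 20)
                + " MB, after: " + (after >> 20) + " MB");
        return after < before;
    }

    public static void main(String[] args) {
        System.out.println("memory dropped back after GC: " + sawtoothStep());
    }
}
```

The same rise-and-fall is what jconsole's heap chart shows for an idle Solr: internal housekeeping allocates garbage, the heap number climbs, and a collection brings it back down.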

On Thu, Apr 14, 2016 at 12:00 PM, Betsey Benagh
 wrote:
> Thanks for the quick response.  Forgive the naïve question, but shouldn't
> it be doing garbage collection automatically? Having to manually force GC
> via jconsole isn't a sustainable solution.
>
> Thanks again,
> betsey
>
> On 4/14/16, 2:54 PM, "Erick Erickson"  wrote:
>
>>well, things _are_ running, specifically the communications channels
>>are looking for incoming messages and the like, generating garbage
>>etc.
>>
>>Try attaching jconsole to the process and hitting the GC button to
>>force a garbage collection. As long as your memory gets to some level
>>and drops back to that level after forcing GCs, you'll be fine.
>>
>>Best,
>>Erick
>>
>>On Thu, Apr 14, 2016 at 11:45 AM, Betsey Benagh
>> wrote:
>>> X-posted from stack overflow...
>>>
>>> I'm running solr 6.0.0 in server mode. I have one core. I loaded about
>>>2000 documents in, and it was using about 54 MB of memory. No problem.
>>>Nobody was issuing queries or doing anything else, but over the course
>>>of about 4 hours, the memory usage had tripled to 152 MB. I shut solr
>>>down and restarted it, and saw the memory usage back at 54 MB. Again,
>>>with no queries or anything being executed against the core, the memory
>>>usage is creeping up - after 17 minutes, it was up to 60 MB. I've looked
>>>at the documentation for how to limit memory usage, but I want to
>>>understand why it's creeping up when nothing is happening, lest it run
>>>out of memory when I limit the usage. The machine is running CentOS 6.6,
>>>if that matters, with Java 1.8.0_65.
>>>
>>> Thanks!
>>>
>


Re: Singular Plural Results Inconsistent - SOLR v3.6 and EnglishMinimalStemFilterFactor

2016-04-14 Thread Sara Woodmansee
Hi Walter and Jack,

Many thanks for your feedback!

I have no idea why the developer is using such an old version, but hoping that 
your feedback and suggestions will give them a push in the right direction.

Is it a huge undertaking to upgrade from v3.6 to v5.5?? (I surely hope not.)

Thanks again,
Sara


On Apr 14, 2016, at 2:55 PM, Jack Krupansky  wrote:
> 
> Yes, this is the intended behavior. All of the Solr stemmers are based on
> heuristics that are not perfect, and are not based on the real dictionary.
> You can solve one problem by switching to another stemmer, but then you run
> into a different problem, rinse and repeat.
> 
> The code has a specific rule that refrains from stemming a pattern that
> also happens to match your specified cases:
> 
>if (s[len-3] == 'i' || s[len-3] == 'a' || s[len-3] == 'o' ||
> s[len-3] == 'e')
>  return len;
> 
> See:
> https://github.com/apache/lucene-solr/blob/branch_3x/lucene/contrib/analyzers/common/src/java/org/apache/lucene/analysis/en/EnglishMinimalStemmer.java
>  
> 
> 
> So, xxxies, xxxaes, xxxoes, and xxxees will all remain unstemmed. Exactly
> what the rationale for that rule was is unspecified in the code - no
> comments, other than to point to this research document:
> https://www.researchgate.net/publication/220433848_How_effective_is_suffixing 
> 
> 
> 
> -- Jack Krupansky

> 
> 
>> On Apr 14, 2016, at 1:44 PM, Walter Underwood  wrote:
>> 
>> Solr 3.6 is a VERY old release. You won’t see any fixes for that.
>> 
>> I would recommend starting with Solr 5.5 and keeping an eye on Solr 6.x, 
>> which has just started releases.
>> 
>> Removing -ing endings is pretty aggressive. That changes “tracking meeting” 
>> into “track meet”. Most of the time, you’ll be better off with an 
>> inflectional stemmer that just converts plurals to singulars and other 
>> similar changes.
>> 
>> The Porter stemmer does not produce dictionary words. It produces “stems”. 
>> Those are the same for the singular and plural forms of a word, but the stem 
>> might not be a word.
>> 
>> 1. Start using Solr 5.5. That automatically gets you four years of bug fixes 
>> and performance improvements.
>> 2. Look at the options for language analysis in the current release of Solr: 
>> https://cwiki.apache.org/confluence/display/solr/Language+Analysis 
>> 
>> 3. Learn the analysis tool in the Solr admin UI. That allows you to explore 
>> the behavior.
>> 4. If you really need a high grade morphological analyzer, consider 
>> purchasing one from Basis Technology: http://www.rosette.com/solr/ 
>> 
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Apr 14, 2016, at 10:17 AM, Sara Woodmansee  wrote:
>>> 
>>> Hello all,
>>> 
>>> I posted yesterday, however I never received my own post, so worried it did 
>>> not go through (?) Also, I am not a coder, so apologies if not appropriate 
>>> to post here. I honestly don't know where else to turn, and am determined 
>>> to find a solution, as search is essential to our site.
>>> 
>>> We are having a website built with a search engine based on SOLR v3.6. For 
>>> stemming, the developer uses EnglishMinimalStemFilterFactory. They were 
>>> previously using PorterStemFilterFactory which worked better with plural 
>>> forms, however PorterStemFilterFactory was not working correctly with –ing 
>>> endings. “icing” becoming "ic", for example.
>>> 
>>> Most search terms work fine, but we have inconsistent results (singular vs 
>>> plural) with terms that end in -ee, -oe, -ie, -ae,  and words that end in 
>>> -s.  In comparison, the following work fine: words that end with -oo, -ue, 
>>> -e, -a.
>>> 
>>> The developers have been unable to find a solution ("Unfortunately we tried 
>>> to apply all the filters for stemming but this problem is not resolved"), 
>>> but this has to be a common issue (?) Someone surely has found a solution 
>>> to this problem?? 
>>> 
>>> Any suggestions greatly appreciated.
>>> 
>>> Many thanks!
>>> Sara 
>>> _
>>> 
>>> DO NOT WORK:  Plural terms that end in -ee, -oe, -ie, -ae,  and words that 
>>> end in -s.  
>>> 
>>> Examples: 
>>> 
>>> tree = 0 results
>>> trees = 21 results
>>> 
>>> dungaree = 0 results
>>> dungarees = 1 result
>>> 
>>> shoe = 0 results
>>> shoes = 1 result
>>> 
>>> toe = 1 result
>>> toes = 0 results
>>> 
>>> tie = 1 result
>>> ties = 0 results
>>> 
>>> Cree = 0 results
>>> Crees = 1 result
>>> 
>>> dais = 1 result
>>> daises = 0 results
>>> 
>>> bias = 1 result
>>> biases = 0 results
>>> 
>>> dress = 1 result
>>> dresses = 0 results
>>> _
>>> 
>>> WORKS:  Words 

Re: Referencing incoming search terms in searchHandler XML

2016-04-14 Thread John Bickerstaff
OK - that's interesting.  Perhaps I'm thinking too much like a developer
and just want to be able to reach into context and grab anything any time I
want...  Thanks for the input...

=

To clarify, I want to boost the document's score if the user enters a term
found in the contentType field.

As an example, the term "figo" is one of a few that are stored in the
contentType field.  It's not a multivalued field - one entry per document.

If a user types in "foobarbaz figo" I want all documents with "figo" in the
contentType field boosted above every other document in the results.  The
order of docs can be determined by the other scores - my user's rule is
simply that any with "figo" in contentType should be appear above any which
do NOT have "figo" in that field.

I can't know when the users will type any of the "magic" contentType terms
into the search, so I think I have to run the search every time against the
contentType field.

So - that's my underlying use case - and as I say, I'm beginning to think
the edismax setting of qf= text contentType^1000 answers my need really
well -- and is easier.  A quick test looks like I'm getting the results I
expect...
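For the record, the qf approach described above can live in solrconfig.xml as request-handler defaults. A rough sketch (the handler name and field list are taken from this thread; verify against your own config):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Each query term is searched in both fields; a contentType hit is
         boosted heavily, so e.g. "foobarbaz figo" floats every doc whose
         contentType matches "figo" above the rest of the results. -->
    <str name="qf">text contentType^1000</str>
  </lst>
</requestHandler>
```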





On Thu, Apr 14, 2016 at 1:02 PM, Erick Erickson 
wrote:

> You really don't do that in solrconfig.xml.
>
> This seems like an XY problem. You're trying
> to solve some particular use-case and accessing the
> terms in solrconfig.xml. You've already found the ability
> to configure edismax as your defType and apply boosts
> to particular fields...
>
> Best,
> Erick
>
> On Thu, Apr 14, 2016 at 11:53 AM, John Bickerstaff
>  wrote:
> > Maybe I'm overdoing it...
> >
> > It seems to me that qf= text contentType^1000 would do this for me more
> > easily - as it appears to assume the incoming search terms...
> >
> > However, I'd still like to know the simplest way to reference the search
> > terms in the XML - or possibly get a URL that points the way.
> >
> > Thanks.
> >
> > On Thu, Apr 14, 2016 at 12:34 PM, John Bickerstaff <
> j...@johnbickerstaff.com
> >> wrote:
> >
> >> I have the following (essentially hard-coded) line in the Solr Admin
> Query
> >> UI
> >>
> >> =
> >> bq: contentType:(searchTerm1 searchTerm2 searchTerm2)^1000
> >> =
> >>
> >> The "searchTerm" entries represent whatever the user typed into the
> search
> >> box.  This can be one or more words.  Usually less than 5.
> >>
> >> I want to put the search parameters I've built in the Admin UI into a
> >> requestHandler.
> >>
> >> I think that means I need a like like this in the searchHandler in
> >> solrconfig.xml
> >>
> >> =
> >>  name="bq">contentType:(magic_reference_to_incoming_search)^1000
> >> -->
> >> =
> >>
> >> Am I oversimplifying?
> >>
> >> How can I accurately reference the incoming search terms as a "variable"
> >> or parameter in the requestHandler XML?
> >>
> >> Is it as simple as $q?  Something more complex?
> >>
> >> Is there any choice besides the somewhat arcane local params?  If not,
> >> what is the simplest, most straightforward way to reference incoming
> query
> >> terms using local params?
> >>
> >> Thanks...
> >>
>


UUID processor handling of empty string

2016-04-14 Thread Susmit Shukla
Hi,

I have configured solr schema to generate unique id for a collection using
UUIDUpdateProcessorFactory

I am seeing a peculiar behavior - if the unique 'id' field is explicitly
set as empty string in the SolrInputDocument, the document gets indexed
with UUID update processor generating the id.
However, sorting does not work if uuid was generated in this way. Also
cursor functionality that depends on unique id sort also does not work.
I guess the correct behavior would be to fail the indexing if user provides
an empty string for a uuid field.

The issues do not happen if I omit the id field from the SolrInputDocument .

SolrInputDocument

solrDoc.addField("id", "");

...

I am using schema similar to below-







id




  id





 
   
 uuid
   



Thanks,
Susmit
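The schema and solrconfig XML in the message above was stripped by the archive. A typical setup of the kind described looks roughly like this (field and chain names are assumptions based on the surrounding text):

```xml
<!-- schema.xml -->
<fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
<field name="id" type="uuid" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>

<!-- solrconfig.xml: generate an id only when the field is absent -->
<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```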


Re: Growing memory?

2016-04-14 Thread Betsey Benagh
bin/solr status shows the memory usage increasing, as does the admin ui.

I'm running this on a shared machine that is supporting several other
applications, so I can't be particularly greedy with memory usage.  Is
there anything out there that gives guidelines on what an appropriate
amount of heap is based on number of documents or whatever?  We're just
playing around with it right now, but it sounds like we may need a
different machine in order to load in all of the data we want to have
available.

Thanks,
betsey

On 4/14/16, 3:08 PM, "Shawn Heisey"  wrote:

>On 4/14/2016 12:45 PM, Betsey Benagh wrote:
>> I'm running solr 6.0.0 in server mode. I have one core. I loaded about
>>2000 documents in, and it was using about 54 MB of memory. No problem.
>>Nobody was issuing queries or doing anything else, but over the course
>>of about 4 hours, the memory usage had tripled to 152 MB. I shut solr
>>down and restarted it, and saw the memory usage back at 54 MB. Again,
>>with no queries or anything being executed against the core, the memory
>>usage is creeping up - after 17 minutes, it was up to 60 MB. I've looked
>>at the documentation for how to limit memory usage, but I want to
>>understand why it's creeping up when nothing is happening, lest it run
>>out of memory when I limit the usage. The machine is running CentOS 6.6,
>>if that matters, with Java 1.8.0_65.
>
>When you start Solr 5.0 or later directly from the download or directly
>after installing it with the service installer script (on *NIX
>platforms), Solr starts with a 512MB Java heap.  You can change this if
>you need to -- most Solr users do need to increase the heap size to a
>few gigabytes.
>
>Java uses a garbage collection memory model.  It's perfectly normal
>during the operation of a Java program, even one that is not doing
>anything you can see, for the memory utilization to rise up to the
>configured heap size.  This is simply how things work in systems using a
>garbage collection memory model.
>
>Where exactly are you looking to find the memory utilization?  In the
>admin UI, that number will go up over time, until one of the memory
>pools gets full and Java does a garbage collection, and then it will
>likely go down again.  From the operating system point of view, the
>resident memory usage will increase up to a point (when the entire heap
>has been allocated) and probably never go back down -- but it also
>shouldn't go up either.
>
>Thanks,
>Shawn
>



Re: Singular Plural Results Inconsistent - SOLR v3.6 and EnglishMinimalStemFilterFactor

2016-04-14 Thread Shawn Heisey
On 4/14/2016 11:17 AM, Sara Woodmansee wrote:
> I posted yesterday, however I never received my own post, so worried it did 
> not go through (?)

I *did* see your previous message, but couldn't immediately think of
anything constructive to say.  I've had a little bit of time on my lunch
break today to look deeper.

EnglishMinimalStemFilter is designed to *not* aggressively stem
everything it sees.  It appears that the behavior you are seeing is
probably intentional with that filter.

In 5.5.0 and 6.0.0, PorterStemFilter will handle words of the form you
mentioned correctly.  In the screenshot below, PSF means
"PorterStemFilter".  I did not check any earlier versions.  I already
had these versions on my system.

https://www.dropbox.com/s/ss48vinrtbgifce/stemmer-ee-es-6.0.0.png?dl=0

https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming

That version of Solr is over four years old.  Bugs in 3.x will *not* be
fixed.  Bugs in 4.x will also not be fixed.  On 5.x, only extremely
major bugs are likely to get any attention, and this does not qualify as
a major bug.



On another matter:

http://people.apache.org/~hossman/#threadhijack

You replied to a message with the subject "Solr Support for BM25F" ...
so your message is showing up within that thread.

https://www.dropbox.com/s/xi0o8z6smhd2n5d/woodmansee-thread-hijack.png?dl=0

Thanks,
Shawn
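[Archive editor's note: for reference, a field type using the more aggressive Porter stemmer discussed above might look like the following sketch — this is illustrative, not the poster's actual schema:]

```xml
<fieldType name="text_en_porter" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- lowercase before stemming; Porter assumes lowercase input -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```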



Re: UUID processor handling of empty string

2016-04-14 Thread Erick Erickson
What do you mean "doesn't work"? An empty string is
different than not being present. The UUID update
processor (I'm pretty sure) only adds a field if it
is _absent_. Specifying it as an empty string
fails that test so no value is added.

At that point, if this uuid field is also the <uniqueKey>,
then each doc that comes in with an empty field will replace
the others.

If it's _not_ the <uniqueKey>, the sorting will be confusing.
All the empty string fields are equal, so the tiebreaker is
the internal Lucene doc ID, which may change as merges
happen. You can specify secondary sort fields to make the
sort predictable (the <uniqueKey> field is popular for this).

Best,
Erick
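[Archive editor's note: one possible guard for the empty-string case — a sketch assuming the unique key field is named id — is to strip blank values before the UUID processor runs, using RemoveBlankFieldUpdateProcessorFactory:]

```xml
<updateRequestProcessorChain name="uuid-safe">
  <!-- drops zero-length string values, so the UUID processor sees the field as absent -->
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```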

On Thu, Apr 14, 2016 at 12:18 PM, Susmit Shukla  wrote:
> Hi,
>
> I have configured solr schema to generate unique id for a collection using
> UUIDUpdateProcessorFactory
>
> I am seeing a peculiar behavior - if the unique 'id' field is explicitly
> set as empty string in the SolrInputDocument, the document gets indexed
> with UUID update processor generating the id.
> However, sorting does not work if uuid was generated in this way. Also
> cursor functionality that depends on unique id sort also does not work.
> I guess the correct behavior would be to fail the indexing if user provides
> an empty string for a uuid field.
>
> The issues do not happen if I omit the id field from the SolrInputDocument .
>
> SolrInputDocument
>
> solrDoc.addField("id", "");
>
> ...
>
> I am using schema similar to below-
>
> <field name="id" type="uuid" indexed="true" stored="true" required="true"/>
>
> <fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
>
> <uniqueKey>id</uniqueKey>
>
> <updateRequestProcessorChain name="uuid">
>   <processor class="solr.UUIDUpdateProcessorFactory">
>     <str name="fieldName">id</str>
>   </processor>
>   <processor class="solr.RunUpdateProcessorFactory"/>
> </updateRequestProcessorChain>
>
> <requestHandler name="/update" class="solr.UpdateRequestHandler">
>   <lst name="defaults">
>     <str name="update.chain">uuid</str>
>   </lst>
> </requestHandler>
>
> Thanks,
> Susmit


Re: How to declare field type for IntPoint field in solr 6.0 schema?

2016-04-14 Thread rafis
Thank you, Shawn!

It can wait. There are other features in 6.0 I was waiting for. It is always
nice to have such improvements!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-declare-field-type-for-IntPoint-field-in-solr-6-0-schema-tp4270040p4270256.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Singular Plural Results Inconsistent - SOLR v3.6 and EnglishMinimalStemFilterFactory

2016-04-14 Thread Erick Erickson
re: upgrading to 5.x... 5X Solr's are NOT guaranteed to
read 3x indexes, you'd have to go through 4x to do that.

If you can re-index from scratch that would be best.

Best,
Erick

On Thu, Apr 14, 2016 at 12:29 PM, Shawn Heisey  wrote:
> On 4/14/2016 11:17 AM, Sara Woodmansee wrote:
>> I posted yesterday, however I never received my own post, so worried it did 
>> not go through (?)
>
> I *did* see your previous message, but couldn't immediately think of
> anything constructive to say.  I've had a little bit of time on my lunch
> break today to look deeper.
>
> EnglishMinimalStemFilter is designed to *not* aggressively stem
> everything it sees.  It appears that the behavior you are seeing is
> probably intentional with that filter.
>
> In 5.5.0 and 6.0.0, PorterStemFilter will handle words of the form you
> mentioned correctly.  In the screenshot below, PSF means
> "PorterStemFilter".  I did not check any earlier versions.  I already
> had these versions on my system.
>
> https://www.dropbox.com/s/ss48vinrtbgifce/stemmer-ee-es-6.0.0.png?dl=0
>
> https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming
>
> That version of Solr is over four years old.  Bugs in 3.x will *not* be
> fixed.  Bugs in 4.x will also not be fixed.  On 5.x, only extremely
> major bugs are likely to get any attention, and this does not qualify as
> a major bug.
>
> 
>
> On another matter:
>
> http://people.apache.org/~hossman/#threadhijack
>
> You replied to a message with the subject "Solr Support for BM25F" ...
> so your message is showing up within that thread.
>
> https://www.dropbox.com/s/xi0o8z6smhd2n5d/woodmansee-thread-hijack.png?dl=0
>
> Thanks,
> Shawn
>


Re: Growing memory?

2016-04-14 Thread Erick Erickson
In a word, "no", there are simply too many variables.
It's like asking "how much memory will a Java program
need?"

But Solr does like memory, both the Java heap and
the OS memory. Here's a long blog on how to scope
this out:

https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

On Thu, Apr 14, 2016 at 12:25 PM, Betsey Benagh
 wrote:
> bin/solr status shows the memory usage increasing, as does the admin ui.
>
> I'm running this on a shared machine that is supporting several other
> applications, so I can't be particularly greedy with memory usage.  Is
> there anything out there that gives guidelines on what an appropriate
> amount of heap is based on number of documents or whatever?  We're just
> playing around with it right now, but it sounds like we may need a
> different machine in order to load in all of the data we want to have
> available.
>
> Thanks,
> betsey
>
> On 4/14/16, 3:08 PM, "Shawn Heisey"  wrote:
>
>>On 4/14/2016 12:45 PM, Betsey Benagh wrote:
>>> I'm running solr 6.0.0 in server mode. I have one core. I loaded about
>>>2000 documents in, and it was using about 54 MB of memory. No problem.
>>>Nobody was issuing queries or doing anything else, but over the course
>>>of about 4 hours, the memory usage had tripled to 152 MB. I shut solr
>>>down and restarted it, and saw the memory usage back at 54 MB. Again,
>>>with no queries or anything being executed against the core, the memory
>>>usage is creeping up - after 17 minutes, it was up to 60 MB. I've looked
>>>at the documentation for how to limit memory usage, but I want to
>>>understand why it's creeping up when nothing is happening, lest it run
>>>out of memory when I limit the usage. The machine is running CentOS 6.6,
>>>if that matters, with Java 1.8.0_65.
>>
>>When you start Solr 5.0 or later directly from the download or directly
>>after installing it with the service installer script (on *NIX
>>platforms), Solr starts with a 512MB Java heap.  You can change this if
>>you need to -- most Solr users do need to increase the heap size to a
>>few gigabytes.
>>
>>Java uses a garbage collection memory model.  It's perfectly normal
>>during the operation of a Java program, even one that is not doing
>>anything you can see, for the memory utilization to rise up to the
>>configured heap size.  This is simply how things work in systems using a
>>garbage collection memory model.
>>
>>Where exactly are you looking to find the memory utilization?  In the
>>admin UI, that number will go up over time, until one of the memory
>>pools gets full and Java does a garbage collection, and then it will
>>likely go down again.  From the operating system point of view, the
>>resident memory usage will increase up to a point (when the entire heap
>>has been allocated) and probably never go back down -- but it also
>>shouldn't go up either.
>>
>>Thanks,
>>Shawn
>>
>
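[Archive editor's note: the saw-tooth heap behavior Shawn describes — "used" climbing until a collection, then dropping, while "committed" stays put — can be observed from any JVM with the standard management beans. A standalone sketch, unrelated to Solr's own code:]

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapWatch {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        // "used" is the number that creeps up between collections;
        // "committed" is what the OS sees as resident and rarely shrinks.
        System.out.println("used=" + heap.getUsed()
                + " committed=" + heap.getCommitted()
                + " max=" + heap.getMax());
        mem.gc();  // request a collection; "used" typically drops afterwards
        System.out.println("after gc: used=" + mem.getHeapMemoryUsage().getUsed());
    }
}
```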


Re: Referencing incoming search terms in searchHandler XML

2016-04-14 Thread Erick Erickson
Right, edismax is where I'd start. NOTE: there are about a zillion
options here so you may find yourself lost in a bit of a maze for
a while, but it's usually faster than coding it yourself ;).

In this case, take a look at the "bq" parameter to edismax and
make it something like bq=contentType:(original query text here)^1000

In short, it's likely that someone has had this problem before and
there's a solution, said solution may not be easy to find though ;(

And also note that boosting is not definitive. By that I mean that
boosting just influences the score it does _not_ explicitly order the
results. So the docs with "figo" in the conentType field will tend to
the top, but won't be absolutely guaranteed to be there.



Best,
Erick

On Thu, Apr 14, 2016 at 12:18 PM, John Bickerstaff
 wrote:
> OK - that's interesting.  Perhaps I'm thinking too much like a developer
> and just want to be able to reach into context and grab anything any time I
> want...  Thanks for the input...
>
> =
>
> To clarify, I want to boost the document's score if the user enters a term
> found in the contentType field.
>
> As an example, the term "figo" is one of a few that are stored in the
> contentType field.  It's not a multivalued field - one entry per document.
>
> If a user types in "foobarbaz figo" I want all documents with "figo" in the
> contentType field boosted above every other document in the results.  The
> order of docs can be determined by the other scores - my user's rule is
> simply that any with "figo" in contentType should be appear above any which
> do NOT have "figo" in that field.
>
> I can't know when the users will type any of the "magic" contentType terms
> into the search, so I think I have to run the search every time against the
> contentType field.
>
> So - that's my underlying use case - and as I say, I'm beginning to think
> the edismax setting of qf= text contentType^1000 answers my need really
> well -- and is easier.  A quick test looks like I'm getting the results I
> expect...
>
>
>
>
>
> On Thu, Apr 14, 2016 at 1:02 PM, Erick Erickson 
> wrote:
>
>> You really don't do that in solrconfig.xml.
>>
>> This seems like an XY problem. You're trying
>> to solve some particular use-case and accessing the
>> terms in solrconfig.xml. You've already found the ability
>> to configure edismax as your defType and apply boosts
>> to particular fields...
>>
>> Best,
>> Erick
>>
>> On Thu, Apr 14, 2016 at 11:53 AM, John Bickerstaff
>>  wrote:
>> > Maybe I'm overdoing it...
>> >
>> > It seems to me that qf= text contentType^1000 would do this for me more
>> > easily - as it appears to assume the incoming search terms...
>> >
>> > However, I'd still like to know the simplest way to reference the search
>> > terms in the XML - or possibly get a URL that points the way.
>> >
>> > Thanks.
>> >
>> > On Thu, Apr 14, 2016 at 12:34 PM, John Bickerstaff <
>> j...@johnbickerstaff.com
>> >> wrote:
>> >
>> >> I have the following (essentially hard-coded) line in the Solr Admin
>> Query
>> >> UI
>> >>
>> >> =
>> >> bq: contentType:(searchTerm1 searchTerm2 searchTerm3)^1000
>> >> =
>> >>
>> >> The "searchTerm" entries represent whatever the user typed into the
>> search
>> >> box.  This can be one or more words.  Usually less than 5.
>> >>
>> >> I want to put the search parameters I've built in the Admin UI into a
>> >> requestHandler.
>> >>
>> >> I think that means I need a line like this in the searchHandler in
>> >> solrconfig.xml
>> >>
>> >> =
>> >> <str name="bq">contentType:(magic_reference_to_incoming_search)^1000</str>
>> >> =
>> >>
>> >> Am I oversimplifying?
>> >>
>> >> How can I accurately reference the incoming search terms as a "variable"
>> >> or parameter in the requestHandler XML?
>> >>
>> >> Is it as simple as $q?  Something more complex?
>> >>
>> >> Is there any choice besides the somewhat arcane local params?  If not,
>> >> what is the simplest, most straightforward way to reference incoming
>> query
>> >> terms using local params?
>> >>
>> >> Thanks...
>> >>
>>


Re: Singular Plural Results Inconsistent - SOLR v3.6 and EnglishMinimalStemFilterFactory

2016-04-14 Thread Sara Woodmansee
Hi Shawn,

Thanks so much for the feedback. And for the heads-up regarding (the bad form of) 
starting a new discussion from an existing one. Thought removing all content 
wouldn’t track to original. (Sigh). This is what you get when you have 
photographers posting to high-end forums. 

Thanks Erick, regarding upgrading to v5.  We actually just removed all test 
data from the site, so we can now upload all the true, final files and 
metadata. In some ways this could be a perfect time to upgrade to v5 (if I can 
talk the developer into it) since all metadata has to be re-ingested anyway..

All best,
Sara


> On Apr 14, 2016, at 3:31 PM, Erick Erickson  wrote:
> 
> re: upgrading to 5.x... 5X Solr's are NOT guaranteed to
> read 3x indexes, you'd have to go through 4x to do that.
> 
> If you can re-index from scratch that would be best.
> 
> Best,
> Erick
> 
> 
>> On Apr 14, 2016, at 3:29 PM, Shawn Heisey  wrote:
>> 
>> On 4/14/2016 11:17 AM, Sara Woodmansee wrote:
>>> I posted yesterday, however I never received my own post, so worried it did 
>>> not go through (?)
>> 
>> I *did* see your previous message, but couldn't immediately think of
>> anything constructive to say.  I've had a little bit of time on my lunch
>> break today to look deeper.
>> 
>> EnglishMinimalStemFilter is designed to *not* aggressively stem
>> everything it sees.  It appears that the behavior you are seeing is
>> probably intentional with that filter.
>> 
>> In 5.5.0 and 6.0.0, PorterStemFilter will handle words of the form you
>> mentioned correctly.  In the screenshot below, PSF means
>> "PorterStemFilter".  I did not check any earlier versions.  I already
>> had these versions on my system.
>> 
>> https://www.dropbox.com/s/ss48vinrtbgifce/stemmer-ee-es-6.0.0.png?dl=0
>> 
>> https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming
>> 
>> That version of Solr is over four years old.  Bugs in 3.x will *not* be
>> fixed.  Bugs in 4.x will also not be fixed.  On 5.x, only extremely
>> major bugs are likely to get any attention, and this does not qualify as
>> a major bug.
>> 
>> 
>> 
>> On another matter:
>> 
>> http://people.apache.org/~hossman/#threadhijack
>> 
>> You replied to a message with the subject "Solr Support for BM25F" ...
>> so your message is showing up within that thread.
>> 
>> https://www.dropbox.com/s/xi0o8z6smhd2n5d/woodmansee-thread-hijack.png?dl=0
>> 
>> Thanks,
>> Shawn
>> 
> 


Re: Singular Plural Results Inconsistent - SOLR v3.6 and EnglishMinimalStemFilterFactory

2016-04-14 Thread Jack Krupansky
BTW, I did check and that stemmer code is the same today as it was in 3.x,
so there should be no change in stemmer behavior there.

-- Jack Krupansky

On Thu, Apr 14, 2016 at 3:47 PM, Sara Woodmansee  wrote:

> Hi Shawn,
>
> Thanks so much the feedback. And for the heads-up regarding (the bad form
> of) starting a new discussion from an existing one. Thought removing all
> content wouldn’t track to original. (Sigh). This is what you get when you
> have photographers posting to high-end forums.
>
> Thanks Erick, regarding upgrading to v5.  We actually just removed all
> test data from the site, so we can now upload all the true, final files and
> metadata. In some ways this could be a perfect time to upgrade to v5 (if I
> can talk the developer into it) since all metadata has to be re-ingested
> anyway..
>
> All best,
> Sara
>
>
> > On Apr 14, 2016, at 3:31 PM, Erick Erickson 
> wrote:
> >
> > re: upgrading to 5.x... 5X Solr's are NOT guaranteed to
> > read 3x indexes, you'd have to go through 4x to do that.
> >
> > If you can re-index from scratch that would be best.
> >
> > Best,
> > Erick
> >
> >
> >> On Apr 14, 2016, at 3:29 PM, Shawn Heisey  wrote:
> >>
> >> On 4/14/2016 11:17 AM, Sara Woodmansee wrote:
> >>> I posted yesterday, however I never received my own post, so worried
> it did not go through (?)
> >>
> >> I *did* see your previous message, but couldn't immediately think of
> >> anything constructive to say.  I've had a little bit of time on my lunch
> >> break today to look deeper.
> >>
> >> EnglishMinimalStemFilter is designed to *not* aggressively stem
> >> everything it sees.  It appears that the behavior you are seeing is
> >> probably intentional with that filter.
> >>
> >> In 5.5.0 and 6.0.0, PorterStemFilter will handle words of the form you
> >> mentioned correctly.  In the screenshot below, PSF means
> >> "PorterStemFilter".  I did not check any earlier versions.  I already
> >> had these versions on my system.
> >>
> >> https://www.dropbox.com/s/ss48vinrtbgifce/stemmer-ee-es-6.0.0.png?dl=0
> >>
> >> https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming
> >>
> >> That version of Solr is over four years old.  Bugs in 3.x will *not* be
> >> fixed.  Bugs in 4.x will also not be fixed.  On 5.x, only extremely
> >> major bugs are likely to get any attention, and this does not qualify as
> >> a major bug.
> >>
> >> 
> >>
> >> On another matter:
> >>
> >> http://people.apache.org/~hossman/#threadhijack
> >>
> >> You replied to a message with the subject "Solr Support for BM25F" ...
> >> so your message is showing up within that thread.
> >>
> >>
> https://www.dropbox.com/s/xi0o8z6smhd2n5d/woodmansee-thread-hijack.png?dl=0
> >>
> >> Thanks,
> >> Shawn
> >>
> >
>


Re: Solr 5.5 timeout of solrj client

2016-04-14 Thread Novin Novin
Thanks for the great advice Erick.

On 14 April 2016 at 18:18, Erick Erickson  wrote:

> BTW, the place optimize seems best used is when the index isn't
> updated very often. I've seen a pattern where the index is updated
> once a night (or even less). In that situation, optimization makes
> more sense. But when an index is continually updated, it's mostly
> wasted effort.
>
> Best,
> Erick
>
> On Thu, Apr 14, 2016 at 10:17 AM, Erick Erickson
>  wrote:
> > Unless you have somewhat unusual circumstances, I wouldn't optimize at
> > all, despite the name it really doesn't help all that much in _most_
> > cases.
> >
> > If your percentage deleted docs doesn't exceed, say, 15-20% I wouldn't
> > bother. Most of what optimize does is reclaim resources from deleted
> > docs. This happens as part of general background merging anyway.
> >
> > There have been some reports of 10-15% query performance after
> > optimizing, but I would measure on your system before expending the
> > resources optimizing.
> >
> > Best,
> > Erick
> >
> > On Thu, Apr 14, 2016 at 9:56 AM, Novin Novin 
> wrote:
> >> Thanks Erick,  for pointing out.  You are right.  I was optimizing
> every 10
> >> mins.  And I have change this to every day in night.
> >> On 14-Apr-2016 5:20 pm, "Erick Erickson" 
> wrote:
> >>
> >>> don't issue an optimize command... either you have a solrj client that
> >>> issues a client.optimize() command or you pressed the "optimize now"
> >>> in the admin UI. Solr doesn't do this by itself.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Thu, Apr 14, 2016 at 8:30 AM, Novin Novin 
> wrote:
> >>> > How can I stop happening "DirectUpdateHandler2 Starting optimize...
> >>> Reading
> >>> > and rewriting the entire index! Use with care"
> >>> >
> >>> > Thanks
> >>> > novin
> >>> >
> >>> > On 14 April 2016 at 14:36, Shawn Heisey  wrote:
> >>> >
> >>> >> On 4/14/2016 7:23 AM, Novin Novin wrote:
> >>> >> > Thanks for reply Shawn.
> >>> >> >
> >>> >> > Below is snippet of jetty.xml and jetty-https.xml
> >>> >> >
> >>> >> > jetty.xml:38: <Set name="idleTimeout"><Property name="solr.jetty.threads.idle.timeout" default="5000"/></Set>
> >>> >> > /// I presume this one I should increase, But I believe 5 second
> is
> >>> >> enough
> >>> >> > time for 250 docs to add to solr.
> >>> >>
> >>> >> 5 seconds might not be enough time.  The *add* probably completes in
> >>> >> time, but the entire request might take longer, especially if you
> use
> >>> >> commit=true with the request.  I would definitely NOT set this
> timeout
> >>> >> so low -- requests that take longer than 5 seconds are very likely
> going
> >>> >> to happen.
> >>> >>
> >>> >> > I'm also seeing "DirectUpdateHandler2 Starting optimize...
> Reading and
> >>> >> > rewriting the entire index! Use with care". Would this be causing
> >>> delay
> >>> >> > response from solr?
> >>> >>
> >>> >> Exactly how long an optimize takes is dependent on the size of your
> >>> >> index.  Rewriting an index that's a few hundred megabytes may take
> 30
> >>> >> seconds to a minute.  Rewriting an index that's several gigabytes
> will
> >>> >> take a few minutes.  Performance is typically lower during an
> optimize,
> >>> >> because the CPU and disks are very busy.
> >>> >>
> >>> >> Thanks,
> >>> >> Shawn
> >>> >>
> >>> >>
> >>>
>
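[Archive editor's note: the 15-20% deleted-docs threshold Erick mentions can be computed from the numDocs and maxDoc values on a core's statistics page. A small sketch with made-up figures, not from a real index:]

```java
public class DeletedDocs {
    // maxDoc counts live + deleted documents; numDocs counts live only,
    // so the difference is the space an optimize would reclaim.
    static double percentDeleted(long maxDoc, long numDocs) {
        return 100.0 * (maxDoc - numDocs) / maxDoc;
    }

    public static void main(String[] args) {
        // illustrative numbers: 1,000,000 maxDoc, 870,000 live docs
        System.out.printf("%.1f%% deleted%n", percentDeleted(1_000_000, 870_000));
        // prints: 13.0% deleted -> below the 15-20% rule of thumb, skip the optimize
    }
}
```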


Re: DIH with Nested Documents - Configuration Issue

2016-04-14 Thread Mikhail Khludnev
Giving child="true", Solr 5.5 creates a document block with implicit
relations between the parent and its nested children. These are later
retrievable only via
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers
Given the fact you run 4.10, I don't think you really need
child="true"
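[Archive editor's note: on 5.x, once the block is indexed, parent documents are retrieved through the block join parsers mentioned above — e.g. a query along these lines, where doc_type is a hypothetical field marking parent documents:]

```
q={!parent which="doc_type:client"}languagesSpoken_name:English
```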

On Thu, Apr 14, 2016 at 4:38 PM, Jeff Chastain  wrote:

> I am working on a project where the specification requires a parent -
> child relationship within the Solr data collection ... i.e. a user and the
> collection of languages they speak (each of which is made up of multiple
> data fields).  My production system is a 4.10 Solr implementation but I
> have a 5.5 implementation as my disposal as well.  Thus far, I am not
> getting this to work on either one and I have yet to find a complete
> documentation source on how to implement this.
>
> The goal is to get a resulting document from Solr that looks like this:
>
>{
>"id": 123,
>"firstName": "John",
>"lastName": "Doe",
>"languagesSpoken": [
>   {
>  "id": 243,
>  "abbreviation": "en",
>  "name": "English"
>   },
>   {
>  "id": 442,
>  "abbreviation": "fr",
>  "name": "French"
>   }
>]
> }
>
> In my schema.xml, I have flattened out all of the fields as follows:
>
>    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
>    <field name="firstName" type="string" indexed="true" stored="true" />
>    <field name="lastName" type="string" indexed="true" stored="true" />
>    <field name="languagesSpoken" type="string" indexed="true" stored="true" multiValued="true"/>
>    <field name="languagesSpoken_id" type="string" indexed="true" stored="true" />
>    <field name="languagesSpoken_abbreviation" type="text_general" indexed="true" stored="true" />
>    <field name="languagesSpoken_name" type="text_general" indexed="true" stored="true" />
>
> The latest rendition of my db-data-config.xml looks like this:
>
>    <dataConfig>
>       <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:" />
>       <document>
>          <entity name="client" pk="id" query="SELECT * FROM clients"
>                  deltaImportQuery="SELECT * FROM clients WHERE id = ${dih.delta.id}"
>                  deltaQuery="SELECT id FROM clients WHERE updateDate > '${dih.last_index_time}'">
>             <field name="id" column="id" />
>             <field name="firstName" column="firstName" />
>             <field name="lastName" column="lastName" />
>             <entity name="languagesSpoken" child="true" query="SELECT id, abbreviation, name FROM languages WHERE clientId = ${client.id}">
>                <field name="languagesSpoken_id" column="id" />
>                <field name="languagesSpoken_abbreviation" column="abbreviation" />
>                <field name="languagesSpoken_name" column="name" />
>             </entity>
>          </entity>
>          ...
>
> On the 4.10 server, when the data comes out of Solr, I get one flat
> document record with the fields for one language inline with the firstName
> and lastname like this:
>
>{
>"id": 123,
>"firstName": "John",
>"lastName": "Doe",
>"languagesSpoken_id": 243,
>"languagesSpoken_abbreviation ": "en",
>"languagesSpoken_name": "English"
> }
>
> On the 5.5 server, when the data comes out, I get separate documents for
> the root client document and the child language documents with no
> relationship between them like this:
>
>{
>"id": 123,
>"firstName": "John",
>"lastName": "Doe"
> },
> {
>"languagesSpoken_id": 243,
>"languagesSpoken_abbreviation": "en",
>"languagesSpoken_name": "English"
> },
> {
>"languagesSpoken_id": 442,
>"languagesSpoken_abbreviation": "fr",
>"languagesSpoken_name": "French"
> }
>
> I have spent several days now trying to figure out what is going on here
> to no avail.  Can anybody provide me with a pointer as to what I am missing
> here?
>
> Thanks,
> -- Jeff
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





HTTP Client Only

2016-04-14 Thread Robert Brown

Hi,

I have a collection with 2 shards, 1 replica each.

When I send updates, I currently /admin/ping each of the nodes, and then 
pick one at random.


I'm guessing it makes more sense to only send updates to one of the 
leaders, so I'm contemplating getting the collection status instead, and 
filter out the leaders.


Is there anything else I should be aware of, apart from using a Java 
client, etc.


I guess the ping becomes redundant?

Thanks,
Rob
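[Archive editor's note: one way to do the leader filtering Rob describes is to take the CLUSTERSTATUS replica map from the Collections API and keep only replicas flagged leader="true". A sketch over an illustrative hand-built map — a real client would fetch this over HTTP, and SolrJ's CloudSolrClient does this routing automatically if a Java client is an option:]

```java
import java.util.HashMap;
import java.util.Map;

public class LeaderPick {
    // Given per-shard replica info shaped like a CLUSTERSTATUS response,
    // return one leader URL per shard so updates go straight to leaders.
    static Map<String, String> leaderUrls(Map<String, Map<String, Map<String, String>>> shards) {
        Map<String, String> leaders = new HashMap<>();
        shards.forEach((shard, replicas) -> replicas.forEach((name, r) -> {
            if ("true".equals(r.get("leader"))) {
                leaders.put(shard, r.get("base_url") + "/" + r.get("core"));
            }
        }));
        return leaders;
    }

    public static void main(String[] args) {
        // illustrative sample data, not from a live cluster
        Map<String, Map<String, Map<String, String>>> shards = new HashMap<>();
        shards.put("shard1", Map.of(
            "core_node1", Map.of("base_url", "http://s1:8983/solr",
                                 "core", "coll_shard1_replica1",
                                 "leader", "true"),
            "core_node2", Map.of("base_url", "http://s2:8983/solr",
                                 "core", "coll_shard1_replica2")));
        System.out.println(leaderUrls(shards));
    }
}
```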





RE: Shard ranges seem incorrect

2016-04-14 Thread Markus Jelsma
Hi - bumping this issue. Any thoughts to share?

Thanks,
M

 
 
-Original message-
> From:Markus Jelsma 
> Sent: Tuesday 12th April 2016 13:49
> To: solr-user 
> Subject: Shard ranges seem incorrect
> 
> Hi - i've just created a 3 shard 3 replica collection on Solr 6.0.0 and we 
> noticed something odd, the hashing ranges don't make sense (full state.json 
> below):
> shard1 Range: 8000-d554
> shard2 Range: d555-2aa9
> shard3 Range: 2aaa-7fff
> 
> We've also noticed ranges not going from 0 to  for a 5.5 create 
> single shard collection. Another collection created on an older (unknown) 
> release has correct shard ranges. Any idea what's going on?
> Thanks,
> Markus
> 
> {"logs":{
> "replicationFactor":"3",
> "router":{"name":"compositeId"},
> "maxShardsPerNode":"9",
> "autoAddReplicas":"false",
> "shards":{
>   "shard1":{
> "range":"8000-d554",
> "state":"active",
> "replicas":{
>   "core_node3":{
> "core":"logs_shard1_replica3",
> "base_url":"http://127.0.1.1:8983/solr";,
> "node_name":"127.0.1.1:8983_solr",
> "state":"active"},
>   "core_node4":{
> "core":"logs_shard1_replica1",
> "base_url":"http://127.0.1.1:8983/solr";,
> "node_name":"127.0.1.1:8983_solr",
> "state":"active",
> "leader":"true"},
>   "core_node8":{
> "core":"logs_shard1_replica2",
> "base_url":"http://127.0.1.1:8983/solr";,
> "node_name":"127.0.1.1:8983_solr",
> "state":"active"}}},
>   "shard2":{
> "range":"d555-2aa9",
> "state":"active",
> "replicas":{
>   "core_node1":{
> "core":"logs_shard2_replica1",
> "base_url":"http://127.0.1.1:8983/solr";,
> "node_name":"127.0.1.1:8983_solr",
> "state":"active",
> "leader":"true"},
>   "core_node2":{
> "core":"logs_shard2_replica2",
> "base_url":"http://127.0.1.1:8983/solr";,
> "node_name":"127.0.1.1:8983_solr",
> "state":"active"},
>   "core_node9":{
> "core":"logs_shard2_replica3",
> "base_url":"http://127.0.1.1:8983/solr";,
> "node_name":"127.0.1.1:8983_solr",
> "state":"active"}}},
>   "shard3":{
> "range":"2aaa-7fff",
> "state":"active",
> "replicas":{
>   "core_node5":{
> "core":"logs_shard3_replica1",
> "base_url":"http://127.0.1.1:8983/solr";,
> "node_name":"127.0.1.1:8983_solr",
> "state":"active",
> "leader":"true"},
>   "core_node6":{
> "core":"logs_shard3_replica2",
> "base_url":"http://127.0.1.1:8983/solr";,
> "node_name":"127.0.1.1:8983_solr",
> "state":"active"},
>   "core_node7":{
> "core":"logs_shard3_replica3",
> "base_url":"http://127.0.1.1:8983/solr";,
> "node_name":"127.0.1.1:8983_solr",
> "state":"active"}}
> 
> 
> 
> 
> 


RE: Shard ranges seem incorrect

2016-04-14 Thread Chris Hostetter

: Hi - bumping this issue. Any thoughts to share?

Shawn's response to your email seemed spot-on accurate to me -- is there 
something about his answer that doesn't match up with what you're seeing? 
can you clarify/elaborate your concerns?

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201604.mbox/%3c570d0a03.5010...@elyograg.org%3E


 :  
: -Original message-
: > From:Markus Jelsma 
: > Sent: Tuesday 12th April 2016 13:49
: > To: solr-user 
: > Subject: Shard ranges seem incorrect
: > 
: > Hi - i've just created a 3 shard 3 replica collection on Solr 6.0.0 and we 
noticed something odd, the hashing ranges don't make sense (full state.json 
below):
: > shard1 Range: 8000-d554
: > shard2 Range: d555-2aa9
: > shard3 Range: 2aaa-7fff
: > 
: > We've also noticed ranges not going from 0 to  for a 5.5 create 
single shard collection. Another collection created on an older (unknown) 
release has correct shard ranges. Any idea what's going on?
: > Thanks,
: > Markus
: > 
: > {"logs":{
: > "replicationFactor":"3",
: > "router":{"name":"compositeId"},
: > "maxShardsPerNode":"9",
: > "autoAddReplicas":"false",
: > "shards":{
: >   "shard1":{
: > "range":"8000-d554",
: > "state":"active",
: > "replicas":{
: >   "core_node3":{
: > "core":"logs_shard1_replica3",
: > "base_url":"http://127.0.1.1:8983/solr";,
: > "node_name":"127.0.1.1:8983_solr",
: > "state":"active"},
: >   "core_node4":{
: > "core":"logs_shard1_replica1",
: > "base_url":"http://127.0.1.1:8983/solr";,
: > "node_name":"127.0.1.1:8983_solr",
: > "state":"active",
: > "leader":"true"},
: >   "core_node8":{
: > "core":"logs_shard1_replica2",
: > "base_url":"http://127.0.1.1:8983/solr";,
: > "node_name":"127.0.1.1:8983_solr",
: > "state":"active"}}},
: >   "shard2":{
: > "range":"d555-2aa9",
: > "state":"active",
: > "replicas":{
: >   "core_node1":{
: > "core":"logs_shard2_replica1",
: > "base_url":"http://127.0.1.1:8983/solr";,
: > "node_name":"127.0.1.1:8983_solr",
: > "state":"active",
: > "leader":"true"},
: >   "core_node2":{
: > "core":"logs_shard2_replica2",
: > "base_url":"http://127.0.1.1:8983/solr";,
: > "node_name":"127.0.1.1:8983_solr",
: > "state":"active"},
: >   "core_node9":{
: > "core":"logs_shard2_replica3",
: > "base_url":"http://127.0.1.1:8983/solr";,
: > "node_name":"127.0.1.1:8983_solr",
: > "state":"active"}}},
: >   "shard3":{
: > "range":"2aaa-7fff",
: > "state":"active",
: > "replicas":{
: >   "core_node5":{
: > "core":"logs_shard3_replica1",
: > "base_url":"http://127.0.1.1:8983/solr";,
: > "node_name":"127.0.1.1:8983_solr",
: > "state":"active",
: > "leader":"true"},
: >   "core_node6":{
: > "core":"logs_shard3_replica2",
: > "base_url":"http://127.0.1.1:8983/solr";,
: > "node_name":"127.0.1.1:8983_solr",
: > "state":"active"},
: >   "core_node7":{
: > "core":"logs_shard3_replica3",
: > "base_url":"http://127.0.1.1:8983/solr";,
: > "node_name":"127.0.1.1:8983_solr",
: > "state":"active"}}
: > 
: > 
: > 
: > 
: > 
: 

-Hoss
http://www.lucidworks.com/
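[Archive editor's note: the quoted ranges appear truncated by the archive (only the first four hex digits of each endpoint survive); the split itself is the normal one for a signed 32-bit hash space (0x80000000 through 0x7fffffff, not 0 through 0xffffffff), with boundaries rounded to 0x10000 multiples. A sketch that mirrors that arithmetic — not Solr's actual DocRouter code:]

```java
public class ShardRanges {
    // Splits the signed 32-bit hash space into N ranges, rounding interior
    // split points down to 0x10000 boundaries, as Solr's compositeId ranges do.
    static String[] ranges(int partitions) {
        long min = Integer.MIN_VALUE;               // 0x80000000
        long size = (long) Integer.MAX_VALUE - min; // 0xffffffff
        String[] out = new String[partitions];
        long start = min;
        for (int i = 0; i < partitions; i++) {
            long next = (i == partitions - 1)
                    ? (long) Integer.MAX_VALUE + 1
                    : (min + size * (i + 1) / partitions) & 0xFFFFFFFFFFFF0000L;
            out[i] = String.format("%08x-%08x", (int) start, (int) (next - 1));
            start = next;
        }
        return out;
    }

    public static void main(String[] args) {
        for (String r : ranges(3)) System.out.println(r);
        // prints: 80000000-d554ffff, d5550000-2aa9ffff, 2aaa0000-7fffffff
    }
}
```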


Re: Referencing incoming search terms in searchHandler XML

2016-04-14 Thread John Bickerstaff
Thanks - so this:

bq=contentType:(original query text here)^1000

is exactly what I want to do to every incoming query via an entry in a
custom requestHandler.  Thus my question about how to reference the
original query text in the requestHandler xml...

I believe that if I want to do that, I'm going to have to use the
simpleparams syntax, yes?

Ideally, it would be as simple as:

contentType:($q)^1000

... and the requestHandler would recognize $q as a magic variable that
holds the current search text (what came in on q on the URL)



But I'm guessing I can't access the query itself in the requestHandler
without being inside the brackets of a simpleParams, like this:

&bq={! . .  .  v=$q}^1000

However - as you pointed out, there are other alternatives...

I believe I've found an easier way - at least for this case - which is:

&qf=text contentType^1000

I think this ensures that the "standard" search on the catchall field of
"text" will happen with no boosting and an additional search on contentType
will occur with the boost listed.

Also - thanks for the caveat on the boosting function - I'll set
expectations with my users.  I can't limit my search to only docs with
contentType = figo -- or I could guarantee that they show up - so the boost
seems the best compromise -- unless I want to parse search results in code
- which I'd rather avoid whenever possible.
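[Archive editor's note: wired into solrconfig.xml, the qf approach John settles on might look roughly like this sketch — the handler name is illustrative; text and contentType are the fields from his description:]

```xml
<requestHandler name="/selectBoost" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- catch-all field unboosted, contentType matches boosted heavily -->
    <str name="qf">text contentType^1000</str>
  </lst>
</requestHandler>
```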

On Thu, Apr 14, 2016 at 1:41 PM, Erick Erickson 
wrote:

> Right, edismax is where I'd start. NOTE: there are about a zillion
> options here so you may find yourself lost in a bit of a maze for
> a while, but it's usually faster than coding it yourself ;).
>
> In this case, take a look at the "bq" parameter to edismax and
> make it something like bq=contentType:(original query text here)^1000
>
> In short, it's likely that someone has had this problem before and
> there's a solution, said solution may not be easy to find though ;(
>
> And also note that boosting is not definitive. By that I mean that
> boosting just influences the score it does _not_ explicitly order the
> results. So the docs with "figo" in the conentType field will tend to
> the top, but won't be absolutely guaranteed to be there.
>
>
>
> Best,
> Erick
>
> On Thu, Apr 14, 2016 at 12:18 PM, John Bickerstaff
>  wrote:
> > OK - that's interesting.  Perhaps I'm thinking too much like a developer
> > and just want to be able to reach into context and grab anything any
> time I
> > want...  Thanks for the input...
> >
> > =
> >
> > To clarify, I want to boost the document's score if the user enters a
> term
> > found in the contentType field.
> >
> > As an example, the term "figo" is one of a few that are stored in the
> > contentType field.  It's not a multivalued field - one entry per
> document.
> >
> > If a user types in "foobarbaz figo" I want all documents with "figo" in
> the
> > contentType field boosted above every other document in the results.  The
> > order of docs can be determined by the other scores - my user's rule is
> > simply that any with "figo" in contentType should be appear above any
> which
> > do NOT have "figo" in that field.
> >
> > I can't know when the users will type any of the "magic" contentType
> terms
> > into the search, so I think I have to run the search every time against
> the
> > contentType field.
> >
> > So - that's my underlying use case - and as I say, I'm beginning to think
> > the edismax setting of qf= text contentType^1000 answers my need really
> > well -- and is easier.  A quick test looks like I'm getting the results I
> > expect...
> >
> >
> >
> >
> >
> > On Thu, Apr 14, 2016 at 1:02 PM, Erick Erickson  >
> > wrote:
> >
> >> You really don't do that in solrconfig.xml.
> >>
> >> This seems like an XY problem. You're trying
> >> to solve some particular use-case and accessing the
> >> terms in solrconfig.xml. You've already found the ability
> >> to configure edismax as your defType and apply boosts
> >> to particular fields...
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Apr 14, 2016 at 11:53 AM, John Bickerstaff
> >>  wrote:
> >> > Maybe I'm overdoing it...
> >> >
> >> > It seems to me that qf= text contentType^1000 would do this for me
> more
> >> > easily - as it appears to assume the incoming search terms...
> >> >
> >> > However, I'd still like to know the simplest way to reference the
> search
> >> > terms in the XML - or possibly get a URL that points the way.
> >> >
> >> > Thanks.
> >> >
> >> > On Thu, Apr 14, 2016 at 12:34 PM, John Bickerstaff <
> >> j...@johnbickerstaff.com
> >> >> wrote:
> >> >
> >> >> I have the following (essentially hard-coded) line in the Solr Admin
> >> Query
> >> >> UI
> >> >>
> >> >> =
> >> >> bq: contentType:(searchTerm1 searchTerm2 searchTerm2)^1000
> >> >> =
> >> >>
> >> >> The "searchTerm" entries represent whatever the user typed into the
> >> search
> >> >> box.  This can be one or more words.  Usually less than 5.
> >> >>
> >> >> I want to put the search parameters I've 

Re: Singular Plural Results Inconsistent - SOLR v3.6 and EnglishMinimalStemFilterFactor

2016-04-14 Thread Sara Woodmansee
Thanks Jack.

So - if I understand (all email feedback thus far) correctly:  

— Upgrading to a newer version is vital (5.5 - 6.0)

— EnglishMinimalStemFilter:  upgrading to v5.5-6.0 will NOT help with stemming 
issues, as code has not been updated.

— PorterStemFilter:  Has been updated to work better with v5.5 - 6.0

— Or, perhaps we just need a stemmer that is more dictionary-based (Hunspell?), 
or inflectional (any suggestions?)

Thanks again all, for your patience and time!
Sara
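For reference, swapping stemmers is a change to the field type's analyzer chain in schema.xml, followed by a full re-index. A minimal sketch of a text field type using PorterStemFilter in place of EnglishMinimalStemFilter (the tokenizer and other filters here are illustrative defaults, not taken from your actual schema):

```xml
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- More aggressive than solr.EnglishMinimalStemFilterFactory;
         folds singular/plural forms together -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Since all metadata is being re-ingested anyway, this is a cheap time to experiment: paste a field type like this into the Analysis screen's field type dropdown workflow and compare stemmer output before committing to it.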

> On Apr 14, 2016, at 3:51 PM, Jack Krupansky  wrote:
> 
> BTW, I did check and that stemmer code is the same today as it was in 3.x, so 
> there should be no change in stemmer behavior there.
> 
> -- Jack Krupansky
> 
> On Thu, Apr 14, 2016 at 3:47 PM, Sara Woodmansee  wrote:
> 
>> Hi Shawn,
>> 
>> Thanks so much the feedback. And for the heads-up regarding (the bad form
>> of) starting a new discussion from an existing one. Thought removing all
>> content wouldn’t track to original. (Sigh). This is what you get when you
>> have photographers posting to high-end forums.
>> 
>> Thanks Erick, regarding upgrading to v5.  We actually just removed all
>> test data from the site, so we can now upload all the true, final files and
>> metadata. In some ways this could be a perfect time to upgrade to v5 (if I
>> can talk the developer into it) since all metadata has to be re-ingested
>> anyway..
>> 
>> All best,
>> Sara
>> 
>> 
>>> On Apr 14, 2016, at 3:31 PM, Erick Erickson 
>> wrote:
>>> 
>>> re: upgrading to 5.x... 5.x Solr is NOT guaranteed to read 3.x indexes, 
>>> you'd have to go through 4.x to do that.
>>> 
>>> If you can re-index from scratch that would be best.
>>> 
>>> Best,
>>> Erick
>>> 
>>> 
 On Apr 14, 2016, at 3:29 PM, Shawn Heisey  wrote:
 
 On 4/14/2016 11:17 AM, Sara Woodmansee wrote:
> I posted yesterday, however I never received my own post, so worried
>> it did not go through (?)
 
 I *did* see your previous message, but couldn't immediately think of
 anything constructive to say.  I've had a little bit of time on my lunch
 break today to look deeper.
 
 EnglishMinimalStemFilter is designed to *not* aggressively stem
 everything it sees.  It appears that the behavior you are seeing is
 probably intentional with that filter.
 
 In 5.5.0 and 6.0.0, PorterStemFilter will handle words of the form you
 mentioned correctly.  In the screenshot below, PSF means
 "PorterStemFilter".  I did not check any earlier versions.  I already
 had these versions on my system.
 
 https://www.dropbox.com/s/ss48vinrtbgifce/stemmer-ee-es-6.0.0.png?dl=0
 
 https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming
 
 That version of Solr is over four years old.  Bugs in 3.x will *not* be
 fixed.  Bugs in 4.x will also not be fixed.  On 5.x, only extremely
 major bugs are likely to get any attention, and this does not qualify as
 a major bug.
 
 
 
 On another matter:
 
 http://people.apache.org/~hossman/#threadhijack
 
 You replied to a message with the subject "Solr Support for BM25F" ...
 so your message is showing up within that thread.
 
 
>> https://www.dropbox.com/s/xi0o8z6smhd2n5d/woodmansee-thread-hijack.png?dl=0
 
 Thanks,
 Shawn
 
>>> 
>> 



Re: UUID processor handling of empty string

2016-04-14 Thread Chris Hostetter

I'm also confused by what exactly you mean by "doesn't work" but a general 
suggestion you can try is putting the 
RemoveBlankFieldUpdateProcessorFactory before your UUID Processor...

https://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/update/processor/RemoveBlankFieldUpdateProcessorFactory.html

If you are also worried about strings that aren't exactly empty, but 
consist only of whitespace, you can put TrimFieldUpdateProcessorFactory 
before RemoveBlankFieldUpdateProcessorFactory ...

https://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/update/processor/TrimFieldUpdateProcessorFactory.html
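The intent of that ordering can be sketched in plain Java. This only mimics what the Trim -> RemoveBlank -> UUID chain does to the id value; it is not the actual Solr processor code:

```java
import java.util.UUID;

public class IdChainSketch {
    // Mimics TrimField -> RemoveBlankField -> UUID generation for "id".
    static String resolveId(String incomingId) {
        if (incomingId != null) {
            // TrimFieldUpdateProcessorFactory
            incomingId = incomingId.trim();
            // RemoveBlankFieldUpdateProcessorFactory: empty value -> absent
            if (incomingId.isEmpty()) incomingId = null;
        }
        // UUIDUpdateProcessorFactory only fills the field when it is absent
        return incomingId != null ? incomingId : UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        System.out.println(resolveId("   "));    // blank -> generated UUID
        System.out.println(resolveId("doc-1"));  // kept as-is
    }
}
```

With the blank-removal step in front, an empty or whitespace-only id behaves exactly like an omitted one, which is the case Susmit reported as working.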


: Date: Thu, 14 Apr 2016 12:30:24 -0700
: From: Erick Erickson 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user 
: Subject: Re: UUID processor handling of empty string
: 
: What do you mean "doesn't work"? An empty string is
: different from not being present. The UUID update
: processor (I'm pretty sure) only adds a field if it
: is _absent_. Specifying it as an empty string
: fails that test so no value is added.
: 
: At that point, if this uuid field is also the <uniqueKey>,
: then each doc that comes in with an empty field will replace
: the others.
: 
: If it's _not_ the <uniqueKey>, the sorting will be confusing.
: All the empty string fields are equal, so the tiebreaker is
: the internal Lucene doc ID, which may change as merges
: happen. You can specify secondary sort fields to make the
: sort predictable (the <uniqueKey> field is popular for this).
: 
: Best,
: Erick
: 
: On Thu, Apr 14, 2016 at 12:18 PM, Susmit Shukla  
wrote:
: > Hi,
: >
: > I have configured solr schema to generate unique id for a collection using
: > UUIDUpdateProcessorFactory
: >
: > I am seeing a peculiar behavior - if the unique 'id' field is explicitly
: > set as empty string in the SolrInputDocument, the document gets indexed
: > with UUID update processor generating the id.
: > However, sorting does not work if uuid was generated in this way. Also
: > cursor functionality that depends on unique id sort also does not work.
: > I guess the correct behavior would be to fail the indexing if user provides
: > an empty string for a uuid field.
: >
: > The issues do not happen if I omit the id field from the SolrInputDocument .
: >
: > SolrInputDocument
: >
: > solrDoc.addField("id", "");
: >
: > ...
: >
: > I am using schema similar to below-
: >
: > 
: >
: > 
: >
: > 
: >
: > id
: >
: > 
: > 
: > 
: >   id
: > 
: > 
: > 
: >
: >
: >  
: >
: >  uuid
: >
: > 
: >
: >
: > Thanks,
: > Susmit
: 

-Hoss
http://www.lucidworks.com/


Re: Referencing incoming search terms in searchHandler XML

2016-04-14 Thread Walter Underwood
> On Apr 14, 2016, at 12:18 PM, John Bickerstaff  
> wrote:
> 
> If a user types in "foobarbaz figo" I want all documents with "figo" in the
> contentType field boosted above every other document in the results.


This is a very common requirement that seems like a good idea, but has very bad 
corner cases. I always take this back to the customer and convert it to 
something that works for all queries.

Think about this query:

   vitamin a figo

Now, every document with the word “a” is ranked in front of documents with 
“vitamin a”. That is probably not what the customer wanted.

Instead, have a requirement that when two documents are equal matches for the 
query, the “figo” document is first.

Or, create an SRP with two sections, five figo matches with a “More …” link, 
then five general matches. But you might want to avoid dupes between the two.

If your customer absolutely insists on having every single figo doc above 
non-figo docs, well, they deserve what they get.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Adding replica on solr - 5.50

2016-04-14 Thread Jay Potharaju
Hi,
I am using solr 5.5 and testing adding a new replica when a solr instance
comes up. When I run the following command I get an error. I have 1 replica
and trying to add another replica.

http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr

Error:
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> At least one of the node(s) specified are not currently active, no action
> taken.
> 
> At least one of the node(s) specified are not currently
> active, no action taken.
> 400
> 
> 
> 
> org.apache.solr.common.SolrException
> org.apache.solr.common.SolrException
> 
> At least one of the node(s) specified are not currently
> active, no action taken.
> 400
> 
> 


But when i create a new collection with 2 replicas it works fine.
As a side note my clusterstate.json is not updating correctly. Not sure if
that is causing an issue.

 Any suggestions why the Addreplica command is not working. And is it
related to the clusterstate.json? If yes, how can i fix it?

-- 
Thanks
Jay
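A common cause of that "not currently active" error is a node parameter that does not exactly match an entry in ZooKeeper's /live_nodes, which uses the form "host:port_context" (e.g. "10.0.0.5:9001_solr" with the host exactly as Solr registered it). A small sketch of that check in plain Java - the host/port values are placeholders, and this only builds/validates the URL rather than calling Solr:

```java
import java.util.Set;

public class AddReplicaNodeCheck {
    // The node param of ADDREPLICA must exactly match a /live_nodes entry.
    static String addReplicaUrl(String solrBase, String collection,
                                String shard, String node,
                                Set<String> liveNodes) {
        if (!liveNodes.contains(node)) {
            throw new IllegalArgumentException(
                "Node " + node + " is not in live_nodes: " + liveNodes);
        }
        return solrBase + "/admin/collections?action=ADDREPLICA"
             + "&collection=" + collection
             + "&shard=" + shard
             + "&node=" + node;
    }

    public static void main(String[] args) {
        Set<String> live = Set.of("10.0.0.5:9001_solr");
        System.out.println(addReplicaUrl("http://10.0.0.5:8984/solr",
                "test2", "shard1", "10.0.0.5:9001_solr", live));
    }
}
```

Comparing your intended node string against the actual children of /live_nodes in ZooKeeper (or the Cloud > Tree view in the Admin UI) is usually the fastest way to spot the mismatch.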


Re: Adding replica on solr - 5.50

2016-04-14 Thread John Bickerstaff
I had a hard time getting replicas made via the API, once I had created the
collection for the first time, although that may have been ignorance on
my part.

I was able to get it done fairly easily on the Linux command line.  If
that's an option and you're interested, let me know - I have a rough but
accurate document. But perhaps others on the list will have the specific
answer you're looking for.

On Thu, Apr 14, 2016 at 4:19 PM, Jay Potharaju 
wrote:

> Hi,
> I am using solr 5.5 and testing adding a new replica when a solr instance
> comes up. When I run the following command I get an error. I have 1 replica
> and trying to add another replica.
>
>
> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
>
> Error:
> > org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > At least one of the node(s) specified are not currently active, no action
> > taken.
> > 
> > At least one of the node(s) specified are not currently
> > active, no action taken.
> > 400
> > 
> > 
> > 
> > org.apache.solr.common.SolrException
> > org.apache.solr.common.SolrException
> > 
> > At least one of the node(s) specified are not currently
> > active, no action taken.
> > 400
> > 
> > 
>
>
> But when i create a new collection with 2 replicas it works fine.
> As a side note my clusterstate.json is not updating correctly. Not sure if
> that is causing an issue.
>
>  Any suggestions why the Addreplica command is not working. And is it
> related to the clusterstate.json? If yes, how can i fix it?
>
> --
> Thanks
> Jay
>


Re: Adding replica on solr - 5.50

2016-04-14 Thread Jay Potharaju
Curious what command did you use?

On Thu, Apr 14, 2016 at 3:48 PM, John Bickerstaff 
wrote:

> I had a hard time getting replicas made via the API, once I had created the
> collection for the first time although that may have been ignorance on
> my part.
>
> I was able to get it done fairly easily on the Linux command line.  If
> that's an option and you're interested, let me know - I have a rough but
> accurate document. But perhaps others on the list will have the specific
> answer you're looking for.
>
> On Thu, Apr 14, 2016 at 4:19 PM, Jay Potharaju 
> wrote:
>
> > Hi,
> > I am using solr 5.5 and testing adding a new replica when a solr instance
> > comes up. When I run the following command I get an error. I have 1
> replica
> > and trying to add another replica.
> >
> >
> >
> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
> >
> > Error:
> > > org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > > At least one of the node(s) specified are not currently active, no
> action
> > > taken.
> > > 
> > > At least one of the node(s) specified are not currently
> > > active, no action taken.
> > > 400
> > > 
> > > 
> > > 
> > > org.apache.solr.common.SolrException
> > > org.apache.solr.common.SolrException
> > > 
> > > At least one of the node(s) specified are not currently
> > > active, no action taken.
> > > 400
> > > 
> > > 
> >
> >
> > But when i create a new collection with 2 replicas it works fine.
> > As a side note my clusterstate.json is not updating correctly. Not sure
> if
> > that is causing an issue.
> >
> >  Any suggestions why the Addreplica command is not working. And is it
> > related to the clusterstate.json? If yes, how can i fix it?
> >
> > --
> > Thanks
> > Jay
> >
>



-- 
Thanks
Jay Potharaju


Re: Adding replica on solr - 5.50

2016-04-14 Thread John Bickerstaff
su - solr -c "/opt/solr/bin/solr create -c statdx -d /home/john/conf
-shards 1 -replicationFactor 2"

However, this won't work by itself.  There is some preparation
necessary...  I'll send you the doc.

On Thu, Apr 14, 2016 at 4:55 PM, Jay Potharaju 
wrote:

> Curious what command did you use?
>
> On Thu, Apr 14, 2016 at 3:48 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > I had a hard time getting replicas made via the API, once I had created
> the
> > collection for the first time although that may have been ignorance
> on
> > my part.
> >
> > I was able to get it done fairly easily on the Linux command line.  If
> > that's an option and you're interested, let me know - I have a rough but
> > accurate document. But perhaps others on the list will have the specific
> > answer you're looking for.
> >
> > On Thu, Apr 14, 2016 at 4:19 PM, Jay Potharaju 
> > wrote:
> >
> > > Hi,
> > > I am using solr 5.5 and testing adding a new replica when a solr
> instance
> > > comes up. When I run the following command I get an error. I have 1
> > replica
> > > and trying to add another replica.
> > >
> > >
> > >
> >
> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
> > >
> > > Error:
> > > > org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > > > At least one of the node(s) specified are not currently active, no
> > action
> > > > taken.
> > > > 
> > > > At least one of the node(s) specified are not
> currently
> > > > active, no action taken.
> > > > 400
> > > > 
> > > > 
> > > > 
> > > > org.apache.solr.common.SolrException
> > > >  name="root-error-class">org.apache.solr.common.SolrException
> > > > 
> > > > At least one of the node(s) specified are not
> currently
> > > > active, no action taken.
> > > > 400
> > > > 
> > > > 
> > >
> > >
> > > But when i create a new collection with 2 replicas it works fine.
> > > As a side note my clusterstate.json is not updating correctly. Not sure
> > if
> > > that is causing an issue.
> > >
> > >  Any suggestions why the Addreplica command is not working. And is it
> > > related to the clusterstate.json? If yes, how can i fix it?
> > >
> > > --
> > > Thanks
> > > Jay
> > >
> >
>
>
>
> --
> Thanks
> Jay Potharaju
>


Re: Adding replica on solr - 5.50

2016-04-14 Thread Jay Potharaju
Thanks John, which version of solr are you using?

On Thu, Apr 14, 2016 at 3:59 PM, John Bickerstaff 
wrote:

> su - solr -c "/opt/solr/bin/solr create -c statdx -d /home/john/conf
> -shards 1 -replicationFactor 2"
>
> However, this won't work by itself.  There is some preparation
> necessary...  I'll send you the doc.
>
> On Thu, Apr 14, 2016 at 4:55 PM, Jay Potharaju 
> wrote:
>
> > Curious what command did you use?
> >
> > On Thu, Apr 14, 2016 at 3:48 PM, John Bickerstaff <
> > j...@johnbickerstaff.com>
> > wrote:
> >
> > > I had a hard time getting replicas made via the API, once I had created
> > the
> > > collection for the first time although that may have been ignorance
> > on
> > > my part.
> > >
> > > I was able to get it done fairly easily on the Linux command line.  If
> > > that's an option and you're interested, let me know - I have a rough
> but
> > > accurate document. But perhaps others on the list will have the
> specific
> > > answer you're looking for.
> > >
> > > On Thu, Apr 14, 2016 at 4:19 PM, Jay Potharaju 
> > > wrote:
> > >
> > > > Hi,
> > > > I am using solr 5.5 and testing adding a new replica when a solr
> > instance
> > > > comes up. When I run the following command I get an error. I have 1
> > > replica
> > > > and trying to add another replica.
> > > >
> > > >
> > > >
> > >
> >
> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
> > > >
> > > > Error:
> > > > > org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > > > > At least one of the node(s) specified are not currently active, no
> > > action
> > > > > taken.
> > > > > 
> > > > > At least one of the node(s) specified are not
> > currently
> > > > > active, no action taken.
> > > > > 400
> > > > > 
> > > > > 
> > > > > 
> > > > > org.apache.solr.common.SolrException
> > > > >  > name="root-error-class">org.apache.solr.common.SolrException
> > > > > 
> > > > > At least one of the node(s) specified are not
> > currently
> > > > > active, no action taken.
> > > > > 400
> > > > > 
> > > > > 
> > > >
> > > >
> > > > But when i create a new collection with 2 replicas it works fine.
> > > > As a side note my clusterstate.json is not updating correctly. Not
> sure
> > > if
> > > > that is causing an issue.
> > > >
> > > >  Any suggestions why the Addreplica command is not working. And is it
> > > > related to the clusterstate.json? If yes, how can i fix it?
> > > >
> > > > --
> > > > Thanks
> > > > Jay
> > > >
> > >
> >
> >
> >
> > --
> > Thanks
> > Jay Potharaju
> >
>



-- 
Thanks
Jay Potharaju


Re: Adding replica on solr - 5.50

2016-04-14 Thread John Bickerstaff
5.4

This problem drove me insane for about a month...

I'll send you the doc.

On Thu, Apr 14, 2016 at 5:02 PM, Jay Potharaju 
wrote:

> Thanks John, which version of solr are you using?
>
> On Thu, Apr 14, 2016 at 3:59 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > su - solr -c "/opt/solr/bin/solr create -c statdx -d /home/john/conf
> > -shards 1 -replicationFactor 2"
> >
> > However, this won't work by itself.  There is some preparation
> > necessary...  I'll send you the doc.
> >
> > On Thu, Apr 14, 2016 at 4:55 PM, Jay Potharaju 
> > wrote:
> >
> > > Curious what command did you use?
> > >
> > > On Thu, Apr 14, 2016 at 3:48 PM, John Bickerstaff <
> > > j...@johnbickerstaff.com>
> > > wrote:
> > >
> > > > I had a hard time getting replicas made via the API, once I had
> created
> > > the
> > > > collection for the first time although that may have been
> ignorance
> > > on
> > > > my part.
> > > >
> > > > I was able to get it done fairly easily on the Linux command line.
> If
> > > > that's an option and you're interested, let me know - I have a rough
> > but
> > > > accurate document. But perhaps others on the list will have the
> > specific
> > > > answer you're looking for.
> > > >
> > > > On Thu, Apr 14, 2016 at 4:19 PM, Jay Potharaju <
> jspothar...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > > I am using solr 5.5 and testing adding a new replica when a solr
> > > instance
> > > > > comes up. When I run the following command I get an error. I have 1
> > > > replica
> > > > > and trying to add another replica.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
> > > > >
> > > > > Error:
> > > > > > org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > > > > > At least one of the node(s) specified are not currently active,
> no
> > > > action
> > > > > > taken.
> > > > > > 
> > > > > > At least one of the node(s) specified are not
> > > currently
> > > > > > active, no action taken.
> > > > > > 400
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > >  name="error-class">org.apache.solr.common.SolrException
> > > > > >  > > name="root-error-class">org.apache.solr.common.SolrException
> > > > > > 
> > > > > > At least one of the node(s) specified are not
> > > currently
> > > > > > active, no action taken.
> > > > > > 400
> > > > > > 
> > > > > > 
> > > > >
> > > > >
> > > > > But when i create a new collection with 2 replicas it works fine.
> > > > > As a side note my clusterstate.json is not updating correctly. Not
> > sure
> > > > if
> > > > > that is causing an issue.
> > > > >
> > > > >  Any suggestions why the Addreplica command is not working. And is
> it
> > > > > related to the clusterstate.json? If yes, how can i fix it?
> > > > >
> > > > > --
> > > > > Thanks
> > > > > Jay
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks
> > > Jay Potharaju
> > >
> >
>
>
>
> --
> Thanks
> Jay Potharaju
>


Re: UUID processor handling of empty string

2016-04-14 Thread Susmit Shukla
Hi Chris/Erick,

"Does not work" in the sense that the order of documents does not change when
the sort is switched from asc to desc.
This could be just a trivial bug where the UUID processor factory is generating
a uuid even if the supplied id is an empty string.
This is on solr 5.3.0

Thanks,
Susmit





On Thu, Apr 14, 2016 at 2:30 PM, Chris Hostetter 
wrote:

>
> I'm also confused by what exactly you mean by "doesn't work" but a general
> suggestion you can try is putting the
> RemoveBlankFieldUpdateProcessorFactory before your UUID Processor...
>
>
> https://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/update/processor/RemoveBlankFieldUpdateProcessorFactory.html
>
> If you are also worried about strings that aren't exactly empty, but
> consist only of whitespace, you can put TrimFieldUpdateProcessorFactory
> before RemoveBlankFieldUpdateProcessorFactory ...
>
>
> https://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/update/processor/TrimFieldUpdateProcessorFactory.html
>
>
> : Date: Thu, 14 Apr 2016 12:30:24 -0700
> : From: Erick Erickson 
> : Reply-To: solr-user@lucene.apache.org
> : To: solr-user 
> : Subject: Re: UUID processor handling of empty string
> :
> : What do you mean "doesn't work"? An empty string is
> : different from not being present. The UUID update
> : processor (I'm pretty sure) only adds a field if it
> : is _absent_. Specifying it as an empty string
> : fails that test so no value is added.
> :
> : At that point, if this uuid field is also the <uniqueKey>,
> : then each doc that comes in with an empty field will replace
> : the others.
> :
> : If it's _not_ the <uniqueKey>, the sorting will be confusing.
> : All the empty string fields are equal, so the tiebreaker is
> : the internal Lucene doc ID, which may change as merges
> : happen. You can specify secondary sort fields to make the
> : sort predictable (the <uniqueKey> field is popular for this).
> :
> : Best,
> : Erick
> :
> : On Thu, Apr 14, 2016 at 12:18 PM, Susmit Shukla 
> wrote:
> : > Hi,
> : >
> : > I have configured solr schema to generate unique id for a collection
> using
> : > UUIDUpdateProcessorFactory
> : >
> : > I am seeing a peculiar behavior - if the unique 'id' field is
> explicitly
> : > set as empty string in the SolrInputDocument, the document gets indexed
> : > with UUID update processor generating the id.
> : > However, sorting does not work if uuid was generated in this way. Also
> : > cursor functionality that depends on unique id sort also does not work.
> : > I guess the correct behavior would be to fail the indexing if user
> provides
> : > an empty string for a uuid field.
> : >
> : > The issues do not happen if I omit the id field from the
> SolrInputDocument .
> : >
> : > SolrInputDocument
> : >
> : > solrDoc.addField("id", "");
> : >
> : > ...
> : >
> : > I am using schema similar to below-
> : >
> : > 
> : >
> : > 
> : >
> : >  required="true" />
> : >
> : > id
> : >
> : > 
> : > 
> : > 
> : >   id
> : > 
> : > 
> : > 
> : >
> : >
> : >  
> : >
> : >  uuid
> : >
> : > 
> : >
> : >
> : > Thanks,
> : > Susmit
> :
>
> -Hoss
> http://www.lucidworks.com/
>


Re: HTTP Client Only

2016-04-14 Thread Jeff Wartes


If you’re already using java, just use the CloudSolrClient. 
If you’re using the default router (CompositeId), it’ll figure out the leaders 
and send documents to the right place for you.

If you’re not using java, then I’d still look there for hints on how to 
duplicate the functionality.



On 4/14/16, 1:27 PM, "Robert Brown"  wrote:

>Hi,
>
>I have a collection with 2 shards, 1 replica each.
>
>When I send updates, I currently /admin/ping each of the nodes, and then 
>pick one at random.
>
>I'm guessing it makes more sense to only send updates to one of the 
>leaders, so I'm contemplating getting the collection status instead and
>picking out the leaders.
>
>Is there anything else I should be aware of, apart from using a Java 
>client, etc.
>
>I guess the ping becomes redundant?
>
>Thanks,
>Rob
>
>
>


Re: Adding replica on solr - 5.50

2016-04-14 Thread Jeff Wartes
I’m all for finding another way to make something work, but I feel like this is 
the wrong advice. 

There are two options:
1) You are doing something wrong. In which case, you should probably invest in 
figuring out what.
2) Solr is doing something wrong. In which case, you should probably invest in 
figuring out what, and then file a bug so it doesn’t happen to anyone else.

Adding a replica is a pretty basic operation, so whichever option is the case, 
I feel like you’ll just encounter other problems down the road if you don’t 
figure out what’s going on.

I’d probably start by creating the single-replica collection, and then 
inspecting the live_nodes list in Zookeeper to confirm that the (live) node 
list is actually what you think it is.





On 4/14/16, 4:04 PM, "John Bickerstaff"  wrote:

>5.4
>
>This problem drove me insane for about a month...
>
>I'll send you the doc.
>
>On Thu, Apr 14, 2016 at 5:02 PM, Jay Potharaju 
>wrote:
>
>> Thanks John, which version of solr are you using?
>>
>> On Thu, Apr 14, 2016 at 3:59 PM, John Bickerstaff <
>> j...@johnbickerstaff.com>
>> wrote:
>>
>> > su - solr -c "/opt/solr/bin/solr create -c statdx -d /home/john/conf
>> > -shards 1 -replicationFactor 2"
>> >
>> > However, this won't work by itself.  There is some preparation
>> > necessary...  I'll send you the doc.
>> >
>> > On Thu, Apr 14, 2016 at 4:55 PM, Jay Potharaju 
>> > wrote:
>> >
>> > > Curious what command did you use?
>> > >
>> > > On Thu, Apr 14, 2016 at 3:48 PM, John Bickerstaff <
>> > > j...@johnbickerstaff.com>
>> > > wrote:
>> > >
>> > > > I had a hard time getting replicas made via the API, once I had
>> created
>> > > the
>> > > > collection for the first time although that may have been
>> ignorance
>> > > on
>> > > > my part.
>> > > >
>> > > > I was able to get it done fairly easily on the Linux command line.
>> If
>> > > > that's an option and you're interested, let me know - I have a rough
>> > but
>> > > > accurate document. But perhaps others on the list will have the
>> > specific
>> > > > answer you're looking for.
>> > > >
>> > > > On Thu, Apr 14, 2016 at 4:19 PM, Jay Potharaju <
>> jspothar...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Hi,
>> > > > > I am using solr 5.5 and testing adding a new replica when a solr
>> > > instance
>> > > > > comes up. When I run the following command I get an error. I have 1
>> > > > replica
>> > > > > and trying to add another replica.
>> > > > >
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
>> > > > >
>> > > > > Error:
>> > > > > > org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
>> > > > > > At least one of the node(s) specified are not currently active,
>> no
>> > > > action
>> > > > > > taken.
>> > > > > > 
>> > > > > > At least one of the node(s) specified are not
>> > > currently
>> > > > > > active, no action taken.
>> > > > > > 400
>> > > > > > 
>> > > > > > 
>> > > > > > 
>> > > > > > > name="error-class">org.apache.solr.common.SolrException
>> > > > > > > > > name="root-error-class">org.apache.solr.common.SolrException
>> > > > > > 
>> > > > > > At least one of the node(s) specified are not
>> > > currently
>> > > > > > active, no action taken.
>> > > > > > 400
>> > > > > > 
>> > > > > > 
>> > > > >
>> > > > >
>> > > > > But when i create a new collection with 2 replicas it works fine.
>> > > > > As a side note my clusterstate.json is not updating correctly. Not
>> > sure
>> > > > if
>> > > > > that is causing an issue.
>> > > > >
>> > > > >  Any suggestions why the Addreplica command is not working. And is
>> it
>> > > > > related to the clusterstate.json? If yes, how can i fix it?
>> > > > >
>> > > > > --
>> > > > > Thanks
>> > > > > Jay
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Thanks
>> > > Jay Potharaju
>> > >
>> >
>>
>>
>>
>> --
>> Thanks
>> Jay Potharaju
>>


JSON Facet Stats Mincount

2016-04-14 Thread Nick Vasilyev
Hello, I am trying to get a list of items that have more than one
manufacturer using the following json facet query. This works fine without
mincount, but errors out as soon as I add it.

Is this possible or am I doing something wrong?

json.facet={
   groupID: {
  type: terms,
  field: groupID,
  facet:{ y: "unique(mfr)",
mincount: 2}
   }
}

Error:
"error": { "msg": "expected Map but got 2 ,path=facet/groupID", "code": 400
}

Thanks in advance
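For reference, in the JSON Facet API mincount is a parameter of the terms facet itself, not of the nested facet map - the facet map may only contain sub-facet definitions, which is why Solr complains "expected Map but got 2". The syntactically valid form would be (sketch, using the field names from the question):

```
json.facet={
  groupID: {
    type: terms,
    field: groupID,
    mincount: 2,
    facet: { y: "unique(mfr)" }
  }
}
```

Note the caveat: at the terms level, mincount filters buckets by document count, not by the unique(mfr) statistic, so this fixes the parse error but filtering buckets on a computed stat needs a different approach (e.g. sorting by the stat and post-filtering on the client).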


Re: Adding replica on solr - 5.50

2016-04-14 Thread John Bickerstaff
Sure - couldn't agree more.

I couldn't find any good documentation on the Solr site about how to add a
replica to a Solr cloud.  The Admin UI appears to require that the
directories be created anyway.

There is probably a way to do it through the UI, once Solr is installed on
a new machine - and IIRC, I did manage that, but my IT guy wanted
scriptable command lines.

Also, IIRC, the stuff I did on the command line actually showed the API URL
as part of the output so Jay could try that and see what the difference
is...

Jay - I'm going offline now, but if you're still stuck tomorrow, I'll try
to recreate... I have a VM snapshot just before I issued the command...

Keep in mind everything I did was in a Solr Cloud...

On Thu, Apr 14, 2016 at 6:21 PM, Jeff Wartes  wrote:

> I’m all for finding another way to make something work, but I feel like
> this is the wrong advice.
>
> There are two options:
> 1) You are doing something wrong. In which case, you should probably
> invest in figuring out what.
> 2) Solr is doing something wrong. In which case, you should probably
> invest in figuring out what, and then file a bug so it doesn’t happen to
> anyone else.
>
> Adding a replica is a pretty basic operation, so whichever option is the
> case, I feel like you’ll just encounter other problems down the road if you
> don’t figure out what’s going on.
>
> I’d probably start by creating the single-replica collection, and then
> inspecting the live_nodes list in Zookeeper to confirm that the (live) node
> list is actually what you think it is.
>
>
>
>
>
> On 4/14/16, 4:04 PM, "John Bickerstaff"  wrote:
>
> >5.4
> >
> >This problem drove me insane for about a month...
> >
> >I'll send you the doc.
> >
> >On Thu, Apr 14, 2016 at 5:02 PM, Jay Potharaju 
> >wrote:
> >
> >> Thanks John, which version of solr are you using?
> >>
> >> On Thu, Apr 14, 2016 at 3:59 PM, John Bickerstaff <
> >> j...@johnbickerstaff.com>
> >> wrote:
> >>
> >> > su - solr -c "/opt/solr/bin/solr create -c statdx -d /home/john/conf
> >> > -shards 1 -replicationFactor 2"
> >> >
> >> > However, this won't work by itself.  There is some preparation
> >> > necessary...  I'll send you the doc.
> >> >
> >> > On Thu, Apr 14, 2016 at 4:55 PM, Jay Potharaju  >
> >> > wrote:
> >> >
> >> > > Curious what command did you use?
> >> > >
> >> > > On Thu, Apr 14, 2016 at 3:48 PM, John Bickerstaff <
> >> > > j...@johnbickerstaff.com>
> >> > > wrote:
> >> > >
> >> > > > I had a hard time getting replicas made via the API, once I had
> >> created
> >> > > the
> >> > > > collection for the first time although that may have been
> >> ignorance
> >> > > on
> >> > > > my part.
> >> > > >
> >> > > > I was able to get it done fairly easily on the Linux command line.
> >> If
> >> > > > that's an option and you're interested, let me know - I have a
> rough
> >> > but
> >> > > > accurate document. But perhaps others on the list will have the
> >> > specific
> >> > > > answer you're looking for.
> >> > > >
> >> > > > On Thu, Apr 14, 2016 at 4:19 PM, Jay Potharaju <
> >> jspothar...@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > Hi,
> >> > > > > I am using solr 5.5 and testing adding a new replica when a solr
> >> > > instance
> >> > > > > comes up. When I run the following command I get an error. I
> have 1
> >> > > > replica
> >> > > > > and trying to add another replica.
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
> >> > > > >
> >> > > > > Error:
> >> > > > > > org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> >> > > > > > At least one of the node(s) specified are not currently
> active,
> >> no
> >> > > > action
> >> > > > > > taken.
> >> > > > > > 
> >> > > > > > At least one of the node(s) specified are not
> >> > > currently
> >> > > > > > active, no action taken.
> >> > > > > > 400
> >> > > > > > 
> >> > > > > > 
> >> > > > > > 
> >> > > > > >  >> name="error-class">org.apache.solr.common.SolrException
> >> > > > > >  >> > > name="root-error-class">org.apache.solr.common.SolrException
> >> > > > > > 
> >> > > > > > At least one of the node(s) specified are not
> >> > > currently
> >> > > > > > active, no action taken.
> >> > > > > > 400
> >> > > > > > 
> >> > > > > > 
> >> > > > >
> >> > > > >
> >> > > > > But when i create a new collection with 2 replicas it works
> fine.
> >> > > > > As a side note my clusterstate.json is not updating correctly.
> Not
> >> > sure
> >> > > > if
> >> > > > > that is causing an issue.
> >> > > > >
> >> > > > >  Any suggestions why the Addreplica command is not working. And
> is
> >> it
> >> > > > > related to the clusterstate.json? If yes, how can i fix it?
> >> > > > >
> >> > > > > --
> >> > > > > Thanks
> >> > > > > Jay
> >> > > > >
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Thanks
> >> > > Jay Potharaju
> >> > >
> >> >
> >>
> >>
> >>
> >> --
>

Re: Adding replica on solr - 5.50

2016-04-14 Thread Jay Potharaju
Jeff, I couldn't agree more with you. I think the reason it is not working is 
a screwed-up clusterstate.json, but I'm not sure how to fix it. I have already 
restarted my ZK servers. Any more suggestions regarding the same?



Re: Adding replica on solr - 5.50

2016-04-14 Thread Erick Erickson
bq:  the Solr site about how to add a
replica to a Solr cloud.  The Admin UI appears to require that the
directories be created anyway

No, no, a thousand times NO! You're getting confused,
I think, with the difference between _cores_ and _collections_
(or replicas in a collection).

Do not use the admin UI for _cores_ to create replicas. It's possible
if (and only if) you do it exactly correctly. Instead, use the collections API
ADDREPLICA command here:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica

Which you could invoke with cURL etc.; does that qualify as "scripting" in your
situation?

You're right, the Solr instance must be up and running for the replica to
be added, but that's not onerous.


The bin/solr script is a "work in progress", and doesn't have direct support
for "addreplica", but it could be added.

Best,
Erick
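
Since the goal was a scriptable command line, the ADDREPLICA call is just an HTTP GET against the Collections API. A minimal sketch of building that URL (host, port, collection, shard, and node values here are placeholders, and `addreplica_url` is a hypothetical helper, not a Solr API):

```python
from urllib.parse import urlencode

def addreplica_url(host, port, collection, shard, node=None):
    """Build a Collections API ADDREPLICA URL for scripting (e.g. via curl)."""
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node is not None:
        # node names use the host:port_solr form that appears in live_nodes
        params["node"] = node
    return "http://%s:%d/solr/admin/collections?%s" % (host, port, urlencode(params))

url = addreplica_url("localhost", 8983, "test2", "shard1", node="10.0.0.2:8983_solr")
print(url)
```

From a shell script this would then be a single `curl "$url"` line.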


Re: Adding replica on solr - 5.50

2016-04-14 Thread Jay Potharaju
Thanks for the help John.



Re: Adding replica on solr - 5.50

2016-04-14 Thread Erick Erickson
Post your clusterstate.json file?

You shouldn't even have a clusterstate.json file with anything in it.
In the 5x code line the state of each collection is kept under
the relevant collections z-noed in "state.json".

Confusingly, though, the clusterstate.json node still exists
but is empty...

Best,
Erick



Re: Adding replica on solr - 5.50

2016-04-14 Thread John Bickerstaff
Thanks Eric!

I'll look into that immediately - yes, I think that cURL would qualify as
scriptable for my IT lead.

In the end, I found I could do it two ways...

Either copy the entire solr data directory over to /var/solr/data on the
new machine, change the directory name and the entries in the
core.properties file, then start the already-installed Solr in cloud mode -
everything came up roses in the cloud section of the UI - the new replica
was there as part of the collection, properly named and worked fine.

Alternatively, I used the command I mentioned earlier and then waited as
the data was replicated over to the newly-created replica -- again,
everything was roses in the Cloud section of the Admin UI...

What might I have messed up in this scenario?  I didn't love the hackish
feeling either, but had been unable to find anything like the addreplica -
although I did look for a fairly long time - I'm glad to know about it now.




How to get stats on currency field?

2016-04-14 Thread Pranaya Behera

Hi,
I have a currency field type. How do I get StatsComponent to work 
with it? Currently StatsComponent works with strings and numerics, but not 
with currency fields.
Another question: how can I copy only the value part from a currency 
field? E.g. if my field name is "mrp" and the value is "62.00, USD", and 
the currency field cannot be used in the StatsComponent, then how can I 
copy only 62.00 to another field?
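
One workaround, assuming the raw value reaches your indexing client in the "62.00,USD" form: split it before indexing and send the numeric part to a plain float/double field that StatsComponent can aggregate. A hedged client-side sketch (the field names `mrp_amount` and `mrp_currency` are illustrative, not part of any schema):

```python
def split_currency(raw):
    """Split a currency value like '62.00,USD' into (amount, code)."""
    amount, _, code = raw.partition(",")
    return float(amount), code.strip()

# prepare the document client-side before sending it to Solr
doc = {"id": "1", "mrp": "62.00,USD"}
amount, code = split_currency(doc["mrp"])
doc["mrp_amount"] = amount    # index into a double field usable by stats
doc["mrp_currency"] = code
print(doc)
```

The same split could instead be done server-side with an update request processor, but the client-side version is the simplest to verify.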


Re: Adding replica on solr - 5.50

2016-04-14 Thread John Bickerstaff
Jay - it's probably too simple, but the error says "not currently active"
which could, of course, mean that although it's up and running, it's not
listening on the port you have in the command line...  Or that the port is
blocked by a firewall or other network problem.

I note that you're using ports different from the default 8983 for your
Solr instances...

You probably checked already, but I thought I'd mention it.
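
That port check is easy to automate before blaming Solr itself; a small sketch (not Solr-specific, just a TCP reachability probe against the host and port used in the ADDREPLICA call):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_open("x.x.x.x", 9001) should be True before retrying ADDREPLICA
```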



[ANNOUNCE] YCSB 0.8.0 Release

2016-04-14 Thread Chrisjan Matser
On behalf of the development community, I am pleased to announce the
release of YCSB 0.8.0.  Though there were no major Solr updates in this
release, we are always interested in having members from the community help
with ensuring that we have compliance with Solr's latest and greatest.

Highlights:

* Amazon S3 improvements including proper closing of the S3Object

* Apache Cassandra improvements including update to DataStax driver 3.0.0,
tested with Cassandra 2.2.5

* Apache HBase10 improvements including synchronization for multi-threading

* Core improvements to address future enhancements

* Elasticsearch improvements including update to 2.3.1 (latest stable
version)

* OrientDB improvements including a readallfields fix

Full release notes, including links to source and convenience binaries:

https://github.com/brianfrankcooper/YCSB/releases/tag/0.8.0

This release covers changes from the last month.


Re: Adding replica on solr - 5.50

2016-04-14 Thread John Bickerstaff
Another thought - again probably not it, but just in case...

Shouldn't this: &node=x.x.x.x:9001_solr


Actually be this?  &node=x.x.x.x:9001/solr


(Note the / instead of _ )
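Either separator is easy to test from the shell. Below is a sketch of a full ADDREPLICA call via the Collections API (host, port, shard, and collection names are placeholders); note that node names as registered in ZooKeeper's live_nodes typically take the host:port_solr form, so it is worth checking there before assuming either variant:

```shell
#!/bin/sh
# Hypothetical values -- substitute your own cluster details.
SOLR_NODE="x.x.x.x:9001"
COLLECTION="mycollection"
SHARD="shard1"

# The node parameter should match a node name from live_nodes in ZooKeeper,
# which is usually of the form host:port_solr.
URL="http://${SOLR_NODE}/solr/admin/collections?action=ADDREPLICA&collection=${COLLECTION}&shard=${SHARD}&node=${SOLR_NODE}_solr"

echo "$URL"
# curl "$URL"   # uncomment to issue the request against a live cluster
```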

On Thu, Apr 14, 2016 at 10:45 PM, John Bickerstaff  wrote:

> Jay - it's probably too simple, but the error says "not currently active"
> which could, of course, mean that although it's up and running, it's not
> listening on the port you have in the command line...  Or that the port is
> blocked by a firewall or other network problem.
>
> I note that you're using ports different from the default 8983 for your
> Solr instances...
>
> You probably checked already, but I thought I'd mention it.
>
>
> On Thu, Apr 14, 2016 at 8:30 PM, John Bickerstaff <
> j...@johnbickerstaff.com> wrote:
>
>> Thanks Erick!
>>
>> I'll look into that immediately - yes, I think that cURL would qualify as
>> scriptable for my IT lead.
>>
>> In the end, I found I could do it two ways...
>>
>> Either copy the entire solr data directory over to /var/solr/data on the
>> new machine, change the directory name and the entries in the
>> core.properties file, then start the already-installed Solr in cloud mode -
>> everything came up roses in the cloud section of the UI - the new replica
>> was there as part of the collection, properly named and worked fine.
>>
>> Alternatively, I used the command I mentioned earlier and then waited as
>> the data was replicated over to the newly-created replica -- again,
>> everything was roses in the Cloud section of the Admin UI...
>>
>> What might I have messed up in this scenario?  I didn't love the hackish
>> feeling either, but had been unable to find anything like the addreplica -
>> although I did look for a fairly long time - I'm glad to know about it now.
>>
>>
>>
>> On Thu, Apr 14, 2016 at 7:36 PM, Erick Erickson 
>> wrote:
>>
>>> bq:  the Solr site about how to add a
>>> replica to a Solr cloud.  The Admin UI appears to require that the
>>> directories be created anyway
>>>
>>> No, no, a thousand times NO! You're getting confused,
>>> I think, with the difference between _cores_ and _collections_
>>> (or replicas in a collection).
>>>
>>> Do not use the admin UI for _cores_ to create replicas. It's possible
>>> if (and only if) you do it exactly correctly. Instead, use the
>>> collections API
>>> ADDREPLICA command here:
>>>
>>> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica
>>>
>>> Which you could cURL etc., does that qualify as "scripting" in your
>>> situation?
>>>
>>> You're right, the Solr instance must be up and running for the replica to
>>> be added, but that's not onerous
>>>
>>>
>>> The bin/solr script is a "work in progress", and doesn't have direct
>>> support
>>> for "addreplica", but it could be added.
>>>
>>> Best,
>>> Erick
>>>
>>> On Thu, Apr 14, 2016 at 6:22 PM, John Bickerstaff
>>>  wrote:
>>> > Sure - couldn't agree more.
>>> >
>>> > I couldn't find any good documentation on the Solr site about how to
>>> add a
>>> > replica to a Solr cloud.  The Admin UI appears to require that the
>>> > directories be created anyway.
>>> >
>>> > There is probably a way to do it through the UI, once Solr is
>>> installed on
>>> > a new machine - and IIRC, I did manage that, but my IT guy wanted
>>> > scriptable command lines.
>>> >
>>> > Also, IIRC, the stuff I did on the command line actually showed the
>>> API URL
>>> > as part of the output so Jay could try that and see what the difference
>>> > is...
>>> >
>>> > Jay - I'm going offline now, but if you're still stuck tomorrow, I'll
>>> try
>>> > to recreate... I have a VM snapshot just before I issued the command...
>>> >
>>> > Keep in mind everything I did was in a Solr Cloud...
>>> >
>>> > On Thu, Apr 14, 2016 at 6:21 PM, Jeff Wartes 
>>> wrote:
>>> >
>>> >> I’m all for finding another way to make something work, but I feel
>>> like
>>> >> this is the wrong advice.
>>> >>
>>> >> There are two options:
>>> >> 1) You are doing something wrong. In which case, you should probably
>>> >> invest in figuring out what.
>>> >> 2) Solr is doing something wrong. In which case, you should probably
>>> >> invest in figuring out what, and then file a bug so it doesn’t happen
>>> to
>>> >> anyone else.
>>> >>
>>> >> Adding a replica is a pretty basic operation, so whichever option is
>>> the
>>> >> case, I feel like you’ll just encounter other problems down the road
>>> if you
>>> >> don’t figure out what’s going on.
>>> >>
>>> >> I’d probably start by creating the single-replica collection, and then
>>> >> inspecting the live_nodes list in Zookeeper to confirm that the
>>> (live) node
>>> >> list is actually what you think it is.
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On 4/14/16, 4:04 PM, "John 

Re: How to get stats on currency field?

2016-04-14 Thread Chris Hostetter

The thing to remember about currency fields is that even if you tend to 
only put one currency value in it, any question of interpreting the values 
in that field has to be done relative to a specific currency, and the 
exchange rates may change dynamically.

So use the currency function to get a numerical value in some explicit 
currency at the moment you execute the query, and then do stats over that 
function.

Something like this IIRC: stats.field={!func}currency(your_field,EUR)
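Spelled out as a full request, that might look like the following sketch (server, collection, and field names are placeholders; the local-params value must be URL-encoded when actually sent, which curl's --data-urlencode handles):

```shell
#!/bin/sh
# Hypothetical collection/field names; rows=0 returns only the stats section.
QS="q=*:*&rows=0&stats=true&stats.field={!func}currency(your_field,EUR)"
echo "http://localhost:8983/solr/mycollection/select?${QS}"
# In practice, let curl handle the encoding of the local-params syntax:
# curl "http://localhost:8983/solr/mycollection/select" \
#      --data-urlencode "q=*:*" --data-urlencode "rows=0" \
#      --data-urlencode "stats=true" \
#      --data-urlencode 'stats.field={!func}currency(your_field,EUR)'
```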



-Hoss
http://www.lucidworks.com/


Re: Growing memory?

2016-04-14 Thread Shawn Heisey
On 4/14/2016 1:25 PM, Betsey Benagh wrote:
> bin/solr status shows the memory usage increasing, as does the admin ui.
>
> I'm running this on a shared machine that is supporting several other
> applications, so I can't be particularly greedy with memory usage.  Is
> there anything out there that gives guidelines on what an appropriate
> amount of heap is based on number of documents or whatever?  We're just
> playing around with it right now, but it sounds like we may need a
> different machine in order to load in all of the data we want to have
> available.

That means you're seeing the memory usage from Java's point of view. 
There will be three numbers in the admin UI.  The first is the actual
amount of memory used by the program right at that instant.  The second
is the highest amount of memory that has ever been allocated since the
program started.  The third is the maximum amount of memory that *can*
be allocated.  It's normal for the last two numbers to be the same and
the first number to fluctuate up and down.

From the operating system's point of view, the program will be using the
amount from the middle number on the admin UI, plus some overhead for
Java itself.

https://wiki.apache.org/solr/SolrPerformanceProblems#How_much_heap_space_do_I_need.3F

In addition to having enough heap memory, getting good performance will
require that you have additional memory in the system that is not
allocated to ANY program, which the OS can use to cache your index
data.  The total amount of memory that a well-tuned Solr server requires
often surprises people.  Running Solr with other applications on the
same server may not be a problem if your Solr server load is low and
your indexes are very small, but if your indexes are large and/or Solr
is very busy, those other applications might interfere with Solr
performance.
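On a shared machine it can help to cap the heap explicitly rather than rely on defaults. A minimal sketch, assuming a standard bin/solr install (the path and size shown are illustrative, not a recommendation; bin/solr's -m option sets both -Xms and -Xmx):

```shell
#!/bin/sh
# Illustrative heap cap for a shared host; tune the size against the
# guidance in the SolrPerformanceProblems wiki page linked above.
SOLR_HEAP="2g"
START_CMD="/opt/solr/bin/solr start -m ${SOLR_HEAP}"
echo "$START_CMD"
# On a real host you would run the command instead of echoing it.
```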

Thanks,
Shawn



Re:Re: solr 5.2.1, data import issue, shown processed rows doesn't match acturally indexed doc quantity.

2016-04-14 Thread cqlangyi
hi guys,


thank you very much for the help. sorry to have been so late to reply.


1. "commit" didn't help.
after commit, the 'numFound' of "*:*" query is still the same.


2. the "id" field in every doc is generated by solr using UUID. i have
no idea how to check if there is a duplicated one, but i assume
there shouldn't be, unless solr cloud has some known bug when
using UUID in a distributed environment.


the environment is


solr cloud with:
3 Linux boxes, using ZooKeeper 3.4.6 + Solr 5.2.1, Oracle JDK 1.7.80


any ideas?


thank you very much.
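(One quick way to rule duplicates in or out is to facet on the uniqueKey field with facet.mincount=2 -- any value that comes back occurs in more than one document. A sketch, with placeholder host and collection names; note that faceting on a high-cardinality id field is expensive and only suitable as a one-off diagnostic:)

```shell
#!/bin/sh
# Hypothetical host/collection; any facet bucket returned (count >= 2)
# indicates a duplicated id value.
URL="http://localhost:8983/solr/mycollection/select?q=*:*&rows=0&facet=true&facet.field=id&facet.mincount=2&facet.limit=10"
echo "$URL"
# curl "$URL"   # run against the live collection and inspect facet_fields
```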






At 2016-04-05 12:09:14, "John Bickerstaff"  wrote:
>Both of us implied it, but to be completely clear - if you have a duplicate
>ID in your data set, SOLR will throw away previous documents with that ID
>and index the new one.  That's fine if your duplicates really are
>duplicates - it's not OK if there's a problem in the data set and the
>duplicates ID's are on documents that are actually unique.
>
>On Mon, Apr 4, 2016 at 9:51 PM, John Bickerstaff 
>wrote:
>
>> Sweet - that's a good point - I ran into that too - I had not run the
>> commit for the last "batch" (I was using SolrJ) and so numbers didn't match
>> until I did.
>>
>> On Mon, Apr 4, 2016 at 9:50 PM, Binoy Dalal 
>> wrote:
>>
>>> 1) Are you sure you don't have duplicates?
>>> 2) All of your records might have been indexed but a new searcher may not
>>> have opened on the updated index yet. Try issuing a commit and see if that
>>> works.
>>>
>>> On Tue, 5 Apr 2016, 08:56 cqlangyi,  wrote:
>>>
>>> > hi there,
>>> >
>>> >
>>> > i have an solr 5.2.1,  when i do data import, after the job is done,
>>> it's
>>> > shown 165,191 rows processed successfully.
>>> >
>>> >
>>> > but when i query with *:*, the "numFound" shown only 163,349 docs in
>>> index.
>>> >
>>> >
>>> > when i tred to do it again, , it's shown 165,191 rows processed
>>> > successfully. but the *:* query result now is 162,390.
>>> >
>>> >
>>> > no errors in any log,
>>> >
>>> >
>>> > any idea?
>>> >
>>> >
>>> > thank you very much!
>>> >
>>> >
>>> > cq
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > At 2016-04-05 09:19:48, "Chris Hostetter" 
>>> > wrote:
>>> > >
>>> > >: I am not sure how to use "Sort By Function" for Case.
>>> > >:
>>> > >:
>>> |10#40|14#19|33#17|27#6|15#6|19#5|7#2|6#1|29#1|5#1|30#1|28#1|12#0|20#0|
>>> > >:
>>> > >: Can you tell how to fetch 40 when input is 10.
>>> > >
>>> > >Something like...
>>> > >
>>> >
>>> >
>>> >if(termfreq(f,10),40,if(termfreq(f,14),19,if(termfreq(f,33),17,)))
>>> > >
>>> > >But i suspect there may be a much better way to achieve your ultimate
>>> goal
>>> > >if you tell us what it is.  what do these fields represent? what makes
>>> > >these numeric values significant? do you know which values are
>>> significant
>>> > >when indexing, or do they vary for every query?
>>> > >
>>> > >https://people.apache.org/~hossman/#xyproblem
>>> > >XY Problem
>>> > >
>>> > >Your question appears to be an "XY Problem" ... that is: you are
>>> dealing
>>> > >with "X", you are assuming "Y" will help you, and you are asking about
>>> "Y"
>>> > >without giving more details about the "X" so that we can understand the
>>> > >full issue.  Perhaps the best solution doesn't involve "Y" at all?
>>> > >See Also: http://www.perlmonks.org/index.pl?node_id=542341
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >-Hoss
>>> > >http://www.lucidworks.com/
>>> >
>>> --
>>> Regards,
>>> Binoy Dalal
>>>
>>
>>