JSON facet performance for aggregations

2017-04-30 Thread Mikhail Ibraheem
Hi,

I am trying to do aggregation with JSON faceting but performance is very bad 
for one of the requests:

json.facet={
  studentId:{
    type:terms,
    limit:-1,
    field:"studentId",
    facet:{
      x:"sum(grades)"
    }
  }
}


This request takes 250 seconds to finish. We can't paginate for this service
for functional reasons, so we have to use limit:-1. The cardinality of
studentId is 7500.

If I try the same with flat faceting, it finishes in 3 seconds:
stats=true&facet=true&stats.field={!tag=piv1 sum=true}grades&facet.pivot={!stats=piv1}studentId

We are hoping to use one approach, JSON or flat, for all our services. JSON facet
performance is better in many cases.

Please advise on why the performance for this request is so bad and whether we
can improve it. Also, what is the default algorithm used for JSON faceting?

Thanks

Mikhail
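
For anyone who wants to reproduce the comparison, here is a minimal Python sketch that builds the two parameter sets described in this message. The `q=*:*` and `rows=0` parameters and both helper names are assumptions added for illustration; they are not part of the original requests.

```python
import json
from urllib.parse import urlencode


def json_facet_params(bucket_field, sum_field):
    """Build parameters for a terms facet with a nested sum() aggregation."""
    facet = {
        bucket_field: {
            "type": "terms",
            "field": bucket_field,
            "limit": -1,  # return every bucket, as in the message
            "facet": {"x": f"sum({sum_field})"},
        }
    }
    # q and rows are assumptions; the message does not show them.
    return {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}


def pivot_stats_params(bucket_field, sum_field):
    """Build the flat equivalent: facet.pivot tied to a tagged stats.field."""
    return [
        ("q", "*:*"),
        ("rows", 0),
        ("stats", "true"),
        ("facet", "true"),
        ("stats.field", f"{{!tag=piv1 sum=true}}{sum_field}"),
        ("facet.pivot", f"{{!stats=piv1}}{bucket_field}"),
    ]


# Print URL-encoded query strings that can be appended to a /select URL.
print(urlencode(json_facet_params("studentId", "grades")))
print(urlencode(pivot_stats_params("studentId", "grades")))
```

Sending both query strings to the same collection and comparing QTime in the responses is one way to confirm the gap reported here.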


Re: JSON facet performance for aggregations

2017-04-30 Thread Vijay Tiwary
JSON facets on string fields run a lot slower than on numeric fields. Try to
see whether you can represent studentId as a numeric field.



pagination of results of grouping by more than one field

2017-04-30 Thread Mikhail Ibraheem
Hi,

I have a problem: I need to group by X and Y, aggregate on Z, and paginate
over the results.

The results aren't flat; they are hierarchical. How can we flatten them so that
we can paginate over each combination of X,Y, like this:

Computers, Computer Laptops, 5.684790733920929E10
Computers, PE_Server, 1.1207993365851181E10
Computers, Monitors, 1.2723246848002455E9
Data Communications Hardware, Datacom Hardware, 6.3539691650598495E10

From this sample of the results:

"X":{
  "buckets":[{
      "val":"Computers",
      "count":981466,
      "Y":{
        "buckets":[{
            "val":"Computer Laptops",
            "count":391064,
            "sum":5.684790733920929E10},
          {
            "val":"PE_Server",
            "count":218148,
            "sum":1.1207993365851181E10},
          {
            "val":"Monitors",
            "count":122176,
            "sum":1.2723246848002455E9}]}},
    {
      "val":"Data Communications Hardware",
      "count":428230,
      "Y":{
        "buckets":[{
            "val":"Datacom Hardware",
            "count":428230,
            "sum":6.3539691650598495E10}]}},
    {
      "val":"Leasehold Improvements",
      "count":33677,
      "Y":{
        "buckets":[{
            "val":"Leasehold improvements",
            "count":33676,
            "sum":1.6308392462957385E12},
          {
            "val":"Electrical & Air Conditioning",
            "count":1,
            "sum":4505.0}]}},
    {
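
One possible approach, sketched here in Python under the assumption that the whole nested response fits in client memory, is to flatten the buckets into (X, Y, sum) rows and slice the resulting list. The function names and the client-side slicing are illustrative only and are not a Solr feature.

```python
def flatten_buckets(facets, outer="X", inner="Y", metric="sum"):
    """Turn nested terms-facet buckets into flat (outer, inner, metric) rows."""
    rows = []
    for outer_bucket in facets.get(outer, {}).get("buckets", []):
        for inner_bucket in outer_bucket.get(inner, {}).get("buckets", []):
            rows.append(
                (outer_bucket["val"], inner_bucket["val"], inner_bucket[metric])
            )
    return rows


def page(rows, offset, limit):
    """Return one page of the flattened rows."""
    return rows[offset:offset + limit]


# The first two X buckets from the sample response in this message.
sample = {
    "X": {
        "buckets": [
            {"val": "Computers", "count": 981466, "Y": {"buckets": [
                {"val": "Computer Laptops", "count": 391064,
                 "sum": 5.684790733920929E10},
                {"val": "PE_Server", "count": 218148,
                 "sum": 1.1207993365851181E10},
                {"val": "Monitors", "count": 122176,
                 "sum": 1.2723246848002455E9},
            ]}},
            {"val": "Data Communications Hardware", "count": 428230,
             "Y": {"buckets": [
                 {"val": "Datacom Hardware", "count": 428230,
                  "sum": 6.3539691650598495E10},
             ]}},
        ]
    }
}

rows = flatten_buckets(sample)
for x, y, total in page(rows, offset=0, limit=2):
    print(f"{x}, {y}, {total}")
```

This keeps the bucket order returned by Solr, so page boundaries are stable only as long as the facet sort is stable between requests.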

 


RE: JSON facet performance for aggregations

2017-04-30 Thread Mikhail Ibraheem
Hi Vijay,
It is already a numeric field.
There is a huge difference between JSON and flat faceting here. Do you know the
reason for this? Is there a way to improve it?



RE: JSON facet performance for aggregations

2017-04-30 Thread Vijay Tiwary
Please enable docValues and try again.
There is a bug in the source code that causes JSON facets on string fields
to run very slowly. On numeric fields it runs fine with docValues enabled.



RE: JSON facet performance for aggregations

2017-04-30 Thread Mikhail Ibraheem
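1- studentId has docValues="true". It is of type double, which is defined as:
<fieldType name="double" class="solr.TrieDoubleField" indexed="false" stored="true" docValues="true" multiValued="false" required="false"/>

2- If we just facet without aggregation, it finishes in good time (60 ms):

json.facet={
  studentId:{
    type:terms,
    limit:-1,
    field:"studentId"
  }
}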


Thanks




Re: Both main and replica are trying to access solr_gc.log.0.current file

2017-04-30 Thread Zheng Lin Edwin Yeo
I'm starting Solr with this command:

bin\solr.cmd start -cloud -p 8983 -s solr\node1\solr -m 8g -z "localhost:9981,localhost:9982,localhost:9983"

bin\solr.cmd start -cloud -p 8984 -s solr\node2\solr -m 8g -z "localhost:9981,localhost:9982,localhost:9983"

Regards,
Edwin

On 30 April 2017 at 13:52, Mike Drob  wrote:

> It might depend some on how you are starting Solr (I am less familiar with
> Windows) but you will need to give each instance a separate log4j.properties
> file and configure the log location in there.
>
> Also check out the Solr Ref Guide section on Configuring Logging,
> subsection Permanent Logging Settings.
>
> https://cwiki.apache.org/confluence/display/solr/Configuring+Logging
>
> Mike
>
> On Sat, Apr 29, 2017, 12:24 PM Zheng Lin Edwin Yeo 
> wrote:
>
> > Yes, both Solr instances are running in the same hardware.
> >
> > I believe they are pointing to the same log directories/config too.
> >
> > How do we point them to different log directories/config?
> >
> > Regards,
> > Edwin
> >
> >
> > On 30 April 2017 at 00:36, Mike Drob  wrote:
> >
> > > Are you running both Solr instances on the same hardware and pointing
> > > them at the same log directories/config?
> > >
> > > On Sat, Apr 29, 2017, 2:56 AM Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm using Solr 6.4.2 on SolrCloud, and I'm running 2 replicas of Solr.
> > > >
> > > > When I start the replica, I encounter this error message. It is
> > > > probably due to the Solr log, as both the main and the replica are
> > > > trying to access the same solr_gc.log.0.current file.
> > > >
> > > > Is there any way to prevent this?
> > > >
> > > > Besides this error message, the rest of Solr for both main and
> > > > replica is running normally.
> > > >
> > > > Exception in thread "main" java.nio.file.FileSystemException:
> > > > C:\edwin\solr\server\logs\solr_gc.log.0.current ->
> > > > C:\edwin\solr\server\logs\archived\solr_gc.log.0.current: The process
> > > > cannot access the file because it is being used by another process.
> > > >
> > > > at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> > > > at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> > > > at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
> > > > at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
> > > > at java.nio.file.Files.move(Files.java:1395)
> > > > at org.apache.solr.util.SolrCLI$UtilsTool.archiveGcLogs(SolrCLI.java:3579)
> > > > at org.apache.solr.util.SolrCLI$UtilsTool.runTool(SolrCLI.java:3548)
> > > > at org.apache.solr.util.SolrCLI.main(SolrCLI.java:250)
> > > > "Failed archiving old GC logs"
> > > > Exception in thread "main" java.nio.file.FileSystemException:
> > > > C:\edwin\solr\server\logs\solr-8983-console.log ->
> > > > C:\edwin\solr\server\logs\archived\solr-8983-console.log: The process
> > > > cannot access the file because it is being used by another process.
> > > >
> > > > at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> > > > at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> > > > at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
> > > > at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
> > > > at java.nio.file.Files.move(Files.java:1395)
> > > > at org.apache.solr.util.SolrCLI$UtilsTool.archiveConsoleLogs(SolrCLI.java:3608)
> > > > at org.apache.solr.util.SolrCLI$UtilsTool.runTool(SolrCLI.java:3551)
> > > > at org.apache.solr.util.SolrCLI.main(SolrCLI.java:250)
> > > > "Failed archiving old console logs"
> > > > Exception in thread "main" java.nio.file.FileSystemException:
> > > > C:\edwin\solr\server\logs\solr.log ->
> > > > C:\edwin\solr\server\logs\solr.log.1: The process
> > > > cannot access the file because it is being used by another process.
> > > >
> > > > at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> > > > at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> > > > at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
> > > > at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
> > > > at java.nio.file.Files.move(Files.java:1395)
> > > > at org.apache.solr.util.SolrCLI$UtilsTool.rotateSolrLogs(SolrCLI.java:3651)
> > > > at org.apache.solr.util.SolrCLI$UtilsTool.runTool(SolrCLI.java:3545)
> > > > at org.apache
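
As a rough illustration of giving each instance its own log location, the Python sketch below builds one start command per node together with an environment that sets a different SOLR_LOGS_DIR for each. It assumes the installed bin\solr.cmd honors the SOLR_LOGS_DIR environment variable, which should be checked against solr.in.cmd for the version in use. The node-<port> directory layout and the helper name are hypothetical.

```python
import os
from pathlib import PureWindowsPath

ZK_HOSTS = "localhost:9981,localhost:9982,localhost:9983"


def node_launch_spec(port, solr_home, zk_hosts, logs_root, memory="8g"):
    """Return (command, env) for one node with its own log directory."""
    # Hypothetical layout: one sub-directory per port under logs_root.
    logs_dir = str(PureWindowsPath(logs_root) / f"node-{port}")
    # Assumption: bin\solr.cmd reads SOLR_LOGS_DIR from the environment.
    env = dict(os.environ, SOLR_LOGS_DIR=logs_dir)
    command = [
        "bin\\solr.cmd", "start", "-cloud",
        "-p", str(port),
        "-s", solr_home,
        "-m", memory,
        "-z", zk_hosts,
    ]
    return command, env


for port, home in [(8983, "solr\\node1\\solr"), (8984, "solr\\node2\\solr")]:
    command, env = node_launch_spec(
        port, home, ZK_HOSTS, "C:\\edwin\\solr\\server\\logs"
    )
    # Pass command and env to subprocess.run(...) to actually start the node.
    print(env["SOLR_LOGS_DIR"], " ".join(command))
```

With separate directories, the archive and rotate steps that run at startup for one node no longer touch files that the other node holds open.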

Re: JSON facet performance for aggregations

2017-04-30 Thread Yonik Seeley
It is odd there would be quite such a big performance delta.
What version of solr are you using?
What is the fieldType of "grades"?
-Yonik


On Sun, Apr 30, 2017 at 5:15 AM, Mikhail Ibraheem
 wrote:
> 1-
> studentId has docValue = true . it is of type double which is <fieldType name="double" class="solr.TrieDoubleField" indexed="false" stored="true"
> docValues="true" multiValued="false" required="false"/>
>
>
> 2- If we just facet without aggregation it finishes in good time 60ms:
>
> json.facet={
>studentId:{
>   type:terms,
>   limit:-1,
>   field:" studentId "
>
>}
> }
>
>
> Thanks
>
>


Re: Poll: Master-Slave or SolrCloud?

2017-04-30 Thread Yonik Seeley
On Tue, Apr 25, 2017 at 1:33 PM, Otis Gospodnetić
 wrote:
> I think I saw mentions (maybe on user or dev MLs or JIRA) about
> potentially, in the future, there only being SolrCloud mode (and dropping
> SolrCloud name in favour of Solr).

I personally never saw this actually happening, and not because of any
complexity issues with "getting started with SolrCloud", although I
think continuing improvements there are a good thing.

Many times, I see these two things conflated:
1) how easy it is to get SolrCloud set up
2) the inherent internal complexity of a system

We can always improve #1, but that does not imply improvement in #2
(and may actually increase internal complexity).

A system where you can just fire up a node pointed at a directory and
not worry about any shared state is very easy to understand, debug,
hack around, and build very complex custom systems around.

-Yonik


RE: JSON facet performance for aggregations

2017-04-30 Thread Mikhail Ibraheem
Hi Yonik,
We are using Solr 6.5
Both studentId and grades are double:
  

We have 1.5 million records.

Thanks
Mikhail



Re: Poll: Master-Slave or SolrCloud?

2017-04-30 Thread Shawn Heisey
On 4/25/2017 3:13 PM, Otis Gospodnetić wrote:
> Could one run *only* embedded ZK on some SolrCloud nodes, sans any data?
> It would be equivalent of dedicated Elasticsearch nodes, which is the
> current ES best practice/recommendation.  I've never heard of anyone being
> scared of running 3 dedicated master ES nodes, so if SolrCloud offered the
> same, perhaps even completely hiding ZK from users, that would present the
> same level of complexity (err, simplicity) ES users love about ES.  Don't
> want to talk about SolrCloud vs. ES here at all, just trying to share
> observations since we work a lot with both Elasticsearch and Solr(Cloud) at
> Sematext.

Yes, you could do that ... but I don't see any real value right now. 
You have to learn how to configure a redundant ZK ensemble and apply
that configuration to the embedded servers manually.  Since that's not
any different from what you'd do with an external ensemble, I think it's
better to just use the external install.  As I understand it, elastic
wrote their cluster code themselves ... it's part of ES, not provided by
a separate software package, so their recommendation makes sense for ES.

Using embedded ZK as you have described, there will be at least three
extra Solr nodes that are not intended to host collections.  To keep it
running this way, it will be important to explicitly avoid putting new
collections on those nodes, because that won't happen by default.  With
dedicated external ZK processes, there's no Solr node to worry about,
and no need to create a "master node" capability.

I'm not opposed to automated scripts included with Solr to configure and
start standalone ZK processes, including a way to create an init
script.  That would be very useful and go a long way towards extremely
easy instructions for setting up a fault tolerant SolrCloud installation
on multiple servers.

In situations where ZK is installed on dedicated hardware, a native ZK
will require less heap memory than one embedded in Solr, and probably
will have slightly lower CPU requirements.

SOLR-9386 does make your idea more viable because it brings the full
capability of recent zookeeper configuration options to the embedded
server.  It will be available in version 6.6.

Thanks,
Shawn



Re: Poll: Master-Slave or SolrCloud?

2017-04-30 Thread Ganesh M
We use ZooKeeper for Hadoop / HBase, so we use the same ensemble for Solr
too. We are using SolrCloud on EC2 instances with 6 collections containing
4 shards and 2 replicas.

We followed one of the blogs on the internet for our setup, and it works
fine. Though that setup is on Tomcat, it can also be used with the latest
Solr version on Jetty with little change.

Hope this is useful.

Regards,






Re: Solr performance on EC2 linux

2017-04-30 Thread Jeff Wartes
I’d like to think I helped a little with the metrics upgrade that got released 
in 6.4, so I was already watching that and I’m aware of the resulting 
performance issue.
This was 5.4 though, patched with https://github.com/whitepages/SOLR-4449 - an 
index we’ve been running for some time now.

Mganeshs’s comment that he doesn’t see a difference on EC2 with Solr 6.2 lends 
some additional strength to the thought that something changed between Lucene 
5.4 and 6.2 (which is used in ES 5), but of course it’s all still pretty 
anecdotal.


On 4/28/17, 11:44 AM, "Erick Erickson"  wrote:

Well, 6.4.0 had a pretty severe performance issue so if you were using
that release you might see this, 6.4.2 is the most recent 6.4 release.
But I have no clue how changing linux settings would alter that and I
sure can't square that issue with you having such different
performance between local and EC2

But thanks for telling us about this! It's totally baffling

Erick

On Fri, Apr 28, 2017 at 9:09 AM, Jeff Wartes  wrote:
>
> tldr: Recently, I tried moving an existing solrcloud configuration from a 
local datacenter to EC2. Performance was roughly 1/10th what I’d expected, 
until I applied a bunch of linux tweaks.
>
> This should’ve been a straight port: one datacenter server -> one EC2 
node. Solr 5.4, Solrcloud, Ubuntu xenial. Nodes were sized in both cases such 
that the entire index could be cached in memory, and the JVM settings were 
identical in both places. I applied what should’ve been a comfortable load to 
the EC2 cluster, and everything exploded. I had to back the rate down to 
something close to 10% of what I had been getting in the datacenter before 
latency improved.
> Looking around, I was interested to note that under load, user-time CPU 
usage was being shadowed by an almost equal amount of system CPU time. This was 
not IOWait, but system time. Strace showed a bunch of time being spent in futex 
and restart_syscall, but I couldn’t see where to go from there.
>
> Interestingly, a coworker playing with an Elasticsearch (ES 5.x, so a much 
more recent release) alternate implementation of the same index was not seeing 
this high-system-time behavior on EC2, and was getting throughput consistent 
with our general expectations.
>
> Eventually, we came across this: 
http://www.brendangregg.com/blog/2015-03-03/performance-tuning-linux-instances-on-ec2.html
> In direct opposition to the author’s intent, (something about taking 
expired medication) we applied these settings blindly to see what happened. The 
difference was breathtaking. The system time usage disappeared, and I could 
apply load at and even a little above my expected rates, well within my latency 
goals.
>
> There are a number of settings involved, and we haven’t isolated for sure 
which ones made the biggest difference, but my guess at the moment is that it’s 
the change of clocksource. I think this would be consistent with the observed 
system time. Note however that using the “tsc” clocksource on EC2 is generally 
discouraged, because it’s possible to get backwards clock drift.
>
> I’m writing this for a few reasons:
>
> 1.   The performance difference was so crazy that I feel this 
should be broader knowledge.
>
> 2.   Is anyone aware of anything that changed in Lucene between 
5.4 and 6.x that could explain why Elasticsearch wasn’t suffering from this? If 
it’s the clocksource that’s the issue, there’s an implication that Solr was 
using tons more system calls like gettimeofday that the EC2 (xen) hypervisor 
doesn’t allow in userspace.
>
> 3.   Has anyone run Solr with the “tsc” clocksource, and is anyone aware of 
any concrete issues?
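
To check which clocksource a Linux instance is currently using before changing anything, a small Python sketch like the one below may help. It only reads the standard sysfs files; the function name is illustrative, and switching the clocksource is deliberately left out.

```python
from pathlib import Path

# Standard sysfs location of the clocksource settings on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base_dir=CLOCKSOURCE_DIR):
    """Return (current clocksource, list of available clocksources)."""
    base = Path(base_dir)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


try:
    current, available = read_clocksource()
    print(f"current: {current}; available: {', '.join(available)}")
except OSError as exc:
    # Not on Linux, or sysfs is not readable in this environment.
    print(f"clocksource information not available here: {exc}")
```

Recording this value alongside latency measurements would make it easier to tell whether the clocksource, rather than one of the other settings, accounts for the difference.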
>
>




Re: Spatial Search: can not use FieldCache on a field which is neither indexed nor has doc values: latitudeLongitude_0_coordinate

2017-04-30 Thread David Smiley
Frederick,

RE LatLonType: Weird. Is the dynamic field "*_coordinate" defined?  It
should be; ensure it has indexed=true on it.  I forget whether indexed needs
to be set on that or on the LLT field that refers to it, but to be sure, set
it on both.

RE LatLonPointSpatialField: You should use this for sure assuming you are
using the latest Solr release (6.5.x).  You said "Solr version 6.1.0" which
doesn't have this field type though.

~ David

On Thu, Apr 27, 2017 at 8:26 AM freddy79 
wrote:

> Hi,
>
> when doing a query with spatial search i get the error: can not use
> FieldCache on a field which is neither indexed nor has doc values:
> latitudeLongitude_0_coordinate
>
> *SOLR Version:* 6.1.0
> *schema.xml:*
>
>  subFieldSuffix="_coordinate" />
>  stored="false" multiValued="false"  />
>
> *Query:*
>
> http://localhost:8983/solr/career_educationVacancyLocation/select?q=*:*&fq={!geofilt}&sfield=latitudeLongitude&pt=48.15,16.23&d=10
>
> *Error Message:*
> can not use FieldCache on a field which is neither indexed nor has doc
> values: latitudeLongitude_0_coordinate
>
> What is wrong? Thanks.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Spatial-Search-can-not-use-FieldCache-on-a-field-which-is-neither-indexed-nor-has-doc-values-latitude-tp4332185.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
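
For retesting after a schema change, the geofilt request quoted in this thread can be rebuilt with proper URL encoding. This Python sketch reuses the host, collection, field, point, and distance from the quoted query; the helper name is hypothetical.

```python
from urllib.parse import urlencode


def geofilt_url(base_url, collection, sfield, lat, lon, distance_km):
    """Build a /select URL that filters by distance from a point."""
    params = [
        ("q", "*:*"),
        ("fq", "{!geofilt}"),
        ("sfield", sfield),       # the spatial field to filter on
        ("pt", f"{lat},{lon}"),   # center point as "lat,lon"
        ("d", distance_km),       # radius in kilometers
    ]
    return f"{base_url}/{collection}/select?{urlencode(params)}"


print(geofilt_url(
    "http://localhost:8983/solr",
    "career_educationVacancyLocation",
    "latitudeLongitude",
    48.15, 16.23, 10,
))
```

Encoding the braces and the comma avoids problems with HTTP clients that reject raw `{`, `}`, or `!` characters in a query string.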


Step By Step guide to create Solr Cloud in Solr 6.x

2017-04-30 Thread Nilesh Kamani
Hello All,

Sorry to bother you all again. I am having a hard time understanding Solr
terminology.

Is there a step-by-step guide to creating a SolrCloud cluster in Solr 6.x?

I have two servers on Google Cloud and have installed Solr on both of
them.

I would like to create one collection, with shard1 on server1 and shard2 on
server2 (replicas).

I want to index a few GB of documents on Shard1/Server1 and a few GB of
documents on Shard2/Server2.

Could you please point me to a link or video?

Thanks,
Nilesh Kamani
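
The layout described above (one collection, one shard per server) can be
sketched as a Collections API CREATE request. This is only an illustration: it
assumes both Solr instances were started in cloud mode against the same
ZooKeeper ensemble (for example with bin/solr start -c -z <zkhost>), and the
host names, collection name and helper function are hypothetical. Note that
createNodeSet only restricts which nodes are used; it does not by itself pin
shard1 to a particular server.

```python
from urllib.parse import urlencode


def create_collection_url(solr_base, name, nodes, replication_factor=1):
    """Build a Collections API CREATE URL with one shard per listed node."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": len(nodes),
        "replicationFactor": replication_factor,
        # With as many shards as nodes, each node hosts replication_factor cores.
        "maxShardsPerNode": replication_factor,
        # Node names use the host:port_solr form shown in the Cloud admin view.
        "createNodeSet": ",".join(nodes),
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


url = create_collection_url(
    "http://server1:8983/solr",
    "docs",
    ["server1:8983_solr", "server2:8983_solr"],
)
```

Requesting the resulting URL once, from either node, asks the cluster to
create the collection; documents sent to the collection are then routed to the
shards by Solr rather than indexed into a specific server by hand.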


Slow indexing speed when collection size is large

2017-04-30 Thread Zheng Lin Edwin Yeo
Hi,

I'm using Solr 6.4.2.

I would like to check: if there are a lot of collections in my Solr that have
very large index sizes, will the indexing speed be affected?

Currently, I have created a new collection in a Solr instance that already has
several collections with very large index sizes, and the indexing speed is much
slower than expected.

Regards,
Edwin


Re: Step By Step guide to create Solr Cloud in Solr 6.x

2017-04-30 Thread Nilesh Kamani
UPDATE -

I was able to get shard1 on server 1 and shard2 on server 2, and a core on
server 1, in the cluster.

How can I add another node/core, located on server 2, to the cluster?
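
One way to add a core on a specific node is the Collections API ADDREPLICA
action. The sketch below only builds the request URL; the collection name,
shard name, node name and helper function are assumptions for illustration,
and the target node must already be part of the same SolrCloud cluster.

```python
from urllib.parse import urlencode


def add_replica_url(solr_base, collection, shard, node):
    """Build a Collections API ADDREPLICA URL placing a new replica on `node`."""
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "node": node,  # host:port_solr form, as listed under live nodes
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


replica_url = add_replica_url(
    "http://server1:8983/solr", "docs", "shard1", "server2:8983_solr"
)
```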




On Sun, Apr 30, 2017 at 9:48 PM, Nilesh Kamani 
wrote:

> Hello All,
>
> Sorry to bother you all again. I am having hard time understanding solr
> terminologies.
>
> Is there any step by step guide to create solr cloud in Solr 6.x ?
>
> I have two servers on my google cloud and have installed solr on both of
> them.
>
> I would like to create one collection, shard1 on server1, shard2 on
> server2, (replicas).
>
> I want to index few GBs of documents on Shard1/Server1 and few GBs
> documents on Shard2/Server1.
>
> Could you please point me to a link or video ?
>
> Thanks,
> Nilesh Kamani
>
>
>
>


Re: Step By Step guide to create Solr Cloud in Solr 6.x

2017-04-30 Thread Nilesh Kamani
UPDATE -

After restarting the server, I can see that issue has been resolved for now.


On Sun, Apr 30, 2017 at 11:12 PM, Nilesh Kamani 
wrote:

> UPDATE -
>
> Able to get shard1 on server and shard2 on server 2 and core on server 1
> in the cluster.
>
> How can I add another node/core to cluster which is on server 2.
>
>
>
>
> On Sun, Apr 30, 2017 at 9:48 PM, Nilesh Kamani 
> wrote:
>
>> Hello All,
>>
>> Sorry to bother you all again. I am having hard time understanding solr
>> terminologies.
>>
>> Is there any step by step guide to create solr cloud in Solr 6.x ?
>>
>> I have two servers on my google cloud and have installed solr on both of
>> them.
>>
>> I would like to create one collection, shard1 on server1, shard2 on
>> server2, (replicas).
>>
>> I want to index few GBs of documents on Shard1/Server1 and few GBs
>> documents on Shard2/Server1.
>>
>> Could you please point me to a link or video ?
>>
>> Thanks,
>> Nilesh Kamani
>>
>>
>>
>>
>


Building Solr greater than 6.2.1

2017-04-30 Thread Ryan Yacyshyn
Hi all,

I'm trying to build Solr 6.5.1 but it is failing. I'm able to
successfully build 6.2.1. I've tried 6.4.0, 6.4.2, and 6.5.1, but the build
fails each time. I'm not sure what the issue could be. I'm running `ant server`
in the solr dir and this is where it fails:

ivy-configure:
[ivy:configure] :: loading settings :: file =
/Users/rye/lucene-solr2/lucene/top-level-ivy-settings.xml

resolve:

common.init:

compile-lucene-core:

init:

-clover.disable:

-clover.load:

-clover.classpath:

-clover.setup:

clover:

compile-core:

-clover.disable:

-clover.load:

-clover.classpath:

-clover.setup:

clover:

common.compile-core:
[mkdir] Created dir:
/Users/rye/lucene-solr2/lucene/build/test-framework/classes/java
[javac] Compiling 186 source files to
/Users/rye/lucene-solr2/lucene/build/test-framework/classes/java
[javac]
/Users/rye/lucene-solr2/lucene/test-framework/src/java/org/apache/lucene/util/RamUsageTester.java:164:
error: no suitable method found for
collect(Collector>)
[javac]   .collect(Collectors.toList());
[javac]   ^
[javac] method Stream.collect(Supplier,BiConsumer,BiConsumer) is not applicable
[javac]   (cannot infer type-variable(s) R#1
[javac] (actual and formal argument lists differ in length))
[javac] method Stream.collect(Collector) is not applicable
[javac]   (cannot infer type-variable(s) R#2,A,CAP#3,T#2
[javac] (argument mismatch; Collector>
cannot be converted to Collector>))
[javac]   where R#1,T#1,R#2,A,T#2 are type-variables:
[javac] R#1 extends Object declared in method
collect(Supplier,BiConsumer,BiConsumer)
[javac] T#1 extends Object declared in interface Stream
[javac] R#2 extends Object declared in method
collect(Collector)
[javac] A extends Object declared in method
collect(Collector)
[javac] T#2 extends Object declared in method toList()
[javac]   where CAP#1,CAP#2,CAP#3,CAP#4 are fresh type-variables:
[javac] CAP#1 extends Object from capture of ?
[javac] CAP#2 extends Object from capture of ?
[javac] CAP#3 extends Object from capture of ?
[javac] CAP#4 extends Object from capture of ?
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] 1 error

BUILD FAILED
/Users/rye/lucene-solr2/solr/build.xml:463: The following error occurred
while executing this line:
/Users/rye/lucene-solr2/solr/common-build.xml:476: The following error
occurred while executing this line:
/Users/rye/lucene-solr2/solr/contrib/map-reduce/build.xml:53: The following
error occurred while executing this line:
/Users/rye/lucene-solr2/solr/contrib/morphlines-cell/build.xml:45: The
following error occurred while executing this line:
/Users/rye/lucene-solr2/solr/common-build.xml:443: The following error
occurred while executing this line:
/Users/rye/lucene-solr2/solr/test-framework/build.xml:35: The following
error occurred while executing this line:
/Users/rye/lucene-solr2/lucene/common-build.xml:767: The following error
occurred while executing this line:
/Users/rye/lucene-solr2/lucene/common-build.xml:501: The following error
occurred while executing this line:
/Users/rye/lucene-solr2/lucene/common-build.xml:1967: Compile failed; see
the compiler error output for details.

Total time: 2 minutes 28 seconds

Java version:

java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

ant: Apache Ant(TM) version 1.10.0 compiled on December 27 2016
ivy: ivy-2.3.0.jar

Any suggestions I can try?

Regards,
Ryan


BooleanQuery and WordDelimiterFilter

2017-04-30 Thread Avi Steiner
Hi

I have a question regarding the use of the query parser and BooleanQuery.

I have 3 documents indexed.
Doc1 contains the words huntman's and huntman
Doc2 contains the word huntman's
Doc3 contains the word huntman

When I search for huntman's I get Doc1 and Doc2
When I search for +huntman's I get Doc1, Doc2 and Doc3

As far as I understand, when I search for huntman's it should return documents 
with both huntman and huntman's (using WordDelimiterFilter).
I also know that a plus sign means that the term must be in the document, and 
the absence of a plus (or minus) sign means that the term may or may not be in 
the document, as explained here: 
https://lucidworks.com/2011/12/28/why-not-and-or-and-not/

So I don't understand the combination of these two properties.
I think I understand why +huntman's returns Doc3 as well: it can be 
translated to +(huntman's OR huntman), which means the document must contain 
one of the following: huntman's or huntman.
But I don't understand why Doc3 is not returned by huntman's as well. Isn't it 
translated to huntman's OR huntman?
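
The must/should rule referred to above can be illustrated with a toy model. It
is only a sketch: it treats each document as a plain set of terms, ignores
analysis (including WordDelimiterFilter), scoring and minimum-should-match,
and the function names are hypothetical. It shows what the boolean rule alone
predicts, not what the query parser actually generates.

```python
def matches(doc_terms, must=(), should=()):
    """Return True if a document satisfies a flat boolean query."""
    terms = set(doc_terms)
    # Every MUST term has to be present.
    if not all(term in terms for term in must):
        return False
    # Once a MUST clause exists, SHOULD clauses are optional.
    if must:
        return True
    # With no MUST clause, at least one SHOULD term has to match.
    return any(term in terms for term in should)


DOCS = {
    "Doc1": {"huntman's", "huntman"},
    "Doc2": {"huntman's"},
    "Doc3": {"huntman"},
}


def search(must=(), should=()):
    """Return the sorted names of all documents matching the query."""
    return sorted(name for name, terms in DOCS.items() if matches(terms, must, should))


print(search(should=["huntman's", "huntman"]))        # ['Doc1', 'Doc2', 'Doc3']
print(search(must=["huntman's"], should=["huntman"]))  # ['Doc1', 'Doc2']
```

Under this model a pure "huntman's OR huntman" query does return Doc3, so the
observed difference presumably comes from how the query string is analyzed and
turned into clauses; the parsed query in the debugQuery=true output would show
that.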

Thanks

Avi





Re: Building Solr greater than 6.2.1

2017-04-30 Thread Alexandre Rafalovitch
Make sure your Java is on the latest update. Seriously.

Also, if it is still failing, try blowing away your Ivy cache.
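
As a rough sketch, the update level can be read out of the `java -version`
banner like this. The helper name is hypothetical and the pattern only covers
1.8-style version strings such as the one in the report; which update is
recent enough is not stated in this thread, so any threshold is up to the
reader.

```python
import re


def parse_java_version(version_output):
    """Extract (major, minor, patch, update) from a banner such as 1.8.0_25."""
    match = re.search(r'version "(\d+)\.(\d+)\.(\d+)(?:_(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    major, minor, patch, update = match.groups()
    # A banner without an _NN suffix is treated as update 0.
    return int(major), int(minor), int(patch), int(update or 0)


print(parse_java_version('java version "1.8.0_25"'))  # (1, 8, 0, 25)
```

The returned tuples compare naturally, so a check against a chosen minimum is
just parse_java_version(banner) >= (1, 8, 0, minimum_update).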

Regards,
Alex

On 1 May 2017 6:34 AM, "Ryan Yacyshyn"  wrote:

> Hi all,
>
> I'm trying to build Solr 6.5.1 but it's is failing. I'm able to
> successfully build 6.2.1. I've tried 6.4.0, 6.4.2, and 6.5.1 but the build
> fails. I'm not sure what the issue could be. I'm running `ant server` in
> the solr dir and this is where it fails:
>
> ivy-configure:
> [ivy:configure] :: loading settings :: file =
> /Users/rye/lucene-solr2/lucene/top-level-ivy-settings.xml
>
> resolve:
>
> common.init:
>
> compile-lucene-core:
>
> init:
>
> -clover.disable:
>
> -clover.load:
>
> -clover.classpath:
>
> -clover.setup:
>
> clover:
>
> compile-core:
>
> -clover.disable:
>
> -clover.load:
>
> -clover.classpath:
>
> -clover.setup:
>
> clover:
>
> common.compile-core:
> [mkdir] Created dir:
> /Users/rye/lucene-solr2/lucene/build/test-framework/classes/java
> [javac] Compiling 186 source files to
> /Users/rye/lucene-solr2/lucene/build/test-framework/classes/java
> [javac]
> /Users/rye/lucene-solr2/lucene/test-framework/src/
> java/org/apache/lucene/util/RamUsageTester.java:164:
> error: no suitable method found for
> collect(Collector>)
> [javac]   .collect(Collectors.toList());
> [javac]   ^
> [javac] method Stream.collect(Supplier,BiConsumer super CAP#2>,BiConsumer) is not applicable
> [javac]   (cannot infer type-variable(s) R#1
> [javac] (actual and formal argument lists differ in length))
> [javac] method Stream.collect(Collector CAP#2,A,R#2>) is not applicable
> [javac]   (cannot infer type-variable(s) R#2,A,CAP#3,T#2
> [javac] (argument mismatch; Collector>
> cannot be converted to Collector>))
> [javac]   where R#1,T#1,R#2,A,T#2 are type-variables:
> [javac] R#1 extends Object declared in method
> collect(Supplier,BiConsumer T#1>,BiConsumer)
> [javac] T#1 extends Object declared in interface Stream
> [javac] R#2 extends Object declared in method
> collect(Collector)
> [javac] A extends Object declared in method
> collect(Collector)
> [javac] T#2 extends Object declared in method toList()
> [javac]   where CAP#1,CAP#2,CAP#3,CAP#4 are fresh type-variables:
> [javac] CAP#1 extends Object from capture of ?
> [javac] CAP#2 extends Object from capture of ?
> [javac] CAP#3 extends Object from capture of ?
> [javac] CAP#4 extends Object from capture of ?
> [javac] Note: Some input files use or override a deprecated API.
> [javac] Note: Recompile with -Xlint:deprecation for details.
> [javac] 1 error
>
> BUILD FAILED
> /Users/rye/lucene-solr2/solr/build.xml:463: The following error occurred
> while executing this line:
> /Users/rye/lucene-solr2/solr/common-build.xml:476: The following error
> occurred while executing this line:
> /Users/rye/lucene-solr2/solr/contrib/map-reduce/build.xml:53: The
> following
> error occurred while executing this line:
> /Users/rye/lucene-solr2/solr/contrib/morphlines-cell/build.xml:45: The
> following error occurred while executing this line:
> /Users/rye/lucene-solr2/solr/common-build.xml:443: The following error
> occurred while executing this line:
> /Users/rye/lucene-solr2/solr/test-framework/build.xml:35: The following
> error occurred while executing this line:
> /Users/rye/lucene-solr2/lucene/common-build.xml:767: The following error
> occurred while executing this line:
> /Users/rye/lucene-solr2/lucene/common-build.xml:501: The following error
> occurred while executing this line:
> /Users/rye/lucene-solr2/lucene/common-build.xml:1967: Compile failed; see
> the compiler error output for details.
>
> Total time: 2 minutes 28 seconds
>
> Java version:
>
> java version "1.8.0_25"
> Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
>
> ant: Apache Ant(TM) version 1.10.0 compiled on December 27 2016
> ivy: ivy-2.3.0.jar
>
> Any suggestions I can try?
>
> Regards,
> Ryan
>