Re: Can TrieDateField fields be null?

2015-08-23 Thread Upayavira
To be strict about it, I'd say that TrieDateFields CANNOT be null, but
they CAN be excluded from the document.

You could then check whether or not a value exists for this field.
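
For example, with SolrJ (a sketch; the field name "my_date" is a placeholder):

import org.apache.solr.client.solrj.SolrQuery;

class NullDateCheck {
  public static void main(String[] args) {
    // Match documents where the date field has a value:
    SolrQuery hasDate = new SolrQuery("*:*").addFilterQuery("my_date:[* TO *]");
    // Match documents where the field is absent (the closest thing to "null"):
    SolrQuery noDate = new SolrQuery("*:*").addFilterQuery("-my_date:[* TO *]");
    System.out.println(hasDate + "\n" + noDate);
  }
}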

Upayavira

On Sun, Aug 23, 2015, at 02:55 AM, Erick Erickson wrote:
> TrieDateFields can be null; rather, the field can simply be absent from the document.
> I just verified this with 4.10.
> 
> How are you indexing? I suspect that somehow the program that's sending
> things to Solr is putting the default time in.
> 
> What version of Solr?
> 
> Best,
> Erick
> 
> On Sat, Aug 22, 2015 at 4:04 PM, Henrique O. Santos 
> wrote:
> > Hello,
> >
> > Just a simple question. Can TrieDateField fields be null? I have a schema
> > with the following field and type:
> > <field name="..." type="..." docValues="true" />
> > <fieldType name="..." class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
> >
> > Every time I index a document with no value for this field, the current time
> > gets indexed and stored. Is there any way to make this field null?
> >
> > My use case for this collection requires that I check if that date field is
> > already filled or not.
> >
> > Thank you,
> > Henrique.


SOLR 5.3

2015-08-23 Thread William Bell
At lucene.apache.org/solr it says SOLR 5.3 is there, but when I click on
downloads it shows Solr 5.2.1... ??

"APACHE SOLR™ 5.3.0Solr is the popular, blazing-fast, open source
enterprise search platform built on Apache Lucene™."

-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: solr add document

2015-08-23 Thread CrazyDiamond
Thanks, I just needed to call solr.commit().



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-add-document-tp4224480p4224698.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH delta-import pk

2015-08-23 Thread CrazyDiamond
As far as I understand, I can't use 2 unique fields. I need both the db id and
the uuid because I am moving data from the database to the Solr index entirely.
Temporarily I need it to be compatible with delta-import, but in the future I
will use only the uuid.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-delta-import-pk-tp4224342p4224699.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Toke Eskildsen
Zheng Lin Edwin Yeo  wrote:
> However, I find that clustering is exceedingly slow after I index this 1GB of
> data. It took almost 30 seconds to return the cluster results when I set it
> to cluster the top 1000 records, and still takes more than 3 seconds when I
> set it to cluster the top 100 records.

Your clustering uses Carrot2, which fetches the top documents and performs 
real-time clustering on them - that process is (nearly) independent of index 
size. The relevant numbers here are top 1000 and top 100, not 1GB. The unknown 
part is whether it is the fetching of top 1000 (the Solr part) or the 
clustering itself (the Carrot part) that is the bottleneck.
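
A quick way to tell the two apart (a rough SolrJ sketch; the URL, core name and
handler path are assumptions) is to run the same top-N request with the
clustering component switched off and then on, and compare QTime:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

class ClusterTiming {
  public static void main(String[] args) throws Exception {
    HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/mycore");
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(1000);
    q.setRequestHandler("/clustering");
    q.set("clustering", false);                 // fetch only: the Solr part
    int fetchMs = solr.query(q).getQTime();
    q.set("clustering", true);                  // fetch plus Carrot2 clustering
    int totalMs = solr.query(q).getQTime();
    System.out.printf("fetch: %d ms, clustering overhead: ~%d ms%n",
        fetchMs, totalMs - fetchMs);
    solr.close();
  }
}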

- Toke Eskildsen


Re: DIH delta-import pk

2015-08-23 Thread CrazyDiamond
Now I set the db id as the unique field, plus a uuid field which should be
generated automatically as required. But when I add a document I get an error
that my required uuid field is missing.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-delta-import-pk-tp4224342p4224701.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR 5.3

2015-08-23 Thread Arcadius Ahouansou
Solr 5.3 is already available for download from
http://mirror.catn.com/pub/apache/lucene/solr/5.3.0/

The redirection on the web site will probably be fixed before we get the
official announcement.

Arcadius.

On 23 August 2015 at 09:00, William Bell  wrote:

> At lucene.apache.org/solr it says SOLR 5.3 is there, but when I click on
> downloads it shows Solr 5.2.1... ??
>
> "APACHE SOLR™ 5.3.0Solr is the popular, blazing-fast, open source
> enterprise search platform built on Apache Lucene™."
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>



-- 
Arcadius Ahouansou
Menelic Ltd | Information is Power
M: 07908761999
W: www.menelic.com
---


Re: Remove duplicate suggestions in Solr

2015-08-23 Thread Arcadius Ahouansou
Hi Edwin.

What you are doing here is "search"; Solr has separate components for
doing suggestions.

About dedup,

- have a look at  the manual
https://cwiki.apache.org/confluence/display/solr/De-Duplication

- or simply do your dedup upfront, before ingesting into Solr, by assigning
the same "id" to all docs with the same "textng" (this may require a different
index if you want to keep the existing data with duplicates for other purposes)

- Or you could use result grouping/fieldCollapsing to group/dedup your
results (see the sketch below)
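
A minimal SolrJ sketch of the collapsing option ("textng_s" is a hypothetical
single-valued, docValues string copy of the suggestion text; the
CollapsingQParserPlugin cannot collapse on a tokenized field directly):

import org.apache.solr.client.solrj.SolrQuery;

class DedupSuggestions {
  public static void main(String[] args) {
    SolrQuery q = new SolrQuery("do our best");
    // Keep only one document per distinct value of the collapse field.
    q.addFilterQuery("{!collapse field=textng_s}");
    System.out.println(q);
  }
}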

Hope this helps

Arcadius.


On 21 August 2015 at 06:41, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> I would like to check, is there any way to remove duplicate suggestions in
> Solr?
> I have several documents that looks very similar, and when I do a
> suggestion query, it came back with all same results. I'm using Solr 5.2.1
>
> This is my suggestion pipeline:
>
> <requestHandler name="/suggest" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="echoParams">all</str>
>     <str name="wt">json</str>
>     <str name="indent">true</str>
>
>     <str name="defType">edismax</str>
>     <int name="rows">10</int>
>     <str name="fl">id, score</str>
>     <str name="qf">content^50 title^50 extrasearch^30.0</str>
>     <str name="pf">textnge^50.0</str>
>     <str name="boost">product(map(query($type1query),0,0,1,$type1boost),map(query($type2query),0,0,1,$type2boost),map(query($type3query),0,0,1,$type3boost),map(query($type4query),0,0,1,$type4boost),$typeboost)</str>
>     <str name="typeboost">1.0</str>
>
>     <str name="type1query">content_type:"application/pdf"</str>
>     <str name="type1boost">0.9</str>
>     <str name="type2query">content_type:"application/msword"</str>
>     <str name="type2boost">0.5</str>
>     <str name="type3query">content_type:"NA"</str>
>     <str name="type3boost">0.0</str>
>     <str name="type4query">content_type:"NA"</str>
>     <str name="type4boost">0.0</str>
>
>     <str name="hl">on</str>
>     <str name="hl.fl">id, textng, textng2, language_s</str>
>     <str name="...">true</str>
>     <str name="...">true</str>
>     <str name="...">html</str>
>     <str name="...">50</str>
>     <str name="...">false</str>
>   </lst>
> </requestHandler>
>
> This is my query:
> http://localhost:8983/edm/chinese2/suggest?q=do our
> best&defType=edismax&qf=content^5 textng^5&pf=textnge^50&pf2=content^20
> textnge^50&pf3=content^40%20textnge^50&ps2=2&ps3=2&stats.calcdistinct=true
>
>
> This is the suggestion result:
>
>  "highlighting":{
>    "responsibility001":{
>      "id":["responsibility001"],
>      "textng":["We will strive to do our best."]},
>    "responsibility002":{
>      "id":["responsibility002"],
>      "textng":["We will strive to do our best."]},
>    "responsibility003":{
>      "id":["responsibility003"],
>      "textng":["We will strive to do our best."]},
>    "responsibility004":{
>      "id":["responsibility004"],
>      "textng":["We will strive to do our best."]},
>    "responsibility005":{
>      "id":["responsibility005"],
>      "textng":["We will strive to do our best."]},
>    "responsibility006":{
>      "id":["responsibility006"],
>      "textng":["We will strive to do our best."]},
>    "responsibility007":{
>      "id":["responsibility007"],
>      "textng":["We will strive to do our best."]},
>    "responsibility008":{
>      "id":["responsibility008"],
>      "textng":["We will strive to do our best."]},
>    "responsibility009":{
>      "id":["responsibility009"],
>      "textng":["We will strive to do our best."]},
>    "responsibility010":{
>      "id":["responsibility010"],
>      "textng":["We will strive to do our best."]},
>
> Regards,
> Edwin


-- 
Arcadius Ahouansou
Menelic Ltd | Information is Power
M: 07908761999
W: www.menelic.com
---

Multiple concurrent queries to Solr

2015-08-23 Thread Ashish Mukherjee
Hello,

I want to run a few Solr queries in parallel, which are being done in a
multi-threaded model now. I was wondering if there are any client libraries
to query Solr  through a non-blocking I/O mechanism instead of a threaded
model. Has anyone attempted something like this?

Regards,
Ashish


Re: Can TrieDateField fields be null?

2015-08-23 Thread Henrique O. Santos

Hi Erick and Upayavira, thanks for the reply.

I am using Solr 5.2.1 and the SolrJ 5.2.1 API with an annotated POJO
to update the index. And you were right: somehow my Joda-Time DateTime field
was being filled with the current timestamp prior to the update.


Thanks for the clarification again.

On 08/22/2015 09:55 PM, Erick Erickson wrote:

TrieDateFields can be null; rather, the field can simply be absent from the document.
I just verified this with 4.10.

How are you indexing? I suspect that somehow the program that's sending
things to Solr is putting the default time in.

What version of Solr?

Best,
Erick

On Sat, Aug 22, 2015 at 4:04 PM, Henrique O. Santos  wrote:

Hello,

Just a simple question. Can TrieDateField fields be null? I have a schema
with the following field and type:
<field name="..." type="..." docValues="true" />
<fieldType name="..." class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>

Every time I index a document with no value for this field, the current time
gets indexed and stored. Is there any way to make this field null?

My use case for this collection requires that I check if that date field is
already filled or not.

Thank you,
Henrique.




Re: DIH delta-import pk

2015-08-23 Thread William Bell
Send the SQL and schema.xml, and also the logs. Does it complain about _id_ or your
field in the schema?




On Sun, Aug 23, 2015 at 4:55 AM, CrazyDiamond  wrote:

> Now I set the db id as the unique field, plus a uuid field which should be
> generated automatically as required. But when I add a document I get an error
> that my required uuid field is missing.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/DIH-delta-import-pk-tp4224342p4224701.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Can TrieDateField fields be null?

2015-08-23 Thread Henrique O. Santos

Hello again,

I am doing some manual indexing using Solr Admin UI to be exactly sure 
how TrieDateFields and null values work. When I remove the TrieDateField 
from the document, I get the following when trying to index it:


| "msg": "Invalid Date String:'NULL'",
"code": 400|


On Solr 5.2.1. Can I assume that TrieDateFields need to be specified for 
every document?


Thanks.

On 08/23/2015 09:48 AM, Henrique O. Santos wrote:

Hi Erick and Upayavira, thanks for the reply.

I am using Solr 5.2.1 and the SolrJ 5.2.1 API with an annotated POJO
to update the index. And you were right: somehow my Joda-Time DateTime
field was being filled with the current timestamp prior to the update.


Thanks for the clarification again.

On 08/22/2015 09:55 PM, Erick Erickson wrote:

TrieDateFields can be null; rather, the field can simply be absent from the document.
I just verified this with 4.10.

How are you indexing? I suspect that somehow the program that's sending
things to Solr is putting the default time in.

What version of Solr?

Best,
Erick

On Sat, Aug 22, 2015 at 4:04 PM, Henrique O. Santos 
 wrote:

Hello,

Just a simple question. Can TrieDateField fields be null? I have a schema
with the following field and type:
<field name="..." type="..." docValues="true" />
<fieldType name="..." class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>

Every time I index a document with no value for this field, the current time
gets indexed and stored. Is there any way to make this field null?

My use case for this collection requires that I check if that date field is
already filled or not.

Thank you,
Henrique.






Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Shawn Heisey
On 8/22/2015 10:28 PM, Zheng Lin Edwin Yeo wrote:
> Hi Shawn,
> 
> Yes, I've increased the heap size to 4GB already, and I'm using a machine
> with 32GB RAM.
> 
> Is it recommended to further increase the heap size to like 8GB or 16GB?

Probably not, but I know nothing about your data.  How many Solr docs
were created by indexing 1GB of data?  How much disk space is used by
your Solr index(es)?

I know very little about clustering, but it looks like you've gotten a
reply from Toke, who knows a lot more about that part of the code than I do.

Thanks,
Shawn



Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Zheng Lin Edwin Yeo
Hi Shawn and Toke,

I only have 520 docs in my data, but each of the documents is quite big in
size; in Solr, it is using 221MB. So when I set it to read from the top
1000 rows, it should just be reading all the 520 docs that are indexed?

Regards,
Edwin


On 23 August 2015 at 22:52, Shawn Heisey  wrote:

> On 8/22/2015 10:28 PM, Zheng Lin Edwin Yeo wrote:
> > Hi Shawn,
> >
> > Yes, I've increased the heap size to 4GB already, and I'm using a machine
> > with 32GB RAM.
> >
> > Is it recommended to further increase the heap size to like 8GB or 16GB?
>
> Probably not, but I know nothing about your data.  How many Solr docs
> were created by indexing 1GB of data?  How much disk space is used by
> your Solr index(es)?
>
> I know very little about clustering, but it looks like you've gotten a
> reply from Toke, who knows a lot more about that part of the code than I
> do.
>
> Thanks,
> Shawn
>
>


Re: Too many updates received since start

2015-08-23 Thread Yago Riveiro
Indeed, I don't understand the caveat either, but I can imagine that it is
related to some algorithm that triggers a full sync if necessary.




I will wait for 5.3 to do the upgrade and have this configuration available.


—/Yago Riveiro

On Sun, Aug 23, 2015 at 3:37 AM, Shawn Heisey  wrote:

> On 8/22/2015 3:50 PM, Yago Riveiro wrote:
>> I'm using java 7u25 oracle version with Solr 4.6.1
>> 
>> It works well with > 98% of throughput, but in some full GCs the issue arises.
>> A full sync for one shard is more than 50G.
>> 
>> Is there any configuration to set the number of docs behind the leader
>> that a replica can be?
> It looks like the number of docs is configurable in 5.1 and later:
> https://issues.apache.org/jira/browse/SOLR-6359
> There is apparently a caveat related to SolrCloud recovery, which I am
> having trouble grasping:
> "the 20% newest existing transaction log of the core to be recovered
> must be newer than the 20% oldest existing transaction log of the good
> core."
> Thanks,
> Shawn

Re: Can TrieDateField fields be null?

2015-08-23 Thread Shawn Heisey
On 8/23/2015 8:29 AM, Henrique O. Santos wrote:
> I am doing some manual indexing using Solr Admin UI to be exactly sure
> how TrieDateFields and null values work. When I remove the TrieDateField
> from the document, I get the following when trying to index it:
> 
> | "msg": "Invalid Date String:'NULL'",
> "code": 400|

Unless the field is marked as required in your schema, TrieDateField
will work if you have no value for the field.  "No value" means the field is
not present in the javabin, xml, or json data sent to Solr for indexing,
not that an empty string is present.

What you have here is literally the string "NULL" -- four letters.  This
will NOT work on any kind of Trie field.

Sometimes you can run into a conversion glitch related to a Java null
object, but in that case the value is usually lowercase -- "null" --
which wouldn't work either.
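
In SolrJ terms, leaving the field "null" simply means never adding it to the
document (a sketch; the URL and field names are placeholders):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

class IndexWithoutDate {
  public static void main(String[] args) throws Exception {
    HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/mycore");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc1");
    // No addField() call for the date field at all -- it is simply absent,
    // rather than carrying the string "NULL" or an empty string.
    solr.add(doc);
    solr.commit();
    solr.close();
  }
}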

Thanks,
Shawn



Re: Can TrieDateField fields be null?

2015-08-23 Thread Erick Erickson
Following up on Shawn's comment, this can be the
result of some sort of serialization or, if you're pulling
info from a DB, the literal string NULL may be
returned from the DB.

Solr really has no concept of a distinct value of NULL
for a field; in Solr/Lucene terms that's just the total
absence of the field from the document.

Best,
Erick

On Sun, Aug 23, 2015 at 8:15 AM, Shawn Heisey  wrote:
> On 8/23/2015 8:29 AM, Henrique O. Santos wrote:
>> I am doing some manual indexing using Solr Admin UI to be exactly sure
>> how TrieDateFields and null values work. When I remove the TrieDateField
>> from the document, I get the following when trying to index it:
>>
>> | "msg": "Invalid Date String:'NULL'",
>> "code": 400|
>
> Unless the field is marked as required in your schema, TrieDateField
> will work if you have no value for the field.  This means the field is
> not present in the javabin, xml, or json data sent to Solr for indexing,
> not that the empty string is present.
>
> What you have here is literally the string "NULL" -- four letters.  This
> will NOT work on any kind of Trie field.
>
> Sometimes you can run into a conversion glitch related to a Java null
> object, but in that case the value is usually lowercase -- "null" --
> which wouldn't work either.
>
> Thanks,
> Shawn
>


Re: Multiple concurrent queries to Solr

2015-08-23 Thread Shawn Heisey
On 8/23/2015 7:46 AM, Ashish Mukherjee wrote:
> I want to run a few Solr queries in parallel, which are being done in a
> multi-threaded model now. I was wondering if there are any client libraries
> to query Solr  through a non-blocking I/O mechanism instead of a threaded
> model. Has anyone attempted something like this?

The only client library that the Solr project makes is SolrJ -- the
client for Java.  If you are not using the SolrJ client, then the Solr
project did not write it, and you should contact the authors of the
library directly.

SolrJ and Solr are both completely thread-safe, and multiple threads are
recommended for highly concurrent usage.  SolrJ uses HttpClient for
communication with Solr.

I was not able to determine whether the default httpclient settings will
result in non-blocking I/O or not. As far as I am aware, nothing in
SolrJ sets any explicit configuration for blocking or non-blocking I/O.
 You can create your own HttpClient object in a SolrJ program and have
the SolrClient object use it.
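
For example, a sketch of handing a custom-tuned HttpClient to SolrJ (the pool
sizes and URL are arbitrary placeholders):

import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

class CustomHttpClient {
  public static void main(String[] args) throws Exception {
    HttpClient http = HttpClients.custom()
        .setMaxConnTotal(200)        // total pooled connections
        .setMaxConnPerRoute(50)      // pooled connections per Solr host
        .build();
    HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/mycore", http);
    System.out.println(solr.query(new SolrQuery("*:*")).getResults().getNumFound());
    solr.close();
  }
}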

HttpClient uses HttpCore.  Here is the main web page for these components:

https://hc.apache.org/

On this webpage, it says "HttpCore supports two I/O models: blocking I/O
model based on the classic Java I/O and non-blocking, event driven I/O
model based on Java NIO."  There is no information here about which
model is chosen by default.

Thanks,
Shawn



Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Alexandre Rafalovitch
Are you by any chance doing store=true on the fields you want to search?

If so, you may want to switch to just index=true. Of course, the fields will
then not come back in the results, but do you really want to sling
huge content fields around?

The other option is to do lazyLoading=true and not request that field.
This, as a test, you could actually do without needing to reindex
Solr, just with a restart. This would give you a way to test whether the
stored field size is the issue.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 23 August 2015 at 11:13, Zheng Lin Edwin Yeo  wrote:
> Hi Shawn and Toke,
>
> I only have 520 docs in my data, but each of the documents is quite big in
> size, In the Solr, it is using 221MB. So when i set to read from the top
> 1000 rows, it should just be reading all the 520 docs that are indexed?
>
> Regards,
> Edwin
>
>
> On 23 August 2015 at 22:52, Shawn Heisey  wrote:
>
>> On 8/22/2015 10:28 PM, Zheng Lin Edwin Yeo wrote:
>> > Hi Shawn,
>> >
>> > Yes, I've increased the heap size to 4GB already, and I'm using a machine
>> > with 32GB RAM.
>> >
>> > Is it recommended to further increase the heap size to like 8GB or 16GB?
>>
>> Probably not, but I know nothing about your data.  How many Solr docs
>> were created by indexing 1GB of data?  How much disk space is used by
>> your Solr index(es)?
>>
>> I know very little about clustering, but it looks like you've gotten a
>> reply from Toke, who knows a lot more about that part of the code than I
>> do.
>>
>> Thanks,
>> Shawn
>>
>>


Re: Multiple concurrent queries to Solr

2015-08-23 Thread Walter Underwood
The last time that I used the HTTPClient library, it was non-blocking. It 
doesn’t try to read from the socket until you ask for data from the response 
object. That allows parallel requests without threads.

Underneath, it has a pool of connections that can be reused. If the pool is 
exhausted, it can block.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Aug 23, 2015, at 8:49 AM, Shawn Heisey  wrote:

> On 8/23/2015 7:46 AM, Ashish Mukherjee wrote:
>> I want to run a few Solr queries in parallel, which are being done in a
>> multi-threaded model now. I was wondering if there are any client libraries
>> to query Solr  through a non-blocking I/O mechanism instead of a threaded
>> model. Has anyone attempted something like this?
> 
> The only client library that the Solr project makes is SolrJ -- the
> client for Java.  If you are not using the SolrJ client, then the Solr
> project did not write it, and you should contact the authors of the
> library directly.
> 
> SolrJ and Solr are both completely thread-safe, and multiple threads are
> recommended for highly concurrent usage.  SolrJ uses HttpClient for
> communication with Solr.
> 
> I was not able to determine whether the default httpclient settings will
> result in non-blocking I/O or not. As far as I am aware, nothing in
> SolrJ sets any explicit configuration for blocking or non-blocking I/O.
> You can create your own HttpClient object in a SolrJ program and have
> the SolrClient object use it.
> 
> HttpClient uses HttpCore.  Here is the main web page for these components:
> 
> https://hc.apache.org/
> 
> On this webpage, it says "HttpCore supports two I/O models: blocking I/O
> model based on the classic Java I/O and non-blocking, event driven I/O
> model based on Java NIO."  There is no information here about which
> model is chosen by default.
> 
> Thanks,
> Shawn
> 



Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Erick Erickson
You're confusing clustering with searching. Sure, Solr can index
and search lots of data, but clustering is essentially finding ad-hoc
similarities between arbitrary documents. It must take each of
the documents in the result size you specify from your result
set and try to find commonalities.

For perf issues in terms of clustering, you'd be better off
talking to the folks at the Carrot2 project.

Best,
Erick

On Sun, Aug 23, 2015 at 8:51 AM, Alexandre Rafalovitch
 wrote:
> Are you by any chance doing store=true on the fields you want to search?
>
> If so, you may want to switch to just index=true. Of course, they will
> then not come back in the results, but do you really want to sling
> huge content fields around.
>
> The other option is to do lazyLoading=true and not request that field.
> This, as a test, you could actually do without needing to reindex
> Solr, just with restart. This could give you a way to test whether the
> field stored size is the issue.
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 23 August 2015 at 11:13, Zheng Lin Edwin Yeo  wrote:
>> Hi Shawn and Toke,
>>
>> I only have 520 docs in my data, but each of the documents is quite big in
>> size, In the Solr, it is using 221MB. So when i set to read from the top
>> 1000 rows, it should just be reading all the 520 docs that are indexed?
>>
>> Regards,
>> Edwin
>>
>>
>> On 23 August 2015 at 22:52, Shawn Heisey  wrote:
>>
>>> On 8/22/2015 10:28 PM, Zheng Lin Edwin Yeo wrote:
>>> > Hi Shawn,
>>> >
>>> > Yes, I've increased the heap size to 4GB already, and I'm using a machine
>>> > with 32GB RAM.
>>> >
>>> > Is it recommended to further increase the heap size to like 8GB or 16GB?
>>>
>>> Probably not, but I know nothing about your data.  How many Solr docs
>>> were created by indexing 1GB of data?  How much disk space is used by
>>> your Solr index(es)?
>>>
>>> I know very little about clustering, but it looks like you've gotten a
>>> reply from Toke, who knows a lot more about that part of the code than I
>>> do.
>>>
>>> Thanks,
>>> Shawn
>>>
>>>


Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Upayavira
And be aware that, I'm sure, the more terms in your documents, the slower
clustering will be. So it isn't just the number of docs; the size of
them counts in this instance.

A simple test would be to build an index with just the first 1000 terms
of your clustering fields, and see if that makes a difference to
performance.
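
A sketch of that experiment on the indexing side (plain Java; splitting on
whitespace is only a rough stand-in for real tokenization, and the field name
is hypothetical):

import java.util.Arrays;
import org.apache.solr.common.SolrInputDocument;

class TruncateForTest {
  public static void main(String[] args) {
    String fullText = "the full document body ...";
    String[] terms = fullText.split("\\s+");
    // Keep only the first 1000 terms of the clustering field.
    String truncated = String.join(" ",
        Arrays.asList(terms).subList(0, Math.min(1000, terms.length)));
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("content", truncated);
  }
}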

Upayavira

On Sun, Aug 23, 2015, at 05:32 PM, Erick Erickson wrote:
> You're confusing clustering with searching. Sure, Solr can index
> and lots of data, but clustering is essentially finding ad-hoc
> similarities between arbitrary documents. It must take each of
> the documents in the result size you specify from your result
> set and try to find commonalities.
> 
> For perf issues in terms of clustering, you'd be better off
> talking to the folks at the carrot project.
> 
> Best,
> Erick
> 
> On Sun, Aug 23, 2015 at 8:51 AM, Alexandre Rafalovitch
>  wrote:
> > Are you by any chance doing store=true on the fields you want to search?
> >
> > If so, you may want to switch to just index=true. Of course, they will
> > then not come back in the results, but do you really want to sling
> > huge content fields around.
> >
> > The other option is to do lazyLoading=true and not request that field.
> > This, as a test, you could actually do without needing to reindex
> > Solr, just with restart. This could give you a way to test whether the
> > field stored size is the issue.
> >
> > Regards,
> >Alex.
> > 
> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > http://www.solr-start.com/
> >
> >
> > On 23 August 2015 at 11:13, Zheng Lin Edwin Yeo  
> > wrote:
> >> Hi Shawn and Toke,
> >>
> >> I only have 520 docs in my data, but each of the documents is quite big in
> >> size, In the Solr, it is using 221MB. So when i set to read from the top
> >> 1000 rows, it should just be reading all the 520 docs that are indexed?
> >>
> >> Regards,
> >> Edwin
> >>
> >>
> >> On 23 August 2015 at 22:52, Shawn Heisey  wrote:
> >>
> >>> On 8/22/2015 10:28 PM, Zheng Lin Edwin Yeo wrote:
> >>> > Hi Shawn,
> >>> >
> >>> > Yes, I've increased the heap size to 4GB already, and I'm using a 
> >>> > machine
> >>> > with 32GB RAM.
> >>> >
> >>> > Is it recommended to further increase the heap size to like 8GB or 16GB?
> >>>
> >>> Probably not, but I know nothing about your data.  How many Solr docs
> >>> were created by indexing 1GB of data?  How much disk space is used by
> >>> your Solr index(es)?
> >>>
> >>> I know very little about clustering, but it looks like you've gotten a
> >>> reply from Toke, who knows a lot more about that part of the code than I
> >>> do.
> >>>
> >>> Thanks,
> >>> Shawn
> >>>
> >>>


Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Jimmy Lin
unsubscribe

On Sat, Aug 22, 2015 at 9:31 PM, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> I'm using Solr 5.2.1, and I've indexed about 1GB of data into Solr.
>
> However, I find that clustering is exceedingly slow after I index this 1GB of
> data. It took almost 30 seconds to return the cluster results when I set it
> to cluster the top 1000 records, and still takes more than 3 seconds when I
> set it to cluster the top 100 records.
>
> Is this speed normal? Because I understand Solr can index terabytes of data
> without the performance being impacted so much, but now the collection is
> slowing down even with just 1GB of data.
>
> Below is my clustering configuration in solrconfig.xml.
>
> startup="lazy"
>   enable="${solr.clustering.enabled:true}"
>   class="solr.SearchHandler">
> 
>explicit
>   1000
>json
>true
>   text
>   null
>
>   true
>   true
>   subject content tag
>   true
>
>  20
>   
>   20
>   
>   false
>  7
>
>   
>   edismax
> 
> 
>   clustering
> 
>   
>
>
> Regards,
> Edwin
>


Re: Multiple concurrent queries to Solr

2015-08-23 Thread Arcadius Ahouansou
Hello Ashish.

There is some unfinished work about this at
https://issues.apache.org/jira/browse/SOLR-3383

Maybe you want to have a look and contribute?

Arcadius.

On 23 August 2015 at 17:02, Walter Underwood  wrote:

> The last time that I used the HTTPClient library, it was non-blocking. It
> doesn’t try to read from the socket until you ask for data from the
> response object. That allows parallel requests without threads.
>
> Underneath, it has a pool of connections that can be reused. If the pool
> is exhausted, it can block.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> On Aug 23, 2015, at 8:49 AM, Shawn Heisey  wrote:
>
> > On 8/23/2015 7:46 AM, Ashish Mukherjee wrote:
> >> I want to run a few Solr queries in parallel, which are being done in a
> >> multi-threaded model now. I was wondering if there are any client
> libraries
> >> to query Solr  through a non-blocking I/O mechanism instead of a
> threaded
> >> model. Has anyone attempted something like this?
> >
> > The only client library that the Solr project makes is SolrJ -- the
> > client for Java.  If you are not using the SolrJ client, then the Solr
> > project did not write it, and you should contact the authors of the
> > library directly.
> >
> > SolrJ and Solr are both completely thread-safe, and multiple threads are
> > recommended for highly concurrent usage.  SolrJ uses HttpClient for
> > communication with Solr.
> >
> > I was not able to determine whether the default httpclient settings will
> > result in non-blocking I/O or not. As far as I am aware, nothing in
> > SolrJ sets any explicit configuration for blocking or non-blocking I/O.
> > You can create your own HttpClient object in a SolrJ program and have
> > the SolrClient object use it.
> >
> > HttpClient uses HttpCore.  Here is the main web page for these
> components:
> >
> > https://hc.apache.org/
> >
> > On this webpage, it says "HttpCore supports two I/O models: blocking I/O
> > model based on the classic Java I/O and non-blocking, event driven I/O
> > model based on Java NIO."  There is no information here about which
> > model is chosen by default.
> >
> > Thanks,
> > Shawn
> >
>
>


-- 
Arcadius Ahouansou
Menelic Ltd | Information is Power
M: 07908761999
W: www.menelic.com
---


Re: DIH delta-import pk

2015-08-23 Thread CrazyDiamond
I don't use SQL now; I'm adding documents manually.

<uniqueKey>db_id_s</uniqueKey>
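
If a server-side chain (e.g. solr.UUIDUpdateProcessorFactory wired into the
update handler) is not in place, one client-side workaround is to generate the
UUID yourself at index time -- a sketch, with hypothetical field names:

import java.util.UUID;
import org.apache.solr.common.SolrInputDocument;

class AddWithUuid {
  public static void main(String[] args) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("db_id_s", "42");                       // the uniqueKey
    doc.addField("uuid", UUID.randomUUID().toString());  // required uuid field
    System.out.println(doc);
  }
}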




--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-delta-import-pk-tp4224342p4224762.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom Solr caches in a FunctionQuery that emulates the ExternalFileField

2015-08-23 Thread Mikhail Khludnev
Hello Upayavira,
That was a long month ago! I just described this approach in
http://blog.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html
Coming back to our discussion, I think I missed {!func}, which turns a field
name into a function query.

On Fri, Jul 24, 2015 at 3:41 PM, Upayavira  wrote:

> Mikhail,
>
> I've tried this out, but to be honest I can't work out what the score=
> parameter is supposed to add.
>
> I assume that if I do {!join fromIndex=other from=other_key to=key
> score=max}somefield:(abc dev)
>
> It will calculate the score for each document that has the same "key"
> value, and include that in the score for the main document?
>
> If this is the case, then I should be able to do:
>
> {!join fromIndex=other from=other_key to=key score=max}{!boost
> b=my_boost_value_field}*:*
>
> In which case, it'll take the value of "my_boost_field" in the other
> core, and include it in the score for my document that has the value of
> "key"?
>
> Upayavira
>
> On Fri, Jul 10, 2015, at 04:15 PM, Mikhail Khludnev wrote:
> > I've heard that people use
> > https://issues.apache.org/jira/browse/SOLR-6234
> > for such purpose - adding scores from fast moving core to the bigger slow
> > moving one
> >
> > On Fri, Jul 10, 2015 at 4:54 PM, Upayavira  wrote:
> >
> > > All,
> > >
> > > I have knocked up what I think could be a really cool function query -
> > > it allows you to retrieve a value from another core (much like a pseudo
> > > join) and use that value during scoring (much like an
> > > ExternalFileField).
> > >
> > > Examples:
> > >  * Selective boosting of documents based upon a category based value
> > >  * boost on aggregated popularity values
> > >  * boost on fast moving data on your slow moving index
> > >
> > > It *works* but it does so very slowly (on 3m docs, milliseconds
> without,
> > > and 24s with it). There are two things that happen a lot:
> > >
> > >  * locate a document with unique ID value of X
> > >  * retrieve the value of field Y for that doc
> > >
> > > What it seems to me now is that I need to implement a cache that will
> > > have a string value as the key and the (float) field value as the
> > > object, that is warmed alongside existing caches.
> > >
> > > Any pointers to examples of how I could do this, or other ways to do
> the
> > > conversion from a key value to a float value faster?
> > >
> > > NB. I hope to contribute this if I can make it perform.
> > >
> > > Thanks!
> > >
> > > Upayavira
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> > 
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Bill Bell
We use 8GB to 10GB for indexes of that size all the time.


Bill Bell
Sent from mobile


> On Aug 23, 2015, at 8:52 AM, Shawn Heisey  wrote:
> 
>> On 8/22/2015 10:28 PM, Zheng Lin Edwin Yeo wrote:
>> Hi Shawn,
>> 
>> Yes, I've increased the heap size to 4GB already, and I'm using a machine
>> with 32GB RAM.
>> 
>> Is it recommended to further increase the heap size to like 8GB or 16GB?
> 
> Probably not, but I know nothing about your data.  How many Solr docs
> were created by indexing 1GB of data?  How much disk space is used by
> your Solr index(es)?
> 
> I know very little about clustering, but it looks like you've gotten a
> reply from Toke, who knows a lot more about that part of the code than I do.
> 
> Thanks,
> Shawn
> 


Re: Remove duplicate suggestions in Solr

2015-08-23 Thread Zheng Lin Edwin Yeo
Hi Arcadius,

Thank you for your reply.

So this means that the de-duplication has to be done at indexing time,
and not at query time?

Yes, currently I'm building on "search" to do my suggestions, as I
faced some issues with the suggester components in the Solr 5.1.0 version.
Will the suggester components solve this issue of giving duplicate
suggestions?

There might also be cases where about 1/2 to 3/4 of my indexed documents
are the same, with only the remaining 1/4 to 1/2 different. So this
will probably lead to cases where the index is different, but a search may
return the parts of the documents that are the same.


Regards,
Edwin


On 23 August 2015 at 21:44, Arcadius Ahouansou  wrote:

> Hi Edwin.
>
> What you are doing here is "search"; Solr has separate components for
> doing suggestions.
>
> About dedup,
>
> - have a look at  the manual
> https://cwiki.apache.org/confluence/display/solr/De-Duplication
>
> - or simply do your dedup upfront before ingesting into Solr by assigning
> the same "id" to all doc with same "textng" (may require a different index
> if you want to keep the existing data with duplicate for other purpose)
>
> - Or you could use result grouping/fieldCollapsing to group/dedup your
> result
>
> Hope this helps
>
> Arcadius.
>
>
> On 21 August 2015 at 06:41, Zheng Lin Edwin Yeo 
> wrote:
>
> > Hi,
> >
> > I would like to check, is there any way to remove duplicate suggestions in
> > Solr?
> > I have several documents that looks very similar, and when I do a
> > suggestion query, it came back with all same results. I'm using Solr
> 5.2.1
> >
> > This is my suggestion pipeline:
> >
> > <requestHandler name="/suggest" class="solr.SearchHandler">
> >   <lst name="defaults">
> >     <str name="echoParams">all</str>
> >     <str name="wt">json</str>
> >     <str name="indent">true</str>
> >
> >     <str name="defType">edismax</str>
> >     <int name="rows">10</int>
> >     <str name="fl">id, score</str>
> >     <str name="qf">content^50 title^50 extrasearch^30.0</str>
> >     <str name="pf">textnge^50.0</str>
> >     <str name="boost">product(map(query($type1query),0,0,1,$type1boost),map(query($type2query),0,0,1,$type2boost),map(query($type3query),0,0,1,$type3boost),map(query($type4query),0,0,1,$type4boost),$typeboost)</str>
> >     <str name="typeboost">1.0</str>
> >
> >     <str name="type1query">content_type:"application/pdf"</str>
> >     <str name="type1boost">0.9</str>
> >     <str name="type2query">content_type:"application/msword"</str>
> >     <str name="type2boost">0.5</str>
> >     <str name="type3query">content_type:"NA"</str>
> >     <str name="type3boost">0.0</str>
> >     <str name="type4query">content_type:"NA"</str>
> >     <str name="type4boost">0.0</str>
> >
> >     <str name="hl">on</str>
> >     <str name="hl.fl">id, textng, textng2, language_s</str>
> >     <str name="...">true</str>
> >     <str name="...">true</str>
> >     <str name="...">html</str>
> >     <str name="...">50</str>
> >     <str name="...">false</str>
> >   </lst>
> > </requestHandler>
> >
> > This is my query:
> > http://localhost:8983/edm/chinese2/suggest?q=do our
> > best&defType=edismax&qf=content^5 textng^5&pf=textnge^50&pf2=content^20
> >
> textnge^50&pf3=content^40%20textnge^50&ps2=2&ps3=2&stats.calcdistinct=true
> >
> >
> > This is the suggestion result:
> >
> >  "highlighting":{
> >    "responsibility001":{
> >      "id":["responsibility001"],
> >      "textng":["We will strive to do our best."]},
> >    "responsibility002":{
> >      "id":["responsibility002"],
> >      "textng":["We will strive to do our best."]},
> >    "responsibility003":{
> >      "id":["responsibility003"],
> >      "textng":["We will strive to do our best."]},
> >    "responsibility004":{
> >      "id":["responsibility004"],
> >      "textng":["We will strive to do our best."]},
> >    "responsibility005":{
> >      "id":["responsibility005"],
> >      "textng":["We will strive to do our best."]},
> >    "responsibility006":{
> >      "id":["responsibility006"],
> >      "textng":["We will strive to do our best."]},
> >    "responsibility007":{
> >      "id":["responsibility007"],
> >      "textng":["We will strive to do our best."]},
> >    "responsibility008":{
> >      "id":["responsibility008"],
> >      "textng":["We will strive to do our best."]},
> >    "responsibility009":{
> >      "id":["responsibility009"],
> >      "textng":["We will strive to do our best."]},
> >    "responsibility010":{
> >      "id":["responsibility010"],
> >      "textng":["We will strive to do our best."]},
> >
> > Regards,
> > Edwin
>
> --
> Arcadius Ahouansou
> Menelic Ltd | Information is Power
> M: 07908761999
> W: www.menelic.com
> ---

Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Zheng Lin Edwin Yeo
Yes, I'm using store=true.
<field name="..." ... stored="true" omitNorms="true" termVectors="true"/>

However, this field needs to be stored, as my program requires this field to
be returned during normal searching. I tried lazyLoading=true, but it's
not working.

Would you do a copyField for the content, and not set stored="true" on
that field? Then that field would just be referenced for the clustering,
and the normal search would reference the original content field.

Regards,
Edwin




On 23 August 2015 at 23:51, Alexandre Rafalovitch 
wrote:

> Are you by any chance doing store=true on the fields you want to search?
>
> If so, you may want to switch to just index=true. Of course, they will
> then not come back in the results, but do you really want to sling
> huge content fields around.
>
> The other option is to do lazyLoading=true and not request that field.
> This, as a test, you could actually do without needing to reindex
> Solr, just with restart. This could give you a way to test whether the
> field stored size is the issue.
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 23 August 2015 at 11:13, Zheng Lin Edwin Yeo 
> wrote:
> > Hi Shawn and Toke,
> >
> > I only have 520 docs in my data, but each of the documents is quite big
> in
> > size, In the Solr, it is using 221MB. So when i set to read from the top
> > 1000 rows, it should just be reading all the 520 docs that are indexed?
> >
> > Regards,
> > Edwin
> >
> >
> > On 23 August 2015 at 22:52, Shawn Heisey  wrote:
> >
> >> On 8/22/2015 10:28 PM, Zheng Lin Edwin Yeo wrote:
> >> > Hi Shawn,
> >> >
> >> > Yes, I've increased the heap size to 4GB already, and I'm using a
> machine
> >> > with 32GB RAM.
> >> >
> >> > Is it recommended to further increase the heap size to like 8GB or
> 16GB?
> >>
> >> Probably not, but I know nothing about your data.  How many Solr docs
> >> were created by indexing 1GB of data?  How much disk space is used by
> >> your Solr index(es)?
> >>
> >> I know very little about clustering, but it looks like you've gotten a
> >> reply from Toke, who knows a lot more about that part of the code than I
> >> do.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>


Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Zheng Lin Edwin Yeo
Hi Alexandre,

I've tried using just index=true, and the speed is still the same, not any
faster. If I set store=false, no results come back from the clustering. Is
this because the fields are not stored, and the clustering requires fields
that are stored?

I've also increased my heap size to 16GB, as I'm using a machine with 32GB
RAM, but there is no significant improvement in performance either.

Regards,
Edwin



On 24 August 2015 at 10:16, Zheng Lin Edwin Yeo 
wrote:

> Yes, I'm using store=true.
> <field name="..." ... stored="true" omitNorms="true" termVectors="true"/>
>
> However, this field needs to be stored as my program requires this field
> to be returned during normal searching. I tried the lazyLoading=true, but
> it's not working.
>
> Will you do a copy field for the content, and not to set stored="true" for
> that field. So that field will just be referenced to for the clustering,
> and the normal search will reference to the original content field?
>
> Regards,
> Edwin
>
>
>
>
> On 23 August 2015 at 23:51, Alexandre Rafalovitch 
> wrote:
>
>> Are you by any chance doing store=true on the fields you want to search?
>>
>> If so, you may want to switch to just index=true. Of course, they will
>> then not come back in the results, but do you really want to sling
>> huge content fields around.
>>
>> The other option is to do lazyLoading=true and not request that field.
>> This, as a test, you could actually do without needing to reindex
>> Solr, just with restart. This could give you a way to test whether the
>> field stored size is the issue.
>>
>> Regards,
>>Alex.
>> 
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>>
>>
>> On 23 August 2015 at 11:13, Zheng Lin Edwin Yeo 
>> wrote:
>> > Hi Shawn and Toke,
>> >
>> > I only have 520 docs in my data, but each of the documents is quite big
>> in
>> > size, In the Solr, it is using 221MB. So when i set to read from the top
>> > 1000 rows, it should just be reading all the 520 docs that are indexed?
>> >
>> > Regards,
>> > Edwin
>> >
>> >
>> > On 23 August 2015 at 22:52, Shawn Heisey  wrote:
>> >
>> >> On 8/22/2015 10:28 PM, Zheng Lin Edwin Yeo wrote:
>> >> > Hi Shawn,
>> >> >
>> >> > Yes, I've increased the heap size to 4GB already, and I'm using a
>> machine
>> >> > with 32GB RAM.
>> >> >
>> >> > Is it recommended to further increase the heap size to like 8GB or
>> 16GB?
>> >>
>> >> Probably not, but I know nothing about your data.  How many Solr docs
>> >> were created by indexing 1GB of data?  How much disk space is used by
>> >> your Solr index(es)?
>> >>
>> >> I know very little about clustering, but it looks like you've gotten a
>> >> reply from Toke, who knows a lot more about that part of the code than
>> I
>> >> do.
>> >>
>> >> Thanks,
>> >> Shawn
>> >>
>> >>
>>
>
>


Re: Exception while using {!cardinality=1.0}.

2015-08-23 Thread Modassar Ather
- Did you have the exact same data in both fields?
No, the data is not the same.

- Did your "real" query actually compute stats on the same field you had
  done your main term query on?

The query field is different; I failed to state that clearly. I will
modify the jira accordingly.
So the query can be
q=anyfield:query&stats=true&stats.field={!cardinality=1.0}field
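
The same request built with SolrJ, for reference (a sketch; field names as in
the thread):

import org.apache.solr.client.solrj.SolrQuery;

class CardinalityStats {
  public static void main(String[] args) {
    SolrQuery q = new SolrQuery("anyfield:query");
    q.set("stats", true);
    q.set("stats.field", "{!cardinality=1.0}field");
    System.out.println(q);
  }
}
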
Can you please explain, for my better understanding of this feature, how
having the same field for the query and for stats can cause an issue?

I haven't had a chance to review the jira in depth or actually run your
code with those configs -- but if you get a chance before i do, please
re-review the code & configs you posted and see if you can reproduce using
the *exact* same data in two different fields, and if the choice of query
makes a difference in the behavior you see.

I will try to reproduce it as you have mentioned and reply with the
details.

Thanks,
Modassar

On Sat, Aug 22, 2015 at 3:43 AM, Chris Hostetter 
wrote:

>
> : - Did you have the exact same data in both fields?
> : Both the field are string type.
>
> that's not the question i asked.
>
> is the data *in* these fields (ie: the actual value of each field for each
> document) the same for both of the fields?  This is important for figuring
> out whether the root problem is that having docValues (or not having
> docValues) causes a problem, or that having certain kinds of *data* in a
> string field (regardless of docValues) can cause this problem.
>
> Skimming the sample code you posted to SOLR-7954, you are definitely
> putting different data into "field" than you put into "field1", so it's
> still not clear what the problem is.
>
> : - Did your "real" query actually compute stats on the same field you had
> :   done your main term query on?
> : I did not get the question but as much I understood and verified in the
> : Solr log the stat is computed on the field given with
> : stats.field={!cardinality=1.0}field.
>
> the question is specific to the example query you mentioned before and
> again in your description in SOLR-7954.  They show that the same field
> name you are computing stats on ("field") is also used in your main query
> as a constraint on the documents ("q=field:query"), which is an odd and
> very special edge case that may be pertinent to the problem you are
> seeing.  Depending on what data you index, that might easily only match 1
> document -- in the case of the test code you put in jira, exactly 0
> documents, since you never index the text "query" into field "field" for
> any document.
>
>
> I haven't had a chance to review the jira in depth or actually run your
> code with those configs -- but if you get a chance before i do, please
> re-review the code & configs you posted and see if you can reproduce using
> the *exact* same data in two different fields, and if the choice of query
> makes a difference in the behavior you see.
>
>
> :
> : Regards,
> : Modassar
> :
> : On Wed, Aug 19, 2015 at 10:24 AM, Modassar Ather  >
> : wrote:
> :
> : > Ahmet/Chris! Thanks for your replies.
> : >
> : > Ahmet I think "net.agkn.hll.serialization" is used by hll() function
> : > implementation of Solr.
> : >
> : > Chris I will try to create sample data and create a jira ticket with
> : > details.
> : >
> : > Regards,
> : > Modassar
> : >
> : >
> : > On Tue, Aug 18, 2015 at 9:58 PM, Chris Hostetter <hossman_luc...@fucit.org>
> : > wrote:
> : >
> : >>
> : >> : > I am getting the following exception for the query:
> : >> : > *q=field:query&stats=true&stats.field={!cardinality=1.0}field*. The
> : >> : > exception is not seen once the cardinality is set to 0.9 or less.
> : >> : > The field is *docValues enabled* and *indexed=false*. The same exception
> : >> : > I tried to reproduce on a non-docValues field but could not. Please help me
> : >> : > resolve the issue.
> : >>
> : >> Hmmm... this is a weird error ... but you haven't really given us enough
> : >> information to really guess what the root cause is
> : >>
> : >> - What was the datatype of the field(s)?
> : >> - Did you have the exact same data in both fields?
> : >> - Are these multivalued fields?
> : >> - Did your "real" query actually compute stats on the same field you
> had
> : >>   done your main term query on?
> : >>
> : >> I know we have some tests of this basic situation, and i tried to do some
> : >> more manual testing to spot check, but i can't reproduce.
> : >>
> : >> If you can please provide a full copy of the data (as csv or xml or
> : >> whatever) to build your index along with all solr configs and the exact
> : >> queries to reproduce, that would really help get to the bottom of this --
> : >> if you can't provide all the data, then can you at least reproduce with a
> : >> small set of sample data?
> : >>
> : >> either way: please file a new jira issue and attach as much detail as you
> : >> can -- this URL has a lot of great tips on the types of data we need to be
> : >> able to g

Re: Multiple concurrent queries to Solr

2015-08-23 Thread Ashish Mukherjee
Thanks, everyone. Arcadius, that ticket is interesting.

I was wondering if an implementation of SolrClient could be based on
HttpAsyncClient instead of HttpSolrClient. Just a thought right now, which
needs to be explored deeper.
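
For what it's worth, a minimal sketch of that idea (URLs are placeholders;
this bypasses SolrJ entirely and drives Solr's HTTP API from HttpAsyncClient's
NIO reactor):

import java.util.concurrent.Future;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.nio.client.CloseableHttpAsyncClient;
import org.apache.http.impl.nio.client.HttpAsyncClients;

class AsyncSolrQueries {
  public static void main(String[] args) throws Exception {
    CloseableHttpAsyncClient client = HttpAsyncClients.createDefault();
    client.start();
    Future<HttpResponse> f1 = client.execute(
        new HttpGet("http://localhost:8983/solr/core1/select?q=*:*&wt=json"), null);
    Future<HttpResponse> f2 = client.execute(
        new HttpGet("http://localhost:8983/solr/core2/select?q=*:*&wt=json"), null);
    // Both requests are in flight concurrently; block only to collect results.
    System.out.println(f1.get().getStatusLine());
    System.out.println(f2.get().getStatusLine());
    client.close();
  }
}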

- Ashish

On Mon, Aug 24, 2015 at 1:46 AM, Arcadius Ahouansou 
wrote:

> Hello Ashish.
>
> There is some unfinished work about this at
> https://issues.apache.org/jira/browse/SOLR-3383
>
> Maybe you want to have a look and contribute?
>
> Arcadius.
>
> On 23 August 2015 at 17:02, Walter Underwood 
> wrote:
>
> > The last time that I used the HTTPClient library, it was non-blocking. It
> > doesn’t try to read from the socket until you ask for data from the
> > response object. That allows parallel requests without threads.
> >
> > Underneath, it has a pool of connections that can be reused. If the pool
> > is exhausted, it can block.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> > On Aug 23, 2015, at 8:49 AM, Shawn Heisey  wrote:
> >
> > > On 8/23/2015 7:46 AM, Ashish Mukherjee wrote:
> > >> I want to run a few Solr queries in parallel, which are being done in a
> > >> multi-threaded model now. I was wondering if there are any client
> > libraries
> > >> to query Solr  through a non-blocking I/O mechanism instead of a
> > threaded
> > >> model. Has anyone attempted something like this?
> > >
> > > The only client library that the Solr project makes is SolrJ -- the
> > > client for Java.  If you are not using the SolrJ client, then the Solr
> > > project did not write it, and you should contact the authors of the
> > > library directly.
> > >
> > > SolrJ and Solr are both completely thread-safe, and multiple threads
> are
> > > recommended for highly concurrent usage.  SolrJ uses HttpClient for
> > > communication with Solr.
> > >
> > > I was not able to determine whether the default httpclient settings
> will
> > > result in non-blocking I/O or not. As far as I am aware, nothing in
> > > SolrJ sets any explicit configuration for blocking or non-blocking I/O.
> > > You can create your own HttpClient object in a SolrJ program and have
> > > the SolrClient object use it.
> > >
> > > HttpClient uses HttpCore.  Here is the main web page for these
> > components:
> > >
> > > https://hc.apache.org/
> > >
> > > On this webpage, it says "HttpCore supports two I/O models: blocking
> I/O
> > > model based on the classic Java I/O and non-blocking, event driven I/O
> > > model based on Java NIO."  There is no information here about which
> > > model is chosen by default.
> > >
> > > Thanks,
> > > Shawn
> > >
> >
> >
>
>
> --
> Arcadius Ahouansou
> Menelic Ltd | Information is Power
> M: 07908761999
> W: www.menelic.com
> ---
>


Re: Solr 4.10.3 cached grouping results but Solr 5.2.1 don't, why?

2015-08-23 Thread Pavel Hladik
Nobody knows or has the same issue?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-10-3-cached-grouping-results-but-Solr-5-2-1-don-t-why-tp4224396p4224812.html
Sent from the Solr - User mailing list archive at Nabble.com.


GC parameters tuning for core of 140M docs on 50G of heap memory

2015-08-23 Thread Pavel Hladik
Hi,

we have a Solr 5.2.1 setup with 9 cores, and one of them has 140M docs. Can you
please recommend tuning for the GC parameters below? Performance is not an
issue, but sometimes during peaks we hit OOM. We use 50G of heap memory, and the
server has 64G of RAM.

GC_TUNE="-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-XX:+CMSScavengeBeforeRemark \
-XX:PretenureSizeThreshold=64m \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:CMSMaxAbortablePrecleanTime=6000 \
-XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled"



--
View this message in context: 
http://lucene.472066.n3.nabble.com/GC-parameters-tuning-for-core-of-140M-docs-on-50G-of-heap-memory-tp4224813.html
Sent from the Solr - User mailing list archive at Nabble.com.