Re: Long string in fq value parameter, more than 2000000 chars

2017-05-30 Thread Susheel Kumar
If you can load the GC logs into GCViewer when the OOM happens, it can
give you an idea of whether it was a sudden OOM or the heap filled up over
a period of time. That may help nail down whether a particular query is
causing the problem or something else...

Thanks,
Susheel

On Sat, May 27, 2017 at 5:36 PM, Daniel Angelov 
wrote:

> Thanks for the support so far.
> I am going to analyze the logs to check the frequency of such queries.
> BTW, I forgot to mention: the soft and hard commit intervals are each 60
> sec.
>
> BR
> Daniel
>
> Am 27.05.2017 22:57 schrieb "Erik Hatcher" :
>
> > Another technique to consider is {!join}.  Index the cross-reference id
> > "sets" to another core and use a short and sweet join, if there are
> > stable sets of ids.
> >
> >Erik
> >
> > > On May 27, 2017, at 11:39, Alexandre Rafalovitch 
> > wrote:
> > >
> > > On top of Shawn's analysis, I am also wondering how often those FQ
> > > queries are reused, because they and the matching documents get
> > > cached, so there might be quite a bit of space taken up by that too.
> > >
> > > Regards,
> > >Alex.
> > > 
> > > http://www.solr-start.com/ - Resources for Solr users, new and
> > experienced
> > >
> > >
> > >> On 27 May 2017 at 11:32, Shawn Heisey  wrote:
> > >>> On 5/27/2017 9:05 AM, Shawn Heisey wrote:
> >  On 5/27/2017 7:14 AM, Daniel Angelov wrote:
> >  I would like to ask: what could be the memory/CPU impact if the fq
> >  parameter in many of the queries is a long string (fq={!terms
> >  f=...}..., ) of around 2000000 chars? Most of the queries are like:
> >  "q={!frange l=Timestamp1 u=Timestamp2}... + some other criteria".
> >  This is with SolrCloud 4.1, on 10 hosts, with 3 collections; in
> >  total, all collections hold around 1000 docs. The queries go across
> >  all 3 collections.
> > >>
> > >> Followup after a little more thought:
> > >>
> > >> If we assume that the terms in your filter query are a generous 15
> > >> characters each (plus a comma), that means there are in the ballpark
> > >> of 125 thousand of them in a two-million-byte filter query.  If
> > >> they're smaller, then there would be more.  Considering 56 bytes of
> > >> overhead for each one, there's at least another 7 million bytes of
> > >> memory for 125000 terms when the terms parser divides that filter
> > >> into multiple String objects, plus memory required for the data in
> > >> each of those small strings, which will be just a little bit less
> > >> than the original four million bytes, because it will exclude the
> > >> commas.  A fair amount of garbage will probably also be generated in
> > >> order to parse the filter ... and then once the query is done, the
> > >> 15 megabytes (or more) of memory for the strings will also be
> > >> garbage.  This is going to repeat for every shard.
> > >>
> > >> I haven't even discussed what happens for memory requirements on the
> > >> Lucene frange parser, because I don't have any idea what those are,
> > >> and you didn't describe the function you're using.  I also don't
> > >> know how much memory Lucene is going to require in order to execute
> > >> a terms filter with at least 125K terms.  I don't imagine it's going
> > >> to be small.
> > >>
> > >> Thanks,
> > >> Shawn
> > >>
> >
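(Shawn's estimate, worked through; this assumes Java's two bytes per char
and his figure of ~56 bytes of per-String overhead:

2,000,000 chars / 16 chars per term (15 + comma)  ~= 125,000 terms
125,000 terms x 56 bytes String overhead          ~= 7.0 MB
125,000 terms x ~15 chars x 2 bytes               ~= 3.75 MB
the original 2,000,000-char query string          ~= 4.0 MB
transient total, per query, per shard             ~= 15 MB)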
>
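(A concrete sketch of the {!join} technique Erik describes; the side core
name, field names, and set label here are hypothetical. Index each stable
id set into a side core, one doc per member, e.g.
{"set_name":"myset", "member_id":"12345"}, then filter the main collection
with a short join instead of a 125,000-term list:

fq={!join fromIndex=idsets from=member_id to=id}set_name:myset

Note that a cross-core join needs the "from" core to be unsharded and
present on every node that serves the "to" collection.)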


Nested documents using solr 6.2.1

2017-05-30 Thread aniljayanti
Hi,

I am trying to work with nested documents in Solr 6.2.1, generating
employee info from a database.

The parent doc consists of empid, cid, sid, pid.
The child docs consist of price, empid, cid, sid, pid (multiple prices may
exist; the remaining values are the same as the parent's).

data-config.xml
---
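A minimal sketch of a DIH config that would produce this parent/child
shape (DIH supports child="true" on a nested entity in 6.x); the
datasource details, table names, and queries here are hypothetical:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/hr" user="user" password="pass"/>
  <document>
    <entity name="employee"
            query="SELECT empid, cid, sid, pid FROM employee">
      <entity name="price" child="true"
              query="SELECT price, empid, cid, sid, pid FROM price
                     WHERE empid = '${employee.empid}'"/>
    </entity>
  </document>
</dataConfig>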

managed-schema
--
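Presumably the field definitions were along these lines (a hypothetical
sketch; note that block-indexed child documents also rely on the _root_
field, and a uniqueKey that parent and children share verbatim is
problematic, since children normally need ids of their own):

<field name="empid" type="string" indexed="true" stored="true"/>
<field name="cid"   type="string" indexed="true" stored="true"/>
<field name="sid"   type="string" indexed="true" stored="true"/>
<field name="pid"   type="string" indexed="true" stored="true"/>
<field name="price" type="string" indexed="true" stored="true"/>
<field name="_root_" type="string" indexed="true" stored="false"/>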

<uniqueKey>empid</uniqueKey>

I did a full-import and it completed successfully. When I then ran the
select query below, I got this response:
http://localhost:9119/solr/employee/select?q=*:*

{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"q": "*:*",
"indent": "on",
"rows": "100",
"wt": "json",
"_": "1496057171982"
}
},
"response": {
"numFound": 3,
"start": 0,
"docs": [{
"empid": "E1",
"cid": "c1",
"sid": "s1",
"pid": "p1",
"price": "123"
},
{
"empid": "E1",
"cid": "c1",
"sid": "s1",
"pid": "p1",
"price": "567"
},
{
"empid": "E1",
"cid": "c1",
"sid": "s1",
"pid": "p1",
"price": "0"
}
]
}
}

When I query as above, I get the children along with the parent (the
parent document is the one with price '0'), but all as flat sibling docs.

I don't know how to write the Solr query to get the response below:


{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"q": "*:*",
"indent": "on",
"rows": "100",
"wt": "json",
"_": "1496057171982"
}
},
"response": {
"numFound": 3,
"start": 0,
"docs": [{
"empid": "E1",
"cid": "c1",
"sid": "s1",
"pid": "p1",
"price": "0",
"_childDocuments_": [{
"empid": "E1",
"cid": "c1",
"sid": "s1",
"pid": "p1",
"price": "123"
},
{
"empid": "E1",
"cid": "c1",
"sid": "s1",
"pid": "p1",
"price": "567"
}
]
}]
}
}

Please help me with this.

Many thanks in advance.
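The requested shape is what a block-join parent query plus the
ChildDocTransformer returns. A hedged sketch, assuming the parent and its
children were indexed together as one block and that price:0 reliably
distinguishes parents from children (a dedicated doc-type field is safer):

q={!parent which='price:0'}&fl=*,[child parentFilter=price:0]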



Zookeeper-aware golang library

2017-05-30 Thread Shawn Feldman
I've been working on a Go library for Solr that is ZooKeeper-aware for the
past few months.  Here is my library; let me know if you'd like to
contribute.  I compute the hash range for a given key and route, return
the desired servers in the cluster, and provide an SDK for queries and for
retries across network outages.  We've found that just using a load
balancer leads to some data loss when nodes reboot, which is why I
emulated the Java Solr client and rotate through a list of Solr servers.

https://github.com/sendgrid/go-solr
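For the curious, a minimal sketch of the routing idea follows. This is not
go-solr's actual API, just what "compute the hashrange given a key" means
for a plain (non-composite) id. It assumes the third-party
github.com/spaolacci/murmur3 package; composite ids (a!b) involve extra
masking of the high 16 bits that this sketch omits.

package main

import (
	"fmt"

	"github.com/spaolacci/murmur3" // assumed third-party MurmurHash3 package
)

// shardRange mirrors the hash range SolrCloud records per shard in the
// cluster state.
type shardRange struct {
	name string
	min  int32
	max  int32
}

// routeDoc sketches Solr's default compositeId routing for a plain id:
// hash the id bytes with MurmurHash3 (x86, 32-bit, seed 0) and pick the
// shard whose range covers the hash.
func routeDoc(id string, shards []shardRange) (string, bool) {
	h := int32(murmur3.Sum32([]byte(id)))
	for _, s := range shards {
		if h >= s.min && h <= s.max {
			return s.name, true
		}
	}
	return "", false
}

func main() {
	// A two-shard layout splitting the signed 32-bit hash space in half,
	// the way a fresh two-shard collection is laid out.
	shards := []shardRange{
		{name: "shard1", min: -2147483648, max: -1},
		{name: "shard2", min: 0, max: 2147483647},
	}
	if name, ok := routeDoc("doc42", shards); ok {
		fmt.Println("doc42 ->", name)
	}
}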


Collapse/Expand on non indexed field

2017-05-30 Thread Eirik Hungnes
Hi,

We are seeing that the ExpandComponent does not expand results when the
collapse field is not indexed. With indexed=true, the results are expanded
as expected.

1. Is that the case?
2. Why?

Here is the field used for collapse/expand:
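(Hypothetical reconstruction: a docValues-only field, and the request
shape in question, might look like

<field name="groupid" type="string" indexed="false" stored="false" docValues="true"/>

q=*:*&fq={!collapse field=groupid}&expand=true )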


Thanks,
Eirik


update please

2017-05-30 Thread Saman Rasheed
Hi, can someone kindly update me on the question I raised on Mon, 22 May,
17:14?

Subject:

without termfreq - returning the number of terms/or regex of terms in a
document

Thanks,


Re: Long string in fq value parameter, more than 2000000 chars

2017-05-30 Thread Rick Leir
Daniel,
Is it worth saying that you have honkin' long queries and there must be a
simpler way? (I am a big fan of KISS ... Keep It Simple, Stupid.) I am not
calling you names, just saying that this acronym comes up in just about
every project I work on. It is akin to the Peter Principle: design
complexity inevitably increases to the breaking point, and then I get
cranky. And you can probably tell us a solid reason for having the long
queries.  Cheers -- Rick

On May 30, 2017 9:22:24 AM EDT, Susheel Kumar  wrote:
>If you can load the GC logs into GCViewer when the OOM happens, it can
>give you an idea of whether it was a sudden OOM or the heap filled up
>over a period of time. That may help nail down whether a particular
>query is causing the problem or something else...
>
>Thanks,
>Susheel

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: update please

2017-05-30 Thread Rick Leir
Salman,
That was a week ago, which is a long while, and my Android does not
display the archives link in a readable way. Would you mind repeating the
question here? Be a bit verbose; sometimes it is better that way.
Cheers -- Rick


On May 30, 2017 12:29:34 PM EDT, Saman Rasheed  
wrote:
>Hi, can someone kindly update me on the question I raised on Mon, 22
>May, 17:14?
>
>
>Subject:
>
>
>without termfreq - returning the number of terms/or regex of terms in a
>document
>
>
>Thanks,

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: update please

2017-05-30 Thread Saman Rasheed
Hi Rick,

Thanks for coming back to me on this. BTW it's 'Saman', but please call me
Sam like everyone else 😊

Here we go:

~~

I have an English book whose contents I have indexed successfully into a
field called 'content', with the following properties:

<field name="content" ... multiValued="true" termVectors="true"
termPositions="true" termOffsets="true"/>

So if I search for a specific term regex, e.g. '*olomo*', my document
should give me 'Solomon' with a term frequency of 2.

I've tried going through the term vector section in the reference guide
and various other posts on the internet, but I still haven't managed to
figure out how.

The nearest I found is the following syntax:

http://localhost:8983/solr/test/tvrh?q=content:[*%20TO%20*]&indent=true&tv.tf=true&tv.df=true

which brings my PC to a near halt for about a couple of minutes, and then
it returns the term frequency of every term! But I only need the term
frequency of a particular pattern/regex.

Is there a way to narrow it down to just one regex term, e.g. *thing*, so
it will find the term frequency of 'soothing', 'something', 'everything',
etc., each with their number of occurrences per document?

Thanks,



can't create collection using solrcloud

2017-05-30 Thread BrianMaltzan
Hi,

I have a fresh install with 3 ZooKeepers (3.4.6) and 1 Solr (6.5.0)
instance. They are running and talking to one another, but I cannot create
a collection. I can connect to zk and get files, upconfig, etc. My config
works locally with the built-in zk.

I'm currently using a similar setup with Solr 5.2.

Any ideas?

Eg: bin/solr create_collection -c collection1 -d basic_configs
Creating new collection 'collection1' using command:
https://solr.test.x.org:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=basic_configs
ERROR: Failed to create collection 'collection1' due to:
{solr.test.x.org:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException
occured when talking to server at: http://solr.test.x.org:8983/solr}

And in logs:
Timed out waiting for new collection's replicas to become ACTIVE

CollectionsHandler Timed out waiting for new collection's replicas to become
ACTIVE with timeout=30
org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at: http://solr.test.x.org:8983/solr
Caused by: org.apache.http.client.ClientProtocolException
Caused by: org.apache.http.ProtocolException: The server failed to respond
with a valid HTTP response

Thanks,
Brian



Re: update please

2017-05-30 Thread Rick Leir
Sam,
First, try it with Solomo* and you should see a much better response time.
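Roughly, and with hypothetical queries against your 'content' field:

q=content:*olomo*   <- leading wildcard: every term in the field must be enumerated
q=content:Solomo*   <- prefix query: seeks straight to the matching block of the term dictionary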

Try things and experiment in the Solr Admin Query tab or Analysis tab.

Use debug=true or debugQuery=true.

When the server is really slow, use top(1) to see if you are swapping: say
top -o RES
or
top, then shift-M.

Get a screen grab and post it for us to see.
Check the Solr log? Enough for now.
Cheers -- Rick
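(One more avenue, hedged, and not something Rick mentions: if the /terms
handler is registered, the TermsComponent accepts a regex, e.g.

http://localhost:8983/solr/test/terms?terms.fl=content&terms.regex=.*thing.*&terms.limit=100

Note it reports index-wide frequencies for the matching terms, not
per-document term frequencies.)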


-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Pagination issue when grouping

2017-05-30 Thread Rick Leir
Hi Tien,
Consider using the export handler if you can. Then you have no paging.
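(A sketch of the request shape, with hypothetical collection and field
names; /export requires a sort parameter and docValues on every field in
fl:

http://localhost:8983/solr/mycoll/export?q=*:*&sort=id+asc&fl=id,group_field )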

When you are having a paging problem you might want to think about the use
case: how many of your users will be willing to page deeply? If they give
up, then you have lost already.
Cheers -- Rick

On May 29, 2017 7:58:38 PM EDT, Nguyen Manh Tien  
wrote:
>Hello,
>
>I group search results by a field (with high cardinality) and paginate
>using the number of groups, via the param group.ngroups=true. But that
>caused a high-CPU issue, so I turned it off.
>
>Without ngroups=true I can't get the number of groups, so pagination
>based on numFound is not correct: it always misses some last pages,
>because some results were already collapsed into groups on previous
>pages.
>
>For example, a search returns 11 results, but 2 of those results belong
>to 1 group, so there are really 10 groups (which I don't know in
>advance, because I set ngroups=false). With 11 results the pagination
>displays 2 pages, but page 2 has 0 results.
>
>Anyone faced a similar issue and found a workaround?
>
>Thanks,
>Tien

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Data from 4.10 to 6.5.1

2017-05-30 Thread mganeshs
All,

As I mentioned above, I thought I would share the steps we followed to
move our data from 4.10 to 6.5.1.

Our setup has 6 collections, each containing only one shard and a couple
of replicas.

* Install Solr 5.5.4
* Create configs for each collection, copied from basic_configs (the one
that comes by default)
* In the managed schema, add the custom field types needed for the
corresponding collection
* Start Solr in cloud mode
* Upconfig the configs for all collections
* Create the collection with numShards=1 using the Collections API HTTP
command
* Stop Solr
* In the created shard's data directory, delete the index folder, copy in
the 4.10 index folder, and make sure write.lock is deleted if it exists
(see the sketch after this list)
* Now start Solr again. In the Solr admin UI, the number of docs should
match the data copied from the 4.10 version.
* Optimize the index
* Do this for every collection.
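The copy step, roughly, for one shard (all paths hypothetical):

cd /var/solr/data/collection1_shard1_replica1/data
rm -rf index
cp -r /backup/solr-4.10/collection1/data/index index
rm -f index/write.lock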

Now install 6.5.1 and repeat the same steps:

* Install Solr 6.5.1
* Create configs for each collection, copied from basic_configs
* In the managed schema, add the custom field types needed for the
corresponding collection
* Start Solr in cloud mode
* Upconfig the configs for all collections
* Create the collection with numShards=1 using the Collections API HTTP
command
* Stop Solr
* In the created shard's data directory, delete the index folder, copy in
the 5.5.4 index folder, and make sure write.lock is deleted if it exists
* Now start Solr again. In the Solr admin UI, the number of docs should
match the data copied from the 5.5.4 version.
* Do this for every collection

Now we can create replicas as needed for each collection using the
ADDREPLICA command.
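For example (host, collection, and shard names hypothetical):

http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=collection1&shard=shard1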

This worked fine for us without any issues.

Hope this helps others who want to move from an older Solr 4.x version to
6.x.

Thanks and regards,



Re: can't create collection using solrcloud

2017-05-30 Thread mganeshs
A couple of times I faced this issue when the "Endpoint security" firewall
was on. Once I disabled it, everything started working.

Also, for creating a collection, I usually do it the following way.

Upconfig the configuration to ZooKeeper using the command:

bin/solr zk upconfig -n collection1_configs -z srv-nl-com12:2181 -d
collection1_configs

Then create the collection using an HTTP command (REST API):

http://srv-nl-com13:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=1&replicationFactor=2&maxShardsPerNode=2&collection.configName=collection1_configs

This works fine for us...

Hope this helps...

