regarding cursorMark feature for deep pagination

2017-07-18 Thread suresh pendap
Hi, This question is more about the implementation details of the cursorMark feature. I was reading about using the cursorMark feature for deep pagination in Solr, as described in this blog: http://yonik.com/solr/paging-and-deep-paging/ It is not clear to me how it is more efficient as compared
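
For readers unfamiliar with the mechanism being asked about, the following is a minimal Python sketch of the client-side loop that cursorMark paging implies. It assumes the documented request/response contract: the first request sends cursorMark=*, the sort includes the unique key, and paging stops when the returned nextCursorMark equals the cursor that was just sent. The names `iterate_with_cursor`, `fetch_page`, and `make_fake_solr` are hypothetical helpers for illustration, not part of Solr or any client library, and the in-memory stand-in uses the last id as its cursor, whereas real cursor marks are opaque strings.

```python
from typing import Callable, Dict, Iterator, List


def iterate_with_cursor(fetch_page: Callable[[Dict[str, str]], dict],
                        query: str = "*:*",
                        sort: str = "id asc",
                        rows: int = 2) -> Iterator[dict]:
    """Yield every matching document by following nextCursorMark."""
    cursor = "*"  # "*" asks for the first page
    while True:
        response = fetch_page({"q": query, "sort": sort,
                               "rows": str(rows), "cursorMark": cursor})
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            return
        cursor = next_cursor


def make_fake_solr(ids: List[str]) -> Callable[[Dict[str, str]], dict]:
    """Hypothetical in-memory stand-in for a Solr select endpoint.

    Assumption: the cursor is simply the last id returned. Real Solr
    cursor marks are opaque and must be passed back unchanged.
    """
    ordered = sorted(ids)

    def fetch_page(params: Dict[str, str]) -> dict:
        cursor, rows = params["cursorMark"], int(params["rows"])
        # Only documents that sort after the cursor are considered,
        # so no leading rows have to be collected and thrown away.
        remaining = ordered if cursor == "*" else [i for i in ordered if i > cursor]
        page = remaining[:rows]
        next_cursor = page[-1] if page else cursor
        return {"response": {"docs": [{"id": i} for i in page]},
                "nextCursorMark": next_cursor}

    return fetch_page


docs = list(iterate_with_cursor(make_fake_solr(["d3", "d1", "d5", "d2", "d4"])))
print([d["id"] for d in docs])
```

In this toy model each request depends only on the sort value of the last document seen, which is the intuition behind the efficiency claim; how Solr implements it internally is not shown here.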

Re: Copy field a source of copy field

2017-07-18 Thread Erick Erickson
OK, I take it back. Keepwords handle multiple words just fine. So I have to rewind. I'm having no trouble at all applying multiple, successive keepwords filters, even when there are multiple words on a single line in the keepwords file. Your use of shingles in here is probably going to confuse thi

Re: Solr 6.6.0 - Indexing errors

2017-07-18 Thread Joe Obernberger
Thank you Shawn. We will be adjusting solr.solr.home to point someplace else so that our puppet module will work. We actually didn't lose any data since the indexes are in HDFS. Our configuration for our largest collection is 100 shards with 3 replicas each on top of HDFS with 3x replicati

Re: Copy field a source of copy field

2017-07-18 Thread tstusr
Well, for me it's kind of strange because it's working only with words that have blank spaces. It seems that maybe I'm not explaining well. My field is defined as follows: We have 2 KWF files, "species" and

Re: cpu utilization high

2017-07-18 Thread Erick Erickson
There isn't nearly enough information to say much of anything here. Saying "Poi" makes me wonder if you're using the extracting request handler, in which case I'd recommend you move it off Solr, see: https://lucidworks.com/2012/02/14/indexing-with-solrj/ you might review: https://wiki.apache.org/s

Re: Copy field a source of copy field

2017-07-18 Thread Erick Erickson
Multiple keyword files work just fine for me. One issue you're having is that multi-word keepwords aren't going to do what you expect. The analysis chains work on _tokens_, and only see one at a time. Plus (apparently) the input is broken up on whitespace (the docs aren't entirely clear on this, b

Re: default values for numRecordsToKeep and maxNumLogsToKeep

2017-07-18 Thread Erick Erickson
I'm going to punt on the rationale since I wasn't involved in that discussion. numRecordsToKeep can be configured in the section of solrconfig.xml if you want to change it though. Best, Erick On Tue, Jul 18, 2017 at 10:53 AM, suresh pendap wrote: > Hi, > After looking at the source code I see

default values for numRecordsToKeep and maxNumLogsToKeep

2017-07-18 Thread suresh pendap
Hi, After looking at the source code I see that the default value for numRecordsToKeep is 100 and for maxNumLogsToKeep is 10. So it seems that by default a replica can lag by only 1000 document updates before it goes for a full recovery from the leader. I would like to know the rationale for

Re: Copy field a source of copy field

2017-07-18 Thread tstusr
Well, I have no idea why those images displayed as they did. The correct order is: Field chain analyzer. KWF-genus file Test output.

Re: Copy field a source of copy field

2017-07-18 Thread tstusr
It seems that it is just taking the last file of keep words. Now for control purposes, I have in genus file: And just is takin

Re: Solr 6.6.0 - Indexing errors

2017-07-18 Thread Shawn Heisey
On 7/17/2017 11:39 AM, Joe Obernberger wrote: We use puppet to deploy the solr instance to all the nodes. I changed what was deployed to use the CDH jars, but our puppet module deletes the old directory and replaces it. So, all the core configuration files under server/solr/ were removed. Zoo

cpu utilization high

2017-07-18 Thread Satya Marivada
Hi All, We are using solr-6.3.0 with external zookeeper. Setup is as below. Poi is the collection, which is big: about 20G, with each shard at 10G. Each JVM has 3G and the VMs have 70G of RAM. There are 6 processors. The CPU utilization when running queries reaches more than 100%. Any sug

Re: Copy field a source of copy field

2017-07-18 Thread Erick Erickson
The code is very simple. At a quick glance it looks like it just reads the words in, then the "accept" method just returns true or false based on whether the text file contains the token. Are you sure you reloaded your core/collection and pushed the changed schema to the right place? The admin/anal
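
The behaviour described above (read the words in, then accept or reject each token) can be sketched in a few lines of Python. This is an illustration of the idea only, not the Lucene source; the function names and the one-entry-per-line file format are assumptions, and the sample words are borrowed from elsewhere in this thread.

```python
from typing import Iterable, List, Set


def load_keep_words(lines: Iterable[str]) -> Set[str]:
    """Read one entry per line, skipping blank lines (assumed file format)."""
    return {line.strip() for line in lines if line.strip()}


def accept(token: str, keep_words: Set[str]) -> bool:
    """Return True only if this single token is in the keep-word set."""
    return token in keep_words


def keep_word_filter(tokens: Iterable[str], keep_words: Set[str]) -> List[str]:
    """Apply accept() to each token independently, preserving order."""
    return [t for t in tokens if accept(t, keep_words)]


genus_words = load_keep_words(["abarema", "abutilon", ""])
kept = keep_word_filter(["abarema", "idiopoda", "abarema_idiopoda"], genus_words)
print(kept)  # ['abarema']
```

Because each token is tested on its own, a change to the word file has no effect until the set is reloaded, which is why reloading the core/collection is the first thing to check.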

Re: Joins in Parallel SQL?

2017-07-18 Thread Erick Erickson
bq: "Is it possible to contribute towards..." Of course. "Developer documentation" is in short supply, mostly you have to dive into the code and figure it out. See: https://wiki.apache.org/solr/HowToContribute for getting the code, setting up an IDE etc. I often find the most useful approach is t

Re: Copy field a source of copy field

2017-07-18 Thread tstusr
Ok, I know shingling will join with "_". But that is the behaviour we want. Imagine we have these fields (contained in the species file): abarema idiopoda abutilon bakerianum Those become: abarema idiopoda abutilon bakerianum abarema_idiopoda abutilon_bakerianum But now in my genus file maybe i
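
The transformation described in this message can be reproduced with a small Python sketch. It assumes two-token shingles joined with "_" and the original tokens kept alongside them; the `shingles` helper is hypothetical, and a real shingle filter orders its output by token position rather than appending the joined forms at the end.

```python
from typing import List


def shingles(tokens: List[str], size: int = 2, separator: str = "_") -> List[str]:
    """Return the original tokens followed by each run of `size`
    adjacent tokens joined with `separator`."""
    joined = [separator.join(tokens[i:i + size])
              for i in range(len(tokens) - size + 1)]
    return tokens + joined


species_entries = ["abarema idiopoda", "abutilon bakerianum"]
for entry in species_entries:
    print(shingles(entry.split()))
```

Each entry is processed separately here, which is an assumption that matches the output quoted in the message (no shingle spanning "idiopoda" and "abutilon").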

Re: Need guidance for distributing data base on date interval in a collection

2017-07-18 Thread Charlie Hull
Hi, You should also consider how you should shard for best performance: for example, if most of your queries are for recent documents, you could end up with them all hitting only one shard. Here's an old blog we wrote on this subject (it mentions another open source engine, Xapian, but ignore that

RE: StringIndexOutOfBoundsException "in" SpellCheckCollator.getCollation

2017-07-18 Thread Umoreno
Hi. Was this issue solved? I am facing a similar one -- View this message in context: http://lucene.472066.n3.nabble.com/StringIndexOutOfBoundsException-in-SpellCheckCollator-getCollation-tp4312517p4346582.html Sent from the Solr - User mailing list archive at Nabble.com.

Short Circuit Reads -

2017-07-18 Thread Joe Obernberger
Hi All - does SolrCloud support using Short Circuit Reads when using HDFS? Thanks! -Joe

Re: SolrJ 6.6.0 Connection pool shutdown now with stack trace

2017-07-18 Thread Shawn Heisey
On 7/18/2017 5:10 AM, Markus Jelsma wrote: > The problem was never resolved but Shawn asked for the stack trace, here it > is: > Caused by: java.lang.IllegalStateException: Connection pool shut down > at org.apache.http.util.Asserts.check(Asserts.java:34) As I suspected, it is the connection p

Re: Get results in multiple orders (multiple boosts)

2017-07-18 Thread alessandro.benedetti
"I have different "sort preferences", so I can't build an index and use it for sorting. Maybe I have to sort by category then by source and by language or by source, then by category and by date" I would like to focus on this bit. It is ok to go for a custom function and sort at query time, but I am cu

Re: Create too many zookeeper connections when recreate CloudSolrServer instance

2017-07-18 Thread Walter Underwood
The entire point of a Zookeeper cluster is that it continues to be available when one (or more) nodes are down. If you want more failure tolerance, run a five node Zookeeper cluster instead of a three node cluster. Hacking the client will not increase robustness. Right now, you are hurting rob
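
The three-node versus five-node comparison follows from majority quorum arithmetic, sketched below in Python. The function names are illustrative only; the rule they encode is that an ensemble stays available as long as a strict majority of its nodes is up.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 5):
    print(f"{n} nodes: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} down")
```

A three-node ensemble therefore survives one node being down and a five-node ensemble survives two, which is the extra failure tolerance the message refers to.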

RE: SolrJ 6.6.0 Connection pool shutdown now with stack trace

2017-07-18 Thread Markus Jelsma
Hello Susheel, Yes, the closing happens only at the end of the checking cycle. I asked my colleague about the firewall and he is positive everything is allowed between those nodes. I also cannot completely drop the firewall between those nodes to be sure, because the problem is very hard to rep

Re: SolrJ 6.6.0 Connection pool shutdown now with stack trace

2017-07-18 Thread Susheel Kumar
Then most likely it's due to the closing of the connection, as mentioned above, though you said it's not happening in that part of your code. To rule out the firewall possibility, you can test in some other/local env. Also, how many requests/clients/connections are happening concurrently? Thanks, Susheel On Tue, Ju

RE: Enabling SSL

2017-07-18 Thread Miller, William K - Norman, OK - Contractor
Thank you all for your responses. I finally got it straightened out. I had forgotten to change my url from http to https. Dumb mistake on my part. Consider this issue closed. ~~~ William Kevin Miller ECS Federal, Inc. USPS/MTSC (405) 573-2158 -Original Message--

Re: Create too many zookeeper connections when recreate CloudSolrServer instance

2017-07-18 Thread Shawn Heisey
On 7/17/2017 2:48 AM, wg85907 wrote: > Thanks for your detailed explanation. The reason I want to shut down the > CloudSolrServer instance and create a new one is that I am concerned about whether > it can successfully reconnect to the Zookeeper server if the Zookeeper cluster has > some issue and reboot

Re: Get results in multiple orders (multiple boosts)

2017-07-18 Thread Susheel Kumar
As Eric suggested, it's possible by sorting with a custom function. You may have to use the if, sum, and exists functions, etc., to come up with a custom score field and sort using this field. The if condition would check for the conditions mentioned and keep adding the score etc. Thanks, Susheel On Tue, Jul

RE: SolrJ 6.6.0 Connection pool shutdown now with stack trace

2017-07-18 Thread Markus Jelsma
Hello Susheel, No, nothing at all. I've checked all six nodes, they are clean. Thanks, Markus -Original message- > From:Susheel Kumar > Sent: Tuesday 18th July 2017 14:30 > To: solr-user@lucene.apache.org > Subject: Re: SolrJ 6.6.0 Connection pool shutdown now with stack trace > > Do

Re: SolrJ 6.6.0 Connection pool shutdown now with stack trace

2017-07-18 Thread Susheel Kumar
Do you see any errors etc. in solr.log during this time? On Tue, Jul 18, 2017 at 7:10 AM, Markus Jelsma wrote: > The problem was never resolved but Shawn asked for the stack trace, here > it is: > > org.apache.solr.client.solrj.SolrServerException: > java.lang.IllegalStateException: > Connectio

Re: Embedded documents in solr

2017-07-18 Thread Susheel Kumar
How many availabilities.day values can there be for a single document? Is it for a week/month/year? On Tue, Jul 18, 2017 at 4:21 AM, Swapnil Pande wrote: > Hi , > I am new to solr. I am facing a problem for embedding documents to solr. I > dont want to use solr joins. > The document is similar to > {"n

Re: Limit to the number of cores supported?

2017-07-18 Thread Pouliot, Scott
It doesn't seem to report anything at all, which is part of the problem. No error for me to track down as of yet. From: Erick Erickson Sent: Monday, July 17, 2017 3:23:24 PM To: solr-user Subject: Re: Limit to the n

RE: SolrJ 6.6.0 Connection pool shutdown now with stack trace

2017-07-18 Thread Markus Jelsma
The problem was never resolved but Shawn asked for the stack trace, here it is: org.apache.solr.client.solrj.SolrServerException: java.lang.IllegalStateException: Connection pool shut down at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:485) at org.apach

Re: Get results in multiple orders (multiple boosts)

2017-07-18 Thread Luca Dall'Osto
Hello everyone, thanks for the prompt reply! In response to Florian, I can get a correct score only when boosting on 1 field (for example category): the scores are correctly increased by the factor. But when I try to make a double boost, the scores are not as great as expected (for example, if the greatest

Re: Create too many zookeeper connections when recreate CloudSolrServer instance

2017-07-18 Thread wg85907
I do not mean that my Zookeeper cluster is rebooting frequently; I just want to ensure my query service stays stable when the Zookeeper cluster has an issue or reboots. I will do some tests to check whether there is an issue here. Maybe the current Zookeeper client can handle this case well. Hacking the client will always

Embedded documents in solr

2017-07-18 Thread Swapnil Pande
Hi, I am new to Solr. I am facing a problem embedding documents in Solr. I don't want to use Solr joins. The document is similar to {"name":string, availabilities:[{"day":Date,"status":0}..{}]} I want to index the array and search with queries like 1) where name = 'xyz' and availabilities.

Re: Need guidance for distributing data base on date interval in a collection

2017-07-18 Thread Modassar Ather
Hi Rehman, You may want to look into how the documents are routed to different shards. For that you can look into the following documentation: https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud Basically it is the id of the document which when prefixed with certain

Re: Need guidance for distributing data base on date interval in a collection

2017-07-18 Thread Atita Arora
Hi Rehman, I am not sure about your use case, but why wouldn't you consider creating a shard for a particular date range, like within a week from the current date, 15 days, a month, and so on and so forth? I have done a similar implementation elsewhere. Can you tell more about your use case? Atita On

Need guidance for distributing data base on date interval in a collection

2017-07-18 Thread rehman kahloon
Hello Sir/Madam, I am new to SolrCloud and have experience with Oracle technologies. Nowadays I am comparing Oracle and SolrCloud using big data, so I want to know how I can create time-interval sharding. E.g., I have 10 machines, each machine for one shard and one date's data, So how

Re: Parent child documents partial update

2017-07-18 Thread Sujay Bawaskar
Yup, got it! On Tue, Jul 18, 2017 at 12:22 PM, Amrit Sarkar wrote: > Sujay, > > Lucene index is in flat-object document style, so I really do not think nested > documents at index / storage will ever be supported unless someone changes > the very intricacy of the index. > > Amrit Sarkar > Search Eng