Generic questions - increase performance

2016-04-13 Thread Bastien Latard - MDPI AG
Dear Folks, :-) From this source, I read: "Each incoming request requires a thread [...] If still more simultaneous requests (more than maxThreads) are received, they are stacked up inside the server socket" I have a couple of gener

Re: Cache problem

2016-04-13 Thread Shawn Heisey
On 4/13/2016 12:57 AM, Bastien Latard - MDPI AG wrote: > Thank you Shawn & Reth! > > So I have now some questions, again > > > Remind: I have only Solr running on this server (i.e.: java + tomcat). > > /BTW: I needed to increase previously the java heap size because I > went out of memory. Actually

Re: Generic questions - increase performance

2016-04-13 Thread Shawn Heisey
On 4/13/2016 1:50 AM, Bastien Latard - MDPI AG wrote: > From this source > , I read: > "Each incoming request requires a thread [...] If still more > simultaneous requests (more than maxThreads) are received, they are > stacked up inside th

Re: Soft commit does not affecting query performance

2016-04-13 Thread Bhaumik Joshi
Hi Bill, Please find below reference. http://www.cloudera.com/documentation/enterprise/5-4-x/topics/search_tuning_solr.html * "Enable soft commits and set the value to the largest value that meets your requirements. The default value of 1000 (1 second) is too aggressive for some enviro

overseer status documentation

2016-04-13 Thread GOURAUD Emmanuel
Hi all, I'm looking for exact definitions of the results returned by an "OVERSEERSTATUS" API query. The documentation talks about "various overseer APIs". Does someone have more information about that? Cheers, Emmanuel GOURAUD Ingénieur Infogérance | IT Outsourcing Engineer egour...@jou

Solr facet using gap function

2016-04-13 Thread Ali Nazemian
Dear all, I am wondering whether there is any way to introduce a custom function for the facet gap parameter. I already know that some Date Math units can be used (such as DAY, MONTH, etc.). I want to add some functions and use them as the gap in a range facet. Is that possible? Sincerely, Ali.

Configuring F5 load balancer with Solr cloud to switch between clusters on separate servers

2016-04-13 Thread preeti kumari
Hi, I have a solr cloud setup. Two clusters - Primary and Secondary. Each cluster has two collections. Primary cluster is hosted on Solr Node N1, N2, N3 Secondary Cluster is hosted on Solr Nodes N3, N4, N5 . All solr nodes are on different physical servers. Primary clusters are running on zk1,z

Re: Cache problem

2016-04-13 Thread Bastien Latard - MDPI AG
Thank you all again for your good and detailed answers. I will combine all of them to try to build a better environment. *Just a last question...* /I don't remember exactly when I needed to increase the Java heap.../ /but is it possible that this was for the DataImport.../ *Would the DIH work if

Re: Configuring F5 load balancer with Solr cloud to switch between clusters on separate servers

2016-04-13 Thread preeti kumari
Updating the info : Primary cluster is hosted on Solr Node N1, N2, N3 Secondary Cluster is hosted on Solr Nodes N4, N5, N6. All solr nodes (N1-N6) are on different physical servers. On Wed, Apr 13, 2016 at 3:59 PM, preeti kumari wrote: > Hi, > > I have a solr cloud setup. > Two clusters - P

Re: Solr document duplicated during pagination

2016-04-13 Thread Anil
Yes Erick. I have attached the queries generated from the logs. I see many duplicate records :( but I could not see any duplicates in the Solr admin console. Each run gives a different number of duplicates. Do you think the NOT (-) in the query is an issue? Please advise. Thanks, Anil On 10 April 2016 at

RE: EmbeddedSolr for unit tests in Solr 6

2016-04-13 Thread Rohana Rajapakse
Thanks Shalin, I am now trying to use MiniSolrCloudCluster to create mini cluster in solr 6/7 as below: MiniSolrCloudCluster miniCluster = new MiniSolrCloudCluster(1, temp_folder_path, path_to_solr.xml, null); It throws the following exception: org.apache.zookeeper.KeeperException$NodeExist

CloudDescription sometimes null

2016-04-13 Thread Markus Jelsma
Hello - we use CloudDescriptor to get information about the collection. Very early after starting Solr, we obtain an instance: cloudDescriptor = core.getCoreDescriptor().getCloudDescriptor(); In some strange cases, at some later point cloudDescriptor is null. Is it possible cloudDescriptor is

Re: boost parent fields BlockJoinQuery

2016-04-13 Thread michael solomon
Thank you for the response. This query worked without errors: > (city:"tucson"^1000) +{!parent which="is_parent:true" > score=max}(normal_text:"silver ring") > However, this is not exactly what I was looking for. I got from Solr all the documents whose city field has the value Tucson, but I

Re: Configuring F5 load balancer with Solr cloud to switch between clusters on separate servers

2016-04-13 Thread Shawn Heisey
On 4/13/2016 4:29 AM, preeti kumari wrote: > I have a solr cloud setup. > Two clusters - Primary and Secondary. Each cluster has two collections. > Primary cluster is hosted on Solr Node N1, N2, N3 > Secondary Cluster is hosted on Solr Nodes N3, N4, N5 . > All solr nodes are on different physica

Re: Cache problem

2016-04-13 Thread Shawn Heisey
On 4/13/2016 4:34 AM, Bastien Latard - MDPI AG wrote: > Thank you all again for your good and detailed answer. > I will combine all of them to try to build a better environment. > > *Just a last question...* > /I don't remember exactly when I needed to increase the java heap.../ > /but is it possib

Re: Solr document duplicated during pagination

2016-04-13 Thread Shawn Heisey
On 4/13/2016 4:57 AM, Anil wrote: > Yes Erick. > > I have the attached the queries generated from logs. > > i see many duplicate records :( . i could not see any duplicates on > solr admin console. > > Each run giving different number of duplicates. > > Do you think Not (-) on query is an issue? pl

number of zookeeper & aws instances

2016-04-13 Thread Jay Potharaju
Hi, In my current setup I have about 30 million docs, which will grow to 100 million by the end of the year. In order to accommodate scaling and query load, I am planning to have at least 2 shards and 2/3 replicas to begin with. With the above SolrCloud setup I plan to have 3 ZooKeepers in the quoru

Adding custom filter plugin to solr cloud

2016-04-13 Thread Harsha JSN
Hi, I have set up SolrCloud with a set of nodes. I am trying to add an external library which has custom query parser logic. I have done this by copying the custom jar file to the lib folder on each node. May I know if this is the correct way to do it, or is there a standard way to add custom librarie

Re: Adding custom filter plugin to solr cloud

2016-04-13 Thread Erick Erickson
Copying files "to the right place" certainly is one way. It does suffer from bookkeeping issues, i.e. when you make changes you have to get the new jar files all pushed out to the right place on all the nodes. Another possibility is: https://cwiki.apache.org/confluence/display/solr/Adding+Custom+P

Re: number of zookeeper & aws instances

2016-04-13 Thread Erick Erickson
For collections with this few nodes, 3 zookeepers are plenty. From what I've seen people don't go to 5 zookeepers until they have hundreds and hundreds of nodes. 100M docs can fit on 2 shards, I've actually seen many more. That said, if the docs are very large and/or the searches are complex perf

Re: number of zookeeper & aws instances

2016-04-13 Thread Jay Potharaju
Thanks for the feedback, Erick. I am assuming the number of replicas helps with load balancing and reliability. That being said, are there any recommendations for that, or is it dependent on query load and performance SLAs? Any suggestions on AWS setup? Thanks > On Apr 13, 2016, at 7:12 AM, Erick Er

Re: CloudDescription sometimes null

2016-04-13 Thread Erick Erickson
It takes a little time for core discovery to enumerate all of the cores and fill in the various descriptors. That said, I'd be surprised if you actually can hit this very often since the coreDescriptor creation code also creates the cloudDescriptor and they're both loaded by the enumeration process

Re: number of zookeeper & aws instances

2016-04-13 Thread Erick Erickson
bq: or is it dependent on query load and performance sla's Exactly. The critical bit is that every single replica meets your SLA. By that I mean let's claim that your SLA is 500ms. If you can serve 10 QPS at that SLA with one replica/shard (i.e. leader only) you can serve 50 QPS by adding 4 more
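The arithmetic in this reply can be sketched in a few lines of Python. This is only an illustration: it assumes throughput scales linearly with the number of replicas per shard, which the message implies but a real cluster only approximates, and the function name `replicas_per_shard` is hypothetical.

```python
import math


def replicas_per_shard(target_qps: float, qps_per_replica: float) -> int:
    """Replicas needed per shard, ASSUMING QPS scales linearly with replicas."""
    if qps_per_replica <= 0:
        raise ValueError("qps_per_replica must be positive")
    # Always keep at least one replica (the leader) per shard.
    return max(1, math.ceil(target_qps / qps_per_replica))


# Figures from the message: one replica per shard serves 10 QPS within the SLA.
total = replicas_per_shard(target_qps=50, qps_per_replica=10)
print(f"{total} replicas per shard, i.e. {total - 1} in addition to the leader")
```

The per-replica figure has to be measured at the SLA first; the calculation is only as good as that measurement.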

RE: CloudDescription sometimes null

2016-04-13 Thread Markus Jelsma
Hello, That core.getCoreDescriptor().getCloudDescriptor(); piece of code runs in RequestHandlerBase.inform(core) so maybe that is too early. I'll add an NPE check and reset cloudDescriptor if it is missing. Thanks! Markus -Original message- > From:Erick Erickson > Sent: Wednesday 13th Apri

Re: Which line is solr following in terms of a BI Tool?

2016-04-13 Thread Kevin Risden
For Solr 6, ParallelSQL and Solr JDBC driver are going to be developed more as well as JSON facets. The Solr JDBC driver that is in Solr 6 contains SOLR-8502. There are further improvements coming in SOLR-8659 that didn't make it into 6.0. The Solr JDBC piece leverages ParallelSQL and in some cases

update log not in ACTIVE or REPLAY state

2016-04-13 Thread michael dürr
Hello, when I launch my two nodes in SolrCloud, I always get the following error on node2: PeerSync: core=portal_shard1_replica2 url=http://127.0.1.1:8984/solr ERROR, update log not in ACTIVE or REPLAY state. FSUpdateLog{state=BUFFERING, tlog=null} Actually, I do not experience any problems, bu

Re: Which line is solr following in terms of a BI Tool?

2016-04-13 Thread Pablo Anzorena
Thank you very much both of you for your insights! I really appreciate it. 2016-04-13 11:30 GMT-03:00 Kevin Risden : > For Solr 6, ParallelSQL and Solr JDBC driver are going to be developed more > as well as JSON facets. The Solr JDBC driver that is in Solr 6 contains > SOLR-8502. There are fur

Solrj: SystemInfoRequest fails if no default collection specified.

2016-04-13 Thread Iana Bondarska
Hi All, I'm trying to get solr version via solrj api. If I try to use SystemInfoRequest without specifying collection -- I'm getting an error "No collection is set and no default collection specified". Could you tell me please, is there any way to get solr version without specifying collection? Th

Querying of multiple string value

2016-04-13 Thread Zheng Lin Edwin Yeo
Hi, Would like to find out, is there any way to do a multiple value query of a field that is of type String, besides using the OR parameters? Currently, I am using the OR parameters like http://localhost:8983/solr/collection1/highlight?q=id:collection1_0001 OR id:collection1_0002 But this will g

Re: Solrj: SystemInfoRequest fails if no default collection specified.

2016-04-13 Thread Shawn Heisey
On 4/13/2016 9:01 AM, Iana Bondarska wrote: > I'm trying to get solr version via solrj api. If I try to use > SystemInfoRequest without specifying collection -- I'm getting an error "No > collection is set and no default collection specified". > Could you tell me please, is there any way to get sol

RE: number of zookeeper & aws instances

2016-04-13 Thread Garth Grimm
I thought that if you start with 3 Zk nodes in the ensemble, and only lose 1, it will have no effect on indexing at all, since you still have a quorum. If you lose 2 (which takes you below quorum), then the cloud loses "confidence" in which solr core is the leader of each shard and stops indexin

Re: number of zookeeper & aws instances

2016-04-13 Thread Daniel Collins
Just to chip in, more ZKs are probably only necessary if you are doing NRT indexing. Loss of a single ZK (in a 3 machine setup) will block indexing for the time it takes to get that machine/instance back up, however it will have less impact on search, since the search side can use the existing sta

Re: number of zookeeper & aws instances

2016-04-13 Thread Shawn Heisey
On 4/13/2016 9:34 AM, Daniel Collins wrote: > Just to chip in, more ZKs are probably only necessary if you are doing NRT > indexing. > > Loss of a single ZK (in a 3 machine setup) will block indexing for the time > it takes to get that machine/instance back up That would NOT block indexing. If yo

Re: number of zookeeper & aws instances

2016-04-13 Thread Daniel Collins
Yeah, sorry, my maths was clearly flawed today, thanks for correcting me Shawn. What I meant was in a 3 ZK setup, if you lose one machine, you are okay, but you are also "at risk", since losing anything else would lose quorum. So in our NRT-style scenario, we would have to get that dead machine ba
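The quorum arithmetic behind this correction can be written out as a short Python sketch. It relies only on the majority rule that ZooKeeper uses by default (a strict majority of the ensemble must be up); the function names are hypothetical.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of a ZooKeeper ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many ensemble members can be lost while a quorum still exists."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(f"{n} nodes: quorum={quorum_size(n)}, can lose {tolerated_failures(n)}")
```

With 3 nodes one loss is survivable but leaves no further margin, which is the "at risk" state described in the message; 4 nodes tolerate no more failures than 3, which is why odd ensemble sizes are the usual choice.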

Re: Solrj: SystemInfoRequest fails if no default collection specified.

2016-04-13 Thread Iana Bondarska
Yes, it's SolrCloud, version 4.8. The SolrJ version is 5.4.1. Sorry, yes, there is no SystemInfoRequest; I'm sending new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/info/system", new MapSolrParams(ImmutableMap.of())) 2016-04-13 18:38 GMT+03:00 Shawn Heisey : > On 4/13/2016 9:01

Committing with no updates

2016-04-13 Thread Robert Brown
Hi, My autoSoftCommit is set to 1 minute. Does this actually affect things if no documents have actually been updated/created? Will this also affect the clearing of any caches? Is this also the same for hard commits, either with autoCommit or making an explicit http request to commit. Th

Re: Committing with no updates

2016-04-13 Thread Chris Hostetter
the autoCommit settings initialize trackers so that they only fire after some updates have been made -- don't think of it as a cron that fires every X seconds, think of it as an update monitor that triggers timers. if an update comes in, and there are no timers currently active, a timer is cr
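The "update monitor that triggers timers" idea can be modelled with a toy Python class. This is not Solr's implementation, only a sketch of the behaviour described in the message: the class name `CommitTracker`, its methods, and the use of a plain number as a clock are all assumptions made for illustration.

```python
from typing import Optional


class CommitTracker:
    """Toy model: a timer is armed by an update, never by the clock alone."""

    def __init__(self, max_time: float) -> None:
        self.max_time = max_time
        self.deadline: Optional[float] = None  # None means no timer is active

    def on_update(self, now: float) -> None:
        # Arm a timer only if none is currently pending.
        if self.deadline is None:
            self.deadline = now + self.max_time

    def tick(self, now: float) -> bool:
        """Return True if a commit fires at time `now`, clearing the timer."""
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            return True
        return False


tracker = CommitTracker(max_time=60)
print(tracker.tick(now=120))  # no update has arrived, so nothing can fire
tracker.on_update(now=130)    # first update arms a timer for t=190
tracker.on_update(now=150)    # a timer is already pending, deadline unchanged
print(tracker.tick(now=189))
print(tracker.tick(now=190))
print(tracker.tick(now=300))  # no new updates since the commit
```

In this model an idle index never commits, because nothing arms the timer, which matches the point of the reply that the setting is not a cron job.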

How to search for a First, Last of contact which are stored in different multivalued fields

2016-04-13 Thread Thrinadh Kuppili
Hi, I have created 2 multivalued fields, FirstName and LastName. In Solr the available values are: "FirstName": [ "Kim", "Jake", "NATALIE", "Tammey"] "LastName": [ "Lara", "Sharan", "Taylor", "Taylor"] I am trying to search where FirstName is Tammey and LastName is Taylor. I should be able to sea

Re: Querying of multiple string value

2016-04-13 Thread Shawn Heisey
On 4/13/2016 9:25 AM, Zheng Lin Edwin Yeo wrote: > Would like to find out, is there any way to do a multiple value query of a > field that is of type String, besides using the OR parameters? > > Currently, I am using the OR parameters like > http://localhost:8983/solr/collection1/highlight?q=id:col

Certain Plurals Not Working SOLR v3.6

2016-04-13 Thread Sara Woodmansee
Hello all, We are having a website built with a search engine based on SOLR v3.6. We submit terms (keywords) as singulars, and the code is supposed to add (search for) plurals. Most search terms work, but we have an issue with plural terms that end in -ee, -oe, -ie, -ae, and words that end i

Re: Solrj: SystemInfoRequest fails if no default collection specified.

2016-04-13 Thread Shawn Heisey
On 4/13/2016 10:15 AM, Iana Bondarska wrote: > yes, it's solr_cloud, version is 4.8. Solrj version is 5.4.1.sorry, yes, > there are no systeminforequest, I'm sending > new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/info/system", > new MapSolrParams(ImmutableMap.of())) Side note

Re: How to search for a First, Last of contact which are stored in different multivalued fields

2016-04-13 Thread Erick Erickson
multivalued fields (parallel) simply do not support this type of search. You'd have to do something like index kim_lara, jake_sharan, natalie_taylor etc then search for those terms. A more general option is to put these in a multiValued field "name", with a positionIncrementGap of, say, 10. Now se
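The first suggestion in this reply (index combined tokens such as kim_lara and search for those) amounts to pairing the two parallel lists before indexing. A minimal Python sketch follows, using the sample values from the original question; the function name `paired_names` and the choice to lowercase are assumptions, and how the resulting tokens are fed to Solr is left out.

```python
from typing import List


def paired_names(first_names: List[str], last_names: List[str]) -> List[str]:
    """Join the i-th first name with the i-th last name as one token."""
    if len(first_names) != len(last_names):
        raise ValueError("parallel lists must have the same length")
    return [f"{first}_{last}".lower() for first, last in zip(first_names, last_names)]


first = ["Kim", "Jake", "NATALIE", "Tammey"]
last = ["Lara", "Sharan", "Taylor", "Taylor"]

tokens = paired_names(first, last)
print(tokens)
print("tammey_taylor" in tokens)  # a real first/last pair
print("kim_taylor" in tokens)     # values from two different contacts
```

Because each token carries one contact's first and last name together, a match on tammey_taylor cannot be produced by mixing values from different positions, which is the cross-matching that two independent multivalued fields allow.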

Re: How to search for a First, Last of contact which are stored in different multivalued fields

2016-04-13 Thread Ahmet Arslan
Hi Thrinadh, I think you can pull something together with FieldMaskingSpanQuery http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html Ahmet On Wednesday, April 13, 2016 8:24 PM, Thrinadh Kuppili wrote: Hi, I have created 2 multivalued fields FirstName, Lastname In s

Re: How to search for a First, Last of contact which are stored in different multivalued fields

2016-04-13 Thread Jack Krupansky
I was also going to point out the field masking span query, but... also that it is at the Lucene level and not surfaced in Solr: http://lucene.apache.org/core/6_0_0/core/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html Also see: https://issues.apache.org/jira/browse/LUCENE-1494 But no hi

Get number of results in filtered query

2016-04-13 Thread Fundera Developer
Hi all, we are developing a search engine in which all the possible results have one or more countries associated. If, apart from writing the query, the user selects a country, we use a filterquery to restrict the results to those that match the query and are associated to that country. Nothin

Problem retaining PDF text

2016-04-13 Thread Alan G Quan
I am indexing PDF documents in Solr 5.3.0 like this: curl "http://localhost:8983/solr/mycore1/update/extract?literal.id=101&commit=true" -F "myfile=@101.pdf". This works fine and I can search for keywords in the PDF text in Solr and it finds the document correctly. But when I make any subseque

Re: Get number of results in filtered query

2016-04-13 Thread Jack Krupansky
If you just do a faceted query without the filter, each facet will give you the number of results for that country and numFound will give you the total number of results across all countries. But once you apply one or more filters, numFound reflects only the post-filtering documents. -- Jack Kr

Re: Problem retaining PDF text

2016-04-13 Thread Alexandre Rafalovitch
An atomic update has to reload the content of all _other_ fields to reconstruct the full document before putting it back into the Lucene index. That's because Lucene does not support 'update', so every update actually deletes the original and recreates it. The problem is that your PDF text is probably n

Re: Get number of results in filtered query

2016-04-13 Thread Alexandre Rafalovitch
Sounds like a job for tagging and excluding: https://cwiki.apache.org/confluence/display/solr/Faceting?focusedCommentId=42569404#Faceting-TaggingandExcludingFilters Regards, Alex. Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ On 14 April 2016 a
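For readers who have not used tagging and excluding, the request it produces can be sketched by building the query parameters in Python. The `{!tag=...}` and `{!ex=...}` local params are Solr's documented syntax; the field name `country`, the tag name `ctry`, the sample values, and the helper name `build_params` are assumptions made for this illustration.

```python
from typing import List, Tuple
from urllib.parse import urlencode


def build_params(user_query: str, country: str) -> List[Tuple[str, str]]:
    """Filter by country, but compute country facet counts without that filter."""
    return [
        ("q", user_query),
        # Tag the filter so it can be referred to by name.
        ("fq", "{!tag=ctry}country:" + country),
        ("facet", "true"),
        # Exclude the tagged filter when counting this facet only.
        ("facet.field", "{!ex=ctry}country"),
    ]


params = build_params("solar panels", "ES")
print(urlencode(params))
```

The result list is still restricted to the selected country, while the country facet reports how many matches each country would have had without the filter. A real client would also need to escape or quote the country value.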

Re: How to search for a First, Last of contact which are stored in different multivalued fields

2016-04-13 Thread Thrinadh Kuppili
I will look into it. Thanks.

Re: How to search for a First, Last of contact which are stored in different multivalued fields

2016-04-13 Thread Thrinadh Kuppili
Thanks Erick, I tried indexing with one multivalued field called name, but I am not getting the expected result. Below are the stored data and the query used to fetch from Solr: Name: James_Cook fq = name:Jam* I should be able to search using the first name (james), the last name (cook), or partial names (jam o

Re: How to search for a First, Last of contact which are stored in different multivalued fields

2016-04-13 Thread Erick Erickson
String fields are unanalyzed so this simply will not work. String fields are case sensitive, don't split on whitespace or non-letters etc. You might get significant mileage out of the admin/analysis page to see exactly what tokenizations occur. Your original question implied you were looking for c

Re: Querying of multiple string value

2016-04-13 Thread Zheng Lin Edwin Yeo
Hi Shawn, Thanks for the reply. It works. Regards, Edwin On 14 April 2016 at 01:40, Shawn Heisey wrote: > On 4/13/2016 9:25 AM, Zheng Lin Edwin Yeo wrote: > > Would like to find out, is there any way to do a multiple value query of > a > > field that is of type String, besides using the OR pa

Optimal indexing speed in Solr

2016-04-13 Thread Zheng Lin Edwin Yeo
Hi, Would like to find out, what is the optimal indexing speed in Solr? Previously, I managed to get more than 3GB/hour, but now the speed has dropped to 0.7GB/hour. What could be the potential reason behind this? Besides the index size getting bigger, I have only added in more collections into the c

How to declare field type for IntPoint field in solr 6.0 schema?

2016-04-13 Thread Rafis Ismagilov
Should it be PointType, BinaryField, or something else? All examples use TrieIntField for int. Thanks, Rafis