Schema/Index design for disparate data sources (Federated / Google like search)

2015-12-22 Thread Susheel Kumar
Hello, I am going through a few use cases where we have multiple disparate data sources which in general don't have many common fields, and I was thinking of designing a different schema/index/collection for each of them, querying each of them separately, and providing different result sets to the cli

Re: Schema/Index design for disparate data sources (Federated / Google like search)

2015-12-22 Thread Susheel Kumar
s fields which have values, so no > storage or performance is consumed when you have a lot of fields which are > not present for a particular data source. > > -- Jack Krupansky > > On Tue, Dec 22, 2015 at 11:25 AM, Susheel Kumar > wrote: > > > Hello, > > >

Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Susheel Kumar
Hi Bruno, I just tested this scenario in my local Solr 5.3.1 and it returned results from two identical collections. I doubt it is broken in 5.4; just double-check that you are not missing anything else. Thanks, Susheel http://localhost:8983/solr/c1/select?q=id_type%3Ahello&wt=json&indent=true&c

Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Susheel Kumar
cc_CA":"LAPSED", > "cc_EP":"LAPSED", > "cc_JP":"PENDING", > "cc_US":"LAPSED", > "fid":"34520196"}] > }} > > > I have the same xxx.xxx.xxx.xxx:xxx

Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Susheel Kumar
handler, is it always present in 5.4 version ? > > > Le 06/01/2016 17:38, Bruno Mannina a écrit : > >> I have a dev' server, I will do some test on it... >> >> Le 06/01/2016 17:31, Susheel Kumar a écrit : >> >>> I'll suggest if you can setup som

Re: collapse filter query

2016-01-11 Thread Susheel Kumar
You can go to https://issues.apache.org/jira/browse/SOLR/ and create Jira ticket after signing in. Thanks, Susheel On Mon, Jan 11, 2016 at 2:15 PM, sara hajili wrote: > Tnx.How I can create a jira ticket? > On Jan 11, 2016 10:42 PM, "Joel Bernstein" wrote: > > > I believe this is a bug. I thin

Re: Returning all documents in a collection

2016-01-20 Thread Susheel Kumar
Hello Salman, Please checkout the export functionality https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets Thanks, Susheel On Wed, Jan 20, 2016 at 6:57 AM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: > Hi Salman, > You should use cursors in order to avoid "deep pag

Re: collection aliasing

2016-01-22 Thread Susheel Kumar
Hi Vidya, if i understood your question correctly you can simply use the original collection name(s) to point to individual collections. Isn't that the case? Thanks, Susheel On Fri, Jan 22, 2016 at 8:10 AM, vidya wrote: > Hi > > I wanted to mainatain two sets of indexes or collections for maint

Re: Mix Solr 4 and 5?

2016-01-23 Thread Susheel Kumar
Just to share: one of our developers noticed issues when trying to use SolrJ 4.10.x against Solr 5.4.0 with chroot enabled. After she upgraded to SolrJ 5.3.1, it worked. Thanks, Susheel On Fri, Jan 22, 2016 at 11:20 AM, Jack Krupansky wrote: > To be clear, having separate Solr servers on differen

Re: collection aliasing

2016-01-24 Thread Susheel Kumar
As Jens mentioned, you use aliasing to refer to a group of collections. E.g. with the command below you can create an alias called quarterly for 3 separate collections Jan, Feb & Mar, and then use the alias quarterly to refer to all of them in a single query http:// :8983/solr/admin/collections?action=CREA
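
The command in the post is cut off after `action=CREA`. Assuming it refers to the Collections API `CREATEALIAS` action with its `name` and `collections` parameters, a minimal sketch of building such a URL could look like this. The `localhost` base URL is a placeholder, not taken from the post.

```python
from urllib.parse import urlencode


def create_alias_url(base_url, alias, collections):
    """Build a Collections API URL that creates an alias over several collections."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        # The alias target is a comma-separated list of collection names.
        "collections": ",".join(collections),
    }
    # Keep commas unescaped so the URL stays readable.
    return f"{base_url}/admin/collections?{urlencode(params, safe=',')}"


# Placeholder host; the collection names follow the example in the post.
alias_url = create_alias_url("http://localhost:8983/solr", "quarterly", ["Jan", "Feb", "Mar"])
print(alias_url)
```

Once the alias exists, queries can address `quarterly` in place of a collection name.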

Re: Solrcloud error on finding active nodes.

2016-01-27 Thread Susheel Kumar
Hi, I haven't seen this error before, but which version of Solr are you using, and I assume ZooKeeper is configured correctly? Do you see nodes down/active/leader etc. under Cloud in the Admin UI? Thanks, Susheel On Wed, Jan 27, 2016 at 11:51 AM, Pranaya Behera wrote: > Hi, > I have created one sol

Re: Memory leak defect or misuse of SolrJ API?

2016-01-30 Thread Susheel Kumar
Hi Steve, Can you please elaborate on what error you are getting? Also, I didn't understand why your code above initiates the Solr client object inside the loop. In general, creating the client instance should happen outside the loop, as a one-time activity for the complete execution of the program. Thanks, Sus

Re: URI is too long

2016-02-01 Thread Susheel Kumar
POST is pretty much similar to GET. You can use any REST client to try it: use the same select URL, pass the header below, and put the query parameters into the body. POST: http://localhost:8983/solr/techproducts/select Header == Content-Type:application/x-www-form-urlencoded payload/body: == q=*:*&rows=2 Than
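
The recipe above can be sketched with Python's standard library. This only builds the request object; actually sending it assumes a Solr instance with the `techproducts` collection is running at the URL from the post.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_select_post(select_url, params):
    """Build a POST request that carries the query parameters in the body."""
    body = urlencode(params).encode("ascii")
    return Request(
        select_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_select_post(
    "http://localhost:8983/solr/techproducts/select",
    {"q": "*:*", "rows": 2},
)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
# To send it against a running Solr: urllib.request.urlopen(req)
```

Because the parameters travel in the body, the URL stays short no matter how long the query grows.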

Re: sorry, no dataimport-handler defined!

2016-02-01 Thread Susheel Kumar
Please register Data Import Handler to work with it https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler On Mon, Feb 1, 2016 at 2:31 PM, Jean-Jacques MONOT wrote: > Hello > > I am using SOLR 5.4.1 and the graphical admin UI. > > I su

Re: catch alls and nuances

2016-02-02 Thread Susheel Kumar
Hi John - You can take a closer look at the different options of WordDelimiterFilterFactory at https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory to see if they meet your requirement, and use the Analysis tab in the Solr Admin UI. If you still have questions, you can sha

Re: Solr for real time analytics system

2016-02-04 Thread Susheel Kumar
Hi Rohit, Please take a look at Streaming Expressions & the Parallel SQL Interface. They should meet many of your analytics requirements (aggregation queries like sum/average/group by etc.). https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions https://cwiki.apache.org/confluence/displa

Re: solr performance issue

2016-02-08 Thread Susheel Kumar
1 million documents shouldn't cause any issues at all. Something else is wrong with your hw/system configuration. Thanks, Susheel On Mon, Feb 8, 2016 at 6:45 AM, sara hajili wrote: > On Mon, Feb 8, 2016 at 3:04 AM, sara hajili wrote: > > > sorry i made a mistake i have a bout 1000 K doc. > > i

Re: Data Import Handler - autoSoftCommit and autoCommit

2016-02-08 Thread Susheel Kumar
You can start with one of the suggestions from this link based on your indexing and query load. https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Thanks, Susheel On Mon, Feb 8, 2016 at 10:15 AM, Troy Edwards wrote: > We are running the

Re: Solr architecture

2016-02-08 Thread Susheel Kumar
Also, are you expecting the indexing of 2 billion docs to be NRT, or will it be offline (during off hours etc.)? For more accurate sizing you may also want to index, say, 10 million documents, which may give you an idea of how large your index is, and then use that for extrapolation to come up with memory r
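
The extrapolation suggested above is simple arithmetic. The sketch below assumes the index grows roughly linearly with document count, which is only an approximation, and the 12.5 GB sample size is a made-up figure for illustration.

```python
def extrapolate_index_size_gb(sample_docs, sample_size_gb, target_docs):
    """Scale the measured size of a sample index up to the target document count."""
    return sample_size_gb * (target_docs / sample_docs)


# Hypothetical measurement: 10 million sample docs took 12.5 GB on disk.
estimated_gb = extrapolate_index_size_gb(10_000_000, 12.5, 2_000_000_000)
print(f"estimated index size: {estimated_gb:.0f} GB")
```

The estimate is a starting point for sizing memory and disk, and is worth re-checking with a larger sample.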

Re: Bulk delete of Solr documents

2016-02-08 Thread Susheel Kumar
Yes, use below url http://localhost:8983/solr//update?stream.body= *:*&commit=true On Mon, Feb 8, 2016 at 11:33 AM, Anil wrote: > Hi , > > Can we delete solr documents from a collection in a bulk ? > > Regards, > Anil >
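
The URL in the post appears to have lost its XML markup in archiving. Assuming the intended body was Solr's standard delete-by-query message, `<delete><query>*:*</query></delete>`, the same operation can be sketched as a POST to the update handler. The collection name `mycollection` is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request
from xml.sax.saxutils import escape


def build_delete_by_query(update_url, query):
    """Build a POST request that deletes every document matching the query."""
    body = f"<delete><query>{escape(query)}</query></delete>".encode("utf-8")
    url = f"{update_url}?{urlencode({'commit': 'true'})}"
    return Request(url, data=body, headers={"Content-Type": "text/xml"}, method="POST")


# "*:*" matches all documents, so this empties the (placeholder) collection.
delete_req = build_delete_by_query("http://localhost:8983/solr/mycollection/update", "*:*")
print(delete_req.full_url)
print(delete_req.data.decode("utf-8"))
# To send it against a running Solr: urllib.request.urlopen(delete_req)
```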

Re: Solr 4.10 with Jetty 8.1.10 & Tomcat 7

2016-02-09 Thread Susheel Kumar
Shahzad - I am curious what features of distributed search stop you from running SolrCloud. Using DS, you would be able to search across cores or collections. https://cwiki.apache.org/confluence/display/solr/Advanced+Distributed+Request+Options Thanks, Susheel On Tue, Feb 9, 2016 at 12:10 AM, Shahzad

Re: Solr 4.10 with Jetty 8.1.10 & Tomcat 7

2016-02-09 Thread Susheel Kumar
Shahzad - As Shawn mentioned, you can get a lot of input from the folks who are using joins in SolrCloud if you start a new thread, and I would suggest taking a look at Solr Streaming Expressions and the Parallel SQL Interface, which cover joining use cases as well. Thanks, Susheel On Tue, Feb 9, 2016

Re: Adding nodes

2016-02-14 Thread Susheel Kumar
Hi Paul, Shawn is referring to using the Collections API https://cwiki.apache.org/confluence/display/solr/Collections+API rather than the Core Admin API https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API for SolrCloud. Hope that clarifies. You also mentioned ADDREPLICA, which is the collections AP

Re: Adding nodes

2016-02-14 Thread Susheel Kumar
of autoscale scenarios where a node has > gone down or more nodes are needed to handle load. > > > The coreadmin api makes this easy. The collections api (ADDREPLICA), > makes this very difficult. > > > On 2/14/16, 8:19 AM, "Susheel Kumar" wrote: &

Re: Need to move on SOlr cloud (help required)

2016-02-15 Thread Susheel Kumar
In SolrJ, you would use CloudSolrClient which interacts with Zookeeper (which maintains Cluster State). See CloudSolrClient API. So that's how SolrJ would know which node is down or not. Thanks, Susheel On Mon, Feb 15, 2016 at 12:07 AM, Midas A wrote: > Erick, > > We are using php for our app

Re: Adding nodes

2016-02-15 Thread Susheel Kumar
assumptions, as I'm new > to solr. > > Thanks, > Paul > > > On Feb 14, 2016, at 10:35 AM, Susheel Kumar > wrote: > > > > Hi Pual, > > > > > > For Auto-scaling, it depends on how you are thinking to design and > what/how > > do you

Re: Running solr as a service vs. Running it as a process

2016-02-17 Thread Susheel Kumar
In addition you also get many advantages like you can start/stop/restart solr using "service solr stop|start|restart" as mentioned above. You don't need to launch solr script directly. Also the install scripts take care of installing/setting up Solr nicely for Production environment. Even you can

Re: OutOfMemory when batchupdating from SolrJ

2016-02-19 Thread Susheel Kumar
When you run your SolrJ client indexing program, can you increase the heap size as shown below? I guess it may be on your client side that you are running into OOM... or please share the exact error if the below doesn't work/isn't the issue. java -Xmx4096m Thanks, Susheel On Fri, Feb 19, 2016 at 6:25 AM,

Re: OutOfMemory when batchupdating from SolrJ

2016-02-19 Thread Susheel Kumar
And if it is on Solr side, please increase the heap size on Solr side https://cwiki.apache.org/confluence/display/solr/JVM+Settings On Fri, Feb 19, 2016 at 8:42 AM, Susheel Kumar wrote: > When you run your SolrJ Client Indexing program, can you increase heap > size similar below. I gu

Re: OutOfMemory when batchupdating from SolrJ

2016-02-19 Thread Susheel Kumar
, Feb 19, 2016 at 9:17 AM, Clemens Wyss DEV wrote: > > increase heap size > this is a "workaround" > > Doesn't SolrClient free part of its buffer? At least documents it has sent > to the Solr-Server? > > -----Ursprüngliche Nachricht- > Von: Susheel Kumar

Re: OutOfMemory when batchupdating from SolrJ

2016-02-19 Thread Susheel Kumar
Hope that clarifies. On Fri, Feb 19, 2016 at 12:11 PM, Clemens Wyss DEV wrote: > Thanks Susheel, > but I am having problems in and am talking about SolrJ, i.e. the > "client-side of Solr" ... > > -Ursprüngliche Nachricht- > Von: Susheel Kumar [mailto:susheel2.

Re: Slow commits

2016-02-22 Thread Susheel Kumar
Adam - how many documents do you have in your index? Thanks, Susheel On Mon, Feb 22, 2016 at 4:37 AM, Adam Neal [Extranet] wrote: > Well I got the numbers wrong, there are actually around 66000 fields on > the index. I have restructured the index and there are now around 1500 > fields. This has r

Re: Slow commits

2016-02-22 Thread Susheel Kumar
Sorry, I see now you mentioned 56K docs which is pretty small. On Mon, Feb 22, 2016 at 8:30 AM, Susheel Kumar wrote: > Adam - how many documents you have in your index? > > Thanks, > Susheel > > On Mon, Feb 22, 2016 at 4:37 AM, Adam Neal [Extranet] > wrote: > >>

Re: SOLR cloud startup pointing to zookeeper ensemble

2016-02-23 Thread Susheel Kumar
Use this syntax and see if it works. bin/solr start -e cloud -noprompt -z localhost:2181,localhost:2182,localhost:2183 On Mon, Feb 22, 2016 at 11:16 PM, bbarani wrote: > I downloaded the latest version of SOLR (5.5.0) and also installed > zookeeper > on port 2181,2182,2183 and its running fine.

Re: SOLR cloud startup pointing to zookeeper ensemble

2016-02-24 Thread Susheel Kumar
I see your point. I didn't realize that you are using Windows. If it works using double quotes, please go ahead and launch it that way. Thanks, Susheel On Wed, Feb 24, 2016 at 12:44 PM, bbarani wrote: > Its still throwing error without quotes. > > solr start -e cloud -noprompt -z > localhost:2181,lo

Re: Indexing Twitter - Hypothetical

2016-03-06 Thread Susheel Kumar
Entity recognition means you may want to recognize different entities (name/person, email, location/city/state/country etc.) in your tweets/messages with the goal of providing more relevant results to users. NER can be used at query or indexing (data enrichment) time. Thanks, Susheel On Fri, Mar 4,

Re: Solr Queries are very slow - Suggestions needed

2016-03-14 Thread Susheel Kumar
For each of the solr machines/shards you have. Thanks. On Mon, Mar 14, 2016 at 10:04 AM, Susheel Kumar wrote: > Hello Anil, > > Can you go to Solr Admin Panel -> Dashboard and share all 4 memory > parameters under System / share the snapshot. ? > > Thanks, > Susheel >

Re: Solr Queries are very slow - Suggestions needed

2016-03-14 Thread Susheel Kumar
Hello Anil, Can you go to Solr Admin Panel -> Dashboard and share all 4 memory parameters under System / share the snapshot. ? Thanks, Susheel On Mon, Mar 14, 2016 at 5:36 AM, Anil wrote: > HI Toke and Jack, > > Please find the details below. > > * How large are your 3 shards in bytes? (total

Re: Solr Queries are very slow - Suggestions needed

2016-03-14 Thread Susheel Kumar
thing like > >> >> > >> >> solr/files_shard1_replica1/query?(your queryhere)&distrib=false > >> >> > >> >> My bet is that your QTime will be significantly different with the > two > >> >> shards. > >> >> > >>

Re: Indexing 700 docs per second

2016-04-19 Thread Susheel Kumar
It sounds achievable with your machine configuration, and I would suggest trying out atomic updates. Use SolrJ with multi-threaded indexing for a higher indexing rate. Thanks, Susheel On Tue, Apr 19, 2016 at 9:27 AM, Tom Evans wrote: > On Tue, Apr 19, 2016 at 10:25 AM, Mark Robinson > wrote: >

ReversedWildcardFilterFactory question

2016-05-04 Thread Susheel Kumar
Hello, I wanted to confirm that using the type below for fields where the user *may also* search with a leading wildcard is a good solution, and that the edismax query parser would automatically reverse the query string in case of a leading wildcard search, e.g. q:"text:*plane" would automatically be reversed by edismax
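
To make the idea behind the question concrete, here is a conceptual sketch of why indexing reversed tokens helps with leading wildcards: a suffix match on the original token becomes a prefix match on the reversed one. It is plain Python, not Solr code; the `\u0001` marker is an assumption about how reversed tokens are distinguished from normal ones, and the sketch does not answer whether edismax performs the rewrite automatically.

```python
MARKER = "\u0001"  # assumed marker that tags a token as reversed


def reversed_token(token):
    """Index-time form: the token reversed, prefixed with the marker."""
    return MARKER + token[::-1]


def rewrite_leading_wildcard(term):
    """Query-time form: turn '*plane' into a prefix over reversed tokens."""
    if not term.startswith("*") or "*" in term[1:]:
        raise ValueError("expected a single leading wildcard, e.g. '*plane'")
    return MARKER + term[1:][::-1]


def matches(term, token):
    """A leading-wildcard match is a cheap prefix match on the reversed token."""
    return reversed_token(token).startswith(rewrite_leading_wildcard(term))


print(matches("*plane", "airplane"))
print(matches("*plane", "planet"))
```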

Solr cloud 6.0.0 with ZooKeeper 3.4.8 Errors

2016-05-04 Thread Susheel Kumar
Hello, I am trying to set up a 2-node SolrCloud 6 cluster with ZK 3.4.8 and used the install service to set up Solr. After launching the Solr Admin Panel on server1, it loses connection in a few seconds and then comes back, and the other node, server2, is marked as Down in the cloud graph. After few seconds its lo

Re: Solr cloud 6.0.0 with ZooKeeper 3.4.8 Errors

2016-05-04 Thread Susheel Kumar
iptor limit. > > On Wed, May 4, 2016 at 3:48 PM, Susheel Kumar > wrote: > > > Hello, > > > > I am trying to setup 2 node Solr cloud 6 cluster with ZK 3.4.8 and used > the > > install service to setup solr. > > > > After launching Solr Admin Panel on se

Re: Solr cloud 6.0.0 with ZooKeeper 3.4.8 Errors

2016-05-04 Thread Susheel Kumar
you have too many open files, try increasing the file > : > descriptor limit. > : > > : > On Wed, May 4, 2016 at 3:48 PM, Susheel Kumar > : > wrote: > : > > : > > Hello, > : > > > : > > I am trying to setup 2 node Solr cloud 6 cl

Re: Solr cloud 6.0.0 with ZooKeeper 3.4.8 Errors

2016-05-05 Thread Susheel Kumar
unstable state. Thanks, Susheel On Wed, May 4, 2016 at 9:56 PM, Susheel Kumar wrote: > Thanks, Nick & Hoss. I am using the exact same machine, have wiped out > solr 5.5.0 and installed solr-6.0.0 with external ZK 3.4.8. I checked the > File Description limit for user solr, which

Re: Solr cloud 6.0.0 with ZooKeeper 3.4.8 Errors

2016-05-05 Thread Susheel Kumar
5, 2016 2:05 PM, "Susheel Kumar" wrote: > > > Nick, Hoss - Things are back to normal with ZK 3.4.8 and ZK-6.0.0. I > > switched to Solr 5.5.0 with ZK 3.4.8 which worked fine and then installed > > 6.0.0. I suspect (not 100% sure) i left ZK dataDir / Solr collec

Re: understanding phonetic matching

2016-05-07 Thread Susheel Kumar
Jay, There are mainly three phonetic algorithms available in Solr, i.e. RefinedSoundex, DoubleMetaphone & BeiderMorse. We did an extensive comparison considering various test cases and found BeiderMorse to be the best among those for finding sound-alike matches, and it also supports multiple language

Re: Difficulties in getting Solrcloud running

2015-08-19 Thread Susheel Kumar
Use command like below to create collection http:// :/solr/admin/collections?action=CREATE&name=&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName= Susheel On Wed, Aug 19, 2015 at 11:03 AM, Kevin Lee wrote: > Hi, > > Have you created a collection yet? If not, then the
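
The host, port, collection name and config name are blank in the archived command. A sketch of assembling the same Collections API `CREATE` call, with placeholder values (`localhost:8983`, `mycollection`, `myconfig`) filled in purely for illustration:

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, config_name,
                          num_shards=2, replication_factor=2, max_shards_per_node=2):
    """Build a Collections API CREATE URL using the parameters from the post."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
        "collection.configName": config_name,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# All three arguments below are placeholders, not values from the thread.
create_url = create_collection_url("http://localhost:8983/solr", "mycollection", "myconfig")
print(create_url)
```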

Re: Solrcloud node is not coming up

2015-08-19 Thread Susheel Kumar
When you are adding a node, what exactly are you looking for that node to do? Are you adding the node to create a new replica? In that case you will call the ADDREPLICA Collections API. Thanks, Susheel On Wed, Aug 19, 2015 at 3:42 PM, Merlin Morgenstern < merlin.morgenst...@gmail.com> wrote: > I have a

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Susheel Kumar
For indexing 3.5 billion documents, you will run into bottlenecks not only with Solr but also at other places (data acquisition, Solr document object creation, submitting in bulk/batches to Solr). This will require parallelizing the operations at each of the above steps, which can get you

Re: How to close log when use the solrj api

2015-08-20 Thread Susheel Kumar
You may want to check the logging level using the Dashboard URL http://localhost:8983/solr/#/~logging/level; you can even set it there for the session. Otherwise you can look into server/resources/log4j.properties. Refer to https://cwiki.apache.org/confluence/display/solr/Configuring+Logging On Thu, Aug 20, 201

Re: Too many updates received since start

2015-08-22 Thread Susheel Kumar
You can try to follow the suggestions at the link below, which covers a similar issue, and see if that helps. http://lucene.472066.n3.nabble.com/ColrCloud-IOException-occured-when-talking-to-server-at-td4061831.html Thnx On Sat, Aug 22, 2015 at 9:05 AM, Yago Riveiro wrote: > Hi, > > Can anyone explain

Ant Ivy resolve / Authenticated Proxy Issue

2015-09-16 Thread Susheel Kumar
Hi, Sending it to the Solr group in addition to the Ivy group. I have been building Solr trunk ( http://svn.apache.org/repos/asf/lucene/dev/trunk/) using "ant eclipse" for quite some time, but this week I am on a job where things are behind the firewall and a proxy is used. Issue: When not in company n

Re: Ant Ivy resolve / Authenticated Proxy Issue

2015-09-16 Thread Susheel Kumar
Not really. There are no lock files, and even after cleaning up lock files (to be sure) the problem still persists. It works outside the company network but inside it gets stuck. Let me try to see if jconsole can show something meaningful. Thanks, Susheel On Wed, Sep 16, 2015 at 12:17 PM, Shawn Heisey wrote:

Re: How to check Zookeeper ensemble status?

2015-09-18 Thread Susheel Kumar
Additionally you may want to use the four-letter commands like stat etc. using nc or telnet http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html Thanks, Susheel On Fri, Sep 18, 2015 at 11:54 AM, Sameer Maggon wrote: > Have you tried zkServer.sh status? > > This will tell you whether zo
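
Instead of nc or telnet, the same four-letter command can be sent from a small script. This is a sketch that assumes the server answers `stat` and then closes the connection, and that the reply contains a `Mode:` line; the host and port in the usage comment are placeholders.

```python
import socket


def zk_four_letter(host, port, command="stat", timeout=5.0):
    """Send a four-letter command to a ZooKeeper server and return its reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(stat_output):
    """Extract the server role (leader, follower, standalone) from a stat reply."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


# Usage against a running ensemble member (placeholder address):
# print(parse_mode(zk_four_letter("localhost", 2181)))
```

Running it against each ensemble member in turn shows which node is the leader and whether every node responds.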

Re: Different ports for search and upload request

2015-09-24 Thread Susheel Kumar
I am not aware of such a feature in Solr but do want to know your use case / logic behind coming up with different ports. If it is for security / exposing to user, usually Solr shouldn't be exposed to user directly but via application / service / api. Thanks, Susheel On Thu, Sep 24, 2015 at 4:01

Re: Solr cross core join special condition

2015-10-07 Thread Susheel Kumar
You may want to take a look at new Solr feature of Streaming API & Expressions https://issues.apache.org/jira/browse/SOLR-7584?filter=12333278 for making joins between collections. On Wed, Oct 7, 2015 at 9:42 AM, Ryan Josal wrote: > I developed a join transformer plugin that did that (although i

Re: Best Indexing Approaches - To max the throughput

2015-10-08 Thread Susheel Kumar
The ConcurrentUpdateSolrClient is not cloud-aware and does not take zkHostString as input. So the only option is to use CloudSolrClient with SolrJ & a thread pool executor framework. On Thu, Oct 8, 2015 at 12:50 PM, Alessandro Benedetti < benedetti.ale...@gmail.com> wrote: > This depends of the number of activ

Re: Scramble data

2015-10-08 Thread Susheel Kumar
Like Erick said, would something like using a replace function on individual sensitive fields in the fl param work? Replacing with something like REDACTED etc. On Thu, Oct 8, 2015 at 2:58 PM, Tarala, Magesh wrote: > I already have the data ingested and it takes several days to do that. I > was trying

Re: Exclude documents having same data in two fields

2015-10-09 Thread Susheel Kumar
Hi Aman, Was the problem resolved, or are you still having some errors? Thnx On Fri, Oct 9, 2015 at 8:28 AM, Aman Tandon wrote: > okay Thanks > > With Regards > Aman Tandon > > On Fri, Oct 9, 2015 at 4:25 PM, Upayavira wrote: > > > Just beware of performance here. This is fine for smaller indexes, but

Re: Solr cross core join special condition

2015-10-11 Thread Susheel Kumar
sue that you mentioned but it seems its target is > Solr 6! Am I correct? The patch failed for Solr 5.3 due to class not found. > For Solr 5.x should I try to implement something similar myself? > > Sincerely yours. > > > On Wed, Oct 7, 2015 at 7:15 PM, Susheel Kumar > wrote:

Re: Spell Check and Privacy

2015-10-12 Thread Susheel Kumar
Hi Arnon, I couldn't fully understand your use case regarding privacy. Are you concerned that SpellCheck may reveal user names as part of suggestions which could have belonged to different organizations / ACLs, OR, after providing suggestions, are you concerned that the user may be able to click and view ot

Re: How to formulate query

2015-10-12 Thread Susheel Kumar
Hi Prassana, This is a highly custom relevancy/ordering requirement and one possible way you can try is by creating multiple fields and coming up with query for each of the searches and boost them accordingly. Thnx On Mon, Oct 12, 2015 at 12:50 PM, Erick Erickson wrote: > Nothing exists current

Re: are there any SolrCloud supervisors?

2015-10-13 Thread Susheel Kumar
Sounds interesting... On Tue, Oct 13, 2015 at 12:58 AM, Trey Grainger wrote: > I'd be very interested in taking a look if you post the code. > > Trey Grainger > Co-Author, Solr in Action > Director of Engineering, Search & Recommendations @ CareerBuilder > > On Fri, Oct 2, 2015 at 3:09 PM, r b

Re: slow queries

2015-10-14 Thread Susheel Kumar
Hi Lorenzo, Can you tell us which Solr version you are using, the index size on disk & hardware config (memory/processor on each machine)? Thanks, Susheel On Wed, Oct 14, 2015 at 6:03 AM, Lorenzo Fundaró < lorenzo.fund...@dawandamail.com> wrote: > Hello, > > I have following conf for filters and co

Re: Lucene Revolution ?

2015-10-18 Thread Susheel Kumar
I couldn't make it either. Would love to hear more from those who made it. Thanks, Susheel On Sun, Oct 18, 2015 at 10:53 AM, Jack Krupansky wrote: > Sorry I missed out this year. I thought it was next month and hadn't seen > any reminders. Just last Tuesday I finally got around to googling the > conference

auto deployment/setup of Solr & Zookeeper on medium-large clusters

2015-10-19 Thread Susheel Kumar
Hi, I am trying to find the best practices for setting up Solr on 20+ new machines & ZK (5+) and repeating the same on other environments. What's the best way to download, extract, and set up Solr & ZK in an automated way along with other dependencies like Java etc.? Among shell scripts or puppet or dock

Re: Validating idea of architecture for RDB / Import / Solr

2015-10-20 Thread Susheel Kumar
Hello Hangu, OPTION1. you can write complex/nested join queries with DIH and have functions written in javascript for transformations in data-config if that meets your domain requirement OPTION2. use Java program with SolrJ and read data using jdbc and apply domain specific rules, create solr d

DevOps question : auto deployment/setup of Solr & Zookeeper on medium-large clusters

2015-10-20 Thread Susheel Kumar
anks, Susheel On Mon, Oct 19, 2015 at 3:32 PM, Susheel Kumar wrote: > Hi, > > I am trying to find the best practises for setting up Solr on new 20+ > machines & ZK (5+) and repeating same on other environments. What's the > best way to download, extract, setup Solr & ZK

Re: DevOps question : auto deployment/setup of Solr & Zookeeper on medium-large clusters

2015-10-20 Thread Susheel Kumar
ch as Jenkins, TeamCity, > >BuildBot), installs solr as "solradm" and sets it up to run as "solrapp". > > > >I am not a systems administrator, and I'm not really in "DevOps", my job > >is to be above all of that and do "systems archi

Re: SolrJ stalls/hangs on client.add(); and doesn't return

2015-10-30 Thread Susheel Kumar
Just a suggestion, Markus: sending 50k documents worked in your case, but you may want to benchmark sending batches of 5k, 10k or 20k and compare with sending 50k batches. It may turn out that a smaller batch size is faster than a very big batch size... On Fri, Oct 30, 2015 at 7:59 AM,
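
To compare batch sizes as suggested, the documents first need to be split into fixed-size batches. A minimal sketch of that step follows; the actual send call and timing are left out, and the generated documents are dummies.

```python
from itertools import islice


def in_batches(docs, batch_size):
    """Yield successive lists of at most batch_size documents from any iterable."""
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Dummy documents standing in for real ones.
dummy_docs = ({"id": str(i)} for i in range(50_000))
batch_sizes = [len(batch) for batch in in_batches(dummy_docs, 20_000)]
print(batch_sizes)  # [20000, 20000, 10000]
```

Wrapping the send call for each batch in `time.perf_counter()` measurements, once per candidate batch size, would give the comparison described above.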

Re: Problem with the Content Field during Solr Indexing

2015-11-02 Thread Susheel Kumar
Hi Shruti, If you are looking to index images to make them searchable (Image Search) then you will have to look at LIRE (Lucene Image Retrieval) http://www.lire-project.net/ and can follow Lire Solr Plugin at this site https://bitbucket.org/dermotte/liresolr. Thanks, Susheel On Sat, Oct 31, 201

Solr Search: Access Control / Role based security

2015-11-05 Thread Susheel Kumar
Hi, I have seen a couple of use cases / needs where we want to restrict search results based on the role of a user. E.g. - if user role is admin, any document from the search result will be returned - if user role is manager, only documents intended for managers will be returned - if user role is
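
One common way to implement this kind of restriction is a filter query built from the user's roles. The sketch below assumes each document carries a multi-valued field listing the roles allowed to see it; the field name `allowed_roles` is hypothetical, as is the rule that admins get no filter at all.

```python
def role_filter(user_roles, field="allowed_roles"):
    """Return an fq value restricting results to the user's roles, or None for admins."""
    roles = sorted(set(user_roles))
    if not roles:
        raise ValueError("user has no roles; refusing to build a query")
    if "admin" in roles:
        return None  # admins see every document, so no filter is added
    quoted = " OR ".join('"' + role.replace('"', '\\"') + '"' for role in roles)
    return f"{field}:({quoted})"


print(role_filter(["manager"]))
print(role_filter(["manager", "employee"]))
print(role_filter(["admin"]))
```

The returned string would be passed as the `fq` parameter by the application layer, never taken from user input directly.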

Re: Solr Search: Access Control / Role based security

2015-11-10 Thread Susheel Kumar
r simple use-cases and has the benefit that the > > filterCache will help things stay nice and speedy. Apache ManifoldCF > goes a > > bit further and ties back to your authentication and authorization > > mechanism: > > > > > > > http://manifoldcf.apache.org/

Re: DevOps question : auto deployment/setup of Solr & Zookeeper on medium-large clusters

2015-11-13 Thread Susheel Kumar
wrote: > Susheel, > > Our puppet stuff is very close to our infrastructure, using specific > Netapp volumes and such, and assuming some files come from NFS. > It is also personally embarrassing to me that we still use NIS - doh! > > -Original Message- > From:

Re: capacity of storage a single core

2015-12-09 Thread Susheel Kumar
Hi Jack, Just to add, the OS disk cache will still make queries performant even though the entire index can't be loaded into memory. How much more latency there is compared to when the index is completely loaded into memory may vary depending on index size etc. I am trying to clarify this here because lot of folks takes

Re: capacity of storage a single core

2015-12-09 Thread Susheel Kumar
he purpose of a > general rule is to generally avoid unhappiness, but if you have an appetite > and tolerance for unhappiness, then go for it. > > Replica vs. shard? They're basically the same - a replica is a copy of a > shard. > > -- Jack Krupansky > > On Wed, Dec 9

Re: Increasing Solr5 time out from 30 seconds while starting solr

2015-12-09 Thread Susheel Kumar
Yes, either look into the log files as Erick suggested or run with -f and see the startup error on the console. Kill any existing instance or remove any old PID file before starting with -f. Thnx On Wed, Dec 9, 2015 at 12:46 PM, Erick Erickson wrote: > What does the Solr log file say? Often this is

Re: Increasing Solr5 time out from 30 seconds while starting solr

2015-12-10 Thread Susheel Kumar
> INFO - 2015-12-10 15:37:17.702; [ ] org.apache.solr.core.SolrConfig; > Loaded SolrConfig: solrconfig.xml > INFO - 2015-12-10 15:37:17.718; [ ] org.apache.solr.schema.IndexSchema; > Reading Solr Schema from > > /home/jabong/Downloads/software/dev/solr5/server/solr/jabong_disco

Re: capacity of storage a single core

2015-12-10 Thread Susheel Kumar
drive performance > down, sometimes radically. > > I've seen 350M docs, 200-300 fields (aggregate) fit into 12G > of JVM. I've seen 25M docs (really big ones) strain 48G > JVM heaps. > > Jack's approach is what I use; pick a number and test with it. > Here

Re: capacity of storage a single core

2015-12-11 Thread Susheel Kumar
s of the stored index. > > 2) MMap index segments are actually only the segments used for searching ? > Is not the Lucene directory memory mapping the stored segments as well ? > This was my understanding but maybe I am wrong. > In the case we first memory map the stored segments and the

RE: Is it possible to use multiple index data directory in Apache Solr?

2015-03-01 Thread Susheel Kumar
Under the Solr/example folder, you will find a "multicore" folder under which you can create multiple core/index directory folders and edit the solr.xml to specify each of the new cores/directories. When you start Solr under the examples directory, use a command line like the one below to load Solr and then you sho

Re: SOLR cloud sharding

2016-06-03 Thread Susheel Kumar
Also not sure about your domain but you may want to double check if you really need 350 fields for searching & storing. Many times when you challenge this against the higher cost of hardware, you may be able to reduce # of searchable / stored fields. Thanks, Susheel On Thu, Jun 2, 2016 at 9:21 AM

Re: Solr Schema for same field names within different input entities

2016-06-08 Thread Susheel Kumar
How about creating a schema with temperature, humidity & a day field (and other fields you may have like zipcode/city/country etc.)? Put day="next" or day="previous", and at query time use fq (filter query) to have fq=day:previous or fq=day:next. Thanks, Susheel On Wed, Jun 8, 2016 at 2:46 PM, Anirudd
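
With a single `day` field as described, selecting one entity type is just a filter query. A sketch of building such a request URL follows; the collection name `weather` and the base URL are placeholders, while the field names come from the post.

```python
from urllib.parse import urlencode


def weather_query_url(select_url, day, q="*:*"):
    """Build a select URL that filters on the day field ('previous' or 'next')."""
    if day not in ("previous", "next"):
        raise ValueError("day must be 'previous' or 'next'")
    params = [
        ("q", q),
        ("fq", f"day:{day}"),
        ("fl", "temperature,humidity,day"),
    ]
    # Leave '*', ':' and ',' unescaped so the URL stays readable.
    return f"{select_url}?{urlencode(params, safe='*:,')}"


previous_url = weather_query_url("http://localhost:8983/solr/weather/select", "previous")
print(previous_url)
```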

ImplicitSnitch preferredNodes

2016-06-30 Thread Susheel Kumar
Hello Arcadius, Noble, I have a single Solr cluster set up across two DCs with good connectivity and a configuration similar to the one below, and I am looking to use the preferredNodes feature/rule so that search queries executed from a DC1 client use all dc1 replicas and those from a DC2 client use all dc2 replicas. Bit confu

ImplicitSnitch Documentation for querying Multi-DataCenter replicas using preferredNodes

2016-07-05 Thread Susheel Kumar
Hello, Can someone help me clarify and document how to use the ImplicitSnitch preferredNodes rule to implement a scenario where search queries executed from a data center DC1 client use all dc1 replicas and those from a data center DC2 client use all dc2 replicas? The only source I see is the discussion from

Re: Shard vs Replica

2016-07-06 Thread Susheel Kumar
To understand shard & replica, let's first understand what sharding is and why it is needed. Sharding - Assume your index grows so large that it doesn't fit on a single machine (e.g. your index size is 80GB and your machine has 64GB, in which case the index won't fit into memory). Now to get better

Re: SolrCloud - Query performance degrades with multiple servers(Shards)

2016-07-18 Thread Susheel Kumar
Hello, Question: Do you really need sharding, or can you live without it, since you mentioned only 10K records in one shard? What's your index/document size? Thanks, Susheel On Mon, Jul 18, 2016 at 2:08 AM, kasimjinwala wrote: > currently I am using solrCloud 5.0 and I am facing query performanc

Re: SolrCloud - Query performance degrades with multiple servers(Shards)

2016-07-19 Thread Susheel Kumar
You may want to utilise the document routing (_route_) option to have your queries served faster, but above you are trying to compare apples with oranges, meaning your performance test numbers have to be based on either your actual numbers, like 3-5 million docs per shard, or sufficient enough to see advantag

How to set credentials when querying using SolrJ - Basic Authentication

2016-08-01 Thread Susheel Kumar
Hello, I am looking to pass user / pwd when querying using CloudSolrClient. The documentation https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin describes setting the credentials when calling the request method, like below SolrRequest req ;//create a new request obj
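
For context, this is what Basic Authentication amounts to at the HTTP level, sketched with the Python standard library rather than SolrJ. It is not the SolrJ code discussed in this thread. The URL, the user name, and the password below are placeholder assumptions, and the request is only constructed, never sent.

```python
import base64
import urllib.request


def build_authenticated_request(url, user, password):
    """Build (but do not send) a request carrying a Basic auth header."""
    # Basic auth is "user:password", base64-encoded, in the Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Basic {token}")
    return req


# Placeholder endpoint and credentials, for illustration only.
auth_req = build_authenticated_request(
    "http://localhost:8983/solr/c1/select?q=*:*", "solr", "SolrRocks"
)
print(auth_req.get_header("Authorization"))
```

Since base64 is an encoding, not encryption, the header is only as private as the connection it travels over.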

Re: How to set credentials when querying using SolrJ - Basic Authentication

2016-08-01 Thread Susheel Kumar
Thank you so much, Shawn. I didn't realize that I could call process directly. I think it will be helpful to add this code to the Solr documentation. I'll create a Jira to update the documentation. Thanks, Susheel On Mon, Aug 1, 2016 at 7:14 PM, Shawn Heisey wrote: > On 8/1/2016 1:5

Re: problems with bulk indexing with concurrent DIH

2016-08-02 Thread Susheel Kumar
My experience with DIH was that we couldn't scale to the level we wanted. SolrJ with multi-threading & batch updates (parallel threads pushing data into Solr) worked, and we were able to ingest 5K-10K docs per second. Thanks, Susheel On Tue, Aug 2, 2016 at 9:15 AM, Mikhail Khludnev wrote: > Bernd, > Bu
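
The pattern described above (parallel threads pushing batches) can be sketched as follows. This is a generic illustration in Python, not the SolrJ code the author used. `send_batch` is a hypothetical stand-in for one update request, and the batch size and worker count are arbitrary assumptions, not tuned values.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(docs, size):
    """Yield consecutive slices of docs, each holding at most 'size' items."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def send_batch(batch):
    """Hypothetical stand-in for one update request to Solr.

    A real sender would post the batch to the collection's update handler;
    here it only reports how many documents it was given.
    """
    return len(batch)


def index_in_parallel(docs, batch_size=1000, workers=4, sender=send_batch):
    """Push batches from several threads and return the total docs sent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sender, batches(docs, batch_size)))


docs = [{"id": str(i)} for i in range(2500)]
print(index_in_parallel(docs))
```

Batching amortizes per-request overhead, while the thread pool keeps several requests in flight at once. Both knobs would need measuring against a real cluster.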

Re: [Non-DoD Source] Re: Solr 6.1.0 issue (UNCLASSIFIED)

2016-08-06 Thread Susheel Kumar
As Erick mentioned, you may want to check your analysis chain and see whether you are using *KeywordTokenizer* for content type, or whether content type is String in your schema.xml. I have seen similar errors before due to KeywordTokenizer being used. Thanks, Susheel On Fri, Aug 5, 2016 at 11:46 PM, Erick E

Need Permission to commit feature branch for Pull Request SOLR-8146

2016-08-09 Thread Susheel Kumar
Hello, I created a feature branch for SOLR-8146 so that I can submit a pull request (PR) for review. While pushing the feature branch I am getting the error below. My github id is susheel2...@gmail.com Thanks, Susheel lucene-solr git:(SOLR-8146) git push origin SOLR-8146 Username for 'https://github.

Re: Solr 6.1 :: language specific analysis

2016-08-10 Thread Susheel Kumar
BeiderMorse supports phonetic variations like Foto / Photo and has support for many languages, including German. Please see https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching Thanks, Susheel On Wed, Aug 10, 2016 at 2:47 PM, Alexandre Drouin < alexandre.dro...@orckestra.com

Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-12 Thread Susheel Kumar
Not exactly sure what you are looking for from chaining the results, but similar functionality is available in Streaming Expressions, where the results of inner expressions are passed to outer expressions and so on https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions HTH Susheel On Fri, Au

BasicAuthentication & blockUnknown Issue

2016-08-26 Thread Susheel Kumar
Hello, I configured Solr for Basic Authentication with blockUnknown:true. It works well and no issues were observed in ingestion & querying the Solr cloud cluster, but in the Logging I see the errors below being logged. I see SOLR-9188 and SOLR-8236 logged for a similar issue. Is there any workaround/fix th

Re: Solrcloud with rest api

2016-09-21 Thread Susheel Kumar
This link has more info http://lucene.472066.n3.nabble.com/Configure-SolrCloud-for-Loadbalance-for-net-client-td4280074.html Another suggestion to consider. If you are able to use Java for developing the search API/service/client then please explore this option since that will make life easier an

Re: SolrJ App Engine Client

2016-09-22 Thread Susheel Kumar
As per this doc, sockets are allowed for paid apps. Not sure if this would make it unrestricted. https://cloud.google.com/appengine/docs/java/sockets/ On Thu, Sep 22, 2016 at 3:38 PM, Jay Parashar wrote: > I sent a similar message earlier but do not see it. Apologize if its > duplicated. > > I a

Re: Connect to SolrCloud using proxy in SolrJ

2016-09-29 Thread Susheel Kumar
As Vincenzo mentioned above, you should first check the connection using telnet. If the connection fails, try setting the http proxy on the terminal/command line using the command below, and then try telnet again. As long as telnet works, your code should also be able to connect export http.proxy=http:// \:@:
