No, we don't want to put it on the same box as the database. Agreed that indexing/committing/merging and optimizing is the bottleneck.
I think it's worth trying SolrJ with the CommonsHttpSolrServer option for now, and let's see what happens when we load 3 million docs.

Thanks
Francis

-----Original Message-----
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 1:34 PM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

With this configuration, the preferred method is probably to run a standalone Java application on the same box as the DB, or very close to the DB (in the same network segment). HTTP is not a bottleneck; the main bottleneck is indexing/committing/merging/optimizing in SOLR...

Just as a sample: if you submit a batch of large documents to SOLR, expect 5-55 seconds response time (even with EmbeddedSolr or pure Lucene), but that has nothing to do with network latency or firewalling... uploading 1Mb over a 100Mbps network takes less than 0.1 seconds, but indexing it may take > 0.5 secs...

A standalone application with SolrJ is also good because you can schedule batch updates etc.; it can be automated...

P.S.
In theory, if you are using Oracle, you could even try to implement triggers written in Java that cause a SOLR update on each row update (transactional); but I haven't heard of anyone using stored procs in Java: too risky and slow, with specific dependencies...

-----Original Message-----
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: August-26-09 4:18 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: SolrJ and Solr web simultaneously?

We already opened port 80 from Solr to the DB, so that's not the issue, but httpd (port 80) is very flaky if there is a firewall between Solr and the DB.

We have a Solr master/slaves environment; clients access search through the slaves (the master only accepts the new index from the DB, and the slaves pull the new indexes from the Solr master).

We have someone on the development team who knows Java and can implement JDBC.

We don't share the Solr master and the DB on the same box; they are on separate boxes and separate networks, with port 80 opened between them.
It looks like CommonsHttpSolrServer is a better approach than EmbeddedSolrServer, since we want the Solr master to act as a Solr server as well. I am just worried that HTTP will be a bottleneck; that's why I prefer the JDBC connection method.

Francis

-----Original Message-----
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 11:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

Do you have a firewall between the DB and the possible SOLR-Master instance? Do you have a firewall between the client application and the DB?

Such a configuration is strange... by default firewalls allow access to port 80; try setting port 80 for SOLR-Tomcat and/or configure AJP mapping for the front-end HTTPD which you might have; btw, Apache HTTPD with SOLR supports HTTP caching for SOLR-slaves...

1. SolrJ does not provide multithreading, but an instance of CommonsHttpSolrServer is thread-safe. Developers need to implement a multithreaded application.
2. SolrJ does not use JDBC; developers need to implement that... It requires some Java coding; it is not the out-of-the-box DataImportHandler.

Suppose you have 2 quad-cores: why run single-threaded if we can use 8 threads... or why wait 5 seconds for a response from SOLR if we can have an additional 32 threads doing work with the DB at the same time... and why share I/O between SOLR and the DB?

Diversify, lower risks; having SOLR and the DB on the same box is extremely unsafe...

-Fuad

-----Original Message-----
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: August-26-09 2:25 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: SolrJ and Solr web simultaneously?

Thanks.

The issue we actually have is more likely a firewall issue than network latency; that's why we are trying to avoid using an HTTP connection. Fixing the firewall is not an option right now.

We have around 3 million docs to load from the DB to the Solr master (initial load only), and after the initial load we will be actively adding new docs to Solr.
We prefer to use a JDBC connection, so if SolrJ uses a JDBC connection, that might be useful. I also like the multi-threading option from SolrJ.

So, since we also want the Solr master running as a server, EmbeddedSolrServer is not a good approach for this?

Francis

-----Original Message-----
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 10:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

> I don't want or try not to use http connection from Database to Solr Master because of network latency( very slow).

"Network latency" does not play any role here; throughput is more important. With a separate SOLR instance on a separate box, and with a separate Java application (SOLR-bridge) querying the database and using SolrJ, latency will be 1 second (for instance), but you can fine-tune performance by allocating the necessary number of threads (depends on the latency of SOLR and Oracle, average doc size, etc.), JDBC connections, etc., and you can reach a throughput of thousands of docs per second.

DIHs only simplify some stuff for total beginners...

In addition, you will have the nice Admin screen of a standalone SOLR-master.

-Fuad
http://www.tokenizer.org

-----Original Message-----
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: August-26-09 1:41 PM
To: 'solr-user@lucene.apache.org'; Paul Tomblin
Subject: RE: SolrJ and Solr web simultaneously?

I have the same situation now.

If I don't want to use an HTTP connection, then I need to use EmbeddedSolrServer. Is that correct?

We have master/slaves Solr; the applications use the slaves for search. The master only takes the new index from the database, and the slaves pull the new index using snappuller/snapinstaller.

I don't want to use (or am trying not to use) an HTTP connection from the database to the Solr master because of network latency (very slow).

Any suggestions?

Francis

-----Original Message-----
From: Smiley, David W.
[mailto:dsmi...@mitre.org]
Sent: Wednesday, August 26, 2009 10:23 AM
To: solr; Paul Tomblin
Subject: Re: SolrJ and Solr web simultaneously?

Once a commit occurs, all data added before it (by any & all clients) becomes visible to all searches henceforth. The "web interface" has direct access to Solr, and SolrJ remotely accesses that Solr. EmbeddedSolrServer is something that few people should actually use. It's mostly for embedding Solr without running Solr as a server, which is a somewhat rare need.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server

On 8/26/09 1:14 PM, "Paul Tomblin" <ptomb...@xcski.com> wrote:

Is Solr like an RDBMS in that I can have multiple programs querying and updating the index at once, and everybody else will see the updates after a commit, or do I have to do something explicit to see others' updates? Does it matter whether they're using the web interface, SolrJ with a CommonsHttpSolrServer, or SolrJ with an EmbeddedSolrServer?

--
http://www.linkedin.com/in/paultomblin
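
For readers who want to see what the multithreaded "SOLR-bridge" loader suggested in this thread might look like, here is a minimal sketch using only the Java standard library. It is not code from any poster. The class name `BulkLoaderSketch`, the method `load`, and the `sink` parameter are all hypothetical; `sink` stands in for a call on a single shared, thread-safe client (for example, adding a batch of documents through one CommonsHttpSolrServer instance), and the generated `doc-N` strings stand in for rows that a real loader would read over JDBC.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch: fan batches of documents out to worker threads that
// all send to one shared, thread-safe indexing client.
class BulkLoaderSketch {

    /**
     * Splits ids [0, totalDocs) into batches and submits each batch to a
     * fixed-size thread pool. `sink` is an assumption standing in for a
     * thread-safe client call; it is NOT a real SolrJ type.
     * Returns the number of documents handed to the sink.
     */
    static int load(int totalDocs, int batchSize, int threads,
                    Consumer<List<String>> sink) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> pending = new ArrayList<>();
        try {
            for (int start = 0; start < totalDocs; start += batchSize) {
                final int from = start;
                final int to = Math.min(start + batchSize, totalDocs);
                pending.add(pool.submit(() -> {
                    // A real loader would run a JDBC query for this id range.
                    List<String> batch = new ArrayList<>();
                    for (int id = from; id < to; id++) {
                        batch.add("doc-" + id);
                    }
                    sink.accept(batch); // one request per batch
                    return batch.size();
                }));
            }
            int sent = 0;
            for (Future<Integer> f : pending) {
                sent += f.get(); // waits for the batch; surfaces worker failures
            }
            return sent;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("bulk load failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        AtomicInteger indexed = new AtomicInteger();
        int sent = load(10_000, 500, 8, batch -> indexed.addAndGet(batch.size()));
        System.out.println("sent=" + sent + " indexed=" + indexed.get());
    }
}
```

The thread count and batch size here are placeholders to tune against the latency of the database and of Solr, as discussed above. A real loader would presumably issue a single commit after `load` returns, since documents only become visible to searches once a commit occurs.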