Hi everyone,

We are indexing quite a lot of data using the update/csv handler. For reasons I can't get into right now, I can't use the DataImportHandler (DIH): I can only access the DB through stored procedures, and stored procedure support in DIH is not yet available. Indexing takes about 3 hours, and I don't want to tax the server too much during indexing, so I came up with a two-server solution: an indexing server indexes the file every night, and the resulting index is then copied to the search server.

Maintaining a full-fledged Tomcat/Jetty just for indexing is too much of a pain, so I wrote a small utility Java class that starts an embedded Solr server, indexes the CSV, and shuts the server down. I would like the community's input on this solution.
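The copy step is not part of the utility class yet. A minimal sketch of how it could look is below, assuming the search server's index directory is reachable as a local or mounted path. The class name IndexCopier and the two path arguments are placeholders for illustration and are not part of Solr. It uses only the JDK (Java 8 or later). The target should be an empty or fresh directory, so that files from an older index are not left behind, and the search server would still need a restart or core reload before it sees the new index.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper, not part of Solr: copies a finished index directory.
class IndexCopier {

    // Recursively copies everything under source into target,
    // overwriting files that already exist in target.
    static void copyIndex(Path source, Path target) throws IOException {
        try (Stream<Path> paths = Files.walk(source)) {
            // Files.walk visits a directory before its contents,
            // so each directory is created before files are copied into it.
            for (Path from : (Iterable<Path>) paths::iterator) {
                Path to = target.resolve(source.relativize(from).toString());
                if (Files.isDirectory(from)) {
                    Files.createDirectories(to);
                } else {
                    Files.copy(from, to, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: IndexCopier <source-index-dir> <target-index-dir>");
            return;
        }
        copyIndex(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

This would run only after the indexer has committed and shut down, so that the index files on disk are complete.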
Is this okay to do? Is there a better way to do this without running two separate servers? Is my class safe enough to run every night in a production environment?

Here's my utility class. This is just a POC, and before I productionize it I would like some input from the Solr czars here.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.core.SolrConfig;
import org.apache.solr.core.SolrCore;

import java.io.File;

public class StandaloneSolrIndexer {

    public static void main(String[] args) throws Exception {
        SolrCore core = null;
        CoreContainer container = null;
        try {
            // Build a single core by hand and register it with an empty container
            container = new CoreContainer();
            SolrConfig config = new SolrConfig("/tmp/solr", "solrconfig.xml", null);
            CoreDescriptor descriptor = new CoreDescriptor(container, "core1", "/tmp/solr");
            core = new SolrCore("core1", "/tmp/solr/data", config, null, descriptor);
            container.register("core1", core, false);
            SolrServer server = new EmbeddedSolrServer(container, "core1");

            // Start by deleting everything
            server.deleteByQuery("*:*");

            // Stream the tab-separated file through the CSV update handler
            ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
            req.addFile(new File("/tmp/product-5k.tsv"));
            req.setParam("commit", "true");
            req.setParam("stream.contentType", "text/plain;charset=utf-8");
            req.setParam("escape", "\\");
            req.setParam("separator", "\t");
            req.setParam("fieldnames",
                    "product_id,account_id,name,category_tags,short_desc,upc,manu_mdl_num,"
                    + "ext_prd_id,brand,long_desc,sku,seller,seller_email,vertical,cat,subcat");
            // Skip the header row
            req.setParam("skipLines", "1");

            NamedList<Object> result = server.request(req);
            System.out.println("Result ====================================================================================: \n" + result);
        } finally {
            // Release the core and the container even if indexing fails
            if (core != null) core.close();
            if (container != null) container.shutdown();
        }
    }
}

Thanks,
Rohit