Solr for real time analytics system
Hi,

I am quite new to Solr. I have to build a real-time analytics system that displays metrics based on multiple filters over a huge data set (~50 million documents with ~100 fields). I mostly need aggregation queries such as sum, average, and group-by, and they should be very fast despite the size of the data set.

Is Solr suitable for such use cases?

Thanks,
Rohit
Re: Solr for real time analytics system
Thanks, Bhimavarapu, for the information. We are creating our own dashboard, so we probably won't need Kibana or Banana. I was more curious about Solr's support for fast aggregation queries over a very large data set. As suggested, I guess Elasticsearch has this capability. Are there any published metrics or data on Elasticsearch/Solr performance in this area that I can refer to?

Thanks,
Rohit

On Thu, Feb 4, 2016 at 11:48 AM, CKReddy Bhimavarapu wrote:
> Hello Rohit,
>
> You can use the Banana project, which was forked from Kibana <https://github.com/elastic/kibana> and works with all kinds of time series (and non-time series) data stored in Apache Solr <https://lucene.apache.org/solr/>. It uses Kibana's powerful dashboard configuration capabilities, ports key panels to work with Solr, and provides significant additional capabilities, including new panels that leverage D3.js <http://d3js.org/>.
>
> > would need mostly aggregation queries like sum/average/groupby etc, but
> > data set is quite huge. The aggregation queries should be very fast.
>
> All your requirements can be served by Banana, but I'm not sure how fast Solr is compared to ELK <https://www.elastic.co/products>.
>
> On Thu, Feb 4, 2016 at 10:51 AM, Rohit Kumar <rohitkumarbhagat...@gmail.com> wrote:
> > Hi
> >
> > I am quite new to Solr. I have to build a real time analytics system which displays metrics based on multiple filters over a huge data set (~50million documents with ~100 fields). I would need mostly aggregation queries like sum/average/groupby etc, but data set is quite huge. The aggregation queries should be very fast.
> >
> > Is Solr suitable for such use cases?
> >
> > Thanks
> > Rohit
>
> --
> ckreddybh.
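For readers wondering what a sum/average/group-by request could look like in Solr itself, the following is a minimal sketch that only builds the query string for Solr's StatsComponent (`stats`, `stats.field`, `stats.facet`). The field names `amount` and `region`, the filter `country:IN`, and the class name `StatsQueryBuilder` are assumptions made for illustration; none of them appear in the thread. The sketch does not say anything about how fast such a query would be on 50 million documents.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the parameter string for a StatsComponent request.
class StatsQueryBuilder {

    // filter: a Solr filter query, e.g. "country:IN" (assumed field)
    // numericField: field to aggregate (sum, mean, min, max, count)
    // groupField: field whose values split the statistics (group-by)
    static String build(String filter, String numericField, String groupField) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");          // match all documents
        params.put("fq", filter);        // restrict by filter
        params.put("rows", "0");         // only aggregates are wanted, no documents
        params.put("stats", "true");     // enable the StatsComponent
        params.put("stats.field", numericField);
        params.put("stats.facet", groupField);

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Append the result to the core's /select handler URL.
        System.out.println(build("country:IN", "amount", "region"));
    }
}
```

The resulting string would be appended to a `/select` request; whether response times are acceptable at this scale is something only a test against the real data can show.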
Auto Soft commit not working !!!
My Solr config has:

  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>

The machine is Ubuntu 13 / 4 cores / 16 GB RAM, with 6 GB given to Solr running on Tomcat.

Still, when I add documents to Solr and then search, it returns 0 hits. It takes a long time before the documents actually start showing up.

Can somebody help?

Thanks
Re: Auto Soft commit not working !!!
I checked the Tomcat logs. Although the config says to commit every 15000 ms

  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

strangely, there are no commit log entries. Did I miss anything?

-

I am having issues with auto soft commit (near real time). I am using Solr 4.0 on Tomcat. The index size is 10.95 GB. With this configuration it takes more than 60 seconds for an indexed document to be returned. When I add documents to Solr and search after the soft commit time has elapsed, it returns 0 hits. It takes a long time before the documents actually start showing up, even longer than the autoCommit interval.

  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>

The machine is Ubuntu 13 / 4 cores / 16 GB RAM, with 6 GB given to Solr running on Tomcat.

On Fri, Jul 5, 2013 at 12:13 AM, Daniel Collins wrote:
> You should see the commit messages in the solr logs, do they come up at the expected frequency?
>
> On 4 July 2013 15:35, Rohit Kumar wrote:
> > My solr config has :
> >
> >   <autoCommit>
> >     <maxTime>15000</maxTime>
> >     <openSearcher>false</openSearcher>
> >   </autoCommit>
> >
> >   <autoSoftCommit>
> >     <maxTime>1000</maxTime>
> >   </autoSoftCommit>
> >
> > Machine is ubuntu 13 / 4 cores / 16GB RAM. Given 6gb to Solr running over tomcat.
> >
> > Still when i am adding documents to solr and searching its returning 0 hits. Its taking long before the document actually starts showing up.
> >
> > Can somebody help.
> >
> > Thanks
Re: Auto Soft commit not working !!!
1. Do you have an update processor chain that doesn't have RunUpdate in it? *No.*
2. Is the solrconfig directive missing? *Bang on. It was still commented out!*
3. Is _version_ missing from your schema? *Checked it, and it is present.*

I will test again and update soon.

Thanks

On Fri, Jul 5, 2013 at 8:30 AM, Jack Krupansky wrote:
> 1. Do you have an update processor chain that doesn't have RunUpdate in it?
>
> 2. Is the solrconfig directive missing?
>
> 3. Is _version_ missing from your schema?
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Rohit Kumar
> Sent: Thursday, July 04, 2013 9:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Auto Soft commit not working !!!
>
> I checked with the tomcat logs. Although the config says it to commit every 15000ms, strangely there are no commit logs. Did i miss anything?
>
> Having issues in Soft Auto commit (Near Real Time). Am using solr 4.0 on tomcat. The index size is 10.95 GB. With this configuration it takes more than 60 seconds to return the indexed document. When adding documents to solr and searching after soft commit time, its returning 0 hits. Its taking long before the document actually starts showing up, even more than the autoCommit interval.
>
> Machine is ubuntu 13 / 4 cores / 16GB RAM. Given 6gb to Solr running over tomcat.
Using Solr to search between two Strings without using index
Hi,

I have a scenario.

  String array = ["Input1 is good", "Input2 is better", "Input2 is sweet", "Input3 is bad"]

I want to compare the string array against the given input:

  String inputarray = ["Input1", "Input2"]

It involves no indexes. I just want to use the power of string search to do a runtime search on the array, which should return:

  ["Input1 is good", "Input2 is better", "Input2 is sweet"]

Thanks
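Since the question involves no index, plain substring matching may already be enough for this case. The following is a minimal sketch in ordinary Java, not a Solr feature; the class name `ArraySearch` and method name `filter` are made up for illustration. It keeps every array entry that contains at least one of the input terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runtime search over an in-memory list, no index involved.
class ArraySearch {

    // Returns every candidate that contains at least one of the search terms,
    // preserving the order of the candidates.
    static List<String> filter(List<String> candidates, List<String> terms) {
        List<String> matches = new ArrayList<>();
        for (String candidate : candidates) {
            for (String term : terms) {
                if (candidate.contains(term)) {
                    matches.add(candidate);
                    break; // do not add the same candidate twice
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> array = Arrays.asList(
                "Input1 is good", "Input2 is better", "Input2 is sweet", "Input3 is bad");
        List<String> inputarray = Arrays.asList("Input1", "Input2");
        // prints [Input1 is good, Input2 is better, Input2 is sweet]
        System.out.println(filter(array, inputarray));
    }
}
```

This is case-sensitive exact substring matching. Anything closer to Solr's analysis (tokenizing, stemming, scoring) would need more than this sketch.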
Searching in stopwords
I have a company search which uses stopwords at query time. My stopwords list has entries like:

  HR
  Club
  India
  Pvt.
  Ltd.

So if I search for a company like "HR Club", I get no results. Similarly, a search for "India HR" gives no results.

How can I get results when querying for the following companies?

1. HR India
2. HR Club
3. HR India Pvt Ltd

I would still like to keep the above list of stopwords, since these words occur heavily in company text. Please advise if I need to change my strategy altogether.

Thanks,
Rohit Kumar
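The empty results follow from the stopword list itself: every word in those queries is a stopword, so nothing is left to search for. The sketch below simulates that effect in plain Java. It is a simplified stand-in, not Lucene's actual stop filter, and it assumes tokens are lowercased and stripped of punctuation before the stopword check. The company name "Boeing India Pvt. Ltd." in the demo is a made-up example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified simulation of query-time stopword removal (not Lucene's StopFilter).
class StopwordDemo {

    // The stopwords from the question, normalized to lowercase without punctuation.
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("hr", "club", "india", "pvt", "ltd"));

    // Splits on whitespace, normalizes each token, and drops stopwords.
    static List<String> analyze(String text) {
        List<String> kept = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(analyze("HR Club"));                // no terms survive
        System.out.println(analyze("HR India Pvt Ltd"));       // no terms survive
        System.out.println(analyze("Boeing India Pvt. Ltd.")); // only "boeing" survives
    }
}
```

A query whose terms are all removed has nothing left to match on, which is consistent with the zero-result behaviour described above.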
Searching solr on school name during year
Hi,

Currently I have a student search which allows me to search for documents in a school. I am looking at adding year search to the existing schema, which would enable users to search for students in a school during a given year. I have a proposed schema change that adds a "year" component to facilitate this search.

Existing schema: (No year information currently)

Current sample data:

  name: Borris Mayers
  schoolName: Canterbury University

New schema:

Sample data:

  name: Borris Mayers
  schoolName: Canterbury University, start_2001, year_2001, year_2002, year_2003, year_2004, year_2005, end_2005
  schoolNameWithTermOriginal: Canterbury University||2001-2005

Please let me know whether this is a sound approach or whether there is a better way to do the same. I am using Solr 4.3.

Thanks,
Rohit Kumar
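The year tokens in the proposed sample data can be generated mechanically from a start and end year. The following is a minimal sketch of that expansion, assuming the `start_`, `year_`, and `end_` prefixes shown in the sample; the class name `SchoolYearTokens` and method name `expand` are hypothetical, and the sketch takes no position on whether this schema is the best approach.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: expands an attendance range into the tokens used in the sample data.
class SchoolYearTokens {

    // For 2001..2005 this yields start_2001, year_2001 ... year_2005, end_2005.
    static List<String> expand(int startYear, int endYear) {
        if (endYear < startYear) {
            throw new IllegalArgumentException("endYear must not be before startYear");
        }
        List<String> tokens = new ArrayList<>();
        tokens.add("start_" + startYear);
        for (int year = startYear; year <= endYear; year++) {
            tokens.add("year_" + year);
        }
        tokens.add("end_" + endYear);
        return tokens;
    }

    public static void main(String[] args) {
        // Tokens to append to the school name at index time.
        System.out.println("Canterbury University, " + String.join(", ", expand(2001, 2005)));
    }
}
```

With these tokens in place, a search for a school during 2003 would look for the school name together with `year_2003`.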
Frequent softCommits leading to high faceting times?
Hi,

We are running *Solr 4.3* with an 8 GB index on Ubuntu 12.04 64-bit:

  Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz, single core
  16 GB RAM

We just started using the autoSoftCommit feature and noticed that facet queries slowed down from milliseconds to about a minute. We have *8 facet fields*. We add close to 300 documents per second during the peak interval.

  60 false 1000

Here is some information I got with debugQuery. Please note that the *facet time is more than 50 seconds*.

  50779.0 0.0 41.0 *50590.0* 0.0 0.0 0.0 5.0 143.0

Please help.

Thanks,
Rohit Kumar
Section Search in SOLR
Hi,

I have the following Solr documents indexed:

  id: 1
  companyName: Boeing, Kaseya
  positionName: Executive, Technician

  id: 2
  companyName: Boeing, Kodak
  positionName: Technician, Executive

Company name and position name are multivalued fields maintained in order.

The following Solr query

  fq=companyName:Boeing&fq=positionName:Executive

returns both documents, as expected.

What changes will I have to make to be able to search for companyName:Boeing and positionName:Executive at the same index in the corresponding multivalued fields, i.e. so that only doc id 1 is returned?

Thanks,
Rohit Kumar
Re: Section Search in SOLR
Thanks, Jack, for the quick reply. My question was probably not detailed enough, so let me explain further.

*Option 1:*

Even if I flatten my documents to store separate *experiences* in a multivalued field, Solr will still return both doc id 1 and doc id 2 if I query *fq=experience:Boeing&fq=experience:Executive*:

  id: 1
  companyName: Boeing, Kaseya
  positionName: Executive, Technician
  experience: "Boeing, Executive", "Kaseya, Technician"

  id: 2
  companyName: Boeing, Kodak
  positionName: Technician, Executive
  experience: "Boeing, Technician", "Kodak, Executive"

*Option 2:*

Store each experience in a separate field and generate the query

  q=(exp1:(Boeing AND Executive) OR exp2:(Boeing AND Executive))

which returns only the docs with the expected match:

  id: 2
  ...
  exp1: Boeing, Executive
  exp2: Kodak, Executive

Please suggest.

I would just love to know how LinkedIn does it to show facets for people working at a company with given titles.

Thanks

On Sat, Sep 28, 2013 at 9:58 PM, Jack Krupansky wrote:
> "multivalued fields maintained in order"
>
> That is not a feature supported by Solr.
>
> Solr will maintain the order of an individual multivalued field and will return the values of that field in order, but makes no other use of the order.
>
> Ditto for "corresponding multivalued fields". Solr does not support any correspondence between multivalued fields.
>
> You must flatten your data to achieve any correspondence.
>
> Multivalued fields are a powerful feature of Solr, but you must be extremely careful to use them only in moderation.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Rohit Kumar
> Sent: Saturday, September 28, 2013 12:11 PM
> To: solr-user@lucene.apache.org
> Subject: Section Search in SOLR
>
> Hi,
>
> I have following SOLR documents indexed.
>
>   id: 1
>   companyName: Boeing, Kaseya
>   positionName: Executive, Technician
>
>   id: 2
>   companyName: Boeing, Kodak
>   positionName: Technician, Executive
>
> Company name and Position name are multivalued fields maintained in order.
>
> The following is the solr query.
> *fq=companyName:Boeing&fq=positionName:Executive* which returns both the documents as expected.
>
> What changes will i have to make to be able to search for companyName:Boeing and positionName:Executive both at same indexes in the corresponding multivalued fields i.e. should return me only doc id 1.
>
> Thanks,
> Rohit Kumar
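One way to read the "flatten your data" advice is to join the values at the same position into a single untokenized value per experience, so that a match requires both parts to come from the same entry. The following is a minimal sketch of that idea in plain Java. The `|` separator, the class name `ExperiencePairs`, and the assumption that each joined value would be indexed as an exact string are illustrative choices, not something stated in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: pairs up parallel multivalued fields position by position.
class ExperiencePairs {

    // Joins companies.get(i) and positions.get(i) into "company|position".
    static List<String> pair(List<String> companies, List<String> positions) {
        if (companies.size() != positions.size()) {
            throw new IllegalArgumentException("parallel fields must have the same length");
        }
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < companies.size(); i++) {
            pairs.add(companies.get(i) + "|" + positions.get(i));
        }
        return pairs;
    }

    // True only if company and position occur together in the same entry.
    static boolean matches(List<String> pairs, String company, String position) {
        return pairs.contains(company + "|" + position);
    }

    public static void main(String[] args) {
        List<String> doc1 = pair(Arrays.asList("Boeing", "Kaseya"), Arrays.asList("Executive", "Technician"));
        List<String> doc2 = pair(Arrays.asList("Boeing", "Kodak"), Arrays.asList("Technician", "Executive"));
        System.out.println(matches(doc1, "Boeing", "Executive")); // true
        System.out.println(matches(doc2, "Boeing", "Executive")); // false
    }
}
```

If the joined values were indexed as exact strings, a filter on the joined value `Boeing|Executive` would be expected to select only doc id 1, because the pair is matched as a whole rather than term by term.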
SOLR : ArrayIndexOutOfBoundsException from SolrDispatchFilter
Need help to figure out the error below.

*Code Snippet*:

  public class ConnectionComponent extends SearchComponent {
      @Override
      public void process(ResponseBuilder rb) throws IOException {
          NamedList nList = new SimpleOrderedMap();
          NamedList nl = new SimpleOrderedMap();
          List ld = new ArrayList();
          Document mydoc = new Document();
          mydoc.add(f); // IndexableField f not null
          ld.add(mydoc);
          nl.add(someKey, ld);
          nList.add(otherKey, nl);
          // rb instance of ResponseBuilder
          rb.rsp.add(returnKey, nList);
      }
  }

ERROR org.apache.solr.servlet.SolrDispatchFilter - null:java.lang.ArrayIndexOutOfBoundsException: -1
  at java.util.ArrayList.get(ArrayList.java:324)
  at java.util.Collections$UnmodifiableList.get(Collections.java:1152)
  at org.apache.solr.response.transform.ValueSourceAugmenter.transform(ValueSourceAugmenter.java:92)
  at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:165)
  at org.apache.solr.response.JSONWriter.writeArray(JSONResponseWriter.java:526)
  at org.apache.solr.response.TextResponseWriter.writeArray(TextResponseWriter.java:289)
  at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:192)
  at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:183)
  at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299)
  at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:188)