Solr for real time analytics system

2016-02-03 Thread Rohit Kumar
Hi

I am quite new to Solr. I have to build a real-time analytics system which
displays metrics based on multiple filters over a huge data set (~50 million
documents with ~100 fields). I will mostly need aggregation queries such as
sum/average/group-by, and they need to be very fast despite the size of the
data set.

Is Solr suitable for such use cases?

Thanks
Rohit


Re: Solr for real time analytics system

2016-02-04 Thread Rohit Kumar
Thanks Bhimavarapu for the information.

We are creating our own dashboard, so we probably won't need Kibana/Banana. I
was more curious about Solr's support for fast aggregation queries over a very
large data set. As suggested, I guess Elasticsearch has this capability.
Are there any published metrics or data on Elasticsearch/Solr
performance in this area that I can refer to?

Thanks
Rohit



On Thu, Feb 4, 2016 at 11:48 AM, CKReddy Bhimavarapu 
wrote:

> Hello Rohit,
>
> You can use the Banana project which was forked from Kibana
> <https://github.com/elastic/kibana>, and works with all kinds of time
> series (and non-time series) data stored in Apache Solr
> <https://lucene.apache.org/solr/>. It uses Kibana's powerful dashboard
> configuration capabilities, ports key panels to work with Solr, and
> provides significant additional capabilities, including new panels that
> leverage D3.js <http://d3js.org/>
>
> > I would need mostly aggregation queries like sum/average/groupby etc, but
> > data set is quite huge. The aggregation queries should be very fast.
>
> All of your requirements can be served by Banana, but I'm not sure how
> fast Solr is compared to ELK <https://www.elastic.co/products>.
>
> --
> ckreddybh. 
>


Auto Soft commit not working !!!

2013-07-04 Thread Rohit Kumar
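My Solr config has:

 <autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

 <autoSoftCommit>
   <maxTime>1000</maxTime>
 </autoSoftCommit>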

The machine runs Ubuntu 13 with 4 cores and 16 GB RAM; 6 GB is given to Solr,
running on Tomcat.


Still, when I add documents to Solr and then search, I get 0
hits. It takes a long time before a document actually starts showing up.

Can somebody help?

Thanks


Re: Auto Soft commit not working !!!

2013-07-04 Thread Rohit Kumar
I checked the Tomcat logs. The config tells Solr to commit every 15000 ms:

 <autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

Strangely, there are no commit messages in the logs. Did I miss anything?


-

I am having issues with soft auto-commit (near real time). I am using Solr 4.0
on Tomcat. The index size is 10.95 GB. With this configuration it takes more
than 60 seconds for an indexed document to be returned. When I add documents to
Solr and search after the soft commit interval has passed, I get 0 hits. It
takes a long time before the document actually starts showing up, even longer
than the autoCommit interval.

 <autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

 <autoSoftCommit>
   <maxTime>1000</maxTime>
 </autoSoftCommit>

The machine runs Ubuntu 13 with 4 cores and 16 GB RAM; 6 GB is given to Solr,
running on Tomcat.








On Fri, Jul 5, 2013 at 12:13 AM, Daniel Collins wrote:

> You should see the commit messages in the solr logs, do they come up at the
> expected frequency?
>
>


Re: Auto Soft commit not working !!!

2013-07-04 Thread Rohit Kumar
1. Do you have an update processor chain that doesn't have RunUpdate in it? - *No*

2. Is the  solrconfig directive missing? - *Bang on. It was
still commented out!*

3. Is _version_ missing from your schema? - *Checked it, and it's present.*

I will test again and update soon.

Thanks



On Fri, Jul 5, 2013 at 8:30 AM, Jack Krupansky wrote:

> 1. Do you have an update processor chain that doesn't have RunUpdate in it?
>
> 2. Is the  solrconfig directive missing?
>
> 3. Is _version_ missing from your schema?
>
> -- Jack Krupansky
>


Using Solr to search between two Strings without using index

2013-07-25 Thread Rohit Kumar
Hi,

I have a scenario:

String array = ["Input1 is good", "Input2 is better", "Input2 is sweet",
"Input3 is bad"]

I want to compare the string array against the given input:
String inputarray = ["Input1", "Input2"]


No index is involved. I just want to use the power of string search to do
a runtime search on the array, which should return

["Input1 is good", "Input2 is better", "Input2 is sweet"]
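
For a handful of in-memory strings, no index is needed, and a plain loop may
be enough. A minimal Java sketch, assuming simple substring matching is what
is wanted; the names `SubstringFilter` and `matching` are illustrative and not
part of any Solr API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SubstringFilter {
    // Returns the candidates that contain at least one of the terms, in original order.
    static List<String> matching(List<String> candidates, List<String> terms) {
        List<String> hits = new ArrayList<>();
        for (String candidate : candidates) {
            for (String term : terms) {
                if (candidate.contains(term)) {
                    hits.add(candidate);
                    break; // one match is enough; do not add the candidate twice
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> array = Arrays.asList(
                "Input1 is good", "Input2 is better", "Input2 is sweet", "Input3 is bad");
        List<String> inputArray = Arrays.asList("Input1", "Input2");
        System.out.println(matching(array, inputArray));
    }
}
```

With the arrays from the message above, `matching` returns the three strings
that mention Input1 or Input2, in their original order.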



Thanks


Searching in stopwords

2013-07-27 Thread Rohit Kumar
I have a company search which uses stopwords at query time. In my
stopwords list I have entries like:

HR
Club
India
Pvt.
Ltd.



So if I search for a company like HR Club, I get no results. Similarly, a
search for India HR gives no results. How can I get results when querying for
the following companies:

1. HR India
2. HR Club
3. HR India Pvt Ltd


I would still like to keep the above list of stopwords, since these
words occur heavily in company text.

Please advise if I need to change my strategy altogether.
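
To see why such queries can come back empty, here is a toy model in Java of a
query-time stopword filter. It is only a sketch: it assumes whitespace
tokenization, lowercasing and exact stopword matching, which may differ from
the actual analyzer chain in the schema, and `StopwordDemo` and `analyze` are
hypothetical names. When every term of a query is a stopword, nothing is left
to search for:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopwordDemo {
    // Stopword list from the message above, lowercased for comparison.
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("hr", "club", "india", "pvt.", "ltd."));

    // Lowercases, splits on whitespace, and drops stopwords.
    // A simplified stand-in for a query-time analyzer, not Solr's own code.
    static List<String> analyze(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(analyze("HR Club"));    // every term is a stopword
        System.out.println(analyze("Acme India")); // "Acme" is a made-up company name
    }
}
```

Under these assumptions `analyze("HR Club")` yields no tokens at all, while a
query that contains at least one non-stopword keeps that term.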

Thanks
Rohit Kumar


Searching solr on school name during year

2013-09-08 Thread Rohit Kumar
Hi,

Currently I have a student search which lets me search for student documents
by school. I am looking at adding year search to the existing schema, which
would enable users to search for students in a school during a given year.
I have a proposed schema change that adds a "year" component to
facilitate this search.


Existing schema: (No year information currently)





Current sample data:
name:Borris Mayers
schoolName:Canterbury University




New schema:







Sample data:

name:Borris Mayers
schoolName:Canterbury University, start_2001, year_2001, year_2002,
year_2003, year_2004, year_2005, end_2005
schoolNameWithTermOriginal:Canterbury University||2001-2005
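
The marker tokens in the sample data can be generated mechanically from a
start and end year. A small Java sketch, assuming the `start_`/`year_`/`end_`
naming shown above and an inclusive year range; `SchoolTermTokens` is a
hypothetical helper, not something Solr provides:

```java
import java.util.ArrayList;
import java.util.List;

class SchoolTermTokens {
    // Builds start_/year_/end_ marker tokens for an inclusive range of years.
    static List<String> tokens(int startYear, int endYear) {
        if (endYear < startYear) {
            throw new IllegalArgumentException("endYear must not be before startYear");
        }
        List<String> out = new ArrayList<>();
        out.add("start_" + startYear);
        for (int year = startYear; year <= endYear; year++) {
            out.add("year_" + year);
        }
        out.add("end_" + endYear);
        return out;
    }

    public static void main(String[] args) {
        // Matches the 2001-2005 term in the sample data above.
        System.out.println(tokens(2001, 2005));
    }
}
```

These tokens would be added as extra values next to the school name at index
time, so a query for a school during 2003 would look for year_2003.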


Please suggest whether this is a correct approach or whether there is a better
way to do the same.
I am using Solr 4.3.


Thanks,
Rohit Kumar


Frequent softCommits leading to high faceting times?

2013-09-15 Thread Rohit Kumar
Hi,

We are running *SOLR 4.3* with an 8 GB index on

Ubuntu 12.04 64-bit
Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz, single core
16 GB RAM


We just started using the autoSoftCommit feature and noticed that facet
queries slowed down from milliseconds to about a minute. We have *8
facet fields*.

We add close to 300 documents per second during peak intervals.


60
false



1000



Here is some information I got with debugQuery. Please note that the *facet
time is more than 50 seconds*:

50779.0
0.0
41.0
*50590.0*
0.0
0.0
0.0
5.0
143.0



Please help.

Thanks,
Rohit Kumar


Section Search in SOLR

2013-09-28 Thread Rohit Kumar
Hi,

I have the following Solr documents indexed:

id: 1
companyName:
  Boeing
  Kaseya
positionName:
  Executive
  Technician

id: 2
companyName:
  Boeing
  Kodak
positionName:
  Technician
  Executive



companyName and positionName are multivalued fields whose values are kept in
corresponding order.

The following Solr query,
*fq=companyName:Boeing&fq=positionName:Executive*, returns both
documents, as expected.
What changes will I have to make to be able to search for
companyName:Boeing and positionName:Executive at the same index in the
corresponding multivalued fields, i.e. so that only doc id 1 is returned?


Thanks,
Rohit Kumar


Re: Section Search in SOLR

2013-09-28 Thread Rohit Kumar
Thanks, Jack, for the quick reply.


Probably my question was not elaborate enough. Let me add more explanation.

*Option 1:*

Even if I flatten my document to store each separate *experience* in a
multivalued field, Solr will still return both doc ids 1 and 2 if I query
*fq=experience:Boeing&fq=experience:Executive*

id: 1
companyName:
  Boeing
  Kaseya
positionName:
  Executive
  Technician
experience:
  Boeing, Executive
  Kaseya, Technician

id: 2
companyName:
  Boeing
  Kodak
positionName:
  Technician
  Executive
experience:
  Boeing, Technician
  Kodak, Executive



*Option 2:*

Store each experience in a separate field and generate a query such as
q=(exp1:(Boeing AND Executive) OR exp2:(Boeing AND Executive)), which
returns the docs with the expected match.

id: 2
   ...
exp1: Boeing, Executive
exp2: Kodak, Executive
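
Both options rely on keeping each company together with its position. A
minimal Java sketch of that pairing step, assuming the two multivalued lists
have the same length and matching order, and assuming a `|` separator with
exact matching on the combined value; all names here are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExperiencePairs {
    // Zips the two parallel lists into one combined value per experience.
    static List<String> pair(List<String> companies, List<String> positions) {
        if (companies.size() != positions.size()) {
            throw new IllegalArgumentException("lists must have the same length");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < companies.size(); i++) {
            out.add(companies.get(i) + "|" + positions.get(i));
        }
        return out;
    }

    // True only if the company and position occur together in one experience.
    static boolean held(List<String> experiences, String company, String position) {
        return experiences.contains(company + "|" + position);
    }

    public static void main(String[] args) {
        List<String> doc1 = pair(Arrays.asList("Boeing", "Kaseya"),
                                 Arrays.asList("Executive", "Technician"));
        List<String> doc2 = pair(Arrays.asList("Boeing", "Kodak"),
                                 Arrays.asList("Technician", "Executive"));
        System.out.println(held(doc1, "Boeing", "Executive"));
        System.out.println(held(doc2, "Boeing", "Executive"));
    }
}
```

For doc 1 the combined value Boeing|Executive is present; for doc 2 it is not,
which is the behaviour the question asks for. Whether the same holds in Solr
depends on indexing the combined value as a single untokenized term.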

Please suggest.

I would love to know how LinkedIn does it to show facets for people
working at a company with given titles.

Thanks




On Sat, Sep 28, 2013 at 9:58 PM, Jack Krupansky wrote:

> "multivalued fields maintained in order"
>
> That is not a feature supported by Solr.
>
> Solr will maintain the order of an individual multivalued field and will
> return the values of that field in order, but makes no other use of the
> order.
>
> Ditto for "corresponding multivalued fields". Solr does not support any
> correspondence between multivalued fields.
>
> You must flatten your data to achieve any correspondence.
>
> Multivalued fields are a powerful feature of Solr, but you must be
> extremely careful to use them only in moderation.
>
> -- Jack Krupansky
>


SOLR : ArrayIndexOutOfBoundsException from SolrDispatchFilter

2013-06-19 Thread Rohit Kumar
Need help to figure out the error below.


*Code Snippet*:

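import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

// prepare(), getDescription() and the definitions of f, someKey, otherKey
// and returnKey are omitted from this snippet.
public class ConnectionComponent extends SearchComponent {

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        NamedList<Object> nList = new SimpleOrderedMap<Object>();
        NamedList<Object> nl = new SimpleOrderedMap<Object>();

        // A list holding one Lucene Document
        List<Document> ld = new ArrayList<Document>();
        Document mydoc = new Document();
        mydoc.add(f); // f is a non-null IndexableField
        ld.add(mydoc);

        nl.add(someKey, ld);
        nList.add(otherKey, nl);

        // Attach the nested structure to the response
        rb.rsp.add(returnKey, nList);
    }
}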


ERROR org.apache.solr.servlet.SolrDispatchFilter  - null:java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.ArrayList.get(ArrayList.java:324)
    at java.util.Collections$UnmodifiableList.get(Collections.java:1152)
    at org.apache.solr.response.transform.ValueSourceAugmenter.transform(ValueSourceAugmenter.java:92)
    at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:165)
    at org.apache.solr.response.JSONWriter.writeArray(JSONResponseWriter.java:526)
    at org.apache.solr.response.TextResponseWriter.writeArray(TextResponseWriter.java:289)
    at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:192)
    at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:183)
    at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299)
    at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:188)