solr scores remain the same for exact match and nearly exact match

2013-04-02 Thread amit

Below is my query
http://localhost:8983/solr/select/?q=subject:session management in
php&fq=category:[*%20TO%20*]&fl=category,score,subject

The result is like below:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">983</int>
    <lst name="params">
      <str name="fq">category:[* TO *]</str>
      <str name="q">subject:session management in php</str>
      <str name="fl">category,score,subject</str>
    </lst>
  </lst>
  <result ...>
    <doc>
      <float name="score">0.8770298</float>
      <str name="category">Annapurnap</str>
      <str name="subject">session management in asp.net</str>
    </doc>
    <doc>
      <float name="score">0.8770298</float>
      <str name="category">Annapurnap</str>
      <str name="subject">session management in PHP</str>
    </doc>
  </result>
</response>

The question is: how come both have the same score when one is an exact
match and the other isn't?
This is the schema









Re: solr scores remain the same for exact match and nearly exact match

2013-04-03 Thread amit
Thanks. I added a copy field and that fixed the issue.


On Wed, Apr 3, 2013 at 12:29 PM, Gora Mohanty-3 [via Lucene] <
ml-node+s472066n4053412...@n3.nabble.com> wrote:

> On 3 April 2013 10:52, amit <[hidden email]> wrote:
> >
> > Below is my query
> > http://localhost:8983/solr/select/?q=subject:session management in
> > php&fq=category:[*%20TO%20*]&fl=category,score,subject
> [...]
>
> Add debugQuery=on to your Solr URL, and you will get an
> explanation of the score. Your subject field is tokenised, so
> that there is no a priori reason that an exact match should
> score higher. Several strategies are available if you want that
> behaviour. Try searching Google, e.g., for "solr exact match
> higher score".
>
> Regards,
> Gora
>





Re: solr scores remain the same for exact match and nearly exact match

2013-04-03 Thread amit
When I use the copyField destination "text", it works fine;
I get a boost for an exact match.
But if I use some other field, the score is not boosted for an exact match.




Not sure if I am headed in the right direction; I am new to Solr, so please
bear with me.
I checked this link, http://wiki.apache.org/solr/SolrRelevancyCookbook,
and am trying to index the same field multiple times to get an exact match.
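
For reference, a minimal sketch of that cookbook approach, assuming the
tokenized field is "subject" (the field and type names are illustrative):
copy the field into an untokenized sibling and boost that sibling at query
time.

<field name="subject" type="text" indexed="true" stored="true"/>
<field name="subject_exact" type="string" indexed="true" stored="false"/>
<copyField source="subject" dest="subject_exact"/>

q=subject:(session management in php) OR subject_exact:"session management in php"^10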







Re: solr scores remain the same for exact match and nearly exact match

2013-04-04 Thread amit
Thanks Jack and Andre.
I am trying to use edismax, but am stuck with a NoClassDefFoundError:
org/apache/solr/response/QueryResponseWriter.
I am using Solr 3.6.
I have followed the steps here:
http://wiki.apache.org/solr/VelocityResponseWriter#Using_the_VelocityResponseWriter_in_Solr_Core

Just the jars were copied; the rest was already there in solrconfig.xml.








using edismax without velocity

2013-04-06 Thread amit
I am using Solr 3.6 and trying to use the edismax handler.
The config has a /browse requestHandler, but it doesn't work because of a
missing class definition (VelocityResponseWriter) error.

I have copied the jars to solr/lib following the steps here, but no luck:
http://wiki.apache.org/solr/VelocityResponseWriter#Using_the_VelocityResponseWriter_in_Solr_Core

I just want to search on multiple fields with different boosts. *Can I use
edismax with the /select requestHandler?* If I write a query like the one
below, does it search in both the name and description fields? Does the
query below solve my purpose?
http://localhost:8080/solr/select/?q=(coldfusion^2
cache^1)&defType=edismax&qf=name^2 description^1&fq=author:[* TO *] AND
-author:chinmoyp&start=0&rows=10&fl=author,score,id
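
For reference, edismax is just a query parser and does not depend on the
VelocityResponseWriter, so it can be selected per request with
defType=edismax (as in the URL above) or set in the handler defaults. An
illustrative solrconfig.xml fragment (field names and boosts are
assumptions):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">name^2 description^1</str>
  </lst>
</requestHandler>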







edismax returns far fewer matches than a regular query

2013-04-08 Thread amit
I have a simple system: I put the title of web pages into the "name" field
and the content of the web pages into the "description" field.
I want to search both fields and give the name a little more boost.
A search on the name field or the description field returns records close
to hundreds:

http://localhost:8983/solr/select/?q=name:%28coldfusion^2%20cache^1%29&fq=author:[*%20TO%20*]%20AND%20-author:chinmoyp&start=0&rows=10&fl=author,score,id

But searching on both fields using boosts gives just 5 matches:

http://localhost:8983/solr/mindfire/?q=%28%20coldfusion^2%20cache^1%29&defType=edismax&qf=name^1.5%20description^1.0&fq=author:[*%20TO%20*]%20AND%20-author:chinmoyp&start=0&rows=10&fl=author,score,id

I am wondering what is wrong, because there are valid results returned by
the 1st query which are ignored by edismax. I am on Solr 3.6.





solr doesn't start on tomcat on aws

2013-05-15 Thread amit
I am installing Solr on Tomcat 7 in AWS using the Bitnami Tomcat stack. My
Solr server is not starting; below is the error:

INFO: Starting service Catalina
May 15, 2013 7:01:51 AM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.39
May 15, 2013 7:01:51 AM org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor
/opt/bitnami/apache-tomcat/conf/Catalina/localhost/solr.xml
May 15, 2013 7:01:52 AM org.apache.catalina.startup.HostConfig deployDescriptor
SEVERE: Error deploying configuration descriptor
/opt/bitnami/apache-tomcat/conf/Catalina/localhost/solr.xml
java.lang.NullPointerException
  at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:625)
  at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1637)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:722)
May 15, 2013 7:01:52 AM org.apache.catalina.startup.HostConfig deployDescriptors
SEVERE: Error waiting for multi-thread deployment of context descriptors to complete
java.util.concurrent.ExecutionException: java.lang.NullPointerException
  at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
  at java.util.concurrent.FutureTask.get(FutureTask.java:111)
  at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:579)
  at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:475)
  at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1402)
  at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:318)
  at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
  at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
  at org.apache.catalina.util.LifecycleBase.setStateInternal(LifecycleBase.java:402)
  at org.apache.catalina.util.LifecycleBase.setState(LifecycleBase.java:347)

The /opt/bitnami/apache-tomcat/conf/Catalina/localhost/solr.xml looks like
this.

The contents of /usr/share/solr/ also look fine:

bitnami@ip-10-144-66-148:/usr/share/solr$ ls -l
total 11384
drwxr-xr-x 2 tomcat tomcat     4096 Jul 17  2012 bin
drwxr-xr-x 5 tomcat tomcat     4096 May 13 13:11 conf
drwxr-xr-x 9 tomcat tomcat     4096 Jul 17  2012 contrib
drwxr-xr-x 2 tomcat tomcat     4096 May 13 13:20 data
drwxr-xr-x 2 tomcat tomcat     4096 May 13 13:21 lib
-rw-r--r-- 1 tomcat tomcat     2259 Jul 17  2012 README.txt
-rw-r--r-- 1 tomcat tomcat 11628199 May 14 12:58 solr.war
-rw-r--r-- 1 tomcat tomcat     1676 Jul 17  2012 solr.xml

Not sure what is wrong, but this is killing me :-(




solr indexing slows down after a few minutes

2012-08-30 Thread amit
Hi 
I am indexing our website using Solr. After crawling the website I am adding
the contents to the Solr server and committing. My Solr server is on version
3.6.

I observed that initially the indexing is very fast. It added around 33
docs/sec, making 2,000 per minute. But after 30 mins or so the performance
drops drastically: within an hour of indexing it is adding 200 docs/min,
which is about 1/10 of what it used to add.

Here are the details of the time and the total docs in Solr:

Time (h:mm)   Docs in Solr
3:48          0
3:54          10,000
3:58          16,000
4:03          24,000
4:08          30,000
4:15          38,000
4:18          41,000
4:25          42,230
4:31          43,140
4:38          44,410

This is after making the name field stored="false"; otherwise the same
behavior can be reproduced much earlier. My doc has 3 fields, and the name
field is the extracted text from the web pages; the others are very small.
Not sure what is wrong. I am using a Win 7 machine, dual core with 4 GB RAM.

Thanks
Amit






Re: solr indexing slows down after a few minutes

2012-09-06 Thread amit
Commit is not too often; it's a batch of 100 records, and it takes 40 to 60
secs before another commit.
No, I am not indexing with multiple threads. It uses a single-thread
executor.

I have seen steady performance for now after increasing the merge factor
from 10 to 25.
Will have to wait and watch whether that reduces the search speed, but so
far so good.
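
For reference, that setting lives in solrconfig.xml; an illustrative
fragment for Solr 3.x:

<indexDefaults>
  <mergeFactor>25</mergeFactor>
</indexDefaults>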

Thanks
Amit

On Thu, Aug 30, 2012 at 10:53 PM, pravesh [via Lucene] <
ml-node+s472066n4004421...@n3.nabble.com> wrote:

> Did you check the wiki:
> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
>
> Do you commit often? Do you index with multiple threads? Also try
> experimenting with various available MergePolicies introduced from SOLR 3.4
> onwards
>
> Thanx
> Pravesh
>





solr 3.6.1 tomcat 7.0 missing core name in path

2012-09-06 Thread amit
Hi 
I have installed solr 3.6.1 on tomcat 7.0 following the steps here. 
http://ralf.schaeftlein.de/2012/02/10/installing-solr-3-5-under-tomcat-7/

The Solr home page loads fine, but the admin page
(http://localhost:8080/solr/admin/) throws the error "missing core name in
path". I am installing a single core. This is the solr.xml:

  

  


I have double-checked a lot of steps by searching on the net, but no luck.
If anyone has faced this, please suggest.
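
For comparison, a typical single-core solr.xml for Solr 3.6 looks roughly
like the sketch below (core name and instanceDir are assumptions); a
missing defaultCoreName on the <cores> element is one common cause of the
"missing core name in path" error on /solr/admin/:

<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="collection1">
    <core name="collection1" instanceDir="." />
  </cores>
</solr>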

Thanks
Amit





index solr using jquery AJAX

2012-11-01 Thread amit
Is it possible to index (add/update) Solr using jQuery AJAX?
I am trying with JSONP first, but no luck:
try {
    $.ajax({
        type: "POST",
        url: "http://192.168.10.113:8080/solr/update/json?commit=true",
        data: { "add": { "doc": { "id": "22", "name": "Ruby on Trails" } } },
        contentType: "application/json",
        dataType: 'jsonp',
        crossDomain: true,
        jsonp: 'json.wrf',
        success: function (data) { alert(data); },
        failure: function (errMsg) {
            alert(errMsg);
        }
    });
}
catch (err) {
    alert(err);
}

XML update is not working because the Solr server is on a different domain.
I have tried to use a CORS filter on Tomcat, but that's also not working:
http://software.dzhuvinov.com/cors-filter-installation.html
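
For reference, a minimal web.xml registration of that CORS filter might
look like the sketch below (the init-param values are assumptions; POST and
OPTIONS must be among the supported methods so the preflighted update
request can get through):

<filter>
  <filter-name>CORS</filter-name>
  <filter-class>com.thetransactioncompany.cors.CORSFilter</filter-class>
  <init-param>
    <param-name>cors.supportedMethods</param-name>
    <param-value>GET, POST, HEAD, OPTIONS</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>CORS</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>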

Please suggest. 





Re: solr indexing using jquery AJAX

2012-11-01 Thread amit
I changed it as per your feedback: added quotes and escaped them before id
and name.
Still not able to insert.

data: "<add><doc><field name=\"id\">20</field><field name=\"name\">trailblazers</field></doc></add>",

The tomcat log says bad request. 
192.168.11.88 - - [01/Nov/2012:17:10:35 +0530] "OPTIONS
/solr/update?commit=true HTTP/1.1" 400 1052

In Google Chrome there are 2 errors: the 1st one says Bad Request, and the
2nd one is an Access-Control-Allow-Origin error.

Please let me know if it is possible to put a CORS filter in place to allow
access. I tried to use the filter below on Tomcat, but it's not working:
http://software.dzhuvinov.com/cors-filter-installation.html








Re: index solr using jquery AJAX

2012-11-02 Thread amit
Hi Luis,
I tried sending an array too, but no luck.
This is how the request looks:
$.ajax({
    url: "http://192.168.10.113:8080/solr/update/json?commit=true",
    type: "POST",
    contentType: "application/json; charset=utf-8",
    data: [ { "id": "22", "name": "Seattle seahawks" } ],
    dataType: 'jsonp',
    crossDomain: true,
    jsonp: 'json.wrf'
});





Ranking result on the basis of field value irrespective of score

2008-10-23 Thread Amit
Hi All,

 

How can we do ranking on the basis of a specific field value, irrespective
of score, in Solr?

 

For example:

Let's say we have a field "language" which contains values
like "German, English, French, Chinese, Arabic".

I want English-language documents to come first, irrespective of score.

 

Thanks in advance for any kind reply.

 

Regards,

 Amit

 





facet sort by ranking

2008-11-21 Thread Amit
Hi All,

 

Is there any way to sort the facet values by our own ranking value instead
of only by count?

 

Thanks and Regards,

Amit




RE: facet sort by ranking

2008-11-22 Thread Amit
Hi Shalin,

Thanks for reply.

Actually, we have some ranking associated with the field on which we are
faceting, and we want to show only the top 10 facet values. Right now they
are sorted by count, but we want to sort them by their ranking.



Regards,
Amit



-Original Message-
From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED] 
Sent: 22 November 2008 13:10
To: solr-user@lucene.apache.org
Subject: Re: facet sort by ranking

On Sat, Nov 22, 2008 at 12:31 PM, Amit <[EMAIL PROTECTED]> wrote:

>
> Is there any way to sort the facet values by other own ranking value
> instead
> of only count?
>
>
Facets do not have a "score" associated with them of their own. Not sure
what you mean exactly.

-- 
Regards,
Shalin Shekhar Mangar.




RE: facet sort by ranking

2008-11-23 Thread Amit
Hi,

We have 100 categories, and each category has its own internal ranking.
Consider: if I search for any product and it falls under 30 categories, we
show the top 10 categories in a filter so that users can filter their
results.

Consider a hypothetical example (we don't have real data; we are still
testing Solr features). Category values and internal ranking:

Cat1  - 1
Cat2  - 2
Cat3  - 3
Cat4  - 4
Cat5  - 5
Cat6  - 6
Cat7  - 7
Cat8  - 8
Cat9  - 9
Cat10 - 10
Cat11 - 11
Cat12 - 12
Cat13 - 13
Cat14 - 14
Cat15 - 15

If I search for a product, it will return this result (category - count,
sorted by count):

Cat2  - 20
Cat3  - 17
Cat4  - 15
Cat1  - 14
Cat7  - 13
Cat8  - 12
Cat9  - 10
Cat15 - 9
Cat13 - 8
Cat10 - 7
Cat11 - 6
Cat12 - 5

Now, if we show only the top 10 values we will miss Cat11 and Cat12,
because the list is sorted by count, not by ranking.

We would like the result below (sorted by our internal ranking):

Cat15
Cat13
Cat12
Cat11
Cat10
Cat9
Cat8
Cat7
Cat4
Cat3
Cat2
Cat1

Hope this conveys what we want.

Have a great day. :)

Thanks and Regards,
Amit


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: 22 November 2008 22:51
To: solr-user@lucene.apache.org
Subject: Re: facet sort by ranking

On Sat, Nov 22, 2008 at 12:05 PM, Amit <[EMAIL PROTECTED]> wrote:
> Actually we have some ranking associated to field on which we are faceting
> and we want to show only top 10 facet value now which is sort by count but
> we want to sort by it ranking.

I think you're going to have to give some concrete examples of what
your documents look like, and what results you want back.

-Yonik




Re: How fast indexing?

2016-03-20 Thread Amit Jha
Hi All,

In my case I am using DIH to index the data, and the query has 2 join
statements. To index 70K documents it takes 3-4 hours. Document size would
be around 10-20 KB. The DB is MSSQL, and I am using Solr 4.2.10 in cloud
mode.

Rgds
AJ

> On 21-Mar-2016, at 05:23, Erick Erickson  wrote:
> 
> In my experience, a majority of the time the bottleneck is in
> the data acquisition, not the Solr indexing per-se. Take a look
> at the CPU utilization on Solr, if it's not running very heavy,
> then you need to look upstream.
> 
> You haven't told us anything about _how_ you're indexing.
> SolrJ? DIH? Something from some other party? so it's hard to
> say much useful.
> 
> You might review:
> 
> http://wiki.apache.org/solr/UsingMailingLists
> 
> Best,
> Erick
> 
> On Sun, Mar 20, 2016 at 3:31 PM, Nick Vasilyev 
> wrote:
> 
>> There can be a lot of factors, can you provide a bit of additional
>> information to get started?
>> 
>> - How many items are you indexing per second?
>> - How does the indexing process look like?
>> - How large is each item?
>> - What hardware are you using?
>> - How is your Solr set up? JVM memory, collection layout, etc...
>> - What is your current commit frequency?
>> - What is the query volume while you are indexing?
>> 
>> On Sun, Mar 20, 2016 at 6:25 PM, fabigol 
>> wrote:
>> 
>>> hi,
>>> i have a solr project where i do the indexing from a postgres database.
>>> the indexing is very slow.
>>> how can i accelerate it?
>>> can i modify autocommit in the file solrconfig.xml?
>>> someone has some ideas? i looked on google but i found little.
>>> help me please
>>> 
>>> 
>>> 
>>> 
>> 


Re: How fast indexing?

2016-03-21 Thread Amit Jha
Yes, I do have multiple nodes in my SolrCloud setup.

Rgds
AJ

> On 21-Mar-2016, at 22:20, fabigol  wrote:
> 
> Amit Jha,
> do you have several solr servers with solr cloud?
> 
> 
> 
> 


Re: How fast indexing?

2016-03-21 Thread Amit Jha
When I run the same SQL on the DB it takes only 1 sec, and 6-7 documents
are getting indexed per second.

As I have a 4-node SolrCloud setup, can I run 4 import handlers to index
the same data? Will it not overwrite?

10-20 KB is very high in numbers; where can I get the actual size of a
document?

Rgds
AJ

> On 22-Mar-2016, at 05:32, Shawn Heisey  wrote:
> 
>> On 3/20/2016 6:11 PM, Amit Jha wrote:
>> In my case I am using DIH to index the data and Query is having 2 join 
>> statements. To index 70K documents it is taking 3-4Hours. Document size 
>> would be around 10-20KB. DB is MSSQL and using solr4.2.10 in cloud mode.
> 
> My source data is in a MySQL database.  I use DIH for full rebuilds and
> SolrJ for maintenance.
> 
> My index is sharded, but I'm not running SolrCloud.  When using DIH, all
> of my shards build at once, and each one achieves about 750 docs per
> second.  With six large shards, rebuilding a 146 million document index
> takes 9-10 hours.  It produces a total index size in the ballpark of 170GB.
> 
> DIH has a performance limitation -- it's single-threaded.  I obtain the
> speeds that I do because all of my shards import at the same time -- six
> dataimport instances running at the same time, each one with a single
> thread, importing a little more than 24 million documents.  I have
> discovered that Solr is the bottleneck on my setup.  The data retrieval
> from MySQL can proceed much faster than Solr can handle with a single
> indexing thread.  My situation is a little bit unusual -- as Erick
> mentioned, usually the bottleneck is data retrieval, not Solr.
> 
> At this point, if I want to make bulk indexing go faster, I need to
> build a SolrJ application that can index with multiple threads to each
> Solr core at the same time.  This is on my roadmap, but it's not going
> to be a trivial project.
> 
> At 10-20K, your documents are large, but not excessively so.  If 70K
> documents take 3-4 hours, then there's one of a few problems happening.
> 
> 1) your database is VERY slow.
> 2) your analysis chain in schema.xml is running SUPER slow analysis
> components.
> 3) Your server or its configuration is not providing enough resources
> (CPU/RAM/IO) so Solr can run efficiently.
> 
> #2 seems rather unlikely, so I would suspect one of the other two.
> 
> 
> 
> I have seen one situation related to the Microsoft side of your setup
> that might cause a problem like this.  If any of your machines are
> running on Windows Server 2012 and you have bridged NICs (usually for
> failover in the event of a switch failure), then you will need to break
> the bridge and just run one NIC.
> 
> The performance improvement on the network when a bridged NIC is removed
> from Server 2012 is enough to blow your mind, especially if the access
> is over a high-latency network link, like a VPN or WAN connection.  The
> same setup on Server 2003 or Server 2008 has very good performance.
> Microsoft seems to have a bug with bridged NICs in Server 2012.  Last
> time I tried to figure out whether it could be fixed, I ran into this
> problem:
> 
> https://xkcd.com/979/
> 
> Thanks,
> Shawn
> 
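
For readers hitting the same single-threaded DIH ceiling, a minimal sketch
of the kind of multi-threaded SolrJ indexer described above (SolrJ 4.x; the
URL, queue size, thread count and field names are assumptions):

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        // Buffers up to 1000 documents and drains the queue with 4
        // background threads.
        ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(
                "http://localhost:8983/solr/collection1", 1000, 4);
        for (int i = 0; i < 100000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("name", "document " + i);
            server.add(doc);            // queued; sent by background threads
        }
        server.blockUntilFinished();    // wait for the queue to drain
        server.commit();                // one commit at the end
        server.shutdown();
    }
}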


SolrCloud Replication Issue

2015-04-27 Thread Amit L
Hi,

A few days ago I deployed a Solr 4.9.0 cluster, which consists of 2
collections. Each collection has 1 shard with 3 replicas on 3 different
machines.

On the first day I noticed this error appear on the leader. Full Log -
http://pastebin.com/wcPMZb0s

4/23/2015, 2:34:37 PM SEVERE SolrCmdDistributor
org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at:
http://production-solrcloud-004:8080/solr/bookings_shard1_replica2

4/23/2015, 2:34:37 PM WARNING DistributedUpdateProcessor
Error sending update

4/23/2015, 2:34:37 PM WARNING ZkController
Leader is publishing core=bookings_shard1_replica2 state=down on behalf of
un-reachable replica
http://production-solrcloud-004:8080/solr/bookings_shard1_replica2/;
forcePublishState? false


The other 2 replicas had 0 errors.

I thought it might be a one-off, but the same error occurred on day 2,
which has got me slightly concerned. During these periods I didn't notice
any issues with the cluster, and everything looks healthy in the cloud
summary. All of the instances are hosted on AWS.

Any idea what may be causing this issue and what I can do to mitigate?

Thanks
Amit


Re: SolrCloud Replication Issue

2015-04-27 Thread Amit L
Appreciate the response, to answer your questions.

* Do you see this happen often? How often?
It has happened twice in five days. The first two days after deployment.

* Are there any known network issues?
There are no obvious network issues, but as these instances reside in AWS,
I cannot rule out network blips.

* Do you have any idea about the GC on those replicas?
I have been monitoring the memory usage, and all instances are using no
more than 30% of their JVM memory allocation.




On 27 April 2015 at 21:36, Anshum Gupta  wrote:

> Looks like LeaderInitiatedRecovery or LIR. When a leader receives a
> document (update) but fails to successfully forward it to a replica, it
> marks that replica as down and asks the replica to recover (hence the name,
> Leader Initiated Recovery). It could be due to multiple reasons e.g.
> network issue/GC. The replica generally comes back up and syncs with the
> leader transparently. As an end-user, you don't have to really worry much
> about this but if you want to dig deeper, here are a few questions that
> might help us in suggesting what to do/look at.
> * Do you see this happen often? How often?
> * Are there any known network issues?
> * Do you have any idea about the GC on those replicas?
>
>
> On Mon, Apr 27, 2015 at 1:25 PM, Amit L  wrote:
>
> > Hi,
> >
> > A few days ago I deployed a solr 4.9.0 cluster, which consists of 2
> > collections. Each collection has 1 shard with 3 replicates on 3 different
> > machines.
> >
> > On the first day I noticed this error appear on the leader. Full Log -
> > http://pastebin.com/wcPMZb0s
> >
> > 4/23/2015, 2:34:37 PM SEVERE SolrCmdDistributor
> > org.apache.solr.client.solrj.SolrServerException: IOException occured
> when
> > talking to server at:
> > http://production-solrcloud-004:8080/solr/bookings_shard1_replica2
> >
> > 4/23/2015, 2:34:37 PM WARNING DistributedUpdateProcessor
> > Error sending update
> >
> > 4/23/2015, 2:34:37 PM WARNING ZkController
> > Leader is publishing core=bookings_shard1_replica2 state=down on behalf
> of
> > un-reachable replica
> > http://production-solrcloud-004:8080/solr/bookings_shard1_replica2/;
> > forcePublishState? false
> >
> >
> > The other 2 replicas had 0 errors.
> >
> > I thought it may be a one off but the same error occured on day 2 which
> has
> > got me slighlty concerned. During these periods I didn't notice any
> issues
> > with the cluster and everything looks healthy in the cloud summary. All
> of
> > the instances are hosted on AWS.
> >
> > Any idea what may be causing this issue and what I can do to mitigate?
> >
> > Thanks
> > Amit
> >
>
>
>
> --
> Anshum Gupta
>


Real Time indexing and Scalability

2015-06-05 Thread Amit Jha
Hi,

In my use case, I am adding a document to Solr through a Spring application
using spring-data-solr. This setup works well with a single Solr instance,
but in the current setup it is a single point of failure. So we decided to
use Solr replication, because we also need centralized search. Therefore we
set up two instances, both in repeater mode. The problem with this setup
was that sometimes data was not getting indexed. So we moved to SolrCloud,
with 3 ZooKeeper nodes and a 2-shard, 2-replica setup, but still we
sometimes found that documents are not getting indexed.

I would like to know what is the best way to have a highly available setup.

Rgds
AJ

Re: Real Time indexing and Scalability

2015-06-05 Thread Amit Jha
I want to have real-time indexing and real-time search.

Rgds
AJ

> On Jun 5, 2015, at 10:12 PM, Amit Jha  wrote:
> 
> Hi,
> 
> In my use case, I am adding a document to Solr through spring application 
> using spring-data-solr. This setup works well with single Solr. In current 
> setup it is single point of failure. So we decided to use solr replication 
> because we also need centralized search. Therefore we setup two instances 
> both in repeater mode. The problem with this setup was, some time data was 
> not get indexed. So we moved to SolrCloud, with 3zk and 2 shards and 2 
> replica setup, but still sometime we found that documents are not getting 
> indexed.
> 
> I would like to know what is the best way to have highly available setup.
> 
> Rgds
> AJ


Re: Real Time indexing and Scalability

2015-06-05 Thread Amit Jha
Thanks Erick. What about when a document is committed to the master? Then
the document should be visible from the master. Is that correct?

I was using replication with repeater mode because LBHttpSolrServer can
send a write request to any of the Solr servers, and that Solr should index
the document because it is a master. We have a polling interval of 2 sec;
after the polling interval the slave can poll the data. It is worth
mentioning here that the application issues the commit command.

If a document is committed to a master and a search request comes to the
same master, then the document should be retrieved, irrespective of
replication, because the master doesn't know who the slaves are.

In repeater mode a document can be indexed on both Solr instances. Is that
understanding correct?

Also, why do you say that the commit is inappropriate?





Rgds
AJ

> On Jun 5, 2015, at 11:16 PM, Erick Erickson  wrote:
> 
> You have to provide a _lot_ more details. You say:
> "The problem... some data was not get indexed... still sometime we
> found that documents are not getting indexed".
> 
> Neither of these should be happening, so I suspect
> 1> you're expectations aren't correct. For instance, in the
> master/slave setup you won't see docs on the slave until after the
> polling interval is expired and the index is replicated.
> 2> In SolrCloud you aren't committing appropriately.
> 
> You might review: http://wiki.apache.org/solr/UsingMailingLists
> 
> Best,
> Erick
> 
> 
>> On Fri, Jun 5, 2015 at 9:45 AM, Amit Jha  wrote:
>> I want to have realtime index and realtime search.
>> 
>> Rgds
>> AJ
>> 
>>> On Jun 5, 2015, at 10:12 PM, Amit Jha  wrote:
>>> 
>>> Hi,
>>> 
>>> In my use case, I am adding a document to Solr through spring application 
>>> using spring-data-solr. This setup works well with single Solr. In current 
>>> setup it is single point of failure. So we decided to use solr replication 
>>> because we also need centralized search. Therefore we setup two instances 
>>> both in repeater mode. The problem with this setup was, some time data was 
>>> not get indexed. So we moved to SolrCloud, with 3zk and 2 shards and 2 
>>> replica setup, but still sometime we found that documents are not getting 
>>> indexed.
>>> 
>>> I would like to know what is the best way to have highly available setup.
>>> 
>>> Rgds
>>> AJ


Re: Real Time indexing and Scalability

2015-06-05 Thread Amit Jha
Thanks Shawn for reminding me of CloudSolrServer; yes, I have moved to
SolrCloud.

I agree that a repeater is a slave that acts as a master for other slaves.
But it's still a master, and logically it has to obey what a master is
supposed to obey.

If 2 servers are masters, that means writing can be done on both. If I set
up replication between 2 servers and configure both as repeaters, then both
can act as master and slave for each other. Therefore writing can be done
on both.


Rgds
AJ

> On Jun 6, 2015, at 1:26 AM, Shawn Heisey  wrote:
> 
>> On 6/5/2015 1:38 PM, Amit Jha wrote:
>> Thanks Eric, what about document is committed to master?Then document should 
>> be visible from master. Is that correct?
>> 
>> I was using replication with repeater mode because LBHttpSolrServer can send 
>> write request to any of the Solr server, and that Solr should index the 
>> document because it a master. we have a polling interval of 2 sec. After 
>> polling interval slave can poll the data. It is worth to mention here is 
>> application request the commit command. 
>> 
>> If document is committed to master and a search request coming to the same 
>> master then document should be retrieved. Irrespective of replication 
>> because master doesn't know who the slave are?
>> 
>> In repeater mode document can be indexed on both the Solr instance. Is that 
>> understanding correct?
>> 
>> Also why you say that commit is inappropriate?
> 
> If you are not using SolrCloud, then you must index to the master
> *ONLY*.  A repeater does not enable two-way replication.  A repeater is
> a slave that is also a master for additional slaves.  Master-slave
> replication is *only* one-way - from the master to slaves, and if any of
> those slaves are repeaters, from there to additional slaves.
> 
> SolrCloud is probably a far better choice for your setup, especially if
> you are using the SolrJ client.  You mentioned LBHttpSolrServer, which
> is why I am thinking you're using SolrJ.
> 
> With a proper configuration on your collection, SolrCloud lets you index
> to any machine in the cloud and the data will end up exactly where it
> needs to go.  If you use CloudSolrServer/CloudSolrClient and a very
> recent Solr/SolrJ version, the data will be sent directly to the correct
> instance for best performance.
> 
> Thanks,
> Shawn
> 
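
For reference, a minimal sketch of indexing through CloudSolrServer as
suggested above (SolrJ 4.x; the ZooKeeper hosts, collection name and field
names are assumptions):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexer {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server =
                new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");
        doc.addField("name", "example");
        server.add(doc);    // routed to the right shard via ZooKeeper state
        server.commit();
        server.shutdown();
    }
}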


Re: Real Time indexing and Scalability

2015-06-05 Thread Amit Jha
Thanks everyone. I got the answer.

Rgds
AJ

> On Jun 6, 2015, at 7:00 AM, Erick Erickson  wrote:
> 
> bq: if 2 servers are master that means writing can be done on both.
> 
> If there's a single piece of documentation that supports this contention,
> we'll correct it immediately. But it's simply not true.
> 
> As Shawn says, the entire design behind master/slave
> architecture is that there is exactly one (and only one) master that
> _ever_ gets documents indexed to it. Repeaters were introduced
> as a way to "fan out" the replication process, particularly across data
> centers that had "expensive" pipes connecting them. You could have
> the repeater in DC2 relay the index form the master in DC1 to  all slaves in
> DC2. In that kind of setup, you then replicate the index
> across the expensive pipe once rather than once for each slave in
> DC2.
> 
> But even in this situation you are only ever indexing to the master
> on DC1.
> 
> Best,
> Erick
> 
>> On Fri, Jun 5, 2015 at 1:20 PM, Amit Jha  wrote:
>> Thanks Shawn, for reminding CloudSolrServer, yes I have moved to SolrCloud.
>> 
>> I agree that repeater is a slave and acts as master for other slaves. But 
>> still it's a master and logically it has to obey the what master suppose to 
>> obey.
>> 
>> if 2 servers are master that means writing can be done on both. If I setup 
>> replication between 2 servers and configure both as repeater, than both can 
>> act master and slave for each other. Therefore writing can be done on both.
>> 
>> 
>> Rgds
>> AJ
>> 
>>>> On Jun 6, 2015, at 1:26 AM, Shawn Heisey  wrote:
>>>> 
>>>> On 6/5/2015 1:38 PM, Amit Jha wrote:
>>>> Thanks Eric, what about document is committed to master?Then document 
>>>> should be visible from master. Is that correct?
>>>> 
>>>> I was using replication with repeater mode because LBHttpSolrServer can 
>>>> send write request to any of the Solr server, and that Solr should index 
>>>> the document because it a master. we have a polling interval of 2 sec. 
>>>> After polling interval slave can poll the data. It is worth to mention 
>>>> here is application request the commit command.
>>>> 
>>>> If document is committed to master and a search request coming to the same 
>>>> master then document should be retrieved. Irrespective of replication 
>>>> because master doesn't know who the slave are?
>>>> 
>>>> In repeater mode document can be indexed on both the Solr instance. Is 
>>>> that understanding correct?
>>>> 
>>>> Also why you say that commit is inappropriate?
>>> 
>>> If you are not using SolrCloud, then you must index to the master
>>> *ONLY*.  A repeater does not enable two-way replication.  A repeater is
>>> a slave that is also a master for additional slaves.  Master-slave
>>> replication is *only* one-way - from the master to slaves, and if any of
>>> those slaves are repeaters, from there to additional slaves.
>>> 
>>> SolrCloud is probably a far better choice for your setup, especially if
>>> you are using the SolrJ client.  You mentioned LBHttpSolrServer, which
>>> is why I am thinking you're using SolrJ.
>>> 
>>> With a proper configuration on your collection, SolrCloud lets you index
>>> to any machine in the cloud and the data will end up exactly where it
>>> needs to go.  If you use CloudSolrServer/CloudSolrClient and a very
>>> recent Solr/SolrJ version, the data will be sent directly to the correct
>>> instance for best performance.
>>> 
>>> Thanks,
>>> Shawn
>>> 


SolrCloud Document Update Problem

2015-06-29 Thread Amit Jha
Hi,

I set up a SolrCloud with 2 shards, each having 2 replicas, with a 3-node
ZooKeeper ensemble.

We add and update documents from a web app. While updating, we delete the
document and add the same document with updated values and the same unique
id.

I am facing a very strange issue: sometimes 2 documents have the same
unique ID, one document with old values and another one with new values.
It happens only when we update the document.

Please suggest or guide...

Rgds


Re: SolrCloud Document Update Problem

2015-06-29 Thread Amit Jha
It was because of the issues

Rgds
AJ

> On Jun 29, 2015, at 6:52 PM, Shalin Shekhar Mangar  
> wrote:
> 
>> On Mon, Jun 29, 2015 at 4:37 PM, Amit Jha  wrote:
>> Hi,
>> 
>> I setup a SolrCloud with 2 shards each is having 2 replicas with 3
>> zookeeper ensemble.
>> 
>> We add and update documents from web app. While updating we delete the
>> document and add same document with updated values with same unique id.
> 
> I am not sure why you delete the document. If you use the same unique
> key and send the whole document again (with some other fields
> changed), Solr will automatically overwrite the old document with the
> new one.
> 
>> 
>> I am facing a very strange issue that some time 2 documents have the same
>> unique ID. One document with old values and another one with new values.
>> It happens only we update the document.
> 
> 
>> 
>> Please suggest or guide...
>> 
>> Rgds
> 
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
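
For reference, a minimal SolrJ sketch of the overwrite-in-place update
described above (field names are assumptions): no delete is needed, because
a document added with an existing uniqueKey value replaces the old one.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class OverwriteExample {
    static void update(SolrServer server) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-123");   // same unique id as the old doc
        doc.addField("status", "updated");
        server.add(doc);                 // overwrites the doc with this id
        server.commit();
    }
}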


SegmentInfo from (SolrIndexSearcher) LeafReader

2016-05-14 Thread Amit Kumar
Hey Guys,

I am writing a SearchComponent for Solr 5.4.0 that does some caching at the
level of segments, and I want to be able to get SegmentInfo from a
LeafReader. I am unable to figure that out: a LeafReader is not an instance
of SegmentReader, which is what exposes the segment information. Is it
still possible to get the SegmentInfo, and am I missing something, given
that I am in SearchComponent.prepare/process?

Many thanks,
Amit
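
For reference, one best-effort way to get at the per-segment info in Lucene
5.x is sketched below; this is an assumption-laden approach, since Solr can
hand back wrapped readers and the instanceof check may fail:

import org.apache.lucene.index.FilterLeafReader;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentReader;
import org.apache.solr.search.SolrIndexSearcher;

class SegmentInspector {
    static void inspect(SolrIndexSearcher searcher) {
        for (LeafReaderContext ctx : searcher.getIndexReader().leaves()) {
            // Unwrap any FilterLeafReader layers before testing the type.
            LeafReader unwrapped = FilterLeafReader.unwrap(ctx.reader());
            if (unwrapped instanceof SegmentReader) {
                SegmentCommitInfo info =
                        ((SegmentReader) unwrapped).getSegmentInfo();
                // e.g. use info.info.name as a per-segment cache key
            }
        }
    }
}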


Solr results in null response

2017-12-26 Thread Kumar, Amit
Hi Team,

I have an application running on Solr 4.7.0, and I am frequently seeing
null responses for requests to the application. On the Solr console I see
the error below, related to grouping parameters, although I am setting all
the grouping parameters in code. Could you please suggest why it is
throwing this error, the scenario in which it throws it, and how I can
rectify it?

Thanks in advance. Below is the full error details:

org.apache.solr.common.SolrException: Specify at least one field, function or 
query to group by.
 at org.apache.solr.search.Grouping.execute(Grouping.java:298)
 at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:433)
 at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:214)
 at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
 at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
 at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
 at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
 at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
 at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
 at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
 at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
 at 
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:503)
 at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
 at 
com.googlecode.psiprobe.Tomcat70AgentValve.invoke(Tomcat70AgentValve.java:38)
 at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
 at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
 at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
 at 
org.apache.catalina.ha.session.JvmRouteBinderValve.invoke(JvmRouteBinderValve.java:218)
 at 
org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:333)
 at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:421)
 at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1070)
 at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
 at 
org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2462)
 at 
org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2451)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at 
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
 at java.lang.Thread.run(Thread.java:745)
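
For reference, Grouping.execute raises "Specify at least one field,
function or query to group by." when a request arrives with group=true but
none of group.field, group.func or group.query set, so it is worth checking
whether some requests go out without those parameters. A request that
satisfies the check looks like this (field name is illustrative):

http://localhost:8080/solr/select?q=*:*&group=true&group.field=category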

best,
Amit



Please explain solrconfig.xml in terms of Solr APIs (Java pseudo-code)

2013-10-25 Thread Amit Aggarwal
Hello All,

Can someone explain the following snippet of solrconfig.xml to me in terms
of the Solr API (Java pseudo-code), for better understanding?

Like:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">...</str>
  </updateLog>
</updateHandler>


Here I want to know:

1. What is "updateHandler"? Is it some package, class, or interface?
2. What is solr.DirectUpdateHandler2? Is it a class?
3. What is "updateLog"? Is it a package?
4. How do we know that updateLog has the sub-element "dir"?
5. How do we know that "updateLog" should be a sub-element of
"updateHandler"? Is "updateLog" some kind of subclass of something else?


I KNOW that all these things are given in solrconfig.xml, but I do not want
to cram those things.

One example is jetty.xml: whatever we write there can be translated to
Java pseudo-code.
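
As a hedged sketch (not the actual Solr wiring, which lives in
org.apache.solr.core.SolrConfig and related classes), the snippet above
corresponds roughly to this Java pseudo-code:

import org.apache.solr.core.SolrCore;
import org.apache.solr.update.UpdateHandler;

class PseudoConfigLoader {
    // <updateHandler class="solr.DirectUpdateHandler2">: "solr." is an
    // alias for the org.apache.solr.* packages, and the named class is
    // instantiated by reflection.
    static UpdateHandler load(SolrCore core) throws Exception {
        Class<?> clazz =
                Class.forName("org.apache.solr.update.DirectUpdateHandler2");
        return (UpdateHandler) clazz.getConstructor(SolrCore.class)
                                    .newInstance(core);
        // <updateLog> is not a class at all; it is a child config element
        // that the update handler reads (its "dir" string says where the
        // transaction log files go).
    }
}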


Re: Please explain solrconfig.xml in terms of Solr APIs (Java pseudo-code)

2013-10-25 Thread Amit Aggarwal
Yeah, you caught it right. Yes, it was a kind of DTD.
Anyways, thanks a lot for clearing my doubt.

SOLVED.
On 25-Oct-2013 6:34 PM, "Daniel Collins"  wrote:

> I think what you are looking for is some kind of DTD/schema you can use to
> see all the possible parameters in SolrConfig.xml, short answer, there
> isn't one (currently) :(
>
> jetty.xml has a DTD schema, and its XMLConfiguration format is inherently
> designed to convert to code, so the list of possible options can be
> generated by Java Reflection, but Solr's isn't quite that advanced.
>
> Generally speaking the config is described in
> http://wiki.apache.org/solr/SolrConfigXml.
>
> However, that is (by the nature of manually generated documentation) a bit
> out of date, so things like the updateLog aren't referenced there.  There
> is no Schema or DTD for SolrConfig, the best place to look for what the
> various options are is either the sample config which is generally quite
> good or the code (org.apache.solr.core.SolrConfig.java).
>
> At the end of the day updateLog is just the name of a config parameter; it
> is grouped under updateHandler since it relates to that.  How we "know"
> such a parameter exists:
>
> 1) it was in the sample config (and commented to indicate what it means)
> 2) its referenced in the code if you look through that
>
>
>
>
> On 25 October 2013 13:06, Alexandre Rafalovitch 
> wrote:
>
> > I think better understanding is a bit too vague. Is there a specific
> > problem you have? Your Jetty example would make sense if, for example,
> your
> > goal was to automatically generate solrconfig.xml from some other
> > configuration. But even then, you would probably use fillable templates
> and
> > don't need fully corresponding JAVA api.
> >
> > For example, you are unlikely to edit the very line you are asking about,
> > it's a little too esoteric:
> > 
> >
> > Perhaps, what you want to do is to look at the smallest possible
> > solrconfig.xml and then expand from there by looking at additional
> options.
> >
> > Regarding specific options available, most are documented on the Wiki and
> > in the comments of the sample file.
> >
> > Regards,
> >Alex.
> >
> > Personal website: http://www.outerthoughts.com/
> > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > - Time is the quality of nature that keeps events from happening all at
> > once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> >
> >
> > On Fri, Oct 25, 2013 at 5:19 PM, Amit Aggarwal <
> amit.aggarwa...@gmail.com
> > >wrote:
> >
> > > Hello All,
> > >
> > > Can some one explain me following snippet of SolrConfig.xml in terms of
> > > Solr API (Java Psuedo Code) for better understanding.
> > >
> > > like
> > > **
> > > * *
> > > * *
> > > *   *
> > > **
> > > **
> > > **
> > > **
> > > **
> > >
> > >
> > > Here I want to know .
> > >
> > > 1. What is "updateHandler" ? Is it some Package or class of interface ?
> > > 2. Whats is solr.DirectUpdateHandler2 ? Is it class
> > > 3. What is "updateLog" ? is it package ?
> > > 4. How do we know that UpdateLog have sub-element "dir" ?
> > > 5. how do we know that "updateLog" would be sub-element of
> > "updateHandler"
> > > ?? Is "updateLog" some kind of subClass of something else ?
> > >
> > >
> > > I KNOW that all these things are given in SolConfig.xml but I donot
> want
> > to
> > > cram those things .
> > >
> > > One example of jetty.xml whatever we write there , it can be translated
> > to
> > > JAVA psuedo code
> > >
> >
>


Re: How to configure solr to our java project in eclipse

2013-10-27 Thread Amit Aggarwal
How do you start your other project? If it is Maven or Ant, then you can
use the antrun plugin to start Solr. Otherwise you can write a small shell
script to start Solr.
 On 27-Oct-2013 9:15 PM, "giridhar"  wrote:

> Hi friends, I am giridhar. Please clarify my doubt.
>
> We are using solr for our project. The problem is that solr is outside of
> our project (in another folder).
>
> We have to manually type java -jar start.jar to start solr and use its
> services.
>
> But what we need is: when we run the project, solr should start
> automatically.
>
> Our project is a java project with tomcat in eclipse.
>
> How can I achieve this?
>
> Please help me.
>
> Thankyou.
> Giridhar
>
>
>
>


Re: Solr For

2013-10-27 Thread Amit Aggarwal
It depends. One core means one schema file and one solrconfig.xml.

So if you want only one core, then put all the required fields for both
searches in one schema file and carry out your searches. Otherwise, make
two cores with two schema files and perform the searches accordingly.
On 27-Oct-2013 7:22 AM, "Baskar Sikkayan"  wrote:

> Hi,
>Looking for solr config for Job Site. In a job site there are 2 main
> searches.
>
> 1) Employee can search for job ( based on skill set, job location, title,
> salary )
> 2) Employer can search for employees ( based on skill set, exp, location,
>  )
>
> Should i have a separate config xml for both searches?
>
> Thanks,
> Baskar
>


Re: Stop solr service

2013-10-27 Thread Amit Aggarwal
Lol ... Unsubscribe from this mailing list .
On 27-Oct-2013 5:02 PM, "veena rani"  wrote:

> I want to stop the mail
>
>
> On Sun, Oct 27, 2013 at 4:37 PM, Rafał Kuć  wrote:
>
> > Hello!
> >
> > Could you please write more about what you want to do? Do you need to
> > stop running Solr process. If yes what you need to do is stop the
> > container (Jetty/Tomcat) that Solr runs in. You can also kill JVM
> > running Solr, however it will be usually enough to just stop the
> > container.
> >
> > --
> > Regards,
> >  Rafał Kuć
> > Performance Monitoring * Log Analytics * Search Analytics
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> > > Hi Team,
> >
> > > Pla stop the solr service.
> >
> >
>
>
> --
> Regards,
> Veena Rani P N
> Banglore.
> 9538440458
>


Re: How to configure solr to our java project in eclipse

2013-10-27 Thread Amit Nithian
Try this:
http://hokiesuns.blogspot.com/2010/01/setting-up-apache-solr-in-eclipse.html

I use this today and it still works. If anything is outdated (as it's a
relatively old post) let me know.
I wrote this so ping me if you have any questions.

Thanks
Amit


On Sun, Oct 27, 2013 at 7:33 PM, Amit Aggarwal wrote:

> How do you start your other project? If it is Maven or Ant, then you can
> use the antrun plugin to start Solr. Otherwise you can write a small shell
> script to start Solr.
>  On 27-Oct-2013 9:15 PM, "giridhar"  wrote:
>
> > Hi friends,Iam giridhar.please clarify my doubt.
> >
> > we are using solr for our project.the problem the solr is outside of our
> > project( in another folder)
> >
> > we have to manually type java -start.jar to start the solr and use that
> > services.
> >
> > But what we need is,when we run the project,the solr should be
> > automatically
> > start.
> >
> > our project is a java project with tomcat in eclipse.
> >
> > How can i achieve this.
> >
> > Please help me.
> >
> > Thankyou.
> > Giridhar
> >
> >
> >
> >
>


When is/should qf different from pf?

2013-10-27 Thread Amit Nithian
Hi all,

I have been using Solr for years but never really stopped to wonder:

When using the dismax/edismax handler, when do you have the qf different
from the pf?

I have always set them to be the same (maybe different weights) but I was
wondering if there is a situation where you would have a field in the qf
not in the pf or vice versa.

My understanding from the docs is that qf is a term-wise hard filter, while
pf is a phrase-wise boost of documents that made it past the "qf" filter.
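
For illustration, a case where they differ (field names and boosts are
assumptions): qf=title^2 body matches the individual query terms in those
fields, while pf=title^10 additionally rewards only documents whose title
contains the terms as a phrase, e.g.

q=session management&defType=edismax&qf=title^2 body&pf=title^10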

Thanks!
Amit


return value from SolrJ client to php

2013-10-28 Thread Amit Aggarwal
Hello All,

I have a requirement where I have to connect to Solr using the SolrJ client,
and the documents returned by Solr to the SolrJ client have to be returned
to PHP.

I know it's simple to get documents from Solr to SolrJ, but how do I return
documents from SolrJ to PHP?


Thanks
Amit Aggarwal


Re: When is/should qf different from pf?

2013-10-28 Thread Amit Nithian
Thanks Erick. Numeric fields make sense, as I guess would strict string
fields too, since it's one term. In the normal text-searching case, though,
does it make sense to have qf and pf differ?

Thanks
Amit
On Oct 28, 2013 3:36 AM, "Erick Erickson"  wrote:

> The facetious answer is "when phrases aren't important in the fields".
> If you're doing a simple boolean match, adding phrase fields will add
> expense, to no good purpose etc. Phrases on numeric
> fields seems wrong.
>
> FWIW,
> Erick
>
>
> On Mon, Oct 28, 2013 at 1:03 AM, Amit Nithian  wrote:
>
> > Hi all,
> >
> > I have been using Solr for years but never really stopped to wonder:
> >
> > When using the dismax/edismax handler, when do you have the qf different
> > from the pf?
> >
> > I have always set them to be the same (maybe different weights) but I was
> > wondering if there is a situation where you would have a field in the qf
> > not in the pf or vice versa.
> >
> > My understanding from the docs is that qf is a term-wise hard filter
> while
> > pf is a phrase-wise boost of documents who made it past the "qf" filter.
> >
> > Thanks!
> > Amit
> >
>


Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Amit Aggarwal
Agreed with Doug
On 12-Nov-2013 6:46 PM, "Doug Turnbull" 
wrote:

> As an aside, I think one reason people feel compelled to deviate from the
> distributed jetty distribution is because the folder is named "example".
> I've had to explain to a few clients that this is a bit of a misnomer. The
> IT dept especially sees "example" and feels uncomfortable using that as a
> starting point for a jetty install. I wish it was called "default" or "bin"
> or something where its more obviously the default jetty distribution of
> Solr.
>
>
> On Tue, Nov 12, 2013 at 7:06 AM, Roland Everaert  >wrote:
>
> > In my case, the first time I had to deploy and configure solr on tomcat
> > (and jboss) it was a requirement to reuse as much as possible the
> > application/web server already in place. The next deployment I also use
> > tomcat, because I was used to deploy on tomcat and I don't know jetty at
> > all.
> >
> > I could ask the same question with regard to jetty. Why use/bundle(/ if
> not
> > recommend) jetty with solr over other webserver solutions?
> >
> > Regards,
> >
> >
> > Roland Everaert.
> >
> >
> >
> > On Tue, Nov 12, 2013 at 12:33 PM, Alvaro Cabrerizo  > >wrote:
> >
> > > In my case, the selection of the servlet container has never been a
> hard
> > > requirement. I mean, some customers provide us a virtual machine
> > configured
> > > with java/tomcat , others have a tomcat installed and want to share it
> > with
> > > solr, others prefer jetty because their sysadmins are used to configure
> > > it...  At least in the projects I've been working in, the selection of
> > the
> > > servlet engine has not been a key factor in the project success.
> > >
> > > Regards.
> > >
> > >
> > > On Tue, Nov 12, 2013 at 12:11 PM, Andre Bois-Crettez
> > > wrote:
> > >
> > > > We are using Solr running on Tomcat.
> > > >
> > > > I think the top reasons for us are :
> > > >  - we already have nagios monitoring plugins for tomcat that trace
> > > > queries ok/error, http codes / response time etc in access logs,
> number
> > > > of threads, jvm memory usage etc
> > > >  - start, stop, watchdogs, logs : we also use our standard tools for
> > that
> > > >  - what about security filters ? Is that possible with jetty ?
> > > >
> > > > André
> > > >
> > > >
> > > > On 11/12/2013 04:54 AM, Alexandre Rafalovitch wrote:
> > > >
> > > >> Hello,
> > > >>
> > > >> I keep seeing here and on Stack Overflow people trying to deploy
> Solr
> > to
> > > >> Tomcat. We don't usually ask why, just help when where we can.
> > > >>
> > > >> But the question happens often enough that I am curious. What is the
> > > >> actual
> > > >> business case. Is that because Tomcat is well known? Is it because
> > other
> > > >> apps are running under Tomcat and it is ops' requirement? Is it
> > because
> > > >> Tomcat gives something - to Solr - that Jetty does not?
> > > >>
> > > >> It might be useful to know. Especially, since Solr team is
> considering
> > > >> making the server part into a black box component. What use cases
> will
> > > >> that
> > > >> break?
> > > >>
> > > >> So, if somebody runs Solr under Tomcat (or needed to and gave up),
> > let's
> > > >> use this thread to collect this knowledge.
> > > >>
> > > >> Regards,
> > > >> Alex.
> > > >> Personal website: http://www.outerthoughts.com/
> > > >> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > >> - Time is the quality of nature that keeps events from happening all
> > at
> > > >> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> > > book)
> > > >>
> > > >> --
> > > >> André Bois-Crettez
> > > >>
> > > >> Software Architect
> > > >> Search Developer
> > > >> http://www.kelkoo.com/
> > > >>
> > > >
> > > > Kelkoo SAS
> > > > A simplified joint-stock company (Société par Actions Simplifiée)
> > > > with a capital of €4,168,964.30
> > > > Registered office: 8, rue du Sentier 75002 Paris
> > > > 425 093 069 RCS Paris
> > > >
> > > > This message and its attachments are confidential and intended
> > > > exclusively for their addressees. If you are not the intended
> > > > recipient of this message, please delete it and notify the sender.
> > > >
> > >
> >
>
>
>
> --
> Doug Turnbull
> Search & Big Data Architect
> OpenSource Connections 
>


Boosting documents by categorical preferences

2013-11-12 Thread Amit Nithian
Hi all,

I have a question around boosting. I wanted to use the &boost= to write a
nested query that will boost a document based on categorical preferences.

For a movie search for example, say that a user likes drama, comedy, and
action. I could use things like

qq=&q={!boost%20b=$b%20defType=edismax%20v=$qq}&b=sum(product(query($cat1),1.482),product(query($cat2),0.1199),product(query($cat3),1.448))&cat1=category:Drama&cat2=category:Comedy&cat3=category:Action

where cat1=Drama cat2=Comedy cat3=Action

Currently I have the weights set to the z-score equivalent of a user's
preference for that category which is simply how many standard deviations
above the global average is this user's preference for that movie category.

My question though is basically whether or not semantically the equation
query(category:Drama)*<drama weight> + query(category:Comedy)*<comedy weight>
+ query(category:Action)*<action weight> makes sense?

What are some techniques people use to boost documents based on discrete
things like category, manufacturer, genre etc?

Thanks!
Amit


How to get score with getDocList method Solr API

2013-11-18 Thread Amit Aggarwal

Hello All,

I am trying to develop a custom request handler.
Here is the snippet :

// returnMe is nothing but a list of Document going to return

try {

    // FLAG ???
    DocList docList = searcher.getDocList(parsedQuery,
            parsedFilterQueryList, Sort.RELEVANCE, 1, maxDocs, FLAG);

    // Now get a DocIterator
    DocIterator it = docList.iterator();

    // Now for each id, get the doc and put it in the list
    while (it.hasNext()) {

        returnMe.add(searcher.doc(it.next()));

    }


Ques 1 -> My question is, what does FLAG represent in the getDocList method?
Ques 2 -> How can I ensure that the searcher.getDocList method gives me the
score along with each document?



--
Amit Aggarwal
8095552012



Re: Boosting documents by categorical preferences

2013-11-18 Thread Amit Nithian
Hey Chris,

Sorry for the delay and thanks for your response. This was inspired by your
talk on boosting and biasing that you presented way back when at a meetup.
I'm glad that my general approach seems to make sense.

My approach was something like:
1) Look at the categories that the user has preferred and compute the
z-score
2) Pick the top 3 among those
3) Use those to boost search results.

I'll look at using the boosts as an exponent instead of a multiplier as I
think that would make sense.. also as it handles the 0 case.

This is for a prototype I am doing but I'll share the results one day in a
meetup as I think it'll be kinda interesting.

Thanks again
Amit


On Thu, Nov 14, 2013 at 11:11 AM, Chris Hostetter
wrote:

>
> : I have a question around boosting. I wanted to use the &boost= to write a
> : nested query that will boost a document based on categorical preferences.
>
> You have no idea how stoked I am to see you working on this in a real
> world application.
>
> : Currently I have the weights set to the z-score equivalent of a user's
> : preference for that category which is simply how many standard deviations
> : above the global average is this user's preference for that movie
> category.
> :
> : My question though is basically whether or not semantically the equation
> : query(category:Drama)*<drama weight> + query(category:Comedy)*<comedy
> : weight> + query(category:Action)*<action weight> makes sense?
>
> My gut says that your approach makes sense -- but if i'm
> understanding you correctly, i think that you need to add "1" to
> all your weights: the "boost" is a multiplier, so if someone's rating for
> every category is 0 std devs above the average rating (ie: the most
> average person imaginable), you don't want to give every movie in every
> category a score of 0.
>
> Are you picking the "top 3" categories the user prefers as a cut off, or
> are you arbitrarily using N category boosts for however many N categories
> the user is above the global average in their pref for that category?
>
> Are your prefrences coming from explicit user feedback on the categories
> (ie: "rate how much you like comedies on a scale of 1-5") or are you
> infering it from user ratings of the movies themselves? (ie: "rate this
> movie, which happens to be a scifi,action,comedy, on a scale of 1-5") ...
> because if it's the latter you probably want to be careful to also
> normalize based on how many categories the movie is in.
>
> the other thing to consider is whether you want to include "negative
> preferences" (ie: weights less than 1) based on how many std dev the user's
> average is *below* the global average for a category .. in this case i
> *think* you'd want to divide the raw value from -1 to get a useful
> multiplier.
>
> Alternatively: you oculd experiment with using the weights as exponents
> instead of multipliers...
>
>
> b=sum(pow(query($cat1),1.482),pow(query($cat2),0.1199),pow(query($cat3),1.448))
>
> ...that would simplify the math you'd have to worry about both for the
> "totally boring average user" (x**0 = 1) and for the categories users hate
> (x**-5 = some positive fraction that will act as a penalty) ... but you'd
> definitely need to run some tests to see if it "over boosts" as the std
> dev variations get really high (might want to take a root first before
> using them as the exponent)
>
>
>
> -Hoss
>


Re: How to get score with getDocList method Solr API

2013-11-19 Thread Amit Aggarwal
Hello Shekhar,
Thanks for answering. Do I have to set the GET_SCORES flag as the last
parameter of the getDocList method?

Thanks
On 19-Nov-2013 1:43 PM, "Shalin Shekhar Mangar" 
wrote:

> A few flags are supported:
> public static final int GET_DOCSET= 0x4000;
> public static final int TERMINATE_EARLY = 0x04;
> public static final int GET_DOCLIST   =0x02; // get
> the documents actually returned in a response
> public static final int GET_SCORES =   0x01;
>
> Use the GET_SCORES flag to get the score with each document.
>
> On Tue, Nov 19, 2013 at 8:08 AM, Amit Aggarwal
>  wrote:
> > Hello All,
> >
> > I am trying to develop a custom request handler.
> > Here is the snippet :
> >
> > // returnMe is nothing but a list of Document going to return
> >
> > try {
> >
> > // FLAG ???
> > DocList docList = searcher.getDocList(parsedQuery,
> > parsedFilterQueryList, Sort.RELEVANCE, 1, maxDocs , FLAG);
> >
> > // Now get DocIterator
> > DocIterator it = docList.iterator();
> >
> > // Now for each id get doc and put it in
> list
> >
> > int i =0;
> > while (it.hasNext()) {
> >
> > returnMe.add(searcher.doc(it.next()));
> >
> > }
> >
> >
> > Ques 1 - > My question is , what does FLAG represent in getDocList
> method ?
> > Ques 2 - > How can I ensure that searcher.getDocList method give me score
> > also with each document.
> >
> >
> > --
> > Amit Aggarwal
> > 8095552012
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

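For completeness, a sketch of the loop above with the flag set and the score
read back (same variable names as in the original snippet; offset 0 assumed):

    DocList docList = searcher.getDocList(parsedQuery, parsedFilterQueryList,
            Sort.RELEVANCE, 0, maxDocs, SolrIndexSearcher.GET_SCORES);
    DocIterator it = docList.iterator();
    while (it.hasNext()) {
        int docId = it.nextDoc();   // internal Lucene document id
        float score = it.score();   // meaningful only because GET_SCORES was passed
        returnMe.add(searcher.doc(docId));
    }
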

Re: Boosting documents by categorical preferences

2013-11-20 Thread Amit Nithian
I thought about that, but my concern/question was how. If I used the pow
function then I'm still boosting the bad categories by a small amount.
Alternatively, I could multiply by a negative number, but does that
work as expected?

I haven't done much with negative boosting except for the sledgehammer
approach of category exclusion through filters.

Thanks
Amit
On Nov 19, 2013 8:51 AM, "Chris Hostetter"  wrote:

> : My approach was something like:
> : 1) Look at the categories that the user has preferred and compute the
> : z-score
> : 2) Pick the top 3 among those
> : 3) Use those to boost search results.
>
> I think that totally makes sense ... the additional bit i was suggesting
> that you consider is that instead of picking the "highest" 3 z-scores,
> pick the z-scores with the greatest absolute value ... that way if someone
> is a very booring person and their "positive interests" are all basically
> exactly the same as the mean for everyone else, but they have some very
> strong "dis-interests" you don't bother boosting on those miniscule
> interests and instead you negatively boost on the things they are
> antogonistic against.
>
>
> -Hoss
>


Can I use boosting fields with edismax ?

2013-11-23 Thread Amit Aggarwal
Hello All ,

I am using defType=edismax
So will boosting work like this in solrconfig.xml?

<str name="qf">value_search^2.0 desc_search country_search^1.5
state_search^2.0 city_search^2.5 area_search^3.0</str>

I think it is not working ..

If yes , then what should I do ?


Re: Can I use boosting fields with edismax ?

2013-11-24 Thread Amit Aggarwal
Ok Erick.. I will try thanks
On 25-Nov-2013 2:46 AM, "Erick Erickson"  wrote:

> This should work. Try adding &debug=all to your URL, and examine
> the output both with and without your boosting. I believe you'll see
> the difference in the score calculations. From there it's a matter
> of adjusting the boosts to get the results you want.
>
>
> Best,
> Erick
>
>
> On Sat, Nov 23, 2013 at 9:17 AM, Amit Aggarwal  >wrote:
>
> > Hello All ,
> >
> > I am using defType=edismax
> > So will boosting work like this in solrconfig.xml?
> >
> > <str name="qf">value_search^2.0 desc_search country_search^1.5
> > state_search^2.0 city_search^2.5 area_search^3.0</str>
> >
> > I think it is not working ..
> >
> > If yes , then what should I do ?
> >
>

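For reference, in solrconfig.xml the boosts need to sit inside the request
handler's defaults, roughly like this (a sketch; handler name assumed):

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="qf">value_search^2.0 desc_search country_search^1.5
            state_search^2.0 city_search^2.5 area_search^3.0</str>
      </lst>
    </requestHandler>
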

Please help me to understand debugQuery output

2013-11-25 Thread Amit Aggarwal

Hello All,

Can any one help me in understanding "debugQuery" output like this.






0.6276088 = (MATCH) sum of:
  0.6276088 = (MATCH) max of:
    0.18323982 = (MATCH) sum of:
      0.18323982 = (MATCH) weight(state_search:a in 327) [DefaultSimilarity], result of:
        0.18323982 = score(doc=327,freq=2.0 = termFreq=2.0), product of:
          0.3188151 = queryWeight, product of:
            3.2512918 = idf(docFreq=35, maxDocs=342)
            0.098057985 = queryNorm
          0.5747526 = fieldWeight in 327, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.2512918 = idf(docFreq=35, maxDocs=342)
            0.125 = fieldNorm(doc=327)
    0.2505932 = (MATCH) sum of:
      0.2505932 = (MATCH) weight(country_search:a in 327) [DefaultSimilarity], result of:
        0.2505932 = score(doc=327,freq=1.0 = termFreq=1.0), product of:
          0.3135134 = queryWeight, product of:
            3.1972246 = idf(docFreq=37, maxDocs=342)
            0.098057985 = queryNorm
          0.79930615 = fieldWeight in 327, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            3.1972246 = idf(docFreq=37, maxDocs=342)
            0.25 = fieldNorm(doc=327)
    0.25283098 = (MATCH) sum of:
      0.25283098 = (MATCH) weight(area_search:a in 327) [DefaultSimilarity], result of:
        0.25283098 = score(doc=327,freq=1.0 = termFreq=1.0), product of:
          0.398 = queryWeight, product of:
            4.06 = idf(docFreq=15, maxDocs=342)
            0.098057985 = queryNorm
          0.6347222 = fieldWeight in 327, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            4.06 = idf(docFreq=15, maxDocs=342)
            0.15625 = fieldNorm(doc=327)
    0.6276088 = (MATCH) sum of:
      0.12957011 = (MATCH) weight(city_search:a in 327) [DefaultSimilarity], result of:
        0.12957011 = score(doc=327,freq=1.0 = termFreq=1.0), product of:
          0.3188151 = queryWeight, product of:
            3.2512918 = idf(docFreq=35, maxDocs=342)
            0.098057985 = queryNorm
          0.40641147 = fieldWeight in 327, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            3.2512918 = idf(docFreq=35, maxDocs=342)
            0.125 = fieldNorm(doc=327)
      0.3638727 = (MATCH) weight(city_search:ab in 327) [DefaultSimilarity], result of:
        0.3638727 = score(doc=327,freq=1.0 = termFreq=1.0), product of:
          0.5342705 = queryWeight, product of:
            5.4485164 = idf(docFreq=3, maxDocs=342)
            0.098057985 = queryNorm
          0.68106455 = fieldWeight in 327, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            5.4485164 = idf(docFreq=3, maxDocs=342)
            0.125 = fieldNorm(doc=327)
      0.13416591 = (MATCH) weight(city_search:b in 327) [DefaultSimilarity], result of:
        0.13416591 = score(doc=327,freq=1.0 = termFreq=1.0), product of:
          0.32441998 = queryWeight, product of:
            3.3084502 = idf(docFreq=33, maxDocs=342)
            0.098057985 = queryNorm
          0.41355628 = fieldWeight in 327, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            3.3084502 = idf(docFreq=33, maxDocs=342)
            0.125 = fieldNorm(doc=327)

Any links where this output is explained?

Thanks

--
Amit Aggarwal
8095552012

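For reference, each leaf in the output above is plain arithmetic. The first
one, for state_search:a, multiplies out as:

    score = queryWeight * fieldWeight
          = (idf * queryNorm) * (tf * idf * fieldNorm)
          = (3.2512918 * 0.098057985) * (1.4142135 * 3.2512918 * 0.125)
          = 0.3188151 * 0.5747526
          = 0.18323982

The outer "max of:" then keeps the highest-scoring clause (the city_search
sum, 0.6276088), which becomes the document score.
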


Re: /select with 'q' parameter does not work

2013-12-11 Thread Amit Aggarwal
Because in your solrconfig, against /select, DirectUpdateHandler is
mentioned. It should be solr.SearchHandler.
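A minimal sketch of a correct declaration, for comparison (the df value is
assumed from the schema below):

    <requestHandler name="/select" class="solr.SearchHandler" default="true">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">10</int>
        <str name="df">contents</str>
      </lst>
    </requestHandler>
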
On 11-Dec-2013 3:11 PM, "Nutan"  wrote:

> I have indexed 9 docs.
> this my* schema.xml*
>
> 
> 
>
>  multiValued="false"/>
>  required="true"
> multiValued="false"/>
>  multiValued="false"/>
>  multiValued="true"/>
> 
>  multiValued="false"/>
> 
>  multiValued="true"/>
>
>  stored="false" />
> 
> 
>
> 
>
>  positionIncrementGap="100" >
> 
> 
> 
> 
> 
> 
>
>
> 
>  class="solr.StrField" />
>  positionIncrementGap="0"/>
> 
> 
> 
> 
> 
> 
>
> 
> 
>
>
> 
> 
> 
> 
>  splitOnCaseChange="1" generateNumberParts="1" splitOnNumerics="1" />
>  dictionary="my_stemmer.txt" />
> 
>  ignoreCase="true" expand="false" />
> 
> 
> 
> 
> 
> 
>  splitOnCaseChange="1" generateNumberParts="1" splitOnNumerics="1" />
>  dictionary="my_stemmer.txt" />
> 
> 
> 
> 
> 
> contents
> id
> 
>
> *solrconfig.xml* is:
>
> 
>
> 
>
>   LUCENE_42
>
>   ${solr.document.data.dir:}
>
>   
>multipartUploadLimitInKB="8500" />
> 
>
>
>
>
>default="true">
>
>  
>explicit
>20
>*
>id
>2.1
>  
>   
>
>   
>   
> ${solr.document.data.dir:}
>   
>   
>
>   class="solr.FieldAnalysisRequestHandler" />
>  
>  
> 
>  
>explicit
>10
>contents
>  
> 
> 
> (i have also added extract,analysis,elevator,promotion,spell,suggester
> components in solrconfig but i guess that wont select query)
> When i run this:
> http://localhost:8080/solr/document/select?q=*:*   --> all the 9 docs are
> returned
>
> but when i run this:
> http://localhost:8080/solr/document/select?q=programmer or anything in
> place
> of programmer --> output shows numfound=0 evenif there are about 34 times
> programmer has appeared in docs.
>
> Initially it worked fine,but not now.
> Why is it so?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/select-with-q-parameter-does-not-work-tp4106099.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: /select with 'q' parameter does not work

2013-12-11 Thread Amit Aggarwal
When you start Solr (java -jar start.jar), do you see any error or
exception? Then check whether there is any problem there. Otherwise take the
stock Solr solrconfig.xml and try to run with that; it should work.
On 11-Dec-2013 5:41 PM, "Nutan"  wrote:

>  default="true">
>
>  
>explicit
>20
>*
>id
>2.1
>  
>   
>
>
>   
>   
> ${solr.document.data.dir:}
>   
>   
>
>
>   
>
>   class="solr.FieldAnalysisRequestHandler" />
>  
>  
>
> 
>  
>explicit
>10
>contents
>  
> 
>
> i made changes n this new solrconfig.xml ,but still the query does not
> work.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/select-with-q-parameter-does-not-work-tp4106099p4106133.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


DateField - Invalid JSON String Exception - converting Query Response to JSON Object

2014-01-06 Thread Amit Jha
Hi,

"Wish You All a Very Happy New Year".

We have an index where a date field has the default value 'NOW'. We are
using solrj to query solr, and when we try to convert the query
response (response.getResponse) to a JSON object in Java, the JSON
API (org.json) throws an 'invalid json string' exception. The API says so
because the date field value, i.e. yyyy-mm-ddThh:mm:ssZ, is not surrounded
by double inverted commas ( " ). So it says a , or } character is required
when the API sees the colon.

Could you please help me to retrieve the date field value as string in JSON
response. Or any pointers.

Any help would be highly appreciable.


Re: DateField - Invalid JSON String Exception - converting Query Response to JSON Object

2014-01-06 Thread Amit Jha
Hi,


We have an index where a date field has the default value 'NOW'. We are
using solrj to query solr, and when we try to convert the query
response (response.getResponse) to a JSON object in Java, the JSON
API (org.json) throws an 'invalid json string' exception. The API says so
because the date field value, i.e. yyyy-mm-ddThh:mm:ssZ, is not surrounded
by double inverted commas ( " ). So it says a , or } character is required
when the API sees the colon.

Could you please help me to retrieve the date field value as string in JSON
response. Or any pointers.

Any help would be highly appreciable.


On Tue, Jan 7, 2014 at 12:28 AM, Amit Jha  wrote:

> Hi,
>
> "Wish You All a Very Happy New Year".
>
> We have index where date field have default value as 'NOW'. We are using
> solrj to query solr and when we try to convert query
> response(response.getResponse) to JSON object in java. The JSON
> API(org.json) throws 'invalid json string' exception. API say so because
> date field value i.e. -mm-ddThh:mm:ssZ  is not surrounded by double
> inverted commas( " ). So It says required , or } character when API see the
> colon.
>
> Could you please help me to retrieve the date field value as string in
> JSON response. Or any pointers.
>
> Any help would be highly appreciable.
>
>
>
>


Re: DateField - Invalid JSON String Exception - converting Query Response to JSON Object

2014-01-07 Thread Amit Jha
I am using it. But timestamp having ":" in between causes the issue. Please
help


On Tue, Jan 7, 2014 at 11:46 AM, Ahmet Arslan  wrote:

> Hi Amit,
>
> If you want json response, Why don't you use wt=json?
>
> Ahmet
>
>
> On Tuesday, January 7, 2014 7:34 AM, Amit Jha 
> wrote:
> Hi,
>
>
> We have index where date field have default value as 'NOW'. We are using
> solrj to query solr and when we try to convert query
> response(response.getResponse) to JSON object in java. The JSON
> API(org.json) throws 'invalid json string' exception. API say so because
> date field value i.e. -mm-ddThh:mm:ssZ  is not surrounded by double
> inverted commas( " ). So It says required , or } character when API see the
> colon.
>
> Could you please help me to retrieve the date field value as string in JSON
> response. Or any pointers.
>
> Any help would be highly appreciable.
>
>
>
> On Tue, Jan 7, 2014 at 12:28 AM, Amit Jha  wrote:
>
> > Hi,
> >
> > "Wish You All a Very Happy New Year".
> >
> > We have index where date field have default value as 'NOW'. We are using
> > solrj to query solr and when we try to convert query
> > response(response.getResponse) to JSON object in java. The JSON
> > API(org.json) throws 'invalid json string' exception. API say so because
> > date field value i.e. -mm-ddThh:mm:ssZ  is not surrounded by double
> > inverted commas( " ). So It says required , or } character when API see
> the
> > colon.
> >
> > Could you please help me to retrieve the date field value as string in
> > JSON response. Or any pointers.
> >
> > Any help would be highly appreciable.
> >
> >
> >
> >
>
>


Re: DateField - Invalid JSON String Exception - converting Query Response to JSON Object

2014-01-07 Thread Amit Jha
Hey Hoss,

Thanks for replying back..Here is the response generated by solrj.




*SolrJ Response*: ignore the braces, as I have copied this from a bigger chunk

Response:
{responseHeader={status=0,QTime=0,params={lowercaseOperators=true,sort=score
desc,cache=false,qf=content,wt=javabin,rows=100,defType=edismax,version=2,fl=*,score,start=0,q="White+Paper",stopwords=true,fq=type:"White
Paper"}},response={numFound=9,start=0,maxScore=0.61586785,docs=[SolrDocument{id=007,
type=White Paper, source=Documents, title=White Paper 003, body=White Paper
004 Body, author=[Author 3], keywords=[Keyword 3], description="Vivamus
turpis eros", mime_type=pdf, _version_=1456609602022932480,
*publication_date=Wed
Jan 08 03:16:06 IST 2014*, score=0.61586785}]},

Please see the publication_date value. Whenever I enable "stored=true" for
this field I get the error

*org.json.JSONException: Expected a ',' or '}' at 853 [character 854 line
1]*

*Solr Query String*
q=%22White%2BPaper%22&qf=content&start=0&rows=100&sort=score+desc&defType=edismax&stopwords=true&lowercaseOperators=true&wt=json&cache=false&fl=*%2Cscore&fq=type%3A%22White+Paper%22

Hope this may help you to answer.




On Tue, Jan 7, 2014 at 10:29 PM, Chris Hostetter
wrote:

>
> : We have index where date field have default value as 'NOW'. We are using
> : solrj to query solr and when we try to convert query
> : response(response.getResponse) to JSON object in java. The JSON
>
> You're going to have to show us some real code, some real data, and a real
> error exception that you are getting -- because it's not at all clear what
> you are trying to do, or why you would get an error about invalid JSON.
>
> If you generate a JSON response from Solr, you'll get properly quoted
> strings for the dates...
>
> $ curl 'http://localhost:8983/solr/collection1/query?q=SOLR&fl=*_dt&;'
> {
>   "responseHeader":{
> "status":0,
> "QTime":8,
> "params":{
>   "fl":"*_dt",
>   "q":"SOLR"}},
>   "response":{"numFound":1,"start":0,"docs":[
>   {
> "incubationdate_dt":"2006-01-17T00:00:00Z"}]
>   }}
>
>
> ...but it appears you are trying to *generate* JSON yourself, using the
> Java objects you get back from a parsed SolrJ response -- so i'm not sure
> where you would be getting an error about invalid JSON, unless you were
> doing something invalid in the code you are writing to create that JSON.
>
>
> -Hoss
> http://www.lucidworks.com/
>

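If the goal is a raw JSON string on the client, one option is to let Solr
render the JSON and have SolrJ pass the body through untouched, instead of
hand-serializing the NamedList. A sketch, assuming a SolrJ version that
ships NoOpResponseParser:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.impl.NoOpResponseParser;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.util.NamedList;

    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("\"White Paper\"");
    QueryRequest req = new QueryRequest(q);
    NoOpResponseParser rawParser = new NoOpResponseParser();
    rawParser.setWriterType("json");        // ask Solr itself for wt=json
    req.setResponseParser(rawParser);
    NamedList<Object> result = server.request(req);
    String json = (String) result.get("response");  // raw JSON body; dates arrive quoted
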

Index size - to determine storage

2014-01-09 Thread Amit Jha
Hi,

I would like to know: if I index a file, i.e. a PDF of 100KB, then what would be
the size of the index? What factors should be considered to determine the disk
size?

Rgds
AJ

SolrCloud Cluster Setup - Shard & Replica

2014-01-18 Thread Amit Jha
Hi,

I tried to create a 2-shard cluster, with a replica per shard, for a
collection. For this setup I used two physical machines: 1 shard and 1
replica on Machine A, and another shard and 1 replica on Machine B.
Now when I stop both the shard and the replica on machine B, I am not able
to perform searches. I would like to know how I can set up a fail-safe
cluster using two machines.
I would like to achieve the use case where, if a machine goes down, I can
still serve search requests. I have a constraint that I cannot add more
machines. Is there any alternative to achieve this use case?

Regards
Amit


Re: Boosting documents by categorical preferences

2014-01-27 Thread Amit Nithian
Hi Chris (and others interested in this),

Sorry for dropping off.. I got sidetracked with other work and came back to
this and finally got a V1 of this implemented.

The final process is as follows:
1) Pre-compute the global categorical num_ratings/average/std-dev (so for
Action the average rating may be 3.49 with stdDev of .99)
2) For a given user, retrieve the last X (X for me is 10) ratings and
compute the user's categorical affinities by taking the average rating for
all movies in that particular category (Action) subtract the global cat
average and divide by cat std_dev. Furthermore, multiply this by the
fraction of total user ratings in that category.
   -> For example, if a user's last 10 ratings consisted of 9/10 Drama and
1/10 Thriller, the z-score of the Thriller should be discounted relative to
that of the Drama, so that the user's preference (either positive or
negative) for Drama is more prominent.
3) Sort by the absolute value of the z-score (Thanks Hossman.. great
thought).
4) Return the top 3 (arbitrary number)
5) Modify the query to look like the following:

qq=tom hanks&q={!boost b=$b defType=edismax
v=$qq}&cat1=category:Children&cat2=category:Fantasy&cat3=category:Animation&b=sum(1,sum(product(query($cat1),0.22267872),product(query($cat2),0.21630952),product(query($cat3),0.21120241)))

basically b = 1+(pref1*query(category:something1) +
pref2*query(category:something2) + pref3*query(category:something3))

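In code, the per-category weight from steps 1-2 is roughly (a sketch; the
method and its inputs are hypothetical):

    // z-score of the user's average rating in a category against the global
    // stats, discounted by that category's share of the user's recent ratings
    double categoryWeight(double userAvg, double globalAvg, double globalStdDev,
                          int userRatingsInCat, int totalUserRatings) {
        double z = (userAvg - globalAvg) / globalStdDev;
        return z * ((double) userRatingsInCat / totalUserRatings);
    }
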
The initial results seem to be kinda promising... of course there are many
more optimizations I could do like decay user ratings over time to indicate
that preferences decay over time so a 5 rating a year ago doesn't count as
much as a 5 rating today.

Hope this helps others. I'll open source what I have soon and post back. If
there is feedback or other thoughts let me know!

Cheers
Amit


On Fri, Nov 22, 2013 at 11:38 AM, Chris Hostetter
wrote:

>
> : I thought about that but my concern/question was how. If I used the pow
> : function then I'm still boosting the bad categories by a small
> : amount..alternatively I could multiply by a negative number but does that
> : work as expected?
>
> I'm not sure i understand your concern: negative powers would give you
> values less than 1, positive powers would give you values greater than 1,
> and then you'd use those values as multiplicitive boosts -- so the values
> less then 1 would penalize the scores of existing matching docs in the
> categories the user dislikes.
>
> Oh wait ... i see, in your original email (and in my subsequent suggested
> tweak to use pow()) you were talking about sum()ing up these 3 category
> boosts (and i cut/pasted sum() in my example as well) ... yeah,
> using multiplication there would make more sense if you wanted to do the
> "negative preferences" as well, because then the score of any matching doc
> will be reduced if it matches on an "undesired" category -- and the
> amount it will be reduced will be determined by how strongly it
> matches on that category (ie: the base score returned by the nested
> query() func) and "how negative" the undesired preference value (ie:
> the pow() exponent) is
>
>
> qq=...
> q={!boost b=$b v=$qq}
>
> b=prod(pow(query($cat1),$cat1z),pow(query($cat2),$cat2z),pow(query($cat3),$cat3z))
> cat1=...action...
> cat1z=1.48
> cat2=...comedy...
> cat2z=1.33
> cat3=...kids...
> cat3z=-1.7
>
>
> -Hoss
>


Re: Boosting documents by categorical preferences

2014-01-30 Thread Amit Nithian
Chris,

Sounds good! Thanks for the tips.. I'll be glad to submit my talk to this
as I have a writeup pretty much ready to go.

Cheers
Amit


On Tue, Jan 28, 2014 at 11:24 AM, Chris Hostetter
wrote:

>
> : The initial results seem to be kinda promising... of course there are
> many
> : more optimizations I could do like decay user ratings over time to
> indicate
> : that preferences decay over time so a 5 rating a year ago doesn't count
> as
> : much as a 5 rating today.
> :
> : Hope this helps others. I'll open source what I have soon and post back.
> If
> : there is feedback or other thoughts let me know!
>
> Hey Amit,
>
> Glad to hear your user based boosting experiments are paying off.  I would
> definitely love to see a more detailed writeup down the road showing off
> how it affects your final user metrics -- or perhaps even give a session
> on your technique at ApacheCon?
>
>
> http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp
>
>
> -Hoss
> http://www.lucidworks.com/
>


Solr Deduplication use of overWriteDupes flag

2014-02-04 Thread Amit Agrawal
Hello,

I had a configuration where I had "overwriteDupes"=false. I added few
duplicate documents. Result: I got duplicate documents in the index.

When I changed to "overwriteDupes"=true, the duplicate documents started
overwriting the older documents.

Question 1: How do I achieve, [add if not there, fail if duplicate is
found] i.e. mimic the behaviour of a DB which fails when trying to insert a
record which violates some unique constraint. I thought that
"overwriteDupes"=false would do that, but apparently not.

Question2: Is there some documentation around overwriteDupes? I have
checked the existing Wiki; there is very little explanation of the flag
there.

Thanks,

-Amit


Re: Fault Tolerant Technique of Solr Cloud

2014-02-18 Thread Amit Jha
Solr will complain only if you bring down both the replica and the leader of
the same shard. It is difficult to have a highly available environment if you
have a small number of physical servers.

Rgds
AJ

> On 18-Feb-2014, at 18:35, Vineet Mishra  wrote:
> 
> Hi All,
> 
> I want to have clear idea about the Fault Tolerant Capability of SolrCloud
> 
> Considering I have setup the SolrCloud with a external Zookeeper, 2 shards,
> each having a replica with single collection as given in the official Solr
> Documentation.
> 
> https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
> 
>   *Collection1*
> /\
>   /\
> /\
>   /\
> /\
>/   \
> *Shard 1 Shard 2*
> localhost:8983localhost:7574
> localhost:8900localhost:7500
> 
> 
> I Indexed some document and then if I shutdown any of the replica or Leader
> say for ex- *localhost:8900*, I can't query to the collection to that
> particular port
> 
> http:/*/localhost:8900*/solr/collection1/select?q=*:*
> 
> Then how is it Fault Tolerant or how the query has to be made.
> 
> Regards


Re: Boost Query Example

2014-02-18 Thread Amit Jha
I would say use the dismax query parser and set the boost factors in the qf
param; see the sketch after the links below.

The following links may help:

http://wiki.apache.org/solr/DisMaxQParserPlugin#qf_.28Query_Fields.29

https://wiki.apache.org/solr/SolrRelevancyFAQ#Solr_Relevancy_FAQ
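For example, something along these lines (a sketch using the field names
from the message below; boost values to taste):

    http://localhost:8983/solr/SRSFR_ProductCollection/select?defType=dismax&q=223-CL10V3&qf=SKU^10+ManufactureNumber^5&wt=json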

Rgds
AJ

> On 18-Feb-2014, at 20:49, "EXTERNAL Taminidi Ravi (ETI, 
> Automotive-Service-Solutions)"  wrote:
> 
> 
> I do not have much experience with this boosting; can you explain with an example?  
> Your help is really appreciated.
> 
> --Ravi
> 
> -Original Message-
> From: Jack Krupansky [mailto:j...@basetechnology.com] 
> Sent: Tuesday, February 18, 2014 9:58 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Boost Query Example
> 
> Add debugQuery=true to your queries and look at the scoring in the "explain" 
> section. From the intermediate scoring by field, you should be able to do the 
> math to figure out what boost would be required to rank your exact match high 
> enough.
> 
> -- Jack Krupansky
> 
> -Original Message-
> From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
> Sent: Tuesday, February 18, 2014 9:50 AM
> To: solr-user@lucene.apache.org ; michael.della.bi...@appinions.com
> Subject: RE: Boost Query Example
> 
> Hi Michael, Thanks for the information.
> 
> Now I am trying with the query , but I am not getting the sequence in order.
> 
> SKU with 223-CL10V3 should list first (exact match); ManufacturerNumber with 
> 223-CL10V3 should list second (exact match) if the first is available; if not, 
> the ManufacturerNumber doc will be first in the list.
> 
> SKU matching 223-CL10V3* should list third (starts-with match); if neither SKU 
> nor ManufacturerNumber is found, then this will be first in the query.
> 
> Can you check below query or rewrite the query or some help references? 
> Below query not returning the way it should be..
> 
> http://localhost:8983/solr/SRSFR_ProductCollection/select?q=SKU:223-CL10V3^10%20OR%20ManufactureNumber:223-CL10V3^5%20
> OR%20SKU:223-CL10V3*^1&wt=json&indent=true
> 
> 
> 
> -Original Message-
> From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
> Sent: Monday, February 17, 2014 4:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Boost Query Example
> 
> Hi,
> 
> Filter queries don't affect score, so boosting won't have an effect there.
> If you want those query terms to get boosted, move them into the q 
> parameter.
> 
> http://wiki.apache.org/solr/CommonQueryParameters#fq
> 
> Hope that helps!
> 
> Michael Della Bitta
> 
> Applications Developer
> 
> o: +1 646 532 3062
> 
> appinions inc.
> 
> "The Science of Influence Marketing"
> 
> 18 East 41st Street
> 
> New York, NY 10017
> 
> t: @appinions  | g+:
> plus.google.com/appinions
> w: appinions.com 
> 
> 
> On Mon, Feb 17, 2014 at 3:49 PM, EXTERNAL Taminidi Ravi (ETI,
> Automotive-Service-Solutions)  wrote:
> 
>> 
>> Hi can some one help me on the Boost & Sort query example.
>> 
>> http://localhost:8983/solr/ProductCollection/select?q=*%3A*&wt=json&in
>> dent=true&fq=SKU:223-CL10V3^100
>> OR SKU:223-CL1^90
>> 
>> There is not different in the query Order, Let me know if I am missing
>> something. Also I like to Order with the exact match for
>> SKU:223-CL10V3^100
>> 
>> Thanks
>> 
>> Ravi
> 


Re: different fields for user-supplied phrases in edismax

2014-12-12 Thread Amit Jha
Hi Mike,

What exactly is your use case?
What do you mean by "controlling the fields used for phrase queries"?


Rgds
AJ

> On 12-Dec-2014, at 20:11, Michael Sokolov  
> wrote:
> 
> Doug - I believe pf controls the fields that are used for the phrase queries 
> *generated by the parser*.
> 
> What I am after is controlling the fields used for the phrase queries 
> *supplied by the user* -- ie surrounded by double-quotes.
> 
> -Mike
> 
>> On 12/12/2014 08:53 AM, Doug Turnbull wrote:
>> Michael,
>> 
>> I typically solve this problem by using a copyField and running different
>> analysis on the destination field. Then you could use this field as pf
>> insteaf of qf. If I recall, fields in pf must also be mentioned in qf for
>> this to work.
>> 
>> -Doug
>> 
>> On Fri, Dec 12, 2014 at 8:13 AM, Michael Sokolov <
>> msoko...@safaribooksonline.com> wrote:
>>> Yes, I guess it's a common expectation that searches work this way.  It
>>> was actually almost trivial to add as an extension to the edismax parser,
>>> and I have what I need now; I opened SOLR-6842; if there's interest I'll
>>> try to find the time to contribute back to Solr
>>> 
>>> -Mike
>>> 
>>> 
 On 12/11/14 5:20 PM, Ahmet Arslan wrote:
 
 Hi Mike,
 
 If I am not wrong, you are trying to simulate google behaviour.
 If you use quotes, google return exact matches. I think that makes
 perfectly sense and will be a valuable addition. I remember some folks
 asked/requested this behaviour in the list.
 
 Ahmet
 
 
 
 On Thursday, December 11, 2014 10:50 PM, Michael Sokolov <
 msoko...@safaribooksonline.com> wrote:
 I'd like to supply a different set of fields for phrases than for bare
 terms.  Specifically, we'd like to treat phrases as more "exact" -
 probably turning off stemming and generally having a tighter analysis
 chain.  Note: this is *not* what's done by configuring "pf" which
 controls fields for the auto-generated phrases.  What we want to do is
 provide our users more precise control by explicit use of " "
 
 Is there a way to do this by configuring edismax?  I don't think there
 is, and then if you agree, a followup question - if I want to extend the
 EDismax parser, does anybody have advice as to the best way in?  I'm
 looking at:
 
 Query getFieldQuery(String field, String val, int slop)
 
 and altering getAliasedQuery() to accept an aliases parameter, which
 would be a different set of aliases for phrases ...
 
 does that make sense?
 
 -Mike
> 

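A sketch of the copyField approach Doug describes (field and type names
hypothetical): the destination field gets a tighter analysis chain, e.g. no
stemming, and edismax is pointed at it via pf.

    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="title_exact" type="text_tight" indexed="true" stored="false"/>
    <copyField source="title" dest="title_exact"/>
    <!-- then query with: defType=edismax&qf=title title_exact&pf=title_exact^10 -->
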

De Duplication using Solr

2015-01-02 Thread Amit Jha
I am trying to find out duplicate records based on distance and phonetic
algorithms. Can I utilize solr for that? I have following fields and
conditions to identify exact or possible duplicates.

1. Fields
prefix
suffix
firstname
lastname
email(primary_email1, email2, email3)
phone(primary_phone1, phone2, phone3)
2. Conditions:
Two records said to be exact duplicates if

1. IsExactMatchFunction(record1_prefix, record2_prefix) AND
IsExactMatchFunction(record1_suffix, record2_suffix) AND
IsExactMatchFunction(record1_firstname,record2_firstname) AND
IsExactMatchFunction(record1_lastname,record2_lastname) AND
IsExactMatchFunction(record1_primary_email,record2_primary_email) OR
IsExactMatchFunction(record1_primary_phone,record2_primary_primary)
Two records said to be possible duplicates if

1. IsExactMatchFunction(record1_prefix, record2_prefix) OR
IsExactMatchFunction(record1_suffix, record2_suffix) OR
IsExactMatchFunction(record1_firstname,record2_firstname) AND
IsExactMatchFunction(record1_lastname,record2_lastname) AND
IsExactMatchFunction(record1_primary_email,record2_primary_email) OR
IsExactMatchFunction(record1_primary_phone,record2_primary_primary)
 ELSE
 2. IsFuzzyMatchFunction(record1_firstname,record2_firstname) AND
IsExactMatchFunction(record1_lastname,record2_lastname) AND
IsExactMatchFunction(record1_primary_email,record2_primary_email) OR
IsExactMatchFunction(record1_primary_phone,record2_primary_primary)
 ELSE
 3. IsFuzzyMatchFunction(record1_firstname,record2_firstname) AND
IsExactMatchFunction(record1_lastname,record2_lastname) AND
IsExactMatchFunction(record1_any_email,record2_any_email) OR
IsExactMatchFunction(record1_any_phone,record2_any_primary)

IsFuzzyMatchFunction() will perform distance and phonetic algorithms
calculation and compare it with predefined threshold.

For example:

if the threshold defined for firstname is 85, then IsFuzzyMatchFunction()
returns "true" if and only if one of the algorithms (distance or phonetic)
returns a similarity score >= 85.

Can I use solr to perform this job? Or can you guys suggest how I can
approach this problem? I have seen duke (a deduplication API) but I
cannot use duke out of the box.

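A sketch of IsFuzzyMatchFunction as described above (threshold on a 0-100
scale; commons-lang and commons-codec are typically already on Solr's
classpath):

    import org.apache.commons.codec.language.DoubleMetaphone;
    import org.apache.commons.lang.StringUtils;

    boolean isFuzzyMatch(String a, String b, int threshold) {
        // distance part: Levenshtein similarity normalized to 0-100
        int dist = StringUtils.getLevenshteinDistance(a, b);
        int maxLen = Math.max(a.length(), b.length());
        double distScore = maxLen == 0 ? 100.0 : 100.0 * (maxLen - dist) / maxLen;
        // phonetic part: Double Metaphone code equality
        boolean phonetic = new DoubleMetaphone().isDoubleMetaphoneEqual(a, b);
        return distScore >= threshold || phonetic;
    }
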

Re: De Duplication using Solr

2015-01-03 Thread Amit Jha
Thanks for the reply... I have already seen the wiki. My case is more like
record matching.

On Sat, Jan 3, 2015 at 7:39 PM, Jack Krupansky 
wrote:

> First, see if you can get your requirements to align to the de-dupe feature
> that Solr already has:
> https://cwiki.apache.org/confluence/display/solr/De-Duplication
>
>
> -- Jack Krupansky
>
> On Sat, Jan 3, 2015 at 2:54 AM, Amit Jha  wrote:
>
> > I am trying to find out duplicate records based on distance and phonetic
> > algorithms. Can I utilize solr for that? I have following fields and
> > conditions to identify exact or possible duplicates.
> >
> > 1. Fields
> > prefix
> > suffix
> > firstname
> > lastname
> > email(primary_email1, email2, email3)
> > phone(primary_phone1, phone2, phone3)
> > 2. Conditions:
> > Two records said to be exact duplicates if
> >
> > 1. IsExactMatchFunction(record1_prefix, record2_prefix) AND
> > IsExactMatchFunction(record1_suffix, record2_suffix) AND
> > IsExactMatchFunction(record1_firstname,record2_firstname) AND
> > IsExactMatchFunction(record1_lastname,record2_lastname) AND
> > IsExactMatchFunction(record1_primary_email,record2_primary_email) OR
> > IsExactMatchFunction(record1_primary_phone,record2_primary_primary)
> > Two records said to be possible duplicates if
> >
> > 1. IsExactMatchFunction(record1_prefix, record2_prefix) OR
> > IsExactMatchFunction(record1_suffix, record2_suffix) OR
> > IsExactMatchFunction(record1_firstname,record2_firstname) AND
> > IsExactMatchFunction(record1_lastname,record2_lastname) AND
> > IsExactMatchFunction(record1_primary_email,record2_primary_email) OR
> > IsExactMatchFunction(record1_primary_phone,record2_primary_primary)
> >  ELSE
> >  2. IsFuzzyMatchFunction(record1_firstname,record2_firstname) AND
> > IsExactMatchFunction(record1_lastname,record2_lastname) AND
> > IsExactMatchFunction(record1_primary_email,record2_primary_email) OR
> > IsExactMatchFunction(record1_primary_phone,record2_primary_primary)
> >  ELSE
> >  3. IsFuzzyMatchFunction(record1_firstname,record2_firstname) AND
> > IsExactMatchFunction(record1_lastname,record2_lastname) AND
> > IsExactMatchFunction(record1_any_email,record2_any_email) OR
> > IsExactMatchFunction(record1_any_phone,record2_any_primary)
> >
> > IsFuzzyMatchFunction() will perform distance and phonetic algorithms
> > calculation and compare it with predefined threshold.
> >
> > For example:
> >
> > if threshold defined for firsname is 85 and IsFuzzyMatchFunction()
> function
> > only return "ture" only and only if one of the algorithms(distance or
> > phonetic) return the similarity socre >= 85.
> >
> > Can I use solr to perform this job. Or Can you guys suggest how can I
> > approach to this problem. I have seen the duke(De duplication API) but I
> > can not use duke out of the box.
> >
>


Retrieving Phonetic Code as result

2015-01-22 Thread Amit Jha
Hi,

I need to know how can I retrieve phonetic codes. Does solr provide it as
part of result? I need codes for record matching.

*following is schema fragment:*


  


  


 
  
  
  


 


Re: Retrieving Phonetic Code as result

2015-01-22 Thread Amit Jha
Hi,

I need to know how can I retrieve phonetic codes. Does solr provide it as
part of result? I need codes for record matching.

*following is schema fragment:*


  


  


 
  
  
  


 

Hi,

Thanks for the response. I can see the generated Metaphone codes using Luke. I am
using solr only because it creates the phonetic code at indexing time.
Otherwise, for each record I would need to call the Metaphone algorithm in real
time to get the codes and compare them. I think if Luke can read and display
them, why can't solr?


Re: Retrieving Phonetic Code as result

2015-01-22 Thread Amit Jha
Thanks for the response. I can see the generated Metaphone codes using Luke. I am
using solr only because it creates the phonetic code at indexing time.
Otherwise, for each record I would need to call the Metaphone algorithm in real
time to get the codes and compare them. I think if Luke can read and display
them, why can't solr?

On Thu, Jan 22, 2015 at 7:54 PM, Amit Jha  wrote:

> Hi,
>
> I need to know how can I retrieve phonetic codes. Does solr provide it as
> part of result? I need codes for record matching.
>
> *following is schema fragment:*
>
>  class="solr.TextField" >
>   
> 
>  maxCodeLength="4"/>
>   
> 
>
>  
>   
>   
>   
>
> 
>  
>
> Hi,
>
> Thanks for response, I can see generated MetaPhone codes using Luke. I am
> using solr only because it creates the phonetic code at time of indexing.
> Otherwise for each record I need to call Metaphone algorithm in realtime to
> get the codes and compare them. I think when luke can read and display it,
> why can't solr?
>
>


Re: Retrieving Phonetic Code as result

2015-01-23 Thread Amit Jha
Can I extend solr to add phonetic codes at indexing time, the way a uuid field 
gets added? I want to precompute the metaphone code, because calculating the 
code at runtime will give me a performance hit.

Rgds
AJ

> On Jan 23, 2015, at 5:37 PM, Jack Krupansky  wrote:
> 
> Your app can use the field analysis API (FieldAnalysisRequestHandler) to
> query Solr for what the resulting field values are for each filter in the
> analysis chain for a given input string. This is what the Solr Admin UI
> Analysis web page uses.
> 
> See:
> http://lucene.apache.org/solr/4_10_2/solr-core/org/apache/solr/handler/FieldAnalysisRequestHandler.html
> and in solrconfig.xml
> 
> 
> -- Jack Krupansky
> 
>> On Thu, Jan 22, 2015 at 8:42 AM, Amit Jha  wrote:
>> 
>> Hi,
>> 
>> I need to know how can I retrieve phonetic codes. Does solr provide it as
>> part of result? I need codes for record matching.
>> 
>> *following is schema fragment:*
>> 
>> > class="solr.TextField" >
>>  
>>
>>> maxCodeLength="4"/>
>>  
>>
>> 
>> 
>>  
>>  
>>  
>> 
>> 
>> 
>> 

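For reference, a sketch of such a call, assuming the handler is registered
at /analysis/field as above and the phonetic field type is named "phonetic":

    http://localhost:8983/solr/collection1/analysis/field?analysis.fieldtype=phonetic&analysis.fieldvalue=Amit&wt=json

The response lists the tokens produced after each stage of the analysis
chain, including the Double Metaphone codes.
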

Solr admin search with wildcard

2013-06-27 Thread Amit Sela
I'm looking to search (in the solr admin search screen) a certain field
for:

*youtube*

I know that leading wildcards take a lot of resources, but I'm not worried
about that.

My only question is about the syntax, would this work:

field:"*youtube*" ?

Thanks,

I'm using Solr 3.6.2


Re: Solr admin search with wildcard

2013-06-27 Thread Amit Sela
The stored and indexed string is actually a url like
"http://www.youtube.com/somethingsomething".
It looks like removing the quotes does the job: iframe:*youtube*, or am I
wrong? For now, performance is not an issue, but accuracy is, and I would
like to know, for example, how many URLs have an iframe source leading to
YouTube. So a query like iframe:*youtube* with max rows 10 or
something will return, in the response's numFound field, the total number of
pages that have an iframe tag with a source matching *youtube*, no?


On Thu, Jun 27, 2013 at 3:24 PM, Jack Krupansky wrote:

> No, you cannot use wildcards within a quoted term.
>
> Tell us a little more about what your strings look like. You might want to
> consider tokenizing or using ngrams to avoid the need for wildcards.
>
> -- Jack Krupansky
>
> -----Original Message- From: Amit Sela
> Sent: Thursday, June 27, 2013 3:33 AM
> To: solr-user@lucene.apache.org
> Subject: Solr admin search with wildcard
>
>
> I'm looking to search (in the solr admin search screen) a certain field
> for:
>
> *youtube*
>
> I know that leading wildcards takes a lot of resources but I'm not worried
> with that
>
> My only question is about the syntax, would this work:
>
> field:"*youtube*" ?
>
> Thanks,
>
> I'm using Solr 3.6.2
>


Re: Solr admin search with wildcard

2013-06-27 Thread Amit Sela
Forgive my ignorance but I want to be sure, do I add the <copyField> to solrindex-mapping.xml?
so that my solrindex-mapping.xml looks like this:











* *

url

And what do you mean by standard tokenization ?

Thanks!


On Thu, Jun 27, 2013 at 3:43 PM, Jack Krupansky wrote:

> Just <copyField> from the string field to a "text" field and use standard
> tokenization, then you can search the text field for "youtube" or even
> "something" that is a component of the URL path. No wildcard required.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Amit Sela
> Sent: Thursday, June 27, 2013 8:37 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr admin search with wildcard
>
>
> The stored and indexed string is actually a url like "
> http://www.youtube.com/**somethingsomething<http://www.youtube.com/somethingsomething>
> ".
> It looks like removing the quotes does the job: iframe:*youtube* or am I
> wrong ? For now, performance is not an issue, but accuracy is and I would
> like to know for example how many URLS have iframe source leading to
> YouTube for example. So query like: iframe:*youtube* with max rows 10 or
> something will return in the response numFound field the total number of
> pages that have a tag ifarme with a source matching *youtube, No ?
>
>
> On Thu, Jun 27, 2013 at 3:24 PM, Jack Krupansky *
> *wrote:
>
>  No, you cannot use wildcards within a quoted term.
>>
>> Tell us a little more about what your strings look like. You might want to
>> consider tokenizing or using ngrams to avoid the need for wildcards.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Amit Sela
>> Sent: Thursday, June 27, 2013 3:33 AM
>> To: solr-user@lucene.apache.org
>> Subject: Solr admin search with wildcard
>>
>>
>> I'm looking to search (in the solr admin search screen) a certain field
>> for:
>>
>> *youtube*
>>
>> I know that leading wildcards takes a lot of resources but I'm not worried
>> with that
>>
>> My only question is about the syntax, would this work:
>>
>> field:"*youtube*" ?
>>
>> Thanks,
>>
>> I'm using Solr 3.6.2
>>
>>
>

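For reference, a sketch of the copyField-into-a-tokenized-field idea in
schema.xml (type and field names hypothetical). Note that StandardTokenizer
may keep "www.youtube.com" together as a single token, so a pattern
tokenizer that splits on punctuation is a safer bet for URLs:

    <fieldType name="text_url" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.PatternTokenizerFactory" pattern="[^A-Za-z0-9]+"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <field name="iframe_text" type="text_url" indexed="true" stored="false"/>
    <copyField source="iframe" dest="iframe_text"/>
    <!-- then iframe_text:youtube matches without any wildcard -->
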

Re: More on topic of Meta-search/Federated Search with Solr

2013-08-26 Thread Amit Jha
Hi,

I would suggest for the following. 

1. Create a custom search connector for each individual source.
2. The connector is responsible for querying the source of any type (web, 
gateways, etc.), getting the results, and writing the top N results to solr.
3. Query the same keyword against solr and display the results. 

Would you like to create something like
http://knimbus.com


Rgds
AJ

On 27-Aug-2013, at 2:28, Dan Davis  wrote:

> One more question here - is this topic more appropriate to a different list?
> 
> 
> On Mon, Aug 26, 2013 at 4:38 PM, Dan Davis  wrote:
> 
>> I have now come to the task of estimating man-days to add "Blended Search
>> Results" to Apache Solr.   The argument has been made that this is not
>> desirable (see Jonathan Rochkind's blog entries on Bento search with
>> blacklight).   But the estimate remains.No estimate is worth much
>> without a design.   So, I am come to the difficult of estimating this
>> without having an in-depth knowledge of the Apache core.   Here is my
>> design, likely imperfect, as it stands.
>> 
>>   - Configure a core specific to each search source (local or remote)
>>   - On cores that index remote content, implement a periodic delete
>>   query that deletes documents whose timestamp is too old
>>   - Implement a custom requestHandler for the "remote" cores that goes
>>   out and queries the remote source.   For each result in the top N
>>   (configurable), it computes an id that is stable (e.g. it is based on the
>>   remote resource URL, doi, or hash of data returned).   It uses that id to
>>   look-up the document in the lucene database.   If the data is not there, it
>>   updates the lucene core and sets a flag that commit is required.   Once it
>>   is done, it commits if needed.
>>   - Configure a core that uses a custom SearchComponent to call the
>>   requestHandler that goes and gets new documents and commits them.   Since
>>   the cores for remote content are different cores, they can restart their
>>   searcher at this point if any commit is needed.   The custom
>>   SearchComponent will wait for commit and reload to be completed.   Then,
>>   search continues uses the other cores as "shards".
>>   - Auto-warming on this will assure that the most recently requested
>>   data is present.
>> 
>> It will, of course, be very slow a good part of the time.
>> 
>> Erik and others, I need to know whether this design has legs and what
>> other alternatives I might consider.
>> 
>> 
>> 
>> On Sun, Aug 18, 2013 at 3:14 PM, Erick Erickson 
>> wrote:
>> 
>>> The lack of global TF/IDF has been answered in the past,
>>> in the sharded case, by "usually you have similar enough
>>> stats that it doesn't matter". This pre-supposes a fairly
>>> evenly distributed set of documents.
>>> 
>>> But if you're talking about federated search across different
>>> types of documents, then what would you "rescore" with?
>>> How would you even consider scoring docs that are somewhat/
>>> totally different? Think magazine articles an meta-data associated
>>> with pictures.
>>> 
>>> What I've usually found is that one can use grouping to show
>>> the top N of a variety of results. Or show tabs with different
>>> types. Or have the app intelligently combine the different types
>>> of documents in a way that "makes sense". But I don't know
>>> how you'd just get "the right thing" to happen with some kind
>>> of scoring magic.
>>> 
>>> Best
>>> Erick
>>> 
>>> 
>>> On Fri, Aug 16, 2013 at 4:07 PM, Dan Davis  wrote:
>>> 
 I've thought about it, and I have no time to really do a meta-search
 during
 evaluation.  What I need to do is to create a single core that contains
 both of my data sets, and then describe the architecture that would be
 required to do blended results, with liberal estimates.
 
 From the perspective of evaluation, I need to understand whether any of
 the
 solutions to better ranking in the absence of global IDF have been
 explored?I suspect that one could retrieve a much larger than N set
 of
 results from a set of shards, re-score in some way that doesn't require
 IDF, e.g. storing both results in the same priority queue and
 *re-scoring*
 before *re-ranking*.
 
 The other way to do this would be to have a custom SearchHandler that
 works
 differently - it performs the query, retries all results deemed relevant
 by
 another engine, adds them to the Lucene index, and then performs the
 query
 again in the standard way.   This would be quite slow, but perhaps useful
 as a way to evaluate my method.
 
 I still welcome any suggestions on how such a SearchHandler could be
 implemented.
>> 

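As one small concrete piece of the design above, the periodic delete from
the second bullet is a one-liner in SolrJ (field name hypothetical):

    // drop cached remote documents whose timestamp is older than 7 days
    server.deleteByQuery("timestamp:[* TO NOW-7DAYS]");
    server.commit();
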

Re: Combining Solr score with customized user ratings for a document

2013-09-10 Thread Amit Jha
You can use a DB for storing user preferences, and later, if you want, you can 
flush them to solr as an update along with the userid.

Or you may add a result pipeline filter, e.g. as sketched below.

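A client-side sketch of such a filter (the recommender API is hypothetical;
fl=*,score is assumed so each document carries its score):

    import java.util.Collections;
    import java.util.Comparator;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    // combine the Solr score with a per-user predicted rating by a simple
    // product, then re-sort client-side before rendering
    SolrDocumentList results = queryResponse.getResults();
    for (SolrDocument d : results) {
        float solrScore = (Float) d.getFieldValue("score");
        double userScore = recommender.predict(userId, (String) d.getFieldValue("id"));
        d.setField("combined", solrScore * userScore);
    }
    Collections.sort(results, new Comparator<SolrDocument>() {
        public int compare(SolrDocument a, SolrDocument b) {
            return Double.compare((Double) b.getFieldValue("combined"),
                                  (Double) a.getFieldValue("combined"));
        }
    });
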


Rgds
AJ

On 13-Feb-2013, at 17:50, Á_o  wrote:

> Hi:
> 
> I am working on a proyect where we want to recommend our users products
> based on their previous 'likes', purchases and so on (typical stuff of a
> recommender system), while we want to let them browse freely the catalogue
> by search queries, making use of facets, more-like-this and so on (typical
> stuff of a Solr index).
> 
> After reading here and there, I have reached the conclusion that's it's
> better to keep Solr Index apart from the database. Solr is for products
> (which can be reindexed from the DB as a nightly batch) while the DB is for
> everything else, including -the products and- user profiles. 
> 
> So, given an user and a particular search (which can be as simple as "q=*"),
> on one hand we have Solr results (i.e. docs + scores) for the query, while
> on the other we have user predicted ratings (i.e. recommender scores) coming
> from the DB (though they could be cached elsewhere) for each of the products
> returned by Solr.
> 
> And what I want is clear -to state-: combine both scores (e.g. by a simple
> product) so the user receives a sorted list of relevant products biased by
> his/her preferences.
> 
> I have been googleing for the last days without finding which is the best
> way to achieve this.
> 
> I think it's not a matter of boosting, or at least I can't see which
> boosting method could be useful as the boost should be user-based. I think
> that I need to extend -somewhere- Solr so I can alter the result scores by
> providing the user ID and connecting to the DB at query time, doing the
> necessary maths and returning the final score in a -quite- transparent way
> for the Web app.
> 
> A less elegant solution could be letting Solr do its work as usual, and then
> navigate through the XML modifying the scores and reordering the whole list
> of products (or maybe just the first N results) by the new combined score.
> 
> What do you think?
> A big THANKS in advance
> 
> Álvaro
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Combining-Solr-score-with-customized-user-ratings-for-a-document-tp4040200.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Committing when indexing in parallel

2013-09-14 Thread Amit Jha
Hi,

As per my knowledge, any number of requests can be issued in parallel to index 
documents. Any commit request will write them to the index. 

So if P1 issues a commit, then all documents of P2 that are eligible get 
committed, and the remaining documents will get committed on another commit 
request. 


Rgds
AJ

On 14-Sep-2013, at 2:51, Phani Chaitanya  wrote:

> 
> I'm wondering what happens to commit while we are indexing in parallel in
> Solr. Are the indexing update requests blocked until the commit finishes ?
> 
> Lets say I've a process P1 which issued a commit request and there is
> another process P2 which is still indexing to the same index. What happens
> to the index in that scenario. Are the P2 indexing requests blocked until P1
> commit request finishes ?
> 
> I'm just wondering about what is the behavior of Solr in the above case.
> 
> 
> 
> -
> Phani Chaitanya
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Committing-when-indexing-in-parallel-tp4089953.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: MySQL Data import handler

2013-09-14 Thread Amit Jha
Hi Baskar,

Just create a single schema.xml which contains the required fields from the 3 
tables.

Add a status column to the child table, i.e. 
1 = add
2 = update
3 = delete
4 = indexed
Etc

Write a program using solrj which will read the status and act 
accordingly, as sketched below. 
 
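A sketch of that program (the JDBC helper, table and column names are
hypothetical):

    // read rows flagged 1 (add), 2 (update) or 3 (delete), push them to
    // Solr, then mark them 4 (indexed); db/Row are a hypothetical DB helper
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
    for (Row r : db.query("SELECT * FROM child WHERE status IN (1,2,3)")) {
        if (r.getInt("status") == 3) {
            solr.deleteById(r.getString("id"));
        } else {  // in Solr, add and update are the same add/overwrite call
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", r.getString("id"));
            doc.addField("source", "child_table");
            solr.add(doc);
        }
        db.execute("UPDATE child SET status = 4 WHERE id = ?", r.getString("id"));
    }
    solr.commit();
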

Rgds
AJ

On 15-Sep-2013, at 5:46, Baskar Sikkayan  wrote:

> Hi,
>  If I am supposed to go with the Java client, should I still do any
> configuration in solrconfig.xml or schema.xml?
> 
> Thanks,
> Baskar.S
> 
> 
> On Sat, Sep 14, 2013 at 8:46 PM, Gora Mohanty  wrote:
> 
>> On 14 September 2013 20:07, Baskar Sikkayan  wrote:
>>> Hi Gora,
>>>Thanks a lot for your reply.
>>> My requirement is to combine 3 tables in MySQL for search operations, and I am
>>> planning to sync these 3 tables (not all the columns) into Apache Solr.
>>> Whenever there is any change (adding a new row, deleting a row, modifying
>>> the column data in any of the 3 tables), the same has to be updated in
>>> Solr. I guess, for this requirement, instead of going with delta-import,
>>> the Apache Solr Java client will be useful.
>> [...]
>> 
>> Yes, if you are comfortable with programming in Java,
>> the Solr client would be a good alternative, though the
>> DataImportHandler can also do what you want.
>> 
>> Regards,
>> Gora
>> 


Re: Solr Java Client

2013-09-14 Thread Amit Jha
Add a field called "source" in schema.xml; its value would be the name of the table each document came from.
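
For example (a sketch; the id prefix is just one convention for keeping ids
unique across tables, and the URL is an assumption):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class MultiTableIndexer {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "books-42");   // prefix ids per table so they stay unique
    doc.addField("source", "books");  // which table this document came from
    doc.addField("name", "The Legend of the Hobbit");
    server.add(doc);
    server.commit();
    // at query time, restrict a search to one table's documents:
    //   q=name:hobbit&fq=source:books
  }
}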



Rgds
AJ

On 15-Sep-2013, at 5:38, Baskar Sikkayan  wrote:

> Hi,
>  I am new to Solr and trying to use Solr java client instead of using the
> Data handler.
>  Is there any configuration i need to do for this?
> 
> I got the following sample code.
> 
> SolrInputDocument doc = new SolrInputDocument();
> 
>  doc.addField("cat", "book");
>  doc.addField("id", "book-" + i);
>  doc.addField("name", "The Legend of the Hobbit part " + i);
>  server.add(doc);
>  server.commit();  // periodically flush
> 
> I am confused here. I am going to index 3 different tables for 3 different
> kinds of searches. Here I don't have any option to differentiate the 3 kinds of
> indexes.
> Am I missing anything here? Could anyone please shed some light here?
> 
> Thanks,
> Baskar.S


Re: Solr Java Client

2013-09-14 Thread Amit Jha
The question is not clear to me. Please be more elaborate in your query. Why do
you want to store the index in DB tables?

Rgds
AJ

On 15-Sep-2013, at 7:20, Baskar Sikkayan  wrote:

> How to add index to 3 diff tables from java ...
> 
> 
> On Sun, Sep 15, 2013 at 6:49 AM, Amit Jha  wrote:
> 
>> Add a field called "source" in schema.xml and value would be your table
>> names.
>> 
>> 
>> 
>> Rgds
>> AJ
>> 
>> On 15-Sep-2013, at 5:38, Baskar Sikkayan  wrote:
>> 
>>> Hi,
>>> I am new to Solr and trying to use Solr java client instead of using the
>>> Data handler.
>>> Is there any configuration i need to do for this?
>>> 
>>> I got the following sample code.
>>> 
>>> SolrInputDocument doc = new SolrInputDocument();
>>> 
>>> doc.addField("cat", "book");
>>> doc.addField("id", "book-" + i);
>>> doc.addField("name", "The Legend of the Hobbit part " + i);
>>> server.add(doc);
>>> server.commit();  // periodically flush
>>> 
>>> I am confused here. I am going to index 3 different tables for 3
>> different
>>> kind of searches. Here i dont have any option to differentiate 3 kind of
>>> indexes.
>>> Am i missing anything here. Could anyone please shed some light here?
>>> 
>>> Thanks,
>>> Baskar.S
>> 


Solr ZooKeeper ensemble with HBase

2013-04-03 Thread Amit Sela
Hi all,

I have a running Hadoop + HBase cluster, and the HBase cluster is running
its own ZooKeeper (HBase manages ZooKeeper).
I would like to deploy my SolrCloud cluster on a portion of the machines on
that cluster.

My question is: Should I have any trouble / issues deploying an additional
ZooKeeper ensemble? I don't want to use the HBase ZooKeeper because, well,
first of all HBase manages it so I'm not sure it's possible, and second, I
have HBase working pretty hard at times and I don't want to create any
connection issues by overloading ZooKeeper.

Thanks,

Amit.


Re: Solr ZooKeeper ensemble with HBase

2013-04-03 Thread Amit Sela
Trouble in what way? If I have enough memory - HBase RegionServer 10GB and
maybe 2GB for Solr - or do you mean CPU / disk?


On Wed, Apr 3, 2013 at 5:54 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Hello, Amit:
>
> My guess is that, if HBase is working hard, you're going to have more
> trouble with HBase and Solr on the same nodes than HBase and Solr
> sharing a Zookeeper. Solr's usage of Zookeeper is very minimal.
>
> Michael Della Bitta
>
> 
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> On Wed, Apr 3, 2013 at 8:06 AM, Amit Sela  wrote:
> > Hi all,
> >
> > I have a running Hadoop + HBase cluster and the HBase cluster is running
> > it's own zookeeper (HBase manages zookeeper).
> > I would like to deploy my SolrCloud cluster on a portion of the machines
> on
> > that cluster.
> >
> > My question is: Should I have any trouble / issues deploying an
> additional
> > ZooKeeper ensemble ? I don't want to use the HBase ZooKeeper because,
> well
> > first of all HBase manages it so I'm not sure it's possible and second I
> > have HBase working pretty hard at times and I don't want to create any
> > connection issues by overloading ZooKeeper.
> >
> > Thanks,
> >
> > Amit.
>


Re: do SearchComponents have access to response contents

2013-04-04 Thread Amit Nithian
"We need to also track the size of the response (as the size in bytes of the
whole xml response that is streamed, with stored fields and all). I was a
bit worried cause I am wondering if a searchcomponent will actually have
access to the response bytes..."

==> Can't you get this from your container access logs after the fact? I
may be misunderstanding something but why wouldn't mining the Jetty/Tomcat
logs for the response size here suffice?

Thanks!
Amit


On Thu, Apr 4, 2013 at 1:34 AM, xavier jmlucjav  wrote:

> A custom QueryResponseWriter...this makes sense, thanks Jack
>
>
On Wed, Apr 3, 2013 at 11:21 PM, Jack Krupansky wrote:
>
> > The search components can see the "response" as a namedlist, but it is
> > only when SolrDispatchFIlter calls the QueryResponseWriter that XML or
> JSON
> > or whatever other format (Javabin as well) is generated from the named
> list
> > for final output in an HTTP response.
> >
> > You probably want a custom query response writer that wraps the XML
> > response writer. Then you can generate the XML and then do whatever you
> > want with it.
> >
> > The QueryResponseWriter class and <queryResponseWriter> in
> > solrconfig.xml.
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: xavier jmlucjav
> > Sent: Wednesday, April 03, 2013 4:22 PM
> > To: solr-user@lucene.apache.org
> > Subject: do SearchComponents have access to response contents
> >
> >
> > I need to implement some SearchComponent that will deal with metrics on
> the
> > response. Some things I see will be easy to get, like number of hits for
> > instance, but I am more worried with this:
> >
> > We need to also track the size of the response (as the size in bytes of
> the
> > whole xml response that is streamed, with stored fields and all). I was a
> > bit worried cause I am wondering if a searchcomponent will actually have
> > access to the response bytes...
> >
> > Can someone confirm one way or the other? We are targeting Solr 4.0
> >
> > thanks
> > xavier
> >
>
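
For what it's worth, a sketch of the wrapping response writer Jack describes
above (Solr 4.x interfaces; treat the class as an assumption to adapt, and note
that at this layer you count characters, not bytes):

import java.io.IOException;
import java.io.Writer;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.QueryResponseWriter;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.response.XMLResponseWriter;

public class CountingXmlResponseWriter implements QueryResponseWriter {
  private final XMLResponseWriter delegate = new XMLResponseWriter();

  public void init(NamedList args) {}

  public String getContentType(SolrQueryRequest req, SolrQueryResponse rsp) {
    return delegate.getContentType(req, rsp);
  }

  public void write(final Writer out, SolrQueryRequest req, SolrQueryResponse rsp)
      throws IOException {
    final long[] chars = {0};
    Writer counting = new Writer() {        // counts everything the XML writer streams out
      public void write(char[] cbuf, int off, int len) throws IOException {
        chars[0] += len;
        out.write(cbuf, off, len);
      }
      public void flush() throws IOException { out.flush(); }
      public void close() throws IOException { out.close(); }
    };
    delegate.write(counting, req, rsp);
    counting.flush();
    System.out.println("response size (chars): " + chars[0]); // or record it in your metrics
  }
}

Register it with a <queryResponseWriter> entry in solrconfig.xml and select it
with the wt parameter.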


Re: Solr 4.2 single server limitations

2013-04-04 Thread Amit Nithian
There's a whole heap of information missing, like what you plan on
storing vs. indexing, and yes, QPS too. My short answer is to try with one server
until it falls over, then start adding more.

When you say multiple-server setup do you mean multiple servers where each
server acts as a slave storing the entire index so you have load balancing
across multiple servers OR do you mean multiple servers where each server
stores a portion of the data? If it's the former, sometimes a simple
master/slave setup in Solr 4.x works but the latter may mean SolrCloud.
Master/Slave is easy but I don't know much about SolrCloud.

Questions to think about (this is not exhaustive by any means):
1) When you say 5-10 pages per website (300+ websites) that you are
crawling 2x per hour, are you *replacing* the old copy of the web page in
your index, or storing some form of history for some reason?
2) What are you planning on storing vs. indexing? This would dictate your
memory requirements.
3) You mentioned you don't know QPS, but having some guess would help. Is
it mostly for storage and occasional lookup (where slow responses are
probably tolerable), or is this powering a real user-facing website (where
low latency is probably desired)?

Again, I like to start simple and use one server until it dies then expand
from there.

Cheers
Amit


On Thu, Apr 4, 2013 at 7:58 AM, imehesz  wrote:

> hello,
>
> I'm using a single server setup with Nutch (1.6) and Solr (4.2)
>
> I plan to trigger the Nutch crawling process every 30 minutes or so and add
> about 300+ websites a month (~5-10 pages each). At this point I'm not
> sure about the query requests/sec.
>
> Can I run this on a single server (how long)?
> If not, what would be the best and most efficient way to have multiple
> server setup?
>
> thanks,
> --iM
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-4-2-single-server-limitations-tp4053829.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


unknown field error when indexing with nutch

2013-04-05 Thread Amit Sela
Hi all,

I'm trying to run a nutch crawler and index to Solr.
I'm running Nutch 1.6 and Solr 4.2.

I managed to crawl and index with that Nutch version into Solr 3.6.2 but I
can't seem to manage to run it with Solr 4.2

I re-built Nutch with the schema-solr4.xml and copied that file to
SOLR_HOME/example/solr/collection1/conf/schema.xml but the job fails when
trying to index:

SolrException: ERROR: [doc=
http://0movies.com/watchversion.php?id=3818&link=1364879137] unknown field
'host'

It looks like Solr is not aware of the schema... Did I miss something ?

Thanks.


Re: unknown field error when indexing with nutch

2013-04-05 Thread Amit Sela
I'm using the solrconfig supplied with Solr 4.2 and I added the Nutch
request handler. But I keep getting the same errors.
 On Apr 5, 2013 8:11 PM, "Jack Krupansky"  wrote:

> Check your solrconfig.xml file for references to a "host" field.
>
> But maybe more importantly, make sure you use a Solr 4.1 solrconfig and
> merge in any of your application-specific changes.
>
> -- Jack Krupansky
>
> -Original Message- From: Amit Sela
> Sent: Friday, April 05, 2013 12:57 PM
> To: solr-user@lucene.apache.org
> Subject: unknown field error when indexing with nutch
>
> Hi all,
>
> I'm trying to run a nutch crawler and index to Solr.
> I'm running Nutch 1.6 and Solr 4.2.
>
> I managed to crawl and index with that Nutch version into Solr 3.6.2 but I
> can't seem to manage to run it with Solr 4.2
>
> I re-built Nutch with the schema-solr4.xml and copied that file to
> SOLR_HOME/example/solr/collection1/conf/schema.xml but the job fails
> when
> trying to index:
>
> SolrException: ERROR: [doc=
> http://0movies.com/watchversion.php?id=3818&link=1364879137]
> unknown field
> 'host'
>
> It looks like Solr is not aware of the schema... Did I miss something ?
>
> Thanks.
>


Re: Sharing index amongst multiple nodes

2013-04-06 Thread Amit Nithian
I don't understand why this would be more performant... it seems like it'd be
more memory- and resource-intensive, as you'd have multiple class-loaders and
multiple cache spaces for no good reason. Just have a single core with
sufficiently large caches to handle your response needs.

If you want to load balance reads consider having multiple physical nodes
with a master/slaves or SolrCloud.


On Sat, Apr 6, 2013 at 9:21 AM, Daire Mac Mathúna wrote:

> Hi. What are the thoughts on having multiple SOLR instances i.e. multiple
> SOLR war files, sharing the same index (i.e. sharing the same solr_home)
> where only one SOLR instance is used for writing and the others for
> reading?
>
> Is this possible?
>
> Is it beneficial - is it more performant than having just one solr
> instance?
>
> How does it affect auto-commits i.e. how would the read nodes know the
> index has been changed and re-populate cache etc.?
>
> Solr 3.6.1
>
> Thanks.
>


Re: how to skip test while building

2013-04-06 Thread Amit Nithian
If you generate the Maven POM files you can do this, I think, by doing
mvn <goal> -DskipTests=true.


On Sat, Apr 6, 2013 at 7:25 AM, Erick Erickson wrote:

> Don't know a good way to skip compiling the tests, but there isn't
> any harm in compiling them...
>
> changing to the solr directory and just issuing
> "ant example dist" builds pretty much everything. You don't execute
> tests unless you specify "ant test".
>
> "ant -p" shows you all the targets. Note that you have different
> targets depending on whether you're executing it in  or
> /solr or /lucene.
>
> Since you mention Solr, you probably want to work in /solr to
> start.
>
> Best
> Erick
>
> On Sat, Apr 6, 2013 at 5:36 AM, parnab kumar 
> wrote:
> > Hi All,
> >
> >   I am new to Solr. I am using Solr 3.4. I want to build without
> > compiling the Lucene test files, and to skip running the tests. Can
> > anyone please point out where to make the necessary changes?
> >
> > Thanks,
> > Pom
>


Re: writing a custom Filter plugin?

2013-05-14 Thread Amit Nithian
At first I thought you were referring to Filters in Lucene at query time
(i.e. bitset filters) but I think you are referring to token filters at
indexing/text analysis time?

I have had success writing my own Filter as the link presents. The key is
that you should write a custom class that extends TokenFilter (
http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/analysis/TokenFilter.html)
and write the implementation in your incrementToken() method.

My recollection is that instead of returning a Token object
like you would have in earlier versions of Lucene, you set attribute values
on a notional "current" token. One obvious attribute is the term text
itself, and perhaps any positional information. The best place to start is
to pick a fairly simple example from the Solr source (maybe
LowerCaseFilter) and try to mimic that.
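
Something like this minimal sketch (Lucene/Solr 4.x; it just lowercases each
term in place, and you'd still need a small TokenFilterFactory subclass to
reference it from schema.xml):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class MyLowerCaseFilter extends TokenFilter {
  // the "current token" is read and mutated through attributes like this one
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  public MyLowerCaseFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;               // no more tokens from the upstream tokenizer/filters
    }
    final char[] buffer = termAtt.buffer();
    final int length = termAtt.length();
    for (int i = 0; i < length; i++) {
      buffer[i] = Character.toLowerCase(buffer[i]); // mutate the term text in place
    }
    return true;                  // emit one (possibly modified) token
  }
}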

Cheers!
Amit


On Mon, May 13, 2013 at 1:33 PM, Jonathan Rochkind  wrote:

> Does anyone know of any tutorials, basic examples, and/or documentation on
> writing your own Filter plugin for Solr? For Solr 4.x/4.3?
>
> I would like a Solr 4.3 version of the normalization filters found here
> for Solr 1.4: 
> https://github.com/billdueber/lib.umich.edu-solr-stuff
>
> But those are old, for Solr 1.4.
>
> Does anyone have any hints for writing a simple substitution Filter for
> Solr 4.x?  Or, does a simple sourcecode example exist anywhere?
>


Re: Need solr query help

2013-05-14 Thread Amit Nithian
Is it possible instead to store in your Solr index a bounding box of store
location + delivery radius, and do a bounding-box intersection between your
user's point + radius (as a bounding box) and the shop's delivery bounding
box? If you want further precision, frange may work, assuming it's a
post-filter implementation, so that you do the heavy computation on a
presumably small set of documents, only to filter out the corner cases around
the edge of the radius circle.
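
For concreteness, the frange filter smsolr describes below might look like
this (the field names store and shopMaxDeliveryDistance are assumptions):

fq={!frange u=0}sub(geodist(store,45.15,-93.85),shopMaxDeliveryDistance)

geodist(sfield,lat,lon) is the distance (in km) from the document's location
to the user's point, so a shop passes the filter only when that distance is
below its own delivery radius.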

I haven't looked at Solr's spatial querying in a while to know if this is
possible or not.

Cheers
Amit


On Sat, May 11, 2013 at 10:42 AM, smsolr  wrote:

> Hi Abhishek,
>
> I forgot to explain why it works.  It uses the frange filter which is
> mentioned here:-
>
> http://wiki.apache.org/solr/CommonQueryParameters
>
> and it works because it filters in results where the geodist minus the
> shopMaxDeliveryDistance is less than zero (that's what the u=0 means, upper
> limit=0), i.e.:-
>
> geodist - shopMaxDeliveryDistance < 0
> ->
> geodist < shopMaxDeliveryDistance
>
> i.e. the geodist is less than the shopMaxDeliveryDistance and so the shop
> is
> within delivery range of the location specified.
>
> smsolr
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Need-solr-query-help-tp4061800p4062603.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Restaurant availability from database

2013-05-23 Thread Amit Nithian
Hossman did a presentation on something similar to this using spatial data
at a Solr meetup some months ago.

http://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/

May be helpful to you.


On Thu, May 23, 2013 at 9:40 AM, rajh  wrote:

> Thank you for your answer.
>
> Do you mean I should index the availability data as a document in Solr?
> Because the availability data in our databases is around 6,509,972 records
> and contains the availability per number of seats and per 15 minutes. I
> also
> tried this method, and as far as I know it's only possible to join the
> availability documents and not to include that information per result
> document.
>
> An example API response (created from the Solr response):
> {
> "restaurants": [
> {
> "id": "13906",
> "name": "Allerlei",
> "zipcode": "6511DP",
> "house_number": "59",
> "available": true
> },
> {
> "id": "13907",
> "name": "Voorbeeld",
> "zipcode": "6512DP",
> "house_number": "39",
> "available": false
> }
> ],
> "resultCount": 12156,
> "resultCountAvailable": 55,
> }
>
> I'm currently hacking around the problem by executing the search again with
> a very high value for the rows parameter and counting the number of
> available restaurants on the backend, but this causes a big performance
> impact (as expected).
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Restaurant-availability-from-database-tp4065609p4065710.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


DIH, UTF8 and default DIH encoding value

2010-07-31 Thread Amit Nithian
All,

I am not sure if this is overly obvious or not (it wasn't to me) but in
trying to index some international characters from XML files using the DIH,
I found that setting the encoding attribute on the dataSource element to
"UTF-8" fixed my problem.

<dataSource ... encoding="UTF-8"/>
My question is why the default isn't UTF-8, or, if there is a good reason, can
the DIH wiki be made clearer that this encoding attribute can affect the
indexing of international characters? If I can get access to edit this wiki
page, I can add a section to that effect, perhaps under a troubleshooting
section.

Thanks!
Amit


Re: DIH and multivariable fields problems

2010-08-06 Thread Amit Nithian
That's probably the most efficient way to do it. I believe the line you
are referring to allows you to have sub-entities which, in the RDBMS case, would
execute a separate query for each parent given a primary key. The downside
to this, though, is that for each parent you will be executing N separate
queries.

I tend to like forcing all my logic into SQL and using transformers to process
each row, since your DB is more efficient at the joins than the application
layer.
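
For illustration, the two shapes might look like this in data-config.xml
(table, column, and field names are assumptions):

<!-- sub-entity: one extra query per parent row; multiple rows become a multivalued field -->
<entity name="item" query="SELECT id, name FROM item">
  <entity name="tag" query="SELECT tag AS tags FROM item_tag WHERE item_id = '${item.id}'"/>
</entity>

<!-- single joined query: the DB does the join, RegexTransformer splits the string -->
<entity name="item" transformer="RegexTransformer"
        query="SELECT i.id, i.name, GROUP_CONCAT(t.tag) AS tags
               FROM item i LEFT JOIN item_tag t ON t.item_id = i.id GROUP BY i.id">
  <field column="tags" splitBy=","/>
</entity>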

On Fri, Aug 6, 2010 at 5:46 PM, harrysmith  wrote:

>
> Thanks, this helps a great deal, and I may be able to use this method.
>
> Is this how DIH is intended to be used? The multi values should be returned
> in 1 row then manipulated by a transformer? This is fine, but is just
> unclear from the documentation. I was under the assumption that multiple
> rows returned for a child entity with the same parent would be able to
> create a multivalued entry.
>
> From the DataImportHandler wiki:
>
> "...it is possible to create a multivalued field by joining an entity with
> another.i.e if the sub-entity returns multiple rows for one row from parent
> entity it can go into a multivalued field"
>
>
>
>
>
>   For multiple-value fields using the DIH, I use group_concat with the
> RegexTransformer's splitBy:
> ex:
> <entity name="..." query="select group_concat(...) as ..." transformer="RegexTransformer">
>   <field column="..." splitBy=","/>
> </entity>
>
> hope that's helpful.
>
> @tommychheng
> Programmer and UC Irvine Graduate Student
> Find a great grad school based on research interests:
> http://gradschoolnow.com
>
>
> On 8/6/10 4:39 PM, harrysmith wrote:
> > I'm having a difficult time understanding how multivariable fields work
> > with
> > the DataImportHandler when the source is a RDBMS. I've read the following
> > from the wiki:
> >
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/DIH-and-multivariable-fields-problems-tp1032893p1033045.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: DIH, UTF8 and default DIH encoding value

2010-08-08 Thread Amit Nithian
Thanks Otis. I went ahead and added this section. I hope that others can add
to this too but of course the list should be short :-)

- Amit

On Sun, Aug 1, 2010 at 12:00 AM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> Hi Amit,
>
> Anyone can edit any Solr Wiki page - just create an account (I think the
> link to
> that is in the page footer) and edit.
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
> > From: Amit Nithian 
> > To: solr-user@lucene.apache.org
> > Sent: Sat, July 31, 2010 4:41:44 PM
> > Subject: DIH, UTF8 and default DIH encoding value
> >
> > All,
> >
> > I am not sure if this is overly obvious or not (it wasn't to me) but  in
> > trying to index some international characters from XML files using the
>  DIH,
> > I found that setting the encoding attribute on the dataSource element  to
> > "UTF-8" fixed my problem.
> >
> > <dataSource ... encoding="UTF-8"/>
> >
> > My question is why the default isn't UTF-8 or if  there is a good reason,
> can
> > the DIH wiki be made more clear that this  encoding attribute can affect
> the
> > indexing of international characters? If I  can get access to edit this
> wiki
> > page, I can add a section to that effect..  perhaps under a
> troubleshooting
> > section?
> >
> > Thanks!
> > Amit
> >
>


Can a Solr Server be both master and slave?

2010-08-16 Thread Amit Nithian
I am not sure if this is the best approach to this problem, but I was curious
whether a single Solr server could be both a master and a slave without causing
index corruption. It seems that you could set up multiple replication
handlers in the Solr config, /replication and /replication2, and have one be
master and another be a slave syncing from another server. Here's why:
1) I want to build an index using data stored in our own local datacenter
generated using M/R and our MySQL DB
2) This index would be synced with a Solr Master sitting in EC2
3) Series of EC2 solr slaves replicate from EC2 Solr master for scaling
purposes.

I figure this would save costs (both time and money) over having all EC2
slaves replicate from our datacenter. The index isn't that big, but I figure
transferring it once would be best. I was going to set up my local datacenter
process to run hourly and let it sync accordingly.

Any pitfalls to this?

Thanks
Amit


Re: Can a Solr Server be both master and slave?

2010-08-16 Thread Amit Nithian
Ugh I should have checked there first! Thanks for the reply.. that helps a
lot.

Sincerely
Amit

On Mon, Aug 16, 2010 at 10:57 AM, Gora Mohanty  wrote:

> On Mon, 16 Aug 2010 10:43:38 -0700
> Amit Nithian  wrote:
>
> > I am not sure if this is the best approach to this problem but I
> > was curious if a single solr server could be both a master and a
> > slave without causing index corruption? It seems that you could
> > setup multiple replication handlers in the SOLR
> > config, /replication /replication2 and have one be master and
> > another be a slave syncing from another server. Here's why: 1) I
> > want to build an index using data stored in our own local
> > datacenter generated using M/R and our MySQL DB 2) This index
> > would be synced with a Solr Master sitting in EC2 3) Series of
> > EC2 solr slaves replicate from EC2 Solr master for scaling
> > purposes.
> [...]
>
> Have you taken a look at the replication page on the Solr Wiki?
> A repeater seems to address exactly your use case:
> http://wiki.apache.org/solr/SolrReplication#Setting_up_a_Repeater
>
> Regards,
> Gora
>
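
For reference, the repeater described on that wiki page boils down to a single
ReplicationHandler configured as both master and slave; the URL and poll
interval below are placeholders:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://datacenter-master:8983/solr/replication</str>
    <str name="pollInterval">01:00:00</str>
  </lst>
</requestHandler>

The EC2 master would carry this config: it polls the datacenter master hourly
while acting as the replication source for the EC2 slaves.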


Re: Is there any strss test tool for testing Solr?

2010-08-25 Thread Amit Nithian
I recommend JMeter. We use that to do load testing on a search server. Of
course you have to provide a reasonable set of queries as input... if you
don't have any, then a reasonable estimation based on your expected traffic
should suffice. JMeter can be used for other load testing too.

Be careful though... as silly as this may sound, do NOT just issue random
queries, because that won't exercise your caches. We had a load test that
killed our servers because our caches kept getting blown out. The
traffic being generated was purely random and was not representative of real-world
traffic, which usually has more predictable behavior.

hope that helps!
Amit

On Wed, Aug 25, 2010 at 7:50 PM, scott chu (朱炎詹) wrote:

> We're currently building a Solr index with over 1.2 million documents. I
> want to do a good stress test of it. Does anyone know if there's an
> appropriate stress-test tool for Solr? Or any good suggestion?
>
> Best Regards,
>
> Scott
>


Hardware Specs Question

2010-08-30 Thread Amit Nithian
Hi all,

I am curious to get some opinions on at what point having more CPU
cores shows diminishing returns in terms of QPS. Our index size is about 8GB
and we have 16GB of RAM on a quad-core 4 x 2.4 GHz AMD Opteron 2216.
Currently I have the heap set to 8GB.

We are looking to get more servers to increase capacity, and because the
warranty is set to expire on our old servers, I was curious, before asking
for a certain spec, what others run and at what point having more cores
ceases to matter. Mainly looking at somewhere between 4-12 cores per
server.

Thanks!
Amit


Re: Hardware Specs Question

2010-08-30 Thread Amit Nithian
Lance,

Thanks for your help. What do you mean by the OS keeping the index in
memory better than Solr can? Do you mean that you should use another means to
keep the index in memory (i.e. a ramdisk)? Is there a generally accepted heap
size/index size ratio that you follow?

Thanks
Amit

On Mon, Aug 30, 2010 at 5:00 PM, Lance Norskog  wrote:

> The price-performance knee for small servers is 32G ram, 2-6 SATA
> disks on a raid, 8/16 cores. You can buy these servers and half-fill
> them, leaving room for expansion.
>
> I have not done benchmarks about the max # of processors that can be
> kept busy during indexing or querying, and the total numbers: QPS,
> response time averages & variability, etc.
>
> If your index file size is 8G, and your Java heap is 8G, you will do
> long garbage collection cycles. The operating system is very good at
> keeping your index in memory- better than Solr can.
>
> Lance
>
> On Mon, Aug 30, 2010 at 4:52 PM, Amit Nithian  wrote:
> > Hi all,
> >
> > I am curious to know get some opinions on at what point having more CPU
> > cores shows diminishing returns in terms of QPS. Our index size is about
> 8GB
> > and we have 16GB of RAM on a quad core 4 x 2.4 GHz AMD Opteron 2216.
> > Currently I have the heap to 8GB.
> >
> > We are looking to get more servers to increase capacity and because the
> > warranty is set to expire on our old servers and so I was curious before
> > asking for a certain spec what others run and at what point does having
> more
> > cores cease to matter? Mainly looking at somewhere between 4-12 cores per
> > server.
> >
> > Thanks!
> > Amit
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: Hardware Specs Question

2010-08-30 Thread Amit Nithian
Lance,

Makes sense. I have heard about the long GC times on large heaps, but I
personally haven't experienced a slowdown - though that doesn't mean anything
either :-). Agreed that tuning the Solr caching is the way to go.

I haven't followed all the Solr/Lucene changes, but from what I remember
there are synchronization points that could be a bottleneck, where adding
more cores won't help. Or am I completely missing something?

Thanks again
Amit

On Mon, Aug 30, 2010 at 8:28 PM, scott chu (朱炎詹) wrote:

> I am also curious, as Amit is. Can you give an example of the garbage
> collection problem you mentioned?
>
> - Original Message - From: "Lance Norskog" 
> To: 
> Sent: Tuesday, August 31, 2010 9:14 AM
> Subject: Re: Hardware Specs Question
>
>
>
>  It generally works best to tune the Solr caches and allocate enough
>> RAM to run comfortably. Linux & Windows et. al. have their own cache
>> of disk blocks. They use very good algorithms for managing this cache.
>> Also, they do not make long garbage collection passes.
>>
>> On Mon, Aug 30, 2010 at 5:48 PM, Amit Nithian  wrote:
>>
>>> Lance,
>>>
>>> Thanks for your help. What do you mean by that the OS can keep the index
>>> in
>>> memory better than Solr? Do you mean that you should use another means to
>>> keep the index in memory (i.e. ramdisk)? Is there a generally accepted
>>> heap
>>> size/index size that you follow?
>>>
>>> Thanks
>>> Amit
>>>
>>> On Mon, Aug 30, 2010 at 5:00 PM, Lance Norskog 
>>> wrote:
>>>
>>>  The price-performance knee for small servers is 32G ram, 2-6 SATA
>>>> disks on a raid, 8/16 cores. You can buy these servers and half-fill
>>>> them, leaving room for expansion.
>>>>
>>>> I have not done benchmarks about the max # of processors that can be
>>>> kept busy during indexing or querying, and the total numbers: QPS,
>>>> response time averages & variability, etc.
>>>>
>>>> If your index file size is 8G, and your Java heap is 8G, you will do
>>>> long garbage collection cycles. The operating system is very good at
>>>> keeping your index in memory- better than Solr can.
>>>>
>>>> Lance
>>>>
>>>> On Mon, Aug 30, 2010 at 4:52 PM, Amit Nithian 
>>>> wrote:
>>>> > Hi all,
>>>> >
>>>> > I am curious to know get some opinions on at what point having more >
>>>> CPU
>>>> > cores shows diminishing returns in terms of QPS. Our index size is >
>>>> about
>>>> 8GB
>>>> > and we have 16GB of RAM on a quad core 4 x 2.4 GHz AMD Opteron 2216.
>>>> > Currently I have the heap to 8GB.
>>>> >
>>>> > We are looking to get more servers to increase capacity and because >
>>>> the
>>>> > warranty is set to expire on our old servers and so I was curious >
>>>> before
>>>> > asking for a certain spec what others run and at what point does >
>>>> having
>>>> more
>>>> > cores cease to matter? Mainly looking at somewhere between 4-12 cores
>>>> > per
>>>> > server.
>>>> >
>>>> > Thanks!
>>>> > Amit
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Lance Norskog
>>>> goks...@gmail.com
>>>>
>>>>
>>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>>
>
>
> 
>
>
>
>
>


Re: anybody using solr with Cassandra?

2010-08-30 Thread Amit Nithian
I am curious about this too... are you talking about using HBase/Cassandra as
an aux store of large data, or using Cassandra to store the actual Lucene
index (as in Lucandra)?

On Mon, Aug 30, 2010 at 11:06 PM, Siju George  wrote:

> Thanks a million Nick,
>
> We are currently debating whether we should use Cassandra or Membase or
> HBase with Solr.
> Do you have anything to contribute as advice to us?
>
> Thanks again :-)
>
> --Siju
>
> On Tue, Aug 31, 2010 at 5:15 AM, nickdos  wrote:
>
> >
> > Yes, we are using Cassandra. There is nothing much to say really; it just
> > works.
> > Note we are generating SOLR indexes using Java & SolrJ (embedded mode)
> > and reading data out of Cassandra with Java. Index generation is fast.
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/anybody-using-solr-with-Cassandra-tp1383646p1391589.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


CoreContainer Usage

2010-10-07 Thread Amit Nithian
I am trying to understand the multicore setup of Solr more and saw
that SolrCore.getCore is deprecated in favor of
CoreContainer.getCore(name). How can I get a reference to the
CoreContainer (I assume it's been created somewhere in Solr), and is
it possible for one core to get access to another SolrCore via the
CoreContainer?

Thanks
Amit
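
A minimal sketch of one way to reach the container from code that already has
a SolrCore in hand (API details vary across 3.x/4.x releases, so verify this
against your version):

import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.SolrCore;

public class OtherCoreLookup {
  // Borrow another core by name; the caller must close() it to release the refcount.
  public static SolrCore getSibling(SolrCore current, String otherCoreName) {
    CoreContainer container = current.getCoreDescriptor().getCoreContainer();
    return container.getCore(otherCoreName); // increments the target core's reference count
  }
}

Whatever you get back this way must be closed when you are done with it, or
the borrowed core can never be unloaded.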

