Boost based on whether a field is present in found documents

2014-10-16 Thread Rahul
Where should I make changes in the config files if I want to boost based on
whether a field is present in my matched documents?

Explanation:
I have documents with fields name, address, id, number, where number may or
may not exist.
I need to rank documents higher when number is not present.

I thought of using the exists() function in my qf, but that is not working.
I am using the edismax query parser.
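
Something like this is what I am aiming for (rough sketch; the weights 1 and
10 are just placeholders):

...&defType=edismax&q=foo&boost=if(exists(number),1,10)

or, as an additive boost:

...&defType=edismax&q=foo&bq={!func}if(exists(number),0,10)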

Thanks

-- 

Rahul Ranjan


DIH Blob data

2014-11-11 Thread Rahul
I am trying to index JSON data stored in a BLOB column in the database.
The JSON is stored in the database as {a:1,b:2,c:3}.

I want to search on those fields later, e.g. fq=a:1.
The fields a, b, c are dynamic and can be anything, depending on the data
posted by users.

What is the correct way to index data with dynamic fields in Solr and
search on those fields later?
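
For reference, something like this is what I am considering in schema.xml
(untested sketch; the catch-all pattern and the string type are just an
illustration):

<dynamicField name="*" type="string" indexed="true" stored="true"/>

so that a document posted as {"a":1,"b":2,"c":3} ends up with fields a, b
and c, which could later be filtered with fq=a:1.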

-- 

Rahul Ranjan


Solr inserting Multivalued fields

2011-02-02 Thread rahul

Hi,

I am a newbie to Apache Solr.

We are using ContentStreamUpdateRequest to insert into Solr. For example:

ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
req.addContentStream(stream);            // the document content
req.setParam("literal.name", name);      // literal value for the 'name' field

SolrServer server = new CommonsHttpSolrServer(URL);
server.request(req);

Here, in schema.xml, I have specified name as a multivalued text field.

Now I want to set multiple values for name. Could anyone tell me how to do
this?
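
For example, something like this is what I am trying to do (untested sketch;
ModifiableSolrParams is org.apache.solr.common.params.ModifiableSolrParams,
and the two values are placeholders):

ModifiableSolrParams params = new ModifiableSolrParams();
params.add("literal.name", "first value");
params.add("literal.name", "second value");  // repeating literal.name should give multiple values
req.setParams(params);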

Additionally, assume I set the autoCommit maxTime to 10000 (i.e. 10 sec) in my
solrconfig.xml; if my application stops before the documents are committed
into Solr, will that information be lost?

Do I need to insert all those documents into Solr again?
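
(For reference, the setting I mean is the autoCommit block in solrconfig.xml,
roughly:

<autoCommit>
  <maxTime>10000</maxTime>
</autoCommit>
)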

thanks in Advance..


Re: Solr inserting Multivalued fields

2011-02-02 Thread rahul

Nevermind.. got the details from here..

http://wiki.apache.org/solr/ExtractingRequestHandler

Thanks..


Solr Autosuggest help

2011-02-26 Thread rahul
Hi,

I am using the Solr (1.4.1) AutoSuggest feature via the TermsComponent.

Currently, if I type 'goo', Solr suggests words like 'google'.

But I would like to receive suggestions like 'google, google alerts, ...',
i.e. suggestions with both single and multiple terms.

I am not sure whether I need to use edge n-grams for that, e.g. indexing google
as 'go', 'oo', 'og', ... . But I think I don't need this, since I don't
want partial search. Please let me know if there is any way to do multiple
word suggestions.

Thanks in Advance. 



Re: Solr Autosuggest help

2011-03-06 Thread rahul
Hi

I have added the following line to both the index and query analyzer sections
in schema.xml:

<filter class="solr.ShingleFilterFactory" maxShingleSize="2"
        outputUnigrams="true" outputUnigramIfNoNgram="true"/>

And reindexed my content. However, when I query Solr for multi-word search
term suggestions, it only returns single word suggestions.

http://localhost:8080/solr/mydata/select?qt=/terms&terms=true&terms.fl=content&terms.lower=java&terms.prefix=java&terms.lower.incl=false&indent=true

It won't return terms like 'java final'; it only returns single words like
javadoc, javascript, etc.

Could anyone tell me how to correct this, or what I am missing?

thanks, 



Re: Solr Autosuggest help

2011-03-07 Thread rahul
hi..

thanks for your replies..

It seems I mistakenly put ShingleFilterFactory on another field. When I put
the factory on the correct field, it works fine now.

Thanks.



Solr insert error

2011-03-11 Thread rahul
Hi,

I received the following error when I tried to insert a document into
Solr:

SEVERE: org.apache.solr.common.SolrException: ERROR: multiple values
encountered for non multiValued copy field id: 272327_1

In my schema.xml, I have specified:

<uniqueKey>id</uniqueKey>
<copyField source="uniqueid" dest="id"/>

In the query, I have passed literal.uniqueid=272327_1 .

Could anyone tell me why this error occurs, and how to fix it?

Thanks in Advance.



Re: Solr insert error

2011-03-11 Thread rahul
Hi,

thanks for your reply.

I have posted that value only one time.

The following are the list of values that I have posted,

literal.uniqueid=272327_1&literal.urlid=272327&literal.url=http%3A%2F%2Fblogs.edweek.org%2Fteachers%2Fbook_whisperer%2F2009%2F03%2Fa_book_in_every_backpack_1.html&literal.title=A%2BBook%2Bin%2BEvery%2BBackpack%2B-%2BThe%2BBook%2BWhisperer%2B-%2BEducation%2BWeek%2BTeacher&literal.description=Donalyn%2BMiller%2Bis%2Ba%2B6th%2Bgrade%2Blanguage%2Barts%2Bteacher%2Bin%2BTexas%2Bwho%2Bis%2Bsaid%2Bto%2Bhave%2Ba&literal.type=1&literal.date=2011-03-11T17%3A28%3A57Z&literal.users=1

When I call server.request(req), it throws a Bad Request (400) error, and the
error message is:

Bad Request

request:
http://localhost:8080/solr/bookmarks/update/extract?literal.uniqueid=272327_1&literal.urlid=272327&literal.url=http://blogs.edweek.org/teachers/book_whisperer/2009/03/a_book_in_every_backpack_1.html&literal.title=A+Book+in+Every+Backpack+-+The+Book+Whisperer+-+Education+Week+Teacher&literal.description=Donalyn+Miller+is+a+6th+grade+language+arts+teacher+in+Texas+who+is+said+to+have+a&literal.type=1&literal.date=2011-03-11T17:28:57Z&literal.users=1&wt=javabin&version=1

Since I declared a copyField from uniqueid to id, I think it should be
inserted.

Please check this and let me know whether I am doing anything wrong.

thanks









Re: Solr insert error

2011-03-11 Thread rahul
hi,

It seems I have identified the issue.

In the code I am using 

ContentStreamBase.StringStream stream = new
ContentStreamBase.StringStream(streamData);

If the streamData itself contains name="ID" (i.e. an ID value), and I have
already set a copyField from uniqueid to id, then it throws the error.

It seems Solr checks both the request parameters and the string stream data.

Is this a Solr limitation, or by design?

thanks,..






Re: Solr insert error

2011-03-12 Thread rahul
<uniqueKey>id</uniqueKey>
<copyField source="uniqueid" dest="id"/>

Here, id and uniqueid are both declared as string. I only pass uniqueid
in the query.

Please note, this error does not occur for all inserts. It only occurs for
particular inserts which have an id value in their content. I believe I need to
change the unique field names in the schema (to avoid repeating the same words
in the content).

Thanks,



Solrj performance bottleneck

2011-03-15 Thread rahul
Hi,

I am using Solrj as a Solr client in my project.

For a few search words, it seems Solrj takes much longer to return a
response, e.g. 8 - 12 sec. For most other words, Solrj seems to take much
less time.

For example, if I post a search URL in the browser, it shows the QTime in
milliseconds:

http://serverName/solr/mydata/select?q=computing&qt=myhandler&fq=category:1

But if I query the same thing using Solrj from my project, like below, it takes
a long time (8 - 12 sec) to produce the same results. Hence I suspect that
Solrj is taking that long to produce the results.

SolrServer server = new CommonsHttpSolrServer(url);
SolrQuery query = new SolrQuery("computing");
query.setParam("qt", "myhandler");
query.setFilterQueries("category:1");
query.setHighlight(false);
QueryResponse rsp = server.query( query );

I have tried both POST and GET methods, but both take a long time.

Any idea why Solrj takes so long for particular words? It returns
around 40 documents as the search result, and I have even commented out
highlighting for that query.

And is there any way to speed it up?

Note: I am using Tomcat with a heap size of around 1024 MB.
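
To narrow it down, I am measuring it roughly like this (sketch; just wall-clock
time around the call compared to the QTime Solr reports):

long start = System.currentTimeMillis();
QueryResponse rsp = server.query(query);
long elapsed = System.currentTimeMillis() - start;
System.out.println("QTime=" + rsp.getQTime() + " ms, total=" + elapsed + " ms");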

Thanks,





Re: Solrj performance bottleneck

2011-03-16 Thread rahul
Hi,

Thanks for your information.

One simple question; please clarify this for me.

In our setup, we have the Solr index on one machine and the Solrj client
(Java code) on another machine. As you suggest, if the cause may be
'not enough free RAM for the OS to cache', do I need to increase the RAM on
the machine where the Solrj query code runs, or increase the RAM on the Solr
instance for the OS cache?

Since both systems are on the local Amazon network (Linux EC2 small
instances), I believe the network won't be an issue.

Another thing: in the reply you mentioned the 'client not reading fast
enough'. Is that related to the network or to Solrj?

Thanks in advance for your info.



Re: Solrj performance bottleneck

2011-03-16 Thread rahul
thanks for all your info.

I will try increasing the RAM and check.

thanks,



Re: Solr Autosuggest help

2011-03-17 Thread rahul
Hi,

One more query.

Currently in the autosuggestion Solr returns words like below:

googl
googl _
googl search
googl chrome
googl map

The last letter seems to be missing from the suggestions. I have sent the query
as
"?qt=/terms&terms=true&terms.fl=mydata&terms.lower=goog&terms.prefix=goog".

The following is my schema.xml definition for the text field type:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="1" catenateWords="0" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
            outputUnigrams="true" outputUnigramIfNoNgram="true"/>
  </analyzer>
</fieldType>

Could anyone tell me what could be wrong? Why does the last letter get dropped?
It happens for a few words only; suggestions for other words are fine.

One more question: how will the word 'sci/tech' be indexed in Solr? If I search
on sci/tech it doesn't return any results.

Thanks in Advance.






Re: Solr Autosuggest help

2011-03-17 Thread rahul
hi,

We have found that 'EnglishPorterFilterFactory' causes the issue. I believe
it is used for stemming words. Once we commented out that factory, it works
fine.

And another thing: I am currently checking how the word 'sci/tech'
is indexed in Solr. As mentioned in my previous email, if I search on
sci/tech it doesn't return any results, but Solr does have the term sci/tech.
When I search on other terms which also contain sci/tech, it returns both
words.

Please let me know if you have any idea about that. If I find out, I
will update this thread.

thanks.





Comment Unused portions in solrconfig

2011-04-01 Thread rahul
Hi,

I have indexed some content in our Solr and I use Solr for searching only.
Currently I don't use the highlighting component, request handlers
like '/spell', or the extra query response writers (since I use Solrj, which
uses the default javabin response writer).

I am a little concerned: if I comment out those unused modules from
solrconfig.xml, will it improve search performance a little? Or could
commenting them out affect any Solr operations or performance?

thanks in advance.



Re: Comment Unused portions in solrconfig

2011-04-01 Thread rahul
thanks for your info.


On Sat, Apr 2, 2011 at 9:51 AM, Chris Hostetter-3 [via Lucene] <
ml-node+2766175-1482636196-340...@n3.nabble.com> wrote:

>
> : I have little bit concerned about, if I comment those unused modules from
>
> : solrconfig.xml, whether it will used to increase search performance
> little
> : bit. ?? Or if I comment then it will affect any of the Solr
> : operations/performance ??
>
> there may be some things you are using w/o realizing it so you should
> definitely test out any changes -- but once you know you don't need
> something, commenting it out should be fine.
>
> is it worth it?
>
> probably not.  but in general most unused request handlers and components
> and what not take up very little resources -- a small amount of ram that
> you likely won't even notice getting back. it's certainly possible for a
> solr plugin to eat up huge gobs of ram and/or spin up background threads
> that consume lots of CPU even when you don't use them -- but nothing
> in the example configs behaves like that.
>
>
>
> -Hoss
>
>



Re: Solrj performance bottleneck

2011-04-04 Thread rahul
Hi All,

I just want to share some findings which clearly identified the reason
for our performance bottleneck. we had looked into several areas for
optimization mostly directed at Solr configurations, stored fields,
highlighting, JVM, OS cache etc. But it turned out that the "main" culprit
was elsewhere. We were using the terms component for auto suggestion and
while examining the firebug outputs for time taken during the searches, we
detected that multiple requests were being spawned for autosuggestion as we
typed in the keyword to search (one request per character typed), and this in
turn caused a great delay in getting the search results. Once we turned
auto suggestion off, the performance was remarkably better and came down to
a second or so (compared to 8-10 seconds registered earlier).

if anybody has some suggestions/experience on how to leverage autosuggestion
without affecting search performance much, please do share them.

Once again, thanks for your inputs in analyzing our issues.

Thanks,



Re: Solrj performance bottleneck

2011-04-05 Thread rahul
Thanks Stefan and Victor! We are using GWT for the front end. We stopped issuing
multiple asynchronous queries; instead we issue one request, fetch the results,
filter them based on what has been typed since the request, and only re-trigger
the request if we don't get the expected results.
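
Roughly, the idea is something like this (untested sketch, assuming
com.google.gwt.user.client.Timer; fetchSuggestions() is a placeholder for our
terms-component call):

private Timer suggestTimer;

void onKeyUp(final String typed) {
  if (suggestTimer != null) {
    suggestTimer.cancel();        // drop the pending request for the previous keystroke
  }
  suggestTimer = new Timer() {
    @Override
    public void run() {
      fetchSuggestions(typed);    // issue a single request for the current input
    }
  };
  suggestTimer.schedule(300);     // only fire if the user pauses for ~300 ms
}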

Thanks Victor, I appreciate the link to the Jquery example and we will look
into it as a reference.

Regards,
Rahul.



RE: Internal Server Error

2011-04-09 Thread rahul
hi,

I believe you are facing this 'Internal Server Error' while trying to index
PDF files.

Try adding the 'pdfbox' and 'fontbox' jars to your Solr lib folder and
restart Tomcat once.

I hope it will solve the issue.

Thanks.



Solr 3.1 Upgrade

2011-04-09 Thread rahul
Hi,

I am using Solr 1.4.1 in my environment. I have not used special Solr
features such as replication/distributed searches. From the Solr 3.1 release
notes, I understand that for an upgrade we first need to upgrade our slaves and
then update the master Solr server. Since I have not used replication, I
believe I don't need to worry about this.

To upgrade to 3.1, can I simply copy the latest Solr and Solrj
libraries into my setup, or is a reindex of all the documents needed?

Also, do any special changes need to be made to the existing solrconfig.xml and
schema.xml, or can I keep the old schema files as they are?
Thanks in Advance for your reply.



Solr indexing size for a particular document.

2011-04-19 Thread rahul
Hi,

Is there a way to find out the Solr index size for a particular document? I
am using Solrj to index the documents.

Assume I am indexing multiple fields like title, description, content, and a
few integer fields defined in schema.xml. Once I index the content, is there a
way to identify the index size for that particular document, either during
indexing or afterwards?

I ask because most of the common words are excluded via stopwords.txt using
StopFilterFactory. I just want to calculate the actual index size of a
particular document. Is there any way to do this in current Solr?

thanks,




Re: Solr indexing size for a particular document.

2011-04-22 Thread rahul
thanks for all your inputs.



On Fri, Apr 22, 2011 at 8:36 PM, Otis Gospodnetic-2 [via Lucene] <
ml-node+2851624-1936255218-340...@n3.nabble.com> wrote:

> Rahul,
>
> Here's a suggestion:
> Write a simple app that uses *Lucene* to create N indices, one for each of
> the
> documents you want to test.  Then you can look at their sizes on disk.
>
> Not sure if it's super valuable to see sizes of individual documents, but
> you
> can do it as described above.
> Of course, if you *store* all your data, the index will be bigger than the
> original/input data.
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
>
> - Original Message 
>
> > From: rahul <[hidden email]>
> > To: [hidden email]
> > Sent: Tue, April 19, 2011 7:49:39 AM
> > Subject: Solr indexing size for a particular document.
> >
> > Hi,
> >
> > Is there a way to find out Solr indexing size for a particular  document.
> I
> > am using Solrj to index the documents.
> >
> > Assume, I am  indexing multiple fields like title, description, content,
> and
> > few integer  fields in schema.xml, then once I index the content, is
> there a
> > way to  identify the index size for the particular document during
> indexing
> > or after  indexing..??
> >
> > Because, most of the common words are excluded from  StopWords.txt using
> > StopFilterFactory. I just want to calculate the actual  index size of the
>
> > particular document. Is there any way in current Solr  ??
> >
> > thanks,
> >
> >
> >
>
>
>
>



Post content to be indexed to Solr

2011-08-12 Thread rahul
Hi,

Currently I am indexing documents either by directly adding files, with
req.addFile(fi), or by sending the content of the file with
req.addContentStream(stream), using Solrj.

Assume the Solrj client and the Solr server are on different networks (i.e. the
Solr server is in a remote location); then I need to transfer the entire file
content to Solr. I believe the indexed content of a file should be smaller than
the actual file.

Hence, is there a way to extract the content to be indexed on the client side
(instead of simply sending the entire file content -- I believe the content to
be indexed should be roughly 1 to 10% of the original file; please correct me
if I am wrong) using a Lucene or similar API, and then post only that content
to the remote server?

Is there any way to achieve this? Please correct me if I have misunderstood
anything.
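
For example, something like this is what I have in mind (rough sketch, assuming
Apache Tika is available on the client classpath; the path and field names are
placeholders):

Tika tika = new Tika();   // org.apache.tika.Tika
String text = tika.parseToString(new File("/path/to/document.pdf"));  // extract plain text locally

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc-1");
doc.addField("content", text);   // send only the extracted text to the remote Solr
server.add(doc);
server.commit();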

Thanks in Advance..



Re: Auto-optimization for Solr indexes

2015-12-20 Thread Rahul Ramesh
Hi Erick,
We index several million documents a day, and we optimize every day
when the relative load is low. The reason we optimize is that we don't want the
index sizes to grow too large and auto optimize to kick in. When auto
optimize kicks in, it results in unpredictable performance, as it is CPU and
IO intensive.

In older Solr (4.2), when the segment size grew too large, insertion used
to fail. Has this problem been seen in SolrCloud?

Also, we have observed that recovery takes a bit more time when the index is
not optimized. We don't have any quantitative measurements for this; it's just
an observation. Is this observation correct?

If we optimize every day, the indexes will not be skewed, right?

Please let me know if my understanding is correct.

Regards,
Rahul

On Mon, Dec 21, 2015 at 9:54 AM, Erick Erickson 
wrote:

> You'll probably have to shard before you get to the TB range. At that
> point, all the optimization is done individually on each shard so it
> really doesn't matter how many shards you have.
>
> Just issuing
> http://solr:port/solr/collection/update?optimize=true
>
> is sufficient, that'll forward the optimize command to all the shards
> in the collection.
>
> Best,
> Erick
>
> On Sun, Dec 20, 2015 at 8:19 PM, Zheng Lin Edwin Yeo
>  wrote:
> > Thanks for your information Erick.
> >
> > We have yet to decide how often we will update the index to include new
> > documents that came in. Let's say we update the index once a day, then
> when
> > the indexed is updated, we do the optimization (this will be done at
> night
> > when there are not many users using the system).
> > But my index size will probably grow quite big (potentially can go up to
> > more than 1TB in the future), so does that have to be taken into
> > consideration too?
> >
> > Regards,
> > Edwin
> >
> >
> > On 21 December 2015 at 12:12, Erick Erickson 
> > wrote:
> >
> >> Much depends on how often the index is updated. If your index only
> >> changes, say, once a day then it's probably a good idea. If you're
> >> constantly updating your index, then I'd recommend that you do _not_
> >> optimize.
> >>
> >> Optimizing will create one large segment. That segment will be
> >> unlikely to be merged since it is so large relative to other segments
> >> for quite a while, resulting in significant wasted space. So if you're
> >> regularly indexing documents that _replace_ existing documents, this
> >> will skew your index.
> >>
> >> Bottom line:
> >> If you have a relatively static index the you can build and then use
> >> for an extended time (as in 12 hours plus) it can be worth the time to
> >> optimize. Otherwise I wouldn't bother.
> >>
> >> Best,
> >> Erick
> >>
> >> On Sun, Dec 20, 2015 at 7:57 PM, Zheng Lin Edwin Yeo
> >>  wrote:
> >> > Hi,
> >> >
> >> > I would like to find out, will it be good to do write a script to do
> an
> >> > auto-opitmization of the indexes at a certain time every day? Is there
> >> any
> >> > advantage to do so?
> >> >
> >> > I found that optimization can reduce the index size by quite a
> >> > signification amount, and allow the searching of the index to run
> faster.
> >> > But will there be advantage if we do the optimization every day?
> >> >
> >> > I'm using Solr 5.3.0
> >> >
> >> > Regards,
> >> > Edwin
> >>
>


Re: Auto-optimization for Solr indexes

2015-12-20 Thread Rahul Ramesh
Thanks Erick!

Rahul

On Mon, Dec 21, 2015 at 10:07 AM, Erick Erickson 
wrote:

> Rahul:
>
> bq:  we dont want the index sizes to grow too large and auto optimzie to
> kick in
>
> Not quite what's going on. There is no "auto optimize". What
> there is is background merging that will take _some_ segments and
> merge them together. Very occasionally this will be the same as a full
> optimize if it just happens that "some" means all the segments.
>
> bq: recovery takes a bit more time when it is not optimized
>
> I'd be interested in formal measurements here. A recovery that copied
> the _entire_ index down from the leader shouldn't really have that
> much be different between an optimized and non-optimized index, but
> all things are possible. If the recovery is a "peer sync" it shouldn't
> matter at all.
>
> If you're continually adding documents that _replace_ older documents,
> optimizing will recover any "holes" left by the old updated docs. An
> update is really a mark-as-deleted for the old version and a re-index
> of the new. Since segments are write-once, the old data is left there
> until the segment is merged. Now, one of the bits of information that
> goes into deciding whether to merge a segment or not is the size.
> Another is the percentage of deleted docs. When you optimize, you get
> one huge segment. Now you have to update a lot of docs for that
> segment to have a large percentage of deleted documents and be merged,
> thus wasting space and memory.
>
> So it's a tradeoff. But if you're getting satisfactory performance
> from what you have now, there's no reason to change.
>
> Here's a wonderful video about the process. you want the third one
> down (TieredMergePolicy) as that's the default.
>
>
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
> Best,
> Erick
>
> On Sun, Dec 20, 2015 at 8:26 PM, Rahul Ramesh  wrote:
> > Hi Erick,
> > We index around several million documents/ day and we optimize everyday
> > when the relative load is low. The reason we optimize is, we dont want
> the
> > index sizes to grow too large and auto optimzie to kick in. When auto
> > optimize kicks in, it results in unpredictable performance as it is CPU
> and
> > IO intensive.
> >
> > In older solr (4.2), when the segment size grows too large, insertion
> used
> > to fail .  Have we seen this problem in solr cloud?
> >
> > Also, we have observed, recovery takes a bit more time when it is not
> > optimized. We dont have any quantitative measurement for the same. Its
> just
> > an observation. Is this correct observation?
> >
> > If we optimize it every day, the indexes will not be skewed right?
> >
> > Please let me know if my understanding is correct.
> >
> > Regards,
> > Rahul
> >
> > On Mon, Dec 21, 2015 at 9:54 AM, Erick Erickson  >
> > wrote:
> >
> >> You'll probably have to shard before you get to the TB range. At that
> >> point, all the optimization is done individually on each shard so it
> >> really doesn't matter how many shards you have.
> >>
> >> Just issuing
> >> http://solr:port/solr/collection/update?optimize=true
> >>
> >> is sufficient, that'll forward the optimize command to all the shards
> >> in the collection.
> >>
> >> Best,
> >> Erick
> >>
> >> On Sun, Dec 20, 2015 at 8:19 PM, Zheng Lin Edwin Yeo
> >>  wrote:
> >> > Thanks for your information Erick.
> >> >
> >> > We have yet to decide how often we will update the index to include
> new
> >> > documents that came in. Let's say we update the index once a day, then
> >> when
> >> > the indexed is updated, we do the optimization (this will be done at
> >> night
> >> > when there are not many users using the system).
> >> > But my index size will probably grow quite big (potentially can go up
> to
> >> > more than 1TB in the future), so does that have to be taken into
> >> > consideration too?
> >> >
> >> > Regards,
> >> > Edwin
> >> >
> >> >
> >> > On 21 December 2015 at 12:12, Erick Erickson  >
> >> > wrote:
> >> >
> >> >> Much depends on how often the index is updated. If your index only
> >> >> changes, say, once a day then it's probably a good idea. If you're
> >> >> constantly updating your inde

Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

2016-01-11 Thread Rahul Ramesh
Please have a look at this post

https://support.lucidworks.com/hc/en-us/articles/201298317-What-is-SolrCloud-And-how-does-it-compare-to-master-slave-

We don't use a master/slave architecture; however, we use both SolrCloud and
standalone Solr for our documents.

Indexing is a bit slower in cloud mode compared to standalone, I think because
of replication. However, you will get a faster query response.

SolrCloud also requires a slightly more elaborate setup, with ZooKeeper,
compared to master/slave or standalone.

However, once SolrCloud is set up, it runs very smoothly and you don't have
to worry about performance / high availability.

Please check the post; it gives a detailed analysis and comparison of the two.

-Rahul


On Mon, Jan 11, 2016 at 4:58 PM, Gian Maria Ricci - aka Alkampfer <
alkamp...@nablasoft.com> wrote:

> Hi guys,
>
>
>
> a customer need a comprehensive list of all pro and cons of using standard
> Master Slave replica VS using Solr Cloud. I’m interested especially in
> query performance consideration, because in this specific situation the
> rate of new documents is really slow, but the amount of data is about 50
> millions of document, and the index size on disk for single core is about
> 30 GB.
>
>
>
> Such amount of data should be easily handled by a Master Slave replica
> with a  single core replicated on a certain number of slaves, but we need
> to evaluate also the option of SolrCloud, especially for fault tolerance.
>
>
>
> I’ve googled around, but did not find anything really comprehensive, so
> I’m looking for real experience from you in Mailing List. J.
>
>
>
> Thanks in advance.
>
>
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
>
>
>


Understanding solr commit

2016-01-25 Thread Rahul Ramesh
We are facing an issue and we are finding it difficult to debug the
problem. We want to understand how Solr commit works.
A background on our setup:
We have a 3-node Solr cluster running version 5.3.1. It's an index-heavy
use case; at peak load, we index 400-500 documents/second.
We also want these documents to be visible as quickly as possible, hence we
run an external script which commits every 3 minutes.

Consider the three nodes as N1, N2, N3. Commit is a synchronous operation,
so we do not get control back until the commit operation is complete.

Consider the following scenario. It looks like a basic scenario in a
distributed system :-) but we just wanted to eliminate this possibility.

step 1: At time T1, a commit happens on node N1.
step 2: At the same time T1, we search for all the documents inserted on node
N2.

My questions are:

1. Is commit an atomic operation? I mean, will the commit happen on all the
nodes at the same time?
2. Can we say that the search result will always contain the documents from
before the commit, or from after the commit? Or can it happen that we get new
documents from N1, N2 but old documents (i.e., from before the commit) from N3?

Thank you,
Rahul


Re: Understanding solr commit

2016-01-25 Thread Rahul Ramesh
Thanks for your replies.

A bit more detail about our setup:
The index size is close to 80GB spread across 30 collections. The main
memory available is around 32GB. We are always short of memory!
Unfortunately we cannot expand the memory, as the server motherboard
doesn't support it.

We tried Solr's auto commit features. However, sometimes we were getting a
Java OOM exception, and when I started digging into it, somebody
suggested that I was not committing the collections often enough. So we started
committing the collections explicitly.

Please let me know if our approach is not correct.

*Emir*,
We are committing to the collection only once. We have nodes N1, N2 and N3,
and for a collection Coll1, the commit goes to N1/coll1 every 3 minutes;
we do not do it for every node. We will remove _shard<>_replica<> and
use only the collection name to commit.

*Alessandro*,
We are using SolrCloud with a replication factor of 2 and the number of shards
either 2 or 3.

Thanks,
Rahul









On Mon, Jan 25, 2016 at 4:43 PM, Alessandro Benedetti  wrote:

> Let me answer in line :
>
> On 25 January 2016 at 11:02, Rahul Ramesh  wrote:
>
> > We are facing some issue and we are finding it difficult to debug the
> > problem. We wanted to understand how solr commit works.
> > A background on our setup:
> > We have  3 Node Solr Cluster running in version 5.3.1. Its a index heavy
> > use case. In peak load, we index 400-500 documents/second.
> > We also want these documents to be visible as quickly as possible, hence
> we
> > run an external script which commits every 3 mins.
> >
>
> This is weird, why not using the auto-soft commit if you want visibility
> every 3 minutes ?
> Is there any particular reason you trigger the commit from the client ?
>
> >
> > Consider the three nodes as N1, N2, N3. Commit is an synchronous
> operation.
> > So, we will not get control till the commit operation is complete.
> >
> > Consider the following scenario. Although it looks like a basic scenario
> in
> > distributed system:-) but we just wanted to eliminate this possibility.
> >
> > step 1 : At time T1, commit happens to Node N1
> > step 2: At same time T1, we search for all the documents inserted in Node
> > N2.
> >
> > My question is
> >
> > 1. Is commit an atomic operation? I mean, will commit happen on all the
> > nodes at the same time?
> >
> Which kind of architecture of Solr are you using ? Are you using SolrCloud
> ?
>
> 2. Can we say that, the search result will always contain the documents
> > before commit / or after commit . Or can it so happen that we get new
> > documents fron N1, N2 but old documents (i.e., before commit)  from N3?
> >
> With a manual cluster it could faintly happen.
> In SolrCloud it should not, but I should double check the code !
>
> >
> > Thank you,
> > Rahul
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: Understanding solr commit

2016-01-25 Thread Rahul Ramesh
Can you give us bit more details about Solr heap parameters.
Each node has 32Gb of RAM and we are using 8Gb for heap.
Index size in each node is around 80Gb
#of collections 30


Also can you give us info about auto commit (both hard and soft) you used
when experienced OOM.
<autoCommit>
  <maxDocs>15000</maxDocs>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

soft commit is not enabled.
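
For reference, enabling soft commits would look roughly like this in
solrconfig.xml (sketch; 180000 ms = 3 minutes, matching our current external
commit interval):

<autoSoftCommit>
  <maxTime>180000</maxTime>
</autoSoftCommit>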

-Rahul



On Mon, Jan 25, 2016 at 6:00 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Rahul,
> It is good that you commit only once, but not sure how external commits
> can do something auto commit cannot.
> Can you give us bit more details about Solr heap parameters. Running Solr
> on the edge of OOM is always risk of starting snowball effect and crashing
> entire cluster. Also can you give us info about auto commit (both hard and
> soft) you used when experienced OOM.
>
> Thanks,
> Emir
>
> On 25.01.2016 12:28, Rahul Ramesh wrote:
>
>> Thanks for your replies.
>>
>> A bit more detail about our setup.
>> The index size is close to 80Gb spread across 30 collections. The main
>> memory available is around 32Gb. We are always in short of memory!
>> Unfortunately we could not expand the memory as the server motherboard
>> doesnt support it.
>>
>> We tried with solr auto commit features. However, sometimes we were
>> getting
>> Java OOM exception and when I start digging more about it, somebody
>> suggested that I am not committing the collections often. So, we started
>> committing the collections explicitly.
>>
>> Please let me know if our approach is not correct.
>>
>> *Emir*,
>> We are committing to the collection only once. We have Node N1, N2 and N3
>> and for a collection Coll1, commit will happen to N1/coll1 every 3
>> minutes.
>> we are not doing it for every node. We will remove _shard<>_replica<> and
>> use only the collection name to commit.
>>
>> *Alessandro*,
>>
>> We are using Solr Cloud with replication factor of 2 and no of shards as
>> either 2 or 3.
>>
>> Thanks,
>> Rahul
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Jan 25, 2016 at 4:43 PM, Alessandro Benedetti <
>> abenede...@apache.org
>>
>>> wrote:
>>> Let me answer in line :
>>>
>>> On 25 January 2016 at 11:02, Rahul Ramesh  wrote:
>>>
>>> We are facing some issue and we are finding it difficult to debug the
>>>> problem. We wanted to understand how solr commit works.
>>>> A background on our setup:
>>>> We have  3 Node Solr Cluster running in version 5.3.1. Its a index heavy
>>>> use case. In peak load, we index 400-500 documents/second.
>>>> We also want these documents to be visible as quickly as possible, hence
>>>>
>>> we
>>>
>>>> run an external script which commits every 3 mins.
>>>>
>>>> This is weird, why not using the auto-soft commit if you want visibility
>>> every 3 minutes ?
>>> Is there any particular reason you trigger the commit from the client ?
>>>
>>> Consider the three nodes as N1, N2, N3. Commit is an synchronous
>>>>
>>> operation.
>>>
>>>> So, we will not get control till the commit operation is complete.
>>>>
>>>> Consider the following scenario. Although it looks like a basic scenario
>>>>
>>> in
>>>
>>>> distributed system:-) but we just wanted to eliminate this possibility.
>>>>
>>>> step 1 : At time T1, commit happens to Node N1
>>>> step 2: At same time T1, we search for all the documents inserted in
>>>> Node
>>>> N2.
>>>>
>>>> My question is
>>>>
>>>> 1. Is commit an atomic operation? I mean, will commit happen on all the
>>>> nodes at the same time?
>>>>
>>>> Which kind of architecture of Solr are you using ? Are you using
>>> SolrCloud
>>> ?
>>>
>>> 2. Can we say that, the search result will always contain the documents
>>>
>>>> before commit / or after commit . Or can it so happen that we get new
>>>> documents fron N1, N2 but old documents (i.e., before commit)  from N3?
>>>>
>>>> With a manual cluster it could faintly happen.
>>> In SolrCloud it should not, but I should double check the code !
>>>
>>> Thank you,
>>>> Rahul
>>>>
>>>>
>>>
>>> --
>>> --
>>>
>>> Benedetti Alessandro
>>> Visiting card : http://about.me/alessandro_benedetti
>>>
>>> "Tyger, tyger burning bright
>>> In the forests of the night,
>>> What immortal hand or eye
>>> Could frame thy fearful symmetry?"
>>>
>>> William Blake - Songs of Experience -1794 England
>>>
>>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


Re: Understanding solr commit

2016-01-25 Thread Rahul Ramesh
Thank you Emir and Alessandro for the inputs. We use Sematext for monitoring.
We understand that Solr needs more memory, but unfortunately that means moving
to an altogether new range of servers.
As you say, eventually we will have to upgrade our servers.

Thanks,
Rahul


On Mon, Jan 25, 2016 at 6:32 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Rahul,
> It is hard to tell without seeing metrics, but 8GB heap seems small for
> such setup - e.g. with indexing buffer of 32MB and 30 collections, it will
> eat almost 1GB memory.
> About commits, you can set auto commit to be more frequent (keep
> openSearcher=false) and add soft commits every 3 min.
> What you need to tune is your heap and heap related settings - indexing
> buffer, caches. Not sure what you use for monitoring Solr, but Sematext's
> SPM (http://sematext.com/spm) is one such tool that can give you info how
> you Solr, JVM and host handle different load. One such tool can give you
> enough info to tune your Solr.
>
> Regards,
> Emir
>
>
> On 25.01.2016 13:42, Rahul Ramesh wrote:
>
>> Can you give us bit more details about Solr heap parameters.
>> Each node has 32Gb of RAM and we are using 8Gb for heap.
>> Index size in each node is around 80Gb
>> #of collections 30
>>
>>
>> Also can you give us info about auto commit (both hard and soft) you used
>> when experienced OOM.
>> <autoCommit> <maxDocs>15000</maxDocs> <maxTime>15000</maxTime> <openSearcher>false</openSearcher> </autoCommit>
>>>
>> soft commit is not enabled.
>>
>> -Rahul
>>
>>
>>
>> On Mon, Jan 25, 2016 at 6:00 PM, Emir Arnautovic <
>> emir.arnauto...@sematext.com> wrote:
>>
>> Hi Rahul,
>>> It is good that you commit only once, but not sure how external commits
>>> can do something auto commit cannot.
>>> Can you give us bit more details about Solr heap parameters. Running Solr
>>> on the edge of OOM is always risk of starting snowball effect and
>>> crashing
>>> entire cluster. Also can you give us info about auto commit (both hard
>>> and
>>> soft) you used when experienced OOM.
>>>
>>> Thanks,
>>> Emir
>>>
>>> On 25.01.2016 12:28, Rahul Ramesh wrote:
>>>
>>> Thanks for your replies.
>>>>
>>>> A bit more detail about our setup.
>>>> The index size is close to 80Gb spread across 30 collections. The main
>>>> memory available is around 32Gb. We are always in short of memory!
>>>> Unfortunately we could not expand the memory as the server motherboard
>>>> doesnt support it.
>>>>
>>>> We tried with solr auto commit features. However, sometimes we were
>>>> getting
>>>> Java OOM exception and when I start digging more about it, somebody
>>>> suggested that I am not committing the collections often. So, we started
>>>> committing the collections explicitly.
>>>>
>>>> Please let me know if our approach is not correct.
>>>>
>>>> *Emir*,
>>>> We are committing to the collection only once. We have Node N1, N2 and
>>>> N3
>>>> and for a collection Coll1, commit will happen to N1/coll1 every 3
>>>> minutes.
>>>> we are not doing it for every node. We will remove _shard<>_replica<>
>>>> and
>>>> use only the collection name to commit.
>>>>
>>>> *Alessandro*,
>>>>
>>>> We are using Solr Cloud with replication factor of 2 and no of shards as
>>>> either 2 or 3.
>>>>
>>>> Thanks,
>>>> Rahul
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jan 25, 2016 at 4:43 PM, Alessandro Benedetti <
>>>> abenede...@apache.org
>>>>
>>>> wrote:
>>>>> Let me answer in line :
>>>>>
>>>>> On 25 January 2016 at 11:02, Rahul Ramesh  wrote:
>>>>>
>>>>> We are facing some issue and we are finding it difficult to debug the
>>>>>
>>>>>> problem. We wanted to understand how solr commit works.
>>>>>> A background on our setup:
>>>>>> We have  3 Node Solr Cluster running in version 5.3.1. Its a index
>>>>>> heavy
>>>>>> use case. In peak load, we index 400-500 documents/second.
>>>>>> We also want these documents to be visible as quickly as possible,
>>>>>> hence
>>>>>>
>

Re: Increasing Solr5 time out from 30 seconds while starting solr

2015-12-08 Thread Rahul Ramesh
Hi Debraj,
I don't think increasing the timeout will help. Are you sure Solr or some other
program is not already running on 8789? Please check the output of lsof -i :8789 .

Regards,
Rahul

On Tue, Dec 8, 2015 at 11:58 PM, Debraj Manna 
wrote:

> Can someone help me on this?
> On Dec 7, 2015 7:55 PM, "D"  wrote:
>
> > Hi,
> >
> > Many time while starting solr I see the below message and then the solr
> is
> > not reachable.
> >
> > debraj@boutique3:~/solr5$ sudo bin/solr start -p 8789
> > Waiting to see Solr listening on port 8789 [-]  Still not seeing Solr
> listening on 8789 after 30 seconds!
> >
> > However when I try to start solr again by trying to execute the same
> > command. It says that *"solr is already running on port 8789. Try using a
> > different port with -p"*
> >
> > I am having two cores in my local set-up. I am guessing this is happening
> > because one of the core is a little big. So solr is timing out while
> > loading the core. If I take one of the core out of solr then everything
> > works fine.
> >
> > Can some one let me know how can I increase this timeout value from
> > default 30 seconds?
> >
> > I am using Solr 5.2.1 on Debian 7.
> >
> > Thanks,
> >
> >
>


Re: Moving to SolrCloud, specifying dataDir correctly

2015-12-14 Thread Rahul Ramesh
We recently moved data from a magnetic drive to SSD. We run Solr in cloud
mode. Only the data is stored on the drive; the configuration is stored in ZK.
We start Solr using the -s option, specifying the data dir.
Command to start Solr:
./bin/solr start -c -h <host> -p <port> -z <zk hosts> -s <solr data dir>

We followed these steps to migrate the data:

1. Stop all new insertions.
2. Copy the Solr data to the new location.
3. Restart the server with the -s option pointing to the new Solr data directory.
4. We have a 3-node Solr cluster; the restarted server will get in sync
with the other two servers.
5. Repeat this procedure for the other two servers.

We used a similar procedure to upgrade from 5.2.1 to 5.3.1.
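
In shell terms, the steps above are roughly (sketch; paths, ports and host
names are placeholders):

# on each node, one at a time
./bin/solr stop -p 8983
cp -r /old/disk/solr-data /mnt/ssd/solr-data
./bin/solr start -c -h <host> -p 8983 -z <zk1:2181,zk2:2181,zk3:2181> -s /mnt/ssd/solr-data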





On Tue, Dec 15, 2015 at 5:07 AM, Jeff Wartes  wrote:

>
> Don’t set solr.data.dir. Instead, set the install dir. Something like:
> -Dsolr.solr.home=/data/solr
> -Dsolr.install.dir=/opt/solr
>
> I have many solrcloud collections, and separate data/install dirs, and
> I’ve never had to do anything with manual per-collection or per-replica
> data dirs.
>
> That said, it’s been a while since I set this up, and I may not remember
> all the pieces.
> You might need something like this too, for example:
>
> -Djetty.home=/opt/solr/server
>
>
> On 12/14/15, 3:11 PM, "Erick Erickson"  wrote:
>
> >Currently, it'll be a little tedious but here's what you can do (going
> >partly from memory)...
> >
> >When you create the collection, specify the special value EMPTY for
> >createNodeSet (Solr 5.3+).
> >Use ADDREPLICA to add each individual replica. When you do this, you
> >can add a dataDir for
> >each individual replica and thus keep them separate, i.e. for a
> >particular box the first
> >replica would get /data/solr/collection1_shard1_replica1, the second
> >/data/solr/collection1_shard2_replica1 and so forth.
> >
> >If you don't have Solr 5.3+, you can still to the same thing, except
> >you create your collection letting
> >the replicas fall where they will. Then do the ADDREPLICA as above.
> >When that's all done,
> >DELETEREPLICA for the original replicas.
> >
> >Best,
> >Erick
> >
> >On Mon, Dec 14, 2015 at 2:21 PM, Tom Evans 
> >wrote:
> >> On Mon, Dec 14, 2015 at 1:22 PM, Shawn Heisey 
> >>wrote:
> >>> On 12/14/2015 10:49 AM, Tom Evans wrote:
>  When I tried this in SolrCloud mode, specifying
>  "-Dsolr.data.dir=/mnt/solr/" when starting each node, it worked fine
>  for the first collection, but then the second collection tried to use
>  the same directory to store its index, which obviously failed. I fixed
>  this by changing solrconfig.xml in each collection to specify a
>  specific directory, like so:
> 
>    ${solr.data.dir:}products
> 
>  Looking back after the weekend, I'm not a big fan of this. Is there a
>  way to add a core.properties to ZK, or a way to specify
>  core.baseDatadir on the command line, or just a better way of handling
>  this that I'm not aware of?
> >>>
> >>> Since you're running SolrCloud, just let Solr handle the dataDir, don't
> >>> try to override it.  It will default to "data" relative to the
> >>> instanceDir.  Each instanceDir is likely to be in the solr home.
> >>>
> >>> With SolrCloud, your cores will not contain a "conf" directory (unless
> >>> you create it manually), therefore the on-disk locations will be *only*
> >>> data, there's not really any need to have separate locations for
> >>> instanceDir and dataDir.  All active configuration information for
> >>> SolrCloud is in zookeeper.
> >>>
> >>
> >> That makes sense, but I guess I was asking the wrong question :)
> >>
> >> We have our SSDs mounted on /data/solr, which is where our indexes
> >> should go, but our solr install is on /opt/solr, with the default solr
> >> home in /opt/solr/server/solr. How do we change where the indexes get
> >> put so they end up on the fast storage?
> >>
> >> Cheers
> >>
> >> Tom
>
>


Re: solrcloud used a lot of memory and memory keep increasing during long time run

2015-12-15 Thread Rahul Ramesh
You should actually decrease the Solr heap size. Let me explain a bit.

Solr itself requires relatively little heap memory for its operation; it relies
on the OS having free memory to cache the index, because Solr uses mmap for
accessing the index files.
Please check the link
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html for
an explanation of how Solr operates on files.

Solr has the typical garbage collection problem once you set the heap size to a
large value: it will have indeterminate pauses due to GC. The amount of
heap memory required is difficult to predict. The way we tuned this
parameter is by setting it to a low value and increasing it by 1GB whenever
OOM is thrown.

Please check the problems of having a large Java heap:

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap


Just for your reference, in our production setup we have around 60GB of data
per node spread across 25 collections. We have configured 8GB as heap and we
leave the rest of the memory to the OS to manage. We do around 1000
(search + insert) operations/second on the data.

I hope this helps.

Regards,
Rahul



On Tue, Dec 15, 2015 at 4:33 PM, zhenglingyun  wrote:

> Hi, list
>
> I’m new to solr. Recently I encounter a “memory leak” problem with
> solrcloud.
>
> I have two 64GB servers running a solrcloud cluster. In the solrcloud, I
> have
> one collection with about 400k docs. The index size of the collection is
> about
> 500MB. Memory for solr is 16GB.
>
> Following is "ps aux | grep solr” :
>
> /usr/java/jdk1.7.0_67-cloudera/bin/java
> -Djava.util.logging.config.file=/var/lib/solr/tomcat-deployment/conf/logging.properties
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
> -Dsolr.hdfs.blockcache.blocksperbank=16384
> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
> -Xloggc:/var/log/solr/gc.log
> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -DzkHost=
> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
> -Dsolr.authentication.simple.anonymous.allowed=true
> -Dsolr.security.proxyuser.hue.hosts=*
> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
> bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=
> bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
> -Dsolr.max.connector.thread=1 -Dsolr.solr.home=/var/lib/solr
> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
> -Dsolr.hdfs.blockcache.blocksperbank=16384
> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
> -Xloggc:/var/log/solr/gc.log
> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -DzkHost=
> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
> -Dsolr.authentication.simple.anonymous.allowed=true
> -Dsolr.security.proxyuser.hue.hosts=*
> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
> bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=
> bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
> -Dsolr.max.connector.thread=1 -Dsolr.solr.home=/var/lib/solr
> -Djava.endorsed.dirs=/usr/lib/bigtop-tomcat/endorsed -classpath
> /usr/lib/bigtop-tomcat/bin/bootstrap.jar
> -Dcatalina.base=/var/lib/solr/tomcat-deployment
> -Dcatalina.home=/usr/lib/bigtop-tomcat -Djava.io.tmpdir=/var/lib/solr/
> org.apache.catalina.startup.Bootstrap start
>
>

Atomic Update while having fields with attribute stored="true" in schema

2015-02-23 Thread Rahul Bhooteshwar
Hi,
I have around 50 fields in my schema; 20 of them are stored=”true”
and the rest are stored=”false”.
For partial update (atomic update), it is mentioned in many places
that the fields in the schema should have stored=”true”. I have also tried
an atomic update on documents having fields with stored="false" and
indexed="true", and it didn't work (my whole document vanished from Solr, or
at least I am unable to search it now). I didn't change the
existing values of the fields having stored="false".
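
For context, a rough sketch of the kind of partial update I am sending via
SolrJ (the "id" and "title" field names are just examples from my side):

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "some-existing-id");
Map<String, Object> partial = new HashMap<String, Object>();
partial.put("set", "new title value"); // atomic "set" operation
doc.addField("title", partial);
server.add(doc);
server.commit();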

Which means I have to change all my fields to stored=”true” if I want to
use atomic update, right?
Will it affect the performance of Solr? If yes, then what is the best
practice to reduce the performance degradation as much as possible? Thanks in
advance.

Thanks and Regards,
Rahul Bhooteshwar
Enterprise Software Engineer
HotWax Systems <http://www.hotwaxsystems.com> - The global leader in
innovative enterprise commerce solutions powered by Apache OFBiz.
ApacheCon US 2014 Silver Sponsor


Re: Atomic Update while having fields with attribute stored="true" in schema

2015-02-23 Thread Rahul Bhooteshwar
Hi Yago Riveiro,
Thanks for your quick reply. I am using Solr for faceted search using SolrJ.
I am using facet queries and filter queries. I am new to Solr, so I would
like to know the best practice to handle such scenarios.

Thanks and Regards,
Rahul Bhooteshwar
Enterprise Software Engineer
HotWax Systems <http://www.hotwaxsystems.com> - The global leader in
innovative enterprise commerce solutions powered by Apache OFBiz.
ApacheCon US 2014 Silver Sponsor

On Mon, Feb 23, 2015 at 5:42 PM, Yago Riveiro 
wrote:

> "Which means I have to change all my fields to stored=”true” if I want to
>
> use atomic update.Right?”
>
>
>
>
> Yes, and re-index all your data.
>
>
>
>
> "Will it affect the performance of the Solr?”
>
>
>
>
> What type of queries are you doing now?
>
>
> —
> /Yago Riveiro
>
> On Mon, Feb 23, 2015 at 12:05 PM, Rahul Bhooteshwar
>  wrote:
>
> > Hi,
> > I have around 50 fields in my schema and having 20 fields are
> stored=”true”
> > and rest of them stored=”false”
> > In case partial update (atomic update), it is  mentioned at many places
> > that the fields in schema should have stored=”true”. I have also tried
> > atomic update on documents having fields with stored="false" and
> > indexed="true", and it didn't work (My whole document vanished from solr
> or
> > I am unable to search it now, whatever.). Although I didn't change the
> > existing value for the fields having stored="false".
> > Which means I have to change all my fields to stored=”true” if I want to
> > use atomic update.Right?
> > Will it affect the performance of the Solr? if yes, then what is the best
> > practice to reduce performance degradation as much as possible?Thanks in
> > advance.
> > Thanks and Regards,
> > Rahul Bhooteshwar
> > Enterprise Software Engineer
> > HotWax Systems <http://www.hotwaxsystems.com> - The global leader in
> > innovative enterprise commerce solutions powered by Apache OFBiz.
> > ApacheCon US 2014 Silver Sponsor
>


solr.war built from solr 4.7.2 not working

2015-05-07 Thread Rahul Singh
Hi,
  I have tried to deploy a solr.war built from 4.7.2, but it is
showing the error mentioned below. Has anyone faced the same? Any lead
would also be appreciated.

Error Message:

{
  "responseHeader": {
"status": 500,
"QTime": 33
  },
  "error": {
"msg": "parsing error",
"trace":
"org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
parsing error
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:477)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:157)
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: parsing error
at
org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:45)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:475)
... 9 more
Caused by: java.io.EOFException
at
org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:193)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:172)
at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:477)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
at
org.apache.solr.common.util.JavaBinCodec.readSolrDocumentList(JavaBinCodec.java:359)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
at
org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:125)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116)
at
org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:43)
... 10 more
",
"code": 500
  }
}


Thanks and Regards,


Re: solr.war built from solr 4.7.2 not working

2015-05-07 Thread Rahul Singh
response inline.

On Thu, May 7, 2015 at 7:01 PM, Shawn Heisey  wrote:

> On 5/7/2015 3:43 AM, Rahul Singh wrote:
> >   I have tried to deploy solr.war from building it from 4.7.2 but it is
> > showing the below mentioned error. Has anyone faced the same? any lead
> > would also be appreciated.
> >
> > Error Message:
> >
> > {
> >   "responseHeader": {
> > "status": 500,
> > "QTime": 33
> >   },
> >   "error": {
> > "msg": "parsing error",
> > "trace":
> > "org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> > parsing error
>
> Did you change the source code in any way before you compiled it?  You
> haven't said what you're actually doing that resulted in this error, or
> given any other details about your setup.  It's good that you've given
> us the full response with the error, but additional details, like the
> request that generated the error and any errors found in the Solr log,
> are important.
>
> just made a few build file changes to include my jar for overriding Lucene's
default similarity.

logs showing following error...

ERROR - 2015-05-08 11:15:25.738; org.apache.solr.common.SolrException;
null:java.lang.IllegalArgumentException: You cannot set an index-time bo
ost on an unindexed field, or one that omits norms
at org.apache.lucene.document.Field.setBoost(Field.java:452)
at
org.apache.lucene.document.DocumentStoredFieldVisitor.stringField(DocumentStoredFieldVisitor.java:75)
at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.readField(CompressingStoredFieldsReader.java:187)
at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:351)
at
org.apache.lucene.index.SegmentReader.document(SegmentReader.java:287)
at
org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:110)
at
org.apache.lucene.index.IndexReader.document(IndexReader.java:446)
at
org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:659)
at
org.apache.solr.response.BinaryResponseWriter$Resolver.writeResultsBody(BinaryResponseWriter.java:147)
at
org.apache.solr.response.BinaryResponseWriter$Resolver.writeResults(BinaryResponseWriter.java:174)
at
org.apache.solr.response.BinaryResponseWriter$Resolver.resolve(BinaryResponseWriter.java:87)
at
org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:158)
at
org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:148)
at
org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:242)
at
org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:153)
at
org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:96)
at
org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:51)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:749)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:428)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:205)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)

Because the error comes from HttpSolrServer and is embedded in a Solr
> response, I'm guessing this is a distributed request ... but I can't
> tell if it's SolrCloud or "manual" sharding.
>
It's SolrCloud sharding; it is a SolrCloud implementation with two
nodes, both of them using the same war.


> With no other information to go on, I do have some possible ideas:
>
> You might have changed something fundamental in the source code that
> makes the distributed request incompatible with the target core/server.
>
> There might be mixed versions ... either multiple copi

Inconsistent parsing of pure negative queries inside brackets

2016-07-04 Thread Rahul Verma
Hi everyone,

While tracing a bug in one of our systems we noticed some interesting
behavior from Solr.

These two queries return different results. I fail to understand why the
second query returns empty results just by adding brackets. Can you please
help us understand this behavior?
1. Without brackets, fq=-fl_monitoring_channel: 36 AND (title: salesforce)

{ "responseHeader": { "status": 0, "QTime": 0,
    "params": { "q": "*:*", "indent": "true",
      "fq": "-fl_monitoring_channel: 36 AND (title: salesforce)",
      "wt": "json", "_": "1467637035433" } },
  "response": { "numFound": 35541, "start": 0, "docs": [...

2. With brackets, fq=(-fl_monitoring_channel: 36) AND (title: salesforce)

{ "responseHeader": { "status": 0, "QTime": 0,
    "params": { "q": "*:*", "indent": "true",
      "fq": "(-fl_monitoring_channel: 36) AND (title: salesforce)",
      "wt": "json", "_": "1467637344339" } },
  "response": { "numFound": 0, "start": 0, "docs": [] } }


'Minimum Should Match' on subquery level

2016-12-13 Thread Rahul Lodha
Hi Myron,

Can you give me an example of this?

http://grokbase.com/t/lucene/solr-user/105jjpxa2x/minimum-should-match-on-subquery-level
 
<http://grokbase.com/t/lucene/solr-user/105jjpxa2x/minimum-should-match-on-subquery-level>

Regards,
Rahul

Re: ranking retrieval measure

2014-04-01 Thread Rahul Singh
one of the measurement criteria is DCG.
http://en.wikipedia.org/wiki/Discounted_cumulative_gain



On Tue, Apr 1, 2014 at 11:44 AM, Floyd Wu  wrote:

> Usually IR system is measured using Precision & Recall.
> But depends on what kind of system you are developing to fit what scenario.
>
> Take a look
> http://en.wikipedia.org/wiki/Precision_and_recall
>
>
>
> 2014-04-01 10:23 GMT+08:00 azhar2007 :
>
> > Hi people. Ive developed a search engine to implement and improve it
> using
> > another search engine as a test case. Now I want to compare and test
> > results
> > from both to determine which is better. I am unaware of how to do this so
> > someone please point me in the right direction.
> >
> > Regards
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/ranking-retrieval-measure-tp4128324.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


facet.missing=true returns null records with zero count also

2013-06-04 Thread Rahul R
All,
We had a requirement in our solr powered application where customers want
to see all the documents that have a blank value for a field. So when they
facet on a field, if the field has null values, they should be able to select
that facet value and see all documents. I thought facet.missing=true was
the answer.

When I set facet.missing=true in solrconfig.xml, I expected to get facet
values that are null along with their count. However, when there is no null
value, I do not want the null to be returned along with a count of zero,
which is what is happening now.

Background information: Using SolrJ with Solr 3.4 and jdk7
Sample program
SolrQuery facquery = new SolrQuery();
facquery.setQuery("*:*");
facquery.addFilterQuery("Field2:\"ISC\"");
facquery.setRows(0);
facquery.setFacet(true);
facquery.setFacetMinCount(1);
facquery.setFacetLimit(2);
String[] orderedFacetList = new String[] {"Field1", "Field2", "Field3"};
for (int i = 0; i < orderedFacetList.length; i++) {
    facquery.addFacetField(orderedFacetList[i]);
}
QueryResponse facResponse = null;
try {
    facResponse = server.query(facquery);
} catch (SolrServerException ex) {
    // error handling omitted for brevity
}
FacetField ff1 = facResponse.getFacetField("Field2");
int count = ff1.getValueCount();                // This gives a count of 2
List<FacetField.Count> flist = ff1.getValues(); // The values are [ISC (1077), null (0)]

In the above program, I am applying a filter on the field Field2 with the
value ISC, so the results will only be documents that have ISC for Field2.
My expectation is that flist in the above program should only contain [ISC
(1077)].

Appreciate any pointers on this. Thank you

- Rahul


Re: facet.missing=true returns null records with zero count also

2013-06-05 Thread Rahul R
Hoss,
We rely heavily on facet.mincount because once a user has selected a facet,
it doesn't make sense for us to show that facet field to him and let him
filter again with the same facet. Also, when a facet has only one value, it
doesn't make sense to show it to the user, since searching with that facet
is just going to give the same result set again. So when facet.missing does
not work with facet.mincount, it is a bit of a hassle for us. We will work
on handling it in our program. Thank you for the clarification.
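
In case it helps, the client-side handling we have in mind is roughly this
(just a sketch against the SolrJ facet response):

for (FacetField.Count c : ff1.getValues()) {
    // skip the facet.missing bucket when it carries a zero count,
    // since facet.mincount does not filter it out for us
    if (c.getName() == null && c.getCount() == 0) {
        continue;
    }
    // ... show the constraint to the user as usual
}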

- Rahul


On Wed, Jun 5, 2013 at 12:32 AM, Chris Hostetter
wrote:

>
> : that facet value and see all documents. I thought facet.missing=true was
> : the answer.
> ...
> : facquery.setFacetMinCount(1);
>
> Hmm, yeah -- it looks like facet.missing doesn't take facet.mincount into
> consideration.
>
> I don't remember if that was intentional or not, but as a special case
> one-off count it seems like a toss up as to wether it would be more or
> less surprising to hide it if it's below the mincount. (it's very similar
> to doing one off facet.query for example, and those are always included in
> the response and don't consider the facet.mincount either)
>
> In general, this seems like a low impact thing though, correct?  i mean:
> the main advantage of facet.mincount is to reduce what could be a very
> large amount of useless data from being stream from the server->client,
> particularly in the case of using facet.sort where you really need the
> consraints eliminated server side in order to get the sort=limit applied
> correctly.
>
> but with the facet.missing value, it's just a single value per field that
> can easily be ignored by the client if it's not desired because of the
> mincount.  or to put it another way: the amount of work needed to ignor
> this on the client, is less then the amount of work to make it
> configurable to ignore it on the server.
>
>
> -Hoss
>


OR query with null value and non-null value(s)

2013-06-06 Thread Rahul R
I have recently enabled facet.missing=true in solrconfig.xml which gives
null facet values also. As I understand it, the syntax to do a faceted
search on a null value is something like this:
&fq=-price:[* TO *]
So when I want to search on a particular value (for example : 4)  OR null
value, I would expect the syntax to be something like this:
&fq=(price:4+OR+(-price:[* TO *]))
But this does not work. After searching around some more, I read somewhere
that the right way to achieve this would be:
fq=-(-price:4+AND+price:[*+TO+*])
Now this does work but seems like a very roundabout way. Is there a better
way to achieve this ?

I use solrJ in Solr 3.4.

Thank you.

- Rahul


Re: OR query with null value and non-null value(s)

2013-06-06 Thread Rahul R
Thank you Shawn. This does work. To help me understand better, why do
we need the *:* ? Shouldn't it be implicit ?
Shouldn't
fq=(price:4+OR+(-price:[* TO *]))  //does not work
mean the same as
fq=(price:4+OR+(*:* -price:[* TO *]))   //works

Why does Solr need the *:* there ?




On Fri, Jun 7, 2013 at 12:07 AM, Shawn Heisey  wrote:

> On 6/6/2013 12:28 PM, Rahul R wrote:
>
>> I have recently enabled facet.missing=true in solrconfig.xml which gives
>> null facet values also. As I understand it, the syntax to do a faceted
>> search on a null value is something like this:
>> &fq=-price:[* TO *]
>> So when I want to search on a particular value (for example : 4)  OR null
>> value, I would expect the syntax to be something like this:
>> &fq=(price:4+OR+(-price:[* TO *]))
>> But this does not work. After searching around for more, read somewhere
>> that the right way to achieve this would be:
>> fq=-(-price:4+AND+price:[*+TO+*])
>> Now this does work but seems like a very roundabout way. Is there a better
>> way to achieve this ?
>>
>
> Pure negative queries don't work -- you have to have results in the query
> before you can subtract.  For some top-level queries, Solr is able to
> detect this situation and fix it internally, but on inner queries you must
> explicitly state your intentions.  It is best if you always use '*:*
> -query' syntax, just to be safe.
>
> fq=(price:4+OR+(*:* -price:[* TO *]))
>
> Thanks,
> Shawn
>
>


Re: OR query with null value and non-null value(s)

2013-06-07 Thread Rahul R
Thank you for the Clarification Shawn.


On Fri, Jun 7, 2013 at 7:34 PM, Jack Krupansky wrote:

> Yes, it SHOULD! And in the LucidWorks Search query parser it does. Why
> doesn't it in Solr? Ask Yonik to explain that!
>
> -- Jack Krupansky
>
> -Original Message- From: Rahul R
> Sent: Friday, June 07, 2013 1:21 AM
> To: solr-user@lucene.apache.org
> Subject: Re: OR query with null value and non-null value(s)
>
>
> Thank you Shawn. This does work. To help me understand better, why do
> we need the *:* ? Shouldn't it be implicit ?
> Shouldn't
> fq=(price:4+OR+(-price:[* TO *]))  //does not work
> mean the same as
> fq=(price:4+OR+(*:* -price:[* TO *]))   //works
>
> Why does Solr need the *:* there ?
>
>
>
>
> On Fri, Jun 7, 2013 at 12:07 AM, Shawn Heisey  wrote:
>
>  On 6/6/2013 12:28 PM, Rahul R wrote:
>>
>>  I have recently enabled facet.missing=true in solrconfig.xml which gives
>>> null facet values also. As I understand it, the syntax to do a faceted
>>> search on a null value is something like this:
>>> &fq=-price:[* TO *]
>>> So when I want to search on a particular value (for example : 4)  OR null
>>> value, I would expect the syntax to be something like this:
>>> &fq=(price:4+OR+(-price:[* TO *]))
>>> But this does not work. After searching around for more, read somewhere
>>> that the right way to achieve this would be:
>>> fq=-(-price:4+AND+price:[*+TO+*])
>>>
>>> Now this does work but seems like a very roundabout way. Is there a
>>> better
>>> way to achieve this ?
>>>
>>>
>> Pure negative queries don't work -- you have to have results in the query
>> before you can subtract.  For some top-level queries, Solr is able to
>> detect this situation and fix it internally, but on inner queries you must
>> explicitly state your intentions.  It is best if you always use '*:*
>> -query' syntax, just to be safe.
>>
>> fq=(price:4+OR+(*:* -price:[* TO *]))
>>
>> Thanks,
>> Shawn
>>
>>
>>
>


FastVectorHighlighter with wildcard queries

2011-09-08 Thread Rahul Warawdekar
Hi,

I am currently evaluating the FastVectorHighlighter in a Solr search based
project and have a couple of questions

1. Is there any specific reason why the FastVectorHighlighter does not
provide support for multiterm(wildcard) queries ?
2. What are the other constraints when using FastVectorHighlighter ?

-- 
Thanks and Regards
Rahul A. Warawdekar


Re: FastVectorHighlighter with wildcard queries

2011-09-12 Thread Rahul Warawdekar
Hi Koji,

Thanks for the information !
I will try the patches provided by you.

On 9/8/11, Koji Sekiguchi  wrote:
> (11/09/09 6:16), Rahul Warawdekar wrote:
>> Hi,
>>
>> I am currently evaluating the FastVectorHighlighter in a Solr search based
>> project and have a couple of questions
>>
>> 1. Is there any specific reason why the FastVectorHighlighter does not
>> provide support for multiterm(wildcard) queries ?
>> 2. What are the other constraints when using FastVectorHighlighter ?
>>
>
> FVH used to have typical constraints:
>
> 1. supports only TermQuery and PhraseQuery (and
> BooleanQuery/DisjunctionMaxQuery that
> include TQ and PQ)
> 2. ignores word boundary
>
> But now for 1, FVH will support other queries:
>
> https://issues.apache.org/jira/browse/LUCENE-1889
>
> I believe it is almost closed to be fixed. For 2, FVH in the latest
> trunk/3x, pays
> regard to word or sentence boundary through BoundaryScanner:
>
> https://issues.apache.org/jira/browse/LUCENE-1824
>
> koji
> --
> Check out "Query Log Visualizer" for Apache Solr
> http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
> http://www.rondhuit.com/en/
>


-- 
Thanks and Regards
Rahul A. Warawdekar


Solr: Return field names that contain search term

2011-09-12 Thread Rahul Warawdekar
Hi,

I have a query on Solr search as follows.

I am indexing an entity which includes a multivalued field using DIH.
This multivalued field contains content from multiple attachments for
a single entity.

Now, for example, if I search for the term "solr", will I be able to know
which field contains this search term ?
And if it is a multivalued field, which value (position) in that
multivalued field contains the search term ?

Currently, to achieve this, I am using a workaround using the
highlighting feature.
I am indexing all the attachments within a single entity and
document as dynamic fields ending in "_i".

While searching, I am highlighting on these dynamic fields (hl.fl=*_i)
and from the highlighting section in the results, I am able to get the
attachment number which contains the search term.
But since this approach involves highlighting large attachments, the
search response times are very slow.
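
For reference, the workaround query looks roughly like this (host, port and
request handler are placeholders for my setup):

http://localhost:8080/solr/select?q=solr&hl=true&hl.fl=*_i&hl.snippets=1

and I then inspect the highlighting section of the response to see which
"_i" field produced a snippet.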

Would highly appreciate if someone can suggest other efficient ways to
address this kind of a requirement.

-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Solr: Return field names that contain search term

2011-09-12 Thread Rahul Warawdekar
Thanks Chris !

Will try out the second approach you suggested and share my findings.

On Mon, Sep 12, 2011 at 5:03 PM, Chris Hostetter
wrote:

>
> : > Would highly appreciate if someone can suggest other efficient ways to
> : > address this kind of a requirement.
>
> one approach would be to index each attachment as it's own document and
> search those.  you could then use things like the group collapsing
> features to return only the "main" type documents when multiple
> attachments match.
>
> similarly: you could still index each "main" document with a giant
> text field containing all of the attachment text, *and* you could indx
> each attachment as it's own document.  You would search on the main docs
> as you do now, but then your app could issue a secondary request searching
> for all  "attachment" docs that match on one of the main docIds in a
> special field, and use the results to note which attachment of each doc
> (if any) caused the match.
>
> -Hoss
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: DIH delta last_index_time

2011-09-14 Thread Rahul Warawdekar
Hi Maria/Gora,

I see this as more of a problem with the timezones in which the Solr server
and the database server are located.
Is this true ?
If yes, one more possibility of handling this scenario would be to customize
the DataImportHandler code as follows:

1. Add one more configuration property named "dbTimeZone" at the entity
level in the "data-config.xml" file
2. While saving the lastIndexTime in the properties file, save it according
to the timezone specified in the config so that it is in sync with the
database server time.

Basically, customize the code so that all the time-related updates to the
dataimport.properties file are timezone specific.
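
Just to illustrate the idea, the entity definition might look roughly like
this ("dbTimeZone" below is the proposed custom attribute, not something DIH
supports out of the box; table and column names are placeholders):

<entity name="item" dbTimeZone="America/New_York"
        query="select * from item"
        deltaQuery="select id from item
                    where last_modified > '${dataimporter.last_index_time}'">
    ...
</entity>

The customized handler would then convert last_index_time into the configured
timezone before writing it to dataimport.properties and before substituting
it into the deltaQuery.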


On Wed, Sep 14, 2011 at 4:31 AM, Gora Mohanty  wrote:

> On Wed, Sep 14, 2011 at 11:23 AM, Maria Vazquez
>  wrote:
> > Hi,
> > How do you handle the situation where the time on the server running Solr
> > doesn¹t match the time in the database?
>
> Firstly, why is that the case? NTP is pretty universal
> these days.
>
> > I¹m using the last_index_time saved by Solr in the delta query checking
> it
> > against lastModifiedDate field in the database but the times are not in
> sync
> > so I might lose some changes.
> > Can we use something else other than last_index_time? Maybe something
> like
> > last_pk or something.
>
> One possible way is to edit dataimport.properties, manually or through
> a script, to put the last_index_time back to a "safe" value.
>
> Regards,
> Gora
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Index not getting refreshed

2011-09-14 Thread Rahul Warawdekar
Hi Pawan,

Can you please share more details on the indexing mechanism ? (DIH,  SolrJ
or any other)
Please let us know the configuration details.


On Wed, Sep 14, 2011 at 12:48 PM, Pawan Darira wrote:

> Hi
>
> I am using Solr 3.2 on a live website. i get live user's data of about 2000
> per day. I do an incremental index every 8 hours. but my search results
> always show the same result with same sorting order. when i check the same
> search from corresponding db, it gives me different results always (as new
> data regularly gets added)
>
> please suggest what might be the issue. is there any cache related problem
> at SOLR level
>
> thanks
> pawan
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: How to get the fields that match the request?

2011-09-22 Thread Rahul Warawdekar
Hi,

Before considering highlighting to address this requirement, you also need
to consider the performance implications of highlighting for large text
fields.

On Thu, Sep 22, 2011 at 11:42 AM, Nicolas Martin wrote:

> yes, highlights can help to do that, but if you want to paginate your
> results, you can't use hl.
>
> It'd be great to have a scoring average by fields...
>
>
>
>
>
> On 22/09/2011 17:37, Tanner Postert wrote:
>
>> this would be useful to me as well.
>>
>> even when searching with q=test, I know it defaults to the default search
>> field, but it would be helpful to know what field(s) match the query term.
>>
>> On Thu, Sep 22, 2011 at 3:29 AM, Nicolas Martin**
>> wrote:
>>
>>
>>
>>> Hi everyBody,
>>>
>>> I need your help to get more information in my solR query's response.
>>>
>>> i've got a simple input text which allows me to query several fields in
>>> the
>>> same query.
>>>
>>> So my query  looks like this
>>> "q=email:martyn+OR+name:****martynn+OR+commercial:martyn ..."
>>>
>>> Is it possible in the response to know the fields where "martynn" has
>>> been
>>> found ?
>>>
>>> Thanks a Lot :-)
>>>
>>>
>>>
>>
>>
>
>


-- 
Thanks and Regards
Rahul A. Warawdekar


Re: JdbcDataSource and threads

2011-09-22 Thread Rahul Warawdekar
Hi,

Have you applied the patch that is provided with the Jira you mentioned ?
https://issues.apache.org/jira/browse/SOLR-2233

Please apply the patch and check if you are getting the same exceptions.
It has worked well for me till now.

On Thu, Sep 22, 2011 at 3:17 PM, Vazquez, Maria (STM) <
maria.vazq...@dexone.com> wrote:

> Hi!
>
> So as of 3.4 JdbcDataSource doesn't work with threads, correct?
>
>
>
> https://issues.apache.org/jira/browse/SOLR-2233
>
>
>
> I'm using Microsoft SQL Server, my data-config.xml has a lot of very
> complex SQL queries and it takes a long time to index.
>
> I'm migrating from Lucene to Solr and the Lucene code uses threads so it
> takes little time to index, now in Solr if I add threads=xx to my
> rootEntity I get lots of errors about connections being closed.
>
>
>
> Thanks a lot,
>
> Maria
>
>


-- 
Thanks and Regards
Rahul A. Warawdekar


Re: JdbcDataSource and threads

2011-09-23 Thread Rahul Warawdekar
I am using Solr 3.1.
But you can surely try the patch with 3.3.

On Fri, Sep 23, 2011 at 1:35 PM, Vazquez, Maria (STM) <
maria.vazq...@dexone.com> wrote:

> Thanks Rahul.
> Are you using 3.3 or 3.4? I'm on 3.3 right now
> I will try the patch today
> Thanks again,
> Maria
>
>
> -Original Message-
> From: Rahul Warawdekar [mailto:rahul.warawde...@gmail.com]
> Sent: Thursday, September 22, 2011 12:46 PM
> To: solr-user@lucene.apache.org
> Subject: Re: JdbcDataSource and threads
>
> Hi,
>
> Have you applied the patch that is provided with the Jira you mentioned
> ?
> https://issues.apache.org/jira/browse/SOLR-2233
>
> Please apply the patch and check if you are getting the same exceptions.
> It has worked well for me till now.
>
> On Thu, Sep 22, 2011 at 3:17 PM, Vazquez, Maria (STM) <
> maria.vazq...@dexone.com> wrote:
>
> > Hi!
> >
> > So as of 3.4 JdbcDataSource doesn't work with threads, correct?
> >
> >
> >
> > https://issues.apache.org/jira/browse/SOLR-2233
> >
> >
> >
> > I'm using Microsoft SQL Server, my data-config.xml has a lot of very
> > complex SQL queries and it takes a long time to index.
> >
> > I'm migrating from Lucene to Solr and the Lucene code uses threads so
> it
> > takes little time to index, now in Solr if I add threads=xx to my
> > rootEntity I get lots of errors about connections being closed.
> >
> >
> >
> > Thanks a lot,
> >
> > Maria
> >
> >
>
>
> --
> Thanks and Regards
> Rahul A. Warawdekar
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Solr stopword problem in Query

2011-09-26 Thread Rahul Warawdekar
Hi Isan,

Does your search return any documents when you remove the 'at' keyword and
just search for "Coke studio MTV" ?
Also, can you please provide the snippet of schema.xml file where you have
mentioned this field name and its "type" description ?

On Mon, Sep 26, 2011 at 6:09 AM, Isan Fulia wrote:

> Hi all,
>
> I have a text field named* textForQuery* .
> Following content has been indexed into solr in field textForQuery
> *Coke Studio at MTV*
>
> when i fired the query as
> *textForQuery:("coke studio at mtv")* the results showed 0 documents
>
> After runing the same query in debugMode i got the following results
>
> 
> 
> textForQuery:("coke studio at mtv")
> textForQuery:("coke studio at mtv")
> PhraseQuery(textForQuery:"coke studio ? mtv")
> textForQuery:"coke studio *? *mtv"
>
> Why the query did not matched any document even when there is a document
> with value of textForQuery as *Coke Studio at MTV*?
> Is this because of the stopword *at* present in stopwordList?
>
>
>
> --
> Thanks & Regards,
> Isan Fulia.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Solr stopword problem in Query

2011-09-27 Thread Rahul Warawdekar
Hi Isan,

The schema.xml seems OK to me.

Is "textForQuery" the only field you are searching in ?
Are you also searching on any other non text based fields ? If yes, please
provide schema description for those fields also.
Also, provide your solrconfig.xml file.


On Tue, Sep 27, 2011 at 1:12 AM, Isan Fulia wrote:

> Hi Rahul,
>
> I also tried searching "Coke Studio MTV" but no documents were returned.
>
> Here is the snippet of my schema file.
>
>   positionIncrementGap="100" autoGeneratePhraseQueries="true">
>
>  
>
>
>ignoreCase="true"
>
>words="stopwords_en.txt"
>enablePositionIncrements="true"
>
>/>
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>
>
>
> protected="protwords.txt"/>
>
>
>  
>
>  
>
>
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>
>ignoreCase="true"
>
>words="stopwords_en.txt"
>enablePositionIncrements="true"
>
>/>
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>
>
>
> protected="protwords.txt"/>
>
>
>  
>
>
>
>
> * multiValued="false"/>
>  multiValued="false"/>
>
> ** multiValued="true" omitTermFreqAndPositions="true"/>**
>
> 
> *
>
>
> Thanks,
> Isan Fulia.
>
>
> On 26 September 2011 21:19, Rahul Warawdekar  >wrote:
>
> > Hi Isan,
> >
> > Does your search return any documents when you remove the 'at' keyword
> and
> > just search for "Coke studio MTV" ?
> > Also, can you please provide the snippet of schema.xml file where you
> have
> > mentioned this field name and its "type" description ?
> >
> > On Mon, Sep 26, 2011 at 6:09 AM, Isan Fulia  > >wrote:
> >
> > > Hi all,
> > >
> > > I have a text field named* textForQuery* .
> > > Following content has been indexed into solr in field textForQuery
> > > *Coke Studio at MTV*
> > >
> > > when i fired the query as
> > > *textForQuery:("coke studio at mtv")* the results showed 0 documents
> > >
> > > After runing the same query in debugMode i got the following results
> > >
> > > 
> > > 
> > > textForQuery:("coke studio at mtv")
> > > textForQuery:("coke studio at mtv")
> > > PhraseQuery(textForQuery:"coke studio ?
> > mtv")
> > > textForQuery:"coke studio *?
> *mtv"
> > >
> > > Why the query did not matched any document even when there is a
> document
> > > with value of textForQuery as *Coke Studio at MTV*?
> > > Is this because of the stopword *at* present in stopwordList?
> > >
> > >
> > >
> > > --
> > > Thanks & Regards,
> > > Isan Fulia.
> > >
> >
> >
> >
> > --
> > Thanks and Regards
> > Rahul A. Warawdekar
> >
>
>
>
> --
> Thanks & Regards,
> Isan Fulia.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Trouble configuring multicore / accessing admin page

2011-09-28 Thread Rahul Warawdekar
Hi Joshua,

Can you try updating your solr.xml as follows:
Specify
'<core name="core0" instanceDir="core0" />' instead of
'<core name="core0" instanceDir="cores/core0" />'

Basically, remove the extra "cores" path prefix from the instanceDir
attribute of the core element.

Just try and let us know if it works.

On Wed, Sep 28, 2011 at 3:40 PM, Joshua Miller wrote:

> Hello,
>
> I am trying to get SOLR working with multiple cores and have a problem
> accessing the admin page once I configure multiple cores.
>
> Problem:
> When accessing the admin page via http://solrhost:8080/solr/admin, I get a
> 404, "missing core name in path".
>
> Question:  when using the multicore option, is the standard admin page
> still available?
>
> Environment:
> - solr 1.4.1
> - Windows server 2008 R2
> - Java SE 1.6u27
> - Tomcat 6.0.33
> - Solr Experience:  none
>
> I have set -Dsolr.solr.home=c:\solr and within that I have a solr.xml with
> the following contents:
>
> <solr>
>   <cores adminPath="/admin/cores">
>     <core name="core0" instanceDir="cores/core0" />
>     <core name="core1" instanceDir="cores/core1" />
>   </cores>
> </solr>
>
> I have copied the example/solr directory to c:\solr and have populated that
> directory with the cores/{core{0,1}} as well as the proper configs and data
> directories within.
>
> When I restart tomcat, it shows a couple of exceptions related to
> queryElevationComponent and null pointers that I think are due to the DB not
> yet being available but I see that the cores appear to initialize properly
> other than that
>
> So the problem I'm looking to solve/clarify here is the admin page - should
> that remain available and usable when using the multicore configuration or
> am I doing something wrong?  Do I need to use the CoreAdminHandler type
> requests to manage multicore instead?
>
> Thanks,
> --
> Josh Miller
> Open Source Solutions Architect
> (425) 737-2590
> http://itsecureadmin.com/
>
>


-- 
Thanks and Regards
Rahul A. Warawdekar


Architecture and Capacity planning for large Solr index

2011-10-11 Thread Rahul Warawdekar
Hi All,

I am working on a Solr search based project, and would highly appreciate
help/suggestions from you all regarding Solr architecture and capacity
planning.
Details of the project are as follows

1. There are 2 databases from which, data needs to be indexed and made
searchable,
- Production
- Archive
2. Production database will retain 6 months old data and archive data every
month.
3. Archive database will retain 3 years old data.
4. Database is SQL Server 2008 and Solr version is 3.1

Data to be indexed contains a huge volume of attachments (PDF, Word, excel
etc..), approximately 200 GB per month.
We are planning to do a full index every month (multithreaded) and
incremental indexing on a daily basis.
The Solr index size is coming to approximately 25 GB per month.

If we were to use distributed search, what would be the best configuration
for Production as well as Archive indexes ?
What would be the best CPU/RAM/Disk configuration ?
How can I implement failover mechanism for sharded searches ?

Please let me know in case I need to share more information.


-- 
Thanks and Regards
Rahul A. Warawdekar


Issue with Shard configuration in solrconfig.xml (Solr 3.1)

2011-10-20 Thread Rahul Warawdekar
Hi,

I am trying to evaluate distributed search for my project by splitting up
our single index on 2 shards with Solr 3.1
When I query the first solr server by passing the "shards" parameter, I get
correct search results from both shards.
(
http://server1:8080/solr/test/select/?shards=server1:8080/solr/test,server2:8080/solr/test&q=solr&start=0&rows=20
)

I want to avoid the use of this shards parameter in the http url and specify
it in solrconfig.xml as follows.

<requestHandler ... >
  <lst name="defaults">
    <str name="shards">server1:8080/solr/test,server2:8080/solr/test</str>
    ..
  </lst>
</requestHandler>

After adding the shards parameter in solrconfig.xml, I get search results
only from the first shard and not from the from the second one.
Am I missing any configuration ?

Also, can the urls with the shard parameter be load balanced for a failover
mechanism ?



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Ordered proximity search

2011-11-04 Thread Rahul Warawdekar
Hi Thomas,

Do you always need the ordered proximity search by default ?
You may want to check SpanNearQuery at:
http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/

We are using edismax query parser provided by Solr.
I had a similar type of requirement in our project in here is how we
addressed it

1. Wrote a customized query parser similar to edismax.
2. Identified the method in the code which takes care of "PhraseQuery" and
replaced it with a snippet of "SpanNearQuery" code.

Please check more on SpanNearQuery if that works for you.
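
To give a rough idea, the replacement snippet is along these lines (field
name, terms and slop are placeholders; the third constructor argument,
inOrder = true, is what makes the match order-sensitive):

SpanQuery first = new SpanTermQuery(new Term("content", "term1"));
SpanQuery second = new SpanTermQuery(new Term("content", "term2"));
// slop = 10, inOrder = true => "term1" must occur before "term2"
SpanNearQuery near = new SpanNearQuery(new SpanQuery[] { first, second }, 10, true);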



On Thu, Nov 3, 2011 at 2:11 PM, LT.thomas  wrote:

> Hi,
>
> By ordered I mean term1 will always come before term2 in the document.
>
> I have two documents:
> 1. "By ordered I mean term1 will always come before term2 in the document"
> 2. "By ordered I mean term2 will always come before term1 in the document"
>
> if I make the query:
>
> "term1 term2"~Integer.MAX_VALUE
>
> my results is: 2 documents
>
> How can I query to have one result (only if term1 come before term2):
> "By ordered I mean term1 will always come before term2 in the document"
>
> Thanks
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Ordered-proximity-search-tp3477946p3477946.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


License Info

2011-11-11 Thread Rahul R
Hello,
Since Apache Solr is governed by Apache License 2.0 - does it mean that all
jar files bundled within Solr are also governed by the same License ? Do I
have to worry about checking the License information of all bundled jar
files in my commercial Solr powered application ?

Even if I use them independent of Solr, will the same License apply ? Some
of the jar files - slf4j-api-1.6.1.jar, jcl-over-slf4j-1.6.1.jar etc - do
not have any License file inside the jar.

Regards
Rahul


Re: Architecture and Capacity planning for large Solr index

2011-11-21 Thread Rahul Warawdekar
Thanks !

My business requirements have changed a bit.
We need one year rolling data in Production.
The index size for the same comes to approximately 200 - 220 GB.
I am planning to address this using Solr distributed search as follows.

1. Whole index to be split up between 3 shards, with 3 masters and 6 slaves
(load balanced)
2. Master configuration
 will be 4 CPU


On Tue, Oct 11, 2011 at 2:05 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> Hi Rahul,
>
> This is unfortunately not enough information for anyone to give you very
> precise answers, so I'll just give some rough ones:
>
> * best disk - SSD :)
> * CPU - multicore, depends on query complexity, concurrency, etc.
> * sharded search and failover - start with SolrCloud, there are a couple
> of pages about it on the Wiki and
> http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
> >
> >From: Rahul Warawdekar 
> >To: solr-user 
> >Sent: Tuesday, October 11, 2011 11:47 AM
> >Subject: Architecture and Capacity planning for large Solr index
> >
> >Hi All,
> >
> >I am working on a Solr search based project, and would highly appreciate
> >help/suggestions from you all regarding Solr architecture and capacity
> >planning.
> >Details of the project are as follows
> >
> >1. There are 2 databases from which, data needs to be indexed and made
> >searchable,
> >- Production
> >- Archive
> >2. Production database will retain 6 months old data and archive data
> every
> >month.
> >3. Archive database will retain 3 years old data.
> >4. Database is SQL Server 2008 and Solr version is 3.1
> >
> >Data to be indexed contains a huge volume of attachments (PDF, Word, excel
> >etc..), approximately 200 GB per month.
> >We are planning to do a full index every month (multithreaded) and
> >incremental indexing on a daily basis.
> >The Solr index size is coming to approximately 25 GB per month.
> >
> >If we were to use distributed search, what would be the best configuration
> >for Production as well as Archive indexes ?
> >What would be the best CPU/RAM/Disk configuration ?
> >How can I implement failover mechanism for sharded searches ?
> >
> >Please let me know in case I need to share more information.
> >
> >
> >--
> >Thanks and Regards
> >Rahul A. Warawdekar
> >
> >
> >
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Architecture and Capacity planning for large Solr index

2011-11-21 Thread Rahul Warawdekar
Thanks Otis !
Please ignore my earlier email which does not have all the information.

My business requirements have changed a bit.
We now need one year rolling data in Production, with the following details
- Number of records -> 1.2 million
- Solr index size for these records comes to approximately 200 - 220
GB. (includes large attachments)
    - Approx 250 users who will be searching the application with a peak of
1 search request every 40 seconds.

I am planning to address this using Solr distributed search on a VMWare
virtualized environment as follows.

1. Whole index to be split up between 3 shards, with 3 masters and 6 slaves
(load balanced)

2. Master configuration for each server is as follows
- 4 CPUs
- 16 GB RAM
- 300 GB disk space

3. Slave configuration for each server is as follows
- 4 CPUs
- 16 GB RAM
- 150 GB disk space

4. I am planning to use SAN instead of local storage to store Solr index.

And my questions are as follows:
Will 3 shards serve the purpose here ?
Is SAN a good option for storing the Solr index, given the high index volume ?




On Mon, Nov 21, 2011 at 3:05 PM, Rahul Warawdekar <
rahul.warawde...@gmail.com> wrote:

> Thanks !
>
> My business requirements have changed a bit.
> We need one year rolling data in Production.
> The index size for the same comes to approximately 200 - 220 GB.
> I am planning to address this using Solr distributed search as follows.
>
> 1. Whole index to be split up between 3 shards, with 3 masters and 6
> slaves (load balanced)
> 2. Master configuration
>  will be 4 CPU
>
>
>
> On Tue, Oct 11, 2011 at 2:05 PM, Otis Gospodnetic <
> otis_gospodne...@yahoo.com> wrote:
>
>> Hi Rahul,
>>
>> This is unfortunately not enough information for anyone to give you very
>> precise answers, so I'll just give some rough ones:
>>
>> * best disk - SSD :)
>> * CPU - multicore, depends on query complexity, concurrency, etc.
>> * sharded search and failover - start with SolrCloud, there are a couple
>> of pages about it on the Wiki and
>> http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/
>>
>> Otis
>> 
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>>
>> >
>> >From: Rahul Warawdekar 
>> >To: solr-user 
>> >Sent: Tuesday, October 11, 2011 11:47 AM
>> >Subject: Architecture and Capacity planning for large Solr index
>> >
>> >Hi All,
>> >
>> >I am working on a Solr search based project, and would highly appreciate
>> >help/suggestions from you all regarding Solr architecture and capacity
>> >planning.
>> >Details of the project are as follows
>> >
>> >1. There are 2 databases from which, data needs to be indexed and made
>> >searchable,
>> >- Production
>> >- Archive
>> >2. Production database will retain 6 months old data and archive data
>> every
>> >month.
>> >3. Archive database will retain 3 years old data.
>> >4. Database is SQL Server 2008 and Solr version is 3.1
>> >
>> >Data to be indexed contains a huge volume of attachments (PDF, Word,
>> excel
>> >etc..), approximately 200 GB per month.
>> >We are planning to do a full index every month (multithreaded) and
>> >incremental indexing on a daily basis.
>> >The Solr index size is coming to approximately 25 GB per month.
>> >
>> >If we were to use distributed search, what would be the best
>> configuration
>> >for Production as well as Archive indexes ?
>> >What would be the best CPU/RAM/Disk configuration ?
>> >How can I implement failover mechanism for sharded searches ?
>> >
>> >Please let me know in case I need to share more information.
>> >
>> >
>> >--
>> >Thanks and Regards
>> >Rahul A. Warawdekar
>> >
>> >
>> >
>>
>
>
>
> --
> Thanks and Regards
> Rahul A. Warawdekar
>
>


-- 
Thanks and Regards
Rahul A. Warawdekar


how to use term proxymity queries with apache solr

2011-11-21 Thread Rahul Mehta
Hello,

I have used proximity queries, but they only work as a sloppy phrase query
(e.g.: "catalyst polymer"~5) and do not allow wildcards.

I want to use proximity queries between arbitrary terms (e.g.: (poly* NEAR *lyst)).
Is this possible using additional query parsers like "Surround"?

If yes, please suggest how to install Surround.

Currently we are using Solr 3.1.

Thanks & Regards

Rahul Mehta


Integrating Surround Query Parser

2011-11-21 Thread Rahul Mehta
Hello,

I want to Run surround query .


   1. Downloading from
   http://www.java2s.com/Code/Jar/JKL/Downloadlucenesurround241jar.htm
   2. Moved the lucene-surround-2.4.1.jar  to /apache-solr-3.1.0/example/lib
   3. Edit the solrconfig.xml with
  1. <queryParser name="surround" class="org.apache.lucene.queryParser.surround.parser.QueryParser"/>
   4. Restart Solr

Got this error :

org.apache.solr.common.SolrException: Error Instantiating
QParserPlugin, org.apache.lucene.queryParser.surround.parser.QueryParser
is not a org.apache.solr.search.QParserPlugin
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:425)



-- 
Thanks & Regards

Rahul Mehta


How to be sure that surround

2011-11-22 Thread Rahul Mehta
I have done the following steps for installing the surround plugin.

   1. Downloading from
   http://www.java2s.com/Code/Jar/JKL/Downloadlucenesurround241jar.htm
   2. Moved the lucene-surround-2.4.1.jar  to /apache-solr-3.1.0/example/lib
   3. restart solr .

But how can I be sure that the surround plugin is actually installed?
I mean, what query can I run to verify it?

-- 
Thanks & Regards

Rahul Mehta


Re: How to be sure that surround

2011-11-22 Thread Rahul Mehta
I have solr-trunk, but the queries are running on both (on trunk (4.0) and
on 3.1). So how can I be sure which queries are handled by the surround
query parser plugin?

The queries I tried:
http://localhost:8983/solr/select?q=abstracts:99n(flat,panel,display)

http://localhost:8983/solr/select?q=abstracts:(poly*%20NEAR%20*lyst)

Both of the above queries run on 3.1 as well as 4.0.

How can I be sure that these queries are being handled by the Surround plugin?


On Tue, Nov 22, 2011 at 5:51 PM, Ahmet Arslan  wrote:

> > I have done the following steps for
> > installing surround plugin.
> >
> >1. Downloading from
> >http://www.java2s.com/Code/Jar/JKL/Downloadlucenesurround241jar.htm
> >2. Moved the
> > lucene-surround-2.4.1.jar  to
> > /apache-solr-3.1.0/example/lib
> >3. restart solr .
> >
> > But How to be sure that surround plugin is being installed
> > .
> > Means what query i can run.
> >
>
> Rahul, you need to switch to solr-trunk, it is already there
> http://wiki.apache.org/solr/SurroundQueryParser
>



-- 
Thanks & Regards

Rahul Mehta


Re: how to use term proxymity queries with apache solr

2011-11-22 Thread Rahul Mehta
Do I need to install this separately, or is it integrated in Solr 4.0?

On Tue, Nov 22, 2011 at 5:49 PM, Ahmet Arslan  wrote:

> > Not sure about leading wildcard but you can use
> https://issues.apache.org for this.
>
> Sorry, link was : https://issues.apache.org/jira/browse/SOLR-1604
>



-- 
Thanks & Regards

Rahul Mehta


Re: Integrating Surround Query Parser

2011-11-22 Thread Rahul Mehta
How do I apply the patch https://issues.apache.org/jira/browse/SOLR-2703 to
Solr 3.1 to install surround as a plugin?

On Tue, Nov 22, 2011 at 7:34 PM, Erik Hatcher wrote:

> The "surround" query parser is fully wired into Solr trunk/4.0, if that
> helps.  See http://wiki.apache.org/solr/SurroundQueryParser and the JIRA
> issue linked there in case you want to patch it into a different version.
>
>Erik
>
> On Jan 21, 2011, at 02:24 , Ahson Iqbal wrote:
>
> > Hi All
> >
> > I want to integrate Surround Query Parser with solr, To do this i have
> > downloaded jar file from the internet and and then pasting that jar file
> in
> > web-inf/lib
> >
> > and configured query parser in solrconfig.xml as
> >  > class="org.apache.lucene.queryParser.surround.parser.QueryParser"/>
> >
> > now when i load solr admin page following exception comes
> > org.apache.solr.common.SolrException: Error Instantiating QParserPlugin,
> > org.apache.lucene.queryParser.surround.parser.QueryParser is not a
> > org.apache.solr.search.QParserPlugin
> >
> > what i think that i didnt get the right plugin, can any body guide me
> from where
> > to get right plugin for surround query parser or how to accurately
> integrate
> > this plugin with solr.
> >
> >
> > thanx
> > Ahsan
> >
> >
> >
>
>


-- 
Thanks & Regards

Rahul Mehta


Re: Integrating Surround Query Parser

2011-11-22 Thread Rahul Mehta
This is what I tried:


   - Went to the solr 3.1 directory, which was downloaded from here:
   http://www.trieuvan.com/apache//lucene/solr/3.1.0/apache-solr-3.1.0.tgz
   - wget
   https://issues.apache.org/jira/secure/attachment/12493167/SOLR-2703.patch
   - ran:  patch -p0 -i SOLR-2703.patch --dry-run
   - got an error :
  - patching file
  core/src/test/org/apache/solr/search/TestSurroundQueryParser.java
  - patching file core/src/test-files/solr/conf/schemasurround.xml
  - patching file core/src/test-files/solr/conf/solrconfigsurround.xml
  - patching file
  core/src/java/org/apache/solr/search/SurroundQParserPlugin.java
  - patching file example/solr/conf/solrconfig.xml
  - Hunk #1 FAILED at 1538.
  - 1 out of 1 hunk FAILED -- saving rejects to file
  example/solr/conf/solrconfig.xml.rej
   - our solrconfig.xml file only goes up to line 1508.
   - tried: sudo find / -name TestSurroundQueryParser.java, which did not
   find the file anywhere.
   - and when I do svn up, it gives me: Skipped '.'

Please suggest what I should do now.

On Wed, Nov 23, 2011 at 10:39 AM, Rahul Mehta wrote:

> How to apply this patch https://issues.apache.org/jira/browse/SOLR-2703 with
> solr 3.1 to install surround as plugin?
>
>
> On Tue, Nov 22, 2011 at 7:34 PM, Erik Hatcher wrote:
>
>> The "surround" query parser is fully wired into Solr trunk/4.0, if that
>> helps.  See http://wiki.apache.org/solr/SurroundQueryParser and the JIRA
>> issue linked there in case you want to patch it into a different version.
>>
>>Erik
>>
>> On Jan 21, 2011, at 02:24 , Ahson Iqbal wrote:
>>
>> > Hi All
>> >
>> > I want to integrate Surround Query Parser with solr, To do this i have
>> > downloaded jar file from the internet and and then pasting that jar
>> file in
>> > web-inf/lib
>> >
>> > and configured query parser in solrconfig.xml as
>> > > > class="org.apache.lucene.queryParser.surround.parser.QueryParser"/>
>> >
>> > now when i load solr admin page following exception comes
>> > org.apache.solr.common.SolrException: Error Instantiating QParserPlugin,
>> > org.apache.lucene.queryParser.surround.parser.QueryParser is not a
>> > org.apache.solr.search.QParserPlugin
>> >
>> > what i think that i didnt get the right plugin, can any body guide me
>> from where
>> > to get right plugin for surround query parser or how to accurately
>> integrate
>> > this plugin with solr.
>> >
>> >
>> > thanx
>> > Ahsan
>> >
>> >
>> >
>>
>>
>
>
> --
> Thanks & Regards
>
> Rahul Mehta
>
>
>
>


-- 
Thanks & Regards

Rahul Mehta


Re: Integrating Surround Query Parser

2011-11-23 Thread Rahul Mehta
After this I tried with solr3.1-src.


   - and this time i got the core folder in the previous installation ,when
   this folder get created
   - 
/home/reach121/basf/*apache-solr-3.1.0/core/src/test/org/apache/solr/search/TestSurroundQueryParser.java
   *
   - and I have registered the surround queryParser (SurroundQParserPlugin) in
   solrconfig.xml, along the lines of the sketch below
   - but when I run Solr, it gives me the following error:
   -
   - SEVERE: org.apache.solr.common.SolrException: Error loading class
   'org.apache.solr.search.SurroundQParserPlugin'
   - at
   
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:389)
   - at
   org.apache.solr.core.SolrCore.createInstance(SolrCore.java:423)
   - at
   org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:445)
   - at
   org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1545)
   - at
   org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1539)
   - at
   org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1572)
   - at
   org.apache.solr.core.SolrCore.initQParsers(SolrCore.java:1489)
   - at org.apache.solr.core.SolrCore.(SolrCore.java:555)
   - at
   org.apache.solr.core.CoreContainer.create(CoreContainer.java:458)
   - at
   org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
   - at
   org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
   - at
   
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130)
   - at
   org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
   - at
   org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
   - at
   org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   - at
   org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
   -

Please suggest what should i do ?
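
For reference, the registration I am attempting in solrconfig.xml is along
these lines (a sketch based on the wiki; "surround" as the parser name is my
assumption, and it can only work once the SurroundQParserPlugin class actually
compiles and is on the classpath):

    <queryParser name="surround"
                 class="org.apache.solr.search.SurroundQParserPlugin"/>

A query would then pick the parser via local params, e.g.
q={!surround}3w(solr, plugin) against the default field.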

On Wed, Nov 23, 2011 at 11:19 AM, Rahul Mehta wrote:

> This what i tried:
>
>
>- Gone to  the solr 3.1 directory which is downloaded from here.
>http://www.trieuvan.com/apache//lucene/solr/3.1.0/apache-solr-3.1.0.tgz
>- wget
>https://issues.apache.org/jira/secure/attachment/12493167/SOLR-2703.patch
>- run the :  patch -p0 -i SOLR-2703.patch --dry-run
>- got an error :
>   - patching file
>   core/src/test/org/apache/solr/search/TestSurroundQueryParser.java
>   - patching file core/src/test-files/solr/conf/schemasurround.xml
>   - patching file core/src/test-files/solr/conf/solrconfigsurround.xml
>   - patching file
>   core/src/java/org/apache/solr/search/SurroundQParserPlugin.java
>   - patching file example/solr/conf/solrconfig.xml
>   - Hunk #1 FAILED at 1538.
>   - 1 out of 1 hunk FAILED -- saving rejects to file
>   example/solr/conf/solrconfig.xml.rej
>- our solr config file is getting end at 1508 only.
>- tried finding sudo find / -name TestSurroundQueryParser.java  which
>is not found in the directory .
>- and when m doing svn up giving me Skipped '.'
>
> *Please suggest what should i do now ? *
>
> On Wed, Nov 23, 2011 at 10:39 AM, Rahul Mehta wrote:
>
>> How to apply this patch https://issues.apache.org/jira/browse/SOLR-2703 with
>> solr 3.1 to install surround as plugin?
>>
>>
>> On Tue, Nov 22, 2011 at 7:34 PM, Erik Hatcher wrote:
>>
>>> The "surround" query parser is fully wired into Solr trunk/4.0, if that
>>> helps.  See http://wiki.apache.org/solr/SurroundQueryParser and the
>>> JIRA issue linked there in case you want to patch it into a different
>>> version.
>>>
>>>Erik
>>>
>>> On Jan 21, 2011, at 02:24 , Ahson Iqbal wrote:
>>>
>>> > Hi All
>>> >
>>> > I want to integrate Surround Query Parser with solr, To do this i have
>>> > downloaded jar file from the internet and and then pasting that jar
>>> file in
>>> > web-inf/lib
>>> >
>>> > and configured query parser in solrconfig.xml as
>>> > >> > class="org.apache.lucene.queryParser.surround.parser.QueryParser"/>
>>> >
>>> > now when i load solr admin page following exception comes
>>> > org.apache.solr.common.SolrException: Error Instantiating
>>> QParserPlugin,
>>> > org.apache.lucene.queryParser.surround.parser.QueryParser is not a
>>> > org.apache.solr.search.QParserPlugin
>>> >
>>> > what i think that i didnt get the right plugin, can any body guide me
>>> from where
>>> > to get right plugin for surround query parser or how to accurately
>>> integrate
>>> > this plugin with solr.
>>> >
>>> >
>>> > thanx
>>> > Ahsan
>>> >
>>> >
>>> >
>>>
>>>
>>
>>
>> --
>> Thanks & Regards
>>
>> Rahul Mehta
>>
>>
>>
>>
>
>
> --
> Thanks & Regards
>
> Rahul Mehta
>
>
>
>


-- 
Thanks & Regards

Rahul Mehta


Re: Integrating Surround Query Parser

2011-11-23 Thread Rahul Mehta
Is this in the trunk of Solr 4.0 only? Can't I implement it in Solr 3.1?

On Wed, Nov 23, 2011 at 7:23 PM, Ahmet Arslan  wrote:

> > After this i tried with solr3.1-src.
> > Please suggest what should i do ?
>
> Please use solr-trunk.
> svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk
>



-- 
Thanks & Regards

Rahul Mehta


complex phrase plugin install

2011-11-24 Thread Rahul Mehta
Hi,

I want to install the ComplexPhrase plugin, this one:
https://issues.apache.org/jira/browse/SOLR-1604?focusedCommentId=12923982&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12923982

I did the following steps and got an error:


   - configure maven path variable in .bashrc
  - http://maven.apache.org/download.html#Installation
   - download the ComplexPhrase.zip
   - run the mvn -e package command in ComplexPhrase Folder
  - [INFO]
  
  - [ERROR] BUILD ERROR
  - [INFO]
  
  - [INFO] Error configuring:
  org.apache.maven.plugins:maven-resources-plugin. Reason: ERROR: Cannot
  override read-only parameter: resources in goal: resources:resources
  - [INFO]
  
  - [INFO] Trace
  - org.apache.maven.lifecycle.LifecycleExecutionException: Error
  configuring: org.apache.maven.plugins:maven-resources-plugin. Reason:
  ERROR: Cannot override read-only parameter: resources in goal:
  resources:resources
  -at
  
org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:723)
  -at
  
org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalWithLifecycle(DefaultLifecycleExecutor.java:556)
  -at
  
org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:535)
  -at
  
org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:387)
  -at
  
org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:348)
  -at
  
org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:180)
  -at
  org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:328)
  -at
  org.apache.maven.DefaultMaven.execute(DefaultMaven.java:138)
  -at org.apache.maven.cli.MavenCli.main(MavenCli.java:362)
  -at
  org.apache.maven.cli.compat.CompatibleMain.main(CompatibleMain.java:60)
  -at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
  Method)
  -at
  
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  -at
  
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  -at java.lang.reflect.Method.invoke(Method.java:616)
  -at
  org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315)
  -at
  org.codehaus.classworlds.Launcher.launch(Launcher.java:255)
  -at
  org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430)
  -at org.codehaus.classworlds.Launcher.main(Launcher.java:375)
  - Caused by: org.apache.maven.plugin.PluginConfigurationException:
  Error configuring:
org.apache.maven.plugins:maven-resources-plugin. Reason:
  ERROR: Cannot override read-only parameter: resources in goal:
  resources:resources
  -at
  
org.apache.maven.plugin.DefaultPluginManager.validatePomConfiguration(DefaultPluginManager.java:1157)
  -at
  
org.apache.maven.plugin.DefaultPluginManager.getConfiguredMojo(DefaultPluginManager.java:705)
  -at
  
org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:468)
  -at
  
org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:694)
  -... 17 more
  - [INFO]
  


Please suggest how to solve this error.

-- 
Thanks & Regards

Rahul Mehta


highlighting on range query

2011-11-24 Thread Rahul Mehta
Hello,

I want to get the results of a range query with highlighting.

e.g. I have this query:
http://localhost:8983/solr/select?q=field1:[5000%20TO%206000]&fl=field2&hl=on&rows=5&wt=json&indent=on&hl.fl=field3

but it is not giving any result in highlighting.

Please suggest how I can get the result.

-- 
Thanks & Regards

Rahul Mehta


Re: Integrating Surround Query Parser

2011-11-24 Thread Rahul Mehta
Okay, thanks for reply.

On Thu, Nov 24, 2011 at 2:35 PM, Erik Hatcher wrote:

>
> On Nov 23, 2011, at 09:56 , Ahmet Arslan wrote:
>
> >
> >> is this is the trunk of solr 4.0 ,
> >> can't i implement in solr 3.1 .?
> >
> > Author of the patch would know answer to this. But why not use trunk?
>
> I spent a fair bit of time yesterday on making a 3.x compatible patch but
> have not completed that work yet.  It's a bit more work because of the
> dependency in the build system.   I may not be able to get back to this for
> some weeks yet.  The SurroundQParserPlugin is really all you need to make
> this work, just need to get the compilation bit fixed (as things changed
> from 3.x to trunk with contrib/modules).
>
> Rahul - if you'd like to see this done, feel free to take a stab at it.
>  I'll tinker with it as I have time.
>
>Erik
>
>


-- 
Thanks & Regards

Rahul Mehta


Re: highlighting on range query

2011-11-24 Thread Rahul Mehta
Hi Ahmet,

I passed &hl.highlightMultiTerm=true in the request, *but field1 is still not
coming back in highlighting.*

http://localhost:8983/solr/select?q=field1:[5000%20TO%206000]&fl=field2&hl=on&rows=5&wt=json&indent=on&hl.fl=field3&hl.highlightMultiTerm=true

I am using solr 3.1.

Do I need to install the patch, or is there anything else I need to do?






On Thu, Nov 24, 2011 at 3:36 PM, Ahmet Arslan  wrote:

> > I want to have result of a range query with highlighted
> > Result.
>
> http://wiki.apache.org/solr/HighlightingParameters#hl.highlightMultiTerm
>



-- 
Thanks & Regards

Rahul Mehta


Re: highlighting on range query

2011-11-24 Thread Rahul Mehta
Oh sorry, I forgot to tell you that I also added &hl.usePhraseHighlighter=true,
but still no result is coming.

On Thu, Nov 24, 2011 at 5:14 PM, Ahmet Arslan  wrote:

> > I passed &hl.highlightMultiTerm=true in request ,* but
> > still field1 is not
> > coming in hightlighting.*
> >
> >
> http://localhsot:8983/solr/select?q=field1:[5000%20TO%206000]&fl=field2&hl=on&rows=5&wt=json&indent=on&hl.fl=field3&hl.highlightMultiTerm=true
> >
>
> As wiki says "If the SpanScorer is also being used..." which means you
> need to add &hl.usePhraseHighlighter=true too.
>



-- 
Thanks & Regards

Rahul Mehta


Re: highlighting on range query

2011-11-24 Thread Rahul Mehta
Yes, I tried with specifying hl.fl=field1, and field1 is indexed and
stored.


On Thu, Nov 24, 2011 at 5:23 PM, Ahmet Arslan  wrote:

> > oh sorry forgot to tell you that i
> > added &hl.usePhraseHighlighter=true this
> > also , but still no result is coming .
>
> Did you specify field1 in hl.fl parameter?
>
> Plus you need you mark field1 as indexed="true" and stored="true" to
> enable highlighting.
>
> http://wiki.apache.org/solr/FieldOptionsByUseCase
>
>


-- 
Thanks & Regards

Rahul Mehta


Re: highlighting on range query

2011-11-24 Thread Rahul Mehta
Any other suggestions?

On Thu, Nov 24, 2011 at 5:30 PM, Rahul Mehta wrote:

> Yes, I tried with specifiying hl.fl=field1, and field1 is indexed and
> stored.
>
>
> On Thu, Nov 24, 2011 at 5:23 PM, Ahmet Arslan  wrote:
>
>> > oh sorry forgot to tell you that i
>> > added &hl.usePhraseHighlighter=true this
>> > also , but still no result is coming .
>>
>> Did you specify field1 in hl.fl parameter?
>>
>> Plus you need you mark field1 as indexed="true" and stored="true" to
>> enable highlighting.
>>
>> http://wiki.apache.org/solr/FieldOptionsByUseCase
>>
>>
>
>
> --
> Thanks & Regards
>
> Rahul Mehta
>
>
>
>


-- 
Thanks & Regards

Rahul Mehta


Re: highlighting on range query

2011-11-27 Thread Rahul Mehta
Any other suggestions? The ones above are not working.

On Thu, Nov 24, 2011 at 5:44 PM, Rahul Mehta wrote:

> Any other Suggestion.
>
>
> On Thu, Nov 24, 2011 at 5:30 PM, Rahul Mehta wrote:
>
>> Yes, I tried with specifiying hl.fl=field1, and field1 is indexed and
>> stored.
>>
>>
>> On Thu, Nov 24, 2011 at 5:23 PM, Ahmet Arslan  wrote:
>>
>>> > oh sorry forgot to tell you that i
>>> > added &hl.usePhraseHighlighter=true this
>>> > also , but still no result is coming .
>>>
>>> Did you specify field1 in hl.fl parameter?
>>>
>>> Plus you need you mark field1 as indexed="true" and stored="true" to
>>> enable highlighting.
>>>
>>> http://wiki.apache.org/solr/FieldOptionsByUseCase
>>>
>>>
>>
>>
>> --
>> Thanks & Regards
>>
>> Rahul Mehta
>>
>>
>>
>>
>
>
> --
> Thanks & Regards
>
> Rahul Mehta
>
>
>
>


-- 
Thanks & Regards

Rahul Mehta


Re: highlighting on range query

2011-11-28 Thread Rahul Mehta
I tried this url :

http://localhost:8983/solr/select?q=rangefld:[5000%20TO%206000]&fl=lily.id,rangefld&hl=on&rows=5&wt=json&indent=on&hl.fl=*,rangefld&hl.highlightMultiTerm=true&hl.usePhraseHighlighter=true&hl.useFastVectorHighlighter=false

and output is

{
  "responseHeader":{
"status":0,
"QTime":4,
"params":{
  "hl.highlightMultiTerm":"true",
  "fl":"lily.id,rangefld",
  "indent":"on",
  "hl.useFastVectorHighlighter":"false",
   "q":"rangefld:[5000 TO 6000]",
  "hl.fl":"*,rangefld",
  "wt":"json",
  "hl.usePhraseHighlighter":"true",
  "hl":"on",
  "rows":"5"}},
  "response":{"numFound":64,"start":0,"docs":[
  {
"lily.id":"UUID.c5f00cd3-343a-47c1-ab16-ace104b2540f",
"rangefld":5948},
  {
"lily.id":"UUID.ed69ece0-1b24-4829-afb6-22eb242939f2",
"rangefld":5749},
  {
"lily.id":"UUID.afa0c654-2f26-4c5b-9fda-8b51c5ec080d",
"rangefld":5739},
  {
"lily.id":"UUID.d92b405d-f41e-4c85-9014-1b89a986ec42",
"rangefld":5783},
  {
"lily.id":"UUID.102adde5-cbff-4ca6-acb1-426bb14fb579",
"rangefld":5753}]
  },
  "highlighting":{
"UUID.c5f00cd3-343a-47c1-ab16-ace104b2540f":{},
"UUID.ed69ece0-1b24-4829-afb6-22eb242939f2":{},
    "UUID.afa0c654-2f26-4c5b-9fda-8b51c5ec080d":{},
"UUID.d92b405d-f41e-4c85-9014-1b89a986ec42":{},
"UUID.102adde5-cbff-4ca6-acb1-426bb14fb579":{}}}

Why is rangefld not coming back in the highlighting result?

On Mon, Nov 28, 2011 at 12:47 PM, Ahmet Arslan  wrote:

> > Any other Suggestion. as these
> > suggestions are not working.
>
> Could it be that you are using FastVectorHighlighter? What happens when
> you add &hl.useFastVectorHighlighter=false to your search URL?
>



-- 
Thanks & Regards

Rahul Mehta


Re: highlighting on range query

2011-11-28 Thread Rahul Mehta
Tried the below URL and got the same output. Any other suggestions?

http://localhost:8983/solr/select?q=rangefld:[5000%20TO%206000]&fl=lily.id,rangefld&hl=on&rows=5&wt=json&indent=on&hl.fl=rangefld&hl.highlightMultiTerm=true&hl.usePhraseHighlighter=true&hl.useFastVectorHighlighter=false

On Mon, Nov 28, 2011 at 8:10 PM, Ahmet Arslan  wrote:

> > and output is
> >
> > {
> >   "responseHeader":{
> > "status":0,
> > "QTime":4,
> > "params":{
> >   "hl.highlightMultiTerm":"true",
> >   "fl":"lily.id,rangefld",
> >   "indent":"on",
> >
> > "hl.useFastVectorHighlighter":"false",
> >"q":"rangefld:[5000 TO
> > 6000]",
> >   "hl.fl":"*,rangefld",
>
> I don't think hl.fl parameter accepts * value. Please try &hl.fl=rangefld
>
>
>


-- 
Thanks & Regards

Rahul Mehta


Re: DataImportHandler w/ multivalued fields

2011-12-01 Thread Rahul Warawdekar
Hi Briggs,

By saying "multivalued fields are not getting indexed prperly", do you mean
to say that you are not able to search on those fields ?
Have you tried actually searching your Solr index for those multivalued
terms and make sure if it returns the search results ?

One possibility could be that the multivalued fields are getting indexed
correctly and are searchable.
However, since your schema.xml has a "raw_tag" field whose "stored"
attribute is set to false, you may not be able to see those fields.
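
For reference, a sketch of how raw_tag could be declared in schema.xml so the
tag values are both searchable and visible in results (the field type here is
just an example, and note the camel-case "multiValued" spelling of the
attribute):

    <field name="raw_tag" type="text" indexed="true" stored="true" multiValued="true"/>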



On Thu, Dec 1, 2011 at 1:43 PM, Briggs Thompson  wrote:

> In addition, I tried a query like below and changed the column definition
> to
>
> and still no luck. It is indexing the full content now but not multivalued.
> It seems like the "splitBy" ins't working properly.
>
>select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.*
> from site
> left outer join
>  (freetags inner join freetagged_objects)
> on (freetags.id = freetagged_objects.tag_id
>   and site.siteId = freetagged_objects.object_id)
> group  by site.siteId
>
> Am I doing something wrong?
> Thanks,
> Briggs Thompson
>
> On Thu, Dec 1, 2011 at 11:46 AM, Briggs Thompson <
> w.briggs.thomp...@gmail.com> wrote:
>
> > Hello Solr Community!
> >
> > I am implementing a data connection to Solr through the Data Import
> > Handler and non-multivalued fields are working correctly, but multivalued
> > fields are not getting indexed properly.
> >
> > I am new to DataImportHandler, but from what I could find, the entity is
> > the way to go for multivalued field. The weird thing is that data is
> being
> > indexed for one row, meaning first raw_tag gets populated.
> >
> >
> > Anyone have any ideas?
> > Thanks,
> > Briggs
> >
> > This is the relevant part of the schema:
> >
> > > stored="false" multivalued="true"/>
> > > stored="true" multivalued="true"/>
> >
> >
> > And the relevant part of data-import.xml:
> >
> > 
> >  >   query="select * from site ">
> > 
> > 
> > 
> > 
> > 
> > 
> >  />
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> >  />
> > 
> > 
> > 
> >  > query="select raw_tag, freetags.id,
> > freetagged_objects.object_id as siteId
> >from freetags
> >inner join freetagged_objects
> >on freetags.id=freetagged_objects.tag_id
> > where freetagged_objects.object_id='${site.siteId}'">
> > 
> >  
> > 
> > 
> >
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs

2013-01-25 Thread Rahul Bishnoi
Thanks for the quick reply and for addressing each point queried.

Additional asked information is mentioned below:

OS = Ubuntu 12.04 (64 bit)
Sun Java 7 (64 bit)
Total RAM = 8GB

SolrConfig.xml is available at http://pastebin.com/SEFxkw2R


Re: SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs

2013-01-27 Thread Rahul Bishnoi
Hi Shawn,

Thanks for your reply. After following your suggestions we were able to
index 30k documents. I have some queries:

1) What is stored in the RAM while only indexing is going on?  How to
calculate the RAM/heap requirements for our documents?
2) The document cache, filter cache, etc...are populated while querying.
Correct me if I am wrong. Are there any caches that are populated while
indexing?
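
For reference, the knobs Shawn mentions below all live in solrconfig.xml; a
sketch with the starting values he suggests (untested on our setup) would be:

    <!-- under <indexConfig>: indexing RAM buffer, the older default he mentions -->
    <ramBufferSizeMB>32</ramBufferSizeMB>

    <!-- filterCache, starting at the lower of the sizes he suggests -->
    <filterCache class="solr.FastLRUCache" size="128" initialSize="128" autowarmCount="0"/>

plus removing the MaxPermGen option and raising -Xmx (3072m, then 4096m) on
the JVM command line.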

Thanks,
Rahul



On Sat, Jan 26, 2013 at 11:46 PM, Shawn Heisey  wrote:

> On 1/26/2013 12:55 AM, Rahul Bishnoi wrote:
>
>> Thanks for quick reply and addressing each point queried.
>>
>> Additional asked information is mentioned below:
>>
>> OS = Ubuntu 12.04 (64 bit)
>> Sun Java 7 (64 bit)
>> Total RAM = 8GB
>>
>> SolrConfig.xml is available at http://pastebin.com/SEFxkw2R
>>
>>
> Rahul,
>
> The MaxPermGenSize could be a contributing factor.  The documents where
> you have 1000 words are somewhat large, though your overall index size is
> pretty small.  I would try removing the MaxPermGenSize option and see what
> happens.  You can also try reducing the ramBufferSizeMB in solrconfig.xml.
>  The default in previous versions of Solr was 32, which is big enough for
> most things, unless you are indexing HUGE documents like entire books.
>
> It looks like you have the cache sizes under  at values close to
> default.  I wouldn't decrease the documentCache any - in fact an increase
> might be a good thing there.  As for the others, you could probably reduce
> them.  The filterCache size I would start at 64 or 128.  Watch your cache
> hitratios to see whether the changes make things remarkably worse.
>
> If that doesn't help, try increasing the -Xmx option - first 3072m, next
> 4096m.  You could go as high as 6GB and not run into any OS cache problems
> with your small index size, though you might run into long GC pauses.
>
> Indexing, especially big documents, is fairly memory intensive.  Some
> queries can be memory intensive as well, especially those using facets or a
> lot of clauses.
>
> Under normal operation, I could probably get away with a 3GB heap size,
> but I have it at 8GB because otherwise a full reindex (full-import from
> mysql) runs into OOM errors.
>
> Thanks,
> Shawn
>
>


Fw: confirm subscribe to solr-user@lucene.apache.org

2012-03-29 Thread Rahul Mandaliya



- Forwarded Message -
From: Rahul Mandaliya 
To: "solr-user@lucene.apache.org"  
Sent: Thursday, March 29, 2012 9:38 AM
Subject: Fw: confirm subscribe to solr-user@lucene.apache.org
 



hi,
I am giving confirmation for subscription to solr-user@lucene.apache.org
regards,
Rahul




Re: Solr with UIMA

2012-04-19 Thread Rahul Warawdekar
Hi Divakar,

Try making your updateRequestProcessorChain the default. Simply add
default="true" to it, as follows, and check if that works.




On Thu, Apr 19, 2012 at 12:01 PM, dsy99  wrote:

> Hi Chris,
> Are you been able to get success to integrate the UIMA in SOLR.
>
> I too  tried to integrate Uima in Solr by following the instructions
> provided in README i.e. the following four steps:
>
> Step1. I set  tags in solrconfig.xml appropriately to point the jar
> files.
>
>   
>
>
> Step2. modified my "schema.xml" adding the fields I wanted to  hold
> metadata
> specifying proper values for type, indexed, stored and multiValued options
> as follows:
>
> required="false"/>
>   multiValued="true" required="false"/>
>multiValued="true" required="false" />
>
> Step3. modified my solrconfig.xml adding the following snippet:
>
>  
> class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
>  
>
>   VALID_ALCHEMYAPI_KEY
>  VALID_ALCHEMYAPI_KEY
>  VALID_ALCHEMYAPI_KEY
>  VALID_ALCHEMYAPI_KEY
>  VALID_ALCHEMYAPI_KEY
>  VALID_OPENCALAIS_KEY
>
>
> name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml
>
>true
>
> 
>  false
>  
> text
>   
>
>
>  
> name="name">org.apache.uima.alchemy.ts.concept.ConceptFS
>
>  text
>  concept
>
>  
>  
> name="name">org.apache.uima.alchemy.ts.language.LanguageFS
>
>  language
>  language
>
>  
>  
>org.apache.uima.SentenceAnnotation
>
>  coveredText
>  sentence
> 
>  
>
>  
>
>
>
>  
>
> Step 4: and finally created a new UpdateRequestHandler with the following:
>   
>
>  uima
>
>
>
> Further I  indexed a word file called text.docx using the following
> command:
>
> curl
> "
> http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true
> "
> -F "myfile=@UIMA_sample_test.docx"
>
> When I searched the file I am not able to see the additional UIMA fields.
>
> Can you please help if you been able to solve the problem.
>
>
> With Regds & Thanks
> Divakar
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3923443.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: solr replication failing with error: Master at: is not available. Index fetch failed

2012-04-24 Thread Rahul Warawdekar
Hi,

In the Solr wiki, for replication, the master url is defined as follows:
http://master_host:port/solr/corename/replication

This url does not contain "admin" in its path, whereas the master url you
provided has an additional "admin" in it.
Not sure whether this is the issue, but you could try removing "admin" and
check if replication works.


On Tue, Apr 24, 2012 at 11:49 AM, geeky2  wrote:

> hello,
>
> thank you for the reply,
>
> yes - master has been indexed.
>
> ok - makes sense - the polling interval needs to change
>
> i did check the solr war file on both boxes (master and slave).  they are
> identical.  actually - if they were not indentical - this would point to a
> different issue altogether - since our deployment infrastructure - rolls
> the
> war file to the slaves when you do a deployment on the master.
>
> this has me stumped - not sure what to check next.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3935699.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: solr replication failing with error: Master at: is not available. Index fetch failed

2012-04-25 Thread Rahul Warawdekar
Hi,

Is the replication still failing or working fine with that change ?

On Tue, Apr 24, 2012 at 2:16 PM, geeky2  wrote:

> that was it!
>
> thank you.
>
> i did notice something else in the logs now ...
>
> what is the meaning or implication of the message, "Connection reset".?
>
>
>
> 2012-04-24 12:59:19,996 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> 2012-04-24 12:59:39,998 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> *2012-04-24 12:59:59,997 SEVERE [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Master at:
> http://bogus:bogusport/somepath/somecore/replication/ is not available.
> Index fetch failed. Exception: Connection reset*
> 2012-04-24 13:00:19,998 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> 2012-04-24 13:00:40,004 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> 2012-04-24 13:00:59,992 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> 2012-04-24 13:01:19,993 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> 2012-04-24 13:01:39,992 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> 2012-04-24 13:01:59,989 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> 2012-04-24 13:02:19,990 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> 2012-04-24 13:02:39,989 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> 2012-04-24 13:02:59,991 INFO  [org.a
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3936107.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Lucene FieldCache - Out of memory exception

2012-04-30 Thread Rahul R
Hello,
I am using solr 1.3 with jdk 1.5.0_14 and weblogic 10MP1 application server
on Solaris. I use embedded solr server. More details :
Number of docs in solr index : 1.4 million
Physical size of index : 640MB
Total number of fields in the index : 700 (99% of these are dynamic fields)
Total number of fields enabled for faceting : 440
Avg number of facet fields participating in a faceted query : 50-70
Total RAM allocated to weblogic appserver : 3GB (max possible)

In a multi user environment with 3 users using this application for a
period of around 40 minutes, the application runs out of memory. Analysis
of the heap dump shows that almost 85% of the memory is retained by the
FieldCache. Now I understand that the field cache is out of our control but
would appreciate some suggestions on how to handle this issue.

Some questions on this front :
- some mail threads on this forum seem to indicate that there could be some
connection between having dynamic fields and usage of FieldCache. Is this
true ? Most of the fields in my index are dynamic fields.
- as mentioned above, most of my faceted queries could have around 50-70
facet fields (I would do SolrQuery.addFacetField() for around 50-70 fields
per query). Could this be the source of the problem ? Is this too high for
solr to support ?
- Initially, I had a facet.sort defined in solrconfig.xml. Since FieldCache
builds up on sorting, I even removed the facet.sort and tried, but no
respite. The behavior is same as before.
- The document id that I have for each document is quite big (around 50
characters on average). Can this be a problem ? I reduced this to around 15
characters and tried but still there is no improvement.
- Can the size of the data be a problem ? But on this forum, I see many
users talking of more than 100 million documents in their index. I have
only 1.4 million with physical size of 640MB. The physical server on which
this application is running, has sufficient RAM and CPU.
- What gets stored in the FieldCache ? Is it the entire document or just
the document Id ?


Any help is much appreciated. Thank you.

regards
Rahul


Re: get a total count

2012-05-01 Thread Rahul R
Hello,
A related question on this topic. How do I programmatically find the total
number of documents across many shards ? For EmbeddedSolrServer, I use the
following command to get the total count :
solrSearcher.getStatistics().get("numDocs")

With distributed search, how do I get the count of all records in all
shards? Apart from doing a *:* query, is there a way to get the total count?
I am not able to use the same command above because I cannot get a handle to
the SolrIndexSearcher object with distributed search. The conf and data
directories of my index reside directly under a folder called solr (no core)
under the weblogic domain. I don't have a SolrCore object. With
EmbeddedSolrServer, I used to get the SolrIndexSearcher object using the
following call:
solrSearcher = (SolrIndexSearcher)SolrCoreObject.getSearcher().get();
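
For context, the *:* route I mention would be a distributed request with
rows=0, reading numFound from the response, along the lines of
http://host:port/solr/select?q=*:*&rows=0&shards=shard1:port/solr,shard2:port/solr
with host, port and shard names as placeholders. I am hoping for something
cleaner than issuing a query.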

Stack Information :
OS : Solaris
jdk : 1.5.0_14 32 bit
Solr : 1.3
App Server : Weblogic 10MP1

Thank you.

- Rahul

On Tue, Nov 15, 2011 at 10:49 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> I'm assuming the question was about how MANY documents have been indexed
> across all shards.
>
> Answer #1:
> Look at the Solr Admin Stats page on each of your Solr instances and add
> up the numDocs numbers you see there
>
> Answer #2:
> Use Sematext's free Performance Monitoring tool for Solr
> On Index report choose "all, sum" in the Solr Host selector and that will
> show you the total # of docs across the cluster, total # of deleted docs,
> total segments, total size on disk, etc.
> URL: http://www.sematext.com/spm/solr-performance-monitoring/index.html
>
> Otis
> 
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
> >
> >From: U Anonym 
> >To: solr-user@lucene.apache.org
> >Sent: Monday, November 14, 2011 11:50 AM
> >Subject: get a total count
> >
> >Hello everyone,
> >
> >A newbie question:  how do I find out how documents have been indexed
> >across all shards?
> >
> >Thanks much!
> >
> >
> >
>


Re: Lucene FieldCache - Out of memory exception

2012-05-01 Thread Rahul R
Here is one sample query that I picked up from the log file :

q=*%3A*&fq=Category%3A%223__107%22&fq=S_P1540477699%3A%22MICROCIRCUIT%2C+LINE+TRANSCEIVERS%22&rows=0&facet=true&facet.mincount=1&facet.limit=2&facet.field=S_C1503120369&facet.field=S_P1406389942&facet.field=S_P1430116878&facet.field=S_P1430116881&facet.field=S_P1406453552&facet.field=S_P1406451296&facet.field=S_P1406452465&facet.field=S_C2968809156&facet.field=S_P1406389980&facet.field=S_P1540477699&facet.field=S_P1406389982&facet.field=S_P1406389984&facet.field=S_P1406451284&facet.field=S_P1406389926&facet.field=S_P1424886581&facet.field=S_P2017662632&facet.field=F_P1946367021&facet.field=S_P1430116884&facet.field=S_P2017662620&facet.field=F_P1406451304&facet.field=F_P1406451306&facet.field=F_P1406451308&facet.field=S_P1500901421&facet.field=S_P1507138990&facet.field=I_P1406452433&facet.field=I_P1406453565&facet.field=I_P1406452463&facet.field=I_P1406453573&facet.field=I_P1406451324&facet.field=I_P1406451288&facet.field=S_P1406451282&facet.field=S_P1406452471&facet.field=S_P1424886605&facet.field=S_P1946367015&facet.field=S_P1424886598&facet.field=S_P1946367018&facet.field=S_P1406453556&facet.field=S_P1406389932&facet.field=S_P2017662623&facet.field=S_P1406450978&facet.field=F_P1406452455&facet.field=S_P1406389972&facet.field=S_P1406389974&facet.field=S_P1406389986&facet.field=F_P1946367027&facet.field=F_P1406451294&facet.field=F_P1406451286&facet.field=F_P1406451328&facet.field=S_P1424886593&facet.field=S_P1406453567&facet.field=S_P2017662629&facet.field=S_P1406453571&facet.field=F_P1946367030&facet.field=S_P1406453569&facet.field=S_P2017662626&facet.field=S_P1406389978&facet.field=F_P1946367024

My primary question here is: can Solr handle this kind of query with so
many facet fields? I have tried using both enum and fc for facet.method and
there is no improvement with either.
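
For a rough sense of scale (assuming the Lucene FieldCache keeps one 4-byte
ord entry per document for each string field, which is my understanding for
Solr 1.3): 1.4 million docs x 4 bytes is about 5.6 MB per field, so if all
~440 facet-enabled fields get hit at least once that is already around 2.5 GB
before counting the unique-term tables, which is most of the 3 GB heap.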

Appreciate any help on this. Thank you.

- Rahul


On Mon, Apr 30, 2012 at 2:53 PM, Rahul R  wrote:

> Hello,
> I am using solr 1.3 with jdk 1.5.0_14 and weblogic 10MP1 application
> server on Solaris. I use embedded solr server. More details :
> Number of docs in solr index : 1.4 million
> Physical size of index : 640MB
> Total number of fields in the index : 700 (99% of these are dynamic fields)
> Total number of fields enabled for faceting : 440
> Avg number of facet fields participating in a faceted query : 50-70
> Total RAM allocated to weblogic appserver : 3GB (max possible)
>
> In a multi user environment with 3 users using this application for a
> period of around 40 minutes, the application runs out of memory. Analysis
> of the heap dump shows that almost 85% of the memory is retained by the
> FieldCache. Now I understand that the field cache is out of our control but
> would appreciate some suggestions on how to handle this issue.
>
> Some questions on this front :
> - some mail threads on this forum seem to indicate that there could be
> some connection between having dynamic fields and usage of FieldCache. Is
> this true ? Most of the fields in my index are dynamic fields.
> - as mentioned above, most of my faceted queries could have around 50-70
> facet fields (I would do SolrQuery.addFacetField() for around 50-70 fields
> per query). Could this be the source of the problem ? Is this too high for
> solr to support ?
> - Initially, I had a facet.sort defined in solrconfig.xml. Since
> FieldCache builds up on sorting, I even removed the facet.sort and tried,
> but no respite. The behavior is same as before.
> - The document id that I have for each document is quite big (around 50
> characters on average). Can this be a problem ? I reduced this to around 15
> characters and tried but still there is no improvement.
> - Can the size of the data be a problem ? But on this forum, I see many
> users talking of more than 100 million documents in their index. I have
> only 1.4 million with physical size of 640MB. The physical server on which
> this application is running, has sufficient RAM and CPU.
> - What gets stored in the FieldCache ? Is it the entire document or just
> the document Id ?
>
>
> Any help is much appreciated. Thank you.
>
> regards
> Rahul
>
>
>


Re: Lucene FieldCache - Out of memory exception

2012-05-02 Thread Rahul R
Jack,
Yes, the queries work fine till I hit the OOM. The fields that start with
S_* are strings, F_* are floats, I_* are ints and so on. The dynamic field
definitions from schema.xml:
 
   
   
   
   

*Each FieldCache will be an array with maxdoc entries (your total number of
documents - 1.4 million) times the size of the field value or whatever a
string reference is in your JVM*
So if I understand correct - every field (dynamic or normal) will have its
own field cache. The size of the field cache for any field will be (maxDocs
* sizeOfField) ? If the field has only 100 unique values, will it occupy
(100 * sizeOfField) or will it still be (maxDocs * sizeOfField) ?

*Roughly what is the typical or average length of one of your facet field
values? And, on average, how many unique terms are there within a typical
faceted field?*
Each field length may vary from 10 - 30 characters. Average of 20 maybe.
Number of unique terms within a faceted field will vary from 100 - 1000.
Average of 300. How will the number of unique terms affect performance ?

*3 GB sounds like it might not be enough for such heavy use of faceting. It
is probably not the 50-70 number, but the 440 or accumulated number across
many queries that pushes the memory usage up*
I am using jdk1.5.0_14 - 32 bit. With 32 bit jdk, I think there is a
limitation that more RAM cannot be allocated.

*When you hit OOM, what does the Solr admin stats display say for
FieldCache?*
I don't have solr deployed as a separate web app. All solr jar files are
present in my webapp's WEB-INF\lib directory. I use EmbeddedSolrServer. So
is there a way I can get this information that the admin would show ?

Thank you for your time.

-Rahul


On Wed, May 2, 2012 at 5:19 PM, Jack Krupansky wrote:

> The FieldCache gets populated the first time a given field is referenced
> as a facet and then will stay around forever. So, as additional queries get
> executed with different facet fields, the number of FieldCache entries will
> grow.
>
> If I understand what you have said, theses faceted queries do work
> initially, but after awhile they stop working with OOM, correct?
>
> The size of a single FieldCache depends on the field type. Since you are
> using dynamic fields, it depends on your "dynamicField" types - which you
> have not told us about. From your query I see that your fields start with
> "S_" and "F_" - presumably you have dynamic field types "S_*" and "F_*"?
> Are they strings, integers, floats, or what?
>
> Each FieldCache will be an array with maxdoc entries (your total number of
> documents - 1.4 million) times the size of the field value or whatever a
> string reference is in your JVM.
>
> String fields will take more space than numeric fields for the FieldCache,
> since a separate table is maintained for the unique terms in that field.
> Roughly what is the typical or average length of one of your facet field
> values? And, on average, how many unique terms are there within a typical
> faceted field?
>
> If you can convert many of these faceted fields to simple integers the
> size should go down dramatically, but that depends on your application.
>
> 3 GB sounds like it might not be enough for such heavy use of faceting. It
> is probably not the 50-70 number, but the 440 or accumulated number across
> many queries that pushes the memory usage up.
>
> When you hit OOM, what does the Solr admin stats display say for
> FieldCache?
>
> -- Jack Krupansky
>
> -Original Message- From: Rahul R
> Sent: Wednesday, May 02, 2012 2:22 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Lucene FieldCache - Out of memory exception
>
>
> Here is one sample query that I picked up from the log file :
>
> q=*%3A*&fq=Category%3A%223__**107%22&fq=S_P1540477699%3A%**
> 22MICROCIRCUIT%2C+LINE+**TRANSCEIVERS%22&rows=0&facet=**
> true&facet.mincount=1&facet.**limit=2&facet.field=S_**
> C1503120369&facet.field=S_**P1406389942&facet.field=S_**
> P1430116878&facet.field=S_**P1430116881&facet.field=S_**
> P1406453552&facet.field=S_**P1406451296&facet.field=S_**
> P1406452465&facet.field=S_**C2968809156&facet.field=S_**
> P1406389980&facet.field=S_**P1540477699&facet.field=S_**
> P1406389982&facet.field=S_**P1406389984&facet.field=S_**
> P1406451284&facet.field=S_**P1406389926&facet.field=S_**
> P1424886581&facet.field=S_**P2017662632&facet.field=F_**
> P1946367021&facet.field=S_**P1430116884&facet.field=S_**
> P2017662620&facet.field=F_**P1406451304&facet.field=F_**
> P1406451306&facet.field=F_**P1406451308&facet.field=S_**
> P1500901421&facet.field=S_**P1507138990&facet.field=I_**
> P1406452433&facet.field

Re: how to limit solr indexing to specific number of rows

2012-05-03 Thread Rahul Warawdekar
Hi,

What is the error that you are getting ?
ROWNUM works fine with DIH, I have tried and tested it with Solr 3.1.

One thing that comes to mind is the query that you are using to implement
the ROWNUM limit.
Have you replaced the "<" in the query with the XML entity "&lt;" in
data-config.xml? i.e. "ROWNUM &lt;= 100"?

On Thu, May 3, 2012 at 4:11 PM, srini  wrote:

> I am doing database import using solr DIH. I would like to limit the solr
> indexing to specific number. In other words If Solr reaches indexing 100
> records I want to database import to stop importing.
>
> Not sure if there is any particular setting that would tell solr that I
> only
> want to import 100 rows from database and index those 100 records.
>
> I tried to give select query with ROMNUM<=100 (using oracle) in
> data-config.xml, but it gave error. Any ideas!!!
>
> Thanks in Advance
> Srini
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-to-limit-solr-indexing-to-specific-number-of-rows-tp3960344.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Lucene FieldCache - Out of memory exception

2012-05-07 Thread Rahul R
Jack,
Sorry for the delayed response:
Total memory allocated : 3GB
Free Memory on startup of application server : 2.85GB (95%)
Free Memory after first request by first user(1 request involves 3 queries)
: 2.7GB (90%)
Free Memory after a few requests by same user : 2.52GB (84%)

All values recorded above have been done after 2 force GCs were done to
identify the free memory.

The progression of memory usage looks quite high with the above numbers. As
the number of searches widens, the rate of memory consumption decreases.
But at some point it does hit OOM.

- Rahul

On Thu, May 3, 2012 at 8:37 PM, Jack Krupansky wrote:

> Just for a baseline, how much memory is available in the JVM (using
> jconsole or something similar) before you do your first query, and then
> after your first query (that has these 50-70 facets), and then after a few
> different queries (different facets.) Just to see how close you are to "the
> edge" even before a volume of queries start coming in.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Rahul R
> Sent: Thursday, May 03, 2012 1:28 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Lucene FieldCache - Out of memory exception
>
> Jack,
> Yes, the queries work fine till I hit the OOM. The fields that start with
> S_* are strings, F_* are floats, I_* are ints and so so. The dynamic field
> definitions from schema.xml :
>  omitNorms="true"/>
>   omitNorms="true"/>
>   omitNorms="true"/>
>   omitNorms="true"/>
>   omitNorms="true"/>
>
> *Each FieldCache will be an array with maxdoc entries (your total number of
>
> documents - 1.4 million) times the size of the field value or whatever a
> string reference is in your JVM*
>
> So if I understand correct - every field (dynamic or normal) will have its
> own field cache. The size of the field cache for any field will be (maxDocs
> * sizeOfField) ? If the field has only 100 unique values, will it occupy
> (100 * sizeOfField) or will it still be (maxDocs * sizeOfField) ?
>
> *Roughly what is the typical or average length of one of your facet field
>
> values? And, on average, how many unique terms are there within a typical
> faceted field?*
>
> Each field length may vary from 10 - 30 characters. Average of 20 maybe.
> Number of unique terms within a faceted field will vary from 100 - 1000.
> Average of 300. How will the number of unique terms affect performance ?
>
> *3 GB sounds like it might not be enough for such heavy use of faceting. It
>
> is probably not the 50-70 number, but the 440 or accumulated number across
> many queries that pushes the memory usage up*
>
> I am using jdk1.5.0_14 - 32 bit. With 32 bit jdk, I think there is a
> limitation that more RAM cannot be allocated.
>
> *When you hit OOM, what does the Solr admin stats display say for
> FieldCache?*
>
> I don't have solr deployed as a separate web app. All solr jar files are
> present in my webapp's WEB-INF\lib directory. I use EmbeddedSolrServer. So
> is there a way I can get this information that the admin would show ?
>
> Thank you for your time.
>
> -Rahul
>
>
> On Wed, May 2, 2012 at 5:19 PM, Jack Krupansky **
> wrote:
>
>  The FieldCache gets populated the first time a given field is referenced
>> as a facet and then will stay around forever. So, as additional queries
>> get
>> executed with different facet fields, the number of FieldCache entries
>> will
>> grow.
>>
>> If I understand what you have said, theses faceted queries do work
>> initially, but after awhile they stop working with OOM, correct?
>>
>> The size of a single FieldCache depends on the field type. Since you are
>> using dynamic fields, it depends on your "dynamicField" types - which you
>> have not told us about. From your query I see that your fields start with
>> "S_" and "F_" - presumably you have dynamic field types "S_*" and "F_*"?
>> Are they strings, integers, floats, or what?
>>
>> Each FieldCache will be an array with maxdoc entries (your total number of
>> documents - 1.4 million) times the size of the field value or whatever a
>> string reference is in your JVM.
>>
>> String fields will take more space than numeric fields for the FieldCache,
>> since a separate table is maintained for the unique terms in that field.
>> Roughly what is the typical or average length of one of your facet field
>> values? And, on average, how many unique terms are there within a typical
>> faceted field?
>>
>> If you can convert many of these faceted fields to simple integers the
>> size should

Re: Lucene FieldCache - Out of memory exception

2012-05-08 Thread Rahul R
An update on the things I tried today. Since multiValued fields do not use
the fieldCache, I changed my schema to define all my fields as multiValued.
Although these fields only need to be single valued, I made this change,
recreated the index and tested with it. Observations:
- force GC always results in freeing up most of the heap i.e the FieldCache
doesn't seem to be created. So OOM issue does not occur.
- response time is terribly slow for faceting queries. Application is
almost unusable and system monitoring shows high CPU usage.
- using solr caches - documentCache, filterCache & queryResultsCache - does
not seem to improve performance. Cache sizes are documentCache - 100K,
filterCache - 10K, queryResultsCache - 10K.

I don't think I can use this as a solution because response times are very
poor. But a few questions :
- solr documentation indicates that the fieldCache gets built up on sorting
and function queries only. When I use single-valued fields, I don't do any
explicit sorting or use any functions. Could there be some setting that
results in automatic sorting to happen on the result set (although I don't
want a sort) ?
- is there a way I can improve faceting performance with all my fields as
multiValued fields ?
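
For context, the filterCache mentioned above is the solrconfig.xml entry
sketched below. With all fields multiValued, Solr 1.3 facets by intersecting
one cached filter per unique term, so to be effective the cache size needs to
be at least the total number of unique facet terms; the numbers below are
only an illustration:

    <filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="0"/>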

Appreciate any help on this. Thank you.

- Rahul

On Mon, May 7, 2012 at 7:23 PM, Rahul R  wrote:

> Jack,
> Sorry for the delayed response:
> Total memory allocated : 3GB
> Free Memory on startup of application server : 2.85GB (95%)
> Free Memory after first request by first user(1 request involves 3
> queries) : 2.7GB (90%)
> Free Memory after a few requests by same user : 2.52GB (84%)
>
> All values recorded above have been done after 2 force GCs were done to
> identify the free memory.
>
> The progression of memory usage looks quite high with the above numbers.
> As the number of searches widen, the speed of memory consumption decreases.
> But at some point it does hit OOM.
>
> - Rahul
>
>
> On Thu, May 3, 2012 at 8:37 PM, Jack Krupansky wrote:
>
>> Just for a baseline, how much memory is available in the JVM (using
>> jconsole or something similar) before you do your first query, and then
>> after your first query (that has these 50-70 facets), and then after a few
>> different queries (different facets.) Just to see how close you are to "the
>> edge" even before a volume of queries start coming in.
>>
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Rahul R
>> Sent: Thursday, May 03, 2012 1:28 AM
>>
>> To: solr-user@lucene.apache.org
>> Subject: Re: Lucene FieldCache - Out of memory exception
>>
>> Jack,
>> Yes, the queries work fine till I hit the OOM. The fields that start with
>> S_* are strings, F_* are floats, I_* are ints and so so. The dynamic field
>> definitions from schema.xml :
>> > omitNorms="true"/>
>>  > omitNorms="true"/>
>>  > omitNorms="true"/>
>>  > omitNorms="true"/>
>>  > omitNorms="true"/>
>>
>> *Each FieldCache will be an array with maxdoc entries (your total number
>> of
>>
>> documents - 1.4 million) times the size of the field value or whatever a
>> string reference is in your JVM*
>>
>> So if I understand correct - every field (dynamic or normal) will have its
>> own field cache. The size of the field cache for any field will be
>> (maxDocs
>> * sizeOfField) ? If the field has only 100 unique values, will it occupy
>> (100 * sizeOfField) or will it still be (maxDocs * sizeOfField) ?
>>
>> *Roughly what is the typical or average length of one of your facet field
>>
>> values? And, on average, how many unique terms are there within a typical
>> faceted field?*
>>
>> Each field length may vary from 10 - 30 characters. Average of 20 maybe.
>> Number of unique terms within a faceted field will vary from 100 - 1000.
>> Average of 300. How will the number of unique terms affect performance ?
>>
>> *3 GB sounds like it might not be enough for such heavy use of faceting.
>> It
>>
>> is probably not the 50-70 number, but the 440 or accumulated number across
>> many queries that pushes the memory usage up*
>>
>> I am using jdk1.5.0_14 - 32 bit. With 32 bit jdk, I think there is a
>> limitation that more RAM cannot be allocated.
>>
>> *When you hit OOM, what does the Solr admin stats display say for
>> FieldCache?*
>>
>> I don't have solr deployed as a separate web app. All solr jar files are
>> present in my webapp's WEB-INF\lib directory. I use EmbeddedSolrServer. So
>> is there a way I can get
