field type definition
Hello,

If I define a field like this in the schema, is it correct? <http://sites.google.com/a/impelsys.com/search/phrase-match#>

Here I am not differentiating between the query analyzer and the index analyzer, and I am assuming that the same analyzer will be used at both index time and query time. Is this correct?

Regards
Revas
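For reference, a minimal sketch of the kind of definition being asked about (the actual field type from the linked page is not shown here, so the type name and filter chain below are illustrative assumptions): when a <fieldType> declares a single <analyzer> element with no type attribute, Solr does apply it to both indexing and querying.

  <!-- A single <analyzer> (no type="index" / type="query") is used for
       both indexing and querying. -->
  <fieldType name="phrase_match" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>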
Help on this parsed query
I have the text analyzer defined as follows. When I search on the field named "simple" (of the above field type) for the term peRsonal, I expect it to be parsed as

simple:personal simple:pe simple:rsonal

Instead, the parsed query string says:

simple:peRsonal
simple:peRsonal
MultiPhraseQuery(simple:"(person pe) rsonal")
simple:"(person pe) rsonal"

What is this MultiPhraseQuery, and why is this a phrase query instead of a simple term query?

Regards
Revas
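A likely explanation (the actual analyzer definition was not included in the message, so the chain below is an assumption that would reproduce the observed parse): a WordDelimiterFilterFactory with splitOnCaseChange splits peRsonal into pe + Rsonal, catenateWords adds the rejoined token peRsonal at the same position as pe, and the stemmer reduces personal to person. When one input term expands to several tokens with some of them stacked at the same position, the query parser builds a MultiPhraseQuery over the positions instead of a single TermQuery.

  <!-- Assumed analyzer chain that would yield (person pe) rsonal. -->
  <fieldType name="simple_text" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- "peRsonal" -> "pe" + "Rsonal" (case change), plus the
           catenated original "peRsonal" stacked on "pe" -->
      <filter class="solr.WordDelimiterFilterFactory"
              splitOnCaseChange="1" catenateWords="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- stems the catenated "personal" down to "person" -->
      <filter class="solr.EnglishPorterFilterFactory"/>
    </analyzer>
  </fieldType>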
Number of webapps
Hi

I am sure this question has been asked many times and there have been several generic answers, but I am looking for specific answers.

I have a single server, whose configuration I give below, and it is the only server we have at present. The requirement is that every time we create a new website, we create two Solr instances for it: one for content search and one for product search. Both have faceting requirements.

There are about 25 fields in the product schema and about 20 in the content schema. We do not store the content on the server; the content is only indexed.

We currently have 10 websites, which means we have 20 webapps running on this server, each with about 1000 documents and an index of approximately 50 MB. The index size of each is expected to grow continuously as more products are added.

We recently got the following error on creation of a new webapp:

SEVERE: Caught exception (java.lang.OutOfMemoryError: PermGen space) executing org.apache.tomcat.util.net.leaderfollowerworkerthr...@1c2534f, terminating thread
Feb 24, 2009 6:22:16 AM org.apache.tomcat.util.threads.ThreadPool$ControlRunnable run
SEVERE: Caught exception (java.lang.OutOfMemoryError: PermGen space) executing org.apache.tomcat.util.net.leaderfollowerworkerthr...@1c2534f, terminating thread

What would this mean? Given the above, how many such webapps can we have on this server?

*Server config*

OS: Red Hat Enterprise Linux ES 4 - 64 Bit
# Processor: Dual AMD Opteron Dual Core 270 2.0 GHz
# 4GB DDR RAM
# Hard Drive: 73GB SCSI
# Hard Drive: 73GB SCSI

thanks
Re: Number of webapps
Thanks, will try that. I also have the war file for each Solr instance in the home directory of the instance; could that be the problem?

If I were to have a common war file for n instances, would there be any issue?

regards
revas

On 2/25/09, Michael Della Bitta wrote:
>
> It's possible I don't know enough about Solr's internals and there's a
> better solution than this, and it's surprising me that you're running
> out of PermGen space before you're running out of heap, but maybe
> you've already increased the general heap size without tweaking
> PermGen, and loading all the classes involved in loading 20 contexts
> is putting you over. In any case, you might try adding the following
> option to CATALINA_OPTS: -XX:MaxPermSize=256m. If you don't know where
> to put something like that, you might try adding the following line to
> $TOMCAT_HOME/bin/startup.sh:
>
> export CATALINA_OPTS="-XX:MaxPermSize=256m ${CATALINA_OPTS}"
>
> If that value (256) doesn't alleviate the problem, you might try increasing
> it.
>
> Hope that helps,
>
> Michael Della Bitta
>
> [...]
Solr and Zend Lucene
Hi,

I have a requirement where I need to search offline. We are thinking of doing this by storing the index terms in a DB.

Is there a way of accessing the index tokens in Solr 1.3?

The other way is to use Zend_Lucene to read the index files of Solr, as Zend Lucene has methods for doing this. But Zend Lucene is not able to open the Solr index files; the error is "unsupported format".

The final option is to reindex using Zend Lucene and read the index tokens, but facets are not supported by Zend Lucene.

Has anybody done something similar? Please share your thoughts or pointers.

Regards
Revas
change the lucene version
Hi,

If I need to change the Lucene version used by Solr, how can I do this?

Regards
Revas
Re: Number of webapps
HI,

How do I get the info on the current setting of MaxPermSize?

Regards
Sujahta

On 2/27/09, Alexander Ramos Jardim wrote:
>
> Another simple solution for your requirement is to use multicore. This way
> you will have only one Solr webapp loaded, with as many indexes as you need.
>
> See more at http://wiki.apache.org/solr/MultiCore
>
> 2009/2/25 Michael Della Bitta
>
> > Unfortunately, I think the way this works is the container creates a
> > ClassLoader for each context and loads the contents of the .war into
> > that, regardless of whether each context references the same .war
> > file. All those classes are stored in permanent generation space, and
> > I'm fairly sure that if you restart a context individually with the manager
> > application, a new ClassLoader for the context is created and the
> > permanent generation space the old one was consuming is simply leaked.
> >
> > Something that is crazy enough to work might be to unpack the Solr
> > .war and move all the .jar files and class files that don't contain
> > servlet API classes to .jars in $TOMCAT_HOME/lib, and then repack the
> > .war without these files. These would then be loaded by the common
> > classloader once per container, instead of once per context. You can
> > read more about this classloader business here:
> > http://tomcat.apache.org/tomcat-6.0-doc/class-loader-howto.html (might
> > need a different URL depending on the version of Tomcat you're
> > running).
> >
> > Michael
> >
> > [...]
>
> --
> Alexander Ramos Jardim
Re: Solr and Zend Lucene
We will be using SQLite for the DB. This is for a CD version where we need to provide search.

On 3/5/09, Grant Ingersoll wrote:
>
> On Mar 5, 2009, at 3:10 AM, revas wrote:
>
>> Hi,
>>
>> I have a requirement where I need to search offline. We are thinking of
>> doing this by storing the index terms in a DB.
>
> I'm not sure I follow. How is it that Solr would be offline, but your DB
> would be online? Can you explain a bit more the problem you are trying to
> solve?
>
>> Is there a way of accessing the index tokens in Solr 1.3?
>
> Not in 1.3, but trunk does. Have a look at the TermsComponent (
> http://wiki.apache.org/solr/TermsComponent). I suppose if you got things
> in a JSON or binary format, the performance might not be horrible, but it
> will depend on the # of terms in the index. Or, you could get things in
> stages, i.e. all terms between a and b, etc. It might be back-compatible
> with 1.3, but I don't know for sure.
>
> -Grant
Re: Solr and Zend Lucene
The Luke request handler returns all the tokens from the index; is this correct?

On 3/5/09, revas wrote:
>
> We will be using SQLite for the DB. This is for a CD version where we
> need to provide search.
>
> [...]
Luke request handler
Hi,

I just want to confirm my understanding of the Luke request handler. It gives us the raw Lucene index tokens on a field-by-field basis.

What should the query be to return all tokens for a field? Is there any way to return all the tokens across all fields?

Regards
Revas
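As a sketch (the host, port, and field name are placeholders; the default admin mapping is assumed), the LukeRequestHandler takes fl and numTerms parameters. Note that it returns the top N terms per field rather than streaming every term, so "all tokens" really means "as many top terms as numTerms allows":

  # Top terms for one field:
  http://localhost:8080/solr/admin/luke?fl=content&numTerms=100

  # Per-field summaries, with top terms, across all fields:
  http://localhost:8080/solr/admin/luke?numTerms=100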
multicore setup with tomcat
Hi,

I am trying to do a multicore setup. From the Solr 1.3 download I copied the following into a new directory called multicore: core0, core1, solr.xml, and solr.war, and I have defined a Tomcat context fragment for it.

http://localhost:8080/multicore/admin
http://localhost:8080/multicore/admin/core0

The above two URLs give me a resource-not-found error. The solr.xml is the default one from the download.

Please tell me what needs to be changed to make this work in Tomcat.

Regards
Sujatha
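For comparison, a minimal sketch of the two pieces involved, assuming Tomcat 5.5 and the stock Solr 1.3 multicore example (paths are placeholders): a context fragment that points solr/home at the multicore directory, and the default solr.xml listing the cores. Note also that with multicore the per-core admin pages live at /multicore/core0/admin/ (and the cores admin at /multicore/admin/cores) rather than /multicore/admin/core0, which may explain the 404s.

  <!-- $CATALINA_HOME/conf/Catalina/localhost/multicore.xml -->
  <Context docBase="/path/to/multicore/solr.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String"
                 value="/path/to/multicore" override="true"/>
  </Context>

  <!-- /path/to/multicore/solr.xml (default from the download) -->
  <solr persistent="false">
    <cores adminPath="/admin/cores">
      <core name="core0" instanceDir="core0"/>
      <core name="core1" instanceDir="core1"/>
    </cores>
  </solr>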
Sharding question
Hi,

If I were to add a second server for sharding once the first server reaches its limit, and I then need to update a document, how can I figure out which server the document is located on?

Regards
Sujatha
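Distributed search in this era of Solr leaves document-to-shard assignment entirely to the indexer, so the common approach is deterministic hashing on the unique key: the same ID always maps to the same shard, so updates and deletes go to the right server without a lookup. A minimal Java sketch (the class and the shard URL list are hypothetical, not anything shipped with Solr):

  import java.util.List;

  public class ShardRouter {
      private final List<String> shardUrls; // e.g. http://server1:8080/solr, http://server2:8080/solr

      public ShardRouter(List<String> shardUrls) {
          this.shardUrls = shardUrls;
      }

      /** The same uniqueKey always hashes to the same shard, so updates
          and deletes can be sent to the server that indexed the document. */
      public String shardFor(String uniqueKey) {
          int bucket = (uniqueKey.hashCode() & 0x7fffffff) % shardUrls.size();
          return shardUrls.get(bucket);
      }
  }

One caveat: growing from n to n+1 shards changes the mapping, so you would either reindex, use a consistent-hashing scheme, or simply record each document's shard in a DB at index time.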
stop word search
Hi,

I have a query like this:

content:the AND user_id:5

which means: return all docs of user id 5 which have the word "the" in the content. Since "the" is a stop word, this query executes as just user_id:5 in spite of the AND clause. Whereas the expected result here is: since there are no results for "the", no results should be returned.

Am I missing anything here?

Regards
Re: stop word search
Hi Erick,

I have now commented out the query-time stopword filter and restarted the server. But now when I search for a stop word, I am getting results.

We had earlier indexed the content with the stopword filter. I don't think we need to reindex after commenting out the query-time filter, right?

This field is a text field with the default analyzer.

Please let me know if I have missed something here.

Regards
Sujatha

On 3/17/09, Erick Erickson wrote:
>
> Well, by definition, using an analyzer that removes stopwords
> *should* do this at query time. This assumes that you used
> an analyzer that removed stopwords at index and query time.
> The stopwords are not in the index.
>
> You can get the behavior you expect by using an analyzer at
> query time that does NOT remove stopwords, and one at
> indexing time that *does* remove stopwords. But I'm having a
> hard time imagining that this would result in a good user experience.
>
> I mean, anytime you had a stopword in the query where the
> stopword was required, no results would be returned. Which would
> be hard to explain to a user.
>
> What is it you're trying to accomplish?
>
> Best
> Erick
>
> [...]
Re: stop word search
Hi Erick,

I still don't get it. The scenario is like this:

Initially I indexed the content with the stopword filter at both index time and query time. That means the stop words are not in the index.

Now I have removed the stop filter only at query time, so that a query like

content:the AND id:8

will not fetch results; with the query-time stop filter this query becomes just id:8 and returns results.

Why would I have to reindex, as there should not be any stop words in the index in the first place?

Thanks for your time.

Regards

On 3/21/09, Erick Erickson wrote:
>
> Yes, you do need to reindex after removing the stopword filter
> from the configuration. When you indexed the first time using
> the stopword filter, the words were NOT indexed, so they won't
> be found now that they're getting through the query analyzer.
>
> Best
> Erick
>
> [...]
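For reference, a minimal sketch of the kind of field type being discussed (names and filters are illustrative, modeled on the stock text type): separate index- and query-time chains, with stopwords removed while indexing but passed through at query time, so that content:the matches nothing.

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- stopwords stripped from the index -->
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- no StopFilterFactory: "the" survives and finds no indexed term -->
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>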
caching
If I don't explicitly set any default warming query in solrconfig.xml for caching and make use of the default config file, does Solr do the caching automatically based on the query?

Thanks
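For context: the stock solrconfig.xml already declares the three main caches, so caching happens for every query even with no warming queries configured; warming queries only pre-populate the caches after a new searcher opens. Roughly (the sizes below are illustrative and vary by release):

  <filterCache      class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <documentCache    class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>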
Facets drill down
Hi,

I typically issue a facet drill-down query as q=somequery AND facetfield:facetvalue.

Are there any issues with the above approach, as opposed to &fq=facetfield:value, in terms of memory consumption and use of the caches?

Regards
Sujatha
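To make the two forms concrete (field and values are placeholders): folding the constraint into q makes it part of the scored query and of the queryResultCache key, while fq keeps the main query stable and caches the constraint's document set in the filterCache, where it can be reused across many different q values.

  # Drill-down folded into the main query:
  /select?q=laptop AND brand:acme&facet=true&facet.field=brand

  # Drill-down as a filter query (filterCache-friendly):
  /select?q=laptop&fq=brand:acme&facet=true&facet.field=brand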
Multi-language support
Hi,

To reframe my earlier question:

Some languages have only an analyzer but no stemmer from Snowball (Porter); does the analyzer take care of stemming as well?

Some languages have only the Snowball stemmer but no analyzer.

Some have both.

Can we say, then, that Solr supports all the above languages? Will search behave the same across all the above cases?

thanks
revas
Analyzers and stemmer
Hi,

With respect to language support in Solr: we have analyzers for some languages and stemmers for certain languages. Do we say that Solr supports a particular language only if we have both an analyzer and a stemmer for it, or also when we have an analyzer but no stemmer?

Regards
Sujatha
Solr Cache Usage
Hi,

We are running several webapps under a single container, roughly 40-50. All have similar schemas. Under these circumstances, how would I calculate the cache memory allocation? The number of documents per webapp is roughly about 1000 currently, but likely to increase in future. Would it make sense to enable caching with this many apps?

For example, would the filter cache size be the number of unique facet fields in each of the webapps?

For the document cache, size = max results * max concurrent users. Would this be based on the average, or would it be multiplied across all the webapps combined, in which case we would need at least that much RAM?

Suppose we have only 2 GB RAM on a dual core; what happens then with the document cache entry the wiki typically recommends? How would we have documents from each webapp in the cache? Would this value need to be multiplied by the number of webapps?

Thanks
Regards
Revas
Compound file format
What is the drawback of using the compound file format for indexing when we have several webapps in a single container?

Regards
Sujatha
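For context, the compound file format packs each segment's files into a single .cfs file, trading some indexing CPU for far fewer open file descriptors, which matters with many indexes in one container. A sketch of the toggle in solrconfig.xml (the surrounding section is the stock indexDefaults block):

  <indexDefaults>
    <!-- true: each segment is one .cfs file; fewer open files,
         at the cost of somewhat slower indexing -->
    <useCompoundFile>true</useCompoundFile>
  </indexDefaults>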
query issue / special character and case
Hi,

When I give a query like the following, why does it become a phrase query, as shown below? The field type is the default text field in the schema.

volker-blanz
PhraseQuery(content:"volker blanz")

Also, when I have special characters in the query, as in SCHÖLKOPF, I am not able to convert the "ö" to lower case on my Unix OS; it works fine on Windows XP. And if I have a special character in my query, I would like to be able to search for it without the special character, as SCHOLKOPF. This works fine on Windows with strtr (the string-translate PHP function), but again not on the Unix OS.

Any pointers?

Regards
Revas
Re: query issue / special character and case
On Sat, Jun 6, 2009 at 11:40 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:

> On Sat, May 30, 2009 at 9:48 AM, revas wrote:
>
> > Hi,
> >
> > When I give a query like the following, why does it become a phrase query,
> > as shown below? The field type is the default text field in the schema.
> >
> > volker-blanz
> > PhraseQuery(content:"volker blanz")
>
> What is the query that was sent to Solr?

The query is content:volker-blanz, and this is a default text field.

> > Also, when I have special characters in the query, as in SCHÖLKOPF, I am
> > not able to convert the "ö" to lower case on my Unix OS; it works fine on
> > Windows XP. And if I have a special character in my query, I would like to
> > search for it without the special character, as SCHOLKOPF. This works fine
> > on Windows with strtr (the string-translate PHP function), but again not
> > on the Unix OS.
>
> Hmm, not sure. If you are using Tomcat, have you enabled UTF-8?
>
> http://wiki.apache.org/solr/SolrTomcat#head-20147ee4d9dd5ca83ed264898280ab60457847c4
>
> You can try using analysis.jsp on the text field with this token and see
> how it is being analyzed. See if that gives some hints.

Yes, I am using Tomcat and have enabled UTF-8 in Tomcat.

> --
> Regards,
> Shalin Shekhar Mangar.
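For reference, the UTF-8 setting the SolrTomcat wiki page refers to is the URIEncoding attribute on the HTTP connector in server.xml (the port is whatever your install uses); without it, Tomcat decodes query-string bytes as ISO-8859-1 and characters like Ö arrive corrupted:

  <!-- $TOMCAT_HOME/conf/server.xml -->
  <Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8"/>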
spellcheck / too many open files
Hi,

1) Does the spell check component support all languages?

2) I have a scenario where I have about 20 webapps in a single container. We get "too many open files" at index time and while restarting Tomcat. The mergeFactor is at the default.

If I reduce the mergeFactor to 2 and optimize the index, will the open files be closed automatically, or would I have to reindex to close the open files, or how do I close the already-opened files? This is on Linux with Solr 1.3 and Tomcat 5.5.

Regards
Revas
Re: spellcheck / too many open files
But the spell check component uses the n-gram analyzer and hence should work for any language; is this correct? Also, we can refer to an external dictionary for suggestions; could this be in any language?

The open files are not because of spell check, as we have not implemented that yet. Every time we restart Solr we need to raise the ulimit, otherwise it does not work. So is there any workaround to permanently close these open files? Does optimizing the index close them?

Regards
Sujatha

On Tue, Jun 9, 2009 at 12:53 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:

> On Tue, Jun 9, 2009 at 11:15 AM, revas wrote:
>
> > 1) Does the spell check component support all languages?
>
> SpellCheckComponent relies on Lucene/Solr analyzers and tokenizers. So if
> you can find an analyzer/tokenizer for your language, spell checker can
> work.
>
> > 2) I have a scenario where I have about 20 webapps in a single container.
> > We get "too many open files" at index time and while restarting Tomcat.
>
> Is that because of SpellCheckComponent?
>
> > The mergeFactor is at the default.
> >
> > If I reduce the mergeFactor to 2 and optimize the index, will the open
> > files be closed automatically, or would I have to reindex to close the
> > open files, or how do I close the already-opened files? This is on Linux
> > with Solr 1.3 and Tomcat 5.5.
>
> Lucene/Solr does not keep any file opened longer than it is necessary. But
> decreasing merge factor should help. You can also increase the open file
> limit on your system.
>
> --
> Regards,
> Shalin Shekhar Mangar.
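A sketch of the two knobs discussed, assuming a stock Solr 1.3 setup (the limit value and paths are illustrative): a lower mergeFactor keeps fewer segments, and therefore fewer files, per index, and the per-process descriptor limit can be raised for the user that runs Tomcat.

  <!-- solrconfig.xml: fewer segments per index means fewer open files -->
  <mergeFactor>2</mergeFactor>

  # Raise the descriptor limit before starting Tomcat (or persist it for
  # the tomcat user in /etc/security/limits.conf):
  ulimit -n 65536
  $TOMCAT_HOME/bin/startup.sh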
Re: spellcheck / too many open files
Thanks Shalin. When we use the external file dictionary (if there is one), then it should work fine for spell check, right? Also, is there any format for this file?

Regards
Sujatha

On Tue, Jun 9, 2009 at 3:03 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:

> On Tue, Jun 9, 2009 at 2:56 PM, revas wrote:
>
> > But the spell check component uses the n-gram analyzer and hence should
> > work for any language; is this correct? Also, we can refer to an external
> > dictionary for suggestions; could this be in any language?
>
> Yes, it does use n-grams, but there's an analysis step before the n-grams
> are created. For example, if you are creating your spell check index from a
> Solr field, SpellCheckComponent uses that field's index-time analyzer. So
> you should create your language-specific fields in such a way that the
> analysis works correctly for that language.
>
> > The open files are not because of spell check, as we have not implemented
> > that yet. Every time we restart Solr we need to raise the ulimit,
> > otherwise it does not work. So is there any workaround to permanently
> > close these open files? Does optimizing the index close them?
>
> Optimization merges the segments of the index into one big segment. So it
> will reduce the number of files. However, during the merge it may create
> many more files. The old files are cleaned up by Lucene a while after the
> merge (unless you have changed the defaults in the IndexDeletionPolicy
> section in solrconfig.xml).
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: spellcheck / too many open files
Thanks

On Tue, Jun 9, 2009 at 5:14 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:

> On Tue, Jun 9, 2009 at 4:32 PM, revas wrote:
>
> > Thanks Shalin. When we use the external file dictionary (if there is
> > one), then it should work fine for spell check, right? Also, is there any
> > format for this file?
>
> The external file should have one token per line. See
> http://wiki.apache.org/solr/FileBasedSpellChecker
>
> The default analyzer is WhitespaceAnalyzer. So all tokens in the file will
> be split on whitespace and the resulting tokens will be used for giving
> suggestions. If you want to change the analyzer, specify fieldType in the
> spell checker configuration and the component will use the analyzer
> configured for that field type.
>
> --
> Regards,
> Shalin Shekhar Mangar.
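A minimal configuration sketch of what Shalin describes, following the FileBasedSpellChecker wiki page (the dictionary path, index dir, and field type name are placeholders):

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">file</str>
      <str name="classname">solr.FileBasedSpellChecker</str>
      <!-- plain text file, one token per line -->
      <str name="sourceLocation">spellings.txt</str>
      <str name="characterEncoding">UTF-8</str>
      <!-- analyzer for the tokens is taken from this field type -->
      <str name="fieldType">textSpell</str>
      <str name="spellcheckIndexDir">./spellcheckerFile</str>
    </lst>
  </searchComponent>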
Re: Customizing results
Hi Michael,

What is GNU gettext, and how can it be used in a multilanguage scenario?

Regards
Revas

On Wed, Jun 10, 2009 at 8:10 PM, Michael Ludwig wrote:

> Manepalli, Kalyan schrieb:
>
>> Hi,
>> I am trying to customize the response that I receive from Solr. In the
>> index I have multiple fields that contain the same data in different
>> languages.
>> At query time the client specifies the language. Based on this param,
>> I want to return the value, copied into a different field.
>> E.g.:
>> Lubang, Filippinerne
>> Lubang, Philippinen
>> Lubang, Philippines
>> Lubang, Filipinas
>>
>> If the user specifies the language as de_de, then I want to return the
>> result as Lubang, Philippinen
>
> If you control how the client works, you could also consider using an
> internationalization technology such as GNU Gettext for this purpose.
> May or may not make sense in your particular situation.
>
> Michael Ludwig
solr Analyzer help
Hi,

In the Solr 1.3 download, under the folder src/java/org/apache/solr/analysis, I find the following tokenizer classes for languages other than English:

1. Chinese tokenizer
2. CJK tokenizer, which is not expected to work very well with Japanese; for Chinese we already have the Chinese tokenizer

Only the above two tokenizers are there for those languages.

I also see stem filter factories for some languages, like DutchStemFilterFactory, BrazilianStemFilterFactory, GermanStemFilterFactory, etc., and plain filter factories like ChineseFilterFactory.

What does the stem filter factory do? Does it stem the words without including the SnowballPorterFilterFactory? And what do the plain filter factories do?

Where do I look for analyzers for other languages, and also for information on which languages the standard analyzers can be used for?

For example, given only all the above, for German-language analysis am I to use the standard analyzer with the German filter factory and German stemmer?

Are there more language-specific tokenizers in Lucene, and if so, what are the steps to integrate them into Solr?

Regards
Revas
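As an illustration of how these factories compose (a sketch only: the field type name is made up, and the stock StandardTokenizer is assumed, since there is no dedicated German tokenizer), a German field in Solr 1.3 could be assembled like this:

  <fieldType name="text_de" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- either the dedicated German stemmer... -->
      <filter class="solr.GermanStemFilterFactory"/>
      <!-- ...or, alternatively, the generic Snowball factory:
           <filter class="solr.SnowballPorterFilterFactory" language="German"/> -->
    </analyzer>
  </fieldType>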
Migration to Solr 1.4
Hello,

I would like to know: by just copying the solr.war file over my existing Solr 1.3 installation, is the Lucene version also upgraded to the current 2.9?

I believe a reindex is not necessary; is that correct?

Is there anything else apart from this that I need to do to upgrade to the latest Lucene version?

Regards
Sujatha
Re: Migration to Solr 1.4
Thanks, Erik.

On Fri, Jan 8, 2010 at 4:34 PM, Erik Hatcher wrote:

> On Jan 8, 2010, at 4:14 AM, revas wrote:
>
>> I would like to know: by just copying the solr.war file over my existing
>> Solr 1.3 installation, is the Lucene version also upgraded to the current 2.9?
>
> Yes, Lucene 2.9 is built into solr.war, so you're automatically upgrading
> that too.
>
>> I believe a reindex is not necessary; is that correct?
>
> Correct.
>
> Though for peace of mind it isn't a bad idea to reindex. But your testing
> will tell you all is well, or not.
>
>> Is there anything else apart from this that I need to do to upgrade to the
>> latest Lucene version?
>
> I'd encourage you to compare your solrconfig.xml and schema.xml files to
> the ones that ship with Solr 1.4's example. You may want to adjust your
> configurations a bit.
>
> Erik
Overlapping onDeckSearchers=2
Hello,

We have a server with many Solr instances running (around 40-50). We are committing documents, sometimes one and sometimes around 200 documents at a time, to only one instance at a time.

When I run 2-3 commits in parallel to different instances, or to the same instance, I get this error:

PERFORMANCE WARNING: Overlapping onDeckSearchers=2

What is the best approach to solve this?

Regards
revas
Re: Overlapping onDeckSearchers=2
Thanks for the response. What happens in this scenario? Does the commit happen in this case, or does the search server hang, or does it just throw an error without committing?

Regards
Sujatha

On Mon, May 3, 2010 at 11:41 PM, Chris Hostetter wrote:

> : When i run 2 -3 commits parallely to diff instances or same instance I get
> : this error
> :
> : PERFORMANCE WARNING: Overlapping onDeckSearchers=2
> :
> : What is the Best approach to solve this
>
> http://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F
>
> -Hoss
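For context on the usual fixes (the value below is illustrative): the warning means commits are arriving faster than searcher warm-up completes, so the remedies are to space out or batch commits, reduce autowarming, and/or cap concurrent warming searchers in the <query> section of solrconfig.xml. If the cap is exceeded, opening the new searcher fails with an error, but the documents are still indexed and become visible on the next successful commit.

  <!-- solrconfig.xml: refuse to open a new searcher while 2 are already warming -->
  <maxWarmingSearchers>2</maxWarmingSearchers>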
Re: DocValue field & commit
Thanks, Erick.

1) We are using dynamic string fields for faceting, with indexed=false and stored=false. By default docValues are enabled for primitive fields (Solr 6.6), so they are not explicitly defined in the schema. Do you think that is a wrong assumption? Also, I do not see this field listed in the field cache, but then I don't see any dynamic fields listed there at all.

2) Autowarm count is at 32 for both, and autowarm time is 25 for queryresult and 17

3) Can you elaborate on what you mean here?

On Mon, Mar 30, 2020 at 1:43 PM Erick Erickson wrote:

> Response spikes after commits are almost always something to do
> with autowarming or docValues being set to false. So here's what
> I'd look at, in order.
>
> 1> Are the fields used defined with docValues=true? They should be.
> With this much variance it sounds like you don't have that value set.
> You'll have to rebuild your entire index, first deleting all documents…
>
> You assert that they are all docValues, but the variance is so
> high that I wonder whether they _all_ are. They may very well be, but
> I've been tripped up by things I know are true that aren't too often ;)
>
> You can ensure this by setting 'uninvertible="true"' in your field type,
> see: https://issues.apache.org/jira/browse/SOLR-12962 if you're on
> 7.6 or later.
>
> 2> What are your autowarming settings for queryResultCache and/or
> filterCache? Start with a relatively small number, say 16, and look at
> your autowarm times to ensure they aren't excessive.
>
> 3> If autowarming doesn't help, consider specifying a newSearcher
> event in solrconfig.xml that exercises the facets.
>
> NOTE: <2> and <3> will mask any fields that are docValues=false that
> slipped through the cracks, so I'd double check <1> first.
>
> Best,
> Erick
>
> > On Mar 30, 2020, at 12:20 PM, sujatha arun wrote:
> >
> > A facet-heavy query which uses docValues fields for faceting returns
> > about 5k results and executes in between 10ms and 5 secs, and the 5 secs
> > time seems to coincide with after a hard commit.
> >
> > Does that have any relation? Why the fluctuation in execution time?
> >
> > Thanks,
> > Revas
Re: DocValue field & commit
Correcting some typos...

Thanks, Erick.

1) We are using dynamic string fields for faceting, with indexed=false and stored=false. By default docValues are enabled for primitive fields (Solr 6.6), so they are not explicitly defined in the schema. Do you think that is a wrong assumption? Also, I do not see this field listed in the field cache, but then I don't see any dynamic fields listed there at all.

2) Autowarm count is at 32 for both, and autowarm time is 25 for the query-result cache and 1724 for the filter cache.

3) Can you elaborate on what you mean here? We have a hard commit every 5 mins with openSearcher=false and a soft commit every 2 secs.

On Mon, Mar 30, 2020 at 4:06 PM Revas wrote:
> [...]
Re: DocValue field & commit
Thanks, Erick.

The processing time based on debugQuery, split between the query and the facets, is as follows:

query: 10ms
facets: 4900ms

Since most of the time is spent on facet processing (docValues enabled), the query and filter caches do not apply to this, correct?

- Autowarm count is at 32 for both, and autowarm time is 25 for the query-result cache and 1724 for the filter cache.
- We have a hard commit every 5 mins with openSearcher=false and a soft commit every 2 secs.
- The facets are a mix of pivot facets, range facets, and facet queries.
- When the same facet criteria produce a smaller result set, the response is much faster.

On Mon, Mar 30, 2020 at 4:47 PM Erick Erickson wrote:

> OK, sounds like docValues is set.
>
> Sure. In solrconfig.xml, there are two sections, "firstSearcher" and
> "newSearcher". These are queries (or lists of queries) that are fired as
> part of autowarming when Solr is first started (firstSearcher) or when a
> commit happens that opens a new searcher (newSearcher). These are
> hand-crafted static queries. So create one or more newSearcher sections in
> that block that exercise your faceting and they'll be fired as part of
> autowarming. That should smooth out the delay your users experience when
> commits happen.
>
> Best,
> Erick
>
> [...]
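A sketch of the newSearcher event Erick describes (the query and facet field below are placeholders for whatever your real facet-heavy request looks like); it goes inside the <query> section of solrconfig.xml:

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <!-- fire a representative facet query so its structures are warm
           before the new searcher serves live traffic -->
      <lst>
        <str name="q">*:*</str>
        <str name="facet">true</str>
        <str name="facet.field">category_s</str>
      </lst>
    </arr>
  </listener>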
Re: DocValue field & commit
Hi Erick,

Thanks. We do have an NRT requirement in our application that updates be immediately visible, and we have constant updates. The push is for even faster visibility, but we are holding off at a 2-sec soft commit for now.

What I am not able to understand is that, as per query debugging, the facet processing time varies between a few ms and several secs. Why would there be variability in facet processing time if the facets are based on docValues, and how would a newSearcher event help?

We do have an 8-core CPU and a lot of RAM in our server, as we host multiple collections.

On Mon, Mar 30, 2020 at 7:08 PM Erick Erickson wrote:

> Oh dear. Your autowarming is almost, but not quite totally, useless given
> your 2 second soft commit interval. See:
>
> https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> So autowarming is probably not a cure. When you originally said "commit" I
> was assuming that was one that opened a new searcher; if that's not true,
> autowarming isn't a cure.
>
> Do you _really_ require 2 second soft commit intervals? I would not be
> surprised if you also see "too many on deck searchers" warnings in your
> logs at times. This is one of my hot buttons; having very short soft
> commit intervals is something people do without understanding the
> tradeoffs, one of which is that your caches are probably getting a poor
> utilization rate. Often the recommendation for short intervals like this
> is to not use the caches at all.
>
> The newSearcher is a full query. Go ahead and add facets. But again, this
> probably isn't going to help much.
>
> But really, revisit your autocommit settings. Taking 1.7 seconds to
> autowarm means that you have roughly this:
> - commit
> - 1.7 seconds later, the new searcher is open for business.
> - 0.3 seconds after that a new searcher is open, which takes another 1.7
> seconds to autowarm.
>
> I doubt your hard commit is really the culprit here _unless_ you're
> running on an under-powered machine. The hard commit will trigger segment
> merging, which is CPU and I/O intensive. If you're using a machine that
> can't afford the cycles to be taken up by merging, that could account
> for what you see, but new searchers are being opened every 2 seconds
> (assuming a relatively constant indexing load).
>
> Best,
> Erick
>
> [...]
searcher
Hi

I am seeing searchers referenced in my logs as "main" and "realtime". Do they correspond to hard vs. soft commit? I do not see a correlation to that based on our commit settings.

Opening [Searcher@538abc62[xx_shard1_replica2] main]
Opening [Searcher@2e151991[xx_shard1_replica1] realtime]

Thanks
facets & docValues
We have faceting fields that have been defined as indexed=false, stored=false, and docValues=true.

However, we use a lot of subfacets via JSON facets, and facet ranges via facet.queries. We see that after every soft commit our performance worsens, and it is ideal between commits.

How is it that docValues fields are affected by a soft commit? And do we need to enable indexing if we use subfacets and facet queries, to improve performance?

Thanks
Re: facets & docValues
Hi Erick,

You are correct, we have only about 1.8M documents so far, and turning on indexing for the facet fields helped improve the timings of the facet query (which has subfacets and facet queries) a lot. So do docValues help at all for subfacets and facet queries? Our tests revealed further query-time improvement when we turned off docValues. Is that the right approach?

Currently we have only 1 shard, and we are thinking of scaling by increasing the number of shards when we see a deterioration in query time. Any suggestions?

Thanks.

On Wed, Apr 15, 2020 at 8:21 AM Erick Erickson wrote:

> In a word, "yes". I also suspect your corpus isn't very big.
>
> I think the key is the facet queries. Now, I'm talking from
> theory rather than diving into the code, but querying on
> a docValues=true, indexed=false field is really doing a
> search. And searching on a field like that is effectively
> analogous to a table scan. Even if somehow an internal
> structure would be constructed to deal with it, it would
> probably be on the heap, where you don't want it.
>
> So the test would be to take the queries out and measure
> performance, but I think that's the root issue here.
>
> Best,
> Erick
>
> [...]
Re: facets & docValues
Hi Erick,

Thanks for the explanation and advice. With facet queries, do docValues help at all? We tested two configurations:

1) indexed=true, docValues=true => all facets

2)
- indexed=true, docValues=true => only for subfacets
- indexed=true, docValues=false => facet query
- docValues=true, indexed=false => term facets

In case 1, indexing slowed considerably, but overall facet performance improved many fold.
In case 2, overall performance showed only slight improvement.

Does that mean turning on docValues even for facet queries helps improve performance? Is fetching from docValues for a facet query faster than fetching from stored fields?

Thanks

On Thu, Apr 16, 2020 at 1:50 PM Erick Erickson wrote:

> DocValues should help when faceting over fields, i.e. facet.field=blah.
>
> I would expect docValues to help with subfacets too, but I don't know
> the code well enough to say definitively one way or the other.
>
> The empirical approach would be to set "uninvertible=true" (Solr 7.6) and
> turn docValues off. What that means is that if any operation tries to
> uninvert the index on the Java heap, you'll get an exception like:
> "can not sort on a field w/o docValues unless it is indexed=true
> uninvertible=true and the type supports Uninversion:"
>
> See SOLR-12962.
>
> Speed is only one issue. The entire point of docValues is to not
> "uninvert" the field on the heap. This used to lead to very significant
> memory pressure. So when turning docValues off, you run the risk of
> reverting back to the old behavior and having unexpected memory
> consumption, not to mention slowdowns when the uninversion takes place.
>
> Also, unless your documents are very large, this is a tiny corpus. It can
> be quite hard to get realistic numbers; the signal gets lost in the noise.
>
> You should only shard when your individual query times exceed your
> requirement. Say you have a 95th-percentile requirement of 1 second
> response time. Let's further say that you can meet that requirement at 50
> queries/second, but at 75 queries/second your response time exceeds your
> requirements. Do NOT shard at this point. Add another replica instead.
> Sharding adds inevitable overhead and should only be considered when
> you can't get adequate response time even under fairly light query loads,
> as a general rule.
>
> Best,
> Erick
>
> [...]
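To make the winning configuration concrete, a sketch of the field definition from case 1 (the dynamic-field name is a placeholder): the faceted fields carry both indexed=true, so facet queries can actually search them, and docValues=true, so term faceting stays off the heap.

  <dynamicField name="*_facet_s" type="string"
                indexed="true" stored="false" docValues="true"/>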
Re: facets & docValues
Hi Joel,

No, we have not; we have a soft commit requirement of 2 secs.

On Tue, May 5, 2020 at 3:31 PM Joel Bernstein wrote:

> Have you configured static warming queries for the facets? This will warm
> the cache structures for the facet fields. You just want to make sure your
> commits are spaced far enough apart that the warming completes before a
> new searcher starts warming.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> [...]
Re: when to use docvalue
Erick,

Can you also explain how to optimize facet queries and range facets, as they don't use docValues and contribute to higher response time?

On Tue, May 19, 2020 at 5:55 PM Erick Erickson wrote:

> They are _absolutely_ able to be used together. Background:
>
> "In the bad old days", there was no docValues. So whenever you needed
> to facet/sort/group/use function queries, Solr (well, Lucene) had to take
> the inverted structure resulting from "indexed=true" and "uninvert" it on
> the Java heap.
>
> docValues essentially does the "uninverting" at index time and puts
> that structure in a separate file for each segment. So rather than
> uninvert the index on the heap, Lucene can just read it in from disk in
> MMapDirectory (i.e. OS) memory space.
>
> The downside is that your index will be bigger when you do both; that is,
> the size on disk will be bigger. But it'll be much faster to load, much
> faster to autowarm, and will move the structures necessary for
> faceting/sorting/etc. into OS memory, where the garbage collection is
> vastly more efficient than Java's.
>
> And frankly I don't think the increased size on disk is a downside. You'll
> have to have the memory anyway, and having it used in the OS memory space
> is so much more efficient than on Java's heap that it's a win-win IMO.
>
> Oh, and if you never sort/facet/group/use function queries, then the
> docValues structures are never even read into MMapDirectory space.
>
> So yes, freely do both.
>
> Best,
> Erick
>
> > On May 19, 2020, at 5:41 PM, matthew sporleder wrote:
> >
> > You can index AND docvalue? For some reason I thought they were
> > exclusive.
> >
> > On Tue, May 19, 2020 at 5:36 PM Erick Erickson wrote:
> >>
> >> Yes. You should also index them….
> >>
> >> Here's the way I think of it.
> >>
> >> For questions like "For term X, which docs contain that value?" you
> >> need indexed=true. This is a search.
> >>
> >> For questions like "Does doc X have value Y in field Z?" you need
> >> docValues=true.
> >>
> >> What's the difference? Well, the first one is to get the result set.
> >> The second is for, given a result set, count/sort/whatever.
> >>
> >> fq clauses are searches, so indexed=true.
> >>
> >> Sorting, faceting, grouping and function queries are "for each doc in
> >> the result set, what values does field Y contain?"
> >>
> >> Maybe that made things clear as mud, but it's the way I think of it ;)
> >>
> >> Best,
> >> Erick
> >>
> >>> On May 19, 2020, at 4:00 PM, matthew sporleder wrote:
> >>>
> >>> I have quite a few numeric / meta-data type fields in my schema and
> >>> pretty much only use them in fq=, sort=, and friends. Should I always
> >>> use docValues on these if I never plan to q=search on them? Are there
> >>> any drawbacks?
> >>>
> >>> Thanks,
> >>> Matt
Re: when to use docvalue
Thanks, Erick. It's just that when we enable both indexed=true and docValues=true, it increases the indexing time by at least 2x for a full reindex.

On Wed, May 20, 2020 at 2:30 PM Erick Erickson wrote:

> Revas:
>
> Facet queries are just queries that are constrained by the total result
> set of your primary query, so the answer to that would be the same as
> speeding up regular queries. As far as range facets are concerned, I
> believe they _do_ use docValues; after all, they have to answer the exact
> same question: for doc X in the result set, what is the value of field Y?
> The only difference is it has to bucket a bunch of them.
>
> Rahul: Please don't hijack threads, it makes it difficult to find things
> later. Start a separate e-mail thread.
>
> The answer to your question is, of course, "it depends" on a number of
> things, and changes with the query. First of all, multiValued fields don't
> qualify, because docValues are a sorted set, meaning the return is sorted
> and deduplicated. So if the input has the values b c d c d, what you'd get
> back from DV is b c d.
>
> So let's go with primitive, single-valued types. It still depends, but
> Solr does the right thing, or tries. Here's the scoop: the stored fields
> for any single doc are stored as a contiguous, compressed block. So if any
> _one_ field needs to be read from the stored data, the entire block is
> decompressed, and Solr will preferentially fetch the value from the
> decompressed data, as it's pretty certain to be at least as cheap as
> fetching from DV. However, the reverse is true if _all_ the returned
> values are single-valued DV fields. Then it's more efficient to fetch
> the DV values, as they're MMapped and won't cost the seek-and-decompress
> cycle.
>
> Unless space is a real consideration for you, I'd set both indexed and
> docValues to true…
>
> Best,
> Erick
>
> > On May 20, 2020, at 10:45 AM, Rahul Goswami wrote:
> >
> > Erick,
> > Thanks for that explanation. I have a follow-up question on that. I find
> > the scenario of stored=true and docValues=true to be tricky at times...
> > I would like to know when each of these scenarios is preferred over the
> > other two for primitive datatypes:
> >
> > 1) stored=true and docValues=false
> > 2) stored=false and docValues=true
> > 3) stored=true and docValues=true
> >
> > Thanks,
> > Rahul
> >
> > [...]
Collection Creation across DC
Hello,

Can we create a collection across data centers (with a shard replica in a different data center) for HA?

Thanks
Revas