Russian stopwords

2008-12-05 Thread tushar kapoor

I am trying to filter Russian stopwords but have not been successful with
that. I am using the following schema entry -

[schema excerpt stripped by the mail archive; the analyzer chain included a
solr.StopFilterFactory with words="stopwords.txt" and a
solr.SynonymFilterFactory with ignoreCase="true" expand="false"]

Interestingly, Russian synonyms are working fine. English and Russian
synonyms get searched correctly.

Also, if I add an English-language word to stopwords.txt it gets filtered
correctly. It's the Russian words that are not getting filtered as stopwords.

Can someone explain this behaviour?

Thanks,
Tushar.
-- 
View this message in context: 
http://www.nabble.com/Russian-stopwords-tp20851093p20851093.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: new faceting algorithm

2008-12-05 Thread Till Kinstler

Yonik Seeley schrieb:


We'd love some feedback on how it works to
ensure that it actually is a win for the majority and should be the
default.


I just did a quick test using Solr nightly 2008-11-30. I have an index 
of about 2.9 mil bibliographic records, size: 16G. I tested faceting 
author names; each index document may contain multiple author names, so 
author names go into a multivalued field (not analyzed). Queries used 
for testing were extracted from log files of a prototype application.
With facet.method=enum and 50 request threads, I get an average response 
time of about 1800 ms, no cache evictions. With 1 request thread: 
about 300 ms.
With facet.method=fc and 50 threads, I get an average response time of 
around 19(!) ms. With 1 thread: 16 ms.

Seems to be a major improvement at first sight :-)

Regards,
Till

--
Till Kinstler
Verbundzentrale des Gemeinsamen Bibliotheksverbundes (VZG)
Platz der Göttinger Sieben 1, D 37073 Göttingen
[EMAIL PROTECTED], +49 (0) 551 39-13431, http://www.gbv.de


JSONResponseWriter bug ? (solr-1.3)

2008-12-05 Thread Grégoire Neuville
Hi,

I think I've discovered a bug with the JSONResponseWriter : starting
from the following query -

http://127.0.0.1:8080/solr-urbamet/select?q=(tout:1)&rows=0&sort=TITRE+desc&facet=true&facet.query=SUJET:b*&facet.field=SUJET&facet.prefix=b&facet.limit=1&facet.missing=true&wt=json&json.nl=arrarr

- which produced a NullPointerException (see the stacktrace below), I
played with the parameters and obtained the following results :

##PAGINATION
rows : starting from 0, the exception occurs until we pass a certain threshold
=> rows implicated

##SORTING
the aforementioned rows threshold seems to be influenced by the
presence/absence of the sort parameter

##FACETS
facet=false => OK while facet=true => NullPointerException
=>facets implicated
--
facet.field absent => OK while facet.field=whatever => NullPointerException
=>facet.field implicated
--
facet.missing=false => OK while facet.missing=true => NullPointerException
=> facet.missing implicated
--
facet.limit=-1 or 0 => OK while facet.limit>0  => NullPointerException
=> facet.limit implicated
--
facet.query absent or facet.query = whatever => NullPointerException
=>facet.query not implicated
--
facet.offset=(several values or absent) => NullPointerException
=> facet.offset not implicated
--
=> facet.sort not implicated (true or false => NullPointerException)
--
=> facet.mincount not implicated (several values or absent =>
NullPointerException)

#ResponseWriter
wt=standard => ok while wt=json => NullPointerException
=> jsonwriter implicated
json.nl=flat or map => ok
=> jsonwriter 'arrarr' format implicated

I hope this debugging is readable and will help.
--
Grégoire Neuville

Stacktrace :

GRAVE: java.lang.NullPointerException
   at 
org.apache.solr.request.JSONWriter.writeStr(JSONResponseWriter.java:607)
   at 
org.apache.solr.request.JSONWriter.writeNamedListAsArrArr(JSONResponseWriter.java:245)
   at 
org.apache.solr.request.JSONWriter.writeNamedList(JSONResponseWriter.java:294)
   at 
org.apache.solr.request.TextResponseWriter.writeVal(TextResponseWriter.java:151)
   at 
org.apache.solr.request.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:175)
   at 
org.apache.solr.request.JSONWriter.writeNamedList(JSONResponseWriter.java:288)
   at 
org.apache.solr.request.TextResponseWriter.writeVal(TextResponseWriter.java:151)
   at 
org.apache.solr.request.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:175)
   at 
org.apache.solr.request.JSONWriter.writeNamedList(JSONResponseWriter.java:288)
   at 
org.apache.solr.request.TextResponseWriter.writeVal(TextResponseWriter.java:151)
   at 
org.apache.solr.request.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:175)
   at 
org.apache.solr.request.JSONWriter.writeNamedList(JSONResponseWriter.java:288)
   at 
org.apache.solr.request.JSONWriter.writeResponse(JSONResponseWriter.java:88)
   at 
org.apache.solr.request.JSONResponseWriter.write(JSONResponseWriter.java:49)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
   at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
   at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
   at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
   at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
   at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
   at java.lang.Thread.run(Thread.java:595)


multiValued multiValued fields

2008-12-05 Thread Joel Karlsson
Hello,

I want to index a field with an array of arrays; is that possible in Solr?
I.e. I have one multi-valued field with persons and would like one
multi-valued field with their employers, but sometimes there is more than
one employer per person, and therefore it would be good to use a
multi-valued multi-valued field:

Person-field:
["Andersson, John","Svensson, Marcus"]

Employer-field:
[ [ "Volvo","Saab" ] , [ "Ericsson", "Nokia", "Motorola" ] ]

From these fields I could easily retrieve which companies are associated
with which person.

Thanks in advance // Joel


Can Solr follow links?

2008-12-05 Thread Joel Karlsson
Hello,

Is there any way for Solr to follow links stored in my database and index
the content of these files and HTTP-resources?

Thanks in advance! // Joel


Re: new faceting algorithm

2008-12-05 Thread Andre Hagenbruch

Till Kinstler schrieb:

Hi,

> I just did a quick test using Solr nightly 2008-11-30. I have an index
> of about 2.9 mil bibliographic records, size: 16G. I tested faceting
> author names; each index document may contain multiple author names, so
> author names go into a multivalued field (not analyzed). Queries used
> for testing were extracted from log files of a prototype application.
> With facet.method=enum and 50 request threads, I get an average response
> time of about 1800 ms, no cache evictions. With 1 request thread:
> about 300 ms.
> With facet.method=fc and 50 threads, I get an average response time of
> around 19(!) ms. With 1 thread: 16 ms.
> Seems to be a major improvement at first sight :-)

same here: multi valued author fields were the bottleneck with 1.3 for
us, too. I'm currently testing with 1.5 million records, ~1.2 million of
which have values for the author field, but with ~2 million distinct
values. With Solr 1.3 we had average response times of 15000-25000 ms
for 10 parallel requests (depending on cache settings), with 1.4 they
are now down to 230 ms...

Regards,

Andre
--
Andre Hagenbruch
Projekt "Integriertes Bibliotheksportal"
Universitaetsbibliothek Bochum, Etage 4/Raum 6
Fon: +49 234 3229346, Fax: +49 234 3214736


Re: Is there a clean way to determine whether a core exists?

2008-12-05 Thread Dean Thompson
Wow -- thanks for all the help!!  With everyone's help, I did end up  
in a *much* better place:


    private static boolean solrCoreExists(String coreName, String solrRootUrl)
            throws IOException, SolrServerException
    {
        CommonsHttpSolrServer adminServer = new CommonsHttpSolrServer(solrRootUrl);
        CoreAdminResponse status = CoreAdminRequest.getStatus(coreName, adminServer);
        return status.getCoreStatus(coreName).get("instanceDir") != null;
    }


On Dec 5, 2008, at 1:09 AM, Ryan McKinley wrote:


yes:
http://localhost:8983/solr/admin/cores?action=STATUS

will give you a list of running cores.  However that is not easy to  
check with a simple status != 404


see:
http://wiki.apache.org/solr/CoreAdmin


On Dec 4, 2008, at 11:46 PM, Chris Hostetter wrote:



: Subject: Is there a clean way to determine whether a core exists?

doesn't the CoreAdminHandler's STATUS feature make this easy?




-Hoss







Re: new faceting algorithm

2008-12-05 Thread Peter Keegan
Hi Yonik,

May I ask in which class(es) this improvement was made? I've been using the
DocSet, DocList, BitDocSet, HashDocSet from Solr from a few years ago with a
Lucene based app. to do faceting.

Thanks,
Peter


On Mon, Nov 24, 2008 at 11:12 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> A new faceting algorithm has been committed to the development version
> of Solr, and should be available in the next nightly test build (will
> be dated 11-25).  This change should generally improve field faceting
> where the field has many unique values but relatively few values per
> document.  This new algorithm is now the default for multi-valued
> fields (including tokenized fields) so you shouldn't have to do
> anything to enable it.  We'd love some feedback on how it works to
> ensure that it actually is a win for the majority and should be the
> default.
>
> -Yonik
>


DataImportHandler - time stamp format in

2008-12-05 Thread Jae Joo
In the dataimport.properties file, there is the timestamp.

#Thu Dec 04 15:36:22 EST 2008
last_index_time=2008-12-04 15\:36\:20

I am using Oracle (10g) and would like to know which format of timestamp
I have to use in Oracle.

Thanks,

Jae


Re: JSONResponseWriter bug ? (solr-1.3)

2008-12-05 Thread Yonik Seeley
Thanks for the report Grégoire, it definitely looks like a bug.
Would you mind opening a JIRA issue for this?

-Yonik

On Fri, Dec 5, 2008 at 6:26 AM, Grégoire Neuville
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> I think I've discovered a bug with the JSONResponseWriter : starting
> from the following query -
>
> http://127.0.0.1:8080/solr-urbamet/select?q=(tout:1)&rows=0&sort=TITRE+desc&facet=true&facet.query=SUJET:b*&facet.field=SUJET&facet.prefix=b&facet.limit=1&facet.missing=true&wt=json&json.nl=arrarr
>
> - which produced a NullPointerException (see the stacktrace below), I
> played with the parameters and obtained the following results :
>
> ##PAGINATION
> rows : starting from 0, the exception occurs until we pass a certain threshold
> => rows implicated
>
> ##SORTING
> the rows threshold afore mentionned seems to be influenced by the
> presence/absence of the sort parameter
>
> ##FACETS
> facet=false => OK while facet=true => NullPointerException
> =>facets implicated
> --
> facet.field absent => OK while facet.field=whatever => NullPointerException
> =>facet.field implicated
> --
> facet.missing=false => OK while facet.missing=true => NullPointerException
> => facet.missing implicated
> --
> facet.limit=-1 or 0 => OK while facet.limit>0  => NullPointerException
> => facet.limit implicated
> --
> facet.query absent or facet.query = whatever => NullPointerException
> =>facet.query not implicated
> --
> facet.offset=(several values or absent) => NullPointerException
> => facet.offset not implicated
> --
> => facet.sort not implicated (true or false => NullPointerException)
> --
> => facet.mincount not implicated (several values or absent =>
> NullPointerException)
>
> #ResponseWriter
> wt=standard => ok while wt=json => NullPointerException
> => jsonwriter implicated
> json.nl=flat or map => ok
> => jsonwriter 'arrarr' format implicated
>
> I hope this debugging is readable and will help.
> --
> Grégoire Neuville
>
> Stacktrace :
>
> GRAVE: java.lang.NullPointerException
>   at 
> org.apache.solr.request.JSONWriter.writeStr(JSONResponseWriter.java:607)
>   at 
> org.apache.solr.request.JSONWriter.writeNamedListAsArrArr(JSONResponseWriter.java:245)
>   at 
> org.apache.solr.request.JSONWriter.writeNamedList(JSONResponseWriter.java:294)
>   at 
> org.apache.solr.request.TextResponseWriter.writeVal(TextResponseWriter.java:151)
>   at 
> org.apache.solr.request.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:175)
>   at 
> org.apache.solr.request.JSONWriter.writeNamedList(JSONResponseWriter.java:288)
>   at 
> org.apache.solr.request.TextResponseWriter.writeVal(TextResponseWriter.java:151)
>   at 
> org.apache.solr.request.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:175)
>   at 
> org.apache.solr.request.JSONWriter.writeNamedList(JSONResponseWriter.java:288)
>   at 
> org.apache.solr.request.TextResponseWriter.writeVal(TextResponseWriter.java:151)
>   at 
> org.apache.solr.request.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:175)
>   at 
> org.apache.solr.request.JSONWriter.writeNamedList(JSONResponseWriter.java:288)
>   at 
> org.apache.solr.request.JSONWriter.writeResponse(JSONResponseWriter.java:88)
>   at 
> org.apache.solr.request.JSONResponseWriter.write(JSONResponseWriter.java:49)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
>   at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>   at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>   at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>   at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
>   at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>   at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>   at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>   at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>   at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
>   at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>   at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>   at java.lang.Thread.run(Thread.java:595)
>


Re: Solr on Solaris

2008-12-05 Thread Jae Joo
I have had the same experience.
What is the CPU in the Solaris box? It does not depend on the operating
system (Linux or Solaris); it depends on the CPU (Intel or SPARC).
I don't know why, but based on my performance tests, a SPARC machine requires
MORE memory for a Java application.

Jae

On Thu, Dec 4, 2008 at 10:40 PM, Kashyap, Raghu <[EMAIL PROTECTED]>wrote:

> We are running solr on a solaris box with 4 CPU's(8 cores) and  3GB Ram.
> When we try to index sometimes the HTTP Connection just hangs and the
> client which is posting documents to solr doesn't get any response back.
> We since then have added timeouts to our http requests from the clients.
>
>
>
> I then get this error.
>
>
>
> java.lang.OutOfMemoryError: requested 239848 bytes for Chunk::new. Out
> of swap space?
>
> java.lang.OutOfMemoryError: unable to create new native thread
>
> Exception in thread "JmxRmiRegistryConnectionPoller"
> java.lang.OutOfMemoryError: unable to create new native thread
>
>
>
> We are running JDK 1.6_10 on the solaris box. . The weird thing is we
> are running the same application on linux box with JDK 1.6 and we
> haven't seen any problem like this.
>
>
>
> Any suggestions?
>
>
>
> -Raghu
>
>


RE: Solr on Solaris

2008-12-05 Thread Kashyap, Raghu
Jon,

What do you mean by off a "Zone"? Please clarify

-Raghu


-Original Message-
From: Jon Baer [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 04, 2008 9:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr on Solaris

Just curious, is this off a "zone" by any chance?

- Jon

On Dec 4, 2008, at 10:40 PM, Kashyap, Raghu wrote:

> We are running solr on a solaris box with 4 CPU's(8 cores) and  3GB  
> Ram.
> When we try to index sometimes the HTTP Connection just hangs and the
> client which is posting documents to solr doesn't get any response  
> back.
> We since then have added timeouts to our http requests from the  
> clients.
>
>
>
> I then get this error.
>
>
>
> java.lang.OutOfMemoryError: requested 239848 bytes for Chunk::new. Out
> of swap space?
>
> java.lang.OutOfMemoryError: unable to create new native thread
>
> Exception in thread "JmxRmiRegistryConnectionPoller"
> java.lang.OutOfMemoryError: unable to create new native thread
>
>
>
> We are running JDK 1.6_10 on the solaris box. . The weird thing is we
> are running the same application on linux box with JDK 1.6 and we
> haven't seen any problem like this.
>
>
>
> Any suggestions?
>
>
>
> -Raghu
>



RE: Solr on Solaris

2008-12-05 Thread Kashyap, Raghu
Hi Jae,

  It's an Intel-based CPU.

-Raghu

-Original Message-
From: Jae Joo [mailto:[EMAIL PROTECTED] 
Sent: Friday, December 05, 2008 9:53 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr on Solaris

I do have same experience.
What is the CPU in the Solaris box? it is not depending on the operating
system (linux or Solaris). It is depenong on the CPU (Intel ro SPARC).
Don't know why, but based on my performance test, SPARC machine requires
MORE memory for java application.

Jae

On Thu, Dec 4, 2008 at 10:40 PM, Kashyap, Raghu
<[EMAIL PROTECTED]>wrote:

> We are running solr on a solaris box with 4 CPU's(8 cores) and  3GB
Ram.
> When we try to index sometimes the HTTP Connection just hangs and the
> client which is posting documents to solr doesn't get any response
back.
> We since then have added timeouts to our http requests from the
clients.
>
>
>
> I then get this error.
>
>
>
> java.lang.OutOfMemoryError: requested 239848 bytes for Chunk::new. Out
> of swap space?
>
> java.lang.OutOfMemoryError: unable to create new native thread
>
> Exception in thread "JmxRmiRegistryConnectionPoller"
> java.lang.OutOfMemoryError: unable to create new native thread
>
>
>
> We are running JDK 1.6_10 on the solaris box. . The weird thing is we
> are running the same application on linux box with JDK 1.6 and we
> haven't seen any problem like this.
>
>
>
> Any suggestions?
>
>
>
> -Raghu
>
>


Re: new faceting algorithm

2008-12-05 Thread Rob Casson
very similar situation to those already reported.  2.9M bibliographic
records, with authors being the (previous) bottleneck, and the one
we're starting to test with the new algorithm.

so far, no load tests, but just in single requests i'm seeing the same
improvements...phenomenal improvements, btw, with most example queries
taking less than 1/100th of the time

always very impressed with this project/product, and just thought i'd
add a "me-too" to the list...cheers, and have a great weekend,

rob

On Mon, Nov 24, 2008 at 11:12 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> A new faceting algorithm has been committed to the development version
> of Solr, and should be available in the next nightly test build (will
> be dated 11-25).  This change should generally improve field faceting
> where the field has many unique values but relatively few values per
> document.  This new algorithm is now the default for multi-valued
> fields (including tokenized fields) so you shouldn't have to do
> anything to enable it.  We'd love some feedback on how it works to
> ensure that it actually is a win for the majority and should be the
> default.
>
> -Yonik
>


Re: new faceting algorithm

2008-12-05 Thread Koji Sekiguchi

Peter,

It is the UnInvertedField class. See also:
https://issues.apache.org/jira/browse/SOLR-475


Peter Keegan wrote:

Hi Yonik,

May I ask in which class(es) this improvement was made? I've been using the
DocSet, DocList, BitDocSet, HashDocSet from Solr from a few years ago with a
Lucene based app. to do faceting.

Thanks,
Peter

  




RE: Russian stopwords

2008-12-05 Thread Steven A Rowe
Hi Tushar,

On 12/05/2008 at 5:18 AM, tushar kapoor wrote:
> I am trying to filter russian stopwords but have not been
> successful with that.
[...]
> <filter class="solr.StopFilterFactory" ... words="stopwords.txt"/>
> <filter class="solr.SynonymFilterFactory" ... ignoreCase="true" expand="false"/>
[...]
> Intrestingly, Russian synonyms are working fine. English and russian
> synonyms get searched correctly.
>
> Also,If I add an English language word to stopwords.txt it
> gets filtered correctly. Its the russian words that are not
> getting filtered as stopwords.

It might be an encoding issue - StopFilterFactory delegates stopword file 
reading to SolrResourceLoader.getLines(), which uses an InputStreamReader 
instantiated with the UTF-8 charset.  Is your stopwords.txt encoded as UTF-8?

It's strange that synonyms are working fine, though - SynonymFilterFactory 
reads in the synonyms file using the same mechanism as StopFilterFactory - is 
it possible that your synonyms file is encoded as UTF-8, but your stopwords 
file is encoded with a different encoding, perhaps KOI8-R?  Like UTF-8, KOI8-R 
includes the entirety of 7-bit ASCII, so English words would be properly 
decoded under UTF-8.

Steve
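
A quick way to test this theory is to rewrite the stopword file as UTF-8 and
reload. A minimal sketch (not from the original thread; the file names and the
KOI8-R source charset are assumptions -- substitute whatever your file is
actually encoded in):

    import java.io.*;
    import java.nio.charset.Charset;

    // Re-encode a stopwords file to UTF-8 so SolrResourceLoader/StopFilterFactory
    // can read the Russian words correctly. Source charset is assumed to be KOI8-R.
    public class ReencodeStopwords {
        public static void main(String[] args) throws IOException {
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    new FileInputStream("stopwords-koi8r.txt"), Charset.forName("KOI8-R")));
            Writer out = new OutputStreamWriter(
                    new FileOutputStream("stopwords.txt"), Charset.forName("UTF-8"));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    out.write(line);
                    out.write('\n');
                }
            } finally {
                in.close();
                out.close();
            }
        }
    }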


Re: Solr on Solaris

2008-12-05 Thread Jon Baer
Are you running Solr in a container, more specifically? I've had a few  
issues w/ zones and Solr in the past (I believe there are some  
networking issues w/ older Solaris versions) ...


They are basically where you can slice ("virtualize") your resources  
and divide a box up into something similar to a VPS ...


http://www.sun.com/bigadmin/content/zones/

- Jon

On Dec 5, 2008, at 10:58 AM, Kashyap, Raghu wrote:


Jon,

What do you mean by off a "Zone"? Please clarify

-Raghu


-Original Message-
From: Jon Baer [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 04, 2008 9:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr on Solaris

Just curious, is this off a "zone" by any chance?

- Jon

On Dec 4, 2008, at 10:40 PM, Kashyap, Raghu wrote:


We are running solr on a solaris box with 4 CPU's(8 cores) and  3GB
Ram.
When we try to index sometimes the HTTP Connection just hangs and the
client which is posting documents to solr doesn't get any response
back.
We since then have added timeouts to our http requests from the
clients.



I then get this error.



java.lang.OutOfMemoryError: requested 239848 bytes for Chunk::new.  
Out

of swap space?

java.lang.OutOfMemoryError: unable to create new native thread

Exception in thread "JmxRmiRegistryConnectionPoller"
java.lang.OutOfMemoryError: unable to create new native thread



We are running JDK 1.6_10 on the solaris box. . The weird thing is we
are running the same application on linux box with JDK 1.6 and we
haven't seen any problem like this.



Any suggestions?



-Raghu







Re: Merging Indices

2008-12-05 Thread Shalin Shekhar Mangar
On Fri, Dec 5, 2008 at 5:09 AM, ashokc <[EMAIL PROTECTED]> wrote:

>
> The SOLR wiki says
>
> >>3. Make sure both indexes you want to merge are closed.
>
> What exactly does 'closed' mean?


I think that would mean that the IndexReader and IndexWriter on that index
are closed.

1. Do I need to stop SOLR search on both indexes before running the merge
> command? So a brief downtime is required?


I think so.


> Or do I simply prevent any 'updates/deletes' to these indices during the
> merge time so they can still serve up results (read only?) while I am
> creating a new merged index?
>
> 2. Before the new index replaces the old index, do I need to stop SOLR for
> that instance? Or can I simply move the old index out and place the new
> index in the same place, without having to stop SOLR


The rsync based replication in Solr uses similar schema. It creates
hardlinks to the new index files over the old ones.


> 3. If SOLR has to be stopped during the merge operation, can we work with a
> redundant/failover instance and stagger the merge so the search service
> will
> not go down? Any guidelines here are welcome.


It is not very clear as to what you are actually trying to do. Why do you
even need to merge indices? Are you creating your index outside of Solr?
Just curious to know your use-case.

-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImportHandler - time stamp format in

2008-12-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
I guess you are trying to pass it in the SQL query. Try it as it is.
If Oracle does not take it, you can format the date according to what
Oracle likes.

http://wiki.apache.org/solr/DataImportHandler#head-5675e913396a42eb7c6c5d3c894ada5dadbb62d7

On Fri, Dec 5, 2008 at 8:09 PM, Jae Joo <[EMAIL PROTECTED]> wrote:
> In the dataimport.properties file, there is the timespamp.
>
> #Thu Dec 04 15:36:22 EST 2008
> last_index_time=2008-12-04 15\:36\:20
>
> I am using the Oracle (10g) and would like to know which format of timestamp
> I have to use in Oracle.
>
> Thanks,
>
> Jae
>



-- 
--Noble Paul


Re: Can Solr follow links?

2008-12-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
Look at http://wiki.apache.org/solr/DataImportHandler

You may use an outer entity with SqlEntityProcessor and an inner
entity with XPathEntityProcessor


On Fri, Dec 5, 2008 at 5:35 PM, Joel Karlsson <[EMAIL PROTECTED]> wrote:
> Hello,
>
> Is there any way for Solr to follow links stored in my database and index
> the content of these files and HTTP-resources?
>
> Thanks in advance! // Joel
>



-- 
--Noble Paul


Re: Merging Indices

2008-12-05 Thread Yonik Seeley
On Thu, Dec 4, 2008 at 6:39 PM, ashokc <[EMAIL PROTECTED]> wrote:
>
> The SOLR wiki says
>
>>>3. Make sure both indexes you want to merge are closed.
>
> What exactly does 'closed' mean?

If you do a commit, and then prevent updates, the index should be
closed (no open IndexWriter).

> 1. Do I need to stop SOLR search on both indexes before running the merge
> command? So a brief downtime is required?
> Or do I simply prevent any 'updates/deletes' to these indices during the
> merge time so they can still serve up results (read only?) while I am
> creating a new merged index?

Preventing updates/deletes should be sufficient.

> 2. Before the new index replaces the old index, do I need to stop SOLR for
> that instance? Or can I simply move the old index out and place the new
> index in the same place, without having to stop SOLR

Yes, simply moving the index should work if you are careful to avoid
any updates since the last commit.

> 3. If SOLR has to be stopped during the merge operation, can we work with a
> redundant/failover instance and stagger the merge so the search service will
> not go down? Any guidelines here are welcome.
>
> Thanks
>
> - ashok
> --
> View this message in context: 
> http://www.nabble.com/Merging-Indices-tp20845009p20845009.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Merging Indices

2008-12-05 Thread ashokc

Thanks for the help Yonik & Shalin. It really makes it easy for me if I do not
have to stop/start the SOLR app during the merge operations.

The reason I have to do this many times a day is that I am implementing a
simple-minded entity-extraction procedure for the content I am indexing. I
have a user-defined taxonomy under which the current documents, and any new
documents, should be classified. The taxonomy defines the nested facet
fields for SOLR. When a new document is posted, the user expects to have it
available in the right facet right away. My classification procedure when a
new document is added is as follows.

1. Create a new temporary index with that document (no taxonomy fields at
this time)
2. Search this index with each of the taxonomy terms (synonyms are employed
as well through synonyms.txt) and find out which of these categories is a
hit for this document.
3. Add a new "[... the rest of this message was stripped by the mail archive]
> On Thu, Dec 4, 2008 at 6:39 PM, ashokc <[EMAIL PROTECTED]> wrote:
>>
>> The SOLR wiki says
>>
3. Make sure both indexes you want to merge are closed.
>>
>> What exactly does 'closed' mean?
> 
> If you do a commit, and then prevent updates, the index should be
> closed (no open IndexWriter).
> 
>> 1. Do I need to stop SOLR search on both indexes before running the merge
>> command? So a brief downtime is required?
>> Or do I simply prevent any 'updates/deletes' to these indices during the
>> merge time so they can still serve up results (read only?) while I am
>> creating a new merged index?
> 
> Preventing updates/deletes should be sufficient.
> 
>> 2. Before the new index replaces the old index, do I need to stop SOLR
>> for
>> that instance? Or can I simply move the old index out and place the new
>> index in the same place, without having to stop SOLR
> 
> Yes, simply moving the index should work if you are careful to avoid
> any updates since the last commit.
> 
>> 3. If SOLR has to be stopped during the merge operation, can we work with
>> a
>> redundant/failover instance and stagger the merge so the search service
>> will
>> not go down? Any guidelines here are welcome.
>>
>> Thanks
>>
>> - ashok
>> --
>> View this message in context:
>> http://www.nabble.com/Merging-Indices-tp20845009p20845009.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Merging-Indices-tp20845009p20859513.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Solr on Solaris

2008-12-05 Thread Kashyap, Raghu
Jon,

We are running under Tomcat. Thanks for the link, I will check it out.

-Raghu

-Original Message-
From: Jon Baer [mailto:[EMAIL PROTECTED] 
Sent: Friday, December 05, 2008 10:57 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr on Solaris

Are you running Solr in a container more specifically,  Ive had few  
issues w/ zones in the past and Solr (I believe there are some  
networking issues w/ older Solaris versions) ...

They are basically where you can slice ("virtualize") your resources  
and divide a box up into something similar to a VPS ...

http://www.sun.com/bigadmin/content/zones/

- Jon

On Dec 5, 2008, at 10:58 AM, Kashyap, Raghu wrote:

> Jon,
>
> What do you mean by off a "Zone"? Please clarify
>
> -Raghu
>
>
> -Original Message-
> From: Jon Baer [mailto:[EMAIL PROTECTED]
> Sent: Thursday, December 04, 2008 9:56 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr on Solaris
>
> Just curious, is this off a "zone" by any chance?
>
> - Jon
>
> On Dec 4, 2008, at 10:40 PM, Kashyap, Raghu wrote:
>
>> We are running solr on a solaris box with 4 CPU's(8 cores) and  3GB
>> Ram.
>> When we try to index sometimes the HTTP Connection just hangs and the
>> client which is posting documents to solr doesn't get any response
>> back.
>> We since then have added timeouts to our http requests from the
>> clients.
>>
>>
>>
>> I then get this error.
>>
>>
>>
>> java.lang.OutOfMemoryError: requested 239848 bytes for Chunk::new.  
>> Out
>> of swap space?
>>
>> java.lang.OutOfMemoryError: unable to create new native thread
>>
>> Exception in thread "JmxRmiRegistryConnectionPoller"
>> java.lang.OutOfMemoryError: unable to create new native thread
>>
>>
>>
>> We are running JDK 1.6_10 on the solaris box. . The weird thing is we
>> are running the same application on linux box with JDK 1.6 and we
>> haven't seen any problem like this.
>>
>>
>>
>> Any suggestions?
>>
>>
>>
>> -Raghu
>>
>



Re: IOException: Mark invalid while analyzing HTML

2008-12-05 Thread Dean Thompson

Was this one ever addressed?  I'm seeing it in some small percentage of the
documents that I index in 1.4-dev 708596M.  I don't see a corresponding JIRA
issue.


James Brady-3 wrote:
> 
> Hi,
> I'm seeing a problem mentioned in Solr-42, Highlighting problems with  
> HTMLStripWhitespaceTokenizerFactory:
> https://issues.apache.org/jira/browse/SOLR-42
> 
> I'm indexing HTML documents, and am getting reams of "Mark invalid"  
> IOExceptions:
> SEVERE: java.io.IOException: Mark invalid
>   at java.io.BufferedReader.reset(Unknown Source)
>   at org.apache.solr.analysis.HTMLStripReader.restoreState(HTMLStripReader.java:171)
>   at org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:728)
>   at org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:742)
>   at java.io.Reader.read(Unknown Source)
>   at org.apache.lucene.analysis.CharTokenizer.next(CharTokenizer.java:56)
>   at org.apache.lucene.analysis.StopFilter.next(StopFilter.java:118)
>   at org.apache.solr.analysis.WordDelimiterFilter.next(WordDelimiterFilter.java:249)
>   at org.apache.lucene.analysis.LowerCaseFilter.next(LowerCaseFilter.java:33)
>   at org.apache.solr.analysis.EnglishPorterFilter.next(EnglishPorterFilterFactory.java:92)
>   at org.apache.lucene.analysis.TokenStream.next(TokenStream.java:45)
>   at org.apache.solr.analysis.BufferedTokenStream.read(BufferedTokenStream.java:94)
>   at org.apache.solr.analysis.RemoveDuplicatesTokenFilter.process(RemoveDuplicatesTokenFilter.java:33)
>   at org.apache.solr.analysis.BufferedTokenStream.next(BufferedTokenStream.java:82)
>   at org.apache.lucene.analysis.TokenStream.next(TokenStream.java:79)
>   at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1518)
>   at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1407)
>   at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:1116)
>   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2440)
>   at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2422)
>   at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1445)
> 
> 
> This is using a ~1 week old version of Solr 1.3 from SVN.
> 
> One workaround mentioned in that Jira issue was to move HTML stripping  
> outside of Solr; can anyone suggest a better approach than that?
> 
> Thanks
> James
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/IOException%3A-Mark-invalid-while-analyzing-HTML-tp17052153p20859862.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr on Solaris

2008-12-05 Thread Jeryl Cook
you're out of memory :).

for each instance of an application server you can technically only
allocate something like 1024 MB to the JVM; to take advantage of the memory you
need to run multiple instances of the application server.

are you using RAMDirectory with SOLR?

On Thu, Dec 4, 2008 at 10:40 PM, Kashyap, Raghu
<[EMAIL PROTECTED]> wrote:
> We are running solr on a solaris box with 4 CPU's(8 cores) and  3GB Ram.
> When we try to index sometimes the HTTP Connection just hangs and the
> client which is posting documents to solr doesn't get any response back.
> We since then have added timeouts to our http requests from the clients.
>
>
>
> I then get this error.
>
>
>
> java.lang.OutOfMemoryError: requested 239848 bytes for Chunk::new. Out
> of swap space?
>
> java.lang.OutOfMemoryError: unable to create new native thread
>
> Exception in thread "JmxRmiRegistryConnectionPoller"
> java.lang.OutOfMemoryError: unable to create new native thread
>
>
>
> We are running JDK 1.6_10 on the solaris box. . The weird thing is we
> are running the same application on linux box with JDK 1.6 and we
> haven't seen any problem like this.
>
>
>
> Any suggestions?
>
>
>
> -Raghu
>
>



-- 
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
"Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done."
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001


getting xml out of a SolrDocument ?

2008-12-05 Thread Dan Robin

I am using solrj to query solr and the QueryResponse.getResults() returns a
SolrDocumentList. There is a SolrDocument in the list with the results I
want. The problem is that I want to view these results as XML. How can I get
the SolrDocument to give me XML?

Thanks in advance.

  -Dan
-- 
View this message in context: 
http://www.nabble.com/getting-xml-out-of-a-SolrDocument---tp20861491p20861491.html
Sent from the Solr - User mailing list archive at Nabble.com.



creating cores on demand

2008-12-05 Thread Dean Thompson
Our application processes RSS feeds.  Its search activity is heavily  
concentrated on the most recent 24 hours, with modest searching across  
the past few days, and rare (but important) searching across months or  
more.  So we create a Solr core for each day, and then search the  
appropriate set of cores for any given date range.


We used to pile up zillions of cores in solr.xml, and open them on  
every Solr restart.  But we kept running out of things:  memory, open  
file descriptors, and threads.  So I think I have a better solution.


Now, any time we need a core, we create it on the fly.  We have  
solr.xml set up to *not* persist new cores.  But of course their data  
directories are persistent.


So far this appears to work great in QA.  I've only done limited  
testing yet, but I believe each core that we create will either  
"reconnect" to an existing data directory or create a new data  
directory, as appropriate.


Anyone know of problems with this approach?

Here is some of the most important source code (using Solrj), in case  
someone else finds this approach useful, or in case someone feels  
motivated to study it for problems.


Dean

    /**
     * Keeps track of the names of cores that are known to exist, so
     * we don't have to keep checking.
     */
    private Set<String> knownCores = new HashSet<String>(20);

    /**
     * Returns the {@link SolrServer} for the specified prefix and day.
     */
    private SolrServer getSolrServer(String prefix, int day)
            throws SolrServerException, IOException
    {
        String coreName = prefix + day;
        String serverUrl = solrRootUrl + "/" + coreName;
        try {
            makeCoreAvailable(coreName);
            return new CommonsHttpSolrServer(serverUrl);
        } catch (MalformedURLException e) {
            String message = "Invalid Solr server URL (misconfiguration of solrRootUrl) "
                    + serverUrl + ": " + ExceptionUtil.getMessage(e);
            LOGGER.error(message, e);
            reportError();
            throw new SolrMisconfigurationException(message, e);
        }
    }

    private synchronized void makeCoreAvailable(String coreName)
            throws SolrServerException, IOException
    {
        if (knownCores.contains(coreName)) {
            return;
        }
        if (solrCoreExists(coreName, solrRootUrl)) {
            knownCores.add(coreName);
            return;
        }
        CommonsHttpSolrServer adminServer = new CommonsHttpSolrServer(solrRootUrl);
        LOGGER.info("Creating new Solr core " + coreName);
        CoreAdminRequest.createCore(coreName, coreName, adminServer,
                solrConfigFilename, solrSchemaFilename);
        LOGGER.info("Successfully created new Solr core " + coreName);
    }

    private static boolean solrCoreExists(String coreName, String solrRootUrl)
            throws IOException, SolrServerException
    {
        CommonsHttpSolrServer adminServer = new CommonsHttpSolrServer(solrRootUrl);
        CoreAdminResponse status = CoreAdminRequest.getStatus(coreName, adminServer);
        return status.getCoreStatus(coreName).get("instanceDir") != null;
    }



Re: Solr on Solaris

2008-12-05 Thread Glen Newton
When you say "application server", do you mean Tomcat?

If yes: I have allocated >8GB of heap to Tomcat and it uses it all with no
problem (64-bit Intel / 64-bit Java).

-glen

2008/12/5 Jeryl Cook <[EMAIL PROTECTED]>:
> your out of memory :).
>
> each instance of an application server you can technically only
> allocate like 1024mb to the JVM, to take advantage of the memory you
> need to run multiple instances of the application server.
>
> are you using RAMDirectory with SOLR?
>
> On Thu, Dec 4, 2008 at 10:40 PM, Kashyap, Raghu
> <[EMAIL PROTECTED]> wrote:
>> We are running solr on a solaris box with 4 CPU's(8 cores) and  3GB Ram.
>> When we try to index sometimes the HTTP Connection just hangs and the
>> client which is posting documents to solr doesn't get any response back.
>> We since then have added timeouts to our http requests from the clients.
>>
>>
>>
>> I then get this error.
>>
>>
>>
>> java.lang.OutOfMemoryError: requested 239848 bytes for Chunk::new. Out
>> of swap space?
>>
>> java.lang.OutOfMemoryError: unable to create new native thread
>>
>> Exception in thread "JmxRmiRegistryConnectionPoller"
>> java.lang.OutOfMemoryError: unable to create new native thread
>>
>>
>>
>> We are running JDK 1.6_10 on the solaris box. . The weird thing is we
>> are running the same application on linux box with JDK 1.6 and we
>> haven't seen any problem like this.
>>
>>
>>
>> Any suggestions?
>>
>>
>>
>> -Raghu
>>
>>
>
>
>
> --
> Jeryl Cook
> /^\ Pharaoh /^\
> http://pharaohofkush.blogspot.com/
> "Whether we bring our enemies to justice, or bring justice to our
> enemies, justice will be done."
> --George W. Bush, Address to a Joint Session of Congress and the
> American People, September 20, 2001
>



-- 



Re: Stemmer vs. exact match

2008-12-05 Thread Grant Ingersoll


On Dec 4, 2008, at 8:19 PM, Jonathan Ariel wrote:

Hi! I'm wondering what Solr is really doing with the exact word vs. the
stemmed word.
So for example I have 2 documents.
The first one has in the title the word "convertible"
The second one has "convert"
When Solr stems the titles, both will be the same since convertible ->
convert.

Then when I search "convertible" both documents seem to have the same
relevancy... is that right, or does Solr keep track of the original word
and give extra score to the fact that I am actually looking for the same
exact word that I have in a document... I might be wrong, but it seems to
me that it should score that better.



Solr doesn't keep track of the original word, unless you tell it to.   
So, if you are stemming, then you are losing the original word.  A  
common way to solve what you are doing is to actually have two fields,  
where one is stemmed and one is exact (you can do this with the  
<copyField> mechanism in the Schema).   Thus, if you want exact  
match, you search the exact match field, otherwise you search the  
stemmed field.


-Grant
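
A small SolrJ sketch of the two-field approach described above (the field names
title and title_exact and the boost factor are made-up examples, not from this
thread):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    // Search both the stemmed and the exact field, boosting the exact copy so
    // documents containing the literal word "convertible" rank above stemmed matches.
    public class ExactVsStemmedQuery {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrQuery q = new SolrQuery("title_exact:convertible^4 OR title:convertible");
            QueryResponse rsp = server.query(q);
            System.out.println("hits: " + rsp.getResults().getNumFound());
        }
    }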


Re: getting xml out of a SolrDocument ?

2008-12-05 Thread Erik Hatcher
I'd somehow pass through Solr's XML response, or perhaps consider  
using Solr's XSLT response writer to convert to the format you want.   
I don't have the magic incantation handy, but it should be possible to  
make a request through SolrJ and get the raw response string back in  
whatever format you want.


Erik

On Dec 5, 2008, at 3:02 PM, Dan Robin wrote:



I am using solrj to query solr and the QueryResponse.getResults()  
returns a
SolrDocumentList. There is a SolrDocument in the list with the  
results I
want. The problem is that I want to view these results as XML. How  
can I get

the SolrDocument to give me XML?

Thanks in advance.

 -Dan
--
View this message in context: 
http://www.nabble.com/getting-xml-out-of-a-SolrDocument---tp20861491p20861491.html
Sent from the Solr - User mailing list archive at Nabble.com.




Smaller filterCache giving better performance

2008-12-05 Thread wojtekpia

I've seen some strange results in the last few days of testing, but this one
flies in the face of everything I've read on this forum: Reducing
filterCache size has increased performance. 

I have posted my setup here:
http://www.nabble.com/Throughput-Optimization-td20335132.html.

My original filterCache was 700,000. Reducing it to 20,000, I found:
- Average response time decreased by 85%
- Average throughput increased by 250%
- CPU time used by the garbage collector decreased by 85%
- The system showed no weird GC issues (reported yesterday at:
http://www.nabble.com/new-faceting-algorithm-td20674902.html)

Further reducing the filterCache to 10,000
- Average response time decreased by another 27%
- Average throughput increased by another 30%
- GC CPU usage also dropped
- System behavior changed after ~30 minutes, with a slight performance
degradation

These results came from a load test. I'm running trunk code from Dec 2 with
Yonik's faceting improvement turned on.

Any thoughts?
-- 
View this message in context: 
http://www.nabble.com/Smaller-filterCache-giving-better-performance-tp20863674p20863674.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Smaller filterCache giving better performance

2008-12-05 Thread Mike Klaas

On 5-Dec-08, at 2:24 PM, wojtekpia wrote:



I've seen some strangle results in the last few days of testing, but  
this one

flies in the face of everything I've read on this forum: Reducing
filterCache size has increased performance.


This isn't really unexpected behaviour.  The problem with a huge  
filter cache is that it is fighting with OS disk cache--the latter of  
which can be much, much more important.  Reducing the size of the  
filter cache gives more to the OS.


Try giving 17GB to java, and letting the OS cache the entire index.   
Increase the filter cache as much as you can without OOM'ing.  That  
should give optimal performance.   Note that you don't always need the  
_whole_ index in the os cache to get acceptable performance, but if  
you can afford it, it is a good idea.


It is also possible that you are experiencing contention in the  
filtercache code--have you tried the concurrent filter cache impl?


-Mike



I have posted my setup here:
http://www.nabble.com/Throughput-Optimization-td20335132.html.

My original filterCache was 700,000. Reducing it to 20,000, I found:
- Average response time decreased by 85%
- Average throughput increased by 250%
- CPU time used by the garbage collector decreased by 85%
- The system showed no weird GC issues (reported yesterday at:
http://www.nabble.com/new-faceting-algorithm-td20674902.html)

Further reducing the filterCache to 10,000
- Average response time decreased by another 27%
- Average throughput increased by another 30%
- GC CPU usage also dropped
- System behavior changed after ~30 minutes, with a slight performance
degradation

These results came from a load test. I'm running trunk code from Dec  
2 with

Yonik's faceting improvement turned on.

Any thoughts?
--
View this message in context: 
http://www.nabble.com/Smaller-filterCache-giving-better-performance-tp20863674p20863674.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: getting xml out of a SolrDocument ?

2008-12-05 Thread Yonik Seeley
On Fri, Dec 5, 2008 at 5:24 PM, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> I'd somehow pass through Solr's XML response, or perhaps consider using
> Solr's XSLT response writer to convert to the format you want.  I don't have
> the magic incantation handy, but it should be possible to make a request
> through SolrJ and get the raw response string back in whatever format you
> want.

One could subclass ResponseParser (or XMLResponseParser) and do nothing
but put the entire response body in a String.

-Yonik
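
If all that's wanted is the XML string, a blunter alternative to subclassing
the parser is to bypass SolrJ for that one call and fetch the response over
plain HTTP. A rough sketch (the URL and query are made up for illustration):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    // Fetch a query response with wt=xml and keep the raw body as a String,
    // side-stepping SolrJ's response parsing entirely.
    public class RawSolrXml {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://localhost:8983/solr/select?q=id:123&wt=xml");
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), "UTF-8"));
            StringBuilder xml = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                xml.append(line).append('\n');
            }
            in.close();
            System.out.println(xml);
        }
    }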


Re: Smaller filterCache giving better performance

2008-12-05 Thread Yonik Seeley
On Fri, Dec 5, 2008 at 5:24 PM, wojtekpia <[EMAIL PROTECTED]> wrote:
>
> I've seen some strangle results in the last few days of testing, but this one
> flies in the face of everything I've read on this forum: Reducing
> filterCache size has increased performance.
>
> I have posted my setup here:
> http://www.nabble.com/Throughput-Optimization-td20335132.html.
>
> My original filterCache was 700,000. Reducing it to 20,000, I found:
> - Average response time decreased by 85%
> - Average throughput increased by 250%
> - CPU time used by the garbage collector decreased by 85%
> - The system showed no weird GC issues (reported yesterday at:
> http://www.nabble.com/new-faceting-algorithm-td20674902.html)
>
> Further reducing the filterCache to 10,000
> - Average response time decreased by another 27%
> - Average throughput increased by another 30%
> - GC CPU usage also dropped
> - System behavior changed after ~30 minutes, with a slight performance
> degradation
>
> These results came from a load test. I'm running trunk code from Dec 2 with
> Yonik's faceting improvement turned on.

Old faceting used the filterCache exclusively.
New faceting only uses it for terms that cover ~5% of the index, so
you can reduce the filterCache quite a bit potentially, save more RAM,
and increase the amount of memory you can give to the OS cache.

-Yonik


Re: Smaller filterCache giving better performance

2008-12-05 Thread wojtekpia

Reducing the amount of memory given to java slowed down Solr at first, then
quickly caused the garbage collector to behave badly (same issue as I
referenced above). 

I am using the concurrent cache for all my caches.
-- 
View this message in context: 
http://www.nabble.com/Smaller-filterCache-giving-better-performance-tp20863674p20864928.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Dealing with field values as key/value pairs

2008-12-05 Thread Chris Hostetter

: So i'm basically looking for design pattern/best practice for that scenario
: based on people's experience.

I've taken two approaches in the past...

1) encode the "id" and the "label" in the field value; facet on it; 
require clients to know how to decode.  This works really well for simple 
things where the id=>label mappings don't ever change, and are 
easy to encode (i.e. "01234:Chris Hostetter").  This is a horrible approach 
when id=>label mappings do change with any frequency.

2) have a separate type of "metadata" document, one per "thing" that you 
are faceting on, containing fields for the id and the label (and probably a 
doc_type field so you can tell it apart from your main docs); then once 
you've done your main query and gotten the results back faceted on id, 
you can query for those ids to get the corresponding labels.  this works 
really well if the labels ever change (just reindex the corresponding 
metadata document) and has the added bonus that you can store additional 
metadata in each of those docs, and in many use cases for presenting an 
initial "browse" interface, you can sometimes get away with a cheap 
search for all metadata docs (or all metadata docs meeting a certain 
criteria) instead of an expensive facet query across all of your main 
documents.



-Hoss
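
A rough SolrJ sketch of the second approach (the field names author_id,
doc_type, and label are hypothetical, just to show the shape of the two
queries):

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.FacetField;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class FacetLabelLookup {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            // 1) Facet the main documents on the id-valued field.
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0);
            q.setFacet(true);
            q.addFacetField("author_id");
            QueryResponse rsp = server.query(q);

            // 2) Look up the label for each facet value from the metadata documents
            //    (assumes the ids are simple tokens that need no query escaping).
            Map<String, String> labels = new HashMap<String, String>();
            FacetField ff = rsp.getFacetField("author_id");
            if (ff != null && ff.getValues() != null) {
                for (FacetField.Count c : ff.getValues()) {
                    SolrQuery lookup = new SolrQuery("doc_type:metadata AND id:" + c.getName());
                    for (SolrDocument d : server.query(lookup).getResults()) {
                        labels.put(c.getName(), (String) d.getFieldValue("label"));
                    }
                }
            }
            System.out.println(labels);
        }
    }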



Re: Ordering updates

2008-12-05 Thread Shalin Shekhar Mangar
On Fri, Dec 5, 2008 at 5:40 AM, Laurence Rowe <[EMAIL PROTECTED]> wrote:

> 2008/12/4 Shalin Shekhar Mangar <[EMAIL PROTECTED]>:
>
>
> I think we have a slight misunderstanding here. Because there are many
> CMS processes it is possible that the same document will be updated
> concurrently (from different web requests). In this case two updates
> are sent (one by each process). The problem arises when the two update
> requests are processed in a different order to the original database
> transactions.


Ok, I think I understand your problem now. If multiple processes send update
requests, they will overwrite each other which is not what you want.


> I guess the only way to achieve consistency is to stage my indexed
> data in a database table and trigger a DataImportHandler to perform
> delta imports after each transaction.


I agree. If you need a transactional mechanism to ensure consistency,
then you should use a database. Periodically, you can index this particular
table into Solr. However, if you have multi-valued fields, you may run into
problems.

One more thing that you can think about, depending on your use-case, is
whether a small amount of stale data is OK. Do you really need things
consistent and up to date all the time in Solr? I also know of cases where
people have removed frequently changing fields from Solr and fetched them
from the DB at the time of page render. Of course, that doesn't work when you
need to sort by that frequently changing field.


> >> From what I can tell this conditional indexing feature is not
> >> supported by Solr. Might it be supported by Lucene but not exposed by
> >> Solr?
> >>
> >
> > No this is not supported by either of Lucene/Solr.
>
>
> This is a pity, eventual consistency is a nice model.
>
> Regards,
>
> Laurence
>



-- 
Regards,
Shalin Shekhar Mangar.