Documentation for Master-Slave Replication missing for Solr1.3. No mirror site for Solr 1.4 distribution.
Hi, I want to know how to set up a master-slave configuration for Solr 1.3. I can't find documentation on the net; I found some for 1.4 but not for 1.3, and ReplicationHandler is not present in 1.3. Also, I would like to know from where I will get the Solr 1.4 distribution. The Solr site lists mirrors only for the 1.3 dist. Regards, Ninad.
Re: Documentation for Master-Slave Replication missing for Solr1.3. No mirror site for Solr 1.4 distribution.
1.4 is not released yet. you can grab a nightly from here http://people.apache.org/builds/lucene/solr/nightly/ On Fri, Aug 7, 2009 at 12:47 PM, Ninad Raut wrote: > Hi, > I want to know how to setup master-slave configuration for Solr 1.3 . I > can't get documentation on the net. I found one for 1.4 but not for 1.3 . > ReplicationHandler is not present in 1.3. > Also, I would like to know from will I get the Solr 14. distribution. The > Solr Site lists mirrors only for 1.3 dist. > Regards, > Ninad. > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Documentation for Master-Slave Replication missing for Solr1.3. No mirror site for Solr 1.4 distribution.
On Fri, Aug 7, 2009 at 12:47 PM, Ninad Raut wrote: > Hi, > I want to know how to setup master-slave configuration for Solr 1.3 . I > can't get documentation on the net. I found one for 1.4 but not for 1.3 . > ReplicationHandler is not present in 1.3. > Also, I would like to know from will I get the Solr 14. distribution. The > Solr Site lists mirrors only for 1.3 dist. > Regards, > > Most documentation on the 1.3 script based replication is on the wiki at: http://wiki.apache.org/solr/CollectionDistribution http://wiki.apache.org/solr/SolrCollectionDistributionScripts http://wiki.apache.org/solr/SolrCollectionDistributionStatusStats http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline -- Regards, Shalin Shekhar Mangar.
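For reference, the 1.3 script-based replication described on those wiki pages is driven by shell scripts shipped in the distribution's bin/ directory, typically wired into cron on the slaves. A minimal outline (paths are placeholders and the scripts take host/port/data-dir arguments; see the wiki pages above for the exact flags):

```
# master: take a snapshot of the index after each commit
# (often configured as a postCommit event in solrconfig.xml)
solr/bin/snapshooter

# slave (from cron): pull the latest snapshot via rsync,
# then install it and trigger a new searcher
solr/bin/snappuller
solr/bin/snapinstaller
```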
Re: Documentation for Master-Slave Replication missing for Solr1.3. No mirror site for Solr 1.4 distribution.
Hi Noble, can these builds be used in a production environment? Are they stable? We are not going live now, but we will be in a few months. As such, when will 1.4 be officially released? 2009/8/7 Noble Paul നോബിള് नोब्ळ् > 1.4 is not released yet. you can grab a nightly from here > http://people.apache.org/builds/lucene/solr/nightly/ > > On Fri, Aug 7, 2009 at 12:47 PM, Ninad Raut > wrote: > > Hi, > > I want to know how to setup master-slave configuration for Solr 1.3 . I > > can't get documentation on the net. I found one for 1.4 but not for 1.3 . > > ReplicationHandler is not present in 1.3. > > Also, I would like to know from will I get the Solr 14. distribution. The > > Solr Site lists mirrors only for 1.3 dist. > > Regards, > > Ninad. > > > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com >
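For context, the 1.4 ReplicationHandler being discussed here is configured entirely in solrconfig.xml. A minimal sketch (host name, port, and poll interval are placeholder values):

```xml
<!-- master solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- take a replicable snapshot after every commit -->
    <str name="replicateAfter">commit</str>
    <!-- config files to ship alongside the index -->
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- slave solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <!-- hh:mm:ss between polls of the master -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```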
CorruptIndexException: Unknown format version
Hi, how can that happen? It is a new index, and it is already corrupt. Did anybody else see something like this?

WARN - 2009-08-07 10:44:54,925 | Solr index directory 'data/solr/index' doesn't exist. Creating new index...
WARN - 2009-08-07 10:44:56,583 | solrconfig.xml uses deprecated , Please update your config to use the ShowFileRequestHandler.
WARN - 2009-08-07 10:44:56,586 | adding ShowFileRequestHandler with hidden files: [XSLT]
ERROR - 2009-08-07 10:44:58,758 | java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: Unknown format version: -7
 at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:433)
 at org.apache.solr.core.SolrCore.(SolrCore.java:216)
 at org.apache.solr.core.SolrCore.getSolrCore(SolrCore.java:177)
 at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
 at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)

Best regards -- Maximilian Hütter, blue elephant systems GmbH, Wollgrasweg 49, D-70599 Stuttgart. Tel: (+49) 0711 - 45 10 17 578, Fax: (+49) 0711 - 45 10 17 573, e-mail: max.huet...@blue-elephant-systems.com. Sitz: Stuttgart, Amtsgericht Stuttgart, HRB 24106. Geschäftsführer: Joachim Hörnle, Thomas Gentsch, Holger Dietrich
Re: mergeFactor / indexing speed
Woohoo, great news, guys. I merged my child entity into the root entity and changed the custom entity processor to handle the additional columns correctly. And indexing 160k documents now takes 5min instead of 1.5h! (Now I can go on vacation relaxed. :-D ) Conclusion: in my case performance was so bad because of constantly querying a database on a different machine (network traffic + a db query per document). Thanks for all your help! Chantal Avlesh Singh wrote: does DIH call commit periodically, or are things done in one big batch? AFAIK, one big batch. Yes. There is no index available once the full-import has started (unless the searcher has a cache, in which case it still reads from that). There is no data visible (i.e. in the Admin/Luke frontend) until the import has finished correctly.
Re: Language Detection for Analysis?
Otis Gospodnetic wrote: Bradford, If I may: Have a look at http://www.sematext.com/products/language-identifier/index.html And/or http://www.sematext.com/products/multilingual-indexer/index.html .. and a Nutch plugin with similar functionality: http://lucene.apache.org/nutch/apidocs-1.0/org/apache/nutch/analysis/lang/LanguageIdentifier.html -- Best regards, Andrzej Bialecki | Information Retrieval, Semantic Web | Embedded Unix, System Integration | http://www.sigram.com | Contact: info at sigram dot com
Re: Language Detection for Analysis?
Hi, On Fri, Aug 7, 2009 at 12:31 PM, Andrzej Bialecki wrote: > .. and a Nutch plugin with similar functionality: > > http://lucene.apache.org/nutch/apidocs-1.0/org/apache/nutch/analysis/lang/LanguageIdentifier.html See also TIKA-209 [1] where I'm currently integrating the Nutch code to work with Tika. Tika 0.5 will have built-in language detection based on this. [1] https://issues.apache.org/jira/browse/TIKA-209 BR, Jukka Zitting
Help creating schema for indexable document
Hi guys. I am struggling to create a schema with a deterministic content model for a set of documents I want to index. My indexable documents will look something like: 1 code1 code2 mycategory My service will be mission critical and will accept batch imports from a potentially unreliable source. Are there any XML schema gurus who can help me with creating an XSD which will work with my sample document? Thanks in advance for your help, -- Ross -- View this message in context: http://www.nabble.com/Help-creating-schema-for-indexable-document-tp24862700p24862700.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: mergeFactor / indexing speed
On Fri, Aug 7, 2009 at 3:58 PM, Chantal Ackermann < chantal.ackerm...@btelligent.de> wrote: > Juhu, great news, guys. I merged my child entity into the root entity, and > changed the custom entityprocessor to handle the additional columns > correctly. > And - indexing 160k documents now takes 5min instead of 1.5h! > I'm a little late to the party but you may also want to look at CachedSqlEntityProcessor. -- Regards, Shalin Shekhar Mangar.
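For reference, the CachedSqlEntityProcessor Shalin mentions caches the child entity's rows in memory instead of issuing one SQL query per parent row, which addresses exactly the per-document network round trip diagnosed above. A data-config.xml sketch (entity, table, and column names are made up):

```xml
<entity name="item" query="select id, name from item">
  <!-- the child query runs once and is cached;
       rows are joined in memory on item_id = item.id -->
  <entity name="feature" processor="CachedSqlEntityProcessor"
          query="select item_id, description from feature"
          where="item_id=item.id"/>
</entity>
```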
Re: mergeFactor / indexing speed
Thanks for the tip, Shalin. I'm happy with 6 indexes running in parallel and completing in less than 10min right now, but I'll have a look anyway. Shalin Shekhar Mangar wrote: On Fri, Aug 7, 2009 at 3:58 PM, Chantal Ackermann < chantal.ackerm...@btelligent.de> wrote: Juhu, great news, guys. I merged my child entity into the root entity, and changed the custom entityprocessor to handle the additional columns correctly. And - indexing 160k documents now takes 5min instead of 1.5h! I'm a little late to the party but you may also want to look at CachedSqlEntityProcessor. -- Regards, Shalin Shekhar Mangar.
Solr 1.4 in Production Environment-- Is it stable?
Hi, Has anyone used Solr 1.4 in production? There are some really nice features in it like - Directly adding POJOs to Solr - ReplicationHandler etc. Is 1.4 stable enough to be used in production?
Re: solr v1.4 in production?
On Wed, Jul 1, 2009 at 6:17 PM, Ed Summers wrote: > Here at the Library of Congress we've got several production Solr > instances running v1.3. We've been itching to get at what will be v1.4 > and were wondering if anyone else happens to be using it in production > yet. Any information you can provide would be most welcome. > > We're using Solr 1.4 built from r793546 in production along with the new java based replication. -- Regards, Shalin Shekhar Mangar.
Re: Solr 1.4 in Production Environment-- Is it stable?
I know a number of large companies using 1.4-dev. But you could also wait another month or so and get the real 1.4. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: Ninad Raut > To: solr-user@lucene.apache.org > Sent: Friday, August 7, 2009 7:32:17 AM > Subject: Solr 1.4 in Production Environment-- Is it stable? > > Hi, > Has anyone used Solr 1.4 in production? There are some really nice features > in it like > >- Directly adding POJOs to Solr >- ReplicationHandler etc. > > Is 1.4 stable enought to be used in production?
Re: Language Detection for Analysis?
There are several free Language Detection libraries out there, as well as a few commercial ones. I think Karl Wettin has even written one as a plugin for Lucene. Nutch also has one, AIUI. I would just Google "language detection". Also see http://www.lucidimagination.com/search/?q=language+detection, as this has been brought up many times before and I'm sure there are links in the archives. On Aug 6, 2009, at 3:46 PM, Bradford Stephens wrote: Hey there, We're trying to add foreign language support into our new search engine -- languages like Arabic, Farsi, and Urdu (that don't work with standard analyzers). But our data source doesn't tell us which languages we're actually collecting -- we just get blocks of text. Has anyone here worked on language detection so we can figure out what analyzers to use? Are there commercial solutions? Much appreciated! -- http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Item Facet
Thanks Avlesh, but I didn't get it. How would a dynamic field aggregate values at query time? On Thu, Aug 6, 2009 at 11:14 PM, Avlesh Singh wrote: > Dynamic fields might be an answer. If you had a field called "product_*" and > these were populated with the corresponding values during indexing then > faceting on these fields will give you the desired behavior. > > The only catch here is that the product names have to be known upfront. A > wildcard support for field names in facet.fl is still to come in Solr. > Here's the issue - https://issues.apache.org/jira/browse/SOLR-247 > > Cheers > Avlesh > > On Fri, Aug 7, 2009 at 3:33 AM, David Lojudice Sobrinho > wrote: > >> I can't reindex because the aggregated/grouped result should change as >> the query changes... in other words, the result must by dynamic >> >> We've been thinking about a new handler for it something like: >> >> >> /select?q=laptop&rows=0&itemfacet=on&itemfacet.field=product_name,min(price),max(price) >> >> Does it make sense? Something easier ready to use? >> >> >> On Thu, Aug 6, 2009 at 6:05 PM, Ge, Yao (Y.) wrote: >> > If you can reindex, simply rebuild the index with fields replaced by >> > combining existing fields. >> > -Yao >> > >> > -Original Message- >> > From: David Lojudice Sobrinho [mailto:dalss...@gmail.com] >> > Sent: Thursday, August 06, 2009 4:17 PM >> > To: solr-user@lucene.apache.org >> > Subject: Item Facet >> > >> > Hi... >> > >> > Is there any way to group values like shopping.yahoo.com or >> > shopper.cnet.com do? >> > >> > For instance, I have documents like: >> > >> > doc1 - product_name1 - value1 >> > doc2 - product_name1 - value2 >> > doc3 - product_name1 - value3 >> > doc4 - product_name2 - value4 >> > doc5 - product_name2 - value5 >> > doc6 - product_name2 - value6 >> > >> > I'd like to have a result grouping by product name with the value >> > range per product.
Something like: >> > >> > product_name1 - (value1 to value3) >> > product_name2 - (value4 to value6) >> > >> > It is not like the current facet because the information is grouped by >> > item, not the entire result. >> > >> > Any idea? >> > >> > Thanks! >> > >> > David Lojudice Sobrinho >> > >> >> >> >> -- >> __ >> >> David L. S. >> dalss...@gmail.com >> __ >> > -- __ David L. S. dalss...@gmail.com __
Re: Solr 1.4 in Production Environment-- Is it stable?
We also use 1.4, which has been hit with load tests of up to 2000 queries/sec. The biggest thing is to make sure you are using the slaves for that kind of load. Other than that, 1.4 is pretty impressive. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 > From: Otis Gospodnetic > Reply-To: > Date: Fri, 7 Aug 2009 05:26:06 -0700 (PDT) > To: > Subject: Re: Solr 1.4 in Production Environment-- Is it stable? > > I know a number of large companies using 1.4-dev. But you could also wait > another month or so and get the real 1.4. > > Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > > > - Original Message >> From: Ninad Raut >> To: solr-user@lucene.apache.org >> Sent: Friday, August 7, 2009 7:32:17 AM >> Subject: Solr 1.4 in Production Environment-- Is it stable? >> >> Hi, >> Has anyone used Solr 1.4 in production? There are some really nice features >> in it like >> >>- Directly adding POJOs to Solr >>- ReplicationHandler etc. >> >> Is 1.4 stable enought to be used in production? >
Re: Item Facet
Are your product_name* fields numeric fields (integer or float)? Dals wrote: > > Hi... > > Is there any way to group values like shopping.yahoo.com or > shopper.cnet.com do? > > For instance, I have documents like: > > doc1 - product_name1 - value1 > doc2 - product_name1 - value2 > doc3 - product_name1 - value3 > doc4 - product_name2 - value4 > doc5 - product_name2 - value5 > doc6 - product_name2 - value6 > > I'd like to have a result grouping by product name with the value > range per product. Something like: > > product_name1 - (value1 to value3) > product_name2 - (value4 to value6) > > It is not like the current facet because the information is grouped by > item, not the entire result. > > Any idea? > > Thanks! > > David Lojudice Sobrinho > > -- View this message in context: http://www.nabble.com/Item-Facet-tp24853669p24865535.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: CorruptIndexException: Unknown format version
Wow, that is an interesting one... I bet there is more than one Lucene version kicking around the classpath somehow. Try removing all of the servlet container's working directories. -Yonik http://www.lucidimagination.com On Fri, Aug 7, 2009 at 4:41 AM, Maximilian Hütter wrote: > Hi, > > how can that happen, it is a new index, and it is already corrupt? > > Did anybody else something like this? > > WARN - 2009-08-07 10:44:54,925 | Solr index directory 'data/solr/index' > doesn't exist. Creating new index... > WARN - 2009-08-07 10:44:56,583 | solrconfig.xml uses deprecated > , Please update your config to use the > ShowFileRequestHandler. > WARN - 2009-08-07 10:44:56,586 | adding ShowFileRequestHandler with > hidden files: [XSLT] > ERROR - 2009-08-07 10:44:58,758 | java.lang.RuntimeException: > org.apache.lucene.index.CorruptIndexException: Unknown format version: -7 > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:433) > at org.apache.solr.core.SolrCore.(SolrCore.java:216) > at org.apache.solr.core.SolrCore.getSolrCore(SolrCore.java:177) > at > org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69) > at > org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99) > > > Best regards > > > -- > Maximilian Hütter > blue elephant systems GmbH > Wollgrasweg 49 > D-70599 Stuttgart > > Tel : (+49) 0711 - 45 10 17 578 > Fax : (+49) 0711 - 45 10 17 573 > e-mail : max.huet...@blue-elephant-systems.com > Sitz : Stuttgart, Amtsgericht Stuttgart, HRB 24106 > Geschäftsführer: Joachim Hörnle, Thomas Gentsch, Holger Dietrich >
Re: Item Facet
The behavior I'm expecting is something similar to a GROUP BY in a relational database:

SELECT product_name, model, min(price), max(price), count(*) FROM t GROUP BY product_name, model

The current schema: product_name (type: text), model (type: text), price (type: sfloat). On Fri, Aug 7, 2009 at 11:07 AM, Yao Ge wrote: > > Are your product_name* fields numeric fields (integer or float)? > > > Dals wrote: >> >> Hi... >> >> Is there any way to group values like shopping.yahoo.com or >> shopper.cnet.com do? >> >> For instance, I have documents like: >> >> doc1 - product_name1 - value1 >> doc2 - product_name1 - value2 >> doc3 - product_name1 - value3 >> doc4 - product_name2 - value4 >> doc5 - product_name2 - value5 >> doc6 - product_name2 - value6 >> >> I'd like to have a result grouping by product name with the value >> range per product. Something like: >> >> product_name1 - (value1 to value3) >> product_name2 - (value4 to value6) >> >> It is not like the current facet because the information is grouped by >> item, not the entire result. >> >> Any idea? >> >> Thanks! >> >> David Lojudice Sobrinho >> >> > > -- > View this message in context: > http://www.nabble.com/Item-Facet-tp24853669p24865535.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- __ David L. S. dalss...@gmail.com __
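Solr 1.4's StatsComponent gets part of the way to this GROUP BY: stats.field computes min/max/count for a numeric field, and stats.facet breaks those numbers down per facet value. A request sketch against the schema above:

```
/select?q=*:*&rows=0&stats=true&stats.field=price&stats.facet=product_name
```

One caveat: faceting on a tokenized "text" field produces per-token buckets, so this would normally need product_name copied into an untokenized string field first.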
Is kill -9 safe or not?
I've seen several threads that are one or two years old saying that performing "kill -9" on the java process running Solr either CAN, or CAN NOT corrupt your index. The more recent ones seem to say that it CAN NOT, but before I bake a kill -9 into my control script (which first tries a normal "kill", of course), I'd like to hear the answer straight from the horse's mouth... I'm using Solr 1.4 nightly from about a month ago. Can I kill -9 without fear of having to rebuild my index? Thanks! Michael
Re: Preserving "C++" and other weird tokens
On Thu, Aug 6, 2009 at 11:38 AM, Michael _ wrote: > Hi everyone, > I'm indexing several documents that contain words that the > StandardTokenizer cannot detect as tokens. These are words like > C# > .NET > C++ > which are important for users to be able to search for, but get treated as > "C", "NET", and "C". > > How can I create a list of words that should be understood to be > indivisible tokens? Is my only option somehow stringing together a lot of > PatternTokenizers? I'd love to do something like <tokenizer class="StandardTokenizer" tokenwhitelist=".NET C++ C#" />. > > Thanks in advance! > By the way, in case it wasn't clear: I'm not particularly tied to using the StandardTokenizer. Any tokenizer would be fine, if it did a reasonable job of splitting up the input text while preserving special cases. I'm also not averse to passing in a list of regexes, if I had to, but I'm suspicious that that would be redoing a lot of the work done by the parser inside the Tokenizer. Thanks, Michael
Re: Is kill -9 safe or not?
Kill -9 will not corrupt your index, but you would lose any uncommitted documents. -Yonik http://www.lucidimagination.com On Fri, Aug 7, 2009 at 11:07 AM, Michael _ wrote: > I've seen several threads that are one or two years old saying that > performing "kill -9" on the java process running Solr either CAN, or CAN NOT > corrupt your index. The more recent ones seem to say that it CAN NOT, but > before I bake a kill -9 into my control script (which first tries a normal > "kill", of course), I'd like to hear the answer straight from the horse's > mouth... > I'm using Solr 1.4 nightly from about a month ago. Can I kill -9 without > fear of having to rebuild my index? > > Thanks! > Michael >
Re: Preserving "C++" and other weird tokens
http://search.lucidimagination.com/search/document/2d325f6178afc00a/how_to_search_for_c -Yonik http://www.lucidimagination.com On Thu, Aug 6, 2009 at 11:38 AM, Michael _ wrote: > Hi everyone, > I'm indexing several documents that contain words that the StandardTokenizer > cannot detect as tokens. These are words like > C# > .NET > C++ > which are important for users to be able to search for, but get treated as > "C", "NET", and "C". > > How can I create a list of words that should be understood to be indivisible > tokens? Is my only option somehow stringing together a lot of > PatternTokenizers? I'd love to do something like class="StandardTokenizer" tokenwhitelist=".NET C++ C#" />. > > Thanks in advance! >
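One workaround often suggested for this problem (a sketch, not the only approach; the synonym mappings are illustrative) is to tokenize on whitespace, so "c++" and "c#" survive as single tokens, and then rewrite them to safe tokens with a synonym filter applied at both index and query time:

```xml
<fieldType name="text_code" class="solr.TextField">
  <analyzer>
    <!-- StandardTokenizer would strip the punctuation before the
         synonym filter ever saw it; whitespace keeps "c++" intact -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- synonyms.txt maps the special cases, e.g.:
         c++ => cplusplus
         c# => csharp
         .net => dotnet -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```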
Re: Attempt to query for max id failing with exception
I just tried this sample code... it worked fine for me on trunk. -Yonik http://www.lucidimagination.com On Thu, Aug 6, 2009 at 8:28 PM, Reuben Firmin wrote: > I'm using SolrJ. When I attempt to set up a query to retrieve the maximum id > in the index, I'm getting an exception. > > My setup code is: > > final SolrQuery params = new SolrQuery(); > params.addSortField("id", ORDER.desc); > params.setRows(1); > params.setQuery(queryString); > > final QueryResponse queryResponse = server.query(params); > > This latter line is blowing up with: > > Not Found > > request: > http://solr.xxx.myserver/select?sort=iddesc&rows=1&q=*:*&wt=javabin&version=2.2 > org.apache.solr.common.SolrException > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(343) > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(183) > org.apache.solr.client.solrj.request.QueryRequest#process(90) > org.apache.solr.client.solrj.SolrServer#query(109) > > There are a couple things to note - > > - there is a space between id and desc which looks suspicious, but > swapping changing wt to XML and leaving the URL otherwise the same causes > solr no grief when queried via a browser > - the index is in fact empty - this particular section of code is bulk > loading our documents, and using the max id query to figure out where to > start from. (I can and will try catching the exception and assuming 0, but > ideally I wouldn't get an exception just from doing the query) > > Am I doing this query in the wrong way? > > Thanks > Reuben >
Re: Is kill -9 safe or not?
Yonik, Uncommitted (as in Solr un"commit"ed) or unflushed? Thanks, Otis - Original Message > From: Yonik Seeley > To: solr-user@lucene.apache.org > Sent: Friday, August 7, 2009 11:10:49 AM > Subject: Re: Is kill -9 safe or not? > > Kill -9 will not corrupt your index, but you would lose any > uncommitted documents. > > -Yonik > http://www.lucidimagination.com > > > On Fri, Aug 7, 2009 at 11:07 AM, Michael _wrote: > > I've seen several threads that are one or two years old saying that > > performing "kill -9" on the java process running Solr either CAN, or CAN NOT > > corrupt your index. The more recent ones seem to say that it CAN NOT, but > > before I bake a kill -9 into my control script (which first tries a normal > > "kill", of course), I'd like to hear the answer straight from the horse's > > mouth... > > I'm using Solr 1.4 nightly from about a month ago. Can I kill -9 without > > fear of having to rebuild my index? > > > > Thanks! > > Michael > >
Re: Attempt to query for max id failing with exception
Yep, thanks - this turned out to be a systems configuration error. Our sysadmin hadn't opened up the http port on the server's internal network interface; I could browse to it from outside (i.e. firefox on my machine), but the apache landing page was being returned when CommonsHttpSolrServer tried to get at it. Reuben On Fri, Aug 7, 2009 at 12:03 PM, Yonik Seeley wrote: > I just tried this sample code... it worked fine for me on trunk. > > -Yonik > http://www.lucidimagination.com > > On Thu, Aug 6, 2009 at 8:28 PM, Reuben Firmin wrote: > > I'm using SolrJ. When I attempt to set up a query to retrieve the maximum > id > > in the index, I'm getting an exception. > > > > My setup code is: > > > > final SolrQuery params = new SolrQuery(); > > params.addSortField("id", ORDER.desc); > > params.setRows(1); > > params.setQuery(queryString); > > > > final QueryResponse queryResponse = server.query(params); > > > > This latter line is blowing up with: > > > > Not Found > > > > request: > http://solr.xxx.myserver/select?sort=iddesc&rows=1&q=*:*&wt=javabin&version=2.2 > > org.apache.solr.common.SolrException > > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(343) > > > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(183) > >org.apache.solr.client.solrj.request.QueryRequest#process(90) > >org.apache.solr.client.solrj.SolrServer#query(109) > > > > There are a couple things to note - > > > > - there is a space between id and desc which looks suspicious, but > > swapping changing wt to XML and leaving the URL otherwise the same > causes > > solr no grief when queried via a browser > > - the index is in fact empty - this particular section of code is bulk > > loading our documents, and using the max id query to figure out where > to > > start from. (I can and will try catching the exception and assuming 0, > but > > ideally I wouldn't get an exception just from doing the query) > > > > Am I doing this query in the wrong way? 
> > > > Thanks > > Reuben > > >
Re: Is kill -9 safe or not?
On Fri, Aug 7, 2009 at 12:04 PM, Otis Gospodnetic wrote: > Yonik, > > Uncommitted (as in solr un"commit"ed) on unflushed? Solr uncommitted. Even if the docs hit the disk via a segment flush, they aren't part of the index until the index descriptor (segments_n) is written pointing to that new segment. -Yonik http://www.lucidimagination.com > Thanks, > Otis > > > - Original Message >> From: Yonik Seeley >> To: solr-user@lucene.apache.org >> Sent: Friday, August 7, 2009 11:10:49 AM >> Subject: Re: Is kill -9 safe or not? >> >> Kill -9 will not corrupt your index, but you would lose any >> uncommitted documents. >> >> -Yonik >> http://www.lucidimagination.com >> >> >> On Fri, Aug 7, 2009 at 11:07 AM, Michael _wrote: >> > I've seen several threads that are one or two years old saying that >> > performing "kill -9" on the java process running Solr either CAN, or CAN >> > NOT >> > corrupt your index. The more recent ones seem to say that it CAN NOT, but >> > before I bake a kill -9 into my control script (which first tries a normal >> > "kill", of course), I'd like to hear the answer straight from the horse's >> > mouth... >> > I'm using Solr 1.4 nightly from about a month ago. Can I kill -9 without >> > fear of having to rebuild my index? >> > >> > Thanks! >> > Michael >> > > >
Solr CMS Integration
I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with Solr for search and are happy with? This will be a consumer facing website with a combination of articles, blogs, white papers, etc. Thanks, Wojtek -- View this message in context: http://www.nabble.com/Solr-CMS-Integration-tp24868462p24868462.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Preserving "C++" and other weird tokens
Ach, sorry I didn't find this before posting! - Michael Yonik Seeley-2 wrote: > > http://search.lucidimagination.com/search/document/2d325f6178afc00a/how_to_search_for_c > > -Yonik > http://www.lucidimagination.com > -- View this message in context: http://www.nabble.com/Preserving-%22C%2B%2B%22-and-other-weird-tokens-tp24848968p24868579.html Sent from the Solr - User mailing list archive at Nabble.com.
Question regarding merging Solr indexes
Hello, I have a MultiCore setup with 3 cores. I am trying to merge the indexes of core1 and core2 into core3. I looked at the wiki but I'm somewhat unclear on what needs to happen. This is what I used: http://localhost:9085/solr/core3/admin/?action=mergeindexes&core=core3&indexDir=/solrHome/core1/data/index&indexDir=/solrHome/core2/data/index&commit=true When I hit this I just go to the admin page for core3. Maybe the way I reference the indexes is incorrect? What path goes there anyway? Thanks -- View this message in context: http://www.nabble.com/Question-regarding-merging-Solr-indexes-tp24868670p24868670.html Sent from the Solr - User mailing list archive at Nabble.com.
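One thing worth checking: the mergeindexes action belongs to the CoreAdmin handler, not the per-core admin page, which would explain landing on the admin page instead of triggering the merge. Using the same host, port, and paths as above, the request would normally look like:

```
http://localhost:9085/solr/admin/cores?action=mergeindexes&core=core3&indexDir=/solrHome/core1/data/index&indexDir=/solrHome/core2/data/index
```

followed by a separate commit on core3 so searchers see the merged data. The source cores should not be receiving writes while the merge runs.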
Re: Solr CMS Integration
wojtekpia wrote: Hi Wojtek, > I've been asked to suggest a framework for managing a website's content and > making all that content searchable. I'm comfortable using Solr for search, > but I don't know where to start with the content management system. Is > anyone using a CMS (open source or commercial) that you've integrated with > Solr for search and are happy with? This will be a consumer facing website > with a combination or articles, blogs, white papers, etc. if you're comfortable with PHP you might want to look at Drupal (http://drupal.org/project/apachesolr) which sounds like a good match for your requirements... Regards, Andre
Re: Solr CMS Integration
lucidimagination.com is powered off of Drupal and we index it using Solr (but not the Drupal plugin, as we have non CMS data as well). It has blogs, articles, white papers, mail archives, JIRA tickets, Wiki's etc. On Aug 7, 2009, at 1:01 PM, wojtekpia wrote: I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with Solr for search and are happy with? This will be a consumer facing website with a combination or articles, blogs, white papers, etc. Thanks, Wojtek -- View this message in context: http://www.nabble.com/Solr-CMS-Integration-tp24868462p24868462.html Sent from the Solr - User mailing list archive at Nabble.com. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
localSolr install
Is there any sort of guide to installing and configuring localSolr into an existing Solr implementation? I'm not extremely versed with Java applications, but I've managed to cobble together Jetty and Solr multicore fairly reliably. I've downloaded localLucene 2.0 and localSolr 6.1, and this is where the guesswork starts. Any help is greatly appreciated.
Re: localSolr install
Hi all, I also need the same information. I am planning to set up Solr. I have around 20 to 30 million records, in CSV format. Your help is highly appreciated. Regards, Bhargava S Akula. 2009/8/7 Brian Klippel > Is there any sort of guide to installing and configuring localSolr into > an existing solr implementation? > > > > I'm not extremely versed with java applications, but I've managed to > cobble together jetty and solr multicore fairly reliably. I've > downloaded localLucine 2.0 and localSolr 6.1, and this is where the > guesswork starts. > > > > Any help is greatly appreciated. > > > >
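On the CSV side of that question: Solr's CSV update handler accepts CSV posted directly over HTTP, so records of that volume can be streamed in batches without writing custom indexing code. A sketch (host, port, and file name are assumptions):

```
curl 'http://localhost:8983/solr/update/csv?commit=true' \
     --data-binary @records.csv \
     -H 'Content-type: text/plain; charset=utf-8'
```

For 20-30 million records it is usually better to split the file, post the chunks with commit=false, and issue a single commit at the end.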
Re: Is kill -9 safe or not?
Thanks for the confirmation and reassurance! - Michael Yonik Seeley-2 wrote: > > On Fri, Aug 7, 2009 at 12:04 PM, Otis > Gospodnetic wrote: >> Yonik, >> >> Uncommitted (as in solr un"commit"ed) on unflushed? > > Solr uncommitted. Even if the docs hit the disk via a segment flush, > they aren't part of the index until the index descriptor (segments_n) > is written pointing to that new segment. > > -Yonik > http://www.lucidimagination.com > >> -- View this message in context: http://www.nabble.com/Is-kill--9-safe-or-not--tp24866506p24869260.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr CMS Integration
I would second that and add that you may want to consider acquia.com, as they provide a solid infrastructure to support the Solr instance. On Fri, Aug 7, 2009 at 11:20 AM, Andre Hagenbruch wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > wojtekpia wrote: > > Hi Wojtek, > > > I've been asked to suggest a framework for managing a website's content > and > > making all that content searchable. I'm comfortable using Solr for > search, > > but I don't know where to start with the content management system. Is > > anyone using a CMS (open source or commercial) that you've integrated > with > > Solr for search and are happy with? This will be a consumer facing > website > > with a combination of articles, blogs, white papers, etc. > > if you're comfortable with PHP you might want to look at Drupal > (http://drupal.org/project/apachesolr) which sounds like a good match > for your requirements... > > Regards, > > Andre > -BEGIN PGP SIGNATURE- > Version: GnuPG v1.4.9 (Darwin) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ > > iEYEARECAAYFAkp8YlQACgkQ3wuzs9k1icVFSACgjRy7AOd+Aney7LDmpWTaIssz > p74AnAn+/5So+qSfpfbXOXShCYZfAppS > =zqHU > -END PGP SIGNATURE- > -- Contact me: 801.850.2953 (cell or sms) facebook: http://www.facebook.com/profile.php?id=534661678 LinkedIn: http://www.linkedin.com/profile?viewProfile=&key=3902213 website:scanalytix.com
Re: Solr CMS Integration
Thanks for the responses. I'll give Drupal a shot. It sounds like it'll do the trick, and if it doesn't then at least I'll know what I'm looking for. Wojtek -- View this message in context: http://www.nabble.com/Solr-CMS-Integration-tp24868462p24870218.html Sent from the Solr - User mailing list archive at Nabble.com.
PhoneticFilterFactory related questions
Hi, I have a schema with three (relevant to this question) fields: title, author, book_content. I found that if PhoneticFilterFactory is used as a filter on book_content, it was bringing back all kinds of unrelated results, so I have it applied only against title and author. Questions -- 1) I have the filter set up on both the index and query analyzers for the fieldType of title/author. When running against an index which had been built without the phonetic filter, phonetic searches still worked. Is there a performance benefit to applying the phonetic filter to the index analyzer as well as the query analyzer, are there other benefits to doing so, or should I not bother? (I.e. should I just apply the filter to the query analyzer?) 2) Title / author matches are generally boosted, which is fine if it's an exact match (i.e. "Shakespeare In Love" or "by William Shakespeare" are more relevant than a book which mentions Shakespeare). However, the phonetic filter put a bit of a spanner in the works - now if I search for "bottling", books with the word "b*a*ttling" in the title show up above books with the non-substituted word in the content. How can I juggle the boosting / field setup to be something like: a) Title/author matches (with exactly matched spelling - stemming etc is fine) b) Content matches (with exactly matched spelling) c) Title/author matches (with phoneme equivalent spelling) Do I need to create separate non-phonetic title/author fields for this, or is there a different way to achieve the same effect? Thanks Reuben
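One common way to get the ranking order (a) > (b) > (c) described above is exactly the separate-field approach: keep title/author as plain text fields, copyField them into phonetic variants, and give the phonetic variants a much lower boost. A schema.xml sketch follows; the type and field names (`text_phonetic`, `title_phon`, etc.) are my own inventions, not anything from the poster's schema.

```xml
<!-- Phonetic variant type: inject="false" indexes only the phonetic
     codes, so this field matches sound-alikes but not exact spellings. -->
<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
  </analyzer>
</fieldType>

<field name="title"       type="text"          indexed="true" stored="true"/>
<field name="author"      type="text"          indexed="true" stored="true"/>
<field name="title_phon"  type="text_phonetic" indexed="true" stored="false"/>
<field name="author_phon" type="text_phonetic" indexed="true" stored="false"/>

<copyField source="title"  dest="title_phon"/>
<copyField source="author" dest="author_phon"/>
```

With dismax, the boosts then express the desired ordering directly, e.g. `qf=title^10 author^10 book_content^2 title_phon^0.5 author_phon^0.5`: exact title/author matches outrank content matches, and phonetic-only title matches (the "battling"/"bottling" case) land below content matches.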
Solr Security
Has anyone had any experience setting up Solr security? http://wiki.apache.org/solr/SolrSecurity I would like to implement HTTP authentication or path-based authentication. So, in webdefault.xml I set the following: Solr authenticated application /core1/* core1-role BASIC Test Realm What should I put in "url-pattern" and "web-resource-name"? Then I set up realm.properties like this: guest: guest, core1-role Francis
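The XML tags in the snippet above were stripped by the mailing list; from the surviving values, a reconstruction of what such a BASIC-auth constraint usually looks like in Jetty's webdefault.xml (standard servlet security-constraint syntax) would be:

```xml
<security-constraint>
  <web-resource-collection>
    <!-- web-resource-name is just a descriptive label;
         url-pattern (relative to the webapp context) is what gets protected -->
    <web-resource-name>Solr authenticated application</web-resource-name>
    <url-pattern>/core1/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>core1-role</role-name>
  </auth-constraint>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Test Realm</realm-name>
</login-config>
```

So, to answer the question as posed: `url-pattern` is the path to protect relative to the Solr webapp's context (here `/core1/*` covers everything under that core), and `web-resource-name` is an arbitrary label with no runtime effect. The `guest: guest, core1-role` line in realm.properties then grants user `guest` (password `guest`) the required role.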
Re: Solr CMS Integration
On 07.08.2009 at 19:01, wojtekpia wrote: I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with Solr for search and are happy with? This will be a consumer facing website with a combination of articles, blogs, white papers, etc. Hi Wojtek, Have a look at TYPO3. http://typo3.org/ It is quite powerful. Ingo and I are currently implementing a SOLR extension for it. We currently use it at http://www.be-lufthansa.com/ Contact me if you want an insight. Many greetings, Olivier -- Olivier Dobberkau . . . . . . . . . . . . . . Je TYPO3, desto d.k.d d.k.d Internet Service GmbH Kaiserstrasse 73 D 60329 Frankfurt/Main Fon: +49 (0)69 - 247 52 18 - 0 Fax: +49 (0)69 - 247 52 18 - 99 Mail: olivier.dobber...@dkd.de Web: http://www.dkd.de Registergericht: Amtsgericht Frankfurt am Main Registernummer: HRB 45590 Geschäftsführer: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast Current projects: http://bewegung.taz.de - Launch (Ruby on Rails) http://www.hans-im-glueck.de - Relaunch (TYPO3) http://www.proasyl.de - Relaunch (TYPO3)
Re: Solr CMS Integration
Hello Wojtek, I don't want to discourage all the famous CMSs around, nor Solr uptake, but XWiki is quite a powerful CMS and has a search that is Lucene-based. paul On 07-Aug-09 at 22:42, Olivier Dobberkau wrote: I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with Solr for search and are happy with? This will be a consumer facing website with a combination of articles, blogs, white papers, etc. Have a look at TYPO3. http://typo3.org/ It is quite powerful. Ingo and I are currently implementing a SOLR extension for it. We currently use it at http://www.be-lufthansa.com/ Contact me if you want an insight. smime.p7s Description: S/MIME cryptographic signature
spellcheck component in 1.4 distributed
I am e-mailing to inquire about the status of the spellchecking component in 1.4 (distributed). I saw SOLR-785, but it is unreleased and for 1.5. Any help would be much appreciated. Thanks in advance, Mike
Re: solr v1.4 in production?
Pubget has been using 1.4 for a while now to make the replication easier. http://pubget.com We compiled a while back and are thinking of updating to the latest build to start playing with distributed spell checking. On Fri, Aug 7, 2009 at 7:42 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Wed, Jul 1, 2009 at 6:17 PM, Ed Summers wrote: > > > Here at the Library of Congress we've got several production Solr > > instances running v1.3. We've been itching to get at what will be v1.4 > > and were wondering if anyone else happens to be using it in production > > yet. Any information you can provide would be most welcome. > > > > > We're using Solr 1.4 built from r793546 in production along with the new > java based replication. > > -- > Regards, > Shalin Shekhar Mangar. > -- Regards, Ian Connor
Can multiple Solr webapps access the same lucene index files?
Hello, I have a question I can't find an answer to in the list. Can multiple Solr webapps (for instance in separate cluster nodes) share the same Lucene index files stored within a shared filesystem? We do this with a custom Lucene search application right now; I'm trying to switch to using Solr and am curious if we can use the same deployment strategy. Mark
MoreLikeThis: How to get quality terms from html from content stream?
I'm using the MoreLikeThisHandler with a content stream to get documents from my index that match content from an html page like this: http://localhost:8080/solr/mlt?stream.url=http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2009/08/06/SP5R194Q13.DTL&mlt.fl=body&rows=4&debugQuery=true But, not surprisingly, the query generated is meaningless because a lot of the markup is picked out as terms: body:li body:href body:div body:class body:a body:script body:type body:js body:ul body:text body:javascript body:style body:css body:h body:img body:var body:articl body:ad body:http body:span body:prop Does anyone know a way to transform the html so that the content can be parsed out of the content stream and processed w/o the markup? Or do I need to write my own HTMLParsingMoreLikeThisHandler? If I parse the content out to a plain text file and point the stream.url param to file:///parsedfile.txt it works great. -Jay
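Rather than writing a custom HTMLParsingMoreLikeThisHandler, one option worth trying is to strip markup in the analyzer of the field named in `mlt.fl`: Solr ships an HTML-stripping tokenizer, and MLT analyzes content-stream text with that field's analyzer. A schema.xml sketch (the type name is mine; whether it fully cleans up term selection for `stream.url` content is worth verifying against your Solr version):

```xml
<!-- Strips HTML/XML tags (including script and style content) before
     tokenizing, so markup never becomes index or MLT query terms. -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

<field name="body" type="text_html" indexed="true" stored="true"/>
```

As a belt-and-braces measure, adding leftover JavaScript/CSS-ish tokens ("var", "href", "css", and so on) to stopwords.txt catches anything the tag stripping misses.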
How to use key with facet.prefix?
I'm trying to facet multiple times on same field using key. This works fine except when I use prefixes for these facets. What I got so far (and not functional): .. &facet=true &facet.field=category&f.category.facet.prefix=01 &facet.field={!key=subcat}category&f.subcat.facet.prefix=00 This will give me 2 facets in results, one named 'category' and another 'subcat' like expected. But prefix for key 'subcat' is ignored and the other prefix is used for both facets. How do I use key with prefixes or am I barking up the wrong tree here? Thanks!
Re: Can multiple Solr webapps access the same lucene index files?
Yes, they could all point to an index that lives on a NAS or SAN, for example. You'd still have to make sure only one server is writing to the index at a time. Zookeeper can help with coordination of that. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: Mark Diggory > To: solr-user@lucene.apache.org > Sent: Friday, August 7, 2009 8:16:46 PM > Subject: Can multiple Solr webapps access the same lucene index files? > > Hello, > > I have a question I can't find an answer to in the list. Can multiple solr > webapps (for instance in separate cluster nodes) share the same lucene index > files stored within a shared filesystem? We do this with a custom Lucene > search application right now, I'm trying to switch to using solr and am > curious if we can use the same deployment strategy. > > Mark
Re: Question regarding merging Solr indexes
On Fri, Aug 7, 2009 at 10:45 PM, ahammad wrote: > > Hello, > > I have a MultiCore setup with 3 cores. I am trying to merge the indexes of > core1 and core2 into core3. I looked at the wiki but I'm somewhat unclear > on > what needs to happen. > > This is what I used: > > > http://localhost:9085/solr/core3/admin/?action=mergeindexes&core=core3&indexDir=/solrHome/core1/data/index&indexDir=/solrHome/core2/data/index&commit=true > > When I hit this I just go to the admin page for core3. Maybe the way I > reference the indexes is incorrect? What path goes there anyway? > Look at http://wiki.apache.org/solr/MergingSolrIndexes#head-0befd0949a54b6399ff926062279afec62deb9ce -- Regards, Shalin Shekhar Mangar.
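For reference, the likely reason the URL in the question just lands on the admin page is that the CoreAdmin handler lives at the top-level `/solr/admin/cores` path, not under an individual core's admin path. A sketch of the request, reusing the host and index paths from the original mail (check the wiki page for the authoritative parameter syntax):

```
http://localhost:9085/solr/admin/cores?action=mergeindexes&core=core3&indexDir=/solrHome/core1/data/index&indexDir=/solrHome/core2/data/index
```

After the merge, a commit is typically issued against core3 itself (e.g. a `<commit/>` POST to `/solr/core3/update`) so the merged segments become visible to searchers.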
Re: 99.9% uptime requirement
: Subject: 99.9% uptime requirement : In-Reply-To: <4a730d0f.3050...@btelligent.de> http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "hidden" in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/Thread_hijacking -Hoss
Re: solr/home in web.xml relative to web server home
: the environment variable (env-entry) in web.xml to configure the solr/home is : relative to the web server's working directory. I find this unusual as all the : servlet paths are relative to the web applications directory (webapp context, : that is). So, I specified solr/home relative to the web app dir, as well, at : first. the intention is not that the Solr home dir be configured inside the web.xml -- it is *possible* to specify the solr home dir in the web.xml as you describe, but that's really just a fallback for people who really, really want to bake all of the information into the war. solr.war is an application -- when you run the application you specify (at run time) some configuration information. Hardcoding that config information into the war file is akin to compiling a C++ program with all of the config options hardcoded -- you can do it, but it's not very generic, and requires a lot of hacking whenever you want to upgrade. : (In my case, I want to deliver the solr web application including a custom : entity processor, so that is why I want to include the solr war as part of my : release cycle. It is easier to deliver that to the system administration than : to provide them with partial packages they have to install into an already : installed war, imho.) you don't have to "install into an already installed war" to add custom plugins .. you just have to put the jar file for your custom plugins into a "lib" directory inside your solr home dir. This is really no different than something like the Apache HTTPD server. there is the application (the binary httpd / solr.war) there is your configuration (httpd.conf / solr home dir) and there are custom modules you can choose to load (libmod_entityprocessor.so / your-entityprocessor.jar) -Hoss
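Concretely, the fallback Hoss describes is the env-entry block inside solr.war's web.xml; the path value below is a placeholder:

```xml
<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>/path/to/your/solr/home</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>
```

The more common alternatives, both of which leave the war untouched, are setting the JNDI entry in the servlet container's own context configuration, or simply passing `-Dsolr.solr.home=/path/to/your/solr/home` on the JVM command line.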
Re: solr indexing on same set of records with different value of unique field, not working fine.
: Sorry, schema.xml file is here in this mail... in the schema.xml file you attached, the uniqueKey field is "evid" you only provided one example of the type of input you are indexing, and in that example... : > 501 ...but in your original email (see below) you said you were using a timestamp field as the uniqueKey, and you didn't understand why reindexing the same 100 docs twice didn't give you 200 docs. that example uniqueKey value isn't a timestamp, so i don't really understand what you're talking about. if you index that doc over and over with the schema.xml you sent, then it's constantly going to replace itself over and over again because the uniqueKey field (evid) is the same (501) every time. : > > : Here, i specified 20 fields in schema.xml file. the unique field i set : > > was, : > > : currentTimeStamp field. : > > : So, when i run the loader program (which loads xml data into solr) it : > > creates : > > : currentTimestamp value...and loads into solr. : > > : : For this situation, : > > : i stopped the loader program, after 100 records indexed into solr. : > > : Then again, i run the loader program for the SAME 100 records to indexed : > > : means, : > > : the solr results 100, rather than 200. : > > : : Because, i set currentTimeStamp field as uniqueField. So i expect the : > > result : > > : as 200, if i run again the same 100 records... : > > : : Any suggestions please... -Hoss
Re: update some index documents after indexing process is done with DIH
: What is confusing me now is that I have to implement my logic in you're certainly in a fuzzy grey area here ... none of this stuff was designed for the kind of thing you're doing. : But in processCommit, having access to the core I can get the IndexReader : but I still don't know how to get the IndexWriter and SolrInputDocuments in you don't get direct access to the IndexWriter ... instead your UpdateProcessor uses the SolrCore to get an UpdateRequestProcessorChain to add (ie: replace) the SolrInputDocuments you made based on what you saw in the original SolrInputDocuments. for a second i was thinking that you'd have to worry about checking some threadlocal variable to keep yourself from going into an infinite loop, but then i remembered that you can configure named UpdateRequestProcessorChains ... so your default Chain can use your custom component, and you can create a simple chain (that bypasses your custom component) for your component to call processAdd()/processCommit() on. -Hoss
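A solrconfig.xml sketch of the two-chain setup Hoss describes; the custom factory class name is hypothetical:

```xml
<!-- Default chain: client updates run through the custom component first. -->
<updateRequestProcessorChain name="default" default="true">
  <processor class="com.example.MyCustomUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- Plain chain: the custom component uses this one for its own
     processAdd()/processCommit() calls, so it never re-enters itself. -->
<updateRequestProcessorChain name="internal">
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Inside the custom processor, the plain chain is fetched by name from the SolrCore (something like `core.getUpdateProcessingChain("internal")`; check the exact accessor on SolrCore for your Solr version) and its processors are invoked directly for the replacement documents.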
Re: Reasonable number of maxWarming searchers
: Is there a problem if i set maxWarmingSearchers to something like 30 or 40? my personal opinion: anything higher than 3 indicates a serious architecture problem. On a master, doing lots of updates, the "warming" time should be zero, so there shouldn't ever be more than 2 searchers at one time -- 3 is being generous in case you just happen to get some parallel rapid-fire add/commit pairs ... beyond that you're better off just letting any other concurrent commit calls block for the few milliseconds it will take to finish the commit. : Also, how do I disable the cache warming? Is setting autowarmCount's : to 0 enough? yes, but even better: make the cache sizes zero, that way if someone accidentally does query your master, you won't waste ram caching it. -Hoss
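In solrconfig.xml terms, the advice above translates to something like the following for a master that only indexes (element and cache names as in the stock 1.3/1.4 example config; these all live inside the `<query>` section except `maxWarmingSearchers`, which is a direct child of `<config>` in some versions — check your own file's layout):

```xml
<maxWarmingSearchers>2</maxWarmingSearchers>

<!-- Zero-size caches: nothing to warm, no RAM wasted on stray queries. -->
<filterCache      class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
<documentCache    class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
```

Slaves that serve queries would keep normal cache sizes and nonzero autowarmCount values; it is only the write-only master where warming buys nothing.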