Re: spell suggestions help

2013-04-11 Thread Rohan Thakur
Hi Jack, I am using only the whitespace tokenizer, and before it I'm using a pattern replace to replace & with "and", but it's not working, I guess. My query analyser: wrote: > Try replacing standard tokenizer with whitespace tokenizer in your field > types. And make s

Re: Not able to replicate the solr 3.5 indexes to solr 4.2 indexes

2013-04-11 Thread Montu v Boda
Hi, thanks for your reply. Is anyone going to fix this issue in a new Solr version? So many people are facing the same problem while upgrading a Solr index from 3.5.0 to 4.2. Thanks & Regards Montu v Boda -- View this message in context: http://lucene.472066.n3.nabble.com/Not-ab

Support old syntax including geodist

2013-04-11 Thread Billnbell
Since Spatial Lucene 4 does not seem to support geodist(), even sending d,pt,fq={!geofilt}does not help me = I need to sort. So I end up having to set up the sortsq. Any other ideas on how to support the old syntax on the new spatial? Can I create a transform or something ? Convert http://localho

Easier way to do this?

2013-04-11 Thread Billnbell
I would love for the SOLR spatial 4 to support pt so that I can run # of results around a central point easily like in 3.6. How can I pass parameters to a Circle() ? I would love to send PT to this query since the pt is the same across multiple areas For example: http://localhost:8983/solr/core/s

Re: Spellchecker not working for Solr 4.1

2013-04-11 Thread alxsss
inside your request handler try to put spellcheck true and name of the spellcheck dictionary hth Alex. -Original Message- From: davers To: solr-user Sent: Thu, Apr 11, 2013 6:24 pm Subject: Spellchecker not working for Solr 4.1 This is almost the same exact setup I was usin

Re: tokenizer of solr

2013-04-11 Thread Jack Krupansky
In that case, use the types="wdfftypes.txt" attribute of WDF and map "@" and "_" to ALPHA as shown in: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory. -- Jack Krupansky -Original Message- From: Mingfeng Yang Sent: Thursday, April 11, 2013 8

Re: Cloud and Master slave replcation

2013-04-11 Thread Erick Erickson
Why would you want to do this? The whole point of SolrCloud is to support HA/DR and NRT. By putting a master/slave in the mix, you're kind of circumventing both things. Which means I don't understand what you're trying to accomplish, can you clarify? Best Erick On Wed, Apr 10, 2013 at 11:47 AM,

Spellchecker not working for Solr 4.1

2013-04-11 Thread davers
This is almost the same exact setup I was using in solr 3.6 not sure why it's not working. Here is my setup. textSpell default spell solr.DirectSolrSpellChecker internal 0.7 2 1 5 4 0.01

Re: Index Replication Failing in Solr 4.2.1

2013-04-11 Thread Umesh Prasad
Created Jira Issue https://issues.apache.org/jira/browse/SOLR-4703 and attached the Patch. No unit tests yet. On Fri, Apr 12, 2013 at 12:59 AM, Mark Miller wrote: > I was looking for this msg the other day and couldn't find it offhand… > > +1, please add this to JIRA so someone can look into it

Re: tokenizer of solr

2013-04-11 Thread Mingfeng Yang
looks like it's due to the word delimiter filter. Anyone know if the "protected" file support regular expression or not? Ming On Thu, Apr 11, 2013 at 4:58 PM, Jack Krupansky wrote: > Try the whitespace tokenizer. > > -- Jack Krupansky > > -Original Message- From: Mingfeng Yang Sent: Th

Re: One case for shingle and synonym filter

2013-04-11 Thread Otis Gospodnetic
Hi, Sure, you can make sportctr be the synonym for "sports center". You could also make use of ngrams, but that created much bigger indices. Otis -- Solr & ElasticSearch Support http://sematext.com/ On Thu, Apr 11, 2013 at 5:12 PM, Xiang Liu wrote: > Hi, > Here is the case:Given a doc name

Re: SolrCloud leader to replica

2013-04-11 Thread Otis Gospodnetic
Hi, I think Timothy is right about what Lisheng is really after, which is consistency. I agree with what Timothy is implying here - chances of search being inconsistent are very, very small. I'm guessing Lisheng is trying to solve a problem he doesn't actually have yet? Also, think about a non-

Re: SolrCloud leader to replica

2013-04-11 Thread Timothy Potter
Hmmm ... I was following this discussion but then got confused when Lisheng said to change Solr to "compromise consistency in order to increase availability" when your concern is "how long replica is behind leader". Seems you want more consistency vs. less in this case? One of the reasons behind So

Re: tokenizer of solr

2013-04-11 Thread Jack Krupansky
Try the whitespace tokenizer. -- Jack Krupansky -Original Message- From: Mingfeng Yang Sent: Thursday, April 11, 2013 7:48 PM To: solr-user@lucene.apache.org Subject: tokenizer of solr Dear Solr users and developers, I am trying to index some documents some of which are twitter m
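
To see why the whitespace tokenizer helps with handles such as @jpc_108, here is a rough Python analogy. It is not Solr code: the sample tweet is made up, and the regular expression only imitates an analysis chain that splits on every non-alphanumeric character.

```python
import re

# Hypothetical tweet text, used only for illustration.
tweet = "RT @jpc_108: hello world"

# Whitespace-only splitting keeps the mention together (with its trailing colon).
whitespace_tokens = tweet.split()

# Splitting on every non-alphanumeric character breaks the handle apart.
aggressive_tokens = re.findall(r"[A-Za-z0-9]+", tweet)

print(whitespace_tokens)
print(aggressive_tokens)
```

In an actual schema, later filters in the chain can still split or strip characters, so the field type should be checked end to end.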

RE: Solr Multiword Search

2013-04-11 Thread skmirch
Hi James, Your suggestions/tips for our spellcheck requirements were all very good. Thanks a lot for your help. -- Sandeep -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Multiword-Search-tp4053038p4055433.html Sent from the Solr - User mailing list archive at Nabble.c

tokenizer of solr

2013-04-11 Thread Mingfeng Yang
Dear Solr users and developers, I am trying to index some documents, some of which are Twitter messages, and we have a problem when indexing retweets. Say a Twitter user named "jpc_108" posts a tweet, and then someone retweets his message, so @jpc_108 becomes part of the tweet text body. Seems like

Re: SolrCloud leader to replica

2013-04-11 Thread Shawn Heisey
On 4/11/2013 3:50 PM, Otis Gospodnetic wrote: But note that I misspoke, which I realized after re-reading the thread I pointed you to. Mark explains it nicely there: * the index call returns only when (and IF!) indexing to all replicas succeeds Does this actually mean "all active replicas" ...

One case for shingle and synonym filter

2013-04-11 Thread Xiang Liu
Hi, Here is the case: Given a doc named "sport center", we hope some query like "sportctr" (user ignore) can recall it. Can shingle and synonym filter be combined in some smart way to produce the term? Thanks, Xiang

RE: SolrCloud leader to replica

2013-04-11 Thread Zhang, Lisheng
Hi Otis, Thanks very much for the help; your explanation is very clear. My main concern is not the return status for indexing calls (although that is also important); my main concern is how far the replica is behind the leader (or, putting it your way, how consistent the search picture is to client A and

Re: Combining join queries

2013-04-11 Thread Upayavira
Many thanks Yonik! Upayavira On Thu, Apr 11, 2013, at 09:32 PM, Yonik Seeley wrote: > On Wed, Apr 10, 2013 at 7:33 AM, Upayavira wrote: > > On Wed, Apr 10, 2013, at 12:22 PM, Upayavira wrote: > >> I'm sure the best way for me to solve this issue myself is to ask it > >> publicly, so... > >> > >>

Re: SolrCloud leader to replica

2013-04-11 Thread Otis Gospodnetic
But note that I misspoke, which I realized after re-reading the thread I pointed you to. Mark explains it nicely there: * the index call returns only when (and IF!) indexing to all replicas succeeds BUT, that should not be mixed with what search clients see! Just because the indexing client sees

Re: Use of SolrJettyTestBase

2013-04-11 Thread Upayavira
On Tue, Apr 2, 2013, at 12:21 AM, Chris Hostetter wrote: > : I've subclassed SolrJettyTestBase, and added a test method (annotated > : with @test). However, my test method is never called. I see the > > You got an immediate failure from the tests setup, because you don't have > assertions enabled

RE: SolrCloud leader to replica

2013-04-11 Thread Zhang, Lisheng
Thanks very much for your help! -Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Thursday, April 11, 2013 1:23 PM To: solr-user@lucene.apache.org Subject: Re: SolrCloud leader to replica Yes, I *think* that is the case. Some distributed systems have t

Re: Combining join queries

2013-04-11 Thread Yonik Seeley
On Wed, Apr 10, 2013 at 7:33 AM, Upayavira wrote: > On Wed, Apr 10, 2013, at 12:22 PM, Upayavira wrote: >> I'm sure the best way for me to solve this issue myself is to ask it >> publicly, so... >> >> If I have two {!join} queries that select a collection of documents >> each, how do I create a fi

Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase

2013-04-11 Thread Otis Gospodnetic
Source code is your best bet. Wiki has info about how to use it, but not how highlighting is implemented. But you don't need to understand the implementation details to understand that they are dynamic, computed specifically for each query for each matching document, so you cannot store them anyw

Re: SolrCloud leader to replica

2013-04-11 Thread Otis Gospodnetic
Yes, I *think* that is the case. Some distributed systems have the option to return success to caller only after data has been added/indexed to N other nodes, but I think Solr doesn't have this yet. Somebody please correct me if I'm wrong. See: http://search-lucene.com/?q=eventually+consistent&f

Re: RequestHandler.. Conditional components

2013-04-11 Thread Jack Krupansky
By "come out", do you mean that the request is completed successfully, or to abort and fail the request? For the latter, you can throw an exception. There is no direct, proper way for search component B to skip search component C, but... B can modify the request parameters to set "C=false" to

Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-11 Thread Shawn Heisey
On 4/11/2013 7:46 AM, Michael Ryan wrote: In our system, we use LogByteSizeMergePolicy, and used to have a mergeFactor of 10. We would see the worst case happen when there were exactly 20 segments (or some other multiple of 10, I believe) at the start of the optimize. IIRC, it would merge thos

Re: Slow qTime for distributed search

2013-04-11 Thread Manuel Le Normand
Hi, We have different working hours, sorry for the reply delay. Your assumed numbers are right, about 25-30Kb per doc. giving a total of 15G per shard, there are two shards per server (+2 slaves that should do no work normally). An average query has about 30 conditions (OR AND mixed), most of them

Re: Index Replication Failing in Solr 4.2.1

2013-04-11 Thread Mark Miller
I was looking for this msg the other day and couldn't find it offhand… +1, please add this to JIRA so someone can look into it and it does not get lost! - Mark On Apr 11, 2013, at 11:17 AM, Otis Gospodnetic wrote: > Hi Umesh, > > The attachment didn't make it through. Could you please add

Re: RequestHandler.. Conditional components

2013-04-11 Thread venkata
For example, requestHandler A is defined as shown below. Based on some condition (a Solr param value or some other condition), I want to come out after executing "component B". Is something like this possible? -- View this message in context: http://lucene.472066.n3.nabble.com/RequestHandler-Con

Re: Basic auth on SolrCloud /admin/* calls

2013-04-11 Thread Michael Della Bitta
It's fairly easy to lock down Solr behind basic auth using just the servlet container it's running in, but the problem becomes letting services that *should* be able to access Solr in. I've rolled with basic auth in some setups, but certain deployments such as Solr Cloud or sharded setups don't pla

Re: Basic auth on SolrCloud /admin/* calls

2013-04-11 Thread Raymond Wiker
On Apr 11, 2013, at 17:12 , adfel70 wrote: > Hi > I need to implement security in solr as follows: > 1. prevent unauthorized users from accessing to solr admin pages. > 2. prevent unauthorized users from performing solr operations - both /admin > and /update. > > > Is the conclusion of this thre

RE: SolrCloud leader to replica

2013-04-11 Thread Zhang, Lisheng
Hi Otis, Thanks very much for the quick help! We are considering upgrading from Solr 3.6 to 4.x and using SolrCloud, but we are concerned about performance related to replicas. In this scenario it seems that the replica would be a few seconds behind the leader because the replica would start indexing only af

Re: solr join use case (not in instead of in)

2013-04-11 Thread Jack Krupansky
Generally, you need to flatten and denormalize your data before you place it in Solr. But, Solr does have a limited join capability that does handle some cases reasonably well: http://wiki.apache.org/solr/Join For example... "This Solr request... /solr/collection1/select ? fl=xxx,yyy & q=
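
As a minimal sketch of how a client might assemble such a join request, the Python below builds the URL with the standard library. The host, the collection name, and every field name and value in `params` are placeholders for illustration, not details confirmed by the thread.

```python
from urllib.parse import urlencode

# Placeholder host and collection; adjust to your deployment.
SELECT_URL = "http://localhost:8983/solr/collection1/select"

# All field names and values here are hypothetical examples.
params = {
    "fl": "xxx,yyy",
    "q": "{!join from=inner_id to=outer_id}zzz:vvv",
}

# urlencode percent-escapes the braces, '!' and '=' inside the local params.
join_url = SELECT_URL + "?" + urlencode(params)
print(join_url)
```

Letting the library escape the `{!join ...}` local params avoids hand-encoding mistakes when the query is sent over HTTP.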

Getting page number of result with tika

2013-04-11 Thread Gian Maria Ricci
As far as I know, SOLR-380 deals with the problem of knowing the page number with Tika indexing. The issue contains a patch, but it is really old, and I'm curious what the status of this issue is (since I see Fix Version/s 4.3, so it seems that it will be imp

Solr 4.2.x replication events on slaves

2013-04-11 Thread Stephane Bailliez
In Solr 3.x, I was relying on a postCommit call to a listener in the update handler to perform data update to caches, this data was used to perform 'realtime' filtering on the documents. So something like: ... In Solr 4.2.x, the postCommit is not called anymore on the slaves during

Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase

2013-04-11 Thread Furkan KAMACI
Hi Otis; It seems that I should read more about highlights. Is there anywhere that explains in detail how highlights are generated in Solr? 2013/4/11 Otis Gospodnetic > Hi, > > You can't store highlights ahead of time because they are query > dependent. You could store documents in HBase and

Re: Index Replication Failing in Solr 4.2.1

2013-04-11 Thread Otis Gospodnetic
Hi Umesh, The attachment didn't make it through. Could you please add it to JIRA? http://wiki.apache.org/solr/HowToContribute Thanks, Otis -- Solr & ElasticSearch Support http://sematext.com/ On Wed, Apr 10, 2013 at 9:43 PM, Umesh Prasad wrote: > Root caused the Issue to a Code Bug / Contr

RE: Basic auth on SolrCloud /admin/* calls

2013-04-11 Thread adfel70
Hi, I need to implement security in Solr as follows: 1. prevent unauthorized users from accessing the Solr admin pages. 2. prevent unauthorized users from performing Solr operations - both /admin and /update. Is the conclusion of this thread that this is not possible at the moment? -- View t

Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-11 Thread Jack Krupansky
Segments are on a per-field basis... so doesn't it depend on how many fields are merged in parallel? I mean, when most people say "index size" they are referring to all fields, collectively, not individual fields. I'm just wondering how number of processor cores might affect things (more cores

Re: SolrCloud leader to replica

2013-04-11 Thread Otis Gospodnetic
I believe it indexes locally on leader first. Otherwise one could end up with a situation where indexing to replica(s) succeeds and indexing to leader fails, which I suspect might create a mess. Otis -- Solr & ElasticSearch Support http://sematext.com/ On Thu, Apr 11, 2013 at 2:53 AM, Zhang,

Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase

2013-04-11 Thread Otis Gospodnetic
Hi, You can't store highlights ahead of time because they are query dependent. You could store documents in HBase and use Solr just for indexing. Is that what you want to do? If so, a custom SearchComponent executed after QueryComponent could fetch data from external store like HBase. I'm not

Re: migration solr 3.5 to 4.1 - JVM GC problems

2013-04-11 Thread Otis Gospodnetic
Marc, Re smaller index sizes - it's the stored field compression that didn't exist in 3.x. See https://issues.apache.org/jira/browse/SOLR-4375 Otis -- Solr & ElasticSearch Support http://sematext.com/ On Thu, Apr 11, 2013 at 10:53 AM, Marc Des Garets wrote: > Same config. I compared both, s

java.io.CharConversionException] Invalid UTF-8 character 0xffff at char #478803, byte #606190)

2013-04-11 Thread eakarsu
Hello, I am crawling some sites with Apache Nutch and indexing them with Solr. It has been working fine until a few days ago. The crawled data can have 200K or more documents inside. When I send it to SOLR to index with bin/nutch solrindex http://.com:8080/solr crawl/crawldb -linkdb crawl/linkd

Re: migration solr 3.5 to 4.1 - JVM GC problems

2013-04-11 Thread Marc Des Garets
Same config. I compared both, some defaults changed like ramBufferSize which I've set like in 3.5 (same with other things). It becomes even more strange to me. Now I have changed the jvm settings to this: -d64 -server -Xms40g -Xmx40g -XX:+UseG1GC -XX:NewRatio=6 -XX:SurvivorRatio=2 -XX:G1ReservePer

Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-11 Thread Furkan KAMACI
Thanks Walter, you guys gave me really nice ideas about RAM approximation. 2013/4/11 Walter Underwood > Here is the situation where merging can require 3X space. It can only > happen if you force merge, then index with merging turned off, but we had > Ultraseek customers do that. > > * All docum

Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-11 Thread Walter Underwood
Here is the situation where merging can require 3X space. It can only happen if you force merge, then index with merging turned off, but we had Ultraseek customers do that. * All documents are merged into a single segment. * Without a merge, all documents are replaced. * This results in one segm

solr join use case (not in instead of in)

2013-04-11 Thread Ariel Zerbib
Solr has implemented from version 4 the !join query. I'd like to know if the following case is possible. For example, we have documents of the following form: doc1: field1:123 field2:A field3:456 doc2: field1:123 field2:B field3:789 doc3:

Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase

2013-04-11 Thread Furkan KAMACI
Actually I don't plan to store documents in Solr. I want to store just highlights (snippets) in HBase and retrieve them from HBase when needed. What do you think about separating just the highlights from Solr and storing them in HBase with SolrCloud? By the way if you explain at which process

RE: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-11 Thread Michael Ryan
I've investigated this in the past. The worst case is 2*indexSize additional disk space (3*indexSize total) during an optimize. In our system, we use LogByteSizeMergePolicy, and used to have a mergeFactor of 10. We would see the worst case happen when there were exactly 20 segments (or some oth
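
The worst-case figure quoted above (2*indexSize additional, 3*indexSize total) is simple arithmetic. A small Python sketch makes it concrete; the 15 GB index size is an assumed example value, not a measurement from this thread.

```python
def optimize_disk_estimate(index_size_gb):
    """Worst-case disk usage during an optimize, using the multipliers quoted above."""
    additional = 2 * index_size_gb       # extra space needed while merging
    total = index_size_gb + additional   # existing index plus the extra space
    return {"additional_gb": additional, "total_gb": total}

# Assumed example: a 15 GB index.
estimate = optimize_disk_estimate(15)
print(estimate)
```

This is only the worst case described in the message; typical merges may need less free space.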

Re: Solr : Search with special character

2013-04-11 Thread Jack Krupansky
You might also consider the white space tokenizer plus the word delimiter filter with a character map that treats desired special characters as ALPHA. -- Jack Krupansky -Original Message- From: meghana Sent: Thursday, April 11, 2013 2:48 AM To: solr-user@lucene.apache.org Subject: Re:

Re: migration solr 3.5 to 4.1 - JVM GC problems

2013-04-11 Thread Jack Krupansky
Same config? Do a compare with the new example config and see what settings are different/changed. There may have been some defaults that changed. Read the comments in the new config. If you had just taken or merged the new config, then I would suggest making sure that the update log is not en

Re: spell suggestions help

2013-04-11 Thread Jack Krupansky
Try replacing standard tokenizer with whitespace tokenizer in your field types. And make sure not to use any other token filters that might discard special characters (or provide a character map if they support one.) Also, be sure to try your test terms in the Solr Admin UI Analyzer page to se

Re: solr 3.4: memory leak?

2013-04-11 Thread Andre Bois-Crettez
On 04/11/2013 08:49 AM, Dmitry Kan wrote: SEVERE: The web application [/solr] appears to have started a thread named [MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This is very likely to create a memory leak. Apr 11, 2013 6:38:14 AM org.apache.catalina.loader.WebappClassL

Re: spell suggestions help

2013-04-11 Thread Rohan Thakur
URL encoding replaces & with a space, so the results include documents that match even a single term. For example, "mobile & accessories" becomes "mobile accessories" and returns documents containing only "accessories", which I don't want. How can I tackle this? I tried using pattern replace filt

Re: spell suggestions help

2013-04-11 Thread Rohan Thakur
Hi Erick, do we have to do the URL encoding on the PHP side, or does Solr support URL encoding? On Thu, Apr 11, 2013 at 5:57 AM, Erick Erickson wrote: > Try URL encoding it and/or escaping the & > > On Tue, Apr 9, 2013 at 2:32 AM, Rohan Thakur wrote: > > hi all > > > > one thing I wanted to clear is
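
The encoding Erick suggests is normally done by the client before the request reaches Solr. The thread concerns a PHP client; as a minimal sketch of the same idea in Python (the host, core name, and query text are assumptions for illustration), percent-encoding turns a literal `&` into `%26` so it is not read as a parameter separator:

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host, port, and core to your deployment.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"

def build_query_url(user_query):
    """Return a select URL with the user's query percent-encoded."""
    # urlencode escapes '&' as %26 and spaces as '+', so the ampersand
    # stays inside the q parameter instead of splitting the query string.
    return SOLR_SELECT + "?" + urlencode({"q": user_query})

url = build_query_url("mobile & accessories")
print(url)
```

Encoding only delivers the character to Solr intact; whether it survives analysis still depends on the tokenizer and filters discussed in this thread.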

Re: migration solr 3.5 to 4.1 - JVM GC problems

2013-04-11 Thread Marc Des Garets
I have 45 solr 4.1 indexes. Sizes vary between 20Gb and 2.2Gb. - 1 is 20Gb (80 million docs) - 1 is 5.1Gb (24 million docs) - 1 is 5.6Gb (26 million docs) - 1 is 6.5Gb (28 million docs) - 11 others are about 2.2Gb (6-7 million docs). - 20 others are about 600Mb (2.5 million docs) That reminds me

Re: migration solr 3.5 to 4.1 - JVM GC problems

2013-04-11 Thread Furkan KAMACI
Hi Marc; Could you tell me your index size and your performance measured in queries per second? 2013/4/11 Marc Des Garets > Big heap because very large number of requests with more than 60 indexes > and hundreds of million of documents (all indexes together). My problem > is with solr 4.1. All

Re: Indexed data not searchable

2013-04-11 Thread Max Bo
Thanks a lot, so I will make an XSLT. Great community here! -- View this message in context: http://lucene.472066.n3.nabble.com/Indexed-data-not-searchable-tp4054473p4055258.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr : Stopwords at query time

2013-04-11 Thread Upayavira
I'd suggest using the analyze tab in the admin UI to unpick what is going on. You can play with scenarios there without having to waste round trips indexing stuff. Upayavira On Thu, Apr 11, 2013, at 08:25 AM, meghana wrote: > In solr , I have text as like below format. > > 1s: This is very nice

Re: Score field statistics

2013-04-11 Thread Gora Mohanty
On 11 April 2013 13:41, lexus wrote: > Thanks for response, > > But problem is that "SolrException: undefined field: score" appears after > trying to get stats for score pseudo field. [...] Sorry, didn't catch the part where you also wanted statistics from the StatsComponent. I do not think that

Re: Score field statistics

2013-04-11 Thread lexus
Thanks for response, But problem is that "SolrException: undefined field: score" appears after trying to get stats for score pseudo field. Sincerely, Alex Gora Mohanty-3 wrote > Even simpler: You can just add &fl=*,score to get the score returned > in the search results along with all other fi

Solr : Stopwords at query time

2013-04-11 Thread meghana
In Solr, I have text in the format below. 1s: This is very nice day. 4s: Christmas is about to come 7s: and christmas preparation is just on 12s: this is awesome!! I want words like '1s:', '4s:', or anything like 'ns:' not to be indexed or searchable; to do so I have added stop words
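
A fixed stop-word list cannot enumerate every 'ns:' marker, whereas a pattern can. The Python sketch below illustrates the pattern only; it is not Solr configuration, and both the regular expression and the helper name `strip_timestamp_tokens` are assumptions made for this example.

```python
import re

# Assumed pattern: one or more digits followed by "s:", e.g. "1s:" or "12s:".
TIMESTAMP_TOKEN = re.compile(r"\d+s:")

def strip_timestamp_tokens(text):
    """Split on whitespace and drop tokens that look like 'ns:' markers."""
    return [tok for tok in text.split() if not TIMESTAMP_TOKEN.fullmatch(tok)]

tokens = strip_timestamp_tokens("1s: This is very nice day. 4s: Christmas is about to come")
print(tokens)
```

In Solr the equivalent would have to happen inside the field's analysis chain, at both index and query time, so that the markers are never indexed or searched.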

Re: migration solr 3.5 to 4.1 - JVM GC problems

2013-04-11 Thread Marc Des Garets
Big heap because of a very large number of requests with more than 60 indexes and hundreds of millions of documents (all indexes together). My problem is with solr 4.1. All is perfect with 3.5. I have 0.05 sec GCs every 1 or 2mn and 20Gb of the heap is used. With the 4.1 indexes it uses 30Gb-33Gb, the s