Re: Backuping SolrCloud

2014-11-25 Thread elmerfudd
Thanks Ralph. Is it mandatory to stop indexing while the backup is running? I didn't really understand the script: it's not iterating through all the shards, and I thought it would. Thank you all for helping; I will keep track of and contribute to the open JIRAs.

Re: ERROR StreamingSolrServers 4.10.2

2014-11-25 Thread Ahmet Arslan
Hi Joe, StreamingUpdateSolrServer was renamed to ConcurrentUpdateSolrServer. I am surprised you see 'StreamingSolrServers' in your logs. Maybe you need to upgrade your client jars too? Ahmet On Tuesday, November 25, 2014 4:42 AM, Joseph V J wrote: Hi Team, Does this mean that the updates on the other

Fwd: Change in the Score of Similiar Documents

2014-11-25 Thread rashi gandhi
Hi, I have created two shards on a Solr server and indexed 6 documents (all containing exactly the same data: "Welcome to SOLR"). Let's say the ids run from 1 to 6 and they are indexed as follows: Shard_one: ids 2, 4 and 6 are present in this shard. Shard_two: ids 1, 3 and 5 are present in

Fwd: Reindex Issues

2014-11-25 Thread rashi gandhi
Hi, I have created two shards on a Solr server, and around 4K documents are equally indexed over these two shards. I re-indexed all the indexed documents (updating existing docs with the same data again). After re-indexing, I found that my indexes are not optimized and there is a change in the

Re: ERROR StreamingSolrServers 4.10.2

2014-11-25 Thread Joseph V J
Hello Ahmet, Thank you for the quick reply. I'm not sure about the 'client jars' you specified in the mail. JAR files used in my case are from the tar ball available from the location http://apache.cs.utah.edu/lucene/solr/4.10.2/solr-4.10.2.tgz ~Regards Joe On Tue, Nov 25, 2014 at 1:45 PM, Ahmet

AW: CoreContainer : create new cores reusing/sharing solrconfig.xml and schema.xml

2014-11-25 Thread Clemens Wyss DEV
We are "skipping" Java 1.7, so no excuse required ;) -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Monday, November 24, 2014 17:12 To: solr-user@lucene.apache.org Subject: Re: CoreContainer : create new cores reusing/sharing solrconfig.xml and

AW: CoreContainer : create new cores reusing/sharing solrconfig.xml and schema.xml

2014-11-25 Thread Clemens Wyss DEV
Even if I set CORE_CONFIGSET I get org.apache.solr.common.SolrException: Unable to create core [test] at org.apache.solr.core.CoreContainer.create(CoreContainer.java:507) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:466) at ch.mysign.indexing.solr.SolrTest

Re: ERROR StreamingSolrServers 4.10.2

2014-11-25 Thread Ahmet Arslan
Hi, By "client jars" I mean the standalone Java program that sends documents to the Solr server. I assume that you use SolrJ for indexing, as described here: https://cwiki.apache.org/confluence/display/solr/Using+SolrJ Updating the Solr server is one thing; if you are using the Solr Java client, updat

Re: Fwd: Reindex Issues

2014-11-25 Thread Ahmet Arslan
Hi, The query you use is a constant score query, so as long as all documents are assigned the same score, it is not a problem. Also, you may want to read about expungeDeletes. Ahmet On Tuesday, November 25, 2014 10:23 AM, rashi gandhi wrote: Hi, I have created two shards at solr server and arou

Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Apurv Verma
Hey all, The standard solution to doing a case-insensitive match in lucene is to use a Lowercase filter at index and query time. However this does not preserve the content of the original document. For example if my inverted index is. Term Doc_1 Doc_2 - Quick |

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Ahmet Arslan
Hi Apurv, You can create an additional field for case sensitive search, and then you can switch at query time. You will have two fields (text_ci and text_lower) with different analysers populated with copyField. Ahmet On Tuesday, November 25, 2014 1:39 PM, Apurv Verma wrote: Hey all, The sta

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Apurv Verma
Hi Ahmet, Thanks for your reply. Creating two separate fields is a viable solution, where one contains the original value and the other contains the lowercased value. But this leads to index bloat (~2x). I am looking for any other alternative solutions. -- Regards, Apurv Verma On Tue, Nov

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Ahmet Arslan
Hi Apurv, I wouldn't worry about index size; the increase in index size is not linear (2x) like that. Please see a similar discussion: https://issues.apache.org/jira/browse/LUCENE-5620 Ahmet On Tuesday, November 25, 2014 1:46 PM, Ahmet Arslan wrote: Hi Apurv, You can create an additional fi

Re: Reindex Issues

2014-11-25 Thread Jack Krupansky
When a document is reindexed, the old document is deleted and the new document is added. The deleted document is not visible on queries, but the document frequency (df) for terms includes the count of deleted documents containing the terms. I would expect that df would double if all documents a
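To make the effect of deleted documents on scoring concrete, here is a minimal sketch. It assumes the classic TF-IDF similarity that Lucene 4.x uses by default, where idf = 1 + ln(maxDoc / (docFreq + 1)) and both counts still include deleted documents until segments are merged. The shard sizes are hypothetical, chosen only for illustration.

```java
public class IdfSketch {
    // Classic Lucene TF-IDF idf: 1 + ln(maxDoc / (docFreq + 1)).
    // Assumption: both arguments include deleted-but-not-yet-merged documents.
    static double idf(long docFreq, long maxDoc) {
        return 1.0 + Math.log(maxDoc / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // Hypothetical shard: 3 documents, all containing the term.
        double before = idf(3, 3);
        // Same shard after reindexing all 3 documents: the old copies are
        // marked deleted but still counted, so both numbers double.
        double after = idf(6, 6);
        System.out.println("idf before reindex: " + before);
        System.out.println("idf after reindex:  " + after);
    }
}
```

Under these assumptions the idf rises slightly after the reindex even though the visible documents are unchanged, which is one way identical documents can end up with different scores until the deletes are merged away (for example by an optimize or a commit with expungeDeletes).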

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Michael Sokolov
The index size will not increase as quickly as you might think, and is not an issue in most cases. An alternative to two fields, though, is to index both upper- and lower-case tokens at the same position in a single field, and then to perform no case folding at query time. There is no standar
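The single-field idea described above can be illustrated without any Lucene classes. The sketch below is a library-free approximation, not a real TokenFilter: the `DualCaseTokens` class and its `Token` type are hypothetical names, and whitespace splitting stands in for a real tokenizer. It emits the original token and, when it differs, a lowercased copy at the same position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class DualCaseTokens {
    /** A term together with its position in the token stream (hypothetical type). */
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    // Emit each whitespace-separated word as-is, plus a lowercased copy at the
    // same position when the lowercased form differs from the original.
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        String[] words = text.trim().split("\\s+");
        for (int pos = 0; pos < words.length; pos++) {
            String original = words[pos];
            String lower = original.toLowerCase(Locale.ROOT);
            tokens.add(new Token(original, pos));
            if (!lower.equals(original)) {
                tokens.add(new Token(lower, pos));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [The@0, the@0, Quick@1, quick@1, brown@2]
        System.out.println(analyze("The Quick brown"));
    }
}
```

With no case folding at query time, a query for `quick` would match through the lowercased copy, while a query for `Quick` would match only documents containing that exact form. Terms that are already lowercase are stored once, which is part of why the index grows by less than 2x.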

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Michael Sokolov
right -- missed Ahmet's answer there in my haste to respond ... -Mike On 11/25/14 6:56 AM, Ahmet Arslan wrote: Hi Apurv, I wouldn't worry about index size, increase in index size is not linear (2x) like that. Please see similar discussion : https://issues.apache.org/jira/browse/LUCENE-5620 A

Re: Fwd: Change in the Score of Similiar Documents

2014-11-25 Thread Michael Sokolov
Scores are related to total term frequencies *in each shard*, not globally, and I think they may include term counts from deleted documents as well, which could account for the discrepancy in scores across the two shards. -Mike On 11/25/14 3:22 AM, rashi gandhi wrote: Hi, I have created t

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Heyde, Ralf
Simply use two fields, one for case-sensitive and one for case-insensitive selection. On 25.11.2014 12:39, "Apurv Verma" wrote: > Hey all, > The standard solution to doing a case-insensitive match in lucene is to > use a Lowercase filter at index and query time. However this does not > preserve the content of the orig

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Apurv Verma
Hey Michael, Thanks for your reply. My use case is a little different. I would like to get the original values in facet queries but I would like to apply filter queries in a case insensitive fashion. For example I require facet_query to return Quick, The, brown, ... But I want filter queries of

Help on matching a shingle in a query to a shingle in the document

2014-11-25 Thread vit
Example of what I need: Query: Hi likes *this kind of winter* weather. Document shingle field: They like *this kind of winter* with many sunny days. So I need to match *this kind of winter*. What tokenizers and filters, and maybe something else, should be used for this kind of match? I tried for exa

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Alexandre Rafalovitch
The usual solution is to facet on the other field (populated with copyField). Usually this is because people want the original, unmodified version of the string without tokenization (so, "United States of America" instead of "united" "states" "america"). It sounds like your case is a little different an

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Ahmet Arslan
Hi, CapitalizationFilterFactory could be useful to build nice looking facet parameters. Ahmet On Tuesday, November 25, 2014 3:28 PM, Alexandre Rafalovitch wrote: The usual solution is to have faceting using the other field (with copyField). Usually it is because people want the original unmo

Re: Help on matching a shingle in a query to a shingle in the document

2014-11-25 Thread Erick Erickson
Tokenizers, filters and the like have no real way to figure out that some words in the query are to be ignored. In your example, how would one algorithmically determine that "this kind of winter" is important and that "Hi", "likes" and "weather" aren't? What's different about like/likes that indica

Re: Help on matching a shingle in a query to a shingle in the document

2014-11-25 Thread Alexandre Rafalovitch
Sounds like an attempt to identify stable Multi Word Units, sometimes used in Natural Language Processing. In that case, a Shingle factory plus using the field as a facet might do the trick. The shingle will generate a "token" that is "this kind of winter" and facet will give back a count for it.
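As a rough illustration of what shingling produces for the example in this thread, here is a library-free sketch. It is not Solr's ShingleFilterFactory: the `ShingleSketch` class, its method names, and the whitespace tokenization are assumptions made for illustration. It builds word shingles for the query and the document and reports the longest one they share.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class ShingleSketch {
    // Build all word n-grams ("shingles") of minSize..maxSize words.
    static Set<String> shingles(String text, int minSize, int maxSize) {
        String[] words = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        Set<String> out = new LinkedHashSet<>();
        for (int size = minSize; size <= maxSize; size++) {
            for (int start = 0; start + size <= words.length; start++) {
                out.add(String.join(" ", Arrays.copyOfRange(words, start, start + size)));
            }
        }
        return out;
    }

    // Return the shared shingle with the most words, or "" if none is shared.
    static String longestShared(String query, String doc, int minSize, int maxSize) {
        Set<String> shared = shingles(query, minSize, maxSize);
        shared.retainAll(shingles(doc, minSize, maxSize));
        String best = "";
        int bestWords = 0;
        for (String s : shared) {
            int n = s.split(" ").length;
            if (n > bestWords) {
                best = s;
                bestWords = n;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String query = "Hi likes this kind of winter weather";
        String doc = "They like this kind of winter with many sunny days";
        // prints: this kind of winter
        System.out.println(longestShared(query, doc, 2, 5));
    }
}
```

In Solr the equivalent would be to apply shingling to both the indexed field and the query, so that each shingle becomes a single token; the sketch only shows why "this kind of winter" is the token both sides have in common.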

determine amount of memory used by different solr caches

2014-11-25 Thread sumitj25
Hi, I posted a question on stackoverflow regarding this http://stackoverflow.com/questions/26909948/how-to-determine-amount-of-memory-used-by-different-solr-caches Haven't received a

Re: Help on matching a shingle in a query to a shingle in the document

2014-11-25 Thread vit
Erick, What you are saying of course makes perfect sense. But in our particular situation there is a high probability that an essential part of the query will match a meaningful part or a business name in a short description indexed as shingle. Also it is better than just a broad match. Besides I

Re: determine amount of memory used by different solr caches

2014-11-25 Thread Alexandre Rafalovitch
Have you looked in the Admin UI's Plugin and Stats' menu? https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=32604180 I believe there are cache sizes (in units) there and you can even lock and compare to verify the impact of specific operations. You may also find this article useful:

Re: determine amount of memory used by different solr caches

2014-11-25 Thread Erick Erickson
Each of these is essentially a map. filterCache: key: the filter query; value: a bitmap of docs that satisfy the clause, i.e. maxDoc/8 bytes. Note, I'm cheating a little here; this is the max size of each entry. queryResultCache: key: the query, probably the whole thing; value: an array
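The maxDoc/8 figure for filterCache entries lends itself to a quick back-of-the-envelope estimate. The sketch below is only an upper bound under stated assumptions: one bit per document per entry, a full cache, and the memory for the keys ignored. The maxDoc and cache size values are hypothetical.

```java
public class FilterCacheEstimate {
    // Upper bound for one filterCache entry: one bit per document,
    // rounded up to whole bytes.
    static long maxBytesPerEntry(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    // Upper bound for a full cache; the size of the keys is ignored.
    static long maxBytesForCache(long maxDoc, int cacheSize) {
        return maxBytesPerEntry(maxDoc) * cacheSize;
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L; // hypothetical index size
        int cacheSize = 512;       // hypothetical filterCache "size" setting
        // prints 640000000
        System.out.println(maxBytesForCache(maxDoc, cacheSize));
    }
}
```

With these numbers the bound is about 640 MB, which shows how quickly a large filterCache on a large index can dominate the heap. Actual usage may be lower, since this is the maximum size of each entry.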

Replicate a collection to a 2nd SolrCloud

2014-11-25 Thread Gili Nachum
Hi, *I need to replicate a collection between SolrClouds; has anyone done it?* The replication style I need is one-directional: replicating anything that happens on my main site SolrCloud to the DR site (master->slave). I considered and decided against synchronizing the collections' shards Lucene index ov

Re: Replicate a collection to a 2nd SolrCloud

2014-11-25 Thread Alexandre Rafalovitch
Is this a one-off or on an ongoing basis? For an ongoing basis, Apple did a presentation at the Solr Revolution: basically, they had a client that would send updates to both Solr and a queue. Then another client would read from the queue and apply the updates. Theirs was a bidirectional and symmetric setup, so it

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Shawn Heisey
On 11/25/2014 6:27 AM, Alexandre Rafalovitch wrote: > The usual solution is to have faceting using the other field (with > copyField). Usually it is because people want the original unmodified > version the string without tokenization (So, "United States of > America" instead of "united" "states" "

Does any solr version use lucene concurrent flush

2014-11-25 Thread Aaron Beach
-- Aaron Beach Senior Data Scientist w: +1-303-625-7043 SendGrid -- Email Delivery. Simplified.

Re: Does any solr version use lucene concurrent flush

2014-11-25 Thread Shawn Heisey
On 11/25/2014 2:37 PM, Aaron Beach wrote: I'm fairly clueless when it comes to how Lucene internals work or how Solr uses those internals ... but I remember hearing a LOT about concurrent flushing in the leadup to the Lucene/Solr 4.0 release, and what a performance boost it was for indexing speed.

Re: Replicate a collection to a 2nd SolrCloud

2014-11-25 Thread Gili Nachum
Replication should occur on an ongoing basis. Sounds interesting. I will wait for the 2014 Solr Revolution videos to be released. On Tue, Nov 25, 2014 at 10:33 PM, Alexandre Rafalovitch wrote: > Is this one off or on the ongoing bases? > > For ongoing bases, Apple did a presentation at the Solr Rev

Re: Replicate a collection to a 2nd SolrCloud

2014-11-25 Thread Alexandre Rafalovitch
Uhm. That was about it for the details. Maybe Shalin will chip in with more, though I suspect he is under NDA for deep specifics. But yes, the videos should be out soon. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com

Re: Replicate a collection to a 2nd SolrCloud

2014-11-25 Thread Otis Gospodnetic
Hi, I think you are looking for this: http://search-lucene.com/?q=Cross+Data+Center+Replication&fc_project=Solr ==> https://issues.apache.org/jira/browse/SOLR-6273 Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/

Re: Does any solr version use lucene concurrent flush

2014-11-25 Thread Otis Gospodnetic
Yes. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On Tue, Nov 25, 2014 at 4:37 PM, Aaron Beach wrote: > -- > Aaron Beach > Senior Data Scientist > w: +1-303-625-7043 > > SendGrid -- Email Delivery. Simplified

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Erick Erickson
DocValues are restricted to certain types of untokenized fields, specifically string, Trie* and UUID. So lowercasefilter is just not even in the picture. Furthermore, changing to DocValues requires completely re-indexing, so Best, Erick On Tue, Nov 25, 2014 at 1:26 PM, Shawn Heisey wrote: >

Exception in unit tests for distributed search component

2014-11-25 Thread Suchi Amalapurapu
Hi, I am trying to test a custom distributed component with Solr 4.6.1 which extends BaseDistributedSearchTestCase, but I end up with the following error. There are a lot of tests in the Solr code base which extend BaseDistributedSearchTestCase. Not sure what is wrong here. Suchi testDistribSearch(com.

updateNumericDocValue in solr 4.6.1

2014-11-25 Thread Suchi Amalapurapu
All, the following code changes don't seem to really update the docValue in my case. IndexWriter iw = core.getSolrCoreState().getIndexWriter(core).get(); value = Long.parseLong(score); Term term = new Term(ID, id1); iw.updateNumericDocValue(term, "rank", value); iw.commit(); Schema changes: D

Re: updateNumericDocValue in solr 4.6.1

2014-11-25 Thread Mikhail Khludnev
Hello Suchi, It seems to be a work in progress (https://issues.apache.org/jira/browse/SOLR-5944) that hasn't been completed yet. On Wed, Nov 26, 2014 at 7:24 AM, Suchi Amalapurapu wrote: > All > The following code changes don't seem to really update the docValue in my > case. > > IndexWriter iw =