Re: Modifying date format when using TrieDateField.

2014-08-13 Thread Modassar Ather
Thanks Erick for your inputs. Regards, Modassar On Tue, Aug 12, 2014 at 8:32 PM, Erick Erickson wrote: > The response will always be the full specification, > so you'll have yyyy-MM-dd'T'HH:mm:ss format. > If you want the user to just see the yyyy-MM-dd > you could use a DocTransformer to chang
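As a client-side alternative to the DocTransformer suggested in the reply above (a different technique, not the one Erick describes), the date portion can be cut out of the full timestamp after the response arrives. This is only a sketch: the function name is hypothetical, and it assumes the value comes back in UTC with a trailing "Z", for example 2014-08-13T15:32:00Z.

```python
from datetime import datetime


def date_only(solr_timestamp: str) -> str:
    """Return just the calendar date from a full Solr-style timestamp.

    Assumes the input looks like 2014-08-13T15:32:00Z (UTC, trailing Z).
    """
    parsed = datetime.strptime(solr_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.strftime("%Y-%m-%d")


print(date_only("2014-08-13T15:32:00Z"))
```

The stored and returned value stays in the full format; only what the user sees is shortened.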

Re: Disabling transaction logs

2014-08-13 Thread Ramkumar R. Aiyengar
(1) sounds a lot like SOLR-6261 I mention above. There are possibly other improvements since 4.6.1 as Mark mentions, I would certainly suggest you test with the latest version with the issue above patched (or use the current stable branch in svn, branch_4x) to see if that makes a difference.

Re: Writing my first Solr Search Component

2014-08-13 Thread Tri Cao
1. No, there is only one instance 2. init() is called 3. check these standard search components: https://github.com/apache/lucene-solr/tree/trunk/solr/core/src/java/org/apache/solr/handler/component Depending on what you are doing, you can pick the component that's closest to your purposes. Othe

Writing my first Solr Search Component

2014-08-13 Thread Apurv Verma
Hey all, I am writing my first Solr search component and had the following questions. 1. Does each incoming query create a new Solr search component object? 2. What happens at the time of core reload? 3. What are some good examples of SearchComponents written? Thanks -- Regards, Apur

Re: solrCloud ignore inactive nodes

2014-08-13 Thread Erick Erickson
shards.tolerant=true On Wed, Aug 13, 2014 at 3:32 PM, Jonk wrote: > Am considering solrCloud for distributed search of a small set of widely > distributed indexes where each index will be small and non-deterministic > search behavior is permissible. Nodes may be "down", in which case merely >
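A minimal sketch of how the shards.tolerant=true parameter from the reply above might be attached to a query. The base URL and collection name are placeholders, not taken from the thread, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def tolerant_query_url(base_url: str, query: str) -> str:
    """Build a select URL that asks Solr to tolerate unavailable shards."""
    params = {
        "q": query,
        "shards.tolerant": "true",  # return partial results instead of failing
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Placeholder host and collection, for illustration only.
print(tolerant_query_url("http://localhost:8983/solr/collection1", "*:*"))
```

With the parameter set, results from shards that are down are simply left out of the response, which matches the behavior Jonk asks for.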

Re: Solr cloud performance degradation with billions of documents

2014-08-13 Thread Erick Erickson
Several points: 1> Have you considered using the MapReduceIndexerTool for your ingestion? Assuming you don't have duplicate IDs, i.e. each doc is new, you can spread your indexing across as many nodes as you have in your cluster. That said, it's not entirely clear that you'll gain throughput since

Re: Solr cloud performance degradation with billions of documents

2014-08-13 Thread Jack Krupansky
Be careful when you say "instance" - that usually refers to a single Solr node. Anyway... 32 shards - with a replication factor of 1? So, given your worst case here, 5 billion documents in a 32-node cluster, that's 156 million documents per node. What is the index size on a typical node? And
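The per-node figure quoted above can be checked with a few lines of arithmetic. The sketch assumes, as the reply does, that the 5 billion documents are spread evenly over 32 nodes with a replication factor of 1.

```python
total_docs = 5_000_000_000  # worst case mentioned in the thread
nodes = 32                  # one shard per node, replication factor 1 (assumed)

docs_per_node = total_docs / nodes
print(f"{docs_per_node / 1e6:.2f} million documents per node")
```

The exact result is 156.25 million, which the reply rounds to 156 million.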

Re: Regarding solr commits

2014-08-13 Thread Erick Erickson
SolrServer.add(Collection docs, int commitWithinMs) Best, Erick On Wed, Aug 13, 2014 at 10:13 AM, M, Arjun (NSN - IN/Bangalore) < arju...@nsn.com> wrote: > Hi Harsha, > > Thanks for the response. But I am sending the updates via XML messages. > Updates are through solr api in java. > > So it wi
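Since Arjun mentions sending updates as XML messages, the same commit-within idea can also be expressed with the commitWithin attribute on the add element of an XML update message. The sketch below only builds the message. The helper name, field names, and the 5000 ms value are illustrative assumptions, not settings from the thread.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs, commit_within_ms):
    """Build an XML <add> message that asks Solr to commit within a time limit."""
    add = ET.Element("add", {"commitWithin": str(commit_within_ms)})
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document, for illustration only.
message = build_add_message([{"id": "doc-1", "title": "example"}], 5000)
print(message)
```

Letting Solr schedule the commit this way avoids issuing an explicit commit after every update.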

Re: Disabling transaction logs

2014-08-13 Thread KNitin
Thanks Anshum. This is great to know. If any of you can share your experience with restarting such massive clusters, that will greatly help On Wed, Aug 13, 2014 at 3:19 PM, Anshum Gupta wrote: > Hi Nitin, > > There's already an issue for breaking the clusterstate.json. Here's the > link: > ht

solrCloud ignore inactive nodes

2014-08-13 Thread Jonk
Am considering solrCloud for distributed search of a small set of widely distributed indexes where each index will be small and non-deterministic search behavior is permissible. Nodes may be "down", in which case merely want to omit results from that node (shard). Small scale testing shows that i

Re: Disabling transaction logs

2014-08-13 Thread Anshum Gupta
Hi Nitin, There's already an issue for breaking the clusterstate.json. Here's the link: https://issues.apache.org/jira/browse/SOLR-5473 A lot of work has already been done on that one and hopefully, it should be in trunk soon. On Wed, Aug 13, 2014 at 3:13 PM, KNitin wrote: > Thanks, Mark. Yes

Re: Disabling transaction logs

2014-08-13 Thread KNitin
Thanks, Mark. Yes, I keep track of the overseer and restart it in the end. The only thing that I observe is that as the zookeeper cluster state file grows, this behavior gets worse. I notice the following issues 1. Two nodes (different replicas for the same shard) get stuck in recovering stat

RE: Solr cloud performance degradation with billions of documents

2014-08-13 Thread Markus Jelsma
Hi - You are running mapred jobs on the same nodes as Solr runs right? The first thing i would think of is that your OS file buffer cache is abused. The mappers read all data, presumably residing on the same node. The mapper output and shuffling part would take place on the same node, only the r

RE: Solr cloud performance degradation with billions of documents

2014-08-13 Thread Toke Eskildsen
Wilburn, Scott [scott.wilb...@verizonwireless.com.INVALID] wrote: > Hardware wise, I have a 32-node Hadoop cluster that I use to run all of the > Solr shards and > each node has 128GB of memory. The current SolrCloud setup is split into 4 > > separate and > individual clouds of 32 shards each the

RE: Solr cloud performance degradation with billions of documents

2014-08-13 Thread Wilburn, Scott
Thanks for replying Jack. I have 4 SolrCloud instances (or clusters), each consisting of 32 shards. The clusters do not have any interaction with each other. Thanks, Scott -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Wednesday, August 13, 2014 2:1

Re: Solr cloud performance degradation with billions of documents

2014-08-13 Thread Jack Krupansky
Could you clarify what you mean with the term "cloud", as in "per cloud" and "individual clouds"? That's not a proper Solr or SolrCloud concept per se. SolrCloud works with a single "cluster" of nodes. And there is no interaction between separate SolrCloud clusters. -- Jack Krupansky -Ori

Solr cloud performance degradation with billions of documents

2014-08-13 Thread Wilburn, Scott
Hello everyone, I am trying to use SolrCloud to index a very large number of simple documents and have run into some performance and scalability limitations and was wondering what can be done about it. Hardware wise, I have a 32-node Hadoop cluster that I use to run all of the Solr shards and e

Re: ICUTokenizer acting very strangely with oriental characters

2014-08-13 Thread Shawn Heisey
On 8/12/2014 9:13 PM, Steve Rowe wrote: > In the table below, the "IsSameS" (is same script) and "SBreak?" (script > break = not IsSameS) decisions are based on what I mentioned in my previous > message, and the "WBreak" (word break) decision is based on UAX#29 word > break rules: > > CharCode

Re: Regarding solr commits

2014-08-13 Thread M, Arjun (NSN - IN/Bangalore)
Hi Harsha, Thanks for the response. But I am sending the updates via XML messages. Updates are through solr api in java. So it will be good if you can provide the references that side. Arjun M On Aug 13, 2014 5:40 PM, ext Harshvardhan Ojha wrote: Hi Arjun, You can send commit request to sol

Re: Regarding solr commits

2014-08-13 Thread Erick Erickson
What version of Solr? If it's Solr 4.x, see: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Let me say that your settings are _extremely_ aggressive. Hard committing every second is _very_ likely to cause you significant problems down the road so

Re: Unable to read HBase data from solr

2014-08-13 Thread Erick Erickson
Solr doesn't know a _thing_ about HBase. You say you've put the HBase jar somewhere and are accessing it in your script update processor. You need to find out where the HBase code is looking for the hbase-default.xml file and I'm afraid I have very little knowledge here. You _might_ have some joy

Re: matching "starts with" only

2014-08-13 Thread Erick Erickson
I'd recommend that you spend some time with the admin/analysis page. KeywordTokenizer doesn't break up the input at _all_. So the text "this is a black cat" will never match anything that starts out "black". String is even more restrictive, it not only doesn't tokenize, it won't allow lower case.
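The point about KeywordTokenizer can be illustrated with a toy model. This is a plain Python sketch, not Solr's analysis chain: it only assumes, as the reply states, that the whole input is kept as one token, so a prefix match can only succeed from the very start of the field value.

```python
def keyword_tokenize(text):
    """Toy stand-in for KeywordTokenizer: the entire input is a single token."""
    return [text]


def prefix_matches(field_value, prefix):
    """True if any token of the field starts with the given prefix."""
    return any(token.startswith(prefix) for token in keyword_tokenize(field_value))


print(prefix_matches("this is a black cat", "black"))    # False
print(prefix_matches("this is a black cat", "this is"))  # True
```

The admin/analysis page shows the same thing for the real field type, including the effect of any filters configured after the tokenizer.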

Re: Can I use multiple cores

2014-08-13 Thread Erick Erickson
You really can't tell until you prototype and measure. Here's a long blog on why what you're asking, although a reasonable request, is just about impossible to answer without prototyping and measuring. http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-an

Re: Disabling transaction logs

2014-08-13 Thread Mark Miller
That is good testing :) We should track down what is up with that 30%. Might open a JIRA with some logs. It can help if you restart the overseer node last. There are likely some improvements around this post 4.6. -- Mark Miller about.me/markrmiller On August 13, 2014 at 12:05:27 PM, KNitin (n

Re: Disabling transaction logs

2014-08-13 Thread KNitin
Thank you all! Yes, I want to disable it for testing purposes. The main issue is that rolling restart of SolrCloud for 1000 collections is extremely unreliable and slow. More than 30% of the collections fail to recover. What are some good guidelines to follow while restarting a massive cluster like t

Re: FW: solr Analysis page matching question

2014-08-13 Thread Shawn Heisey
On 8/13/2014 8:59 AM, Corey Gerhardt wrote: > Here's hopefully a better explanation of what I'm asking. > > http://screencast.com/t/8blvgtJbY This would not match, because "restaurant" (one of the query terms) is not present in the field. The mm value of 100% requires *all* of the query terms to
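A simplified sketch of the rule described above: with mm set to 100%, every query term has to be found in the field. The function and the sample terms are hypothetical, and the model ignores analysis, multiple fields, and the other forms the mm parameter accepts.

```python
def satisfies_mm(query_terms, field_terms, mm_percent=100):
    """Check whether enough query terms occur in the field (percentage form only)."""
    required = len(query_terms) * mm_percent // 100  # the computed count is rounded down
    matched = sum(1 for term in query_terms if term in field_terms)
    return matched >= required


# Made-up field content that lacks the term "restaurant".
field_terms = {"pizza", "palace"}
print(satisfies_mm(["pizza", "restaurant"], field_terms))      # False at mm=100%
print(satisfies_mm(["pizza", "restaurant"], field_terms, 50))  # True at mm=50%
```

Lowering mm is one way to let a document match when one of the query terms is absent from the field.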

Re: Autosuggest with spelling correction

2014-08-13 Thread Gopal Patwa
This JIRA has some documentation; maybe this will help you. https://issues.apache.org/jira/browse/SOLR-5683 On Wed, Aug 13, 2014 at 1:28 AM, Harun Reşit Zafer < harun.za...@tubitak.gov.tr> wrote: > Hi everyone, > > Currently I'm using AnalyzingInfixLookupFactory with a suggestions file > con

FW: solr Analysis page matching question

2014-08-13 Thread Corey Gerhardt
Here's hopefully a better explanation of what I'm asking. http://screencast.com/t/8blvgtJbY Thanks, Corey -Original Message- From: Corey Gerhardt [mailto:corey.gerha...@directwest.com] Sent: August-08-14 2:30 PM To: Solr User List Subject: solr Analysis page matching question Edismax

Re: SolrCloud OOM Problem

2014-08-13 Thread tuxedomoon
I applied the OPTS you pointed me to, here's the full string: CATALINA_OPTS="${CATALINA_OPTS} -XX:NewSize=1536m -XX:MaxNewSize=1536m -Xms12288m -Xmx12288m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+CMSScavengeBeforeRemark -

Re: Help Required

2014-08-13 Thread Shawn Heisey
On 8/13/2014 5:11 AM, Dmitry Kan wrote: > OK, thanks. Can you please add my user name to the Contributor group? > > username: DmitryKan You are added. Edit away! Thanks, Shawn

Re: explaination of query processing in SOLR

2014-08-13 Thread Jack Krupansky
Why? The semantics are defined by the code and similarity matching algorithm, not... files. -- Jack Krupansky -Original Message- From: abhi Abhishek Sent: Wednesday, August 13, 2014 2:40 AM To: solr-user@lucene.apache.org Subject: Re: explaination of query processing in SOLR Thanks A

Re: Replication Issue with Repeater Please help

2014-08-13 Thread Shawn Heisey
On 8/13/2014 12:49 AM, waqas sarwar wrote: > Hi, I'm using Solr. I need a little bit of assistance from you. I am a bit > stuck with Solr replication; before discussing the issue let me write a brief > description. Scenario: I want to set up Solr in a distributed architecture, > suppose start w

Re: SolrCloud OOM Problem

2014-08-13 Thread Shawn Heisey
On 8/13/2014 5:42 AM, tuxedomoon wrote: > Have you used a queue to intercept queries and if so what was your > implementation? We are indexing huge amounts of data from 7 SolrJ instances > which run independently, so there's a lot of concurrent indexing. On my setup, the queries come from a java

Re: SolrCloud OOM Problem

2014-08-13 Thread Shawn Heisey
On 8/13/2014 5:34 AM, tuxedomoon wrote: > Great info. Can I ask how much data you are handling with that 6G or 7G > heap? My dev server is the one with the 7GB heap. My production servers only handle half the index shards, so they have the smaller heap. Here is the index size info from my dev s

Re: Regarding solr commits

2014-08-13 Thread Dmitry Kan
Hi, you can use: - waitSearcher = "true" | "false" — default is true — block until a new searcher is opened and registered as the main query searcher, making the changes visible. on the commit request. Dmitry On Wed, Aug 13, 2014 at 3:06 PM, M, Arjun (NSN - IN/Bangalore) < arju
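A small sketch of an XML commit message carrying the waitSearcher attribute described above. Only the message is built here. How it is posted to the update handler is left out, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_commit_message(wait_searcher=True):
    """Build an XML <commit> message with an explicit waitSearcher setting."""
    value = "true" if wait_searcher else "false"
    commit = ET.Element("commit", {"waitSearcher": value})
    return ET.tostring(commit, encoding="unicode")


print(build_commit_message())       # blocks until the new searcher is registered
print(build_commit_message(False))  # returns without waiting for the searcher
```

With waitSearcher left at true, a query issued after the commit call returns should see the committed changes.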

Re: Regarding solr commits

2014-08-13 Thread Harshvardhan Ojha
Hi Arjun, You can send commit request to solr https://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22 Regards Harshvardhan Ojha On Wed, Aug 13, 2014 at 5:36 PM, M, Arjun (NSN - IN/Bangalore) < arju...@nsn.com> wrote: > Hi, > > I have a query regarding solr comm

Regarding solr commits

2014-08-13 Thread M, Arjun (NSN - IN/Bangalore)
Hi, I have a query regarding solr commits. We have a use case where we immediately query solr db after committing. We found that some data is missing as part of query results. We are predicting that commit wouldn't have completed when we query the solr db. Our solr commit config as fol

Unable to read HBase data from solr

2014-08-13 Thread Vivekanand Ittigi
I'm trying to read specific HBase data and index it into Solr using a Groovy script in the "/update" handler of the solrconfig file, but I'm getting the error mentioned below. I'm placing the same HBase jar on which I'm running in the Solr lib. Many articles said WorkAround: 1. First I thought that class path has tw

Re: SolrCloud OOM Problem

2014-08-13 Thread tuxedomoon
Have you used a queue to intercept queries and if so what was your implementation? We are indexing huge amounts of data from 7 SolrJ instances which run independently, so there's a lot of concurrent indexing. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-OOM-Pr

Re: SolrCloud OOM Problem

2014-08-13 Thread tuxedomoon
Great info. Can I ask how much data you are handling with that 6G or 7G heap? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-OOM-Problem-tp4152389p4152712.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Help Required

2014-08-13 Thread Dmitry Kan
OK, thanks. Can you please add my user name to the Contributor group? username: DmitryKan On Tue, Aug 12, 2014 at 5:41 PM, Shawn Heisey wrote: > On 8/12/2014 3:57 AM, Dmitry Kan wrote: > > Hi, > > > > is http://wiki.apache.org/solr/Support page immutable? > > All pages on that wiki are change

Autosuggest with spelling correction

2014-08-13 Thread Harun Reşit Zafer
Hi everyone, Currently I'm using AnalyzingInfixLookupFactory with a suggestions file containing up to 3 word phrases. However this component can't keep suggesting in case of spelling errors. I heard about FuzzySuggester and found some sample configurations here

Re: Error for creating Index folder?

2014-08-13 Thread mahesh_varak
Hey, thanks Bill Au -- View this message in context: http://lucene.472066.n3.nabble.com/Error-for-creating-Index-folder-tp489756p4152667.html Sent from the Solr - User mailing list archive at Nabble.com.

about solr question

2014-08-13 Thread 段启江
Hi all, I'm sorry to disturb you, but I have some questions about Solr. I want to use the Solr DIH tool, but I don't know whether it supports one-to-many tables on import. Note: across data sources. If it is supported, how should I modify the XML below? Thank you very much. Example: