Re: ReplicationFactor for solrcloud

2013-09-11 Thread Aloke Ghoshal
Hi Aditya, You need to start another 6 instances (9 instances in total) to achieve this. The first 3 instances, as you mention, are already assigned to the 3 shards. The next 3 will become their replicas, followed by the next 3 as the next replicas. You could create two copies each of the exam
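The arithmetic in this reply can be sketched as follows. This is only an illustration, assuming (as the reply implies) that replicationFactor counts every copy of a shard, the leader included, and that each instance hosts a single core. The function name `instances_needed` is hypothetical and is not part of Solr.

```python
def instances_needed(num_shards: int, replication_factor: int) -> int:
    """Return how many single-core instances are needed so that every
    shard has `replication_factor` copies (leader included).

    Assumption: one core per instance, as in the thread above.
    """
    if num_shards < 1 or replication_factor < 1:
        raise ValueError("num_shards and replication_factor must be >= 1")
    return num_shards * replication_factor


# The case from this thread: 3 shards, replicationFactor=3.
total = instances_needed(3, 3)
already_running = 3
print(total, "instances in total,", total - already_running, "more to start")
```

Under the same assumption, 2 shards with one extra replica each (replicationFactor=2) would need 4 instances.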

Re: charset encoding

2013-09-11 Thread Andreas Owen
no jetty, and yes for tomcat i've seen a couple of answers On 12. Sep 2013, at 3:12 AM, Otis Gospodnetic wrote: > Using tomcat by any chance? The ML archive has the solution. May be on > Wiki, too. > > Otis > Solr & ElasticSearch Support > http://sematext.com/ > On Sep 11, 2013 8:56 AM, "Andreas

create a core with explicit node_name

2013-09-11 Thread YouPeng Yang
Hi solr users I want to create a core with node_name through the api CloudSolrServer.query(SolrParams params ). For example: ModifiableSolrParams params = new ModifiableSolrParams(); params.set("qt", "/admin/cores"); params.set("action", "CREATE"); params.set("nam

Storing/indexing speed drops quickly

2013-09-11 Thread Per Steffensen
Hi SolrCloud 4.0: 6 machines, quadcore, 8GB ram, 1T disk, one Solr-node on each, one collection across the 6 nodes, 4 shards per node Storing/indexing from 100 threads on external machines, each thread one doc at the time, full speed (they always have a new doc to store/index) See attached ima

DataImportHandler oddity

2013-09-11 Thread Raymond Wiker
I'm trying to index a view in an Oracle database, and have come across some strange behaviour: all the VARCHAR2 fields are being returned as empty strings; this also applies to a datetime field converted to a string via TO_CHAR, and the url field built by concatenating two constant strings and a nu

Re: SolrCloud 4.x hangs under high update volume

2013-09-11 Thread Tim Vaillancourt
Thanks Erick! Yeah, I think the next step will be CloudSolrServer with the SOLR-4816 patch. I think that is a very, very useful patch by the way. SOLR-5232 seems promising as well. I see your point on the more-shards idea, this is obviously a global/instance-level lock. If I really had to, I

Re: No or limited use of FieldCache

2013-09-11 Thread Per Steffensen
Thanks, guys. Now I know a little more about DocValues and realize that they will do the job wrt FieldCache. Regards, Per Steffensen On 9/12/13 3:11 AM, Otis Gospodnetic wrote: Per, check zee Wiki, there is a page describing docvalues. We used them successfully in a solr for analytics scenari

Re: number of replicas in Cloud

2013-09-11 Thread Prasi S
Hi Anshum, I'm using solr 4.4. Is there a problem with using replicationFactor of 2? On Thu, Sep 12, 2013 at 11:20 AM, Anshum Gupta wrote: > Prasi, a replicationFactor of 2 is what you want. However, as of the > current releases, this is not persisted. > > > > On Thu, Sep 12, 2013 at 11:17 AM, P

Re: number of replicas in Cloud

2013-09-11 Thread Anshum Gupta
Prasi, a replicationFactor of 2 is what you want. However, as of the current releases, this is not persisted. On Thu, Sep 12, 2013 at 11:17 AM, Prasi S wrote: > Hi, > I want to setup solrcloud with 2 shards and 1 replica for each shard. > > MyCollection > > shard1 , shard2 > shard1-replica , s

number of replicas in Cloud

2013-09-11 Thread Prasi S
Hi, I want to set up solrcloud with 2 shards and 1 replica for each shard. MyCollection shard1 , shard2 shard1-replica , shard2-replica In this case, I would set "numShards=2". For replicationFactor, should I give replicationFactor=1 or replicationFactor=2? Please advise. Thanks, Prasi

Wrapper for SOLR for Compression

2013-09-11 Thread William Bell
I asked this before... But can we add a parameter for SOLR to expose the compression modes to solrconfig.xml ? >* https://issues.apache.org/jira/browse/LUCENE-4226* >* It mentions that we can set compression mode:* >* FAST, HIGH_COMPRESSION, FAST_UNCOMPRESSION.* > -- Bill Bell billnb...@gmail.

Re: Can we used CloudSolrServer for searching data

2013-09-11 Thread Dharmendra Jaiswal
Thanks for your reply. I am using Solrcloud with zookeeper setup. And using CloudSolrServer for both indexing and searching. As per my understanding CloudSolrserver by default using LBHttpSolrServer. And CloudSolrServer connect to Zookeeper and passing all the running server node to LBHttpSolrServ

Re: Distributing lucene segments across multiple disks.

2013-09-11 Thread Deepak Konidena
Very helpful link. Thanks for sharing that. -Deepak On Wed, Sep 11, 2013 at 4:34 PM, Shawn Heisey wrote: > On 9/11/2013 4:16 PM, Deepak Konidena wrote: > >> As far as RAM usage goes, I believe we set the heap size to about 40% of >> the RAM and less than 10% is available for OS caching ( sin

Re: charset encoding

2013-09-11 Thread Otis Gospodnetic
Using tomcat by any chance? The ML archive has the solution. May be on Wiki, too. Otis Solr & ElasticSearch Support http://sematext.com/ On Sep 11, 2013 8:56 AM, "Andreas Owen" wrote: > i'm using solr 4.3.1 with tika to index html-pages. the html files are > iso-8859-1 (ansi) encoded and the met

Re: No or limited use of FieldCache

2013-09-11 Thread Otis Gospodnetic
Per, check zee Wiki, there is a page describing docvalues. We used them successfully in a solr for analytics scenario. Otis Solr & ElasticSearch Support http://sematext.com/ On Sep 11, 2013 9:15 AM, "Michael Sokolov" wrote: > On 09/11/2013 08:40 AM, Per Steffensen wrote: > >> The reason I menti

Re: solr performance against oracle

2013-09-11 Thread Chris Hostetter
Setting aside the excellent responses that have already been made in this thread, there are fundamental discrepancies in what you are comparing in your respective timing tests. First off: a micro benchmark like this is virtually useless -- unless you really plan on only ever executing a singl

Re: Grouping by field substring?

2013-09-11 Thread Jack Krupansky
Do a copyField to another field, with a limit of 8 characters, and then use that other field. -- Jack Krupansky -Original Message- From: Ken Krugler Sent: Wednesday, September 11, 2013 8:24 PM To: solr-user@lucene.apache.org Subject: Grouping by field substring? Hi all, Assuming I w
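To illustrate what grouping on such a truncated copy would produce, here is a small client-side sketch. It does not call Solr at all; it only mimics the effect of grouping on the first 8 characters of a field. The function name `group_by_prefix` and the sample field `sku` are hypothetical.

```python
from collections import defaultdict


def group_by_prefix(docs, field, n=8):
    """Group documents by the first `n` characters of `field`.

    Mimics, outside Solr, what grouping on a copy of the field that is
    limited to `n` characters would yield. Documents missing the field
    are grouped under the empty string.
    """
    groups = defaultdict(list)
    for doc in docs:
        key = str(doc.get(field, ""))[:n]
        groups[key].append(doc)
    return dict(groups)


docs = [
    {"id": 1, "sku": "ABCDEFGH-001"},
    {"id": 2, "sku": "ABCDEFGH-002"},
    {"id": 3, "sku": "ZZZ"},
]
for prefix, members in group_by_prefix(docs, "sku").items():
    print(prefix, [d["id"] for d in members])
```

Doing the truncation at index time, as suggested in the reply, keeps the grouping work inside Solr rather than in the client.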

Grouping by field substring?

2013-09-11 Thread Ken Krugler
Hi all, Assuming I want to use the first N characters of a specific field for grouping results, is such a thing possible out-of-the-box? If not, then what would the next best option be? E.g. a custom function query? Thanks, -- Ken -- Ken Krugler +1 530-210-6378 http://

Re: Do I need to delete my index?

2013-09-11 Thread Brian Robinson
Thanks Erick On 9/11/2013 6:46 PM, Erick Erickson wrote: Typically I'll just delete the entire data dir recursively after shutting down Solr, the default location is /solr/collectionblah/data On Wed, Sep 11, 2013 at 6:01 PM, Brian Robinson wrote: Thanks Shawn. I had actually tried changing &

Re: Do I need to delete my index?

2013-09-11 Thread Erick Erickson
Typically I'll just delete the entire data dir recursively after shutting down Solr, the default location is /solr/collectionblah/data On Wed, Sep 11, 2013 at 6:01 PM, Brian Robinson wrote: > Thanks Shawn. I had actually tried changing &load= to &amp;load=, but > still got the error. It sounds like

Re: Distributing lucene segments across multiple disks.

2013-09-11 Thread Shawn Heisey
On 9/11/2013 4:16 PM, Deepak Konidena wrote: As far as RAM usage goes, I believe we set the heap size to about 40% of the RAM and less than 10% is available for OS caching ( since replica takes another 40%). Why does unallocated RAM help? How does it impact performance under load? Because once

Re: Do I need to delete my index?

2013-09-11 Thread Brian Robinson
Thanks Shawn. I had actually tried changing &load= to &amp;load=, but still got the error. It sounds like addDocuments is worth a try, though. On 9/11/2013 4:37 PM, Shawn Heisey wrote: On 9/11/2013 2:17 PM, Brian Robinson wrote: I'm in the process of creating my index using a series of SolrClient:

Re: Distributing lucene segments across multiple disks.

2013-09-11 Thread Deepak Konidena
@Greg - Thanks for the suggestion. Will pass it along to my folks. @Shawn - That's the link I was looking for 'non-SolrCloud approach to distributed search'. Thanks for passing that along. Will give it a try. As far as RAM usage goes, I believe we set the heap size to about 40% of the RAM and les

Re: Distributing lucene segments across multiple disks.

2013-09-11 Thread Shawn Heisey
On 9/11/2013 2:57 PM, Deepak Konidena wrote: I guess at this point in the discussion, I should probably give some more background on why I am doing what I am doing. Having a single Solr shard (multiple segments) on the same disk is posing severe performance problems under load,in that, calls to S

ReplicationFactor for solrcloud

2013-09-11 Thread Aditya Sakhuja
Hi - I am trying to set up 3 shards and 3 replicas for my solrcloud deployment with 3 servers, specifying replicationFactor=3 and numShards=3 when starting the first node. I see each of the servers allocated to 1 shard each. However, I do not see 3 replicas allocated on each node. I specificall

Re: Do I need to delete my index?

2013-09-11 Thread Shawn Heisey
On 9/11/2013 3:17 PM, Brian Robinson wrote: In addition, if I do need to delete my index, how do I go about that? I've been looking through the documentation and can't find anything specific. I know where the index is, I'm just not sure which files to delete. Generally you'll find it in a path

RE: Distributing lucene segments across multiple disks.

2013-09-11 Thread Greg Walters
Deepak, It might be a bit outside what you're willing to consider but you can make a raid out of your spinning disks then use your SSD(s) as a dm-cache device to accelerate reads and writes to the raid device. If you're putting lucene indexes on a mixed bag of disks and ssd's without any type o

Re: Do I need to delete my index?

2013-09-11 Thread Shawn Heisey
On 9/11/2013 2:17 PM, Brian Robinson wrote: I'm in the process of creating my index using a series of SolrClient::request commands in PHP. I ran into a problem when some of the fields that I had as "text_general" fieldType contained "&load=" in a URL, triggering an error because the HTML entity "

Re: Do I need to delete my index?

2013-09-11 Thread Brian Robinson
In addition, if I do need to delete my index, how do I go about that? I've been looking through the documentation and can't find anything specific. I know where the index is, I'm just not sure which files to delete. Hello, I'm in the process of creating my index using a series of SolrClient:

Re: Distributing lucene segments across multiple disks.

2013-09-11 Thread Deepak Konidena
I guess at this point in the discussion, I should probably give some more background on why I am doing what I am doing. Having a single Solr shard (multiple segments) on the same disk is posing severe performance problems under load,in that, calls to Solr cause a lot of connection timeouts. When we

Re: Error while importing HBase data to Solr using the DataImportHandler

2013-09-11 Thread ppatel
Hi, Can you provide me an example of data-config.xml? because with my Hbase configuration, I am getting Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: org.apache.hadoop.net.NetUt

Re: Distributing lucene segments across multiple disks.

2013-09-11 Thread Shawn Heisey
On 9/11/2013 1:07 PM, Deepak Konidena wrote: Are you suggesting a multi-core setup, where all the cores share the same schema, and the cores lie on different disks? Basically, I'd like to know if I can distribute shards/segments on a single machine (with multiple disks) without the use of zookee

Do I need to delete my index?

2013-09-11 Thread Brian Robinson
Hello, I'm in the process of creating my index using a series of SolrClient::request commands in PHP. I ran into a problem when some of the fields that I had as "text_general" fieldType contained "&load=" in a URL, triggering an error because the HTML entity "load" wasn't recognized. I realize

RE: Distributing lucene segments across multiple disks.

2013-09-11 Thread Greg Walters
Deepak, Sorry for not being more verbose in my previous suggestion. As I take your question, you'd like to spread your index files across multiple disks (for performance or space reasons I assume). If you used even a basic md-raid setup you could then format the raid device and thus your entire

RE: Distributing lucene segments across multiple disks.

2013-09-11 Thread Greg Walters
Why not use some form of RAID for your index store? You'd get the performance benefit of multiple disks without the complexity of managing them via solr. Thanks, Greg -Original Message- From: Deepak Konidena [mailto:deepakk...@gmail.com] Sent: Wednesday, September 11, 2013 2:07 PM To:

Re: Distributing lucene segments across multiple disks.

2013-09-11 Thread Deepak Konidena
Are you suggesting a multi-core setup, where all the cores share the same schema, and the cores lie on different disks? Basically, I'd like to know if I can distribute shards/segments on a single machine (with multiple disks) without the use of zookeeper. -Deepak On Wed, Sep 11, 2013 at 11

Re: Distributing lucene segments across multiple disks.

2013-09-11 Thread Deepak Konidena
@Greg - Are you suggesting RAID as a replacement for Solr or making Solr work with RAID? Could you elaborate more on the latter, if that's what you meant? We make use of solr's advanced text processing features which would be hard to replicate just using RAID. -Deepak On Wed, Sep 11, 2013 at 12:11

Re: Distributing lucene segments across multiple disks.

2013-09-11 Thread Upayavira
I think you'll find it hard to distribute different segments between disks, as they are typically stored in the same directory. However, instantiating separate cores on different disks should be straight-forward enough, and would give you a performance benefit. I've certainly heard of that done a

Distributing lucene segments across multiple disks.

2013-09-11 Thread Deepak Konidena
Hi, I know that SolrCloud allows you to have multiple shards on different machines (or a single machine). But it requires a zookeeper installation for doing things like leader election, leader availability, etc While SolrCloud may be the ideal solution for my usecase eventually, I'd like to know

Re: Higher Memory Usage with solr 4.4

2013-09-11 Thread Shawn Heisey
On 9/11/2013 8:54 AM, Kuchekar wrote: We are using solr 4.4 on Linux with OpenJDK 64-Bit. We started the Solr with 40GB but we noticed that the QTime is way high compared to similar on 3.5 solr. Both the 3.5 and 4.4 solr's configurations and schema are similarly constructed. Also during the

Re: synonyms not working

2013-09-11 Thread cheops
thanx for your help. could solve the problem meanwhile! i used ...which is wrong, it must be -- View this message in context: http://lucene.472066.n3.nabble.com/synonyms-not-working-tp4089318p4089345.html Sent from the Solr - User mailing list archive at Nabble.com.

synonyms not working

2013-09-11 Thread cheops
Hi, I'm using solr4.4 and try to use different synonyms based on different fieldtypes: ...I have the same fieldtype for english (name="text_general_en" and synonyms="synonyms_en.txt"). The first fiel

Re: synonyms not working

2013-09-11 Thread Erick Erickson
Attach &debug=query to your URL and inspect the parsed query, you should be seeing the substitutions if you're configured correctly. Multi-word synonyms at query time have the "getting through the query parser" problem. Best Erick On Wed, Sep 11, 2013 at 11:04 AM, cheops wrote: > Hi, > I'm usi
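A minimal sketch of attaching the parameter programmatically, using only the Python standard library. The host, port, and collection name in the example URL are placeholders, and the helper name `with_debug_query` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug_query(url):
    """Return `url` with debug=query set exactly once in its query string."""
    parts = urlsplit(url)
    # Drop any existing debug parameter so it is not duplicated.
    params = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != "debug"
    ]
    params.append(("debug", "query"))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Placeholder URL; substitute your own host and collection.
print(with_debug_query("http://localhost:8983/solr/collection1/select?q=text:foo"))
```

The parsed query in the debug section of the response is where any synonym substitutions would show up.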

Re: Higher Memory Usage with solr 4.4

2013-09-11 Thread Erick Erickson
There are some defaults (sorry, don't have them listed) that are somewhat different. If you took your 3.5 and just used it for 4.x, it's probably worth going back over it and start with the 4.x example and add in any customizations you did for 3.5... But in general, the memory usage for 4.x should

Re: Error with Solr 4.4.0, Glassfish, and CentOS 6.2

2013-09-11 Thread Shawn Heisey
On 9/10/2013 9:18 PM, vhoangvu wrote: Yesterday, I just install latest version of Solr 4.4.0 on Glassfish and CentOS 6.2 and got an error when try to access the administration page. I have checked this version on Mac OS one month ago, it works well. So, please help me clarify what problem. [

Higher Memory Usage with solr 4.4

2013-09-11 Thread Kuchekar
Hi, We are using solr 4.4 on Linux with OpenJDK 64-Bit. We started the Solr with 40GB but we noticed that the QTime is way high compared to similar on 3.5 solr. Both the 3.5 and 4.4 solr's configurations and schema are similarly constructed. Also during the triage we found the physical memory

Re: Dynamic analizer settings change

2013-09-11 Thread Erick Erickson
You're still in danger of overly-broad hits. When you try stemming differently into the _same_ underlying field you get things that make sense in one language but are totally bogus in another language matching the query. As far as lots and lots of fields is concerned, if you want to restrict your

Re: Facet values for spacial field

2013-09-11 Thread Erick Erickson
It seems like the right thing to do here is store something more intelligible than an encoded lat/lon pair and facet on that instead. lat/lon, even bare are not all that useful without some effort anywa... FWIW, Erick On Wed, Sep 11, 2013 at 9:24 AM, Köhler Christian wrote: > Hi Eric (and othe

Re: solrj-httpclient-slow

2013-09-11 Thread Erick Erickson
First, I would be wary of mixing the solrj version with a different solr version. They are pretty compatible but what are you expecting to gain for the risk? Regardless, though, that shouldn't be your problem. You'll have to give us a lot more detail about what you're trying to do, what you mean b

Re: Dynamic analizer settings change

2013-09-11 Thread maephisto
Thanks Jack! Indeed, very nice examples in your book. Inspired from there, here's a crazy idea: would it be possible to build a custom processor chain that would detect the language and use it to apply filters, like the aforementioned SnowballPorterFilter. That would leave at the end a document ha

AW: Facet values for spacial field

2013-09-11 Thread Köhler Christian
Hi Eric (and others), thanks for the explanation. This helps. For the use case: I am cataloging findings of field expeditions. The collectors usually store a single location for the field trip, so the number of locations is limited. Regards Chris From: E

Re: No or limited use of FieldCache

2013-09-11 Thread Michael Sokolov
On 09/11/2013 08:40 AM, Per Steffensen wrote: The reason I mention sort is that we in my project, half a year ago, have dealt with the FieldCache->OOM-problem when doing sort-requests. We basically just reject sort-requests unless they hit below X documents - in case they do we just find them w

charset encoding

2013-09-11 Thread Andreas Owen
i'm using solr 4.3.1 with tika to index html-pages. the html files are iso-8859-1 (ansi) encoded and the meta tag "content-encoding" as well. the server-http-header says it's utf8 and firefox-webdeveloper agrees. when i index a page with special chars like ä,ö,ü solr outputs it completely forei

Re: Dynamic analizer settings change

2013-09-11 Thread Jack Krupansky
Yes, supporting multiple languages will be a performance hit, but maybe it won't be so bad since all but one of these language-specific fields will be empty for each document and Lucene text search should handle empty field values just fine. If you can't accept that performance hit, don't suppor

Re: No or limited use of FieldCache

2013-09-11 Thread Per Steffensen
The reason I mention sort is that we in my project, half a year ago, have dealt with the FieldCache->OOM-problem when doing sort-requests. We basically just reject sort-requests unless they hit below X documents - in case they do we just find them without sorting and sort them ourselves afterwa

RE: Dynamic analizer settings change

2013-09-11 Thread Markus Jelsma
-Original message- > From:maephisto > Sent: Wednesday 11th September 2013 14:34 > To: solr-user@lucene.apache.org > Subject: Re: Dynamic analizer settings change > > Thanks, Erik! > > I might have missed mentioning something relevant. When querying Solr, I > wouldn't actually need

Re: Profiling Solr Lucene for query

2013-09-11 Thread Manuel Le Normand
Dmitry - currently we don't have such a front end, this sounds like a good idea creating it. And yes, we do query all 36 shards every query. Mikhail - I do think 1 minute is enough data, as during this exact minute I had a single query running (that took a qtime of 1 minute). I wanted to isolate t

Re: Dynamic analizer settings change

2013-09-11 Thread maephisto
Thanks, Erik! I might have missed mentioning something relevant. When querying Solr, I wouldn't actually need to query all fields, but only the one corresponding to the language picked by the user on the website. If he's using DE, then the search should only apply to the text_de field. What if I

solrj-httpclient-slow

2013-09-11 Thread xiaoqi
Hi everyone, when I track the timing of my Solr client, I find one problem: sometimes the whole execution time is very long. When I look at the details, I find that the Solr server itself executes quickly, so the main cost is inside httpclient (making a connection, sending the request or receiving the response, etc.

Re: Stemming and protwords configuration

2013-09-11 Thread Erick Erickson
Did you try putting them _all_ in protwords.txt? i.e. frais, fraise, fraises? Don't forget to re-index. An alternative is to index in a second field that doesn't have the stemmer and when you want exact matches, search against that field. Best Erick On Mon, Sep 9, 2013 at 10:29 AM, wrote: >

Re: No or limited use of FieldCache

2013-09-11 Thread Erick Erickson
I don't know any more than Michael, but I'd _love_ some reports from the field. There are some restrictions on DocValues though, I believe one of them is that they don't really work on analyzed data. FWIW, Erick On Wed, Sep 11, 2013 at 7:00 AM, Michael Sokolov < msoko...@safaribooksonline.com

Re: Dynamic analizer settings change

2013-09-11 Thread Erick Erickson
I wouldn't :). Here's the problem. Say you do this successfully at index time. How do you then search reasonably? There's often not near enough information to know what the search language is, there's little or no context. If the number of languages is limited, people often index into separate lan

Re: Regarding improving performance of the solr

2013-09-11 Thread Erick Erickson
Be a little careful when extrapolating from disk to memory. Any fields where you've set stored="true" will put data in segment files with extensions .fdt and .fdx, see These are the compressed verbatim copy of the data for stored fields and have very little impact on memory required for searching.

Re: SolrCloud 4.x hangs under high update volume

2013-09-11 Thread Erick Erickson
If you use CloudSolrServer, you need to apply SOLR-4816 or use a recent copy of the 4x branch. By "recent", I mean like today, it looks like Mark applied this early this morning. But several reports indicate that this will solve your problem. I would expect that increasing the number of shards wou

Re: Solr doesnt return answer when searching numbers

2013-09-11 Thread Erick Erickson
Mail guy. You've been around long enough to know to try adding &debug=query to your URL and looking at the results, what does that show? Best Erick On Tue, Sep 10, 2013 at 9:25 AM, Mysurf Mail wrote: > I am querying using > > http://...:8983/solr/vault/select?q="design test"&fl=PackageName >

Re: No or limited use of FieldCache

2013-09-11 Thread Michael Sokolov
On 9/11/13 3:11 AM, Per Steffensen wrote: Hi We have a SolrCloud setup handling huge amounts of data. When we do group, facet or sort searches Solr will use its FieldCache, and add data in it for every single document we have. For us it is not realistic that this will ever fit in memory and w

Dynamic analizer settings change

2013-09-11 Thread maephisto
Let's take the following type definition and schema (borrowed from Rafal Kuc's Solr 4 cookbook) : and schema: The above analyzer will apply the SnowballPorterFilter English language filter. But would it be possible to change the language to French during indexing for some documents? is thi

Re: charfilter doesn't do anything

2013-09-11 Thread Andreas Owen
perfect, i tried it before but always at the tail of the expression with no effect. thanks a lot. a last question, do you know how to keep the html comments from being filtered before the transformer has done its work? On 10. Sep 2013, at 3:17 PM, Jack Krupansky wrote: > Okay, I can repro the

Re: How to facet data from a multivalued field?

2013-09-11 Thread Raheel Hasan
oh got it.. Thanks a lot... On Tue, Sep 10, 2013 at 10:10 PM, Erick Erickson wrote: > You can't facet on fields where indexed="false". When you look at > output docs, you're seeing _stored_ not indexed data. Set > indexed="true" and re-index... > > Best, > Erick > > > On Tue, Sep 10, 2013 at 5:5

No or limited use of FieldCache

2013-09-11 Thread Per Steffensen
Hi We have a SolrCloud setup handling huge amounts of data. When we do group, facet or sort searches Solr will use its FieldCache, and add data in it for every single document we have. For us it is not realistic that this will ever fit in memory and we get OOM exceptions. Are there some way o

Re: Some highlighted snippets aren't being returned

2013-09-11 Thread Eric O'Hanlon
Thank you, Aloke and Bryan! I'll give this a try and I'll report back on what happens! - Eric On Sep 9, 2013, at 2:32 AM, Aloke Ghoshal wrote: > Hi Eric, > > As Bryan suggests, you should look at appropriately setting up the > fragSize & maxAnalyzedChars for long documents. > > One issue I