Re: Backup strategy for SolrCloud

2012-09-20 Thread Tommaso Teofili
I also think that's a good question and currently without a "use this" answer :-) I think it shouldn't be hard to write a Solr service querying ZK and replicate both conf and indexes (via SnapPuller or ZK itself) so that such a node is responsible to back up the whole cluster in a secure storage (N

Re: DIH import from MySQL results in garbage text for special chars

2012-09-20 Thread Pranav Prakash
I am seeing the garbage text in the browser, in Luke Index Toolbox, and everywhere else; it is the same. My servlet container is Jetty, the out-of-the-box one. Many other special chars are getting indexed and stored properly; only a few characters cause trouble. *Pranav Prakash* "temet nosce" On Fri, Sep 14

Re: Solr Swap Function doesn't work when using Solr Cloud Beta

2012-09-20 Thread sam fang
Hi Hoss, Thanks for your quick reply. Below is my solr.xml configuration; persistent is already set to true. The content for test1 and test1-ondeck was just copied from example/solr/collection1. I then published 1 record to test1 and queried it; that works. INFO: [test1] webapp=/solr path=/sel

Re: MMapDirectory

2012-09-20 Thread Lance Norskog
The Solr caches are thrown away on each hard commit. The document cache could be conserved across commits. Documents in segments that still exist would be saved. Documents in segments that are removed would be thrown away. Perhaps the document cache should be pushed down into Lucene, to handle t

Re: Solr Swap Function doesn't work when using Solr Cloud Beta

2012-09-20 Thread Chris Hostetter
: In Solr 3.6, core swap function works good. After switch to use Solr 4.0 : Beta, and found it doesn't work well. can you elaborate on what exactly you mean by "doesn't work well" ? .. what does your solr.xml file look like? what command did you run to do the swap? what results did you get from

Re: Backup strategy for SolrCloud

2012-09-20 Thread Upayavira
What sorts of failures are you thinking of? Power loss? Index corruption? Server overload? Could you keep somewhat remote replicas of each shard, but not behind your load balancer? Then, should all your customer facing nodes go down, those replicas would be elected leaders. When you bring the cus

Re: Solr4 how to make it do this?

2012-09-20 Thread george123
I have been thinking about this some more. So my scenario of search is as follows. A visitor types in 3 bed 2 bath condo new york Now my schema has bed, bath, property type, city. The data going in is denormalised csv files, so column headings are the fields. The search consists of a near exac

RE: Backup strategy for SolrCloud

2012-09-20 Thread Markus Jelsma
If reindexing from raw XML files is feasible (less than 30 minutes) it would be the easiest option. The problem with recovering with old snapshots is that you have to remove bad indices from all cores and possibly stale (or recoveries in progress) indices and replace them with your snapshot and mo

RE: Backup strategy for SolrCloud

2012-09-20 Thread jimtronic
I'm thinking about catastrophic failure and recovery. If, for some reason, the cluster should go down or become unusable and I simply want to bring it back up as quickly as possible, what's the best way to accomplish that? Maybe I'm thinking about this incorrectly? Is this not a concern? --

RE: some general solr 4.0 questions

2012-09-20 Thread Petersen, Robert
That is a great idea to run the updates thru the LB also! I like it! Thanks for the replies guys -Original Message- From: jimtronic [mailto:jimtro...@gmail.com] Sent: Thursday, September 20, 2012 1:46 PM To: solr-user@lucene.apache.org Subject: Re: some general solr 4.0 questions I've

Re: Backup strategy for SolrCloud

2012-09-20 Thread Walter Underwood
He explained why in the message. Because it is faster to bring up a new host from a snapshot. I presume that he doesn't need the full cluster running all the time. wunder On Sep 20, 2012, at 2:19 PM, Markus Jelsma wrote: > Hi, > > Why do you want to back up? With enough machines and a decent

RE: Backup strategy for SolrCloud

2012-09-20 Thread Markus Jelsma
Hi, Why do you want to back up? With enough machines and a decent replication factor (3 or higher) there is usually little need to back it up. If you have the space it's better to launch a second cluster in another DC. You can also choose to increase the number of maxCommitsToKeep but it'll tak

Re: deleting a single value from multivalued field

2012-09-20 Thread jimtronic
Just added this today. https://issues.apache.org/jira/browse/SOLR-3862

Backup strategy for SolrCloud

2012-09-20 Thread jimtronic
I'm trying to determine my options for backing up data from a SolrCloud cluster. For me, bringing up my cluster from scratch can take several hours. It's way faster to take snapshots of the index periodically and then use one of these when booting a new instance. Since I use static xml files and d
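For readers following the snapshot idea in this thread: Solr's ReplicationHandler accepts a `command=backup` request that writes a snapshot of a core's index. A minimal sketch of building such a request URL is below. The host, core name, and backup directory are hypothetical, and it assumes the replication handler is enabled in solrconfig.xml; it only builds the URL and does not send it.

```python
from urllib.parse import urlencode


def backup_url(core_url: str, location: str) -> str:
    """Build a ReplicationHandler backup request URL for one core.

    core_url and location are placeholders chosen for illustration;
    the handler must be configured on the target core for this to work.
    """
    params = {"command": "backup", "location": location}
    return f"{core_url}/replication?{urlencode(params)}"


# Hypothetical host, core, and target directory.
print(backup_url("http://localhost:8983/solr/collection1", "/backups"))
```

In a SolrCloud setup one such request per shard leader would presumably be needed; the thread does not settle on an approach, so treat this as one option rather than a recommendation.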

Re: Problems with SolrEnitityProcessor + frange filterQuery

2012-09-20 Thread Jack Krupansky
Sorry, but it looks like the SolrEntityProcessor does a raw split on commas of its "fq" parameter, with no provision for escaping. You should be able to combine the fq into the query parameter as a nested query which does not have the split issue. -- Jack Krupansky -Original Message-
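To make the failure mode concrete, here is a small sketch of what a raw comma split does to an frange filter. It only imitates the behaviour described in the message; it is not the SolrEntityProcessor source, and the field names `price` and `tax` are made up.

```python
from typing import List


def naive_comma_split(fq_param: str) -> List[str]:
    """Split on every comma, with no escaping, as the message describes."""
    return [part.strip() for part in fq_param.split(",")]


# Hypothetical filter: an frange over a two-argument function.
fq = "{!frange l=0 u=10}sum(price,tax)"

for fragment in naive_comma_split(fq):
    print(fragment)
# The function call is cut in two, so neither fragment is a valid filter.

# The suggested workaround moves the filter into the main query as a
# nested query, which is never comma-split (sketch; main clause assumed).
combined_query = '*:* AND _query_:"' + fq + '"'
print(combined_query)
```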

Re: some general solr 4.0 questions

2012-09-20 Thread jimtronic
I've got a setup like yours -- lots of cores and replicas, but no need for shards -- and here's what I've found so far: 1. Zookeeper is tiny. I would think network I/O is going to be the biggest concern. 2. I think this is more about high availability than performance. I've been experimenting wit

Re: some general solr 4.0 questions

2012-09-20 Thread Otis Gospodnetic
I'll answer the other easy ones ;) #1 yes, no need for a ton of RAM and tons of cores. #2 it's not the overhead, it's that zookeeper is sensitive to not hearing from nodes and marking them dead, at least in the Hadoop and HBase world. #3 yes, the external LB would simply spread the query load ov

Re: Problems with SolrEnitityProcessor + frange filterQuery

2012-09-20 Thread Dirceu Vieira
Hi guys, Has anybody got any idea about that? I'm really open for any suggestions Thanks! Dirceu On Thu, Sep 20, 2012 at 11:58 AM, Dirceu Vieira wrote: > Hi, > > I'm attempting to write a filter query for my SolrEntityProcessor using > {frange} over a function. > It works fine when I'm te

Re: Regarding Search

2012-09-20 Thread Åsmund Tokheim
Hi You gave us quite little information to go on, but I can list some probable reasons why your search doesn't match any documents. In schema.xml check that: - you have specified fields for custid, familyname, usrname - that those fields have the attribute indexed="true" - that they are not of

Re: some general solr 4.0 questions

2012-09-20 Thread Erik Hatcher
I'll answer the easy one: #4 - yes! In fact, it would seem wise in many of these straightforward cases like yours to leave standard master/slave as-is for the time being even when upgrading to Solr 4. No need to make life more complicated. Now, if you did want to have NRT where updates are

some general solr 4.0 questions

2012-09-20 Thread Petersen, Robert
Hello solr user group, I am evaluating the new Solr 4.0 beta with an eye to how to fit it into our current solr setup. Our current setup is running on solr 3.6.1 and uses 12 slaves behind a load balancer and a master which we index into, and they all have three cores (now referred to as collec

Re: Best way to index Solr XML from w/in the same servlet container

2012-09-20 Thread Chris Hostetter
: I've created a custom process in Solr that has a Zookeeper Watcher : configured to pull Solr XML files from a znode. When I receive a file I can : send the file to /update and get it indexed, but that seems inefficient. I : could use SolrJ, but I believe that is still sending an HTTP request to

Re: MMapDirectory

2012-09-20 Thread Mikhail Khludnev
My limited understanding, confirmed by profiler though, is that doing mmap IO costs you a copy of bytes from mmapped virtual memory into heap VM. Just look into java.nio.DirectByteBuffer.get(byte[], int, int) . It has happened to me several times - we saw a hotspot in the profiler on mmapped IO (yep, just in co
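The copy described here can be illustrated outside the JVM. The sketch below is a Python analogy, not Lucene code: reading a range out of a memory-mapped file still produces a fresh in-process copy of those bytes, which is roughly what `DirectByteBuffer.get(byte[], int, int)` does on the Java side. The file contents and offsets are invented for the demo.

```python
import mmap
import os
import tempfile


def read_via_mmap(path: str, offset: int, length: int) -> bytes:
    """Map a file read-only and copy a byte range out of the mapping."""
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing a mapping allocates a new bytes object: the data is
            # copied from the mapped pages into ordinary process memory.
            return mapped[offset:offset + length]


def demo() -> bytes:
    """Write a small temp file, read part of it back through mmap, clean up."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"stored-field-bytes")
        path = tmp.name
    try:
        return read_via_mmap(path, 7, 5)
    finally:
        os.remove(path)


print(demo())
```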

Re: Split XML configuration

2012-09-20 Thread Michael Della Bitta
Ah, I just upgraded us to 3.6, and abandoned xi:include in favor of symlinks, so I didn't know whether it was fixed or not. Another thing I just thought of is if you want your config files to be available from the web UI, the xi:include directives won't be resolved, so you'll just see the literal

MMapDirectory

2012-09-20 Thread Erick Erickson
So I just had a curiosity question pop up and wanted to check it out. Solr has the documentCache, designed to hold stored fields while various parts of a requestHandler do their tricks, keeping the stored content from having to be re-fetched from disk. When using MMapDirectory, is this even somethi

Re: Split XML configuration

2012-09-20 Thread Chris Hostetter
: "xi:include" directives work in Solr config files, but in most (all?) : versions of Solr, they require absolute paths, which makes portable : configuration slightly more sticky. Still, a very viable solution. Huh? There were bugs in xinclude parsing up to Solr 1.4 that caused relative paths

4.0.snapshot to 4.0.beta index migration

2012-09-20 Thread vybe3142
Hi We have a bunch of data that was indexed using a 4.0 snapshot build of Solr. We'd like to migrate to the 4.0.beta version. Is there a recommended way to migrate the indices, or is reindexing the best option? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/4-0-snapsh

Re: deleting a single value from multivalued field

2012-09-20 Thread Jack Krupansky
There isn’t a mechanism to update or delete only a subset of a multivalued field. You would have to supply the full list of values you want to have in the multivalued field. You may want to offer it as a suggested improvement. -- Jack Krupansky -Original Message- From: deniz Sent: T

Prevent Log and other math functions from returning "Infinity" and erroring out

2012-09-20 Thread Amit Nithian
Is there any reason why the log function shouldn't be modified to always take 1+the number being requested to be log'ed? Reason I ask is I am taking the log of the value output by another function which could return 0. For testing, I modified it to return 1 which works but would rather have the log
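The arithmetic behind the proposal is easy to check. The sketch below assumes a base-10 logarithm and a non-negative input; the helper name is made up. In a Solr function query the same shift could probably be written with existing functions as `log(sum(1,x))`, without changing `log` itself.

```python
import math


def shifted_log10(value: float) -> float:
    """Return log10(1 + value), which is defined (and 0.0) when value is 0."""
    return math.log10(1.0 + value)


print(shifted_log10(0))   # 0.0 instead of an infinite or undefined result
print(shifted_log10(9))
print(shifted_log10(99))
```

The shift changes every result slightly (log10(1 + x) is not log10(x)), which may be one reason not to alter the built-in function's behaviour for everyone.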

RE: Wildcard searches don't work

2012-09-20 Thread Alex Cougarman
Thanks, Erick. That really helped us in learning about tokens and how the Analyzer works. Thank you! Warm regards, Alex -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 19 September 2012 3:56 PM To: solr-user@lucene.apache.org Subject: Re: Wildcard search

Correct way of storing longitude and latitude

2012-09-20 Thread Spadez
Hi, Sorry for all the questions today but I paid a third party coder to develop a schema for me but now that I have more of an understanding myself I have a question. The aim is to do spatial searching so in my schema I have this: My site doesn't seem to submit via JSON to lat_lng_0_coordina

Re: Solr 4.0 - disappointing results sharding on 1 machine

2012-09-20 Thread Yonik Seeley
Depends on where the bottlenecks are I guess. On a single system, increasing shards decreases throughput (this isn't specific to Solr). The increased parallelism *can* decrease latency to the degree that the parts that were parallelized outweigh the overhead. Going from one shard to two shards

Re: Solr 4.0 - disappointing results sharding on 1 machine

2012-09-20 Thread Tom Mortimer
Before anyone asks, these results were obtained warm. On 20 Sep 2012, at 14:39, Tom Mortimer wrote: > Hi all, > > After reading > http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ > , I thought I'd do my own experiments. I used 2M docs from wikipedia, indexed > in

solrcloud and csv import hangs

2012-09-20 Thread dan sutton
Hi, I'm using Solr 4.0-BETA and trying to import a CSV file as follows: curl http://localhost:8080/solr//update -d overwrite=false -d commit=true -d stream.contentType='text/csv;charset=utf-8' -d stream.url=file:///dir/file.csv I have 2 tomcat servers running on different machines and a separate

Re: "&" char in querystring

2012-09-20 Thread Jack Krupansky
Ah... you are probably not "encoding" the & and % in your URL, so they are being eaten when the URL is parsed. Use % followed by the 2-digit hex ASCII character code. & should be %26 and % should be %25. -- Jack Krupansky -Original Message- From: Gustav Sent: Thursday, September 20,
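A quick way to see the encoding described above is to let a URL library do it. This sketch uses Python's standard `urllib.parse`; the helper name is made up, and the query values are the ones mentioned in this thread.

```python
from urllib.parse import quote, urlencode


def encode_query_value(raw: str) -> str:
    """Percent-encode a single query-string value, reserved characters included."""
    return quote(raw, safe="")


print(encode_query_value("johnson & johnson"))  # & becomes %26
print(encode_query_value('"0,5%"'))             # % becomes %25

# urlencode does the same job for a whole parameter set
# (it writes spaces as + rather than %20).
print(urlencode({"q": '"johnson & johnson"', "debugQuery": "true"}))
```

Encoding only gets the character to Solr intact; whether the "&" survives analysis still depends on the field type, which is a separate question in this thread.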

Re: indexing issue

2012-09-20 Thread Jack Krupansky
You probably are using a "text" field which is tokenizing the input when this data should probably be a "string" (or "text" with the KeywordAnalyzer.) -- Jack Krupansky -Original Message- From: zainu Sent: Thursday, September 20, 2012 5:49 AM To: solr-user@lucene.apache.org Subject:

Re: "&" char in querystring

2012-09-20 Thread Gustav
Hello Jack, My fieldtype is configured as follows: What other filter could I use to preserve the "&" char? Another problem that came up is that when I search for ?q="0,5%" it gives an error: HTTP Status 400 - missing query string Probably

RE: Solr Write workload

2012-09-20 Thread John, Phil (CSS)
But even with XA log, am I correct in thinking that the writes themselves will be mostly sequential? Regards, Phil. From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Thu 20/09/2012 14:09 To: solr-user@lucene.apache.org Subject: Re: Solr Write w

Re: Compond File Format Advice needed - On migration to 3.6.1

2012-09-20 Thread Jack Krupansky
Seriously, if you are having trouble finding the build file, I would suggest that you do a lot more homework reading and studying the available Solr and Lucene materials online before asking for further assistance. Start with: http://lucene.apache.org/solr/ http://lucene.apache.org/solr/version

Re: "&" char in querystring

2012-09-20 Thread Jack Krupansky
Use a field type whose analyzer preserves the &. What field type are you using? -- Jack Krupansky -Original Message- From: Gustav Sent: Thursday, September 20, 2012 9:05 AM To: solr-user@lucene.apache.org Subject: "&" char in querystring Good Morning Everyone! Again, i need your help

Solr 4.0 - disappointing results sharding on 1 machine

2012-09-20 Thread Tom Mortimer
Hi all, After reading http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ , I thought I'd do my own experiments. I used 2M docs from wikipedia, indexed in Solr 4.0 Beta on a standard EC2 large instance. I compared an unsharded and 2-shard configuration (the latter set

Re: ID reference field - Needed but not searchable or retrievable

2012-09-20 Thread Tom Mortimer
Hi James, If you don't want this field to be included in user searches, just omit it from the search configuration (e.g. if using eDisMax parser, don't put it in the qf list). To keep it out of search results, exclude it from the fl list. See http://wiki.apache.org/solr/CommonQueryParam
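As a rough illustration of this advice, the sketch below builds a select URL where the reference field is simply left out of both `qf` and `fl`. The base URL and all field names (`title`, `description`, `price`, and the omitted `db_id`) are hypothetical; `defType`, `qf`, and `fl` are the standard request parameters mentioned above.

```python
from typing import List
from urllib.parse import urlencode


def build_select_url(base_url: str, user_query: str,
                     search_fields: List[str], return_fields: List[str]) -> str:
    """Build an eDisMax select URL that searches and returns only the given fields."""
    params = {
        "defType": "edismax",
        "q": user_query,
        "qf": " ".join(search_fields),   # fields searched; db_id is not listed
        "fl": ",".join(return_fields),   # fields returned; db_id is not listed
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core URL and field names; db_id stays indexed for
# update/delete targeting but appears in neither list.
print(build_select_url("http://localhost:8983/solr", "condo",
                       ["title", "description"], ["title", "price"]))
```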

Re: ramBufferSizeMB

2012-09-20 Thread Otis Gospodnetic
Hi, And there is a wonderful report in SPM for Solr that shows how your index changes over time in terms of size, index files, segments, indexed docs, deleted docs... very useful for understanding what's going on at that level. Otis -- Performance Monitoring - http://sematext.com/spm On Sep 20, 2

Re: Solr Write workload

2012-09-20 Thread Otis Gospodnetic
Hi, Right, documents are buffered in jvm heap according to ramBufferSizeMB setting before getting indexed. But xa log doesn't do that I don't think. Otis -- Performance Monitoring - http://sematext.com/spm On Sep 20, 2012 8:11 AM, "John, Phil (CSS)" wrote: > Hi, > > We're in the process of final

"&" char in querystring

2012-09-20 Thread Gustav
Good Morning Everyone! Again, I need your help, Lucene community! I have a query string just like this: q="johnson & johnson" and when I use debugQuery=true I realize that the Solr parser breaks the string exactly at the "&" char, changing my query to q="Johnson". I would like to know, is there any wa

Re: Split XML configuration

2012-09-20 Thread Michael Della Bitta
Hi, Simone: "xi:include" directives work in Solr config files, but in most (all?) versions of Solr, they require absolute paths, which makes portable configuration slightly more sticky. Still, a very viable solution. Michael Della Bitta Appinions

Re: ID reference field - Needed but not searchable or retrievable

2012-09-20 Thread Michael Della Bitta
Hi, James, If you don't store or index this value, it won't exist in Solr. If you want to be able to find these records by the unique id, you need to index it. If you want to find the corresponding DB record from a Solr document you brought up by other means, you'll need to store the unique id.

ID reference field - Needed but not searchable or retrievable

2012-09-20 Thread Spadez
Hi. My SQL database assigns a uniqueID to each item. I want to keep this uniqueID associated with the items that are in Solr even though I won't ever need to display them or have them searchable. I do however want to be able to target specific items in Solr with it, for updating or deleting the recor

Regarding Search

2012-09-20 Thread darshan
HI Fellows, I had added the following fields in my data-config.xml to implement Data Import Handler When I perform steps of Full import Example at http://wiki.apache.org/solr/DataImportHandler I can successfully index on my databas

Re: SOLR memory usage jump in JVM

2012-09-20 Thread Erick Erickson
Yeah, I sent a note to the web folks there about the images. I'll leave the rest to people who really _understand_ all that stuff On Thu, Sep 20, 2012 at 8:31 AM, Bernd Fehling wrote: > Hi Erik, > > thanks for the link. > Now if we could see the images in that article that would be great

Re: SOLR memory usage jump in JVM

2012-09-20 Thread Bernd Fehling
Hi Erik, thanks for the link. Now if we could see the images in that article that would be great :-) By the way, one cause for the memory jumps was located as "killer search" from a user. The interesting part is that the verbose gc.log showed a "hiccup" in the GC. Which means that during a GC r

Solr Write workload

2012-09-20 Thread John, Phil (CSS)
Hi, We're in the process of finalising the specification for our Solr cluster and just wanted to double check something: What is the major IO/write workload type in Solr? From what I understand, the main workload appears to be largely sequential appends to segments, rather than heavily bi

Re: indexing issue

2012-09-20 Thread Erick Erickson
Not enough info to go on here, what is your fieldType? But the first place to look is admin/analysis to see how the text is tokenized. Best Erick On Thu, Sep 20, 2012 at 5:49 AM, zainu wrote: > Dear fellows, > I have a field in solr with value '8E0061123-8E1'. Now when i seach '8E*', > it does

indexing issue

2012-09-20 Thread zainu
Dear fellows, I have a field in solr with value '8E0061123-8E1'. Now when I search '8E*', it does return all values starting with '8E', which is totally right, but it returns nothing when I search '8E0*'. I guess it is not indexing 8E0 or so. I want to search with all combinations like '8E', '8E0',

Re: SOLR memory usage jump in JVM

2012-09-20 Thread Erick Erickson
Here's a wonderful writeup about GC and memory in Solr/Lucene: http://searchhub.org/dev/2011/03/27/garbage-collection-bootcamp-1-0/ Best Erick On Thu, Sep 20, 2012 at 5:49 AM, Robert Muir wrote: > On Thu, Sep 20, 2012 at 3:09 AM, Bernd Fehling > wrote: > >> By the way while looking for upgradi

Re: ramBufferSizeMB

2012-09-20 Thread Erick Erickson
> Is it correct that a segment file is ready for merging after a commit has > been done (e.g. using the autoCommit property), so I will see merges of 100 > and up documents (and the index writer continues writing into a new segment > file)? Yes, merging won't happen until after a segment is closed

Re: how to set Automatic dataimport in solr 4.0

2012-09-20 Thread Erick Erickson
Well, from the bullet points on the Wiki page: Planned to be included in Solr_4.1 The JIRA referenced points to a Jar that Marko kindly provides, you can try that. Best Erick On Wed, Sep 19, 2012 at 10:22 PM, rayvicky wrote: > dataimport.properties > #Thu Sep 20 10:11:09 CST 2012 > interval=1

Split XML configuration

2012-09-20 Thread Finotti Simone
Hi, is it possible to split schema.xml and solrconfig.xml configurations? My configurations are getting quite large and I'd like to be able to partition them logically in multiple files. thank you in advance, S

Problems with SolrEnitityProcessor + frange filterQuery

2012-09-20 Thread Dirceu Vieira
Hi, I'm attempting to write a filter query for my SolrEntityProcessor using {frange} over a function. It works fine when I'm testing it on the admin, but once I move it into my data-config.xml the query blows up because of the commas in the function. The problem is that fq parameter can be a comma

Re: SOLR memory usage jump in JVM

2012-09-20 Thread Robert Muir
On Thu, Sep 20, 2012 at 3:09 AM, Bernd Fehling wrote: > By the way while looking for upgrading to JDK7, the release notes say under > section > "known issues" about the "PorterStemmer" bug: > "...The recommended workaround is to specify -XX:-UseLoopPredicate on the > command line." > Is this st

Dynamically field selection for Solr Suggestion (Spellcheck) multiple term query

2012-09-20 Thread zbindigonzales
Hello everybody. I already posted this question on stackoverflow but didn't get an answer. I am using the solr suggestion component with the following configuration: schema.xml solrconfig.xml suggest org.apache.solr.spelling.suggest.Suggester

Re: ramBufferSizeMB

2012-09-20 Thread Trym R. Møller
Hi Thanks a lot for your answer, Erick! I changed the value of the autoSoftCommit property and it had the expected effect. It can be noted that this is per Core, so I get four getReader calls when my Solr contains four cores per autoSoftCommit interval. Is it correct that a segment file is

Re: what happends with slave during repliacation?

2012-09-20 Thread Bernd Fehling
Hi Alex, during replication the slave is still available and serving requests but as you can imagine the responses will be slower because of disk usage, even with 15k rpm disks. We have one master and two slaves. Master only for indexing, slaves for searching. Only one slave is online the other i

RE: what happends with slave during repliacation?

2012-09-20 Thread Harshvardhan Ojha
Hi Alex, During replication also your slave will be available for searches and opens a new searcher just after replication. You won't get any downtime, but you might not have warmed cache at the moment. Please look into cache configuration for solr. Regards Harshvardhan OJha -Original Mes

RE: Nodes cannot recover and become unavailable

2012-09-20 Thread Markus Jelsma
Hi - at first i didn't recreate the Zookeeper data but i got it to work. I'll check the removal of the LOG line. thanks -Original message- > From:Sami Siren > Sent: Wed 19-Sep-2012 17:45 > To: solr-user@lucene.apache.org > Subject: Re: Nodes cannot recover and become unavailable > > a

what happends with slave during repliacation?

2012-09-20 Thread Alex
Hi All! I want to replicate my Solr server. At the beginning I want to have one master and one slave. The master would serve for indexing and the slave (slaves in the future) would be used for searching. I was wondering if anybody could tell me what happens with the slave during replication. Is it unavaila

Re: SOLR memory usage jump in JVM

2012-09-20 Thread Bernd Fehling
That is the problem with a JVM: it is a virtual machine. Ask 10 experts about good JVM settings and you get 15 answers. Maybe that is the tradeoff for the flexibility of JVMs. There is always a right setting for any application running on a JVM but you just have to find it. How about a Solr Wiki page abo

Solr 3.6 observe connections in CLOSE_WAIT state

2012-09-20 Thread Alok Bhandari
Hello, I am using Solr 3.6.0. I have observed many connections in CLOSE_WAIT state after using the Solr server for some time. On further analysis and googling I found that I need to close the idle connections from the client which connects to Solr to query data, and it does reduce the number of CLO