Re: Difference Between Indexing and Reindexing

2013-04-03 Thread Furkan KAMACI
Hi Otis, then what is the difference between add and update? And how do we update or add documents in Solr (I see that there is just one update handler)? 2013/4/4 Otis Gospodnetic > I don't recall what Nutch does, so it's hard to tell. > > In Solr (Lucene, really), you can: > * add documents > *

Re: maxWarmingSearchers in Solr 4.

2013-04-03 Thread Dotan Cohen
On Wed, Apr 3, 2013 at 7:55 PM, Shawn Heisey wrote: > In situations where I don't want to change the default value, I prefer > to leave config elements out of the solrconfig. It makes the config > smaller, and it also makes it so that I will automatically see benefits > from the default changing

Re: solre scores remains same for exact match and nearly exact match

2013-04-03 Thread amit
when I use the copy field destination as "text" it works fine. I get a boost for exact match. But if I use some other field the score is not boosted for exact match. Not sure if I am in the right direction.. I am new to solr please bear with me I checked this link http://wiki.apache.org/solr/So

Zookeeper dataimport.properties node

2013-04-03 Thread Nathan Findley
- Is dataimport.properties ever written to the filesystem? (Trying to determine if I have a permissions error because I don't see it anywhere on disk). - How do you manually edit dataimport.properties? My system is periodically pulling in new data. If that process has issues, I want to be able

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
sorry that should say none of the _* files were present, not one On Wed, Apr 3, 2013 at 10:16 PM, Jamie Johnson wrote: > I have since removed the files but when I had looked there was an index > directory, the only files I remember being there were the segments, one of > the _* files were p

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
I have since removed the files but when I had looked there was an index directory, the only files I remember being there were the segments, one of the _* files were present. I'll watch it to see if it happens again but it happened on 2 of the shards while heavy indexing. On Wed, Apr 3, 2013 at 1

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
Is that file still there when you look? Not being able to find an index file is not a common error I've seen recently. Do those replicas have an index directory or when you look on disk, is it an index.timestamp directory? - Mark On Apr 3, 2013, at 10:01 PM, Jamie Johnson wrote: > so somethi

Re: hl.usePhraseHighlighter defaults to true but Query form and wiki suggest otherwise

2013-04-03 Thread Mark Miller
It was def intentional to make it default to true, but I believe that was changed at one point from initially defaulting to false - the doc was probably not updated and that slipped into the UI. Thanks for looking into this. - Mark On Apr 3, 2013, at 8:50 PM, Timothy Potter wrote: > Minor issu

[ANNOUNCE] Apache Solr 4.2.1 released

2013-04-03 Thread Mark Miller
April 2013, Apache Solr™ 4.2.1 available The Lucene PMC is pleased to announce the release of Apache Solr 4.2.1 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted searc

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
so something is still not right. Things were going ok, but I'm seeing this in the logs of several of the replicas SEVERE: Unable to create core: dsc-shard3-core1 org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.(SolrCore.java:822) a

Re: Difference Between Indexing and Reindexing

2013-04-03 Thread Otis Gospodnetic
I don't recall what Nutch does, so it's hard to tell. In Solr (Lucene, really), you can: * add documents * update documents * delete documents Currently, update is really a delete + readd under the hood. It's been like that for 13+ years, but this may change: https://issues.apache.org/jira/brows
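
The "delete + readd" behaviour described above can be illustrated with a toy model. This is only a sketch in Python, not Lucene or Solr code: the `ToyIndex` class and its method names are hypothetical, and the index is assumed to be a plain dictionary keyed by a unique document id.

```python
class ToyIndex:
    """Toy in-memory stand-in for an index keyed by a unique id (not Lucene code)."""

    def __init__(self):
        self.docs = {}     # id -> stored fields
        self.deleted = 0   # number of documents removed so far

    def add(self, doc_id, fields):
        # Store a copy of the whole document under its id.
        self.docs[doc_id] = dict(fields)

    def delete(self, doc_id):
        # Remove the document if it exists and count the removal.
        if self.docs.pop(doc_id, None) is not None:
            self.deleted += 1

    def update(self, doc_id, fields):
        # An "update" is modelled as a delete followed by a re-add
        # of the complete document; nothing is patched in place.
        self.delete(doc_id)
        self.add(doc_id, fields)


index = ToyIndex()
index.add("doc1", {"title": "old title", "body": "text"})
index.update("doc1", {"title": "new title"})
print(index.docs["doc1"])
```

One consequence visible in this model: because the whole document is replaced, a field that is not sent again with the update (here `body`) is no longer present afterwards.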

Difference Between Indexing and Reindexing

2013-04-03 Thread Furkan KAMACI
OK, this may be an easy question, but I want to learn a bit more of the technical detail. When I use Nutch to send documents to Solr to be indexed, there are two parameters: -index and -reindex. What does Solr do differently for each one?

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
Thanks I will try that. On Wed, Apr 3, 2013 at 8:28 PM, Mark Miller wrote: > > > On Apr 3, 2013, at 8:17 PM, Jamie Johnson wrote: > > > I am not using the concurrent low pause garbage collector, I could look > at > > switching, I'm assuming you're talking about adding > -XX:+UseConcMarkSweepGC

hl.usePhraseHighlighter defaults to true but Query form and wiki suggest otherwise

2013-04-03 Thread Timothy Potter
Minor issues - It seems that the hl.usePhraseHighlighter is enabled by default, which definitely makes sense but the wiki says its default value is "false" and the checkbox is unchecked by default on the Query form. This gives the impression this parameter defaults to "false". I'm assuming the co

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
On Apr 3, 2013, at 8:17 PM, Jamie Johnson wrote: > I am not using the concurrent low pause garbage collector, I could look at > switching, I'm assuming you're talking about adding -XX:+UseConcMarkSweepGC > correct? Right - if you don't do that, the default is almost always the throughput coll

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
I am not using the concurrent low pause garbage collector, I could look at switching, I'm assuming you're talking about adding -XX:+UseConcMarkSweepGC correct? I also just had a shard go down and am seeing this in the log SEVERE: org.apache.solr.common.SolrException: I was asked to wait on state

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
This shouldn't be a problem though, if things are working as they are supposed to. Another node should simply take over as the overseer and continue processing the work queue. It's just best if you configure so that session timeouts don't happen unless a node is really down. On the other hand, i

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
Yeah. Are you using the concurrent low pause garbage collector? This means the overseer wasn't able to communicate with zk for 15 seconds - due to load or gc or whatever. If you can't resolve the root cause of that, or the load just won't allow for it, next best thing you can do is raise it to 3

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
I am occasionally seeing this in the log, is this just a timeout issue? Should I be increasing the zk client timeout? WARNING: Overseer cannot talk to ZK Apr 3, 2013 11:14:25 PM org.apache.solr.cloud.DistributedQueue$LatchChildWatcher process INFO: Watcher fired on path: null state: Expired type

RE: Solr Multiword Search

2013-04-03 Thread skmirch
The following query is doing a word search (based on my previous post)... solr/spell?q=(charles+and+the+choclit+factory+OR+(title2:("charles+and+the+choclit+factory")))&spellcheck.collate=true&spellcheck=true&spellcheck.q=charles+and+the+choclit+factory It produces a lot of unwanted matches.

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
just an update, I'm at 1M records now with no issues. This looks promising as to the cause of my issues, thanks for the help. Is the routing method with numShards documented anywhere? I know numShards is documented but I didn't know that the routing changed if you don't specify it. On Wed, Apr

Re: Solr metrics in Codahale metrics and Graphite?

2013-04-03 Thread Walter Underwood
In the Jira, but not in the docs. It would be nice to have VM stats like GC, too, so we can have common monitoring and alerting on all our services. wunder On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote: > It's there! :) > http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=iss

Re: Solr metrics in Codahale metrics and Graphite?

2013-04-03 Thread Otis Gospodnetic
It's there! :) http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue Otis -- Solr & ElasticSearch Support http://sematext.com/ On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood wrote: > That sounds great. I'll check out the bug, I didn't see anything in the docs > about this.

Re: Solr metrics in Codahale metrics and Graphite?

2013-04-03 Thread Walter Underwood
That sounds great. I'll check out the bug, I didn't see anything in the docs about this. And if I can't find it with a search engine, it probably isn't there. --wunder On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote: > On 3/29/2013 12:07 PM, Walter Underwood wrote: >> What are folks using for t

RE: Solr Multiword Search

2013-04-03 Thread skmirch
Hi James, Thanks for the information you have provided. I tried your suggestion and it helped a lot. However, as close as this seems to what I want, I still need for it to match the exact phrases that closely match my search words. So while I am now using the search words in q and also spellch

Re: Solr metrics in Codahale metrics and Graphite?

2013-04-03 Thread Otis Gospodnetic
Hi, We're using... eh, our SPM for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html (Wunder, I think somebody from Chegg actually looked into using it - please ping if you need more info) Shawn, metrics 3.0.0beta1 is out, apparently very reworked, so might be worth revisi

Streaming search results

2013-04-03 Thread Victor Miroshnikov
Is it possible to stream search results from Solr? Seems that this feature is missing. I see two options to solve this: 1. Using search results pagination feature The idea is to implement a smart proxy that will stream chunks from search results using pagination. 2. Implement Solr plugin with
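
Option 1 above (a proxy that streams chunks using pagination) could look roughly like the following Python sketch. It makes no real HTTP calls: `fetch_page` is a hypothetical callable standing in for a paged search request taking a start offset and a page size, and `fake_fetch` is an in-memory stand-in used only for illustration.

```python
from typing import Callable, Iterator, List


def stream_results(fetch_page: Callable[[int, int], List[dict]],
                   rows: int = 100) -> Iterator[dict]:
    """Yield hits one by one while fetching them page by page."""
    start = 0
    while True:
        page = fetch_page(start, rows)
        if not page:
            return              # no more hits
        yield from page
        if len(page) < rows:
            return              # a short page means this was the last one
        start += rows


# Hypothetical in-memory backend standing in for the search server.
FAKE_HITS = [{"id": i} for i in range(250)]


def fake_fetch(start: int, rows: int) -> List[dict]:
    return FAKE_HITS[start:start + rows]


ids = [doc["id"] for doc in stream_results(fake_fetch, rows=100)]
print(len(ids))
```

The caller sees a single iterator, while only one page is held in memory at a time. Note that the cost of fetching pages at very deep offsets is not modelled here.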

Re: Filtering Search Cloud

2013-04-03 Thread Shawn Heisey
On 4/3/2013 1:52 PM, Furkan KAMACI wrote: > Thanks for your explanation, you explained every thing what I need. Just > one more question. I see that I can not make it with Solr Cloud, but I can > do something like that with master-slave replication of Solr. If I use > master-slave replication of So

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread Mark Miller
On Apr 3, 2013, at 5:53 PM, Michael Della Bitta wrote: > From what I can tell, the Collections API has been hardened > significantly since 4.2 I did a lot of work here for 4.2.1 - there was a lot to improve. Hopefully there is much less now, but if anyone finds anything, I'll fix any JIRA's.

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread Michael Della Bitta
From what I can tell, the Collections API has been hardened significantly since 4.2 and now will refuse to create a collection if you give it something ambiguous to do. So if you upgrade to 4.2, things will become more safe. But overall I'd find a way of using the Collections API that works and s

Re: Solr Tika Override

2013-04-03 Thread Jan Høydahl
You'd probably want to work on the XML output from Tika's PDF parser, from which you can identify which page and context. Personally I would build a separate indexing application in Java and call Tika directly, then build a SolrInputDocument which you pass to solr through SolrJ. I.e. not use Ex

Re: do SearchComponents have access to response contents

2013-04-03 Thread Jack Krupansky
The search components can see the "response" as a namedlist, but it is only when SolrDispatchFIlter calls the QueryResponseWriter that XML or JSON or whatever other format (Javabin as well) is generated from the named list for final output in an HTTP response. You probably want a custom query

Re: Question on Exact Matches - edismax

2013-04-03 Thread Jan Høydahl
Can you show us your *_ci field type? Solr does not really have a way to tell whether a match is "exact" or only partial, but you could hack around it with the fieldType. See https://github.com/cominvent/exactmatch for a possible solution. -- Jan Høydahl, search solution architect Cominvent AS

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread vsilgalis
Michael Della Bitta-2 wrote > If you can work with a clean state, I'd turn off all your shards, > clear out the Solr directories in Zookeeper, reset solr.xml for each > of your shards, upgrade to the latest version of Solr, and turn > everything back on again. Then upload config, recreate your > co

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Steve Rowe
Cool, glad I was able to help. On Apr 3, 2013, at 4:18 PM, Ashok wrote: > Hi Steve, > > Fabulous suggestion! Yup, that is it! Using the HTMLStripTransformer twice > did the trick. I am using Solr 4.1. > > Thank you very much! > > - ashok > > > > -- > View this message in context: > http:/

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
with these changes things are looking good, I'm up to 600,000 documents without any issues as of right now. I'll keep going and add more to see if I find anything. On Wed, Apr 3, 2013 at 4:01 PM, Jamie Johnson wrote: > ok, so that's not a deal breaker for me. I just changed it to match the >

do SearchComponents have access to response contents

2013-04-03 Thread xavier jmlucjav
I need to implement some SearchComponent that will deal with metrics on the response. Some things I see will be easy to get, like number of hits for instance, but I am more worried about this: We need to also track the size of the response (as the size in bytes of the whole xml response that is stre

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Ashok
Hi Steve, Fabulous suggestion! Yup, that is it! Using the HTMLStripTransformer twice did the trick. I am using Solr 4.1. Thank you very much! - ashok -- View this message in context: http://lucene.472066.n3.nabble.com/HTML-entities-being-missed-by-DIH-HTMLStripTransformer-tp4053582p4053609.h

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
ok, so that's not a deal breaker for me. I just changed it to match the shards that are auto created and it looks like things are happy. I'll go ahead and try my test to see if I can get things out of sync. On Wed, Apr 3, 2013 at 3:56 PM, Mark Miller wrote: > I had thought you could - but loo

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
I had thought you could - but looking at the code recently, I don't think you can anymore. I think that's a technical limitation more than anything though. When these changes were made, I think support for that was simply not added at the time. I'm not sure exactly how straightforward it would

Re: Filtering Search Cloud

2013-04-03 Thread Furkan KAMACI
Thanks for your explanation; you explained everything I need. Just one more question. I see that I cannot do it with SolrCloud, but I can do something like that with Solr's master-slave replication. If I use master-slave replication, can I eliminate (filter) something (something

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread Michael Della Bitta
If you can work with a clean state, I'd turn off all your shards, clear out the Solr directories in Zookeeper, reset solr.xml for each of your shards, upgrade to the latest version of Solr, and turn everything back on again. Then upload config, recreate your collection, etc. I do it like this, but

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
answered my own question, it now says compositeId. What is problematic though is that in addition to my shards (which are say jamie-shard1) I see the solr created shards (shard1). I assume that these were created because of the numShards param. Is there no way to specify the names of these shard

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread vsilgalis
Michael Della Bitta-2 wrote > With earlier versions of Solr Cloud, if there was any error or warning > when you made a collection, you likely were set up for "implicit" > routing which means that documents only go to the shard you're talking > to. What you want is "compositeId" routing, which works

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Steve Rowe
Hi Ashok, HTMLStripTransformer uses HTMLStripCharFilter under the hood, and HTMLStripCharFilter converts all HTML entities to their corresponding characters. What version of Solr are you using? My guess is that it only appears that nothing is happening, since when they are presented in a brow
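
Why a second stripping pass can help may be easier to see in a small Python sketch. This is not the HTMLStripCharFilter implementation: `strip_pass` is a hypothetical helper that removes literal tags with a simple regular expression and then decodes entities, and the sample input (markup stored in entity-escaped form) is an assumption about what such database text might look like.

```python
import html
import re

# Naive tag matcher, good enough for this illustration only.
TAG_RE = re.compile(r"<[^>]+>")


def strip_pass(text: str) -> str:
    """One pass: remove literal tags, then decode entities to characters."""
    return html.unescape(TAG_RE.sub("", text))


# Assumed input: the markup itself is stored as HTML entities.
raw = "&lt;b&gt;This is in Bold&lt;/b&gt;"

once = strip_pass(raw)     # entities decoded, so real tags now appear
twice = strip_pass(once)   # the second pass removes those tags

print(once)
print(twice)
```

In this model the first pass finds no literal tags to remove and only turns the entities into characters, which is why the output still looks like markup until the text is run through a second pass.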

Re: Filtering Search Cloud

2013-04-03 Thread Shawn Heisey
On 4/3/2013 1:13 PM, Furkan KAMACI wrote: > Shawn, thanks for your detailed explanation. My system will work on high > load. I mean I will always index something and something always will be > queried at my system. That is why I consider about physically separating > indexer and query reply machine

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Alexandre Rafalovitch
Then, I would say, you have a bigger problem However, you can probably run RegEx filter and replace those known escapes with real characters before you run your HTMLStrip filter. Or run, HTMLStrip, RegEx and HTMLStrip again. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ Lin

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread Michael Della Bitta
With earlier versions of Solr Cloud, if there was any error or warning when you made a collection, you likely were set up for "implicit" routing which means that documents only go to the shard you're talking to. What you want is "compositeId" routing, which works how you think it should. Go into t

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
ah interesting... so I need to specify num shards, blow out zk and then try this again to see if things work properly now. What is really strange is that for the most part things have worked right and on 4.2.1 I have 600,000 items indexed with no duplicates. In any event I will specify num shards

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Ashok
Well, the database field has text, sometimes with HTML entities and at other times with html tags. I have no control over the process that populates the database tables with info. -- View this message in context: http://lucene.472066.n3.nabble.com/HTML-entities-being-missed-by-DIH-HTMLStripTra

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
If you don't specify numShards after 4.1, you get an implicit doc router and it's up to you to distribute updates. In the past, partitioning was done on the fly - but for shard splitting and perhaps other features, we now divvy up the hash range up front based on numShards and store it in ZooKee
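
The idea of dividing the hash range up front based on numShards can be sketched as follows. This is only an illustration in Python under stated assumptions: the function names are hypothetical, the range is assumed to be an unsigned 32-bit space split into equal parts, and `zlib.crc32` is used purely as a stand-in hash, so the resulting assignments will not match what Solr computes.

```python
import zlib


def split_hash_range(num_shards: int, bits: int = 32):
    """Split the hash space [0, 2**bits - 1] into num_shards contiguous ranges."""
    size = 1 << bits
    step = size // num_shards
    ranges = []
    for i in range(num_shards):
        lo = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        hi = size - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((lo, hi))
    return ranges


def route(doc_id: str, ranges) -> int:
    """Return the index of the shard whose range contains the id's hash."""
    h = zlib.crc32(doc_id.encode("utf-8"))  # stand-in hash, unsigned 32-bit
    for shard, (lo, hi) in enumerate(ranges):
        if lo <= h <= hi:
            return shard
    raise ValueError("hash outside of all ranges")


ranges = split_hash_range(2)
print(ranges)
print(route("doc-42", ranges))
```

Because the ranges are fixed when the collection is created, the same id always maps to the same shard in this model, no matter which node receives the update.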

Re: Filtering Search Cloud

2013-04-03 Thread Furkan KAMACI
Shawn, thanks for your detailed explanation. My system will run under high load: I will always be indexing something, and something will always be queried on my system. That is why I am considering physically separating the indexer and query machines. What I have in mind: imagine a machine that both

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Gora Mohanty
On 4 April 2013 00:30, Ashok wrote: [...] > Two questions. > > (1) Is this the expected behavior of DIH HTMLStripTransformer? Yes, I believe so. > (2) If yes, is there an another transformer that I can employ first to turn > these html entities into their usual symbols that can then be removed b

HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Ashok
Hi, I am using DIH to index some database fields. These fields contain html formatted text in them. I use the 'HTMLStripTransformer' to remove that markup. This works fine when the text is like for example: Item One or *This is in Bold* However when the text has HTML entity names like in: • I

    Re: SolrCloud not distributing documents across shards

    2013-04-03 Thread vsilgalis
    Chris Hostetter-3 wrote > I'm not familiar with the details, but i've seen miller respond to a > similar question with reference to the issue of not explicitly specifying > numShards when creating your collections... > > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201303.mbox/% >

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Jamie Johnson
    The router says "implicit". I did start from a blank zk state but perhaps I missed one of the ZkCLI commands? One of my shards from the clusterstate.json is shown below. What is the process that should be done to bootstrap a cluster other than the ZkCLI commands I listed above? My process right

    Re: It seems a issue of deal with chinese synonym for solr

    2013-04-03 Thread Kuro Kurosaka
    On 3/11/13 6:15 PM, 李威 wrote: in org.apache.solr.parser.SolrQueryParserBase, there is a function: "protected Query newFieldQuery(Analyzer analyzer, String field, String queryText, boolean quoted) throws SyntaxError" The below code can't process chinese rightly. " BooleanClause.Occur

    Re: SolrCloud not distributing documents across shards

    2013-04-03 Thread Chris Hostetter
    : So we indexed a set of 33010 documents on server01 which are now in shard1. : And we kicked off a set of 85934 documents on server02 which are now in : shard2 (as tests). In my understanding of how SolrCloud works, the : documents should be distributed across the shards in the collection. Now I

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Mark Miller
    It should be part of your clusterstate.json. Some users have reported trouble upgrading a previous zk install when this change came. I recommended manually updating the clusterstate.json to have the right info, and that seemed to work. Otherwise, I guess you have to start from a clean zk state.

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Jamie Johnson
    Where is this information stored in ZK? I don't see it in the cluster state (or perhaps I don't understand it ;) ). Perhaps something with my process is broken. What I do when I start from scratch is the following ZkCLI -cmd upconfig ... ZkCLI -cmd linkconfig but I don't ever explicitly c

    Re: SolrCloud not distributing documents across shards

    2013-04-03 Thread vsilgalis
    Michael Della Bitta-2 wrote > Hello Vytenis, > > What exactly do you mean by "aren't distributing across the shards"? > Do you mean that POSTs against the server for shard 1 never end up > resulting in documents saved in shard 2? So we indexed a set of 33010 documents on server01 which are now in

    RE: Solr Multiword Search

    2013-04-03 Thread Dyer, James
    You have specified "spellcheck.q" in your query. The whole purpose of "spellcheck.q" is to bypass any query converter you've configured giving it raw keywords instead. But possibly a custom query converter is not your best answer? I agree that charles > charlie is an edit distance of 2, so if

    RE: AW: AW: java.lang.OutOfMemoryError: Map failed

    2013-04-03 Thread Van Tassell, Kristian
    I just posted a similar error and discovered that decreasing the Xmx fixed the problem for me. The "free" command/top, etc. indicated I was flying just below the threshold for my allowed memory, and with swap/virtual space available, so I'm still confused as to what the issue is, but you may try

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Mark Miller
    Thanks for digging Jamie. In 4.2, hash ranges are assigned up front when a collection is created - each shard gets a range, which is stored in zookeeper. You should not be able to end up with the same id on different shards - something very odd going on. Hopefully I'll have some time to try and

    Solr Tika Override

    2013-04-03 Thread JerryC
    I am researching Solr and seeing if it would be a good fit for a document search service I am helping to develop. One of the requirements is that we will need to be able to customize how file contents are parsed beyond the default configurations that are offered out of the box by Tika. For exampl

    Re: Lengthy description is converted to hash symbols

    2013-04-03 Thread Danny Watari
    Here is a query that should return 2 documents... but it only returns 1. /solr/m7779912/select?indent=on&version=2.2&q=description%3Agateway&fq=&start=0&rows=10&fl=description&qt=&wt=&explainOther=&hl.fl= Oddly enough, the description of the two documents are exactly the same. Except one is inde

    Re: SolrCloud not distributing documents across shards

    2013-04-03 Thread Michael Della Bitta
    Hello Vytenis, What exactly do you mean by "aren't distributing across the shards"? Do you mean that POSTs against the server for shard 1 never end up resulting in documents saved in shard 2? Michael Della Bitta Appinions 18 East 41st Street, 2nd

    Re: Out of memory on some faceting queries

    2013-04-03 Thread Shawn Heisey
    On 4/2/2013 3:09 AM, Dotan Cohen wrote: > I notice that this only occurs on queries that run facets. I start > Solr with the following command: > sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC > -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled > -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar >

    Re: Solr Multiword Search

    2013-04-03 Thread skmirch
    I have been trying to use the MultiWordSpellingQueryConverter.java since I need to be able to find the document that correspond to the suggested collations. At the moment it seems to be producing collations based on word matches and arbitrary words from the field are picked up to form collation an

    Re: Query parser cuts last letter from search term.

    2013-04-03 Thread Upayavira
    On Wed, Apr 3, 2013, at 11:36 AM, vsl wrote: > So why Solr does not return proper document? You're gonna have to give us a bit more than that. What is wrong with the documents it is returning? Upayavira

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Jamie Johnson
    no, my thought was wrong, it appears that even with the parameter set I am seeing this behavior. I've been able to duplicate it on 4.2.0 by indexing 100,000 documents on 10 threads (10,000 each) when I get to 400,000 or so. I will try this on 4.2.1. to see if I see the same behavior On Wed, Apr

    Re: Flow Chart of Solr

    2013-04-03 Thread Jack Krupansky
    We're using the 4.x branch code as the basis for our writing. So, effectively it will be for at least 4.3 when the book comes out in the summer. Early access will be in about a month or so. O'Reilly will be showing a galley proof for 200 pages of the book next week at Big Data TechCon next we

    Re: Flow Chart of Solr

    2013-04-03 Thread Jack Park
    Jack, Is that new book up to the 4.+ series? Thanks The other Jack On Wed, Apr 3, 2013 at 9:19 AM, Jack Krupansky wrote: > And another one on the way: > http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957 > > Hopefully that help a lot as well. Plenty of diagrams. L

    Re: maxWarmingSearchers in Solr 4.

    2013-04-03 Thread Shawn Heisey
    On 4/3/2013 1:48 AM, Dotan Cohen wrote: > I have been dragging the same solrconfig.xml from Solr 3.x to 4.0 to > 4.1, with no customization (bad, bad me!). I'm now looking into > customizing it and I see that the Solr 4.1 solrconfig.xml is much > simpler and shorter. Is this simply because many of

    Re: Solr ZooKeeper ensemble with HBase

    2013-04-03 Thread Walter underwood
    It will be limited by disk IO until you get the caches full. Then it will be limited by CPU. wunder On Apr 3, 2013, at 8:55 AM, Amit Sela wrote: > Trouble in what why ? If I have enough memory - HBase RegionServer 10GB and > maybe 2GB for Solr ? - or you mean CPU / disk ? > > > On Wed, Apr

    Re: Lengthy description is converted to hash symbols

    2013-04-03 Thread Danny Watari
    I looked at the text via the admin analysis tool. The text appeared to be ok! Unfortunately, the description is client data... so I can't post it here, but I do not see any issues when running the analysis tool. -- View this message in context: http://lucene.472066.n3.nabble.com/Lengthy-desc

    SolrException: Error opening new searcher

    2013-04-03 Thread Van Tassell, Kristian
    We're suddenly seeing an error when trying to do updates/commits. This is on Solr 4.2 (Tomcat, solr war deployed to webapps, on Linux SuSE 11). Based off of some initial searching on things related to this issue, I have set ulimit in Linux to 'unlimited' and verified that Tomcat has enough memor

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Jamie Johnson
    Since I don't have that many items in my index I exported all of the keys for each shard and wrote a simple java program that checks for duplicates. I found some duplicate keys on different shards, a grep of the files for the keys found does indicate that they made it to the wrong places. If you
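
A check along the lines described above (exported keys per shard, then a search for keys that appear on more than one shard) might look like this Python sketch. It is not the Java program mentioned in the message: the function name, the shard names, and the sample keys are all made up for illustration.

```python
from collections import defaultdict


def find_cross_shard_duplicates(keys_by_shard):
    """Map each key found on more than one shard to the sorted list of those shards."""
    shards_for_key = defaultdict(set)
    for shard, keys in keys_by_shard.items():
        for key in keys:
            shards_for_key[key].add(shard)
    return {key: sorted(shards)
            for key, shards in shards_for_key.items()
            if len(shards) > 1}


# Made-up export: one list of unique keys per shard.
exported = {
    "shard1": ["a", "b", "c"],
    "shard2": ["c", "d"],
    "shard3": ["e", "a"],
}

dupes = find_cross_shard_duplicates(exported)
print(dupes)
```

Using a set per key means a key repeated within one shard's export is not reported; only keys present on different shards are.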

    Re: Filtering Search Cloud

    2013-04-03 Thread Shawn Heisey
    On 4/1/2013 3:02 PM, Furkan KAMACI wrote: > I want to separate my cloud into two logical parts. One of them is indexer > cloud of SolrCloud. Second one is Searcher cloud of SolrCloud. > > My first question is that. Does separating my cloud system make sense about > performance improvement. Because

    SolrCloud not distributing documents across shards

    2013-04-03 Thread vsilgalis
    So we have 3 servers in a SolrCloud cluster. We have 2 shards for our collection (classic_bt) with a shard on each of the first two servers as the picture shows. The third server has replicas of the first 2 shards just for high availa

    Re: Lengthy description is converted to hash symbols

    2013-04-03 Thread Jack Krupansky
    Show us the exact query URL as well as the request handler defaults. Make sure to try to do an explicit query on the field that has the "#" value. QA and prod may differ because maybe QA got completely reindexed more recently and maybe prod hasn't gotten fully reindexed recently. Maybe the s

    Re: Flow Chart of Solr

    2013-04-03 Thread Jack Krupansky
    And another one on the way: http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957 Hopefully that help a lot as well. Plenty of diagrams. Lots of examples. -- Jack Krupansky -Original Message- From: Jack Park Sent: Wednesday, April 03, 2013 11:25 AM To: solr

    Re: Lengthy description is converted to hash symbols

    2013-04-03 Thread Danny Watari
    Yes... that is what I see in the admin console when I perform a search for the document. Currently, I am using solrj and the addBean() method to update the core. What's strange is in our QA env, the document indexed correctly. But in prod, I see hash symbols and thus any user search against that

    Re: Upgrade Solr3.5 to Solr4.1 - Index Reformat ?

    2013-04-03 Thread Shawn Heisey
    On 4/1/2013 12:19 PM, feroz_kh wrote: > Hi Shawn, > > I tried optimizing using this command... > > curl > 'http://localhost:/solr/update?optimize=true&maxSegments=10&waitFlush=true' > > And i got this response within secs... > > > > 0 name="QTime">840 > > > Is this a valid response that

    Re: Solr ZooKeeper ensemble with HBase

    2013-04-03 Thread Michael Della Bitta
    Solr heavily uses RAM for disk caching, so depending on your index size and what you intend to do with it, 2 GB could easily not be enough. We run with 6 GB heaps on 34 GB boxes, and the remaining RAM is there solely to act as a disk cache. We're on EC2, though, so unless you're using the SSD insta

    Re: Solr ZooKeeper ensemble with HBase

    2013-04-03 Thread Amit Sela
    Trouble in what way? If I have enough memory - HBase RegionServer 10GB and maybe 2GB for Solr? - or do you mean CPU / disk? On Wed, Apr 3, 2013 at 5:54 PM, Michael Della Bitta < michael.della.bi...@appinions.com> wrote: > Hello, Amit: > > My guess is that, if HBase is working hard, you're going

    Re: Flow Chart of Solr

    2013-04-03 Thread Jack Park
    There are three books on Solr, two with that in the title, and one, Taming Text, each of which has been very valuable in understanding Solr. Jack On Wed, Apr 3, 2013 at 5:25 AM, Jack Krupansky wrote: > Sure, yes. But... it comes down to what level of detail you want and need > for a specific ta

    Re: Solr ZooKeeper ensemble with HBase

    2013-04-03 Thread Michael Della Bitta
    Hello, Amit: My guess is that, if HBase is working hard, you're going to have more trouble with HBase and Solr on the same nodes than HBase and Solr sharing a Zookeeper. Solr's usage of Zookeeper is very minimal. Michael Della Bitta Appinions 18 E

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Jamie Johnson
    Something interesting that I'm noticing as well: I just indexed 300,000 items, and somehow 300,020 ended up in the index. I thought perhaps I messed something up, so I started the indexing again and indexed another 400,000 and I see 400,064 docs. Is there a good way to find possible duplicates?

    Re: solre scores remains same for exact match and nearly exact match

    2013-04-03 Thread amit
    Thanks. I added a copy field and that fixed the issue. On Wed, Apr 3, 2013 at 12:29 PM, Gora Mohanty-3 [via Lucene] < ml-node+s472066n4053412...@n3.nabble.com> wrote: > On 3 April 2013 10:52, amit <[hidden > email]> > wrote: > > > > Below is

    Question on Exact Matches - edismax

    2013-04-03 Thread Sandeep Mestry
    Hi All, I have a requirement where in exact matches for 2 fields (Series Title, Title) should be ranked higher than the partial matches. The configuration looks like below: edismax explicit 0.01 *pg_series_title_ci*^500 *title_ci*^300 * pg

    Re: Synonyms problem

    2013-04-03 Thread Shawn Heisey
    On 3/29/2013 12:14 PM, Plamen Mihaylov wrote: > Can I ask you another question: I have Magento + Solr and have a > requirement to create an admin magento module, where I can add/remove > synonyms dynamically. Is this possible? I searched google but it seems not > possible. If you change the synony

    Re: Solr metrics in Codahale metrics and Graphite?

    2013-04-03 Thread Shawn Heisey
    On 3/29/2013 12:07 PM, Walter Underwood wrote: > What are folks using for this? I don't know that this really answers your question, but Solr 4.1 and later includes a big chunk of codahale metrics internally for request handler statistics - see SOLR-1972. First we tried including the jar and usin

    Re: is there a way we can build spell dictionary from solr index such that it only take words leaving all`special characters

    2013-04-03 Thread Rohan Thakur
    Hi Upayavira, do you mean to say that I don't have to follow this: http://wiki.apache.org/solr/SpellCheckComponent and can directly create a spell check field from copyField and use it... I don't have to build a dictionary on the field, just use copyField for spell suggestions? Thanks, regards, Rohan O

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Jamie Johnson
    Ok, so clearing the transaction log allowed things to go again. I am going to clear the index and try to replicate the problem on 4.2.0 and then I'll try on 4.2.1 On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller wrote: > No, not that I know if, which is why I say we need to get to the bottom of > i

    RE: Confusion over Solr highlight hl.q parameter

    2013-04-03 Thread Van Tassell, Kristian
    Thank you for the response, unfortunately it didn't change that I'm still getting no highlighting hits for this query. ...hl.q={!dismax}text_it_IT:l'assieme... -Original Message- From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: Tuesday, April 02, 2013 9:00 PM To: solr-user@lucene

    Re: Query parser cuts last letter from search term.

    2013-04-03 Thread Jack Krupansky
    The standard tokenizer recognizes "!" as a punctuation character, so it will be treated as white space. You could use the white space tokenizer if punctuation is considered significant. -- Jack Krupansky -Original Message- From: vsl Sent: Wednesday, April 03, 2013 6:25 AM To: solr-
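To make the distinction in the reply above concrete, here is a minimal Python sketch. Both functions are hypothetical, simplified stand-ins written for illustration; they are not Lucene's StandardTokenizer or WhitespaceTokenizer, and real Solr analysis chains apply further rules. The sketch only shows why a trailing "!" disappears under punctuation-aware tokenization and survives when splitting on white space alone.

```python
import re


def punctuation_stripping_tokens(text):
    # Simplified assumption: keep runs of word characters and drop
    # everything else, so punctuation such as "!" acts as a separator.
    return re.findall(r"\w+", text)


def whitespace_tokens(text):
    # Split on white space only, so punctuation stays attached to tokens.
    return text.split()


if __name__ == "__main__":
    query = "yahoo! mail"
    print(punctuation_stripping_tokens(query))
    print(whitespace_tokens(query))
```

If punctuation is significant for matching, the same tokenization choice would presumably need to apply at both index time and query time, otherwise the indexed terms and the query terms will not line up.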

    Re: Flow Chart of Solr

    2013-04-03 Thread Jack Krupansky
    Sure, yes. But... it comes down to what level of detail you want and need for a specific task. In other words, there are probably a dozen or more levels of detail. The reality is that if you are going to work at the Solr code level, that is very, very different than being a "user" of Solr, and a

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Mark Miller
    No, not that I know of, which is why I say we need to get to the bottom of it. - Mark On Apr 2, 2013, at 10:18 PM, Jamie Johnson wrote: > Mark > Is there a particular jira issue that you think may address this? I read > through it quickly but didn't see one that jumped out > On Apr 2, 2013 10

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Mark Miller
    Clearing out its tlogs before starting it again may help. - Mark On Apr 2, 2013, at 10:07 PM, Jamie Johnson wrote: > I brought the bad one down and back up and it did nothing. I can clear the > index and try 4.2.1. I will save off the logs and see if there is anything > else odd > On Apr 2, 2013
