Re: Out of memory on some faceting queries

2013-04-03 Thread Toke Eskildsen
On Tue, 2013-04-02 at 17:08 +0200, Dotan Cohen wrote: > Most of the time I facet on one field that has about twenty unique > values. They are likely to be disk cached so warming those for 9M documents should only take a few seconds. > However, once per day I would like to facet on the text field,

maxWarmingSearchers in Solr 4.

2013-04-03 Thread Dotan Cohen
I have been dragging the same solrconfig.xml from Solr 3.x to 4.0 to 4.1, with no customization (bad, bad me!). I'm now looking into customizing it and I see that the Solr 4.1 solrconfig.xml is much simpler and shorter. Is this simply because many of the examples have been removed? In particular,

Re: Out of memory on some faceting queries

2013-04-03 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 6:26 PM, Andre Bois-Crettez wrote: > warmupTime is available on the admin page for each type of cache (in > milliseconds) : > http://solr-box:8983/solr/#/core1/plugins/cache > > Or if you are only interested in the total : > http://solr-box:8983/solr/core1/admin/mbeans?stats

Re: Out of memory on some faceting queries

2013-04-03 Thread Dotan Cohen
On Wed, Apr 3, 2013 at 10:11 AM, Toke Eskildsen wrote: >> However, once per day I would like to facet on the text field, >> which is a free-text field usually around 1 KiB (about 100 words), in >> order to determine what the top keywords / topics are. That query >> would take up to 200 seconds to

Re: Solr 4.2.0 results links

2013-04-03 Thread zeroeffect
Thanks for the response. I found the issue. The data was being ingested correctly; it was just being echoed incorrectly. While inspecting the final HTML output I was able to find that the richtext-doc.vm file was used to display my data. The code in this file generated the links to local files. I did so

Query parser cuts last letter from search term.

2013-04-03 Thread vsl
Hi, I have a strange problem with a Solr query. I added a new document to my Solr index with the word "behave!" in its content. When I tried to search for this document using the search term "behave", it was not found. Only "behave!" returns a result. Additionally, the search debug returns the following information: debu

RE: MoreLikeThis - Odd results - what am I doing wrong?

2013-04-03 Thread DC tech
Thanks David - I suppose it is an AWS question and thank you for the pointers. As a further input to the MLT question - it does seem that 3.6 behavior is different from 4.2 - the issue seems to be more in terms of the raw query that is generated. I will do some more research and report back with

Re: Query parser cuts last letter from search term.

2013-04-03 Thread Upayavira
This is called 'stemming', and is caused by this: It means that all of these terms would match: behave behaving behaved (and possibly more) because they would all stem down to 'behav'. This stemming will happen at index time and at query time, so stemmed terms are stored in your index
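To make the stemming explanation concrete, here is a minimal sketch of why the same stemmer has to run at both index time and query time. It is not Solr's stemmer: `toy_stem` is a deliberately crude, hypothetical suffix stripper used only for illustration, and `build_index` and `search` are made-up helper names.

```python
def toy_stem(term: str) -> str:
    """Crude suffix stripper; a stand-in for a real stemmer (assumption)."""
    term = term.lower()
    for suffix in ("ing", "ed", "e"):
        # Only strip when at least three characters of stem remain.
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def build_index(docs):
    """Map each stemmed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(toy_stem(word), set()).add(doc_id)
    return index


def search(index, query):
    """Stem the query term the same way before looking it up."""
    return index.get(toy_stem(query), set())


index = build_index({1: "they behaved well", 2: "behaving badly"})
matches = search(index, "behave")
```

Because only the stemmed form is stored, a query term that skips the stemmer (or is analyzed differently) would look up a key that does not exist in the index.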

Re: Query parser cuts last letter from search term.

2013-04-03 Thread vsl
So why Solr does not return proper document? -- View this message in context: http://lucene.472066.n3.nabble.com/Query-parser-cuts-last-letter-from-search-term-tp4053432p4053435.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Flow Chart of Solr

2013-04-03 Thread Furkan KAMACI
So, all in all, is there anybody who can write down just main steps of Solr(including parsing, stemming etc.)? 2013/4/2 Furkan KAMACI > I think about myself as an example. I have started to make research about > Solr just for some weeks. I have learned Solr and its related projects. My > next s

Words being duplicated with highlighting & DictionaryCompoundWordTokenFilterFactory

2013-04-03 Thread Philtjens, Raf
I'm having issues with highlighting & DictionaryCompoundWordTokenFilterFactory in Solr 3.6.1/3.6.2. It's duplicating/adding words in the highlighted snippet. For example, my dictionary (dutch) has the following words: premie, beter, ring. If I search for 'verbetering', results with 'verbeterings

Solr ZooKeeper ensemble with HBase

2013-04-03 Thread Amit Sela
Hi all, I have a running Hadoop + HBase cluster and the HBase cluster is running its own ZooKeeper (HBase manages ZooKeeper). I would like to deploy my SolrCloud cluster on a portion of the machines on that cluster. My question is: Should I have any trouble / issues deploying an additional ZooKe

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
Clearing out its tlogs before starting it again may help. - Mark On Apr 2, 2013, at 10:07 PM, Jamie Johnson wrote: > I brought the bad one down and back up and it did nothing. I can clear the > index and try 4.2.1. I will save off the logs and see if there is anything > else odd > On Apr 2, 2013

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
No, not that I know of, which is why I say we need to get to the bottom of it. - Mark On Apr 2, 2013, at 10:18 PM, Jamie Johnson wrote: > Mark > Is there a particular jira issue that you think may address this? I read > through it quickly but didn't see one that jumped out > On Apr 2, 2013 10

Re: Flow Chart of Solr

2013-04-03 Thread Jack Krupansky
Sure, yes. But... it comes down to what level of detail you want and need for a specific task. In other words, there are probably a dozen or more levels of detail. The reality is that if you are going to work at the Solr code level, that is very, very different than being a "user" of Solr, and a

Re: Query parser cuts last letter from search term.

2013-04-03 Thread Jack Krupansky
The standard tokenizer recognizes "!" as a punctuation character, so it will be treated as white space. You could use the white space tokenizer if punctuation is considered significant. -- Jack Krupansky -Original Message- From: vsl Sent: Wednesday, April 03, 2013 6:25 AM To: solr-

RE: Confusion over Solr highlight hl.q parameter

2013-04-03 Thread Van Tassell, Kristian
Thank you for the response, unfortunately it didn't change that I'm still getting no highlighting hits for this query. ...hl.q={!dismax}text_it_IT:l'assieme... -Original Message- From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: Tuesday, April 02, 2013 9:00 PM To: solr-user@lucene

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
Ok, so clearing the transaction log allowed things to go again. I am going to clear the index and try to replicate the problem on 4.2.0 and then I'll try on 4.2.1. On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller wrote: > No, not that I know of, which is why I say we need to get to the bottom of > i

Re: is there a way we can build spell dictionary from solr index such that it only take words leaving all`special characters

2013-04-03 Thread Rohan Thakur
Hi Upayavira, you mean to say that I don't have to follow this: http://wiki.apache.org/solr/SpellCheckComponent and can directly create a spellcheck field from copyField and use it... I don't have to build a dictionary on the field, just use copyField for spell suggestions? Thanks, regards, Rohan O

Re: Solr metrics in Codahale metrics and Graphite?

2013-04-03 Thread Shawn Heisey
On 3/29/2013 12:07 PM, Walter Underwood wrote: > What are folks using for this? I don't know that this really answers your question, but Solr 4.1 and later includes a big chunk of codahale metrics internally for request handler statistics - see SOLR-1972. First we tried including the jar and usin

Re: Synonyms problem

2013-04-03 Thread Shawn Heisey
On 3/29/2013 12:14 PM, Plamen Mihaylov wrote: > Can I ask you another question: I have Magento + Solr and have a > requirement to create an admin magento module, where I can add/remove > synonyms dynamically. Is this possible? I searched google but it seems not > possible. If you change the synony

Question on Exact Matches - edismax

2013-04-03 Thread Sandeep Mestry
Hi All, I have a requirement where in exact matches for 2 fields (Series Title, Title) should be ranked higher than the partial matches. The configuration looks like below: edismax explicit 0.01 *pg_series_title_ci*^500 *title_ci*^300 * pg

Re: solre scores remains same for exact match and nearly exact match

2013-04-03 Thread amit
Thanks. I added a copy field and that fixed the issue. On Wed, Apr 3, 2013 at 12:29 PM, Gora Mohanty-3 [via Lucene] < ml-node+s472066n4053412...@n3.nabble.com> wrote: > On 3 April 2013 10:52, amit <[hidden > email]> > wrote: > > > > Below is

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
Something interesting that I'm noticing as well: I just indexed 300,000 items, and somehow 300,020 ended up in the index. I thought perhaps I messed something up so I started the indexing again and indexed another 400,000 and I see 400,064 docs. Is there a good way to find possible duplicates?

Re: Solr ZooKeeper ensemble with HBase

2013-04-03 Thread Michael Della Bitta
Hello, Amit: My guess is that, if HBase is working hard, you're going to have more trouble with HBase and Solr on the same nodes than HBase and Solr sharing a Zookeeper. Solr's usage of Zookeeper is very minimal. Michael Della Bitta Appinions 18 E

Re: Flow Chart of Solr

2013-04-03 Thread Jack Park
There are three books on Solr, two with that in the title, and one, Taming Text, each of which have been very valuable in understanding Solr. Jack On Wed, Apr 3, 2013 at 5:25 AM, Jack Krupansky wrote: > Sure, yes. But... it comes down to what level of detail you want and need > for a specific ta

Re: Solr ZooKeeper ensemble with HBase

2013-04-03 Thread Amit Sela
Trouble in what way? If I have enough memory - HBase RegionServer 10GB and maybe 2GB for Solr? - or do you mean CPU / disk? On Wed, Apr 3, 2013 at 5:54 PM, Michael Della Bitta < michael.della.bi...@appinions.com> wrote: > Hello, Amit: > > My guess is that, if HBase is working hard, you're going

Re: Solr ZooKeeper ensemble with HBase

2013-04-03 Thread Michael Della Bitta
Solr heavily uses RAM for disk caching, so depending on your index size and what you intend to do with it, 2 GB could easily not be enough. We run with 6 GB heaps on 34 GB boxes, and the remaining RAM is there solely to act as a disk cache. We're on EC2, though, so unless you're using the SSD insta

Re: Upgrade Solr3.5 to Solr4.1 - Index Reformat ?

2013-04-03 Thread Shawn Heisey
On 4/1/2013 12:19 PM, feroz_kh wrote: > Hi Shawn, > > I tried optimizing using this command... > > curl > 'http://localhost:/solr/update?optimize=true&maxSegments=10&waitFlush=true' > > And i got this response within secs... > > > > 0 name="QTime">840 > > > Is this a valid response that

Re: Lengthy description is converted to hash symbols

2013-04-03 Thread Danny Watari
Yes... that is what I see in the admin console when I perform a search for the document. Currently, I am using solrj and the addBean() method to update the core. What's strange is that in our QA env, the document indexed correctly. But in prod, I see hash symbols and thus any user search against that

Re: Flow Chart of Solr

2013-04-03 Thread Jack Krupansky
And another one on the way: http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957 Hopefully that will help a lot as well. Plenty of diagrams. Lots of examples. -- Jack Krupansky -Original Message- From: Jack Park Sent: Wednesday, April 03, 2013 11:25 AM To: solr

Re: Lengthy description is converted to hash symbols

2013-04-03 Thread Jack Krupansky
Show us the exact query URL as well as the request handler defaults. Make sure to try to do an explicit query on the field that has the "#" value. QA and prod may differ because maybe QA got completely reindexed more recently and maybe prod hasn't gotten fully reindexed recently. Maybe the s

SolrCloud not distributing documents across shards

2013-04-03 Thread vsilgalis
So we have 3 servers in a SolrCloud cluster. We have 2 shards for our collection (classic_bt) with a shard on each of the first two servers as the picture shows. The third server has replicas of the first 2 shards just for high availa

Re: Filtering Search Cloud

2013-04-03 Thread Shawn Heisey
On 4/1/2013 3:02 PM, Furkan KAMACI wrote: > I want to separate my cloud into two logical parts. One of them is indexer > cloud of SolrCloud. Second one is Searcher cloud of SolrCloud. > > My first question is that. Does separating my cloud system make sense about > performance improvement. Because

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
Since I don't have that many items in my index I exported all of the keys for each shard and wrote a simple java program that checks for duplicates. I found some duplicate keys on different shards, a grep of the files for the keys found does indicate that they made it to the wrong places. If you
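The duplicate check described above can be sketched as follows. The original program was written in Java and is not shown in the thread; this is an independent Python sketch that assumes the keys of each shard have already been exported into one collection per shard. The function name and the shard and key names are hypothetical.

```python
from collections import defaultdict


def find_cross_shard_duplicates(keys_by_shard):
    """Return {key: [shards...]} for every key seen on more than one shard."""
    shards_for_key = defaultdict(set)
    for shard, keys in keys_by_shard.items():
        for key in keys:
            shards_for_key[key].add(shard)
    # Keep only keys that landed on at least two different shards.
    return {
        key: sorted(shards)
        for key, shards in shards_for_key.items()
        if len(shards) > 1
    }


exported = {
    "shard1": ["doc1", "doc2", "doc3"],
    "shard2": ["doc2", "doc4"],
}
duplicates = find_cross_shard_duplicates(exported)
```

A key repeated within a single shard is not reported by this sketch; it only flags keys that appear on different shards, which is the symptom discussed in the thread.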

SolrException: Error opening new searcher

2013-04-03 Thread Van Tassell, Kristian
We're suddenly seeing an error when trying to do updates/commits. This is on Solr 4.2 (Tomcat, solr war deployed to webapps, on Linux SuSE 11). Based off of some initial searching on things related to this issue, I have set ulimit in Linux to 'unlimited' and verified that Tomcat has enough memor

Re: Lengthy description is converted to hash symbols

2013-04-03 Thread Danny Watari
I looked at the text via the admin analysis tool. The text appeared to be ok! Unfortunately, the description is client data... so I can't post it here, but I do not see any issues when running the analysis tool. -- View this message in context: http://lucene.472066.n3.nabble.com/Lengthy-desc

Re: Solr ZooKeeper ensemble with HBase

2013-04-03 Thread Walter Underwood
It will be limited by disk IO until you get the caches full. Then it will be limited by CPU. wunder On Apr 3, 2013, at 8:55 AM, Amit Sela wrote: > Trouble in what why ? If I have enough memory - HBase RegionServer 10GB and > maybe 2GB for Solr ? - or you mean CPU / disk ? > > > On Wed, Apr

Re: maxWarmingSearchers in Solr 4.

2013-04-03 Thread Shawn Heisey
On 4/3/2013 1:48 AM, Dotan Cohen wrote: > I have been dragging the same solrconfig.xml from Solr 3.x to 4.0 to > 4.1, with no customization (bad, bad me!). I'm now looking into > customizing it and I see that the Solr 4.1 solrconfig.xml is much > simpler and shorter. Is this simply because many of

Re: Flow Chart of Solr

2013-04-03 Thread Jack Park
Jack, Is that new book up to the 4.+ series? Thanks The other Jack On Wed, Apr 3, 2013 at 9:19 AM, Jack Krupansky wrote: > And another one on the way: > http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957 > > Hopefully that help a lot as well. Plenty of diagrams. L

Re: Flow Chart of Solr

2013-04-03 Thread Jack Krupansky
We're using the 4.x branch code as the basis for our writing. So, effectively it will be for at least 4.3 when the book comes out in the summer. Early access will be in about a month or so. O'Reilly will be showing a galley proof for 200 pages of the book next week at Big Data TechCon next we

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
no, my thought was wrong, it appears that even with the parameter set I am seeing this behavior. I've been able to duplicate it on 4.2.0 by indexing 100,000 documents on 10 threads (10,000 each) when I get to 400,000 or so. I will try this on 4.2.1. to see if I see the same behavior On Wed, Apr

Re: Query parser cuts last letter from search term.

2013-04-03 Thread Upayavira
On Wed, Apr 3, 2013, at 11:36 AM, vsl wrote: > So why Solr does not return proper document? You're gonna have to give us a bit more than that. What is wrong with the documents it is returning? Upayavira

Re: Solr Multiword Search

2013-04-03 Thread skmirch
I have been trying to use the MultiWordSpellingQueryConverter.java since I need to be able to find the documents that correspond to the suggested collations. At the moment it seems to be producing collations based on word matches and arbitrary words from the field are picked up to form collation an

Re: Out of memory on some faceting queries

2013-04-03 Thread Shawn Heisey
On 4/2/2013 3:09 AM, Dotan Cohen wrote: > I notice that this only occurs on queries that run facets. I start > Solr with the following command: > sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC > -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled > -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar >

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread Michael Della Bitta
Hello Vytenis, What exactly do you mean by "aren't distributing across the shards"? Do you mean that POSTs against the server for shard 1 never end up resulting in documents saved in shard 2? Michael Della Bitta Appinions 18 East 41st Street, 2nd

Re: Lengthy description is converted to hash symbols

2013-04-03 Thread Danny Watari
Here is a query that should return 2 documents... but it only returns 1. /solr/m7779912/select?indent=on&version=2.2&q=description%3Agateway&fq=&start=0&rows=10&fl=description&qt=&wt=&explainOther=&hl.fl= Oddly enough, the descriptions of the two documents are exactly the same. Except one is inde

Solr Tika Override

2013-04-03 Thread JerryC
I am researching Solr and seeing if it would be a good fit for a document search service I am helping to develop. One of the requirements is that we will need to be able to customize how file contents are parsed beyond the default configurations that are offered out of the box by Tika. For exampl

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
Thanks for digging Jamie. In 4.2, hash ranges are assigned up front when a collection is created - each shard gets a range, which is stored in zookeeper. You should not be able to end up with the same id on different shards - something very odd going on. Hopefully I'll have some time to try and

RE: AW: AW: java.lang.OutOfMemoryError: Map failed

2013-04-03 Thread Van Tassell, Kristian
I just posted a similar error and discovered that decreasing the Xmx fixed the problem for me. The "free" command/top, etc. indicated I was flying just below the threshold for my allowed memory, and with swap/virtual space available, so I'm still confused as to what the issue is, but you may try

RE: Solr Multiword Search

2013-04-03 Thread Dyer, James
You have specified "spellcheck.q" in your query. The whole purpose of "spellcheck.q" is to bypass any query converter you've configured giving it raw keywords instead. But possibly a custom query converter is not your best answer? I agree that charles > charlie is an edit distance of 2, so if

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread vsilgalis
Michael Della Bitta-2 wrote > Hello Vytenis, > > What exactly do you mean by "aren't distributing across the shards"? > Do you mean that POSTs against the server for shard 1 never end up > resulting in documents saved in shard 2? So we indexed a set of 33010 documents on server01 which are now in

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
Where is this information stored in ZK? I don't see it in the cluster state (or perhaps I don't understand it ;) ). Perhaps something with my process is broken. What I do when I start from scratch is the following ZkCLI -cmd upconfig ... ZkCLI -cmd linkconfig but I don't ever explicitly c

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Mark Miller
It should be part of your clusterstate.json. Some users have reported trouble upgrading a previous zk install when this change came. I recommended manually updating the clusterstate.json to have the right info, and that seemed to work. Otherwise, I guess you have to start from a clean zk state.

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread Chris Hostetter
: So we indexed a set of 33010 documents on server01 which are now in shard1. : And we kicked off a set of 85934 documents on server02 which are now in : shard2 (as tests). In my understanding of how SolrCloud works, the : documents should be distributed across the shards in the collection. Now I

Re: It seems a issue of deal with chinese synonym for solr

2013-04-03 Thread Kuro Kurosaka
On 3/11/13 6:15 PM, 李威 wrote: in org.apache.solr.parser.SolrQueryParserBase, there is a function: "protected Query newFieldQuery(Analyzer analyzer, String field, String queryText, boolean quoted) throws SyntaxError" The below code can't process chinese rightly. " BooleanClause.Occur

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-03 Thread Jamie Johnson
The router says "implicit". I did start from a blank zk state but perhaps I missed one of the ZkCLI commands? One of my shards from the clusterstate.json is shown below. What is the process that should be done to bootstrap a cluster other than the ZkCLI commands I listed above? My process right

Re: SolrCloud not distributing documents across shards

2013-04-03 Thread vsilgalis
Chris Hostetter-3 wrote > I'm not familiar with the details, but i've seen miller respond to a > similar question with reference to the issue of not explicitly specifying > numShards when creating your collections... > > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201303.mbox/% >

HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Ashok
Hi, I am using DIH to index some database fields. These fields contain html formatted text in them. I use the 'HTMLStripTransformer' to remove that markup. This works fine when the text is like for example: Item One or *This is in Bold* However when the text has HTML entity names like in:
  • I

    Re: HTML entities being missed by DIH HTMLStripTransformer

    2013-04-03 Thread Gora Mohanty
    On 4 April 2013 00:30, Ashok wrote: [...] > Two questions. > > (1) Is this the expected behavior of DIH HTMLStripTransformer? Yes, I believe so. > (2) If yes, is there an another transformer that I can employ first to turn > these html entities into their usual symbols that can then be removed b

    Re: Filtering Search Cloud

    2013-04-03 Thread Furkan KAMACI
    Shawn, thanks for your detailed explanation. My system will work on high load. I mean I will always index something and something always will be queried at my system. That is why I consider about physically separating indexer and query reply machines. I think about that: imagine a machine that both

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Mark Miller
    If you don't specify numShards after 4.1, you get an implicit doc router and it's up to you to distribute updates. In the past, partitioning was done on the fly - but for shard splitting and perhaps other features, we now divvy up the hash range up front based on numShards and store it in ZooKee
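The idea of dividing the hash range up front can be sketched like this. It is only an illustration under stated assumptions: the hash function here is `zlib.crc32` over an unsigned 32-bit space, chosen because it is in the Python standard library, and it is not the hash Solr uses. `split_hash_range` and `route` are hypothetical names.

```python
import zlib

HASH_SPACE = 2 ** 32  # unsigned 32-bit hash space (assumption for this sketch)


def split_hash_range(num_shards):
    """Divide the hash space into num_shards contiguous (start, end) ranges."""
    step = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def route(doc_id, ranges):
    """Return the index of the shard whose range contains the id's hash."""
    h = zlib.crc32(doc_id.encode("utf-8"))  # stand-in hash, not Solr's
    for shard_index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return shard_index
    raise ValueError("hash outside all ranges")


ranges = split_hash_range(2)
shard_for_doc = route("doc-1", ranges)
```

Since the ranges are fixed when the collection is created, a given id always maps to the same shard, which is why the same id should not normally end up on two shards.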

    Re: HTML entities being missed by DIH HTMLStripTransformer

    2013-04-03 Thread Ashok
    Well, the database field has text, sometimes with HTML entities and at other times with html tags. I have no control over the process that populates the database tables with info. -- View this message in context: http://lucene.472066.n3.nabble.com/HTML-entities-being-missed-by-DIH-HTMLStripTra

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Jamie Johnson
    Ah, interesting... so I need to specify numShards, blow out zk and then try this again to see if things work properly now. What is really strange is that for the most part things have worked right and on 4.2.1 I have 600,000 items indexed with no duplicates. In any event I will specify num shards

    Re: SolrCloud not distributing documents across shards

    2013-04-03 Thread Michael Della Bitta
    With earlier versions of Solr Cloud, if there was any error or warning when you made a collection, you likely were set up for "implicit" routing which means that documents only go to the shard you're talking to. What you want is "compositeId" routing, which works how you think it should. Go into t

    Re: HTML entities being missed by DIH HTMLStripTransformer

    2013-04-03 Thread Alexandre Rafalovitch
    Then, I would say, you have a bigger problem. However, you can probably run RegEx filter and replace those known escapes with real characters before you run your HTMLStrip filter. Or run HTMLStrip, RegEx and HTMLStrip again. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ Lin
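The strip, decode, strip-again sequence suggested here can be sketched in a few lines. This is an illustration only, not how DIH or HTMLStripCharFilter are implemented: it uses Python's standard `html.unescape` for entity decoding and a naive regular expression for tag removal, and both function names are hypothetical.

```python
import html
import re

# Naive tag matcher; adequate for a sketch, not for arbitrary HTML.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(text):
    """Remove anything that looks like an HTML tag."""
    return TAG_RE.sub("", text)


def strip_escaped_markup(text):
    """Strip tags, decode entities, then strip again.

    Markup stored as entities (for example &lt;b&gt;) only becomes a
    real tag after decoding, so a single strip pass cannot remove it.
    """
    first_pass = strip_tags(text)
    decoded = html.unescape(first_pass)
    return strip_tags(decoded)


plain = strip_escaped_markup("&lt;b&gt;This is in Bold&lt;/b&gt;")
```

The second pass is what catches markup that was hidden behind entity names in the first pass.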

    Re: Filtering Search Cloud

    2013-04-03 Thread Shawn Heisey
    On 4/3/2013 1:13 PM, Furkan KAMACI wrote: > Shawn, thanks for your detailed explanation. My system will work on high > load. I mean I will always index something and something always will be > queried at my system. That is why I consider about physically separating > indexer and query reply machine

    Re: HTML entities being missed by DIH HTMLStripTransformer

    2013-04-03 Thread Steve Rowe
    Hi Ashok, HTMLStripTransformer uses HTMLStripCharFilter under the hood, and HTMLStripCharFilter converts all HTML entities to their corresponding characters. What version of Solr are you using? My guess is that it only appears that nothing is happening, since when they are presented in a brow

    Re: SolrCloud not distributing documents across shards

    2013-04-03 Thread vsilgalis
    Michael Della Bitta-2 wrote > With earlier versions of Solr Cloud, if there was any error or warning > when you made a collection, you likely were set up for "implicit" > routing which means that documents only go to the shard you're talking > to. What you want is "compositeId" routing, which works

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Jamie Johnson
    answered my own question, it now says compositeId. What is problematic though is that in addition to my shards (which are say jamie-shard1) I see the solr created shards (shard1). I assume that these were created because of the numShards param. Is there no way to specify the names of these shard

    Re: SolrCloud not distributing documents across shards

    2013-04-03 Thread Michael Della Bitta
    If you can work with a clean state, I'd turn off all your shards, clear out the Solr directories in Zookeeper, reset solr.xml for each of your shards, upgrade to the latest version of Solr, and turn everything back on again. Then upload config, recreate your collection, etc. I do it like this, but

    Re: Filtering Search Cloud

    2013-04-03 Thread Furkan KAMACI
    Thanks for your explanation, you explained everything I need. Just one more question. I see that I can not make it with Solr Cloud, but I can do something like that with master-slave replication of Solr. If I use master-slave replication of Solr, can I eliminate (filter) something (something

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Mark Miller
    I had thought you could - but looking at the code recently, I don't think you can anymore. I think that's a technical limitation more than anything though. When these changes were made, I think support for that was simply not added at the time. I'm not sure exactly how straightforward it would

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Jamie Johnson
    ok, so that's not a deal breaker for me. I just changed it to match the shards that are auto created and it looks like things are happy. I'll go ahead and try my test to see if I can get things out of sync. On Wed, Apr 3, 2013 at 3:56 PM, Mark Miller wrote: > I had thought you could - but loo

    Re: HTML entities being missed by DIH HTMLStripTransformer

    2013-04-03 Thread Ashok
    Hi Steve, Fabulous suggestion! Yup, that is it! Using the HTMLStripTransformer twice did the trick. I am using Solr 4.1. Thank you very much! - ashok -- View this message in context: http://lucene.472066.n3.nabble.com/HTML-entities-being-missed-by-DIH-HTMLStripTransformer-tp4053582p4053609.h

    do SearchComponents have access to response contents

    2013-04-03 Thread xavier jmlucjav
    I need to implement some SearchComponent that will deal with metrics on the response. Some things I see will be easy to get, like number of hits for instance, but I am more worried about this: We need to also track the size of the response (as the size in bytes of the whole xml response that is stre

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Jamie Johnson
    with these changes things are looking good, I'm up to 600,000 documents without any issues as of right now. I'll keep going and add more to see if I find anything. On Wed, Apr 3, 2013 at 4:01 PM, Jamie Johnson wrote: > ok, so that's not a deal breaker for me. I just changed it to match the >

    Re: HTML entities being missed by DIH HTMLStripTransformer

    2013-04-03 Thread Steve Rowe
    Cool, glad I was able to help. On Apr 3, 2013, at 4:18 PM, Ashok wrote: > Hi Steve, > > Fabulous suggestion! Yup, that is it! Using the HTMLStripTransformer twice > did the trick. I am using Solr 4.1. > > Thank you very much! > > - ashok > > > > -- > View this message in context: > http:/

    Re: SolrCloud not distributing documents across shards

    2013-04-03 Thread vsilgalis
    Michael Della Bitta-2 wrote > If you can work with a clean state, I'd turn off all your shards, > clear out the Solr directories in Zookeeper, reset solr.xml for each > of your shards, upgrade to the latest version of Solr, and turn > everything back on again. Then upload config, recreate your > co

    Re: Question on Exact Matches - edismax

    2013-04-03 Thread Jan Høydahl
    Can you show us your *_ci field type? Solr does not really have a way to tell whether a match is "exact" or only partial, but you could hack around it with the fieldType. See https://github.com/cominvent/exactmatch for a possible solution. -- Jan Høydahl, search solution architect Cominvent AS

    Re: do SearchComponents have access to response contents

    2013-04-03 Thread Jack Krupansky
    The search components can see the "response" as a namedlist, but it is only when SolrDispatchFilter calls the QueryResponseWriter that XML or JSON or whatever other format (Javabin as well) is generated from the named list for final output in an HTTP response. You probably want a custom query

    Re: Solr Tika Override

    2013-04-03 Thread Jan Høydahl
    You'd probably want to work on the XML output from Tika's PDF parser, from which you can identify which page and context. Personally I would build a separate indexing application in Java and call Tika directly, then build a SolrInputDocument which you pass to solr through SolrJ. I.e. not use Ex

    Re: SolrCloud not distributing documents across shards

    2013-04-03 Thread Michael Della Bitta
    From what I can tell, the Collections API has been hardened significantly since 4.2 and now will refuse to create a collection if you give it something ambiguous to do. So if you upgrade to 4.2, things will become more safe. But overall I'd find a way of using the Collections API that works and s

    Re: SolrCloud not distributing documents across shards

    2013-04-03 Thread Mark Miller
    On Apr 3, 2013, at 5:53 PM, Michael Della Bitta wrote: > From what I can tell, the Collections API has been hardened > significantly since 4.2 I did a lot of work here for 4.2.1 - there was a lot to improve. Hopefully there is much less now, but if anyone finds anything, I'll fix any JIRA's.

    Re: Filtering Search Cloud

    2013-04-03 Thread Shawn Heisey
    On 4/3/2013 1:52 PM, Furkan KAMACI wrote: > Thanks for your explanation, you explained every thing what I need. Just > one more question. I see that I can not make it with Solr Cloud, but I can > do something like that with master-slave replication of Solr. If I use > master-slave replication of So

    Streaming search results

    2013-04-03 Thread Victor Miroshnikov
    Is it possible to stream search results from Solr? Seems that this feature is missing. I see two options to solve this: 1. Using search results pagination feature The idea is to implement a smart proxy that will stream chunks from search results using pagination. 2. Implement Solr plugin with
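The first option above (a proxy that streams chunks by paging through the results) can be sketched roughly as follows. This is only an illustration under assumptions: `fetch_page` is a hypothetical callable, not part of Solr or any client library, standing in for whatever issues the actual request with Solr's `start` and `rows` paging parameters and returns the documents of that page.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical signature: fetch_page(start, rows) returns the documents for
# one page, e.g. by sending a select request with start=<start>&rows=<rows>.
FetchPage = Callable[[int, int], List[Dict]]


def stream_results(fetch_page: FetchPage, rows: int = 100) -> Iterator[Dict]:
    """Yield documents one at a time by walking through result pages."""
    start = 0
    while True:
        docs = fetch_page(start, rows)
        if not docs:
            return  # an empty page means there is nothing left
        yield from docs
        if len(docs) < rows:
            return  # a short page is the last page
        start += len(docs)
```

A caller would iterate with `for doc in stream_results(my_fetch): ...` and never hold more than one page in memory. Note that each page is a separate query, so the result set can shift between pages if the index changes, and large `start` offsets tend to get more expensive.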

    Re: Solr metrics in Codahale metrics and Graphite?

    2013-04-03 Thread Otis Gospodnetic
    Hi, We're using... eh, our SPM for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html (Wunder, I think somebody from Chegg actually looked into using it - please ping if you need more info) Shawn, metrics 3.0.0beta1 is out, apparently very reworked, so might be worth revisi

    RE: Solr Multiword Search

    2013-04-03 Thread skmirch
    Hi James, Thanks for the information you have provided. I tried your suggestion and it helped a lot. However, as close as this seems to what I want, I still need for it to match the exact phrases that closely match my search words. So while I am now using the search words in q and also spellch

    Re: Solr metrics in Codahale metrics and Graphite?

    2013-04-03 Thread Walter Underwood
    That sounds great. I'll check out the bug, I didn't see anything in the docs about this. And if I can't find it with a search engine, it probably isn't there. --wunder On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote: > On 3/29/2013 12:07 PM, Walter Underwood wrote: >> What are folks using for t

    Re: Solr metrics in Codahale metrics and Graphite?

    2013-04-03 Thread Otis Gospodnetic
    It's there! :) http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue Otis -- Solr & ElasticSearch Support http://sematext.com/ On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood wrote: > That sounds great. I'll check out the bug, I didn't see anything in the docs > about this.

    Re: Solr metrics in Codahale metrics and Graphite?

    2013-04-03 Thread Walter Underwood
    In the Jira, but not in the docs. It would be nice to have VM stats like GC, too, so we can have common monitoring and alerting on all our services. wunder On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote: > It's there! :) > http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=iss

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Jamie Johnson
    just an update, I'm at 1M records now with no issues. This looks promising as to the cause of my issues, thanks for the help. Is the routing method with numShards documented anywhere? I know numShards is documented but I didn't know that the routing changed if you don't specify it. On Wed, Apr

    RE: Solr Multiword Search

    2013-04-03 Thread skmirch
    The following query is doing a word search (based on my previous post)... solr/spell?q=(charles+and+the+choclit+factory+OR+(title2:("charles+and+the+choclit+factory")))&spellcheck.collate=true&spellcheck=true&spellcheck.q=charles+and+the+choclit+factory It produces a lot of unwanted matches.
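For anyone reproducing a request like the one above from a script, here is a small sketch of how the query string could be assembled so that spaces, quotes, and parentheses are escaped consistently. The helper name `build_spell_query` and its `field` argument are made up for illustration; the parameter names (`q`, `spellcheck`, `spellcheck.collate`, `spellcheck.q`) and the `title2` field are taken from the URL in the message.

```python
from urllib.parse import urlencode


def build_spell_query(phrase: str, field: str = "title2") -> str:
    """Build the query string for a word search OR'ed with a phrase match on `field`."""
    # Same shape as the q parameter in the message: (words OR (field:("phrase")))
    q = '({0} OR ({1}:("{0}")))'.format(phrase, field)
    params = {
        "q": q,
        "spellcheck": "true",
        "spellcheck.collate": "true",
        "spellcheck.q": phrase,
    }
    # urlencode escapes spaces as '+' and percent-encodes quotes and parentheses.
    return urlencode(params)
```

The returned string would be appended after `solr/spell?`. This only rebuilds the same request; it does not change which documents match.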

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Jamie Johnson
    I am occasionally seeing this in the log, is this just a timeout issue? Should I be increasing the zk client timeout? WARNING: Overseer cannot talk to ZK Apr 3, 2013 11:14:25 PM org.apache.solr.cloud.DistributedQueue$LatchChildWatcher process INFO: Watcher fired on path: null state: Expired type

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Mark Miller
    Yeah. Are you using the concurrent low pause garbage collector? This means the overseer wasn't able to communicate with zk for 15 seconds - due to load or gc or whatever. If you can't resolve the root cause of that, or the load just won't allow for it, next best thing you can do is raise it to 3

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Mark Miller
    This shouldn't be a problem though, if things are working as they are supposed to. Another node should simply take over as the overseer and continue processing the work queue. It's just best if you configure so that session timeouts don't happen unless a node is really down. On the other hand, i

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Jamie Johnson
    I am not using the concurrent low pause garbage collector, I could look at switching, I'm assuming you're talking about adding -XX:+UseConcMarkSweepGC correct? I also just had a shard go down and am seeing this in the log SEVERE: org.apache.solr.common.SolrException: I was asked to wait on state

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Mark Miller
    On Apr 3, 2013, at 8:17 PM, Jamie Johnson wrote: > I am not using the concurrent low pause garbage collector, I could look at > switching, I'm assuming you're talking about adding -XX:+UseConcMarkSweepGC > correct? Right - if you don't do that, the default is almost always the throughput coll

    hl.usePhraseHighlighter defaults to true but Query form and wiki suggest otherwise

    2013-04-03 Thread Timothy Potter
    Minor issues - It seems that the hl.usePhraseHighlighter is enabled by default, which definitely makes sense but the wiki says its default value is "false" and the checkbox is unchecked by default on the Query form. This gives the impression this parameter defaults to "false". I'm assuming the co

    Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

    2013-04-03 Thread Jamie Johnson
    Thanks I will try that. On Wed, Apr 3, 2013 at 8:28 PM, Mark Miller wrote: > > > On Apr 3, 2013, at 8:17 PM, Jamie Johnson wrote: > > > I am not using the concurrent low pause garbage collector, I could look > at > > switching, I'm assuming you're talking about adding > -XX:+UseConcMarkSweepGC
