indexing fails after 40/ 50 million data being indexed - single zookeeper

2013-08-19 Thread Santanu8939967892
Hi, We need to index a huge data set of 250 million records, and we are using DIH to index the data in batches of 10 million. We are using Solr 4.4 in SolrCloud mode with one ZooKeeper and two collections, each with two shards, on two Tomcat servers. But the indexing fails after 40/

Re: Use case of Spatial search

2013-08-19 Thread David Smiley (@MITRE.org)
Shishir, Use the location_rpt type and index a circle for each business, covering the distance it serves, with this syntax: Circle(lat,lon d=degreesRadius). Your query shape is then simply a point; use the bbox query parser with d=0. This approach should scale *great* at query time. Erick suggested using fun
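
A minimal Python sketch of this advice, assuming a hypothetical location_rpt field named service_area and Solr's default spherical model, where one degree of arc is roughly 111.195 km (mean earth radius 6371.0087714 km). It converts a service radius in km to the degreesRadius that the Circle syntax expects and builds the point query; q, fq, pt and d are standard parameters, but the field name and helper functions are illustrative only.

```python
import math

# Assumption: Solr's default sphere, mean earth radius in km.
EARTH_MEAN_RADIUS_KM = 6371.0087714
KM_PER_DEGREE = 2 * math.pi * EARTH_MEAN_RADIUS_KM / 360  # about 111.195 km


def km_to_degrees(km: float) -> float:
    """Convert a distance along the earth's surface in km to degrees of arc."""
    return km / KM_PER_DEGREE


def circle_shape(lat: float, lon: float, radius_km: float) -> str:
    """Build the Circle(lat,lon d=degreesRadius) value to index for a business."""
    return f"Circle({lat},{lon} d={km_to_degrees(radius_km):.4f})"


def point_query_params(lat: float, lon: float, field: str = "service_area") -> dict:
    """Query with the user's point: bbox query parser with d=0 on the RPT field."""
    return {"q": "*:*", "fq": f"{{!bbox sfield={field}}}", "pt": f"{lat},{lon}", "d": "0"}


# Usage: index circle_shape(35.0, 35.0, 10) for a business serving 10 km,
# then search with point_query_params(user_lat, user_lon).
```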

Re: spatial search, geofilt does not work

2013-08-19 Thread David Smiley (@MITRE.org)
Thank goodness for Solr's feature of echoing params back in the response, as it helps diagnose problems like this. In your case, the filter query that Solr is seeing isn't what you seem to have given on the command line: "fq":"!geofilt sfield=author_geo". Clearly wrong. Try escaping the braces
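
One likely culprit (an assumption, not confirmed in the thread) is curl's URL globbing: curl treats {...} as a set to expand, which would strip the braces in the way the echoed fq shows. Passing -g (--globoff) to curl, or percent-encoding the braces, avoids this. The Python sketch below takes the second route with urllib.parse.urlencode; the parameters mirror the original query, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def geofilt_url(base: str, field: str, lat: float, lon: float, km: float) -> str:
    """Build a select URL whose local-params braces are percent-encoded."""
    params = {
        "q": "*:*",
        "fq": f"{{!geofilt sfield={field}}}",
        "pt": f"{lat},{lon}",
        "d": str(km),
        "wt": "json",
    }
    # urlencode escapes {, !, =, } and turns the space into +, so nothing
    # is left for curl's globbing to expand.
    return f"{base}/select?{urlencode(params)}"


# Usage: pass the result to curl, or fetch it with urllib.request.urlopen.
url = geofilt_url("http://localhost/solr", "author_geo", 35.0, 35.0, 10)
```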

Re: SOLR4 Spatial sorting and query string

2013-08-19 Thread David Smiley (@MITRE.org)
This is a known limitation. From CHANGES.txt: * SOLR-2345: Enhanced geodist() to work with an RPT field, provided that the field is referenced via 'sfield' and the query point is constant. (David Smiley) The reason why that limitation is there relates to the fact that the function query parse

Re: Custom Sort(0.2*relervanceScore + 0.8*numberic_field_value)

2013-08-19 Thread 刘健
Thank you very much! Then could you tell me how to implement relevance_score*numeric_field/(relevance_score + numeric_field)? I think it's better to sort by the harmonic mean. -- Original -- From: "Jack Krupansky"; Date: Tue, Aug 20, 2013 10:47 AM To: "solr-u
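
The requested expression is half of the harmonic mean 2ab/(a+b), so the two rank documents identically. A minimal sketch, assuming the numeric field is a hypothetical popularity field and that the main query is repeated in a hypothetical qq parameter so that the query() function can read its score; the sort string is untested against a live index.

```python
def ratio_score(relevance: float, value: float) -> float:
    """relevance*value/(relevance+value), i.e. half the harmonic mean 2ab/(a+b)."""
    total = relevance + value
    if total == 0:
        return 0.0  # avoid dividing by zero when both inputs are 0
    return relevance * value / total


# Hypothetical Solr sort for the same formula; qq repeats the main query and
# popularity stands in for the numeric field. Untested.
SORT_EXPR = "div(product(query($qq),popularity),sum(query($qq),popularity)) desc"

# (relevance score, field value) per document
docs = {"a": (1.0, 3.0), "b": (2.0, 2.0), "c": (4.0, 0.5)}
ranked = sorted(docs, key=lambda d: ratio_score(*docs[d]), reverse=True)
```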

Re: Custom Sort(0.2*relervanceScore + 0.8*numberic_field_value)

2013-08-19 Thread Jack Krupansky
Edismax applies the multiplicative boost ("boost") after applying the additive boost functions ("bf"). I think (0.2*relevance score + 0.8*specified_numeric_field) should be equivalent to: 0.2*(relevance score + (0.8/0.2)*specified_numeric_field) or 0.2*(relevance score + 4.0*specified_nu
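
Writing s for the relevance score and f for the numeric field value, the rearrangement is:

```latex
0.2\,s + 0.8\,f = 0.2\left(s + \frac{0.8}{0.2}\,f\right) = 0.2\,(s + 4.0\,f)
```

Because multiplying every document's score by the same positive constant 0.2 does not change the order, ranking by s + 4.0f should in principle give the same order. Lucene's query normalization may scale an additive bf term, though, so this is a starting point to verify with debugQuery rather than an exact equivalence.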

Custom Sort(0.2*relervanceScore + 0.8*numberic_field_value)

2013-08-19 Thread 刘健
Hello: I want to get the final search results sorted by (0.2*relevance score + 0.8*specified_numeric_field). I know that if I use "bf" in edismax (e.g. bf=field(value)), I can get results sorted by (relevance score + field(value)), but I don't know how to implement the results sorted b

What filter to use to search with spaces omitted/included between words?

2013-08-19 Thread Utkarsh Sengar
I have a field which consists of a store name. How can I make sure that these queries return relevant results when searched against this column: *Example1: "Best Buy"* q=best (tokenizer filter makes this work) q=bestbuy q=buy (tokenizer filter makes this work) q=best buy (lower case filter makes t
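
One common approach, not taken from this thread, is to also index adjacent words concatenated with no separator, so "best buy" yields "bestbuy" as well; Solr's ShingleFilterFactory can do this with tokenSeparator="". The Python sketch below only emulates that token output to show why q=bestbuy would then match; the function name is hypothetical.

```python
def concatenated_shingles(tokens: list[str], max_size: int = 2) -> list[str]:
    """Emit the original tokens plus runs of up to max_size adjacent tokens
    joined with an empty separator (like tokenSeparator="")."""
    out = list(tokens)
    for size in range(2, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.append("".join(tokens[start:start + size]))
    return out


# "Best Buy" after lowercasing and whitespace tokenizing:
indexed = concatenated_shingles(["best", "buy"])
```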

Re: get term frequency, just only keywords search

2013-08-19 Thread Jack Krupansky
The Lucene PhraseQuery goes through a lot of effort to calculate "phrase frequency" (phraseFreq) - but that is not the same as term frequency (don't confuse terms and phrases). Feel free to pick that number out of the debugQuery output, or from the XML variant of the explain output. For refere
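
A toy illustration of the distinction, assuming whitespace tokens and exact (slop 0) phrases: term frequency counts one term, while phrase frequency counts the positions where the whole phrase occurs. Lucene's real phraseFreq for sloppy phrases is weighted differently, so this is a simplification.

```python
def term_freq(tokens: list[str], term: str) -> int:
    """Occurrences of a single term."""
    return tokens.count(term)


def exact_phrase_freq(tokens: list[str], phrase: list[str]) -> int:
    """Positions where the whole phrase occurs (slop 0)."""
    n = len(phrase)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)


tokens = "french fries and french toast".split()
# "french" occurs twice, but the phrase "french fries" only once.
```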

Re: custom hashing across cloud & shards

2013-08-19 Thread Erick Erickson
Right, you can't just tell Solr to create a single shard (i.e. by not specifying numShards) and then expect to be able to do anything except index to a single shard. All the nodes will be replicas of the single shard. From there it really doesn't matter what you do, the documents will be routed to all

Re: Percolate feature?

2013-08-19 Thread Chris Hostetter
: Let's talk about the real use case. We are a marketplace that sells : products that users have listed. For certain popular, high-risk or : restricted keywords we charge the seller an extra fee/ban the listing. : We now have sellers purposely misspelling their listings to circumvent : this fee.

Re: get term frequency, just only keywords search

2013-08-19 Thread Erick Erickson
There is a series of functions that can deal with _some_ relevance data, see: http://wiki.apache.org/solr/FunctionQuery#Relevance_Functions Best Erick On Mon, Aug 19, 2013 at 10:25 AM, danielitos85 wrote: > ok I understand it (thanks) but if I search a sentence and type > "debugQuery=on", in th

Re: Use case of Spatial search

2013-08-19 Thread Erick Erickson
I think you can do this by a combination of standard function queries, see: http://wiki.apache.org/solr/FunctionQuery#if and geodist, see: http://wiki.apache.org/solr/SpatialSearch#geodist_-_The_distance_function WARNING: I haven't tried this myself, but it seems like it would work. The trick is t
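
The per-document test that such a function-query approach would have to evaluate is whether the distance from a business to the user is within that business's service radius. A minimal Python sketch using the haversine formula with an assumed mean earth radius of 6371.0087714 km; the radius-per-business model is an assumption about the use case, and the Solr filter string in the final comment is an untested guess built from frange, sub and geodist().

```python
import math

EARTH_MEAN_RADIUS_KM = 6371.0087714  # assumed spherical earth


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_MEAN_RADIUS_KM * math.asin(math.sqrt(a))


def serves(biz_lat: float, biz_lon: float, radius_km: float,
           user_lat: float, user_lon: float) -> bool:
    """True if the user lies inside the business's service radius."""
    return haversine_km(biz_lat, biz_lon, user_lat, user_lon) <= radius_km


# Untested Solr idea with a hypothetical per-document field service_radius_km:
# fq={!frange l=0}sub(service_radius_km,geodist())&sfield=...&pt=user_lat,user_lon
```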

Re: Facing Solr performance during query search

2013-08-19 Thread Erick Erickson
Not until you tell us a lot more about your symptoms. What are your replication intervals? autowarm settings? how are you measuring "drastic" reductions? What have you tried in terms of diagnosing the problem? Please review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Mon, Aug 1

custom hashing across cloud & shards

2013-08-19 Thread Katie McCorkell
Hey All, If you don't specify numShards at the start, then you can do custom hashing, because Solr will just write the document to whatever shard you send it to. However, when I don't specify numShards, I'm having trouble creating more than one shard. It makes one shard and the others I add are s

RE: Regarding mointoring the solr

2013-08-19 Thread Boogie Shafer
Thanks for that. That URL, with the core name explicitly called out, seems to work correctly on both 4.4 (using the new-style config for solr.xml) and 4.2.1 (using the old-style config.xml) From: Shawn Heisey Sent: Monday, August 19, 2013 10:24 To: solr-u

Re: spatial search, geofilt does not work

2013-08-19 Thread Mingfeng Yang
BTW: my schema.xml contains the following related lines. On Mon, Aug 19, 2013 at 2:02 PM, Mingfeng Yang wrote: > My solr index has a field called "author_geo" which contains the author's > location, and when I am trying to get all docs whose authors are within 10 > km of 35.0,35.0 using the f

spatial search, geofilt does not work

2013-08-19 Thread Mingfeng Yang
My solr index has a field called "author_geo" which contains the author's location, and when I am trying to get all docs whose authors are within 10 km of 35.0,35.0 using the following query. curl 'http://localhost/solr/select?q=*:*&fq={!geofilt%20sfield=author_geo}&pt=35.0,35.0&d=10&wt=json&inden

Re: Regarding mointoring the solr

2013-08-19 Thread Shawn Heisey
On 8/19/2013 11:10 AM, Boogie Shafer wrote: the not-often-mentioned stats URL is another interface which you could scrape for stats (although I just noticed this URL doesn't seem to work in my 4.4.0 test environment (it does work on the 4.2.1 hosts), so something may have changed, or my 4.4 env

Re: Prevent Some Keywords at Analyzer Step

2013-08-19 Thread Dan Davis
This is an interesting topic - my employer is a medical library and there are many keywords that may need to be aliased in various ways, and 2- or 3-word phrases that perhaps should be treated specially. Jack, can you give me an example of how to do that sort of thing? Perhaps I need to buy you

RE: Regarding mointoring the solr

2013-08-19 Thread Boogie Shafer
Re: monitoring performance trends, a free, lightweight option we use for collecting general Java stats from Solr is the sFlow agent for Java. In concert with a host sflowd setup, you can gather JVM and system stats at decently dense intervals (the default is 30s)

Re: Issue in Swap Space display at Solr Admin

2013-08-19 Thread Stefan Matheis
Vladimir, would you mind attaching the output of /solr/admin/system?wt=json ? The last 20 or so lines should be enough; I'm only interested in the "system" key, which contains the memory information. Is that completely missing, or literally 0? - Stefan On Monday, August 19, 2013 at 1:

Re: Prevent Some Keywords at Analyzer Step

2013-08-19 Thread Jack Krupansky
Okay, but what is it that you are trying to "prevent"?? And, "diet follower" is a phrase, not a keyword or term. So, I'm still baffled as to what you are really trying to do. Try explaining it in plain English. And given this same input, how would it be queried? -- Jack Krupansky -Or

Re: Prevent Some Keywords at Analyzer Step

2013-08-19 Thread Furkan KAMACI
Let's assume that my sentence is that: *Alice is a diet follower* My special keyword => *diet follower* Tokens will be: Token 1) Alice Token 2) is Token 3) a Token 4) diet Token 5) follower Token 6) *diet follower* 2013/8/19 Jack Krupansky > Your example doesn't "prevent" any keywords. > >
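
A minimal Python sketch of the token stream described here: the original whitespace tokens plus one extra token for each occurrence of a configured multi-word keyword. The keyword list and function are hypothetical; in Solr this would need a custom TokenFilter or a synonym- or shingle-based analysis chain, and which of those fits is still an open question in this thread.

```python
# Hypothetical list of multi-word keywords that should also become one token.
SPECIAL_PHRASES = [("diet", "follower")]


def tokenize_with_keywords(text: str, phrases=SPECIAL_PHRASES) -> list[str]:
    """Whitespace tokens, then one extra token per keyword-phrase occurrence."""
    tokens = text.split()
    lowered = [t.lower() for t in tokens]
    out = list(tokens)
    for phrase in phrases:
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(lowered[i:i + n]) == phrase:
                out.append(" ".join(tokens[i:i + n]))
    return out
```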

Re: get term frequency, just only keywords search

2013-08-19 Thread danielitos85
ok I understand it (thanks), but if I search a sentence and set "debugQuery=on", in the explain output I get termFreq=2.0, and that's right. Is it possible to obtain that value? -- View this message in context: http://lucene.472066.n3.nabble.com/get-term-frequency-just-only-keywords-search-tp40845

Re: get term frequency, just only keywords search

2013-08-19 Thread Jack Krupansky
"Term frequency" is about "terms", nothing else. So, by definition, a phrase or any other collection of terms does not have a "termfreq" - in Lucene. -- Jack Krupansky -Original Message- From: danielitos85 Sent: Monday, August 19, 2013 9:59 AM To: solr-user@lucene.apache.org Subject

Re: Penalizing absent words

2013-08-19 Thread Erik Hatcher
You could penalize by boosting documents that have the term. Can you give a concrete example? How dynamic is the "absent words" list? Erik On Aug 19, 2013, at 09:26 , Rafael Calsaverini wrote: > Hi there, > > is there a way to penalize a document's score for lacking a particular > te
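
A minimal sketch of why boosting can stand in for a penalty, assuming Lucene's classic idf, 1 + ln(numDocs/(docFreq+1)): subtracting idf from documents that lack a term and adding idf to documents that have it differ by the same constant for every document, so they produce the same ranking. In Solr the boosting side could be expressed with the edismax bq parameter; the function names and sample scores here are illustrative.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Lucene classic (DefaultSimilarity) idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def penalized(score: float, has_term: bool, idf: float) -> float:
    """What was asked for: subtract idf when the term is absent."""
    return score if has_term else score - idf


def boosted(score: float, has_term: bool, idf: float) -> float:
    """What boosting does: add idf when the term is present."""
    return score + idf if has_term else score


idf = classic_idf(doc_freq=9, num_docs=1000)  # 1 + ln(100)
# (base score, document has the term)
docs = {"d1": (1.0, True), "d2": (3.0, False), "d3": (2.5, True), "d4": (9.0, False)}
by_penalty = sorted(docs, key=lambda d: penalized(*docs[d], idf), reverse=True)
by_boost = sorted(docs, key=lambda d: boosted(*docs[d], idf), reverse=True)
```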

Re: Solr 4.3 and above core swap

2013-08-19 Thread richardg
I commented out the lockType, so it should be using the default of native according to the documentation. There is nothing special about our file systems. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-3-and-above-core-swap-tp4084794p4085456.html Sent

Re: Create term vector from text

2013-08-19 Thread Jack Krupansky
The Solr Terms Component will give you the terms in the index and the document frequency of each. https://cwiki.apache.org/confluence/display/solr/The+Terms+Component -- Jack Krupansky -Original Message- From: Domma, Achim Sent: Monday, August 19, 2013 3:09 AM To: solr-user@lucene.ap
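
If the goal is tf-idf for arbitrary text without indexing it, one option (an assumption, not something stated in the reply) is to tokenize the text yourself and take document frequencies from the Terms Component. A minimal sketch using Lucene's classic weighting, tf = sqrt(freq) and idf = 1 + ln(numDocs/(docFreq+1)); the doc_freqs dictionary stands in for Terms Component output, and the whitespace tokenizer is a placeholder for the field's real analyzer.

```python
import math
from collections import Counter


def tfidf_vector(text: str, doc_freqs: dict[str, int], num_docs: int) -> dict[str, float]:
    """Classic Lucene-style weights for text that is never indexed.

    doc_freqs stands in for Terms Component output (term -> document frequency).
    """
    counts = Counter(text.lower().split())  # placeholder for the real analyzer
    vector = {}
    for term, freq in counts.items():
        idf = 1.0 + math.log(num_docs / (doc_freqs.get(term, 0) + 1))
        vector[term] = math.sqrt(freq) * idf
    return vector


vec = tfidf_vector("Solr solr Lucene", {"solr": 9, "lucene": 99}, num_docs=1000)
```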

Re: get term frequency, just only keywords search

2013-08-19 Thread danielitos85
Isn't there a way to get termFreq for a search like "french fries" (a sentence)? -- View this message in context: http://lucene.472066.n3.nabble.com/get-term-frequency-just-only-keywords-search-tp4084510p4085454.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: get term frequency, just only keywords search

2013-08-19 Thread Jack Krupansky
"french fries" is a phrase, not a term or a keyword. It consists of two terms or keywords, "french" and "fries". They have to be treated separately. -- Jack Krupansky -Original Message- From: danielitos85 Sent: Monday, August 19, 2013 4:30 AM To: solr-user@lucene.apache.org Subject: R
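
Treating the terms separately could look like this: one termfreq() pseudo-field per term in fl. termfreq(field,term) is a Solr 4 function query; the field name text and the tf_ aliases are hypothetical, and the terms should be passed in their indexed form (for example lowercased), since termfreq may not run the field's analyzer on its argument.

```python
def termfreq_fl(field: str, phrase: str) -> str:
    """Build an fl value with one termfreq() pseudo-field per term."""
    parts = ["id"]
    for term in phrase.split():
        parts.append(f"tf_{term}:termfreq({field},'{term}')")
    return ",".join(parts)


fl = termfreq_fl("text", "french fries")
# -> id,tf_french:termfreq(text,'french'),tf_fries:termfreq(text,'fries')
```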

Re: Problems installing Solr4 in Jetty9

2013-08-19 Thread Steve Rowe
https://issues.apache.org/jira/browse/SOLR-5173 On Aug 18, 2013, at 8:43 PM, Steve Rowe wrote: > bq. I thought that when Steve moved it from the test module to the core, he > handled it so that it would not go out in the dist. > > Mea culpa. > > @Chris Collins, I think you're talking about Ma

Re: State sharing

2013-08-19 Thread Jack Krupansky
Generally, you shouldn't be trying to maintain, let alone share "state" in Solr itself. It sounds like you need an application layer between your application clients and Solr which could then maintain whatever state it needs. -- Jack Krupansky -Original Message- From: Peyman Faratin

Re: struggling with solr.WordDelimiterFilterFactory

2013-08-19 Thread vicky desai
Hi Aloke, After taking the schema.xml and solrconfig.xml with the changes you mentioned, it worked fine. However, simply making these changes in schema.xml doesn't work. So it seems like there is an issue with some configuration in solrconfig.xml. I will figure that out and post it here. Anyways thanks a lo

Re: Negation words

2013-08-19 Thread Jack Krupansky
Solr has tools for lexical analysis of text, not deeper syntax and semantics of text. IOW, Solr supports "keyword search", not "natural language search". -- Jack Krupansky -Original Message- From: venkatesham.gu...@igate.com Sent: Monday, August 19, 2013 8:38 AM To: solr-user@lucene.

Re: Prevent Some Keywords at Analyzer Step

2013-08-19 Thread Jack Krupansky
Your example doesn't "prevent" any keywords. You need to elaborate the specific requirements with more detail. Given a long stream of text, what tokenization do you expect in the index? -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Monday, August 19, 2013 8:07 AM To

Re: Indexing an XML file in Apache Solr

2013-08-19 Thread Michael Sokolov
Abhiroop, I'm cc-ing the lux mailing list since this thread might not be of interest to all of solr-user; I'd suggest following up on that list. But to answer your actual question: see the documentation here http://luxdb.org/REST-API.html#LuxUpdateProcessor where it explains what to do. Basic

Re: struggling with solr.WordDelimiterFilterFactory

2013-08-19 Thread Aloke Ghoshal
Location of the schema.xml: http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1/solr/example/solr/collection1/conf/schema.xml On Mon, Aug 19, 2013 at 6:52 PM, Aloke Ghoshal wrote: > Here you go, it is the default 4.2.1 schema.xml ( > http://svn.apache.org/repos/asf/lucene/dev/tags

Penalizing absent words

2013-08-19 Thread Rafael Calsaverini
Hi there, is there a way to penalize a document's score for lacking a particular term? It would be quite nice if I could add a negative term to the score, which is proportional to the idf of a word that is not present in a given field of that document. Thanks for your time, Rafael Calsaverini D

Re: struggling with solr.WordDelimiterFilterFactory

2013-08-19 Thread Aloke Ghoshal
Here you go, it is the default 4.2.1 schema.xml ( http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1/solr/example/solr/solr.xml), with the following additions: Test with the field *ContTest*. Regards, Aloke On Mon,

Re: Negation words

2013-08-19 Thread Raymond Wiker
wheezed AND NOT "not wheezed" or +wheezed -"not wheezed" perhaps? Note: this assumes that you meant to search with the keyword "wheezed" and not "wheeze". On Mon, Aug 19, 2013 at 2:38 PM, venkatesham.gu...@igate.com < venkatesham.gu...@igate.com> wrote: > I am searching with a keyword and if
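
A minimal Python simulation of what that query would do, assuming simple lowercase word tokens: a document matches when it contains the term and does not contain the phrase "not <term>". As the last line shows, this also excludes a document that mentions the term both with and without negation, which may or may not be acceptable.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase word tokens; a stand-in for the field's analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(tokens: list[str], phrase: list[str]) -> bool:
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def matches(text: str, term: str) -> bool:
    """Simulates +term -"not term"."""
    tokens = words(text)
    return term in tokens and not contains_phrase(tokens, ["not", term])


negated = matches("I have not wheezed since I have been taking Spiriva.", "wheezed")  # False
plain = matches("He wheezed all night.", "wheezed")  # True
mixed = matches("She wheezed on Monday but has not wheezed since.", "wheezed")  # False
```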

Re: struggling with solr.WordDelimiterFilterFactory

2013-08-19 Thread vicky desai
Hi Aloke, I have multiple fields in my schema which are of type text. I tried the same case on all the fields. It's not working for me on any of them. If possible, can you please post your dummy solrconfig.xml and schema.xml? I can replace them and check -- View this message in context: http:/

Re: struggling with solr.WordDelimiterFilterFactory

2013-08-19 Thread Aloke Ghoshal
Hi Vicky, Please check whether you have a second "multiValued" field by the name "content" defined in your schema.xml. It is typically part of the default schema definition & is different from the one you initially posted, which had "Content" with a capital C. Here's the debugQuery on my system (wit

Use case of Spatial search

2013-08-19 Thread Shishir Jain
Hi, I have a very standard use case of Spatial search. Was trying to figure out how to do it in Solr, but couldn't figure out a standard way of doing it. Please point me to any document which explains this use case or how this specific use case can be implemented in Solr. The Use case is: There

Negation words

2013-08-19 Thread venkatesham.gu...@igate.com
I am searching with a keyword, and if that keyword is attached to a negation (not, could not, etc.) in the document, that document should not be matched. For example, I have a document text like "I have not wheezed since I have been taking Spiriva." I am searching with a keyword "wheeze" should not

Facing Solr performance during query search

2013-08-19 Thread sivaprasad
Hi, Last week we configured a Solr master and slave setup. All the Solr search requests are routed to the slave. After this configuration, we are seeing drastic performance problems with Solr. Can anyone explain what the reason would be? And, how to disable optimizing the index, warming the searcher

Re: Regarding mointoring the solr

2013-08-19 Thread sivaprasad
You can look at this tool -- View this message in context: http://lucene.472066.n3.nabble.com/Regarding-mointoring-the-solr-tp4085392p4085423.html Sent from the Solr - User mailing list archive at Nabble.com.

Prevent Some Keywords at Analyzer Step

2013-08-19 Thread Furkan KAMACI
Hi; I want to write an analyzer that will prevent some special words. For example, if the sentence to be indexed is: diet follower, it should tokenize it like this: token 1) diet token 2) follower token 3) diet follower. How can I do that with Solr?

Re: struggling with solr.WordDelimiterFilterFactory

2013-08-19 Thread vicky desai
Hi, Another observation while testing: docs with the following values for the content field 1. content:speedPost 2. content:sPeedpost 3. content:speEdpost 4. content:speedposT all match the query q=content:speedPost. So basically, if there is exactly one upper-case letter anywhere in the word, then it m
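
A simplified model of WordDelimiterFilter's splitOnCaseChange (split at each lowercase-to-uppercase transition) shows how differently these variants split. If catenateWords is enabled and a lowercase filter follows, every variant also yields the catenated token "speedpost", which might explain the matches; that is a guess about this schema, not something the thread confirms. The function name is hypothetical.

```python
def split_on_case_change(word: str) -> list[str]:
    """Split at each lowercase-to-uppercase transition (simplified WDF rule)."""
    parts, start = [], 0
    for i in range(1, len(word)):
        if word[i - 1].islower() and word[i].isupper():
            parts.append(word[start:i])
            start = i
    parts.append(word[start:])
    return parts


VARIANTS = ["speedPost", "sPeedpost", "speEdpost", "speedposT"]
# Each variant splits differently, but all catenate and lowercase to "speedpost".
catenated = {w: "".join(split_on_case_change(w)).lower() for w in VARIANTS}
```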

Re: Regarding mointoring the solr

2013-08-19 Thread Ados1984
Not sure of any Solr-specific tool, but you can use JProfiler to see what is causing the delay under the hood. Andy, On Aug 19, 2013, at 3:08 AM, prabu palanisamy wrote: > Hi > > My solr 3.5.0 indexed by wikipedia dump with java 1.6 is working perfectly. > I run the solr server in my server CentOS

SolrCloud Zookeeper Exception

2013-08-19 Thread Prasi S
Hi, I have set up SolrCloud with version 4.4. There is one external ZooKeeper and two instances of Solr (4 shards in total, 2 shards in each instance). I was using DIH to index from SQL Server. It was indexing fine initially. Later, when I shut down Solr and ZooKeeper and then restarted them, I get t

Issue in Swap Space display at Solr Admin

2013-08-19 Thread Vladimir Vagaitsev
Hi, I've found an issue in the display of Swap Space on the Solr Admin page. When swap is not used, the admin page shows NaN percent usage. Since used and total space are stored in double variables, dividing the used space (0.0Mb) by the total space (0.0Mb) gives NaN. Maybe it's
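
A minimal sketch of the guard this report suggests. In Java or JavaScript, 0.0/0.0 evaluates to NaN, while Python raises ZeroDivisionError, so either way the total needs checking before dividing. Showing 0% when no swap is configured is a design choice assumed here, not Solr's actual fix.

```python
def usage_percent(used_mb: float, total_mb: float) -> float:
    """Percentage of swap in use; 0.0 when no swap is configured."""
    if total_mb <= 0:
        return 0.0  # Java/JavaScript would yield NaN here; Python would raise
    return 100.0 * used_mb / total_mb
```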

Re: struggling with solr.WordDelimiterFilterFactory

2013-08-19 Thread vicky desai
Hi Erik, These are the request handlers defined in solrconfig.xml -- View this message in context: http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085417.html Sent from the Solr - User mailing list arc

Re: struggling with solr.WordDelimiterFilterFactory

2013-08-19 Thread Erick Erickson
Well, the case of your parsedQuery field _name_ (i.e. content) does not match the case of your field definition, (i.e. Content). This may just be an artifact however. That said, the MultiPhraseQuery is probably coming from your request handler definition. Can we see that too? Erick On Mon, Aug

Re: Indexing an XML file in Apache Solr

2013-08-19 Thread Abhiroop
Funnily enough, just today I was looking at Lux for searching through my XML file. Now, what I have inferred is that I need to format my XML to fit the format of Solr. Do I have to code that manually, or is there some kind of parser that formats the XML into the Solr format when it is fed in? I couldn

Re: Share splitting at 23 million documents -> OOM

2013-08-19 Thread Bastian Mathes
Hi Greg, I am a colleague of Harald and had a look at his experiments last week. You are right, unpacking a fresh Solr 4.4, feeding a small number of documents (in my case 144) and trying to split the shard is not working. I get the same error message ("maxValue must be non-negative") that was di

Re: struggling with solr.WordDelimiterFilterFactory

2013-08-19 Thread vicky desai
Hi, I have created a new index, so reindexing shouldn't be the issue. The Analysis page shows me the correct result, and a match should be found according to the analysis page, but there is no output on the actual query. The output of the debug query is as follows: content:speedPost content:speedPost MultiPhraseQuery(content:"(speedp

Re: Version Conflict on Atomic Update

2013-08-19 Thread Syao Work
Your _version_ does not match. On Fri, Aug 9, 2013 at 7:08 PM, Bruno René Santos wrote: > Using the document interface on the Solr admin i try to update the > following document: > > { "responseHeader": { "status": 0, "QTime": 1, "params": { "indent": > "true", > "q": "*:*", "_": "1376064413493"
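
For reference, Solr 4's optimistic concurrency rules for _version_ on an update are: 0 or absent means no check, 1 means the document must exist, a negative value means it must not exist, and any value above 1 must match the stored version exactly; otherwise Solr rejects the update with HTTP 409. A minimal Python model of those rules; the function is illustrative and not part of any Solr client.

```python
from typing import Optional


def version_conflict(requested: Optional[int], existing: Optional[int]) -> Optional[str]:
    """Return None if the update may proceed, else why Solr would reject it.

    existing is the stored _version_, or None if the document does not exist.
    """
    if requested is None or requested == 0:
        return None  # no version check
    if requested == 1:
        return "document must exist" if existing is None else None
    if requested < 0:
        return "document must not exist" if existing is not None else None
    if requested != existing:
        return f"version mismatch: sent {requested}, stored {existing}"
    return None
```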

Re: get term frequency, just only keywords search

2013-08-19 Thread danielitos85
Thanks Jack, but what if my keyword search is two words, for example "french fries"? What is the right syntax? -- View this message in context: http://lucene.472066.n3.nabble.com/get-term-frequency-just-only-keywords-search-tp4084510p4085399.html Sent from the Solr - User mailing list archive at N

Re: State sharing

2013-08-19 Thread Shalin Shekhar Mangar
As you noted, sharing S's ResponseBuilder is not possible because once the handler's process method is complete, the http action is deemed complete and the pipe is broken. You cannot send the client any further responses anymore. One way to solve this problem is to maintain the state of the job in

Re: Giving OpenSearcher as false

2013-08-19 Thread Shalin Shekhar Mangar
Comments inline: On Mon, Aug 19, 2013 at 12:20 PM, Prasi S wrote: > Hi, > 1. What is the impact and use of setting openSearcher to true in > <autoCommit><maxTime>${solr.autoCommit.maxTime:15000}</maxTime><openSearcher>true</openSearcher></autoCommit>? From the Solr reference guide: "Whether to open a new searcher when performing a commit. If this

Regarding mointoring the solr

2013-08-19 Thread prabu palanisamy
Hi, My Solr 3.5.0, indexed with a Wikipedia dump on Java 1.6, is working perfectly. I run the Solr server on my CentOS release 5.7 (Final) server, and a client on Ubuntu 11.04 accesses the Solr server from my local system. The problem is that it is taking too much time. This problem does not arise when the

Create term vector from text

2013-08-19 Thread Domma, Achim
Hi, the TermVectorComponent allows me to retrieve data about the terms of a document, including tf-idf. Is it possible to get this data for a text, but without storing it in SOLR? As far as I figured out, the AnalysisComponent comes close, but does not return the core specific frequencies. Obvious