Nested boosting in map function in solr?

2014-06-02 Thread Kamal Kishore Aggarwal
Dear Team, I am trying to implement nested boosting in solr using map function. http://www.example.com:8984/solr/collection1/select?&q=laundry services&boost=map(query({!dismax qf=titlex !v=$ql3 pf=""}),0,0,1,map(query({!dismax qf=city v='"mumbai"' pf=""}),0,0,1,15))&ql3="laundry services". But
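For reference, the nested map() in the boost above can be read as a two-level conditional. The sketch below mimics Solr's map(x,min,max,target,default) in plain Python, assuming query() yields 0 for a document that does not match its subquery. The function names and sample scores are made up for illustration and are not part of the original post.

```python
def solr_map(x, lo, hi, target, default=None):
    """Mimic Solr's map(x,min,max,target[,default]) function query."""
    if lo <= x <= hi:
        return target
    # Without a default, Solr's map() passes x through unchanged.
    return x if default is None else default


def nested_boost(title_score, city_score):
    """Equivalent of map(title,0,0,1,map(city,0,0,1,15)).

    title_score and city_score stand in for the two query(...) values
    (hypothetical numbers; 0 is assumed to mean "no match").
    """
    return solr_map(title_score, 0, 0, 1, solr_map(city_score, 0, 0, 1, 15))


if __name__ == "__main__":
    print(nested_boost(0.0, 2.0))  # title does not match
    print(nested_boost(1.5, 0.0))  # title matches, city does not
    print(nested_boost(1.5, 2.0))  # both match
```

Read this way, the boost is 15 only when both subqueries match and 1 in every other case.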

solr indexing not working when i try to insert 1000000 rows but works fine when i try to index 400000 rows or below

2014-06-02 Thread madhav bahuguna
Hi, I am using Solr 4.7.1 and trying to do a full import. My data source is a table in MySQL. It has 1000 rows and 20 columns. Whenever I try to do a full import, Solr stops responding. But when I try to do an import with a limit of 40 or less it works fine. If i try to import more than

Re: Integrate solr with openNLP

2014-06-02 Thread Vivekanand Ittigi
We'll surely look into UIMA integration. But before moving on, is this ( https://wiki.apache.org/solr/OpenNLP ) the only link we've got for the integration? Isn't there any other article or link which may help us fix this problem? Thanks, Vivek On Tue, Jun 3, 2014 at 2:50 AM, Ahmet Arslan wrote: >

Re: search component needs access to results of previous component

2014-06-02 Thread Mikhail Khludnev
Hello Jitka, I wonder why you put the custom component logic into prepare() but not in process()? On 28.05.2014 at 1:55, user "Jitka" wrote: > Hello and thanks for reading my question. > > If our high-level search handler doesn't get enough results back from a > Solr > query, it tweaks the qu

Re: Strange behaviour when tuning the caches

2014-06-02 Thread Otis Gospodnetic
Hi Jean-Sebastien, One thing you didn't mention is whether as you are increasing(I assume) cache sizes you actually see performance improve? If not, then maybe there is no value increasing cache sizes. I assume you changed only one cache at a time? Were you able to get any one of them to the poi

RE: suspect SOLR query from D029 (SOLR master)

2014-06-02 Thread Branham, Jeremy [HR]
These are the typical queries we are using. I'm curious if any of these parameters could be causing issues when using synonyms. ?shards=myserver1.com:8080/svc/solr/wdsc,myserver1.com:8080/svc/solr/kms&sort=score desc&q=(keyword:(this is a test) OR titleSearch:(this is a test) AND (doctype:("D

RE: suspect SOLR query from D029 (SOLR master)

2014-06-02 Thread Branham, Jeremy [HR]
We found a problem with the synonym list, and suspect there was some sort of recursion causing the memory to be gobbled up until the JVM crashed. Is this expected behavior from complex synonyms? Or could this be due to the combination of complex synonyms and a bad query format? Jeremy D. Branha

Re: Uneven shard heap usage

2014-06-02 Thread Michael Sokolov
You'll get very different performance profiles from the various highlighters (we saw up to 15x speed difference in our queries on average by changing highlighters). The default one re-analyzes the entire stored document, in memory and is the slowest, but provides the most faithful match to the

Re: Uneven shard heap usage

2014-06-02 Thread Joe Gresock
So, we were finally able to reproduce the heap overload behavior with a stress test of a query that highlighted the large fields we found. We'll have to play around with the highlighting settings, but for now we've disabled the highlighting on this query (which is a canned query that doesn't even

Automatic syncing of data on a node that was down for a while:

2014-06-02 Thread keertisurapaneni
We have 3 SOLR instances on 3 different hosts and we have an external zookeeper configured for each SOLR instance. Suppose, instance1 and instance2 are up and running and instance3 is down. A few records are added to both the running instances. I am able to see the records that were added to in

Re: JVM Crashed - SOLR deployed in Tomcat

2014-06-02 Thread subrata.sar...@oup.com
Hi, I am also facing similar issues with Tomcat running the solr. are you able to solve this issue? Thanks Subrata

Re: Solr cloud nodes falling

2014-06-02 Thread Kashish
Hi Shawn, The reason why am looking at the physical memory is that i see my nodes falling off often. I have attached the cloud structure with this. I don't seem to find the reason why this third node has 'gone away'? How ever i can still query it as my tomcat server is up and running. Currently t

Re: Does CloudSolrServer hit zookeeper for every request?

2014-06-02 Thread Steve McKay
ZooKeeper allows clients to put watches on paths in the ZK tree. When the cluster state changes, every Solr client is notified by the ZK server and then each client reads the updated state. No polling is needed or even helpful. In any event, reading from ZK is much more lightweight than writing,
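The watch-instead-of-poll idea described above can be sketched with a toy, in-memory model. This is not ZooKeeper or SolrJ code: ToyCoordinator and ToyClient are hypothetical names, and the toy keeps its callback registered permanently for simplicity, whereas a real ZooKeeper watch fires once and has to be set again.

```python
class ToyCoordinator:
    """Stand-in for the ZK ensemble: holds cluster state and notifies watchers."""

    def __init__(self, state):
        self._state = list(state)
        self._watchers = []
        self.reads = 0  # how many times any client read the state

    def watch(self, callback):
        self._watchers.append(callback)

    def read(self):
        self.reads += 1
        return list(self._state)

    def update(self, new_state):
        self._state = list(new_state)
        for callback in self._watchers:
            callback()  # push a notification; clients then re-read


class ToyClient:
    """Stand-in for a cloud-aware client that caches the cluster state."""

    def __init__(self, coordinator):
        self._coordinator = coordinator
        self.cached_state = coordinator.read()
        coordinator.watch(self._on_change)

    def _on_change(self):
        self.cached_state = self._coordinator.read()

    def route_request(self):
        # Routing uses only the cached copy; the coordinator is not contacted.
        return self.cached_state[0]


if __name__ == "__main__":
    zk = ToyCoordinator(["node1", "node2", "node3"])
    client = ToyClient(zk)
    for _ in range(1000):
        client.route_request()
    print(zk.reads)  # state was read once, at startup
    zk.update(["node1", "node2"])
    print(zk.reads, client.cached_state)
```

In this model a thousand requests cost one read, and a second read happens only when the state actually changes.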

Re: Integrate solr with openNLP

2014-06-02 Thread Ahmet Arslan
Hi, I believe I answered it. Let me re-try,  There is no committed code for OpenNLP. There is an open ticket with patches. They may not work with current trunk. Confluence is the official documentation. Wiki is maintained by community. Meaning wiki can talk about some uncommitted features/stuf

Does CloudSolrServer hit zookeeper for every request?

2014-06-02 Thread Jim . Musil
I’m curious how CloudSolrServer works in practice. I understand that it gets the active solr nodes from zookeeper, but does it do this for every request? If it does hit zk for every request, that seems to put a lot of pressure on the zk ensemble. If it does NOT hit zk for every request, then h

Re: search component needs access to results of previous component

2014-06-02 Thread Jitka
Thanks for your reply. I'll check out that link.

Re: SolrCloud: Understanding Replication

2014-06-02 Thread Shawn Heisey
On 6/2/2014 2:21 PM, Marc Campeau wrote: > I notice I have this in the logs when I start SOLR for default example (I > had the same with my own connection) > > 21242 [coreZkRegister-1-thread-1] INFO > org.apache.solr.cloud.ShardLeaderElectionContext – Enough replicas found > to continue. > 21242

Re: Solr cloud nodes falling

2014-06-02 Thread Shawn Heisey
On 6/2/2014 1:39 PM, Kashish wrote: > I have a SOLR Cluster Cloud(SOLR+Tomcat) set up with one shard across 3 VM's. > All works well. But i see one node falls off after sometime. I noticed this > with two shards as well. The physical memory shoots up to 3.61 GB for total > of 3.73 GB. Even before i

Re: SolrCloud: Understanding Replication

2014-06-02 Thread Marc Campeau
I notice I have this in the logs when I start SOLR for default example (I had the same with my own connection) 21242 [coreZkRegister-1-thread-1] INFO org.apache.solr.cloud.ShardLeaderElectionContext – Enough replicas found to continue. 21242 [coreZkRegister-1-thread-1] INFO org.apache.solr.clou

Re: solr multi-tenant: anyone use per-tenant synonyms file?

2014-06-02 Thread Jack Krupansky
Try to stay with a separate collection/core for each tenant - otherwise relevancy for document scores gets "polluted" by other tenants, even if you do use filter queries to isolate what documents get returned for a tenant in a multi-tenant core. -- Jack Krupansky -Original Message- F

Re: SolrCloud: Understanding Replication

2014-06-02 Thread Marc Campeau
Here's my solrconfig.xml: 4.4 ${solr.data.dir:} ${solr.lock.type:native} true 6 3 false ${solr.autoSoftCommit.maxTime:15000} 1024

Solr cloud nodes falling

2014-06-02 Thread Kashish
I have a SOLR Cluster Cloud(SOLR+Tomcat) set up with one shard across 3 VM's. All works well. But i see one node falls off after sometime. I noticed this with two shards as well. The physical memory shoots up to 3.61 GB for total of 3.73 GB. Even before i loaded the documents, the physical memory u

solr multi-tenant: anyone use per-tenant synonyms file?

2014-06-02 Thread Will Milspec
Hi all, I've been reading up on solr cloud (via solr in action) with an eye toward multi-tenancy. (Read: "solrcloud newbie") One question that came up: what if a "one size fits all" synonyms file does not work for all customers? i.e. different customers/industries use different sets of synonym

Re: DirectSpellChecker not returning expected suggestions.

2014-06-02 Thread S.L
James, I get no results back and no suggestions for "wrangle" , however I get suggestions for "wranglr" , and "wrangle" is not present in my index. I am just searching for "wrangle" in a field that is created by copying other fields, as to how it is analyzed I dont have access to it now. Thanks

Re: Boost documents having a field value

2014-06-02 Thread Alvaro Cabrerizo
Hi, One option (not tested by myself), could be the use of payloads ( http://wiki.apache.org/solr/Payloads). Regards. On Mon, Jun 2, 2014 at 7:58 PM, Hakim Benoudjit wrote: > Hi guys, > Is it possible in solr to boost documents having a field value (Ex. > :)? > I know that it's possible to bo

Re: Boost documents having a field value

2014-06-02 Thread Jason Hellman
Hakim, That is what Boost Query (bq=) does. http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29 Jason On Jun 2, 2014, at 10:58 AM, Hakim Benoudjit wrote: > Hi guys, > Is it possible in solr to boost documents having a field value (Ex. > :)? > I know that it's possible to boo
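As an illustration of the bq suggestion above, here is a minimal sketch that builds a select URL carrying a boost query. The base URL, the field name "category", the value "books" and the boost factor are placeholders, not taken from the thread; only the bq, q and defType parameter names are Solr's.

```python
from urllib.parse import urlencode


def build_boosted_query(base_url, user_query, field, value, boost):
    """Return a /select URL that boosts documents where field == value.

    Assumes the (e)dismax parser is in use, since bq is a dismax parameter.
    """
    params = {
        "defType": "dismax",
        "q": user_query,
        # Boost query: matching documents score higher, others are still returned.
        "bq": f"{field}:{value}^{boost}",
    }
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host, core, field and value.
    print(build_boosted_query(
        "http://localhost:8983/solr/collection1", "solr", "category", "books", 5))
```

The boost here is applied at query time, so no reindexing is needed to change the field value or the factor.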

Re: openSearcher, default commit settings

2014-06-02 Thread Jason Hellman
Boon, I expect you will find many definitions of “proper usage” depending upon context and expected results. Personally, I don’t believe this is Solr’s job to enforce, and there are many ways through the use of directives in the servlet container layer that can allow restrictions if you feel th

RE: DirectSpellChecker not returning expected suggestions.

2014-06-02 Thread Dyer, James
If "wrangle" is not in your index, and if it is within the max # of edits, then it should suggest it. Are you getting anything back from spellcheck at all? What is the exact query you are using? How is the spellcheck field analyzed? If you're using stemming, then "wrangle" and "wrangler" mig
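The "max # of edits" check mentioned above refers to edit distance. Below is a minimal sketch of plain Levenshtein distance in Python, for illustration only; Solr's DirectSolrSpellChecker has its own distance implementation and settings, which this does not reproduce.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character inserts, deletes or substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    for typed in ("wranglr", "wrangle"):
        print(typed, levenshtein(typed, "wrangler"))
```

By this measure both "wranglr" and "wrangle" are one edit away from "wrangler", so distance alone would not explain why only one of them gets a suggestion.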

Re: DirectSpellChecker not returning expected suggestions.

2014-06-02 Thread S.L
Thanks, you mean "wrangler" , has been stemmed to "wrangle" , if thats the case then why does it not return any results for "wrangle" ? On Mon, Jun 2, 2014 at 2:07 PM, david.w.smi...@gmail.com < david.w.smi...@gmail.com> wrote: > It appears to be stemmed. > > ~ David Smiley > Freelance Apache Lu

Re: DirectSpellChecker not returning expected suggestions.

2014-06-02 Thread david.w.smi...@gmail.com
It appears to be stemmed. ~ David Smiley Freelance Apache Lucene/Solr Search Consultant/Developer http://www.linkedin.com/in/davidwsmiley On Mon, Jun 2, 2014 at 2:06 PM, S.L wrote: > OK, I just realized that "wrangle" is a proper english word, probably thats > why I dont get a suggestion for "

Re: DirectSpellChecker not returning expected suggestions.

2014-06-02 Thread S.L
OK, I just realized that "wrangle" is a proper English word, probably that's why I don't get a suggestion for "wrangler" in this case. However, in my test index there is no "wrangle" present, so even though this is a proper English word, since there is no occurrence of it in the index shouldn't Solr

Re: Integrate solr with openNLP

2014-06-02 Thread Vivekanand Ittigi
Thanks, I will check the Jira issue, but you didn't answer my first question. Is there no way to integrate Solr with OpenNLP? Or is there any committed code I can go ahead with? Thanks, Vivek On Mon, Jun 2, 2014 at 10:30 PM, Ahmet Arslan wrote: > Hi, > > Here is the jira issue : https:

Re: DirectSpellChecker not returning expected suggestions.

2014-06-02 Thread S.L
I do not get any suggestion (when I search for "wrangle") , however I correctly get the suggestion wrangler when I search for wranglr , I am using the Direct and WordBreak spellcheckers in combination, I have not tried using anything else. Is the distance calculation of Solr different than what Le

Boost documents having a field value

2014-06-02 Thread Hakim Benoudjit
Hi guys, Is it possible in solr to boost documents having a field value (Ex. :)? I know that it's possible to boost a field above other fields at query-time, but I want to boost a field value not the field name. And if so, is the boosting done at query time or on indexing? -- Hakim Benoudjit.

Re: DirectSpellChecker not returning expected suggestions.

2014-06-02 Thread david.w.smi...@gmail.com
What do you get then? Suggestions, but not the one you’re looking for, or is it deemed correctly spelled? Have you tried another spellChecker impl, for troubleshooting purposes? ~ David Smiley Freelance Apache Lucene/Solr Search Consultant/Developer http://www.linkedin.com/in/davidwsmiley On S

Re: DirectSpellChecker not returning expected suggestions.

2014-06-02 Thread S.L
Anyone ? On Sat, May 31, 2014 at 12:33 AM, S.L wrote: > Hi All, > > I have a small test index of 400 documents , it happens to have an entry > for "wrangler", When I search for "wranglr", I correctly get the collation > suggestion as "wrangler", however when I search for "wrangle" , I do not >

Re: iText hitting infinite loop - Was Re: pdfs

2014-06-02 Thread Erick Erickson
Siegfried: Thanks! That pretty well nails the issue as being in Tika, it's nice to know! Erick On Mon, Jun 2, 2014 at 10:14 AM, Siegfried Goeschl wrote: > Hi folks, > > Brian was so kind and sent me the troublesome PDF document > > I gave it a try with PDFBox directly in order to extract the

iText hitting infinite loop - Was Re: pdfs

2014-06-02 Thread Siegfried Goeschl
Hi folks, Brian was so kind and sent me the troublesome PDF document. I gave it a try with PDFBox directly in order to extract the text (PDFBox is used by Tika to extract the textual content of a PDF document) * hitting an infinite loop with PDFBox 1.8.3 * no problems with PDFBox 1.8.4 & 1.8.

Re: Integrate solr with openNLP

2014-06-02 Thread Ahmet Arslan
Hi, Here is the jira issue : https://issues.apache.org/jira/browse/LUCENE-2899  Anyone can create an account.  I didn't use UIMA by myself and I have little knowledge about it. But I believe it is possible to use OpenNLP inside UIMA. You need to dig into UIMA documentation. Solr UIMA integrati

Re: Integrate solr with openNLP

2014-06-02 Thread Vivekanand Ittigi
Hi Arslan, If not uncommitted code, then which code should be used to integrate? If I have to comment on my problems, which Jira issue should I use and how do I post there? And why are you suggesting UIMA integration? My requirement is integrating with OpenNLP. You mean we can do all the activities through UIMA as we do it u

Re: Enforcing a hard timeout on shard requests?

2014-06-02 Thread Gregg Donovan
On our search pages we have a main request where we really want to give the correct answer, but we also have a number of other child searches performed on that page where we're happy to get 90% of the way there and be able to enforce an SLA. Right now, when the main search finishes we have to comp

Re: change in EnumField configuration - what do you think?

2014-06-02 Thread Jack Krupansky
Do these numeric values have any significance to the application, or are they merely to reserve holes that will be later filled in without reindexing existing documents? I mean, there is no API to retrieve the numeric values or query them, right? IOW, they are not like stored values or docvalues

Re: Solr 4.8 synonyms expansion for each primary term

2014-06-02 Thread Jack Krupansky
Alas, the doc is silent on that point: https://cwiki.apache.org/confluence/display/solr/Managed+Resources The Solr javadoc adds no additional clarification. At a minimum, it should either clearly state that the overwrite (replace) feature is not supported, or show how to do it. I haven't dug

Re: Integrate solr with openNLP

2014-06-02 Thread Ahmet Arslan
Hi, Uncommitted code can have this kind of problem. It is not guaranteed to work with the latest trunk. You could comment on the problem you face on the jira ticket. By the way, maybe you are after something doable with the already committed UIMA stuff? https://cwiki.apache.org/confluence/display/s

RE: Strange behaviour when tuning the caches

2014-06-02 Thread Jean-Sebastien Vachon
Thanks for your quick response. Our JVM is configured with a heap of 8GB. So we are pretty close to the "optimal" configuration you are mentioning. The only other programs running are Zookeeper (which has its own storage device) and a proprietary API (with a heap of 1GB) we have on top of Solr t

Solr 4.8 synonyms expansion for each primary term

2014-06-02 Thread Archana R
we recently upgraded to Solr 4.8 and we are using REST API to update synonyms. we are trying to migrate synonyms from synonyms.txt file to new files. we have our synonyms defined in synonyms.txt where we can overwrite expand property with => . ex tv=>television in synonyms.txt . I am wondering h

Re: Upgrade of solr 4.0 to 4.8.1 query

2014-06-02 Thread Erick Erickson
Upgrade steps are carried along in the CHANGES.txt file, there's a section for every release (i.e. 4.1 -> 4.2, 4.5 -> 4.7) etc. There's no 4.0 -> 4.8 in a single go though. So I'd start there. Best, Erick On Mon, Jun 2, 2014 at 7:14 AM, Alexandre Rafalovitch wrote: > You can do lots of new stu

Re: Strange behaviour when tuning the caches

2014-06-02 Thread Shawn Heisey
On 6/2/2014 8:24 AM, Jean-Sebastien Vachon wrote: > We have yet to determine where the exact breaking point is. > > The two patterns we are seeing are: > > - less cache (around 20-30% hit/ratio), poor performance but > overall good stability When caches are too small, a low hit ratio is

Re: change in EnumField configuration - what do you think?

2014-06-02 Thread Erick Erickson
Would both then be supported? I see where it would be easily detectable. And I also assume that this wouldn't break back-compat? Best Erick On Mon, Jun 2, 2014 at 6:22 AM, Elran Dvir wrote: > Hi all, > > I am the one that contributed EnumField code to Solr. > There was a long discussion how th

Re: Uneven shard heap usage

2014-06-02 Thread Erick Erickson
Joe: One thing to add, if you're returning that doc (or perhaps even some fields, this bit is still something of a mystery to me) then the whole 180M may be being decompressed. Since 4.1 the stored fields have been compressed to disk by default. That said, this is only true if the docs in question

Re: Can Atomic Updates help me to re-indexing w/o crawling external content?

2014-06-02 Thread Erick Erickson
Hmmm, when changing the schema there might be issues if you're changing the definition of an already-existing field. I've seen weirdness when the fundamental definition of a field changes so I'd be cautious. You'd only be able to add new fields via copyField I'd guess. In this situation, since yo

Strange behaviour when tuning the caches

2014-06-02 Thread Jean-Sebastien Vachon
Hi All, We have a 5 nodes setup running Solr 4.8.1 and we are trying to get the most out of it by tuning Solr caches. Following is the output of the script version.sh provided with Tomcat Server version: Apache Tomcat/7.0.39 Server built: Mar 22 2013 12:37:24 Server number: 7.0.39.0 OS Name:

Re: Upgrade of solr 4.0 to 4.8.1 query

2014-06-02 Thread Alexandre Rafalovitch
You can do lots of new stuff, but I believe the old config will run ok without changes. One thing to be aware of is the logging jar unbundling and manual correction for that when running under tomcat. That's on the Wiki somewhere and should have been covered in 4.2 to 4.7 change. Regards, Alex.

Integrate solr with openNLP

2014-06-02 Thread Vivekanand Ittigi
I followed this link to integrate: https://wiki.apache.org/solr/OpenNLP Installation For English language testing: Until LUCENE-2899 is committed: 1. pull the latest trunk or 4.0 branch 2. apply the latest LUCENE-2899 patch 3. do 'ant compile' cd solr/contrib/opennlp/sr

Sum of OR'd nested query scores

2014-06-02 Thread Diego Fernandez
Hi! I have a question which I posted on http://stackoverflow.com/questions/23959727/sum-of-nested-queries-in-solr about taking the sum of OR'd nested queries. I'll repeat it here, but if you want some SO points and have an answer, feel free to answer there. [quote] We have a search that takes

RE: Spell check [or] Did you mean this with Phrase suggestion

2014-06-02 Thread Vanitha
Here is my configuration : schema.xml: solrconfig.xml: text_en default name_gen solr.DirectSolrSpellChecker internal 0.5

Re: Offline Indexes Update to Shard

2014-06-02 Thread Vineet Mishra
Hi Wolfgang, Thanks for your response, can you quote some running example of MapReduceIndexerTool for indexing through csv files. If you are referring to http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/csug_mapreduceindexertool.html?scroll=cs

RE: Spell check [or] Did you mean this with Phrase suggestion

2014-06-02 Thread Vanitha
No collate is working as expected, Please help. It is more like spell checking with Infix suggester.

change in EnumField configuration - what do you think?

2014-06-02 Thread Elran Dvir
Hi all, I am the one that contributed EnumField code to Solr. There was a long discussion how the integer values of an enum field should be indicated in the configuration. It was decided that the integer value wouldn't be written explicitly, but would be implicitly determined by the value order.

Upgrade of solr 4.0 to 4.8.1 query

2014-06-02 Thread Steve Howe
Hi all, First time posting so the regular sorry if this is a popular question.. Anyhoo - I'm running solr 4.0 on a test rig with multicore and I would like to upgrade to 4.8.1. I can't find any clear tutorials on this on the web and I can only see a thread on 4.2 -> 4.7 on the mailing list. Can

Re: Uneven shard heap usage

2014-06-02 Thread Michael Sokolov
Joe - there shouldn't really be a problem *indexing* these fields: remember that all the terms are spread across the index, so there is really no storage difference between one 180MB document and 180 1 MB documents from an indexing perspective. Making the field "stored" is more likely to lead

Re: openSearcher, default commit settings

2014-06-02 Thread Boon Low
Thanks for clearing this up. The wiki, being an authoritative reference, needs to be corrected. Re. default commit settings. I agree educating developers is very essential. But in reality, you can't rely on this as the sole mechanism for ensuring proper usage of the update API, especially for c

Re: Uneven shard heap usage

2014-06-02 Thread Joe Gresock
And the followup question would be.. if some of these documents are legitimately this large (they really do have that much text), is there a good way to still allow that to be searchable and not explode our index? These would be "text_en" type fields. On Mon, Jun 2, 2014 at 6:09 AM, Joe Gresock

Re: Uneven shard heap usage

2014-06-02 Thread Joe Gresock
So, we're definitely running into some very large documents (180MB, for example). I haven't run the analysis on the other 2 shards yet, but this could definitely be our problem. Is there any conventional wisdom on a good "maximum size" for your indexed fields? Of course it will vary for each sys

Re: Grouping on a multi-valued field

2014-06-02 Thread Bhoomit Vasani
Hi Erick, Thanks again, I'm using the same thing as workaround. I first use "pivot faceting" and another call to fetch actual documents. On Fri, May 30, 2014 at 9:46 PM, Erick Erickson wrote: > OK, I see what you're trying to do. Unfortunately grouping is just not > built to support multivalu

Re: Offline Indexes Update to Shard

2014-06-02 Thread Wolfgang Hoschek
Sounds like you should consider using MapReduceIndexerTool. AFAIK, this is the most scalable indexing (and merging) solution out there. Wolfgang. On Jun 2, 2014, at 10:33 AM, Vineet Mishra wrote: > Hi Erick, > > Thanks for your mail, please let me go through with my use case. > I am having ar

Re: Anybody knows of a project that indexes SVN repos into Solr?

2014-06-02 Thread Ramkumar R. Aiyengar
Not an exact answer.. OpenGrok uses Lucene, but not Solr. On 2 Jun 2014 07:48, "Alexandre Rafalovitch" wrote: > Hello, > > Anybody knows of a recent projects that index SVN repos for Solr > search? With or without UI. > > I know of similar efforts for other VCS, but the only thing I found > for S

Re: Offline Indexes Update to Shard

2014-06-02 Thread Vineet Mishra
Hi Erick, Thanks for your mail; please let me go through my use case. I have around 20-40 billion records to index, each record having around 200-400 fields. The data is sensor data, so it can easily be stored as Integer or Float. Now to index this huge amount of data I am going wi

Re: Offline Indexes Update to Shard

2014-06-02 Thread Vineet Mishra
Hi Otis, I have to index a huge amount of data, on the order of billions of records. Since indexing via the HTTP POST mechanism will be slow due to network delay, I am indexing through EmbeddedSolrServer to create an index which I can later upload to different shards in SolrCloud, al