Re: What to expect when testing Japanese search index

2013-03-22 Thread Hayden Muhl
A search for a single character will only return hits if that character makes up a whole word, and only if the tokenizer recognizes that character as a word. It's just like in other languages, where a search for "p" won't return documents with the word "apple". If I were you, I would go into the S
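
The whole-word matching described above can be illustrated with a toy inverted index. This is only a minimal sketch, not how Solr is implemented: the whitespace `tokenize` function is an assumed stand-in for a real analyzer (a Japanese tokenizer segments text very differently), and `build_index` and `search` are hypothetical helper names.

```python
from collections import defaultdict


def tokenize(text):
    # Stand-in tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def build_index(docs):
    # Map each whole token to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, term):
    # A lookup matches whole tokens only, never substrings of a token.
    return sorted(index.get(term.lower(), set()))


docs = {1: "apple pie", 2: "vitamin p complex"}
index = build_index(docs)
print(search(index, "p"))      # only the document where "p" is its own token
print(search(index, "apple"))  # only the document containing the token "apple"
```

Because "apple" is indexed as one token, a lookup for "p" never reaches it; what the tokenizer emits determines what can be found.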

Question on highlighting of external fields

2013-03-22 Thread Jamie Johnson
Some time ago I had worked with a fellow developer to put together an addon to the (then) current Solr Highlighter to support fetching fields from an external source (like a database for instance). The general mechanics seem to work properly but I am seeing issues now where the highlights do not m

RE: Boost query parameter with Lucid parser and using query FunctionQuery

2013-03-22 Thread Miller, Will Jr
This is the echo params... It looks like it ignores the qf in the FunctionQuery and instead takes the qf of the main query. true true score desc 11 *:* true body true true false all title,score

Re: NoSuchMethodError updateDocument

2013-03-22 Thread Jan Høydahl
Are you 100% sure you use the exact jars for 4.1.0 *everywhere*, and that you're not blending older versions from the Nutch distro in your classpath here? > Any ideas? BTW: What was your question here regarding Jetty vs Tomcat? -- Jan Høydahl, search solution architect Cominvent AS - www.cominve

Re: Boost query parameter with Lucid parser and using query FunctionQuery

2013-03-22 Thread Jan Høydahl
Why would you use dismax for the query() when you want to match a simple term to one field? If you share &echoParams=all the answer may lie somewhere therein? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com 23 March 2013 at 00:07

Re: Boost query parameter with Lucid parser and using query FunctionQuery

2013-03-22 Thread Jack Krupansky
You'll have to contact Lucid's support for questions about their code. (I've been away from that code too long to recall much about it.) -- Jack Krupansky -Original Message- From: Miller, Will Jr Sent: Friday, March 22, 2013 7:07 PM To: solr-user@lucene.apache.org Subject: Boost query

Re: NoSuchMethodError updateDocument

2013-03-22 Thread Furkan KAMACI
I just set this JVM parameter: -Dsolr.solr.home=/home/projects/lucene-solr/solr/solr_home solr_home is where my config files etc. are located. My solr.xml has these lines: On the other hand, I run it from my Tomcat without using the example embedded Jetty start.jar. Any ideas? 2013/3/

Boost query parameter with Lucid parser and using query FunctionQuery

2013-03-22 Thread Miller, Will Jr
I have been playing around with the bq/bf/boost query parameters available in dismax/edismax. I am using the Lucid parser as my default parser for the query. The lucid parser is an extension of the DisMax parser and should contain everything that is available in that parser. My goal is boost it

doc cache issues... query-time way to bypass cache?

2013-03-22 Thread Gary Yngve
I have a situation we just discovered in solr4.2 where there are previously cached results from a limited field list, and when querying for the whole field list, it responds differently depending on which shard gets the query (no extra replicas). It either returns the document on the limited field

Re: Did something change with Payloads?

2013-03-22 Thread Mark Miller
On Mar 22, 2013, at 5:54 PM, jimtronic wrote: > Ok, this is very bizarre. > > If I insert more than one document at a time using the update handler like > so: > > [{"id":"1","foo_ap":"bar|50"},{"id":"2","foo_ap":"bar|75"}] > > It actually stores the same payload value "50" for both docs. >

RE: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-22 Thread Dyer, James
Alex, You may want to move over to the dev list now that you're working on code. Or if you would rather not subscribe to the dev list, add yourself as a watcher to SOLR-3758 and comment further there. This will help us keep track of progress for the issue. The short answer is that in

Re: overseer queue clogged

2013-03-22 Thread Mark Miller
On Mar 22, 2013, at 5:54 PM, Gary Yngve wrote: > Thanks, Mark! > > The core node names in the solr.xml in solr4.2 is great! Maybe in 4.3 it > can be supported via API? It is with the core admin api - do you mean the collections api? Please make a JIRA for any feature requests so they don't g

Re: overseer queue clogged

2013-03-22 Thread Gary Yngve
Thanks, Mark! The core node names in the solr.xml in solr4.2 is great! Maybe in 4.3 it can be supported via API? Also I am glad you mentioned in other post the chance to namespace zookeeper by adding a path to the end of the comma-delim zk hosts. That works out really well in our situation for
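
The ZooKeeper namespacing mentioned above (a path appended once, after the last host in the comma-delimited list) can be sketched as follows. This is an illustration under assumptions: the host names, the port, the `/solr-prod` path, and the helper name `zk_host_with_chroot` are all hypothetical.

```python
def zk_host_with_chroot(hosts, chroot):
    # Join the host:port pairs with commas, then append a single path at the very end.
    if not chroot.startswith("/"):
        raise ValueError("chroot path must start with '/'")
    return ",".join(hosts) + chroot


# Hypothetical ensemble and namespace path, for illustration only.
zk_host = zk_host_with_chroot(["zk1:2181", "zk2:2181", "zk3:2181"], "/solr-prod")
print(zk_host)
```

The path applies to the whole ensemble, so it appears once at the end rather than after every host.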

Re: Did something change with Payloads?

2013-03-22 Thread jimtronic
Ok, this is very bizarre. If I insert more than one document at a time using the update handler like so: [{"id":"1","foo_ap":"bar|50"},{"id":"2","foo_ap":"bar|75"}] It actually stores the same payload value "50" for both docs. That seems like a bug, no? There was a core change in 4.1 to how p

Re: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-22 Thread alxsss
Thanks. I can fix this, but going over code it seems it is not easy to figure out where the whole request and response come from. I followed up SpellCheckComponent#finishStage and found out that SearchHandler#handleRequestBody calls this function. However, which part calls handleRequestBod

RE: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-22 Thread Dyer, James
Alex, I added your comments to SOLR-3758 (https://issues.apache.org/jira/browse/SOLR-3758) , which seems to me to be the very same issue. If you need this to work now and if you cannot devise a fix yourself, then perhaps a workaround is if the query returns with 0 results, re-issue the query

Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-22 Thread Mark Miller
That was to you Phil. So it seems this is a problem with the configuration replication case I would guess - I didn't really look at that path in the 4.2 fixes I worked on. I did add it to the new testing I'm doing since I've suspected it (it will prompt a core reload that doesn't happen when co

Re: how to get term vector information of sepcific word/position in field

2013-03-22 Thread Chris Hostetter
: is there any way, if i can get term vector information of specific word : only, like i can pass the word, and it will just return term position and : frequency for that word only? : : and also if i can pass the position e.g. startPosition=5 and endPosition=10; : then it will return terms, posit

Re: transientCacheSize not working

2013-03-22 Thread didier deshommes
I've created an issue and patch here that makes it possible to specify transient and loadOnStartup on core creation: https://issues.apache.org/jira/browse/SOLR-4631 On Wed, Mar 20, 2013 at 10:14 AM, didier deshommes wrote: > Thanks. Is there a way to pass loadOnStartup and/or transient as > param

Re: Can we manipulate termfreq to count as 1 for multiple matches?

2013-03-22 Thread Chris Hostetter
: parameter "*omitTermFreqAndPositions"* the key thing to remember being: if you use this, then by omitting positions you can no longer do phrase queries. : or you can use a custom similarity class that overrides the term freq and : return one for only that field. : http://wiki.apache.org/solr/S

Re: DocValues and field requirements

2013-03-22 Thread Chris Hostetter
: Thank you for your response. Yes, that's strange. By enabling DocValues the : information about missing fields is lost, which changes the way of sorting : as well. Adding default value to the fields can change a logic of : application dramatically (I can't set default value to 0 for all : Trie*F

Re: Slow queries for common terms

2013-03-22 Thread Tom Burton-West
Hi David and Jan, I wrote the blog post, and David, you are right, the problem we had was with phrase queries because our positions lists are so huge. Boolean queries don't need to read the positions lists. I think you need to determine whether you are CPU bound or I/O bound. It is possible

Re: Solr 4.2, reindexing, transaction logs, high memory usage

2013-03-22 Thread Shawn Heisey
On 3/22/2013 9:24 AM, Raghav Karol wrote: We run this index sharded across 8 Solr cores on a single host, an m2.4xlarge EC2 instance. We do not use zookeeper (because of operational issues on our live indexes) and manage the sharding ourselves. For this index we run with -Xmx30G and observ

Re: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-22 Thread alxsss
Hello, Further investigation shows the following pattern, for both DirectIndex and wordbreak spellcheckers. Assume that in all cases there are spellchecker results when distrib=false In distributed mode (distrib=true) case when matches=0 1. group=true, no spellcheck results 2. group

Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-22 Thread Mark Miller
And you're also on 4.2? - Mark On Mar 22, 2013, at 12:41 PM, Uomesh wrote: > Also, I am replicating only on commit and startup. > > Thanks, > Umesh > > On Fri, Mar 22, 2013 at 11:23 AM, Umesh Sharma wrote: > >> Hi Mark, >> >> I am replicating the config files below but not replicating solrconfig.

NoSuchMethodError updateDocument

2013-03-22 Thread Furkan KAMACI
I use Solr 4.1.0, Nutch 2.1, Java 1.7.0_17, Tomcat 7.0, and IntelliJ IDEA 12 with CentOS 6.4 on my 64-bit computer. I ran this command successfully: bin/nutch solrindex http://localhost:8080/solr -index However, when I run this command: bin/nutch solrindex http://localhost:8080/solr -reindex I

Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-22 Thread Uomesh
Also, I am replicating only on commit and startup. Thanks, Umesh On Fri, Mar 22, 2013 at 11:23 AM, Umesh Sharma wrote: > Hi Mark, > > I am replicating the config files below but not replicating solrconfig.xml. > > confFiles: schema.xml, elevate.xml, stopwords.txt, > mapping-FoldToASCII.txt, mapping-

Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-22 Thread Uomesh
Hi Mark, I am replicating the config files below but not replicating solrconfig.xml. confFiles: schema.xml, elevate.xml, stopwords.txt, mapping-FoldToASCII.txt, mapping-ISOLatin1Accent.txt, protwords.txt, spellings.txt, synonyms.txt Also, strangely, I am seeing a big Gen difference between Master and slave

Re: SOLR - Documents with large number of fields ~ 450

2013-03-22 Thread John Nielsen
"with the on disk option". Could you elaborate on that? On 22/03/2013 05.25, "Mark Miller" wrote: > You might try using docvalues with the on disk option and try and let the > OS manage all the memory needed for all the faceting/sorting. This would > require Solr 4.2. > > - Mark > > On Mar 21, 2

Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-22 Thread Mark Miller
Are you replicating configuration files as well? - Mark On Mar 22, 2013, at 6:38 AM, "John, Phil (CSS)" wrote: > To add to the discussion. > > We're running classic master/slave replication (not solrcloud) with 1 master > and 2 slaves and I noticed the slave having a higher version number tha

Re: Solr 4.2 replcation whole index files mechanism.

2013-03-22 Thread Mark Miller
There are a few things going on here that caused this, all resolved in 4.2 as far as I know. - Mark On Mar 22, 2013, at 3:56 AM, bradhill99 wrote: > Hi, > I use solrcloud 4.1. > I start up two solr nodes A and B and then created a new collection using > CoreAdmin to A using one shard, so Nod

Solr 4.2, reindexing, transaction logs, high memory usage

2013-03-22 Thread Raghav Karol
Dear List, We are using solr-4.2 to build an index of 5M docs each limited to 6K in size. Conceptually we are modelling a stack of documents. Here is a excerpt from our schema.xml We have publicationBody_1: ..., publicationBody_2: ... maximum of 30 with max 10K of data in each.

Re: Sort-field for ALL docs in FieldCache for sort queries -> OOM on lots of docs

2013-03-22 Thread Shawn Heisey
On 3/22/2013 8:54 AM, Per Steffensen wrote: > Me too. I will find out soon - I hope! But re-indexing is kinda a > problem for us, but we will figure out. > Any "guide to re-index all your stuff" anywhere, so I do it the easiest > way? Guess maybe there are some nice tricks about streaming data direct

Re: Sort-field for ALL docs in FieldCache for sort queries -> OOM on lots of docs

2013-03-22 Thread Per Steffensen
On 3/21/13 10:50 PM, Shawn Heisey wrote: On 3/21/2013 4:05 AM, Per Steffensen wrote: Can anyone else elaborate? How to "activate" it? How to make sure, for sorting, that sort-field-value for all docs are not read into memory for sorting - leading to OOM when you have a lot of docs? Can this feat

PatternReplaceFilterFactory -- what does this regex do?

2013-03-22 Thread Eric Wilson
I'm using the Solr Suggester for autocompletion with WFSTLookup suggest component, and a text file with phrases and weights. ( http://wiki.apache.org/solr/Suggester) I found that the following filter made it impossible to match on ampersands. So I removed it. But I'm sure it was there for a reason

solr-user@lucene.apache.org

2013-03-22 Thread anuj vats
Hi Shawan, I have seen your post on SolrCloud master-master configuration on two servers. I have to use the same Solr structure, but for a long time I have not been able to configure it to communicate between two servers; on a single server it works fine. Can you please help me out by providing the required config cha

Solr 4.2 replcation whole index files mechanism.

2013-03-22 Thread bradhill99
Hi, I use solrcloud 4.1. I start up two solr nodes A and B and then created a new collection using CoreAdmin to A using one shard, so Node A is leader. Then I index some docs to it. Then I created the same collection using CoreAdmin to B to become a replica. I found that solr will sync all ind

Re: Slow queries for common terms

2013-03-22 Thread Jan Høydahl
Hi, There might not be a final cure with more RAM if you are CPU bound. Scoring 90M docs is some work. Can you check what's going on during those 15 seconds? Is your CPU at 100%? Try a (foo OR bar OR baz) search which generates >100mill hits and see if that is slow too, even if you don't use fr

Re: Solr cloud and auto shard timeline

2013-03-22 Thread Jamie Johnson
Yes Anshum exactly what I was looking for. Is this being targeted in a particular solr release? I see that some of the related issues are targeted for 4.3, is that the goal for this as well? On Fri, Mar 22, 2013 at 8:07 AM, Anshum Gupta wrote: > Hi Jamie, > > There's progress on the Shard spli

Re: SOLR - Documents with large number of fields ~ 450

2013-03-22 Thread Marcin Rzewucki
Hi, I have a collection with more than 4K fields, but mostly Trie*Fields types. It is used for faceting, sorting, searching and statsComponent. It works pretty fine on Amazon 4xm1.large (7.5GB RAM) EC2 boxes. I'm using SolrCloud, multi A-Z setup and ephemeral storage. Index is managed by mmap, 4GB f

RE: Logging inside a custom analyzer

2013-03-22 Thread Gian Maria Ricci
Thanks a lot, it was exactly what I need, sorry for not being so clear with my question :). Gian Maria. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, March 19, 2013 3:04 PM To: solr-user@lucene.apache.org; alkamp...@nablasoft.com Subject: Re: Log

Re: Solr cloud and auto shard timeline

2013-03-22 Thread Anshum Gupta
Hi Jamie, There's progress on the Shard splitting JIRA that I believe you are talking about. You may have a look at this for more details: https://issues.apache.org/jira/browse/SOLR-3755 . On Fri, Mar 22, 2013 at 4:30 PM, Jamie Johnson wrote: > I am sorry for the confusion, I had assumed that

Using Solr For a Real Search Engine

2013-03-22 Thread Furkan KAMACI
If I want to use Solr in a web search engine, what strategy should I follow for running Solr? I mean, should I run it via the embedded Jetty, or use the war and deploy it to a container? Please consider that I will have a heavy workload on my Solr.

Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-22 Thread Bernd Fehling
That issue was already with solr 4.1. http://lucene.472066.n3.nabble.com/replication-problems-with-solr4-1-td4039647.html Nice to know that it is still there in 4.2. With some luck it will make it to 4.2.1 ;-) Regards Bernd On 21.03.2013 21:08, Uomesh wrote: > Hi, > > I am seeing an issue af

Re: Don't cache filter queries

2013-03-22 Thread Dotan Cohen
On Thu, Mar 21, 2013 at 6:22 PM, Chris Hostetter wrote: > > : Just add {!cache=false} to the filter in your query > : (http://wiki.apache.org/solr/SolrCaching#filterCache). > ... > : > I need to use the filter query feature to filter my results, but I > : > don't want the results cached as
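
As an illustration of the `{!cache=false}` advice quoted above, here is a minimal sketch that prefixes a filter query with that local param and URL-encodes the result. The field and value `category:books`, the `q` value, and the helper name `build_query` are assumptions made for the example; only the `{!cache=false}` prefix comes from the thread.

```python
from urllib.parse import urlencode, parse_qs


def build_query(q, filter_query, cache_filter=True):
    # Prefix the filter with the {!cache=false} local param when caching is unwanted.
    fq = filter_query if cache_filter else "{!cache=false}" + filter_query
    # urlencode percent-escapes the braces, '!' and '=' so the value survives in a URL.
    return urlencode({"q": q, "fq": fq})


# Hypothetical query and filter, for illustration only.
query_string = build_query("*:*", "category:books", cache_filter=False)
print(query_string)
```

The encoded string would be appended to the select handler URL; decoding it yields the original `fq` value with the local param intact.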

Re: Solr cloud and auto shard timeline

2013-03-22 Thread Jamie Johnson
I am sorry for the confusion, I had assumed that there was a way to issue commands to ES to have it change its current shard layout (i.e. go from 2 to 4 for instance) but on further reading of their documentation I do not see that. That being said is there a timeline on being able to add shards t

RE: Solr 4.2 - Slave Index version is higher than Master

2013-03-22 Thread John, Phil (CSS)
To add to the discussion. We're running classic master/slave replication (not solrcloud) with 1 master and 2 slaves and I noticed the slave having a higher version number than the master the other day as well. In our case, knock on wood, it hasn't stopped replication. If you'd like a copy o

Re: DocValues and field requirements

2013-03-22 Thread Marcin Rzewucki
Hi Shawn, Thank you for your response. Yes, that's strange. By enabling DocValues the information about missing fields is lost, which changes the way of sorting as well. Adding default value to the fields can change a logic of application dramatically (I can't set default value to 0 for all Trie*F

solr 4.1 replcation whole indexs files from leader

2013-03-22 Thread Brad Hill
Hi, I use solrcloud 4.1. I start up two solr nodes A and B and then created a new collection using CoreAdmin to A using one shard, so Node A is leader. Then I index some docs to it. Then I created the same collection using CoreAdmin to B to become a replica. I found that solr will sync all ind