search on default field returns fewer documents
Hi all, We have two fields, and 'text' is our default field; we copy the 'doc' field into 'text' when indexing. With 10 documents whose 'doc' value shares the same prefix (for example ca067-XXX), searching for ca067 on the default field returns only 5 results, while searching for ca067 on the 'doc' field returns the expected 10 results. Does anyone have an idea what is wrong here? Thank you
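For reference, a minimal SolrJ sketch of the comparison described above; the core URL is an assumption, and on SolrJ 3.x the server class would be CommonsHttpSolrServer rather than HttpSolrServer:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class CompareCounts {
    public static void main(String[] args) throws Exception {
        // Assumed core URL; 'text' is the default search field, 'doc' is the copied source field.
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        for (String q : new String[] { "ca067", "doc:ca067" }) {
            // Run the same term against the default field and the 'doc' field and compare counts.
            long hits = server.query(new SolrQuery(q)).getResults().getNumFound();
            System.out.println(q + " -> " + hits + " documents");
        }
    }
}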
Re: search on default field returns fewer documents
Jack, thanks for your reply. We are using Solr 3.4 with the standard Lucene query parser. I added debugQuery=true; this is the result when searching ca067 and getting 5 documents:

rawquerystring: ca067
querystring: ca067
parsedquery: PhraseQuery(text:"ca 067")
parsedquery_toString: text:"ca 067"

0.1108914 = (MATCH) weight(text:"ca 067" in 75), product of: 1.0 = queryWeight(text:"ca 067"), product of: 5.67764 = idf(text: ca=16 067=9) 0.17612952 = queryNorm 0.1108914 = fieldWeight(text:"ca 067" in 75), product of: 1.0 = tf(phraseFreq=1.0) 5.67764 = idf(text: ca=16 067=9) 0.01953125 = fieldNorm(field=text, doc=75)

0.088713124 = (MATCH) weight(text:"ca 067" in 71), product of: 1.0 = queryWeight(text:"ca 067"), product of: 5.67764 = idf(text: ca=16 067=9) 0.17612952 = queryNorm 0.088713124 = fieldWeight(text:"ca 067" in 71), product of: 1.0 = tf(phraseFreq=1.0) 5.67764 = idf(text: ca=16 067=9) 0.015625 = fieldNorm(field=text, doc=71)

0.088713124 = (MATCH) weight(text:"ca 067" in 72), product of: 1.0 = queryWeight(text:"ca 067"), product of: 5.67764 = idf(text: ca=16 067=9) 0.17612952 = queryNorm 0.088713124 = fieldWeight(text:"ca 067" in 72), product of: 1.0 = tf(phraseFreq=1.0) 5.67764 = idf(text: ca=16 067=9) 0.015625 = fieldNorm(field=text, doc=72)

0.06653485 = (MATCH) weight(text:"ca 067" in 74), product of: 1.0 = queryWeight(text:"ca 067"), product of: 5.67764 = idf(text: ca=16 067=9) 0.17612952 = queryNorm 0.06653485 = fieldWeight(text:"ca 067" in 74), product of: 1.0 = tf(phraseFreq=1.0) 5.67764 = idf(text: ca=16 067=9) 0.01171875 = fieldNorm(field=text, doc=74)

0.0554457 = (MATCH) weight(text:"ca 067" in 73), product of: 1.0 = queryWeight(text:"ca 067"), product of: 5.67764 = idf(text: ca=16 067=9) 0.17612952 = queryNorm 0.0554457 = fieldWeight(text:"ca 067" in 73), product of: 1.0 = tf(phraseFreq=1.0) 5.67764 = idf(text: ca=16 067=9) 0.009765625 = fieldNorm(field=text, doc=73)

This is the result when searching doc:ca067 and getting 10 documents:

rawquerystring: doc:ca067
querystring: doc:ca067
parsedquery: PhraseQuery(doc:"ca 067")
parsedquery_toString: doc:"ca 067"

1.8805147 = (MATCH) weight(doc:"ca 067" in 71), product of: 0.9994 = queryWeight(doc:"ca 067"), product of: 6.0176477 = idf(doc: ca=10 067=10) 0.16617788 = queryNorm 1.8805149 = fieldWeight(doc:"ca 067" in 71), product of: 1.0 = tf(phraseFreq=1.0) 6.0176477 = idf(doc: ca=10 067=10) 0.3125 = fieldNorm(field=doc, doc=71)

1.8805147 = (MATCH) weight(doc:"ca 067" in 72), product of: 0.9994 = queryWeight(doc:"ca 067"), product of: 6.0176477 = idf(doc: ca=10 067=10) 0.16617788 = queryNorm 1.8805149 = fieldWeight(doc:"ca 067" in 72), product of: 1.0 = tf(phraseFreq=1.0) 6.0176477 = idf(doc: ca=10 067=10) 0.3125 = fieldNorm(field=doc, doc=72)

1.8805147 = (MATCH) weight(doc:"ca 067" in 73), product of: 0.9994 = queryWeight(doc:"ca 067"), product of: 6.0176477 = idf(doc: ca=10 067=10) 0.16617788 = queryNorm 1.8805149 = fieldWeight(doc:"ca 067" in 73), product of: 1.0 = tf(phraseFreq=1.0) 6.0176477 = idf(doc: ca=10 067=10) 0.3125 = fieldNorm(field=doc, doc=73)

1.8805147 = (MATCH) weight(doc:"ca 067" in 74), product of: 0.9994 = queryWeight(doc:"ca 067"), product of: 6.0176477 = idf(doc: ca=10 067=10) 0.16617788 = queryNorm 1.8805149 = fieldWeight(doc:"ca 067" in 74), product of: 1.0 = tf(phraseFreq=1.0) 6.0176477 = idf(doc: ca=10 067=10) 0.3125 = fieldNorm(field=doc, doc=74)

1.8805147 = (MATCH) weight(doc:"ca 067" in 75), product of: 0.9994 = queryWeight(doc:"ca 067"), product of: 6.0176477 = idf(doc: ca=10 067=10) 0.16617788 = queryNorm 1.8805149 = fieldWeight(doc:"ca 067" in 75), product of: 1.0 = tf(phraseFreq=1.0) 6.0176477 = idf(doc: ca=10 067=10) 0.3125 = fieldNorm(field=doc, doc=75)

1.8805147 = (MATCH) weight(doc:"ca 067" in 76), product of: 0.9994 = queryWeight(doc:"ca 067"), product of: 6.0176477 = idf(doc: ca=10 067=10) 0.16617788 = queryNorm 1.8805149 = fieldWeight(doc:"ca 067" in 76), product of: 1.0 = tf(phraseFreq=1.0) 6.0176477 = idf(doc: ca=10 067=10) 0.3125 = fieldNorm(field=doc, doc=76)

1.8805147 = (MATCH) weight(doc:"ca 067" in 77), product of: 0.9994 = queryWeight(doc:"ca 067"), product of: 6.0176477 = idf(doc: ca=10 067=10) 0.16617788 = queryNorm 1.8805149 = fieldWeight(doc:"ca 067" in 77), product of: 1.0 = tf(phraseFreq=1.0) 6.0176477 = idf(doc: ca=10 067=10) 0.3125 = fieldNorm(field=doc, doc=77)

1.8805147 = (MATCH) weight(doc:"ca 067" in 78), product of: 0.9994 = queryWeight(doc:"ca 067"), product of: 6.0176477 = idf(doc: ca=10 067=10) 0.16617788 = queryNorm 1.8805149 = fieldWeight(doc:"ca 067" in 78), product of: 1.0 = tf(phraseFreq=1.0) 6.0176477 = idf(doc: ca=10 067=10) 0.3125 = fieldNorm(field=doc, doc=78)

1.8805147 = (MATCH) weight(doc:"ca 067" in 79), product of: 0.9994 = queryWeight(doc
Re: search on default field returns fewer documents
Thanks Jack. Our schema version is 1.3 and we are using the official Solr 3.4 release; we actually use Maven to download the Solr war and artifacts (groupId org.apache.solr, artifactId solr, version 3.4.0, type war). No, I did not modify the schema at any time; all documents were indexed with the same schema. Yes, we have additional copyFields into the text field; usually none of them contain the same text as the document name, it is mostly owner information. To make the picture clearer: we are indexing text documents; every document has a DB row plus its content on disk. We index the DB with DataImportHandler. Among other columns we index the document name, which is our 'doc' column, another field 'docname' which is the document display name (usually the same as 'doc'), and we also index the document content in the 'content' field (the content is indexed in the same DataImportHandler process). We copy 'doc' and 'content' into the 'text' field, plus some other fields, usually owner information like email addresses etc. It may be that the content contains the document name or parts of it. Thank you
single node causing cluster-wide outage
Hi all! After upgrading to Solr 4.6.1 we encountered a situation where a cluster outage was traced to a single misbehaving node; after restarting the node the cluster immediately returned to normal operation. The bad node had ~420 threads locked on FastLRUCache, and most httpshardexecutor threads were waiting on Apache Commons HTTP futures. Has anyone encountered such a situation? What can we do to prevent misbehaving nodes from bringing down the entire cluster? Cheers, Avishai
Re: single node causing cluster-wide outage
A little more information: it seems the issue happens after we get an OutOfMemory error on a facet query.

On Wed, Mar 12, 2014 at 11:06 PM, Avishai Ish-Shalom wrote:
> Hi all!
>
> After upgrading to Solr 4.6.1 we encountered a situation where a cluster
> outage was traced to a single node misbehaving, after restarting the node
> the cluster immediately returned to normal operation.
> The bad node had ~420 threads locked on FastLRUCache and most
> httpshardexecutor threads were waiting on apache commons http futures.
>
> Has anyone encountered such a situation? what can we do to prevent
> misbehaving nodes from bringing down the entire cluster?
>
> Cheers,
> Avishai
>
Solr memory usage off-heap
Hi, My Solr instances are configured with a 10GB heap (Xmx), but Linux shows a resident size of 16-20GB. Even with thread stacks and permgen taken into account, I'm still far off from these numbers. Could it be that JVM I/O buffers take so much space? Does Lucene use JNI/JNA memory allocations?
Re: Solr memory usage off-heap
aha! mmap explains it. thank you. On Tue, Mar 18, 2014 at 3:11 PM, Shawn Heisey wrote: > On 3/18/2014 5:30 AM, Avishai Ish-Shalom wrote: > > My solr instances are configured with 10GB heap (Xmx) but linux shows > > resident size of 16-20GB. even with thread stack and permgen taken into > > account i'm still far off from these numbers. Could it be that jvm IO > > buffers take so much space? does lucene use JNI/JNA memory allocations? > > Solr does not do anything off-heap. There is a project called > heliosearch underway that aims to use off-heap memory extensively with > Solr. > > There IS some mis-reporting of memory usage, though. See a screenshot > that I just captured of top output, sorted by memory usage. The java > process at the top of the list is Solr, running under the included Jetty: > > https://www.dropbox.com/s/03a3pp510mrtixo/solr-ram-usage-wrong.png > > I have a 6GB heap and 52GB of index data on this server. This makes the > 62.2GB virtual memory size completely reasonable. The claimed resident > memory size is 20GB, though. If you add that 20GB to the 49GB that is > allocated to the OS disk cache and the 6GB that it says is free, that's > 75GB. I've only got 64GB of RAM on the box, so something is being > reported wrong. > > If I take my 20GB resident size and subtract the 14GB shared size, that > is closer to reality, and it makes the numbers fit into the actual > amount of RAM that's on the machine. I believe the misreporting is > caused by the specific way that Java uses MMap when opening Lucene > indexes. This information comes from what I remember about a > conversation I witnessed in #lucene or #lucene-dev, not from my own > exploration. I believe they said that the MMap methods which don't > misreport memory usage would not do what Lucene requires. > > Thanks, > Shawn > >
Re: Solr memory usage off-heap
thanks! On Tue, Mar 18, 2014 at 4:37 PM, Erick Erickson wrote: > Avishai: > > It sounds like you already understand mmap. Even so you might be > interested in this excellent writeup of MMapDirectory and Lucene by > Uwe: > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html > > Best, > Erick > > On Tue, Mar 18, 2014 at 7:23 AM, Avishai Ish-Shalom > wrote: > > aha! mmap explains it. thank you. > > > > > > On Tue, Mar 18, 2014 at 3:11 PM, Shawn Heisey wrote: > > > >> On 3/18/2014 5:30 AM, Avishai Ish-Shalom wrote: > >> > My solr instances are configured with 10GB heap (Xmx) but linux shows > >> > resident size of 16-20GB. even with thread stack and permgen taken > into > >> > account i'm still far off from these numbers. Could it be that jvm IO > >> > buffers take so much space? does lucene use JNI/JNA memory > allocations? > >> > >> Solr does not do anything off-heap. There is a project called > >> heliosearch underway that aims to use off-heap memory extensively with > >> Solr. > >> > >> There IS some mis-reporting of memory usage, though. See a screenshot > >> that I just captured of top output, sorted by memory usage. The java > >> process at the top of the list is Solr, running under the included > Jetty: > >> > >> https://www.dropbox.com/s/03a3pp510mrtixo/solr-ram-usage-wrong.png > >> > >> I have a 6GB heap and 52GB of index data on this server. This makes the > >> 62.2GB virtual memory size completely reasonable. The claimed resident > >> memory size is 20GB, though. If you add that 20GB to the 49GB that is > >> allocated to the OS disk cache and the 6GB that it says is free, that's > >> 75GB. I've only got 64GB of RAM on the box, so something is being > >> reported wrong. > >> > >> If I take my 20GB resident size and subtract the 14GB shared size, that > >> is closer to reality, and it makes the numbers fit into the actual > >> amount of RAM that's on the machine. I believe the misreporting is > >> caused by the specific way that Java uses MMap when opening Lucene > >> indexes. This information comes from what I remember about a > >> conversation I witnessed in #lucene or #lucene-dev, not from my own > >> exploration. I believe they said that the MMap methods which don't > >> misreport memory usage would not do what Lucene requires. > >> > >> Thanks, > >> Shawn > >> > >> >
hung threads and CLOSE_WAIT sockets
Hi, We've had a strange mishap with a SolrCloud cluster (version 4.5.1) where we observed high search latency. The problem appeared to develop over several hours, up to the point where the entire cluster stopped responding properly. After investigation we found that the number of threads (both Solr and Jetty) gradually rose over several hours until it hit the maximum allowed, at which point the cluster stopped responding properly. After restarting several nodes the number of threads dropped and the cluster started responding again. We examined nodes that were not restarted and found a high number of CLOSE_WAIT sockets held by the Solr process; these sockets were using a random local port and remote port 8983, meaning they were outgoing connections. A thread dump did not show a large number of Solr threads, and we were unable to determine which thread(s) were holding these sockets. Has anyone else encountered such a situation? Regards, Avishai
Re: hung threads and CLOSE_WAIT sockets
SOLR-5216 ? On Fri, Mar 7, 2014 at 12:13 AM, Mark Miller wrote: > It sounds like the distributed update deadlock issue. > > It's fixed in 4.6.1 and 4.7. > > - Mark > > http://about.me/markrmiller > > On Mar 6, 2014, at 3:10 PM, Avishai Ish-Shalom > wrote: > > > Hi, > > > > We've had a strange mishap with a solr cloud cluster (version 4.5.1) > where > > we observed high search latency. The problem appears to develop over > > several hours until such point where the entire cluster stopped > responding > > properly. > > > > After investigation we found that the number of threads (both solr and > > jetty) gradually rose over several hours until it hit a the maximum > allowed > > at which point the cluster stopped responding properly. After restarting > > several nodes the number of threads dropped and the cluster started > > responding again. > > We've examined nodes that were not restarted and found a high number of > > CLOSE_WAIT sockets held by the solr process; these sockets were using a > > random local port and 8983 remote port - meaning they were outgoing > > connections. a thread dump did not show a large number of solr threads > and > > we were unable to determine which thread(s) is holding these sockets. > > > > has anyone else encountered such a situation? > > > > Regards, > > Avishai > >
Large fields storage
Hi all, I have very large documents (as big as 1GB) which I'm indexing and planning to store in Solr in order to use highlighting snippets. I am concerned about possible performance issues with such large fields - does storing the fields require additional RAM beyond what is required to index/fetch/search? I'm assuming Solr reads only the required data by offset from the storage and not the entire field. Am I correct in this assumption? Does anyone on this list have experience to share with such large documents? Thanks, Avishai
Re: Large fields storage
The use case is not for PDFs or documents with images but for very large text documents. My question is: does storing the documents degrade performance more than just indexing without storing? I will only return highlighted text of limited length and will probably never download the entire document.

On Tue, Dec 2, 2014 at 2:15 AM, Jack Krupansky wrote:
> In particular, if they are image-intensive, all the images go away. And
> the formatting as well.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Ahmet Arslan
> Sent: Monday, December 1, 2014 6:02 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Large fields storage
>
> Hi Avi,
>
> I assume your documents are rich documents like pdf word, am I correct?
> When you extract textual content from them, their size will shrink.
>
> Ahmet
>
> On Tuesday, December 2, 2014 12:11 AM, Avishai Ish-Shalom <avis...@fewbytes.com> wrote:
> Hi all,
>
> I have very large documents (as big as 1GB) which i'm indexing and planning
> to store in Solr in order to use highlighting snippets. I am concerned
> about possible performance issues with such large fields - does storing the
> fields require additional RAM over what is required to index/fetch/search?
> I'm assuming Solr reads only the required data by offset from the storage
> and not the entire field. Am I correct in this assumption?
>
> Does anyone on this list has experience to share with such large documents?
>
> Thanks,
> Avishai
>
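For what it's worth, a hedged SolrJ sketch of the kind of request described above - returning only document ids plus highlight snippets, never the huge stored field itself. The core URL and field names are assumptions, not the real schema:

import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SnippetOnlySearch {
    public static void main(String[] args) throws Exception {
        // Assumed core URL; "content" is the assumed name of the large stored text field.
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("content:printer");
        q.setFields("id");                // do not return the large stored field itself
        q.setHighlight(true);
        q.addHighlightField("content");
        q.setHighlightSnippets(3);        // a few short snippets per document
        q.setHighlightFragsize(100);      // limit snippet length

        QueryResponse rsp = server.query(q);
        Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
        for (Map.Entry<String, Map<String, List<String>>> doc : hl.entrySet()) {
            // Print the snippets for each matching document id.
            System.out.println(doc.getKey() + " -> " + doc.getValue().get("content"));
        }
    }
}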
ReplicationHandler - SnapPull failed to download a file completely.
We are continuously getting this exception during replication from master to slave. Our index size is 9.27 GB and we are trying to replicate a slave from scratch. It's a different file each time; sometimes we get to 60% replication before it fails and sometimes only 10%, and we never managed a successful replication.

30 Oct 2013 18:38:52,884 [explicit-fetchindex-cmd] ERROR ReplicationHandler - SnapPull failed :org.apache.solr.common.SolrException: Unable to download _aa7_Lucene41_0.tim completely. Downloaded 0!=1054090
at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1244)
at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1124)
at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:719)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:397)
at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:317)
at org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:218)

I read in some thread that there was a related bug in Solr 4.1, but we are using Solr 4.3 and tried with 4.5.1 also. It seems that DirectoryFileFetcher sometimes cannot download a file; the file arrives on the slave with size zero. We are running in a test environment where bandwidth is high.

This is the master setup:
replicateAfter: commit, startup
confFiles: stopwords.txt,spellings.txt,synonyms.txt,protwords.txt,elevate.xml,currency.xml
commitReserveDuration: 00:00:50

and the slave setup:
masterUrl: http://solr-master.saltdev.sealdoc.com:8081/solr-master
15 30
Re: ReplicationHandler - SnapPull failed to download a file completely.
Shawn, thank you for your answer. For the purpose of testing this we have a test environment where we are not indexing anymore, and we also disabled the DIH delta import, so as I understand there shouldn't be any commits on the master. I also tried with commitReserveDuration set to 50:50:50 and get the same failure. I tried changing and increasing various parameters on the master and slave, but no luck yet. The master is functioning OK; we do have search results, so I assume there is no index corruption on the master side. Just to mention, we have done this many times before in the past few years; it started just now when we upgraded our Solr from version 3.6 to version 4.3 and reindexed all documents. If we have no solution soon, and this is holding up an upgrade to our production site and various customers, do you think we can copy the index directory from the master to the slave and hope that future replication will work? Thank you again. Shalom

On Wed, Oct 30, 2013 at 10:00 PM, Shawn Heisey wrote:
> On 10/30/2013 1:49 PM, Shalom Ben-Zvi Kazaz wrote:
>> we are continuously getting this exception during replication from
>> master to slave. our index size is 9.27 G and we are trying to replicate
>> a slave from scratch.
>> Its a different file each time, sometimes we get to 60% replication
>> before it fails and sometimes only 10%, we never managed a successful
>> replication.
>>
>> this is the master setup:
>> replicateAfter: commit, startup
>> confFiles: stopwords.txt,spellings.txt,synonyms.txt,protwords.txt,elevate.xml,currency.xml
>> commitReserveDuration: 00:00:50
>
> I assume that you're probably doing commits fairly often, resulting in a
> lot of merge activity that frequently deletes segments. That
> "commitReserveDuration" parameter needs to be made larger. I would imagine
> that it takes a lot more than 50 seconds to do the replication - even if
> you've got an extremely fast network, replicating 9.7GB probably takes
> several minutes.
>
> From the wiki page on replication: "If your commits are very frequent and
> network is particularly slow, you can tweak an extra attribute
> <str name="commitReserveDuration">00:00:10</str>. This is roughly the time
> taken to download 5MB from master to slave. Default is 10 secs."
>
> http://wiki.apache.org/solr/SolrReplication#Master
>
> You've said that your network is not slow, but with that much data, all
> networks are slow.
>
> Thanks,
> Shawn
>
Re: [SOLVED] ReplicationHandler - SnapPull failed to download a file completely.
Removing directory before core close: /opt/watchdox/solr-slave/data/index.20131031180837277
31 Oct 2013 18:10:40,878 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Removing from cache: CachedDir<>
31 Oct 2013 18:10:40,878 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Releasing directory: /opt/watchdox/solr-slave/data/index 1 false
31 Oct 2013 18:10:40,879 [explicit-fetchindex-cmd] ERROR ReplicationHandler - SnapPull failed :org.apache.solr.common.SolrException: Unable to download _aa7_Lucene41_0.pos completely. Downloaded 0!=1081710
at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1212)
at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1092)
at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:719)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:397)
at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:317)
at org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:218)
31 Oct 2013 18:10:40,910 [http-bio-8080-exec-8] DEBUG CachingDirectoryFactory - Reusing cached directory: CachedDir<>

So I upgraded the httpcomponents jars to their latest 4.3.x version and the problem disappeared. The httpcomponents jars, which are dependencies of solrj, were at version 4.2.x; I upgraded to httpclient-4.3.1, httpcore-4.3 and httpmime-4.3.1. I have run the replication a few times now with no problem at all; it is now working as expected. It seems that the upgrade is necessary only on the slave side, but I'm going to upgrade the master too. Thank you so much for your help. Shalom

On Thu, Oct 31, 2013 at 6:46 PM, Shawn Heisey wrote:
> On 10/31/2013 7:26 AM, Shalom Ben-Zvii Kazaz wrote:
> > Shawn, Thank you for your answer.
> > for the purpose of testing it we have a test environment where we are not
> > indexing anymore. We also disabled the DIH delta import. so as I understand
> > there shouldn't be any commits on the master.
> > I also tried with
> > 50:50:50
> > and get the same failure.
>
> If it's in an environment where there are no commits, that's really
> odd. I would suspect underlying filesystem or network issues. There's
> one problem that's not well known, but is very common - problems with
> NIC firmware, most commonly Broadcom NICs. These problems result in
> things working correctly almost all the time, but when there is a high
> network load, things break in strange ways, and the resulting errors
> rarely look like they are network-related.
>
> Most embedded NICs are either Broadcom or Realtek, both of which are
> famous for their firmware problems. Broadcom NICs are very common on
> Dell and HP servers. Upgrading the firmware (which is not usually the
> same thing as upgrading the driver) is the only fix. NICs from other
> manufacturers also have upgradable firmware, but don't usually have the
> same kind of high-profile problems as Broadcom.
>
> The NIC firmware might not have anything to do with this problem, but
> it's the only thing left that I can think of. I personally haven't used
> replication since Solr 1.4.1, but a lot of people do. I can't say that
> there's no bugs, but so far I'm not seeing the kind of problem reports
> that appear when a bug in a critical piece of the software exists.
>
> Thanks,
> Shawn
>
searching both english and japanese
Hi, We have a customer that needs support for both English and Japanese; a document can be in either language and we have no indication of the language of a document. I know I can construct a schema with both English and Japanese fields and index them with copyField. I also know I can detect the language and index only the relevant fields, but I want to support mixed-language documents, so I think I need to index into both the English and Japanese fields. We are using the standard request handler, not dismax, and we want to keep using it since our queries must run against specific fields with no errors. Queries are user entered and can be any valid query, such as q=lexmark or q=docname:lexmark AND content:printer. What I think I want is to add the Japanese fields to such a query and end up with "q=docname:lexmark OR docname_ja:lexmark" or "q=(docname:lexmark AND content:printer) OR (docname_ja:lexmark AND content_ja:printer)". Of course I cannot ask the user to do that. Also, we have only one default field and it must be Japanese or English but not both. I think the default-field problem could be solved by using dismax and specifying multiple fields with qf, but we don't use dismax. We use solrj as our client, and it would be better if I could do something on the client side and not on the Solr side. Any help/idea is appreciated.
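Not a complete answer, but since the client is solrj, a rough sketch of the client-side expansion described above (each fielded clause OR'ed over the English and Japanese variants of the field) could look like the following; the core URL, field names and the helper method are illustrative assumptions, and on SolrJ 3.x the server class would be CommonsHttpSolrServer:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.util.ClientUtils;

public class BilingualSearch {

    // Wraps a single user-entered term so it is searched in both the English field
    // and its assumed "_ja" Japanese counterpart, e.g. (docname:lexmark OR docname_ja:lexmark).
    static String expand(String field, String userTerm) {
        String escaped = ClientUtils.escapeQueryChars(userTerm);
        return "(" + field + ":" + escaped + " OR " + field + "_ja:" + escaped + ")";
    }

    public static void main(String[] args) throws Exception {
        // Assumed core URL; adjust for the real deployment.
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/core0");
        // Build the expanded query on the client side before sending it to Solr.
        String q = expand("docname", "lexmark") + " AND " + expand("content", "printer");
        QueryResponse rsp = server.query(new SolrQuery(q));
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}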
edismax behaviour with japanese
Hello, I have 'text' and 'text_ja' fields, where 'text' uses an English analyzer and 'text_ja' a Japanese analyzer; I index both with copyField from other fields. I'm trying to search both fields using edismax and the qf parameter, but I see strange behaviour from edismax. I wonder if someone can give me a hint as to what's going on and what I am doing wrong?

When I run this query I can see that Solr is searching both fields, but the text_ja: query contains only part of the text while text: contains the complete text:

http://localhost/solr/core0/select/?indent=on&rows=100&debug=query&defType=edismax&qf=text+text_ja&q=このたびは

rawquerystring: このたびは
querystring: このたびは
parsedquery: (+DisjunctionMaxQuery((text_ja:たび | text:このたびは)))/no_coord
parsedquery_toString: +(text_ja:たび | text:このたびは)
QParser: ExtendedDismaxQParser

Now, if I remove the last two characters from the query string, Solr will not search text_ja at all, at least that's what I understand from the debug output:

http://localhost/solr/core0/select/?indent=on&rows=100&debug=query&defType=edismax&qf=text+text_ja&q=このた

rawquerystring: このた
querystring: このた
parsedquery: (+DisjunctionMaxQuery((text:このた)))/no_coord
parsedquery_toString: +(text:このた)
QParser: ExtendedDismaxQParser

With another string of Japanese text, Solr now splits the query into multiple text_ja terms:

http://localhost/solr/core0/select/?indent=on&rows=100&debug=query&defType=edismax&qf=text+text_ja&q=システムをお買い求めいただき

rawquerystring: システムをお買い求めいただき
querystring: システムをお買い求めいただき
parsedquery: (+DisjunctionMaxQuery((((text_ja:システム text_ja:買い求める text_ja:いただく)~3) | text:システムをお買い求めいただき)))/no_coord
parsedquery_toString: +(((text_ja:システム text_ja:買い求める text_ja:いただく)~3) | text:システムをお買い求めいただき)
QParser: ExtendedDismaxQParser

Thank you.
filter result by numFound in Result Grouping
Hello list, In one of our searches that uses Result Grouping, we need to filter the results to only those groups that have more than one document in them, or more specifically to groups that have exactly two documents. Is this possible in some way? Thank you
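As far as I know there is no built-in parameter that filters groups by their size, so one hedged workaround is to fetch the grouped result with SolrJ and drop groups whose numFound is not 2 on the client side. Note this only sees the groups returned in the current page (rows), and the core URL and grouping field below are assumptions:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.Group;
import org.apache.solr.client.solrj.response.GroupCommand;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupsOfTwo {
    public static void main(String[] args) throws Exception {
        // Assumed core URL and grouping field; adjust to the real schema.
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("*:*");
        q.set("group", true);
        q.set("group.field", "doc");   // assumed grouping field
        q.set("group.limit", 2);       // we only need to see up to 2 docs per group
        q.set("group.ngroups", true);

        QueryResponse rsp = server.query(q);
        List<Group> pairs = new ArrayList<Group>();
        for (GroupCommand cmd : rsp.getGroupResponse().getValues()) {
            for (Group g : cmd.getValues()) {
                // numFound of the group's doc list is the total number of matches in that group.
                if (g.getResult().getNumFound() == 2) {
                    pairs.add(g);
                }
            }
        }
        System.out.println("groups with exactly two documents: " + pairs.size());
    }
}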