Re: Solr 4: Join Query
That's the way joins work, and why they're called "pseudo joins": they don't work like DB joins and don't return data from both records. Joins were put in for a specific use case; when you try to treat Solr like a DB you're bound to be disappointed. I'd think about reworking the solution to de-normalize the data so you don't have to do joins.

Best,
Erick

On Fri, Nov 30, 2012 at 10:38 AM, Vikash Sharma wrote:
> Hi All,
> I have my field definition in schema.xml like below.
>
> I need to create a separate record in Solr for each parent-child
> relationship, such that if a child is the same across different parents it
> gets stored only once.
>
> For e.g.
> ---- Record 1
> ABC
> EMP001
> DOC001
> My Parent Doc
>
> ---- Record 2
> DOC001
> My Document Data
>
> This will ensure that if any doc_id's content is a duplicate, the record is
> inserted into Solr only once.
>
> Lastly, I want the result as a join: if emp_id=EMP001, then both records
> should be returned, as there is a relationship between the two records via
> doc_id = id.
>
> If I query:
>
> http://localhost:8983/solr/select?q={!join%20from=doc_id%20to=id}emp_id:EMP001&wt=json
> <http://localhost:8983/solr/select?q={!join%20from=sha_one%20to=id}project_id:10&wt=json>
>
> I expect both records to be returned, either one after another or nested,
> but I only get the child records.
>
> Please help.
>
> Regards,
> Vikash Sharma
> vikash0...@gmail.com
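As a rough illustration of the de-normalization Erick suggests (the core URL and the content field name are made up; emp_id, doc_id and id come from the thread, and emp_id on the document record would need to be multiValued in the schema): if the shared document record also carries the emp_id of every parent that references it, a plain query on emp_id returns both records without {!join}. A minimal SolrJ sketch:

import java.util.Arrays;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DenormalizedIndexer {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at your core.
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

        // Parent record: the employee, carrying the id of the document it refers to.
        SolrInputDocument emp = new SolrInputDocument();
        emp.addField("id", "EMP001");          // unique key
        emp.addField("emp_id", "EMP001");
        emp.addField("doc_id", "DOC001");

        // Instead of relying on {!join}, copy the parent's emp_id onto the shared
        // document record as well (repeat for every parent that references it).
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "DOC001");
        doc.addField("emp_id", "EMP001");
        doc.addField("content", "My Document Data");   // hypothetical field name

        solr.add(Arrays.asList(emp, doc));
        solr.commit();

        // A plain q=emp_id:EMP001 now returns both records in one query, no join needed.
    }
}

The trade-off is the usual one for denormalization: the document record has to be re-indexed whenever a new parent starts referencing it.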
Re: Replication happening before replicateAfter event
First comment: you probably don't need to optimize. Despite its name, it rarely makes a difference and has several downsides; in particular, it'll make replication copy the entire index rather than just the changed segments. Optimize purges leftover data from docs that have been deleted, which will happen anyway on segment merges.

But your problem isn't really a problem, I don't think. I think you're confusing special events and polling. When you set "replicateAfter" to "startup" and "optimize", you're really telling the slaves to update when any of those events fire _in addition to_ any replication that happens due to polling.

So when you optimize, a couple of things happen:
1> all unclosed segments are closed.
2> segments are merged.

If the poll happens between 1 and 2, you'll get an index replication. Then you'll get another after the optimize. Ditto on autocommits. An autocommit closes the open segments; as soon as a poll sees that, the new segments are pulled down. The intent is for polling to pull down all the changes it can every time; that's just the way it's designed.

So you have a couple of choices:
1> use the HTTP API to disable replication, then enable it when you want.
2> turn off autocommit and don't commit during indexing at all until the very end. No commit == no replication.
3> but even if you do <2>, you still might get a replication after commit and after optimize.

If you insist on optimizing, you're probably stuck with <1>. But I'd really think twice about the optimize bit.

Best,
Erick

On Fri, Nov 30, 2012 at 7:25 AM, Duncan Irvine wrote:
> Hi All,
> I'm a bit new to the whole Solr world and am having a slight problem with
> replication. I'm attempting to configure a master/slave scenario with bulk
> updates happening periodically. I'd like to insert a large batch of docs to
> the master, then invoke an optimize and have it only then replicate to the
> slave.
>
> At present I can create the master index, which seems to go to plan.
> Watching the updateHandler, I see records being added, indexed and
> auto-committed every so often. If I query the master while I am inserting,
> and auto-commits have happened, I see 0 records. Then, when I commit at the
> end, they all appear at once. This is as I'd expect.
>
> What doesn't seem to be working right is that I've configured replication
> to "replicateAfter" "startup" and "optimize" with a pollInterval of 60s;
> however the slave is replicating and serving the "uncommitted" data
> (although presumably post-auto-commit).
>
> According to my master, I have:
>   Version: 0
>   Gen: 1
>   Size: 1.53GB
>   replicateAfter: optimize, startup
>
> And, at present, my slave says:
> Master:
>   Version: 0
>   Gen: 1
>   Size: 1.53GB
> Slave:
>   Version: 1354275651817
>   Gen: 52
>   Size: 1.39GB
>
> Which is a bit odd.
> If I query the slave, I get results, and as the slave polls I gradually get
> more and more.
>
> Obviously, I can disable polling and enable it programmatically once I'm
> ready, but I was hoping to avoid that.
>
> Does anyone have any thoughts?
>
> Cheers,
> Duncan.
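A sketch of option <1> driven from the indexing job, for anyone who wants to script it. The slave URL is a placeholder, it assumes the stock ReplicationHandler disablepoll/enablepoll commands, and the same commands can be issued with plain HTTP against /replication?command=... instead of SolrJ:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class ReplicationControl {

    // Hypothetical slave URL; adjust to your setup.
    private static final String SLAVE_URL = "http://slave-host:8983/solr";

    private static void replicationCommand(SolrServer server, String command) throws Exception {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("command", command);
        QueryRequest req = new QueryRequest(params);
        req.setPath("/replication");   // send the command to the ReplicationHandler
        server.request(req);
    }

    public static void main(String[] args) throws Exception {
        SolrServer slave = new HttpSolrServer(SLAVE_URL);

        // Stop the slave from polling before the bulk load starts ...
        replicationCommand(slave, "disablepoll");

        // ... run the bulk indexing plus the final commit/optimize on the master here ...

        // ... then let the slave poll again once the index is in its final state.
        replicationCommand(slave, "enablepoll");
    }
}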
NPE Exception when adding documents without UUID
Hi Solr users,

when adding documents with SolrJ using the BinaryRequestWriter I get NPEs (see attached stacktrace). The documents I add do not have the unique key field initialized, but the schema declares this field as a UUID. This issue seems to be related to https://issues.apache.org/jira/browse/SOLR-2615

This does not happen when I use the "normal" XML-based request writer. I am using Solr 3.6.1.

Any suggestions on how I can work around this?

Thanks for your help,
Leander

-
[#|2012-11-30T19:34:49.763+0100|SEVERE|sun-appserver2.1|org.apache.solr.handler.XmlUpdateRequestHandler|_ThreadID=39;_ThreadName=httpSSLWorkerThread-8080-9;_RequestID=6708df5d-c255-4c52-b89f-4706d95f0598;|Exception while processing update request
java.lang.NullPointerException
  at org.apache.solr.update.AddUpdateCommand.getPrintableId(AddUpdateCommand.java:102)
  at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:109)
  at org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:89)
  at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
  at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readIterator(JavaBinUpdateRequestCodec.java:129)
  at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:211)
  at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readNamedList(JavaBinUpdateRequestCodec.java:114)
  at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:176)
  at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:102)
  at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:150)
  at org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:99)
  at org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:46)
  at org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:57)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)

--
Mario-Leander Reimer
Softwarearchitekt

QAware GmbH
Aschauer Str. 32
81549 München, Germany
Tel +49 89 6008871-21
Mobil +49 151 61314748
Fax +49 89 6008871-29
mario-leander.rei...@qaware.de
www.qaware.de
--
Geschäftsführer: Christian Kamm, Bernd Schlüter, Johannes Weigend, Josef Adersberger
Registergericht: München
Handelsregisternummer: HRB 163761
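One possible client-side workaround, sketched below with SolrJ (the field name "title" and the URL are placeholders, and this sidesteps SOLR-2615 rather than fixing it): generate the unique key in the client before sending, so the binary update path never sees a document without an id and getPrintableId never hits the null.

import java.util.UUID;
import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AddWithClientSideUuid {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at your core.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        solr.setRequestWriter(new BinaryRequestWriter());   // keep the binary update format

        SolrInputDocument doc = new SolrInputDocument();
        // Workaround: set the unique key on the client instead of relying on the
        // schema default, so the document always arrives with an id.
        doc.addField("id", UUID.randomUUID().toString());
        doc.addField("title", "some document");             // hypothetical field

        solr.add(doc);
        solr.commit();
    }
}

The cost of this approach is that the UUIDs are now minted by each client rather than centrally by Solr, which is usually acceptable since random UUIDs don't collide in practice.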
A (seemingly) unavoidable bump in qtimes shortly after replication ends
I'm working on performance tuning / testing a Solr 4 cluster. The slaves are handling all the queries and they are currently replicating from the master every minute. Right now, for my load test, I'm playing back random queries from a set of about 5M queries harvested from our existing production servers.

Whenever the replication ends and the new searcher gets used, there is a jump in QTimes. The QTimes jump briefly, but ONLY after replication. I tried adding in some warming queries to create facets etc., and I analyzed the queries and they all seem relatively similar (use a large number of facets etc.), but even when I have converted those exact queries over to the format that is in the logs, it doesn't seem to eliminate the "post-repl bump" of qtimes jumping from single digits to 500-800ms.

Here's an example of what I'm talking about.

-- replication ends (message about deleting temporary index in the logs)

date         | qtime
21:50:45.159 | 0
21:50:45.170 | 1
21:50:45.174 | 0
21:50:45.182 | 2
21:50:45.185 | 78
21:50:45.194 | 1
21:50:45.201 | 69
21:50:45.206 | 0
21:50:45.211 | 0
21:50:45.219 | 0
21:50:45.286 | 1
21:50:45.288 | 0
21:50:45.301 | 30
21:50:45.317 | 15
21:50:45.327 | 2
21:50:45.327 | 1
21:50:45.334 | 0
21:50:45.337 | 1
21:50:45.345 | 1
21:50:45.347 | 2
21:50:45.392 | 19
21:50:45.415 | 47
21:50:45.428 | 1
21:50:45.438 | 2
21:50:45.453 | 1
21:50:45.468 | 3
21:50:45.507 | 4
21:50:45.551 | 1
21:50:45.617 | 92
21:50:45.617 | 251
21:50:45.619 | 457
21:50:45.632 | 1
21:50:45.731 | 500
21:50:45.731 | 437
21:50:45.731 | 526
21:50:45.731 | 514
21:50:45.731 | 354
21:50:45.731 | 531
21:50:45.731 | 525
21:50:45.731 | 525
21:50:45.732 | 502
21:50:45.732 | 452
21:50:45.732 | 278
21:50:45.732 | 527
21:50:45.732 | 270
21:50:45.732 | 576
21:50:45.733 | 221
21:50:45.733 | 225
21:50:45.734 | 265
21:50:45.735 | 370
21:50:45.737 | 551
21:50:45.737 | 517
21:50:45.738 | 440
21:50:45.738 | 477
21:50:45.738 | 43
21:50:45.738 | 299
21:50:45.739 | 541
21:50:45.825 | 1
21:50:45.838 | 5
21:50:45.848 | 0
21:50:45.852 | 0
21:50:45.859 | 19
21:50:45.875 | 6
21:50:45.876 | 7
21:50:45.881 | 0
21:50:45.883 | 14
21:50:45.886 | 1
21:50:45.890 | 4
21:50:45.891 | 3
21:50:45.894 | 1
21:50:45.902 | 1
21:50:45.906 | 4
21:50:45.908 | 3
21:50:45.918 | 18
21:50:45.921 | 10

The DocumentCache in our case is not very useful because of our 1-minute replication pattern. I have it sized to 1024 elements now. When I've tried to increase the size, it caused our GC pause times to skyrocket. (Currently it's tuned so it has a 250ms GC pause roughly every 16 seconds, and I've verified that the above QTime bump is not due to GC activity.)

Is there something I can do to help with these (short of increasing the replication interval to mitigate the impact of these bumps)?

I guess the real question I have is: "Why do queries get faster a second or so after replication? How can I try to get that to happen as part of the newSearcher warming?" Like I said, I've copied some of those slow queries and put them into the newSearcher warming section to see if "well, maybe running through a few dozen of these searches is what makes it get faster", but that hasn't helped.
Our index file is stored on disk, but the OS basically has it all cached in RAM (I tested moving it to tmpfs but saw no improvement in speed, so I went back to putting it on disk), and the CPU is nowhere near taxed (this machine has 24 cores).

So far the performance of Solr has been stellar, and once we finish tuning this we'll write up how we've tuned it and how we're using it, to share more widely with anyone who cares. But the one perplexing thing is this bump in query times after replication.

Any thoughts would be appreciated.

Ryan
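For what it's worth, a rough SolrJ probe one could use to test the "fast again a second later" observation: fire a handful of heavy, facet-laden queries at the slave right after the replication-finished log line and compare the QTime of the first few runs against the later ones. The facet field names and the slave URL below are placeholders, not taken from Ryan's setup.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PostReplicationWarmProbe {
    public static void main(String[] args) throws Exception {
        // Placeholder slave URL.
        SolrServer slave = new HttpSolrServer("http://slave-host:8983/solr");

        // A heavy, facet-laden query of the kind pulled from the production logs.
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetField("category", "brand", "price_range");   // hypothetical facet fields
        q.setFacetMinCount(1);
        q.setRows(10);

        // Run it a few times right after the "deleting temporary index" log line
        // and compare the QTime of the first hit against the following ones.
        for (int i = 0; i < 5; i++) {
            QueryResponse rsp = slave.query(q);
            System.out.println("run " + i + " QTime=" + rsp.getQTime() + "ms");
        }
    }
}

If the first run carries the 500-800ms penalty and the rest don't, that points at per-searcher cache population rather than GC or I/O, which is what the newSearcher warming queries are supposed to cover.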
Re: SolrCloud(5x) - Errors while recovering
FYI, I fixed this 5x issue a few days ago.

- Mark

On Nov 27, 2012, at 10:57 AM, Mark Miller wrote:
> Someone else has been seeing this on 5x as well - there must be a bug in the
> new file handling code (which is why it's still baking in 5x and not on 4x
> yet). I tried to trigger it in tests a while back, but had no luck in the
> brief time I had. I'll try some manual tests when I get a chance, as well as a
> little code review. Something is off.
>
> - Mark
>
> On Nov 26, 2012, at 10:58 PM, deniz wrote:
>
>> Here is briefly what is happening:
>>
>> I have a simple SolrCloud environment for test purposes, running with a
>> zookeeper ensemble, not the ones embedded in Solr.
>>
>> I have 3 instances in the cloud, all of them using RAMDirectory (which
>> the new Solr release allows with the cloud).
>>
>> After running the zookeepers and connecting my Solrs to them, the cloud is up
>> without any errors or problems. Then I started indexing (which is much
>> slower than on a single instance, I will open a topic about that too) and
>> everything is okay once again; all of the nodes get the sync'ed data from
>> the leader node.
>>
>> After that I killed one Solr instance. Then I restarted it, and in
>> the logs it keeps showing me these errors:
>>
>> SEVERE: Error while trying to recover:org.apache.solr.common.SolrException:
>> Server at http://myhost:8995/solr/mycore returned non ok status:500,
>> message:Server Error
>>   at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
>>   at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>   at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>>   at org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:182)
>>   at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:134)
>>   at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
>>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>> ...
>>
>> Nov 27, 2012 11:49:04 AM org.apache.solr.handler.SnapPuller$DirectoryFileFetcher fetchPackets
>> WARNING: Error in fetching packets
>> java.io.EOFException
>>   at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:151)
>>   at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:144)
>>   at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchPackets(SnapPuller.java:1143)
>>   at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1107)
>>   at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:716)
>>   at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:387)
>>   at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:273)
>>   at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:152)
>>   at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
>>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>> ...
>>
>> SEVERE: SnapPull failed :org.apache.solr.common.SolrException: Unable to
>> download _41y.fdt completely. Downloaded 3145728!=3243906
>>   at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1237)
>>   at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1118)
>>   at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:716)
>>   at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:387)
>>   at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:273)
>>   at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:152)
>>   at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
>>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>>
>> SEVERE: Error while trying to recover:org.apache.solr.common.SolrException:
>> Replication for recovery failed.
>>   at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:155)
>>   at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
>>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>>
>> Can anyone explain why I am getting this error?
>>
>> -----
>> Zeki ama calismiyor... Calissa yapar...
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/SolrCloud-5x-Errors-while-recovering-tp4022542.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
how to do a range search not on ordered data (text type)
Hi,

I'm building a Solr install which has a blurb of data in a field "description". In that field there are sentences such as "This property has a block size of 770sqm." or "1200sqm block blah blah blah". It's a text field, obviously.

How can I construct a search as follows? Someone wants to search for properties:
- with a block size over 900sqm
- with a block size under 1200sqm
- with a block size of between 550 and 1500sqm

It's essentially a text string, but can you range values in text somehow?

Thanks in advance.

--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-do-a-range-search-not-on-ordered-data-text-type-tp4023761.html
Sent from the Solr - User mailing list archive at Nabble.com.
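One common way to handle this kind of requirement, sketched below with made-up field names and URL: you can't range-query numbers buried in free text, so extract the number from the description at index time (client-side here, with a regex) into a numeric field, then use an ordinary numeric range query against that field. The block_size_i name assumes an integer field or a *_i dynamic field in the schema.

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BlockSizeExtraction {
    // Pull "770sqm" / "1200 sqm" style values out of the free-text description.
    private static final Pattern SQM = Pattern.compile("(\\d+)\\s*sqm", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at your core.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

        String description = "This property has a block size of 770sqm.";
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "property-1");          // hypothetical unique key
        doc.addField("description", description);

        Matcher m = SQM.matcher(description);
        if (m.find()) {
            // Index the extracted value into a numeric field alongside the text.
            doc.addField("block_size_i", Integer.parseInt(m.group(1)));
        }
        solr.add(doc);
        solr.commit();

        // Ordinary numeric range queries now work:
        solr.query(new SolrQuery("block_size_i:[900 TO *]"));    // over 900 sqm
        solr.query(new SolrQuery("block_size_i:[* TO 1200]"));   // under 1200 sqm
        solr.query(new SolrQuery("block_size_i:[550 TO 1500]")); // between 550 and 1500 sqm
    }
}

The same extraction could also be done inside Solr (e.g. in an update processor or during DIH import); the point is that the range query has to run against a real numeric field, not the analyzed text.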