Re: Solr 4: Join Query
That's the way joins work, and why they're called "pseudo joins": they don't work like DB joins and don't return data from both records. Joins were put in for a specific use case; when you try to treat Solr like a DB you're bound to be disappointed. I'd think about reworking the solution to de-normalize the data so you don't have to do joins.

Best,
Erick

On Fri, Nov 30, 2012 at 10:38 AM, Vikash Sharma wrote:
> Hi All,
> I have my field definition in schema.xml like below.
>
> I need to create a separate record in Solr for each parent-child
> relationship, such that if a child is the same across different parents it
> gets stored only once.
>
> For e.g.
> ---- Record 1
> ABC
> EMP001
> DOC001
> My Parent Doc
>
> ---- Record 2
> DOC001
> My Document Data
>
> This will ensure that if any doc_id's content is a duplicate, the record is
> inserted into Solr only once.
>
> Lastly, I want the result as a join: if emp_id=EMP001, then both records
> should be returned, as there is a relationship between the two records via
> doc_id = id.
>
> If I query:
>
> http://localhost:8983/solr/select?q={!join%20from=doc_id%20to=id}emp_id:EMP001&wt=json
> <http://localhost:8983/solr/select?q={!join%20from=sha_one%20to=id}project_id:10&wt=json>
>
> I expect both records to be returned, either one after another or nested,
> but I only get the child records.
>
> Please help.
>
> Regards,
> Vikash Sharma
> vikash0...@gmail.com
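As a rough illustration of the de-normalization Erick suggests (the core URL and the content field name are made up; emp_id, doc_id and id come from the thread, and emp_id on the document record would need to be multiValued in the schema): if the shared document record also carries the emp_id of every parent that references it, a plain query on emp_id returns both records without {!join}. A minimal SolrJ sketch:

import java.util.Arrays;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DenormalizedIndexer {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at your core.
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

        // Parent record: the employee, carrying the id of the document it refers to.
        SolrInputDocument emp = new SolrInputDocument();
        emp.addField("id", "EMP001");          // unique key
        emp.addField("emp_id", "EMP001");
        emp.addField("doc_id", "DOC001");

        // Instead of relying on {!join}, copy the parent's emp_id onto the shared
        // document record as well (repeat for every parent that references it).
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "DOC001");
        doc.addField("emp_id", "EMP001");
        doc.addField("content", "My Document Data");   // hypothetical field name

        solr.add(Arrays.asList(emp, doc));
        solr.commit();

        // A plain q=emp_id:EMP001 now returns both records in one query, no join needed.
    }
}

The trade-off is the usual one for denormalization: the document record has to be re-indexed whenever a new parent starts referencing it.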
Re: Replication happening before replicateAfter event
First comment: you probably don't need to optimize. Despite its name, it rarely makes a difference and has several downsides; in particular, it'll make replication copy the entire index rather than just the changed segments. Optimize purges leftover data from docs that have been deleted, which will happen anyway on segment merges.

But your problem isn't really a problem, I don't think. I think you're confusing special events and polling. When you set "replicateAfter" to "startup" and "optimize", you're really telling the slaves to update when any of those events fire _in addition to_ any replication that happens due to polling.

So when you optimize, a couple of things happen:
1> all unclosed segments are closed.
2> segments are merged.

If the poll happens between 1 and 2, you'll get an index replication. Then you'll get another after the optimize. Ditto on autocommits. An autocommit closes the open segments; as soon as a poll sees that, the new segments are pulled down. The intent is for polling to pull down all the changes it can every time; that's just the way it's designed.

So you have a couple of choices:
1> use the HTTP API to disable replication, then enable it when you want.
2> turn off autocommit and don't commit during indexing at all until the very end. No commit == no replication.
3> but even if you do <2>, you still might get a replication after commit and after optimize.

If you insist on optimizing, you're probably stuck with <1>. But I'd really think twice about the optimize bit.

Best,
Erick

On Fri, Nov 30, 2012 at 7:25 AM, Duncan Irvine wrote:
> Hi All,
> I'm a bit new to the whole Solr world and am having a slight problem with
> replication. I'm attempting to configure a master/slave scenario with bulk
> updates happening periodically. I'd like to insert a large batch of docs to
> the master, then invoke an optimize and have it only then replicate to the
> slave.
>
> At present I can create the master index, which seems to go to plan.
> Watching the updateHandler, I see records being added, indexed and
> auto-committed every so often. If I query the master while I am inserting,
> and auto-commits have happened, I see 0 records. Then, when I commit at the
> end, they all appear at once. This is as I'd expect.
>
> What doesn't seem to be working right is that I've configured replication
> to "replicateAfter" "startup" and "optimize" with a pollInterval of 60s;
> however the slave is replicating and serving the "uncommitted" data
> (although presumably post-auto-commit).
>
> According to my master, I have:
>   Version: 0
>   Gen: 1
>   Size: 1.53GB
>   replicateAfter: optimize, startup
>
> And, at present, my slave says:
> Master:
>   Version: 0
>   Gen: 1
>   Size: 1.53GB
> Slave:
>   Version: 1354275651817
>   Gen: 52
>   Size: 1.39GB
>
> Which is a bit odd.
> If I query the slave, I get results, and as the slave polls I gradually get
> more and more.
>
> Obviously, I can disable polling and enable it programmatically once I'm
> ready, but I was hoping to avoid that.
>
> Does anyone have any thoughts?
>
> Cheers,
> Duncan.
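A sketch of option <1> driven from the indexing job, for anyone who wants to script it. The slave URL is a placeholder, it assumes the stock ReplicationHandler disablepoll/enablepoll commands, and the same commands can be issued with plain HTTP against /replication?command=... instead of SolrJ:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class ReplicationControl {

    // Hypothetical slave URL; adjust to your setup.
    private static final String SLAVE_URL = "http://slave-host:8983/solr";

    private static void replicationCommand(SolrServer server, String command) throws Exception {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("command", command);
        QueryRequest req = new QueryRequest(params);
        req.setPath("/replication");   // send the command to the ReplicationHandler
        server.request(req);
    }

    public static void main(String[] args) throws Exception {
        SolrServer slave = new HttpSolrServer(SLAVE_URL);

        // Stop the slave from polling before the bulk load starts ...
        replicationCommand(slave, "disablepoll");

        // ... run the bulk indexing plus the final commit/optimize on the master here ...

        // ... then let the slave poll again once the index is in its final state.
        replicationCommand(slave, "enablepoll");
    }
}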
NPE Exception when adding documents without UUID
Hi Solr users,

when adding documents with SolrJ using the BinaryRequestWriter I get NPEs (see attached stacktrace). The documents I add do not have the unique key field initialized, but the schema declares this field as a UUID. This issue seems to be related to https://issues.apache.org/jira/browse/SOLR-2615

This does not happen when I use the "normal" XML-based request writer. I am using Solr 3.6.1.

Any suggestions on how I can work around this?

Thanks for your help,
Leander

-
[#|2012-11-30T19:34:49.763+0100|SEVERE|sun-appserver2.1|org.apache.solr.handler.XmlUpdateRequestHandler|_ThreadID=39;_ThreadName=httpSSLWorkerThread-8080-9;_RequestID=6708df5d-c255-4c52-b89f-4706d95f0598;|Exception while processing update request
java.lang.NullPointerException
  at org.apache.solr.update.AddUpdateCommand.getPrintableId(AddUpdateCommand.java:102)
  at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:109)
  at org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:89)
  at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
  at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readIterator(JavaBinUpdateRequestCodec.java:129)
  at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:211)
  at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readNamedList(JavaBinUpdateRequestCodec.java:114)
  at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:176)
  at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:102)
  at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:150)
  at org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:99)
  at org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:46)
  at org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:57)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)

--
Mario-Leander Reimer
Softwarearchitekt

QAware GmbH
Aschauer Str. 32
81549 München, Germany
Tel +49 89 6008871-21
Mobil +49 151 61314748
Fax +49 89 6008871-29
mario-leander.rei...@qaware.de
www.qaware.de
--
Geschäftsführer: Christian Kamm, Bernd Schlüter, Johannes Weigend, Josef Adersberger
Registergericht: München
Handelsregisternummer: HRB 163761
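One possible client-side workaround, sketched below with SolrJ (the field name "title" and the URL are placeholders, and this sidesteps SOLR-2615 rather than fixing it): generate the unique key in the client before sending, so the binary update path never sees a document without an id and getPrintableId never hits the null.

import java.util.UUID;
import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AddWithClientSideUuid {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at your core.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        solr.setRequestWriter(new BinaryRequestWriter());   // keep the binary update format

        SolrInputDocument doc = new SolrInputDocument();
        // Workaround: set the unique key on the client instead of relying on the
        // schema default, so the document always arrives with an id.
        doc.addField("id", UUID.randomUUID().toString());
        doc.addField("title", "some document");             // hypothetical field

        solr.add(doc);
        solr.commit();
    }
}

The cost of this approach is that the UUIDs are now minted by each client rather than centrally by Solr, which is usually acceptable since random UUIDs don't collide in practice.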
A (seemingly) unavoidable bump in qtimes shortly after replication ends
I'm working on performance tuning / testing a Solr 4 cluster. The slaves are handling all the queries and they are currently replicating from the master every minute. Right now, for my load test, I'm playing back random queries from a set of about 5M queries harvested from our existing production servers.

Whenever the replication ends and the new searcher gets used, there is a jump in QTimes. The QTimes jump briefly, but ONLY after replication. I tried adding in some warming queries to create facets etc., and I analyzed the queries and they all seem relatively similar (use a large number of facets etc.), but even when I have converted those exact queries over to the format that is in the logs, it doesn't seem to eliminate the "post-repl bump" of qtimes jumping from single digits to 500-800ms.

Here's an example of what I'm talking about.

-- replication ends (message about deleting temporary index in the logs)

date         | qtime
21:50:45.159 | 0
21:50:45.170 | 1
21:50:45.174 | 0
21:50:45.182 | 2
21:50:45.185 | 78
21:50:45.194 | 1
21:50:45.201 | 69
21:50:45.206 | 0
21:50:45.211 | 0
21:50:45.219 | 0
21:50:45.286 | 1
21:50:45.288 | 0
21:50:45.301 | 30
21:50:45.317 | 15
21:50:45.327 | 2
21:50:45.327 | 1
21:50:45.334 | 0
21:50:45.337 | 1
21:50:45.345 | 1
21:50:45.347 | 2
21:50:45.392 | 19
21:50:45.415 | 47
21:50:45.428 | 1
21:50:45.438 | 2
21:50:45.453 | 1
21:50:45.468 | 3
21:50:45.507 | 4
21:50:45.551 | 1
21:50:45.617 | 92
21:50:45.617 | 251
21:50:45.619 | 457
21:50:45.632 | 1
21:50:45.731 | 500
21:50:45.731 | 437
21:50:45.731 | 526
21:50:45.731 | 514
21:50:45.731 | 354
21:50:45.731 | 531
21:50:45.731 | 525
21:50:45.731 | 525
21:50:45.732 | 502
21:50:45.732 | 452
21:50:45.732 | 278
21:50:45.732 | 527
21:50:45.732 | 270
21:50:45.732 | 576
21:50:45.733 | 221
21:50:45.733 | 225
21:50:45.734 | 265
21:50:45.735 | 370
21:50:45.737 | 551
21:50:45.737 | 517
21:50:45.738 | 440
21:50:45.738 | 477
21:50:45.738 | 43
21:50:45.738 | 299
21:50:45.739 | 541
21:50:45.825 | 1
21:50:45.838 | 5
21:50:45.848 | 0
21:50:45.852 | 0
21:50:45.859 | 19
21:50:45.875 | 6
21:50:45.876 | 7
21:50:45.881 | 0
21:50:45.883 | 14
21:50:45.886 | 1
21:50:45.890 | 4
21:50:45.891 | 3
21:50:45.894 | 1
21:50:45.902 | 1
21:50:45.906 | 4
21:50:45.908 | 3
21:50:45.918 | 18
21:50:45.921 | 10

The DocumentCache in our case is not very useful because of our 1-minute replication pattern. I have it sized to 1024 elements now. When I've tried to increase the size, it caused our GC pause times to skyrocket. (Currently it's tuned so it has a 250ms GC pause roughly every 16 seconds, and I've verified that the above QTime bump is not due to GC activity.)

Is there something I can do to help with these (short of increasing the replication interval to mitigate the impact of these bumps)?

I guess the real question I have is: "Why do queries get faster a second or so after replication? How can I try to get that to happen as part of the newSearcher warming?" Like I said, I've copied some of those slow queries and put them into the newSearcher warming section to see if "well, maybe running through a few dozen of these searches is what makes it get faster", but that hasn't helped.
Our index file is stored on disk, but the OS basically has it all cached in RAM (I tested moving it to tmpfs but saw no improvement in speed, so I went back to putting it on disk), and the CPU is nowhere near taxed (this machine has 24 cores).

So far the performance of Solr has been stellar, and once we finish tuning this we'll write up how we've tuned it and how we're using it, to share more widely with anyone who cares. But the one perplexing thing is this bump in query times after replication.

Any thoughts would be appreciated.

Ryan
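For what it's worth, a rough SolrJ probe one could use to test the "fast again a second later" observation: fire a handful of heavy, facet-laden queries at the slave right after the replication-finished log line and compare the QTime of the first few runs against the later ones. The facet field names and the slave URL below are placeholders, not taken from Ryan's setup.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PostReplicationWarmProbe {
    public static void main(String[] args) throws Exception {
        // Placeholder slave URL.
        SolrServer slave = new HttpSolrServer("http://slave-host:8983/solr");

        // A heavy, facet-laden query of the kind pulled from the production logs.
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetField("category", "brand", "price_range");   // hypothetical facet fields
        q.setFacetMinCount(1);
        q.setRows(10);

        // Run it a few times right after the "deleting temporary index" log line
        // and compare the QTime of the first hit against the following ones.
        for (int i = 0; i < 5; i++) {
            QueryResponse rsp = slave.query(q);
            System.out.println("run " + i + " QTime=" + rsp.getQTime() + "ms");
        }
    }
}

If the first run carries the 500-800ms penalty and the rest don't, that points at per-searcher cache population rather than GC or I/O, which is what the newSearcher warming queries are supposed to cover.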
Re: SolrCloud(5x) - Errors while recovering
FYI, I fixed this 5x issue a few days ago.

- Mark

On Nov 27, 2012, at 10:57 AM, Mark Miller wrote:
> Someone else has been seeing this on 5x as well - there must be a bug in the
> new file handling code (which is why it's still baking in 5x and not on 4x
> yet). I tried to trigger it in tests a while back, but had no luck in the
> brief time I had. I'll try some manual tests when I get a chance, as well as a
> little code review. Something is off.
>
> - Mark
>
> On Nov 26, 2012, at 10:58 PM, deniz wrote:
>
>> Here is briefly what is happening:
>>
>> I have a simple SolrCloud environment for test purposes, running with a
>> zookeeper ensemble, not the ones embedded in Solr.
>>
>> I have 3 instances in the cloud, all of them using RAMDirectory (which
>> the new Solr release allows with the cloud).
>>
>> After running the zookeepers and connecting my Solrs to them, the cloud is up
>> without any errors or problems. Then I started indexing (which is much
>> slower than on a single instance, I will open a topic about that too) and
>> everything is okay once again; all of the nodes get the sync'ed data from
>> the leader node.
>>
>> After that I killed one Solr instance. Then I restarted it, and in
>> the logs it keeps showing me these errors:
>>
>> SEVERE: Error while trying to recover:org.apache.solr.common.SolrException:
>> Server at http://myhost:8995/solr/mycore returned non ok status:500,
>> message:Server Error
>>   at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
>>   at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>   at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>>   at org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:182)
>>   at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:134)
>>   at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
>>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>> ...
>>
>> Nov 27, 2012 11:49:04 AM org.apache.solr.handler.SnapPuller$DirectoryFileFetcher fetchPackets
>> WARNING: Error in fetching packets
>> java.io.EOFException
>>   at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:151)
>>   at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:144)
>>   at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchPackets(SnapPuller.java:1143)
>>   at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1107)
>>   at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:716)
>>   at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:387)
>>   at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:273)
>>   at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:152)
>>   at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
>>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>> ...
>>
>> SEVERE: SnapPull failed :org.apache.solr.common.SolrException: Unable to
>> download _41y.fdt completely. Downloaded 3145728!=3243906
>>   at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1237)
>>   at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1118)
>>   at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:716)
>>   at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:387)
>>   at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:273)
>>   at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:152)
>>   at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
>>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>>
>> SEVERE: Error while trying to recover:org.apache.solr.common.SolrException:
>> Replication for recovery failed.
>>   at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:155)
>>   at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
>>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>>
>> Can anyone explain why I am getting this error?
>>
>> -----
>> Zeki ama calismiyor... Calissa yapar...
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/SolrCloud-5x-Errors-while-recovering-tp4022542.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
how to do a range search not on ordered data (text type)
Hi,

I'm building a Solr install which has a blurb of data in a field "description". In that field there are sentences such as "This property has a block size of 770sqm." or "1200sqm block blah blah blah". It's a text field, obviously.

How can I construct a search as follows? Someone wants to search for properties:
- with a block size over 900sqm
- with a block size under 1200sqm
- with a block size of between 550 and 1500sqm

It's essentially a text string, but can you range values in text somehow?

Thanks in advance.

--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-do-a-range-search-not-on-ordered-data-text-type-tp4023761.html
Sent from the Solr - User mailing list archive at Nabble.com.
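One common way to handle this kind of requirement, sketched below with made-up field names and URL: you can't range-query numbers buried in free text, so extract the number from the description at index time (client-side here, with a regex) into a numeric field, then use an ordinary numeric range query against that field. The block_size_i name assumes an integer field or a *_i dynamic field in the schema.

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BlockSizeExtraction {
    // Pull "770sqm" / "1200 sqm" style values out of the free-text description.
    private static final Pattern SQM = Pattern.compile("(\\d+)\\s*sqm", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at your core.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

        String description = "This property has a block size of 770sqm.";
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "property-1");          // hypothetical unique key
        doc.addField("description", description);

        Matcher m = SQM.matcher(description);
        if (m.find()) {
            // Index the extracted value into a numeric field alongside the text.
            doc.addField("block_size_i", Integer.parseInt(m.group(1)));
        }
        solr.add(doc);
        solr.commit();

        // Ordinary numeric range queries now work:
        solr.query(new SolrQuery("block_size_i:[900 TO *]"));    // over 900 sqm
        solr.query(new SolrQuery("block_size_i:[* TO 1200]"));   // under 1200 sqm
        solr.query(new SolrQuery("block_size_i:[550 TO 1500]")); // between 550 and 1500 sqm
    }
}

The same extraction could also be done inside Solr (e.g. in an update processor or during DIH import); the point is that the range query has to run against a real numeric field, not the analyzed text.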