Re: Manually assigning shard leader and replicas during initial setup on EC2

2013-01-23 Thread Daniel Collins
This is exactly the problem we are encountering as well: how to deal with
the ZK quorum when we have multiple DCs.  Our index is spread so that each
DC has a complete copy and *should* be able to survive on its own, but how
do we arrange ZK to deal with that?  The problem with quorum is that we need
an odd number of ZKs to start with, which doesn't fit when we have 2 DCs.

Logically we would want 4 ZKs, 2 in each DC, and be able to survive with 2
out of that 4 (or in the extreme case, 1), but the quorum has to be an odd
number. So if we have 3 or 5, we have to assign more to one DC than the other,
and we can't afford to lose that DC.  It feels like we are assigning
"weight" to one DC, when in reality they should be peers...

One solution is not to use Zookeeper/Solr Cloud and just manually
assign/distribute the shards/queries/configuration at the application
level, but that feels like a step back to Solr 3.x?
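
A rough sketch of the quorum arithmetic at play here, for illustration only
(not ZooKeeper code): an ensemble needs a strict majority, floor(n/2) + 1, of
its members, which is why an even 2+2 split across two DCs cannot survive the
loss of either DC, while a 2+1 split only survives losing the single-node DC.

public class QuorumMath {

    static boolean hasQuorum(int ensembleSize, int nodesStillUp) {
        // A quorum is a strict majority of the configured ensemble.
        return nodesStillUp >= ensembleSize / 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println(hasQuorum(4, 2)); // false: 4 ZKs split 2+2, either DC lost
        System.out.println(hasQuorum(3, 2)); // true:  3 ZKs split 2+1, 1-node DC lost
        System.out.println(hasQuorum(3, 1)); // false: 3 ZKs split 2+1, 2-node DC lost
    }
}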

Cheers, Daniel


On 23 January 2013 05:56, Timothy Potter  wrote:

> For the Zk quorum issue, we'll put nodes in 3 different AZ's so we can lose
> 1 AZ and still establish quorum with the other 2.
>
> On Tue, Jan 22, 2013 at 10:44 PM, Timothy Potter  >wrote:
>
> > Hi Markus,
> >
> > Thanks for the insight. There's a pretty high cost to using the approach
> > you suggest in that I'd have to double my node count which won't make my
> > acct'ing dept. very happy.
> >
> > As for cross AZ latency, I'm already running my cluster with nodes in 3
> > different AZ's and our distributed query performance is acceptable for
> us.
> > Our AZ's are in the same region.
> >
> > However, I'm not sure I understand your point about Solr modifying
> > clusterstate.json when a node goes down. From what I understand, it will
> > assign a new shard leader but in my case that's expected and doesn't seem
> > to cause an issue. The new shard leader will be the previous replica from
> > the other AZ but that's OK. In this case, the cluster is still
> functional.
> > In other words, from my understanding, Solr is not going to change shard
> > assignments on the nodes, it's just going to select a new leader, which
> in
> > my case is in another AZ.
> >
> > Lastly, Erick raises a good point about Zk and cross AZ quorum. I don't
> > have a good answer to that issue but will post back if I come up with
> > something.
> >
> > Cheers,
> > Tim
> >
> > On Tue, Jan 22, 2013 at 3:11 PM, Markus Jelsma <
> markus.jel...@openindex.io
> > > wrote:
> >
> >> Hi,
> >>
> >> Regarding availability; since SolrCloud is not DC-aware at this moment
> we
> >> 'solve' the problem by simply operating multiple identical clusters in
> >> different DCs and send updates to them all. This works quite well but it
> >> requires some manual intervention if a DC is down due to a prolonged DOS
> >> attack or network or power failure.
> >>
> >> I don't think it's a very good idea to change clusterstate.json because
> >> Solr will modify it when for example a node goes down. Your
> preconfigured
> >> state doesn't exist anymore. It's also a bad idea because distributed
> >> queries are going to be sent to remote locations, adding a lot of
> latency.
> >> Again, because it's not DC aware.
> >>
> >> Any good solution to this problem should be in Solr itself.
> >>
> >> Cheers,
> >>
> >>
> >> -Original message-
> >> > From:Timothy Potter 
> >> > Sent: Tue 22-Jan-2013 22:46
> >> > To: solr-user@lucene.apache.org
> >> > Subject: Manually assigning shard leader and replicas during initial
> >> setup on EC2
> >> >
> >> > Hi,
> >> >
> >> > I'm wanting to split my existing Solr 4 cluster into 2 different
> >> > availability zones in EC2, as in have my initial leaders in one zone
> and
> >> > their replicas in another AZ. My thinking here is if one zone goes
> >> down, my
> >> > cluster stays online. This is the recommendation of Amazon EC2 docs.
> >> >
> >> > My thinking here is to just cook up a clusterstate.json file to
> manually
> >> > set my desired shard / replica assignments to specific nodes. After
> >> which I
> >> > can update the clusterstate.json file in Zk and then bring the nodes
> >> > online.
> >> >
> >> > The other thing to mention is that I have existing indexes that need
> to
> >> be
> >> > preserved as I don't want to re-index. For this I'm planning to just
> >> move
> >> > data directories where they need to be based on my changes to
> >> > clusterstate.json
> >> >
> >> > Does this sound reasonable? Any pitfalls I should look out for?
> >> >
> >> > Thanks.
> >> > Tim
> >> >
> >>
> >
> >
>


Re: Problem querying collection in Solr 4.1

2013-01-23 Thread Daniel Collins
Interesting, that sounds like a bit of an issue really: the cloud is
"hiding" the real error.  Presumably the "non ok status: 500" (buried at the
bottom of your trace) was where the actual shard was returning the error
(we've had issues with positional stuff before, and it normally says
something obvious like "field X wasn't indexed with positions data but
requires it").

Presumably the cloud gets the 500 from 1 server, and can't tell the
difference between an error like that (which means the query/data is
wrong), and the shard being down.  So it tries on another replica, gets the
same error, etc...  Would there be any way for the load balancer to detect
the difference between those types of errors?

I guess the moral of the story is to try your queries on a single system
before you migrate to replicas :)


On 22 January 2013 16:12, Brett Hoerner  wrote:

> Thanks, I'll check that out.
>
> Turns out our problem was we had omitTermFreqAndPositions set to true, but were
> running queries like "puppy dog" which, I would imagine, require positions.
>
>
> On Mon, Jan 21, 2013 at 9:22 PM, Gopal Patwa  wrote:
>
> > one thing I noticed in solrconfig.xml is that it is set to use the Lucene
> > 4.0 index format, but you mention you are using 4.1
> >
> >   <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
> >
> >
> >
> > On Mon, Jan 21, 2013 at 4:26 PM, Brett Hoerner  > >wrote:
> >
> > > I have a collection in Solr 4.1 RC1 and doing a simple query like
> > > text:"puppy dog" is causing an exception. Oddly enough, I CAN query for
> > > text:puppy or text:"puppy", but adding the space breaks everything.
> > >
> > > Schema and config: https://gist.github.com/f49da15e39e5609b75b1
> > >
> > > This happens whether I query the whole collection or a single direct
> > core.
> > > I haven't tested whether this would happen outside of SolrCloud.
> > >
> > >
> >
> http://localhost:8984/solr/timeline/select?q=text%3A%22puppy+dog%22&wt=xml
> > >
> > >
> > >
> >
> http://localhost:8984/solr/timeline_shard4_replica1/select?q=text%3A%22puppy+dog%22&wt=xml
> > >
> > > Jan 22, 2013 12:07:24 AM org.apache.solr.common.SolrException log
> > > SEVERE: null:org.apache.solr.common.SolrException:
> > > org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> > > available to handle this request:[
> > >
> >
> http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard2_replica1,
> > >
> >
> http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard1_replica2,
> > >
> >
> http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard3_replica2,
> > >
> >
> http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard4_replica1,
> > >
> >
> http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard1_replica1,
> > >
> >
> http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard2_replica2,
> > >
> >
> http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard3_replica1,
> > >
> >
> http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard4_replica2]
> > >  at
> > >
> > >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302)
> > > at
> > >
> > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > >  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
> > > at
> > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448)
> > >  at
> > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269)
> > > at
> > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
> > >  at
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
> > > at
> > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> > >  at
> > >
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
> > > at
> > >
> > >
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> > >  at
> > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
> > > at
> > >
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
> > >  at
> > >
> > >
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> > > at
> > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
> > >  at
> > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> > > at
> > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> > >  at
> > >
> > >
> >
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> > > at
> > >
> > >
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> > >  at org.eclipse.jetty.server.Server.handle(Server.java:365)
> > > at
> > >
> > >
> >
> org.eclipse.jetty.server.

Re: Large data importing getting rollback with solr

2013-01-23 Thread ashimbose
Hi Gora and Roman ,

Thank you for your valuable comments. I am trying to follow your suggestion
and will notify you when it's done.
If I face any problems integrating Solr, please help me in the same way in
the future.

Thank you,

Ashim 





Re: SolrCloud index recovery

2013-01-23 Thread Upayavira
The first stage is identifying whether it can sync with transaction
logs. It couldn't, because there's no index, so the logs you have shown
make complete sense. It then says 'trying replication', which is what I
would expect, and which is the bit you are saying has failed. So the
interesting bit is likely immediately after the snippet you showed.



Upayavira





On Wed, Jan 23, 2013, at 07:40 AM, Marcin Rzewucki wrote:

  OK, so I did yet another test. I stopped solr, removed whole "data/"
  dir and started Solr again. Directories were recreated fine, but
  missing files were not downloaded from leader. Log is attached (I
  took the lines related to my test with 2 lines of context. I hope it
  helps.). I could find the following warning message:


Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=ofac url=http://:8983/solr START
replicas=[http://:8983/solr/ofac/] nUpdates=100
Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
WARNING: no frame of reference to tell of we've missed updates
Jan 23, 2013 7:16:08 AM org.apache.solr.cloud.RecoveryStrategy
doRecovery
INFO: PeerSync Recovery was not successful - trying replication.
core=ofac

So it did not know which files to download ?? Could you help me to
solve this problem ?

Thanks in advance.
Regards.

On 22 January 2013 23:06, Yonik Seeley <[1]yo...@lucidworks.com> wrote:

On Tue, Jan 22, 2013 at 4:37 PM, Marcin Rzewucki
<[2]mrzewu...@gmail.com> wrote:

> Sorry, my mistake. I did 2 tests: in the 1st I removed just index
directory

> and in 2nd test I removed both index and tlog directory. Log lines
I've

> sent are related to the first case. So Solr could read tlog directory
in

> that moment.

> Anyway, do you have an idea why it did not download files from leader
?

For your 1st test, if you only deleted the index and not the

transaction logs, Solr will look at the transaction logs to try and

determine if it is up to date or not (by comparing with peers).

If you want to clear out all the data, remove the entire data
directory.



-Yonik

[3]http://lucidworks.com

References

1. mailto:yo...@lucidworks.com
2. mailto:mrzewu...@gmail.com
3. http://lucidworks.com/


Re: Manually assigning shard leader and replicas during initial setup on EC2

2013-01-23 Thread Upayavira
The way Zookeeper is set up, requiring 'quorum' is aimed at avoiding
'split brain' where two halves of your cluster start to operate
independently. This means that you *have* to favour one half of your
cluster over the other, in the case that they cannot communicate with
each other.

For example, if you have three zookeepers, you'll put two in one DC and
one in the other. The DC with two zookeepers will stay active should the
link between them go down.

I'm not entirely sure what happens to the side left with one zookeeper.
I'd like to think it can still serve queries - it *could* work out which
nodes are accessible to it - but it will certainly not be doing updates
(they should be buffered until the other DC returns).

If you want true geographical redundancy, I think Markus' suggestion is
a sensible one.

Upayavira

On Tue, Jan 22, 2013, at 10:11 PM, Markus Jelsma wrote:
> Hi,
> 
> Regarding availability; since SolrCloud is not DC-aware at this moment we
> 'solve' the problem by simply operating multiple identical clusters in
> different DCs and send updates to them all. This works quite well but it
> requires some manual intervention if a DC is down due to a prolonged DOS
> attack or network or power failure.
> 
> I don't think it's a very good idea to change clusterstate.json because
> Solr will modify it when for example a node goes down. Your preconfigured
> state doesn't exist anymore. It's also a bad idea because distributed
> queries are going to be sent to remote locations, adding a lot of
> latency. Again, because it's not DC aware.
> 
> Any good solution to this problem should be in Solr itself.
> 
> Cheers,
> 
>  
> -Original message-
> > From:Timothy Potter 
> > Sent: Tue 22-Jan-2013 22:46
> > To: solr-user@lucene.apache.org
> > Subject: Manually assigning shard leader and replicas during initial setup 
> > on EC2
> > 
> > Hi,
> > 
> > I'm wanting to split my existing Solr 4 cluster into 2 different
> > availability zones in EC2, as in have my initial leaders in one zone and
> > their replicas in another AZ. My thinking here is if one zone goes down, my
> > cluster stays online. This is the recommendation of Amazon EC2 docs.
> > 
> > My thinking here is to just cook up a clusterstate.json file to manually
> > set my desired shard / replica assignments to specific nodes. After which I
> > can update the clusterstate.json file in Zk and then bring the nodes
> > online.
> > 
> > The other thing to mention is that I have existing indexes that need to be
> > preserved as I don't want to re-index. For this I'm planning to just move
> > data directories where they need to be based on my changes to
> > clusterstate.json
> > 
> > Does this sound reasonable? Any pitfalls I should look out for?
> > 
> > Thanks.
> > Tim
> > 


Re: SolrCloud index recovery

2013-01-23 Thread Marcin Rzewucki
Hi,
Previously, I took the lines related to the collection I tested. Maybe some
interesting part was missing. I'm sending the full log this time.
It ends up with:
INFO: Finished recovery process. core=ofac

The issue I described is related to collection called "ofac". I hope the
log is meaningful now.

It is trying to do the replication, but it seems to not know which files to
download.

Regards.

On 23 January 2013 10:39, Upayavira  wrote:

> the first stage is identifying whether it can sync with transaction
> logs. It couldn't, because there's no index. So the logs you have shown
> make complete sense. It then says 'trying replication', which is what I
> would expect, and the bit you are saying has failed. So the interesting
> bit is likely immediately after the snippet you showed.
>
>
>
> Upayavira
>
>
>
>
>
> On Wed, Jan 23, 2013, at 07:40 AM, Marcin Rzewucki wrote:
>
>   OK, so I did yet another test. I stopped solr, removed whole "data/"
>   dir and started Solr again. Directories were recreated fine, but
>   missing files were not downloaded from leader. Log is attached (I
>   took the lines related to my test with 2 lines of context. I hope it
>   helps.). I could find the following warning message:
>
>
> Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
> INFO: PeerSync: core=ofac url=http://:8983/solr START
> replicas=[http://:8983/solr/ofac/] nUpdates=100
> Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
> WARNING: no frame of reference to tell of we've missed updates
> Jan 23, 2013 7:16:08 AM org.apache.solr.cloud.RecoveryStrategy
> doRecovery
> INFO: PeerSync Recovery was not successful - trying replication.
> core=ofac
>
> So it did not know which files to download ?? Could you help me to
> solve this problem ?
>
> Thanks in advance.
> Regards.
>
> On 22 January 2013 23:06, Yonik Seeley <[1]yo...@lucidworks.com> wrote:
>
> On Tue, Jan 22, 2013 at 4:37 PM, Marcin Rzewucki
> <[2]mrzewu...@gmail.com> wrote:
>
> > Sorry, my mistake. I did 2 tests: in the 1st I removed just index
> directory
>
> > and in 2nd test I removed both index and tlog directory. Log lines
> I've
>
> > sent are related to the first case. So Solr could read tlog directory
> in
>
> > that moment.
>
> > Anyway, do you have an idea why it did not download files from leader
> ?
>
> For your 1st test, if you only deleted the index and not the
>
> transaction logs, Solr will look at the transaction logs to try and
>
> determine if it is up to date or not (by comparing with peers).
>
> If you want to clear out all the data, remove the entire data
> directory.
>
>
>
> -Yonik
>
> [3]http://lucidworks.com
>
> References
>
> 1. mailto:yo...@lucidworks.com
> 2. mailto:mrzewu...@gmail.com
> 3. http://lucidworks.com/
>


Re: SolrCloud index recovery

2013-01-23 Thread Upayavira
Hmm, don't see it. Not sure if attachments make it to this list.
Perhaps put it in a pastebin and include a link if too long to include
in an email?



Upayavira





On Wed, Jan 23, 2013, at 10:28 AM, Marcin Rzewucki wrote:

Hi,

Previously, I took the lines related to collection I tested. Maybe some
interesting part was missing. I'm sending the full log this time.

  It ends up with:

INFO: Finished recovery process. core=ofac

The issue I described is related to collection called "ofac". I hope
the log is meaningful now.

It is trying to do the replication, but it seems to not know which
files to download.

Regards.
On 23 January 2013 10:39, Upayavira <[1]u...@odoko.co.uk> wrote:

the first stage is identifying whether it can sync with transaction

logs. It couldn't, because there's no index. So the logs you have shown

make complete sense. It then says 'trying replication', which is what I

would expect, and the bit you are saying has failed. So the interesting

bit is likely immediately after the snippet you showed.







Upayavira



On Wed, Jan 23, 2013, at 07:40 AM, Marcin Rzewucki wrote:

  OK, so I did yet another test. I stopped solr, removed whole "data/"

  dir and started Solr again. Directories were recreated fine, but

  missing files were not downloaded from leader. Log is attached (I

  took the lines related to my test with 2 lines of context. I hope it

  helps.). I could find the following warning message:

Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync

INFO: PeerSync: core=ofac url=http://:8983/solr START

replicas=[http://:8983/solr/ofac/] nUpdates=100

Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync

WARNING: no frame of reference to tell of we've missed updates

Jan 23, 2013 7:16:08 AM org.apache.solr.cloud.RecoveryStrategy

doRecovery

INFO: PeerSync Recovery was not successful - trying replication.

core=ofac

So it did not know which files to download ?? Could you help me to

solve this problem ?

Thanks in advance.

Regards.

On 22 January 2013 23:06, Yonik Seeley <[1][2]yo...@lucidworks.com>
wrote:

On Tue, Jan 22, 2013 at 4:37 PM, Marcin Rzewucki

<[2][3]mrzewu...@gmail.com> wrote:

> Sorry, my mistake. I did 2 tests: in the 1st I removed just index

directory

> and in 2nd test I removed both index and tlog directory. Log lines

I've

> sent are related to the first case. So Solr could read tlog directory

in

> that moment.

> Anyway, do you have an idea why it did not download files from leader

?

For your 1st test, if you only deleted the index and not the

transaction logs, Solr will look at the transaction logs to try and

determine if it is up to date or not (by comparing with peers).

If you want to clear out all the data, remove the entire data

directory.

-Yonik

[3][4]http://lucidworks.com



References



1. mailto:[5]yo...@lucidworks.com

2. mailto:[6]mrzewu...@gmail.com

3. [7]http://lucidworks.com/

References

1. mailto:u...@odoko.co.uk
2. mailto:yo...@lucidworks.com
3. mailto:mrzewu...@gmail.com
4. http://lucidworks.com/
5. mailto:yo...@lucidworks.com
6. mailto:mrzewu...@gmail.com
7. http://lucidworks.com/


Re: solr query

2013-01-23 Thread Upayavira
You could use wt=xml&tr=.xsl and use an xsl stylesheet to transform
the XML into the order you want (on 3.6 and below, use
wt=xslt&tr=.xsl).
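
As a rough illustration, the same request can be built from SolrJ or any plain
HTTP client by adding those two parameters; the core name and stylesheet name
below are placeholders (the stylesheet would live under conf/xslt/ on the
server), and the wt value should follow the version note above:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

// Sketch only: fetches a query response transformed by an XSLT stylesheet.
public class XsltResponseExample {
    public static void main(String[] args) throws Exception {
        String wt = "xslt";          // or "xml", per the version note above
        String tr = "reorder.xsl";   // hypothetical stylesheet in conf/xslt/
        String url = "http://localhost:8983/solr/collection1/select"
                + "?q=" + URLEncoder.encode("*:*", "UTF-8")
                + "&wt=" + wt + "&tr=" + tr;
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line); // fields appear in whatever order the stylesheet emits
        }
        in.close();
    }
}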

However, I'm not that sure why you would want to - presumably some app
is going to consume this XML, and can't that put them into the right
order?

Upayavira

On Wed, Jan 23, 2013, at 07:24 AM, Gora Mohanty wrote:
> On 23 January 2013 01:26, hassancrowdc  wrote:
> > sorry if it is a stupid question but where can i find result.xml and where
> > do i write this program? any hints?
> [...]
> 
> The result XML referred to is the XML returned by
> Solr as a response to a query (if you prefer, you
> can instead get JSON by adding &wt=json to the end
> of the query string).
> 
> You will need to write a program to fetch this response,
> parse the XML/JSON, and process it further as needed.
> 
> I think that one can also write a custom Solr QueryComponent
> to reorder fields in each document, but as far as I know
> there is no other simple way to do this.
> 
> It might make sense to have the order of returned fields
> match the order specified by the "fl" attribute.
> 
> Regards,
> Gora


Re: SolrCloud index recovery

2013-01-23 Thread Marcin Rzewucki
OK, check this link: http://pastebin.com/qMC9kDvt


On 23 January 2013 11:35, Upayavira  wrote:

> Hmm, don't see it. Not sure if attachments make it to this list.
> Perhaps put it in a pastebin and include a link if too long to include
> in an email?
>
>
>
> Upayavira
>
>
>
>
>
> On Wed, Jan 23, 2013, at 10:28 AM, Marcin Rzewucki wrote:
>
> Hi,
>
> Previously, I took the lines related to collection I tested. Maybe some
> interesting part was missing. I'm sending the full log this time.
>
>   It ends up with:
>
> INFO: Finished recovery process. core=ofac
>
> The issue I described is related to collection called "ofac". I hope
> the log is meaningful now.
>
> It is trying to do the replication, but it seems to not know which
> files to download.
>
> Regards.
> On 23 January 2013 10:39, Upayavira <[1]u...@odoko.co.uk> wrote:
>
> the first stage is identifying whether it can sync with transaction
>
> logs. It couldn't, because there's no index. So the logs you have shown
>
> make complete sense. It then says 'trying replication', which is what I
>
> would expect, and the bit you are saying has failed. So the interesting
>
> bit is likely immediately after the snippet you showed.
>
>
>
>
>
>
>
> Upayavira
>
>
>
> On Wed, Jan 23, 2013, at 07:40 AM, Marcin Rzewucki wrote:
>
>   OK, so I did yet another test. I stopped solr, removed whole "data/"
>
>   dir and started Solr again. Directories were recreated fine, but
>
>   missing files were not downloaded from leader. Log is attached (I
>
>   took the lines related to my test with 2 lines of context. I hope it
>
>   helps.). I could find the following warning message:
>
> Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
>
> INFO: PeerSync: core=ofac url=http://:8983/solr START
>
> replicas=[http://:8983/solr/ofac/] nUpdates=100
>
> Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
>
> WARNING: no frame of reference to tell of we've missed updates
>
> Jan 23, 2013 7:16:08 AM org.apache.solr.cloud.RecoveryStrategy
>
> doRecovery
>
> INFO: PeerSync Recovery was not successful - trying replication.
>
> core=ofac
>
> So it did not know which files to download ?? Could you help me to
>
> solve this problem ?
>
> Thanks in advance.
>
> Regards.
>
> On 22 January 2013 23:06, Yonik Seeley <[1][2]yo...@lucidworks.com>
> wrote:
>
> On Tue, Jan 22, 2013 at 4:37 PM, Marcin Rzewucki
>
> <[2][3]mrzewu...@gmail.com> wrote:
>
> > Sorry, my mistake. I did 2 tests: in the 1st I removed just index
>
> directory
>
> > and in 2nd test I removed both index and tlog directory. Log lines
>
> I've
>
> > sent are related to the first case. So Solr could read tlog directory
>
> in
>
> > that moment.
>
> > Anyway, do you have an idea why it did not download files from leader
>
> ?
>
> For your 1st test, if you only deleted the index and not the
>
> transaction logs, Solr will look at the transaction logs to try and
>
> determine if it is up to date or not (by comparing with peers).
>
> If you want to clear out all the data, remove the entire data
>
> directory.
>
> -Yonik
>
> [3][4]http://lucidworks.com
>
>
>
> References
>
>
>
> 1. mailto:[5]yo...@lucidworks.com
>
> 2. mailto:[6]mrzewu...@gmail.com
>
> 3. [7]http://lucidworks.com/
>
> References
>
> 1. mailto:u...@odoko.co.uk
> 2. mailto:yo...@lucidworks.com
> 3. mailto:mrzewu...@gmail.com
> 4. http://lucidworks.com/
> 5. mailto:yo...@lucidworks.com
> 6. mailto:mrzewu...@gmail.com
> 7. http://lucidworks.com/
>


Re: access matched token ids in the FacetComponent?

2013-01-23 Thread Dmitry Kan
Thanks Alexandre for correcting the link and Mikhail for sharing the ideas!

Mikhail,

I will need to take a closer look at your customization of SpansFacetComponent in
the blog post.
Is it the case that, in this component, you are accessing and counting the
matched spans?

Thanks,

Dmitry

On Tue, Jan 22, 2013 at 9:17 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Dmitry,
>
> Solr faceting is really fast due to using in-memory approach (keeping few
> noticeable exceptions in mind), hence spans should be slower. Reading term
> positions/payloads always has sensible gain. You can estimate it, if you
> compare time for a phrase query "foo bar" with a plain conjunction +foo
> +bar one.
> It worth to mention that our SpansFacetComponent performed well enough,
> even for public site. You can find my comment about performance numbers
> "64К docs with 5-20 span positions per each. Search result length 100-2000
> docs with 3-5 facet fields. It shows 100 q/sec on an average datacenter
> box."
>
>
> On Mon, Jan 21, 2013 at 5:23 PM, Dmitry Kan  wrote:
>
> > Mikhail,
> >
> > Thanks for the guidance! This indeed sounds challenging, esp. given the
> > bonus of fighting with solr 3.x in light of disjunction queries.
> Although,
> > moving to solr 4.0 if this makes life easier should be ok.
> >
> > But even before getting one's hands dirty, it would be good to know, if
> > this is going to fly performance wise. Has your span based implementation
> > been fast enough? Did it stand close to the native solr's faceting in
> terms
> > of performance?
> >
> > On Mon, Jan 21, 2013 at 2:33 PM, Mikhail Khludnev <
> > mkhlud...@griddynamics.com> wrote:
> >
> > > Dmitry,
> > >
> > > First of all, FacetComponent is the Solr's out-of-the-box
> functionality.
> > It
> > > runs after search is done and accesses the bitSet of the found
> document,
> > > i.e. there is no spans (matched terms positions) there at all.
> > >
> > > StandardFacetsAccumulator sounds like the "brand new" lucene faceting
> > > library. see http://shaierera.blogspot.com/. I don't think but don't
> > > exactly know whether they are accessible there too.
> > >
> > > Some time ago my team successfully prototyped facet component backed on
> > > spans
> > >
> >
> blog.griddynamics.com/2011/10/solr-experience-search-parent-child.htmlbut
> > > I don't suggest you go this way.
> > > I can suggest you start from the following:
> > > - supply PostFilter/DelegatingCollector
> > > http://yonik.com/posts/advanced-filter-caching-in-solr/
> > > - the DelegatingCollector will accept the scorer instance
> > > - if this scorer is BooleanScorer2 (but not BooleanScorer!), you can
> > access
> > > the SpanQueryScorer in one of the legs and try to access the matched
> > spans
> > > - if you are in 3.x you'll have a problem with disjunction queries.
> > >
> > > it seems challenging, doesn't it?
> > >
> > > 18.01.2013 17:40 пользователь "Dmitry Kan" 
> > написал:
> > >
> > > > Mikhail,
> > > >
> > > > Do you say, that it is not possible to access the matched terms
> > positions
> > > > in the FacetComponent? If that would be possible (somewhere in the
> > > > StandardFacetsAccumulator class, where docids are available), then by
> > > > knowing the matched term positions I can do some school simple math
> to
> > > > calculate the sentence counts per doc id.
> > > >
> > > > Dmitry
> > > >
> > > > On Fri, Jan 18, 2013 at 2:45 PM, Mikhail Khludnev <
> > > > mkhlud...@griddynamics.com> wrote:
> > > >
> > > > > Dmitry,
> > > > >
> > > > > It definitely seems like postptocessing highlighter's output. The
> > also
> > > > > approach is:
> > > > > - limit number of occurrences of a word in a sentence to 1
> > > > > - play with facet by function patch
> > > > > https://issues.apache.org/jira/browse/SOLR-1581 accomplished by
> tf()
> > > > > function.
> > > > >
> > > > > It doesn't seem like much help.
> > > > >
> > > > > On Fri, Jan 18, 2013 at 12:42 PM, Dmitry Kan  >
> > > > wrote:
> > > > >
> > > > > > that we actually require the count of the sentences inside
> > > > > > each document where the hits were found.
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sincerely yours
> > > > > Mikhail Khludnev
> > > > > Principal Engineer,
> > > > > Grid Dynamics
> > > > >
> > > > > 
> > > > >  
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
>  
>


Re: SolrCloud index recovery

2013-01-23 Thread Upayavira
Are documents arriving, but your index is empty? Looking at that log,
everything appears to have happened fine, except the replication handler
has put the index in a directory with a suffix:

WARNING: New index directory detected: old=null
new=/solr/cores/bpr/selekta/data/index.20130121090342477
Jan 23, 2013 7:16:08 AM org.apache.solr.core.CachingDirectoryFactory get
INFO: return new directory for
/solr/cores/bpr/selekta/data/index.20130121090342477 forceNew:false

Once you look in that dir, how do things look?

Upayavira

On Wed, Jan 23, 2013, at 10:45 AM, Marcin Rzewucki wrote:
> OK, check this link: http://pastebin.com/qMC9kDvt
> 
> 
> On 23 January 2013 11:35, Upayavira  wrote:
> 
> > Hmm, don't see it. Not sure if attachments make it to this list.
> > Perhaps put it in a pastebin and include a link if too long to include
> > in an email?
> >
> >
> >
> > Upayavira
> >
> >
> >
> >
> >
> > On Wed, Jan 23, 2013, at 10:28 AM, Marcin Rzewucki wrote:
> >
> > Hi,
> >
> > Previously, I took the lines related to collection I tested. Maybe some
> > interesting part was missing. I'm sending the full log this time.
> >
> >   It ends up with:
> >
> > INFO: Finished recovery process. core=ofac
> >
> > The issue I described is related to collection called "ofac". I hope
> > the log is meaningful now.
> >
> > It is trying to do the replication, but it seems to not know which
> > files to download.
> >
> > Regards.
> > On 23 January 2013 10:39, Upayavira <[1]u...@odoko.co.uk> wrote:
> >
> > the first stage is identifying whether it can sync with transaction
> >
> > logs. It couldn't, because there's no index. So the logs you have shown
> >
> > make complete sense. It then says 'trying replication', which is what I
> >
> > would expect, and the bit you are saying has failed. So the interesting
> >
> > bit is likely immediately after the snippet you showed.
> >
> >
> >
> >
> >
> >
> >
> > Upayavira
> >
> >
> >
> > On Wed, Jan 23, 2013, at 07:40 AM, Marcin Rzewucki wrote:
> >
> >   OK, so I did yet another test. I stopped solr, removed whole "data/"
> >
> >   dir and started Solr again. Directories were recreated fine, but
> >
> >   missing files were not downloaded from leader. Log is attached (I
> >
> >   took the lines related to my test with 2 lines of context. I hope it
> >
> >   helps.). I could find the following warning message:
> >
> > Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
> >
> > INFO: PeerSync: core=ofac url=http://:8983/solr START
> >
> > replicas=[http://:8983/solr/ofac/] nUpdates=100
> >
> > Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
> >
> > WARNING: no frame of reference to tell of we've missed updates
> >
> > Jan 23, 2013 7:16:08 AM org.apache.solr.cloud.RecoveryStrategy
> >
> > doRecovery
> >
> > INFO: PeerSync Recovery was not successful - trying replication.
> >
> > core=ofac
> >
> > So it did not know which files to download ?? Could you help me to
> >
> > solve this problem ?
> >
> > Thanks in advance.
> >
> > Regards.
> >
> > On 22 January 2013 23:06, Yonik Seeley <[1][2]yo...@lucidworks.com>
> > wrote:
> >
> > On Tue, Jan 22, 2013 at 4:37 PM, Marcin Rzewucki
> >
> > <[2][3]mrzewu...@gmail.com> wrote:
> >
> > > Sorry, my mistake. I did 2 tests: in the 1st I removed just index
> >
> > directory
> >
> > > and in 2nd test I removed both index and tlog directory. Log lines
> >
> > I've
> >
> > > sent are related to the first case. So Solr could read tlog directory
> >
> > in
> >
> > > that moment.
> >
> > > Anyway, do you have an idea why it did not download files from leader
> >
> > ?
> >
> > For your 1st test, if you only deleted the index and not the
> >
> > transaction logs, Solr will look at the transaction logs to try and
> >
> > determine if it is up to date or not (by comparing with peers).
> >
> > If you want to clear out all the data, remove the entire data
> >
> > directory.
> >
> > -Yonik
> >
> > [3][4]http://lucidworks.com
> >
> >
> >
> > References
> >
> >
> >
> > 1. mailto:[5]yo...@lucidworks.com
> >
> > 2. mailto:[6]mrzewu...@gmail.com
> >
> > 3. [7]http://lucidworks.com/
> >
> > References
> >
> > 1. mailto:u...@odoko.co.uk
> > 2. mailto:yo...@lucidworks.com
> > 3. mailto:mrzewu...@gmail.com
> > 4. http://lucidworks.com/
> > 5. mailto:yo...@lucidworks.com
> > 6. mailto:mrzewu...@gmail.com
> > 7. http://lucidworks.com/
> >


Re: Logging wrong exception

2013-01-23 Thread Muhzin R
Hi Gora,
I'm using solrj 4.1 with Spring Data Solr.
Here is my code.

PartialUpdate update = new PartialUpdate("id", "123");
update.setValueOfField("mutiValuedField", null);
solrTemplate.saveBean(update);
solrTemplate.commit();
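
For comparison, a minimal sketch of the same partial ("atomic") update with
plain SolrJ, bypassing Spring Data Solr; the core URL is a placeholder. Note
that atomic updates can only carry over existing fields (including required
ones such as countryId) if those fields are stored="true"; a "missing required
field" error usually means either the update reached Solr as a full document
rather than an atomic update, or the required field is not stored.

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class PartialUpdateSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8080/solr/core0"); // placeholder URL
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "123"); // unique key of the existing document
        Map<String, Object> op = new HashMap<String, Object>();
        op.put("set", null);       // "set" to null clears the field's current values
        doc.addField("mutiValuedField", op);
        solr.add(doc);
        solr.commit();
    }
}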



On Fri, Jan 18, 2013 at 6:44 PM, Gora Mohanty  wrote:

> On 18 January 2013 18:34, Muhzin R  wrote:
> > Hi all, I'm trying to set the value of a field in my schema to null.The
> > solr throws the following exception .
> > 
> >
> [...]
>
> This is the relevant part of the error:
>
> > INFO  - 2013-01-18 18:13:35.409;
> > org.apache.solr.update.processor.LogUpdateProcessor; [core0] webapp=/solr
> > path=/update params={wt=javabin&version=2} {} 0 3
> > ERROR - 2013-01-18 18:13:35.409; org.apache.solr.common.SolrException;
> > org.apache.solr.common.SolrException: [doc=10] missing required field:
> > countryId
> [...]
>
> > __
> >  Even though I'm trying to modify a field other than countryId. FYI i'm
> > trying to do a partial update.
> > The schema of countryId is as :
> > 
> >  > "true"/>
> > 
> > Why is solr logging a me the wrong exception?
>
> Please show us the code that triggers this exception. Seems
> like you are trying to do an update without providing a value
> for a required field.
>
> If you are using Solr 4.0, here is how to update only a specific
> in a document:
> http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
>
> Regards,
> Gora
>


Re: Hi

2013-01-23 Thread Upayavira
You are going to have to give more information than this. If you get bad
request, look in the logs for the Solr server and you will probably find
an exception there that tells you what was wrong with your document.

Upayavira

On Wed, Jan 23, 2013, at 08:58 AM, Thendral Thiruvengadam wrote:
> Hi,
> 
> We are trying to use Solr for indexing our application data.
> 
> When we try to add a new object into solr, we are getting Bad Request.
> 
> Please help us with this.
> 
> Thanks,
> Thendral
> 
> 
> 
> http://www.mindtree.com/email/disclaimer.html


Re: SolrCloud index recovery

2013-01-23 Thread Marcin Rzewucki
No, you're looking at the wrong collection. I told you I have a couple of
collections in Solr. I guess some messages may overlap each other. The one on
which I did the test (index recovery) is called "ofac", and that one fails.
Besides, Solr sometimes adds a suffix to the index directory internally, and
that is not a bug.

The lines which are interesting are:
WARNING: no frame of reference to tell of we've missed updates
Jan 23, 2013 7:16:08 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: PeerSync Recovery was not successful - trying replication. core=ofac
Jan 23, 2013 7:16:08 AM org.apache.solr.cloud.RecoveryStrategy doRecovery

and after a while:

Jan 23, 2013 7:16:11 AM org.apache.solr.update.UpdateLog bufferUpdates
INFO: Starting to buffer updates. FSUpdateLog{state=ACTIVE,* tlog=null*}
Jan 23, 2013 7:16:11 AM org.apache.solr.cloud.RecoveryStrategy replicate
*INFO: Attempting to replicate from http://:8983/solr/ofac/.
core=ofac*
Jan 23, 2013 7:16:11 AM org.apache.solr.client.solrj.impl.HttpClientUtil
createClient
INFO: Creating new http client,
config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Jan 23, 2013 7:16:11 AM org.apache.solr.client.solrj.impl.HttpClientUtil
createClient
INFO: Creating new http client,
config:connTimeout=5000&socketTimeout=2&allowCompression=false&maxConnections=1&maxConnectionsPerHost=1
Jan 23, 2013 7:16:11 AM org.apache.solr.handler.SnapPuller 
INFO:  No value set for 'pollInterval'. Timer Task not started.
Jan 23, 2013 7:16:11 AM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1

commit{dir=,segFN=segments_1,generation=1,filenames=[segments_1]
Jan 23, 2013 7:16:11 AM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 1
Jan 23, 2013 7:16:11 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
Jan 23, 2013 7:16:11 AM org.apache.solr.core.SolrDeletionPolicy onCommit
INFO: SolrDeletionPolicy.onCommit: commits:num=2

commit{dir=,segFN=segments_1,generation=1,filenames=[segments_1]

commit{dir=,segFN=segments_2,generation=2,filenames=[segments_2]
Jan 23, 2013 7:16:11 AM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 2
Jan 23, 2013 7:16:11 AM org.apache.solr.search.SolrIndexSearcher 
INFO: Opening Searcher@192aaffb main
Jan 23, 2013 7:16:11 AM org.apache.solr.schema.IndexSchema readSchema
INFO: default search field in schema is cmpy_lstd
Jan 23, 2013 7:16:11 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to
Searcher@192aaffbmain{StandardDirectoryReader(segments_2:2)}
Jan 23, 2013 7:16:11 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Jan 23, 2013 7:16:11 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Jan 23, 2013 7:16:11 AM org.apache.solr.core.SolrCore registerSearcher
INFO: [ofac] Registered new searcher
Searcher@192aaffbmain{StandardDirectoryReader(segments_2:2)}
Jan 23, 2013 7:16:11 AM org.apache.solr.cloud.RecoveryStrategy replay
*INFO: No replay needed. core=ofac*
Jan 23, 2013 7:16:11 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
*INFO: Replication Recovery was successful - registering as Active.
core=ofac*

But recovery was not successful. Index was not downloaded from leader. It's
empty.


On 23 January 2013 12:22, Upayavira  wrote:

> Are documents arriving, but your index is empty? Looking at that log,
> everything appears to have happened fine, except the replication handler
> has put the index in a directory with a suffix:
>
> WARNING: New index directory detected: old=null
> new=/solr/cores/bpr/selekta/data/index.20130121090342477
> Jan 23, 2013 7:16:08 AM org.apache.solr.core.CachingDirectoryFactory get
> INFO: return new directory for
> /solr/cores/bpr/selekta/data/index.20130121090342477 forceNew:false
>
> Once you look in that dir, how do things look?
>
> Upayavira
>
> On Wed, Jan 23, 2013, at 10:45 AM, Marcin Rzewucki wrote:
> > OK, check this link: http://pastebin.com/qMC9kDvt
> >
> >
> > On 23 January 2013 11:35, Upayavira  wrote:
> >
> > > Hmm, don't see it. Not sure if attachments make it to this list.
> > > Perhaps put it in a pastebin and include a link if too long to include
> > > in an email?
> > >
> > >
> > >
> > > Upayavira
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Jan 23, 2013, at 10:28 AM, Marcin Rzewucki wrote:
> > >
> > > Hi,
> > >
> > > Previously, I took the lines related to collection I tested. Maybe some
> > > interesting part was missing. I'm sending the full log this time.
> > >
> > >   It ends up with:
> > >
> > > INFO: Finished recovery process. core=ofac
> > >
> > > The issue I described is related to collection called "ofac". I hope
> > > the log is meaningful now.
> > >
> > > It is trying to do the replication, but it seems to not know which
> > > files to download.
> > 

Re: Hi

2013-01-23 Thread Alexandre Rafalovitch
We need a "Make your own adventure"  (TM) Solr troubleshooting guide. :-)

*) You are staring at the Solr installation full of twisty little passages
and nuances. Would you like to:
   *) Build your first index?
   *) Make your first query?
   *) Spread your documents in the cloud?
   *) Build your own UpdateProcessor to integrate reverse Geocoding web
service into your NLP disambiguation UIMA module to drive your More Like
This suggestions?

Well, maybe somebody with more imagination can figure out a better way to
phrase it. Then, we make a mobile app for doing this and retire
millionaires. :-) Though that last one could make for an awesome Solr demo.
:-)

Seriously though.

Thendral,
You do need to say at least how far you got before you emailed us. Have you
gone through the tutorial and understood it, but your own custom schema is
giving you trouble? Have you tried indexing a Solr Update XML document
containing the data you believe you have?

You need to be able to take a large problem and split it in half and see
which half works and which one does not. It is a bit hard to tell from your
description.

Regards,
   Alex.


Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Wed, Jan 23, 2013 at 7:00 AM, Upayavira  wrote:

> You are going to have to give more information than this. If you get bad
> request, look in the logs for the Solr server and you will probably find
> an exception there that tells you what was wrong with your document.
>
> Upayavira
>
> On Wed, Jan 23, 2013, at 08:58 AM, Thendral Thiruvengadam wrote:
> > Hi,
> >
> > We are trying to use Solr for indexing our application data.
> >
> > When we try to add a new object into solr, we are getting Bad Request.
> >
> > Please help us with this.
> >
> > Thanks,
> > Thendral
> >
> > 
> >
> > http://www.mindtree.com/email/disclaimer.html
>


Possible UI issue after upgrade to 4.1

2013-01-23 Thread Joseph Dale
Please see the linked screen shot. The DIH works, but the UI says it's not
configured.

http://i1194.photobucket.com/albums/aa365/Rouphis/ScreenShot2013-01-23at80306AM_zps1aa10b37.png

-Joey

Re: Multiple-fields multilingual indexing - Query expansion for multilingual fields

2013-01-23 Thread Eduard Moraru
On Wed, Jan 23, 2013 at 3:38 PM, Eduard Moraru  wrote:

> Hello,
>
> Here is my problem:
>
> I am trying to do multilingual indexing in Solr and each document
> translation is indexed as an independent Solr/Lucene document having some
> fields suffixed with the language code. Here is an example:
>
> English:
> id:SomeDocument_en
> lang:en
> title:"English version of the title" <-- text_general field
> title_en:"English version of the title" <-- text_en field, optimized for
> english
> content:"English version of the content"
> content_en:"English version of the content"
> author:SomeGuy
> ...
>
> French:
> id:SomeDocument_fr
> lang:fr
> title:"French version of the title" <-- text_general field
> title_fr:"French version of the title" <-- text_fr field, optimized for
> French
> content:"French version of the content"
> content_fr:"French version of the content"
> author:SomeGuy
> ...
>
> I am writing a search back-end that should allow my users to easily query
> this layout, without having to care about the fact that *some* fields have
> funky names. To achieve this, I would like to write some sort of query
> expander that allows users to write simple queries like:
>
> "title:version author:SomeGuy content:content"
>
> which would get automagically expanded to:
>
> "(title_en:version OR title_fr:version) author:SomeGuy (content_en:content
> OR content_fr:content)"
>
> The list of available languages for which to do the expansion is
> configured separately and should be manually synchronized with the
> configured fields from the schema.xml.
>
> I want the users to use my back-end/library for searching the Solr index
> and I want this library to work with either a remote or embedded Solr
> server, using solrj.
>
> My current/best approach to date is to use a custom Lucene QueryParser set
> up with the KeywordAnalyzer and "" as default field. The result is a Query
> object that I can call extractTerms() on it so that I can inspect the terms
> and see what terms are in my list of fields to expand. The actual expansion
> consists of doing a replaceAll("field:value", "(field:value OR
> field_en:value OR field_fr:value OR ...))" on the toString() of the Query
> instance. Since some of the parsed query terms don`t support
> extractTerms(), I have overridden getPrefixQuery, getWildcardQuery and
> getRangeQuery in my custom query parser to return a simple TermQuery
> instead. Also, I`ve overridden getFieldQuery to manually add quotes inside
> a Term's value since the parsing strips them. Basically, I`ve tried as much
> as possible to make the Query.toString() method output a query which
> resembles as much as possible the input query and which does not alter the
> field value inside the terms (preserves quotes, preserves escaping, etc.).
>
> Reminder: this custom query parser is not run inside Solr (as a plugin or
> anything). It is run inside my search back-end module and its purpose is to
> run, no matter where the Solr instance is located (local or remote).
>
> This approach was looking pretty good in my tests, until I`ve noticed some
> shortcomings:
> - Solr-specific queries containing local parameters are not parsed (I get
> an invalid query exception). This is most likely due to the fact that
> QueryParser is Lucene specific and it does not understand the term of local
> parameters.
> - Queries such as single quote (") or single escape character (/) also
> throw exceptions, even if such queries work (well, get cleaned up properly
> but don`t throw exceptions) in a pure solr query.
> - Other stuff that I might miss by this approach that consists of
> diferences between Solr and Lucene queries.
>
> I tried as much as possible to
>

typo: I tried to *not* have to write a Solr plugin.

 have to write some sort of Solr query parser plugin that needs to be
> installed inside the running Solr instance and tried to do everything on
> the "client" side of the request.
>
> I have noticed that all the Solr-specific query parsers
> (ExtendedSolrQueryParser and such) can not be instantiated without
> supplying an IndexSchema and a SolrCore. If I do this on the client side, I
> end up with an embedded Solr server, just so that I can expand a query,
> which is definetly not the overhead that I want.
>
> Can somebody please suggest me what is the best approach to handling this
> problem? Am I handling the issue from a bad angle? Is there a different
> best practice when dealing with multilingual documents that allows querying
> fields in all the languages more easily?
>
> Thank you for the patience of reading this message and I hope you`ll help
> me find a good solution to this problem that has been eating a lot of my
> time recently.
>
> -Eduard
>
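
A minimal, purely illustrative sketch of the client-side expansion described
above; it assumes the expandable fields and language suffixes are configured
separately, and it ignores quoted phrases, escaping and local params:

import java.util.Arrays;
import java.util.List;

// Naive field expansion: rewrites "field:value" into per-language variants.
public class FieldExpander {
    private static final List<String> FIELDS = Arrays.asList("title", "content");
    private static final List<String> LANGS = Arrays.asList("en", "fr");

    static String expand(String query) {
        for (String field : FIELDS) {
            StringBuilder alt = new StringBuilder("(");
            for (String lang : LANGS) {
                if (alt.length() > 1) alt.append(" OR ");
                alt.append(field).append('_').append(lang).append(":$1");
            }
            alt.append(')');
            // Only handles simple, unquoted terms.
            query = query.replaceAll(field + ":(\\S+)", alt.toString());
        }
        return query;
    }

    public static void main(String[] args) {
        System.out.println(expand("title:version author:SomeGuy content:content"));
        // (title_en:version OR title_fr:version) author:SomeGuy (content_en:content OR content_fr:content)
    }
}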


Re: Multiple-fields multilingual indexing - Query expansion for multilingual fields

2013-01-23 Thread Alexandre Rafalovitch
On Wed, Jan 23, 2013 at 8:38 AM, Eduard Moraru  wrote:

> "title:version author:SomeGuy content:content"
>
> which would get automagically expanded to:
>
> "(title_en:version OR title_fr:version) author:SomeGuy (content_en:content
> OR content_fr:content)"
>

Ignoring everything else, how is this different from eDisMax's field aliasing
combined with the User Fields (uf) setting? (haven't used them myself yet)
http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2BAC8_renaming
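
A hedged sketch of what that might look like from SolrJ, using the uf and
f.<alias>.qf parameters documented on that wiki page; the field names are
simply taken from the earlier example and this is untested against that schema:

import org.apache.solr.client.solrj.SolrQuery;

public class EdismaxAliasingSketch {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("title:version author:SomeGuy content:content");
        q.set("defType", "edismax");
        // Limit which fields users may address directly in the query.
        q.set("uf", "title content author");
        // Alias the generic names onto the per-language fields.
        q.set("f.title.qf", "title_en title_fr");
        q.set("f.content.qf", "content_en content_fr");
        System.out.println(q); // prints the encoded parameters
    }
}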

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


Possible UI issue after upgrade to 4.1

2013-01-23 Thread Joseph Dale
I upgraded to solr 4.1 from 4.0 to take advantage of some solrcloud 
improvements, but now it seems that the DIH UI is broken. I have a screen shot 
but the list seems to block emails w/ links. I will try to describe my issue:

* DIH itself works, via commands & the buttons on the UI.
* The DIH UI says "Dataimport XML-Configuration is not valid"

Any ideas?

Thanks
-Joey

Re: SolrException: Error loading class 'org.apache.solr.response.transform.EditorialMarkerFactory'

2013-01-23 Thread eShard
Thanks,
That worked.
So the documentation needs to be fixed in a few places (the Solr wiki and
the default solrconfig.xml in Solr 4.0 final; I didn't check any other
versions).
I'll either open a new ticket in JIRA to request a fix or reopen the old
one...

Furthermore,
I tried using the ElevatedMarkerFactory and it didn't behave the way I
thought it would.

this http://localhost:8080/solr/Lisa/elevate?q=foo+bar&wt=xml&defType=dismax
got me all the doc info but no elevated marker

I ran this
http://localhost:8080/solr/Lisa/elevate?q=foo+bar&fl=[elevated]&wt=xml&defType=dismax
and all I got was response = 1 and elevated = true

I had to run this to get all of the above info:
http://localhost:8080/solr/Lisa/elevate?q=foo+bar&fl=*,[elevated]&wt=xml&defType=dismax
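
For reference, a sketch of that last working request via SolrJ; the core name
and handler are taken from the URLs above, everything else is an assumption:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ElevatedMarkerSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8080/solr/Lisa");
        SolrQuery q = new SolrQuery("foo bar");
        q.setRequestHandler("/elevate");
        q.set("defType", "dismax");
        // "*" keeps the normal stored fields, [elevated] adds the marker to each doc.
        q.setFields("*", "[elevated]");
        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getResults());
    }
}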





Re: SolrCloud index recovery

2013-01-23 Thread Mark Miller
Was your full log stripped? You are right, we need more. Yes, the peer sync 
failed, but then you cut out all the important stuff about the replication 
attempt that happens after.

- Mark

On Jan 23, 2013, at 5:28 AM, Marcin Rzewucki  wrote:

> Hi,
> Previously, I took the lines related to collection I tested. Maybe some 
> interesting part was missing. I'm sending the full log this time.
> It ends up with:
> INFO: Finished recovery process. core=ofac
> 
> The issue I described is related to collection called "ofac". I hope the log 
> is meaningful now.
> 
> It is trying to do the replication, but it seems to not know which files to 
> download.
> 
> Regards.
> 
> On 23 January 2013 10:39, Upayavira  wrote:
> the first stage is identifying whether it can sync with transaction
> logs. It couldn't, because there's no index. So the logs you have shown
> make complete sense. It then says 'trying replication', which is what I
> would expect, and the bit you are saying has failed. So the interesting
> bit is likely immediately after the snippet you showed.
> 
> 
> 
> Upayavira
> 
> 
> 
> 
> 
> On Wed, Jan 23, 2013, at 07:40 AM, Marcin Rzewucki wrote:
> 
>   OK, so I did yet another test. I stopped solr, removed whole "data/"
>   dir and started Solr again. Directories were recreated fine, but
>   missing files were not downloaded from leader. Log is attached (I
>   took the lines related to my test with 2 lines of context. I hope it
>   helps.). I could find the following warning message:
> 
> 
> Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
> INFO: PeerSync: core=ofac url=http://:8983/solr START
> replicas=[http://:8983/solr/ofac/] nUpdates=100
> Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
> WARNING: no frame of reference to tell of we've missed updates
> Jan 23, 2013 7:16:08 AM org.apache.solr.cloud.RecoveryStrategy
> doRecovery
> INFO: PeerSync Recovery was not successful - trying replication.
> core=ofac
> 
> So it did not know which files to download ?? Could you help me to
> solve this problem ?
> 
> Thanks in advance.
> Regards.
> 
> On 22 January 2013 23:06, Yonik Seeley <[1]yo...@lucidworks.com> wrote:
> 
> On Tue, Jan 22, 2013 at 4:37 PM, Marcin Rzewucki
> <[2]mrzewu...@gmail.com> wrote:
> 
> > Sorry, my mistake. I did 2 tests: in the 1st I removed just index
> directory
> 
> > and in 2nd test I removed both index and tlog directory. Log lines
> I've
> 
> > sent are related to the first case. So Solr could read tlog directory
> in
> 
> > that moment.
> 
> > Anyway, do you have an idea why it did not download files from leader
> ?
> 
> For your 1st test, if you only deleted the index and not the
> 
> transaction logs, Solr will look at the transaction logs to try and
> 
> determine if it is up to date or not (by comparing with peers).
> 
> If you want to clear out all the data, remove the entire data
> directory.
> 
> 
> 
> -Yonik
> 
> [3]http://lucidworks.com
> 
> References
> 
> 1. mailto:yo...@lucidworks.com
> 2. mailto:mrzewu...@gmail.com
> 3. http://lucidworks.com/
> 



Re: SolrCloud index recovery

2013-01-23 Thread Upayavira
Mark,

Take a peek in the pastebin url Marcin mentioned earlier
(http://pastebin.com/qMC9kDvt) is there enough info there?

Upayavira

On Wed, Jan 23, 2013, at 02:04 PM, Mark Miller wrote:
> Was your full log stripped? You are right, we need more. Yes, the peer
> sync failed, but then you cut out all the important stuff about the
> replication attempt that happens after.
> 
> - Mark
> 
> On Jan 23, 2013, at 5:28 AM, Marcin Rzewucki  wrote:
> 
> > Hi,
> > Previously, I took the lines related to collection I tested. Maybe some 
> > interesting part was missing. I'm sending the full log this time.
> > It ends up with:
> > INFO: Finished recovery process. core=ofac
> > 
> > The issue I described is related to collection called "ofac". I hope the 
> > log is meaningful now.
> > 
> > It is trying to do the replication, but it seems to not know which files to 
> > download.
> > 
> > Regards.
> > 
> > On 23 January 2013 10:39, Upayavira  wrote:
> > the first stage is identifying whether it can sync with transaction
> > logs. It couldn't, because there's no index. So the logs you have shown
> > make complete sense. It then says 'trying replication', which is what I
> > would expect, and the bit you are saying has failed. So the interesting
> > bit is likely immediately after the snippet you showed.
> > 
> > 
> > 
> > Upayavira
> > 
> > 
> > 
> > 
> > 
> > On Wed, Jan 23, 2013, at 07:40 AM, Marcin Rzewucki wrote:
> > 
> >   OK, so I did yet another test. I stopped solr, removed whole "data/"
> >   dir and started Solr again. Directories were recreated fine, but
> >   missing files were not downloaded from leader. Log is attached (I
> >   took the lines related to my test with 2 lines of context. I hope it
> >   helps.). I could find the following warning message:
> > 
> > 
> > Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
> > INFO: PeerSync: core=ofac url=http://:8983/solr START
> > replicas=[http://:8983/solr/ofac/] nUpdates=100
> > Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
> > WARNING: no frame of reference to tell of we've missed updates
> > Jan 23, 2013 7:16:08 AM org.apache.solr.cloud.RecoveryStrategy
> > doRecovery
> > INFO: PeerSync Recovery was not successful - trying replication.
> > core=ofac
> > 
> > So it did not know which files to download ?? Could you help me to
> > solve this problem ?
> > 
> > Thanks in advance.
> > Regards.
> > 
> > On 22 January 2013 23:06, Yonik Seeley <[1]yo...@lucidworks.com> wrote:
> > 
> > On Tue, Jan 22, 2013 at 4:37 PM, Marcin Rzewucki
> > <[2]mrzewu...@gmail.com> wrote:
> > 
> > > Sorry, my mistake. I did 2 tests: in the 1st I removed just index
> > directory
> > 
> > > and in 2nd test I removed both index and tlog directory. Log lines
> > I've
> > 
> > > sent are related to the first case. So Solr could read tlog directory
> > in
> > 
> > > that moment.
> > 
> > > Anyway, do you have an idea why it did not download files from leader
> > ?
> > 
> > For your 1st test, if you only deleted the index and not the
> > 
> > transaction logs, Solr will look at the transaction logs to try and
> > 
> > determine if it is up to date or not (by comparing with peers).
> > 
> > If you want to clear out all the data, remove the entire data
> > directory.
> > 
> > 
> > 
> > -Yonik
> > 
> > [3]http://lucidworks.com
> > 
> > References
> > 
> > 1. mailto:yo...@lucidworks.com
> > 2. mailto:mrzewu...@gmail.com
> > 3. http://lucidworks.com/
> > 
> 


Re: SolrCloud index recovery

2013-01-23 Thread Mark Miller
Looks like it shows 3 cores start - 2 with versions that decide they are up to 
date and one that replicates. The one that replicates doesn't have much logging 
showing that activity.

Is this Solr 4.0?

- Mark

On Jan 23, 2013, at 9:27 AM, Upayavira  wrote:

> Mark,
> 
> Take a peek in the pastebin url Marcin mentioned earlier
> (http://pastebin.com/qMC9kDvt) is there enough info there?
> 
> Upayavira
> 
> On Wed, Jan 23, 2013, at 02:04 PM, Mark Miller wrote:
>> Was your full logged stripped? You are right, we need more. Yes, the peer
>> sync failed, but then you cut out all the important stuff about the
>> replication attempt that happens after.
>> 
>> - Mark
>> 
>> On Jan 23, 2013, at 5:28 AM, Marcin Rzewucki  wrote:
>> 
>>> Hi,
>>> Previously, I took the lines related to collection I tested. Maybe some 
>>> interesting part was missing. I'm sending the full log this time.
>>> It ends up with:
>>> INFO: Finished recovery process. core=ofac
>>> 
>>> The issue I described is related to collection called "ofac". I hope the 
>>> log is meaningful now.
>>> 
>>> It is trying to do the replication, but it seems to not know which files to 
>>> download.
>>> 
>>> Regards.
>>> 
>>> On 23 January 2013 10:39, Upayavira  wrote:
>>> the first stage is identifying whether it can sync with transaction
>>> logs. It couldn't, because there's no index. So the logs you have shown
>>> make complete sense. It then says 'trying replication', which is what I
>>> would expect, and the bit you are saying has failed. So the interesting
>>> bit is likely immediately after the snippet you showed.
>>> 
>>> 
>>> 
>>> Upayavira
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Wed, Jan 23, 2013, at 07:40 AM, Marcin Rzewucki wrote:
>>> 
>>>  OK, so I did yet another test. I stopped solr, removed whole "data/"
>>>  dir and started Solr again. Directories were recreated fine, but
>>>  missing files were not downloaded from leader. Log is attached (I
>>>  took the lines related to my test with 2 lines of context. I hope it
>>>  helps.). I could find the following warning message:
>>> 
>>> 
>>> Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
>>> INFO: PeerSync: core=ofac url=http://:8983/solr START
>>> replicas=[http://:8983/solr/ofac/] nUpdates=100
>>> Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
>>> WARNING: no frame of reference to tell of we've missed updates
>>> Jan 23, 2013 7:16:08 AM org.apache.solr.cloud.RecoveryStrategy
>>> doRecovery
>>> INFO: PeerSync Recovery was not successful - trying replication.
>>> core=ofac
>>> 
>>> So it did not know which files to download ?? Could you help me to
>>> solve this problem ?
>>> 
>>> Thanks in advance.
>>> Regards.
>>> 
>>> On 22 January 2013 23:06, Yonik Seeley <[1]yo...@lucidworks.com> wrote:
>>> 
>>> On Tue, Jan 22, 2013 at 4:37 PM, Marcin Rzewucki
>>> <[2]mrzewu...@gmail.com> wrote:
>>> 
 Sorry, my mistake. I did 2 tests: in the 1st I removed just index
>>> directory
>>> 
 and in 2nd test I removed both index and tlog directory. Log lines
>>> I've
>>> 
 sent are related to the first case. So Solr could read tlog directory
>>> in
>>> 
 that moment.
>>> 
 Anyway, do you have an idea why it did not download files from leader
>>> ?
>>> 
>>> For your 1st test, if you only deleted the index and not the
>>> 
>>> transaction logs, Solr will look at the transaction logs to try and
>>> 
>>> determine if it is up to date or not (by comparing with peers).
>>> 
>>> If you want to clear out all the data, remove the entire data
>>> directory.
>>> 
>>> 
>>> 
>>> -Yonik
>>> 
>>> [3]http://lucidworks.com
>>> 
>>> References
>>> 
>>> 1. mailto:yo...@lucidworks.com
>>> 2. mailto:mrzewu...@gmail.com
>>> 3. http://lucidworks.com/
>>> 
>> 



JSON with order-preserving update commands?

2013-01-23 Thread Craig Ching
Hi all,

We're using the JSON update handler and we're currently doing two separate,
but related updates.  The first is a deleteByQuery to delete a bunch of
documents; the second is a new set of documents to replace the old.
The premise is that the documents are all related in some way and there
might be additions or deletions to the set of related documents.

I don't know if we have a problem with solr per se, but we do have
instances where some of the documents are not searchable after an
"update."  I say "update" because it could be our process for sending the
documents that is the problem.

Nevertheless, I'd like to commit the deleteByQuery and the update of the
set of documents in one transaction with solr, especially since the two
commits are going to invalidate search readers on each.  Since we use the
JSON update handler, I went to the wiki and found this example:

{
  "add": {
    "doc": {
      "id": "DOC1",
      "my_boosted_field": {                      /* use a map with boost/value for a boosted field */
        "boost": 2.3,
        "value": "test"
      },
      "my_multivalued_field": [ "aaa", "bbb" ]   /* use an array for a multi-valued field */
    }
  },
  "add": {
    "commitWithin": 5000,  /* commit this document within 5 seconds */
    "overwrite": false,    /* don't check for existing documents with the same uniqueKey */
    "boost": 3.45,         /* a document boost */
    "doc": {
      "f1": "v1",
      "f1": "v2"
    }
  },

  "commit": {},
  "optimize": { "waitFlush":false, "waitSearcher":false },

  "delete": { "id":"ID" },                            /* delete by ID */
  "delete": { "query":"QUERY" }                       /* delete by query */
  "delete": { "query":"QUERY", 'commitWithin':'500' } /* delete by query, commit within 500ms */
}

The problem I have is that JSON is not specified to preserve order of
keys.  What I want to do is:

{
  "delete": { "query":"QUERY" },
  "add": [{
    "doc": {
      "id": "DOC1",
      "my_boosted_field": {                      /* use a map with boost/value for a boosted field */
        "boost": 2.3,
        "value": "test"
      },
      "my_multivalued_field": [ "aaa", "bbb" ]   /* use an array for a multi-valued field */
    }
  },
  {
    "commitWithin": 5000,  /* commit this document within 5 seconds */
    "overwrite": false,    /* don't check for existing documents with the same uniqueKey */
    "boost": 3.45,         /* a document boost */
    "doc": {
      "f1": "v1",
      "f1": "v2"
    }
  }],

  "commit": {}
}

But how do I guarantee that the "delete" comes before the "add" and that
the "commit" comes after everything?  Is it possible to put the delete in a
separate JSON object, but first in the HTTP POST request (and then put the
commit in the url)?

Thanks for any help!

Cheers,
Craig


Solrcloud 4.1 inconsistent # of results in replicas

2013-01-23 Thread Roupihs
I have a one shard collection, with one replica.
I did a dataImport from my oracle DB.
In the master, I have 93835 docs, in the non master 92627.

I have tried http://{machinename}:8080/solr/{collection}/update/commit=true
on the master, but the index does not replicate.

Also, the nodes list different generations of the core.

How do I get these nodes to sync?

Thanks
-Joey



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrcloud-4-1-inconsistent-of-results-in-replicas-tp4035638.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr 4.0 indexing performance question

2013-01-23 Thread Kevin Stone
I am having some difficulty migrating our solr indexing scripts from using 3.5 
to solr 4.0. Notably, I am trying to track down why our performance in solr 4.0 
is about 5-10 times slower when indexing documents. Querying is still quite 
fast.

The code adds documents in groups of 1000, and adds each group to Solr in 
a thread. The documents are somewhat large, including maybe 30-40 different 
field types, mostly multivalued. Here are some snippets of the code we used in 
3.5.


MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
HttpClient client = new HttpClient(mgr);
CommonsHttpSolrServer server = new CommonsHttpSolrServer("some url for our index", client);
server.setRequestWriter(new BinaryRequestWriter());


 Then, we delete the index, and proceed to generate documents and load the 
groups in a thread that looks kind of like this. I've omitted some overhead for 
handling exceptions, and retry attempts.


class DocWriterThread implements Runnable
{
    CommonsHttpSolrServer server;
    Collection<SolrInputDocument> docs;
    private int commitWithin = 5; // 50 seconds

    public DocWriterThread(CommonsHttpSolrServer server, Collection<SolrInputDocument> docs)
    {
        this.server = server;
        this.docs = docs;
    }

    public void run()
    {
        // set the commitWithin feature
        server.add(docs, commitWithin);
    }
}


Now, I've had to change some things to get this to compile with the Solr 4.0 
libraries. Here is what I tried to convert the above code to. I don't know if 
these are the correct equivalents, as I am not familiar with apache 
httpcomponents.



ThreadSafeClientConnManager mgr = new ThreadSafeClientConnManager();
DefaultHttpClient client = new DefaultHttpClient(mgr);
HttpSolrServer server = new HttpSolrServer("some url for our solr index", client);
server.setRequestWriter(new BinaryRequestWriter());




The thread method is the same, but uses HttpSolrServer instead of 
CommonsHttpSolrServer.

We also had an old solrconfig (not sure what version, but it is pre 3.x and 
had mostly default values) that I had to replace with a 4.0 style 
solrconfig.xml. I don't want to post the entire file (as it is large), but I 
copied one from the solr 4.0 examples, and made a couple changes. First, I 
wanted to turn off transaction logging. So essentially I have a line like this 
(everything inside is commented out):





And I added a handler for javabin






 application/javabin

   

  

I'm not sure what other configurations I should look at. I would think that 
there should be a big obvious reason why the indexing performance would drop 
nearly 10 fold.

Against our 3.5 instance I timed our index load, and it adds roughly 40,000 
documents every 3-8 seconds.

Against our 4.0 instance it adds 40,000 documents every 70-75 seconds.

This isn't the end of the world, and I would love to use the new join feature 
in solr 4.0. However, we have many different indexes with millions of 
documents, and this kind of increase in load time is troubling.


Thanks for your help.


-Kevin


The information in this email, including attachments, may be confidential and 
is intended solely for the addressee(s). If you believe you received this email 
by mistake, please notify the sender by return email as soon as possible.


Problem with migration from solr 3.5 with SOLR-2155 usage to solr 4.0

2013-01-23 Thread Viacheslav Davidovich
Hi, 

With Solr 3.5 I use the SOLR-2155 plugin to filter documents by distance as 
described in http://wiki.apache.org/solr/SpatialSearch#Advanced_Spatial_Search, 
and this solution perfectly filters the multiValued data defined in schema.xml 
like



 

the query looks like this with Solr 3.5:  q=*:*&fq={!geofilt}&sfield= 
location_data&pt=45.15,-93.85&d=50&sort=geodist() asc

As the SOLR-2155 plugin is not compatible with Solr 4.0, I tried to change the field 
definition to the following:







But in this case, after the geofilt on location_data is executed, the correct values 
are returned only if the field has 1 value; if more than 1 value is stored in the index, 
the required documents are returned only when all of the location points are matched.

Does anybody have experience or any ideas on how to get the same behavior in 
Solr 4.0 as in Solr 3.5 with the SOLR-2155 plugin?

Is this possible at all, or do I need to refactor the document structure and field 
definition to store only 1 location value per document?

WBR Viacheslav.



Re: Solrcloud 4.1 inconsistent # of results in replicas

2013-01-23 Thread Mark Miller
Does the admin cloud UI show all of the nodes as green? (active)

If so, something is not right.

- Mark

On Jan 23, 2013, at 10:02 AM, Roupihs  wrote:

> I have a one shard collection, with one replica.
> I did a dataImport from my oracle DB.
> In the master, I have 93835 docs, in the non master 92627.
> 
> I have tried http://{machinename}:8080/solr/{collection}/update/commit=true
> on the master, but the index does not replicate.
> 
> Also, the node list different generations of the core.
> 
> How do I get these nodes to sync?
> 
> Thanks
> -Joey
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solrcloud-4-1-inconsistent-of-results-in-replicas-tp4035638.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 4.0 indexing performance question

2013-01-23 Thread Mark Miller
It's hard to guess, but I might start by looking at what the new UpdateLog is 
costing you. Take its definition out of solrconfig.xml and try your test 
again. Then let's take it from there.
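
For reference, the update log is declared inside the updateHandler section of 
solrconfig.xml. A minimal sketch of what to comment out, based on the stock Solr 4.0 
example config (the exact dir value is deployment-specific):

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- comment out (or remove) the updateLog element to disable the transaction log:
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  -->
</updateHandler>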

- Mark

On Jan 23, 2013, at 11:00 AM, Kevin Stone  wrote:

> I am having some difficulty migrating our solr indexing scripts from using 
> 3.5 to solr 4.0. Notably, I am trying to track down why our performance in 
> solr 4.0 is about 5-10 times slower when indexing documents. Querying is 
> still quite fast.
> 
> The code adds  documents in groups of 1000, and adds each group to the solr 
> in a thread. The documents are somewhat large, including maybe 30-40 
> different field types, mostly multivalued. Here are some snippets of the code 
> we used in 3.5.
> 
> 
> MultiThreadedHttpConnectionManager mgr = new 
> MultiThreadedHttpConnectionManager();
> 
> HttpClient client = new HttpClient(mgr);
> 
> CommonsHttpSolrServer server = new CommonsHttpSolrServer( "some url for our 
> index",client );
> 
> server.setRequestWriter(new BinaryRequestWriter());
> 
> 
> Then, we delete the index, and proceed to generate documents and load the 
> groups in a thread that looks kind of like this. I've omitted some overhead 
> for handling exceptions, and retry attempts.
> 
> 
> class DocWriterThread implements Runnable
> 
> {
> 
>CommonsHttpSolrServer server;
> 
>Collection docs;
> 
>private int commitWithin = 5; // 50 seconds
> 
>public DocWriterThread(CommonsHttpSolrServer 
> server,Collection docs)
> 
>{
> 
>this.server=server;
> 
>this.docs=docs;
> 
>}
> 
> public void run()
> 
> {
> 
>// set the commitWithin feature
> 
>server.add(docs,commitWithin);
> 
> }
> 
> }
> 
> 
> Now, I've had to change some things to get this compile with the Solr 4.0 
> libraries. Here is what I tried to convert the above code to. I don't know if 
> these are the correct equivalents, as I am not familiar with apache 
> httpcomponents.
> 
> 
> 
> ThreadSafeClientConnManager mgr = new ThreadSafeClientConnManager();
> 
> DefaultHttpClient client = new DefaultHttpClient(mgr);
> 
> HttpSolrServer server = new HttpSolrServer( "some url for our solr 
> index",client );
> 
> server.setRequestWriter(new BinaryRequestWriter());
> 
> 
> 
> 
> The thread method is the same, but uses HttpSolrServer instead of 
> CommonsHttpSolrServer.
> 
> We also, had an old solrconfig (not sure what version, but it is pre 3.x and 
> had mostly default values) that I had to replace with a 4.0 style 
> solrconfig.xml. I don't want to post the entire file (as it is large), but I 
> copied one from the solr 4.0 examples, and made a couple changes. First, I 
> wanted to turn off transaction logging. So essentially I have a line like 
> this (everything inside is commented out):
> 
> 
> 
> 
> 
> And I added a handler for javabin
> 
> 
>  class="solr.BinaryUpdateRequestHandler">
> 
>
> 
> application/javabin
> 
>   
> 
>  
> 
> I'm not sure what other configurations I should look at. I would think that 
> there should be a big obvious reason why the indexing performance would drop 
> nearly 10 fold.
> 
> Against our 3.5 instance I timed our index load, and it adds roughly 40,000 
> documents every 3-8 seconds.
> 
> Against our 4.0 instance it adds 40,000 documents every 70-75 seconds.
> 
> This isn't the end of the world, and I would love to use the new join feature 
> in solr 4.0. However, we have many different indexes with millions of 
> documents, and this kind of increase in load time is troubling.
> 
> 
> Thanks for your help.
> 
> 
> -Kevin
> 
> 
> The information in this email, including attachments, may be confidential and 
> is intended solely for the addressee(s). If you believe you received this 
> email by mistake, please notify the sender by return email as soon as 
> possible.



Re: Solrcloud 4.1 inconsistent # of results in replicas

2013-01-23 Thread Roupihs
Mark Miller-3 wrote
> Does the admin cloud UI show all of the nodes as green? (active)
> 
> If so, something is not right.
> 
> - Mark

Yes, the leader with the correct # is filled in green, and the other node is
a green circle.

-Joey



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrcloud-4-1-inconsistent-of-results-in-replicas-tp4035638p4035654.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with migration from solr 3.5 with SOLR-2155 usage to solr 4.0

2013-01-23 Thread Smiley, David W.
Viacheslav,


SOLR-2155 is only compatible with Solr 3.  However the technology it is
based on lives on in Lucene/Solr 4 in the
"SpatialRecursivePrefixTreeFieldType" field type.  In the example schema
it's registered under the name "location_rpt".  For more information on
how to use this field type, see: SpatialRecursivePrefixTreeFieldType
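
As an illustration only, the field type below is lifted from the Solr 4.0 example 
schema (so the attribute values are stock defaults that may need tuning), and the 
field name is the one from the question; {!geofilt} is then used against it in the 
same way as before:

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees" />
<field name="location_data" type="location_rpt" indexed="true" stored="true" multiValued="true" />

q=*:*&fq={!geofilt sfield=location_data pt=45.15,-93.85 d=50}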

~ David Smiley

On 1/23/13 11:11 AM, "Viacheslav Davidovich"
 wrote:

>Hi, 
>
>With Solr 3.5 I use SOLR-2155 plugin to filter the documents by distance
>as described in 
>http://wiki.apache.org/solr/SpatialSearch#Advanced_Spatial_Search and
>this solution perfectly filter the multiValued data defined in schema.xml
> like
>
>length="12" />
>
>multiValued="true"/>
>
>the query looks like this with Solr 3.5:  q=*:*&fq={!geofilt}&sfield=
>location_data&pt=45.15,-93.85&d=50&sort=geodist() asc
>
>As SOLR-2155 plugin not compatible with solr 4.0 I try to change the
>field definition to next:
>
>subFieldSuffix="_coordinate" />
>
>multiValued="true"/>
>
>stored="false" />
>
>But in this case after geofilt by location_data execution the correct
>values returns only if the field have 1 value, if more them 1 value
>stored in index required documents returns only when all the location
>points are matched.
>
>Have anybody experience or any ideas how to receive the same behavior in
>solr4.0 as this was in solr3.5 with SOLR-2155 plugin usage?
>
>Is this possible at all or I need to refactor the document structure and
>field definition to store only 1 location value per document?
>
>WBR Viacheslav.
>



Re: Solr 4.0 indexing performance question

2013-01-23 Thread Kevin Stone
Do you mean commenting out the ... tag? Because
that I already commented out. Or do I also need to remove the entire
 tag? Sorry, I am not too familiar with everything in the
solrconfig file. I have a tag that essentially looks like this:




Everything inside is commented out.

-Kevin

On 1/23/13 11:21 AM, "Mark Miller"  wrote:

>It's hard to guess, but I might start by looking at what the new
>UpdateLog is costing you. Take it's definition out of solrconfig.xml and
>try your test again. Then let's take it from there.
>
>- Mark
>
>On Jan 23, 2013, at 11:00 AM, Kevin Stone  wrote:
>
>> I am having some difficulty migrating our solr indexing scripts from
>>using 3.5 to solr 4.0. Notably, I am trying to track down why our
>>performance in solr 4.0 is about 5-10 times slower when indexing
>>documents. Querying is still quite fast.
>>
>> The code adds  documents in groups of 1000, and adds each group to the
>>solr in a thread. The documents are somewhat large, including maybe
>>30-40 different field types, mostly multivalued. Here are some snippets
>>of the code we used in 3.5.
>>
>>
>> MultiThreadedHttpConnectionManager mgr = new
>>MultiThreadedHttpConnectionManager();
>>
>> HttpClient client = new HttpClient(mgr);
>>
>> CommonsHttpSolrServer server = new CommonsHttpSolrServer( "some url for
>>our index",client );
>>
>> server.setRequestWriter(new BinaryRequestWriter());
>>
>>
>> Then, we delete the index, and proceed to generate documents and load
>>the groups in a thread that looks kind of like this. I've omitted some
>>overhead for handling exceptions, and retry attempts.
>>
>>
>> class DocWriterThread implements Runnable
>>
>> {
>>
>>CommonsHttpSolrServer server;
>>
>>Collection docs;
>>
>>private int commitWithin = 5; // 50 seconds
>>
>>public DocWriterThread(CommonsHttpSolrServer
>>server,Collection docs)
>>
>>{
>>
>>this.server=server;
>>
>>this.docs=docs;
>>
>>}
>>
>> public void run()
>>
>> {
>>
>>// set the commitWithin feature
>>
>>server.add(docs,commitWithin);
>>
>> }
>>
>> }
>>
>>
>> Now, I've had to change some things to get this compile with the Solr
>>4.0 libraries. Here is what I tried to convert the above code to. I
>>don't know if these are the correct equivalents, as I am not familiar
>>with apache httpcomponents.
>>
>>
>>
>> ThreadSafeClientConnManager mgr = new ThreadSafeClientConnManager();
>>
>> DefaultHttpClient client = new DefaultHttpClient(mgr);
>>
>> HttpSolrServer server = new HttpSolrServer( "some url for our solr
>>index",client );
>>
>> server.setRequestWriter(new BinaryRequestWriter());
>>
>>
>>
>>
>> The thread method is the same, but uses HttpSolrServer instead of
>>CommonsHttpSolrServer.
>>
>> We also, had an old solrconfig (not sure what version, but it is pre
>>3.x and had mostly default values) that I had to replace with a 4.0
>>style solrconfig.xml. I don't want to post the entire file (as it is
>>large), but I copied one from the solr 4.0 examples, and made a couple
>>changes. First, I wanted to turn off transaction logging. So essentially
>>I have a line like this (everything inside is commented out):
>>
>>
>> 
>>
>>
>> And I added a handler for javabin
>>
>>
>> >class="solr.BinaryUpdateRequestHandler">
>>
>>
>>
>> application/javabin
>>
>>   
>>
>>  
>>
>> I'm not sure what other configurations I should look at. I would think
>>that there should be a big obvious reason why the indexing performance
>>would drop nearly 10 fold.
>>
>> Against our 3.5 instance I timed our index load, and it adds roughly
>>40,000 documents every 3-8 seconds.
>>
>> Against our 4.0 instance it adds 40,000 documents every 70-75 seconds.
>>
>> This isn't the end of the world, and I would love to use the new join
>>feature in solr 4.0. However, we have many different indexes with
>>millions of documents, and this kind of increase in load time is
>>troubling.
>>
>>
>> Thanks for your help.
>>
>>
>> -Kevin
>>
>>
>> The information in this email, including attachments, may be
>>confidential and is intended solely for the addressee(s). If you believe
>>you received this email by mistake, please notify the sender by return
>>email as soon as possible.
>


The information in this email, including attachments, may be confidential and 
is intended solely for the addressee(s). If you believe you received this email 
by mistake, please notify the sender by return email as soon as possible.


Re: Possible UI issue after upgrade to 4.1

2013-01-23 Thread Stefan Matheis
Joey

That looks like a mixture .. the HTML-Partial is from the 4.0 Release, but my 
guess is that you're seeing the current stylesheet .. that doesn't work well 
together. 

Perhaps it helps if you trick your browser a bit by requesting the partial 
directly using http://solr:8983/solr/tpl/dataimport.html 
(http://debian2.vm:8983/solr/tpl/dataimport.html) and force-reloading it (depending on 
your browser: SHIFT + F5 or SHIFT + CMD + R). The current version should begin 
with these lines:

*snip*

Last Update: Unknown
[Abort Import]
Raw Status-Output


*snip*

HTH
Stefan



On Wednesday, January 23, 2013 at 2:50 PM, Joseph Dale wrote:

> I upgraded to solr 4.1 from 4.0 to take advantage of some solrcloud 
> improvements, but now it seems that the DIH UI is broken. I have a screen 
> shot but the list seems to block emails w/ links. I will try to describe my 
> issue:
> 
> * DIH it self works, via commands & the buttons on the UI.
> * The DIH UI says "Dataimport XML-Configuration is not valid"
> 
> Any ideas?
> 
> Thanks
> -Joey





Re: JSON with order-preserving update commands?

2013-01-23 Thread Yonik Seeley
On Wed, Jan 23, 2013 at 9:50 AM, Craig Ching  wrote:
> The problem I have is that JSON is not specified to preserve order of
> keys.

JSON is a serialization format, and readers/writers can preserve order
if they wish to.
If you send JSON to solr in a specific order, that order will
definitely be respected.
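
So, as a sketch (host, collection and field names here are made up), a single POST 
can carry the delete, the adds and the commit, and they are applied in the order 
they appear in the body:

curl 'http://localhost:8983/solr/update/json' -H 'Content-type: application/json' -d '
{
  "delete": { "query": "group_id:42" },
  "add":    { "doc": { "id": "DOC1", "group_id": "42" } },
  "add":    { "doc": { "id": "DOC2", "group_id": "42" } },
  "commit": {}
}'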

-Yonik
http://lucidworks.com


Re: Solr 4.0 indexing performance question

2013-01-23 Thread Kevin Stone
I'm still poking around trying to find the differences. I found a couple
things that may or may not be relevant.
First, when I start up my 3.5 solr, I get all sorts of warnings that my
solrconfig is old and will run using 2.4 emulation.
Of course I had to upgrade the solrconfig for the 4.0 instance (which I
already described). I am curious if there could be some feature I was
taking advantage of in 2.4 that doesn't exist now in 4.0. I don't know.

Second when I look at the console logs for my server (3.5 and 4.0) and I
run the indexer against each, I see a subtle difference in this print out
when it connects to the solr core.
The 3.5 version prints this out:
webapp=/solr path=/update
params={waitSearcher=true&wt=javabin&commit=true&softCommit=false&version=2
} {commit=} 0 2722


The 4.0 version prints this out
 webapp=/solr path=/update/javabin
params={wt=javabin&commit=true&waitFlush=true&waitSearcher=true&version=2}
status=0 QTime=1404



The params for the update handle seem ever so slightly different. The 3.5
version (the one that runs fast) has a setting softCommit=false.
The 4.0 version does not print that setting, but instead prints this
setting waitFlush=true.

These could be irrelevant, but thought I should add the information.

-Kevin

On 1/23/13 11:42 AM, "Kevin Stone"  wrote:

>Do you mean commenting out the ... tag? Because
>that I already commented out. Or do I also need to remove the entire
> tag? Sorry, I am not too familiar with everything in the
>solrconfig file. I have a tag that essentially looks like this:
>
>
>
>
>Everything inside is commented out.
>
>-Kevin
>
>On 1/23/13 11:21 AM, "Mark Miller"  wrote:
>
>>It's hard to guess, but I might start by looking at what the new
>>UpdateLog is costing you. Take it's definition out of solrconfig.xml and
>>try your test again. Then let's take it from there.
>>
>>- Mark
>>
>>On Jan 23, 2013, at 11:00 AM, Kevin Stone  wrote:
>>
>>> I am having some difficulty migrating our solr indexing scripts from
>>>using 3.5 to solr 4.0. Notably, I am trying to track down why our
>>>performance in solr 4.0 is about 5-10 times slower when indexing
>>>documents. Querying is still quite fast.
>>>
>>> The code adds  documents in groups of 1000, and adds each group to the
>>>solr in a thread. The documents are somewhat large, including maybe
>>>30-40 different field types, mostly multivalued. Here are some snippets
>>>of the code we used in 3.5.
>>>
>>>
>>> MultiThreadedHttpConnectionManager mgr = new
>>>MultiThreadedHttpConnectionManager();
>>>
>>> HttpClient client = new HttpClient(mgr);
>>>
>>> CommonsHttpSolrServer server = new CommonsHttpSolrServer( "some url for
>>>our index",client );
>>>
>>> server.setRequestWriter(new BinaryRequestWriter());
>>>
>>>
>>> Then, we delete the index, and proceed to generate documents and load
>>>the groups in a thread that looks kind of like this. I've omitted some
>>>overhead for handling exceptions, and retry attempts.
>>>
>>>
>>> class DocWriterThread implements Runnable
>>>
>>> {
>>>
>>>CommonsHttpSolrServer server;
>>>
>>>Collection docs;
>>>
>>>private int commitWithin = 5; // 50 seconds
>>>
>>>public DocWriterThread(CommonsHttpSolrServer
>>>server,Collection docs)
>>>
>>>{
>>>
>>>this.server=server;
>>>
>>>this.docs=docs;
>>>
>>>}
>>>
>>> public void run()
>>>
>>> {
>>>
>>>// set the commitWithin feature
>>>
>>>server.add(docs,commitWithin);
>>>
>>> }
>>>
>>> }
>>>
>>>
>>> Now, I've had to change some things to get this compile with the Solr
>>>4.0 libraries. Here is what I tried to convert the above code to. I
>>>don't know if these are the correct equivalents, as I am not familiar
>>>with apache httpcomponents.
>>>
>>>
>>>
>>> ThreadSafeClientConnManager mgr = new ThreadSafeClientConnManager();
>>>
>>> DefaultHttpClient client = new DefaultHttpClient(mgr);
>>>
>>> HttpSolrServer server = new HttpSolrServer( "some url for our solr
>>>index",client );
>>>
>>> server.setRequestWriter(new BinaryRequestWriter());
>>>
>>>
>>>
>>>
>>> The thread method is the same, but uses HttpSolrServer instead of
>>>CommonsHttpSolrServer.
>>>
>>> We also, had an old solrconfig (not sure what version, but it is pre
>>>3.x and had mostly default values) that I had to replace with a 4.0
>>>style solrconfig.xml. I don't want to post the entire file (as it is
>>>large), but I copied one from the solr 4.0 examples, and made a couple
>>>changes. First, I wanted to turn off transaction logging. So essentially
>>>I have a line like this (everything inside is commented out):
>>>
>>>
>>> 
>>>
>>>
>>> And I added a handler for javabin
>>>
>>>
>>> >>class="solr.BinaryUpdateRequestHandler">
>>>
>>>
>>>
>>> application/javabin
>>>
>>>   
>>>
>>>  
>>>
>>> I'm not sure what other configurations I should look at. I would think
>>>that there should be a big obvious reason why the indexing performance
>>>would drop nearly 10 fold.
>>>
>>> Against our 3.5 instance I timed our index load, and it

RE: SolrJ DirectXmlRequest

2013-01-23 Thread Ryan Josal
Thanks Hoss,

  The issue mentioned describes a similar behavior to what I observed, but not 
quite.  Commons-fileupload creates java.io.File objects for the temp files, and 
when those Files are garbage collected, the temp file is deleted.  I've 
verified this by letting the temp files build up and then forcing a full 
collection which clears all of them.  So I think the reason a percentage of 
temp files built up in my system was that under heavy load, some of the 
java.io.Files made it into old gen in the heap.  I switched to G1, and the 
problem went away.
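
(A sketch of that switch, assuming the affected process is the stock Jetty-based 
Solr start; the flag is the same for any JVM, only the rest of the command line 
differs:)

java -XX:+UseG1GC -jar start.jar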

Regarding the how the XML files are being sent, I have verified that each XML 
file is sent as a single request, by aligning the access log of my Solr master 
server with the processing log of my SolrJ server.  I didn't test the requests 
to see if the MIME type is multipart, but I suppose it is possible if some 
other form data or instruction needed to be passed with it.  Either way, I 
suppose it would go through fileupload anyway, because somebody's got to make a 
temp file for large files, right?

Ryan

From: Chris Hostetter [hossman_luc...@fucit.org]
Sent: Wednesday, January 16, 2013 6:06 PM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ DirectXmlRequest

: DirectXmlRequest is part of the SolrJ library, so I guess that means it
: is not commonly used.  My use case is that I'm applying an XSLT to the
: raw XML on the client side, instead of leaving that up to the Solr
: master (although even if I applied the XSLT on the Solr server, I'd

I think Otis's point was that most people don't have Solr XML files lying
arround that they send to Solr, nor do they build up XML strings in Java
in the Solr input format (with XSLT or otherwise) ... most people using
SolrJ build up SolrInputDocument objects and pass those to their
SolrServer instance.

: I've done some research and I'm fairly confident that apache
: commons-fileupload library is responsible for the temp files.  There's

I believe you are correct ... searching for "solr fileupload temp files"
lead me to this issue which seems to have fallen by the way side...

https://issues.apache.org/jira/browse/SOLR-1953

...if you could try that patch out and/or post your comments it would be
helpful.

Something that seems really odd to me however is how/why your basic
updates are even causing multipart/file-upload functionality to be used
... a quick skim of the client code suggests that that should only happen
if your try to send multiple ContentStreams in a single request: I can
understand why that wouldn't typically happen for most users building up
multiple SolrInputDocuments (they would get added to a single stream); and
i can understand why that would typically happen for users sending
multiple binary files to something like ExtractingRequestHandler -- but if
you are using DirectXmlRequest in the way you described each xml file
should be sent as a single stream in a single request and the XML should
be sent in the raw POST body -- the commons-fileupload code shouldn't even
come into play.  (either that, or i'm missing something, or you're using
an older version of solr that used fileupload even if there was only a
single content stream)


-Hoss

-
This transmission (including any attachments) may contain confidential 
information, privileged material (including material protected by the 
solicitor-client or other applicable privileges), or constitute non-public 
information. Any use of this information by anyone other than the intended 
recipient is prohibited. If you have received this transmission in error, 
please immediately reply to the sender and delete this information from your 
system. Use, dissemination, distribution, or reproduction of this transmission 
by unintended recipients is not authorized and may be unlawful.


Re: Multiple-fields multilingual indexing - Query expansion for multilingual fields

2013-01-23 Thread Eduard Moraru
Hi Alex,

On Wed, Jan 23, 2013 at 3:44 PM, Alexandre Rafalovitch
wrote:

> On Wed, Jan 23, 2013 at 8:38 AM, Eduard Moraru 
> wrote:
>
> > "title:version author:SomeGuy content:content"
> >
> > which would get automagically expanded to:
> >
> > "(title_en:version OR title_fr:version) author:SomeGuy
> (content_en:content
> > OR content_fr:content)"
> >
>
> Ignoring everything else, how is this different from eDisMax's field
> combined with User Field setting? (haven't used them myself yet)
> http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2BAC8_renaming
>

Exactly what I needed. The field aliasing fits perfectly my scenario.
Thanks a million!
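
For reference, the aliasing from the earlier example boils down to edismax request 
parameters along these lines (the host is a placeholder, the field names are the ones 
from the first mail in this thread, and the spaces need URL-encoding in a real request):

http://localhost:8983/solr/select?defType=edismax
  &q=title:version author:SomeGuy content:content
  &f.title.qf=title_en title_fr
  &f.content.qf=content_en content_fr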

The only small but workable problem I have now is the same as
https://issues.apache.org/jira/browse/SOLR-3598. When you are creating an
alias for the field "who", you can't include the actual field in the list
of aliases like "f.who.qf=who,what,where" because you'll get an "alias loop"
exception.

The workaround suggested in the issue is to rename your "who" field to
something like "who_real" and rewrite the alias like
"f.who.qf=who_real,what,where". This is a bit cumbersome and frankly
annoying since your schema.xml will look weird. A way to avoid alias
resolution for a field in the alias list and actually try to fetch the
field from the index would have been great (like what the issue poster
suggested), but anyway.

If you have a better suggestion for the alias loop, please let me know,
otherwise, thank you again for the quick and efficient reply.

Cheers,
Eduard

>
> Regards,
>Alex.
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>


RE: firstSearcher and NewSearcher parameters

2013-01-23 Thread Petersen, Robert
Hi Otis, 

OK I guess I see how that makes sense.  If I use function queries for affecting 
the scoring of results, does it help to include those in the warm up queries or 
does the same thing go for those also?  IE is it useless to add {!boost%20b=... ?

Thanks,
Robi

-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Tuesday, January 22, 2013 5:21 PM
To: solr-user@lucene.apache.org
Subject: Re: firstSearcher and NewSearcher parameters

Hi Robi,

Boosts don't do anything for warmup queries if that is what you're after...

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 22, 2013 8:08 PM, "Petersen, Robert"  wrote:

> Hi guys,
>
> I was wondering if there was a way to pass commonly used boost values 
> in with commonly used filter queries in these solrConfig event handler 
> sections.  Could I just append the ^1.5 at the end of the fq value?  
> IE can I do this:
> taxonomyCategoryTypeId:1^1.5
> Or perhaps this:
> (taxonomyCategoryTypeId:1)%5e1.5
>
>
> Is there a more comprehensive list of possible xml query parameters we 
> can put in these config sections?  Is it just anything normally passed 
> in? So far these are the only ones I have seen used:
>
>   name="queries"> 
> star:1
> storeId
> 0
> 35
> taxonomyCategoryTypeId:0 TO 1 
> ...Etc etc...
>
> Thanks,
> Robi
>



Re: solr query

2013-01-23 Thread Gora Mohanty
On 23 January 2013 23:04, hassancrowdc  wrote:
> Date and time is not being displayed properly, It gives me the following
> error and also it goes to he next line after year and month, see following:
> createdDate":"ERROR:SCHEMA-INDEX-MISMATCH,stringValue=2012-12-
> 21T21:34:51" in my schema:
>  required="true" />
> and type is:
>  positionIncrementGap="0"/>
> Is there any datetime field in solr that i can write in schema.xml so that
> my date and time are shown properly in my resultset(json) from solr?
[...]

Please create a new thread for separate questions,
rather than asking multiple questions in the same
thread. Why thread hijacking is bad is covered in
http://people.apache.org/~hossman/#threadhijack

Regards,
Gora

P.S. This must be the first time that I have seen someone
hijack their own thread.


Re: Multiple-fields multilingual indexing - Query expansion for multilingual fields

2013-01-23 Thread Alexandre Rafalovitch
On Wed, Jan 23, 2013 at 12:23 PM, Eduard Moraru wrote:

> The only small but workable problem I have now is the same as
> https://issues.apache.org/jira/browse/SOLR-3598. When you are creating an
> alias for the field "who", you can't include the actual field in the list
> of alias like "f.who.qf=who,what,where" because you`ll get an "alias loop"
> exception.
>

But why do you need 'title' field at all? I can see it is 'generic'
formatting, but how useful can that be if you are actively multilingual?

But if you need it, can't it be just title_generic in the schema. You can
probably use Request Update Processors to change the field name if you
can't rename it in the client/source.

And if you are worried about the client getting the field names, I believe
you can alias them on the way out as well, using a different parameter.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


solr-datetime field

2013-01-23 Thread hassancrowdc
Date and time are not being displayed properly. It goes to the next line after
year and month; see the following:
"createdDate":"2012-12-
21T21:34:51Z"

in my schema: 

and type is:


Is there any datetime field in solr that i can write in schema.xml so that
my date and time are shown properly in my resultset(json) from solr?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-datetime-field-tp4035704.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr cache considerations

2013-01-23 Thread Otis Gospodnetic
I think the attachment got stripped.  Here it is:
http://www.flickr.com/photos/otis/8409088080/in/photostream

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Jan 22, 2013 at 12:36 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Same here - I've seen some document caches that were huge and highly
> utilized.  Check out the screenshot of the SPM for Solr dashboard that
> shows pretty high hit rates on all caches.  I've circled the parts to look
> at.  ML manager may strip the attachment, of course. :)
>
> In addition to multiple in-request lookups and hits in document cache,
> document caches provide value when queries are frequently somewhat similar
> and thus return some of the same hits as previous queries.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Mon, Jan 21, 2013 at 1:39 PM, Erick Erickson 
> wrote:
>
>> Hmm, interesting. I'll have to look closer...
>>
>> On Sun, Jan 20, 2013 at 3:50 PM, Walter Underwood 
>> wrote:
>> > I routinely see hit rates over 75% on the document cache. Perhaps yours
>> is too small. Mine is set at 10240 entries.
>> >
>> > wunder
>> >
>> > On Jan 20, 2013, at 8:08 AM, Erick Erickson wrote:
>> >
>> >> About your question about document cache: Typically the document cache
>> >> has a pretty low hit-ratio. I've rarely, if ever, seen it get hit very
>> >> often. And remember that this cache is only hit when assembling the
>> >> response for a few documents (your page size).
>> >>
>> >> Bottom line: I wouldn't worry about this cache much. It's quite useful
>> >> for processing a particular query faster, but not really intended for
>> >> cross-query use.
>> >>
>> >> Really, I think you're getting the cart before the horse here. Run it
>> >> up the flagpole and try it. Rely on the OS to do its job
>> >> (
>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html).
>> >> Find  a bottleneck _then_ tune. Premature optimization and all
>> >> that
>> >>
>> >> Several tens of millions of docs isn't that large unless the text
>> >> fields are enormous.
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Sat, Jan 19, 2013 at 2:32 PM, Isaac Hebsh 
>> wrote:
>> >>> Ok. Thank you everyone for your helpful answers.
>> >>> I understand that fieldValueCache is not used for resolving queries.
>> >>> Is there any cache that can help this basic scenario (a lot of
>> different
>> >>> queries, on a small set of fields)?
>> >>> Does Lucene's FieldCache help (implicitly)?
>> >>> How can I use RAM to reduce I/O in this type of queries?
>> >>>
>> >>>
>> >>> On Fri, Jan 18, 2013 at 4:09 PM, Tomás Fernández Löbbe <
>> >>> tomasflo...@gmail.com> wrote:
>> >>>
>>  No, the fieldValueCache is not used for resolving queries. Only for
>>  multi-token faceting and apparently for the stats component too. The
>>  document cache maintains in memory the stored content of the fields
>> you are
>>  retrieving or highlighting on. It'll hit if the same document
>> matches the
>>  query multiple times and the same fields are requested, but as Eirck
>> said,
>>  it is important for cases when multiple components in the same
>> request need
>>  to access the same data.
>> 
>>  I think soft committing every 10 minutes is totally fine, but you
>> should
>>  hard commit more often if you are going to be using transaction log.
>>  openSearcher=false will essentially tell Solr not to open a new
>> searcher
>>  after the (hard) commit, so you won't see the new indexed data and
>> caches
>>  wont be flushed. openSearcher=false makes sense when you are using
>>  hard-commits together with soft-commits, as the "soft-commit" is
>> dealing
>>  with opening/closing searchers, you don't need hard commits to do it.
>> 
>>  Tomás
>> 
>> 
>>  On Fri, Jan 18, 2013 at 2:20 AM, Isaac Hebsh 
>>  wrote:
>> 
>> > Unfortunately, it seems (
>> > http://lucene.472066.n3.nabble.com/Nrt-and-caching-td3993612.html)
>> that
>> > these caches are not per-segment. In this case, I want to (soft)
>> commit
>> > less frequently. Am I right?
>> >
>> > Tomás, as the fieldValueCache is very similar to lucene's
>> FieldCache, I
>> > guess it has a big contribution to standard (not only faceted)
>> queries
>> > time. SolrWiki claims that it primarily used by faceting. What that
>> says
>> > about complex textual queries?
>> >
>> > documentCache:
>> > Erick, After a query processing is finished, doesn't some documents
>> stay
>>  in
>> > the documentCache? can't I use it to accelerate queries that should
>> > retrieve stored fields of documents? In this case, a big
>> documentCache
>>  can
>> > hold more documents..
>> >
>> > About commit frequency:
>> > HardCommit: "openSearch=false" seems as a nice solution. Where can
>> I read
>> > about this? (found nothing but one unexplained sentence in
>> SolrWiki).
>>

Indexing question

2013-01-23 Thread Ron Poling
Hello!

I'm new to solr and trying to figure out how to implement it in our 
environment. My question involves building the index. Our data does not lend 
itself to delta updates so we have to build the entire index each time. Is 
there some way to feed solr a file with all index records and tell it to throw 
away all current data and use only the new? I'm guessing that I could delete 
everything and add all the new records, but until the new index was built, solr 
would not be able to service my web app. I would like to build the new index in 
solr and then tell it to switch to it and remove the old one. Is that possible? 
Another way of doing this might be to update the current index with the new data 
and then delete everything that didn't get updated.

Any help here would be appreciated so I can focus on the things in the wiki 
that I need to before I start implementing. Thanks!


RE: firstSearcher and NewSearcher parameters

2013-01-23 Thread Chris Hostetter

: OK I guess I see how that makes sense.  If I use function queries for 
: affecting the scoring of results, does it help to include those in the 
: warm up queries or does the same thing go for those also?  IE is it 
: useless to add {!boost%20b=... ?

boosts on *queries* probably won't affect your warming queries (unless you 
are concerned about a particularly important/expensive query and you 
always want that exact query to be warmed) but if you typically boost on 
some functions of field values then including those functions in your 
warming queries can be helpful to ensure that the field caches for the 
fields used in those functions are warmed up.
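
A sketch of what that can look like in solrconfig.xml, assuming a hypothetical 
numeric field named "popularity" that you normally boost on; the query text itself 
matters less than the function reference, which is what forces the field cache for 
that field to be populated:

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">{!boost b=log(sum(popularity,1))}*:*</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>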


-Hoss


Re: Indexing question

2013-01-23 Thread Michael Della Bitta
Hi Ron,

If you turn off autoCommit and only commit after your delete and refresh,
the user's experience will be totally uninterrupted. Commits are used to
control visibility in a Solr index.
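
A rough sketch with curl against the default /update handler (the URL and file name 
are placeholders), assuming autoCommit is disabled in solrconfig.xml; searchers keep 
serving the old index until the final commit, at which point the new data becomes 
visible all at once:

curl 'http://localhost:8983/solr/update' -H 'Content-type: text/xml' \
     --data-binary '<delete><query>*:*</query></delete>'
curl 'http://localhost:8983/solr/update' -H 'Content-type: text/xml' \
     --data-binary @full-reindex.xml
curl 'http://localhost:8983/solr/update?commit=true'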


Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Wed, Jan 23, 2013 at 2:32 PM, Ron Poling wrote:

> Hello!
>
> I'm new to solr and trying to figure out how to implement it in our
> environment. My question involves building the index. Our data does not
> lend itself to delta updates so we have to build the entire index each
> time. Is there some way to feed solr a file with all index records and tell
> it to throw away all current data and use only the new? I'm guessing that I
> could delete everything and add all the new records, but until the new
> index was built, solr would not be able to service my web app. I would like
> to build the new index in solr and then tell it to switch to it and remove
> the old one. Is that possible? Another way of doing the might be to update
> the current index with the new data and then delete everything that didn't
> get updated.
>
> Any help here would be appreciated so I can focus on the things in the
> wiki that I need to before I start implementing. Thanks!
>


Solr4 SolrCloud ClusterState says we are the leader, but locally we don't think so

2013-01-23 Thread John Skopis (lists)
Hello,

We have recently put solr4 into production.

We have a 3 node cluster with a single shard. Each solr node is also a
zookeeper node, but zookeeper is running in cluster mode. We are using the
cloudera zookeeper package.

There are no communication problems between nodes. They are in two different
racks directly connected over a 2Gb uplink. The nodes each have a 1Gb
uplink.

I was thinking ideally mmsolr01 would be the leader; the application sends
all index requests directly to the leader node. A load balancer splits read
requests over the remaining two nodes.

We autocommit every 300s or 10k documents with a softcommit every 5s. The
index is roughly 200mm documents.

I have configured a cron to run every hour (on every node):
0 * * * * /usr/bin/curl -s '
http://localhost:8983/solr/collection1/replication?command=backup&numberToKeep=3'
> /dev/null

Using a snapshot seems to be the easiest way to reproduce, but it's also
possible to reproduce under very heavy indexing load.

When the snapshot is running, occasionally we get a zk timeout, causing the
leader to drop out of the cluster. We have also seen a few zk timeouts when
index load is very high.

After the failure it can take the now inconsistent node a few hours to
recover. After numerous failed recovery attempts the failed node seems to
sync up.

I have attached a log file demonstrating this.

We see lots of timeout requests, seemingly when the failed node tries to
sync up with the current leader by doing a full sync. This seems wrong,
there should be no reason for a timeout to happen here?

I am able to manually copy the index using tar + netcat in a few minutes.
The replication handler takes

INFO: Total time taken for download : 3549 secs

Why does it take so long to recover?

Are we better off manually replicating the index?

Much appreciated,
Thanks,
John


sample.txt.gz
Description: GNU Zip compressed data


Re: Indexing question

2013-01-23 Thread Alan Rykhus
Hello,

I do nightly builds for one of my sites. I build the new index in a
parallel directory. When it is finished I move the old files to a backup
directory (I only save one and delete the previous), move the new database
files to the correct place, then stop and restart solr. It sees the new
database and uses it.

Moving the old files over, means I always have a quick backup if
something goes wrong. It hasn't happened yet though.

al

On Wed, 2013-01-23 at 13:32 -0600, Ron Poling wrote:
> Hello!
> 
> I'm new to solr and trying to figure out how to implement it in our
>  environment. My question involves building the index. Our data does
>  not lend itself to delta updates so we have to build the entire index
>  each time. Is there some way to feed solr a file with all index
>  records and tell it to throw away all current data and use only the
>  new? I'm guessing that I could delete everything and add all the new
>  records, but until the new index was built, solr would not be able to
>  service my web app. I would like to build the new index in solr and
>  then tell it to switch to it and remove the old one. Is that possible?
>  Another way of doing the might be to update the current index with the
>  new data and then delete everything that didn't get updated.
> 
> Any help here would be appreciated so I can focus on the things in the wiki 
> that I need to before I start implementing. Thanks!

-- 
Alan Rykhus
PALS, A Program of the Minnesota State Colleges and Universities 
(507)389-1975
alan.ryk...@mnsu.edu
"Be pleasant until ten o'clock in the morning and the rest of the day
will take care of itself." ~ Elbert Hubbard



ResultSet Solr

2013-01-23 Thread hassancrowdc
Is there any way I can get rid of the response header (responseHeader, status,
QTime, response, numFound, start, docs) from the result set of the query in
Solr? I only want to see the results without this info at the top.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ResultSet-Solr-tp4035729.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud index recovery

2013-01-23 Thread Marcin Rzewucki
Guys, I pasted you the full log (see pastebin url). Yes, it is Solr4.0. 2
cores are in sync, but the 3rd one is not:
INFO: PeerSync Recovery was not successful - trying replication. core=ofac
INFO: Starting Replication Recovery. core=ofac

It started replication and even says it is done successfully:
INFO: Replication Recovery was successful - registering as Active. core=ofac

but the index files were not downloaded. It's empty, no docs. Also I do not see
a "replication.properties" file. The "tlog" dir is empty and the "index" dir contains
only 3 files: segments.gen, segments_7 and write.lock.
It seems to be a tough issue. Anyway, thanks for your help.


On 23 January 2013 15:41, Mark Miller  wrote:

> Looks like it shows 3 cores start - 2 with versions that decide they are
> up to date and one that replicates. The one that replicates doesn't have
> much logging showing that activity.
>
> Is this Solr 4.0?
>
> - Mark
>
> On Jan 23, 2013, at 9:27 AM, Upayavira  wrote:
>
> > Mark,
> >
> > Take a peek in the pastebin url Marcin mentioned earlier
> > (http://pastebin.com/qMC9kDvt) is there enough info there?
> >
> > Upayavira
> >
> > On Wed, Jan 23, 2013, at 02:04 PM, Mark Miller wrote:
> >> Was your full logged stripped? You are right, we need more. Yes, the
> peer
> >> sync failed, but then you cut out all the important stuff about the
> >> replication attempt that happens after.
> >>
> >> - Mark
> >>
> >> On Jan 23, 2013, at 5:28 AM, Marcin Rzewucki 
> wrote:
> >>
> >>> Hi,
> >>> Previously, I took the lines related to collection I tested. Maybe
> some interesting part was missing. I'm sending the full log this time.
> >>> It ends up with:
> >>> INFO: Finished recovery process. core=ofac
> >>>
> >>> The issue I described is related to collection called "ofac". I hope
> the log is meaningful now.
> >>>
> >>> It is trying to do the replication, but it seems to not know which
> files to download.
> >>>
> >>> Regards.
> >>>
> >>> On 23 January 2013 10:39, Upayavira  wrote:
> >>> the first stage is identifying whether it can sync with transaction
> >>> logs. It couldn't, because there's no index. So the logs you have shown
> >>> make complete sense. It then says 'trying replication', which is what I
> >>> would expect, and the bit you are saying has failed. So the interesting
> >>> bit is likely immediately after the snippet you showed.
> >>>
> >>>
> >>>
> >>> Upayavira
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Jan 23, 2013, at 07:40 AM, Marcin Rzewucki wrote:
> >>>
> >>>  OK, so I did yet another test. I stopped solr, removed whole "data/"
> >>>  dir and started Solr again. Directories were recreated fine, but
> >>>  missing files were not downloaded from leader. Log is attached (I
> >>>  took the lines related to my test with 2 lines of context. I hope it
> >>>  helps.). I could find the following warning message:
> >>>
> >>>
> >>> Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
> >>> INFO: PeerSync: core=ofac url=http://:8983/solr START
> >>> replicas=[http://:8983/solr/ofac/] nUpdates=100
> >>> Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
> >>> WARNING: no frame of reference to tell of we've missed updates
> >>> Jan 23, 2013 7:16:08 AM org.apache.solr.cloud.RecoveryStrategy
> >>> doRecovery
> >>> INFO: PeerSync Recovery was not successful - trying replication.
> >>> core=ofac
> >>>
> >>> So it did not know which files to download ?? Could you help me to
> >>> solve this problem ?
> >>>
> >>> Thanks in advance.
> >>> Regards.
> >>>
> >>> On 22 January 2013 23:06, Yonik Seeley <[1]yo...@lucidworks.com>
> wrote:
> >>>
> >>> On Tue, Jan 22, 2013 at 4:37 PM, Marcin Rzewucki
> >>> <[2]mrzewu...@gmail.com> wrote:
> >>>
>  Sorry, my mistake. I did 2 tests: in the 1st I removed just index
> >>> directory
> >>>
>  and in 2nd test I removed both index and tlog directory. Log lines
> >>> I've
> >>>
>  sent are related to the first case. So Solr could read tlog directory
> >>> in
> >>>
>  that moment.
> >>>
>  Anyway, do you have an idea why it did not download files from leader
> >>> ?
> >>>
> >>> For your 1st test, if you only deleted the index and not the
> >>>
> >>> transaction logs, Solr will look at the transaction logs to try and
> >>>
> >>> determine if it is up to date or not (by comparing with peers).
> >>>
> >>> If you want to clear out all the data, remove the entire data
> >>> directory.
> >>>
> >>>
> >>>
> >>> -Yonik
> >>>
> >>> [3]http://lucidworks.com
> >>>
> >>> References
> >>>
> >>> 1. mailto:yo...@lucidworks.com
> >>> 2. mailto:mrzewu...@gmail.com
> >>> 3. http://lucidworks.com/
> >>>
> >>
>
>


Re: Solrcloud 4.1 inconsistent # of results in replicas

2013-01-23 Thread Roupihs
What can I provide to get more insight into this?

I have tried lowering the commit maxDocs, but the difference between nodes is
several times the maxDocs.

If I bring up a new node it will get the correct version; however, if I kill
the leader, the wrong version becomes the master and replicates out.

-Joey 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrcloud-4-1-inconsistent-of-results-in-replicas-tp4035638p4035739.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing question

2013-01-23 Thread Upayavira
You can do this with cores. You can have one core to serve the public,
and one for indexing. Then, when you've finished updating your index,
you use the core admin handler to swap the cores around, and you do the
same thing the following night. It doesn't require any file moving nor any
restarts of Solr, just a call to an HTTP URL, as in the sketch below.
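
For example, assuming two cores named "live" and "ondeck" (the names here are
made up), the swap is a single request to the core admin handler:

  curl 'http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=ondeck'

After the swap, "live" serves the freshly built index and "ondeck" holds the
old one, ready to be rebuilt the next night.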

Upayavira

On Wed, Jan 23, 2013, at 07:51 PM, Alan Rykhus wrote:
> Hello,
> 
> I do nightly builds for one of my sites. I build the new index in a
> parallel directory. When it is finished I move the old files to a backup
> directory(I only save one, delete the previous), move the new database
> files to the correct place, then stop and restart solr. It sees the new
> database and uses it.
> 
> Moving the old files over, means I always have a quick backup if
> something goes wrong. It hasn't happened yet though.
> 
> al
> 
> On Wed, 2013-01-23 at 13:32 -0600, Ron Poling wrote:
> > Hello!
> > 
> > I'm new to solr and trying to figure out how to implement it in our
> >  environment. My question involves building the index. Our data does
> >  not lend itself to delta updates so we have to build the entire index
> >  each time. Is there some way to feed solr a file with all index
> >  records and tell it to throw away all current data and use only the
> >  new? I'm guessing that I could delete everything and add all the new
> >  records, but until the new index was built, solr would not be able to
> >  service my web app. I would like to build the new index in solr and
> >  then tell it to switch to it and remove the old one. Is that possible?
> >  Another way of doing this might be to update the current index with the
> >  new data and then delete everything that didn't get updated.
> > 
> > Any help here would be appreciated so I can focus on the things in the wiki 
> > that I need to before I start implementing. Thanks!
> 
> -- 
> Alan Rykhus
> PALS, A Program of the Minnesota State Colleges and Universities 
> (507)389-1975
> alan.ryk...@mnsu.edu
> "Be pleasant until ten o'clock in the morning and the rest of the day
> will take care of itself." ~ Elbert Hubbard
> 


Re: ResultSet Solr

2013-01-23 Thread Rafał Kuć
Hello!

Maybe you are looking to get the results in plain text if you want to
remove all the XML tags? If so, you can try adding wt=csv to get
the response as CSV instead of XML.
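
For example (host, collection and query here are just placeholders):

  curl 'http://localhost:8983/solr/collection1/select?q=*:*&rows=10&wt=csv'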

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> Thanks, half the problem is solved. Is there any way I can get rid of the rest:
> response, numFound, start, doc?



> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/ResultSet-Solr-tp4035729p4035733.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 4.0 indexing performance question

2013-01-23 Thread Kevin Stone
Another revelation...
I can see that there is a time difference in the Solr output for adding
these documents when I watch it realtime.
Here are some rows from the 3.5 solr server:

Jan 23, 2013 11:57:23 AM org.apache.solr.core.SolrCore execute
INFO: [gxdResult] webapp=/solr path=/update/javabin
params={wt=javabin&version=2} status=0 QTime=6196
Jan 23, 2013 11:57:23 AM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[RNA in situ-1386104, RNA in situ-1351487, RNA in situ-1363917,
RNA in situ-1377125, RNA in situ-1371738, RNA in situ-1378746, RNA in
situ-1383410, RNA in situ-1362712, ... (1001 adds)]} 0 6266
Jan 23, 2013 11:57:23 AM org.apache.solr.core.SolrCore execute
INFO: [gxdResult] webapp=/solr path=/update/javabin
params={wt=javabin&version=2} status=0 QTime=6266
Jan 23, 2013 11:57:24 AM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[RNA in situ-1371578, RNA in situ-1377716, RNA in situ-1378151,
RNA in situ-1360580, RNA in situ-1391657, RNA in situ-1370288, RNA in
situ-1388236, RNA in situ-1361465, ... (1001 adds)]} 0 6371
Jan 23, 2013 11:57:24 AM org.apache.solr.core.SolrCore execute
INFO: [gxdResult] webapp=/solr path=/update/javabin
params={wt=javabin&version=2} status=0 QTime=6371
Jan 23, 2013 11:57:24 AM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[RNA in situ-1350555, RNA in situ-1350887, RNA in situ-1379699,
RNA in situ-1373773, RNA in situ-1374004, RNA in situ-1372265, RNA in
situ-1373027, RNA in situ-1380691, ... (1001 adds)]} 0 6440
Jan 23, 2013 11:57:24 AM org.apache.solr.core.SolrCore execute



And here from the 4.0 solr:

Jan 23, 2013 3:40:22 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [gxdResult] webapp=/solr path=/update params={wt=javabin&version=2}
{add=[RNA in situ-115650, RNA in situ-4109, RNA in situ-107614, RNA in
situ-86038, RNA in situ-19647, RNA in situ-1422, RNA in situ-119536, RNA
in situ-5, RNA in situ-86825, RNA in situ-91009, ... (1001 adds)]} 0
3105
Jan 23, 2013 3:40:23 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [gxdResult] webapp=/solr path=/update params={wt=javabin&version=2}
{add=[RNA in situ-38103, RNA in situ-15797, RNA in situ-79946, RNA in
situ-124877, RNA in situ-62025, RNA in situ-67908, RNA in situ-70527, RNA
in situ-20581, RNA in situ-107574, RNA in situ-96497, ... (1001 adds)]} 0
2689
Jan 23, 2013 3:40:24 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [gxdResult] webapp=/solr path=/update params={wt=javabin&version=2}
{add=[RNA in situ-35518, RNA in situ-50512, RNA in situ-109961, RNA in
situ-113025, RNA in situ-33729, RNA in situ-116967, RNA in situ-133871,
RNA in situ-55287, RNA in situ-67367, RNA in situ-8617, ... (1001 adds)]}
0 2367
Jan 23, 2013 3:40:28 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [gxdResult] webapp=/solr path=/update params={wt=javabin&version=2}
{add=[RNA in situ-105749, RNA in situ-125415, RNA in situ-14667, RNA in
situ-41067, RNA in situ-1099, RNA in situ-86169, RNA in situ-90834, RNA in
situ-114639, RT-PCR-26160, RNA in situ-79745, ... (1001 adds)]} 0 3401
Jan 23, 2013 3:40:28 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [gxdResult] webapp=/solr path=/update params={wt=javabin&version=2}
{add=[RNA in situ-82061, RNA in situ-96965, RNA in situ-22677, RNA in
situ-52637, RNA in situ-131842, RNA in situ-31863, RNA in situ-111656, RNA
in situ-120509, RNA in situ-29659, RNA in situ-63579, ... (1001 adds)]} 0
3580
Jan 23, 2013 3:40:31 PM
org.apache.solr.update.processor.LogUpdateProcessor finish



I know that they aren't the same exact documents (like I said, there are
millions to load), but the times look pretty much like this for all of
them.

Can someone help me parse out the times of this? It *appears* to me that
the inserts are happening just as fast, if not faster, in 4.0 as in 3.5, BUT
the gaps between the timestamps of successive LogUpdateProcessor calls are much
longer in 4.0.
I do not have the  tag anywhere in my solrconfig.xml. So why
does it look to me like it is spending a lot of time logging? It shouldn't
really be logging anything, right? Bear in mind that these inserts happen
in threads that are pushing to Solr concurrently. So if 4.0 is logging
somewhere that 3.5 didn't, then the file-locking on that log file could be
slowing me down.

-Kevin

On 1/23/13 12:03 PM, "Kevin Stone"  wrote:

>I'm still poking around trying to find the differences. I found a couple
>things that may or may not be relevant.
>First, when I start up my 3.5 solr, I get all sorts of warnings that my
>solrconfig is old and will run using 2.4 emulation.
>Of course I had to upgrade the solconfig for the 4.0 instance (which I
>already described). I am curious if there could be some feature I was
>taking advantage of in 2.4 that doesn't exist now in 4.0. I don't know.
>
>Second when I look at the console logs for my server (3.5 and 4.0) and I
>run the indexer against each, I see a subtl

Re: ResultSet Solr

2013-01-23 Thread hassancrowdc
No, I wanted it in JSON. I want it to start from where the square bracket starts: [.
I want to remove everything before that. I can get it in JSON by including
wt=json. I just want to remove response, numFound, start and docs.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ResultSet-Solr-tp4035729p4035748.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ResultSet Solr

2013-01-23 Thread Walter Underwood
Why? Just skip over that in the code. --wunder

On Jan 23, 2013, at 12:50 PM, hassancrowdc wrote:

> no I wanted it in json. i want it to start from where square bracket starts [
> . I want to remove everything before that. I can get it in json by including
> wt=json. I just want to remove Response, numFound, start and docs. 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/ResultSet-Solr-tp4035729p4035748.html
> Sent from the Solr - User mailing list archive at Nabble.com.






Re: ResultSet Solr

2013-01-23 Thread Rafał Kuć
Hello!

As far as I know you can't remove the response, numFound, start and
docs. This is how the response is prepared by Solr, and apart from
removing the header, you can't do anything.
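
The header part can be dropped with omitHeader=true, for example (URL is
illustrative):

  curl 'http://localhost:8983/solr/collection1/select?q=*:*&wt=json&omitHeader=true'

numFound, start and docs still remain part of the response structure.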

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> no I wanted it in json. i want it to start from where square bracket starts [
> . I want to remove everything before that. I can get it in json by including
> wt=json. I just want to remove Response, numFound, start and docs. 



> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/ResultSet-Solr-tp4035729p4035748.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Solr SQL Express Integrated Security - Unable to execute query

2013-01-23 Thread O. Olson
Hi,

I am using the /example-DIH in the Solr 4.0 download. The example worked
out of the box using the HSQLDB. I then attempted to modify the files to
connect to a SQL Express instance running on my local machine. A
http://localhost:8983/solr/db/dataimport?command=full-import results in 

org.apache.solr.common.SolrException log
SEVERE: Full Import failed:java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: SELECT [ProdID],[Descr] FROM [Amazon].[dbo].[Table_Temp]
Processing Document # 1
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273) …

I first copied sqljdbc4.jar (from Microsoft)  to
/example/example-DIH/solr/db/lib. I have the following db-data-config.xml:











I have adjusted my schema.xml file accordingly.

Is there any way I can debug this problem? I want to use Integrated
Security/Authentication; am I doing this correctly?

Thank you for all the help.
O. O.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-SQL-Express-Integrated-Security-Unable-to-execute-query-tp4035758.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ResultSet Solr

2013-01-23 Thread Upayavira
If you can handle it in XML, use wt=xslt&tr=foo.xsl and use a stylesheet
to format it as you want.
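
For example, assuming the stock example config, where the XSLT response writer
is registered under wt=xslt and the stylesheets live in conf/xslt/:

  curl 'http://localhost:8983/solr/collection1/select?q=*:*&wt=xslt&tr=example.xsl'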

Upayavira

On Wed, Jan 23, 2013, at 08:53 PM, Rafał Kuć wrote:
> Hello!
> 
> As far as I know you can't remove the response, numFound, start and
> docs. This is how the response is prepared by Solr and apart from
> removing the header, you can't do anything. 
> 
> -- 
> Regards,
>  Rafał Kuć
>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
>  ElasticSearch
> 
> > no I wanted it in json. i want it to start from where square bracket starts 
> > [
> > . I want to remove everything before that. I can get it in json by including
> > wt=json. I just want to remove Response, numFound, start and docs. 
> 
> 
> 
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/ResultSet-Solr-tp4035729p4035748.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> 


Starting instances with multiple collections

2013-01-23 Thread Walter Underwood
I can get one Solr 4.1 instance up with the config bootstrapped into Zookeeper. 
In zk I see two configs, two collections, and I can run the DIH on the first 
node.

I can get the other two nodes to start and sync if I give them a 
-Dsolr.solr.home pointing to a directory with a solr.xml and subdirectories 
with configuration for each collection. If I don't do that, they look for 
solr/solr.xml, then fail. But what is the point of putting configs in Zookeeper 
if each host needs a copy anyway?

The wiki does not have an example of how to start a cluster with multiple 
collections.

Am I missing something here?

wunder
--
Walter Underwood
wun...@wunderwood.org





RE: firstSearcher and NewSearcher parameters

2013-01-23 Thread Petersen, Robert
Thanks Hoss, Good to know!  

I have that exact situation:  a complex function based on multiple field values 
that I always run for particular types of searches including global star 
searches to aid in sorting the results appropriately.  

Robi


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Wednesday, January 23, 2013 11:40 AM
To: solr-user@lucene.apache.org
Subject: RE: firstSearcher and NewSearcher parameters


: OK I guess I see how that makes sense.  If I use function queries for
: affecting the scoring of results, does it help to include those in the
: warm up queries or does the same thing go for those also?  IE is it
: useless to add {!boost%20b=... ?

boosts on *queries* probably won't affect your warming queries (unless you are 
concerned about a particularly important/expensive query and you always want 
that exact query to be warmed), but if you typically boost on some functions of 
field values, then including those functions in your warming queries can be 
helpful to ensure that the field caches for the fields used in those functions 
are warmed up.
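
As an illustrative sketch (the field, handler and function here are made up, not 
anything specific to your setup), a request like the one below exercises a 
multiplicative boost function; putting the same parameters into a 
firstSearcher/newSearcher listener entry in solrconfig.xml would populate the 
FieldCache for the "popularity" field during warming:

  curl 'http://localhost:8983/solr/collection1/select?q=ipod&defType=edismax&boost=log(sum(popularity,1))&fl=id,score&rows=10'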


-Hoss




RE: Issues with docFreq/docCount on SolrCloud

2013-01-23 Thread Markus Jelsma
Hi again,

I've tried various settings for TieredMergePolicy to make sure the docFreq, 
maxDoc and docCount don't deviate too much. We also did tests after 
increasing reclaimDeletesWeight from 2.0 to 8.0 and with slightly more frequent 
merging. In these tests we reindexed the same 500k docs each time in different 
cores with various settings at the same time.

We still see documents in distributed queries being scored slightly different 
leading to documents jumping positions in the resultset, which is obviously 
unacceptable.

To clarify, these documents don't jump positions because they have the 
same score and are sorted by Lucene docID; it's the actual score that is 
different. Also, the index doesn't change when we fire queries, and it's not a 
problem of lacking distributed IDF. It is, of course, acceptable for documents 
to jump position on a frequently changing index, that's the way it works. But 
not for multiple replicas of a static index.

Is there anyone around here with suggestions, hints or anything?

The next thing we might try is to route the same user to the same replica of a 
shard by overriding the HTTP shard handler, but I'm not sure this is a proper 
solution. This, at least, might prevent users from seeing documents jumping 
positions in the same result set.

Thanks,
Markus
 
-Original message-
> From:Markus Jelsma 
> Sent: Mon 21-Jan-2013 20:31
> To: solr-user@lucene.apache.org
> Subject: Issues with docFreq/docCount on SolrCloud
> 
> Hi,
> 
> We have a few trunk clusters running with two replica's for each shard. We 
> sometimes see results jumping positions for identical queries. We've tracked 
> it down to differences in docFreq and docCount between the leader and 
> replica's. The only way to force all cores in the shard to be consistent is 
> to optimize or forceMerge the segments.
> 
> Is there anyone here who can give advice on this issue? For obvious reasons 
> we don't want to to optimize 50GB of data on some regular basis but we do 
> want to make sure the variations in docFreq/docCount does not lead to results 
> jumping positions in the resultset for identical queries.
> 
> We already have like most of you small issues due to the lack of distributed 
> IDF, having this problem as well makes SolrCloud less predictable and harder 
> to debug.
> 
> Thanks,
> Markus
> 


RE: Issues with docFreq/docCount on SolrCloud

2013-01-23 Thread Michael Ryan
Are you able to see any evidence that some of the 500k docs are being added 
twice? Check the maxDocs on the Solr admin page. I vaguely recall there being 
some issue with docs in SolrCloud being added multiple times (which under the 
covers is really add, delete, add). I think that could cause the docCount to be 
different across "identical" indexes. That would also explain why a forceMerge 
fixes it, as the deleted documents are then fully removed.

-Michael

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Wednesday, January 23, 2013 5:38 PM
To: solr-user@lucene.apache.org
Subject: RE: Issues with docFreq/docCount on SolrCloud

Hi again,

I've tried various settings for TieredMergePolicy to make sure the docFreq, 
maxDoc and docCount don't deviate too much. We've also did tests after 
increasing reclaimDeletesWeight from 2.0 to 8.0 and slightly more frequent 
merging. In these tests we reindexed the same 500k docs each time in different 
cores with various settings at the same time.

We still see documents in distributed queries being scored slightly different 
leading to documents jumping positions in the resultset, which is obviously 
unacceptable.

To clarify, these documents don't jump positions because of them having the 
same score and being sorted by Lucene docID, it's the actual score being 
different. Also, the index doesn't change when we fire queries and it's not a 
problem of lacking distributed IDF. It is, of course, acceptable for documents 
to jump position on a frequently changing index, that's the way it works. But 
not for a multiple replica's on a static index.

Is there anyone around here with suggestions, hints or anything?

The next thing we might try is to route the same user to the same replica of a 
shard by overriding the http shard handler but i'm not sure this is a proper 
solution. This, at least, might prevent users from seeing documents jumping 
positions in the same result set.

Thanks,
Markus
 
-Original message-
> From:Markus Jelsma 
> Sent: Mon 21-Jan-2013 20:31
> To: solr-user@lucene.apache.org
> Subject: Issues with docFreq/docCount on SolrCloud
> 
> Hi,
> 
> We have a few trunk clusters running with two replica's for each shard. We 
> sometimes see results jumping positions for identical queries. We've tracked 
> it down to differences in docFreq and docCount between the leader and 
> replica's. The only way to force all cores in the shard to be consistent is 
> to optimize or forceMerge the segments.
> 
> Is there anyone here who can give advice on this issue? For obvious reasons 
> we don't want to to optimize 50GB of data on some regular basis but we do 
> want to make sure the variations in docFreq/docCount does not lead to results 
> jumping positions in the resultset for identical queries.
> 
> We already have like most of you small issues due to the lack of distributed 
> IDF, having this problem as well makes SolrCloud less predictable and harder 
> to debug.
> 
> Thanks,
> Markus
> 


RE: Issues with docFreq/docCount on SolrCloud

2013-01-23 Thread Markus Jelsma
Hi Michael,

The evidence is how Lucene works, and that I add the same docs over and over 
again in tests. If I index 500k docs to an index that already has the same 500k 
docs, it means I write a delete flag to the old 500k and add the new 500k, 
leading to a million docs (maxDoc). You're correct, only by merging segments 
(or optimize/forceMerge) can I reduce (or stabilize) maxDoc on all replicas.

Old school replication has an advantage as identical segments are replicated. 
In SolrCloud only docs are pushed to replicas. The problem now is that 
replicas don't merge at the same time, leading to differences in maxDoc, 
docCount and docFreq.

We need, and I think many SolrCloud users are going to need this as well, to 
make sure replicas don't deviate too much from each other, because if they do, 
documents are certainly going to jump positions.

Many thanks for sharing your thoughts,
Markus

 
 
-Original message-
> From:Michael Ryan 
> Sent: Wed 23-Jan-2013 23:50
> To: solr-user@lucene.apache.org
> Subject: RE: Issues with docFreq/docCount on SolrCloud
> 
> Are you able to see any evidence that some of the 500k docs are being added 
> twice? Check the maxDocs on the Solr admin page. I vaguely recall there being 
> some issue with docs in SolrCloud being added multiple times (which under the 
> covers is really add, delete, add). I think that could cause the docCount to 
> be different across "identical" indexes. That would also explain why a 
> forceMerge fixes it, as the deleted documents are then fully removed.
> 
> -Michael
> 
> -Original Message-
> From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
> Sent: Wednesday, January 23, 2013 5:38 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Issues with docFreq/docCount on SolrCloud
> 
> Hi again,
> 
> I've tried various settings for TieredMergePolicy to make sure the docFreq, 
> maxDoc and docCount don't deviate too much. We've also did tests after 
> increasing reclaimDeletesWeight from 2.0 to 8.0 and slightly more frequent 
> merging. In these tests we reindexed the same 500k docs each time in 
> different cores with various settings at the same time.
> 
> We still see documents in distributed queries being scored slightly different 
> leading to documents jumping positions in the resultset, which is obviously 
> unacceptable.
> 
> To clarify, these documents don't jump positions because of them having the 
> same score and being sorted by Lucene docID, it's the actual score being 
> different. Also, the index doesn't change when we fire queries and it's not a 
> problem of lacking distributed IDF. It is, of course, acceptable for 
> documents to jump position on a frequently changing index, that's the way it 
> works. But not for a multiple replica's on a static index.
> 
> Is there anyone around here with suggestions, hints or anything?
> 
> The next thing we might try is to route the same user to the same replica of 
> a shard by overriding the http shard handler but i'm not sure this is a 
> proper solution. This, at least, might prevent users from seeing documents 
> jumping positions in the same result set.
> 
> Thanks,
> Markus
>  
> -Original message-
> > From:Markus Jelsma 
> > Sent: Mon 21-Jan-2013 20:31
> > To: solr-user@lucene.apache.org
> > Subject: Issues with docFreq/docCount on SolrCloud
> > 
> > Hi,
> > 
> > We have a few trunk clusters running with two replica's for each shard. We 
> > sometimes see results jumping positions for identical queries. We've 
> > tracked it down to differences in docFreq and docCount between the leader 
> > and replica's. The only way to force all cores in the shard to be 
> > consistent is to optimize or forceMerge the segments.
> > 
> > Is there anyone here who can give advice on this issue? For obvious reasons 
> > we don't want to to optimize 50GB of data on some regular basis but we do 
> > want to make sure the variations in docFreq/docCount does not lead to 
> > results jumping positions in the resultset for identical queries.
> > 
> > We already have like most of you small issues due to the lack of 
> > distributed IDF, having this problem as well makes SolrCloud less 
> > predictable and harder to debug.
> > 
> > Thanks,
> > Markus
> > 
> 


setting up master and slave in same machine with diff ip's and same port

2013-01-23 Thread epnRui
Hi everyone 

It's my first post here, so I hope I'm doing it in the right place.

I'm a software developer and I'm setting up a DEV environment in Ubuntu with
the same configuration as in PROD. (Apparently this IT department doesn't
know the difference between a developer and a sys admin.)

In PROD we have a Solr master and a Solr slave, on two different IPs. Let's say:
Master 192.10.1.1
Slave 192.10.1.2

In DEV I have only one server:
10.1.1.1

All of them are Ubuntu servers.

Can I put master and slave, without touching any configuration in Solr (no
IP change, no port change), on 10.1.1.1 (DEV) and still make it work?

Basically what I'm looking for is what Ubuntu server configuration I'd have to
do to make this work.

Thanks a lot



--
View this message in context: 
http://lucene.472066.n3.nabble.com/setting-up-master-and-slave-in-same-machine-with-diff-ip-s-and-same-port-tp4035795.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Issues with docFreq/docCount on SolrCloud

2013-01-23 Thread Yonik Seeley
On Wed, Jan 23, 2013 at 6:15 PM, Markus Jelsma
 wrote:
> We need, and i think many SolrCloud users are going to need this as well, to 
> make replica's don't deviate too much from eachother, because if they do 
> documents are certainly going to jump positions.

The synchronization that would be needed to guarantee identical Lucene
indexes would be prohibitively expensive.
Perhaps the best solution would be to route users to the same replicas.

A Solr request could request a token that, when resubmitted with a
follow-up request, would result in hitting the same replicas if
possible.

-Yonik
http://lucidworks.com


Re: Issues with docFreq/docCount on SolrCloud

2013-01-23 Thread Mark Miller

On Jan 23, 2013, at 6:21 PM, Yonik Seeley  wrote:

> A solr request could request a token that when resubmitted with a
> follow-up request would result in hitting the same replicas if
> possible.

Yeah, this would be good. It's also useful for not catching "eventual 
consistency" effects between queries.

- Mark

Sorting on Score Problem

2013-01-23 Thread Kuai, Ben
Hi

We met a weird problem in our project when sorting by score in Solr 4.0: the 
document with the biggest score is not at the top. The debug explanations from 
Solr look like this:

First Document
1.8412635 = (MATCH) sum of:
  2675.7964 = (MATCH) sum of:
0.0 = (MATCH) sum of:
  0.0 = (MATCH) max of:
0.0 = (MATCH) btq, product of:
  0.0 = weight(nameComplexNoTfNoIdf:plumber^0.0 in 0) [], result of:
0.0 = score(doc=0,freq=1.0 = phraseFreq=1.0
..

Second Document
1.8412637 = (MATCH) sum of:
  0.26757964 = (MATCH) sum of:
0.0 = (MATCH) sum of:
  0.0 = (MATCH) max of:
0.0 = (MATCH) btq, product of:
  0.0 = weight(nameComplexNoTfNoIdf:plumber^0.0 in 0) [], result of:
0.0 = score(doc=0,freq=1.0 = phraseFreq=1.0
.

Third Document
1.841253 = (MATCH) sum of:
  2675.7964 = (MATCH) sum of:
0.0 = (MATCH) sum of:
  0.0 = (MATCH) max of:
0.0 = (MATCH) btq, product of:
  0.0 = weight(nameComplexNoTfNoIdf:plumber^0.0 in 0) [], result of:
0.0 = score(doc=0,freq=1.0 = phraseFreq=1.0
...


Then we thought it could be a float rounding problem, so we implemented our own 
similarity class to increase queryNorm by 10,000; it changes the score scale 
but the rank is still wrong.

Does anyone have a similar issue?

I can debug with the Solr source code; please shed some light on the sorting part.

Thanks


Re: Question on Solr Velocity Example

2013-01-23 Thread Chris Hostetter

: References: <50f8af05.8030...@elyograg.org>
:  
:  <50f99712.80...@elyograg.org>
:  
: Message-ID: <1358538442.4125.yahoomail...@web171802.mail.ir2.yahoo.com>
: Subject: Question on Solr Velocity Example
: In-Reply-To: 

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.



-Hoss


Re: Solrcloud 4.1 inconsistent # of results in replicas

2013-01-23 Thread Roupihs
This is now JIRA issue SOLR-4343



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrcloud-4-1-inconsistent-of-results-in-replicas-tp4035638p4035825.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: uniqueKey field type

2013-01-23 Thread Otis Gospodnetic
Hi,

I think trie type fields add value only if you do range queries on them, and
it sounds like that is not your use case.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 23, 2013 2:53 PM, "Isaac Hebsh"  wrote:

> Hi,
>
> In my use case, Solr have to to return only the "id" field, as a response
> for queries. However, it should return 1000 docs at once (rows=1000).
>
> My id field is defined as StrField, due to external systems constraints.
>
> I guess that TrieFields are more efficient than StrFields. *Theoretically*,
> the field content can be retrieved without loading the stored field.
>
> Should I strive that the id will be managed as a number, or it has no
> contribution to performance (search & retrieve times)?
>
> (Yes, I know that lucene has an internal id mechanism. I think it is not
> relevant to my question...)
>
>
> - Isaac.
>


Re: Sorting on Score Problem

2013-01-23 Thread Chris Hostetter

: We met a wired problem in our project when sorting by score in Solr 4.0, 
: the biggest score document is not a the top the debug explanation from 
: solr are like this,

that's weird ... can you post the full debugQuery output of an example 
query showing the problem, using "echoParams=all" & "fl=id,score" (or 
whatever unique key field you have)?
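
Something along these lines (host, collection and query are illustrative):

  curl 'http://localhost:8983/solr/collection1/select?q=plumber&fl=id,score&rows=10&echoParams=all&debugQuery=true&wt=xml'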

also: can you elaborate whether you are using a single node setup or a 
distributed (i.e. SolrCloud) query?

: Then we thought it could be a float rounding problem then we implement 
: our own similarity class to increse queryNorm by 10,000 and it changes 
: the score scale but the rank is still wrong.

when you post the details requested above, please don't use your custom 
similarity (just the out-of-the-box Solr code) so there's one less 
variable in the equation.


-Hoss


Re: Solr4 SolrCloud ClusterState says we are the leader, but locally we don't think so

2013-01-23 Thread Otis Gospodnetic
Hi,

Solr4 is 4.0 or 4.1? If the former try the latter first?

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 23, 2013 2:51 PM, "John Skopis (lists)"  wrote:

> Hello,
>
> We have recently put solr4 into production.
>
> We have a 3 node cluster with a single shard. Each solr node is also a
> zookeeper node, but zookeeper is running in cluster mode. We are using the
> cloudera zookeeper package.
>
> There is no communication problems between nodes. They are in two
> different racks directly connected over a 2Gb uplink. The nodes each have a
> 1Gb uplink.
>
> I was thinking ideally mmsolr01 would be the leader, the application sends
> all index requests directly to the leader node. A load balancer splits read
> requests over the remaining two nodes.
>
> We autocommit every 300s or 10k documents with a softcommit every 5s. The
> index is roughly 200mm documents.
>
> I have configured a cron to run every hour (on every node):
> 0 * * * * /usr/bin/curl -s '
> http://localhost:8983/solr/collection1/replication?command=backup&numberToKeep=3'
> > /dev/null
>
> Using a snapshot seems to be the easiest way to reproduce, but it's also
> possible to reproduce under very heavy indexing load.
>
> When the snapshot is running, occasionally we get a zk timeout, causing
> the leader to drop out of the cluster. We have also seen a few zk timeouts
> when index load is very high.
>
> After the failure it can take the now inconsistent node a few hours to
> recover. After numerous failed recovery attempts the failed node seems to
> sync up.
>
> I have attached a log file demonstrating this.
>
> We see lots of timeout requests, seemingly when the failed node tries to
> sync up with the current leader by doing a full sync. This seems wrong,
> there should be no reason for a timeout to happen here?
>
> I am able to manually copy the index using tar + netcat in a few minutes.
> The replication handler takes
>
> INFO: Total time taken for download : 3549 secs
>
> Why does it take so long to recover?
>
> Are we better off manually replicating the index?
>
> Much appreciated,
> Thanks,
> John
>
>
>
>
>
>
>
>


Get tokenized words in Solr Response

2013-01-23 Thread Romita Saha
Hi,

I want the tokenized keywords to be displayed in the Solr response. For 
example, my Solr search could be "Seach this document named XYZ-123", and 
the tokenizer in schema.xml tokenizes the query as follows: 
"search documnent xyz 123". I want to get these tokenized words in the 
Solr response. Is it possible?

Thanks and regards,
Romita 

Re: SolrCloud index recovery

2013-01-23 Thread Mark Miller
Yeah, I don't know what you are seeing offhand. You might try Solr 4.1 and see 
if it's something that has been resolved.

- Mark

On Jan 23, 2013, at 3:14 PM, Marcin Rzewucki  wrote:

> Guys, I pasted you the full log (see pastebin url). Yes, it is Solr4.0. 2
> cores are in sync, but the 3rd one is not:
> INFO: PeerSync Recovery was not successful - trying replication. core=ofac
> INFO: Starting Replication Recovery. core=ofac
> 
> It started replication and even says it is done successfully:
> INFO: Replication Recovery was successful - registering as Active. core=ofac
> 
> but index files were not downloaded. It's empty, no docs. Also I do not see
> "replication.properties" file. "tlog" dir is empty and "index" dir contains
> only 3 files: segments.gen, segments_7 and write.lock
> It seems to be tough issue. Anyway, thanks for your help.
> 
> 
> On 23 January 2013 15:41, Mark Miller  wrote:
> 
>> Looks like it shows 3 cores start - 2 with versions that decide they are
>> up to date and one that replicates. The one that replicates doesn't have
>> much logging showing that activity.
>> 
>> Is this Solr 4.0?
>> 
>> - Mark
>> 
>> On Jan 23, 2013, at 9:27 AM, Upayavira  wrote:
>> 
>>> Mark,
>>> 
>>> Take a peek in the pastebin url Marcin mentioned earlier
>>> (http://pastebin.com/qMC9kDvt) is there enough info there?
>>> 
>>> Upayavira
>>> 
>>> On Wed, Jan 23, 2013, at 02:04 PM, Mark Miller wrote:
 Was your full logged stripped? You are right, we need more. Yes, the
>> peer
 sync failed, but then you cut out all the important stuff about the
 replication attempt that happens after.
 
 - Mark
 
 On Jan 23, 2013, at 5:28 AM, Marcin Rzewucki 
>> wrote:
 
> Hi,
> Previously, I took the lines related to collection I tested. Maybe
>> some interesting part was missing. I'm sending the full log this time.
> It ends up with:
> INFO: Finished recovery process. core=ofac
> 
> The issue I described is related to collection called "ofac". I hope
>> the log is meaningful now.
> 
> It is trying to do the replication, but it seems to not know which
>> files to download.
> 
> Regards.
> 
> On 23 January 2013 10:39, Upayavira  wrote:
> the first stage is identifying whether it can sync with transaction
> logs. It couldn't, because there's no index. So the logs you have shown
> make complete sense. It then says 'trying replication', which is what I
> would expect, and the bit you are saying has failed. So the interesting
> bit is likely immediately after the snippet you showed.
> 
> 
> 
> Upayavira
> 
> 
> 
> 
> 
> On Wed, Jan 23, 2013, at 07:40 AM, Marcin Rzewucki wrote:
> 
> OK, so I did yet another test. I stopped solr, removed whole "data/"
> dir and started Solr again. Directories were recreated fine, but
> missing files were not downloaded from leader. Log is attached (I
> took the lines related to my test with 2 lines of context. I hope it
> helps.). I could find the following warning message:
> 
> 
> Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
> INFO: PeerSync: core=ofac url=http://:8983/solr START
> replicas=[http://:8983/solr/ofac/] nUpdates=100
> Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
> WARNING: no frame of reference to tell of we've missed updates
> Jan 23, 2013 7:16:08 AM org.apache.solr.cloud.RecoveryStrategy
> doRecovery
> INFO: PeerSync Recovery was not successful - trying replication.
> core=ofac
> 
> So it did not know which files to download ?? Could you help me to
> solve this problem ?
> 
> Thanks in advance.
> Regards.
> 
> On 22 January 2013 23:06, Yonik Seeley <[1]yo...@lucidworks.com>
>> wrote:
> 
> On Tue, Jan 22, 2013 at 4:37 PM, Marcin Rzewucki
> <[2]mrzewu...@gmail.com> wrote:
> 
>> Sorry, my mistake. I did 2 tests: in the 1st I removed just index
> directory
> 
>> and in 2nd test I removed both index and tlog directory. Log lines
> I've
> 
>> sent are related to the first case. So Solr could read tlog directory
> in
> 
>> that moment.
> 
>> Anyway, do you have an idea why it did not download files from leader
> ?
> 
> For your 1st test, if you only deleted the index and not the
> 
> transaction logs, Solr will look at the transaction logs to try and
> 
> determine if it is up to date or not (by comparing with peers).
> 
> If you want to clear out all the data, remove the entire data
> directory.
> 
> 
> 
> -Yonik
> 
> [3]http://lucidworks.com
> 
> References
> 
> 1. mailto:yo...@lucidworks.com
> 2. mailto:mrzewu...@gmail.com
> 3. http://lucidworks.com/
> 
 
>> 
>> 



Re: uniqueKey field type

2013-01-23 Thread Isaac Hebsh
"id" field is not serial, it generated randomly.. so range queries on this
field are almost useless.
I mentioned TrieField, because solr.LongField is internally implemented as
a string, while solr.TrieLongField is a number. It might improve
performace, even without setting a precisionStep...


On Thu, Jan 24, 2013 at 3:31 AM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> I think trie type fields add value only if you do range queries in them and
> it sounds like that is bit your use case.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Jan 23, 2013 2:53 PM, "Isaac Hebsh"  wrote:
>
> > Hi,
> >
> > In my use case, Solr have to to return only the "id" field, as a response
> > for queries. However, it should return 1000 docs at once (rows=1000).
> >
> > My id field is defined as StrField, due to external systems constraints.
> >
> > I guess that TrieFields are more efficient than StrFields.
> *Theoretically*,
> > the field content can be retrieved without loading the stored field.
> >
> > Should I strive that the id will be managed as a number, or it has no
> > contribution to performance (search & retrieve times)?
> >
> > (Yes, I know that lucene has an internal id mechanism. I think it is not
> > relevant to my question...)
> >
> >
> > - Isaac.
> >
>


Re: Solr SQL Express Integrated Security - Unable to execute query

2013-01-23 Thread Shawn Heisey

On 1/23/2013 2:32 PM, O. Olson wrote:

Hi,

I am using the /example-DIH in the Solr 4.0 download. The example worked
out of the box using the HSQLDB. I then attempted to modify the files to
connect to a SQL Express instance running on my local machine. A
http://localhost:8983/solr/db/dataimport?command=full-import results in

org.apache.solr.common.SolrException log
SEVERE: Full Import failed:java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: SELECT [ProdID],[Descr] FROM [Amazon].[dbo].[Table_Temp]
Processing Document # 1
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273) …


There will be a lot more detail to this error.  This detail may have a 
clue about what happened.  Can you include the entire stacktrace?


Thanks,
Shawn



Re: Solr4 SolrCloud ClusterState says we are the leader, but locally we don't think so

2013-01-23 Thread John Skopis (lists)
Sorry for leaving that bit out. This is Solr 4.1.0.

Thanks again,
John

On Wed, Jan 23, 2013 at 5:39 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> Solr4 is 4.0 or 4.1? If the former try the latter first?
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Jan 23, 2013 2:51 PM, "John Skopis (lists)"  wrote:
>
> > Hello,
> >
> > We have recently put solr4 into production.
> >
> > We have a 3 node cluster with a single shard. Each solr node is also a
> > zookeeper node, but zookeeper is running in cluster mode. We are using
> the
> > cloudera zookeeper package.
> >
> > There is no communication problems between nodes. They are in two
> > different racks directly connected over a 2Gb uplink. The nodes each
> have a
> > 1Gb uplink.
> >
> > I was thinking ideally mmsolr01 would be the leader, the application
> sends
> > all index requests directly to the leader node. A load balancer splits
> read
> > requests over the remaining two nodes.
> >
> > We autocommit every 300s or 10k documents with a softcommit every 5s. The
> > index is roughly 200mm documents.
> >
> > I have configured a cron to run every hour (on every node):
> > 0 * * * * /usr/bin/curl -s '
> >
> http://localhost:8983/solr/collection1/replication?command=backup&numberToKeep=3
> '
> > > /dev/null
> >
> > Using a snapshot seems to be the easiest way to reproduce, but it's also
> > possible to reproduce under very heavy indexing load.
> >
> > When the snapshot is running, occasionally we get a zk timeout, causing
> > the leader to drop out of the cluster. We have also seen a few zk
> timeouts
> > when index load is very high.
> >
> > After the failure it can take the now inconsistent node a few hours to
> > recover. After numerous failed recovery attempts the failed node seems to
> > sync up.
> >
> > I have attached a log file demonstrating this.
> >
> > We see lots of timeout requests, seemingly when the failed node tries to
> > sync up with the current leader by doing a full sync. This seems wrong,
> > there should be no reason for a timeout to happen here?
> >
> > I am able to manually copy the index using tar + netcat in a few minutes.
> > The replication handler takes
> >
> > INFO: Total time taken for download : 3549 secs
> >
> > Why does it take so long to recover?
> >
> > Are we better off manually replicating the index?
> >
> > Much appreciated,
> > Thanks,
> > John
> >
> >
> >
> >
> >
> >
> >
> >
>


Re: Starting instances with multiple collections

2013-01-23 Thread Shawn Heisey

On 1/23/2013 3:12 PM, Walter Underwood wrote:

I can get one Solr 4.1 instance up with the config bootstrapped into Zookeeper. 
In zk I see two configs, two collections, and I can run the DIH on the first 
node.

I can get the other two nodes to start and sync if I give them a 
-Dsolr.solr.home pointing to a directory with a solr.xml and subdirectories 
with configuration for each collection. If I don't do that, they look for 
solr/solr.xml, then fail. But what is the point of putting configs in Zookeeper 
if each host needs a copy anyway?

The wiki does not have an example of how to start a cluster with multiple 
collections.

Am I missing something here?


I am a beginner at SolrCloud.  I did recently get 4.1 running, though.  
With help from Mark Miller, what I did was set up a basic solr.xml based 
on the example, but with zero cores.  I did add a sharedLib parameter 
because I use ICU components in schema.xml.  I started one server with 
the zkHost parameter, then started the other with the bootstrap options, 
calling that config 'mbbasecfg'.  That got my config into zookeeper.  I 
still didn't have any collections or cores at this point.
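
A rough sketch of what those two startups look like (hosts, paths and ports here 
are examples, not my exact values):

  # first node, pointed at the external zookeeper ensemble
  java -DzkHost=zk1:2181,zk2:2181,zk3:2181 -Dsolr.solr.home=/index/solr4 -jar start.jar

  # second node, also uploading the shared config to zookeeper as 'mbbasecfg'
  java -DzkHost=zk1:2181,zk2:2181,zk3:2181 -Dbootstrap_confdir=/index/solr4/conf \
       -Dcollection.configName=mbbasecfg -Dsolr.solr.home=/index/solr4 -jar start.jar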


Then I did:

http://server:port/solr/admin/collections?action=CREATE&name=mbstuff&numShards=1&replicationFactor=2&collection.configName=mbbasecfg

That created the following cores, one per server:

mbstuff_shard1_replica1
mbstuff_shard1_replica2

I have since created other collections in the same way.  It works really 
well so far.  I still haven't gotten to failure testing, but that's coming.


There hasn't been time to put my findings on the wiki.  This is the 
first time I'm writing anything about it, and it's incomplete.


Thanks,
Shawn



zookeeper config

2013-01-23 Thread J Mohamed Zahoor
Hi

I am using Solr 4.0.
I see the Solr data in zookeeper is placed on the root znode itself.
This becomes a pain if the zookeeper instance is used for multiple projects 
like HBase and the like.

I am thinking of raising a Jira for putting them under a znode like /solr or 
something like that?

./Zahoor



Re: ResultSet Solr

2013-01-23 Thread Mikhail Khludnev
http://wiki.apache.org/solr/XsltResponseWriter

IIRC you can even output JSON via XSLT.


On Thu, Jan 24, 2013 at 5:11 AM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> Write a custom response writer?
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Jan 23, 2013 3:51 PM, "hassancrowdc"  wrote:
>
> > no I wanted it in json. i want it to start from where square bracket
> > starts [
> > . I want to remove everything before that. I can get it in json by
> > including
> > wt=json. I just want to remove Response, numFound, start and docs.
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/ResultSet-Solr-tp4035729p4035748.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Get tokenized words in Solr Response

2013-01-23 Thread Mikhail Khludnev
Romita,

IIRC you've already asked this, and I replied that everything you need
is in the debugQuery=on output. That format is a little bit verbose, and I
suppose you may have some difficulty finding the necessary info
there. Please provide the debugQuery=on output and I can try to highlight the
necessary info for you, as in the example request below.
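
For example (host, collection and query are illustrative), the parsedquery and 
parsedquery_toString entries in the debug section show the analyzed terms:

  curl 'http://localhost:8983/solr/collection1/select?q=Seach+this+document+named+XYZ-123&debugQuery=on&rows=0&wt=xml'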


On Thu, Jan 24, 2013 at 6:11 AM, Romita Saha
wrote:

> Hi,
>
> I want the tokenized keywords to be displayed in solr response. As for
> example, my solr search could be "Seach this document named XYZ-123". And
> the tokenizer in schema.xml tokenizes the query as follows:
> "search documnent xyz 123". I want to get these tokenized words in the
> Solr response. Is it possible?
>
> Thanks and regards,
> Romita




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: setting up master and slave in same machine with diff ip's and same port

2013-01-23 Thread Marcin Rzewucki
Hi,

Have you tried to add aliases to your network interface (for master and
slave)? Then you should use -Djetty.host and -Djetty.port to bind Solr to the
appropriate IPs. I think you should also use different directories for the Solr
files (-Dsolr.solr.home), as there may be some conflict with index
files, etc. That's the first thing that comes to my mind. A rough sketch is below.
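
As a sketch only (interface name, IPs and paths are examples, and the jetty.host 
property assumes the stock example jetty.xml honors it):

  # add the two PROD addresses as aliases on the DEV box (run as root)
  ip addr add 192.10.1.1/24 dev eth0 label eth0:0
  ip addr add 192.10.1.2/24 dev eth0 label eth0:1

  # start master and slave, each bound to its own IP and solr home
  # (run each in its own shell/session)
  cd /opt/solr-master/example && java -Djetty.host=192.10.1.1 -Djetty.port=8983 \
      -Dsolr.solr.home=/opt/solr-master/example/solr -jar start.jar
  cd /opt/solr-slave/example && java -Djetty.host=192.10.1.2 -Djetty.port=8983 \
      -Dsolr.solr.home=/opt/solr-slave/example/solr -jar start.jar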

Regards.

On 23 January 2013 23:48, epnRui  wrote:

> Hi everyone
>
> its my first post here so I hope im doing it in the right place.
>
> Im a software developer and Im setting up a DEV environment in Ubuntu with
> the same configuration as in PROD. (apparently this IT department doesnt
> know the difference between a developer and a sys admin)
>
> In PROD we have Solr Master and Solr slave, on two different IPs. Lets say:
> Master 192.10.1.1
> Slave 192.10.1.2
>
> In DEV I have only one server:
> 10.1.1.1
>
> All of them are Ubuntu servers.
>
> Can I put Master and Slave, without touching any configurations in Solr,no
> IP change, no Port change, in 10.1.1.1 (DEV), and still make it work?
>
> Basically what Im looking for is what Ubuntu server configuration Id have
> to
> do to make this work.
>
> Thanks a lot
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/setting-up-master-and-slave-in-same-machine-with-diff-ip-s-and-same-port-tp4035795.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


SOLR inconsistent results?

2013-01-23 Thread Omar Z
I have two solr instances. One is a master and the other a slave, polling the
master every 20 seconds or so for index updates. My application mainly
queries the slave, so most of the load falls to it.

There are some areas of the application that do query the master, however.
For instance, During the execution of an action (I am using the symfony 2
framework + solarium bundle + solarium lib) I query the master. I query not
just once, but between 20-50 times during the lifetime of the execution of
the action. You can assume that this amount of querying is tolerable. What
occurs during the querying has left me perplexed.

If I execute the action (make a page request through the browser), say
twice, the set of results returned are different for each of the requests.
To simplify, if the action only queried the master three times, then:

page request one: (first query: 1 hits, second query: 0 hits, third query: 1
hits)

page request two: (first query: 0 hits, second query: 1 hits, third query, 0
hits)

There are no differences in the queries in the first page request and second
(although the three queries themselves are different from each other). They
are the exact same queries. I tail the request logs on the Solr master
instance, and it does log all of the requests, so all of the requests made
by the application code are being received correctly by the master (this
rules out connection issues and application-level issues), but it seems to
get hits sometimes and not at other times. When I perform the same query
(the one that returned 0 hits during the execution of the action) in the front-end
Solr interface, I do get the hit I am expecting.

There is another server apart from the master, slave, and application, that
runs a process that continuously updates the index based on changes detected
in the source data - a relational database.

Could anyone provide some insight on this inconsistent behavior? Why would
Solr produce two different results for the same query?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-inconsistent-results-tp4035888.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: zookeeper config

2013-01-23 Thread Per Steffensen
This is supported. You just need to adjust your ZK connection string: 
"<host>:<port>/solr,<host>:<port>/solr,...,<host>:<port>/solr"


Regards, Per Steffensen

On 1/24/13 7:57 AM, J Mohamed Zahoor wrote:

Hi

I am using Solr 4.0.
I see the Solr data in zookeeper is placed on the root znode itself.
This becomes a pain if the zookeeper instance is used for multiple projects 
like HBase and like.

I am thinking of raising a Jira for putting them under a znode /solr or 
something like that?

./Zahoor