SolrCloud clarification/Question

2015-09-16 Thread Ravi Solr
Hello,
     We are trying to move away from a master-slave configuration to a
SolrCloud environment. I have a couple of questions. Currently in the
master-slave setup we have 4 machines, 2 of which are indexers and 2 of
which are query servers. The query servers are fronted via a load balancer.

There are 3 Solr cores for 3 different/separate applications (mutually
exclusive). Each core is a complete index of all docs (i.e. the data is not
sharded).

     We intend to keep it in a non-sharded mode even after the move to
SolrCloud. The prime motivation to move to cloud is to effectively use all
servers for indexing and querying (read: fault tolerant/redundant).

So, the real question is: can SolrCloud be used without shards? I.e. can a
"collection" reside entirely on one machine rather than partitioning data
onto different machines?

Thanks

Ravi Kiran Bhaskar


Re: SolrCloud clarification/Question

2015-09-16 Thread Ravi Solr
Thank you very much for responding, Sameer. So numShards=0 and
replicationFactor=4 if I have 4 machines??

Thanks

Ravi Kiran Bhaskar

On Wed, Sep 16, 2015 at 12:56 PM, Sameer Maggon 
wrote:

> Absolutely. You can have a collection with just replicas and no shards for
> redundancy and have a load balancer in front of it that removes the
> dependency on a single node. One of them will assume the role of a leader,
> and in case that leader goes down, one of the replicas will be elected as a
> leader and your application will be fine.
>
> Thanks,
>
> --
> *Sameer Maggon*
> Measured Search
> c: 310.344.7266
> www.measuredsearch.com <http://measuredsearch.com>
>


Re: SolrCloud clarification/Question

2015-09-16 Thread Ravi Solr
OK... I understood numShards=1, but when you say replicationFactor=2, what
does it mean? I have 4 machines; then there are only 3 copies of the data
(1 at the leader and 2 replicas)?? So am I not under-utilizing one machine?

I was thinking more along the lines of a mesh connectivity format, i.e.
everybody has the others' copy, so that I can put all 4 machines behind a
load balancer... Is that a wrong way to look at it?

Thanks

Ravi Kiran

On Wed, Sep 16, 2015 at 2:51 PM, Sameer Maggon 
wrote:

> You'll have to say numShards=1 and replicationFactor=2.
>
> http://[hostname]:8983/solr/admin/collections?action=CREATE&name=test&configName=test&numShards=1&replicationFactor=2
>
> --
> *Sameer Maggon*
> Measured Search
> c: 310.344.7266
> www.measuredsearch.com <http://measuredsearch.com>
>


Re: SolrCloud clarification/Question

2015-09-18 Thread Ravi Solr
Thank you very much Sameer, Erick and Upayavira. I got SolrCloud
working!!! Hurray!!

Cheers

Ravi Kiran Bhaskar

On Thu, Sep 17, 2015 at 3:10 AM, Upayavira  wrote:

> and replicationFactor is the number of copies of your data, not the
> number of servers marked 'replica'. So as has been said, if you have one
> leader, and three replicas, your replicationFactor will be 4.
>
> Upayavira
>
> On Thu, Sep 17, 2015, at 03:29 AM, Erick Erickson wrote:
> > Ravi:
> >
> > Sameer is correct on how to get it done in one go.
> >
> > Don't get too hung up on replicationFactor. You can always
> > ADDREPLICA after the collection is created if you need to.
> >
> > Best,
> > Erick
> >
> >
> > On Wed, Sep 16, 2015 at 12:44 PM, Sameer Maggon
> >  wrote:
> > > I just gave an example API call, but for your scenario, the
> > > replicationFactor will be 4 (replicationFactor=4). In this way, all 4
> > > machines will have the same copy of the data and you can put an LB in
> > > front of those 4 machines.
>
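
Putting the thread together for the four-machine, non-sharded setup: one
collection with a single shard and four copies of the data, created with a
Collections API call in the same style as Sameer's example (the hostname
and the collection/config names are placeholders):

http://[hostname]:8983/solr/admin/collections?action=CREATE&name=test&configName=test&numShards=1&replicationFactor=4

And, per Erick's note, another copy can always be added after the fact with
something like:

http://[hostname]:8983/solr/admin/collections?action=ADDREPLICA&collection=test&shard=shard1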


SolrCloud DIH issue

2015-09-19 Thread Ravi Solr
I am facing a weird problem. As part of an upgrade from 4.7.2
(master-slave) to 5.3.0 (SolrCloud) I re-indexed 1.5 million records via
DIH using SolrEntityProcessor yesterday, and all of them indexed properly.
This morning I just ran the DIH again with a delta import and I lost all
docs... What am I missing? Did anybody face a similar issue?

Here are the errors in the logs

9/19/2015, 2:41:17 AM ERROR null SolrCore Previous SolrRequestInfo was not closed!
req=waitSearcher=true&distrib.from=http://10.128.159.32:8983/solr/sitesearchcore/&update.distrib=FROMLEADER&openSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false
9/19/2015, 2:41:17 AM ERROR null SolrCore prev == info : false
9/19/2015, 2:41:17 AM WARN null ZKPropertiesWriter Could not read DIH properties from /configs/sitesearchcore/dataimport.properties :class org.apache.zookeeper.KeeperException$NoNodeException

org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode
= NoNode for /configs/sitesearchcore/dataimport.properties
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at 
org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:349)
at 
org.apache.solr.handler.dataimport.ZKPropertiesWriter.readIndexerProperties(ZKPropertiesWriter.java:91)
at 
org.apache.solr.handler.dataimport.ZKPropertiesWriter.persist(ZKPropertiesWriter.java:65)
at 
org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:307)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:253)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)

9/19/2015, 11:16:43 AM ERROR null SolrCore Previous SolrRequestInfo was not closed!
req=waitSearcher=true&distrib.from=http://10.128.159.32:8983/solr/sitesearchcore/&update.distrib=FROMLEADER&openSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false
9/19/2015, 11:16:43 AM ERROR null SolrCore prev == info : false



Thanks

Ravi Kiran Bhaskar
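
For context, the SolrEntityProcessor setup implied here pulls documents out
of the old index over HTTP. A minimal sketch of what the entity in
solr-import-config.xml might look like (the url, query, and rows values are
placeholders; the actual config was not posted):

<dataConfig>
  <document>
    <entity name="sep" processor="SolrEntityProcessor"
            url="http://[old-master]:8983/solr/sitesearchcore"
            query="*:*"
            rows="500"/>
  </document>
</dataConfig>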


Re: SolrCloud DIH issue

2015-09-19 Thread Ravi Solr
Thank you for the prompt response, Erick. I did a full-import yesterday;
you are correct that I did not push dataimport.properties to ZK. Should it
not have worked even for a full import? You may be right about the 'clean'
option; I will reindex again today. BTW, how do we push a single file to a
specific config name in ZooKeeper?


Thanks,

Ravi Kiran Bhaskar


On Sat, Sep 19, 2015 at 1:48 PM, Erick Erickson 
wrote:

> Could not read DIH properties from
> /configs/sitesearchcore/dataimport.properties
>
> This looks like somehow you didn't push this file up to Zookeeper. You
> can check what files are there in the admin UI. How you indexed
> yesterday is a mystery though, unless somehow this file was removed
> from ZK.
>
> As for why you lost all the docs, my suspicion is that you have the
> clean param set up for delta import
>
> FWIW,
> Erick
>
>
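
For what it's worth, the 'clean' parameter Erick mentions defaults to true
for a full-import, which wipes the index before importing; so a
delta-import issued with clean=true, or a full-import that then finds no
rows, can leave the index empty. Passing it explicitly avoids the surprise.
Assuming the handler is registered at /dataimport, something like:

http://[hostname]:8983/solr/sitesearchcore/dataimport?command=delta-import&clean=false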


Re: SolrCloud DIH issue

2015-09-19 Thread Ravi Solr
Thanks Erick, I will report back once the reindex is finished. Oh, your
answer reminded me of another question - regarding configsets, the
documentation says

"On a multicore Solr instance, you may find that you want to share
configuration between a number of different cores."

Can the same be used to push disparate, mutually exclusive configs? I ask
this as I have 4 mutually exclusive apps, each with a single-core index, on
a single machine, which I am trying to convert to SolrCloud with a
single-shard approach. Just being lazy and trying to find a way to update
and link configs to ZooKeeper ;-)

Thanks

Ravi Kiran Bhaskar

On Sat, Sep 19, 2015 at 6:54 PM, Erick Erickson 
wrote:

> Just pushing up the entire configset would be easiest, but the
> Zookeeper command line tools allow you to push up a single
> file if you want.
>
> Yeah, it puzzles me too that the import worked yesterday, not really
> sure what happened, the file shouldn't just disappear
>
> Erick
>
>
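
For reference, both operations Erick describes look roughly like this with
the zkcli.sh that ships with Solr 5.x (the ZK hosts and local paths are
placeholders for this environment):

# push a whole configset up to ZooKeeper under a name
./server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
    -cmd upconfig -confdir /path/to/conf -confname sitesearchcore

# push a single file into an existing configset
./server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
    -cmd putfile /configs/sitesearchcore/dataimport.properties /path/to/dataimport.properties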


Re: SolrCloud DIH issue

2015-09-19 Thread Ravi Solr
Can't thank you enough for clarifying it at length. Yeah, it's pretty
confusing even for experienced Solr users. I used the upconfig and
linkconfig commands to update 4 collections into ZooKeeper... As you
described, I lucked out as I used the same name for the configset and the
collection, and hence did not have to use the collections API :-)

Thanks,

Ravi Kiran Bhaskar

On Sat, Sep 19, 2015 at 11:22 PM, Erick Erickson 
wrote:

> Let's back up a second. Configsets are what _used_ to be in the conf
> directory for each core on a local drive, it's just that they're now
> kept up on Zookeeper. Otherwise, you'd have to put them on each
> instance in SolrCloud, and bringing up a new replica on a new machine
> would look a lot like adding a core with the old core admin API.
>
> So instead, configurations are kept on zookeeper. A config set
> consists of, essentially, a named old-style "conf" directory. There's
> no a-priori limit to the number of config sets you can have. Look in
> the admin UI, Cloud>>tree>>configs and you'll see each name you've
> pushed to ZK. If you explore that tree, you'll see a lot of old
> familiar faces, schema.xml, solrconfig.xml, etc.
>
> So now we come to associating configs with collections. You've
> probably done one of the examples where some things happen under the
> covers, including explicitly pushing the configset to Zookeeper.
> Currently, there's no option in the bin/solr script to push a config,
> although I know there's a JIRA to do that.
>
> So, to put a new config set up you currently need to use the zkCli.sh
> script see:
> https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities,
> the "upconfig" command. That pushes the configset up to ZK and gives
> it a name.
>
> Now, you create a collection and it needs a configset stored in ZK.
> It's a little tricky in that if you do _not_ explicitly specify a
> configest (using the collection.configName parameter to the
> collections API CREATE command), then by default it'll look for a
> configset with the same name as the collection. If it doesn't find
> one, _and_ there is one and only one configset, then it'll use that
> one (personally I find that confusing, but that's the way it works).
> See: https://cwiki.apache.org/confluence/display/solr/Collections+API
>
> If you have two or more configsets in ZK, then either the configset
> name has to be identical to the collection name (if you don't specify
> collection.configName), _or_ you specify collection.configName at
> create time.
>
> NOTE: there are _no_ config files on the local disk! When a replica of
> a collection loads, it "knows" what collection it's part of and pulls
> the corresponding configset from ZK.
>
> So typically the process is this.
> > you create the config set by editing all the usual suspects, schema.xml,
> solrconfig.xml, DIH config etc.
> > you put those configuration files into some version control system (you
> are using one, right?)
> > you push the configs to Zookeeper
> > you create the collection
> > you figure out you need to change the configs so you
>   > check the code out of your version control
>   > edit them
>   > put the current version back into version control
>   > push the configs up to zookeeper, overwriting the ones already
> there with that name
>   > reload the collection or bounce all the servers. As each replica
> in the collection comes up,
>  it downloads the latest configs from Zookeeper to memory (not to
> disk) and uses them.
>
> Seems like a long drawn-out process, but pretty soon it's automatic.
> And really, the only extra step is the push to Zookeeper, the rest is
> just like old-style cores with the exception that you don't have to
> manually push all the configs to all the machines hosting cores.
>
> Notice that I have mostly avoided talking about "cores" here. Although
> it's true that a replica in a collection is just another core, it's
> "special" in that it has certain very specific properties set. I
> _strongly_ advise you stop thinking about old-style Solr cores and
> instead thing about collections and replicas. And above all, do _not_
> use the admin core API to try to create members of a collection
> (cores), use the collections API to ADDREPLICA/DELETEREPLICA instead.
> Loading/unloading cores is less "fraught", but I try to avoid that too
> and use
>
> Best,
> Erick
>
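
To make the naming rules above concrete, a sketch in the same Collections
API style (the collection and configset names are placeholders): create a
collection against an explicitly named configset, then reload it after the
configs are re-pushed to ZooKeeper:

http://[hostname]:8983/solr/admin/collections?action=CREATE&name=app1&collection.configName=app1conf&numShards=1&replicationFactor=4

http://[hostname]:8983/solr/admin/collections?action=RELOAD&name=app1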

Re: SolrCloud DIH issue

2015-09-20 Thread Ravi Solr
Yes Upayavira, that's exactly what prompted me to ask Erick as soon as I
read https://cwiki.apache.org/confluence/display/solr/Config+Sets

Erick, regarding my delta-import not working: I do see the
dataimport.properties in ZooKeeper after I "upconfig" and "linkconfig" my
conf files into ZK... see below

[zk: localhost: (CONNECTED) 0] ls /configs/xx
[admin-extra.menu-top.html, person-synonyms.txt, entity-stopwords.txt,
protwords.txt, location-synonyms.txt, solrconfig.xml,
organization-synonyms.txt, stopwords.txt, spellings.txt,
dataimport.properties, admin-extra.html, xslt, synonyms.txt, scripts.conf,
subject-synonyms.txt, elevate.xml, admin-extra.menu-bottom.html,
solr-import-config.xml, clustering, schema.xml]

However, when I look into the dataimport.properties in my 'conf' folder, it
hasn't updated even after successfully running a full-import on Sep 19 2015
1:00AM and a subsequent delta-import on Sep 20 2015 11:AM, which did not
import newer docs. This prompted me to look into the dataimport.properties
in the conf folder... the details are shown below; you can clearly see the
dates are quite a bit off.

[@y conf]$ cat dataimport.properties
#Tue Sep 15 18:11:17 UTC 2015
reindex-docs.last_index_time=2015-09-15 18\:11\:16
last_index_time=2015-09-15 18\:11\:16
sep.last_index_time=2014-03-24 13\:41\:46


I saw some JIRA tickets about a different location of dataimport.properties
for SolrCloud, but couldn't find the path where it is stored... Does
anybody have an idea where it stores it?

Thanks

Ravi Kiran Bhaskar



On Sun, Sep 20, 2015 at 5:28 AM, Upayavira  wrote:

> It is worth noting that the ref guide page on configsets refers to
> non-cloud mode (a useful new feature) whereas people may confuse this
> with configsets in cloud mode,  which use Zookeeper.
>
> Upayavira
>
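
On the question of where SolrCloud keeps it: the ZKPropertiesWriter named
in the earlier stack trace reads and writes dataimport.properties in
ZooKeeper under /configs/[configname]/dataimport.properties (the very path
in the NoNode error), not in the local conf directory, so a stale local
file is expected. One way to inspect the live copy, assuming the stock
zkcli.sh (paths are placeholders):

./server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181 \
    -cmd getfile /configs/xx/dataimport.properties /tmp/dataimport.properties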

SolrCloud Startup question

2015-09-21 Thread Ravi Solr
Can somebody kindly help me understand the difference between the following
startup calls ?

./solr start -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181

Vs

./solr start -c -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181

What happens if I don't pass the "-c" option?? I read the documentation
but got more confused. I do run a ZK ensemble of 3 instances. FYI my cloud
seems to work fine and the Admin UI shows the Cloud graph just fine, but I
want to make sure I am doing the right thing and not missing any nuance.

The following is from the documentation on cwiki.
---

"Start Solr in SolrCloud mode, which will also launch the embedded
ZooKeeper instance included with Solr.

This option can be shortened to simply -c.

If you are already running a ZooKeeper ensemble that you want to use
instead of the embedded (single-node) ZooKeeper, you should also pass the
-z parameter."

-

Thanks

Ravi Kiran Bhaskar


Re: SolrCloud Startup question

2015-09-21 Thread Ravi Solr
Thank you Anshum & Upayavira.

BTW, do any of you guys know if CloudSolrClient is thread-safe??

Thanks,

Ravi Kiran Bhaskar

On Monday, September 21, 2015, Anshum Gupta  wrote:

> Hi Ravi,
>
> I just tried it out and here's my understanding:
>
> 1. Starting Solr with -c starts Solr in cloud mode. This is used to start
> Solr with an embedded zookeeper.
> 2. Starting Solr with -z starts Solr in cloud mode, with the zk connection
> string you specify. You don't need to explicitly specify -c in this case.
> The help text there needs a bit of fixing though
>
> *  -z   ZooKeeper connection string; only used when running in
> SolrCloud mode using -c*
> *   To launch an embedded ZooKeeper instance, don't pass
> this parameter.*
>
> *"only used when running in SolrCloud mode using -c" *needs to be rephrased
> or removed. Can you create a JIRA for the same?
>
> --
> Anshum Gupta
>
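
In short, against the commands from the original question (the port is
elided there as well): both of the following start Solr in cloud mode. The
first uses the embedded single-node ZooKeeper, while the second points at
the external ensemble, which implies cloud mode on its own:

./solr start -c -p [port] -s /solr/home

./solr start -p [port] -s /solr/home -z zk1:2181,zk2:2181,zk3:2181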


Re: SolrCloud Startup question

2015-09-22 Thread Ravi Solr
Thanks Anshum

On Mon, Sep 21, 2015 at 6:23 PM, Anshum Gupta 
wrote:

> CloudSolrClient is thread safe and it is highly recommended you reuse the
> client.
>
> If you are providing an HttpClient instance while constructing, make sure
> that the HttpClient uses a multi-threaded connection manager.
>
>
> --
> Anshum Gupta
>
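
Since CloudSolrClient is thread-safe, the usual pattern is one shared
instance for the life of the application rather than one per request. A
minimal sketch against the SolrJ 5.x API (the ZK hosts and collection name
are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class SolrClients {

    // one CloudSolrClient, created once and reused by all threads
    public static final CloudSolrClient CLIENT =
            new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");

    static {
        CLIENT.setDefaultCollection("collection1");
    }

    private SolrClients() {}
}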


Weird Exception

2015-09-23 Thread Ravi Solr
Recently I installed 5.3.0 and started seeing a weird exception which
baffled me. Has anybody encountered such an issue? The indexing was done
via DIH, and the field that is causing the issue is a TrieDateField
defined as below



Looking at the following exception, it feels like a wrong exception; it
just doesn't jibe well with the field definition

2015-09-24 01:43:33.667 ERROR (qtp1256054824-13) [c:collection1
s:shard1 r:core_node2 x:collection1_shard1_replica4] o.a.s.c.SolrCore
java.lang.IllegalStateException: Type mismatch: pubdatetime was
indexed with multiple values per document, use SORTED_SET instead
at 
org.apache.lucene.uninverting.FieldCacheImpl$SortedDocValuesCache.createValue(FieldCacheImpl.java:679)
at 
org.apache.lucene.uninverting.FieldCacheImpl$Cache.get(FieldCacheImpl.java:190)
at 
org.apache.lucene.uninverting.FieldCacheImpl.getTermsIndex(FieldCacheImpl.java:647)
at 
org.apache.lucene.uninverting.FieldCacheImpl.getTermsIndex(FieldCacheImpl.java:627)
at 
org.apache.lucene.uninverting.UninvertingReader.getSortedDocValues(UninvertingReader.java:257)
at 
org.apache.lucene.index.MultiDocValues.getSortedValues(MultiDocValues.java:316)
at 
org.apache.lucene.index.SlowCompositeReaderWrapper.getSortedDocValues(SlowCompositeReaderWrapper.java:125)
at org.apache.lucene.index.DocValues.getSortedSet(DocValues.java:304)
at 
org.apache.solr.search.function.OrdFieldSource.getValues(OrdFieldSource.java:99)
at 
org.apache.lucene.queries.function.FunctionQuery$AllScorer.<init>(FunctionQuery.java:116)
at 
org.apache.lucene.queries.function.FunctionQuery$FunctionWeight.scorer(FunctionQuery.java:93)
at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:274)
at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135)
at 
org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:256)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:769)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486)
at 
org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:200)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1682)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1501)
at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:555)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:522)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)


bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
I have been trying to re-index the docs (about 1.5 million) as one of the
fields needed part of a string value removed (accidentally introduced). I
was issuing a query for 100 docs, getting 4 fields, and updating the docs
(atomic update with "set") via the CloudSolrClient in batches. However,
from time to time the query returns 0 results, which exits the re-indexing
program.

I can't understand why the cloud returns 0 results when there are 1.4x
million docs which have the "accidental" string in them.

Is there another way to do bulk massive updates?

Thanks

Ravi Kiran Bhaskar


Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
Walter, not in a mood for banter right now... It's 6:00pm on a Friday and
I am stuck here trying to figure out reindexing issues :-)
I don't have the source of the docs, so I have to query Solr, modify, and
put it back, and that is proving to be quite a task in 5.3.0. I did
reindex several times with 4.7.2 in a master-slave env without any issue.
Since then we have moved to cloud and it has been a pain all day.

Thanks

Ravi Kiran Bhaskar

On Fri, Sep 25, 2015 at 5:25 PM, Walter Underwood 
wrote:

> Sure.
>
> 1. Delete all the docs (no commit).
> 2. Add all the docs (no commit).
> 3. Commit.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
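
Walter's three steps, as a SolrJ sketch for completeness (client is a
configured SolrClient, and allDocs stands in for documents rebuilt from the
original source, which is exactly what is missing in this case):

import java.util.Collection;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

public class FullReindex {
    static void reindexAll(SolrClient client, Collection<SolrInputDocument> allDocs)
            throws Exception {
        client.deleteByQuery("*:*"); // 1. delete all the docs (no commit)
        client.add(allDocs);         // 2. add all the docs (no commit)
        client.commit();             // 3. one commit at the end
    }
}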


Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
No problem Walter, it's all fun. Was just wondering if there was some other
good way that I did not know of, that's all 😀

Thanks

Ravi Kiran Bhaskar

On Friday, September 25, 2015, Walter Underwood 
wrote:

> Sorry, I did not mean to be rude. The original question did not say that
> you don’t have the docs outside of Solr. Some people jump to the advanced
> features and miss the simple ones.
>
> It might be faster to fetch all the docs from Solr and save them in files.
> Then modify them. Then reload all of them. No guarantee, but it is worth a
> try.
>
> Good luck.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org 
> http://observer.wunderwood.org/  (my blog)
>
>


Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
Thanks for responding, Erick. I set the "start" to zero and "rows" always
to 100. I create a CloudSolrClient instance and use it to both query as
well as index. But I do sleep for 5 secs just to allow for any auto
commits.

So query --> client.add(100 docs) --> wait --> query again

But the weird thing I noticed was that after 8 or 9 batches, i.e. 800/900
docs, the "query again" returns zero docs, causing my while loop to
exit... so I was trying to see if I was doing the right thing or if there
is an alternate way to do heavy indexing.

Thanks

Ravi Kiran Bhaskar



On Friday, September 25, 2015, Erick Erickson 
wrote:

> How are you querying Solr? You say you query for 100 docs,
> update then get the next set. What are you using for a marker?
> If you're using the start parameter, and somehow a commit is
> creeping in things might be weird, especially if you're using any
> of the internal Lucene doc IDs. If you're absolutely sure no commits
> are taking place even that should be OK.
>
> The "deep paging" stuff could be helpful here, see:
>
> https://lucidworks.com/blog/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
>
> Best,
> Erick
>
>
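
Erick's cursorMark suggestion, as a self-contained sketch against the
SolrJ 5.x API (the ZK hosts are placeholders; the filter, fields, and sort
follow the code posted in the next message):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.SortClause;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorWalk {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
        client.setDefaultCollection("collection1");

        SolrQuery q = new SolrQuery("*:*");
        q.addFilterQuery("uuid:sun.org.mozilla*");
        q.setFields("uniqueId", "uuid");
        q.setRows(1000);
        // cursorMark requires start=0 (the default) and a sort that ends on the uniqueKey
        q.setSort(SortClause.desc("publishtime"));
        q.addSort(SortClause.asc("uniqueId"));

        String cursorMark = CursorMarkParams.CURSOR_MARK_START;
        boolean done = false;
        while (!done) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
            QueryResponse rsp = client.query(q);
            for (SolrDocument doc : rsp.getResults()) {
                // fix and re-add each doc here
            }
            String next = rsp.getNextCursorMark();
            if (cursorMark.equals(next)) {
                done = true; // an unchanged mark means the end of the results
            }
            cursorMark = next;
        }
        client.close();
    }
}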


Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
Thank you for taking the time to help me out. Yes, I was not using
cursorMark; I will try that next. This is what I was doing; it's a bit
shabby coding, but what can I say, my brain was fried :-) FYI this is a
side process just to correct a messed-up string. The actual indexing
process was working all the time, as our business owners are a bit
petulant about stopping indexing. My autocommit conf and code are given
below; as you can see, autocommit should fire every 100 docs anyway.


<autoCommit>
    <maxDocs>100</maxDocs>
    <maxTime>12</maxTime>
</autoCommit>

<autoSoftCommit>
    <maxTime>3</maxTime>
</autoSoftCommit>

private static void processDocs() {

    try {
        CloudSolrClient client = new CloudSolrClient("zk1:,zk2:,zk3.com:");
        client.setDefaultCollection("collection1");

        // First initialize docs
        SolrDocumentList docList = getDocs(client, 100);
        Long count = 0L;

        while (docList != null && docList.size() > 0) {

            List<SolrInputDocument> inList = new ArrayList<SolrInputDocument>();
            for (SolrDocument doc : docList) {

                SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);

                // This is my SOLR's unique id
                String uniqueId = (String) iDoc.getFieldValue("uniqueId");

                /*
                 * This is another system's id which is what I want to correct.
                 * It was messed up because of the script transformer in the
                 * DIH import via SolrEntityProcessor, e.g.
                 * sun.org.mozilla.javascript.internal.NativeString:9cdef726-05dd-40b7-b1b2-c9bbce96741f
                 */
                String uuid = (String) iDoc.getFieldValue("uuid");
                String sanitizedUUID =
                        uuid.replace("sun.org.mozilla.javascript.internal.NativeString:", "");
                Map<String, Object> fieldModifier = new HashMap<String, Object>(1);
                fieldModifier.put("set", sanitizedUUID);
                iDoc.setField("uuid", fieldModifier);

                inList.add(iDoc);
                log.info("added " + uniqueId);
            }

            client.add(inList);

            count = count + docList.size();
            log.info("Indexed " + count + "/" + docList.getNumFound());

            Thread.sleep(5000);

            docList = getDocs(client, docList.size());
            log.info("Got Docs- " + docList.getNumFound());
        }

    } catch (Exception e) {
        log.error("Error indexing ", e);
    }
}

private static SolrDocumentList getDocs(CloudSolrClient client, Integer rows) {

    SolrQuery q = new SolrQuery("*:*");
    q.setSort("publishtime", ORDER.desc);
    q.setStart(0);
    q.setRows(rows);
    q.addFilterQuery(new String[] {"uuid:[* TO *]", "uuid:sun.org.mozilla*"});
    q.setFields(new String[] {"uniqueId", "uuid"});
    SolrDocumentList docList = null;
    QueryResponse resp;
    try {
        resp = client.query(q);
        docList = resp.getResults();
    } catch (Exception e) {
        log.error("Error querying " + q.toString(), e);
    }
    return docList;
}


Thanks

Ravi Kiran Bhaskar

On Fri, Sep 25, 2015 at 10:58 PM, Erick Erickson 
wrote:

> Wait, query again how? You've got to have something that keeps you
> from getting the same 100 docs back so you have to be sorting somehow.
> Or you have a high water mark. Or something. Waiting 5 seconds for any
> commit also doesn't really make sense to me. I mean how do you know
>
> 1> that you're going to get a commit (did you explicitly send one from
> the client?).
> 2> all autowarming will be complete by the time the next query hits?
>
> Let's see the query you fire. There has to be some kind of marker that
> you're using to know when you've gotten through the entire set.
>
> And I would use much larger batches, I usually update in batches of
> 1,000 (excepting if these are very large docs of course). I suspect
> you're spending a lot more time sleeping than you need to. I wouldn't
> sleep at all in fact. This is one (rare) case I might consider
> committing from the client. If you specify the wait for searcher param
> (server.commit(true, true), then it doesn't return until a new
> searcher is completely opened so your previous updates will be
> reflected in your next search.
>
> Actually, what I'd really do is
> 1> turn off all auto commits
> 2> go ahead and query/change/update. But the query bits would be using
> the cursormark.
> 3> do NOT commit
> 4> issue a commit when you were all done.
>
> I bet you'd get through your update a lot faster that

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
            count = count + docList.size();
            log.info("Indexed " + count + "/" + docList.getNumFound());

            if (cursorMark.equals(nextCursorMark)) {
                done = true;
                client.commit(true, true);
            }
            cursorMark = nextCursorMark;
        }

    } catch (Exception e) {
        log.error("Error indexing ", e);
    }
}


Thanks

Ravi Kiran Bhaskar


Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
Erick, I fixed the "missing content stream" issue as well, by making sure
I am not adding an empty list. However, my very first issue of getting
zero docs once in a while is still haunting me, even after using
cursorMark, disabling auto commit and soft commit. I ran the code two
times and you can see the statement below returns zero docs at random
times.

log.info("Indexed " + count + "/" + docList.getNumFound());

-bash-4.1$ tail -f reindexing.log
2015-09-26 01:44:40 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 6500/1440653
2015-09-26 01:44:44 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 7000/1439863
2015-09-26 01:44:48 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 7500/1439410
2015-09-26 01:44:56 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 8000/1438918
2015-09-26 01:45:01 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 8500/1438330
2015-09-26 01:45:01 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 8500/0
2015-09-26 01:45:06 INFO  [a.b.c.AdhocCorrectUUID] - FINISHED !!!

2015-09-26 01:48:15 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 500/1437440
2015-09-26 01:48:19 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 1000/1437440
2015-09-26 01:48:19 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 1000/0
2015-09-26 01:48:22 INFO  [a.b.c.AdhocCorrectUUID] - FINISHED !!!
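
For completeness, the "missing content stream" fix mentioned above is just a
guard so add() is never called with an empty batch - a minimal sketch:

    // Never call add() with an empty batch; that is what triggers
    // "missing content stream" on the server side.
    if (inList != null && !inList.isEmpty()) {
        client.add(inList);
    }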


Thanks

Ravi Kiran Bhaskar

On Sat, Sep 26, 2015 at 1:17 AM, Ravi Solr  wrote:

> Erick, as per your advice I used cursorMarks (see code below). It was
> slightly better, but Solr throws exceptions randomly. Please look at the
> code and stack trace below.
>
> 2015-09-26 01:00:45 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 500/1453133
> 2015-09-26 01:00:49 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 1000/1453133
> 2015-09-26 01:00:54 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 1500/1452592
> 2015-09-26 01:00:58 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 2000/1452095
> 2015-09-26 01:01:03 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 2500/1451675
> 2015-09-26 01:01:10 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 3000/1450924
> 2015-09-26 01:01:15 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 3500/1450445
> 2015-09-26 01:01:19 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 4000/1449997
> 2015-09-26 01:01:24 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 4500/1449692
> 2015-09-26 01:01:28 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 5000/1449201
> 2015-09-26 01:01:28 ERROR [a.b.c.AdhocCorrectUUID] - Error indexing
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> Error from server at http://xx.xx.xx.xx:/solr/collection1: missing
> content stream
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:560)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:234)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:226)
> at
> org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:376)
> at
> org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:328)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1085)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:856)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:799)
> at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:107)
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:72)
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:86)
> at a.b.c.AdhocCorrectUUID.processDocs(AdhocCorrectUUID.java:97)
> at a.b.c.AdhocCorrectUUID.main(AdhocCorrectUUID.java:37)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at com.simontuffs.onejar.Boot.run(Boot.java:306)
> at com.simontuffs.onejar.Boot.main(Boot.java:159)
> 2015-09-26 01:01:28 INFO  [a.b.c.AdhocCorrectUUID] - FINISHED !!!
>
>
> CODE
> 
> protected static void processDocs() {
>
> try {
> CloudSolrClient client = new
> CloudSolrClient("zk1:,zk2:,zk3.com:");
> client.setDefaultCollection("collection1");
>
> boolean done = false;
> String cursorMark = CursorMarkParams.CURSOR_MARK_START;
> Integer count = 0;
>
> while (!done) {
> SolrQuery q = new
> SolrQuery("*:*").setRows(500).addSort("publishtime",
> ORDER.desc).addSor

Re: bulk reindexing 5.3.0 issue

2015-09-26 Thread Ravi Solr
Thank you Erick & Shawn for taking significant time off your weekends to
debug and explain in great detail. I will try to address the main points
from your emails and provide more context for a better understanding of my
situation.

1. Erick, as part of our upgrade from 4.7.2 to 5.3.0 I re-indexed all docs
from my old Master-Slave to my SolrCloud using the DIH SolrEntityProcessor,
which used a script transformer. I unwittingly messed up the script and
hence this 'uuid' (String type field) got messed up. All records prior to
Sep 20 2015 have this issue, which I am currently trying to rectify.

2. Regarding openSearcher=true/false, I had it as false all along in my
4.7.2 config. I read somewhere that SolrCloud or 5.x doesn't honor it or that
it should be left at the default (I don't exactly remember where I read it);
hence I removed it from my solrconfig.xml, going against my intuition :-)

3. Erick, I wasn't getting all 1.4 million in one shot. I was initially using
a 100-doc batch, which I later increased to 500 docs per batch. Also, it
would not be an infinite loop if I commit for each batch, right !!??

4. Shawn, you are correct: the uuid is of String type and it's not the unique
key for my schema. My uniqueKey is uniqueId, and systemid is of no
consequence here; it's another field for differentiating apps within my Solr.

Thank you very much again guys. I will incorporate your suggestions and
report back.

Thanks

Ravi Kiran Bhaskar

On Sat, Sep 26, 2015 at 12:58 PM, Erick Erickson 
wrote:

> Oh, one more thing. _assuming_ you can't change the indexing process
> that gets the docs from the system of record, why not just add an
> update processor that does this at index time? See:
> https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors
> ,
> in particular the StatelessScriptUpdateProcessorFactory might be a
> good candidate. It just takes a bit of javascript (or other scripting
> language) and changes the record before it gets indexed.
>
> FWIW,
> Erick
>
> On Sat, Sep 26, 2015 at 9:52 AM, Shawn Heisey  wrote:
> > On 9/26/2015 10:41 AM, Shawn Heisey wrote:
> >> <autoCommit> <maxTime>30</maxTime> </autoCommit>
> >
> > This needs to include openSearcher=false, as Erick mentioned.  I'm sorry
> > I screwed that up:
> >
> >   <autoCommit>
> >     <maxTime>30</maxTime>
> >     <openSearcher>false</openSearcher>
> >   </autoCommit>
> >
> > Thanks,
> > Shawn
>


Re: bulk reindexing 5.3.0 issue

2015-09-26 Thread Ravi Solr
Erick & Shawn, I incorporated your suggestions.


0. Shut off all other indexing processes.
1. As Shawn mentioned, set batch size to 10000.
2. Loved Erick's suggestion about not using the filter at all: sort by
uniqueId and use the last known uniqueId as the next query's start, while
still using cursor marks, as follows

SolrQuery q = new SolrQuery("+uuid:sun.org.mozilla* +uniqueId:{" +
    markerSysId + " TO *]").setRows(10000).addSort("uniqueId",
    ORDER.asc).setFields(new String[]{"uniqueId","uuid"});
q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);

3. As per Shawn's advice, commented out autocommit and soft commit in
solrconfig.xml, set openSearcher to false, and issued a MANUAL COMMIT for
every batch from code as follows

client.commit(true, true, true);
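
For contrast, a plain cursorMark walk over a constant query would look
roughly like this (a sketch using the same client as before; the uuid-fixing
body is elided, and it still assumes the index is not changing underneath
the cursor):

    String cursorMark = CursorMarkParams.CURSOR_MARK_START;
    boolean done = false;
    while (!done) {
        // Identical query every pass; only the cursorMark changes between requests
        SolrQuery q = new SolrQuery("uuid:sun.org.mozilla*")
                .setRows(10000)
                .addSort("uniqueId", ORDER.asc)   // the sort must include the uniqueKey field
                .setFields("uniqueId", "uuid");
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
        QueryResponse resp = client.query(q);
        // ... sanitize the uuid values from resp.getResults() and client.add(...) them ...
        String next = resp.getNextCursorMark();
        done = cursorMark.equals(next);           // an unchanged mark means no more results
        cursorMark = next;
    }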

Here is the log statement & its results - log.info("Indexed " + count +
"/" + docList.getNumFound());


2015-09-26 17:29:57 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 9/1344085
2015-09-26 17:30:30 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 10/1334085
2015-09-26 17:33:26 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 11/1324085
2015-09-26 17:36:09 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 12/1314085
2015-09-26 17:39:42 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 13/1304085
2015-09-26 17:43:05 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 14/1294085
2015-09-26 17:46:14 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 15/1284085
2015-09-26 17:48:22 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 16/1274085
2015-09-26 17:48:25 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 16/0
2015-09-26 17:48:25 INFO  [a.b.c.AdhocCorrectUUID] - FINISHED !!!

Ran it manually a second time to see if the first was a fluke. Still the same.

2015-09-26 17:55:26 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 1/1264716
2015-09-26 17:58:07 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 2/1254716
2015-09-26 18:03:09 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 3/1244716
2015-09-26 18:06:32 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 4/1234716
2015-09-26 18:10:35 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 5/1224716
2015-09-26 18:15:23 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 6/1214716
2015-09-26 18:15:24 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 6/0
2015-09-26 18:15:26 INFO  [a.b.c.AdhocCorrectUUID] - FINISHED !!!

Now I changed the autocommit in solrconfig.xml as follows. Note the soft
commit has been shut off as per Shawn's advice:

<autoCommit>
   <maxTime>30</maxTime>
   <openSearcher>false</openSearcher>
</autoCommit>

2015-09-26 18:47:44 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
Indexed 1/1205451
2015-09-26 18:50:49 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
Indexed 2/1195451
2015-09-26 18:54:18 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
Indexed 3/1185451
2015-09-26 18:57:04 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
Indexed 4/1175451
2015-09-26 19:00:10 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
Indexed 5/1165451
2015-09-26 19:00:13 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
Indexed 5/0
2015-09-26 19:00:13 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
FINISHED !!!


The query still returned 0 results when there are over a million docs
available which match uuid:sun.org.mozilla* ... Then why do I get 0 ???

Thanks

Ravi Kiran Bhaskar

On Sat, Sep 26, 2015 at 3:49 PM, Ravi Solr  wrote:

> Thank you Erick & Shawn for taking significant time off your weekends to
> debug and explain in great detail. I will try to address the main points
> from your emails to provide more situation context for better understanding
> of my situation
>
> 1. Erick, As part of our upgrade from 4.7.2 to 5.3.0 I re-indexed all docs
> from my old Master-Slave to My SolrCloud using DIH SolrEntityProcessor
> which used a Script Transformer. I unwittingly messed up the script and
> hence this 'uuid' (String Type field) got messed up. All records prior to
> Sep 20 2015 have this issue that I am currently try to rectify.
>
> 2. Regarding openSearcher=true/false, I had it as false all along in my
> 4.7.2 config. I read somewhere that SolrCloud or 5.x doesn't honor it or it
> should be left default (Don't exactly remember where I read it), hence, I
> removed it from my solrconfig.xml going against my intuition :-)
>
> 3. Erick, I wasnt getting all 1.4 mill in one shot. I was initially using
> 100 docs batch, which, I later increased to 500 docs per batch. Also it
> would not be a infinite loop if I commit for each batch, right !!??
>
> 4. Shawn, you are correct the uuid is of String Type and its not unique
> key for my schema. My uniqueKey is uniqueId and systemid is of no
> consequence here, it's another field for differentiating apps within my
> solr.
>
> Than you very much again guys. I will incorporate your suggestions and
> report back.
>
> Thanks
>
> Ravi Kira

Re: bulk reindexing 5.3.0 issue

2015-09-26 Thread Ravi Solr
Erick... There is only one type of string,
"sun.org.mozilla.javascript.internal.NativeString:", and no other variations
of it in my index, so there is no question of missing it. Point taken regarding
the CURSORMARK stuff; yes, you are correct, my head is so numb at this point
after working 3 days on this that I wasn't thinking straight.

BTW I found the real issue. I have a total of 8 servers in the Solr cloud.
The leader for this specific collection was the one that was returning 0
for the searches. All of the other 7 servers had roughly 800K docs still
needing the string replacement. So maybe the real issue is sync among servers.
Just to prove it to myself, I shut down the Solr that was giving zero results
(i.e. all uuid strings had already somehow been stripped of the spurious
sun.org.mozilla.javascript.internal.NativeString prefix on that server). Now it
ran perfectly fine and is about to finish; only the last 103K were left when
I was writing this email.

So the real question is how we can ensure that the sync is always
maintained, and what to do if it ever goes out of sync. I did see some Jira
tickets from previous 4.10.x versions where sync was an issue. Can you
please point me to any doc which explains how SolrCloud syncs/replicates ?

Thanks,

Ravi Kiran Bhaskar

On Sat, Sep 26, 2015 at 11:00 PM, Erick Erickson 
wrote:

> bq: 3. Erick, I wasnt getting all 1.4 mill in one shot. I was initially
> using
> 100 docs batch, which, I later increased to 500 docs per batch. Also it
> would not be a infinite loop if I commit for each batch, right !!??
>
> That's not the point at all. Look at the basic logic here:
>
> You run for a while processing 100 (or 500 or 1,000) docs per batch
> and change all uuid fields with this statement:
>
> uuid.replace("sun.org.mozilla.javascript.internal.NativeString:", "");
>
> and then update the doc. You run this as long as you have any docs
> that satisfy the query "q=uuid:sun.org.mozilla*", _changing_
> every one that has this string!
>
> At that point, theoretically, no document in your index has this string. So
> running your update program immediately after should find _zero_ documents.
>
> I've been assuming your complaint is that you don't process 1.4 M docs (in
> batches), you process some lower number then exit and you think this is
> wrong.
> I'm claiming that you should only expect to find as many docs as have been
> indexed since the last time the program ran.
>
> As far as the infinite loop is concerned, again trace the logic in the old
> code.
> Forget about commits and all the mechanics, just look at the logic.
> You're querying on "sun.org.mozilla*". But you only change if you get a
> match on
> "sun.org.mozilla.javascript.internal.NativeString:"
>
> Now imagine you have a doc that has sun.org.mozilla.erick in it. That doc
> gets
> returned from the query but does _not_ get modified because it doesn't
> match your pattern. In the older code, it would be found again and
> returned next
> time you queried. Then not modified again. Eventually you'd be in a
> position
> where you never changed any docs, just kept getting the same docList back
> over and over again. Marching through based on the unique key should not
> have the same potential issue.
>
> You should not be mixing the new query stuff with CURSORMARK. Deep paging
> supposes the exact same query is being run over and over and you're
> _paging_
> through the results. You're changing the query every time so the results
> aren't
> very predictable.
>
> Best,
> Erick
>
>
> On Sat, Sep 26, 2015 at 5:01 PM, Ravi Solr  wrote:
> > Erick & Shawn, I incorporated your suggestions.
> >
> >
> > 0. Shut off all other indexing processes.
> > 1. As Shawn mentioned, set batch size to 10000.
> > 2. Loved Erick's suggestion about not using the filter at all: sort by
> > uniqueId and use the last known uniqueId as the next query's start, while
> > still using cursor marks as follows
> >
> > SolrQuery q = new SolrQuery("+uuid:sun.org.mozilla* +uniqueId:{" +
> > markerSysId + " TO
> > *]").setRows(10000).addSort("uniqueId",ORDER.asc).setFields(new
> > String[]{"uniqueId","uuid"});
> > q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
> >
> > 3. As per Shawn's advise commented autocommit and soft commit in
> > solrconfig.xml and set openSearcher to false and issued MANUAL COMMIT for
> > every batch from code as follows
> >
> > client.commit(true, true, true);
> >
> > Here is what the log statement & results - log.info("Indexed " + count +
> > "/" + docL

Re: bulk reindexing 5.3.0 issue

2015-09-28 Thread Ravi Solr
Gili I was constantly checking the cloud admin UI and it always stayed
Green, that is why I initially overlooked sync issues...finally when all
options dried out I went individually to each node and quieried and that is
when i found the out of sync issue. The way I resolved my issue was shut
down the leader that was not synching properly and let another node become
the leader, then reindex all docs. Once the reindexing is done I started
the node that was causing the issue and it synched properly :-)
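
For anyone wanting to repeat that per-node check, here is a sketch that
queries each core directly with distrib=false and compares numFound (the
hostnames are hypothetical):

    static void compareReplicaCounts() throws Exception {
        String[] cores = {"http://node1:8983/solr/collection1",
                          "http://node2:8983/solr/collection1"};
        for (String base : cores) {
            try (HttpSolrClient node = new HttpSolrClient(base)) {
                SolrQuery q = new SolrQuery("uuid:sun.org.mozilla*").setRows(0);
                q.set("distrib", "false");   // count only this core, bypassing distributed search
                System.out.println(base + " -> " + node.query(q).getResults().getNumFound());
            }
        }
    }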

Thanks

Ravi Kiran Bhaskar



On Mon, Sep 28, 2015 at 10:26 AM, Gili Nachum  wrote:

> Were all of shard replica in active state (green color in admin ui) before
> starting?
> Sounds like it otherwise you won't hit the replica that is out of sync.
>
> Replicas can get out of sync, and report being in sync after a sequence of
> stop start w/o a chance to complete sync.
> See if it might have happened to you:
>
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201412.mbox/%3CCAOOKt53XTU_e0m2ioJ-S4SfsAp8JC6m-=nybbd4g_mjh60b...@mail.gmail.com%3E
> On Sep 27, 2015 06:56, "Ravi Solr"  wrote:
>
> > Erick...There is only one type of String
> > "sun.org.mozilla.javascript.internal.NativeString:" and no other
> variations
> > of that in my index, so no question of missing it. Point taken regarding
> > the CURSORMARK stuff, yes you are correct, my head so numb at this point
> > after working 3 days on this, I wasnt thinking straight.
> >
> > BTW I found the real issue, I have a total of 8 servers in the solr
> cloud.
> > The leader for this specific collection was the one that was returning 0
> > for the searches. All other 7 servers had roughly 800K docs still needing
> > the string replacement. So maybe the real issue is sync among servers.
> Just
> > to prove to myself I shutdown the solr  that was giving zero results
> (i.e.
> > all uuid strings have already been somehow devoid of spurious
> > sun.org.mozilla.javascript.internal.NativeString on that server). Now it
> > ran perfectly fine and is about to finish as last 103K are still left
> when
> > I was writing this email.
> >
> > So the real question is how can we ensure that the Sync is always
> > maintained and what to do if it ever goes out of Sync, I did see some
> Jira
> > tickets from previous 4.10.x versions where Sync was an issue. Can you
> > please point me to any doc which says how SolrCloud synchs/replicates ?
> >
> > Thanks,
> >
> > Ravi Kiran Bhaskar
> >
> > Thanks
> >
> > Rvai Kiran Bhaskar
> >
> > On Sat, Sep 26, 2015 at 11:00 PM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> > > bq: 3. Erick, I wasnt getting all 1.4 mill in one shot. I was initially
> > > using
> > > 100 docs batch, which, I later increased to 500 docs per batch. Also it
> > > would not be a infinite loop if I commit for each batch, right !!??
> > >
> > > That's not the point at all. Look at the basic logic here:
> > >
> > > You run for a while processing 100 (or 500 or 1,000) docs per batch
> > > and change all uuid fields with this statement:
> > >
> > > uuid.replace("sun.org.mozilla.javascript.internal.NativeString:", "");
> > >
> > > and then update the doc. You run this as long as you have any docs
> > > that satisfy the query "q=uuid:sun.org.mozilla*", _changing_
> > > every one that has this string!
> > >
> > > At that point, theoretically, no document in your index has this
> string.
> > So
> > > running your update program immediately after should find _zero_
> > documents.
> > >
> > > I've been assuming your complaint is that you don't process 1.4 M docs
> > (in
> > > batches), you process some lower number then exit and you think this is
> > > wrong.
> > > I'm claiming that you should only expect to find as many docs as have
> > been
> > > indexed since the last time the program ran.
> > >
> > > As far as the infinite loop is concerned, again trace the logic in the
> > old
> > > code.
> > > Forget about commits and all the mechanics, just look at the logic.
> > > You're querying on "sun.org.mozilla*". But you only change if you get a
> > > match on
> > > "sun.org.mozilla.javascript.internal.NativeString:"
> > >
> > > Now imagine you have a doc that has sun.org.mozilla.erick in it. That
> doc
> > > gets
> > > returned from the query but does _not_ get 

Solr 4.7.2 Vs 5.3.0 Docs different for same query

2015-10-01 Thread Ravi Solr
We migrated from 4.7.2 to 5.3.0. I sourced the docs from the 4.7.2 core and
indexed them into the 5.3.0 collection (data directories are different) via
SolrEntityProcessor. Currently my production is all whack because of this
issue. Do I have to go back and reindex all again ?? Is there a quick fix
for this ?

Here are the results for the query 'obama'... please note the numFound.
4.7.2 has almost 148519 docs, while 5.3.0 reports far fewer docs for the
same query. Any pointers on how to correct this ?


Solr 4.7.2

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="q">obama</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="148519" start="0"/>
</response>

SolrCloud 5.3.0

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="q">obama</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="..." start="0"/>
</response>


Thanks

Ravi Kiran Bhaskar


Re: Solr 4.7.2 Vs 5.3.0 Docs different for same query

2015-10-02 Thread Ravi Solr
Mr. Uchida,
Thank you for responding. It was my fault, I had a update processor
which takes specific text and string fields and concatenates them into a
single field, and I search on that single field. Recently I used Atomic
update to fix a specific field's value and forgot to disable the
UpdateProcessor chain...Since I was only updating one field the aggregate
field got messed up with just that field value and hence I had issues
searching. I reindexed the data again yesterday night and now it is all
good.

I do have a small question, when we update the zookeeper ensemble with new
configs via 'upconfig' and 'linkconfig' commands do we have to "reload" the
collections on all the nodes to see the updated config ?? Is there a single
call which can update all nodes connected to the ensemble ?? I just went to
the admin UI and hit "Reload" button manually on each of the node...Is that
the correct way to do it ?

Thanks

Ravi Kiran Bhaskar

On Fri, Oct 2, 2015 at 12:04 AM, Tomoko Uchida  wrote:

> Are you sure that you've indexed same data to Solr 4.7.2 and 5.3.0 ?
> If so, I suspect that you have multiple shards and request to one shard.
> (In that case, you might get partial results)
>
> Can you share HTTP request url and the schema and default search field ?
>
>
> 2015-10-02 6:09 GMT+09:00 Ravi Solr :
>
> > I we migrated from 4.7.2 to 5.3.0. I sourced the docs from 4.7.2 core and
> > indexed into 5.3.0 collection (data directories are different) via
> > SolrEntityProcessor. Currently my production is all whack because of this
> > issue. Do I have to go back and reindex all again ?? Is there a quick fix
> > for this ?
> >
> > Here are the results for the query 'obama'...please note the numfound.
> > 4.7.2 has almost 148519 docs while 5.3.0 says it only has 5.3.0 docs. Any
> > pointers on how to correct this ?
> >
> >
> > Solr 4.7.2
> >
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">2</int>
> >     <lst name="params">
> >       <str name="q">obama</str>
> >       <str name="rows">0</str>
> >     </lst>
> >   </lst>
> >   <result name="response" numFound="148519" start="0"/>
> > </response>
> >
> > SolrCloud 5.3.0
> >
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">2</int>
> >     <lst name="params">
> >       <str name="q">obama</str>
> >       <str name="rows">0</str>
> >     </lst>
> >   </lst>
> >   <result name="response" numFound="..." start="0"/>
> > </response>
> >
> >
> > Thanks
> >
> > Ravi Kiran Bhaskar
> >
>


Re: Reverse query?

2015-10-02 Thread Ravi Solr
Hello Remi,
Iam assuming the field where you store the data is analyzed.
The field definition might help us answer your question better. If you are
using edismax handler for your search requests, I believe you can achieve
you goal by setting set your "mm" to 100%, phrase slop "ps" and query slop
"qs" parameters to zero. I think that will force exact matches.

Thanks

Ravi Kiran Bhaskar

On Fri, Oct 2, 2015 at 9:48 AM, Andrea Roggerone <
andrearoggerone.o...@gmail.com> wrote:

> Hi Remy,
> The question is not really clear, could you explain a little bit better
> what you need? Reading your email I understand that you want to get
> documents containing all the search terms typed. For instance if you search
> for "Mad Max", you wanna get documents containing both Mad and Max. If
> that's your need, you can use a phrase query like:
>
> *"*Mad Max*"~2*
>
> where enclosing your keywords between double quotes means that you want to
> get both Mad and Max and the optional parameter ~2 is an example of *slop*.
> If you need more info you can look for *Phrase Query* in
> https://wiki.apache.org/solr/SolrRelevancyFAQ
>
> On Fri, Oct 2, 2015 at 2:33 PM, remi tassing 
> wrote:
>
> > Hi,
> > I have medium-low experience on Solr and I have a question I couldn't
> quite
> > solve yet.
> >
> > Typically we have quite short query strings (a couple of words) and the
> > search is done through a set of bigger documents. What if the logic is
> > turned a little bit around. I have a document and I need to find out what
> > strings appear in the document. A string here could be a person name
> > (including space for example) or a location...which are indexed in Solr.
> >
> > A concrete example, we take this text from wikipedia (Mad Max):
> > "*Mad Max is a 1979 Australian dystopian action film directed by George
> > Miller .
> > Written by Miller and James McCausland from a story by Miller and
> producer
> > Byron Kennedy , it tells a
> > story of societal breakdown
> > , murder, and vengeance
> > . The film, starring the
> > then-little-known Mel Gibson ,
> > was released internationally in 1980. It became a top-grossing Australian
> > film, while holding the record in the Guinness Book of Records
> >  for decades as
> > the
> > most profitable film ever created,[1]
> >  and
> > has
> > been credited for further opening the global market to Australian New
> Wave
> >  films.*
> > 
> > "
> >
> > I would like it to match "Mad Max" but not "Mad" or "Max" seperately, and
> > "George Miller", "global market" ...
> >
> > I've tried the keywordTokenizer but it didn't work. I suppose it's ok for
> > the index time but not query time (in this specific case)
> >
> > I had a look at Luwak but it's not what I'm looking for (
> >
> >
> http://www.flax.co.uk/blog/2013/12/06/introducing-luwak-a-library-for-high-performance-stored-queries/
> > )
> >
> > The typical name search doesn't seem to work either,
> > https://dzone.com/articles/tips-name-search-solr
> >
> > I was thinking this problem must have already be solved...or?
> >
> > Remi
> >
>


Re: Zk and Solr Cloud

2015-10-02 Thread Ravi Solr
Awesome nugget Shawn, I also faced a similar issue a while ago while I was
doing a full re-index. It would be great if such tips were added into
FAQ-type documentation on cwiki. I love the SOLR forum; every day I learn
something new :-)
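
For reference, the jute.maxbuffer workaround has to be passed as the same JVM
system property to every ZooKeeper server and every Solr node at startup,
e.g. -Djute.maxbuffer=15000000 (the size here is only an illustration; it
just needs to exceed the oversized znode).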

Thanks

Ravi Kiran Bhaskar

On Fri, Oct 2, 2015 at 1:58 AM, Shawn Heisey  wrote:

> On 10/1/2015 1:26 PM, Rallavagu wrote:
> > Solr 4.6.1 single shard with 4 nodes. Zookeeper 3.4.5 ensemble of 3.
> >
> > See following errors in ZK and Solr and they are connected.
> >
> > When I see the following error in Zookeeper,
> >
> > unexpected error, closing socket connection and attempting reconnect
> > java.io.IOException: Packet len11823809 is out of range!
>
> This is usually caused by the overseer queue (stored in zookeeper)
> becoming extraordinarily huge, because it's being flooded with work
> entries far faster than the overseer can process them.  This causes the
> znode where the queue is stored to become larger than the maximum size
> for a znode, which defaults to about 1MB.  In this case (reading your
> log message that says len11823809), something in zookeeper has gotten to
> be 11MB in size, so the zookeeper client cannot read it.
>
> I think the zookeeper server code must be handling the addition of
> children to the queue znode through a code path that doesn't pay
> attention to the maximum buffer size, just goes ahead and adds it,
> probably by simply appending data.  I'm unfamiliar with how the ZK
> database works, so I'm guessing here.
>
> If I'm right about where the problem is, there are two workarounds to
> your immediate issue.
>
> 1) Delete all the entries in your overseer queue using a zookeeper
> client that lets you edit the DB directly.  If you haven't changed the
> cloud structure and all your servers are working, this should be safe.
>
> 2) Set the jute.maxbuffer system property on the startup commandline for
> all ZK servers and all ZK clients (Solr instances) to a size that's
> large enough to accommodate the huge znode.  In order to do the deletion
> mentioned in option 1 above,you might need to increase jute.maxbuffer on
> the servers and the client you use for the deletion.
>
> These are just workarounds.  Whatever caused the huge queue in the first
> place must be addressed.  It is frequently a performance issue.  If you
> go to the following link, you will see that jute.maxbuffer is considered
> an unsafe option:
>
> http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#Unsafe+Options
>
> In Jira issue SOLR-7191, I wrote the following in one of my comments:
>
> "The giant queue I encountered was about 85 entries, and resulted in
> a packet length of a little over 14 megabytes. If I divide 85 by 14,
> I know that I can have about 6 overseer queue entries in one znode
> before jute.maxbuffer needs to be increased."
>
> https://issues.apache.org/jira/browse/SOLR-7191?focusedCommentId=14347834
>
> Thanks,
> Shawn
>
>


Re: Solr 4.7.2 Vs 5.3.0 Docs different for same query

2015-10-02 Thread Ravi Solr
Thank you very much Erick and Uchida. I will take a look at the URL you gave,
Erick.

Thanks

Ravi Kiran Bhaskar

On Fri, Oct 2, 2015 at 12:41 PM, Tomoko Uchida  wrote:

> Hi Ravi,
>
> And for minor additional information,
> you may want to look through Collections API reference guide to handle
> collections properly in SolrCloud environment. (I bookmark this page.)
> https://cwiki.apache.org/confluence/display/solr/Collections+API
> <https://cwiki.apache.org/confluence/display/solr/Collections+API>
>
> Regards,
> Tomoko
>
> 2015-10-03 1:15 GMT+09:00 Erick Erickson :
>
> > do we have to "reload" the collections on all the nodes to see the
> > updated config ??
> > YES
> >
> > Is there a single call which can update all nodes connected to the
> > ensemble ??
> >
> > NO. I'll be a little pedantic here. When you say "ensemble", I'm not
> quite
> > sure
> > what that means and am interpreting it as "all collections registered
> with
> > ZK".
> > But see below.
> >
> > I just went to the admin UI and hit "Reload" button manually on each
> > of the node...Is that
> > the correct way to do it ?
> >
> > NO. The admin UI, "core admin" is a remnant from the old days (like
> > 3.x) where there was
> > no concept of distributed collection as a distinct entity, you had to
> > do all the things you now
> > do automatically in SolrCloud "by hand". PLEASE DO NOT USE THIS
> > EXCEPT TO VIEW A REPLICA WHEN USING SOLRCLOUD! In particular, don't try
> to
> > take any action that manipulates the core (reload, add, unload and the
> > like).
> > It'll work, but you have to know _exactly_ what you are doing. Go
> > ahead and use it for
> > viewing the current state of a replica/core, but unless you need to do
> > something that
> > you cannot do with the Collections API it's very easy to go astray.
> >
> >
> > Instead, use the "collections API". In this case, there's a call like
> >
> >
> >
> http://localhost:8983/solr/admin/collections?action=RELOAD&name=CollectionName
> >
> > that will cause all the replicas associated with the collection to be
> > reloaded. Given you
> > mentioned linkconfig, I'm guessing that you have more than one
> > collection looking at a
> > particular configset, so the pedantic bit is you'd have to issue the
> > above for each
> > collection that references that configset.
> >
> > Best,
> > Erick
> >
> > P.S. Two bits:
> > 1> actually the collections API uses the core admin calls to
> > accomplish its tasks, but
> > lots of effort went in to doing exactly the right thing
> > 2> Upayavira has been creating an updated admin UI that will treat
> > collections as
> > first-class citizens (a work in progress). You can access it in 5.x by
> > hitting
> >
> > solr_host:solr_port/solr/index.html
> >
> > Give it a whirl if you can and please provide any feedback you can, it'd
> > be much
> > appreciated.
> >
> > On Fri, Oct 2, 2015 at 7:47 AM, Ravi Solr  wrote:
> > > Mr. Uchida,
> > > Thank you for responding. It was my fault, I had a update
> > processor
> > > which takes specific text and string fields and concatenates them into
> a
> > > single field, and I search on that single field. Recently I used Atomic
> > > update to fix a specific field's value and forgot to disable the
> > > UpdateProcessor chain...Since I was only updating one field the
> aggregate
> > > field got messed up with just that field value and hence I had issues
> > > searching. I reindexed the data again yesterday night and now it is all
> > > good.
> > >
> > > I do have a small question, when we update the zookeeper ensemble with
> > new
> > > configs via 'upconfig' and 'linkconfig' commands do we have to "reload"
> > the
> > > collections on all the nodes to see the updated config ?? Is there a
> > single
> > > call which can update all nodes connected to the ensemble ?? I just
> went
> > to
> > > the admin UI and hit "Reload" button manually on each of the node...Is
> > that
> > > the correct way to do it ?
> > >
> > > Thanks
> > >
> > > Ravi Kiran Bhaskar
> > >
> > > On Fri, Oct 2, 2015 at 12:04 AM, Tomoko Uchida <
> > tomoko.uchida.1...@gmail.com
> > >> wrote:
> &

solr 5.x on glassfish/tomcat instead of jetty

2015-05-20 Thread Ravi Solr
I have read that Solr 5.x has moved away from a deployable WAR architecture
to a runnable Java application architecture. Our infrastructure/standards
folks are adamant about not running SOLR on Jetty (as we are about to
upgrade from 4.7.2 to 5.1). Any ideas on how I can make it run on Glassfish,
or at least on Tomcat ?? And do I have to watch for any gotchas regarding
the different containers or the upgrade itself ? Would love to hear from
people who have already trodden down that path.


Thanks

Ravi Kiran Bhaskar


Re: solr 5.x on glassfish/tomcat instead of jetty

2015-05-20 Thread Ravi Solr
Shawn, I agree with you, but some of the decisions in the corporate world
are handed down from higher powers/pay grades, who do not always like to
hear counter-arguments. For example, this is the same reason why
govt/federal rules restrict tech folks to only using certified DBs/App
Servers like Oracle, WSAD etc. (Not to say that govt teams are not using
SOLR; I know the Library of Congress etc. use it.) Sometimes the decision is
above my pay grade, more so when the firm is not a core technology firm. I
would rather find a way than be labeled an anarchist; after all, anything is
possible with software, right !!?? ;-)

Hope you have already viewed "The Expert" video on YouTube :-)

Thanks

Ravi Kiran Bhaskar

On Wed, May 20, 2015 at 11:21 AM, Shawn Heisey  wrote:

> On 5/20/2015 9:07 AM, Ravi Solr wrote:
> > I have read that solr 5.x has moved away from deployable WAR architecture
> > to a runnable Java Application architecture. Our infrastructure/standards
> > folks are adamant about not running SOLR on Jetty (as we are about to
> > upgrade from 4.7.2 to 5.1), any ideas on how I can make it run on
> Glassfish
> > or at least on Tomcat ?? And do I have to watch for any gotchas regarding
> > the different containers or the upgrade itself ? Would love to hear from
> > people who have already treaded down that path.
>
> I really need to finish the wiki page on this topic.
>
> As of right now, there is still a .war file.  Look in the server/webapps
> directory for the .war, server/lib/ext for logging jars, and
> server/resources for the logging configuration.  Consult your
> container's documentation to learn where to place these things.
>
> At some point in the future, such deployments will no longer be
> possible, which is why the docs say you can't do it, even though you
> can.  The project is preparing users for the eventual reality with a
> documentation change.
>
> I'm wondering ... if Jetty is good enough for the Google App Engine, why
> isn't it good enough for your infrastructure standards?  It is the only
> container that gets testing ... I assure you that there are no tests in
> the Solr source code that make sure Glassfish works.
>
> Thanks,
> Shawn
>
>


6.4.0 collection leader election and recovery issues

2017-02-01 Thread Ravi Solr
Hello,
             Yesterday I upgraded from 6.0.1 to 6.4.0; it's been a straight
12-hour debugging spree!! Can somebody kindly help me out of this misery.

I have a set of 8 single-shard collections with 3 replicas. As soon as I
updated the configs and started the servers, one of my collections got stuck
with no leader. I have restarted Solr to no avail; I also tried to force a
leader via the collections API, but that didn't work either. I also see that,
from time to time, multiple Solr nodes go down all at the same time, and only
a restart resolves the issue.
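
(For reference, the force-leader attempt used the documented FORCELEADER
action, which takes the form
http://host:port/solr/admin/collections?action=FORCELEADER&collection=clicktrack&shard=shard1
- host and port illustrative.)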

The error snippets are shown below

2017-02-02 01:43:42.785 ERROR
(recoveryExecutor-3-thread-6-processing-n:10.128.159.245:9001_solr
x:clicktrack_shard1_replica1 s:shard1 c:clicktrack r:core_node1)
[c:clicktrack s:shard1 r:core_node1 x:clicktrack_shard1_replica1]
o.a.s.c.RecoveryStrategy Error while trying to recover.
core=clicktrack_shard1_replica1:org.apache.solr.common.SolrException: No
registered leader was found after waiting for 4000ms , collection:
clicktrack slice: shard1

solr.log.9:2017-02-02 01:43:41.336 INFO
(zkCallback-4-thread-29-processing-n:10.128.159.245:9001_solr) [   ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged
path:/collections/clicktrack/state.json] for collection [clicktrack] has
occurred - updating... (live nodes size: [1])
solr.log.9:2017-02-02 01:43:42.224 INFO
(zkCallback-4-thread-29-processing-n:10.128.159.245:9001_solr) [   ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged
path:/collections/clicktrack/state.json] for collection [clicktrack] has
occurred - updating... (live nodes size: [1])
solr.log.9:2017-02-02 01:43:43.767 INFO
(zkCallback-4-thread-23-processing-n:10.128.159.245:9001_solr) [   ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged
path:/collections/clicktrack/state.json] for collection [clicktrack] has
occurred - updating... (live nodes size: [1])


Suspecting the worst, I backed up the index, renamed the collection's
data folder, and restarted the servers; this time the collection got a
proper leader. So is my index really corrupted ? The Solr UI showed live
nodes just like the logs, but without any leader. Even with the leader issue
somewhat alleviated after renaming the data folder and letting Solr create
a new data folder, my servers did go down a couple of times.

I am not all that well versed with ZooKeeper... any trick to make ZooKeeper
pick a leader and be happy ? Did anybody have Solr/ZooKeeper issues with
6.4.0 ?

Thanks

Ravi Kiran Bhaskar


Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
Following up on my previous email, the intermittent server unavailability
seems to be linked to the interaction between Solr and Zookeeper. Can
somebody help me understand what this error means and how to recover from
it.

2017-02-02 09:44:24.648 ERROR
(recoveryExecutor-3-thread-16-processing-n:xx.xxx.xxx.xxx:1234_solr
x:clicktrack_shard1_replica4 s:shard1 c:clicktrack r:core_node3)
[c:clicktrack s:shard1 r:core_node3 x:clicktrack_shard1_replica4]
o.a.s.c.RecoveryStrategy Error while trying to recover.
core=clicktrack_shard1_replica4:org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /overseer/queue/qn-
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at
org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:391)
at
org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:388)
at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
at
org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:388)
at
org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:244)
at org.apache.solr.cloud.ZkController.publish(ZkController.java:1215)
at org.apache.solr.cloud.ZkController.publish(ZkController.java:1128)
at org.apache.solr.cloud.ZkController.publish(ZkController.java:1124)
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:334)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Thanks

Ravi Kiran Bhaskar

On Thu, Feb 2, 2017 at 2:27 AM, Ravi Solr  wrote:

> Hello,
>  Yesterday I upgraded from 6.0.1 to 6.4.0, its been straight 12
> hours of debugging spree!! Can somebody kindly help me  out of this misery.
>
> I have a set has 8 single shard collections with 3 replicas. As soon as I
> updated the configs and started the servers one of my collection got stuck
> with no leader. I have restarted solr to no avail, I also tried to force a
> leader via collections API that dint work either. I also see that, from
> time to time multiple solr nodes go down all at the same time, only a
> restart resolves the issue.
>
> The error snippets are shown below
>
> 2017-02-02 01:43:42.785 ERROR (recoveryExecutor-3-thread-6-processing-n:
> 10.128.159.245:9001_solr x:clicktrack_shard1_replica1 s:shard1
> c:clicktrack r:core_node1) [c:clicktrack s:shard1 r:core_node1
> x:clicktrack_shard1_replica1] o.a.s.c.RecoveryStrategy Error while trying
> to recover. 
> core=clicktrack_shard1_replica1:org.apache.solr.common.SolrException:
> No registered leader was found after waiting for 4000ms , collection:
> clicktrack slice: shard1
>
> solr.log.9:2017-02-02 01:43:41.336 INFO  (zkCallback-4-thread-29-
> processing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A
> cluster state change: [WatchedEvent state:SyncConnected
> type:NodeDataChanged path:/collections/clicktrack/state.json] for
> collection [clicktrack] has occurred - updating... (live nodes size: [1])
> solr.log.9:2017-02-02 01:43:42.224 INFO  (zkCallback-4-thread-29-
> processing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A
> cluster state change: [WatchedEvent state:SyncConnected
> type:NodeDataChanged path:/collections/clicktrack/state.json] for
> collection [clicktrack] has occurred - updating... (live nodes size: [1])
> solr.log.9:2017-02-02 01:43:43.767 INFO  (zkCallback-4-thread-23-
> processing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A
> cluster state change: [WatchedEvent state:SyncConnected
> type:NodeDataChanged path:/collections/clicktrack/state.json] for
> collection [clicktrack] has occurred - updating... (live nodes size: [1])
>
>
> Suspecting the worst I backed up the index and renamed the collection's
> data folder and restarted the servers, this time the collection got a
> proper leader. So is my index really corrupted ? Solr UI showed live nodes
> just like the logs but without any leader. Even with the leader issue
> somewhat alleviated after renaming the data folder and letting silr create
> a new data folder my 

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
When I try to roll back from 6.4.0 to my original version of 6.0.1 it now
throws another issue. Now I can't go to 6.4.0, nor can I roll back to 6.0.1:

Could not load codec 'Lucene62'.  Did you forget to add
lucene-backward-codecs.jar?
at org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.java:429)
at
org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:349)
at
org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:284)

Hope this doesn't cost me dearly. Any ideas, at least, on how to roll back
safely ?
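
For what it's worth, Lucene's CheckIndex tool reports which codec wrote each
segment, along these lines (jar version and index path are illustrative):
java -cp lucene-core-6.4.0.jar org.apache.lucene.index.CheckIndex
/var/solr/data/clicktrack_shard1_replica1/data/index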

Thanks

Ravi Kiran Bhaskar

On Thu, Feb 2, 2017 at 4:52 AM, Ravi Solr  wrote:

> Following up on my previous email, the intermittent server unavailability
> seems to be linked to the interaction between Solr and Zookeeper. Can
> somebody help me understand what this error means and how to recover from
> it.
>
> 2017-02-02 09:44:24.648 ERROR (recoveryExecutor-3-thread-16-
> processing-n:xx.xxx.xxx.xxx:1234_solr x:clicktrack_shard1_replica4
> s:shard1 c:clicktrack r:core_node3) [c:clicktrack s:shard1 r:core_node3
> x:clicktrack_shard1_replica4] o.a.s.c.RecoveryStrategy Error while trying
> to recover. core=clicktrack_shard1_replica4:org.apache.zookeeper.
> KeeperException$SessionExpiredException: KeeperErrorCode = Session
> expired for /overseer/queue/qn-
> at org.apache.zookeeper.KeeperException.create(
> KeeperException.java:127)
> at org.apache.zookeeper.KeeperException.create(
> KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> at org.apache.solr.common.cloud.SolrZkClient$9.execute(
> SolrZkClient.java:391)
> at org.apache.solr.common.cloud.SolrZkClient$9.execute(
> SolrZkClient.java:388)
> at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(
> ZkCmdExecutor.java:60)
> at org.apache.solr.common.cloud.SolrZkClient.create(
> SolrZkClient.java:388)
> at org.apache.solr.cloud.DistributedQueue.offer(
> DistributedQueue.java:244)
> at org.apache.solr.cloud.ZkController.publish(ZkController.java:1215)
> at org.apache.solr.cloud.ZkController.publish(ZkController.java:1128)
> at org.apache.solr.cloud.ZkController.publish(ZkController.java:1124)
> at org.apache.solr.cloud.RecoveryStrategy.doRecovery(
> RecoveryStrategy.java:334)
> at org.apache.solr.cloud.RecoveryStrategy.run(
> RecoveryStrategy.java:222)
> at com.codahale.metrics.InstrumentedExecutorService$
> InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
> at java.util.concurrent.Executors$RunnableAdapter.
> call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at org.apache.solr.common.util.ExecutorUtil$
> MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> Thanks
>
> Ravi Kiran Bhaskar
>
> On Thu, Feb 2, 2017 at 2:27 AM, Ravi Solr  wrote:
>
>> Hello,
>>  Yesterday I upgraded from 6.0.1 to 6.4.0, its been straight 12
>> hours of debugging spree!! Can somebody kindly help me  out of this misery.
>>
>> I have a set has 8 single shard collections with 3 replicas. As soon as I
>> updated the configs and started the servers one of my collection got stuck
>> with no leader. I have restarted solr to no avail, I also tried to force a
>> leader via collections API that dint work either. I also see that, from
>> time to time multiple solr nodes go down all at the same time, only a
>> restart resolves the issue.
>>
>> The error snippets are shown below
>>
>> 2017-02-02 01:43:42.785 ERROR (recoveryExecutor-3-thread-6-processing-n:
>> 10.128.159.245:9001_solr x:clicktrack_shard1_replica1 s:shard1
>> c:clicktrack r:core_node1) [c:clicktrack s:shard1 r:core_node1
>> x:clicktrack_shard1_replica1] o.a.s.c.RecoveryStrategy Error while trying
>> to recover. 
>> core=clicktrack_shard1_replica1:org.apache.solr.common.SolrException:
>> No registered leader was found after waiting for 4000ms , collection:
>> clicktrack slice: shard1
>>
>> solr.log.9:2017-02-02 01:43:41.336 INFO  (zkCallback-4-thread-29-proces
>> sing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A cluster
>> state change: [WatchedEvent state:SyncConnected type:NodeDataChanged
>> path:/collections/clicktrack/state.json] for collection [clicktrack] has
>> occurred - updating... (live nodes size: [1])
>> solr.log.9:2017-02-02 01:43:42.224 INFO  (zkCallback-4-thread-29-proces
>> sing-n:10.128.159.245:9001_solr) [  

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
Thanks Hendrik. I am baffled as to why I did not hit this issue prior to
moving to 6.4.0.

On Thu, Feb 2, 2017 at 7:58 AM, Hendrik Haddorp 
wrote:

> Might be that your overseer queue overloaded. Similar to what is described
> here:
> https://support.lucidworks.com/hc/en-us/articles/203959903-
> Bringing-up-downed-Solr-servers-that-don-t-want-to-come-up
>
> If the overseer queue gets too long you get hit by this:
> https://github.com/Netflix/curator/wiki/Tech-Note-4
>
> Try to request the overseer status 
> (/solr/admin/collections?action=OVERSEERSTATUS).
> If that fails you likely hit that problem. If so you can also not use the
> ZooKeeper command line client anymore. You can now restart all your ZK
> nodes with an increases jute.maxbuffer value. Once ZK is restarted you can
> use the ZK command line client with the same jute.maxbuffer value and check
> how many entries /overseer/queue has in ZK. Normally there should be a few
> entries but if you see thousands then you should delete them. I used a few
> lines of Java code for that, again setting jute.maxbuffer to the same
> value. Once cleaned up restart the Solr nodes one by one and keep an eye on
> the overseer status.
>
>
> On 02.02.2017 10:52, Ravi Solr wrote:
>
>> Following up on my previous email, the intermittent server unavailability
>> seems to be linked to the interaction between Solr and Zookeeper. Can
>> somebody help me understand what this error means and how to recover from
>> it.
>>
>> 2017-02-02 09:44:24.648 ERROR
>> (recoveryExecutor-3-thread-16-processing-n:xx.xxx.xxx.xxx:1234_solr
>> x:clicktrack_shard1_replica4 s:shard1 c:clicktrack r:core_node3)
>> [c:clicktrack s:shard1 r:core_node3 x:clicktrack_shard1_replica4]
>> o.a.s.c.RecoveryStrategy Error while trying to recover.
>> core=clicktrack_shard1_replica4:org.apache.zookeeper.KeeperE
>> xception$SessionExpiredException:
>> KeeperErrorCode = Session expired for /overseer/queue/qn-
>>  at org.apache.zookeeper.KeeperException.create(KeeperException.
>> java:127)
>>  at org.apache.zookeeper.KeeperException.create(KeeperException.
>> java:51)
>>  at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>>  at
>> org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkCl
>> ient.java:391)
>>  at
>> org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkCl
>> ient.java:388)
>>  at
>> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(Zk
>> CmdExecutor.java:60)
>>  at
>> org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:388)
>>  at
>> org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:244)
>>  at org.apache.solr.cloud.ZkController.publish(ZkController.
>> java:1215)
>>  at org.apache.solr.cloud.ZkController.publish(ZkController.
>> java:1128)
>>  at org.apache.solr.cloud.ZkController.publish(ZkController.
>> java:1124)
>>  at
>> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoverySt
>> rategy.java:334)
>>  at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.
>> java:222)
>>  at
>> com.codahale.metrics.InstrumentedExecutorService$Instrumente
>> dRunnable.run(InstrumentedExecutorService.java:176)
>>  at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>  at
>> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolE
>> xecutor.lambda$execute$0(ExecutorUtil.java:229)
>>  at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1142)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:617)
>>  at java.lang.Thread.run(Thread.java:745)
>>
>> Thanks
>>
>> Ravi Kiran Bhaskar
>>
>> On Thu, Feb 2, 2017 at 2:27 AM, Ravi Solr  wrote:
>>
>> Hello,
>>>   Yesterday I upgraded from 6.0.1 to 6.4.0, its been straight 12
>>> hours of debugging spree!! Can somebody kindly help me  out of this
>>> misery.
>>>
>>> I have a set has 8 single shard collections with 3 replicas. As soon as I
>>> updated the configs and started the servers one of my collection got
>>> stuck
>>> with no leader. I have restarted solr to no avail, I also tried to force
>>> a
>>> leader via collections API that dint work either. I also see that, from
>>> time to time multiple solr nodes go down all at the same time, only a
>>> restart resolves the issue.
>

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
Thanks Shawn. Yes, I did index some docs after moving to 6.4.0. The release
notes did not mention anything about the format being changed, so I thought
it would be backward compatible. Yeah, my only recourse is to re-index the
data. Apart from that, there were weird problems overall with 6.4.0. I was
excited about using the unified highlighter, but the ZooKeeper flakiness,
the constant disconnections of Solr, and the occasional failure to elect a
leader for some collections made me roll back.

Anyway, thanks for promptly responding; I will be more careful from next time.

Thanks

Ravi Kiran Bhaskar



On Thu, Feb 2, 2017 at 9:41 AM, Shawn Heisey  wrote:

> On 2/2/2017 7:23 AM, Ravi Solr wrote:
> > When i try to rollback from 6.4.0 to my original version of 6.0.1 it now
> > throws another issue. Now I cant go to 6.4.0 nor can I roll back to 6.0.1
> >
> > Could not load codec 'Lucene62'.  Did you forget to add
> > lucene-backward-codecs.jar?
> > at org.apache.lucene.index.SegmentInfos.readCodec(
> SegmentInfos.java:429)
> > at
> > org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:349)
> > at
> > org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:284)
> >
> > Hope this doesnt cost me dearly. Any ideas at least on how to rollback
> > safely.
>
> This sounds like you did some indexing after the upgrade, or possibly
> some index optimizing, so the parts of the index that were written (or
> merged) by the newer version are now in a format that the older version
> cannot use.  Perhaps the merge policy was changed, causing Solr to do
> some automatic merges once it started up.  I am not aware of anything in
> Solr that would write new segments without indexing input or a merge
> policy change.
>
> As far as I know, there is no straightforward way to go backwards with
> the index format.  If you want to downgrade and don't have a backup of
> your indexes from before the upgrade, you'll probably need to wipe the
> index directory and completely reindex.
>
> Solr will always use the newest default index format for new segments
> when you upgrade.  Contrary to many user expectations, setting
> luceneMatchVersion will *NOT* affect the index format, only the behavior
> of components that do field analysis.
>
> Downgrading the index format would involve writing a custom Lucene
> program that changes the active index format to the older version, then
> runs a forceMerge on the index.  It would be completely separate from
> Solr, and definitely not straightforward.
>
> Thanks,
> Shawn
>
>


solr 4.x reindexing issues

2014-03-24 Thread Ravi Solr
Hello,
We are trying to reindex as part of our move from 3.6.2 to 4.6.1
and have faced various issues reindexing 1.5 million docs. We don't use
SolrCloud; it's still a Master/Slave config. For testing this I am using a
single test server, reading from it and putting docs back into the same index.

We send docs in batches of 100 but only 10/100 are getting indexed; is this
related to the maxBufferedAddsPerServer setting that is hard-coded ?? Also,
I tried to play with the autocommit and softcommit settings, but in vain.


<autoCommit>
   <maxDocs>5</maxDocs>
   <maxTime>5000</maxTime>
   <openSearcher>true</openSearcher>
</autoCommit>

<autoSoftCommit>
   <maxTime>1000</maxTime>
</autoSoftCommit>

I use these on the test system just to check if docs are being indexed, but
even with a batch of 5 my SolrJ client code runs faster than the indexing,
causing some docs to not get indexed. The function that does the indexing is
a recursive method call (shown below), which fails after some time with a
stack overflow (I did not have this issue with 3.6.2 with the same code):

private static void processDocs(HttpSolrServer server, Integer start,
        Integer rows) throws Exception {
    SolrQuery query = new SolrQuery();
    query.setQuery("*:*");
    query.addFilterQuery("-allfields:[* TO *]");
    // NOTE: start and rows are never applied to the query, so every call
    // fetches Solr's default of 10 rows from the top of the result set.
    QueryResponse resp = server.query(query);
    SolrDocumentList list = resp.getResults();
    Long total = list.getNumFound();

    if (list != null && !list.isEmpty()) {
        for (SolrDocument doc : list) {
            SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
            // To index the full doc again
            iDoc.removeField("_version_");
            server.add(iDoc, 1000);   // commitWithin 1000 ms
        }

        System.out.println("Indexed " + (start + rows) + "/" + total);
        if (total >= (start + rows)) {
            // Recursive call: one stack frame per batch, which eventually
            // overflows on a large index.
            processDocs(server, (start + rows), rows);
        }
    }
}

I also tried turning on the updateLog, but it was filling up so fast that it
was useless.

How do we do bulk updates in solr 4.x environment ?? Is there any setting
that Iam missing ??

Thanks

Ravi Kiran Bhaskar
Technical Architect
The Washington Post


Re: solr 4.x reindexing issues

2014-03-25 Thread Ravi Solr
Thank you very much for responding, Mr. Høydahl. I removed the recursion,
which eliminated the stack overflow exception. However, I am still
encountering my main problem of docs not getting indexed in Solr 4.x, as I
mentioned in my original email. The reason I am reindexing is that with Solr
4.x the EnglishPorterFilterFactory has been removed, and I also wanted to add
another copyField of all field values into the destination "allfields".

As per your suggestion I removed softCommit and set autoCommit to maxDocs
100 and maxTime 120000. I was printing out the indexing calls... you can
clearly see it still indexes around 10 at a time (testing code and results
shown below). Again, my code finished fully, and just for good measure I
committed manually after 10 minutes; still, when I query, I only see that
13513 docs got indexed.

There must be something else I am missing.


 
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="q">allfields:[* TO *]</str>
      <str name="wt">xml</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="13513" start="0"/>
</response>

TEST INDEXER CODE
-----------------
Long total = null;
Integer start = 0;
Integer rows = 100;
while (total == null || total >= (start + rows)) {
    SolrQuery query = new SolrQuery();
    query.setQuery("*:*");
    query.setSort("displaydatetime", ORDER.desc);
    query.addFilterQuery("-allfields:[* TO *]");
    QueryResponse resp = server.query(query);
    SolrDocumentList list = resp.getResults();
    total = list.getNumFound();

    if (list != null && !list.isEmpty()) {
        for (SolrDocument doc : list) {
            SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
            // to index the full doc again
            iDoc.removeField("_version_");
            server.add(iDoc);
        }

        System.out.println("Indexed " + (start + rows) + "/" + total);
        start = (start + rows);
    }
}

System.out.println("COMPLETELY DONE");

System.out output
-
Indexed 1252100/1256575
Indexed 1252200/1256575
Indexed 1252300/1256575
Indexed 1252400/1256575
Indexed 1252500/1256575
Indexed 1252600/1256575
Indexed 1252700/1256575
Indexed 1252800/1256575
Indexed 1252900/1256575
Indexed 1253000/1256575
Indexed 1253100/1256566
Indexed 1253200/1256566
Indexed 1253300/1256566
Indexed 1253400/1256566
Indexed 1253500/1256566
Indexed 1253600/1256566
Indexed 1253700/1256566
Indexed 1253800/1256566
Indexed 1253900/1256566
Indexed 1254000/1256566
Indexed 1254100/1256566
Indexed 1254200/1256566
Indexed 1254300/1256566
Indexed 1254400/1256566
Indexed 1254500/1256566
Indexed 1254600/1256566
Indexed 1254700/1256566
Indexed 1254800/1256566
Indexed 1254900/1256566
Indexed 1255000/1256566
Indexed 1255100/1256566
Indexed 1255200/1256566
Indexed 1255300/1256566
Indexed 1255400/1256566
Indexed 1255500/1256566
Indexed 1255600/1256566
Indexed 1255700/1256557
Indexed 1255800/1256557
Indexed 1255900/1256557
Indexed 1256000/1256557
Indexed 1256100/1256557
Indexed 1256200/1256557
Indexed 1256300/1256557
Indexed 1256400/1256557
Indexed 1256500/1256557
COMPLETELY DONE


Thanks,
Ravi Kiran Bhaskar



On Tue, Mar 25, 2014 at 7:13 AM, Jan Høydahl  wrote:

> Hi,
>
> Seems you try to reindex from one server to the other.
>
> Be aware that it could be easier for you to simply copy the whole index
> folder over to your 4.6.1 server and start Solr as it will be able to read
> your 3.x index. This is unless you also want to do major upgrades of your
> schema or update processors so that you'll need a re-index anyway.
>
> If you believe you really need a re-index, then please try to batch index
> without triggering commits every few seconds - this is really heavy on the
> system and completely unnecessary. You won't get the benefit of SoftCommit
> if you're not running SolrCloud, so no need to configure that.
>
> I would change your <autoCommit> into maxDocs=10000 and maxTime=120000
> (every 2 min).
> Further, please index without the 1s commitWithin, i.e. instead of
>     server.add(iDoc, 1000);
> use
>     server.add(iDoc);
>
> This will make sure the server gets room to breathe and is not constantly
> generating new index segments.
>
> Finally, it's probably not a good idea to use recursion here. You really
> don't need it, and it fills up your stack. You can instead refactor the
> method to do the whole indexing. And a hint: it is generally better to ask
> for ALL documents in one go and stream to the end, rather than increasing
> offsets with new queries all the time, because high offsets/start values
> can be time consuming, especially with multiple shards. If you increase the
> timeout enough you should be able to retrieve all documents in one go.
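
A minimal sketch of that "stream to the end" approach with SolrJ 4.x, assuming
separate source and target cores (the URLs are illustrative; the allfields
filter is the one used in this thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.StreamingResponseCallback;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class StreamReindex {
    public static void main(String[] args) throws Exception {
        final HttpSolrServer source = new HttpSolrServer("http://localhost:8983/solr/coreA");
        final HttpSolrServer target = new HttpSolrServer("http://localhost:8983/solr/coreB");

        SolrQuery query = new SolrQuery("*:*");
        query.addFilterQuery("-allfields:[* TO *]");
        // one request for everything; docs are handed to the callback as
        // they come off the wire instead of being buffered as pages
        query.setRows(Integer.MAX_VALUE);

        source.queryAndStreamResponse(query, new StreamingResponseCallback() {
            @Override
            public void streamDocListInfo(long numFound, long start, Float maxScore) {
                System.out.println("Streaming " + numFound + " docs");
            }

            @Override
            public void streamSolrDocument(SolrDocument doc) {
                try {
                    SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
                    iDoc.removeField("_version_"); // let Solr assign a new version
                    target.add(iDoc);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        });
        target.commit();
    }
}

(On Solr 4.7+ a cursorMark loop would be the safer route for very large
indexes, but cursorMark was not yet available in the 4.6.1 discussed here.)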

Re: solr 4.x reindexing issues

2014-03-25 Thread Ravi Solr
I am also seeing the following in the log. Is it really committing? Now I am
totally confused about how Solr 4.x indexes. My relevant update config is
shown below.

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>100</maxDocs>
    <maxTime>120000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>

[#|2014-03-25T13:44:03.765-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820509
[commitScheduler-6-thread-1] INFO  org.apache.solr.update.UpdateHandler  -
start
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
|#]

[#|2014-03-25T13:44:03.766-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=83;_ThreadName=http-thread-pool-8080(4);|820510
[http-thread-pool-8080(4)] INFO
org.apache.solr.update.processor.LogUpdateProcessor  - [sitesearchcore]
webapp=/solr-admin path=/update params={wt=javabin&version=2}
{add=[09f693e6-9a6f-11e3-9900-dd917233cf9c]} 0 13
|#]

[#|2014-03-25T13:44:03.898-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820642
[commitScheduler-6-thread-1] INFO  org.apache.solr.core.SolrCore  -
SolrDeletionPolicy.onCommit: commits: num=3

commit{dir=/data/solr/core/sitesearch-data/index,segFN=segments_9y68,generation=464192}

commit{dir=/data/solr/core/sitesearch-data/index,segFN=segments_9yjf,generation=464667}

commit{dir=/data/solr/core/sitesearch-data/index,segFN=segments_9yjg,generation=464668}
|#]

[#|2014-03-25T13:44:03.898-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820642
[commitScheduler-6-thread-1] INFO  org.apache.solr.core.SolrCore  - newest
commit generation = 464668
|#]

[#|2014-03-25T13:44:03.908-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820652
[commitScheduler-6-thread-1] INFO
org.apache.solr.search.SolrIndexSearcher  - Opening
Searcher@1e2ca86e[sitesearchcore]
realtime
|#]

[#|2014-03-25T13:44:03.909-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820653
[commitScheduler-6-thread-1] INFO  org.apache.solr.update.UpdateHandler  -
end_commit_flush


Thanks

Ravi Kiran Bhaskar


On Tue, Mar 25, 2014 at 1:10 PM, Ravi Solr  wrote:

> Thank you very much for responding, Mr. Høydahl. I removed the recursion,
> which eliminated the stack overflow exception. However, I am still
> encountering my main problem of docs not getting indexed in Solr 4.x, as I
> mentioned in my original email. The reason I am reindexing is that with Solr
> 4.x the EnglishPorterFilterFactory has been removed, and I also wanted to
> add another copyField of all field values into the destination "allfields".
>
> As per your suggestion I removed softCommit and set autoCommit to maxDocs
> 100 and maxTime 120000. I was printing out the indexing calls... you can
> clearly see it still indexes around 10 at a time (testing code and results
> shown below). Again, my code finished fully, and just for good measure I
> committed manually after 10 minutes; still, when I query, I only see that
> 13513 docs got indexed.
>
> There must be something else I am missing
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">1</int>
>     <lst name="params">
>       <str name="q">allfields:[* TO *]</str>
>       <str name="wt">xml</str>
>       <str name="rows">0</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="13513" start="0"/>
> </response>
>
> TEST INDEXER CODE
> -----------------
> Long total = null;
> Integer start = 0;
> Integer rows = 100;
> while (total == null || total >= (start + rows)) {
>     SolrQuery query = new SolrQuery();
>     query.setQuery("*:*");
>     query.setSort("displaydatetime", ORDER.desc);
>     query.addFilterQuery("-allfields:[* TO *]");
>     QueryResponse resp = server.query(query);
>     SolrDocumentList list = resp.getResults();
>     total = list.getNumFound();
>
>     if (list != null && !list.isEmpty()) {
>         for (SolrDocument doc : list) {
>             SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
>             // to index the full doc again
>             iDoc.removeField("_version_");
>             server.add(iDoc);
>         }
>
>         System.out.println("Indexed " + (start + rows) + "/" + total);
>         start = (start + rows);
>     }
> }
>
> System.out.println("COMPLETELY DONE");
>
> System.out output
> 

Re: solr 4.x reindexing issues

2014-03-25 Thread Ravi Solr
I just tried even reading from core A and indexing into core B, and the same
issue still persists.


On Tue, Mar 25, 2014 at 2:49 PM, Lan  wrote:

> Ravi,
>
> It looks like you are re-indexing data by pulling data from your Solr
> server and then indexing it back to the same server. I can think of many
> things that could go wrong with this setup. For example, are all your
> fields stored? Since you are iterating through all documents on the Solr
> server and at the same time modifying the index, the sort order could
> change.
>
> To make it easier to identify any bugs in your process, you should index
> into a second Solr server that is *EMPTY* so you can identify any problems.
>
> Generally when people re-index data, they don't pull the data from Solr but
> from a system of record such as a DB.
>
>
>
>


Re: solr 4.x reindexing issues

2014-03-25 Thread Ravi Solr
Sorry guys, I really apologize for wasting your time... bone-headed coding on
my part. I did not set rows and start to the correct values for proper
pagination, so it was getting the same 10 docs every single time.
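
For the record, the fix is just to page the query itself; a corrected sketch
of the earlier loop (same field names as in this thread):

private static void processAllDocs(HttpSolrServer server) throws Exception {
    Long total = null;
    int start = 0;
    final int rows = 100;
    while (total == null || total >= (start + rows)) {
        SolrQuery query = new SolrQuery("*:*");
        query.setSort("displaydatetime", SolrQuery.ORDER.desc);
        query.addFilterQuery("-allfields:[* TO *]");
        query.setStart(start); // these two calls were missing, so Solr kept
        query.setRows(rows);   // returning the default first page of 10 docs
        SolrDocumentList list = server.query(query).getResults();
        total = list.getNumFound();
        for (SolrDocument doc : list) {
            SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
            iDoc.removeField("_version_");
            server.add(iDoc);
        }
        System.out.println("Indexed " + (start + rows) + "/" + total);
        start += rows;
    }
}

(Since the -allfields filter drops docs from the result set as reindexed docs
get committed, an alternative is to keep start at 0 and let the shrinking
result set do the paging.)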

Thanks
Ravi Kiran Bhaskar


On Tue, Mar 25, 2014 at 3:50 PM, Ravi Solr  wrote:

> I just tried even reading from core A and indexing into core B, and the
> same issue still persists.
>
>
> On Tue, Mar 25, 2014 at 2:49 PM, Lan  wrote:
>
>> Ravi,
>>
>> It looks like you are re-indexing data by pulling data from your Solr
>> server and then indexing it back to the same server. I can think of many
>> things that could go wrong with this setup. For example, are all your
>> fields stored? Since you are iterating through all documents on the Solr
>> server and at the same time modifying the index, the sort order could
>> change.
>>
>> To make it easier to identify any bugs in your process, you should index
>> into a second Solr server that is *EMPTY* so you can identify any
>> problems.
>>
>> Generally when people re-index data, they don't pull the data from Solr
>> but from a system of record such as a DB.
>>
>>
>>
>>
>
>


Relevancy help

2014-05-05 Thread Ravi Solr
Hello,
I have a weird relevancy requirement. We search news content, hence
chronology is very important, and so is relevancy, although the two are
mutually exclusive. For example, if the search terms are - malaysia airline
crash blackbox - my requirements are as follows:

docs containing all the words should be on top, but the editorial side also
wants them sorted in reverse chronological order without losing relevancy.
Why? If on day 1 there is an article about the search for the blackbox, on
day 2 the blackbox is found, and on day 3 there is an article about the
blackbox being unusable, then from the user's standpoint it makes sense that
we show the most recent content on top.

I already boost the recency of docs with
boost=recip(ms(NOW/HOUR,displaydatetime),7.889e-10,1,1), i.e. in increments
of 3 months.
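
(As a cross-check of that constant: 3 months is about 91.3 days ≈ 7.889e9 ms,
so a recip constant meant to decay in 3-month steps would be its reciprocal,
1/7.889e9 ≈ 1.268e-10; the 7.889e-10 written above corresponds to roughly 15
days instead.)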

However, when I apply the boost the chronology gets messed up. I know
relevancy and sorting are mutually exclusive concepts. Is there any magic we
can do in Solr which achieves both?


Thanks,

Ravi Kiran bhaskar


Re: Relevancy help

2014-05-06 Thread Ravi Solr
Thank you very much for your responses.

Jack, even if I were to tweak the boost factor it might not work in all
cases. So I was looking at a more generic way, via function queries, to
achieve my goal.

Ahmet, I did see Jan Høydahl's response on boosting docs that match all
terms, as follows:

q=a fox&defType=dismax&qf=allfields&bf=map(query($qq),0,0,0,100.0)&qq=allfields:(quick AND brown AND fence)

This is what I am looking for; however, instead of a constant boost I am
thinking the '100.0' could be replaced with some mathematical function of the
score and the publish date. I ran into trouble because the score cannot be
used directly in a function query. Is query(x) the right way to get the
score?

Alexandre, I couldn't find any documentation on the QueryRescorer API... if
you know of any, can you kindly point it out?



Ravi Kiran Bhaskar


On Tue, May 6, 2014 at 12:03 AM, Alexandre Rafalovitch
wrote:

> Can you sort by score, then date? Assuming similar articles will get the
> same score (you may need to discount frequency/length).
>
> There is also the QueryRescorer API, introduced in Lucene 4.8, that might
> be relevant. Though I have no idea how that would get exposed in Solr.
>
> Regards,
>Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Tue, May 6, 2014 at 5:12 AM, Ahmet Arslan  wrote:
> > Hi Ravi,
> >
> > Regarding recency please see :
> http://www.slideshare.net/lucenerevolution/potter-timothy-boosting-documents-in-solr
> >
> > Regarding "docs containing all words", there is a function query that
> > elevates those docs to the top. Search the mailing list's past posts.
> >
> > Ahmet
> >
> >
> > On Tuesday, May 6, 2014 12:42 AM, Ravi Solr  wrote:
> >
> > Hello,
> > I have a weird relevancy requirement. We search news content
> hence
> > chronology is very important and also relevancy, although both are
> mutually
> > exclusive. For example, if the search terms are -  malaysia airline crash
> > blackbox - my requirements are as follows
> >
> > docs containing all words should be on top, but the editorial also wants
> > them sorted in reverse chronological order without losing relevancy. Why
> > ?? If on day 1 there is an article about search for blackbox but on day 2
> > the blackbox is found and day 3 there is an article about blackbox being
> > unusable...from the user's standpoint it makes sense that we show most
> > recent content on top.
> >
> > I already boost recency of docs with
> > boost=recip(ms(NOW/HOUR,displaydatetime),7.889e-10,1,1) i.e. increments
> of
> > 3 months
> >
> > However when I do the boost the chronology is messed up. I know relevancy
> > and sorting are mutually exclusive concepts. Is there any magic that we
> can
> > do in SOLR which can achieve both ???
> >
> >
> > Thanks,
> >
> > Ravi Kiran bhaskar
>


Query ReRanking question

2014-09-04 Thread Ravi Solr
Can the ReRanking API be used to sort, within the docs retrieved, by a date
field? Can somebody help me understand how to write such a query?

Thanks

Ravi Kiran Bhaskar


Re: Query ReRanking question

2014-09-05 Thread Ravi Solr
Thank you very much for responding. I want to do exactly the opposite of
what you said. I want to sort the relevant docs in reverse chronology. If
you sort by date beforehand then the relevancy is lost. So I want to get the
top N relevant results and then rerank those top N to achieve relevant,
reverse-chronological results.

If you ask why I would want to do that:

Let's take the example of the Malaysian airline crash. Several articles might
have been published over a period of time. When I search for - malaysia
airline crash blackbox - I would want to see "relevant" results, but would
also like to see the recent developments on top, i.e. effectively a reverse
chronological order within the relevant results, like telling a story over a
period of time.

Hope I am clear. Thanks for your help.

Thanks

Ravi Kiran Bhaskar


On Thu, Sep 4, 2014 at 5:08 PM, Joel Bernstein  wrote:

> If you want the main query to be sorted by date and then the top N docs
> reranked by a query, that should work. Try something like this:
>
> q=foo&sort=date+desc&rq={!rerank reRankDocs=1000
> reRankQuery=$myquery}&myquery=blah
>
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Thu, Sep 4, 2014 at 4:25 PM, Ravi Solr  wrote:
>
> > Can the ReRanking API be used to sort within docs retrieved by a date
> field
> > ? Can somebody help me understand how to write such a query ?
> >
> > Thanks
> >
> > Ravi Kiran Bhaskar
> >
>


Re: Query ReRanking question

2014-09-05 Thread Ravi Solr
Erick, I believe when you apply sort this way it runs the query and the sort
first and only then tries to rerank... so basically it has already lost the
true relevancy because the sort takes precedence. Am I making sense?

Ravi Kiran Bhaskar


On Fri, Sep 5, 2014 at 1:23 PM, Erick Erickson 
wrote:

> OK, why can't you switch the clauses from Joel's suggestion?
>
> Something like:
> q=Malaysia plane crash&rq={!rerank reRankDocs=1000
> reRankQuery=$myquery}&myquery=*:*&sort=date+desc
>
> (haven't tried this yet, but you get the idea).
>
> Best,
> Erick
>
> On Fri, Sep 5, 2014 at 9:33 AM, Markus Jelsma
>  wrote:
> > Hi - You can already achieve this by boosting on the document's recency.
> The result set won't be exactly ordered by date but you will get the most
> relevant and recent documents on top.
> >
> > Markus
> >
> > -Original message-
> >> From: Ravi Solr <ravis...@gmail.com>
> >> Sent: Friday 5th September 2014 18:06
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Query ReRanking question
> >>
> >> Thank you very much for responding. I want to do exactly the opposite of
> >> what you said. I want to sort the relevant docs in reverse chronology.
> If
> >> you sort by date before hand then the relevancy is lost. So I want to
> get
> >> Top N relevant results and then rerank those Top N to achieve relevant
> >> reverse chronological results.
> >>
> >> If you ask Why would I want to do that ??
> >>
> >> Let's take an example: the Malaysian airline crash. Several articles might
> >> have been published over a period of time. When I search for - malaysia
> >> airline crash blackbox - I would want to see "relevant" results but
> would
> >> also like to see the recent developments on top, i.e.
> effectively a
> >> reverse chronological order within the relevant results, like telling a
> >> story over a period of time
> >>
> >> Hope i am clear. Thanks for your help.
> >>
> >> Thanks
> >>
> >> Ravi Kiran Bhaskar
> >>
> >>
> >> On Thu, Sep 4, 2014 at 5:08 PM, Joel Bernstein <joels...@gmail.com> wrote:
> >>
> >> > If you want the main query to be sorted by date then the top N docs
> >> > reranked by a query, that should work. Try something like this:
> >> >
> >> > q=foo&sort=date+desc&rq={!rerank reRankDocs=1000
> >> > reRankQuery=$myquery}&myquery=blah
> >> >
> >> >
> >> > Joel Bernstein
> >> > Search Engineer at Heliosearch
> >> >
> >> >
> >> > On Thu, Sep 4, 2014 at 4:25 PM, Ravi Solr <ravis...@gmail.com> wrote:
> >> >
> >> > > Can the ReRanking API be used to sort within docs retrieved by a
> date
> >> > field
> >> > > ? Can somebody help me understand how to write such a query ?
> >> > >
> >> > > Thanks
> >> > >
> >> > > Ravi Kiran Bhaskar
> >> > >
> >> >
> >>
> >
>


Re: Query ReRanking question

2014-09-05 Thread Ravi Solr
Walter, thank you for the valuable insight. The problem I am facing is that
between the term frequencies, mm, date boost, and stemming, the results can
become very inconsistent... Look at the following examples.

Here the chronology is all over the place because of what I mentioned above:
http://www.washingtonpost.com/pb/newssearch/?query=malaysian+airline+crash

Now take the instance of an old topic which was covered a while ago for a
period of time but not actively updated recently... In this case the date
boosting predominantly takes over because of common terms, and we get a rash
of irrelevant content:

http://www.washingtonpost.com/pb/newssearch/?query=faces+of+the+fallen

This has become such a balancing act, and hence I was looking to see if
reRanking might help.

Thanks

Ravi Kiran Bhaskar





On Fri, Sep 5, 2014 at 1:32 PM, Walter Underwood 
wrote:

> Boosting on recency is probably a better approach. A fixed re-ranking
> horizon will always be a compromise, a guess at the precision of the query.
> It will give poor results for queries that are more or less specific than
> the assumption.
>
> Think of the recency boost as a tie-breaker. When documents are similar in
> relevance, show the most recent. This can work over a wide range of queries.
>
> For “malaysian airlines crash”, there are two sets of relevant documents,
> one set on MH 370 starting six months ago, and one set on MH 17, two months
> ago. But four hours ago, The Guardian published a “six months on” article
> on MH 370. A recency boost will handle that complexity.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/
>
>
> On Sep 5, 2014, at 10:23 AM, Erick Erickson 
> wrote:
>
> > OK, why can't you switch the clauses from Joel's suggestion?
> >
> > Something like:
> > q=Malaysia plane crash&rq={!rerank reRankDocs=1000
> > reRankQuery=$myquery}&myquery=*:*&sort=date+desc
> >
> > (haven't tried this yet, but you get the idea).
> >
> > Best,
> > Erick
> >
> > On Fri, Sep 5, 2014 at 9:33 AM, Markus Jelsma
> >  wrote:
> >> Hi - You can already achieve this by boosting on the document's
> recency. The result set won't be exactly ordered by date but you will get
> the most relevant and recent documents on top.
> >>
> >> Markus
> >>
> >> -Original message-
> >>> From: Ravi Solr <ravis...@gmail.com>
> >>> Sent: Friday 5th September 2014 18:06
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: Query ReRanking question
> >>>
> >>> Thank you very much for responding. I want to do exactly the opposite
> of
> >>> what you said. I want to sort the relevant docs in reverse chronology.
> If
> >>> you sort by date before hand then the relevancy is lost. So I want to
> get
> >>> Top N relevant results and then rerank those Top N to achieve relevant
> >>> reverse chronological results.
> >>>
> >>> If you ask Why would I want to do that ??
> >>>
> >>> Let's take an example: the Malaysian airline crash. Several articles might
> >>> have been published over a period of time. When I search for - malaysia
> >>> airline crash blackbox - I would want to see "relevant" results but
> would
> >>> also like to see the recent developments on top, i.e.
> effectively a
> >>> reverse chronological order within the relevant results, like telling a
> >>> story over a period of time
> >>>
> >>> Hope i am clear. Thanks for your help.
> >>>
> >>> Thanks
> >>>
> >>> Ravi Kiran Bhaskar
> >>>
> >>>
> >>> On Thu, Sep 4, 2014 at 5:08 PM, Joel Bernstein <joels...@gmail.com> wrote:
> >>>
> >>>> If you want the main query to be sorted by date then the top N docs
> >>>> reranked by a query, that should work. Try something like this:
> >>>>
> >>>> q=foo&sort=date+desc&rq={!rerank reRankDocs=1000
> >>>> reRankQuery=$myquery}&myquery=blah
> >>>>
> >>>>
> >>>> Joel Bernstein
> >>>> Search Engineer at Heliosearch
> >>>>
> >>>>
> >>>> On Thu, Sep 4, 2014 at 4:25 PM, Ravi Solr <ravis...@gmail.com> wrote:
> >>>>
> >>>>> Can the ReRanking API be used to sort within docs retrieved by a date
> >>>> field
> >>>>> ? Can somebody help me understand how to write such a query ?
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>> Ravi Kiran Bhaskar
> >>>>>
> >>>>
> >>>
> >>
>
>


Re: Query ReRanking question

2014-09-06 Thread Ravi Solr
Erick,
        Your idea of reversing Joel's suggestion seems to give the best
results of all the options I tried... but I can't seem to understand why. I
thought the query shown below should give irrelevant results, as sorting by
date would throw relevancy off... but somehow it's getting relevant results
with fair enough reverse chronology. It is as if the sort is applied after
the docs are collected and reranked (which is what I wanted). One more thing
that baffled me: if I change reRankDocs from 1000 to 100, the results become
irrelevant, which doesn't make sense.

So can you kindly explain whats going on in the following query.

http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank
reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date
desc&fl=headline,publish_date,score

I love the solr community, so much to learn from so many knowledgeable
people.

Thanks

Ravi Kiran Bhaskar



On Fri, Sep 5, 2014 at 1:23 PM, Erick Erickson 
wrote:

> OK, why can't you switch the clauses from Joel's suggestion?
>
> Something like:
> q=Malaysia plane crash&rq={!rerank reRankDocs=1000
> reRankQuery=$myquery}&myquery=*:*&sort=date+desc
>
> (haven't tried this yet, but you get the idea).
>
> Best,
> Erick
>
> On Fri, Sep 5, 2014 at 9:33 AM, Markus Jelsma
>  wrote:
> > Hi - You can already achieve this by boosting on the document's recency.
> The result set won't be exactly ordered by date but you will get the most
> relevant and recent documents on top.
> >
> > Markus
> >
> > -Original message-
> >> From: Ravi Solr <ravis...@gmail.com>
> >> Sent: Friday 5th September 2014 18:06
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Query ReRanking question
> >>
> >> Thank you very much for responding. I want to do exactly the opposite of
> >> what you said. I want to sort the relevant docs in reverse chronology.
> If
> >> you sort by date before hand then the relevancy is lost. So I want to
> get
> >> Top N relevant results and then rerank those Top N to achieve relevant
> >> reverse chronological results.
> >>
> >> If you ask Why would I want to do that ??
> >>
> >> Let's take an example: the Malaysian airline crash. Several articles might
> >> have been published over a period of time. When I search for - malaysia
> >> airline crash blackbox - I would want to see "relevant" results but
> would
> >> also like to see the recent developments on top, i.e.
> effectively a
> >> reverse chronological order within the relevant results, like telling a
> >> story over a period of time
> >>
> >> Hope i am clear. Thanks for your help.
> >>
> >> Thanks
> >>
> >> Ravi Kiran Bhaskar
> >>
> >>
> >> On Thu, Sep 4, 2014 at 5:08 PM, Joel Bernstein <joels...@gmail.com> wrote:
> >>
> >> > If you want the main query to be sorted by date then the top N docs
> >> > reranked by a query, that should work. Try something like this:
> >> >
> >> > q=foo&sort=date+desc&rq={!rerank reRankDocs=1000
> >> > reRankQuery=$myquery}&myquery=blah
> >> >
> >> >
> >> > Joel Bernstein
> >> > Search Engineer at Heliosearch
> >> >
> >> >
> >> > On Thu, Sep 4, 2014 at 4:25 PM, Ravi Solr <ravis...@gmail.com> wrote:
> >> >
> >> > > Can the ReRanking API be used to sort within docs retrieved by a
> date
> >> > field
> >> > > ? Can somebody help me understand how to write such a query ?
> >> > >
> >> > > Thanks
> >> > >
> >> > > Ravi Kiran Bhaskar
> >> > >
> >> >
> >>
> >
>


Re: Query ReRanking question

2014-09-06 Thread Ravi Solr
Joel, that was exactly what I was thinking too, that is why I wanted to
know the explanation. Anyway, I will modify the "fl" and report. This is
getting interesting :-)

Thanks

Ravi Kiran Bhaskar


On Sat, Sep 6, 2014 at 3:58 PM, Joel Bernstein  wrote:

> The following query:
>
> http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank
> reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date
> desc&fl=headline,publish_date,score
>
> Is doing the following:
>
> The main query is sorted by publish_date. Then the results are reranked by
> *:*, which in theory would have no effect at all.
>
> The re-ranker only uses the reRankQuery to re-rank the results; the sort
> param will always apply to the main query.
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Sat, Sep 6, 2014 at 2:33 PM, Ravi Solr  wrote:
>
> > Erick,
> > Your idea of reversing Joel's suggestion seems to give the best
> > results of all the options I tried... but I can't seem to understand why.
> > I thought the query shown below should give irrelevant results, as
> > sorting by date would throw relevancy off... but somehow it's getting
> > relevant results with fair enough reverse chronology. It is as if the
> > sort is applied after the docs are collected and reranked (which is what
> > I wanted). One more thing that baffled me: if I change reRankDocs from
> > 1000 to 100, the results become irrelevant, which doesn't make sense.
> >
> > So can you kindly explain whats going on in the following query.
> >
> > http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank
> > reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date
> > desc&fl=headline,publish_date,score
> >
> > I love the solr community, so much to learn from so many knowledgeable
> > people.
> >
> > Thanks
> >
> > Ravi Kiran Bhaskar
> >
> >
> >
> > On Fri, Sep 5, 2014 at 1:23 PM, Erick Erickson 
> > wrote:
> >
> > > OK, why can't you switch the clauses from Joel's suggestion?
> > >
> > > Something like:
> > > q=Malaysia plane crash&rq={!rerank reRankDocs=1000
> > > reRankQuery=$myquery}&myquery=*:*&sort=date+desc
> > >
> > > (haven't tried this yet, but you get the idea).
> > >
> > > Best,
> > > Erick
> > >
> > > On Fri, Sep 5, 2014 at 9:33 AM, Markus Jelsma
> > >  wrote:
> > > > Hi - You can already achieve this by boosting on the document's
> > recency.
> > > The result set won't be exactly ordered by date but you will get the
> most
> > > relevant and recent documents on top.
> > > >
> > > > Markus
> > > >
> > > > -Original message-
> > > >> From: Ravi Solr <ravis...@gmail.com>
> > > >> Sent: Friday 5th September 2014 18:06
> > > >> To: solr-user@lucene.apache.org
> > > >> Subject: Re: Query ReRanking question
> > > >>
> > > >> Thank you very much for responding. I want to do exactly the
> opposite
> > of
> > > >> what you said. I want to sort the relevant docs in reverse
> chronology.
> > > If
> > > >> you sort by date before hand then the relevancy is lost. So I want
> to
> > > get
> > > >> Top N relevant results and then rerank those Top N to achieve
> relevant
> > > >> reverse chronological results.
> > > >>
> > > >> If you ask Why would I want to do that ??
> > > >>
> > > >> Let's take an example: the Malaysian airline crash. Several articles might
> > > >> have been published over a period of time. When I search for -
> > malaysia
> > > >> airline crash blackbox - I would want to see "relevant" results but
> > > would
> > > >> also like to see the recent developments on top, i.e.
> > > effectively a
> > > >> reverse chronological order within the relevant results, like
> telling
> > a
> > > >> story over a period of time
> > > >>
> > > >> Hope i am clear. Thanks for your help.
> > > >>
> > > >> Thanks
> > > >>
> > > >> Ravi Kiran Bhaskar
> > > >>
> > > >>
> > > >> On Thu, Sep 4, 2014 at 5:08 PM, Joel Bernstein <joels...@gmail.com> wrote:
> > > >>
> > > >> > If you want the main query to be sorted by date then the top N
> docs
> > > >> > reranked by a query, that should work. Try something like this:
> > > >> >
> > > >> > q=foo&sort=date+desc&rq={!rerank reRankDocs=1000
> > > >> > reRankQuery=$myquery}&myquery=blah
> > > >> >
> > > >> >
> > > >> > Joel Bernstein
> > > >> > Search Engineer at Heliosearch
> > > >> >
> > > >> >
> > > >> > On Thu, Sep 4, 2014 at 4:25 PM, Ravi Solr <ravis...@gmail.com> wrote:
> > > >> >
> > > >> > > Can the ReRanking API be used to sort within docs retrieved by a
> > > date
> > > >> > field
> > > >> > > ? Can somebody help me understand how to write such a query ?
> > > >> > >
> > > >> > > Thanks
> > > >> > >
> > > >> > > Ravi Kiran Bhaskar
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>


Re: Query ReRanking question

2014-09-06 Thread Ravi Solr
Joel,
 I just removed the "score" from "fl" and the results are still the
same as before. So the score is not causing the good results. Maybe I got
lucky and chanced on a ReRanking + Sort bug which is working to my advantage
:-)  The sort should have been applied to the main query, and only then
should the ReRank kick in. However, it seems like the sort is being done
after the ReRanking. Just to check, I ran the same query with and without
ReRanking. The results are shown below; surely the sort is applied
differently when ReRanking. The mystery deepens :-)

Pure Sort Without ReRanking
http://localhost:8080/solr-admin/sitesearchcore/select?q=faces of the
fallen&sort=publish_date desc&fl=headline,publish_date



Mexico dreams face test after opening to investors (2014-08-07T17:32:16Z)
Virginia football QB David Watford tries 'to make the best out of' fall down depth chart (2014-08-07T17:25:00Z)
As wars end, a benefits system complicates the process of moving on for spouses (2014-08-07T14:41:36Z)
Wonkbook: What you need to know about Obama's possible inversion intervention (2014-08-07T12:44:26Z)
Japan architects sell a lifestyle on global stage (2014-08-07T07:19:57Z)
Japan architects sell a lifestyle on global stage (2014-08-07T07:11:01Z)
Month-long war in Gaza has left a humanitarian and environmental crisis (2014-08-07T00:16:00Z)
Shiites in India want to join the fight against the Islamic State in Iraq (2014-08-06T11:02:59Z)
Business Highlights (2014-08-05T22:12:19Z)
Shining Stars finds new home in time for school year (2014-08-05T21:57:12Z)



With ReRanking and Sorting
http://localhost:8080/solr-admin/sitesearchcore/select?q=faces of the
fallen&rq={!rerank reRankQuery=$rqq
reRankDocs=1000}&rqq=*:*&sort=publish_date desc&fl=headline,publish_date




More than just the faces of the fallen (2013-05-10T21:55:44Z)
Faces of the Fallen (2013-02-08T20:39:00Z)
Tears for the fallen (2013-01-11T23:51:53Z)
In Afghanistan, under fire from 'friends' (2013-04-06T00:19:45Z)
Counting the dead (2013-02-08T23:28:42Z)
Women should be trained for combat (2013-03-20T23:36:31Z)
8 overlooked Civil War moments from 1864 that could have changed history (2014-04-24T21:08:16Z)
'This can't happen to the same family twice' (2014-01-17T23:38:00Z)
Islamist rebels in Syria use faces of the dead to lure the living (2013-11-04T23:11:00Z)
Behind the Wise family story (2014-01-17T23:28:00Z)




Thanks

Ravi Kiran Bhaskar


On Sat, Sep 6, 2014 at 4:06 PM, Joel Bernstein  wrote:

> What may be happening here:
>
> http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank
> reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date
> desc&fl=headline,publish_date,score
>
>
> Because the fl is requesting the score, possibly the scores are being
> tracked in the initial query even though it is being sorted by
> publish_date.
>
> Then during the rerank phase the initial score is being combined with the
> *:* score, which will be 1. So the effect would be to rerank the docs by
> the scores from the main query.
>
> One way to prove this would be to remove the score from the "fl" param and
> see if this changes the result ordering.
>
>
>
>
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Sat, Sep 6, 2014 at 3:58 PM, Joel Bernstein  wrote:
>
> > The following query:
> >
> > http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank
> > reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date
> > desc&fl=headline,publish_date,score
> >
> > Is doing the following:
> >
> > The main query is sorted by publish_date. Then the results are reranked
> by
> > *:*, which in theory would have no effect at all.
> >
> > The re-ranker only uses the reRankQuery to re-rank the results; the sort
> > param will always apply to the main query.
> >
> > Joel Bernstein
> > Search Engineer at Heliosearch
> >
> >
> > On Sat, Sep 6, 2014 at 2:33 PM, Ravi Solr  wrote:
> >
> >> Erick,
> >> Your idea about reversing Joel's suggestion seems to give the
> best
> >>

Re: Query ReRanking question

2014-09-08 Thread Ravi Solr
Joel and Erick,
   Thank you very much for explaining how the ReRanking works. Now
it's a bit clearer.

Thanks,

Ravi Kiran Bhaskar

On Sun, Sep 7, 2014 at 4:45 PM, Joel Bernstein  wrote:

> Oops wrong usage pattern. It should be:
>
> 1) Main query is sorted by a field (scores tracked silently in the
> background).
> 2) Reranker is reRanking docs based on the score from the main query.
>
>
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Sun, Sep 7, 2014 at 4:43 PM, Joel Bernstein  wrote:
>
> > OK, just reviewed the code. The ReRankQParserPlugin always tracks the
> > scores from the main query, so this explains things. Speaking of
> > explaining things, the ReRankQParserPlugin also works with Lucene's
> > explain. So if you use debugQuery=true we should see that the score from
> > the initial query was combined with the score from the reRankQuery, which
> > should be 1.
> >
> > You have stumbled on an interesting usage pattern which I never
> > considered. But basically what's happening is:
> >
> > 1) Main query is sorted by score.
> > 2) Reranker is reRanking docs based on the score from the main query.
> >
> > No worries, Erick, you've taught me a lot over the past couple of years!
> >
> > Joel Bernstein
> > Search Engineer at Heliosearch
> >
> >
> > On Sun, Sep 7, 2014 at 11:37 AM, Erick Erickson
> > wrote:
> >
> >> Joel:
> >>
> >> I find that whenever I say something totally wrong publicly, I
> >> remember the correction really really well...
> >>
> >> Thanks for straightening that out!
> >> Erick
> >>
> >> On Sat, Sep 6, 2014 at 12:58 PM, Joel Bernstein 
> >> wrote:
> >> > The following query:
> >> >
> >> > http://localhost:8080/solr/select?q=malaysian airline
> crash&rq={!rerank
> >> > reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date
> >> > desc&fl=headline,publish_date,score
> >> >
> >> > Is doing the following:
> >> >
> >> > The main query is sorted by publish_date. Then the results are
> reranked
> >> by
> >> > *:*, which in theory would have no effect at all.
> >> >
> >> > The re-ranker only uses the reRankQuery to re-rank the results; the
> >> > sort param will always apply to the main query.
> >> >
> >> > Joel Bernstein
> >> > Search Engineer at Heliosearch
> >> >
> >> >
> >> > On Sat, Sep 6, 2014 at 2:33 PM, Ravi Solr  wrote:
> >> >
> >> >> Erick,
> >> >> Your idea of reversing Joel's suggestion seems to give the best
> >> >> results of all the options I tried... but I can't seem to understand
> >> >> why. I thought the query shown below should give irrelevant results,
> >> >> as sorting by date would throw relevancy off... but somehow it's
> >> >> getting relevant results with fair enough reverse chronology. It is
> >> >> as if the sort is applied after the docs are collected and reranked
> >> >> (which is what I wanted). One more thing that baffled me: if I change
> >> >> reRankDocs from 1000 to 100, the results become irrelevant, which
> >> >> doesn't make sense.
> >> >>
> >> >> So can you kindly explain whats going on in the following query.
> >> >>
> >> >> http://localhost:8080/solr/select?q=malaysian airline
> >> crash&rq={!rerank
> >> >> reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date
> >> >> desc&fl=headline,publish_date,score
> >> >>
> >> >> I love the solr community, so much to learn from so many
> knowledgeable
> >> >> people.
> >> >>
> >> >> Thanks
> >> >>
> >> >> Ravi Kiran Bhaskar
> >> >>
> >> >>
> >> >>
> >> >> On Fri, Sep 5, 2014 at 1:23 PM, Erick Erickson <
> >> erickerick...@gmail.com>
> >> >> wrote:
> >> >>
> >> >> > OK, why ca

Re: Query ReRanking question

2015-01-16 Thread Ravi Solr
As per Erick's suggestion, reposting my response to the group. Joel and
Erick, thank you very much for helping me out with the ReRanking question a
while ago.

I have an alternative which seems to be working better for me than
ReRanking; can you kindly let me know of any pitfalls you can think of with
this approach? Since we value relevancy and recency at the same time, even
though the two are mutually exclusive, I thought maybe I could use function
queries to adjust the boost as follows:

boost=max(recip(ms(NOW/HOUR,publish_date),7.889e-10,1,1),scale(query($q),0,1))

What I intended here is: if a more recent doc matches, recency is taken into
consideration; however, if the relevancy score is better than the date
boost, we keep relevancy. What do you guys think?
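
A sketch of the full request with that boost, assuming an edismax handler and
illustrative host, core, qf, and fl values (those are not from the original
mail):

http://localhost:8080/solr/select?defType=edismax
    &q=malaysia airline crash blackbox
    &qf=allfields
    &boost=max(recip(ms(NOW/HOUR,publish_date),7.889e-10,1,1),scale(query($q),0,1))
    &fl=headline,publish_date,score

With edismax the boost parameter is multiplied into the relevancy score, so
whichever arm of max() wins simply scales the match score up rather than
replacing it.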

Thanks,

Ravi Kiran Bhaskar


On Mon, Sep 8, 2014 at 12:35 PM, Ravi Solr  wrote:

> Joel and Erick,
> Thank you very much for explaining how the ReRanking works. Now
> it's a bit clearer.
>
> Thanks,
>
> Ravi Kiran Bhaskar
>
> On Sun, Sep 7, 2014 at 4:45 PM, Joel Bernstein  wrote:
>
>> Oops wrong usage pattern. It should be:
>>
>> 1) Main query is sorted by a field (scores tracked silently in the
>> background).
>> 2) Reranker is reRanking docs based on the score from the main query.
>>
>>
>>
>> Joel Bernstein
>> Search Engineer at Heliosearch
>>
>>
>> On Sun, Sep 7, 2014 at 4:43 PM, Joel Bernstein 
>> wrote:
>>
>> > OK, just reviewed the code. The ReRankQParserPlugin always tracks the
>> > scores from the main query, so this explains things. Speaking of
>> > explaining things, the ReRankQParserPlugin also works with Lucene's
>> > explain. So if you use debugQuery=true we should see that the score from
>> > the initial query was combined with the score from the reRankQuery,
>> > which should be 1.
>> >
>> > You have stumbled on an interesting usage pattern which I never
>> > considered. But basically what's happening is:
>> >
>> > 1) Main query is sorted by score.
>> > 2) Reranker is reRanking docs based on the score from the main query.
>> >
>> > No worries, Erick, you've taught me a lot over the past couple of years!
>> >
>> > Joel Bernstein
>> > Search Engineer at Heliosearch
>> >
>> >
>> > On Sun, Sep 7, 2014 at 11:37 AM, Erick Erickson <
>> erickerick...@gmail.com>
>> > wrote:
>> >
>> >> Joel:
>> >>
>> >> I find that whenever I say something totally wrong publicly, I
>> >> remember the correction really really well...
>> >>
>> >> Thanks for straightening that out!
>> >> Erick
>> >>
>> >> On Sat, Sep 6, 2014 at 12:58 PM, Joel Bernstein 
>> >> wrote:
>> >> > The following query:
>> >> >
>> >> > http://localhost:8080/solr/select?q=malaysian airline
>> crash&rq={!rerank
>> >> > reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date
>> >> > desc&fl=headline,publish_date,score
>> >> >
>> >> > Is doing the following:
>> >> >
>> >> > The main query is sorted by publish_date. Then the results are
>> reranked
>> >> by
>> >> > *:*, which in theory would have no effect at all.
>> >> >
>> >> > The re-ranker only uses the reRankQuery to re-rank the results; the
>> >> > sort param will always apply to the main query.
>> >> >
>> >> > Joel Bernstein
>> >> > Search Engineer at Heliosearch
>> >> >
>> >> >
>> >> > On Sat, Sep 6, 2014 at 2:33 PM, Ravi Solr 
>> wrote:
>> >> >
>> >> >> Erick,
>> >> >> Your idea of reversing Joel's suggestion seems to give the best
>> >> >> results of all the options I tried... but I can't seem to understand
>> >> >> why. I thought the query shown below should give irrelevant results,
>> >> >> as sorting by date would throw rel

Query slow with termVectors termPositions termOffsets

2013-03-25 Thread Ravi Solr
Hello,
We re-indexed our entire core of 115 docs with some of the
fields having termVectors="true" termPositions="true" termOffsets="true";
prior to the reindex we only had termVectors="true". After the reindex the
query component has become very slow. I thought that adding termOffsets and
termPositions would increase the speed; am I wrong? Several queries like the
one shown below, which used to run fine, are now very slow. Can somebody
kindly clarify how termOffsets and termPositions affect the query component?

[debug component timings, ms (XML tags lost in archiving): total 19076.0;
query component 18972.0; six components, including the
QueryElevationComponent, 0.0 each; the last component 104.0]



[#|2013-03-25T11:22:53.446-0400|INFO|sun-appserver2.1|org.apache.solr.core.SolrCore|_ThreadID=45;_ThreadName=httpSSLWorkerThread-9001-19;|[xxx]
webapp=/solr-admin path=/select
params={q=primarysectionnode:(/national*+OR+/health*)+OR+(contenttype:Blog+AND+subheadline:("The+Checkup"+OR+"Checkpoint+Washington"+OR+"Post+Carbon"+OR+TSA+OR+"College+Inc."+OR+"Campus+Overload"+OR+"Planet+Panel"+OR+"The+Answer+Sheet"+OR+"Class+Struggle"+OR+"BlogPost"))+OR+(contenttype:"Photo+Gallery"+AND+headline:"day+in+photos")&start=0&rows=1&sort=displaydatetime+desc&fq=-source:(Reuters+OR+"PC+World"+OR+"CBS+News"+OR+NC8/WJLA+OR+"NewsChannel+8"+OR+NC8+OR+WJLA+OR+CBS)+-contenttype:("Discussion"+OR+"Photo")+-slug:(op-*dummy*+OR+noipad-*)+-(contenttype:"Photo+Gallery"+AND+headline:("Drawing+Board"+OR+"Drawing+board"+OR+"drawing+board"))+headline:[*+TO+*]+contenttype:[*+TO+*]+pubdatetime:[NOW/DAY-3YEARS+TO+NOW/DAY%2B1DAY]+-headline:("Summary+Box*"+OR+"Video*"+OR+"Post+Sports+Live*")+-slug:(warren*+OR+"history")+-(contenttype:Blog+AND+subheadline:("DC+Schools+Insider"+OR+"On+Leadership"))+contenttype:"Blog"+-systemid:(999c7102-955a-11e2-95ca-dd43e7ffee9c+OR+72bbb724-9554-11e2-95ca-dd43e7ffee9c+OR+2d008b80-9520-11e2-95ca-dd43e7ffee9c+OR+d2443d3c-9514-11e2-95ca-dd43e7ffee9c+OR+173764d6-9520-11e2-95ca-dd43e7ffee9c+OR+0181fd42-953c-11e2-95ca-dd43e7ffee9c+OR+e6cacb96-9559-11e2-95ca-dd43e7ffee9c+OR+03288052-9501-11e2-95ca-dd43e7ffee9c+OR+ddbf020c-9517-11e2-95ca-dd43e7ffee9c)+fullbody:[*+TO+*]&wt=javabin&version=2}
hits=4985 status=0 QTime=19044 |#]
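
For reference, the kind of field definition in play looks like this, as a
sketch (the field and type names are illustrative, not from our schema; the
three termVector attributes are the point):

<field name="fullbody" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

Term vectors with positions and offsets are read by per-document consumers
such as highlighting; they are not used by ordinary query matching, so they
add index size and I/O without speeding up the query component.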

Thanks,

Ravi Kiran Bhaskar


Re: Query slow with termVectors termPositions termOffsets

2013-03-25 Thread Ravi Solr
Yes, the index size increased after turning on termPositions and termOffsets.

Ravi Kiran Bhaskar

On Mon, Mar 25, 2013 at 1:13 PM,  wrote:

> Did index size increase after turning on termPositions and termOffsets?
>
> Thanks.
> Alex.
>
>
>
>
>
>
>
> -----Original Message-
> From: Ravi Solr 
> To: solr-user 
> Sent: Mon, Mar 25, 2013 8:27 am
> Subject: Query slow with termVectors termPositions termOffsets
>
>
> Hello,
> We re-indexed our entire core of 115 docs with some of the
> fields having termVectors="true" termPositions="true" termOffsets="true";
> prior to the reindex we only had termVectors="true". After the reindex the
> query component has become very slow. I thought that adding termOffsets and
> termPositions would increase the speed; am I wrong? Several queries like the
> one shown below, which used to run fine, are now very slow. Can somebody
> kindly clarify how termOffsets and termPositions affect the query component?
>
> [debug component timings, ms (XML tags lost in archiving): total 19076.0;
> query component 18972.0; six components, including the
> QueryElevationComponent, 0.0 each; the last component 104.0]
> 
>
>
>
> [#|2013-03-25T11:22:53.446-0400|INFO|sun-appserver2.1|org.apache.solr.core.SolrCore|_ThreadID=45;_ThreadName=httpSSLWorkerThread-9001-19;|[xxx]
> webapp=/solr-admin path=/select
>
> params={q=primarysectionnode:(/national*+OR+/health*)+OR+(contenttype:Blog+AND+subheadline:("The+Checkup"+OR+"Checkpoint+Washington"+OR+"Post+Carbon"+OR+TSA+OR+"College+Inc."+OR+"Campus+Overload"+OR+"Planet+Panel"+OR+"The+Answer+Sheet"+OR+"Class+Struggle"+OR+"BlogPost"))+OR+(contenttype:"Photo+Gallery"+AND+headline:"day+in+photos")&start=0&rows=1&sort=displaydatetime+desc&fq=-source:(Reuters+OR+"PC+World"+OR+"CBS+News"+OR+NC8/WJLA+OR+"NewsChannel+8"+OR+NC8+OR+WJLA+OR+CBS)+-contenttype:("Discussion"+OR+"Photo")+-slug:(op-*dummy*+OR+noipad-*)+-(contenttype:"Photo+Gallery"+AND+headline:("Drawing+Board"+OR+"Drawing+board"+OR+"drawing+board"))+headline:[*+TO+*]+contenttype:[*+TO+*]+pubdatetime:[NOW/DAY-3YEARS+TO+NOW/DAY%2B1DAY]+-headline:("Summary+Box*"+OR+"Video*"+OR+"Post+Sports+Live*")+-slug:(warren*+OR+"history")+-(contenttype:Blog+AND+subheadline:("DC+Schools+Insider"+OR+"On+Leadership"))+contenttype:"Blog"+-systemid:(999c7102-955a-11e2-95ca-dd43e7ffee9c+OR+72bbb724-9554-11e2-95ca-dd43e7ffee9c+OR+2d008b80-9520-11e2-95ca-dd43e7ffee9c+OR+d2443d3c-9514-11e2-95ca-dd43e7ffee9c+OR+173764d6-9520-11e2-95ca-dd43e7ffee9c+OR+0181fd42-953c-11e2-95ca-dd43e7ffee9c+OR+e6cacb96-9559-11e2-95ca-dd43e7ffee9c+OR+03288052-9501-11e2-95ca-dd43e7ffee9c+OR+ddbf020c-9517-11e2-95ca-dd43e7ffee9c)+fullbody:[*+TO+*]&wt=javabin&version=2}
> hits=4985 status=0 QTime=19044 |#]
>
> Thanks,
>
> Ravi Kiran Bhaskar
>
>
>


Query Elevation exception on shard queries

2013-03-29 Thread Ravi Solr
Hello,
  We have a Solr 3.6.2 multicore setup, where each core is a complete
index for one application. In our site search we use a sharded query to
query two cores at a time. The issue is, if one core has docs for an
elevated query but the other core doesn't, Solr throws a 500 error. I would
really appreciate it if somebody could point me in the right direction on
how to avoid this error. The following is my query:

[#|2013-03-29T13:44:55.609-0400|INFO|sun-appserver2.1|org.apache.solr.core.SolrCore|_ThreadID=22;_ThreadName=httpSSLWorkerThread-9001-0;|[core1]
webapp=/solr path=/select/
params={q=civil+war&start=0&rows=10&shards=localhost:/solr/core1,localhost:/solr/core2&hl=true&hl.fragsize=0&hl.snippets=5&hl.simple.pre=&hl.simple.post=&hl.fl=body&fl=*&facet=true&facet.field=type&facet.mincount=1&facet.method=enum&fq=pubdate:[2005-01-01T00:00:00Z+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3D"Past+24+Hours"}pubdate:[NOW/DAY-1DAY+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3D"Past+7+Days"}pubdate:[NOW/DAY-7DAYS+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3D"Past+60+Days"}pubdate:[NOW/DAY-60DAYS+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3D"Past+12+Months"}pubdate:[NOW/DAY-1YEAR+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3D"All+Since+2005"}pubdate:[*+TO+NOW/DAY%2B1DAY]}
status=500 QTime=15 |#]


As you can see, the two cores are core1 and core2. core1 has data for the
query 'civil war'; however, core2 doesn't have any data. We have 'civil war'
in the elevate.xml, which causes Solr to throw a SolrException as follows.
However, if I remove the elevate entry for this query, everything works
well.
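
One way to confirm that elevation is the trigger, as a sketch
(enableElevation is the standard QueryElevationComponent request switch; the
shards value is abbreviated as in the log above), is to repeat the failing
sharded request with elevation turned off:

http://localhost:9001/solr/core1/select/?q=civil+war&shards=localhost:/solr/core1,localhost:/solr/core2&enableElevation=false

If the 500 disappears, that points at the merge of elevation sort values
across shards (the ShardFieldSortedHitQueue frames in the trace below) when
one shard has no docs for the elevated query.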

type: Status report

message: Index: 1, Size: 0

java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.solr.common.util.NamedList.getVal(NamedList.java:137)
    at org.apache.solr.handler.component.ShardFieldSortedHitQueue$ShardComparator.sortVal(ShardDoc.java:221)
    at org.apache.solr.handler.component.ShardFieldSortedHitQueue$2.compare(ShardDoc.java:260)
    at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:160)
    at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:101)
    at org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:223)
    at org.apache.lucene.util.PriorityQueue.add(PriorityQueue.java:132)
    at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:148)
    at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:786)
    at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:587)
    at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:566)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:283)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:313)
    at org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:287)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:218)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
    at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94)
    at com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:98)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:222)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
    at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:166)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
    at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093)
    at org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:291)
    at com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.invokeAdapter(DefaultProcessorTask.java:670)
    at com.sun.enterprise.web

Weird query issues

2013-04-19 Thread Ravi Solr
Hello,
We are using Solr 3.6.2 single core (both index and query on the same
machine), and randomly the server fails to query correctly. If we query from
the admin console the query is not even applied, and it returns a numFound
count equal to the total docs in the index, as if no query were made; and if
we use SolrJ to query, it throws a javabin error:

Invalid version (expected 2, but 60) or the data in not in 'javabin' format

Once we restart the container everything is back to normal.

In the process of debugging the Solr logs I found empty queries like the one
below. Can anybody tell me what can cause empty queries (params={}) in the
log? I am trying to see if they may be related to the Solr issues.

[#|2013-04-19T14:10:20.308-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=19;_ThreadName=httpSSLWorkerThread-9001-0;|[core1]
webapp=/solr path=/select params={} hits=21727 status=0 QTime=24 |#]

Would appreciate any pointers.

Thanks

Ravi Kiran Bhaskar



Re: Weird query issues

2013-04-20 Thread Ravi Solr
Thanks you very much for responding Shawn. I never use IE, I use firefox.
These are brand new servers and I don't think I am mixing versions. What
made you think I was using the 1.4.1 ?? You are correct in saying that the
server is throwing HTML response since a group query has been failing with
SEVERE error following which the entire instance behaves weirdly until we
restart.

It's surprising that group query error handling has such a glaring issue. If
you specify group=true but don't specify group.query or group.field, SOLR
throws a SEVERE exception, following which we see the empty queries and
finally no responses via solrj, and the admin console gives a numFound always
equal to the total number of docs in the index. It looks like the searcher goes
for a spin once it encounters the exception. Such a situation should have been
handled gracefully.
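
For now we guard against this on the client side; a minimal SolrJ sketch of
the idea (the helper class and its name are made up, not our production code):

import org.apache.solr.client.solrj.SolrQuery;

public class GroupQueryBuilder {
    // Only turn grouping on when a group.field is actually supplied, so the
    // request can never hit the "Specify at least one field, function or
    // query to group by" SEVERE path on the server.
    public static SolrQuery build(String q, String groupField) {
        SolrQuery query = new SolrQuery(q);
        if (groupField != null && groupField.length() > 0) {
            query.set("group", true);
            query.set("group.field", groupField);
        }
        return query;
    }
}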

[#|2013-04-19T23:47:53.363-0400|SEVERE|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=26;_ThreadName=httpSSLWorkerThread-9001-17;_RequestID=2f
933642-cad0-40e5-86c6-65b00be9bb97;|org.apache.solr.common.SolrException:
Specify at least one field, function or query to group by.
at org.apache.solr.search.Grouping.execute(Grouping.java:228)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:372)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:313)
at
org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:287)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:218)
at
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
at
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94)
at
com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:98)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:222)
at
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
at
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
at
org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:166)
at
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
at
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
at
org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093)
at org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:291)
at
com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.invokeAdapter(DefaultProcessorTask.java:670)
at
com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.doProcess(DefaultProcessorTask.java:601)
at
com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.process(DefaultProcessorTask.java:875)
at
com.sun.enterprise.web.connector.grizzly.DefaultReadTask.executeProcessorTask(DefaultReadTask.java:365)
at
com.sun.enterprise.web.connector.grizzly.DefaultReadTask.doTask(DefaultReadTask.java:285)
at
com.sun.enterprise.web.connector.grizzly.DefaultReadTask.doTask(DefaultReadTask.java:221)
at com.sun.enterprise.web.connector.grizzly.TaskBase.run(TaskBase.java:269)
at
com.sun.enterprise.web.connector.grizzly.ssl.SSLWorkerThread.run(SSLWorkerThread.java:111)
|#]

 
[#|2013-04-19T23:47:53.365-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=26;_ThreadName=httpSSLWorkerThread-9001-17;|[core1]
webapp=/solr path=/select
params={q=astronomy\+&rows=10&start=0&facet=true&fq=source:"xxx.com"&fq=locations:("Maryland")&sort=score+desc&group=true}
status=400 QTime=9 |#]



Ravi Kiran Bhaskar


On Fri, Apr 19, 2013 at 5:40 PM, Shawn Heisey  wrote:

> On 4/19/2013 12:55 PM, Ravi Solr wrote:
>
>> We are using Solr 3.6.2 single core ( both index and query on same
>> machine)
>> and randomly the server fails to query correctly.  If we query from the
>> admin console the query is not even applied and it returns numFound count
>> equal to total docs in the index as if no query is made, and if use SOLRJ
>&g

Re: Weird query issues

2013-04-20 Thread Ravi Solr
Thanks for your advice Shawn. I have created a JIRA issue, SOLR-4743.


On Sat, Apr 20, 2013 at 4:32 PM, Shawn Heisey  wrote:

> On 4/20/2013 9:08 AM, Ravi Solr wrote:
> > Thanks you very much for responding Shawn. I never use IE, I use firefox.
> > These are brand new servers and I don't think I am mixing versions. What
> > made you think I was using the 1.4.1 ?? You are correct in saying that
> the
> > server is throwing HTML response since a group query has been failing
> with
> > SEVERE error following which the entire instance behaves weirdly until we
> > restart.
> >
> > Its surprising that group query error handling has such glaring issue. If
> > you specify group=true but don't specify group.query or group.field SOLR
> > throws a SEVERE exception following which we see the empty queries and
> > finally no responses via solrj and admin console gives numFound always
> > equal to total number of docs in index . Looks like the searcher goes
> for a
> > spin once it encounters the exception. Such situation should have been
> > gracefully handled
>
> Ah, so what's happening is that after an invalid grouping query, Solr is
> unstable and stops working right.  You should file an issue in Jira,
> giving as much detail as you can.  My last message was almost completely
> wrong.
>
> You are right that it should be gracefully handled, and obviously it is
> not.  For the 3.x Solr versions, grouping did not exist before 3.6.  It
> is a major 4.x feature that was backported.  Sometimes such major
> features depend on significant changes that have not happened on older
> versions, leading to problems like this.  Unfortunately, you could wait
> quite a while for a fix on 3.6, where active development has stopped.
>
> I have no personal experience with grouping, but I just tried the
> problematic query (adding "&group=true" to one that works) on 4.2.1.  It
> doesn't throw an error, I just get no results. When I follow it with a
> regular query, everything works perfectly. Would you be able to upgrade
> to 4.2.1?  That's not a trivial thing to do, so hopefully you are
> already working on upgrading.
>
> Thanks,
> Shawn
>
>


Re: Dynamically loading Elevation Info

2013-04-22 Thread Ravi Solr
If you place the elevate.xml in the data directory of your index, it will be
loaded every time a commit happens.
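
For reference, a minimal elevate.xml of the form QueryElevationComponent
expects (the query text and doc ids here are made-up examples):

<elevate>
  <query text="civil war">
    <doc id="doc-1"/>
    <doc id="doc-2"/>
  </query>
</elevate>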

Thanks

Ravi Kiran Bhaskar


On Mon, Apr 22, 2013 at 7:38 AM, Erick Erickson wrote:

> I believe (but don't know for sure) that the QEV file is re-read on
> core reload, which the same app that modifies the elevator.xml file
> could trigger with an http request, see:
>
> http://wiki.apache.org/solr/CoreAdmin#RELOAD
>
> At least that's what I would try first.
>
> Best
> Erick
>
> On Mon, Apr 22, 2013 at 2:48 AM, Saroj C  wrote:
> > Hi,
> >  Business User wants to configure the elevation text and the IDs and they
> > want to have an UI to do the same. As soon as they configure, it should
> be
> > reflected  in SOLR,(without restarting).
> >
> > My understanding is, Now, the QueryElevationComponent reads the
> > Elevator.xml(Configurable) and loads the information into ElevationCache
> > during startup and uses the information while responding to queries. Is
> > there any way, the content in the ElevationCache can be modifiable  by
> > some other external process / is there any easy way of achieving this
> > requirement ?
> >
> > Thanks and Regards,
> > Saroj Kumar Choudhury
> >
> >
>


Re: Weird query issues

2013-04-26 Thread Ravi Solr
Hello Shawn,
We found that it is unrelated to the group queries and instead more
related to the empty queries. Do you happen to know what could cause empty
queries like the following from SOLRJ? I can generate a similar query via
curl by hitting the select handler with no parameters, like - http://server:port/solr/select

server.log_2013-04-26T05-02-22:[#|2013-04-26T04:33:39.065-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=38;_ThreadName=httpSSLWorkerTh
read-9001-11;|[xxxcore] webapp=/solr path=/select params={} hits=24099
status=0 QTime=19 |#]

What we are seeing is a huge number of these empty queries. Once this
happens I have observed 2 things:

1. Even if I query from the admin console, irrespective of the query, I get
the same results as if it were a cached page of a *:* query, i.e. I cannot see
the query I entered in the server log; the query doesn't even reach the
server, but I get the same results as *:*.

2. If I query via solrj, no results are returned.

This has been driving me nuts for almost a week. Any help is greatly
appreciated.

Thanks

Ravi Kiran Bhaskar



On Sat, Apr 20, 2013 at 10:33 PM, Ravi Solr  wrote:

> Thanks for your advise Shawn. I have created a JIRA issue SOLR-4743.
>
>
> On Sat, Apr 20, 2013 at 4:32 PM, Shawn Heisey  wrote:
>
>> On 4/20/2013 9:08 AM, Ravi Solr wrote:
>> > Thanks you very much for responding Shawn. I never use IE, I use
>> firefox.
>> > These are brand new servers and I don't think I am mixing versions. What
>> > made you think I was using the 1.4.1 ?? You are correct in saying that
>> the
>> > server is throwing HTML response since a group query has been failing
>> with
>> > SEVERE error following which the entire instance behaves weirdly until
>> we
>> > restart.
>> >
>> > Its surprising that group query error handling has such glaring issue.
>> If
>> > you specify group=true but don't specify group.query or group.field SOLR
>> > throws a SEVERE exception following which we see the empty queries and
>> > finally no responses via solrj and admin console gives numFound always
>> > equal to total number of docs in index . Looks like the searcher goes
>> for a
>> > spin once it encounters the exception. Such situation should have been
>> > gracefully handled
>>
>> Ah, so what's happening is that after an invalid grouping query, Solr is
>> unstable and stops working right.  You should file an issue in Jira,
>> giving as much detail as you can.  My last message was almost completely
>> wrong.
>>
>> You are right that it should be gracefully handled, and obviously it is
>> not.  For the 3.x Solr versions, grouping did not exist before 3.6.  It
>> is a major 4.x feature that was backported.  Sometimes such major
>> features depend on significant changes that have not happened on older
>> versions, leading to problems like this.  Unfortunately, you could wait
>> quite a while for a fix on 3.6, where active development has stopped.
>>
>> I have no personal experience with grouping, but I just tried the
>> problematic query (adding "&group=true" to one that works) on 4.2.1.  It
>> doesn't throw an error, I just get no results. When I follow it with a
>> regular query, everything works perfectly. Would you be able to upgrade
>> to 4.2.1?  That's not a trivial thing to do, so hopefully you are
>> already working on upgrading.
>>
>> Thanks,
>> Shawn
>>
>>
>


Re: Weird query issues

2013-04-26 Thread Ravi Solr
Thanks Shawn. We are using the 3.6.2 client and server. I cleared my browser
cache several times while querying (is that similar to clearing the cache in
solrconfig.xml?). The query is logged in the solrj-based client's
application container; however, I see it empty in solr's application
container... so somehow it is getting swallowed by solr. I am not able to
figure out how and why.
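
The solrconfig.xml knob I am looking at is the httpCaching element; a minimal
sketch (not our exact config) that would disable HTTP caching entirely, so a
browser can never serve a stale 304 result:

<requestDispatcher handleSelect="true">
  <httpCaching never304="true"/>
</requestDispatcher>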

Thanks
Ravi Kiran Bhaskar

On Fri, Apr 26, 2013 at 4:33 PM, Shawn Heisey  wrote:

> On 4/26/2013 1:01 PM, Ravi Solr wrote:
> > Hello Shawn,
> > We found that it is unrelated to the group queries instead more
> > related to the empty queries. Do you happen to know what could cause
> empty
> > queries like the following from SOLRJ ? I can generate similar query via
> > curl hitting the select handler like - http://server:port/solr/select
> >
> >
> server.log_2013-04-26T05-02-22:[#|2013-04-26T04:33:39.065-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=38;_ThreadName=httpSSLWorkerTh
> > read-9001-11;|[xxxcore] webapp=/solr path=/select params={} hits=24099
> > status=0 QTime=19 |#]
> >
> > What we are seeing is a huge number of these empty queries. Once this
> > happens I have observed 2 things
> >
> > 1. even if I query from admin console, irrespective of the query, I get
> > same results as if its a cached page of *:* query. i.e. I cannot see the
> > query I entered in the server log, the query doesn't even come to the
> > server but I get same results as *:*
> >
> > 2. If I query via solrj no results are returned.
> >
> > This has been driving me nuts for almost a week. Any help is greatly
> > appreciated.
>
> Querying from the admin UI and not seeing anything in the server log
> sounds like browser caching.  You can turn that off in solrconfig.xml.
>
> I could not duplicate what you're seeing with SolrJ.  You didn't say
> what version of SolrJ, so I did this using 3.6.2 (same as your server
> version).  I thought maybe if you had a query object that didn't have an
> actual query set, it might do what you're seeing, but that doesn't
> appear to be the case.  I don't have a 3.6.2 server to test against, so
> I used my 3.5.0 and 4.2.1 servers.
>
> Test code:
> http://pastie.org/private/bnvurz1f9b9viawgqbxvmq
>
> Solr 4.2.1 log:
> INFO  - 2013-04-26 14:17:24.127; org.apache.solr.core.SolrCore; [ncmain]
> webapp=/solr path=/select params={wt=xml&version=2.2} hits=0 status=0
> QTime=20
>
> 3.5.0 server log:
> 
> Apr 26, 2013 2:20:23 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.NullPointerException
>
> Apr 26, 2013 2:20:23 PM org.apache.solr.core.SolrCore execute
> INFO: [ncmain] webapp=/solr path=/select params={wt=xml&version=2.2}
> status=500 QTime=0
> Apr 26, 2013 2:20:23 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.NullPointerException
> 
>
>
> Same code without the setParser line:
>
> Solr 4.2.1 log:
> INFO  - 2013-04-26 14:14:01.270; org.apache.solr.core.SolrCore; [ncmain]
> webapp=/solr path=/select params={wt=javabin&version=2} hits=0 status=0
> QTime=187
>
> Thanks,
> Shawn
>
>


Server inconsistent state & Core Reload issue

2013-05-01 Thread Ravi Solr
We are using Solr 3.6.2 with a single-core setup on a glassfish server;
every 4-5 hours the server gradually gets into some kind of an
inconsistent state and stops handling queries correctly, giving back cached
results. Even the core reload fails, giving the following. Has anybody
experienced such behavior? Can anybody help me understand why this might
happen?

http://searchserver:80/solr/admin/cores?action=RELOAD&core=core1


 
<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">9</int>
 </lst>
 <lst name="status">
  <lst name="core1">
   <str name="name">core1</str>
   <str name="instanceDir">/data/solr/core1-home/</str>
   <str name="dataDir">/data/solr/core/core1-data/</str>
   <date name="startTime">2013-05-01T19:16:31.32Z</date>
   <long name="uptime">137850</long>
   <lst name="index">
    <int name="numDocs">21479</int>
    <int name="maxDoc">25170</int>
    <long name="version">1367184551418</long>
    <int name="segmentCount">4</int>
    <bool name="current">true</bool>
    <bool name="hasDeletions">true</bool>
    <str name="directory">org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/data/solr/core/core1-data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@71d9673</str>
    <date name="lastModified">2013-05-01T19:15:04Z</date>
   </lst>
  </lst>
 </lst>
</response>



During the inconsistent state, any queries issued to the server lose
their query parameters. We can see the proper queries in the container's http
access logs, but somehow solr doesn't get the query params at all. Also
note that the "content length" in the container's access logs is always 68935,
which implies it's always giving back the same docs irrespective of the query.

If we restart the server everything is back to normal and the same queries
run properly.

SOLR Log
--
[#|2013-05-01T15:20:02.031-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=20;_ThreadName=httpSSLWorkerThread-9001-1;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=17 |#]

[#|2013-05-01T15:20:02.034-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=24;_ThreadName=httpSSLWorkerThread-9001-4;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=13 |#]

[#|2013-05-01T15:20:02.055-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=23;_ThreadName=httpSSLWorkerThread-9001-3;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=13 |#]

[#|2013-05-01T15:20:02.081-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=25;_ThreadName=httpSSLWorkerThread-9001-5;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=14 |#]

[#|2013-05-01T15:20:02.106-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=19;_ThreadName=httpSSLWorkerThread-9001-0;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=14 |#]

[#|2013-05-01T15:20:02.136-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=22;_ThreadName=httpSSLWorkerThread-9001-2;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=16 |#]

[#|2013-05-01T15:20:02.161-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=20;_ThreadName=httpSSLWorkerThread-9001-1;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=15 |#]

[#|2013-05-01T15:20:02.185-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=24;_ThreadName=httpSSLWorkerThread-9001-4;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=14 |#]

[#|2013-05-01T15:20:02.209-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=23;_ThreadName=httpSSLWorkerThread-9001-3;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=14 |#]

[#|2013-05-01T15:20:02.241-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=25;_ThreadName=httpSSLWorkerThread-9001-5;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=16 |#]

[#|2013-05-01T15:20:02.266-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=19;_ThreadName=httpSSLWorkerThread-9001-0;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=15 |#]

[#|2013-05-01T15:20:02.288-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=22;_ThreadName=httpSSLWorkerThread-9001-2;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=14 |#]

[#|2013-05-01T15:20:02.291-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=20;_ThreadName=httpSSLWorkerThread-9001-1;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=15 |#]



Container Access Logs
-
"xx.xxx.xx.xx" "" "01/May/2013:15:20:02 -0500" "GET
/solr/core1/select?q=*%3A*&rows=250&start=0&facet=true&fq=source%3A%
22site.com%22&fq=categories%3A%28%22Music+Venues%22%29&fl=name%2Cnamestring%2Cscore&sort=namestring+desc&wt=javabin&version=2&wt=javabin&version=2
HTTP/1.1" 200 68935

"xx.xxx.xx.xx" "" "01/May/2013:15:20:02 -0500" "GET
/solr/core1/select?q=*%3A*&rows=5&start=0&facet=true&fq=source%3A%22site.com%22&fq=categories%3A%28%22Exhibits%22%29&fq=types%3A%28%22Painting%2FDrawing%22%29&sort=closingdate

Re: Server inconsistent state & Core Reload issue

2013-05-01 Thread Ravi Solr
Shawn,
  I don't believe it's the container, because we use the same container
in another setup that has 6 cores and serves almost 1.8 million
requests a day without a hitch.

If you look at my email, the container that is running SOLR got the request
params (http access logs provided in the first email), but when the request
goes through the SOLR app/code on the container (probably through request
filters or dispatchers... I don't know exactly) they are getting lost, which
is what I am trying to understand. I want to understand under what situations
this may happen.

Having said that, the application that uses this problematic SOLR instance
retrieves a large number of facet results for each of 26 facets on every
query, and every query is a group query. Would that cause any issues with the
SOLR caches that could lead to the problems I am facing?
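
For context on why I ask: with facet.method=enum each facet term is checked
against the filterCache, so its sizing in solrconfig.xml may matter here.
Ours is roughly of this form (the numbers below are illustrative, not our
real values):

<filterCache class="solr.FastLRUCache"
             size="16384"
             initialSize="4096"
             autowarmCount="4096"/>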

With regards to the port number, our paranoid security folks wanted me not
to reveal our ports so I put it as 80 without thinking :-). I assure you
that it's not 80.

Thanks,

Ravi


On Wed, May 1, 2013 at 6:03 PM, Shawn Heisey  wrote:

> On 5/1/2013 3:14 PM, Ravi Solr wrote:
>
>> We are using Solr 3.6.2 with a single core setup on a glassfish server,
>> every 4-5 hours the server gradually gets into a some kind of a
>> inconsistent state and stops accepting any queries giving back cached
>> results. Even the core reload fails giving the following. Has anybody
>> experienced such behavior ? Can anybody help me understand why this might
>> happen ?
>>
>> http://searchserver:80/solr/**admin/cores?action=RELOAD&**core=core1<http://searchserver:80/solr/admin/cores?action=RELOAD&core=core1>
>>
>> 
>>   
>>0
>>9
>>   
>>   
>>
>
> It is dropping the parameters from the /admin/cores request too, so it
> returns status instead of acting on the RELOAD.
>
> This is acting like a servlet container issue more than a Solr issue. It's
> always possible that it actually is Solr.
>
> It's a little unusual to see Solr running on port 80.  It's not
> impossible, just not the normal setup, because exposing Solr directly to
> the outside world is a very bad idea, so it's a lot safer to have it listen
> on another port.
>
> Is glassfish actually listening on port 80?  If it's not, then you
> probably have something acting as a proxy in front of Solr.  If your
> platform is a UNIX variant or Linux and has a fully functional 'lsof'
> command, the following will tell you which process is bound to port 80:
>
> lsof -nPi | grep ":80"
>
> Can you try running Solr under the jetty that's included with the Solr
> download?  For Solr 3.6.2, this is a slightly modified Jetty 6.  You can't
> use the Jetty 8 that's included with a newer version of Solr.  If port 80
> is a requirement, that should be possible as long as it's running as root.
>
> Thanks,
> Shawn
>
>


Re: Query Elevation exception on shard queries

2013-05-06 Thread Ravi Solr
Varun,
 Since our cores were totally disjoint, i.e. they pertain to two
different applications which may or may not have results for a given query,
we moved the elevation outside of solr into our java code. (Sharded
elevation would only work as long as both cores had some results to return
for a given query.)
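
A rough sketch of the kind of client-side elevation we ended up with (the
class, constructor and config map are illustrative, not our production code;
systemid is the unique key field from our fl parameter):

import java.util.List;
import java.util.Map;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class ClientSideElevator {

    // query text -> ordered doc ids to pin, loaded from our own config
    private final Map<String, List<String>> elevations;

    public ClientSideElevator(Map<String, List<String>> elevations) {
        this.elevations = elevations;
    }

    // Walk the configured ids in reverse and move each matching doc to the
    // front, so the pinned docs end up on top in their configured order.
    public void elevate(String queryText, SolrDocumentList docs) {
        List<String> ids = elevations.get(queryText.toLowerCase());
        if (ids == null) {
            return;
        }
        for (int i = ids.size() - 1; i >= 0; i--) {
            for (int j = 0; j < docs.size(); j++) {
                SolrDocument doc = docs.get(j);
                if (ids.get(i).equals(doc.getFieldValue("systemid"))) {
                    docs.remove(j);
                    docs.add(0, doc);
                    break;
                }
            }
        }
    }
}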

Thanks,

Ravi


On Sat, May 4, 2013 at 1:54 PM, varun srivastava wrote:

> Hi Ravi,
>  I am getting same probelm . You got any solution ?
>
> Thanks
> Varun
>
>
> On Fri, Mar 29, 2013 at 11:48 AM, Ravi Solr  wrote:
>
> > Hello,
> >   We have a Solr 3.6.2 multicore setup, where each core is a complete
> > index for one application. In our site search we use sharded query to
> query
> > two cores at a time. The issue is, If one core has docs but other core
> > doesn't for an elevated query solr is throwing a 500 error. I woudl
> really
> > appreciate it if somebody can point me in the right direction on how to
> > avoid this error, the following is my query
> >
> >
> >
> [#|2013-03-29T13:44:55.609-0400|INFO|sun-appserver2.1|org.apache.solr.core.SolrCore|_ThreadID=22;_ThreadName=httpSSLWorkerThread-9001-0;|[core1]
> > webapp=/solr path=/select/
> >
> >
> params={q=civil+war&start=0&rows=10&shards=localhost:/solr/core1,localhost:/solr/core2&hl=true&hl.fragsize=0&hl.snippets=5&hl.simple.pre=&hl.simple.post=&hl.fl=body&fl=*&facet=true&facet.field=type&facet.mincount=1&facet.method=enum&fq=pubdate:[2005-01-01T00:00:00Z+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3D"Past+24+Hours"}pubdate:[NOW/DAY-1DAY+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3D"Past+7+Days"}pubdate:[NOW/DAY-7DAYS+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3D"Past+60+Days"}pubdate:[NOW/DAY-60DAYS+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3D"Past+12+Months"}pubdate:[NOW/DAY-1YEAR+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3D"All+Since+2005"}pubdate:[*+TO+NOW/DAY%2B1DAY]}
> > status=500 QTime=15 |#]
> >
> >
> > As you can see the 2 cores are core1 and core2. The core1 has data for he
> > query 'civil war' however core2 doesn't have any data. We have the 'civil
> > war' in the elevate.xml which causes Solr to throw a SolrException as
> > follows. However if I remove the elevate entry for this query, everything
> > works well.
> >
> > *type* Status report
> >
> > *message*Index: 1, Size: 0 java.lang.IndexOutOfBoundsException: Index: 1,
> > Size: 0 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at
> > java.util.ArrayList.get(ArrayList.java:322) at
> > org.apache.solr.common.util.NamedList.getVal(NamedList.java:137) at
> >
> >
> org.apache.solr.handler.component.ShardFieldSortedHitQueue$ShardComparator.sortVal(ShardDoc.java:221)
> > at
> >
> >
> org.apache.solr.handler.component.ShardFieldSortedHitQueue$2.compare(ShardDoc.java:260)
> > at
> >
> >
> org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:160)
> > at
> >
> >
> org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:101)
> > at org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:223) at
> > org.apache.lucene.util.PriorityQueue.add(PriorityQueue.java:132) at
> >
> >
> org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:148)
> > at
> >
> >
> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:786)
> > at
> >
> >
> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:587)
> > at
> >
> >
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:566)
> > at
> >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:283)
> > at
> >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376) at
> >
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
> > at
> >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
> > at
> >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
> > at
> >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
> > at
> >
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:

Re: ConcurrentUpdateSolrServer "Missing ContentType" error on SOLR 4.2.1

2013-05-06 Thread Ravi Solr
I apologize for intruding. Shawn, do you know what can cause empty params
(i.e. params={})?

Ravi


On Mon, May 6, 2013 at 5:47 PM, Shawn Heisey  wrote:

> On 5/6/2013 1:25 PM, cleardot wrote:
>
>> My SolrJ client uses ConcurrentUpdateSolrServer to index > 50Gs of docs
>> to a
>> SOLR 3.6 instance on my Linux box.  When running the same client against
>> SOLR 4.2.1 on EC2 I got the following:
>>
>
> 
>
>
>  SOLR 4.2.1 log error
>> ==**
>> INFO: [mycore] webapp=/solr path=/update params={} {} 0 0
>> May 6, 2013 6:13:55 PM org.apache.solr.common.**SolrException log
>> SEVERE: org.apache.solr.common.**SolrException: Missing ContentType
>>
>
> This isn't the first time I've seen empty params in a Solr log on this
> list, but the other one was with 3.6.2 for both server and client.  Is
> "params={}" what actually got logged, or did you remove the stuff there to
> sanitize your logs on a public list?
>
> Are you by chance setting the response parser on your solr server object
> to something besides the Binary (javabin) parser?  If you are, could you
> remove the setParser call in your client code?  The only time you need to
> change the parser is when you're using SolrJ with a version of Solr that
> does not have the same javabin version.  The javabin version was v1 in Solr
> 1.4.1 and earlier, then v2 in 3.1.0 and later.  The other response parsers
> are less efficient than javabin.
>
> Thanks,
> Shawn
>
>


Is SOLR-2623 fixed ? Still issue with SOLR 3.6

2012-04-15 Thread Ravi Solr
Hello folks,
We are trying to access JMX data from a SOLR 3.6 multi-core
setup and feed it into Nagios. Once we reload a core, JMX no longer
works and we cannot get any data. Prior to moving to SOLR 3.6, I heard
that SOLR-2623 might have fixed the core reload issue. I reloaded one
of the cores yesterday and from then onwards I cannot get any JMX data.
Has this issue been fixed, or am I misunderstanding the JIRA
description? Is there any workaround?
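
A minimal sketch of the kind of JMX probe involved (the host, port and bean
name pattern below are placeholders/assumptions, not our exact Nagios plugin;
Solr registers its beans in a domain starting with "solr", e.g.
"solr/corename" per core):

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SolrJmxProbe {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            Set<ObjectName> names =
                    mbs.queryNames(new ObjectName("solr*:*"), null);
            System.out.println(names.size() + " Solr MBeans visible");
            for (ObjectName name : names) {
                System.out.println(name);  // after RELOAD this set goes empty for us
            }
        } finally {
            connector.close();
        }
    }
}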

Thanks

Ravi Kiran Bhaskar


Invalid version (expected 2, but 60) on CentOS in production please Help!!!

2012-05-04 Thread Ravi Solr
Hello,
 We recently migrated our SOLR 3.6 server OS from Solaris
to CentOS, and from then on we started seeing "Invalid version
(expected 2, but 60)" errors on one of the query servers (oddly, the
other query server seems fine). If we restart the server having the
issue everything is alright, but the next morning we again
get the same exception. I made sure that all the client applications
are using the SOLR 3.6 version.

The Glassfish on which all the applications and SOLR are deployed uses
Java 1.6.0_29. The only difference I could see:

1. The process indexing to the server having issues is using java 1.6.0_31
2. The process indexing to the server that DOES NOT have issues is
using java 1.6.0_29

Could the indexing process's Java minor version being greater than the
SOLR instance's be the cause of this issue?

Can anybody please help me debug this a bit more ? what else can I
look at to understand the underlying problem. The stack trace is given
below


[#|2012-05-04T09:58:43.985-0400|SEVERE|sun-appserver2.1.1|xxx...|_ThreadID=32;_ThreadName=httpSSLWorkerThread-9001-7;_RequestID=a19f92cc-2a8c-47e8-b159-a20330f14af5;
org.apache.solr.client.solrj.SolrServerException: Error executing query
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:311)
at 
com.wpost.ipad.feeds.FeedController.findLinksetNewsBySection(FeedController.java:743)
at 
com.wpost.ipad.feeds.FeedController.findNewsBySection(FeedController.java:347)
at sun.reflect.GeneratedMethodAccessor282.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:175)
at 
org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:421)
at 
org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:409)
at 
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:774)
at 
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:719)
at 
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:644)
at 
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:549)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
at 
org.apache.catalina.core.ApplicationFilterChain.servletService(ApplicationFilterChain.java:427)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:315)
at 
org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:287)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:218)
at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94)
at 
com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:98)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:222)
at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
at 
org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
at 
org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:166)
at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
at 
org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
at 
org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093)
at 
org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:291)
at 
com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.invokeAdapter(DefaultProcessorTask.java:670)
at 
com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.doProcess(DefaultProcessorTask.java:601)
at 
com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.process(DefaultProcessorTask.java:875)
at 
com.sun.enterprise.web.connector.grizzly.DefaultReadTask.executeProcessorTask(DefaultReadTask.java:365)
at 
com.sun

Invalid version expected 2, but 60 on CentOS

2012-05-04 Thread Ravi Solr
Hello,
We recently we migrated our production SOLR 3.6 servers OS
from Solaris to CentOS and from then on we started seeing "Invalid
version (expected 2, but 60)" errors on one of the query servers
(oddly one other query server seems fine). If we restart the
problematic server everything returns to normalcy, but the next day in
the morning again we get the same exception. I made sure that all the
client applications are using SOLR 3.6 version.

The Glassfish on which all the applications and SOLR are deployed uses
Java 1.6.0_29. The only difference I could see:

1. The process indexing to the server having issues is using java 1.6.0_31
2. The process indexing to the server that DOES NOT have issues is
using java 1.6.0_29

Could the indexing process's Java minor version being greater than the
SOLR instance's be the cause of this issue?

Can anybody please help me debug this a bit more ? what else can I
look at to understand the underlying problem. The stack trace is given
below


[#|2012-05-04T09:58:43.985-0400|SEVERE|sun-appserver2.1.1|xxx...|_ThreadID=32;_ThreadName=httpSSLWorkerThread-9001-7;_RequestID=a19f92cc-2a8c-47e8-b159-a20330f14af5;
org.apache.solr.client.solrj.SolrServerException: Error executing query
   at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
   at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:311)
   at 
xxx.xxx.xxx.FeedController.findLinksetNewsBySection(FeedController.java:743)
   at xxx.xxx.xxx.FeedController.findNewsBySection(FeedController.java:347)
   at sun.reflect.GeneratedMethodAccessor282.invoke(Unknown Source)
   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:175)
   at 
org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:421)
   at 
org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:409)
   at 
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:774)
   at 
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:719)
   at 
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:644)
   at 
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:549)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
   at 
org.apache.catalina.core.ApplicationFilterChain.servletService(ApplicationFilterChain.java:427)
   at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:315)
   at 
org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:287)
   at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:218)
   at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
   at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
   at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94)
   at 
com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:98)
   at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:222)
   at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
   at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
   at 
org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
   at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093)
   at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:166)
   at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
   at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
   at 
org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
   at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093)
   at 
org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:291)
   at 
com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.invokeAdapter(DefaultProcessorTask.java:670)
   at 
com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.doProcess(DefaultProcessorTask.java:601)
   at 
com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.process(DefaultProcessorTask.java:875)
   at 
com.sun.enterprise.web.connector.grizzly.DefaultReadTask.executeProcessorTask(DefaultReadTask.java:365)
   at 
com.sun.enterprise.web.connector.grizzly.DefaultReadTask

Re: Invalid version expected 2, but 60 on CentOS

2012-05-04 Thread Ravi Solr
Thank you very much for responding Mr. Miller. There are 5 different
apps deployed on the same server as SOLR, and all apps call SOLR via
SOLRJ with localhost:8080/solr/sitecore as the constructor url for
HttpSolrServer. Out of all these 5 apps, only one has this
issue... if it were really the web server/container throwing 404s, then
it should happen to the other apps as well, as they all call the same
core. This is what makes me believe it's not just the web
server/container. Do I make sense?
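
As a side note on the number in the error (a detail that supports the HTML
theory, not a fix): the javabin codec reads the first byte of the response
body as the protocol version, and byte 60 is ASCII '<', i.e. the body
started with markup such as an error page.

public class WhyByte60 {
    public static void main(String[] args) {
        // The "but 60" in the error is simply the first byte it saw:
        System.out.println((int) '<');  // prints 60
    }
}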

Thanks,

Ravi Kiran

On Fri, May 4, 2012 at 4:28 PM, Mark Miller  wrote:
>
> On May 4, 2012, at 4:09 PM, Ravi Solr wrote:
>
>> Thanking you in anticipation,
>
> Generally this happens because the webapp server is returning an html error 
> response of some kind. Often it's a 404.
>
> I think in trunk this might have been addressed - that is, it's easier to see 
> the true error. Not positive though.
>
> Some non success html response is likely coming back though.
>
> - Mark Miller
> lucidimagination.com
>


Re: Invalid version (expected 2, but 60) on CentOS in production please Help!!!

2012-05-06 Thread Ravi Solr
Thank you very much for responding Mr. Erickson. You may be right about an
old-version index; I will reindex. However, we have 2
separate/disjoint master-slave setups... only one query node/slave has
this issue. If it were really incompatible indexes, why isn't the other
query server also throwing errors? That's what is throwing my
debugging thought process off.

Thanks

Ravi Kiran Bhaskar
Principal Software Engineer
Washington Post Digital
1150 15th Street NW, Washington, DC 20071

On Sat, May 5, 2012 at 12:53 PM, Erick Erickson  wrote:
> The first thing I'd check is if, in the log, there is a replication happening
> immediately prior to the error. I confess I'm not entirely up on the
> version thing, but is it possible you're replicating an index that
> is built with some other version of Solr?
>
> That would at least explain your statement that it runs OK, but then
> fails sometime later.
>
> Best
> Erick
>
> On Fri, May 4, 2012 at 1:50 PM, Ravi Solr  wrote:
>> Hello,
>>         We Recently we migrated our SOLR 3.6 server OS from Solaris
>> to CentOS and from then on we started seeing "Invalid version
>> (expected 2, but 60)" errors on one of the query servers (oddly one
>> other query server seems fine). If we restart the server having issue
>> everything will be alright, but the next day in the morning again we
>> get the same exception. I made sure that all the client applications
>> are using SOLR 3.6 version.
>>
>> The Glassfish on which all the applications  and SOLR are deployed use
>> Java  1.6.0_29. The only difference I could see
>>
>> 1. The process indexing to the server having issues is using java1.6.0_31
>> 2. The process indexing to the server that DOES NOT have issues is
>> using java1.6.0_29
>>
>> Could the Java minor version being greater than the SOLR instance be
>> the cause of this issue  ???
>>
>> Can anybody please help me debug this a bit more ? what else can I
>> look at to understand the underlying problem. The stack trace is given
>> below
>>
>>
>> [#|2012-05-04T09:58:43.985-0400|SEVERE|sun-appserver2.1.1|xxx...|_ThreadID=32;_ThreadName=httpSSLWorkerThread-9001-7;_RequestID=a19f92cc-2a8c-47e8-b159-a20330f14af5;
>> org.apache.solr.client.solrj.SolrServerException: Error executing query
>>        at 
>> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
>>        at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:311)
>>        at 
>> com.wpost.ipad.feeds.FeedController.findLinksetNewsBySection(FeedController.java:743)
>>        at 
>> com.wpost.ipad.feeds.FeedController.findNewsBySection(FeedController.java:347)
>>        at sun.reflect.GeneratedMethodAccessor282.invoke(Unknown Source)
>>        at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at 
>> org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:175)
>>        at 
>> org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:421)
>>        at 
>> org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:409)
>>        at 
>> org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:774)
>>        at 
>> org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:719)
>>        at 
>> org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:644)
>>        at 
>> org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:549)
>>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
>>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
>>        at 
>> org.apache.catalina.core.ApplicationFilterChain.servletService(ApplicationFilterChain.java:427)
>>        at 
>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:315)
>>        at 
>> org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:287)
>>        at 
>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:218)
>>        at 
>> org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
>>        at 
>> org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
>>        at com.sun.enterprise.w

Re: Invalid version (expected 2, but 60) on CentOS in production please Help!!!

2012-05-07 Thread Ravi Solr
Hello Mr. Miller and Mr. Erickson,
  Found yet another inconsistency on the query server that
might be causing this issue. This morning I again got a similar error,
as shown in the stacktrace below. So I tried querying for
"d101dd3a-979a-11e1-927c-291130c98dff", which is our unique key in the
schema.

On the server having the issue it returned more than 10 docs with
numFound="1051273", and on all the other sane servers it returned only 1
doc with numFound="1". This is really weird, as we copied the entire
index from a sane server onto the server having issues just 2 days
ago. Do you have any idea why this would happen?

[#|2012-05-07T12:58:54.055-0400|SEVERE|sun-appserver2.1.1|com.wpost.ipad.feeds.FeedController|_ThreadID=22;_ThreadName=httpSSLWorkerThread-9001-3;_RequestID=4203e3e5-c39d-4df7-a32a-600d0169c81f;|Error
searching for thumbnails for d101dd3a-979a-11e1-927c-291130c98dff
org.apache.solr.client.solrj.SolrServerException: Error executing query
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:311)
at xxx.xxx.xxx.xxx.populateThumbnails(FeedController.java:1184)
at xxx.xxx.xxx.xxx..findNewsBySection(FeedController.java:509)
at sun.reflect.GeneratedMethodAccessor197.invoke(Unknown Source)
..
...
..
Caused by: java.lang.RuntimeException: Invalid version (expected 2,
but 60) or the data in not in 'javabin' format
at 
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
at 
org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:333)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
... 43 more
|#]

Ravi Kiran Bhaskar
Principal Software Engineer
Washington Post Digital
1150 15th Street NW, Washington, DC 20071

On Mon, May 7, 2012 at 9:36 AM, Mark Miller  wrote:
> Normally this specific error is caused by a non success http error page and 
> response is returned. The response parser tries to parse HTML as javabin.
>
> Sent from my iPhone
>
> On May 7, 2012, at 7:37 AM, Erick Erickson  wrote:
>
>> Well, I'm guessing that the version of Solr (and perhaps there are
>> classpath issues in here?) are different, somehow, on the machine
>> slave that is showing the error.
>>
>> It's also possible that your config files have a different  LUCENE_VERSION
>> in them, although I don't think this should really create the errors you're
>> reporting.
>>
>> The thing that leads me in this direction is your statement that things
>> are fine for a while and then go bad later.  If replication happens just
>> before you get the index version error, that would point a finger at
>> something like different Solr versions.
>>
>> If there is no replication before this error, then this probably isn't
>> the problem
>> and we'll have to look elsewhere...
>>
>> But this is all guesswork, just like every bug... things are only obvious 
>> after
>> you find the problem!
>>
>> Best
>> Erick
>>
>>
>> On Sun, May 6, 2012 at 11:08 AM, Ravi Solr  wrote:
>>> Thank you very much for responding Mr.Erickson. You may be right on
>>> old version index, I will reindex. However we have a 2
>>> separate/disjoint master-slave setup...only one query node/slave has
>>> this issue. if it was really incompatible indexes why isnt the other
>>> query server also throwing errors? that's what is throwing my
>>> debugging thought process off.
>>>
>>> Thanks
>>>
>>> Ravi Kiran Bhaskar
>>> Principal Software Engineer
>>> Washington Post Digital
>>> 1150 15th Street NW, Washington, DC 20071
>>>
>>> On Sat, May 5, 2012 at 12:53 PM, Erick Erickson  
>>> wrote:
>>>> The first thing I'd check is if, in the log, there is a replication 
>>>> happening
>>>> immediately prior to the error. I confess I'm not entirely up on the
>>>> version thing, but is it possible you're replicating an index that
>>>> is built with some other version of Solr?
>>>>
>>>> That would at least explain your statement that it runs OK, but then
>>>> fails sometime later.
>>>>
>>>> Best
>>>> Erick
>>>>
>>>> On Fri, May 4, 2012 at 1:50 PM, Ravi Solr  wrote:
>>>>> Hell

Re: Invalid version (expected 2, but 60) on CentOS in production please Help!!!

2012-05-10 Thread Ravi Solr
I cleaned the entire index and re-indexed it with SOLRJ 3.6. Still I get
the same error every single day. How can I see whether the container
returned a partial/nonconforming response, since it may be hidden by
solrj?
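
One way I can think of to check is to issue the same request outside solrj
and dump the raw body; a minimal sketch (the URL and query are placeholders,
substitute the failing request):

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RawResponseDump {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8080/solr/select?q=*:*&wt=javabin&version=2");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        System.out.println("HTTP " + conn.getResponseCode()
                + "  Content-Type: " + conn.getContentType());
        InputStream in = conn.getResponseCode() < 400
                ? conn.getInputStream() : conn.getErrorStream();
        // javabin is binary, so this is only for eyeballing: readable
        // HTML here means a container error page, not a Solr response.
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
        for (String line; (line = reader.readLine()) != null; ) {
            System.out.println(line);
        }
        reader.close();
    }
}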

Thanks

Ravi Kiran Bhaskar

On Mon, May 7, 2012 at 2:16 PM, Ravi Solr  wrote:
> Hello Mr. Miller and Mr. Erickson,
>              Found yet another inconsistency on the query server that
> might be causing this issue. Today morning also I got a similar error
> as shown in stacktrace below. So I tried querying for that
> "d101dd3a-979a-11e1-927c-291130c98dff" which is our unique key in the
> schema.
>
> On the server having issue it returned more than 10 docs with
> numFound="1051273" and on all other sane servers it returned only 1
> doc with numFound="1". This is really weird, as, we copied the entire
> index from a sane server onto the server having issues now just 2 days
> ago. Do you have any idea why this would happen ?
>
> [#|2012-05-07T12:58:54.055-0400|SEVERE|sun-appserver2.1.1|com.wpost.ipad.feeds.FeedController|_ThreadID=22;_ThreadName=httpSSLWorkerThread-9001-3;_RequestID=4203e3e5-c39d-4df7-a32a-600d0169c81f;|Error
> searching for thumbnails for d101dd3a-979a-11e1-927c-291130c98dff
> org.apache.solr.client.solrj.SolrServerException: Error executing query
>        at 
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
>        at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:311)
>        at xxx.xxx.xxx.xxx.populateThumbnails(FeedController.java:1184)
>        at xxx.xxx.xxx.xxx..findNewsBySection(FeedController.java:509)
>        at sun.reflect.GeneratedMethodAccessor197.invoke(Unknown Source)
> ..
> ...
> ..
> Caused by: java.lang.RuntimeException: Invalid version (expected 2,
> but 60) or the data in not in 'javabin' format
>        at 
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
>        at 
> org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
>        at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:333)
>        at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
>        at 
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
>        ... 43 more
> |#]
>
> Ravi Kiran Bhaskar
> Principal Software Engineer
> Washington Post Digital
> 1150 15th Street NW, Washington, DC 20071
>
> On Mon, May 7, 2012 at 9:36 AM, Mark Miller  wrote:
>> Normally this specific error is caused by a non success http error page and 
>> response is returned. The response parser tries to parse HTML as javabin.
>>
>> Sent from my iPhone
>>
>> On May 7, 2012, at 7:37 AM, Erick Erickson  wrote:
>>
>>> Well, I'm guessing that the version of Solr (and perhaps there are
>>> classpath issues in here?) are different, somehow, on the machine
>>> slave that is showing the error.
>>>
>>> It's also possible that your config files have a different  LUCENE_VERSION
>>> in them, although I don't think this should really create the errors you're
>>> reporting.
>>>
>>> The thing that leads me in this direction is your statement that things
>>> are fine for a while and then go bad later.  If replication happens just
>>> before you get the index version error, that would point a finger at
>>> something like different Solr versions.
>>>
>>> If there is no replication before this error, then this probably isn't
>>> the problem
>>> and we'll have to look elsewhere...
>>>
>>> But this is all guesswork, just like every bug... things are only obvious 
>>> after
>>> you find the problem!
>>>
>>> Best
>>> Erick
>>>
>>>
>>> On Sun, May 6, 2012 at 11:08 AM, Ravi Solr  wrote:
>>>> Thank you very much for responding Mr.Erickson. You may be right on
>>>> old version index, I will reindex. However we have a 2
>>>> separate/disjoint master-slave setup...only one query node/slave has
>>>> this issue. if it was really incompatible indexes why isnt the other
>>>> query server also throwing errors? that's what is throwing my
>>>> debugging thought process off.
>>>>
>>>> Thanks
>>>>
>>>> Ravi Kiran Bhaskar
>>>> Principal Software Engineer
>>>> Washington Post Digital
>>>> 1150 15th Street NW, Washington, DC 20071
>>>>
>>>> On Sat, May 5, 2012 at 12:53 PM, Eric

Re: Invalid version (expected 2, but 60) on CentOS in production please Help!!!

2012-05-10 Thread Ravi Solr
Task.java:221)
at 
com.sun.enterprise.web.portunif.PortUnificationPipeline$PUTask.doTask(PortUnificationPipeline.java:393)
at 
com.sun.enterprise.web.connector.grizzly.TaskBase.run(TaskBase.java:269)
at 
com.sun.enterprise.web.connector.grizzly.ssl.SSLWorkerThread.run(SSLWorkerThread.java:111)
Caused by: java.lang.RuntimeException: Invalid version (expected 2,
but 60) or the data in not in 'javabin' format
at 
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
at 
org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:333)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
... 43 more
|#]


Ravi Kiran Bhaskar

On Thu, May 10, 2012 at 4:33 PM, Shawn Heisey  wrote:
> On 5/10/2012 12:27 PM, Ravi Solr wrote:
>>
>> I clean the entire index and re-indexed it with SOLRJ 3.6. Still I get
>> the same error every single day. How can I see if the container
>> returned partial/nonconforming response since it may be hidden by
>> solrj ?
>
>
> If the server is sending a non-javabin error response that SolrJ doesn't
> parse, the logs from the container that runs Solr will normally give you
> useful information.  Here's the first part of something logged on mine for
> an invalid query - the query sent in only had one double quote.  The log
> actually contains the full Java stacktrace, I just included the first little
> bit:
>
> May 10, 2012 2:13:17 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException:
> org.apache.lucene.queryParser.ParseException: Cannot parse '(  ("trader
> joe's))': Lexical error at line 1, column 20.  Encountered:  after :
> "\"trader joe\'s))"
>
> Thanks,
> Shawn
>


Re: Invalid version (expected 2, but 60) on CentOS in production please Help!!!

2012-05-11 Thread Ravi Solr
Guys, just to give you an update: we think we "might" have found the
issue. iptables was enabled on one query server and disabled on the
other. The server where iptables was enabled is the one having issues;
we disabled iptables today to test the theory that it might be causing
this issue of null/empty responses. If the server holds up over the
weekend then we have the culprit :-)

Thanks to all of you who helped me out. Stay tuned.

Ravi Kiran

On Fri, May 11, 2012 at 1:23 AM, Shawn Heisey  wrote:
> On 5/10/2012 4:17 PM, Ravi Solr wrote:
>>
>> Thanks for responding Mr. Heisey... I don't see any parsing errors in
>> my log but I see lot of exceptions like the one listed belowonce
>> an exception like this happens weirdness ensues. For example - To
>> check sanity I queried for uniquekey:"111" from the solr admin GUI it
>> gave back numFound equal to all docs in that index i.e. its not
>> searching for that uniquekey at all, it blindly matched all docs.
>> However, once you restart the server the same index without any change
>> works perfectly returning only one doc in numFound when you search for
>> uniquekey:"111"...I tried everything from reindexing, copying index
>> from another sane server, delete entire index and reindex from scratch
>> etc but in vain, it works for roughly 24 hours and then starts
>> throwing the same error no matter what the query is.
>>
>>
>>
>> [#|2012-05-10T13:27:14.071-0400|SEVERE|sun-appserver2.1.1|xxx.xxx.xxx.xxx|_ThreadID=21;_ThreadName=httpSSLWorkerThread-9001-6;_RequestID=d44462e7-576b-4391-a499-c65da33e3293;|Error
>> searching data for section Local
>> org.apache.solr.client.solrj.SolrServerException: Error executing query
>>        at
>> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
>>        at
>> org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:311)
>>        at xxx.xxx.xxx.xxx(FeedController.java:621)
>>        at xxx.xxx.xxx.xxx(FeedController.java:402)
>
>
> This is still saying solrj.  Unless I am completely misunderstanding the way
> things work, which I will freely admit is possible, this is the client code.
>  Do you have anything in the log files from Solr (the server)?  I don't have
> a lot of experience with Tomcat, because I run my Solr under jetty as
> included in the example.  It looks like the client is running under Tomcat,
> though I suppose you might be running Solr under a different container.
>
> Thanks,
> Shawn
>


Re: Invalid version (expected 2, but 60) on CentOS in production please Help!!!

2012-05-15 Thread Ravi Solr
ghborhoodLower^15.0+addressLower^15.0+cityLower^15.0+zipcodeLower^15.0+countyLower^15.0+countryLower^15.0+countrycodeLower^15.0+body^10.0&fl=systemid,score&bf=ord(displaydatetime)^1.0&f.text.hl.fragmenter=regex&f.name.hl.alternateField=keyword&ps=5&q=annalynne&start=0&rows=10&tracking=sitesearch&facet=true&facet.field=contenttype&facet.method=enum&fq=displaydatetime:[2005-01-01T00:00:00Z+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3D"Past+24+Hours"}displaydatetime:[NOW/DAY-1DAY+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3D"Past+7+Days"}displaydatetime:[NOW/DAY-7DAYS+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3D"Past+60+Days"}displaydatetime:[NOW/DAY-60DAYS+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3D"Past+12+Months"}displaydatetime:[NOW/DAY-1YEAR+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3D"All+Since+2005"}displaydatetime:[*+TO+NOW/DAY%2B1DAY]&fsv=true&f.contenttype.facet.limit=160&isShard=true&NOW=1337083963338&wt=javabin&version=2}
hits=5 status=0 QTime=11 |#]

Thanks,

Ravi Kiran Bhaskar

On Fri, May 11, 2012 at 11:04 PM, Mark Miller  wrote:
> Yeah, 9 times out of 10, this error is a 404 - which wouldn't be logged 
> anywhere.
>
> On May 11, 2012, at 6:12 PM, Ravi Solr wrote:
>
>> Guys, just to give you an update, we think we "might" have found the
>> issue. iptables was enabled on one query server and disabled on the
>> other. The server where iptables is enabled is the one having issues,
>> we disabled the iptables today to test out the theory that the
>> iptables might be causing this issue of null/empty response. If the
>> server holds up during the weekend then we have the culprit :-)
>>
>> Thanks to all of you who helped me out. Stay tuned.
>>
>> Ravi Kiran
>>
>> On Fri, May 11, 2012 at 1:23 AM, Shawn Heisey  wrote:
>>> On 5/10/2012 4:17 PM, Ravi Solr wrote:
>>>>
>>>> Thanks for responding Mr. Heisey... I don't see any parsing errors in
>>>> my log but I see lot of exceptions like the one listed belowonce
>>>> an exception like this happens weirdness ensues. For example - To
>>>> check sanity I queried for uniquekey:"111" from the solr admin GUI it
>>>> gave back numFound equal to all docs in that index i.e. its not
>>>> searching for that uniquekey at all, it blindly matched all docs.
>>>> However, once you restart the server the same index without any change
>>>> works perfectly returning only one doc in numFound when you search for
>>>> uniquekey:"111"...I tried everything from reindexing, copying index
>>>> from another sane server, delete entire index and reindex from scratch
>>>> etc but in vain, it works for roughly 24 hours and then starts
>>>> throwing the same error no matter what the query is.
>>>>
>>>>
>>>>
>>>> [#|2012-05-10T13:27:14.071-0400|SEVERE|sun-appserver2.1.1|xxx.xxx.xxx.xxx|_ThreadID=21;_ThreadName=httpSSLWorkerThread-9001-6;_RequestID=d44462e7-576b-4391-a499-c65da33e3293;|Error
>>>> searching data for section Local
>>>> org.apache.solr.client.solrj.SolrServerException: Error executing query
>>>>        at
>>>> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
>>>>        at
>>>> org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:311)
>>>>        at xxx.xxx.xxx.xxx(FeedController.java:621)
>>>>        at xxx.xxx.xxx.xxx(FeedController.java:402)
>>>
>>>
>>> This is still saying solrj.  Unless I am completely misunderstanding the way
>>> things work, which I will freely admit is possible, this is the client code.
>>>  Do you have anything in the log files from Solr (the server)?  I don't have
>>> a lot of experience with Tomcat, because I run my Solr under jetty as
>>> included in the example.  It looks like the client is running under Tomcat,
>>> though I suppose you might be running Solr under a different container.
>>>
>>> Thanks,
>>> Shawn
>>>
>
> - Mark Miller
> lucidimagination.com
>
>
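
Worth noting: byte 60 is ASCII '<', so "Invalid version (expected 2, but 60)" almost always means the client received an XML/HTML page, such as the 404 error page Mark describes, rather than a javabin stream. A small probe (placeholder URL) that prints the status code and the first bytes the container actually returned:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class StatusProbe {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://testsolr:8080/solr/mycore/select?q=*:*&wt=javabin&version=2");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        int status = conn.getResponseCode();
        System.out.println("HTTP status: " + status);
        InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream();
        byte[] buf = new byte[80];
        int n = in.read(buf);
        in.close();
        // A body starting with '<' (byte 60) is an error page, not javabin.
        System.out.println(new String(buf, 0, Math.max(n, 0), "UTF-8"));
    }
}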


Re: Invalid version (expected 2, but 60) on CentOS in production please Help!!!

2012-05-15 Thread Ravi Solr
I have already triple cross-checked that all my clients are using the
same version as the server, which is 3.6.

Thanks

Ravi Kiran

On Tue, May 15, 2012 at 2:09 PM, Ramesh K Balasubramanian
 wrote:
> I have seen similar errors before when the solr version and solrj version in 
> the client don't match.
>
> Best Regards,
> Ramesh
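
A small sketch for verifying such a claim programmatically, printing the version recorded in the solrj jar's manifest (either value may be null if the manifest lacks the entries) and the jar the class was actually loaded from:

import org.apache.solr.client.solrj.SolrServer;

public class SolrjVersionCheck {
    public static void main(String[] args) {
        Package p = SolrServer.class.getPackage();
        System.out.println("Specification-Version: " + p.getSpecificationVersion());
        System.out.println("Implementation-Version: " + p.getImplementationVersion());
        // The jar this class was actually loaded from:
        System.out.println(SolrServer.class.getProtectionDomain()
                .getCodeSource().getLocation());
    }
}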


Re: Invalid version (expected 2, but 60) on CentOS in production please Help!!!

2012-05-18 Thread Ravi Solr
Just to give folks an update, we trashed the server having issues and
cloned/rebuilt a VM from a sane server, and it seems to be running well
for the past 3 days without any issues. We intend to monitor it over
the weekend. If it's still stable on Monday, I would blame the issue
on the server configuration. :-)

Thanks
Ravi Kiran Bhaskar

On Tue, May 15, 2012 at 2:57 PM, Ravi Solr  wrote:
> I have already triple cross-checked  that all my clients are using
> same version as the server which is 3.6
>
> Thanks
>
> Ravi Kiran
>
> On Tue, May 15, 2012 at 2:09 PM, Ramesh K Balasubramanian
>  wrote:
>> I have seen similar errors before when the solr version and solrj version in 
>> the client don't match.
>>
>> Best Regards,
>> Ramesh


Replication Clarification Please

2011-05-06 Thread Ravi Solr
Hello,
    Pardon me if this has already been answered somewhere, and I
apologize for a lengthy post. I was wondering if anybody could help me
understand replication internals a bit more. We have a single
master-slave setup (Solr 1.4.1) with the configuration shown
below. Our environment is quite commit heavy (almost 100s of docs
every 5 minutes), and all indexing is done on the Master and all searches
go to the Slave. We are seeing that the slave replication performance
gradually degrades: the speed drops below 1 kbps and replication ultimately
gets backed up. Once we reload the core on the slave it will work fine
for some time and then gets backed up again. We have mergeFactor set
to 10, ramBufferSizeMB set to 32MB, Solr itself running
with 2GB memory, and lockType simple on both master and slave.

I am hoping that the following questions might help me understand the
replication performance issue better (Replication Configuration is
given at the end of the email)

1. Does the Slave get the whole index every time during replication or
just the delta since the last replication happened ?

2. If there are a huge number of queries being done on the slave, will it
affect the replication ? How can I improve the performance ? (see the
replication details at the bottom of the page)

3. Will the segment names be the same on master and slave after
replication ? I see that they are different. Is this correct ? If it
is correct, how does the slave know what to fetch the next time, i.e.
the delta ?

4. When and why does the index.<timestamp> folder get created ? I see
this type of folder getting created only on the slave, and the slave
instance is pointing to it.

5. Does the replication process copy both the index and index.<timestamp> folders ?

6. What happens if the replication kicks off before the previous
invocation has completed ? Will the 2nd invocation block or will
it go through, causing more confusion ?

7. If I have to prep a new master-slave combination, is it OK to copy
the respective contents into the new master-slave and start Solr ? Or
do I have to wipe the new slave and let it replicate from its new
master ?

8. Doing an 'ls | wc -l' on the index folder of master and slave gave 194
and 17968 respectively... the slave has a lot of segments_xxx files. Is
this normal ?

MASTER

<requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
        <str name="replicateAfter">startup</str>
        <str name="replicateAfter">commit</str>
        <str name="replicateAfter">optimize</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
        <str name="commitReserveDuration">00:00:10</str>
    </lst>
</requestHandler>

SLAVE

<requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
        <str name="masterUrl">master core url</str>
        <str name="pollInterval">00:03:00</str>
        <str name="compression">internal</str>
        <str name="httpConnTimeout">5000</str>
        <str name="httpReadTimeout">1</str>
    </lst>
</requestHandler>



REPLICATION DETAILS FROM PAGE

Master     master core url
Poll Interval     00:03:00
Local Index     Index Version: 1296217104577, Generation: 20190
    Location: /data/solr/core/search-data/index.20110429042508
    Size: 2.1 GB
    Times Replicated Since Startup: 672
    Previous Replication Done At: Fri May 06 15:41:01 EDT 2011
    Config Files Replicated At: null
    Config Files Replicated: null
    Times Config Files Replicated Since Startup: null
    Next Replication Cycle At: Fri May 06 15:44:00 EDT 2011
    Current Replication Status     Start Time: Fri May 06 15:41:00 EDT 2011
    Files Downloaded: 43 / 197
    Downloaded: 477.08 KB / 588.82 MB [0.0%]
    Downloading File: _hdm.prx, Downloaded: 9.3 KB / 9.3 KB [100.0%]
    Time Elapsed: 967s, Estimated Time Remaining: 1221166s, Speed: 505 bytes/s


Ravi Kiran Bhaskar


Re: Replication Clarification Please

2011-05-09 Thread Ravi Solr
Hello Mr. Bell,
   Thank you very much for patiently responding to my
questions. We optimize once every 2 days. Can you kindly rephrase
your answer, I could not understand - "if the amount of time if > 10
segments, I believe that might also trigger a whole index, since you
cycled all the segments.In that case I think you might want to
increase the mergeFactor."

The current index folder details and sizes are given below

MASTER
--
   5K   search-data/spellchecker2
 480M  search-data/index
   5K   search-data/spellchecker1
   5K   search-data/spellcheckerFile
 480M   search-data

SLAVE
--
   2K   search-data/index.20110509103950
 419M   search-data/index
 2.3G   search-data/index.20110429042508  <-- SLAVE is pointing to
this directory
   5K   search-data/spellchecker1
   5K  search-data/spellchecker2
   5K   search-data/spellcheckerFile
 2.7G   search-data

Thanks,

Ravi Kiran Bhaskar

On Sat, May 7, 2011 at 11:49 PM, Bill Bell  wrote:
> I did not see answers... I am not an authority, but will tell you what I
> think
>
> Did you get some answers?
>
>
> On 5/6/11 2:52 PM, "Ravi Solr"  wrote:
>
>>Hello,
>>        Pardon me if this has been already answered somewhere and I
>>apologize for a lengthy post. I was wondering if anybody could help me
>>understand Replication internals a bit more. We have a single
>>master-slave setup (solr 1.4.1) with the configurations as shown
>>below. Our environment is quite commit heavy (almost 100s of docs
>>every 5 minutes), and all indexing is done on Master and all searches
>>go to the Slave. We are seeing that the slave replication performance
>>gradually decreases and the speed decreases < 1kbps and ultimately
>>gets backed up. Once we reload the core on slave it will be work fine
>>for sometime and then it again gets backed up. We have mergeFactor set
>>to 10 and ramBufferSizeMB is set to 32MB and solr itself is running
>>with 2GB memory and locktype is simple on both master and slave.
>
> How big is your index? How many rows and GB ?
>
> Every time you replicate, there are several resets on caching. So if you
> are constantly indexing, you need to be careful about how that performance
> impact will apply.
>
>>
>>I am hoping that the following questions might help me understand the
>>replication performance issue better (Replication Configuration is
>>given at the end of the email)
>>
>>1. Does the Slave get the whole index every time during replication or
>>just the delta since the last replication happened ?
>
>
> It depends. If you do an OPTIMIZE every time you index, then you will be
> sending the whole index down.
> If the amount of time if > 10 segments, I believe that might also trigger
> a whole index, since you cycled all the segments.
> In that case I think you might want to increase the mergeFactor.
>
>
>>
>>2. If there are huge number of queries being done on slave will it
>>affect the replication ? How can I improve the performance ? (see the
>>replications details at he bottom of the page)
>
> It seems that might be one way that you get the index.* directories. At
> least I see it more frequently when there is huge load and you are trying
> to replicate.
> You could replicate less frequently.
>
>>
>>3. Will the segment names be same be same on master and slave after
>>replication ? I see that they are different. Is this correct ? If it
>>is correct how does the slave know what to fetch the next time i.e.
>>the delta.
>
> Yes, they had better be. In the old days you could just rsync the data
> directory from master to slave and reload the core; that worked fine.
>
>>
>>4. When and why does the index. folder get created ? I see
>>this type of folder getting created only on slave and the slave
>>instance is pointing to it.
>
> I would love to know all the conditions... I believe it is supposed to
> replicate to index.*, then reload to point to it. But sometimes it gets
> stuck in index.* land and never goes back to straight index.
>
> There are several bug fixes for this in 3.1.
>
>>
>>5. Does replication process copy both the index and index.
>>folder ?
>
> I believe it is supposed to copy the segment or whole index/ from master
> to index.* on slave.
>
>>
>>6. what happens if the replication kicks off even before the previous
>>invocation has not completed ? will the 2nd invocation block or will
>>it go through causing more confusion ?
>
> That is not supposed to happen, if a replication is in process, it should
> not copy again until that one is complete.
> Try it, just delete the data/*, restart SOLR, and 

Solr 3.1 Upgrade - Reindex necessary ?

2011-05-09 Thread Ravi Solr
Hello All,
 I am planning to upgrade from Solr 1.4.1 to Solr 3.1. I
saw some deprecation warnings in the log as shown below

[#|2011-05-09T12:37:18.762-0400|WARNING|sun-appserver9.1|org.apache.solr.analysis.BaseTokenStreamFactory|_ThreadID=53;_ThreadName=httpSSLWorkerThread-9001-13;_RequestID=de32fd3f-e968-4228-a071-9bb175bfb549;|StopFilterFactory is using deprecated LUCENE_24 emulation. You should at some point declare and reindex to at least 3.0, because 2.x emulation is deprecated and will be removed in 4.0|#]

[#|2011-05-09T12:37:18.765-0400|WARNING|sun-appserver9.1|org.apache.solr.analysis.BaseTokenStreamFactory|_ThreadID=53;_ThreadName=httpSSLWorkerThread-9001-13;_RequestID=de32fd3f-e968-4228-a071-9bb175bfb549;|WordDelimiterFilterFactory is using deprecated LUCENE_24 emulation. You should at some point declare and reindex to at least 3.0, because 2.x emulation is deprecated and will be removed in 4.0|#]

[#|2011-05-09T12:37:18.767-0400|WARNING|sun-appserver9.1|org.apache.solr.analysis.BaseTokenStreamFactory|_ThreadID=53;_ThreadName=httpSSLWorkerThread-9001-13;_RequestID=de32fd3f-e968-4228-a071-9bb175bfb549;|EnglishPorterFilterFactory is using deprecated LUCENE_24 emulation. You should at some point declare and reindex to at least 3.0, because 2.x emulation is deprecated and will be removed in 4.0|#]


so I would love the experts' advice on the following questions

1. Do we have to reindex all content again to use Solr 3.1 ?
2. If we don't reindex all content, are there any potential issues ? (I
read somewhere that the first commit would change the 1.4.1 format to 3.1.
Has the analyzers' behavior changed in a way that warrants reindexing ?)
3. Apart from deploying the new solr 3.1 war, is it just enough to set
<luceneMatchVersion>LUCENE_31</luceneMatchVersion> to get all the
goodies and bug fixes of LUCENE/SOLR 3.1 ?

Thank You,

Ravi Kiran Bhaskar


Re: Solr 3.1 Upgrade - Reindex necessary ?

2011-05-10 Thread Ravi Solr
Thanks Grijesh for responding. I meant that I will use the Lucene 3.1
jars for indexing also from now on. My current index already has a
million docs indexed with the Solr 1.4.1 version. I read somewhere that
once the server is upgraded to 3.1, the first commit will
change the indexes to the 3.1 format automatically. Is this true, or do I
have to literally reindex the million docs again ?

Thanks,
Ravi Kiran Bhaskar

On Tuesday, May 10, 2011, Grijesh  wrote:
>>1. Do we have to reindex all content again to use Solr 3.1 ?
>
>>2. If we don't reindex all content are there any potential issues ? (I
>>read somewhere that first commit would change the 1.4.1 format to 3.1.
>>have the analyzer's behavior changed which warrants reindexing ?)
>>3. Apart from deploying the new solr 3.1 war, is it just enough to set
>><luceneMatchVersion>LUCENE_31</luceneMatchVersion> to get all the
>>goodies and bug fixes of LUCENE/SOLR 3.1 ?
>
> Hi, Solr 3.1 uses the latest version of the Lucene jars, so if you
> are planning to upgrade then it is necessary to re-index all the content
> with the Solr 3.1 version.
>
> Not re-indexing may possibly cause index corruption, because the newer
> version of Lucene will create indexes in the newer format, which is backward
> compatible for reading only.
>
> Setting <luceneMatchVersion>LUCENE_31</luceneMatchVersion> is not enough
> because it will not pull in the Lucene 3.1 jars automatically.
>
> -
> Thanx:
> Grijesh
> www.gettinhahead.co.in
>


Re: Replication Clarification Please

2011-05-10 Thread Ravi Solr
Hello Mr. Kanarsky,
Thank you very much for the detailed explanation,
probably the best explanation I have found regarding replication. Just to
be sure, I wanted to test Solr 3.1 to see if it alleviates the
problems...I don't think it helped. The master index version and
generation are greater than the slave's, yet the slave replicates the
entire index from the master (see the replication admin screen output below).
Any idea why it would get the whole index every time even in 3.1, or am
I misinterpreting the output ? However I must admit that 3.1 finished
the replication, unlike 1.4.1 which would hang and be backed up for
ever.

Master  http://masterurl:post/solr-admin/searchcore/replication
Latest Index Version:null, Generation: null
Replicatable Index Version:1296217097572, Generation: 12726

Poll Interval   00:03:00

Local Index Index Version: 1296217097569, Generation: 12725

Location: /data/solr/core/search-data/index
Size: 944.32 MB
Times Replicated Since Startup: 148
Previous Replication Done At: Tue May 10 12:32:42 EDT 2011
Config Files Replicated At: null
Config Files Replicated: null
Times Config Files Replicated Since Startup: null
Next Replication Cycle At: Tue May 10 12:35:41 EDT 2011

Current Replication Status  Start Time: Tue May 10 12:32:41 EDT 2011
Files Downloaded: 18 / 108
Downloaded: 317.48 KB / 436.24 MB [0.0%]
Downloading File: _ayu.nrm, Downloaded: 4 bytes / 4 bytes [100.0%]
Time Elapsed: 17s, Estimated Time Remaining: 23902s, Speed: 18.67 KB/s


Thanks,
Ravi Kiran Bhaskar

On Tue, May 10, 2011 at 4:10 AM, Alexander Kanarsky
 wrote:
> Ravi,
>
> as far as I remember, this is how the replication logic works (see
> SnapPuller class, fetchLatestIndex method):
>
>> 1. Does the Slave get the whole index every time during replication or
>> just the delta since the last replication happened ?
>
>
> It looks at the index version AND the index generation. If both the slave's
> version and generation are the same as on the master, nothing gets
> replicated. If the master's generation is greater than the slave's, the
> slave fetches the delta files only (even if a partial merge was done
> on the master) and puts the new files from the master into the same index
> folder on the slave (either index or index.<timestamp>, see further
> explanation). However, if the master's index generation is equal to or
> less than the one on the slave, the slave does a full replication by
> fetching all files of the master's index and placing them into a
> separate folder on the slave (index.<timestamp>). Then, if the fetch is
> successful, the slave updates (or creates) the index.properties file
> and puts there the name of the "current" index folder. The "old"
> index.<timestamp> folder(s) will be kept in 1.4.x - which was treated
> as a bug - see SOLR-2156 (and this was fixed in 3.1). After this, the
> slave does a commit or reloads the core depending on whether the config files
> were replicated. There is another bug in 1.4.x that fails replication
> if the slave needs to do a full replication AND the config files were
> changed - also fixed in 3.1 (see SOLR-1983).
>
>> 2. If there are huge number of queries being done on slave will it
>> affect the replication ? How can I improve the performance ? (see the
>> replications details at he bottom of the page)
>
>
> From my experience, half of the replication time is the time when the
> transferred data flushes to disk. So the IO impact is important.
>
>> 3. Will the segment names be same be same on master and slave after
>> replication ? I see that they are different. Is this correct ? If it
>> is correct how does the slave know what to fetch the next time i.e.
>> the delta.
>
>
> They should be the same. The slave fetches the changed files only (see
> above), also look at SnapPuller code.
>
>> 4. When and why does the index.<timestamp> folder get created ? I see
>> this type of folder getting created only on slave and the slave
>> instance is pointing to it.
>
>
> See above.
>
>> 5. Does the replication process copy both the index and index.<timestamp>
> folder ?
>
>
> The index.<timestamp> folder gets created only if a full replication
> happened at least once. Otherwise, the slave will use the index
> folder.
>
>> 6. what happens if the replication kicks off even before the previous
>> invocation has not completed ? will the 2nd invocation block or will
>> it go through causing more confusion ?
>
>
> There is a lock (snapPullLock in ReplicationHandler) that prevents two
> replications from running simultaneously. If there is no bug, it should just
> return silently from the replication call. (I personally never had a
> problem with this, so it looks like there is no bug :)
>
>> 7. If I have to prep a new master-slave combination is it OK to copy
>> the respective contents into the new master-slave and start solr ? or
>> do I have have to wipe the new slave and let it replicate from its new
>> master ?
>
>
> If the new master has a different index, the slave will cr
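
The version/generation comparison described above can be tested directly: the replication handler answers command=indexversion on both master and slave. A minimal probe (host and core names are placeholders):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class IndexVersionProbe {
    static void dump(String label, String coreUrl) throws Exception {
        URL u = new URL(coreUrl + "/replication?command=indexversion");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(u.openStream(), "UTF-8"));
        System.out.println("== " + label);
        for (String line; (line = in.readLine()) != null; ) {
            System.out.println(line);  // response contains indexversion and generation
        }
        in.close();
    }

    public static void main(String[] args) throws Exception {
        dump("master", "http://master:8080/solr/searchcore");
        dump("slave", "http://slave:8080/solr/searchcore");
    }
}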

Re: Solr 3.1 Upgrade - Reindex necessary ?

2011-05-10 Thread Ravi Solr
Hoss,
 Thank you very much for clearly delineating the difference.
Just to be clear - my intent to move to 3.1 was driven by my desire to
improve my replication performance. Deducing from your explanation, I
believe the replication/indexing related changes/bug fixes like the
following will be available to me even without specifying
<luceneMatchVersion>LUCENE_31</luceneMatchVersion> - am I right ??

faster exact PhraseQuery; merging favors segments with deletions;
primary key lookup is faster; IndexWriter.addIndexes(Directory[]) uses
file copy instead of merging; various Directory performance
improvements; compound file is dynamically turned off for large
segments; fully deleted segments are dropped on commit; faster
snowball analyzers (in contrib); ConcurrentMergeScheduler is more
careful about setting priority of merge threads.

Ravi Kiran Bhaskar

On Tue, May 10, 2011 at 2:49 PM, Chris Hostetter
 wrote:
>
> : Thanks Grijesh for responding. I meant that I will use the Lucene 3.1
> : jars for indexing also from now on. My current index already has a
> : million docs indexed with solr 1.4.1 version, I read somewhere that
> : once server is upgraded to 3.1, it is said that the first commit will
> : change the indexes to 3.1 format automatically. Is this true or do I
> : have to literally reindex the million docs again ?
>
> index versioning happens on a segment basis, so once you start using Solr
> 3.1, as new docs are added and segments are merged those segments will be
> updated to the new file format -- the way to ensure that "all" segments
> are updated is to optimize your index.
>
> : >>1. Do we have to reindex all content again to use Solr 3.1 ?
>
> you should not need to, no.
>
> : >>3. Apart from deploying the new solr 3.1 war, is it just enough to set
> : >><luceneMatchVersion>LUCENE_31</luceneMatchVersion> to get all the
> : >>goodies and bug fixes of LUCENE/SOLR 3.1 ?
>
> It's not mandatory to change the <luceneMatchVersion> to upgrade -- if
> you do want to change the <luceneMatchVersion> then you should reindex,
> as that change causes analyzers/query parsers to behave differently (in
> ways that might be incompatible with how they behaved previously).
>
> this change is unrelated to the index fileformat -- optimizing your index
> to force the 3.1 fileformat has no impact on whatever esoteric/broken
> behavior a tokenizer might have had in the past that changed once the
> <luceneMatchVersion> setting is updated.
>
> The purpose of <luceneMatchVersion> is to say "i want the behavior of
> X.Y, even when it's been decided that that behavior was bad, because it's
> what matches the terms i've already indexed"
>
>
> -Hoss
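
Following Hoss's point that optimizing rewrites every segment, a one-off SolrJ call is enough to force the index into the new file format (a sketch only; HttpSolrServer is the SolrJ 3.6+ client class name, and the URL is a placeholder):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ForceSegmentUpgrade {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://testsolr:8080/solr/mycore");
        // Optimize merges and rewrites every segment, so all of them end up
        // in the index format of the running Solr/Lucene version.
        server.optimize();
    }
}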


Re: Replication Clarification Please

2011-05-11 Thread Ravi Solr
Mr. Bell,
 Thank you for your help. Yes, the full index is replicated every
1000, 1, 10 etc. if mergeFactor is 10, as per its definition.
We do index every 5 minutes and replicate every 3 minutes just to make
sure consumers have immediate access to the indexed docs.

Thanks,

Ravi Kiran Bhaskar

On Wednesday, May 11, 2011, Bill Bell  wrote:
> OK let me rephrase.
>
> In solrconfig.xml there is a setting called mergeFactor. The default is
> usually 10.
> Practically it means there are 10 segments. If you are doing fast delta
> indexing (adding a couple documents, then committing),
> You will cycle through all 10 segments pretty fast.
>
> It appears that if you do go past the 10 segments without replicating, the
> only recourse is for the replicator to do a full index replication instead
> of a delta index replication...
>
> Does that help?
>
>
> On 5/9/11 9:24 AM, "Ravi Solr"  wrote:
>
>>Hello Mr. Bell,
>>                   Thank you very much for patiently responding to my
>>questions. We optimize once in every 2 days. Can you kindly rephrase
>>your answer, I could not understand - "if the amount of time if > 10
>>segments, I believe that might also trigger a whole index, since you
>>cycled all the segments.In that case I think you might want to
>>increase the mergeFactor."
>>
>>The current index folder details and sizes are given below
>>
>>MASTER
>>--
>>   5K   search-data/spellchecker2
>> 480M  search-data/index
>>   5K   search-data/spellchecker1
>>   5K   search-data/spellcheckerFile
>> 480M   search-data
>>
>>SLAVE
>>--
>>   2K   search-data/index.20110509103950
>> 419M   search-data/index
>> 2.3G   search-data/index.20110429042508  > SLAVE is pointing to
>>this directory
>>   5K   search-data/spellchecker1
>>   5K  search-data/spellchecker2
>>   5K   search-data/spellcheckerFile
>> 2.7G   search-data
>>
>>Thanks,
>>
>>Ravi Kiran Bhaskar
>>
>>On Sat, May 7, 2011 at 11:49 PM, Bill Bell  wrote:
>>> I did not see answers... I am not an authority, but will tell you what I
>>> think
>>>
>>> Did you get some answers?
>>>
>>>
>>> On 5/6/11 2:52 PM, "Ravi Solr"  wrote:
>>>
>>>>Hello,
>>>>        Pardon me if this has been already answered somewhere and I
>>>>apologize for a lengthy post. I was wondering if anybody could help me
>>>>understand Replication internals a bit more. We have a single
>>>>master-slave setup (solr 1.4.1) with the configurations as shown
>>>>below. Our environment is quite commit heavy (almost 100s of docs
>>>>every 5 minutes), and all indexing is done on Master and all searches
>>>>go to the Slave. We are seeing that the slave replication performance
>>>>gradually decreases and the speed decreases < 1kbps and ultimately
>>>>gets backed up. Once we reload the core on slave it will be work fine
>>>>for sometime and then it again gets backed up. We have mergeFactor set
>>>>to 10 and ramBufferSizeMB is set to 32MB and solr itself is running
>>>>with 2GB memory and locktype is simple on both master and slave.
>>>
>>> How big is your index? How many rows and GB ?
>>>
>>> Every time you replicate, there are several resets on caching. So if you
>>> are constantly
>>> Indexing, you need to be careful on how that performance impact will
>>>apply.
>>>
>>>>
>>>>I am hoping that the following questions might help me understand the
>>>>replication performance issue better (Replication Configuration is
>>>>given at the end of the email)
>>>>
>>>>1. Does the Slave get the whole index every time during replication or
>>>>just the delta since the last replication happened ?
>>>
>>>
>>> It depends. If you do an OPTIMIZE every time your index, then you will
>>>be
>>> sending the whole index down.
>>> If the amount of time if > 10 segments, I believe that might also
>>>trigger
>>> a whole index, since you cycled all the segments.
>>> In that case I think you might want to increase the mergeFactor.
>>>
>>>
>>>>
>>>>2. If there are huge number of queries being done on slave will it
>>>>affect the replication ? How can I improve the performance ? (see the
>>>>replications details at he bottom of the page)
>>>
>>> It seems that might be one way the you get the index.* directories. At
>>> least I see it more frequently when there is huge load and you are
>>>trying
>>> to replicate.
>>> You could replicate less frequently.
>>>
>>>>
>>>>3. Will the segment names be same be same on master and slave after
>


Re: Replication Clarification Please

2011-05-12 Thread Ravi Solr
Thank you Mr. Bell and Mr. Kanarsky. As per your advice we have moved
from 1.4.1 to 3.1 and have made several changes to the configuration. The
configuration changes have worked nicely till now, and the replication
is finishing within the interval and not backing up. The changes we
made are as follows

1. Increased the mergeFactor from 10 to 15
2. Increased ramBufferSizeMB to 1024
3. Changed lockType to single (previously it was simple)
4. Set maxCommitsToKeep to 1 in the deletionPolicy
5. Set maxPendingDeletes to 0
6. Changed caches from LRUCache to FastLRUCache as we had hit ratios
well over 75% to increase warming speed
7. Increased the poll interval to 6 minutes and re-indexed all content.

Thanks,

Ravi Kiran Bhaskar

On Wed, May 11, 2011 at 6:00 PM, Alexander Kanarsky
 wrote:
> Ravi,
>
> if you see what looks like a full replication each time even though the
> master generation is greater than the slave's, try to watch the index on
> both master and slave at the same time to see what files are getting
> replicated. You probably may need to adjust your merge factor, as Bill
> mentioned.
>
> -Alexander
>
>
>
> On Tue, 2011-05-10 at 12:45 -0400, Ravi Solr wrote:
>> Hello Mr. Kanarsky,
>>                 Thank you very much for the detailed explanation,
>> probably the best explanation I found regarding replication. Just to
>> be sure, I wanted to test solr 3.1 to see if it alleviates the
>> problems...I dont think it helped. The master index version and
>> generation are greater than the slave, still the slave replicates the
>> entire index form master (see replication admin screen output below).
>> Any idea why it would get the whole index everytime even in 3.1 or am
>> I misinterpreting the output ? However I must admit that 3.1 finished
>> the replication unlike 1.4.1 which would hang and be backed up for
>> ever.
>>
>> Master        http://masterurl:post/solr-admin/searchcore/replication
>>       Latest Index Version:null, Generation: null
>>       Replicatable Index Version:1296217097572, Generation: 12726
>>
>> Poll Interval         00:03:00
>>
>> Local Index   Index Version: 1296217097569, Generation: 12725
>>
>>       Location: /data/solr/core/search-data/index
>>       Size: 944.32 MB
>>       Times Replicated Since Startup: 148
>>       Previous Replication Done At: Tue May 10 12:32:42 EDT 2011
>>       Config Files Replicated At: null
>>       Config Files Replicated: null
>>       Times Config Files Replicated Since Startup: null
>>       Next Replication Cycle At: Tue May 10 12:35:41 EDT 2011
>>
>> Current Replication Status    Start Time: Tue May 10 12:32:41 EDT 2011
>>       Files Downloaded: 18 / 108
>>       Downloaded: 317.48 KB / 436.24 MB [0.0%]
>>       Downloading File: _ayu.nrm, Downloaded: 4 bytes / 4 bytes [100.0%]
>>       Time Elapsed: 17s, Estimated Time Remaining: 23902s, Speed: 18.67 KB/s
>>
>>
>> Thanks,
>> Ravi Kiran Bhaskar
>>
>> On Tue, May 10, 2011 at 4:10 AM, Alexander Kanarsky
>>  wrote:
>> > Ravi,
>> >
>> > as far as I remember, this is how the replication logic works (see
>> > SnapPuller class, fetchLatestIndex method):
>> >
>> >> 1. Does the Slave get the whole index every time during replication or
>> >> just the delta since the last replication happened ?
>> >
>> >
>> > It look at the index version AND the index generation. If both slave's
>> > version and generation are the same as on master, nothing gets
>> > replicated. if the master's generation is greater than on slave, the
>> > slave fetches the delta files only (even if the partial merge was done
>> > on the master) and put the new files from master to the same index
>> > folder on slave (either index or index., see further
>> > explanation). However, if the master's index generation is equals or
>> > less than one on slave, the slave does the full replication by
>> > fetching all files of the master's index and place them into a
>> > separate folder on slave (index.). Then, if the fetch is
>> > successfull, the slave updates (or creates) the index.properties file
>> > and puts there the name of the "current" index folder. The "old"
>> > index. folder(s) will be kept in 1.4.x - which was treated
>> > as a bug - see SOLR-2156 (and this was fixed in 3.1). After this, the
>> > slave does commit or reload core depending whether the config files
>> > were replicated. There is another bug in 1.4.x that fails replication
>> > if the slave need to do the full replicati

Re: Replication Clarification Please

2011-05-13 Thread Ravi Solr
Sorry guys, spoke too soon I guess. The replication still remains very
slow even after upgrading to 3.1 and turning the compression off. Now
I am totally clueless. I have tried everything that I know of to
increase the speed of replication but failed. If anybody has faced the
same issue, can you please tell me how you solved it?

Ravi Kiran Bhaskar

On Thu, May 12, 2011 at 6:42 PM, Ravi Solr  wrote:
> Thank you Mr. Bell and Mr. Kanarsky, as per your advise we have moved
> from 1.4.1 to 3.1 and have made several changes to configuration. The
> configuration changes have worked nicely till now and the replication
> is finishing within the interval and not backing up. The changes we
> made are as follows
>
> 1. Increased the mergeFactor from 10 to 15
> 2. Increased ramBufferSizeMB to 1024
> 3. Changed lockType to single (previously it was simple)
> 4. Set maxCommitsToKeep to 1 in the deletionPolicy
> 5. Set maxPendingDeletes to 0
> 6. Changed caches from LRUCache to FastLRUCache as we had hit ratios
> well over 75% to increase warming speed
> 7. Increased the poll interval to 6 minutes and re-indexed all content.
>
> Thanks,
>
> Ravi Kiran Bhaskar
>
> On Wed, May 11, 2011 at 6:00 PM, Alexander Kanarsky
>  wrote:
>> Ravi,
>>
>> if you have what looks like a full replication each time even if the
>> master generation is greater than slave, try to watch for the index on
>> both master and slave the same time to see what files are getting
>> replicated. You probably may need to adjust your merge factor, as Bill
>> mentioned.
>>
>> -Alexander
>>
>>
>>
>> On Tue, 2011-05-10 at 12:45 -0400, Ravi Solr wrote:
>>> Hello Mr. Kanarsky,
>>>                 Thank you very much for the detailed explanation,
>>> probably the best explanation I found regarding replication. Just to
>>> be sure, I wanted to test solr 3.1 to see if it alleviates the
>>> problems...I dont think it helped. The master index version and
>>> generation are greater than the slave, still the slave replicates the
>>> entire index form master (see replication admin screen output below).
>>> Any idea why it would get the whole index everytime even in 3.1 or am
>>> I misinterpreting the output ? However I must admit that 3.1 finished
>>> the replication unlike 1.4.1 which would hang and be backed up for
>>> ever.
>>>
>>> Master        http://masterurl:post/solr-admin/searchcore/replication
>>>       Latest Index Version:null, Generation: null
>>>       Replicatable Index Version:1296217097572, Generation: 12726
>>>
>>> Poll Interval         00:03:00
>>>
>>> Local Index   Index Version: 1296217097569, Generation: 12725
>>>
>>>       Location: /data/solr/core/search-data/index
>>>       Size: 944.32 MB
>>>       Times Replicated Since Startup: 148
>>>       Previous Replication Done At: Tue May 10 12:32:42 EDT 2011
>>>       Config Files Replicated At: null
>>>       Config Files Replicated: null
>>>       Times Config Files Replicated Since Startup: null
>>>       Next Replication Cycle At: Tue May 10 12:35:41 EDT 2011
>>>
>>> Current Replication Status    Start Time: Tue May 10 12:32:41 EDT 2011
>>>       Files Downloaded: 18 / 108
>>>       Downloaded: 317.48 KB / 436.24 MB [0.0%]
>>>       Downloading File: _ayu.nrm, Downloaded: 4 bytes / 4 bytes [100.0%]
>>>       Time Elapsed: 17s, Estimated Time Remaining: 23902s, Speed: 18.67 KB/s
>>>
>>>
>>> Thanks,
>>> Ravi Kiran Bhaskar
>>>
>>> On Tue, May 10, 2011 at 4:10 AM, Alexander Kanarsky
>>>  wrote:
>>> > Ravi,
>>> >
>>> > as far as I remember, this is how the replication logic works (see
>>> > SnapPuller class, fetchLatestIndex method):
>>> >
>>> >> 1. Does the Slave get the whole index every time during replication or
>>> >> just the delta since the last replication happened ?
>>> >
>>> >
>>> > It look at the index version AND the index generation. If both slave's
>>> > version and generation are the same as on master, nothing gets
>>> > replicated. if the master's generation is greater than on slave, the
>>> > slave fetches the delta files only (even if the partial merge was done
>>> > on the master) and put the new files from master to the same index
>>> > folder on slave (either index or index., see further
>>> > explanation). However, if the master's index generation is equals or
>>> 

Re: Replication Clarification Please

2011-05-18 Thread Ravi Solr
Alexander, sorry for the delay in replying. I wanted to test out a few
hunches that I had before I got back to you.
Hurray!!! I was able to resolve the issue. The problem was with the
cache settings in the solrconfig.xml. It was taking almost 15-20
minutes to warm up the caches on each commit; as we are commit heavy
(every 5 minutes), the replication was screaming for the new searcher
to be warmed and would never get a chance to finish, so it was
perennially backed up. We reduced the cache and autowarm counts and
now the replication is happy, finishing within 20 seconds!! Thank you
again for all your support.

Thanks,

Ravi Kiran Bhaskar
The Washington Post
1150 15th St. NW
Washington, DC 20071

On Sun, May 15, 2011 at 3:12 AM, Alexander Kanarsky
 wrote:
> Ravi,
>
> what is the replication configuration on both master and slave?
> Also could you list of files in the index folder on master and slave
> before and after the replication?
>
> -Alexander
>
>
> On Fri, 2011-05-13 at 18:34 -0400, Ravi Solr wrote:
>> Sorry guys spoke too soon I guess. The replication still remains very
>> slow even after upgrading to 3.1 and setting the compression off. Now
>> Iam totally clueless. I have tried everything that I know of to
>> increase the speed of replication but failed. if anybody faced the
>> same issue, can you please tell me how you solved it.
>>
>> Ravi Kiran Bhaskar
>>
>> On Thu, May 12, 2011 at 6:42 PM, Ravi Solr  wrote:
>> > Thank you Mr. Bell and Mr. Kanarsky, as per your advise we have moved
>> > from 1.4.1 to 3.1 and have made several changes to configuration. The
>> > configuration changes have worked nicely till now and the replication
>> > is finishing within the interval and not backing up. The changes we
>> > made are as follows
>> >
>> > 1. Increased the mergeFactor from 10 to 15
>> > 2. Increased ramBufferSizeMB to 1024
>> > 3. Changed lockType to single (previously it was simple)
>> > 4. Set maxCommitsToKeep to 1 in the deletionPolicy
>> > 5. Set maxPendingDeletes to 0
>> > 6. Changed caches from LRUCache to FastLRUCache as we had hit ratios
>> > well over 75% to increase warming speed
>> > 7. Increased the poll interval to 6 minutes and re-indexed all content.
>> >
>> > Thanks,
>> >
>> > Ravi Kiran Bhaskar
>> >
>> > On Wed, May 11, 2011 at 6:00 PM, Alexander Kanarsky
>> >  wrote:
>> >> Ravi,
>> >>
>> >> if you have what looks like a full replication each time even if the
>> >> master generation is greater than slave, try to watch for the index on
>> >> both master and slave the same time to see what files are getting
>> >> replicated. You probably may need to adjust your merge factor, as Bill
>> >> mentioned.
>> >>
>> >> -Alexander
>> >>
>> >>
>> >>
>> >> On Tue, 2011-05-10 at 12:45 -0400, Ravi Solr wrote:
>> >>> Hello Mr. Kanarsky,
>> >>>                 Thank you very much for the detailed explanation,
>> >>> probably the best explanation I found regarding replication. Just to
>> >>> be sure, I wanted to test solr 3.1 to see if it alleviates the
>> >>> problems...I dont think it helped. The master index version and
>> >>> generation are greater than the slave, still the slave replicates the
>> >>> entire index form master (see replication admin screen output below).
>> >>> Any idea why it would get the whole index everytime even in 3.1 or am
>> >>> I misinterpreting the output ? However I must admit that 3.1 finished
>> >>> the replication unlike 1.4.1 which would hang and be backed up for
>> >>> ever.
>> >>>
>> >>> Master        http://masterurl:post/solr-admin/searchcore/replication
>> >>>       Latest Index Version:null, Generation: null
>> >>>       Replicatable Index Version:1296217097572, Generation: 12726
>> >>>
>> >>> Poll Interval         00:03:00
>> >>>
>> >>> Local Index   Index Version: 1296217097569, Generation: 12725
>> >>>
>> >>>       Location: /data/solr/core/search-data/index
>> >>>       Size: 944.32 MB
>> >>>       Times Replicated Since Startup: 148
>> >>>       Previous Replication Done At: Tue May 10 12:32:42 EDT 2011
>> >>>       Config Files Replicated At: null
>> >>>       Config Files Replicated: null
>> >>>       Times Config Files Replicated Sin

Re: Synonym and Whitespaces and optional TokenizerFactory

2011-08-18 Thread Ravi Solr
If you have multi-word synonyms you could use -
tokenizerFactory="solr.KeywordTokenizerFactory" - in the
SynonymFilterFactory declaration. This assumes that
the tokenizer for that field keeps each phrase as a
single token (achieved by using solr.KeywordTokenizerFactory instead
of the standard tokenizer); if it does not, then you might miss the synonym
matching altogether. See the configuration below

<!-- reconstructed sketch: the original declaration was stripped by the
     archive, so the fieldType name and the exact filter list are assumed -->
<fieldType name="textSynonym" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

Then you can use synonyms like

Barack Obama,Barak Obama,Barack H. Obama,Barack Hussein Obama, Barak
Hussein Obama => Barack Obama

Ravi Kiran Bhaskar

On Thu, Aug 18, 2011 at 3:21 PM, Markus Jelsma
 wrote:
> How about escaping white\ space?
>
> cheers
>
>> Hmmm, why doesn't the multi word synonym syntax in your
>> synonym.txt handle this case? Or am I missing something
>> totally?
>>
>> Best
>> Erick
>>
>> On Wed, Aug 17, 2011 at 10:02 PM, Will Milspec 
> wrote:
>> > Hi all,
>> >
>> > This may be obvious. My question pertains to use of tokenizerFactory
>> > together with SynonymFilterFactory. Which tokenizerFactory does one  use
>> > to treat "synonyms with spaces" as one token,
>> >
>> > Example these two entries are synonyms: "lms", "learning management
>> > system"
>> >
>> > index time expansion would expand "lms" to these terms
>> >           "lms"
>> >           "learning management system"
>> >
>> > i.e. not  like this:
>> >           "lms"
>> >           "learning"
>> >           "management"
>> >           "system"
>> >
>> > Excerpt from the wiki article:
>> >
>> > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>> > 
>> > The optional *tokenizerFactory* parameter names a tokenizer factory class
>> > to analyze synonyms (see
>> > https://issues.apache.org/jira/browse/SOLR-319), which can help with the
>> > synonym+stemming problem described in
>> > http://search-lucene.com/m/hg9ri2mDvGk1 .
>> > 
>> >
>> > thanks,
>> >
>> > will
>
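
The analysis chain can be verified server-side through SolrJ's field analysis request, which runs the configured tokenizer and filters and returns the tokens at each stage. A sketch (placeholder URL; "textSynonym" is the illustrative type name from the configuration above):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.FieldAnalysisRequest;
import org.apache.solr.client.solrj.response.AnalysisResponseBase.AnalysisPhase;
import org.apache.solr.client.solrj.response.AnalysisResponseBase.TokenInfo;
import org.apache.solr.client.solrj.response.FieldAnalysisResponse;

public class SynonymCheck {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://testsolr:8080/solr/mycore");
        FieldAnalysisRequest req = new FieldAnalysisRequest();
        req.addFieldType("textSynonym");          // illustrative type name
        req.setFieldValue("Barak Hussein Obama"); // should collapse to "Barack Obama"
        FieldAnalysisResponse resp = req.process(server);
        // Print the token stream after each analysis stage.
        for (AnalysisPhase phase
                : resp.getFieldTypeAnalysis("textSynonym").getIndexPhases()) {
            StringBuilder tokens = new StringBuilder();
            for (TokenInfo t : phase.getTokens()) {
                tokens.append('[').append(t.getText()).append("] ");
            }
            System.out.println(phase.getClassName() + " -> " + tokens);
        }
    }
}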



Solr 4.0 BETA Replication problems on Tomcat

2012-09-04 Thread Ravi Solr
Hello,
I have a very simple setup, one master and one slave, configured
as below, but replication keeps failing with the stacktrace shown
below. Note that 3.6 works fine on the same machines, so I am thinking
that I am missing something in the configuration with regards to Solr
4.0...can somebody kindly let me know if I am missing something ? I am
running SOLR 4.0 on Tomcat-7.0.29 with Java 6. FYI I never had any
problem with SOLR on Glassfish; this is the first time I am using it on
Tomcat.

On Master

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
</requestHandler>

On Slave

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://testslave:8080/solr/mycore/replication</str>
    <str name="pollInterval">00:00:50</str>
    <str name="compression">internal</str>
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">1</str>
  </lst>
</requestHandler>



Error

22:44:10WARNING SnapPuller  Error in fetching packets

java.util.zip.ZipException: unknown compression method
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:147)
at 
org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
at 
org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:88)
at 
org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:124)
at 
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:149)
at 
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:144)
at 
org.apache.solr.handler.SnapPuller$FileFetcher.fetchPackets(SnapPuller.java:1024)
at 
org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:985)
at 
org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:627)
at 
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:175)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

22:44:10SEVERE  ReplicationHandler  SnapPull failed
:org.apache.solr.common.SolrException: Unable to download
_3_Lucene40_0.tip completely. Downloaded 0!=170 at
org.apache.solr.handler.SnapPuller$FileFetcher.cleanup(SnapPuller.java:1115)
at org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:999)
at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:627)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:175) at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)


Re: Solr 4.0 BETA Replication problems on Tomcat

2012-09-05 Thread Ravi Solr
Wow, that was quick. Thank you very much Mr. Siren. I shall remove the
compression node from the solrconfig.xml and let you know how it went.

Thanks,

Ravi Kiran Bhaskar

On Wed, Sep 5, 2012 at 2:54 AM, Sami Siren  wrote:
> I opened SOLR-3789. As a workaround you can remove <str name="compression">internal</str> from the config and it should work.
>
> --
>  Sami Siren
>
> On Wed, Sep 5, 2012 at 5:58 AM, Ravi Solr  wrote:
>> Hello,
>> I have a very simple setup one master and one slave configured
>> as below, but replication keeps failing with stacktrace as shown
>> below. Note that 3.6 works fine on the same machines so I am thinking
>> that Iam missing something in configuration with regards to solr
>> 4.0...can somebody kindly let me know if Iam missing something ? I am
>> running SOLR 4.0 on Tomcat-7.0.29 with Java6. FYI I never has any
>> problem with SOLR on glassfish, this is the first time Iam using it on
>> Tomcat
>>
>> On Master
>>
>> 
>>  
>>   commit
>>   optimize
>>   schema.xml,stopwords.txt,synonyms.txt
>>   00:00:10
>>   
>> 
>>
>> On Slave
>>
>> 
>>  
>> > name="masterUrl">http://testslave:8080/solr/mycore/replication
>>
>> 00:00:50
>> internal
>> 5000
>> 1
>>  
>> 
>>
>>
>> Error
>>
>> 22:44:10WARNING SnapPuller  Error in fetching packets
>>
>> java.util.zip.ZipException: unknown compression method
>> at 
>> java.util.zip.InflaterInputStream.read(InflaterInputStream.java:147)
>> at 
>> org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
>> at 
>> org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:88)
>> at 
>> org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:124)
>> at 
>> org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:149)
>> at 
>> org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:144)
>> at 
>> org.apache.solr.handler.SnapPuller$FileFetcher.fetchPackets(SnapPuller.java:1024)
>> at 
>> org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:985)
>> at 
>> org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:627)
>> at 
>> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
>> at 
>> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
>> at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:175)
>> at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>> at 
>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>> at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>> at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>> at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:662)
>>
>> 22:44:10SEVERE  ReplicationHandler  SnapPull failed
>> :org.apache.solr.common.SolrException: Unable to download
>> _3_Lucene40_0.tip completely. Downloaded 0!=170 at
>> org.apache.solr.handler.SnapPuller$FileFetcher.cleanup(SnapPuller.java:1115)
>> at 
>> org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:999)
>> at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:627)
>> at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
>> at 
>> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
>> at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:175) at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>> at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>> at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:662)


Re: Solr 4.0 BETA Replication problems on Tomcat

2012-09-05 Thread Ravi Solr
The replication finally worked after I removed the compression setting
from the solrconfig.xml on the slave. Thanks for providing the
workaround.

Ravi Kiran

On Wed, Sep 5, 2012 at 10:23 AM, Ravi Solr  wrote:
> Wow, That was quick. Thank you very much Mr. Siren. I shall remove the
> compression node in the solrconfig.xml and let you know how it went.
>
> Thanks,
>
> Ravi Kiran Bhaskar
>
> On Wed, Sep 5, 2012 at 2:54 AM, Sami Siren  wrote:
>> I opened SOLR-3789. As a workaround you can remove <str name="compression">internal</str> from the config and it should work.
>>
>> --
>>  Sami Siren
>>
>> On Wed, Sep 5, 2012 at 5:58 AM, Ravi Solr  wrote:
>>> Hello,
>>> I have a very simple setup one master and one slave configured
>>> as below, but replication keeps failing with stacktrace as shown
>>> below. Note that 3.6 works fine on the same machines so I am thinking
>>> that Iam missing something in configuration with regards to solr
>>> 4.0...can somebody kindly let me know if Iam missing something ? I am
>>> running SOLR 4.0 on Tomcat-7.0.29 with Java6. FYI I never has any
>>> problem with SOLR on glassfish, this is the first time Iam using it on
>>> Tomcat
>>>
>>> On Master
>>>
>>> <requestHandler name="/replication" class="solr.ReplicationHandler">
>>>   <lst name="master">
>>>     <str name="replicateAfter">commit</str>
>>>     <str name="replicateAfter">optimize</str>
>>>     <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
>>>     <str name="commitReserveDuration">00:00:10</str>
>>>   </lst>
>>> </requestHandler>
>>>
>>> On Slave
>>>
>>> <requestHandler name="/replication" class="solr.ReplicationHandler">
>>>   <lst name="slave">
>>>     <str name="masterUrl">http://testslave:8080/solr/mycore/replication</str>
>>>     <str name="pollInterval">00:00:50</str>
>>>     <str name="compression">internal</str>
>>>     <str name="httpConnTimeout">5000</str>
>>>     <str name="httpReadTimeout">1</str>
>>>   </lst>
>>> </requestHandler>
>>>
>>>
>>> Error
>>>
>>> 22:44:10  WARNING  SnapPuller  Error in fetching packets
>>>
>>> java.util.zip.ZipException: unknown compression method
>>> at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:147)
>>> at org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
>>> at org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:88)
>>> at org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:124)
>>> at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:149)
>>> at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:144)
>>> at org.apache.solr.handler.SnapPuller$FileFetcher.fetchPackets(SnapPuller.java:1024)
>>> at org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:985)
>>> at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:627)
>>> at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
>>> at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
>>> at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:175)
>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> at java.lang.Thread.run(Thread.java:662)
>>>
>>> 22:44:10  SEVERE  ReplicationHandler  SnapPull failed
>>> :org.apache.solr.common.SolrException: Unable to download
>>> _3_Lucene40_0.tip completely. Downloaded 0!=170
>>> at org.apache.solr.handler.SnapPuller$FileFetcher.cleanup(SnapPuller.java:1115)
>>> at org.apache.solr.handler.SnapPuller$F

PointType doc reindex issue

2012-10-10 Thread Ravi Solr
Hello,
I have a weird problem: whenever I read a doc from Solr and
then index the same doc that already exists in the index (aka
reindexing), I get the following error. Can somebody tell me what I am
doing wrong? I use Solr 3.6 and the definition of the field is given
below.

<fieldType name="latlon" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="true"/>

Exception in thread "main"
org.apache.solr.client.solrj.SolrServerException: Server at
http://testsolr:8080/solr/mycore returned non ok status:400,
message:ERROR: [doc=1182684] multiple values encountered for non
multiValued field geolocation_0_coordinate: [39.017608, 39.017608]
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:328)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
at com.wpost.search.indexing.MyTest.main(MyTest.java:31)


The data in the index looks as follows

<str name="geolocation">39.017608,-77.375239</str>
<arr name="geolocation_0_coordinate">
  <double>39.017608</double>
  <double>39.017608</double>
</arr>
<arr name="geolocation_1_coordinate">
  <double>-77.375239</double>
  <double>-77.375239</double>
</arr>

Thanks

Ravi Kiran Bhaskar


Re: Problem with delete by query in Solr 4.0 beta

2012-10-10 Thread Ravi Solr
Do you have a "_version_" field in your schema? I believe Solr 4.0
Beta requires that field.
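
For reference, the same delete can also be issued through SolrJ instead of
curl -- a minimal sketch, assuming SolrJ 4.0's HttpSolrServer and the core and
field names from your mail:

    HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr/dine");
    // Same query that the curl command wraps in <delete><query>...</query></delete>
    server.deleteByQuery("timestamp_dt:[2012-09-01T00:00:00Z TO 2012-09-27T00:00:00Z]");
    server.commit();

If the "_version_" field is missing, the 4.0 example schema declares it as
<field name="_version_" type="long" indexed="true" stored="true"/>.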

Ravi Kiran Bhaskar

On Wed, Oct 10, 2012 at 11:45 AM, Andrew Groh  wrote:
> I cannot seem to get delete by query working in my simple setup in Solr 4.0 
> beta.
>
> I have a single collection and I want to delete old documents from it.  There 
> is a single solr node in the config (no replication, not distributed). This 
> is something that I previously did in Solr 3.x
>
> My collection is called dine, so I do:
>
> curl "http://localhost:8080/solr/dine/update" -s -H 'Content-type:text/xml;
> charset=utf-8' -d "<delete><query>timestamp_dt:[2012-09-01T00:00:00Z TO
> 2012-09-27T00:00:00Z]</query></delete>"
>
> and then a commit.
>
> The problem is that the documents are not deleted.  When I run the same query
> in the admin page, it still returns documents.
>
> I walked through the code and found the code in
> DistributedUpdateProcessor::doDeleteByQuery to be suspicious.
>
> Specifically, vinfo is not null, but I have no version field, so 
> versionsStored is false.
>
> So it gets to line 786, which looks like:
> if (versionsStored) {
>
> That then skips to line 813 (the finally clause) skipping all calls to 
> doLocalDelete
>
> Now, I do confess I don't understand exactly how this code should work.  
> However, in the add code, the check for versionsStored does not skip the call 
> to doLocalAdd.
>
> Any suggestions would be welcome.
>
> Andrew
>
>
>


Re: PointType doc reindex issue

2012-10-10 Thread Ravi Solr
Gopal, I did in fact test the same, and it worked when I deleted the
geolocation_0_coordinate and geolocation_1_coordinate fields. But that seems
weird, so I was wondering if there is something else I need to do to
avoid this awkward workaround.
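
For the archive, the workaround amounts to something like this (a sketch; the
doc comes back from a query as in my earlier mail, and the field names are the
ones from my schema):

    SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
    // Drop the synthetic sub-fields so LatLonType can regenerate them on reindex
    iDoc.removeField("geolocation_0_coordinate");
    iDoc.removeField("geolocation_1_coordinate");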

Ravi Kiran Bhaskar

On Wed, Oct 10, 2012 at 12:36 PM, Gopal Patwa  wrote:
> You need to remove the field after reading the Solr doc. When you add a new
> field it will be added to the list, so when you try to commit the updated
> field it will be multi-valued, while in your schema it is single-valued.
> On Oct 10, 2012 9:26 AM, "Ravi Solr"  wrote:
>
>> Hello,
>> I have a weird problem: whenever I read a doc from Solr and
>> then index the same doc that already exists in the index (aka
>> reindexing), I get the following error. Can somebody tell me what I am
>> doing wrong? I use Solr 3.6 and the definition of the field is given
>> below.
>>
>> <fieldType name="latlon" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
>> <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="true"/>
>>
>> Exception in thread "main"
>> org.apache.solr.client.solrj.SolrServerException: Server at
>> http://testsolr:8080/solr/mycore returned non ok status:400,
>> message:ERROR: [doc=1182684] multiple values encountered for non
>> multiValued field geolocation_0_coordinate: [39.017608, 39.017608]
>> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:328)
>> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
>> at com.wpost.search.indexing.MyTest.main(MyTest.java:31)
>>
>>
>> The data in the index looks as follows
>>
>> <str name="geolocation">39.017608,-77.375239</str>
>> <arr name="geolocation_0_coordinate">
>>   <double>39.017608</double>
>>   <double>39.017608</double>
>> </arr>
>> <arr name="geolocation_1_coordinate">
>>   <double>-77.375239</double>
>>   <double>-77.375239</double>
>> </arr>
>>
>> Thanks
>>
>> Ravi Kiran Bhaskar
>>


Re: PointType doc reindex issue

2012-10-10 Thread Ravi Solr
I am using DirectXmlRequest to index XML. This is just a test case, as
my client would be sending me Solr-compliant XML, so I was trying to
simulate that by reading a doc from an existing core and reindexing it.

HttpSolrServer server = new HttpSolrServer("http://testsolr:8080/solr/mycore");
QueryResponse resp = server.query(new SolrQuery("contentid:(1184911 OR 1182684)"));
SolrDocumentList list = resp.getResults();
if (list != null && !list.isEmpty()) {
    for (SolrDocument doc : list) {
        // Round-trip the stored doc back into an input doc for reindexing
        SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
        String contentid = (String) iDoc.getFieldValue("egcontentid");
        String name = (String) iDoc.getFieldValue("name");
        iDoc.setField("name", DigestUtils.md5Hex(name));

        // Serialize to Solr update XML and post it via DirectXmlRequest
        String xml = ClientUtils.toXML(iDoc);
        DirectXmlRequest up = new DirectXmlRequest("/update", "<add>" + xml + "</add>");
        server.request(up);
        server.commit();

        System.out.println("Updated name in contentid - " + contentid);
    }
}
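
(A side note on the approach: if the XML round trip weren't the point of the
test, the same update could be sent directly with server.add(iDoc) followed by
server.commit() -- DirectXmlRequest is only used here to mimic the client's
Solr-XML feed.)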

Ravi Kiran

On Wed, Oct 10, 2012 at 1:02 PM, Gopal Patwa  wrote:
> Instead of the addField method, use setField.
> On Oct 10, 2012 9:54 AM, "Ravi Solr"  wrote:
>
>> Gopal, I did in fact test the same, and it worked when I deleted the
>> geolocation_0_coordinate and geolocation_1_coordinate fields. But that seems
>> weird, so I was wondering if there is something else I need to do to
>> avoid this awkward workaround.
>>
>> Ravi Kiran Bhaskar
>>
>> On Wed, Oct 10, 2012 at 12:36 PM, Gopal Patwa  wrote:
>> > You need to remove the field after reading the Solr doc. When you add a
>> > new field it will be added to the list, so when you try to commit the
>> > updated field it will be multi-valued, while in your schema it is
>> > single-valued.
>> > On Oct 10, 2012 9:26 AM, "Ravi Solr"  wrote:
>> >
>> >> Hello,
>> >> I have a weird problem: whenever I read a doc from Solr and
>> >> then index the same doc that already exists in the index (aka
>> >> reindexing), I get the following error. Can somebody tell me what I am
>> >> doing wrong? I use Solr 3.6 and the definition of the field is given
>> >> below.
>> >>
>> >> <fieldType name="latlon" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
>> >> <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="true"/>
>> >>
>> >> Exception in thread "main"
>> >> org.apache.solr.client.solrj.SolrServerException: Server at
>> >> http://testsolr:8080/solr/mycore returned non ok status:400,
>> >> message:ERROR: [doc=1182684] multiple values encountered for non
>> >> multiValued field geolocation_0_coordinate: [39.017608, 39.017608]
>> >> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:328)
>> >> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
>> >> at com.wpost.search.indexing.MyTest.main(MyTest.java:31)
>> >>
>> >>
>> >> The data in the index looks as follows
>> >>
>> >> <str name="geolocation">39.017608,-77.375239</str>
>> >> <arr name="geolocation_0_coordinate">
>> >>   <double>39.017608</double>
>> >>   <double>39.017608</double>
>> >> </arr>
>> >> <arr name="geolocation_1_coordinate">
>> >>   <double>-77.375239</double>
>> >>   <double>-77.375239</double>
>> >> </arr>
>> >>
>> >> Thanks
>> >>
>> >> Ravi Kiran Bhaskar
>> >>
>>


Re: PointType doc reindex issue

2012-10-12 Thread Ravi Solr
Thank you very much Hoss, I knew I was doing something stupid. I will
change the dynamic fields to stored="false" and check it out.
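
(For the archive: the stock example schema's definition is, from memory,
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/> --
double-check it against your own schema.xml.)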

Thanks

Ravi Kiran Bhaskar

On Wed, Oct 10, 2012 at 3:02 PM, Chris Hostetter  wrote:
> : I have a weird problem: whenever I read a doc from Solr and
> : then index the same doc that already exists in the index (aka
> : reindexing), I get the following error. Can somebody tell me what I am
> : doing wrong? I use Solr 3.6 and the definition of the field is given
> : below
>
> When you use the LatLonType field type you get "synthetic" "*_coordinate"
> fields automatically constructed under the covers from each of your fields
> that use a "latlon" fieldType.  Because you have configured the
> "*_coordinate" fields to be "stored", they are included in the response
> when you request the doc.
>
> This means that unless you explicitly remove those synthetically
> constructed values before "reindexing", they will still be there in
> addition to the new (possibly redundant) synthetic values created while
> indexing.
>
> This is why the "*_coordinate" dynamicField in the Solr example schema.xml
> is marked 'stored="false"' so that this field doesn't come back in the
> response -- it's not meant for end users.
>
>
> : <fieldType name="latlon" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
> : <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="true"/>
> :
> : Exception in thread "main"
> : org.apache.solr.client.solrj.SolrServerException: Server at
> : http://testsolr:8080/solr/mycore returned non ok status:400,
> : message:ERROR: [doc=1182684] multiple values encountered for non
> : multiValued field geolocation_0_coordinate: [39.017608, 39.017608]
> :   at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:328)
> :   at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
> :   at com.wpost.search.indexing.MyTest.main(MyTest.java:31)
> :
> :
> : The data in the index looks as follows
> :
> : <str name="geolocation">39.017608,-77.375239</str>
> : <arr name="geolocation_0_coordinate">
> :   <double>39.017608</double>
> :   <double>39.017608</double>
> : </arr>
> : <arr name="geolocation_1_coordinate">
> :   <double>-77.375239</double>
> :   <double>-77.375239</double>
> : </arr>
> :
> : Thanks
> :
> : Ravi Kiran Bhaskar
> :
>
> -Hoss