Re: slave is getting full synced every polling

2016-02-12 Thread Novin Novin
> Typo? That's 60 seconds, but that's not especially interesting either way.

Yes, I was thinking about this too, and I have changed it to 59 actually.

> Do the actual segments look identical after the polling?

Well, no.

How I am handling master/slave:
We use a symlink for the master and slave configs. When converting
master to slave, solrconfig.xml points to slave.xml, and the same for slave
to master. Then my script restarts both the master and the slave Solr. The
script tells the website which one is the current master, and the website
saves the current master URL and uses it for searching.
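A switch of that kind boils down to flipping a symlink and restarting Solr. As a sketch (the paths and service name here are illustrative, not from the actual setup):

$ ln -sfn master.xml /var/solr/data/big_core/conf/solrconfig.xml
$ service solr restart

The node whose symlink points at master.xml serves as master; the other node's symlink points at slave.xml.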

What I did when the problem started: I changed the slave to master and the
master to slave. Before this, something went wrong on the machine (it could
be the reason for the problem, not really sure; I checked the system logs and
everything was fine). No idea what was wrong; I couldn't find it yet.

How did I fix it: I had to reinstall Solr for the slave, and before
re-installation I removed all directories related to Solr. Not really an
ideal way to fix it, but it solved the problem, and I'm still curious what
could cause such a problem.

Thanks,
Novin





On Thu, 11 Feb 2016 at 22:07 Erick Erickson  wrote:

> Typo? That's 60 seconds, but that's not especially interesting either way.
>
> Do the actual segments look identical after the polling?
>
> On Thu, Feb 11, 2016 at 1:16 PM, Novin Novin  wrote:
> > Hi Erick,
> >
> > Below is master slave config:
> >
> > Master:
> > 
> >  
> > commit
> > optimize
> > 
> > 2
> >   
> >
> > Slave:
> > 
> > 
> >   
> >   http://master:8983/solr/big_core/replication
> > 
> >   00:00:60
> >   username
> >   password
> >  
> >   
> >
> >
> > Do you mean the Solr is restarting every minute or the polling
> > interval is 60 seconds?
> >
> > I meant polling is 60 minutes
> >
> > I didn't not see any suspicious in logs , and I'm not optimizing any
> thing
> > with commit.
> >
> > Thanks
> > Novin
> >
> > On Thu, 11 Feb 2016 at 18:02 Erick Erickson 
> wrote:
> >
> >> What is your replication configuration in solrconfig.xml on both
> >> master and slave?
> >>
> >> bq:  big core is doing full sync every time wherever it start (every
> >> minute).
> >>
> >> Do you mean the Solr is restarting every minute or the polling
> >> interval is 60 seconds?
> >>
> >> The Solr logs should tell you something about what's going on there.
> >> Also, if you are for
> >> some reason optimizing the index that'll cause a full replication.
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Feb 11, 2016 at 8:41 AM, Novin Novin 
> wrote:
> >> > Hi Guys,
> >> >
> >> > I'm having a problem with master slave syncing.
> >> >
> >> > So I have two cores one is small core (just keep data use frequently
> for
> >> > fast results) and another is big core (for rare query and for search
> in
> >> > every thing). both core has same solrconfig file. But small core
> >> > replication is fine, other than this big core is doing full sync every
> >> time
> >> > wherever it start (every minute).
> >> >
> >> > I found this
> >> >
> >>
> http://stackoverflow.com/questions/6435652/solr-replication-keeps-downloading-entire-index-from-master
> >> >
> >> > But not really usefull.
> >> >
> >> > Solr verion 5.2.0
> >> > Small core has doc 10 mil. size around 10 to 15 GB.
> >> > Big core has doc greater than 100 mil. size around 25 to 35 GB.
> >> >
> >> > How can I stop full sync.
> >> >
> >> > Thanks
> >> > Novin
> >>
>


Re: slave is getting full synced every polling

2016-02-12 Thread Shawn Heisey
On 2/12/2016 1:58 AM, Novin Novin wrote:
> Typo? That's 60 seconds, but that's not especially interesting either way.
>
> Yes, I was thinking about this too and I have changed it to 59 actually.

If you want the polling to occur once an hour, pollInterval will need to
be set to 01:00:00 ... not 00:00:60.  If you want polling to occur once
a minute, use 00:01:00 for this setting.
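For reference, a slave-side replication handler polling once an hour would look roughly like this (a sketch; the masterUrl and credentials are placeholders matching the config posted earlier in the thread):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8983/solr/big_core/replication</str>
    <str name="pollInterval">01:00:00</str>
    <str name="httpBasicAuthUser">username</str>
    <str name="httpBasicAuthPassword">password</str>
  </lst>
</requestHandler>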

> Do the actual segments look identical after the polling?
>
> Well no.

Details here are important.  Do you understand what Erick was asking
when he was talking about segments?  The segments are the files in the
index directory, which is usually data/index inside the core's instance
directory.

I did notice that the master config you gave us has this:

class="solr.Re1plicationHandler"

Note that there is a number 1 in there.  If this is actually in your
config, I would expect there to be an error when Solr tries to create
the replication handler.  Is this an error when transferring to email,
or is it also incorrect in your solrconfig.xml file?

If you use a tool like diff to compare solrconfig.xml in your small core
to solrconfig.xml in your big core, can you see any differences in the
replication config?

Thanks,
Shawn



Searching special characters

2016-02-12 Thread Anil
HI,

How can we search for special characters like * and " (double quote), which
Solr actually uses for wildcard and exact-phrase searches?

Please advise.

Regards,
Anil


Re: Searching special characters

2016-02-12 Thread Modassar Ather
You can search them by escaping with backslash.
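SolrJ ships a helper for exactly this (ClientUtils.escapeQueryChars). A minimal standalone sketch of the same rule, assuming the standard Lucene special-character set, is:

```java
public class QueryEscaper {
    // Characters with special meaning in the Lucene/Solr query syntax.
    private static final String SPECIALS = "\\+-!():^[]\"{}~*?|&;/";

    /** Prefix every special character (and whitespace) with a backslash. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (SPECIALS.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

So escape("\"Audit\"") produces \"Audit\", which matches the literal quoted term instead of triggering a phrase search.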

Best,
Modassar


Re: Searching special characters

2016-02-12 Thread Anil
Thanks for the quick response.

Should these be treated differently during indexing?

I have tried \"Audit, which also returns results for Audit, which is
incorrect. What do you say?

On 12 February 2016 at 15:07, Modassar Ather  wrote:

> You can search them by escaping with backslash.
>
> Best,
> Modassar
>


Re: optimize requests that fetch 1000 rows

2016-02-12 Thread Matteo Grolla
Hi Jack,
 tell me if I'm wrong, but qtime accounts for the search time excluding the
fetch of stored fields (I have a 90 ms qtime and ~30 s to obtain the results
on the client, on a LAN, for a 300 kB response). debug explains how much of
qtime is used by each search component.
For me 90 ms is fine; I wouldn't spend time trying to get that down to 50 ms.
It's the ~30 s to obtain the response that I'd like to tackle.
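One way to split the wall-clock time between Solr and the transfer/decode side is to time the raw HTTP fetch on the client (URL and parameters here are illustrative):

$ curl -s -o /dev/null -w 'HTTP total: %{time_total}s\n' \
  'http://localhost:8983/solr/core/select?q=*:*&rows=1000&wt=javabin'

If restricting fl to a couple of small fields makes the time collapse, the cost is in fetching and serializing stored fields rather than in the search itself.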


2016-02-12 5:42 GMT+01:00 Jack Krupansky :

> Again, first things first... debugQuery=true and see which Solr search
> components are consuming the bulk of qtime.
>
> -- Jack Krupansky
>
> On Thu, Feb 11, 2016 at 11:33 AM, Matteo Grolla 
> wrote:
>
> > virtual hardware, 200ms is taken on the client until response is written
> to
> > disk
> > qtime on solr is ~90ms
> > not great but acceptable
> >
> > Is it possible that the method FilenameUtils.splitOnTokens is really so
> > heavy when requesting a lot of rows on slow hardware?
> >
> > 2016-02-11 17:17 GMT+01:00 Jack Krupansky :
> >
> > > Good to know. Hmmm... 200ms for 10 rows is not outrageously bad, but
> > still
> > > relatively bad. Even 50ms for 10 rows would be considered barely okay.
> > > But... again it depends on query complexity - simple queries should be
> > well
> > > under 50 ms for decent modern hardware.
> > >
> > > -- Jack Krupansky
> > >
> > > On Thu, Feb 11, 2016 at 10:36 AM, Matteo Grolla <
> matteo.gro...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi Jack,
> > > >   response time scale with rows. Relationship doens't seem linear
> > but
> > > > Below 400 rows times are much faster,
> > > > I view query times from solr logs and they are fast
> > > > the same query with rows = 1000 takes 8s
> > > > with rows = 10 takes 0.2s
> > > >
> > > >
> > > > 2016-02-11 16:22 GMT+01:00 Jack Krupansky  >:
> > > >
> > > > > Are queries scaling linearly - does a query for 100 rows take
> 1/10th
> > > the
> > > > > time (1 sec vs. 10 sec or 3 sec vs. 30 sec)?
> > > > >
> > > > > Does the app need/expect exactly 1,000 documents for the query or
> is
> > > that
> > > > > just what this particular query happened to return?
> > > > >
> > > > > What does they query look like? Is it complex or use wildcards or
> > > > function
> > > > > queries, or is it very simple keywords? How many operators?
> > > > >
> > > > > Have you used the debugQuery=true parameter to see which search
> > > > components
> > > > > are taking the time?
> > > > >
> > > > > -- Jack Krupansky
> > > > >
> > > > > On Thu, Feb 11, 2016 at 9:42 AM, Matteo Grolla <
> > > matteo.gro...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Yonic,
> > > > > >  after the first query I find 1000 docs in the document
> cache.
> > > > > > I'm using curl to send the request and requesting javabin format
> to
> > > > mimic
> > > > > > the application.
> > > > > > gc activity is low
> > > > > > I managed to load the entire 50GB index in the filesystem cache,
> > > after
> > > > > that
> > > > > > queries don't cause disk activity anymore.
> > > > > > Time improves now queries that took ~30s take <10s. But I hoped
> > > better
> > > > > > I'm going to use jvisualvm's sampler to analyze where time is
> spent
> > > > > >
> > > > > >
> > > > > > 2016-02-11 15:25 GMT+01:00 Yonik Seeley :
> > > > > >
> > > > > > > On Thu, Feb 11, 2016 at 7:45 AM, Matteo Grolla <
> > > > > matteo.gro...@gmail.com>
> > > > > > > wrote:
> > > > > > > > Thanks Toke, yes, they are long times, and solr qtime (to
> > execute
> > > > the
> > > > > > > > query) is a fraction of a second.
> > > > > > > > The response in javabin format is around 300k.
> > > > > > >
> > > > > > > OK, That tells us a lot.
> > > > > > > And if you actually tested so that all the docs would be in the
> > > cache
> > > > > > > (can you verify this by looking at the cache stats after you
> > > > > > > re-execute?) then it seems like the slowness is down to any of:
> > > > > > > a) serializing the response (it doesn't seem like a 300K
> response
> > > > > > > should take *that* long to serialize)
> > > > > > > b) reading/processing the response (how fast the client can do
> > > > > > > something with each doc is also a factor...)
> > > > > > > c) other (GC, network, etc)
> > > > > > >
> > > > > > > You can try taking client processing out of the equation by
> > trying
> > > a
> > > > > > > curl request.
> > > > > > >
> > > > > > > -Yonik
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Searching special characters

2016-02-12 Thread Modassar Ather
These special characters can be removed if they are at the beginning or end,
or they can be taken care of by the relevant filters, depending on the schema
defined.
E.g. "Audit" or *Audit should be found by the query Audit, so I see no reason
to index the " or * characters. You can use PatternReplaceFilter to replace
these special characters.
If the special character is inside a word, e.g. Wi-Fi, then such terms can be
handled by WordDelimiterFilter.

Note that the special character handling may vary based on use cases.
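As a sketch, an analyzer chain combining the two (the field type name and the pattern are illustrative, not from the original post) might look like:

<fieldType name="text_clean" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- strip leading/trailing quotes and wildcards: "Audit" / *Audit -> Audit -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^[&quot;*]+|[&quot;*]+$" replacement="" replace="all"/>
    <!-- split on intra-word delimiters: Wi-Fi -> Wi, Fi (and keep WiFi) -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>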

Best,
Modassar

On Fri, Feb 12, 2016 at 3:09 PM, Anil  wrote:

> Thanks for quick response.
>
> Should these be treated differently during index ?
>
> I have tried *\"Audit* which is returning results of *Audit *also which is
> incorrect. what do you say ?
>
> On 12 February 2016 at 15:07, Modassar Ather 
> wrote:
>
> > You can search them by escaping with backslash.
> >
> > Best,
> > Modassar
> >
>


Weird behaviour related to facetting

2016-02-12 Thread Sebastian Geerken
Hi!

I've experienced strange behaviour with several versions of Solr
(currently testing with 5.4.1, but the effect can also be reproduced
with 5.3.1). Some facet values are not returned when querying
"*:*", but only when I search for something specific, say the text "foo".

I've stripped down both config/schema and data as far as possible,
files are attached (hope this is ok on the list).

How to reproduce:

Set up a core with the config and schema attached to this post:

$ bin/solr start
$ bin/solr create_core -c test
$ bin/solr stop
$ cp solrconfig.xml schema.xml server/solr/test/conf/
$ bin/solr start

Upload data:

$ curl 'http://localhost:8983/solr/test/update?commit=true' -H 
'Content-type:application/json' -d @data.json

Search for "*:*":

$ curl 
'http://localhost:8983/solr/test/select?q=*%3A*&rows=0&wt=json&indent=true&facet=true&facet.field=tags_hierarchy'

The facet value "1/tax/downloads/i/" will not be returned, but it will
be returned when searching for "foo" (or any other text):

$ curl 
'http://localhost:8983/solr/test/select?q=foo&rows=0&wt=json&indent=true&facet=true&facet.field=tags_hierarchy'

I also noted differences when modifying the data:

- Renaming the field "tags_hierarchy" to "tags" seems to fix the
  issue.
- The same applies to renaming "1/tax/downloads/i/" to "1/tax/d/i/".

Is this a known or unknown bug, or did I do something wrong? In the
former case: is there a feasible workaround? (Of course, renaming
comes to mind.)

Thanks in advance!

Regards
Sebastian



solrconfig.xml
Description: XML document


schema.xml
Description: XML document


data.json
Description: application/json


Re: slave is getting full synced every polling

2016-02-12 Thread Novin Novin
> Details here are important.  Do you understand what Erick was asking
> when he was talking about segments?  The segments are the files in the
> index directory, which is usually data/index inside the core's instance
> directory.

Thanks Shawn. If I am thinking right, these segments also appear on the core
admin page. I hadn't looked in the index directory, my bad.

And yes, I used diff to compare the files. No difference found, actually.

class="solr.Re1plicationHandler" was a typing error, apologies. This is
what is in the file (copied from the file):
class="solr.ReplicationHandler"

Thanks,
Novin




On Fri, 12 Feb 2016 at 09:30 Shawn Heisey  wrote:



Solr Kerberos URL not accessible

2016-02-12 Thread vidya
Hi

  When I am trying to access my SolrCloud web UI page, deployed in a Cloudera
cluster, I encounter the error "DEFECTED TOKENS DETECTED". Find the
attachment of the error added here. It is because of Kerberos installed on
the cluster.

Is there any other way that I can access Solr in this scenario with
Kerberos installed?
Would writing a Java program help in any way? When writing a Java program, I
also have to give the connection the Solr URL with port, or the ZooKeeper
host variable. Will that Java program work?

Please help me out.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-kerbarose-URL-not-accessible-tp4256926.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: slave is getting full synced every polling

2016-02-12 Thread Novin Novin
Well, it started again.

Below are the errors from the Solr logging on the admin UI.
Log error message on the master:
2/12/2016, 11:39:24 AM null:java.lang.IllegalStateException: file:
MMapDirectory@/var/solr/data/wmsapp_analysis/data/index.20160211204900750
lockFactory=org.apache.lucene.store.NativeFSLockFactory@56639f83 appears
both in delegate and in cache: cache=[_9bu7_Lucene50_0.pos,
_9bua_Lucene50_0.tip, _9bty.fdt, _9bu7.nvd, _9bu1.nvd, _9bu0.nvm,
_9bu4_Lucene50_0.tim, _8kbr_uu.liv, _9bu7_Lucene50_0.doc,
_9bu1_Lucene50_0.tip, _9bu9.fnm, _9bty.fdx, _9btv.fdx, _9bu5.nvm,
_9bu4_Lucene50_0.pos, _9bu5.fnm, _9bu3.si, _9bua_Lucene50_0.tim,
_9bty_Lucene50_0.pos, _9bu0.si, _9btw_Lucene50_0.tim, _9bu0_Lucene50_0.tim,
_9bu2.nvm, _9btv_Lucene50_0.pos, _9btv.nvd, _9bu3_Lucene50_0.tip,
_9bua_Lucene50_0.doc, _9bu7_Lucene50_0.tip, _9btw.nvm, _9bua.fdx,
_9bu4.nvm, _9bu9_Lucene50_0.tim, _9bu4_1.liv, _9bu7.nvm, _9bu3_1.liv,
_9bu0.fnm, _9bu5_Lucene50_0.tim, _9btx.fnm, _9bu2.fdx, _9bu4.fdt,
_9bu2_Lucene50_0.tip, _9bu9.fdx, _9bu9_Lucene50_0.pos, _9bu7.fdt,
_9bu9.nvd, _9btx_1.liv, _99gt_2s.liv, _9btw.nvd, _9bu3_Lucene50_0.doc,
_9bu2.fnm, _9bua_Lucene50_0.pos, _9bu9.nvm, _9btx.nvm,
_9btw_Lucene50_0.tip, _9bu1.nvm, _9bu4_Lucene50_0.doc, _9bu9_1.liv,
_9bu1.fnm, _9btu.cfs, _9bu8_Lucene50_0.tip, _9bua.nvm,
_9btx_Lucene50_0.doc, _9btu.si, _9bu0.fdt, _9bu7.si, _9btx_Lucene50_0.tip, _
9btw.si, _9bu8.fdx, _9bu0_Lucene50_0.doc, _9bu3.nvm, _9btz_Lucene50_0.tip,
_9bu3_Lucene50_0.tim, _9btz.fdt, _9btw.fdt, _9bu2.si, _9bu4.si, _9btx.nvd,
_9bu4.fnm, _9btv_1.liv, _9btz_Lucene50_0.doc, _9bpm_7.liv,
_9btx_Lucene50_0.pos, _9bty.fnm, _9btw_Lucene50_0.doc, _9btv.fdt,
_9bu2_Lucene50_0.doc, _9btu.cfe, _9bu3.nvd, _9btv.si, _9bu8.nvm, _9btx.fdt,
_9bu5.si, _9bu5.fdt, _9bu2.nvd, _9bu3.fdx, _9btv.fnm, _9bu5.fdx, _9btz.fnm,
_9bu3_Lucene50_0.pos, _9bu9_Lucene50_0.tip, _9bu1.fdt,
_9bu0_Lucene50_0.tip, _9bty_Lucene50_0.tim, _9btx_Lucene50_0.tim,
_9bt9_3.liv, _9bty.si, _9bu2.fdt, _9bu9.fdt, _9bu2_Lucene50_0.pos,
_9bua.fdt, _9bu9_Lucene50_0.doc, _9bu4.fdx, _9bu5_Lucene50_0.pos,
_9bu4.nvd, _9btv_Lucene50_0.tim, _9bty.nvd, _9bu8.si, _9bu5_Lucene50_0.doc,
_9bu9.si, _9btw.fnm, _9bu3.fnm, _9bh8_m.liv, _9bu3.fdt, _9bu5.nvd,
_9bua.fnm, _9btw_1.liv, _9bu8_Lucene50_0.pos, _9btw_Lucene50_0.pos,
_9bty_Lucene50_0.doc, _9bu6_1.liv, _9bu7.fnm, _5zcy_1kx.liv, _9bu7.fdx,
_9bu5_1.liv, _9bua.nvd, _9bty_Lucene50_0.tip, _9btz.fdx,
_9bu0_Lucene50_0.pos, _9bu1_Lucene50_0.doc, _9btx.fdx,
_9btv_Lucene50_0.tip, _9bn9_9.liv, _9bu0.fdx, _9bu8.nvd,
_9bu1_Lucene50_0.pos, _9bua.si, _9bu1.si, _9bu8_Lucene50_0.tim,
_9btv_Lucene50_0.doc, _9bu2_Lucene50_0.tim, _9bu1_Lucene50_0.tim,
_9bu8.fnm, _9bu4_Lucene50_0.tip, _9btx.si, _98nt_5c.liv, _9btz.nvd,
_9btw.fdx, _9btv.nvm, _9bu7_Lucene50_0.tim, pending_segments_nk8,
_9btz_Lucene50_0.tim, _9btz.si, _9bu8_Lucene50_0.doc, _9bu5_Lucene50_0.tip,
_9btz_Lucene50_0.pos, _9btz.nvm, _9bty.nvm, _9bu0.nvd, _9bu1.fdx,
_9bu8.fdt],delegate=[_9br2.cfe, pending_segments_nk8, _9bnd.fnm,
_9btn_Lucene50_0.tim, _96i3.cfe, _9boh.cfe, _9bto_Lucene50_0.pos,
_6s8a.fnm, _9btr.si, _9bt9.cfs, _9bh8.cfe, _9btg.nvd, _9bqi_3.liv,
_5zcy_Lucene50_0.tip, _9boh_6.liv, _98nt_Lucene50_0.tim, _9btt.si, _9bqi.si,
_9bsp.si, _9bsp.cfs, _6s8a_1la.liv, _9bn9_8.liv, _6s8a_Lucene50_0.doc,
_9bqb.cfs, _9boh.cfs, _9btp.fdx, _5h1s_1wg.liv, _8kbr.fdx, _9bti.nvm,
_9bts_Lucene50_0.pos, _9bts.si, _9btr.nvd, _9bnd_Lucene50_0.pos,
_5h1s_Lucene50_0.tim, _9btq.fdt, _9bti.nvd, _9btm_1.liv, _9btn.fdt,
_9btp.fnm, _9btg.nvm, _9bu6.cfe, _9btm.cfe, _98nt_Lucene50_0.pos,
_9bqq_6.liv, _8kbr_Lucene50_0.tip, _9btq.fdx, _9ayb_c.liv,
_5zcy_Lucene50_0.doc, _5zcy.fdt, _6s8a.nvd, _9ayb.cfe,
_6s8a_Lucene50_0.tim, _9bh8_l.liv, _17on.fdx, _9btn.fdx, _9btg.si,
_5h1s.fdt, _9btp_Lucene50_0.doc, _99gt_2r.liv, _9br2_5.liv, _9bnd.nvm, _
9bj7.si, _9bto_Lucene50_0.doc, _9bpm.cfs, _17on_Lucene50_0.doc, _99gt.si,
_9btg_Lucene50_0.tim, _9btk.nvd, _9bts_Lucene50_0.tim, _9bqb.cfe,
_98nt_Lucene50_0.tip, _9btr.nvm, _98ge.si, _9bnd_4.liv, _9bto.si,
_9btq.nvd, _9bnj.cfs, _9btn_Lucene50_0.doc, _9btt.fdt, _17on.si, _9bnj.cfe,
_17on_2wi.liv, _9btt_Lucene50_0.doc, _9bqq.si, _9bt9_2.liv,
_9btr_Lucene50_0.tim, _9btk.fnm, _9btk.si, _9bn9.cfe, _8kbr_Lucene50_0.pos,
_9bt9.cfe, _17on.fnm, _9btq.si, _98gy_d.liv, _9btp.nvd,
_9bnd_Lucene50_0.tim, _9bqq.cfe, _9bti.fnm, _8kbr_Lucene50_0.doc,
_9bqq.cfs, _9bnj.si, _9bti_1.liv, _9bt9.si, _5zcy_Lucene50_0.tim,
_9bh8.cfs, _98ge_g.liv, _9btr_Lucene50_0.pos, _9bti_Lucene50_0.doc,
_98ge.cfe, _8kbr.nvm, _9bnd.fdt, _9br2.si, _5h1s_Lucene50_0.pos,
_9btq_Lucene50_0.pos, _9btn.si, _98gy.cfe, _9b0u.si, _9btq_Lucene50_0.doc,
_9bti_Lucene50_0.tip, _9bnd_Lucene50_0.tip, _5zcy.fnm, _9ayb.cfs, _9bn9.si,
_9btq_Lucene50_0.tip, _98nt_Lucene50_0.doc, _9btp_Lucene50_0.pos, _9btp.si,
_98nt.nvd, _9bti_Lucene50_0.tim, _9bpm.cfe, _9btq.nvm, _9btn_1.liv,
_8kbr.fdt, _9btp_Lucene50_0.tip, _9btk.fdx, _9btt_Lucene50_0.tip,
_9brx_5.liv, _6s8a_Lucene50_0.tip, _9bto.fnm, _9btp.fdt, _98gy.cfs,
_9bpd_8.liv, _9bnd.nvd, _9bj7.cfs, _96i3_40.liv, _5

Re: slave is getting full synced every polling

2016-02-12 Thread Novin Novin
Sorry, the core name is wmsapp_analysis, which is the big core.

On Fri, 12 Feb 2016 at 12:01 Novin Novin  wrote:


Re: Solr Kerberos URL not accessible

2016-02-12 Thread Anil
Use a JAAS configuration through the Java API:

System.setProperty("java.security.auth.login.config", "location of jaas configuration file");
HttpClientUtil.setConfigurer(new Krb5HttpClientConfigurer());

Add the above two lines when you are creating the SolrCloud client.

you can find jaas conf information at

http://www.cloudera.com/documentation/archive/search/1-3-0/Cloudera-Search-User-Guide/csug_using_kerberos.html
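For reference, the JAAS file pointed to above typically has this shape (the keytab path and principal are placeholders to adapt to your cluster):

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  useTicketCache=false
  keyTab="/path/to/solr.keytab"
  principal="solr/fully.qualified.host.name@YOUR-REALM.COM";
};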

Hope this helps.

Regards,
Anil Dasari


On 12 February 2016 at 16:58, vidya  wrote:



query knowledge graph

2016-02-12 Thread Midas A
Please suggest how to create a query knowledge graph for an e-commerce
application.

Please describe it in detail. Our motive is to improve relevancy. We come
from a LAMP background.


Re: Weird behaviour related to facetting

2016-02-12 Thread Alessandro Benedetti
I know it sometimes happens: you simply overlooked the facet.limit
parameter. By default only the first 100 facet values are shown.
Raising the limit will also show the values you thought were missing
(they were not actually missing, just not shown).
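For example, re-running the request from the reproduction steps with the limit disabled:

$ curl 'http://localhost:8983/solr/test/select?q=*%3A*&rows=0&wt=json&indent=true&facet=true&facet.field=tags_hierarchy&facet.limit=-1'

facet.limit=-1 returns all facet values; any positive value caps the list at that many.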

Cheers

On 12 February 2016 at 10:59, Sebastian Geerken  wrote:



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: How is Tika used with Solr

2016-02-12 Thread xavi jmlucjav
Of course, but that code is very tricky, so if the extraction library takes
care of all that, it's a huge gain. The Aperture library I used worked very
well in that regard, and even though it did not use processes as Timothy
says, it never got stuck if I remember correctly.
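For completeness: inside a single JVM, the closest one can get to such a watchdog is running the extraction in a Future with a hard timeout. This is a sketch only — a truly wedged thread cannot be force-killed, which is exactly why tika-batch (and Aperture) fall back on process-level isolation:

```java
import java.util.concurrent.*;

public class ExtractionWatchdog {
    /**
     * Run a potentially hanging extraction task with a hard timeout.
     * Returns the extracted text, or null if the task timed out.
     */
    public static String extractWithTimeout(Callable<String> task, long seconds)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = pool.submit(task);
            try {
                return future.get(seconds, TimeUnit.SECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupts; cannot stop a native hang
                return null;
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A task that ignores interruption will still leak the worker thread, which is the gap that a separate child process closes.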

On Fri, Feb 12, 2016 at 1:46 AM, Erick Erickson 
wrote:

> Well, I'd imagine you could spawn threads and monitor/kill them as
> necessary, although that doesn't deal with OOM errors
>
> FWIW,
> Erick
>
> On Thu, Feb 11, 2016 at 3:08 PM, xavi jmlucjav  wrote:
> > For sure, if I need heavy duty text extraction again, Tika would be the
> > obvious choice if it covers dealing with hangs. I never used tika-server
> > myself (not sure if it existed at the time) just used tika from my own
> jvm.
> >
> > On Thu, Feb 11, 2016 at 8:45 PM, Allison, Timothy B.  >
> > wrote:
> >
> >> x-post to Tika user's
> >>
> >> Y and n.  If you run tika app as:
> >>
> >> java -jar tika-app.jar <input_directory> <output_directory>
> >>
> >> It runs tika-batch under the hood (TIKA-1330 as part of TIKA-1302).
> This
> >> creates a parent and child process, if the child process notices a hung
> >> thread, it dies, and the parent restarts it.  Or if your OS gets upset
> with
> >> the child process and kills it out of self preservation, the parent
> >> restarts the child, or if there's an OOM...and you can configure how
> often
> >> the child shuts itself down (with parental restarting) to mitigate
> memory
> >> leaks.
> >>
> >> So, y, if your use case allows directory-in/directory-out batch
> >> processing, then we now have that in Tika.
> >> that in Tika.
> >>
> >> I've been wanting to add a similar watchdog to tika-server ... any
> >> interest in that?
> >>
> >>
> >> -Original Message-
> >> From: xavi jmlucjav [mailto:jmluc...@gmail.com]
> >> Sent: Thursday, February 11, 2016 2:16 PM
> >> To: solr-user 
> >> Subject: Re: How is Tika used with Solr
> >>
> >> I have found that when you deal with large amounts of all sort of files,
> >> in the end you find stuff (pdfs are typically nasty) that will hang
> tika.
> >> That is even worse that a crash or OOM.
> >> We used aperture instead of tika because at the time it provided a
> >> watchdog feature to kill what seemed like a hanged extracting thread.
> That
> >> feature is super important for a robust text extracting pipeline. Has
> Tika
> >> gained such feature already?
> >>
> >> xavier
> >>
> >> On Wed, Feb 10, 2016 at 6:37 PM, Erick Erickson <
> erickerick...@gmail.com>
> >> wrote:
> >>
> >> > Timothy's points are absolutely spot-on. In production scenarios, if
> >> > you use the simple "run Tika in a SolrJ program" approach you _must_
> >> > abort the program on OOM errors and the like and  figure out what's
> >> > going on with the offending document(s). Or record the name somewhere
> >> > and skip it next time 'round. Or
> >> >
> >> > How much you have to build in here really depends on your use case.
> >> > For "small enough"
> >> > sets of documents or one-time indexing, you can get by with dealing
> >> > with errors one at a time.
> >> > For robust systems where you have to have indexing available at all
> >> > times and _especially_ where you don't control the document corpus,
> >> > you have to build something far more tolerant as per Tim's comments.
> >> >
> >> > FWIW,
> >> > Erick
> >> >
> >> > On Wed, Feb 10, 2016 at 4:27 AM, Allison, Timothy B.
> >> > 
> >> > wrote:
> >> > > I completely agree on the impulse, and for the vast majority of the
> >> > > time
> >> > (regular catchable exceptions), that'll work.  And, by vast majority,
> >> > aside from oom on very large files, we aren't seeing these problems
> >> > any more in our 3 million doc corpus (y, I know, small by today's
> >> > standards) from
> >> > govdocs1 and Common Crawl over on our Rackspace vm.
> >> > >
> >> > > Given my focus on Tika, I'm overly sensitive to the worst case
> >> > scenarios.  I find it encouraging, Erick, that you haven't seen these
> >> > types of problems, that users aren't complaining too often about
> >> > catastrophic failures of Tika within Solr Cell, and that this thread
> >> > is not yet swamped with integrators agreeing with me. :)
> >> > >
> >> > > However, because oom can leave memory in a corrupted state (right?),
> >> > because you can't actually kill a thread for a permanent hang and
> >> > because Tika is a kitchen sink and we can't prevent memory leaks in
> >> > our dependencies, one needs to be aware that bad things can
> >> > happen...if only very, very rarely.  For a fellow traveler who has run
> >> > into these issues on massive data sets, see also [0].
> >> > >
> >> > > Configuring Hadoop to work around these types of problems is not too
> >> > difficult -- it has to be done with some thought, though.  On
> >> > conventional single box setups, the ForkParser within Tika is one
> >> > option, tika-batch is another.  Hand rolling your own parent/child
> >> > process is non-trivial and is not necessary for the vast majority of
> use
> >> cases.
> >> > >
> >> > >
> >> > > [0]
> >>

Re: Solr architecture

2016-02-12 Thread Mark Robinson
Thanks All for your suggestions!

Rgds,
Mark.

On Thu, Feb 11, 2016 at 9:45 AM, Upayavira  wrote:

> Your biggest issue here is likely to be http connections. Making an HTTP
> connection to Solr is way more expensive than the ask of adding a single
> document to the index. If you are expecting to add 24 billion docs per
> day, I'd suggest that somehow merging those documents into batches
> before sending them to Solr will be necessary.
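The batching suggested above can be sketched generically. This is only a sketch: the sender callback stands in for whatever actually ships a batch (with SolrJ that would typically be a single client.add(batch) call, which is an assumption here, not something from this thread).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Accumulates documents and hands them off in fixed-size batches, so one
// HTTP request carries many documents instead of one request per document.
public class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sender;  // ships one batch downstream
    private final List<T> buffer = new ArrayList<>();

    public Batcher(int batchSize, Consumer<List<T>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
    }

    public void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) flush();
    }

    public void flush() {                    // call once more at shutdown
        if (!buffer.isEmpty()) {
            sender.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        List<Integer> sizes = new ArrayList<>();
        Batcher<String> b = new Batcher<>(1000, batch -> sizes.add(batch.size()));
        for (int i = 0; i < 2500; i++) b.add("doc" + i);
        b.flush();
        System.out.println(sizes);           // prints: [1000, 1000, 500]
    }
}
```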
>
> To my previous question - what do you gain by using Solr that you don't
> get from other solutions? I'd suggest that to make this system really
> work, you are going to need a deep understanding of how Lucene works -
> segments, segment merges, deletions, and many other things because when
> you start to work at that scale, the implementation details behind
> Lucene really start to matter and impact upon your ability to succeed.
>
> I'd suggest that what you are undertaking can certainly be done, but is
> a substantial project.
>
> Upayavira
>
> On Wed, Feb 10, 2016, at 09:48 PM, Mark Robinson wrote:
> > Thanks everyone for your suggestions.
> > Based on it I am planning to have one doc per event with sessionId
> > common.
> >
> > So in this case hopefully indexing each doc as and when it comes would be
> > okay? Or do we still need to batch and index to Solr?
> >
> > Also with 4M sessions a day with about 6000 docs (events) per session we
> > can expect about 24 billion docs per day!
> >
> > Will Solr still hold up? If so, could someone please recommend a sizing
> > to cater to this level of data.
> > The queries per second is around 320 qps.
> >
> > Thanks!
> > Mark
> >
> >
> > On Wed, Feb 10, 2016 at 3:38 AM, Emir Arnautovic <
> > emir.arnauto...@sematext.com> wrote:
> >
> > > Hi Mark,
> > > Appending session actions just to be able to return more than one session
> > > without retrieving a large number of results is not a good tradeoff. Like
> > > Upayavira suggested, you should consider storing one action per doc and
> > > aggregate on read time or push to Solr once session ends and aggregate
> on
> > > some other layer.
> > > If you are thinking handling infrastructure might be too much, you may
> > > consider using some of logging services to hold data. One such service
> is
> > > Sematext's Logsene (http://sematext.com/logsene).
> > >
> > > Thanks,
> > > Emir
> > >
> > > --
> > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > >
> > >
> > > On 10.02.2016 03:22, Mark Robinson wrote:
> > >
> > >> Thanks for your replies and suggestions!
> > >>
> > >> Why I store all events related to a session under one doc?
> > >> Each session can have about 500 total entries (events) corresponding
> to
> > >> it.
> > >> So when I try to retrieve a session's info it can come back with around 500
> > >> records. If it is compounded into one doc per session, I can retrieve more
> > >> sessions at a time.
> > >> eg under a sessionId an array of eventA activities, eventB activities
> > >>   (using json). When an eventA activity again occurs, we will read all
> > >> that
> > >> data for that session, append this extra info to eventA data and push
> the
> > >> whole session related data back (indexing) to Solr. Like this for many
> > >> sessions parallely.
> > >>
> > >>
> > >> Why NRT?
> > >> In parallel, many sessions are being written (4 million sessions, hence
> > >> 4 million docs per day). A person can do this querying at any time.
> > >>
> > >> It is just a look up?
> > >> Yes. We just need to retrieve all info for a session and pass it on to
> > >> another system. We may even do some extra querying on some data like
> > >> timestamps, pageurl etc in that info added to a session.
> > >>
> > >> Thinking of having the data separate from the actual Solr Instance and
> > >> mention the loc of the dataDir in solrconfig.
> > >>
> > >> If Solr is not a good option could you please suggest something which
> will
> > >> satisfy this use case with min response time while querying.
> > >>
> > >> Thanks!
> > >> Mark
> > >>
> > >> On Tue, Feb 9, 2016 at 6:02 PM, Daniel Collins  >
> > >> wrote:
> > >>
> > >> So as I understand your use case, it's effectively logging actions
> within a
> > >>> user session, why do you have to do the update in NRT?  Why not just
> log
> > >>> all the user session events (with some unique key, and ensuring the
> > >>> session
> > >>> Id is in the document somewhere), then when you want to do the
> query, you
> > >>> join on the session id, and that gives you all the data records for
> that
> > >>> session. I don't really follow why it has to be 1 document (which you
> > >>> continually update). If you really need that aggregation, couldn't
> that
> > >>> happen offline?
> > >>>
> > >>> I guess your 1 saving grace is that you query using the unique ID (in
> > >>> your
> > >>> scenario) so you could use the real-time get handler, since you
> aren't
> > >>> do

Re: Solr-kerbarose URL not accessible

2016-02-12 Thread Shawn Heisey
On 2/12/2016 4:28 AM, vidya wrote:
>   When I am trying to access my SolrCloud web UI page, deployed in a Cloudera
> cluster, I encountered the error "DEFECTED TOKENS DETECTED". Find
> the attachment of the error that is added here. It is because of Kerberos
> installed on the cluster.
>
> Is there any other way that I can access Solr in this scenario with
> Kerberos installed?
> Would writing a Java program help in any way? While writing a Java program,
> I also have to give a connection to the Solr URL with a port or a ZooKeeper
> host variable. Will that Java program work out?

This mailing list will filter out most attachments.  It looks like you're
accessing the list through the Nabble forum, but I still don't see any
files even when I visit the Nabble website.  Without your attachment(s),
I can't see the problem, so I cannot offer any advice.

The best option is to place the relevant data on a website like gist or
dropbox and provide a link to that information.

The error message you have described does not appear in the Solr source
code, so it must be coming from whatever Kerberos software is being used
for the authentication, or maybe from the customizations that Cloudera
has made to Solr in their search product.

I keep coming back to the fact that I can't actually see the full error
message and Java stacktrace, because your attachments are not available.

Thanks,
Shawn



Re: optimize requests that fetch 1000 rows

2016-02-12 Thread Shawn Heisey
On 2/12/2016 2:57 AM, Matteo Grolla wrote:
>  tell me if I'm wrong but qtime accounts for search time excluding the
> fetch of stored fields (I have a 90ms qtime and a ~30s time to obtain the
> results on the client on a LAN infrastructure for 300kB response). debug
> explains how much of qtime is used by each search component.
> For me 90ms are ok, I wouldn't spend time trying to make them 50ms, it's
> the ~30s to obtain the response that I'd like to tackle.

30 seconds to retrieve data for a 300KB result indicates a *severe*
performance issue.

Stored fields in Lucene (and by extension, Solr) are compressed in
version 4.1 and later.  This means that they must be retrieved from disk
and then uncompressed before they can be sent back to clients.  Solr
does not offer any way to turn the compression off, but benchmarks have
shown that the overhead incurred by the compression and decompression on
a lightly loaded host is minimal.
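To make the decompress-on-fetch step concrete, here is a round trip with java.util.zip. This is illustrative only -- Lucene's stored fields use LZ4, not DEFLATE -- but the shape of the cost is the same: bytes come off disk compressed and must be inflated before the response can be built.

```java
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class StoredFieldRoundTrip {

    // Compress a document's raw bytes, as happens at index time.
    static byte[] compress(byte[] raw) {
        Deflater d = new Deflater();
        d.setInput(raw);
        d.finish();
        byte[] buf = new byte[raw.length + 64];  // ample for small inputs
        int n = 0;
        while (!d.finished()) {
            n += d.deflate(buf, n, buf.length - n);
        }
        d.end();
        return Arrays.copyOf(buf, n);
    }

    // Inflate back to the original bytes, as happens on every fetch.
    static byte[] decompress(byte[] packed, int rawLen) throws Exception {
        Inflater inf = new Inflater();
        inf.setInput(packed);
        byte[] out = new byte[rawLen];
        int off = 0;
        while (!inf.finished()) {
            off += inf.inflate(out, off, out.length - off);
        }
        inf.end();
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] raw = "solr stored fields ".repeat(20).getBytes();
        byte[] packed = compress(raw);
        byte[] back = decompress(packed, raw.length);
        System.out.println(raw.length + " raw bytes, " + packed.length
                + " on disk, round-trips: " + Arrays.equals(raw, back));
    }
}
```

On an idle box this overhead is tiny; it only becomes visible when CPU or disk is already saturated, which matches the diagnosis above.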

If your system is running in a low memory situation, then the OS disk
cache may not be effective, which slows down data retrieval.  Also, if
available memory is low and the disks are extremely busy, then it may
take a very long time to retrieve the data from the disk.

If the CPUs on the system are extremely busy, then there may not be much
CPU time for the decompression.

A combination of low memory, extremely disk heavy I/O, and very busy
CPUs could potentially cause this kind of delay.  What can you tell us
about your index, your server, and how busy that server is?  If Solr is
running in a virtual machine, then the overall CPU, memory, and I/O load
on the physical host will be relevant.

Here's some general information about performance problems:

https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



Re: slave is getting full synced every polling

2016-02-12 Thread Alessandro Benedetti
Have you customised the merge factor?
Is it aggressive?
If a lot of merging happens, you can potentially incur a big transfer
of files on each replication.
You need to check the segments in the slave every minute.
When the replication is triggered, what are the differences between the master
index and the slave (in terms of segments)?

Cheers

On 12 February 2016 at 12:03, Novin Novin  wrote:

> sorry core name is wmsapp_analysis which is big core
>
> On Fri, 12 Feb 2016 at 12:01 Novin Novin  wrote:
>
> > Well It started again.
> >
> > Below is are the errors from solr logging on admin ui.
> > Log error message in master
> > 2/12/2016, 11:39:24 AM null:java.lang.IllegalStateException: file:
> > MMapDirectory@
> /var/solr/data/wmsapp_analysis/data/index.20160211204900750
> > lockFactory=org.apache.lucene.store.NativeFSLockFactory@56639f83 appears
> > both in delegate and in cache: cache=[_9bu7_Lucene50_0.pos,
> > _9bua_Lucene50_0.tip, _9bty.fdt, _9bu7.nvd, _9bu1.nvd, _9bu0.nvm,
> > _9bu4_Lucene50_0.tim, _8kbr_uu.liv, _9bu7_Lucene50_0.doc,
> > _9bu1_Lucene50_0.tip, _9bu9.fnm, _9bty.fdx, _9btv.fdx, _9bu5.nvm,
> > _9bu4_Lucene50_0.pos, _9bu5.fnm, _9bu3.si, _9bua_Lucene50_0.tim,
> > _9bty_Lucene50_0.pos, _9bu0.si, _9btw_Lucene50_0.tim,
> > _9bu0_Lucene50_0.tim, _9bu2.nvm, _9btv_Lucene50_0.pos, _9btv.nvd,
> > _9bu3_Lucene50_0.tip, _9bua_Lucene50_0.doc, _9bu7_Lucene50_0.tip,
> > _9btw.nvm, _9bua.fdx, _9bu4.nvm, _9bu9_Lucene50_0.tim, _9bu4_1.liv,
> > _9bu7.nvm, _9bu3_1.liv, _9bu0.fnm, _9bu5_Lucene50_0.tim, _9btx.fnm,
> > _9bu2.fdx, _9bu4.fdt, _9bu2_Lucene50_0.tip, _9bu9.fdx,
> > _9bu9_Lucene50_0.pos, _9bu7.fdt, _9bu9.nvd, _9btx_1.liv, _99gt_2s.liv,
> > _9btw.nvd, _9bu3_Lucene50_0.doc, _9bu2.fnm, _9bua_Lucene50_0.pos,
> > _9bu9.nvm, _9btx.nvm, _9btw_Lucene50_0.tip, _9bu1.nvm,
> > _9bu4_Lucene50_0.doc, _9bu9_1.liv, _9bu1.fnm, _9btu.cfs,
> > _9bu8_Lucene50_0.tip, _9bua.nvm, _9btx_Lucene50_0.doc, _9btu.si,
> > _9bu0.fdt, _9bu7.si, _9btx_Lucene50_0.tip, _9btw.si, _9bu8.fdx,
> > _9bu0_Lucene50_0.doc, _9bu3.nvm, _9btz_Lucene50_0.tip,
> > _9bu3_Lucene50_0.tim, _9btz.fdt, _9btw.fdt, _9bu2.si, _9bu4.si,
> > _9btx.nvd, _9bu4.fnm, _9btv_1.liv, _9btz_Lucene50_0.doc, _9bpm_7.liv,
> > _9btx_Lucene50_0.pos, _9bty.fnm, _9btw_Lucene50_0.doc, _9btv.fdt,
> > _9bu2_Lucene50_0.doc, _9btu.cfe, _9bu3.nvd, _9btv.si, _9bu8.nvm,
> > _9btx.fdt, _9bu5.si, _9bu5.fdt, _9bu2.nvd, _9bu3.fdx, _9btv.fnm,
> > _9bu5.fdx, _9btz.fnm, _9bu3_Lucene50_0.pos, _9bu9_Lucene50_0.tip,
> > _9bu1.fdt, _9bu0_Lucene50_0.tip, _9bty_Lucene50_0.tim,
> > _9btx_Lucene50_0.tim, _9bt9_3.liv, _9bty.si, _9bu2.fdt, _9bu9.fdt,
> > _9bu2_Lucene50_0.pos, _9bua.fdt, _9bu9_Lucene50_0.doc, _9bu4.fdx,
> > _9bu5_Lucene50_0.pos, _9bu4.nvd, _9btv_Lucene50_0.tim, _9bty.nvd, _
> 9bu8.si,
> > _9bu5_Lucene50_0.doc, _9bu9.si, _9btw.fnm, _9bu3.fnm, _9bh8_m.liv,
> > _9bu3.fdt, _9bu5.nvd, _9bua.fnm, _9btw_1.liv, _9bu8_Lucene50_0.pos,
> > _9btw_Lucene50_0.pos, _9bty_Lucene50_0.doc, _9bu6_1.liv, _9bu7.fnm,
> > _5zcy_1kx.liv, _9bu7.fdx, _9bu5_1.liv, _9bua.nvd, _9bty_Lucene50_0.tip,
> > _9btz.fdx, _9bu0_Lucene50_0.pos, _9bu1_Lucene50_0.doc, _9btx.fdx,
> > _9btv_Lucene50_0.tip, _9bn9_9.liv, _9bu0.fdx, _9bu8.nvd,
> > _9bu1_Lucene50_0.pos, _9bua.si, _9bu1.si, _9bu8_Lucene50_0.tim,
> > _9btv_Lucene50_0.doc, _9bu2_Lucene50_0.tim, _9bu1_Lucene50_0.tim,
> > _9bu8.fnm, _9bu4_Lucene50_0.tip, _9btx.si, _98nt_5c.liv, _9btz.nvd,
> > _9btw.fdx, _9btv.nvm, _9bu7_Lucene50_0.tim, pending_segments_nk8,
> > _9btz_Lucene50_0.tim, _9btz.si, _9bu8_Lucene50_0.doc,
> > _9bu5_Lucene50_0.tip, _9btz_Lucene50_0.pos, _9btz.nvm, _9bty.nvm,
> > _9bu0.nvd, _9bu1.fdx, _9bu8.fdt],delegate=[_9br2.cfe,
> pending_segments_nk8,
> > _9bnd.fnm, _9btn_Lucene50_0.tim, _96i3.cfe, _9boh.cfe,
> > _9bto_Lucene50_0.pos, _6s8a.fnm, _9btr.si, _9bt9.cfs, _9bh8.cfe,
> > _9btg.nvd, _9bqi_3.liv, _5zcy_Lucene50_0.tip, _9boh_6.liv,
> > _98nt_Lucene50_0.tim, _9btt.si, _9bqi.si, _9bsp.si, _9bsp.cfs,
> > _6s8a_1la.liv, _9bn9_8.liv, _6s8a_Lucene50_0.doc, _9bqb.cfs, _9boh.cfs,
> > _9btp.fdx, _5h1s_1wg.liv, _8kbr.fdx, _9bti.nvm, _9bts_Lucene50_0.pos, _
> > 9bts.si, _9btr.nvd, _9bnd_Lucene50_0.pos, _5h1s_Lucene50_0.tim,
> > _9btq.fdt, _9bti.nvd, _9btm_1.liv, _9btn.fdt, _9btp.fnm, _9btg.nvm,
> > _9bu6.cfe, _9btm.cfe, _98nt_Lucene50_0.pos, _9bqq_6.liv,
> > _8kbr_Lucene50_0.tip, _9btq.fdx, _9ayb_c.liv, _5zcy_Lucene50_0.doc,
> > _5zcy.fdt, _6s8a.nvd, _9ayb.cfe, _6s8a_Lucene50_0.tim, _9bh8_l.liv,
> > _17on.fdx, _9btn.fdx, _9btg.si, _5h1s.fdt, _9btp_Lucene50_0.doc,
> > _99gt_2r.liv, _9br2_5.liv, _9bnd.nvm, _9bj7.si, _9bto_Lucene50_0.doc,
> > _9bpm.cfs, _17on_Lucene50_0.doc, _99gt.si, _9btg_Lucene50_0.tim,
> > _9btk.nvd, _9bts_Lucene50_0.tim, _9bqb.cfe, _98nt_Lucene50_0.tip,
> > _9btr.nvm, _98ge.si, _9bnd_4.liv, _9bto.si, _9btq.nvd, _9bnj.cfs,
> > _9btn_Lucene50_0.doc, _9btt.fdt, _17on.si, _9bnj.cfe, _17on_2wi.liv,
> > _9btt_Lucene50_0.doc, _9bqq.si, _9bt9_2.liv, _9btr_Lucene50_0.tim,
> > _9btk.fnm, _9btk.si, _9bn9.cfe, _8kbr_Lucene50_0.pos, _9bt9.cfe,
> > _1

Re: Weird behaviour related to facetting

2016-02-12 Thread Sebastian Geerken
Alessandro,

thank you for the hint. Setting facet.limit to a higher value fixes
the problem.

Regards
Sebastian

On Fr, Feb 12, 2016, Alessandro Benedetti wrote:
> I know it happens sometimes; unfortunately you simply ignored the
> facet.limit parameter ...
> By default you show only the first 100 facet values.
> Showing more is going to reveal the ones you thought were missing
> (but actually were simply not shown).
> 
> Cheers
> 
> On 12 February 2016 at 10:59, Sebastian Geerken  wrote:
> 
> > Hi!
> >
> > I've experienced a strange behaviour with several versions of SOLR
> > (currently testing with 5.4.1, but this effects can also be reproduced
> > with 5.3.1). Some facet values are not returned when querying
> > "*:*", but only when I search for something special, say text "foo".
> >
> > I've stripped down both config/schema and data as far as possible,
> > files are attached (hope this is ok on the list).
> >
> > How to reproduce:
> >
> > Set up a core with the config and schema attached to this post:
> >
> > $ bin/solr start
> > $ bin/solr create_core -c test
> > $ bin/solr stop
> > $ cp solrconfig.xml schema.xml server/solr/test/conf/
> > $ bin/solr start
> >
> > Upload data:
> >
> > $ curl 'http://localhost:8983/solr/test/update?commit=true' -H
> > 'Content-type:application/json' -d @data.json
> >
> > Search for "*:*":
> >
> > $ curl '
> > http://localhost:8983/solr/test/select?q=*%3A*&rows=0&wt=json&indent=true&facet=true&facet.field=tags_hierarchy
> > '
> >
> > The facet value "1/tax/downloads/i/" will not be returned, but it will
> > be returned when searching for "foo" (or any other text):
> >
> > $ curl '
> > http://localhost:8983/solr/test/select?q=foo&rows=0&wt=json&indent=true&facet=true&facet.field=tags_hierarchy
> > '
> >
> > I also noted differences when modifying the data:
> >
> > - Renaming the field "tags_hierarchy" to "tags" seems to fix the
> >   issue.
> > - The same applies to renaming "1/tax/downloads/i/" to "1/tax/d/i/".
> >
> > Is this a known or unknown bug, or did I do something wrong? In the
> > former case: is there a feasible workaround? (Of course, renaming
> > comes to mind.)
> >
> > Thanks in advance!
> >
> > Regards
> > Sebastian
> >
> >
> 
> 
> -- 
> --
> 
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England



Re: optimize requests that fetch 1000 rows

2016-02-12 Thread Jack Krupansky
Thanks for that critical clarification. Try...

1. A different response writer to see if that impacts the clock time.
2. Selectively remove fields from the fl field list to see if some
particular field has some issue.
3. If you simply return only the ID for the document, how fast/slow is that?

How many fields are in fl?
Any function queries in fl?


-- Jack Krupansky

On Fri, Feb 12, 2016 at 4:57 AM, Matteo Grolla 
wrote:

> Hi Jack,
>  tell me if I'm wrong but qtime accounts for search time excluding the
> fetch of stored fields (I have a 90ms qtime and a ~30s time to obtain the
> results on the client on a LAN infrastructure for 300kB response). debug
> explains how much of qtime is used by each search component.
> For me 90ms are ok, I wouldn't spend time trying to make them 50ms, it's
> the ~30s to obtain the response that I'd like to tackle.
>
>
> 2016-02-12 5:42 GMT+01:00 Jack Krupansky :
>
> > Again, first things first... debugQuery=true and see which Solr search
> > components are consuming the bulk of qtime.
> >
> > -- Jack Krupansky
> >
> > On Thu, Feb 11, 2016 at 11:33 AM, Matteo Grolla  >
> > wrote:
> >
> > > virtual hardware, 200ms is taken on the client until response is
> written
> > to
> > > disk
> > > qtime on solr is ~90ms
> > > not great but acceptable
> > >
> > > Is it possible that the method FilenameUtils.splitOnTokens is really so
> > > heavy when requesting a lot of rows on slow hardware?
> > >
> > > 2016-02-11 17:17 GMT+01:00 Jack Krupansky :
> > >
> > > > Good to know. Hmmm... 200ms for 10 rows is not outrageously bad, but
> > > still
> > > > relatively bad. Even 50ms for 10 rows would be considered barely
> okay.
> > > > But... again it depends on query complexity - simple queries should
> be
> > > well
> > > > under 50 ms for decent modern hardware.
> > > >
> > > > -- Jack Krupansky
> > > >
> > > > On Thu, Feb 11, 2016 at 10:36 AM, Matteo Grolla <
> > matteo.gro...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Hi Jack,
> > > > >   response times scale with rows. The relationship doesn't seem
> > > > > linear, but below 400 rows times are much faster.
> > > > > I view query times from solr logs and they are fast
> > > > > the same query with rows = 1000 takes 8s
> > > > > with rows = 10 takes 0.2s
> > > > >
> > > > >
> > > > > 2016-02-11 16:22 GMT+01:00 Jack Krupansky <
> jack.krupan...@gmail.com
> > >:
> > > > >
> > > > > > Are queries scaling linearly - does a query for 100 rows take
> > 1/10th
> > > > the
> > > > > > time (1 sec vs. 10 sec or 3 sec vs. 30 sec)?
> > > > > >
> > > > > > Does the app need/expect exactly 1,000 documents for the query or
> > is
> > > > that
> > > > > > just what this particular query happened to return?
> > > > > >
> > > > > > What does they query look like? Is it complex or use wildcards or
> > > > > function
> > > > > > queries, or is it very simple keywords? How many operators?
> > > > > >
> > > > > > Have you used the debugQuery=true parameter to see which search
> > > > > components
> > > > > > are taking the time?
> > > > > >
> > > > > > -- Jack Krupansky
> > > > > >
> > > > > > On Thu, Feb 11, 2016 at 9:42 AM, Matteo Grolla <
> > > > matteo.gro...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Yonic,
> > > > > > >  after the first query I find 1000 docs in the document
> > cache.
> > > > > > > I'm using curl to send the request and requesting javabin
> format
> > to
> > > > > mimic
> > > > > > > the application.
> > > > > > > gc activity is low
> > > > > > > I managed to load the entire 50GB index in the filesystem
> cache,
> > > > after
> > > > > > that
> > > > > > > queries don't cause disk activity anymore.
> > > > > > > Time improves now queries that took ~30s take <10s. But I hoped
> > > > better
> > > > > > > I'm going to use jvisualvm's sampler to analyze where time is
> > spent
> > > > > > >
> > > > > > >
> > > > > > > 2016-02-11 15:25 GMT+01:00 Yonik Seeley :
> > > > > > >
> > > > > > > > On Thu, Feb 11, 2016 at 7:45 AM, Matteo Grolla <
> > > > > > matteo.gro...@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > Thanks Toke, yes, they are long times, and solr qtime (to
> > > execute
> > > > > the
> > > > > > > > > query) is a fraction of a second.
> > > > > > > > > The response in javabin format is around 300k.
> > > > > > > >
> > > > > > > > OK, That tells us a lot.
> > > > > > > > And if you actually tested so that all the docs would be in
> the
> > > > cache
> > > > > > > > (can you verify this by looking at the cache stats after you
> > > > > > > > re-execute?) then it seems like the slowness is down to any
> of:
> > > > > > > > a) serializing the response (it doesn't seem like a 300K
> > response
> > > > > > > > should take *that* long to serialize)
> > > > > > > > b) reading/processing the response (how fast the client can
> do
> > > > > > > > something with each doc is also a factor...)
> > > > > > > > c) other (GC, network, etc)
> > > > > > > >
> > > > > > > > You can try taking client proces

Re: optimize requests that fetch 1000 rows

2016-02-12 Thread Erick Erickson
I agree with everyone else that this seems very unusual, but here are
some additional possible options:

If (and only if) you're returning "simple" fields (i.e. numerics and strings)
you could consider the Streaming Aggregation stuff. It's built to
return rows without going to disk. The restriction is that it can only
return things that are DocValues fields.
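For illustration, an export-backed streaming request of that sort might look like the following (a sketch only; the collection and field names are invented, and both fields would need docValues):

```
search(big_core,
       q="*:*",
       fl="id,price_f",
       sort="id asc",
       qt="/export")
```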

If it turns out that it's I/O contention, SSDs could help here.

If it's the decompression, writing your own codec has been suggested
as a solution to not compressing. Although you can infer from the fact
that it's not offered OOB to indicate that this isn't a widespread
problem, or at least not painful enough that someone's already
provided that option.

If it's CPU, beefier machines are a possibility. I doubt it's CPU for
300K, but you never know.

I take it that when returning 0 rows, the response time roughly
approximates QTime + a fairly stable number?

Best,
Erick



On Fri, Feb 12, 2016 at 7:10 AM, Jack Krupansky
 wrote:
> Thanks for that critical clarification. Try...
>
> 1. A different response writer to see if that impacts the clock time.
> 2. Selectively remove fields from the fl field list to see if some
> particular field has some issue.
> 3. If you simply return only the ID for the document, how fast/slow is that?
>
> How many fields are in fl?
> Any function queries in fl?
>
>
> -- Jack Krupansky
>
> On Fri, Feb 12, 2016 at 4:57 AM, Matteo Grolla 
> wrote:
>
>> Hi Jack,
>>  tell me if I'm wrong but qtime accounts for search time excluding the
>> fetch of stored fields (I have a 90ms qtime and a ~30s time to obtain the
>> results on the client on a LAN infrastructure for 300kB response). debug
>> explains how much of qtime is used by each search component.
>> For me 90ms are ok, I wouldn't spend time trying to make them 50ms, it's
>> the ~30s to obtain the response that I'd like to tackle.
>>
>>
>> 2016-02-12 5:42 GMT+01:00 Jack Krupansky :
>>
>> > Again, first things first... debugQuery=true and see which Solr search
>> > components are consuming the bulk of qtime.
>> >
>> > -- Jack Krupansky
>> >
>> > On Thu, Feb 11, 2016 at 11:33 AM, Matteo Grolla > >
>> > wrote:
>> >
>> > > virtual hardware, 200ms is taken on the client until response is
>> written
>> > to
>> > > disk
>> > > qtime on solr is ~90ms
>> > > not great but acceptable
>> > >
>> > > Is it possible that the method FilenameUtils.splitOnTokens is really so
>> > > heavy when requesting a lot of rows on slow hardware?
>> > >
>> > > 2016-02-11 17:17 GMT+01:00 Jack Krupansky :
>> > >
>> > > > Good to know. Hmmm... 200ms for 10 rows is not outrageously bad, but
>> > > still
>> > > > relatively bad. Even 50ms for 10 rows would be considered barely
>> okay.
>> > > > But... again it depends on query complexity - simple queries should
>> be
>> > > well
>> > > > under 50 ms for decent modern hardware.
>> > > >
>> > > > -- Jack Krupansky
>> > > >
>> > > > On Thu, Feb 11, 2016 at 10:36 AM, Matteo Grolla <
>> > matteo.gro...@gmail.com
>> > > >
>> > > > wrote:
>> > > >
>> > > > > Hi Jack,
>> > > > >   response time scale with rows. Relationship doens't seem
>> linear
>> > > but
>> > > > > Below 400 rows times are much faster,
>> > > > > I view query times from solr logs and they are fast
>> > > > > the same query with rows = 1000 takes 8s
>> > > > > with rows = 10 takes 0.2s
>> > > > >
>> > > > >
>> > > > > 2016-02-11 16:22 GMT+01:00 Jack Krupansky <
>> jack.krupan...@gmail.com
>> > >:
>> > > > >
>> > > > > > Are queries scaling linearly - does a query for 100 rows take
>> > 1/10th
>> > > > the
>> > > > > > time (1 sec vs. 10 sec or 3 sec vs. 30 sec)?
>> > > > > >
>> > > > > > Does the app need/expect exactly 1,000 documents for the query or
>> > is
>> > > > that
>> > > > > > just what this particular query happened to return?
>> > > > > >
>> > > > > > What does they query look like? Is it complex or use wildcards or
>> > > > > function
>> > > > > > queries, or is it very simple keywords? How many operators?
>> > > > > >
>> > > > > > Have you used the debugQuery=true parameter to see which search
>> > > > > components
>> > > > > > are taking the time?
>> > > > > >
>> > > > > > -- Jack Krupansky
>> > > > > >
>> > > > > > On Thu, Feb 11, 2016 at 9:42 AM, Matteo Grolla <
>> > > > matteo.gro...@gmail.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hi Yonic,
>> > > > > > >  after the first query I find 1000 docs in the document
>> > cache.
>> > > > > > > I'm using curl to send the request and requesting javabin
>> format
>> > to
>> > > > > mimic
>> > > > > > > the application.
>> > > > > > > gc activity is low
>> > > > > > > I managed to load the entire 50GB index in the filesystem
>> cache,
>> > > > after
>> > > > > > that
>> > > > > > > queries don't cause disk activity anymore.
>> > > > > > > Time improves now queries that took ~30s take <10s. But I hoped
>> > > > better
>> > > > > > > I'm going to use jvisualvm's sampler to analyze where time is
>> > spent
>> > > > > > >
>> > > > >

Re: slave is getting full synced every polling

2016-02-12 Thread Erick Erickson
bq: What I have done when the problem started, I changed slave to master and
master to slave.

OK, other things aside, if you're really saying that every time you
switch the slave
and master around and restart, you get a full sync then I'd reply
"don't do that". Why
are you switching slave and master? The whole purpose of replication is to have
one master that essentially is _always_ the master. Essentially the
slave asks the
master "is my index up to date" and I'm not sure how that logic would handle
going back and forth. Theoretically, if all the files in the index
were exactly identical
it wouldn't replicate when switched, but I can't say for certain that this is
enforced.

I think you're trying to accomplish some particular objective but
going about it in
a way that is causing you grief. This smells like an XY problem, i.e.
you're trying
to accomplish X and asking about Y where Y is the index replication. What's X?
What is the purpose of switching the master and slave and how often do you do
it and why?
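For what it's worth, if the underlying goal is to be able to promote either node, the usual pattern is a single solrconfig.xml that defines both roles and toggles them with system properties at startup, rather than swapping config files via symlinks. A sketch along these lines (the property names are illustrative, passed as -D flags; check the replication docs for your version, and the masterUrl/pollInterval values here just mirror the config posted earlier in the thread):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <!-- Active only when started with -Denable.master=true -->
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">optimize</str>
  </lst>
  <!-- Active only when started with -Denable.slave=true -->
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://master:8983/solr/big_core/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

With this in place, promoting a node is a restart with different -D properties and no config swap, which avoids the question of how the replication logic handles a master/slave flip over the same index.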

Best,
Erick

On Fri, Feb 12, 2016 at 6:46 AM, Alessandro Benedetti
 wrote:
> Have you customised the merge factor ?
> Is it aggressive ?
> In case a lot of merge happens, you can potentially incur in a big trasnfer
> of files each replication .
> You need to check the segments in the slave every minutes.
> When the replication is triggered what are the difference from the Master
> index ( in term of segments) and the slave ?
>
> Cheers
>
> On 12 February 2016 at 12:03, Novin Novin  wrote:
>
>> sorry core name is wmsapp_analysis which is big core
>>
>> On Fri, 12 Feb 2016 at 12:01 Novin Novin  wrote:
>>
>> > Well It started again.
>> >
>> > Below is are the errors from solr logging on admin ui.
>> > Log error message in master
>> > 2/12/2016, 11:39:24 AM null:java.lang.IllegalStateException: file:
>> > MMapDirectory@
>> /var/solr/data/wmsapp_analysis/data/index.20160211204900750
>> > lockFactory=org.apache.lucene.store.NativeFSLockFactory@56639f83 appears
>> > both in delegate and in cache: cache=[_9bu7_Lucene50_0.pos,
>> > _9bua_Lucene50_0.tip, _9bty.fdt, _9bu7.nvd, _9bu1.nvd, _9bu0.nvm,
>> > _9bu4_Lucene50_0.tim, _8kbr_uu.liv, _9bu7_Lucene50_0.doc,
>> > _9bu1_Lucene50_0.tip, _9bu9.fnm, _9bty.fdx, _9btv.fdx, _9bu5.nvm,
>> > _9bu4_Lucene50_0.pos, _9bu5.fnm, _9bu3.si, _9bua_Lucene50_0.tim,
>> > _9bty_Lucene50_0.pos, _9bu0.si, _9btw_Lucene50_0.tim,
>> > _9bu0_Lucene50_0.tim, _9bu2.nvm, _9btv_Lucene50_0.pos, _9btv.nvd,
>> > _9bu3_Lucene50_0.tip, _9bua_Lucene50_0.doc, _9bu7_Lucene50_0.tip,
>> > _9btw.nvm, _9bua.fdx, _9bu4.nvm, _9bu9_Lucene50_0.tim, _9bu4_1.liv,
>> > _9bu7.nvm, _9bu3_1.liv, _9bu0.fnm, _9bu5_Lucene50_0.tim, _9btx.fnm,
>> > _9bu2.fdx, _9bu4.fdt, _9bu2_Lucene50_0.tip, _9bu9.fdx,
>> > _9bu9_Lucene50_0.pos, _9bu7.fdt, _9bu9.nvd, _9btx_1.liv, _99gt_2s.liv,
>> > _9btw.nvd, _9bu3_Lucene50_0.doc, _9bu2.fnm, _9bua_Lucene50_0.pos,
>> > _9bu9.nvm, _9btx.nvm, _9btw_Lucene50_0.tip, _9bu1.nvm,
>> > _9bu4_Lucene50_0.doc, _9bu9_1.liv, _9bu1.fnm, _9btu.cfs,
>> > _9bu8_Lucene50_0.tip, _9bua.nvm, _9btx_Lucene50_0.doc, _9btu.si,
>> > _9bu0.fdt, _9bu7.si, _9btx_Lucene50_0.tip, _9btw.si, _9bu8.fdx,
>> > _9bu0_Lucene50_0.doc, _9bu3.nvm, _9btz_Lucene50_0.tip,
>> > _9bu3_Lucene50_0.tim, _9btz.fdt, _9btw.fdt, _9bu2.si, _9bu4.si,
>> > _9btx.nvd, _9bu4.fnm, _9btv_1.liv, _9btz_Lucene50_0.doc, _9bpm_7.liv,
>> > _9btx_Lucene50_0.pos, _9bty.fnm, _9btw_Lucene50_0.doc, _9btv.fdt,
>> > _9bu2_Lucene50_0.doc, _9btu.cfe, _9bu3.nvd, _9btv.si, _9bu8.nvm,
>> > _9btx.fdt, _9bu5.si, _9bu5.fdt, _9bu2.nvd, _9bu3.fdx, _9btv.fnm,
>> > _9bu5.fdx, _9btz.fnm, _9bu3_Lucene50_0.pos, _9bu9_Lucene50_0.tip,
>> > _9bu1.fdt, _9bu0_Lucene50_0.tip, _9bty_Lucene50_0.tim,
>> > _9btx_Lucene50_0.tim, _9bt9_3.liv, _9bty.si, _9bu2.fdt, _9bu9.fdt,
>> > _9bu2_Lucene50_0.pos, _9bua.fdt, _9bu9_Lucene50_0.doc, _9bu4.fdx,
>> > _9bu5_Lucene50_0.pos, _9bu4.nvd, _9btv_Lucene50_0.tim, _9bty.nvd, _
>> 9bu8.si,
>> > _9bu5_Lucene50_0.doc, _9bu9.si, _9btw.fnm, _9bu3.fnm, _9bh8_m.liv,
>> > _9bu3.fdt, _9bu5.nvd, _9bua.fnm, _9btw_1.liv, _9bu8_Lucene50_0.pos,
>> > _9btw_Lucene50_0.pos, _9bty_Lucene50_0.doc, _9bu6_1.liv, _9bu7.fnm,
>> > _5zcy_1kx.liv, _9bu7.fdx, _9bu5_1.liv, _9bua.nvd, _9bty_Lucene50_0.tip,
>> > _9btz.fdx, _9bu0_Lucene50_0.pos, _9bu1_Lucene50_0.doc, _9btx.fdx,
>> > _9btv_Lucene50_0.tip, _9bn9_9.liv, _9bu0.fdx, _9bu8.nvd,
>> > _9bu1_Lucene50_0.pos, _9bua.si, _9bu1.si, _9bu8_Lucene50_0.tim,
>> > _9btv_Lucene50_0.doc, _9bu2_Lucene50_0.tim, _9bu1_Lucene50_0.tim,
>> > _9bu8.fnm, _9bu4_Lucene50_0.tip, _9btx.si, _98nt_5c.liv, _9btz.nvd,
>> > _9btw.fdx, _9btv.nvm, _9bu7_Lucene50_0.tim, pending_segments_nk8,
>> > _9btz_Lucene50_0.tim, _9btz.si, _9bu8_Lucene50_0.doc,
>> > _9bu5_Lucene50_0.tip, _9btz_Lucene50_0.pos, _9btz.nvm, _9bty.nvm,
>> > _9bu0.nvd, _9bu1.fdx, _9bu8.fdt],delegate=[_9br2.cfe,
>> pending_segments_nk8,
>> > _9bnd.fnm, _9btn_Lucene50_0.tim, _96i3.cfe, _9boh.cfe,
>> > _9bto_Lucene50_0.pos, _6s8a.fnm, _9btr.si, _9bt9.cfs, _9bh8.cfe,
>> > _9btg.nvd, _9bqi_

Re: Searching special characters

2016-02-12 Thread Erick Erickson
Also look at the admin/analysis page to see the effects
of various filters in your analysis chain. It's very likely
that the * is not even _in_ the index.

Here is a partial list of elements that _may_ be in your
analysis chain:
https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions
and you're using some of them whether you know it
or not as the schemas that come with Solr for fields like
text_general etc. are composed of some of these.
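
As an illustrative sketch only (the field name, tokenizer choice, and regex pattern here are assumptions, not anything from your actual schema), a chain combining the filters discussed in this thread might look like:

```xml
<!-- Hypothetical fieldType; adapt the pattern and filter order
     to your own schema and use case. -->
<fieldType name="text_cleaned" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- strip leading/trailing quotes, slashes and asterisks -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^[&quot;/*]+|[&quot;/*]+$" replacement=""/>
    <!-- split intra-word delimiters, e.g. Wi-Fi -> wi, fi (and wifi) -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The admin/analysis page will show, token by token, what each of these filters does to a sample input.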

Best,
Erick

On Fri, Feb 12, 2016 at 2:02 AM, Modassar Ather  wrote:
> These special characters can be removed if they are at the beginning or end,
> or can be taken care of by the relevant filters, depending on the schema
> defined.
> E.g. "Audit"/*Audit should be found by the query Audit, so I see no reason to
> index the "/* part of the content. You can use PatternReplaceFilter to replace
> these special characters.
> If the special character is in the middle of a word, e.g. Wi-Fi, then these
> types of terms can be taken care of by WordDelimiterFilter.
>
> Note that the special character handling may vary based on use cases.
>
> Best,
> Modassar
>
> On Fri, Feb 12, 2016 at 3:09 PM, Anil  wrote:
>
>> Thanks for the quick response.
>>
>> Should these be treated differently during indexing?
>>
>> I have tried *\"Audit* which is also returning results for *Audit*, which is
>> incorrect. What do you say?
>>
>> On 12 February 2016 at 15:07, Modassar Ather 
>> wrote:
>>
>> > You can search them by escaping with backslash.
>> >
>> > Best,
>> > Modassar
>> >
>>


Re: Need to move on SOlr cloud (help required)

2016-02-12 Thread Erick Erickson
bq: in case of solrcloud architecture we need not to have load balancer

First, my comment about a load balancer was for the master/slave
architecture where the load balancer points to the slaves.

Second, for SolrCloud you don't necessarily need a load balancer: if you're
using a SolrJ client, requests are distributed across the replicas via an
internal load balancer.

Best,
Erick

On Thu, Feb 11, 2016 at 9:19 PM, Midas A  wrote:
> Erick ,
>
> bq: We want the hits on solr servers to be distributed
>
> True, this happens automatically in SolrCloud, but a simple load
> balancer in front of master/slave does the same thing.
>
> Midas: in the case of a SolrCloud architecture, do we not need a load
> balancer?
>
> On Thu, Feb 11, 2016 at 11:42 PM, Erick Erickson 
> wrote:
>
>> bq: We want the hits on solr servers to be distributed
>>
>> True, this happens automatically in SolrCloud, but a simple load
>> balancer in front of master/slave does the same thing.
>>
>> bq: what if master node fail what should be our fail over strategy  ?
>>
>> This is, indeed one of the advantages for SolrCloud, you don't have
>> to worry about this any more.
>>
>> Another benefit (and you haven't touched on whether this matters)
>> is that in SolrCloud you do not have the latency of polling and
>> replicating from master to slave, in other words it supports Near Real
>> Time.
>>
>> This comes at some additional complexity however. If you have
>> your master node failing often enough to be a problem, you have
>> other issues ;)...
>>
>> And the recovery strategy if the master fails is straightforward:
>> 1> pick one of the slaves to be the master.
>> 2> update the other nodes to point to the new master
>> 3> re-index the docs from before the old master failed to the new master.
>>
>> You can use system variables to not even have to manually edit all of the
>> solrconfig files, just supply different -D parameters on startup.
>>
>> Best,
>> Erick
>>
>> On Wed, Feb 10, 2016 at 10:39 PM, kshitij tyagi
>>  wrote:
>> > @Jack
>> >
>> > Currently we have around 55,00,000 (5.5 million) docs.
>> >
>> > It's not about load on one node; we have load on different nodes at
>> > different times, as our traffic is huge: around 60k users at a given
>> > point in time.
>> >
>> > We want the hits on solr servers to be distributed so we are planning to
>> > move to SolrCloud as it would be fault tolerant.
>> >
>> >
>> >
>> > On Thu, Feb 11, 2016 at 11:10 AM, Midas A  wrote:
>> >
>> >> hi,
>> >> what if master node fail what should be our fail over strategy  ?
>> >>
>> >> On Wed, Feb 10, 2016 at 9:12 PM, Jack Krupansky <
>> jack.krupan...@gmail.com>
>> >> wrote:
>> >>
>> >> > What exactly is your motivation? I mean, the primary benefit of
>> SolrCloud
>> >> > is better support for sharding, and you have only a single shard. If
>> you
>> >> > have no need for sharding and your master-slave replicated Solr has
>> been
>> >> > working fine, then stick with it. If only one machine is having a load
>> >> > problem, then that one node should be replaced. There are indeed
>> plenty
>> >> of
>> >> > good reasons to prefer SolrCloud over traditional master-slave
>> >> replication,
>> >> > but so far you haven't touched on any of them.
>> >> >
>> >> > How much data (number of documents) do you have?
>> >> >
>> >> > What is your typical query latency?
>> >> >
>> >> >
>> >> > -- Jack Krupansky
>> >> >
>> >> > On Wed, Feb 10, 2016 at 2:15 AM, kshitij tyagi <
>> >> > kshitij.shopcl...@gmail.com>
>> >> > wrote:
>> >> >
>> >> > > Hi,
>> >> > >
>> >> > > We are currently using solr 5.2 and I need to move on solr cloud
>> >> > > architecture.
>> >> > >
>> >> > > As of now we are using 5 machines :
>> >> > >
>> >> > > 1. I am using 1 master where we are indexing our data.
>> >> > > 2. I replicate my data on other machines
>> >> > >
>> >> > > One or the other machine keeps on showing high load so I am
>> planning to
>> >> > > move to SolrCloud.
>> >> > >
>> >> > > Need help on following :
>> >> > >
>> >> > > 1. What should be my architecture in case of 5 machines to keep
>> >> > (zookeeper,
>> >> > > shards, core).
>> >> > >
>> >> > > 2. How to add a node.
>> >> > >
>> >> > > 3. what are the exact steps/process I need to follow in order to
>> change
>> >> > to
>> >> > > solr cloud.
>> >> > >
>> >> > > 4. How indexing will work in solr cloud as of now I am using mysql
>> >> query
>> >> > to
>> >> > > get the data on master and then index the same (how I need to change
>> >> this
>> >> > > in case of solr cloud).
>> >> > >
>> >> > > Regards,
>> >> > > Kshitij
>> >> > >
>> >> >
>> >>
>>


un-Boosting some Docs at index time

2016-02-12 Thread Steven White
Hi everyone,

I'm trying to figure out if this is possible, if so how do I do it.

I'm indexing records from my database.  The Solr doc has 2 basic fields:
the ID and the Data field.  I lump the data of each field from the record
into Solr's Data field.  At search time, I search on this single field Data.

My need is as follows: given how I'm indexing my data, at index time, how
do I un-boost some Solr doc?  I know which Solr doc will need to be
lower-boosted based on a field value in the record that I read off the DB.

Thanks

Steve


Re: un-Boosting some Docs at index time

2016-02-12 Thread Erick Erickson
You can use index-time boosting on a per-field basis, here's a place to start:
https://lucidworks.com/blog/2011/12/14/options-to-tune-documents-relevance-in-solr/

Does that work?

Best,
Erick

On Fri, Feb 12, 2016 at 8:30 AM, Steven White  wrote:
> Hi everyone,
>
> I'm trying to figure out if this is possible, if so how do I do it.
>
> I'm indexing records from my database.  The Solr doc has 2 basic fields:
> the ID and the Data field.  I lump the data of each field from the record
> into Solr's Data field.  At search time, I search on this single field Data.
>
> My need is as follows: given how I'm indexing my data, at index time, how
> do I un-boost some Solr doc?  I know which Solr doc will need to be
> lower-boosted based on a field value in the record that I read off the DB.
>
> Thanks
>
> Steve


Re: un-Boosting some Docs at index time

2016-02-12 Thread Steven White
Thanks Erick!!

Yes, SolrInputDocument.setDocumentBoost() is what I'm looking for.  I was
under the impression boosting is on fields only.
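
For completeness, the same whole-document boost can also be expressed in Solr's update XML; this fragment is only a sketch, with made-up IDs and a made-up boost value:

```xml
<add>
  <!-- boost below 1.0 lowers this document's index-time boost -->
  <doc boost="0.4">
    <field name="ID">rec-123</field>
    <field name="Data">lumped record data ...</field>
  </doc>
  <doc> <!-- default boost of 1.0 -->
    <field name="ID">rec-124</field>
    <field name="Data">other record data ...</field>
  </doc>
</add>
```

Note that index-time boosting requires omitNorms="false" on the fields involved, since the boost is folded into the field norms.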

Steve

On Fri, Feb 12, 2016 at 11:36 AM, Erick Erickson 
wrote:

> You can use index-time boosting on a per-field basis, here's a place to
> start:
>
> https://lucidworks.com/blog/2011/12/14/options-to-tune-documents-relevance-in-solr/
>
> Does that work?
>
> Best,
> Erick
>
> On Fri, Feb 12, 2016 at 8:30 AM, Steven White 
> wrote:
> > Hi everyone,
> >
> > I'm trying to figure out if this is possible, if so how do I do it.
> >
> > I'm indexing records from my database.  The Solr doc has 2 basic fields:
> > the ID and the Data field.  I lump the data of each field from the record
> > into Solr's Data field.  At search time, I search on this single field
> Data.
> >
> > My need is as follows: given how I'm indexing my data, at index time, how
> > do I un-boost some Solr doc?  I know which Solr doc will need to be
> > lower-boosted based on a field value in the record that I read off the
> DB.
> >
> > Thanks
> >
> > Steve
>


Re: edismax query parser - pf field question

2016-02-12 Thread Senthil
It does not work with a comma either. In fact, no DisjunctionMaxQuery is added
for any of the pf fields if I add a comma.

(+((DisjunctionMaxQuery((P_NAME:refriger^1.5 |
CategoryName:refrigerator)~1.0) DisjunctionMaxQuery((P_NAME:water^1.5 |
CategoryName:water)~1.0) DisjunctionMaxQuery((P_NAME:filter^1.5 |
CategoryName:filter)~1.0))~2) *()*)/no_coord

If I change the defType to dismax instead of edismax, I see phrase queries
for both pf fields.

(+((DisjunctionMaxQuery((P_NAME:refriger^1.5 |
CategoryName:refrigerator)~1.0) DisjunctionMaxQuery((P_NAME:water^1.5 |
CategoryName:water)~1.0) DisjunctionMaxQuery((P_NAME:filter^1.5 |
CategoryName:filter)~1.0))~2) *DisjunctionMaxQuery((P_NAME:"refriger water
filter" | CategoryName:refrigerator water filter)~1.0)*)/no_coord

Note: CategoryName is string field and P_NAME is text_en field.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/edismax-query-parser-pf-field-question-tp4256845p4256987.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: slave is getting full synced every polling

2016-02-12 Thread Novin Novin
you're trying to accomplish X and asking about Y where Y is the index
replication. What's X? What is the purpose of switching the master and slave
and how often do you do it and why?

I think I didn't explain it quite properly. I have a situation in which
data is getting indexed every 20 seconds or less, and I can't lose data while
indexing. The website uses search heavily, and if I have to restart my Solr
machine because of a kernel update, a network problem, or some other reason
(nothing particular in mind), the restart takes a while. While it is
restarting, somebody may be using a website feature which requires data to
be indexed and results displayed after the request completes. In this
situation, if I lose the data I have to do a full reindex (because I am using
SolrJ it takes 3 to 4 hours, so this is not ideal). That's why I am switching
between master and slave.

So what I do here: when the slave is synced, I make it master2. At this point
I have two masters, 1 and 2; once master2 is in place it is used by the
website, and then I convert master1 to a slave of master2. I don't really
lose data because it is done by a script and finishes in a couple of seconds.

How often do I do this?
Not very often, once every two or three months.

Let me know if you need anything else or if there is something I didn't
explain properly.


@Alessandro Benedetti

Have you customized the merge factor?
Nope, I am always using merge factor 10.

Is it aggressive?
I am not sure what you meant by aggressive here.

When the replication is triggered, what are the differences from the master
index (in terms of segments) to the slave?

What I checked this time: it creates a new directory,
index.20160213120345, which is empty. But there is another directory named
index.20160213120322, and more than 90% of its index files are identical to
those in the master's index directory.


I just want to say that the time you guys are taking to help me out with
this problem is highly appreciated.

Best regards,
Novin





On Fri, 12 Feb 2016 at 16:08 Erick Erickson  wrote:

> bq: What I have done when the problem started, I changed slave to master
> and
> master to slave.
>
> OK, other things aside, if you're really saying that every time you
> switch the slave
> and master around and restart, you get a full sync then I'd reply
> "don't do that". Why
> are you switching slave and master? The whole purpose of replication is to
> have
> one master that essentially is _always_ the master. Essentially the
> slave asks the
> master "is my index up to date" and I'm not sure how that logic would
> handle
> going back and forth. Theoretically, if all the files in the index
> were exactly identical
> it wouldn't replicate when switched, but I can't say for certain that this
> is
> enforced.
>
> I think you're trying to accomplish some particular objective but
> going about it in
> a way that is causing you grief. This smells like an XY problem, i.e.
> you're trying
> to accomplish X and asking about Y where Y is the index replication.
> What's X?
> What is the purpose of switching the master and slave and how often do you
> do
> it and why?
>
> Best,
> Erick
>
> On Fri, Feb 12, 2016 at 6:46 AM, Alessandro Benedetti
>  wrote:
> > Have you customised the merge factor ?
> > Is it aggressive ?
> > In case a lot of merge happens, you can potentially incur in a big
> trasnfer
> > of files each replication .
> > You need to check the segments in the slave every minute.
> > When the replication is triggered, what are the differences from the Master
> > index (in terms of segments) to the slave?
> >
> > Cheers
> >
> > On 12 February 2016 at 12:03, Novin Novin  wrote:
> >
> >> sorry core name is wmsapp_analysis which is big core
> >>
> >> On Fri, 12 Feb 2016 at 12:01 Novin Novin  wrote:
> >>
> >> > Well It started again.
> >> >
> >> > Below are the errors from the solr logging on the admin UI.
> >> > Log error message in master
> >> > 2/12/2016, 11:39:24 AM null:java.lang.IllegalStateException: file:
> >> > MMapDirectory@
> >> /var/solr/data/wmsapp_analysis/data/index.20160211204900750
> >> > lockFactory=org.apache.lucene.store.NativeFSLockFactory@56639f83
> appears
> >> > both in delegate and in cache: cache=[_9bu7_Lucene50_0.pos,
> >> > _9bua_Lucene50_0.tip, _9bty.fdt, _9bu7.nvd, _9bu1.nvd, _9bu0.nvm,
> >> > _9bu4_Lucene50_0.tim, _8kbr_uu.liv, _9bu7_Lucene50_0.doc,
> >> > _9bu1_Lucene50_0.tip, _9bu9.fnm, _9bty.fdx, _9btv.fdx, _9bu5.nvm,
> >> > _9bu4_Lucene50_0.pos, _9bu5.fnm, _9bu3.si, _9bua_Lucene50_0.tim,
> >> > _9bty_Lucene50_0.pos, _9bu0.si, _9btw_Lucene50_0.tim,
> >> > _9bu0_Lucene50_0.tim, _9bu2.nvm, _9btv_Lucene50_0.pos, _9btv.nvd,
> >> > _9bu3_Lucene50_0.tip, _9bua_Lucene50_0.doc, _9bu7_Lucene50_0.tip,
> >> > _9btw.nvm, _9bua.fdx, _9bu4.nvm, _9bu9_Lucene50_0.tim, _9bu4_1.liv,
> >> > _9bu7.nvm, _9bu3_1.liv, _9bu0.fnm, _9bu5_Lucene50_0.tim, _9btx.fnm,
> >> > _9bu2.fdx, _9bu4.fdt, _9bu2_Lucene50_0.tip, _9bu9.fdx,
> >> > _9bu9_Lucene50_0.pos, _9bu7.fdt, _9bu9.nvd, _9btx_1.liv, _99gt_2s.liv,
> 

Re: query knowledge graph

2016-02-12 Thread Alexandre Rafalovitch
The last Lucene/Solr Revolution had a number of presentations on
relevancy. I would recommend watching them as a first step. They are
on YouTube under Lucidworks channel.

There is also an early-release book from Manning called Relevant
Search which you will find very useful.

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 12 February 2016 at 23:27, Midas A  wrote:
> Please suggest how to create a query knowledge graph for an e-commerce
> application.
>
> Please describe in detail. Our motive is to improve relevancy. We are from a
> LAMP background.


Re: slave is getting full synced every polling

2016-02-12 Thread Shawn Heisey
On 2/12/2016 11:47 AM, Novin Novin wrote:
> I think I didn't explain it quit properly. So I have situation in which
> data is getting index every 20 seconds or less and I can't loose data while
> indexing. I use searching a lot in website, if I have to restart my solr
> machine because of kernel update or some network problem or some another
> reason (not really in top of my head). It takes a while to restart, while
> it is restarting some body is using website feature which require data to
> be indexed and display results after request completed. In this situation,
> if I loose the data I have to do full index (because I am using solrj it
> takes 3 to 4 hours, so this not ideal). That's why I am doing switching 
> between
> master and slave.

If you switched to SolrCloud, you wouldn't have to worry about switching
masters -- there *are* no masters or slaves.  With a proper replicated
setup, any single machine in the cloud can go down, or have maintenance
performed, with no visible impact on SolrJ clients using
CloudSolrClient.  When the Solr instance comes back up, it will
automatically resync from the cloud.

If you're using non-SolrJ clients, you'd just need to put a load
balancer in front of your SolrCloud cluster and there would be no
downtime for failures or maintenance.

SolrCloud is a little bit more difficult to set up initially, and
requires at least three physical servers, but once it's running, it is
typically *easier* to manage than master/slave.
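
As a rough sketch of that initial setup (host names and the collection name are placeholders), a Solr 5.x cloud pointed at an external three-node ZooKeeper ensemble can be brought up with:

```shell
# On each Solr node: start in cloud mode, pointing at the ZooKeeper ensemble
bin/solr start -c -z "zk1:2181,zk2:2181,zk3:2181"

# Once the nodes are up, create a replicated single-shard collection
bin/solr create -c big_core -shards 1 -replicationFactor 3
```

With replicationFactor 3, any single node can go down and the collection stays fully available.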

Thanks,
Shawn



Re: query knowledge graph

2016-02-12 Thread Jack Krupansky
"knowledge graph" is kind of vague - what did you have in mind? An example
would help.

-- Jack Krupansky

On Fri, Feb 12, 2016 at 7:27 AM, Midas A  wrote:

> Please suggest how to create a query knowledge graph for an e-commerce
> application.
>
> Please describe in detail. Our motive is to improve relevancy. We are from a
> LAMP background.
>


Re: slave is getting full synced every polling

2016-02-12 Thread Erick Erickson
If you have to stay on master/slave, then the full replication when
you do this switch is probably just a price you'll have to pay. The
indexes are different so to be on the safe side Solr will replicate
the whole thing.

Is it really that much of a problem?

As Shawn says, though, much of this would be simpler with
SolrCloud.

Best,
Erick

On Fri, Feb 12, 2016 at 1:06 PM, Shawn Heisey  wrote:
> On 2/12/2016 11:47 AM, Novin Novin wrote:
>> I think I didn't explain it quit properly. So I have situation in which
>> data is getting index every 20 seconds or less and I can't loose data while
>> indexing. I use searching a lot in website, if I have to restart my solr
>> machine because of kernel update or some network problem or some another
>> reason (not really in top of my head). It takes a while to restart, while
>> it is restarting some body is using website feature which require data to
>> be indexed and display results after request completed. In this situation,
>> if I loose the data I have to do full index (because I am using solrj it
>> takes 3 to 4 hours, so this not ideal). That's why I am doing switching 
>> between
>> master and slave.
>
> If you switched to SolrCloud, you wouldn't have to worry about switching
> masters -- there *are* no masters or slaves.  With a proper replicated
> setup, any single machine in the cloud can go down, or have maintenance
> performed, with no visible impact on SolrJ clients using
> CloudSolrClient.  When the Solr instance comes back up, it will
> automatically resync from the cloud.
>
> If you're using non-SolrJ clients, you'd just need to put a load
> balancer in front of your SolrCloud cluster and there would be no
> downtime for failures or maintenance.
>
> SolrCloud is a little bit more difficult to set up initially, and
> requires at least three physical servers, but once it's running, it is
> typically *easier* to manage than master/slave.
>
> Thanks,
> Shawn
>


boolean query with score and with out score

2016-02-12 Thread sara hajili
hi, I have a boolean query
like this:

query = caption:apple Or caption:bannana^1.0003  OR
caption:pineapple^1.0023

and I get a result like

doc1
doc2
doc3

but this result does not satisfy me at all,
because I have docs that contain some of these terms but I did not get those
docs.

but when I change my query to:

query = caption:apple Or caption:bannana  OR caption:pineapple

I get the appropriate result: all docs that have even one of these terms.

why does the boolean query behave differently with and without boost values?!


Re: boolean query with score and with out score

2016-02-12 Thread Erik Hatcher
What are the parsed queries from debug=true?  Maybe it's an Or/OR thing?
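
One detail worth checking: the standard Lucene query syntax treats boolean operators case-sensitively, so a mixed-case "Or" is parsed as an ordinary search term rather than an operator (edismax can optionally accept fully lowercase operators via its lowercaseOperators parameter, but mixed case is safest avoided). A request with consistent uppercase operators and debug output enabled would look like:

```
q=caption:apple OR caption:bannana^1.0003 OR caption:pineapple^1.0023&debug=true
```

The parsedquery section of the debug output then shows exactly which clauses the parser produced.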

> On Feb 12, 2016, at 23:47, sara hajili  wrote:
> 
> hi i have a Boolean query
> like this
> query = caption:apple Or caption:bannana^1.0003  OR
> caption:pineapple^1.0023
> and get a result like
> doc1
> doc2
> doc3
> 
> but this result does not satisfy me at all.
> because i had a doc that contain some of this term but i did not get these
> docs.
> 
> but when i change my query to :
> query = caption:apple Or caption:bannana  OR caption:pineapple
> i get appropriate result i get all docs that have even one of this terms.
> 
> why Boolean query with score and with out score has a different manner?!