Classes in solr_home /lib cannot import from solr/dist
I've got an extension jar that contains a class which extends org.apache.solr.handler.dataimport.DataSource. It only works if it's within the solr/dist folder. When it's stored in the lib/ folder within Solr home instead, Solr cannot find the parent class when it tries to load mine:

    Exception in thread "Thread-69" java.lang.NoClassDefFoundError: org/apache/solr/handler/dataimport/DataSource
            at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:374)
            at org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:102)
    Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.dataimport.DataSource

The classes in the lib folder don't have access to the classes in the dist folder on their classpath when they are loaded.

I'd like to keep my Solr install separate from my configs/plugins/indexes, so I want to avoid putting the jar into the dist folder unless I absolutely have to.

Is this by design? Is there some kind of configuration somewhere I can tweak to get this to work?

Cheers,

Callum L.
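P.S. For anyone unfamiliar with the extension point: a minimal sketch of the kind of subclass involved, assuming the DIH API as of Solr 5.x (the package and class names here are made up):

    package com.example.dih;

    import java.util.Iterator;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.DataSource;

    // This is the shape of the class that fails to load: its parent,
    // DataSource, lives in the solr-dataimporthandler jar under solr/dist.
    public class MyDataSource extends DataSource<Iterator<Map<String, Object>>> {

        @Override
        public void init(Context context, Properties initProps) {
            // read connection settings from initProps here
        }

        @Override
        public Iterator<Map<String, Object>> getData(String query) {
            // fetch rows for the given DIH query
            return null;
        }

        @Override
        public void close() {
            // release any resources held by this source
        }
    }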
Re: Classes in solr_home /lib cannot import from solr/dist
That's what I did. My solrconfig.xml has <lib> directives for both jars (I've hardcoded the version numbers for now to get regexes out of the picture); a sketch of their general shape is below the quoted reply. There are no warnings whatsoever about not finding the jars, and the jars themselves are in the right order (the second depends on the first).

If I move the data import handler jar to the ${solr.solr.home}/lib/ folder then everything works. This implies that the solr-dataimporthandler jar isn't being included properly, but I've checked so many times that it's correct. I can use a full absolute path, without solr.install.dir or solr.solr.home, and it still does not work. The permissions and ownership on the two jar files are identical; if Solr can load one, it should be able to load the other.

On Thu, Jan 14, 2016 at 2:19 PM, sara hajili wrote:
> hi Callum.
> You can create a directory for your jar file anywhere, and you must set the
> jar file location in a <lib> tag in solrConfig.xml.
> Be careful to add your lib location after the default <lib> tags in the
> solr config, because sometimes your jar needs classes that Solr must load
> first; load your jar after Solr's own classes so you don't face a
> ClassNotFoundException.
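For reference, the directives in play are of this general shape; the paths and version numbers here are illustrative, not my actual config:

    <!-- the stock dataimporthandler jar from dist/ must load first... -->
    <lib path="${solr.install.dir}/dist/solr-dataimporthandler-5.3.1.jar"/>
    <!-- ...followed by the extension jar that depends on it -->
    <lib path="${solr.solr.home}/plugins/my-datasource-1.0.jar"/>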
Re: Classes in solr_home /lib cannot import from solr/dist
Good to know Solr already loads them; that removed a bunch of lines from my solrconfig.xml. Having to copy the required jars from dist/ to lib/ isn't ideal, but if that's the only solution then at least I can stop searching and figure out how best to deal with this limitation.

I assume the reason for this is that the libs in solr_home/lib are loaded at runtime? I don't know much about how this works in Java, but I'm guessing Solr can access the classes in those jars but not the other way around?

Thanks for your help guys.

On Thu, Jan 14, 2016 at 5:03 PM, Shawn Heisey wrote:
> If you're going to put jars in $SOLR_HOME/lib, then you should *only*
> put jars in that directory, and NOT load jars explicitly. The <lib>
> directives should not be used in solrconfig.xml when jars are loaded
> from this directory, because Solr will automatically load jars from this
> location and make them available to all cores.
>
> If moving all your extra jars (including things like the dataimport jar)
> to $SOLR_HOME/lib and taking out jar loading in solrconfig.xml doesn't
> help, then depending on the Solr version, you *might* be running into
> SOLR-6188.
>
> https://issues.apache.org/jira/browse/SOLR-6188
>
> You'll want to be sure that you don't load the same jar more than once.
> This is the root of the specific problem that SOLR-6188 solves. Loading
> the same jar more than once can also happen if the jar is in the lib
> directory AND mentioned on a <lib> config element.
>
> Thanks,
> Shawn
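For anyone finding this later, Shawn's suggestion in concrete terms (paths are illustrative, assuming a stock install layout):

    # put every extra jar in $SOLR_HOME/lib and let Solr pick them up
    mkdir -p $SOLR_HOME/lib
    cp $SOLR_INSTALL/dist/solr-dataimporthandler-*.jar $SOLR_HOME/lib/
    cp my-datasource-1.0.jar $SOLR_HOME/lib/
    # ...then remove the matching <lib .../> directives from solrconfig.xml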
Nested grouping or equivalent.
We have a horrible Solr query that groups by one field and then sorts by another. My understanding is that for this to happen Solr has to sort by the grouping field, group, and then sort the resulting result set. It's not a fast query.

Unfortunately our documents now need to be grouped as well (product variants into items), and that grouping query needs to work on top of that grouping instead. As far as I'm aware you can't do nested grouping in Solr.

In summary, we want product variants that get grouped into items, which then get grouped by one field and sorted by another.

The solution doesn't need to be fast; it's a rarely used legacy part of our application and we just need it to work. Our dataset isn't huge, so it doesn't matter if Solr has to scan the entire index (I think the query does this at the moment anyway). But downloading the entire document set and doing the operations in ETL isn't something we really want to dedicate time to unless it's impossible to represent this in Solr queries.

Any ideas?

Cheers,

Callum.
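P.S. For concreteness, the existing query is of this general shape (field names are made up):

    /select?q=*:*&group=true&group.field=category&sort=date_published desc

What we now need is an extra level underneath: first collapse variants into items, then apply the grouping and sort above.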
Re: Nested grouping or equivalent.
Thank you so much Erick. The CollapsingQParserPlugin was exactly what I needed. I'm able to group by the field and then do the collapse from products into items and get the correct answer. The collapsing is also more appropriate for the general grouping we need to do all the time now, so we'll probably use that instead.

There was one part that took me a while to figure out though, and that was getting the unique counts of a field with facets from BEFORE the collapse kicked in. Just for anyone else who wants to know how to do this: I use a tag on the collapse fq like this:

    fq={!collapse field=int_item_id tag=collapse}

And then by expressing my facet count query with excludeTags like this I'm able to get the pre-collapse count:

    json.facet={'pre_collapse_count':{'type':'query','domain':{'excludeTags':'collapse'}}}

Which works perfectly. You need Solr 5.4 for this to work with JSON facets, according to http://yonik.com/multi-select-faceting/. But I think you can also do it with old-style facets by putting {!ex=collapse}int_item_id in the facet.field parameter.

On Thu, May 12, 2016 at 5:10 AM, Erick Erickson wrote:
> A couple of ideas. If this is 5x consider Streaming Aggregation.
> The idea here is that you stream the docs back to a SolrJ client and
> slice and dice them there. SA is designed to export 400K docs/sec,
> but the returned values must be DocValues (i.e. no text types, strings
> are OK).
>
> Have you seen the CollapsingQParserPlugin? That might help.
>
> Or push back at the product manager and say "why are we wasting
> time supporting something nobody uses?" ;)
>
> Best,
> Erick
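Putting those pieces together, the full request looks something like this (a sketch: URL-encoding is omitted for readability, and the query facet is given an explicit 'q' here):

    /select?q=*:*
      &fq={!collapse field=int_item_id tag=collapse}
      &json.facet={'pre_collapse_count':{'type':'query','q':'*:*',
                   'domain':{'excludeTags':'collapse'}}}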
Date range warming.
We want to warm some fq's. The main ones we want to do are date presets like "last 6 months", "last year", etc.

The queries for the last 6 months get generated by the site to look like this (it's really 6 months minus 1 day):

    date_published:([2016-01-02T00:00:00.000Z TO 2016-07-01T23:59:59.999Z])

But since I have to represent this in the firstSearcher section of solrconfig.xml, I need to use the date math features. (Is there another way? There doesn't seem to be a JVM system property with the date in it, and I don't want to have to restart Solr every day to update a Solr env variable.) So I have this:

    date_published:([NOW/DAY-6MONTH+1DAY TO NOW/DAY+1DAY-1SECOND])

Which should resolve to the same thing. Is there some way I can check this for sure? I get the same results when I run them.

I have a couple of questions though:

1. Is Solr smart enough to see that the explicit queries that come through are the same as my date math queries, and re-use the fq in this case? Is there a way to confirm this? I can go and change them to be the same as well, not much of an issue; more curious than anything.

2. Can Solr re-use fq's with NOW in them at all? Since NOW is changing all the time, I'm worried there's some kind of checker that just sets cache=false on all queries containing NOW, or worse, expands them to the current time and caches that, so none of the fq's will ever match (assuming Solr just does a strcmp for fq's).

Cheers,

Callum.
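P.S. The warming entry in solrconfig.xml looks something like this (a sketch; the q is illustrative):

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="fq">date_published:([NOW/DAY-6MONTH+1DAY TO NOW/DAY+1DAY-1SECOND])</str>
        </lst>
      </arr>
    </listener>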
Re: Date range warming.
Whoops, just realised it's meant to be:

    date_published:([NOW/DAY-6MONTH+1DAY TO NOW/DAY+1DAY-1MILLISECOND])

instead.
Re: Date range warming.
Okay, I figured it out. Answer here in case anyone ever stumbles across this in future.

With debugQuery on, you can see that the filter_queries actually get processed into what's shown in parsed_filter_queries, and it's those parsed forms that get cached. In this case Solr converts the date math into a [unix_timestamp TO unix_timestamp] range, and it's that form that gets cached. So you can tell whether two fq's will share a cache entry by comparing their parsed forms.
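Illustratively, the relevant debug sections look something like this (the timestamp values here are made up):

    "filter_queries": ["date_published:([NOW/DAY-6MONTH+1DAY TO NOW/DAY+1DAY-1MILLISECOND])"],
    "parsed_filter_queries": ["date_published:[1451692800000 TO 1467417599999]"]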
Re: Multilevel grouping?
Look at the collapse module: https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results. It can do the same thing as group. If you want to get counts/facets from before the collapse, tag the collapse statement and use excludeTags in your JSON facets (there's an equivalent for non-JSON facets). I think the default nullPolicy is different from grouping too, but you can change it to be the same.

I've not been able to get 2 collapses to work on my version of Solr, but collapse + group works and gets you 2 levels. Not being able to do multiple collapses appears to be a bug (it sort of works); I recall there being a JIRA case somewhere stating it was fixed in some version. So you may be able to do as many levels as you like if you upgrade or already run a very recent version of Solr.

On Thu, Jul 14, 2016 at 3:52 PM, Aditya Sundaram wrote:
> Thanks Yonik, was looking for exactly that, is there any workaround to
> achieve that currently?
>
> On Tue, Jul 12, 2016 at 5:07 PM, Yonik Seeley wrote:
>
> > I started this a while ago, but haven't found the time to finish:
> > https://issues.apache.org/jira/browse/SOLR-7830
> >
> > -Yonik
> >
> > On Tue, Jul 12, 2016 at 7:29 AM, Aditya Sundaram wrote:
> > > Does solr support multilevel grouping? I want to group upto 2/3 levels
> > > based on different fields, i.e. 1st group on field one, within which I
> > > group by field 2, etc.
> > > I am aware of facet.pivot which does the same but retrieves only the
> > > count. Is there any way to get the documents as well along with the
> > > count in facet.pivot?
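A sketch of the collapse + group combination described above (field names are hypothetical):

    /select?q=*:*
      &fq={!collapse field=variant_group_id nullPolicy=expand}
      &group=true&group.field=category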
Should we still optimize?
We have a cronjob that runs every week at a quiet time to run the optimize command on our Solr collections. Even when it's quiet, it's still an extremely heavy operation.

One of the things I keep seeing on Stack Overflow is that optimizing is now essentially deprecated: Lucene (we're on Solr 5.5.2) will now keep the number of segments at a reasonable level on its own, and the performance impact of having deleted docs is now much less.

One of our cores doesn't get optimized and it's currently sitting at 5.5 million documents with 1.9 million deleted docs, which seems pretty high to me.

How true is this claim? Is optimizing still a good idea for the general case?
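For context, the weekly job issues the standard optimize update command, along these lines (host and collection names are illustrative):

    curl 'http://localhost:8983/solr/products/update?optimize=true&maxSegments=1'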
Re: Should we still optimize?
Yeah, I figured that was too many deleted docs. It could just be that our max segments is set too high though.

The reason I asked is that our optimize requests have started failing. Or at least, they appear to fail because the optimize request returns a non-200; the optimize seems to go ahead successfully regardless. Before trying to find out whether I can asynchronously request and poll for success (doesn't appear to be possible yet), or find a better way of determining success, I thought I'd check whether the whole thing was necessary to begin with. Hopefully it doesn't involve polling the core status until deleted docs goes below a certain level :/ (a rough sketch of that fallback is below Shawn's reply).

Cheers for the info.

On Mon, Aug 8, 2016 at 2:58 PM, Shawn Heisey wrote:
> On 8/8/2016 3:10 AM, Callum Lamb wrote:
> > How true is this claim? Is optimizing still a good idea for the
> > general case?
>
> For the general case, optimizing is not recommended. If there are a
> very large number of deleted documents, which does describe your
> situation, then there is definitely a benefit.
>
> In cases where there are a lot of deleted documents, scoring can be
> affected by the presence of the deleted documents, and the drop in index
> size after an optimize can result in a large performance boost. For the
> general case where there are not many deletes, there *is* a performance
> benefit to optimizing down to a single segment, but it is nowhere near
> as dramatic as it was in the 1.x/3.x days.
>
> The problem with optimizes in the general case is this: the performance
> hit that the optimize operation itself causes may not be worth the small
> performance improvement.
>
> If you have a time when your index is quiet enough that the optimize
> itself won't be disruptive, then you should certainly take advantage of
> that time and do the optimize, even if there aren't many deletes.
>
> There is another benefit to optimizes that doesn't get mentioned often:
> it can make subsequent normal merging operations during indexing faster,
> because there will not be as many large segments.
>
> Thanks,
> Shawn
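The fallback polling check would look roughly like this; it assumes (unverified) that deletedDocs is reported in the CoreAdmin STATUS response, and the core name is made up:

    # poll until the number of deleted docs drops below a threshold
    curl -s 'http://localhost:8983/solr/admin/cores?action=STATUS&core=products_shard1_replica1&wt=json' \
      | grep -o '"deletedDocs":[0-9]*'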
Handling ampersands in searches.
I'm having an issue where searches that contain ampersands aren't being handled correctly. I need them to be dropped at index time *AND* query time.

When documents come in and are indexed, the ampersands are successfully dropped when they go into my stemmed field (when I facet on the stemmed field they aren't in the list), but when I actually search with a term containing an ampersand, I get no results. E.g. if I search for the string "light fit" or "light and fit" then I get results, but when I search for "light & fit" I get none, even though the SnowballPorterFilterFactory should be dropping the "&" at query time like it does the "and", and all 3 queries *should* be equivalent.

I've tried adding a synonym to my _schema_analysis_synonyms_default.json (I only have one default file), in both this form and its inverse:

    "and": ["&", "and"],

I've also tried adding the StopWord filter to my field type with & in the stopwords (though this shouldn't be necessary, because the SnowballPorter should be dropping it anyway) and it still doesn't work.

Is there some kind of special handling I need for ampersands? I'm thinking that Solr must be interpreting it as some kind of operator, and I need to tell Solr that it's actually literal text so the SnowballPorter knows to drop it. Using backslashes or URL encoding instead doesn't work though.

Does anyone have any ideas? I can obviously just remove any ampersands from the q before I submit the query to Solr and get the correct results, so this is not a game-breaking problem, but I'm more curious as to *why* this is happening and how to fix it correctly.

Cheers,

Callum.

Extra info: I'm using Solr 5.5.2 in cloud mode. The q in the queries is specified and parsed the following way:

    "rawquerystring": "stemmed_description:light & fit",
    "querystring": "stemmed_description:light & fit",
    "parsedquery": "(+(+stemmed_description:light +DisjunctionMaxQuery((stemmed_description:&)) +DisjunctionMaxQuery((stemmed_description:fit))))/no_coord",
    "parsedquery_toString": "+(+stemmed_description:light +(stemmed_description:&) +(stemmed_description:fit))",

I have a stemmed field defined in my schema (schema version 1.5), with a matching field type; an illustrative sketch of both follows.
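The field and type definitions are of this general shape; this is a hypothetical sketch, not my actual schema (the tokenizer choice in particular is a guess):

    <field name="stemmed_description" type="text_stemmed" indexed="true" stored="true"/>

    <fieldType name="text_stemmed" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
      </analyzer>
    </fieldType>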
Stopping a node from receiving any requests temporarily.
We have a Solr cluster that still takes queries that join between cores (I know, bad). We can't change that any time soon, but I was hoping there was a band-aid I could use in the meantime to make deployments of new nodes cleaner.

When we add a new node to the cluster there's a brief moment in time when one of the cores in the join is present but the other isn't.

My understanding is that even if you stop requests from reaching the new Solr node with haproxy, Solr can route requests between nodes on its own behind haproxy. We've also noticed that this internal Solr routing is not aware of the join in the query, and will route a request to a core that joins to another core even if the latter is not present yet (causing the query to fail).

Until we eliminate all the joins, we want to be able to have a node we can do things to, but *guarantee* it won't receive any requests until we decide it's ready. Is there an easy way to do this? We could try stopping the Solr nodes from talking to each other at the network level, but that seems iffy to me and might cause something weird to happen.

Any ideas?
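P.S. The joins in question are cross-core joins of this general shape (core and field names are hypothetical):

    q={!join from=item_id to=item_id fromIndex=items}date_published:[NOW-1YEAR TO NOW]

Such a query fails on a node that has the querying core but not yet the "items" core.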
Re: Stopping a node from receiving any requests temporarily.
Forgot to mention: we're using Solr 5.5.2 in SolrCloud mode. Everything is single-sharded at the moment as the collections are still quite small.
Re: Stopping a node from receiving any requests temporarily.
We can do that in most cases, and that's what we've been doing up until now to prevent failed requests. All the more incentive to get rid of those joins then, I guess!

Thanks.

On Wed, Apr 12, 2017 at 4:16 PM, Erick Erickson wrote:
> No good ideas here with current Solr. I just raised SOLR-10484 for the
> generic ability to take a replica out of action (including the
> ADDREPLICA operation).
>
> Your understanding is correct, Solr will route requests to active
> replicas. Is it possible that you can load the "from" core first
> _then_ add the replica that references it? Or do they switch roles?
>
> Best,
> Erick