in-place updates solr 5.5.2
Are in-place updates available in solr 5.5.2? I find atomic updates in the doc
https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-5.5.pdf,
which redirects me to the page
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-AtomicUpdates.

On that page, for in-place updates, it says:

  the _version_ field is also a non-indexed, non-stored single valued
  docValues field

When I try this with solr 5.5.2 I get an error message:

  org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
  Unable to use updateLog: _version_ field must exist in schema, using
  indexed="true" or docValues="true", stored="true" and multiValued="false"
  (_version_ is not stored)

What I'm looking for is a way to update one field of a doc without erasing
the non-stored fields. Is this possible in solr 5.5.2?

best regards,
Elisabeth
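For reference, a minimal schema.xml sketch of the _version_ definition the
error message above asks for. The field name and constraints come from the
error itself; the type name "long" is an assumption and should match whatever
long-valued fieldType the schema defines:

  <!-- schema.xml: what the 5.5 error message requires of _version_;
       "long" is assumed, use your schema's long type -->
  <field name="_version_" type="long" indexed="true" stored="true"
         multiValued="false"/>

Per that error, 5.5 insists on stored="true" whether the field is indexed or
uses docValues, which is exactly what rules out the non-stored variant the
online page describes.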
High CPU utilization on Upgrading to Solr Version 6.3
Hi,

We upgraded our production Solr to version 6.3 from version 4.3.2 about a
week ago. We had our Dev / QA / staging on the same version (6.3) before
finally releasing the application leveraging Solr 6.3. We did our functional
and load testing on these boxes; however, when we released it to production
along with the same application (using SolrJ to query Solr), we ran into
severe CPU issues. Just to add, we're on Master - Slave, where the master has
its index on NRTCachingDirectory and the slaves on RAMDirectory.

As soon as we placed the slaves under the load balancer, even under NO LOAD
conditions, the slave went from a load of 4 -> 10 -> 16 -> 100 in 12 mins.

I suspected this was caused by replication, but it was never-ending, so
before it crashed we de-provisioned it and brought it down. I'm not sure what
could possibly cause it.

I looked into the caches, where documentCache, filterCache and
queryResultCache are set to the defaults of 1024 and 100 documents.

I tried observing the GC activity in GCViewer too, which doesn't really show
anything alarming (as far as I can tell) - a sampler at
https://pastebin.com/cnuATYrS

Can anyone possibly tell me the reasons? Thanks a lot in advance.

Atita
Re: 6.6 cloud starting to eat CPU after 8+ hours
I've looked into stacktrace. I see that one thread triggers commit via update
command. And it's blocked on searcher warming. The really odd thing is total
state = BLOCKED. Can you check that there is spare heap space available
during the indexing peak? And also that there is free RAM (after heap
allocation)? Can it happen that the warming query is unnecessarily heavy?
Also, explicit commits might cause issues; consider the best practice of
auto-commit with openSearcher=false, and soft commit when necessary.

On Mon, Jul 24, 2017 at 4:35 PM, Markus Jelsma wrote:
> Alright, after adding a field and full cluster restart, the cluster is
> going nuts once again and this time almost immediately after the restart.
>
> I have now restarted all but one so there is some room to compare, or so i
> thought. Now, the node i didn't restart also drops CPU-usage. This seems to
> correspond to another incident some time ago where all nodes went crazy
> over an extended period, but calmed down after a few were restarted. So it
> could be a problem of inter-node communication.
>
> The index is one segment at this moment but some documents are being
> indexed. Some queries are executed but not very much. Attaching the stack
> anyway.
>
> -Original message-
> > From:Mikhail Khludnev
> > Sent: Wednesday 19th July 2017 14:41
> > To: solr-user
> > Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours
> >
> > You can get a stack from kill -3 / jstack, even from solradmin. Overall,
> > this behavior looks like a typical heavy merge kicking off from time to
> > time.
> >
> > On Wed, Jul 19, 2017 at 3:31 PM, Markus Jelsma wrote:
> > > Hello,
> > >
> > > No i cannot expose the stack, VisualVM samples won't show it to me.
> > >
> > > I am not sure if they're about to sync all the time, but every 15
> > > minutes some documents are indexed (3 - 4k). For some reason, index
> > > time does increase with latency / CPU usage.
> > >
> > > This situation runs fine for many hours, then it will slowly start to
> > > go bad, until nodes are restarted (or index size decreased).
> > >
> > > Thanks,
> > > Markus
> > >
> > > -Original message-
> > > > From:Mikhail Khludnev
> > > > Sent: Wednesday 19th July 2017 14:18
> > > > To: solr-user
> > > > Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours
> > > >
> > > > > The real distinction between busy and calm nodes is that busy
> > > > > nodes all have
> > > > > o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms()
> > > > > as second to fillBuffer(), what are they doing?
> > > >
> > > > Can you expose the stack deeper?
> > > > Can they start to sync shards due to some reason?
> > > >
> > > > On Wed, Jul 19, 2017 at 12:35 PM, Markus Jelsma wrote:
> > > > > Hello,
> > > > >
> > > > > Another peculiarity here: our six node (2 shards / 3 replicas)
> > > > > cluster is going crazy after a good part of the day has passed.
> > > > > It starts eating CPU for no good reason and its latency goes up.
> > > > > Grafana graphs show the problem really well.
> > > > >
> > > > > After restarting 2/6 nodes, there is also quite a distinction in
> > > > > the VisualVM monitor views, and the VisualVM CPU sampler reports
> > > > > (sorted on self time (CPU)). The busy nodes are deeply red in
> > > > > o.a.h.impl.io.AbstractSessionInputBuffer.fillBuffer (as usual),
> > > > > the restarted nodes are not.
> > > > >
> > > > > The real distinction between busy and calm nodes is that busy
> > > > > nodes all have
> > > > > o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms()
> > > > > as second to fillBuffer(), what are they doing?! Why? The calm
> > > > > nodes don't show this at all. Busy nodes all have o.a.l.codec
> > > > > stuff on top, restarted nodes don't.
> > > > >
> > > > > So, actually, i don't have a clue! Any, any ideas?
> > > > >
> > > > > Thanks,
> > > > > Markus
> > > > >
> > > > > Each replica is underpowered but performing really well after
> > > > > restart (and JVM warmup): 4 CPUs, 900M heap, 8 GB RAM, maxDoc 2.8
> > > > > million, index size 18 GB.
> > > >
> > > > --
> > > > Sincerely yours
> > > > Mikhail Khludnev
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev

--
Sincerely yours
Mikhail Khludnev
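To make Mikhail's commit advice concrete, a solrconfig.xml sketch of the
auto-commit pattern he describes; the intervals here are placeholder
assumptions, not values recommended anywhere in this thread:

  <!-- solrconfig.xml (updateHandler section): hard commit flushes to disk
       without opening a searcher; soft commit controls visibility -->
  <autoCommit>
    <maxTime>60000</maxTime>        <!-- assumed: 60s -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>300000</maxTime>       <!-- assumed: 5 min -->
  </autoSoftCommit>

With this in place, clients should stop issuing explicit commits, so searcher
warming happens on the soft-commit schedule rather than on every update.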
Issue with delta import
Hi,

I'm trying to integrate Solr and Cassandra. I'm facing a problem with delta
import. Every 10 minutes I'm running a deltaQuery via a cron job. If there
are any changes in the data since the last index time, it should fetch only
that data (as far as I know); however, it keeps fetching the whole data set
irrespective of changes.

My problem:
https://stackoverflow.com/questions/45304803/deltaimport-fetches-all-the-data

Looking forward to hearing from you.

Thanks,
Bhargava Ravali Koganti
Re: Issue with delta import
Can you please try ${dih.last_index_time} instead of
${dataimporter.last_index_time}?

On Wed, Jul 26, 2017 at 2:33 PM, bhargava ravali koganti wrote:
> Hi,
>
> I'm trying to integrate Solr and Cassandra. I'm facing a problem with
> delta import. [...]

--
Thanks,
Sujay P Bawaskar
M:+91-77091 53669
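For anyone landing on this thread later, a data-config.xml sketch showing
where that variable goes. The entity, table and column names here are made-up
examples (and written SQL-style; adapt for the Cassandra data source) -- only
the ${dih.last_index_time} / ${dih.delta.*} usage is the point:

  <entity name="item" pk="id"
          query="SELECT * FROM item"
          deltaQuery="SELECT id FROM item
                      WHERE last_modified &gt; '${dih.last_index_time}'"
          deltaImportQuery="SELECT * FROM item WHERE id = '${dih.delta.id}'"/>

If deltaQuery still returns every row, check that the timestamp column is
really being compared against the stored last index time, and that DIH can
write its dataimport.properties file, since that is where last_index_time
is persisted between runs.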
Re: in-place updates solr 5.5.2
The in-place update section you referenced was added in Solr 6.5. On p. 224
of the PDF for 5.5, note it says there are only two available approaches, and
the section on in-place updates you see online isn't mentioned. I looked into
the history of the online page, and the section on in-place updates was added
for Solr 6.5, when SOLR-5944 was released.

So, unfortunately, unless someone else has a better option for pre-6.5, I
believe it was not possible in 5.5.2.

Cassandra

On Wed, Jul 26, 2017 at 2:30 AM, elisabeth benoit wrote:
> Are in-place updates available in solr 5.5.2? I find atomic updates in the
> doc
> https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-5.5.pdf
> [...]
>
> What I'm looking for is a way to update one field of a doc without erasing
> the non stored fields. Is this possible in solr 5.5.2?
>
> best regards,
> Elisabeth
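So for pre-6.5 readers, atomic updates are the only documented route, with
the caveat Elisabeth ran into: every field to be preserved (other than
copyField destinations) must be stored. A sketch in the XML update format,
with hypothetical field names:

  <!-- POST to /update: set one field, leave the others intact
       ("id" and "price" are example names) -->
  <add>
    <doc>
      <field name="id">doc1</field>
      <field name="price" update="set">19.95</field>
    </doc>
  </add>

Under the hood Solr re-reads the other fields from their stored values and
rewrites the whole document, which is exactly why non-stored fields get lost.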
Re: Copy field a source of copy field
I get your point: the second KeepWordFilter is not keeping anything because
the token it gets is "hey you", while the word it is supposed to keep is
"hey". Which clearly does not work. The KeepWordFilter just considers each
row a single token (I may be wrong, I didn't check the code, I am just
assuming based on your observations).

If you want, you can put a WordDelimiterFilter between the two
KeepWordFilters. Configure the WordDelimiterFilter to split on space (I need
to double check, but it should be possible).

OR

You simply do as Erick suggested, and you just keep the genera in the genus
field. But as Erick mentioned, you may have problems with entity recognition.

---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
Re: in-place updates solr 5.5.2
Thanks a lot for your answer

2017-07-26 16:35 GMT+02:00 Cassandra Targett:
> The in-place update section you referenced was added in Solr 6.5. [...]
>
> So, unfortunately, unless someone else has a better option for
> pre-6.5, I believe it was not possible in 5.5.2.
>
> Cassandra
WordDelimiterFilterFactory with Wildcards
I have several fieldtypes that use the WordDelimiterFilterFactory.

We have a fieldtype for CAS numbers, which look like 1234-12-1: numbers
separated by hyphens. Users often leave out the hyphens and either use spaces
or just string the numbers together.

The WDF seemed like a great solution, especially as it gave partial matches.
However, a query like 1234-12-* fails: the analyzer tool shows the wildcard
getting stripped off.

Is there any way to preserve the wildcard in the query analyzer when using
the WordDelimiterFilterFactory?
Re: WordDelimiterFilterFactory with Wildcards
1. What tokenizer are you using?
2. Do you have the preserveOriginal="1" flag set in your filter?
3. Which version of solr are you using?

On Wed, Jul 26, 2017 at 10:48 AM, Webster Homer wrote:
> I have several fieldtypes that use the WordDelimiterFilterFactory.
> [...]
> Is there any way to preserve the wildcard in the query analyzer when
> using the WordDelimiterFilterFactory?

--
Saurabh Sethi
Principal Engineer I | Engineering
Re: WordDelimiterFilterFactory with Wildcards
1. KeywordTokenizer - we want to treat the entire field as a single term to
parse
2. preserveOriginal="0" - thought about changing this to 1
3. 6.2.2

This is the fieldtype:

  <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="0" splitOnCaseChange="0"
              splitOnNumerics="1" generateNumberParts="0"
              catenateWords="0" catenateNumbers="1" catenateAll="0"
              preserveOriginal="0" stemEnglishPossessive="0"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="..."
              ignoreCase="true" expand="true"
              tokenizerFactory="solr.KeywordTokenizerFactory"/>
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="^.*([^- 0-9*]+).*$" replacement="" replace="all"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="0" splitOnCaseChange="0"
              splitOnNumerics="1" generateNumberParts="0"
              catenateWords="0" catenateNumbers="1" catenateAll="0"
              preserveOriginal="0" stemEnglishPossessive="0"/>
    </analyzer>
  </fieldType>

On Wed, Jul 26, 2017 at 12:56 PM, Saurabh Sethi wrote:
> 1. What tokenizer are you using?
> 2. Do you have preserveOriginal="1" flag set in your filter?
> 3. Which version of solr are you using?
> [...]
Re: WordDelimiterFilterFactory with Wildcards
My guess is PatternReplaceFilterFactory is most likely the cause. Also, based
on your query, you might want to set preserveOriginal=1.

You can take one filter out at a time and see which one is altering the
query.

On Wed, Jul 26, 2017 at 11:13 AM, Webster Homer wrote:
> 1. KeywordTokenizer - we want to treat the entire field as a single term
> to parse
> 2. preserveOriginal = "0" Thought about changing this to 1
> 3. 6.2.2
>
> This is the fieldtype [...]

--
Saurabh Sethi
Principal Engineer I | Engineering
Re: WordDelimiterFilterFactory with Wildcards
Checked the PatternReplace - it's OK. Can't use preserveOriginal since it
preserves the hyphens too, which I don't want. It would be best if it didn't
touch the * at all.

On Wed, Jul 26, 2017 at 1:30 PM, Saurabh Sethi wrote:
> My guess is PatternReplaceFilterFactory is most likely the cause.
> Also, based on your query, you might want to set preserveOriginal=1
>
> You can take one filter out at a time and see which one is altering the
> query.
> [...]
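One avenue that might be worth testing here (an assumption on my part, not
something verified against this schema): wildcard terms are analyzed through
the fieldtype's "multiterm" analyzer rather than the full query chain, so an
explicit multiterm analyzer can normalize the hyphens while leaving the *
alone:

  <!-- sketch: applied only to wildcard/prefix/fuzzy terms -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- strip hyphens so 1234-12-* becomes 123412* -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="-" replacement="" replace="all"/>
  </analyzer>

That should let 1234-12-* match the catenated 1234121 term the index analyzer
produces, without WDF ever seeing the wildcard.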
Re: Unable to create core [collection] Caused by: null
Hi Lucas,

It would be super useful if you provided more information with the question.
A few things you might want to include are: the version of Solr, how you
started it, the stack trace from the log, etc.

-Anshum

> On Jul 25, 2017, at 4:21 PM, Lucas Pelegrino wrote:
>
> Hey guys.
>
> Trying to make solr work here, but I'm getting this error from this
> command:
>
> $ ./solr create -c products -d /Users/lucaswxp/reduza-solr/products/conf/
>
> Error CREATEing SolrCore 'products': Unable to create core [products]
> Caused by: null
>
> I'm posting my solrconfig.xml, schema.xml and data-config.xml here:
> https://pastebin.com/fnYK9pSJ
>
> The debug log from solr: https://pastebin.com/kVLMvBwZ
>
> Not sure what to do, the error isn't very descriptive.
shared file system other than HDFS?
Hi,

Just wondering if anyone has deployed SolrCloud on a shared filesystem other
than HDFS. I'd appreciate it if they could share a bit about the setup: OS,
file system, replication and backup strategies, etc.

Thanks

--
Daniel Huang
BNY Mellon Innovation Center – Silicon Valley
3495 Deer Creek Road, Palo Alto, CA 94304
Re: shared file system other than HDFS?
Hello, Daniel

You might find this useful: https://issues.apache.org/jira/browse/SOLR-9952.

On Wed, Jul 26, 2017 at 10:46 PM, Huang, Daniel wrote:
> Hi,
>
> Just wondering if anyone has deployed SolrCloud on a shared filesystem
> other than HDFS. Appreciate if they can share a bit about the setup, OS,
> file system, replication and backup strategies, etc.
>
> Thanks

--
Sincerely yours
Mikhail Khludnev
RE: 6.6 cloud starting to eat CPU after 8+ hours
Hello Mikhail,

Spot on, there is indeed not enough heap when our nodes are in this crazy
state. When the nodes are happy, the average heap consumption is 50 to 60
percent; at peak, when indexing, there is in general enough heap for the
warming to run smoothly. I probably forgot to mention that high CPU is, in
our case, also high heap consumption when the nodes act mad. The stack trace
you saw was from a crazy node while documents were indexed, so the blocking
you mention makes sense.

I still believe this is not a heap issue but something odd in Solr that in
(INSERT SITUATION) eats unreasonable amounts of heap. Remember, the bad node
went back to a normal 50-60 percent heap consumption and normal CPU time
after _other_ nodes got restarted. All nodes were bad and went normal after
restart. Bad nodes that were not restarted then suddenly revived and also
went back to normal. This is very odd behavior.

Observing this, i am inclined to think that Solr's inter-node communication
can get into a weird state: eating heap, eating CPU, going bad. Using CPU or
heap sampling it is very hard, for me at least, to quickly spot something
bad, so i am clueless.

What do you think? How can a bad node become normal just by restarting
another bad node? Puzzling..

Thanks,
Markus

-Original message-
> From:Mikhail Khludnev
> Sent: Wednesday 26th July 2017 10:50
> To: solr-user
> Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours
>
> I've looked into stacktrace.
> I see that one thread triggers commit via update command. And it's
> blocked on searcher warming. The really odd thing is total state =
> BLOCKED. Can you check that there is a spare heap space available during
> indexing peak? And also that there is free RAM (after heap allocation)?
> Can it happen that warming query is unnecessary heavy? Also, explicit
> commits might cause issues, consider the best practice with auto-commit
> and openSearcher=false and soft commit when necessary.
> [...]
Re: High CPU utilization on Upgrading to Solr Version 6.3
On 7/26/2017 1:49 AM, Atita Arora wrote:
> We did our functional and load testing on these boxes , however when we
> released it to production along with the same application (using SolrJ to
> query Solr) , we ran into severe CPU issues.
> Just to add we're on Master - Slave where master has index on
> NRTCachingDirectory and Slave on RAMDirectory.
>
> As soon as we placed the slaves under load balancer , under NO LOAD
> condition as well , the slave went from a load of 4 -> 10 -> 16 -> 100 in
> 12 mins.
> [...]
> I tried observing the GC activity on GCViewer too , which doesn't really
> show something alarming (as in what I feel) - a sampler at
> https://pastebin.com/cnuATYrS

What OS is Solr running on? I'm only asking because some additional
information I'm after has different gathering methods depending on the OS.

Other questions: Is there only one Solr process per machine, or more than
one? How many total documents are managed by one machine? How big is all the
index data managed by one machine? What is the max heap on each Solr process?

FYI, RAMDirectory is not the preferred way of running Solr or Lucene. If you
have enough memory to hold the entire index, it's better to let the OS handle
keeping that information in memory, rather than having Lucene and Java do it.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

NRTCachingDirectoryFactory uses MMap by default as its delegate
implementation, so your master is fine.

I would be interested in getting a copy of Solr's gc log from a system with
high CPU to look at.

Thanks,
Shawn
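If you do drop RAMDirectory on the slaves, the change is a one-liner in
solrconfig.xml; a sketch of the stock form Shawn's advice points at
(NRTCachingDirectoryFactory delegates to MMap on 64-bit systems, as the
linked article explains):

  <!-- solrconfig.xml: let the OS page cache hold the index -->
  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

The hot parts of the index then live in the OS page cache rather than being
double-counted inside the Java heap.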
precedence for configurations in solrconfig.xml file
Hi,

If I have a configoverlay.json file with the below content:

  {"props":{"updateHandler":{"autoCommit":{
    "maxTime":5,
    "maxDocs":1}}}}

and I also have JVM properties set on the Solr instance as:

  -Dsolr.autoCommit.maxtime=2 -Dsolr.autoCommit.maxDocs=10

I would like to know the order of precedence in which these configurations
are applied.

Regards
Suresh
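For context: a -D system property only has an effect where solrconfig.xml
references it through property substitution, along these lines (a sketch; the
15000 fallback is the usual stock default, and note that substitution is
case-sensitive, so solr.autoCommit.maxtime and solr.autoCommit.maxTime are
different properties):

  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  </autoCommit>

The overlay written by the Config API (configoverlay.json) is then applied on
top of whatever solrconfig.xml resolved to, so where both define a value, the
overlay wins.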
Re: WordDelimiterFilterFactory with Wildcards
The Admin/Analysis page is useful here. It'll show you what each bit of your
query analysis chain does and may well point you to the part of the chain
that's the problem.

Best,
Erick

On Wed, Jul 26, 2017 at 11:33 AM, Webster Homer wrote:
> checked the Pattern Replace it's OK. Can't use the preserve original since
> it preserves the hyphens too, which I don't want. It would be best if it
> didn't touch the * at all
> [...]