FastVector Highlighter

2015-12-07 Thread Zheng Lin Edwin Yeo
Hi,

Would like to check: will using the FastVector Highlighter take up more
indexing space (a larger index size) as compared to the Original Highlighter?

I'm using Solr 5.3.0

Regards,
Edwin


Re: FastVector Highlighter

2015-12-07 Thread Emir Arnautovic

Hi Edwin,
FastVector Highlighter requires term vectors, positions and frequencies, 
so if they are not enabled on the fields that you want to highlight, it 
will increase the index size. Since it is common to have those enabled 
for the standard highlighter to speed up highlighting, they might already 
be enabled; otherwise reindexing is required as well.
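[Editor's note: the attributes Emir refers to are enabled per field in schema.xml. A minimal sketch, where the field name and type are placeholders rather than anything from this thread:

```xml
<!-- Hypothetical field definition with term vectors, positions and
     offsets stored, as the FastVector Highlighter requires. -->
<field name="content" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Documents must be reindexed after adding these attributes for the vectors to exist.]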


Regards,
Emir

On 07.12.2015 12:06, Zheng Lin Edwin Yeo wrote:

Hi,

Would like to check: will using the FastVector Highlighter take up more
indexing space (a larger index size) as compared to the Original Highlighter?

I'm using Solr 5.3.0

Regards,
Edwin



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Grouping by simhash signature

2015-12-07 Thread Toke Eskildsen
On Wed, 2015-12-02 at 13:00 -0700, Nickolay41189 wrote:
> I try to implement NearDup detection by  SimHash
>    algorithm in Solr. 
[...]
> How can I get groups of nearDup by /simhash_signature/?

You could follow the suggested recipe at the page you linked to and
remove the false positives as part of post-processing? Unless you have a
lot of documents that are at the edge between not-similar-enough and
similar-enough, that should be efficient.


So if a SimHash consists of 4*16 bits: ABCD, you would store all
possible 2-part representations: [AB, AC, AD, BA, BC, BD, CA, CB, CD,
DA, DB, DC], either as String-binary (0/1) for easy debug or a bit more
packed with base 16 or 64.

At query time you would do the same permutations and issue a search for
ab OR ac OR ad OR ba OR bc OR bd OR ca OR cb OR cd OR da OR db OR dc

It would even sorta-work with relevance ranking, as a match on 2/4 parts
of the SimHash would mean that 2/12 of the query clauses match, while
a match on 3/4 SimHash parts means that 6/12 query clauses match.

- Toke Eskildsen, State and University Library, Denmark




Joins with SolrCloud

2015-12-07 Thread Mugeesh Husain
I have created 3 cores on the same machine using SolrCloud.
Cores: Restaurant, User, Review
Each core has only 1 shard and 2 replicas.

Questions
1.) Is it possible to use a join among the 3 cores on the same machine (or
different machines)?
2.) I am struggling with how to use a join among the 3 cores in SolrCloud mode.

The client does not want to de-normalize the data.

Please give some suggestions on how to solve this problem.

Thanks
Mugeesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4243957.html
Sent from the Solr - User mailing list archive at Nabble.com.


NullPointer from updateLog in solr4.8

2015-12-07 Thread Miriam Wells
I was trying to update a bunch of documents in Solr and commit them,
and I got the error below. (It had been working for a long time.)

I have a setup with multiple cores and I'm updating docs in a bunch of them;
I commit all the cores I change.

Once I stop and start Solr again I can add and commit the documents and it
works fine.

The line of source code this points at (UpdateLog.java:706) is:

706:  if (entry != null) { ...

entry is a LogPtr:

  public static class LogPtr {
    final long pointer;
    final long version;

    public LogPtr(long pointer, long version) {
      this.pointer = pointer;
      this.version = version;
    }

    @Override
    public String toString() {
      return "LogPtr(" + pointer + ")";
    }
  }

Any idea what's wrong?

org.apache.solr.common.SolrException; java.lang.NullPointerException
at org.apache.solr.update.UpdateLog.lookup(UpdateLog.java:706)
at org.apache.solr.handler.component.RealTimeGetComponent.getInputDocumentFromTlog(RealTimeGetComponent.java:217)
at org.apache.solr.handler.component.RealTimeGetComponent.getInputDocument(RealTimeGetComponent.java:242)
at org.apache.solr.update.processor.DistributedUpdateProcessor.getUpdatedDocument(DistributedUpdateProcessor.java:892)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:791)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:557)
at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:96)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:166)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:228)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:193)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:119)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:173)
at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:106)
at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1953)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at org.eclipse.jetty.server.BlockingHttpConnection.handleReque

Increasing Solr5 time out from 30 seconds while starting solr

2015-12-07 Thread D

Hi,

Many times while starting Solr I see the message below, and then Solr
is not reachable.


debraj@boutique3:~/solr5$ sudo bin/solr start -p 8789
Waiting to see Solr listening on port 8789 [-]
Still not seeing Solr listening on 8789 after 30 seconds!


However, when I try to start Solr again by executing the same command,
it says: "solr is already running on port 8789. Try using a different
port with -p".


I have two cores in my local set-up. I am guessing this is happening
because one of the cores is a little big, so Solr is timing out while
loading the core. If I take that core out of Solr then everything
works fine.


Can someone let me know how I can increase this timeout value from the
default 30 seconds?


I am using Solr 5.2.1 on Debian 7.

Thanks,




Re: FastVector Highlighter

2015-12-07 Thread Zheng Lin Edwin Yeo
Hi Emir,

The term vectors, positions and frequencies weren't enabled, so I'm doing
the re-indexing. That is where I found that the index size is bigger than
it was previously, when I was using the Original Highlighter.

Regards,
Edwin


On 7 December 2015 at 19:19, Emir Arnautovic 
wrote:

> Hi Edwin,
> FastVector Highlighter requires term vector, positions and frequencies, so
> if it is not enabled on fields that you want to highlight, it will increase
> index size. Since it is common to have those enabled for standard
> highlighter to speed up highlighting, those might already be enabled,
> otherwise reindexing is required as well.
>
> Regards,
> Emir
>
>
> On 07.12.2015 12:06, Zheng Lin Edwin Yeo wrote:
>
>> Hi,
>>
>> Would like to check: will using the FastVector Highlighter take up more
>> indexing space (a larger index size) as compared to the Original Highlighter?
>>
>> I'm using Solr 5.3.0
>>
>> Regards,
>> Edwin
>>
>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


secure solr 5.3.1

2015-12-07 Thread kostali hassan
How should I secure my Solr 5.3.1 server in single-node mode? I am
searching for the best way to secure my Solr server, but I have only
found documentation for cloud mode.


Re: Max indexing threads & RamBuffered size

2015-12-07 Thread KNitin
Thanks Erick. I will profile and check it out.

On Saturday, December 5, 2015, Erick Erickson 
wrote:

> bq: What adds bottleneck in the indexing flow? Is it the buffering and
> flushing
> out to disk ?
>
> It Depends (tm). What do the Solr logs show when one of these two things
> happens?
>
> You pretty much have to put a profiler on the Solr instance to see where
> it's
> spending the time, but timeouts are very often caused by:
> 1> having a very large heap
> 2> hitting a stop-the-world garbage collection that exceeds your timeouts.
>
> Best,
> Erick
>
> On Sat, Dec 5, 2015 at 8:07 PM, KNitin  > wrote:
> > I have an extremely large indexing load (per doc size of 4-5 Mb with over
> > 100M docs). I have auto commit settings to flush to disk (with open
> > searcher as false) every 20 seconds. Even with that the update sometimes
> > fails or times out. The goal is to improve the indexing throughput and
> > hence trying to experiment and see if tweaking any of these can speed up.
> >
> > What adds bottleneck in the indexing flow? Is it the buffering and
> flushing
> > out to disk ?
> >
> > On Sat, Dec 5, 2015 at 11:15 AM, Erick Erickson  >
> > wrote:
> >
> >> I'm pretty sure that max indexing threads is per core, but just looked
> >> and it's not supported in Solr 5.3 and above so I wouldn't worry about
> >> it at all.
> >>
> >> I've never seen much in the way of benefit for bumping this past 128M
> >> or maybe 256M. This is just how much memory is filled up before the
> >> buffer is flushed to disk. Unless you have very high indexing loads or
> >> really long autocommit times, you'll rarely hit it anyway since this
> >> memory is also flushed when you do any flavor of hard commit.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Dec 4, 2015 at 4:55 PM, KNitin  > wrote:
> >> > Hi,
> >> >
> >> > The max indexing threads in the solrconfig.xml is set to 8 by default.
> >> Does
> >> > this mean only 8 concurrent indexing threads will be allowed per
> >> collection
> >> > level? or per core level?
> >> >
> >> > Buffered size : This seems to be set at 64Mb. If we have beefier
> machine
> >> > that can take more load, can we set this to a higher limit say 1 or 2
> Gb?
> >> > What will be downside of doing so? (apart from commits taking longer).
> >> >
> >> > Thanks in advance!
> >> > Nitin
> >>
>


Re: FastVector Highlighter

2015-12-07 Thread Emir Arnautovic

Hi Edwin,
It is expected since you are storing more info about document.

Thanks,
Emir

p.s. one correction - I meant offsets, not frequencies.

On 07.12.2015 15:47, Zheng Lin Edwin Yeo wrote:

Hi Emir,

The term vectors, positions and frequencies weren't enabled, so I'm doing
the re-indexing. That is where I found that the index size is bigger than
it was previously, when I was using the Original Highlighter.

Regards,
Edwin


On 7 December 2015 at 19:19, Emir Arnautovic 
wrote:


Hi Edwin,
FastVector Highlighter requires term vector, positions and frequencies, so
if it is not enabled on fields that you want to highlight, it will increase
index size. Since it is common to have those enabled for standard
highlighter to speed up highlighting, those might already be enabled,
otherwise reindexing is required as well.

Regards,
Emir


On 07.12.2015 12:06, Zheng Lin Edwin Yeo wrote:


Hi,

Would like to check: will using the FastVector Highlighter take up more
indexing space (a larger index size) as compared to the Original Highlighter?

I'm using Solr 5.3.0

Regards,
Edwin



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/




--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Solrcloud: 1 server, 1 configset, multiple collections, multiple schemas

2015-12-07 Thread bengates
F*ck.

I switched from standalone Solr to SolrCloud thanks to the feature that
allows creating cores (collections) on the fly with the API, without
having to tell Solr where to find a schema.xml / solrconfig.xml: it
creates them itself from a pre-defined configset.

If I understand correctly, there is actually no way to create a core or a
collection from the API, with a defined-at-once configset, without having
to run some CLI commands on the remote server?

Thanks for your reply,
Ben



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrcloud-1-server-1-configset-multiple-collections-multiple-schemas-tp4243584p4244010.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr 5.2.1 Most solr nodes in a cluster going down at once.

2015-12-07 Thread philippa griggs
Hello,


I'm using:


Solr 5.2.1, 10 shards each with a replica (20 nodes in total)


Zookeeper 3.4.6.


About half a year ago we upgraded to Solr 5.2.1 and since then have been 
experiencing a 'wipe out' effect where all of a sudden most if not all nodes 
will go down. Sometimes they will recover by themselves but more often than not 
we have to step in to restart nodes.


Nothing in the logs jumps out as being the problem. With the latest wipe out we 
noticed that 10 out of the 20 nodes had garbage collections over 1min all at 
the same time, with the heap usage spiking up in some cases to 80%. We also 
noticed the amount of selects run on the solr cluster increased just before the 
wipe out.


Increasing the heap size seems to help for a while, but then it starts 
happening again, so it's more like a delay than a fix. Our GC settings are 
set to -XX:+UseG1GC and -XX:+ParallelRefProcEnabled.


With our previous version of solr (4.10.0) this didn't happen. We had 
nodes/shards go down but it was contained, with the new version they all seem 
to go at around the same time. We can't really continue just increasing the 
heap size and would like to solve this issue rather than delay it.


Has anyone experienced something similar?

Is there a difference between the two versions around the recovery process?

Does anyone have any suggestions for a fix?


Many thanks


Philippa



Re: Stop adding content in Solr through /update URL

2015-12-07 Thread Chris Hostetter

: Never made it into CHANGES.txt either. Not part of any patch either.
: Appears to have been secretly committed as a part of SOLR-6787 (Blob API) via
: Revision *1650448
: * in Solr 5.1.

Really? ... huh ... I could have sworn it was much older than that. I 
thought it had been around since the 1.x days.



-Hoss
http://www.lucidworks.com/


Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.

2015-12-07 Thread Erick Erickson
Tell us a bit more.

Are you adding documents to your collections or adding more
collections? Solr is a balancing act between the number of docs you
have on each node and the memory you have allocated. If you're
continually adding docs to Solr, you'll eventually run out of memory
and/or hit big GC pauses.

How much memory are you allocating to Solr? How much physical memory
do you have? etc.

Best,
Erick


On Mon, Dec 7, 2015 at 8:37 AM, philippa griggs
 wrote:
> Hello,
>
>
> I'm using:
>
>
> Solr 5.2.1 10 shards each with a replica. (20 nodes in total)
>
>
> Zookeeper 3.4.6.
>
>
> About half a year ago we upgraded to Solr 5.2.1 and since then have been 
> experiencing a 'wipe out' effect where all of a sudden most if not all nodes 
> will go down. Sometimes they will recover by themselves but more often than 
> not we have to step in to restart nodes.
>
>
> Nothing in the logs jumps out as being the problem. With the latest wipe out 
> we noticed that 10 out of the 20 nodes had garbage collections over 1min all 
> at the same time, with the heap usage spiking up in some cases to 80%. We 
> also noticed the amount of selects run on the solr cluster increased just 
> before the wipe out.
>
>
> Increasing the heap size seems to help for a while, but then it starts 
> happening again, so it's more like a delay than a fix. Our GC settings are 
> set to -XX:+UseG1GC and -XX:+ParallelRefProcEnabled.
>
>
> With our previous version of solr (4.10.0) this didn't happen. We had 
> nodes/shards go down but it was contained, with the new version they all seem 
> to go at around the same time. We can't really continue just increasing the 
> heap size and would like to solve this issue rather than delay it.
>
>
> Has anyone experienced something similar?
>
> Is there a difference between the two versions around the recovery process?
>
> Does anyone have any suggestions on a fix.
>
>
> Many thanks
>
>
> Philippa
>


Re: Highlighting tag problem

2015-12-07 Thread Scott Stults
I see. There appears to be a gap in what you can match on and what will get
highlighted:

id, title, content_type, last_modified, url, score 

id, title, content, author, tag

Unless you override fl or hl.fl in url parameters you can get a hit in
content_type, last_modified, url, or score and those fields will not get
highlighted. Try adding those fields to hl.fl.
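[Editor's note: a sketch of widening hl.fl as Scott suggests. The host, collection name, query, and exact field lists here are illustrative placeholders, not taken from the thread:

```python
from urllib.parse import urlencode

# Build a /select request where hl.fl covers every field the query can
# match, so any hit can produce a highlight snippet.
params = {
    "q": "content_type:pdf",                     # placeholder query
    "fl": "id,title,content_type,last_modified,url,score",
    "hl": "true",
    "hl.fl": "id,title,content,author,tag,content_type,last_modified,url",
}
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
```

These are standard Solr request parameters; passing them in the URL overrides the defaults baked into the request handler in solrconfig.xml.]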


k/r,
Scott

On Fri, Dec 4, 2015 at 12:59 AM, Zheng Lin Edwin Yeo 
wrote:

> Hi Scott,
>
> No, what's described in SOLR-8334 is the tag appearing in the result, but at
> the wrong position.
>
> For this problem, the situation is that when I do a highlight query, some
> of the results in the result set do not contain the search word in title,
> content_type, last_modified and url, as specified in the solrconfig.xml
> which I posted earlier on, and there is no  tag in those results. So
> I'm not sure why those results are returned.
>
> Regards,
> Edwin
>
>
> On 4 December 2015 at 01:03, Scott Stults <
> sstu...@opensourceconnections.com
> > wrote:
>
> > Edwin,
> >
> > Is this related to what's described in SOLR-8334?
> >
> >
> > k/r,
> > Scott
> >
> > On Thu, Dec 3, 2015 at 5:07 AM, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I'm using Solr 5.3.0.
> > > Would like to find out, during a search, sometimes there is a match in
> > > content, but it is not highlighted (the word is not in the stopword
> > list)?
> > > Did I make any mistakes in my configuration?
> > >
> > > This is my highlighting request handler from solrconfig.xml.
> > >
> > > 
> > > 
> > > explicit
> > > 10
> > > json
> > > true
> > > text
> > > id, title, content_type, last_modified, url, score
> 
> > >
> > > on
> > > id, title, content, author, tag
> > >true
> > > true
> > > html
> > > 200
> > >
> > > true
> > > signature
> > > true
> > > 100
> > > 
> > > 
> > >
> > >
> > > This is my pipeline for the field.
> > >
> > >   > > positionIncrementGap="100">
> > >
> > >
> > >
> > > class="analyzer.solr5.jieba.JiebaTokenizerFactory"
> > > segMode="SEARCH"/>
> > >
> > >
> > >
> > >
> > >
> > > > > words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
> > >
> > > > > words="stopwords.txt" />
> > >
> > > > > generateWordParts="1" generateNumberParts="1" catenateWords="0"
> > > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> > >
> > > > > synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
> > >
> > >
> > >
> > > > > maxGramSize="15"/>
> > >
> > >
> > >
> > >
> > >
> > > class="analyzer.solr5.jieba.JiebaTokenizerFactory"
> > > segMode="SEARCH"/>
> > >
> > >
> > >
> > >
> > >
> > > > > words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
> > >
> > > > > words="stopwords.txt" />
> > >
> > > > > generateWordParts="0" generateNumberParts="0" catenateWords="0"
> > > catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
> > >
> > > > > synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
> > >
> > >
> > >
> > > 
> > >
> > >  
> > >
> > >
> > > Regards,
> > > Edwin
> > >
> >
> >
> >
> > --
> > Scott Stults | Founder & Solutions Architect | OpenSource Connections,
> LLC
> > | 434.409.2780
> > http://www.opensourceconnections.com
> >
>



-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


Re: Highlighting tag problem

2015-12-07 Thread Erick Erickson
Pedantry here:

bq: Unless you override fl or hl.fl in url parameters you can get a hit in
content_type, last_modified, url, or score and those fields will not get
highlighted.

In the main correct, but the phrasing makes it seem like the fl parameter
has something to do with the fields _searched_, when it just
specifies the fields _returned_. Perhaps you're thinking of qf in
edismax? Or df?...

It's spot on that the hl.fl fields are all that's highlighted and this is
probably the issue the OP had.

Best,
Erick

On Mon, Dec 7, 2015 at 9:22 AM, Scott Stults
 wrote:
> I see. There appears to be a gap in what you can match on and what will get
> highlighted:
>
> id, title, content_type, last_modified, url, score 
>
> id, title, content, author, tag
>
> Unless you override fl or hl.fl in url parameters you can get a hit in
> content_type, last_modified, url, or score and those fields will not get
> highlighted. Try adding those fields to hl.fl.
>
>
> k/r,
> Scott
>
> On Fri, Dec 4, 2015 at 12:59 AM, Zheng Lin Edwin Yeo 
> wrote:
>
>> Hi Scott,
>>
>> No, what's described in SOLR-8334 is the tag appearing in the result, but at
>> the wrong position.
>>
>> For this problem, the situation is that when I do a highlight query, some
>> of the results in the result set do not contain the search word in title,
>> content_type, last_modified and url, as specified in the solrconfig.xml
>> which I posted earlier on, and there is no  tag in those results. So
>> I'm not sure why those results are returned.
>>
>> Regards,
>> Edwin
>>
>>
>> On 4 December 2015 at 01:03, Scott Stults <
>> sstu...@opensourceconnections.com
>> > wrote:
>>
>> > Edwin,
>> >
>> > Is this related to what's described in SOLR-8334?
>> >
>> >
>> > k/r,
>> > Scott
>> >
>> > On Thu, Dec 3, 2015 at 5:07 AM, Zheng Lin Edwin Yeo <
>> edwinye...@gmail.com>
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > I'm using Solr 5.3.0.
>> > > Would like to find out, during a search, sometimes there is a match in
>> > > content, but it is not highlighted (the word is not in the stopword
>> > list)?
>> > > Did I make any mistakes in my configuration?
>> > >
>> > > This is my highlighting request handler from solrconfig.xml.
>> > >
>> > > 
>> > > 
>> > > explicit
>> > > 10
>> > > json
>> > > true
>> > > text
>> > > id, title, content_type, last_modified, url, score
>> 
>> > >
>> > > on
>> > > id, title, content, author, tag
>> > >true
>> > > true
>> > > html
>> > > 200
>> > >
>> > > true
>> > > signature
>> > > true
>> > > 100
>> > > 
>> > > 
>> > >
>> > >
>> > > This is my pipeline for the field.
>> > >
>> > >  > > > positionIncrementGap="100">
>> > >
>> > >
>> > >
>> > >> class="analyzer.solr5.jieba.JiebaTokenizerFactory"
>> > > segMode="SEARCH"/>
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >> > > words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
>> > >
>> > >> > > words="stopwords.txt" />
>> > >
>> > >> > > generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> > > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>> > >
>> > >> > > synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
>> > >
>> > >
>> > >
>> > >> > > maxGramSize="15"/>
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >> class="analyzer.solr5.jieba.JiebaTokenizerFactory"
>> > > segMode="SEARCH"/>
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >> > > words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
>> > >
>> > >> > > words="stopwords.txt" />
>> > >
>> > >> > > generateWordParts="0" generateNumberParts="0" catenateWords="0"
>> > > catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
>> > >
>> > >> > > synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
>> > >
>> > >
>> > >
>> > > 
>> > >
>> > >  
>> > >
>> > >
>> > > Regards,
>> > > Edwin
>> > >
>> >
>> >
>> >
>> > --
>> > Scott Stults | Founder & Solutions Architect | OpenSource Connections,
>> LLC
>> > | 434.409.2780
>> > http://www.opensourceconnections.com
>> >
>>
>
>
>
> --
> Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
> | 434.409.2780
> http://www.opensourceconnections.com


Re: how to control location of solr PID file

2015-12-07 Thread Chris Hostetter

: Subject: how to control location of solr PID file

https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production

https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production#TakingSolrtoProduction-Environmentoverridesincludefile

>> Environment overrides include file
>> ...
>> The SOLR_PID_DIR variable sets the directory where the start script 
>> will write out a file containing the Solr server’s process ID. 

: on 8983. What's the recommended file structure for placing multiple nodes on
: the same host? I am trying multiple "solr" folders within the same solr home

Use a different "Solr home dir" for each "node" -- there is an install 
option explicitly for this, and some discussion in the ref guide of what 
your various service commands will look like when you do this...

https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production#TakingSolrtoProduction-RunningmultipleSolrnodesperhost

-Hoss
http://www.lucidworks.com/

Re: Solrcloud: 1 server, 1 configset, multiple collections, multiple schemas

2015-12-07 Thread Shawn Heisey
On 12/7/2015 9:46 AM, bengates wrote:
> If I understand well, there is actually no way to create a core or a
> collection from the API, with a defined-at-once configset, without having to
> do some CLI commands on the remote server?

With SolrCloud, the only step that requires commandline is uploading the
configuration to zookeeper, which is done with the zkcli script included
with Solr.  This script talks to zookeeper over the TCP network socket,
so it can be run from anywhere with network access to the zookeeper
servers.  You do not need to run it directly on the remote Solr server.

With a zookeeper client that's not solr-specific, you may be able to
have even more control, but it won't be as easy as zkcli.
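[Editor's note: a sketch of the resulting workflow. The one command-line step is uploading the configset with zkcli (upconfig); after that, collection creation is pure HTTP via the Collections API. Collection name, configset name, and host below are placeholders:

```python
from urllib.parse import urlencode

# Assumes a configset named "myconf" was already uploaded to ZooKeeper
# with zkcli upconfig. Creating a collection from it is a single HTTP call.
params = {
    "action": "CREATE",
    "name": "restaurants",            # placeholder collection name
    "numShards": 1,
    "replicationFactor": 2,
    "collection.configName": "myconf",
}
create_url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
```

Issuing a GET on create_url (with e.g. urllib.request) asks Solr to create the collection; no further shell access to the server is needed.]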

I've used the zookeeper plugin for eclipse, but their website seems to
be broken.  Here's the URL, I hope it starts working at some point:

http://www.massedynamic.org/mediawiki/index.php?title=Eclipse_Plug-in_for_ZooKeeper

Thanks,
Shawn



Re: FastVector Highlighter

2015-12-07 Thread Zheng Lin Edwin Yeo
Hi Emir,

Thanks for the clarification. If that improves the search performance,
then it is worthwhile to have the bigger index size, since for any search
engine it is the speed and accuracy of the results returned that determine
whether the search engine is a good one.

Regards,
Edwin


On 7 December 2015 at 23:40, Emir Arnautovic 
wrote:

> Hi Edwin,
> It is expected since you are storing more info about document.
>
> Thanks,
> Emir
>
> p.s. one correction - I meant offsets, not frequencies.
>
>
> On 07.12.2015 15:47, Zheng Lin Edwin Yeo wrote:
>
>> Hi Emir,
>>
>> The term vectors, positions and frequencies weren't enabled, so I'm doing
>> the re-indexing. That is where I found that the index size is bigger than
>> it was previously, when I was using the Original Highlighter.
>>
>> Regards,
>> Edwin
>>
>>
>> On 7 December 2015 at 19:19, Emir Arnautovic <
>> emir.arnauto...@sematext.com>
>> wrote:
>>
>> Hi Edwin,
>>> FastVector Highlighter requires term vector, positions and frequencies,
>>> so
>>> if it is not enabled on fields that you want to highlight, it will
>>> increase
>>> index size. Since it is common to have those enabled for standard
>>> highlighter to speed up highlighting, those might already be enabled,
>>> otherwise reindexing is required as well.
>>>
>>> Regards,
>>> Emir
>>>
>>>
>>> On 07.12.2015 12:06, Zheng Lin Edwin Yeo wrote:
>>>
>>> Hi,

 Would like to check: will using the FastVector Highlighter take up more
 indexing space (a larger index size) as compared to the Original Highlighter?

 I'm using Solr 5.3.0

 Regards,
 Edwin


 --
>>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>>> Solr & Elasticsearch Support * http://sematext.com/
>>>
>>>
>>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


Re: how to control location of solr PID file

2015-12-07 Thread tedsolr
Thanks Hoss. I did not use the service install script to install Solr, so I
don't have the service tool set up as a shortcut. I suppose I'll just have to
specify the port when shutting down a node, since "-all" does not work.
Obviously I'm not a unix admin.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-control-location-of-solr-PID-file-tp4243789p4244074.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlighting tag problem

2015-12-07 Thread Zheng Lin Edwin Yeo
So the fields in fl will affect which fields get highlighted?

Isn't it only the fields specified in hl.fl that are highlighted? I found
that some fields not specified in hl.fl also got highlighted, but since
they are not specified in hl.fl, those fields are not shown in the result
set, and the result set will show a record which doesn't have any
highlight in it.

Regards,
Edwin


On 8 Dec 2015 2:30 am, "Erick Erickson"  wrote:
>
> Pedantry here:
>
> bq: Unless you override fl or hl.fl in url parameters you can get a hit in
> content_type, last_modified, url, or score and those fields will not get
> highlighted.
>
> In the main correct, but the phrasing makes it seem like the fl parameter
> has something to do with the fields _searched_, when it just
> specifies the fields _returned_. Perhaps you're thinking of qf in
> edismax? Or df?...
>
> It's spot on that the hl.fl fields are all that's highlighted and this is
> probably the issue the OP had.
>
> Best,
> Erick
>
> On Mon, Dec 7, 2015 at 9:22 AM, Scott Stults
>  wrote:
> > I see. There appears to be a gap in what you can match on and what will get
> > highlighted:
> >
> > id, title, content_type, last_modified, url, score 
> >
> > id, title, content, author, tag
> >
> > Unless you override fl or hl.fl in url parameters you can get a hit in
> > content_type, last_modified, url, or score and those fields will not get
> > highlighted. Try adding those fields to hl.fl.
> >
> >
> > k/r,
> > Scott
> >
> > On Fri, Dec 4, 2015 at 12:59 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> > wrote:
> >
> >> Hi Scott,
> >>
> >> No, what's described in SOLR-8334 is the tag appearing in the result, but
> >> at the wrong position.
> >>
> >> For this problem, the situation is that when I do a highlight query, some
> >> of the results in the result set do not contain the search word in title,
> >> content_type, last_modified and url, as specified in the solrconfig.xml
> >> which I posted earlier on, and there is no  tag in those results. So
> >> I'm not sure why those results are returned.
> >>
> >> Regards,
> >> Edwin
> >>
> >>
> >> On 4 December 2015 at 01:03, Scott Stults <
> >> sstu...@opensourceconnections.com
> >> > wrote:
> >>
> >> > Edwin,
> >> >
> >> > Is this related to what's described in SOLR-8334?
> >> >
> >> >
> >> > k/r,
> >> > Scott
> >> >
> >> > On Thu, Dec 3, 2015 at 5:07 AM, Zheng Lin Edwin Yeo <
> >> edwinye...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > I'm using Solr 5.3.0.
> >> > > Would like to find out, during a search, sometimes there is a
match in
> >> > > content, but it is not highlighted (the word is not in the stopword
> >> > list)?
> >> > > Did I make any mistakes in my configuration?
> >> > >
> >> > > This is my highlighting request handler from solrconfig.xml.
> >> > >
> >> > > 
> >> > > 
> >> > > explicit
> >> > > 10
> >> > > json
> >> > > true
> >> > > text
> >> > > id, title, content_type, last_modified, url, score
> >> 
> >> > >
> >> > > on
> >> > > id, title, content, author, tag
> >> > >true
> >> > > true
> >> > > html
> >> > > 200
> >> > >
> >> > > true
> >> > > signature
> >> > > true
> >> > > 100
> >> > > 
> >> > > 
> >> > >
> >> > >
> >> > > This is my pipeline for the field.
> >> > >
> >> > >   >> > > positionIncrementGap="100">
> >> > >
> >> > >
> >> > >
> >> > > >> class="analyzer.solr5.jieba.JiebaTokenizerFactory"
> >> > > segMode="SEARCH"/>
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > >> > > words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
> >> > >
> >> > > >> > > words="stopwords.txt" />
> >> > >
> >> > > >> > > generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >> > > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >> > >
> >> > > >> > > synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
> >> > >
> >> > >
> >> > >
> >> > > >> > > maxGramSize="15"/>
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > >> class="analyzer.solr5.jieba.JiebaTokenizerFactory"
> >> > > segMode="SEARCH"/>
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > >> > > words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
> >> > >
> >> > > >> > > words="stopwords.txt" />
> >> > >
> >> > > >> > > generateWordParts="0" generateNumberParts="0" catenateWords="0"
> >> > > catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
> >> > >
> >> > > >> > > synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
> >> > >
> >> > >
> >> > >
> >> > > 
> >> > >
> >> > >  
> >> > >
> >> > >
> >> > > Regards,
> >> > > Edwin
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC |

Re: Highlighting tag problem

2015-12-07 Thread Erick Erickson
bq: So the fields in the fl will affect the fields that will be highlighted?

No. The pedantry was that one of the replies could be read as saying that
the fl specification affected what fields were _searched_.
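A concrete request makes the split visible. The sketch below only builds the URL (no request is sent); the host, collection name, and field lists are placeholders based on this thread, and hl.requireFieldMatch is included because it restricts highlighting to fields that actually matched the query, which may be the behavior being asked for here:

```python
# Sketch: fl controls the fields *returned*, hl.fl the fields *highlighted*.
# hl.requireFieldMatch=true (default is false) additionally limits highlight
# entries to fields the query actually matched. Host and collection name
# ("collection1") are made-up placeholders.
from urllib.parse import urlencode

params = {
    "q": "title:water",
    "fl": "id, title, content_type, last_modified, url, score",  # returned fields
    "hl": "true",
    "hl.fl": "id, title, content, author, tag",                  # highlighted fields
    "hl.requireFieldMatch": "true",
}
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(url)
```

With hl.requireFieldMatch left at its default, a term that matches only a field outside hl.fl can still trigger highlighting of that term in the hl.fl fields, so records can come back with empty highlight sections.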

On Mon, Dec 7, 2015 at 2:43 PM, Zheng Lin Edwin Yeo
 wrote:
> So the fields in the fl will affect the fields that will be highlighted?
>
> Isn't only those fields that are specified in hl.fl be highlighted? As I
> found some fields that are not specified in hl.fl also got highlighted, but
> since it is not specified in hl.fl, that field is not shown in the result
> set, and the result set will show a record which doesn't have any highlight
> in it.
>
> Regards,
> Edwin
>
>
> On 8 Dec 2015 2:30 am, "Erick Erickson"  wrote:
>>
>> Pedantry here:
>>
>> bq: Unless you override fl or hl.fl in url parameters you can get a hit in
>> content_type, last_modified, url, or score and those fields will not get
>> highlighted.
>>
>> In the main correct, but the phrasing makes it seem like the fl parameter
>> has something to do with the fields _searched_, when it just
>> specifies the fields _returned_. Perhaps you're thinking of qf in
>> edismax? Or df?...
>>
>> It's spot on that the hl.fl fields are all that's highlighted and this is
>> probably the issue the OP had.
>>
>> Best,
>> Erick
>>

Re: Match All terms in indexed field value

2015-12-07 Thread Ahmet Arslan
Hi Senthil,

Please see my response (FunctionQuery) to the same question on the list:
http://find.searchhub.org/document/d3af5d5a74114a07#5081d5c01219dbdf

Thanks,
Ahmet
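Independent of the Solr-side mechanics the linked reply describes, the required semantics reduce to a set-containment check. A plain-Python sketch, using the examples from the question (naive whitespace tokenization, purely illustrative):

```python
# Sketch of the matching rule: a document matches only when EVERY term of
# its indexed title appears in the user's search text, in any order.
# Tokenization here is naive lowercase whitespace splitting.
def matches_all_title_terms(title: str, search_text: str) -> bool:
    title_terms = set(title.lower().split())
    query_terms = set(search_text.lower().split())
    return title_terms <= query_terms  # title terms must be a subset of query terms

title = "refrigerator water filter"
print(matches_all_title_terms(title, "water filter"))                   # False: subset only
print(matches_all_title_terms(title, "ABC refrigerator water filter"))  # True
print(matches_all_title_terms(title, "water filter refrigerator"))      # True
```

A Solr-side version typically also indexes the number of terms in title and then filters on "count of matched title terms equals the stored term count"; the linked thread spells out one FunctionQuery formulation of that idea.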

On Monday, December 7, 2015 4:52 AM, Senthil  wrote:



Scenario: A document should be matched/returned ONLY IF the user-entered
search text contains ALL the terms of a single indexed field, in any order.

Ex.
Document has got only 2 fields. Id and title.

Below document is indexed.
{"id":"1", "title": "refrigerator water filter"}

The search texts below should NOT return the document, as the search text is
a subset of the indexed field value.
1) water filter
2) refrigerator

The search texts below should return the document, as all indexed terms of
the title field are present in the search text.

1) ABC refrigerator water filter
2) water filter ABC refrigerator
3) water filter refrigerator

Please advise how to model this scenario in SOLR?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Match-All-terms-in-indexed-field-value-tp4243895.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ZooKeeper nodes die taking down Solr Cluster?

2015-12-07 Thread Shawn Heisey
On 12/1/2015 8:21 AM, Kelly, Frank wrote:
> java.lang.OutOfMemoryError: Java heap space
> 597593838 INFO  
> (zkCallback-4-thread-1103-processing-n:52.91.90.134:8983_solr) [   ]
> o.a.s.s.ZkIndexSchemaReader A schema change: WatchedEvent
> state:SyncConnected type:NodeDataChanged
> path:/configs/mycollection/managed-schema, has occurred - updating schema
> from ZooKeeper ...
>
>
> So it looks like it ran out of memory . . . Strange but I thought my
> collections were pretty small.
> Any idea why a replace-field-type call might cause an OutOfMemoryException?

The default heap size on Solr 5.x is 512MB.  This is extremely small; it
doesn't take much index data for this amount of memory to be exceeded.

Another message you sent to the list mentions "-m 1g" when starting Solr
... even a gigabyte of RAM might be very small, depending on exactly how
Solr is configured, what is indexed into Solr, and how it is being queried.

Unless you can take steps to make Solr use less memory, you're going to
need to increase the max heap.

Thanks,
Shawn



Re: how to control location of solr PID file

2015-12-07 Thread Shawn Heisey
On 12/5/2015 3:21 PM, tedsolr wrote:
> I'm running v5.2.1 on red hat linux. The "solr status" command is not
> recognizing all my nodes. Consequently, "solr stop -all" only stops the node
> on 8983. What's the recommended file structure for placing multiple nodes on
> the same host? I am trying multiple "solr" folders within the same solr home
> folder. With SOLRHOME/server/solr & SOLRHOME/server/solr2, what I get after
> starting two nodes in cloud mode (with one embedded ZK) is a solr-8983.pid
> file in SOLRHOME/bin & a solr-8984.pid file in SOLRHOME/server/solr2.
> Doesn't seem right. Can I force PID files to all be in SOLRHOME/bin?

Most people will never have any need to run more than one SolrCloud
instance per server once they move beyond the "-e cloud" example, which
sets up multiple nodes on one machine because it's the only practical
way to make that example work.

There are valid reasons for multiple nodes per machine in production,
but they mostly apply to extremely high-end hardware running very large
indexes.  For a more modest Solr install, the extra complexity is
unnecessary and will most likely waste memory.  One Solr instance can
handle multiple indexes easily.

If you really do need more than one node per server, then you definitely
want separate solr home directories for each instance.  If it were me
and running only one Solr instance wasn't possible or practical for some
reason, I would go with completely separate installs where each one has
its own service name and solr home, so there are completely separate
copies of all the install files and the startup script.  This is the
method outlined in the documentation page that Hoss gave you.

Thanks,
Shawn



Re: secure solr 5.3.1

2015-12-07 Thread Don Bosco Durai
Have you considered running your Solr as SolrCloud with embedded zookeeper?

If you do, you have multiple options: Basic Auth, Kerberos, and authorization
support.


Bosco
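For reference, with SolrCloud on 5.3.x the Basic Auth plugin is configured through a security.json stored in ZooKeeper. A minimal sketch follows; the sha256 hash/salt pair below is the widely published reference-guide example for user "solr" with password "SolrRocks" and must be replaced before real use:

```python
# Sketch: a minimal security.json enabling the Basic Auth and rule-based
# authorization plugins (available from Solr 5.3). The credentials value is
# the well-known example hash/salt pair for "solr" / "SolrRocks" from the
# Solr reference guide -- generate your own before deploying.
import json

security = {
    "authentication": {
        "class": "solr.BasicAuthPlugin",
        "credentials": {
            "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= "
                    "Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
        },
    },
    "authorization": {
        "class": "solr.RuleBasedAuthorizationPlugin",
        "permissions": [{"name": "security-edit", "role": "admin"}],
        "user-role": {"solr": "admin"},
    },
}
print(json.dumps(security, indent=2))
```

The file is uploaded to ZooKeeper (e.g. with the zkcli script) rather than placed on disk, which is why this path assumes SolrCloud mode.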





On 12/7/15, 7:03 AM, "kostali hassan"  wrote:

>How should I secure my Solr 5.3.1 server in single-node mode? I am
>searching for the best way to secure my Solr server, but I have found
>guidance only for cloud mode.



Solr Spatial search with self-intersecting polygons

2015-12-07 Thread Vishnu perumal
Hi,

I'm using Solr 4.10.2 with up-to-date versions of JTS and spatial4j. As the
field type in my schema.xml I'm using "location_rpt", as described
in the documentation. (
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4#How_to_Use)

location_rpt field type setup:






My filter query looks like this:

latlon_rpt:"INTERSECTS(POLYGON((16.243972778320312 48.27016879304729,
16.411170959472656 48.268340583150504, 16.44275665283203 48.19058119922813,
16.32396697998047 48.15921534239267, 16.243972778320312 48.27016879304729)))"


Everything works fine. My problem is that when I try to use a more complex
(self-intersecting) polygon, Solr only shows an error like this:
"Couldn't parse shape 'POLYGON((-87.525029 41.676998,-87.508635
41.680781,-87.494559 41.681037,-87.485719 41.680332,-87.475333
41.677447,-87.465205 41.675011,-87.443232 41.667574,-87.434992
41.658084,-87.443876 41.653338,-87.435615 41.646604,-87.43216
41.641409,-87.432332 41.630857,-87.432374 41.621907,-87.432418
41.611289,-87.433083 41.606523,-87.432718 41.600475,-87.432846
41.58709,-87.433018 41.577716,-87.432675 41.565902,-87.424092
41.560892,-87.423147 41.548814,-87.425594 41.540271,-87.439713
41.530443,-87.471643 41.520933,-87.493787 41.517463,-87.525721
41.516847,-87.52572 41.570552,-87.536797 41.574096,-87.545295
41.577723,-87.549801 41.577852,-87.568684 41.57737,-87.568727
41.587835,-87.56907 41.639941,-87.524953 41.625187,-87.525471
41.648549,-87.539761 41.6481,-87.539751 41.643562,-87.543345
41.646272,-87.548001 41.650056,-87.5362 41.650089,-87.533582
41.650056,-87.533625 41.648292,-87.525492 41.648549,-87.525029 41.676998))'
because: com.spatial4j.core.exception.InvalidShapeException:
Self-intersection at or near point (-87.52917652682729, 41.648432570220756,
NaN)","code":400}


Are there any workarounds to get self-intersecting polygon query to work?
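One workaround people use at the JTS/spatial4j layer is shape repair: spatial4j's JtsSpatialContextFactory has a validationRule attribute (with values such as repairBuffer0) in releases that support it, so check your spatial4j version. Another is to validate client-side and fix the ring before it ever reaches Solr. A naive pure-Python pre-check for a self-intersecting ring, for illustration only:

```python
# Sketch: detect a self-intersecting polygon ring client-side with a naive
# O(n^2) proper-segment-intersection test. "Proper" intersection ignores
# segments that merely share an endpoint (adjacent ring edges always do).
def _orient(a, b, c):
    # >0 counter-clockwise, <0 clockwise, 0 collinear
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def _properly_intersect(p1, p2, p3, p4):
    d1, d2 = _orient(p3, p4, p1), _orient(p3, p4, p2)
    d3, d4 = _orient(p1, p2, p3), _orient(p1, p2, p4)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def is_self_intersecting(ring):
    # ring: list of (x, y); the last point may repeat the first (WKT style)
    pts = ring[:-1] if ring[0] == ring[-1] else ring
    n = len(pts)
    edges = [(pts[i], pts[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # first and last edges are adjacent, skip them
            if _properly_intersect(*edges[i], *edges[j]):
                return True
    return False

bowtie = [(0, 0), (1, 1), (1, 0), (0, 1), (0, 0)]  # crosses itself
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(is_self_intersecting(bowtie))  # True
print(is_self_intersecting(square))  # False
```

On the repair side, the usual JTS trick when building shapes yourself is geometry.buffer(0), which often splits or re-assembles an invalid ring into a valid (multi)polygon; the exact result depends on the geometry.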


Re: Use multiple istance simultaneously

2015-12-07 Thread Shawn Heisey
On 12/4/2015 6:37 AM, Gian Maria Ricci - aka Alkampfer wrote:
> Many thanks for your response.
> 
> I worked with Solr until early version 4.0, then switched to ElasticSearch
> for a variety of reasons. I've used replication in the past with SolR, but
> with Elasticsearch basically I had no problem because it works similar to
> SolrCloud by default and with almost zero configuration.
> 
> Now I've a customer that want to use Solr, and he want the simplest possible
> stuff to maintain in production. Since most of the work will be done by Data
> Import Handler, having multiple parallel and independent machines is easy to
> maintain. If one machine fails, it is enough to configure another machine,
> configure core and restart DIH.
> 
> I'd like to know if other people went through this path in the past.

Even though I don't use SolrCloud myself for my primary indexes, if I
were setting up a brand new install of Solr for someone else to manage
after I'm finished with it, I would use SolrCloud.  SolrCloud has no
master, no single point of failure.  Handling multiple shards and
multiple replicas is mostly automatic.  If the clients use SolrJ,
there's no need for a load balancer.

I've never used elasticsearch, but I've looked a little bit at its
configuration.  There are aspects of it that are much easier than Solr.
 Solr does not hide very much of the lower-level complexity from the
administrator.  This makes the learning curve for Solr a lot steeper
than the learning curve for ES, but once that is tackled, the Solr
administrator understands the inner workings a lot better than the ES
administrator.

I've seen claims that ES is much faster than Solr ... but if the
benchmarks supporting those claims are using the out-of-the-box
configurations, then it is an unfair comparison -- Solr's out of the box
configuration has much more capability turned on and is going to run
slower as a result.  I have not seen any numbers where Solr and ES are
set up with configurations that are as identical as possible.  I have to
wonder if this is because the performance would be similar.

Thanks,
Shawn



Solr 5.2.1 deadlock on commit

2015-12-07 Thread Ali Nazemian
Hi,
I have had a problem with Solr 5.2.1 for a while now and could not fix it
yet. The only thing that is clear to me is that when I send a bulk update
to Solr, the commit thread gets blocked! Here is the thread dump output:

"qtp595445781-8207" prio=10 tid=0x7f0bf68f5800 nid=0x5785 waiting for
monitor entry [0x7f081cf04000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:608)
- waiting to lock <0x00067ba2e660> (a java.lang.Object)
at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1635)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1612)
at
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:270)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
- None

FYI, there are lots of blocked threads in the thread dump report, and Solr
becomes really slow in this case. The temporary solution is restarting Solr,
but I am really sick of restarting! I would really appreciate it if somebody
could help me solve this problem.

Best regards.

-- 
A.Nazemian
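Not a fix, but a triage aid: counting how many threads are queued behind each monitor in a dump like the one above quickly confirms whether there is a single hot lock. A small sketch (the dump text is whatever jstack printed; the sample here reuses lines from this thread):

```python
# Sketch: tally "waiting to lock <0x...>" lines in a jstack-style thread
# dump to see which monitor most BLOCKED threads are queued behind.
import re
from collections import Counter

def hot_monitors(dump_text):
    addrs = re.findall(r"waiting to lock <(0x[0-9a-f]+)>", dump_text)
    return Counter(addrs).most_common()

sample = """
"qtp595445781-8207" prio=10 ... waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:608)
        - waiting to lock <0x00067ba2e660> (a java.lang.Object)
"qtp595445781-8210" prio=10 ... waiting for monitor entry
        - waiting to lock <0x00067ba2e660> (a java.lang.Object)
"""
print(hot_monitors(sample))  # [('0x00067ba2e660', 2)]
```

If one monitor dominates, the next question is what holds it; for the DirectUpdateHandler2 commit lock, overlapping explicit commits from concurrent bulk updates are a common cause, and relying on autoCommit/commitWithin instead of per-batch commits is the usual mitigation.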