Editing the Solr Wiki

2015-05-04 Thread Nicole Butterfield
Dear Solr Admins,
I'm writing on behalf of Manning Publications regarding the Solr wiki page: 
https://wiki.apache.org/solr/.  I would like to edit the book listings on the 
Solr wiki to include our new MEAP "Taming Search": 
http://www.manning.com/turnbull/. I have already set up an account with the 
username NicoleButterfield.  Many thanks in advance for your help and kind 
regards,

Nicole Butterfield
Review Editor
Manning Publications Co. | www.manning.com

Storing SolrCloud index data in Amazon S3

2015-05-04 Thread Vijay Bhoomireddy
Hi,

 

Just wondering whether there is a provision to store SolrCloud index data on
Amazon S3? Please let me know any pointers.

 

Regards

Vijay


-- 
The contents of this e-mail are confidential and for the exclusive use of 
the intended recipient. If you receive this e-mail in error please delete 
it from your system immediately and notify us either by e-mail or 
telephone. You should not copy, forward or otherwise disclose the content 
of the e-mail. The views expressed in this communication may not 
necessarily be the view held by WHISHWORKS.


Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina

Dear Solr Community,

I have a recent computer with 8 GB RAM; I installed Ubuntu 14.04, Solr 
5.0, and Java 7.

This is a brand new installation.

All works fine, but I would like to increase SOLR_JAVA_MEM (to 40% of 
total RAM available).

So I edited bin/solr.in.sh:

# Increase Java Min/Max Heap as needed to support your indexing / query 
needs

SOLR_JAVA_MEM="-Xms3g –Xmx3g -XX:MaxPermSize=512m -XX:PermSize=512m"

but with this param, the Solr server can't be started. I use:
bin/solr start

Do you have an idea what the problem is?

Thanks a lot for your comment,
Bruno




Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina

Dear Solr Users,

I have a brand new computer where I installed Ubuntu 14.04 (8 GB RAM),
Solr 5.0, and Java 7.
I indexed 92,000,000 docs (little text files, ~2 KB each).
I have around 30 fields.

All works fine, but each Tuesday I need to delete some docs, so I
create a batch file
with lines like this:
/home/solr/solr-5.0.0/bin/post -c docdb  -commit no -d
"f1:58644"
/home/solr/solr-5.0.0/bin/post -c docdb  -commit no -d
"f1:162882"
..
.
/home/solr/solr-5.0.0/bin/post -c docdb  -commit yes -d
"f1:2868668"

my f1 field is my key field. It is unique.

But if my file contains more than one or two hundred lines, my Solr
shuts down.
Two hundred lines always shuts down Solr 5.0.
I have no error in my console; Solr simply can't be reached on port 8983.

Is there a variable that I must increase to avoid this error?

On my old Solr 3.6, I didn't use the same line to delete documents; I used:
java -jar -Ddata=args -Dcommit=no  post.jar
"113422"

You can see that there I passed the id value directly rather than a query
on the key field, and my schema between Solr 3.6 and Solr 5.0 is almost
the same.
I just have a few more fields.
Why does this method not work now?

Thanks a lot,
Bruno





Re: Solr Cloud reclaiming disk space from deleted documents

2015-05-04 Thread Rishi Easwaran
Sadly, with the size of our complex, splitting and adding more HW is not a viable 
long term solution. 
 I guess the options we have are to run optimize regularly and/or become 
proactively aggressive in our merges even before Solr Cloud gets into this 
situation.
 
 Thanks,
 Rishi.
 

 

 

-Original Message-
From: Gili Nachum 
To: solr-user 
Sent: Mon, Apr 27, 2015 4:18 pm
Subject: Re: Solr Cloud reclaiming disk space from deleted documents


To prevent it from re occurring you could monitor index size and once above
a certain size threshold add another machine and split the shard between
existing and new machine.
On Apr 20, 2015 9:10 PM, "Rishi Easwaran"  wrote:

> So is there anything that can be done from a tuning perspective, to
> recover a shard that is 75%-90% full, other that get rid of the index and
> rebuild the data?
>  Also to prevent this issue from re-occurring, looks like we need make our
> system aggressive with segment merges using lower merge factor
>
> Thanks,
> Rishi.
>
>
> -Original Message-
> From: Shawn Heisey 
> To: solr-user 
> Sent: Mon, Apr 20, 2015 11:25 am
> Subject: Re: Solr Cloud reclaiming disk space from deleted documents
>
>
> On 4/20/2015 8:44 AM, Rishi Easwaran wrote:
> > Yeah I noticed that. Looks like optimize won't work since on some disks
> > we are already pretty full.
> > Any thoughts on increasing/decreasing 10 or ConcurrentMergeScheduler to
> > make solr do merges faster.
>
> You don't have to do an optimize to need 2x disk space.  Even normal
> merging, if it happens just right, can require the same disk space as a
> full optimize.  Normal Solr operation requires that you have enough
> space for your index to reach at least double size on occasion.
>
> Higher merge factors are better for indexing speed, because merging
> happens less frequently.  Lower merge factors are better for query
> speed, at least after the merging finishes, because merging happens
> more frequently and there are fewer total segments at any given moment.
>
> During a merge, there is so much I/O that query speed is often
> negatively affected.
>
> Thanks,
> Shawn
>

 


Re: Multiple index.timestamp directories using up disk space

2015-05-04 Thread Rishi Easwaran
Thanks for the responses Mark and Ramkumar.
 
 The question I had was: why does Solr need 2 copies at any given time, leading 
to 2x disk space usage? 
 This information does not seem to be published anywhere, which makes HW 
estimation almost impossible for a large scale deployment. Even if the copies are 
temporary, this becomes really expensive, especially when using SSD in 
production, when the complex size is over 400TB of indexes, running 1000's of 
solr cloud shards. 
 
 If a Solr follower has decided that it needs to replicate from the leader and 
capture a full copy snapshot, why can't it delete the old information and 
replicate from scratch, not requiring more disk space?
 Is the concern data loss (a case when both leader and follower lose data)?
 
 Thanks,
 Rishi.

 

 

 

-Original Message-
From: Mark Miller 
To: solr-user 
Sent: Tue, Apr 28, 2015 10:52 am
Subject: Re: Multiple index.timestamp directories using up disk space


If copies of the index are not eventually cleaned up, I'd fill a JIRA to
address the issue. Those directories should be removed over time. At times
there will have to be a couple around at the same time and others may take
a while to clean up.

- Mark

On Tue, Apr 28, 2015 at 3:27 AM Ramkumar R. Aiyengar <
andyetitmo...@gmail.com> wrote:

> SolrCloud does need up to twice the amount of disk space as your usual
> index size during replication. Amongst other things, this ensures you have
> a full copy of the index at any point. There's no way around this, I would
> suggest you provision the additional disk space needed.
> On 20 Apr 2015 23:21, "Rishi Easwaran"  wrote:
>
> > Hi All,
> >
> > We are seeing this problem with solr 4.6 and solr 4.10.3.
> > For some reason, solr cloud tries to recover and creates a new index
> > directory - (ex:index.20150420181214550), while keeping the older index
> > as is. This creates an issues where the disk space fills up and the shard
> > never ends up recovering.
> > Usually this requires a manual intervention of bouncing the instance and
> > wiping the disk clean to allow for a clean recovery.
> >
> > Any ideas on how to prevent solr from creating multiple copies of index
> > directory.
> >
> > Thanks,
> > Rishi.
> >
>

 


Re: Storing SolrCloud index data in Amazon S3

2015-05-04 Thread Toke Eskildsen
On Mon, 2015-05-04 at 10:03 +0100, Vijay Bhoomireddy wrote:
> Just wondering whether there is a provision to store SolrCloud index data on
> Amazon S3? Please let me know any pointers.

Not to my knowledge.

From what I can read, Amazon S3 is intended for bulk data and has really
poor latency. For example:
https://engineering.gnip.com/s3-read-performance/
puts the latency of a 1KB-request at 125ms.

Interestingly enough it seems that multi-threading helps tremendously in
this scenario. Standard Lucene/Solr does not take advantage of this. One
could conceivably over-provision reads, but it seems like a lot of work
for a special case.

- Toke Eskildsen, State and University Library, Denmark




Re: Injecting synonymns into Solr

2015-05-04 Thread Roman Chyla
It shouldn't matter.  Btw, try a URL instead of a file path. I think the
underlying loading mechanism uses a java File; it could work.
On May 4, 2015 2:07 AM, "Zheng Lin Edwin Yeo"  wrote:

> Would like to check, will this method of splitting the synonyms into
> multiple files use up a lot of memory?
>
> I'm trying it with about 10 files and that collection is not able to be
> loaded due to insufficient memory.
>
> Although currently my machine only have 4GB of memory, but I only have
> 500,000 records indexed, so not sure if there's a significant impact in the
> future (even with larger memory) when my index grows and other things like
> faceting, highlighting, and carrot tools are implemented.
>
> Regards,
> Edwin
>
>
>
> On 1 May 2015 at 11:08, Zheng Lin Edwin Yeo  wrote:
>
> > Thank you for the info. Yup this works. I found out that we can't load
> > files that are more than 1MB into zookeeper, as it happens to any files
> > that's larger than 1MB in size, not just the synonyms files.
> > But I'm not sure if there will be an impact to the system, as the number
> > of synonym text file can potentially grow up to more than 20 since my
> > sample synonym file size is more than 20MB.
> >
> > Currently I only have less than 500,000 records indexed in Solr, so not
> > sure if there will be a significant impact as compared to one which has
> > millions of records.
> > Will try to get more records indexed and will update here again.
> >
> > Regards,
> > Edwin
> >
> >
> > On 1 May 2015 at 08:17, Philippe Soares  wrote:
> >
> >> Split your synonyms into multiple files and set the SynonymFilterFactory
> >> with a coma-separated list of files. e.g. :
> >> synonyms="syn1.txt,syn2.txt,syn3.txt"
> >>
> >> On Thu, Apr 30, 2015 at 8:07 PM, Zheng Lin Edwin Yeo <
> >> edwinye...@gmail.com>
> >> wrote:
> >>
> >> > Just to populate it with the general synonym words. I've managed to
> >> > populate it with some source online, but is there a limit to what it
> can
> >> > contains?
> >> >
> >> > I can't load the configuration into zookeeper if the synonyms.txt file
> >> > contains more than 2100 lines.
> >> >
> >> > Regards,
> >> > Edwin
> >> > On 1 May 2015 05:44, "Chris Hostetter" 
> >> wrote:
> >> >
> >> > >
> >> > > : There is a possible solution here:
> >> > > : https://issues.apache.org/jira/browse/LUCENE-2347 (Dump WordNet
> to
> >> > SOLR
> >> > > : Synonym format).
> >> > >
> >> > > If you have WordNet synonyms you don't need any special code/tools to
> >> > > convert them -- the current solr.SynonymFilterFactory supports
> wordnet
> >> > > files (just specify format="wordnet")
> >> > >
> >> > >
> >> > > : > > Does anyone knows any faster method of populating the
> >> synonyms.txt
> >> > > file
> >> > > : > > instead of manually typing in the words into the file, which
> >> there
> >> > > could
> >> > > : > be
> >> > > : > > thousands of synonyms around?
> >> > >
> >> > > populate from what?  what is the source of your data?
> >> > >
> >> > > the default solr synonym file format is about as simple as it could
> >> > > possibly be -- pretty trivial to generate it from scripts -- the
> hard
> >> > part
> >> > > is usually selecting the synonym data you want to use and parsing
> >> > whatever
> >> > > format it is already in.
> >> > >
> >> > >
> >> > >
> >> > > -Hoss
> >> > > http://www.lucidworks.com/
> >> > >
> >> >
> >>
> >
> >
>
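
As a small illustration of the point above that the default synonym format is
trivial to generate from scripts, here is a hedged sketch; the tab-separated
input file and its name are assumptions, not anything taken from this thread:

# turn "term<TAB>synonym1<TAB>synonym2" lines into Solr's "term,synonym1,synonym2" format
awk -F'\t' 'BEGIN { OFS="," } { $1=$1; print }' raw-synonyms.tsv > synonyms.txt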


Solr Cloud

2015-05-04 Thread Jilani Shaik
Hi All,

Do we have any monitoring tools for Apache Solr Cloud, similar to Apache
Ambari, which is used for Hadoop clusters?


Basically I am looking for a tool similar to Apache Ambari, which will give
us various metrics in terms of graphs and charts along with deep details
for each node in a Hadoop cluster.


Thanks,
Jilani


Re: analyzer, indexAnalyzer and queryAnalyzer

2015-05-04 Thread Steven White
Thanks Doug.  This is extremely helpful.  It is much appreciated that you
took the time to write it all.

Do we have a Solr / Lucene wiki with such "did you know?" write ups?  If
not, just having this kind of knowledge in an email isn't good enough as it
won't be as searchable as a wiki.

Steve

On Wed, Apr 29, 2015 at 9:24 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> So Solr has the idea of a query parser. The query parser is a convenient
> way of passing a search string to Solr and having Solr parse it into
> underlying Lucene queries: You can see a list of query parsers here
> http://wiki.apache.org/solr/QueryParser
>
> What this means is that the query parser does work to pull terms into
> individual clauses *before* analysis is run. It's a parsing layer that sits
> outside the analysis chain. This creates problems like the "sea biscuit"
> problem, whereby we declare "sea biscuit" as a query time synonym of
> "seabiscuit". As you may know synonyms are checked during analysis.
> However, if the query parser splits up "sea" from "biscuit" before running
> analysis, the query time analyzer will fail. The string "sea" is brought by
> itself to the query time analyzer and of course won't match "sea biscuit".
> Same with the string "biscuit" in isolation. If the full string "sea
> biscuit" was brought to the analyzer, it would see [sea] next to [biscuit]
> and declare it a synonym of seabiscuit. Thanks to the query parser, the
> analyzer has lost the association between the terms, and both terms aren't
> brought together to the analyzer.
>
> My colleague John Berryman wrote a pretty good blog post on this
>
> http://opensourceconnections.com/blog/2013/10/27/why-is-multi-term-synonyms-so-hard-in-solr/
>
> There's several solutions out there that attempt to address this problem.
> One from Ted Sullivan at Lucidworks
>
> https://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>
> Another popular one is the hon-lucene-synonyms plugin:
>
> http://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/search/FieldQParserPlugin.html
>
> Yet another work-around is to use the field query parser:
>
> http://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/search/FieldQParserPlugin.html
>
> I also tend to write my own query parsers, so on the one hand its annoying
> that query parsers have the problems above, on the flipside Solr makes it
> very easy to implement whatever parsing you think is appropriatte with a
> small bit of Java/Lucene knowledge.
>
> Hopefully that explanation wasn't too deep, but its an important thing to
> know about Solr. Are you asking out of curiosity, or do you have a specific
> problem?
>
> Thanks
> -Doug
>
> On Wed, Apr 29, 2015 at 6:32 PM, Steven White 
> wrote:
>
> > Hi Doug,
> >
> > I don't understand what you mean by the following:
> >
> > > For example, if a user searches for q=hot dogs&defType=edismax&qf=title
> > > body the *query parser* *not* the *analyzer* first turns the query
> into:
> >
> > If I have indexAnalyzer and queryAnalyzer in a fieldType that are 100%
> > identical, the example you provided, does it stand?  If so, why?  Or do
> you
> > mean something totally different by "query parser"?
> >
> > Thanks
> >
> > Steve
> >
> >
> > On Wed, Apr 29, 2015 at 4:18 PM, Doug Turnbull <
> > dturnb...@opensourceconnections.com> wrote:
> >
> > > *> 1) If the content of indexAnalyzer and queryAnalyzer are exactly the
> > > same,that's the same as if I have an analyzer only, right?*
> > > 1) Yes
> > >
> > > *>  2) Under the hood, all three are the same thing when it comes to
> what
> > > kind*
> > > *of data and configuration attributes can take, right?*
> > > 2) Yes. Both take in text and output a token stream.
> > >
> > > *>What I'm trying to figure out is this: beside being able to configure
> > a*
> > >
> > > *fieldType to have different analyzer setting at index and query time,
> > > thereis nothing else that's unique about each.*
> > >
> > > The only thing to look out for in Solr land is the query parser. Most
> > Solr
> > > query parsers treat whitespace as meaningful.
> > >
> > > For example, if a user searches for q=hot dogs&defType=edismax&qf=title
> > > body the *query parser* *not* the *analyzer* first turns the query
> into:
> > >
> > > (title:hot title:dog) | (body:hot body:dog)
> > >
> > > each word which *then *gets analyzed. This is because the query parser
> > > tries to be smart and turn "hot dog" into hot OR dog, or more
> > specifically
> > > making them two must clauses.
> > >
> > > This trips quite a few folks up, you can use the field query parser
> which
> > > uses the field as a phrase query. Hope that helps
> > >
> > >
> > > --
> > > *Doug Turnbull **| *Search Relevance Consultant | OpenSource
> Connections,
> > > LLC | 240.476.9983 | http://www.opensourceconnections.com
> > > Author: Taming Search  from Manning
> > > Publications
> > > This e-mail
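
As a concrete illustration of the field query parser mentioned earlier in this
thread, a minimal sketch (host, collection, and field names are assumptions):
the {!field} parser hands the whole string to one field's analyzer instead of
splitting it on whitespace first.

# edismax splits "sea biscuit" on whitespace before analysis runs:
curl 'http://localhost:8983/solr/mycollection/select?defType=edismax&qf=title&q=sea+biscuit'
# the field query parser analyzes the full string against a single field:
curl 'http://localhost:8983/solr/mycollection/select?q=%7B!field+f=title%7Dsea+biscuit'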

Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Shawn Heisey
On 5/4/2015 3:19 AM, Bruno Mannina wrote:
> All work fine but each Tuesday I need to delete some docs inside, so I
> create a batch file
> with inside line like this:
> /home/solr/solr-5.0.0/bin/post -c docdb  -commit no -d
> "f1:58644"
> /home/solr/solr-5.0.0/bin/post -c docdb  -commit no -d
> "f1:162882"
> ..
> .
> /home/solr/solr-5.0.0/bin/post -c docdb  -commit yes -d
> "f1:2868668"
> 
> my f1 field is my key field. It is unique.
> 
> But if my file contains more than one or two hundreds line, my solr
> shutdown.
> Two hundreds line shutdown always solr 5.0.
> I have no error in my console, just Solr can't be reach on the port 8983.
> 
> Is exists a variable that I must increase to disable this error ?

As far as I know, the only limit that can affect that is the maximum
post size.  Current versions of Solr default to a 2MB max post size,
using the formdataUploadLimitInKB attribute on the requestParsers
element in solrconfig.xml, which defaults to 2048.

Even if that limit is exceeded by a request, it should not crash Solr,
it should simply log an error and ignore the request.  It would be a bug
if Solr does crash.

What happens if you increase that limit?  Are you seeing any error
messages in the Solr logfile when you send that delete request?

Thanks,
Shawn
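
As a side note, one way to keep each request well under the post size limit
while still avoiding hundreds of separate bin/post invocations is to batch the
deletes into a single update request. A minimal sketch, assuming the default
JSON update handler and the core name and f1 values from this thread:

curl 'http://localhost:8983/solr/docdb/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '{"delete": {"query": "f1:(58644 OR 162882 OR 2868668)"}}'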



Re: Solr Cloud reclaiming disk space from deleted documents

2015-05-04 Thread Shawn Heisey
On 5/4/2015 4:55 AM, Rishi Easwaran wrote:
> Sadly with the size of our complex, spiting and adding more HW is not a 
> viable long term solution. 
>  I guess the options we have are to run optimize regularly and/or become 
> aggressive in our merges proactively even before solr cloud gets into this 
> situation.

If you are regularly deleting most of your index, or reindexing large
parts of it, which effectively does the same thing, then regular
optimizes may be required to keep the index size down, although you must
remember that you need enough room for the core to grow in order to
actually complete the optimize.  If the core is 75-90 percent deleted
docs, then you will not need 2x the core size to optimize it, because
the new index will be much smaller.

Currently, SolrCloud will always optimize the entire collection when you
ask for an optimize on any core, but it will NOT optimize all the
replicas (cores) at the same time.  It will go through the cores that
make up the collection and optimize each one in sequence.  If your
index is sharded and replicated enough, hopefully that will make it
possible for the optimize to complete even though the amount of disk
space available may be low.

We have at least one issue in Jira where users have asked for optimize
to honor distrib=false, which would allow the user to be in complete
control of all optimizing, but so far that hasn't been implemented.  The
volunteers that maintain Solr can only accomplish so much in the limited
time they have available.

Thanks,
Shawn
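
For reference, a minimal sketch of how an optimize is normally requested
through the update handler (host and collection name are assumptions); per the
explanation above, in SolrCloud this currently walks the whole collection one
core at a time:

curl 'http://localhost:8983/solr/mycollection/update?optimize=true&maxSegments=1'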



Re: Injecting synonymns into Solr

2015-05-04 Thread Shawn Heisey
On 5/4/2015 12:07 AM, Zheng Lin Edwin Yeo wrote:
> Would like to check, will this method of splitting the synonyms into
> multiple files use up a lot of memory?
> 
> I'm trying it with about 10 files and that collection is not able to be
> loaded due to insufficient memory.
> 
> Although currently my machine only have 4GB of memory, but I only have
> 500,000 records indexed, so not sure if there's a significant impact in the
> future (even with larger memory) when my index grows and other things like
> faceting, highlighting, and carrot tools are implemented.

For Solr, depending on exactly how you use it, the number of docs, and
the nature of those docs, a 4GB machine will usually be considered quite
small.  Solr requires a big chunk of RAM for its heap, but usually
requires an even larger chunk of RAM for the OS disk cache.

My Solr machines have 64GB (as much as the servers can hold) and I wish
they had two or four times as much, so I could get better performance.
My larger indexes (155 million docs, 103 million docs, and 18 million
docs, not using SolrCloud) are NOT considered very large by this
community -- we have users wrangling billions of docs with SolrCloud,
using hundreds of servers.

On this Wiki page, I have tried to outline how various aspects of memory
can affect Solr performance:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



Re: Solr Cloud

2015-05-04 Thread Shawn Heisey
On 5/4/2015 6:16 AM, Jilani Shaik wrote:
> Do we have any monitoring tools for Apache Solr Cloud? similar to Apache
> Ambari which is used for Hadoop Cluster.
> 
> Basically I am looking for tool similar to Apache Ambari, which will give
> us various metrics in terms of graphs and charts along with deep details
> for each node in Hadoop cluster.

The most comprehensive and capable Solr monitoring available that I know
of is a service provided by Sematext.

http://sematext.com/

If you want something cheaper, you'll have to build it yourself with
free tools.  Some of the metrics available from sematext can be
duplicated by Xymon or Nagios, others can be duplicated by JavaMelody or
another monitoring tool made specifically for Java programs.  I have
duplicated some of that information with tools that I wrote myself, like
this status servlet:

https://www.dropbox.com/s/gh6e47mu8sp7zkt/status-page-solr.png?dl=0

Nothing that I have built comes close to what sematext provides, but if
you want history from their SPM product on your servers that goes back
more than half an hour, you will pay for it.   Their prices are actually
fairly reasonable for everything you get.

Thanks,
Shawn



Re: Multiple index.timestamp directories using up disk space

2015-05-04 Thread Walter Underwood
One segment is in-use, being searched. That segment (and others) are merged 
into a new segment. After the new segment is ready, searches are directed to 
the new copy and the old copies are deleted.

That is why two copies are needed.

If you cannot provide 2X the disk space, you will not have a stable Solr 
installation. You should consider a different search engine.

“Optimizing” (forced merges) will not help. It will probably cause failures 
more often because it always merges the largest segment.

Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On May 4, 2015, at 3:53 AM, Rishi Easwaran  wrote:

> Thanks for the responses Mark and Ramkumar.
> 
> The question I had was, why does Solr need 2 copies at any given time, 
> leading to 2x disk space usage. 
> Not sure if this information is not published anywhere, and makes HW 
> estimation almost impossible for large scale deployment. Even if the copies 
> are temporary, this becomes really expensive, especially when using SSD in 
> production, when the complex size is over 400TB indexes, running 1000's of 
> solr cloud shards. 
> 
> If a solr follower has decided that it needs to do replication from leader 
> and capture full copy snapshot. Why can't it delete the old information and 
> replicate from scratch, not requiring more disk space.
> Is the concern data loss (a case when both leader and follower lose data)?.
> 
> Thanks,
> Rishi.   
> 
> 
> 
> 
> 
> 
> 
> -Original Message-
> From: Mark Miller 
> To: solr-user 
> Sent: Tue, Apr 28, 2015 10:52 am
> Subject: Re: Multiple index.timestamp directories using up disk space
> 
> 
> If copies of the index are not eventually cleaned up, I'd fill a JIRA to
> address the issue. Those directories should be removed over time. At times
> there will have to be a couple around at the same time and others may take
> a while to clean up.
> 
> - Mark
> 
> On Tue, Apr 28, 2015 at 3:27 AM Ramkumar R. Aiyengar <
> andyetitmo...@gmail.com> wrote:
> 
>> SolrCloud does need up to twice the amount of disk space as your usual
>> index size during replication. Amongst other things, this ensures you have
>> a full copy of the index at any point. There's no way around this, I would
>> suggest you provision the additional disk space needed.
>> On 20 Apr 2015 23:21, "Rishi Easwaran"  wrote:
>> 
>>> Hi All,
>>> 
>>> We are seeing this problem with solr 4.6 and solr 4.10.3.
>>> For some reason, solr cloud tries to recover and creates a new index
>>> directory - (ex:index.20150420181214550), while keeping the older index
>>> as is. This creates an issues where the disk space fills up and the shard
>>> never ends up recovering.
>>> Usually this requires a manual intervention of bouncing the instance and
>>> wiping the disk clean to allow for a clean recovery.
>>> 
>>> Any ideas on how to prevent solr from creating multiple copies of index
>>> directory.
>>> 
>>> Thanks,
>>> Rishi.
>>> 
>> 
> 



Re: analyzer, indexAnalyzer and queryAnalyzer

2015-05-04 Thread Shawn Heisey
On 5/4/2015 6:29 AM, Steven White wrote:
> Thanks Doug.  This is extremely helpful.  It is much appreciated that you
> took the time to write it all.
> 
> Do we have a Solr / Lucene wiki with such "did you know?" write ups?  If
> not, just having this kind of knowledge in an email isn't good enough as it
> won't be as searchable as a wiki.

There is a community-editable wiki.  If you want write permission, just
create an account on that wiki and let us know (either here or on the
#solr IRC channel) what your username is, and we can get you added to
the contributors group.

https://wiki.apache.org/solr

The Apache Solr Reference Guide is kept on another wiki system, but only
committers can edit that wiki, because it is released as official
documentation.  Community users can comment on its pages if they have
suggestions.

https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

Everything that happens on both wikis is visible to anyone who
subscribes to the commits mailing list, so if there is good information
available that should go into the official documentation, editing the
community wiki or commenting on the reference guide is usually enough to
make the committers aware of it.

You can find information on the various mailing lists here:

https://lucene.apache.org/core/discussion.html
https://lucene.apache.org/solr/resources.html#mailing-lists

Thanks,
Shawn



How to get exact match along with text edge_ngram

2015-05-04 Thread Vishal Swaroop
We have item_name indexed as a text edge_ngram field, which returns "like" 
results.

Please suggest the best approach (a "string" field in addition to the 
edge_ngram field, using copyField, or something else) to ALSO search for 
exact matches.

e.g. this url should return only the item_name "abc" entries... I tried 
item_name in quotes ("abc") but no luck:
http://localhost:8081/solr/item/select?q=item_name:abc&wt=json&indent=true
But I get abc-1, abc-2... as results, whereas I only need the "abc" entries.
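
One commonly suggested approach, sketched here under assumed names rather than
the poster's actual schema: keep the edge_ngram field for partial matching and
copyField the raw value into an untokenized string field for exact matching.

# Assumed schema.xml additions (shown here as comments):
#   <field name="item_name_exact" type="string" indexed="true" stored="false"/>
#   <copyField source="item_name" dest="item_name_exact"/>
# Exact matches then come from the string field:
curl 'http://localhost:8081/solr/item/select?q=item_name_exact:%22abc%22&wt=json&indent=true'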






















Re: Multiple index.timestamp directories using up disk space

2015-05-04 Thread Rishi Easwaran
Walter,

Unless I am missing something here: I completely get that when a few segments 
merge, Solr requires 2x the space of those segments to accomplish it.
Usually an index has multiple segment files, so this fragmented 2x space 
consumption is not an issue, even as merged segments grow bigger. 

But what I am talking about is a copy of the whole index, as is, into a new 
directory.  The new directory has no relation to the older index directory or 
its segments, so I am not sure what merges are going on across 
directories/indexes, and why Solr needs the older index.

Thanks,
Rishi.

 

 

 

-Original Message-
From: Walter Underwood 
To: solr-user 
Sent: Mon, May 4, 2015 9:50 am
Subject: Re: Multiple index.timestamp directories using up disk space


One segment is in-use, being searched. That segment (and others) are merged
into a new segment. After the new segment is ready, searches are directed to
the new copy and the old copies are deleted.

That is how two copies are needed.

If you cannot provide 2X the disk space, you will not have a stable Solr
installation. You should consider a different search engine.

“Optimizing” (forced merges) will not help. It will probably cause failures
more often because it always merges the larges segment.

Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On May 4, 2015, at 3:53 AM, Rishi Easwaran  wrote:

> Thanks for the responses Mark and Ramkumar.
> 
> The question I had was, why does Solr need 2 copies at any given time,
> leading to 2x disk space usage.
> Not sure if this information is not published anywhere, and makes HW
> estimation almost impossible for large scale deployment. Even if the copies
> are temporary, this becomes really expensive, especially when using SSD in
> production, when the complex size is over 400TB indexes, running 1000's of
> solr cloud shards.
> 
> If a solr follower has decided that it needs to do replication from leader
> and capture full copy snapshot. Why can't it delete the old information and
> replicate from scratch, not requiring more disk space.
> Is the concern data loss (a case when both leader and follower lose data)?.
> 
> Thanks,
> Rishi.
> 
> 
> -Original Message-
> From: Mark Miller 
> To: solr-user 
> Sent: Tue, Apr 28, 2015 10:52 am
> Subject: Re: Multiple index.timestamp directories using up disk space
> 
> 
> If copies of the index are not eventually cleaned up, I'd fill a JIRA to
> address the issue. Those directories should be removed over time. At times
> there will have to be a couple around at the same time and others may take
> a while to clean up.
> 
> - Mark
> 
> On Tue, Apr 28, 2015 at 3:27 AM Ramkumar R. Aiyengar <
> andyetitmo...@gmail.com> wrote:
> 
>> SolrCloud does need up to twice the amount of disk space as your usual
>> index size during replication. Amongst other things, this ensures you have
>> a full copy of the index at any point. There's no way around this, I would
>> suggest you provision the additional disk space needed.
>> On 20 Apr 2015 23:21, "Rishi Easwaran"  wrote:
>> 
>>> Hi All,
>>> 
>>> We are seeing this problem with solr 4.6 and solr 4.10.3.
>>> For some reason, solr cloud tries to recover and creates a new index
>>> directory - (ex:index.20150420181214550), while keeping the older index
>>> as is. This creates an issues where the disk space fills up and the shard
>>> never ends up recovering.
>>> Usually this requires a manual intervention of bouncing the instance and
>>> wiping the disk clean to allow for a clean recovery.
>>> 
>>> Any ideas on how to prevent solr from creating multiple copies of index
>>> directory.
>>> 
>>> Thanks,
>>> Rishi.
>>> 
>> 
> 


 


Re: Solr Cloud reclaiming disk space from deleted documents

2015-05-04 Thread Rishi Easwaran
Thanks Shawn.. yeah, regular optimize might be the route we take if this 
becomes a recurring issue.
 I remember that in our old multicore deployment the CPU used to spike and the 
core almost became non-responsive. 

My guess is that with the Solr Cloud architecture, any slack by the leader 
while optimizing is picked up by the replica.
I was searching around for the optimize behaviour of Solr Cloud and could not 
find much information.

Does anyone have experience running optimize for Solr Cloud in a loaded 
production env?

Thanks,
Rishi.
 
 

 

 

-Original Message-
From: Shawn Heisey 
To: solr-user 
Sent: Mon, May 4, 2015 9:11 am
Subject: Re: Solr Cloud reclaiming disk space from deleted documents


On 5/4/2015 4:55 AM, Rishi Easwaran wrote:
> Sadly with the size of our complex, spiting and adding more HW is not a
> viable long term solution.
>  I guess the options we have are to run optimize regularly and/or become
> aggressive in our merges proactively even before solr cloud gets into this
> situation.

If you are regularly deleting most of your index, or reindexing large
parts of it, which effectively does the same thing, then regular
optimizes may be required to keep the index size down, although you must
remember that you need enough room for the core to grow in order to
actually complete the optimize.  If the core is 75-90 percent deleted
docs, then you will not need 2x the core size to optimize it, because
the new index will be much smaller.

Currently, SolrCloud will always optimize the entire collection when you
ask for an optimize on any core, but it will NOT optimize all the
replicas (cores) at the same time.  It will go through the cores that
make up the collection and optimize each one in sequence.  If your
index is sharded and replicated enough, hopefully that will make it
possible for the optimize to complete even though the amount of disk
space available may be low.

We have at least one issue in Jira where users have asked for optimize
to honor distrib=false, which would allow the user to be in complete
control of all optimizing, but so far that hasn't been implemented.  The
volunteers that maintain Solr can only accomplish so much in the limited
time they have available.

Thanks,
Shawn


 


Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina

ok I have this OOM error in the log file ...

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="/home/solr/solr-5.0.0/bin/oom_solr.sh 
8983/home/solr/solr-5.0.0/server/logs"
#   Executing /bin/sh -c "/home/solr/solr-5.0.0/bin/oom_solr.sh 
8983/home/solr/solr-5.0.0/server/logs"...

Running OOM killer script for process 28233 for Solr on port 8983
Killed process 28233

I will try in a few minutes to increase the

formdataUploadLimitInKB

and I will tell you the result.

On 04/05/2015 14:58, Shawn Heisey wrote:

On 5/4/2015 3:19 AM, Bruno Mannina wrote:

All work fine but each Tuesday I need to delete some docs inside, so I
create a batch file
with inside line like this:
/home/solr/solr-5.0.0/bin/post -c docdb  -commit no -d
"f1:58644"
/home/solr/solr-5.0.0/bin/post -c docdb  -commit no -d
"f1:162882"
..
.
/home/solr/solr-5.0.0/bin/post -c docdb  -commit yes -d
"f1:2868668"

my f1 field is my key field. It is unique.

But if my file contains more than one or two hundreds line, my solr
shutdown.
Two hundreds line shutdown always solr 5.0.
I have no error in my console, just Solr can't be reach on the port 8983.

Is exists a variable that I must increase to disable this error ?

As far as I know, the only limit that can affect that is the maximum
post size.  Current versions of Solr default to a 2MB max post size,
using the formdataUploadLimitInKB attribute on the requestParsers
element in solrconfig.xml, which defaults to 2048.

Even if that limit is exceeded by a request, it should not crash Solr,
it should simply log an error and ignore the request.  It would be a bug
if Solr does crash.

What happens if you increase that limit?  Are you seeing any error
messages in the Solr logfile when you send that delete request?

Thanks,
Shawn









Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Scott Dawson
Bruno,
You have the wrong kind of dash (a long dash) in front of the Xmx flag.
Could that be causing a problem?

Regards,
Scott

On Mon, May 4, 2015 at 5:06 AM, Bruno Mannina  wrote:

> Dear Solr Community,
>
> I have a recent computer with 8Go RAM, I installed Ubuntu 14.04 and SOLR
> 5.0, Java 7
> This is a brand new installation.
>
> all work fine but I would like to increase the JAVA_MEM_SOLR (40% of total
> RAM available).
> So I edit the bin/solr.in.sh
>
> # Increase Java Min/Max Heap as needed to support your indexing / query
> needs
> SOLR_JAVA_MEM="-Xms3g –Xmx3g -XX:MaxPermSize=512m -XX:PermSize=512m"
>
> but with this param, the solr server can't be start, I use:
> bin/solr start
>
> Do you have an idea of the problem ?
>
> Thanks a lot for your comment,
> Bruno
>
>
>


Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina

I increased the

formdataUploadLimitInKB

to 2048000 and the problem is the same, same error.

Any idea?



On 04/05/2015 16:38, Bruno Mannina wrote:

ok I have this OOM error in the log file ...

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="/home/solr/solr-5.0.0/bin/oom_solr.sh 
8983/home/solr/solr-5.0.0/server/logs"
#   Executing /bin/sh -c "/home/solr/solr-5.0.0/bin/oom_solr.sh 
8983/home/solr/solr-5.0.0/server/logs"...

Running OOM killer script for process 28233 for Solr on port 8983
Killed process 28233

I try in few minutes to increase the

formdataUploadLimitInKB

and I will tell you the result.

On 04/05/2015 14:58, Shawn Heisey wrote:

On 5/4/2015 3:19 AM, Bruno Mannina wrote:

All work fine but each Tuesday I need to delete some docs inside, so I
create a batch file
with inside line like this:
/home/solr/solr-5.0.0/bin/post -c docdb  -commit no -d
"f1:58644"
/home/solr/solr-5.0.0/bin/post -c docdb  -commit no -d
"f1:162882"
..
.
/home/solr/solr-5.0.0/bin/post -c docdb  -commit yes -d
"f1:2868668"

my f1 field is my key field. It is unique.

But if my file contains more than one or two hundreds line, my solr
shutdown.
Two hundreds line shutdown always solr 5.0.
I have no error in my console, just Solr can't be reach on the port 
8983.


Is exists a variable that I must increase to disable this error ?

As far as I know, the only limit that can affect that is the maximum
post size.  Current versions of Solr default to a 2MB max post size,
using the formdataUploadLimitInKB attribute on the requestParsers
element in solrconfig.xml, which defaults to 2048.

Even if that limit is exceeded by a request, it should not crash Solr,
it should simply log an error and ignore the request.  It would be a bug
if Solr does crash.

What happens if you increase that limit?  Are you seeing any error
messages in the Solr logfile when you send that delete request?

Thanks,
Shawn















Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Shawn Heisey
On 5/4/2015 8:38 AM, Bruno Mannina wrote:
> ok I have this OOM error in the log file ...
>
> #
> # java.lang.OutOfMemoryError: Java heap space
> # -XX:OnOutOfMemoryError="/home/solr/solr-5.0.0/bin/oom_solr.sh
> 8983/home/solr/solr-5.0.0/server/logs"
> #   Executing /bin/sh -c "/home/solr/solr-5.0.0/bin/oom_solr.sh
> 8983/home/solr/solr-5.0.0/server/logs"...
> Running OOM killer script for process 28233 for Solr on port 8983

Out Of Memory errors are a completely different problem.  Solr behavior
is completely unpredictable after an OutOfMemoryError exception, so the
5.0 install includes a script to run on OOME that kills Solr.  It's the
only safe way to handle that problem.

Your Solr install is not being given enough Java heap memory for what it
is being asked to do.  You need to increase the heap size for Solr.  If
you look at the admin UI for Solr in a web browser, you can see what the
max heap is set to ... on a default 5.0 install running Solr with
"bin/solr" the max heap will be 512m ... which is VERY small.  Try using
bin/solr with the -m option, set to something like 2g (for 2 gigabytes
of heap).

Thanks,
Shawn
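
Putting that together, a minimal sketch of the two ways to raise the heap (the
2g value mirrors what ends up working later in this thread; adjust for your
hardware):

# either pass the heap size on the command line:
bin/solr start -m 2g
# or set it in bin/solr.in.sh, taking care to use plain ASCII hyphens:
SOLR_JAVA_MEM="-Xms2g -Xmx2g"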



Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina

Yes! It works!

Scott, perfect!

For my config 3g does not work, but 2g does!

Thanks

On 04/05/2015 16:50, Scott Dawson wrote:

Bruno,
You have the wrong kind of dash (a long dash) in front of the Xmx flag.
Could that be causing a problem?

Regards,
Scott

On Mon, May 4, 2015 at 5:06 AM, Bruno Mannina  wrote:


Dear Solr Community,

I have a recent computer with 8Go RAM, I installed Ubuntu 14.04 and SOLR
5.0, Java 7
This is a brand new installation.

all work fine but I would like to increase the JAVA_MEM_SOLR (40% of total
RAM available).
So I edit the bin/solr.in.sh

# Increase Java Min/Max Heap as needed to support your indexing / query
needs
SOLR_JAVA_MEM="-Xms3g –Xmx3g -XX:MaxPermSize=512m -XX:PermSize=512m"

but with this param, the solr server can't be start, I use:
bin/solr start

Do you have an idea of the problem ?

Thanks a lot for your comment,
Bruno









Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina
Yes, it was that! I increased SOLR_JAVA_MEM to 2g (with 8 GB RAM I could do 
more, but 3g fails to run Solr on my brand new computer).


thanks !

On 04/05/2015 17:03, Shawn Heisey wrote:

On 5/4/2015 8:38 AM, Bruno Mannina wrote:

ok I have this OOM error in the log file ...

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="/home/solr/solr-5.0.0/bin/oom_solr.sh
8983/home/solr/solr-5.0.0/server/logs"
#   Executing /bin/sh -c "/home/solr/solr-5.0.0/bin/oom_solr.sh
8983/home/solr/solr-5.0.0/server/logs"...
Running OOM killer script for process 28233 for Solr on port 8983

Out Of Memory errors are a completely different problem.  Solr behavior
is completely unpredictable after an OutOfMemoryError exception, so the
5.0 install includes a script to run on OOME that kills Solr.  It's the
only safe way to handle that problem.

Your Solr install is not being given enough Java heap memory for what it
is being asked to do.  You need to increase the heap size for Solr.  If
you look at the admin UI for Solr in a web browser, you can see what the
max heap is set to ... on a default 5.0 install running Solr with
"bin/solr" the max heap will be 512m ... which is VERY small.  Try using
bin/solr with the -m option, set to something like 2g (for 2 gigabytes
of heap).

Thanks,
Shawn









Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Shawn Heisey
On 5/4/2015 9:09 AM, Bruno Mannina wrote:
> Yes ! it works !!!
>
> Scott perfect 
>
> For my config 3g do not work, but 2g yes !

If you can't start Solr with a 3g heap, chances are that you are running
a 32-bit version of Java.  A 32-bit Java cannot go above a 2GB heap.  A
64-bit JVM requires a 64-bit operating system, which requires a 64-bit
CPU.  Since 2006, Intel has only been providing 64-bit chips to the
consumer market, and getting a 32-bit chip in a new computer has gotten
extremely difficult.  The server market has had only 64-bit chips from
Intel since 2005.  I am not sure what those dates look like for AMD
chips, but it is probably similar.

Running "java -version" should give you enough information to determine
whether your Java is 32-bit or 64-bit.  This is the output from that
command on a Linux machine that is running a 64-bit JVM from Oracle:

root@idxa4:~# java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

If you are running Solr on Linux, then the output of "uname -a" should
tell you whether your operating system is 32 or 64 bit.

Thanks,
Shawn



Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina

Shawn, thanks a lot for this comment.

So, I have this information; it says nothing about 32 or 64 bits...

solr@linux:~$ java -version
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK Server VM (build 24.79-b02, mixed mode)
solr@linux:~$

solr@linux:~$ uname -a
Linux linux 3.13.0-51-generic #84-Ubuntu SMP Wed Apr 15 12:11:46 UTC 
2015 i686 i686 i686 GNU/Linux

solr@linux:~$

Do I need to install a new version of Java? I installed my Ubuntu just 
one week ago :)

Updates are up to date.

On 04/05/2015 17:23, Shawn Heisey wrote:

On 5/4/2015 9:09 AM, Bruno Mannina wrote:

Yes ! it works !!!

Scott perfect 

For my config 3g do not work, but 2g yes !

If you can't start Solr with a 3g heap, chances are that you are running
a 32-bit version of Java.  A 32-bit Java cannot go above a 2GB heap.  A
64-bit JVM requires a 64-bit operating system, which requires a 64-bit
CPU.  Since 2006, Intel has only been providing 64-bit chips to the
consumer market, and getting a 32-bit chip in a new computer has gotten
extremely difficult.  The server market has had only 64-bit chips from
Intel since 2005.  I am not sure what those dates look like for AMD
chips, but it is probably similar.

Running "java -version" should give you enough information to determine
whether your Java is 32-bit or 64-bit.  This is the output from that
command on a Linux machine that is running a 64-bit JVM from Oracle:

root@idxa4:~# java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

If you are running Solr on Linux, then the output of "uname -a" should
tell you whether your operating system is 32 or 64 bit.

Thanks,
Shawn









Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Shawn Heisey
On 5/4/2015 10:28 AM, Bruno Mannina wrote:
> solr@linux:~$ java -version
> java version "1.7.0_79"
> OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
> OpenJDK Server VM (build 24.79-b02, mixed mode)
> solr@linux:~$
>
> solr@linux:~$ uname -a
> Linux linux 3.13.0-51-generic #84-Ubuntu SMP Wed Apr 15 12:11:46 UTC
> 2015 i686 i686 i686 GNU/Linux
> solr@linux:~$

Both Linux and Java are 32-bit.  For linux, I know this because your
arch is "i686", which means it is coded for a newer generation 32-bit
CPU.  You can't be running a 64-bit Java, and the Java version confirms
that because it doesn't contain "64-bit".

Run this command:

cat /proc/cpuinfo

If the "flags" on the CPU contain the string "lm" (long mode), then your
CPU is capable of running a 64-bit (sometimes known as amd64 or x86_64)
version of Linux, and a 64-bit Java.  You will need to re-install both
Linux and Java to get this capability.

Here's "uname -a" from a 64-bit version of Ubuntu:

Linux lb1 3.13.0-51-generic #84-Ubuntu SMP Wed Apr 15 12:08:34 UTC 2015
x86_64 x86_64 x86_64 GNU/Linux

Since you are running 5.0, I would recommend Oracle Java 8.

http://www.webupd8.org/2012/09/install-oracle-java-8-in-ubuntu-via-ppa.html

Thanks,
Shawn
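
A quick sketch of the flag check described above:

# "lm" (long mode) in the CPU flags means the CPU can run a 64-bit OS and JVM:
grep -qw lm /proc/cpuinfo && echo "64-bit capable CPU" || echo "32-bit only CPU"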



Answer engine - NLP related question

2015-05-04 Thread bbarani
Hi,

Note: I have very basic knowledge on NLP..

I am working on an answer engine prototype: when the user enters a
keyword and searches for it, we show them the answer corresponding to that
keyword (rather than displaying multiple documents that match the keyword).

For example:

When a user searches for 'activate phone', we have answerTags tagged in the
Solr documents along with an answer field (that will be displayed as the
answer for this keyword).


activate phone
activation
activations
activate



This is the answer


This works fine when a user searches for the exact keyword tagged in the
'answerTag' field.

Now I am trying to figure out a way to also match keywords based on part of
speech.

Example: 

I want 

'how to activate phone' to match 'activate phone' in the answerTags field 
'how to activate' to match 'activate' in the answerTags field 

I don't want to add all possible combinations of search keywords to
answerTags... rather, matching should be based on part of speech and other
NLP techniques.

I am trying to figure out a way to standardize the keywords (maybe using
NLP) and map them to predefined keywords (maybe rule-based NLP?). I am not
sure how to proceed with these kinds of searches. Any insight is
appreciated.








--
View this message in context: 
http://lucene.472066.n3.nabble.com/Answer-engine-NLP-related-question-tp4203730.html
Sent from the Solr - User mailing list archive at Nabble.com.


apache 5.1.0 under apache web server

2015-05-04 Thread Tim Dunphy
Hey all,

I need to run solr 5.1.0 on port 80 with some basic apache authentication.
Normally, under earlier versions of solr I would set it up to run under
tomcat, then connect it to apache web server using mod_jk.

However 5.1.0 seems totally different. I see that tomcat support has been
removed from the latest versions. So how do I set this up in front of
apache web server? I need to get this running on port 443 with SSL and
something at least equivalent to basic apache auth.

I really wish this hadn't changed because I could set this up under the old
method rather easily and quickly. Sigh..

But thank you for your advice!

Tim

-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: apache 5.1.0 under apache web server

2015-05-04 Thread Shawn Heisey
On 5/4/2015 1:04 PM, Tim Dunphy wrote:
> I need to run solr 5.1.0 on port 80 with some basic apache authentication.
> Normally, under earlier versions of solr I would set it up to run under
> tomcat, then connect it to apache web server using mod_jk.
>
> However 5.1.0 seems totally different. I see that tomcat support has been
> removed from the latest versions. So how do I set this up in front of
> apache web server? I need to get this running on port 443 with SSL and
> something at least equivalent to basic apache auth.
>
> I really wish this hadn't changed because I could set this up under the old
> method rather easily and quickly. Sigh..
>
> But thank you for your advice!

The container in the default 5.x install is a completely unmodified
Jetty 8.x (soon to be Jetty 9.x) with a stripped and optimized config. 
The config for Jetty is similar to tomcat, you just need to figure out
how to make it work with Apache like you would with Tomcat.

Incidentally, at least for right now, you CAN still take the .war file
out of the jetty install and put it in Tomcat just like you would have
with a 4.3 or later version.  We are planning on making that impossible
in a later 5.x version, but for right now, it is still possible.

Thanks,
Shawn



Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina

OK, I note all this information, thanks!

I will update if it's needed. 2g seems to be OK.

On 04/05/2015 18:46, Shawn Heisey wrote:

On 5/4/2015 10:28 AM, Bruno Mannina wrote:

solr@linux:~$ java -version
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK Server VM (build 24.79-b02, mixed mode)
solr@linux:~$

solr@linux:~$ uname -a
Linux linux 3.13.0-51-generic #84-Ubuntu SMP Wed Apr 15 12:11:46 UTC
2015 i686 i686 i686 GNU/Linux
solr@linux:~$

Both Linux and Java are 32-bit.  For linux, I know this because your
arch is "i686", which means it is coded for a newer generation 32-bit
CPU.  You can't be running a 64-bit Java, and the Java version confirms
that because it doesn't contain "64-bit".

Run this command:

cat /proc/cpuinfo

If the "flags" on the CPU contain the string "lm" (long mode), then your
CPU is capable of running a 64-bit (sometimes known as amd64 or x86_64)
version of Linux, and a 64-bit Java.  You will need to re-install both
Linux and Java to get this capability.

Here's "uname -a" from a 64-bit version of Ubuntu:

Linux lb1 3.13.0-51-generic #84-Ubuntu SMP Wed Apr 15 12:08:34 UTC 2015
x86_64 x86_64 x86_64 GNU/Linux

Since you are running 5.0, I would recommend Oracle Java 8.

http://www.webupd8.org/2012/09/install-oracle-java-8-in-ubuntu-via-ppa.html

Thanks,
Shawn









Re: apache 5.1.0 under apache web server

2015-05-04 Thread Tim Dunphy
>
> The container in the default 5.x install is a completely unmodified
> Jetty 8.x (soon to be Jetty 9.x) with a stripped and optimized config.
> The config for Jetty is similar to tomcat, you just need to figure out
> how to make it work with Apache like you would with Tomcat.
>
> Incidentially, at least for right now, you CAN still take the .war file
> out of the jetty install and put it in Tomcat just like you would have
> with a 4.3 or later version.  We are planning on making that impossible
> in a later 5.x version, but for right now, it is still possible.



Hmm well of the two options you present the second one sounds a little
easier and more attractive. However, when I tried doing just that like so:

[root@aoadbld00032la ~]# cp -v solr-5.1.0/server/webapps/solr.war
/usr/local/tomcat/webapps/
`solr-5.1.0/server/webapps/solr.war' -> `/usr/local/tomcat/webapps/solr.war'

And then start tomcat up... I can't get to the solr interface :(


HTTP Status 503 - Server is shutting down or failed to initialize

*type* Status report

*message* *Server is shutting down or failed to initialize*

*description* *The requested service is not currently available.*
--
Apache Tomcat/8.0.21

Not seeing anything telling in the logs, unfortunately:

[root@aoadbld00032la ~]# tail /usr/local/tomcat/logs/catalina.out
04-May-2015 15:48:26.945 INFO [localhost-startStop-1]
org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web
application directory /usr/local/apache-tomcat-8.0.21/webapps/ROOT has
finished in 32 ms
04-May-2015 15:48:26.946 INFO [localhost-startStop-1]
org.apache.catalina.startup.HostConfig.deployDirectory Deploying web
application directory /usr/local/apache-tomcat-8.0.21/webapps/host-manager
04-May-2015 15:48:26.979 INFO [localhost-startStop-1]
org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned
for TLDs yet contained no TLDs. Enable debug logging for this logger for a
complete list of JARs that were scanned but no TLDs were found in them.
Skipping unneeded JARs during scanning can improve startup time and JSP
compilation time.
04-May-2015 15:48:26.983 INFO [localhost-startStop-1]
org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web
application directory /usr/local/apache-tomcat-8.0.21/webapps/host-manager
has finished in 36 ms
04-May-2015 15:48:26.983 INFO [localhost-startStop-1]
org.apache.catalina.startup.HostConfig.deployDirectory Deploying web
application directory /usr/local/apache-tomcat-8.0.21/webapps/examples
04-May-2015 15:48:27.195 INFO [localhost-startStop-1]
org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned
for TLDs yet contained no TLDs. Enable debug logging for this logger for a
complete list of JARs that were scanned but no TLDs were found in them.
Skipping unneeded JARs during scanning can improve startup time and JSP
compilation time.
04-May-2015 15:48:27.245 INFO [localhost-startStop-1]
org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web
application directory /usr/local/apache-tomcat-8.0.21/webapps/examples has
finished in 262 ms
04-May-2015 15:48:27.248 INFO [main]
org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler
["http-nio-8080"]
04-May-2015 15:48:27.257 INFO [main]
org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler
["ajp-nio-8009"]
04-May-2015 15:48:27.258 INFO [main]
org.apache.catalina.startup.Catalina.start Server startup in 3350 ms

However it sounds like you're sure it's supposed to work this way. Can I
get some advice on this error?

Thanks
Tim

On Mon, May 4, 2015 at 3:12 PM, Shawn Heisey  wrote:

> On 5/4/2015 1:04 PM, Tim Dunphy wrote:
> > I need to run solr 5.1.0 on port 80 with some basic apache
> authentication.
> > Normally, under earlier versions of solr I would set it up to run under
> > tomcat, then connect it to apache web server using mod_jk.
> >
> > However 5.1.0 seems totally different. I see that tomcat support has been
> > removed from the latest versions. So how do I set this up in front of
> > apache web server? I need to get this running on port 443 with SSL and
> > something at least equivalent to basic apache auth.
> >
> > I really wish this hadn't changed because I could set this up under the
> old
> > method rather easily and quickly. Sigh..
> >
> > But thank you for your advice!
>
> The container in the default 5.x install is a completely unmodified
> Jetty 8.x (soon to be Jetty 9.x) with a stripped and optimized config.
> The config for Jetty is similar to tomcat, you just need to figure out
> how to make it work with Apache like you would with Tomcat.
>
> Incidentially, at least for right now, you CAN still take the .war file
> out of the jetty install and put it in Tomcat just like you would have
> with a 4.3 or later version.  We are planning on making that impossible
> in a later 5.x version, but for right now, it is still possible.
>
> Thanks,
> Shawn
>
>



Re: apache 5.1.0 under apache web server

2015-05-04 Thread Shawn Heisey
On 5/4/2015 1:50 PM, Tim Dunphy wrote:
> However it sounds like you're sure it's supposed to work this way. Can
> I get some advice on this error?

If you tried copying JUST the .war file with any version from 4.3 on,
something similar would happen.  At the request of many of our more
advanced users, starting in version 4.3, the war does not contain jars
required for the logging in Solr to work.  This makes it possible to
change the logging framework without surgery on the .war file or
recompiling Solr.

If those jars are not available, Solr will not start.  The logging jars
must be placed in the container's classpath for logging to work, which
has been done in the server included with Solr, placing them in the
lib/ext directory of the Jetty server.  With the jars you can find in
lib/ext, you also need a config file for log4j ... which you can find in
the example at resources/log4j.properties.

http://wiki.apache.org/solr/SolrLogging

Thanks,
Shawn



Re: Optimal configuration for high throughput indexing

2015-05-04 Thread Vinay Pothnis
Hi Erick,

Thanks for your inputs.

A while back we made a conscious decision to skip the solrJ client and use
plain http. I think it might have been because, at the time, the solrJ
client was queueing updates in its memory or something.

But nonetheless, we will give the latest solrJ client + cloudSolrServer a
try.

* Yes, the documents are pretty small.
* We are using the G1 collector and there are no major GCs; however, there
are a lot of minor GCs, sometimes going up to 2s per minute overall.
* We are allocating 12G of memory.
* Query rate: 3750 TPS (transactions per second)
* I need to get the exact rate for insert/updates.

I will make the solrJ client change first and give it a test.
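
For reference, a minimal sketch of what that client change might look like
(the ZooKeeper ensemble address, collection name, and field names below are
placeholders, not our real configuration):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexer {
    public static void main(String[] args) throws Exception {
        // CloudSolrClient is ZooKeeper-aware and routes each doc to its shard leader.
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
        client.setDefaultCollection("mycollection");

        List<SolrInputDocument> batch = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {   // batch of ~1000 docs, as Erick suggested
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("title", "document " + i);
            batch.add(doc);
        }
        client.add(batch);   // one request per batch instead of per document
        client.commit();     // under real load, prefer autoCommit/commitWithin
        client.close();
    }
}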

Thanks
Vinay

On 3 May 2015 at 09:37, Erick Erickson  wrote:

> First, you shouldn't be using HttpSolrClient, use CloudSolrServer
> (CloudSolrClient in 5.x). That takes
> the ZK address and routes the docs to the leader, reducing the network
> hops docs have to go
> through. AFAIK, in cloud setups it is in every way superior to http.
>
> I'm guessing your docs aren't huge. You haven't really told us what
> "high indexing rates" and
> "high query rates" are in your environment, so it's hard to say much.
> For comparison I get
> 2-3K docs/sec on my laptop (no query load though).
>
> The most frequent problem for nodes going into recovery in this
> scenario is the ZK timeout
> being exceeded. This is often triggered by excessive GC pauses, some
> more details would
> help here:
>
> How much memory are you allocating to Solr? Have you turned on GC
> logging to see whether
> you're getting "stop the world" GC pauses? What rates _are_ you seeing?
>
> Personally, I'd concentrate on the nodes going into recovery before
> anything else. Until that's
> fixed any other things you do will not be predictive of much.
>
> BTW, I typically start with batch sizes of 1,000 FWIW. Sometimes
> that's too big, sometimes
> too small but it seems pretty reasonable most of the time.
>
> Best,
> Erick
>
> On Thu, Apr 30, 2015 at 12:20 PM, Vinay Pothnis  wrote:
> > Hello,
> >
> > I have a usecase with the following characteristics:
> >
> >  - High index update rate (adds/updates)
> >  - High query rate
> >  - Low index size (~800MB for 2.4Million docs)
> >  - The documents that are created at the high rate eventually "expire"
> and
> > are deleted regularly at half hour intervals
> >
> > I currently have a solr cloud set up with 1 shard and 4 replicas.
> >  * My index updates are sent to a VIP/loadbalancer (round robins to one
> of
> > the 4 solr nodes)
> >  * I am using http client to send the updates
> >  * Using batch size of 100 and 8 to 10 threads sending the batch of
> updates
> > to solr.
> >
> > When I try to run tests to scale out the indexing rate, I see the
> following:
> >  * solr nodes go into recovery
> >  * updates are taking really long to complete.
> >
> > As I understand, when a node receives an update:
> >  * If it is the leader, it forwards the update to all the replicas and
> > waits until it receives the reply from all of them before replying back
> > to the client that sent the request.
> >  * If it is not the leader, it forwards the update to the leader, which
> > THEN does the above steps mentioned.
> >
> > How do I go about scaling the index updates:
> >  * As I add more replicas, my updates would get slower and slower?
> >  * Is there a way I can configure the leader to wait for say N out of M
> > replicas only?
> >  * Should I be targeting the updates to only the leader?
> >  * Any other approach i should be considering?
> >
> > Thanks
> > Vinay
>


Re: apache 5.1.0 under apache web server

2015-05-04 Thread Chris Hostetter

: I need to run solr 5.1.0 on port 80 with some basic apache authentication.
: Normally, under earlier versions of solr I would set it up to run under
: tomcat, then connect it to apache web server using mod_jk.

the general gist of what you should look into is running Solr (via 
./bin/solr) on some protected port (via firewalls or what have you) 
and running httpd (if that's what you are comfortable with) on port 
80 using something like mod_proxy to proxy (authenticated) requests to 
your solr port.


-Hoss
http://www.lucidworks.com/


Re: Answer engine - NLP related question

2015-05-04 Thread Upayavira
What you seem to be asking for is POS (parts of speech) analysis. You
can use OpenNLP to do that for you, likely outside of Solr. OpenNLP will
identify nouns, verbs, etc in your sentences. The question is, can you
identify certain of those types to be filtered out  from your queries?

A simple bit of Java code using OpenNLP should answer that for you.
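
For example, here is a minimal sketch (assuming OpenNLP 1.5+ and a pre-trained
English POS model file, en-pos-maxent.bin, downloaded separately) that keeps
only nouns and verbs from a query string:

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;

public class PosFilterExample {
    public static void main(String[] args) throws Exception {
        try (InputStream modelIn = new FileInputStream("en-pos-maxent.bin")) {
            POSModel model = new POSModel(modelIn);
            POSTaggerME tagger = new POSTaggerME(model);

            String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize("how to activate phone");
            String[] tags = tagger.tag(tokens);   // Penn Treebank tags, e.g. WRB, TO, VB, NN

            // Keep only nouns (NN*) and verbs (VB*) as candidate query terms.
            StringBuilder filtered = new StringBuilder();
            for (int i = 0; i < tokens.length; i++) {
                if (tags[i].startsWith("NN") || tags[i].startsWith("VB")) {
                    filtered.append(tokens[i]).append(' ');
                }
            }
            System.out.println(filtered.toString().trim());   // expected: "activate phone"
        }
    }
}

You could then run the reduced string against the existing answerTags field
rather than adding every keyword combination to the documents.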

Upayavira

On Mon, May 4, 2015, at 05:52 PM, bbarani wrote:
> Hi,
> 
> Note: I have very basic knowledge on NLP..
> 
> I am working on an answer engine prototype where when the user enters a
> keyword and searches for it we show them the answer corresponding to that
> keyword (rather than displaying multiple documents that match the
> keyword)
> 
> For Ex: 
> 
> When user searches for 'activate phone', we have answerTags tagged in the
> SOLR documents along with answer field (that will be displayed as answer
> for
> this keyword). 
> 
> 
> <field name="answerTags">activate phone</field>
> <field name="answerTags">activation</field>
> <field name="answerTags">activations</field>
> <field name="answerTags">activate</field>
> 
> <field name="answer">This is the answer</field>
> 
> This works fine when user searches for the exact keyword tagged in the
> 'answerTag' field.
> 
> Now I am trying to figure out a way to match keywords based on part of
> speech too.
> 
> Example: 
> 
> I want 
> 
> 'how to activate phone' to match 'activate phone' in answerTags field 
> 'how to activate' to match 'activate' in answerTags field 
> 
> I don't want to add all possible combinations of search keywords in
> answerTags... rather, match them based on part of speech and other NLP
> techniques.
> 
> I am trying to figure out a way to standardize the keywords (may be using
> NLP) and map it to predefined keywords (may be rule based NLP?). I am not
> sure how to proceed with these kinds of searches.. Any insight is
> appreciated.
> 


Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Bruno Mannina

Dear Solr users,

I have a problem with SOLR5.0 (and not on SOLR3.6)

What kind of field can I use for my uniqueKey field named "code" if I
want it case insensitive ?

On SOLR3.6, I defined a string_ci field like this:

<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="code" type="string_ci" indexed="true" stored="true" required="true"/>

and it works fine.
- If I add a document with the same code then the doc is updated.
- If I search a document with lower or upper case, the doc is found


But in SOLR5.0, if I use this definition then :
- I can search in lower/upper case, it's OK
- BUT if I add a doc with the same code then the doc is added not updated !?

I read that the problem could be that this field type is tokenized
instead of being a plain string.

If I change from string_ci to string, then
- I lost the possibility to search in lower/upper case
- but it works fine to update the doc.

So, could you help me to find the right field type to:

- search in case insensitive
- if I add a document with the same code, the old doc will be updated

Thanks a lot !





Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Chris Hostetter

: On SOLR3.6, I defined a string_ci field like this:
: 
: <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
:   <analyzer>
:     <tokenizer class="solr.KeywordTokenizerFactory"/>
:     <filter class="solr.LowerCaseFilterFactory"/>
:   </analyzer>
: </fieldType>
: 
: <field name="code" type="string_ci" indexed="true" stored="true" required="true"/>


I'm really surprised that field would have worked for you (reliably) as a 
uniqueKey field even in Solr 3.6.

the best practice for something like what you describe has always (going 
back to Solr 1.x) been to use a copyField to create a case insensitive 
copy of your uniqueKey for searching.

if, for some reason, you really want case insensitive *updates* (so a doc 
with id "foo" overwrites a doc with id "FOO") then the only reliable way to 
make something like that work is to do the lowercasing in an 
UpdateProcessor to ensure it happens *before* the docs are distributed to 
the correct shard, and so the correct existing doc is overwritten (even if 
you aren't using solr cloud)
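
For illustration, a rough sketch of such an update processor (not a drop-in
implementation: the uniqueKey field name "id" is a placeholder, and the
factory still has to be registered in an update processor chain in
solrconfig.xml, ahead of DistributedUpdateProcessorFactory):

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class LowerCaseIdProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object id = doc.getFieldValue("id");   // placeholder uniqueKey field name
        if (id != null) {
          doc.setField("id", id.toString().toLowerCase());
        }
        super.processAdd(cmd);                 // hand the doc on down the chain
      }
    };
  }
}

Solr also ships field-mutating processors (e.g. 
LowerCaseFieldUpdateProcessorFactory) that can be configured to do the same 
thing without custom code.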



-Hoss
http://www.lucidworks.com/


Re: Optimal configuration for high throughput indexing

2015-05-04 Thread Shawn Heisey
On 5/4/2015 2:36 PM, Vinay Pothnis wrote:
> But nonetheless, we will give the latest solrJ client + cloudSolrServer a
> try.
>
> * Yes, the documents are pretty small.
> * We are using G1 collector and there are no major GCs, but however, there
> are a lot of minor GCs sometimes going upto 2s per minute overall.
> * We are allocating 12G of memory.
> * Query rate: 3750 TPS (transactions per second)
> * I need to get the exact rate for insert/updates.
>
> I will make the solrJ client change first and give it a test.

Whether that 12GB heap size is for Solr itself or for your client code,
with a heap that large, you should be doing more tuning than simply
turning on G1GC.  I have spent quite a lot of time working on GC tuning
for Solr, and the results of that work can be found here:

http://wiki.apache.org/solr/ShawnHeisey

I cannot claim that these are the best options you can find for Solr,
but they've worked well for me, and for others.

Thanks,
Shawn



Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Bruno Mannina

Hello Chris,

yes, I confirm that on my SOLR3.6 it has worked fine for several years, and each 
doc added with the same code is updated, not added.


To be clearer: I receive docs with a field named "pn"; it is the 
uniqueKey, and it is always in uppercase.


so I must define in my schema.xml

required="true" stored="true"/>
indexed="true" stored="false"/>

...
   id
...
  

but the application that uses solr already exists, so it requests with the pn 
field, not id; I cannot change that. And in each doc I receive there is no id 
field, just a pn field, and I cannot change that either.


So there is a problem, no? I must import an id field and query on a pn 
field, but the documents I import only contain a pn field...



On 05/05/2015 01:00, Chris Hostetter wrote:

: On SOLR3.6, I defined a string_ci field like this:
:
: <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
:   <analyzer>
:     <tokenizer class="solr.KeywordTokenizerFactory"/>
:     <filter class="solr.LowerCaseFilterFactory"/>
:   </analyzer>
: </fieldType>
:
: <field name="code" type="string_ci" indexed="true" stored="true" required="true"/>


I'm really surprised that field would have worked for you (reliably) as a
uniqueKey field even in Solr 3.6.

the best practice for something like what you describe has always (going
back to Solr 1.x) been to use a copyField to create a case insensitive
copy of your uniqueKey for searching.

if, for some reason, you really want case insensitive *updates* (so a doc
with id "foo" overwrites a doc with id "FOO") then the only reliable way to
make something like that work is to do the lowercasing in an
UpdateProcessor to ensure it happens *before* the docs are distributed to
the correct shard, and so the correct existing doc is overwritten (even if
you aren't using solr cloud)



-Hoss
http://www.lucidworks.com/








Re: Optimal configuration for high throughput indexing

2015-05-04 Thread Vinay Pothnis
Hi Shawn,

Thanks for your inputs. The 12GB is for solr.
I did read through your wiki, and your recommended G1 settings are
already included. I tried a lower memory config (7G) as well, and it did not
yield any better results.

Right now I am in the process of changing the updates to use the SolrJ
CloudSolrServer and testing it.

Thanks
Vinay

On 4 May 2015 at 16:09, Shawn Heisey  wrote:

> On 5/4/2015 2:36 PM, Vinay Pothnis wrote:
> > But nonetheless, we will give the latest solrJ client + cloudSolrServer a
> > try.
> >
> > * Yes, the documents are pretty small.
> > * We are using G1 collector and there are no major GCs, but however,
> there
> > are a lot of minor GCs sometimes going upto 2s per minute overall.
> > * We are allocating 12G of memory.
> > * Query rate: 3750 TPS (transactions per second)
> > * I need to get the exact rate for insert/updates.
> >
> > I will make the solrJ client change first and give it a test.
>
> Whether that 12GB heap size is for Solr itself or for your client code,
> with a heap that large, you should be doing more tuning than simply
> turning on G1GC.  I have spent quite a lot of time working on GC tuning
> for Solr, and the results of that work can be found here:
>
> http://wiki.apache.org/solr/ShawnHeisey
>
> I cannot claim that these are the best options you can find for Solr,
> but they've worked well for me, and for others.
>
> Thanks,
> Shawn
>
>


Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Chris Hostetter

XY-ish problem -- if you are deleting a bunch of documents by id, why have 
you switched from using delete-by-id to using delete-by-query?  What drove 
that decision?  Did you try using delete-by-query in your 3.6 setup?
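
If the goal is simply to remove a batch of documents by their unique key (f1),
a single delete-by-id request carrying many ids is usually the simpler option.
A rough SolrJ sketch (the base URL, core name, and id values below are just
placeholders):

import java.util.Arrays;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class BatchDeleteById {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point it at the core/collection being cleaned up.
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");

        // deleteById matches on the uniqueKey field (f1 here): one request, many ids.
        List<String> ids = Arrays.asList("id1", "id2", "id3");
        client.deleteById(ids);

        client.commit();   // or pass a commitWithin instead of an explicit commit
        client.close();
    }
}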


: my f1 field is my key field. It is unique.
...
: On my old solr 3.6, I don't use the same line to delete document, I use:
: java -jar -Ddata=args -Dcommit=no  post.jar
: "113422"
...
: why this method do not work now ?


https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341





-Hoss
http://www.lucidworks.com/


Re: Injecting synonymns into Solr

2015-05-04 Thread Zheng Lin Edwin Yeo
Yes, the underlying mechanism uses Java. But the collection isn't able to
load when Solr starts up, so it didn't return anything even when I used a
URL.
Is it just due to my machine not having enough memory?

Regards,
Edwin
On 4 May 2015 20:12, "Roman Chyla"  wrote:

> It shouldn't matter.  Btw try a url instead of a file path. I think the
> underlying loading mechanism uses java File , it could work.
> On May 4, 2015 2:07 AM, "Zheng Lin Edwin Yeo" 
> wrote:
>
> > Would like to check, will this method of splitting the synonyms into
> > multiple files use up a lot of memory?
> >
> > I'm trying it with about 10 files and that collection is not able to be
> > loaded due to insufficient memory.
> >
> > Although currently my machine only have 4GB of memory, but I only have
> > 500,000 records indexed, so not sure if there's a significant impact in
> the
> > future (even with larger memory) when my index grows and other things
> like
> > faceting, highlighting, and carrot tools are implemented.
> >
> > Regards,
> > Edwin
> >
> >
> >
> > On 1 May 2015 at 11:08, Zheng Lin Edwin Yeo 
> wrote:
> >
> > > Thank you for the info. Yup this works. I found out that we can't load
> > > files that are more than 1MB into zookeeper, as it happens to any files
> > > that's larger than 1MB in size, not just the synonyms files.
> > > But I'm not sure if there will be an impact to the system, as the
> number
> > > of synonym text file can potentially grow up to more than 20 since my
> > > sample synonym file size is more than 20MB.
> > >
> > > Currently I only have less than 500,000 records indexed in Solr, so not
> > > sure if there will be a significant impact as compared to one which has
> > > millions of records.
> > > Will try to get more records indexed and will update here again.
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 1 May 2015 at 08:17, Philippe Soares 
> wrote:
> > >
> > >> Split your synonyms into multiple files and set the
> SynonymFilterFactory
> > >> with a coma-separated list of files. e.g. :
> > >> synonyms="syn1.txt,syn2.txt,syn3.txt"
> > >>
> > >> On Thu, Apr 30, 2015 at 8:07 PM, Zheng Lin Edwin Yeo <
> > >> edwinye...@gmail.com>
> > >> wrote:
> > >>
> > >> > Just to populate it with the general synonym words. I've managed to
> > >> > populate it with some source online, but is there a limit to what it
> > can
> > >> > contains?
> > >> >
> > >> > I can't load the configuration into zookeeper if the synonyms.txt
> file
> > >> > contains more than 2100 lines.
> > >> >
> > >> > Regards,
> > >> > Edwin
> > >> > On 1 May 2015 05:44, "Chris Hostetter" 
> > >> wrote:
> > >> >
> > >> > >
> > >> > > : There is a possible solution here:
> > >> > > : https://issues.apache.org/jira/browse/LUCENE-2347 (Dump WordNet
> > to
> > >> > SOLR
> > >> > > : Synonym format).
> > >> > >
> > >> > > If you have WordNet synonyms you do't need any special code/tools
> to
> > >> > > convert them -- the current solr.SynonymFilterFactory supports
> > wordnet
> > >> > > files (just specify format="wordnet")
> > >> > >
> > >> > >
> > >> > > : > > Does anyone knows any faster method of populating the
> > >> synonyms.txt
> > >> > > file
> > >> > > : > > instead of manually typing in the words into the file, which
> > >> there
> > >> > > could
> > >> > > : > be
> > >> > > : > > thousands of synonyms around?
> > >> > >
> > >> > > populate from what?  what is hte source of your data?
> > >> > >
> > >> > > the default solr synonym file format is about as simple as it
> could
> > >> > > possibly be -- pretty trivial to generate it from scripts -- the
> > hard
> > >> > part
> > >> > > is usually selecting the synonym data you want to use and parsing
> > >> > whatever
> > >> > > format it is already in.
> > >> > >
> > >> > >
> > >> > >
> > >> > > -Hoss
> > >> > > http://www.lucidworks.com/
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>


Re: SolrCloud+HDFS disappointed indexing performance

2015-05-04 Thread xinwu
Can someone help me ?





Re: Solr Cloud

2015-05-04 Thread Jilani Shaik
Thanks Shawn, that gave me pointers to open source options. I am really
interested in an open source solution; I have basic knowledge of
Ganglia and Nagios. I have looked at Sematext, and our company is
already using New Relic in this space. But I am interested in an open source
tool similar to Ambari / Cloudera Manager as a one-stop shop for this. I am even
interested in contributing to it as a developer. Is anyone working
on such a monitoring tool for Apache Solr?

Thanks,
Jilani

On Mon, May 4, 2015 at 7:08 PM, Shawn Heisey  wrote:

> On 5/4/2015 6:16 AM, Jilani Shaik wrote:
> > Do we have any monitoring tools for Apache Solr Cloud? similar to Apache
> > Ambari which is used for Hadoop Cluster.
> >
> > Basically I am looking for tool similar to Apache Ambari, which will give
> > us various metrics in terms of graphs and charts along with deep details
> > for each node in Hadoop cluster.
>
> The most comprehensive and capable Solr monitoring available that I know
> of is a service provided by Sematext.
>
> http://sematext.com/
>
> If you want something cheaper, you'll have to build it yourself with
> free tools.  Some of the metrics available from sematext can be
> duplicated by Xymon or Nagios, others can be duplicated by JavaMelody or
> another monitoring tool made specifically for Java programs.  I have
> duplicated some of that information with tools that I wrote myself, like
> this status servlet:
>
> https://www.dropbox.com/s/gh6e47mu8sp7zkt/status-page-solr.png?dl=0
>
> Nothing that I have built comes close to what sematext provides, but if
> you want history from their SPM product on your servers that goes back
> more than half an hour, you will pay for it.   Their prices are actually
> fairly reasonable for everything you get.
>
> Thanks,
> Shawn
>
>


Re: Solr Cloud

2015-05-04 Thread Anirudha Jadhav
The JMX metrics are good, you can start there; let's talk offline for more.
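
As a starting point, a small sketch (assuming Solr's JVM was started with
remote JMX enabled, e.g. -Dcom.sun.management.jmxremote.port=18983; host and
port here are placeholders) that lists the Solr MBeans you could poll:

import java.util.Set;

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SolrJmxDump {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port for the Solr JVM's JMX endpoint.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:18983/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // With <jmx/> enabled in solrconfig.xml, Solr registers MBeans under "solr/<core>" domains.
            Set<ObjectName> names = conn.queryNames(new ObjectName("solr*:*"), null);
            for (ObjectName name : names) {
                System.out.println(name);
            }
        }
    }
}
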
-Ani

On Mon, May 4, 2015 at 10:51 PM, Jilani Shaik  wrote:

> Thanks Shawn, that gave me pointers to open source options. I am really
> interested in an open source solution; I have basic knowledge of
> Ganglia and Nagios. I have looked at Sematext, and our company is
> already using New Relic in this space. But I am interested in an open source
> tool similar to Ambari / Cloudera Manager as a one-stop shop for this. I am even
> interested in contributing to it as a developer. Is anyone working
> on such a monitoring tool for Apache Solr?
>
> Thanks,
> Jilani
>
> On Mon, May 4, 2015 at 7:08 PM, Shawn Heisey  wrote:
>
> > On 5/4/2015 6:16 AM, Jilani Shaik wrote:
> > > Do we have any monitoring tools for Apache Solr Cloud? similar to
> Apache
> > > Ambari which is used for Hadoop Cluster.
> > >
> > > Basically I am looking for tool similar to Apache Ambari, which will
> give
> > > us various metrics in terms of graphs and charts along with deep
> details
> > > for each node in Hadoop cluster.
> >
> > The most comprehensive and capable Solr monitoring available that I know
> > of is a service provided by Sematext.
> >
> > http://sematext.com/
> >
> > If you want something cheaper, you'll have to build it yourself with
> > free tools.  Some of the metrics available from sematext can be
> > duplicated by Xymon or Nagios, others can be duplicated by JavaMelody or
> > another monitoring tool made specifically for Java programs.  I have
> > duplicated some of that information with tools that I wrote myself, like
> > this status servlet:
> >
> > https://www.dropbox.com/s/gh6e47mu8sp7zkt/status-page-solr.png?dl=0
> >
> > Nothing that I have built comes close to what sematext provides, but if
> > you want history from their SPM product on your servers that goes back
> > more than half an hour, you will pay for it.   Their prices are actually
> > fairly reasonable for everything you get.
> >
> > Thanks,
> > Shawn
> >
> >
>



-- 
Anirudha P. Jadhav


Union and intersection methods in solr DocSet

2015-05-04 Thread Gajendra Dadheech
I have a requirement where I need to find matching docsets for different
queries and then do either a union or an intersection on those docsets, e.g.:
DocSet docset1 = Searcher.getDocSet(query1);
DocSet docset2 = Searcher.getDocSet(query2);

DocSet finalDocset = docset1.intersection(docset2);

Is this a valid approach? A given docset could be either a SortedIntDocSet or
a BitDocSet. I am hitting an ArrayIndexOutOfBoundsException when a
union/intersection is done between different kinds of docsets.

Thanks and regards,
Gajendra Dadheech