maxWarmingSearchers and memory leak
We have maxWarmingSearchers set to 2 and the field value cache set to an initial size of 64. A heap dump showed that our caches consume 70% of the heap; looking into the dump, we saw that the fieldValueCache has 6 occurrences of org.apache.solr.util.ConcurrentLRUCache. With maxWarmingSearchers=2 we would expect to see only 3 (maybe 4 before GC has run). What can it be? We use Solr 4.10.1.
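For reference, here is a sketch of the relevant solrconfig.xml settings (initialSize is ours; the other attribute values are placeholders from the stock config, not our actual numbers). Note that solr.FastLRUCache is backed by ConcurrentLRUCache, which is why that class shows up in the heap dump:

    <maxWarmingSearchers>2</maxWarmingSearchers>
    <fieldValueCache class="solr.FastLRUCache"
                     size="512"
                     initialSize="64"
                     autowarmCount="0"/>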
Re: Replicas fail immediately in new collection
SOLR-9739 changed the writeStr method to accept a CharSequence instead of a String in 6.4, so my guess is that your classpath has a newer (6.4+) solrj version but an older solr-core jar that cannot find this new method.

On Sat, Feb 18, 2017 at 5:16 AM, Walter Underwood wrote:
> Any idea why I would be getting this on a brand new, empty collection on
> the first update?
>
> HTTP ERROR 500
> Problem accessing /solr/tutors_shard1_replica9/update. Reason:
> Server Error. Caused by: java.lang.NoSuchMethodError:
> org.apache.solr.update.TransactionLog$LogCodec.writeStr(Ljava/lang/String;)V
> at org.apache.solr.update.TransactionLog.writeCommit(TransactionLog.java:457)
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)

--
Regards,
Shalin Shekhar Mangar.
Re: Replicas fail immediately in new collection
I finally figured this out yesterday. Because the jar files have the version in the file name, I had a mix of jars from different versions. Depending on the load order, Solr could get into a situation where it was calling something that didn’t exist. That was mysterious.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Feb 23, 2017, at 6:55 AM, Shalin Shekhar Mangar wrote:
> [quoted text snipped]
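A quick way to spot a version mix like this (paths assume our install layout; adjust to yours):

    find /apps/solr6 \( -name 'solr-*.jar' -o -name 'lucene-*.jar' \) | sort

Any jar whose file name shows a different version number than the rest is a candidate for the stale one.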
Re: Interval Facets with JSON
Hi Deniz,

Interval Facets are currently not supported with JSON Facets, as Tom said. Could you create a Jira issue?

On Fri, Feb 10, 2017 at 6:16 AM, Tom Evans wrote:
> On Wed, Feb 8, 2017 at 11:26 PM, deniz wrote:
> > Tom Evans-2 wrote
> >> I don't think there is such a thing as an interval JSON facet.
> >> Whereabouts in the documentation are you seeing an "interval" as JSON
> >> facet type?
> >>
> >> You want a range facet surely?
> >>
> >> One thing with range facets is that the gap is fixed size. You can
> >> actually do your example however:
> >>
> >> json.facet={height_facet:{type:range, gap:20, start:160, end:190,
> >> hardend:true, field:height}}
> >>
> >> If you do require arbitrary bucket sizes, you will need to do it by
> >> specifying query facets instead, I believe.
> >>
> >> Cheers
> >>
> >> Tom
> >
> > nothing other than
> > https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-IntervalFaceting
> > for documentation on intervals... i am ok with range queries as well but
> > intervals would fit better because of different sizes...
>
> That documentation is not for JSON facets though. You can't pick and
> choose features from the old facet system and use them in JSON facets
> unless they are mentioned in the JSON facet documentation:
>
> https://cwiki.apache.org/confluence/display/solr/JSON+Request+API
>
> and (not official documentation)
>
> http://yonik.com/json-facet-api/
>
> Cheers
>
> Tom
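For completeness, the classic (non-JSON) interval facet request from the page deniz linked would look roughly like this (a sketch — the field name and bucket edges are borrowed from Tom's range example, not from deniz's actual schema):

    q=*:*&facet=true&facet.interval=height
    &f.height.facet.interval.set=[160,180)
    &f.height.facet.interval.set=[180,190]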
SOLRCloud on 6.4 on Ubuntu
I'm trying to find a good beginner level guide to setting up SolrCloud NOT using the example configs that are provided with SOLR.

Here are my goals (and the steps I have done so far!):

1. Use an external ZooKeeper server
   a. wget http://apache.claz.org/zookeeper/zookeeper-3.3.6/zookeeper-3.3.6.tar.gz
   b. uncompress into /apps folder (Our company uses this type of standard folder, so I'm following suit here)
   c. Copy zoo_sample.cfg to zoo.cfg
   d. Update data folder to: /apps/zookeeperData
   e. bin/zkServer.sh start

2. Install SOLR on both nodes
   a. wget http://www.us.apache.org/dist/lucene/solr/6.4.1/solr-6.4.1.tgz
   b. tar xzf solr-6.4.1.tgz solr-6.4.1/bin/install_solr_service.sh --strip-components=2
   c. ./install_solr_service.sh solr-6.4.1.tgz
   d. Update solr.in.sh to include the ZKHome variable set to my ZK server's IP on port 2181

Now it seems if I start SOLR manually with bin/solr start -c -p 8080 -z <ZK IP>:2181 then it will actually load, but if I let it auto start, I get an HTTP 500 error on the Admin UI for SOLR.

I also can't seem to figure out what I need to upload into Zookeeper as far as configuration files go. I created a test collection on the instance when I got it up one time... but it has yet to start properly again for me.

Are there any GOOD tutorials out there? I have read most of the documentation I can get my hands on thus far from Apache, and blogs and such, but the light bulb still has not lit up for me yet and I feel like a n00b ;-)

My company is currently running SOLR in the old master/slave config and I'm trying to set up a SolrCloud so that we can toy with it in a Dev/QA environment and see what it's capable of. We're currently running 4 separate master/slave SOLR server pairs in production to spread out the load a bit, but I'd rather see us migrate towards a cluster/cloud scenario to gain some computing power here!

Any help is GREATLY appreciated!

Scott
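For reference, here is the kind of line I mean in /etc/default/solr.in.sh (IP redacted; the variable name is taken from the shipped template — if what I actually set differs from this, that could be my problem):

    ZK_HOST="<ZK IP>:2181"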
Phrase field matches not counting towards minimum match
Ok, let me explain what I am trying to do first, since there may be a better approach. Recently I have been trying to increase Solr's matching precision by requiring that all of the words in a field match before allowing a match on that field. I am using edismax as my query parser, and since it tokenizes on whitespace, there's no way to ensure that for a query like q=foo bar against a text field indexed with "foo bar", the terms foo and bar don't match individually but the phrase "foo bar" does. I feel like I'm not explaining this very well, but basically what I want to do has already been done by Lucidworks: https://lucidworks.com/2014/07/02/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/

However, their solution requires a pluggable query parser which is not an extension of edismax. Now, I haven't done a deep comparison, but I'm assuming I would lose access to all of edismax's parameters if I used their pluggable query parser. So instead I tried to replicate this functionality using edismax's pf2 and pf3 parameters. It all works beautifully the way I have it set up, except that phrase field matches don't count towards my mm count.

Now I will go into detail about how I have my index set up for this specific example. I am using Solr's default text field to index a field named manufacturer2. Here are the relevant parameters of my search:

    q=livex lighting 8193
    qf=productid manufacturer_stop
    pf2=manufacturer2
    mm=3<-1 5<-2 6<90%

I am stopping the word "lighting" in my manufacturer_stop field using stopwords, so only livex matches in the manufacturer_stop field. However, "livex lighting" matches in the manufacturer2 field via phrase field matching through the pf2 parameter. So my matches are the following:

    MATCH livex in manufacturer_stop field
    MATCH 8193 in productid field
    MATCH "livex lighting" in manufacturer2 field as a phrase field match

So I have three matches. However, the phrase field match doesn't seem to be counting towards my mm requirement that, with 3 tokens in the query, all 3 must match. If I change my mm to require only 2 tokens to match, I get the expected result. But I want my phrase field match to count towards my mm requirement, since lighting is matching in my phrase field.

Any assistance would be appreciated, or if someone could suggest a better approach, that would also be appreciated.
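One way to see what is going on (a sketch; defType=edismax is assumed here since that's the parser in use): append debug=query to the request above and inspect the parsedquery output. Edismax emits the pf2 phrase as a separate optional boost clause outside the mm-governed main clause built from qf, which is consistent with the behavior described:

    ...&qf=productid manufacturer_stop&pf2=manufacturer2&mm=3<-1 5<-2 6<90%&debug=query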
Re: SOLRCloud on 6.4 on Ubuntu
I don't know which of these you read, so it is a bit of a grab bag. And I haven't reviewed some of them in depth. But hopefully, there is a nugget of gold somewhere in there for you:

https://github.com/LucidWorks/solr-scale-tk
https://www.slideshare.net/thelabdude/apache-con-managingsolrcloudinthecloud
https://systemsarchitect.net/2013/04/06/painless-guide-to-solr-cloud-configuration/
https://github.com/bloomreach/solrcloud-haft
http://www.francelabs.com/blog/tutorial-solrcloud-5-amazon-ec2/ (oldish)
https://github.com/freedev/solrcloud-zookeeper-docker
https://sematext.com/blog/2016/12/13/solr-master-slave-solrcloud-migration/
http://dlightdaily.com/2016/11/30/solr-cloud-installation-zookeeper/
https://sbdevel.wordpress.com/2016/11/30/70tb-16b-docs-4-machines-1-solrcloud/ (just to drool, but it may also be useful)

Hope it helps,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 23 February 2017 at 16:12, Pouliot, Scott wrote:
> [quoted text snipped]
Subscription to group
Hi, I want to be part of the solr-user group. Can you add me?
Re: SOLRCloud on 6.4 on Ubuntu
Getting configs up to (and down from) ZooKeeper is done either with zkcli or bin/solr. Personally I find the latter easier, if only because it's in a single place. Try

    bin/solr zk -help

and you'll see a bunch of options. Once you do upload the config, you must reload the collection for it to "take".

Best,
Erick

On Thu, Feb 23, 2017 at 1:51 PM, Alexandre Rafalovitch wrote:
> [quoted text snipped]
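To make that concrete, the upload and reload look roughly like this (a sketch — the config set name, config directory, and collection name are placeholders; the port matches the 8080 used earlier in the thread):

    bin/solr zk upconfig -z <ZK IP>:2181 -n myconfig -d /path/to/configset/conf
    curl "http://localhost:8080/solr/admin/collections?action=RELOAD&name=test"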
Re: Question about best way to architect a Solr application with many data sources
Alfresco has spent ten+ years building a content management system that follows this basic design:

1) Original bytes (PDF, Word doc, image file) are stored in a filesystem-based content store.
2) Meta-data is stored in a relational database, normalized.
3) Content is transformed to text, meta-data is de-normalized, and both are sent to Solr for indexing.
4) Solr keeps a copy of the de-normalized, pre-analyzed content on disk next to the indexes for re-indexing and other purposes.
5) Solr analyzes and indexes the content.

This all happens automatically when the content is added to Alfresco. ACLs are also stored along with documents and passed to Solr to support document-level access control during search.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Feb 22, 2017 at 3:01 PM, Tim Casey wrote:
> I would possibly extend this a bit further. There is the source, then the
> 'normalized' version of the data, then the indexed version. Sometimes you
> realize you missed something in the normalized view and you have to go
> back to the actual source. How likely that is grows with the number of
> sources for data. I would expect the "DB" version of the data would be
> the normalized view. It is also possible the DB holds the raw bytes of
> the source, which are then transformed into a normalized view. Indexing
> always happens from the normalized view. In this scheme, there is
> frequently a way to mark what failed normalization so you can go back and
> recapture the data for a re-index.
>
> Also, if you are dealing with timely data, being able to reindex helps
> remove stale information from the search index. In the pipeline of
> captured source -> normalized -> analyzed -> information, where analyzed
> is indexed here, what you do with the data over a year or more becomes
> part of the thinking.
>
> On Tue, Feb 21, 2017 at 8:24 PM, Walter Underwood wrote:
>> Reindexing is exactly why you want the Single Source of Truth to be in a
>> repository outside of Solr.
>>
>> For our slowly-changing data sets, we have an intermediate JSONL batch.
>> That is created from the source repositories and saved in Amazon S3.
>> Then we load it into Solr nightly. That allows us to reload whenever we
>> need to, like loading prod data in test or moving search to a different
>> Amazon region.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>>> On Feb 21, 2017, at 7:34 PM, Erick Erickson wrote:
>>>
>>> Dave:
>>>
>>> Oh, I agree that a DB is a perfectly valid place to store the data, and
>>> you're absolutely right that it allows better interaction than flat
>>> files; you can ask questions of an RDBMS that you can't easily ask the
>>> disk ;). Storing to disk is an alternative if you're unwilling to deal
>>> with a DB is all.
>>>
>>> But the main point is you'll change your schema sometime and have to
>>> re-index. Having the data you're indexing stored locally in whatever
>>> form will allow much faster turn-around rather than re-crawling. Of
>>> course it'll result in out-of-date data, so you'll have to refresh
>>> somehow sometime.
>>>
>>> Erick
>>>
>>>> On Tue, Feb 21, 2017 at 6:07 PM, Dave wrote:
>>>> Ha, I think I went to one of your training seminars in NYC maybe 4
>>>> years ago, Erick. I'm going to have to respectfully disagree about the
>>>> rdbms.
>>>> It's such a well-known data format that you could hire a high school
>>>> programmer to help with the db end if you knew how to flatten it to
>>>> solr. Besides, it's easy to visualize and interact with the data before
>>>> it goes to solr. A JSON/NoSQL format would work just as well, but I
>>>> really think a database has its place in a scenario like this.
>>>>
>>>>> On Feb 21, 2017, at 8:20 PM, Erick Erickson wrote:
>>>>>
>>>>> I'll add that I _guarantee_ you'll want to re-index the data as you
>>>>> change your schema and the like. You'll be able to do that much more
>>>>> quickly if the data is stored locally somehow.
>>>>>
>>>>> A RDBMS is not necessary, however. You could simply store the data on
>>>>> disk in some format you could re-read and send to Solr.
>>>>>
>>>>> Best,
>>>>> Erick
>>>>>
>>>>>> On Tue, Feb 21, 2017 at 5:17 PM, Dave wrote:
>>>>>> B is a better option long term. Solr is meant for retrieving flat
>>>>>> data, fast, not hierarchical. That's what a database is for, and
>>>>>> trust me, you would rather have a real database on the end point.
>>>>>> Each tool has a purpose; solr can never replace a relational
>>>>>> database, and a relational database could not replace solr. Start
>>>>>> with the slow model (database) for control/display and enhance with
>>>>>> the fast model (solr) for retrieval/search.
>>>>>>
>>>>>>> On Feb 21, 2017, at 7:57 PM, Robert Hume wrote:
>>>>>>> To learn how to properly use Solr, I'm building a little
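As a concrete sketch of the nightly reload Walter describes above (host, collection, and file names are placeholders):

    curl -X POST -H 'Content-Type: application/json' \
      --data-binary @nightly-batch.jsonl \
      'http://localhost:8983/solr/mycollection/update/json/docs?commit=true'

The /update/json/docs handler accepts a stream of JSON objects, so a JSONL batch can be posted as-is.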
Setting Solr data dir isn't really working (6.3.0)
I did this in the solrconfig.xml for both collections (tutors and questions).

    <dataDir>/solr/data</dataDir>

I deleted the old collection indexes, reloaded, restarted, and created a new collection for “tutors". And I see this on the disk.

[wunder@new-solr-c02.test3]# ls -l /solr/data
total 36
drwxr-xr-x 2 bin bin 20480 Feb 23 17:40 index
drwxr-xr-x 2 bin bin  4096 Feb 23 15:57 snapshot_metadata
drwxr-xr-x 2 bin bin  4096 Feb 23 15:57 suggest_subject_names_fuzzy
drwxr-xr-x 2 bin bin  4096 Feb 23 15:57 suggest_subject_names_infix
drwxr-xr-x 2 bin bin  4096 Feb 23 17:40 tlog
[wunder@new-solr-c02.test3]# ls -l /apps/solr6/server/solr
total 12
drwxr-xr-x 5 bin bin   93 Jul 14  2016 configsets
-rw-r--r-- 1 bin bin 3037 Jul 14  2016 README.txt
-rw-r--r-- 1 bin bin 2117 Aug 31 20:13 solr.xml
drwxr-xr-x 2 bin bin   28 Feb 23 15:57 tutors_shard1_replica5
-rw-r--r-- 1 bin bin  501 Jul 14  2016 zoo.cfg
[wunder@new-solr-c02.test3]#

Seems pretty broken to me.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Re: Setting Solr data dir isn't really working (6.3.0)
Not quite sure what your complaint is. Is it that you've got the index directory under /solr/data and not under, say, /solr/data/tutors? Or that /apps/solr6/server/solr/tutors_shard1_replica5 exists at all?

And what's in tutors_shard1_replica5 anyway? Just the core.properties file?

Erick

On Thu, Feb 23, 2017 at 5:41 PM, Walter Underwood wrote:
> [quoted text snipped]
Re: Setting Solr data dir isn't really working (6.3.0)
The bug is that the dataDir is /solr/data and the index data is in /apps/solr6/server/solr. Except for the suggest data. No index data should be outside the dataDir, right?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Feb 23, 2017, at 6:11 PM, Erick Erickson wrote:
> [quoted text snipped]
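For what it's worth, a per-core data dir along these lines (a sketch — ${solr.core.name} is an implicit core property substituted by Solr at core load) would at least keep the two collections from sharing one index directory:

    <dataDir>/solr/data/${solr.core.name}</dataDir>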
Index Segments not Merging
We have Solr with the index stored in HDFS. We are running MapReduce jobs to build the index, using the MapReduceIndexerTool from Cloudera with the go-live option to merge into our live index. We are seeing an issue where the number of segments in the index never decreases. It continues to grow until we manually do an optimize. We are using the following solr config for merge policy: *101016* If we add documents into Solr without using MapReduce, the segments merge properly as expected. Any ideas on why we see this behavior? Does the go-live index merge prevent the segments from merging afterwards? Thanks, Jordan
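For reference, a merge policy block of the kind quoted above would normally look something like this (a sketch — the element names are the standard TieredMergePolicy ones; mapping the surviving values onto them is a guess):

    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
    </mergePolicy>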
Re: Arabic words search in solr
Hi Mohan,

I indexed your 9 examples as simple documents after mapping the dynamic field "*_ar" to the "text_ar" field type:

    [{"id":"1", "name_ar":"المؤسسة التجارية العمانية"},
     {"id":"2", "name_ar":"شركة التأمين الأهلية ش.م.ع.م"},
     {"id":"3", "name_ar":"شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية - - مركز شرطة إبراء"},
     {"id":"4", "name_ar":"شركة ظفار للتأمين ش.م.ع.ع"},
     {"id":"5", "name_ar":"طوارئ المستشفيات - طوارئ مستشفى صحار"},
     {"id":"6", "name_ar":"شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية - - مركز شرطة إزكي"},
     {"id":"7", "name_ar":"المؤسسة التجارية العمانية"},
     {"id":"8", "name_ar":"وزارة الصحة - المديرية العامة للخدمات الصحية محافظة الداخلية - - مستشفى إزكي (البدالة) - الطوارئ"},
     {"id":"9", "name_ar":"أسعار المكالمات الدولية - مونتسرات - - مونتسرات"}]

Then when I search from the Admin UI for "name_ar:شرطة ازكي" (the query in one of your screenshots with numFound=0) I get the following results:

    {
      "responseHeader": {
        "status": 0,
        "QTime": 1,
        "params": {
          "indent": "true",
          "q": "name_ar:شرطة ازكي",
          "_": "1487912340325",
          "wt": "json"
        }
      },
      "response": {
        "numFound": 2,
        "start": 0,
        "docs": [
          {
            "id": "6",
            "name_ar": ["شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية - - مركز شرطة إزكي"],
            "_version_": 1560170434794619000
          },
          {
            "id": "3",
            "name_ar": ["شرطة عمان السلطانية - قيادة شرطة محافظة شمال الشرقية - - مركز شرطة إبراء"],
            "_version_": 1560170434793570300
          }
        ]
      }
    }

So I cannot reproduce the failures you're seeing. In fact, I tried all 9 of the queries you listed as not working, and all of them matched at least one of the above 9 documents, except for case 5 (which I give details for below). Are you absolutely sure that you reindexed your data with the ICUFF (ICUFoldingFilter) in place?

The one query that didn't return any matches for me is "name_ar:طوارى صحار". Here's why:

    Indexed original: طوارئ صحار
    Indexed analyzed: طواري صحار
    Query original:   طوارى صحار
    Query analyzed:   طوار صحار

In the analyzed indexed form, the "ئ" (yeh with hamza above) is left intact by ArabicNormalizationFilter and ArabicStemFilter, and then the ICUFoldingFilter converts it to "ي" (yeh without the hamza). In the analyzed query, ArabicNormalizationFilter converts "طوارى" to "طواري" (alef maksura -> yeh), which ArabicStemFilter then converts to "طوار" by removing the trailing yeh.

I don't know what the correct thing to do is to make alef maksura and yeh match each other, but one possibility is adding a char filter that converts all alefs maksura into yehs with hamza, like this:
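A sketch of one such char filter (whether this is exactly what the author had in mind is an assumption — solr.PatternReplaceCharFilterFactory, placed before the tokenizer in both the index and query analyzers, mapping alef maksura to yeh with hamza):

    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="ى" replacement="ئ"/>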