mlt handler not giving response in Solr Cloud

2014-11-18 Thread Jilani Shaik
Hi,

When I execute an mlt handler query against a shard, it only returns
results if the matching documents exist on that shard.

In the scenario below, I have SolrCloud shards on localhost on ports 8181 and
8191, with documents distributed between them. If the document id in the mlt
query belongs to the 8181 shard, I only get results when the query hits the
8181 shard.


 No result
http://localhost:8181/solr/collectionName/mlt?q=id:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100

 Will give result
http://localhost:8191/solr/collectionName/mlt?q=id:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100

*So distributed search does not seem to be working for the mlt handler (my
assumption, please correct). *

I also tried the following:

http://localhost:8181/solr/collectionName/mlt?q=owui_p:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100&;
*shards.qt=/mlt&shards=localhost:8181/solr/,localhost:8191/solr/*

http://localhost:8181/solr/collectionName/mlt?q=owui_p:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100
*&shards.qt=/mlt&shards=localhost:8181/solr/collectionName/,localhost:8191/solr/collectionName/*

I also tried the select handler with mlt=true, but that does not work either.

http://localhost:8181/solr/collectionName/*select?mlt=true*
&q=owui_p:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100&distrib=true&mlt.fl=ti_w


MLT configuration from solrconfig.xml (the XML element names were not preserved
in the archive; only the parameter values survive):

  ti_w
  1
  2
  true
  localhost:8181/solr/collectionName,localhost:8191/solr/collectionName
  /mlt
  true
  all

Please let me know what is missing here to get results in SolrCloud.

Thanks,
Jilani


Re: Solr 5 release date ?

2014-11-18 Thread roy123
Thanks Erick



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-5-release-date-tp4169571p4169621.html
Sent from the Solr - User mailing list archive at Nabble.com.


Could not connect to ZooKeeper x.x.x.x:2181/solr within 10000 ms

2014-11-18 Thread Uddgam Singh
Hi Experts,

It's an urgent issue, please advise. I am running a SolrJ program which
connects to a Solr server, runs queries, and returns a result set. The queries
are 2-level nested queries: the first level fetches 154 rows, each row
containing 2 fields; the second level fetches only counts for those 154 rows.
The problem: it works fine for a row set of 50, but when the row set is set to
80 it gives this error:

org.apache.solr.common.SolrException:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper
x.x.x.x:2181/solr within 10000 ms
at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:151)
at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:102)
at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:92)
at org.apache.solr.common.cloud.ZkStateReader.<init>(ZkStateReader.java:209)
at org.apache.solr.client.solrj.impl.CloudSolrServer.connect(CloudSolrServer.java:241)
at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:524)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)

The loaded data set is 7.2 MB. VMware configuration: 4 cores, 4 GB RAM,
Cloudera 4.5 setup.

-- 
With Regards,
Uddgam Singh


Re: Could not connect to ZooKeeper x.x.x.x:2181/solr within 10000 ms

2014-11-18 Thread Erick Erickson
bq: Cloudera 4.5 setup

Probably should ask this on the Cloudera user's list.

But a 10,000 ms timeout is pretty low, I'd increase that. I suspect
the 80 rows bit is coincidental and there's something else happening.
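If it's the SolrJ side you want to bump, CloudSolrServer lets you raise the
ZooKeeper timeouts before connecting; roughly (values here are just an example):

    CloudSolrServer server = new CloudSolrServer("x.x.x.x:2181/solr");
    server.setZkClientTimeout(30000);   // default is 10000 ms
    server.setZkConnectTimeout(30000);
    server.connect();

There may also be a zkClientTimeout to raise on the server side, depending on
how your cluster was set up.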

Best,
Erick

On Tue, Nov 18, 2014 at 3:36 AM, Uddgam Singh
 wrote:
> Hi Experts, Its an urgent issue, Please advice :- I am Running solrj
> program which connect to Solr server and run queries and gives result-set.
> Queries :-2 level Nested queries First level fetch 154 rows. Each row
> contain 2 fields Second Level fetch only counts for 154 rows Now problem:-
> It working fine for 50 rowset but when that rowset is set to 80, it gives
> errors: org.apache.solr.common.SolrException:
> java.util.concurrent.TimeoutException: Could not connect to ZooKeeper
> x.x.x.x:2181/solr within 1 ms at
> org.apache.solr.common.cloud.SolrZkClient.(SolrZkClient.java:151) at
> org.apache.solr.common.cloud.SolrZkClient.(SolrZkClient.java:102) at
> org.apache.solr.common.cloud.SolrZkClient.(SolrZkClient.java:92) at
> org.apache.solr.common.cloud.ZkStateReader.(ZkStateReader.java:209)
> at
> org.apache.solr.client.solrj.impl.CloudSolrServer.connect(CloudSolrServer.java:241)
> at
> org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:524)
> at
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
> at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
> Loaded data set is of 7.2 MB Vmware configuration:- 4 core, 4 gb ram
> Cloudera 4.5 setup.
>
> --
> With Regards,
> Uddgam Singh


faceting on very long strings

2014-11-18 Thread English, Eben
Is there any kind of general rule-of-thumb character limit in regards to 
faceting on very long strings?

I have a string field that I want to facet on (contains geographic data 
structured as a GeoJSON Feature), where the length is typically around 220 
characters. Is this too long to facet on, performance-wise?

If this (~220 chars) is too long, is there some way to hash the long string and 
copyField for better performance, but in a way that the original string values 
can be returned as facets in the response?

I know this question is a bit vague, and I would imagine mileage will vary 
depending on the specifics of the situation. I anticipate the entire index to 
include about 2 million documents, with possibly about 200,000 unique values 
for this GeoJSON string field.

Thanks!

Eben English



Re: Restrict search to subset (a list of aprrox 40,000 ids from an external service) of corpus

2014-11-18 Thread deviantcode
Thanks will try with a POST



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Restrict-search-to-subset-a-list-of-aprrox-40-000-ids-from-an-external-service-of-corpus-tp4169210p4169675.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: faceting on very long strings

2014-11-18 Thread Toke Eskildsen
English, Eben [eengl...@bpl.org] wrote:
> Is there any kind of general rule-of-thumb character limit in regards to 
> faceting on very long strings?

Not really. There are limits, but they are quite high. Due to a bad analyzer we 
had an index with ~1M unique facet values that ranged from 100-3000 characters 
and besides a very messy GUI, we did not notice anything problematic.

> I have a string field that I want to facet on (contains geographic data 
> structured as a GeoJSON Feature),
> where the length is typically around 220 characters. Is this too long to 
> facet on, performance-wise?

I doubt you will be able to measure the difference between that and 10 
character strings. There is a question of string comparison upon index open and 
the result must be serialized, but for most of the facet operations, the length 
matters little. If your strings were 100 times longer, you might have seen a 
performance impact.

> [...] possibly about 200,000 unique values for this GeoJSON string field.

I see no problems with this. 200K values of 220 characters is only large if you 
insist on returning every one of them as the facet result.

- Toke Eskildsen


Re: New Meetup in London - Lucene/Solr User Group

2014-11-18 Thread Charlie Hull

On 27/10/2014 14:25, Charlie Hull wrote:

Hi all,

We noticed that there isn't a Lucene/Solr user group in London (although
there is an Elasticsearch user group) - so we decided to start one!
http://www.meetup.com/Apache-Lucene-Solr-London-User-Group

Please join if you're interested and do pass the word. Our first meeting
will be November 28th 2014 at Bloomberg's European HQ on Finsbury
Square. Committer Shalin Mangar will be speaking, we'll have a Q&A with
committers and more. We're very interested in any input you have in
terms of what you'd like to hear talks about (or even better if you can
give one), so let me know.


Hi all,

Just a final mention for this event next week in London, and to add that 
we'll also be talking on Search Turned Upside Down (inverted search at 
scale for media monitoring) and presenting some results of a 
Solr/Elasticsearch comparative performance study.


Cheers

Charlie


Cheers

Charlie




--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Handling growth

2014-11-18 Thread Michael Della Bitta
We're achieving some success by treating aliases as collections and 
collections as shards.


More specifically, there's a read alias that spans all the collections, 
and a write alias that points at the 'latest' collection. Every week, I 
create a new collection, add it to the read alias, and point the write 
alias at it.
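For illustration, the weekly rollover amounts to a couple of Collections API
calls, something like this (collection and alias names are made up):

    curl "http://host:8983/solr/admin/collections?action=CREATE&name=data_2014_47&numShards=4&collection.configName=myconf"
    curl "http://host:8983/solr/admin/collections?action=CREATEALIAS&name=write&collections=data_2014_47"
    curl "http://host:8983/solr/admin/collections?action=CREATEALIAS&name=read&collections=data_2014_45,data_2014_46,data_2014_47"

CREATEALIAS overwrites an existing alias, so repointing 'write' and re-listing
everything under 'read' is the whole switch.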


Michael

On 11/14/14 07:06, Toke Eskildsen wrote:

Patrick Henry [patricktheawesomeg...@gmail.com] wrote:


I am working with a Solr collection that is several terabytes in size over
several hundred millions of documents.  Each document is very rich, and
over the past few years we have consistently quadrupled the size our
collection annually.  Unfortunately, this sits on a single node with only a
few hundred megabytes of memory - so our performance is less than ideal.

I assume you mean gigabytes of memory. If you have not already done so, 
switching to SSDs for storage should buy you some more time.


[Going for SolrCloud]  We are in a continuous adding documents and never change
existing ones.  Based on that, one individual recommended for me to
implement custom hashing and route the latest documents to the shard with
the least documents, and when that shard fills up add a new shard and index
on the new shard, rinse and repeat.

We have quite a similar setup, where we produce a never-changing shard once 
every 8 days and add it to our cloud. One could also combine this setup with a 
single live shard, for keeping the full index constantly up to date. The memory 
overhead of running an immutable shard is smaller than a mutable one and easier 
to fine-tune. It also allows you to optimize the index down to a single 
segment, which requires a bit less processing power and saves memory when 
faceting. There's a description of our setup at 
http://sbdevel.wordpress.com/net-archive-search/
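As a side note, squeezing a freshly built shard down to a single segment is just
an optimize call against that core, along the lines of (the core name is only an
example):

    curl "http://localhost:8983/solr/myshard_20141118/update?optimize=true&maxSegments=1"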

 From an administrative point of view, we like having complete control over 
each shard. We keep track of what goes in it and in case of schema or analyze 
chain changes, we can re-build each shard one at a time and deploy them 
continuously, instead of having to re-build everything in one go on a parallel 
setup. Of course, fundamental changes to the schema would require a complete 
re-build before deploy, so we hope to avoid that.

- Toke Eskildsen




Re: New Meetup in London - Lucene/Solr User Group

2014-11-18 Thread Alexandre Rafalovitch
On 18 November 2014 11:41, Charlie Hull  wrote:
> presenting some results of a Solr/Elasticsearch comparative performance
> study.

I was asked about that a couple of times at the Solr Revolution
conference. Looking forward to seeing the results.

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


sorlj indexing problem

2014-11-18 Thread AJ Lemke
Hi All,

I am getting an error when using solrj to index records.

Exception in thread "main" 
org.apache.solr.client.solrj.impl.CloudSolrServer$RouteException: Exception 
writing document id 529241050 to the index; possible analysis error.
at 
org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:360)
at 
org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:533)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
at Importer.commit(Importer.java:96)
at Importer.putData(Importer.java:81)
at Importer.main(Importer.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: 
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception 
writing document id 529241050 to the index; possible analysis error.
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrServer.doRequest(LBHttpSolrServer.java:340)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:301)
at 
org.apache.solr.client.solrj.impl.CloudSolrServer$1.call(CloudSolrServer.java:341)
at 
org.apache.solr.client.solrj.impl.CloudSolrServer$1.call(CloudSolrServer.java:338)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I am connecting to a local solrCloud instance (localhost:9983) and a schemaless 
collection.
Is this possible or do I have to use a schema?

Thanks!
AJ


Re: sorlj indexing problem

2014-11-18 Thread Alexandre Rafalovitch
I haven't seen this specific error before, but my guess would be that
your 'schemaless' mode has created a field of a particular type which
does not match its later usage.

So, it may have seen '3' and assumed integers and now you are giving it 'four'.

I would pull that specific record up and check its fields against the
schema definition created so far (in the Admin UI). Look for a type mismatch
and - possibly - a single/multiValued discrepancy (single in the schema,
multiValued in practice).
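A quick way to see what schemaless mode has actually created is the Schema REST
API, for example (the collection name here is an example):

    curl "http://localhost:8983/solr/yourCollection/schema/fields"

which lists every field along with the type that was guessed for it.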

Regards,
   Alex.
P.s. Schemaless mode is mostly for rapid development/experimentation.
In production, you want to use explicit schema and dynamic fields.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 18 November 2014 12:03, AJ Lemke  wrote:
> Hi All,
>
> I am getting an error when using solrj to index records.
>
> Exception in thread "main" 
> org.apache.solr.client.solrj.impl.CloudSolrServer$RouteException: Exception 
> writing document id 529241050 to the index; possible analysis error.
> at 
> org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:360)
> at 
> org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:533)
> at 
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
> at Importer.commit(Importer.java:96)
> at Importer.putData(Importer.java:81)
> at Importer.main(Importer.java:25)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> Caused by: 
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: 
> Exception writing document id 529241050 to the index; possible analysis error.
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
> at 
> org.apache.solr.client.solrj.impl.LBHttpSolrServer.doRequest(LBHttpSolrServer.java:340)
> at 
> org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:301)
> at 
> org.apache.solr.client.solrj.impl.CloudSolrServer$1.call(CloudSolrServer.java:341)
> at 
> org.apache.solr.client.solrj.impl.CloudSolrServer$1.call(CloudSolrServer.java:338)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> I am connecting to a local solrCloud instance (localhost:9983) and a 
> schemaless collection.
> Is this possible or do I have to use a schema?
>
> Thanks!
> AJ


Re: mlt handler not giving response in Solr Cloud

2014-11-18 Thread Jilani Shaik
Please help me with this issue. Please suggest what is missing
to get responses from multiple Solr shards in SolrCloud.

On Tue, Nov 18, 2014 at 1:40 PM, Jilani Shaik  wrote:

> Hi,
>
> When I tried to execute the mlt handler query on a shard it is giving
> result if the documents exist on that shards.
>
> in below scenario, I have a cloud shards on localhost with ports 8181 and
> 8191. where documents are distributed. if the mlt query document id belongs
> to 8181 shard and the query hits to 8181 shard then only I am getting the
> results.
>
>
>  No result
>
> http://localhost:8181/solr/collectionName/mlt?q=id:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100
>
>  Will give result
>
> http://localhost:8191/solr/collectionName/mlt?q=id:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100
>
> *So the distributed search is not working for mlt handler(my assumption,
> please correct). *
>
> Even I tried with the below
>
>
> http://localhost:8181/solr/collectionName/mlt?q=owui_p:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100&;
> *shards.qt=/mlt&shards=localhost:8181/solr/,localhost:8191/solr/*
>
>
> http://localhost:8181/solr/collectionName/mlt?q=owui_p:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100
> *&shards.qt=/mlt&shards=localhost:8181/solr/collectionName/,localhost:8191/solr/collectionName/*
>
> even I tried with select handler and with mlt as true also not working.
>
> http://localhost:8181/solr/collectionName/*select?mlt=true*
> &q=owui_p:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100&distrib=true&mlt.fl=ti_w
>
>
> MLT configuration from solrconfig.xml
>
> 
> 
> 
> ti_w
> 1
> 2
> true
>  name="shards">localhost:8181/solr/collectionName,localhost:8191/solr/collectionName
> /mlt
> true
> all
> 
> 
>
>
>
> Please let me know what is the missing here to get the result in solr
> cloud.
>
> Thanks,
> Jilani
>


Re: problems when hunspell returns multiple stems

2014-11-18 Thread Michael Sokolov

followup - hunspell has:

follow/SDRZGJ
follower/M
following/M

follow/G generates following

I guess the reason for the /M entries is to represent the nouns, which 
have plural endings, so that


following->followings

-- I'm not really sure where the bug is, but it seems as if generating 
multiple "stems" causes issues



On 11/18/2014 02:33 PM, Michael Sokolov wrote:
I find that a query for stemmed terms sometimes fails with the edismax 
query parser and hunspell stemmer. Looklng at the output of analysis 
for the query (text:following) I can see that it generates two 
different terms at the same position: "follow" and "following".  Then 
edismax seems to generate a sloppy phrase query from that; in the 
debug output of the query I can see ( text:following text:follow)~2. 
This doesn't match anything, even though both the words follow and 
following (as well as followed, follows, etc) both occur in various 
documents.


First, I'm confused as to what the source of the sloppy query is. Here 
are the relevant settings from solrconfig:


edismax
archive_id^1 author^20 chapter_title^15 isbn^1 
publisher^5 subjects^5 text^1 title^120

chapter_title~2^1 subjects~2^20 text~10^1 title~2^4
100%
OR

Is there some process that generates a slop query for co-occurring terms?

As an aside, the same query returns a document when we use the lucene 
query parser: it matches one document.  But when I search across our 
unstemmed field, it returns more.  It appears as if


It seems as if when hunspell returns multiple terms from a single one, 
this causes problems?


So in summary: why would hunspell generate "following" as a stem for 
"following"? Probably just a buggy dictionary entry; we could fix 
that, but I wouldn't expect the phrase behavior in that case from 
edismax either.  Can anybody shed some light as to what's going on here?


Thanks

-Mike




problems when hunspell returns multiple stems

2014-11-18 Thread Michael Sokolov
I find that a query for stemmed terms sometimes fails with the edismax 
query parser and hunspell stemmer. Looking at the output of analysis for 
the query (text:following) I can see that it generates two different 
terms at the same position: "follow" and "following". Then edismax seems 
to generate a sloppy phrase query from that; in the debug output of the 
query I can see ( text:following text:follow)~2. This doesn't match 
anything, even though both the words follow and following (as well as 
followed, follows, etc) both occur in various documents.


First, I'm confused as to what the source of the sloppy query is. Here
are the relevant settings from solrconfig (the XML element names were not
preserved in the archive; only the values survive):

  edismax
  archive_id^1 author^20 chapter_title^15 isbn^1 publisher^5 subjects^5 text^1 title^120
  chapter_title~2^1 subjects~2^20 text~10^1 title~2^4
  100%
  OR

Is there some process that generates a slop query for co-occurring terms?

As an aside, the same query returns a document when we use the lucene 
query parser: it matches one document.  But when I search across our 
unstemmed field, it returns more.  It appears as if


It seems as if when hunspell returns multiple terms from a single one, 
this causes problems?


So in summary: why would hunspell generate "following" as a stem for 
"following"? Probably just a buggy dictionary entry; we could fix that, 
but I wouldn't expect the phrase behavior in that case from edismax 
either.  Can anybody shed some light as to what's going on here?


Thanks

-Mike


OutOfMemory on 28 docs with facet.method=fc/fcs

2014-11-18 Thread Mohsin Beg Beg

Hi,

I am getting OOM when faceting on numFound=28. The receiving Solr node throws
the OutOfMemoryError even though there is 7 GB of available heap before the
faceting request was submitted. If a different Solr node is selected, that one
fails too. Any suggestions?


1) Test setup is:-
100 collections with 20 shards each := 2000 cores
20 solr nodes of 16gb jvm memory := 100 cores per jvm node
5 hosts of 300 gb memory := 4 solr nodes per host


2) Query (edited for berevity) is :-
fields1...15 below are 15 among ~500 fields of type strings (tokenized) and 
numerics.

http://myhost:8983/solr/Collection1/query
?q=fieldX:xyz AND fieldY:("r" OR "g" OR "b")
&rows=0
&fq={!cache=false}time:[ TO ]
&facet=true
&facet.sort=count
&facet.missing=false
&facet.mincount=1
&facet.threads=10
 &facet.field=field1...field15
 &f.field1...field15.facet.method=fc/fcs
&collection=Collection1...Collection100


-M


RE: OutOfMemory on 28 docs with facet.method=fc/fcs

2014-11-18 Thread Toke Eskildsen
Mohsin Beg Beg [mohsin@oracle.com] wrote:
> I am getting OOM when faceting on numFound=28. The receiving
> solr node throws the OutOfMemoryError even though there is 7gb
> available heap before the faceting request was submitted.

fc and fcs faceting memory overhead is (nearly) independent of the number of 
hits in the search result. 

> If a different solr node is selected that one fails too. Any suggestions ?

> &facet.field=field1field15
> &f.field1...field15.facet.method=fc/fcs
> &collection=Collection1...Collection100

You seem to be issuing a facet request for 15 fields in 100 collection 
concurrently. The memory overhead will be linear to the number of documents, 
references from documents to field values and the number of unique values in 
your facets, for each facet independently.

That was confusing. Let me try an example instead:

For each field, static memory requirements will be a structure that maps from 
documents to term ordinals. Depending on circumstances, this can be small 
(DocValues and a numeric field) or big (multi-value, non-DocValue String). Each 
concurrent call will temporarily allocate a structure for counting. If the 
field is numeric, this will be a hashmap. If it is String, it will be an 
integer-array with as many entries as there are unique values: If there are 1M 
unique String values in the field, the overhead will be 4 bytes * 1M = 4MB.

So, if each field has 250K unique String values, the temporary overhead for all 
15 fields will be 15MB. I don't know if the request for multiple collections is 
threaded, but if so, the 15MB should be multiplied by 100, totalling 1.5GB of 
memory overhead for each call. Add the static structures and it does not seem 
unreasonable that you run out of memory.

All this is very loose, but the overall message is that documents, unique facet 
values, facets and collections all multiplies memory requirements.

* Do you need to query all collections at once?
* Can you collapse some of the facet fields, to reduce the total number?
* Are some of the fields very small? If so, use enum for them instead of fc/fcs.
* Maybe you can determine your limits by issuing requests first for 1 field, 
then 2 etc. This is to see if it is feasible to do minor tweak to get it to 
work or if your setup is so large that something entirely else needs to be done.

- Toke Eskildsen


Solr JOIN: keeping permission data out of primary documents

2014-11-18 Thread Philip Durbin
Solr JOINs are a way to enforce simple document security, as explained
by Yonik Seeley at
http://lucene.472066.n3.nabble.com/document-level-security-filter-solution-for-Solr-tp4126992p4126994.html

I'm trying to tweak this pattern so that I don't have to keep the
security information in each of my primary Solr documents.

I just posted the gist at
https://gist.github.com/pdurbin/4d27fea7b431ef3bf4f9 as an example of
my working Solr JOIN based on data in `before.json` . Permissions per
user are embedded in the primary documents like this:

{
"id": "dataset_3",
"perms_ss": [
"alice",
"bob"
]
},
{
"id": "dataset_4",
"perms_ss": [
"alice",
"bob",
"public"
]
},

User document have been created to do the JOIN on:

{
"id": "alice",
"groups_s": "alice"
},

The JOIN looks like this:

{!join+from=groups_s+to=perms_ss}id:public+OR+{!join+from=groups_s+to=perms_ss}id:alice

Because indexing the primary documents (datasets) takes a while, I'm
interested in exploring the idea of introducing a third type of
document that contains the permission information. `after.json` is an
example, with documents that look like this:

{
"id": "dataset_3"
},
{
"id": "dataset_4"
},
{
"id": "public",
"groups_s": "public"
},
{
"id": "alice",
"groups_s": "alice"
},
{
"id": "bob",
"groups_s": "bob"
},
{
"id": "charlie",
"groups_s": "charlie"
},
{
"id": "dataset_1_perms",
"definition_point_s": "dataset_1",
"role_assignee_ss": [
"alice"
]
},
{
"id": "dataset_2_perms",
"definition_point_s": "dataset_2",
"role_assignee_ss": [
"bob"
]
},

The question is if it's possible to construct a Solr JOIN such that
the same permissions are enforced and the same documents are returned
per user. The gist contains expected output and test runners for
anyone who can figure out the syntax of the JOIN. The idea is that
silence is golden and no output means the tests passed:

murphy:4d27fea7b431ef3bf4f9 pdurbin$ ./delete
{"responseHeader":{"status":0,"QTime":8}}
murphy:4d27fea7b431ef3bf4f9 pdurbin$ ./load.before
{"responseHeader":{"status":0,"QTime":12}}
murphy:4d27fea7b431ef3bf4f9 pdurbin$ ./test.before.all
murphy:4d27fea7b431ef3bf4f9 pdurbin$

What do people think? Can anyone load up "after.json", update the
FIXME's, and get `test.after.all` to work? Thanks in advance!
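For concreteness, one shape such a filter might take - untested, ignoring the
groups_s indirection for now, and using only field names that appear in
after.json - would be:

    fq={!join from=definition_point_s to=id}role_assignee_ss:(alice OR public)

Whether that reproduces the exact before.json semantics is of course the open
question.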

And thanks again for the original JOIN tip, Yonik!

Phil

-- 
Philip Durbin
Software Developer for http://dataverse.org
http://www.iq.harvard.edu/people/philip-durbin


AbstractSubTypeFieldType as a template

2014-11-18 Thread SolrUser1543
I need to implement indexing of hierarchical data, like a post and its
comments. Each comment has a few fields such as username / text / date.

There are a few more comment-like types that I need too (the only
difference is the field names and their count).

There is the LatLonType field type, which derives from
AbstractSubTypeFieldType. This type allows indexing a struct of two fields
embedded in a flat document.

Is it possible to create a generic type, e.g. GenericSubType, which
can take the names and types of its fields from the schema configuration and
allows me to index such structures inside a flat document, like LatLonType does?

The most important point is the correlation between values, because LatLonType
creates two multi-valued fields and uses values at the same index in both
fields.

Any ideas?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/AbstractSubTypeFieldType-as-a-template-tp4169740.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: problems when hunspell returns multiple stems

2014-11-18 Thread Michael Sokolov
OK - please disregard; I found a rogue new component in our analyzer 
that was messing everything up.


The hunspell behavior was perhaps a little confusing, but I don't 
believe it leads to broken queries.


-Mike


On 11/18/2014 02:38 PM, Michael Sokolov wrote:

followup - hunspell has:

follow/SDRZGJ
follower/M
following/M

follow/G generates following

I guess the reason for the /M entries is to represent the nouns, which 
have plural endings, so that


following->followings

-- I'm not really sure where the bug is, but it seems as if generating 
multiple "stems" causes issues



On 11/18/2014 02:33 PM, Michael Sokolov wrote:
I find that a query for stemmed terms sometimes fails with the 
edismax query parser and hunspell stemmer. Looklng at the output of 
analysis for the query (text:following) I can see that it generates 
two different terms at the same position: "follow" and "following".  
Then edismax seems to generate a sloppy phrase query from that; in 
the debug output of the query I can see ( text:following 
text:follow)~2. This doesn't match anything, even though both the 
words follow and following (as well as followed, follows, etc) both 
occur in various documents.


First, I'm confused as to what the source of the sloppy query is.  
Here are the relevant settings from solrconfig:


edismax
archive_id^1 author^20 chapter_title^15 isbn^1 
publisher^5 subjects^5 text^1 title^120

chapter_title~2^1 subjects~2^20 text~10^1 title~2^4
100%
OR

Is there some process that generates a slop query for co-occurring terms?

As an aside, the same query returns a document when we use the lucene 
query parser: it matches one document.  But when I search across our 
unstemmed field, it returns more.  It appears as if


It seems as if when hunspell returns multiple terms from a single 
one, this causes problems?


So in summary: why would hunspell generate "following" as a stem for 
"following"? Probably just a buggy dictionary entry; we could fix 
that, but I wouldn't expect the phrase behavior in that case from 
edismax either.  Can anybody shed some light as to what's going on here?


Thanks

-Mike






SOLR bf SyntaxError

2014-11-18 Thread David Lee
Hi,

I tried to use bf for boosting,  and got the following error:

org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError:
Unexpected text after function: )


Here's the bf boosting:

sum(div(product(log(map(reviews,0,0,1)),rating),2.5),div(log(map(sales,0,0,1)),10))


What's the syntax issue here?


Thanks,
DL


Re: mlt handler not giving response in Solr Cloud

2014-11-18 Thread Anshum Gupta
Hi Jilani,

Looking at the use case you have, you might want to try out the MLT query
parser. Because of the way it's designed, the handler has issues when the client
sends an MLT request to a shard that doesn't contain the document.

Look at the following issues:
* SOLR-5480 : Make
MoreLikeThis handler distributable
* SOLR-6248 : MLTQParser
that works with SolrCloud

SOLR-6248 hasn't been released yet and would be released with Solr 5.0.
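Once you're on a version that includes it, the query parser route would look
roughly like this (a sketch only; the exact parameter names may differ in the
released version, and the field and id are taken from your example):

    http://localhost:8181/solr/collectionName/select?q={!mlt qf=ti_w mintf=1 mindf=2}medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100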


On Tue, Nov 18, 2014 at 11:23 AM, Jilani Shaik 
wrote:

> Please help me on this issue. Please provide me suggestions what is missing
> to get the response from multiple solr shards in cloud.
>
> On Tue, Nov 18, 2014 at 1:40 PM, Jilani Shaik 
> wrote:
>
> > Hi,
> >
> > When I tried to execute the mlt handler query on a shard it is giving
> > result if the documents exist on that shards.
> >
> > in below scenario, I have a cloud shards on localhost with ports 8181 and
> > 8191. where documents are distributed. if the mlt query document id
> belongs
> > to 8181 shard and the query hits to 8181 shard then only I am getting the
> > results.
> >
> >
> >  No result
> >
> >
> http://localhost:8181/solr/collectionName/mlt?q=id:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100
> >
> >  Will give result
> >
> >
> http://localhost:8191/solr/collectionName/mlt?q=id:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100
> >
> > *So the distributed search is not working for mlt handler(my assumption,
> > please correct). *
> >
> > Even I tried with the below
> >
> >
> >
> http://localhost:8181/solr/collectionName/mlt?q=owui_p:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100&;
> > *shards.qt=/mlt&shards=localhost:8181/solr/,localhost:8191/solr/*
> >
> >
> >
> http://localhost:8181/solr/collectionName/mlt?q=owui_p:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100
> >
> *&shards.qt=/mlt&shards=localhost:8181/solr/collectionName/,localhost:8191/solr/collectionName/*
> >
> > even I tried with select handler and with mlt as true also not working.
> >
> > http://localhost:8181/solr/collectionName/*select?mlt=true*
> >
> &q=owui_p:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100&distrib=true&mlt.fl=ti_w
> >
> >
> > MLT configuration from solrconfig.xml
> >
> > 
> > 
> > 
> > ti_w
> > 1
> > 2
> > true
> >  >
> name="shards">localhost:8181/solr/collectionName,localhost:8191/solr/collectionName
> > /mlt
> > true
> > all
> > 
> > 
> >
> >
> >
> > Please let me know what is the missing here to get the result in solr
> > cloud.
> >
> > Thanks,
> > Jilani
> >
>



-- 
Anshum Gupta
http://about.me/anshumgupta


Re: problems when hunspell returns multiple stems

2014-11-18 Thread Alexandre Rafalovitch
On 18 November 2014 15:52, Michael Sokolov
 wrote:
> I found a rogue new component in our analyzer

We have a first Solr virus? I thought we were safe until the "upload
the plugin" JIRA was in production :-)

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


Re: mlt handler not giving response in Solr Cloud

2014-11-18 Thread Shawn Heisey
On 11/18/2014 1:10 AM, Jilani Shaik wrote:
> When I tried to execute the mlt handler query on a shard it is giving
> result if the documents exist on that shards.
>
> in below scenario, I have a cloud shards on localhost with ports 8181 and
> 8191. where documents are distributed. if the mlt query document id belongs
> to 8181 shard and the query hits to 8181 shard then only I am getting the
> results.

The MoreLikeThisComponent was recently upgraded to work in distributed
mode, but I'm fairly sure the MoreLikeThisHandler has not received the
same upgrade.  You'll need to configure the component in another handler
and use the mlt parameters on the request URL.
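In other words, something along these lines against a handler that includes the
MLT component (the standard /select handler does by default), using your field
names as an example:

    http://localhost:8181/solr/collectionName/select?q=id:medl_24806189&fq=segment:medl&mlt=true&mlt.fl=ti_w&mlt.mintf=1&mlt.mindf=2&fl=id,owui_p&rows=100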

I've noticed some performance issues with the component on a distributed
index, but they have not yet been addressed.  I thought there was an
issue filed for that, but I can't seem to find it.

Thanks,
Shawn



Re: New Meetup in London - Lucene/Solr User Group

2014-11-18 Thread Otis Gospodnetic
Would LOVE to see the results (assuming you can ensure the same fruit(s?)
are being compared)

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Nov 18, 2014 at 11:55 AM, Alexandre Rafalovitch 
wrote:

> On 18 November 2014 11:41, Charlie Hull  wrote:
> > presenting some results of a Solr/Elasticsearch comparative performance
> > study.
>
> I was asked about that a couple of times at the Solr Revolution
> conference. Looking forward to seeing the results.
>
> Regards,
>Alex.
>
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>


Re: OutOfMemory on 28 docs with facet.method=fc/fcs

2014-11-18 Thread Mohsin Beg Beg


Looking at SimpleFacets.java, doesn't fc/fcs iterate only over the DocSet for
the fields? So assuming each field has a unique term across the 28 rows, a maximum
of 28 * 15 unique small strings (<100 bytes) should be on the order of 1MB. For
100 collections, let's say a total of 1GB. Now let's say I multiply it by 3, to
3GB.

That still leaves more than 4GB of heap used by something else to run out of memory? 
Who and why? *sigh*

Now looking at the alternatives...
1. I don't know which shards have the docs so yes, all collections are needed. 
q and fq params can be complicated.
2. "hierarchical" field doesn't work when selecting 15 fields (out of 300+) 
since there is no way to give a one hierarchical path in fq or via facet.prefix 
value.
3. One facet at a time exceeds the total latency requirements of the app on top

Am I stuck ?

ps: Doesn't enum build the uninverted index for each unique term in the field 
and then intersect with the DocSet to return the facet counts? This causes 
filterCache entries to be bloated in each core. That causes OOM on just  4 or 5 
string fields (depending on their cardinality).

-M


- Original Message -
From: t...@statsbiblioteket.dk
To: solr-user@lucene.apache.org
Sent: Tuesday, November 18, 2014 12:34:08 PM GMT -08:00 US/Canada Pacific
Subject: RE: OutOfMemory on 28 docs with facet.method=fc/fcs

Mohsin Beg Beg [mohsin@oracle.com] wrote:
> I am getting OOM when faceting on numFound=28. The receiving
> solr node throws the OutOfMemoryError even though there is 7gb
> available heap before the faceting request was submitted.

fc and fcs faceting memory overhead is (nearly) independent on the number of 
hits in the search result. 

> If a different solr node is selected that one fails too. Any suggestions ?

> &facet.field=field1field15
> &f.field1...field15.facet.method=fc/fcs
> &collection=Collection1...Collection100

You seem to be issuing a facet request for 15 fields in 100 collection 
concurrently. The memory overhead will be linear to the number of documents, 
references from documents to field values and the number of unique values in 
your facets, for each facet independently.

That was confusing. Let me try an example instead:

For each field, static memory requirements will be a structure that maps from 
documents to term ordinals. Depending on circumstances, this can be small 
(DocValues and a numeric field) or big (multi-value, non-DocValue String). Each 
concurrent call will temporarily allocate a structure for counting. If the 
field is numeric, this will be a hashmap. If it is String, it will be an 
integer-array with as many entries as there are unique values: If there are 1M 
unique String values in the field, the overhead will be 4 bytes * 1M = 4MB.

So, if each field has 250K unique String values, the temporary overhead for all 
15 fields will be 15MB. I don't now if the request for multiple collections is 
threaded, but if so, the 15MB should be multiplied with 100, totalling 1.5GB 
memory overhead for each call. Add the static structures and it does not seem 
unreasonable that you run out of memory.

All this is very loose, but the overall message is that documents, unique facet 
values, facets and collections all multiplies memory requirements.

* Do you need to query all collections at once?
* Can you collapse some of the facet fields, to reduce the total number?
* Are some of the fields very small? If so, use enum for them instead of fc/fcs.
* Maybe you can determine your limits by issuing requests first for 1 field, 
then 2 etc. This is to see if it is feasible to do minor tweak to get it to 
work or if your setup is so large that something entirely else needs to be done.

- Toke Eskildsen


Re: OutOfMemory on 28 docs with facet.method=fc/fcs

2014-11-18 Thread Shawn Heisey
On 11/18/2014 3:06 PM, Mohsin Beg Beg wrote:
> Looking at SimpleFacets.java, doesn't fc/fcs iterate only over the DocSet for 
> the fields. So assuming each field has a unique term across the 28 rows, a 
> max of 28 * 15 unique small strings (<100bytes), should be in the order of 
> 1MB. For 100 collections, lets say a total of 1GB. Now lets say I multiply it 
> by 3 to 3GB. 

Are there 28 documents in the entire index?  It's my understanding that
the fieldcache memory required is not dependent on the number of
documents that match your query (numFound), it's dependent on the number
of documents in the entire index.

If my understanding is correct, once that memory structure is calculated
and stored in the fieldcache, it's available to speed up future facets
on that field, even if the query and filters are different than what was
used the first time.  It doesn't seem as useful for typical use cases to
store a facet cache entry that depends on the specific query.

Thanks,
Shawn



RE: OutOfMemory on 28 docs with facet.method=fc/fcs

2014-11-18 Thread Toke Eskildsen
Mohsin Beg Beg [mohsin@oracle.com] wrote:

> Looking at SimpleFacets.java, doesn't fc/fcs iterate only over the DocSet for 
> the fields.

To get the seed for the concrete faceting resolving, yes. That still leaves the 
mapping and the counting structures.

> So assuming each field has a unique term across the 28 rows, a max of 28 * 15
> unique small strings (<100bytes), should be in the order of 1MB.
> For 100 collections, lets say a total of 1GB. Now lets say I multiply it by 3 
> to 3GB.

My explanation must have been unclear. You are still operating under the 
assumption that fc/fcs memory consumption is tied to the size of the search 
result. That is not the case. What you are describing sounds more like enum.

> That still leaves more that 4GB heap used by something else to run out 
> memory? Who and why? *sigh*

The why is quite simple: fc/fcs is designed to deliver fast faceting for fields 
with non-trivial cardinality. The cost is memory overhead and delayed startup. 
Both mitigated but not removed by using DocValues.

> 2. "hierarchical" field doesn't work when selecting 15 fields (out of 300+) 
> since
> there is no way to give a one hierarchical path in fq or via facet.prefix 
> value.

Sadly Solr does not yet support under-the-hood collapsing of facets.

> 3. One facet-at-time exceeds the total latency requirements of the app on top

But you can run one facet at a time? If so, what about 2? 3? What is your 
current limit?

> Am I stuck ?

Not yet.

Are you currently using DocValues?

Could you describe your facet fields a bit more? Type? Cardinality? Maximum 
count for any tag? If you have high cardinality (1M+) for some fields and a low 
(< 65000) maximum count, http://tokee.github.io/lucene-solr/ could help you by 
lowering memory usage.

If you can accept imprecise counts, you could speed up the faceting process 
substantially by doing single-phase distributed faceting and maybe get 
satisfactory performance requesting the facet results one at a time.

> ps: Doesn't enum build the uninverted index for each unique term in the field 
> and then intersect
> with the DocSet to return the facet counts?

Not an uninverted index as such, but yes, it gets the docIDs for each term and 
intersects with the query result docIDs.

> This causes filterCache entries to be bloated in each core. That causes OOM
> on just  4 or 5 string fields (depending on their cardinality).

Set filterCache lower? But if your cardinality is in the thousands or higher, 
enum is unlikely to give you proper response times.

- Toke Eskildsen


Re: OutOfMemory on 28 docs with facet.method=fc/fcs

2014-11-18 Thread Mohsin Beg Beg


The SolrCloud has 8 billion+ docs, and it is growing non-linearly each hour.
numFound=28 was for the faceting query only.

If the fieldCache (Lucene caches) is the issue, would q=time:[ TO ] be better instead?

-Mohsin



- Original Message -
From: apa...@elyograg.org
To: solr-user@lucene.apache.org
Sent: Tuesday, November 18, 2014 2:45:46 PM GMT -08:00 US/Canada Pacific
Subject: Re: OutOfMemory on 28 docs with facet.method=fc/fcs

On 11/18/2014 3:06 PM, Mohsin Beg Beg wrote:
> Looking at SimpleFacets.java, doesn't fc/fcs iterate only over the DocSet for 
> the fields. So assuming each field has a unique term across the 28 rows, a 
> max of 28 * 15 unique small strings (<100bytes), should be in the order of 
> 1MB. For 100 collections, lets say a total of 1GB. Now lets say I multiply it 
> by 3 to 3GB. 

Are there 28 documents in the entire index?  It's my understanding that
the fieldcache memory required is not dependent on the number of
documents that match your query (numFound), it's dependent on the number
of documents in the entire index.

If my understanding is correct, once that memory structure is calculated
and stored in the fieldcache, it's available to speed up future facets
on that field, even if the query and filters are different than what was
used the first time.  It doesn't seem as useful for typical use cases to
store a facet cache entry that depends on the specific query.

Thanks,
Shawn


RE: Hierarchical faceting

2014-11-18 Thread Appaneravanda, Rashmy
Thanks Evan and Jason.

I'll probably go with the approach that Evan suggested. This allows the UI to 
change and display the full hierarchy if required in the future.


-Original Message-
From: Evan Pease [mailto:evancpe...@gmail.com] 
Sent: Tuesday, November 18, 2014 12:41 AM
To: solr-user@lucene.apache.org
Subject: Re: Hierarchical faceting

>I'm looking to see if Solr has any in-built tokenizer that splits the
tokens
>and prepends with the depth information. I'd like to avoid building 
>depth information into the filed values if Solr already has something 
>that can be used.

So the goal is to find out the level of the tree for each category? You could 
determine this in the UI by splitting the category facet value string by the 
separator.

As you're aware, when you query a field indexed using 
solr.PathHierarchyTokenizerFactory
you still get the full category path back as a facet value.

For example, if a user navigates to "Phy":
fq={!term f=category}NonFic/Sci/Phy

The facet values that are returned will look like this (made up counts):

  [the example facet counts were not preserved in the archive]

Jason wrote:

> I realize you want to avoid putting depth details into the field 
> values, but something has to imply the depth.  So with that in mind, 
> here is another approach (with the assumption that you are chasing 
> down a single branch of a tree (and all its subbranch offshoots)),
>
> Use dynamic fields
> Step from one level to the next with a simple increment Build the 
> facet for the next level on the call The UI needs only know the 
> current level
>
> This would possibly be as so:
>
> step_fieldname_n
>
> With a dynamic field configuration of:
>
> step_*
>
> The content of the step_fieldname_n field would either be the strong 
> of the field value or the delimited path of the current level (as 
> suited to taste).  Either way, most likely a fieldType of String (or 
> some variation
> thereof)
>
> The UI would then call:
>
> facet.field=step_fieldname_n+1
>
> And the UI would need to be aware to carry the n+1 into the fq link
> verbiage:
>
> fq=step_fieldname_n+1:facetvalue
>
> The trick of all of this is that you must build your index with the 
> depth of your hierarchy in mind to place the values into the suitable fields.
> You could, of course, write an UpdateProcessor to accomplish this if 
> that seems fitting.
>
> Jason
>
> > On Nov 17, 2014, at 12:22 PM, Alexandre Rafalovitch 
> > 
> wrote:
> >
> > You might be able to stick in a couple of 
> > PatternReplaceFilterFactory in a row with regular expressions to catch 
> > different levels.
> >
> > Something like:
> >
> >  > pattern="^[^0-9][^/]+/[^/]/[^/]+$" replacement="2$0" />  > class="solr.PatternReplaceFilterFactory"
> > pattern="^[^0-9][^/]+/[^/]$" replacement="1$0" /> ...
> >
> > I did not test this, you may need to escape some thing or put 
> > explicit groups in there.
> >
> > Regards,
> >   Alex.
> > P.s.
> http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analys
> is/pattern/PatternReplaceFilterFactory.html
> >
> > Personal: http://www.outerthoughts.com/ and @arafalov Solr resources 
> > and newsletter: http://www.solr-start.com/ and @solrstart Solr 
> > popularizers community: https://www.linkedin.com/groups?gid=6713853
> >
> >
> > On 17 November 2014 15:01, rashmy1 
> > 
> wrote:
> >> Hi Alexandre,
> >> Yes, I've read this post and that's the 'Option1' listed in my 
> >> initial
> post.
> >>
> >> I'm looking to see if Solr has any in-built tokenizer that splits 
> >> the
> tokens
> >> and prepends with the depth information. I'd like to avoid building
> depth
> >> information into the filed values if Solr already has something 
> >> that
> can be
> >> used.
> >>
> >> Thanks!
> >>
> >>
> >>
> >> --
> >> View this message in context:
> http://lucene.472066.n3.nabble.com/Hierarchical-faceting-tp4169263p416
> 9536.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
>
>