Re: Listening on SolrCloud events

2014-07-04 Thread Ugo Matrangolo
Hi,

thank you very much for the answers.

I think the best way is to write a little ZooKeeper watcher myself and
react to the events of interest.
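
For the "react to the events of interest" part, here is a minimal sketch in
plain Java. It assumes the watcher has already parsed two successive versions
of clusterstate.json into simple replica-name -> state maps; that parsing, the
replica names and the state strings are illustrative assumptions, not
something taken from this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Compares two snapshots of replica states and reports what changed. */
final class ClusterStateDiff {

    /**
     * @param before replica name -> state from the previous snapshot
     * @param after  replica name -> state from the current snapshot
     * @return one human-readable line per replica whose state differs
     */
    static List<String> changes(Map<String, String> before, Map<String, String> after) {
        List<String> out = new ArrayList<>();
        // Union of replica names, sorted so the output order is stable.
        TreeSet<String> names = new TreeSet<>(before.keySet());
        names.addAll(after.keySet());
        for (String name : names) {
            String was = before.get(name);
            String now = after.get(name);
            if (was == null) {
                out.add(name + ": appeared as " + now);
            } else if (now == null) {
                out.add(name + ": disappeared (was " + was + ")");
            } else if (!was.equals(now)) {
                out.add(name + ": " + was + " -> " + now);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical snapshots; a real watcher would fill these from ZooKeeper.
        Map<String, String> before = new LinkedHashMap<>();
        before.put("core_node1", "active");
        before.put("core_node2", "active");
        Map<String, String> after = new LinkedHashMap<>();
        after.put("core_node1", "active");
        after.put("core_node2", "recovering");
        for (String line : changes(before, after)) {
            System.out.println(line);
        }
    }
}
```

Keeping the comparison separate from the ZooKeeper client code means it works
the same whether the snapshots come from a watch callback or from polling.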

Thank you,
Ugo



On Thu, Jul 3, 2014 at 9:42 PM, Jeff Wartes  wrote:

> If you're using SolrJ, CloudSolrServer exposes the information you need
> directly, although you'd have to poll it for changes.
> Specifically, this code path will get you a snapshot of the clusterstate:
> http://lucene.apache.org/solr/4_5_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.html#getZkStateReader()
>
> http://lucene.apache.org/solr/4_5_0/solr-solrj/org/apache/solr/common/cloud/ZkStateReader.html#updateClusterState(boolean)
>
> http://lucene.apache.org/solr/4_5_0/solr-solrj/org/apache/solr/common/cloud/ZkStateReader.html#getClusterState()
>
>
> If you're not using SolrJ, or don't want to poll, you really only need a
> ZooKeeper client. As Shawn Heisey suggests, you put a watch on
> clusterstate.json and write something to determine whether the change was
> relevant to you.
>
> A few months ago I wanted to do analysis of changes in my SolrCloud
> cluster, so I threw together something using the Curator library to set
> watches on clusterstate.json and commit the new version to a local git
> repo whenever it changed.
> (https://github.com/randomstatistic/git_zk_monitor) The hardest part
> turned out to be learning about the Zookeeper watch semantics, but you
> could use any client you like.
>
>
>
> On 7/3/14, 9:23 AM, "Shawn Heisey"  wrote:
>
> >On 7/3/2014 7:49 AM, Ugo Matrangolo wrote:
> >> I would like to be informed as soon as a cluster event happens, like a
> >> node dropping and/or starting a recovery process.
> >>
> >> What is the best way (if any) to listen for SolrCloud events?
> >
> >I don't know how it's done, but if you are using SolrJ and
> >CloudSolrServer, you have access to the zookeeper client.  With the
> >zookeeper client, you can put a watcher on various parts of the
> >zookeeper database, which should fire whenever there is a change.
> >
> >Thanks,
> >Shawn
> >
>
>


Re: How to get related facets using Solr query ?

2014-07-04 Thread Erik Hatcher
Why isn't IJK under AB too?

Are the "Facet" field names different?  Pivot facets look like what you want:
facet.pivot=field1,field2 if they are different field names.
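
To make the question about IJK concrete, here is a small client-side sketch
in plain Java of the kind of co-occurrence counts a two-field pivot reports.
It is not Solr's implementation. The field names field1/field2 and the
reading of the three sample documents (Doc C carrying AB,CD in the first
field and IJK,XYZ in the second) are assumptions based on the sample quoted
below.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Counts, for each field1 value, how often each field2 value co-occurs with it. */
final class PivotSketch {

    /** One document with two multi-valued facet fields (names are hypothetical). */
    static final class Doc {
        final List<String> field1;
        final List<String> field2;

        Doc(List<String> field1, List<String> field2) {
            this.field1 = field1;
            this.field2 = field2;
        }
    }

    /** The three sample documents from the original question, as read above. */
    static List<Doc> sampleDocs() {
        return Arrays.asList(
            new Doc(Arrays.asList("AB"), Arrays.asList("MNO")),               // Doc A
            new Doc(Arrays.asList("CD"), Arrays.asList("XYZ")),               // Doc B
            new Doc(Arrays.asList("AB", "CD"), Arrays.asList("IJK", "XYZ"))); // Doc C
    }

    static Map<String, Map<String, Integer>> pivot(List<Doc> docs) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (Doc d : docs) {
            for (String outer : d.field1) {
                Map<String, Integer> inner = result.computeIfAbsent(outer, k -> new TreeMap<>());
                for (String value : d.field2) {
                    inner.merge(value, 1, Integer::sum); // one more doc with this pair
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {AB={IJK=1, MNO=1, XYZ=1}, CD={IJK=1, XYZ=2}}
        System.out.println(pivot(sampleDocs()));
    }
}
```

Because Doc C carries both AB and CD, IJK shows up under both of them, which
is why it would be expected under AB as well.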

   Erik

> On Jul 3, 2014, at 20:15, Shamik Bandopadhyay  wrote:
> 
> Hi,
> 
>   I'm trying to construct a facet query to organize related facets in the
> response. Let me illustrate with a sample. Let's say I have the following
> documents indexed in Solr.
> 
> 1. Doc A -->
>  Facet:AB
>  Facet:MNO
> 
> 2. Doc B -->
>  Facet:CD
>  Facet:XYZ
> 
> 3. Doc C --> Facet:AB,CD
>   Facet:IJK, XYZ
> 
> 
> Now, I want the result organized as :
> 
> AB
>    MNO,XYZ
> CD
>    IJK,XYZ
> 
> Is there a way to do this ?
> 
> Thanks,
> Shamik


Using solr for image retrieval - very long response time

2014-07-04 Thread Yossi Biton
Hello there,

Recently I have been trying to implement the bag-of-words model for image
retrieval using Solr. In short, this model consists of extracting "visual
words" from images and then using a tf-idf scheme for fast querying (usually
followed by a re-ranking stage).
I found Solr to be a suitable platform (I hope I'm not wrong), as it provides
tf-idf ranking.

I'm currently facing the following problem:
My images usually contain about 1,000 words, which means the query
consists of 1,000 terms.
When using a simple select query with 1,000 OR clauses I get a very long
response time (100 s for an index with 2M images).

Is there an efficient way to build the query in this case?


Re: Using solr for image retrieval - very long response time

2014-07-04 Thread Jack Krupansky
I would expect an excessively long query (greater than dozens or low 
hundreds of terms) to run significantly slower, but 100 seconds does seem 
excessively slow even for a 1000-term query.


Add the debugQuery=true parameter to your query request and check the 
timing section to see if the time is spent in the query process or some 
other stage of processing.
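
As a rough sketch of what such a request could look like, the plain-Java
helper below builds an OR query over a list of visual-word tokens and appends
debugQuery=true to a select URL. The field name visual_words, the tokens and
the base URL are made-up placeholders, not values from this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

/** Builds a select URL for a many-term OR query with debugQuery enabled. */
final class DebugQueryUrl {

    /** Joins the terms into field:(t1 OR t2 OR ...). Terms are not escaped here. */
    static String orQuery(String field, List<String> terms) {
        return field + ":(" + String.join(" OR ", terms) + ")";
    }

    /** URL-encodes the query and adds debugQuery=true so the timing section is returned. */
    static String selectUrl(String baseUrl, String q) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(q, "UTF-8") + "&debugQuery=true";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Hypothetical core URL and visual-word tokens.
        String q = orQuery("visual_words", Arrays.asList("w17", "w42", "w99"));
        System.out.println(selectUrl("http://localhost:8983/solr/images", q));
    }
}
```

With around 1,000 terms the URL gets long, so sending the same parameters in
a POST body may be the more practical choice.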


How is your JVM heap usage? Make sure you have enough heap but not too much. 
Are a lot of GCs occurring?


Does your index fit entirely in OS system memory for file caching? If not, 
you could be incurring tons of IO.


-- Jack Krupansky




Re: Using solr for image retrieval - very long response time

2014-07-04 Thread Yossi Biton
1. debugQuery shows almost all of the time spent in query.

2. I can't look at the heap right now, but I remember I allocated 4 GB for
the JVM and it's far from being fully used.
Regarding GC, I'm not sure how to check it (gc.log?).

3. The whole index fits in memory during the query.
 On Jul 4, 2014 3:31 PM, "Jack Krupansky"  wrote:

> I would expect an excessively long query (greater than dozens or low
> hundreds of terms) to run significantly slower, but 100 seconds does seem
> excessively slow even for a 1000-term query.
>
> Add the debugQuery=true parameter to your query request and check the
> timing section to see if the time is spent in the query process or some
> other stage of processing.
>
> How is your JVM heap usage? Make sure you have enough heap but not too
> much. Are a lot of GCs occurring?
>
> Does your index fit entirely in OS system memory for file caching? If not,
> you could be incurring tons of IO.
>
> -- Jack Krupansky
>
> -Original Message- From: Yossi Biton
> Sent: Friday, July 4, 2014 7:25 AM
> To: solr-user@lucene.apache.org
> Subject: Using solr for image retrieval - very long response time
>
> Hello there,
>
> Recently I have been trying to implement the bag-of-words model for image
> retrieval using Solr. In short, this model consists of extracting "visual
> words" from images and then using a tf-idf scheme for fast querying (usually
> followed by a re-ranking stage).
> I found Solr to be a suitable platform (I hope I'm not wrong), as it provides
> tf-idf ranking.
>
> I'm currently facing the following problem:
> My images usually contain about 1,000 words, which means the query
> consists of 1,000 terms.
> When using a simple select query with 1,000 OR clauses I get a very long
> response time (100 s for an index with 2M images).
>
> Is there an efficient way to build the query in this case?
>


multilingual search

2014-07-04 Thread benjelloun
Hello,

What I need to do is detect the language of my fields. Then, when I search
with the "/select" RequestHandler, how can I have the search detect the
language of the query words and choose which field_langid to use?

my conf:


   
   
 true
 NomDocument,ContenuDocument,Postit,
 
 language_s
 en,fr,ar
 fr
 0.6
 true
 true 
 true

   















 
   explicit
   10
   edismax
   
   AllChamp^2.0 AllChamp_ar^2.0 AllChamp_en^2.0 AllChamp_fr^5.0
   
 


Example of a search in Solr Admin: "nous présentons", which is French,
and "nous" is in stopwords_fr.
But when I search for "nous présontons" I get matches on "nous" because I
have some English docs which contain "nous".

This is just one example for one language. I don't want to add stopwords_fr
to stopwords_en.
What I want is to detect the language before the select search and then
choose the field_langid for the search.

Best regards,
Anass BENJELLOUN








--
View this message in context: 
http://lucene.472066.n3.nabble.com/multilingual-search-tp4145639.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: multilingual search

2014-07-04 Thread Jack Krupansky
What leads you to believe that the user is not interested in occurrences of 
the French phrase in English text? I mean, we English-speakers and writers 
like to use French phrases to show how sophisticated we are! It's part of 
our... raison d'être. If I do a Google search for "raison d'être", it 
doesn't mysteriously show me only French documents.


So, usually, it needs to be a user preference - the user's preferred 
language, and whether they want to search across documents in all languages 
or just a subset of languages. And then, on the results page you can show 
the language and a button to restrict a re-query to the specific language.


If you really need to do this query language detection, the best approach is 
to do it within your application layer (you can use the Google code for 
language detection) and then send the query to the appropriate query request 
handler, with a separate query request handler for each language that 
optimizes the settings for that language, such as the language-specific 
fields to use for the "qf" parameter.


-- Jack Krupansky




Solr 4.7 Payload

2014-07-04 Thread Ranjith Venkatesan
Hi all,

I am evaluating Lucene payloads, using Solr 4.7.2. I am able to
index with payloads, but I am not able to retrieve the payload from
DocsAndPositionsEnum; it just returns null, even though terms.hasPayloads()
returns true. I can also see the payload value in Luke (image attached
below).

I have the following schema for the payload field:

*schema.xml*
   
 
  

 
  


*My indexing code,*

for (int i = 1; i <= 1000; i++)
{
    SolrInputDocument doc1 = new SolrInputDocument();
    doc1.addField("id", "test:" + i);
    doc1.addField("uid", "" + i);
    doc1.addField("payloads", "_UID_|" + i + "f");
    doc1.addField("content", "test");

    server.add(doc1);
    if (i % 1 == 0) // i % 1 is always 0, so this commits after every document
    {
        server.commit();
    }
}

server.commit();

*Search code :*
DocsAndPositionsEnum termPositionsEnum =
    solrSearcher.getAtomicReader().termPositionsEnum(t);
int doc = -1;

while ((doc = termPositionsEnum.nextDoc()) !=
       DocsAndPositionsEnum.NO_MORE_DOCS)
{
    System.out.println(termPositionsEnum.getPayload()); // returns null
}


*luke *
 

Am I missing some configuration, or am I doing this the wrong way? Any help in
resolving this issue will be appreciated.

Thanks in advance

Ranjith Venkatesan



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-7-Payload-tp4145641.html


Re: multilingual search

2014-07-04 Thread Paul Libbrecht
To do just what Jack described, I often write a Solr query component that does 
"query expansion".
Based on some parameters I can recognize as a language hint (e.g. the 
language of the environment they search in, the browser's accept-language) I 
reformulate the query into a query over the fields for these languages, in 
order of preference.

I am sure that doing this produces some noise, e.g. because the search corpus 
is not uniformly spread, but I have to accept it.

There are many other examples besides Jack's fine "raison d'être" example (I 
particularly like the way he describes the motivation for using it; I can 
almost hear people trying to carefully articulate this! ;-)).
Other examples of language cross-use include the "gallicisms", e.g. in German: 
http://de.wikipedia.org/wiki/Liste_von_Gallizismen or other languages linked 
there.

E.g. "direction", which has different meanings in French (where it can mean 
the management staff) and in English (where it can mean the teacher's 
instruction), "demonstration" too, and "sitting" (which is an English word 
used in French).


paul

On 4 juil. 2014, at 17:15, "Jack Krupansky"  wrote:

> What leads you to believe that the user is not interested in occurrences of 
> the French phrase in English text? I mean, we English-speakers and writers 
> like to use French phrases to show how sophisticated we are! It's part of 
> our... raison d'être. If I do a Google search for "raison d'être", it doesn't 
> mysteriously show me only French documents.
> 
> So, usually, it needs to be a user preference - the user's preferred 
> language, and whether they want to search across documents in all languages 
> or just a subset of languages. And then, on the results page you can show the 
> language and a button to restrict a re-query to the specific language.
> 
> If you really need to do this query language detection, the best approach is 
> to do it within your application layer (you can use the Google code for 
> language detection) and then send the query to the appropriate query request 
> handler, with a separate query request handler for each language that 
> optimizes the settings for that language, such as the language-specific 
> fields to use for the "qf" parameter.
> 
> -- Jack Krupansky
> 
> -Original Message- From: benjelloun
> Sent: Friday, July 4, 2014 10:52 AM
> To: solr-user@lucene.apache.org
> Subject: multilingual search
> 
> Hello,
> 
> what i need to do is to detect language of my fields then when i search with
> "/select  RequestHandler"
> how can i define for a search to detect the language of words to choose
> which field_langid use.
> 
> my conf:
> 
> 
>   class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
>  
>true
>NomDocument,ContenuDocument,Postit,
> 
>language_s
>en,fr,ar
>fr
>0.6
>true
>true
>true
> 
>  
> 
> 
>  required="false" stored="false"/>
>  required="false" stored="false"/>
>  required="false" stored="false"/>
> 
>  required="false" multiValued="true"/>
>  required="false" multiValued="true"/>
>  required="false" multiValued="true"/>
> 
> 
> 
> 
> 
> 
>
>  explicit
>  10
>  edismax
>  
>  AllChamp^2.0 AllChamp_ar^2.0 AllChamp_en^2.0 AllChamp_fr^5.0
>  
>
> 
> 
> Example of a search in Solr Admin: "nous présentons", which is French,
> and "nous" is in stopwords_fr.
> But when I search for "nous présontons" I get matches on "nous" because I
> have some English docs which contain "nous".
> 
> This is just one example for one language. I don't want to add stopwords_fr
> to stopwords_en.
> What I want is to detect the language before the select search and then
> choose the field_langid for the search.
> 
> Best regards,
> Anass BENJELLOUN
> 
> 
> 
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/multilingual-search-tp4145639.html
> Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Field for 'species' data?

2014-07-04 Thread Dan Bolser
The problem is that each document has a single species (or
super-species, or sub-species), and needs to get information about
its place in the hierarchy 'elsewhere', i.e. in an externally encoded
hierarchy.

I don't, for example, have data in this format:
SPECIES: "Hordeum / Hordeum vulgare / Hordeum vulgare var. hybernum"

but rather
SPECIES: "Hordeum vulgare".

How can I add in that data at analysis time?


Cheers,
Dan.


On 4 July 2014 04:19, Gora Mohanty  wrote:
> On 3 July 2014 21:40, Dan Bolser  wrote:
>>
>> Hi,
>>
>> Does anyone on the list have experience with hierarchical facets,
>> specifically for species data?
> [...]
>
> Maybe not specifically for species data, but hierarchical faceting works
> pretty well with Solr. Please see
> http://wiki.apache.org/solr/HierarchicalFaceting
> For your use case, I would probably use pivot facets:
> http://wiki.apache.org/solr/HierarchicalFaceting#Pivot_Facets
>
> Regards,
> Gora


Re: multilingual search

2014-07-04 Thread Jack Krupansky
Indeed, a Solr search component to customize the incoming query for query 
language can work as well. Add it to the search components before the 
"query" component, have it call the language detection code on the q 
parameter, and then modify the "qf" parameter based on the language 
discovered.


Two possible approaches come to mind:

1. Modify the qf parameter directly by either adding the "_xx" language 
suffix to each field in qf, or replacing the "xx" for any qf fields that 
already have an "_xx" suffix.
2. Have separate "qf_xx" parameters which are customized for specific 
languages and then copy the language-specific "qf_xx" parameter to the main 
qf parameter based on the language that is detected.
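
Both approaches boil down to a little string handling once the language code
is known. The plain-Java sketch below shows only that part: language
detection and the wiring into a search component are left out, and the field
names and the two-letter "_xx" suffix pattern are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Two ways to derive a language-specific qf value from a detected language code. */
final class QfRewriter {

    /** Approach 1: add or replace the "_xx" suffix on every field in qf, keeping boosts. */
    static String suffixFields(String qf, String lang) {
        if (qf.trim().isEmpty()) {
            return "";
        }
        StringJoiner out = new StringJoiner(" ");
        for (String entry : qf.trim().split("\\s+")) {
            int caret = entry.indexOf('^');
            String field = caret < 0 ? entry : entry.substring(0, caret);
            String boost = caret < 0 ? "" : entry.substring(caret);
            // Drop an existing two-letter language suffix such as "_en".
            field = field.replaceFirst("_[a-z]{2}$", "");
            out.add(field + "_" + lang + boost);
        }
        return out.toString();
    }

    /** Approach 2: use the "qf_xx" parameter if present, otherwise fall back to plain qf. */
    static String pickQf(Map<String, String> params, String lang) {
        String specific = params.get("qf_" + lang);
        return specific != null ? specific : params.get("qf");
    }

    public static void main(String[] args) {
        // prints title_fr^2.0 body_fr
        System.out.println(suffixFields("title^2.0 body_en", "fr"));

        Map<String, String> params = new HashMap<>();
        params.put("qf", "text");
        params.put("qf_fr", "text_fr^5.0");
        // prints text_fr^5.0
        System.out.println(pickQf(params, "fr"));
    }
}
```

Note that approach 1 would also strip any non-language suffix that happens to
look like "_xx", so it only suits schemas with a consistent naming scheme.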


-- Jack Krupansky





Re: multilingual search

2014-07-04 Thread Paul Libbrecht
> 1. Modify the qf parameter directly by either adding the "_xx" language 
> suffix to each field in qf, or replacing the "xx" for any qf fields that 
> already have an "_xx" suffix.
> 2. Have separate "qf_xx" parameters which are customized for specific 
> languages and then copy the language-specific "qf_xx" parameter to the main 
> qf parameter based on the language that is detected.

The mix I use there is a little more subtle.
For this I need to remove the stock query component and wrap it in my own.
Any term that does not name a field explicitly gets the multilingual expansion.
To do this, I call the query parser with a "wild" (placeholder) default field, 
then walk the parsed query and expand every term query on that wild field 
into a disjunction over the detected languages of interest, each analyzed 
for its language.
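
The expansion step on its own can be sketched as string building in plain
Java, as below. This is not the actual component: a real one would rewrite
parsed query objects rather than strings, and the base field name, the
descending boosts and the unescaped term are assumptions made for
illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

/** Expands one unfielded term into a disjunction over per-language fields. */
final class MultilingualExpansion {

    /**
     * @param baseField field name without language suffix (hypothetical)
     * @param term      the bare term; not escaped in this sketch
     * @param languages non-empty list of language codes in preference order
     */
    static String expand(String baseField, String term, List<String> languages) {
        StringJoiner or = new StringJoiner(" OR ", "(", ")");
        int n = languages.size();
        for (int i = 0; i < n; i++) {
            int boost = n - i; // earlier (preferred) languages get higher boosts
            or.add(baseField + "_" + languages.get(i) + ":" + term + "^" + boost);
        }
        return or.toString();
    }

    public static void main(String[] args) {
        // prints (text_fr:raison^2 OR text_en:raison^1)
        System.out.println(expand("text", "raison", Arrays.asList("fr", "en")));
    }
}
```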

This is also a way to search across exact/stemmed/phonetic fields and, for 
example, to prefer a match in the title to a match in the body.
This assumes, of course, that these fields exist in each language (and that 
metaphone works, say, for German, for which no evidence exists yet).

paul

Re: java.net.SocketException: Connection reset

2014-07-04 Thread heaven
Today this happened again, plus this one:
null:java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:181)
at org.apache.http.impl.io.ChunkedOutputStream.flushCache(ChunkedOutputStream.java:111)
at org.apache.http.impl.io.ChunkedOutputStream.flush(ChunkedOutputStream.java:193)
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner$1.writeTo(ConcurrentUpdateSolrServer.java:206)
at org.apache.http.entity.EntityTemplate.writeTo(EntityTemplate.java:69)
at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:89)
at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

Previously we had all 4 instances on a single node, so I thought these errors
might be the result of high load, e.g. some request taking too long to
complete or something like that. And we always had docs missing from the index
or, vice versa, docs remaining in the index when they shouldn't (even
though it is supposed to recover from the log, and our index queue never
removes docs until it gets a successful response from Solr).

But now we run shards and replicas on separate nodes with lots of resources
and very fast disk storage, and it still causes weird errors. It seems
Solr is buggy as hell; that's my impression after a few years of usage. And
it doesn't get better in this respect; these errors have followed us from the
very beginning.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/java-net-SocketException-Connection-reset-tp4145519p4145675.html


Re: Field for 'species' data?

2014-07-04 Thread Jack Krupansky
I haven't fully digested your species hierarchy requirements, but you can do 
just about anything in a Solr update processor. So you can parse the string 
and then put pieces into different fields to represent portions of the 
hierarchy. Then at query time, your application facet navigation is simply 
using the prefix from the facet selection as a filter on one of those fields 
in which the various hierarchy components are stored.


Alternatively, you may be able to get by using the path hierarchy tokenizer:
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizerFactory.html

Or maybe a combination of the two approaches.

I think I have some examples of it in my e-book.

-- Jack Krupansky





Re: Field for 'species' data?

2014-07-04 Thread Dan Bolser
I think I need to look up the given species value in a taxonomy, build the
'path' and pass the result to the path hierarchy tokenizer or similar. I
figure I'll do this with a field analyzer.
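
A minimal sketch of that lookup in plain Java, assuming the external taxonomy
is available as a child -> parent map (that representation and the "/"
delimiter are assumptions). The second method lists the path prefixes,
roughly what a path hierarchy tokenizer would emit for the built path.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Builds a hierarchy path for a taxon and lists the prefixes of that path. */
final class TaxonomyPath {

    /** Walks child -> parent links up to the root and joins the lineage root-first. */
    static String buildPath(String taxon, Map<String, String> parentOf, String delimiter) {
        List<String> lineage = new ArrayList<>();
        String current = taxon;
        while (current != null) {
            if (lineage.contains(current)) {
                throw new IllegalStateException("cycle in taxonomy at " + current);
            }
            lineage.add(current);
            current = parentOf.get(current); // null once the root is reached
        }
        Collections.reverse(lineage);
        return String.join(delimiter, lineage);
    }

    /** Returns every prefix of the path that ends at a delimiter, plus the full path. */
    static List<String> prefixes(String path, String delimiter) {
        List<String> out = new ArrayList<>();
        int from = 0;
        while (true) {
            int idx = path.indexOf(delimiter, from);
            if (idx < 0) {
                out.add(path);
                break;
            }
            out.add(path.substring(0, idx));
            from = idx + delimiter.length();
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical taxonomy fragment using the names from this thread.
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("Hordeum vulgare", "Hordeum");
        parentOf.put("Hordeum vulgare var. hybernum", "Hordeum vulgare");

        String path = buildPath("Hordeum vulgare", parentOf, "/");
        System.out.println(path);                // prints Hordeum/Hordeum vulgare
        System.out.println(prefixes(path, "/")); // prints [Hordeum, Hordeum/Hordeum vulgare]
    }
}
```

The same two steps would work equally well inside an update processor, which
sees the whole document and can write the result to a separate field.
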
On 4 Jul 2014 22:30, "Jack Krupansky"  wrote:

> I haven't fully digested your species hierarchy requirements, but you can
> do just about anything in a Solr update processor. So you can parse the
> string and then put pieces into different fields to represent portions of
> the hierarchy. Then at query time, your application facet navigation is
> simply using the prefix from the facet selection as a filter on one of
> those fields in which the various hierarchy components are stored.
>
> Alternatively, you may be able to get by using the path hierarchy
> tokenizer:
> http://lucene.apache.org/core/4_0_0/analyzers-common/org/
> apache/lucene/analysis/path/PathHierarchyTokenizerFactory.html
>
> Or maybe a combination of the two approaches.
>
> I think I have some examples of it in my e-book.
>
> -- Jack Krupansky
>
> -Original Message- From: Dan Bolser
> Sent: Friday, July 4, 2014 11:57 AM
> To: solr-user
> Subject: Re: Field for 'species' data?
>
> The problem is that each document has a single species (or
> super-species, or sub-species), and needs to get information about
> its place in the hierarchy 'elsewhere', i.e. in an externally encoded
> hierarchy.
>
> I don't, for example, have data in this format:
> SPECIES: "Hordeum / Hordeum vulgare / Hordeum vulgare var. hybernum"
>
> but rather
> SPECIES: "Hordeum vulgare".
>
> How can I add in that data at analysis time?
>
>
> Cheers,
> Dan.
>
>
> On 4 July 2014 04:19, Gora Mohanty  wrote:
>
>> On 3 July 2014 21:40, Dan Bolser  wrote:
>>
>>>
>>> Hi,
>>>
>>> Does anyone on the list have experience with hierarchical facets,
>>> specifically for species data?
>>>
>> [...]
>>
>> Maybe not specifically for species data, but hierarchical faceting works
>> pretty well with Solr. Please see
>> http://wiki.apache.org/solr/HierarchicalFaceting
>> For your use case, I would probably use pivot facets:
>> http://wiki.apache.org/solr/HierarchicalFaceting#Pivot_Facets
>>
>> Regards,
>> Gora
>>
>
>


Re: How to get related facets using Solr query ?

2014-07-04 Thread shamik
Thanks for the pointer, Erik. You are right, I forgot to include "IJK" under
AB. Also, the facet field names are different. Unfortunately, I'm using
SolrCloud and facet pivot doesn't seem to work in distributed mode. I do
get some results back if I use distrib=false, but then it's not the right
data.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-related-facets-using-Solr-query-tp4145580p4145684.html


Re: Field for 'species' data?

2014-07-04 Thread Alexandre Rafalovitch
Do that with a custom update request processor.

Just remember Solr is there to find things, not to preserve structure. So
mangle your data until you can find it.

Also check if SirenDB would fit your requirements if you want to encode the
information as complex structure.

Regards,
Alex