Hi Erick,

OK, it's clearer now. I do indeed have the whitespace tokenizer:

    <fieldType name="textTrue" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords_dutch.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="0" generateNumberParts="0" catenateWords="1"
                catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Dutch"
                protected="protwords.txt"/>
      </analyzer>
    </fieldType>


The problem is this: I have a field value "Beach & Sea", which is a theme
for a location. Because of the whitespace tokenizer, it gets split up into
two separate facet values:
         "Beach",2,
         "Sea",2
(see below)

Of course, those individual values are NOT correct facet names, because the
facet should be "Beach & Sea".
But if I REMOVE the whitespace tokenizer, Solr throws an error saying that a
fieldType must always have a tokenizer.
So which tokenizer would I need in order to get the correct facet name?
(By the way, I've been checking this page:
http://lucene.apache.org/solr/api/org/apache/solr/analysis/package-summary.html)
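One possible answer, offered as a sketch rather than a verified fix: solr.KeywordTokenizerFactory treats the entire field value as a single token, so it satisfies the "every analyzer needs a tokenizer" rule without splitting on spaces. The type name `textFacet` is hypothetical. The filters from `textTrue` are deliberately left out, because the stop-word, word-delimiter and stemming filters would still alter or drop parts of a value like "Beach & Sea".

```xml
<!-- Hypothetical facet-oriented type: the whole value becomes one token -->
<fieldType name="textFacet" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- KeywordTokenizerFactory emits the full input ("Beach & Sea") as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```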


"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
        "themes":[
         "Gemeentehuis",2,
         "Beach",2,
         "Sea",2],
        "province":[
         "gelderland",1,
         "utrecht",1,
         "zuidholland",1],
        "services":[
         "exclusiev",2,
         "fotoreportag",2,
         "hur",2,
         "liv",1,
         "muziek",1]},
  "facet_dates":{}}}
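The `services` values above ("fotoreportag", "hur", "liv") look like Dutch Snowball stems, so faceting on the analyzed field exposes stemmed terms as well. A common pattern, sketched here under the assumption that `themes` and `services` must stay searchable through `textTrue`, is to copy each field into an unanalyzed solr.StrField and facet on the copy. The `*_facet` field names and the `multiValued` settings are assumptions, not taken from the original schema.

```xml
<!-- In <types>: StrField indexes the raw value without any analysis -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- In <fields>: hypothetical facet-only copies (multiValued assumed) -->
<field name="themes_facet" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="services_facet" type="string" indexed="true" stored="false" multiValued="true"/>

<!-- After </fields>: copyField copies the raw input, before textTrue analyzes it -->
<copyField source="themes" dest="themes_facet"/>
<copyField source="services" dest="services_facet"/>
```

Facet requests would then use `facet.field=themes_facet` instead of `facet.field=themes`, while full-text queries keep going to the original fields.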



-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1052554.html
Sent from the Solr - User mailing list archive at Nabble.com.
