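The field type set for these fields in your schema.xml is probably tokenizing 
the values (the stock "text" type also lowercases and stems them). Change the 
field type to one that doesn't; e.g. "text_ws" instead of "text" stops the 
splitting on punctuation. Note that "text_ws" still splits on whitespace, so 
for faceting on whole values such as "Live muziek" a non-tokenized type like 
"string" is the safer choice.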
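
For faceting on whole values, one common option is a non-analyzed field type. 
The fragment below is a minimal sketch, assuming the stock example schema.xml 
where a "string" type backed by solr.StrField is defined; the multiValued 
settings are guesses based on the <arr> elements in the output quoted below, 
and your other attributes may differ:

```xml
<!-- schema.xml (sketch): StrField indexes each value as one verbatim term, -->
<!-- so facet values are not split, lowercased or stemmed.                  -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- Field names are taken from the original message; attributes are assumed. -->
<field name="province" type="string" indexed="true" stored="true"/>
<field name="theme"    type="string" indexed="true" stored="true" multiValued="true"/>
<field name="features" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="services" type="string" indexed="true" stored="true" multiValued="true"/>
```

You need to re-index after changing a field type, since terms already in the 
index are not rewritten. If you also need analyzed full-text search on these 
fields, a copyField into a separate string field used only for faceting is a 
common alternative.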

-----Original Message-----
From: PeterKerk [mailto:vettepa...@hotmail.com] 
Sent: Wednesday, August 04, 2010 3:36 PM
To: solr-user@lucene.apache.org
Subject: Indexing fieldvalues with dashes and spaces


I'm having issues with indexing field values containing spaces and dashes.
For example: I'm trying to index province names of the Netherlands. Some 
province names contain a "-":
Zuid-Holland
Noord-Holland

My data-config has this:
            <entity name="location_province" query="select provinceid from locations where id=${location.id}">
                <entity name="provinces" query="select title from provinces where id = ${location_province.provinceid}">
                    <field name="province" column="title"  />
                </entity>
            </entity>


When I check what has been indexed, I have this:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">*:*</str>
      <str name="version">2.2</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="3" start="0">
    <doc>
      <str name="city">Nijmegen</str>
      <arr name="features">
        <str>Tuin</str>
        <str>Cafe</str>
      </arr>
      <str name="id">1</str>
      <str name="province">Gelderland</str>
      <arr name="services">
        <str>Fotoreportage</str>
      </arr>
      <arr name="theme">
        <str>Gemeentehuis</str>
      </arr>
      <date name="timestamp">2010-08-04T19:11:51.796Z</date>
      <str name="title">Gemeentehuis Nijmegen</str>
    </doc>
    <doc>
      <str name="city">Utrecht</str>
      <arr name="features">
        <str>Tuin</str>
        <str>Cafe</str>
        <str>Danszaal</str>
      </arr>
      <str name="id">2</str>
      <str name="province">Utrecht</str>
      <arr name="services">
        <str>Fotoreportage</str>
        <str>Exclusieve huur</str>
      </arr>
      <arr name="theme">
        <str>Gemeentehuis</str>
      </arr>
      <date name="timestamp">2010-08-04T19:11:51.796Z</date>
      <str name="title">Gemeentehuis Utrecht</str>
    </doc>
    <doc>
      <str name="city">Bloemendaal</str>
      <arr name="features">
        <str>Strand</str>
        <str>Cafe</str>
        <str>Danszaal</str>
      </arr>
      <str name="id">3</str>
      <str name="province">Zuid-Holland</str>
      <arr name="services">
        <str>Exclusieve huur</str>
        <str>Live muziek</str>
      </arr>
      <arr name="theme">
        <str>Strand & Zee</str>
      </arr>
      <date name="timestamp">2010-08-04T19:11:51.812Z</date>
      <str name="title">Beachclub Vroeger</str>
    </doc>
  </result>
</response>



So we see that the full field has been indexed:
<str name="province">Zuid-Holland</str>


BUT, when I check the facets via
http://localhost:8983/solr/db/select/?wt=json&indent=on&q=*:*&fl=id,title,city,score,features,official,services&facet=true&facet.field=theme&facet.field=features&facet.field=province&facet.field=services

I get this (snippet):
"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
        "theme":[
         "Gemeentehuis",2,
         "&",1,               <================ a
         "Strand",1,
         "Zee",1],
        "features":[
         "cafe",3,
         "danszaal",2,
         "tuin",2,
         "strand",1],
        "province":[
         "gelderland",1,
         "holland",1,
         "utrecht",1,
         "zuid",1,         <================  b
         "zuidholland",1],
        "services":[
         "exclusiev",2,
         "fotoreportag",2, <================  c
         "huur",2,
         "live",1,          <================  d
         "muziek",1]},


Several weird things happen here, which I have marked with <===:

a. the full field value is "Strand & Zee", but now "&" is a separate facet
b. the full field value is "Zuid-Holland", but now "zuid" is a separate facet
c. the full field value is "Fotoreportage", but somehow the last character has 
been truncated ("fotoreportag")
d. the full field value is "Live muziek", but now "live" and "muziek" have 
become separate facets

What can I do about this?


