Re: Problems using fieldType text_general in copyField

John Bickerstaff Thu, 04 Aug 2016 15:22:57 -0700

Thanks!

The schema is a copy of the techproducts sample.


Entire include here - and I take your point about the possibility of
malformation - thanks.

I assumed (perhaps wrongly) that I could duplicate the <schema ...>
 </schema> arrangement from the schema.xml file.

I'm unfamiliar with xml entity includes, but I'll go look them up...

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.6">

   <!-- ngram field to support suggestions / lookahead search on title (and
category, contentType)-->
   <copyField source="foobar" dest="text"/>
   <field name="suggestion_ngram_for_title" type="text_suggest_ngram"
indexed="true" stored="false"/>
   <field name="displayurl" type="text_general" indexed="true"
stored="true" multiValued="false"/>
   <field name="productVersionId" type="string" indexed="true"
stored="true" multiValued="false"/>
   <field name="caption" type="text_general" indexed="true" stored="true"
multiValued="false"/>
   <field name="documentId" type="string" indexed="true" stored="true"
multiValued="false"/>
   <!--<field name="category" type="string" indexed="true" stored="true"
multiValued="true"/>-->
   <field name="contentType" type="text_special_synonym" indexed="true"
stored="true" multiValued="false"/>
   <!-- Do NOT assume that much thought went into using int on the
following field. This is testing only!-->
   <field name="preference_" type="int" indexed="true" stored="true"
multiValued="false"/>

   <field name="meta_doc_type" type="text_general" indexed="true"
stored="true" multiValued="false"/>
   <!--<field name="content" type="text_general" indexed="true"
stored="true" multiValued="false"/>-->

   <!-- STATdx Weighting fields here. These are not part of the document,
but are used to calculate relevancy scores -->
   <field name="category_weight"  type="double" indexed="true"
 stored="true"/>    <!-- used for rule one - weighting docs on general
usefulness -->

   <!-- Main body of document extracted by SolrCell.
        NOTE: This field is not indexed by default, since it is also copied
to "text"
        using copyField below. This is to save space. Use this field for
returning and
        highlighting document content. Use the "text" field to search the
content. -->
   <field name="content" type="text_en" indexed="false" stored="true"
multiValued="true"/> <!-- HERE IS WHERE "CONTENT" IS DEFINED -->

<!-- test for parsing statdx-provided html in content field. text_html has
been modified to clean html -->
   <field name="html_content" type="text_html" indexed="true" stored="true"
multiValued="true"/>

   <!-- Text fields from SolrCell to search by default in our catch-all
field -->
   <copyField source="title" dest="text"/>
   <copyField source="author" dest="text"/>
   <copyField source="description" dest="text"/>
   <copyField source="keywords" dest="text"/>
   <copyField source="content" dest="text"/>  <!-- THROWING ERROR ABOUT
"CONTENT" NOT EXISTING HERE -->
   <copyField source="content_type" dest="text"/>
   <copyField source="resourcename" dest="text"/>
   <copyField source="url" dest="text"/>

   <!-- Create a string version of author for faceting -->
   <copyField source="author" dest="author_s"/>

  <!-- Above, multiple source fields are copied to the [text] field.
          Another way to map multiple source fields to the same
          destination field is to use the dynamic field syntax.
          copyField also supports a maxChars to copy setting.  -->

        <copyField source="*_en" dest="text"/>


    <!-- a copy of text_general. Used to handle the rule that says that
docs with "table"
         and "tsm" in the contentType field should show at the top of
results IF any of the
         following terms are in the search term submitted by the user:
         [TNM, AJCC, Stage, Staging, FIGO]   Note the special synonym file
in the xml below.
         Note to self: Expand this documentation if we end up adding more
"special" synonyms -->
    <fieldType name="text_special_synonym" class="solr.TextField"
positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
                <!-- in this example, we will only use synonyms at query
time
        <filter class="solr.SynonymFilterFactory"
synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
        <!-- Special synonym file here!!!!  -->
        <filter class="solr.SynonymFilterFactory"
synonyms="contentType_synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

</schema>
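
For reference, here is a minimal sketch of the XML entity include approach
suggested in the quoted reply below. The file name custom_fields.xml and the
entity name custom_fields are hypothetical, and whether Solr 5.4 resolves the
relative path against the core's conf directory (or ZooKeeper, in SolrCloud)
is an assumption that should be checked before relying on it. The main
schema.xml declares the entity in a DOCTYPE and references it where the
declarations should appear:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE schema [
  <!-- hypothetical file name; assumed to sit next to schema.xml -->
  <!ENTITY custom_fields SYSTEM "custom_fields.xml">
]>
<schema name="example" version="1.6">

  <!-- stock fieldType, field and uniqueKey declarations stay here -->

  <!-- the XML parser substitutes the contents of the entity file here -->
  &custom_fields;

</schema>
```

The entity file is then a bare fragment with no <schema> wrapper, so its
declarations end up as direct children of the one real <schema> element:

```xml
<field name="content" type="text_en" indexed="false" stored="true"
       multiValued="true"/>
<copyField source="content" dest="text"/>
```

The difference from xinclude is that an included document must be well-formed
on its own and so needs a single root element, which is then inserted along
with its children. An external entity has no such requirement.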



On Thu, Aug 4, 2016 at 3:55 PM, Chris Hostetter <hossman_luc...@fucit.org>
wrote:

>
> you mentioned that the problem only happens when you use xinclude, but you
> haven't shown us the details of your xinclude -- what exactly does your
> schema.xml look like (with the xinclude call) and what exactly does the
> file being included look like (entire contents)
>
> (I suspect the problem you are seeing is related to the way xinclude
> doesn't really support "snippets" of malformed xml, and instead requires
> some root tag -- i can't imagine what root tag you are using in the
> included file that would play nicely with mixing/matching field
> declarations. ... using xml entity includes may be a simpler/safer option)
>
>
>
> : Date: Thu, 4 Aug 2016 15:47:00 -0600
> : From: John Bickerstaff <j...@johnbickerstaff.com>
> : Reply-To: solr-user@lucene.apache.org
> : To: solr-user@lucene.apache.org
> : Subject: Re: Problems using fieldType text_general in copyField
> :
> : I would call this a bug...
> :
> : I'm going out on a limb and say that if you define a field in the
> included
> : XML file, you will get this error.
> :
> : As long as the field is defined first in schema.xml, you can "copyField"
> it
> : or whatever in the include file, but apparently fields MUST be created in
> : the schema.xml file.
> :
> : That makes use of the include for custom things somewhat moot - at least
> in
> : my situation.
> :
> : I'd love to be wrong by the way, but that's what my tests suggest right
> : now...
> :
> : On Thu, Aug 4, 2016 at 1:37 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> : wrote:
> :
> : > Summary:
> : >
> : > Using xinclude to include an xml file into schema.xml
> : >
> : > The following line
> : >
> : > <copyField source="content" dest="text"/>
> : >
> : > generates an error:  about a field being "not a glob and not matching
> an
> : > explicit field" even though I declare the field in the line just above.
> : >
> : > This seems to happen only for fieldType text_general?
> : >
> : > ============
> : >
> : > Explanation:
> : >
> : > I need a little help - keep getting an error when trying to use the
> : > ability to include an additional XML file.  I may be overlooking
> something,
> : > but if so, I need help to see it.
> : >
> : > I have the following two lines which throw zero errors when part of
> : > schema.xml:
> : >
> : > <field name="content" type="text_general" indexed="false" stored="true"
> : > multiValued="true"/>
> : >  <copyField source="content" dest="text"/>
> : >
> : > However, when I put this into an include file and use xinclude, then I
> get
> : > this error when starting Solr.
> : >
> : >
> : >
> : >    - *statdx_shard1_replica3:* org.apache.solr.common.
> : >    SolrException:org.apache.solr.common.SolrException: Could not load
> : >    conf for core statdx_shard1_replica3: Can't load schema schema.xml:
> : >    copyField source :'content' is not a glob and doesn't match any
> explicit
> : >    field or dynamicField.
> : >
> : >
> : > Given that I am defining the field in the line right above the
> copyField
> : > statement, I'm confused about why this works fine in schema.xml but
> NOT in
> : > an included file.
> : >
> : > I experimented and found that any field of type "text_general" will
> throw
> : > this same error if it is part of the included xml file.  Other
> fieldTypes
> : > that I tried (string, int, double) did not have this issue.
> : >
> : > I'm using Solr 5.4, although I'm pulling custom config into an included
> : > file for purposes of moving to 6.1
> : >
> : > I have the following list of copyField commands in the included xml
> file,
> : > and get no errors on any but the "content" one.  It just so happens
> that
> : > "content" is the only field of type "text_general" in there.
> : >
> : >
> : > Any hints greatly appreciated.
> : >
> : >   <copyField source="title" dest="text"/>
> : >    <copyField source="author" dest="text"/>
> : >    <copyField source="description" dest="text"/>
> : >    <copyField source="keywords" dest="text"/>
> : >    <copyField source="content" dest="text"/>
> : >    <copyField source="content_type" dest="text"/>
> : >    <copyField source="resourcename" dest="text"/>
> : >    <copyField source="url" dest="text"/>
> : >
> : >
> :
>
> -Hoss
> http://www.lucidworks.com/
>
