Is what is shown in "analysis" the same as what is stored in a field?

I am confusing myself pretty thoroughly:

I have some field types and fields:
  <fieldType name="string_raw" class="solr.TextField"
sortMissingLast="true" omitNorms="true">
     <analyzer type="index">
          <tokenizer class="solr.KeywordTokenizerFactory"/>
     </analyzer>
  </fieldType>

<fieldType name="stems" class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
   <!-- tokenizer listed first, then the filters in the order they apply -->
   <tokenizer class="solr.StandardTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory" />
   <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
   <filter class="solr.EnglishPossessiveFilterFactory"/>
   <filter class="solr.PorterStemFilterFactory"/>
   <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
 </analyzer>
</fieldType>

  <fieldType name="everything" class="solr.TextField"
positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

<field name="stuff_raw" type="string_raw" indexed="true" stored="true"
multiValued="false" />
<field name="stuff_stems" type="stems" indexed="true" stored="true"
multiValued="false" />
 <field name="stuff_everything" type="everything" indexed="true"
stored="true" multiValued="true" />


And I have these copyField rules:
 <copyField source="stuff_raw" dest="stuff_everything"/>
 <copyField source="stuff_raw" dest="stuff_stems"/>
 <copyField source="stuff_stems" dest="stuff_everything"/>


I run this through the analyzer for stuff_stems:
"the quick brown fox jumped over the sleeping dog"

It prints out a bunch of stuff but the last thing it says is:
"quick brown fox jump over sleep dog"

So far so good.
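
For anyone who wants to reproduce that analysis run outside the admin UI, here is a minimal sketch that builds a request URL for Solr's field analysis handler (/analysis/field). The host, port, and collection name are placeholders I made up, not values from my setup, and the helper name field_analysis_url is purely illustrative.

```python
from urllib.parse import urlencode

# Assumptions: base URL and collection name are placeholders, not real values.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "mycollection"


def field_analysis_url(field_name, text, base=SOLR_BASE, collection=COLLECTION):
    """Build a URL for Solr's field analysis handler for one field and value."""
    params = {
        "analysis.fieldname": field_name,   # field whose index analyzer is applied
        "analysis.fieldvalue": text,        # text to run through that analyzer
        "wt": "json",
    }
    return f"{base}/{collection}/analysis/field?{urlencode(params)}"


url = field_analysis_url(
    "stuff_stems",
    "the quick brown fox jumped over the sleeping dog",
)
print(url)
```

Fetching that URL should return the token stream after each stage of the analyzer chain, which is the same information the Analysis screen displays.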

So I indexed a document with "the quick brown fox jumped over the
sleeping dog" set for stuff_raw. When I query for the document,
stuff_stems just has "the quick brown fox jumped over the sleeping
dog" and NOT "quick brown fox jump over sleep dog".

Also stuff_everything only contains a single item, which is weird
because I copy two things into it.

In fact here is everything:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":0,
    "params":{
      "q":"*:*",
      "wt":"json"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":1,
        "stuff_raw":"the quick brown fox jumped over the sleeping dog",
        "stuff_stems":"the quick brown fox jumped over the sleeping dog",
        "stuff_everything":["the quick brown fox jumped over the sleeping dog"],
        "_version_":1664899022194737152,
        "timestamp":"2020-04-24T23:37:16.877Z",
        "score":1.0},
      {
        "id":2,
        "stuff_raw":"jumped jumping jumper",
        "stuff_stems":"jumped jumping jumper",
        "stuff_everything":["jumped jumping jumper"],
        "_version_":1664899046865633280,
        "timestamp":"2020-04-24T23:37:40.404Z",
        "score":1.0}]
  }}
