Re: Upgrade to 5.2 from 4.6, no storing of text

Alessandro Benedetti Tue, 30 Jun 2015 07:51:19 -0700

Instead of your immense schema, can you give us the details of the
highlighting you are trying to use?
And how are you trying to use it?
Which client? Direct API calls?

Let us know!

Cheers

2015-06-30 15:10 GMT+01:00 Mark Ehle <[email protected]>:

> Thanks to all for the help - it's now storing text and I can search and get
> results just as before in 4.6, but I cannot get snippets to appear when I ask
> for highlighting.
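Here is a minimal sketch of a highlighting setup. It assumes the stock "/select" search handler in solrconfig.xml and that snippets should come from "publication_text". The parameters hl, hl.fl, hl.snippets and hl.fragsize are standard Solr highlighting parameters, but the values below are only illustrative. The same parameters can also be passed on the query URL instead, for example &hl=true&hl.fl=publication_text. The default highlighter builds snippets from each hl.fl field's stored value, so the field needs stored="true".

```xml
<!-- solrconfig.xml: illustrative highlighting defaults for the search handler -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!-- turn highlighting on and name the field(s) to build snippets from -->
    <str name="hl">true</str>
    <str name="hl.fl">publication_text</str>
    <!-- at most 3 snippets per field, each roughly 150 characters long -->
    <int name="hl.snippets">3</int>
    <int name="hl.fragsize">150</int>
  </lst>
</requestHandler>
```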
>
>
> When I add documents, here is the URL my script generates:
>
>
> http://localhost:8080/solr/newspapers/update/extract?literal.id=2015_01_01_battlecreekenquirer-004&literal.publication_date=2015-01-01T00:00:00Z&literal.year=2015&literal.yearstr=2015&literal.day=1&literal.month_num=1&literal.month=01_January&literal.publication_name=Battle%20Creek%20Enquirer&literal.publication_type=newspaper&literal.short_name=battlecreekenquirer&literal.image_number=4&literal.filename=2015_01_01_battlecreekenquirer-004.pdf&literal.copyright_year=1923&literal.copyright_restricted=y&fmap.content=publication_text&stream.contentType=application%2Ftxt&stream.file=%2Farchive_data%2Fnewspapers%2FBattle%20Creek%20Enquirer%2F2015%2F01_January%2F2015_01_01_battlecreekenquirer%2Ftxt%2F2015_01_01_battlecreekenquirer-004.txt
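For comparison, here is a rough sketch of the document this request should end up adding, written as a Solr XML update message. It assumes two things about the extracting handler:

- It turns each literal.<field>=<value> parameter into a field value.
- Through fmap.content=publication_text, it stores the extracted file body in publication_text.

The publication_text value is a placeholder, not real data. Tika may also add metadata fields that are not shown here.

```xml
<add>
  <doc>
    <field name="id">2015_01_01_battlecreekenquirer-004</field>
    <field name="publication_date">2015-01-01T00:00:00Z</field>
    <field name="year">2015</field>
    <field name="yearstr">2015</field>
    <field name="day">1</field>
    <field name="month_num">1</field>
    <field name="month">01_January</field>
    <field name="publication_name">Battle Creek Enquirer</field>
    <field name="publication_type">newspaper</field>
    <field name="short_name">battlecreekenquirer</field>
    <field name="image_number">4</field>
    <field name="filename">2015_01_01_battlecreekenquirer-004.pdf</field>
    <field name="copyright_year">1923</field>
    <field name="copyright_restricted">y</field>
    <!-- placeholder for the plain text read from stream.file -->
    <field name="publication_text">...</field>
  </doc>
</add>
```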
>
>
> And here is my schema:
>
> <?xml version="1.0" encoding="UTF-8" ?>
>
> <!--
>  Licensed to the Apache Software Foundation (ASF) under one or more
>  contributor license agreements.  See the NOTICE file distributed with
>  this work for additional information regarding copyright ownership.
>  The ASF licenses this file to You under the Apache License, Version 2.0
>  (the "License"); you may not use this file except in compliance with
>  the License.  You may obtain a copy of the License at
>
>      http://www.apache.org/licenses/LICENSE-2.0
>
>  Unless required by applicable law or agreed to in writing, software
>  distributed under the License is distributed on an "AS IS" BASIS,
>  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  See the License for the specific language governing permissions and
>  limitations under the License.
> -->
>
> <!--
>  This is the Solr schema file. This file should be named "schema.xml" and
>  should be in the conf directory under the solr home
>  (i.e. ./solr/conf/schema.xml by default)
>  or located where the classloader for the Solr webapp can find it.
>
>  This example schema is the recommended starting point for users.
>  It should be kept correct and concise, usable out-of-the-box.
>
>  For more information, on how to customize this file, please see
>  http://wiki.apache.org/solr/SchemaXml
>
>  PERFORMANCE NOTE: this schema includes many optional features and should
>  not be used for benchmarking.  To improve performance one could
>   - set stored="false" for all fields possible (esp large fields) when you
>     only need to search on the field but don't need to return the original
>     value.
>   - set indexed="false" if you don't need to search on the field, but only
>     return the field as a result of searching on other indexed fields.
>   - remove all unneeded copyField statements
>   - for best index size and searching performance, set "indexed" to false
>     for all general text fields, use copyField to copy them to the
>     catchall "text" field, and use that for searching.
>   - For maximum indexing performance, use the StreamingUpdateSolrServer
>     java client.
>   - Remember to run the JVM in server mode, and use a higher logging level
>     that avoids logging every request
> -->
>
> <schema name="example" version="1.4">
>   <!-- attribute "name" is the name of this schema and is only used for
>        display purposes.
>        Applications should change this to reflect the nature of the search
>        collection.
>        version="1.4" is Solr's version number for the schema syntax and
>        semantics.  It should not normally be changed by applications.
>        1.0: multiValued attribute did not exist, all fields are multiValued
>             by nature
>        1.1: multiValued attribute introduced, false by default
>        1.2: omitTermFreqAndPositions attribute introduced, true by default
>             except for text fields.
>        1.3: removed optional field compress feature
>        1.4: default auto-phrase (QueryParser feature) to off
>      -->
>
>   <types>
>     <!-- field type definitions. The "name" attribute is
>        just a label to be used by field definitions.  The "class"
>        attribute and any other attributes determine the real
>        behavior of the fieldType.
>          Class names starting with "solr" refer to java classes in the
>        org.apache.solr.analysis package.
>     -->
>
>     <!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
>     <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
>
>     <!-- boolean type: "true" or "false" -->
>     <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
>     <!--Binary data type. The data should be sent/retrieved as Base64-encoded strings -->
>     <fieldtype name="binary" class="solr.BinaryField"/>
>
>     <!-- The optional sortMissingLast and sortMissingFirst attributes are
>          currently supported on types that are sorted internally as strings.
>          This includes "string","boolean","sint","slong","sfloat","sdouble","pdate"
>        - If sortMissingLast="true", then a sort on this field will cause documents
>          without the field to come after documents with the field,
>          regardless of the requested sort order (asc or desc).
>        - If sortMissingFirst="true", then a sort on this field will cause documents
>          without the field to come before documents with the field,
>          regardless of the requested sort order.
>        - If sortMissingLast="false" and sortMissingFirst="false" (the default),
>          then default lucene sorting will be used which places docs without the
>          field first in an ascending sort and last in a descending sort.
>     -->
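To make these attributes concrete, here is a hypothetical pair of declarations. The names "string_missing_first" and "sort_title" are made up for illustration. Sorting on sort_title in either direction would list documents that have no sort_title value before all others.

```xml
<!-- hypothetical type: documents without a value sort first in asc and desc -->
<fieldType name="string_missing_first" class="solr.StrField" sortMissingFirst="true" omitNorms="true"/>
<!-- hypothetical field using it; indexed so it can be sorted on -->
<field name="sort_title" type="string_missing_first" indexed="true" stored="false"/>
```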
>
>     <!--
>       Default numeric field types. For faster range queries, consider the
>       tint/tfloat/tlong/tdouble types.
>     -->
>     <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
>     <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
>     <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
>     <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
>
>     <!--
>      Numeric field types that index each value at various levels of precision
>      to accelerate range queries when the number of values between the range
>      endpoints is large. See the javadoc for NumericRangeQuery for internal
>      implementation details.
>
>      Smaller precisionStep values (specified in bits) will lead to more tokens
>      indexed per value, slightly larger index size, and faster range queries.
>      A precisionStep of 0 disables indexing at different precision levels.
>     -->
>     <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
>     <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
>     <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
>     <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
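Here is a hypothetical illustration for this schema. Suppose pages are often filtered by year ranges, for example fq=year:[1900 TO 1950]. The "year" field could then use the "tint" type instead of "int", trading a slightly larger index for faster range queries. This is a sketch only, and changing a field's type means reindexing.

```xml
<!-- same as the existing "year" field, but using tint (precisionStep="8") -->
<field name="year" type="tint" indexed="true" stored="true" required="true"/>
```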
>
>     <!-- The format for this date field is of the form 1995-12-31T23:59:59Z, and
>          is a more restricted form of the canonical representation of dateTime
>          http://www.w3.org/TR/xmlschema-2/#dateTime
>          The trailing "Z" designates UTC time and is mandatory.
>          Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z
>          All other components are mandatory.
>
>          Expressions can also be used to denote calculations that should be
>          performed relative to "NOW" to determine the value, i.e.:
>
>                NOW/HOUR
>                   ... Round to the start of the current hour
>                NOW-1DAY
>                   ... Exactly 1 day prior to now
>                NOW/DAY+6MONTHS+3DAYS
>                   ... 6 months and 3 days in the future from the start of
>                       the current day
>
>          Consult the DateField javadocs for more information.
>
>          Note: For faster range queries, consider the tdate type
>       -->
>     <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
>
>     <!-- A Trie based date field for faster date range queries and date faceting. -->
>     <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
>
>
>     <!--
>       Note:
>       These should only be used for compatibility with existing indexes
>       (created with older Solr versions) or if "sortMissingFirst" or
>       "sortMissingLast" functionality is needed. Use Trie based fields instead.
>
>       Plain numeric field types that store and index the text
>       value verbatim (and hence don't support range queries, since the
>       lexicographic ordering isn't equal to the numeric ordering)
>     -->
>
>
>     <!--
>       Note:
>       These should only be used for compatibility with existing indexes
>       (created with older Solr versions) or if "sortMissingFirst" or
>       "sortMissingLast" functionality is needed. Use Trie based fields instead.
>
>       Numeric field types that manipulate the value into
>       a string value that isn't human-readable in its internal form,
>       but with a lexicographic ordering the same as the numeric ordering,
>       so that range queries work correctly.
>     -->
>     <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
>     <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
>     <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
>     <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
>
>
>     <!-- The "RandomSortField" is not used to store or search any
>          data.  You can declare fields of this type in your schema
>          to generate pseudo-random orderings of your docs for sorting
>          purposes.  The ordering is generated based on the field name
>          and the version of the index. As long as the index version
>          remains unchanged, and the same field name is reused,
>          the ordering of the docs will be consistent.
>          If you want different pseudo-random orderings of documents,
>          for the same version of the index, use a dynamicField and
>          change the name
>      -->
>     <fieldType name="random" class="solr.RandomSortField" indexed="true" />
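Here is a sketch of how the random type is typically wired up, assuming a dynamic field named "random_*" as in the stock example schema. Each distinct field name gives its own ordering. A request could sort with sort=random_1234 desc and get a different order with sort=random_5678 desc.

```xml
<!-- any field name matching random_* gets a pseudo-random sort order -->
<dynamicField name="random_*" type="random"/>
```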
>
>     <!-- solr.TextField allows the specification of custom text analyzers
>          specified as a tokenizer and a list of token filters. Different
>          analyzers may be specified for indexing and querying.
>
>          The optional positionIncrementGap puts space between multiple fields of
>          this type on the same document, with the purpose of preventing false phrase
>          matching across fields.
>
>          For more info on customizing your analyzer chain, please see
>          http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>      -->
>
>     <!-- One can also specify an existing Analyzer class that has a
>          default constructor via the class attribute on the analyzer element
>     <fieldType name="text_greek" class="solr.TextField">
>       <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
>     </fieldType>
>     -->
>
>     <!-- A text field that only splits on whitespace for exact matching of words -->
>     <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
>       <analyzer>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       </analyzer>
>     </fieldType>
>
>     <!-- A general text field that has reasonable, generic
>          cross-language defaults: it tokenizes with StandardTokenizer,
>          removes stop words from case-insensitive "stopwords.txt"
>          (empty by default), and down cases.  At query time only, it
>          also applies synonyms. -->
>     <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
>         <!-- in this example, we will only use synonyms at query time
>         <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>         -->
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
>     <!-- A text field with defaults appropriate for English: it
>          tokenizes with StandardTokenizer, removes English stop words
>          (stopwords_en.txt), down cases, protects words from protwords.txt, and
>          finally applies Porter's stemming.  The query time analyzer
>          also applies synonyms from synonyms.txt. -->
>     <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <!-- in this example, we will only use synonyms at query time
>         <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>         -->
>         <!-- Case insensitive stop word removal.
>           add enablePositionIncrements=true in both the index and query
>           analyzers to leave a 'gap' for more accurate phrase queries.
>         -->
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="stopwords_en.txt"
>                 enablePositionIncrements="true"
>                 />
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EnglishPossessiveFilterFactory"/>
>         <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>         <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
>         <filter class="solr.EnglishMinimalStemFilterFactory"/>
>         -->
>         <filter class="solr.PorterStemFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="stopwords_en.txt"
>                 enablePositionIncrements="true"
>                 />
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EnglishPossessiveFilterFactory"/>
>         <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>         <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
>         <filter class="solr.EnglishMinimalStemFilterFactory"/>
>         -->
>         <filter class="solr.PorterStemFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
>     <!-- A text field with defaults appropriate for English, plus
>          aggressive word-splitting and autophrase features enabled.
>          This field is just like text_en, except it adds
>          WordDelimiterFilter to enable splitting and matching of
>          words on case-change, alpha numeric boundaries, and
>          non-alphanumeric chars.  This means certain compound word
>          cases will work, for example query "wi fi" will match
>          document "WiFi" or "wi-fi".  However, other cases will still
>          not match, for example if the query is "wifi" and the
>          document is "wi fi" or if the query is "wi-fi" and the
>          document is "wifi".
>         -->
>     <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <!-- in this example, we will only use synonyms at query time
>         <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>         -->
>         <!-- Case insensitive stop word removal.
>           add enablePositionIncrements=true in both the index and query
>           analyzers to leave a 'gap' for more accurate phrase queries.
>         -->
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="stopwords_en.txt"
>                 enablePositionIncrements="true"
>                 />
>         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>         <filter class="solr.PorterStemFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="stopwords_en.txt"
>                 enablePositionIncrements="true"
>                 />
>         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>         <filter class="solr.PorterStemFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
>     <!-- Less flexible matching, but fewer false matches.  Probably not ideal for product names,
>          but may be good for SKUs.  Can insert dashes in the wrong place and still match. -->
>     <fieldType name="text_en_splitting_tight" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
>       <analyzer>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt"/>
>         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>         <filter class="solr.EnglishMinimalStemFilterFactory"/>
>         <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
>              possible with WordDelimiterFilter in conjunction with stemming. -->
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
>     <!-- Just like text_general except it reverses the characters of
>          each token, to enable more efficient leading wildcard queries. -->
>     <fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
>            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
>     <!-- charFilter + WhitespaceTokenizer  -->
>     <!--
>     <fieldType name="text_char_norm" class="solr.TextField" positionIncrementGap="100" >
>       <analyzer>
>         <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       </analyzer>
>     </fieldType>
>     -->
>
>     <!-- This is an example of using the KeywordTokenizer along
>          With various TokenFilterFactories to produce a sortable field
>          that does not include some properties of the source text
>       -->
>     <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
>       <analyzer>
>         <!-- KeywordTokenizer does no actual tokenizing, so the entire
>              input string is preserved as a single token
>           -->
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <!-- The LowerCase TokenFilter does what you expect, which can be useful
>              when you want your sorting to be case insensitive
>           -->
>         <filter class="solr.LowerCaseFilterFactory" />
>         <!-- The TrimFilter removes any leading or trailing whitespace -->
>         <filter class="solr.TrimFilterFactory" />
>         <!-- The PatternReplaceFilter gives you the flexibility to use
>              Java Regular expression to replace any sequence of characters
>              matching a pattern with an arbitrary replacement string,
>              which may include back references to portions of the original
>              string matched by the pattern.
>
>              See the Java Regular Expression documentation for more
>              information on pattern and replacement string syntax.
>
>              http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/package-summary.html
>           -->
>         <filter class="solr.PatternReplaceFilterFactory"
>                 pattern="([^a-z])" replacement="" replace="all"
>         />
>       </analyzer>
>     </fieldType>
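The alphaNameSort field declared further down uses this type but is not stored, so it has to be filled by copying another field into it. Here is a sketch that assumes "name" is the intended source. With this analyzer chain, sort=alphaNameSort asc would sort case-insensitively and ignore every character that is not a letter a-z after lowercasing, including spaces and digits.

```xml
<!-- copy the raw name into the sort-only field; the analyzer normalizes it -->
<copyField source="name" dest="alphaNameSort"/>
```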
>
>     <fieldtype name="phonetic" stored="false" indexed="true" class="solr.TextField" >
>       <analyzer>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
>       </analyzer>
>     </fieldtype>
>
>     <fieldtype name="payloads" stored="false" indexed="true" class="solr.TextField" >
>       <analyzer>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <!--
>         The DelimitedPayloadTokenFilter can put payloads on tokens... for example,
>         a token of "foo|1.4"  would be indexed as "foo" with a payload of 1.4f
>         Attributes of the DelimitedPayloadTokenFilterFactory :
>          "delimiter" - a one character delimiter. Default is | (pipe)
>          "encoder" - how to encode the following value into a payload
>             float -> org.apache.lucene.analysis.payloads.FloatEncoder,
>             integer -> o.a.l.a.p.IntegerEncoder
>             identity -> o.a.l.a.p.IdentityEncoder
>             Fully Qualified class name implementing PayloadEncoder, Encoder must have a no arg constructor.
>          -->
>         <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
>       </analyzer>
>     </fieldtype>
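Here is a hypothetical field that uses the payloads type; the field name "term_weights" is made up. Indexing the value "foo|1.4 bar|2.0" would split on whitespace and on the default | delimiter. The result would be the tokens "foo" and "bar" with float payloads of 1.4 and 2.0.

```xml
<!-- hypothetical field; stored="true" overrides the type's stored="false" -->
<field name="term_weights" type="payloads" indexed="true" stored="true"/>
```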
>
>     <!-- lowercases the entire field value, keeping it as a single token. -->
>     <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
>       <analyzer>
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory" />
>       </analyzer>
>     </fieldType>
>
>     <fieldType name="text_path" class="solr.TextField" positionIncrementGap="100">
>       <analyzer>
>         <tokenizer class="solr.PathHierarchyTokenizerFactory"/>
>       </analyzer>
>     </fieldType>
>
>     <!-- since fields of this type are by default not stored or indexed,
>          any data added to them will be ignored outright.  -->
>     <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
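The "ignored" type is handy with /update/extract, which can emit Tika metadata fields that the schema does not declare. This sketch assumes two pieces. The first is a catch-all dynamic field in schema.xml. The second is the extracting handler's uprefix parameter in solrconfig.xml, which prefixes the names of fields not defined in the schema. Whether the existing solrconfig.xml already does this is not shown here.

```xml
<!-- schema.xml: silently drop any field whose name starts with ignored_ -->
<dynamicField name="ignored_*" type="ignored" multiValued="true"/>

<!-- solrconfig.xml: rename fields unknown to the schema to ignored_<name> -->
<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```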
>
>     <!-- This point type indexes the coordinates as separate fields (subFields)
>       If subFieldType is defined, it references a type, and a dynamic field
>       definition is created matching *___<typename>.  Alternately, if
>       subFieldSuffix is defined, that is used to create the subFields.
>       Example: if subFieldType="double", then the coordinates would be
>         indexed in fields myloc_0___double,myloc_1___double.
>       Example: if subFieldSuffix="_d" then the coordinates would be indexed
>         in fields myloc_0_d,myloc_1_d
>       The subFields are an implementation detail of the fieldType, and end
>       users normally should not need to know about them.
>      -->
>     <fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>
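With subFieldSuffix="_d", the sub-fields appear to rely on a matching "*_d" dynamic field being declared. This sketch uses the field name "myloc" from the comment above. A value such as "1.5,-2.25" would then be indexed into myloc_0_d and myloc_1_d.

```xml
<!-- dynamic field that the point type's sub-fields resolve to -->
<dynamicField name="*_d" type="double" indexed="true" stored="true"/>
<!-- a 2-dimensional point field -->
<field name="myloc" type="point" indexed="true" stored="true"/>
```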
>
>     <!-- A specialized field for geospatial search. If indexed, this fieldType must not be multivalued. -->
>     <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
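The location type works the same way: it is used by the "store" field below and keeps latitude and longitude in sub-fields ending in "_coordinate". Here is a sketch of the matching dynamic field, as declared in the stock example schema.

```xml
<!-- sub-fields for LatLonType, e.g. store_0_coordinate and store_1_coordinate -->
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
```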
>
>    <!--
>     A Geohash is a compact representation of a latitude longitude pair in a single field.
>     See http://wiki.apache.org/solr/SpatialSearch
>    -->
>     <fieldtype name="geohash" class="solr.GeoHashField"/>
>  </types>
>
>
>  <fields>
>
>    <field name="_version_" type="long" indexed="true" stored="true"/>
>    <!-- Valid attributes for fields:
>      name: mandatory - the name for the field
>      type: mandatory - the name of a previously defined type from the
>        <types> section
>      indexed: true if this field should be indexed (searchable or sortable)
>      stored: true if this field should be retrievable
>      multiValued: true if this field may contain multiple values per document
>      omitNorms: (expert) set to true to omit the norms associated with
>        this field (this disables length normalization and index-time
>        boosting for the field, and saves some memory).  Only full-text
>        fields or fields that need an index-time boost need norms.
>      termVectors: [false] set to true to store the term vector for a
>        given field.
>        When using MoreLikeThis, fields used for similarity should be
>        stored for best performance.
>      termPositions: Store position information with the term vector.
>        This will increase storage costs.
>      termOffsets: Store offset information with the term vector. This
>        will increase storage costs.
>      default: a value that should be used if no value is specified
>        when adding a document.
>    -->
>         <!-- newspaper-specific fields -->
>
>         <!-- Unique field (name of the original one-page PDF file with no extension) -->
>         <field name="id" type="string" indexed="true" stored="true" required="true" />
>
>         <!-- Date the paper was published (tdate), like 1997-11-30T00:00:00Z -->
>         <field name="publication_date" type="tdate" indexed="true" stored="true" required="true" />
>
>         <!-- Integer year that the paper was published -->
>         <field name="year" type="int" indexed="true" stored="true" required="true" />
>
>         <!-- String of the year, like '1998' -->
>         <field name="yearstr" type="string" indexed="true" stored="true" required="true" />
>
>         <!-- Integer of the day of the month (no zero padding)-->
>         <field name="day" type="int" indexed="true" stored="true" required="true" />
>
>         <!-- Integer number of the month (no zero padding)-->
>         <field name="month_num" type="int" indexed="true" stored="true" required="true" />
>
>         <!-- Name of month, like 'January'-->
>         <field name="month" type="string" indexed="true" stored="true" required="true" />
>
>         <!-- Name of the publication, e.g., Battle Creek Enquirer -->
>         <field name="publication_name" type="string" indexed="true" stored="true" required="true" />
>
>         <!-- Short name of the publication, e.g., battlecreekenquirer -->
>         <field name="short_name" type="string" indexed="true" stored="true" required="true" />
>
>         <!-- Image number (roughly page number, no zero padding, will match last 3 digits of filename) -->
>         <field name="image_number" type="int" indexed="true" stored="true" required="true" />
>
>         <!-- Name of the PDF file (just the filename, no path) -->
>         <field name="filename" type="string" indexed="true" stored="true" required="true" />
>
>         <!-- Copyright Restricted (not allowed outside willard networks; values: yes, no) -->
>         <field name="copyright_restricted" type="string" indexed="true" stored="true" required="true" />
>
>         <!-- Copyright Year (copyright cut off) -->
>         <field name="copyright_year" type="string" indexed="true" stored="true" required="true" />
>
>         <!-- Publication Type (newspaper or shopper) -->
>         <field name="publication_type" type="string" indexed="true" stored="true" required="true" />
>
>         <!-- Publication Text -->
>         <field name="publication_text" type="string" indexed="true" stored="true" required="true" multiValued="true"/>
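One likely reason no snippets appear: "publication_text" is declared with type "string" (solr.StrField), which indexes the entire page text as a single, untokenized term. A query for an individual word cannot match that term, so the highlighter finds nothing in this field to build a fragment around. Below is a minimal sketch of an alternative that uses the tokenized "text_en" type already defined above ("text_general" would also work). It assumes English-language pages and keeps the field stored, because the highlighter needs the stored value. Documents indexed under the old type would have to be sent again after the change.

```xml
<!-- Publication Text: tokenized so individual words can be matched and highlighted -->
<field name="publication_text" type="text_en" indexed="true" stored="true" required="true" multiValued="true"/>
```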
>
>         <!-- end newspaper-specific fields -->
>
>    <field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/>
>    <field name="name" type="text_general" indexed="true" stored="true"/>
>    <field name="alphaNameSort" type="alphaOnlySort" indexed="true" stored="false"/>
>    <field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>
>    <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
>    <field name="features" type="text_general" indexed="true" stored="true" multiValued="true"/>
>    <field name="includes" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
>
>    <field name="weight" type="float" indexed="true" stored="true"/>
>    <field name="price"  type="float" indexed="true" stored="true"/>
>    <field name="popularity" type="int" indexed="true" stored="true" />
>    <field name="inStock" type="boolean" indexed="true" stored="true" />
>
>    <!--
>    The following store examples are used to demonstrate the various ways one might _CHOOSE_ to
>     implement spatial.  It is highly unlikely that you would ever have ALL of these fields defined.
>     -->
>    <field name="store" type="location" indexed="true" stored="true"/>
>
>    <!-- Common metadata fields, named specifically to match up with
>      SolrCell metadata when parsing rich documents such as Word, PDF.
>      Some fields are multiValued only because Tika currently may return
>      multiple values for them.
>    -->
>    <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
>    <field name="subject" type="text_general" indexed="true" stored="true"/>
>    <field name="description" type="text_general" indexed="true" stored="true"/>
>    <field name="comments" type="text_general" indexed="true" stored="true"/>
>    <field name="author" type="text_general" indexed="true" stored="true"/>
>    <field name="keywords" type="text_general" indexed="true" stored="true"/>
>    <field name="category" type="text_general" indexed="true" stored="true"/>
>    <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
>    <field name="last_modified" type="date" indexed="true" stored="true"/>
>    <field name="links" type="string" indexed="true" stored="true" multiValued="true"/>
>
>
>    <!-- catchall field, containing all other searchable text fields (implemented
>         via copyField further on in this schema) -->
>    <field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>
>
>    <!-- catchall text field that indexes tokens both normally and in reverse
>         for efficient leading wildcard queries. -->
>    <field name="text_rev" type="text_general_rev" indexed="true" stored="false" multiValued="true"/>
>
>    <!-- non-tokenized version of manufacturer to make it easier to sort or group
>         results by manufacturer.  copied from "manu" via copyField -->
>    <field name="manu_exact" type="string" indexed="true" stored="false"/>
>
>    <field name="payloads" type="payloads" indexed="true" stored="true"/>
>
>    <!-- Uncommenting the following will create a "timestamp" field using
>         a default value of "NOW" to indicate when each document was indexed.
>      -->
>    <!--
>    <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
>      -->
>
>
>    <!-- Dynamic field definitions.  If a field name is not found, dynamicFields
>         will be used if the name matches any of the patterns.
>         RESTRICTION: the glob-like pattern in the name attribute must have
>         a "*" only at the start or the end.
>         EXAMPLE:  name="*_i" will match any field ending in _i (like myid_i, z_i)
>         Longer patterns will be matched first.  If equal-size patterns
>         both match, the first appearing in the schema will be used.  -->
>    <dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>
>    <dynamicField name="*_s"  type="string"  indexed="true"  stored="true"/>
>    <dynamicField name="*_l"  type="long"   indexed="true"  stored="true"/>
>    <dynamicField name="*_t"  type="text_general"    indexed="true"  stored="true"/>
>    <dynamicField name="*_txt" type="text_general"    indexed="true"  stored="true" multiValued="true"/>
>    <dynamicField name="*_b"  type="boolean" indexed="true"  stored="true"/>
>    <dynamicField name="*_f"  type="float"  indexed="true"  stored="true"/>
>    <dynamicField name="*_d"  type="double" indexed="true"  stored="true"/>
>
>    <!-- Type used to index the lat and lon components for the "location" FieldType -->
>    <dynamicField name="*_coordinate"  type="tdouble" indexed="true"  stored="false"/>
>
>    <dynamicField name="*_dt" type="date"    indexed="true"  stored="true"/>
>    <dynamicField name="*_p"  type="location" indexed="true" stored="true"/>
>
>    <!-- some trie-coded dynamic fields for faster range queries -->
>    <dynamicField name="*_ti" type="tint"    indexed="true"  stored="true"/>
>    <dynamicField name="*_tl" type="tlong"   indexed="true"  stored="true"/>
>    <dynamicField name="*_tf" type="tfloat"  indexed="true"  stored="true"/>
>    <dynamicField name="*_td" type="tdouble" indexed="true"  stored="true"/>
>    <dynamicField name="*_tdt" type="tdate"  indexed="true"  stored="true"/>
>
>
>    <dynamicField name="ignored_*" type="ignored" multiValued="true"/>
>    <dynamicField name="attr_*" type="text_general" indexed="true" stored="true" multiValued="true"/>
>
>    <dynamicField name="random_*" type="random" />
>
>    <!-- uncomment the following to ignore any fields that don't already match
>         an existing field name or dynamic field, rather than reporting them as
>         an error.  Alternatively, change the type="ignored" to some other type,
>         e.g. "text", if you want unknown fields indexed and/or stored by default -->
>    <!--dynamicField name="*" type="ignored" multiValued="true" /-->
>
>  </fields>
>
>  <!-- Field to use to determine and enforce document uniqueness.
>       Unless this field is marked with required="false", it will be a
>       required field
>    -->
>  <uniqueKey>id</uniqueKey>
>
>  <!-- field for the QueryParser to use when an explicit fieldname is absent -->
>  <defaultSearchField>text</defaultSearchField>
>
>  <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
>  <solrQueryParser defaultOperator="OR"/>
>
>   <!-- copyField commands copy one field to another at the time a document
>         is added to the index.  It's used either to index the same field
>         differently, or to add multiple fields to the same field for
>         easier/faster searching.  -->
>
>    <copyField source="cat" dest="text"/>
>    <copyField source="name" dest="text"/>
>    <copyField source="manu" dest="text"/>
>    <copyField source="features" dest="text"/>
>    <copyField source="includes" dest="text"/>
>    <copyField source="manu" dest="manu_exact"/>
>    <copyField source="publication_text" dest="text" />
>    <!-- Above, multiple source fields are copied to the [text] field.
>           Another way to map multiple source fields to the same
>           destination field is to use the dynamic field syntax.
>           copyField also supports a maxChars to copy setting.  -->
>
>     <copyField source="*_t" dest="text" maxChars="300000000"/>
>
>    <!-- copy name to alphaNameSort, a field designed for sorting by name -->
>    <!-- <copyField source="name" dest="alphaNameSort"/> -->
>
>
>  <!-- Similarity is the scoring routine for each document vs. a query.
>       A custom similarity may be specified here, but the default is fine
>       for most applications.  -->
>  <!-- <similarity class="org.apache.lucene.search.DefaultSimilarity"/> -->
>  <!-- ... OR ...
>       Specify a SimilarityFactory class name implementation
>       allowing parameters to be used.
>  -->
>  <!--
>  <similarity class="com.example.solr.CustomSimilarityFactory">
>    <str name="paramkey">param value</str>
>  </similarity>
>  -->
>
>
> </schema>
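The dynamicField matching rules stated in the schema comment above (a "*" only at the start or the end of the pattern, longer patterns matched first, and ties going to the pattern that appears first in the schema) can be sketched in Python as follows. This illustrates those stated rules only and is not Solr's own implementation; the function names are hypothetical, and the pattern list is copied in order from the schema in this message.

```python
# Sketch of the dynamicField matching rules described in the schema comment:
# "*" appears only at the start or the end, longer patterns win, and among
# equal-length matches the pattern listed first in the schema wins.
from typing import Optional, Sequence


def pattern_matches(pattern: str, field_name: str) -> bool:
    """Return True if a glob with a leading or trailing '*' matches field_name."""
    if pattern.startswith("*"):
        return field_name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return field_name.startswith(pattern[:-1])
    return pattern == field_name


def resolve_dynamic_field(field_name: str, patterns: Sequence[str]) -> Optional[str]:
    """Return the pattern that would apply to field_name, or None if none match.

    `patterns` must be in schema order so that ties go to the earlier one.
    """
    best: Optional[str] = None
    for pattern in patterns:
        # Strict '>' keeps the earlier pattern when two matches have equal length.
        if pattern_matches(pattern, field_name) and (best is None or len(pattern) > len(best)):
            best = pattern
    return best


# dynamicField names copied, in order, from the schema above.
SCHEMA_PATTERNS = [
    "*_i", "*_s", "*_l", "*_t", "*_txt", "*_b", "*_f", "*_d",
    "*_coordinate", "*_dt", "*_p",
    "*_ti", "*_tl", "*_tf", "*_td", "*_tdt",
    "ignored_*", "attr_*", "random_*",
]

if __name__ == "__main__":
    for name in ["myid_i", "attr_title_t", "store_0_coordinate", "publication_text"]:
        print(name, "->", resolve_dynamic_field(name, SCHEMA_PATTERNS))
```

For example, "attr_title_t" matches both "*_t" and "attr_*", and the longer "attr_*" wins; "publication_text" matches none of the patterns because it is declared as an explicit field.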
>
>
> On Sat, Jun 27, 2015 at 11:27 AM, Erick Erickson <[email protected]> wrote:
>
> > This should be no different in 5.2 than 4.6.
> >
> > My first guess is a typo somewhere or some similar forehead-slapper.
> > Are you sure you're specifying the field in the "fl" list?
> >
> > Take a look at the index files; the *.fdt files are where the stored data
> > goes. You can't look into them, but for the same documents they should
> > be roughly the same aggregate size as they are in 4.6.
> > 'du -hc *.fdt' will sum them all up for you (*nix).
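Erick's 'du -hc *.fdt' check can also be done with a short Python script, which is handy for comparing the 4.6 and 5.2 index directories side by side. A minimal sketch: fdt_total_bytes and human_size are hypothetical helper names, the index directory is whatever data/index directory your core uses (an assumption about your install), and the script sums apparent file sizes, so its totals can differ slightly from du, which reports disk usage.

```python
# Rough Python counterpart of `du -hc *.fdt`: totals the stored-field data
# files (*.fdt) directly inside one Lucene index directory (no recursion).
import glob
import os
import sys


def fdt_total_bytes(index_dir: str) -> int:
    """Sum the apparent sizes, in bytes, of every *.fdt file in index_dir."""
    paths = glob.glob(os.path.join(index_dir, "*.fdt"))
    return sum(os.path.getsize(path) for path in paths)


def human_size(num_bytes: int) -> str:
    """Format a byte count with 1024-based units, roughly like `du -h`."""
    size = float(num_bytes)
    for unit in ["B", "K", "M", "G"]:
        if size < 1024:
            return f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}T"


if __name__ == "__main__":
    # Pass the core's index directory; defaults to the current directory.
    index_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{human_size(fdt_total_bytes(index_dir))}\ttotal")
```

Run it once against each index directory; if the 5.2 total is far smaller than the 4.6 total for the same documents, the text is not being stored.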
> >
> > The second thing I'd do as a sanity check is tail the Solr log while
> > indexing and querying, just to see "stuff" go by and see whether any
> > errors are thrown, although it sounds like you wouldn't see
> > any search results at all if there were something wrong with
> > indexing.
> >
> > And if none of that sheds any light, let's see the schema file?
> > Maybe the results of adding &debug=all to the query?
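To follow up on the "fl" and "&debug=all" suggestions, here is a sketch that builds a /select URL listing the wanted stored fields explicitly, requesting debug output, and optionally turning on highlighting with the hl and hl.fl parameters. The core URL is taken from the indexing URL quoted in this thread; select_url is a hypothetical helper name, and the query string is only an example.

```python
# Build a /select URL that names the stored fields explicitly in `fl`, asks
# for debug output, and can optionally turn on highlighting for one field.
from typing import Optional, Sequence
from urllib.parse import urlencode

# Core URL taken from the /update/extract URL quoted in this thread.
SOLR_CORE = "http://localhost:8080/solr/newspapers"


def select_url(query: str, fields: Sequence[str], debug: bool = True,
               highlight_field: Optional[str] = None) -> str:
    params = [("q", query), ("fl", ",".join(fields))]
    if debug:
        params.append(("debug", "all"))
    if highlight_field:
        params += [("hl", "true"), ("hl.fl", highlight_field)]
    return f"{SOLR_CORE}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(select_url("publication_text:creek", ["id", "publication_text"],
                     highlight_field="publication_text"))
```

If publication_text is missing from the returned documents even though it is listed in fl, the problem is on the storing side rather than in the query.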
> >
> > Best,
> > Erick
> >
> > On Fri, Jun 26, 2015 at 8:05 AM, Mark Ehle <[email protected]> wrote:
> > > In my schema from 4.6, the text was in the 'text' field, and the "stored"
> > > attribute was set to "true" as it is in the 5.2 schema. I am ingesting the
> > > text from files on the server, and it used to work just fine with 4.6. I
> > > am using the same schema except I had to get rid of the field types pint,
> > > plong, pfloat, pdouble and pdate. Otherwise, the schema is identical.
> > >
> > > How do I tell SOLR 5.2 to store the text from a file to a certain field?
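The /update/extract URL quoted at the top of this thread shows one way to do this: literal.* parameters set field values directly, and fmap.content maps the text that Solr Cell extracts from the file into a named field (there, publication_text). A minimal Python sketch that builds such a URL, assuming a server-side file read through stream.file as in the thread; extract_url is a hypothetical helper name, and the content-type default simply mirrors the value used in the thread's URL.

```python
# Build a Solr Cell (/update/extract) URL like the one quoted in this thread:
# literal.* sets field values directly, fmap.content routes the extracted file
# body into a chosen field, and stream.file names a file on the Solr server.
from typing import Mapping, Optional
from urllib.parse import quote, urlencode


def extract_url(core_url: str, doc_id: str, file_path: str, target_field: str,
                literals: Optional[Mapping[str, str]] = None,
                content_type: str = "application/txt") -> str:
    params = [("literal.id", doc_id)]
    for name, value in (literals or {}).items():
        params.append((f"literal.{name}", value))
    params += [
        ("fmap.content", target_field),        # extracted text goes into target_field
        ("stream.contentType", content_type),  # default mirrors the thread's URL
        ("stream.file", file_path),
    ]
    # quote_via=quote encodes spaces as %20 (as in the thread) instead of '+'.
    return f"{core_url}/update/extract?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    print(extract_url(
        "http://localhost:8080/solr/newspapers",
        "2015_01_01_battlecreekenquirer-004",
        "/archive_data/newspapers/Battle Creek Enquirer/2015/01_January/"
        "2015_01_01_battlecreekenquirer/txt/2015_01_01_battlecreekenquirer-004.txt",
        "publication_text",
        literals={"publication_name": "Battle Creek Enquirer", "year": "2015"},
    ))
```

The target field still has to be declared with stored="true" in schema.xml for the text to come back in results.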
> > >
> > > Thanks!
> > >
> > >
> > > On Fri, Jun 26, 2015 at 7:29 AM, Alessandro Benedetti <[email protected]> wrote:
> > >
> > >> Actually, storing or not storing a field is a simple schema.xml
> > >> configuration.
> > >> This suggestion may be obvious, but… have you checked that the
> > >> "stored" attribute is set to "true" for the field you are interested in?
> > >>
> > >> I am talking about the 5.2 schema.
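Alessandro's check can be scripted. The sketch below parses schema.xml with Python's standard xml.etree.ElementTree and reports the explicit stored value of a named <field>. stored_setting is a hypothetical helper name; it reads only the <field> element itself, so a None result means the attribute is absent (and the value comes from the fieldType or Solr's defaults), not that the field is unstored.

```python
# Report the explicit stored="..." setting of a <field> in a Solr schema.xml.
import xml.etree.ElementTree as ET
from typing import Optional


def stored_setting(schema_xml: bytes, field_name: str) -> Optional[str]:
    """Return the named <field>'s stored attribute ("true"/"false").

    Returns None if the field is not declared as a <field> or carries no
    stored attribute (then the setting is inherited from its fieldType).
    """
    root = ET.fromstring(schema_xml)
    for field in root.iter("field"):  # <dynamicField> elements are not matched
        if field.get("name") == field_name:
            return field.get("stored")
    return None
```

For example, stored_setting(open("schema.xml", "rb").read(), "publication_text") should return "true" for the 5.2 schema posted in this thread; reading the file as bytes lets the parser honor its encoding declaration.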
> > >>
> > >> Cheers
> > >>
> > >> 2015-06-26 12:24 GMT+01:00 Mark Ehle <[email protected]>:
> > >>
> > >> > Folks -
> > >> >
> > >> > I am using SOLR 4.6 to run a newspaper indexing site we have at the
> > >> > library I work at. I would like to update to 5.2, and I have an instance
> > >> > of it running. When I go to index the txt files of each newspaper page, I
> > >> > can search and find stuff, but there is no text stored any more. I do
> > >> > use highlighting, so I need the text there.
> > >> >
> > >> > What would be different about 5.2 that would account for this?
> > >> >
> > >> > Thanks!
> > >> >
> > >> > Mark Ehle
> > >> > Computer Support Librarian
> > >> > Willard Library
> > >> > Battle Creek, MI
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> --------------------------
> > >>
> > >> Benedetti Alessandro
> > >> Visiting card : http://about.me/alessandro_benedetti
> > >>
> > >> "Tyger, tyger burning bright
> > >> In the forests of the night,
> > >> What immortal hand or eye
> > >> Could frame thy fearful symmetry?"
> > >>
> > >> William Blake - Songs of Experience -1794 England
> > >>
> >
>



