Instead of your immense schema, can you give us the details of the highlighting you are trying to use, and how you are trying to use it? Which client? Direct API calls?
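For reference, a direct highlighting request can be put together roughly like this. It is only a sketch: the host and the "newspapers" core are taken from the update URL in your mail, while the search term "enquirer", the choice of publication_text as the field to highlight, and the helper name build_highlight_url are assumptions for illustration, not your actual setup. hl, hl.fl and fl are the standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, query, highlight_field, rows=10):
    """Build a Solr /select URL that asks for highlighted snippets.

    base_url        -- core URL, e.g. http://localhost:8080/solr/newspapers
    query           -- the user's search terms (hypothetical here)
    highlight_field -- field to build snippets from (assumed, must be stored)
    """
    params = {
        "q": query,
        "fl": "id",                # fields returned for each document
        "rows": rows,
        "wt": "json",
        "hl": "true",              # switch highlighting on
        "hl.fl": highlight_field,  # field(s) to generate snippets from
    }
    return base_url + "/select?" + urlencode(params)


# Hypothetical request against the core named in the original mail.
url = build_highlight_url(
    "http://localhost:8080/solr/newspapers", "enquirer", "publication_text"
)
print(url)
```

Posting the exact URL you send, whatever it looks like, makes it much easier to see which field the highlighter is being asked to work on.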
Let us know!

Cheers

2015-06-30 15:10 GMT+01:00 Mark Ehle <marke...@gmail.com>:

> Thanks to all for the help - it's now storing text and I can search and get
> results just as before in 4.6, but I cannot get snippets to appear when I ask
> for highlighting.
>
> When I add documents, here is the URL my script generates:
>
> http://localhost:8080/solr/newspapers/update/extract?literal.id=2015_01_01_battlecreekenquirer-004&literal.publication_date=2015-01-01T00:00:00Z&literal.year=2015&literal.yearstr=2015&literal.day=1&literal.month_num=1&literal.month=01_January&literal.publication_name=Battle%20Creek%20Enquirer&literal.publication_type=newspaper&literal.short_name=battlecreekenquirer&literal.image_number=4&literal.filename=2015_01_01_battlecreekenquirer-004.pdf&literal.copyright_year=1923&literal.copyright_restricted=y&fmap.content=publication_text&stream.contentType=application%2Ftxt&stream.file=%2Farchive_data%2Fnewspapers%2FBattle%20Creek%20Enquirer%2F2015%2F01_January%2F2015_01_01_battlecreekenquirer%2Ftxt%2F2015_01_01_battlecreekenquirer-004.txt
>
> And here is my schema:
>
> <?xml version="1.0" encoding="UTF-8" ?>
>
> <!--
> Licensed to the Apache Software Foundation (ASF) under one or more
> contributor license agreements. See the NOTICE file distributed with
> this work for additional information regarding copyright ownership.
> The ASF licenses this file to You under the Apache License, Version 2.0
> (the "License"); you may not use this file except in compliance with
> the License. You may obtain a copy of the License at
>
> http://www.apache.org/licenses/LICENSE-2.0
>
> Unless required by applicable law or agreed to in writing, software
> distributed under the License is distributed on an "AS IS" BASIS,
> WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> See the License for the specific language governing permissions and
> limitations under the License.
> -->
>
> <!--
> This is the Solr schema file.
This file should be named "schema.xml" and > should be in the conf directory under the solr home > (i.e. ./solr/conf/schema.xml by default) > or located where the classloader for the Solr webapp can find it. > > This example schema is the recommended starting point for users. > It should be kept correct and concise, usable out-of-the-box. > > For more information, on how to customize this file, please see > http://wiki.apache.org/solr/SchemaXml > > PERFORMANCE NOTE: this schema includes many optional features and should > not > be used for benchmarking. To improve performance one could > - set stored="false" for all fields possible (esp large fields) when you > only need to search on the field but don't need to return the original > value. > - set indexed="false" if you don't need to search on the field, but only > return the field as a result of searching on other indexed fields. > - remove all unneeded copyField statements > - for best index size and searching performance, set "index" to false > for all general text fields, use copyField to copy them to the > catchall "text" field, and use that for searching. > - For maximum indexing performance, use the StreamingUpdateSolrServer > java client. > - Remember to run the JVM in server mode, and use a higher logging level > that avoids logging every request > --> > > <schema name="example" version="1.4"> > <!-- attribute "name" is the name of this schema and is only used for > display purposes. > Applications should change this to reflect the nature of the search > collection. > version="1.4" is Solr's version number for the schema syntax and > semantics. It should > not normally be changed by applications. > 1.0: multiValued attribute did not exist, all fields are multiValued > by nature > 1.1: multiValued attribute introduced, false by default > 1.2: omitTermFreqAndPositions attribute introduced, true by default > except for text fields. 
> 1.3: removed optional field compress feature > 1.4: default auto-phrase (QueryParser feature) to off > --> > > <types> > <!-- field type definitions. The "name" attribute is > just a label to be used by field definitions. The "class" > attribute and any other attributes determine the real > behavior of the fieldType. > Class names starting with "solr" refer to java classes in the > org.apache.solr.analysis package. > --> > > <!-- The StrField type is not analyzed, but indexed/stored verbatim. > --> > <fieldType name="string" class="solr.StrField" sortMissingLast="true" > omitNorms="true"/> > > <!-- boolean type: "true" or "false" --> > <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" > omitNorms="true"/> > <!--Binary data type. The data should be sent/retrieved in as Base64 > encoded Strings --> > <fieldtype name="binary" class="solr.BinaryField"/> > > <!-- The optional sortMissingLast and sortMissingFirst attributes are > currently supported on types that are sorted internally as > strings. > This includes > "string","boolean","sint","slong","sfloat","sdouble","pdate" > - If sortMissingLast="true", then a sort on this field will cause > documents > without the field to come after documents with the field, > regardless of the requested sort order (asc or desc). > - If sortMissingFirst="true", then a sort on this field will cause > documents > without the field to come before documents with the field, > regardless of the requested sort order. > - If sortMissingLast="false" and sortMissingFirst="false" (the > default), > then default lucene sorting will be used which places docs without > the > field first in an ascending sort and last in a descending sort. > --> > > <!-- > Default numeric field types. For faster range queries, consider the > tint/tfloat/tlong/tdouble types. 
> --> > <fieldType name="int" class="solr.TrieIntField" precisionStep="0" > omitNorms="true" positionIncrementGap="0"/> > <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" > omitNorms="true" positionIncrementGap="0"/> > <fieldType name="long" class="solr.TrieLongField" precisionStep="0" > omitNorms="true" positionIncrementGap="0"/> > <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" > omitNorms="true" positionIncrementGap="0"/> > > <!-- > Numeric field types that index each value at various levels of > precision > to accelerate range queries when the number of values between the > range > endpoints is large. See the javadoc for NumericRangeQuery for internal > implementation details. > > Smaller precisionStep values (specified in bits) will lead to more > tokens > indexed per value, slightly larger index size, and faster range > queries. > A precisionStep of 0 disables indexing at different precision levels. > --> > <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" > omitNorms="true" positionIncrementGap="0"/> > <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" > omitNorms="true" positionIncrementGap="0"/> > <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" > omitNorms="true" positionIncrementGap="0"/> > <fieldType name="tdouble" class="solr.TrieDoubleField" > precisionStep="8" omitNorms="true" positionIncrementGap="0"/> > > <!-- The format for this date field is of the form > 1995-12-31T23:59:59Z, and > is a more restricted form of the canonical representation of > dateTime > http://www.w3.org/TR/xmlschema-2/#dateTime > The trailing "Z" designates UTC time and is mandatory. > Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z > All other components are mandatory. > > Expressions can also be used to denote calculations that should be > performed relative to "NOW" to determine the value, ie... > > NOW/HOUR > ... 
Round to the start of the current hour > NOW-1DAY > ... Exactly 1 day prior to now > NOW/DAY+6MONTHS+3DAYS > ... 6 months and 3 days in the future from the start of > the current day > > Consult the DateField javadocs for more information. > > Note: For faster range queries, consider the tdate type > --> > <fieldType name="date" class="solr.TrieDateField" omitNorms="true" > precisionStep="0" positionIncrementGap="0"/> > > <!-- A Trie based date field for faster date range queries and date > faceting. --> > <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" > precisionStep="6" positionIncrementGap="0"/> > > > <!-- > Note: > These should only be used for compatibility with existing indexes > (created with older Solr versions) > or if "sortMissingFirst" or "sortMissingLast" functionality is > needed. Use Trie based fields instead. > > Plain numeric field types that store and index the text > value verbatim (and hence don't support range queries, since the > lexicographic ordering isn't equal to the numeric ordering) > --> > > > <!-- > Note: > These should only be used for compatibility with existing indexes > (created with older Solr versions) > or if "sortMissingFirst" or "sortMissingLast" functionality is > needed. Use Trie based fields instead. > > Numeric field types that manipulate the value into > a string value that isn't human-readable in its internal form, > but with a lexicographic ordering the same as the numeric ordering, > so that range queries work correctly. 
> --> > <fieldType name="sint" class="solr.SortableIntField" > sortMissingLast="true" omitNorms="true"/> > <fieldType name="slong" class="solr.SortableLongField" > sortMissingLast="true" omitNorms="true"/> > <fieldType name="sfloat" class="solr.SortableFloatField" > sortMissingLast="true" omitNorms="true"/> > <fieldType name="sdouble" class="solr.SortableDoubleField" > sortMissingLast="true" omitNorms="true"/> > > > <!-- The "RandomSortField" is not used to store or search any > data. You can declare fields of this type it in your schema > to generate pseudo-random orderings of your docs for sorting > purposes. The ordering is generated based on the field name > and the version of the index, As long as the index version > remains unchanged, and the same field name is reused, > the ordering of the docs will be consistent. > If you want different psuedo-random orderings of documents, > for the same version of the index, use a dynamicField and > change the name > --> > <fieldType name="random" class="solr.RandomSortField" indexed="true" /> > > <!-- solr.TextField allows the specification of custom text analyzers > specified as a tokenizer and a list of token filters. Different > analyzers may be specified for indexing and querying. > > The optional positionIncrementGap puts space between multiple > fields of > this type on the same document, with the purpose of preventing > false phrase > matching across fields. 
> > For more info on customizing your analyzer chain, please see > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters > --> > > <!-- One can also specify an existing Analyzer class that has a > default constructor via the class attribute on the analyzer > element > <fieldType name="text_greek" class="solr.TextField"> > <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/> > </fieldType> > --> > > <!-- A text field that only splits on whitespace for exact matching of > words --> > <fieldType name="text_ws" class="solr.TextField" > positionIncrementGap="100"> > <analyzer> > <tokenizer class="solr.WhitespaceTokenizerFactory"/> > </analyzer> > </fieldType> > > <!-- A general text field that has reasonable, generic > cross-language defaults: it tokenizes with StandardTokenizer, > removes stop words from case-insensitive "stopwords.txt" > (empty by default), and down cases. At query time only, it > also applies synonyms. --> > <fieldType name="text_general" class="solr.TextField" > positionIncrementGap="100"> > <analyzer type="index"> > <tokenizer class="solr.StandardTokenizerFactory"/> > <filter class="solr.StopFilterFactory" ignoreCase="true" > words="stopwords.txt" enablePositionIncrements="true" /> > <!-- in this example, we will only use synonyms at query time > <filter class="solr.SynonymFilterFactory" > synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> > --> > <filter class="solr.LowerCaseFilterFactory"/> > </analyzer> > <analyzer type="query"> > <tokenizer class="solr.StandardTokenizerFactory"/> > <filter class="solr.StopFilterFactory" ignoreCase="true" > words="stopwords.txt" enablePositionIncrements="true" /> > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" > ignoreCase="true" expand="true"/> > <filter class="solr.LowerCaseFilterFactory"/> > </analyzer> > </fieldType> > > <!-- A text field with defaults appropriate for English: it > tokenizes with StandardTokenizer, removes English stop words > 
(stopwords_en.txt), down cases, protects words from protwords.txt, > and > finally applies Porter's stemming. The query time analyzer > also applies synonyms from synonyms.txt. --> > <fieldType name="text_en" class="solr.TextField" > positionIncrementGap="100"> > <analyzer type="index"> > <tokenizer class="solr.StandardTokenizerFactory"/> > <!-- in this example, we will only use synonyms at query time > <filter class="solr.SynonymFilterFactory" > synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> > --> > <!-- Case insensitive stop word removal. > add enablePositionIncrements=true in both the index and query > analyzers to leave a 'gap' for more accurate phrase queries. > --> > <filter class="solr.StopFilterFactory" > ignoreCase="true" > words="stopwords_en.txt" > enablePositionIncrements="true" > /> > <filter class="solr.LowerCaseFilterFactory"/> > <filter class="solr.EnglishPossessiveFilterFactory"/> > <filter class="solr.KeywordMarkerFilterFactory" > protected="protwords.txt"/> > <!-- Optionally you may want to use this less aggressive stemmer > instead of PorterStemFilterFactory: > <filter class="solr.EnglishMinimalStemFilterFactory"/> > --> > <filter class="solr.PorterStemFilterFactory"/> > </analyzer> > <analyzer type="query"> > <tokenizer class="solr.StandardTokenizerFactory"/> > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" > ignoreCase="true" expand="true"/> > <filter class="solr.StopFilterFactory" > ignoreCase="true" > words="stopwords_en.txt" > enablePositionIncrements="true" > /> > <filter class="solr.LowerCaseFilterFactory"/> > <filter class="solr.EnglishPossessiveFilterFactory"/> > <filter class="solr.KeywordMarkerFilterFactory" > protected="protwords.txt"/> > <!-- Optionally you may want to use this less aggressive stemmer > instead of PorterStemFilterFactory: > <filter class="solr.EnglishMinimalStemFilterFactory"/> > --> > <filter class="solr.PorterStemFilterFactory"/> > </analyzer> > </fieldType> > > <!-- A text 
field with defaults appropriate for English, plus > aggressive word-splitting and autophrase features enabled. > This field is just like text_en, except it adds > WordDelimiterFilter to enable splitting and matching of > words on case-change, alpha numeric boundaries, and > non-alphanumeric chars. This means certain compound word > cases will work, for example query "wi fi" will match > document "WiFi" or "wi-fi". However, other cases will still > not match, for example if the query is "wifi" and the > document is "wi fi" or if the query is "wi-fi" and the > document is "wifi". > --> > <fieldType name="text_en_splitting" class="solr.TextField" > positionIncrementGap="100" autoGeneratePhraseQueries="true"> > <analyzer type="index"> > <tokenizer class="solr.WhitespaceTokenizerFactory"/> > <!-- in this example, we will only use synonyms at query time > <filter class="solr.SynonymFilterFactory" > synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> > --> > <!-- Case insensitive stop word removal. > add enablePositionIncrements=true in both the index and query > analyzers to leave a 'gap' for more accurate phrase queries. 
> --> > <filter class="solr.StopFilterFactory" > ignoreCase="true" > words="stopwords_en.txt" > enablePositionIncrements="true" > /> > <filter class="solr.WordDelimiterFilterFactory" > generateWordParts="1" generateNumberParts="1" catenateWords="1" > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> > <filter class="solr.LowerCaseFilterFactory"/> > <filter class="solr.KeywordMarkerFilterFactory" > protected="protwords.txt"/> > <filter class="solr.PorterStemFilterFactory"/> > </analyzer> > <analyzer type="query"> > <tokenizer class="solr.WhitespaceTokenizerFactory"/> > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" > ignoreCase="true" expand="true"/> > <filter class="solr.StopFilterFactory" > ignoreCase="true" > words="stopwords_en.txt" > enablePositionIncrements="true" > /> > <filter class="solr.WordDelimiterFilterFactory" > generateWordParts="1" generateNumberParts="1" catenateWords="0" > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/> > <filter class="solr.LowerCaseFilterFactory"/> > <filter class="solr.KeywordMarkerFilterFactory" > protected="protwords.txt"/> > <filter class="solr.PorterStemFilterFactory"/> > </analyzer> > </fieldType> > > <!-- Less flexible matching, but less false matches. Probably not > ideal for product names, > but may be good for SKUs. Can insert dashes in the wrong place > and still match. 
--> > <fieldType name="text_en_splitting_tight" class="solr.TextField" > positionIncrementGap="100" autoGeneratePhraseQueries="true"> > <analyzer> > <tokenizer class="solr.WhitespaceTokenizerFactory"/> > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" > ignoreCase="true" expand="false"/> > <filter class="solr.StopFilterFactory" ignoreCase="true" > words="stopwords_en.txt"/> > <filter class="solr.WordDelimiterFilterFactory" > generateWordParts="0" generateNumberParts="0" catenateWords="1" > catenateNumbers="1" catenateAll="0"/> > <filter class="solr.LowerCaseFilterFactory"/> > <filter class="solr.KeywordMarkerFilterFactory" > protected="protwords.txt"/> > <filter class="solr.EnglishMinimalStemFilterFactory"/> > <!-- this filter can remove any duplicate tokens that appear at the > same position - sometimes > possible with WordDelimiterFilter in conjuncton with stemming. > --> > <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> > </analyzer> > </fieldType> > > <!-- Just like text_general except it reverses the characters of > each token, to enable more efficient leading wildcard queries. 
--> > <fieldType name="text_general_rev" class="solr.TextField" > positionIncrementGap="100"> > <analyzer type="index"> > <tokenizer class="solr.StandardTokenizerFactory"/> > <filter class="solr.StopFilterFactory" ignoreCase="true" > words="stopwords.txt" enablePositionIncrements="true" /> > <filter class="solr.LowerCaseFilterFactory"/> > <filter class="solr.ReversedWildcardFilterFactory" > withOriginal="true" > maxPosAsterisk="3" maxPosQuestion="2" > maxFractionAsterisk="0.33"/> > </analyzer> > <analyzer type="query"> > <tokenizer class="solr.StandardTokenizerFactory"/> > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" > ignoreCase="true" expand="true"/> > <filter class="solr.StopFilterFactory" ignoreCase="true" > words="stopwords.txt" enablePositionIncrements="true" /> > <filter class="solr.LowerCaseFilterFactory"/> > </analyzer> > </fieldType> > > <!-- charFilter + WhitespaceTokenizer --> > <!-- > <fieldType name="text_char_norm" class="solr.TextField" > positionIncrementGap="100" > > <analyzer> > <charFilter class="solr.MappingCharFilterFactory" > mapping="mapping-ISOLatin1Accent.txt"/> > <tokenizer class="solr.WhitespaceTokenizerFactory"/> > </analyzer> > </fieldType> > --> > > <!-- This is an example of using the KeywordTokenizer along > With various TokenFilterFactories to produce a sortable field > that does not include some properties of the source text > --> > <fieldType name="alphaOnlySort" class="solr.TextField" > sortMissingLast="true" omitNorms="true"> > <analyzer> > <!-- KeywordTokenizer does no actual tokenizing, so the entire > input string is preserved as a single token > --> > <tokenizer class="solr.KeywordTokenizerFactory"/> > <!-- The LowerCase TokenFilter does what you expect, which can be > when you want your sorting to be case insensitive > --> > <filter class="solr.LowerCaseFilterFactory" /> > <!-- The TrimFilter removes any leading or trailing whitespace --> > <filter class="solr.TrimFilterFactory" /> > <!-- The 
PatternReplaceFilter gives you the flexibility to use > Java Regular expression to replace any sequence of characters > matching a pattern with an arbitrary replacement string, > which may include back references to portions of the original > string matched by the pattern. > > See the Java Regular Expression documentation for more > information on pattern and replacement string syntax. > > > > http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/package-summary.html > --> > <filter class="solr.PatternReplaceFilterFactory" > pattern="([^a-z])" replacement="" replace="all" > /> > </analyzer> > </fieldType> > > <fieldtype name="phonetic" stored="false" indexed="true" > class="solr.TextField" > > <analyzer> > <tokenizer class="solr.StandardTokenizerFactory"/> > <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/> > </analyzer> > </fieldtype> > > <fieldtype name="payloads" stored="false" indexed="true" > class="solr.TextField" > > <analyzer> > <tokenizer class="solr.WhitespaceTokenizerFactory"/> > <!-- > The DelimitedPayloadTokenFilter can put payloads on tokens... for > example, > a token of "foo|1.4" would be indexed as "foo" with a payload of > 1.4f > Attributes of the DelimitedPayloadTokenFilterFactory : > "delimiter" - a one character delimiter. Default is | (pipe) > "encoder" - how to encode the following value into a playload > float -> org.apache.lucene.analysis.payloads.FloatEncoder, > integer -> o.a.l.a.p.IntegerEncoder > identity -> o.a.l.a.p.IdentityEncoder > Fully Qualified class name implementing PayloadEncoder, Encoder > must have a no arg constructor. > --> > <filter class="solr.DelimitedPayloadTokenFilterFactory" > encoder="float"/> > </analyzer> > </fieldtype> > > <!-- lowercases the entire field value, keeping it as a single token. 
> --> > <fieldType name="lowercase" class="solr.TextField" > positionIncrementGap="100"> > <analyzer> > <tokenizer class="solr.KeywordTokenizerFactory"/> > <filter class="solr.LowerCaseFilterFactory" /> > </analyzer> > </fieldType> > > <fieldType name="text_path" class="solr.TextField" > positionIncrementGap="100"> > <analyzer> > <tokenizer class="solr.PathHierarchyTokenizerFactory"/> > </analyzer> > </fieldType> > > <!-- since fields of this type are by default not stored or indexed, > any data added to them will be ignored outright. --> > <fieldtype name="ignored" stored="false" indexed="false" > multiValued="true" class="solr.StrField" /> > > <!-- This point type indexes the coordinates as separate fields > (subFields) > If subFieldType is defined, it references a type, and a dynamic field > definition is created matching *___<typename>. Alternately, if > subFieldSuffix is defined, that is used to create the subFields. > Example: if subFieldType="double", then the coordinates would be > indexed in fields myloc_0___double,myloc_1___double. > Example: if subFieldSuffix="_d" then the coordinates would be indexed > in fields myloc_0_d,myloc_1_d > The subFields are an implementation detail of the fieldType, and end > users normally should not need to know about them. > --> > <fieldType name="point" class="solr.PointType" dimension="2" > subFieldSuffix="_d"/> > > <!-- A specialized field for geospatial search. If indexed, this > fieldType must not be multivalued. --> > <fieldType name="location" class="solr.LatLonType" > subFieldSuffix="_coordinate"/> > > <!-- > A Geohash is a compact representation of a latitude longitude pair in a > single field. 
> See http://wiki.apache.org/solr/SpatialSearch > --> > <fieldtype name="geohash" class="solr.GeoHashField"/> > </types> > > > <fields> > > <field name="_version_" type="long" indexed="true" stored="true"/> > <!-- Valid attributes for fields: > name: mandatory - the name for the field > type: mandatory - the name of a previously defined type from the > <types> section > indexed: true if this field should be indexed (searchable or sortable) > stored: true if this field should be retrievable > multiValued: true if this field may contain multiple values per > document > omitNorms: (expert) set to true to omit the norms associated with > this field (this disables length normalization and index-time > boosting for the field, and saves some memory). Only full-text > fields or fields that need an index-time boost need norms. > termVectors: [false] set to true to store the term vector for a > given field. > When using MoreLikeThis, fields used for similarity should be > stored for best performance. > termPositions: Store position information with the term vector. > This will increase storage costs. > termOffsets: Store offset information with the term vector. This > will increase storage costs. > default: a value that should be used if no value is specified > when adding a document. 
> --> > <!-- newspaper-specific fields --> > > <!-- Unique field (name of the original one-page PDF file with no > extention --> > <field name="id" type="string" indexed="true" stored="true" > required="true" /> > > <!-- tdate that the paper was published, like 1997-11-30T00:00:00Z > --> > <field name="publication_date" type="tdate" indexed="true" > stored="true" required="true" /> > > <!-- Integer year that the paper was published --> > <field name="year" type="int" indexed="true" stored="true" > required="true" /> > > <!-- String of the year, like '1998' --> > <field name="yearstr" type="string" indexed="true" stored="true" > required="true" /> > > <!-- Integer of the day of the month (no zero padding)--> > <field name="day" type="int" indexed="true" stored="true" > required="true" /> > > <!-- Integer number of the month (no zero padding)--> > <field name="month_num" type="int" indexed="true" stored="true" > required="true" /> > > <!-- Name of month, like 'January'--> > <field name="month" type="string" indexed="true" stored="true" > required="true" /> > > <!-- name of the publication, i.e., Battle Creek Enquirer --> > <field name="publication_name" type="string" indexed="true" > stored="true" required="true" /> > > <!-- Short name of the publication, i.e., battlecreekenquirer --> > <field name="short_name" type="string" indexed="true" stored="true" > required="true" /> > > <!-- Image number (roughly page number, no zero padding, will match > last 3 digits of filename) --> > <field name="image_number" type="int" indexed="true" stored="true" > required="true" /> > > <!-- Name of the PDF file (just the filename, no path) --> > <field name="filename" type="string" indexed="true" stored="true" > required="true" /> > > <!-- Copyright Restricted (not allowed outside willard networks > values: yes, no) --> > <field name="copyright_restricted" type="string" indexed="true" > stored="true" required="true" /> > > <!-- Copyright Year (copyright cut off) --> > <field 
name="copyright_year" type="string" indexed="true" > stored="true" required="true" /> > > <!-- Publication Type (newspaper or shopper) --> > <field name="publication_type" type="string" indexed="true" > stored="true" required="true" /> > > <!-- Publication Text --> > <field name="publication_text" type="string" indexed="true" > stored="true" required="true" multiValued="true"/> > > <!-- end newspaper-specific fields --> > > <field name="sku" type="text_en_splitting_tight" indexed="true" > stored="true" omitNorms="true"/> > <field name="name" type="text_general" indexed="true" stored="true"/> > <field name="alphaNameSort" type="alphaOnlySort" indexed="true" > stored="false"/> > <field name="manu" type="text_general" indexed="true" stored="true" > omitNorms="true"/> > <field name="cat" type="string" indexed="true" stored="true" > multiValued="true"/> > <field name="features" type="text_general" indexed="true" stored="true" > multiValued="true"/> > <field name="includes" type="text_general" indexed="true" stored="true" > termVectors="true" termPositions="true" termOffsets="true" /> > > <field name="weight" type="float" indexed="true" stored="true"/> > <field name="price" type="float" indexed="true" stored="true"/> > <field name="popularity" type="int" indexed="true" stored="true" /> > <field name="inStock" type="boolean" indexed="true" stored="true" /> > > <!-- > The following store examples are used to demonstrate the various ways > one might _CHOOSE_ to > implement spatial. It is highly unlikely that you would ever have ALL > of these fields defined. > --> > <field name="store" type="location" indexed="true" stored="true"/> > > <!-- Common metadata fields, named specifically to match up with > SolrCell metadata when parsing rich documents such as Word, PDF. > Some fields are multiValued only because Tika currently may return > multiple values for them. 
> --> > <field name="title" type="text_general" indexed="true" stored="true" > multiValued="true"/> > <field name="subject" type="text_general" indexed="true" stored="true"/> > <field name="description" type="text_general" indexed="true" > stored="true"/> > <field name="comments" type="text_general" indexed="true" > stored="true"/> > <field name="author" type="text_general" indexed="true" stored="true"/> > <field name="keywords" type="text_general" indexed="true" > stored="true"/> > <field name="category" type="text_general" indexed="true" > stored="true"/> > <field name="content_type" type="string" indexed="true" stored="true" > multiValued="true"/> > <field name="last_modified" type="date" indexed="true" stored="true"/> > <field name="links" type="string" indexed="true" stored="true" > multiValued="true"/> > > > <!-- catchall field, containing all other searchable text fields > (implemented > via copyField further on in this schema --> > <field name="text" type="text_general" indexed="true" stored="true" > multiValued="true"/> > > <!-- catchall text field that indexes tokens both normally and in > reverse for efficient > leading wildcard queries. --> > <field name="text_rev" type="text_general_rev" indexed="true" > stored="false" multiValued="true"/> > > <!-- non-tokenized version of manufacturer to make it easier to sort or > group > results by manufacturer. copied from "manu" via copyField --> > <field name="manu_exact" type="string" indexed="true" stored="false"/> > > <field name="payloads" type="payloads" indexed="true" stored="true"/> > > <!-- Uncommenting the following will create a "timestamp" field using > a default value of "NOW" to indicate when each document was > indexed. > --> > <!-- > <field name="timestamp" type="date" indexed="true" stored="true" > default="NOW" multiValued="false"/> > --> > > > <!-- Dynamic field definitions. If a field name is not found, > dynamicFields > will be used if the name matches any of the patterns. 
> RESTRICTION: the glob-like pattern in the name attribute must have > a "*" only at the start or the end. > EXAMPLE: name="*_i" will match any field ending in _i (like > myid_i, z_i) > Longer patterns will be matched first. if equal size patterns > both match, the first appearing in the schema will be used. --> > <dynamicField name="*_i" type="int" indexed="true" stored="true"/> > <dynamicField name="*_s" type="string" indexed="true" stored="true"/> > <dynamicField name="*_l" type="long" indexed="true" stored="true"/> > <dynamicField name="*_t" type="text_general" indexed="true" > stored="true"/> > <dynamicField name="*_txt" type="text_general" indexed="true" > stored="true" multiValued="true"/> > <dynamicField name="*_b" type="boolean" indexed="true" stored="true"/> > <dynamicField name="*_f" type="float" indexed="true" stored="true"/> > <dynamicField name="*_d" type="double" indexed="true" stored="true"/> > > <!-- Type used to index the lat and lon components for the "location" > FieldType --> > <dynamicField name="*_coordinate" type="tdouble" indexed="true" > stored="false"/> > > <dynamicField name="*_dt" type="date" indexed="true" stored="true"/> > <dynamicField name="*_p" type="location" indexed="true" stored="true"/> > > <!-- some trie-coded dynamic fields for faster range queries --> > <dynamicField name="*_ti" type="tint" indexed="true" stored="true"/> > <dynamicField name="*_tl" type="tlong" indexed="true" stored="true"/> > <dynamicField name="*_tf" type="tfloat" indexed="true" stored="true"/> > <dynamicField name="*_td" type="tdouble" indexed="true" stored="true"/> > <dynamicField name="*_tdt" type="tdate" indexed="true" stored="true"/> > > > <dynamicField name="ignored_*" type="ignored" multiValued="true"/> > <dynamicField name="attr_*" type="text_general" indexed="true" > stored="true" multiValued="true"/> > > <dynamicField name="random_*" type="random" /> > > <!-- uncomment the following to ignore any fields that don't already > match an existing > 
>         field name or dynamic field, rather than reporting them as an error.
>         alternately, change the type="ignored" to some other type e.g. "text" if you want
>         unknown fields indexed and/or stored by default -->
>    <!--dynamicField name="*" type="ignored" multiValued="true" /-->
>
>  </fields>
>
>  <!-- Field to use to determine and enforce document uniqueness.
>       Unless this field is marked with required="false", it will be a required field
>    -->
>  <uniqueKey>id</uniqueKey>
>
>  <!-- field for the QueryParser to use when an explicit fieldname is absent -->
>  <defaultSearchField>text</defaultSearchField>
>
>  <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
>  <solrQueryParser defaultOperator="OR"/>
>
>  <!-- copyField commands copy one field to another at the time a document
>       is added to the index. It's used either to index the same field differently,
>       or to add multiple fields to the same field for easier/faster searching. -->
>
>    <copyField source="cat" dest="text"/>
>    <copyField source="name" dest="text"/>
>    <copyField source="manu" dest="text"/>
>    <copyField source="features" dest="text"/>
>    <copyField source="includes" dest="text"/>
>    <copyField source="manu" dest="manu_exact"/>
>    <copyField source="publication_text" dest="text" />
>    <!-- Above, multiple source fields are copied to the [text] field.
>         Another way to map multiple source fields to the same
>         destination field is to use the dynamic field syntax.
>         copyField also supports a maxChars to copy setting. -->
>
>    <copyField source="*_t" dest="text" maxChars="300000000"/>
>
>    <!-- copy name to alphaNameSort, a field designed for sorting by name -->
>    <!-- <copyField source="name" dest="alphaNameSort"/> -->
>
>
>  <!-- Similarity is the scoring routine for each document vs. a query.
>       A custom similarity may be specified here, but the default is fine
>       for most applications. -->
>  <!-- <similarity class="org.apache.lucene.search.DefaultSimilarity"/> -->
>  <!-- ... OR ...
>       Specify a SimilarityFactory class name implementation
>       allowing parameters to be used.
>  -->
>  <!--
>  <similarity class="com.example.solr.CustomSimilarityFactory">
>    <str name="paramkey">param value</str>
>  </similarity>
>  -->
>
>
> </schema>
>
>
>
>
>
> On Sat, Jun 27, 2015 at 11:27 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > This should be no different in 5.2 than 4.6.
> >
> > My first guess is a typo somewhere or some similar forehead-slapper.
> > Are you sure you're specifying the field in the "fl" list?
> >
> > Take a look at the index files, the *.fdt files are where the stored data
> > goes. You can't look into them, but for the same documents they should
> > be roughly the same aggregate size as they are in 4.6.
> > 'du -hc *.fdt' will sum them all up for you (*nix).
> >
> > Second thing I'd do for sanity check is tail out the Solr log while
> > indexing and querying, just to see "stuff" go by and see if any
> > errors are thrown, although it sounds like you wouldn't see
> > any search results at all if there was something wrong with
> > indexing.
> >
> > And if none of that sheds any light, let's see the schema file?
> > Maybe the results of adding &debug=all to the query?
> >
> > Best,
> > Erick
> >
> > On Fri, Jun 26, 2015 at 8:05 AM, Mark Ehle <marke...@gmail.com> wrote:
> > > In my schema from 4.6, the text was in the 'text' field, and the "stored"
> > > attrib was set to "true" as it is in the 5.2 schema. I am ingesting the
> > > text from files on the server, and it used to work just fine with 4.6. I
> > > am using the same schema except I had to get rid of the field types pint,
> > > plong, pfloat, pdouble and pdate. Otherwise, the schema is identical.
> > >
> > > How do I tell SOLR 5.2 to store the text from a file to a certain field?
> > >
> > > Thanks!
> > >
> > >
> > > On Fri, Jun 26, 2015 at 7:29 AM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
> > >
> > >> Actually storing or not storing a field is a simple schema.xml
> > >> configuration.
> > >> This suggestion may be obvious, but … have you checked that the
> > >> "stored" attribute is set to "true" for the field you are interested in?
> > >>
> > >> I am talking about the 5.2 schema.
> > >>
> > >> Cheers
> > >>
> > >> 2015-06-26 12:24 GMT+01:00 Mark Ehle <marke...@gmail.com>:
> > >>
> > >> > Folks -
> > >> >
> > >> > I am using SOLR 4.6 to run a newspaper indexing site we have at the library
> > >> > I work at. I would like to update to 5.2, and I have an instance of it
> > >> > running. When I go to index the txt files of each newspaper page, I can
> > >> > search and find stuff, but there is no text stored any more. I do use
> > >> > highlighting so I need the text there.
> > >> >
> > >> > What would be different about 5.2 that would account for this?
> > >> >
> > >> > Thanks!
> > >> >
> > >> > Mark Ehle
> > >> > Computer Support Librarian
> > >> > Willard Library
> > >> > Battle Creek, MI
> > >> >
> > >>
> > >>
> > >>
> > >

--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
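
P.S. Since the open question is how the highlighting request is being issued, here is a minimal sketch of what such a request could look like, so it can be compared with what the client actually sends. It only builds the URL. The host, port and core name are taken from the update URL quoted above. The /select handler path, the choice of "text" as the highlighted field, the "fl" list and the search term are assumptions for illustration and may not match your setup. hl, hl.fl, hl.snippets and hl.fragsize are the standard Solr highlighting parameters.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, query, hl_field="text", snippets=3, fragsize=200):
    """Build a Solr select URL that asks for highlighted snippets.

    hl_field must be a stored field, otherwise Solr has no text to
    build snippets from. "text" is an assumption based on the schema
    quoted in this thread.
    """
    params = {
        "q": query,
        "fl": "id,publication_name,publication_date",  # fields returned per hit
        "hl": "true",               # switch highlighting on
        "hl.fl": hl_field,          # field(s) to take snippets from
        "hl.snippets": snippets,    # max snippets per field
        "hl.fragsize": fragsize,    # approximate snippet length in characters
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical handler path and search term, for illustration only.
url = build_highlight_url(
    "http://localhost:8080/solr/newspapers/select", "text:enquirer"
)
print(url)
```

If the request your client sends has no hl.fl, or names a field that is not stored, that alone would explain the empty snippets.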