Alessandro - Someone asked to see the schema, so I posted it. Should I have just attached it instead? Does this mailing list support attachments?
I am by no means a SOLR expert. I am a PHP coder who wrote a newspaper indexing tool (very much loved by our library staff and patrons) that I am trying to update. I only know enough about SOLR to install it, index, and query. All I did to the 5.2 schema was add the newspaper-specific fields that were in the old schema. I cannot answer most of your questions. I just know that this URL:

http://127.0.0.1:8080/solr/newspapers/select?q=%22JOHN+GRAP%22&fl=year&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

used to produce snippets of highlighted text in 4.6. In 5.2 it does not.

Thanks -
Mark Ehle
Computer Support Librarian
Willard Library
Battle Creek, MI

On Tue, Jun 30, 2015 at 10:50 AM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:

> Instead of your immense schema, can you give us the details of the
> highlighter you are trying to use?
> And how are you trying to use it?
> Which client? Direct API calls?
>
> Let us know!
>
> Cheers
>
> 2015-06-30 15:10 GMT+01:00 Mark Ehle <marke...@gmail.com>:
>
> > Thanks to all for the help - it's now storing text and I can search and
> > get results just as before in 4.6, but I cannot get snippets to appear
> > when I ask for highlighting.
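For readers who want to reproduce the highlighting request from the select URL in Mark's message, here is a minimal Python sketch that rebuilds it from its parameters. Mark's tool is written in PHP, so treat this only as an illustration of the request, not as his code. The function name `build_highlight_url` is hypothetical; the host, port, core name, and parameter values are copied from the URL above.

```python
from urllib.parse import urlencode

# Host, port, and core name copied from the select URL in the message.
SELECT_URL = "http://127.0.0.1:8080/solr/newspapers/select"


def build_highlight_url(phrase: str, hl_field: str = "text") -> str:
    """Build a Solr select URL that asks for highlighted snippets.

    Hypothetical helper: it mirrors the parameters of the quoted URL,
    i.e. a phrase query, JSON output, and <em>...</em> around matches.
    """
    params = [
        ("q", f'"{phrase}"'),         # phrase query, wrapped in double quotes
        ("fl", "year"),               # stored fields listed for each document
        ("wt", "json"),               # response format
        ("indent", "true"),
        ("hl", "true"),               # turn highlighting on
        ("hl.fl", hl_field),          # field(s) to build snippets from
        ("hl.simple.pre", "<em>"),    # marker inserted before each match
        ("hl.simple.post", "</em>"),  # marker inserted after each match
    ]
    # urlencode() percent-encodes each value with quote_plus, so spaces become "+".
    return SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_highlight_url("JOHN GRAP"))
```

With `wt=json`, Solr returns the snippets in a separate top-level `highlighting` section keyed by each document's uniqueKey (here `id`), not inside `response.docs`, so that is the section to inspect when checking whether highlighting worked.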
> >
> > When I add documents, here is the URL my script generates:
> >
> > http://localhost:8080/solr/newspapers/update/extract?literal.id=2015_01_01_battlecreekenquirer-004&literal.publication_date=2015-01-01T00:00:00Z&literal.year=2015&literal.yearstr=2015&literal.day=1&literal.month_num=1&literal.month=01_January&literal.publication_name=Battle%20Creek%20Enquirer&literal.publication_type=newspaper&literal.short_name=battlecreekenquirer&literal.image_number=4&literal.filename=2015_01_01_battlecreekenquirer-004.pdf&literal.copyright_year=1923&literal.copyright_restricted=y&fmap.content=publication_text&stream.contentType=application%2Ftxt&stream.file=%2Farchive_data%2Fnewspapers%2FBattle%20Creek%20Enquirer%2F2015%2F01_January%2F2015_01_01_battlecreekenquirer%2Ftxt%2F2015_01_01_battlecreekenquirer-004.txt
> >
> > And here is my schema:
> >
> > <?xml version="1.0" encoding="UTF-8" ?>
> >
> > <!--
> >  Licensed to the Apache Software Foundation (ASF) under one or more
> >  contributor license agreements.  See the NOTICE file distributed with
> >  this work for additional information regarding copyright ownership.
> >  The ASF licenses this file to You under the Apache License, Version 2.0
> >  (the "License"); you may not use this file except in compliance with
> >  the License.  You may obtain a copy of the License at
> >
> >      http://www.apache.org/licenses/LICENSE-2.0
> >
> >  Unless required by applicable law or agreed to in writing, software
> >  distributed under the License is distributed on an "AS IS" BASIS,
> >  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> >  See the License for the specific language governing permissions and
> >  limitations under the License.
> > -->
> >
> > <!--
> >  This is the Solr schema file. This file should be named "schema.xml" and
> >  should be in the conf directory under the solr home
> >  (i.e. ./solr/conf/schema.xml by default)
> >  or located where the classloader for the Solr webapp can find it.
> > > > This example schema is the recommended starting point for users. > > It should be kept correct and concise, usable out-of-the-box. > > > > For more information, on how to customize this file, please see > > http://wiki.apache.org/solr/SchemaXml > > > > PERFORMANCE NOTE: this schema includes many optional features and should > > not > > be used for benchmarking. To improve performance one could > > - set stored="false" for all fields possible (esp large fields) when > you > > only need to search on the field but don't need to return the > original > > value. > > - set indexed="false" if you don't need to search on the field, but > only > > return the field as a result of searching on other indexed fields. > > - remove all unneeded copyField statements > > - for best index size and searching performance, set "index" to false > > for all general text fields, use copyField to copy them to the > > catchall "text" field, and use that for searching. > > - For maximum indexing performance, use the StreamingUpdateSolrServer > > java client. > > - Remember to run the JVM in server mode, and use a higher logging > level > > that avoids logging every request > > --> > > > > <schema name="example" version="1.4"> > > <!-- attribute "name" is the name of this schema and is only used for > > display purposes. > > Applications should change this to reflect the nature of the > search > > collection. > > version="1.4" is Solr's version number for the schema syntax and > > semantics. It should > > not normally be changed by applications. > > 1.0: multiValued attribute did not exist, all fields are > multiValued > > by nature > > 1.1: multiValued attribute introduced, false by default > > 1.2: omitTermFreqAndPositions attribute introduced, true by > default > > except for text fields. > > 1.3: removed optional field compress feature > > 1.4: default auto-phrase (QueryParser feature) to off > > --> > > > > <types> > > <!-- field type definitions. 
The "name" attribute is > > just a label to be used by field definitions. The "class" > > attribute and any other attributes determine the real > > behavior of the fieldType. > > Class names starting with "solr" refer to java classes in the > > org.apache.solr.analysis package. > > --> > > > > <!-- The StrField type is not analyzed, but indexed/stored verbatim. > > --> > > <fieldType name="string" class="solr.StrField" sortMissingLast="true" > > omitNorms="true"/> > > > > <!-- boolean type: "true" or "false" --> > > <fieldType name="boolean" class="solr.BoolField" > sortMissingLast="true" > > omitNorms="true"/> > > <!--Binary data type. The data should be sent/retrieved in as Base64 > > encoded Strings --> > > <fieldtype name="binary" class="solr.BinaryField"/> > > > > <!-- The optional sortMissingLast and sortMissingFirst attributes are > > currently supported on types that are sorted internally as > > strings. > > This includes > > "string","boolean","sint","slong","sfloat","sdouble","pdate" > > - If sortMissingLast="true", then a sort on this field will cause > > documents > > without the field to come after documents with the field, > > regardless of the requested sort order (asc or desc). > > - If sortMissingFirst="true", then a sort on this field will cause > > documents > > without the field to come before documents with the field, > > regardless of the requested sort order. > > - If sortMissingLast="false" and sortMissingFirst="false" (the > > default), > > then default lucene sorting will be used which places docs > without > > the > > field first in an ascending sort and last in a descending sort. > > --> > > > > <!-- > > Default numeric field types. For faster range queries, consider the > > tint/tfloat/tlong/tdouble types. 
> > --> > > <fieldType name="int" class="solr.TrieIntField" precisionStep="0" > > omitNorms="true" positionIncrementGap="0"/> > > <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" > > omitNorms="true" positionIncrementGap="0"/> > > <fieldType name="long" class="solr.TrieLongField" precisionStep="0" > > omitNorms="true" positionIncrementGap="0"/> > > <fieldType name="double" class="solr.TrieDoubleField" > precisionStep="0" > > omitNorms="true" positionIncrementGap="0"/> > > > > <!-- > > Numeric field types that index each value at various levels of > > precision > > to accelerate range queries when the number of values between the > > range > > endpoints is large. See the javadoc for NumericRangeQuery for > internal > > implementation details. > > > > Smaller precisionStep values (specified in bits) will lead to more > > tokens > > indexed per value, slightly larger index size, and faster range > > queries. > > A precisionStep of 0 disables indexing at different precision > levels. > > --> > > <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" > > omitNorms="true" positionIncrementGap="0"/> > > <fieldType name="tfloat" class="solr.TrieFloatField" > precisionStep="8" > > omitNorms="true" positionIncrementGap="0"/> > > <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" > > omitNorms="true" positionIncrementGap="0"/> > > <fieldType name="tdouble" class="solr.TrieDoubleField" > > precisionStep="8" omitNorms="true" positionIncrementGap="0"/> > > > > <!-- The format for this date field is of the form > > 1995-12-31T23:59:59Z, and > > is a more restricted form of the canonical representation of > > dateTime > > http://www.w3.org/TR/xmlschema-2/#dateTime > > The trailing "Z" designates UTC time and is mandatory. > > Optional fractional seconds are allowed: > 1995-12-31T23:59:59.999Z > > All other components are mandatory. 
> > > > Expressions can also be used to denote calculations that should > be > > performed relative to "NOW" to determine the value, ie... > > > > NOW/HOUR > > ... Round to the start of the current hour > > NOW-1DAY > > ... Exactly 1 day prior to now > > NOW/DAY+6MONTHS+3DAYS > > ... 6 months and 3 days in the future from the start of > > the current day > > > > Consult the DateField javadocs for more information. > > > > Note: For faster range queries, consider the tdate type > > --> > > <fieldType name="date" class="solr.TrieDateField" omitNorms="true" > > precisionStep="0" positionIncrementGap="0"/> > > > > <!-- A Trie based date field for faster date range queries and date > > faceting. --> > > <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" > > precisionStep="6" positionIncrementGap="0"/> > > > > > > <!-- > > Note: > > These should only be used for compatibility with existing indexes > > (created with older Solr versions) > > or if "sortMissingFirst" or "sortMissingLast" functionality is > > needed. Use Trie based fields instead. > > > > Plain numeric field types that store and index the text > > value verbatim (and hence don't support range queries, since the > > lexicographic ordering isn't equal to the numeric ordering) > > --> > > > > > > <!-- > > Note: > > These should only be used for compatibility with existing indexes > > (created with older Solr versions) > > or if "sortMissingFirst" or "sortMissingLast" functionality is > > needed. Use Trie based fields instead. > > > > Numeric field types that manipulate the value into > > a string value that isn't human-readable in its internal form, > > but with a lexicographic ordering the same as the numeric ordering, > > so that range queries work correctly. 
> > --> > > <fieldType name="sint" class="solr.SortableIntField" > > sortMissingLast="true" omitNorms="true"/> > > <fieldType name="slong" class="solr.SortableLongField" > > sortMissingLast="true" omitNorms="true"/> > > <fieldType name="sfloat" class="solr.SortableFloatField" > > sortMissingLast="true" omitNorms="true"/> > > <fieldType name="sdouble" class="solr.SortableDoubleField" > > sortMissingLast="true" omitNorms="true"/> > > > > > > <!-- The "RandomSortField" is not used to store or search any > > data. You can declare fields of this type it in your schema > > to generate pseudo-random orderings of your docs for sorting > > purposes. The ordering is generated based on the field name > > and the version of the index, As long as the index version > > remains unchanged, and the same field name is reused, > > the ordering of the docs will be consistent. > > If you want different psuedo-random orderings of documents, > > for the same version of the index, use a dynamicField and > > change the name > > --> > > <fieldType name="random" class="solr.RandomSortField" indexed="true" > /> > > > > <!-- solr.TextField allows the specification of custom text analyzers > > specified as a tokenizer and a list of token filters. Different > > analyzers may be specified for indexing and querying. > > > > The optional positionIncrementGap puts space between multiple > > fields of > > this type on the same document, with the purpose of preventing > > false phrase > > matching across fields. 
> > > > For more info on customizing your analyzer chain, please see > > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters > > --> > > > > <!-- One can also specify an existing Analyzer class that has a > > default constructor via the class attribute on the analyzer > > element > > <fieldType name="text_greek" class="solr.TextField"> > > <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/> > > </fieldType> > > --> > > > > <!-- A text field that only splits on whitespace for exact matching > of > > words --> > > <fieldType name="text_ws" class="solr.TextField" > > positionIncrementGap="100"> > > <analyzer> > > <tokenizer class="solr.WhitespaceTokenizerFactory"/> > > </analyzer> > > </fieldType> > > > > <!-- A general text field that has reasonable, generic > > cross-language defaults: it tokenizes with StandardTokenizer, > > removes stop words from case-insensitive "stopwords.txt" > > (empty by default), and down cases. At query time only, it > > also applies synonyms. 
--> > > <fieldType name="text_general" class="solr.TextField" > > positionIncrementGap="100"> > > <analyzer type="index"> > > <tokenizer class="solr.StandardTokenizerFactory"/> > > <filter class="solr.StopFilterFactory" ignoreCase="true" > > words="stopwords.txt" enablePositionIncrements="true" /> > > <!-- in this example, we will only use synonyms at query time > > <filter class="solr.SynonymFilterFactory" > > synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> > > --> > > <filter class="solr.LowerCaseFilterFactory"/> > > </analyzer> > > <analyzer type="query"> > > <tokenizer class="solr.StandardTokenizerFactory"/> > > <filter class="solr.StopFilterFactory" ignoreCase="true" > > words="stopwords.txt" enablePositionIncrements="true" /> > > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" > > ignoreCase="true" expand="true"/> > > <filter class="solr.LowerCaseFilterFactory"/> > > </analyzer> > > </fieldType> > > > > <!-- A text field with defaults appropriate for English: it > > tokenizes with StandardTokenizer, removes English stop words > > (stopwords_en.txt), down cases, protects words from > protwords.txt, > > and > > finally applies Porter's stemming. The query time analyzer > > also applies synonyms from synonyms.txt. --> > > <fieldType name="text_en" class="solr.TextField" > > positionIncrementGap="100"> > > <analyzer type="index"> > > <tokenizer class="solr.StandardTokenizerFactory"/> > > <!-- in this example, we will only use synonyms at query time > > <filter class="solr.SynonymFilterFactory" > > synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> > > --> > > <!-- Case insensitive stop word removal. > > add enablePositionIncrements=true in both the index and query > > analyzers to leave a 'gap' for more accurate phrase queries. 
> > --> > > <filter class="solr.StopFilterFactory" > > ignoreCase="true" > > words="stopwords_en.txt" > > enablePositionIncrements="true" > > /> > > <filter class="solr.LowerCaseFilterFactory"/> > > <filter class="solr.EnglishPossessiveFilterFactory"/> > > <filter class="solr.KeywordMarkerFilterFactory" > > protected="protwords.txt"/> > > <!-- Optionally you may want to use this less aggressive stemmer > > instead of PorterStemFilterFactory: > > <filter class="solr.EnglishMinimalStemFilterFactory"/> > > --> > > <filter class="solr.PorterStemFilterFactory"/> > > </analyzer> > > <analyzer type="query"> > > <tokenizer class="solr.StandardTokenizerFactory"/> > > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" > > ignoreCase="true" expand="true"/> > > <filter class="solr.StopFilterFactory" > > ignoreCase="true" > > words="stopwords_en.txt" > > enablePositionIncrements="true" > > /> > > <filter class="solr.LowerCaseFilterFactory"/> > > <filter class="solr.EnglishPossessiveFilterFactory"/> > > <filter class="solr.KeywordMarkerFilterFactory" > > protected="protwords.txt"/> > > <!-- Optionally you may want to use this less aggressive stemmer > > instead of PorterStemFilterFactory: > > <filter class="solr.EnglishMinimalStemFilterFactory"/> > > --> > > <filter class="solr.PorterStemFilterFactory"/> > > </analyzer> > > </fieldType> > > > > <!-- A text field with defaults appropriate for English, plus > > aggressive word-splitting and autophrase features enabled. > > This field is just like text_en, except it adds > > WordDelimiterFilter to enable splitting and matching of > > words on case-change, alpha numeric boundaries, and > > non-alphanumeric chars. This means certain compound word > > cases will work, for example query "wi fi" will match > > document "WiFi" or "wi-fi". However, other cases will still > > not match, for example if the query is "wifi" and the > > document is "wi fi" or if the query is "wi-fi" and the > > document is "wifi". 
> > --> > > <fieldType name="text_en_splitting" class="solr.TextField" > > positionIncrementGap="100" autoGeneratePhraseQueries="true"> > > <analyzer type="index"> > > <tokenizer class="solr.WhitespaceTokenizerFactory"/> > > <!-- in this example, we will only use synonyms at query time > > <filter class="solr.SynonymFilterFactory" > > synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> > > --> > > <!-- Case insensitive stop word removal. > > add enablePositionIncrements=true in both the index and query > > analyzers to leave a 'gap' for more accurate phrase queries. > > --> > > <filter class="solr.StopFilterFactory" > > ignoreCase="true" > > words="stopwords_en.txt" > > enablePositionIncrements="true" > > /> > > <filter class="solr.WordDelimiterFilterFactory" > > generateWordParts="1" generateNumberParts="1" catenateWords="1" > > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> > > <filter class="solr.LowerCaseFilterFactory"/> > > <filter class="solr.KeywordMarkerFilterFactory" > > protected="protwords.txt"/> > > <filter class="solr.PorterStemFilterFactory"/> > > </analyzer> > > <analyzer type="query"> > > <tokenizer class="solr.WhitespaceTokenizerFactory"/> > > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" > > ignoreCase="true" expand="true"/> > > <filter class="solr.StopFilterFactory" > > ignoreCase="true" > > words="stopwords_en.txt" > > enablePositionIncrements="true" > > /> > > <filter class="solr.WordDelimiterFilterFactory" > > generateWordParts="1" generateNumberParts="1" catenateWords="0" > > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/> > > <filter class="solr.LowerCaseFilterFactory"/> > > <filter class="solr.KeywordMarkerFilterFactory" > > protected="protwords.txt"/> > > <filter class="solr.PorterStemFilterFactory"/> > > </analyzer> > > </fieldType> > > > > <!-- Less flexible matching, but less false matches. Probably not > > ideal for product names, > > but may be good for SKUs. 
Can insert dashes in the wrong place > > and still match. --> > > <fieldType name="text_en_splitting_tight" class="solr.TextField" > > positionIncrementGap="100" autoGeneratePhraseQueries="true"> > > <analyzer> > > <tokenizer class="solr.WhitespaceTokenizerFactory"/> > > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" > > ignoreCase="true" expand="false"/> > > <filter class="solr.StopFilterFactory" ignoreCase="true" > > words="stopwords_en.txt"/> > > <filter class="solr.WordDelimiterFilterFactory" > > generateWordParts="0" generateNumberParts="0" catenateWords="1" > > catenateNumbers="1" catenateAll="0"/> > > <filter class="solr.LowerCaseFilterFactory"/> > > <filter class="solr.KeywordMarkerFilterFactory" > > protected="protwords.txt"/> > > <filter class="solr.EnglishMinimalStemFilterFactory"/> > > <!-- this filter can remove any duplicate tokens that appear at > the > > same position - sometimes > > possible with WordDelimiterFilter in conjuncton with > stemming. > > --> > > <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> > > </analyzer> > > </fieldType> > > > > <!-- Just like text_general except it reverses the characters of > > each token, to enable more efficient leading wildcard queries. 
> --> > > <fieldType name="text_general_rev" class="solr.TextField" > > positionIncrementGap="100"> > > <analyzer type="index"> > > <tokenizer class="solr.StandardTokenizerFactory"/> > > <filter class="solr.StopFilterFactory" ignoreCase="true" > > words="stopwords.txt" enablePositionIncrements="true" /> > > <filter class="solr.LowerCaseFilterFactory"/> > > <filter class="solr.ReversedWildcardFilterFactory" > > withOriginal="true" > > maxPosAsterisk="3" maxPosQuestion="2" > > maxFractionAsterisk="0.33"/> > > </analyzer> > > <analyzer type="query"> > > <tokenizer class="solr.StandardTokenizerFactory"/> > > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" > > ignoreCase="true" expand="true"/> > > <filter class="solr.StopFilterFactory" ignoreCase="true" > > words="stopwords.txt" enablePositionIncrements="true" /> > > <filter class="solr.LowerCaseFilterFactory"/> > > </analyzer> > > </fieldType> > > > > <!-- charFilter + WhitespaceTokenizer --> > > <!-- > > <fieldType name="text_char_norm" class="solr.TextField" > > positionIncrementGap="100" > > > <analyzer> > > <charFilter class="solr.MappingCharFilterFactory" > > mapping="mapping-ISOLatin1Accent.txt"/> > > <tokenizer class="solr.WhitespaceTokenizerFactory"/> > > </analyzer> > > </fieldType> > > --> > > > > <!-- This is an example of using the KeywordTokenizer along > > With various TokenFilterFactories to produce a sortable field > > that does not include some properties of the source text > > --> > > <fieldType name="alphaOnlySort" class="solr.TextField" > > sortMissingLast="true" omitNorms="true"> > > <analyzer> > > <!-- KeywordTokenizer does no actual tokenizing, so the entire > > input string is preserved as a single token > > --> > > <tokenizer class="solr.KeywordTokenizerFactory"/> > > <!-- The LowerCase TokenFilter does what you expect, which can be > > when you want your sorting to be case insensitive > > --> > > <filter class="solr.LowerCaseFilterFactory" /> > > <!-- The TrimFilter removes 
any leading or trailing whitespace > --> > > <filter class="solr.TrimFilterFactory" /> > > <!-- The PatternReplaceFilter gives you the flexibility to use > > Java Regular expression to replace any sequence of > characters > > matching a pattern with an arbitrary replacement string, > > which may include back references to portions of the > original > > string matched by the pattern. > > > > See the Java Regular Expression documentation for more > > information on pattern and replacement string syntax. > > > > > > > > > http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/package-summary.html > > --> > > <filter class="solr.PatternReplaceFilterFactory" > > pattern="([^a-z])" replacement="" replace="all" > > /> > > </analyzer> > > </fieldType> > > > > <fieldtype name="phonetic" stored="false" indexed="true" > > class="solr.TextField" > > > <analyzer> > > <tokenizer class="solr.StandardTokenizerFactory"/> > > <filter class="solr.DoubleMetaphoneFilterFactory" > inject="false"/> > > </analyzer> > > </fieldtype> > > > > <fieldtype name="payloads" stored="false" indexed="true" > > class="solr.TextField" > > > <analyzer> > > <tokenizer class="solr.WhitespaceTokenizerFactory"/> > > <!-- > > The DelimitedPayloadTokenFilter can put payloads on tokens... for > > example, > > a token of "foo|1.4" would be indexed as "foo" with a payload of > > 1.4f > > Attributes of the DelimitedPayloadTokenFilterFactory : > > "delimiter" - a one character delimiter. Default is | (pipe) > > "encoder" - how to encode the following value into a playload > > float -> org.apache.lucene.analysis.payloads.FloatEncoder, > > integer -> o.a.l.a.p.IntegerEncoder > > identity -> o.a.l.a.p.IdentityEncoder > > Fully Qualified class name implementing PayloadEncoder, > Encoder > > must have a no arg constructor. 
> > --> > > <filter class="solr.DelimitedPayloadTokenFilterFactory" > > encoder="float"/> > > </analyzer> > > </fieldtype> > > > > <!-- lowercases the entire field value, keeping it as a single token. > > --> > > <fieldType name="lowercase" class="solr.TextField" > > positionIncrementGap="100"> > > <analyzer> > > <tokenizer class="solr.KeywordTokenizerFactory"/> > > <filter class="solr.LowerCaseFilterFactory" /> > > </analyzer> > > </fieldType> > > > > <fieldType name="text_path" class="solr.TextField" > > positionIncrementGap="100"> > > <analyzer> > > <tokenizer class="solr.PathHierarchyTokenizerFactory"/> > > </analyzer> > > </fieldType> > > > > <!-- since fields of this type are by default not stored or indexed, > > any data added to them will be ignored outright. --> > > <fieldtype name="ignored" stored="false" indexed="false" > > multiValued="true" class="solr.StrField" /> > > > > <!-- This point type indexes the coordinates as separate fields > > (subFields) > > If subFieldType is defined, it references a type, and a dynamic > field > > definition is created matching *___<typename>. Alternately, if > > subFieldSuffix is defined, that is used to create the subFields. > > Example: if subFieldType="double", then the coordinates would be > > indexed in fields myloc_0___double,myloc_1___double. > > Example: if subFieldSuffix="_d" then the coordinates would be > indexed > > in fields myloc_0_d,myloc_1_d > > The subFields are an implementation detail of the fieldType, and > end > > users normally should not need to know about them. > > --> > > <fieldType name="point" class="solr.PointType" dimension="2" > > subFieldSuffix="_d"/> > > > > <!-- A specialized field for geospatial search. If indexed, this > > fieldType must not be multivalued. --> > > <fieldType name="location" class="solr.LatLonType" > > subFieldSuffix="_coordinate"/> > > > > <!-- > > A Geohash is a compact representation of a latitude longitude pair > in a > > single field. 
> >      See http://wiki.apache.org/solr/SpatialSearch
> > -->
> > <fieldtype name="geohash" class="solr.GeoHashField"/>
> > </types>
> >
> >
> > <fields>
> >
> > <field name="_version_" type="long" indexed="true" stored="true"/>
> > <!-- Valid attributes for fields:
> >   name: mandatory - the name for the field
> >   type: mandatory - the name of a previously defined type from the
> >     <types> section
> >   indexed: true if this field should be indexed (searchable or sortable)
> >   stored: true if this field should be retrievable
> >   multiValued: true if this field may contain multiple values per document
> >   omitNorms: (expert) set to true to omit the norms associated with
> >     this field (this disables length normalization and index-time
> >     boosting for the field, and saves some memory).  Only full-text
> >     fields or fields that need an index-time boost need norms.
> >   termVectors: [false] set to true to store the term vector for a
> >     given field.
> >     When using MoreLikeThis, fields used for similarity should be
> >     stored for best performance.
> >   termPositions: Store position information with the term vector.
> >     This will increase storage costs.
> >   termOffsets: Store offset information with the term vector. This
> >     will increase storage costs.
> >   default: a value that should be used if no value is specified
> >     when adding a document.
> > -->
> > <!-- newspaper-specific fields -->
> >
> > <!-- Unique field (name of the original one-page PDF file with no
> >      extension) -->
> > <field name="id" type="string" indexed="true" stored="true" required="true" />
> >
> > <!-- tdate that the paper was published, like 1997-11-30T00:00:00Z -->
> > <field name="publication_date" type="tdate" indexed="true" stored="true" required="true" />
> >
> > <!-- Integer year that the paper was published -->
> > <field name="year" type="int" indexed="true" stored="true" required="true" />
> >
> > <!-- String of the year, like '1998' -->
> > <field name="yearstr" type="string" indexed="true" stored="true" required="true" />
> >
> > <!-- Integer of the day of the month (no zero padding) -->
> > <field name="day" type="int" indexed="true" stored="true" required="true" />
> >
> > <!-- Integer number of the month (no zero padding) -->
> > <field name="month_num" type="int" indexed="true" stored="true" required="true" />
> >
> > <!-- Name of month, like 'January' -->
> > <field name="month" type="string" indexed="true" stored="true" required="true" />
> >
> > <!-- Name of the publication, e.g., Battle Creek Enquirer -->
> > <field name="publication_name" type="string" indexed="true" stored="true" required="true" />
> >
> > <!-- Short name of the publication, e.g., battlecreekenquirer -->
> > <field name="short_name" type="string" indexed="true" stored="true" required="true" />
> >
> > <!-- Image number (roughly the page number, no zero padding; will match
> >      the last 3 digits of the filename) -->
> > <field name="image_number" type="int" indexed="true" stored="true" required="true" />
> >
> > <!-- Name of the PDF file (just the filename, no path) -->
> > <field name="filename" type="string" indexed="true" stored="true" required="true" />
> >
> > <!-- Copyright Restricted (not allowed outside Willard networks;
> >      values: yes, no) -->
> > <field name="copyright_restricted" type="string" indexed="true" stored="true" required="true" />
> >
> > <!-- Copyright Year (copyright cut off) -->
> > <field name="copyright_year" type="string" indexed="true" stored="true" required="true" />
> >
> > <!-- Publication Type (newspaper or shopper) -->
> > <field name="publication_type" type="string" indexed="true" stored="true" required="true" />
> >
> > <!-- Publication Text -->
> > <field name="publication_text" type="string" indexed="true" stored="true" required="true" multiValued="true"/>
> >
> > <!-- end newspaper-specific fields -->
> >
> > <field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/>
> > <field name="name" type="text_general" indexed="true" stored="true"/>
> > <field name="alphaNameSort" type="alphaOnlySort" indexed="true" stored="false"/>
> > <field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>
> > <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
> > <field name="features" type="text_general" indexed="true" stored="true" multiValued="true"/>
> > <field name="includes" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
> >
> > <field name="weight" type="float" indexed="true" stored="true"/>
> > <field name="price" type="float" indexed="true" stored="true"/>
> > <field name="popularity" type="int" indexed="true" stored="true" />
> > <field name="inStock" type="boolean" indexed="true" stored="true" />
> >
> > <!--
> >   The following store examples are used to demonstrate the various ways one might _CHOOSE_ to
> >   implement spatial.  It is highly unlikely that you would ever have ALL of these fields defined.
> > -->
> > <field name="store" type="location" indexed="true" stored="true"/>
> >
> > <!-- Common metadata fields, named specifically to match up with
> >      SolrCell metadata when parsing rich documents such as Word, PDF.
> >      Some fields are multiValued only because Tika currently may return
> >      multiple values for them.
> > -->
> > <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
> > <field name="subject" type="text_general" indexed="true" stored="true"/>
> > <field name="description" type="text_general" indexed="true" stored="true"/>
> > <field name="comments" type="text_general" indexed="true" stored="true"/>
> > <field name="author" type="text_general" indexed="true" stored="true"/>
> > <field name="keywords" type="text_general" indexed="true" stored="true"/>
> > <field name="category" type="text_general" indexed="true" stored="true"/>
> > <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
> > <field name="last_modified" type="date" indexed="true" stored="true"/>
> > <field name="links" type="string" indexed="true" stored="true" multiValued="true"/>
> >
> >
> > <!-- catchall field, containing all other searchable text fields (implemented
> >      via copyField further on in this schema) -->
> > <field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>
> >
> > <!-- catchall text field that indexes tokens both normally and in reverse for efficient
> >      leading wildcard queries. -->
> > <field name="text_rev" type="text_general_rev" indexed="true" stored="false" multiValued="true"/>
> >
> > <!-- non-tokenized version of manufacturer to make it easier to sort or group
> >      results by manufacturer.  copied from "manu" via copyField -->
> > <field name="manu_exact" type="string" indexed="true" stored="false"/>
> >
> > <field name="payloads" type="payloads" indexed="true" stored="true"/>
> >
> > <!-- Uncommenting the following will create a "timestamp" field using
> >      a default value of "NOW" to indicate when each document was indexed.
> > -->
> > <!--
> > <field name="timestamp" type="date" indexed="true" stored="true"
> > default="NOW" multiValued="false"/>
> > -->
> >
> >
> > <!-- Dynamic field definitions. If a field name is not found,
> > dynamicFields
> > will be used if the name matches any of the patterns.
> > RESTRICTION: the glob-like pattern in the name attribute must have
> > a "*" only at the start or the end.
> > EXAMPLE: name="*_i" will match any field ending in _i (like
> > myid_i, z_i)
> > Longer patterns will be matched first. if equal size patterns
> > both match, the first appearing in the schema will be used. -->
> > <dynamicField name="*_i" type="int" indexed="true" stored="true"/>
> > <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
> > <dynamicField name="*_l" type="long" indexed="true" stored="true"/>
> > <dynamicField name="*_t" type="text_general" indexed="true"
> > stored="true"/>
> > <dynamicField name="*_txt" type="text_general" indexed="true"
> > stored="true" multiValued="true"/>
> > <dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
> > <dynamicField name="*_f" type="float" indexed="true" stored="true"/>
> > <dynamicField name="*_d" type="double" indexed="true" stored="true"/>
> >
> > <!-- Type used to index the lat and lon components for the "location"
> > FieldType -->
> > <dynamicField name="*_coordinate" type="tdouble" indexed="true"
> > stored="false"/>
> >
> > <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
> > <dynamicField name="*_p" type="location" indexed="true" stored="true"/>
> >
> > <!-- some trie-coded dynamic fields for faster range queries -->
> > <dynamicField name="*_ti" type="tint" indexed="true" stored="true"/>
> > <dynamicField name="*_tl" type="tlong" indexed="true" stored="true"/>
> > <dynamicField name="*_tf" type="tfloat" indexed="true" stored="true"/>
> > <dynamicField name="*_td" type="tdouble" indexed="true" stored="true"/>
> > <dynamicField name="*_tdt" type="tdate" indexed="true" stored="true"/>
> >
> >
> > <dynamicField name="ignored_*" type="ignored" multiValued="true"/>
> > <dynamicField name="attr_*" type="text_general" indexed="true"
> > stored="true" multiValued="true"/>
> >
> > <dynamicField name="random_*" type="random" />
> >
> > <!-- uncomment the following to ignore any fields that don't already
> > match an existing
> > field name or dynamic field, rather than reporting them as an
> > error.
> > alternately, change the type="ignored" to some other type e.g.
> > "text" if you want
> > unknown fields indexed and/or stored by default -->
> > <!--dynamicField name="*" type="ignored" multiValued="true" /-->
> >
> > </fields>
> >
> > <!-- Field to use to determine and enforce document uniqueness.
> > Unless this field is marked with required="false", it will be a
> > required field
> > -->
> > <uniqueKey>id</uniqueKey>
> >
> > <!-- field for the QueryParser to use when an explicit fieldname is absent
> > -->
> > <defaultSearchField>text</defaultSearchField>
> >
> > <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
> > <solrQueryParser defaultOperator="OR"/>
> >
> > <!-- copyField commands copy one field to another at the time a document
> > is added to the index. It's used either to index the same field
> > differently,
> > or to add multiple fields to the same field for easier/faster
> > searching. -->
> >
> > <copyField source="cat" dest="text"/>
> > <copyField source="name" dest="text"/>
> > <copyField source="manu" dest="text"/>
> > <copyField source="features" dest="text"/>
> > <copyField source="includes" dest="text"/>
> > <copyField source="manu" dest="manu_exact"/>
> > <copyField source="publication_text" dest="text" />
> > <!-- Above, multiple source fields are copied to the [text] field.
> > Another way to map multiple source fields to the same
> > destination field is to use the dynamic field syntax.
> > copyField also supports a maxChars to copy setting. -->
> >
> > <copyField source="*_t" dest="text" maxChars="300000000"/>
> >
> > <!-- copy name to alphaNameSort, a field designed for sorting by name
> > -->
> > <!-- <copyField source="name" dest="alphaNameSort"/> -->
> >
> >
> > <!-- Similarity is the scoring routine for each document vs. a query.
> > A custom similarity may be specified here, but the default is fine
> > for most applications. -->
> > <!-- <similarity class="org.apache.lucene.search.DefaultSimilarity"/> -->
> > <!-- ... OR ...
> > Specify a SimilarityFactory class name implementation
> > allowing parameters to be used.
> > -->
> > <!--
> > <similarity class="com.example.solr.CustomSimilarityFactory">
> > <str name="paramkey">param value</str>
> > </similarity>
> > -->
> >
> >
> > </schema>
> >
> >
> >
> >
> >
> > On Sat, Jun 27, 2015 at 11:27 AM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> > > This should be no different in 5.2 than 4.6.
> > >
> > > My first guess is a typo somewhere or some similar forehead-slapper.
> > > Are you sure you're specifying the field in the "fl" list?
> > >
> > > Take a look at the index files, the *.fdt files are where the stored data
> > > goes. You can't look into them, but for the same documents they should
> > > be roughly the same aggregate size as they are in 4.6
> > > 'du -hc *.fdt' will sum them all up for you (*nix).
> > >
> > > Second thing I'd do for sanity check is tail out the Solr log while
> > > indexing and querying, just to see "stuff" go by and see if any
> > > errors are thrown, although it sounds like you wouldn't see
> > > any search results at all if there was something wrong with
> > > indexing.
> > >
> > > And if none of that sheds any light, let's see the schema file?
> > > Maybe the results of adding &debug=all to the query?
> > >
> > > Best,
> > > Erick
> > >
> > > On Fri, Jun 26, 2015 at 8:05 AM, Mark Ehle <marke...@gmail.com> wrote:
> > > > In my schema from 4.6, the text was in the 'text' field, and the "stored"
> > > > attrib was set to "true" as it is in the 5.2 schema. I am ingesting the
> > > > text from files on the server, and it used to work just fine with 4.6. I
> > > > am using the same schema except I had to get rid of the field types pint,
> > > > plong, pfloat, pdouble and pdate. Otherwise, the schema is identical.
> > > >
> > > > How do I tell SOLR 5.2 to store the text from a file to a certain field?
> > > >
> > > > Thanks!
> > > >
> > > >
> > > > On Fri, Jun 26, 2015 at 7:29 AM, Alessandro Benedetti <
> > > > benedetti.ale...@gmail.com> wrote:
> > > >
> > > >> Actually storing or not storing a field is a simple schema.xml
> > > >> configuration.
> > > >> This suggestion may be obvious, but … have you checked that your
> > > >> "stored" attribute is set to "true" for the field you are interested in?
> > > >>
> > > >> I am talking about the 5.2 schema.
> > > >>
> > > >> Cheers
> > > >>
> > > >> 2015-06-26 12:24 GMT+01:00 Mark Ehle <marke...@gmail.com>:
> > > >>
> > > >> > Folks -
> > > >> >
> > > >> > I am using SOLR 4.6 to run a newspaper indexing site we have at the library
> > > >> > I work at. I would like to update to 5.2, and I have an instance of it
> > > >> > running. When I go to index the txt files of each newspaper page, I can
> > > >> > search and find stuff, but there is no text stored any more. I do use
> > > >> > highlighting so I need the text there.
> > > >> >
> > > >> > What would be different about 5.2 that would account for this?
> > > >> >
> > > >> > Thanks!
> > > >> >
> > > >> > Mark Ehle
> > > >> > Computer Support Librarian
> > > >> > Willard Library
> > > >> > Battle Creek, MI
> > > >> >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> --------------------------
> > > >>
> > > >> Benedetti Alessandro
> > > >> Visiting card : http://about.me/alessandro_benedetti
> > > >>
> > > >> "Tyger, tyger burning bright
> > > >> In the forests of the night,
> > > >> What immortal hand or eye
> > > >> Could frame thy fearful symmetry?"
> > > >>
> > > >> William Blake - Songs of Experience -1794 England
> > > >>
> > > >
> > >
> >
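Alessandro's first suggestion, checking the "stored" attribute, can be done mechanically against the schema file posted above. This is a minimal Python sketch using only the standard library; the helper name describe_field and the file name schema.xml are assumptions, not anything Solr provides. It reports only the attributes written on the <field> element itself, so defaults inherited from a fieldType, or edits made through a managed schema, are not reflected.

```python
import xml.etree.ElementTree as ET


def describe_field(schema_xml, name: str) -> dict:
    """Return the declared flags of <field name=...> and the copyField sources targeting it.

    Hypothetical helper. schema_xml may be str or bytes. Only attributes written on
    the <field> element are read; values inherited from its fieldType are not resolved.
    """
    root = ET.fromstring(schema_xml)
    info = {"declared": False, "type": None, "indexed": None,
            "stored": None, "copied_from": []}
    # iter() searches the whole tree, so fields nested under <fields> are found
    for field in root.iter("field"):
        if field.get("name") == name:
            info.update(declared=True, type=field.get("type"),
                        indexed=field.get("indexed"), stored=field.get("stored"))
            break
    # The default parser drops XML comments, so commented-out copyField lines are ignored
    for rule in root.iter("copyField"):
        if rule.get("dest") == name:
            info["copied_from"].append(rule.get("source"))
    return info
```

For example, describe_field(open("schema.xml", "rb").read(), "text") on the schema above should report stored="true" for the catch-all text field named in hl.fl=text, and list publication_text among its copyField sources.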
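Erick's 'du -hc *.fdt' check can also be scripted when comparing the 4.6 and 5.2 index directories side by side. This is a minimal Python sketch, assuming you pass it a core's index directory; the path in the usage line is hypothetical and depends on your Solr home. It sums apparent file sizes, so totals may differ somewhat from du, which reports disk blocks used. Also, segments written in Lucene's compound-file format (.cfs) may keep their stored fields inside that file, in which case neither this nor du counts them.

```python
from pathlib import Path


def stored_field_bytes(index_dir: str) -> int:
    """Sum the sizes of the Lucene stored-field data files (*.fdt) in one index directory."""
    return sum(p.stat().st_size for p in Path(index_dir).glob("*.fdt") if p.is_file())


def human(n: float) -> str:
    """Format a byte count with 1024-based units, loosely like `du -h`."""
    for unit in ("B", "K", "M", "G"):
        if n < 1024:
            return f"{n:.1f}{unit}"
        n /= 1024
    return f"{n:.1f}T"
```

For example, print(human(stored_field_bytes("/path/to/newspapers/data/index"))) run against each server should give roughly comparable totals for the same documents if the text is being stored in both.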
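To compare 4.6 and 5.2 directly, it can help to send both servers the exact highlighting query from Mark's message, with Erick's &debug=all added, and print what comes back in the highlighting section. This is a minimal Python sketch using only the standard library; the base URL, core name and parameter values come from Mark's message, while the helper names are hypothetical. It assumes Solr's usual JSON layout, where "highlighting" is keyed by document id, then by field name, each holding a list of snippet strings; check that against what your server actually returns. It does not diagnose anything, it only makes the two responses easy to compare.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def highlight_url(base: str, query: str, hl_field: str, debug: bool = True) -> str:
    """Build a /select URL with the same parameters as Mark's query, optionally adding debug=all."""
    params = [
        ("q", query), ("fl", "year"), ("wt", "json"), ("indent", "true"),
        ("hl", "true"), ("hl.fl", hl_field),
        ("hl.simple.pre", "<em>"), ("hl.simple.post", "</em>"),
    ]
    if debug:
        params.append(("debug", "all"))
    return f"{base}/select?{urlencode(params)}"


def snippets(response: dict) -> dict:
    """Flatten response['highlighting'] into {doc_id: [snippet, ...]} across all fields."""
    out = {}
    for doc_id, fields in response.get("highlighting", {}).items():
        out[doc_id] = [s for field_snips in fields.values() for s in field_snips]
    return out


def fetch(url: str) -> dict:
    """GET the URL and decode the JSON body (requires a reachable Solr server)."""
    with urlopen(url) as resp:
        return json.load(resp)
```

With debug=False, highlight_url("http://127.0.0.1:8080/solr/newspapers", '"JOHN GRAP"', "text", debug=False) reproduces Mark's URL exactly. In the output of snippets(fetch(...)), a document id with an empty list means the document was matched but no snippet came back for it.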
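Erick's other sanity check, watching the Solr log while indexing and querying, can be approximated with a small follow-and-filter loop. This is a minimal Python sketch of tail -f plus a keyword filter; the log path in the usage line is hypothetical (it depends on how Solr was started), and the keywords ERROR, WARN and Exception are an assumption about what is worth flagging. It polls rather than using file-system notifications, and a line that is still being written may be returned in two pieces.

```python
import os
import time
from typing import Iterable, Iterator

# Assumed keywords; adjust to whatever your Solr log format uses
PATTERNS = ("ERROR", "WARN", "Exception")


def follow(path: str, poll: float = 0.5) -> Iterator[str]:
    """Yield lines appended to a log file after this call, like `tail -f`."""
    handle = open(path, encoding="utf-8", errors="replace")
    handle.seek(0, os.SEEK_END)  # start at the current end, skipping older entries

    def _lines() -> Iterator[str]:
        with handle:
            while True:
                line = handle.readline()
                if line:
                    yield line.rstrip("\n")
                else:
                    time.sleep(poll)  # nothing new yet; wait and retry
    return _lines()


def problems(lines: Iterable[str]) -> Iterator[str]:
    """Keep only lines that look like warnings, errors or stack traces."""
    return (line for line in lines if any(p in line for p in PATTERNS))
```

Start it in one terminal with for line in problems(follow("/path/to/solr/server/logs/solr.log")): print(line), then rerun the extract request and the highlighting query in another.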