Hi Alex,

What do you mean by wrong case? Could you tell me what I should do?
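For reference, 'wrong case' most likely refers to the element name. In the schema quoted below, the `phonetic`, `payloads` and `ignored` types are declared with a lowercase `<fieldtype>` element, while every other type uses camel-case `<fieldType>`.

The sketch below shows those three declarations with only the element name changed. It assumes that this is the change Alex has in mind. Some Solr versions also accept the lowercase spelling, so treat this as a consistency fix to try rather than a confirmed cure for the `unknown field` error.

```xml
<!-- The same three types as in the quoted schema; only the element name changes
     from the lowercase spelling to camel case. These go inside <types>...</types>. -->
<fieldType name="phonetic" stored="false" indexed="true" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>

<fieldType name="payloads" stored="false" indexed="true" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
  </analyzer>
</fieldType>

<!-- Neither stored nor indexed, so data sent to fields of this type is dropped. -->
<fieldType name="ignored" stored="false" indexed="false" multiValued="true"
           class="solr.StrField"/>
```

Replace the lowercase versions with these, then restart Solr so the edited schema.xml is actually loaded.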
2013/4/25 Alexandre Rafalovitch <arafa...@gmail.com>

> You still seem to have 'fieldtype' with the wrong case. Can you try that
> simple thing before doing other complicated steps? And yes, restart
> Solr after you change schema.xml.
>
> Regards,
> Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
> book)
>
>
> On Wed, Apr 24, 2013 at 6:50 PM, Furkan KAMACI <furkankam...@gmail.com>
> wrote:
> > Here is my handler definition:
> >
> > <requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
> > <lst name="defaults">
> > <str name="fmap.content">text</str>
> > <str name="lowernames">true</str>
> > <str name="uprefix">attr_</str>
> > <str name="captureAttr">true</str>
> > </lst>
> > </requestHandler>
> >
> >
> > 2013/4/25 Furkan KAMACI <furkankam...@gmail.com>
> >
> >> I just want to search rich documents, but I still get the same error. I
> >> have copied the example folder to another location on my computer, and I
> >> have copied the dist and contrib folders from my build folder into that
> >> copy of the example folder (because solr-cell etc. are within those
> >> folders). However, I still get the same error. Any help is welcome. Here is
> >> my schema:
> >>
> >>
> >> <?xml version="1.0" encoding="UTF-8" ?>
> >> <!--
> >> Licensed to the Apache Software Foundation (ASF) under one or more
> >> contributor license agreements. See the NOTICE file distributed with
> >> this work for additional information regarding copyright ownership.
> >> The ASF licenses this file to You under the Apache License, Version 2.0
> >> (the "License"); you may not use this file except in compliance with
> >> the License.
> >> You may obtain a copy of the License at
> >>
> >> http://www.apache.org/licenses/LICENSE-2.0
> >>
> >> Unless required by applicable law or agreed to in writing, software
> >> distributed under the License is distributed on an "AS IS" BASIS,
> >> WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> >> See the License for the specific language governing permissions and
> >> limitations under the License.
> >> -->
> >> <!--
> >> Description: This document contains Solr 4.x schema definition to
> >> be used with Solr integration currently built into Nutch.
> >> This schema is not minimal, there are some useful field type definitions left,
> >> and the set of fields and their flags (indexed/stored/term vectors) can be
> >> further optimized depending on needs. See
> >> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/schema.xml?view=markup
> >> for more info.
> >> -->
> >>
> >> <schema name="nutch" version="1.5">
> >>
> >> <types>
> >>
> >> <!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
> >> <fieldType name="string" class="solr.StrField" sortMissingLast="true"
> >> omitNorms="true"/>
> >>
> >>
> >> <!--
> >> Default numeric field types. For faster range queries, consider the
> >> tint/tfloat/tlong/tdouble types.
> >> -->
> >> <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
> >> omitNorms="true" positionIncrementGap="0"/>
> >> <fieldType name="float" class="solr.TrieFloatField" precisionStep="0"
> >> omitNorms="true" positionIncrementGap="0"/>
> >> <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
> >> omitNorms="true" positionIncrementGap="0"/>
> >> <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0"
> >> omitNorms="true" positionIncrementGap="0"/>
> >>
> >> <!--
> >> Numeric field types that index each value at various levels of precision
> >> to accelerate range queries when the number of values between the range
> >> endpoints is large. See the javadoc for NumericRangeQuery for internal
> >> implementation details.
> >>
> >> Smaller precisionStep values (specified in bits) will lead to more tokens
> >> indexed per value, slightly larger index size, and faster range queries.
> >> A precisionStep of 0 disables indexing at different precision levels.
> >> -->
> >> <fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
> >> omitNorms="true" positionIncrementGap="0"/>
> >> <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8"
> >> omitNorms="true" positionIncrementGap="0"/>
> >> <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8"
> >> omitNorms="true" positionIncrementGap="0"/>
> >> <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8"
> >> omitNorms="true" positionIncrementGap="0"/>
> >>
> >> <!-- The format for this date field is of the form 1995-12-31T23:59:59Z,
> >> and is a more restricted form of the canonical representation of dateTime
> >> http://www.w3.org/TR/xmlschema-2/#dateTime
> >> The trailing "Z" designates UTC time and is mandatory.
> >> Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z
> >> All other components are mandatory.
> >>
> >> Expressions can also be used to denote calculations that should be
> >> performed relative to "NOW" to determine the value, ie...
> >>
> >> NOW/HOUR
> >> ... Round to the start of the current hour
> >> NOW-1DAY
> >> ... Exactly 1 day prior to now
> >> NOW/DAY+6MONTHS+3DAYS
> >> ... 6 months and 3 days in the future from the start of
> >> the current day
> >>
> >> Consult the DateField javadocs for more information.
> >>
> >> Note: For faster range queries, consider the tdate type
> >> -->
> >> <fieldType name="date" class="solr.TrieDateField" omitNorms="true"
> >> precisionStep="0" positionIncrementGap="0"/>
> >>
> >> <!-- A Trie based date field for faster date range queries and date
> >> faceting. -->
> >> <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
> >> precisionStep="6" positionIncrementGap="0"/>
> >>
> >>
> >> <!-- solr.TextField allows the specification of custom text analyzers
> >> specified as a tokenizer and a list of token filters. Different
> >> analyzers may be specified for indexing and querying.
> >>
> >> The optional positionIncrementGap puts space between multiple fields of
> >> this type on the same document, with the purpose of preventing false phrase
> >> matching across fields.
> >>
> >> For more info on customizing your analyzer chain, please see
> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> >> -->
> >>
> >> <!-- A general text field that has reasonable, generic
> >> cross-language defaults: it tokenizes with StandardTokenizer,
> >> removes stop words from case-insensitive "stopwords.txt"
> >> (empty by default), and down cases. At query time only, it
> >> also applies synonyms.
> >> -->
> >> <fieldType name="text_general" class="solr.TextField"
> >> positionIncrementGap="100">
> >> <analyzer type="index">
> >> <tokenizer class="solr.StandardTokenizerFactory"/>
> >> <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> words="stopwords.txt" enablePositionIncrements="true" />
> >> <!-- in this example, we will only use synonyms at query time
> >> <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
> >> ignoreCase="true" expand="false"/>
> >> -->
> >> <filter class="solr.LowerCaseFilterFactory"/>
> >> </analyzer>
> >> <analyzer type="query">
> >> <tokenizer class="solr.StandardTokenizerFactory"/>
> >> <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> words="stopwords.txt" enablePositionIncrements="true" />
> >> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >> ignoreCase="true" expand="true"/>
> >> <filter class="solr.LowerCaseFilterFactory"/>
> >> </analyzer>
> >> </fieldType>
> >>
> >> <!-- A text field with defaults appropriate for English: it
> >> tokenizes with StandardTokenizer, removes English stop words
> >> (stopwords.txt), down cases, protects words from protwords.txt, and
> >> finally applies Porter's stemming. The query time analyzer
> >> also applies synonyms from synonyms.txt. -->
> >> <fieldType name="text_en" class="solr.TextField"
> >> positionIncrementGap="100">
> >> <analyzer type="index">
> >> <tokenizer class="solr.StandardTokenizerFactory"/>
> >> <!-- in this example, we will only use synonyms at query time
> >> <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
> >> ignoreCase="true" expand="false"/>
> >> -->
> >> <!-- Case insensitive stop word removal.
> >> add enablePositionIncrements=true in both the index and query
> >> analyzers to leave a 'gap' for more accurate phrase queries.
> >> -->
> >> <filter class="solr.StopFilterFactory"
> >> ignoreCase="true"
> >> words="stopwords.txt"
> >> enablePositionIncrements="true"
> >> />
> >> <filter class="solr.LowerCaseFilterFactory"/>
> >> <filter class="solr.EnglishPossessiveFilterFactory"/>
> >> <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> >> <!-- Optionally you may want to use this less aggressive stemmer instead
> >> of PorterStemFilterFactory:
> >> <filter class="solr.EnglishMinimalStemFilterFactory"/>
> >> -->
> >> <filter class="solr.PorterStemFilterFactory"/>
> >> </analyzer>
> >> <analyzer type="query">
> >> <tokenizer class="solr.StandardTokenizerFactory"/>
> >> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >> ignoreCase="true" expand="true"/>
> >> <filter class="solr.StopFilterFactory"
> >> ignoreCase="true"
> >> words="stopwords.txt"
> >> enablePositionIncrements="true"
> >> />
> >> <filter class="solr.LowerCaseFilterFactory"/>
> >> <filter class="solr.EnglishPossessiveFilterFactory"/>
> >> <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> >> <!-- Optionally you may want to use this less aggressive stemmer instead
> >> of PorterStemFilterFactory:
> >> <filter class="solr.EnglishMinimalStemFilterFactory"/>
> >> -->
> >> <filter class="solr.PorterStemFilterFactory"/>
> >> </analyzer>
> >> </fieldType>
> >>
> >> <!-- A text field with defaults appropriate for English, plus
> >> aggressive word-splitting and autophrase features enabled.
> >> This field is just like text_en, except it adds
> >> WordDelimiterFilter to enable splitting and matching of
> >> words on case-change, alpha numeric boundaries, and
> >> non-alphanumeric chars. This means certain compound word
> >> cases will work, for example query "wi fi" will match
> >> document "WiFi" or "wi-fi".
> >> However, other cases will still
> >> not match, for example if the query is "wifi" and the
> >> document is "wi fi" or if the query is "wi-fi" and the
> >> document is "wifi".
> >> -->
> >> <fieldType name="text_en_splitting" class="solr.TextField"
> >> positionIncrementGap="100" autoGeneratePhraseQueries="true">
> >> <analyzer type="index">
> >> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> <!-- in this example, we will only use synonyms at query time
> >> <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
> >> ignoreCase="true" expand="false"/>
> >> -->
> >> <!-- Case insensitive stop word removal.
> >> add enablePositionIncrements=true in both the index and query
> >> analyzers to leave a 'gap' for more accurate phrase queries.
> >> -->
> >> <filter class="solr.StopFilterFactory"
> >> ignoreCase="true"
> >> words="stopwords.txt"
> >> enablePositionIncrements="true"
> >> />
> >> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >> catenateAll="0" splitOnCaseChange="1"/>
> >> <filter class="solr.LowerCaseFilterFactory"/>
> >> <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> >> <filter class="solr.PorterStemFilterFactory"/>
> >> </analyzer>
> >> <analyzer type="query">
> >> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >> ignoreCase="true" expand="true"/>
> >> <filter class="solr.StopFilterFactory"
> >> ignoreCase="true"
> >> words="stopwords.txt"
> >> enablePositionIncrements="true"
> >> />
> >> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> >> catenateAll="0" splitOnCaseChange="1"/>
> >> <filter class="solr.LowerCaseFilterFactory"/>
> >> <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> >> <filter class="solr.PorterStemFilterFactory"/>
> >> </analyzer>
> >> </fieldType>
> >>
> >> <!-- Less flexible matching, but fewer false matches. Probably not ideal
> >> for product names, but may be good for SKUs. Can insert dashes in the
> >> wrong place and still match. -->
> >> <fieldType name="text_en_splitting_tight" class="solr.TextField"
> >> positionIncrementGap="100" autoGeneratePhraseQueries="true">
> >> <analyzer>
> >> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >> ignoreCase="true" expand="false"/>
> >> <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> words="stopwords.txt"/>
> >> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
> >> generateNumberParts="0" catenateWords="1" catenateNumbers="1"
> >> catenateAll="0"/>
> >> <filter class="solr.LowerCaseFilterFactory"/>
> >> <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> >> <filter class="solr.EnglishMinimalStemFilterFactory"/>
> >> <!-- this filter can remove any duplicate tokens that appear at the same
> >> position - sometimes possible with WordDelimiterFilter in conjunction
> >> with stemming. -->
> >> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >> </analyzer>
> >> </fieldType>
> >>
> >> <!-- Just like text_general except it reverses the characters of
> >> each token, to enable more efficient leading wildcard queries.
> >> -->
> >> <fieldType name="text_general_rev" class="solr.TextField"
> >> positionIncrementGap="100">
> >> <analyzer type="index">
> >> <tokenizer class="solr.StandardTokenizerFactory"/>
> >> <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> words="stopwords.txt" enablePositionIncrements="true" />
> >> <filter class="solr.LowerCaseFilterFactory"/>
> >> <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
> >> maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
> >> </analyzer>
> >> <analyzer type="query">
> >> <tokenizer class="solr.StandardTokenizerFactory"/>
> >> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >> ignoreCase="true" expand="true"/>
> >> <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> words="stopwords.txt" enablePositionIncrements="true" />
> >> <filter class="solr.LowerCaseFilterFactory"/>
> >> </analyzer>
> >> </fieldType>
> >>
> >> <fieldtype name="phonetic" stored="false" indexed="true"
> >> class="solr.TextField" >
> >> <analyzer>
> >> <tokenizer class="solr.StandardTokenizerFactory"/>
> >> <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
> >> </analyzer>
> >> </fieldtype>
> >>
> >> <fieldtype name="payloads" stored="false" indexed="true"
> >> class="solr.TextField" >
> >> <analyzer>
> >> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> <!--
> >> The DelimitedPayloadTokenFilter can put payloads on tokens... for example,
> >> a token of "foo|1.4" would be indexed as "foo" with a payload of 1.4f
> >> Attributes of the DelimitedPayloadTokenFilterFactory :
> >> "delimiter" - a one character delimiter. Default is | (pipe)
> >> "encoder" - how to encode the following value into a payload
> >> float -> org.apache.lucene.analysis.payloads.FloatEncoder,
> >> integer -> o.a.l.a.p.IntegerEncoder
> >> identity -> o.a.l.a.p.IdentityEncoder
> >> Fully Qualified class name implementing PayloadEncoder, Encoder must have
> >> a no arg constructor.
> >> -->
> >> <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
> >> </analyzer>
> >> </fieldtype>
> >>
> >> <!-- lowercases the entire field value, keeping it as a single token. -->
> >> <fieldType name="lowercase" class="solr.TextField"
> >> positionIncrementGap="100">
> >> <analyzer>
> >> <tokenizer class="solr.KeywordTokenizerFactory"/>
> >> <filter class="solr.LowerCaseFilterFactory" />
> >> </analyzer>
> >> </fieldType>
> >>
> >> <fieldType name="url" class="solr.TextField" positionIncrementGap="100">
> >> <analyzer>
> >> <tokenizer class="solr.StandardTokenizerFactory"/>
> >> <filter class="solr.LowerCaseFilterFactory"/>
> >> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >> generateNumberParts="1"/>
> >> </analyzer>
> >> </fieldType>
> >>
> >>
> >> <fieldType name="text_path" class="solr.TextField"
> >> positionIncrementGap="100">
> >> <analyzer>
> >> <tokenizer class="solr.PathHierarchyTokenizerFactory"/>
> >> </analyzer>
> >> </fieldType>
> >>
> >> <!-- since fields of this type are by default not stored or indexed,
> >> any data added to them will be ignored outright. -->
> >> <fieldtype name="ignored" stored="false" indexed="false"
> >> multiValued="true" class="solr.StrField" />
> >>
> >> </types>
> >>
> >> <fields>
> >> <field name="id" type="string" indexed="true" stored="true"
> >> required="true" multiValued="false" />
> >> <field name="text" type="text_general" indexed="true" stored="true"/>
> >> <dynamicField name="attr_*" type="text_general" indexed="true"
> >> stored="true" multiValued="true"/>
> >> <!-- Common metadata fields, named specifically to match up with
> >> SolrCell metadata when parsing rich documents such as Word, PDF.
> >> Some fields are multiValued only because Tika currently may return
> >> multiple values for them.
> >> Some metadata is parsed from the documents,
> >> but there are some which come from the client context:
> >> "content_type": From the HTTP headers of incoming stream
> >> "resourcename": From SolrCell request param resource.name
> >> -->
> >> <field name="title" type="text_general" indexed="true" stored="true"
> >> multiValued="true"/>
> >> <field name="subject" type="text_general" indexed="true" stored="true"/>
> >> <field name="description" type="text_general" indexed="true"
> >> stored="true"/>
> >> <field name="comments" type="text_general" indexed="true" stored="true"/>
> >> <field name="author" type="text_general" indexed="true" stored="true"/>
> >> <field name="keywords" type="text_general" indexed="true" stored="true"/>
> >> <field name="category" type="text_general" indexed="true" stored="true"/>
> >> <field name="resourcename" type="text_general" indexed="true"
> >> stored="true"/>
> >> <field name="url" type="text_general" indexed="true" stored="true"/>
> >> <field name="content_type" type="string" indexed="true" stored="true"
> >> multiValued="true"/>
> >> <field name="last_modified" type="date" indexed="true" stored="true"/>
> >> <field name="links" type="string" indexed="true" stored="true"
> >> multiValued="true"/>
> >>
> >> <!-- Main body of document extracted by SolrCell.
> >> NOTE: This field is not indexed by default, since it is also copied to "text"
> >> using copyField below. This is to save space. Use this field for returning and
> >> highlighting document content. Use the "text" field to search the content.
> >> -->
> >> <field name="content" type="text_general" indexed="false" stored="true"
> >> multiValued="true"/>
> >>
> >> <field name="_version_" type="long" indexed="true" stored="true"/>
> >>
> >> <dynamicField name="*" type="ignored" multiValued="true"/>
> >> </fields>
> >>
> >> <uniqueKey>id</uniqueKey>
> >> <defaultSearchField>text</defaultSearchField>
> >> <solrQueryParser defaultOperator="OR"/>
> >>
> >> <!-- Text fields from SolrCell to search by default in our catch-all field
> >> -->
> >> <copyField source="title" dest="text"/>
> >> <copyField source="author" dest="text"/>
> >> <copyField source="description" dest="text"/>
> >> <copyField source="keywords" dest="text"/>
> >> <copyField source="content" dest="text"/>
> >> <copyField source="content_type" dest="text"/>
> >> <copyField source="resourcename" dest="text"/>
> >> <copyField source="url" dest="text"/>
> >>
> >> <!-- Create a string version of author for faceting -->
> >> <copyField source="author" dest="author_s"/>
> >>
> >> </schema>
> >>
> >>
> >> 2013/4/25 Erik Hatcher <erik.hatc...@gmail.com>
> >>
> >>> Did you restart after adding those fields and types?
> >>>
> >>> On Apr 24, 2013, at 16:59, Furkan KAMACI <furkankam...@gmail.com> wrote:
> >>>
> >>> > I have added these fields:
> >>> >
> >>> > <field name="text" type="text_general" indexed="true" stored="true"/>
> >>> > <dynamicField name="attr_*" type="text_general" indexed="true"
> >>> > stored="true" multiValued="true"/>
> >>> > <dynamicField name="ignored_*" type="ignored"/>
> >>> >
> >>> > and I have this definition:
> >>> >
> >>> > <fieldtype name="ignored" stored="false" indexed="false" multiValued="true"
> >>> > class="solr.StrField" />
> >>> >
> >>> > Here is my error:
> >>> >
> >>> > <?xml version="1.0" encoding="UTF-8"?>
> >>> > <response>
> >>> > <lst name="responseHeader">
> >>> > <int name="status">400</int>
> >>> > <int name="QTime">4154</int>
> >>> > </lst>
> >>> > <lst name="error">
> >>> > <str name="msg">ERROR: [doc=1] unknown field 'ignored_meta'</str>
> >>> > <int name="code">400</int>
> >>> > </lst>
> >>> > </response>
> >>> >
> >>> > What else should I do?
> >>> >
> >>> > 2013/4/24 Erik Hatcher <erik.hatc...@gmail.com>
> >>> >
> >>> >> Also, at Solr startup time it logs what it loads from those <lib>
> >>> >> elements, so you can see whether it is loading the files you intend to
> >>> >> or not.
> >>> >>
> >>> >> Erik
> >>> >>
> >>> >> On Apr 24, 2013, at 10:05, Alexandre Rafalovitch wrote:
> >>> >>
> >>> >>> Have you tried using absolute paths for the relevant URLs? That will
> >>> >>> cleanly split the problem into 'still not working' and 'wrong relative
> >>> >>> path'.
> >>> >>>
> >>> >>> Regards,
> >>> >>> Alex.
> >>> >>> On Wed, Apr 24, 2013 at 9:02 AM, Furkan KAMACI <furkankam...@gmail.com>
> >>> >>> wrote:
> >>> >>>> <lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
> >>> >>>> <lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />
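Alex's suggestion of absolute paths can be sketched as below. The comments in the Solr 4.x example solrconfig.xml state that `<lib>` directories are resolved relative to the core's instanceDir.

Assume the default `example/solr/collection1` instance directory. In that case, `../../../` points one level above the copied example folder. The `dist` and `contrib` copies described earlier in the thread, however, sit inside that folder. Absolute paths sidestep the question entirely.

`/path/to/example-copy` is a hypothetical placeholder for the real location. The `regex` values are unchanged from the quoted lines.

```xml
<!-- solrconfig.xml: the two <lib> directives quoted above, rewritten with absolute
     directories. /path/to/example-copy is a HYPOTHETICAL placeholder for the copied
     example folder that now contains dist/ and contrib/; replace it with the real path. -->
<lib dir="/path/to/example-copy/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="/path/to/example-copy/dist/" regex="solr-cell-\d.*\.jar" />
```

After a restart, the startup log should show which jars were picked up from each directory, as Erik notes. If the jars load and the same error persists, the relative paths were not the problem.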