I suspect the following should do for (1). I'm just not sure how file references such as stopInit.put("words", "stopwords.txt") get resolved. Seeing the code for (2) should clarify that.

1)
    class SchemaAnalyzer extends Analyzer {

        @Override
        public TokenStream tokenStream(String fieldName, Reader reader) {
            // Stop filter: case-insensitive, words taken from stopwords.txt.
            HashMap<String, String> stopInit = new HashMap<String, String>();
            stopInit.put("words", "stopwords.txt");
            stopInit.put("ignoreCase", Boolean.TRUE.toString());
            StopFilterFactory stopFilterFactory = new StopFilterFactory();
            stopFilterFactory.init(stopInit);

            // Word delimiter filter: split and catenate word and number parts.
            final HashMap<String, String> wordDelimInit = new HashMap<String, String>();
            wordDelimInit.put("generateWordParts", "1");
            wordDelimInit.put("generateNumberParts", "1");
            wordDelimInit.put("catenateWords", "1");
            wordDelimInit.put("catenateNumbers", "1");
            wordDelimInit.put("catenateAll", "0");
            wordDelimInit.put("splitOnCaseChange", "1");
            WordDelimiterFilterFactory wordDelimiterFilterFactory = new WordDelimiterFilterFactory();
            wordDelimiterFilterFactory.init(wordDelimInit);

            // Porter stemmer: words listed in protwords.txt are not stemmed.
            HashMap<String, String> porterInit = new HashMap<String, String>();
            porterInit.put("protected", "protwords.txt");
            EnglishPorterFilterFactory englishPorterFilterFactory = new EnglishPorterFilterFactory();
            englishPorterFilterFactory.init(porterInit);

            // Chain, innermost first: whitespace tokenizer, stop filter,
            // word delimiter, lower case, Porter stemmer, duplicate removal.
            return new RemoveDuplicatesTokenFilter(
                englishPorterFilterFactory.create(
                    new LowerCaseFilter(
                        wordDelimiterFilterFactory.create(
                            stopFilterFactory.create(
                                new WhitespaceTokenizer(reader))))));
        }
    }

On Tue, Jul 5, 2011 at 1:00 PM, Gabriele Kahlout <gabri...@mysimpatico.com> wrote:

> nice...where?
>
> I'm trying to figure out 2 things:
> 1) How to create an analyzer that corresponds to the one in the schema.xml.
>
> <analyzer>
>   <tokenizer class="solr.StandardTokenizerFactory"/>
>   <filter class="solr.LowerCaseFilterFactory"/>
>   <filter class="solr.WordDelimiterFilterFactory"
>           generateWordParts="1" generateNumberParts="1"/>
> </analyzer>
>
> 2) I'd like to see the code that creates it reading it from schema.xml.
>
> On Tue, Jul 5, 2011 at 12:33 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
>
>> No.
>> SolrJ only builds input docs from NutchDocument objects. Solr will do
>> analysis. The integration is analogous to XML post of Solr documents.
>>
>> On Tuesday 05 July 2011 12:28:21 Gabriele Kahlout wrote:
>> > Hello,
>> >
>> > I'm trying to understand better Nutch and Solr integration. My
>> > understanding is that Documents are added to Solr index from SolrWriter's
>> > write(NutchDocument doc) method. But does it make any use of the
>> > WhitespaceTokenizerFactory?
>>
>> --
>> Markus Jelsma - CTO - Openindex
>> http://www.linkedin.com/in/markus17
>> 050-8536620 / 06-50258350

--
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
time(x) < Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).
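
As a starting point for (2), the analyzer chain in the quoted schema.xml fragment can at least be read out with the JDK's DOM parser alone. This is only a sketch and not Solr's own schema-loading code: the class name AnalyzerChainDump and the method readChain are made up for illustration, the fragment is inlined as a string rather than read from a real schema.xml, and the "solr." class names are only listed, not resolved to factory classes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AnalyzerChainDump {

    // The <analyzer> fragment quoted in this thread, inlined for the sketch.
    static final String SAMPLE =
          "<analyzer>"
        + "<tokenizer class=\"solr.StandardTokenizerFactory\"/>"
        + "<filter class=\"solr.LowerCaseFilterFactory\"/>"
        + "<filter class=\"solr.WordDelimiterFilterFactory\""
        + " generateWordParts=\"1\" generateNumberParts=\"1\"/>"
        + "</analyzer>";

    // Returns one line per tokenizer/filter element, in document order:
    // "<element name> <class attribute> {remaining attributes, sorted by name}".
    static List<String> readChain(String xml) {
        Element analyzer;
        try {
            analyzer = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse analyzer XML", e);
        }
        List<String> chain = new ArrayList<String>();
        NodeList children = analyzer.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes and comments
            }
            String factory = null;
            // These are the key/value pairs one would hand to a factory's init(Map).
            Map<String, String> initArgs = new TreeMap<String, String>();
            NamedNodeMap attrs = child.getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                Node attr = attrs.item(j);
                if ("class".equals(attr.getNodeName())) {
                    factory = attr.getNodeValue();
                } else {
                    initArgs.put(attr.getNodeName(), attr.getNodeValue());
                }
            }
            chain.add(child.getNodeName() + " " + factory + " " + initArgs);
        }
        return chain;
    }

    public static void main(String[] args) {
        for (String step : readChain(SAMPLE)) {
            System.out.println(step);
        }
    }
}
```

The non-class attributes collected per element are the same kind of key/value pairs passed to init(...) in the SchemaAnalyzer code for (1), so the output shows which map each factory would need.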