Re: Taking a full text, then truncate and duplicate with stopwords

Spadez Mon, 17 Sep 2012 13:48:23 -0700

Ah, OK, this is news to me and makes a lot more sense. Let me just run this
back past you to make sure I understand.

If I move my fulltext document from my SQL database to "keyword_document", it
will contain the original fulltext in the source, but the index will have
the stopword filter, lowercase filter etc. applied. Then, when I copy this to
"truncated_document", it is the original source that gets copied, not the
filtered version?

*This is my definition for keyword_description, using the stopwords.txt*
<fieldType name="keyword_description" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
            maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

*Then this to do the copying across. Is there somewhere specific to put this
within the schema.xml?*
<copyField source="keyword_description" dest="truncated_description"
maxChars="3000"/>

*Then do I need to have a definition for "truncated_description" in the
same way that I did for "keyword_description"?*
<fieldType name="truncated_description" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
            maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
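
For reference, here is a rough sketch of how these pieces might sit together in
schema.xml. The two <field> declarations, their indexed/stored attributes and
the schema name/version are assumptions made for illustration, not taken from
the real schema. The layout follows the <types>/<fields> arrangement used by
the example schemas of this Solr generation, where copyField lines sit at the
top level next to the field declarations rather than inside <types>.

```xml
<!-- Sketch only: schema name/version and both <field> declarations are
     assumptions for illustration. -->
<schema name="example" version="1.5">
  <types>
    <!-- The keyword_description and truncated_description fieldType
         definitions from this message would go here. -->
  </types>

  <fields>
    <!-- Receives the full text from SQL. Indexed through the stopword
         analyzer; stored="false" so the full document is not kept in Solr. -->
    <field name="keyword_description" type="keyword_description"
           indexed="true" stored="false"/>

    <!-- Receives a truncated copy of the same source text. -->
    <field name="truncated_description" type="truncated_description"
           indexed="true" stored="true"/>
  </fields>

  <!-- source and dest are field names, not fieldType names. The copy is
       taken from the incoming source value, before any analysis. -->
  <copyField source="keyword_description" dest="truncated_description"
             maxChars="3000"/>
</schema>
```

Two things to keep in mind with this sketch: maxChars limits the copy by
characters rather than words, so "100 words" can only be approximated, and the
stored value of a field is the raw text it received, so stopwords stay in the
stored truncated_description whatever its analyzer does.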



Jack Krupansky-2 wrote
> 
> You said "it has been copied from the keyword_document [field]", but the
> reality is that Solr is not copying from the indexed value of the field, but
> from the source value for the field. The idea is that multiple fields can be
> based on the same source value even if they analyze and index the value in
> different ways.
> 
> -- Jack Krupansky
> 
> -----Original Message----- 
> From: Spadez
> Sent: Monday, September 17, 2012 12:29 PM
> To: solr-user@.apache
> Subject: Re: Taking a full text, then truncate and duplicate with stopwords
> 
> I'm really confused here. I have a document which is, say, 4000 words long. I
> want to get this put into two fields in Solr without having to save the
> original document in its entirety within Solr.
> 
> When I import my fulltext (4000 word) document to Solr I was going to put it
> straight into keyword_document, which uses stopwords to remove words like
> "and", "it" and "this". Now I only have 3000 words, for example.
> 
> Then if I do a copy command to move it into truncate_document, then even
> though I can reduce it down to, say, 100 words, it is lacking words like
> "and", "it" and "this" because it has been copied from the keyword_document.
> 
> I want the following scenario:
> 
> truncate_document to have 100 words including words like "and", "it" and
> "this"
> keyword_document to have only stop words removed
> And finally only have the fulltext document, full length and all stop words,
> exist in my SQL database.
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Taking-a-full-text-then-truncate-and-duplicate-with-stopwords-tp4008269p4008380.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Taking-a-full-text-then-truncate-and-duplicate-with-stopwords-tp4008269p4008459.html
Sent from the Solr - User mailing list archive at Nabble.com.
