Re: Taking a full text, then truncate and duplicate with stopwords

Jack Krupansky Mon, 17 Sep 2012 14:09:01 -0700

You're getting the hang of it. There is no particular location for copyField, just not within "fields" or "types". Putting them after your fields makes sense. See the Solr example schema.
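
As a rough sketch of that placement, assuming the Solr 3.x/4.0-style schema.xml layout with "types" and "fields" wrapper elements: the schema name and version below are placeholder values, and the copyField line is the one from the question.

```xml
<schema name="example" version="1.5">
  <types>
    <!-- fieldType definitions such as keyword_description go here -->
  </types>

  <fields>
    <!-- field declarations go here -->
  </fields>

  <!-- copyField sits directly under schema, outside both blocks above -->
  <copyField source="keyword_description" dest="truncated_description"
             maxChars="3000"/>
</schema>
```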


-- Jack Krupansky

-----Original Message-----
From: Spadez

Sent: Monday, September 17, 2012 4:47 PM
To: solr-user@lucene.apache.org
Subject: Re: Taking a full text, then truncate and duplicate with stopwords

Ah, OK, this is news to me and makes a lot more sense. Let me just run this
back past you to make sure I understand.

If I move my fulltext document from my SQL database to "keyword_document", it
will contain the original fulltext in the source, but the index will have
the stopword filter, lowercase filter, etc. applied. Then, by copying this to
"truncated_document", it is the original source that is being copied?

*This is my definition for keyword_description, using the stopwords.txt*
<fieldType name="keyword_description" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

*Then this to do the copying across. Is there somewhere specific to put this
within the schema.xml?*
<copyField source="keyword_description" dest="truncated_description" maxChars="3000"/>

*Then do I need to have definitions for the "truncated description" in the
same way that I did for "keyword_description"?*
<fieldType name="truncated_description" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>



Jack Krupansky-2 wrote


You said "it has been copied from the keyword_document [field]", but the
reality is that Solr is not copying from the indexed value of the field,
but from the source value for the field. The idea is that multiple fields
can be based on the same source value even if they analyze and index the
value in different ways.
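
A minimal sketch of how that could look in schema.xml, assuming fields named after the two field types discussed in this thread. The indexed/stored choices are illustrative assumptions, not settings taken from the original posts.

```xml
<!-- Inside the fields block: the field loaded from SQL. It is indexed
     through the stopword analyzer but not stored, so the full text is
     not kept in Solr. -->
<field name="keyword_description" type="keyword_description"
       indexed="true" stored="false"/>

<!-- The copy target. It receives the raw source text, stopwords
     included, cut to maxChars before its own analyzer runs. -->
<field name="truncated_description" type="truncated_description"
       indexed="true" stored="true"/>

<!-- After the fields block. maxChars counts characters, not words. -->
<copyField source="keyword_description" dest="truncated_description"
           maxChars="3000"/>
```

Because the copy is taken from the source value, the stored truncated_description still contains words like "and", "it" and "this", even though keyword_description drops them at index time.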

-- Jack Krupansky

-----Original Message-----
From: Spadez

Sent: Monday, September 17, 2012 12:29 PM
To: solr-user@.apache
Subject: Re: Taking a full text, then truncate and duplicate with stopwords

I'm really confused here. I have a document which is, say, 4000 words long.
I want to get this put into two fields in Solr without having to save the
original document in its entirety within Solr.

When I import my fulltext (4000 word) document to Solr I was going to put
it straight into keyword_document, which uses stopwords to remove words like
"and", "it" and "this". Now I only have 3000 words, for example.

Then if I do a copy command to move it into truncate_document, then even
though I can reduce it down to, say, 100 words, it is lacking words like
"and", "it" and "this" because it has been copied from the keyword_document.

I want the following scenario:

truncate_document to have 100 words, including words like "and", "it" and
"this"
keyword_document to have only stop words removed
And finally, the fulltext document, full length and with all stop words,
to exist only in my SQL database.

--
View this message in context:
http://lucene.472066.n3.nabble.com/Taking-a-full-text-then-truncate-and-duplicate-with-stopwords-tp4008269p4008380.html
Sent from the Solr - User mailing list archive at Nabble.com.


--
View this message in context:
http://lucene.472066.n3.nabble.com/Taking-a-full-text-then-truncate-and-duplicate-with-stopwords-tp4008269p4008459.html
Sent from the Solr - User mailing list archive at Nabble.com.
