The Term Vector Component (TVC) is a SearchComponent designed to return
information about documents that is stored when setting the termVector
attribute on a field:

Will I have to re-index after adding that to the schema?
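
For anyone following along, the distinction behind this thread can be sketched in plain Python. This is only an illustration of the idea, not Lucene or Solr code: the function names (`build_indexes`, `terms_for_doc`, `word_count`), the sample documents, and the whitespace tokenizer are all made up for the example. It puts an inverted index (term -> documents) next to a per-document term vector (document -> terms), and shows why the word count is the sum of tf rather than the number of distinct terms.

```python
from collections import Counter, defaultdict


def build_indexes(docs):
    """Build a toy inverted index and per-document term vectors.

    docs is a {doc_id: text} mapping (hypothetical input format).
    """
    inverted = defaultdict(dict)  # term -> {doc_id: tf}
    term_vectors = {}             # doc_id -> Counter(term -> tf)
    for doc_id, text in docs.items():
        # Assumption: lowercasing + whitespace split stands in for a real analyzer.
        tf = Counter(text.lower().split())
        term_vectors[doc_id] = tf
        for term, freq in tf.items():
            inverted[term][doc_id] = freq
    return dict(inverted), term_vectors


def terms_for_doc(inverted, doc_id):
    """Recover a document's terms from the inverted index alone.

    This has to visit every term in the index, which is the cost a
    stored per-document term vector avoids.
    """
    return {
        term: postings[doc_id]
        for term, postings in inverted.items()
        if doc_id in postings
    }


def word_count(term_vector):
    """Word count = sum of tf over every term in the document."""
    return sum(term_vector.values())


docs = {1: "to be or not to be", 2: "to index or not"}
inverted, term_vectors = build_indexes(docs)

distinct_terms = len(term_vectors[1])      # number of distinct terms in doc 1
total_words = word_count(term_vectors[1])  # sum of tf for doc 1
```

In this toy setup both routes give the same terms for a document; the difference is that the inverted-index route scans the whole term dictionary, while the term vector is a direct lookup by document id.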

On Tue, Aug 30, 2011 at 11:06 PM, Jayendra Patil <
jayendra.patil....@gmail.com> wrote:

> you might want to check - http://wiki.apache.org/solr/TermVectorComponent
> Should provide you with the term vectors with a lot of additional info.
>
> Regards,
> Jayendra
>
> On Tue, Aug 30, 2011 at 3:34 AM, Gabriele Kahlout
> <gabri...@mysimpatico.com> wrote:
> > Hello,
> >
> > This time I'm trying to duplicate Luke's functionality of knowing which
> > terms occur in a search result/document (w/o parsing it again). Any Solrj
> > API to do that?
> >
> > P.S. I've also posted the question on SO
> > <http://stackoverflow.com/q/7219111/300248>.
> >
> > On Wed, Jul 6, 2011 at 11:09 AM, Gabriele Kahlout
> > <gabri...@mysimpatico.com>wrote:
> >
> >> From your patch I see TermFreqVector, which provides the information I
> >> want.
> >>
> >> I also found FieldInvertState.getLength(), which seems to be exactly what
> >> I want. I'm after the word count (the sum of tf over every term in the
> >> doc). I'm just not sure whether FieldInvertState.getLength() returns the
> >> word count or only the number of distinct terms (not multiplied by the
> >> frequency of each term). It seems to return the word count, but I've not
> >> tested it sufficiently.
> >>
> >>
> >> On Wed, Jul 6, 2011 at 1:39 AM, Trey Grainger
> >> <the.apache.t...@gmail.com> wrote:
> >>
> >>> Gabriele,
> >>>
> >>> I created a patch that does this about a year ago.  See
> >>> https://issues.apache.org/jira/browse/SOLR-1837.  It was written for
> >>> Solr 1.4 and is based upon the Document Reconstructor in Luke.  The
> >>> patch adds a link on the main Solr admin page to a docinspector page,
> >>> which reconstructs the document given a uniqueid (required).  Keep in
> >>> mind that for non-stored fields you're only looking at what's "in" the
> >>> index, not the original text.
> >>>
> >>> If you have any issues using this on the most recent release, let me
> >>> know and I'd be happy to create a new patch for Solr 3.3.  One of these
> >>> days I'll remove the JSP dependency, and this may eventually make it
> >>> into trunk.
> >>>
> >>> Thanks,
> >>>
> >>> -Trey Grainger
> >>> Search Technology Development Team Lead, Careerbuilder.com
> >>> Site Architect, Celiaccess.com
> >>>
> >>>
> >>> On Tue, Jul 5, 2011 at 3:59 PM, Gabriele Kahlout
> >>> <gabri...@mysimpatico.com>wrote:
> >>>
> >>> > Hello,
> >>> >
> >>> > With an inverted index the term is the key, and the documents are the
> >>> > values. Is it still possible, however, given a document id, to get the
> >>> > terms indexed for that document?
> >>> >
> >>> > --
> >>> > Regards,
> >>> > K. Gabriele
> >>> >
> >>> > --- unchanged since 20/9/10 ---
> >>> > P.S. If the subject contains "[LON]" or the addressee acknowledges the
> >>> > receipt within 48 hours then I don't resend the email.
> >>> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> >>> > time(x) < Now + 48h) ⇒ ¬resend(I, this).
> >>> >
> >>> > If an email is sent by a sender that is not a trusted contact or the
> >>> > email does not contain a valid code then the email is not received. A
> >>> > valid code starts with a hyphen and ends with "X".
> >>> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧
> >>> > y ∈ L(-[a-z]+[0-9]X)).
> >>> >
> >>>
> >>
> >>
> >>
> >> --
> >> Regards,
> >> K. Gabriele
> >>
> >>
> >>
> >
> >
> > --
> > Regards,
> > K. Gabriele
> >
> >
>



-- 
Regards,
K. Gabriele

