The Term Vector Component (TVC) is a SearchComponent designed to return information about documents that is stored when setting the termVectors attribute on a field.
Will I have to re-index after adding that to the schema?

On Tue, Aug 30, 2011 at 11:06 PM, Jayendra Patil <jayendra.patil....@gmail.com> wrote:
> You might want to check http://wiki.apache.org/solr/TermVectorComponent.
> It should provide you with the term vectors and a lot of additional info.
>
> Regards,
> Jayendra
>
> On Tue, Aug 30, 2011 at 3:34 AM, Gabriele Kahlout
> <gabri...@mysimpatico.com> wrote:
> > Hello,
> >
> > This time I'm trying to duplicate Luke's functionality of knowing which
> > terms occur in a search result/document (without parsing it again). Is
> > there any SolrJ API to do that?
> >
> > P.S. I've also posted the question on Stack Overflow
> > <http://stackoverflow.com/q/7219111/300248>.
> >
> > On Wed, Jul 6, 2011 at 11:09 AM, Gabriele Kahlout
> > <gabri...@mysimpatico.com> wrote:
> >
> >> From your patch I see TermFreqVector, which provides the information I
> >> want.
> >>
> >> I also found FieldInvertState.getLength(), which seems to be exactly what
> >> I want. I'm after the word count (the sum of tf for every term in the
> >> doc). I'm just not sure whether FieldInvertState.getLength() returns
> >> only the number of distinct terms (not multiplied by the frequency of
> >> each term) or the word count. It seems to return the word count, but I
> >> haven't tested it sufficiently.
> >>
> >>
> >> On Wed, Jul 6, 2011 at 1:39 AM, Trey Grainger
> >> <the.apache.t...@gmail.com> wrote:
> >>
> >>> Gabriele,
> >>>
> >>> About a year ago I created a patch that does this. See
> >>> https://issues.apache.org/jira/browse/SOLR-1837. It was written for
> >>> Solr 1.4 and is based upon the Document Reconstructor in Luke. The
> >>> patch adds a link on the main Solr admin page to a docinspector page,
> >>> which will reconstruct the document given a uniqueid (required). Keep
> >>> in mind that for non-stored fields you're only looking at what's "in"
> >>> the index, not the original text.
> >>>
> >>> If you have any issues using this on the most recent release, let me
> >>> know and I'd be happy to create a new patch for Solr 3.3. One of these
> >>> days I'll remove the JSP dependency, and this may eventually make it
> >>> into trunk.
> >>>
> >>> Thanks,
> >>>
> >>> -Trey Grainger
> >>> Search Technology Development Team Lead, Careerbuilder.com
> >>> Site Architect, Celiaccess.com
> >>>
> >>>
> >>> On Tue, Jul 5, 2011 at 3:59 PM, Gabriele Kahlout
> >>> <gabri...@mysimpatico.com> wrote:
> >>>
> >>> > Hello,
> >>> >
> >>> > With an inverted index the term is the key and the documents are the
> >>> > values. Is it still possible, however, given a document id, to get
> >>> > the terms indexed for that document?
> >>> >
> >>> > --
> >>> > Regards,
> >>> > K. Gabriele
> >>> >
> >>> > --- unchanged since 20/9/10 ---
> >>> > P.S. If the subject contains "[LON]" or the addressee acknowledges
> >>> > the receipt within 48 hours then I don't resend the email.
> >>> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> >>> > time(x) < Now + 48h) ⇒ ¬resend(I, this).
> >>> >
> >>> > If an email is sent by a sender that is not a trusted contact or the
> >>> > email does not contain a valid code then the email is not received. A
> >>> > valid code starts with a hyphen and ends with "X".
> >>> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧
> >>> > y ∈ L(-[a-z]+[0-9]X)).
> >>
> >> --
> >> Regards,
> >> K. Gabriele
> >
> > --
> > Regards,
> > K. Gabriele

--
Regards,
K. Gabriele
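The Jul 5 question (given a document id, get the terms indexed for that document) runs against the grain of an inverted index, which maps terms to documents. The Java sketch below uses a toy in-memory index made of plain maps; it is not the Lucene or SolrJ API, and the class and method names are hypothetical. It shows the brute-force approach: walk every term's postings and keep the ones that mention the requested document. This is only roughly the idea behind reconstructing a document from the index; Luke's actual Document Reconstructor is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy inverted index: term -> (docId -> term frequency).
// Plain Java maps only; this is NOT the Lucene or SolrJ API.
class InvertedIndexDemo {

    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();

    // Lower-case the text, split on whitespace, and record each occurrence
    // of each token in that token's postings for this document.
    void add(int docId, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, t -> new HashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Recover one document's terms (with their frequencies) by scanning
    // every term's postings: the cost grows with the whole vocabulary,
    // not with the size of the document.
    Map<String, Integer> termsOf(int docId) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> e : postings.entrySet()) {
            Integer tf = e.getValue().get(docId);
            if (tf != null) {
                result.put(e.getKey(), tf);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        InvertedIndexDemo index = new InvertedIndexDemo();
        index.add(1, "the quick brown fox jumps over the lazy dog");
        index.add(2, "the lazy cat");
        System.out.println(index.termsOf(2)); // {cat=1, lazy=1, the=1}
        System.out.println(index.termsOf(1)); // {brown=1, dog=1, fox=1, jumps=1, lazy=1, over=1, quick=1, the=2}
    }
}
```

Because the scan touches every term in the index, it gets expensive as the vocabulary grows. Term vectors, enabled per field with the termVectors attribute, record the per-document term list at index time, so a lookup like this does not need to scan the whole index.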
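The distinction raised on Jul 6, between the number of distinct terms in a document and its word count (the sum of tf over every term), can be made concrete with a small Java sketch. It assumes a term-frequency vector represented as a plain Map<String, Integer> built by naive whitespace tokenization; the class and method names are hypothetical, and the sketch says nothing about what FieldInvertState.getLength() actually returns.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helpers contrasting two counts over a term-frequency vector
// (term -> tf). Not a Lucene API; a plain Map stands in for the vector.
class TermCounts {

    // Number of distinct terms: one per key, regardless of frequency.
    static int distinctTerms(Map<String, Integer> vector) {
        return vector.size();
    }

    // Word count: the sum of tf over every term in the document.
    static int wordCount(Map<String, Integer> vector) {
        int total = 0;
        for (int tf : vector.values()) {
            total += tf;
        }
        return total;
    }

    // Build a tf vector by lower-casing and splitting on whitespace
    // (a simplifying assumption; a real analyzer does much more).
    static Map<String, Integer> tfVector(String text) {
        Map<String, Integer> tf = new TreeMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = tfVector("to be or not to be");
        System.out.println(tf);                // {be=2, not=1, or=1, to=2}
        System.out.println(distinctTerms(tf)); // 4
        System.out.println(wordCount(tf));     // 6
    }
}
```

The two numbers differ whenever any term repeats, which is why it matters which one a length statistic reports.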