Hello Hoss,
I appreciate your detailed response. I think I like your second
alternative because I'd like to score whole books rather than pages in
books. It seems to me that the more words one has to work with in a
"document" the better the scoring would be for the entire book.
Here's a question. Am I correct that (1) book "documents" where the
terms appear farther apart than 10,000 words would not be returned in
the result or (2) they would be in the result but scored lower than all
the books where the terms were within 10,000 words?
Thanks,
Phil
Chris Hostetter wrote:
: that must appear together on a page. I have a multiValued TextField called
: "page" in a document with uniqueId called "id" that represents an OCR'd book.
: My default operator is AND. My default field is "page". My query is:
:
: q=adhesion+ring&fl=id,score&fq=id:(1+44)&version=2.2
:
: But this doesn't work. I get documents that contain "adhesion" on any page
: and "ring" on any page even though adhesion and ring do not appear together
: in any single page field.
one option to consider is changing the granularity of your "documents" so
that each "page" is indexed as a separate document ... this is
particularly handy if your ultimate goal is to tell people "the words
you searched for were all found on *this* page of *this* document."
Alternately: take a look at the comments about "positionIncrementGap" in
the example schema. by setting that to some value bigger than the number
of words you expect to ever be on the same page, you can then do
"proximity" queries where you look for the words closer together than that
gap size...
q="adhesion+ring"~10000&fl=id,score&fq=id:(1+44)&version=2.2
this will let you tell people "the words you searched for were all found
on *a* page of *this* document"
-Hoss
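
To make the positionIncrementGap idea concrete, here is a minimal Python
sketch of how a gap between the values of a multiValued field keeps a
proximity check from matching across pages. It is a simplified model, not
Lucene's actual sloppy-phrase matching or scoring: the function names, the
gap of 100, and the slop of 50 are all assumptions chosen for illustration.

```python
# Simplified model of token positions in a multiValued field.
# GAP stands in for positionIncrementGap; 100 is an illustrative assumption.
GAP = 100


def index_positions(pages, gap=GAP):
    """Map each term to the positions it occupies across all pages."""
    positions = {}
    offset = 0
    for page in pages:
        tokens = page.lower().split()
        for i, token in enumerate(tokens):
            positions.setdefault(token, []).append(offset + i)
        # The next page starts `gap` extra positions after this page's last token.
        offset += len(tokens) + gap
    return positions


def within_slop(positions, term_a, term_b, slop):
    """True if some occurrence of term_a is within `slop` positions of term_b.

    A plain distance check; real sloppy phrase matching is more involved.
    """
    return any(
        abs(pa - pb) <= slop
        for pa in positions.get(term_a, [])
        for pb in positions.get(term_b, [])
    )


# Hypothetical books: one with both terms on a single page, one with them split.
same_page = ["the adhesion ring is tight", "other text here"]
split_pages = ["adhesion is discussed here", "the ring appears later"]

same_hit = within_slop(index_positions(same_page), "adhesion", "ring", slop=50)
split_hit = within_slop(index_positions(split_pages), "adhesion", "ring", slop=50)
# With no gap, the split book matches too, which is the original problem.
split_hit_no_gap = within_slop(
    index_positions(split_pages, gap=0), "adhesion", "ring", slop=50
)
print(same_hit, split_hit, split_hit_no_gap)
```

In this model, a book whose terms are never within the slop of each other
does not match at all, rather than matching with a lower score. The slop is
kept well below the gap here to stay clear of boundary cases.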