jpountz commented on a change in pull request #90:
URL: https://github.com/apache/lucene/pull/90#discussion_r615854333
##########
File path:
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsWriter.java
##########
@@ -140,24 +135,48 @@
* <ul>
* <li>Header is a {@link CodecUtil#writeHeader CodecHeader} storing the version information for
* the BlockTree implementation.
- * <li>DirOffset is a pointer to the FieldSummary section.
* <li>DocFreq is the count of documents which contain the term.
* <li>TotalTermFreq is the total number of occurrences of the term. This is encoded as the
* difference between the total number of occurrences and the DocFreq.
+ * <li>PostingsHeader and TermMetadata are plugged into by the specific postings implementation:
+ * these contain arbitrary per-file data (such as parameters or versioning information) and
+ * per-term data (such as pointers to inverted files).
+ * <li>For inner nodes of the tree, every entry will steal one bit to mark whether it points to
+ * child nodes(sub-block). If so, the corresponding TermStats and TermMetaData are omitted
Review comment:
Adding a trailing dot for consistency with other items.
```suggestion
* child nodes(sub-block). If so, the corresponding TermStats and TermMetaData are omitted.
```
##########
File path:
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsWriter.java
##########
@@ -140,24 +135,48 @@
* <ul>
* <li>Header is a {@link CodecUtil#writeHeader CodecHeader} storing the version information for
* the BlockTree implementation.
- * <li>DirOffset is a pointer to the FieldSummary section.
* <li>DocFreq is the count of documents which contain the term.
* <li>TotalTermFreq is the total number of occurrences of the term. This is encoded as the
* difference between the total number of occurrences and the DocFreq.
+ * <li>PostingsHeader and TermMetadata are plugged into by the specific postings implementation:
+ * these contain arbitrary per-file data (such as parameters or versioning information) and
+ * per-term data (such as pointers to inverted files).
+ * <li>For inner nodes of the tree, every entry will steal one bit to mark whether it points to
+ * child nodes(sub-block). If so, the corresponding TermStats and TermMetaData are omitted
+ * </ul>
+ *
+ * <p><a id="Termmetadata"></a>
+ *
+ * <h2>Term Metadata</h2>
+ *
+ * <p>The .tmd file contains the list of term metadata (such as FST index metadata) and field level
+ * statistics (such as sum of total term freq).
+ *
+ * <ul>
+ * <li>TermsMeta (.tmd) --> Header, NumFields, <FieldStats><sup>NumFields</sup>,
+ * TermIndexLength, TermDictLength, Footer
+ * <li>FieldStats --> FieldNumber, NumTerms, RootCodeLength, Byte<sup>RootCodeLength</sup>,
+ * SumTotalTermFreq?, SumDocFreq, DocCount, MinTerm, MaxTerm, IndexStartFP, FSTHeader,
Review comment:
I think it is SumDocFreq, rather than SumTotalTermFreq, that is not always
specified?
```suggestion
* SumTotalTermFreq, SumDocFreq?, DocCount, MinTerm, MaxTerm, IndexStartFP, FSTHeader,
```
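To illustrate the layout in the suggestion, here is a minimal sketch of a field-stats record in which SumTotalTermFreq is always present and SumDocFreq is optional. It is not the BlockTree code: `FieldStatsSketch`, `write`, `read` and the `hasFreqs` flag are hypothetical names, fixed-width longs from `java.io` stand in for whatever encoding the writer actually uses, and the rule that a missing SumDocFreq falls back to SumTotalTermFreq is an assumption made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only; none of these names exist in Lucene.
final class FieldStatsSketch {

  // Always writes SumTotalTermFreq; writes SumDocFreq only when hasFreqs is true.
  static byte[] write(long sumTotalTermFreq, long sumDocFreq, boolean hasFreqs) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeLong(sumTotalTermFreq);
      if (hasFreqs) {
        out.writeLong(sumDocFreq);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Returns {sumTotalTermFreq, sumDocFreq}. Assumption: when SumDocFreq was
  // not written, the reader reuses SumTotalTermFreq for it.
  static long[] read(byte[] data, boolean hasFreqs) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      long sumTotalTermFreq = in.readLong();
      long sumDocFreq = hasFreqs ? in.readLong() : sumTotalTermFreq;
      return new long[] {sumTotalTermFreq, sumDocFreq};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The reader has to know from elsewhere (here, the `hasFreqs` argument) whether the optional value was written, which is why the `?` marker in the format description should sit on the field that can actually be absent.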