[ https://issues.apache.org/jira/browse/LUCENE-10035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wuda updated LUCENE-10035:
--------------------------
    Attachment: LUCENE-10035.patch

> Simple text codec add multi level skip list data
> --------------------------------------------------
>
>                 Key: LUCENE-10035
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10035
>             Project: Lucene - Core
>          Issue Type: Wish
>          Components: core/codecs
>    Affects Versions: main (9.0)
>            Reporter: wuda
>            Priority: Major
>              Labels: Impact, MultiLevelSkipList, SimpleTextCodec
>         Attachments: LUCENE-10035.patch
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> The simple text codec now writes skip list data (including impacts) to
> help readers understand the index format. It is for debugging, curiosity,
> and transparency only! When a term's docFreq is greater than or equal to
> SimpleTextSkipWriter.BLOCK_SIZE (default value 8), the simple text codec
> writes a skip list, and the *.pst (simple text term dictionary file)* file
> looks like this:
> {code:java}
> field title
>   term args
>     doc 2
>       freq 2
>       pos 7
>       pos 10
> ## we omit docs for better view ......
>     doc 98
>       freq 2
>       pos 2
>       pos 6
>     skipList
>     ?
>       level 1
>         skipDoc 65
>         skipDocFP 949
>         impacts
>           impact
>             freq 1
>             norm 2
>           impact
>             freq 2
>             norm 12
>           impact
>             freq 3
>             norm 13
>         impacts_end
>     ?
>       level 0
>         skipDoc 17
>         skipDocFP 284
>         impacts
>           impact
>             freq 1
>             norm 2
>           impact
>             freq 2
>             norm 12
>         impacts_end
>         skipDoc 34
>         skipDocFP 624
>         impacts
>           impact
>             freq 1
>             norm 2
>           impact
>             freq 2
>             norm 12
>           impact
>             freq 3
>             norm 14
>         impacts_end
>         skipDoc 65
>         skipDocFP 949
>         impacts
>           impact
>             freq 1
>             norm 2
>           impact
>             freq 2
>             norm 12
>           impact
>             freq 3
>             norm 13
>         impacts_end
>         skipDoc 90
>         skipDocFP 1311
>         impacts
>           impact
>             freq 1
>             norm 2
>           impact
>             freq 2
>             norm 10
>           impact
>             freq 3
>             norm 13
>           impact
>             freq 4
>             norm 14
>         impacts_end
> END
> checksum 00000000000829315543
> {code}
> Compared with the previous format, we add the *skipList, level, skipDoc,
> skipDocFP, impacts, impact, freq, norm* nodes. At the same time, the simple
> text codec can support advanceShallow at search time.
>
> h2. Why are there question mark symbols in the file?
> Because the *MultiLevelSkipListWriter* writes "length" and "childPointer"
> as VLong values, which are binary rather than text.
> h1. Does this speed up search?
> No! It supports advanceShallow at search time, so why is there no speedup
> yet? Because the skip list data is written after the docs (see the file
> shown above), the reader must iterate over all docs before it can read the
> skip list data, so search never gets faster. There is no "skipOffset" for
> reading the skip list data directly but, as mentioned before, this is for
> debugging, curiosity, and transparency only! If this is a problem, I can
> add a "skipOffset" next time so that the skip list data can be read
> directly.
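
To make the skip entries in the dump above more concrete, here is a minimal sketch of how a reader might use them once it can reach the skip data without scanning the docs first (which is what the proposed "skipOffset" would allow). It is not taken from the patch or from Lucene: the class SkipListSketch, its methods, and the start pointer 0 are hypothetical. It also assumes that each skipDoc is the last doc ID of a block and that skipDocFP is the file pointer where the following block starts, and it scans every level from its beginning instead of following "childPointer".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: none of these names come from Lucene or the patch.
final class SkipListSketch {

    /** One entry of the dump: a skipDoc value and its skipDocFP value. */
    static final class SkipEntry {
        final int skipDoc;
        final long skipDocFP;

        SkipEntry(int skipDoc, long skipDocFP) {
            this.skipDoc = skipDoc;
            this.skipDocFP = skipDocFP;
        }
    }

    /** Builds one level from alternating (skipDoc, skipDocFP) values. */
    static List<SkipEntry> level(long... docAndFpPairs) {
        List<SkipEntry> entries = new ArrayList<>();
        for (int i = 0; i < docAndFpPairs.length; i += 2) {
            entries.add(new SkipEntry((int) docAndFpPairs[i], docAndFpPairs[i + 1]));
        }
        return entries;
    }

    /**
     * Returns the file pointer from which to start scanning docs for target.
     * Assumption: an entry may be used only if its skipDoc is below target.
     * Falls back to startFP when no entry qualifies.
     */
    static long seekPointer(List<List<SkipEntry>> levels, int target, long startFP) {
        long fp = startFP;
        int skippedUpTo = -1; // highest doc ID already skipped over
        // levels.get(0) is level 0 (densest); start at the sparsest level.
        for (int lvl = levels.size() - 1; lvl >= 0; lvl--) {
            for (SkipEntry e : levels.get(lvl)) {
                if (e.skipDoc >= target) {
                    break; // target may be inside this block
                }
                if (e.skipDoc > skippedUpTo) {
                    skippedUpTo = e.skipDoc;
                    fp = e.skipDocFP;
                }
            }
        }
        return fp;
    }

    /** The skipDoc/skipDocFP values of level 0 and level 1 from the dump. */
    static List<List<SkipEntry>> exampleLevels() {
        List<List<SkipEntry>> levels = new ArrayList<>();
        levels.add(level(17, 284, 34, 624, 65, 949, 90, 1311)); // level 0
        levels.add(level(65, 949));                              // level 1
        return levels;
    }

    public static void main(String[] args) {
        List<List<SkipEntry>> levels = exampleLevels();
        // 0L is a made-up start pointer for the term's first doc.
        System.out.println(seekPointer(levels, 40, 0L)); // 624
        System.out.println(seekPointer(levels, 70, 0L)); // 949
    }
}
```

With the current layout the reader only reaches these entries after iterating over the docs, so a lookup like this cannot save any work yet. It would only pay off if something like "skipOffset" let the reader jump to the skip data first.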