[ https://issues.apache.org/jira/browse/LUCENE-10035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wuda updated LUCENE-10035:
--------------------------
    Attachment: LUCENE-10035.patch

> Simple text codec add  multi level skip list data 
> --------------------------------------------------
>
>                 Key: LUCENE-10035
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10035
>             Project: Lucene - Core
>          Issue Type: Wish
>          Components: core/codecs
>    Affects Versions: main (9.0)
>            Reporter: wuda
>            Priority: Major
>              Labels: Impact, MultiLevelSkipList, SimpleTextCodec
>         Attachments: LUCENE-10035.patch
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> This adds skip list data (including impacts) to the SimpleText codec to help
> understand the index format. It is for debugging, curiosity, and transparency
> only! When a term's docFreq is greater than or equal to
> SimpleTextSkipWriter.BLOCK_SIZE (default value 8), the SimpleText codec writes
> a skip list, and the *.pst* file (SimpleText term dictionary file) looks like
> this:
> {code:java}
> field title
>   term args
>     doc 2
>       freq 2
>       pos 7
>       pos 10
>     ## some docs omitted for readability ......
>     doc 98
>       freq 2
>       pos 2
>       pos 6
>     skipList 
> ?
>       level 1
>         skipDoc 65
>         skipDocFP 949
>         impacts 
>           impact 
>             freq 1
>             norm 2
>           impact 
>             freq 2
>             norm 12
>           impact 
>             freq 3
>             norm 13
>         impacts_end 
> ?
>       level 0
>         skipDoc 17
>         skipDocFP 284
>         impacts 
>           impact 
>             freq 1
>             norm 2
>           impact 
>             freq 2
>             norm 12
>         impacts_end         
>         skipDoc 34
>         skipDocFP 624
>         impacts 
>           impact 
>             freq 1
>             norm 2
>           impact 
>             freq 2
>             norm 12
>           impact 
>             freq 3
>             norm 14
>         impacts_end         
>         skipDoc 65
>         skipDocFP 949
>         impacts 
>           impact 
>             freq 1
>             norm 2
>           impact 
>             freq 2
>             norm 12
>           impact 
>             freq 3
>             norm 13
>         impacts_end         
>         skipDoc 90
>         skipDocFP 1311
>         impacts 
>           impact 
>             freq 1
>             norm 2
>           impact 
>             freq 2
>             norm 10
>           impact 
>             freq 3
>             norm 13
>           impact 
>             freq 4
>             norm 14
>         impacts_end 
> END
> checksum 00000000000829315543
> {code}
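Each *impacts* node only lists competitive (freq, norm) pairs: a pair is dropped when another pair has a freq at least as high and a norm at least as low. The example above is consistent with each level-1 entry holding the competitive merge of the level-0 entries it covers: merging the level-0 impacts for skipDoc 17, 34 and 65 gives exactly the level-1 impacts for skipDoc 65. The sketch below checks that; the class and method names are hypothetical, and it assumes norms compare as plain non-negative numbers (Lucene compares them as unsigned values).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class CompetitiveImpacts {
  // One (freq, norm) pair, as written under an "impact" node.
  public record Impact(int freq, long norm) {}

  // Keeps only pairs not dominated by another pair with freq >= and norm <=.
  // The result is sorted by increasing freq and increasing norm.
  public static List<Impact> competitive(List<Impact> pairs) {
    List<Impact> sorted = new ArrayList<>(pairs);
    // Highest freq first; for equal freq, lowest norm first.
    sorted.sort(Comparator.comparingInt(Impact::freq).reversed()
        .thenComparingLong(Impact::norm));
    List<Impact> kept = new ArrayList<>();
    long minNorm = Long.MAX_VALUE;
    for (Impact p : sorted) {
      // A lower-freq pair is only competitive if its norm is strictly smaller
      // than every norm already kept for a higher (or equal) freq.
      if (p.norm() < minNorm) {
        kept.add(p);
        minNorm = p.norm();
      }
    }
    Collections.reverse(kept); // back to increasing freq / increasing norm
    return kept;
  }

  public static void main(String[] args) {
    // Level-0 impacts for skipDoc 17, 34 and 65 from the example.
    List<Impact> level0 = List.of(
        new Impact(1, 2), new Impact(2, 12),
        new Impact(1, 2), new Impact(2, 12), new Impact(3, 14),
        new Impact(1, 2), new Impact(2, 12), new Impact(3, 13));
    // prints [Impact[freq=1, norm=2], Impact[freq=2, norm=12], Impact[freq=3, norm=13]]
    System.out.println(competitive(level0));
  }
}
```

Note that (3, 13) wins over (3, 14): same freq, shorter doc, so the level-1 entry keeps norm 13.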
> Compared with the previous format, we add the *skipList, level, skipDoc,
> skipDocFP, impacts, impact, freq, norm* nodes. At the same time, the SimpleText
> codec can now support advanceShallow at search time.
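Roughly, a shallow advance to a target doc only needs the first level-0 skip entry whose skipDoc is >= target: that entry's impacts bound every doc up to its skipDoc, without decoding any doc. A minimal sketch over the level-0 entries of the example, using hypothetical names (SkipEntry, a static advanceShallow helper) rather than Lucene's ImpactsEnum API, and summarising each entry's impacts by its highest freq:

```java
import java.util.List;

public class ShallowAdvance {
  // One level-0 skip entry: skipDoc is the last doc of its block, maxFreq the
  // highest freq among that entry's impacts (values taken from the example).
  public record SkipEntry(int skipDoc, int maxFreq) {}

  public static final List<SkipEntry> EXAMPLE_LEVEL0 = List.of(
      new SkipEntry(17, 2), new SkipEntry(34, 3),
      new SkipEntry(65, 3), new SkipEntry(90, 4));

  // Returns the first entry with skipDoc >= target (its impacts apply to all
  // docs up to skipDoc), or null when target is past the last skip entry.
  // Binary search works because skipDocs are strictly increasing.
  public static SkipEntry advanceShallow(List<SkipEntry> level0, int target) {
    int lo = 0, hi = level0.size() - 1;
    SkipEntry found = null;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (level0.get(mid).skipDoc() >= target) {
        found = level0.get(mid); // candidate; look for an earlier one
        hi = mid - 1;
      } else {
        lo = mid + 1;
      }
    }
    return found;
  }

  public static void main(String[] args) {
    for (int target : new int[] {20, 66, 95}) {
      SkipEntry e = advanceShallow(EXAMPLE_LEVEL0, target);
      if (e == null) {
        System.out.println(target + ": no skip entry covers this doc");
      } else {
        System.out.println(target + ": docIdUpTo=" + e.skipDoc() + " maxFreq=" + e.maxFreq());
      }
    }
  }
}
```

Target 95 returns null because the docs after skipDoc 90 (up to doc 98) form a tail block with no skip entry in the example.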
>  
> h2. Why are there question mark symbols in the file?
> Because *MultiLevelSkipListWriter* writes "length" and "childPointer" as
> VLongs, which are raw binary bytes rather than text.
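A VLong stores 7 bits per byte, low bits first, and sets the high bit on every byte except the last; this is the same scheme as Lucene's DataOutput.writeVLong. The sketch below encodes 949 purely as an illustrative value (not necessarily a length or childPointer from the file above) and shows that the resulting bytes are not valid text, which is why a viewer may render them as "?".

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class VLongDemo {
  // Encodes a non-negative long 7 bits at a time, low bits first;
  // the high bit (0x80) marks that another byte follows.
  public static byte[] writeVLong(long i) {
    if (i < 0) throw new IllegalArgumentException("negative: " + i);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((i & ~0x7FL) != 0L) {
      out.write((int) ((i & 0x7FL) | 0x80L));
      i >>>= 7;
    }
    out.write((int) i);
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] bytes = writeVLong(949);
    // 949 -> 0xB5 0x07: a stray UTF-8 continuation byte plus a control char.
    for (byte b : bytes) System.out.printf("0x%02X ", b & 0xFF);
    System.out.println();
    // Decoding as UTF-8 turns 0xB5 into U+FFFD, the replacement character.
    String asText = new String(bytes, StandardCharsets.UTF_8);
    System.out.println(asText.codePointAt(0) == 0xFFFD);
  }
}
```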
> h2. Does this speed up the search process?
> No! It supports advanceShallow at search time, so why doesn't it speed up
> search yet? Because the skip list data is written after the docs (see the file
> above), the reader must iterate over all docs before it can read the skip list
> data, so it never speeds up search. There is no "skipOffset" to read the skip
> list data directly but, as mentioned before, this is for debugging, curiosity,
> and transparency only! If this turns out to be a problem, I may add a
> "skipOffset" next time so that the skip list data can be read directly.
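To illustrate what a "skipOffset" would buy, here is a hypothetical sketch that is not part of the patch: the writer remembers where the skipList node starts within a term's block, so a reader can jump straight to it instead of reading every doc line first. The names (TermBlock, skipOffset) and the use of character offsets in a String are assumptions for illustration; a real codec would more likely store a file-pointer delta in the term metadata.

```java
import java.util.List;

public class SkipOffsetSketch {
  // A term's postings text plus the offset at which "skipList" begins.
  public record TermBlock(String text, int skipOffset) {}

  public static TermBlock write(List<String> docLines, List<String> skipLines) {
    StringBuilder sb = new StringBuilder();
    for (String line : docLines) sb.append(line).append('\n');
    int skipOffset = sb.length(); // recorded before the skip data is written
    sb.append("skipList\n");
    for (String line : skipLines) sb.append(line).append('\n');
    return new TermBlock(sb.toString(), skipOffset);
  }

  // Without skipOffset: read line by line until "skipList" is reached.
  // Returns the number of lines read, counting the skipList line itself.
  public static int linesReadByScanning(TermBlock block) {
    int read = 0;
    for (String line : block.text().split("\n")) {
      read++;
      if (line.equals("skipList")) break;
    }
    return read;
  }

  // With skipOffset: jump directly to the skip data.
  public static String readSkipDirect(TermBlock block) {
    return block.text().substring(block.skipOffset());
  }

  public static void main(String[] args) {
    TermBlock block = write(
        List.of("doc 2", "freq 2", "doc 98", "freq 2"),
        List.of("skipDoc 17", "skipDocFP 284"));
    System.out.println(linesReadByScanning(block)); // 5
    System.out.println(readSkipDirect(block).startsWith("skipList")); // true
  }
}
```

The scan cost grows with docFreq, while the direct read does not, which is the point of storing the offset.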
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
