[jira] [Commented] (LUCENE-9480) investigate slow DataInput.skipBytes

ASF subversion and git services (Jira) Sat, 20 Feb 2021 09:21:04 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287748#comment-17287748
 ]


ASF subversion and git services commented on LUCENE-9480:
---------------------------------------------------------

Commit c51fee9c1a59030bda61b600cca8923410f1e090 in lucene-solr's branch 
refs/heads/master from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c51fee9 ]

LUCENE-9480: Make DataInput.skipBytes(long) abstract

skipBytes() is a "relative" version of seek(), but DataInput previously
implemented it via read() calls, because DataInput's API does not
include absolute positioning methods (seek, getFilePointer).

This resulted in inefficiencies: calls to skipBytes() would cause
buffers to be allocated, bytes copied, etc.
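This is a minimal Java sketch of the read-and-discard approach described above. The class names `ForwardOnlyInput` and `ArrayInput`, the 1024-byte scratch size, and the `scratchAllocations` counter are hypothetical illustrations, not Lucene's code. The point is that an input with no seek() can only skip by copying bytes into a scratch buffer and ignoring them.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical stand-in for an input that can only read forward
// (no seek/getFilePointer). NOT Lucene's actual DataInput.
abstract class ForwardOnlyInput {
  // Illustration only: counts scratch allocations so the cost is visible.
  int scratchAllocations = 0;

  /** Reads exactly len bytes into b at offset, or throws EOFException. */
  abstract void readBytes(byte[] b, int offset, int len) throws IOException;

  /** Read-and-discard skip: the only option without absolute positioning. */
  void skipBytesSlowly(long numBytes) throws IOException {
    if (numBytes < 0) {
      throw new IllegalArgumentException("numBytes must be >= 0, got " + numBytes);
    }
    // Assumed 1024-byte scratch buffer, allocated only to throw bytes away.
    byte[] scratch = new byte[(int) Math.min(numBytes, 1024)];
    scratchAllocations++;
    long remaining = numBytes;
    while (remaining > 0) {
      int chunk = (int) Math.min(remaining, scratch.length);
      readBytes(scratch, 0, chunk); // bytes are copied, then ignored
      remaining -= chunk;
    }
  }
}

// Hypothetical array-backed input used to exercise the sketch.
class ArrayInput extends ForwardOnlyInput {
  private final byte[] data;
  int pos = 0;

  ArrayInput(byte[] data) {
    this.data = data;
  }

  @Override
  void readBytes(byte[] b, int offset, int len) throws IOException {
    if (pos + len > data.length) {
      throw new EOFException("read past EOF");
    }
    System.arraycopy(data, pos, b, offset, len);
    pos += len;
  }
}
```

Caching the scratch buffer would avoid the repeated allocation, but every skipped byte would still be copied.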

Instead, make subclasses implement skipBytes() explicitly. The old
DataInput implementation is marked deprecated and renamed to skipBytesSlowly().

Some subclasses still implement skipBytes() via skipBytesSlowly(), to be
fixed in future improvements.
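The following sketch carries the same caveats. An input that knows its absolute position can implement the relative skipBytes() as seek(getFilePointer() + numBytes), with no buffer or copy. `SimpleDataInput`, `SimpleIndexInput` and `ByteArrayIndexInput` are hypothetical simplified types, not Lucene's DataInput/IndexInput.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical simplified types; not Lucene's real DataInput/IndexInput.
abstract class SimpleDataInput {
  abstract byte readByte() throws IOException;

  /** Abstract: every concrete input decides how to skip. */
  abstract void skipBytes(long numBytes) throws IOException;
}

abstract class SimpleIndexInput extends SimpleDataInput {
  abstract long getFilePointer();

  abstract void seek(long pos) throws IOException;

  /** Relative skip expressed as an absolute seek: no buffers, no copies. */
  @Override
  void skipBytes(long numBytes) throws IOException {
    if (numBytes < 0) {
      throw new IllegalArgumentException("numBytes must be >= 0, got " + numBytes);
    }
    seek(getFilePointer() + numBytes);
  }
}

// Hypothetical array-backed input with absolute positioning.
class ByteArrayIndexInput extends SimpleIndexInput {
  private final byte[] data;
  private int pos;

  ByteArrayIndexInput(byte[] data) {
    this.data = data;
  }

  @Override
  byte readByte() throws IOException {
    if (pos >= data.length) {
      throw new EOFException("read past EOF");
    }
    return data[pos++];
  }

  @Override
  long getFilePointer() {
    return pos;
  }

  @Override
  void seek(long newPos) throws IOException {
    if (newPos < 0 || newPos > data.length) {
      throw new EOFException("seek past EOF: " + newPos);
    }
    pos = (int) newPos; // safe: newPos <= data.length
  }
}
```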


> investigate slow DataInput.skipBytes
> ------------------------------------
>
>                 Key: LUCENE-9480
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9480
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Priority: Major
>         Attachments: LUCENE-9480.patch, LUCENE-9480.patch, LUCENE-9480.patch
>
>
> Currently DataInput has skipBytes(), but IndexInput also adds seek(). 
> There is no clear explanation of how the two methods differ: why 
> would you choose one over the other?
> It causes some performance issues: for example the default implementation 
> actually reads bytes into a byte array and throws everything away. This is 
> really silly for MMapDirectory: skipping bytes should only be a glorified 
> {{+=}}. 
> So when I look at the latest LUCENE-9447 patch, I can't help but think a ton of 
> waste is happening:
> * Maybe skipBytes() is only used because the stored fields compressor 
> interface happens to take DataInput? Should it take IndexInput instead?
> * Should skipBytes() be overridden by MMapDirectory rather than delegating to 
> super? Doing real reads and byte array copies isn't free. It should be a 
> {{+=}} with a single bounds check.
> * Should we revisit having DataInput vs IndexInput at all? Maybe they should 
> be collapsed into one thing?
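This sketch shows the MMapDirectory case the issue describes. A skip over a memory-mapped region can be just a position update guarded by one bounds check. It assumes a single `java.nio.ByteBuffer` stands in for the mapping. `ByteBufferInput` is a hypothetical name, and this is not MMapDirectory's implementation.

```java
import java.io.EOFException;
import java.nio.ByteBuffer;

// Hypothetical ByteBuffer-backed input standing in for a memory-mapped file.
// Not MMapDirectory's actual code.
final class ByteBufferInput {
  private final ByteBuffer buffer;

  ByteBufferInput(ByteBuffer buffer) {
    this.buffer = buffer;
  }

  byte readByte() throws EOFException {
    if (!buffer.hasRemaining()) {
      throw new EOFException("read past EOF");
    }
    return buffer.get();
  }

  long getFilePointer() {
    return buffer.position();
  }

  /** Skip as a glorified "+=": one bounds check, then move the position. */
  void skipBytes(long numBytes) throws EOFException {
    if (numBytes < 0) {
      throw new IllegalArgumentException("numBytes must be >= 0, got " + numBytes);
    }
    if (numBytes > buffer.remaining()) {
      throw new EOFException(
          "cannot skip " + numBytes + " bytes, only " + buffer.remaining() + " remain");
    }
    // No reads, no copies: just advance the position.
    buffer.position(buffer.position() + (int) numBytes);
  }
}
```

ByteBuffer positions are int-indexed, so one buffer cannot address more than 2 GB. A real input over a larger file would need to handle several buffers, and this sketch covers only the single-buffer case.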



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

