[ https://issues.apache.org/jira/browse/LUCENE-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dawid Weiss resolved LUCENE-9740.
---------------------------------
    Fix Version/s: master (9.0)
       Resolution: Fixed

> Avoid buffering and double-scan of flags in *.aff file
> ------------------------------------------------------
>
>                 Key: LUCENE-9740
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9740
>             Project: Lucene - Core
>          Issue Type: Sub-task
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: master (9.0)
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I wrote a small utility test to scan through all the *.aff files from
> openoffice and wooorm - no file has duplicate flags (SET or FLAG), and the
> maximum leading offsets at which these flags appear are roughly:
> {code}
> Flag SET at maximum offset 10753
> Flag FLAG at maximum offset 4559
> {code}
> I think we could just make an assumption that, say, affix files are read with
> a 20kB buffered reader and this provides a maximum leading window for
> scanning for those flags. The dictionary parsing could also fail if any of
> these flags occurs more than once in the input file?
> This would avoid having to read the file twice and perhaps simplify the API
> (no need for a temporary spill).
> I'll piggyback this test as part of LUCENE-9727 if you'd like to re-run it
> locally.
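
The idea in the issue description (scan a bounded leading window for SET and FLAG, then rewind instead of re-reading the file) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code committed for LUCENE-9740: the class `AffixHeaderScan`, the `Header` holder, the method `scanHeader`, and the constant `MAX_PROLOGUE_BYTES` are hypothetical names, and the 20 kB limit is the figure suggested in the description.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene's API.
final class AffixHeaderScan {
  // Assumed window size: the "20kB" suggested in the issue description.
  static final int MAX_PROLOGUE_BYTES = 20 * 1024;

  // Values found in the leading window; a field is null if its directive was not seen.
  static final class Header {
    final String encoding;
    final String flagType;

    Header(String encoding, String flagType) {
      this.encoding = encoding;
      this.flagType = flagType;
    }
  }

  // Peeks at up to MAX_PROLOGUE_BYTES, extracts SET and FLAG, then rewinds the
  // stream so the caller can parse the file from the start in a single pass.
  static Header scanHeader(BufferedInputStream in) throws IOException {
    in.mark(MAX_PROLOGUE_BYTES + 1);
    byte[] window = in.readNBytes(MAX_PROLOGUE_BYTES);
    in.reset();

    // ISO-8859-1 maps every byte to one char, which is enough to read the
    // ASCII directive names before the real encoding is known.
    String text = new String(window, StandardCharsets.ISO_8859_1);
    if (text.startsWith("\u00EF\u00BB\u00BF")) {
      text = text.substring(3); // skip a UTF-8 byte order mark
    }

    String encoding = null;
    String flagType = null;
    for (String line : text.split("\\r?\\n")) {
      String[] parts = line.trim().split("\\s+");
      if (parts.length < 2) {
        continue;
      }
      if (parts[0].equals("SET")) {
        if (encoding != null) {
          throw new IllegalArgumentException("Duplicate SET directive");
        }
        encoding = parts[1];
      } else if (parts[0].equals("FLAG")) {
        if (flagType != null) {
          throw new IllegalArgumentException("Duplicate FLAG directive");
        }
        flagType = parts[1];
      }
    }
    return new Header(encoding, flagType);
  }
}
```

After `scanHeader` returns, the same stream can be wrapped in a reader for the detected encoding and parsed once. This sketch only detects duplicates that fall inside the window; rejecting a SET or FLAG that appears later would be the job of the main parsing pass, as the description suggests.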