uschindler commented on a change in pull request #643: URL: https://github.com/apache/lucene/pull/643#discussion_r799620611
##########
File path: lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java
##########
@@ -154,6 +153,98 @@ protected BinaryDictionary(ResourceScheme resourceScheme, String resourcePath)
     this.buffer = buffer;
   }
 
+  protected BinaryDictionary(
+      Supplier<InputStream> targetMapResource,
+      Supplier<InputStream> posResource,
+      Supplier<InputStream> dictResource)
+      throws IOException {
+    this.resourceScheme = null;
+    this.resourcePath = null;
+
+    int[] targetMapOffsets = null, targetMap = null;
+    String[] posDict = null;
+    String[] inflFormDict = null;
+    String[] inflTypeDict = null;
+    ByteBuffer buffer = null;
+    try (InputStream mapIS = new BufferedInputStream(targetMapResource.get());
+        InputStream posIS = new BufferedInputStream(posResource.get());
+        // no buffering here, as we load in one large buffer
+        InputStream dictIS = dictResource.get()) {
+      DataInput in = new InputStreamDataInput(mapIS);
+      CodecUtil.checkHeader(in, TARGETMAP_HEADER, VERSION, VERSION);
+      targetMap = new int[in.readVInt()];
+      targetMapOffsets = new int[in.readVInt()];
+      int accum = 0, sourceId = 0;
+      for (int ofs = 0; ofs < targetMap.length; ofs++) {
+        final int val = in.readVInt();
+        if ((val & 0x01) != 0) {
+          targetMapOffsets[sourceId] = ofs;
+          sourceId++;
+        }
+        accum += val >>> 1;
+        targetMap[ofs] = accum;
+      }
+      if (sourceId + 1 != targetMapOffsets.length)
+        throw new IOException(
+            "targetMap file format broken; targetMap.length="
+                + targetMap.length
+                + ", targetMapOffsets.length="
+                + targetMapOffsets.length
+                + ", sourceId="
+                + sourceId);
+      targetMapOffsets[sourceId] = targetMap.length;
+
+      in = new InputStreamDataInput(posIS);

Review comment:
Generally, all fields and variables should be final, so I agree. In addition, assigning null to variables first and then reassigning a real value is a bad code pattern! They should be final and assigned only once. This is a relic from the old code; we should clean it up.
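The single-assignment style described above can be sketched roughly as follows. This is only an illustration, not the actual Lucene code: `TinyDictionary`, its one-byte-count resource format, and the `load` helper are all hypothetical names made up for the example.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;

/** Hypothetical stand-in for a dictionary class; NOT the real BinaryDictionary. */
final class TinyDictionary {
  // The field is final and assigned exactly once, in the constructor.
  private final int[] values;

  TinyDictionary(Supplier<InputStream> resource) throws IOException {
    // No "int[] values = null;" followed by a later reassignment:
    // the helper returns the finished array and it is assigned once.
    this.values = load(resource);
  }

  // Assumed toy format: one unsigned byte N, followed by N unsigned bytes.
  private static int[] load(Supplier<InputStream> resource) throws IOException {
    try (DataInputStream in = new DataInputStream(resource.get())) {
      final int[] result = new int[in.readUnsignedByte()];
      for (int i = 0; i < result.length; i++) {
        result[i] = in.readUnsignedByte();
      }
      return result;
    }
  }

  int size() {
    return values.length;
  }

  int get(int index) {
    return values[index];
  }
}
```

Because the stream handling lives in the helper, the try-with-resources block no longer forces the variables to be declared (and null-initialized) outside of it.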
For the special case where you need to pass a value to the super ctor, you can use a static method. All other initialization code should live in the ctor only. This is especially important for multithreading, because final fields assigned in the ctor are guaranteed to be visible to other threads once the ctor completes (as long as `this` does not escape during construction).
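The static-method approach for the super ctor case might look like the sketch below. `BaseTable`, `LoadedTable`, and `readAll` are hypothetical names used only for illustration; `InputStream.readAllBytes()` assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;

/** Hypothetical base class that needs fully loaded data in its constructor. */
class BaseTable {
  protected final byte[] data;

  protected BaseTable(byte[] data) {
    this.data = data;
  }
}

/** Hypothetical subclass; all names here are illustrative only. */
final class LoadedTable extends BaseTable {
  private final int length;

  LoadedTable(Supplier<InputStream> resource) throws IOException {
    // The super(...) call comes first, so the loading runs in a static
    // helper that is evaluated as its argument.
    super(readAll(resource));
    // Everything else is assigned once, here in the ctor.
    this.length = data.length;
  }

  // Static, so it can run before the instance is initialized.
  private static byte[] readAll(Supplier<InputStream> resource) throws IOException {
    try (InputStream is = resource.get()) {
      return is.readAllBytes(); // Java 9+
    }
  }

  int length() {
    return length;
  }
}
```

With this shape every field in both classes can stay final, and no instance method is called before the object is fully constructed.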