rlaehdals commented on issue #14645:
URL: https://github.com/apache/lucene/issues/14645#issuecomment-3264235977
I have been looking into whether the skeleton.txt files can be removed.
Since JFlex already provides a built-in skeleton, it seems that
skeleton.default.txt can safely be deleted.

The main issue is with skeleton.disable.buffer.expansion.txt. Because JFlex
does not provide a built-in option to disable buffer expansion, the tests
related to buffer size are currently failing.

As a temporary workaround, I modified the generated code in JFlexTask using
regex replacements to remove the buffer expansion logic. However, I am not
certain whether this is the appropriate solution. I would greatly appreciate
any feedback or suggestions on a better approach.

```
configure(project(":lucene:core")) {
  task generateStandardTokenizerInternal(type: JFlexTask) {
    description = "Regenerate StandardTokenizerImpl.java"
    group = "generation"

    jflexFile = file('src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.jflex')

    // NOTE: The following modifications in `doLast` are applied
    // after JFlex generates StandardTokenizerImpl.java.
    // These changes adjust buffer handling and error conditions.
    doLast {
      ant.replace(
        file: file('src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.java'),
        encoding: "UTF-8",
        token: "private static final int ZZ_BUFFERSIZE =",
        value: "private int ZZ_BUFFERSIZE ="
      )

      def content = file('src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.java').text

      content = content.replaceAll(
        /\/\* is the buffer big enough\? \*\/[\s\S]*?(?=\/\* fill the buffer with new input \*\/)/,
        ''
      )

      content = content.replaceAll(
        /int requested = zzBuffer\.length - zzEndRead;/,
        """int requested = zzBuffer.length - zzEndRead - zzFinalHighSurrogate;
    if (requested == 0) {
      return true;
    }"""
      )

      content = content.replaceAll(
        /if \(numRead == 0\) \{\s*if \(requested == 0\) \{[\s\S]*?\}\s*else \{[\s\S]*?\}\s*\}/,
        """if (numRead == 0) {
      throw new java.io.IOException(
          "Reader returned 0 characters. See JFlex examples/zero-reader for a workaround.");
    }"""
      )

      content = content.replaceAll(
        /if \(numRead == requested\) \{[\s\S]*?zzFinalHighSurrogate = 1;[\s\S]*?\}/,
        """if (numRead == requested) { // We requested too few chars to encode a full Unicode character
        --zzEndRead;
        zzFinalHighSurrogate = 1;
        if (numRead == 1) {
          return true;
        }
      }"""
      )

      file('src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.java').text = content
    }
  }

  def generateStandardTokenizer = wrapWithPersistentChecksums(generateStandardTokenizerInternal, [
    andThenTasks: ["applyGoogleJavaFormat"],
    mustRunBefore: ["compileJava"]
  ])

  regenerate.dependsOn generateStandardTokenizer
}
```
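
One concern with regex-based patching is that `replaceAll` silently leaves the text unchanged when a pattern no longer matches, for example after a JFlex upgrade changes the generated code. The following is only a minimal sketch, in plain Java rather than Gradle, of a replacement helper that fails loudly in that case. `StrictReplace` and `replaceOrFail` are hypothetical names used for illustration; they are not part of Lucene, JFlex, or the build above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene or JFlex): a regex replacement
// that refuses to succeed when the expected text is missing.
final class StrictReplace {

  /** Replaces the first match of {@code regex}, or throws if nothing matches. */
  public static String replaceOrFail(String source, String regex, String replacement) {
    Matcher matcher = Pattern.compile(regex).matcher(source);
    if (!matcher.find()) {
      throw new IllegalStateException("Pattern not found in generated source: " + regex);
    }
    // quoteReplacement treats '$' and '\' in the replacement as literal text.
    return matcher.replaceFirst(Matcher.quoteReplacement(replacement));
  }

  public static void main(String[] args) {
    // Stand-in for a line of generated code; the real file is much larger.
    String generated = "int requested = zzBuffer.length - zzEndRead;";
    String patched =
        replaceOrFail(
            generated,
            "int requested = zzBuffer\\.length - zzEndRead;",
            "int requested = zzBuffer.length - zzEndRead - zzFinalHighSurrogate;");
    System.out.println(patched);
  }
}
```

The same check could be written directly in the `doLast` block by comparing the content before and after each `replaceAll` call and throwing a `GradleException` when they are equal.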
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]