On Sun, Aug 18, 2019 at 04:54:52PM +0200, Erich Schubert wrote:
> Hi,
>
> To reproduce the experiments, get the (older than what I used) ELKI
> standalone jar here:
>
> https://elki-project.github.io/releases/release0.7.5/elki-bundle-0.7.5.jar
>
> (snip)
>
> A suitable command line is:
>
> taskset -c 0 $JAVA_HOME/bin/java \
> -jar elki-bundle-0.7.5.jar \
> cli -time -dbc.in aloi-hsb-2x2x2.csv.gz \
> -algorithm clustering.DBSCAN -dbscan.minpts 20 -dbscan.epsilon 0.01 \
> -evaluator NoAutomaticEvaluation -resulthandler DiscardResultHandler
>
> the "cli" command runs this headless; the "-time" parameter enables a
> minimal output of statistics. The last two parameter disable the
> (unnecessary for this purpose) evaluation and result output. tasksets forces
> this to use only the first CPU (except for the garbage collection,
> everything is single-threaded in this call).
I have been able to reproduce the "Java 8 is faster than Java 11" results initially reported using the ALOI datasets from [1]. However, I have also seen Java 11 outperform Java 8 (by 4-5%) using the same command-line invocation of ELKI against a different dataset [2].

This isn't intended to refute your findings, but to suggest that other factors may be involved. Perhaps the performance difference depends on specific aspects of the workload, e.g., whether it puts more pressure on the memory subsystem than on the CPU (or something like that).

Each of these was run three times on Debian sid on an otherwise quiet system; the results below are representative (in every run, JDK 11 was faster on my system with this dataset):

#
# Java 8
# OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)
#
elki taskset -c 0 /usr/lib/jvm/java-8-openjdk-amd64/bin/java \
-jar elki-bundle-0.7.5.jar \
cli -time -dbc.in data/Range-Queries-Aggregates.csv \
-algorithm clustering.DBSCAN -dbscan.minpts 20 -dbscan.epsilon 1 \
-evaluator NoAutomaticEvaluation -resulthandler DiscardResultHandler
de.lmu.ifi.dbs.elki.datasource.FileBasedDatabaseConnection.load: 829 ms
de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN.average-neighbors: 1.0
There are very few neighbors found. Epsilon may be too small.
de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN.runtime: 513229 ms

#
# Java 11
# OpenJDK 64-Bit Server VM (build 11.0.4+11-post-Debian-1, mixed mode)
#
elki taskset -c 0 /usr/lib/jvm/java-11-openjdk-amd64/bin/java \
-jar elki-bundle-0.7.5.jar \
cli -time -dbc.in data/Range-Queries-Aggregates.csv \
-algorithm clustering.DBSCAN -dbscan.minpts 20 -dbscan.epsilon 1 \
-evaluator NoAutomaticEvaluation -resulthandler DiscardResultHandler
de.lmu.ifi.dbs.elki.datasource.FileBasedDatabaseConnection.load: 864 ms
de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN.average-neighbors: 1.0
There are very few neighbors found. Epsilon may be too small.
de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN.runtime: 490520 ms

So, this is to be continued...

In any event, thank you for bringing this benchmark to the list.

Cheers,
tony

[1] https://www.dbs.ifi.lmu.de/research/outlier-evaluation/input/ALOI.tar.gz
[2] http://archive.ics.uci.edu/ml/datasets/Query+Analytics+Workloads+Dataset
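For reference, the "4-5%" figure follows directly from the two DBSCAN runtimes reported above (513229 ms with Java 8, 490520 ms with Java 11). The sketch below shows that arithmetic in plain POSIX shell plus awk. The helper name pct_faster is hypothetical (it is not part of ELKI or the JDK), and it expresses the difference relative to the slower run.

```shell
#!/bin/sh
# Hypothetical helper (not part of ELKI or the JDK):
# prints how much faster the second runtime is than the first,
# as a percentage of the first (slower) runtime, with two decimals.
pct_faster() {
  awk -v slow="$1" -v fast="$2" \
    'BEGIN { printf "%.2f\n", (slow - fast) / slow * 100 }'
}

# DBSCAN.runtime values (ms) from the Java 8 and Java 11 runs above.
pct_faster 513229 490520
```

This prints 4.42. Measured relative to the Java 11 runtime instead, the same 22709 ms difference is about 4.63%; either way the gap falls within the 4-5% range.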