-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17058/#review32202
-----------------------------------------------------------

Here are some initial comments on the new changes since the last time I reviewed it.


src/java/datafu/pig/sampling/WeightedReservoirSampleWithExpJump.java
<https://reviews.apache.org/r/17058/#comment60929>

    You also need to implement new versions of Intermediate and Final that set the reservoir to InverseWeightJumpSampleReservoir. Currently you are using the default Reservoir that does not perform the exponential jumps. Therefore the algebraic version doesn't behave as intended. I checked this with the debugger by stepping through the code.


src/java/datafu/pig/sampling/WeightedReservoirSampleWithExpJump.java
<https://reviews.apache.org/r/17058/#comment60918>

    minor comment: It appears that declaring 'r' is not needed. Math.random() could be called on line 146 directly.


src/java/datafu/pig/stats/entropy/stream/ChaoShenEntropyEstimator.java
<https://reviews.apache.org/r/17058/#comment60916>

    I think everything in the datafu.pig.stats.entropy.stream package should be moved to datafu.pig.stats.entropy.


- Matthew Hayes


On Jan. 17, 2014, 6:57 p.m., Matthew Hayes wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17058/
> -----------------------------------------------------------
>
> (Updated Jan. 17, 2014, 6:57 p.m.)
>
>
> Review request for DataFu.
>
>
> Repository: datafu
>
>
> Description
> -------
>
> This is Jian Wang's implementation of the entropy and weighted sampling
> algorithms in DATAFU-2. I generated the diff by applying the patches in a
> branch and squashing into one commit.
>
>
> Diffs
> -----
>
>   src/java/datafu/pig/sampling/ReservoirSample.java 403bfaf06402c8dac2a70c0133ee955c625cb06e
>   src/java/datafu/pig/sampling/ScoredSampleReservoir.java PRE-CREATION
>   src/java/datafu/pig/sampling/ScoredTuple.java 293cf3d6fcfa041287258b94a1f686a2a77ab4ce
>   src/java/datafu/pig/sampling/WeightedReservoirSample.java PRE-CREATION
>   src/java/datafu/pig/sampling/WeightedReservoirSampleWithExpJump.java PRE-CREATION
>   src/java/datafu/pig/stats/entropy/Entropy.java PRE-CREATION
>   src/java/datafu/pig/stats/entropy/EntropyUtil.java PRE-CREATION
>   src/java/datafu/pig/stats/entropy/stream/ChaoShenEntropyEstimator.java PRE-CREATION
>   src/java/datafu/pig/stats/entropy/stream/EmpiricalEntropyEstimator.java PRE-CREATION
>   src/java/datafu/pig/stats/entropy/stream/EntropyEstimator.java PRE-CREATION
>   src/java/datafu/pig/stats/entropy/stream/StreamingCondEntropy.java PRE-CREATION
>   src/java/datafu/pig/stats/entropy/stream/StreamingEntropy.java PRE-CREATION
>   test/pig/datafu/test/pig/sampling/WeightedReservoirSampleWithExpJumpTests.java PRE-CREATION
>   test/pig/datafu/test/pig/sampling/WeightedReservoirSamplingTests.java PRE-CREATION
>   test/pig/datafu/test/pig/stats/entropy/AbstractEntropyTests.java PRE-CREATION
>   test/pig/datafu/test/pig/stats/entropy/EntropyTests.java PRE-CREATION
>   test/pig/datafu/test/pig/stats/entropy/StreamingChaoShenEntropyTests.java PRE-CREATION
>   test/pig/datafu/test/pig/stats/entropy/StreamingEmpiricalCondEntropyTests.java PRE-CREATION
>   test/pig/datafu/test/pig/stats/entropy/StreamingEmpiricalEntropyTests.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/17058/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Matthew Hayes
>
>
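
P.S. To illustrate the first comment (Intermediate and Final falling back to the default reservoir), here is a minimal, self-contained Java sketch of the pattern I have in mind. None of these classes are the actual DataFu or Pig classes: Reservoir, JumpReservoir, BaseStage, newReservoir() and the Base*/Jump* stage names are hypothetical stand-ins, and the real fix may well be structured differently. The only point is that each algebraic stage has to construct the jump-based reservoir itself, otherwise it silently inherits the default one.

```java
public class ReservoirOverrideSketch {

    /** Hypothetical stand-in for the default reservoir (no exponential jumps). */
    static class Reservoir {
        final int capacity;
        Reservoir(int capacity) { this.capacity = capacity; }
        String kind() { return "default"; }
    }

    /** Hypothetical stand-in for InverseWeightJumpSampleReservoir. */
    static class JumpReservoir extends Reservoir {
        JumpReservoir(int capacity) { super(capacity); }
        @Override String kind() { return "exp-jump"; }
    }

    /** Hypothetical base stage: builds its reservoir via an overridable factory. */
    static class BaseStage {
        protected Reservoir newReservoir(int capacity) {
            return new Reservoir(capacity);
        }
        /** Reports which reservoir this stage would actually sample with. */
        String run(int capacity) {
            return newReservoir(capacity).kind();
        }
    }

    // Stages that do not override the factory keep the default reservoir.
    static class BaseIntermediate extends BaseStage {}
    static class BaseFinal extends BaseStage {}

    // Stages for the exp-jump variant must override the factory explicitly.
    static class JumpIntermediate extends BaseIntermediate {
        @Override protected Reservoir newReservoir(int capacity) {
            return new JumpReservoir(capacity);
        }
    }
    static class JumpFinal extends BaseFinal {
        @Override protected Reservoir newReservoir(int capacity) {
            return new JumpReservoir(capacity);
        }
    }

    public static void main(String[] args) {
        System.out.println(new BaseIntermediate().run(10)); // prints "default"
        System.out.println(new JumpIntermediate().run(10)); // prints "exp-jump"
        System.out.println(new JumpFinal().run(10));        // prints "exp-jump"
    }
}
```

A unit test along these lines (asserting which reservoir type each stage ends up with) would also catch the problem without needing the debugger.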
