[
https://issues.apache.org/jira/browse/PIO-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876893#comment-15876893
]
Pat Ferrel commented on PIO-45:
-------------------------------
I've updated the Template here: https://github.com/actionml/db-cleaner.git
It has a simplified integration test and a file containing the expected results.
Neither the trim + deduplicate nor the property aggregation passes. The trim
drops some events; the aggregation fails as described above.
I'm not sure why the trim + dedup fails. The failures involve 2 events with
different names from 2 different users. They are accepted by the EventServer
and then the train task is started.
What they have in common is that both are duplicates of events that should be
trimmed. In other words, the event times of the duplicates span the trim
cutoff. This obviously should not happen. Trim should be independent of dedup:
just because one of the duplicate events is too old does not mean the rest
are (and in this case they aren't).
If the above is not enough, I can suggest 2 algorithm changes for the trim:
trim before dedup, or dedup to the most recent timestamp and then trim. Either
should produce only one event inside the duration, the one with the most recent
timestamp, unless all duplicates are outside the duration (that case seems to
work).
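A minimal sketch of the two suggested orderings, written in Python for brevity rather than PIO's Scala. The Event type, the dedup key (entity ID + event name), and all function names are hypothetical simplifications for illustration, not the SelfCleaningDataSource API; real dedup in PIO may compare more fields.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    # Hypothetical, simplified event; real PIO events carry more fields.
    entity_id: str
    event: str
    event_time: datetime


def dedup_key(e: Event) -> Tuple[str, str]:
    # Assumption: two events are duplicates if entity and event name match.
    return (e.entity_id, e.event)


def trim(events: List[Event], cutoff: datetime) -> List[Event]:
    # Keep only events inside the duration (at or after the cutoff).
    return [e for e in events if e.event_time >= cutoff]


def dedup_keep_latest(events: List[Event]) -> List[Event]:
    # Collapse each group of duplicates to its most recent event.
    latest: Dict[Tuple[str, str], Event] = {}
    for e in events:
        k = dedup_key(e)
        if k not in latest or e.event_time > latest[k].event_time:
            latest[k] = e
    return list(latest.values())


def trim_then_dedup(events: List[Event], cutoff: datetime) -> List[Event]:
    # Option 1: drop out-of-duration events first, then dedup what remains.
    return dedup_keep_latest(trim(events, cutoff))


def dedup_then_trim(events: List[Event], cutoff: datetime) -> List[Event]:
    # Option 2: dedup to the most recent timestamp, then trim.
    return trim(dedup_keep_latest(events), cutoff)


if __name__ == "__main__":
    now = datetime(2017, 2, 21)
    cutoff = now - timedelta(days=5)
    events = [
        Event("u1", "view", now - timedelta(days=10)),  # old duplicate
        Event("u1", "view", now - timedelta(days=1)),   # recent duplicate
        Event("u2", "buy", now - timedelta(days=30)),   # all dups too old
        Event("u2", "buy", now - timedelta(days=20)),
    ]
    print(trim_then_dedup(events, cutoff))
    print(dedup_then_trim(events, cutoff))
```

Both orderings keep only u1's most recent view: the old u1 duplicate does not cause the recent one to be dropped, while u2's events, all outside the duration, are removed.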
Just pull the most recent template and run the integration test. You'll find
the before, after, and expected output in data/. Let me know if you have any
questions.
> SelfCleaningDatasource erases all data
> --------------------------------------
>
> Key: PIO-45
> URL: https://issues.apache.org/jira/browse/PIO-45
> Project: PredictionIO
> Issue Type: Bug
> Affects Versions: 0.10.0-incubating
> Reporter: Pat Ferrel
> Assignee: Alexander Merritt
> Priority: Blocker
> Fix For: 0.11.0
>
> Attachments: import_handmade_simple.py,
> sample-time-window-and-downsample-data.txt
>
>
> As integrated into the UR, the SelfCleaningDataset erases all data in the
> integration test. This feature works fine in the AML version of PIO.
> Although not tested, one could assume the same would happen with any other
> Datasource in other templates.
> [~emergentorder] can you check whether the PIO merge was done correctly?