Thanks Bert, I will definitely look through rseek, and reuse wherever possible. Scanning through the first few pages, maybe "datacheck" can provide something. But I have in mind a complete DQ package, a sports car with 4 good wheels ;) and it still seems likely that I will need to develop something at this point.

Regards,
David

On Fri, 4 Aug 2017, 3:15 pm Bert Gunter, <bgunter.4...@gmail.com> wrote:

> Sounds like you'll be reinventing square wheels.
>
> Searching "data quality package" on rseek.org brought up many hits.
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Fri, Aug 4, 2017 at 2:56 AM, Architector Data Tools via R-help
> <r-help@r-project.org> wrote:
> > I am planning to develop an R package to manage all aspects of data
> > quality. I am very experienced in data quality, but fairly new to R. I
> > have tried to find a suitable data quality package, and am surprised
> > not to find much to suit my requirements. Developing the package
> > would be an ambitious effort, involving several contributors (that I
> > have already identified, and who also do not have much R experience
> > yet). So I am seeking some confidence that the effort is worthwhile.
> >
> > The package will be highly configurable so it can be applied to pretty
> > much any situation, and will implement sophisticated data quality
> > capabilities, including:
> >
> > (a) DEFINITION: integration with a data dictionary (perhaps metaData),
> > and with highly configurable and expressive data quality rules
> >
> > (b) MONITORING & DETECTION: automated data quality monitoring and
> > alerting against any data source. Automatically raise and update
> > quality issues
> >
> > (c) ANALYSIS & ROOT CAUSE: data quality dashboard, alerts,
> > drill-downs, plot trends, including perhaps a machine learning aspect
> > that detects noteworthy events in quality measurements for inclusion
> > in executive reports
> >
> > (d) WORKFLOW: basic data quality management workflow (i.e. implement
> > 'inbox' and 'actions', probably via Shiny)
> >
> > The requirements will be drawn from my professional experience (as
> > interim head of data quality at a global bank), although this project
> > is not sponsored either by my employer or any of my consulting
> > clients. I do, however, expect the package to be of interest to
> > financial service organisations who rely on good quality data for
> > their financial and risk models, and for any other process that relies
> > on good data.
> >
> > To sum up, if anyone can point to a data quality package that means I
> > don’t have to develop one that would be great. Alternatively, any
> > comments of support would also be very useful!
> >
> > David
> >
> > David Twaddell
> > Architector Data Tools
> > Tel: +44 20 3239 1099 | +44 7447 936 984
> > Web: www.architector.co.uk
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.