Hi Allan,

Thanks a lot for your reply. I am glad you are an advocate for PMML. I discovered PMML in 2004 and was fascinated by the idea of representing my models in a standard language that can actually be moved across different platforms. PMML started small, but it is now a mature standard. The latest version, PMML 4.0, was released just last month. I wrote a blog post about everything the new version offers. If you are interested in taking a look at it, please follow the link below.
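To make that portability concrete, here is a minimal sketch of the export step in R, assuming the pmml and rpart packages are installed (the model and file name are just illustrations, not from the thread):

```r
# Minimal sketch: train a simple model in R and write it out as PMML.
# Assumes the 'pmml' and 'rpart' packages are installed.
library(rpart)  # recursive partitioning trees
library(pmml)   # PMML export of R models
library(XML)    # saveXML() to write the XML document to disk

# Train a small classification tree on a built-in data set
fit <- rpart(Species ~ ., data = iris)

# Serialize the fitted model as a PMML document; the resulting file
# can be consumed by any PMML-aware scoring engine
saveXML(pmml(fit), file = "iris_tree.pmml")
```

Once exported, the .pmml file is an ordinary XML document that moves between platforms independently of R.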
http://adapasupport.zementis.com/2009/06/pmml-40-is-here.html

BTW, thanks for the link to your blog. I believe it is a great source of information for the R community and the data mining community in general.

From what you wrote in your reply, all deployment options for R users rely on batch-mode scoring. Your conclusion was also very important, since you say that if you want to deploy models in real time or on demand, you usually would not use R (or tools such as SAS or SPSS). That is most probably because, until recently, there were no deployment platforms that could execute models built in these tools in real time or on demand. I believe the ADAPA Scoring Engine is such a platform. Given that it is available on the cloud as a service, it is highly scalable and cost-effective: the small instance costs less than $1/hour. It offers a web console for batch scoring in addition to web services for real-time or on-demand scoring.

Thanks again for your reply.

Best,
Alex

On Jul 15, 2009, at 1:12 AM, Allan Engelhardt wrote:

>> I am framing this as a question since I would like to know how folks
>> are currently deploying the models they build in R. Say, you want to
>> use the results of your model inside another application in real-time
>> or on-demand, how do you do it? How do you use the decisions you get
>> back from your models?
>
> Late answer, sorry. I love PMML (and have been advocating it since at
> least version 2.0) but I rarely see it deployed in commercial
> companies. What I see, in decreasing order of importance:
>
> 1. Pre-scoring. That is, pre-calculate the scores of your model for
> each customer and stuff them into a database that your operational
> system can access. Example: customer churn in mobile telco.
>
> 2. Convert the model to SQL. This is obviously easier for some model
> types (trees, k-nearest neighbour, ...) than others. This is
> surprisingly common.
> Example: A Big Famous Data Insights Company created a global customer
> segmentation model (really: 'cause all markets and cultures are the
> same....) for a multi-national company and distributed it as a Word
> document with pseudo-SQL fragments for each country to implement.
> This gets over the problem of different technologies in different
> countries.
>
> 3. Pre-scoring for multiple likely events. Example: For cross- and
> up-sell in a call centre (which is phenomenally effective) you really
> want to include the outcome of the original call as an input to the
> propensity model. A badly handled complaint call does not offer the
> same opportunities for flogging more products as a successful upgrade
> to a higher price plan (but might be an opportunity to extend an
> (expensive) retention offer). The Right Way to do this is to run the
> model in real time, which would usually mean PMML if you have created
> the model in R. At least one vendor recommended just pre-scoring the
> model for each possible (relevant) call outcome and storing that in
> the operational database. That vendor also sold databases :-)
>
> 4. Use PL/R to embed R within your (PostgreSQL) RDBMS. (Rare.)
>
> 5. Embed R within your operational application and run the model that
> way (I have done this exactly once).
>
> Somewhere between 1 and 2 is an approach that doesn't really fit with
> the way you framed the question (and is probably OT for this list). It
> is simply this: if you want to deploy models for real-time or fast
> on-demand usage, you usually don't implement them in R (or SAS or
> SPSS). In Marketing, which is my main area, there are dedicated tools
> for real-time decisioning and marketing like Oracle RTD [1], Unica
> inbound marketing [2], Chordiant Recommendation Advisor, and others
> [3], though only the first of these can realistically be described as
> modelling.
>
> Happy to discuss this more offline if you want.
> And I really like your approach - hope to actually use it some day.
>
> Allan.
>
> More at http://www.pcagroup.co.uk/ and http://www.cybaea.net/Blogs/Data/
>
> [1] http://www.oracle.com/appserver/business-intelligence/real-time-decisions.html
> [2] http://www.unica.com/products/Inbound_Marketing.htm [web site down
> at time of writing]
> [3] E.piphany and SPSS Interaction Builder appear to be nearly dead in
> the market.
>
> On 08/07/09 23:38, Guazzelli, Alex wrote:
>>
>> I am framing this as a question since I would like to know how folks
>> are currently deploying the models they build in R. Say, you want to
>> use the results of your model inside another application in real-time
>> or on-demand, how do you do it? How do you use the decisions you get
>> back from your models?
>>
>> As you may know, a PMML package is available for R that allows many
>> mining models to be exported into the Predictive Model Markup
>> Language. PMML is the standard way to represent models and can be
>> exported from most statistical packages (including SPSS, SAS, KNIME,
>> ...). Once your model is represented as a PMML file, it can easily be
>> moved around. PMML allows for true interoperability. We have recently
>> published an article about PMML in The R Journal. It basically
>> describes the PMML language and the package itself. If you are
>> interested in finding out more about PMML and how to benefit from
>> this standard, please check the link below.
>>
>> http://journal.r-project.org/2009-1/RJournal_2009-1_Guazzelli+et+al.pdf
>>
>> We have also written a paper about open standards and cloud computing
>> for the SIGKDD Explorations newsletter. In this paper, we describe
>> the ADAPA Scoring Engine, which executes PMML models and is available
>> as a service on the Amazon Cloud. ADAPA can be used to deploy R
>> models in real time from anywhere in the world.
>> I believe it represents a revolution in data mining since it allows
>> anyone who uses R to make effective use of predictive models at a
>> cost of less than $1/hour.
>>
>> http://www.zementis.com/docs/SIGKDD_ADAPA.pdf
>>
>> Thanks!
>>
>> Alex
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

Alex Guazzelli, Ph.D.
Vice President of Analytics
Zementis, Inc.
6125 Cornerstone Court East, Suite 250
San Diego, CA 92121
T: 619 330 0780 x1011
F: 858 535 0227
E: alex.guazze...@zementis.com
www.zementis.com