Hi Allan,

Thanks a lot for your reply. I am glad you are an advocate for PMML. I discovered PMML in 2004 and was fascinated by the idea of representing my models in a standard language that can actually be moved across different platforms. PMML started small, but it is now a mature standard. The latest version, PMML 4.0, was released just last month. I wrote a blog post about everything the new version offers. If you are interested in taking a look at it, please follow the link below.
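To make that portability concrete, here is a minimal sketch of the export step in R, assuming the pmml and rpart packages are installed (the model and file name are just illustrations, not from the thread):

```r
# Minimal sketch: train a simple model in R and write it out as PMML.
# Assumes the 'pmml' and 'rpart' packages are installed.
library(rpart)  # recursive partitioning trees
library(pmml)   # PMML export of R models
library(XML)    # saveXML() to write the XML document to disk

# Train a small classification tree on a built-in data set
fit <- rpart(Species ~ ., data = iris)

# Serialize the fitted model as a PMML document; the resulting file
# can be consumed by any PMML-aware scoring engine
saveXML(pmml(fit), file = "iris_tree.pmml")
```

Once exported, the .pmml file is an ordinary XML document that moves between platforms independently of R.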
http://adapasupport.zementis.com/2009/06/pmml-40-is-here.html

BTW, thanks for the link to your blog. I believe it is a great source of information for the R community and the data mining community in general.

From what you wrote in your reply, all deployment options for R users rely on batch-mode scoring. Your conclusion was also very important, since you say that if you want to deploy models in real time or on demand, you usually would not use R (or tools such as SAS or SPSS). That is most probably because, until recently, there were no deployment platforms that could execute models built in these tools in real time or on demand. I believe the ADAPA Scoring Engine is such a platform. Given that it is available on the cloud as a service, it is highly scalable and cost-effective: the small instance costs less than $1/hour. It offers a web console for batch scoring in addition to web services for real-time or on-demand scoring.

Thanks again for your reply.

Best,
Alex

On Jul 15, 2009, at 1:12 AM, Allan Engelhardt wrote:

>> I am framing this as a question since I would like to know how folks
>> are currently deploying the models they build in R. Say, you want to
>> use the results of your model inside another application in real-time
>> or on-demand, how do you do it? How do you use the decisions you get
>> back from your models?
>
> Late answer, sorry. I love PMML (and have been advocating it since at
> least version 2.0) but I rarely see it deployed in commercial
> companies. What I see, in decreasing order of importance:
>
> 1. Pre-scoring. That is, pre-calculate the scores of your model for
> each customer and stuff them into a database that your operational
> system can access. Example: customer churn in mobile telco.
>
> 2. Convert the model to SQL. This is obviously easier for some model
> types (trees, k-nearest neighbour, ...) than others. This is
> surprisingly common.
> Example: A Big Famous Data Insights Company created a global customer
> segmentation model (really: 'cause all markets and cultures are the
> same....) for a multi-national company and distributed it as a Word
> document with pseudo-SQL fragments for each country to implement.
> This gets over the problem of different technologies in different
> countries.
>
> 3. Pre-scoring for multiple likely events. Example: For cross- and
> up-sell in a call centre (which is phenomenally effective) you really
> want to include the outcome of the original call as an input to the
> propensity model. A badly handled complaint call does not offer the
> same opportunities for flogging more products as a successful upgrade
> to a higher price plan (but might be an opportunity to extend an
> (expensive) retention offer). The Right Way to do this is to run the
> model in real time, which would usually mean PMML if you have created
> the model in R. At least one vendor recommended just pre-scoring the
> model for each possible (relevant) call outcome and storing that in
> the operational database. That vendor also sold databases :-)
>
> 4. Use PL/R to embed R within your (PostgreSQL) RDBMS. (Rare.)
>
> 5. Embed R within your operational application and run the model that
> way (I have done this exactly once).
>
> Somewhere between 1 and 2 is an approach that doesn't really fit with
> the way you framed the question (and is probably OT for this list). It
> is simply this: if you want to deploy models for real-time or fast
> on-demand usage, you usually don't implement them in R (or SAS or
> SPSS). In Marketing, which is my main area, there are dedicated tools
> for real-time decisioning and marketing like Oracle RTD [1], Unica
> inbound marketing [2], Chordiant Recommendation Advisor, and others
> [3], though only the first of these can realistically be described as
> modelling.
>
> Happy to discuss this more offline if you want.
> And I really like your approach - hope to actually use it some day.
>
> Allan.
>
> More at http://www.pcagroup.co.uk/ and http://www.cybaea.net/Blogs/Data/
>
> [1] http://www.oracle.com/appserver/business-intelligence/real-time-decisions.html
> [2] http://www.unica.com/products/Inbound_Marketing.htm [web site down
> at time of writing]
> [3] E.piphany and SPSS Interaction Builder appear to be nearly dead in
> the market.
>
> On 08/07/09 23:38, Guazzelli, Alex wrote:
>>
>> I am framing this as a question since I would like to know how folks
>> are currently deploying the models they build in R. Say, you want to
>> use the results of your model inside another application in real-time
>> or on-demand, how do you do it? How do you use the decisions you get
>> back from your models?
>>
>> As you may know, a PMML package is available for R that allows many
>> mining models to be exported into the Predictive Model Markup
>> Language. PMML is the standard way to represent models and can be
>> exported from most statistical packages (including SPSS, SAS, KNIME,
>> ...). Once your model is represented as a PMML file, it can easily be
>> moved around. PMML allows for true interoperability. We have recently
>> published an article about PMML in The R Journal. It basically
>> describes the PMML language and the package itself. If you are
>> interested in finding out more about PMML and how to benefit from
>> this standard, please check the link below.
>>
>> http://journal.r-project.org/2009-1/RJournal_2009-1_Guazzelli+et+al.pdf
>>
>> We have also written a paper about open standards and cloud computing
>> for the SIGKDD Explorations newsletter. In this paper, we describe
>> the ADAPA Scoring Engine, which executes PMML models and is available
>> as a service on the Amazon Cloud. ADAPA can be used to deploy R
>> models in real time from anywhere in the world.
>> I believe it represents a revolution in data mining since it allows
>> anyone who uses R to make effective use of predictive models at a
>> cost of less than $1/hour.
>>
>> http://www.zementis.com/docs/SIGKDD_ADAPA.pdf
>>
>> Thanks!
>>
>> Alex
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

Alex Guazzelli, Ph.D.
Vice President of Analytics
Zementis, Inc.
6125 Cornerstone Court East, Suite 250
San Diego, CA 92121
T: 619 330 0780 x1011
F: 858 535 0227
E: alex.guazze...@zementis.com
www.zementis.com