The Apache Flink® Conference
Stream Processing | Event Driven | Real Time
San Francisco April 9–10, 2018
Operationalizing Machine Learning models is never easy. Our team at Comcast has been challenged with operationalizing predictive ML models to improve customer care experiences. Using Apache Flink we have been able to apply real-time streaming to all aspects of the Machine Learning lifecycle. This includes data feature exploration and preparation by data scientists, deploying live models to serve near-real-time predictions, and validating results for model retraining and iteration. We will share best practices and lessons learned from Flink’s role in our operationalized lifecycle including:
• Executing as the “Prediction Pipeline” – a model container environment for near-real-time streaming and batch predictions
• Preparing streaming features and data sets for model training, as input for production model predictions, and for a continually-updated customer context
• Using connected streams and savepoints for “Live in the Dark”, multi-variant testing, and validation scenarios
• Incorporating Flink’s Queryable State as an approach to the online “Feature Store” – a data catalog for reuse by multiple models and use cases
• Enabling versioned models, versioned feature sets, and versioned data through DevOps approaches
Dave is an Enterprise Software Architect with over twenty-five years of technical leadership in the telecommunications, financial services, and healthcare domains. Dave’s diverse experience ranges from engineering event-based and rule processing systems at “PaaS” (Platform as a Service) scale to building an autonomous-agent workplace simulation engine. At Comcast, Dave is leading the end-to-end ingest, compute, and machine learning pipeline architectures for supporting Customer Experience Big Data applications.
Sameer Wadkar is a senior principal architect for machine learning at Comcast NBCUniversal, where he works on operationalizing machine learning models to enable rapid turnaround times from model development to model deployment and oversees data ingestion from data lakes, streaming data transformations, and model deployment in hybrid environments ranging from on-premises, cloud, and edge devices. Previously, he developed big data systems capable of handling billions of financial transactions per day arriving out of order for market reconstruction to conduct surveillance of trading activity across multiple markets and implemented natural language processing (NLP) and computer vision-based systems for various public and private sector clients.