The Apache Flink® Conference
Stream Processing | Event Driven | Real Time
San Francisco April 9–10, 2018
Apache Flink is a popular framework for real-time stream processing. Many streaming algorithms require trailing data to compute the intended result, for example the number of user logins in the last 7 days. This creates a dilemma: the results of the streaming program are incomplete until the program has been running for more than 7 days. The alternative is to bootstrap the program, using historic data to seed its state before switching to real-time data. This talk will discuss alternatives for bootstrapping programs in Flink. Some rely on technologies external to the stream program, such as enhancements to the pub/sub layer, and are therefore more generally applicable to other stream compute engines. Others involve enhancements to Flink source implementations. Lyft is exploring a further alternative that orchestrates multiple Flink programs. The talk will cover why Lyft pursued this alternative and future directions for enhancing bootstrapping support in Flink.
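To make the dilemma concrete, here is a minimal sketch in plain Python. It does not use Flink, and it is not the approach described in the talk. It only illustrates why a 7-day trailing login count is incomplete on a cold start and how seeding state from historic data closes the gap. The class `TrailingLoginCounter`, the function `run`, and the sample events are all hypothetical names and data chosen for illustration.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)


class TrailingLoginCounter:
    """Counts logins per user over a trailing 7-day window (hypothetical helper)."""

    def __init__(self):
        # user_id -> deque of login timestamps, assumed to arrive oldest first
        self.events = {}

    def add(self, user_id, ts):
        self.events.setdefault(user_id, deque()).append(ts)

    def count(self, user_id, now):
        q = self.events.get(user_id, deque())
        # Evict logins that have fallen out of the trailing window.
        while q and q[0] <= now - WINDOW:
            q.popleft()
        return len(q)


def run(historic, live, bootstrap):
    """Replay events into a fresh counter, optionally seeding it with historic data."""
    counter = TrailingLoginCounter()
    if bootstrap:
        for user_id, ts in historic:
            counter.add(user_id, ts)
    for user_id, ts in live:
        counter.add(user_id, ts)
    return counter


start = datetime(2018, 4, 9)  # moment the streaming job starts (illustrative)
# Logins recorded before the job started, in time order: 10, 6, 3, and 1 days ago.
historic = [("alice", start - timedelta(days=d)) for d in (10, 6, 3, 1)]
# A single login observed after the job started.
live = [("alice", start + timedelta(hours=1))]
now = start + timedelta(hours=2)

cold = run(historic, live, bootstrap=False).count("alice", now)
warm = run(historic, live, bootstrap=True).count("alice", now)
print(cold, warm)  # prints "1 4"
```

Without bootstrapping, the counter sees only the one live login. With bootstrapping, it also sees the three historic logins that fall inside the 7-day window, while the login from 10 days ago is evicted. A real streaming job would additionally need to handle the handoff between the historic and real-time sources, which this sketch ignores.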
Gregory is an expert in programming language runtimes, distributed systems, and big data processing. During his time on the ETA team at Lyft, Gregory transformed data processing from a manual process that took weeks to run into a fully automated process that runs every 10 minutes. As a formative member of the Data Science Platform team, Gregory helped define and deliver the vision for expanded use of machine learning techniques across Lyft. Now, as a member of the Streaming Platform team, Gregory is focused on the delivery of high-quality data to analytics and machine learning applications at Lyft. Before Lyft, Gregory was the lead architect of Salesforce’s Apex ecosystem (including the definition of the Apex language, compiler, runtime, debugger and other tooling, governance, batch processing, and caching), which services billions of requests a month.