Track: Data Engineering for the Bold

Day of week: Tuesday

Data engineering is the practice of delivering high-fidelity, custom access to data in order to serve the varied needs of a business. The rich and engaging experiences many of us expect online today (e.g., personalized news feeds, highly relevant search engines and recommender systems, smart home assistants) are powered by modern data pipelines and architectures that form the foundation of data engineering. The tools a data engineer can deploy today occupy a vast landscape. The field may have started out as “put all of your data in that RDBMS over there,” but it has evolved into a multitude of specialty data solutions. It encompasses databases (RDBMS, NoSQL, NewSQL, OLAP DBs, etc.), messaging systems (Kafka, Kinesis, Pulsar), data compute frameworks (Spark, Flink, Ray, graph compute), storage systems (distributed file systems, block storage, object storage), search engines, real-time OLAP engines, graph DBs, and machine learning frameworks (Petastorm, Michelangelo). As the volume and speed of data grow, we continue to discover new patterns and frameworks for squeezing more out of our data. What are some of the new entrants in this space, and what interesting problems are they solving? Come to this track to find out!

Track Host: Sid Anand

Chief Data Engineer @PayPal, PMC & Committer for Apache Airflow, Co-Chair for QCon

Sid Anand currently serves as PayPal's Chief Data Engineer, focusing on ways to realize the value of data. Before joining PayPal, he held several positions, including Data Architect at Agari, Technical Lead in Search at LinkedIn, Cloud Data Architect at Netflix, VP of Engineering at Etsy, and several technical roles at eBay. Sid earned his BS and MS degrees in CS from Cornell University, where he focused on distributed systems. In his spare time, he is a maintainer/committer on Apache Airflow, a co-chair for QCon, and a frequent conference speaker. When not working, Sid spends time with his wife, Shalini, and their two kids.

Scaling DB Access for Billions of Queries Per Day @PayPal

As microservices scale and proliferate, they place increasing load on databases in terms of connections and resource usage. VulcanMX, written in Go and open-sourced, scales thousands of PayPal’s applications through connection multiplexing, read-write splitting, and sharding. This talk covers the approaches taken over the years to handle large growth in application connections and OLTP database utilization. Beyond pure connection and query scaling, VulcanMX offers features for better manageability: automatic SQL eviction and DBA maintenance control make it easier to operate hundreds of databases.

Petrica Voicu, Software Engineer @PayPal
Kenneth Kang, Software Engineer @PayPal
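To make two of the techniques named in the abstract more concrete, here is a minimal Go sketch of read-write splitting combined with hash-based sharding. It is illustrative only and is not VulcanMX's API: the `Router` and `Backend` types, the `Route` method, the FNV-1a shard hash, and the rule that only plain `SELECT` statements (without `FOR UPDATE`) count as reads are all hypothetical assumptions. Connection multiplexing is not shown.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// Backend identifies a hypothetical database endpoint: a shard index plus a role.
type Backend struct {
	Shard int
	Role  string // "primary" or "replica"
}

// Router is a hypothetical sketch of read-write split plus hash sharding.
// It is NOT VulcanMX's actual API. NumShards is assumed to be > 0.
type Router struct {
	NumShards int
}

// shardFor maps a shard key (e.g. an account ID) to a shard index using FNV-1a,
// so the same key always lands on the same shard.
func (r Router) shardFor(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash writes never return an error
	return int(h.Sum32() % uint32(r.NumShards))
}

// isRead is a deliberately simple assumption: plain SELECTs are reads,
// SELECT ... FOR UPDATE and everything else must go to the primary.
func isRead(sql string) bool {
	s := strings.ToUpper(strings.TrimSpace(sql))
	return strings.HasPrefix(s, "SELECT") && !strings.Contains(s, "FOR UPDATE")
}

// Route picks the shard from the key and the role from the statement type.
func (r Router) Route(key, sql string) Backend {
	role := "primary"
	if isRead(sql) {
		role = "replica"
	}
	return Backend{Shard: r.shardFor(key), Role: role}
}

func main() {
	r := Router{NumShards: 4}
	read := r.Route("account-42", "SELECT balance FROM accounts WHERE id = :id")
	write := r.Route("account-42", "UPDATE accounts SET balance = :b WHERE id = :id")
	fmt.Println(read.Role, write.Role, read.Shard == write.Shard)
}
```

Because the shard is derived only from the key, a read and a write for the same account reach the same shard, while the statement type decides whether a replica or the primary serves it. Treating anything that is not clearly a read as a write is a conservative default: a misclassified read costs some primary capacity, whereas a misclassified write would fail or go to the wrong node.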


