Presentation: Papers We Love, QCon Edition



7:15pm - 9:55pm

The Papers We Love Meetup (New York City) put together a series of four PWL "mini" (15-20 minute) presentations by four wonderful speakers, all of whom are also speaking at QCon New York. We're extremely happy to welcome Evelina Gabasova (@evelgab), Eric Brewer (@eric_brewer), Ines Sombra (@randommood), and Caitie McCaffrey (@caitie) to PWLNYC!

Visit the Papers We Love, QCon Edition meetup to learn more about the NYC chapter of the Papers We Love Meetup and the Tuesday evening event.

PWL "Mini" Speakers

Evelina Gabasova presenting Mastering the Game of Go with Deep Neural Networks and Tree Search:

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence due to its enormous search space and the difficulty of evaluating board positions and moves. We introduce a new approach to computer Go that uses value networks to evaluate board positions and policy networks to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte-Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte-Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.

Eric Brewer presenting Experience with Processes and Monitors in Mesa:

The use of monitors for describing concurrency has been much discussed in the literature. When monitors are used in real systems of any size, however, a number of problems arise which have not been adequately dealt with: the semantics of nested monitor calls; the various ways of defining the meaning of WAIT; priority scheduling; handling of timeouts, aborts and other exceptional conditions; interactions with process creation and destruction; monitoring large numbers of small objects. These problems are addressed by the facilities described here for concurrent programming in Mesa. Experience with several substantial applications gives us some confidence in the validity of our solutions. 

Ines Sombra presenting IronFleet: Proving Practical Distributed Systems Correct:

Distributed systems are notorious for harboring subtle bugs. Verification can, in principle, eliminate these bugs a priori, but verification has historically been difficult to apply at full program scale, much less distributed-system scale. We describe a methodology for building practical and provably correct distributed systems based on a unique blend of TLA-style state-machine refinement and Hoare-logic verification. We demonstrate the methodology on a complex implementation of a Paxos-based replicated state machine library and a lease-based sharded key-value store. We prove that each obeys a concise safety specification, as well as desirable liveness requirements. Each implementation achieves performance competitive with a reference system. With our methodology and lessons learned, we aim to raise the standard for distributed systems from “tested” to “correct.”

Caitie McCaffrey presenting Simple Testing Can Prevent Most Critical Failures...:

Large, production quality distributed systems still fail periodically, and do so sometimes catastrophically, where most or all users experience an outage or data loss. We present the result of a comprehensive study investigating 198 randomly selected, user-reported failures that occurred on Cassandra, HBase, Hadoop Distributed FileSystem (HDFS), Hadoop MapReduce, and Redis, with the goal of understanding how one or multiple faults eventually evolve into a user-visible failure. We found that from a testing point of view, almost all failures require only 3 or fewer nodes to reproduce, which is good news considering that these services typically run on a very large number of nodes. However, multiple inputs are needed to trigger the failures with the order between them being important. Finally, we found the error logs of these systems typically contain sufficient data on both the errors and the input events that triggered the failure, enabling the diagnosis and the reproduction of the production failures.

We found the majority of catastrophic failures could easily have been prevented by performing simple testing on error handling code – the last line of defense – even without an understanding of the software design. We extracted three simple rules from the bugs that have led to some of the catastrophic failures, and developed a static checker, Aspirator, capable of locating these bugs. Over 30% of the catastrophic failures would have been prevented had Aspirator been used and the identified bugs fixed. Running Aspirator on the code of 9 distributed systems located 143 bugs and bad practices that have been fixed or confirmed by the developers.

Speaker: Evelina Gabasova

Bioinformatics Machine Learning Researcher @Cambridge_Uni

Evelina is a machine learning researcher working in bioinformatics, trying to reverse-engineer cancer at the University of Cambridge. Outside of academia, she speaks at developer conferences and user groups about F# and data science. She also writes a blog.

Speaker: Caitie McCaffrey

Distributed Systems Engineer @Twitter

Caitie McCaffrey is a Backend Brat and Distributed Systems Diva at Twitter, where she is the Tech Lead of the Observability Team. Prior to that, she spent the majority of her career building large-scale services and systems that power the entertainment industry at 343 Industries, Microsoft Game Studios, and HBO. Caitie has a degree in Computer Science from Cornell University and has worked on several video games, including Gears of War 2, Gears of War 3, Halo 4, and Halo 5. She maintains a blog and frequently discusses technology on Twitter @Caitie.

Speaker: Eric Brewer

CAP Theorem Creator, VP Infrastructure @Google, CS Professor @UCBerkeley

Dr. Brewer joined Google in May 2011 and leads the company’s compute infrastructure design. He focuses on all aspects of Internet-based systems, including cloud computing, scalability, containers, and storage. As a researcher, Dr. Brewer has led projects on scalable servers, network infrastructure, IoT, and the CAP Theorem. He has also led work on technology for developing regions, with projects in India, Indonesia, and Kenya among others, covering communications, power, and health care. In 1996, he co-founded Inktomi Corporation with a Berkeley grad student and helped lead it onto the NASDAQ 100. In 2000, working with President Clinton, Dr. Brewer helped to create the official portal of the Federal government.

Speaker: Ines Sombra

Engineer @Fastly

Ines Sombra is an Engineer at @Fastly, where she spends her time helping the Web go faster. Ines holds an M.S. in Computology with an emphasis on Cheesy 80’s Rock Ballads. She has a fondness for steak, fernet, and a pug named Gordo. In a previous life she was a Data Engineer.
