New Yorkers
Presentations about New Yorkers
Debugging Microservices: How Google SREs Resolve Outages
Defense in Depth: In Depth
Engineering Secure Products at Facebook
Canopy: Scalable Distributed Tracing & Analysis @Facebook
Java 11 - Keeping the Java Release Train on the Right Track
Design Microservice Architectures the Right Way
Digital Publishing for Scale: The Economist and Go
Organizing for Your Ethical Principles
Rethinking HCI With Neural Interfaces @CTRLlabsco
Software Is Eating the World, ML Is Going to Eat Software
"Yo... Ask Me Anything" - Panel of NY Senior Java Developers
Why Bother With Kotlin - Not Just Another Language Tour
Interviews
Debugging Microservices: How Google SREs Resolve Outages
What is the work that you do today as a Google SRE?
Adam: I work for a Google DevOps team that takes care of Monarch. Monarch is a very large time series database used for querying and metrics collection. Monarch is roughly the internal equivalent of combining Prometheus, Grafana, and Graphite from the open source world. Monarch also adds to that stack all of Stackdriver and provides the backend for a lot of our cloud signals product. My role is SRE-SWE, which means I'm involved in the software engineering side as well. So a lot of my time is spent taking apart Monarch and putting it back together more durably and more reliably. Durability is especially important because Monarch is a globally distributed system (it runs in every single availability zone).
Can you give me an idea of the scope and size we’re talking about with Monarch?
Adam: I can’t be specific, but it’s very large in terms of both QPS and resources. The quantity of data per stream is extremely variable, from periodically receiving one byte to receiving a constant stream of high-cardinality data. The same applies to the query side, where some queries need only fetch a single stream, and some need to fetch and aggregate a lot of them. Some consumers are doing ad hoc queries, and other teams are doing a tremendous number of queries per second to inform their actual customer-facing products. Without Monarch, we have no monitoring or alerting, so it’s a critical system.