Program Committee Member: Michelle Brush

SRE Manager @Google

Michelle Brush is a math geek turned computer geek with over 15 years of software development experience. She has developed algorithms and data structures for pathfinding, search, compression, and data mining in embedded as well as distributed systems. In her current role as an SRE Manager for Google, she leads teams of SREs that ensure GCE's APIs are reliable. Previously, she served as the Director of HealtheIntent Architecture for Cerner Corporation, responsible for the data processing platform for Cerner's Population Health solutions. Prior to her time at Cerner, she was the lead engineer for Garmin's automotive routing algorithm.

Interview

Question: 

What are you currently working on?

Answer: 

My role is Engineering Manager for SRE at Google, and in particular, I work in Cloud on GCE, which stands for Google Compute Engine. Compute Engine provides VMs and the ability for folks to put their workloads into GCP. The thing we're working on right now, besides the broader idea of what SRE is centered around (making sure that we're balancing reliability and velocity of the product), is trying to obtain a clear and crisp understanding of what our customers experience through using the product, as expressed through the service level objectives (or SLOs) for Compute Engine.

A big part of our focus in SRE is the service level objectives and how we manage them. That requires a precise understanding of what our customers feel the product experience is, and it is something that we're constantly looking to improve. Some of the other things we're working on are in the space of resilience engineering: How do we advance our ability to constrain and understand the reliability of our system? That means being able to know when there will be outages and how they impact our customers, and designing fast mitigations so our customers experience the product better.
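To make the SLO idea concrete, here is a minimal sketch, in Python, of how an availability SLO target translates into an error budget. The 99.9% target, the request counts, and the function name are illustrative assumptions, not Google's actual tooling for Compute Engine:

```python
# Illustrative sketch only: how an availability SLO implies an error budget.
# The 99.9% target and the request counts are made-up numbers, not GCE's.

SLO_TARGET = 0.999  # fraction of requests that must succeed in the window

def error_budget_consumed(total_requests: int, failed_requests: int) -> float:
    """Fraction of the window's error budget spent so far."""
    allowed_failures = total_requests * (1 - SLO_TARGET)
    return failed_requests / allowed_failures if allowed_failures else 0.0

# 10M requests with 4,000 failures against a 99.9% target:
# 10,000 failures are allowed, so 40% of the budget is consumed.
print(f"{error_budget_consumed(10_000_000, 4_000):.0%} of error budget consumed")
```

The budget framing is what connects reliability to velocity: while budget remains, teams can ship faster; when it is exhausted, the focus shifts to reliability work.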

Question: 

What were you doing before Google?

Answer: 

I'm relatively new to Google; before joining, I worked in healthcare software for several years. That piqued my interest in the idea of safety and resilience engineering because the healthcare field is so risky when it comes to bugs. It's also a highly regulated environment. You have to deal with constraints: what things you can and can't put in place. How do you convince these regulatory authorities that the counterintuitive approach to safety is the right one? I was in that field for about 10 years in different roles, working across different technologies. Anyone who's been in the industry for a long time has moved through a ton of programming languages and a ton of approaches. So I did everything from service development to front-end development, and big data processing and engineering are where I was working most recently. Prior to that, I was at a consumer electronics company that developed navigation software. My work was embedded C programming on a tiny machine that was much, much less powerful than my phone.

Question: 

How was your first experience at QCon?

Answer: 

My first experience with QCon was when a track host asked me to give a talk in a track called "Architecting for Failure". I was excited about the personal approach track hosts take to find speakers. She asked me if I'd be willing to come and join her track and talk about my experience in architecting for failure throughout my career. She said, "You could speak on what you want, I'll work with you, as long as you keep it in this space." 

The overall experience was very professional, very polished and well done. That really impressed me. I also love conferences that aren't too focused on a single space. I feel I learn more by reaching out of my comfort zone. QCon impressed me in that regard. It was really easy to move between different topics. 

Question: 

Are there any tracks or topics that you've been interested in the most?

Answer: 

The one track we're talking about right now is very near and dear to my heart, which is “operating microservices”. Sometimes in the industry, when we come up with a best practice that we know everyone should be adopting, we run around and sell the practice: “We should really be doing this. You should have microservices.” But we forget that there are all these little paper cuts we have to work around to realize the value of the thing. And I think operating microservices is a great example of that. Do you want to do it? Yes, you should do it. But what it actually feels like and looks like to do it is something that's sometimes left for people to figure out on their own. So I think having a track that really goes into, “Hey, once you're in this world, this is what it's going to feel like, and these are the things you need to do to get the value out of microservices,” is a really great addition. People know they need to do it, but they don't know exactly what living and breathing it looks like.

Observability is a good example of that. I expect topics like observability to come up in the microservices track as well as in the tools and production tracks, so people will get exposed to it there. Again, I hope to see talks that cover what it is like to build and have observability in your services.

Question: 

Are there topic areas that you want to learn more about?

Answer: 

I am really interested in learning more about predictive architectures. My exposure to machine learning was back when people did a lot of data mining more than machine learning. And the challenge we always had was not the actual algorithms. There were plenty of choices for libraries and third-party software that could give you the algorithms for machine learning. It was more, “How do you reach the state where you can actually get value out of those algorithms?” either through upfront data engineering or through the back-end consumption and deployment of the models you produce. And I don't think that's changed.

The industry has adopted more and more machine learning, but I still think the fundamental challenge is that “garbage in, garbage out” problem with the data that you're basing your models on. And then there's the question of how you validate and verify that the models you're getting are providing you good insight and not just replaying biases or assumptions that you already have in your organization. That track would be really interesting to me, to see who has answers and what they look like.

The other thing I would really hope to see is near real-time or real-time machine learning architecture. My exposure to these things is limited to batch processing. It will be interesting to see if anyone is able to present a pattern for real-time or near real-time.