Practitioner Summit

I went to the Practitioner Summit. Here are the videos. Here are my notes:

Lyft's Envoy: From Monolith to Service Mesh

Matt Klein, Senior Software Engineer at Lyft

Envoy, from Lyft, looks pretty cool. It's a proxy that runs on every server and facilitates server-to-server communication. It takes care of all sorts of distributed systems and microservices problems: it implements backoff, retries, and much more.
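Envoy does this inside the proxy, so application code doesn't have to. Just to illustrate the general technique (this is not Envoy's code or configuration), here's a minimal Python sketch of retrying with exponential backoff and "full jitter". The function name retry_with_backoff, its parameters, and the choice of which exceptions count as retryable are all my own assumptions.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retryable: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call `call()` until it succeeds or attempts run out.

    Between attempts it sleeps a random time in [0, min(max_delay,
    base_delay * 2**attempt)], i.e. exponential backoff with full jitter.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Randomizing the delay keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable")
```

Passing sleep in as a parameter is only there so the sketch can be exercised without actually waiting. The jitter matters when a whole fleet of clients hits the same struggling upstream: without it, they all retry at the same moments.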

It works as a byte-oriented proxy, but it has filters to apply smarts to the bytes going over the wire.

It takes care of a lot of the hard parts of building a microservices architecture--namely server-to-server communication.

It's written in C++11 for performance and latency reasons.

He said that there are a bunch of solutions for doing service discovery. A lot of them try to be fully consistent. These include ZooKeeper, etcd, and Consul. However, he felt that it was better to build an eventually consistent service discovery system.

If you build on top of one of those fully consistent systems, you usually end up with a team devoted to managing it. By contrast, Lyft's eventually consistent system was only a few hundred lines of code and had been rock solid for six months.

They used lots of health checks, and the results of those health checks mattered more than what the discovery system said.
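I didn't see Lyft's code, so the following is only a sketch of the idea as I understood it: treat discovery data as possibly stale, and let health checks decide where traffic goes. The HostTable class and its rules (add hosts as soon as discovery reports them, but only forget a host once it is both missing from discovery and failing its health checks) are my own assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class HostTable:
    """Hypothetical host table where health checks outrank discovery."""
    known: Set[str] = field(default_factory=set)
    healthy: Dict[str, bool] = field(default_factory=dict)  # last check result

    def update_from_discovery(self, discovered: Set[str]) -> None:
        # Add new hosts right away. Keep hosts that discovery dropped as long
        # as they still pass health checks: the discovery data may be stale.
        self.known = {
            host for host in self.known | discovered
            if host in discovered or self.healthy.get(host, False)
        }

    def record_health_check(self, host: str, passed: bool) -> None:
        self.healthy[host] = passed

    def routable(self) -> Set[str]:
        # Only send traffic to hosts whose most recent health check passed.
        return {host for host in self.known if self.healthy.get(host, False)}
```

With these rules, a host that disappears from discovery but still passes its health checks keeps receiving traffic, which is what makes a slightly stale, eventually consistent discovery source tolerable.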

He also recommended Lightstep and Wavefront.

Envoy can work with gRPC, although there's a little bit of overlap in terms of what each provides.

Microservices are the Future (and Always Will Be)

Josh Holzman, Director, Infrastructure Engineering at Xoom

Xoom is a company that lets you send money to people in foreign countries. It's been around for something like 16 years (as I recall).

They're running Apache Curator on top of ZooKeeper for service discovery. Apparently, that removes some of the need to be fully consistent. He completely agreed with the earlier speaker's suggestion that eventually consistent systems were better for service discovery.

He mentioned Grafana and InfluxDB.

He said that moving to microservices gave them more visibility into their overall software stack, which enabled them to achieve better performance and lower latency. However, their latency distribution is higher.

They use two data centers as well as AWS.

He mentioned Terraform and Packer.

They use Puppet and Ansible to manage their machines.

He said that the whole infrastructure-as-code idea is a good one, but it's important to use TDD when writing such code. They use Beaker to write those tests.

They have self-service and automated deploys.

He recommended that you start eliminating cross-domain joins now. However, he admitted that they still haven't achieved this.

He said that analytics is hard when you have a bunch of separate databases.

From listening to a lot of people, it sounds like most companies still have monoliths that they're chipping away at.

You need to think about how to scale your monitoring. They had a metrics explosion that took out their monitoring system.
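He didn't say exactly what caused the explosion. One common cause (an assumption on my part) is a metric label with unbounded values, such as a user ID. Here's a minimal sketch of a hypothetical in-process counter registry that caps the number of distinct series and folds the rest into a single overflow bucket per metric name.

```python
from collections import Counter
from typing import Tuple


class BoundedCounterRegistry:
    """Hypothetical counter registry that caps distinct labelled series.

    Once `max_series` series exist, increments for any new label combination
    go to one overflow series per metric name, so the total is at most
    `max_series` plus one overflow series per metric name.
    """

    OVERFLOW: Tuple[str, ...] = ("__overflow__",)

    def __init__(self, max_series: int = 1000) -> None:
        self.max_series = max_series
        self.series: Counter = Counter()
        self.dropped = 0  # increments redirected to an overflow series

    def inc(self, name: str, **labels: str) -> None:
        key = (name, tuple(sorted(labels.items())))
        if key not in self.series and len(self.series) >= self.max_series:
            key = (name, self.OVERFLOW)
            self.dropped += 1
        self.series[key] += 1
```

The dropped counter is worth exporting too, so you notice when a label is misbehaving instead of silently losing detail.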

He said that moving to microservices was worth it.

They have "Xoom in a box" for integration testing. A bunch of these environments are running all the time, and you can deploy to any one of them.

They also use mocks for doing integration testing.
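Here's a minimal sketch of what mocking out a downstream service can look like in Python using unittest.mock. The quote_transfer function, the get_rate method on the rates client, and the exchange rate are all made up for illustration; I don't know what Xoom's services or tests actually look like.

```python
from decimal import Decimal
from unittest.mock import Mock


def quote_transfer(amount_usd: Decimal, currency: str, rates_client) -> Decimal:
    """Hypothetical: convert USD using a downstream exchange-rate service."""
    rate = rates_client.get_rate("USD", currency)  # a network call in production
    return (amount_usd * rate).quantize(Decimal("0.01"))


# In a test, a Mock stands in for the rates service, so the test neither
# needs that service deployed nor depends on live exchange rates.
fake_rates = Mock()
fake_rates.get_rate.return_value = Decimal("17.25")  # made-up rate

assert quote_transfer(Decimal("100"), "MXN", fake_rates) == Decimal("1725.00")
fake_rates.get_rate.assert_called_once_with("USD", "MXN")
```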

They have to jump through a bunch of regulatory compliance hoops. One of the requirements that they have to follow is that the people who have access to the code and the people who have access to prod must be completely separate.

They have production-like data sets for testing with.

They have dev, QA, stage, and prod environments.

Bringing Learnings from Googley Microservices into gRPC

Varun Talwar, Product Manager for gRPC at Google

HTTP/JSON doesn't cut it.

gRPC is Google's open source RPC framework. Internally, Google uses Stubby for everything, and gRPC is their open source version of it.

They handle tens of billions of RPCs per second using Stubby.

Just like gRPC is their open source version of Stubby, Kubernetes is their open source version of Borg.

He joked that there are only two types of jobs at Google:
  • Protobuf to protobuf
  • Protobuf to UI

Forward and backward compatibility is really important.

So is using binary on the wire.
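To make the compatibility point concrete, here's a hand-rolled Python sketch of the core idea behind protobuf's binary wire format: every field is prefixed with a tag built from its field number and a wire type, so an older reader can skip fields it has never heard of (forward compatibility), and a newer reader simply sees missing fields in older messages (backward compatibility). It only handles varint and length-delimited fields, and it is not the protobuf library's API.

```python
from typing import Dict, Tuple, Union

VARINT, LEN = 0, 2  # the two protobuf wire types this sketch supports


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, least significant group first."""
    if n < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, position just after the varint)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(number: int, value: Union[int, str]) -> bytes:
    """Tag = (field number << 3) | wire type, followed by the payload."""
    if isinstance(value, int):
        return encode_varint((number << 3) | VARINT) + encode_varint(value)
    data = value.encode("utf-8")
    return encode_varint((number << 3) | LEN) + encode_varint(len(data)) + data


def decode(buf: bytes, known: Dict[int, str]) -> Dict[str, Union[int, bytes]]:
    """Decode the fields listed in `known` (number -> name); skip the rest.

    Length-delimited values come back as raw bytes.
    """
    fields: Dict[str, Union[int, bytes]] = {}
    pos = 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type == VARINT:
            value, pos = decode_varint(buf, pos)
        elif wire_type == LEN:
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if number in known:  # unknown field numbers are skipped, not errors
            fields[known[number]] = value
    return fields
```

For example, a message written by a newer service with an extra field 3 still decodes cleanly on an older reader that only knows fields 1 and 2: decode(encode_field(1, 42) + encode_field(2, "alice") + encode_field(3, "a@example.com"), {1: "id", 2: "name"}) returns {"id": 42, "name": b"alice"}.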

gRPC supports both sync and async APIs. It also supports streaming and non-streaming.

Deadlines are a first class feature.

It supports deadline propagation.

You can cancel requests, and the cancellation cascades to downstream services.
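gRPC handles this across process boundaries for you. To show the two ideas in isolation, here's a rough Python asyncio sketch within a single process, not gRPC's actual API: each hop receives the caller's absolute deadline and passes the same deadline along (so the whole call chain shares one budget), and when the caller gives up, cancellation cascades down through every in-flight hop. The hop, middle, backend, and call_with_deadline names are hypothetical.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Optional

NextHop = Callable[[float, List[str]], Awaitable[str]]


async def hop(name: str, deadline: float, log: List[str],
              next_hop: Optional[NextHop] = None) -> str:
    """One hypothetical service in a call chain.

    `deadline` is an absolute time.monotonic() value chosen by the original
    caller; it is passed along unchanged so every hop shares one budget.
    """
    if time.monotonic() >= deadline:
        # Fail fast instead of starting work nobody is waiting for anymore.
        raise TimeoutError(f"{name}: deadline already exceeded")
    try:
        if next_hop is not None:
            return await next_hop(deadline, log)
        await asyncio.sleep(1.0)  # stands in for slow work at the leaf
        return f"{name}: done"
    except asyncio.CancelledError:
        log.append(f"{name} cancelled")  # clean-up would go here
        raise  # re-raise so the cancellation keeps propagating


async def backend(deadline: float, log: List[str]) -> str:
    return await hop("backend", deadline, log)


async def middle(deadline: float, log: List[str]) -> str:
    return await hop("middle", deadline, log, next_hop=backend)


async def call_with_deadline(timeout_s: float) -> List[str]:
    log: List[str] = []
    deadline = time.monotonic() + timeout_s
    try:
        # When the timeout fires, the in-flight chain is cancelled.
        await asyncio.wait_for(middle(deadline, log), timeout_s)
    except asyncio.TimeoutError:
        log.append("caller: deadline exceeded")
    return log
```

Running asyncio.run(call_with_deadline(0.05)) returns ["backend cancelled", "middle cancelled", "caller: deadline exceeded"]: the innermost call is cancelled first, then each caller above it.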

It can do flow control.

You can create configuration for how your service should be used.

You can even use gRPC for external clients (such as the website or mobile clients).

It's based on HTTP/2.

The Hardest Part of Microservices: Your Data

Christian Posta, Principal Architect at Red Hat

Here are the slides.

A lot of smaller companies use the same tools that the big guys released in order to build their microservices, but that doesn't always work out so well. You can't just cargo cult. You have to understand why the big guys did what they did.

Microservices are about optimizing for being able to make changes to the system more quickly.

Stick with relational DBs as long as you can.

I loved this image he used:

mysql_streamer from Yelp lets you stream changes as they come into MySQL. is similar. It's built on top of Kafka Connect.

Funny: the guy from Red Hat is using OS X.

Note to self: always have a video of your demo just in case it fails.

They currently support MySQL and MongoDB, and they're working on PostgreSQL (although PostgreSQL has its own mechanism).
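The demo didn't show consumer code, so here's a generic Python sketch of the other end of such a pipeline: another service replays row-change events, in order, to keep its own read-only copy of the data. The event shape (an op code plus before/after row images) and the apply_changes name are assumptions for illustration, not the exact format of any particular tool.

```python
from typing import Dict, Iterable, Optional, TypedDict


class ChangeEvent(TypedDict):
    """Hypothetical change event: an operation plus before/after row images."""
    op: str                  # "c" = create, "u" = update, "d" = delete
    before: Optional[dict]   # row before the change (None for creates)
    after: Optional[dict]    # row after the change (None for deletes)


def apply_changes(view: Dict[int, dict],
                  events: Iterable[ChangeEvent]) -> Dict[int, dict]:
    """Replay change events, in order, into a local read model keyed by "id"."""
    for event in events:
        if event["op"] in ("c", "u"):
            row = event["after"]
            view[row["id"]] = row
        elif event["op"] == "d":
            view.pop(event["before"]["id"], None)
        else:
            raise ValueError(f"unknown op {event['op']!r}")
    return view
```

Replaying events in order is the important part: the consuming service never queries the other service's database directly, which fits with the talk's point that data is the hardest part of microservices.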

Systems are Eating the World

Rafi Schloming, CTO / Chief Architect at Datawire

WTF is a microservice?

Just because you know distributed systems doesn't mean you know microservices.

According to Wikipedia, "There is no industry consensus yet regarding the properties of microservices, and an official definition is missing as well."

He said it's about technology, process, and people.

Three other things to keep in mind are experts, bootstrapping, and migrating.

He told his story of building microservices infrastructure using microservices.

He mentioned Minikube for running Kubernetes locally.

This cheat sheet covers the what, why, and how of microservices:

Engineering & Autonomy in the Age of Microservices

Nic Benders, Chief Architect at New Relic

This was perhaps the most interesting talk, although really it's a talk about management, not microservices. It starts at about 5:46:00 in the livestream. Here are the slides.

He and many others kept mentioning Conway's Law, "Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure."

New Relic wanted, "Durable, full-ownership teams, organized around business capabilities, with the authority to choose their own tasks and the ability to complete those tasks independently."

They wanted to eliminate or at least minimize the dependencies between teams.

He talked about optimizing team structure to make the company more successful. They didn't just talk about how to reorganize; they looked empirically at how teams communicated.

They wanted to invert control in the org.

They figured the best way to get the engineers into the right teams was to let the engineers pick which team to be in. Hence, they used self-selection. They didn't even conduct internal interviews for switching teams. The ICs were fully in control.

Early on, this made the managers and even the ICs unhappy. The managers wanted control of who was on their team, and the ICs were fearful that this was a game of musical chairs, and they were going to be left without a chair.

They almost backed down, but they didn't.

They really wanted the ICs to have self-determination.

They have a core value as a company: we will take care of you.

In my mind, this was like a gigantic experiment in management, which is why it made for such a fascinating talk (although, to be fair, the speaker was also quite engaging).

A third of the people ended up switching teams.

Each team crafted a "working agreement" based on answering the question "We work together best when..."

They optimized for agility, not efficiency. These two things are very different. Most companies optimize for efficiency. Hence, they have deep backlogs, and engineers are never out of things to do. He said that people should optimize for agility instead. This means that people can switch projects and teams easily, although they may suffer from going more slowly. Also, the backlogs are shorter.

At least one team practiced "mob programming," in which the entire team participates in "pair" programming. This is terrible for efficiency, but it's great for helping people get up to speed.

The experiment worked. The reorg took about a quarter before things really settled down, but over the course of a year, they got a lot more stuff done than they would have otherwise.

Autonomous teams have rights and responsibilities.

You hired smart people--trust them.

Toward the end, he gave a great list of book recommendations:
  • The Art of Agile Development
  • Liftoff: Launching Agile Teams & Projects
  • Creating Great Teams: How Self-Selection Lets People Excel
  • Turn the Ship Around!
  • The Principles of Product Development Flow

All of their teams had embedded PMs.

It's more important to have autonomy than to have technological consistency.

Engineers were allowed to deploy whatever they wanted using containers. They also had to meet some minimum observability requirements.

They even had one team that used Elixir and Phoenix.

The managers weren't allowed to change teams; they provided stability during the process.

There was turnover, but it wasn't much different from their yearly average. He said, "You're not going to win them all."

The stuff between teams (such as the protocols used between services) is owned by the architecture team. He said this was the "interstate commerce clause."

They came up with a list of every product, library, etc. in the company, and then transferred each of these things during a careful two-week handover period. However, a few balls still got dropped.

They're still working on what to do going forward, such as how often to do such a reshuffle or whether they should do something continuous.

Microservice Standardization

Susan Fowler, Engineer at Stripe (and previously Uber) and author of "Production Ready Microservices".

Microservices in Production is an ebook that summarizes "Production Ready Microservices".

She said that the inverse of Conway's Law causes the structure of a company to mirror its architecture.

I think it's interesting that she left Uber for Stripe.

She was in charge of standardizing microservices at Uber. It was a bit of a mess.

Microservices should not mean you can do whatever you want, however you want.

Every new programming language costs a lot.

Microservices are not a silver bullet.

Microservices can lead to massive technical sprawl and debt, which isn't scalable.

She was trained as a physicist, but she said there are no jobs in physics.

At Uber, there was no trust at the org, cross-team, or team level.

There was a need for standardization at scale.

They needed to hold each service to high standards.

Good logging is critical for fixing bugs. With microservices, it can be very hard to reproduce bugs.
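One common way to make logs useful across services (my example, not something she showed) is to tag every log line with a request ID that is passed from service to service. Here's a minimal sketch using Python's standard logging and contextvars modules; the "payments" logger name, the request_id field, and the make_logger and handle_request functions are hypothetical.

```python
import contextvars
import logging
import uuid
from typing import Optional, TextIO

# The ID of the request currently being handled. A ContextVar keeps it
# separate per thread and per asyncio task.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Stamp every log record with the current request ID."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


def make_logger(stream: Optional[TextIO] = None) -> logging.Logger:
    logger = logging.getLogger("payments")
    handler = logging.StreamHandler(stream)  # None means sys.stderr
    handler.setFormatter(
        logging.Formatter("%(request_id)s %(levelname)s %(message)s"))
    handler.addFilter(RequestIdFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


def handle_request(incoming_id: Optional[str], logger: logging.Logger) -> str:
    # Reuse the caller's ID if one was passed along (e.g. in a request
    # header); otherwise start a new one. Either way, forward it downstream.
    request_id = incoming_id or uuid.uuid4().hex
    token = request_id_var.set(request_id)
    try:
        logger.info("charging card")
        return request_id
    finally:
        request_id_var.reset(token)
```

With the same ID showing up in every service's logs, you can reconstruct what happened to a single request even when the bug is hard to reproduce.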

Production readiness should be a guide, not a gate.
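Here's a small Python sketch of what "a guide, not a gate" could look like in tooling: a readiness check that reports what a service is missing, with advice, instead of blocking its deploy. The checks themselves (structured_logs, alerts, runbook_url) are hypothetical examples, not the standards from her book.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    name: str
    passes: Callable[[dict], bool]
    advice: str


# Hypothetical checks; each organization would define its own standards.
CHECKS: List[Check] = [
    Check("logging", lambda svc: bool(svc.get("structured_logs")),
          "Emit structured logs that include request IDs."),
    Check("alerts", lambda svc: bool(svc.get("alerts")),
          "Define alerts on the service's key metrics."),
    Check("runbook", lambda svc: "runbook_url" in svc,
          "Link an on-call runbook."),
]


def readiness_report(service: dict) -> List[str]:
    """List advice for every unmet check.

    A guide, not a gate: this never raises or blocks a deploy; it only
    tells the owning team what to work on next.
    """
    return [f"{c.name}: {c.advice}" for c in CHECKS if not c.passes(service)]
```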

