Lyft's Envoy: From Monolith to Service Mesh
Matt Klein, Senior Software Engineer at Lyft
Envoy from Lyft looks pretty cool. It's a proxy that runs on every server and facilitates server-to-server communication, taking care of all sorts of distributed systems / microservices problems: backoff, retries, and more.
It works as a byte-oriented proxy, but it has filters to apply smarts to the bytes going over the wire.
It takes care of a lot of the hard parts of building a microservices architecture, namely server-to-server communication.
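Envoy configures retries declaratively in its own config format, but the underlying idea is retry-with-exponential-backoff. Here's a minimal, hypothetical Python sketch of that pattern (the function names are mine, not Envoy's):

```python
import random
import time

def call_with_retries(send, max_attempts=3, base_delay=0.05):
    """Retry a failing request with exponential backoff and jitter.

    `send` is any zero-argument callable that raises on failure;
    these names are illustrative, not Envoy's API.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # budget exhausted; surface the error
            # Exponential backoff with full jitter to avoid thundering herds.
            delay = base_delay * (2 ** attempt)
            time.sleep(random.uniform(0, delay))

# Example: a flaky callable that fails twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(call_with_retries(flaky))  # → ok
```

A sidecar proxy does this on behalf of the application, so every service gets the same retry behavior without reimplementing it per language.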
It's written in C++11 for performance and latency reasons.
He said there are a bunch of solutions for service discovery, and a lot of them try to be fully consistent; these include ZooKeeper, etcd, and Consul. However, he felt it was better to build an eventually consistent service discovery system.
When you build a system using those fully consistent systems, you usually end up with a team devoted to managing them. However, Lyft's eventually consistent system was only a few hundred lines of code and had been rock solid for six months.
They used lots of health checks, and the results of the health checks were more important than what the discovery system said.
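The combination he described — an eventually consistent registry where entries age out, overridden by active health checks — can be sketched in a few lines. This is a hypothetical illustration of the idea, not Lyft's actual code:

```python
import time

class Registry:
    """Eventually consistent service registry: hosts re-register
    periodically, and stale entries simply age out via a TTL."""

    def __init__(self, ttl=15.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._hosts = {}  # host -> last registration time

    def register(self, host):
        self._hosts[host] = self.clock()

    def hosts(self):
        now = self.clock()
        return [h for h, t in self._hosts.items() if now - t < self.ttl]

def routable_hosts(registry, is_healthy):
    # Active health-check results win over what discovery reports.
    return [h for h in registry.hosts() if is_healthy(h)]

# Simulated clock so the example is deterministic.
fake_now = [0.0]
reg = Registry(ttl=15.0, clock=lambda: fake_now[0])
reg.register("10.0.0.1")
reg.register("10.0.0.2")
fake_now[0] = 10.0
reg.register("10.0.0.2")   # refreshed; 10.0.0.1 will expire first
fake_now[0] = 20.0
print(reg.hosts())                                     # → ['10.0.0.2']
print(routable_hosts(reg, lambda h: h != "10.0.0.2"))  # → []
```

Because losing a registration just means the host drops out after one TTL, there's no quorum to operate and nothing that needs a dedicated team to keep alive.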
He also recommended Lightstep and Wavefront.
Envoy can work with gRPC, although there's a little bit of overlap in terms of what each provides.
Microservices are the Future (and Always Will Be)
Josh Holzman, Director, Infrastructure Engineering at Xoom
Xoom is a company that lets you send money to people in foreign countries. It's been around for something like 16 years (as I recall).
They're running Apache Curator on top of ZooKeeper for service discovery. Apparently, that removes some of the need to be fully consistent. He completely agreed with the earlier speaker's suggestion that eventually consistent systems were better for service discovery.
He mentioned Grafana and InfluxDB.
He said that moving to microservices gave them more visibility into their overall software stack, which enabled them to achieve better performance and lower latency. However, their latency distribution is wider.
They use two data centers as well as AWS.
He mentioned Terraform and Packer.
They use Puppet and Ansible to manage their machines.
He said that the whole infrastructure as code idea is a good one, but it's important to use TDD when writing such code. He said that they use Beaker for writing such tests.
They have self service and automated deploys.
He recommended that you start eliminating cross-domain joins now. However, he admitted that they still haven't achieved this.
He said that analytics is hard when you have a bunch of separate databases.
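Eliminating a cross-domain join typically means each service owns its own data and the join moves into the application: fetch from both services, then stitch the results together in memory. A hypothetical sketch (the service clients here are stand-ins, not Xoom's APIs):

```python
def orders_with_users(order_service, user_service):
    """Join orders to their users in the application instead of in SQL.

    `order_service` and `user_service` are illustrative stand-ins for
    clients of two independently owned services/databases.
    """
    orders = order_service.list_orders()
    user_ids = {o["user_id"] for o in orders}
    # One batched lookup instead of a per-order query (avoids N+1 calls).
    users = {u["id"]: u for u in user_service.get_users(user_ids)}
    return [{**o, "user": users.get(o["user_id"])} for o in orders]

# Fakes to demonstrate the shape of the join.
class FakeOrders:
    def list_orders(self):
        return [{"id": 1, "user_id": "a"}, {"id": 2, "user_id": "b"}]

class FakeUsers:
    def get_users(self, ids):
        return [{"id": i, "name": i.upper()} for i in sorted(ids)]

rows = orders_with_users(FakeOrders(), FakeUsers())
for row in rows:
    print(row["id"], row["user"]["name"])
```

This is also why analytics gets hard: once the join leaves the database, reporting queries need either a similar application-level join or a separate pipeline that copies the data back into one place.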
Listening to a lot of the speakers, it's clear that most companies still have monoliths that they're chipping away at.
You need to think about how to scale your monitoring. They had a metrics explosion that took out their monitoring system.
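A metrics explosion is usually a cardinality problem: some label (endpoint, user ID, container ID) takes on unboundedly many values, and each value creates a new time series. One defensive pattern is to cap distinct label values and funnel the overflow into a catch-all bucket. A minimal sketch, purely illustrative:

```python
class BoundedCounter:
    """Counter that caps the number of distinct label values, routing
    the overflow into an 'other' bucket so series count stays bounded."""

    def __init__(self, max_series=100):
        self.max_series = max_series
        self.counts = {}

    def inc(self, label):
        # New labels beyond the cap all collapse into one bucket.
        if label not in self.counts and len(self.counts) >= self.max_series:
            label = "other"
        self.counts[label] = self.counts.get(label, 0) + 1

c = BoundedCounter(max_series=2)
for endpoint in ["/home", "/about", "/user/1", "/user/2"]:
    c.inc(endpoint)
print(c.counts)  # → {'/home': 1, '/about': 1, 'other': 2}
```

Per-user URL paths like `/user/1` are the classic culprit; normalizing them to a template (`/user/:id`) before emitting metrics is another common fix.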
He said that moving to microservices was worth it.
They have "Xoom in a box" for integration testing. A bunch of these environments are running all the time, and you can deploy to one of them.
They also use mocks for doing integration testing.
They have to jump through a bunch of regulatory compliance hoops. One of the requirements that they have to follow is that the people who have access to the code and the people who have access to prod must be completely separate.
They have production-like data sets for testing with.
They have dev, QA, stage, and prod environments.
Bringing Learnings from Googley Microservices into gRPC
Varun Talwar, Product Manager for gRPC at Google
HTTP/JSON doesn't cut it.
gRPC is Google's open source RPC framework. Google is all Stubby internally. gRPC is their open source version of it.
They have 10s of billions of RPCs per second using Stubby.
Just like gRPC is their open source version of Stubby, Kubernetes is their open source version of Borg.
He joked that there are only two types of jobs at Google:
- Protobuf to protobuf
- Protobuf to UI
Using protobufs is a big win, and so is using binary on the wire.
gRPC supports both sync and async APIs. It also supports streaming and non-streaming.
Deadlines are a first class feature.
It supports deadline propagation.
You can cancel requests, and that cancellation is cascaded to downstream services.
It can do flow control.
You can create a service config that describes how your service should be used.
You can even use gRPC for external clients (such as the website or mobile clients).
It's based on HTTP/2.
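Deadline propagation is the key trick in that feature list: each hop computes how much of the caller's time budget is left before fanning out, so a deadline set at the edge flows through the whole chain. Here's a plain-Python sketch of the pattern (hypothetical names — this is not gRPC's actual API):

```python
import time

class DeadlineExceeded(Exception):
    pass

def handle_request(deadline, downstream_calls):
    """Check the remaining time budget before each downstream call, so a
    deadline set by the original caller propagates down the chain."""
    results = []
    for call in downstream_calls:
        budget = deadline - time.monotonic()
        if budget <= 0:
            # Cancel instead of doing work the caller will never see.
            raise DeadlineExceeded("no time budget left for downstream call")
        results.append(call(timeout=budget))
    return results

def fast(timeout):
    return "fast"

def slow(timeout):
    time.sleep(min(timeout, 0.3))  # deliberately exhausts the budget
    return "slow"

deadline = time.monotonic() + 0.2
try:
    handle_request(deadline, [fast, slow, fast])
except DeadlineExceeded as e:
    print("cancelled:", e)
```

In gRPC the remaining deadline travels with the request metadata, so a 200 ms budget set by a mobile client stops servers five hops deep from doing pointless work.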
The Hardest Part of Microservices: Your Data
Christian Posta, Principal Architect at Red Hat
Here are the slides.
A lot of smaller companies use the same tools that the big guys released in order to build their microservices, but that doesn't always work out so well. You can't just cargo cult. You have to understand why the big guys did what they did.
Microservices are about optimizing for being able to make changes to the system more quickly.
Stick with relational DBs as long as you can.
I loved one of the images he used.
mysql_streamer from Yelp lets you stream changes as they come into MySQL.
Debezium.io is similar. It's built on top of Kafka Connect.
Funny: the guy from Red Hat is using OS X.
Note to self: always have a video of your demo just in case it fails.
Debezium currently supports MySQL and MongoDB, and they're working on PostgreSQL (although it has its own mechanism).
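Change-data-capture tools like these turn row changes into an ordered event stream that other services can consume, e.g. to keep a cache or search index in sync. A minimal consumer sketch — the event shape here is invented for illustration; Debezium's real envelope (op codes, before/after images) is richer:

```python
def apply_change_events(events, cache):
    """Apply a stream of CDC-style events to a local materialized view.

    Events are applied in order, so the cache converges on the source
    database's state. The event schema here is hypothetical.
    """
    for event in events:
        key = event["key"]
        if event["op"] == "delete":
            cache.pop(key, None)
        else:  # insert or update both carry the full row
            cache[key] = event["row"]
    return cache

events = [
    {"op": "insert", "key": 1, "row": {"name": "Ada"}},
    {"op": "update", "key": 1, "row": {"name": "Ada L."}},
    {"op": "insert", "key": 2, "row": {"name": "Alan"}},
    {"op": "delete", "key": 2, "row": None},
]
print(apply_change_events(events, {}))  # → {1: {'name': 'Ada L.'}}
```

Because the changes come off the database's own log, this avoids dual writes: services react to what actually got committed rather than to what the application hoped it wrote.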
Systems are Eating the World
Rafi Schloming, CTO / Chief Architect at Datawire
WTF is a microservice?
Just because you know distributed systems doesn't mean you know microservices.
According to Wikipedia, "There is no industry consensus yet regarding the properties of microservices, and an official definition is missing as well."
He said it's about technology, process, and people.
3 other things to keep in mind are experts, bootstrapping, and migrating.
He told his story of building microservices infrastructure using microservices.
Minikube for running Kubernetes locally.
He shared a cheat sheet covering the what, why, and how of microservices.
Engineering & Autonomy in the Age of Microservices
Nic Benders, Chief Architect at New Relic
This was perhaps the most interesting talk, although really it's a talk about management, not microservices. It starts at about 5:46:00 in the livestream. Here are the slides.
He and many others kept mentioning Conway's Law, "Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure."
New Relic wanted, "Durable, full-ownership teams, organized around business capabilities, with the authority to choose their own tasks and the ability to complete those tasks independently."
They wanted to eliminate or at least minimize the dependencies between teams.
He talked about optimizing team structure to make the company more successful. They didn't just talk about how to reorganize, they looked empirically at how teams communicated.
They wanted to invert control in the org.
They figured the best way to get the engineers into the right teams was to let the engineers pick which team to be on. Hence, they did self-selection. They didn't even conduct internal interviews for switching teams. The ICs were fully in control.
Early on, this made the managers and even the ICs unhappy. The managers wanted control over who was on their team, and the ICs feared that this was a game of musical chairs and that they'd be left without a chair.
They almost backed down but they didn't.
They really wanted the ICs to have self determination.
They have a core value as a company: we will take care of you.
In my mind, this was like a gigantic experiment in management which is why it made for such a fascinating talk; although, to be fair, the speaker was also quite engaging.
1/3 of people ended up switching teams.
Each team crafted a "working agreement" based on answering the question "We work together best when..."
They optimized for agility, not efficiency. These two things are very different. Most companies optimize for efficiency. Hence, they have deep backlogs, and engineers are never out of things to do. He said that people should optimize for agility instead. This means that people can switch projects and teams easily, although they may suffer from going more slowly. Also, the backlogs are shorter.
At least one team practiced "mob programming" which is where the entire team participates in "pair" programming. This is terrible for efficiency, but it's great for helping people get up to speed.
The experiment worked. The reorg took about a quarter before things really settled down, but over the course of a year, they got a lot more stuff done than they would have otherwise.
Autonomous teams have rights and responsibilities.
You hired smart people--trust them.
Toward the end, he gave a great list of book recommendations:
- The Art of Agile Development
- Liftoff: Launching Agile Teams & Projects
- Creating Great Teams: How Self-Selection Lets People Excel
- Turn the Ship Around!
- The Principles of Product Development Flow
It's more important to have autonomy than to have technological consistency.
Engineers were allowed to deploy whatever they wanted using containers. They also had to meet some minimum observability requirements.
They even had one team that used Elixir and Phoenix.
The managers weren't allowed to change teams; they provided stability during the process.
There was turnover, but it wasn't that much different than their yearly average. He said, "You're not going to win them all."
The stuff between teams (such as the protocols used between services) is owned by the architecture team. He said this was the "interstate commerce clause."
They came up with a list of every product, library, etc. in the company, and then transferred each of these things during a careful two-week handover period. However, there were still a few balls that got dropped.
They're still working on what to do going forward, such as how often to do such a reshuffle or whether they should do something continuous.
Microservice Standardization
Susan Fowler, Engineer at Stripe (and previously Uber) and author of "Production-Ready Microservices"
Microservices in Production is an ebook that summarizes "Production-Ready Microservices".
She said that the inverse of Conway's Law causes the structure of a company to mirror its architecture.
I think it's interesting that she left Uber for Stripe.
She was in charge of standardizing microservices at Uber. It was a bit of a mess.
Microservices should not mean you can do whatever you want, however you want.
Every new programming language costs a lot.
Microservices are not a silver bullet.
Microservices can lead to massive technical sprawl and debt, which isn't scalable.
She was trained as a physicist, but she said there are no jobs in physics.
At Uber, there was no trust at the org, cross-team, or team level.
There was a need for standardization at scale.
They needed to hold each service to high standards.
Good logging is critical for fixing bugs. With microservices, it can be very hard to reproduce bugs.
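One common way to make cross-service bugs tractable is to stamp every log line with a request (correlation) ID that travels with the request, so one failing request can be traced across all the services it touched. A hedged sketch using Python's standard `logging` module — the service and field names are illustrative:

```python
import io
import logging
import uuid

def make_logger(stream):
    """Logger whose format prefixes every line with a request ID field."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(request_id)s %(message)s"))
    logger = logging.getLogger("svc")
    logger.handlers = [handler]
    logger.propagate = False
    logger.setLevel(logging.INFO)
    return logger

def handle(logger, request_id=None):
    # Reuse the inbound ID if one arrived with the request, so every
    # service in the call chain logs under the same key.
    request_id = request_id or uuid.uuid4().hex
    extra = {"request_id": request_id}
    logger.info("received request", extra=extra)
    logger.info("calling payment service", extra=extra)  # hypothetical downstream
    return request_id

buf = io.StringIO()
log = make_logger(buf)
handle(log, request_id="req-42")
print(buf.getvalue())
```

Searching logs for `req-42` then reconstructs the request's whole journey, which is often the only practical substitute for reproducing the bug locally.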
Production readiness should be a guide, not a gate.