Managing developers' access to production for a fintech unicorn
How a fintech unicorn is using Runops to:
- reduce the time to resolve support tickets; and
- increase feature delivery speed
by removing developers' dependency on manual work from the security and DevOps teams.
Company
Dock is one of the largest fintech startups in the world. They have raised almost 300 million USD. The company has 1,500 people, and almost 1,000 of them work in technology. Dock provides banking, issuing, acquiring, and many other components of the financial stack as a service. One could build just about any financial application on top of Dock APIs.
Scenario
Because Dock operates in a highly regulated industry, it needs a lot of controls in production. Besides caring a lot about their users' data, they also have to comply with industry standards to operate in the financial industry.
PCI is one of the certifications required to operate in their industry. The impact on technology teams is huge. Things that traditionally work for non-regulated businesses break for fintechs.
One such example is managing access to production databases and containers. Startups in many industries don't care about this problem at all: developers have full, raw access to production.
Many large tech-media companies are known for their speed, and a lot of it comes from their developers' autonomy. That same autonomy contributed to the large data issues coming out of the tech giants. The impact on their businesses was big, but they are still in business.
That would not be the case in the financial industry. Developer autonomy is key for productivity, but it is hard to achieve in a highly regulated industry. This is a critical need for Dock as a disruptive fintech that requires speed.
Problem
The lack of solutions to this problem created manual work, and the company gave up speed to keep systems safe. Developers who needed access to production had to go through a gated process implemented by the security team.
Engineers had to open Jira tickets containing their SQL queries. The security team validated each query for:
- Reliability
  - Ensure the queries weren't destructive to the production environment.
  - Bad WHERE clauses and anything else that could disrupt production had to be evaluated before touching production, to reduce the chances of an outage.
- Security
  - Ensure the data accessed is clear of any PII.
  - When PII is required, validate the need and ensure the least possible exposure.
- Compliance
  - Leave records of everything throughout the process.
  - Who, what, when, where, and why had to be answered for everything that touched production.
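As a rough illustration of the reliability and security checks above, the following Python sketch flags a few patterns a reviewer would want to look at. It is not how Dock's security team or Runops evaluates queries; the function name `review_flags`, the list of destructive statements, and the PII column names are all assumptions made for this example, and a real check would need a proper SQL parser and a data catalog.

```python
import re

# Hypothetical rule set: statement types treated as destructive in this sketch.
DESTRUCTIVE = ("DROP", "TRUNCATE", "ALTER")
# Hypothetical PII column names; a real list would come from a data catalog.
PII_COLUMNS = {"email", "cpf", "phone", "card_number"}


def review_flags(sql: str) -> list:
    """Return human-readable flags a reviewer should check before approving."""
    flags = []
    statement = sql.strip().rstrip(";")
    upper = statement.upper()
    first_word = upper.split(None, 1)[0] if upper else ""

    # Reliability: statements that change or remove whole objects.
    if first_word in DESTRUCTIVE:
        flags.append(f"reliability: {first_word} statement")
    # Reliability: UPDATE or DELETE with no WHERE clause touches every row.
    if first_word in ("UPDATE", "DELETE") and not re.search(r"\bWHERE\b", upper):
        flags.append(f"reliability: {first_word} without WHERE")
    # Security: SELECT * can return PII columns without naming them.
    if first_word == "SELECT" and re.search(r"SELECT\s+\*", upper):
        flags.append("security: SELECT * may expose PII")
    # Security: explicit references to known PII columns.
    words = set(re.findall(r"[a-z_]+", statement.lower()))
    for column in sorted(PII_COLUMNS & words):
        flags.append(f"security: references PII column '{column}'")
    return flags


print(review_flags("DELETE FROM accounts"))
print(review_flags("SELECT id, email FROM users WHERE id = 1;"))
```

A check like this only narrows down what a human has to read. It does not replace the review itself, and simple pattern matching will miss cases such as PII hidden behind views or aliases.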
After all the checks, the sec team had to manually:
- Open the VPN
- Access the database
- Open Jira to copy the query
- Run the query on the database
- Validate execution and results
- Report back to the developer
They had more than 50 requests a week.
Multiple people on the security team had to work full time on these tickets. High-impact security engineers were spending all their time manually copying and pasting SQL scripts. Developers were unhappy too.
Due to the large number of requests, tickets took anywhere from hours to days to execute. This delay generated a lot of context switching and hurt developers' productivity.
In the end, the company was looking at the following impacts:
Lost investments
- Idle developers blocked waiting on the security team
- Security engineers working on low impact manual tasks full time
Slower growth
- Slower feature releases slowed down growth.
- Features needed to grow the business by providing more value to customers were delayed.
Churn
- Slow execution of scripts increased the resolution time of support requests.
- The worse user experience also created churn.
Team
The Runops project was initiated by the SRE team. They take care of the developer platform and collaborate with security.
The security team saw the potential of Runops. Removing manual work and increasing developer autonomy was a big desire. Getting there while maintaining the compliance, reliability, and security requirements would be huge.
Thiago Mouro, one of the security engineers executing scripts manually, took on the challenge to automate the process using Runops.
Stack
Dock is a microservices powerhouse, with hundreds of databases and services running in the cloud. They use AWS-managed databases and services.
Goal
The goal of the project was to enable developers to execute their SQL scripts directly against the live database. Execution had to happen only after the scripts were reviewed by the teams impacted by the changes. This would increase developer autonomy.
Developers would be in charge of ensuring executions went well, while the sec team performed the security analysis, a drastically reduced scope. The load on the security team would drop, because sec would no longer need to run each script and make sure it went well.
Solution
They decided to use Runops gated access control. It enabled developers to create their scripts from Slack, the web, or the terminal. Jira was out of the way.
The Runops Slack app was used for real-time review workflows. The team got review requests in Slack. Each request included who is doing what, where, when, and why: everything they needed. They made decisions in seconds.
Then came execution time. The Runops execution runtime was used to send only approved commands to the live database. Centralized audit trails were automatically generated for compliance.
Everything was integrated with SSO through Okta.
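The general shape of a review-gated workflow like the one described above can be sketched in a few lines of Python. This is not the Runops API or its implementation: the `Request` and `Gate` names are hypothetical, the rule that requesters cannot approve their own scripts is an assumption of this sketch, and an in-memory SQLite database stands in for a production database.

```python
import sqlite3
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Request:
    """Hypothetical review request carrying the who/what/where/when/why."""
    who: str    # requesting developer
    what: str   # SQL script to run
    where: str  # target database name
    why: str    # business justification
    when: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    approved_by: Optional[str] = None


class Gate:
    """Runs only approved requests and records every attempt in an audit log."""

    def __init__(self, connection: sqlite3.Connection):
        self.connection = connection
        self.audit_log = []

    def approve(self, request: Request, reviewer: str) -> None:
        # Assumption of this sketch: requesters cannot approve their own scripts.
        if reviewer == request.who:
            raise PermissionError("requester cannot approve their own script")
        request.approved_by = reviewer

    def execute(self, request: Request) -> list:
        if request.approved_by is None:
            self._audit(request, "rejected: not approved")
            raise PermissionError("script has not been reviewed")
        with self.connection:  # commits on success, rolls back on error
            rows = self.connection.execute(request.what).fetchall()
        self._audit(request, "executed")
        return rows

    def _audit(self, request: Request, outcome: str) -> None:
        self.audit_log.append({**asdict(request), "outcome": outcome})


# Usage: SQLite in memory stands in for the live database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tickets (id INTEGER, status TEXT)")
db.execute("INSERT INTO tickets VALUES (1, 'open')")

gate = Gate(db)
request = Request(
    who="dev@example.com",
    what="UPDATE tickets SET status = 'closed' WHERE id = 1",
    where="support-db",
    why="fix for a customer support ticket",
)
gate.approve(request, reviewer="sec@example.com")
gate.execute(request)
print(gate.audit_log[-1]["outcome"])
```

The point of the design is that the approval check and the audit record live in the same place as execution, so no command can reach the database without leaving a record of who ran what, where, when, and why.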
Systems replaced
The new process replaced:
- VPN for production access
- Static passwords for database users
- Jira ticketing system
- Manual copying and pasting of commands
- Slack messages for coordinating priority of tickets
Business impacts
Reduced time to resolve support tickets
- A result of production database access requests going from days to minutes.
- In addition to increased customer satisfaction, developers got extra time to work on features.
More time for revenue-related features
- High-leverage projects out of backlog
- The two people from the sec team working full time on access requests started working on strategic projects.
- Automation projects that were long overdue got out of the backlog.
Reduced number of systems
- Replaced VPNs for database access, Jira ticketing, Slack messages for discussing executions
Faster compliance audits
- Runops SOC2 and PCI compliance reports replaced in-house Jira customizations.
- No time spent on Jira settings during audits.
- Reports built-in.
Extra time to work on strategic projects
- Competitive advantage for Dock by not spending valuable time on manual operations.
- Impressive pace of innovation despite their huge size and the complexity of their technology.
Indirect results
Improved M&A results
- Companies joining the team integrated Runops workflows in days, and processes were standardized across large organizations.
Promotions
- Runops was a high-visibility project because it improved the lives of every engineer in the company. The champions of the project got a lot of appreciation from all areas of the company for the results, which ended up leading to promotions.
Thank you
Thank you Jean Dias, Thiago Mouro, Renato Matos, and everyone who took part in the project. Your feedback to Runops was critical to getting us where we are today. We are proud of the partnership we created together. Let's make it better together for the years to come!