Why Internal Developer Portals Need Automations

Automations provide the process orchestration required to create seamless workflows, reduce manual intervention and maintain organizational guardrails.

Jun 20th, 2024 10:14am by Zohar Einy

Featued image for: Why Internal Developer Portals Need Automations

Image from Monster Ztudio on Shutterstock

Software development is inherently complex, involving numerous stakeholders, tools and steps. Effective workflow management is key to getting it right, whether it’s ensuring developers can self-serve efficiently or managers can uphold standards seamlessly. Linking the steps within the workflow can provide a better developer experience, and lead to a more efficient software development life cycle (SDLC).

An internal developer portal provides a better developer experience and yields better outcomes for platform engineering teams. But to get the best out of the portal’s pillars (the software catalog, self-service actions, scorecards and dashboards), you need a way to connect the pillars, automating the entire process.

This is why internal developer portals need automations. Automations provide the process orchestration required to create seamless workflows, reduce manual intervention and maintain organizational guardrails. They help engineering teams to derive even more value from their internal developer portals.

Let’s break down what automations are and what you can do with them.

What Automations Can Do

Eliminate Repetitive Tasks

Automations can handle tasks like clean-ups, permission management and terminating unused development environments. Traditionally, these tasks require manual oversight, ad-hoc scripts or cron jobs, leading to errors and security risks. For example, neglecting to revoke access for a former employee could lead to the exposure of sensitive data. Similarly, failing to terminate an unused development environment could result in unnecessary cloud expenses.

Automations can ensure access is revoked for ex-employees or automatically terminate environments after a predefined time-to-live (TTL) expires.

Efficient Alerts and Notifications

Automations improve alert management by making sure the right information gets to the relevant person (or people) at the optimal time. Here are some examples:

Alerts and Notifications for Developers

Automations provide developers with relevant information from the catalog, helping them to complete tasks with the context they need in-hand.

You can notify developers of:

Pending pull request reviews, nudging them when the request has been in review for too long
Deployment status updates, including successful and failed deployments, enabling them to quickly react and identify issues
Changes to dependencies when services or components are modified and their dependencies are affected.

Alerts and Notifications for Managers

Automations provide managers with the information they need to better understand and manage their team’s performance and goals. They’ll be automatically provided with a link to the relevant section in the software catalog, allowing them to quickly identify any issues.

Some examples include being informed of unmet service-level objectives (SLOs), performance degradation or escalating cloud costs.

Alerts and Notifications for Security and SREs

Alerting security and site reliability engineering (SRE) teams enables them to swiftly respond to critical issues.

For example, alerting the security team about a critical vulnerability that affects a high-priority asset, or notifying SREs of unusual patterns of system behaviors, enabling rapid response and mitigation.

Policy Enforcement

There’s more to automation than doing away with mundane tasks, speeding up processes or sending out alerts. Automation can also help enforce organizational policies consistently by integrating these policies directly into your development workflows.

You can set:

Resource consumption limits: Trigger approval workflows when a developer exceeds resource limits; for example if they attempt to create a third development environment in a specific month. This can also depend on other factors; for instance, granting automatic approval if the developer has been with the company for more than two years.
Just-in-time access: Automatically grant and revoke production access for on-call engineers during their shifts.

Real-World Application: Incident Management

Consider a 3 a.m. critical alert scenario.

Without automations, the on-call engineer will have to:

Assess impact, including the severity of the issue and the impact on staging and production.
Coordinate with stakeholders: They would usually open an incident ticket and trawl through outdated documentation to try and identify owners, dependencies and stakeholders.
Investigate the issue: They will immediately be put on the back burner as information is spread across numerous tools and dashboards. After looking at infrastructure health, Kubernetes logs and recent deployments, they find that a recent deployment is causing the issue as it is causing a spike in memory usage.
Seek approvals for remediation: When a rollback is required, the change request can take a long time to approve.
Resolve and communicate: They deploy a rollback and the system returns to normal. Now the engineer has to manually update stakeholders, with an overview of the incident, its impact and the resolution steps taken.

All in all, the process takes a number of hours; in fact, the engineer has to get ready to head to work again as it’s the start of a new working day.

This is a stressful way of dealing with incidents; it’s inefficient, can lead to errors along the way and it puts far too much onus on the on-call engineer.

Now let’s revisit that 3 a.m. scenario, but with automations integrated into the process:

Swift, targeted alerts: The critical alert triggers an automation that immediately notifies the designated on-call engineer, emphasizing the high-priority nature of the issue.
Seamless incident creation: The automation promptly opens an incident in the incident management system such as PagerDuty, Opsgenie, populating it with essential details from the software catalog.
Streamlined communication: An incident-specific Slack channel is automatically set up, inviting all relevant service owners, team members and stakeholders. This eliminates the need for frantic searches to gather the right people.
Centralized investigation: The on-call engineer receives a direct link to the portal catalog within the incident channel, enabling quick access to asset dependencies, recent deployments and monitoring dashboards. They immediately pinpoint a memory spike following a recent deployment.
Efficient self-service remediation: With all necessary information at hand, the engineer initiates a rollback via a self-service action in the portal. Given their on-call status, the rollback is automatically approved, and the automation also ensures their production access is revoked once their shift ends.
Automated resolution and communication: Upon successful completion of the rollback and the system being stabilized, the automation updates the incident status in both the Slack channel and the incident management system, notifying all stakeholders and closing the loop.

By 3:30 a.m., the engineer has effectively resolved the incident, conserving precious time and minimizing the business impact. They can return to sleep, confident that the system is stable and all parties are informed.

How Automations Work

Automations consist of triggers (events in your software catalog) and actions (logic executed when triggers occur). Supported actions include webhook, GitHub workflows, GitLab pipelines, Terraform Cloud and Azure Pipelines, among others. When a trigger event occurs, the portal automatically executes the associated action, ensuring seamless and scalable process orchestration.

Conclusion

Automations redefine how you manage your SDLC, from automating tedious tasks to delivering intelligent alerts and enforcing policies. They provide a cohesive, end-to-end process orchestration that empowers your teams to work smarter and more efficiently.

Want to see how automations for your internal developer work? Book a demo with Port, here.

Zohar Einy is the CEO of Port, the agentic engineering platform that is helping customers like GitHub, Visa, and PwC move from manual to autonomous engineering. Zohar began his career in the Israel Defense Forces' 8200 unit as an engineer,...