AI in the Software Lifecycle: Building Enterprise-Scale Agentic Systems at LinkedIn 🚀

The world of software development is undergoing a seismic shift, and at its epicenter is Artificial Intelligence. But how do we move beyond experimental AI projects and truly integrate AI across the entire software development lifecycle at an enterprise scale? That’s precisely the question explored by leading engineers at LinkedIn, and their insights are invaluable for anyone looking to harness the power of AI responsibly and effectively.

This post synthesizes a deep dive into LinkedIn’s approach, showcasing how platform teams are becoming the architects of AI enablement, bridging the gap between cutting-edge AI capabilities and the everyday needs of thousands of developers. We’ll explore their innovative strategies, the tools they’re leveraging, and the crucial lessons learned in building secure, compliant, and developer-friendly multi-agent systems.

The Converging Challenges: Silos and Shifting Workflows 🌐

LinkedIn, like many organizations, faced two key challenges that sparked their AI integration journey:

  • Siloed AI Experiments: Many teams were experimenting with AI in isolation, resulting in numerous proof-of-concept projects and one-off scripts. While valuable, this led to inconsistency as teams repeatedly reinvented the wheel for common tasks like prompt orchestration, data access, safety, evaluation, and deployment.
  • Evolving Developer Workflows: The traditional “hermit developer” model, where engineers solely focused on coding in their IDEs, is no longer sufficient. Modern developers spend significant time on cross-system coordination, triaging issues, synthesizing insights from multiple services, and triggering complex workflows. These are not single-shot queries but multi-step, stateful tasks that AI agents are exceptionally well-suited to handle.

The realization was clear: to unlock AI’s true productivity potential, a unified, opinionated platform was essential. This would allow teams to focus on their domain-specific problems, while the platform team addressed the system and infrastructure challenges.

Building a Unified Platform: Empowering Developers at Scale 👨‍💻

The goal is to provide a “paved path” that guides developers while preserving their autonomy. This is achieved by meeting engineers where they are and seamlessly integrating AI into their existing workflows.

  • Meeting Developers Where They Are: AI integration isn’t confined to IDEs. It’s woven into workplace productivity tools and everyday development environments. This flexibility is key to driving adoption.
  • Developer Autonomy within Guardrails: Developers maintain significant control through a combination of custom prompts, structured specifications, governed tool access (via MCP), and built-in infrastructure hooks. Rather than reinventing boilerplate infrastructure, they are free to focus on higher-level tasks.
  • Enterprise Scale Impact: Thousands of developers at LinkedIn are actively using these AI-powered tools daily, significantly boosting productivity without compromising quality. Crucially, developers remain responsible for the output’s quality, using AI as an accelerator.

AI Agents: A New Execution Model, Not Just a Feature ✨

A core message from LinkedIn’s experience is that AI agents represent a new execution model, akin to microservices or compute infrastructure. This perspective necessitates a dedicated, well-funded platform team focused on building scalable, reliable, and trustworthy shared infrastructure.

  • Beyond Fancy Demos: Without a robust platform, AI initiatives often remain confined to impressive but unscalable demos.
  • The Role of the Platform Team: These teams are responsible for the process changes, technological adoption, and building common building blocks that prevent repeated efforts across the organization. They thoughtfully incorporate emerging technologies into existing enterprise workflows.
  • AI as an Operating Model: Moving beyond proof-of-concept, AI needs to be viewed as an integral part of the organization’s operating model, permeating all of its systems.

Developer Control and Structured Specification Governance 🎯

A critical aspect of enabling AI at scale is ensuring developers remain in control. This is achieved primarily through how developers communicate their intent to agents, and how that intent is transformed into reliably executable actions.

  • The Power of the “Spec”: A specification acts as the contract between the developer and the system (a minimal sketch follows this list). It explicitly defines:
    • The desired changes.
    • How the work should be broken down.
    • Allowed tools.
    • Acceptance criteria and checks for successful completion.
  • Granularity of Tasks: The current focus is at the task level, aligning with the granularity a developer naturally considers when tackling a problem. This treats the agent as a teammate, providing it with the necessary context for a specific task, often resulting in a pull request (PR).
  • Flexible Scope: While currently focused on tasks, the architecture is designed to evolve to handle epic-level orchestrations involving multiple agents and tasks.
  • Tooling and Context: Agents are provided with a baseline set of “basic” tools and knowledge, mirroring what any LinkedIn engineer would possess. Developers can then grant access to additional, specific tools as needed, preventing agent overload and maintaining quality.
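
To make that contract concrete, here is a minimal sketch of what a task-level spec could look like. The TaskSpec shape and its field names are illustrative assumptions for this post, not LinkedIn’s actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class TaskSpec:
    """Illustrative task-level spec: the contract between developer and agent."""
    goal: str                                                   # the desired change
    breakdown: list[str] = field(default_factory=list)          # how the work is split up
    allowed_tools: list[str] = field(default_factory=list)      # tools the agent may call
    acceptance_checks: list[str] = field(default_factory=list)  # what "done" means

# Example: one task, scoped the way a developer would naturally hand it to a teammate.
spec = TaskSpec(
    goal="Migrate the payments module from HTTP client v1 to v2",
    breakdown=["Update import paths", "Replace deprecated calls", "Fix affected tests"],
    allowed_tools=["code_search", "static_analysis", "build_runner"],
    acceptance_checks=["build passes", "unit tests green", "no new lint warnings"],
)
```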

The Execution Model: Safe Sandboxes and Orchestration 🛠️

LinkedIn’s approach emphasizes a secure and controlled execution environment for AI agents.

  • Remote Sandbox Execution: Once a developer defines a task, the agent’s execution is orchestrated in a remote sandbox environment. This environment has restrictions to prevent unauthorized access to critical systems.
  • Instantiating Agents: The agent is instantiated within this sandbox, equipped with the provided context (spec/prompt) and access to tools, which can be native platform tools or accessed via MCP.
  • Platform-Managed Workflows: Plumbing such as authenticating with code repositories, pulling code, pushing to branches, and, once the agent completes its run, creating pull requests is handled by the platform rather than the agent.
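
A rough sketch of that flow, reusing the illustrative TaskSpec from the previous section; every class and method here is a hypothetical stand-in for the platform pieces described above, not LinkedIn’s actual API:

```python
class Sandbox:
    """Stand-in for a restricted remote environment without access to critical systems."""

    def clone(self, repo_url: str) -> None:
        print(f"[sandbox] cloning {repo_url} (repository auth handled by the platform)")

    def run_agent(self, spec: "TaskSpec") -> None:
        # The agent is instantiated inside the sandbox with the spec as its context
        # and only the tools the spec allows (native platform tools or MCP-exposed).
        print(f"[sandbox] agent working on: {spec.goal}")

    def push_branch(self) -> str:
        print("[sandbox] pushing agent changes to a branch")
        return "agent/task-branch"

    def teardown(self) -> None:
        print("[sandbox] environment destroyed; nothing persists beyond the task")

def execute_task(spec: "TaskSpec", repo_url: str) -> str:
    """Run one agent task end to end; the result surfaces as a pull request."""
    sandbox = Sandbox()
    try:
        sandbox.clone(repo_url)
        sandbox.run_agent(spec)
        branch = sandbox.push_branch()
        return f"PR opened from {branch}"  # PR creation is platform-managed too
    finally:
        sandbox.teardown()
```

Keeping repository credentials and PR creation on the platform side, outside the agent’s reach, is what makes the sandbox restrictions meaningful.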

The Human-in-the-Loop: The Critical Review Process ✍️

The generation of a PR is not the end of the line; it’s the beginning of a crucial human-in-the-loop review process.

  • Developer Review: Developers review PRs as they would any teammate’s submission, providing feedback, requesting changes, or approving the work.
  • Iterative Refinement: Agents can pick up this feedback, address the requested changes, and update the same PR, creating a continuous loop of improvement. This collaborative cycle ensures human oversight and judgment remain central.
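
As a toy model of that cycle (purely illustrative, not LinkedIn’s review tooling), the loop might look like this:

```python
from dataclasses import dataclass

@dataclass
class Feedback:
    """One round of human review on an agent-authored PR."""
    approved: bool
    comments: list[str]

def review_loop(pr_id: str, rounds: list[Feedback]) -> bool:
    """Drive an agent through successive human review rounds on the same PR."""
    for feedback in rounds:
        if feedback.approved:
            print(f"{pr_id}: approved by a human reviewer")
            return True
        # The agent picks up the comments, addresses them, and updates the same PR.
        print(f"{pr_id}: agent addressing {len(feedback.comments)} comment(s)")
    return False  # still unresolved; merging remains a human decision

# Usage: one round of requested changes, then approval.
review_loop("PR-1", [Feedback(False, ["handle the null case"]), Feedback(True, [])])
```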

Encapsulating Domain Knowledge: Evals and Historical Data 📊

To ensure agents are moving in the right direction, the platform leverages domain knowledge and historical data.

  • Leveraging Historical PRs: Hundreds of thousands of past PRs, complete with human feedback and resolutions, provide a rich dataset for evaluating and steering agents. From it, agents can learn what constitutes a mergeable change and predict the review comments a change is likely to draw.
  • Quantitative, Stochastic, Verifiable Measures: Beyond qualitative feedback, the platform incorporates verifiable signals such as build status, grounding agent evaluation in objective measures.
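
As a toy example of what combining a hard, verifiable signal (build status) with a softer, model-derived estimate could look like; the weighting scheme below is entirely an assumption:

```python
def score_agent_pr(build_passed: bool, predicted_mergeable: float) -> float:
    """Score in [0, 1]: a failing build zeroes the score; otherwise blend in a
    stochastic, model-based estimate of how mergeable the change looks."""
    if not build_passed:
        return 0.0
    return 0.5 + 0.5 * predicted_mergeable

# Evaluate a batch of agent-generated PRs and report an aggregate metric.
results = [(True, 0.9), (True, 0.4), (False, 0.8)]  # (build status, mergeability)
scores = [score_agent_pr(build, merge) for build, merge in results]
print(f"mean eval score: {sum(scores) / len(scores):.2f}")  # 0.55
```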

Patterns in Agent Orchestration: Foreground vs. Background 🎭

LinkedIn identifies two distinct, yet equally important, types of agents and their corresponding patterns:

Foreground Agents (IDE-Integrated) 💡

  • In-the-IDE Experience: These agents, like GitHub Copilot, augment developers directly within their Integrated Development Environment.
  • Augmentation with MCP Tools: While using established products, these agents are enhanced with MCP tools and specific instructions embedded in system prompts to guide their behavior within the IDE.
  • Active Developer Involvement: Developers have clear visibility into the agent’s actions and can actively participate in the process.
  • Use Cases: Ideal for scenarios where developers want to maintain a high level of active control, experimentation, and real-time feedback.

Background Agents (Orchestration Systems) 🤖

  • Automated Workflows: These agents operate in the background, executing tasks based on high-level descriptions. The developer may not see all the “sausage making” but will see the final output, often a PR.
  • Ground-Up Development: LinkedIn builds these agents from scratch to manage complex, multi-step processes.
  • Use Cases: Excellent for tasks like large-scale refactors, migrations, improving code coverage, cleaning up technical debt, and even responding to production outages or availability dips through observability agents.
  • Reduction of Toil: Particularly effective for essential but often deprioritized tasks that can accumulate technical debt.
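
As an illustration of the observability case, a metrics-driven trigger could queue a background task like this; the SLO threshold and enqueue_background_task are hypothetical, and TaskSpec is the illustrative spec sketched earlier:

```python
AVAILABILITY_SLO = 0.995  # assumed threshold; real SLOs are service-specific

def enqueue_background_task(spec: "TaskSpec") -> None:
    print(f"[queue] background agent task queued: {spec.goal}")

def on_metrics_update(service: str, availability: float) -> None:
    """Queue a background agent task when availability dips below the SLO."""
    if availability >= AVAILABILITY_SLO:
        return
    enqueue_background_task(TaskSpec(
        goal=f"Investigate availability dip on {service}",
        allowed_tools=["metrics_query", "log_search", "code_search"],
        acceptance_checks=["root-cause summary attached", "mitigation PR opened"],
    ))

on_metrics_update("feed-service", 0.990)  # below the SLO, so a task is queued
```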

Spec-Driven Development for Both Models 📝

Specifications play a crucial role in both foreground and background agent interactions.

  • Foreground Spec Focus: While direct interaction is high-fidelity, detailed specs help define the desired outcome. Developers can continuously refine the spec and agent output throughout the interaction.
  • Background Spec Focus: Specs are fundamental for defining long-running, multi-step background tasks, providing clear intent and context for the agent.

The Power of MCP: Standardizing Tool Calling 🤝

Model Context Protocol (MCP) is a foundational element in LinkedIn’s AI tooling strategy.

  • Standardizing Tool Interaction: MCP standardizes how AI models interact with tools, overcoming the fragmentation caused by different model vendors and API formats.
  • Enabling Interoperability: This standardization allows any language, agent, tool, or model to interact seamlessly, unifying the workflow.
  • Reusable Tools: MCP enables the reuse of the same tools across both foreground and background agents, maximizing efficiency.

Internal MCP Tooling:

LinkedIn has developed a suite of internal MCP tools for:

  • Code search
  • Static analysis
  • Executing internal command-line tools
  • Structured impact analysis
  • Surfacing structured knowledge from internal documentation and semantic indexes
  • Accessing production observability and metrics data

The philosophy is to expose existing infrastructure and systems via MCP tools, enabling agents to interact with them in a standardized way.
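
With the official MCP Python SDK, exposing an internal system as a tool can be as small as the sketch below. The FastMCP wrapper is real SDK surface area; the code-search body is a canned stand-in for an existing internal service:

```python
from mcp.server.fastmcp import FastMCP  # pip install mcp

mcp = FastMCP("internal-tools")

@mcp.tool()
def code_search(query: str, max_results: int = 10) -> list[str]:
    """Search the internal codebase and return matching file paths."""
    # A real deployment would delegate to the existing code-search service;
    # canned results keep this sketch self-contained.
    return [f"src/match_{i}.py" for i in range(min(max_results, 3))]

if __name__ == "__main__":
    mcp.run()  # serve over stdio so any MCP-capable agent or model can call it
```

Because the protocol is the contract, the same code_search tool can serve foreground IDE agents and background orchestrations alike.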

Security, Compliance, and Observability: Pillars of the Platform 🔒

Building AI at enterprise scale demands a robust foundation of security, compliance, and observability.

  • Developer-Facing Abstractions: Mechanisms for prompt management, tool definition, MCP server spin-up, and abstracted inference (commercial/in-house models) are provided.
  • Technology and Process Changes:
    • Sandbox Environments: Agents run in restricted sandbox environments, limiting their capabilities and access to sensitive systems.
    • Limited Context and Identity: Agents are provided with only necessary context and possess unique identities, making all their actions auditable.
    • Reusing Human Development Abstractions: Security and compliance abstractions used for human developers are adapted for agents.
    • Transparency and Control: All agent actions (steps, tool calls, decisions) are observable and auditable, allowing for detailed inspection of the “chain of thought.”
  • No Process Bypass: Crucially, agents cannot land code changes directly. They propose changes that undergo the exact same review and testing processes as human-generated code. Developers can replay traces, inspect reasoning, and verify agent actions before approval. AI augments, rather than replaces, human judgment.
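
One simple way to tie every tool call to an agent identity and make it replayable is a wrapper like the following; the pattern is illustrative, not LinkedIn’s implementation:

```python
import json
import time

def audited_call(agent_id: str, tool_name: str, args: dict, fn):
    """Invoke a tool on an agent's behalf, recording the call for later replay."""
    record = {
        "agent": agent_id,  # unique identity: every action is attributable
        "tool": tool_name,
        "args": args,
        "ts": time.time(),
    }
    result = fn(**args)
    record["ok"] = True
    print(json.dumps(record))  # in practice: an append-only audit log
    return result

# Usage: the same wrapper fronts every tool, native or MCP-exposed.
audited_call("agent-42", "code_search", {"query": "PaymentClient"}, lambda query: [query])
```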

Enhancing Developer Experience (DevX) 🌟

The key challenge in developer experience is bridging the knowledge gap between human context and AI comprehension.

  • Elaborating Task Descriptions: Developers need to effectively translate their implicit knowledge and the adjacent context of a task into a format the agent can understand. This is an ongoing area of evolution.
  • Context Across Hops and Agents: For multi-step, inter-tool, and inter-agent communications, LinkedIn maintains a dynamic “semantic understanding” of the codebase. This is queryable by agents, much like a human would search documentation or ask a colleague.
  • Retrieval-Augmented Generation (RAG): A RAG system provides agents with engineering context, including PRs, architectural decisions, and guiding principles, allowing them to follow established patterns.
  • AI-Described PRs: An experimental approach involves using AI to describe past PRs, detailing the changes made. Agents can then query these descriptions to understand the impact of similar changes or follow established patterns.
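
A toy version of that retrieval step; the in-memory document list and word-overlap ranking below are stand-ins for a real semantic index with embedding search:

```python
CONTEXT_DOCS = [
    "PR 123: migrated service A to the new RPC client; pattern lives in rpc/client.py",
    "ADR 7: new services must emit availability metrics via the standard sidecar",
]

def retrieve(task: str, k: int = 1) -> list[str]:
    """Rank docs by naive word overlap with the task; a real system would embed both."""
    words = set(task.lower().split())
    ranked = sorted(CONTEXT_DOCS, key=lambda d: -len(words & set(d.lower().split())))
    return ranked[:k]

def build_prompt(task: str) -> str:
    """Prepend retrieved engineering context so the agent follows established patterns."""
    context = "\n".join(retrieve(task))
    return f"Engineering context:\n{context}\n\nTask:\n{task}"

print(build_prompt("Migrate service B to the new RPC client"))
```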

Advice for Developers Embarking on the AI Journey 🧭

As developers navigate the transition from foreground to background AI adoption, here’s some expert advice:

From Karthik:

  1. Invest in Solid Engineering and Platform Abstractions: This is the only way to move beyond the hype and achieve production-ready AI solutions.
  2. Understand AI Strengths and Limitations: Use AI wisely. Recognize what tasks it excels at and where human judgment remains indispensable. AI is not yet fully autonomous.
  3. Adapt Your Processes: Trying to shoehorn AI into heavily human-dependent, undocumented, or tribal knowledge processes won’t work. You must evolve your workflows for maximum AI effectiveness.

From Prince:

  1. Don’t Underestimate Evals: Evals are critical for measuring system improvement or regression. Treat them as a core platform component, not an afterthought.
  2. Solve for Company-Specific Context: Avoid simply recreating existing AI tools. Instead, identify your unique, repetitive, high-friction engineering tasks where AI can provide the most significant value.

By embracing these principles and learning from the pioneering work at companies like LinkedIn, developers can unlock the transformative potential of AI, ushering in a new era of productivity and innovation in the software lifecycle. 💡
