Key Responsibilities and Required Skills for Infrastructure Architect
🎯 Role Definition
We are seeking an experienced Infrastructure Architect to design, build, and evolve resilient, secure, and cost‑efficient infrastructure platforms across cloud (AWS, Azure, GCP) and on‑prem environments. The Infrastructure Architect will own architecture blueprints, lead platform and automation initiatives (IaC, CI/CD), define operational standards for performance, availability, security and compliance, and partner closely with engineering, security, and operations teams to deliver scalable infrastructure solutions that support business goals.
The ideal candidate combines deep technical expertise in cloud architecture, networking, security and automation with strong stakeholder leadership, hands‑on implementation experience, and a track record of delivering platform‑level improvements for performance, reliability, and cost optimization.
📈 Career Progression
Typical Career Path
Entry Point From:
- Senior Systems Engineer (Cloud/Platform)
- Senior DevOps / Site Reliability Engineer (SRE)
- Cloud Solutions Engineer / Cloud Engineer
Advancement To:
- Principal Infrastructure Architect
- Cloud/Platform Architect Lead
- Director of Cloud Infrastructure / Head of Platform Engineering
Lateral Moves:
- Security Architect (Cloud/Infra)
- Solution Architect (Application/Integration)
- Platform Engineering Manager
Core Responsibilities
Primary Functions
- Design and document scalable, highly available, and secure infrastructure architectures across public cloud (AWS, Azure, GCP) and on‑premises data centers, producing reference architectures, deployment patterns, and migration plans that align with business objectives and SLA/RTO requirements.
- Lead cloud migration strategies and execution for lift‑and‑shift, replatforming, and refactoring initiatives, including migration planning, risk assessment, runbooks, cutover plans, and rollback strategies to minimize downtime and data loss.
- Define, implement, and enforce Infrastructure as Code (IaC) standards using tools such as Terraform, CloudFormation, ARM templates, or Pulumi to ensure reproducible, auditable, and versioned infrastructure deployments.
- Architect container platforms and orchestration solutions (Kubernetes, EKS, AKS, GKE) including cluster topology, multi‑tenant isolation, network policies, persistent storage design, and upgrade/maintenance strategies.
- Design and implement CI/CD pipelines and release automation for infrastructure and platform components using Jenkins, GitLab CI, GitHub Actions, Argo CD or similar tools to enable safe, repeatable deployments.
- Establish observability and monitoring frameworks (Prometheus, Grafana, Datadog, New Relic, ELK) for infrastructure metrics, logging, distributed tracing, alerting, and capacity planning that provide actionable SRE‑grade telemetry.
- Create and enforce cloud cost management and optimization practices, including rightsizing, reserved instances/savings plans, tagging strategies, and chargeback/showback models to control infrastructure spend.
- Define and implement security architecture controls (network segmentation, WAF, firewalls, VPC design, IAM policies, encryption at rest and in transit) and collaborate with security teams to meet compliance standards such as SOC 2, PCI DSS, HIPAA, and GDPR.
- Design disaster recovery (DR) and business continuity architectures, including runbooks, failover testing, backups, cross‑region replication, RTO/RPO validation and periodic recovery drills.
- Lead capacity planning and performance tuning exercises to ensure infrastructure can meet current and forecasted demand, including load testing, autoscaling policies, and performance optimization recommendations.
- Drive platform reliability and incident management improvements, including defining SLOs/SLIs, root cause analysis (RCA) processes, postmortem culture, and mitigation strategies to reduce MTTR and prevent recurrence.
- Evaluate new infrastructure technologies, cloud services, third‑party platforms, and open‑source solutions; run proofs of concept and vendor assessments to inform the platform roadmap and RFPs.
- Build and maintain network architecture, including routing, load balancing, CDN integration, DNS, IP addressing strategies, VPN, AWS Direct Connect or Azure ExpressRoute designs, and hybrid connectivity patterns.
- Define identity and access management (IAM) models and governance for multi‑cloud environments, integrating SSO, role-based access control, and least‑privilege principles across teams and services.
- Drive automation of repetitive operations through scripting (Python, Bash, PowerShell) and orchestration to improve developer productivity and operational efficiency while reducing human error.
- Create and maintain infrastructure standards, playbooks, runbooks, and architecture decision records (ADRs) to guide engineering teams and ensure consistent deployments and lifecycle management.
- Collaborate with application architects, product owners and engineering teams to translate application requirements into infrastructure patterns, ensuring alignment on cost, scalability, security and operability.
- Mentor and coach engineering and operations teams on cloud best practices, IaC, containerization, platform usage and incident response to raise organizational capability.
- Manage relationships with cloud providers and other infrastructure vendors, negotiate SLAs, and ensure third‑party services integrate securely and reliably into the platform.
- Lead cross‑functional architecture governance boards and review processes, providing technical leadership, architectural reviews, and approval of major infrastructure changes and projects.
- Drive continuous improvement in deployment velocity, platform stability, and security posture by introducing platform observability, GitOps workflows, and infrastructure testing (unit, integration, and contract tests).
- Ensure data infrastructure and storage architectures meet application throughput and latency requirements, advising on databases, object storage, caching, and data movement strategies.
- Define migration and decommissioning strategies for legacy systems, ensuring secure data migration, minimal disruption and thorough validation of post‑migration performance and compliance.
Secondary Functions
- Support ad‑hoc infrastructure and data requests, perform exploratory analysis and provide actionable recommendations to engineering and product teams.
- Contribute to the organization's infrastructure strategy and multi‑year platform roadmap, focusing on standardization, consolidation and cloud adoption.
- Collaborate with business units to translate infrastructure and data needs into engineering requirements, technical specifications and acceptance criteria.
- Participate in sprint planning and agile ceremonies, helping teams decompose infrastructure work, estimate effort and identify dependencies.
- Document architecture decisions, create runbooks, and ensure knowledge transfer across teams for operational readiness and on‑call support.
- Participate in the on‑call rotation and provide guidance during major incidents, coordinating mitigation, communication, and postmortem activities.
- Assist security and compliance teams during audits by providing evidence of controls, architecture diagrams and remediation plans.
- Help define platform onboarding processes, developer self‑service capabilities and internal training materials to accelerate developer adoption of platform services.
Required Skills & Competencies
Hard Skills (Technical)
- Cloud Architecture (AWS, Azure, GCP) — designing VPCs, regions, multi‑region strategies, cloud native services and hybrid connectivity.
- Infrastructure as Code (Terraform, CloudFormation, ARM templates, Pulumi) — modular design, state management and CI integration.
- Containerization and Orchestration (Docker, Kubernetes, EKS, AKS, GKE) — cluster design, Helm charts, operators and GitOps.
- CI/CD and Release Automation (Jenkins, GitLab CI/CD, GitHub Actions, Argo CD) — pipeline design for infra and app deployments.
- Networking and Security (VPN, BGP, firewalls, WAF, load balancers, zero trust, VPC peering) and IAM policy design (AWS IAM, Microsoft Entra ID, formerly Azure AD).
- Observability and Monitoring (Prometheus, Grafana, Datadog, ELK/Elastic Stack, Splunk) — alerting, tracing and dashboards.
- Scripting and Automation (Python, Bash, PowerShell) — automation of provisioning, testing and runbook tasks.
- Database and Storage Architecture (RDS, Aurora, DynamoDB, Cloud SQL, object storage, block storage, caching strategies).
- Disaster Recovery & High Availability Design — cross‑region replication, backup strategies, failover testing.
- Cost Management and FinOps — tagging, budgeting, reserved instances, rightsizing, and cost optimization strategies.
- Security & Compliance Controls — encryption, key management (KMS), vulnerability management and regulatory compliance (PCI DSS, HIPAA, SOC 2).
- Platform Engineering & SRE Practices — SLO/SLI/SLA design, incident management, automation of operational runbooks.
- Load Balancing, CDN & Edge Patterns — NGINX, ALB, CloudFront, Azure CDN, caching and latency optimization.
- Experience with enterprise architecture frameworks (TOGAF) and creating architecture decision records (ADRs).
Soft Skills
- Strong stakeholder management: communicate architecture decisions clearly to technical and non‑technical stakeholders.
- Strategic thinking: align infrastructure roadmaps with business objectives and cost constraints.
- Leadership and mentorship: guide engineering teams and build platform competency.
- Excellent written and verbal communication: produce clear architecture documentation, runbooks and executive summaries.
- Problem solving and analytical thinking: diagnose complex issues and drive pragmatic resolutions.
- Collaboration and influencing: work across distributed teams and vendor partners to deliver cross‑functional programs.
- Prioritization and time management: balance technical debt, innovation and operational needs.
- Attention to detail with a focus on security, compliance and operational excellence.
- Change management and governance: lead architecture reviews and ensure safe rollout of platform changes.
- Customer focus and empathy: design infrastructure solutions that enable product and engineering success.
Education & Experience
Educational Background
Minimum Education:
- Bachelor's degree in Computer Science, Information Systems, Computer Engineering or related technical field (or equivalent practical experience).
Preferred Education:
- Master’s degree in Computer Science, Cloud Computing, or Systems Engineering, or an MBA with a technical focus.
- Professional certifications such as AWS Certified Solutions Architect (Professional), Google Professional Cloud Architect, Microsoft Certified: Azure Solutions Architect Expert, HashiCorp Certified: Terraform Associate, Certified Kubernetes Administrator (CKA), or TOGAF certification.
Relevant Fields of Study:
- Computer Science
- Information Technology / Systems
- Network Engineering
- Cloud Computing / Distributed Systems
- Electrical or Computer Engineering
Experience Requirements
Typical Experience Range: 7–12+ years of IT experience with 3–5+ years in a cloud architecture or platform architecture role.
Preferred:
- Proven track record designing and delivering enterprise cloud or hybrid infrastructure at scale.
- Hands‑on experience leading migrations, IaC adoption, Kubernetes platform builds, and cross‑functional architecture governance.
- Prior experience in regulated industries (e.g., finance, healthcare) or audited SaaS environments, including supporting audit and compliance requirements.