Static IAM users with long-lived access keys are the AWS pattern that refuses to die. They’re also the single largest source of preventable cloud incidents I see in audits. This post walks through the federation pattern I’ve deployed multiple times: Azure AD as identity provider, AWS IAM Identity Center as the access broker, SAML 2.0 for sign-in, SCIM for provisioning, and just-in-time IAM credentials for production database access.
I’ve published a deeper engineering walkthrough of this architecture on my portfolio site; this post is the “why” and the lessons learned.
The pattern
- Identity lives in Azure AD. Joiners, movers, leavers go through HR → Azure AD. AWS does not own the user lifecycle.
- SAML 2.0 federation from Azure AD to AWS IAM Identity Center. Single sign-on flows through the corporate IdP; MFA is enforced at the IdP, not duplicated downstream.
- SCIM provisioning pushes group membership changes from Azure AD into Identity Center. A leaver in Azure AD is a removed group member in AWS within minutes — automatically.
- Permission Sets in Identity Center map AD groups to AWS roles: db-readonly, db-write-emergency, data-admin, etc.
- JIT credentials for the database itself. The role grants the ability to request a short-lived database credential, not the credential itself. Vault / RDS IAM auth / Aurora IAM tokens depending on engine.
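To make the last bullet concrete, here is a minimal sketch of the "request a credential, don't hold one" step, assuming RDS IAM auth is the engine path. The wrapper name `mint_db_credential`, the `DbCredential` type, and the hostname in the usage note are illustrative assumptions; `generate_db_auth_token` is the boto3 RDS client call, and the 15-minute lifetime is the validity window of RDS IAM auth tokens.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# RDS IAM auth tokens are valid for 15 minutes after generation.
TOKEN_LIFETIME = timedelta(minutes=15)


@dataclass(frozen=True)
class DbCredential:
    """Hypothetical container for a just-in-time database credential."""
    username: str
    token: str            # passed to the database driver as the password
    expires_at: datetime


def mint_db_credential(rds_client, host, port, username, now=None):
    """Mint a short-lived token on demand instead of reading a stored password.

    `rds_client` is expected to behave like boto3.client("rds"), created
    under the role that the user's permission set grants.
    """
    now = now or datetime.now(timezone.utc)
    token = rds_client.generate_db_auth_token(
        DBHostname=host, Port=port, DBUsername=username
    )
    return DbCredential(username=username, token=token,
                        expires_at=now + TOKEN_LIFETIME)
```

In use this would look something like `mint_db_credential(boto3.client("rds"), "db.example.internal", 5432, "readonly_user")`, with the returned token handed to the driver as the password over TLS. Nothing is written to disk, and a leaked token is useless for new connections once it expires.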
Why this is actually Zero-Trust, not just “SSO”
- No standing privilege. The user doesn’t carry a database password. The credential is minted on demand and expires fast.
- Context-aware MFA. Risky sign-in (new device, unusual location) escalates MFA at the IdP. AWS doesn’t need to know about it.
- Group is the unit of access. Not the user. Audit becomes “who was in this group when?” instead of “who had this key?”
- Lifecycle is automatic. Termination in HR → group removal in Azure AD → SCIM push → access gone. No tickets.
The point of SSO isn’t convenience. The point is that the moment HR clicks “terminate,” the engineer can’t reach prod — and you don’t have to remember to revoke anything.
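The "who was in this group when?" audit question can be answered by replaying membership changes up to a point in time. The sketch below assumes you export group add/remove events (for example from IdP audit logs) into a simple record; the `MembershipEvent` shape and the `members_at` helper are hypothetical, not an Azure AD or AWS API.

```python
from datetime import datetime, timezone
from typing import Iterable, NamedTuple, Set


class MembershipEvent(NamedTuple):
    """Hypothetical record of one group membership change."""
    at: datetime
    group: str
    user: str
    action: str  # "add" or "remove"


def members_at(events: Iterable[MembershipEvent], group: str,
               when: datetime) -> Set[str]:
    """Replay events in time order and return the group's members at `when`."""
    members: Set[str] = set()
    for ev in sorted(events, key=lambda e: e.at):
        if ev.group != group or ev.at > when:
            continue
        if ev.action == "add":
            members.add(ev.user)
        elif ev.action == "remove":
            members.discard(ev.user)
    return members
```

Because access hangs off the group, this one replay answers the audit question for every account and permission set the group maps to.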
The pitfalls (in deployment order)
- Trust-store mismatches. Azure AD’s SAML signing cert rotates; if your Identity Center metadata is stale, federation breaks at midnight on a Saturday. Automate the metadata refresh.
- Group naming chaos. Pick a strict naming convention before you start (aws-{account}-{permset}) and enforce it. Renaming groups after the fact is misery.
- SCIM batching. Membership changes propagate in batches; if you depend on sub-minute revocation, supplement with session-revocation policies on the AWS side.
- Break-glass. One IAM user, hardware-MFA only, credentials in a sealed envelope. If the IdP is down, you still need a way in. Pretend it doesn’t exist; rotate it after every use.
- Database engine quirks. RDS IAM auth limits how many new connections per second it can authenticate, so connection storms without pooling can hit the ceiling. Aurora handles bursts better. Plan for this; benchmark before you cut over.
Map permission sets to job functions, not job titles. Titles change every reorg; functions change rarely. data-platform-admin ages well; senior-engineer-team-c doesn’t.
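One way to enforce the aws-{account}-{permset} convention is to validate group names before they are ever synced. The sketch below is one possible reading of that convention: it assumes lowercase names, an account segment without hyphens, and a permission-set segment that may contain hyphens. Those rules and the `parse_group_name` helper are assumptions for illustration.

```python
import re
from typing import Optional, Tuple

# Assumed rules: lowercase alphanumerics, account has no hyphens,
# permission-set name may contain single hyphens between words.
GROUP_NAME = re.compile(
    r"aws-(?P<account>[a-z0-9]+)-(?P<permset>[a-z0-9]+(?:-[a-z0-9]+)*)"
)


def parse_group_name(name: str) -> Optional[Tuple[str, str]]:
    """Return (account, permset) if the name follows the convention, else None."""
    match = GROUP_NAME.fullmatch(name)
    if match is None:
        return None
    return match["account"], match["permset"]
```

Run as a pre-sync check, `parse_group_name("aws-prod-data-platform-admin")` yields the account and permission set to map, while anything that returns `None` gets rejected before it turns into a group you later have to rename.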
What we measure once it’s in
- Number of standing IAM users — target zero, exception list reviewed quarterly.
- Average age of a privileged credential at the moment of use — target under 1 hour.
- Time-to-revoke from HR termination event — target under 15 minutes, alert on anything over.
- Permission-set sprawl — alert when the count climbs faster than headcount.
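These targets are simple enough to compute from timestamps and counts you already have. The following is a rough sketch; the function names, the input shapes, and the reading of "faster than headcount" as relative growth between two snapshots are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Tuple

REVOKE_TARGET = timedelta(minutes=15)        # HR termination to access gone
CREDENTIAL_AGE_TARGET = timedelta(hours=1)   # age of a credential when used


def time_to_revoke(terminated_at: datetime, access_removed_at: datetime) -> timedelta:
    """Elapsed time between the HR termination event and access removal."""
    return access_removed_at - terminated_at


def breaches_revoke_target(terminated_at: datetime, access_removed_at: datetime) -> bool:
    """True when revocation took longer than the 15-minute target."""
    return time_to_revoke(terminated_at, access_removed_at) > REVOKE_TARGET


def average_credential_age(uses: Iterable[Tuple[datetime, datetime]]) -> timedelta:
    """Mean age at the moment of use, given (issued_at, used_at) pairs."""
    ages = [used_at - issued_at for issued_at, used_at in uses]
    if not ages:
        raise ValueError("no credential uses recorded")
    return sum(ages, timedelta()) / len(ages)


def sprawl_alert(permsets_before: int, permsets_after: int,
                 headcount_before: int, headcount_after: int) -> bool:
    """True when permission sets grew proportionally faster than headcount.

    Cross-multiplied to avoid division; assumes all counts are positive.
    """
    return permsets_after * headcount_before > headcount_after * permsets_before
```

Comparing `average_credential_age(...)` against `CREDENTIAL_AGE_TARGET` and wiring `breaches_revoke_target` into alerting turns the list above into checks that fail loudly instead of numbers someone has to remember to look at.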
Further reading
- Azure AD → AWS SSO database access guide (deep-dive)
- Ransomware DR: tabletop to live drill — SSO + DR are the same conversation
Still running long-lived access keys against production? Come and talk to us — we’ll help you walk it back to zero.