
What is AWS S3?

Amazon's giant cloud filing cabinet — explained from a curious kid's first question all the way to how a professional architect designs with it.

📅 Updated 2026 ⏱️ ~18 min read ☁️ AWS · Cloud
🗃️

Object Storage

Flat, key-based architecture

♾️

Unlimited Scale

Petabytes to exabytes

🔒

Private by Default

IAM + bucket policies

📊

11 Nines Durable

99.999999999% durability

🍳

The Simple Explanation

S3 is a Giant Cloud Pantry

Imagine a high-end restaurant with a master pantry so big it could hold every ingredient on earth. Every jar, every bag, every bottle has a unique label. You can walk in, grab exactly what you need in milliseconds, and never worry about it running out of shelf space — ever. That's Amazon S3.

Unlike the filing cabinet on your computer (folders inside folders inside folders), S3 uses a flat structure. Every item goes in a bin called a Bucket and gets a unique label called a Key. No nested drawers — just one giant open floor with perfectly labelled shelves.

🪣 Buckets — The Global Storage Bins

Every object in S3 must live inside a Bucket. Think of it as the labelled bin in the pantry. For nearly two decades these bin names had to be globally unique — if a chef in New York named a bin pizza-supplies, no one else on earth could use that name. Modern S3 now allows duplicate names across different AWS accounts via account-level namespaces.

| Naming Rule | Description | 🍳 Kitchen Analogy |
|---|---|---|
| Length | 3–63 characters | Label must be clear and concise |
| Characters | Lowercase letters, numbers, periods, and hyphens only | No fancy cursive or uppercase; hard for scanners to read |
| Start / End | Must start and end with a letter or number | No beginning or ending with a dash |
| Uniqueness | Historically globally unique across all accounts | Every bin has one serial number for the whole world |
| No underscores | Underscores and uppercase letters are not allowed | Keeps labels easy for digital systems to parse |
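The rules in the table can be checked before you ever call the API. Below is a minimal sketch: `is_valid_bucket_name` is a hypothetical helper that checks length, allowed characters (lowercase letters, digits, periods, hyphens), and the start/end rule, and also rejects adjacent periods and IP-address-style names, both of which are part of AWS's documented rules. It does not cover AWS's reserved prefixes and suffixes, and it cannot check uniqueness, which only S3 itself can confirm.

```python
import re

# Lowercase letters, digits, periods, and hyphens; must start and end
# with a letter or digit; 3-63 characters in total.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
# Names shaped like an IPv4 address (e.g. 192.168.5.4) are rejected.
_IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def is_valid_bucket_name(name: str) -> bool:
    """Hypothetical pre-flight check for a general purpose bucket name."""
    if not _BUCKET_RE.fullmatch(name):
        return False          # wrong length, bad character, or bad start/end
    if ".." in name:
        return False          # two adjacent periods are not allowed
    if _IP_RE.fullmatch(name):
        return False          # must not look like an IP address
    return True


for candidate in ["pizza-supplies", "Pizza_Supplies", "-pizza", "ab", "192.168.5.4"]:
    print(candidate, is_valid_bucket_name(candidate))
```

Only `pizza-supplies` passes; the others fail on uppercase and underscores, a leading hyphen, length, and IP-address shape respectively.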

📦 Objects — The Digital Ingredients

An Object is the basic storage unit, not just a file but a complete package. In our kitchen, an object is a jar of tomato sauce: the sauce is the data, the label tomato-sauce-2025 is the key, and the sticker with the expiry date is the metadata. Objects range from 0 bytes up to 5 TB each under the long-standing size limit.

🏷️ Key (The Label)

The unique name within a bucket. Using slashes like recipes/italian/pasta.txt creates a visual "folder" — but it's really just one long string. S3 has no real folders.
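To see why the "folders" are an illusion, here is a small pure-Python model of how a listing with a `Prefix` and a `Delimiter` (the parameters S3's ListObjectsV2 call accepts) rolls flat keys up into folder-like "common prefixes". `list_like_s3` is a hypothetical helper: it does not call AWS and ignores pagination.

```python
def list_like_s3(keys, prefix="", delimiter="/"):
    """Mimic how a Prefix + Delimiter listing groups flat keys.

    Returns (contents, common_prefixes). Any key that still contains the
    delimiter after the prefix is rolled up into a folder-like prefix.
    """
    contents, common_prefixes = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter) if delimiter else -1
        if cut == -1:
            contents.append(key)              # object sits "directly" here
        else:
            folder = prefix + rest[: cut + len(delimiter)]
            if folder not in common_prefixes:
                common_prefixes.append(folder)
    return contents, common_prefixes


keys = [
    "recipes/italian/pasta.txt",
    "recipes/italian/pizza.txt",
    "recipes/french/crepes.txt",
    "recipes/index.html",
]
print(list_like_s3(keys, prefix="recipes/"))
```

With `prefix="recipes/"` this returns `(['recipes/index.html'], ['recipes/french/', 'recipes/italian/'])`: the "subfolders" are nothing more than the text up to the next slash.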

📄 Value (The Data)

The actual file content. An image, a PDF, a video, a trained ML model. 2026 updates extended max object size to support giant AI training datasets.

📋 Metadata (Nutrition Label)

System metadata is set by S3 itself (Last-Modified, ETag, size). User metadata is your custom info like Chef: Mario. Max 2 KB of user metadata per object.

🏷️ Tags (Colour-Coded Stickers)

Up to 10 key-value tags per object. Unlike metadata, tags can be changed without re-uploading the file. Useful for access control, lifecycle rules, and filtering analytics by tag (cost allocation in billing uses bucket-level tags instead).
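In code, the difference between metadata and tags shows up in how each is sent. A minimal sketch, assuming boto3 as the client library: `build_put_object_kwargs` is a hypothetical helper that only assembles the arguments for `put_object`, so it runs without AWS credentials, and the bucket, key, and tag values are made up.

```python
from urllib.parse import urlencode

MAX_TAGS_PER_OBJECT = 10


def build_put_object_kwargs(bucket, key, body, metadata, tags):
    """Assemble keyword arguments in the shape boto3's put_object expects.

    metadata -> user metadata, sent as x-amz-meta-* headers; it is fixed
                once written, so changing it means copying the object.
    tags     -> object tags, sent as a URL-encoded string; they can be
                replaced later with put_object_tagging.
    """
    if len(tags) > MAX_TAGS_PER_OBJECT:
        raise ValueError(f"S3 allows at most {MAX_TAGS_PER_OBJECT} tags per object")
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "Metadata": dict(metadata),
        "Tagging": urlencode(tags),
    }


kwargs = build_put_object_kwargs(
    bucket="pizza-supplies",
    key="sauces/tomato-sauce-2025",
    body=b"tomato sauce",
    metadata={"chef": "Mario"},
    tags={"team": "kitchen", "retention": "1y"},
)
print(kwargs["Tagging"])  # team=kitchen&retention=1y
# With credentials configured: boto3.client("s3").put_object(**kwargs)
```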

🌍 Regions & Availability Zones

When you create a bucket you choose a Region (e.g. us-east-1, eu-west-1). Inside each region are Availability Zones (AZs) — physically separate data centres with independent power and networking.

🍳 Kitchen analogy: S3 Standard automatically copies your ingredients to at least three separate kitchen buildings in the same city. If one building floods, your tomato sauce is perfectly safe in the others, and you never notice a difference. This is how S3 achieves eleven nines of durability (99.999999999%): store 10 million objects, and on average you would expect to lose a single one once every 10,000 years.
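The "one loss every 10,000 years" figure follows directly from the eleven-nines number. A short worked check, assuming the 99.999999999% figure is an annual, per-object durability and that losses are independent (a simplification); `years_per_expected_loss` is a hypothetical helper, and `Decimal` keeps the tiny loss rate free of floating-point rounding.

```python
from decimal import Decimal

# Eleven nines of annual durability, as published for S3.
DURABILITY = Decimal("0.99999999999")


def years_per_expected_loss(num_objects: int) -> Decimal:
    """Average number of years between single-object losses.

    Assumes each object independently has a (1 - DURABILITY) chance of
    being lost per year, so expected losses scale with the object count.
    """
    annual_loss_rate = 1 - DURABILITY                   # 1E-11 per object per year
    expected_losses_per_year = num_objects * annual_loss_rate
    return 1 / expected_losses_per_year


print(int(years_per_expected_loss(10_000_000)))  # 10000
```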

🥛

The Simple Explanation

Not All Ingredients Need the Front Shelf

Milk goes at the front of the fridge: you need it every day. Extra flour can go in the back cupboard. A vintage wine from 2010 goes in the basement vault. S3 has six storage class families that work exactly the same way: the less frequently you access data, the cheaper the storage, but the more you pay or wait to get it back.

🔥 Instant · ms

S3 Standard

The Front Counter

For data you touch every day — website images, active user files, real-time analytics. High throughput, millisecond latency, stored in ≥3 AZs. No minimum storage duration.

🧠 Auto-tiered

S3 Intelligent-Tiering

The Smart Helper

Automatically moves objects between five tiers based on actual access patterns. Not touched in 30 days → Infrequent Access. 90 days → Archive Instant. Ideal when you can't predict usage.

📦 ms · lower cost

S3 Standard-IA

The Side Pantry (3 AZ)

Infrequent Access. Millisecond retrieval, but you pay a per-GB retrieval fee and a 30-day minimum storage duration. Stored across ≥3 AZs. Good for disaster recovery and long-term backups you occasionally need fast.

🏠 ms · 1-AZ risk

S3 One Zone-IA

Single-Building Pantry

20% cheaper than Standard-IA but lives in only one AZ. If that AZ is destroyed, the data is gone. Only use for re-creatable data — like thumbnails generated from originals stored elsewhere.

🧊 ms → hours

S3 Glacier (3 tiers)

The Underground Vault

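Instant Retrieval: millisecond retrieval for archives accessed about once a quarter. Flexible Retrieval: from minutes (expedited) up to 12 hours (bulk), with standard retrievals taking 3–5 hours; good for backups. Deep Archive: within 12 hours (standard) or 48 hours (bulk); the lowest-cost S3 storage class, designed for data kept 7–10 years or longer.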

10× faster

S3 Express One Zone

The Turbo Kitchen

Up to 10× faster than S3 Standard with 50–80% lower request costs. Uses "Directory Buckets" co-located in one AZ next to your compute. For AI training, HPC, real-time analytics at massive scale.

Intelligent-Tiering — How the 5 Tiers Work

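| Access Tier | Threshold | Retrieval Speed | Purpose |
|---|---|---|---|
| Frequent Access | Accessed within the last 30 days | Milliseconds | Items you use every day |
| Infrequent Access | Not accessed for 30+ days | Milliseconds | Items used about once a month |
| Archive Instant Access | Not accessed for 90+ days | Milliseconds | Used quarterly, needed fast |
| Archive Access | Optional opt-in; 90+ days without access (configurable up to 730) | 3–5 hours | Deep storage, occasional use |
| Deep Archive Access | Optional opt-in; 180+ days without access (configurable up to 730) | Within 12 hours | Compliance-grade deep storage |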
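The thresholds in the table behave like a simple function of how long an object has gone untouched. A simplified model, not AWS code: `intelligent_tiering_tier` is a hypothetical helper, and the two optional day counts stand in for the values you would set when opting a bucket into the archive tiers.

```python
from typing import Optional

# Opt-in archive tiers can be configured within these day ranges.
ARCHIVE_RANGE = (90, 730)
DEEP_ARCHIVE_RANGE = (180, 730)


def intelligent_tiering_tier(days_since_access: int,
                             archive_after: Optional[int] = None,
                             deep_archive_after: Optional[int] = None) -> str:
    """Tier an object sits in after `days_since_access` days untouched.

    Leave archive_after / deep_archive_after as None if the bucket has not
    opted in. Accessing an object moves it back to Frequent Access
    (archived objects must be restored first); this model treats that
    as resetting the day count to 0.
    """
    if archive_after is not None and not ARCHIVE_RANGE[0] <= archive_after <= ARCHIVE_RANGE[1]:
        raise ValueError("Archive Access must be set between 90 and 730 days")
    if deep_archive_after is not None and not DEEP_ARCHIVE_RANGE[0] <= deep_archive_after <= DEEP_ARCHIVE_RANGE[1]:
        raise ValueError("Deep Archive Access must be set between 180 and 730 days")

    if deep_archive_after is not None and days_since_access >= deep_archive_after:
        return "Deep Archive Access"
    if archive_after is not None and days_since_access >= archive_after:
        return "Archive Access"
    if days_since_access >= 90:
        return "Archive Instant Access"
    if days_since_access >= 30:
        return "Infrequent Access"
    return "Frequent Access"


for days in (5, 45, 120, 400):
    print(days, intelligent_tiering_tier(days))
```

Without the opt-in tiers, an object never goes deeper than Archive Instant Access, no matter how long it sits.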

Storage Cost vs. Retrieval Speed

Lower storage cost generally comes with slower retrieval, higher retrieval fees, or a longer minimum storage duration. Choose based on how often you need the data.

🗓️

The Simple Explanation

The Automated Kitchen Manager

A professional kitchen can't rely on the chef to manually throw out every expired jar. It needs automated rules: "Move tomatoes to the cold store after 30 days. Throw them out after 1 year." S3 has the same system — Lifecycle Policies — plus a Time Machine (Versioning) and Mirror Kitchens (Replication).

♻️ Lifecycle Policies

🚚 Transition Actions

Automatically move objects to cheaper storage classes as they age. Example rule:

Day 0: Standard
↓ after 30 days
Day 30: Standard-IA
↓ after 90 days
Day 90: Glacier Flexible
↓ after 365 days
Day 365: Deep Archive
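The example rule above can be written as a lifecycle configuration. A sketch, assuming boto3's request shape for `put_bucket_lifecycle_configuration`; the rule ID and bucket name are hypothetical, and `storage_class_at` is a local helper that only models which class an object of a given age lands in (S3 queues the actual moves, so they can happen a little after the day count is reached).

```python
# Lifecycle rule in the shape boto3's put_bucket_lifecycle_configuration
# accepts. "GLACIER" is the API name for S3 Glacier Flexible Retrieval.
TRANSITION_RULE = {
    "ID": "age-out-kitchen-data",
    "Filter": {"Prefix": ""},          # empty prefix = every object
    "Status": "Enabled",
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
        {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
    ],
}


def storage_class_at(age_days: int, rule: dict = TRANSITION_RULE) -> str:
    """Storage class an object of this age ends up in under the rule."""
    current = "STANDARD"
    for step in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= step["Days"]:
            current = step["StorageClass"]
    return current


for age in (0, 45, 200, 400):
    print(age, storage_class_at(age))
# With credentials configured:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="pizza-supplies",
#     LifecycleConfiguration={"Rules": [TRANSITION_RULE]},
# )
```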

🗑️ Expiration Actions

Permanently delete objects on a schedule. Use cases:

  • Delete customer feedback logs after 1 year
  • Delete old log files after 90 days
  • AbortIncompleteMultipartUpload after 7 days to clean up partial uploads and stop paying for "half-eaten" ingredients
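The three expiration use cases translate into rules like these. A sketch assuming boto3's request shape; the `feedback/` and `logs/` prefixes and the rule IDs are hypothetical.

```python
import json

# Expiration rules for the use cases above, shaped for boto3's
# put_bucket_lifecycle_configuration.
EXPIRATION_RULES = [
    {
        "ID": "expire-feedback-after-1y",
        "Filter": {"Prefix": "feedback/"},
        "Status": "Enabled",
        "Expiration": {"Days": 365},
    },
    {
        "ID": "expire-logs-after-90d",
        "Filter": {"Prefix": "logs/"},
        "Status": "Enabled",
        "Expiration": {"Days": 90},
    },
    {
        "ID": "abort-stale-multipart-uploads",
        "Filter": {"Prefix": ""},  # applies to the whole bucket
        "Status": "Enabled",
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
    },
]

lifecycle_configuration = {"Rules": EXPIRATION_RULES}
print(json.dumps(lifecycle_configuration, indent=2))
```

Two things worth knowing: a lifecycle configuration is replaced as a whole, so transition and expiration rules for the same bucket must be sent together in one `Rules` list; and in a versioned bucket, `Expiration` only adds a delete marker, so removing old versions needs a separate noncurrent-version expiration rule.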

Versioning — The Time Machine

🍳 When versioning is ON, S3 never overwrites a file — it keeps all versions. If a chef spills soup on a recipe (accidentally overwrites a critical file), they just look back in time and restore the original version. "Deleting" a file just places a Delete Marker on top — the original is still underneath, hidden, retrievable at any time.
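The overwrite-and-delete behaviour can be modelled in a few lines. This is a toy in-memory model, not the S3 API: `VersionedBucket` and its version IDs (`v1`, `v2`, and so on) are hypothetical, whereas real S3 version IDs are opaque strings generated by the service.

```python
import itertools


class VersionedBucket:
    """Toy model of a versioning-enabled bucket (not AWS code).

    Every put adds a new version; a delete without a version ID only
    stacks a delete marker on top, so older versions stay retrievable.
    """

    _DELETE_MARKER = object()

    def __init__(self):
        self._versions = {}              # key -> list of (version_id, data)
        self._ids = itertools.count(1)

    def put(self, key, data):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key):
        marker_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((marker_id, self._DELETE_MARKER))
        return marker_id

    def get(self, key, version_id=None):
        history = self._versions.get(key, [])
        if version_id is None:
            if not history or history[-1][1] is self._DELETE_MARKER:
                raise KeyError(f"{key} not found")      # S3 would answer 404
            return history[-1][1]
        for vid, data in history:
            if vid == version_id and data is not self._DELETE_MARKER:
                return data
        raise KeyError(f"{key} version {version_id} not found")

    def restore(self, key, version_id):
        """Undo an accident by copying an old version back on top."""
        return self.put(key, self.get(key, version_id))


bucket = VersionedBucket()
v1 = bucket.put("recipes/soup.txt", "original recipe")
bucket.put("recipes/soup.txt", "soup-stained recipe")   # accidental overwrite
bucket.delete("recipes/soup.txt")                         # only adds a delete marker
bucket.restore("recipes/soup.txt", v1)
print(bucket.get("recipes/soup.txt"))  # original recipe
```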

🟢

Versioning Enabled

All versions stored, delete markers used

⏸️

Versioning Suspended

Existing versions kept, new writes unversioned

🔴

Versioning Off

Default state. Overwrites replace files permanently. Once versioning has been enabled, a bucket can only be suspended, never returned to this state.

🔁 Replication — Mirror Kitchens

🌍 Cross-Region (CRR)

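Copies new objects to a bucket in a different AWS Region, so users near that Region (say, in London rather than New York) get fast local reads. Also used for disaster recovery across geographies. Both buckets need versioning enabled, and objects that existed before the rule was created need S3 Batch Replication.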

🏙️ Same-Region (SRR)

Copies objects to another bucket in the same region. Good for maintaining a test/dev copy of production data or sharing data between teams without moving regions.

⏱️ Replication Time Control (RTC)

Replicates 99.99% of new objects within 15 minutes, backed by an SLA. You pay a premium for it, so use it for high-stakes compliance or near-real-time DR.
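A replication rule is a small piece of bucket configuration. A sketch assuming boto3's request shape for `put_bucket_replication`; the IAM role ARN, bucket names, and rule ID are placeholders. The same configuration serves CRR and SRR: which one you get depends only on where the destination bucket lives.

```python
def build_replication_config(role_arn, destination_bucket, enable_rtc=False):
    """ReplicationConfiguration dict in the shape boto3's
    put_bucket_replication accepts. Both buckets need versioning enabled."""
    destination = {"Bucket": f"arn:aws:s3:::{destination_bucket}"}
    if enable_rtc:
        # RTC is switched on with a 15-minute replication time plus
        # replication metrics for the same threshold.
        destination["ReplicationTime"] = {"Status": "Enabled", "Time": {"Minutes": 15}}
        destination["Metrics"] = {"Status": "Enabled", "EventThreshold": {"Minutes": 15}}
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "mirror-kitchen",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # replicate every new object
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": destination,
            }
        ],
    }


config = build_replication_config(
    role_arn="arn:aws:iam::111122223333:role/s3-replication-role",
    destination_bucket="pizza-supplies-eu-west-1",
    enable_rtc=True,
)
# With credentials configured:
# boto3.client("s3").put_bucket_replication(
#     Bucket="pizza-supplies", ReplicationConfiguration=config)
```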

🔑

The Simple Explanation

S3 is Private by Default

By default, only the AWS account that owns a bucket can see inside it, full stop. Nothing is public out of the box: to open the door to anyone else, you must explicitly grant access. This section covers how those keys and locks work.

🚧 Block Public Access — The Master Gate

Block public access granted through new ACLs

Stops anyone using old-style ACL keys to open new doors.

Ignore all existing public ACLs

Ignores any old ACL keys that were already distributed.

Block new public bucket policies

Prevents writing new policy rules that let the internet in.

Block public and cross-account access

The final lock: even if a public policy is already on the door, only AWS services and authorized users in the bucket owner's account can get in. Keep all four settings on (the default for new buckets) unless you specifically need a public bucket.
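In the API, the four settings are four booleans. A sketch assuming boto3's `put_public_access_block` call; `is_fully_private` is a hypothetical helper and the bucket name is a placeholder.

```python
# The four Block Public Access switches, named as boto3's
# put_public_access_block expects them, in the same order as above.
PUBLIC_ACCESS_BLOCK = {
    "BlockPublicAcls": True,        # reject new public ACLs
    "IgnorePublicAcls": True,       # ignore public ACLs that already exist
    "BlockPublicPolicy": True,      # reject new public bucket policies
    "RestrictPublicBuckets": True,  # restrict buckets that already have a public policy
}


def is_fully_private(config: dict) -> bool:
    """True only if all four switches are on."""
    return all(config.get(name, False) for name in PUBLIC_ACCESS_BLOCK)


print(is_fully_private(PUBLIC_ACCESS_BLOCK))  # True
# With credentials configured:
# boto3.client("s3").put_public_access_block(
#     Bucket="pizza-supplies",
#     PublicAccessBlockConfiguration=PUBLIC_ACCESS_BLOCK)
```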

👤 IAM Policies vs. Bucket Policies

| Feature | IAM Policy | Bucket Policy |
|---|---|---|
| What it controls | What an identity (user/role) can do | What can happen to a specific bucket |
| Where it lives | Attached to the IAM user/role | Attached to the S3 bucket itself |
| Analogy | A keycard given to an employee | A sign written on the pantry door |
| Cross-account | Needs an assumable role with a trust policy | Can directly grant cross-account access |
| Format | JSON with Effect, Action, Resource | JSON with Effect, Principal, Action, Resource |
| Best for | Internal users, EC2 roles, Lambda functions | Public reads, IP restrictions, cross-account grants |
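The Format row is easiest to see side by side. A sketch with a placeholder bucket name and hypothetical prefixes: the identity policy has no `Principal` because it applies to whoever it is attached to, while the bucket policy must say who it applies to.

```python
import json

BUCKET = "pizza-supplies"  # placeholder bucket name


def identity_policy(prefix: str) -> dict:
    """IAM policy for a user or role: no Principal, because the identity
    it is attached to is the implicit principal."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/{prefix}*",
        }],
    }


def bucket_policy_public_read(prefix: str) -> dict:
    """Bucket policy: must name a Principal. "*" means anyone, so Block
    Public Access has to be relaxed for this policy to take effect."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadForPrefix",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/{prefix}*",
        }],
    }


print(json.dumps(bucket_policy_public_read("menu/"), indent=2))
# boto3.client("s3").put_bucket_policy(
#     Bucket=BUCKET, Policy=json.dumps(bucket_policy_public_read("menu/")))
```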

🔐 Encryption — The Secret Codes

SSE-S3

AWS-Managed Keys

AWS manages everything. Like a built-in safe that locks itself. Zero config, always on by default since Jan 2023.

SSE-KMS

Customer-Controlled Keys

You define and manage keys in AWS KMS. Full audit trail. You control who can open the safe and can revoke access instantly.

SSE-C

Customer-Provided Keys

You bring your own key with every request. AWS never stores it. Maximum control — but you lose the key, you lose the data, permanently.
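SSE-C is easiest to understand at the HTTP level: the key travels with every request. A sketch of the three request headers S3 documents for SSE-C, using only the standard library; `sse_c_headers` is a hypothetical helper. S3 uses the key, then discards it, keeping only a salted HMAC of it to validate later requests.

```python
import base64
import hashlib
import secrets


def sse_c_headers(key: bytes) -> dict:
    """Request headers S3 expects for SSE-C. The key must be 256 bits."""
    if len(key) != 32:
        raise ValueError("SSE-C keys must be exactly 32 bytes (AES-256)")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode(),
        # Integrity check so S3 can detect a key corrupted in transit.
        "x-amz-server-side-encryption-customer-key-MD5":
            base64.b64encode(hashlib.md5(key).digest()).decode(),
    }


customer_key = secrets.token_bytes(32)   # you must store this yourself
headers = sse_c_headers(customer_key)
# boto3 builds these headers for you when you pass
# SSECustomerAlgorithm="AES256" and SSECustomerKey=customer_key
# to put_object / get_object.
```

Every later GET of that object must send the same key; without it, S3 has nothing to decrypt with.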

🔒 Object Lock — The Evidence Locker

Laws often require records to be kept and unmodifiable for years (WORM — Write Once, Read Many). Object Lock enforces this.

🟡 Governance Mode

Regular users can't delete. Privileged users with the s3:BypassGovernanceRetention IAM permission can override and delete if needed.

🔴 Compliance Mode

No one — not even the AWS root account owner — can delete the object until the retention period expires. Ironclad for regulated industries.

⚠️ Legal Hold

Indefinite lock with no expiry timer. Stays on until a manager with s3:PutObjectLegalHold permission manually removes it.
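The three protections combine into a simple decision. A simplified model, not AWS's implementation: `can_delete_version` is hypothetical and covers deleting a specific object version. (Object Lock requires versioning, and a plain delete without a version ID just adds a delete marker, which Object Lock does not prevent.)

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def can_delete_version(mode: Optional[str],
                       retain_until: Optional[datetime],
                       legal_hold: bool,
                       now: datetime,
                       bypass_governance: bool = False) -> bool:
    """Simplified check for deleting one locked object version.

    mode: "GOVERNANCE", "COMPLIANCE", or None.
    bypass_governance: caller holds s3:BypassGovernanceRetention and sends
    the x-amz-bypass-governance-retention header; only helps in governance mode.
    """
    if legal_hold:
        return False                 # no timer: must be removed explicitly first
    if mode is None or retain_until is None or now >= retain_until:
        return True                  # no retention, or it has expired
    if mode == "GOVERNANCE":
        return bypass_governance
    return False                     # COMPLIANCE: nobody, not even root


now = datetime.now(timezone.utc)
seven_years = now + timedelta(days=7 * 365)
print(can_delete_version("COMPLIANCE", seven_years, False, now, bypass_governance=True))  # False
print(can_delete_version("GOVERNANCE", seven_years, False, now, bypass_governance=True))  # True
```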

🏎️

The Simple Explanation

Moving Large Crates Faster

When the restaurant scales to a factory, you need better vehicles. Moving one giant 10 GB crate in one go is risky — if you drop it, you restart from zero. S3 gives you three ways to move data faster and more reliably at scale.

📦

Multipart Upload

Required above 5 GB

Break a giant file into smaller parts (minimum 5 MB each, except the last; up to 10,000 parts), upload them in parallel, and S3 assembles them at the destination. If any one part fails, only that part needs to be re-uploaded.

File: 10 GB video
↓ split into 100 parts × 100 MB
↓ upload 10 parts simultaneously
↓ S3 re-assembles automatically
Result: faster, failure-safe
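The splitting step is simple arithmetic once the limits are known. A sketch of part planning only, assuming the documented limits (5 MiB minimum part size except for the last part, 5 GiB maximum part size, 10,000 parts); `plan_parts` is hypothetical. It counts in binary units, so a 10 GiB file at 100 MiB per part needs 103 parts rather than the diagram's round 100, which uses decimal units. In practice, boto3's managed transfers (`upload_file` with a `TransferConfig`) do this splitting and the parallel upload for you.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB        # every part except the last must be at least 5 MiB
MAX_PART = 5 * GIB        # no single part may exceed 5 GiB
MAX_PARTS = 10_000        # an upload can have at most 10,000 parts


def plan_parts(file_size: int, part_size: int = 100 * MIB):
    """Return (part_number, start_byte, end_byte_exclusive) tuples.

    Grows the part size if needed so the file fits in 10,000 parts.
    Part numbers start at 1, as they do in the UploadPart API.
    """
    part_size = max(part_size, MIN_PART, math.ceil(file_size / MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("file too large for a single multipart upload")
    parts = []
    for number, start in enumerate(range(0, file_size, part_size), start=1):
        parts.append((number, start, min(start + part_size, file_size)))
    return parts


plan = plan_parts(10 * GIB)
print(len(plan), plan[-1])   # 103 parts; the last is the short 40 MiB remainder
```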
🚀

Transfer Acceleration

Edge Network

The public internet is like a crowded city street with traffic lights. Transfer Acceleration routes your data onto AWS's private high-speed backbone at the nearest Edge Location (there are 400+ worldwide), so only the short first hop from you to that Edge Location crosses the public internet.

Normal upload: via the public internet
Accelerated upload: via the AWS backbone
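Using acceleration is mostly a matter of talking to a different endpoint. A sketch, assuming acceleration has already been enabled on the bucket; `accelerate_endpoint` is a hypothetical helper and the bucket name is a placeholder.

```python
def accelerate_endpoint(bucket: str) -> str:
    """Transfer Acceleration endpoint for a bucket.

    Acceleration requires a DNS-compliant bucket name without periods.
    """
    if "." in bucket:
        raise ValueError("Transfer Acceleration does not support bucket names with periods")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


print(accelerate_endpoint("pizza-supplies"))
# boto3 equivalent, once acceleration is enabled on the bucket:
#   from botocore.config import Config
#   s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
```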
🍒

S3 Select

SQL on objects

Normally if you want one cherry from a 10-gallon drum of fruit, you have to download the entire drum. S3 Select lets you run a SQL-like query inside S3 before the data leaves storage. Only the matching rows are sent back to you.

SELECT s.name, s.revenue
FROM s3object s
WHERE s.revenue > 1000000

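Works on CSV, JSON, and Parquet files. Can cut data transfer by 80–98% for analytical queries. Note that AWS closed S3 Select to new customers in July 2024; existing customers can keep using it, and AWS points others to alternatives such as Amazon Athena.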
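A request for the query above on a CSV file might look like this. A sketch assuming boto3, an account that can still use S3 Select, a CSV file with a header row, and placeholder bucket and key names; `select_request` is hypothetical. Because CSV values arrive as text, the sketch casts `revenue` to a number before comparing.

```python
def select_request(bucket: str, key: str, min_revenue: int) -> dict:
    """Arguments for boto3's select_object_content on a CSV with headers."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": (
            "SELECT s.name, s.revenue FROM s3object s "
            f"WHERE CAST(s.revenue AS INT) > {int(min_revenue)}"
        ),
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},  # first row = column names
        "OutputSerialization": {"JSON": {}},
    }


req = select_request("pizza-supplies", "reports/sales.csv", 1_000_000)
print(req["Expression"])
# With credentials configured, only matching rows stream back:
# for event in boto3.client("s3").select_object_content(**req)["Payload"]:
#     if "Records" in event:
#         print(event["Records"]["Payload"].decode())
```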

📋 Management & Analytics

📊

S3 Inventory

A daily or weekly CSV/ORC/Parquet report of every object in your bucket — its size, storage class, encryption status, replication status, and more. Far cheaper than using the List API to manually count billions of objects.
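Setting up an inventory is a single bucket-level configuration. A sketch assuming boto3's request shape for `put_bucket_inventory_configuration`; the report bucket, prefix, and configuration ID are placeholders, and the destination bucket also needs a bucket policy that lets S3 write the reports into it.

```python
INVENTORY_CONFIGURATION = {
    "Id": "weekly-parquet-inventory",
    "IsEnabled": True,
    "Destination": {
        "S3BucketDestination": {
            "Bucket": "arn:aws:s3:::pizza-supplies-reports",  # where reports land
            "Format": "Parquet",                                # or "CSV" / "ORC"
            "Prefix": "inventory",
        }
    },
    "IncludedObjectVersions": "Current",
    "OptionalFields": ["Size", "StorageClass", "EncryptionStatus", "ReplicationStatus"],
    "Schedule": {"Frequency": "Weekly"},                        # or "Daily"
}
# With credentials configured:
# boto3.client("s3").put_bucket_inventory_configuration(
#     Bucket="pizza-supplies",
#     Id=INVENTORY_CONFIGURATION["Id"],
#     InventoryConfiguration=INVENTORY_CONFIGURATION)
```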

🖥️

S3 Storage Lens

An organisation-wide dashboard covering all accounts and regions. Gives contextual recommendations — e.g. "You're paying S3 Standard prices for 800 TB of data no one has accessed in 4 months. Move it to Intelligent-Tiering."

| Tier | Metrics | Cost |
|---|---|---|
| Free | 62 metrics, 14 days history | $0 |
| Advanced | 136+ metrics, 15 months history, prefix-level | Per-object charge |
🤖

2025–2026 Evolution

S3 is Now an AI Engine

As of 2026, S3 has evolved from a storage service into the active data layer of the AI era. Three new capabilities — S3 Tables, S3 Vectors, and S3 Access Grants — have fundamentally changed what S3 can do.

📊

S3 Tables

Up to 3× faster queries

Traditional S3 objects (CSV, JSON) are hard for big data engines to read efficiently. S3 Tables are a new bucket type built on the Apache Iceberg open-table format. They self-optimise, automatically compacting small files and cleaning up old snapshots, so that query engines like Athena and Spark get up to 3× faster queries and up to 10× more transactions per second than with self-managed Iceberg tables in ordinary S3 buckets.

Apache Iceberg format · Auto-compaction · ACID transactions · Time-travel queries
🧠

S3 Vectors

Up to 90% cheaper than vector DBs

AI models think in vectors — long lists of numbers that represent the meaning of text, images, or audio. S3 Vectors lets AI agents store and query billions of these vectors directly in S3, at up to 90% lower cost than a dedicated vector database. This is how an AI can remember every customer interaction, every document it has ever read, at petabyte scale.

🍳 Kitchen analogy: Imagine the AI chef has a card catalogue of 10 billion flavour memories, covering every dish and every ingredient combination. S3 Vectors stores all of these memories cheaply, so the AI can quickly look up "what went well with truffle?" across 10 billion past experiences.
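Under the hood, "looking up a memory" means finding the stored vectors closest to a query vector. Here is a brute-force, pure-Python sketch of that idea using cosine similarity (a common choice of distance metric) and made-up three-number embeddings. It is not the S3 Vectors API, which runs this kind of similarity search as a managed service over far larger indexes.

```python
import math


def cosine_similarity(a, b):
    """1.0 means the vectors point the same way (same 'meaning')."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query, memories, k=2):
    """Brute-force nearest-neighbour search over {label: vector} pairs."""
    scored = [(cosine_similarity(query, vec), label) for label, vec in memories.items()]
    return [label for _, label in sorted(scored, reverse=True)[:k]]


# Toy 3-dimensional "flavour" embeddings; real embeddings have hundreds
# or thousands of dimensions and come from an embedding model.
memories = {
    "truffle + mushroom risotto": [0.9, 0.1, 0.2],
    "truffle + fried egg": [0.8, 0.2, 0.1],
    "lemon sorbet": [0.1, 0.9, 0.3],
}
print(top_k([0.85, 0.15, 0.15], memories))
```

The two truffle dishes come back and the sorbet does not. Scanning every vector like this is fine for three memories but not for billions, which is exactly the work a managed vector index takes off your hands.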
🏢

S3 Access Grants

Enterprise IAM integration

Managing thousands of IAM policies was a nightmare at enterprise scale. S3 Access Grants connects S3 to your existing corporate identity directory (Active Directory, Okta, etc.). When a new employee joins, they automatically get the right S3 access based on their job role — no one needs to write a new IAM policy for them.

AWS IAM Identity Center · Active Directory sync · Prefix-level grants
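Conceptually, a grant ties a directory group to an S3 location and a permission level. A toy model of that matching logic, not the Access Grants API: the groups, bucket, and `allowed_actions` helper are hypothetical. Real Access Grants hands back temporary credentials scoped to the granted location rather than a set of strings.

```python
# Hypothetical grants: (directory group, S3 location scope, permission).
# Access Grants permission levels are READ, WRITE, and READWRITE.
GRANTS = [
    ("chefs", "s3://pizza-supplies/recipes/*", "READWRITE"),
    ("waiters", "s3://pizza-supplies/menu/*", "READ"),
    ("auditors", "s3://pizza-supplies/*", "READ"),
]


def allowed_actions(groups, s3_uri):
    """Union of what a user's directory groups may do at s3_uri."""
    actions = set()
    for group, scope, permission in GRANTS:
        # A scope ending in "*" covers every key under that prefix.
        if group in groups and s3_uri.startswith(scope.rstrip("*")):
            actions |= {"READ", "WRITE"} if permission == "READWRITE" else {permission}
    return actions


# A new waiter synced from the corporate directory gets access with no new IAM policy.
print(allowed_actions(["waiters"], "s3://pizza-supplies/menu/today.pdf"))  # {'READ'}
```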

Summary

The Professional S3 Practitioner

Mastering S3 means understanding all six layers: Foundations (buckets, objects, keys), Storage Classes (cost vs. speed trade-offs), Lifecycle automation (moving and expiring data), Security (Block Public Access, IAM, encryption, Object Lock), Performance tools (Multipart, Acceleration, Select), and the new AI capabilities (Tables, Vectors, Access Grants).

As S3 approaches its 20th anniversary in 2026, it has transformed from a simple file store into the central nervous system of the AI-driven enterprise. Understanding its mechanics deeply is an essential skill for the modern era.