HashMaps & Dictionaries, Explained Simply
Channel: Nic Barker
Full video link:
https://www.youtube.com/watch?v=y11XNXi9dgs
Data Structure • easy
Hash maps store key-value pairs and are one of the most important tools for reducing nested loops to single-pass solutions.
They are powered by hash functions that map keys to buckets. Collisions happen and are resolved internally with chaining or probing.
In practice, hash maps are everywhere: caching, joins, deduplication, indexing, rate limits, and identity maps.
Typical Complexity Baseline
| Metric | Value |
|---|---|
| Insert/find average | O(1) |
| Insert/find worst case | O(n) |
| Space | O(n) |
Prefer video learning? This explainer gives a quick visual walkthrough of the core idea before you dive into the detailed sections below.
Learn the core building blocks and terminology in one place before the comparisons, so the mechanics are clear.
Hash function
What it is: A hash function turns a key (like "user:42") into a deterministic integer. The table then maps that integer to a bucket index, typically by taking it modulo the bucket count.
Why it matters: Distribution quality controls performance. If hashes spread keys evenly, bucket chains stay short and lookups remain close to O(1) on average. Good hash functions are fast and distribute keys uniformly; they do not need to be cryptographic for hash map use, but they must avoid clustering.
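As a minimal sketch, the widely used 32-bit FNV-1a scheme shows both properties: the same key always yields the same integer, and modulo maps that integer into a valid bucket range.

```typescript
// 32-bit FNV-1a: deterministic, fast, and reasonably uniform for short string keys.
function fnv1a(key: string): number {
  let hash = 2166136261 // FNV offset basis
  for (let i = 0; i < key.length; i += 1) {
    hash ^= key.charCodeAt(i)
    hash = Math.imul(hash, 16777619) // FNV prime, 32-bit multiply
  }
  return hash >>> 0 // force unsigned 32-bit
}

const bucketCount = 16
const index = fnv1a("user:42") % bucketCount
// Same key, same hash, same bucket index on every run.
console.log(fnv1a("user:42") === fnv1a("user:42")) // true
console.log(index >= 0 && index < bucketCount) // true
```

The constants are the standard FNV-1a parameters; any hash with similar speed and spread works here.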
Buckets
What it is: Buckets are storage slots in the table. Each bucket holds zero or more entries, usually as a linked list or dynamic array.
Why it matters: Bucket count determines collision pressure. Too few buckets means long chains and slower operations. More buckets means better speed but higher memory use.
Collision handling
What it is: A collision happens when two different keys land in the same bucket index. Common strategies are separate chaining and open addressing.
Why it matters: Correct collision handling is what makes hash maps trustworthy. Without it, inserts overwrite each other and lookups become incorrect.
Load factor
What it is: Load factor is the ratio of entries to buckets: entryCount / bucketCount (n / m). It measures how full the table is.
Why it matters: As the load factor rises, collisions increase and average lookup cost grows, so most implementations resize before it gets too high. Load factor is the control metric for balancing speed against memory: a high load factor means more keys per bucket and a higher collision probability.
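A small numeric sketch makes the threshold concrete (0.75 is a common default, not a universal constant):

```typescript
// Load factor = entries / buckets; resize once it crosses a threshold.
const RESIZE_THRESHOLD = 0.75 // common default; the exact value varies by implementation

function shouldResize(entryCount: number, bucketCount: number): boolean {
  return entryCount / bucketCount >= RESIZE_THRESHOLD
}

console.log(shouldResize(11, 16)) // false — load factor 0.6875
console.log(shouldResize(12, 16)) // true — load factor 0.75 triggers a resize
```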
Rehashing
What it is: Rehashing rebuilds the table at a larger capacity: it allocates a bigger bucket array and reinserts every existing entry so each key lands in its new bucket index.
Why it matters: Rehashing keeps long-term performance stable. A single resize costs O(n), but amortized over many inserts it preserves near-O(1) average insertion and lookup, preventing every operation from slowing down as data grows.
Follow the full hash map lifecycle from key hashing to bucket selection, collision resolution, and resize behavior.
Step 1
Start with the lookup or insert key. Run it through the hash function to produce a deterministic integer.
Why this step matters: Deterministic hashing guarantees the same key always maps to the same candidate bucket before collision checks.
Step 2
Use modulo with bucket count to transform the large hash number into one valid bucket index.
Why this step matters: Modulo makes bucket selection bounded and stable, which keeps average lookup close to O(1) when distribution is healthy.
Step 3
If multiple keys map to the same bucket, scan the chain/probe candidates and compare real key equality.
Why this step matters: Collision handling preserves correctness; hash equality alone is never enough to decide key equality.
Step 4
For lookup: return value when key matches. For insert: update existing key or append a new entry in the collision structure.
Why this step matters: This is where data correctness is guaranteed in the presence of collisions and repeated writes.
Step 5
When entry count relative to bucket count crosses threshold, allocate a larger table and rehash all keys.
Why this step matters: Rehashing prevents chain growth from degrading performance and keeps long-term operations fast.
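Steps 3 and 4 can be sketched with a deliberately tiny table: a single bucket, so every key collides, which makes it obvious that correctness comes from comparing full keys in the chain rather than from the hash alone.

```typescript
type Entry = { key: string; value: number }

// One bucket forces every key into the same chain, simulating worst-case collisions.
const bucket: Entry[] = []

function put(key: string, value: number): void {
  const existing = bucket.find((entry) => entry.key === key) // full key equality (step 4)
  if (existing) {
    existing.value = value // repeated write updates, never duplicates
  } else {
    bucket.push({ key, value })
  }
}

function getValue(key: string): number | null {
  const match = bucket.find((entry) => entry.key === key) // chain scan (step 3)
  return match ? match.value : null
}

put("apple", 1)
put("banana", 2)
put("apple", 3) // update, not a second entry

console.log(getValue("apple")) // 3
console.log(getValue("banana")) // 2
console.log(getValue("cherry")) // null
```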
vs Array
When to choose this: Choose hash maps for key-based lookup not tied to numeric index.
Tradeoff: Memory overhead is higher than compact arrays.
vs Trie
When to choose this: Choose hash maps for exact-key retrieval.
Tradeoff: A trie performs better for prefix-heavy workloads such as autocomplete.
vs Balanced Tree Map
When to choose this: Choose hash maps for fastest average lookup.
Tradeoff: Tree maps keep keys sorted and support range queries.
Redis
Redis dictionaries are hash-table based and power core key lookup behavior.
Takeaway: Fast key indexing is foundational for low-latency caching.
Shopify
Order and session services use hash-style key lookup for idempotency and dedup checks.
Takeaway: Hash maps help enforce correctness in high-throughput commerce workflows.
Cloudflare
Rate-limiting and bot protection paths rely on quick key counters by IP/token signatures.
Takeaway: Hash maps turn high-cardinality counting into practical real-time control.
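A hypothetical fixed-window counter sketches this pattern; the function names and the 100-request limit are illustrative, not any vendor's actual design.

```typescript
// Fixed-window rate limiter: one Map entry per client key.
const WINDOW_LIMIT = 100 // illustrative threshold, not a real product setting
const requestCounts = new Map<string, number>()

function allowRequest(clientKey: string): boolean {
  const count = (requestCounts.get(clientKey) ?? 0) + 1
  requestCounts.set(clientKey, count)
  return count <= WINDOW_LIMIT
}

// High-cardinality keys (one per IP or token signature) stay O(1) per check.
console.log(allowRequest("ip:203.0.113.7")) // true on the first request
```

A production limiter would also expire windows and bound memory; the point here is only the O(1) per-key counting.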
Build your own bucket table, hash function, collision chain, and resize flow so the structure is explicit rather than hidden behind built-in map classes.
Complexity: Average insert/get O(1), resize O(n), space O(n)
Hash map implementation from scratch (chaining)
```typescript
type HashNode = { key: string; value: number; next: HashNode | null }

class SimpleHashMap {
  private buckets: Array<HashNode | null>
  private entryCount = 0

  constructor(private capacity = 16) {
    this.buckets = new Array<HashNode | null>(capacity).fill(null)
  }

  // 32-bit FNV-1a hash, reduced to a bucket index with modulo.
  private hashKey(key: string): number {
    let hashValue = 2166136261
    for (let index = 0; index < key.length; index += 1) {
      hashValue ^= key.charCodeAt(index)
      hashValue = Math.imul(hashValue, 16777619)
    }
    return (hashValue >>> 0) % this.capacity
  }

  // Double capacity and reinsert every entry once load factor reaches 0.75.
  private resizeIfNeeded(): void {
    const loadFactor = this.entryCount / this.capacity
    if (loadFactor < 0.75) {
      return
    }
    const oldBuckets = this.buckets
    this.capacity *= 2
    this.buckets = new Array<HashNode | null>(this.capacity).fill(null)
    this.entryCount = 0
    for (const bucketHead of oldBuckets) {
      let currentNode = bucketHead
      while (currentNode) {
        this.set(currentNode.key, currentNode.value)
        currentNode = currentNode.next
      }
    }
  }

  set(key: string, value: number): void {
    this.resizeIfNeeded() // resize first so the computed bucket index stays valid
    const bucketIndex = this.hashKey(key)
    let currentNode = this.buckets[bucketIndex]
    while (currentNode) {
      if (currentNode.key === key) {
        currentNode.value = value // existing key: update in place
        return
      }
      currentNode = currentNode.next
    }
    // New key: prepend a node to the bucket's chain.
    this.buckets[bucketIndex] = { key, value, next: this.buckets[bucketIndex] }
    this.entryCount += 1
  }

  get(key: string): number | null {
    const bucketIndex = this.hashKey(key)
    let currentNode = this.buckets[bucketIndex]
    while (currentNode) {
      if (currentNode.key === key) {
        return currentNode.value
      }
      currentNode = currentNode.next
    }
    return null
  }
}

function firstDuplicateUsingCustomHashMap(values: number[]): number | null {
  const countsByValue = new SimpleHashMap()
  for (const value of values) {
    const key = String(value)
    const previousCount = countsByValue.get(key) ?? 0
    const nextCount = previousCount + 1
    countsByValue.set(key, nextCount)
    if (nextCount > 1) {
      return value
    }
  }
  return null
}
```
Use these signals to decide if this data structure/algorithm is the right fit before implementation:
You need O(1) average lookup/insert/delete for key-based access patterns.
Caveat: the O(1) is an average; adversarial keys can degrade performance toward O(n).
Practice Problems
| # | Problem | Difficulty | Typical Complexity |
|---|---|---|---|
| 1 | Contains Duplicate | Easy | O(n) |
| 2 | Valid Anagram | Easy | O(n) |
| 3 | Two Sum | Easy | O(n) |
| 4 | Group Anagrams | Medium | O(n*k log k) |
| 5 | Top K Frequent Elements | Medium | O(n log k) |
| 6 | Subarray Sum Equals K | Medium | O(n) |
| 7 | Longest Consecutive Sequence | Medium | O(n) |
| 8 | 4Sum II | Medium | O(n^2) |
| 9 | Minimum Window Substring | Hard | O(n) |
| 10 | LFU Cache | Hard | O(1) amortized |
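Two Sum (#3 above) is the canonical single-pass pattern from the table: store each seen value's index, and check for the complement before inserting.

```typescript
// Two Sum: one pass, O(n) time, using a Map from value to index.
function twoSum(numbers: number[], target: number): [number, number] | null {
  const indexByValue = new Map<number, number>()
  for (let i = 0; i < numbers.length; i += 1) {
    const complement = target - numbers[i]
    const j = indexByValue.get(complement)
    if (j !== undefined) {
      return [j, i] // complement was seen earlier
    }
    indexByValue.set(numbers[i], i)
  }
  return null
}

console.log(twoSum([2, 7, 11, 15], 9)) // [0, 1]
console.log(twoSum([1, 2, 3], 100)) // null
```

This is the nested-loop-to-single-pass reduction from the introduction: the Map replaces the inner O(n) scan with an O(1) average lookup.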