
Data Structure · Easy

Hash Map / Dictionary Complete Guide

Hash maps store key-value pairs and are one of the most important tools for reducing nested loops to single-pass solutions.

They are powered by hash functions that map keys to buckets. Collisions happen and are resolved internally with chaining or probing.

In practice, hash maps are everywhere: caching, joins, deduplication, indexing, rate limits, and identity maps.

Typical Complexity Baseline

Metric | Value
Insert / find (average) | O(1)

Core Concepts

Learn the core building blocks and terminology in one place before moving on to comparisons, so the mechanics are clear.

Hash function

What it is: A hash function turns a key (like "user:42") into a deterministic integer. The table then converts that integer to a bucket index, typically with modulo by the bucket count.

Why it matters: Distribution quality controls performance: if hashes spread keys evenly, bucket chains stay short and lookups remain close to O(1) on average. Good hash functions are fast and distribute keys uniformly; they need not be cryptographic for hash-map use, but they must avoid clustering.
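As a sketch, here is a minimal FNV-1a-style string hash, the same family the implementation section below uses (the name fnv1a is illustrative, not a library API):

```typescript
// Illustrative FNV-1a-style string hash (non-cryptographic).
// The same input always yields the same 32-bit unsigned output.
function fnv1a(key: string): number {
  let hash = 2166136261 // FNV offset basis
  for (let i = 0; i < key.length; i += 1) {
    hash ^= key.charCodeAt(i)        // mix in one character
    hash = Math.imul(hash, 16777619) // FNV prime, 32-bit multiply
  }
  return hash >>> 0 // force unsigned 32-bit
}

// Deterministic: repeated calls on the same key always agree.
const first = fnv1a("user:42")
const second = fnv1a("user:42")
// first === second is always true; different keys usually differ.
```

Determinism is the non-negotiable property; uniform distribution is what keeps bucket chains short in practice.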

Buckets

What it is: Buckets are storage slots in the table. Each bucket holds zero or more entries, usually as a linked list or dynamic array.

Why it matters: Bucket count determines collision pressure. Too few buckets means long chains and slower operations. More buckets means better speed but higher memory use.

Collision handling

What it is: A collision happens when two different keys land in the same bucket index. Common strategies are separate chaining and open addressing.

Why it matters: Correct collision handling is what makes hash maps trustworthy. Without it, inserts overwrite each other and lookups become incorrect.

Load factor

What it is: Load factor is the ratio of entries to buckets, entryCount / bucketCount (n / m). It measures how full the table is.

Why it matters: As the load factor rises, collisions increase and average lookup cost grows, so most implementations resize before it gets too high. It is the control metric for trading speed against memory: a high load factor means more entries per bucket and a higher collision probability.
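The resize check is one division and one comparison. This sketch uses 0.75 as the threshold, a common default (Java's HashMap uses it) but ultimately implementation-specific:

```typescript
// Hypothetical helper: decide whether a table should resize
// based on its current load factor.
function shouldResize(entryCount: number, bucketCount: number, threshold = 0.75): boolean {
  return entryCount / bucketCount >= threshold
}

shouldResize(11, 16) // 0.6875 → false, still room
shouldResize(12, 16) // 0.75   → true, time to grow
```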

Rehashing

What it is: Rehashing allocates a larger bucket table and reinserts every existing entry so each key lands in its new bucket index.

Why it matters: Rehashing keeps long-term performance stable. A single resize is O(n), but amortized over many inserts it preserves near-O(1) average insertion and lookup, and it prevents every operation from slowing down as data grows.


Putting It All Together

Follow the full hash map lifecycle from key hashing to bucket selection, collision resolution, and resize behavior.

Step 1

Hash the key into a deterministic number

Start with the lookup or insert key. Run it through the hash function to produce a deterministic integer.

key "user:42" → hash() → 2341887901

Why this step matters: Deterministic hashing guarantees the same key always maps to the same candidate bucket before collision checks.

Step 2

Map hash output to a bucket candidate

Use modulo with bucket count to transform the large hash number into one valid bucket index.

bucketIndex = hash % bucketCount; with 16 buckets, every hash maps to a valid index in 0–15 (in this example, bucket 5).

Why this step matters: Modulo makes bucket selection bounded and stable, which keeps average lookup close to O(1) when distribution is healthy.
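The mapping itself is one line. This sketch assumes the hash is treated as a 32-bit unsigned value before the modulo (the helper name bucketIndex is illustrative):

```typescript
// Map an unsigned 32-bit hash to a valid bucket index.
function bucketIndex(hash: number, bucketCount: number): number {
  return (hash >>> 0) % bucketCount
}

// Result is always in range [0, bucketCount).
const index = bucketIndex(2341887901, 16) // some index in 0..15
```

Note the `>>> 0`: JavaScript bitwise operations produce signed 32-bit values, and a negative hash would yield a negative modulo result, so forcing unsigned first keeps the index valid.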

Step 3

Handle collisions inside the bucket

If multiple keys map to the same bucket, scan the chain/probe candidates and compare real key equality.

Bucket 5 chain: (user:42, profileA) → (order:42, orderMeta) → (session:42, token). Same bucket index, different keys. Lookup rule: compare full key equality while walking the chain.

Why this step matters: Collision handling preserves correctness; hash equality alone is never enough to decide key equality.
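A minimal sketch of the lookup rule, using an array-of-pairs chain (the names findInBucket and Entry are illustrative):

```typescript
type Entry = { key: string; value: string }

// Scan one bucket's chain and compare full keys, never just hashes.
function findInBucket(chain: Entry[], key: string): string | null {
  for (const entry of chain) {
    if (entry.key === key) return entry.value
  }
  return null
}

// The three colliding keys from the diagram above:
const bucket5: Entry[] = [
  { key: "user:42", value: "profileA" },
  { key: "order:42", value: "orderMeta" },
  { key: "session:42", value: "token" },
]
findInBucket(bucket5, "order:42") // → "orderMeta"
```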

Step 4

Resolve operation result from bucket match

For lookup: return value when key matches. For insert: update existing key or append a new entry in the collision structure.

get("user:42") → scan bucket 5 chain → return profileA

Why this step matters: This is where data correctness is guaranteed in the presence of collisions and repeated writes.

Step 5

Resize and rehash when load factor grows

When entry count relative to bucket count crosses threshold, allocate a larger table and rehash all keys.

Load factor exceeds threshold (for example 0.75): 16 buckets → rehash all keys → 32 buckets

Why this step matters: Rehashing prevents chain growth from degrading performance and keeps long-term operations fast.
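You can see how rare resizes are by counting them over a run of inserts. This sketch assumes a 0.75 threshold and doubling growth, both common but implementation-specific choices:

```typescript
// Count how many resize events n inserts trigger under doubling growth.
function countResizes(insertCount: number, initialBuckets = 16, threshold = 0.75): number {
  let buckets = initialBuckets
  let entries = 0
  let resizes = 0
  for (let i = 0; i < insertCount; i += 1) {
    if (entries / buckets >= threshold) {
      buckets *= 2 // rehash event: O(current entries), but rare
      resizes += 1
    }
    entries += 1
  }
  return resizes
}

countResizes(10_000) // → 10 resizes across 10,000 inserts
```

Ten O(n) rehashes spread over ten thousand inserts is why the amortized cost per insert stays near O(1).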

How It Compares

vs Array

When to choose this: Choose hash maps for key-based lookup not tied to numeric index.

Tradeoff: Memory overhead is higher than compact arrays.

vs Trie

When to choose this: Choose hash maps for exact-key retrieval.

Tradeoff: A trie performs better for prefix-heavy workloads.

vs Balanced Tree Map

When to choose this: Choose hash maps for fastest average lookup.

Tradeoff: Tree maps keep keys sorted and support range queries.

Real-World Stories and Company Examples

Redis

Redis dictionaries are hash-table based and power core key lookup behavior.

Takeaway: Fast key indexing is foundational for low-latency caching.

Shopify

Order and session services use hash-style key lookup for idempotency and dedup checks.

Takeaway: Hash maps help enforce correctness in high-throughput commerce workflows.

Cloudflare

Rate-limiting and bot protection paths rely on quick key counters by IP/token signatures.

Takeaway: Hash maps turn high-cardinality counting into practical real-time control.

Implementation Guide

Build your own bucket table, hash function, collision chain, and resize flow so the structure is explicit rather than hidden behind built-in map classes.

Complexity: Average insert/get O(1), resize O(n), space O(n)

Hash map implementation from scratch (chaining)

type HashNode = { key: string; value: number; next: HashNode | null }

class SimpleHashMap {
  private buckets: Array<HashNode | null>
  private entryCount = 0

  constructor(private capacity = 16) {
    this.buckets = new Array<HashNode | null>(capacity).fill(null)
  }

  private hashKey(key: string): number {
    // FNV-1a: XOR in each character, then multiply by the FNV prime.
    let hashValue = 2166136261
    for (let index = 0; index < key.length; index += 1) {
      hashValue ^= key.charCodeAt(index)
      hashValue = Math.imul(hashValue, 16777619)
    }
    // Force unsigned 32-bit, then map into the bucket range.
    return (hashValue >>> 0) % this.capacity
  }

  private resizeIfNeeded(): void {
    // Double capacity once the table is 75% full, then reinsert every entry.
    const loadFactor = this.entryCount / this.capacity
    if (loadFactor < 0.75) {
      return
    }

    const oldBuckets = this.buckets
    this.capacity *= 2
    this.buckets = new Array<HashNode | null>(this.capacity).fill(null)
    this.entryCount = 0

    for (const bucketHead of oldBuckets) {
      let currentNode = bucketHead
      while (currentNode) {
        this.set(currentNode.key, currentNode.value)
        currentNode = currentNode.next
      }
    }
  }

  set(key: string, value: number): void {
    this.resizeIfNeeded()
    const bucketIndex = this.hashKey(key)
    let currentNode = this.buckets[bucketIndex]

    while (currentNode) {
      if (currentNode.key === key) {
        currentNode.value = value
        return
      }
      currentNode = currentNode.next
    }

    // Key not found: prepend a new node to this bucket's chain.
    this.buckets[bucketIndex] = { key, value, next: this.buckets[bucketIndex] }
    this.entryCount += 1
  }

  get(key: string): number | null {
    const bucketIndex = this.hashKey(key)
    let currentNode = this.buckets[bucketIndex]

    while (currentNode) {
      if (currentNode.key === key) {
        return currentNode.value
      }
      currentNode = currentNode.next
    }

    return null
  }
}

function firstDuplicateUsingCustomHashMap(values: number[]): number | null {
  const countsByValue = new SimpleHashMap()

  for (const value of values) {
    const key = String(value)
    const previousCount = countsByValue.get(key) ?? 0
    const nextCount = previousCount + 1
    countsByValue.set(key, nextCount)

    if (nextCount > 1) {
      return value
    }
  }

  return null
}

Common Problems and Failure Modes

  • Assuming worst-case behavior is always O(1); adversarial keys can degrade performance.
  • Using mutable objects as keys without stable hash/equality semantics.
  • Ignoring memory cost when key cardinality grows rapidly.
  • Reimplementing map logic manually instead of using battle-tested library primitives.
  • Forgetting that hash maps do not preserve sorted order by default.

Tips and Tricks

  • Hash map is usually the first check when you need fast existence or frequency lookup.
  • Store exactly what you need as value (count, index, metadata), not entire objects by default.
  • When duplicates matter, count frequencies instead of boolean presence.
  • Think in two passes if one pass gets branch-heavy and error-prone.
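The frequency-count tip above in practice, as a short sketch using the built-in Map (the helper name frequencies is illustrative):

```typescript
// Count occurrences instead of tracking mere presence, so later logic
// can ask "how many?" rather than only "was it seen?".
function frequencies(values: string[]): Map<string, number> {
  const counts = new Map<string, number>()
  for (const value of values) {
    counts.set(value, (counts.get(value) ?? 0) + 1)
  }
  return counts
}

frequencies(["a", "b", "a"]).get("a") // → 2
```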

When to Use

Use these signals to decide if this data structure/algorithm is the right fit before implementation.

Real-system usage signals

  • Average O(1) lookup/insert/delete for key-based access patterns.
  • Transforms expensive membership checks into fast key probes.
  • Helps express intent clearly in business logic and data pipelines.

LeetCode-specific tips (including pattern-identification signals)

  • Identification signal: you need fast existence checks, frequency counts, or key-to-value lookup during one pass.
  • Identification signal: brute force compares every pair, and the optimized version hints at remembering what was seen earlier.
  • When wording includes "duplicate", "count occurrences", or "find complement", hash-map is usually the first candidate.
  • For Hash Map / Dictionary questions, start by naming the core invariant before writing code.
  • Use the constraint section to set time/space target first, then pick the data structure/algorithm.
  • Solve one tiny example by hand and map each transition to your variables before implementing.
  • Run adversarial cases: empty input, duplicates, max-size input, and sorted/reverse patterns when relevant.
  • During interviews, explain why your approach is the right pattern for this prompt, not just why the code works.
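The "find complement" signal above is exactly the classic Two Sum pattern: remember each value's index as you scan, and ask whether the complement was already seen. A sketch:

```typescript
// Two Sum: return indices of two numbers summing to target, else null.
// One pass: check for the complement before recording the current value.
function twoSum(nums: number[], target: number): [number, number] | null {
  const indexByValue = new Map<number, number>()
  for (let i = 0; i < nums.length; i += 1) {
    const complement = target - nums[i]
    const j = indexByValue.get(complement)
    if (j !== undefined) return [j, i] // complement was seen earlier
    indexByValue.set(nums[i], i)
  }
  return null
}

twoSum([2, 7, 11, 15], 9) // → [0, 1], since 2 + 7 = 9
```

This is the canonical reduction the guide keeps pointing at: the brute force compares every pair in O(n^2), while the map remembers what was seen and brings it down to a single O(n) pass.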

LeetCode Progression (Easy to Hard)

# | Problem | Difficulty | Typical Complexity
1 | Contains Duplicate | Easy | O(n)
2 | Valid Anagram | Easy | O(n)
3 | Two Sum | Easy | O(n)
4 | Group Anagrams | Medium | O(n * k log k)
5 | Top K Frequent Elements | Medium | O(n log k)
6 | Subarray Sum Equals K | Medium | O(n)
7 | Longest Consecutive Sequence | Medium | O(n)
8 | 4Sum II | Medium | O(n^2)
9 | Minimum Window Substring | Hard | O(n)
10 | LFU Cache | Hard | O(1) amortized