Skip to main content

HashMap in Java

HashMap

HashMap is one of the most optimized, heavily tested, and important data structures in the entire Java ecosystem. On the surface, it provides a simple interface to store data in a Key $\rightarrow$ Value format. But under the hood, its backend architecture is a masterpiece of algorithmic efficiency and bitwise optimization.

HashMap<Integer, String> map = new HashMap<>();

map.put(101, "Madhusudan");
map.put(102, "Rahul");

1. What Exactly is HashMap?

As a core component of the Java Collections Framework, HashMap is the standard implementation of the Map interface. It relies entirely on the concept of hashing to achieve blazing-fast insertion and retrieval speeds.

Feature HashMap Behavior
Ordering No guarantee of order
Thread Safe No (Use ConcurrentHashMap for threads)
Null Keys Exactly 1 allowed
Null Values Multiple allowed
Average Complexity $O(1)$ for get() and put()

2. The Real Backend Structure

Inside the JVM, a HashMap is not just a single data structure. It is a hybrid architecture utilizing an Array + Linked List + Red-Black Tree.

The main storage is an array of "buckets", defined internally as:

transient Node<K,V>[] table;

Internal Node Structure

Every key-value pair you insert is wrapped in a Node object. Here is the simplified internal code:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}
  • hash: The calculated integer hashcode of the key.
  • key: The actual key object.
  • value: The stored value object.
  • next: A pointer to the next node (used for Linked List chaining during collisions).

3. Step-by-Step Backend Working: map.put()

Let's trace exactly what the JVM does when you execute: map.put("APPLE", 100);

STEP 1: Calculate hashCode()

Java first calls the key's native hash function: "APPLE".hashCode().
For strings, the algorithm evaluates to $s[0] \times 31^{n-1} + s[1] \times 31^{n-2} \dots$
This results in a raw 32-bit integer.

STEP 2: Hash Spreading (The Bitwise Shift)

Java doesn't trust the raw hashcode to be perfectly random. To prevent clustering, it improves the hash quality using a bitwise XOR operation against its own upper 16 bits:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

STEP 3: The Golden Formula (Bucket Calculation)

This is the most critical operation in HashMap. Java must map the hash to a valid array index (0 to 15). Instead of using the slow modulo operator (%), it uses a bitwise AND.

$$ \text{index} = (n - 1) \ \& \ \text{hash} $$

Where n is the current capacity of the array (default 16). Because (n-1) & hash only works efficiently when n is a power of 2, HashMap capacity is strictly enforced to be 16, 32, 64, 128, etc.


4. Handling Collisions & The equals() Contract

What happens if two different keys calculate to the exact same bucket index? Example:

map.put(1, "A"); // Index: (16-1) & 1 = 1
map.put(17, "B"); // Index: (16-1) & 17 = 1

Both map to table[1]. This is a Collision. Java solves this by chaining them into a Linked List at that bucket index: Node(1,A) → Node(17,B)

Why are both hashCode() and equals() required?

If you use a custom object as a key (like a Student class) and forget to override equals() and hashCode(), retrieval will fail. The default Object implementation compares memory addresses. During a get(), even if the hash matches, Java relies on equals() to traverse the linked list and confidently find the exact matching key object.


5. Java 8 Optimization: Treeification

Before Java 8, massive collisions degraded HashMap performance from $O(1)$ to a disastrous $O(n)$ because it meant scanning massive linked lists.

Java 8 introduced Treeification. If a single bucket's linked list grows too large, the node chain violently upgrades itself from a Linked List to a Red-Black Tree.

  • Trigger Condition: bucket_size $\ge 8$ AND table_capacity $\ge 64$
  • Resulting Complexity: The worst-case lookup drops drastically from $O(n)$ down to $O(\log n)$.

6. Resizing and Rehashing

A HashMap keeps track of how "full" it is using a loadFactor (default 0.75).

Threshold = Capacity $\times$ LoadFactor ($16 \times 0.75 = 12$).

The moment you insert the 13th element, the HashMap panics. It doubles the capacity from 16 to 32. But it cannot just copy the old array over. Because the capacity (n) changed, the golden formula (n-1) & hash results in entirely new indexes. Every single element must be completely redistributed. This heavy operation is known as Rehashing.

Critical Interview Concept: Immutable Keys

Why are String and Integer the best keys for a HashMap? Because they are immutable. If a mutable key changes its state after being inserted, its hashCode changes. When you try to get() it later, the formula will calculate the wrong bucket index, and the value will be permanently lost in memory.

The Ultimate Architecture Cheat Sheet

Operation Average Case Worst Case (Heavy Collisions)
put() $O(1)$ $O(\log n)$ (Red-Black Tree)
get() $O(1)$ $O(\log n)$
remove() $O(1)$ $O(\log n)$

Backend Optimizations at a Glance:

  • Tail Insertion: Java 8 inserts new nodes at the tail of the bucket's list to preserve order and reduce tracking complexity (older Java inserted at the head).
  • Null Keys: Handled flawlessly. A null key skips the math and is hardcoded to be stored immediately in table[0].
  • Fail-Fast Iterator: If the map is structurally modified while iterating (tracked via an internal modCount variable), it instantly throws a ConcurrentModificationException.

Comments

Popular posts from this blog

How I Got Selected in MNC

Virtusa Sometimes success does not come from having the best coding skills or the perfect roadmap. Sometimes it comes from simply refusing to quit. This is the honest story of how I transitioned from a confused, rejected fresher to getting selected as an Associate Engineer at Virtusa. The Beginning: Confused About My Future After completing my graduation, I stared blankly at my career options. Like many freshers, I lacked a clear direction. Should I join a Java course? Should I prepare on my own? Should I just wait for campus placement opportunities? One day, I called my friend Chetan. He suggested I join Naresh i Technologies and start learning Java seriously. Still unsure of my path, I told him I needed time to think about it. A couple of days later, my phone buzzed with a WhatsApp message offering a job opportunity. They asked me to come for the next round of the recruitment process. Excitement completely took over. I packed my bags, traveled to th...

Spring Boot Introduction

Spring Boot Introduction: Architecture, Dependencies, and Embedded Servers Modern enterprise applications demand rapid development, frictionless deployment, and absolute minimal configuration. Before Spring Boot arrived, developers utilizing the Spring Framework wasted immense amounts of time configuring XML files, managing clashing dependencies, setting up clunky application servers, and stitching various Spring modules together manually. To eliminate these bottlenecks, Pivotal introduced Spring Boot . Built entirely on top of the traditional Spring Framework, Spring Boot is an "opinionated" framework. It aggressively simplifies application development by injecting auto-configuration, packaging starter dependencies, and embedding web servers directly into your application. This allows backend developers to focus entirely on building business logic rather than wrestling with infrastructure setup. What is Spring Boot? Spring Boot is a powerful extens...

Strings in C

C Programming: Working with Strings Unlike modern programming languages like Python or Java, C does not possess a dedicated "String" data type. Instead, C treats a string as a simple 1D array of characters. Real-life example: Think of a freight train. The train does not exist as one solid object; it consists of individual boxcars linked together. Similarly, a C string links individual characters side-by-side in memory. To let the computer know the train has ended, C attaches a special "caboose" called the Null Terminator ( \0 ). 1. Essential String Functions Handling strings manually requires complex loops. To save time, C provides a built-in library called <string.h> that contains powerful functions to manipulate text. strlen(): You use this to find the exact length of a string. The compiler counts the characters until it hits the \0 terminator. It does not count the terminator itself. strcpy(): You ...