The Set Architecture

Official Definition: "A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2)." — java.util.Set

1. The Complete Hierarchy

Before understanding the backend, you must understand the architectural flow. The Set interface inherits directly from Collection.

Iterable

⬇

Collection

⬇

Set

HashSet

⬇

LinkedHashSet

SortedSet

⬇

NavigableSet

⬇

TreeSet

2. How Set Prevents Duplicates

This is a critical backend concept. A Set does not write manual for loops to check if an element exists. Instead, the Set interface internally relies on Map implementations.

Uniqueness is guaranteed strictly by utilizing two core Object methods:

hashCode(): Finds the memory bucket.
equals(): Checks if the object exactly matches existing data in that bucket.

Set<Integer> set = new HashSet<>();
set.add(10);
set.add(10); // Evaluates to duplicate, rejected silently
System.out.println(set); // Output: [10]
      

HashSet Deep Dive

Official Definition: "This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set." — java.util.HashSet

1. The Internal Backend Reality

If you look at the raw Java source code for HashSet, you will discover that it doesn't store data by itself. It is merely a wrapper around a HashMap.

// Inside java.util.HashSet
private transient HashMap<E,Object> map;

// The Hidden Dummy Object
private static final Object PRESENT = new Object();
      

Because a Map requires both a Key and a Value, the HashSet stores your actual data as the Key. It uses a static, empty dummy object called PRESENT to fill the Value requirement.

2. Backend Flow of add()

When you call set.add("Java"), this is the exact logic executed:

public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}
      

If map.put() returns null, it means the key didn't exist, and the insertion was successful. If it returns an object (the PRESENT object), it means the key was a duplicate, and the Set rejects it by returning false.

3. The Bucket Array & Index Formula

The backing HashMap uses an array of buckets. Java calculates exactly where to put your data using a bitwise formula:

index = (n - 1) & hash

Where n is the table size. This bitwise operation is significantly faster than standard modulo (%) arithmetic.

4. Collision Handling & Treeification (Java 8)

If two distinct keys generate the same hash and land in the same bucket, a Collision occurs.

Java 7: Handled collisions using a standard Linked List ($O(n)$ search time).
Java 8+: If a single bucket exceeds 8 elements (and total table size $\ge 64$), the bucket's linked list violently converts into a Red-Black Tree. This drops search complexity from $O(n)$ down to a highly optimized $O(\log n)$.

5. The Mutable Key Danger

Never use mutable objects (objects whose data can change after creation) as elements in a HashSet. If you insert a Student object, and later change the Student's ID, its hashCode() will change. The Set will never be able to find it in the original bucket, creating a fatal memory leak and broken lookups.

LinkedHashSet Backend

Official Definition: "Hash table and linked list implementation of the Set interface, with predictable iteration order. This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries." — java.util.LinkedHashSet

1. Why was it created?

A standard HashSet stores items chaotically based on memory hashes. If you insert 10, 50, and 20, they might print out as 50, 10, 20. LinkedHashSet was engineered specifically to solve this by explicitly preserving Insertion Order.

2. Internal Data Structure

Just as a HashSet wraps a HashMap, a LinkedHashSet wraps a LinkedHashMap. It uses a hybrid architecture:

Hash Table: For extremely fast $O(1)$ lookups.
Doubly Linked List: To string all the entries together in the exact order they were inserted.

// Internal Entry Structure inherited from LinkedHashMap
static class Entry<K,V> extends HashMap.Node<K,V> {
    Entry<K,V> before;
    Entry<K,V> after;
}
      

3. Memory vs. Performance Tradeoff

The time complexities for add(), remove(), and contains() remain $O(1)$. However, it requires significantly more memory because every single node must store two additional memory pointers (before and after).

Real-world use case: Caching recent search history where uniqueness and chronological order are equally vital.

SortedSet & NavigableSet

SortedSet: "A Set that further provides a total ordering on its elements."
NavigableSet: "A SortedSet extended with navigation methods reporting closest matches for given search targets."

1. The SortedSet Interface

The SortedSet interface dictates that elements must be automatically sorted either by their Natural Ordering (using Comparable) or by a custom injected Comparator.

first() / last(): Returns the lowest or highest elements.
headSet(toElement): Returns a view of the set strictly less than the target.
tailSet(fromElement): Returns a view of the set greater than or equal to the target.
subSet(from, to): Returns a strictly bounded range view.

2. The NavigableSet Interface

Introduced later to provide advanced "closest-match" algorithms. It extends SortedSet.

Method	Behavior
`ceiling(e)`	Returns the smallest element $\ge e$ (or null if none).
`floor(e)`	Returns the largest element $\le e$ (or null if none).
`higher(e)`	Returns the smallest element strictly > e.
`lower(e)`	Returns the largest element strictly < e.
`pollFirst() / pollLast()`	Retrieves and removes the lowest/highest element.

TreeSet & Red-Black Trees

Official Definition: "A NavigableSet implementation based on a TreeMap. The elements are ordered using their natural ordering, or by a Comparator provided at set creation time." — java.util.TreeSet

1. The Internal Backend

TreeSet completely abandons Hash Tables. It does not use hashCode() or buckets. Internally, it is backed by a TreeMap, which utilizes a Red-Black Tree—a self-balancing binary search tree.

// Internal Node Structure
static final class Entry<K,V> {
    K key;
    Entry<K,V> left;
    Entry<K,V> right;
    Entry<K,V> parent;
    boolean color; // RED or BLACK
}
      

2. The Balancing Act

If you insert sequential data (e.g., 10, 20, 30, 40), a standard binary tree would devolve into a linked list, degrading search time to $O(n)$.

The Red-Black Tree enforces strict mathematical coloring rules. If the tree becomes top-heavy, it automatically executes Rotations and Recoloring to ensure the maximum depth is minimized. This guarantees that add(), remove(), and contains() strictly operate in $O(\log n)$ time.

3. The Null Restriction

You cannot add null to a TreeSet. Because it relies heavily on comparing elements (e.g., element.compareTo(target)), attempting to compare a null reference immediately triggers a NullPointerException.

Summary & Interview Guide

1. The Ultimate Comparison

Feature	HashSet	LinkedHashSet	TreeSet
Internal Backend	HashMap	LinkedHashMap	TreeMap (Red-Black Tree)
Element Ordering	Chaotic / None	Insertion Order	Sorted (Natural/Comparator)
Null Elements	Yes (1 allowed)	Yes (1 allowed)	No (Throws NPE)
Speed (add/search)	$O(1)$	$O(1)$	$O(\log n)$
Memory Footprint	Medium	High (Extra Pointers)	Highest (Tree Pointers + Colors)

2. Important Interview Questions

Q: What is the backend of HashSet?
A: A HashMap. It stores the actual elements as the keys, and uses a dummy static object named PRESENT for the values.
Q: Why is TreeSet slower than HashSet?
A: Because TreeSet must perform mathematical comparisons, node recoloring, and physical tree rotations upon insertion to maintain a perfectly balanced Red-Black Tree ($O(\log n)$), whereas HashSet just computes a hash and drops the item in a bucket ($O(1)$).
Q: Why do equals() and hashCode() matter for Sets?
A: For hash-based sets, hashCode() determines the memory bucket. If two objects land in the same bucket, equals() is executed to confirm if they are truly identical to prevent duplicates.

How I Got Selected in MNC

Virtusa Sometimes success does not come from having the best coding skills or the perfect roadmap. Sometimes it comes from simply refusing to quit. This is the honest story of how I transitioned from a confused, rejected fresher to getting selected as an Associate Engineer at Virtusa. The Beginning: Confused About My Future After completing my graduation, I stared blankly at my career options. Like many freshers, I lacked a clear direction. Should I join a Java course? Should I prepare on my own? Should I just wait for campus placement opportunities? One day, I called my friend Chetan. He suggested I join Naresh i Technologies and start learning Java seriously. Still unsure of my path, I told him I needed time to think about it. A couple of days later, my phone buzzed with a WhatsApp message offering a job opportunity. They asked me to come for the next round of the recruitment process. Excitement completely took over. I packed my bags, traveled to th...

Learn Here Fun Pedia

Search This Blog

Set summary