Set in Java

Set: Complete Deep Dive & Backend Working

The Set interface is a cornerstone of the Java Collection Framework. While a List focuses on maintaining exact insertion order and happily accepts duplicate values, a Set enforces absolute mathematical uniqueness.

You should instantly reach for a Set when your data requirements dictate:

Strict Uniqueness: No duplicate elements are permitted.
High-Speed Searching: You need to constantly check if an element exists without iterating over the entire collection.
Data Filtering: You need to automatically strip duplicates from an incoming data stream.

1. The Set Hierarchy

Before diving into the code, you must understand the interface topology. The Set interface inherits directly from Collection.

Iterable (Interface)

⬇

Collection (Interface)

⬇

Set (Interface)

HashSet

LinkedHashSet

SortedSet

⬇

NavigableSet

⬇

TreeSet

2. Package Structure & Declaration

All core Set interfaces and implementation classes reside exclusively in the java.util package.

package java.util;

public interface Set<E> extends Collection<E> {
    // Core methods: add(), remove(), contains(), size()
}
  

3. Why was Set Created? (The Duplicate Problem)

In standard Java programming, you often process data where duplicates are logically invalid. Suppose you are processing an array of User IDs or Email Addresses.

List<Integer> list = new ArrayList<>();
list.add(10);
list.add(10); // Duplicates accepted!
  

If you use a List, you must manually write expensive $O(n)$ loops to check if an ID already exists before inserting it. The Set solves this elegantly: it simply ignores duplicates at the backend architecture level.

Real World Use Cases for Set

User Authentication: Storing unique active session tokens.
E-commerce: Storing the unique categories or tags applied to a product.
Database Operations: Caching unique primary keys fetched from a database table to avoid duplicate API calls.

4. Features of the Set Interface

Because Set is just an interface, its exact behavior depends entirely on which implementation class you instantiate. However, these baseline rules apply:

Feature	Set Behavior
Duplicate Elements	Absolutely No. The `add()` method returns `false` if it exists.
Insertion Order	Depends on class (No for `HashSet`, Yes for `LinkedHashSet`).
Null Values	Depends on class (1 allowed in `HashSet`, None in `TreeSet`).
Index-based Access	No. You cannot do `set.get(3)`. Sets do not have indices.

5. The Ultimate Secret: How Sets Actually Work Internally

The Backend Reality

This is the most critical concept to master for backend interviews: Java Set implementations do not actually exist as standalone data structures.

They are simply wrappers around Java Map classes!

HashSet internally uses a HashMap.
LinkedHashSet internally uses a LinkedHashMap.
TreeSet internally uses a TreeMap (Red-Black Tree).

How does a Set use a Map?

A Map requires a Key and a Value. A Set only requires a single element. To bridge this gap, the Set stores your element as the Map's Key. But what does it put in the Value slot?

It uses a dummy, constant, empty Object called PRESENT.

// Inside java.util.HashSet Source Code:

private transient HashMap<E,Object> map;

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

public boolean add(E e) {
    // Returns true if the key didn't exist previously
    return map.put(e, PRESENT) == null;
}
  

Because HashMap automatically handles hash collisions and prevents duplicate keys, the HashSet gets absolute uniqueness essentially for free!

6. Implementation Breakdown & Time Complexities

1. HashSet (The Default Choice)

Backed by a HashMap. It computes the hashCode() of the object to find the bucket location. It provides absolutely no guarantee regarding iteration order.

add() / remove() / contains(): $O(1)$ average time.
Use when: You need the fastest possible performance and don't care about sorting or order.

2. LinkedHashSet

Backed by a LinkedHashMap. It behaves exactly like a HashSet but maintains a Doubly-Linked List across all its elements to preserve chronological insertion order.

add() / remove() / contains(): $O(1)$ average time.
Use when: You need uniqueness, fast access, but must be able to iterate through the data in the exact order it arrived.

3. TreeSet

Backed by a TreeMap, which is implemented as a self-balancing Red-Black Tree. It completely abandons hashing. Instead, it places every element into a perfectly sorted, balanced binary tree.

add() / remove() / contains(): $O(\log n)$ guaranteed time.
Use when: You need to maintain a live, constantly sorted collection of unique items (e.g., a real-time leaderboard).

Learn Here Fun Pedia

Search This Blog

Set in Java

Set: Complete Deep Dive & Backend Working

1. The Set Hierarchy

2. Package Structure & Declaration

3. Why was Set Created? (The Duplicate Problem)

Real World Use Cases for Set

4. Features of the Set Interface

5. The Ultimate Secret: How Sets Actually Work Internally

The Backend Reality

How does a Set use a Map?

6. Implementation Breakdown & Time Complexities

1. HashSet (The Default Choice)

2. LinkedHashSet

3. TreeSet

Labels

Comments

Post a Comment

Popular posts from this blog

How I Got Selected in MNC

Spring Boot Introduction

Data Types in C