Skip to main content

Collections

Arrays

Arrays provide an ordered collection of elements. They're immutable.

float[] theVals = new float[3];
float[] theVals = { 10.0f, 20.0f, 15.0f };

In Java, array variables are references. See more at References to learn more.

Problems

private static Product[] add(Product product, Product[] array){
int length = array.length;
var newArray = Arrays.copyOf(array, length + 1);
newArray[length] = product;
return newArray;
}
var door = new Product("Wooden Door", 35);
var floorPanel = new Product("Floor Panel", 25);
var window = new Product("Glass Window", 10);

// Create
Product[] products = { door, floorPanel };
// 1. Print - horrible format
System.out.println(products);
System.out.println(Arrays.toString(products));
// 2. Add - complex code
products[2] = window; // array is immutable
products = add(window, products);
// 3. Duplicates - no way
products = add(window, products);

Array is also covariant, which means that if a subtype array can be assigned to a supertype array reference variable:

class Animal {}
class Dog extends Animal {}

public class ArrayCovarianceExample {
public static void main(String[] args) {
Dog[] dogs = { new Dog(), new Dog() };
Animal[] animals = dogs; // this is allowed in array

animals[0].toString();
// compiler only allows us to call methods in the Animal class
animals[0].bark();
// if you want to call bark(), you have to downcast
// which is really dangerous
((Dog) animals[0]).bark();
}
}

\rightarrow So we'd better not using it. Using generic List can prevent this, see example. See why we shouldn't downcast here.

List<Integer> nums = Arrays.asList(1, 2, 3);  // [1, 2, 3]

Collection of Collections

Each collection has more than 2 different components.

InterfacesImplementation
Multiple data structuresSpecific data structures
Functional characteristics (behavior)Performance characteristics
Prefer as variable typeConcrete and instantiable
Often has a popular implementation

What to use?

Common Behaviors

Common Methods

  • size()
  • isEmpty()
  • add()
  • addAll(): like add but for many
  • remove(element)
  • removeAll(collection)
  • retainAll(collection): retain: giữ lại
  • contains(element)
  • containsAll(collection)
  • clear()

Iterator

Collection extends Iterable (like Iterator).

Collection<Product> products = new ArrayList<>();
products.add(door); products.add(floorPanel); products.add(window);

for (var product : products) {
products.remove(product); // modification while operation (looping) will crash it
}
// removeIf
products.removeIf(product -> product.weight() > 20);

// use iterator to remove products.
Iterator<Product> iterator = products.iterator();
while (iterator.hasNext()) {
final Product product = iterator.next();
if (product.weight() > 20) {
iterator.remove();
}
}

These methods pass in a lambda expression that implements some common functional interface.

Collection Factories (Immutable)

Sometimes, the code allows modifying the view that takes change the backing, which violate encapsulation. \rightarrow these help us to create unmodifiable, immutable, empty or wrapping collections. Modification will throw exception.

Create Empty Collections

They seem useless, but memory efficient. Use when you want to pass no values to a method that takes a collection.

// immutable
List<String> list = Collections.emptyList();
Map<Integer, String> map = Collections.emptyMap();
Set<Integer> set = Collections.emptySet();

Create Singletons

// immutable
List<String> list = Collections.singletonList("one");
Map<Integer, String> map = Collections.singletonMap(1, "one");
Set<Integer> set = Collections.singletonSet(1);

Create (Not Empty) Collections

// immutable
List<String> list = List.of("UK", "USA");
Map<String, Integer> map = Map.of("UK", 67, "USA", 328);
// when more than 10 args
Map<String, Integer> entries = Map.ofEntries(
Map.ofEntries(
Map.entry("UK", 67),
Map.entry("US", 328)
));

Create Copies

var mutableList = new HashMap<>();  // this is mutable
// immutable
var immutableCopy = Map.copyOf(mutableList); // this is immutable

// modifying mutableList does not modify immutableCopy
mutableCopy.put(key, value);

Create Views

Views are unmodifiable, but they can be read (modify backing list will affect the view). \rightarrow use when we want to keep mutation within a class.

Collections.unmodifiableList(list);
Collections.unmodifiableMap(map);

List<String> countries = new ArrayList<>();
countries.add("UK"); countries.add("USA");
// if countries.add("FR"); is also adding to countriesView
List<String> countriesView = Collections.unmodifiableList(countries);

Collection Operations

Useful collection algorithms.

var products = new ArrayList<Product>();
Collections.addAll(products, door, window, chair);
Collections.rotate(elements, distance); // distance = 1: [a, b, c] -> [c, a, b]
Collections.shuffle(elements, randomGenerator); // randomly arrange, default use Java.random
Collections.binarySearch(elements, key, Comparator);
Collections.disjoint(a, b); // true nếu không giao nhau
Collections.frequency(letters, 'A');
Collections.max(elements, comparator); // min
Collections.fill(list, element); // replace every element
Collections.swap(elements, 1, 2); // swap element by index
Collections.reverse(elements);

Lists

List are collections with iteration order. Every element in the list has an index.

Key Features

Basic get/set

  • void add(int index, E e)
  • E get(int index)
  • E remove(int index)
  • E set(int index, E element)
  • boolean addAll(int index, Collection c)

Static Factory Methods

Creates unmodifiable List instance

  • List<E> of()
  • List<E> of(E e1, E e2)
  • List<E> of (E ... elements)
  • List<E> copyOf(Collection<E>): this is a shallow copy (just copies values)

Overloads for 0-10 arguments. For > 10 args, there's varargs constructor.

Lookup & Split

  • int indexOf(Object o): return -1 if not found
  • int lastIndexOf(Object o)
  • List subList(int fromIndex, int toIndex): this is a view, modifying the view also modifies the backing list itself
info

Modifying a view also affects on the backing one.

Sorting

Java use TimSort-based algorithm

  • list.sort(Comparator<? Super E> comparator): comparator defines sort order

A Comparator is an interface in Java that defines sort order.

// define Comparator
import java.util.Comparator;
public static final Comparator<Product> BY_WEIGHT = Comparator.comparingInt(Product::weight);
/**
implements
public interface Comparator<T> {
int compare(T o1, T o2);
}
**/
products.sort(Product.BY_WEIGHT);

Implementations

:::warningAvoid List legacy implementations avoid Vector, Stack, they're synchronized. :::

ArrayList

How it works

There is a backing array, where there's many space to add, remove, etc. Most of the time, JDK start with an empty backing array. When you add first element, it grows to the default initial collection size (10).

Once you run out of 10 elements, it starts to doubling in size.

This is good general purpose implementation, use as default. CPU cache sympathetic.

LinkedList

Has head nodes and tail nodes, they could be the same and pointer at each of them. LinkedList in Java is doubly linked list (có hướng ngược lại).

Most of the time, worse performance. Only good when adding elements at start, adding/remove a lot.

Performance Comparison

getaddcontainsremovenext
ArrayListO(1)O(1)O(N)O(N) doubling strategy, copy & expand the list
Ω(1)\Omega(1) with larger array
O(N)O(N)O(N)O(N) delete headO(1)O(1)
LinkedListO(N)O(N)O(1)O(1)O(N)O(N)O(N)O(N)O(1)O(1)

Maps

Maps are collections of pairs, like dictionaries. Key -> Value. Key are unique.

Map is better because it helps you not to deal with lookup loop so much. Map is the only collections that don't extend or implement the Collection interface.

info

Java use "map" terms instead of "dictionary", but it's actually a class in JDK & it predates the original Java collection API & kind of deprecated at this time. So don't use it.

Key Features

  • V put(K key, V value): return previous value associated with the key, if no, return null
  • void putAll(Map<? extends K, ? extends V> values)
  • get(Object key): if no, return null
  • boolean constainsKey(Object key)
  • boolean constainsValue(Object value)
  • V remove(Object key)
  • void clear()

*Use Object type rather than K key generic: if you have a local variable that is type of Object and has a string in it, you can use that variable without introducing additional cast to a specific key type

Views

Modify it also modify backing map. All below return a view.

  • var ids = map.keySet();
  • var values = map.values();
  • var entries = map.entrySet();: entry represents a <K, V> pair. We cannot add element to entry set, but we can remove

Immutable Map Factories

Map.Entry<String, Integer> entry = Map.entry("Richard", 38);  // individual key/value pairs
Map<String, Integer> personToAge = Map.of("Richard", 38); // up to 10 value specific overload Factories
personToAge = Map.ofEntries(Map.entry("Richard", 38)); // for > 10 varargs factory takes entry objects
Map<String, Integer> copy = Map.copyOf(personToAge); // immutable copies of existing Map

Alter & Remove

  • replace(key, value): update single value
  • replaceAll(BiFunction<K, V, V>): replace elements using a function (key, oldValue) -> newValue
  • remove(key, value): remove a key only if it has a value

Update

  • getOrDefault(key, defaultValue): regular get returns null if not found
  • computeIfAbsent(key, function): return value if present. If no, execute function key -> newValue, then return the new value
  • putIfAbsent
  • computeIfPresent
  • compute
  • merge

Normally, you have yo use entrySet, but you can use forEach for more convenient.

Implementations

HashMap allows null keys & values. TreeMap allows null values, but not keys.

HashMap

Good general purpose implementation.

:::infoHow it works When put value in hash map, it takes .hashCode() of the key. It takes a bucket of elements behind that It compute the hash from that key rehash(hash) % bucket_count \rightarrow defines a slot within the backing array

Caveat: hash value itself is passed through rehashing function to reduce probability of hash code collision Buckets are linked list to accommodate collisions In Java 8, buckets are converted to trees when there are > 8 elements in the linked list :::

Implement obj.equals() & obj.hashCode(). Example:

@Override
public boolean equals(final Object o) {
if(this == o) return true;
if(o == null || getClass() != o.getClass()) return false;
final Animal animal = (Animal) o;
return Objects.equals(name, animal.name); // ? name.equals(animal.name);
}
@Override
public int hashCode() { return Objects.hash(name); } // ? name.hashCode();
warning

Whatever type used as the keys in HashMap, those need to be immutable objects (always return the same value for hashCode method).

TreeMap

Defines sort order and add functionality

:::infoHow it works Use red/black tree (a balanced binary tre) :::

Advanced

EnumMap, LinkedHashMap, IdentityHashMap, WeakHashMap

Performance Comparison

putgetcontainsKeynext
HashMapO(N)O(N) worst case, re-expand hash map
Ω(1)\Omega(1)
O(N)O(N)
Ω(1)\Omega(1)
O(N)O(N)
Ω(1)\Omega(1)
O(Capacity/N)O(Capacity/N)
TreeMapO(log(N))O(log(N))O(log(N))O(log(N))O(log(N))O(log(N))O(log(N))O(log(N))

Streams

Alternative to traditional approach of using forEach or iterators to operate on collections. Java stream supports functional-style programming (see an example of using stream with no side effect).

elements.forEach(e -> ...) // side effecting for each

Stream vs. Loops

StreamsLoops
High level construct (less boilerplate)Low level construct (more control)
Optimized frameworkCan be faster
ReadabilityReadability (depends)
Corner cases (stream doesn't throw exceptions)Checked Exceptions

Stream operations are divided into 2 types:

IntermediateTerminal
Everything but the last, returns Stream<T>.
E.g. filter()
Last in the Pipeline, returns values.
E.g. toList()

Intermediate Operations

Filter

streamOfElements.filter(e -> e.getSize() > 10);

Map

streamOfElements.map(Element::getName);  // transform elements from this to that

Skip & Limit

streamOfElements
.skip(elementsOnPage * pageNumber) // discard next N elements
.limit(elementsOnPage) // only keep next N elements

Sorted

// 1. Sort Comparable objects with default order
elements.map(Element::getName).sorted() // String implements Comparable -> alphabetiaclly sort order

// 2. Sort objects with a specified comparator
Comparator<Element> byName = Comparator.comparing(Element::getName);
elements.sorted(byName)

FlatMap

// transform elements from one value in to zero, one or many values
// flatMap function recieves a stream of value
// here, we replace streamOfShipments with streamOfProducts
streamOfShipments.flatMap(shipment -> shipment.getLightProducts().stream())

List<Integer> nums = Arrays.asList(1, 2, 3);
nums.stream()
.map(num -> Arrays.asList(num, num * 2)) // [1, 2], [2, 4], [3, 6]
.flatMap(Collection::stream) // [1, 2, 2, 4, 3, 6]
// or
nums.stream()
.flatMap(num -> Arrays.asList(num, num * 2).stream())

Terminal Operations

toList

elementStream.toList();  // create a List, it is unmodifiable

// Object[]
elementStream.toArray(); // create objects array

// Element[]
elementStream.toArray(Element[]::new); // pass a function to create specific array type

Note, toList is introduced in Java 16. If you're using Java 8, you can use collectors: collect(Collectors.toList())

Match

// these all returns a boolean
// if any/none/all elements match a predicate
elements.anyMatch( e -> e.getWeight() > 20 );

elements.noneMatch( e -> e.getWeight() > 20 );

elements.allMatch( e -> e.getWeight() > 20 );

Find

// max/min element given a sort order
elements.max(Comparator.comparingInt(Element::getSize))

// findFirst (or findAny()) get the element
elements.filter(...).findFirst()

// count number of elements in a stream
elements.filter(...).count()

Reduce

// combine elements together using a combining function & an accumulator
elements.reduce(0, (accumlate, e) -> accumlate + e.getSize())

Collectors

Collectors are operations that happen at the end of Streams, so they are terminal operations.

elements.stream().filter(...).sorted(...)
.collect(Collectors.toList());
// add static import so we don't need `Collectors.` part
.collect(groupingBy(Element:getName));
/**
this is called downstream collectors
counting() is like count() but it takes elements passed by groupingBy()
this count each group elements
result is converted from string (return type of groupingBy) to long (return type of counting)
**/
.collect(groupingBy(Element:getName), counting())); // Collectors.groupingBy, Collectors.counting

Implementations

Boxed numeric is much larger & slower than primitives. Java provides streams with primitives.

Primitive Streams

Improve performance over boxed numeric streams. Functionality like sum() for primitives. Only specialized for 3 primitives: int, long, double

strings.mapToInt(String::length)  // return string stream
IntStream.of(1, 2)
IntStream.range(start, end)

Parallel Streams

Streams can be run in parallel mode. Run on the common fork-join pool of the JVM. Potentially, but not always, a performance improvement.

collection.parallelStream()
stream.parallel()

Sets

Collection with uniqueness, no duplicates.

Key Features

Equality Contract

hashCode and equals. This is 1-way implication, means 2 different objects can have the same hashCode, but if they are the same object, they must have the same hashCode.

object.equals(other) -> object.hashCode() == other.hashCode()
Arrays.hashCode()
Long.hashCode(longValue) // Java 8+
// old
(int) (l ^ (l >>> 32 ))
Float.floatToIntBits(f); // because hash code is a 32-bit integer
Objects.hash() // Java 7+

Implementations

HashSet

Based upon HashMap, use hashCode and looks up location, good for general.

TreeSet

Based upon TreeMap (red/black binary tree with defined sort order). Provide extra features:

  • SortedSet
  • NavigableSet

SortedSet

Defines an order, no indexes, but subset views possible. It extends Set interface.

E first();
E last();

SortedSet tailSet(E fromElement);
SortedSet headSet(E toElement);
SortedSet subSet(E fromElement, E toElement);

Extends SortedSet, provides ways to move through the order, implemented by TreeSet.

E lower(E e);
E higher(E e);

E floor(E e);
E ceiling(E e);

E pollFirst();
E pollLast();

LinkedHashSet

  • When to use: copying Set to modify, deduping List or Queue.
  • Maintains order: only insertion
  • Overhead: Slower than HashSet, less memory than TreeSet

EnumSet

  • Keys are Enums: faster and low memory usage
  • Bitset implementation: only a single long if < 64 elements

Performance Comparison

addcontainsnext
HashSetO(N)Ω(1)O(N) \Omega(1)O(log(N))Ω(1)O(log(N)) \Omega(1)O(Capacity/N)O(Capacity/N)
TreeSetO(log(N))O(log(N))O(log(N))O(log(N))O(log(N))O(log(N))