A Guide to HashSet in Java

1. Overview

In this article, we'll dive into HashSet. It's one of the most popular Set implementations as well as an integral part of the Java Collections Framework.

2. Intro to HashSet

HashSet is one of the fundamental data structures in the Java Collections API.

Let's recall the most important aspects of this implementation:

  • It stores unique elements and permits nulls
  • It's backed by a HashMap
  • It doesn't maintain insertion order
  • It's not thread-safe (the uniqueness and null behavior are illustrated in the short sketch below)
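
As a quick sketch of the uniqueness and null behavior (assuming the usual java.util and JUnit 4 imports, in the same style as the tests later in this article):

@Test
public void whenAddingUniqueNullAndDuplicate_shouldKeepUniqueElementsAndNull() {
    Set<String> set = new HashSet<>();

    assertTrue(set.add("one"));   // a new element gets added
    assertFalse(set.add("one"));  // the duplicate is rejected
    assertTrue(set.add(null));    // a single null is permitted

    assertEquals(2, set.size());  // "one" and null
}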

Note that this internal HashMap gets initialized when an instance of the HashSet gets created:

public HashSet() {
    map = new HashMap<>();
}

If you want to go deeper into how the HashMap works, you can read the article focused on it here.

3. API

In this section, we're going to review the most commonly used methods and have a look at some simple examples.

3.1. add()

The add() method can be used for adding elements to a set. The method contract states that an element will only be added when it isn't already present in the set. If an element was added, the method returns true, otherwise false.

We can add an element to a HashSet like:

@Test
public void whenAddingElement_shouldAddElement() {
    Set<String> hashset = new HashSet<>();

    assertTrue(hashset.add("String Added"));
}
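
To round out the contract described above, here's a small sketch (under the same JUnit 4 assumptions) showing that adding a duplicate returns false and leaves the set unchanged:

@Test
public void whenAddingDuplicateElement_shouldReturnFalse() {
    Set<String> hashset = new HashSet<>();
    hashset.add("String Added");

    // a second add of an equal element is rejected
    assertFalse(hashset.add("String Added"));
    assertEquals(1, hashset.size());
}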

From the implementation perspective, the add method is an extremely important one. The implementation details illustrate how the HashSet works internally and leverages the HashMap's put method:

public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}

The map variable is a reference to the internal, backing HashMap:

private transient HashMap<E, Object> map;

It would be a good idea to get familiar with the hashcode first, to get a detailed understanding of how the elements are organized in hash-based data structures.

Summarizing:

  • A HashMap is an array of buckets with a default capacity of 16 elements – each bucket corresponds to a different hashcode value
  • If various objects have the same hashcode value, they get stored in a single bucket
  • If the load factor is reached, a new array gets created twice the size of the previous one, and all the elements get rehashed and redistributed among the new corresponding buckets
  • To retrieve a value, we hash a key, mod it, and then go to the corresponding bucket and search through the potential linked list in case there's more than one object (a rough sketch of this indexing step follows below)
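
As a rough, illustrative sketch of that last indexing step (a simplification on our part; the real HashMap additionally mixes the hash's high bits before this step), the bucket index can be derived from the hash code and the power-of-two capacity:

// illustrative only: HashMap also spreads the high bits of the hash before this step
int capacity = 16;                        // default number of buckets
int hash = "String Added".hashCode();     // the element's hash code
int bucketIndex = hash & (capacity - 1);  // reduces the hash modulo the power-of-two capacity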

3.2. contains()

The purpose of the contains method is to check if an element is present in a given HashSet. It returns true if the element is found, otherwise false.

We can check for an element in the HashSet:

@Test
public void whenCheckingForElement_shouldSearchForElement() {
    Set<String> hashsetContains = new HashSet<>();
    hashsetContains.add("String Added");

    assertTrue(hashsetContains.contains("String Added"));
}

Whenever an object is passed to this method, the hash value gets calculated. Then, the corresponding bucket location gets resolved and traversed.
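
For the negative case, here's a minimal sketch (same JUnit 4 assumptions as above) where the resolved bucket holds no equal element and contains returns false:

@Test
public void whenCheckingForAbsentElement_shouldReturnFalse() {
    Set<String> hashsetContains = new HashSet<>();
    hashsetContains.add("String Added");

    // no equal element is found in the resolved bucket
    assertFalse(hashsetContains.contains("Other String"));
}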

3.3. remove()

The method removes the specified element from the set if it's present. This method returns true if the set contained the specified element.

Let's see a working example:

@Test
public void whenRemovingElement_shouldRemoveElement() {
    Set<String> removeFromHashSet = new HashSet<>();
    removeFromHashSet.add("String Added");

    assertTrue(removeFromHashSet.remove("String Added"));
}
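
Conversely, a small sketch (same JUnit 4 assumptions) showing that removing an element the set doesn't contain simply returns false and leaves the set untouched:

@Test
public void whenRemovingAbsentElement_shouldReturnFalse() {
    Set<String> removeFromHashSet = new HashSet<>();
    removeFromHashSet.add("String Added");

    // nothing matches, so the set stays intact and false is returned
    assertFalse(removeFromHashSet.remove("Not Present"));
    assertEquals(1, removeFromHashSet.size());
}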

3.4. clear()

We use this method when we intend to remove all the items from a set. The underlying implementation simply clears all elements from the underlying HashMap.

Let's see that in action:

@Test
public void whenClearingHashSet_shouldClearHashSet() {
    Set<String> clearHashSet = new HashSet<>();
    clearHashSet.add("String Added");
    clearHashSet.clear();

    assertTrue(clearHashSet.isEmpty());
}

3.5. size()

This is one of the fundamental methods in the API. It's used heavily as it helps in identifying the number of elements present in the HashSet. The underlying implementation simply delegates the calculation to the HashMap's size() method.

Let's see that in action:

@Test
public void whenCheckingTheSizeOfHashSet_shouldReturnThesize() {
    Set<String> hashSetSize = new HashSet<>();
    hashSetSize.add("String Added");

    assertEquals(1, hashSetSize.size());
}

3.6. isEmpty()

We can use this method to figure out whether a given instance of a HashSet is empty or not. This method returns true if the set contains no elements:

@Test
public void whenCheckingForEmptyHashSet_shouldCheckForEmpty() {
    Set<String> emptyHashSet = new HashSet<>();

    assertTrue(emptyHashSet.isEmpty());
}

3.7. iterator()

The method returns an iterator over the elements in the Set. The elements are visited in no particular order and iterators are fail-fast.

We can observe the random iteration order here:

@Test
public void whenIteratingHashSet_shouldIterateHashSet() {
    Set<String> hashset = new HashSet<>();
    hashset.add("First");
    hashset.add("Second");
    hashset.add("Third");
    Iterator<String> itr = hashset.iterator();
    while (itr.hasNext()) {
        System.out.println(itr.next());
    }
}

If the set is modified at any time after the iterator is created in any way except through the iterator's own remove method, the Iterator throws a ConcurrentModificationException.

Let's see that in action:

@Test(expected = ConcurrentModificationException.class)
public void whenModifyingHashSetWhileIterating_shouldThrowException() {
    Set<String> hashset = new HashSet<>();
    hashset.add("First");
    hashset.add("Second");
    hashset.add("Third");
    Iterator<String> itr = hashset.iterator();
    while (itr.hasNext()) {
        itr.next();
        hashset.remove("Second");
    }
}

Alternatively, had we used the iterator's remove method, then we wouldn't have encountered the exception:

@Test
public void whenRemovingElementUsingIterator_shouldRemoveElement() {
    Set<String> hashset = new HashSet<>();
    hashset.add("First");
    hashset.add("Second");
    hashset.add("Third");
    Iterator<String> itr = hashset.iterator();
    while (itr.hasNext()) {
        String element = itr.next();
        if (element.equals("Second"))
            itr.remove();
    }

    assertEquals(2, hashset.size());
}

The fail-fast behavior of an iterator cannot be guaranteed as it's impossible to make any hard guarantees in the presence of unsynchronized concurrent modification.

Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it'd be wrong to write a program that depended on this exception for its correctness.

4. How HashSet Maintains Uniqueness

When we put an object into a HashSet, it uses the object's hashcode value to determine if an element is not in the set already.

Each hash code value corresponds to a certain bucket location which can contain various elements, for which the calculated hash value is the same. But two objects with the same hashCode might not be equal.

So, objects within the same bucket will be compared using the equals() method.
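
As a hedged, illustrative sketch (the Key class below is purely hypothetical and not part of the article's code), two objects that share a hashcode land in the same bucket, but both stay in the set as long as equals says they differ; only an equal object is rejected:

// Key is a hypothetical class used only for this sketch
class Key {
    private final String name;

    Key(String name) {
        this.name = name;
    }

    @Override
    public int hashCode() {
        return 42; // deliberately constant: every Key lands in the same bucket
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Key && ((Key) o).name.equals(this.name);
    }
}

@Test
public void whenHashCodesCollide_shouldStillUseEqualsForUniqueness() {
    Set<Key> keys = new HashSet<>();
    keys.add(new Key("a"));
    keys.add(new Key("b")); // same hashCode but not equal, so it's kept
    keys.add(new Key("a")); // equal to the first element, so it's rejected

    assertEquals(2, keys.size());
}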

5. Performance of HashSet

The performance of a HashSet is affected mainly by two parameters – its Initial Capacity and the Load Factor.

The expected time complexity of adding an element to a set is O(1), which can drop to O(n) in the worst case scenario (only one bucket present) – therefore, it's essential to maintain the right capacity of the HashSet.

An important note: since JDK 8, the worst case time complexity is O(log n).

The load factor describes the maximum fill level, above which a set will need to be resized.

We can also create a HashSet with custom values for initial capacity and load factor:

Set<String> hashset = new HashSet<>();
Set<String> hashset = new HashSet<>(20);
Set<String> hashset = new HashSet<>(20, 0.5f);

In the first case, the default values are used – the initial capacity of 16 and the load factor of 0.75. In the second, we override the default capacity and in the third one, we override both.

A low initial capacity reduces space complexity but increases the frequency of rehashing which is an expensive process.

On the other hand, a high initial capacity increases the cost of iteration and the initial memory consumption.

As a rule of thumb:

  • A high initial capacity is good for a large number of entries coupled with little to no iteration
  • A low initial capacity is good for few entries with a lot of iteration

It's, therefore, very important to strike the correct balance between the two. Usually, the default implementation is optimized and works just fine; should we feel the need to tune these parameters to suit our requirements, we need to do so judiciously.
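
As one possible sizing sketch (an idiom we're assuming here, not something mandated by the HashSet API), we can derive an initial capacity from the number of entries we expect and the load factor, so that filling the set never triggers a resize:

int expectedSize = 1_000;
float loadFactor = 0.75f;

// capacity chosen so that expectedSize entries fit without triggering a resize
int initialCapacity = (int) (expectedSize / loadFactor) + 1;
Set<String> hashset = new HashSet<>(initialCapacity, loadFactor);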

6. Conclusion

In this article, we outlined the utility of a HashSet, its purpose as well as its underlying working. We saw how efficient it is in terms of usability given its constant time performance and ability to avoid duplicates.

We studied some of the important methods from the API and how they can help us as developers to use a HashSet to its full potential.

As always, code snippets can be found over on GitHub.