String Performance Hints

1. Introduction

In this tutorial, we're going to focus on the performance aspect of the Java String API.

We'll dig into String creation, conversion and modification operations to analyze the available options and compare their efficiency.

The suggestions we're going to make won't necessarily be the right fit for every application. But we'll certainly show how to gain performance when the application's running time is critical.

2. Constructing a New String

As you know, in Java, Strings are immutable. So every time we construct or concatenate a String object, Java creates a new String. This can be especially costly if done in a loop.

2.1. Using Constructor

In most cases, we should avoid creating Strings using the constructor unless we know what we're doing.

Let's first create a newString object inside of the loop, using the new String() constructor, and then using the = operator.

To write our benchmark, we'll use the JMH (Java Microbenchmark Harness) tool.

Our configuration:

    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    @BenchmarkMode(Mode.SingleShotTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Measurement(batchSize = 10000, iterations = 10)
    @Warmup(batchSize = 10000, iterations = 10)
    public class StringPerformance {
    }

Here, we're using the SingleShotTime mode, which runs the method only once. As we want to measure the performance of String operations inside of the loop, there's a @Measurement annotation available for that.

It's important to know that writing the benchmarking loops directly in our tests may skew the results because of various optimizations applied by the JVM.

So we measure only the single operation and let JMH take care of the looping. Briefly speaking, JMH performs the iterations by using the batchSize parameter.

Now, let's add the first micro-benchmark:

    @Benchmark
    public String benchmarkStringConstructor() {
        return new String("baeldung");
    }

    @Benchmark
    public String benchmarkStringLiteral() {
        return "baeldung";
    }

In the first test, a new object is created in every iteration. In the second test, the object is created only once. For the remaining iterations, the same object is returned from the String constant pool.
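The difference between the two tests can also be observed through reference identity. The following is a minimal sketch, not part of the benchmark code; the LiteralVsConstructor class and its method names are hypothetical and exist only for illustration:

```java
// Hypothetical demo class, not part of the benchmark code.
final class LiteralVsConstructor {

    // Two literals with the same content resolve to the same pooled instance.
    static boolean literalsShareInstance() {
        String a = "baeldung";
        String b = "baeldung";
        return a == b;
    }

    // The constructor allocates a fresh object even though the content is already pooled.
    static boolean constructorCreatesNewInstance() {
        String literal = "baeldung";
        String constructed = new String("baeldung");
        return constructed != literal && constructed.equals(literal);
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());
        System.out.println(constructorCreatesNewInstance());
    }
}
```

Both methods return true: the literals are the same object, while the constructed String is equal in content but distinct in identity.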

Let's run the tests with the looping iterations count = 1,000,000 and see the results:

    Benchmark                   Mode  Cnt   Score    Error  Units
    benchmarkStringConstructor    ss   10  16.089 ±  3.355  ms/op
    benchmarkStringLiteral        ss   10   9.523 ±  3.331  ms/op

From the Score values, we can clearly see that the difference is significant.

2.2. + Operator

Let's have a look at a dynamic String concatenation example:

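    @State(Scope.Thread)
    public static class StringPerformanceHints {
        String result = "";
        String baeldung = "baeldung";
    }

    @Benchmark
    public String benchmarkStringDynamicConcat(StringPerformanceHints state) {
        // append to the accumulated result so that it grows with every invocation
        state.result = state.result + state.baeldung;
        return state.result;
    }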

In our results, we want to see the average execution time. The output number format is set to milliseconds:

    Benchmark                      1000     10,000
    benchmarkStringDynamicConcat   47.331   4370.411

Now, let's analyze the results. As we see, adding 1000 items to state.result takes 47.331 milliseconds. Increasing the number of iterations tenfold makes the running time grow to 4370.411 milliseconds, which is roughly 92 times longer.

In summary, the execution time grows quadratically. Therefore, the complexity of dynamic concatenation in a loop of n iterations is O(n^2).
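To see where the quadratic growth comes from, we can count the characters that have to be copied. The sketch below is a simplified model, not a measurement: it assumes that every concatenation copies the whole intermediate result into a new String, and the ConcatCost class is a hypothetical helper introduced only for this illustration:

```java
// Hypothetical helper that models the copying cost of repeated concatenation.
final class ConcatCost {

    // Counts the characters written when a piece of length pieceLength is
    // appended n times with +, assuming each concatenation copies the whole
    // intermediate result into a newly allocated String.
    static long copiedChars(int n, int pieceLength) {
        long copied = 0;
        long currentLength = 0;
        for (int i = 0; i < n; i++) {
            currentLength += pieceLength; // length of the newly created String
            copied += currentLength;      // every character of the new String is written once
        }
        return copied;
    }

    public static void main(String[] args) {
        // "baeldung" has 8 characters
        System.out.println(copiedChars(1_000, 8));
        System.out.println(copiedChars(10_000, 8));
    }
}
```

In this model, the total is pieceLength * n * (n + 1) / 2, so ten times more iterations means roughly a hundred times more copying, which is in line with the measured slowdown.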

2.3. String.concat()

One more way to concatenate Strings is by using the concat() method:

    @Benchmark
    public String benchmarkStringConcat() {
        return result.concat(baeldung);
    }

The output time unit is milliseconds, and the iterations count is 100,000. The result table looks like this:

    Benchmark              Mode  Cnt     Score      Error  Units
    benchmarkStringConcat    ss   10  3403.146 ±  852.520  ms/op

2.4. String.format()

Another way to create strings is by using the String.format() method. Under the hood, it has to parse the format string on every call; in older JDKs such as JDK 8, it uses regular expressions for that.

Let's write the JMH test case:

    String formatString = "hello %s, nice to meet you";

    @Benchmark
    public String benchmarkStringFormat_s() {
        return String.format(formatString, baeldung);
    }

After that, we run it and see the results:

    Number of Iterations      10,000   100,000   1,000,000
    benchmarkStringFormat_s   17.181   140.456   1636.279   ms/op

Although the code with String.format() looks cleaner and more readable, we don't win here in terms of performance.

2.5. StringBuilder and StringBuffer

We already have a write-up explaining StringBuffer and StringBuilder. So here, we'll show only the extra information about their performance. StringBuilder uses a resizable array and an index that indicates the position of the last cell used in the array. When the array is full, it expands to roughly double its size and copies all the characters into the new array.

Taking into account that resizing doesn't occur very often, we can consider each append() operation as amortized O(1) constant time. With that in mind, the whole process has O(n) complexity.
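The claim that resizing is rare can be checked by watching StringBuilder.capacity() while appending. This is a minimal sketch under the assumption that a change in capacity() corresponds to a reallocation of the internal array; the BuilderGrowth class is hypothetical and only serves as an illustration:

```java
// Hypothetical helper that counts how often a StringBuilder has to grow.
final class BuilderGrowth {

    // Appends one character at a time and counts how often capacity() changes,
    // which we take as a sign that the internal array was reallocated.
    static int countResizes(int appends) {
        StringBuilder sb = new StringBuilder(); // documented initial capacity: 16
        int resizes = 0;
        int lastCapacity = sb.capacity();
        for (int i = 0; i < appends; i++) {
            sb.append('x');
            if (sb.capacity() != lastCapacity) {
                resizes++;
                lastCapacity = sb.capacity();
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        System.out.println(countResizes(100_000));
    }
}
```

Because the capacity grows geometrically, the number of resizes grows only logarithmically with the number of appends, so the copying cost averages out to a constant per append().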

After modifying and running the dynamic concatenation test for StringBuffer and StringBuilder, we get:

    Benchmark               Mode  Cnt  Score    Error  Units
    benchmarkStringBuffer     ss   10  1.409 ±  1.665  ms/op
    benchmarkStringBuilder    ss   10  1.200 ±  0.648  ms/op

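Although the score difference is small and within the error margins, StringBuilder comes out slightly ahead. That's expected because, unlike StringBuffer, its methods aren't synchronized.

Fortunately, in simple cases, we don't need StringBuilder to join one String with another. Sometimes, static concatenation with + can actually replace StringBuilder. Under the hood, javac compiles such an expression into StringBuilder.append() calls (up to JDK 8) or into an invokedynamic call backed by StringConcatFactory (JDK 9 and later).

This means that a single + expression performs about as well as hand-written StringBuilder code.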

3. Utility Operations

3.1. StringUtils.replace() vs String.replace()

Interestingly, the Apache Commons version of replace does considerably better than the String's own replace() method. The reason for this difference lies in their implementations. In JDK 8, String.replace() uses a regex Pattern to match the String.

In contrast, StringUtils.replace() relies on indexOf(), which is faster.
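To illustrate the indexOf()-based approach, here is a simplified sketch of a literal replace. It isn't the Commons Lang source code; the IndexOfReplace class is a hypothetical stand-in that shows the general technique only:

```java
// Hypothetical, simplified stand-in for an indexOf-based replace.
final class IndexOfReplace {

    // Replaces every occurrence of search in text using only indexOf()
    // and a StringBuilder, without involving the regex engine.
    static String replace(String text, String search, String replacement) {
        if (text.isEmpty() || search.isEmpty()) {
            return text;
        }
        int start = 0;
        int end = text.indexOf(search, start);
        if (end < 0) {
            return text; // nothing to replace, so no new String is built
        }
        StringBuilder sb = new StringBuilder(text.length());
        while (end >= 0) {
            // copy the part before the match, then the replacement
            sb.append(text, start, end).append(replacement);
            start = end + search.length();
            end = text.indexOf(search, start);
        }
        sb.append(text, start, text.length()); // copy the remaining tail
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace("an average String", "average", " average !!!"));
    }
}
```

The method returns the original instance when there's nothing to replace, which avoids an allocation in the common no-match case.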

Now, it's time for the benchmark tests:

    @Benchmark
    public String benchmarkStringReplace() {
        return longString.replace("average", " average !!!");
    }

    @Benchmark
    public String benchmarkStringUtilsReplace() {
        return StringUtils.replace(longString, "average", " average !!!");
    }

Setting the batchSize to 100,000, we present the results:

    Benchmark                    Mode  Cnt  Score    Error  Units
    benchmarkStringReplace         ss   10  6.233 ±  2.922  ms/op
    benchmarkStringUtilsReplace    ss   10  5.355 ±  2.497  ms/op

Although the difference between the numbers isn't too big, the StringUtils.replace() has a better score. Of course, the numbers and the gap between them may vary depending on parameters like iterations count, string length and even JDK version.

With JDK 9 and later (our tests are running on JDK 10), both implementations have fairly equal results. Now, let's downgrade the JDK version to 8 and run the tests again:

    Benchmark                    Mode  Cnt   Score     Error  Units
    benchmarkStringReplace         ss   10  48.061 ±  17.157  ms/op
    benchmarkStringUtilsReplace    ss   10  14.478 ±   5.752  ms/op

The performance difference is huge now and confirms the theory which we discussed in the beginning.

3.2. split()

Before we start, it'll be useful to check out string splitting methods available in Java.

When we need to split a string on a delimiter, the first function that usually comes to mind is String.split(regex). However, it can bring performance issues, as it accepts a regex argument. Alternatively, we can use the StringTokenizer class to break the string into tokens.

Another option is Guava's Splitter API. Finally, the good old indexOf() is also available to boost our application's performance if we don't need the functionality of regular expressions.
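Because String.split() interprets its argument as a regular expression, a delimiter that happens to be a regex metacharacter behaves differently from a plain character. The sketch below illustrates this with a dot delimiter; the SplitRegexPitfall class and its method names are hypothetical:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class showing that split() treats its argument as a regex.
final class SplitRegexPitfall {

    // "." is a regex that matches any character, so every character is a delimiter.
    static String[] splitUnquoted(String input) {
        return input.split(".");
    }

    // Pattern.quote() turns the delimiter into a literal pattern.
    static String[] splitQuoted(String input) {
        return input.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitUnquoted("a.b.c")));
        System.out.println(Arrays.toString(splitQuoted("a.b.c")));
    }
}
```

With the unquoted dot, all resulting tokens are empty and get dropped, so the array is empty. The quoted version returns the three expected tokens.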

Now, it's time to write the benchmark tests, starting with the String.split() option:

    String emptyString = " ";

    @Benchmark
    public String[] benchmarkStringSplit() {
        return longString.split(emptyString);
    }

Pattern.split() :

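    // assumed declaration; the original listing doesn't show where spacePattern comes from
    Pattern spacePattern = Pattern.compile(emptyString);

    @Benchmark
    public String[] benchmarkStringSplitPattern() {
        return spacePattern.split(longString, 0);
    }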

StringTokenizer :

    List<String> stringTokenizer = new ArrayList<>();

    @Benchmark
    public List<String> benchmarkStringTokenizer() {
        StringTokenizer st = new StringTokenizer(longString);
        while (st.hasMoreTokens()) {
            stringTokenizer.add(st.nextToken());
        }
        return stringTokenizer;
    }

String.indexOf() :

    List<String> stringSplit = new ArrayList<>();

    @Benchmark
    public List<String> benchmarkStringIndexOf() {
        int pos = 0, end;
        while ((end = longString.indexOf(' ', pos)) >= 0) {
            stringSplit.add(longString.substring(pos, end));
            pos = end + 1;
        }
        // add the last token, which isn't followed by a space
        stringSplit.add(longString.substring(pos));
        return stringSplit;
    }

Guava's Splitter :

    @Benchmark
    public List<String> benchmarkGuavaSplitter() {
        return Splitter.on(" ")
          .trimResults()
          .omitEmptyStrings()
          .splitToList(longString);
    }

Finally, we run and compare results for batchSize = 100,000:

    Benchmark                    Mode  Cnt   Score    Error  Units
    benchmarkGuavaSplitter         ss   10   4.008 ±  1.836  ms/op
    benchmarkStringIndexOf         ss   10   1.144 ±  0.322  ms/op
    benchmarkStringSplit           ss   10   1.983 ±  1.075  ms/op
    benchmarkStringSplitPattern    ss   10  14.891 ±  5.678  ms/op
    benchmarkStringTokenizer       ss   10   2.277 ±  0.448  ms/op

As we see, the benchmarkStringSplitPattern method, where we use the Pattern class, has the worst performance. From this, we learn that using a regex class with the split() method may make splitting several times slower.

Likewise, we notice that the fastest results come from the examples that use indexOf() and String.split(). The latter does well here because the JDK takes a fast path that bypasses the regex engine when the delimiter is a single character that isn't a regex metacharacter.

3.3. Converting to String

In this section, we're going to measure the runtime scores of string conversion. To be more specific, we'll start with the Integer.toString() method:

    int sampleNumber = 100;

    @Benchmark
    public String benchmarkIntegerToString() {
        return Integer.toString(sampleNumber);
    }

String.valueOf() :

    @Benchmark
    public String benchmarkStringValueOf() {
        return String.valueOf(sampleNumber);
    }

[some integer value] + "" :

    @Benchmark
    public String benchmarkStringConvertPlus() {
        return sampleNumber + "";
    }

String.format() :

    String formatDigit = "%d";

    @Benchmark
    public String benchmarkStringFormat_d() {
        return String.format(formatDigit, sampleNumber);
    }

After running the tests, we'll see the output for batchSize = 10,000:

    Benchmark                   Mode  Cnt   Score     Error  Units
    benchmarkIntegerToString      ss   10   0.953 ±   0.707  ms/op
    benchmarkStringConvertPlus    ss   10   1.464 ±   1.670  ms/op
    benchmarkStringFormat_d       ss   10  15.656 ±   8.896  ms/op
    benchmarkStringValueOf        ss   10   2.847 ±  11.153  ms/op

After analyzing the results, we see that the test for Integer.toString() has the best score of 0.953 milliseconds. In contrast, the conversion that involves String.format("%d") has the worst performance.

That's logical because parsing the format String is an expensive operation.
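All four variants produce the same text, so the choice between them is about cost and readability only. The following sketch puts them side by side; the IntToStringVariants class is hypothetical, and Locale.ROOT is passed to String.format() as an extra precaution so that the digits don't depend on the default locale:

```java
import java.util.Locale;

// Hypothetical demo class that runs the four conversions side by side.
final class IntToStringVariants {

    static String[] convertAll(int number) {
        return new String[] {
            Integer.toString(number),               // direct conversion
            String.valueOf(number),                 // overload for int
            number + "",                            // concatenation with an empty String
            String.format(Locale.ROOT, "%d", number) // has to parse the format string first
        };
    }

    public static void main(String[] args) {
        for (String converted : convertAll(100)) {
            System.out.println(converted);
        }
    }
}
```

In the OpenJDK sources, String.valueOf(int) simply delegates to Integer.toString(int), so a real difference between those two would be surprising.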

3.4. Comparing Strings

Let's evaluate different ways of comparing Strings. The iterations count is 100,000.

Here are our benchmark tests for the String.equals() operation:

    @Benchmark
    public boolean benchmarkStringEquals() {
        return longString.equals(baeldung);
    }

String.equalsIgnoreCase() :

    @Benchmark
    public boolean benchmarkStringEqualsIgnoreCase() {
        return longString.equalsIgnoreCase(baeldung);
    }

String.matches() :

    @Benchmark
    public boolean benchmarkStringMatches() {
        return longString.matches(baeldung);
    }

String.compareTo() :

    @Benchmark
    public int benchmarkStringCompareTo() {
        return longString.compareTo(baeldung);
    }

Afterwards, we run the tests and display the results:

    Benchmark                        Mode  Cnt    Score     Error  Units
    benchmarkStringCompareTo           ss   10    2.561 ±   0.899  ms/op
    benchmarkStringEquals              ss   10    1.712 ±   0.839  ms/op
    benchmarkStringEqualsIgnoreCase    ss   10    2.081 ±   1.221  ms/op
    benchmarkStringMatches             ss   10  118.364 ±  43.203  ms/op

As always, the numbers speak for themselves. The matches() method takes the longest, as it uses a regex to check for equality.

In contrast, equals() and equalsIgnoreCase() are the best choices.
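Besides being slower, matches() also has different semantics, because its argument is a regex rather than a literal. The sketch below shows a case where the two disagree; the MatchesVsEquals class is a hypothetical wrapper used only for illustration:

```java
// Hypothetical demo class contrasting literal equality with regex matching.
final class MatchesVsEquals {

    // Compares the two Strings character by character.
    static boolean equalsLiteral(String input, String other) {
        return input.equals(other);
    }

    // Treats the second argument as a regex that must match the whole input.
    static boolean matchesRegex(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(equalsLiteral("abc", "a.c"));
        System.out.println(matchesRegex("abc", "a.c"));
    }
}
```

Here, equals() reports false, while matches() reports true, since the dot matches any character. So matches() is a suitable replacement for equals() only when regex semantics are actually wanted.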

3.5. String.matches() vs Precompiled Pattern

Now, let's take a separate look at String.matches() and Matcher.matches(). The first one takes a regexp as an argument and compiles it before executing.

So every time we call String.matches(), it compiles the Pattern:

    @Benchmark
    public boolean benchmarkStringMatches() {
        return longString.matches(baeldung);
    }

The second method reuses the Pattern object:

    Pattern longPattern = Pattern.compile(longString);

    @Benchmark
    public boolean benchmarkPrecompiledMatches() {
        return longPattern.matcher(baeldung).matches();
    }

And now the results:

    Benchmark                    Mode  Cnt    Score     Error  Units
    benchmarkPrecompiledMatches    ss   10   29.594 ±  12.784  ms/op
    benchmarkStringMatches         ss   10  106.821 ±  46.963  ms/op

As we see, matching with precompiled regexp works about three times faster.
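In application code, the usual way to get this benefit is to keep the compiled Pattern in a constant. The following is a minimal sketch of that idiom; the PrecompiledPatternDemo class, the WORD constant and the [a-z]+ regex are all made up for this example:

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing per-call compilation with a reused Pattern.
final class PrecompiledPatternDemo {

    // Compiled once when the class is initialized; Pattern instances are
    // immutable and safe to share between threads.
    private static final Pattern WORD = Pattern.compile("[a-z]+");

    // Compiles the regex on every call.
    static boolean isLowercaseWordCompiling(String input) {
        return input.matches("[a-z]+");
    }

    // Reuses the precompiled Pattern and creates only a new Matcher per call.
    static boolean isLowercaseWordPrecompiled(String input) {
        return WORD.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLowercaseWordCompiling("baeldung"));
        System.out.println(isLowercaseWordPrecompiled("baeldung"));
    }
}
```

Both methods give the same answers; only the amount of work per call differs. Note that a Matcher, unlike a Pattern, isn't thread-safe, which is why a fresh one is created for each call.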

3.6. Checking the Length

Finally, let's compare the String.isEmpty() method:

    @Benchmark
    public boolean benchmarkStringIsEmpty() {
        return longString.isEmpty();
    }

and the String.length() method:

    @Benchmark
    public boolean benchmarkStringLengthZero() {
        // measure the same longString as above so that the two tests are comparable
        return longString.length() == 0;
    }

First, we call them on the longString = "Hello baeldung, I am a bit longer than other Strings in average" String. The batchSize is 10,000:

    Benchmark                  Mode  Cnt  Score    Error  Units
    benchmarkStringIsEmpty       ss   10  0.295 ±  0.277  ms/op
    benchmarkStringLengthZero    ss   10  0.472 ±  0.840  ms/op

Next, let's set longString = "" (an empty string) and run the tests again:

    Benchmark                  Mode  Cnt  Score    Error  Units
    benchmarkStringIsEmpty       ss   10  0.245 ±  0.362  ms/op
    benchmarkStringLengthZero    ss   10  0.351 ±  0.473  ms/op

As we notice, the benchmarkStringLengthZero() and benchmarkStringIsEmpty() methods have approximately the same score in both cases, and the differences are within the error margins. Still, in these runs, calling isEmpty() is marginally faster than checking whether the string's length is zero.

4. String Deduplication

Since JDK 8 (update 20), the string deduplication feature has been available to reduce memory consumption. Simply put, it looks for strings with the same contents so that only one copy of each distinct string value has to be kept in memory.

Currently, there are two ways to handle String duplicates:

  • using the String.intern() manually
  • enabling string deduplication

Let's have a closer look at each option.

4.1. String.intern()

Before jumping ahead, it'll be useful to read about manual interning in our write-up. With String.intern(), we can manually place the reference of a String object in the global String pool.

Then, the JVM can return that reference when needed. From a performance point of view, our application can benefit hugely from reusing the string references from the constant pool.

It's important to know that the JVM String pool isn't local to a thread. Each String that we add to the pool is available to other threads as well.
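The effect of interning can be seen by comparing references. This is a minimal sketch; the InternDemo class and its method name are hypothetical:

```java
// Hypothetical demo class showing what intern() returns.
final class InternDemo {

    static boolean internReturnsPooledInstance() {
        String literal = "baeldung";             // literals are placed in the pool automatically
        String copy = new String("baeldung");    // a distinct object on the heap
        // intern() returns the pooled instance with the same content
        return copy != literal && copy.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(internReturnsPooledInstance());
    }
}
```

The method returns true: the copy is a separate object, but interning it yields the same reference as the literal, so all users of the interned value share one instance.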

However, there are serious disadvantages as well:

  • to maintain our application properly, we may need to set the -XX:StringTableSize JVM parameter to increase the pool size. The JVM needs a restart for a new pool size to take effect
  • calling String.intern() manually is time-consuming. The total cost grows linearly, O(n), with the number of calls
  • additionally, frequent calls on long String objects may cause memory problems

To have some proven numbers, let's run a benchmark test:

    @Benchmark
    public String benchmarkStringIntern() {
        return baeldung.intern();
    }

The output scores are in milliseconds:

    Benchmark               1000    10,000   100,000   1,000,000
    benchmarkStringIntern   0.433   2.243    19.996    204.373

The column headers here represent different iteration counts, from 1000 to 1,000,000. For each iteration count, we have the test performance score. As we notice, the score grows roughly in proportion to the number of iterations.

4.2. Enable Deduplication Automatically

First of all, this option is a part of the G1 garbage collector. By default, this feature is disabled. So we need to enable it with the following JVM options:

 -XX:+UseG1GC -XX:+UseStringDeduplication

It's important to note that enabling this option doesn't guarantee that String deduplication will happen. Also, it doesn't process young Strings. To control the minimal age at which Strings are processed, the -XX:StringDeduplicationAgeThreshold=3 JVM option is available. Here, 3 is the default value.

5. Summary

In this tutorial, we've tried to give some hints on using strings more efficiently in our daily coding.

As a result, we can highlight some suggestions in order to boost our application performance:

  • when concatenating strings, the StringBuilder is the most convenient option that comes to mind. However, with the small strings, the + operation has almost the same performance. Under the hood, the Java compiler may use the StringBuilder class to reduce the number of string objects
  • to convert a value into a string, [some type].toString() (Integer.toString(), for example) works faster than String.valueOf(). Because the difference isn't significant, we can freely use String.valueOf() to avoid depending on the input value type
  • when it comes to string comparison, nothing beats the String.equals() so far
  • String deduplication improves performance in large, multi-threaded applications. But overusing String.intern() may cause serious memory leaks, slowing down the application
  • for splitting the strings we should use indexOf() to win in performance. However, in some noncritical cases String.split() function might be a good fit
  • Using a precompiled Pattern instead of String.matches() improves performance significantly
  • String.isEmpty() is faster than String.length() == 0

Also, keep in mind that the numbers we present here are just JMH benchmark results – so you should always test in the scope of your own system and runtime to determine the impact of these kinds of optimizations.

Finally, as always, the code used during the discussion can be found over on GitHub.