The Java programming language is about to turn 30 years old. It has been a widely successful language. One of the keys to its success, in particular in the enterprise world, is its commitment to backwards compatibility. If you take a piece of Java code written three decades ago, changes are that it will still compile and run today. Quite an impressive feat that sets Java apart from other languages.

This commitment to backwards compatibility is a double-edged sword. It means that the Java language has to carry a lot of baggage from the past, including some quirks and bugs. Despite the brilliance of its creators, not all the initial decisions baked into the early releases of Java have resisted the test of time, and some of them are very questionable.

One of the areas where this is particularly evident is the java.net.URL class. This class was introduced as part of the standard library in Java 1.0, and it is signed by James Gosling himself. The fact that a new language included from the beginning a class to handle URLs is a testament to the vision of its creators and it shows how Java was designed from the ground up to be a networked language.

However, the java.net.URL class is weird. It is not just weird, it is broken. Oracle itself acknowledged that in 2004, while admitting that it could not be changed.

But how exactly is it broken? Let’s take a look at the following code:

String siteA = "https://site-a.berrueta.net/";
String siteB = "https://site-b.berrueta.net/";
String siteC = "https://site-c.berrueta.net/";

@Test
public void stringsAreDifferent() {
    assertNotEquals(siteA, siteB);
    assertNotEquals(siteA, siteC);
    assertNotEquals(siteB, siteC);
}

@Test
public void urlsAreDifferent() throws Exception {
    assertNotEquals(new URL(siteA), new URL(siteB));
    assertNotEquals(new URL(siteA), new URL(siteC));
    assertNotEquals(new URL(siteB), new URL(siteC));
}

@Test
public void urisAreDifferent() throws Exception {
    assertNotEquals(new URI(siteA), new URI(siteB));
    assertNotEquals(new URI(siteA), new URI(siteC));
    assertNotEquals(new URI(siteB), new URI(siteC));
}

The tests are straightforward. They define three strings. Then they assert that those strings are different. Then they create URLs and URIs from those strings, and they assert that those URLs and URIs are different too. What do you think the output of these tests will be?

As you would expect, the strings are different. The URIs are also different. However, the test case about the URLs is a box of surprises. If you run the test in your computer, the test may pass or not depending on:

  • Whether you are connected to the internet or not.
  • Whether I decide to make them pass or to make them fail.
  • The time of the day. They may pass now and fail later, or vice-versa.
  • Pure luck.

All of the above are true. The reason is that the equals method of the java.net.URL class tries to resolve the hostnames into IP addresses, and compare those IP addresses. If the hostname cannot be resolved (for example, because you do not have access to the internet), the equals method will forget about comparing the IP addresses and instead will compare the hostnames, which are different in this case. But if the hostname can be resolved, then the outcome of the test will depend on whether the hostnames resolve to the same IP address or not. And I, as the owner of the berrueta.net domain, control that. Therefore, I can make the test pass or fail at will. For example, at the moment of writing this blogpost, I have configured the DNS so that site-a.berrueta.net and site-b.berrueta.net resolve to different IP addresses, but site-c.berrueta.net resolves to the same IP address as site-b.berrueta.net. Therefore, the test fails (if you are connected to the internet when you run it) because this assertion fails:

assertNotEquals(new URL(siteB), new URL(siteC));

It gets even more interesting. The mapping of DNS domain names to IP addresses can change over time. For example, I can change the DNS configuration so that site-c.berrueta.net resolves to a different IP address than site-b.berrueta.net. DNS changes take a while to propagate and are cached, for example by your operating system. Therefore, the test may pass now, but fail later. Or vice-versa. Or it may pass in one computer and fail in another at the same time, just because of the DNS cache.

And what about pure luck? Well, DNS domain names can be configured to resolve to multiple IP addresses. This is called round-robin DNS, and it is often used for load balancing purposes. So if I configure site-c.berrueta.net to resolve to two different IP addresses, the test may pass or fail depending on which one of the two IP addresses is returned first by the DNS server. And that is pure luck.

There are other ripple effects of this decision to resolve the hostnames into IP addresses as part of the implementation of equals. For example, that means the equals method of the URL class performs blocking I/O operations. It may take a long time to compare two URLs if the DNS server is slow. Another consequence is that it is in order to keep the implementation hashCode congruent with equals, the hashCode method of the URL class also performs blocking I/O operations. Adding a URL object to a HashSet or as the key of a HashMap may produce very strange results.

There are other problems with this class, for example around decoding some URL-encoded characters. As a partial solution, since Java 1.4 the standard library includes the java.net.URI class, which does not have the same problems and can be used in many cases as a replacement of java.net.URL. Since Java 20, some of the methods and constructors of java.net.URI are deprecated and the recommended alternative is to use an indirections via java.net.URI.

My advice to avoid the problems with java.net.URL is twofold:

  1. Use java.net.URI instead of java.net.URL whenever possible, even to represent URLs and not URIs.

  2. Since you cannot control whether your dependencies use URL or URI, always avoid “real” DNS domain names for testing. As tempting it may be to use google.com or disney.com in your tests (in particular in integration tests), you should avoid them. Instead, use a domain name that you control, or even better, use one of the reserved domain names for testing, like example.com. These are specified in RFC 6761.