Demystifying URL Structure: A Comprehensive Guide to Understanding and Interpreting URLs
You use them every day and probably don’t think about them much: URLs.
https://reliable.codes is a URL that, if you open it, loads
this website in your browser.
Besides this very common form, there is a lot more to URLs,
which we will dive into in this article.
URLs: Navigate through the WEB #
URL stands for Uniform Resource Locator. The basic idea is to have a string
that identifies a particular piece of information (a resource) in some application, so it
can be loaded whenever needed.
A resource can be almost anything: a text, an image, a video,
some audio or a font to render text (just some examples; there are
many more).
Deciphering the Anatomy of a URL #
A URL is “easily” understandable. It has a fixed layout. So as long as you know
what options you have, you can read URLs and get information out of them without
even opening them.
Here’s a full-featured example of what a URL can look like:
https://user:password@subdomain.domain.tld:1234/some/path?query=parameter#anchor
The components of the layout are separated by special characters and have a fixed order, so let’s walk through them.
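To make the layout tangible, here is a minimal sketch that splits a URL into the components discussed in this article, using Python's standard `urllib.parse` module. The URL is assembled from the component values named in the sections below, and the dictionary keys are simply this article's terms, not official names.

```python
from urllib.parse import urlsplit

# Example URL assembled from the component values used in this article.
url = "https://user:password@subdomain.domain.tld:1234/some/path?query=parameter#anchor"
parts = urlsplit(url)

# Map the article's terms to what the standard library extracts.
components = {
    "protocol": parts.scheme,
    "authentication": f"{parts.username}:{parts.password}",
    "host": parts.hostname,
    "port": parts.port,          # an int, or None if the URL has no port
    "path": parts.path,
    "query": parts.query,
    "anchor": parts.fragment,
}

for name, value in components.items():
    print(f"{name:>14}: {value}")
```

Note that the standard library calls the protocol "scheme" and the anchor "fragment"; those are the terms you will usually find in technical documentation.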
Protocol #
Every URL must specify the protocol (also called the scheme). It is the first word,
everything before ://, so in the example above it is https.
The protocol tells your system which application should process this URL.
See below for more details and examples.
Authentication #
URLs can provide authentication information to access protected data. The
authentication information is optional. If present, this component goes
directly after :// and runs until the first @, so in the example above it is
user:password.
The authentication information can consist of just a username, or of a username and a password.
If both are provided, the : separates the two values. In the given
example, both are provided: the username is user and the password is
password.
If you just have a username, you will not see the : in this section (e.g.
https://user@...).
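The cases described above (both values, username only, nothing at all) can be checked with a small sketch. The helper name `credentials` and the host `example.com` are assumptions made for illustration only.

```python
from urllib.parse import urlsplit

def credentials(url):
    """Return (username, password); a missing value is reported as None."""
    parts = urlsplit(url)
    return parts.username, parts.password

print(credentials("https://user:password@example.com"))  # both provided
print(credentials("https://user@example.com"))           # username only
print(credentials("https://example.com"))                # no authentication
```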
Host #
Next, the URL specifies whom to talk to. The host can be read as a single value, but for completeness I’ll break it down.
The host comes directly after the protocol (or the authentication information, if present) and
runs until the first : (see port), / (see path), ? (see query parameters) or
# (see anchor). In the given example it is subdomain.domain.tld.
To get all details out of the host, you simply split it at the dots. For a public domain name you
will end up with at least 2 words, but there can be more. (A host can also be a plain IP address or a local name such as localhost, which does not follow this pattern.) These words can be seen
as “groups”: the first word is the most specific one and every word
after it has a bigger, more generic context (see top level domain for more
information).
- subdomain
- domain
- tld
Subdomain (optional) #
Every word before the second-to-last one is a so-called subdomain. In the given
example we have “subdomain” as our one and only subdomain.
If you own a domain, you can create subdomains for free. This way, you can host multiple applications under the same name.
Second level domain #
The second-to-last word is the second-level domain. In the given example it
is domain. The combination of second-level and top-level domain forms what is usually
called the “domain”. Each “domain” can exist only once and has one owner.
If you want to have your own website, one of the first
things to think about is your domain name. The second-level domain is a
value you choose yourself.
Top Level Domain #
The last word is the top-level domain. In the given example it is tld.
Top-level domains are a fixed set of values. In the early
days there were country-code TLDs for each country (e.g. de, it) plus a handful of generic ones (e.g. com, org, net); nowadays
we see many more generic top-level domains (gTLDs) like codes, and some country-code TLDs such as io are used as if they were generic.
Even though there are now many more top-level domains, you can only pick from what exists;
you cannot create new ones.
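The splitting rule from the host section can be sketched in a few lines. This is a deliberately naive illustration of the rule as described here: it assumes a plain domain name and does not handle IP addresses or multi-part suffixes such as co.uk. The function name `split_host` is made up for this example.

```python
def split_host(host):
    """Split a host name at the dots, following the article's naive rule."""
    labels = host.split(".")
    if len(labels) < 2:
        raise ValueError(f"expected at least two dot-separated words: {host!r}")
    return {
        "subdomains": labels[:-2],    # everything before the second-to-last word
        "second_level": labels[-2],   # the second-to-last word
        "tld": labels[-1],            # the last word
    }

print(split_host("subdomain.domain.tld"))
print(split_host("reliable.codes"))
```

For real-world hosts, a lookup against a maintained list of public suffixes is the more reliable approach.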
Port #
Once it’s clear which computer to talk to, it must be clarified on which port
to talk to it. Port numbers range from 0 to 65535 (port 0 is reserved and not used for normal connections).
Every running program can take one or more of these ports,
but normally each port can only be taken by one program at a time.
Ports are counted per IP address (and per transport protocol, such as TCP or UDP). Computers with multiple IPs have more ports as a result.
URLs can provide a port, but it is optional. If present, this component
goes directly after the host (which in this case ends with a :) and runs until the first
/ (see path), ? (see query parameters) or # (see anchor), so in the example
above it is 1234.
Since defining a port in the URL is optional, there is a public agreement on
default ports depending on the protocol used, see below.
Examples of default ports #
Here are some commonly used default ports; there are many more:
| Protocol | Default Port |
|---|---|
| HTTP | 80 |
| HTTPS | 443 |
| FTP | 21 |
| SSH | 22 |
| SFTP | 22 |
| SCP | 22 |
In practice this means: if you have a URL like https://reliable.codes, port 443
will be used, since it’s https and nothing else is specified.
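The fallback to a default port can be sketched as a simple lookup. The table below only contains the protocols listed above, and `effective_port` is a hypothetical helper name; real clients know many more defaults.

```python
from urllib.parse import urlsplit

# Default ports from the table above (not a complete list).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "ssh": 22, "sftp": 22, "scp": 22}

def effective_port(url):
    """Return the explicit port if the URL has one, else the protocol default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)  # None for protocols not in the table

print(effective_port("https://reliable.codes"))                       # falls back to the https default
print(effective_port("https://subdomain.domain.tld:1234/some/path"))  # explicit port wins
```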
Path #
URLs can provide a path, which goes directly after the host or port
(if specified) and runs until the first ? (see query parameters) or # (see anchor),
so in the example above it is /some/path.
The path is always defined; by default it is an empty string or a /.
This information helps the application that handles the request to respond with the expected information.
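A short sketch to illustrate the default cases: the path is an empty string when the URL ends right after the host, and a / when the URL ends with a slash. The helper names `path_of` and `path_segments` are assumptions made for this example.

```python
from urllib.parse import urlsplit

def path_of(url):
    """Return the path component exactly as it appears in the URL."""
    return urlsplit(url).path

def path_segments(url):
    """Return the non-empty parts of the path, split at the slashes."""
    return [segment for segment in path_of(url).split("/") if segment]

for url in (
    "https://reliable.codes",
    "https://reliable.codes/",
    "https://subdomain.domain.tld:1234/some/path?query=parameter#anchor",
):
    print(repr(path_of(url)), path_segments(url))
```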
Query parameters #
URLs can provide query parameters, but they’re optional. If present, this
component goes directly after the path, introduced by a ?, and runs until the first # (see anchor), so
in the example above it is query=parameter.
You can have multiple query parameters; they are separated by &. Each
query parameter is a combination of a key and a value, which are separated
by =. So in the example above I have one query parameter with the key
query and the value parameter.
This information helps the application that handles the request
to be more precise in its response. Query parameters are often used to
describe some form of filter.
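Here is a minimal sketch of reading several query parameters with the standard library. The additional parameters `color` and `size` are invented for illustration; they stand in for the kind of filters mentioned above.

```python
from urllib.parse import urlsplit, parse_qsl

# "color" and "size" are hypothetical filter parameters.
url = "https://subdomain.domain.tld/some/path?query=parameter&color=red&size=m#anchor"

# parse_qsl splits at "&" and "=" and returns (key, value) pairs in order.
pairs = parse_qsl(urlsplit(url).query)

for key, value in pairs:
    print(f"{key} = {value}")
```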
Anchor #
URLs can provide an anchor, optionally. If present, this component is
the last one, introduced by a #, so in the example above it is anchor.
Websites use the anchor to tell your browser where to scroll to after
the page has loaded. For this to work, the website needs to define these anchors
in its HTML code.
In this article, for example, every headline is such an anchor. So in the menu
you can click a particular headline and your browser will scroll to that section.
If you look closely, you’ll notice that the menu is “just” anchor links.
Benefit: you can share these links with your friends so they do not need to read
everything, but land directly on what is interesting for them.
When only the anchor changes, browsers do not reload the page when a link is clicked. Instead, they scroll to the new position.
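The "only the anchor changes" check can be approximated by comparing two URLs with their anchors removed. This is a rough sketch of the idea, not how any particular browser implements it; the function name `same_page` and the article paths are hypothetical.

```python
from urllib.parse import urldefrag

def same_page(url_a, url_b):
    """True if the two URLs differ at most in their anchor."""
    return urldefrag(url_a).url == urldefrag(url_b).url

# Hypothetical article URLs, used only for illustration.
print(same_page("https://reliable.codes/article#port", "https://reliable.codes/article#path"))
print(same_page("https://reliable.codes/article#port", "https://reliable.codes/other#port"))
```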
Beyond the Browser: Application specific URLs #
URLs came up with the World Wide Web. Over time, their usage widened. Here are some
more use cases where the standard format is not fully met (these are just some examples;
there are more):
E-Mails #
You can “generate” e-mails from websites; see the @ button in the sharing section
below the article. Such links will of course not send an e-mail directly, but they
give your e-mail client some context.
Here’s a full-featured example of what a mailto URL can look like:
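mailto:some@friend.com?body=Nice%20page%3A%20https%3A%2F%2Freliable.codes&subject=I%20found%20a%20nice%20IT%20page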
The protocol in this case is mailto (followed by just a : instead of ://, because a mailto URL has no host part in the usual sense),
the user is some, the domain is friend.com and we have two query parameters, which are body
and subject.
The values of the query parameters are hard to read because they are encoded. If you
decode them, you end up with Nice page: https://reliable.codes for body and
I found a nice IT page for subject.
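The decoding step can be sketched as follows. The mailto URL below is reconstructed from the values described in this section, so its exact encoding is an assumption; the decoded result is what matters.

```python
from urllib.parse import urlsplit, parse_qsl

# Reconstructed from the values described above; the exact encoding is an assumption.
mailto = (
    "mailto:some@friend.com"
    "?body=Nice%20page%3A%20https%3A%2F%2Freliable.codes"
    "&subject=I%20found%20a%20nice%20IT%20page"
)

parts = urlsplit(mailto)
# Without "//" there is no host component, so the address ends up in the path.
user, domain = parts.path.split("@")
# parse_qsl also decodes the percent-encoded values.
params = dict(parse_qsl(parts.query))

print(parts.scheme, user, domain)
print(params["subject"])
print(params["body"])
```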
Phone calls #
If you want your visitors to be able to call you easily, you can do that too. Here’s a
full-featured example of what a tel URL can look like:
tel:0123456789
This is probably the simplest type of URL. The protocol in this case is tel
(again followed by just a : instead of ://), and what takes the place of the domain is the phone number, 0123456789.
Custom applications #
Applications that you install on your device can register their own protocols.
WhatsApp URL Schema #
If you want your visitors to easily share your page via WhatsApp, here’s a
full-featured example of what a WhatsApp URL can look like:
whatsapp://send?text=Nice%20page%3A%20https%3A%2F%2Freliable.codes
The protocol in this case is whatsapp, the domain is send and we have
one query parameter, which is text.
The value of the query parameter is hard to read because it is encoded. If you
decode it, you end up with Nice page: https://reliable.codes.
Telegram URL Schema #
If you want your visitors to easily share your page via Telegram, here’s a
full-featured example of what a Telegram URL can look like:
tg://msg_url?url=https%3A%2F%2Freliable.codes&text=Nice%20page
The protocol in this case is tg (Telegram), the domain is msg_url and we have
two query parameters, which are url and text.
The values of the query parameters are hard to read because they’re encoded. If you
decode them, you end up with https://reliable.codes for url and Nice page
for text.
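Going the other way, from readable values to an encoded share link, can be sketched like this. The function names are made up for illustration, and the URL layouts simply follow the WhatsApp and Telegram examples described in this article rather than any official specification.

```python
from urllib.parse import urlencode, quote

def whatsapp_share_url(text):
    """Build a whatsapp:// share link with a percent-encoded text parameter."""
    return "whatsapp://send?" + urlencode({"text": text}, quote_via=quote)

def telegram_share_url(url, text):
    """Build a tg:// share link with percent-encoded url and text parameters."""
    return "tg://msg_url?" + urlencode({"url": url, "text": text}, quote_via=quote)

print(whatsapp_share_url("Nice page: https://reliable.codes"))
print(telegram_share_url("https://reliable.codes", "Nice page"))
```

Passing `quote_via=quote` encodes spaces as %20; the default of `urlencode` would use + instead, which not every application decodes the same way.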
Conclusion: Navigating the World of URLs #
URLs started as a way to link between web pages and evolved into the navigator
of the web. You can use them in websites and e-mails to improve the user
experience.
It is nice that we have this tool at hand, but there’s still a long way to go until everyone involved has understood how to use it. Many businesses still just write their phone number as text in crazy formats or show their e-mail address as an image (to prevent spam?). What a pity.
I hope this article motivated you to dive deeper into this topic and rethink how your project can benefit from this idea.