Email Concepts and Terminology

Table of Contents
Introduction
Background
- Internet Standards
- DNS Basics
Email Basics
- RFC 5321 (SMTP)
- RFC 5322 (Internet Message Format)
GreenArrow Terminology for Some Header Fields
Email Authentication Terms Explained

Introduction

This page is a central point for basic email concepts and terminology used throughout GreenArrow documentation.

Background

Before we can talk about email concepts and terminology, we have to first introduce some background concepts.

Internet Standards

Internet standards, also known as Requests for Comments, or RFCs, define the protocols and languages to be used when data is transmitted across the internet from one location to another. Every time an email message moves from sender to receiver, it does so because both the sending site and receiving site are complying with all the RFCs that define how email should flow.

RFCs are numbered when published and typically referred to by their number, not their name. RFC 1 was published in 1969, and there are over 9000 RFCs today. We’ll mention a few of them in the rest of this document.

RFCs are not necessarily permanent. While some RFCs introduce new concepts or define new protocols, others are written to update existing RFCs, and others still render obsolete previous RFCs and their topics entirely.

DNS Basics

The Domain Name System (DNS) is foundational to email: without working DNS, email cannot function.

The DNS is a distributed database that allows domain owners to announce their domain’s presence on the Internet and direct others to the domain’s website, email servers, and other services the domain may provide. It’s referred to as “distributed” because there is no central source of truth for all domains on the Internet; rather, each domain manages its own store of information and publishes it in a way that allows anyone anywhere to find it.

There are many types of records that can be published in DNS, but the following three types will be most important for email:

MX records, which announce the mail server(s) that accept inbound mail for a domain
A records, which map hostnames (e.g., www.greenarrowemail.com) to IP addresses
TXT records, which provide a way to publish arbitrary information about a domain or a name in the domain

A domain must have either an MX record or an A record in order to receive any email, and for all but the smallest domains, the MX record and A record will point to different servers, with the former handling the inbound mail and the latter typically hosting the domain’s main website. TXT records are not required for email unless the domain wants to have its email authenticated using the protocols and methods described below.
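The MX-then-A rule above can be sketched in a few lines of Python. This is only an illustration: the record tables are hypothetical stand-ins for real DNS lookups, and the function name is our own. It also relies on one detail from RFC 5321 not covered above, namely that each MX record carries a preference number and lower numbers are tried first.

```python
from typing import Dict, List, Tuple

# Hypothetical DNS data for illustration only; a real program would query DNS.
MX_RECORDS: Dict[str, List[Tuple[int, str]]] = {
    "example.com": [(20, "mx2.example.com"), (10, "mx1.example.com")],
}
A_RECORDS: Dict[str, List[str]] = {
    "example.com": ["192.0.2.10"],
    "small.example": ["192.0.2.20"],
}


def inbound_mail_hosts(domain: str) -> List[str]:
    """Return the hosts to try for inbound mail, in order."""
    mx = MX_RECORDS.get(domain, [])
    if mx:
        # Lower preference numbers are tried first.
        return [host for _, host in sorted(mx)]
    if domain in A_RECORDS:
        # No MX record: fall back to the domain's own address record.
        return [domain]
    return []  # Neither record exists, so the domain cannot receive mail.


print(inbound_mail_hosts("example.com"))    # ['mx1.example.com', 'mx2.example.com']
print(inbound_mail_hosts("small.example"))  # ['small.example']
```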

Email Basics

Email is defined by two RFCs:

RFC 5321, which describes the Simple Mail Transfer Protocol (SMTP), the set of commands and responses used during the transmission of an email message from one host to another
RFC 5322, which describes the format of an Internet email message.

Let’s look more closely at both, and how they help us remove ambiguity from references we make elsewhere.

RFC 5321 (SMTP)

The “S” in SMTP stands for “Simple”, and SMTP really is a simple protocol. In fact, there are only ten commands defined in SMTP for the client (the sending host is always called the client, no matter how large it is) to use when transmitting a message to the server (the receiving host is always called the server). Server responses to client commands come in the form of three-digit numbers with optional text; the response codes determine the client’s next command choices, while the text is there purely for the human operators of the client and server to read.

A typical SMTP session might look like this:

=== Connected to server.recipientdomain.com.
<-  220 server.recipientdomain.com ESMTP
 -> EHLO client.example.com
<-  250-server.recipientdomain.com
<-  250-STARTTLS
<-  250-AUTH LOGIN PLAIN
<-  250-AUTH=LOGIN PLAIN
<-  250-PIPELINING
<-  250 8BITMIME
 -> MAIL FROM:<[email protected]>
<-  250 ok
 -> RCPT TO:<[email protected]>
<-  250 ok
 -> DATA
<-  354 go ahead
 -> Date: Mon, 02 Dec 2024 17:24:14 -0500
 -> To: [email protected]
 -> From: Joe Smith <[email protected]>
 -> Subject: Hi, it's me
 -> Message-Id: <[email protected]>
 -> 
 -> Want to get some lunch?
 -> 
 -> 
 -> .
<-  250 ok 1733178257 qp 0
 -> QUIT
<-  221 server.recipientdomain.com
=== Connection closed with remote host.

In the above example, the lines starting with -> are either commands sent by the sending client or the content of the email message, meaning its header fields and body (the lines starting with -> that fall between -> DATA and the line containing only a period, which marks the end of the message). All the lines starting with <- are responses sent by the receiving server.
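To make the reply format concrete, here is a minimal Python sketch that parses one server reply, such as the multi-line answer to EHLO in the session above. The function names are our own, and the handling is deliberately simplified. It assumes the RFC 5321 conventions that a hyphen after the code marks a continuation line and that the first digit of the code indicates the class of the reply.

```python
from typing import List, Tuple


def parse_reply(lines: List[str]) -> Tuple[int, List[str]]:
    """Parse one (possibly multi-line) SMTP reply into (code, text lines)."""
    code = int(lines[0][:3])
    texts = []
    for line in lines:
        if int(line[:3]) != code:
            raise ValueError("reply code changed in the middle of a reply")
        texts.append(line[4:])  # Skip the code and the separator character.
    if lines[-1][3:4] == "-":
        # A hyphen after the code means more lines are still to come.
        raise ValueError("reply is incomplete")
    return code, texts


def reply_class(code: int) -> str:
    """Describe a reply by its first digit."""
    classes = {
        2: "success",
        3: "intermediate",
        4: "temporary failure",
        5: "permanent failure",
    }
    return classes.get(code // 100, "unknown")


ehlo_reply = ["250-server.recipientdomain.com", "250-STARTTLS", "250 8BITMIME"]
code, texts = parse_reply(ehlo_reply)
print(code, reply_class(code), texts)
```

A client acts on the code alone: after DATA, for example, it waits for a 354 (“intermediate”) reply before sending the message content.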

RFC 5322 (Internet Message Format)

The companion RFC to RFC 5321, this document defines the format for internet email messages.

At its base, an internet email message is composed of a series of header fields followed by the body of the message. The generic definition for a message header field is:

Header-Field-Name: Some text that follows rules specific for this header field

Some of the header fields that are most commonly seen by message recipients include From:, To:, Date:, and Subject:. Many others, especially those that track the message’s transit from origination to destination (collectively called “Trace Fields”) and header fields that are associated with Email Authentication, are only ever seen if the recipient takes special steps with their mail reader to display them. For example, someone reading a message using the Gmail web client would follow these instructions.

The message body, on the other hand, is everything that follows the last header field, and its format can be plain text, HTML, images, attachments, or some combination of all of these and more.
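The split between header fields and body can be seen with the email package from Python’s standard library. This is a small sketch: the message mirrors the one in the SMTP example, but the addresses are placeholders chosen for illustration. A blank line separates the last header field from the body.

```python
from email import message_from_string

# Placeholder message; the addresses are illustrative only.
RAW_MESSAGE = (
    "Date: Mon, 02 Dec 2024 17:24:14 -0500\n"
    "To: recipient@recipientdomain.com\n"
    "From: Joe Smith <joe@example.com>\n"
    "Subject: Hi, it's me\n"
    "\n"
    "Want to get some lunch?\n"
)

msg = message_from_string(RAW_MESSAGE)

# Header fields come back as (name, value) pairs in their original order.
for name, value in msg.items():
    print(f"{name}: {value}")

# Everything after the first blank line is the body.
print(msg.get_payload())
```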

GreenArrow Terminology for Some Header Fields

The astute reader may have noticed in the example SMTP transaction above that two different lines contained From: in some form, and this fact, combined with the email industry’s proclivity to use multiple terms to refer to the same thing, can lead to at best ambiguity and at worst confusion and misunderstanding. At GreenArrow, we strive to avoid any ambiguity, and we use the following terms:

RFC5321.MailFrom

This term is used to reference the email address, either in whole or in part, that is passed to the MAIL FROM command during the SMTP session. In our example above, it looked like this:

-> MAIL FROM:<[email protected]>

So [email protected] is the RFC5321.MailFrom address, and example.com (the part after the @ sign) is the RFC5321.MailFrom domain.

Other things to know about this element:

When the message is delivered to the recipient’s inbox, this address should be the value of the RFC5322.Return-Path header field.
This address is sometimes referred to by the terms “envelope sender”, “bounce address”, or “return path address”.
When configured to process bounces, GreenArrow generates a Variable Envelope Return Path (VERP) for the RFC5321.MailFrom, so that each message is sent with a unique Return-Path that is keyed to the intended recipient of the message, allowing for easy processing of bounces.
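The idea behind VERP can be sketched as follows. This is not GreenArrow’s actual encoding, which is not described here; it is a generic scheme assumed for illustration, in which the recipient’s address is folded into the local part of the return path so that a bounce identifies the recipient it belongs to. All names and addresses are hypothetical.

```python
from typing import Tuple


def split_address(address: str) -> Tuple[str, str]:
    """Split an address into (local part, domain) at the last @ sign."""
    local, _, domain = address.rpartition("@")
    return local, domain


def verp_encode(bounce_local: str, bounce_domain: str, recipient: str) -> str:
    """Build a per-recipient return path (illustrative format only)."""
    r_local, r_domain = split_address(recipient)
    return f"{bounce_local}+{r_local}={r_domain}@{bounce_domain}"


def verp_decode(return_path: str) -> str:
    """Recover the original recipient from a return path built above."""
    local, _ = split_address(return_path)
    _, _, encoded = local.partition("+")
    r_local, _, r_domain = encoded.rpartition("=")
    return f"{r_local}@{r_domain}"


path = verp_encode("bounces", "example.com", "jane@recipientdomain.com")
print(path)               # bounces+jane=recipientdomain.com@example.com
print(verp_decode(path))  # jane@recipientdomain.com
```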

RFC5322.From

This term is used to refer to the contents of the From: header field of the email message itself. Again, from our example above:

-> From: Joe Smith <[email protected]>

As you can see, the RFC5321.MailFrom address and the RFC5322.From address are not the same, and this is not at all uncommon, especially for email messages that are sent in bulk or are transactional in nature (e.g., purchase receipts, password resets, or banking statement notifications). The RFC5321.MailFrom address for such mail is typically a destination for bounces and other delivery status messages, and so it’s usually a mailbox dedicated solely to such messages. The RFC5322.From address, on the other hand, is meant to be shown to the recipient, and so will likely be a name or title and email address that the recipient will recognize (assuming that the mail is wanted).

Two notes about RFC5322.From:

RFC5322.From is also known as the “From header”, but that term is ambiguous and often prompts the question “Which From do you mean?”
The part between From: and the email address in RFC5322.From (“Joe Smith”, in our example) is known as the “Friendly From”. Many large mailbox providers only show the Friendly From to recipients if the email address is in the recipient’s address book or the recipient has exchanged email with the address in the past.
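As a short illustration, Python’s standard library can separate the Friendly From from the address in an RFC5322.From value. The address below is a placeholder.

```python
from email.utils import parseaddr

# Placeholder From: value for illustration.
from_value = "Joe Smith <joe@example.com>"

friendly_from, address = parseaddr(from_value)
from_domain = address.rpartition("@")[2]

print(friendly_from)  # Joe Smith
print(address)        # joe@example.com
print(from_domain)    # example.com
```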

RFC5322.Return-Path

This term is used for the Return-Path header field in an email message, and its value is usually (but not always) the same as the value that was passed to the MAIL FROM command in the SMTP transaction that delivered the message. From our example above, the RFC5322.Return-Path and RFC5322.From header fields might look like this:

Return-Path: <[email protected]>
...
From: Joe Smith <[email protected]>
...

RFC5322.Sender

This term refers to the Sender: header field that can be present in messages that were sent by a person or entity other than the identity in the RFC5322.From header field. RFC 5322 and its predecessors contain an illustrative example of a secretary sending on behalf of another person, with the mailbox of the secretary then appearing in the Sender: header field. In cases where the mailbox of the entity that originated the message (either a person, system, or process) is the same as that shown in the RFC5322.From header field, the RFC5322.Sender header field should not be present. Moreover, even if present, this header field is rarely relevant.

We strongly discourage use of the header field and this term. The fact that it is sometimes used synonymously with the RFC5322.From header field by others even though they are two distinct header fields leads to both imprecision and confusion in our view. These are avoided by simply not using the term.

Email Authentication Terms Explained

In addition to the terms and concepts above, we think it’s important for our customers to at least have a conversational knowledge of the following terms related to email authentication.

Sender Policy Framework (SPF)

Defined in RFC 7208, this protocol provides a way for domain owners to authorize servers and networks to use their domain in the RFC5321.MailFrom identity of an email message. This identity is almost always the value passed to the SMTP MAIL FROM command, except in the special case when that value is just <>. In that case, and that case only, the RFC5321.MailFrom identity as it pertains to SPF is the name postmaster, followed by the @ sign, followed by the HELO (or EHLO) identity.

For example, if our SMTP session above had started out like this:

=== Connected to server.recipientdomain.com.
<-  220 server.recipientdomain.com ESMTP
 -> EHLO client.example.com
<-  250-server.recipientdomain.com
<-  250-STARTTLS
<-  250-AUTH LOGIN PLAIN
<-  250-AUTH=LOGIN PLAIN
<-  250-PIPELINING
<-  250 8BITMIME
 -> MAIL FROM:<>

then the RFC5321.MailFrom identity for SPF would’ve been postmaster@client.example.com, because the EHLO identity here was “client.example.com”.
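The rule for choosing the SPF identity can be written as a small function. This is a sketch with a function name of our own choosing; it covers only the selection of the identity, not the SPF check itself, and the non-empty address is a placeholder.

```python
def spf_mailfrom_identity(mail_from: str, helo: str) -> str:
    """Return the RFC5321.MailFrom identity that SPF evaluates."""
    address = mail_from.strip().strip("<>")
    if address == "":
        # Null sender (MAIL FROM:<>): use postmaster@ plus the HELO/EHLO name.
        return f"postmaster@{helo}"
    return address


print(spf_mailfrom_identity("<>", "client.example.com"))
# postmaster@client.example.com
print(spf_mailfrom_identity("<bounces@example.com>", "client.example.com"))
# bounces@example.com
```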

DomainKeys Identified Mail (DKIM)

DKIM allows a domain to take responsibility for a message in a way that can be verified by the receiving site. This is done by inserting a header field into the message that contains a cryptographic hash of the message body, a cryptographic signature covering selected header fields, and enough information to allow the receiving site to attempt to validate both.

GreenArrow can be configured to automatically sign messages based on the RFC5322.From address or other factors.

The header field is called the DKIM-Signature header field, and a typical such header field will look like this:

DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
  d=gmail.com; s=20230601; t=1732659201; x=1733264001; 
  h=to:subject:message-id:date:from:in-reply-to:references:mime-version
  :from:to:cc:subject:date:message-id:reply-to;
  bh=6jfZ05WYC3Sms8Cf9Q2doZEOzTplahhdFCzhREKmlAk=;
  b=jleo8wTJq0NKWmZ3vuNgCVewgakV1ys+gCKGkvSrOPna6qkE9FYhdH1fQsklUcAsrD
   R69Fp/fPuijyCH63epN6Oknx5Gvno16LFrVNp2EAlcaSws8z3UlTHdn1A1K/ppx7VlmP
   8y2yrZXeUclUixypxsnwm6jXN32zF6JwJOrVJt4Dbxd2EiNsJ7zRCL9yyf5IaP+B3637
   qrAG+eyWcDFqzXY9WZelKEFMZojpXvjfjdG6x0rEOA2suvWLAVeuyItxpYX/E3e9JqqC
   V1uw33lwvA0fnryIupH2dYwqhm4siZePEdp2PhJHVA9moaR04rGobUhIGq3SBhi356X4
   80Og==

The header field is a series of tag-value pairs, with the most interesting ones being these:

bh=	The hash of the body of the message
b=	The signature, computed over a hash of some of the message’s header fields, including the DKIM-Signature header field itself
h=	Lists the other header fields used (in addition to the DKIM-Signature header field) to calculate the value of the b= tag
d=	The domain that signed the message
s=	The selector name

DKIM uses public-key cryptography, a technology that relies on a pair of cryptographic keys, to sign the message. One of these keys is kept private by the signer, and the other is published in DNS. The receiving site uses the public key for its validation, and the s= and d= values in the DKIM-Signature header field tell the receiving site where in DNS to look for that public key.

The public key for the signer is always a DNS TXT record, and its location is always:

<selector_name>._domainkey.<signing_domain>

So, for our DKIM-Signature header field example here we’d find the public key at

20230601._domainkey.gmail.com
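The lookup name can be derived mechanically from the header field. The sketch below uses an abbreviated copy of the example header (the h= list is shortened and the b= tag is left out for brevity). The parser is a simplification of our own that strips all whitespace from tag values, which is fine for the tags used here but is not a full implementation of the DKIM tag-list grammar.

```python
import re
from typing import Dict

# Abbreviated DKIM-Signature value based on the example above.
HEADER_VALUE = (
    "v=1; a=rsa-sha256; c=relaxed/relaxed;\n"
    "  d=gmail.com; s=20230601; t=1732659201; x=1733264001;\n"
    "  h=to:subject:message-id:date:from;\n"
    "  bh=6jfZ05WYC3Sms8Cf9Q2doZEOzTplahhdFCzhREKmlAk=;\n"
)


def parse_tag_list(value: str) -> Dict[str, str]:
    """Split a 'tag=value; tag=value' list into a dictionary."""
    tags = {}
    for part in value.split(";"):
        name, sep, val = part.partition("=")  # Split at the first "=" only.
        if sep:
            tags[name.strip()] = re.sub(r"\s+", "", val)
    return tags


def dkim_key_name(tags: Dict[str, str]) -> str:
    """Build the DNS name where the signer's public key is published."""
    return f"{tags['s']}._domainkey.{tags['d']}"


tags = parse_tag_list(HEADER_VALUE)
print(dkim_key_name(tags))  # 20230601._domainkey.gmail.com
```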

DKIM was developed after SPF, as an answer to some of the shortcomings of SPF. SPF is what’s known as a path-based authentication protocol, meaning that its authorization check depends on the path that a message took from its origination point to its destination. This means that SPF is prone to failure when mail takes an indirect path to its destination, as for example when it’s sent to an address that immediately forwards it to another domain. DKIM, on the other hand, is what’s known as a content-based authentication protocol. DKIM-signed messages can still pass authentication checks regardless of the path the message took; so long as the signed parts haven’t been altered, the message should pass.

Domain-based Message Authentication, Reporting, and Conformance (DMARC)

DMARC is an authentication mechanism to check whether or not a domain owner authorized use of its domain in the RFC5322.From header field of an email message. DMARC relies on the concept of “Domain Alignment”, where two domains are said to be in alignment either if they’re identical (a.k.a., “strict alignment”) or if they have the same “Organizational Domain” (a.k.a., “relaxed alignment”). A domain’s “Organizational Domain” is typically the domain name that was registered by an organization in order to establish a presence on the Internet; for example, “greenarrowemail.com” is one of GreenArrow’s Organizational Domains.

Domain owners participate in DMARC by publishing a DNS TXT record. That record can contain a number of tag-value pairs, but two are most important: one lets the domain owner request how receivers should treat mail that uses the domain and fails DMARC, and the other specifies a mailbox for receiving regular reports from mailbox providers containing data about authentication results for the domain.
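In the DMARC specification (RFC 7489), the record is published at the name _dmarc.<domain>, the requested treatment is carried in the p= tag, and the reporting mailbox in the rua= tag. The following sketch reads those two tags from a record; the record text and mailbox are made up for illustration, and the function names are our own.

```python
from typing import Dict

# Hypothetical DMARC record for illustration.
EXAMPLE_RECORD = "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com"


def dmarc_record_name(domain: str) -> str:
    """Return the DNS name where a domain's DMARC record is published."""
    return f"_dmarc.{domain}"


def parse_dmarc_record(txt: str) -> Dict[str, str]:
    """Split a DMARC TXT record into its tag-value pairs."""
    tags = {}
    for part in txt.split(";"):
        name, sep, value = part.partition("=")
        if sep:
            tags[name.strip()] = value.strip()
    return tags


record = parse_dmarc_record(EXAMPLE_RECORD)
print(dmarc_record_name("example.com"))  # _dmarc.example.com
print(record["p"])                       # quarantine
print(record["rua"])                     # mailto:dmarc-reports@example.com
```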

DMARC is not an authentication protocol per se; rather, it relies on DKIM and SPF authentication to determine if the RFC5322.From uses an authorized domain. In order for DMARC validation to pass, one of the following must be true:

The message must pass DKIM validation checks, and the DKIM signing domain (the d= domain in the DKIM-Signature header field) must be aligned with the RFC5322.From domain, OR
The message must pass SPF validation checks, and the RFC5321.MailFrom domain must be aligned with the RFC5322.From domain.
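The two conditions above can be sketched as a small decision function. Two simplifications are assumed and should not be used in production: the Organizational Domain is approximated as the last two labels of a name (real implementations consult the Public Suffix List), and a single strict/relaxed switch is applied to both DKIM and SPF, although DMARC lets the domain owner set them separately. The function names and domains are hypothetical.

```python
def organizational_domain(domain: str) -> str:
    """ASSUMPTION: approximate the Organizational Domain as the last two labels."""
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def aligned(domain_a: str, domain_b: str, strict: bool = False) -> bool:
    """Check strict (identical) or relaxed (same Organizational Domain) alignment."""
    a = domain_a.lower().rstrip(".")
    b = domain_b.lower().rstrip(".")
    if strict:
        return a == b
    return organizational_domain(a) == organizational_domain(b)


def dmarc_passes(from_domain: str,
                 dkim_pass: bool, dkim_domain: str,
                 spf_pass: bool, mailfrom_domain: str,
                 strict: bool = False) -> bool:
    """DMARC passes if DKIM or SPF both passes and aligns with RFC5322.From."""
    dkim_ok = dkim_pass and aligned(dkim_domain, from_domain, strict)
    spf_ok = spf_pass and aligned(mailfrom_domain, from_domain, strict)
    return dkim_ok or spf_ok


# DKIM passes with an aligned d= domain; SPF fails. DMARC still passes.
print(dmarc_passes("example.com", True, "mail.example.com",
                   False, "bounces.other.example"))  # True
```

Because only one of the two mechanisms has to pass with alignment, a forwarded message that fails SPF can still pass DMARC on the strength of its DKIM signature.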