GreenArrow Email Software Documentation

SimpleMH Click and Open Tracking

Overview

GreenArrow Engine offers click and open tracking when the SimpleMH method is used to inject mail. SimpleMH’s click and open tracking facility can be turned on or off on a per-Mail Class basis.

Click tracking rewrites links into a URL that GreenArrow Engine’s HTTP server listens on.

Open tracking inserts tracking images into HTML emails. If images are loaded by the recipient, an open gets registered.

If you’d like to receive notifications about click and open events, the Event Notification System can do this for you.

Click and open data is stored in two tables in GreenArrow Engine’s PostgreSQL database. The data in these tables should be treated as read-only. Here are the table structures:

clickthrough_clicks Table

Column          | Type              | Description
id              | integer           | Primary key for this table.
urlid           | integer           | Primary key of the clickthrough_urls entry this record corresponds to.
clicktime       | integer           | Time, in seconds past the Unix epoch, that the click occurred.
emailaddress    | character varying | Email address of the subscriber who clicked.
html_or_text    | character(1)      | h for an HTML email, or t for a text email.
email_code      | integer           | Value contained in the X-GreenArrow-Click-Tracking-ID header, if present.
email_code_text | character varying | Value contained in the X-GreenArrow-Click-Tracking-ID header, if present.

clickthrough_urls Table

Column | Type                   | Description
id     | integer                | Primary key for this table.
sendid | character varying(100) | SendID of the message that was clicked or opened.
listid | character varying(100) | ListID of the message that was clicked or opened.
url    | text                   | The original URL for links, or an empty string for opens.

Position of Open Tracking Image

When SimpleMH open tracking is enabled, a tracking image will be inserted into the HTML part.

To control the position of the open tracking image, add <opentag/> to your HTML part where you’d like it positioned. This must be done before the closing body tag (</body>). The tracking image is inserted into the first of the following within the HTML part:

  1. In place of the first <opentag> or <opentag/> tag. If more than one of these tags exists, the others are kept in the HTML unaltered.
  2. Immediately before the first </body> tag.
  3. Appended to the end of the HTML.
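
The precedence above can be sketched in Python. This is only an illustration of the three rules, not GreenArrow's implementation: the function name, the placeholder image markup, and the case-insensitive tag matching are all assumptions.

```python
import re

# Hypothetical tracking image markup; the tag GreenArrow actually inserts is not shown in this document.
TRACKING_IMG = '<img src="https://tracking.example.com/open.gif" width="1" height="1" alt="">'

# Case-insensitive matching is an assumption made for this sketch.
_OPENTAG = re.compile(r"<opentag\s*/?>", re.IGNORECASE)
_BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def insert_tracking_image(html: str, img: str = TRACKING_IMG) -> str:
    """Place img in the HTML part using the three-step precedence."""
    # 1. Replace only the first <opentag> or <opentag/>; later ones stay unaltered.
    match = _OPENTAG.search(html)
    if match:
        return html[:match.start()] + img + html[match.end():]
    # 2. Otherwise insert immediately before the first </body>.
    match = _BODY_CLOSE.search(html)
    if match:
        return html[:match.start()] + img + html[match.start():]
    # 3. Otherwise append to the end of the HTML.
    return html + img
```

Calling insert_tracking_image on HTML that contains two <opentag/> tags replaces the first and leaves the second in place, which is why only one <opentag/> should be used per message.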

Dynamic Data

When dynamic data is being used, and you have control over how the URL is structured, it’s possible to reduce database bloat by putting the dynamic data after a question mark. Database entries for URLs containing query strings are truncated at the question mark. The question mark and the query string following it are encoded in the rewritten URL instead. For example, the following URL:

http://server.example.com?query=string&params=included

Would be stored in SimpleMH’s database as:

http://server.example.com

The re-written URL would look like:

http://greenarrow.example.com/click/e72/HZGVmYXVsdDEwMDAxLHQxLGh0dHA6Ly93d3cuZHJoLm5ldA/qP3F1ZXJ5PXN0cmluZyZwYXJhbXM9aW5jbHVkZWQ/scd6c91ef45

As a result, if a URL that’s inserted into a campaign is distinct for each subscriber, and contains subscriber-identification data following a question mark, SimpleMH is able to process this efficiently, and create only a single row in the clickthrough_urls table.

If a URL that’s inserted into a campaign is distinct for each subscriber, and contains subscriber-identification data that does not follow a question mark, then SimpleMH will insert a new row in the clickthrough_urls table for each user. This can lead to database bloat.
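
As a rough illustration of why the query-string form is cheaper, the Python sketch below splits URLs at the first question mark and counts how many distinct values would need a row. The helper name and the sample URLs are assumptions made for this example; it does not reproduce how GreenArrow encodes the rewritten link.

```python
from typing import Tuple


def split_for_tracking(url: str) -> Tuple[str, str]:
    """Return (part stored in the database, part carried in the rewritten link)."""
    base, sep, query = url.partition("?")
    return base, sep + query


# Hypothetical per-subscriber URLs: two use a query string, two use a path segment.
urls = [
    "http://server.example.com/offer?subscriber=1001",
    "http://server.example.com/offer?subscriber=1002",
    "http://server.example.com/offer/1001",
    "http://server.example.com/offer/1002",
]

# The two query-string URLs collapse to one stored value; the path-based ones do not.
stored = {split_for_tracking(u)[0] for u in urls}
print(len(stored))
```

Here the four URLs produce three distinct stored values: one shared by both query-string URLs, and one for each path-based URL.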

Example Query

Here’s an example query that displays opens for SendID a100525:

SELECT * FROM clickthrough_clicks WHERE urlid IN (SELECT id FROM clickthrough_urls WHERE sendid = 'a100525' AND url = '');

The SendID is constructed by concatenating the InstanceID to the Mail Class. For example, if the following headers are present in a message:

X-GreenArrow-MailClass: a
X-GreenArrow-InstanceID: 100525

Then the resulting SendID would be a100525.
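
The following Python sketch ties the SendID rule to the opens lookup. It uses an in-memory SQLite database as a stand-in for GreenArrow Engine's PostgreSQL database, with reduced copies of the two tables and made-up sample rows, so the helper name, the sample data, and the use of SQLite are all assumptions for illustration only.

```python
import sqlite3


def build_send_id(mail_class: str, instance_id: str) -> str:
    """SendID is the Mail Class followed by the InstanceID."""
    return f"{mail_class}{instance_id}"


# In-memory stand-in for the real tables (illustration only, sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clickthrough_urls (id INTEGER PRIMARY KEY, sendid TEXT, listid TEXT, url TEXT);
    CREATE TABLE clickthrough_clicks (id INTEGER PRIMARY KEY, urlid INTEGER, clicktime INTEGER, emailaddress TEXT);
    INSERT INTO clickthrough_urls VALUES (1, 'a100525', 'list1', '');
    INSERT INTO clickthrough_urls VALUES (2, 'a100525', 'list1', 'http://server.example.com');
    INSERT INTO clickthrough_clicks VALUES (10, 1, 1700000000, 'open@example.com');
    INSERT INTO clickthrough_clicks VALUES (11, 2, 1700000100, 'click@example.com');
""")

send_id = build_send_id("a", "100525")

# Opens are the click rows whose URL entry has an empty url for this SendID.
opens = conn.execute(
    "SELECT c.emailaddress FROM clickthrough_clicks c "
    "JOIN clickthrough_urls u ON u.id = c.urlid "
    "WHERE u.sendid = ? AND u.url = ''",
    (send_id,),
).fetchall()
print(send_id, opens)
```

Only the row linked to the empty-URL entry is returned; the row linked to http://server.example.com is a click, not an open.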

Using HTTPS

GreenArrow’s Apache instance listens on TCP ports 80 (HTTP) and 443 (HTTPS) by default.

We recommend using HTTPS on port 443 for click and open tracking. TLS Certificate Configuration shows how to configure HTTPS.

If GreenArrow’s Apache instance is on the same server as another Apache instance that is bound to port 443, one solution is to bind each Apache instance to a specific IP address. Instructions for doing this with GreenArrow Engine’s HTTP server are in the HTTP Server Configuration Document.

Event Tracking Metadata Storage

By default, SimpleMH uses an internal database for tracking recipient email addresses and link URLs. When a click, open, unsubscribe, bounce or spam complaint is received, the data is retrieved from that internal database. For most usages of GreenArrow, this default behavior is acceptable and fast.

When using GreenArrow in a clustered configuration (such as Processing Events on Dedicated Servers), however, this behavior is not desirable, because events triggered by messages delivered from one GreenArrow node might be processed on another GreenArrow node.

This is why we offer multiple options for SimpleMH Event Tracking Metadata Storage.

The system default Event Tracking Metadata Storage is set using default_event_tracking_metadata_storage. However, some systems may have a legacy configuration file /var/hvmail/control/opt.simplemh_stateless_event_handling set to 1. In that case, the default is stateless. The default_event_tracking_metadata_storage directive takes precedence over the legacy configuration file.

Local

This is the default mode for GreenArrow. With Local Metadata Event Tracking, event metadata (such as recipient email address and Click-Tracking-ID) is stored on disk on the GreenArrow node that delivered the email.

These events (clicks, opens, etc.) must be processed by the same node that delivered the message.

This results in short click tracking links that minimally inflate your message size.

Stateless

With Stateless Metadata Event Tracking, SimpleMH embeds the email address, link URL, and other message metadata in the message itself instead of recording it in its database.

Stateless Metadata Event Tracking has two advantages:

  1. It allows you to offload event processing to another GreenArrow server.
  2. It reduces disk space requirements.

The downside of Stateless Metadata Event Tracking is that it increases the average email size, since the message itself is used to store this extra information. For clicks, opens, and unsubscribes, the information is embedded into the link. Configuring simplemh_compress_links can somewhat reduce the length of link URLs. For bounces, the information is inserted into the email as an X-Mailer-Info-Extra header.

Regardless of whether or not Stateless Metadata Event Tracking is configured, repeat bounce counting only takes into account the bounces for the particular server on which they are processed.

Stateless Metadata Event Tracking can be configured in the following ways:

  • On the individual message, provide the following header:

    X-GreenArrow-EventTrackingMetadataStorage: stateless
    

  • On the Mail Class, set Event Tracking Metadata Storage to Stateless Metadata Event Tracking.

  • For all mail classes / messages that don’t otherwise set Event Tracking Metadata Storage, set default_event_tracking_metadata_storage.
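
For the per-message option, the header can be set like any other message header before injection. The Python sketch below builds such a message with the standard library; the addresses, subject, bodies, and the function name are placeholders, and the Mail Class and InstanceID values are simply reused from the SendID example earlier on this page.

```python
from email.message import EmailMessage


def build_message(storage: str = "stateless") -> EmailMessage:
    """Build a message that requests a specific Event Tracking Metadata Storage mode."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"        # placeholder address
    msg["To"] = "recipient@example.com"       # placeholder address
    msg["Subject"] = "Hello"
    msg["X-GreenArrow-MailClass"] = "a"
    msg["X-GreenArrow-InstanceID"] = "100525"
    # Per-message override described in the list above.
    msg["X-GreenArrow-EventTrackingMetadataStorage"] = storage
    msg.set_content("Plain-text body")
    msg.add_alternative("<html><body><p>HTML body</p></body></html>", subtype="html")
    return msg


print(build_message()["X-GreenArrow-EventTrackingMetadataStorage"])
```

The resulting message can then be injected in the usual way; the injection step is omitted here because the host and credentials are site-specific.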

External

External Metadata Event Tracking is an option that lets you provide an external Postgres database to which GreenArrow will connect for its Event Tracking Metadata. This external database is used for the metadata needed to process engine_click and engine_unsub events. For other events (such as engine_open or scomp), this mode is the same as Stateless Metadata Event Tracking.

This mode offers the following advantages over the other options:

  • The tracking metadata is centralized, so emails sent from one GreenArrow node can have their events processed on any other GreenArrow node that is configured to use the same External Metadata Event Tracking database connection.

  • Click tracking links are as short as possible – approximately 70 bytes plus the length of your domain name. This is regardless of the length of the destination URL. The links in this mode look like this:

    https://example.com/click?Pz1HRlKdRPkW1b2Uk8tUfxIz1HRlKdRPkW1b2Uk8tUfxI903b7e002a
    

The downsides of External Metadata Event Tracking include:

  • You’re responsible for managing the external Postgres server, configuring replication for high availability, backing it up, and pruning its dataset.

  • If there’s a service disruption with your external Postgres server, no External Metadata Event Tracking clicks, opens, and unsubscribes will be processed.

To configure External Metadata Event Tracking:

  1. Create a Postgres Database Connection in GreenArrow using either the UI, API, or greenarrow.conf.
  2. Set external_metadata_event_tracking_database to that Postgres Database Connection.
  3. Add the following schema to your external Postgres server:
    create table ga_tracking_data (
      id uuid not null primary key,   -- will be a random uuid
      date date not null,             -- UTC date this entry was created
      data jsonb not null             -- json blob containing the tracking data
    );
    -- create an index for pruning
    create index ga_tracking_data__created_at_idx on ga_tracking_data (date);
    

External Metadata Event Tracking can be applied to messages in the following ways:

  • On the individual message, provide the following header:

    X-GreenArrow-EventTrackingMetadataStorage: external
    

  • On the Mail Class, set Event Tracking Metadata Storage to External Metadata Event Tracking.

If external_metadata_event_tracking_database is not configured, or if it is not reachable during pre-delivery message processing, then messages that would otherwise use External Metadata Event Tracking will instead use Stateless Metadata Event Tracking.

The total number of possible connections to the external_metadata_event_tracking_database can be up to the sum of apache_max_clients + simplemh_max_servers + /var/hvmail/control/opt.simplemh.redis_num_workers.
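
When sizing the external Postgres server, it can help to work out this upper bound for each node and compare it against the server's max_connections setting. The Python sketch below does the arithmetic; the function name and the three example values are hypothetical, so substitute the values from your own configuration.

```python
def max_external_metadata_connections(apache_max_clients: int,
                                      simplemh_max_servers: int,
                                      redis_num_workers: int) -> int:
    """Upper bound on connections one GreenArrow node may open to the external database."""
    return apache_max_clients + simplemh_max_servers + redis_num_workers


# Hypothetical settings for a single node.
per_node = max_external_metadata_connections(150, 50, 20)
print(per_node)  # 220
```

In a cluster, each node sharing the same external database contributes its own bound.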

Pruning External Metadata

GreenArrow does not automatically prune data from the metadata tracking table.

To prune, ensure this index exists:

CREATE INDEX IF NOT EXISTS ga_tracking_data__created_at_idx ON ga_tracking_data (date);

Then, you can prune old data using whatever retention period you’d like:

DELETE FROM ga_tracking_data
  WHERE date < ((NOW() AT TIME ZONE 'UTC') - '90 days'::interval);
VACUUM VERBOSE ga_tracking_data;

Note that a plain VACUUM generally does not return disk space to the operating system. Instead, it marks the space held by deleted rows as available for reuse by the same table.
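
If you schedule pruning from a script, the cutoff can be computed in application code and passed as a query parameter. The Python sketch below shows one way to do that; the function name, the %s placeholder style (used by psycopg-style drivers), and the choice of driver are assumptions, and no database connection is made here.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional

# Parameterized form of the DELETE above; %s is the placeholder style of psycopg-style drivers.
PRUNE_SQL = "DELETE FROM ga_tracking_data WHERE date < %s"


def prune_cutoff(retention_days: int, now: Optional[datetime] = None) -> date:
    """Return the UTC date before which rows would be deleted."""
    now = now or datetime.now(timezone.utc)
    return (now.astimezone(timezone.utc) - timedelta(days=retention_days)).date()


cutoff = prune_cutoff(90, datetime(2024, 6, 25, 20, 2, tzinfo=timezone.utc))
print(PRUNE_SQL, cutoff)
```

Because this compares whole dates, it keeps rows created on the cutoff day itself, which makes it slightly more conservative than the timestamp comparison in the SQL above.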

Troubleshooting External Metadata Database Connections

Errors during delivery

When sending messages, if there is a problem with your External Metadata Database Connection, GreenArrow will fall back and generate the message using Stateless Metadata Event Tracking.

To find out why GreenArrow is falling back to Stateless when you inject using SMTP, run the following command to review the last 10 minutes of failures:

logdir_select_time --last '10 minutes' --dir /var/hvmail/log/simplemh \
  | tai64nlocal \
  | grep 'ERROR EXTERNAL METADATA'

If you inject using the HTTP Submission API (as opposed to SMTP), then you’ll want to run this command instead:

logdir_select_time --last '10 minutes' --dir /var/hvmail/log/simplemh2 \
  | tai64nlocal \
  | grep -E 'ERROR EXTERNAL METADATA|PHP Warning'

Errors during event processing

If GreenArrow cannot connect to the External Metadata Database while processing a click, the end-user will receive an error message. Additional details will be logged to /var/hvmail/apache/logs/error_log.

Run the following command to view only the most recent 50 errors communicating with your External Metadata Database:

cat /var/hvmail/apache/logs/error_log \
  | grep 'ERROR EXTERNAL METADATA' \
  | tail -n 50

These errors will look something like this:

[Tue Jun 25 20:02:17.947168 2024] [php:warn] [pid 40635] [client 127.0.0.1:37599] PHP Warning:  pg_query_params(): Query failed: ERROR:  syntax error at or near &quot;f&quot;\nLINE 1: SELECT data, id x f b FROM ga_test_table WHERE id IN! ($1, $...\n                          ^ in /var/hvmail/webapp/click/click.php on line 157
[Tue Jun 25 20:02:17.947186 2024] [php:notice] [pid 40635] [client 127.0.0.1:37599] error: ERROR EXTERNAL METADATA: original_url=http://test.localhost/click?Pu02-MavNR-G1CLkUQMi8dzvfQOOWJ9Td6BnAPAfsd5c2a01b original_domain=test.localhost request_domain=127.0.0.1: cannot query the external event tracking metadata database

The log message includes:

  • original_url – GreenArrow’s best attempt to determine the original URL that was clicked. If you have a proxy in front of GreenArrow that modifies the request, the actual URL might have been different from this.
  • original_domain – The domain name that was used in the link that was clicked.
  • request_domain – The domain in the request that GreenArrow itself received. This may be different from original_domain if you have a proxy in front of GreenArrow.
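
If you need to summarize these errors, the three fields can be pulled out of each log line with a small script. The Python sketch below assumes the key=value layout of the sample line above, with no spaces inside values; the function name is hypothetical and the parsing is not an official log format specification.

```python
import re

# key=value pairs as they appear in the sample log line (values assumed to contain no spaces).
_FIELD = re.compile(r"\b(original_url|original_domain|request_domain)=(\S+)")


def parse_external_metadata_error(line: str) -> dict:
    """Extract original_url, original_domain and request_domain from one error_log line."""
    if "ERROR EXTERNAL METADATA" not in line:
        return {}
    fields = dict(_FIELD.findall(line))
    # request_domain is followed by ": <message>", so drop the trailing colon.
    if "request_domain" in fields:
        fields["request_domain"] = fields["request_domain"].rstrip(":")
    return fields


sample = (
    "[Tue Jun 25 20:02:17.947186 2024] [php:notice] [pid 40635] [client 127.0.0.1:37599] "
    "error: ERROR EXTERNAL METADATA: "
    "original_url=http://test.localhost/click?Pu02-MavNR-G1CLkUQMi8dzvfQOOWJ9Td6BnAPAfsd5c2a01b "
    "original_domain=test.localhost request_domain=127.0.0.1: "
    "cannot query the external event tracking metadata database"
)
print(parse_external_metadata_error(sample))
```

Comparing original_domain with request_domain across many lines is a quick way to spot a proxy that rewrites the Host header.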

User-Defined Link IDs

If you have a specific instance of a link that you’d like to track separately from other instances of that same URL, you can add the HTML attribute data-ga-linkid="linkid" immediately after the href= attribute.

The Link ID may have a maximum length of 100 UTF-8 characters. Link IDs longer than this maximum length are ignored. Leading and trailing whitespace is trimmed.

Links that have a User-Defined Link ID are tracked separately in the statistics screen and API.

For example:

<a href="https://example.com" data-ga-linkid="link123">Our Great Link</a>

If the link ID is not quoted, it will end at the first space or > after the =, for example:

<a href="https://example.com" data-ga-linkid=link123 data-something-else="foo">A</a>
<a href="https://example.com" data-ga-linkid=link123>B</a>

This data attribute is not removed from the HTML before delivery.
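
To check templates before sending, the Link ID rules can be approximated in a few lines of Python. This is a sketch of the documented rules, not GreenArrow's parser: the function name is hypothetical, trimming before the length check and treating an empty ID as absent are assumptions, and it does not verify that the attribute directly follows href=.

```python
import re
from typing import Optional

# Matches a double-quoted value, or an unquoted value ending at whitespace or '>'.
_LINKID = re.compile(r'data-ga-linkid=(?:"([^"]*)"|([^\s>"]+))')

MAX_LINK_ID_LENGTH = 100


def extract_link_id(a_tag: str) -> Optional[str]:
    """Return the Link ID of an <a> tag, or None if absent, empty, or too long."""
    match = _LINKID.search(a_tag)
    if not match:
        return None
    raw = match.group(1) if match.group(1) is not None else match.group(2)
    link_id = raw.strip()  # leading and trailing whitespace is trimmed
    if not link_id or len(link_id) > MAX_LINK_ID_LENGTH:
        return None        # over-long IDs are ignored
    return link_id


print(extract_link_id('<a href="https://example.com" data-ga-linkid=link123>B</a>'))
```

Running this over the links in a template makes it easy to catch IDs that would be ignored because they exceed 100 characters.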

Skipping Click Tracking

If you have a link for which you’d like to skip click tracking (so the URL will not be rewritten), you can add the HTML attribute data-ga-notrack immediately after the href= attribute (or immediately after the data-ga-linkid= attribute if it is also being used).

For example:

<a href="https://example.com" data-ga-notrack>Our Great Link</a>
<a href="https://example.com" data-ga-linkid="link-123" data-ga-notrack>Our Great Link</a>

This data attribute is not removed from the HTML before delivery.


Copyright © 2012–2024 GreenArrow Email