Disk Usage
- Table of Contents
- Overview
- The Report Hierarchy
- Performance Considerations
- Options
- Disk Space Reclamation Options
- Usage Categories
Overview
The greenarrow disk_usage command is used to generate reports on GreenArrow’s disk usage.
Specific filesystem paths and PostgreSQL table names are shown by greenarrow disk_usage reports when the --details option is used.
Deleting GreenArrow’s data without following procedures documented on this site can cause data corruption and render GreenArrow inoperable. Restoring GreenArrow to a working state after such deletions may incur an additional fee as discussed in our Modifications and Customizations page.
Here is an example of the default report’s output:
# greenarrow disk_usage
GreenArrow Disk Usage Report
Studio
        Attachments                                       16KB
        Bounces                                           112KB
        Clicks                                            184KB
        Content: Campaigns, Autoresponders, Web Forms     32KB
        Imports and Exports                               152KB
        Opens                                             64KB
        Sents                                             160KB
        Spam Complaints                                   64KB
        Subscriber Data                                   1008KB
        Suppression Lists                                 72KB
        Unsubscribes                                      112KB
        Uploaded Images                                   28KB
        Misc Studio Data                                  443MB
Engine
        Archived Messages                                 16KB
        Bad Addresses                                     68KB
        Clicks and Opens                                  44KB
        Delivery Attempt Logs                             1005MB
        Disk Queue                                        3MB
        Incoming Email                                    168KB
        Send Summary Files                                368KB
        SimpleMH Message Log                              8KB
        Time Summary Files                                4MB
        Misc Engine Data                                  3MB
General
        Events                                            8KB
        Redis                                             38MB
        Web Server Logs                                   23MB
        Other Logs                                        51MB
The Report Hierarchy
The reports that are generated by greenarrow disk_usage have a three-layer hierarchy:
| Product | This is the section of the report for the named product. The products are: Studio, Engine, and General. | 
| Category | A category of data, like  | 
| Item | A specific filesystem path or PostgreSQL table. Individual items are only shown when the --details option is on. | 
Each product contains multiple categories, and each category includes one or more items.
Performance Considerations
The greenarrow disk_usage command runs both PostgreSQL queries and disk usage commands like du to gather its information. These operations are disk I/O intensive, so, depending on how much space your GreenArrow installation is using, and how fast the storage is, greenarrow disk_usage commands could take a long time to complete.
To mitigate this issue, greenarrow disk_usage streams its output, printing each usage figure as soon as it’s calculated. It’s safe to press Ctrl-C to cancel a report that’s still running.
Performance issues can be further mitigated by targeting a specific subsection of the report with the --area option.
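Because the report is streamed, a wrapper script can act on each line as it arrives instead of waiting for the whole run to finish. The following Python sketch is not part of GreenArrow. The function name stream_lines is hypothetical, and the sketch assumes that the command writes its report to standard output and does not buffer it when writing to a pipe.

```python
import subprocess
from typing import Iterator, List


def stream_lines(command: List[str]) -> Iterator[str]:
    """Yield each line of a command's standard output as soon as it is read."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered on the reading side of the pipe
    )
    try:
        assert process.stdout is not None
        for line in process.stdout:
            yield line.rstrip("\n")
    finally:
        # Runs on normal completion, on Ctrl-C, or when the caller stops early.
        if process.poll() is None:
            process.terminate()
        process.stdout.close()
        process.wait()
```

A caller could loop over stream_lines(["greenarrow", "disk_usage", "--area", "Engine"]) and stop reading as soon as the figure of interest has been printed.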
Options
The options listed in the sections below tell greenarrow disk_usage what operations to perform and how to format its output. All of them are optional, and multiple options may be specified in any order.
--details
The default report (shown in the Overview section) reports on disk usage at the category level. For example, the last line shows that “Other Logs” take up 51MB. Adding the --details option causes the report to indicate which filesystem paths and PostgreSQL tables make up each category. In some cases, small PostgreSQL tables are aggregated into a single “tables using less than 1MB each” line for conciseness.
The detailed report is lengthy, so you may wish to supplement the --details option with one or more --area options.
Here’s an example:
# greenarrow disk_usage --details
GreenArrow Disk Usage Report
Studio
        Attachments
                table: s_attachments                                               16KB
                TOTAL                                                      16KB
        Bounces
                table: s_stat_bounces                                              112KB
                TOTAL                                                     112KB
        Clicks
                table: s_stat_clicks                                               128KB
                table: s_links                                                     56KB
                TOTAL                                                     184KB
        Content: Campaigns, Autoresponders, Web Forms
                table: s_contents                                                  32KB
                TOTAL                                                      32KB
        Imports and Exports
                table: s_suppressed_address_imports                                32KB
                table: s_subscriber_imports                                        48KB
                table: s_subscriber_import_progresses                              40KB
                files: /var/hvmail/var/studio-data/subscriber_imports              8KB
                files: /var/hvmail/var/studio-data/subscriber_exports              0KB
                files: /var/hvmail/var/studio-data/suppressed_address_imports      0KB
                files: /var/hvmail/var/studio-data/organizations                   24KB
                TOTAL                                                     152KB
        Opens
                table: s_stat_opens                                                64KB
                TOTAL                                                      64KB
        Sents
                table: s_stat_sents                                                160KB
                TOTAL                                                     160KB
        Spam Complaints
                table: s_stat_scomps                                               64KB
                TOTAL                                                      64KB
        Subscriber Data
                table: tables using less than 1MB each                             1008KB
                TOTAL                                                    1008KB
        Suppression Lists
                table: s_suppression_lists                                         32KB
                table: s_suppressed_addresses                                      40KB
                TOTAL                                                      72KB
        Unsubscribes
                table: s_stat_unsubs                                               112KB
                TOTAL                                                     112KB
        Uploaded Images
                table: s_images                                                    16KB
                table: pg_largeobject                                              8KB
                files: /var/hvmail/var/studio-data/campaign_images                 4KB
                TOTAL                                                      28KB
        Misc Studio Data
                table: s_us_zip_codes                                              5MB
                table: tables using less than 1MB each                             2MB
                files: /var/hvmail/studio                                          436MB
                files: /var/hvmail/var/studio-tmp                                  28KB
                TOTAL                                                     443MB
Engine
        Archived Messages
                table: archived_message                                            16KB
                TOTAL                                                      16KB
        Bad Addresses
                table: bounce_bad_addresses                                        56KB
                table: bounce_repeat_tracker                                       8KB
                files: /var/hvmail/var/simplemh-bad-addresses.cdb                  0KB
                files: /var/hvmail/var/bounce_processor_repeat_tracker.cdb         4KB
                TOTAL                                                      68KB
        Clicks and Opens
                table: clickthrough_urls                                           16KB
                table: clickthrough_clicks                                         24KB
                files: /var/hvmail/var/clickthrough-tracking-emaillist             4KB
                TOTAL                                                      44KB
        Delivery Attempt Logs
                files: /var/hvmail/log/ram-qmail-send                              305MB
                files: /var/hvmail/log/bounce-qmail-send                           95MB
                files: /var/hvmail/log/disk-qmail-send                             605MB
                TOTAL                                                    1005MB
        Disk Queue
                files: /var/hvmail/qmail-disk/queue                                3MB
                TOTAL                                                       3MB
        Incoming Email
                files: /var/hvmail/maildata                                        168KB
                TOTAL                                                     168KB
        Send Summary Files
                files: /var/hvmail/log/send-summary                                368KB
                TOTAL                                                     368KB
        SimpleMH Message Log
                table: simplemh_message_log                                        8KB
                TOTAL                                                       8KB
        Time Summary Files
                files: /var/hvmail/log/time-summary                                4MB
                TOTAL                                                       4MB
        Misc Engine Data
                table: eng_dd_read_state                                           1MB
                table: tables using less than 1MB each                             1MB
                TOTAL                                                       3MB
General
        Events
                table: events                                                      8KB
                TOTAL                                                       8KB
        Redis
                files: /var/hvmail/data/redis                                      38MB
                TOTAL                                                      38MB
        Web Server Logs
                files: /var/hvmail/apache/logs                                     23MB
                TOTAL                                                      23MB
        Other Logs
                files: /var/hvmail/log/bounce-processor                            9MB
                files: /var/hvmail/log/config-agent                                992KB
                files: /var/hvmail/log/dd-dispatcher                               68KB
                files: /var/hvmail/log/dd-logreader                                980KB
                files: /var/hvmail/log/event-processor                             996KB
                files: /var/hvmail/log/httpd                                       300KB
                files: /var/hvmail/log/logfile-agent                               128KB
                files: /var/hvmail/log/logfile-summary                             4KB
                files: /var/hvmail/log/logfile-writer                              16KB
                files: /var/hvmail/log/postgres                                    19MB
                files: /var/hvmail/log/pure-authd-studio                           4KB
                files: /var/hvmail/log/pure-ftpd                                   4KB
                files: /var/hvmail/log/qmail-pop3d                                 976KB
                files: /var/hvmail/log/qmail-smtpd                                 9MB
                files: /var/hvmail/log/qmail-smtpd2                                4MB
                files: /var/hvmail/log/qmail-smtpd3                                4KB
                files: /var/hvmail/log/redis                                       944KB
                files: /var/hvmail/log/redis-np                                    108KB
                files: /var/hvmail/log/rpc                                         24KB
                files: /var/hvmail/log/rspawn-limiter                              928KB
                files: /var/hvmail/log/send-summary-queue                          4KB
                files: /var/hvmail/log/simplemh                                    800KB
                files: /var/hvmail/log/simplemh2                                   884KB
                files: /var/hvmail/log/smtp-sink                                   4KB
                files: /var/hvmail/log/studio                                      948KB
                files: /var/hvmail/log/studio-worker                               912KB
                TOTAL                                                      51MB
--area
The --area option restricts the report to a specific product or category. Areas take two forms:
- A specific product. This can be "Engine", "Studio", or "General".
- A specific "Product: Category" combination, separated by a colon and optionally whitespace. For example, "Studio: Suppression Lists" or "General: Events".
It is not currently possible to use the --area option to target a specific filesystem path or PostgreSQL table.
If you include multiple --area options, then all sections that match any --area option will be printed, but no section will be printed more than once. For example, greenarrow disk_usage --area "Studio" --area "Studio: Suppression Lists" will print the report for all of Studio, and the “Suppression Lists” category will only be shown once.
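When a script assembles the command, passing each area as its own argument avoids shell-quoting problems with names that contain spaces and colons. The sketch below is a minimal illustration. The helper name build_disk_usage_command is hypothetical, and it only uses the --area and --details options described on this page.

```python
from typing import Iterable, List


def build_disk_usage_command(areas: Iterable[str], details: bool = False) -> List[str]:
    """Build an argument list with one --area option per requested area."""
    command = ["greenarrow", "disk_usage"]
    if details:
        command.append("--details")
    seen = set()
    for area in areas:
        if area in seen:
            continue  # skip exact repeats to keep the command short
        seen.add(area)
        command.extend(["--area", area])
    return command


command = build_disk_usage_command(["Studio", "Studio: Suppression Lists", "Studio"])
print(command)
```

The resulting list can be handed to a process launcher without going through a shell, so "Studio: Suppression Lists" reaches the command as a single argument.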
--json
The --json option causes the report to be shown using JSON pretty-print formatting.
Here’s an example of a JSON-encoded report without --details turned on:
# greenarrow disk_usage --area "Engine" --json
{
  "GreenArrow Disk Usage Report": {
    "Engine": {
      "Archived Messages": {
        "total": {
          "disk_used": 16,
          "disk_used_human": "16KB"
        }
      },
      "Bad Addresses": {
        "total": {
          "disk_used": 68,
          "disk_used_human": "68KB"
        }
      },
      "Clicks and Opens": {
        "total": {
          "disk_used": 44,
          "disk_used_human": "44KB"
        }
      },
      "Delivery Attempt Logs": {
        "total": {
          "disk_used": 1029608,
          "disk_used_human": "1005MB"
        }
      },
      "Disk Queue": {
        "total": {
          "disk_used": 3420,
          "disk_used_human": "3MB"
        }
      },
      "Incoming Email": {
        "total": {
          "disk_used": 168,
          "disk_used_human": "168KB"
        }
      },
      "Send Summary Files": {
        "total": {
          "disk_used": 368,
          "disk_used_human": "368KB"
        }
      },
      "SimpleMH Message Log": {
        "total": {
          "disk_used": 8,
          "disk_used_human": "8KB"
        }
      },
      "Time Summary Files": {
        "total": {
          "disk_used": 4172,
          "disk_used_human": "4MB"
        }
      },
      "Misc Engine Data": {
        "total": {
          "disk_used": 2688,
          "disk_used_human": "3MB"
        }
      }
    }
  }
}
In the above report, GreenArrow Disk Usage Report is a hash with a key for each product. Each product is a hash with a key for each category, and each category in turn contains a hash named total that is structured as shown below:
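| total hash | 
| disk_used | Integer. The disk space used. In the example above, this figure is in kilobytes: 16 corresponds to 16KB. | 
| disk_used_human | String. The same figure formatted for display, such as "16KB" or "1005MB". | 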
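As an illustration of how the product, category, and total layers might be consumed, the following Python sketch ranks categories by their disk_used value. The sample is abbreviated from the example report above, and the function name largest_categories is hypothetical.

```python
import json

# Abbreviated sample taken from the example report above.
SAMPLE_REPORT = """
{
  "GreenArrow Disk Usage Report": {
    "Engine": {
      "Bad Addresses": {"total": {"disk_used": 68, "disk_used_human": "68KB"}},
      "Delivery Attempt Logs": {"total": {"disk_used": 1029608, "disk_used_human": "1005MB"}},
      "Disk Queue": {"total": {"disk_used": 3420, "disk_used_human": "3MB"}}
    }
  }
}
"""


def largest_categories(report_json: str, limit: int = 3):
    """Return (product, category, disk_used, disk_used_human) tuples, largest first."""
    products = json.loads(report_json)["GreenArrow Disk Usage Report"]
    rows = []
    for product, categories in products.items():
        for category, data in categories.items():
            total = data["total"]
            rows.append((product, category, total["disk_used"], total["disk_used_human"]))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]


for product, category, used, human in largest_categories(SAMPLE_REPORT):
    print(f"{product}: {category} {human}")
```

Sorting on the integer disk_used value rather than the display string avoids having to parse unit suffixes.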
Here’s an example of a JSON-encoded report with --details turned on:
# greenarrow disk_usage --area "Engine" --json
{
  "GreenArrow Disk Usage Report": {
    "Engine": {
      "Archived Messages": {
        "total": {
          "disk_used": 16,
          "disk_used_human": "16KB"
        }
      },
      "Bad Addresses": {
        "total": {
          "disk_used": 68,
          "disk_used_human": "68KB"
        }
      },
      "Clicks and Opens": {
        "total": {
          "disk_used": 44,
          "disk_used_human": "44KB"
        }
      },
      "Delivery Attempt Logs": {
        "total": {
          "disk_used": 1029636,
          "disk_used_human": "1006MB"
        }
      },
      "Disk Queue": {
        "total": {
          "disk_used": 3420,
          "disk_used_human": "3MB"
        }
      },
      "Incoming Email": {
        "total": {
          "disk_used": 168,
          "disk_used_human": "168KB"
        }
      },
      "Send Summary Files": {
        "total": {
          "disk_used": 368,
          "disk_used_human": "368KB"
        }
      },
      "SimpleMH Message Log": {
        "total": {
          "disk_used": 8,
          "disk_used_human": "8KB"
        }
      },
      "Time Summary Files": {
        "total": {
          "disk_used": 4172,
          "disk_used_human": "4MB"
        }
      },
      "Misc Engine Data": {
        "total": {
          "disk_used": 2688,
          "disk_used_human": "3MB"
        }
      }
    }
  }
}
When both the --json and --details options are turned on, each usage category, such as Misc Engine Data, has a components array added to it to show the usage for individual items.
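Here’s how each entry in the components array is structured:
| components array | 
| type | String. The kind of item, such as "table". | 
| name | String. The name of the item, such as a PostgreSQL table name. | 
| disk_used | Integer. The disk space used by the item. | 
| disk_used_human | String. The same figure formatted for display, such as "40KB". | 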
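The sketch below shows one way a script might flatten the components arrays into a single list of items. The sample is hand-built rather than captured from a real run: it assumes that components sits next to total inside each category and that each entry carries the type, name, disk_used, and disk_used_human keys seen in the component object in the --postgres-bloat section. The function name list_components is hypothetical.

```python
import json

# Hand-built sample; the "components" layout is an assumption (see the text above).
SAMPLE_DETAILS = """
{
  "GreenArrow Disk Usage Report": {
    "Studio": {
      "Suppression Lists": {
        "total": {"disk_used": 72, "disk_used_human": "72KB"},
        "components": [
          {"type": "table", "name": "s_suppression_lists", "disk_used": 32, "disk_used_human": "32KB"},
          {"type": "table", "name": "s_suppressed_addresses", "disk_used": 40, "disk_used_human": "40KB"}
        ]
      }
    }
  }
}
"""


def list_components(report_json: str):
    """Flatten the report into (product, category, type, name, disk_used) tuples."""
    products = json.loads(report_json)["GreenArrow Disk Usage Report"]
    items = []
    for product, categories in products.items():
        for category, data in categories.items():
            # A report generated without --details has no "components" key.
            for component in data.get("components", []):
                items.append((product, category, component["type"],
                              component["name"], component["disk_used"]))
    return items


for item in list_components(SAMPLE_DETAILS):
    print(item)
```

Using data.get("components", []) lets the same function accept reports generated with or without --details.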
--postgres-bloat
The --postgres-bloat option causes the report to estimate what percentage of the disk space used by each table and its indexes is “bloat”. Bloat is empty space that was previously used by rows or index entries that have since been changed or deleted. Valid values range from 0% (no bloat) to 100% (all space is bloat).
The bloat figures are shown on a per-table basis, and individual tables are only shown in the detailed report, so specifying --postgres-bloat always causes a detailed report to be shown, regardless of whether the --details option was explicitly used.
When the --json option is not used, the PostgreSQL bloat estimate is included in parentheses following each table name. For example, the s_subscriber_import_progresses table below contains 15% bloat space, meaning that if we were able to recover 100% of the bloat space, 6KB (which is 15% of 40KB) would be freed:
table: s_subscriber_import_progresses                              40KB (15%)
When the --json option is used, the PostgreSQL bloat estimate is included in two new keys: bloat, which contains a floating point number, and bloat_human, which contains a string.
{
  "type": "table",
  "name": "s_subscriber_import_progresses",
  "disk_used": 40,
  "disk_used_human": "40KB",
  "bloat": 15.0,
  "bloat_human": "15%"
},
The report skips calculating the bloat percentage for some short-lived tables. It marks the tables that it has skipped by reporting the bloat as null when the --json option is used and - when the --json option is not used.
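The arithmetic from the 15% example, together with the null case for skipped tables, can be sketched in Python as follows. This is an illustration only. It assumes disk_used is expressed in kilobytes, as in the examples on this page, and the skipped table's name is made up.

```python
from typing import Optional


def estimated_bloat_kb(component: dict) -> Optional[float]:
    """Return the estimated bloat in KB, or None when the report skipped the table."""
    bloat = component.get("bloat")
    if bloat is None:
        return None
    return component["disk_used"] * bloat / 100.0


# Component object from the example above.
measured = {"type": "table", "name": "s_subscriber_import_progresses",
            "disk_used": 40, "disk_used_human": "40KB",
            "bloat": 15.0, "bloat_human": "15%"}
# Hypothetical entry for a skipped table; the table name is made up for illustration.
skipped = {"type": "table", "name": "example_short_lived_table",
           "disk_used": 8, "disk_used_human": "8KB", "bloat": None}

print(estimated_bloat_kb(measured))  # 6.0, matching the 6KB figure in the text
print(estimated_bloat_kb(skipped))   # None
```

Returning None for skipped tables keeps "not calculated" distinct from a genuine 0% estimate.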
Please keep the following in mind when reviewing the PostgreSQL bloat estimates:
- Calculating the bloat of PostgreSQL tables and indexes can be resource intensive, so expect reports to take longer to complete when these figures are calculated.
- These are only estimates because --postgres-bloat uses techniques like querying the data from the last time each table was analyzed by PostgreSQL to make the estimates calculate more quickly.
- Some bloat is a good thing, so PostgreSQL reserves some space by design. For example, B-tree indexes attempt to keep 10% of the space in their index pages free to reduce fragmentation. The above example’s 15% bloat figure is not unusual on a server that’s operating normally.
- We don’t have a hard rule for what we consider to be problematic bloat, but the following examples may help:
  - Usually, when PostgreSQL bloat has been an issue in the past, it’s been a situation where one table, or a small subset of tables, was taking up the majority of the disk space and had a bloat figure over 30%.
  - Sometimes small tables get bloated, and it’s not worth addressing. For example, if a table is occupying 12MB of space, and is 75% bloated, it’s probably not worth investigating, because the best possible outcome is freeing 9MB of space.
Please contact GreenArrow technical support if you have any questions about how to interpret or address the bloat figures that you see.
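The two examples above can be turned into a rough screening filter. The Python sketch below is one possible reading, not a GreenArrow rule: the 30% bloat threshold comes from the text, while the 50% share threshold is an assumed interpretation of "the majority of the disk space". The function name and all table names are hypothetical.

```python
def tables_worth_reviewing(components, min_bloat=30.0, min_share=0.5):
    """Return names of tables that are both heavily bloated and a large share of total usage."""
    total_kb = sum(component["disk_used"] for component in components)
    flagged = []
    for component in components:
        bloat = component.get("bloat")
        if bloat is None or total_kb == 0:
            continue  # bloat was not calculated, or there is nothing to compare against
        share = component["disk_used"] / total_kb
        if bloat > min_bloat and share >= min_share:
            flagged.append(component["name"])
    return flagged


# Hypothetical figures in KB; the table names are made up for illustration.
sample = [
    {"name": "example_large_table", "disk_used": 2000000, "bloat": 45.0},
    {"name": "example_small_table", "disk_used": 12288, "bloat": 75.0},
    {"name": "example_other_table", "disk_used": 500000, "bloat": 10.0},
]
print(tables_worth_reviewing(sample))
```

The 12MB table is 75% bloated but is not flagged, because its share of the total is small. This mirrors the second example above.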
--postgres-only
The --postgres-only option causes the report to only show PostgreSQL table entries. File entries are excluded.
--postgres-only and --files-only are mutually exclusive.
--files-only
The --files-only option causes the report to only show file entries. PostgreSQL table entries are excluded.
--help
The --help option prints a concise usage summary:
# greenarrow disk_usage --help
greenarrow: Usage:
  greenarrow disk_usage [OPTIONS]
This command will generate a report of the disk space used by GreenArrow.
Application Options:
      --details         include extra details in the output
      --area=           show a specific area of disk usage
      --json            print JSON formatted output
      --postgres-bloat  estimate the amount of disk space that PostgreSQL uses in excess of its minimum possible size
      --postgres-only   only show the PostgreSQL table portions of the report
      --files-only      only show the filesystem path portions of the report
Help Options:
  -h, --help            Show this help message
Disk Space Reclamation Options
The Usage Categories section below lists the components that make up each category shown in the report.
Some of these components have disk space reclamation options. The larger a component, the more likely it is to have a documented disk space reclamation procedure. If the procedure has been publicly documented, it’s linked to from the Usage Categories section.
There are also some disk space reclamation procedures which we haven’t documented up to this point, either because they’re infrequently used, or because implementing them requires advanced knowledge of GreenArrow’s internals. Please contact GreenArrow technical support if you believe that part of GreenArrow is using more disk space than it should, and would like to find out if we have any undocumented disk space reclamation methods available.
When disk space is reclaimed from ordinary files, the results can be seen immediately by re-running the report.
When disk space is reclaimed from PostgreSQL tables, re-running the report usually shows that the table’s size is unchanged. This is because when data is deleted from a PostgreSQL table, the table itself continues to occupy the same amount of space that it did before. PostgreSQL simply marks the space that was used by the deleted data as being available for new data. The report will show this reclaimed space as bloat when the --postgres-bloat option is used.
For example, suppose you have a PostgreSQL table that’s occupying 2GB, and you free 1GB of space in it. The disk usage report will still show that the table is using 2GB of space. If you later add 1GB of data to that same table, the disk usage report will continue to show that the table is using 2GB of space, because the 1GB of space that was freed by PostgreSQL earlier gets reused by the new data.
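The 2GB example can be followed step by step with a toy model. The Python class below is a deliberate simplification for illustration, not a description of PostgreSQL internals: it tracks only live data and on-disk size, and it assumes freed space is always fully reusable by later inserts. The class name TableSpaceModel is hypothetical.

```python
class TableSpaceModel:
    """Toy model: on-disk size never shrinks; deletes turn live data into reusable free space."""

    def __init__(self, live_gb: float):
        self.live_gb = live_gb
        self.size_gb = live_gb  # space the table occupies on disk

    def delete(self, gb: float) -> None:
        self.live_gb -= gb  # the on-disk size is unchanged

    def insert(self, gb: float) -> None:
        self.live_gb += gb
        # The table only grows once the previously freed space is used up.
        self.size_gb = max(self.size_gb, self.live_gb)

    @property
    def bloat_percent(self) -> float:
        if self.size_gb == 0:
            return 0.0
        return 100.0 * (self.size_gb - self.live_gb) / self.size_gb


table = TableSpaceModel(live_gb=2.0)
table.delete(1.0)
print(table.size_gb, table.bloat_percent)  # 2.0 50.0
table.insert(1.0)
print(table.size_gb, table.bloat_percent)  # 2.0 0.0
```

In this model the reported size stays at 2GB throughout, while the bloat figure moves from 50% back to 0% as the freed space is reused.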
Usage Categories
The following sections show the hierarchy of files and PostgreSQL tables shown in the report and link to any relevant documentation.
Studio
The Studio portion of the report is only shown if either:
- Your GreenArrow license includes Studio.
- Studio’s total usage is at least 600MB. The reason for this threshold is that Engine and Studio have some shared code, so Engine-only installations typically have a few hundred megabytes of files that would be classified as belonging to Studio in the report. If there are more than 600MB of Studio files, that’s a sign that Studio was used at some point in the past - perhaps by a previous Studio license, and that the past license’s data is still present.
Attachments
| s_attachments table | Campaign attachments are stored in this table. | 
Bounces
| s_stat_bounces table | This table’s data retention settings can be controlled by adjusting the Campaign Bounces data retention setting. | 
Clicks
| s_stat_clicks table | This table’s retention settings can be controlled by adjusting the Campaign Clicks data retention setting. | 
| s_links table | This table is used to store the original URL, Stat ID, and a unique identifier for each URL used in a send that uses click tracking. | 
Content: Campaigns, Autoresponders, Web Forms
| s_contents table | This table is used to store the contents of campaigns, including their subjects, HTML versions, and text versions. | 
Imports and Exports
| s_suppressed_address_imports table | This table is used when importing Suppression Lists. | 
| s_subscriber_imports table | This table is used when importing subscribers. | 
| s_subscriber_import_progresses table | This table is used when importing subscribers. | 
| /var/hvmail/var/studio-data/subscriber_imports files | This folder is used when importing subscribers. | 
| /var/hvmail/var/studio-data/subscriber_exports files | This folder is used when exporting subscribers. | 
| /var/hvmail/var/studio-data/suppressed_address_imports files | This folder is used when importing Suppression Lists. | 
| /var/hvmail/var/studio-data/organizations files | This folder is used to store files uploaded via FTP. Space can be reclaimed by logging in with your FTP account and deleting files. | 
Opens
| s_stat_opens table | This table’s retention settings can be controlled by adjusting the Campaign Opens data retention setting. | 
Sents
| s_stat_sents table | This table’s data retention settings can be controlled by adjusting the Campaign Recipient Data retention setting. | 
Spam Complaints
| s_stat_scomps table | This table’s data retention settings can be controlled by adjusting the Campaign Spam Complaints data retention setting. | 
Subscriber Data
This category includes all tables used to store Studio’s subscriber records. Most mailing lists are stored in their own table which is named using an s_subscribers_ prefix. The mailing list specific tables that use at least 1MB of space will appear in the report when the --details option is used. The category also contains:
| s_subscribers table | This table stores subscriber data for mailing lists that are not large enough to have had their own table with an  | 
| s_subscriber_statuses table | This table stores the status of each subscriber. | 
| s_pending_subscribers table | This table contains subscription requests pending confirmation. | 
| s_subscriber_recent_activities table | This table is used to temporarily store data on recent sends, clicks, and opens. | 
| tables using less than 1MB each table | This entry shows the combined usage of all tables that store Studio subscriber data, and which use less than 1MB of disk space. | 
The space used by subscriber records in a mailing list may be reclaimed by deleting that mailing list. Think carefully about your decision before doing this, though. Deleting a mailing list, then re-creating it will prevent the default unsubscribe link, bounce processing, and spam complaint processing systems from deactivating subscribers on the new list, which in turn can cause subscriber engagement and deliverability issues.
The above tables are discussed in more detail in the Direct Database Access document.
Suppression Lists
| s_suppression_lists table | This table stores the settings for each suppression list, excluding representations of individual suppressed addresses. | 
| s_suppressed_addresses table | This table stores representations of email addresses on suppression lists. | 
Unsubscribes
| s_stat_unsubs table | This table’s data retention settings can be controlled by adjusting the Campaign Unsubscribes data retention setting. | 
Uploaded Images
| s_images table | Images used for campaigns and autoresponders are stored in this table. | 
| pg_largeobject table | The pg_largeobject table is used to hold large objects. This table isn’t always populated only by images, but they’re the category in this report that’s most often a contributor. | 
| /var/hvmail/var/studio-data/campaign_images files | Images used for campaigns and autoresponders are stored in this directory. | 
Misc Studio Data
This category includes all tables whose names have an s_ prefix and which aren’t listed elsewhere in the report. As a result, you may see tables listed in your report which aren’t listed below.
| s_us_zip_codes table | This table contains data on US zip codes and is used by the Segmentation Builder. | 
| tables using less than 1MB each table | This entry shows the combined usage of all tables that meet the following criteria: 
 | 
| /var/hvmail/studio files | All files in  | 
| /var/hvmail/var/studio-tmp files | All files in  | 
Engine
Archived Messages
| archived_message table | Sample messages get recorded in this table when a Mail Class has the Archive a Sample of Messages option turned on. | 
Bad Addresses
| bounce_bad_addresses table | This table stores the addresses that are eligible for Bad Address Suppression. | 
| bounce_repeat_tracker table | This table is used by the Bounce Processor to determine when to deactivate subscribers for repeated bounces. | 
| /var/hvmail/var/simplemh-bad-addresses.cdb files | This file is used by Bad Address Suppression. It contains a subset of the  | 
| /var/hvmail/var/bounce_processor_repeat_tracker.cdb files | This file is used by the Bounce Processor to determine when to deactivate subscribers for repeated bounces. | 
Clicks and Opens
| clickthrough_urls table | The original URLs used in SimpleMH click tracking are stored here. | 
| clickthrough_clicks table | Data on each SimpleMH click and open that takes place is stored here. | 
| /var/hvmail/var/clickthrough-tracking-emaillist files | See SimpleMH and Studio Remote List Email Address Retention for details on what this directory contains and how to control its retention settings. | 
Delivery Attempt Logs
| /var/hvmail/log/ram-qmail-send files | Delivery attempt logs for GreenArrow’s ram-queue. | 
| /var/hvmail/log/bounce-qmail-send files | Delivery attempt logs for GreenArrow’s bounce-queue. | 
| /var/hvmail/log/disk-qmail-send files | Delivery attempt logs for GreenArrow’s disk-queue. | 
Delivery attempt log data retention settings can be adjusted using the hvmail_set log_disk_space command.
Disk Queue
| /var/hvmail/qmail-disk/queue files | Disk-queue messages are stored here. The usage of this directory can be reduced by doing any of the following: 
 | 
Incoming Email
| /var/hvmail/maildata files | Mailboxes used for incoming email are stored here. | 
Send Summary Files
| /var/hvmail/log/send-summary files | These files are used by Send Statistics. The Troubleshooting Disk Space Issues document contains a section on moving  | 
SimpleMH Message Log
| simplemh_message_log table | This table can optionally be used to log messages as they pass through SimpleMH. See Logging all SimpleMH Messages. | 
Time Summary Files
| /var/hvmail/log/time-summary files | These files are used by Dynamic Delivery Statistics. | 
Misc Engine Data
| tables using less than 1MB each table | This entry shows the combined usage of all GreenArrow PostgreSQL tables that meet the following criteria: 
 | 
General
The General categories are shared by Engine and Studio.
Events
| events table | Individual events recorded by the Event Notification System are stored here until they’re delivered. | 
Redis
| /var/hvmail/data/redis files | This folder contains Redis data. | 
Web Server Logs
| /var/hvmail/apache/logs files | This folder contains web server logs. | 
Other Logs
The Other Logs category contains one entry for each file or folder within /var/hvmail/log that isn’t counted elsewhere.
