Operations Ideals

This is an overview of operations practices that I consider ideal – things that I’d want to have in my ops environment by the time I’d run out of things to do (however unlikely), along the lines of 12-factor 2.0.

  1. Every environment is different – You’ll notice this is the only entry with a number. That’s because I think this one’s the only real certainty. Some of us may not need to concern ourselves with secrets in version control because our repo is self-hosted and its deployment loop is entirely internal. Only you will know which applies to you, so keep that in mind as you read.
  • Redundancy – Avoiding single points of failure is a high priority for a systems engineer. Ideally, redundancy should be present everywhere from PSUs & disk arrays, up through multiple servers serving the same role & DB replicas, to entirely redundant environments (multiple AWS regions, blue-green deployments etc).
  • Staging the app – If we concern ourselves with the user’s experience (we should), staging the applications is a must. If we concern ourselves with the dev’s experience (we should), a dev environment is also a must. Devs need a place to check their work & collaborate, and QA needs a place to test that work in a production-like environment.
    The package deployed to staging should also be the same package deployed to production.
  • Backups – This one can’t really be stressed enough. Backups are the most important thing to have on hand when things go south. Ideally we’ll have at least two entirely separate backup procedures e.g. database dump plus snapshots of the DB on disk, or tarring of dirs we know are important plus VM snapshots. In addition:
    • They need to be checked/tested regularly (no point in having backups we can’t use – the GitLab DB removal incident was a good wake-up call for us all)
    • They should be mirrored offsite in case the disaster is a really bad one; and
    • We need spare capacity on the infrastructure to restore them (in case we’re doing so because we lost some hardware).
  • Infrastructure as code – Nothing’s more satisfying to me than to behold an environment built in a completely reproducible way. It also vastly increases my comfort making changes and my ability to quickly revert a bad change or re-familiarize myself with components I’ve not looked at in a while. The code will also serve as a form of documentation, and will supplement backups and disaster recovery plans. As with all code and documentation, this should be tracked in version control.
    However, there will always be things that can’t be orchestrated, and steps that need to be taken on fresh systems before orchestration can be used. We need to document the process of bootstrapping new equipment and any other steps that need to be run manually. Some things that might be included here are hardening SSH, checking running services & open ports (disabling/uninstalling anything unnecessary), applying updates, installing basic utilities, bootstrapping orchestration (dependencies, configuration etc), firewalling, and setting the timezone (UTC, please).
  • Keep the secrets a secret – We should have a solution in place to allow us complete control over secrets used in the infrastructure and apps. Ensure they’re never checked into version control, but that they’re readily on hand when needed, and can be securely accessed by the systems that use them.
    This applies to secrets used by people, too. A credential management system is incredibly valuable when assigning and/or sharing credentials to or among staff.
  • TLS all the things – Unless we’re at pretty high scale and have significant CPU constraints, we’re unlikely to notice the CPU overhead of TLS termination (and if we are, we’ve got bigger problems). If we can, we should have TLS available (preferably enforced) on all services available from the infrastructure to safeguard the data traversing the network. If the private network isn’t entirely under our control (cloud or managed services), TLS should be applied to it also. It’s important to check the config with something like SSLTest or testssl.sh, as out-of-the-box TLS configuration is very poor on most server platforms in the name of backward compatibility.
  • Keep the systems up to date – We don’t have to look far for examples of compromise caused by vulnerable, outdated software. System, runtime, and dependency updates should be run as regularly as is reasonable. Where possible keep an eye out for vulnerabilities that might warrant an immediate update (e.g. Ubuntu Security Notices). We should try to arrange for the infrastructure to be capable of receiving routine maintenance without downtime to reduce friction when updates are needed, though this can be costly depending on the architecture of the app.
    If we can, consider having an external vulnerability scan run regularly to catch any vulnerabilities in the applications as well.
  • Keep ourselves up to date – Like any other tech field, things in ops are constantly changing. We should try to keep abreast of new technologies and updated best practices, and keep our cogs constantly turning by regularly reading industry news (HN, Slashdot, Ars, Schneier, LWN), blogs or mailing lists from our vendors/apps/runtimes & other interesting material (e.g. engineering blogs).
  • Monitoring – Monitoring is a big one, because there are many facets to having good coverage. We want a few different types:
    • Broad – We should have something that collects a lot of metrics by default because we can’t always know in advance what we might want to see when issues crop up. Also, the broader the data collection, the more we can catch potential issues before they become problems. Munin is a good example, and it also has a lot of sensible alerting thresholds by default. It’s also worth a reminder not to overlook the hardware (PSUs, disk arrays etc) if physical equipment isn’t regularly monitored in person.
    • Narrow – Narrow monitoring is for the things we do know in advance that we want to track and alert on. Something along the lines of Nagios is good for this.
    • Applications – Not all interesting metrics are found on the server level. Indeed, depending on the application, some of the most useful metrics are found in the app itself. We should have a service available for the applications to report metrics to, something along the lines of StatsD. If the runtime offers it, we should pull any interesting metrics out of that too (memory status, time spent in various calls etc).
    • External – Internal monitoring isn’t going to be able to alert us if we lose a router/uplink, or there was a power outage. We should have something outside the environment monitoring the monitors. There are a staggering number of choices here, at wildly varying price points.
    • Alerting and escalation – We need more than one person on alerts so that if one isn’t immediately available the other can get in quick. If we’re a small operation we can just have two on the alerts with the understanding that each will respond if available, but if we’re larger we can have a rotation and escalate if the first line isn’t available (PagerDuty is a popular option for this).
    • System mail – Linux itself likes to alert us if things go wrong. By default though, these alerts just hang out on the system and we don’t see them until we next login. We can forward that mail from the system to the actual email accounts, and ditto for cron mail.
    • Status page – If we have a lot of eager users and they tend to notice when things aren’t running smoothly, we can consider setting up a status page to let them know when issues crop up (StatusPage is a popular, albeit quite pricey, option for this).
  • Consider outsourcing the real hard/time-consuming stuff – Unless we’re Facebook, we can’t do everything ourselves. Much like we don’t build our own servers, there will be other things we’re not good at too. We can consider outsourcing things that can be purchased for a reasonable price from a reputable vendor and that would otherwise occupy too much of our time or need 100% availability. Production email and DNS are good examples.
  • Be lazy – The ultimate goal is to automate ourselves out of a job (however unlikely we are to eventually succeed). When we find ourselves doing something mindless more than a couple of times, we should consider automating it. Where possible, I find it best to use something central to manage the regular tasks instead of relying on cronjobs etc (Jenkins is a good choice).
  • Secure the public-facing network (and maybe the private one, too) – It’s good security posture to have some way of routinely ensuring the public network isn’t exposing things it ought not to, and alerting us if it is. Routine port scans (nmap is easily automated – a rough cron-able sketch follows at the end of this list), vulnerability scans (there are a few reasonably priced PCI compliance vendors who’ll do the vuln scanning alone), and a Google alert for some known sensitive info are a good start. If the private network isn’t guaranteed private (read: “cloud”) we should apply the same principles there also. Centralized firewalling is great if we can afford it, and network segmentation can be considered where it would be helpful. If we can manage it, the applications can do with some protection also – we can consider a WAF (e.g. mod_security).
    We should also ensure any remote-access endpoints are well protected (solid config on VPNs and SSH etc), and consider how the machines themselves are protected (SELinux, audit logging etc). It’s also worth checking for data leakage, like verbose logging, headers or errors on production, and publicly exposed storage.
  • Always apply the principle of least privilege – All the crons probably don’t need to run as root, all the users probably don’t need sudo access on all the machines, and we probably don’t need that default superuser with default credentials.
  • Centralize logging – Centralized logging provides multiple benefits – we have all the logs in one place which allows for efficient searching & broad alerting, and also keeps copies off the box in case of a breach. If we find ourselves lacking some detail, we can enable auditd or SELinux in permissive mode (both reasonable steps regardless). The ELK stack is popular here.
  • Continuous integration and deployment – If applicable to the environment, we should consider setting up a CI/CD pipeline. Have each commit built and deployed to the dev environment. When it comes time to a production deployment we should try to keep routine deployments zero-downtime to reduce friction and the negative effects of a bad deploy.
  • Be familiar with the codebase – Throwing code or a binary over the wall from dev to ops is something best left to the Fortune 500s of the world. In a smaller org, I like to think of the ops guy as a part of the dev team. As such, we should be familiar enough with the codebase to at least fix small bugs and diagnose issues at runtime. If we have the chance, we can find a few small tickets and resolve them ourselves (with the dev team’s blessing of course). We also ought to generally keep abreast of dev work that’s underway (tickets, commits etc) in addition to, of course, being involved in the larger roadmap.
  • 2fa – When we become a target, the bad guys getting access to our vendor accounts can be scarily trivial. Enable 2fa everywhere it’s available and request additional lockdown from any vendor who offers it (e.g. “please don’t reset credentials for just anyone who happens to know my mother’s maiden name”).
  • Use a shared mailbox for vendor accounts – In the same way documentation is made available to everyone to whom it’s relevant, so should access to vendor accounts. Not every vendor provides the ability to have additional users (if they do, using it is preferable), so we should have a shared mailbox for shared accounts. This means that no one will have to go trawling through our email in an emergency, and multiple people can keep an eye out for urgent messages.
  • Keep an eye on cloud costs – It can be easy to overlook the cost of the cloud because everything seems so affordable at lower scale. However, costs can get radically out of control if we’re slashdotted, reddited, HN’d, or something goes wrong. Unlike with traditional vendors we’re unlikely to get a call if things go nuts. Even worse, if we’re compromised we could be on the hook for resources we didn’t even use. Cloud vendors typically provide billing alert functionality to keep us from having to worry about it (e.g. CloudWatch).
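As promised above, here’s a rough, cron-able sketch of an automated port scan that mails a diff only when something changes. This isn’t from the original article – the target range, state directory and mailbox are all placeholders.

    #!/bin/sh
    # Nightly scan of the public ranges, diffed against the previous run.
    TARGETS="203.0.113.0/24"
    STATE=/var/lib/portscan
    mkdir -p "$STATE"
    # -oG - writes grepable output to stdout; strip the comment lines so
    # nmap's timestamps don't show up as noise in the diff
    nmap -T4 -oG - $TARGETS | grep -v '^#' > "$STATE/today"
    if [ -f "$STATE/yesterday" ] && ! diff -u "$STATE/yesterday" "$STATE/today" > "$STATE/diff"; then
        mail -s "Port scan changes detected" ops@example.com < "$STATE/diff"
    fi
    mv "$STATE/today" "$STATE/yesterday"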

Some Tips for Your Self-Hosted WordPress Blog

After maintaining my own self-hosted WordPress install for a while, and a few business ones prior, I’ve run into some things that aren’t often discussed in the usual literature. Hopefully they will be useful to you.

Don’t install a lot of plugins
Almost everyone falls into this trap at first. I want this feature – oh look, a plugin! But they become a nightmare after a while. You have to keep them updated, they get abandoned, many stick ads or promote their pro version in your admin panel, some introduce very nasty security holes (see “wpscan” below), and if the situation gets really bad they start making your blog’s markup look like HTML vomit.
In direct contravention of this point I’ll be suggesting some basic plugins along the way. All of these will be highly rated, fairly basic, single-task and fully-free. With a bit of luck, these will be all you need.

Keep everything up to date
WordPress’ greatest vulnerability is the ease with which it’s neglected. It checks for updates on its own, but unlike a desktop application, it has no way to notify you unless you visit it.
WP Updates Notifier will email you when plugins, themes or WordPress itself needs updating.

Use WPScan on yourself
WPScan is a well maintained and comprehensive tool for finding vulnerabilities and configuration problems with WordPress installs.
You can see an anonymized scan here. Most of the output is due to WPScan not being able to determine plugin versions, but the vulnerability listings should give you a good idea of why it’s important to keep things up to date.

Double-check your settings

  • Don’t notify linked blogs: This one’s more of a personal preference than anything, I just don’t like peppering other people’s blogs with links to mine just because I mentioned them – this ain’t Facebook.
  • Disallow pingbacks: The flip side of the above.
  • Disable signup: Unless you’re expecting people to sign up to the site for some reason, it’s definitely worth disabling open signup.

Spam spam spam
No silly, I’m not encouraging you to spam. WordPress out of the box is incredibly susceptible to spam, primarily via comments. Here are some things you can do to curb it:

  • Enable comment moderation: If you’re like me, you might prefer to hold any comments for approval just in case something nasty gets through.
  • Enable comment notification: You want to be notified when someone posts a comment, in case they get past whatever you put in their way.
  • Disable comments: This one’s a bit of a last-resort type thing, but it’s preferable to leaving them open if you don’t want to moderate.
  • Put a captcha up for comments: I use WP-reCAPTCHA for this. It uses the modern check-mark type captcha from reCaptcha.
  • Enable Akismet: Automattic run a network which aims to detect WordPress spam. You need to grab an API key, but it’s definitely worth it if you’d rather not use reCaptcha or disable comments entirely.

Don’t use “admin” as your username
WordPress used to create a default user called “admin”, with admin level access of course, and thus just about any brute-forcing technique uses admin at the outset. Newer ones will enumerate blog users from the slug in your posts, but as always we’re just trying not to be a piece of low-hanging fruit.
You can put a captcha on login, but I prefer to block IPs after a number of failed logins. Limit Login Attempts will do this with notifications too.

Use a basic theme
Some of the biggest messes come from interdependency between plugins and themes, particularly when they don’t update in unison. Use as basic a theme as you’re happy with, preferably without any dependencies on unrelated plugins and such.

Start out with a child theme
At some point there will undoubtedly be some minor annoyance with your theme, and you’ll wish you could sneak in a single line of css or just add one quick filter. You can’t do that to the theme directly, because it will all be gone after the next update. This is why WordPress has child themes. They’re basically a theme that inherits everything from the parent, but you can override anything you like by modifying the child theme. Very quick and easy to make, and you’ll thank yourself later.

Start out with the URL and scheme you intend to use long-term (and make it TLS!)
You’d think WordPress would be pretty flexible when it comes to domains and TLS vs unsecured, but you’d be wrong. Changing either after you’ve been up and running for a while is a huge pain in the ass. You’ll either be running half a dozen pretty scary queries against your DB or be dependent on a plugin rewriting your output for the life of your blog. Make sure you pick the right domain, decide on www vs no www, and get yourself a TLS certificate all in advance.

Install WordPress with least privilege and as isolated as possible
On shared hosting providers it’s very tempting to have everything run as the same user out of the same home directory and the like. Since WordPress is so highly targeted, it’s best to isolate it from your other websites and databases as much as possible.

Backups
I mean…duh. If a particularly nasty asshole finds his way into your install, he’s just as likely to wipe it clean as he is to install some driveby malware bullshit, and nothing’s worse than losing that post you just spent an hour of your life on, let alone all the rest. Backup often and keep them as far away from your install as possible.

Caching
Yeah this one probably is mentioned just about everywhere, but it’s really worth it. Simply, WordPress is a dog. Every request, no matter how vanilla, runs through thousands of lines of PHP before it gets to the browser. What’s best here is if you have a host that provides varnish and memcache or something along those lines to take the load off your install. If you don’t have such a host, there are a couple of very popular plugins that can get you some of the way there: W3 Total Cache and WP Super Cache, but I really wouldn’t recommend either.

Sort out your email
By default, WordPress’ email functionality is (notoriously?) extremely limited. This is primarily because Automattic expects your host to configure the way outgoing email is sent through PHP, and not through WordPress itself, which seems a little self-defeating to me. You’ll find that shared hosting providers (often managed hosting providers also) typically don’t configure email settings for your account so they can more readily keep track of what you’re sending with authenticated means. Unfortunately, this all means that your email is likely to be of very poor deliverability out of the gate, and plugins will once again need to come to the rescue. WP Mail SMTP is by far the most widely deployed SMTP plugin for WordPress and should have you up and running with a host, username and password in no time. I recommend mail-tester.com for testing the deliverability of your emails once configured.

Beware the plugin repository
Plugins are surely one of WordPress’ biggest selling points, but much like Google Play and Microsoft’s App Store, the repository can harbor some pretty nasty shit. What’s worse, the plugin repo doesn’t seem to employ any kind of reasonable search result ordering, nor does it allow filtering or sorting beyond the basic search term. Be very careful that any plugin you install has a good rating (across many actual ratings) and a large install base. There’s also quite a number of freemium-type plugins in there that offer basic or crippled functionality and constantly prompt you to upgrade.

Use PressThis for linking
If you post links on your blog, there’s a great tool somewhat inconspicuously located under the Tools menu (which otherwise has basically nothing of interest in it). It’s a bookmarklet that will pop up a compose window pre-populated with some content (at least a link, and maybe the content of the description meta tag and maybe some images) from whatever page you were on, making posting links real fast.

Use the WordPress Codex
When searching on WordPress issues, it’s tempting to click articles that specifically describe your issue, but the Codex will almost always give you a far fuller understanding of the topic you’re searching on, and you’re more than likely to learn something entirely new along the way. The Hardening WordPress guide is a great example.

Security is of paramount concern
I think it’s reflected fairly well in this post that security is an important consideration when running WordPress, if for no other reason than it’s such a popular target. Consider reading the Codex’s Hardening WordPress article, and OWASP’s WordPress Security Implementation Guide. Some of the more paranoid security precautions might include:

  • Disabling file editing by adding define('DISALLOW_FILE_EDIT',true); to wp-config.php. If an attacker gains access to your admin panel, they won’t be able to edit any files that are part of your plugins, themes or WordPress itself.
  • Move your wp-config.php out of the web root. WordPress will look for wp-config.php in ../ if it can’t find it in the web root, and this will prevent your config (and secrets) from being exposed if your host ever accidentally disables PHP.
  • Delete readme.html and wp-admin/install.php. readme.html exposes your WordPress version publicly, and wp-admin/install.php should never be needed again beyond the initial install, and could be found to be vulnerable in the future.
  • Add index files to directories you wouldn’t want to display indexes for (primarily plugins, uploads, themes etc) so that no one can peek in there if indexes are ever accidentally enabled on your server.
  • Add define('FORCE_SSL_LOGIN', true); and define('FORCE_SSL_ADMIN', true); to wp-config.php to ensure admin is always accessed securely (assuming you have TLS enabled).
  • Choose a random SQL table prefix ($table_prefix in wp-config.php) so that anyone who finds SQL injection vulnerabilities in your installation will also need to determine table names before they can exploit them.

Considerations When Accepting Credit Card Payments

This post is applicable only if your product has no contact whatsoever with card data (you’re completing SAQ-A) and is processing card-not-present (CNP) transactions exclusively. These are just recommendations for how to keep your product from being among the low hanging fruit where fraud is concerned.

Credit Card Fraud Overview

Contrary to what you might assume, the majority of successful CNP credit card fraud isn’t perpetrated by someone who targeted you. Credit card data has often passed through many pairs of hands before it’s used for any serious level of fraud – once obtained, the data is more than likely to be sold on in bulk for cash by whomever was responsible for the initial theft.

At some stage in this process, the card will need to be verified, which is where the first serious fraud risk presents itself. Verification is typically conducted by making a small, fairly innocuous charge to the card. You may have had your card suspended by the issuer in the past for a charge that seemed so small it surely didn’t warrant the attention, but chances are they were more concerned about what may follow. The small charge, however, is not the real risk to the merchant either – the real risk is the chargeback.

The consumer credit industry has things set up so that the merchant is ultimately responsible for fraud occurring on their account. In addition to losing the value of the transaction so that it can be returned to the defrauded cardholder, the merchant will be assessed a “fee” of typically around $15-20 per fraudulent transaction. You can imagine this adds up to some pretty expensive fees over time, especially if the goods supplied were not infinitely available. While larger merchants may be able to eat those kinds of costs, or find additional ways to prevent fraud, this typically hurts the little guy much more.

Below are some suggestions on ways a merchant can attempt to avoid occurrences of fraud on their site along with some basic best-practices when accepting credit card payments.

Considerations

  • Ensure you have a barrier to entry that can’t be bypassed in an automated fashion. A (decent) captcha would do the job (can’t really go past reCaptcha 2.0 these days).
  • Always have a way of getting in touch with the person claiming to be the cardholder. If you suspect fraud, you’ll want to be able to make contact.
  • The smaller the amount, the higher the potential risk of fraud (see fraud overview). If it fits your model, allow customers to run larger transactions and spend the balance in smaller increments.
  • Enable any fraud protection or prevention mechanisms offered by your card processor.
  • Enable any alerts offered by your processor. These should also include alerts not directly related to fraud, such as settlement summaries, payment notifications or chargeback alerts. Just like monitoring your infrastructure!
  • Be very responsive to disputes or chargebacks. You’ll want to be in good standing if it comes to contest. If you do contest one, provide as much information as humanly possible – the decision is often final and can be very expensive for a repeat customer.
  • Enable MFA with your processor, and depending on the nature of and amounts involved in your transactions, enable it on your site as well.
  • Ensure duplicate transaction checking is always functional. Some people find it easier to request a chargeback from their institution than to take the matter up with the merchant.
  • Try to have some sane limits on what can be done on your site (limit the number of CCs per account, upper and lower limits on transaction amounts etc).
  • Allow users to store CCs (a “wallet” or “vault”). It will help you keep track of card usage since you don’t have direct access to the card details, and also adds to customer satisfaction and trust.
  • When a charge is declined, display the actual error to the user so they can fix it, instead of running an invalid card multiple times, which could be interpreted negatively by your processor.
  • Reauthenticate the user (e.g. prompt for password), or preferably the card (e.g. prompt for CVV), when buying patterns change (new shipping address, browsing from a different geographic location etc).
  • Collect full name and billing information. Not only will this stymie a fraudster without billing information, but it could be very helpful in a chargeback dispute.
  • Support 3-D Secure (Verified by Visa, MasterCard SecureCode) where possible. Hopefully it will be more broadly supported in the future.

Tips:

  • Use a modern payment processor that allows iframes or hosted fields; it looks much more professional.
  • Use an EV SSL certificate for the domain on which you’re accepting the payments. They’re quite cheap now, and can really help with consumer trust.
  • You may have heard of EMV or chip-and-pin, and how it will save the world from credit card fraud once and for all. Don’t get too excited, it doesn’t protect CNP transactions whatsoever.

Overview of the Hurricane Electric IPv6 Certification

Since I’m a fan of IPv6, but getting quite rusty lately (AWS don’t support IPv6 at all at the time of writing, and in their grand bullshit tradition of not telling you jack about what’s in the pipeline, we’re left to wonder when, or at this stage perhaps if, IPv6 is coming), I thought it was time to refresh things a bit, so I took a shot at HE’s IPv6 Certification.

Since my ISP also doesn’t support IPv6 (this is 2015 right?), I had to use various third-party providers to get the job done 🙁 I prefer not to give free advertising to anyone not giving back to the community in some way (like I’m basically doing for HE right now), so if you’re stuck, feel free to leave a comment with a valid email address and I’ll shoot you some suggestions.

A basic outline of the test is as follows:

  1. Five-question multiple choice exam on the basics – the answers are mostly found in their IPv6 primer, sort of a gentle push in the right direction.
  2. Test IPv6 connectivity – you have to visit the page from an IPv6-enabled browser. Since I couldn’t pass this part, a bit of Javascript hackery was in order.
  3. Test an IPv6-accessible web server – You place a file in the document root for them to retrieve.
  4. Test IPv6-enabled email – you receive a code via email that you must enter to proceed.
  5. Test working rDNS for the mail server
  6. Test fully-IPv6 dns resolution – give them a domain and they check for quad-As at every level (a quick dig sketch for checking this yourself follows after this list).
  7. Test IPv6 glue – give them a domain and they check for glue at the root.
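If you want to sanity-check the DNS steps yourself before submitting, something like this covers the basics. example.com and ns1 are placeholders, and for the glue check you ask your TLD’s servers (a.gtld-servers.net works for a .com):

    # quad-As at the apex and on the nameservers themselves
    dig +short AAAA example.com
    dig +short NS example.com
    dig +short AAAA ns1.example.com

    # delegation at the TLD: the AAAA glue shows up in the ADDITIONAL section
    dig @a.gtld-servers.net example.com NS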

Note 1: All practical steps are followed by their own multiple-choice test.
Note 2: You only pass a test once you’ve answered all questions correctly. If you make a mistake, all answers are cleared (some tests have more than 20 questions) and the answer order is randomized.
Note 3: Test questions are interspersed with some marketing research questions about IPv6 adoption.

There are four additional tests you can take (“enthusiast”, “administrator”, “professional”, “guru”) after you’ve completed the tests above. There are no practical elements to them – they’re just multiple-choice quizzes with increasing levels of difficulty. I stopped short of passing guru because I was relying too heavily on guesswork. Be prepared to answer questions about uncommon operating systems, obscure prefixes, 6to4 and 4in6, and router commands etc.

And, not to be forgotten, be sure to grab your free swag.

Strong TLS Configuration (and Getting an A+ On the SSL Test)

In yet another case of improper defaults, two of the most common web servers on the internet today do an unnecessarily poor job of securing “secure” connections. Below are snippets you can use in Apache or Nginx to better protect connections to your server of choice (I’m assuming you already know how to configure your server and are just looking for optimal configuration options). You can read further below to learn some more about the why. Because we’re nerds, you’ll want to run your site through Qualys’ SSLTest before and after to compare results. Be sure to have a look at the tips and caveats section before you go diving right in.

[Screenshot: an A+ result on the Qualys SSL Test]

Nginx
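What follows is a sketch of the sort of Nginx snippet described in the breakdown below – the dhparam path is a placeholder and the HSTS max-age (one year) is an assumption:

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers HIGH:!aNULL:!MD5:!DSS;
    ssl_prefer_server_ciphers on;
    ssl_dhparam /etc/nginx/ssl/dhparams.pem;
    add_header Strict-Transport-Security "max-age=31536000";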


Apache
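And the Apache equivalent, again a sketch along the lines of the breakdown below (requires mod_ssl and mod_headers; see the note about the IfModule wrapper):

    <IfModule mod_ssl.c>
        # deselect the SSL versions enabled by default
        SSLProtocol all -SSLv3
        SSLCipherSuite HIGH:!aNULL:!MD5:!DSS
        SSLHonorCipherOrder on
        # HSTS needs mod_headers
        Header always set Strict-Transport-Security "max-age=31536000"
        # DH params: concatenate dhparams.pem onto the end of your certificate file (see breakdown)
    </IfModule>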



Note: I like to remove the IfModule block surrounding the Apache TLS config. In my opinion, the server should fail to start if your TLS config is broken somewhere.

Breakdown

• We select only TLS v1 and upward (or in the case of Apache, deselect all SSL versions enabled by default).
• We limit the set of available ciphers to those in the ‘high security’ group, except those that include MD5, DSS or a NULL algorithm (which breaks down to “DHE-RSA-AES256-SHA:DHE-DSS-AES256-SHA:AES256-SHA:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA:AES128-SHA:EDH-RSA-DES-CBC3-SHA:EDH-DSS-DES-CBC3-SHA:DES-CBC3-SHA”).
• Typically, a server will go along with whatever cipher the client requests. In order to enforce forward secrecy on all modern clients, we instruct the server to assert its preference (primarily for IE and Java, surprise surprise).
• Finally, we send an HSTS header which forces the browser to upgrade insecure connections (and gets us an A+!).
• (For Apache only, your DH params should be concatenated to the end of your certificate).

Tips and Caveats

Warning 1: Sending a Strict-Transport-Security (HSTS) header will force any supporting client to upgrade their connections to your server until the max-age specified in the header since you last sent it. It is not trivial to revert, so be sure you intend to serve your site over TLS for the foreseeable future before sending it.
Warning 2: This will break compatibility with IE on Windows XP and Java v6. There are a few options you can enable to fix this, but since both platforms were end-of-life’d some time ago I’m just going to write them off as a lost cause.
Warning 3: Generating your own DH parameters will take a really long time, and consume a ton of resources (I would actually recommend opening a screen or tmux session on a box you don’t care about and having a meal while it runs).
Warning 4: When you behold your A+ rating, it’s tempting to try for all 100s (oh yeah, I know), but the “improvements” you’d need to make provide no real benefit. Your cipher suite can only include 256-bit-or-higher symmetric algorithms (higher bit length does not correlate with greater security), and only TLS v1.2 can be used (there are no significant known attacks on TLS v1 or v1.1). Not to mention you’ll break compatibility with anything built before just about yesterday (see what your handshake simulation would look like here).
Tip 1: Using a 4096-bit key gets you an easy ‘100’ on Key Exchange, but as enlightened people we don’t care about such things of course.
Tip 2: Since your DH params don’t need to be secret, you can borrow one from elsewhere instead of generating it yourself. If you do, just try to get it from a relatively unpopular (preferably not even public) source.
Tip 3: Remember the realities of encryption.
[xkcd’s “Security” comic]
Note 1: The SSL Test also scans for common vulnerabilities, so if you’re exposed to something recent your score won’t look good. Be sure to always keep your server up-to-date, and check your score before and after you make any changes. Read up on some recent ones here.
Note 2: You can consider adding ‘includeSubdomains’ to your HSTS header if appropriate for your use case.

Optional Additions

Online Certificate Status Protocol (OCSP) Stapling: Due to the size of CRLs (Certificate Revocation Lists) these days, OCSP was brought about to reduce the burden on both user agent and issuer in this respect. Unfortunately, OCSP has been less than reliable, so browser manufacturers have had to disregard OCSP when the issuer fails to respond in a timely manner. OCSP stapling allows your server to cache a valid response from an OCSP server and deliver it to the client during a handshake, thereby reducing the client’s burden and eliminating the potential for a timeout condition.
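If you want to give stapling a go on Nginx, the relevant directives look something like this (chain path and resolver are placeholders; Apache has the equivalent SSLUseStapling/SSLStaplingCache directives):

    ssl_stapling on;
    ssl_stapling_verify on;
    ssl_trusted_certificate /etc/nginx/ssl/chain.pem;   # the intermediate/issuer chain
    resolver 8.8.8.8 8.8.4.4 valid=300s;                # nginx needs a resolver to reach the OCSP responder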
HTTP Public Key Pinning (HPKP): This one is definitely designed for larger organizations (but knowledge is knowledge). This tells the browser that it should accept certificates for this domain only from a specified issuer. This reduces your exposure to fraudulent certificates issued for your domain.

Further Reading

• The SSLTest’s scoring guide: https://www.ssllabs.com/downloads/SSL_Server_Rating_Guide.pdf
• Mozilla’s wiki entry for TLS: https://wiki.mozilla.org/Security/Server_Side_TLS
• Details on the draft of TLS v1.3 (mostly removing unused and insecure features): https://en.wikipedia.org/wiki/Transport_Layer_Security#TLS_1.3_.28draft.29
• SSL Labs’ own list of testing tools: https://github.com/ssllabs/research/wiki/Assessment-Tools

Bonus – Generating your CSR and key in a one-liner
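A one-liner along these lines does the job (file names are placeholders; answer the prompts as appropriate):

    openssl req -new -sha256 -newkey rsa:2048 -nodes \
        -keyout example.com.key -out example.com.csr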

AWS Regions: Not All Created Equal

During a recent conversation with some mates, the topic of AWS stability came up. I’ve had nigh on zero complaints with AWS (except with their support, but that’s a story from another time) since I first began using the platform four or so years ago, so I was baffled when several of them questioned its reliability. As it turns out, even now (as opposed to way back when EBS would drop out every month or two), they have good reason.

Because it’s the default, many new users often spin up their stuff in Amazon’s Northern Virginia region. However, if you look at Amazon’s own outage data, and keep in mind that us-west-2 (Oregon) has a near-identical feature set, and identical pricing, you’ll be left wondering what you’re doing in N. Virginia.

Amazon’s status page can be found here. A quick look at how she works will lead you to find data.json which contains all the outage information for the last year. After a bit of processing and some gnuplot magic, you can have a rough plot of the various outages. This is how things looked at time of writing.

[Plot: AWS outage events per region over the preceding year, generated from data.json]

While this doesn’t take into account the severity or duration of any given outage, I think it’s pretty fair to say that us-east-1 seems to be the least reliable region in AWS (even if you exclude the cluster from Sept 20th when the entire region’s API went bananas). I was lucky enough to start out using AWS with a company that was already aware of this, but for those who weren’t, consider sticking any new stuff in Oregon!

The data.json file I used for the graph above can be found here.

Test Streaming With FFMPEG

What can you say about FFMPEG other than it’s fucking brilliant. Yes there’s bugs, yes they don’t mind breaking the interface a little more often than you’d like, yes you’ll inevitably be using a build that doesn’t support that feature you wanted…okay maybe it’s got a problem or two, but seriously, what an unbelievably flexible tool. You might imagine a guy working for a video streaming startup might be using it a fair amount, and as it happens you’d be right. Below is a breakdown of my favorite testing incantation. Many parts can also be broken down for general a/v testing purposes.

TL;DR: Give this a go:
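A sketch along the lines of what’s broken down below – treat the RTMP URL, frame size/rate and font path as placeholders:

    # optional quick-and-dirty clock sync, see the NTP article below
    sudo ntpdate -u -p 1 -b 129.6.15.29

    while true; do
        ffmpeg -re \
            -f lavfi -i "testsrc=size=1280x720:rate=24" \
            -f lavfi -i "sine=frequency=1000:beep_factor=4" \
            -af "volume=0.1" \
            -vf "drawtext=fontfile=/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf:text='%{localtime}':fontcolor=white@0.8:x=10:y=10" \
            -codec:a libfdk_aac \
            -codec:v libx264 -preset ultrafast -profile:v main \
            -pix_fmt yuv420p -g 48 -tune zerolatency \
            -f flv "rtmp://example.com/live/test"
        sleep 2
    done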

The ntpdate line is from my Getting a quick and dirty ntp sync article. This step is optional and should only be used where you generally don’t give a shit about your system clock. It’s also only necessary if you want the timestamp functionality shown below.

The loop is here so that when you break shit you don’t have to start the script again, and fuck knows that happens plenty. Thanks to the sleep afterward, a double-tap (no pun intended) on ctrl+c will be enough to kill it dead.

-re is in ffmpeg for our very purpose here. It reads from the input source at its actual framerate (as opposed to as fast as possible by default) so we can see things as we expect them to look, instead of like that time you tried an emulator that wasn’t designed for your CPU and the music sounded like a bunch of chipmunks on speed. You can drop this one if you’re not testing a live stream.

-f lavfi is used multiple times in this incantation. It allows us to specify an arbitrary number of inputs and outputs. In this case, the inputs are testsrc and sine and the output is the test stream.

testsrc shows a motion test image with a visual counter, and has a few configurable options (see the docs for more info). We’re specifying frame size and rate here. You can have a look at life, which can be fun, and mandelbrot as alternatives, but testsrc is the most useful when attempting to diagnose issues. Click here for a screenshot.

sine generates the audio signal of a sine wave (see the docs). the f option specifies the frequency, and the b option causes a pulse at n times the carrier frequency each second. It’s very handy for determining video/audio sync, and as an audible indication that the stream is running (if you’re having video issues). Unfortunately, the sound is really fucking annoying, so I turn it down really low with -af "volume=0.1". You can also have a look at flite which would allow you to synthesize voices, but ffmpeg is rarely compiled with libflite, and it would probably be just as fucking annoying. Of course feel free to use an actual audio source for input, but then the incantation isn’t portable.

-codec:a libfdk_aac instructs ffmpeg to use the Fraunhofer AAC encoder for the audio. ffmpeg is often not compiled with the library as it’s not Free, so as an alternative you can specify -strict experimental -acodec aac instead (see here for more details). You can take a look at ffmpeg -codecs | grep -i aac to see which aac decoders and encoders you have available.

-codec:v libx264 instructs ffmpeg to encode the output as h.264, commonly conflated with its most common container MP4. Since ffmpeg affords us the opportunity to specify a few tuning parameters for x264 we’re also specifying:

  • -preset ultrafast which instructs libx264 to use only the very fastest optimizations
  • -profile:v main which instructs ffmpeg to encode to the ‘main’ h.264 profile which should be supported on all modern devices, but you can try using high if you want better reproduction for some reason, or ‘baseline’ if you want really serious compatibility, see here for more info
  • -pixel_format yuv420p (or -pix_fmt in older builds) which instructs ffmpeg to encode to YUV 4:2:0, a very well supported pixel format
  • -g 48 which instructs ffmpeg to provide a keyframe at every 48 frames, or two seconds in this case. It’s usually safe to specify `-g` at double your framerate unless you have some serious action in frame
  • -tune zerolatency which enables some additional latency-saving parameters in x264 (you can go even further here by using CBR, and setting maxrate & bufsize to the size of a single frame, but we’re looking for the lowest latency with the least amount of work here)

drawtext is a filter for drawing text over the video. In this case we’re using it to draw the system time across the left-hand side of the video. Combined with the synchronization of the system clock at the start, it gives you a decent idea of how long it takes for your stream to be replayed (assuming your own clock isn’t too far out of course). Here we’re specifying font file (DejaVuSans.ttf should be available at this location on Trusty), the text, color (& alpha) and coordinates. You can see some of the other options here. For this one to work you will need to have compiled ffmpeg with at least --enable-libfreetype.

-f flv instructs ffmpeg to package the stream in an FLV container. If you’re using this for anything other than a live stream, this is unnecessary, it will default to mp4.

And we’re done! Here’s what the finished product looks like: ffmpeg-test.flv. Have a look below for some reference materials and tips & tricks. Enjoy your streaming!

If you’re interested in building ffmpeg for this configuration, see below. Probably go have a nap after you press return on this baby.
Note: you will need multiverse enabled for libfdk-aac-dev. You can remove it from the package list and --enable-libfdk-aac from the build args to disable it.

For ffmpeg:
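Something along these lines should do it – package names assume Ubuntu Trusty, and the configure flags only cover what’s used above:

    sudo apt-get install build-essential pkg-config yasm \
        libx264-dev libfdk-aac-dev libfreetype6-dev

    git clone https://git.ffmpeg.org/ffmpeg.git
    cd ffmpeg
    ./configure --enable-gpl --enable-nonfree \
        --enable-libx264 --enable-libfdk-aac --enable-libfreetype
    make -j"$(nproc)"
    sudo make install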

For libav (virtually identical):

Tips and Tricks:

If you’re wondering why you can’t get ffmpeg or ffplay to consume a particular RTMP stream, try putting quotes around the URL and adding live=1 to the end of the URL. It forces ffmpeg to call FCSubscribe, which is required by, amongst others, EdgeCast.
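For example (the URL is made up):

    ffplay "rtmp://edge.example.com/live/somestream live=1"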

If you want the basic details of another stream you’re looking at, try ffprobe $URL. A much more detailed, but way less human-readable incantation is ffprobe -show_streams $URL.

Getting a quick and dirty NTP sync
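Reconstructed from the flag-by-flag breakdown below, the incantation amounts to something like this (the IP is time-b.nist.gov):

    sudo ntpdate -u -p 1 -b -v -d 129.6.15.29
    # note: on stock ntpdate, -d goes through the motions without actually
    # stepping the clock – drop it if you want the time set for real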


Under normal circumstances, this incantation would be very bad practice, but we’re looking for a quick and dirty (and portable) sync here, so just be sure to only use it on a box where you generally don’t care about your clock.

ntpdate will ordinarily take a domain as an argument, but in this case we specify an IP (this ensures we only get a single server, and we can avoid a whole handful of milliseconds performing a DNS request [yes that’s a joke]). Here I’m using 129.6.15.29, also known as time-b.nist.gov, which is among the oldest and most reliable non-commercial open-access stratum-1 ntp servers in North America. Also, if you’re ever in a pinch and she ain’t working, there’s another either side 🙂

The -u instructs ntpdate to use an unprivileged port which, while unnecessary under ideal circumstances, will help you avoid colliding with whatever NTP daemon might be running on your machine.

The -p 1 instructs ntpdate to use only a single packet to synchronize, which leads to a much higher skew than you’d otherwise have, but since we’re still almost certainly talking sub-second accuracy it’s fine for our purposes here, and much faster than permitting the default of 4.

-b forces the time to be adjusted in a single step instead of allowing ntpdate to slew, which would take more time.

Finally, -v and -d increase verbosity and enable debugging respectively (output from which should be limited due to -p 1 and having a single server to talk to) so we can see what’s going on (and of course just to add to the nerdiness factor).

AWS Business Support – A Cost of Doing Business

Anyone finding themselves frustrated with AWS’s Basic or Developer support response times will find no shortage of company on the Internet, primarily on Amazon’s own community forums. In my experience, and in that of many others I know, there’s little difference between Basic and Developer support, and you can often experience wait times of several days between responses from (sometimes seriously unskilled) technical support officers.

It basically puts you in the position of having to buy Amazon’s Business Support plan (which costs the greater of $100 or 10% of your monthly bill at the time of writing), or jumping ship altogether. While this may seem a mighty expense, you’ll still come in under what you’d be paying for Rackspace’s fanatical support, while retaining all the additional benefits AWS has to offer. In short, if you can’t or won’t get yourself on business support, run, run fast.

Poking around someone else’s infrastructure

I recently had reason (and permission) to go digging in someone else’s backyard, and I thought a list of things to look at might be useful to someone else. This is just off the top of my head, so there’ll be a bunch of holes, but it works as a decent starting point for me. It doesn’t include anything overtly malicious, but do be careful (and of course I’m not in any way responsible for how you use the information in here). Have fun!

  • Google the shit out of them. No seriously, at least combine their name with a few things:
    • Emails and names
    • Blog
    • Forums
    • Reviews
    • Competitors
    • Alternatives
    • Engineering
    • Social Media
    • Google News
  • For any IPs
    • Reverse DNS (dig -x)
    • Whois
    • Find any other sites that are being hosted on the IP (try Censys, Shodan, DNSDumpster & crt.sh for a start, and search for it in Bing with an ip: prefix)
  • For any domains
    • Whois
    • Run it through Mozilla’s Observatory: https://observatory.mozilla.org/
    • Run it through the Wayback Machine
    • robtex.com to get tons of information about the domain and DNS
    • whatsmydns.net to see what DNS returns around the globe
    • Look at robots.txt/humans.txt and sitemap.xml
    • dig (a quick starter for these follows at the end of this checklist)
      • MX
      • TXT (spf, dmarc etc)
      • A, AAAA, CNAME
      • NS
      • SOA
      • ANY (don’t be surprised if you don’t get anything from this)
  • Find whatever subdomains you can (try here for a start)
  • Use the shit out of the inspector. You can look for:
    • Headers
    • JS includes (metrics, monitoring etc)
    • CDNs
    • Indications of a CMS
    • Other subdomains
    • Mobile readiness
    • Indications of build process, tooling, language and platform
    • Code comments
    • Console output
  • Other domains with similar owner info. This one’s a pretty inexact science, and is pretty seriously hampered by whois privacy
  • Port scan (see A Basic Nmap Scan). Be careful not to piss too many people off here, and don’t do it from your home machine.
  • traceroute
  • Decompile and/or strings any native apps
  • Monitor traffic from native apps or flash applets with wireshark and/or tcpdump (I prefer to use the former to process the output of the latter)
  • Look at the source of any email correspondence
  • Run any TLS endpoints through SSL Test and check for a default cert if using SNI
  • Specifics:
    • Run wpscan on wordpress
  • Tools and links:
    • http://backtrack.offensive-security.com/index.php?title=Tools
    • https://github.com/makefu/dnsmap
    • https://github.com/jvehent/cipherscan
    • https://censys.io/
    • http://www.wolframalpha.com/
    • https://github.com/laramies/theHarvester
    • https://sshcheck.com/
    • sitemaps
    • spam RBLs
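As mentioned under the dig item, here’s a quick starter for the DNS bits (example.com and the IP are placeholders):

    # the record types above in one go
    for t in MX TXT A AAAA CNAME NS SOA ANY; do dig +noall +answer example.com "$t"; done

    # reverse DNS and whois for an IP of interest
    dig -x 203.0.113.10 +short
    whois 203.0.113.10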