Pēteris Caune

How Healthchecks.io Sends Webhook Notifications

Webhooks are a powerful way to notify external systems about checks changing state in Healthchecks.io. Webhook notifications are available to all user accounts, paid and free.

Webhooks were the second notification method supported by Healthchecks (the first one was email). The webhook delivery code started as a simple requests.get(user_supplied_url) and evolved. Today, the webhook integration in Healthchecks supports:

  • HTTP GET, POST, and PUT requests with user-defined request bodies.
  • User-defined request headers.
  • Placeholder values like $NAME and $STATUS that can be used in the URL, the headers, or the request body.
  • Separate webhook configurations for “check goes up” and “check goes down” events.
  • Retries when requests time out or return a non-2xx status code.

In terms of implementation, none of the above is super complicated. When the user sets up a webhook integration, we collect the webhook configuration. When it is time to send a notification, we assemble the URL, the headers, and the request body, and pass them to our HTTP client library of choice. But two security-related aspects are a little more interesting:

  • We want to prevent webhook requests from accessing private IP addresses (10.x.x.x, 192.168.x.x, …).
  • Webhook targets can sometimes take a long time to respond. One user’s slow notifications should not block or delay another user’s normal notifications.

Private IP Addresses

Malicious users can point webhook URLs at resources in the Healthchecks.io internal network and tamper with them. They can also set up DNS records that resolve to private IP addresses. So it is not enough to check webhook URLs for private IP ranges using, e.g., regular expressions.

I switched Healthchecks to using pycurl for making outbound HTTP requests. pycurl is a Python wrapper for libcurl, and libcurl lets you specify a CURLOPT_OPENSOCKETFUNCTION callback function. This function receives an IP address after DNS resolution, and can decide whether to connect to it or not.
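Here is a rough sketch of the idea in Python (not the actual Healthchecks code). I am assuming pycurl passes the callback a (family, socktype, protocol, address) tuple and exposes libcurl’s CURL_SOCKET_BAD sentinel as pycurl.SOCKET_BAD; check the pycurl documentation for the exact callback contract:

import ipaddress
import socket

import pycurl

def opensocket(purpose, curl_address):
    # curl_address is assumed to be (family, socktype, protocol, (ip, port))
    family, socktype, protocol, address = curl_address
    ip = ipaddress.ip_address(address[0])  # the IP address after DNS resolution
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        # Assumption: pycurl.SOCKET_BAD maps libcurl's CURL_SOCKET_BAD
        return pycurl.SOCKET_BAD  # refuse to connect to private addresses
    return socket.socket(family, socktype, protocol)

c = pycurl.Curl()
c.setopt(pycurl.URL, "https://example.org/webhook")
c.setopt(pycurl.OPENSOCKETFUNCTION, opensocket)
c.perform()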

Healthchecks has a site-wide configuration setting for enabling/disabling webhook requests to private IP addresses. This setting is disabled on the hosted service at Healthchecks.io. Operators of self-hosted Healthchecks instances, on the other hand, sometimes specifically need webhooks to access services running inside their internal network, and they can enable it.

When migrating Healthchecks from requests to pycurl, I wrote a wrapper for pycurl that mimics the requests API, and thus could be used as a drop-in replacement. It does not cover the full functionality of requests, but it does cover the functionality that Healthchecks uses.
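As an illustration of what such a wrapper might look like, here is a bare-bones, hypothetical version (the real one handles headers, redirects, exceptions, and more):

import io

import pycurl

class Response:
    def __init__(self, status_code: int, content: bytes):
        self.status_code = status_code
        self.content = content

def request(method: str, url: str, data: str = "", timeout: int = 10) -> Response:
    buf = io.BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.CUSTOMREQUEST, method.upper())
    if data:
        c.setopt(pycurl.POSTFIELDS, data)
    c.setopt(pycurl.TIMEOUT, timeout)
    c.setopt(pycurl.WRITEDATA, buf)
    c.perform()
    status = c.getinfo(pycurl.RESPONSE_CODE)
    c.close()
    return Response(status, buf.getvalue())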

Slow Webhook Targets

Users can set up webhooks to targets that take a long time to respond, and then generate frequent notifications to these targets. Doing so would keep the notification-sending process busy and delay notifications for all other users. Users could do this maliciously, but this could also happen (and has happened) unintentionally.

The first obvious mitigation was to implement a time budget for each webhook delivery: if a webhook delivery (including retries) takes too long, we abort it.

Another mitigation was to prioritize notifications to integrations with lower historic send times. If we have multiple deliveries lined up, start with the quick ones, and do the slow ones last.

The notification sender is implemented as a Django management command (“manage.py sendalerts”). A simple way to increase sending capacity would be to run multiple “sendalerts” processes concurrently. This works, but each process needs at least one database connection. I am not running PgBouncer (and want to delay introducing new infrastructure pieces for as long as possible), so I cannot go too crazy with many concurrent “sendalerts” processes.

A few weeks ago I completed work on another idea to increase the sending capacity. The “sendalerts” process now uses multiple worker threads to send notifications. The worker threads share database connections using a psycopg3 connection pool, which Django recently added support for. There can be more worker threads than database connections available in the pool, but the worker threads are programmed to return DB connections to the pool before potentially long network IO operations, allowing other threads to advance. With an appropriately set worker count, this allows hundreds of in-progress webhook requests while using only a few DB connections.
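Here is a simplified sketch of the pattern (not the actual sendalerts code). I am assuming Django is configured with psycopg connection pooling ("OPTIONS": {"pool": True} in DATABASES), in which case connection.close() hands the connection back to the pool instead of closing it:

import queue
import threading

from django.db import connection

def build_payload(flip):
    # Hypothetical helper: read whatever the notification needs from the DB.
    return str(flip)

def deliver(payload):
    # Stand-in for the potentially slow webhook/HTTP delivery.
    pass

def worker(work_queue):
    while True:
        flip = work_queue.get()        # the state change to notify about
        payload = build_payload(flip)  # DB reads happen here
        connection.close()             # return the DB connection to the pool
        deliver(payload)               # slow network IO, no DB connection held
        work_queue.task_done()

work_queue = queue.Queue()
for _ in range(100):                   # more workers than pooled DB connections
    threading.Thread(target=worker, args=(work_queue,), daemon=True).start()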

After implementing the worker threads, I removed the prioritization by historic send time. I also increased the timeout value for outbound HTTP requests as now I could afford to! The timeout is currently set to 30 seconds, and Healthchecks retries failed requests up to 2 times. So a single delivery can take up to 3 * 30 = 90 seconds.

Closing Notes

Healthchecks.io now uses the threaded notification sender for delivering all notification types, not just webhooks. There are integration types other than webhooks that are sometimes slow. For example, Signal and MS Teams notifications sometimes take multiple seconds to complete. The above changes benefit all integration types, not just webhooks. Webhooks, however, are the most risky, as they can be fully configured by users.

Thanks for reading,
–Pēteris

Running One-man SaaS, 9 Years In

Healthchecks.io launched in July 2015, which means this year we turn 9. Time flies!

Previous status updates:

Money

Healthchecks.io currently has 652 paying customers, and the monthly recurring revenue is 14043 USD. MRR graph:

Side note: to minimize the number of data sub-processors, I am not using revenue analytics services. I used a script and a spreadsheet to make the MRR graph!

I’m happy to see MRR gradually go up, but I’m not optimizing for it. Healthchecks.io is sustainable as-is, and so I’m optimizing for enjoyment and life/work balance.

More stats (user count, check count, pings/day) are available on the Healthchecks.io About page.

Still a one-man business?

Yes, Healthchecks.io is still a one-man business. Until 2022, I was part-time contracting. Since January 2022 Healthchecks.io has been my only source of income, but I work on it part-time.

At least for the time being I’m not looking to expand the team. A large part of why I’m a “solopreneur” is because I do not want to manage or be managed. A cofounder or employee would mean regular meetings to discuss what’s done, and what’s to be done. It would be awesome to find someone who just magically does great work without needing any attention. Just brief monthly summaries of high-quality contributions, better than I could have done. But I don’t think I can find someone like that, and I also don’t think I could afford them.

Growth Goals

I’m not planning to tighten the limits of the free plans. I started Healthchecks in 2015 because I thought the existing services (Dead Man’s Snitch and Cronitor) were overpriced. I started with “I think this can be done better and cheaper”, and I’m sticking with it.

For the same reason, I’m also not planning to raise pricing for paid plans.

I’m choosing not to pursue enterprise customers who ask about PO billing, payments by wire transfer, custom agreements, and signing up to vendor portals. “But you are leaving money on the table!” – yes, it is a conscious decision. In my situation, the extra money will not make a meaningful difference, but the additional burden will make me more busy and grumpy.

Feature-wise, I am happy with the current scope and feature set of Healthchecks. I am not planning to expand the scope and add e.g. active uptime monitoring, hosted status pages, or APM features.

Healthchecks the product is hobbit software and Healthchecks.io the business is a lifestyle business.

Hosting Setup

The hosting setup is mostly the same as in 2022. Just a few updates:

  • Web servers upgraded to Hetzner’s AX42 (AMD 8700GE, 8 cores). On the old machines, I saw occasional nonsensical Python exceptions; a kernel update and a reboot did not fix them. Rather than messing with hardware troubleshooting, I upgraded to newer, faster, and more efficient machines.
  • Database servers upgraded to Hetzner’s EX101 (Intel 13900, 8+16 cores). I was setting up new database replicas after an outage and failover event and took the opportunity to upgrade hardware.
  • Healthchecks.io now sends its own email using maddy.
  • Healthchecks.io now stores ping body data in S3-compatible object storage. This keeps the PostgreSQL database size down but adds reliance on an external service.

That’s it for now, thank you for reading! Here’s to another 9 years. To close things out, here’s a complimentary picture of me trying to fit through pull-up bars, and my kids, Nora and Alberts, cheering:

Happy monitoring,
Pēteris,
Healthchecks.io

Data Breach Report: Some SMS Notifications Sent To France and Italy Were Exposed

On July 2, 2024, we received a notice from Twilio, our SMS provider, about a data leak involving IdentifyMobile, one of their downstream carriers. The downstream carrier had left an AWS S3 bucket publicly accessible from May 10 to May 15, 2024. The bucket contained message-related data sent between January 1, 2024, and May 15, 2024.

After requesting additional information, Twilio informed us that the leak included 13 SMS notifications sent by Healthchecks.io. The leaked data included the message body, recipient number, and timestamp. Unfortunately, Twilio could not determine which specific recipient numbers were impacted, but they knew that only messages to France and Italy were affected. On July 5, we notified all users with phone numbers in the affected regions, 40 accounts in total.

Q: I received “Notice of Security Incident With SMS Notifications” from Healthchecks. Is there anything I should do?

Your Healthchecks.io account is not compromised; there is no need to change its password.

You could consider switching from SMS to a different notification method which does not require your phone number, for example Pushover. No service is immune to security incidents, but if they do not have your phone number in the first place, they cannot leak it.

Q: Why did you notify 40 accounts if only 13 messages were exposed?

Twilio provided a list of exposed message IDs, but not the associated phone numbers. We cannot associate message IDs with phone numbers, because we have configured our Twilio account to retain message logs for only 7 days. We had selected the relatively short log retention period, ironically, to minimize the damage in case the message logs somehow leaked.

We asked Twilio support to request the recipient phone number data from IdentifyMobile, as they presumably still have access to the data that was exposed. According to Twilio, IdentifyMobile are “currently unable to share the requested information due to the sensitive nature of it”.

Timeline

  • May 10, 2024: IdentifyMobile makes AWS S3 bucket containing sensitive data public.
  • May 15, 2024: IdentifyMobile fixes the leak.
  • July 2, 2024: Twilio sends a notice of security incident to its customers.
  • July 3, 2024: We request additional information from Twilio support.
  • July 4, 2024: Twilio support clarifies what information was exposed, and provides a list of the 13 exposed message IDs.
  • July 5, 2024: We send a notice of security incident to the 40 potentially affected users.
  • July 5, 2024: We ask Twilio support to request recipient numbers from IdentifyMobile. On the 3rd attempt, Twilio agrees to do it.
  • July 10, 2024: Twilio support informs us IdentifyMobile cannot share the requested information.
  • July 11-16, 2024: We ask Twilio support followup questions about plans to audit their other carriers and sub-carriers, and receive non-specific answers.
  • July 19, 2024: We publish this report.

OnCalendar schedules: Monitor Systemd Timers with Healthchecks.io

Healthchecks now supports OnCalendar schedules, used for scheduling tasks with systemd timers. Here’s what’s new: when creating a check, you can now switch between “Simple”, “Cron” and “OnCalendar” schedules:

You can also edit schedules (and switch schedule types) for existing checks:

The UI control for entering the schedule is a multi-line textbox, and yes, you can specify multiple schedules there – Healthchecks will expect a ping when any schedule matches:

Note: the schedule field is currently limited to 100 characters. You will be able to enter 2-3 schedules, but probably not 10+ schedules.

systemd allows you to specify a timezone inside the OnCalendar expression. So does Healthchecks:

The API now supports OnCalendar schedules as well. You can pass either a cron schedule or OnCalendar expression(s) in the “schedule” field for the Create a new check and Update an existing check calls, and Healthchecks will detect the schedule type automatically:

$ curl -s https://healthchecks.io/api/v3/checks/ \
    --header "X-Api-Key: fdYYw32ftDvYQoCe4C1JUgp7SlPbOYTI" \
    --data '{"name": "Runs at 8AM", "schedule": "8:00"}' | jq .
{
  "name": "Runs at 8AM",
  "slug": "",
  "tags": "",
  "desc": "",
  "grace": 3600,
  "n_pings": 0,
  "status": "new",
  "started": false,
  "last_ping": null,
  "next_ping": null,
  "manual_resume": false,
  "methods": "",
  "subject": "",
  "subject_fail": "",
  "start_kw": "",
  "success_kw": "",
  "failure_kw": "",
  "filter_subject": false,
  "filter_body": false,
  "ping_url": "https://hc-ping.com/97f70e1c-bf2b-4244-ba44-de413c93fab4",
  "update_url": "https://healthchecks.io/api/v3/checks/97f70e1c-bf2b-4244-ba44-de413c93fab4",
  "pause_url": "https://healthchecks.io/api/v3/checks/97f70e1c-bf2b-4244-ba44-de413c93fab4/pause",
  "resume_url": "https://healthchecks.io/api/v3/checks/97f70e1c-bf2b-4244-ba44-de413c93fab4/resume",
  "channels": "",
  "schedule": "8:00",
  "tz": "UTC"
}

Under the hood, the OnCalendar schedule parsing logic is implemented in a separate “oncalendar” library. Feel free to use it in your own Python projects as well!
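Here is a quick usage sketch, based on my reading of the oncalendar README; double-check the library’s documentation for the authoritative API. I am assuming OnCalendar(expression, start) returns an iterator of datetimes:

from datetime import datetime, timezone

from oncalendar import OnCalendar

it = OnCalendar("Mon..Fri 8:00", datetime.now(timezone.utc))
print(next(it))  # the next datetime matching the schedule
print(next(it))  # ... and the one after that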

The OnCalendar schedule support is live on https://healthchecks.io and available to all accounts. Happy monitoring!

–Pēteris

Comparison of Cron Monitoring Services (November 2023)

In this post I’m comparing cron monitoring features of four services: Cronitor, Healthchecks.io, Uptime Robot, Sentry.

How I picked the services for comparison: I searched for “cron monitoring” on Google and picked the top results in their order of appearance.

Disclaimer: I run Healthchecks.io, so I’m a biased source. I’ve tried to get the facts right, but choosing what features to compare, and what differences to highlight, is of course subjective. When in doubt, do your own research!

Business Stats

Cronitor launched in 2014, is registered in the United States and runs on AWS. Cronitor is a bootstrapped company, and is operated by three friendly humans. Cronitor started as a cron monitoring service, but has expanded to website uptime monitoring, real user monitoring, and hosted status pages. Cronitor is a proprietary product and uses the SaaS business model.

Healthchecks.io launched in 2015, is registered in Latvia and runs on Hetzner (Germany). Healthchecks.io is a bootstrapped company, run by a solo founder. Healthchecks.io focuses on doing one thing and doing it well: alerting when something does not happen on time. Healthchecks.io is open source (source on GitHub), users can use the hosted service, or run a self-hosted instance.

Uptime Robot launched in 2010, is registered in Malta, and runs on Limestone Networks, AWS, and DigitalOcean. Uptime Robot started as a free website uptime monitoring service and added cron monitoring and hosted status pages support in 2019. After getting acquired in late 2019, Uptime Robot accelerated development and reorganized its pricing structure. Uptime Robot is a proprietary product and uses the SaaS business model.

Sentry launched in 2012, is registered in the United States and runs on AWS and Google Cloud. Sentry is a VC-funded company and has 200+ employees. Sentry started as an error tracking service, grew into APM, and launched cron monitoring support in public beta in January 2023. Sentry uses the SaaS business model, but its source code is available under the FSL license. Sentry is a complex product with many moving parts. Self-hosting is possible but is not trivial.

Pricing

Each reviewed service except Healthchecks.io bundles several products under one account:

  • Cronitor: cron monitoring, website uptime monitoring, RUM, status pages.
  • Uptime Robot: website uptime monitoring, cron monitoring, status pages.
  • Sentry: error tracking, APM, code coverage.

The total set of functionality you get from a paid account on each service is vastly different, so their pricing is not directly comparable. With that in mind, here is the pricing summary for each service, as of November 2023, for monitoring cron jobs specifically.

Cronitor

  • Free plan: monitor up to 5 jobs.
  • Business plan: $2/mo for 1 job.

Monitoring 100 jobs with Cronitor would cost $200/mo.

Healthchecks.io

  • Free plan: monitor up to 20 jobs.
  • Business plan: $20/mo for 100 jobs.
  • Business Plus plan: $80/mo for 1000 jobs.

Monitoring 100 jobs with Healthchecks.io would cost $20/mo. Healthchecks.io offers sponsored accounts for non-profits and open-source projects (details).

Uptime Robot

  • Solo plan: $8/mo for 10 jobs or $19/mo for 50 jobs.
  • Team plan: $34/mo for 100 jobs.
  • Enterprise plan: $64/mo for 200 jobs.

Monitoring 100 jobs with Uptime Robot would cost $34/mo. Uptime Robot offers sponsored accounts for charities and other non-profits (details).

Sentry

The Sentry Cron Monitoring feature was in open beta when this comparison was first written, and the limits for the different pricing plans were not known. Sentry has since announced general availability and pricing, on January 16, 2024:

  • Free: monitor 1 cron job for free.
  • Paid: $0.78/mo for 1 job.

Monitoring 100 jobs with Sentry would cost $77/mo. Sentry offers sponsored accounts for non-profits, open-source, and students (details).

Timeout-based Schedules

When using timeout-based schedules the user specifies a period (for example, one hour). The monitored system is expected to “check in” (send an HTTP request to a unique address) at least every period. When a check-in is missed, the monitoring system declares an outage and notifies you.

This monitoring technique is also sometimes called Heartbeat Monitoring. All four reviewed services support timeout-based schedules.

Cron Expression Schedules

The user specifies a cron expression (for example, “0/5 * * * *”) and a timezone. The monitoring system calculates expected “check in” deadlines based on the cron expression.

Supported by: Cronitor, Healthchecks.io, Sentry.

Not supported by: Uptime Robot.

Cronitor and Sentry use the croniter library to evaluate cron expressions. Healthchecks.io uses the cronsim library.

Start and Fail Signals

In addition to basic “I’m alive!” check-in messages, monitoring services typically support additional signal types:

  • “job started” signal: allows the measurement of job durations, and alerting when a job takes too long.
  • “job failed” signal: allows the job to explicitly declare itself as failed.

Supported by: Cronitor (docs), Healthchecks.io (docs), Sentry (docs).

Not supported by: Uptime Robot.

Check-in Via Email

With this feature, clients can “check in” by sending an email message to a job-specific email address. This comes in handy when integrating with services that only support status reports via emails, or when working in restrictive environments where only email is allowed through.

Supported by: Cronitor (docs), Healthchecks.io (docs).

Not supported by: Uptime Robot, Sentry.

Auto-Provisioning

With auto-provisioning clients can perform check-ins for jobs that the monitoring system does not yet know about, and the monitoring service registers the new jobs on the fly. Auto-provisioning is handy in dynamic environments where the set of monitored jobs changes frequently.

Supported by: Cronitor (docs), Healthchecks.io (docs), Sentry (docs).

Not supported by: Uptime Robot.

Client SDKs and API

Cronitor provides first-party command-line client and SDKs for Java, JavaScript, Kubernetes, PHP, Python, Ruby, and Sidekiq. There are also third-party SDKs for Terraform and .Net.

Healthchecks.io does not provide first-party client SDKs. There are a number of third-party client libraries.

Sentry provides first-party command-line client and SDKs for Celery, Go, Java, JavaScript, Laravel, Node, PHP, Python, Quartz, Rails, Ruby, and Spring.

Uptime Robot does not provide first-party client SDKs.

All four services provide an HTTP API: Cronitor API docs, Healthchecks.io API docs, Uptime Robot API docs, Sentry API docs.

Notification Methods

Each reviewed service supports a number of different ways to deliver downtime notifications:

Cronitor

  • Free: email, webhooks (only GET requests), MS Teams, Slack, Telegram.
  • Paid: Opsgenie, PagerDuty, SMS, Splunk On-Call.

Healthchecks.io

  • Free: email, webhooks, Discord, LINE Notify, Matrix, Mattermost, MS Teams, Opsgenie, PagerDuty, PagerTree, Pushbullet, Pushover, Rocket.Chat, Signal, Slack, Spike.sh, Telegram, Trello, Splunk On-Call, Zulip.
  • Paid: SMS, voice calls, WhatsApp.

Uptime Robot

  • Free: Android and iOS app, email, Google Chat, Discord, Pushbullet, Pushover, Splunk On-Call.
  • Paid: webhooks, MS Teams, PagerDuty, Slack, SMS, Telegram, voice calls, Zapier.

Sentry

  • Free: email, webhooks.
  • Paid: Amixr, Discord, MS Teams, Opsgenie, PagerDuty, Pushover, Rocket.Chat, Rootly, Slack, Spike.sh, SMS, TaskCall, Threads, Splunk On-Call.

Project Management, User and Team Management, Authentication

Cronitor

Cronitor supports organizing jobs into Environments. Within each environment, jobs can be organized into groups. Jobs can be annotated with tags.

Cronitor supports multiple team members ($5/mo for each additional user). Team members can have “admin”, “user”, “readonly” roles.

Cronitor supports SAML2 SSO, which costs an extra $5/mo for every team member. Cronitor does not support two-factor authentication.

Healthchecks.io

Healthchecks.io supports organizing jobs into Projects. Jobs can be annotated with tags.

Healthchecks.io supports multiple team members with “owner”, “manager”, “user”, and “read-only” roles.

Healthchecks.io does not support any form of SSO. Healthchecks.io supports two-factor authentication using WebAuthn and using one-time codes (TOTP).

Uptime Robot

Uptime Robot does not support grouping or tagging jobs.

Uptime Robot’s higher-priced plans support multiple team members with “admin”, “read”, and “write” roles.

Uptime Robot does not support any form of SSO. Uptime Robot supports two-factor authentication using one-time codes (TOTP).

Sentry

Sentry supports organizing jobs into Projects and Environments.

Sentry supports multiple team members with “billing”, “member”, “admin”, “manager”, and “owner” roles.

Sentry offers many options for SSO: Google, GitHub, Okta, SAML2, and others. All options except Google and GitHub require the Business ($80/mo) billing plan. Sentry supports two-factor authentication using U2F, one-time codes (TOTP), and recovery codes.

Feature Matrix

Columns: Cronitor | Healthchecks.io | Uptime Robot | Sentry
(✓ = supported, – = not supported, 💰 = paid plans only, ? = unknown at the time of writing)

Business registered in: 🇺🇸 | 🇱🇻 | 🇲🇹 | 🇺🇸
Servers hosted in: 🇺🇸 | 🇩🇪 | 🇺🇸 | 🇺🇸
Team size: 3 | 1 | 10+ | 200+
Founded in: 2014 | 2015 | 2010 | 2012
Jobs in the free plan: 5 | 20 | 0 | ?
Price/mo for 100 jobs: $200 | $20 | $34 | ?
Self-hosting possible: – | ✓ | – | ✓
Timeout-based schedules: ✓ | ✓ | ✓ | ✓
Cron expressions: ✓ | ✓ | – | ✓
“start” and “fail” signals: ✓ | ✓ | – | ✓
Check-in via email: ✓ | ✓ | – | –
Auto-provisioning: ✓ | ✓ | – | ✓
Client SDKs: ✓ | third-party only | – | ✓
API: ✓ | ✓ | ✓ | ✓
Projects: ✓ | ✓ | – | ✓
Team access: 💰 | ✓ | 💰 | ✓
Single sign-on: 💰 | – | – | 💰
Two-factor authentication: – | ✓ | ✓ | ✓
Notify via email: ✓ | ✓ | ✓ | ✓
Notify via webhooks: ✓ | ✓ | 💰 | ✓
Notify via Slack: ✓ | ✓ | 💰 | 💰
Notify via Telegram: ✓ | ✓ | 💰 | –
Notify via SMS: 💰 | 💰 | 💰 | 💰

In Closing

If you notice any factual errors, please let me know (contacts), and I will get them fixed ASAP!

There are many more things to compare. If you are shopping for a cron monitoring service, you will have to decide what is important for you, and likely do some additional research.

Happy monitoring,
– Pēteris

Notes on Self-hosted Transactional Email

For a little more than two months now, Healthchecks.io has been sending its transactional email (~300,000 emails per month) through its own SMTP server. Here are my notes on setting it up.

The Before

Before going self-hosted, Healthchecks sent email using 3rd-party SMTP relays: AWS SES and later Elastic Email.

The reason for switching from AWS to Elastic Email was GDPR compliance: at the time, the United States did not have an EU adequacy decision, but Canada (the registration country of Elastic Email Inc.) did.

The primary reason I kept looking for alternatives to Elastic Email was also GDPR compliance: a country with EU adequacy decision is good, but being based in the EU is even better. Another reason was their poor communication during service outages: some outages were not acknowledged on their status page, there were no timely updates via support chat or otherwise, and there were no post-mortems published after outages. To their credit, Elastic Email did fix the outages reasonably quickly, and I was overall happy with the service in terms of functionality and pricing.

The EU-based SMTP Relay Options

There are few EU-based SMTP relay services. None of the big names (AWS SES, Sendgrid, Mailgun, Mailchimp, Postmark) are EU-based. I tested a few options:

  • EmailLabs: OK in terms of functionality and pricing. Judging by the mix of Polish and English in the user interface and documentation, it seemed geared primarily to the Polish market.
  • SMTPeter: OK in terms of functionality and pricing. It was probably just bad timing, but it had a major outage while I was testing it. A small shop.
  • Brevo (formerly Sendinblue): the most prominent EU SMTP relay service. It has open and click tracking enabled by default, and refused to turn it off before seeing live production traffic, so it was a non-starter for me.

None of the options seemed like an upgrade over what I already had, and I kept circling back to the idea of self-hosting. The common wisdom is that self-hosting email means endless deliverability problems, but maybe-maybe?

The Self-Hosting Options

In May 2023, I spent several weeks researching and trialing self-hosted SMTP servers: mox, Postal, Haraka, Zone MTA, OpenSMTPD, and maddy. My brain was getting fried from jumping between documentation sites, trying to make sense of the feature sets, and the pros and cons of each project. One thing that helped immensely was reading Email explained from first principles – it filled many gaps in my knowledge of email delivery.

Maddy

After experimenting with and strongly considering OpenSMTPD, I ultimately picked maddy. I iterated on a test configuration until I got it to do the required things:

  • Accept email on port 465 from authenticated users.
  • Rewrite its envelope sender from “@healthchecks.io” to “@mail.healthchecks.io” (required for routing bounce messages back to the maddy server).
  • Sign it using DKIM protocol.
  • Put outgoing messages in a queue, attempt to deliver them, and retry with exponential backoff.
  • Deliver messages to remote MTAs from a single, specific IP.
  • When delivery fails, send a webhook notification to a designated webhook handler. For permanent failures, the handler can take appropriate action – unsubscribe a specific user from email reports, or mark a specific email integration as disabled.
  • Listen for incoming email on port 25.
  • When a remote MTA sends a DSN (delivery status notification, “bounce message”), deliver it to the same webhook.
  • Use an automatically provisioned LetsEncrypt certificate for TLS encryption on port 465 and port 25.

I wrote the provisioning scripts for deploying Maddy and its configuration to a server. I added and updated the required DNS entries for SPF and DKIM. I implemented, tested, and deployed the webhook handler that would receive bounce notifications from maddy.
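For illustration, here is a rough sketch of what such a handler could look like (this is not the production code, and the payload field names below are made up; the actual payload format depends on how maddy is configured):

import json

from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

def disable_email_for(recipient):
    # Hypothetical helper: unsubscribe the address or disable the integration.
    pass

@csrf_exempt
def maddy_bounce(request):
    doc = json.loads(request.body)
    recipient = doc["recipient"]             # hypothetical field name
    permanent = doc.get("permanent", False)  # hypothetical field name
    if permanent:
        disable_email_for(recipient)
    return HttpResponse("OK")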

IP Warm-Up

I spent several weeks gradually switching outgoing email traffic from Elastic Email to the self-hosted maddy server. IP warm-up serves two purposes:

  • It slowly builds up the reputation of the sending IP address. Switching the entire sending volume to a new IP address all at once risks getting blocked by the receiving servers.
  • It lets me test email delivery in the production environment and fix any potential problems with fewer negative consequences.

The Failover IP Oopsie

One issue I discovered during the IP warm-up phase was that the brand-new mail server (a Hetzner AX41 dedicated server) experienced minute-long network hiccups a couple of times per day. The cause could be a faulty NIC, a faulty switch, or a noisy neighbor, and the easiest fix is ordering another server and hoping for better luck. In anticipation of such a scenario, I had ordered a failover IP so I could keep using the already warmed-up IP with the new server.

I set up a new server, switched the failover IP to it, and after a few days of testing, no more network hiccups! So I went ahead and canceled the original machine. Then, a few days later, around 2 AM local time, my monitoring notifications went off: email delivery was broken. I had assumed that “failover IP” is more or less what other providers call “floating IP.” Dazed and confused in front of a blue screen in the middle of the night, I realized my misunderstanding and mistake: the failover IP is owned by a specific server. Canceling the server also cancels the failover IP with all its sender reputation.

To fix the immediate problem, I temporarily switched the web servers back to using Elastic Email as the SMTP relay. I asked Hetzner support if there was any way I could get the released IP back. Minutes later, I got a reply stating in perfect German calmness that my request would need to be handled by a different department during business hours. The next morning, Hetzner added the lost failover IP back to my account. Phew!

Local Relay-Only MTAs for Reliability

The internet-facing SMTP server (mail.healthchecks.io) runs on a single machine. Each app server also runs a local maddy instance which accepts outgoing email messages from local clients only, and hands them off to mail.healthchecks.io. If mail.healthchecks.io is unavailable (for example, during server restart), the local maddy instances queue the messages and retry them later.

Summary, Pros, and Cons

The self-hosted maddy server has been handling all Healthchecks.io transactional email for over two months. I am keeping an eye on bounce notifications, the outbound email queue size, and blocklists. So far, there have been no significant deliverability issues–fingers crossed!

Cons of going self-hosted:

  • A self-hosted SMTP server is another service to maintain. It uses up the limited time and mental bandwidth I have.
  • The inevitable deliverability problems will be my problems.
  • In the case of maddy, no pre-built graphical management and monitoring dashboard.

And pros:

  • Complete control of subprocessors with access to customer data (just Hetzner in my case).
  • Complete control over server configuration.
  • Fixed direct costs (as long as a single server can keep up with the sending volume).
  • I learned a bunch of new stuff!

Thanks for reading,
Pēteris

New Feature: Check Auto-Provisioning

Healthchecks recently gained a new feature: check auto-provisioning. When you send a ping request to a slug URL, and a check with the specified slug does not exist, Healthchecks can now automatically create the missing check. This feature requires opt-in: to use it, add a ?create=1 query parameter to the ping URL.

Here’s check auto-provisioning in action (the -I parameter tells curl to send HTTP HEAD requests so that we can see HTTP response status codes easily):

$ curl -I https://hc-ping.com/fixme-ping-key/does-not-exist
HTTP/2 404
[...]

$ curl -I https://hc-ping.com/fixme-ping-key/does-not-exist?create=1
HTTP/2 201
[...]

$ curl -I https://hc-ping.com/fixme-ping-key/does-not-exist?create=1
HTTP/2 200 
[...]
  • The first request returns HTTP 404 (“Not Found”) because a check with the slug does-not-exist does not, in fact, exist yet.
  • The second request has a “?create=1” added to the URL to enable auto-provisioning. The server creates a new check and returns HTTP 201 (“Created”).
  • The third request is the same as the second, but a matching check now exists. The server accepts the ping and returns HTTP 200 (“OK”).

When is this useful? Whenever you are working with a dynamic infrastructure, and want your monitoring clients to be able to register with Healthchecks.io automatically. If you distribute the Ping Key to monitoring clients, each client can pick its own slug (for example, derived from the server’s hostname), construct a ping URL (https://hc-ping.com/<ping-key>/<slug-chosen-by-client>?create=1), and Healthchecks.io will auto-create a new check on the first ping.
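For example, a client could do the auto-provisioned check-in with just the Python standard library (the Ping Key below is a placeholder):

import socket
import urllib.request

PING_KEY = "your-ping-key-here"
slug = socket.gethostname().lower()  # e.g. "web1"
url = f"https://hc-ping.com/{PING_KEY}/{slug}?create=1"
urllib.request.urlopen(url, timeout=10)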

Auto-Provisioned Checks Use Default Configuration

With the current auto-provisioning implementation, clients can create new checks on the fly, but they cannot yet specify the period, the grace time, the enabled integrations, or any other parameters. The new checks will be created with default parameters (period = 1 day, grace time = 1 hour, all integrations enabled). If you need to change any parameters, you will need to do this either manually from the web dashboard, or from a script that calls the Management API.

Auto-Provisioning and Account Limits

Each account has a specific limit of how many checks it is allowed to create: 20 checks for free accounts; 100 or 1000 checks for paid accounts. To reduce friction and the risk of silent failures, the auto-provisioning functionality is allowed to temporarily exceed the account’s check limit up to two times. Meaning, if your account is already maxed out, auto-provisioning will still be able to create new checks until you hit two times the limit. If your account goes over the limit, you will start to see warnings in the dashboard and email:

As soon as you get the number of checks in your account below the limit (either by upgrading to higher limits, or by removing unneeded checks), the warning will go away. If you do not resolve the warning for more than a month, you will start seeing an “Account marked for deletion” notice in the dashboard. After another month of inaction, the account will be deleted.

Slugs and Names Are Now Separate

In the initial slug implementation, check slugs were tied to check names. Changing a check’s name also updated its slug. With the introduction of auto-provisioning, check names and slugs are now decoupled. You can hand-pick a custom slug for each check. You can also rename a check but keep its existing slug.

The “Name and Tags” dialog has gained a new, editable “Slug” field:

Similarly, the Create a Check and Update an Existing Check API calls now support a new slug field.
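For example, here is how a custom slug could be set when creating a check via the Management API (a sketch using the Python standard library; the API key and field values are placeholders):

import json
import urllib.request

req = urllib.request.Request(
    "https://healthchecks.io/api/v3/checks/",
    data=json.dumps({"name": "Nightly backup", "slug": "nightly-backup"}).encode(),
    headers={"X-Api-Key": "your-api-key-here"},
)
print(urllib.request.urlopen(req, timeout=10).read().decode())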

Happy monitoring,
–Pēteris

Walk-through: Set Up Self-Hosted Healthchecks Instance on a VPS

In this guide, I will deploy a Healthchecks instance on a VPS. Here’s the plan:

  • Use the official Docker image and run it using Docker Compose.
  • Store data in a managed PostgreSQL database.
  • Use LetsEncrypt certificates initially, and load-balancer-managed certificates later for an HA setup.
  • Use an external SMTP relay for sending email.

Prerequisites:

  • A domain name (and access to its DNS settings).
  • A payment card (for setting up a hosting account).
  • Working SMTP credentials for sending emails.

Hosting Setup

For this exercise, I’m using UpCloud as the hosting provider. I’m choosing UpCloud because they are a European cloud hosting provider that I have not used before, and they offer managed databases.

I registered for an account, deposited €10, and launched the cheapest server they offer (1 core, 1GB RAM, €7/mo) with Ubuntu 22.04 as the OS. On the new server, I:

  • Installed OS updates (apt update && apt upgrade).
  • Disabled SSH password authentication.
  • Installed Docker by following the official instructions.
  • Created a non-root user, set up SSH authentication for it, and added it to the “docker” group.

Basic docker-compose.yml

On the server, logged in as the non-root user, I created a docker-compose.yml file with the following contents:

version: "3"

services:
  web:
    image: healthchecks/healthchecks:v2.8.1
    restart: unless-stopped
    environment:
      - DB_NAME=/tmp/hc.sqlite

I then ran docker compose up. The Healthchecks container started up, but I could not access it from the browser yet: it does not expose any ports, it has no domain name, and there is no TLS terminating proxy yet.

Add DNS records, Add caddy, Add ALLOWED_HOSTS, SITE_ROOT

I own a domain name “monkeyseemonkeydo.lv”, and for this Healthchecks instance I used the subdomain “hc.monkeyseemonkeydo.lv”. I created two new DNS records:

hc.monkeyseemonkeydo.lv A 94.237.80.66
hc.monkeyseemonkeydo.lv AAAA 2a04:3542:1000:910:80a5:5cff:fe7f:0a17

(These are of course the IPv4 and IPv6 addresses of the UpCloud server).

In docker-compose.yml I added a new “caddy” service to act as a TLS terminating reverse proxy, and I added ALLOWED_HOSTS and SITE_ROOT environment variables in the “web” service:

version: "3"

services:
  caddy:
    image: caddy:2.6.4
    restart: unless-stopped
    command: caddy reverse-proxy --from https://hc.monkeyseemonkeydo.lv:443 --to http://web:8000
    ports:
      - 80:80
      - 443:443
    volumes:
      - caddy:/data
    depends_on:
      - web

  web:
    image: healthchecks/healthchecks:v2.8.1
    restart: unless-stopped
    environment:
      - ALLOWED_HOSTS=hc.monkeyseemonkeydo.lv
      - DB_NAME=/tmp/hc.sqlite
      - SITE_ROOT=https://hc.monkeyseemonkeydo.lv
volumes:
  caddy:

Note: Caddy needs a persistent “/data” volume for storing TLS certificates, private keys, OCSP staples, and other information.

After running docker compose up again, the site loads in the browser:

Add DEBUG=False and SECRET_KEY

Next, I added DEBUG and SECRET_KEY environment variables. DEBUG=False turns off the debug mode, which should always be off on public-facing sites. SECRET_KEY is used for cryptographic signing and should be set to a unique, secret value. Do not copy the value I used!

environment:
  [...]
  - DEBUG=False
  - SECRET_KEY=b553f395-2aa1-421a-bcf5-d1c1456776d7
  [...]
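One convenient way to generate a random value for SECRET_KEY is Python’s secrets module:

import secrets

# Prints a URL-safe random string suitable as a unique secret value
print(secrets.token_urlsafe(32))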

Launch PostgreSQL Database, Add Database Credentials

I created a managed PostgreSQL database in the UpCloud account. I selected PostgreSQL 15.1, and the lowest available spec (1 node, 1 core, 2GB RAM, €30/mo). I made sure to select the same datacenter that the web server is in.

After the database server started up, I took note of the connection parameters: host, port, username, password, and database name. Since I was planning to use this database server for the Healthchecks instance and nothing else, I used the default database user (“upadmin”) and the default database (“defaultdb”). Here is the database configuration:

environment:
  [...]
  - DB=postgres
  - DB_HOST=postgres-************.db.upclouddatabases.com
  - DB_PORT=11550
  - DB_NAME=defaultdb
  - DB_USER=upadmin
  - DB_PASSWORD=AVNS_*******************
  [...]

After another docker compose up, I created a superuser account:

docker compose run web /opt/healthchecks/manage.py createsuperuser

I tested the setup by signing in as the superuser:

Configure Outgoing Email

The Healthchecks instance needs valid SMTP credentials for sending email.

For a production site, I would sign up for an SMTP relay service. Since I’m setting this instance up only for demonstration purposes, and the volume of sent emails will be very low, I used the SMTP credentials of my personal mailbox (hosted by Fastmail).

Here are the new environment variables:

environment:
  [...]
  - ADMINS=meow@monkeyseemonkeydo.lv
  - DEFAULT_FROM_EMAIL=meow@monkeyseemonkeydo.lv
  - EMAIL_HOST=smtp.fastmail.com
  - EMAIL_HOST_USER=meow@monkeyseemonkeydo.lv
  - EMAIL_HOST_PASSWORD=****************
  [...]

The ADMINS setting sets the email addresses that will receive error notifications. The DEFAULT_FROM_EMAIL setting sets the “From:” address for emails from this Healthchecks instance.

Disable New User Signups

The new Healthchecks instance currently allows any visitor to sign up for an account. This will be a private instance, so I disabled new user registration via the REGISTRATION_OPEN environment variable:

environment:
  [...]
  - REGISTRATION_OPEN=False
  [...]

Add Pinging by Email

Healthchecks supports pinging (sending heartbeat messages from clients) via HTTP and also via email. To enable pinging via email, I set the PING_EMAIL_DOMAIN and SMTPD_PORT environment variables, and exposed port 25:

environment:
  [...]
  - PING_EMAIL_DOMAIN=hc.monkeyseemonkeydo.lv
  - SMTPD_PORT=25
  [...]
ports:
  - 25:25        

After another docker compose up, I sent a test email and verified its arrival:

Add Logo and Site Name

The default logo image is located at /opt/healthchecks/static-collected/img/logo.png inside the “web” container. To use a custom logo, one can either set the SITE_LOGO_URL environment variable or mount a custom logo over the default one. I used the latter method.

I used an image from the Noto Emoji font as the logo, placed it next to docker-compose.yml on the server, and picked a site name:

environment:
  [...]
  - SITE_NAME=MeowOps
  [...]
volumes:
  - $PWD/logo.png:/opt/healthchecks/static-collected/img/logo.png

The result:

The Complete docker-compose.yml

Putting it all together, here is the complete docker-compose.yml:

version: "3"

services:
  caddy:
    image: caddy:2.6.4
    restart: unless-stopped
    command: caddy reverse-proxy --from https://hc.monkeyseemonkeydo.lv:443 --to http://web:8000
    ports:
      - 80:80
      - 443:443
    volumes:
      - caddy:/data
    depends_on:
      - web

  web:
    image: healthchecks/healthchecks:v2.8.1
    restart: unless-stopped
    environment:
      - ADMINS=meow@monkeyseemonkeydo.lv
      - DEBUG=False
      - ALLOWED_HOSTS=hc.monkeyseemonkeydo.lv
      - DB=postgres
      - DB_HOST=postgres-************.db.upclouddatabases.com
      - DB_PORT=11550
      - DB_NAME=defaultdb
      - DB_USER=upadmin
      - DB_PASSWORD=AVNS_*******************
      - DEFAULT_FROM_EMAIL=meow@monkeyseemonkeydo.lv
      - EMAIL_HOST=smtp.fastmail.com
      - EMAIL_HOST_USER=meow@monkeyseemonkeydo.lv
      - EMAIL_HOST_PASSWORD=****************
      - PING_EMAIL_DOMAIN=hc.monkeyseemonkeydo.lv
      - REGISTRATION_OPEN=False
      - SECRET_KEY=b553f395-2aa1-421a-bcf5-d1c1456776d7
      - SITE_NAME=MeowOps
      - SITE_ROOT=https://hc.monkeyseemonkeydo.lv
      - SMTPD_PORT=25
    ports:
      - 25:25
    volumes:
      - $PWD/logo.png:/opt/healthchecks/static-collected/img/logo.png

volumes:
  caddy:

HA

With the current setup, the web server and the database are both single points of failure. For a production setup, it would be desirable to have as few single points of failure as possible.

The database part is easy, as UpCloud-managed databases support HA configurations. I changed the database plan from 1 node to 2 HA nodes (2 cores, 4GB RAM, €100/mo) and that was that. I did not even need to restart the web container.

The web server part is more complicated: launch a second web server, put a managed load balancer in front of both web servers, and move TLS termination to the load balancer. I updated docker-compose.yml yet again:

version: "3"

services:
  web:
    image: healthchecks/healthchecks:v2.8.1
    restart: unless-stopped
    environment:
      - ADMINS=meow@monkeyseemonkeydo.lv
      - DEBUG=False
      - DB=postgres                         
      - DB_HOST=postgres-************.db.upclouddatabases.com
      - DB_PORT=11550
      - DB_NAME=defaultdb
      - DB_USER=upadmin                         
      - DB_PASSWORD=AVNS_*******************
      - DEFAULT_FROM_EMAIL=meow@monkeyseemonkeydo.lv
      - EMAIL_HOST=smtp.fastmail.com
      - EMAIL_HOST_USER=meow@monkeyseemonkeydo.lv                            
      - EMAIL_HOST_PASSWORD=****************
      - PING_EMAIL_DOMAIN=hc.monkeyseemonkeydo.lv
      - REGISTRATION_OPEN=False
      - SECRET_KEY=b553f395-2aa1-421a-bcf5-d1c1456776d7
      - SITE_NAME=MeowOps
      - SITE_ROOT=https://hc.monkeyseemonkeydo.lv
      - SMTPD_PORT=25
    ports:
      - 10.0.0.2:8000:8000
      - 10.0.0.2:25:25
    volumes:
      - $PWD/logo.png:/opt/healthchecks/static-collected/img/logo.png

  • I removed the “caddy” service since the load balancer will now be terminating TLS.
  • I removed the ALLOWED_HOSTS setting. This was required to get the load balancer health checks to work (UpCloud’s load balancer does not send the Host request header).
  • I exposed port 8000 of the “web” service on a private IP that the load balancer will connect through.
  • I updated the port 25 entry to bind only to the private IP.

The following steps are UpCloud-specific, not Healthchecks-specific, so I will only summarize them:

  • I launched a second web server and set it up identically to the existing one.
  • I created a managed load balancer (2 HA nodes, €30/mo).
  • I replaced the “A” and “AAAA” DNS records for hc.monkeyseemonkeydo.lv with a CNAME record that points to the load balancer’s hostname.
  • I configured the load balancer to terminate TLS traffic on port 443, add X-Forwarded-For request headers, and proxy the HTTP requests to the web servers.
  • I configured the load balancer to proxy TCP connections on port 25 to port 25 on the web servers.

Costs

For the single-node setup:

  • Web server: €7/mo.
  • Database: €30/mo.
  • Total: €37/mo.

For the HA setup:

  • Web servers: 2 × €7/mo.
  • Database: €100/mo.
  • Load balancer: €30/mo.
  • Total: €144/mo.

Monitoring, Automation, Documentation

At this point, the Healthchecks instance is up and running and the walk-through is complete. For real-world deployment, also consider the following tasks:

  • Set up uptime monitoring using your preferred uptime monitoring service.
  • Set up CPU / RAM / disk / network monitoring using your preferred infrastructure monitoring service.
  • Set up monitoring for notification delivery.
  • Move secret values out of docker-compose.yml, and store docker-compose.yml under version control.
  • Document the web server setup and update procedures.
  • Automate the setup and update tasks if and where it makes sense.

Thanks for reading, and good luck in your self-hosting adventures,
–Pēteris

Monitor Disk Space on Servers Without Installing Monitoring Agents

Let’s say you want to get an email notification when the free disk space on your server drops below some threshold level. There are many ways to go about this, but here is one that does not require you to install anything new on the system and is easy to audit (it’s a 4-line shell script).

The df Utility

df is a command-line program that reports file system disk space usage, and is usually preinstalled on Unix-like systems. Let’s run it:

$ df -h /
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv   75G   23G   51G  32% /

The “-h” argument tells df to print sizes in human-readable units. The “/” argument tells df to only output stats about the root filesystem. The “Use%” field in the output indicates the root filesystem is 32% full. If we wanted to extract just the percentage, df has a handy “--output” argument:

$ df --output=pcent /
Use%
 32%

We can use tail to drop the first line, and tr to delete the space and percent-sign characters, leaving just the numeric value:

$ df --output=pcent / | tail -n 1 | tr -d '% '
32

The Disk Space Monitoring Script

Here is a shell script that looks up the used disk space percentage on the root filesystem, compares it to a defined threshold value (75 in this example), and then runs one of two actions depending on the result:

pct=$(df --output=pcent / | tail -n 1 | tr -d '% ')
if [ $pct -gt 75 ];
then
    : # FIXME: the command to run when above the threshold
else
    : # FIXME: the command to run when below the threshold
fi

We can save this as a shell script and run it from cron at regular intervals. The script does not yet handle the alerting part, of course.

Healthchecks.io

Healthchecks.io, a cron job monitoring service, can help with the alerting part:

  • You can send monitoring signals to Healthchecks.io via HTTP requests using curl or wget.
  • Healthchecks.io handles the email delivery (as well as Slack, Telegram, Pushover, and many other options).
  • Healthchecks.io sends notifications only on state changes – when something breaks or recovers. It will not spam you with ongoing reminders unless you tell it to.
  • It will also detect when your monitoring script goes AWOL. For example, when the whole system crashes or loses the network connection.

In your Healthchecks.io account, create a new Check, give it a descriptive name, set its Period to “10 minutes”, and copy its Ping URL.

The monitoring API is super-simple. To signal success (disk usage is below threshold), send an HTTP request to the Ping URL directly:

curl https://hc-ping.com/your-uuid-here

And, to signal failure, append “/fail” at the end of the Ping URL:

curl https://hc-ping.com/your-uuid-here/fail

Let’s integrate this into our monitoring script:

url=https://hc-ping.com/your-uuid-here
pct=$(df --output=pcent / | tail -n 1 | tr -d '% ')
if [ $pct -gt 75 ]; then url=$url/fail; fi
curl -fsS -m 10 --retry 5 -o /dev/null --data-raw "Used space on /: $pct%" $url

The curl call here has a few extra arguments:

  • “-fsS” tells curl to fail on HTTP errors, run silently, but still show error messages
  • “-m 10” sets a 10-second timeout for the request
  • “--retry 5” tells curl to retry failed requests up to 5 times
  • “-o /dev/null” sends the server’s response to /dev/null
  • “--data-raw …” specifies a log message to include in the HTTP POST request body

Save this script in a convenient location, for example, in /opt/check-disk-space.sh, and make it executable. Then edit crontab (crontab -e) and add a new cron job:

*/10 * * * * /opt/check-disk-space.sh

Cron will run the script every 10 minutes. On every run, the script will check the used disk space, and signal success (disk usage below or at threshold) or failure (disk usage above threshold) to Healthchecks.io. Whenever the status value flips, Healthchecks.io will send you a notification:

You will also see a log of the monitoring script’s check-ins in the Healthchecks.io web interface:

Closing Comments

If your use case involves handling millions of small files, at least on ext4 filesystems, the filesystem can also run out of inodes. Run df -i to see how many inodes are in use and how many are available. If inode use is a potential concern, you could update the check-disk-space.sh script to track it too.

The shell script + Healthchecks.io pattern would work for monitoring other system metrics too. For example, you could have a script that checks the system’s 15-minute load average, the number of files in a specific directory, or a temperature sensor’s reading.
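For example, here is the same pattern applied to the 15-minute load average, this time in Python (the ping URL and the threshold value are placeholders to adapt):

import os
import urllib.request

url = "https://hc-ping.com/your-uuid-here"
load1, load5, load15 = os.getloadavg()
if load15 > 4.0:  # pick a threshold that makes sense for your machine
    url += "/fail"
# Sending a request body makes this a POST; the body shows up in the ping log
urllib.request.urlopen(url, data=f"load15={load15}".encode(), timeout=10)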

If you are looking to monitor more than a couple of system metrics though, look into purpose-built system monitoring tools such as netdata. The shell script + Healthchecks.io approach works best when you have a few specific metrics you care about, and you want to avoid installing full-blown monitoring agents in your system.

Thanks for reading and happy monitoring,
–Pēteris.

Making HTTP requests with Arduino and ESP8266

A Healthchecks user sent me a code snippet for sending HTTP pings from Arduino. This prompted me to do some Arduino experimenting on my own. I ordered an Arduino Nano 33 IoT board:

Arduino Nano 33 IoT

I picked this board because I wanted an easy entry into Arduino development. As first-party Arduino hardware, it should be easy to get working with the Arduino IDE. It has an on-board WiFi chip, so I would not need to hook up additional WiFi or Ethernet hardware.

The Nano 33 IoT has a micro USB port. After I connected it to my PC running Ubuntu, the Arduino’s power LED lit up, and a /dev/ttyACM0 device appeared on the computer side. Arduino IDE detected the connected board, but my initial attempt to upload a sketch failed. This turned out to be a permissions issue. After I added my OS user to the dialout group, I could upload a “Hello World” sketch to the board:

Sending a Raw HTTP Request

Arduino Nano 33 IoT has an on-board WiFi module. To use it, Arduino provides the WiFiNINA library. The library comes with example code snippets. One of the examples shows how to connect to a WiFi network and make an HTTP request. I adapted it to make an HTTPS request to hc-ping.com:

#include <WiFiNINA.h>
#include "arduino_secrets.h"

char ssid[] = SECRET_SSID;
char pass[] = SECRET_PASS; 
int status = WL_IDLE_STATUS;             
WiFiSSLClient client;

void setup() {
  pinMode(LED_BUILTIN, OUTPUT);
  Serial.begin(9600);
  while (!Serial);

  Serial.print("Connecting ...");
  WiFi.begin(ssid, pass);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }

  Serial.print("\nConnected, IP address: ");
  Serial.println(WiFi.localIP());  
}

void ping() {
  Serial.println("Pinging hc-ping.com...");
  if (client.connect("hc-ping.com", 443)) {
    Serial.println("Connected to server.");
    client.println("GET /da840100-3f58-405e-a5ee-e7e6e4303e82 HTTP/1.0");
    client.println("Host: hc-ping.com");
    client.println("Connection: close");
    client.println();
    Serial.println("Request sent.");
  }

  while (client.connected()) {
    while (client.available()) {
      char c = client.read();
      Serial.write(c);
    }
  }
  Serial.println("\nClosing connection.");
  client.stop();
}

void loop() {
  ping();

  // Blink LED for 10 seconds:
  Serial.print("Waiting 10s: ");
  for (int i=0; i<10; i++) {
    Serial.print(".");
    digitalWrite(LED_BUILTIN, HIGH);  
    delay(500);                      
    digitalWrite(LED_BUILTIN, LOW);  
    delay(500);  
  } 
  Serial.println();
}

After uploading this sketch to Arduino, here’s the output on serial console:

Connecting ...
Connected, IP address: 192.168.1.77
Pinging hc-ping.com...
Connected to server.
Request sent.
HTTP/1.1 200 OK
server: nginx
date: Thu, 30 Mar 2023 12:33:25 GMT
content-type: text/plain; charset=utf-8
content-length: 2
access-control-allow-origin: *
ping-body-limit: 100000
connection: close

OK
Closing connection.
Waiting 10s: ..........
Pinging hc-ping.com...
Connected to server.
Request sent.
HTTP/1.1 200 OK
server: nginx
date: Thu, 30 Mar 2023 12:33:41 GMT
content-type: text/plain; charset=utf-8
content-length: 2
access-control-allow-origin: *
ping-body-limit: 100000
connection: close

OK
Closing connection.
Waiting 10s: .......
[...]

Quite impressively, this works over HTTPS out of the box – the WiFiNINA library and the chip take care of performing the TLS handshake and verifying the certificates. All I had to do was specify port 443, and the rest was handled automagically.

ArduinoHttpClient

After getting the minimal example working, I found the ArduinoHttpClient library. It offers a higher-level interface for making GET and POST requests, and for parsing server responses. It works with several different network libraries, including WiFiNINA.

#include <ArduinoHttpClient.h>
#include <WiFiNINA.h>
#include "arduino_secrets.h"

char ssid[] = SECRET_SSID;
char pass[] = SECRET_PASS; 
int status = WL_IDLE_STATUS;             
WiFiSSLClient wifi;
char host[] = "hc-ping.com";
char uuid[] = UUID;
HttpClient client = HttpClient(wifi, host, 443);

void setup() {
  pinMode(LED_BUILTIN, OUTPUT);
  Serial.begin(9600);
  while (!Serial);

  Serial.print("Connecting ...");
  WiFi.begin(ssid, pass);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }

  Serial.print("\nConnected, IP address: ");
  Serial.println(WiFi.localIP());  
}

void loop() {
  client.get("/" + String(uuid));
  Serial.print("Status code: ");
  Serial.println(client.responseStatusCode());
  Serial.print("Response: ");
  Serial.println(client.responseBody());

  // Blink LED for 10 seconds:
  Serial.print("Waiting 10s: ");
  for (int i=0; i<10; i++) {
    Serial.print(".");
    digitalWrite(LED_BUILTIN, HIGH);  
    delay(500);                      
    digitalWrite(LED_BUILTIN, LOW);  
    delay(500);  
  } 
  Serial.println();
}

Output in the serial console:

Connecting ...
Connected, IP address: 192.168.1.77
Status code: 200
Response: OK
Waiting 10s: ..........
Status code: 200
Response: OK
Waiting 10s: ..........
[...]

ESP8266

After having good results with Arduino Nano 33 IoT, I wanted to try the same on an ESP8266 board I had lying around:

ESP8266 on a carrier board

This board from AliExpress has a few goodies in addition to the ESP8266 chip: a relay, and multiple power input options (220V AC, 7-12V DC, 5V DC). It has a USB port, but this port can be used for supplying power only; there is no USB-UART interface onboard. There are clearly labeled GND, 5V, RX, TX pins that I can hook a USB-UART converter (also from AliExpress) to:

ESP8266 with a USB-serial converter hooked up

The yellow jumper connects GPIO 0 to ground, which puts the ESP8266 in programming mode. At this point I can plug the USB-UART converter into the PC and check for signs of life using esptool:

$ apt-get install esptool
$ esptool chip_id
esptool.py v2.8
Found 2 serial ports
Serial port /dev/ttyUSB0
Connecting...
Detecting chip type... ESP8266
Chip is ESP8266EX
Features: WiFi
Crystal is 26MHz
MAC: a8:48:fa:ff:15:45
Enabling default SPI flash mode...
Chip ID: 0x00ff1545
Hard resetting via RTS pin...

Arduino IDE does not support ESP8266 chips out of the box, but there is the esp8266/Arduino project, which adds support for different flavors of ESP boards.

esp8266 library in Arduino IDE’s Board Manager view

The esp8266/Arduino project also comes with a WiFi library, which provides an interface to the WiFi functionality on the chip. For simple use cases, the esp8266wifi library is a drop-in replacement for the WiFiNINA library:

#include <ArduinoHttpClient.h>
#include <ESP8266WiFi.h>
#include "arduino_secrets.h"

char ssid[] = SECRET_SSID;
char pass[] = SECRET_PASS; 
WiFiClient wifi;
char host[] = "hc-ping.com";
char uuid[] = UUID;
HttpClient client = HttpClient(wifi, host, 80);

void setup() {  
  pinMode(LED_BUILTIN, OUTPUT);

  Serial.begin(115200);
  while (!Serial);  
  Serial.println();

  WiFi.begin(ssid, pass);

  Serial.print("Connecting ...");
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }

  Serial.print("\nConnected, IP address: ");
  Serial.println(WiFi.localIP());
}

void loop() {
  client.get("/" + String(uuid));
  Serial.print("Status code: ");
  Serial.println(client.responseStatusCode());
  Serial.print("Response: ");
  Serial.println(client.responseBody());

  // Blink LED for 10 seconds:
  Serial.print("Waiting 10s: ");
  for (int i=0; i<10; i++) {
    Serial.print(".");
    digitalWrite(LED_BUILTIN, HIGH);  
    delay(500);                      
    digitalWrite(LED_BUILTIN, LOW);  
    delay(500);  
  } 
  Serial.println();  
}

Although the esp8266wifi library does support TLS, the documentation also mentions significant CPU and memory requirements. To keep things simple and quick, I went with port 80 and unencrypted HTTP for this experiment.

I uploaded the sketch, removed the yellow jumper, reset the board, and got this output on the serial console:

Connecting ..........
Connected, IP address: 192.168.1.78
Status code: 200
Response: OK
Waiting 10s: ..........
Status code: 200
Response: OK
Waiting 10s: ..........
[...]

Success!

In summary, my first steps in Arduino development left me with positive impressions. The network libraries provide an easy-to-use, high-level interface for working with network hardware. They have uniform interfaces, so they can be used in sketches interchangeably, with minimal code changes. After the initial hump of getting a board recognized by Arduino IDE, and getting the first sketch to upload and run, the development went smoothly. To be fair, the “development” in my case was mostly copying and tweaking code samples. But it was still good!

Happy tinkering,
–Pēteris