Starlink: Outage Data End of February Update

Published Tuesday, March 2, 2021 by Bryan

I've continued to analyze and plot more information about Starlink outages. I've also collected three more weeks of nearly continuous data, so it's time to review how quality of service has changed.

Let's start by replotting the data from my earlier post, using my latest code, so it's easier to see changes.

Figure 1: Histogram of different outage lengths. Data from the February 6 post replotted, covering about 66.5 hours scattered across January 31 and February 1, 2, 4, and 5.

As before, the histogram in Figure 1 shows how often an outage of each length occurred. The difference between this one and the one from the earlier post is that instead of breaking up the columns by days, they're separated by cause. Last time, we only knew that this data contained over 700 outages lasting just one second. Now we can see that those were about 300 obstructions and 500 beta downtimes (my tool also counts a few more outages than the tool I used last time).

Red bars count outages blamed on obstructions, blue are beta downtime, and green are lack of satellites. At the left side, the first set of bars counts the number of times an outage lasting only one second was observed. The next set to the right counts outages lasting two seconds, then three seconds, and so on. In the middle of the graph, at the point labeled "1m", the step between the bars switches to minutes (i.e. the next bar after 1m counts outages lasting two minutes). On the right half of the graph, outages with durations between two steps are counted as the lower step (e.g. 4 minutes, 45 seconds is counted as 4 minutes).
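To make the bucketing rule concrete, here is a minimal Python sketch. It is not the viewer's actual code; the function names and the cause labels are made up for illustration. It assumes durations are whole seconds, uses one bucket per second below one minute, rounds down to whole minutes from there, and caps at the 60m mark.

```python
from collections import Counter


def bucket_label(duration_s: int) -> str:
    """Map a duration in whole seconds to a histogram bucket label."""
    if duration_s < 1:
        raise ValueError("durations are whole seconds, at least 1")
    if duration_s < 60:
        # Left half of the graph: one bucket per second.
        return f"{duration_s}s"
    # Right half: round down to whole minutes, capped at 60m.
    return f"{min(duration_s // 60, 60)}m"


def histogram(outages):
    """Count (cause, bucket) pairs from an iterable of (cause, duration_s)."""
    return Counter((cause, bucket_label(d)) for cause, d in outages)


# Hypothetical sample: cause labels here are illustrative only.
sample = [("obstructed", 1), ("beta", 1), ("beta", 1), ("obstructed", 285)]
counts = histogram(sample)
```

With this sample, the 285-second outage (4 minutes, 45 seconds) lands in the "4m" bucket, matching the example above.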

I'm going to add one more bar to the graph. The one thing I've had trouble using my Starlink connection for is video calling (Zoom, FaceTime, etc.). My connection drops for too long too often to make a long call comfortable. So, the question is, how long am I usually connected?

Figure 2: Adding connectivity lengths (yellow) to the histogram.

In Figure 2, the yellow bars count the number of times that connectivity lasted for the given duration. In the ideal world of zero outages, this would look like a single bar of height 1 at the 60m mark (because spans over 60m are recorded as 60m). This graph doesn't show the ideal case. The most common connected duration is 2 minutes, occurring around 300 times. The longest connected duration is about 17 minutes, which occurred once.

One 17-minute span of connectivity across four days doesn't sound great. A FaceTime call that I make every week lasts at least that long, and often closer to 30 minutes. So, multiple spans closer to that, and preferably longer, are what I'm looking for.

One thing that's a little hard with this analysis is making sure it's not flagging disconnections that I wouldn't notice. So, a quick thing I've built in is a setting to ignore disconnects that last less than a configurable number of seconds. As a generous guess, I've decided to tell it that interruptions of two seconds or less are tolerable.

Figure 3: Ignoring outages lasting two seconds or less when calculating duration of connectivity.

Figure 3 has that modification. The number of one- and two-minute periods of connectivity has drastically decreased. Those short spans had only been separated from each other, or from longer spans, by outages of a second or two. They have now been merged into their neighbors, so we have more connections lasting ten minutes or more. In fact, there are now five spans of connectivity lasting over 20 minutes.
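The merging step can be sketched roughly as follows. This is an illustration in Python, not the viewer's implementation. It assumes the input is one boolean per second (True meaning an outage second), and it assumes that a tolerated outage counts toward the length of the span it bridges; the real tool may count differently.

```python
from itertools import groupby


def connected_spans(outage_flags, tolerate_s=2):
    """Return lengths (in seconds) of connected spans.

    Outages of `tolerate_s` seconds or less that sit between two
    stretches of connectivity do not end the span.
    """
    # Run-length encode the per-second flags: [(is_outage, length), ...]
    runs = [(flag, sum(1 for _ in grp)) for flag, grp in groupby(outage_flags)]

    spans, current, pending = [], 0, 0
    for is_outage, length in runs:
        if not is_outage:
            # Connectivity resumed: absorb any short outage we were holding.
            current += pending + length
            pending = 0
        elif length <= tolerate_s and current > 0:
            # Short outage after connectivity: hold it until we see
            # whether connectivity resumes.
            pending = length
        else:
            # A real break (or a short outage with nothing before it).
            if current:
                spans.append(current)
            current, pending = 0, 0
    if current:
        spans.append(current)
    return spans


up, down = False, True
flags = [up] * 5 + [down] * 2 + [up] * 3 + [down] * 3 + [up] * 4
merged = connected_spans(flags, tolerate_s=2)
strict = connected_spans(flags, tolerate_s=0)
```

In this toy input, the two-second outage is bridged and the three-second outage is not, so the tolerant setting yields fewer, longer spans than the strict one.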

Something else that's hard is making sure that "outage" really means "outage". These statistics are already following Starlink's own app in only labeling a second as an outage if all pings were lost during that second (popPingDropRate = 1). Some redditors have suggested that because pings are such low priority, high throughput may cause all pings to be lost. So what looks like an outage could be exactly the opposite. To check this, I also added a configuration option to ignore an outage if the downlink or uplink speed recorded for that second is above a given value.

Figure 4: Ignoring "outages" where uplink or downlink throughput was at least 1Mbps

In Figure 4, seconds where the downlink or uplink speed was recorded as 1Mbps (1,048,576 bits per second) or higher are not treated as breaks in connection. It didn't increase the number of connections lasting longer than 20 minutes. That may be because, of the 13,187 outage seconds in this dataset, only 114 had downlink or uplink speeds of 1Mbps or more.
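As a rough sketch of that per-second test in Python (again, not the viewer's own code): popPingDropRate is the field named above, while the throughput field names downlinkThroughputBps and uplinkThroughputBps, the dictionary layout, and the function name are assumptions made for illustration.

```python
MBPS = 1_048_576  # bits per second, the 1Mbps threshold used in the post


def is_outage(sample, min_throughput_bps=MBPS):
    """Decide whether one second of data counts as an outage.

    `sample` is a dict for a single second. The throughput keys are
    assumed names, not confirmed against the dish's actual output.
    Pass min_throughput_bps=None to disable the throughput check.
    """
    # Only seconds where every ping was lost are candidates.
    if sample["popPingDropRate"] < 1:
        return False
    if min_throughput_bps is None:
        return True
    # If traffic was flowing at or above the threshold, the lost pings
    # probably don't indicate a real break in connection.
    fastest = max(sample["downlinkThroughputBps"], sample["uplinkThroughputBps"])
    return fastest < min_throughput_bps


idle = {"popPingDropRate": 1.0, "downlinkThroughputBps": 0, "uplinkThroughputBps": 0}
busy = {"popPingDropRate": 1.0, "downlinkThroughputBps": 2 * MBPS, "uplinkThroughputBps": 0}
partial = {"popPingDropRate": 0.5, "downlinkThroughputBps": 0, "uplinkThroughputBps": 0}
```

Here `idle` counts as an outage, `busy` does not (unless the throughput check is disabled), and `partial` never does because some pings got through.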

Figure 5: Display settings in use.

That was the state of the connection in the first week of February. Let's apply this same analysis to the three weeks since.

Figure 6: Data covering just before midnight February 8 through just before midnight February 15.

Figure 6: February 9-15. This is seven days, instead of four, so we should expect counts to be a little higher overall anyway. But there are many connected spans counted over 20 minutes, and finally some over 30 minutes. There are even a couple over 50 minutes long! This looks like a decent improvement.

Figure 7: Data covering just before midnight February 15 through just after midnight February 23.

Figure 7: February 16-22. This looks pretty similar. Multiple spans over 20 minutes, some over 30. This time there are even a couple over 60 minutes long. Very short outages are also up a bit for both obstructions and beta downtime.

Figure 8: Data covering just before midnight February 22 through just after midnight March 2.

Figure 8: February 23-March 1. This still looks like a pretty similar breakdown to me. Unfortunately, we lost the over-60-minute connections, but we still have some over-30-minute durations. All short outage categories are also up, though obstructions overtook beta downtime for 4-10 second outages. A snowstorm made my tree branches thicker.

While short outages seem to have increased slightly, it does seem that the system has improved according to the connected-time measurement. I was hopeful that the Feb 9-15 improvement was due to the satellites launched on Feb 4, and that the Feb 15 launch would therefore bring further improvement in the past week. There were also a couple of firmware updates I noticed on February 15 (7db91a39-…) and 20 (a95d0312-…), so maybe those shifted these metrics as well.

Subjectively, things seem about the same. Streaming and browsing work great, even if we have become a little more sensitive to the very occasional second or two that a coincidental outage delays a page from loading. Video calling still pauses often enough that we switch back to our fixed wireless connection if we expect the call to last more than a couple of minutes.

Figure 9: Timeseries view of outages and connectivity February 23 through March 1. Each 2x2-pixel rectangle represents one second.

There is still some way to go. Figure 9 is what those very few over-30-minute connections per week look like. In this "timeseries" view, each pixel represents one second. One line, from left to right, is 20 minutes. Where the line is red, blue, or green, all pings were lost during that second. Where the line is yellow, that second is part of a 30-minute or longer span of connectivity that has no interruptions longer than 2 seconds. White marks other periods of connectivity that lasted less than 30 minutes. Dark grey marks times I missed downloading data, because I had shut off the house power to rewire my workshop.
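The layout described above can be sketched in a few lines of Python. This is only an illustration of the mapping, not the viewer's rendering code; the state labels ("obstructed", "beta", "no_satellites", "connected", "missing") and the function names are hypothetical, and display scaling is left out.

```python
SECONDS_PER_ROW = 20 * 60  # one line of the image covers 20 minutes

# Hypothetical cause labels mapped to the histogram's colours.
OUTAGE_COLOURS = {"obstructed": "red", "beta": "blue", "no_satellites": "green"}


def cell_position(second_index):
    """Return (row, column) for the Nth second of the dataset."""
    return divmod(second_index, SECONDS_PER_ROW)


def cell_colour(state, span_s=0, highlight_s=30 * 60):
    """Pick a colour for one second.

    `span_s` is the length of the connected span this second belongs to
    (only meaningful when state == "connected").
    """
    if state in OUTAGE_COLOURS:
        return OUTAGE_COLOURS[state]
    if state == "missing":
        return "dark grey"
    return "yellow" if span_s >= highlight_s else "white"
```

For example, second 1,200 of the dataset starts the second line of the image, and a connected second inside a 29-minute span stays white.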

I already know that I need to move my dish to remove obstructions. Bands of more densely red streaks correlate with snowstorms moving through (e.g. February 28). Dishy melts what falls on it, but it can't melt what has fallen on the tree branches that are in the edges of Dishy's view. Once the several feet of snow on the ground around my temporary Dishy tower begins to disappear, I'll be working on a taller mount.

Figure 10: The same timeseries as Figure 9, but with all obstruction outages removed.

From this data, reducing my obstructions to zero would remove about half of my outages. When it's not actively snowing, I see at least as much beta downtime as obstructions, and usually more. Ignoring all obstruction outages in my data considerably expands the number of long, clear connected periods I can expect, but it still reveals many stretches where clear connectivity doesn't last long (Figure 10).

Starlink says beta downtime "will occur from time to time as the network matures." That doesn't sound like every couple of minutes for just a few seconds to me, so I've tried a number of things to figure out whether all of this beta downtime is mislabeled. The periodic patterns I saw in the obstruction data in my raster-scanning post aren't as visually obvious in the beta downtime data. Segments of beta downtime are sometimes (about 20% of the time in the last week) immediately preceded or followed by obstruction downtime. Reclassifying those segments as obstructions and then ignoring them does make an appreciable difference in the amount and length of clear connectivity. But is ignoring them correct? Some redditors report frequent beta downtime even with zero obstructions.
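One way to express that reclassification, as a hedged Python sketch rather than the tool's actual logic: it assumes the data has already been reduced to contiguous runs of (cause, length in seconds), and the cause labels "beta", "obstructed", and "connected" are placeholders.

```python
def reclassify_adjacent_beta(runs):
    """Relabel beta-downtime runs that touch an obstruction run.

    `runs` is a time-ordered list of (cause, length_s) tuples where each
    run starts the second after the previous one ends. Neighbours are
    checked against the original labels, so a relabelled run does not
    cause further relabelling down the line.
    """
    result = []
    for i, (cause, length) in enumerate(runs):
        if cause == "beta":
            before = runs[i - 1][0] if i > 0 else None
            after = runs[i + 1][0] if i + 1 < len(runs) else None
            if "obstructed" in (before, after):
                cause = "obstructed"
        result.append((cause, length))
    return result


runs = [
    ("connected", 100),
    ("beta", 3),        # immediately followed by an obstruction
    ("obstructed", 2),
    ("connected", 50),
    ("beta", 4),        # isolated, stays beta downtime
    ("connected", 10),
]
relabelled = reclassify_adjacent_beta(runs)
```

Only the first beta run changes label here. Whether that relabelling is the right call is exactly the open question raised above.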

For now, I'll continue to enjoy mostly-fast, mostly-up, decently-priced service, and watch the effects of the next satellite launch and the spring thaw.

If you'd like to play with this data and the viewer yourself, I've published it as the 2.0 release on the GitHub repo.

Categories: Starlink