Beer IoT (Part 2)

Published Saturday, January 7, 2017 by Bryan

Welcome back for part two. In part one, I explained how I exported my historical brewing data from The BeerBug's website. In this part, I'm going to demonstrate what I've learned about one alternative, the Helium platform.

Helium doesn't sell a homebrew device, but rather a generic sensor platform. I ordered a dev kit while they were on sale, and although my hardware hasn't arrived yet, I already have access to their data aggregation platform.

Disclaimer: I know several of the Helium developers, but I am not being compensated in any way to review their system.

Helium supports creating "virtual sensors" and uploading whatever data you like for them, as a way to test and experiment. What better data to play with than something I'm already familiar with? I'll upload the BeerBug data I exported.

When a Helium sensor posts a reading, it specifies a "port" for that reading. The port is primarily a label describing what the reading is, but the examples given and the names reserved suggest that ports are meant to identify the "type" of the reading. For example, port "t" is reserved for temperature in Celsius, and port "b" is battery level in millivolts. I have data for each of those, as well as a port I'm going to call "sg" for specific gravity.

Logging a reading is done by HTTP-POSTing some JSON data. The basic form looks like this:

{
 "data": {
   "attributes": {
     "port": "sg", // the name of the port
     "value": 1.0568, // the value for the reading
     "timestamp": "2016-01-23T18:35:03Z" // ISO8601 time in UTC
   },
   "type": "data-point"
 }
}
  

My data is all floating point numbers, so nothing too complex to worry about ... except it's all in the wrong format. To start with, my data looks like this:

{
 "dates": [ // comma-separated, zero-based month index, in local time
   "2016,0,23,18,35,3",
   // ... the rest of the dates ...
 ],
 "temp": [ // fahrenheit degrees
   70.26
   // ... the rest of the temperatures ...
 ],
 "sg": [ // specific gravity
   1.0568
   // ... the rest of the specific gravities ...
 ]
}
  

After many iterations, this is my jq script for conversion:

[.dates, .sg, .temp, .batt] | transpose | .[] |
  # there is probably a better way to convert from 0-based month to ISO8601
  # strptime bails on 0-based month, but produces a 0-based month structure?
  (.[0] | split(",") |
   [.[0],(.[1] | tonumber | .+1 | tostring),.[2],.[3],.[4],.[5]] |
   join(",") | strptime("%Y,%m,%d,%k,%M,%S") | todate) as $date |
  # specific gravity
  {"data":{"attributes":{"port":"sg","value":.[1],"timestamp":$date},
           "type":"data-point"}},
  # temperature - assumed Fahrenheit (Helium expects Celsius)
  {"data":{"attributes":{"port":"t","value":((.[2] - 32) * 5 / 9),"timestamp":$date},
           "type":"data-point"}},
  # battery level - assumed volts (Helium expects millivolts)
  {"data":{"attributes":{"port":"b","value":(.[3] * 1000),"timestamp":$date},
           "type":"data-point"}}
  

It still has one major bug: I'm treating local time as if it were UTC. Just figuring out how to deal with the zero-based month was enough hassle (strptime produces an array that uses a zero-based month, but it can't consume a string with one). It seems like adding mktime | . + 28800 | gmtime (or 25200 - i.e. an eight- or seven-hour offset, depending on daylight saving) would be close enough ... but I should have exported in UTC to start with.
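
For what it's worth, that fix would slot into the date expression like this - just a sketch, assuming the entire export sits at a fixed eight-hour offset (use 25200 for seven) and ignoring any daylight-saving change mid-batch:

(.[0] | split(",") |
 [.[0],(.[1] | tonumber | .+1 | tostring),.[2],.[3],.[4],.[5]] |
 join(",") | strptime("%Y,%m,%d,%k,%M,%S") |
 # treat the parsed local wall-clock time as seconds, shift to UTC, reformat
 mktime | . + 28800 | gmtime | todate) as $date |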

But anyway, let's run this through jq:

$ jq -cf beerbug-to-helium.jq export-oatmeal-stout-jan-2016.json > helium-oatmeal-stout-jan-2016.json
$ head -3 helium-oatmeal-stout-jan-2016.json
{"data":{"attributes":{"port":"sg","value":1.0568,"timestamp":"2016-01-23T18:35:03Z"},"type":"data-point"}}
{"data":{"attributes":{"port":"t","value":21.255555555555556,"timestamp":"2016-01-23T18:35:03Z"},"type":"data-point"}}
{"data":{"attributes":{"port":"b","value":4146.7,"timestamp":"2016-01-23T18:35:03Z"},"type":"data-point"}}
  

Now I have one data-point per line, which will make uploading easy. But before uploading, I need to actually create my virtual sensor. This can be done via Helium's HTTP API, but their example is missing the POST body (though I assume it's the same as the update's body, without the "id" field). It's also very simple with the Helium Commander utility installed (yes, I've censored the UUID):

$ helium sensor create --name beerbug-536
$ helium --uuid sensor list
+--------------------------------------+-----+------+-----------------------------+----------------------------+-------------+
| ID                                   | MAC | TYPE | CREATED                     | SEEN                       | NAME        |
+--------------------------------------+-----+------+-----------------------------+----------------------------+-------------+
| ABIGUUID-USED-TOBE-HERE-BUTISGONENOW |     |      | 2016-12-18T06:11:54.182691Z | 2016-12-19T04:49:57.00331Z | beerbug-536 |
+--------------------------------------+-----+------+-----------------------------+----------------------------+-------------+
$ export HELIUM_BEERBUG=ABIGUUID-USED-TOBE-HERE-BUTISGONENOW
  

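For reference, here's my guess at the equivalent raw HTTP call, using the assumed POST body from above - both the exact URL and the body are unverified guesses, so the Commander route is the safer bet:

$ curl -H "Content-Type: application/json" \
  -H "Authorization: $HELIUM_API_KEY" -XPOST \
  "https://api.helium.com/v1/sensor" \
  -d '{"data":{"attributes":{"name":"beerbug-536"},"type":"sensor"}}'
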
Now I can finally upload some data! I'm just going to pipe the file I have through xargs and let things chug along. The sed at the front escapes the double quotes in the JSON file, so that xargs doesn't strip them:

$ sed 's/"/\\"/g' helium-oatmeal-stout-jan-2016.json |\
  xargs -n 1 curl -H "Content-Type: application/json" \
  -H "Authorization: $HELIUM_API_KEY" -XPOST \
  "https://api.helium.com/v1/sensor/$HELIUM_BEERBUG/timeseries" -d
  

That ... was slow. About 12,000 data-points in an hour, or three per second, as some insist all speeds be measured. I have around 65,000 data points, so the full upload would take five hours or more. That's my fault, though - starting a fresh curl for every single data point is expensive. Let's split up the work and run three curls in parallel:

$ tail +12001 helium-oatmeal-stout-jan-2016.json |\
  grep "\"b\"" > helium-oatmeal-stout-jan-2016.json-b
$ tail +12001 helium-oatmeal-stout-jan-2016.json |\
  grep "\"sg\"" > helium-oatmeal-stout-jan-2016.json-sg
$ tail +12001 helium-oatmeal-stout-jan-2016.json |\
  grep "\"t\"" > helium-oatmeal-stout-jan-2016.json-t
$ sed 's/"/\\"/g' helium-oatmeal-stout-jan-2016.json-b |\
  xargs -n 1 curl -H "Content-Type: application/json" \
  -H "Authorization: $HELIUM_API_KEY" -XPOST \
  "https://api.helium.com/v1/sensor/$HELIUM_BEERBUG/timeseries" -d &
$ sed 's/"/\\"/g' helium-oatmeal-stout-jan-2016.json-sg |\
  xargs -n 1 curl -H "Content-Type: application/json" \
  -H "Authorization: $HELIUM_API_KEY" -XPOST \
  "https://api.helium.com/v1/sensor/$HELIUM_BEERBUG/timeseries" -d &
$ sed 's/"/\\"/g' helium-oatmeal-stout-jan-2016.json-t |\
  xargs -n 1 curl -H "Content-Type: application/json" \
  -H "Authorization: $HELIUM_API_KEY" -XPOST \
  "https://api.helium.com/v1/sensor/$HELIUM_BEERBUG/timeseries" -d
  

That was better, at 8-ish points per second. I don't expect much better out of my non-business DSL line - it's saturated enough that MARIO RUN is delaying the start of the games I'm playing while I wait. If I were planning to bulk-load other data, I'd write something that kept the HTTP connection open and pipelined POSTs.
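
Short of writing that, a simpler way to get the same fan-out without splitting the file by hand is xargs's -P flag (supported by both GNU and BSD xargs). It still starts a fresh curl per data point, so it only addresses the parallelism, not the connection reuse:

$ sed 's/"/\\"/g' helium-oatmeal-stout-jan-2016.json |\
  xargs -n 1 -P 3 curl -H "Content-Type: application/json" \
  -H "Authorization: $HELIUM_API_KEY" -XPOST \
  "https://api.helium.com/v1/sensor/$HELIUM_BEERBUG/timeseries" -d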

The real question I've been waiting to answer is: now that the data is in Helium's system, what can I do with it? The bummer is that I can't use their web dashboard - it only goes back 90 days, and this data is from nearly a year ago. Maybe I'll adjust the dates in another experiment. I think the only way to change data later might be to create a new sensor (i.e. you don't get to edit it in place - you have to rewrite it), so it's best to think ahead about where you scribble.

But, I can do basic retrieval, with filter[start]= and filter[end]=:

$ curl -H "Authorization: $HELIUM_API_KEY" -XGET \
  "https://api.helium.com/v1/sensor/$HELIUM_BEERBUG/timeseries?filter%5Bstart%5D=2016-02-01T12:00:00Z&filter%5Bend%5D=2016-02-01T12:05:00Z" |\
  jq .
{
 "data": [
   {
    "attributes": {
      "value": 4162.5,
      "timestamp": "2016-02-01T12:04:01Z",
      "port": "b"
    },
    "relationships": {
      "sensor": {
        "data": {
          "id": "8dce390e-082a-47fc-85cf-43adafd30edd",
          "type": "sensor"
        }
      }
    },
    "id": "89b47b2f-500d-4af3-9d01-49766b5938b0",
    "meta": {
      "created": "2016-12-23T06:05:50.757111Z"
    },
    "type": "data-point"
   },
   {
    "attributes": {
      "value": 1.0131,
      "timestamp": "2016-02-01T12:04:01Z",
      "port": "sg"
    },
    "relationships": {
      "sensor": {
        "data": {
          "id": "8dce390e-082a-47fc-85cf-43adafd30edd",
          "type": "sensor"
        }
      }
    },
    "id": "645ca2f8-96aa-4cd9-915d-3670ec1b43af",
    "meta": {
      "created": "2016-12-23T06:06:21.478522Z"
    },
    "type": "data-point"
   },
   {
    "attributes": {
      "value": 18.672222222222224,
      "timestamp": "2016-02-01T12:04:01Z",
      "port": "t"
    },
    "relationships": {
      "sensor": {
        "data": {
        "id": "8dce390e-082a-47fc-85cf-43adafd30edd",
        "type": "sensor"
      }
    }
   },
   "id": "44afd122-b13d-4675-b35a-e48184f32c9a",
   "meta": {
     "created": "2016-12-23T06:06:38.950493Z"
   },
   "type": "data-point"
  },
...
  

I've elided the data points at 12:03:01, 12:02:01, and 12:01:01 for brevity. This is a bit verbose and seems to contain a lot of duplicate information, but it makes more sense when you learn that you can query the same data by organization, element, or label, each of which maps to a group of sensors.

It's also possible to request basic aggregate statistics for this data, by adding agg[type]= and agg[size]=. The types currently available are min, max, and avg, and window sizes start at one minute and go up to one day.

$ curl -H "Authorization: $HELIUM_API_KEY" -XGET \
  "https://api.helium.com/v1/sensor/$HELIUM_BEERBUG/timeseries?filter%5Bstart%5D=2016-02-01T12:00:00Z&filter%5Bend%5D=2016-02-01T12:30:00Z&agg%5Btype%5D=avg&agg%5Bsize%5D=10m" |\
  jq .
{
 "data": [
   {
    "attributes": {
      "value": {
        "max": 18.7,
        "avg": 18.6819444444444,
        "min": 18.6555555555556
      },
      "timestamp": "2016-02-01T12:20:00Z",
      "port": "agg(t)"
    },
    "relationships": {
      "sensor": {
        "data": {
          "id": "8dce390e-082a-47fc-85cf-43adafd30edd",
          "type": "sensor"
        }
      }
    },
    "id": "ff308e69-a2c5-43a8-9215-dd4042b51104",
    "meta": {
      "created": "2016-12-23T06:06:46.98618Z"
    },
    "type": "data-point"
   },
   {
    "attributes": {
      "value": {
        "max": 1.0133,
        "avg": 1.01325,
        "min": 1.0132
      },
      "timestamp": "2016-02-01T12:20:00Z",
      "port": "agg(sg)"
    },
    "relationships": {
      "sensor": {
        "data": {
          "id": "8dce390e-082a-47fc-85cf-43adafd30edd",
          "type": "sensor"
        }
      }
    },
    "id": "9d09823b-5302-4fd8-94f4-9c1e2ef62b99",
    "meta": {
      "created": "2016-12-23T06:06:29.719129Z"
    },
    "type": "data-point"
   },
   {
    "attributes": {
      "value": {
        "max": 4168,
        "avg": 4161.15,
        "min": 4152.5
      },
      "timestamp": "2016-02-01T12:20:00Z",
      "port": "agg(b)"
    },
    "relationships": {
      "sensor": {
        "data": {
          "id": "8dce390e-082a-47fc-85cf-43adafd30edd",
          "type": "sensor"
        }
      }
    },
    "id": "5cd24bb5-30ea-4278-bbb0-082c8f25a5fe",
    "meta": {
      "created": "2016-12-23T06:06:01.779172Z"
    },
    "type": "data-point"
   },
...
  

Again, I've elided the results for 12:10 and 12:00 for brevity. This seems like it could be very convenient for supporting something like a dashboard. Some things I haven't shown are the ability to choose a limited number of ports, and how large result sets are paginated, but those are also quite simple. It seems like the requests to support basic display of min/max/avg data on a zoomable/scrollable timeline would be very straightforward. And, that's what Helium's dashboard appears to give you, if your data is recent.

But I need some way to visualize historical data as well. Read part three to find out what I came up with.

Categories: Development