Model Projections for Precipitation in the Tuolumne Watershed

Overview and Methodology

The city of San Francisco depends on water from the Hetch Hetchy Reservoir, which is filled by precipitation in the Tuolumne River watershed. I was interested in seeing what models of future climate predict about the effect of climate change on San Francisco’s water supply. Hetch Hetchy supplies 80% of the water for 2.6 million people[1], so the effects of climate change could be quite significant.

In this post, I show how to use tools that we developed at Planet OS along with the data analysis tools in R to explore what climate models have to say about the future of the watershed.

Simple Spatio-Temporal Windowing With Kafka Streams


The Big Idea

When I read about the new Kafka Streams component being developed by the Apache Kafka team, I was quite intrigued. Kafka Streams is a lightweight streaming layer built directly into Kafka. In line with the Kafka philosophy, it “turns the database inside out,” which allows streaming applications to achieve scaling and robustness guarantees similar to those provided by Kafka itself, without deploying a separate orchestration and execution layer.

Kafka gives us data (and compute) distribution and performance based on a distributed log model. Kafka Streams exposes a compute model that is based on keys and temporal windows. It works on both event streams (KStream) and update streams (KTable).

I want to work with spatial data instead of pure <key, value> data. In this post, I’ll show the simplest version of that: aggregating data into hexbins based on location and time.
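The core of the hexbin idea is to turn each (location, time) pair into a composite key: a hexagon cell plus the start of a time window. The sketch below shows only that key computation, in plain Python rather than through Kafka Streams. It assumes planar coordinates (for example, longitude/latitude already projected), pointy-top hexagons of a chosen `size`, and simple tumbling windows; the function names are hypothetical and not part of any Kafka API.

```python
import math
from collections import Counter


def cube_round(q, r):
    """Round fractional axial (q, r) coordinates to the nearest hexagon."""
    # Axial -> cube coordinates, which satisfy x + y + z == 0.
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Recompute the component with the largest rounding error so the
    # rounded coordinates still sum to zero.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz  # back to axial (q, r)


def hex_cell(x, y, size):
    """Map a planar point to the axial (q, r) cell of a pointy-top hex grid."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return cube_round(q, r)


def window_start(ts_ms, window_ms):
    """Start of the tumbling window that contains timestamp ts_ms."""
    return ts_ms - ts_ms % window_ms


def hexbin_counts(events, size, window_ms):
    """Count (x, y, ts_ms) events per (hex cell, window start) key."""
    counts = Counter()
    for x, y, ts_ms in events:
        counts[(hex_cell(x, y, size), window_start(ts_ms, window_ms))] += 1
    return counts


if __name__ == "__main__":
    events = [(0.0, 0.0, 1_000), (0.1, 0.1, 2_000), (0.0, 0.0, 61_000)]
    print(hexbin_counts(events, size=1.0, window_ms=60_000))
```

Once each record carries such a key, a keyed, windowed aggregation (the kind Kafka Streams provides) can do the counting; the in-memory `Counter` here only stands in for that step.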

Making Functional Programming Practical: Okasaki for Dummies

For the past few months, I’ve been working with Planet OS creating data access and search tools for global environmental sensing.

As part of this work, I had the pleasure of talking to a group at Intertrust about functional programming. Part of the fun was the mixed audience of programmers and non-programmers. I wanted to help both groups understand functional programming and why it’s a powerful alternative to traditional programming models.

Planet OS has posted the video of the talk, and I recommend it for folks interested in understanding what functional programming is and how it works.

This talk is very high-level and filled with pictures. It should be a good introduction to this important concept regardless of your programming background.

Some more notes and the slides are below.

My Talk at Clojure/West on Creating Beautiful Spreadsheets From Pure Data

My work on excel-templates generated enough interest that I was invited to give a talk on it at Clojure/West a couple of weeks ago.

In the talk, I show some features we have added since my previous blog post, including expanding formulas in the template, creating multiple worksheets from a single worksheet, and support for charts.

You can see the talk here:

I hope you find it informative and maybe a little entertaining.

Create an Ad Hoc Spark Cluster in 15 Minutes

History and Motivation

Lately, I’ve found that my tool of choice for large-scale analytics is Apache Spark. I won’t go into all the reasons why Spark is a fantastic tool here; you can find plenty of that on the web. What I do want to focus on is how easy it is to grab a significant chunk of data, clean it, and quickly run some analyses to learn about it.

Because I don’t have a DevOps team standing by to build me a big cluster on a whim, I thought it would be nice to be able to build clusters when I want them, use them, and tear them down again without fuss or long delays.

In this post, I’ll show you an easy way to do this.

Generating Beautiful Excel Reports With Templates


Overview

Despite the rise of web-based report interfaces and dashboards, Excel remains the tool of choice in many settings. However, the world of Excel is very separate from the world of most developers. As a result, most programmatically generated Excel reports either present their data poorly or are very brittle.

In this post, I’ll present a solution to this problem in Clojure, but first I want to review why we should embrace Excel rather than run from it, and to discuss the current state of the art in Excel reporting from live systems.

Simple Webhooks With Clojure and Ring

When your data moves into cloud applications and collaboration is the rule, webhooks provide a way to extend what your cloud provider gives you.

Webhooks have been touted as the basis for what Anil Dash calls the “Pushbutton Web,” which enables large-scale real-time collaboration between applications and humans. The concept behind webhooks is simple (as explained in detail by Jeff Lindsay in this video) and boils down to this: it’s easy to build a lightweight HTTP server nowadays, so why not let cloud-based applications interact through simple RESTful HTTP requests that carry a payload of information of interest?

In this article, I will show a simple webhook processor written in Clojure using Mark McGranaghan’s Ring, a lightweight HTTP server framework based on the ideas in Rack. This webhook was written as part of the system for maintaining the clojure-contrib documentation website.
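The flow described above (a lightweight HTTP server that accepts a POST carrying a payload and acts on it) can be sketched with Python’s standard library. This is not the Ring-based processor from the post. It assumes the provider sends a JSON body, which not every service does, and `process_payload`, `start_server`, and `post_json` are hypothetical names chosen for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # stands in for real processing (hypothetical)


def process_payload(payload):
    """Hypothetical hook: e.g. trigger a documentation rebuild."""
    received.append(payload)


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client says it sent.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body)
        except ValueError:  # covers malformed JSON and bad encodings
            self.send_response(400)
            self.end_headers()
            return
        process_payload(payload)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_server(port=0):
    """Serve webhooks on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_json(url, payload):
    """Simulate the cloud service calling the webhook with a JSON payload."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read()
```

In Ring, the equivalent handler would be a plain function from a request map to a response map, with an adapter such as Jetty doing the HTTP serving.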