Another release, another massive wave of excitement. I'm thrilled to unveil version 0.7.0 of Onyx, featuring a stunning advancement in performance. In the 7 weeks since we shipped Onyx 0.6.0, we've added new features that help Onyx pack a serious punch in production. In this post, we're going to talk about Onyx's new monitoring API, Java integration, automatic backpressure, and lightning-fast new transport layer.

If this is your first time seeing Onyx, we can sum it up as a distributed, masterless, high-performance, fault-tolerant data processing platform written in Clojure. Onyx aggressively uses data structures to make computations serializable, crossing languages, networks, and time. Its API exposes a hybrid batch and streaming processing model. Onyx is a good fit for realtime stream processing, batch analytics, and ETL jobs. We're on Twitter @OnyxPlatform, and provide commercial consulting & support.

Upgrading from Onyx 0.6.0? We broke backwards compatibility a little bit (we're sorry!). See the changelog for the small number of things you'll need to tweak.

Aeron, Onyx's Secret Weapon

Onyx provides a pluggable messaging transport layer. Until now, we've provided implementations for Netty and, to facilitate speedy local development, core.async. Today, we're officially providing the option to use Aeron. Aeron is an efficient, reliable unicast and multicast messaging transport library. We are able to achieve remarkable performance gains by switching to Aeron, thanks both to its lock-free design and its simple-to-use API. Given that Aeron is by far the fastest transport layer that we offer, we strongly recommend that you use it instead of the Netty transport - though we will continue to support both. Keep in mind that Aeron requires Java 8.

You can switch to Aeron right now by upgrading your Onyx project to version 0.7.0 and setting the messaging implementation in your peer configuration to :aeron instead of :netty. That's it!
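For reference, here's a minimal sketch of what that change looks like in a peer configuration map; every key besides :onyx.messaging/impl is an illustrative placeholder:

```clojure
;; Minimal sketch: switch the messaging implementation to Aeron.
;; Values other than :onyx.messaging/impl are illustrative placeholders.
(def peer-config
  {:zookeeper/address "127.0.0.1:2181"
   :onyx/id "my-cluster-id"
   :onyx.messaging/impl :aeron        ; previously :netty
   :onyx.messaging/bind-addr "localhost"})
```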

The Benchmark

Just how fast have we gotten by adding Aeron? Look no further than our repeatable, automated AWS cloud benchmark suite. To summarize our findings: Onyx throughput has doubled since the last release, and latency at top speeds has improved by more than 5x. You can see the full specification of the benchmark in this Gist.

In short, we blasted a 5-node Onyx cluster of 8-core machines with 100-byte messages through a streaming job of 6 tasks. We found that each machine nominally handles 300,000 segments per second, or 1.5 million segments per second in aggregate. In Onyx 0.6.0, this volume was only reachable using five 16-core machines, exactly twice the compute of this cluster. We can see the rate in the throughput graph below:

[Figure: throughput graph]

How is latency looking? This is where things get really fantastic. We measure latency in terms of quantiles, where latency is the time in milliseconds for a segment to complete the full 6-task workflow. That means you can divide the latency by 6 to determine roughly how long each task took. At the 50% quantile, we see many segments completing within 3-6 milliseconds. As the quantiles rise, the tail latencies spike no higher than 1,500 milliseconds. The flat yellow line indicates a machine that is not executing any input tasks and has nothing to report.

The latency figures shown above may actually understate the real-world improvement. Additional testing has indicated that latency under load, or with an untuned cluster, degrades far more gracefully under the Aeron implementation than under Netty. This brings us one step closer to a cluster that needs less tuning to achieve excellent performance.

In conclusion, this is a massive step forward for Onyx. We're truly flying at incredible speeds.

Automatic Backpressure

Given that we're moving at very fast speeds now, backpressure becomes a supremely important topic. A new feature that we're excited to roll out is automatic backpressure. Previously, Onyx provided backpressure only through a configuration knob: an input task would ingest messages up to a high water mark and block if too few of them had completed. Under certain conditions, peers could still drop messages as their buffers filled up, causing retries from the input source without any guarantee that forward progress was being made throughout the cluster. This is perhaps the most difficult part of tuning an Apache Storm cluster.

Onyx's unique, masterless design gives us the primitive constructs we need to do cluster-wide broadcast. We've leveraged broadcast to allow peers to pass messages to each other when they are receiving more segments than they can process. When backpressure messages are received, peers ingesting data automatically clamp down, allowing the processing peers to catch up. After a peer is able to work through its backlog of segments, it broadcasts a second message. On reception, the reading of segments by the input tasks resumes. The best part of this feature is that there's nothing to enable or configure. Simply upgrading to Onyx 0.7.0 will do the trick.
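To make the mechanism concrete, here is an illustrative sketch (not Onyx's actual internals) of the two-message protocol, using hypothetical high and low water marks:

```clojure
;; Illustrative sketch only: when a peer should broadcast the two
;; backpressure messages described above. Thresholds are hypothetical.
(def high-water-mark 0.9)
(def low-water-mark 0.5)

(defn backpressure-signal
  "Given a peer's buffer fill ratio (0.0-1.0) and whether it is already
  signalling backpressure, return the action to take."
  [fill-ratio signalling?]
  (cond
    (and (not signalling?) (> fill-ratio high-water-mark))
    :broadcast-backpressure-on   ; input peers clamp down

    (and signalling? (< fill-ratio low-water-mark))
    :broadcast-backpressure-off  ; input peers resume reading

    :else
    :no-op))
```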

This is a hugely important feature for tuning very fast clusters, allowing them to degrade gracefully under high load. Onyx is progressively becoming more self-tuning to its environment and workload. You'll see more of this type of feature development in the near future.

Java Integration

Another big step forward for Onyx is the beginning of Java integration. All catalog entries now support a new key, :onyx/language. The default is :clojure; :java is also permitted. There's still some cleanup to be done here: Java implementations need to communicate in terms of Clojure's data structures. You can see an example input plugin to get an idea of what we mean. Onyx 0.7.1 will feature a shim layer that insulates Java users from needing Clojure on the classpath. We'd love some help here if anyone is interested in submitting a patch. Replacing multimethods with protocols was the big win that enabled Java interaction in this release.
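As a rough sketch, a catalog entry tagged for Java might look like the following; the task name and Java class here are hypothetical:

```clojure
;; Hypothetical catalog entry using the new :onyx/language key.
;; :process-segment and com.example.MySegmentFn are placeholders.
{:onyx/name :process-segment
 :onyx/fn :com.example.MySegmentFn
 :onyx/type :function
 :onyx/language :java
 :onyx/batch-size 100}
```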

Monitoring

Putting a distributed system into production is a challenging task, no matter which product you're using. We realize that it's essential for your team to have a bird's-eye view into all the latency-sensitive pieces of Onyx. Starting in 0.7.0, Onyx offers a first-class monitoring API and emits a vast amount of metrics about its I/O activity. You can graph the latency of calls to ZooKeeper, the number of bytes transferred between sockets, and more. We have an entire chapter about monitoring in the User Guide. Plug it into CloudWatch or Graphite to get insight into how Onyx is behaving in production.
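As a hedged sketch of wiring in a callback (the :zookeeper-write-log-entry event key and the shape of the metric map are assumptions; the User Guide chapter has the real names):

```clojure
;; Illustrative monitoring hook; the event key and metric shape are
;; assumptions. See the monitoring chapter of the User Guide.
(defn forward-metric [config metric]
  ;; Ship the metric to CloudWatch, Graphite, etc.
  (println (:event metric) (:latency metric)))

(def monitoring-config
  {:monitoring :custom
   :zookeeper-write-log-entry forward-metric})
```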

Riemann Integration

Finally, we'd like to give a big thank you to Gardner Vickers for contributing to onyx-metrics. Onyx can now emit throughput and latency metrics directly into Riemann. Instrumenting your distributed workflows has gotten that much easier.
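As a rough illustration, instrumenting a task might look like adding a lifecycle entry along these lines; the calls var and option keys are assumptions, so consult the onyx-metrics README for the exact names:

```clojure
;; Hypothetical lifecycle entry; keyword names are illustrative and not
;; confirmed against onyx-metrics.
{:lifecycle/task :process-segment
 :lifecycle/calls :onyx.lifecycle.metrics.riemann/calls
 :riemann/address "localhost"
 :riemann/port 5555
 :lifecycle/doc "Emits throughput and latency metrics to Riemann"}
```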

Road Forward

We've come a long way, but there's still plenty to do. The journey of developing Onyx continues towards release 0.8.0. Until then, you can expect smaller releases with a sharper Java interface, more predictable execution models, and increased speed at the edges of Onyx in the official plugins. Thank you so much to everyone who has helped us get to this point. The fireworks will continue!