Autobahn messaging performance on the Raspberry Pi

Tobias Oberstein

Autobahn can act as a real-time message broker for WAMP Publish & Subscribe and achieves up to 6,000 events/second dispatch rate on the Raspberry Pi when running under PyPy.

Summary

Note: Autobahn|Python no longer contains a WAMP router, and the measurements here were taken on a Raspberry Pi 1. They still give an idea of what is possible on such a small machine. With a Raspberry Pi 2 and Crossbar.io, performance should be significantly better.

Autobahn is an open-source real-time framework for Web, Mobile & Internet of Things that includes, among other features, a message broker for Publish & Subscribe based (soft) real-time communication.
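The Publish & Subscribe API is quite compact. As a minimal sketch, here is what a WAMP pub/sub client looks like with the current Autobahn|Python ApplicationSession API (the router-enabled Autobahn version benchmarked here still used the older WAMPv1 API, and the router URL, realm and topic below are illustrative assumptions):

from autobahn.twisted.wamp import ApplicationSession, ApplicationRunner

class SensorSession(ApplicationSession):

    def onJoin(self, details):
        # Called once the WAMP session is established.

        def on_event(value):
            # Invoked by the broker for every event published to the topic.
            print("got event: {}".format(value))

        # Subscribe to a topic (the URI is an illustrative assumption).
        self.subscribe(on_event, u"com.example.sensor")

        # Publish an event - the broker dispatches it to all subscribers.
        self.publish(u"com.example.sensor", 23)

if __name__ == '__main__':
    # URL and realm are assumptions matching the test setup in this post.
    runner = ApplicationRunner(u"ws://192.168.1.133:9000/ws", u"realm1")
    runner.run(SensorSession)

The broker side of this conversation is exactly what gets load-tested below: every event published to a topic is dispatched to all its subscribers.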

Now, the Raspberry Pi is an awesome platform for all kinds of projects, but obviously not a high-end server.

Nevertheless, I was interested in running benchmarks to come up with some hard numbers on what is possible in terms of messaging performance when running Autobahn on the Pi (see here for how to get started).

The actual results vary with the concrete load profile. As an example, take result 24 in the appendix. Here Autobahn is doing Publish & Subscribe over WebSocket at

1,000 PubSub events/sec with 32+ bytes payload to 1,000 subscribers with average latency of 25 ms at a CPU load of 65%.

As it turns out, these are the maximums I observed in any of the tests:

  • up to 6,000 events/sec dispatched (400 ms avg. latency, see result 10)
  • up to 6,000 clients served (850 ms avg. latency, see result 25)
  • up to 28 Mbit/sec bandwidth served (200 ms avg. latency, see result 7)
  • up to 200 Hz publish rate (0.7 ms avg. latency, see result 22)

I think this is pretty neat for a tiny computer like the Pi.

Within the bounds above, you can do a lot, and many applications won't be practically restricted by those limits.

Discussion

Limits

Though I don't have hard evidence, I suspect that network performance and scalability on the Pi are ultimately limited not only by the CPU, but also by the fact that the Pi's Ethernet is attached to the system via USB. All this Ethernet-USB-TCP bridging code is resource intensive and constraining.

Perspective

To put those numbers into perspective, let's compare them to independently derived numbers for a completely different software stack. MigratoryData (which claims to have the world's most scalable WebSocket server) recently published a detailed benchmark here for their core server product (like Autobahn, it is WebSocket based, but running on a Java stack).

One number you can see in that benchmarking report is that MigratoryData is able to achieve a throughput of 200k messages/sec (with an average latency of 424 ms and a worst-case latency of 2,024 ms) at a CPU load of 59%.

The test system used in that benchmark has 12 Intel Xeon cores at 2.66 GHz (and an Intel 10GbE NIC), while our cute little Pi has a 700 MHz single-core ARM11 CPU. Hence, taking into account only core count and clock, this is a differential of roughly 46x in terms of hardware power (12 × 2.66 GHz ≈ 46 × 0.7 GHz). Let's do the math:

6,000 events/s × 59% = 3,540 events/s (Autobahn's peak dispatch rate, normalized down to the 59% CPU load of the MigratoryData test)

3,540 events/s × 46 = 162,840 events/s (scaled up by the 46x hardware factor)

Extrapolated like this, Autobahn on the Pi comes out at roughly 163k messages/sec against MigratoryData's 200k, which is about 23% slower.
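For clarity, here is the same back-of-the-envelope extrapolation spelled out in a few lines of Python (the figures are the ones quoted above; the 46x factor counts only cores and clock speed):

pi_events_per_sec = 6000    # Autobahn peak dispatch rate on the Pi (result 10)
migratory_cpu_load = 0.59   # CPU load in the MigratoryData test
hardware_factor = 46        # 12 x 2.66 GHz Xeon vs 1 x 0.7 GHz ARM11 (12 * 2.66 / 0.7 ~ 45.6)

extrapolated = pi_events_per_sec * migratory_cpu_load * hardware_factor
print(extrapolated)               # 162840.0 messages/sec
print(200000 / extrapolated - 1)  # ~0.23, i.e. MigratoryData is about 23% ahead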

Now, obviously, the Xeon system not only has more and faster cores, but also a much larger and faster cache and memory subsystem, and the Intel Xeon Westmere microarchitecture simply plays in a different league than the years-old ARM11 architecture.

So, taking those additional hardware-related factors into account, my interpretation of the results here is that the performance in terms of throughput is roughly comparable. You might also take a detailed look at the latencies in the results below, compare, and draw your own conclusions.

Of course this "comparison" is totally unscientific and questionable in multiple ways. Sure.

However, at the very least, doing 6k/s event dispatches on the Pi is quite something.

You might start to think about what performance you actually need, and what you can realistically expect from a $25 machine ;)

Testing yourself

The results were obtained after running a JIT warmup schedule (running the load client with different parameters, but without restarting the server). Results 1 - 19 were done with the load client running under CPython/select on Windows. As this setup has limits of its own, results 20 - 27 were obtained by running the load client on Linux under PyPy/epoll.

You will need to have Autobahn and PyPy running. A complete step-by-step guide to installing these can be found here.

Download the load test server:

wget https://raw.github.com/tavendo/AutobahnPython/master/examples/wamp/pubsub/loadlatency/server.py

The load test server has a couple of command line options, and you can get help like this:

pi@raspberrypi ~ $ pypy server.py --help

Note that startup will take a couple of seconds. We are running on PyPy, which is JITting code, and the price for that is increased startup time. However, once PyPy has JITted all the hot code, you will get much higher performance. I guess it's another incarnation of "there is no free lunch" ;)

For the test results below, we've been running the server without any specific options:

pi@raspberrypi ~ $ pypy server.py

Now, on your PC or notebook, download the load test client:

wget https://raw.github.com/tavendo/AutobahnPython/master/examples/wamp/pubsub/loadlatency/client.py

and run the client:

python client.py --wsuri ws://192.168.1.133:9000 -c 20 -r 50 -b 1 -p 10

You can get help for the options that control the load profile by doing:

python client.py --help

The options we used for each run are summarized with the respective result below.
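Assuming the options map as in the example above (-c clients, -r publish rate in Hz, -b events per publish, -p payload bytes; I haven't re-derived this from client.py, so treat it as an assumption), the nominal dispatch rate and payload bandwidth of each run follow directly from the load profile:

def nominal_dispatch_rate(clients, publish_hz, events_per_publish):
    # Every published event is dispatched to every connected client.
    return clients * publish_hz * events_per_publish

def nominal_payload_mbit(clients, publish_hz, events_per_publish, payload_bytes):
    # Payload bandwidth only, ignoring WebSocket/WAMP framing overhead.
    return clients * publish_hz * events_per_publish * payload_bytes * 8 / 1e6

# Result 1: python client.py ... -c 20 -r 50 -b 1 -p 10
print(nominal_dispatch_rate(20, 50, 1))       # 1000 events/s
# Result 7: 50 clients, 1 event at 50 Hz, 1400+ bytes payload
print(nominal_payload_mbit(50, 50, 1, 1400))  # 28.0 Mbit/s - the bandwidth peak above

The reported dispatch rates are the achieved ones, so a few runs deviate from this nominal product.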

Results

Result 1

  • 1000 events/s dispatch rate
  • 20 clients
  • 1 event published at 50Hz rate
  • 10+ bytes payload

Result 2

  • 5000 events/s dispatch rate
  • 50 clients
  • 10 events published at 10Hz rate
  • 64+ bytes payload

Result 3

  • 5000 events/s dispatch rate
  • 50 clients
  • 2 events published at 50Hz rate
  • 64+ bytes payload

Result 4

  • 2500 events/s dispatch rate
  • 50 clients
  • 1 event published at 50Hz rate
  • 64+ bytes payload

Result 5

  • 2500 events/s dispatch rate
  • 50 clients
  • 1 event published at 50Hz rate
  • 512+ bytes payload

Result 6

  • 2500 events/s dispatch rate
  • 50 clients
  • 1 event published at 50Hz rate
  • 1024+ bytes payload

Result 7

  • 2500 events/s dispatch rate
  • 50 clients
  • 1 event published at 50Hz rate
  • 1400+ bytes payload

Result 8

  • 6000 events/s dispatch rate
  • 50 clients
  • 6 events published at 20Hz rate
  • 0+ bytes payload

Result 9

  • 6000 events/s dispatch rate
  • 50 clients
  • 6 events published at 20Hz rate
  • 0+ bytes payload

Result 10

  • 6000 events/s dispatch rate
  • 50 clients
  • 6 events published at 20Hz rate
  • 32+ bytes payload

Result 11

  • 4000 events/s dispatch rate
  • 500 clients
  • 8 events published at 1Hz rate
  • 32+ bytes payload

Result 12

  • 4000 events/s dispatch rate
  • 500 clients
  • 1 event published at 8Hz rate
  • 32+ bytes payload

Result 13

  • 2000 events/s dispatch rate
  • 500 clients
  • 1 event published at 4Hz rate
  • 32+ bytes payload

Result 14

  • 2000 events/s dispatch rate
  • 200 clients
  • 1 event published at 10Hz rate
  • 32+ bytes payload

Result 15

  • 2000 events/s dispatch rate
  • 200 clients
  • 1 event published at 10Hz rate
  • 32+ bytes payload

Result 16

  • 2500 events/s dispatch rate
  • 500 clients
  • 1 event published at 5Hz rate
  • 32+ bytes payload

Result 17

  • 4000 events/s dispatch rate
  • 200 clients
  • 1 event published at 20Hz rate
  • 32+ bytes payload

Result 18

  • 4000 events/s dispatch rate
  • 200 clients
  • 4 events published at 5Hz rate
  • 32+ bytes payload

Result 19

  • 1000 events/s dispatch rate
  • 10 clients
  • 1 event published at 100Hz rate
  • 32+ bytes payload

Result 20

  • 2000 events/s dispatch rate
  • 2000 clients
  • 1 event published at 1Hz rate
  • 32+ bytes payload

Result 21

  • 2000 events/s dispatch rate
  • 10 clients
  • 1 event published at 200Hz rate
  • 32+ bytes payload

Result 22

  • 1000 events/s dispatch rate
  • 5 clients
  • 1 event published at 200Hz rate
  • 32+ bytes payload

Result 23

  • 4000 events/s dispatch rate
  • 10 clients
  • 1 event published at 0.2Hz rate
  • 32+ bytes payload

Result 24

  • 1000 events/s dispatch rate
  • 1000 clients
  • 1 event published at 1Hz rate
  • 32+ bytes payload

Result 25

  • 1000 events/s dispatch rate
  • 6000 clients
  • 1 event published at 0.2Hz rate
  • 32+ bytes payload

Result 26

  • 4500 events/s dispatch rate
  • 50 clients
  • 90 events published at 1Hz rate
  • 32+ bytes payload

Result 27

  • 4000 events/s dispatch rate
  • 500 clients
  • 8 events published at 1Hz rate
  • 32+ bytes payload



