The 5 Best Visualization Toolkits

Coding visualizations is a great way to learn programming because it's inherently rewarding. It's nice to write something like:

ellipse(20,20,35,35);

and see a circle, where you can immediately start tweaking the parameters, explore other drawing functions, and ask questions like "how can I change the color?" Compare this to the typical introductory programming lesson:

printf ("Hello World!\n");

Nothing really to explore here.

People often ask me to explain what tools I use to make visualizations and interactive projects. While no framework can address all needs, over the years I've had the chance to experiment with many of the popular ones and learn their strengths and weaknesses. Here are five of my favorites, in increasing order of my preference.

1) Flare / Prefuse (Flash / Java)

The Prefuse Java library, and its Flash cousin Flare, were the first visualization frameworks that I used extensively. These are very full-featured libraries that can be adapted to a variety of projects.

I wrote this P2P visualization in Flare

Advantages

  • Because the libraries are written in object-oriented languages (Java and ActionScript 3), it's quite easy to modify their components -- replacing how lines are rendered or how groups are calculated in a visualization is as easy as dropping a single file in the right place.
  • As you can see in the demos, there is a plethora of layouts and chart types to choose from. Animated transitions between layouts are also possible with the Transitioner class, which interpolates properties for you automatically.
  • Many useful abstractions like DataSets, filters, and property encoders allow you to easily manipulate a data set once it's loaded.

Disadvantages

  • Documentation and examples are a bit lacking: Definitely check out FLAP (Flare Assistance Pool) which has a lot of sample code and explains many common pitfalls.
  • The Flare library hasn't been updated since January 2009 -- not that there are major bugs or anything, but it hasn't kept up with the latest and greatest.
  • Unfortunately, Flash and Java applets are rapidly becoming (or already are) out of date. Visualizations made with Flare or Prefuse won't easily work on mobile devices, and are hard to integrate with other web components. It's possible to interact with JavaScript via ExternalInterface, but it's a hurdle.

2) Google Chart Tools (JavaScript)

Google Chart Tools is a Google product that allows you to make simple visualizations using an online tool, democratizing the process of visualization design. It has a very low barrier to entry and doesn't require much programming skill to create great-looking static charts as well as simple interactive visualizations.

A Google chart describing a recent traffic spike here

Advantages

  • No coding of graphical elements required, and Google will host your visualization for free, allowing easy sharing across the web.
  • Many formats: bubble charts, line plots, treemaps, and even geographic maps are among the many choices available.
  • If all you want to do is generate images of charts or plots, you can do it with the Image Charts API, which only requires you to construct a URL with your data and formatting -- no coding at all.
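To give a feel for the URL-only approach, here is a minimal sketch in Python that assembles such a URL. The endpoint and the cht (chart type), chs (size), and chd (data) parameters are how I remember the Image Charts API, so treat them as assumptions and check Google's documentation before relying on them. The helper name line_chart_url and the sample numbers are made up for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Image Charts API; verify before relying on it.
BASE_URL = "https://chart.googleapis.com/chart"


def line_chart_url(values, width=400, height=200):
    """Build a URL describing a simple line chart (hypothetical helper)."""
    params = {
        "cht": "lc",                                     # chart type: line chart
        "chs": f"{width}x{height}",                      # image size in pixels
        "chd": "t:" + ",".join(str(v) for v in values),  # data in plain-text format
    }
    # Keep ':' and ',' readable instead of percent-encoding them.
    return BASE_URL + "?" + urlencode(params, safe=":,")


url = line_chart_url([10, 45, 30, 80])
print(url)
```

The resulting string would go straight into an image tag's src attribute; the data and the formatting both live in the query string.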

Disadvantages

  • For better or worse, the images and data are stored on Google's servers, so if they are down for any reason, so is your graphic. You could, of course, cache it.
  • Importing data is a bit clunky -- there isn't a good way to just consume JSON or XML data and transform it into a chart; you have to either add data manually or use a source that supports the Google DataSource protocol.
  • Not really extensible -- if you want to add some code or functionality of your own, it's difficult to do because the drawing code isn't exposed.

3) Matplotlib (Python)

Matplotlib is a plotting and graphing tool very popular in the scientific community. With a great many packages for statistics, clustering, and plotting, it makes it easy to present numerical data.

A pyplot clustering from an AI assignment of mine

Advantages

  • If you're using large datasets, the ease of integration with mathematical frameworks like numpy to process your data makes Python a no-brainer.
  • Easy integration with web servers -- web frameworks like Django or even smaller frameworks like Flask can be used to generate visualizations for users.
  • Tons of information online, since this gets a lot of use from the scientific community. Check out the variety of designs (complete with source code) at the Gallery.

Disadvantages

  • Little support for interactive visualizations -- matplotlib is mostly used to generate static graphics.
  • Because of the many dependencies of this framework, it's often difficult to get matplotlib running on your computer -- for example, on a Mac you'll have to install quite a few requirements before you get it working.
  • It's difficult to be creative in a design sense with matplotlib -- maybe scientists don't have the best aesthetic sense?

4) Processing (Java-like)

Processing has quickly become a favorite among artists, designers, and programmers alike for its ease of use and focus on creating graphics. It uses its own language that builds on the Java programming language, simplifying the syntax and the creation of visual objects.

I wrote this fish schooling simulator in Processing

Advantages

  • Easy to load in data, and there are lots of great manipulation tools like map(), constrain(), norm() to alter data points, and a wide variety of transformations you can apply to the drawing primitives.
  • The "throwing paint at a canvas" approach encourages a lot of creativity.
  • Very portable: You can export a project as a Java applet, or even a native application that you can run from the desktop. The ProcessingJS project allows running Processing sketches in an HTML5 canvas element and there is Android Processing support as well.
  • Excellent documentation for beginners, an extraordinary reference with examples, and great forums for advanced questions. Protip: don't just Google for "processing (your question)" but instead format your searches as "(question) site:processing.org"; unfortunately, "processing" means many different things.
  • There are good books for beginners that guide you through a few projects -- Visualizing Data, by Ben Fry, one of Processing's creators, is a good place to start.
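To show what the data manipulation helpers mentioned above do, here is a rough Python re-implementation of the idea behind map(), constrain(), and norm(). This is a sketch of the arithmetic, not Processing's own source; I've named the first function remap because map is already a Python built-in, and the temperature example is invented.

```python
def norm(value, start, stop):
    """Rescale value from the range [start, stop] to the range [0, 1]."""
    return (value - start) / (stop - start)


def remap(value, start1, stop1, start2, stop2):
    """Rescale value from [start1, stop1] to [start2, stop2], like Processing's map()."""
    return start2 + (stop2 - start2) * norm(value, start1, stop1)


def constrain(value, low, high):
    """Clamp value so it never falls outside [low, high]."""
    return max(low, min(value, high))


# Place a temperature reading (range -10..40 degrees) on a 400-pixel-wide canvas.
x = remap(15, -10, 40, 0, 400)
print(x)                       # 200.0, the middle of the canvas
print(constrain(450, 0, 400))  # 400, clamped to the right edge
```

Most of the work in a data-driven sketch is this kind of rescaling: taking a value in data units and turning it into a pixel position, a size, or a color component.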

Disadvantages

  • There aren't higher-level data representations built in, so you'll have to roll your own classes to keep track of layers and individual objects. At the end of the day, Processing just throws "paint" at a "canvas" -- it's up to you to structure it.
  • Because objects don't exist on the canvas, mouse interactions are sometimes difficult to code. For example, registering a click on an object has to be done by calculating the distance between mouse and object and seeing if the click was inside. This can get a bit tedious.
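The distance check described above can be sketched as follows. I've written it in Python rather than Processing to keep it self-contained; the Circle class and the function names are hypothetical, and a real sketch would read the mouse position from Processing's mouse variables instead of taking it as arguments.

```python
import math
from dataclasses import dataclass


@dataclass
class Circle:
    """Your own record of a shape, since the canvas doesn't remember it."""
    x: float
    y: float
    radius: float


def is_hit(circle, mouse_x, mouse_y):
    """True if the mouse position lies inside (or on the edge of) the circle."""
    distance = math.hypot(mouse_x - circle.x, mouse_y - circle.y)
    return distance <= circle.radius


def topmost_hit(circles, mouse_x, mouse_y):
    """Shapes drawn later sit on top, so search in reverse drawing order."""
    for circle in reversed(circles):
        if is_hit(circle, mouse_x, mouse_y):
            return circle
    return None


# The circle from ellipse(20,20,35,35) has a diameter of 35, so a radius of 17.5.
shapes = [Circle(20, 20, 17.5), Circle(30, 20, 17.5)]
print(topmost_hit(shapes, 25, 20))    # the second circle, which overlaps the first
print(topmost_hit(shapes, 200, 200))  # None, the click missed everything
```

This works for circles; rectangles and irregular shapes each need their own test, which is where the tedium comes from.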

5) D3 (JavaScript)

A relative newcomer, D3 comes from the same author as the popular Protovis library, and has an interesting philosophy of separating data manipulation from the presentation layer. It allows for clean code that generates interactive visualizations that work on browsers and mobile devices alike. Its great integration with the HTML stack makes this my current favorite framework for general-purpose visualization.

A character map I built using D3

Advantages

  • This is a young framework, so there aren't that many tutorials, but there is a very active Google Group for Q&A. The author, Mike Bostock, personally responds to a great many of the questions asked.
  • JavaScript and jQuery are very familiar to most web citizens, and this library borrows heavily from the jQuery selector style of programming.
  • D3 focuses on representing the data -- you can use whatever you'd like as the presentation layer, such as SVG, simple DOM elements, or a Canvas.
  • Data manipulation is very simple -- whenever the data set changes, you can see exactly which data points need to be added or removed, and react accordingly.
  • Can create static as well as interactive graphics.
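The idea of seeing which data points were added or removed can be sketched in a few lines. To be clear, this is not D3's API: it's a Python illustration of the bookkeeping, with a hypothetical diff_by_key helper and made-up data. The names enter, update, and exit are borrowed from D3's vocabulary for the three groups.

```python
def diff_by_key(old_items, new_items, key):
    """Split a data change into items to add, to refresh, and to remove.

    Hypothetical helper: items are matched between the two data sets by key(item).
    """
    old = {key(item): item for item in old_items}
    new = {key(item): item for item in new_items}
    enter = [new[k] for k in new if k not in old]   # new points: create elements
    update = [new[k] for k in new if k in old]      # surviving points: refresh them
    exit_ = [old[k] for k in old if k not in new]   # vanished points: remove elements
    return enter, update, exit_


before = [{"id": "a", "value": 1}, {"id": "b", "value": 2}]
after = [{"id": "b", "value": 5}, {"id": "c", "value": 3}]

enter, update, exit_ = diff_by_key(before, after, key=lambda d: d["id"])
print(enter)   # [{'id': 'c', 'value': 3}]
print(update)  # [{'id': 'b', 'value': 5}]
print(exit_)   # [{'id': 'a', 'value': 1}]
```

Once the change is split this way, each group gets its own reaction: draw the new points, move or restyle the surviving ones, and fade out the ones that left.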

Disadvantages

  • SVG, the presentation layer most commonly used in D3, is pretty unfamiliar to many -- but it doesn't take that long to get used to.

Others

There are many more toolkits out there that are worth checking out:

  • Raphael -- a neat SVG drawing library
  • ManyEyes -- data visualization tools from IBM that require no coding and come with hosting and commenting capabilities
  • NodeBox -- (Mac only) a small Python IDE that lets you create beautiful visualizations, and is particularly well suited for network visualizations and generative art
  • Raw <canvas> -- check out this Mozilla canvas tutorial on how to use Canvas on its own

I hope this is helpful!