It has been a long while since I attended a talk as engaging and relevant as "Ember and D3 Interactive Dashboards" by Sam Selikoff at the Boston D3 Meetup. He presented a thoroughly elegant approach to developing Web-based interactive data visualization systems. The approach merges the Ember.js MVC API with reusable D3 visualizations coded in the API style described in Mike Bostock's "Towards Reusable Charts". The result is a stunningly elegant system that solves many recurring implementation challenges that come up when building interactive Web-based data visualization systems. This approach fits particularly well with systems involving multiple linked views and extensive UI elements for various filtering and selection options.
I built an interactive visualization dashboard with multiple linked views this summer for Rapid7. Here's a report on the UserInsight Ingress Dashboard.
This dashboard shows an aggregated view of login data. The visualizations show the locations, times, failure rates, and service breakdown of the data. Every visualization has interactions for filtering all the others.
For example, clicking on a bar filters the data shown in the other views by service, and brushing over a region of time filters the data shown in other views by time range.
Implementation of this type of visualization system requires abstractions or software patterns that separate the various interlocking components cleanly. This is because the system must be incrementally developed, and must scale in complexity over time and as requirements change. A developer should be able to take my code and introduce arbitrarily many more linked visualizations without fundamentally making the software more complex.
In striving to create a scalable visualization dashboard creation system, I introduced an MVC-framework-like-thing that Rapid7 kindly allowed me to open source as the DashboardScaffold project (for use in both the UserInsight product and my Ph.D. dissertation). This project is a framework that encourages strict separation between reusable visualization components and configuration of those components. Here, "configuration" refers to
- the layout of the visualizations,
- the parameters that customize the reusable visualization components, and
- the connections in a data flow network between interactive visual components.
DashboardScaffold allows users to interactively manipulate all aspects of the visualization dashboard configuration in a live-coding-esque configuration editor.
The dashboardScaffold library in action. Try it out! Check out the code!
Changes in the configuration JSON update the visualizations instantly. The text editor is augmented by the library Inlet, which supports interactive editing of textual colors and numbers through a color picker widget and a slider. Changes made by interacting with the visualizations directly update the configuration text instantly.
I used a pattern of getter-setters-with-events to establish the interconnections between components in terms of change propagation and data flow. This pattern, however, resulted in redundant calls to visualization update functions when multiple properties changed at the same time. I solved this by applying a "debounce" operation to the visualization update functions, collapsing multiple sequential updates into a single update execution on the next tick of the JavaScript event loop. This worked overall, but the fact that I had to type "_.debounce" so many times struck me as inelegant, and I felt there must be a better solution.
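To make that pattern concrete, here is a minimal sketch (not the actual DashboardScaffold code) of a getter-setter-with-events chart in the "Towards Reusable Charts" style, with a hand-rolled next-tick debounce standing in for "_.debounce":

```javascript
function reusableChart() {
  var width = 400,
      height = 300,
      listeners = [];

  // Collapse multiple property changes into a single update
  // on the next tick of the event loop.
  var scheduled = false;
  function scheduleUpdate() {
    if (scheduled) return;
    scheduled = true;
    setTimeout(function () {
      scheduled = false;
      listeners.forEach(function (fn) { fn(); });
    }, 0);
  }

  function chart(selection) {
    // A real chart would render into `selection` here, using width/height.
  }

  // Getter-setter: no argument reads the value, an argument writes it
  // and schedules an update.
  chart.width = function (value) {
    if (!arguments.length) return width;
    width = value;
    scheduleUpdate();
    return chart; // chainable, per "Towards Reusable Charts"
  };
  chart.height = function (value) {
    if (!arguments.length) return height;
    height = value;
    scheduleUpdate();
    return chart;
  };
  chart.onUpdate = function (fn) { listeners.push(fn); return chart; };
  return chart;
}

var updates = 0;
var chart = reusableChart().onUpdate(function () { updates++; });
chart.width(800).height(600); // two writes, but only one update fires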
After seeing that talk on Ember with D3, I'm now convinced that Ember's approach to modeling data flows is superior to the one I came up with. In fact, others have gone down this path before. In the StackOverflow thread entitled "D3 with Backbone / D3 with Angular / D3 with Ember?", Sam Selikoff hits the nail on the head in his response pointing out the advantages of Ember's "computed properties" over a more roll-your-own change propagation approach:
> Computed properties: you'll often be displaying aggregates, so slicing your data using computed properties means you just have to call your chart's update function whenever the data changes, and you're good to go. No more worrying about sending off an event to every view that will change when one specific part of your data changes. Plus, these will probably be properties on your controllers, instead of being calculated within a specific chart or view, which will make reuse much easier.

It turns out that Ember implements a solution to the crux of the problem so often encountered in programming large, complex, multi-faceted data visualization systems: data dependency management. Ember allows consumers of their API to define data dependency graphs, then ensures all changes propagate properly and without redundancy through the data flow network of the system. This approach couples nicely with the D3 reusable chart API style, in which updates to any given visualization parameter can be handled such that only the necessary aspects of the visualization are recomputed (not necessarily the whole thing).
One thing that struck me was the amazingly effective use of a JavaScript practice that feels slightly sketchy: modifying the Function prototype. Check out this piece of code by Sam Selikoff (from "Ember and D3: Building a simple dashboard"):
```javascript
App.CompaniesController = Ember.ArrayController.extend({
  filter: 'newContracts',

  data: function() {
    var data;
    if (this.get('model.isLoaded')) {
      var _this = this;
      data = this.map(function(company) {
        return {
          category: company.get('name'),
          count: company.get(_this.get('filter'))
        };
      });
    }
    return data;
  }.property('model', 'filter')
});
```
This code sets up how changes in either the model (which is really a month filter) or the filter (which is a selection of what field to display) propagate through to a chart he has defined in other code. Notice the following odd-looking structure:
```javascript
function(){...}.property(...)
```

Normally, functions don't have a method called "property"; Ember.js adds it to JavaScript's Function prototype. This to me is mind-bending in terms of API design, but the approach is hugely effective. The resulting code is ultra-concise in defining nodes in a data dependency network. Each node comprises how its value is computed (the function body) and its dependencies (the arguments to "property()"). Ember correctly propagates changes through the data dependency graph established by this code. This immensely simplifies the mental load of the visualization programmer, as one of the most complex aspects of the system is taken care of.
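To illustrate the underlying idea (a toy sketch, not Ember's actual implementation), here is a tiny dependency graph where each computed node pairs a compute function with the names of the values it depends on, and writes invalidate any computed that depends on them:

```javascript
// Toy dependency graph: set() writes plain values and invalidates
// dependents; computed() registers a derived value with its dependency
// list; get() recomputes a computed value lazily if it was invalidated.
function Graph() {
  var values = {},    // current (cached) values, plain and computed
      computeds = {},  // name -> compute function
      deps = {};       // name -> array of dependency names
  return {
    set: function (name, value) {
      values[name] = value;
      // Invalidate every computed whose dependency list contains `name`.
      Object.keys(deps).forEach(function (key) {
        if (deps[key].indexOf(name) !== -1) delete values[key];
      });
    },
    computed: function (name, fn, dependencies) {
      computeds[name] = fn;
      deps[name] = dependencies;
    },
    get: function (name) {
      if (!(name in values) && computeds[name]) {
        values[name] = computeds[name](this); // recompute on demand
      }
      return values[name];
    }
  };
}

var g = Graph();
g.set('model', [1, 2, 3]);
g.set('filter', function (d) { return d > 1; });
// Analogous to function(){...}.property('model', 'filter'):
g.computed('data', function (graph) {
  return graph.get('model').filter(graph.get('filter'));
}, ['model', 'filter']);
```

Calling g.set('filter', ...) invalidates 'data', so the next g.get('data') recomputes it; a change to an unrelated value would leave the cached result untouched. Ember's real machinery adds observers, chained dependencies, and batched notification on top of this basic idea.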
Square's retrospective post "Ember and D3: Building responsive analytics" describes their process for building interactive visualization dashboards with multiple linked views for customer-facing business analytics. As they say:
> Ember makes coordinating data across multiple views and embedded model objects scarily easy...

In terms of performance and responsiveness, Square also faced the challenge that a network round-trip for visualization interactions was too much of a performance hit. This makes the issue of handling data on the client side that much more important:
> Querying, sending, and manipulating tens of thousands of payments is a pretty intensive task. For the sake of responsiveness, we ended up sending down all of a user’s payments to the browser...

Square also uses Crossfilter, a multidimensional filtering library they created, in conjunction with D3 and Ember.
Many visualization challenges we see today revolve around "big data". In order to comprehend such "big data", it needs to be reduced or aggregated before presentation, visual or otherwise. At Rapid7 I worked with a team to develop a visualization dashboard for ingress activity backed by a Cassandra cluster ingesting millions of raw events per minute. The dashboard had to show up-to-the-minute data, provide historical context, and support interactive multidimensional filtering along four dimensions (time, space, success, and service). Some might call this a "big data" visualization challenge.
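As a rough illustration of the client-side, Crossfilter-style filtering involved (plain JavaScript with a hypothetical record shape, not the actual UserInsight code): each view applies every active filter except the one on its own dimension, so brushing one view filters all the others without filtering itself:

```javascript
// Hypothetical login records with three of the four dimensions mentioned
// above (time, service, success); a real dataset adds location.
var records = [
  { time: 1, service: 'ssh',  success: true  },
  { time: 2, service: 'vpn',  success: false },
  { time: 3, service: 'ssh',  success: true  },
  { time: 9, service: 'mail', success: true  }
];

var filters = {}; // dimension name -> predicate

function setFilter(dim, predicate) { filters[dim] = predicate; }
function clearFilter(dim) { delete filters[dim]; }

// Records passing every active filter, optionally skipping one dimension
// so a view never filters itself (as in linked-view brushing).
function filtered(skipDim) {
  return records.filter(function (r) {
    return Object.keys(filters).every(function (dim) {
      return dim === skipDim || filters[dim](r);
    });
  });
}

setFilter('time', function (r) { return r.time <= 3; });        // time brush
setFilter('service', function (r) { return r.service === 'ssh'; }); // bar click
```

Here filtered() feeds views with all filters applied, while filtered('service') feeds the service bar chart itself, which should reflect the time brush but not its own selection. Crossfilter implements the same idea far more efficiently with sorted indexes over tens of thousands of records.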
Scenarios like this involve balancing data preprocessing, filtering, and aggregation operations between the server (or distributed cloud system) and the client. If all operations are done on the server, every interaction requires a network round-trip. If most of the data is loaded up front, client-side interactions are instantaneous, but the initial load time may be unacceptable. Once the data is loaded into client memory, handling it in conjunction with interactive elements such as filters can quickly become problematic without the right set of abstractions and patterns. I'm sold on the Ember + D3 approach. Now it's time to put it into practice!