1. It has been a long while since I attended a talk as engaging and relevant as "Ember and D3 Interactive Dashboards" by Sam Selikoff at the Boston D3 Meetup. He presented a thoroughly elegant approach to developing Web-based interactive data visualization systems. The approach merges the Ember.js MVC API with reusable D3 visualizations coded using the API style described in Mike Bostock's "Towards Reusable Charts". The result solves many recurring implementation challenges that come up when building such systems. It fits particularly well with systems involving multiple linked views and extensive UI elements for filtering and selection.

    I built an interactive visualization dashboard with multiple linked views this summer for Rapid7. Here's a report on the UserInsight Ingress Dashboard.
    [Image: UserInsight Ingress Dashboard]
    This dashboard shows an aggregated view of login data. The visualizations show the locations, times, failure rates, and service breakdown of the data. Every visualization has interactions for filtering all the others.
    [Image: Filtering on the UserInsight Dashboard]
    For example, clicking on a bar filters the data shown in the other views by service, and brushing over a region of time filters the data shown in other views by time range.

    Implementation of this type of visualization system requires abstractions or software patterns that separate the various interlocking components cleanly. This is because the system must be incrementally developed, and must scale in complexity over time and as requirements change. A developer should be able to take my code and introduce arbitrarily many more linked visualizations without fundamentally making the software more complex.

    In striving to create a scalable system for building visualization dashboards, I introduced an MVC-framework-like thing that Rapid7 kindly allowed me to open source as the DashboardScaffold project (for use in both the UserInsight product and my Ph.D. dissertation). This project is a framework that encourages strict separation between reusable visualization components and the configuration of those components. Here, "configuration" refers to
    • the layout of the visualizations,
    • the parameters that customize the reusable visualization components, and
    • the connections in a data flow network between interactive visual components.
    DashboardScaffold allows users to interactively manipulate all aspects of the visualization dashboard configuration in a live-coding-esque configuration editor.
    dashboardScaffold
    The dashboardScaffold library in action. Try it out! Check out the code!

    Changes in the configuration JSON update the visualizations instantly. The text editor is augmented by the library Inlet, which supports interactive editing of textual colors and numbers through a color picker widget and a slider. Changes made by interacting with the visualizations directly update the configuration text instantly.

    I used a pattern of getter-setters-with-events to establish the interconnections between components in terms of change propagation and data flow. This pattern however resulted in redundant calls to visualization update functions when multiple properties changed at the same time. I solved this by using a "debounce" operation on visualization update functions to collapse multiple sequential updates into a single update execution on the next tick of the JavaScript event loop. This worked overall, but the fact that I had to type "_.debounce" so many times struck me as inelegant, and I felt there must be a better solution to this.
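    The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the DashboardScaffold code: `createModel`, `debounce`, and `updateChart` are hypothetical names, and the hand-rolled `debounce` stands in for Underscore's `_.debounce` so that the example runs without dependencies. It only collapses calls made within the same tick.

```javascript
// Hypothetical stand-in for _.debounce: collapse calls made within one
// tick into a single call on the next tick of the event loop.
function debounce(fn) {
  var scheduled = false;
  return function () {
    if (scheduled) { return; }
    scheduled = true;
    setTimeout(function () {
      scheduled = false;
      fn();
    }, 0);
  };
}

// A tiny model with getter-setters that notify listeners on every change.
function createModel() {
  var values = {};
  var listeners = [];
  return {
    get: function (key) { return values[key]; },
    set: function (key, value) {
      values[key] = value;
      listeners.forEach(function (listener) { listener(key); });
    },
    onChange: function (listener) { listeners.push(listener); }
  };
}

var model = createModel();
var updateCount = 0;

// Hypothetical chart update function; a real one would redraw with D3.
function updateChart() {
  updateCount += 1;
}

model.onChange(debounce(updateChart));

// Two properties change "at the same time"...
model.set('width', 400);
model.set('height', 300);
// ...but updateChart is scheduled only once, for the next tick.
```

    Without the `debounce` wrapper, `updateChart` would run once per `set` call, which is the redundancy described above.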

    After seeing that talk on Ember with D3, I'm now convinced that Ember's approach to modeling data flows is superior to the one I came up with. In fact, others have gone down this path before. In the StackOverflow thread entitled "D3 with Backbone / D3 with Angular / D3 with Ember?", Sam Selikoff hits the nail on the head in his response, pointing out the advantages of Ember's "computed properties" over a roll-your-own change propagation approach:

    Computed properties: you'll often be displaying aggregates, so slicing your data using computed properties means you just have to call your chart's update function whenever the data changes, and you're good to go. No more worrying about sending off an event to every view that will change when one specific part of your data changes. Plus, these will probably be properties on your controllers, instead of being calculated within a specific chart or view, which will make reuse much easier.
    It turns out that Ember implements a solution to the crux of the problem so often encountered in programming large, complex, multi-faceted data visualization systems: data dependency management. Ember allows consumers of their API to define data dependency graphs, then ensures all changes propagate properly and without redundancy through the data flow network of the system. This approach couples nicely with the D3 reusable chart API style, in which updates to any given visualization parameter can be handled such that only the necessary aspects of the visualization are recomputed (not necessarily the whole thing).

    One thing that struck me was the amazingly effective use of a JavaScript practice that feels slightly sketchy - modifying the Function prototype. Check out this piece of code by Sam Selikoff (from Ember and D3: Building a simple dashboard):

    App.CompaniesController = Ember.ArrayController.extend({
      filter: 'newContracts',
    
      data: function() {
        if (this.get('model.isLoaded')) {
          var _this = this;
    
          var data = this.map(function(company) {
            return {
              category: company.get('name'),
              count: company.get( _this.get('filter') ),
            };
          });
        }
        return data;
      }.property('model', 'filter')
    });
    

    This code sets up how changes in either the model (which is really a month filter) or the filter (which is a selection of what field to display) propagate through to a chart he has defined in other code. Notice the following odd-looking structure:

    function(){...}.property(...)
    
    Normally, functions don't have a method called "property". Ember.js has added this function to JavaScript's Function prototype. To me this is mind-bending in terms of API design, but the approach is hugely effective. The resulting code is ultra-concise in defining nodes in a data dependency network. Each node consists of how its value is computed (the function implementation) and its dependencies (the arguments to "property()"). Ember correctly propagates changes through the data dependency graph established by this code. This immensely reduces the mental load on the visualization programmer, as one of the most complex aspects of the system is taken care of.
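    To make the idea concrete, here is a toy sketch of how a `property` method on the Function prototype can carry dependency information. It is emphatically not Ember's implementation: `createObject`, `dependsOn`, and the sample company data are all made up for illustration, and the sketch assumes the dependency graph has no cycles.

```javascript
// Illustration only: attach dependency names to a function, similar in
// spirit to Ember's function(){...}.property(...). Not Ember's code.
Function.prototype.property = function () {
  this.dependsOn = Array.prototype.slice.call(arguments);
  return this;
};

// Hypothetical helper: builds an object whose computed properties are
// cached and recomputed only after one of their dependencies changes.
function createObject(definition) {
  var values = Object.create(null);   // plain values
  var computed = Object.create(null); // name -> function with dependsOn
  var cache = Object.create(null);    // name -> cached computed value

  Object.keys(definition).forEach(function (key) {
    var entry = definition[key];
    if (typeof entry === 'function' && entry.dependsOn) {
      computed[key] = entry;
    } else {
      values[key] = entry;
    }
  });

  // Drop cached values of everything that depends on `key`, transitively.
  function invalidate(key) {
    Object.keys(computed).forEach(function (name) {
      if (computed[name].dependsOn.indexOf(key) !== -1) {
        delete cache[name];
        invalidate(name);
      }
    });
  }

  var object = {
    get: function (key) {
      if (!(key in computed)) { return values[key]; }
      if (!(key in cache)) { cache[key] = computed[key].call(object); }
      return cache[key];
    },
    set: function (key, value) {
      values[key] = value;
      invalidate(key);
    }
  };
  return object;
}

var computeCount = 0;
var controller = createObject({
  filter: 'newContracts',
  companies: [ // made-up sample data
    { name: 'A', newContracts: 3, revenue: 10 },
    { name: 'B', newContracts: 5, revenue: 20 }
  ],
  data: function () {
    computeCount += 1;
    var filter = this.get('filter');
    return this.get('companies').map(function (company) {
      return { category: company.name, count: company[filter] };
    });
  }.property('companies', 'filter')
});

controller.get('data');              // computed
controller.get('data');              // served from the cache
controller.set('filter', 'revenue'); // invalidates 'data'
controller.get('data');              // recomputed once
```

    The function body says how the value is computed and the arguments to `property` say when it must be recomputed. A chart bound to `data` would only need to redraw when that cached value is invalidated.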

    Square's retrospective post "Ember and D3: Building responsive analytics" describes their process for building interactive visualization dashboards with multiple linked views for customer-facing business analytics. As they say:
    Ember makes coordinating data across multiple views and embedded model objects scarily easy...
    In terms of performance and responsiveness, Square also faced the challenge that a network round-trip for visualization interactions was too much of a performance hit. This makes the issue of handling data on the client side that much more important:
    Querying, sending, and manipulating tens of thousands of payments is a pretty intensive task. For the sake of responsiveness, we ended up sending down all of a user’s payments to the browser...
    Square also uses Crossfilter, a multidimensional filtering library they created, in conjunction with D3 and Ember.

    Many visualization challenges we see today revolve around "big data". In order to comprehend such "big data", it needs to be reduced or aggregated before presentation, visual or otherwise. At Rapid7 I worked with a team to develop a visualization dashboard for ingress activity backed by a Cassandra cluster ingesting millions of raw events per minute. The dashboard had to show up-to-the-minute data, provide historical context, and support interactive multidimensional filtering along four dimensions (time, space, success, and service). Some might call this a "big data" visualization challenge.

    Scenarios like this involve balancing data preprocessing, filtering and aggregation operations between the server (or distributed cloud system) and the client. If all operations are done on the server, every interaction requires a network round-trip. If most of the data is loaded up front, the client-side interactions would be instantaneous but the initial load time would be unacceptable. Once the data is loaded into client memory, handling it in conjunction with interactive elements such as filters can quickly become problematic without the right set of abstractions and patterns. I'm sold on the Ember + D3 approach. Now it's time to put it into practice!

  2. Today I released a lot of my code as open source here. Enjoy!

  3. to http://curransoft.com/code/

  4. Supposedly the Semantic Web contains lots of information, but what information? What is in there? Show me the money! I want to ask Wikipedia "What are all universities in the world and their student populations?" and I thought, what better test query for the Semantic Web?

    I began my search by looking at various example queries through the DBPedia faceted browser:
    [links from this page]
    That's pretty cool: I can list all universities, but how do I display their student populations? I see no way of doing this with the faceted browser, which is really a faceted search tool. All it lets you do is filter the listing by various parameters, not specify which parameters are shown.

    I think we'll need to learn some SPARQL to get at the student populations. I came across a nice SPARQL tutorial from XML.com and began following it. It seems like SNORQL is a widely used query interface for DBPedia. The SNORQL link from the DBPedia Applications page presents an example query executed on the DBPedia knowledge base: "List all names, birth dates and death dates for all people born in Berlin before 1900."
    # foaf:, xsd: and dbpedia2: are not declared here; SNORQL predefines them.
    PREFIX dbo: <http://dbpedia.org/ontology/>

    SELECT ?name ?birth ?death ?person WHERE {
    ?person dbpedia2:birthPlace <http://dbpedia.org/resource/Berlin> .
    ?person dbo:birthDate ?birth .
    ?person foaf:name ?name .
    ?person dbo:deathDate ?death
    FILTER (?birth < "1900-01-01"^^xsd:date) .
    }
    Click here to see the result in the SNORQL Query Browser (GitHub page), a project by Richard Cyganiak for the D2R server project. Notice that "Person" is specified nowhere in the query. Only resources that have a birth place, birth date, name, and death date can match, so the results are in practice limited to people.

    I discovered something I had been wondering how to do: a query listing all the outgoing edges from a given resource, in other words listing all RDF (subject-predicate-object) triples which have a given resource as the subject. Here's a query which lists all the triples with Konrad Zuse (one of the people listed in the previous result) as the subject:
    SELECT ?property ?hasValue
    WHERE {
    { <http://dbpedia.org/resource/Konrad_Zuse> ?property ?hasValue }
    }
    See the results by clicking here. Here are some triples returned:
    property hasValue
    http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://xmlns.com/foaf/0.1/Person
    http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/class/yago/Scientist110560637
    http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/class/yago/ComputerPioneers
    http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/2002/07/owl#Thing
    http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/class/yago/ComputerDesigners
    http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/class/yago/ProgrammingLanguageDesigners
    http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/class/yago/Werner-von-Siemens-RingLaureates
    http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/class/yago/GermanInventors
    http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/class/yago/ComputerHardwareEngineers
    http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/class/yago/GermanCivilEngineers
    http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/class/yago/GermanComputerScientists
    http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/ontology/Person
    http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/ontology/Scientist
    http://www.w3.org/2002/07/owl#sameAs http://umbel.org/umbel/ne/wikipedia/Konrad_Zuse
    http://www.w3.org/2002/07/owl#sameAs http://www4.wiwiss.fu-berlin.de/dblp/resource/person/120373
    http://www.w3.org/2002/07/owl#sameAs http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000022414
    http://www.w3.org/2000/01/rdf-schema#comment Konrad Ernst Otto Zuse war ein deutscher Bauingenieur, Erfinder und Unternehmer. Mit seiner Entwicklung des Z3 im Jahre 1941 baute er den ersten universellen Computer der Welt.
    http://www.w3.org/2000/01/rdf-schema#comment コンラート・ツーゼ(Konrad Zuse、1910年6月22日 - 1995年12月18日)は、ドイツの技術者である。彼の最も重要な業績は、1941年に世界初の完全動作するプログラム制御式コンピュータ Zuse Z3 を完成させたことである(プログラムはテープに格納)。1998年、Z3 はチューリング完全であることが証明された。 何が世界初のコンピュータかという問題はコンピュータの定義に依存するが、Z3 は後のマシンと比較したときに汎用性に問題がある。ツーゼは高級プログラミング言語 プランカルキュール を1945年に設計したが、これは理論的な部分での業績であり、彼の生きている間には実装されることもなく後のプログラミング言語にも直接的な影響を与えることはなかった。 技術的な業績だけでなく、ツーゼは1946年に世界初のコンピュータ企業を設立した。この会社は世界初の商用コンピュータZ4を開発し、1950年にチューリッヒ工科大学にリースしている。第二次世界大戦の影響で、ツーゼの業績の大部分はイギリスやアメリカ合衆国では気づかれなかった。アメリカの企業で彼の影響が見られたのは 1946年にIBMがツーゼに特許使用許諾を得たのが最初である。1960年代後半になると、ツーゼは計算する宇宙(計算によって成り立つ宇宙)の概念を提唱した。 Z4とZ3の複製品がミュンヘンのドイツ博物館にある。 ベルリンの Deutsches Technikmuseum Berlin はコンラート・ツーゼおよび彼の作品に関する特別展示をしている。再現されたZ1を含む12台の彼のマシン、オリジナルの文書、いくつかのツーゼの描いた絵などが展示されている。
    http://www.w3.org/2000/01/rdf-schema#comment Konrad Zuse var en tysk pioneer innenfor informatikk. Hans største bragd var konstruksjonen av den første funksjonelle datamaskin med programmer lagret på tape, kalt Z3, i 1941.
    http://www.w3.org/2000/01/rdf-schema#comment Конрад Цузе — немецкий инженер, пионер компьютеростроения. Наиболее известен как создатель первого действительно работающего программируемого компьютера и первого языка программирования высокого уровня.
    http://www.w3.org/2000/01/rdf-schema#comment Konrad Zuse was a German engineer and computer pioneer who collaborated with the German government during World War 2, which helped finance his projects. His greatest achievement was the world's first functional program-controlled Turing-complete computer, the Z3, in 1941 (the program was stored on a punched tape). He received the Werner-von-Siemens-Ring in 1964 for the Z3.

    Wow. We get to know all the "kinds of person" he is known to be - a German inventor, scientist and computer pioneer - and summaries in many languages, and many more results not shown. That's pretty impressive.

    I played with the query a bit and remembered seeing the "DISTINCT" keyword in some SPARQL queries. Here's a variation on the above query that gives you a list of all unique property types which are applied to Konrad Zuse:
    SELECT DISTINCT ?property
    WHERE {
    { <http://dbpedia.org/resource/Konrad_Zuse> ?property ?hasValue }
    }
    Amazingly, the example queries from the XML.com tutorial actually execute through the DBPedia Snorql instance!

    Very nice! Now we have some example queries for listing of some property values, a key ingredient in our "What are all universities in the world and their student populations?" puzzle. I see that the queries work, but how do I find out what vocabulary I can use? Google "DBPedia Ontology" and you're there in a few clicks. The DBPedia Ontology page contains many useful links. Here is the DBPedia ontology (class hierarchy) in text form. From that page there is a link to the University ontology class, which in turn lists all of its properties, including "numberOfStudents".

    How can we use these things in a SPARQL query? We need URIs, not unqualified strings like "numberOfStudents". I noticed that in the result of a previous query, [Konrad Zuse, rdf:type dbpedia:ontology/Person] was a triple. If you expand the object into its full URI (PREFIX dbpedia: <http://dbpedia.org/>) you get http://dbpedia.org/ontology/Person, which when accessed in a browser gives you a Linked Data interface to DBPedia. Just to see, I replaced Person with University, and sure enough, here is the Linked Data description of the DBPedia University class:
    http://dbpedia.org/ontology/University, which reveals that University is the domain of the owl:Property
    http://dbpedia.org/ontology/numberOfStudents, which we can use in our query.

    Here's the query that returns a listing of 10 universities from DBPedia:
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?university WHERE {
    ?university rdf:type dbo:University.
    } LIMIT 10
    So, Wikipedia, what are all universities in the world and their student populations?
    ...translates to...
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?name ?students WHERE {
    ?university rdf:type dbo:University.
    ?university foaf:name ?name.
    ?university dbo:numberOfStudents ?students
    } ORDER BY DESC(?students) LIMIT 50
    See the results here! Here are the top 50, formatted from RDF into HTML via the XSLT transformation provided with Snorql:
    name students
    Indira Gandhi National Open University 3000000
    California Community Colleges System 2900000
    The Open University of China 2700000
    Church Educational System 1200000
    Florida College System 800000
    Universitas Terbuka 580458
    အဝေးသင် တက္ကသိုလ် (ရန်ကုန်) 560000
    มหาวิทยาลัยรามคำแหง 525000
    City University of New York 483000
    The University System of Ohio 478367
    State University of New York 438361
    State University of New York 438361
    Bangladesh Open University 432767
    বাংলাদেশ উন্মূক্ত বিশ্ববিদ্যালয় 432767
    California State University 417112
    Chicago Public Schools 407955
    Yashwantrao Chavan Maharashtra University 400000
    Community College of the Air Force 351715
    Centre national d'enseignement à distance 350000
    University of Buenos Aires 308594
    Universidad de Buenos Aires 308594
    National Autonomous University of Mexico 305969
    Universidad Nacional Autónoma de México 305969
    Oklahoma State System of Higher Education 236000
    University of Delhi 220000
    DU 220000
    दिल्ली विश्वविद्यालय 220000
    Cairo University 200000
    Gāmaʿat al-Qāhirah 200000
    جامعة القاهرة 200000
    University of Guadalajara 195071
    University of North Carolina 183000
    Korea National Open University 182859
    Universidad Bolivariana de Venezuela 180000
    Universidad Nacional de Educación a Distancia 180000
    National University for Distance Education 180000
    The Open University 168850
    Universidad Autónoma de Santo Domingo 167533
    UASD 167533
    Sharks 161668
    Miami Dade College 161668
    Studium Urbis 147000
    Sapienza – Università di Roma 147000
    Modern University for the Humanities 140000
    University of Bikaner 140000
    University of London 135090
    Universitas Londiniensis 135090
    Universidade Norte do Paraná 130000
    Norte do Paraná University 130000
    Universidad Autónoma de Nuevo León 129341
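
    The same query could also be run from code rather than through Snorql. The sketch below is only an illustration under assumptions: it assumes DBPedia's public SPARQL endpoint lives at https://dbpedia.org/sparql (the post itself only used the Snorql page), it declares the rdf: and foaf: prefixes explicitly because only Snorql predefines them, and the helper names `buildQueryUrl`, `toRows`, and `fetchUniversities` are made up.

```javascript
// Assumed endpoint; not given in the post.
var ENDPOINT = 'https://dbpedia.org/sparql';

var query = [
  'PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>',
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>',
  'PREFIX dbo: <http://dbpedia.org/ontology/>',
  'SELECT ?name ?students WHERE {',
  '  ?university rdf:type dbo:University .',
  '  ?university foaf:name ?name .',
  '  ?university dbo:numberOfStudents ?students',
  '} ORDER BY DESC(?students) LIMIT 50'
].join('\n');

// The SPARQL protocol passes the query text in a "query" URL parameter.
function buildQueryUrl(endpoint, sparql) {
  return endpoint + '?query=' + encodeURIComponent(sparql);
}

// Convert a SPARQL JSON results document into plain { name, students } rows.
function toRows(json) {
  return json.results.bindings.map(function (binding) {
    return {
      name: binding.name.value,
      students: Number(binding.students.value)
    };
  });
}

// Not called here, since it needs network access and a global fetch().
function fetchUniversities() {
  return fetch(buildQueryUrl(ENDPOINT, query), {
    headers: { Accept: 'application/sparql-results+json' }
  }).then(function (response) {
    return response.json();
  }).then(toRows);
}
```

    Calling `fetchUniversities()` returns a promise of rows shaped like the table above, which is a convenient input for a chart.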

    Here's a version that gives you (lat,long) coordinates too.

    Mission accomplished. It looks like the Semantic Web is quite promising after all. I look forward to seeing a user interface which would allow me to construct that query and view the results visually within one browser window - including all vocabulary researching.

    What an incredible time to be alive - the collective body of human common knowledge is finally available for machines to process. I wonder what will come next.

  5. Greetings! This blog is a collection of guides and technical notes for doing various things, mostly in Ubuntu Linux. Here is a list of the entries I find myself going back to often:

    If you appreciate this site and want to support it, go for it!

    Ubuntu
    Ubuntu Introduction
    Installation scripts

    Editors:
    Installing Eclipse
    Emacs (and SLIME with Clojure)
    Emacs Keystroke Reference
    Vim
    Deploying a WAR file in Jetty
    Getting Started with Java Persistence API
    R and Java
    Projects:

    All code and text on this blog is in the public domain, free to use and modify with no restrictions whatsoever.

    Enjoy!

    --Curran

  6. Axioms for Human Computer Interaction Design

    An executive summary of the excellent article by Juhan Sonin found at
    http://www.mit.edu/~juhan/design_axioms.html
    • Let data scream (screen real estate: 85% data, 15% UI)
    • Always use real data
    • Prototype like crazy (fast development iterations)
    • Address layout, color, and interaction design from the start
    • Allow users to bitch about your service quickly (1-click feedback)
    • Dogfood your services
    • Ask for forgiveness rather than permission (just do)
    • Get continual feedback in the domain vernacular
    • Use grid hierarchies and basic information layouts
    • When in doubt, bang left
    • Pay attention to good typography practice
    • Use less than 5 type treatments of only 1 type face
    • Use better words and less of them
    • Color carefully
    • The document should be center stage, not the paint
    • What interface? Great interfaces disappear and have low cognitive overhead
    • People should be engaging directly with the content (e.g. iPhone photo app)
    • Manipulate the data, not the interface
    • Over time as you use a service or product, the interface melts into the background.
    • Design first for the repeat user (jedi), then novice, then the infrequent/inexperienced user.
    • Experience breeds familiarity, and in interface design familiarity promotes usability.
    • Cognitive heat sink: I get it, feels right, go here... instead of oh yeah, I remember where that was
    • Use keyboard shortcuts to supplement intuitive actions (enabling ninja speed)
    • Design every second of the experience
    • Performance ("snappiness") counts.
    • Products thrive (or wither) based on the users’ experience









  7. Mapping a region and adding it to OpenStreetMap is a gratifying experience.

    Today I thought I'd try out the OSMTracker Android app. I used it to record GPS traces of an unmapped area in Paxton, MA - walking trails near a small lake called Turkey Hill Pond. I rode my bike through the trails while the app was running. At the end I had these "GPX" files which I opened up in JOSM and used as the basis for adding new paths.

    I noticed that the entire lake was missing so I installed the wms plugin for JOSM and used the Yahoo satellite images to trace out the lake. Tagging was straightforward once I found the tag definition page. I created an OSM account and uploaded the data, and within a few hours the image tiles were updated.

    Here is a before and after screen capture of the OSM map:
    turkey hill pond edits

    How cool! It's a great feeling to know I've contributed something, and that anyone who looks at this region in OSM in the future will see these trails.

    Next I'd like to map the Lowell state forest - there is no good map for all those trails.

  8. Here is how to get Flexbuilder working in Linux.

    Set up the Flex 3.5 SDK:
    mkdir -p ~/opt/flex_sdk_3.5
    cd ~/opt/flex_sdk_3.5
    wget http://fpdownload.adobe.com/pub/flex/sdk/builds/flex3/flex_sdk_3.5.0.12683.zip
    unzip flex_sdk_3.5.0.12683.zip
    rm flex_sdk_3.5.0.12683.zip
    echo "PATH=\$PATH:\$HOME/opt/flex_sdk_3.5/bin" >> ~/.bashrc

    Install Flexbuilder:
    mkdir -p ~/opt/flexbuilder
    cd ~/opt/flexbuilder
    wget http://download.macromedia.com/pub/labs/flex/flexbuilder_linux/flexbuilder_linux_install_a5_112409.bin
    chmod +x flexbuilder_linux_install_a5_112409.bin
    ./flexbuilder_linux_install_a5_112409.bin

    Now open Eclipse and add the Flex 3.5 SDK in Properties -> Flex Compiler -> Configure Flex SDKs -> add -> choose the folder ~/opt/flex_sdk_3.5 -> ok -> choose to use the "Flex 3.5" SDK.

    Theoretically that should do it, but at the time I did this (8/24/10) some additional steps were required. First I got the error

    java.lang.IllegalArgumentException: "The attribute value type is com.adobe.flexbuilder.project.compiler.internal.ProblemManager and expected is one of java.lang.String, Boolean, Integer"

    which is a result of this bug. To fix this, follow the instructions here. The instructions were not clear on this point: you need to copy the class file into the inside of the jar file.

    The next error I had trouble with was the following:

    configuration variable 'target-player' must only be set once

    This persisted even after I went into Properties -> Flex Compiler and unchecked the box that says "Require Flash Player version". It turns out that the player version is defined in the SDK in

    flex_sdk_3.5/frameworks/flex-config.xml

    I edited that file and commented out the line with the required player version. Even after this the error was still there for me... until I applied the patch described here. Now FlexBuilder in Linux is working! Woo hoo!

  9. Here is a small project I did which lets you play several samples at several frequencies - quarter notes, eighth notes and sixteenth notes. The GUI is written in Java using Processing as a library. The audio generation is done using PureData, driven by messages sent from the Java GUI over a socket.

    Here is the code. To run it, first open the PureData patch (you can use the startPd.sh script for this or do it manually), then run the Java program, which connects to the patch over a socket.
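
    For readers curious what "messages over a socket" can look like, here is a rough sketch. It is not the project's Java code. It is written in JavaScript for Node, and it assumes the patch listens with Pd's [netreceive] object on a TCP port and therefore expects FUDI-style messages (atoms separated by spaces, each message terminated by a semicolon). The port number and the message contents in the usage note are hypothetical.

```javascript
var net = require('net');

// FUDI: atoms separated by spaces, message terminated by a semicolon.
function formatMessage(atoms) {
  return atoms.join(' ') + ';\n';
}

// Not called here: requires a Pd patch listening on the given port.
function sendToPd(port, atoms) {
  var socket = net.connect(port, 'localhost', function () {
    // Write the message, then close the connection.
    socket.end(formatMessage(atoms));
  });
  socket.on('error', function (error) {
    console.error('Could not reach Pd: ' + error.message);
  });
  return socket;
}
```

    With a patch listening on a hypothetical port 3000, `sendToPd(3000, ['sample', 1, 16])` would deliver the message `sample 1 16;` to it.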


    rhythmSample_pd

    rhythmSample_window

    Here is how it sounds when played:
    RhythmSample-sound by curran

  10. Here's a slick way of nicely reporting errors using Grails' built-in validation machinery:
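    // Controller
    package adminprototype

    class AdminController {
        def index = { }

        // Save the new object and render either "success" or an error message.
        def createDatabaseConnection = {
            render Utils.saveAndReport(new DatabaseConnection(params))
        }
    }

    // Utility class (separate file)
    package adminprototype

    class Utils {
        // Returns "success" if validation and saving pass, else a readable error.
        static String saveAndReport(Object o) {
            return o.save() ? "success" : reportErrors(o)
        }

        // Builds a message from the first field error on the object.
        static String reportErrors(Object o) {
            def error = o.errors.getFieldError()
            String field = error.getField().toString()
            // Convert camel case to space-delimited lower case,
            // e.g. "databaseName" becomes "database name".
            field = field.replaceAll("\\B([A-Z])", " \$1")
            field = field.toLowerCase()
            String bogus = error.getRejectedValue()
            return "The " + field + " cannot be " + (bogus ? bogus : "blank") + "."
        }
    }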
    Now if I leave out the field "databaseName" in a request like this:
    http://localhost:8080/AdminPrototype/admin/createDatabaseConnection?port=1234
    I get a really nice error message with the camel case field name converted into a space delimited name:
    The database name cannot be blank.
