
A D3 Image is Worth a Thousand Words: Interview with Morgane Ciot

Many things have been said and done in the realm of analytics, but visualizations remain at the forefront of the data analysis process, where intuition and correct interpretation can help us make sense of data.

As an increasing number of tools emerge, current visualizations are far more than mere pictures on a screen, allowing for movement, exploration and interaction.

One of these tools is D3, an open-source JavaScript data visualization library. D3 is perhaps the most popular tool for developing rich and interactive data visualizations, used by companies small and large, such as Google and the New York Times.

With the next Open Data Science Conference in Boston coming soon, we had the opportunity to talk with DataRobot’s Morgane Ciot, an ODSC speaker, about her workshop session, “Intro to D3”, the state of data visualization, and her own perspectives on the analytics market.


Morgane Ciot is a data visualization engineer at DataRobot, where she specializes in creating interactive and intuitive D3 visualizations for data analysis and machine learning. Morgane studied computer science and linguistics at McGill University in Montreal. Previously, she worked in the Network Dynamics Lab at McGill, answering questions about social media behavior using predictive models and statistical topic models.

Morgane enjoys studying machine learning (ML), reading, writing, and staging unusual events.

Let's get to know more about Morgane and her views as a data visualization engineer.

Morgane, could you tell us a bit more about yourself, especially about your area of expertise, and what motivated you to pursue a career in analytics and data science?

I went to school for computer science and linguistics. Those two fields naturally converge in Natural Language Processing (NLP)/Artificial Intelligence (AI), an intersection that was unfortunately not exploited by my program but that nonetheless got me interested in machine learning.

One of the computer science professors at my school was doing what essentially amounted to sociological research on social media behavior using machine learning techniques. Working with him furthered my interest in ML, NLP, and topic modeling, and I began to also explore how to visualize some of the unmanageable amounts of data we had (like, all of Reddit).

I’m probably indebted to that part of my life, and my professor, for my current position as a data viz engineer. Also, machine learning's practical ramifications are going to be game changing. I want to live closest to the eye of the storm when the singularity hits.

Based on your experience, which attributes or skills should every data master have if he/she wants to succeed, and what would be your recommendations for those looking for an opportunity in this career?

Stats, problem-solving skills, and engineering or scripting abilities all converge in the modern data scientist.

You have to be able to understand how to formulate a data science problem, how to approach it, and how to build the ad hoc tools you’ll need to solve it. At least some basic statistical knowledge is crucial. The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman, and Andrew Ng’s Coursera course, both provide a solid foundational understanding of machine learning and require some statistical background.

Learn at least one programming language — Python or R are the most popular. R is the de facto language for statisticians, and Python has a thriving community and a ton of data science libraries like scikit-learn and pandas. It’s also great for writing scripts to scrape web data. If you’re feeling more adventurous, maybe look into Julia.

As usual, don’t just learn the theory. Find a tangible project to work on. Kaggle hosts competitions you can enter and has a community of experts you can learn from.

Finally, start learning about deep learning. Many of the most interesting papers in the last few years have come out of that area and we’re only just beginning to see how the theory that has been around for decades is going to be put into practice.

Talking about data visualization, what is your view of the role it plays within data science? How important is it in the overall data science process?

Data visualization is pretty fundamental to every stage of the data science process. I think how it’s used in data exploration — viewing feature distributions — is fairly obvious and well-practiced, but people often overlook how important visualizations can be even in the modeling process.

Visualizations should accompany not just how we examine our data, but also how we examine our models! There are various metrics that we can use to assess model performance, but what’s really going to convince an end user is a visualization, not a number. That's what's going to instill trust in model decisions.

Standard introductions to machine learning lionize the ROC curve, but there are plenty of other charts out there that can help us understand what and how a model is doing: plotting predicted vs. actuals, lift charts, feature importance, partial dependence, etc. — this was actually the subject of my ODSC talk last year, which should be accessible on their website.
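As a rough illustration of one of the charts mentioned above, the sketch below computes the numbers behind a lift chart in plain JavaScript: rows are sorted by predicted score, split into bins, and each bin's actual rate is compared with the overall rate. The function name `liftTable`, the `predicted` and `actual` fields, and the sample rows are all hypothetical, made up for this example, and the sketch assumes at least one positive outcome in the data.

```javascript
// Lift table sketch: sort by predicted score (highest first), split into bins,
// and compare each bin's actual rate with the overall actual rate.
function liftTable(rows, numBins) {
  const sorted = [...rows].sort((a, b) => b.predicted - a.predicted);
  // Assumption: overallRate is non-zero (at least one positive outcome).
  const overallRate = rows.reduce((sum, r) => sum + r.actual, 0) / rows.length;
  const binSize = Math.ceil(sorted.length / numBins);
  const table = [];
  for (let start = 0; start < sorted.length; start += binSize) {
    const bin = sorted.slice(start, start + binSize);
    const meanActual = bin.reduce((sum, r) => sum + r.actual, 0) / bin.length;
    const meanPredicted = bin.reduce((sum, r) => sum + r.predicted, 0) / bin.length;
    table.push({
      bin: table.length + 1,
      meanPredicted,
      meanActual,
      lift: meanActual / overallRate, // above 1 means better than average
    });
  }
  return table;
}

// Hypothetical scored rows: a model score and the observed 0/1 outcome.
const scoredRows = [
  { predicted: 0.9, actual: 1 },
  { predicted: 0.8, actual: 1 },
  { predicted: 0.7, actual: 0 },
  { predicted: 0.6, actual: 1 },
  { predicted: 0.4, actual: 0 },
  { predicted: 0.3, actual: 0 },
  { predicted: 0.2, actual: 1 },
  { predicted: 0.1, actual: 0 },
];
const lift = liftTable(scoredRows, 4);
console.log(lift.map((b) => b.lift));
```

Plotting `lift` per bin gives the chart itself. A useful model shows the highest lift in the first bins and values near or below 1 in the last ones.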

A visualization that rank-orders the features that were most important to the predictive capacity of a model doesn’t just give you insight, it also helps you model better. You can use those top features to build faster and more accurate models. 
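One common way to get such a ranking is permutation-style importance: scramble one feature at a time and measure how much the model's error grows. The sketch below is a simplified version of that idea, not a description of how DataRobot computes importance. The names `rankFeatures` and `toyModel` and the sample rows are hypothetical, and the column is reversed instead of randomly shuffled so the result is reproducible.

```javascript
// Mean absolute error of a model (a function of a row) over a set of rows.
function meanAbsoluteError(model, rows) {
  return rows.reduce((sum, r) => sum + Math.abs(r.y - model(r)), 0) / rows.length;
}

// Importance of a feature = how much the error grows when that feature's
// column is scrambled. Assumption: the column is reversed rather than
// randomly shuffled, purely to keep this example deterministic.
function rankFeatures(model, rows, features) {
  const baseline = meanAbsoluteError(model, rows);
  const scores = features.map((feature) => {
    const scrambled = rows.map((r) => r[feature]).reverse();
    const perturbed = rows.map((r, i) => ({ ...r, [feature]: scrambled[i] }));
    return { feature, importance: meanAbsoluteError(model, perturbed) - baseline };
  });
  // Most important feature first.
  return scores.sort((a, b) => b.importance - a.importance);
}

// Hypothetical toy model that only uses x1; x2 plays no role in its predictions.
const toyModel = (row) => 2 * row.x1;
const rows = [
  { x1: 1, x2: 5, y: 2 },
  { x1: 2, x2: 1, y: 4 },
  { x1: 3, x2: 4, y: 6 },
  { x1: 4, x2: 2, y: 8 },
];
const ranking = rankFeatures(toyModel, rows, ["x2", "x1"]);
console.log(ranking);
```

Here `x1` comes out on top and `x2` scores zero, which is the kind of signal you would use to drop `x2` and retrain on a smaller feature set.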

What do you think will be the most important data visualization trend in the next couple of years?

Data is becoming ever more important basically everywhere, but popular and even expert understanding hasn’t quite kept up.

Data is slowly consuming us, pressing down from all angles like that Star Wars scene where Luke Skywalker and Princess Leia get crushed by trash. But are people able to actually interpret that data, or are they going to wordlessly nod along to the magical incantations of “data” and “algorithms”? 

As decisions and stories become increasingly data-driven, visualizations in the media are going to become more important. Visualizations are sort of inherently democratic.

Everyone who can see can understand a trend; math is an alien language designed to make us feel dumb. I think that in journalism, interactive storytelling — displaying data with a visual and narrative focus — is going to become even more ubiquitous and important than it already is. These visualizations will become even more interactive and possibly even gamified.

The New York Times did a really cool story where you had to draw a line to guess the trend for various statistics, like the employment rate, during the Obama years, before showing you the actual trend. This kind of quasi-gamified interactivity is intuitively more helpful than viewing an array of numbers.

Expert understanding will benefit from visualizations in the same way. Models are being deployed in high-stakes industries, like healthcare and insurance, that need to know precisely why they’re making a decision. They’ll need to either use simplified models that are inherently more intelligible, at the expense of accuracy, or have powerful tools, including visualizations, to persuade their stakeholders that model decisions can be interpreted.

The EU is working on “right to explanation” legislation, which allows any AI-made decision to be challenged by a human. So visualizations focused on model interpretability will become more important.

A few other things: as more and more businesses integrate with machine learning systems, visualizations and dashboards that monitor large-scale ML systems and tell users when models need to be updated will become more prevalent. And of course, we’re generating staggering amounts of new data every day, so visualizations that can accurately summarize that data while also allowing us to explore it in an efficient way (maybe also through unsupervised learning techniques like clustering and topic modeling) will be necessary.

Please tell us a bit about DataRobot, the company you work at.

We’re a machine learning startup that offers a platform data scientists of all stripes can use to build predictive models. I’m equal parts a fan of using the product and working on it, to be honest. The app makes it insanely easy to analyze your data, build dozens of models, use the myriad visualizations and metrics we have to understand which one will be the best for your use case, and then use that one to predict on new data.

The app is essentially an opinionated platform on how to automate your data science project. I say opinionated because it’s a machine that’s been well-oiled by some of the top data scientists in the world, so it’s an opinion you can trust. And as a data scientist, the automation isn’t something to fear. We’re automating the plumbing to allow you to focus on the problem-solving, the detective work. Don’t be a luddite! 

It’s really fun working on the product because you get to learn a ton about machine learning (both the theoretic and real-world applications) almost by osmosis. It’s like putting your textbook under your pillow while you sleep, except it actually works. And since data science is such a protean field, we’re also covering new ground and creating new standards for certain concepts in machine learning. There’s also a huge emphasis, embedded in our culture and our product, on — “democratizing” is abusing the term, but really putting data science into as many hands as possible, through evangelism, teaching, workshops, and the product itself.

Shameless promotional shout-out: we are hiring! If you’re into data or machine learning or Python or JavaScript or D3 or Angular or data vis or selling these things or just fast-growing startups with some cool eclectic people, please visit our website and apply!

As a data visualization engineer at DataRobot, what are the key design principles the company applies for development of its visualizations?

The driving design principle is functionality. Above all, will a user be able to derive an insight from this visualization? Will the insight be actionable? Will that insight be delivered immediately, or is the user going to have to bend over backwards scrutinizing the chart for its underlying logic, trying to divine from its welter of hypnotic curves some hidden kernel of truth? We’re not in the business of beautiful, bespoke visualizations, like some of the stuff the NYTimes does.

Data visualization at DataRobot can be tricky because we want to make sure the visualizations are compatible with any sort of data that passes through — and users can build predictive models for virtually any dataset — which means we have to operate at the right level of explanatory and visual abstraction. And we want users of various proficiencies to immediately intuit whether or not a model is performing well, which requires thinking about how a beginner might be able to understand the same charts an expert might expect. So by “functionality” I mean the ability to quickly intuit meaning.

That step is the second in a hierarchy of insight. The first is looking at a single-valued metric, which is only capable of giving you a high-level summary, often an average. This could be obfuscating important truths. A visualization, the second step, exposes these truths a bit further, displaying multiple values at a time over slices of your data, allowing you to see trends and anomalous spots. The third step is actually playing with the visualization. An interactive visualization confirms or denies previous insights by letting you drill down, slice, zoom, project, compare: all ways of reformulating the original view to gain deeper understanding. Interactive functionality is a sub-tenet of our driving design principle. It allows users to better understand what they’re seeing while also engaging them in (admittedly) fun ways.
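The first two steps of that hierarchy can be sketched in a few lines of plain JavaScript: one overall error number, then the same error broken out by slice. Everything here is hypothetical and for illustration only, including the `region` field, the helper names, and the sample predictions.

```javascript
// Step one: a single-valued metric (mean absolute error over all rows).
function meanAbsoluteError(rows) {
  return rows.reduce((sum, r) => sum + Math.abs(r.actual - r.predicted), 0) / rows.length;
}

// Step two: the same metric computed per slice of the data,
// which is what a chart over slices would display.
function errorBySlice(rows, sliceOf) {
  const groups = new Map();
  for (const row of rows) {
    const key = sliceOf(row);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  const result = {};
  for (const [key, group] of groups) result[key] = meanAbsoluteError(group);
  return result;
}

// Hypothetical predictions: accurate in the north, badly off in the south.
const predictions = [
  { region: "north", actual: 10, predicted: 10 },
  { region: "north", actual: 12, predicted: 12 },
  { region: "north", actual: 11, predicted: 11 },
  { region: "south", actual: 10, predicted: 18 },
];
const overall = meanAbsoluteError(predictions);
const bySlice = errorBySlice(predictions, (row) => row.region);
console.log(overall, bySlice);
```

The overall error of 2 looks modest, while the per-slice view shows that all of it comes from the south, which is exactly the kind of anomalous spot an average hides.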

During the ODSC in Boston, you will be presenting an intro to D3. Can you give us a heads-up? What is D3, and what are its main features and benefits?

D3 is a data visualization library built in JavaScript. It represents data in a browser interface by binding data to a webpage’s DOM elements. It’s very low-level, but there are plenty of wrapper libraries/frameworks built around it that are easier to use, such as C3.js or the much more sophisticated Plot.ly. If you find a browser-rendered visualization toolkit, it’s probably using D3 under the hood. D3 supports transitions and defines a data update function, so you can create really beautiful custom and dynamic visualizations with it, such as these simulations or this frankly overwrought work of art.
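The idea of binding data to elements and updating them can be sketched without D3 at all. The plain-JavaScript function below is only an illustration of a keyed data join, not D3's actual implementation: `dataJoin`, `existingKeys`, and the sample data are hypothetical names. In D3 itself the corresponding pieces are `selection.data(data, key)`, `selection.enter()`, and `selection.exit()`.

```javascript
// Plain-JavaScript illustration of a keyed data join (not D3's implementation).
// `existingKeys` stands in for elements already on the page;
// `data` is the new dataset; `key` extracts an identifier from a datum.
function dataJoin(existingKeys, data, key) {
  const existing = new Set(existingKeys);
  const incoming = new Set(data.map(key));
  return {
    // Data with no matching element yet: D3 exposes these via selection.enter().
    enter: data.filter((d) => !existing.has(key(d))),
    // Data matched to an element that already exists: the "update" set.
    update: data.filter((d) => existing.has(key(d))),
    // Elements whose data has gone away: D3 exposes these via selection.exit().
    exit: existingKeys.filter((k) => !incoming.has(k)),
  };
}

// Hypothetical state: elements "a", "b", "c" are on the page, new data arrives.
const joined = dataJoin(
  ["a", "b", "c"],
  [{ id: "b", value: 5 }, { id: "d", value: 9 }],
  (d) => d.id
);
console.log(joined);
```

In this example `d` would get a new element, `b` would be updated in place, and `a` and `c` would be removed, typically with a transition in each case.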

D3 was created by Mike Bostock as a continuation of his graduate work at Stanford. Check out the awesome examples.

Please share with us some details about the session. What will attendees get from it?

Attendees will learn the basics of how D3 works. They’ll come away with a visualization in a static HTML file representing some aspect of a real-world dataset, and a vague sense of having been entertained. I’m hoping the workshop will expose them to the tool and give them a place to start if they want to do more on their own. 

What are the prerequisites attendees should have to take full advantage of your session?

Having already downloaded D3 4.0 (4.0!!!!!) will be useful, but really just a working browser — I’ll be using Chrome — and an IDE or text editor of your choice. And a Positive Attitude™. 

Finally, on a more personal note, what's the best book you've read recently?

Story of O: a bildungsroman about a young French girl's spiritual growth. Very inspiring!

Thank you, Morgane, for your insights and thoughts.

Morgane's Intro to D3 workshop session will be part of the Open Data Science Conference, taking place in Boston, MA, from May 3 to 5.

A good excuse to visit beautiful Boston and have a great data science learning experience!


