DataMind goes to DataCamp

We’re happy to announce that effective immediately, we’ve officially changed our startup’s name from DataMind to DataCamp.

It was obvious from the start that we did not want to become the next consultancy firm, one in a row of many, that offered training and learning services on the side. We believed the time was ripe to build a company within the field of data science that had education and training as its sole core. A company that would develop tailored educational technology, and use it to offer something more exciting than the traditional two-week seminars or long monotonous webinars (depending on which of the two you can afford). The vision was to build a tailored online learning platform that offered students and professionals an engaging, learning-by-doing environment where they could build their knowledge through in-browser coding and exercises.

Today, it seems like there is indeed room for a vision like ours. Every day, more and more (soon-to-be) data analysts are finding their way to our free interactive intro to R course, and based on the increasing retention figures we have (at least) the impression that they like the interactive learning approach a lot. This traction allowed us to make improvements faster, and just recently we managed to get out of the beta stage.

So why the name change? In the process of building the learning platform and spreading the word about it to students, professionals and academics, we learnt that a more professional image would benefit us if we wanted to access bigger players in the market, more funding sources, and better mentors. So for the benefit of the project’s growth and future we decided to switch names. Instead of the playful domain name DataMind.org you can now find us on the more professional DataCamp.com.

We felt the timing was right because in the upcoming months we’re releasing some interesting new features to the online interactive learning platform (like a new gamification system). Even more exciting is that we recently started working together with Coursera professors on how to integrate DataCamp with their courses. This will hopefully allow even more students and starting data scientists to become familiar with the power and benefits of R. But more on that in our next post…

We hope you’ll love our new name as much as we do!

@DataCamp_com
Linked-in
Website

 

Complete list of Coursera courses using R ranked by “popularity”

Coursera, an online education startup, has rapidly expanded its curriculum of statistics and data analysis courses. Today, there are already 33 modules directly linked to the field, excluding the courses where statistics and data science are used solely as a supporting tool (e.g. finance). These courses make use of a range of software, such as Python, MATLAB and of course R.

I decided to make a list of all Coursera courses that use R either as their first choice, or as one of the statistical software packages students are allowed to use for the homework assignments. Coursera does not publish all data on how many students enroll in their courses, but most (some?) courses reach well over a hundred thousand students each year.

To have some kind of indication of their popularity, I list below all courses using R ranked by the number of Facebook likes:

Ranking | Course title | Professor | University | Facebook likes | Tweets
1 | Social Network Analysis | Lada Adamic | University of Michigan | 12000 | 3543
2 | Statistics One | Andrew Conway | Princeton University | 9600 | 1421
3 | Computing for Data Analysis | Roger Peng | Johns Hopkins University | 8500 | 1934
4 | Data Analysis | Jeff Leek | Johns Hopkins University | 5200 | 1408
5 | Introduction to Data Science | Bill Howe | University of Washington | 2600 | 1103
6 | Introduction to Computational Finance and Financial Econometrics | Eric Zivot | University of Washington | 2100 | 351
7 | Mathematical Biostatistics Boot Camp 1 | Brian Caffo | Johns Hopkins University | 1400 | 239
8 | Statistics: Making Sense of Data | Alison Gibbs & Jeffrey Rosenthal | University of Toronto | 1400 | 243
9 | Asset Pricing | John H. Cochrane | University of Chicago Booth | 855 | 102
10 | Mathematical Methods for Quantitative Finance | Kjell Konis | University of Washington | 635 | 92
11 | Case-Based Introduction to Biostatistics | Scott L. Zeger | Johns Hopkins University | 424 | 110
12 | Financial Engineering 2 | Martin Haugh & Garud Iyengar | Columbia University | 109 | 13
13 | Data Analysis and Statistical Inference | Mine Çetinkaya-Rundel | Duke University | 80 | 18
14 | Core Concepts in Data Analysis | Boris Mirkin | Higher School of Economics | 77 | 15
15 | Mathematical Biostatistics Boot Camp 2 | Brian Caffo | Johns Hopkins University | 60 | 21

Because Coursera’s search function was of little help, I had to compile the list above by hand. Therefore, it is possible I overlooked some of the courses. Feel free to mention them in the comment section, and I will make sure to update the list. In case you are interested in taking (or teaching) interactive data analysis courses, make sure to have a look at our own educational startup DataMind.

While I expect that most of you are familiar with Coursera, for those who are not, here is a quick summary: Coursera is one of the leading providers of Massive Open Online Courses (MOOCs). Today they have more than 100 institutional partners offering over 500 courses to over 5 million students worldwide. So despite being criticized by some, it is becoming more and more clear that they are here to stay.

R-Fiddle: An online playground for R code

www.R-fiddle.org is an early stage beta that provides you with a free and powerful environment to write, run and share R code right inside your browser. It even offers the option to include packages. Over the past few days it has been gaining more and more traction, and it was mentioned on the front page of Hacker News.

We designed it for those situations where you have code that you need to prototype quickly and then possibly share with others for feedback. All this without needing a user account, or any scrap projects or files! We even included a very easy-to-use ‘embed’ function for blogs and websites, so your visitors can edit and run R code on your own website or blog. This is the first version of R-fiddle, so do not hesitate to give us feedback.

Working together with the help of R-fiddle

You can use R-fiddle to share code snippets with colleagues when tossing around ideas, to find that annoying bug, or to make your own variations on other people’s code. It’s easy: just go to http://www.R-fiddle.org, type your code, and get your public URL by pressing ‘share’. This is a lot easier for your potential troubleshooter or colleague, since (s)he can immediately run and check the code, save it once finished and share it again. So by sharing your R code through R-fiddle, you not only help others to better understand your code, but they can also help you!

Embedding an R-fiddle in your blog or website

Embedding the interactive code of your fiddle on a website or blog is easy. R-fiddle automatically generates a piece of code that you can then simply paste in your HTML at the desired place.

You can choose between two ways to embed the code: with or without the console. If you embed a fiddle with the console, your visitors can edit and run your code within the environment of your own site. If you embed a fiddle without the console, your visitors can see the code with a link to the r-fiddle website where they can edit and run it. For more information on how to embed interactive code, just check the documentation at http://www.r-fiddle.org/#/help

The R-fiddle working environment

Working with R-fiddle is very straightforward. The page consists of two sections. The main section of the site (on the left) is divided into two areas: the editor and the console. The editor is where you put your code. Both work just like the standard editor and console you are familiar with from your IDE. For example, the editor colour-codes the syntax. The right pane is the discussion area. Here others can comment on your code, make suggestions, or ask questions. You can immediately see the comments others made, making collaboration easy.

The R-fiddle buttons

The R-fiddle interface provides plenty of features to assist in your development. The buttons at the top of the page include:

  • Save: Clicking Save activates the Embed and Share buttons. You always have to click Save first; that’s when R-fiddle knows things are getting serious.
  • Embed: This allows you to embed your code on your website or blog with the help of an iframe.
  • Share: This allows you to share code from the R-fiddle page with other users. You can share it through a web link, Facebook and Twitter. These users can then provide feedback or even adapt/fix your code within their own browser.
  • Run: Executes the code entered in the editor, and displays the results in the console area.
  • Graph: Here you can find any graphs created by your code.

 In conclusion:

With this quick tour of R-fiddle, we hope to have given you a better understanding of what it provides and why you should use it. Please be aware that R-fiddle is a hosted application in beta, so performance can degrade during peak usage. As R-fiddle usage increases, we will add more servers as soon as possible. Check out www.R-fiddle.org today, and you will discover its power!

For any questions or suggestions, do not hesitate to contact us at info@datamind.org

Building DataMind: FREE Online Interactive Learning Platform for R

DataMind is the first free interactive online learning platform for R. Through an in-browser coding environment we offer exercise-based learning-by-doing. Our goal is to build a fun learning experience for data analysis and R, while allowing anyone to create courses! You can check out an early stage beta version at www.DataMind.org !

With DataMind, we focus on three things: (1) make the educational experience interactive and fun for students, (2) make the platform and the content available for free, and (3) stimulate content creation by the community (you! Drop us a line if you are interested in creating courses; the course creation interface is a work in progress). Our focus on interactivity and fun is driven by our belief that you learn data analytics by doing! We do not believe in copying the classroom online. That is why all our courses are constructed around an in-browser coding interface, allowing users to start coding R from day one with the help of instant feedback. Over time, challenges and competitions will be added to courses as well, so users can also interact with each other.

We were inspired to start this project by innovative start-ups that offer interactive web development courses. These start-ups put a focus on learning-by-doing through in-browser coding, elements of gamification, and community-provided content. It turned out this approach was a huge hit, but we got frustrated that it didn’t exist for R and data analysis. Having experience in teaching statistics, we were convinced data analytics education could greatly benefit from such a didactic approach that focuses on learning-by-doing. In addition, the data science industry itself is experiencing a huge increase in popularity. And last but not least, we strongly believe data analytics and its visualisation need a somewhat tailored learning approach compared to web development.

So we started coding!

We are developing DataMind in such a way that it supports, and even stimulates, content creation by the community. The key success factor of an online learning platform is the strength of the available content. Today, R is used in many domains that are often relatively unrelated (e.g. finance and biostatistics). With community content generation, experts from these diverse fields can share and create interactive content much faster and of much higher quality than we could ever do ourselves. For you as a course creator, it’s a scalable way to spread knowledge, build reputation and provide a fun learning experience to your students. In other words, we need you 😉

Where do we stand today? At www.DataMind.org you can check out an early stage beta version of the platform and enroll in our first course ‘Summer of R’. ‘Summer of R’ is aimed at those new to R who want to master the basics so they can start doing their own analysis. Furthermore, we’re working very hard on the course creation interface so everyone can start creating interactive courses soon.

If you feel enthusiastic about this project and want to create interactive courses, whether for academic purposes, professional reasons or just for fun, or if you have suggestions, feedback or questions, do not hesitate to send an e-mail to info@datamind.org. (We would love feedback!)

www.DataMind.org

P.S. The technical infrastructure behind DataMind will be covered in a future post.

?help! Instant R search on Rdocumentation.org

Last week, I was working on an educational R project when I needed to consult the help files of different R packages and functions online. After some Google searches, it became clear that finding an easy-to-use tool was not as simple as I had expected. The closest I got were the websites Inside-R and R search, but as a user I did not find them as “smooth” as what I was looking for. (I needed something really user-friendly for this educational project.) Therefore, inspired by the documentation websites of programming languages/frameworks such as Ruby on Rails and AngularJS, I decided to build an online documentation search interface for R myself, together with colleagues. Check the result on www.Rdocumentation.org!

Checking R documentation online instead of with the built-in R help function can often provide some extra benefits. First, you can search through the latest version of all R packages, even those that are not installed on your device. This makes it not only a help tool, but also a tool for discovery. Second, I added the discussion system ‘Disqus’. For every function and package, Disqus allows users to ask questions, add extra examples to the documentation, etc. Furthermore, today’s web development tools allow you to build a more user-friendly interface. Especially for an R beginner this can be helpful. And last but not least, since R is a “one letter word”, googling for “R” + “something” is always a challenge. Having all the documentation in one place can at least eliminate that frustration.

I wrote the code for www.Rdocumentation.org together with some colleagues. It is quite dirty code since it only needed to get the job done, but if you are interested, just send me a request. Also, while coding, we discovered the great staticdocs package by Hadley Wickham. It was not exactly what we needed, but maybe it can be used for other/similar initiatives. For all packages on CRAN, the help files were generated in HTML. Next, these HTML files were parsed and inserted into an SQL database. We opted for Ruby on Rails to build the web app that serves all the documentation on R packages and functions. Finally, using jQuery and Twitter Bootstrap, we built the instant search tool that allows you to see all R packages and R functions immediately while typing.

My colleagues and I hope that with Rdocumentation.org we have given the R community a new and useful tool. Just let us know if you have any suggestions on how we can further improve Rdocumentation.org.
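The pipeline described above (HTML help files parsed, stored in an SQL database, then queried while the user types) can be sketched roughly as follows. This is only an illustration under stated assumptions and not the actual Rdocumentation.org code: it uses Python with the standard-library `sqlite3` and `html.parser` modules instead of Ruby on Rails and jQuery, the table layout and all function names are hypothetical, and the sample help pages are made up.

```python
import sqlite3
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Capture the text of the first <h2> element of a help page."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and self.title is None:
            self.in_h2 = True
            self.title = ""

    def handle_data(self, data):
        if self.in_h2:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False


def parse_title(html):
    """Return the first <h2> text of an HTML help page ("" if none)."""
    parser = TitleParser()
    parser.feed(html)
    return (parser.title or "").strip()


def build_index(pages):
    """Insert one row per help page into an in-memory SQL database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE topics (package TEXT, name TEXT, title TEXT)")
    for package, name, html in pages:
        db.execute("INSERT INTO topics VALUES (?, ?, ?)",
                   (package, name, parse_title(html)))
    db.commit()
    return db


def instant_search(db, typed, limit=10):
    """Return topics whose name starts with what has been typed so far."""
    # Escape LIKE wildcards, since R names often contain "_" (geom_point).
    escaped = typed.replace("!", "!!").replace("%", "!%").replace("_", "!_")
    rows = db.execute(
        "SELECT package, name, title FROM topics "
        "WHERE name LIKE ? ESCAPE '!' ORDER BY name LIMIT ?",
        (escaped + "%", limit),
    )
    return rows.fetchall()


# Made-up sample pages standing in for the generated HTML help files.
PAGES = [
    ("base", "mean", "<html><body><h2>Arithmetic Mean</h2></body></html>"),
    ("stats", "median", "<html><body><h2>Median Value</h2></body></html>"),
    ("base", "merge", "<html><body><h2>Merge Two Data Frames</h2></body></html>"),
    ("ggplot2", "geom_point", "<html><body><h2>Points</h2></body></html>"),
]

db = build_index(PAGES)
for row in instant_search(db, "me"):
    print(row)
```

In a real web app, a query like `instant_search` would sit behind an HTTP endpoint that the browser calls on each keystroke; a prefix match on the function name is just one plausible way to get "results while typing".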

Survey on R and education

Main takeaways from the survey on R in education

  • There is a need to train students in R, since a large majority of respondents (using R professionally) expect its market share to increase further.
  • There is large interest from both academics and students in online interactive R and statistics courses. Interest is highest in free courses; only a small fraction is interested in paid courses.
  • Most R package authors are interested in creating interactive R tutorials for free.

Introduction

In this post, we briefly summarize and discuss the results of our survey on “R and education”. Before diving into the figures, we would like to express our sincere gratitude and appreciation to the 286 R enthusiasts that invested their valuable time to fill out this survey. Furthermore, you can download the complete dataset of the survey or browse an overview of all questions (see bottom of the post for more information), so feel free to do your own analysis, and share it. Note that the right panel of this page provides the answers to some open-ended questions in the survey.

Interestingly, respondents came from diverse backgrounds, both geographically and in terms of occupation. The left panel of Figure 1 shows that respondents are mainly active as academics (50.5%), followed by professionals (30%) and students (19.5%). Academics from about 80 different universities, mainly located in the US and Europe, participated. About 24 respondents were R package authors.

The online survey was distributed through the R mailing lists and our personal contacts. Figure 1 also shows the geographical origin of the respondents. Individuals from four continents participated, with the majority based in the US. Although there is selection bias when conducting an online survey in this way, we believe the current diversity of respondents is interesting and adds some flavor to the results.

Next, we first discuss the main takeaways regarding the respondents’ views on R in general. A more focused section follows on R and education. To end, we discuss the next steps we want to undertake based on this survey’s results.

R survey - respondents origin and occupation
Figure 1 – Academics, Professionals and students from around the globe filled out the survey.

Why you love R and expect its market share to go up

Figure 2 – Expected market share evolution of R  (Professionals group)
Figure 2 – Majority of Professional respondents expects increase in market share of R

Respondents (from the group “professionals that use R”) are very optimistic when asked about the future spread of R in the world, as illustrated in Figure 2. An impressive 79.7% of respondents expect future usage to go up in comparison to other statistical packages such as SAS and SPSS, only 11.9% expect it to remain stable, and just 3.4% take a pessimistic view, expecting it to go down.

Figure 3 shows that respondents (from the group “professionals that use R”) mainly love R because of its functionality (86.2%) and the community (65.5%). Other reasons to love R cited under “other” include “many packages”, “cross platform” and “wonderful for graphics”. All that glitters is not gold though. When asked about their biggest frustration when using R, only 19% answered “Nothing, R is perfect”. The biggest frustrations reported by respondents are “the lack of documentation” (29.3%) and “the lack of consistency” (22.4%). A large number of respondents (34.5%) provided an open-ended response to this question as well. We listed these open-ended responses in the right panel of this page, together with the open-ended responses on what respondents consider the main disadvantages of R.

What people love and hate about R
Figure 3 – Aspects respondents (Professionals) “love” and “hate” about R

Major interest in online learning and teaching R

“R best matches the concept of ‘computational thinking’, a core idea that my students need”

Academic respondent

Whether you are completely new to R, or you are a veteran with multiple years of experience, there is always room to learn and improve. As illustrated in Figure 4, the main source for developing new R skills is online resources such as websites and online communities. This is true for both academics (92.4%) and professionals (94.9%). The second most cited educational source is the built-in R help feature, mentioned by 77.2% of the academics and 83.1% of the professionals. Textbooks, which can be seen as a more traditional way to learn and teach, are placed third.

Sources to improve and update R skills
Figure 4 – Main sources of information to improve and update R skills (92 academic & 59 professional respondents)

Today, numerous online courses on statistics are already making use of the R language to explain data analytics concepts. Some of the most noteworthy and successful examples are the Coursera courses from Roger D. Peng (Computing for Data Analysis) and Eric Zivot (Introduction to Computational Finance and Financial Econometrics). This proven need for online educational sources for statistics and R raises the question of whether it would be possible to identify different and even more engaging ways to learn R online. The ‘R in Education’ survey indicates that over 75% of students are interested in taking online courses with an interactive component. Of the academic respondents, 68.6% show interest in online interactive courses and 13% would be willing to pay for these courses (see Figure 5). Our survey results are thus in line with the observation that online interactive courses as offered by codecademy.com, codeschool.com, etc. have gained enormous popularity recently.

Interest in online interactive courses on R and statistics
Figure 5 – Majority of respondents interested in free interactive courses
Willingness to create interactive tutorials for R packages
Figure 6 – Most package authors willing to create free R courses

Naturally, in open-source communities most things are developed and offered for free. As noted in the previous paragraph, interactive online courses would be a valuable addition to the current spectrum of R’s educational sources. Since our results indicate that demand for free courses would be high, the question arises: who will develop these free courses? A reasonable place to look is among people already developing free software, such as the R package authors. Indeed, 70% of R package authors in the survey indicated that, provided an easy-to-use development platform exists, they would be willing to create such interactive learning tools for their packages for free (note that the sample is small though). Therefore, it might be interesting to develop and eventually provide such a platform as a way to spread data analytics knowledge in general, and the R statistical programming language in particular.

New educational tools to teach R and statistics?

This survey largely confirmed our belief that there is a need for more online educational tools to teach R. These tools should take into account the added value of an interactive approach, as well as the characteristics and benefits of an open-source community. Therefore, we started working on an open interactive exercise platform for statistics and R.
To receive updates on our future progress, or if you are willing to provide us with feedback while building this learning platform, please leave your e-mail address below.

Extra information

  • Download the full dataset of the survey here. The dataset is structured as follows: qla is a list in which each list-item contains the information of exactly one question in the survey. Each list-item in qla is itself again a list with the following items:
    • First list-item: The question asked
    • Second list-item: The answer possibilities
    • Third list-item: The data with the answers. Rows for respondents, columns for answers.

    NOTE: For privacy reasons we removed all information from the dataset that could result in identification of the respondents (e.g. emails, university affiliation,..). Please contact us in case we overlooked something.

  • Have a look at a summary of the results of all questions in the survey.
  • The graphics in this post were generated with the R package ggplot2, see code.
  • Errata list:
    We would like to offer our apologies for the following errors that ended up in the survey:

    • If you selected that R is more complex to learn than other statistical languages, one of the follow-up questions stated that you had indicated R was less complex to learn.
    • In order to better target the questions and to avoid making the survey even longer, we opted to mostly ask different questions to each type of respondent (Students/Academics/Professionals). Therefore, it is often not possible to make comparisons between the different types of respondents, which is a pity in hindsight.
    • …. (?)
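
To make the dataset layout described under "Extra information" more concrete, here is a small sketch of the same nesting: a list of questions, where each question holds the question text, the answer possibilities, and a respondents-by-answers table. It is written in Python purely for illustration, since the real `qla` object is an R list (in R the third item of the first question would presumably be reached with something like `qla[[1]][[3]]`). The question text, the 0/1 encoding of the answers and the numbers are all made up and are not the survey data.

```python
# Toy stand-in for `qla`: one entry per survey question (made-up data).
qla = [
    [
        "What do you love about R?",              # first item: the question asked
        ["Functionality", "Community", "Other"],  # second item: answer possibilities
        [                                         # third item: rows = respondents,
            [1, 1, 0],                            #             columns = answers
            [1, 0, 0],                            # (assumed encoding: 1 = selected)
            [0, 1, 1],
            [1, 0, 0],
        ],
    ],
]


def answer_shares(question_item):
    """Return the percentage of respondents who selected each answer."""
    question, options, data = question_item
    n_respondents = len(data)
    return {
        option: 100.0 * sum(row[col] for row in data) / n_respondents
        for col, option in enumerate(options)
    }


for item in qla:
    print(item[0], answer_shares(item))
```

Percentages like the ones quoted in this post could be recomputed in this way, one question at a time, assuming the answer table really is coded as indicators.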