The very short answer to this question is “Quite a long way, actually.”
That said, there are one or two rather important caveats to consider before you make your entire data science and advanced analytics strategy totally reliant on open source tools.
Compliance and Audit
Open source is not, by its very nature, supported and developed in a traditional commercial sense. This can cause problems and unwanted complications if you operate in an industry where compliance and/or legislation directly affects the way your business or organisation must make, and account for, day-to-day decisions.
More broadly, open source tool sets for advanced analytics are not always as strong at providing a robust audit trail of what has been done, by whom and when. Again, this can become extremely important in the context of compliance, legislation and transparency.
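To make the audit trail point concrete, here is a minimal sketch in Python of what recording "what, by whom and when" might look like if you had to build it yourself. Everything in it is an assumption for illustration: the names `audited`, `AUDIT_LOG`, `current_user` and `drop_missing` are hypothetical, and the in-memory list stands in for the durable, tamper-evident store a real compliance regime would expect.

```python
import functools
import getpass
import json
from datetime import datetime, timezone

# Hypothetical in-memory log; a real audit trail would need durable storage.
AUDIT_LOG = []


def current_user():
    """Best-effort lookup of the operating system user running the step."""
    try:
        return getpass.getuser()
    except Exception:
        return "unknown"


def audited(func):
    """Record which step ran, who ran it, when, with what inputs, and the outcome."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        entry = {
            "step": func.__name__,
            "user": current_user(),
            "started_at": datetime.now(timezone.utc).isoformat(),
            # default=str keeps non-JSON values (dates, custom objects) loggable
            "arguments": json.dumps({"args": args, "kwargs": kwargs}, default=str),
        }
        try:
            result = func(*args, **kwargs)
            entry["status"] = "ok"
            return result
        except Exception as exc:
            entry["status"] = f"failed: {exc!r}"
            raise
        finally:
            # Runs on success and on failure, so every attempt is recorded.
            AUDIT_LOG.append(entry)
    return wrapper


@audited
def drop_missing(rows):
    """Example analytical step: discard rows that contain a missing value."""
    return [row for row in rows if None not in row.values()]
```

Calling `drop_missing(...)` appends one entry to `AUDIT_LOG`. Even this small sketch leaves open the questions an auditor would ask next (who can alter the log, how long it is retained, how it links to the model version), which is the kind of gap described above.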
Deployment and collaboration across the entire business
Open source environments such as R and Python, along with a whole range of other open source, packaged analytical applications, often provide much-needed flexibility and choice of analytical approach. When considered in the context of zero capital outlay, this is clearly a significant boon. Where these programming environments and tool sets typically leave a gap is when you ask the question “how do I get the analytical process that I have built and the predictive model that I have developed to actually support decision making, across multiple channels, in day-to-day operations?”
Consider additionally that these processes may need to run at scheduled times, or perhaps in real time or near real time. You will likely need significant bespoke development to automate these processes and deploy them in a way that can actually influence outcomes and support decision making. This process management and workflow automation (whilst also providing a suitable platform through which to govern and review all of this activity) is not the sweet spot of open source environments, especially when you are focused on advanced analytics.
The reality is that the provision of an analytics platform for model deployment, decision management and analytical collaboration tends to be the preserve of the commercial providers, who lend the credibility of mature IT governance to data science driven processes.
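As a rough illustration of the bespoke development involved, the sketch below shows only the smallest piece of it in Python: turning a model's output into decisions that another system could act on. It is a sketch under stated assumptions, not a recommended design. The coefficients in `MODEL`, the field names (`customer_id`, `visits`, `complaints`), the 0.5 threshold and the function names are all hypothetical.

```python
import math
from datetime import datetime, timezone

# Hypothetical coefficients standing in for a model trained elsewhere.
MODEL = {"intercept": -1.0, "weights": {"visits": 0.5, "complaints": 1.0}}


def predict_probability(record, model=MODEL):
    """Logistic score for one record; fields missing from the record count as zero."""
    z = model["intercept"] + sum(
        weight * float(record.get(name, 0.0))
        for name, weight in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def score_batch(records, threshold=0.5, model=MODEL):
    """Turn raw records into timestamped decisions for a downstream channel."""
    scored_at = datetime.now(timezone.utc).isoformat()
    decisions = []
    for record in records:
        probability = predict_probability(record, model)
        decisions.append({
            "customer_id": record["customer_id"],
            "probability": round(probability, 4),
            # The business rule lives here, separate from the model itself.
            "action": "contact" if probability >= threshold else "no_action",
            "scored_at": scored_at,
        })
    return decisions
```

Note what is still missing: something has to trigger `score_batch` on a schedule or per request, deliver the decisions to each channel, handle failures, and keep track of which model version produced which decision. That surrounding plumbing is the work this section refers to.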
Analytical productivity and avoidance of expertise bottlenecks
R is almost certainly the most widely used open source toolset for advanced analytics. According to recent polls, when measured by number of users, R held almost 38% of the market. Judging by user numbers and the explosion of available advanced analytics tools, advanced and predictive analytics is clearly much more mainstream than it was a few years ago. That said, to make use of R and comparable tools the user really must have scripting or programming skills. This is a major barrier to mass adoption and usage in any organisation: we are not all going to be good analytics programmers.
The right blend of programming expertise and analytical understanding, combined with the appropriate level of subject matter expertise (i.e. the ability to make sense of the business problem at the heart of the analysis), is a fairly scarce resource. Unless you are able to find ways to expand the R and analytical skills available to your business, you will quickly create bottlenecks. Your business may well have developed an appetite for advanced analytics, and you may well have proven the value of prediction and data science with a few early projects. Great news!
The catch comes as you try to scale this to satisfy all of the demands for advanced analytics across the business. The answer lies in finding ways to leverage and scale the (typically) small central pool of programming and analytical skills so that the wider organisation can benefit from it as and when required. Addressing this type of productivity and scalability problem is not the usual strength of open source analytical tools.