Leveraging scriptable infrastructures: towards a paradigm shift in software for data science
Cloud computing is the answer to the explosion of big data. While the cloud provides practically unlimited scalability for storage, several questions associated with the growth of big data remain partly or fully unanswered: "How will we analyze all this data?" "How can we analyze it virtually?" "How can we leverage the programmability and elasticity of the cloud infrastructure to enhance the flexibility and capabilities of the software tools we use?" "Will we be able to build and publish scientific services and applications on top of models and data as easily as we blog?" "How will we snapshot data transformations and analyses, make them reproducible, and easily undo and redo them?" "Will we be able to achieve software convergence and make our data analysis tools communicate and work for us in synergy?" "How will we view and analyze data collaboratively, and how will we share the artifacts produced?" Elastic-R (www.elastic-r.net) aims to answer these questions. For the benefit of both academia and industry, the Elastic-R platform transforms Amazon EC2 into a ubiquitous collaborative environment for data analysis and computational research. It makes acquiring, using and sharing all the capabilities required for statistical computing, data mining and numerical simulation easier than ever: the cloud becomes a user-friendly, Google-Docs-like platform where all the artifacts of computing can be produced by any number of geographically distributed real-time collaborators and can be stored, published and reused. The presentation will give an overview of this new platform, and applications in bioinformatics and finance will be demonstrated.