-
Immersive Teaching and Research in Data Sciences via Cloud Computing
Track: Real Life Cloud Computing
Location: Legends Ballroom - Robinson-Whitman
Abstract:
Many talk about cloud computing, some try, yet only a few succeed, because cloud computing follows a new paradigm that needs to be learned and understood. This talk is about what cloud computing means to academic teachers and researchers in data sciences and how to take advantage of it. With public cloud computing, a new era for research and higher education begins. Scientists, educators and students can now work on advanced high-capacity technological infrastructures without having to build them or to comply with rigid and limiting access protocols. Thanks to the cloud's pay-per-use and virtual machine models, they rent the resources and the software they need for the time they want, get the keys to full ownership, and work and share with little limitation. In addition, the centralized nature of the cloud and the users' ubiquitous access to its capabilities should make it straightforward for every user to share any reusable artifacts with others. This is a new ecosystem for open science, open education and open innovation. What is missing is bridging software. We propose such software to help data scientists, educators and students take advantage of this new ecosystem: R, Python, Mathematica, spreadsheets, etc. are made accessible as articulated, programmable and collaborative components within a virtual research and education environment (VRE). The result is astonishing and requires some adaptation in the way we think: teachers can easily prepare interactive learning environments and share them like documents in Google Docs; students can share their sessions to solve problems in collaboration. Costs may be hidden from the students by giving them temporary access to shared institution-owned resources or by using tokens that a teacher can generate from institutional cloud accounts. This includes online courses. The talk includes examples using Amazon EC2 and Microsoft's Azure, such as:
1. Constructing a collaborative environment articulated around an R session to teach statistics
2. Creating enhanced spreadsheets to demonstrate variations of chemical molecules
3. Creating and sharing interactive dashboards for financial analysis
4. Reproducible research via VREs
Many things that used to be in the hands of large organizations or corporations, such as science gateways and big data processing, are now within the reach of any talented analyst, teacher, or researcher. Come and see.
-
Leveraging scriptable infrastructures, Towards a paradigm shift in software for data science.
Track: Applied Data Science
Location: Grand Ballroom - Salon A/B
Abstract:
Cloud computing is the answer to the explosion of big data. While the cloud provides infinite scalability for storage, several questions associated with the growth of big data remain partly or fully unanswered: "How will we analyze all this data?" "How can we analyze it virtually?" "How can we leverage the programmability and elasticity of the cloud infrastructure to enhance the flexibility and capabilities of the software tools we use?" "Will we be able to produce and publish scientific services and applications on top of models and data as easily as we blog?" "How will we easily snapshot, make reproducible, undo and redo data transformations and analyses?" "Will we be able to achieve software convergence and make our data analysis tools communicate and work for us in synergy?" "How will we view and analyze data collaboratively, and how will we share the produced artifacts?" Elastic-R (http://www.elastic-r.net) aims to answer these questions. For the benefit of both academia and industry, the Elastic-R platform transforms Amazon EC2 into a ubiquitous collaborative environment for data analysis and computational research. It makes the acquisition, use and sharing of all the capabilities required for statistical computing, data mining and numerical simulation easier than ever: the cloud becomes a user-friendly Google-Docs-like platform where all the artifacts of computing can be produced by any number of geographically distributed real-time collaborators and can be stored, published and reused. The presentation gives an overview of this pioneering platform and demonstrates applications in bioinformatics and finance.