There is a new paradigm opening up completely new opportunities for discovery: a data-intensive approach to science. In many domains, we have entered what could be called the golden age of surveys, with several large-scale projects, finished, ongoing and planned, that together span decades. ESA leads, or is a major partner in, several of these initiatives.
There is, however, a profound new change: data itself has become a major technological challenge. Dataset sizes have grown by several orders of magnitude, so transferring the data to the scientist is often unfeasible.
ESA datalabs puts you in a privileged position: bring your code directly to ESA’s infrastructure, choosing from a wide set of tools and a flexible range of programming languages, and run it with direct access to ESA’s archives.
ESA datalabs offers a catalogue of datalabs you can use. They range from new tools that are quickly becoming de facto standards to older software that has been repackaged to run inside virtual computers. All are accessible from your web browser.
You can develop notebooks in JupyterLab and pick your programming language from a large and growing set. We have customised JupyterLab so that it is immediately useful, with no further setup, for astronomers, Earth observation scientists, and researchers in the navigation field. Alternatively, you can keep working in a different development environment, such as Octave, or keep using a reference tool such as TOPCAT in astronomy.
ESA datalabs lets you search ESA’s data holdings across domains using the ESA data discovery portal we are developing. Initially, you will be able to search for data collections in astronomy, Earth observation and navigation, and we expect to add representative data collections from all of ESA’s directorates.
A data collection is, in general, any set of digital assets; the best example is an official data release from an ESA mission. Once you find a collection, ESA datalabs lets you add it to your workspace: no data is transferred, and the collection appears as just another folder.
You can self-register for an ESA datalabs account. Accounts are free of charge and come with a private storage area, which is persistent and backed up on ESA’s servers. All your files are private by default and cannot be accessed by other users, but you can share any file with other registered users. For projects shared by a team, you can create a team workspace: a separate storage area where you control each member’s role (what they are allowed to do).
A hard scientific fact is constructed by a network of actors and artefacts that influence each other over time. It is increasingly accepted that collaboration, new forms of communication, and reproducibility are vital for scientific progress.
In a data-intensive domain, a finding is the outcome of a pipeline of computations applied to a large existing dataset, or to a dataset that researchers have built from several other large datasets, combined in unexpected and complex ways. Reproducing such a finding must start by repeating this “computational narrative” exactly as it was originally run.
This is a new kind of technical challenge: it requires packaging data, code and all dependencies so that the computation can be repeated years later, long after the original software has fallen out of use and data acquisition has ended.
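As a rough illustration of what the smallest piece of such packaging can look like, the following is a minimal sketch in generic Python, not an ESA datalabs feature or API. It assumes a Python analysis and records a “manifest” that a later re-run could check against: a SHA-256 checksum of each input file, the interpreter version, and the versions of the libraries the analysis depends on. The function names, the sample file `catalogue.csv` and the package list are hypothetical, chosen only for illustration.

```python
import hashlib
import json
import platform
import sys
from importlib import metadata
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data products need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_files, packages):
    """Hypothetical helper: record inputs, interpreter and library versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
        "inputs": {str(p): sha256_of(Path(p)) for p in data_files},
    }


if __name__ == "__main__":
    import tempfile

    # Hypothetical sample input standing in for a real data product.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "catalogue.csv"
        sample.write_text("id,ra,dec\n1,10.0,-5.0\n")
        manifest = build_manifest([sample], ["numpy"])
        print(json.dumps(manifest, indent=2))
```

A manifest like this only detects that inputs or library versions have changed; it does not by itself preserve the software environment, which is why full reproducibility also calls for keeping the environment (for example, a container image) alongside the code and data.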
ESA datalabs includes features to construct “computational narratives”, to support the internal dynamics of collaborating groups of scientists, and to help present data and evidence to the general public.