Working together on large datasets

Europe/Stockholm
room MAX III, 4th floor (MAX IV Laboratory)

Fotongatan 2, 224 84 Lund
Description

Seminar on scientific collaboration on large datasets

Teams of scientists, typically geographically distributed, need to collaborate on large datasets. Within this short seminar two presentations will be given on this topic, followed by a discussion.

The seminar is organized in relation to the DataSTaMP project and within the MAX IV & NBI collaboration supported by eSSENCE.

Practical info

  • The seminar starts with an informal introduction and coffee at 13:15 on the 4th floor of the main MAX IV building. The talks start at 13:30 in the MAX III room.
  • Bus number 20 goes to MAX IV every 10-20 minutes. Read more about how to get here.
  • We kindly ask external participants to register before Friday, Dec 6, 12:00. This will help you to pass MAX IV reception faster.
    • 13:15 13:30
      Coffee 15m
    • 13:30 13:40
      MAX IV DataSTaMP project and MAX IV involvement in ExPaNDS 10m

      MAX IV is developing storage and data services for its users within the DataSTaMP project, thanks to funding from KAW. MAX IV is also one of 11 partner institutions in ExPaNDS, which aims to develop Photon and Neutron Data Services within the European Open Science Cloud (EOSC).

      A very brief introduction will be given about these projects.

      [DataSTaMP] https://www.maxiv.lu.se/accelerators-beamlines/technology/kits-projects/datastamp/
      [ExPaNDS] https://expands.eu/

      Speakers: Darren Spruce (MAX IV Laboratory), Magnus Klingberg, Zdenek Matej
    • 13:40 14:10
      Towards end-to-end data-management for large scale x-ray facilities 30m

      Large-scale scientific facilities, including x-ray facilities, face extreme growth in data from instruments. For x-ray instruments, data volumes grow exponentially with the increasing size of detectors, and there is another exponential factor from the frequency at which one may sample. Finally, x-ray sources are now robust enough that a large set of experiments can be automated, bringing a large increase in the number of experiments an instrument can perform in a session. Thus storing the data alone is a challenge.

      The challenges are compounded by the fact that detector size has grown to a resolution where samples, at least tomograms, cannot fit in the memory of a PC for data analysis, and thus must be moved onto server-class computers with sufficient memory to hold a raw-data sample and a processed version as well. The increase in data rate and number of experiments also means that running through all samples manually easily becomes infeasible, and some means of batch processing must be introduced.

      A final challenge is that the user base of x-ray facilities is widening, and many of the new users are not comfortable with data analysis and need to work with others on that part. This means that large communities, typically geographically distributed, need to collaborate on these very large datasets.

      The talk will present our ideas for an integrated solution to the above problems, including status and plans, and will also introduce, for discussion, an idea for how such an integrated system could help fight scientific fraud.

      The work is supported in part by the H2020 European training network MUMMERING.

      Speaker: Prof. Brian Vinter (Copenhagen University)
    • 14:10 14:40
      ICOS Carbon Portal 30m

      ICOS, the Integrated Carbon Observation System, is a European Research Infrastructure. It is the European measurement system for high-quality, high-precision greenhouse gas observations. The ICOS Carbon Portal provides free and open access to all ICOS data.

      Speakers: Karolina Pantazatou (ICOS ERIC (Carbon Portal, Sweden)), Maggie Hellström (Lund University (Carbon Portal, Sweden))
    • 14:40 15:00
      Discussion and coffee 20m