Challenges and solutions for data sharing in different research fields

23/09/2024

On 23 September 2024, as part of the Science Summit at the 79th United Nations General Assembly (UNGA79), COST co-organised the session ‘Hurdles to International Science Cooperation: Data Sharing & Management’ with the U.S. National Science Foundation’s AccelNet program.

Data sharing in research networks for international collaboration raises several key questions for scientists and researchers. Data have become an object of public interest, which calls for norms governing how data are shared, accessed, and used.

Data management in research networks: three key questions

1. Technical questions: definitions, operationalisation, structure, and standards

2. Organisational questions: who manages, maintains, and takes day-to-day decisions

3. Community questions: values, ethics and integrity, community definitions, and FAIR sharing

Speakers from COST Actions and AccelNet projects came together to show, through the lens of their research networks and experience, how data can be made shareable and open to serve research in an international collaborative context.

Challenges and solutions

What do archaeologists, AAL professionals, astrophysicists, grape genome experts, lawyers, oceanographers, environmental chemists, and specialists in earth observation have in common? Obstacles to data sharing and open collaboration on research data hinder the research process and progress in each of their fields.

Discover the data-sharing challenges our researchers have experienced in their respective fields, and the solutions their COST Actions and AccelNet projects are promoting:

Why is collaboration important?

International collaboration for data sharing is critical to advance science. But collaboration can lead to diverse results and serve different end goals for our research networks.

“To create a robust machine learning model, you need a massive data collection. To have data of such magnitude, the research community has to join forces. It cannot be done by individual research teams. A database that stores the data in one place is also needed. It is very challenging with such large datasets, but it is not impossible.”

Dr Martin Mokroš, 3DForEcoTech COST Action

“The sharing of complex ocean biological and biogeochemical data is a critical challenge in working towards sustainable management and stewardship of the planet. These data are an important record of the health of the ocean and are invaluable in understanding how the ocean basins are changing. BioGeoSCAPES aims to cooperate internationally to ensure data can be discovered and shared freely.”

Mak Saito, BioGeoSCAPES AccelNet project

“Challenges in data sharing and data management for developing Active and Assisted Living (AAL) technologies include ensuring data privacy, security, and interoperability across different systems and jurisdictions. But without access to high-quality datasets, the technology cannot advance, because high-quality training data is a precondition for building and refining the AI models that power AAL solutions.”

Prof. Liane Colonna, GoodBrother COST Action

“Sound data sharing for transboundary water security faces three challenges: equitable data access; data literacy; and data availability. Solutions include: making available open source data; education and public outreach efforts that help users understand the credibility of data sources and appropriate data use; and making available vital water-related data from the domains of food, energy, and human health.”

Abu Mansaray, PEER2PEER AccelNet project

“Fewer than five EU countries have repositories with the required specialist knowledge and mechanisms in place to ensure archaeological data will be freely and openly available for re-use by future generations of researchers. Failure to address this inequality means Europe will be divided into countries and regions whose archaeological research legacy is preserved, and countries and regions where it is irrevocably lost. This lack of equity hampers research collaboration.”

Dr Holly Wright, SEADDA COST Action

“We aim to combine data from 11 radio telescopes distributed across five continents to form the world’s most sensitive pulsar timing dataset. The process of combining data is very challenging scientifically, but even more so politically, as it requires clear guidelines for authorship, data sharing, and project leadership, and a willingness to transition from an environment of competition to one of collaboration.”

Maura McLaughlin, The International Pulsar Timing Array AccelNet project

“In viticulture, Europe is the driving force, but our international partners generate at least as much data as we do. The main challenge we are facing is poor description of the metadata associated with experiments and samples, for any kind of dataset. So for each standard, it is crucial that everybody uses it and even participates in its design.”

Jerome Grimplet, INTEGRAPE and GRAPEDIA COST Actions

Legislation, competition and artificial intelligence

Panelists discussed some of the less foreseeable impacts of the General Data Protection Regulation (GDPR) on research data and data sharing in their fields. Holly Wright gave the example that archaeologists risk losing photographic evidence of the history of archaeological exploration: images of people doing archaeology cannot be included in reports, because historical consent to use these photos does not exist. Questions remain about how to manage consent, especially for genomic data; for example, how to maintain consent when the data are to be reused five years later for a different but related purpose.

Top row, left to right: Holly Wright, Abu Mansaray, Martin Mokroš, Jerome Grimplet.
Bottom row, left to right: Kirsty Tinto, Mak Saito, Maura McLaughlin, Liane Colonna.

The researchers reflected on the tension between wanting to be fully open and share data, and the competition that exists within their fields and between research groups. How can you be open and welcoming but still protect your own research interests? “Climate change might be a great opportunity to help us overcome data sharing challenges. This problem is bigger than you and I, and it forces us to come together. We should look at shared problems that can unite us to try to solve them,” said Abu Mansaray.

The event closed with an exchange on how artificial intelligence is changing views and practices around data sharing. Persuading data holders and nations to open their data can be difficult, which creates a barrier when researchers need huge quantities of data to train AI models. The panelists shared positive examples of data holders granting access to data for AI model training without having to navigate the slow and difficult process of ‘opening’ the data officially. At the same time, the panelists supported sharing all data and code openly, to help ensure that ‘good’ AI algorithms are used rather than arbitrary ones that misinterpret the data.

The session was an opportunity to address these challenges, which require a multifaceted approach: improved policies, incentives for data sharing, technical solutions, and a cultural shift towards more open, collaborative research practices.

Additional information

Data sharing challenges in international cooperation

Saving European archaeology from the digital dark age

Improving the fruit of the vine