
The case for sharing research outputs is clear. When data and code are available, results can be tested, extended, and trusted more easily. Funders and journals are asking for it, and the community is moving in that direction. Yet despite good intentions, many datasets are never released and much analysis code remains private.
It isn’t because researchers are opposed to openness. It’s because sharing is complicated. There are questions about ethics, about privacy, about licensing. There are also practical hurdles like formats, storage, and the simple fact that preparing files takes time. This article looks at the main reasons scientists hold back and what can be done to make sharing less of a burden.
If you ask around, the answers are fairly consistent.
Some worry about losing control of their work, or about someone else publishing an analysis before they do. Others hesitate because their data is messy or their code isn’t polished, and they don’t want to release something that could be misread. Then there are the very ordinary problems: not knowing which repository to use, or not having time to prepare files when deadlines are pressing.
These concerns don’t mean researchers are against open science. They mean that the process is not straightforward, and the risks feel real.
When studies involve people, ethics come first. Consent forms often don’t mention sharing, which makes it difficult to release data later. Even when personal identifiers are stripped out, there are cases where individuals can still be identified by combining datasets. Genetic information and health records are especially sensitive.
That doesn’t make sharing impossible. Some researchers release metadata so others at least know the data exists. Others prepare de-identified subsets or summaries. Controlled access repositories are another option, letting qualified users apply for permission. The principle is simple: openness where possible, protection where necessary.
Even when ethics aren’t an issue, technical details can trip researchers up.
Old file formats or proprietary software can make a dataset impossible to open just a few years later. Metadata is often incomplete, which leaves future users guessing about what variables actually mean. And versioning is a constant problem. Without clear records, it’s hard to know which dataset or script produced the results that ended up in a paper.
The fixes are fairly ordinary: use open formats like CSV or JSON, add documentation, and track versions. They don’t solve everything, but they make it far more likely that others can actually reuse the work.
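To make that concrete, here is a minimal sketch in Python using only the standard library. The file names, variable names, and metadata fields are illustrative, not a standard: the point is that the data lands in an open CSV format, the variables are documented in a small machine-readable "data dictionary", and the version is recorded explicitly.

```python
import csv
import json

# Hypothetical measurements for a small dataset.
rows = [
    {"sample_id": "S01", "temp_c": 21.4, "ph": 7.1},
    {"sample_id": "S02", "temp_c": 22.0, "ph": 6.9},
]

# 1. Open format: plain CSV, readable by virtually any tool.
with open("measurements_v1.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["sample_id", "temp_c", "ph"])
    writer.writeheader()
    writer.writerows(rows)

# 2. Documentation: describe what each variable actually means,
#    including units, so future users are not left guessing.
metadata = {
    "dataset": "measurements",
    "version": "1.0.0",  # 3. Versioning: bump on every released change
    "variables": {
        "sample_id": "Unique sample identifier",
        "temp_c": "Water temperature in degrees Celsius",
        "ph": "pH measured at time of sampling",
    },
}

with open("measurements_v1.meta.json", "w") as f:
    json.dump(metadata, f, indent=2)
```

Even this small amount of structure answers the questions a reuser asks first: what format is it, what do the columns mean, and which version produced the published results.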
Licensing is another area that creates hesitation. A dataset without a license sits in a legal grey area: some people will avoid it to be safe, while others will reuse it without giving credit.
Adding a license provides clarity. Creative Commons options like CC BY are common for data and simply require attribution. For code, licenses such as MIT or Apache 2.0 make reuse straightforward while setting basic terms. The important point is that a license signals permission and expectations. It reassures both the author and future users.
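Beyond including a LICENSE file, the choice can also be recorded in machine-readable metadata so repositories and reusers can detect it automatically. The sketch below writes a minimal `datapackage.json`; the field names follow the Frictionless Data "Data Package" convention, and the package name is illustrative.

```python
import json

# Record the chosen license alongside the dataset in a
# machine-readable form (Frictionless "Data Package" style).
package = {
    "name": "measurements",  # hypothetical dataset name
    "licenses": [
        {
            "name": "CC-BY-4.0",
            "title": "Creative Commons Attribution 4.0",
            "path": "https://creativecommons.org/licenses/by/4.0/",
        }
    ],
}

with open("datapackage.json", "w") as f:
    json.dump(package, f, indent=2)
```

A plain-text LICENSE file does the job for human readers; the metadata version simply makes the same permission visible to tools as well.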
Sometimes data and code are shared but then hidden in places where nobody thinks to look. Files stored on a personal website or buried in supplementary materials are easy to overlook.
Digital Object Identifiers (DOIs) make outputs easier to find and easier to cite. They also connect research together. A paper can point to the dataset DOI, the dataset can link back to the paper, and both can reference the code. Instead of scattered files, you end up with a connected package that others can navigate.
Fragmentation is one of the most frustrating barriers. Manuscripts on one server, data in another repository, code on GitHub. Over time links break, and readers who want to reproduce the work are left with gaps.
Integrated platforms are a way around this. DeSci Publish is an example. It allows manuscripts, data, and code to be uploaded together. Each output gets a DOI, and the platform links them automatically. Licensing and metadata are included in the workflow, which keeps the whole research package connected and easier to use.
Sharing research data and code will never be effortless, but it also doesn’t have to feel impossible. Ethical and technical issues matter, and they won’t go away. With careful preparation and tools that reduce fragmentation, the barriers become smaller. What remains is the value of research that can be trusted, tested, and reused.