How Scientists Evaluate What Counts as Good Evidence

Interdisciplinary work is now routine. Deep interdisciplinary understanding is not.

When scientists from different fields disagree, the disagreement is often framed as a failure of rigor or competence. In practice, it more often reflects different assumptions about what good evidence looks like, which risks are most serious, and which kinds of uncertainty are tolerable.

These assumptions are rarely explicit. They are learned through training, peer review, and informal norms. When they collide, scientists can talk past each other without realizing they are using different evidentiary yardsticks.

Evidence standards are shaped by constraint

Every discipline develops its own evidentiary norms in response to the problems it can realistically study.

For example:

  • clinical research prioritizes controlled interventions to isolate causal effects
  • ecology and climate science often rely on observational inference because manipulation is impossible
  • economics leans on natural experiments when randomization is infeasible
  • machine learning emphasizes benchmark performance when prediction is the goal
  • qualitative social science values depth, triangulation, and interpretive coherence

None of these standards are arbitrary. They are pragmatic adaptations to constraint.

Problems arise when these context-specific standards are treated as universal.

Why the same study can look rigorous and weak at the same time

A single study can inspire confidence in one field and skepticism in another.

This often reflects different views about which errors matter most. Some disciplines prioritize avoiding false positives; others worry more about missing real effects. These priorities shape acceptable sample sizes, tolerance for noise, and what counts as sufficient evidence.

When scientists argue about rigor, they are often arguing about risk management, not quality.
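
A rough way to make this trade-off concrete is a standard power calculation: the same hypothetical effect demands very different sample sizes depending on which error a field is most determined to avoid. The sketch below assumes Python with statsmodels; the effect size and thresholds are illustrative, not drawn from any particular field.

```python
# A minimal sketch (assuming statsmodels is installed) of how error priorities
# change what counts as an "adequate" sample size for the same effect.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
effect_size = 0.3  # hypothetical, modest standardized effect (Cohen's d)

# A field chiefly worried about false positives: stricter significance threshold.
n_strict = analysis.solve_power(effect_size=effect_size, alpha=0.005, power=0.8)

# A field chiefly worried about missing real effects: conventional threshold, higher power.
n_sensitive = analysis.solve_power(effect_size=effect_size, alpha=0.05, power=0.95)

print(f"n per group, prioritizing few false positives: {n_strict:.0f}")
print(f"n per group, prioritizing few missed effects:  {n_sensitive:.0f}")
```

Neither answer is "more rigorous" in the abstract; each reflects a different judgment about which mistake is costlier.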

The limits of evidence hierarchies

Formal hierarchies of evidence can be useful teaching tools. They become misleading when applied without judgment.

A randomized trial answers some questions extremely well and others poorly. Large observational datasets can reveal patterns that experiments cannot. Qualitative work can surface mechanisms that quantitative studies depend on.

Good evidence is not evidence that ranks highest on a pyramid. It is evidence that is well matched to the question being asked.

Replication means different things in different fields

Even core concepts like replication shift across disciplines.

In some fields, replication means repeating an experiment under identical conditions. In others, it means:

  • observing similar patterns using different datasets
  • testing robustness under alternative assumptions
  • converging on similar results through independent theoretical or computational work

Disagreements about replication often persist because participants assume a shared definition that does not exist.

Causality is not always the right standard

Another common source of confusion is the role of causality.

In some domains, causal inference is the central objective. In others, the goal is prediction, description, or system characterization. Problems arise when causal standards are imposed where they do not fit, or when causal language is used loosely where it should be constrained.

The question is not whether causality matters. It is whether it is the appropriate standard for the claim being made.

Why these disagreements escalate

Evidence disputes often feel sharper than they need to be because methods are tied to professional identity.

Questioning a field’s evidentiary norms can sound like questioning its legitimacy. As a result, methodological disagreements can quickly become personal, even when no such intent exists.

Recognizing this dynamic can defuse conflict without lowering standards.

Reading evidence outside your own field

When engaging with research from outside your discipline, a useful shift is to ask different questions.

Instead of asking whether the evidence would convince your field’s reviewers, ask:

  • what problem the field is trying to solve
  • what constraints shape its methods
  • which errors it fears most
  • what trade-offs it has accepted

This reframing often reveals that what looks like weak evidence is actually evidence optimized for a different risk profile.

Using cross-disciplinary evidence responsibly

Good interdisciplinary work makes its translations explicit.

That often means:

  • stating which evidentiary standards are being applied
  • acknowledging where evidence weakens under other norms
  • avoiding overconfident generalization
  • preserving methodological context when citing results

These practices strengthen arguments rather than dilute them.

A broader view of rigor

Rigor is not uniformity. It is alignment between question, method, and inference.

When scientists talk past each other, it is rarely because one side does not value evidence. It is because each side values different protections against error.

Making those differences visible is not a compromise of rigor. It is a prerequisite for serious interdisciplinary science.
