
30 Nov 2021 ~ 3 min read

Thoughts on data annotation


The frustrations

There are many frustrations in medical image annotation. To name a few:

  • MANAGING data/datasets is hard: as I mentioned in the challenge of medical image analysis, the data space is huge, the sources are many, and the quality varies.
  • MANAGING annotations is also hard: you have a team of annotators with different expertise and styles, and you need to align both their ability AND their style across multiple rounds/versions.
  • MANAGING tasks is also hard: you can’t label everything AT ONCE, because that is too laborious and impossible to review, but eventually you need to piece the labels together.

But those are mostly diligence issues, not skill issues. Hence I describe them as frustrations, not challenges.

The challenge

I argue there is only one challenge: allocation of expertise.

Namely, you have cases that are tedious and laborious, which require patience and dexterity; and you have cases that are expertise-heavy, which require years of training.

This is less prevalent in natural-image CV tasks, where most tasks are solvable with common sense.

You may raise an eyebrow here: what if I just hire an all-expert annotation team, and all problems are solved? But as a personal anecdote, I found that this is not only uneconomical, it also does not yield the best results. As I mentioned in the challenge of medical image analysis, experts have different styles and are hard to align (to each other or to your rules).

I believe the challenge itself stems from trying to solve two problems at the same time, namely: what does it look like, and what actually is it?

After working closely with annotators for years, I came to realize that labeling medical images can roughly be divided into two general types:

The laborious work

e.g. finding abnormalities

This line of work requires patience and dexterity rather than expertise. A short amount of background training (within days, if not hours) is usually sufficient for annotators to get started.

Examples range from scrolling slice by slice through vast CT abdomen scans for “blob-like” lung nodules,

labelling liver vessels,

ILLUSTRATION of liver vessels. Left: what you want to label. Right: what you start from (a model prediction). Figure from a paper, used here in a different context.

to, at the extreme, shading all tubular lung vessels pixel by pixel.


Unsurprisingly, it is sometimes the annotators WITHOUT clinical expertise who produce better-quality annotations, due to the sheer amount of attention and energy they invest.

The expertise work

e.g. differential diagnosis

This is where real clinical expertise comes in, and it takes years to master.
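To make "allocation of expertise" a bit more concrete, here is a minimal sketch of routing tasks by work type. Everything in it is my own illustration rather than a real tool: the names `WorkType`, `Task`, `Annotator` and `allocate` are hypothetical, and the round-robin assignment and the fallback to experts when no non-expert annotators are available are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class WorkType(Enum):
    LABORIOUS = "laborious"  # needs patience and dexterity
    EXPERTISE = "expertise"  # needs years of clinical training


@dataclass
class Task:
    task_id: str
    work_type: WorkType


@dataclass
class Annotator:
    name: str
    is_clinical_expert: bool


def allocate(tasks, annotators):
    """Assign each task round-robin within the pool matching its work type."""
    experts = [a for a in annotators if a.is_clinical_expert]
    non_experts = [a for a in annotators if not a.is_clinical_expert]
    pools = {
        WorkType.EXPERTISE: experts,
        # Assumption: experts only take laborious work if nobody else can.
        WorkType.LABORIOUS: non_experts or experts,
    }
    counters = {work_type: 0 for work_type in WorkType}
    assignment = {}
    for task in tasks:
        pool = pools[task.work_type]
        if not pool:
            raise ValueError(f"no annotator available for {task.work_type.value} work")
        index = counters[task.work_type] % len(pool)
        assignment[task.task_id] = pool[index].name
        counters[task.work_type] += 1
    return assignment


# Hypothetical team and task list, for illustration only.
annotators = [
    Annotator("A", is_clinical_expert=False),
    Annotator("B", is_clinical_expert=False),
    Annotator("C", is_clinical_expert=True),
]
tasks = [
    Task("nodule-search-001", WorkType.LABORIOUS),
    Task("vessel-mask-002", WorkType.LABORIOUS),
    Task("differential-003", WorkType.EXPERTISE),
]
plan = allocate(tasks, annotators)
print(plan)
```

The point of the sketch is only the split itself: laborious tasks go to the trained non-expert pool first, and expert time is reserved for the expertise work.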


Hi, I'm Qianyi. I'm an ML engineer based in Beijing. Read more about me on my website.