【Medical Paper】Predicting Microsatellite Instability in Colorectal Cancer Using AI

Hello everyone! I'm Sato, a surgeon who studies machine learning every day.

In this post, let's work through a paper published in Lancet Oncology in 2020 that builds a machine learning algorithm to predict microsatellite instability in colorectal cancer.

Here is the paper we'll cover.


Today's paper

Paper: Deep learning model for the prediction of microsatellite instability in colorectal cancer: a diagnostic study

Authors: Rikiya Yamashita et al.

Journal: Lancet Oncol 2020; 22: 132–41


＜Background＞

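Detecting microsatellite instability (MSI) in colorectal cancer is essential for clinical decision-making because it identifies patients who differ in treatment response and prognosis. Universal MSI testing is recommended, yet many patients remain untested. Broadly accessible, cost-effective tools are therefore needed to select patients for testing. Here, we examined the potential of a deep learning system that predicts MSI automatically and directly from haematoxylin and eosin (H&E)-stained whole-slide images (WSIs).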

＜Methods＞

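A deep learning model (MSINet) was developed using 100 H&E-stained WSIs (50 with microsatellite stability [MSS] and 50 with MSI), each from a patient randomly selected in a class-balanced manner from 343 patients who underwent resection of primary colorectal cancer at Stanford University Medical Center (Stanford, CA, USA; internal dataset) between Jan 1, 2015, and Dec 31, 2017. The model was validated internally on a holdout test set (15 H&E-stained WSIs from 15 patients: 7 MSS, 8 MSI) and externally on 484 H&E-stained WSIs from The Cancer Genome Atlas (479 patients: 402 MSS, 77 MSI), which included WSIs scanned at 40× and 20× magnification. Performance was evaluated mainly by sensitivity, specificity, negative predictive value (NPV), and area under the receiver operating characteristic curve (AUROC). The model's performance was compared with that of five gastrointestinal pathologists on a class-balanced subset of randomly selected 40× WSIs from the external dataset (20 MSS, 20 MSI).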
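Sensitivity, specificity and NPV all come straight from a 2×2 confusion matrix, treating MSI as the positive class. As a minimal sketch (the labels and predictions below are made-up toy data, not the study's, and `binary_metrics` is a hypothetical helper), they can be computed like this:

```python
# Minimal sketch: per-patient binary metrics with MSI as the positive class (1 = MSI, 0 = MSS).
# The labels and predictions are hypothetical toy data, not taken from the paper.

def binary_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, NPV) for 0/1 ground truth and 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of MSI tumours the model flags as MSI
    specificity = tn / (tn + fp)  # share of MSS tumours the model calls MSS
    npv = tn / (tn + fn)          # share of "MSS" calls that are truly MSS
    return sensitivity, specificity, npv

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
sens, spec, npv = binary_metrics(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} NPV={npv:.2f}")
```

With these toy labels it prints sensitivity 0.75, specificity 0.67 and NPV 0.80. NPV gets special attention in a screening setting because a negative call is what would let a patient skip confirmatory testing.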

＜Results＞

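The MSINet model achieved an AUROC of 0.931 (95% CI 0.771–1.000) on the holdout test set of the internal dataset and 0.779 (0.720–0.838) on the external dataset. On the external dataset, at a sensitivity-weighted operating point, it achieved an NPV of 93.7% (95% CI 90.3–96.2), a sensitivity of 76.0% (64.8–85.1), and a specificity of 66.6% (61.8–71.2). In the reader experiment (40 cases), the model achieved an AUROC of 0.865 (95% CI 0.735–0.995), whereas the mean AUROC of the five pathologists was 0.605 (95% CI 0.453–0.757).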
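Why is the NPV so much higher than the sensitivity or specificity? NPV depends on prevalence, and MSI is the minority class in the external cohort (77 of 479 patients). Below is a quick Bayes'-rule check, a sketch that plugs the reported sensitivity and specificity into that prevalence; it is an approximation, not the paper's own calculation, and `expected_npv` is a hypothetical helper.

```python
# Sketch: expected NPV from sensitivity, specificity and prevalence via Bayes' rule.
# Inputs come from the abstract; this is an approximate check, not the paper's own calculation.

def expected_npv(sensitivity, specificity, prevalence):
    """NPV = TN / (TN + FN), with TN and FN expressed as fractions of all patients."""
    true_neg = specificity * (1 - prevalence)   # MSS patients correctly called MSS
    false_neg = (1 - sensitivity) * prevalence  # MSI patients the model misses
    return true_neg / (true_neg + false_neg)

external_prevalence = 77 / 479  # MSI share of patients in the external (TCGA) dataset
for name, prev in [("external dataset", external_prevalence), ("balanced reader set", 0.5)]:
    print(f"{name}: prevalence {prev:.3f}, expected NPV {expected_npv(0.760, 0.666, prev):.3f}")
```

For the external dataset this gives an expected NPV of about 0.935, close to the reported 93.7% (the small gap likely reflects rounding and the exact counts used in the paper). If the same operating point were applied at the 50% prevalence of a class-balanced set, the expected NPV would drop to about 0.735.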

＜Conclusion＞

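Our deep learning model outperformed experienced gastrointestinal pathologists at predicting MSI from H&E-stained WSIs. Within the current paradigm of universal MSI testing, such a model could serve as an automated screening tool that triages patients for confirmatory testing; by reducing the number of patients tested, it could potentially cut test-related labour and costs substantially.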
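To make the "fewer patients tested" argument concrete, here is back-of-the-envelope arithmetic for a hypothetical cohort of 1,000 patients. It assumes the external dataset's MSI prevalence (77/479) and the reported operating point (sensitivity 76.0%, specificity 66.6%); `triage` is a hypothetical helper, and real-world prevalence and performance may differ.

```python
# Sketch: screening-triage arithmetic for a hypothetical cohort.
# Assumes external-dataset prevalence (77/479) and the reported operating point;
# real-world numbers would differ.

def triage(n_patients, prevalence, sensitivity, specificity):
    """Return (sent for confirmatory testing, not tested, MSI cases missed)."""
    msi = n_patients * prevalence
    mss = n_patients - msi
    flagged = sensitivity * msi + (1 - specificity) * mss  # model-positive: go on to MSI testing
    missed_msi = (1 - sensitivity) * msi                   # MSI cases the screen would not flag
    return flagged, n_patients - flagged, missed_msi

flagged, spared, missed = triage(1000, 77 / 479, 0.760, 0.666)
print(f"tested: {flagged:.0f}, not tested: {spared:.0f}, MSI missed: {missed:.0f}")
```

Under these assumptions it prints `tested: 402, not tested: 598, MSI missed: 39`: roughly 60% of patients would skip confirmatory testing, while about 39 of the roughly 161 MSI patients would not be flagged. That missed fraction (1 − sensitivity) is the trade-off any deployment would need to weigh.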

＜Background＞

Detecting microsatellite instability (MSI) in colorectal cancer is crucial for clinical decision making, as it identifies patients with differential treatment response and prognosis. Universal MSI testing is recommended, but many patients remain untested. A critical need exists for broadly accessible, cost-efficient tools to aid patient selection for testing. Here, we investigate the potential of a deep learning-based system for automated MSI prediction directly from haematoxylin and eosin (H&E)-stained whole-slide images (WSIs).

＜Methods＞

Our deep learning model (MSINet) was developed using 100 H&E-stained WSIs (50 with microsatellite stability [MSS] and 50 with MSI) scanned at 40× magnification, each from a patient randomly selected in a class-balanced manner from the pool of 343 patients who underwent primary colorectal cancer resection at Stanford University Medical Center (Stanford, CA, USA; internal dataset) between Jan 1, 2015, and Dec 31, 2017. We internally validated the model on a holdout test set (15 H&E-stained WSIs from 15 patients; seven cases with MSS and eight with MSI) and externally validated the model on 484 H&E-stained WSIs (402 cases with MSS and 77 with MSI; 479 patients) from The Cancer Genome Atlas, containing WSIs scanned at 40× and 20× magnification. Performance was primarily evaluated using the sensitivity, specificity, negative predictive value (NPV), and area under the receiver operating characteristic curve (AUROC). We compared the model’s performance with that of five gastrointestinal pathologists on a class-balanced, randomly selected subset of 40× magnification WSIs from the external dataset (20 with MSS and 20 with MSI).
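"Class-balanced" random selection means drawing the same number of patients from each class (50 MSS and 50 MSI for development, or 20 and 20 for the reader study). A minimal sketch, assuming a hypothetical pool with made-up patient IDs and a made-up MSS/MSI split, since the abstract does not report the pool's class counts; `class_balanced_sample` is a hypothetical helper.

```python
# Hypothetical sketch of class-balanced random selection: draw the same number of
# patients from each class. Patient IDs and the pool's MSS/MSI split are made up.
import random

def class_balanced_sample(labels, n_per_class, seed=0):
    """labels: dict patient_id -> 'MSS' or 'MSI'. Returns a shuffled, balanced list of IDs."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    selected = []
    for cls in ("MSS", "MSI"):
        candidates = sorted(pid for pid, lab in labels.items() if lab == cls)
        selected.extend(rng.sample(candidates, n_per_class))  # sampling without replacement
    rng.shuffle(selected)
    return selected

# Hypothetical pool of 343 patients, 65 of them MSI (illustrative numbers only).
pool = {f"P{i:03d}": ("MSI" if i < 65 else "MSS") for i in range(343)}
chosen = class_balanced_sample(pool, n_per_class=50)
print(len(chosen), sum(pool[p] == "MSI" for p in chosen))
```

It prints `100 50`: 100 patients, half of them MSI. Balancing matters because MSI is the minority class, so a purely random draw from the pool would contain comparatively few MSI examples.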

＜Findings＞

The MSINet model achieved an AUROC of 0·931 (95% CI 0·771–1·000) on the holdout test set from the internal dataset and 0·779 (0·720–0·838) on the external dataset. On the external dataset, using a sensitivity-weighted operating point, the model achieved an NPV of 93·7% (95% CI 90·3–96·2), sensitivity of 76·0% (64·8–85·1), and specificity of 66·6% (61·8–71·2). On the reader experiment (40 cases), the model achieved an AUROC of 0·865 (95% CI 0·735–0·995). The mean AUROC performance of the five pathologists was 0·605 (95% CI 0·453–0·757).
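Every AUROC above comes with a 95% CI. The abstract does not say how these intervals were computed; a percentile bootstrap is one common approach, shown here as a sketch on synthetic scores sized like the 40-case reader study. The AUROC itself uses its Mann–Whitney interpretation: the probability that a randomly chosen MSI case scores higher than a randomly chosen MSS case. `auroc` and `bootstrap_ci` are hypothetical helpers.

```python
# Sketch: AUROC via the Mann-Whitney formulation plus a percentile bootstrap 95% CI.
# The paper's CI method is not stated in the abstract; bootstrap is one common choice.
# Scores and labels are synthetic.
import random

def auroc(labels, scores):
    """P(random MSI score > random MSS score), counting ties as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample cases with replacement and recompute AUROC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample must contain both classes
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic reader-study-sized example: 20 MSS (label 0) and 20 MSI (label 1).
rng = random.Random(42)
labels = [0] * 20 + [1] * 20
scores = [rng.gauss(0.0, 1.0) for _ in range(20)] + [rng.gauss(1.0, 1.0) for _ in range(20)]
point = auroc(labels, scores)
lo, hi = bootstrap_ci(labels, scores)
print(f"AUROC {point:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

With only 40 cases, expect a wide interval, in line with the broad CIs reported for both the model (0·735–0·995) and the pathologists (0·453–0·757).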

＜Interpretation＞

Our deep learning model exceeded the performance of experienced gastrointestinal pathologists at predicting MSI on H&E-stained WSIs. Within the current universal MSI testing paradigm, such a model might contribute value as an automated screening tool to triage patients for confirmatory testing, potentially reducing the number of tested patients, thereby resulting in substantial test-related labour and cost savings.