Table 2 Accuracy of chemical named entity recognition using the naïve Bayes approach, with texts represented by n-grams of five symbols and a context window of one token before and after the token under analysis

From: Chemical named entity recognition in the texts of scientific publications using the naïve Bayes classifier approach

| Type | N* | R** | IA*** (loo cv) |
|---|---|---|---|
| Abbreviation | 12,506 | 118 | 0.99 |
| Formula | 13,466 | 110 | 0.99 |
| Family | 19,017 | 78 | 0.97 |
| Systematic | 32,510 | 46 | 0.99 |
| Trivial | 25,140 | 59 | 0.98 |
| CNE | 102,639 | 14 | 0.98 |
| Non-CNE | 1,480,509 | 1.01 | 0.98 |

  1. * N is the number of text fragments used for training
  2. ** R is the ratio of the number of all tokens to the number of tokens belonging to a certain type; it indicates a measure of dataset imbalance
  3. *** IA is the invariant accuracy, estimated by leave-one-out cross-validation (loo cv)
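
The imbalance measure R from the footnotes can be illustrated with a minimal sketch. It assumes each token carries exactly one type label. The function name `imbalance_ratios` and the toy label counts are hypothetical and are not taken from the paper. The table reports N in text fragments rather than tokens, so its R values cannot be recomputed from the N column this way.

```python
from collections import Counter


def imbalance_ratios(token_labels):
    """Return R = (number of all tokens) / (number of tokens of a type) per type."""
    counts = Counter(token_labels)
    total = sum(counts.values())
    return {label: total / n for label, n in counts.items()}


# Hypothetical toy labelling of 100 tokens, not data from the paper.
labels = ["Non-CNE"] * 90 + ["Trivial"] * 6 + ["Formula"] * 4
ratios = imbalance_ratios(labels)

for label, r in sorted(ratios.items()):
    print(f"{label}: R = {r:.2f}")
```

A rarer type gives a larger R, and the dominant type gives a value close to 1, which matches the pattern in the table (R = 1.01 for Non-CNE, R = 118 for Abbreviation).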