Table 8 Consolidated table of experimental results on terminology analysis of EuropaCat abstracts set

From: Terminology spectrum analysis of natural-language chemical documents: term-like phrases retrieval routine

| n | N: total number of n-grams | N_TL: total number of term-like phrases (% of N) | N_GS: total number of general scientific terms (% of N_TL) | N_COMP: total number of phrases with tag «COMP» (% of N_TL) | N_CM: total number of phrases with OSCAR tag «CM» (% of N_TL) |
|---|---|---|---|---|---|
| 1 | ~5.15 × 10⁶ | 68,811 (~1.3 %) | 574 (0.8 %) | 8776 (12.7 %) | 40,354 (58.6 %) |
| 2 | ~4.94 × 10⁶ | 135,002 (~2.7 %) | 11,263 (8.3 %) | 5199 (3.9 %) | 52,641 (38.9 %) |
| 3 | ~4.74 × 10⁶ | 130,706 (~2.8 %) | 1031 (0.8 %) | 5194 (4 %) | 64,101 (49.0 %) |
| 4 | ~4.54 × 10⁶ | 118,893 (~2.6 %) | 41 (0.03 %) | 4064 (3.4 %) | 56,047 (47.1 %) |
| 5 | ~4.35 × 10⁶ | 94,546 (~2.2 %) | 5 (0.005 %) | 3390 (3.6 %) | 43,550 (46.0 %) |
| 6 | ~4.16 × 10⁶ | 58,775 (~1.4 %) | | 2469 (4.2 %) | 29,992 (51.0 %) |
| 7 | ~3.97 × 10⁶ | 46,224 (~1.2 %) | | 2403 (5.2 %) | 26,030 (56.3 %) |

  1. Number of texts: 6387; total amount of tokens: 5,148,124 (EuropaCat 2013, 2011, 2009, 2007, 2005)
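
The percentage columns are plain ratios against the count of term-like phrases. As a minimal sketch (not part of the original article), the snippet below recomputes them for rows n = 1 to 5, where all three tag counts are reported. The names `ROWS`, `share`, and `row_shares` are hypothetical helpers introduced only for illustration; the counts are copied from the table.

```python
# Counts copied from rows n = 1..5 of the table, in the order
# (term-like phrases, general scientific terms, «COMP» phrases, «CM» phrases).
# Rows 6 and 7 are left out because no general-scientific count is listed for them.
ROWS = {
    1: (68_811, 574, 8_776, 40_354),
    2: (135_002, 11_263, 5_199, 52_641),
    3: (130_706, 1_031, 5_194, 64_101),
    4: (118_893, 41, 4_064, 56_047),
    5: (94_546, 5, 3_390, 43_550),
}


def share(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


def row_shares(n: int) -> dict[str, float]:
    """Percentages of term-like phrases falling into each tag class for n-grams of length n."""
    n_tl, n_gs, n_comp, n_cm = ROWS[n]
    return {
        "GS": share(n_gs, n_tl),
        "COMP": share(n_comp, n_tl),
        "CM": share(n_cm, n_tl),
    }


if __name__ == "__main__":
    for n in ROWS:
        # Print unrounded-to-two-decimals values for comparison with the table.
        print(n, {tag: round(value, 2) for tag, value in row_shares(n).items()})
```

Comparing the printed values with the table is a quick consistency check on the transcribed counts; small differences in the last digit reflect rounding in the published figures.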