Table 3 Exceptions for strict filtering procedure

From: Terminology spectrum analysis of natural-language chemical documents: term-like phrases retrieval routine

| No. | Exception | Description | Examples (terms) | Examples (rubbish) |
|---|---|---|---|---|
| 1 | Facet_Index_4digits | Token denotes a substance containing a 4-digit facet index. The list of chemical element symbols is used (Table 1) | RU(0001); CO(0001)-CARBIDE; α-FE2O3(0001) | HPG1800B; RYC-2008-03387; 20000H-1 |
| 2 | Miller_Index_3digits | Token denotes a substance containing a 3-digit crystallographic Miller index. The list of chemical element symbols is used | CEO2(111); PT(111); AU{111}-CEO2{100}; (NI,AL)(111); AL2O3/NIAL(110) | R873; 50WX8-100; 270-470OC |
| 3 | Substances_3digits | Token denotes a chemical containing 3 digits in succession. The list of chemical element symbols and regular expressions such as «EL/\{\d{3}\}» are used | 15N218O; H235S; H218O-SSITKA; H216O/H218O | FA100; TSVET-500; CE-440 |
| 4 | Isotopes | Token denotes an isotope. The lists of stable isotopes and chemical element symbols are used (Table 1) | 13C CP-MAS NMR; 12C16O-13C16O MIXTURE; 31P MAS NMR SPECTROSCOPY | 04,21H; 11H; 11HV; 1 %18O2; -1H-1; 57CO |
| 5 | Substances_2digits | Token denotes a substance whose name begins with one or two digits | 5-PENTANEDIOL; 2-AMINOBENZENE-1,4-DICARBOXYLATE; 5-BROMO-3-(N,N-DIETHYLAMINO-ETHOXY)-2-METHYLINDOLE | 2R,3S; 2LFH; 5NICZPOL; 1KPM; 4-CP |
| 6 | Catalysts | Token denotes a catalytic system, i.e. a chemical composition containing the «.» character | 1.5AU/C; 1.0CUCOK/ZRO2; CE0.9PR0.1O2; CU0.2CO0.8FE2O4; MG3ZN3.-XFE0.5AL0.5; LAFE0.7NI0.3O3-Δ; CE0.8GD0.2O2-Δ; MN0.8ZR0.2 | VOL. %; (B)2.5 %; DISP.[%] |
| 7 | Comp | Token denotes a chemical or catalyst composition. The tag «COMP» is used | 20 %CU/ZNAL; 0.4 %PD/AL2O3; 4 %PT-4 %RE/TIO2; (5 %)PB(10 %)-SBA15 | 50 %AIR; 1.5 %WT; 0-2.5MOL %; CA.23 % |
| 8 | Cryst_hydrates | Token denotes a crystalline hydrate. Regular expressions such as «*[A-Za-z].*H2O$» are used | AL(NO3)3*6H2O; FE2(SO4)3.9H2O; AUCL4(NH4)7[TI2(O2)2(CIT)(HCIT)]2.12H2O | 0.6 %H2O; 0.03 %C3H6; 0.06286*T |
| 9 | SpatialDimension | Token denotes a 1-, 2- or 3-dimensional method or pattern | 2D-SAXS; 2D-GC; 1D-3D COPPER – OXIDE; 1D-STRUCTURE; 1D COPPER – OXIDE | 12-MR; 1LATTICE; 16ACR; 60HPW |
| 10 | Names | Token denotes a proper name. A set of regular expressions is used for recognition | BRØNSTED ACID; BRӦNSTED BASIC SITE; MӦSSBAUER SPECTROSCOPY | L’ARGENTIЀRE; PROCESS’S |
| 11 | OscarTags | True if a token has any Oscar tag and matches one of the following regular expressions: «\-[A-Za-z]{2}», «\{», «\[*[A-Za-z]», etc. | STEM-HAADF; L-CYSTINE; DI-TERT-BUTYLPEROXIDE; [AU(EN)2]2[CU(OX)2]3 | 128°- Y-ROTATED; π- BACKDONATION; CONVERSION(%); CU(1)MN; M1(2); ACTIVITY [2] |

  1. EL: designation of any chemical element; IS: designation of any stable isotope
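
To illustrate how exceptions of this kind could be checked, here is a minimal Python sketch of three of them (Facet_Index_4digits, Miller_Index_3digits and Cryst_hydrates). It is not the authors' implementation. The short element list, the helper names `index_pattern` and `matched_exceptions`, and the exact patterns are all assumptions made for illustration. In particular, the hydrate expression from the table is read here as `.*[A-Za-z].*H2O$`, because a leading `*` is not valid in Python's `re` syntax.

```python
import re

# Assumption: a handful of element symbols standing in for the full list (Table 1).
ELEMENT_SYMBOLS = ["RU", "CO", "FE", "CE", "PT", "AU", "NI", "AL", "O", "C", "H"]
# Try longer symbols first so that "CO" is preferred over "C".
EL = "(?:" + "|".join(sorted(ELEMENT_SYMBOLS, key=len, reverse=True)) + ")"


def index_pattern(n_digits):
    """Element symbol, optional stoichiometric digits, an optional closing
    parenthesis, then an n-digit index enclosed in () or {}."""
    return re.compile(EL + r"\d*\)?[({]\d{" + str(n_digits) + r"}[)}]")


FACET_4 = index_pattern(4)   # hypothetical pattern for Facet_Index_4digits
MILLER_3 = index_pattern(3)  # hypothetical pattern for Miller_Index_3digits
# Assumed reading of the table's hydrate expression: a letter somewhere
# before a trailing "H2O".
HYDRATE = re.compile(r".*[A-Za-z].*H2O$")


def matched_exceptions(token):
    """Return the names of the sketched exceptions that the token satisfies."""
    names = []
    if FACET_4.search(token):
        names.append("Facet_Index_4digits")
    if MILLER_3.search(token):
        names.append("Miller_Index_3digits")
    if HYDRATE.match(token):
        names.append("Cryst_hydrates")
    return names


if __name__ == "__main__":
    for tok in ["RU(0001)", "(NI,AL)(111)", "FE2(SO4)3.9H2O", "0.6 %H2O", "R873"]:
        print(tok, matched_exceptions(tok))
```

With these assumed patterns, the "terms" examples for the three rows are accepted and the "rubbish" examples are rejected. A token such as 0.6 %H2O fails the hydrate check because no letter precedes the trailing H2O.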