 Software
 Open access
AdductHunter: identifying protein-metal complex adducts in mass spectra
Journal of Cheminformatics volume 16, Article number: 15 (2024)
Abstract
Mass spectrometry (MS) is an analytical technique for molecule identification that can be used for investigating protein-metal complex interactions. Once the MS data are collected, the mass spectra are usually interpreted manually to identify the adducts formed as a result of the interactions between proteins and metal-based species. However, with increasing resolution, dataset size, and species complexity, the time required to identify adducts and the error-prone nature of manual assignment have become limiting factors in MS analysis. AdductHunter is an open-source web-based analysis tool that automates the peak identification process using constraint integer optimization to find feasible combinations of protein and fragments, and dynamic time warping to calculate the dissimilarity between the theoretical isotope pattern of a species and its experimental isotope peak distribution. Empirical evaluation on a collection of 22 unique MS datasets shows fast and accurate identification of protein-metal complex adducts in deconvoluted mass spectra.
Introduction
Mass spectrometry (MS) is a well-established analytical technique for chemical identification and molecular weight determination of various analytes [1]. The experimental output is a mass spectrum consisting of intensity values at corresponding mass-to-charge ratios (m/z). Analysis of small molecules by electrospray ionization (ESI)-MS, one of the most widely used MS techniques, results in mostly singly charged ions. In the case of proteins or other biomolecules, which have much higher molecular weights, charge state envelopes are formed from ions that have different charge states but originate from the same molecule. The isotopes of the elements present in the protein and its adducts change the isotope peak pattern for each peak, forming Gaussian-type profiles. Due to the complexity of such spectra, maximum entropy deconvolution [2, 3] is used as a preprocessing step; it facilitates the analysis of proteins by reconstituting the charge state envelope of each detected species into neutral mass peaks.
MS has proven particularly valuable in characterizing metallodrug interactions with proteins, e.g., protein-metal complex stoichiometry, adduct composition, binding sites, and structural changes [4,5,6,7,8,9,10,11,12]. For current metallodrugs to progress toward clinical development, it is crucial to understand their pharmacological properties, notably metallodrug-protein interactions [13,14,15].
However, interpreting mass spectra is typically done manually, which can be time-consuming, tedious, and error-prone due to the complexity of mass spectra, in particular for reactive species that can undergo changes not only upon interacting with proteins, but also by reaction with matrix components or with solvent molecules during the analysis process.
Software solutions have been explored to automate the identification of protein adducts. However, Analysis of Protein Modifications from Mass Spectra (\(\hbox {Apm}^2\)s) [16] is targeted at proteomics workflows, and pyOpenMS [17] is a mass spectrometry-based proteomics analysis tool that is not specifically designed for identifying protein-metal complex adducts. The Nesvizhskii lab and collaborators have created a suite of software for proteomics and metabolomics applications [18,19,20,21,22], which is well supported for these applications. mMass [23] and pyQms [24] are either again focused on proteomics or metabolomics and limited to a narrow set of inputs, or have had no further development and support in recent years.
Therefore, AdductHunter is introduced here as a web-based tool that automates the identification of protein-metal complex adducts in deconvoluted mass spectra. To the best of our knowledge, it is the first tool of its kind for this purpose.
Implementation
AdductHunter is a web-based tool that automates the identification of protein adducts in deconvoluted mass spectra (see Fig. 1). It requires a series of input files and parameters, returning a downloadable output file that contains a list of (feasible) species corresponding to different peaks in the input spectrum (see Fig. 2 for its general algorithm). These species are sorted by their similarity to the experimental peaks as scored by closeness of fit (loss) to isotope pattern and mass error.
AdductHunter is freely accessible on GitHub under an MIT license or at adducthunter.wickerlab.org and was created using Python 3. It depends on several Python packages, namely pyOpenMS [17], OR-Tools [25], SciPy [26], and Flask [27]. In this section, we outline the specifics behind AdductHunter’s implementation, alongside examples of a well-studied protein/metallodrug system [4, 5, 28,29,30], namely ubiquitin (Ub) incubated with cisplatin (cis-Pt(\(\hbox {NH}_3\))\(_2\)\(\hbox {Cl}_2\)), to provide clarity on usage.
Input files
Three input files are required: (1) the deconvoluted mass spectrum, for example obtained using Maximum Entropy Deconvolution in Bruker DataAnalysis to produce a charge-neutral spectrum [2, 3]; (2) a file that lists the protein and any atoms, ions, and solvents contained in the sample and their corresponding constraints, such as charge, coordination number, and the number of expected adducts formed; and (3) a description of the standard adducts involved in the sample and their corresponding constraints, which is expected to be very similar for most experiments. These files must be of .xlsx or .csv type and are assumed to have the correct formatting (see Tables 1, 2, 3 and Additional Files 1 and 2 for examples of the expected layout).
Peak identification
AdductHunter begins by identifying peaks within the mass spectrum. Users are required to specify three parameters involved in this process: (1) the (normalized) noise threshold or minimum peak height; (2) the minimum distance between adjacent peaks in atomic mass units; and (3) whether a linear recalibration of the spectrum is required using known peaks as internal standards. In the case that a recalibration is applied, all mass-to-charge values are shifted equally, either according to the difference between the (theoretical) peak isotopic mass of the protein and the closest identified isotopologue peak in the mass spectrum, or a user-specified value.
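The uniform shift used for recalibration can be sketched in a few lines. This is a minimal illustration under stated assumptions, not AdductHunter's own code: the function name `recalibrate` and its arguments are hypothetical, and the masses in the usage lines are placeholders.

```python
def recalibrate(masses, detected_peaks, theoretical_mass, shift=None):
    """Shift every mass value by the same offset.

    If no user-specified shift is given, the offset is the difference between
    the theoretical peak isotopic mass of the protein and the closest
    detected peak.
    """
    if shift is None:
        closest = min(detected_peaks, key=lambda m: abs(m - theoretical_mass))
        shift = theoretical_mass - closest
    return [m + shift for m in masses], shift


# Placeholder values: the detected protein peak sits 0.5 Da below theory.
shifted, offset = recalibrate(
    masses=[8563.5, 8581.5],
    detected_peaks=[8563.5, 8581.5],
    theoretical_mass=8564.0,
)
```

Passing an explicit `shift` reproduces the user-specified option described above.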
With these parameters set, peaks are identified in a two-step process. The spectrum intensities are first normalized to the most abundant peak, then peaks exceeding the minimum height threshold are identified using SciPy’s peak detection function, which yielded similar results to several recently reported MS peak detection algorithms [31,32,33]. Higher-resolution MS, however, picks up a much greater number of low intensity peaks, leading to more peaks having an intensity larger than the noise threshold and a significant number of false positive peaks. As a result, a second filtering step was included in the peak identification process. Filtering uses the minimum distance between peaks to remove peaks belonging to the same species, ensuring isotope peaks within the same isotope pattern are only detected once, a feature increasingly relevant in mass spectra collected with higher resolution instruments (see Fig. 3). Additionally, users can specify to only return detected peaks with at least one feasible species in the output.
The peak isotopic mass refers to the highest nominal mass peak by intensity-weighted average of the hyperfine mass distribution (at each integer mass) of a species. The most abundant isotopologue typically matches that of commercial isotope pattern predictors on the scale of \(10^{-3}\) parts per million (ppm), e.g., the Bruker isotope pattern generator [34]. These mass values are later used in the constraint optimization formulation to linearly approximate the true mass value of a species.
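The two-step peak identification can be illustrated with a small, dependency-free sketch. It is an assumption-laden stand-in: AdductHunter uses SciPy's peak detection for the first step, whereas the hypothetical `pick_peaks` below uses a plain local-maximum test, and the greedy "keep the tallest peak first" rule in the second step is one possible way to apply the minimum-distance filter.

```python
def pick_peaks(masses, intensities, min_height=0.01, min_distance=15.0):
    """Return (mass, normalized intensity) pairs for the retained peaks."""
    top = max(intensities)
    norm = [value / top for value in intensities]

    # Step 1: local maxima whose normalized height exceeds the noise threshold.
    candidates = [
        i for i in range(1, len(norm) - 1)
        if norm[i] >= min_height and norm[i] > norm[i - 1] and norm[i] >= norm[i + 1]
    ]

    # Step 2: visit candidates from tallest to shortest and drop any peak that
    # lies within min_distance (in Da) of a peak that was already kept.
    kept = []
    for i in sorted(candidates, key=lambda idx: norm[idx], reverse=True):
        if all(abs(masses[i] - masses[j]) >= min_distance for j in kept):
            kept.append(i)
    return sorted((masses[i], norm[i]) for i in kept)


# Placeholder spectrum: two isotope peaks of one species, a second species,
# and a small noise bump below the threshold.
masses = [8560.0, 8561.0, 8562.0, 8563.0, 8564.0, 8580.0,
          8581.0, 8582.0, 8600.0, 8601.0, 8602.0]
intensities = [1.0, 6.0, 3.0, 10.0, 1.0, 0.0, 4.0, 0.0, 0.0, 0.05, 0.0]
peaks = pick_peaks(masses, intensities)
```

In this toy spectrum the isotope peak at 8561 Da is removed because it lies within 15 Da of the taller peak at 8563 Da, so each species is reported once.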
Optimization problem
Once peaks within the mass spectrum have been detected, AdductHunter will determine their corresponding speciations by formulating an optimization problem, involving an objective function subject to a set of constraints, at each identified peak p. The objective function measures the dissimilarity (distance) between the theoretical isotope pattern of a given species and the experimental isotope distribution of the peak. The constraints are established from user-defined parameters and files, forming the set of feasible solutions. In the context of this problem, a feasible solution refers to a combination of input compounds that gives a potential species matching the peak-centred experimental isotopic distribution. This gives the general formulation:
$$\begin{aligned} \min _{\varvec{x}} \; \phi (\varvec{x},p) \quad \text {subject to} \quad \varvec{x} \in F(p), \end{aligned}$$ (1)
where \(\varvec{x}\) is the vector of the number of molecules for each compound, \(\phi (\varvec{x},p)\) is the dissimilarity between species \(\varvec{x}\) and the experimental distribution around peak p, and F(p) is the set of feasible species at peak p. For example, to represent Ub-Pt(\(\hbox {NH}_3\))\(_2\), we would have \(\varvec{x} = (x_{\text {Ub}}, x_{\text {Pt}}, x_{\text {NH}_3}, \cdots )^T = (1, 1, 2, 0, \cdots , 0)^T\).
Due to noise and inaccuracies in the collection and averaging of mass spectra, the true species may not be optimal, that is, there may exist another species that has an isotope pattern more similar to the experimental isotope distribution. However, the correct species is highly likely to be contained within the feasible set, if sensible constraints and parameters have been provided. Thus, returning all feasible species is helpful for post-optimization validation and analysis.
Constraint integer optimization formulation
A constraint integer optimization (CIO) formulation is a type of integer optimization formulation where all feasible integer solutions are returned. The formulation takes advantage of the problem structure and constraints to ensure sensible species are generated. Although integer optimization was once thought to be intractable, it has shown great advances in efficiency and speed in recent years, and such problems can be solved quickly using industry-grade solvers such as CPLEX [35] and GUROBI [36], much faster than enumerating all possible solutions.
To start, the decision variables, \(x_i\), are defined as the number of molecules present for protein/adduct i, each having a mass of \(m_i\), for every i in the set of all species, C. The mass value here takes into account the charge discrepancy, \(c_i\), that arises from adding a metal-based fragment of species i, by removing the mass of \(c_i\) protons from its peak isotopic mass, that is,
$$\begin{aligned} m_{i} = P_{i} - c_{i}\, m_{\text {H}^+}, \end{aligned}$$ (2)
where \(P_i\) is the most abundant isotopologue mass of the species and \(m_{\text {H}^+}\) denotes the mass of a proton.
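As a worked illustration of the proton correction described above, the following sketch subtracts the mass of the displaced protons from a peak isotopic mass. The function name `effective_mass` is hypothetical, and the platinum mass used in the usage line is an approximate value chosen for illustration, not a number taken from the paper.

```python
PROTON_MASS = 1.007276  # Da, approximate mass of a proton


def effective_mass(peak_isotopic_mass, charge):
    """Mass used in the linear formulation: peak isotopic mass minus `charge` protons."""
    return peak_isotopic_mass - charge * PROTON_MASS


# A platinum(II) centre (charge 2) with an approximate peak isotopic mass of 194.965 Da.
m_pt = effective_mass(194.965, 2)
```

Neutral species (charge 0) keep their peak isotopic mass unchanged.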
Constants/parameters in the formulation are defined in either the web application or the compound constraint files. The user-defined parameters in the web application are as follows:

i. The peak tolerance, t, defined as the neighbourhood of mass values around a peak p, at which a combination of species forms a feasible protein adduct. It is enforced by the constraint:
$$\begin{aligned} p - t \le \displaystyle \sum \limits _{i \in C} m_{i}x_{i} \le p + t \end{aligned}$$ (3)
ii. The maximum number of unique standard adducts, r, in any feasible solution. We define standard adducts as those adducts frequently observed in ESI-MS, that is, the alkali metal ions Na\(^+\), Li\(^+\) and K\(^+\), as well as H\(^+\). To enforce this, the indicator variables \(d_s=\mathbb {1}\{\)standard adduct s is selected\(\}\), which track which standard adducts have been selected, need to be added with the following constraint:
$$\begin{aligned} \displaystyle \sum \limits _{s \in S} d_{s} \le r, \end{aligned}$$ (4)
where S is the set of all standard adducts and
$$\begin{aligned} \mathbbm {1}\{X\}:= {\left\{ \begin{array}{ll} 1 &{} \text {if}\ X \text { is true,} \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$ (5)
iii. The minimum, g, and maximum, h, number of proteins in a multiprotein use case, in any feasible solution. Again, the indicator variables \(z_a=\mathbb {1}\{\)primary a is selected\(\}\), which track which primaries have been selected, need to be added with the following constraint:
$$\begin{aligned} g \le \displaystyle \sum \limits _{a \in A} z_{a} \le h, \end{aligned}$$ (6)
where A is the set of all primaries.

iv. The coordination number, v, of the metal k used. The coordination number for linear complexes is 2, for square planar and tetrahedral complexes is 4, and for octahedral complexes is 6, to name a few. For cisplatin, the platinum(II) metal center has \(v=4\). AdductHunter supports one type of metal at a time. This constraint is enforced by:
$$\begin{aligned} \displaystyle \sum \limits _{i \in C \setminus \{k, S\}} x_{i} \le vx_{k} \end{aligned}$$ (7)
For the compound description/constraint files (see Tables 2, 3), the user-defined parameters are as follows:

v. The lower and upper bounds \(l_i\) and \(u_i\), respectively, for species i in any feasible solution, enforced by constraints:
$$\begin{aligned} l_{i} \le x_{i} \le u_{i}, \quad \forall i \in C \end{aligned}$$ (8)
vi. The maximum number of coordinating species, \(n_j\), per metal k for each binding species j, enforced by the constraint:
$$\begin{aligned} x_{j} \le n_{j}x_{k}, \quad \forall j \in B, \end{aligned}$$ (9)
where B is the set of all binding compounds.
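Putting all of this together, along with non-negativity constraints, the final CIO formulation defining the feasible set F(p) at peak p is:
$$\begin{aligned} F(p) = \left\{ \varvec{x} \in \mathbb {Z}_{\ge 0}^{|C|} \;:\; \begin{array}{l} p - t \le \sum \limits _{i \in C} m_{i}x_{i} \le p + t, \\ \sum \limits _{s \in S} d_{s} \le r, \\ g \le \sum \limits _{a \in A} z_{a} \le h, \\ \sum \limits _{i \in C \setminus \{k, S\}} x_{i} \le vx_{k}, \\ l_{i} \le x_{i} \le u_{i}, \quad \forall i \in C, \\ x_{j} \le n_{j}x_{k}, \quad \forall j \in B \end{array} \right\} \end{aligned}$$ (10)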
As an example, we illustrate in the system involving ubiquitin incubated with cisplatin the CIO formulation defining the feasible set at the peak corresponding to a mass of 8774.6028 Da:
where the set of all species \(C = \left\{ {{\text{Ubiquitin}},{\text{Platinum}},{\text{Ammonia,}} \ldots ,{\text{Potassium}}} \right\}\), the set of binding compounds \(B = \left\{ {{\text{Ammonia}},{\text{Water}},{\text{Chlorine}}} \right\}\), the set of standard adducts \(S = \left\{ {{\text{Lithium}},{\text{Sodium}},{\text{Potassium}}} \right\}\), the peak tolerance \(t = 2\), the maximum number of unique standard adducts \(r = 2\), the metal k is Platinum with a coordination number \(v = 4\), and the maximum number of coordinating species is \(n_j = 2\) for all binding compounds \(j\in B\). Notice that since there is only one protein in this system, we do not have the multiprotein constraint.
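To make the feasible set concrete, the sketch below enumerates it by brute force for a toy single-protein system. It is not AdductHunter's solver-based implementation: exhaustive enumeration stands in for the constraint solver, the function name `feasible_species` and its arguments are hypothetical, and the masses are rounded placeholders rather than values from the paper. Constraints (3), (4), (7), (8), and (9) are applied exactly as written above; the multiprotein constraint (6) is omitted because only one protein is present.

```python
from itertools import product


def feasible_species(peak, masses, bounds, metal, binders, max_per_metal,
                     standard, tol=2.0, max_standard=2, coord=4):
    """Enumerate all integer combinations that satisfy the constraints."""
    names = list(masses)
    # Constraint (8): each count ranges over its lower and upper bound.
    ranges = [range(bounds[n][0], bounds[n][1] + 1) for n in names]
    feasible = []
    for counts in product(*ranges):
        x = dict(zip(names, counts))
        total = sum(masses[n] * x[n] for n in names)
        if abs(total - peak) > tol:                                   # (3)
            continue
        if sum(1 for s in standard if x[s] > 0) > max_standard:      # (4)
            continue
        others = sum(x[n] for n in names if n != metal and n not in standard)
        if others > coord * x[metal]:                                 # (7)
            continue
        if any(x[j] > max_per_metal[j] * x[metal] for j in binders):  # (9)
            continue
        feasible.append(x)
    return feasible


# Rounded placeholder masses in Da (illustrative only).
masses = {"Ub": 8565.0, "Pt": 193.0, "NH3": 17.0, "Cl": 35.0, "Na": 22.0}
bounds = {"Ub": (1, 1), "Pt": (0, 2), "NH3": (0, 4), "Cl": (0, 4), "Na": (0, 2)}
settings = dict(masses=masses, bounds=bounds, metal="Pt", binders=["NH3", "Cl"],
                max_per_metal={"NH3": 2, "Cl": 2}, standard=["Na"])

solutions = feasible_species(peak=8775.0, tol=2.0, **settings)
wider = feasible_species(peak=8775.0, tol=5.0, **settings)
```

With these placeholder numbers a tolerance of 2 yields a single candidate, while widening the tolerance to 5 admits a second one, which illustrates how the tolerance controls the size of the feasible set. A solver becomes preferable to enumeration as the number of compounds and their bounds grow.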
Objective function
With the constraints established, we require an objective function that measures the similarity in shape and mass between theoretical and experimental isotope distributions, or the dissimilarity assuming a minimization problem. Furthermore, the effects of preceding and succeeding noisy peaks far from the peak for the most abundant isotopologue should be ignored, as should intensities below a certain height due to the noise in high-resolution data, since these do not help the measurement of similarity. Thus, only values within a certain user-specified interval of the current peak are considered when comparing the theoretical and experimental distributions.
AdductHunter uses Dynamic Time Warping (DTW) [37] to find the dissimilarity between distributions, that is, \(\phi (\varvec{x},p)\) is the (Euclidean) distance between the optimally aligned theoretical and experimental isotopic distributions. DTW works by computing a distance matrix between the two isotopic distributions, where each cell in the matrix represents the distance between a specific point in one distribution and a specific point in the other distribution. The optimal path through the distance matrix that minimizes the total distance between the two distributions is then computed by constructing a cost matrix that accumulates the distances between all possible pairs of points in the two distributions. The cost matrix is then traversed in a way that minimizes the total accumulated cost along the path, that is, the optimally aligned dissimilarity between the two distributions.
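The accumulated-cost construction described above can be sketched as follows. This is the textbook DTW recurrence with a Euclidean point-to-point cost, written as a hypothetical `dtw_distance` function; the variant used by AdductHunter or by any particular library may differ in details such as step pattern or normalization. Each distribution is assumed to be a list of (mass, intensity) pairs.

```python
import math


def dtw_distance(a, b):
    """Accumulated cost of the optimal alignment between two point sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between points
            cost[i][j] = d + min(cost[i - 1][j],      # repeat a point of b
                                 cost[i][j - 1],      # repeat a point of a
                                 cost[i - 1][j - 1])  # advance both sequences
    return cost[n][m]


# Placeholder isotope patterns: the second repeats one point of the first.
theoretical = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
experimental = [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0), (2.0, 0.0)]
loss = dtw_distance(theoretical, experimental)
```

Because warping lets one point align with several, the repeated point costs nothing here, whereas a point-by-point comparison of sequences with different lengths would not even be defined.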
Output table
After the optimization problem is solved, a table of feasible protein adducts with an indication of the closest fit for each peak is returned (see Table 4). The table is sorted by experimental peak mass and closeness of fit (loss). Here, the theoretical peak mass is recorded as the most abundant isotopologue for a given species and is used to calculate mass error in ppm.
Results and discussion
We examined the performance of AdductHunter on a variety of datasets to understand its effectiveness in accurately identifying protein adducts, and we discuss here its limitations and further development.
AdductHunter was specifically developed to identify adducts formed between metal complexes and proteins. A collection of 22 unique datasets was analyzed to provide a comprehensive performance benchmark for AdductHunter (see Table 5). The metal complexes used were cisplatin, oxaliplatin, RAPTA-C, RM175, Au1, Au2, and Au3, with formulas cis-Pt(\(\hbox {NH}_3\))\(_2\)\(\hbox {Cl}_2\), Pt(\(\hbox {C}_6\)\(\hbox {H}_{{14}}\)\(\hbox {N}_2\))(\(\hbox {C}_2\)\(\hbox {O}_4\)), Ru(\(\eta ^6\)-\(\hbox {C}_{{10}}\)\(\hbox {H}_{{14}}\))(\(\hbox {PN}_3\)\(\hbox {C}_6\)\(\hbox {H}_{{12}}\))\(\hbox {Cl}_2\), [Ru(\(\eta ^6\)-\(\hbox {C}_{{12}}\)\(\hbox {H}_{{10}}\))(\(\hbox {C}_{{2}}\)\(\hbox {H}_8\)\(\hbox {N}_2\))Cl]\(\hbox {PF}_6\), [Au(\(\hbox {C}_{{19}}\)\(\hbox {H}_{{17}}\)\(\hbox {N}_2\))(OH)]\(\hbox {PF}_6\), Au(\(\hbox {C}_{{12}}\)\(\hbox {H}_{{11}}\)\(\hbox {N}_2\)\(\hbox {O}_2\))\(\hbox {Cl}_2\), and Au(\(\hbox {C}_{{12}}\)\(\hbox {H}_{{10}}\)N)\(\hbox {Cl}_2\), respectively. The proteins used were cytochrome c (CyC, \(\hbox {C}_{{560}}\)\(\hbox {H}_{{874}}\)\(\hbox {Fe}_1\)\(\hbox {N}_{{148}}\)\(\hbox {O}_{{156}}\)\(\hbox {S}_4\)), ubiquitin (Ub, \(\hbox {C}_{{378}}\)\(\hbox {H}_{{629}}\)\(\hbox {N}_{{105}}\)\(\hbox {O}_{{118}}\)\(\hbox {S}_1\)), hen egg-white lysozyme (HEWL, \(\hbox {C}_{{613}}\)\(\hbox {H}_{{951}}\)\(\hbox {O}_{{185}}\)\(\hbox {N}_{{193}}\)\(\hbox {S}_{{10}}\)), and myoglobin (Mb, \(\hbox {C}_{{769}}\)\(\hbox {H}_{{1212}}\)\(\hbox {N}_{{210}}\)\(\hbox {O}_{{218}}\)\(\hbox {S}_2\)). Each dataset contained a mixture of at least one protein and one metal complex (see Table 5). We compared the output from AdductHunter for each dataset against the corresponding ground truth, that is, the manually identified protein adducts.
Peak identification
Peak detection in mass spectra is subject to identifying many false positives, especially at low intensities where noise is prevalent. Here, we define false positives as peaks detected by the tool but not ground truth peaks, and false negatives as ground truth peaks that were not picked up by the tool. The peak detection algorithm in AdductHunter is highly sensitive to the normalized minimum peak height. A lower minimum peak height allows AdductHunter to detect more manually identified peaks, although with diminishing returns and increasing false positives (see Fig. 4). Through testing and assuming an equal weight on false positives and false negatives, a value of 0.01 was found to be optimal; decreasing the setting to 0.005 added many false positives with few manually identified peaks, likely due to noise, and increasing the value to 0.02 removed a notable portion of manually identified peaks with a less significant reduction in false positives. Another notable parameter in peak detection is the minimum distance between two (manually identified) adjacent peaks, found to be 15.9 Da over all datasets and set to 15 as a default.
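The false positive and false negative counts defined above can be computed with a short helper. This is a hedged sketch: the function name `count_errors` is hypothetical, and the rule that a detected peak matches a ground-truth peak when the two lie within `match_tol` Da of each other is an assumption made for illustration, since the matching criterion is not spelled out here.

```python
def count_errors(detected, truth, match_tol=1.0):
    """Return (false_positives, false_negatives) for detected vs. ground-truth peaks."""
    def has_match(mass, others):
        return any(abs(mass - other) <= match_tol for other in others)

    # Detected peaks with no ground-truth counterpart.
    false_positives = sum(1 for d in detected if not has_match(d, truth))
    # Ground-truth peaks the tool failed to pick up.
    false_negatives = sum(1 for t in truth if not has_match(t, detected))
    return false_positives, false_negatives


# Placeholder peak lists in Da.
fp, fn = count_errors(detected=[100.0, 150.2, 300.0], truth=[100.3, 150.0, 200.0])
```

Repeating this count over a grid of minimum peak heights gives the kind of trade-off curve used to choose a threshold under equal weighting of the two error types.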
The other significant parameter to be defined is the tolerance around peaks, t. Peak tolerance helps account for noise in mass spectra and for error in the mass approximation of the adduct in AdductHunter. Here, individual compound masses are summed instead of finding the most abundant isotopologue for the adduct, which is a nonlinear, non-continuous, and computationally expensive calculation. Consequently, we decided to keep the formulation linear, as it is a close approximation of the true mass value. This parameter has the most flexibility and uncertainty and, alongside the minimum peak height, has the greatest effect on the computational efficiency of the constraint optimization.
Since the size of the feasible set grows with t, a sufficiently large t is desired to be confident that as many manually identified species as possible are captured in the feasible set, but not so large that numerous unwanted species are made feasible (see Fig. 5). Recall that only peaks with at least one feasible solution are returned in the output (and the best-fit species is found for each peak). As a result, a larger t will not only return more potential species, but more unique peaks as well; this brings about the detection of false positive peaks that would not have been returned with a lower tolerance. Tolerance values were selected to be slightly larger than multiples of the atomic mass of a hydrogen atom at 1.008, that is, nH where \(n \in \mathbb {N}\), which has been approximated to \(n + 0.1\). It was found that for the given data and tolerances greater than 3.1, no further manually identified peaks were returned, meaning the number of missing manually identified peaks did not change. Hence, increasing the tolerance past this point means new peaks returned are all false positives.
Default parameters
The results of the benchmarking tests were used to set the default parameter values for AdductHunter. As a broader range of data is tested and analyzed, a parameter search would prove useful to determine their optimal values more precisely. Parameter values could also be made variable and dependent on the mass. The assignment of proton adducts became more challenging for higher adducts with larger masses, as they tended to be further away from the experimental peak, making reliable identification difficult. In the datasets used, higher-mass adducts appeared at lower intensity in the mass spectra, and their peaks were usually surrounded by increased noise and complexity, which comes naturally with more individual components involved in each adduct. Thus, for peaks at higher masses, parameters may need to accommodate a larger feasible set to capture the previously identified peaks in the ground truth. For example, the peak tolerance could increase as the mass of the protein adducts increases, possibly starting from a smaller peak tolerance than the constant mass tolerance (3.1) used here, as mass error increases with adduction complexity. Future work may also include automatically calculating the noise threshold as a function of the baseline intensity and noise level of the spectrum, instead of being a user-defined input.
Objective function analysis
A variety of established similarity measures for the objective function were tested over all datasets to determine which metric would work best. We used the “similaritymeasures” package [38] to test the following measures: the area between curves [38], Partial Curve Matching [39], discrete Fréchet distance [40], and Dynamic Time Warping [37] with Euclidean and City Block (Manhattan) distance measures. The normalized intensity values were also scaled by a range of weights (0, which effectively uses only mass, \(10^{-1}\), \(10^0\), \(10^1\), \(10^2\), and \(10^3\)) to understand the significance of intensity in finding the best fit. Dynamic Time Warping with a Euclidean distance measure was found to have the best average performance with a weight of \(10^{-1}\) on the intensity. However, other (tested) similarity metrics and weights may be used depending on the data (see Table 6 and Additional File 3). As noise affects the experimental intensities, more weight is applied to mass accuracy when comparing experimental and theoretical distributions. Using only mass, however, performs poorly due to multiple feasible solutions having similar mass values.
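The effect of the intensity weight can be seen on a single pair of points. The sketch below assumes that the weight simply multiplies the normalized intensity coordinate before a Euclidean distance is taken; the function name `weighted_distance` and the numbers are hypothetical placeholders.

```python
import math


def weighted_distance(theoretical_point, experimental_point, weight):
    """Euclidean distance between (mass, intensity) points after scaling intensity."""
    (m1, i1), (m2, i2) = theoretical_point, experimental_point
    return math.dist((m1, weight * i1), (m2, weight * i2))


theory, experiment = (8774.0, 1.0), (8774.5, 0.6)
mass_only = weighted_distance(theory, experiment, 0.0)     # intensity ignored
low_weight = weighted_distance(theory, experiment, 0.1)    # mass dominates
equal_weight = weighted_distance(theory, experiment, 1.0)  # both contribute
```

With a weight of 0 the distance reduces to the mass difference, and larger weights let intensity mismatches contribute progressively more to the loss.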
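A different type of objective function initially considered was to measure the mass error (ppm) at each peak p, which is a scaled form of the relative error between the peaks in the theoretical isotope pattern and experimental isotope distribution:
$$\begin{aligned} \phi (\varvec{x},p) = \frac{\left| p - T(\varvec{x}) \right| }{T(\varvec{x})} \times 10^{6}, \end{aligned}$$ (12)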
where \(T(\varvec{x})\) is the theoretical peak mass of species \(\varvec{x}\).
Parts per million error calculations are a common approach in MS analyses [4], and would be substantially easier to implement and interpret than a distance metric. However, the linear mass approximation used to find theoretical mass peaks means that ppm would need to be measured after finding the feasible set to accurately calculate its value (as the isotope pattern is needed to find its peak, which is a nonlinear process), hence it is unusable as an objective in the constraint formulation. Furthermore, using a full isotope pattern is more robust as there are cases where two consecutive isotope peaks have near identical abundance in the experimental spectrum, and so measured and theoretical distributions may disagree on the identity of the tallest peak, resulting in large ppm values.
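For reference, the ppm mass error can be computed as below. The sketch assumes the usual definition, the absolute difference between experimental and theoretical peak mass divided by the theoretical mass and scaled by one million; the function name `ppm_error` and the example masses are hypothetical.

```python
def ppm_error(experimental_mass, theoretical_mass):
    """Relative mass error in parts per million."""
    return abs(experimental_mass - theoretical_mass) / theoretical_mass * 1e6


# A 0.001 Da deviation on a 1000 Da species corresponds to roughly 1 ppm.
error = ppm_error(1000.001, 1000.0)
```

The same absolute deviation on a protein adduct of several thousand daltons gives a proportionally smaller ppm value, which is why picking the wrong isotope peak as the tallest one (a shift of about 1 Da) produces a conspicuously large error.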
Running time
Experiments were run using an Intel Core i5-8250U CPU and 8 GB RAM. When calculating the objective, the total time taken for the AdductHunter analysis of a recorded spectrum was dominated by the generation of hyperfine isotopic mass distributions. In contrast, the choice of objective function had a negligible effect on the total analysis time. Across all datasets, generating the hyperfine isotope distribution took approximately 135.5 s on average. The time required to identify peaks and generate the set of feasible species pales in comparison, taking approximately 0.41 s on average. Additionally, an approximate, coarse method for generating isotopic mass distributions exists in pyOpenMS that is significantly faster (\(\sim\)85 times) than generating the hyperfine peaks, taking approximately 1.68 s on average. However, the mass values calculated using the coarse method will not accurately reflect the most abundant isotopologue peak, as a simplified formula is used to find isotope peaks with greater mass [17]. This imprecision leads to decreased accuracy and high error values for almost all metrics at an intensity weight of \(10^{-1}\), although some improvements can be seen for larger intensity weights across all metrics (see Table 7). The best performance with the coarse method (mean accuracy of 0.787) was achieved with Dynamic Time Warping with the City Block (Manhattan) distance measure at an intensity weight of \(10^{0}\). However, this is worse than the performance achieved with the hyperfine method (mean accuracy of 0.842, see Table 6). Hence, we recommend using the hyperfine method, although the coarse method may be used for rapid preliminary testing. As the species and peaks generated are independent of each other, further improvement of the analysis time would involve parallelizing the generation of isotope patterns, constraint integer optimization formulations, and objective function calculations.
Conclusion
AdductHunter was created to identify protein-metal complex adducts in deconvoluted mass spectrometry data by formulating a constraint integer optimization problem at each experimental mass peak and using dynamic time warping to find the best-fit species based on its theoretical isotopic distribution. The results presented herein provide comprehensive evidence that AdductHunter effectively detects peaks within mass spectrometry data and accurately determines their speciation much faster than interpreting the spectra manually. Efforts are currently underway to address AdductHunter’s limitations, specifically by introducing the deconvolution of experimental mass spectra as well as ensuring that it can appropriately handle samples with more than one metal complex in the incubation mixture.
Data availability
AdductHunter is freely accessible on GitHub under an open-source (MIT) license at github.com/dlon450/MSProteinAdductIdentification, and can also be found at adducthunter.wickerlab.org. Scripts used for the results section can be found at github.com/dlon450/MSProteinAdductIdentification/tree/main/src. Finally, not all datasets are available as some are currently unpublished (see Table 5 for more information).
References
Urban PL (2016) Quantitative mass spectrometry: an overview. Philos Trans A Math Phys Eng Sci 374(2079):20150382. https://doi.org/10.1098/rsta.2015.0382
Ferrige AG, Seddon MJ, Jarvis S, Skilling J, Aplin R (1991) Maximum entropy deconvolution in electrospray mass spectrometry. Rapid Commun Mass Spectrom 5(8):374–377. https://doi.org/10.1002/rcm.1290050810
Ferrige AG, Seddon MJ, Green BN, Jarvis SA, Skilling J, Staunton J (1992) Disentangling electrospray spectra with maximum entropy. Rapid Commun Mass Spectrom 6(11):707–711. https://doi.org/10.1002/rcm.1290061115
Hartinger CG, Tsybin YO, Fuchser J, Dyson PJ (2008) Characterization of platinum anticancer drug protein-binding sites using a top-down mass spectrometric approach. Inorgan Chem 47(1):17–19. https://doi.org/10.1021/ic702236m
Hartinger CG, Ang WH, Casini A, Messori L, Keppler BK, Dyson PJ (2007) Mass spectrometric analysis of ubiquitin-platinum interactions of leading anticancer drugs: MALDI versus ESI. J Anal At Spectrom 22:960–967. https://doi.org/10.1039/B703350H
Escribano E, Madurga S, Vilaseca M, Moreno V (2014) Ion mobility and top-down MS complementary approaches for the structural analysis of protein models bound to anticancer metallodrugs. Inorgan Chim Acta 423:60–69. https://doi.org/10.1016/j.ica.2014.07.052
Cooke MS, Hu CW, Chao MR (2019) Editorial: mass spectrometry for adductomic analysis. Front Chem. https://doi.org/10.3389/fchem.2019.00794
Casini A, Gabbiani C, Mastrobuoni G, Messori L, Moneti G, Pieraccini G (2006) Exploring metallodrug-protein interactions by ESI mass spectrometry: the reaction of anticancer platinum drugs with horse heart cytochrome c. ChemMedChem 1(4):413–417. https://doi.org/10.1002/cmdc.200500079
Riffle M, Hoopmann MR, Jaschob D, Zhong G, Moritz RL, MacCoss MJ, Davis TN, Isoherranen N, Zelter A (2022) Discovery and visualization of uncharacterized drug-protein adducts using mass spectrometry. Anal Chem 94(8):3501–3509. https://doi.org/10.1021/acs.analchem.1c04101
Casini A, Gabbiani C, Michelucci E, Pieraccini G, Moneti G, Dyson PJ, Messori L (2009) Exploring metallodrug-protein interactions by mass spectrometry: comparisons between platinum coordination complexes and an organometallic ruthenium compound. J Biol Inorgan Chem 14(5):761–770. https://doi.org/10.1007/s00775-009-0489-5
Artner C, Holtkamp HU, Hartinger CG, Meier-Menches SM (2017) Characterizing activation mechanisms and binding preferences of ruthenium metalloprodrugs by a competitive binding assay. J Inorgan Biochem 177:322–327. https://doi.org/10.1016/j.jinorgbio.2017.07.010
Hartinger CG, Groessl M, Meier SM, Casini A, Dyson PJ (2013) Application of mass spectrometric techniques to delineate the modes-of-action of anticancer metallodrugs. Chem Soc Rev 42:6186–6199. https://doi.org/10.1039/C3CS35532B
Yang X, Bartlett MG (2016) Identification of protein adduction using mass spectrometry: protein adducts as biomarkers and predictors of toxicity mechanisms. Rapid Commun Mass Spectrom 30(5):652–664. https://doi.org/10.1002/rcm.7462
Nunes J, Charneira C, Morello J, Rodrigues J, Pereira SA, Antunes AMM (2019) Mass spectrometry-based methodologies for targeted and untargeted identification of protein covalent adducts (adductomics): current status and challenges. High Throughput. https://doi.org/10.3390/ht8020009
LoPachin RM, DeCaprio AP (2005) Protein adduct formation as a molecular mechanism in neurotoxicity. Toxicol Sci 86(2):214–225. https://doi.org/10.1093/toxsci/kfi197
Lee RFS, Menin L, Patiny L, Ortiz D, Dyson PJ (2017) Versatile tool for the analysis of metal-protein interactions reveals the promiscuity of metallodrug-protein interactions. Anal Chem 89(22):11985–11989
Röst HL, Schmitt U, Aebersold R, Malmström L (2014) pyOpenMS: a python-based interface to the OpenMS mass-spectrometry algorithm library. Proteomics 14(1):74–77. https://doi.org/10.1002/pmic.201300246
Kong AT, Leprevost FV, Avtonomov DM, Mellacheruvu D, Nesvizhskii AI (2017) Msfragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat Methods 14(5):513–520. https://doi.org/10.1038/nmeth.4256
Yu F, Teo GC, Kong AT, Haynes SE, Avtonomov DM, Geiszler DJ, Nesvizhskii AI (2020) Identification of modified peptides using localization-aware open search. Nat Commun 11(1):4065. https://doi.org/10.1038/s41467-020-17921-y
da Veiga Leprevost F, Haynes SE, Avtonomov DM, Chang HY, Shanmugam AK, Mellacheruvu D, Kong AT, Nesvizhskii AI (2020) Philosopher: a versatile toolkit for shotgun proteomics data analysis. Nat Methods 17(9):869–870. https://doi.org/10.1038/s41592-020-0912-y
Teo GC, Polasky DA, Yu F, Nesvizhskii AI (2021) Fast deisotoping algorithm and its implementation in the msfragger search engine. J Proteome Res 20(1):498–505. https://doi.org/10.1021/acs.jproteome.0c00544
Avtonomov DM, Raskind A, Nesvizhskii AI (2016) Batmass: a java software platform for LC–MS data visualization in proteomics and metabolomics. J Proteome Res 15(8):2500–2509. https://doi.org/10.1021/acs.jproteome.6b00021
Niedermeyer THJ, Strohalm M (2012) mMass as a software tool for the annotation of cyclic peptide tandem mass spectra. PLoS ONE 7(9):1–9. https://doi.org/10.1371/journal.pone.0044913
Leufken J, Niehues A, Sarin LP, Wessel F, Hippler M, Leidel SA, Fufezan C (2017) pyQms enables universal and accurate quantification of mass spectrometry data. Mol Cell Proteomics 16(10):1736–1745. https://doi.org/10.1074/mcp.M117.068007
Perron L, Furnon V. OR-Tools. Google. https://developers.google.com/optimization/
Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J, van der Walt SJ, Brett M, Wilson J, Millman KJ, Mayorov N, Nelson ARJ, Jones E, Kern R, Larson E, Carey CJ, Polat İ, Feng Y, Moore EW, VanderPlas J, Laxalde D, Perktold J, Cimrman R, Henriksen I, Quintero EA, Harris CR, Archibald AM, Ribeiro AH, Pedregosa F, van Mulbregt P (2020) SciPy 1.0 contributors: SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17:261–272. https://doi.org/10.1038/s41592-019-0686-2
Grinberg M (2018) Flask web development: developing web applications with python. O’Reilly Media Inc, Sebastopol
Meier SM, Tsybin YO, Dyson PJ, Keppler BK, Hartinger CG (2012) Fragmentation methods on the balance: unambiguous top-down mass spectrometric characterization of oxaliplatin-ubiquitin binding sites. Anal Bioanal Chem 402(8):2655–2662. https://doi.org/10.1007/s00216-011-5523-0
Peleg-Shulman T, Najajreh Y, Gibson D (2002) Interactions of cisplatin and transplatin with proteins: comparison of binding kinetics, binding sites and reactivity of the Pt-protein adducts of cisplatin and transplatin towards biological nucleophiles. J Inorgan Biochem 91(1):306–311. https://doi.org/10.1016/S0162-0134(02)00362-8
Gibson D, Costello CE (1999) A mass spectral study of the binding of the anticancer drug cisplatin to ubiquitin. Eur Mass Spectrom 5(6):501–510. https://doi.org/10.1255/ejms.314
O’Callaghan S, De Souza DP, Isaac A, Wang Q, Hodkinson L, Olshansky M, Erwin T, Appelbe B, Tull DL, Roessner U, Bacic A, McConville MJ, Likić VA (2012) PyMS: a Python toolkit for processing of gas chromatography-mass spectrometry (GC-MS) data. Application and comparative study of selected tools. BMC Bioinform 13(1):115. https://doi.org/10.1186/1471-2105-13-115
Bittremieux W (2020) spectrum_utils: a Python package for mass spectrometry data processing and visualization. Anal Chem 92(1):659–661. https://doi.org/10.1021/acs.analchem.9b04884
Renard BY, Kirchner M, Steen H, Steen JA, Hamprecht FA (2008) NITPICK: peak identification for mass spectrometry data. BMC Bioinform 9(1):355. https://doi.org/10.1186/1471-2105-9-355
Bruker (1984) Analytical Chemistry 56(9), 1030–1030. https://doi.org/10.1021/ac00273a717
Cplex II (2009) V12.1: user’s manual for CPLEX. Int Bus Mach Corp 46(53):157
Gurobi Optimization (2022) LLC: Gurobi optimizer reference manual. https://www.gurobi.com
Müller M (2007) Dynamic time warping. Springer, Berlin, pp 69–84. https://doi.org/10.1007/978-3-540-74048-3_4
Jekel CF, Venter G, Venter MP, Stander N, Haftka RT (2019) Similarity measures for identifying material parameters from hysteresis loops using inverse analysis. Int J Mater Form 12(3):355–378. https://doi.org/10.1007/s12289-018-1421-8
Witowski K, Stander N (2012) Parameter identification of hysteretic models using partial curve mapping. American Institute of Aeronautics and Astronautics, Reston. https://doi.org/10.2514/6.2012-5580
Eiter T, Mannila H (1994) Computing discrete Fréchet distance. Technical report
Meier SM, Gerner C, Keppler BK, Cinellu MA, Casini A (2016) Mass spectrometry uncovers molecular reactivities of coordination and organometallic gold(III) drug candidates in competitive experiments that correlate with their biological effects. Inorg Chem 55(9):4248–4259. https://doi.org/10.1021/acs.inorgchem.5b03000
Acknowledgements
We thank Hannes Röst for providing an in-depth explanation of the pyOpenMS tools, specifically those for generating isotope distributions; Hasnain Cheena and Matthew Mulvey for working on the preliminary development of AdductHunter; and Dr. Nicholas Demarias for his support in operating the Bruker FT-ICR-MS.
Funding
Not applicable.
Author information
Contributions
DL developed and tested the AdductHunter software, used AdductHunter to analyze the data, and wrote the initial version of the manuscript. MS and LE helped with the development and improvement of AdductHunter and the validation of test results. KD thoroughly proofread the manuscript. KT, JW, and CH developed the concept and provided guidance on software features and goals. All authors read, edited, and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
The authors give consent for publication in the Journal of Cheminformatics.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1.
Species description and constraints input CSV file for cytochrome c incubated with cisplatin.
Additional file 2.
Standard adducts’ descriptions and constraints input CSV file.
Additional file 3.
Accuracies for each dataset using the hyperfine isotope generator with different similarity measures and weights. Metrics from left to right: area between curves, partial curve mapping, discrete Fréchet distance, and dynamic time warping with Euclidean and city block (Manhattan) distance measures, respectively. Datasets are grouped by metal complex and protein (see Table 5 for more detail).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Long, D., Eade, L., Sullivan, M.P. et al. AdductHunter: identifying protein-metal complex adducts in mass spectra. J Cheminform 16, 15 (2024). https://doi.org/10.1186/s13321-023-00797-7
DOI: https://doi.org/10.1186/s13321-023-00797-7