Plasma Peptidome Pattern of Breast Cancer Using Magnetic Beads-Based Plasma Fractionation and MALDI-TOF MS: A Case Control Study in Egypt

Objective: The present study aimed to determine peptidome patterns in breast cancer (BC). Methods: We analyzed the plasma proteomic profiles of 80 BC patients and 50 healthy controls, using hydrophobic interaction chromatography magnetic beads (MB-HIC8) separation followed by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). Results: ClinProTools software identified 92 peaks that differed among the analyzed groups, of which 33 were significantly different (P < 0.05). Of those, 22 peaks were up-regulated and 11 peaks were down-regulated in BC patients compared with the healthy controls. The Quick Classifier model provided three peptide ion signatures (m/z 1,570.31, 1,897.4 and 2,568.17) that discriminated BC patients from healthy control subjects with 96.4% accuracy. External validation in an independent group achieved a sensitivity of 100% and a specificity of 76.9%. Conclusion: MALDI-TOF MS has good analytical performance in distinguishing BC patients from healthy controls.


Introduction
Breast cancer (BC) is considered a leading cause of cancer death in women worldwide (Tan et al., 2018). In Egypt, BC is the most common cancer in women, accounting for approximately 32.04% of the reported malignancies among Egyptian women (Ibrahim et al., specimens, but indeed blood is the most convenient of all because of its noninvasive attainable nature. In addition, it usually contains biomarkers that are secreted not only by the tumor but by the surrounding stroma as well (McCuaig et al., 2017). Several analytical tools have been employed in cancer biomarker research. Mass spectrometry outperforms other techniques in its ability to detect hundreds of proteins in a single experiment. These proteins are either up- or down-regulated when the proteomic profiles of malignant and non-malignant biological fluids are compared (Mazur and Pyatchanina, 2016; Yang et al., 2016).
Proteome profiling reveals the features of the whole proteome through defined m/z values in the mass spectrum but does not report protein identities for the discriminatory ion peaks. Numerous mass spectrometry techniques are in use for proteome profiling, especially matrix-assisted laser desorption/ionization (MALDI) coupled to a time-of-flight (TOF) analyzer. In addition to high throughput and low sample consumption, this technique offers high sensitivity and accuracy in determining the m/z values of ions in mass spectra of complex protein mixtures from different biological specimens (Zhang et al., 2016).
In the current study, we used hydrophobic interaction chromatography magnetic beads (MB-HIC8) followed by MALDI-TOF MS for plasma proteomic profiling analysis in patients with BC and healthy controls. By comparing the generated proteomic profiles using ClinProTools 3.0, differentially expressed peaks could be identified as potential biomarkers.

Materials and Methods
Eighty BC patients were consecutively enrolled from 2015 to 2016 at the Medical Research Institute Teaching Hospital, Alexandria University, Egypt. All patients were newly diagnosed, histopathologically confirmed by ultrasound-guided core needle biopsy, and untreated. The control group included fifty apparently healthy female volunteers receiving routine mammography at the breast diagnostic center. Exclusion criteria for controls were abnormal mammography or physical breast examination, or a prior personal history of any cancer. All subjects gave informed consent, and the study was approved by the local ethics committee. Demographic data were obtained via interviews and a standardized questionnaire.

Blood samples and biopsy
Fasting venous blood samples were collected from subjects in a seated position into purple-capped vacutainer tubes (BD Diagnostics, Plymouth, UK) containing 50 μL of 3.8% dipotassium ethylenediaminetetraacetic acid (K2-EDTA). Each tube was inverted 10 times and then centrifuged under refrigerated conditions (5°C) at 1,800 x g for 15 minutes. The plasma was divided into 200 μL aliquots and stored at -80°C until use. Biopsy material was used to assess histopathological tumor grade.

Sample purification
The samples were processed according to the manufacturer's instructions. Briefly, the plasma and MB-HIC8 kit were left to reach room temperature, and 8 μL of MB-HIC binding buffer and 4 μL of each plasma sample were combined in a microfuge tube. Then, 4 μL of HIC8 beads (thoroughly shaken beforehand) was added to the tube and mixed by pipetting up and down 5 times. The sample tube was allowed to stand for one minute and then placed in a magnetic bead separator for 20 seconds, where the beads were pulled to the side by magnetic force, allowing the supernatant to be removed carefully with a pipette and discarded.
Subsequently, 90 μL of the washing buffer was added to the tube, and the tube was moved 20 times back and forth between two adjacent holes of the magnetic separator. After 20 seconds, the beads had collected on the wall of the tube in the magnetic separator, and the supernatant was removed carefully with a pipette while the beads remained in place. The washing process was repeated twice. Following binding and washing, 9 μL of the elution buffer was added and the beads were dispersed by pipetting up and down. One minute later, the tube was placed in a magnetic bead separator for 30 seconds, where the beads were pulled to the side, and the clear eluate was transferred to a fresh tube. Finally, 1 μL of the resulting eluate was spotted on the MALDI-TOF MS target (MTP 384 polished steel target plate). After air drying, 1 μL of HCCA matrix was applied onto each spot, and the target was air dried again (co-crystallization). The ClinPro Standard (CPS) was applied for calibrating the instrument.

Mass Spectrometry Analysis
The proteomic profiling analysis was performed using an ultrafleXtreme MALDI-TOF/TOF MS instrument (Bruker Daltonics, Germany) operating in linear positive mode with the following settings: ion source 1, 25.00 kV; ion source 2, 23.65 kV; lens, 6.8 kV; pulsed ion extraction, 300 ns. Ionization was achieved by irradiation with a laser operating at 2,000 Hz. Matrix suppression was enabled, with signal suppression up to 800 Da. A standard calibration mixture in the range of 4,000 to 20,000 Da was used for mass calibration.
Four MALDI preparations (MALDI spots) were measured for each sample. For each MALDI spot, 3,000 spectra were acquired (500 laser shots at 6 different spot positions). The ClinProTools software version 3.0 was used to compile spectra and detect peaks. The m/z range between 900 and 20,000, with a signal-to-noise threshold of 8.0, was selected as the target mass range for analysis because this range contained the resolved proteins and the peptides with smaller molecular weights. The study was conducted in the Proteomic Lab of the Faculty of Medicine, Alexandria University, which is funded by STDF capacity building project 2897.
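The selection criteria above (m/z 900 to 20,000, signal-to-noise threshold of 8.0) can be illustrated with a minimal Python sketch. It is not the ClinProTools implementation: the `Peak` structure, the `noise` field, and the definition of signal-to-noise as intensity divided by a local noise estimate are assumptions made only for illustration, and the example peak values are hypothetical.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    mz: float         # mass-to-charge ratio of the peak
    intensity: float  # peak height (arbitrary units)
    noise: float      # local noise estimate (hypothetical field)


MZ_MIN, MZ_MAX = 900.0, 20000.0  # target mass range stated in the text
SNR_THRESHOLD = 8.0              # signal-to-noise threshold stated in the text


def select_peaks(peaks: List[Peak]) -> List[Peak]:
    """Keep peaks inside the m/z range whose intensity/noise ratio meets the threshold."""
    selected = []
    for p in peaks:
        if p.noise <= 0:
            continue  # cannot compute a ratio without a positive noise estimate
        snr = p.intensity / p.noise  # assumed S/N definition
        if MZ_MIN <= p.mz <= MZ_MAX and snr >= SNR_THRESHOLD:
            selected.append(p)
    return selected


# Hypothetical peaks: one below the mass range, one passing, one with low S/N.
example = [
    Peak(850.0, 900.0, 10.0),
    Peak(1570.31, 400.0, 20.0),
    Peak(1897.4, 100.0, 20.0),
]
kept = select_peaks(example)
print([p.mz for p in kept])
```

In this toy input only the second peak is retained, because the first lies below m/z 900 and the third has an intensity-to-noise ratio of 5.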

Statistical analysis
Statistical analyses were performed using SPSS version 18. Descriptive measures were calculated for all variables, and a p-value less than 0.05 was considered statistically significant. The Student's t-test, chi-squared (χ²) test, and Fisher exact test were used to compare the general characteristics between groups. A receiver operating characteristic (ROC) curve was generated and Youden's index was calculated. Sample size was estimated to allow detection of a difference of at least 25% in a measured variable among the studied groups at the 5% significance level (α = 0.05) with a statistical power of at least 0.8, assuming group coefficients of variation of 50% (σ = 0.5).
The Bruker Compass ClinProTools 3.0 application (referred to as 'ClinProTools') was used for data analysis. ClinProTools combines visualization features and multiple mathematical algorithms to generate pattern recognition models for classification and prediction of disease from MS-based profiling data. ROC analysis was performed for each peak based on quantitative data of peak intensities. Data analysis began with raw data pre-treatment, including baseline subtraction, normalization and recalibration of spectra, followed by internal peak alignment using prominent peaks and a peak picking procedure. Statistically significant differences in peptide quantity were determined by means of the Wilcoxon test.
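As an illustration of how Youden's index relates to a ROC curve, the following Python sketch scans every candidate cut-off and returns the one that maximizes J = sensitivity + specificity - 1. It is a generic textbook computation, not the SPSS procedure used in the study; the function names, the rule that a sample is called positive when its score is at or above the cut-off, and the example scores are all assumptions.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float, float]]:
    """Return (threshold, sensitivity, specificity) for each distinct score.

    A sample is called positive when score >= threshold (assumed convention).
    Labels are 1 for cases and 0 for controls; both classes must be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        points.append((t, tp / pos, tn / neg))
    return points


def best_youden(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return the threshold with the largest Youden index and the index itself."""
    t, sens, spec = max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)
    return t, sens + spec - 1


# Hypothetical peak intensities (arbitrary units) for 4 cases and 3 controls.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 1, 1, 1]
threshold, j = best_youden(scores, labels)
print(threshold, j)
```

For these made-up values the best cut-off is 0.6, where three of four cases and all three controls are classified correctly, giving J = 0.75.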
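The per-peak group comparison can be sketched as follows, assuming that the Wilcoxon test refers to the unpaired rank-sum form, which is the usual choice for two independent groups. This pure-Python version uses the normal approximation without tie or continuity corrections, so it will not exactly reproduce the p-values reported by ClinProTools; the intensity values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (rank sum of x, z statistic, two-sided p-value) by normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return w, z, p


# Hypothetical intensities of one peak in two groups of five samples each.
patients = [5.1, 6.3, 7.2, 8.0, 9.4]
controls = [1.2, 2.5, 3.1, 4.0, 4.8]
w, z, p = rank_sum_test(patients, controls)
print(w, round(z, 3), round(p, 4))
```

With small groups an exact test would normally be preferred over the normal approximation; the sketch is intended only to show how ranks, rather than raw intensities, drive the comparison.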

Results
The mean age of BC patients and control subjects was 53.44±10.88 and 40.90±13.62 years, respectively. Infiltrating ductal carcinoma was the predominant histopathological type in BC patients (91.3%), followed by lobular carcinoma (5%) and mixed ductal and lobular carcinoma (3.8%) (Table 1). The data set was randomly split into a model generation group (45 BC patients and 30 control subjects) and an external validation group (35 BC patients and 20 control subjects). The model generation group was used to identify the differentially expressed peptides between BC and controls, while the external validation group was used for independent validation of the peptide signatures. There was no significant difference between the two groups regarding age or the pathological tumor criteria (Table 2).
A pilot study was performed on 10 BC patients and 10 control subjects to determine the optimal type of MB for sample purification. Two profiling kits were compared: MB with hydrophobic interaction chromatography (MB-HIC8) and MB with weak cation exchange chromatography (MB-WCX). MB-HIC8 captured a larger number of peptide peaks than MB-WCX and was therefore used for the proteome fractionation. MB-HIC8 also showed better results regarding recognition capability and cross validation.
Classification models were constructed in the model generation group to differentiate BC from controls. A total of 92 distinct peaks were identified, and 33 of them differed significantly between the groups. Of those, 22 peaks were up-regulated and 11 peaks were down-regulated in BC patients compared with the control subjects (Table 3). The stack and simulated two-dimensional gel electrophoresis views of the samples are depicted in Figures 1 and 2.
ClinProTools supports four algorithms for generating classification models: Genetic Algorithm, Supervised Neural Network, Quick Classifier (QC) and Support Vector Machine. In our study, the QC model achieved the best results, with 100% recognition capability and 96.4% cross-validation accuracy. Cross-validation accuracy refers to the ability of the algorithm to assign a random sample to the correct group. In the QC algorithm, the peak areas are sorted per peak and a weighted average over all peaks is calculated. Three peptide ion signatures with m/z 1,570.31 (start mass: 1,566.09, end mass: 1,574.99), 1,897.4 (start mass: 1,892.07, end mass: 1,909.46) and 2,568.17 (start mass: 2,560.23, end mass: 2,577.52) were obtained as a proteomic profile for a cross-validation set to discriminate the BC patients from control subjects (Figures 3 and 4). ROC analysis was performed for each peak based on quantitative data of peak intensities (Figure 5).
To verify the accuracy of the established QC classification model, we performed an external validation study consisting of 35 BC patients and 20 control subjects who were not used in the model generation group. The QC model had higher external validation values than both the Supervised Neural Network (SNN) and Genetic Algorithm (GA) models. The QC model correctly classified 100% of the breast cancer samples (sensitivity) and 76.9% of the control samples (specificity).
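Each signature above is defined by a start mass and an end mass, and the classifier operates on the peak area within that window. The following Python sketch shows one simple way to compute such an area by trapezoidal integration. The integration rule, the absence of baseline handling and edge interpolation, and the synthetic spectrum are assumptions for illustration only; ClinProTools may compute peak areas differently.

```python
from typing import Dict, Sequence, Tuple

# Integration windows (start mass, end mass) reported in the text, keyed by peak m/z.
WINDOWS: Dict[float, Tuple[float, float]] = {
    1570.31: (1566.09, 1574.99),
    1897.4: (1892.07, 1909.46),
    2568.17: (2560.23, 2577.52),
}


def peak_area(mz: Sequence[float], intensity: Sequence[float],
              start: float, end: float) -> float:
    """Trapezoidal area under the spectrum, using only data points inside [start, end]."""
    pts = [(m, i) for m, i in zip(mz, intensity) if start <= m <= end]
    area = 0.0
    for (m0, i0), (m1, i1) in zip(pts, pts[1:]):
        area += (m1 - m0) * (i0 + i1) / 2
    return area


# Synthetic spectrum around m/z 1,570 (hypothetical values, not study data).
mz_axis = [1565.0, 1567.0, 1569.0, 1571.0, 1573.0, 1576.0]
signal = [0.0, 2.0, 10.0, 10.0, 2.0, 0.0]
start, end = WINDOWS[1570.31]
area_1570 = peak_area(mz_axis, signal, start, end)
print(area_1570)
```

Only the four points between m/z 1,566.09 and 1,574.99 contribute here, so the synthetic peak has an area of 44 in arbitrary units.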
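For readers less familiar with these metrics, sensitivity and specificity follow directly from the counts of correctly and incorrectly classified samples. The Python sketch below shows the arithmetic; the counts in the example are hypothetical and are not the study's validation results.

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: patients classified as cancer / as control.
    tn/fp: controls classified as control / as cancer.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both patients and controls are required")
    sensitivity = tp / (tp + fn)  # fraction of patients correctly classified
    specificity = tn / (tn + fp)  # fraction of controls correctly classified
    return sensitivity, specificity


# Hypothetical counts chosen only to illustrate the calculation.
sens, spec = sensitivity_specificity(tp=18, fn=2, tn=15, fp=5)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
```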
Two samples were used to determine the within- and between-run precision. In each profile, three peaks with different molecular masses were selected to evaluate assay precision. Within-run imprecision was determined by evaluating the coefficient of variation (CV) for each sample using 8 assays within a run; between-run imprecision was then established by carrying out 8 different assays for a sample over a period of 7 days. The peak CVs were all <4% and <10% in the within- and between-run assays, respectively (Figure 6). These values were consistent with the reproducibility data for the Protein
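The coefficient of variation used in the precision experiments is the sample standard deviation expressed as a percentage of the mean. A minimal Python sketch is given below; the eight replicate intensities are hypothetical and serve only to show the calculation.

```python
import statistics
from typing import Sequence


def cv_percent(values: Sequence[float]) -> float:
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100 * statistics.stdev(values) / mean


# Hypothetical intensities of one peak measured in 8 replicate assays.
replicates = [100.0, 102.0, 98.0, 101.0, 99.0, 103.0, 97.0, 100.0]
cv = cv_percent(replicates)
print(f"CV = {cv:.1f}%")
```

The same function applies to both designs: replicates from a single run give the within-run CV, and replicates collected on different days give the between-run CV.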

Discussion
Owing to the heterogeneous nature of BC and its different molecular subtypes, no single molecular feature per se could be a decisive diagnostic tool. Instead, multi-component molecular classifiers of either genes or proteins are suggested (Tayyari et al., 2018). Although a number of genetic panels are currently in practice for diagnostic and prognostic applications, medical management guidelines do not explicitly advocate for or against the use of these testing methods (NCCN guidelines, 2017; The American Society of Breast Surgeons Consensus Guideline, 2017). Meanwhile, BC proteomic panels are still under research and are not commercialized for clinical use. In contrast, peptide profiling has been approved for routine use in clinical microbiological laboratories for identification of microorganisms (Patel, 2015).
In this work, a case-control comparative analysis between BC patients and healthy controls was performed. Plasma proteomic profiles were obtained with MB-HIC8 fractionation followed by MALDI-TOF MS. A total of 33 peaks differed significantly between the groups. The QC model provided three peptide ion signatures (m/z 1,570.31, 1,897.4 and 2,568.17) as a proteomic profile to discriminate the BC patients from control subjects, with a recognition capability of 100% and a cross-validation accuracy of 96.4%. These peaks are worthy of further sequence determination and functional analysis. Blinded verification of the QC classification model correctly classified 100% of BC cases (sensitivity) and 76.9% of control subjects (specificity). These findings highlight the possibility of using our peptidome classification model as a sensitive and specific diagnostic tool.
During tumorigenesis proteins and peptides may be abnormally secreted, over/under-expressed, modified or degraded. These cancer-specific low-molecular-weight proteins are an indirect snapshot of the enzyme activity in tumor cells (Hajduk et al., 2016). They mostly result from specifically released proteases that process the acute phase proteins generated by the host response to the tumor (Qin and Ling, 2012).
Different proteomic technologies have identified unique proteome patterns that can diagnose BC in different clinical stages and recognize different outcomes and responses to therapy (Mazur and Pyatchanina, 2016). Furthermore, reproducibility of MS-based protein profiles for diagnosis of BC across clinical studies has been demonstrated in a systematic review (Callesen et al., 2008).
By searching the literature for studies using MALDI-TOF MS, we matched two of our peaks to known proteins by comparing the molecular masses obtained in our work with the molecular weights of identified proteins and their subunits. Villanueva et al. (2006) identified a peak with m/z 1,895.99 as a fragment of complement component 4a (C4a) by using MB-HIC8. The same study identified the peak at m/z 2,568.17 as a fragment of apolipoprotein E (ApoE). Additionally, a similar peak with m/z 1,897.4 was identified as C4a by Tiss et al. (2010) using reversed-phase pre-packed C18-ZipTips. Van den Broek et al. (2010) validated a quantitative assay for peptides generated by BC-specific exoproteases, using liquid chromatography coupled to tandem mass spectrometry, and one of these peptides was C4a. ApoE has also been reported to be significantly increased in sera of BC patients compared with healthy controls, confirming its role in promoting tumor cell growth.
Although proteomics-based research has been growing, few studies have been reported to date in Egypt, and they focused on the use of tandem MS in screening for inborn errors of metabolism (Hassan et al., 2016). Our work is the first of its type in Egypt to employ MALDI-TOF MS in BC.
The aim of MS proteomic profiling is the reproducible, rapid and inexpensive acquisition of a discriminating peptide signature that is suitable for population screening or clinical triage. Provided that sample pretreatment, MS measurement, data processing and analysis are standardized, peptide and protein profiles are highly reproducible. Standardization of preanalytical factors such as anticoagulants, temperature, freeze-thaw cycles and storage conditions is critical in MS studies because they have a significant impact on dynamic alterations of the acquired proteome (Baumann et al., 2005; Periano et al., 2016; Tsuchida et al., 2018).
In the current work we used a standardized protocol that started with the use of plasma. This was in agreement with the Human Proteome Organization, which recommended ethylenediaminetetraacetic acid (EDTA) plasma as the preferred blood specimen. This can be attributed to less ex vivo degradation and much less variability than with the protease-rich process of clotting (Rai et al., 2005). Few studies have reported peak identification lists for serum (Villanueva et al., 2006; Tiss et al., 2010) or plasma (Koomen et al., 2005), and the moderate concordance between the identified peaks shows the inherent difference between the two types of samples. Several proteomic-based studies used plasma (Periyasamy et al., 2015; Baralla et al., 2018).
Choosing the right type of MB enables the highest acquisition of proteins and peptides for further proteomic profiling analysis. Therefore, we compared the performance of two MB types and found that MB-HIC8 had better average peak numbers, higher peak intensities, and better capturing ability than MB-WCX. HIC showed good performance in previous studies (Villanueva et al., 2004; De Noo et al., 2006; Periyasamy et al., 2015) as well as good reproducibility (Baumann et al., 2005).
Several compounds have been employed as MS matrices, such as 2,5-dihydroxyacetophenone, sinapinic acid and 2,5-dihydroxybenzoic acid (Hajduk et al., 2016). We used alpha-cyano-4-hydroxycinnamic acid (HCCA), the most common MALDI-TOF MS matrix, which yields spectra of satisfactory quality in both low and high mass ranges. Once again, standardization of the type of organic solvent and of the matrix, its concentration and its pH is mandatory because of their effect on crystallization (Penno et al., 2009). HCCA has been employed by numerous research groups (Calandra et al., 2016; Kim et al., 2016; Baralla et al., 2018).
In conclusion, our study showed that MB-HIC8 fractionation followed by MALDI-TOF MS combined with ClinProTools software has high sensitivity and specificity for the identification of BC. The study was limited by the absence of sequence identification of the expressed peptides. Nevertheless, we followed standardized protocols in sample collection, pre-analytical conditions and MS analysis. Moreover, biological variables were matched for both groups, and the data were validated in an independent group not used for model generation. The control group also included age-matched females undergoing mammography and showing no aberrations. Our results need to be further confirmed in larger patient cohorts, and we recommend the construction of the protein panel in an immunoassay format, such as a multiplexed technique, to facilitate its further evaluation and validation. The next step of our study will be to correlate our findings with overall survival, recurrence-free survival and quality of life in BC patients, so that in the near future this work can contribute to personalized medicine and to the ultimate aim of decreasing the mortality rate from BC.

Funding Statement
The Alexandria Faculty of Medicine proteomic Lab has been funded by STDF capacity building project 2897. This research has been partially funded by Alexandria University Alex REP project entitled "Serum Proteome patterns of breast cancer: Potential for innovative biomarkers to aid in early diagnosis".