The midgut is the digestive apparatus of the silkworm, and we studied its proteome by nano-LC (liquid chromatography) electrospray ionization MS/MS (tandem MS). MS data were searched with the X!Tandem software under different parameter sets and validated with the Poisson model. A total of 90 proteins were identified, 79 of which were described for the first time. Among the new proteins, (i) 22 were closely related to the digestive function of the midgut, including 11 digestive enzymes secreted by the epithelium, eight proteins of the intestinal wall muscle involved in mechanical digestion and three proteins of the peritrophic membrane, which protects the epithelium from mechanical abrasion; (ii) 44 were involved in the metabolism of substances and energy; and (iii) 11 were associated with signal transduction, substance transport and the cytoskeleton.
- Poisson model
- silkworm (Bombyx mori L.)
The mulberry silkworm, Bombyx mori, domesticated over 5000 years from its wild progenitor Bombyx mandarina [1,2], is an economically important insect for silk production. As a representative lepidopteran, it is gradually being recognized as a model organism for insects and is expected to play an increasingly important role in entomology and molecular biology. Moreover, with the development of modern biotechnology, it has become an ideal bioreactor for producing recombinant proteins [3–5].
The midgut is the digestive apparatus of B. mori larvae, and it plays an important role in food digestion and nutrient assimilation for growth, development and silk production. Macromolecules such as proteins and carbohydrates from the larva's natural food, mulberry leaves, must be digested into small molecules and subsequently assimilated by midgut epithelial cells. The larval midgut is also the first natural barrier against foreign pathogens. The larval stage comprises five instars, of which the 5th is the most important, as the larva consumes a large amount of feed (approx. 85% of the total for the whole larval stage) and accumulates most of the nutrients and energy for the rest of its life, including the pupal and moth stages. The 5th instar is also a key period of transition from larva to pupa. Therefore analysis of the protein composition of the 5th-instar midgut should help us to understand the larval activities involved in digestion and assimilation of nutrients as well as the defence mechanism. Recently, the completion of the genetic physical map and draft sequence of the silkworm genome [6,7] and the construction of a genome-wide microarray have provided a solid foundation for basic research on the silkworm at the proteomics level.
Developments in MS have dramatically increased the ability to analyse complex proteomes in depth. The recently developed LTQ-Orbitrap MS has been shown to be better than the LTQ-FT at identifying protein spots. X!Tandem is open source proteomics software that can be used to find the best sequence model for a given MS/MS (tandem MS) spectrum. In the present study, we analysed the silkworm midgut proteins by using the LTQ-Orbitrap, and each MS dataset was searched with three parameter sets before being validated by the Poisson model [10–12]. We report here the identification of 90 proteins of the silkworm midgut, 79 of which are reported for the first time.
MATERIALS AND METHODS
Sample preparation and analysis of nano-LC (liquid chromatography) MS/MS
The B. mori strain P50 was provided by the Silkworm and Mulberry Station, Zhejiang University. Larvae were reared on mulberry leaves at 25°C. Midguts were dissected on the second day of the 5th instar, washed with 0.75% NaCl solution and rapidly stored at –80°C. Frozen tissues (220 mg) were disrupted by using a Sample Grinding kit (Amersham Biosciences) with 1 ml of lysis solution (8 M urea, 2 M thiourea and 4% CHAPS). The supernatant was collected and transferred to another centrifuge tube for further clean-up using a 2-D Clean-Up kit (Amersham Biosciences). The concentration of the extracted protein was measured by the Bradford method [12a]. The sample was subsequently digested with trypsin, desalted and dried by using a vacuum evaporator. Tryptic peptides were resuspended in 0.1% formic acid and loaded, in three replicate runs, on to an Ettan MDLC system (GE Healthcare) connected to an LTQ-Orbitrap MS (Thermo Electron Corporation).
Separation of tryptic peptide mixtures was achieved by nanoscale reverse-phase HPLC, in combination with online LTQ-Orbitrap. For the HPLC separation, a nano-MDLC (nano-multidimensional LC) system (Ettan MDLC; GE Healthcare) was used, employing a linear gradient of 5–45% buffer B (95% acetonitrile, 5% water and 0.1% formic acid) over 60 min. The column system consisted of a trap (0.5 mm×2 mm) and a separation column (Magic C18 AQ, 3 μm, 200 Å, 0.2 mm×150 mm), both purchased from Michrom. While column 1, trap 1 was running, column 2, trap 2 was equilibrated with buffer A (95% water, 5% acetonitrile and 0.1% formic acid) to allow continuous running of the sample through two columns.
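The gradient arithmetic can be made explicit with a short sketch. It assumes that the percentage of buffer B rises linearly with time and that the nominal acetonitrile content is a simple volume-weighted mix of buffers A and B (5% and 95% acetonitrile respectively), ignoring the formic acid and any volume-contraction effects. The function names are illustrative and are not part of any instrument software.

```python
def percent_b(t_min, start=5.0, end=45.0, duration=60.0):
    """Percentage of buffer B at time t_min for a linear gradient (clamped to the run)."""
    t = min(max(t_min, 0.0), duration)
    return start + (end - start) * t / duration


def percent_acetonitrile(pct_b, acn_in_a=5.0, acn_in_b=95.0):
    """Nominal acetonitrile content (%) of the mixed eluent; assumes ideal mixing."""
    frac_b = pct_b / 100.0
    return acn_in_a * (1.0 - frac_b) + acn_in_b * frac_b


if __name__ == "__main__":
    for t in (0, 30, 60):
        b = percent_b(t)
        print(f"t = {t:2d} min: {b:.1f}% B, about {percent_acetonitrile(b):.1f}% acetonitrile")
```

Under these assumptions the eluent runs from roughly 9.5% to 45.5% acetonitrile over the 60 min gradient.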
The mass spectrometer was operated in the data-dependent mode to switch automatically between Orbitrap MS and Orbitrap MS/MS (MS2) acquisition. Survey full-scan MS spectra (from m/z 200 to 2000) were acquired in the Orbitrap with resolution R=60000 at m/z 400. The most intense ions (up to five, depending on signal intensity) were sequentially isolated for fragmentation; fragment ions were recorded in the Orbitrap with resolution R=15000 at m/z 400.
For accurate mass measurements, the lock mass option was enabled in both the MS and the MS/MS mode and the PCM (polydimethylcyclosiloxane) ions generated in the electrospray process from ambient air [protonated (Si(CH3)2O)6; m/z=445.120025] were used for internal recalibration in real time. For a single SIM (selected ion monitoring) scan injection of the lock mass into the C-trap, the lock mass ‘ion gain’ was set at 10% of the target value of the full mass spectrum. When calibrating in the MS/MS mode, the ion at 429.088735 (PCM with neutral methane loss) was used instead for recalibration.
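The effect of lock-mass recalibration can be illustrated with a minimal sketch. It uses the two reference m/z values quoted above and a single multiplicative correction factor, which is a simplifying assumption made here for illustration; the instrument's actual real-time recalibration procedure is not described in the text and may differ. The observed values in the example are invented.

```python
LOCK_MASS_MS = 445.120025    # protonated (Si(CH3)2O)6, used in MS mode
LOCK_MASS_MSMS = 429.088735  # PCM with neutral methane loss, used in MS/MS mode


def ppm_error(observed_mz, reference_mz):
    """Mass error of an observed m/z relative to a reference, in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def recalibrate(mz_values, observed_lock_mz, reference_lock_mz):
    """Rescale m/z values so that the observed lock mass maps on to its reference.

    Assumption: a single multiplicative correction is adequate for the scan.
    """
    factor = reference_lock_mz / observed_lock_mz
    return [mz * factor for mz in mz_values]


if __name__ == "__main__":
    observed_lock = 445.121000  # hypothetical reading of the lock mass
    print(f"lock-mass error: {ppm_error(observed_lock, LOCK_MASS_MS):.2f} ppm")
    print(recalibrate([400.2000, 445.121000, 800.4000], observed_lock, LOCK_MASS_MS))
```

In this invented example the lock mass is read about 2.2 ppm too high, and every m/z in the scan is scaled down by the same relative amount.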
Target ions already selected for MS/MS were dynamically excluded for 180 s. General MS conditions were: electrospray voltage, 1.8 kV; no sheath or auxiliary gas flow; ion transfer tube temperature, 200°C; collision gas pressure, 1.3 mTorr; normalized collision energy, 35% for MS2; ion selection threshold, 500 counts for MS2. An activation q-value of 0.25 and an activation time of 30 ms were applied for MS2 acquisitions.
Protein identification using X!Tandem and high-confidence proteins selection through the Poisson model
MS data were analysed using the X!Tandem open source package [10,13]. Raw data acquired as described above were submitted to X!Tandem along with a FASTA-formatted file of the silkworm protein database (http://silkworm.genomics.org.cn/jsp/data.jsp). Searches were performed with three different parameter sets, named parameter 1, parameter 2 and parameter 3, to find the best model, and the Poisson model was used to assess the likelihood of false identification [11–13]. The frequency of false peptide matches, μ, was estimated with the Poisson model under the constraint that the number of false matches predicted cannot exceed the total number of matches observed. Thus μ was calculated as the total number of observed peptide matches divided by the total number of amino acids in the search database. The mean number of matches, λ, expected at random for a protein of length L is μL. The protein length-specific probability, Prand, that M or more matches are observed is:
$$P_{\mathrm{rand}} = 1 - \sum_{k=0}^{M-1} \frac{e^{-\lambda}\lambda^{k}}{k!}$$
Moreover, the expected number of matches (E) is:
$$E = N_{\mathrm{db}}\,P_{\mathrm{rand}}$$
where Ndb is the number of sequences in the database. The confidence, C, that we have identified the sequence from which the spectral data were derived and not one of the E false positives is:
$$C = \frac{1}{1 + E}$$
High-confidence ORFs (open reading frames) are defined as those with a confidence score of at least 0.5 under this Poisson model.
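The Poisson screening described above can be sketched in a few lines. The sketch assumes that Prand is the upper tail of a Poisson distribution with mean λ = μL, that E = Ndb × Prand and that C = 1/(1 + E); these forms follow from the definitions in the text but are stated here as assumptions. The function name and all numerical inputs in the example are hypothetical and are not taken from the study.

```python
import math


def poisson_confidence(total_matches, db_residues, n_db, length, m):
    """Return (p_rand, e, c) for a protein of `length` residues with `m` peptide matches.

    total_matches: total number of observed peptide matches
    db_residues:   total number of amino acids in the search database
    n_db:          number of sequences in the database
    """
    mu = total_matches / db_residues  # frequency of false peptide matches
    lam = mu * length                 # mean number of random matches for this protein
    # P(X >= m) = 1 - sum_{k=0}^{m-1} exp(-lam) * lam**k / k!
    below = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(m))
    p_rand = max(0.0, 1.0 - below)
    e = n_db * p_rand                 # expected number of random hits in the database
    c = 1.0 / (1.0 + e)               # confidence that the hit is not one of them
    return p_rand, e, c


if __name__ == "__main__":
    # Hypothetical inputs: 1000 matches, 10 million residues, 20000 sequences.
    for m in (2, 3):
        p_rand, e, c = poisson_confidence(1000, 10_000_000, 20_000, length=500, m=m)
        verdict = "high confidence" if c >= 0.5 else "rejected"
        print(f"M = {m}: Prand = {p_rand:.3g}, E = {e:.3g}, C = {c:.3f} ({verdict})")
```

With these invented inputs, a 500-residue protein supported by two peptides falls well below the 0.5 cut-off, whereas the same protein supported by three peptides passes it, which illustrates why a fixed peptide count is a weaker criterion than a length-aware confidence score.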
The output was obtained through X!Tandem and the Poisson model. Cross-referenced UniGene database IDs (see Supplementary Table S1 at http://www.bioscirep.org/bsr/029/bsr0290363add.htm) were used to interrogate the corresponding EST (expressed sequence tag) expression profiles. Our protein hits were categorized according to GO (gene ontology) annotation using the Wego software (http://wego.genomics.org.cn/cgibin/wego/index.pl) and InterProScan (http://www.ebi.ac.uk/IterProScan/) annotation. The putative proteins were annotated by using pblastp, and their pI and molecular mass values were analysed.
MS reproducibility and searching model analysis with different parameters
The sample was analysed by LTQ-Orbitrap MS three times (midgut 1, midgut 2 and midgut 3), and each MS dataset was searched against the silkworm protein database predicted from the genome by using X!Tandem with parameters 1–3, with 0.1 as the maximum valid expectation value. The resulting data (Supplementary Table S1) were screened with the Poisson model with a probability score of 0.5 as the cut-off, and 90 proteins were obtained (see Supplementary Table S2 at http://www.bioscirep.org/bsr/029/bsr0290363add.htm). Among these proteins, 79 had a probability score greater than 0.9, and 11 had a probability score between 0.6 and 0.9 with good MS/MS spectra. Of the 90 proteins, 89 were identified with three or more peptides and one was identified with two peptides (see Table 4 and Supplementary Table S2), so our results are conservative. We compared the three searching methods (three parameter sets) to find the best model; the results are shown in Tables 1–3. The model of parameter 1 was the best for midgut 1 and midgut 2, whereas the model of parameter 3 was the best for midgut 3.
Based on the annotated information (Figure 1 and see Supplementary Table S3 at http://www.bioscirep.org/bsr/029/bsr0290363add.htm), the proteins were classified into four categories: (i) proteins associated with special tissues (proteins related to midgut digestion), (ii) proteins associated with substance and energy metabolism, (iii) other functional proteins and (iv) unnamed proteins (Table 4).
The 90 proteins were compared with midgut proteins found previously, and 79 of them were identified in the midgut for the first time. Of the 79 proteins, 44 are associated with metabolism, accounting for almost 50% (Figure 2). The largest group among them comprises proteins associated with protein metabolism, accounting for 59% of the metabolism-associated proteins (Figure 3). This group includes the proteins associated with protein synthesis and folding (Table 4).
These proteins showed diverse characteristics, including varying pI and molecular mass values (Table 4). Of the proteins identified in the present study, 43% have a pI greater than 9 (Figure 4A) and 38% have a molecular mass between 10000 and 30000 Da (Figure 4B).
Cross-referencing our identified proteins with the known EST distribution (available from the UniGene database) showed that at least one mRNA was expressed in each of ten different tissue types. The tissue distribution of every protein ID identified in the present paper is given in Supplementary Table S4 (http://www.bioscirep.org/bsr/029/bsr0290363add.htm) and illustrated in Figure 5.
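The two-stage screening of the replicate runs can be sketched as follows. The record layout, the protein names and the rule for merging replicates (keeping the highest-confidence record per protein) are assumptions made for illustration, as is treating the probability score and the Poisson confidence as the same quantity; the text gives only the two cut-offs (expectation value at most 0.1, probability score at least 0.5).

```python
def screen_hits(hits, max_expect=0.1, min_confidence=0.5):
    """Keep hits that pass both the expectation-value and the Poisson cut-off."""
    return [
        h for h in hits
        if h["expect"] <= max_expect and h["confidence"] >= min_confidence
    ]


def merge_replicates(replicates):
    """Combine replicate runs, keeping the highest-confidence record per protein."""
    best = {}
    for hits in replicates:
        for h in hits:
            pid = h["protein"]
            if pid not in best or h["confidence"] > best[pid]["confidence"]:
                best[pid] = h
    return best


# Invented records for two replicate runs; not data from the study.
midgut_1 = [
    {"protein": "prot_A", "expect": 0.001, "confidence": 0.95},
    {"protein": "prot_B", "expect": 0.5, "confidence": 0.99},  # fails expectation cut-off
]
midgut_2 = [
    {"protein": "prot_A", "expect": 0.01, "confidence": 0.97},
    {"protein": "prot_C", "expect": 0.05, "confidence": 0.30},  # fails confidence cut-off
]

accepted = merge_replicates(screen_hits(run) for run in (midgut_1, midgut_2))

if __name__ == "__main__":
    for pid, hit in sorted(accepted.items()):
        print(pid, hit["confidence"])
```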
In previous papers on the midgut proteome, the methods used for identifying proteins had some disadvantages. In the midgut studies of Zhang et al. and Kajiwara et al., the protein database used was not the silkworm protein database [14,15]. Comparison of theoretical and experimental masses is the basis of MS-based identification, so the use of an inappropriate protein database is the main reason why those results are not stringent. Hou et al. reported 32 proteins in the two-dimensional electrophoresis image of the silkworm midgut, but only 26 of them are different proteins, owing to duplicates. In addition, Hou et al. did not report a mass tolerance, and the mass tolerance selected in the other two studies of the midgut proteome is too large [14–16]. For example, the mass tolerances for the enzymatic peptides are 1 and 0.5 Da in Zhang et al. and Kajiwara et al. respectively (the maximum fragment mass tolerance in our three parameter sets is 0.2 Da, as shown in Supplementary Table S1). Compared with these earlier studies, our identification method is very strict and the result is satisfactory.
In our study, shotgun proteomics was used for protein identification: midgut proteins were proteolysed to peptides, which were subjected directly to MS analysis via CID (collision-induced dissociation) of single peptides. Two-dimensional LC separation was employed prior to MS/MS analysis, and X!Tandem and the Poisson model were used to analyse the MS data and to acquire protein information. This method has many advantages, e.g. it can identify a large number of proteins relatively quickly and can analyse membrane proteins directly. However, it also has a disadvantage in the random nature of ion selection for CID: poor reproducibility in picking the same peptides for CID in replicate MS/MS runs impacts directly on the reproducibility of protein identification. Another factor affecting reproducibility is the choice of search algorithm, as each has its own strengths and weaknesses. To achieve the best results, the CID experiments were repeated three times and the MS data were searched against the database by using X!Tandem with three parameter sets; parameters 1 and 3 gave the better results. The number of peptide hits is only indicative of the confidence of the identified proteins. In previous studies, proteins were accepted when identified by two or more peptides [20,21], but our study showed that this is not a rigorous statistical approach. When the results were validated with the Poisson model, most of the proteins with only two peptide hits were rejected, and only Bmb033352 was validated (Table 4).
We identified a total of 90 putative proteins, of which 79 (87%) are new. Compared with the 26 proteins found by Hou et al., the newly found proteins cover almost all cellular functions. Table 4 shows that the new proteins include proteins associated with the special tissues of the midgut: 11 new digestive enzymes from the epithelium, eight new proteins related to mechanical digestion from muscle tissue and three new proteins from the peritrophic membrane. Besides these tissue-specific proteins, the newly found proteins include 44 involved in the metabolism of substances and energy and 11 related to signal transduction, substance transport and the cytoskeleton. The detailed biological functions of the proteins are described in the following passages.
The digestive process in the silkworm midgut includes mechanical digestion through muscular action and chemical digestion through digestive enzymes. Peristalsis of the midgut musculature, driven by muscle contraction, aids digestion. Muscle contraction is caused by sliding between the thick and thin filaments of the myofibril. According to Scott and Jeffrey, myosin and paramyosin are the most abundant thick filament proteins in invertebrates. Myosin exists as a hexamer of two heavy chains, two alkali light chains and two regulatory light chains. The heavy chain is divided into an N-terminal globular head and a C-terminal coiled-coil tail. Myosin heavy-chain proteins identified in the present study include Bm015063, Bm015062, Bmb014764, Bmb015061, Bmb014765 and Bmb002610; among them, Bmb014764, Bmb015061 and Bmb002610 contain the myosin tail, and Bmb015062 contains the myosin globular head, which supplies energy for muscle contraction. In addition, Bmb002610, which contains a myosin tail, also supplies energy according to InterPro annotation (Supplementary Table S3). Bmb014765 is a kind of paramyosin that plays an unexpected role in myoblast fusion, myofibril assembly and muscle contraction. The thin filaments are primarily composed of actin (Bmb012212 and Bmb012614) and tropomyosin (Bm004766). In addition to these proteins, Bmb024911 (calponin) is a calcium-binding protein that can inhibit the ATPase activity of myosin in smooth muscle and can be regulated through phosphorylation by a protein kinase. Bmb021176 is suggested by InterPro annotation (Supplementary Table S3) to be a type of arginine kinase that is bound to the actin of the thin filament.
The digestive enzymes of the midgut comprise ectoenzymes, present in the cavum intestinale, and endoenzymes, mainly present in the microvilli of the columnar cells. Most of the enzymes identified in our analysis are endoenzymes, except for chymotrypsin-like (Bmb003747 and Bmb018754) and insulinase-like (Bmb020698) enzymes. These three enzymes belong to the serine proteases; among them, chymotrypsin-like enzymes prefer to cleave on the carbonyl side of aromatic residues, such as phenylalanine and tyrosine, and the insulinase-like enzyme cleaves peptides on the carbonyl side of the basic amino acids arginine or lysine. The chymotrypsin-like enzyme also has strong antiviral activity against BmNPV. However, ectoenzymes or proteinases in the cavum intestinale cannot fully break proteins down to amino acids, and endoenzymes are needed to complete the process. For example, Bmb002795 (membrane alanine aminopeptidase) is an aminopeptidase that catalyses the removal of single amino acids from the N-terminus of small peptides and thereby plays a role in their final digestion, and Bmb028877 (3-hydroxyisobutyrate dehydrogenase) is one of the important enzymes related to valine metabolism. The carbohydrate-digesting enzymes identified here include α-glucosidase (Bmb004877), which has similar enzymatic activity to γ-amylase and participates in glycoprotein processing in the endoplasmic reticulum, and glycoside hydrolases (Bmb025436 and Bmb015353), which are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. The proteins associated with lipid metabolism identified here are mainly apolipoproteins; for example, Bmb000670 has sterol carrier activity and Bmb009635 can transport α-tocopherol. Bmb004877 participates in the regulation of lipid metabolism, is involved in neurodevelopment, and transports nutrients and vitamins. This is consistent with the observed characteristics of the midgut: the larval midgut is not the main tissue for synthesizing lipid or for digesting lipid completely, and the lipid that the silkworm assimilates is transported to the fat tissue.
The peritrophic membrane protects the midgut epithelium from mechanical abrasion during food digestion and from infection by pathogens [29–31]. It is made up of protein (47%), chitin (47%) and polysaccharide (5%). The proteins associated with the peritrophic membrane include Bmb008223 (laminin B), Bmb017923 (type IV procollagen) and Bmb003222 (chitin-based larval cuticle). In addition, Bmb008243 has many functional domains, so it can also be found in the epithelium (Table 4).
We identified 19 ribosomal proteins and two elongation factors (Table 4; Figure 2), which are the main proteins responsible for protein translation. The proteins of the large ribosomal subunit are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure, and they also interact with multiple RNA elements [32,33]. We identified 13 proteins of the large subunit, including Bmb026270, Bmb032730, Bmb020135, Bmb001465, Bmb022410, Bmb023836, Bmb035628b, Bmb031937, Bmb030760, Bmb014252, Bmb024069, Bmb039472 and Bmb037600. Protein S4 (Bmb10338) initiates assembly of the small subunit and is located at junctions of five and four RNA helices respectively [32,33]. The proteins of the small subunit that we identified include Bmb030045, Bmb006711, Bmb033626, Bmb025378 and Bmb036419.
Newly synthesized proteins can be folded with the help of folding helpers such as chaperonins and isomerases. Chaperone proteins help some proteins to fold during synthesis and to refold during denaturing stress, thereby preventing protein misfolding and aggregation. Hsps (heat-shock proteins) are a family of molecular chaperone proteins that promote the folding, assembly, translocation and secretion of newly synthesized polypeptides. Acting as molecular chaperones, they also participate in the removal or repair of denatured proteins. Different Hsps target different proteins. Hsp70 (Bmb011274) binds to nascent proteins and helps them to reach their final native state. Hsp90 (Bmb035865 and Bmb036625), however, does not act in nascent protein folding. Instead, it targets signal transduction proteins such as steroid hormone receptors and signalling kinases. Besides the chaperone proteins, the other folding helpers are isomerases, including PDI (protein disulfide-isomerase) and peptidylprolyl cis–trans isomerase. PDI (Bmb022619 and Bmb032503) accelerates the formation of the proper disulfide bonds and switches its conformation from dimer to tetramer when functioning as a foldase. Peptidylprolyl cis–trans isomerase (Bmb010862) catalyses the 180° rotation about the peptidyl-prolyl bond to accelerate the folding process and exhibits chaperone-like activity towards muscle creatine kinase.
Some of the proteins identified are associated with the metabolism of nucleic acids. For example, histone H3 (Bmb012751 and Bmb028559) forms part of the eukaryotic nucleosome octamer core, around which approx. 146 bp of DNA is wound. Bmb004885 (nuclear receptor co-activator 7) and Bmb007243 (a protein associated with the mitotic cell cycle G2/M transition DNA damage checkpoint) are related to cell division. Some proteins are related to carbohydrate metabolism, for example Bmb027946 (lactate or malate dehydrogenase), Bmb020504 (lactate or malate dehydrogenase), Bmb006175 [GAPDH (glyceraldehyde-3-phosphate dehydrogenase)], Bmb005299 [PDH (pyruvate dehydrogenase)], Bmb019850 (glucose/ribitol dehydrogenase), Bmb007149 (citrate synthase) and Bmb025249 (glutamic-oxaloacetic transaminase). Some proteins are responsible for fatty acid catabolism, such as Bmb033769 (enoyl-CoA hydratase/isomerase) and Bmb011880 (acetyl-CoA C-acyltransferase). Bmb039014 [FAMeT (farnesoic acid O-methyltransferase)] catalyses the conversion of farnesoic acid into MF (methylfarnesoate) in the mandibular organ of crustaceans. Previous studies indicated that eyestalk neuropeptides may negatively regulate MF biosynthesis through FAMeT in the mandibular organ. In insects, the regulation of JH (juvenile hormone) III production in the corpora allata and the allatostatin-mediated inhibition of JH III biosynthesis are facilitated by FAMeT.
Some proteins are associated with energy metabolism in the midgut. For example, together with cytochrome oxidase (Bmb039215), cytochrome c (Bmb004448), which binds Fe2+, transfers electrons from QH2 to O2. Bmb016550 (cytochrome P450 4d2) plays a major role in the synthesis and degradation of insect hormones (ecdysteroids and juvenoids) and in the activation or detoxification of chemicals such as plant toxins and insecticides [42–45]. The F1Fo-ATP synthase complex exists only in the form of dimeric and oligomeric complexes in the inner mitochondrial membrane. It consists of Fo subunits, responsible for proton transport, and F1 subunits, which perform ATP synthesis/hydrolysis. According to Table 4, Bmb011628, Bmb008836, Bmb016644 and Bmb005542 are all components of ATP synthase and are capable of both proton transport and ATP binding, whereas Bmb027800 can bind ATP but is not capable of proton transport. In Saccharomyces cerevisiae, the N-terminal hydrophobic region of subunit e [cf. Bmb033352 (ATP synthase E)] plays an important role in the subunit e-dependent processes of mitochondrial DNA maintenance, modulation of mitochondrial morphology and stabilization of the dimer-specific Fo subunits, subunits g and k, whereas the C-terminal coiled-coil region of subunit e functions to stabilize the dimeric form of detergent-solubilized ATP synthase complexes.
There are other functional proteins, such as signal transduction proteins, transport proteins and cytoskeletal proteins. Bmb028514 is a kind of adenine nucleotide translocator, a target protein of nitric oxide, peroxynitrite and 4-hydroxynonenal that is involved in the pathological demise of cells via apoptosis. Transferrin (Bmb000857), a glycoprotein, is a blood plasma protein for iron ion delivery that binds iron tightly but reversibly. According to InterPro annotation (Supplementary Table S3), Bmb013324 (porin) not only transports anions but also has voltage-dependent ion-selective channel activity. Protein kinases catalyse the phosphorylation of serine, threonine or tyrosine residues of target proteins, which changes the conformation and function of the target protein. Bmb020823 (serine/threonine protein kinase) and Bmb004994 (tyrosine protein kinase) play an important role in many cellular processes such as division, proliferation, apoptosis and differentiation. The microtubule, one of the basic components of the eukaryotic cytoskeleton, is made of tubulin, a dimeric protein composed of α-tubulin and β-tubulin. We identified one kind of α-tubulin (Bmb004930) and three kinds of β-tubulin (Bmb008789, Bmb020953 and Bmb003475). Tau proteins are microtubule-associated proteins that are involved in microtubule assembly and stabilization, cellular shape, motility and signal transduction. Bmb000655 is a kind of tau protein. According to InterPro annotation (Supplementary Table S3), it contains a proline-rich region, so the protein can be phosphorylated by many kinases, which changes its conformation and affects its ability to bind microtubules, the tubulin-binding domain and other cytoskeletal elements. In addition, tau mRNA is expressed predominantly in neurons, particularly in their axons. According to InterPro annotation (Supplementary Table S3), Bmb019690 is an axon guidance receptor, but tau does not appear to be an essential protein, since transgenic mice lacking tau appear to develop a normal nervous system with only mild alterations in the structure of certain small-calibre axons.
This work was supported by the Basic Research Programme [grant number 2005CB121003], the National High-Tech R&D Programme [grant number 863], and the New-Century Training Programme Foundation for the Talents (Ministry of Education, People's Republic of China) [grant number NCET-06-0524].
Abbreviations: CID, collision-induced dissociation; EST, expressed sequence tag; FAMeT, farnesoic acid O-methyltransferase; GAPDH, glyceraldehyde-3-phosphate dehydrogenase; GO, gene ontology; Hsp, heat-shock protein; JH, juvenile hormone; LC, liquid chromatography; MDLC, multidimensional LC; MF, methylfarnesoate; MS/MS, tandem MS; PCM, polydimethylcyclosiloxane; PDH, pyruvate dehydrogenase; PDI, protein disulfide-isomerase
- © The Authors Journal compilation © 2009 Biochemical Society