Last month Stanford University researcher Mike Snyder caused something of a stir by publishing in Cell initial results from an ongoing study tracking his “Personal Omics Profile” – a collection of data including Snyder’s genome sequence as well as multiple transcriptomic, proteomic, metabolomic, and autoantibody profiles taken over a period of 14 months (GWDN 3/15/2012).
Snyder, though, isn’t the only scientist serving as a subject in a personal ‘omics project.
Last July, Attila Csordas, a bioinformatician at the European Bioinformatics Institute’s Proteomics Identifications Database – PRIDE – launched what he has termed his Personal Proteome project, an effort to measure and track changes in his salivary proteome over time.
Collaborating with Bioproximity, a Springfield, Va.-based contract research organization that specializes in proteomics, Csordas has thus far collected salivary samples at four time points, three of which he and the company have analyzed via mass spec, confidently identifying roughly 1,000 proteins in each. He has made the data from these runs publicly available on Bioproximity’s Proteome Cluster and on the PRIDE database and is following the effort’s progress on his Personal Proteomics blog.
While the notion of “personal proteomics” is obviously interesting for its potential clinical implications, at this point the project is primarily an exploratory effort, Csordas said.
“At this stage this is strictly a research project,” he told ProteoMonitor in an e-mail this week. “We are trying to figure out the basic bioinformatics of personal proteomics and the different ways analysis can go with this type of data, used [either] by itself or mapped to other types of ‘omics data.”
“We are pushing the early-adopter limits to see what can be achieved with mass spectrometry in the context of individual human proteomes and whether the technology is [able to] finally [provide] valuable information at a reasonably good cost/benefit ratio,” he said.
The project was initially sparked, Csordas said, by a promotion he came across on Bioproximity’s website in which the company offered global shotgun proteomic analysis of any sample type for a flat rate of $1,000.
Interested, Csordas emailed Bioproximity CEO Brian Balgley to see if he might want to collaborate on an open-ended, longitudinal analysis of his personal proteome. Balgley agreed that the project sounded interesting, and several weeks later Csordas packed a saliva sample on dry ice and mailed it via FedEx from his home in Cambridge, UK, to Bioproximity’s Virginia laboratory.
“We’ve done three time points so far, and we’ll continue doing it as long as we’re both willing to do so,” Balgley told ProteoMonitor. “[Bioproximity is] certainly willing to keep doing it. It’s an interesting project, and there’s a lot more to be learned.”
The company, Balgley said, provides the analyses free of charge as part of the collaboration.
The researchers are using a MudPIT set-up run on a Thermo Scientific LTQ Velos – a workflow that, Csordas said, “is fairly established and [which Bioproximity’s] lab has been using for years.” This, he noted, has meant that data analysis – rather than data acquisition – has proven the most challenging aspect of the work.
“The challenges really have been presented at the bioinformatics end,” he said. “What search engines can be used with what kind of search parameters? How do you balance between specificity and sensitivity, coverage, and repeatability over time?”
In addition, because Csordas is using saliva samples, the proteomics of his mouth’s microbial population also comes into play. This offers potentially interesting insights into the oral microbiome, but it also significantly increases the size of the peptide library the researchers must search against, he said, noting that the microbiome FASTA file they’ve used contains more than 3.7 million proteins – and that’s before including the human library and a library of common contaminants.
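To illustrate how the search space grows when several protein libraries are combined, here is a minimal Python sketch that concatenates FASTA files into a single search database and counts the entries. It is not the researchers’ actual pipeline; the function names and any file names used with them are hypothetical, and the sketch assumes plain-text FASTA files in which every sequence record begins with a “>” header line.

```python
from pathlib import Path
from typing import Iterable


def count_fasta_entries(path: Path) -> int:
    """Count sequence records by counting header lines that start with '>'."""
    with path.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def merge_fasta(sources: Iterable[Path], destination: Path) -> int:
    """Concatenate FASTA files into one database; return the total entry count.

    Hypothetical helper for illustration only: it does no de-duplication and
    adds no decoy sequences, both of which a real search workflow may need.
    """
    total = 0
    with destination.open("w") as out:
        for source in sources:
            with source.open() as handle:
                for line in handle:
                    if line.startswith(">"):
                        total += 1
                    # Guard against a source file that lacks a final newline,
                    # which would otherwise fuse two records together.
                    out.write(line if line.endswith("\n") else line + "\n")
    return total
```

Called on, say, a microbiome library, a human library, and a contaminant library, `merge_fasta` would return the size of the combined database, which is the number the search engine ultimately has to contend with.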
Right now, Balgley said, the researchers are primarily working on building a baseline against which they can gauge observed protein expression differences.
“We’re basically trying to figure out what [proteins are] observable and what are the best ways of observing those proteins,” he said. “We’re looking at variations we’re getting in our measurements and trying to see how much of that is due to differences in sample collection and how much is due to other factors we might not have considered.”
“There are some significant variations in the amount of the proteins we’ve seen,” Balgley said, noting that this could depend on a variety of factors, including “when [a sample] is collected during the day; how long it’s been since [Csordas] has eaten; whether he’s brushed his teeth or not.”
“I know one of the time points we got was after he had visited the dentist. [For] one of the time points he was recovering from a cold. So we do see a lot of variation,” he said.
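One simple way to put a number on this kind of time-point-to-time-point variation is the coefficient of variation (standard deviation divided by mean) for each protein. The Python sketch below is an illustration only, not the method the researchers describe using; it assumes some per-protein abundance measure such as spectral counts is available for each time point, and the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def coefficient_of_variation(values: List[float]) -> float:
    """Sample standard deviation divided by the mean (needs >= 2 values)."""
    average = mean(values)
    if average == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(values) / average


def rank_by_variation(abundances: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (protein, CV) pairs sorted from most to least variable.

    `abundances` maps a protein identifier to its measurements, one per
    time point. Proteins with a zero mean are skipped because their CV
    is undefined.
    """
    scored = [
        (protein, coefficient_of_variation(values))
        for protein, values in abundances.items()
        if mean(values) != 0
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A ranking like this only shows which proteins vary most; separating sample-collection effects from biological ones would still require replicate samples or recorded collection conditions.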
The researchers are also looking into various sample-prep and enrichment strategies to see if changes to these might help them better explore different parts of Csordas’ proteome.
“We’re looking, [for instance,] at enriching for exosomes, glycoproteins, and what kind of differences we see in protein populations based on that,” Balgley said. “We’re looking at what the differences are between a 30-minute assay, a two-hour assay, a 12-hour assay. We’re looking at how isoelectric focusing can help us improve the confidence of the proteins identified, especially for the microbial proteins.”
The researchers are sharing all data from the project on Bioproximity’s Proteome Cluster tool and the PRIDE database and, Csordas said, aim to “make the data as transparent as is possible.”
“It’s one thing to have some data available for download from a study on some servers with minimal metadata,” he said. Ideally, though, researchers would offer “the raw files, search engine output files, peak list files, search parameter/configuration files, FASTA files, and metadata information all mapped together and made accessible for re-analyzing and further data mining.
“With the combined Proteome Cluster-PRIDE release of the data, we are pretty close to this latter scenario,” he said.
Moving forward, the researchers intend to continue adding new time points in hopes of establishing “a solid bioinformatic foundation” for their analysis, Csordas said.
They are also working to present the project’s findings in papers and at conferences and recently had a poster on the work accepted to the Exploring Human Host-Microbiome Interactions in Health and Disease conference taking place in Cambridge in May.
The project has also picked up support, Csordas said, from some of his UK colleagues, including EBI bioinformaticians Henning Hermjakob, head of the Proteomics Services Team, and Johannes Griss, both of whom have helped with some of the bioinformatics work; and Cambridge Veterinary School researchers Jeff Huang and Robin Franklin, who have contributed supplies to the effort.