IT WAS THE MOMENT OF TRUTH. WE HAD WAITED ALMOST THREE years to see if Celera would be able to make good its claim that it had sequenced the human genome faster, more cheaply and more completely than the public project. Now, in an article to appear in the journal Science, it was announcing that the goal was reached. ‘A 2.91 billion base pair consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method,’ the article began. With our own results due to be published the same week as Celera’s, we were naturally eager to compare the two, and as previously agreed we exchanged papers with Celera shortly before the date of the joint public announcement, February 12, 2001. As we read for the first time through the detail of what the Celera team had done, it became clearer and clearer that the whole-genome shotgun strategy had not lived up to the claims made for it.
We couldn’t quite believe it. We had fully expected their sequence to be better than ours, given that they had access to all our data and we knew that they were using it. But they were publishing a sequence that seemed overall no better than the publicly released
sequence, and which depended heavily on it. Though this dependence was glossed over, it was there in black and white for anyone who chose to read the paper carefully enough.
Celera had been launched with the promise of quarterly releases of data culminating in publication in a leading academic journal; but as the Human Genome Project pursued its own draft strategy, the first undertaking was quietly dropped. As a result, the publication would be the first opportunity offered to anyone other than Celera’s paying subscribers to evaluate what the company had actually achieved.
For a paper of such importance, researchers traditionally choose one of two journals: Science, the journal of the American Association for the Advancement of Science, and the British-owned journal Nature published by Macmillan. Both journals are international weeklies, but Science has a higher circulation within the United States and so is often preferred by United States scientists. Both had clear policies on publishing sequencing papers. As sequencing projects had grown larger and larger it had become quite impossible and pointless for journals to publish the sequences in print. So the authors of genome papers had to agree to deposit their data in one of the public databases (which in practice meant all of them, as GenBank, the European Molecular Biology Laboratory Data Library and the DNA Data Bank of Japan shared their holdings), where anyone could verify the findings outlined in the paper. Data in the public databases are free to anyone not only to read, but to download and analyze without restriction. A commercial company can even package and sell them if it so chooses. The databases act as universally available resources with the aim of advancing understanding and discovery for the benefit of all.
It was becoming quite obvious in the course of the HGP’s abortive negotiations with Celera at the end of 1999 that the company had no intention of depositing its human sequence data in the public databases, as it had done (though somewhat under pressure) with the
Drosophila data. Instead, as it announced in a January press release, it would make these data available (exactly how was not clear at this stage, but from its own website seemed a likely possibility) with strings attached. One of the conditions was that users would have to agree not to redistribute the data. Commercial companies would also have to pay to make use of the Celera sequence.
Would Science or Nature accept a paper on conditions that fell so far short of their own guidelines? If either did, it would be a very serious matter for science as a whole. Scientific journals are critical to the integrity of the whole enterprise of science. They decide what does and what does not get published. Any article submitted for publication goes out to experts to be refereed, so that acceptance means that your peers have judged the work to be original and its conclusions valid. Of course, there are any number of examples of the peer review system failing—of brilliant work being rejected, or dubious results accepted—but, flawed though it is, the system more or less works. For the author, a published paper is a vital addition to his or her professional worth: quality and quantity of publications have become the main criteria by which a scientist is judged. Once a paper is out it becomes another brick in the wall of science, for others to build on or challenge. In some ways it’s more like a termite mound, each of us industriously constructing little bits of the castle, adding extra chambers, repairing damage, knowing that later generations may build on our work or demolish it and build something different. But the enterprise succeeds only if our handiwork is in the open for all to see and make of it what they will. With so much at stake, journals have to act with the utmost probity in their dealings with scientific authors.
Part of the agreement with Celera, negotiated in Francis Collins’s beer and pizza sessions with Craig Venter and Ari Patrinos, was that we would publish our papers simultaneously. The working assumption was that this also meant in the same journal, probably Science. Mark Patterson, one of the Nature editors, suggested to me
that if the sequence paper did indeed go to Science, perhaps a complementary set of papers on mapping could go to Nature. Then both journals could share in a historic announcement. It was a good compromise that found general favor, and the two journals agreed to publish all the papers in the same week, whenever that should be, so that the whole effort would make a nice big splash. But we needed to know what Science (or indeed, Nature if Science rejected Celera’s paper) was going to do about the data release issue. It seemed unlikely that Nature would bend its rules, but we weren’t so sure about Science. We knew that the journal was negotiating with Celera. If it wasn’t prepared to change its rules, there would be nothing to negotiate about. If it was going to make a special case of the Celera data, then I for one did not see how we could agree to publish in the same journal. This was an issue that went well beyond the antagonism between the public and private genome projects—it was fundamental to the practice of science.
Once they heard what was going on, leading figures from throughout the scientific community wrote to voice their concern. Aaron Klug of the Royal Society, Bruce Alberts of the National Academy of Sciences and Harold Varmus, ex-head of the National Institutes of Health, all joined the discussion. But the editors of Science—Floyd Bloom was succeeded by Don Kennedy during 2000—argued that when publishing private data they needed to recognize a company’s right to protect its investment in sequencing. Kennedy responded to the concerns of the scientific community by claiming, not very effectively, that making the data available at Celera’s website met the magazine’s condition of deposition in a ‘public database.’ At the same time he came up with precedents for companies publishing in Science without revealing all their data—but none of them was a genomics company. The point about sequence data is that they are not just the raw material for the substance of the article, they are the substance of the article. And the argument for having all genomic data in one place is very specific. If
you want to do any kind of serious analysis of the genome—finding genes, control regions or long-range features of any kind—you need to be able to access all the data at once. If Celera could argue for keeping its data separate, then others could do the same—in which case you would end up with a ‘balkanization’ of the genome sequence that would destroy its very purpose as a tool for discovery. Biologists would have to consult one source after another—and with the non-redistribution condition, they would not be able to incorporate the private data into the publicly available genome databases such as Ensembl that were being set up to provide easy access to the sequence and related data.
The political controversy surrounding Celera’s submission was an unwelcome distraction from the much more important task for us of putting together our own paper on the draft sequence. The publication of the two papers would be a watershed in the history of the genome. The 26 June event had been a great day, but it would mean nothing to posterity—it left no trace behind but news reports relaying the hyperbole of politicians and scientists alike. It was dishonest to the extent that both sides had had to modify their definitions in order to say the working draft was complete. But the publications would be unambiguous. They would have to give the facts in full, making it clear just how far short of complete each sequence was. They would have been scrutinized by critical but informed colleagues, to ensure that vague points were clarified, weak claims strengthened or eliminated. Most importantly, they would pass into the scientific literature to be consulted again and again by scientists in future years and decades.
We began to think about our paper early in 2000, and Eric Lander drafted a first outline. It would include all the history and background to the project, so that anyone outside the field would really understand how it was done. But the publication would make no difference to what people did every day in the twenty sequencing centers worldwide whose work was being documented—they
would keep adding, every hour, minute and second, to the total number of bases sequenced, and every twenty-four hours that new information would arrive in the databases. So we would have to choose some essentially arbitrary cut-off point at which we would freeze the data, to provide a snapshot for our analysis. We did this for the 26 June announcement, but for the publication we would be able to choose a later date and so have a more complete set of data. This was an extreme case, but it’s actually true of a lot of science that the exact publication point is arbitrary. Most research is work in progress, with new questions arising from every answer and the concept of a breakthrough more often than not an artificial one. But having a clear written record in terms of milestones is nevertheless tremendously valuable.
The sequence was accumulating in the databases. Assembling the sequence was a far from straightforward task—because many of the data were based on unfinished clones with gaps and errors in them, because sequence was being collected from more than twenty different sources and because the high proportion of repeat sequence in the genome was a continuing headache. Jim Kent at the University of California at Santa Cruz, who was then still a graduate student (but one with a previous life in the computer animation industry), spent a month writing a huge piece of code called GigAssembler to do the job, which ran successfully for the first time four days before the 26 June announcement of the draft. Ewan Birney, Tim Hubbard and their colleagues at Hinxton were steadily improving the capacity of Ensembl to predict genes using the full range of bioinformatics tools. Along with the map at St. Louis, all these sources of genome information were freely available online, and would be described in the paper. But as time went on, ambitions for the scope of the paper grew, with some wanting it to include much more detailed analysis of the genome. On the whole, I was for an earlier and briefer account, but I could see there was a case for more comprehensive treatment. I was vaguely aware that Francis
Collins was continuing to negotiate with Craig about how the publications were handled, and one element of this discussion seemed to involve a later publication date that would give us more time.
Eric Lander started to put together an international team of bioinformatics people, to form what he called the ‘hardcore analysis group.’ Coordinated through regular meetings, conference calls and a blizzard of e-mails, this group steadily produced the figures, tables and diagrams that would reveal what the sequence was really telling us.
The paper was a key topic for discussion at the eighth international strategy meeting, held at Evry near Paris in September 2000. It would reflect the international nature of the Human Genome Project in both authorship and content. There would be no names on the title page—the author would be simply the International Human Genome Sequencing Consortium. Twenty centers would be listed as members of the consortium: twelve from the United States, five from Europe (one each from the U.K. and France, and three from Germany), two from Japan, and one from China. Other centers that had sequenced small regions were listed in the acknowledgements. We also wanted to discuss with our international colleagues the still unresolved position with respect to Science and data release. There was general agreement that if Science compromised unduly on data release, we would move the paper to Nature. After the main meeting, Francis Collins convened Eric Lander, Bob Waterston and me to discuss progress. We agreed that the four of us would take responsibility for seeing the paper through to completion. Eric, who writes well and enjoys doing so, would take on the final editing and harmonization.
A month later I joined the hardcore analysis group in Philadelphia as they finalized their results. It was indeed exciting to see the analysis emerging, the ‘landscape of the genome’, in the phrase that was on everyone’s lips. On the Saturday afternoon that
same weekend, Eric and Francis went off to the annual meeting of the American Society of Human Genetics to make their speeches and accept the Human Genome Project’s share of an award for sequencing the human genome. The other share was to be accepted by Craig Venter. When they came back they were subdued, as well they might be. It was so extraordinary. Everyone could see the public data; nobody could see Celera’s. Yet an award was being given on the strength of the company’s statements. Had this ever happened before, we asked ourselves at the Sanger Centre when we heard of it? That an internationally reputable society would give an award for research that was unpublished and unseen? This had nothing to do with what one believed to be the actual facts of the case; that was a separate issue. The shocking thing was that at this stage there was no evidence on the basis of which they could make the award at all. First science by press release, and now awards by press release!
Two days earlier, Jane Rogers had seen Francis at the conference, and had asked him if he didn’t feel that the situation was unethical. ‘Jane,’ he had replied, ‘there are no ethics in this.’
The four organizers of our paper—Francis Collins, Eric Lander, Bob Waterston, and I—were united in our opposition to Science’s position on the release of the Celera data. Expressing the unanimous view of the Sanger Centre, I was for moving to Nature immediately; if the Celera data were not going into the public databases that was all we needed to know. But those at the American end were more cautious. They were equally unhappy about any retreat from the usual data release conditions, but once again, political conditions made it difficult to take a decision that clearly implied criticism of United States industry. They felt they could not act until we had seen the wording of the conditions that were to be placed on access to Celera’s data, and continued to hope there would be time to negotiate. ‘The devil is in the detail,’ they said. On the contrary, thought I, the devil is in the principle.
The months went by—the goal of September publication had long since gone by the board—and at the end of November we still did not know exactly what sort of deal Celera had struck with Science. Michael Ashburner, the Cambridge Drosophila geneticist, joint head of the European Bioinformatics Institute and a staunch supporter of unrestricted data release, wrote an outraged letter to every member of the magazine’s board of reviewing editors, of which he himself had been a member until not long before. He told them he was refusing to review any more articles for Science, or to submit any to the magazine if Science went ahead with the Celera paper on the existing basis, and he urged his former colleagues to follow his example and resign. Copies of the letter circulated rapidly on e-mail, and the issue became a topic of heated debate in lab coffee rooms, although surprisingly it barely surfaced in the press.
Eventually it was all over very quickly. We finally got to see the text of the material transfer agreement that commercial companies would have to sign if they wanted to look at Celera’s data, and the restrictions on academic users. The latter would be able to download 1 megabase per week by clicking on the Celera website, subject to a non-redistribution clause; if they wanted more they would have to get a signature from a senior member of their institution guaranteeing that the data would not be redistributed. The restriction on redistribution meant in effect that no public sequencing lab could even look at the data without laying itself open to a potential lawsuit. Commercial companies would have to pay to use the data, and were also bound by an agreement not to redistribute them; companies Eric consulted suggested that many of them would not be able to sign. There was no time left to discuss any modifications to these agreements, which we found completely unacceptable. After a rapid circular to all the members of the sequencing consortium to confirm their approval, Eric wrote on behalf of all of us to the editors at Science and told them that we were submitting our paper to Nature instead. It finally went off on 7 December.
The same day Daphne and I flew to New York to visit our daughter Ingrid and her husband Paul Pavlidis. During her Ph.D. in Berkeley, Ingrid had worked as a volunteer at the Exploratorium in San Francisco and discovered a passion for science communication. She and Paul had married and moved to New York, where she now had a job developing exhibits in bioscience for the New York Hall of Science, while Paul was working in bioinformatics at Columbia University. They had a university apartment nearby. As evening fell on that day, Daphne, Ingrid and I left their apartment to go shopping, turning away from the cathedral and down the street to the west, crossing Broadway with its shops and bustle and on down to the quiet boulevard above Riverside Park. We walked south, the lights of New Jersey shining across the dark waters of the Hudson River. Above them the blue of the sky deepened, with pink streaks of cloud down river. Drifts of dry leaves crunched under our feet, the row of empty benches that had lost their occupants for the winter snaked on under the great trees.
Ingrid was heavily and beautifully pregnant with her first child—our first grandchild. Walking with the two women, I thought of the genetic events that bound us together so closely, of the three generations of human beings through whom one strand of the common thread of humanity was being transmitted, and of the events, both stirring and sad, of the years of endeavor to decipher the code.
On our return to the apartment I opened my laptop and logged on to my e-mail. The download of the last message, over the phone line from Hinxton, took for ever. But at last there it was, from Mark Guyer at the National Human Genome Research Institute to Carina Dennis at Nature, with the huge attachment and the header: ‘Carina: On behalf of the International Human Genome Sequencing Consortium, we are very pleased to be able to submit the attached manuscript for publication in Nature.’ We had taken a stand for freedom of information and integrity of scientific publication. The scientific world would be made aware that what Science had done
was unacceptable to us, to our advisers and (we guessed) to most of our fellow scientists. We would encounter criticism from some who felt we had overreacted, but we could face them with confidence knowing that in the eyes of the majority we had done the right thing.
Later, I walked out in the freezing air again, helping Paul to collect a new sofa. I was chattering away about the submission and how important it was. Then I checked myself. Why? Why was I justifying the action to Paul? What I found, with him as with everyone who was not directly involved, was that nobody knew what was going on—or didn’t believe it. And I reflected, not for the first time, on the power of public relations. Like those who can afford expensive lawyers, those who can afford expensive PR usually get their way—or at least, exert influence beyond what is justified. The penetrative, unremitting power of Celera’s PR had so convinced the newspapers, and through them everyone, including many of my fellow scientists, that my own truths counted for nothing. Once a particular point of view has taken hold in the public imagination, it’s extremely hard to offset it. The only recourse is to compete on the PR front in the first place. I find that a profoundly depressing thought. Is it a fantasy that simply being honest will in the end be powerful enough?
In the middle of the frantic effort to get the paper together, I finally stepped down from the directorship of the Sanger Centre. As I had expected, the Wellcome Trust had been in no hurry to act on the notice of resignation I had delivered almost two years before. But eventually they accepted that, with the sequencing of the human genome well on target, it was not a crazy idea to find a director with different aims and skills who would extend the Sanger Centre into the era of functional genomics—using the genome to understand biology. They found Allan Bradley in Texas. Allan had begun his career in genetics in Cambridge, but had moved to Baylor College of
Medicine in Houston in 1987. There he had carried out pioneering studies of development in the mouse, using gene knockout techniques to explore how the process is controlled. Still only in his early forties, Allan was well placed to encourage new perspectives while sequencing continued on other species including the mouse, the zebrafish and many pathogens.
Allan was to take over at the beginning of October 2000. On the Saturday following the Paris meeting Bob Waterston, Rick Wilson, John McPherson and I travelled back to London by Eurostar, and Bob and I went on to Stapleford and a warm welcome from Daphne. Bob seemed a little preoccupied, and kept scribbling on a big yellow pad. On Monday we went into the lab, and again everything was a bit strange. I knew there was to be some kind of farewell do, but no one had told me the details and I thought maybe there would be a few drinks at the end of the day. I tried to set up meetings for the afternoon, but kept being met with shifty evasions. At the end of the morning I was seized and taken to the Garden Room (now the James Watson Pavilion) where all the senior staff had gathered for a splendid lunch.
From there we went to the auditorium and I was immensely moved as one by one my closest colleagues came to the lectern. First—and now I knew what the yellow pad was about—was Bob. To my amazement, the first story he told was the story of Syosset: how we had stood on the station platform and realized the enormity of what we had undertaken, and how I had said I heard the prison door closing. We had never discussed that conversation since, and yet both of us had retained the memory as a pivotal moment in our relationship and the story of the genome. A parade of other key figures followed. After more drinks, I was marched back to the auditorium and discovered that the metaphorical curtain was about to go up on a full-scale pantomime. Christmas pantomimes had been a regular event since the Sanger Centre started; John Collins and Ian Dunham were the principal scriptwriters. This time they had surpassed themselves. ‘King John and the Knights of the Holy Genome’ featured all the main characters in the human sequencing story, but owed more to Monty Python than Thomas Malory. Two minutes or so into the show I was hauled on to the stage to play the lead, with Jane Rogers handing me flash cards for my lines. With interludes from an ABBA lookalike singing group (‘He is the sequence king…’) and a Big Brother-style video (‘I think John should go because…’), it was the most amazing honor: a personal Sanger Centre pantomime.
I’d had plenty of moments when I wished I wasn’t director, but never because of my colleagues. The ethos among them is fantastic. There’s a huge sense of team spirit. At the same time everyone is serious about what matters, and works with single-minded commitment. If anyone had ever had any doubts about that, they were instantly dispelled at the end of October, when we got our share of the autumn floods that submerged much of Britain in 2000. Our architect-designed lab is built on the flood plain of the Cam. ‘Don’t worry, it’ll only flood every hundred years,’ the architect had said. Four years after the building was completed we had two feet of water in the basement of the West Pavilion, which we had filled with sequencing machines in order to scale up our human genome effort. Almost as one, the sequencing teams swung into action and moved all the machines to a higher floor. Allan, who had been in post for less than a month and had just left on a visit to Houston, returned to find that within two days sequence was flowing once more. There was no panic, no argument, just teamwork and dedication.
Was I sorry to be leaving that behind? Well, for a start I wasn’t leaving, just moving to a smaller office—until the paper came out I would still be busy with the Human Genome Project, and I had unfinished business with the worm. I’d still be able to enjoy chats with everyone. But I’d always been a reluctant director; I’d come close to resigning before over issues that I felt to be important. So not being director any more was a relief more than anything. People
kept asking what I was going to do next. But I seemed to be as busy as ever.
After our decision to move the paper to Nature, early February 2001 became the target date for publication. Despite our decision not to submit to Science, the two journals upheld the agreement to publish the Celera and HGP papers simultaneously. When we heard that the publication date had been moved back by another three weeks, we were furious. We were ready for early February, and so was Nature. The delay gave us no advantage at all—but it could provide a huge advantage to Celera over and above getting its paper ready. The American Association for the Advancement of Science, which owned and published Science, was holding its huge mediafest of an annual meeting from 15 to 20 February. There was to be a special weekend symposium on the genome organized by Craig in association with Science; Francis would also be giving a keynote lecture. Under the terms of the usual embargo agreement Nature imposed on its authors and the scientific press, we would not be able to give a preview of our findings. But what Celera might come up with was anyone’s guess. I could not see it as anything other than a spoiling exercise. Fortunately we were able to prevail on Nature and Science to settle on the week before the AAAS meeting for publication, with a joint press announcement on 12 February and an embargo on press publicity until then.
It was during the build-up to the press announcement that we finally got a chance to look at the Celera paper. What was immediately obvious was that Celera had made no attempt at all—or rather, no attempt that they were prepared to show publicly—to assemble their own sequence data alone. So what was Celera publishing, if not their own data and assembly? As they had announced early in 2000, they had abandoned shotgun sequencing when they reached just over fivefold coverage—about half their original target—and instead incorporated the public data into their data set. For the
paper, they had then produced two alternative assemblies of the data. One they described as a whole-genome shotgun assembly.
This included an extra 2.9-fold coverage from the publicly available Human Genome Project consensus sequence, the data shredded in the computer into pieces the size of normal sequence reads, which the Celera authors referred to as ‘faux reads.’ But these reads were not selected at random—they formed a fully overlapping set. So even though they were chopped up and scrambled before being fed to the Celera assembler, they still contained all the information necessary to put themselves back together in the right order, and recover the sequence generated by the HGP. Furthermore, the ‘whole genome assembly’ also involved a process described as ‘external gap walking,’ which meant filling gaps with assembled BAC data from the HGP. The paper itself revealed that this is what was done, but only if you knew where to look—the whole impression you got from the introduction was that the assembly was a triumphant vindication of the whole-genome shotgun approach.
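The point about the 'faux reads' can be made concrete with a toy sketch (mine, not Celera's actual pipeline, and with made-up function names): if you shred an already-assembled consensus into a fully overlapping tiling of read-sized pieces, even a naive greedy overlap assembler can stitch the original back together, because every adjacent pair of pieces shares a long exact overlap—the information content of the assembly survives the shredding.

```python
# Toy illustration: shredding a consensus into a fully overlapping set of
# "faux reads" preserves all the information needed to reconstruct it.
# This is a sketch under simplifying assumptions (exact overlaps, no repeats),
# not a model of any real assembler.

def shred(consensus, read_len=10, step=5):
    """Cut a sequence into overlapping read-length pieces (a full tiling)."""
    reads = [consensus[i:i + read_len]
             for i in range(0, len(consensus) - read_len + 1, step)]
    # Make sure the tail of the sequence is covered too.
    if (len(consensus) - read_len) % step:
        reads.append(consensus[-read_len:])
    return reads

def greedy_reassemble(reads):
    """Naive greedy overlap assembly: repeatedly merge the pair of pieces
    with the longest exact suffix/prefix overlap. O(n^3) -- fine for a toy.
    It succeeds here only because the shredded reads tile the consensus
    completely, which is exactly the point made in the text."""
    def overlap(a, b):
        # Longest k such that the last k characters of a equal the first k of b.
        for k in range(min(len(a), len(b)), 0, -1):
            if a.endswith(b[:k]):
                return k
        return 0

    pieces = list(reads)
    while len(pieces) > 1:
        best = (0, 0, 1)
        for i in range(len(pieces)):
            for j in range(len(pieces)):
                if i != j:
                    k = overlap(pieces[i], pieces[j])
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = pieces[i] + pieces[j][k:]
        pieces = [p for n, p in enumerate(pieces) if n not in (i, j)] + [merged]
    return pieces[0]
```

Even if the pieces are scrambled before assembly, as the faux reads were, the overlaps dictate a unique reconstruction (so long as the sequence lacks long repeats), which is why feeding shredded HGP data to the Celera assembler amounted to handing it the public assembly.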
The second version of Celera’s assembly, which they called a ‘compartmentalized’ assembly, made explicit use of the HGP map to put the sequence in order. There was no ambiguity this time—the compartmentalized assembly depended on the work of the public project, and the authors acknowledged this. What was interesting was that they went on to use the compartmentalized assembly, and not the ‘whole-genome shotgun’ assembly, as the basis for the analysis of the genome that took up the rest of the paper.
In addition, because of the public availability of our draft, they were able to publish a comparison of the two products as part of their paper. Not surprisingly, this comparison was not presented as being in our favor. We would not get an opportunity to make our own public comparisons until the day of the announcement on 12 February.
Eric Lander, Richard Durbin and Phil Green all independently analyzed the information and came to similar conclusions. There
was no evidence in the paper that the whole-genome assembly had worked adequately, and the compartmentalized assembly seemed overall comparable with what our own effort had achieved. Phil’s view was particularly telling: he had kept out of the consortium in order to maintain his independence, although of course we all used his assembler, phrap, to put our clones together. Phil had refereed our paper for Nature. Referees are conventionally anonymous, but Phil made a point of signing his referee’s report. His initial review of the international consortium’s paper consisted of fourteen closely typed pages of comments. We’d been kept busy all over Christmas dealing with his points. None of it was destructive; he’d just pointed out where we’d been sloppy in reporting and needed to do some tidying up.
Despite disparaging remarks from Craig Venter about ‘pissing contests’, our intensive analysis wasn’t carried out in the interests of my-contig-is-bigger-than-your-contig point-scoring. Rather, it was the first step in discussing objectively, rather than by press release, what had been achieved by the various approaches. From the evidence of the paper it appeared that without the public project there would not only have been no publicly available draft human genome by 2000—there would have been no draft genome at all. More seriously, the chances of ever having a fully finished sequence would have been very slim indeed.
There was no chance that Francis Collins would be able to point out these uncomfortable truths at the joint press conference with Celera held in Washington on 12 February. But Eric Lander had already given an on-the-record, embargoed briefing to American journalists almost as soon as he saw the Celera paper. And I had every intention of telling the British press exactly what had and had not been achieved at the London press conference Nature was organizing at the Wellcome Trust headquarters. Just to be on the safe side—we thought it highly likely that the Celera publicity machine would come up with some kind of diversionary tactic—
Richard Durbin and I also briefed the senior London science correspondents the previous Wednesday. I gave an introduction, then Richard took them through slides showing the assembly methods and showed a comparison table. He presented our conclusions: that the whole-genome shotgun had not worked as claimed; that Celera had used more of our data than it admitted; and that it had come up with a product that was in some respects better than ours and in some respects worse. Our additional objective was to make sure that nobody thought the sequence was finished, even though the Celera authors studiously avoided calling their assembly a draft.
As soon as it was over the journalists all headed off to Lyon to attend Biovision, a big public conference on biotechnology. Craig Venter was speaking, and it transpired that they all had interviews with him set up for the Friday afternoon. On Saturday morning Robin McKie rang me at home. Robin is the scientific correspondent of the Sunday Observer, and had not been invited to our Wednesday briefing or given any of the advance information distributed to the press by Nature, which was embargoed for Monday. Nature tends not to include Sunday newspapers in its distribution of embargoed advance information, because they are notorious for disregarding embargoes—Robin himself had jumped the gun with the story of Dolly the cloned sheep four years previously. He argued that he had researched the story from other sources and so was not bound by Nature’s embargo.
On this occasion, fortunately, Daphne picked up the phone. I was still in bed, hoping to get rid of a streaming cold that had plagued me all day on Friday. She had a chat with Robin and came back saying, ‘He’s talked to Craig, and says it’s very important that the public side answers.’ Daphne had handled the situation adroitly, and had gleaned from Robin that Craig had made a statement. I didn’t phone Robin back—that would have been a breach of the embargo on my part—but phoned up the Wellcome Trust’s press office team and told them something was up.
Late that evening we learned that Robin’s interview with Craig was to be the front-page story of the next day’s Observer, and was already out on the web. Science magazine declared that the embargo had been breached and released its paper. Nature had no choice but to do likewise. The biology editor, Richard Gallagher, was very fed up. We all felt that the story did not amount to a breach of the embargo. The only substantial information from the papers that it contained was the estimate of the total number of human genes. Both groups had put the number in the range 26,000–40,000, less than half the figure of 100,000 that had been bandied around until very recently. It wasn’t even a very new point—estimates of gene numbers had been coming down steadily, and certainly other papers, such as Ian Dunham’s on chromosome 22, had been published giving this kind of figure a year earlier.
What Craig did in the Observer interview, however, was to use the number to make a bogus philosophical point: that the small number of genes implied a much greater role for environment in determining our natures. No longer, he said, would it be realistic to assume that there were specific genes for behavioral traits such as thrill seeking, intelligence or athletic ability. ‘We simply do not have enough genes for this idea of biological determinism to be right,’ Craig was quoted as saying. McKie called the finding ‘a radical breakthrough in our understanding of human behavior.’ This hyperbole was entirely unjustified. The fact that we have only twice as many genes as a worm or a fruit fly is extremely interesting biologically, but it adds nothing at all to the nature-nurture debate. We already knew that genes, especially control genes, combine their actions in ways that we have barely begun to understand—a theme to which I will come back later.
Of course, there were recriminations about the Observer’s action. Robin said that he interviewed Craig Venter privately in Lyon and that Craig understood that the Observer was not bound by the embargo. Celera denied this. But given that Celera has always given
the impression that its management of public relations is second to none, it seems highly unlikely that Craig did not know what he was doing when he talked to Robin McKie. One of the journalists at Eric’s press briefing had taped it and given the tape to Craig, so Celera knew that we were going to criticize the whole-genome shotgun results. Intentionally or not, the Observer certainly provided a diversion. I had to spend most of that Sunday giving interviews, and a recurring line of questioning was on the nature-nurture issue. (A day or two later I found myself answering the same question live on a radio program broadcast in Bogota.)
And so to the Monday press conference. To our relief the lecture theatre at the Wellcome Trust’s headquarters was full, even though many of the papers had run stories already. We really had just one story that we wanted to ram home: that, thanks to the publicly funded Human Genome Project, the human genome was available to all, including scientists in developing countries. In his opening statement, Mike Dexter pointed out that scientists from the developing world had accessed the databases more than 300,000 times in the previous few months. I explained how Celera had made use of our data, but only so that I could point out that we simply would not have the sequence had it not been for the public project. Meanwhile, in Washington Bob Waterston and Eric Lander were doing their best to make the same points while sharing a platform with Craig. Eric thanked Bob for creating the genome map, saying pointedly, ‘I am sure I speak for all of us when I say how grateful we are.’ And Bob made the data release point: ‘Most importantly,’ he said, ‘we have made this available to the world without any constraints. No patents filed on the raw sequences. No licenses. No documents to be signed. All you need is an internet connection.’
I’m not sure how much of it really sank in. Most of the coverage presented our arguments as just an extension of the bad-mouthing that had gone on the previous year, and Craig was eager to agree that it was all just sour grapes because he had ‘stolen our thunder.’
With a few honorable exceptions—Aaron Zitner of the Los Angeles Times was one, and the New Scientist another—hardly anyone acknowledged that the principle of free release was a moral imperative on our side. If free release was mentioned at all, it was presented as one of two equally valid alternative choices.
The struggle to keep genomic data public and free continued throughout 2001. A consortium consisting of three private companies, six institutes of the National Institutes of Health, and the Wellcome Trust had been formed in October 2000 to produce a draft sequence of the genome of the laboratory mouse. The mouse genome is important both in its own right and because it helps enormously with the interpretation of the human genome: wherever you have a match between mouse and human sequence, you are likely to learn something about how nature makes a mammal. By May 2001 the Mouse Sequencing Consortium, which was releasing its data freely (as we did for the human sequence) on its course towards a finished mouse sequence, had a draft with three-fold coverage.
Having taken advantage, as it was entitled to do, of the publicly released human sequence, Celera was able to switch to sequencing the mouse at an even earlier stage. In July 2001 it released a press statement saying that it had assembled a draft with five-fold coverage, two-fold more than the public-private consortium. Consequently some labs paid for access to the Celera mouse database, though at the same time they declared that they were eager for the freely available finished sequence that would eventually be produced by the consortium. As long as there are both freely released and proprietary sequencing efforts running in parallel there will inevitably be more data in a proprietary database (since it contains data from both) until each genome in turn is finished. Again, as in the case of the human, there was a danger that this position might offer a route towards monopoly: Craig Venter called (unsuccessfully) for public funding for mouse sequencing to be terminated just as he did previously for the human.
Other than the press release there was no publication on Celera’s mouse genome, so nobody could interpret how the company had assembled its data. Meanwhile, Bob Waterston, Eric Lander, and I wrote a brief analysis of the extent to which Celera had used the HGP data to generate both assemblies of the human genome reported in its Science paper. Aaron Klug, as an independent observer, communicated our paper to the Proceedings of the National Academy of Sciences, and it was published in March 2002.
The following issue carried a vigorous rebuttal from Gene Myers, Craig Venter, and their colleagues, in which they attempted to show that HGP data played little part in their assembly process. They didn't dispute most of the analysis in our paper, but focused narrowly on the initial step of the process. It is true that due to the peculiarities of the Celera assembler, some of the assembly information inherent in the 'faux reads' taken from the public data would have been lost at this initial stage. However, enough would remain to provide useful short-range continuity, and no one appears to dispute that the later steps, 'external gap walking' and anchoring assembled fragments to the genome, necessarily use information from the HGP. Alongside the Celera reply, PNAS ran a second commentary on our paper from Phil Green. Phil not only agreed with our analysis, but went much further in questioning the claims of speed and efficiency made for the whole-genome shotgun approach.
This is really the issue. Nobody is questioning the fact that the whole-genome shotgun can get you a lot of data, and it has always been used for simple genomes. But if you want to end up with a fully finished sequence you need some way to close all the gaps and resolve ambiguities. For large, complex genomes such as the human, a clone map is essential given the current state of the art. The Mouse Sequencing Consortium adopted a hybrid strategy, combining whole-genome shotgun assembly with clone-based sequencing.
Many commentators, cued by Celera, said that the race accelerated
the timetable of the Human Genome Project—some said by as much as ten years, which is certainly incorrect. My own conclusion is that it will have made little or no difference to the date of the finished product (2003), though it did result in an intermediate draft being formally announced in 2000, in addition to the informal release of unfinished sequence that was happening anyway.
My reason for saying this is quite specific. In 1996 the Sanger Centre was funded to sequence one-sixth of the genome to fully finished standard by 2002. In 1998 we were on course, and by May 2001 we had already finished a sixth, in spite of the disruption caused by the draft activity and the associated PR distractions. When we were discussing this the other day I said to Eric Lander, ‘I hardly think that you would have sat on your hands and watched us do it!’ And that, of course, is the point: the internal competition that I’ve described within the Human Genome Project is not as destructive as it may sound. On the contrary, it ensured that we were all racing along, just as Bob Waterston and I had raced on the worm. So if we at the Sanger Centre were doing one-sixth by 2001 or 2002, you can be sure that Bob would try to do the same and so would Eric. So the pressure would have been on, a bit more funding would have come through, and we would be in the same position as we are today—or better, because of the lack of distractions.
But who cares? It doesn’t do to get too upset about all this. In the fullness of time, maybe it won’t matter that we did poorly on the PR front. What matters is that, as Eric puts it, ‘the good guys won’—we produced a sequence, put it in the public domain and made it impossible for any individual or company to control access to it. And we will go on and finish it to the high standard we set ourselves at the beginning.
Maynard Olson and Phil Green, who opposed the draft strategy in 1998, still worry about this. In an accompanying article in the same issue of Nature as the sequence paper, Maynard said he feared the publication of the draft would lessen the motivation of people to
finish the job. ‘Each new round of press conferences announcing that the human genome has been sequenced saps the morale of those who must come into work each day and do what they read in the newspapers has already been done,’ he wrote. The truth is that both the finishers at the Sanger Centre and their funders the Wellcome Trust are totally committed. They don’t need to be told that they’ve got to do it, and they don’t need to be told that there’s a danger they might not do it. It is happening. That’s one of the reasons why I felt OK about stepping down as director. If the funding had been in any way insecure I would have hung on. It may be that some of the other sequencing labs will move on to other species rather than putting the effort into finishing the human, but the remainder will have no trouble finishing everything by 2003. Indeed, as I write nine-tenths of the genome is up to finished standard. And I am absolutely with Maynard Olson in believing that finishing is essential. Errors need to be cleaned up and gaps closed if we are to compile an accurate catalogue of genes and unravel the mysteries that lurk in all that so-called junk.
By 2003 a fully finished sequence will become universally available, a work of reference as indispensable to biologists as a dictionary is to a writer. In the meantime, what have we learned from the draft?
Let’s first go back to the issue of the number of genes. We think that the human genome codes for some 30,000–40,000 genes, only about twice as many as it takes to make the nematode worm or the fruit fly. But the claim that this should change our thinking, because 30,000 is too few to code for all our characteristics, is based on the assumption that each characteristic, from hair color to your level of interest in football, depends exclusively on one corresponding gene. The implication of the Observer article seemed to be that higher estimates, of, say, 100,000, would be sufficient to account for humanity in this way, so that reducing the estimate to a third meant that we should attribute greater importance to nurture in our development.
The notion of 'one character, one gene' is largely false, but leaving that on one side for a moment, let's first note that the effect of both genes and nurture is already self-evident. Most powerfully, studies of identical twins show remarkable correlations in many characteristics, both physical attributes and behavioral tendencies, even when the twins are brought up separately. But the twins are not the same individual. Not all steps in development are tightly controlled, so there are opportunities for random variation and effects of the environment. The result is that the twins end up with minor physical differences, but more importantly they have their own experiences, thoughts and ideas. They are unique human beings, even though they share the same genome.
The question is: Should it make a difference to our thinking whether we have 30,000 or 100,000 genes? I think the answer is no.
First, we simply know too little about how genes actually work. Having the complete gene set, which we are approaching through the genome sequence, will be of great help, but each gene has now to be painstakingly examined to identify its role. The gene list will be constantly scrutinized by people who are looking at systems in the body. The list provides them with help in finding all the components of the system—and that’s all. In the long run this approach, taking apart the machine mechanism by mechanism, will narrow our area of ignorance so that we really can evaluate directly the relative roles of heredity and environment in a new way. The more precisely we understand how the machine works intrinsically, the better we can deduce the contribution of extrinsic factors. But we have a long way to go.
Nevertheless, at first sight the conclusion that it takes only twice as many genes to make a human as it does to make a tiny worm or a fly seems to lead very naturally to the conclusion that the genes are fairly boring and don’t really have a lot to do with the essence of being a human. After all, runs the argument, a human is obviously so much more complex than a fly that twice as many genes just won’t
be enough. When I hear this argument I tend to hear also a subtext to the effect that humans, and the speaker in particular, are so much more important than flies that twice as many genes won’t do.
What tends to be forgotten is management. Many of the extra genes that are added in going from worm or fly to human appear to be control genes, and they come in hierarchies which are only just beginning to be worked out. So in principle, by elaborating the control mechanisms, a huge range of tissue types can be specified and a very complex structure can be built up. It’s a bit like the expansion of an organization: although some of us wish it wasn’t so, an essential part of building up a large organization is the introduction of more complex management structures and the employment of more executives. The control genes are the executives of biological development, and they allow complex and diverse structures to be built from units that are fundamentally quite similar. Many control genes operate by switching groups of other genes on or off; in addition, a single gene typically gives rise to a variety of products. One gene can have two or more differently spliced RNA transcripts, acting as templates for different proteins, and enzymes can further modify the protein after it has been synthesized; all of these processes offer further opportunities for control.
But the most remarkable thing is the power of genes in combination. Consider a gene that can exist in two variants, or alleles: A and B. This single gene will then allow us to specify two cell types, A and B. Now add a second gene that can exist as C or D. Together the two genes will allow us to specify four cell types: AC, AD, BC and BD. Three genes can specify eight types; four, sixteen; five, thirty-two; ten genes, over 1,000 types; twenty genes, over a million. The conclusion is that just a few dozen genes, if applied in a hierarchical executive fashion, can provide an immense amount of additional complexity. So the addition of an extra 15,000 over the worm's allowance allows plenty of room for maneuver in the construction of a human being. In real life things will not be as neat as this simple discussion suggests, but by thinking in this way we can avoid being trapped by falsely limiting assumptions.
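The doubling argument above is easy to verify for oneself. The short Python sketch below (not from the original text; the function name `n_types` is my own) computes the number of distinct combinations for n two-allele genes:

```python
# Each gene with two alleles doubles the number of distinct
# combinations ("cell types" in the argument): n genes -> 2**n types.
def n_types(n_genes: int, alleles_per_gene: int = 2) -> int:
    return alleles_per_gene ** n_genes

for n in (1, 2, 3, 4, 5, 10, 20):
    print(n, n_types(n))
# 10 genes give 1,024 types ("over 1,000");
# 20 genes give 1,048,576 ("over a million").
```

Exponential growth is doing all the work here: each added gene multiplies, rather than adds to, the number of possible types.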
By the same line of argument, we can tackle the concern that an extra 15,000 genes are too few to explain the range of human inheritance. Indeed they would be too few if each gene were solely responsible for the specification of one recognizable human characteristic. But we have long known that this is the exception rather than the rule. In particular, many of the subtle human attributes about which we care most—intelligence, athleticism, beauty, wisdom, musicality and so on—are clearly not heritable in the same way as hair or eye color, for example, leading some to the conclusion that they are not heritable at all.
But think again about the power of different alleles combining in different ways. Extending our range to thirty-three genes allows us to generate over 8 billion different types, enough to give every living person a unique label. Three hundred genes will provide as many different types as there are particles in the universe or seconds since time began—vastly more than will ever be needed uniquely to identify every person that has ever lived or will live. And this is on the parsimonious assumption that each gene comes in only two forms, whereas in fact there are numerous alleles of every gene. No wonder identical twins are so special: we shall never see two identical human genomes by chance, but only by the splitting of the fertilized ovum.
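A quick back-of-envelope check, again assuming (as the simple argument does) only two alleles per gene and taking the conventional rough figure of 10^80 particles in the observable universe, bears out these numbers:

```python
# 33 two-allele genes already give more combinations than there
# are people alive; 300 give vastly more than ~1e80 particles
# estimated in the observable universe.
combos_33 = 2 ** 33            # 8,589,934,592: over 8 billion
combos_300 = 2 ** 300          # roughly 2 x 10**90
print(combos_33)
print(combos_300 > 10 ** 80)   # True
```

Since real genes typically have many more than two alleles, these are conservative lower bounds.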
Taken together, combinations and hierarchical control allow us to see in principle how both the complexity and the diversity of humans can be specified by a relatively small number of genes. All this should give us optimism in moving forward to find out exactly how it all works; but it should also caution us against jumping to quick conclusions. The complexity of control, overlaid by the unique experience of each individual, means that we must continue to treat every human as unique and special, and not imagine that we can
predict the course of a human life other than in broad statistical terms.
The genes are the starting point for a human being, and we should think of them as offering potentials rather than exercising constraints. Many fear that genetic information about individuals will be used to discriminate against them, and this is a concern that has to be taken seriously. Insurers are pressing to be allowed to use the results of genetic tests taken by their clients in deciding whether or not to issue policies; in future both insurers and employers, if the law permitted, might make genetic testing a condition of issuing a policy or offering a job. It is immensely important that we do not make presumptions about a person’s health or ability on the basis of their genotype, but rather look to see what they can actually achieve. It is a matter of fundamental human rights: rights that are broadly accepted, at least in principle, in Western society as far as sex discrimination and race discrimination are concerned. The same rights must now encompass all forms of genetic discrimination, because we are acquiring the ability to measure a vastly greater range of characteristics than before. Although the correlation of genetic characteristics with physical and mental outcomes will in most cases be purely statistical, there will be a temptation—there already is a temptation—for these to be used in actuarial prediction, very possibly to the detriment of some individuals’ opportunities. This we must oppose.
The new genetic knowledge is an enormously valuable starting point for research in biology and medicine. That is why it is so important to finish the sequence, so that it is as useful as possible. It is a permanent archive to which scientists will keep referring. But we need to be cautious about the immediate claims we make for it. Headlines such as ‘Gene code could beat all disease’ lead only to disillusion when year after year people continue to suffer from cancer, heart disease or senile dementia. To a certain extent I condone the
hype—but only because it’s important to keep the topic in the public eye so that there is wide debate about issues such as genetic discrimination. But let’s think for a moment about what is really likely to emerge in the next few years, moving up the scale from easiest to most difficult.
The most immediate application, already well under way, is in diagnosis. Once a variant gene has been found that is associated with a particular disease, it is a trivial matter to conduct a test that tells you whether or not someone has that variant form. Genetic tests are now available for a number of diseases including cystic fibrosis, muscular dystrophy, certain forms of breast cancer and Huntington’s disease. These are mostly comparatively rare conditions in which a single genetic defect gives you a high chance of developing the disease. A positive test result leads to hard choices for affected people: if it is a prenatal test, then parents need to choose whether or not to terminate the pregnancy; a woman with a positive genetic diagnosis of a predisposition to breast cancer, even if she has no tumors, may opt for preemptive surgery. Such choices are never straightforward, and patients need careful counselling to help them decide.
With the advent of the SNPs database, we can begin to get a handle on common genetic variations that have a more statistical impact on common disorders such as heart disease, asthma and diabetes. This is going to be more complicated, because no single variant will by itself make much difference to your susceptibility to disease—it will be a matter of groups of variants working in concert, and this is a very active research field at the moment. There is still a lot of work to do to correlate SNPs with illness in large populations, but undoubtedly genetic tests will emerge from this work and will be patented. When people patent a gene, all they mean is that they know its sequence and that they can use it to do a diagnosis. To me this should not be a basis for patenting the gene as a whole. I think we get into all kinds of trouble by laying claim to whole genes in
order to protect rights to a diagnostic test, when what we really need are treatments for diseases based on those genes, which will take longer to develop.
The same knowledge of variation can contribute to improved drug treatments. It’s a constant source of frustration for doctors that drugs that work very well on one patient, such as steroids in asthma, don’t work at all on another. A SNP profile might be able to provide guidance to doctors on which is the best drug to prescribe. Further down the line, drug companies will undoubtedly start producing families of drugs ‘personalized’ for different SNP profiles. Whether the benefits of doing this will be worth the considerable costs remains to be seen. We have yet to discover whether the savings made by not having to try out a range of drugs until you find one that works will outweigh the expense of carrying out genetic tests on all patients.
The genome will also undoubtedly have an impact on people’s choice of diet and lifestyle. In consumerist Western societies this will no doubt be seen as a huge marketing opportunity and again be overhyped. If you are a middle-aged man who smokes and is somewhat overweight, you don’t need a genetic test to tell you that you are at risk of heart disease. But if genotyping becomes the norm, I can see an explosion in the market for diet books, nutritional supplements and exercise programs designed for people with specific genetic profiles. I have a nightmare that people will choose which restaurant to eat at according to their genotype. It will be a mess, it will be overdone, but there will be some germs of truth in what the tests are saying.
What I think is much more important and much more realistic in the timescale of a decade or so is the prospect that we will find new drug targets for diseases that we currently find very difficult to treat. For example, Mike Stratton’s cancer group at the Sanger Centre is screening tumors to see how they differ genetically from normal tissue. In many cases it may be easier to kill a cell than to cure it.
Genome information should help to reveal targets on the tumor cell so that drugs can seek them out and destroy the tumor cells selectively, leading to fewer side-effects and higher cure rates than in conventional chemotherapy and radiotherapy. It’s likely that in ten or twenty years’ time many more cancers will be treatable than today.
When people started talking about cures for genetic disease, twenty or thirty years ago, they generally spoke in terms of gene therapy: replacing a bad allele with a good one, or genetically transforming cells to produce useful products, such as the growth factors that can help damaged brain cells to regenerate. Laboratory research has laid most of the groundwork for this approach, but successful gene therapy is turning out to be a more elusive goal than was hoped. The best chances of success are in diseases where the cells you want to treat are accessible, such as leukemias or immune deficiency diseases where you can take blood or bone marrow cells out, treat them and put them back: French doctors for the first time successfully treated two babies with severe combined immunodeficiency disease using this technique in 2000. Trials have also been under way for some time in cystic fibrosis, where the cells that need a working gene are in the membranes that line the lungs, theoretically reachable by using an inhaler. So far such treatments have not led to long-term improvements, an indication of just how hard it’s going to be to tackle in this way diseases of less accessible tissues, such as the brain and nervous system. Getting genes engineered and delivered and turned on and off properly calls for much greater understanding than we currently possess of how the system works.
But it would be wholly misleading to suggest that sequencing the genome has been a waste of time because gene therapy hasn’t worked immediately. This is one area where the hype has been overdone—which is understandable when patients are desperate for cures. But in the long term this should be seen as no more than a momentary setback. There was no more reason why gene therapy
should deliver instant cures than any other form of experimental medicine; it is just as promising, though just as doubtful in the early stages of its development, as organ transplants, for example.
Knowledge of the genome could eventually allow parents to endow their children artificially with genes for ‘desirable’ characteristics such as intelligence, beauty and so on, producing so-called ‘designer babies.’ For the moment this is an implausible scenario, for a variety of reasons. The route from picking your ideal baby from a catalogue to a successful birth, never mind a healthy child or adult with the chosen characteristics, is long and full of uncertainties. And, as we’ve seen, such characteristics are likely to depend on sets of genes working in concert in ways we have scarcely begun to understand. So, even if one could overcome the considerable technological obstacles to transforming an embryo, the result might well fail to meet the parents’ expectations, with horrendous consequences for the unfortunate offspring (not to mention lawsuits galore). For the moment it’s much better to wait for the delightful surprise of producing a new and unique individual by the conventional method—and more practical. The genes are only the start of a person; the environment, and particularly parenting, are immensely important, and in general it’s not at all good for children if their parents have specific expectations of them. However, in a generation or two, with advances in knowledge, parents may genuinely have these options and will have to decide.
Negative selection, on the other hand, is already going on. Genetic screening in pregnancy, available now for several years, gives parents at risk of producing a child with a genetic disease such as muscular dystrophy the option of a termination if the diagnosis is positive. Some clinics now offer pre-implantation diagnosis, screening very early embryos produced by in vitro fertilization and implanting only those that have a normal gene. There is a narrow ethical line between this procedure and introducing a healthy gene into an embryo that lacks it—but the latter is currently banned under U.K.
legislation that outlaws germline gene therapy, in which the treatment will not be restricted to the individual but will also be carried down through future generations. This is wise, for apart from ethical considerations, our ignorance about the potential ramifications is too great. Whether that decision will later be reversed in the light of increasing knowledge will be a matter for democratic debate.
These considerations relate to more than just the practicalities. They also raise a whole series of important ethical questions: the rights of the unborn and the rights of parents; the concept of genetically ‘normal’ and the concept of genetically ‘better.’ The first half of the twentieth century saw, in both Europe and North America, the horrors of eugenic movements in which individuals decreed genetically ‘defective’ were forcibly prevented from reproducing or, as in the ‘final solution’ of the Nazi regime, murdered. Most of us recoil from these events, but there is no escaping the fact of our new powers, and the need to exercise them responsibly. Some would prefer that we deny them by abstaining totally from intervention. Certainly there is a danger that by narrowing too far the boundaries of ‘normality’ we could deny life to people who should rightfully enjoy it. On the other hand, there are already cases of children bringing lawsuits against their parents for ‘wrongful life’—allowing them to be born to lead disadvantaged lives. We cannot evade the need for a balance of rights. In any event, it is essential that, once born, all human beings are treated equally regardless of genetic endowment.
What is most important is that the genome is a key step towards the molecular anatomy of the human body. We are right at the beginning, not the end; we don’t know what most of the genes look like, or when or where they’re expressed. The genome alone doesn’t tell you any of these things. Nevertheless, the information is there as a resource and a toolkit to which people will come back again and again as they build up knowledge of the complete structure of the body from the foundation. The next step is to discover all the genes: to figure out what the genome is coding for, where the genes are and particularly where all the control signals are. Because the coding regions account for only 2 percent or less of the human genome, they are much harder to find than in more gene-dense organisms such as the worm, which has a coding density of about 30 percent. Comparing the human sequence with those of other species such as the mouse or the zebrafish is going to be part of the way forward. Although we have diverged from other vertebrates during evolution, natural selection will have ensured that the coding and control regions that are essential to make a viable animal will have been conserved. So looking for matches between genomes is a good gene-hunting technique that can help to fill in the gaps left by automatic gene prediction programs and matches to cDNAs.
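The idea behind cross-species matching can be illustrated with a deliberately toy sketch: flag windows of a "human" sequence that also appear in a second species' sequence, on the logic that shared stretches are more likely to be conserved coding or control regions. The sequences and the exact-match approach here are invented for illustration; real gene-hunting pipelines use alignment tools such as BLAST and statistical models of conservation, not literal substring matches.

```python
# Toy sketch of comparative gene-hunting (illustrative only):
# report windows of seq_a that occur verbatim in seq_b, as a crude
# stand-in for evolutionary conservation between two genomes.

def conserved_windows(seq_a, seq_b, k=8):
    """Return (start, window) pairs where a k-base window of seq_a
    also appears somewhere in seq_b."""
    # Index every k-mer of seq_b for O(1) lookup.
    kmers_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    hits = []
    for i in range(len(seq_a) - k + 1):
        window = seq_a[i:i + k]
        if window in kmers_b:
            hits.append((i, window))
    return hits

# Hypothetical fragments: a shared 'coding' core embedded in
# divergent flanking sequence, mimicking conservation.
human = "TTACGGAT" + "ATGGCCAAAGTT" + "CCGTA"
mouse = "GGGC" + "ATGGCCAAAGTT" + "ATTTACG"

for pos, window in conserved_windows(human, mouse):
    print(pos, window)
```

Only the windows lying inside the shared core are reported; the divergent flanks produce no matches, which is the intuition behind using conservation to separate functional regions from neutral sequence.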
Once we’ve found the genes, we need to work out what proteins they produce, and to understand their time and place of expression. Investigation in all of these areas is going forward at a great pace. None of these jobs is finite—it’s quite unlike sequencing the human genome, because every time you do a gene expression experiment you’re going to get a different answer, depending on the conditions. You could in principle set up a factory and try to collect a massive set of data, but it’s most useful if people who are working on particular mechanisms of the body each carry out their own studies on the tissues that interest them.
Then there’s the idea of collecting all the proteins, looking at all the interactions between them—‘proteomics’ is the new hot area for both academic labs and private companies. You can portray it as a Tower of Babel—‘You can’t possibly understand all that!’—but because people work on subsystems, we will gradually fit together these pieces of the mechanism. Richard Dawkins’s lovely phrase the ‘blind watchmaker’ is exactly right: we are finding all these little pieces of clockwork that have been put together in the most irrational way to make the whole thing work. And the neat thing about the pieces of mechanism is that many of them are the same in the human as they are in the worm and the fruit fly. Some of the most fundamental mechanisms, such as the cell death pathway that removes unwanted cells, were first studied in the worm, and some of the same genes control programmed cell death in the human.
Somewhere in the genome will be the answer to what makes us different from all the other species—what makes us human. But it’s very unlikely to be as simple as having a gene or two that chimps don’t have. We will need to know much more about how the whole system acts in concert before we truly understand ourselves.
At Francis Collins’s suggestion we concluded our paper on the draft genome with an ironic echo of Watson and Crick’s famous understatement in their 1953 announcement of the structure of DNA. ‘It has not escaped our notice,’ we wrote, ‘that the more we learn about the human genome, the more there is to explore.’ By making the genome sequence freely available, we made sure that the number of explorers would be unlimited.