About AFP / CAFA & FAQ

New! Rules for the CAFA experiment have been posted!

  1. For the impatient: how to participate in CAFA
  2. About the Automated Function Prediction Special Interest Group (AFP-SIG)
  3. About the CAFA experiment
  4. Frequently Asked Questions


For the impatient: how to participate in CAFA?

  1. Read the rules
  2. Register for the experiment today
  3. Download target proteins, available starting September 15, 2010
  4. Submit predictions before deadline, January 15, 2011


About the Automated Function Prediction Special Interest Group (AFP-SIG)

Sequence and structure genomics have generated a wealth of data. However, extracting meaningful information from genomic data is becoming increasingly difficult. Both the number and the diversity of discovered genes are increasing. This increase means that established annotation methods, such as homology transfer, are annotating a smaller share of the data. In addition, there is a need for standardized annotation that can be incorporated into function annotation on a large scale. Finally, there is a need to assess the quality of the available function prediction software. We probably already know the sequence of the target for next-generation antibiotics or cancer treatment. We just have not recognized that target for what it is: it is currently annotated as a "domain of unknown function".

The mission of the Automated Function Prediction Special Interest Group (AFP-SIG) is to bring together computational biologists who are dealing with the important problem of gene and gene product function prediction, to share ideas and create collaborations.

About the CAFA experiment
The problem: There are far too many proteins in the database for which the sequence is known, but the function is not. The gap between what we know and what we do not know is growing. A major challenge in the field of bioinformatics is to predict the function of a protein from its sequence or structure. At the same time, how can we judge how well function prediction algorithms are performing?

The solution: The Critical Assessment of protein Function Annotation algorithms (CAFA) is an experiment designed to provide a large-scale assessment of computational methods dedicated to predicting protein function. We will evaluate how well methods predict Gene Ontology (GO) terms in the categories of Molecular Function and Biological Process. The experiment will consist of two tracks: (i) the eukaryotic track and (ii) the prokaryotic track. In each track, a set of targets is provided by the organizers. Participants are expected to submit their predictions by the submission deadline. The predictions will be evaluated during the Automated Function Prediction (AFP) meeting, which has been approved as a Special Interest Group (SIG) meeting at the ISMB 2011 conference (Vienna, Austria).


Frequently Asked Questions:

  • Q. Do I have to participate in the CAFA experiment to participate in the AFP meeting?
    A. No. You just have to register for the AFP meeting.
  • Q. Is there another way I can actively participate in the meeting? I do have work in the field of protein function prediction, but I do not wish to participate in CAFA.
    A. Yes. You can submit your work for presentation at the meeting, as a poster or a talk. The CAFA experiment is only one AFP activity, and we wish to be as inclusive as possible.
  • Q. Will the organizers provide training data for the predictor development?
    A. No, we will only estimate the prediction accuracy. The accuracy of protein function prediction may critically depend on the ability of the group to extract quality data from a range of sources, integrate it and preprocess it. Functional annotations for training can be extracted from the Swiss-Prot or GO databases. Molecular data is available from various additional sources, such as GEO, HPRD, PDB, PRIDE, BIND, DIP, etc.
  • Q. Yeah yeah, I'll read the rules. But can you tell me in brief how you are running the CAFA challenge?
    A. Starting September 15, 2010, we will make a few thousand protein sequences available. Those are unannotated proteins taken from several sources. You are expected to annotate them using Gene Ontology terms. After the submission deadline, we will select some of the target proteins for scoring predictions. To participate in the CAFA challenge, you need to register your team on this website (see left bar).
  • Q. Will you be using any other vocabulary / keyword scheme except Gene Ontology?
    A. Not this year.
  • Q. What I don't understand is how the true function of your proteins will be determined. Will these be made and then tested in some high throughput way? If not, how do you know that your answers are correct? Just because many different algorithms predict a particular function, it doesn't mean that that function is the real one.
    A. There is a natural growth of experimentally verified annotations in Swiss-Prot. So if, say, we hand out 3000 Drosophila genes, and over the lull period between the submission deadline and the assessment 10 of them (a low estimate) get annotated experimentally, we already have quite a few to play with. We interrogated the annotation history of several genomes, and the more popular eukaryotic ones have a good growth curve with respect to experimental annotations. We will use several genomes. We estimate a few dozen targets to come out of this pipeline. In addition:
    1. We will have some annotations sequestered for us, courtesy of the CALIPHO project.
    2. We are in touch with other experimental groups regarding experimental verification of protein function.
  • Q. So how many targets are we talking about?
    A. We will release some 50,000 sequences. We expect that between the prediction submission deadline (January 15, 2011) and the time we begin the assessment (June 2011) a few hundred will become experimentally validated, and non-trivial to predict. This is based on trends from previous years. In addition, we will secure a few dozen targets from experimental groups and projects such as CALIPHO.
  • Q. Why "CAFA"?
    A. The "Critical Assessment" meme originated with the Critical Assessment of protein Structure Prediction (CASP) in the structural bioinformatics community. It was later taken up by various other life science communities that decided to run their own critical assessment challenges: CAPRI, BioCreative, CAMDA and, more recently, CAGI. The name CAFA was coined by Inbal Halperin-Landsberg, then at Russ Altman's lab, and the initiative was first discussed at the 2006 Automated Function Prediction meeting.
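
For readers who prefer code, the target-selection idea from the answers above (proteins that have no experimental annotation at the submission deadline but gain one before the assessment) can be sketched roughly as follows. This is only an illustration, not the organizers' actual pipeline. The function names, the tuple layout of the annotation snapshots, and the choice of which GO evidence codes count as experimental are all assumptions made for this example.

```python
# Hypothetical sketch only; not the CAFA organizers' actual pipeline.
# Assumption: these GO evidence codes are the ones treated as experimental.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def experimental_terms(annotations):
    """Map protein ID -> set of GO terms backed by experimental evidence.

    `annotations` is an iterable of (protein_id, go_term, evidence_code)
    tuples; this layout is an assumption for the example.
    """
    terms = {}
    for protein_id, go_term, evidence_code in annotations:
        if evidence_code in EXPERIMENTAL_CODES:
            terms.setdefault(protein_id, set()).add(go_term)
    return terms


def select_benchmark(targets, snapshot_at_deadline, snapshot_at_assessment):
    """Return {protein_id: GO terms} for targets usable in scoring.

    A target qualifies if it had no experimental annotation in the
    deadline snapshot but has at least one in the assessment snapshot.
    """
    before = experimental_terms(snapshot_at_deadline)
    after = experimental_terms(snapshot_at_assessment)
    benchmark = {}
    for protein_id in targets:
        if protein_id in before:
            continue  # already experimentally annotated when released
        if protein_id in after:
            benchmark[protein_id] = after[protein_id]
    return benchmark


# Toy data: P1 gains an experimental annotation after the deadline,
# P2 already had one, P3 only ever gets an electronic (IEA) annotation.
deadline = [("P1", "GO:0003674", "IEA"), ("P2", "GO:0005515", "IDA")]
assessment = deadline + [("P1", "GO:0004672", "IDA"), ("P3", "GO:0008150", "IEA")]
result = select_benchmark(["P1", "P2", "P3"], deadline, assessment)
print(result)
```

In this toy run only P1 would be kept for scoring, which mirrors why a release of many thousands of sequences is expected to yield a much smaller benchmark set.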

Previous AFP Meetings: