2. How do you calculate the sensitivity and selectivity of BLAST?

Suppose the BLAST search returned 100 hits. Of these, 17
were false positives, and we knew that there were 165
sequences in the database which should have returned a hit
with our sequence.

To calculate the sensitivity and selectivity, we must
determine the number of true positives (ntp), the number of
false positives (nfp) and the number of false negatives
(nfn). We are told that the number of false positives was
17, hence the number of true positives must have been
100-17 = 83, as there were 100 hits. Therefore we know that
the search algorithm found 83 of the 165 sequences it
should have found, hence the number of false negatives was
165-83 = 82. So, we know that ntp = 83, nfp = 17 and
nfn = 82. Using the equations in the notes, we can
calculate:

Sensitivity = ntp/(ntp+nfn) = 83/(83+82) = 83/165 = 0.50 (2 d.p.)

Selectivity = ntp/(ntp+nfp) = 83/(83+17) = 83/100 = 0.83
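
The same arithmetic can be written as a short Python sketch. The function and argument names are made up for illustration; only the two formulas and the numbers 100, 17 and 165 come from the worked example.

```python
def sensitivity_selectivity(n_hits, n_false_pos, n_expected):
    """Return (sensitivity, selectivity) for a database search.

    n_hits      -- total number of hits returned by the search
    n_false_pos -- number of returned hits that are false positives (nfp)
    n_expected  -- number of database sequences that should have been hit
    """
    ntp = n_hits - n_false_pos   # true positives
    nfn = n_expected - ntp       # false negatives (sequences that were missed)
    sensitivity = ntp / (ntp + nfn)
    selectivity = ntp / (ntp + n_false_pos)
    return sensitivity, selectivity


sens, sel = sensitivity_selectivity(n_hits=100, n_false_pos=17, n_expected=165)
print(f"sensitivity = {sens:.2f}, selectivity = {sel:.2f}")
```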

3. What are the main approaches to predicting protein interactions using genomic context analysis?

We have developed an approach using Bayesian networks to
predict protein-protein interactions genome-wide in yeast.
Our method naturally weights and combines into reliable
predictions genomic features only weakly associated with
interaction (e.g., messenger RNA coexpression,
coessentiality, and colocalization). In addition to de novo
predictions, it can integrate often noisy, experimental
interaction data sets. We observe that at given levels of
sensitivity, our predictions are more accurate than the
existing high-throughput experimental data sets.
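
As a rough illustration of how weak features can be combined, here is a minimal sketch of the simplest kind of Bayesian network, a naive Bayes model. It assumes the features are conditionally independent, which the answer above does not state. The prior odds and likelihood ratios are hypothetical numbers chosen only to show the arithmetic, not values from any data set.

```python
from math import prod


def posterior_odds(prior_odds, likelihood_ratios):
    """Naive Bayes combination: posterior odds = prior odds * product of LRs."""
    return prior_odds * prod(likelihood_ratios)


# Hypothetical values for one protein pair (illustration only).
prior = 1 / 600                       # assumed prior odds of interaction
feature_lrs = {
    "coexpression": 5.0,              # each feature alone is weak evidence
    "coessentiality": 3.0,
    "colocalization": 4.0,
}

odds = posterior_odds(prior, feature_lrs.values())
print(f"posterior odds = {odds:.3f}")
```

Each likelihood ratio on its own moves the odds only a little, but their product can move them a lot, which is the sense in which weakly associated features are combined into a stronger prediction.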

4. What is the main idea of maximum parsimony in phylogenetic tree construction? What are the drawbacks?

The Maximum Parsimony (MP) problem aims at reconstructing a
phylogenetic tree from DNA sequences while minimizing the
number of genetic transformations. To solve this
NP-complete problem, heuristic methods have been developed,
often based on local search. In this paper, we focus on the
influence of the neighborhood relations
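
To make "minimizing the number of genetic transformations" concrete, here is a small sketch that scores a fixed tree with Fitch's algorithm, a standard way of counting the minimum number of changes on a given rooted binary tree. It is not the local-search method the answer refers to. The four sequences and the two tree shapes are invented for illustration.

```python
def fitch(tree, leaf_states):
    """Return (possible_states, min_changes) for one alignment column.

    tree is either a leaf name (str) or a 2-tuple of subtrees.
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch(tree[0], leaf_states)
    right_set, right_cost = fitch(tree[1], leaf_states)
    common = left_set & right_set
    if common:                       # children agree: no extra change needed
        return common, left_cost + right_cost
    # children disagree: one change is required on this branch pair
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch cost over every column of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Toy data (illustration only).
alignment = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "GCT"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))

for name, tree in (("tree1", tree1), ("tree2", tree2)):
    print(name, parsimony_score(tree, alignment))
```

Maximum parsimony prefers the tree with the lower score. Scoring one tree is cheap; the hard part is that the number of possible trees grows very quickly with the number of sequences, which is why heuristic search is used.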

5. Which of the following sequences contains the pattern [AG]-x(4)-G-K-[ST] from the PROSITE database?
seq. A: VAGWGKST
seq. B: GVLKRGKS
seq. C: AGVLKGRT
seq. D: AGVGKSTP

seq. B: GVLKRGKS

[AG]-x(4)-G-K-[ST]
Decoding the pattern:
A or G in the first position (seq. B, C and D satisfy this; seq. A starts with V)
x: any amino acid in each of the next four positions (2-5)
G in the sixth position (seq. B and C satisfy this; seq. D has S)
K in the seventh position (seq. B satisfies this; seq. C has R)
S or T in the eighth position (seq. B ends in S)
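
The pattern can also be checked mechanically. The sketch below translates this PROSITE pattern into a regular expression and tests the four sequences. The helper `prosite_to_regex` is a made-up name and handles only the syntax elements used in this pattern (bracketed alternatives, `x`, repeat counts and `-` separators), not the full PROSITE syntax.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    regex = pattern.replace(" ", "").replace("-", "")  # drop separators
    regex = regex.replace("x", ".")                    # x = any amino acid
    regex = re.sub(r"\((\d+)\)", r"{\1}", regex)       # (4) -> {4}
    return regex


sequences = {
    "A": "VAGWGKST",
    "B": "GVLKRGKS",
    "C": "AGVLKGRT",
    "D": "AGVGKSTP",
}

regex = prosite_to_regex("[AG]-x(4)-G-K-[ST]")
matches = [name for name, seq in sequences.items() if re.search(regex, seq)]
print(regex, matches)
```

Only seq. B matches: G, then VLKR in positions 2-5, then G, K and S.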

6. What is the meaning of science?

Science is the term given to the powerful branch of
knowledge that drives living things to struggle against and
overcome the obstacles that exist on Earth. It is a driving
force that motivates life.

7. Explain homology modelling.

If the crystal structure of a protein is unavailable, one
can use the tools of homology modelling to predict the
structure. The logic is that similar amino acid sequences
give rise to similar structures. For homology modelling to
be accurate, an identity match of 70% is desirable.
Basically, one does a FASTA search of the amino acid
sequence against the PDB database (amino acid sequences
whose structures are known), then does a CLUSTAL alignment
to check for conserved residues. The structure of the
unknown sequence is then built on the basis of the
structures of the best matches from FASTA and CLUSTAL by
programs like LLOOP and HHPRED. This structure can be
visualized in programs like DeepView, Protein Explorer,
etc.
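
The identity threshold can be checked from a pairwise alignment. The sketch below uses one common definition of percent identity, identical residues divided by aligned non-gap columns; other tools use different denominators, so treat the definition as an assumption. The two aligned sequences are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


# Toy alignment of a target against a template of known structure.
target = "MKT-AYIAKQR"
template = "MKTLAYLAKQR"

identity = percent_identity(target, template)
print(f"{identity:.1f}% identity")
```

Here 9 of the 10 aligned residue pairs are identical, so this template would clear the 70% level mentioned above.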

8. What are the main signals used for gene finding in prokaryotic genomes? How are these signals introduced into the search algorithms?

The main signals used are promoter elements: the Pribnow
box (the TATA-like -10 element; the TATA box itself is the
eukaryotic counterpart) and the GC-rich regions present
ahead of promoters in prokaryotes. Since the genes in
prokaryotes are organised as operons, comparative genomics
can also be used to find new genes.

9. How do you run DOCK 6 using Cygwin?

1. Install the needed packages, such as bison, perl, etc.
2. Configure the gnu file using ./configure gnu
3. ... Chimera ...
4. Prepare the structure, form the spheres, and then build
the grid for the ligand.
5. Dock the molecule, selecting either rigid docking or
flexible docking.
6. Give the input file for the selected docking and obtain
the output.

10. Derive the E-value.

Expect value. The E-value is a parameter that describes the
number of hits one can “expect” to see by chance when
searching a database of a particular size. It decreases
exponentially with the score (S) that is assigned to a
match between two sequences. Essentially, the E-value
describes the random background noise that exists for
matches between sequences. For example, an E-value of 1
assigned to a hit can be interpreted as meaning that in a
database of the current size, one might expect to see one
match with a similar score simply by chance. This means
that the lower the E-value, or the closer it is to “0”, the
higher the “significance” of the match. However, it is
important to note that matches to short sequences can be
virtually identical and still have relatively high
E-values. This is because the calculation of the E-value
also takes into account the length of the query sequence:
shorter sequences have a higher probability of occurring in
the database purely by chance.
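
The description above corresponds to the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where m is the query length, n is the database length, S is the raw score, and K and lambda depend on the scoring system. The sketch below evaluates it, along with the bit-score form E = m * n * 2^(-S'). The K and lambda values are illustrative assumptions (roughly those reported for ungapped BLOSUM62), and the lengths and scores are invented.

```python
import math

# Assumed statistical parameters (illustration only; real values depend on
# the scoring matrix and gap penalties used by the search).
LAMBDA = 0.318
K = 0.134


def e_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Karlin-Altschul expected number of chance hits scoring >= raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Normalized score S' such that E = m * n * 2 ** (-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)


db_len = 1e8                                                 # hypothetical database size
e_long = e_value(raw_score=120, query_len=250, db_len=db_len)
e_short = e_value(raw_score=30, query_len=6, db_len=db_len)  # short exact match

print(f"long alignment:  E = {e_long:.3g}")
print(f"short alignment: E = {e_short:.3g}")
```

The short match has a far higher E-value even though it could be a perfect match, because a short alignment cannot accumulate a high score and E falls exponentially with S.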