Sequence Analysis / Bioinformatics

Analysis of Protein Targeting Signals

Protein sorting in eukaryotic cells

Biological cells are composed of different compartments. Protein translocation from the cytosol to the compartments and the extracellular space is achieved in most cases by an N-terminal targeting signal, which directs the protein co- or post-translationally to its target compartment. After a protein has reached its destination the targeting signal is cleaved off by a signal peptidase (SP) and further degraded by a signal peptide peptidase (SPP). There are indications for additional functional features of the processed targeting signals beyond their function as targeting signal, e.g. a role in the adaptive immune response.

Organisms have developed a large variety of protein transport systems, which can roughly be divided into five classes (type 1 to 5 secretion systems). Each system requires unique recognition signals encoded in the substrate proteins. These signals are still incompletely described. Our research in this area focuses on the analysis of atypical targeting sequences, in particular non-classical secretion signals, in eukaryotes and in the pathogens Helicobacter pylori and Plasmodium falciparum.

Protein Structure Prediction

Predicted vs. native structure

The quest for reliable methods for protein structure prediction could be considered a Holy Grail of bioinformatics. While ever more protein sequences are determined, experimental structure determination still lags far behind, leaving a huge number of new sequences without structural context. This gap calls for automated methods for structure prediction. While there are reasonably reliable methods to predict secondary structure from amino acid sequence, the prediction of tertiary structure is still difficult to accomplish. Different approaches have been investigated in recent decades, from fold recognition and threading, through atom- or residue-based empirical potentials, to computationally demanding force-field simulations. Research in our group is aimed at the prediction of protein domain architecture. We use empirical contact potentials for scoring, combined with Particle Swarm Optimization (PSO) as a heuristic method to search conformational space efficiently.


Machine Learning and Optimization

Using SOMs to create an abstraction of Drosophila's antennal lobe (AL). Left: Drosophila AL. Center: A spherical SOM that has been trained on the surface points of the AL. Right: The spherical SOM unfolded.

Machine learning is concerned with the algorithmic extraction of a model from given data, where the model should generalize well to unseen data. We employ and adapt machine learning techniques such as artificial neural networks, support vector machines and self-organizing maps (SOMMER) to extract structure-activity relationships for rapid virtual screening of large compound databases. We analyze several receptor-ligand interactions with the aim of finding novel lead structures, and apply machine learning techniques to drug/nondrug classification, frequent-hitter analysis, selectivity prediction and a variety of other problems in early drug discovery.
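To make the self-organizing map idea concrete, the following is a minimal sketch of SOM training in plain Python. It assumes a small rectangular grid rather than the spherical topology shown in the figure, and it is not the SOMMER implementation; the function names, grid size and learning schedule are illustrative choices.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid coordinate of the unit whose weight vector is closest to x."""
    return min(weights,
               key=lambda k: sum((w - xi) ** 2 for w, xi in zip(weights[k], x)))

def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rectangular SOM; returns a dict mapping (row, col) to a weight vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood shrinks over time
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights
```

After training, `best_matching_unit(weights, x)` maps any input vector to a grid cell, so similar inputs tend to land in the same or neighboring cells.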

Current projects deal with the characterization of olfactory coding and "scaffold-hopping" for a panel of pharmacological targets.

PsoVis: Java applet for visualization of Particle Swarm Optimisations

Optimization algorithms are grounded on general concepts to find solutions (optima) to mathematically expressible problems. Evolutionary algorithms are a special subset of optimization algorithms. Inspired by biological evolution, they apply the concepts of population, reproduction, mutation, selection, and fitness to find solutions to theoretical or real-life problems. Here we apply Cyclops, a Java optimization suite written in our group, to find subsets of combinatorial libraries that are highly enriched with molecules exhibiting a desired pharmacological activity panel.

A main focus of our group is to develop evolutionary techniques for compound de novo design. In this context, we evaluate and apply Particle Swarm Optimization (PsoVis) as one of our favored concepts.
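As an illustration of the general technique, here is a minimal global-best Particle Swarm Optimization sketch in Python. It is a textbook-style variant and not the code behind Cyclops or PsoVis; the parameter values (`inertia`, `c1`, `c2`), the box constraint and the toy objective in the usage note are assumptions made for the example.

```python
import random

def pso_minimize(objective, dim, bounds, n_particles=20, iterations=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull toward personal best + pull toward swarm best
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the particle to the search box
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

A call such as `pso_minimize(lambda x: sum(v * v for v in x), dim=3, bounds=(-5.0, 5.0))` searches for the minimum of a simple quadratic. In a design setting the objective would instead be a scoring function over an encoded molecule or conformation.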


Receptor-based molecular design

Ligand in the binding site

The constant increase of experimental structure annotations provides a wide research field for receptor-based drug design. Properties important for ligand-protein and protein-protein interactions can be calculated from the 3D structure of the receptor and included in a receptor-based pharmacophore model. Such a model provides a first step toward rational drug design. Receptor-derived surface properties allow the identification of binding sites and "hot spots" of protein-protein and ligand-protein interactions. Our research in this area is focused on:

Shape-based de novo Design

Shape complementarity is crucial for molecular recognition. Building up new structures which fulfill this requirement is a task of shape-based de novo design of potential small-molecule ligands.

Shape Analysis

Identification of ligand binding-site with PocketPicker

The size and shape of binding pockets play a pivotal role in the biological function of a protein and provide unique microenvironments for selective ligand binding. Steric and chemical properties of potential ligands can therefore be derived from the shape and accessibility of known or putative protein pockets. This makes shape analysis a primary tool in virtual screening. We developed the software PocketPicker for this purpose.
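The idea of locating pockets by shape and accessibility can be sketched with a simple grid scan. The snippet below is a generic illustration and not the PocketPicker algorithm: it assumes atoms given as bare (x, y, z) coordinates and uses a crude buriedness measure (the number of atom centers within a cutoff), and all function names, parameter names and default values are hypothetical.

```python
import math
from itertools import product

def buriedness(point, atoms, cutoff=6.0):
    """Crude buriedness: number of atom centers within `cutoff` of the point."""
    return sum(1 for atom in atoms if math.dist(point, atom) <= cutoff)

def pocket_candidates(atoms, spacing=1.0, margin=2.0, min_clearance=2.5, cutoff=6.0):
    """Return (buriedness, point) pairs for empty grid points, most buried first."""
    axes = []
    for k in range(3):
        lo = min(a[k] for a in atoms) - margin
        hi = max(a[k] for a in atoms) + margin
        n = int((hi - lo) / spacing) + 1
        axes.append([lo + i * spacing for i in range(n)])
    scored = []
    for point in product(*axes):
        # Skip grid points that sit too close to any atom (not empty space)
        if min(math.dist(point, a) for a in atoms) < min_clearance:
            continue
        scored.append((buriedness(point, atoms, cutoff), point))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored
```

Grid points at the top of the returned list are empty but surrounded by many atoms, which is one simple proxy for a cavity. A production tool would also cluster such points and characterize the shape of each cluster.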

Conformation and flexibility of ligand binding

GPCR model colored by entropy

In the absence of receptor-structure information, hypothetical protein models are constructed, from which potential binding pockets can be predicted to obtain a first idea of the interaction-site geometry. In this case, homology modeling is applied using a related receptor as a template. To cope with uncertainties in the structure and in the ligand-binding conformation of the protein, dynamic simulations are employed to address receptor flexibility. Shapes of binding sites are analyzed to support ligand interaction models.

Prediction of protein-protein interactions

Homology model of 5-Lipoxygenase used for protein-protein interface predictions

Predicting the location of protein-protein interfaces has far-reaching implications both for understanding the specificity of binding and for the analysis of protein-protein networks. Protein oligomerization has pharmacological and functional implications, which prompts the search for detailed structural information on protein-protein interfaces.


Ligand-based methods

In the absence of a receptor model, and as an alternative strategy, we develop and apply ligand-based virtual screening methods for the design of novel ligands. Molecules are represented at different levels: atomic structure, graph representations, physicochemical properties, surfaces and pharmacophore models. A special focus of our research is on alignment-free molecular descriptors. We have developed the CATS family of descriptors and "fuzzy" pharmacophore descriptors (LIQUID) to allow for rapid virtual screening of large compound collections and fully automated de novo design. These efforts are complemented by extensive software development, for example MQL, a novel substructure query language, and fast clustering techniques for large data sets.
Further activities comprise the hit-to-lead process, which may yield novel candidate structures exhibiting better binding characteristics and fewer side effects.
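To show what "alignment-free" means in practice, the sketch below builds a simple correlation-vector descriptor from a molecular graph: it counts pairs of pharmacophore-typed atoms at each topological (bond-count) distance. This is an illustration in the spirit of such descriptors, not the published CATS or LIQUID definitions; the type labels ("D", "A", "L"), the distance range and the function names are assumptions.

```python
from collections import deque
from itertools import combinations_with_replacement

def topological_distances(n_atoms, bonds):
    """All-pairs shortest path lengths (in bonds) via breadth-first search."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    dist = []
    for start in range(n_atoms):
        d = [None] * n_atoms
        d[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if d[v] is None:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist

def correlation_vector(atom_types, bonds, type_set=("D", "A", "L"), max_dist=5):
    """Count typed atom pairs per topological distance 1..max_dist."""
    pairs = list(combinations_with_replacement(type_set, 2))
    pair_index = {}
    for idx, (a, b) in enumerate(pairs):
        pair_index[(a, b)] = idx  # the pair is unordered, so register both orders
        pair_index[(b, a)] = idx
    vec = [0] * (len(pairs) * max_dist)
    dist = topological_distances(len(atom_types), bonds)
    n = len(atom_types)
    for i in range(n):
        for j in range(i + 1, n):
            idx = pair_index.get((atom_types[i], atom_types[j]))
            d = dist[i][j]
            # Skip untyped atoms, disconnected pairs and pairs beyond the range
            if idx is None or d is None or d > max_dist:
                continue
            vec[idx * max_dist + (d - 1)] += 1
    return vec
```

Because the vector depends only on atom types and graph distances, two molecules can be compared directly (for example by Euclidean distance between their vectors) without superimposing them, and renumbering the atoms leaves the result unchanged.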

Our most advanced approaches aim at designing novel structures from scratch (de novo design).

Design of novel inhibitors of the HIV-1 Tat-TAR RNA interaction

With 39.5 million people living with the human immunodeficiency virus (HIV) in 2006, the search for anti-HIV drugs is an ongoing challenge of our time. In our group, we apply ligand-based methods to the task of finding small organic ligands for the Tat-responsive region (TAR) of HIV-1 mRNA, whose interaction with the trans-activator of transcription (Tat) protein is essential for HIV replication. In one study, multi-layered artificial neural networks (ANNs) were trained to classify RNA and protein binders. After cherry-picking the highest-scoring compounds from a vendor catalog, several new TAR RNA ligands were identified. New inhibitors of the Tat-TAR interaction were also discovered in two other studies, which employed the "fuzzy" pharmacophore descriptors SQUID and LIQUID [1,2]. In a complementary approach, a computer-assisted iterative prospective study was conducted with the help of an evolutionary algorithm to discover novel ligands. Experimental results revealed new TAR RNA ligands, as determined by a fluorescence-based assay. More recently, an inhibitor of the Tat-TAR interaction was constructed completely from scratch by our fragment-based de novo design software FLUX [3]. We are grateful for the experimental support of our collaborative partner, the work group of Prof. Dr. Michael Göbel.

HIV-1 TAR RNA [PDB code: 1LVJ] with ligand acetylpromazine bound in the bulge

[1] Renner, S.; Ludwig, V.; Boden, O.; Scheffer, U.; Göbel, M.; Schneider, G. (2005), ChemBioChem 6, 1119-1125.
[2] Tanrikulu, Y.; Nietert, M.; Scheffer, U.; Proschak, E.; Grabowski, K.; Schneider, P.; Weidlich, M.; Karas, M.; Göbel, M.; Schneider, G. (2007), ChemBioChem 8, 1932-1936.
[3] Schüller, A.; Suhartono, M.; Fechner, U.; Tanrikulu, Y.; Breitung, S.; Scheffer, U.; Göbel, M.W.; Schneider, G. (2007), J. Comput.-Aided Mol. Design, accepted.