Development of an AI method for predicting the capacity for protein synthesis in microorganisms

Master's thesis, Bachelor's thesis, undergraduate assistant

The aim of the research project is to use deep neural networks to identify patterns in the gene sequences of homologous proteins that are responsible for successful expression. Based on these patterns, the producibility of heterologous proteins will be predicted from their DNA sequence and, if necessary, verified with available experimental data.

The production of proteins using microorganisms is an important branch of industrial biotechnology. It ranges from technical enzymes, e.g. for the production of bioethanol, to the synthesis of pharmacologically active proteins such as therapeutic antibodies. A handful of established production organisms face a large number of interesting proteins that are "foreign" to the production organism, so-called heterologous proteins. Heterologous protein synthesis often does not reach the efficiency achieved with the host's own (homologous) proteins. The information for successful protein synthesis is encoded, among other places, in the DNA sequence of the homologous proteins. Gene expression is the first important step on the way from DNA to protein. Decoding this information is the subject of intensive research worldwide.

Implementation / techniques:

• Preparation of genomic data sets: genome / pangenome of a host system based on open-access sequence databases

• Preparation of a control data set from the sequence databases

• Design and training of deep neural networks for pattern recognition based on training and control data sets

• Verification / falsification of the discovered patterns with available experimental data from BRAIN AG

• Prediction of the producibility of heterologous proteins of an enzyme class and testing in the laboratory (laboratory work can be performed by the student or by employees of BRAIN AG)
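
To make the data preparation step more concrete, the following minimal Python sketch shows one common way of turning a DNA sequence into numerical input for a neural network, namely one-hot encoding. It is an illustration only: the project does not prescribe an encoding, and the function name, the fixed input length and the handling of ambiguous bases are assumptions made for this example.

```python
# Illustrative sketch only; names and design choices are hypothetical
# and not taken from the project description.
ALPHABET = "ACGT"


def one_hot_encode(sequence, length):
    """Encode a DNA string as a length x 4 matrix (list of lists).

    Assumptions for this example: sequences are truncated or zero-padded
    to `length`, and bases outside A/C/G/T (e.g. N) become all-zero rows.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    matrix = [[0.0] * len(ALPHABET) for _ in range(length)]
    for pos, base in enumerate(sequence.upper()[:length]):
        col = index.get(base)
        if col is not None:
            matrix[pos][col] = 1.0
    return matrix


# Usage: encode a short example sequence, padded to six positions.
encoded = one_hot_encode("ATGN", length=6)
for row in encoded:
    print(row)
```

Each row of the resulting matrix corresponds to one sequence position and each column to one base, a layout that convolutional networks for sequence pattern recognition can typically take as input.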

For further information, please see the PDF or contact the supervisor.