November 10, 2018 / by / In proteins
Episode 8: Human-AI Collaborated Virus
TL;DR? Check out the YouTube video of this project!
In this episode of How to Generate (Almost) Anything, we collaborated with George Sun, an electrical engineer turned biologist who is pursuing a PhD in bioengineering at MIT, and together we generated viruses dreamed up by AI!
Note: This experiment is purely for fun, not scientific. As we discuss later in this post, this is not the ideal way to generate proteins. Still, there is a teeny tiny possibility that one of the viruses dreamed up by AI might correspond to a real (and deadly) virus!
In How to Generate (Almost) Anything, we always try to generate things as diverse as possible. After generating pizzas, dresses, perfumes, graffiti and chocolates, we wanted to try our chances in biology! As with all our projects so far, we looked for data (whether text, images or musical notes) to train our AIs. The Protein Data Bank (PDB), a database of 3-D protein structures, seemed like a good fit since it gives us easy access to thousands of structures. As a part of our quirky experiment, we wondered: what would AI-generated viruses look like?
We collected protein structures that fall under the virus taxonomy. Each structure also comes with a FASTA file, a text format that represents nucleotide or peptide sequences using single-letter codes (for proteins, one letter per amino acid). A sample virus from our dataset looks like the following, with its FASTA representation below:
3D view of 5lsf, Sacbrood honeybee virus and its FASTA representation for Chain A.
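If you have never opened a FASTA file, here is a minimal Python sketch of how one could be read into plain sequence strings. It assumes the usual layout (a header line starting with ">" followed by one or more sequence lines). The `parse_fasta` helper and the example record are made up for illustration; the record is not the real 5lsf sequence, and this is not necessarily how we loaded our data.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record; drop the leading ">".
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect the pieces.
            records[header].append(line)
    # Join the wrapped lines of each record into one string.
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical record for illustration only (not the real 5lsf sequence).
example = """>EXAMPLE_1|Chain A|hypothetical capsid protein
MKTAYIAKQR
QISFVKSHFS
"""

records = parse_fasta(example)
for header, sequence in records.items():
    print(header, len(sequence), sequence)
```

Each value in `records` is one long string of single-letter amino acid codes, which is exactly the kind of text a sequence model can be trained on.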
So our plan was to train a recurrent neural network to generate new sequences! However, it turns out that mapping generated sequences to 3D structures is not so straightforward. In fact, George says that figuring out how a sequence folds into a protein might take an entire PhD! Right now, with our simple experiment, we are just generating the text of the sequences; we don’t know how those molecules would fold into a 3D structure. As a workaround, we used the RaptorX program, which predicts the secondary and tertiary structures of a given protein sequence. In other words, it gives a prediction of what the AI-generated text would look like in 3D (ideally, though, the AI would generate the 3D representations directly).
3D view of an AI-generated protein as predicted by RaptorX.
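For the curious, here is a rough Python sketch of how protein sequences could be turned into training examples for a character-level recurrent network: each amino acid letter becomes an integer, and the network learns to predict the next letter from the previous few. The alphabet, the "$" end marker, the window size and the function names are all assumptions made for this illustration. It covers only the data preparation, not the network itself, and it is not necessarily the setup we used.

```python
# The 20 standard amino acids. Real FASTA files can also contain
# other letters (such as X for "unknown"), which this sketch rejects.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
END = "$"  # hypothetical end-of-sequence marker
VOCAB = AMINO_ACIDS + END
CHAR_TO_ID = {char: index for index, char in enumerate(VOCAB)}


def encode(sequence):
    """Turn a sequence into integer ids, with the end marker appended."""
    return [CHAR_TO_ID[char] for char in sequence + END]


def decode(ids):
    """Turn integer ids back into letters, dropping any end markers."""
    return "".join(VOCAB[i] for i in ids if VOCAB[i] != END)


def training_windows(sequence, window=4):
    """Build (previous letters, next letter) pairs for next-character prediction."""
    ids = encode(sequence)
    pairs = []
    for start in range(len(ids) - window):
        context = ids[start:start + window]
        target = ids[start + window]
        pairs.append((context, target))
    return pairs


for context, target in training_windows("ACDEF", window=4):
    print(decode(context), "->", VOCAB[target])
```

A trained network would then be sampled one letter at a time, feeding each predicted letter back in as context until it emits the end marker; the resulting string is what gets handed to a structure predictor.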
That being said, there is still a chance that a sequence we generated (and RaptorX mapped) might actually correspond to a deadly virus! (This practically begs for a bad Hollywood movie, right? “One day, a bunch of scientists run a fun experiment at MIT and then…”)
We repeated this process a few times, and ended up with some pretty cute virus candidates (Pinar’s favorite one is on the top left).
Some viruses dreamed-up by AI.
As some of our avid followers already know, an important part of How to Generate (Almost) Anything is to bring AI’s dreamed-up creations into reality. So, we decided to 3-D print some of the structures! George spent an afternoon debugging the 3-D printer but finally we were able to hold our AI-generated viruses! (see below)
Pinar and George are playing with AI-generated viruses at the Koch Institute, MIT.
You can watch our YouTube video to learn more about the viruses we generated and about how George makes materials from living organisms! We also have a bonus section at the end where George talks about the Koch Institute and shows us the amazing artwork displayed there.
Special Thanks to Mephesto
Mephesto is the geneticist AI that generated the viruses in this episode. Similar to her namesake, she aims to create things beyond the natural (although we hope she never gets to the point of generating a four-assed ostrich).
Note: Mephesto’s profile picture was also dreamed up by an AI.