This is a brief tutorial on how to operate VQGAN+CLIP by Katherine Crowson. You don’t need any coding knowledge to operate it - my own knowledge of coding is very minimal.
I did not make this software, I just wanted to bring it to the public’s attention. Katherine Crowson is on Twitter @RiversHaveWings, and the notebook that inspired it (The Big Sleep, combining BigGAN+CLIP in the same way) was written by @advadnoun. In addition, if you’d like to see some really trippy video applications of this technique, check out the videos on @GlennIsZen‘s YouTube Page: https://www.youtube.com/user/glenniszen
Note: I purchased a subscription to Google Colab Pro, which gives priority access to better and faster GPUs, and increases the time before Colab times out. This is not a necessary measure, and you can do all of this without it. If you want to run generations for longer periods of time or have them run in the background without interruption, Colab Pro is a reasonable option.
Go to this Google Colab notebook (originally by Eleiber and Abulafia using Katherine Crowson’s code, translated into English by @somewheresy):
https://colab.research.google.com/drive/1_4Jl0a7WIJeqy5LTjPJfZOwMZopG5C-W?usp=sharing
Alternatively, the user @angeremlin has developed an alternate version of the notebook which allows for importing from Google Drive, and batch processing:
https://colab.research.google.com/drive/1ud6KJeKdq5egQx_zz2-rni5R-Q-vxJdj?usp=sharing
Google Colab is like Google Docs for code. It gives you access to dedicated GPUs that Google operates through the cloud, so you don’t need to use your own computer’s processing power to run the program. This is useful, because generating art with AI can be very computationally intensive.
You’ll see that the page is split into lots of individual “cells.” Don’t mess with the vast majority of these. There are only two cells you should be changing: “Selección de modelos a descargar” (Selection of Models to Download) and “Parámetros” (Parameters).
In the Parameters cell, you enter all the information necessary for VQGAN to create its image. Here’s a breakdown of all the different parameters in the cell:
textos/texts: This is where the written prompt goes, e.g. “a cityscape in the style of Van Gogh,” or “a Vaporwave shopping mall”
ancho/width: The width of the generated image in pixels
alto/height: The height of the generated image in pixels
modelo/model: The dataset the GAN uses to create the image (more on this in a second)
intervalo_imagenes/image_range: How frequently the GAN actually prints the ongoing image for you (e.g. every 50 iterations, every 5 iterations, every 500 etc)
imagen_inicial/initial_image: This is optional. You can insert your own image to give the GAN something to start with, instead of having it generate an image from scratch
imagenes_objetivo/target_images: Similarly, you can upload an image for the software to aim for, essentially functioning the same way as the text prompt. You can have several images in this section, and you can use it with or without the text prompt.
seed: As in Minecraft, this is a unique designator. Keeping it at -1 will generate a new random seed every time. Any positive integer is its own seed, and will send the software down the same path every time (useful if you want several variations on the same theme)
max_iteraciones/max_iterations: How many iterations the software should produce before stopping. A value of -1 will keep the GAN running indefinitely.
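The parameter fields above boil down to a handful of values the notebook reads before it starts generating. As a rough sketch (the field names follow the tutorial’s labels; the values and the dict layout are purely illustrative, not the notebook’s actual code), a typical configuration looks like this:

```python
# Illustrative parameter values mirroring the notebook's form fields.
# This is a sketch of what the fields mean, not the notebook's real code.
params = {
    "textos": "a Vaporwave shopping mall",  # text prompt
    "ancho": 700,                           # width in pixels
    "alto": 700,                            # height in pixels
    "modelo": "vqgan_imagenet_f16_16384",   # default ImageNet 16384 model
    "intervalo_imagenes": 50,               # print progress every 50 iterations
    "imagen_inicial": None,                 # optional starting image (path)
    "imagenes_objetivo": [],                # optional target images (paths)
    "seed": -1,                             # -1 = new random seed each run
    "max_iteraciones": -1,                  # -1 = run until stopped
}
print(params["textos"], params["ancho"], "x", params["alto"])
```

A seed of -1 and max_iteraciones of -1 match the defaults described above: a fresh random seed each run, and generation that continues until you stop it.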
Back up in the “Selección de modelos a descargar” cell, you’ll see several boxes, and only one of them ticked.
These are the different datasets VQGAN has access to - massive collections of images that it will take inspiration from. By default, VQGAN uses the ImageNet 16384 dataset, which is why it's the only box ticked, and why it’s the default option in the “Model” dropdown menu in the “Parameters” cell.
If you want VQGAN to use the other datasets, tick their boxes. Each one has to download separately, and several of them are quite large, which is why they’re turned off by default. Once a dataset is downloaded, you can tell VQGAN to use it by selecting it under “Model.”
Once you have your dataset boxes ticked and your parameters ready, you can run the program!
The first time you open the Google Colab notebook, you’ll need to run all of the cells in order. You can either do this manually, by running each cell in sequence (clicking them in order will queue them up), or by going up to the “Runtime” tab and clicking “Run All”
Note: there are three more cells underneath “Hacer la ejecución” (Do the Execution), which is the cell where the images are printed. The first of these converts your output into a video, the second plays the video in your browser, and the third downloads the image to your computer. If you aren’t interested in generating a video, just run all the cells up to and including “Hacer la ejecución”
The images will begin to steadily print out in the “Hacer la ejecución” cell. Initially, it will look like dirt, but as the iterations progress it will slowly take shape. You can download any of the printed images by right-clicking, and clicking “save image.”
Some useful tips!
This might just be me, but I struggle to generate images in sizes larger than 700x700 pixels - Colab runs out of memory. Typically, I keep to that total number of pixels (700x700 = 490,000) whenever I try other aspect ratios. Here are some useful ones:
1:1 - 700x700
4:3 - 808x606
16:9 - 928x522
Cinemascope (2.4:1) - 1084x452
2:1 - 988x494
1.66:1 - 903x544
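If you want a ratio that isn’t in the list above, the sizes can be derived from the pixel budget: scale the ratio so width times height lands near 490,000. A minimal sketch of that arithmetic (the function name is mine, and the results differ by a few pixels from the hand-rounded values above):

```python
import math

def dims_for_ratio(ratio_w, ratio_h, pixel_budget=490_000):
    """Find a width/height near pixel_budget at the given aspect ratio."""
    # Choose scale s so that (ratio_w * s) * (ratio_h * s) ~= pixel_budget
    s = math.sqrt(pixel_budget / (ratio_w * ratio_h))
    return round(ratio_w * s), round(ratio_h * s)

print(dims_for_ratio(1, 1))    # (700, 700)
print(dims_for_ratio(16, 9))   # (933, 525)
```

The 1:1 case comes out to exactly 700x700; the other ratios land close to the listed sizes, which were simply rounded a little differently by hand.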