Introduction to VQGAN+CLIP

Here is a tutorial on how to operate VQGAN+CLIP by Katherine Crowson! No coding knowledge necessary.

This is a brief tutorial on how to operate VQGAN+CLIP by Katherine Crowson. You don’t need any coding knowledge to operate it; my own knowledge of coding is very minimal.

I did not make this software; I just wanted to bring it to the public’s attention. Katherine Crowson is on Twitter @RiversHaveWings, and the notebook that inspired it (The Big Sleep, combining BigGAN+CLIP in the same way) was written by @advadnoun. In addition, if you’d like to see some really trippy video applications of this technique, check out the videos on @GlennIsZen’s YouTube page:

Note: I purchased a subscription to Google Colab Pro, which gives priority access to better and faster GPUs and extends the time before Colab times out. This is not a necessary measure, and you can do all of this without it. If you want to run generations for longer periods of time or have it run in the background without interruption, Colab Pro is a reasonable option.

STEP 1: Google Colab

Go to this Google Colab Notebook (originally by Eleiber and Abulafia using Katherine Crowson’s code, translated into English by @somewheresy):

Alternatively, the user @angeremlin has developed another version of the notebook that allows importing from Google Drive and batch processing:

Google Colab is like Google Docs for code. It gives you access to GPUs that Google operates in the cloud, so you don’t need to use your own computer’s processing power to run the program. This is useful, because generating art with AI can be very computationally intensive.

You’ll see that the page is split into lots of individual “cells.” For the vast majority of these, don’t mess with them. There are only two cells you should be interfering with, “Selección de modelos a descargar” (Selection of Models to Download) and “Parámetros” (Parameters).

STEP 2: Parameters

In the Parameters cell, you enter all the information necessary for VQGAN to create its image. Here’s a breakdown of all the different parameters in the cell:

textos/texts: This is where the written prompt goes, e.g. “a cityscape in the style of Van Gogh,” or “a Vaporwave shopping mall”

ancho/width: The width of the generated image in pixels

alto/height: The height of the generated image in pixels

modelo/model: The pretrained VQGAN model used to create the image, named after the dataset it was trained on (more on this in a second)

intervalo_imagenes/image_range: How frequently the GAN actually prints the ongoing image for you (e.g. every 50 iterations, every 5 iterations, every 500 etc)

imagen_inicial/initial_image: This is optional. You can insert your own image to give the GAN something to start with, instead of having it generate an image from scratch

imagenes_objetivo/target_images: Similarly, you can upload an image for the software to aim for, essentially functioning the same way as the text prompt. You can have several images in this section, and you can use it with or without the text prompt.

seed: As in Minecraft, it’s a unique designator for a particular run. Keeping it at -1 will generate a new random seed every time. Any positive integer acts as a fixed seed and will send the software down the same path every time (useful if you want several variations on the same theme)

max_iteraciones/max_iterations: How many iterations the software should produce before stopping. A value of -1 will keep the GAN running indefinitely.
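
If it helps to picture the Parameters cell as data, here is a rough sketch in Python. It is only an illustration: the key names are borrowed from the English labels above, the values are made-up examples, and `resolve_seed` is a hypothetical helper that mimics the -1 convention rather than the notebook's actual code.

```python
import random

# Hypothetical settings mirroring the fields of the Parameters cell.
# Key names follow the English labels in this tutorial; values are examples.
params = {
    "texts": "a cityscape in the style of Van Gogh",
    "width": 700,
    "height": 700,
    "model": "Imagenet 16384",
    "image_range": 50,       # show the image every 50 iterations
    "initial_image": None,   # optional starting image
    "target_images": None,   # optional image prompts
    "seed": -1,              # -1 means "pick a new random seed"
    "max_iterations": -1,    # -1 means "run until interrupted"
}


def resolve_seed(seed):
    """Return a fresh random seed for -1, otherwise the seed unchanged."""
    if seed == -1:
        # Assumption: any 32-bit integer is an acceptable seed.
        return random.randrange(2**32)
    return seed


seed_in_use = resolve_seed(params["seed"])
```

Writing down `seed_in_use` after a run you like is what lets you repeat it later with the same prompt.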

STEP 3: Models

Back up in the “Selección de modelos a descargar” cell, you’ll see several boxes, and only one of them ticked.

These are the different pretrained models VQGAN can use, each one trained on a different massive collection of images that it will take inspiration from. By default, VQGAN uses the Imagenet 16384 model, which is why it’s the only box ticked and why it’s the default option in the “Model” dropdown menu in the “Parameters” cell.

If you want VQGAN to use the other models, tick their boxes. Each one has to download separately, and several of them are pretty massive, which is why they’re turned off by default. If you choose to download them, you can then pick the one you want VQGAN to use by selecting it under “Model.”

STEP 4: Running the GAN

Once you have your model boxes ticked and your parameters ready, you can run the program!

The first time you open the Google Colab Notebook, you’ll need to run all of the cells in order. You can either do this manually, by running each cell in sequence (clicking them in order will queue them up), or by going up to the “Runtime” tab and clicking “Run All.”

Note: there are three more cells underneath “Hacer la ejecución” (Do the Execution), which is the cell where the images are printed. The first of these converts your output into a video, the second plays the video in your browser, and the third downloads the video to your computer. If you aren’t interested in generating a video, just run all the cells up to and including “Hacer la ejecución.”

The images will begin to steadily print out in the “Hacer la ejecución” cell. Initially, it will look like dirt, but as the iterations progress it will slowly take shape. You can download any of the printed images by right-clicking, and clicking “save image.”
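
To make the timing of those printouts concrete, here is a small Python sketch of which iterations would show an image for a given image_range and max_iterations. The function name and the exact stopping rule (stopping once the counter equals max_iterations, and never stopping for -1) are assumptions for illustration, not a copy of the notebook's loop.

```python
from itertools import count, islice


def displayed_iterations(max_iterations, image_range):
    """Yield the iteration numbers at which an image would be shown."""
    for i in count():
        if i % image_range == 0:
            yield i
        # Assumed stopping rule: -1 never matches, so the loop runs forever.
        if i == max_iterations:
            break


print(list(displayed_iterations(100, 50)))             # [0, 50, 100]
print(list(islice(displayed_iterations(-1, 500), 3)))  # [0, 500, 1000]
```

A smaller image_range gives you more frequent previews, which also means more frames if you go on to make a video.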

STEP 5: Tips and Tricks

Some useful tips!

1. Once you’ve run the GAN for the first time, you don’t need to run all the cells all over again for your next attempt. As long as you don’t close the Notebook tab in your web browser, all you need to do to generate another image is run the “Parameters” cell and then the “Do the Execution” cell. You shouldn’t need to bother with any of the earlier cells unless you want to download more models.

2. You can stop the procedure at hand by going up to “Runtime” and clicking “Interrupt Execution.” Useful if you want to halt the output midway through and move on to video generation, or if you want to run a new prompt without waiting for the current one to finish.

3. In the text prompt section, you can enter multiple prompts by separating them with the “|” symbol.

4. You can ascribe percentage weights to your different prompts by adding a colon and then a number to each one, adding up to 100. For example, “a cityscape:50 | nightmare artist:25 | photorealism:25”

5. In the text prompt section, adding phrases like “unreal engine,” “hyperrealistic,” “photorealistic,” and “render” produces more HD-like results, which is pretty funny.

6. If you want to add your own images for use in the “initial_image” or “target_images” section, go to the left side of the screen and click on the little File icon. Drag and drop your images into the folder, and then type the name of the file (e.g. image.jpg, face.png) into the relevant section.
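
For the curious, tips 3 and 4 can be pictured with a short Python sketch that splits a prompt string on “|” and reads the number after the last colon as a weight. This is a hypothetical parser written for illustration, not the notebook's own; in particular, giving unweighted prompts a default weight of 1.0 is an assumption.

```python
def parse_prompts(text):
    """Split 'a:50 | b:25' into a list of (prompt, weight) pairs."""
    parsed = []
    for chunk in text.split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue
        prompt, sep, weight = chunk.rpartition(":")
        if sep:
            try:
                parsed.append((prompt.strip(), float(weight)))
                continue
            except ValueError:
                pass  # the text after the colon was not a number
        # Assumption: prompts without a weight default to 1.0.
        parsed.append((chunk, 1.0))
    return parsed


def weight_shares(parsed):
    """Return each prompt's share of the total weight."""
    total = sum(weight for _, weight in parsed)
    return [weight / total for _, weight in parsed]


pairs = parse_prompts("a cityscape:50 | nightmare artist:25 | photorealism:25")
print(pairs)
print(weight_shares(pairs))  # [0.5, 0.25, 0.25]
```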

STEP 6: Aspect Ratios and Re-Sizing

This might just be me, but I struggle to generate images in sizes larger than 700x700 pixels because Colab runs out of memory. Typically, I stay within that total number of pixels (700x700 = 490000) whenever I try other aspect ratios. Here are some useful ones:

1:1 - 700x700

4:3 - 808x606

16:9 - 928x522

Cinemascope (2.4:1) - 1084x452

2:1 - 988x494

1.66:1 - 903x544
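
If you want sizes for other ratios, the arithmetic can be sketched in a few lines of Python. This is a minimal sketch under two assumptions: the ratio is given as whole numbers (like 4:3 or 16:9), and the goal is the largest size that stays within the 700x700 pixel budget. The function name is made up for illustration. Ratios that are not whole numbers, such as 2.4:1 or 1.66:1, need some rounding, so results for those may differ slightly from the hand-picked values above.

```python
from math import isqrt


def fit_to_budget(ratio_w, ratio_h, budget=700 * 700):
    """Largest (width, height) with an exact ratio_w:ratio_h shape
    whose pixel count does not exceed the budget."""
    # (ratio_w * k) * (ratio_h * k) <= budget  =>  k*k <= budget // (ratio_w * ratio_h)
    k = isqrt(budget // (ratio_w * ratio_h))
    return ratio_w * k, ratio_h * k


print(fit_to_budget(1, 1))    # (700, 700)
print(fit_to_budget(4, 3))    # (808, 606)
print(fit_to_budget(16, 9))   # (928, 522)
print(fit_to_budget(2, 1))    # (988, 494)
```

If Colab still runs out of memory, passing a smaller budget (for example `budget=600 * 600`) scales every ratio down in the same way.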

Published 24/07/2021, 12:27:46

