In The Phantom Spy, Phoenix has a serious issue with crowds: they terrify her. But to compete in her main hobby -- bot battling -- and to follow the trail of clues to her brother’s kidnapper, she must go into crowded places. To cope, she wears polarized goggles and headphones that help shut out the sights and sounds of the crowds around her.
I drew the picture of Phoenix on the left. At my skill level at the time, it took me about a week to complete. Fast forward a few years and we have AI that can generate art based on prompts. Using such a tool, I described Phoenix as a young African-American girl wearing polarized goggles, lifting off her headphones with one hand, and carrying a battlebot under the other arm. After some tweaking of the description and playing with a few other parameters, Midjourney generated the two images on the right in about 10 minutes.
The good: This was ten minutes of work. The overall quality is excellent and easily good enough for my purpose: a portrait for a novel. Did I mention this took ten minutes? With broad prompts it can generate interesting and fantastic images as it fills in the gaps.
The bad: Midjourney, and as far as I have seen pretty much every AI art generator out there as of today, cannot give you specific results. No matter how much tweaking I tried, it quickly became apparent that Midjourney could not get in every detail that I asked for. And these details are not difficult. If you ask for the details individually, Midjourney can do them. But combining three or four specific details in one image is beyond the AI’s ability right now. You can see Midjourney did not draw polarized goggles. It did not draw Phoenix lifting the headphones with one hand. Obviously you can forget about trying for further details, such as exactly how you want the battlebot to look, or the battlebot emitting a spark that lights up Phoenix’s face, or Phoenix wearing her grandmother’s bent cross necklace. You will have to settle for two or three main details. Four, if you’re lucky.
The ugly: Midjourney creates images based on what it learns from multiple images to, in essence, create a low-intelligence Frankenstein image. When I say low-intelligence I mean the AI has little to no basic knowledge of ... well, anything. It doesn’t know what a battlebot is, what it’s used for, or how it should look. It doesn’t even know how many fingers a human being has -- as you can see, it drew seven on a hand in one of the images. Midjourney routinely creates strange or grotesque anatomical features, particularly when it comes to limbs. Extra limbs, characters sharing limbs, limbs bent at impossible angles. The more detail you ask it to add, the more likely it is to generate things that demonstrate a lack of fundamental knowledge about everyday objects. It will draw swords that have two handles and no blade. A bow and arrow with strings that attach to empty space instead of the ends of the bow.
My thoughts: I asked my daughter, who is familiar with Phoenix, which picture she prefers. She said Midjourney’s images look good, but my drawing looks the most like Phoenix and she prefers that. But she was clearly blown away by how striking and life-like Midjourney’s output looks, especially considering -- did I tell you it only took ten minutes to produce? Some of the errors Midjourney creates are easy to fix. I can paint over extra fingers and such easily enough. Some take considerably more effort. Facial expressions, deformed limbs, illogical building structures, etc., require more than a quick paint-over. But even in those instances, at my current artistic skill level, using Midjourney as a starting point and then fixing the errors and adding the missing elements and details is usually a matter of hours’ worth of work to get to a finished product instead of days.
There are other problems with Midjourney. Since it is essentially random (literally, you can keep asking it to generate random versions of the same prompt), you cannot get it to generate the same character again in a different setting. If I liked one of the images of Phoenix that it generated, I cannot ask it to generate the same girl in the middle of a crowded arena with her head bowed, goggles on, stressed and trying to focus on her bot. It would generate a completely different girl.
AI art generators can be used as excellent accelerators in some situations, especially for mid-level artists. They can also function as fantastic generators of ideas and inspirational snapshots. But they aren’t ready to completely replace human artists... yet.