“The Maker and the Critic”, experimenting with Midjourney AI

May 13, 2023

A hybrid artwork created in Midjourney, by combining prompts for Richard Neutra and David Hockney


Mash-ups, hybrids, and aesthetic intersections are suddenly ubiquitous.

Star Wars is meeting Wes Anderson, the British Royal Family are appearing in a lewd Martin Scorsese film, and Nike shoes have been cross-fertilised with Tiffany … that last idea is now an actual product.

Why are such images flooding your social media feeds? The answer lies with Midjourney, a leading platform in the realm of generative AI art. Midjourney has recently launched its V5 version, which has captivated users above all with its ability to generate arresting imagery from “hybrid” prompt inputs.

This hybridisation trend involves the combination of two or more seemingly disparate aesthetic or descriptive references into a singular concept. The software accepts both text and image prompts, allowing you to feed it up to 10 input images for inspiration alongside a structured text prompt describing what you envision. Your prompt can be one word or 100-plus words, but as with any good recipe, the chef (the user) has to decide the balance of complementary ingredients and combine them with deft judgement, or risk ‘spoiling the broth’.
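For illustration, a hybrid prompt entered via Midjourney’s Discord interface might be structured like this. The image URLs and parameter values below are hypothetical, included purely to show the anatomy of a prompt: reference images first, then the text description, then parameters such as aspect ratio (`--ar`), model version (`--v`) and stylisation strength (`--stylize`):

```
/imagine prompt: https://example.com/reference-a.jpg https://example.com/reference-b.jpg a poolside modernist villa, sun-bleached pastel palette, flat planes of colour --ar 16:9 --v 5 --stylize 250
```

In practice the whole prompt is entered as a single line, and the balance between the image references and the text description is exactly the “recipe” judgement described above.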

To those who painstakingly create for a living, this concept might seem crude. However, upon experimenting with these tools, you might discover, as I did, that the collaging of concepts from diverse origins is familiar and in some ways analogous to the abstract design process that already occurs within a designer’s mind when creating in a “flow state”.

When addressing a design question or problem, we often find ourselves hunting for a simple, pleasing concept that intuitively feels like the solution. We rummage through our mental library of references and suddenly a previously overlooked piece of material, or a scrap of an idea we encountered somewhere in the past, emerges as a potential fit. As we explore this avenue, a new idea crystallises and a fresh direction emerges for the project. The new idea is an augmentation, a transformation of a conceptual seed we have been storing in our brains for some unknown future use.

David Hockney’s “A Bigger Splash”, 1967

I have a print of Hockney’s A Bigger Splash in my home. While I’m sure that’s a bit of a cliché, I have always loved the sun-bleached colours, calmness and composition of this painting, and my eyes return to it again and again. I wanted to see the range of outputs Midjourney can achieve through hybrid prompts, so I decided to throw lots of ideas at the wall, starting with Hockney’s best-known work.

I’m an architect with plenty of curiosity about what AI will mean for my field, and in particular how tools like Midjourney might impact the way people think, create and design. So I decided to focus on the painting and see what ideas my imagination threw up for using it as a point of departure for design applications. These landed roughly in the following order: speculate on a new architectural material palette; create a semi-surreal interior design mood image; produce a 3D diorama render for a branding piece; imagine a fashion collection on a catwalk … I set out to do all of this with Midjourney as my tool, in less than a day. Here is what I found.

Diagram explaining the Midjourney generative process

The Maker and the Critic

A helpful way to understand Midjourney’s generative process, which Prof Neil Leach has described, borrows from an innovation called Generative Adversarial Networks (GANs). The “Adversarial” part is important: when you feed a brief to the programme, it goes to work on two parallel streams. You can think of these as a) the Maker and b) the Critic. The Maker starts creating random drawings and tries to pass them off as a masterpiece that exactly fits the brief it has received. Then the Critic steps in and says “sorry, this doesn’t pass my test, the client won’t be happy; that’s no good”. The Maker goes back to the drawing board and tries again.

This cycle of creation and critique happens within the code over and over again, many tens of thousands of times (depending on the complexity of the brief you feed it), until the Critic is satisfied. In about a minute of human time, four small thumbnail-sized images are sent back to the user. At this point the user must deploy their judgement to decide whether the text/image prompt is working or whether a change of strategy is needed.
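The Maker-and-Critic loop can be caricatured in a few lines of code. This is a deliberately toy sketch, not a real GAN: here the Maker just proposes random candidates and the Critic applies a fixed scoring rule, whereas in an actual GAN both sides are neural networks that learn from each other. The “brief”, threshold and pixel counts below are all invented for illustration:

```python
import random

BRIEF = [0.2, 0.8, 0.5, 0.9]  # a toy "brief": four target brightness values

def maker(rng):
    # The Maker proposes a random "image" (here, four brightness values).
    return [rng.random() for _ in range(len(BRIEF))]

def critic(candidate, brief):
    # The Critic scores how far a proposal is from the brief (lower is better).
    return sum(abs(c - b) for c, b in zip(candidate, brief))

def generate(brief, threshold=0.5, seed=0):
    # Loop until the Critic is satisfied, counting how many attempts it took.
    rng = random.Random(seed)
    attempts = 0
    while True:
        attempts += 1
        candidate = maker(rng)
        if critic(candidate, brief) <= threshold:
            return candidate, attempts

image, attempts = generate(BRIEF)
```

Even this crude version captures the key point: the user never sees the thousands of rejected attempts, only the survivors that pass the Critic’s bar.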

Midjourney Experiment 1 — Creating a material sample board inspired by the painting

How deep can you dive?

This week I attended the Drawing Matter exhibition at Roca Gallery, curated by Hamza Shaikh, who gave us a guided tour. Towards the end of the exhibition you transition from Superstudio collages and back-of-the-envelope scribbles by Zaha Hadid to a work created by Hamza called “Sacred Synthesis”: a hybrid hand-drawing, painting and collage derived from a composition part-generated in Midjourney. The prompt structure is preserved in carefully transcribed handwritten notes on the table in front of the artwork, a nod to the emotional connection with the code that helped to create the image.

And while prompting is the first important step in Midjourney, the user is not passive once the prompt is entered. In this second phase the user makes editorial choices: at each of what I would describe as generation sprint cycles, a new shortlist of four images is presented as possible routes forward, a fork in the road upon which to keep experimenting. You decide which fork to take and the process repeats; your role is to provide the ingredients, gradually curate the project direction and decide when the process is complete.

You can go extremely deep on this process, delving into layer after layer of experimentation, I imagine until your subscription alert tells you there are no more credits and you must reach again for your credit card. As an example of how deep you may need to go: I usually find I need around 10 layers before I reach something close to what I had imagined at the outset, and there is no certainty of a good outcome. Tim Fu and Chantal Matar, whom I joined on an AI panel discussion this week, described typically going 20 layers deep in search of an image that feels ‘done’. In my own experience the changes are most dramatic at the top layers of the process, so it is usually best to create a larger number of experimentation paths at the outset to make sure the first fork you take is the right one; every subsequent decision has a gradually smaller impact. At the beginning you are judging overall composition and content; as you go deeper, you focus on smaller emergent features and refinements.

Something that will happen in the latter-stage generations is that the model will start to play with the introduction or removal of certain key elements, and if you like the potential direction, you can push the model down that path to see if there is something there. In this way, it feels a bit like wandering blindfolded through a forest, grabbing each tree in turn to see if it feels like the ‘right tree’.

The image generation process works from a specific ‘seed’, which is unique (one of 4,294,967,295 possible seeds, to be precise). The only way to re-create an exact generation is to re-use an identical seed number within the prompt, which can be done via the parameters if you so desire. Grasping the seed as a unique key helps you understand the sheer range of possibilities for any given prompt configuration, and the sheer improbability of ever getting the same result twice. The same image/text prompts from one user would engender an altogether different outcome for another, because the seed is generated randomly by default.
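The mechanics of this are easy to demonstrate with a mock generator. The function below is a stand-in, not Midjourney’s actual code: the prompt text, pixel count and return values are invented, but the behaviour mirrors what is described above, namely that a random seed is drawn by default, while re-supplying the same prompt and seed reproduces the identical output:

```python
import random

MAX_SEED = 4_294_967_295  # the seed ceiling mentioned above

def mock_generate(prompt, seed=None):
    # Stand-in for an image generator: same prompt + same seed => same "image".
    if seed is None:
        seed = random.randrange(MAX_SEED + 1)  # a random seed is drawn by default
    rng = random.Random(f"{prompt}|{seed}")    # deterministic given prompt and seed
    image = [round(rng.random(), 3) for _ in range(6)]  # six fake "pixels"
    return seed, image

seed, image = mock_generate("poolside villa, sun-bleached palette")
```

In Midjourney itself the equivalent lever is the `--seed` parameter appended to the prompt; omit it and each run rolls the dice afresh.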

Midjourney Experiment 2 — Creating an interior concept mood image inspired by the painting

The process can prove fruitless: you pursue an idea, options emerge, but then it just doesn’t come together as you imagined. Whether it’s the image selection or the prompt wording, you will often conclude that the trajectory is off-kilter and needs to be abandoned and replaced with a total rethink. This can be disappointing: you submit your carefully crafted prompts with great hopes only to have them dashed a moment later, when Midjourney’s response demonstrates that you have simply not communicated with it effectively; you’re the problem, not the software.

This is the part where many will give up on their (mid)journey, but you must persevere and dig deeper to find the nuggets of gold. This means retracing a few steps and reversing your thinking about a pivotal image fork several steps earlier. Or there may be a nuance in the prompt wording that caused things to start going downhill; much trial and error is the only way to begin to feel a degree of control over the output.

Midjourney Experiment 3 — Produce a 3D branding image (diorama) inspired by the painting

Don’t get me wrong, the images are pretty much all “good” quality on a surface level, but whether they are “actually, truly good” is the real question you have to keep asking yourself.

Explore the ‘Newbies’ Discord channels to better understand this phenomenon: quantity is most certainly not quality. In these channels humour and enthusiasm abound and you will see the full spectrum of artistic style and taste; it’s a bit like an “all you can eat” buffet, full of promise and choice, but quickly nauseating. This is where, if you haven’t concluded as much already, you will realise that when the propagation of images is apparently endless, judgement becomes everything.

Midjourney can create an accomplished image with almost no effort, but it can’t teach you how to think. This seems pertinent to architectural education, and during the AI panel discussion this week Kat Stevens described how students are adopting these new tools in large numbers because they allow them to rapidly test abstract spatial concepts in three dimensions, with light and colour, starting from a qualitative description of space or atmosphere. This shift is already having a dramatic effect on the design culture in university studios.

Reams of bold images are being generated with absurd abundance and the more that are created, the greater the degree of discernment we will have to apply. Arguably the ease with which the creation process is being augmented calls for a higher bar for judgement in art and design.

There is no shortage of hyperbole around this new “AI epoch”, and of the many outlandish claims, the one I find most implausible is the idea that the only thing previously stopping people from being designers, writers, or artists is tooling, rather than talent. This is akin to imagining free access to treadmills will make us all runners. Overall, I do think we will find more people now able to pursue creative paths that were previously not open to them, but AI is simply providing insanely powerful tooling and, if we’re honest, an almighty boost of synthetic talent.

Midjourney Experiment 4 — Creating a fashion collection and catwalk concept inspired by the painting

I have described the process of Midjourney prompting as I imagine it would feel to sit in a Las Vegas casino endlessly putting coins into a slot machine. It’s really hard to stop, because like all addictions the dopamine hit after a successful prompt feels great, and it is powerful enough to keep you coming back for another try. When an image and/or text prompt comes together as intended, it feels like magic: an image of audacious quality and completeness appearing suddenly from the digital ether and confirming what we always knew to be true, that we are, in fact, a genius.

One of the reasons AI adoption is spreading so fast is that it’s addictive to play with, just like Instagram, Pinterest, TikTok and all the other apps I have deleted from my phone every January 1st for the past five years. Generative AI offers the irresistible illusion of boundless talent and possibility to all its users, and they are enraptured; like Narcissus staring at his reflection in the water, we can’t look away, after all each image is truly immaculate. It is viral content on steroids, and as the endless quest for content spreads, the network effect of the software will grow stronger, drawing more and more people into its sticky web.

I encourage you to give it a go. Midjourney and other platforms like it are phenomenally powerful pieces of software that cannot be ignored; just tread carefully and maintain a critical eye of judgement. Like the software, you have to be a “Maker and Critic” at the same time. Keep in mind that the software probably knows what we like better than we do, because it’s being trained on every click and selection we make. It could become the filter to end all filters, an aesthetic Da Vinci code, all smooth contours and manicured perfection. Unless you ask for imperfection, in which case it can do that too.
