
GPT-4o Image Generation: What It Means for Creativity and Communication


I've been watching the evolution of AI image generation with a mix of fascination and healthy skepticism. When OpenAI announced GPT-4o's integrated image capabilities, though, I spent a weekend putting it through its paces. What struck me wasn't just the technical improvements but the subtle shift in how it might change our relationship with visual creation. It absolutely amazed me.


Can you believe this is AI-generated?

Prompt: A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt with a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

Beyond the Technical Specs

GPT-4o introduces some impressive technical advancements. It's multimodal from the ground up, meaning it understands the relationship between language and images in a more integrated way. The text rendering is noticeably better (anyone who's struggled with garbled text in other AI image generators will appreciate this). It follows more complex instructions and can handle references from uploaded images.
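For those who want to experiment programmatically rather than through the chat interface, the same capability is exposed via OpenAI's API. Here's a minimal sketch, assuming the official `openai` Python SDK and the `gpt-image-1` model name (check the current API documentation, as model names and defaults change):

```python
# Minimal sketch of an image-generation call via the OpenAI API.
# Assumes the official `openai` SDK and an OPENAI_API_KEY in the environment.
import base64
import os


def build_request(prompt: str, size: str = "1024x1024") -> dict:
    """Assemble the parameters for an image-generation request."""
    return {"model": "gpt-image-1", "prompt": prompt, "size": size}


def generate_image(prompt: str, out_path: str = "out.png") -> None:
    """Request an image and save the base64-encoded result to disk."""
    from openai import OpenAI  # deferred import: only needed for real calls

    client = OpenAI()
    result = client.images.generate(**build_request(prompt))
    image_bytes = base64.b64decode(result.data[0].b64_json)
    with open(out_path, "wb") as f:
        f.write(image_bytes)


if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    generate_image("A glass whiteboard in a room overlooking the Bay Bridge")
```

The network call is guarded behind the API-key check, so the sketch is safe to read and adapt without incurring usage charges.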


But what does this actually mean for us as creators, communicators, and businesses?


To be honest, I'm still working that out. But here are some thoughts after spending time with the technology.



The Photography Parallel, With Caveats

I find myself drawing parallels to photography's emergence in the nineteenth century. When cameras arrived, they didn't replace painting; they changed it. Freed from the burden of pure representation, artists explored impressionism, expressionism, and countless other movements.


However, I'm wary of overstating this comparison. GPT-4o isn't revolutionising visual arts overnight. What it's doing is lowering certain barriers to visual expression and opening up possibilities for different kinds of creative workflows.


Prompt: Create image of digital photography and AI combining into a new creative art

What I'm Finding Genuinely Useful

After the initial excitement wore off, I started finding practical applications that actually make sense:


1. Concept Visualisation

There's something powerful about describing an idea and seeing it visualised almost immediately. It changes the conversational dynamic when discussing creative concepts. Internally, we used GPT-4o to quickly test different visual approaches for a product we're building. The ability to say "what if we tried this instead?" and immediately see a version saved us hours of back-and-forth.


2. Iteration Speed

It's like having a really fast sketch artist who doesn't get offended by feedback. We can explore visual directions rapidly, discard what doesn't work, and refine what does, all without the emotional investment that comes with hours of manual creation.


3. Visual Communication for Non-Designers

Using GPT-4o to create explanatory visuals for complex concepts is a huge unlock. People who would never open Photoshop are suddenly communicating visually, which is adding a new dimension to internal knowledge sharing.



Honest Limitations Worth Acknowledging


While impressive, GPT-4o has clear limitations:

  • It struggles with more than 10-20 distinct elements in a single image

  • There are occasional cropping issues with longer images

  • The editing precision still needs work

  • It sometimes invents details when prompts lack specificity

  • Face consistency, especially when editing from uploads, remains challenging


These aren't criticisms so much as important parameters to understand when considering practical applications.


The Copyright Elephant in the Room


I can't ignore the copyright concerns that come with these impressive capabilities. OpenAI and similar companies train these models on massive internet datasets: artwork, photos, designs, even YouTube content, often without creator permission.


When I prompted GPT-4o to create something "in the style of Studio Ghibli," the result was uncannily accurate, and troubling. Is this a new work or essentially a derivative built from thousands of copyrighted Ghibli images? And shouldn't the original creators have some say or benefit?


Prompt: Create a Studio Ghibli style version of this image

The legal landscape remains uncertain. While some claim AI-generated content qualifies as "fair use" or "transformative work," these arguments haven't been properly tested in courts, particularly in New Zealand's legal context.


To their credit, OpenAI does implement C2PA (Coalition for Content Provenance and Authenticity) metadata in all GPT-4o generated images. This embedded metadata identifies content as AI-generated, creating a provenance trail that helps with transparency. While this doesn't address copyright concerns directly, it does allow for determining an image's origin, an important step for accountability.
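As a rough illustration of how that provenance trail can surface, C2PA manifests are stored in JUMBF boxes, whose headers typically carry the ASCII labels "jumb" and "c2pa". The sketch below is only a byte-level heuristic of my own, not a real parser or signature verification; proper validation needs a dedicated C2PA library:

```python
# Rough heuristic: scan an image's raw bytes for the JUMBF/C2PA labels
# ("jumb" box type and "c2pa" manifest label). This does NOT verify the
# manifest's cryptographic signature; it only suggests a manifest exists.
from pathlib import Path


def looks_like_c2pa(data: bytes) -> bool:
    """Return True if the byte stream contains both C2PA/JUMBF markers."""
    return b"jumb" in data and b"c2pa" in data


def file_looks_like_c2pa(path: str) -> bool:
    """Apply the heuristic to a file on disk."""
    return looks_like_c2pa(Path(path).read_bytes())
```

Note that metadata like this is easily stripped by screenshots or re-encoding, so its absence proves nothing; it's a transparency aid, not a watermark.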


Our recommendation? Use these tools for internal ideation, but be extremely cautious about creating content that mimics recognisable styles for commercial purposes. Several lawsuits against AI companies are pending, and their outcomes will likely reshape how we approach these tools professionally.



Finding a Balanced Perspective


Between the "AI will replace all artists" fear-mongering and the "this changes everything" hype, there's a more nuanced reality:


These tools work best as collaborative extensions of human creativity. They're particularly valuable for exploration, iteration, and certain types of visual communication. They supplement rather than replace existing creative processes.

The most interesting applications will likely emerge from how we integrate these capabilities into our workflows rather than from the technology itself.



What This Means for New Zealand Businesses

For organisations considering how these tools might fit into their operations:


  1. Communication Enhancement: These tools can help bridge the gap between text-based concepts and visual execution, which is particularly useful for teams without dedicated design resources.


  2. Rapid Prototyping: Test visual concepts quickly before committing resources to final production, potentially reducing costs and improving outcomes.


  3. Content Creation Support: For businesses needing consistent visual assets across channels, these tools offer ways to maintain visual cohesion while reducing production time.


  4. Creative Exploration: Use AI generation to explore directions you might not have considered, potentially opening up new creative avenues.


Scenario: A startup aims to market its new line of eco-friendly kitchenware but lacks the resources for professional photography.

Prompt: Generate an image of a biodegradable bamboo cutting board on a rustic kitchen counter with natural lighting


A Thoughtful Path Forward


As we navigate this evolving landscape, we find the most productive approach is exploring practical applications while maintaining awareness of both capabilities and limitations.


The question isn't whether these tools will replace human creativity – they won't. The more interesting question is how they might augment it in ways that are both practical and meaningful.


I'm curious: how do you see these tools potentially fitting into your creative or communication processes? The conversation around these technologies is still developing, and I'd love to hear your perspective.


