I keep meaning to try out this new capability, but there are so many tools, so few hours! In any case, it promises to be an exciting breakthrough. If you take it for a spin, I’d love to hear what you think of the results.
Sure, all this stuff—including what’s now my career’s work—will likely make it semi-impossible to reason together about any shared conception of reality, thereby calling into question the viability of democracy… but on the upside, moar dank memes!
Here’s how to create a dancing character using just an image + an existing video clip:
Viggle is the hottest new AI creative tool, forever changing memes and the future of AI video. @aiwarper created a meme with the Joker and Lil Yachty that caused a hilarious explosion.
Removing objects will be huge, and Generative Extend—which can add a couple of seconds to clips to ease transitions—seems handy. Check out what’s in the works:
Many, many years ago, I delighted in experimenting with vector copies of famous logos I could download from the, um, copyright-agnostic Logotypes.ru. That site seems to be gone now, but this quick vid highlights some others you might find useful:
Check out the latest work (downloadable for free here) from longtime Adobe veteran (and former VP of product at Stability AI) Christian Cantrell:
The new version of the Concept Art #photoshop plugin is here! Create your own AI-powered workflows by combining hundreds of different imaging models from @replicate — as well as DALL•E 2 and 3 — without leaving @Photoshop. This is a complete rewrite with tons of new features coming (including local inference).
Not content to let Adobe & ChatGPT have all the fun, Google is now making its Imagen available to developers for image synthesis, including inserting items & expanding images:
We’re also adding advanced photo editing features, including inpainting and outpainting.
Imagen, Google’s text-to-image model, can now create live images from text, in preview. Just imagine generating animated images such as GIFs from a simple text prompt… Imagen also gets advanced photo editing features, including inpainting and outpainting, and a digital watermarking feature powered by Google DeepMind’s SynthID.
I’m eager to learn more about the last bit re: content provenance. Adobe has talked a bunch about image watermarking, but has not (as far as I know) shipped any support.
Meanwhile Google is also challenging Runway, Pika, & others in the creation of short video clips:
Our generative technology Imagen 2 can now create short, 4-second live images from a single prompt.
Given that my wife is the one responsible enough to chase the eclipse today & not roast her eyeballs, I’m left at home digging up a classic Dana Carvey bit about the eclipse (30 seconds, starts at 2:04). Enjoy! :-p
For 10 years or so I’ve been posting admiringly about the work of Paul Trillo (16 times so far; 17 now, good Lord), so I was excited to hear his conversation with the NYT Hard Fork crew—especially as he’s recently been pushing the limits with OpenAI’s Sora model. I think you’ll really enjoy this thoughtful, candid, and in-depth discussion about the possibilities & pitfalls of our new AI-infused creative world:
Some companies spend three months just wringing their hands about whether to let you load a style-reference image; others spend three people and go way beyond that, in realtime ¯\_(ツ)_/¯ :
These guys are doing such a good job creating intuitive visual interfaces for prompting
This is the new real-time image blending interface from @krea_ai
When DALL•E first dropped, it wasn’t full-image creation that captured my attention so much as inpainting, i.e. creating/removing objects in designated regions. Over the years (all two of ’em ;-)) I’ve lost track of whether DALL•E’s Web interface has remained available (’cause who’s needed it after Generative Fill?), but I’m very happy to see this sort of selective synthesis emerge in the ChatGPT-DALL•E environment:
Or… something like that. Whatever the case, I had fun popping our little Lego family photo (captured this weekend at Yosemite Valley’s iconic Tunnel View viewpoint) into Photoshop, selecting part of the excessively large rock wall, and letting Generative Fill give me some more nature. Click or tap (if needed) to see the before/after animation:
Generative Fill, remaining awesome for family photos. From Yosemite yesterday: pic.twitter.com/GtRP0UCaV6
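If you’d like to script this sort of region-targeted synthesis yourself, here’s a minimal sketch using the OpenAI Python SDK’s image-edit endpoint (which, as far as I know, is still backed by DALL•E 2 rather than 3; the filenames and prompt below are just placeholders):

```python
# Minimal mask-based inpainting sketch via the OpenAI Python SDK (v1.x).
# Assumes OPENAI_API_KEY is set in the environment; "photo.png" and
# "mask.png" are placeholder files. Per the API docs, the mask's fully
# transparent pixels mark the region to be regenerated.
from openai import OpenAI

client = OpenAI()

result = client.images.edit(
    model="dall-e-2",               # the edit endpoint uses DALL-E 2
    image=open("photo.png", "rb"),  # original square PNG
    mask=open("mask.png", "rb"),    # same dimensions; alpha=0 where edits go
    prompt="lush forest and granite cliffs filling the masked area",
    n=1,
    size="1024x1024",
)
print(result.data[0].url)           # URL of the edited image
```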
Hey, I know what you know (or quite possibly less :-)), but this demo (which for some reason includes Shaq) looks pretty cool:
From the description:
Elevate your data storytelling with #ProjectInfographIt, a game-changing solution leveraging Adobe Firefly generative AI. Simplify the infographic creation process by instantly generating design elements tailored to your key messages and data. With intuitive features for color palettes, chart types, graphics, and animations, effortlessly transform complex insights into visually stunning infographics.
Man, I can’t tell you how long I’ve wanted folks to get this tech into their hands, and I’m excited that you can finally take it for a spin. Here are some great examples (from a thread by Min Choi, which contains more) showing how people are putting it into action:
Reinterpreted kids’ drawings:
Adobe Firefly structure reference:
I created these images using my kid’s art as reference + text prompts like these:
– red aeroplane toy made with felt, appliqué stitch, clouds, blue background
– broken ship, flowing paint from a palette of yellow and green colors
Speaking of folks with whom I’ve somehow had the honor of working, some of my old teammates from Google have unveiled ObjectDrop. Check out this video & thread:
Google presents ObjectDrop
Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Diffusion models have revolutionized image editing but often generate images that violate physical laws, particularly the effects of objects on the scene, e.g., occlusions, shadows, and reflections. By analyzing the limitations of self-supervised approaches, we propose a practical solution centered on a counterfactual dataset.
Our method involves capturing a scene before and after removing a single object, while minimizing other changes. By fine-tuning a diffusion model on this dataset, we are able to not only remove objects but also their effects on the scene. However, we find that applying this approach for photorealistic object insertion requires an impractically large dataset. To tackle this challenge, we propose bootstrap supervision; leveraging our object removal model trained on a small counterfactual dataset, we synthetically expand this dataset considerably.
Our approach significantly outperforms prior methods in photorealistic object removal and insertion, particularly at modeling the effects of objects on the scene.
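To make the bootstrapping idea concrete, here’s a tiny conceptual sketch (my own illustration, not the authors’ code; every name in it is a hypothetical placeholder): a removal model trained on the small real counterfactual dataset is run over many ordinary photos, and the resulting (with, without) pairs then supervise the harder insertion direction.

```python
# Conceptual sketch of ObjectDrop-style bootstrap supervision.
# NOT the authors' code -- all names are hypothetical placeholders.
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

@dataclass
class CounterfactualPair:
    with_object: Any     # photo captured with the object present
    without_object: Any  # same scene after (physically or synthetically) removing it
    mask: Any            # region the object occupied

def bootstrap_insertion_data(
    removal_model: Callable[[Any, Any], Any],
    photos_with_masks: List[Tuple[Any, Any]],
) -> List[CounterfactualPair]:
    """Expand a small real counterfactual set: the trained removal model
    erases each object *and its shadows/reflections*, and each resulting
    pair can be used in reverse (without -> with) to train insertion."""
    pairs = []
    for photo, mask in photos_with_masks:
        erased = removal_model(photo, mask)
        pairs.append(CounterfactualPair(photo, erased, mask))
    return pairs
```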
“Why would you go work at Microsoft? What do they know or care about creative imaging…?” 🙂
I’m delighted to say that my new teammates have been busy working on some promising techniques for performing a range of image edits, from erasing to swapping, zooming, and more:
Microsoft presents DesignEdit!
It’s an image editing method that can remove objects, edit typography, swap, relocate, resize, add, and flip multiple objects, pan and zoom images, remove decorations from images, and edit posters. https://t.co/1DGNiNAFw1 pic.twitter.com/2N5n6MNkqf
I’ll bet that the myriad effects shown here—from Magritte-like negative space to buildings vibing to the beat—were far trickier to pull off than one might guess from their matter-of-fact presentation, and I love how simply & organically they come together:
I’m delighted to see that the longstanding #1 user request for Firefly—namely the ability to upload an image to guide the structure of a generated image—has now arrived:
Good morning! I’m excited to share with you a new tool on the Adobe Firefly website called Structure Reference. I spent the whole weekend creating art with it and find this new feature the most inspiring for my art.
This nicely complements the extremely popular style-matching capability we enabled back in October. You can check out details of how it works, as well as a look at the UI (below)—plus my first creation made using the new tech ;-).
Last year I posted about Imaginary Forces’ beautiful, eerie title sequence for Amazon’s Jack Ryan series, and now School of Motion has sat down for an in-depth discussion with creative director Karin Fong. They talk about a wide range of topics, including AI & its possible impacts, starting around the 1:09 mark.
Here’s a look behind the scenes of the Jack Ryan sequence:
Given just the latest news, the company’s name sounds ironic, but I love seeing them offer capabilities that we previewed in the Firefly teaser video now more than a year ago. (Here’s hoping Adobe announces some progress on that front at Adobe Summit this coming week.)
It’s amazing to see what two people (?!) are able to do. Check out this video & the linked thread, as well as the tool itself.
IT’S FINALLY HERE!
Magnific Style Transfer!
Transform any image, controlling the amount of style transferred and the structural integrity. Infinite use cases! 3D, video games, interior design, for fun…
I think the spirit of maximally inclusive “Irishness” has special resonance for millions of people around the world, like me, who can trace a portion (but not all) of their ancestry to the Emerald Isle. (For me it’s 75%, surname notwithstanding.) I’m reminded of Notre Dame’s “What Would You Fight For?” campaign, which features scientists, engineers, and humanitarians from around the world who conclude with “We are the Fighting Irish.” I dunno—it’s hard to explain, but it really warms my heart—as did the Irish & Chinese Railroad Workers float we saw in SF’s St. Paddy’s parade on Saturday.
Anyway, I found this bit starring & directed by Jason Momoa to be pretty charming. Enjoy:
Hey gang—I hope you’ve had a safe & festive St. Patrick’s Day. To mark the occasion, I figured I’d reshare a couple of the videos I captured in the old country with my dad back in August.
Here’s Co. Clare’s wild Burren (“rocky district,” hence the choice of Chieftains/Stones banger)…
I cannot tell you how deeply I hope that the Photoshop team is paying attention to developments like this…
My Photoshop is more fun than yours :-p With a bit of help from Krea AI.
It’s a crazy feeling to see brushstrokes transformed like this in realtime… And the feeling of control is magnitudes better than with text prompts. #ai #art pic.twitter.com/Rd8zSxGfqD
This made me chuckle & remember “Subpar Parks,” a visual celebration of the most dismissive reviews of our natural treasures. My wife & I have long decorated our workspaces with these unintentional gems, and I think you’ll dig the Insta feed & book (now complemented by “Subpar Planet”).
I love this idea, and I tried it. For Giedi Prime, the home world of the Harkonnens, there’s less information in the book, and it’s a world that is disconnected from nature. It’s a plastic world. So I thought that it could be interesting if the light, the sunlight, could give us some insight into their psyche. What if, instead of revealing colors, the sunlight was killing them and creating a very eerie black-and-white world? That would give us information about how these people perceive reality, about their political system, about that primitive, brutalist culture. And it was in the screenplay.
So, @StabilityAI has this new experimental imageTo3D model, and I just painted a moon buggy in SageBrush, dropped it into their Huggingface space, converted it in Reality Converter, and AirDropped it onto the moon – all on #AppleVisionPro pic.twitter.com/pj3TTcy5zt
Heh—these are obviously silly but well done, and they speak to the creative importance of being specific—i.e. representing particular famous faces. I sometimes note that a joke about a singer & a football player is one thing, whereas a joke about Taylor Swift & Travis Kelce is a whole other thing, all due to it being specific. Thus, for an AI toolmaker, knowing exactly where to draw the line (e.g. disallowing celebrity likenesses) isn’t always so clear.
It’s a great question, and I think it’s really thoughtful that the day before I joined, the company was generous enough to run a Superb Owl—er, Super Bowl—commercial, just to help me explain the mission to my parents. 😀
But seriously, this ad provides a brief peek into the world of how Copilot can already generate beautiful, interesting things based on your needs—and that’s a core part of the mission I’ve come here to tackle.
Founded by ex-Google Imagen engineers, Ideogram has just launched version 1.0 widely. It’s said to offer new levels of fidelity in the traditionally challenging domain of type rendering:
Introducing Ideogram 1.0: the most advanced text-to-image model, now available on https://t.co/Xtv2rRbQXI!
This offers state-of-the-art text rendering, unprecedented photorealism, exceptional prompt adherence, and a new feature called Magic Prompt to help with prompting. pic.twitter.com/VOjjulOAJU
Historically, AI-generated text within images has been inaccurate. Ideogram 1.0 addresses this with reliable text rendering capabilities, making it possible to effortlessly create personalized messages, memes, posters, T-shirt designs, birthday cards, logos and more. Our systematic evaluation shows that Ideogram 1.0 is the state-of-the-art in the accuracy of rendered text, reducing error rates by almost 2x compared to existing models.
So, it’s true: After nearly three great years back at Adobe, I’ve moved to just the third place I’ve worked since the Clinton Administration: Microsoft!
I’ve signed on with a great group of folks to bring generative imaging magic to as many people as possible, leveraging the power of DALL•E, ChatGPT, Copilot, and other emerging tech to help make fun, beautiful, meaningful things. And yes, they have a very good sense of humor about Clippy, so go ahead and get those jokes out now. :->
It really is a small world: The beautiful new campus (see below) is just two blocks from my old Google office (where I reported to the same VP who’s now in charge of my new group), which itself is just down the road from the original Adobe HQ; see map. (Maybe I should get out more!)
And it’s a small world in a much more meaningful sense: I remain in a very rare & fortunate spot, getting to help guide brilliant engineers’ efforts in service of human creativity, all during what feels like one of the most significant inflection points in decades. I’m filled with gratitude, curiosity, and a strong sense of responsibility to make the most of this moment.
Thank you to my amazing Adobe colleagues for your hard & inspiring work, and especially for the chance to build Firefly over the last year. It’s just getting started, and there’s so much we can do together.
Thank you to my new team for opening this door for us. And thank you to the friends & colleagues reading these words. I’ll continue to rely on your thoughtful, passionate perspectives as we navigate these opportunities together.
My friend Nathan Shipley has been deeply exploring AnimateDiff for the last several months, and he’s just collaborated with the always entertaining Karen X. Cheng to make this little papercraft-styled video:
While we’re all waiting for access to Sora…
Here’s our test using open source tools. You can get a decent level of creative control with AnimateDiff
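If you want to try AnimateDiff yourself, here’s a minimal sketch using Hugging Face’s diffusers library (the adapter and checkpoint IDs below are the ones its docs commonly use; swap in your own, and treat the prompt as a placeholder):

```python
import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif

# Load the AnimateDiff motion adapter plus a Stable Diffusion 1.5 checkpoint.
adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16
)
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(
    model_id, motion_adapter=adapter, torch_dtype=torch.float16
)
pipe.scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", beta_schedule="linear", clip_sample=False
)
pipe.enable_model_cpu_offload()  # keeps VRAM use modest

# Generate a short 16-frame clip and save it as a GIF.
output = pipe(
    prompt="a papercraft city street, stop-motion style, camera slowly panning",
    negative_prompt="low quality, blurry",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
export_to_gif(output.frames[0], "animatediff_test.gif")
```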
I’m a day late saying it here, but happy birthday to three technologies that changed my life (all our lives, maybe), and to which I’ll be forever grateful to have gotten to contribute. As Jeff Schewe noted:
Happy Birthday Digital Imaging…aka Photoshop, Camera Raw & Lightroom. Photoshop shipped February 19th, 1990. Camera Raw shipped February 19th, 2003 and Lightroom shipped February 19th, 2007. Coincidence? Hum, I wonder…but ya never know when Thomas Knoll is involved…
Check out Jeff’s excellent overview, written for Photoshop’s 30th, as well as his demo of PS 1.0 (which “cost a paltry $895 and could run on home computers like the Macintosh IIfx for under $10,000”—i.e. ~$2,000 & $24,000 today!).
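In case you want to check that math, here’s the back-of-the-envelope version (the CPI multiplier from 1990 to today is an approximation of roughly 2.4; exact values vary by source):

```python
# Quick sanity check on the inflation-adjusted prices above.
# The 1990 -> today CPI multiplier of ~2.4 is an assumption.
CPI_MULTIPLIER_1990_TO_NOW = 2.4

for label, price_1990 in [("Photoshop 1.0", 895), ("Macintosh IIfx", 10_000)]:
    today = price_1990 * CPI_MULTIPLIER_1990_TO_NOW
    print(f"{label}: ${price_1990:,} in 1990 \u2248 ${today:,.0f} today")
# Photoshop 1.0: $895 in 1990 ≈ $2,148 today
# Macintosh IIfx: $10,000 in 1990 ≈ $24,000 today
```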
Just in case you’ll be around San Jose this Friday, check out this panel discussion featuring our old Photoshop designer Julie Meridian & other artists discussing their relationship with AI:
Panel discussion: Friday, February 23rd 7pm–9pm. Free admission
Featuring Artists: Julie Meridian, James Morgan, and Steve Cooley
Moderator: Cherri Lakey
KALEID Gallery is proud to host this panel with three talented artists who are using various AI tools in their artistic practice while navigating all the ethical and creative dilemmas that arise with it. With all the controversy around AI collaborative / generated art, we’re looking forward to hearing from these avant-garde artists that are exploring the possibilities of a positive outcome for artists and creatives in this as-of-yet undefined new territory.
Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W
I really enjoyed getting some behind-the-scenes info on The Cinematography of Oppenheimer, courtesy of director of photography Hoyte Van Hoytema & Dolby’s Sound + Image Lab podcast. It’s packed with interesting details (e.g. hacking loud & bulky IMAX hardware, dealing with new film emulsions, the impact on color perception of cutting from color to B&W, etc.). I think you’ll dig it.
“The company says watermarks from C2PA will appear in images generated on the ChatGPT website and the API for the DALL-E 3 model. Mobile users will get the watermarks by February 12th. They’ll include both an invisible metadata component and a visible CR symbol, which will appear in the top left corner of each image.”
“Meta will employ various techniques to differentiate AI-generated images from other images. These include visible markers, invisible watermarks, and metadata embedded in the image files… Additionally, Meta is implementing new policies requiring users to disclose when media is generated by artificial intelligence, with consequences for failing to comply.”
“When you create a design in Designer you can also decide if you’d like to include basic, trustworthy facts about the origin of the design or the digital content you’ve used in the design with the file.”
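If you’re curious what these provenance signals actually look like on disk, here’s a minimal sketch for dumping an image’s C2PA manifest. It assumes you’ve installed the open-source c2patool CLI from the Content Authenticity Initiative (invoked with just a file path, it prints the manifest as JSON; the filename below is a placeholder):

```python
# Inspect an image's C2PA Content Credentials by shelling out to c2patool.
# Assumes c2patool is installed and on your PATH; "generated_image.png"
# is a placeholder filename.
import json
import subprocess

result = subprocess.run(
    ["c2patool", "generated_image.png"],
    capture_output=True,
    text=True,
)
if result.returncode == 0:
    manifest = json.loads(result.stdout)  # the manifest store as JSON
    print(json.dumps(manifest, indent=2))
else:
    print("No C2PA manifest found (or c2patool reported an error).")
```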
Not having a spare $3500 burning a hole in my pocket, I’ve yet to take this for a spin myself, but I’m happy to see it. Per the Verge:
The interface of the Firefly visionOS app should be familiar to anyone who’s already used the web-based version of the tool — users just need to enter a text description within the prompt box at the bottom and hit “generate.” This will then spit out four different images that can be dragged out of the main app window and placed around the home like virtual posters or prints. […]
Meanwhile, we also now have a better look at the native Adobe Lightroom photo editing app that was mentioned back when the Apple Vision Pro was announced last June. The visionOS Lightroom experience is similar to that of the iPad version, with a cleaner, simplified interface that should be easier to navigate with hand gestures than the more feature-laden desktop software.
I’m delighted to say that firefly.adobe.com now supports a live stream of community-created generative recipes. You can share your own simply by creating images via the Text to Image module, then clicking the share button. I’m especially pleased that if you use Generative Match to choose a stylization guide image, that image will be included in the recipe for anyone to use.
Heh—this fun little animation makes me think back to how I considered changing my three-word Google bio from “Teaching Google Photoshop” (i.e. getting robots to see & create like humans, making beautiful things based on your life & interests) to “Wow! Nobody Cares.” :-p Here’s to less of that in 2024.
I’ve been thinking of this trenchant observation a lot lately.
Loving this creative by @Ogilvy. “The more you connect, the less you connect with others,” the important campaign created by @Ogilvy (Beijing) for the Center for Psychological Research in China. pic.twitter.com/y2HVhXf12F
I had a chance to sit down for an interesting & wide-ranging chat with folks from the Wharton Tech Club:
Tune into the latest episode of the Wharton Tech Toks podcast! Leon Zhang and Stephanie Kim chat with John Nack, Principal Product Manager at Adobe with 20+ years of PM experience across Adobe and Google, about GenAI for creators, AI ethics, and more. He also reflects on his career journey. This episode is great if you’re recruiting for tech, PM, or Adobe.