
Machines and Society

A growing guide on the latest in data-driven research and emerging technologies at the intersection of society, information and technology.

Introduction

Generative AI is rapidly transforming the landscape of visual media, design, and art. Because tools and capabilities evolve quickly, this guide offers a snapshot of a constantly changing field: it introduces key concepts, categories of tools, common applications, access methods, and important considerations for using generative AI in visual creation.

Page last updated: June 2025

Applications: How Generative AI is Used in Visual Creation

Generative AI is becoming an increasingly integral part of artists' and designers' daily creative workflows. While the possibilities are constantly expanding, this section highlights some of the most common and practical applications across various fields, from art and storytelling to branding, architecture, and beyond. These examples illustrate new ways to ideate, prototype, and communicate visually; keep in mind that they are just a few of the many creative use cases emerging in this rapidly evolving space.

  • Art and illustration: Digital paintings and illustrations, Concept art for games and films, Abstract and experimental art
  • Content creation and storytelling: Illustrations for picture books and comics, Storyboards for film, animation, and advertising
  • Design and branding: User Interface (UI) elements and mockups, Logo design and brand identity exploration, Custom typography, fonts, and stylized text/titles, Website and app design mockups
  • Marketing and advertising: Promotional graphics and social media content, Advertising visuals and campaign materials
  • Product and industrial design: Product concept renderings and visualizations, Packaging design mockups, Design specifications and visual documentation
  • Architecture and environmental design: Architectural renderings (exterior and interior), Environmental and landscape visualizations, Urban planning mockups
  • Media and entertainment: Short-form video clips and animations, Visual effects (VFX) prototyping, Backgrounds and assets for games and virtual environments
  • Productivity and communication: Slides and presentations (e.g., PowerPoint, Google Slides), Mind maps and visual organizers, Data visualization concepts

Key Generative AI Tool Categories and Platforms

Note: This field is rapidly changing. The tools listed here serve as examples and are not exhaustive. Users should always check the terms of service for any tool they use.

A. Pixel-Based (Raster) Image Generation
Overview: Creating or modifying images composed of pixels, typically from text prompts (text-to-image) or based on existing images (image-to-image).

Key Features and Techniques:

  • Text-to-image: Generating images from textual descriptions.
  • Image-to-image: Transforming an input image based on a text prompt or style.
  • Inpainting and outpainting: Modifying specific parts of an image or extending its boundaries.
  • Control mechanisms: Using tools like ControlNet to guide generation with inputs like pose, depth maps, sketches, etc.
  • Style/Concept adaptation: Applying specific styles or concepts to images using techniques like IPAdapter.
  • Fine-tuning and custom models: Training models on specific datasets or using pre-trained specialized models (e.g., LoRA - Low-Rank Adaptation).

Example Platforms and Models:

  • Midjourney: Known for high-quality artistic outputs.
  • Stable Diffusion: Open-source model with many user interfaces (e.g., Automatic1111, ComfyUI, Fooocus) allowing for local deployment and extensive customization.
  • DALL·E: Integrated into ChatGPT Plus and available via API, known for strong prompt adherence.
  • Google Imagen (via Vertex AI or experimental tools like ImageFX): Google's family of text-to-image models.
  • Flux: Open-weight text-to-image model with high-quality output and efficient performance, well suited to research and creative workflows. Note the license split on some variants: generated outputs may be used commercially, but the model weights themselves are licensed for non-commercial use only (check the current license).
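
To make the text-to-image workflow concrete, here is a minimal Python sketch using Stable Diffusion through Hugging Face's diffusers library. This is an illustration under stated assumptions: the model identifier is one historical example checkpoint, and an Nvidia GPU with CUDA is assumed; check the model card for current names and license terms.

import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained text-to-image pipeline (model id is illustrative;
# substitute any checkpoint you are licensed to use).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes an Nvidia GPU; use "cpu" otherwise (slow)

# Text-to-image: generate an image from a textual description.
image = pipe("a watercolor illustration of a lighthouse at dawn").images[0]
image.save("lighthouse.png")

The techniques listed above map onto analogous pipeline classes and methods in the same library (for example, inpainting pipelines, or pipe.load_lora_weights for LoRA adapters).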

B. Multimodal AI Platforms with Visual Capabilities
Overview: AI systems that understand and generate information across multiple modalities (text, images, audio, code). They often provide conversational interfaces for performing visual tasks.

Example Platforms:

  • OpenAI's ChatGPT (with GPT-4o/GPT-4V): Analyzing images, generating images via DALL·E 3, and writing code for visual tasks (e.g., using Code Interpreter to create charts or SVGs).
  • Google Gemini: Processing and understanding text, images, audio, and video, and generating images.
  • ByteDance Doubao AI / Cici: Multimodal assistant with image generation capabilities (availability and branding vary by region).
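
As an illustration of the conversational, multimodal pattern, the sketch below sends an image to a vision-capable model through OpenAI's official Python SDK. The model name, prompt, and image URL are placeholders for illustration; an API key is assumed to be set in the environment.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask a vision-capable model to analyze an image (URL is a placeholder).
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the composition of this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)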

C. Vector Graphics Generation
Overview: Creating scalable vector graphics (SVGs, etc.) that are resolution-independent, suitable for logos, icons, and illustrations.

Example Platforms:

  • Adobe Firefly: Integrated into Adobe Illustrator for text-to-vector graphics and other vector manipulation features.
  • Recraft: A platform focused on AI-generated vector art, icons, and 3D illustrations.
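
Because vector formats such as SVG are structured text, they can also be generated programmatically, which is part of why AI handles them well. Below is a minimal sketch using the svgwrite Python library (the library choice and the icon design are illustrative assumptions; raw SVG strings work just as well).

import svgwrite

# Build a simple, resolution-independent icon: a circle on a rounded badge.
dwg = svgwrite.Drawing("icon.svg", size=("100px", "100px"))
dwg.add(dwg.rect(insert=(0, 0), size=(100, 100), rx=16, fill="#2b6cb0"))
dwg.add(dwg.circle(center=(50, 50), r=28, fill="white"))
dwg.save()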

D. Video Generation
Overview: Generating video clips from text prompts, images, or existing video footage.

Example Platforms and Models:

  • Google Veo: Google's advanced text-to-video model.
  • Runway (Gen-1, Gen-2, Gen-3): Platform offering a suite of AI Magic Tools, including text-to-video and video-to-video.
  • Wan: Open-source suite of advanced video generative models from Alibaba Cloud. Wan2.1 supports tasks like text-to-video, video editing, and video-to-audio.
  • Kuaishou Keling: Text-to-video model from Kuaishou Technology.
  • HeyGen / Synthesia: Platforms focused on AI avatar video generation (useful for presentations, training).
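
Many video models are also reachable through hosted inference APIs rather than a web interface. As a hedged sketch of that pattern, the Replicate Python client can run a hosted model roughly as follows; the model identifier below is a placeholder, not a real slug, so consult the provider's catalog for current video models.

import replicate  # assumes REPLICATE_API_TOKEN is set in the environment

# The model slug is a placeholder for illustration only.
output = replicate.run(
    "some-lab/some-text-to-video-model",
    input={"prompt": "a paper boat drifting down a rainy street, cinematic"},
)
print(output)  # typically a URL (or list of URLs) to the rendered clip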

E. 3D Model Generation
Overview: Creating three-dimensional models from text prompts or 2D images.

Example Platforms and Models:


F. Code Generation for Visuals
Overview: Using AI to generate code for web design (HTML, CSS, JavaScript), data visualizations (e.g., D3.js, Python libraries), generative art (e.g., Processing, p5.js), shaders, and other visual applications.

Example Platforms (General LLMs capable of this):

  • ChatGPT (Code Interpreter / GPT-4 models)
  • Claude (Claude 3 models)
  • GitHub Copilot: AI pair programmer, useful for all coding tasks including visual ones.
  • DeepSeek Coder: A specialized family of code generation models.
  • Cursor: An AI-powered code editor offering code generation, smart rewrites, and agent-assisted task completion.
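
The output of these tools is ordinary, runnable code. For instance, asking a general LLM for a quick data-visualization script might yield something like the matplotlib sketch below (the data values are invented for illustration):

import matplotlib.pyplot as plt

# Illustrative data; in practice the LLM would work from your actual numbers.
months = ["Jan", "Feb", "Mar", "Apr"]
visitors = [1200, 1850, 1600, 2300]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(months, visitors, color="#4c72b0")
ax.set_title("Monthly Website Visitors")
ax.set_ylabel("Visitors")
fig.tight_layout()
fig.savefig("visitors.png", dpi=150)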

Applications that can invoke LLMs through code:


G. Presentation and Document Design Assistance
Overview: AI tools that assist in generating, structuring, and designing presentations, reports, and other visual documents.

Example Platforms:

  • Gamma: AI-powered presentation and document creator.
  • Microsoft Copilot for Microsoft 365: AI assistance within PowerPoint, Word, etc.
  • Kimi: A large language model platform that includes a PPT assistant to help you create PowerPoint presentations.

H. AI-Powered Design Platforms and "Agents"
Overview: Integrated design platforms that leverage AI across various workflows, sometimes acting as "agents" to automate or assist in complex visual tasks from start to finish.

Example Platforms:

  • Canva: Offers numerous AI features for design, image editing, video, and presentations.
  • Adobe Firefly (as a model suite): Integrated across Adobe Creative Cloud apps (Photoshop, Illustrator, Express) to provide generative capabilities and assist in workflows.

Accessing Generative AI Tools for Visual Creation

Generative AI tools for visual creation can be accessed in several ways, each with its own advantages, cost implications, and technical requirements. Understanding these options can help you choose the best approach for your specific needs, whether you're a beginner looking for ease of use or an expert needing deep customization. Key factors to consider when selecting a tool include its primary function (e.g., image, video, 3D), ease of use, cost, available features, desired output quality, and the level of control you require over the generation process.


A. Online Platforms and Web-Based Services
Description: Ready-to-use tools accessible via a web browser, often subscription-based (SaaS) or offering freemium tiers. This is the most common and user-friendly entry point.

Examples: Midjourney (via Discord), ChatGPT (with DALL·E 3), Canva, Adobe Firefly website, Recraft, Runway.


B. Local Deployment
Description: Running AI models directly on your own computer hardware. This offers maximum control, customization, and privacy, but also requires more technical expertise and powerful hardware.

Examples: Stable Diffusion (using interfaces like Automatic1111 WebUI, ComfyUI, Fooocus), open-source models from Hugging Face.

Stable Diffusion DIY:

Configuring Stable Diffusion on your own device is a fairly technical undertaking. At the very least, you need to be comfortable working with command-line tools.

Additionally, you need a computer with a dedicated Nvidia GPU to set up Stable Diffusion this way. Support for Apple silicon (M-series) devices and AMD GPUs is currently less widespread, but the resources below can serve as a starting point for finding tools that do support your hardware.

You also need to install Docker, which is available for free at docker.com.

The Stable Diffusion web UI can be installed from the AbdBarho/stable-diffusion-webui-docker GitHub repository; follow its setup guide and you are set.

The commands below are for your reference.

# Clone the Dockerized Stable Diffusion web UI repository
git clone https://github.com/AbdBarho/stable-diffusion-webui-docker
cd stable-diffusion-webui-docker
# One-time step: download the model weights
docker compose --profile download up --build
# Launch the web UI using the AUTOMATIC1111 ("auto") profile
docker compose --profile auto up --build

The auto profile (the AUTOMATIC1111 web UI) is the most feature-rich option and the suggested default.


C. APIs (Application Programming Interfaces)
Description: For developers and businesses to integrate generative AI capabilities into their own custom applications, websites, or workflows.

Examples: OpenAI API (DALL·E, GPT-4V), Stability AI API, Google Cloud Vertex AI (Gemini, Imagen), Anthropic API (Claude).
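
As a minimal example of the API route, the sketch below requests an image from OpenAI's image-generation endpoint via the official Python SDK. The model name, prompt, and size are illustrative, and an API key is assumed to be set in the environment.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Request one image from a text prompt (model name is illustrative).
result = client.images.generate(
    model="dall-e-3",
    prompt="flat-design logo of a paper crane, minimal, two colors",
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # URL of the generated image

The same pattern applies to other providers: authenticate, send a prompt plus parameters, and receive image data or a URL in response.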


D. Plugins and Software Integrations
Description: Generative AI features embedded within existing software applications (e.g., design tools, productivity suites, browsers), extending their functionality.

Examples: Adobe Photoshop (Generative Fill), Microsoft 365 Copilot, Figma plugins, VS Code, etc.

Ethical Considerations, Copyright, and Responsible Use

A. AI Art and Copyright Infringement
Awareness of copyright: Depending on a tool's terms, you are likely not the exclusive owner of the images you create with AI generators, and there can be legal risk if the AI was trained on copyrighted material without permission.

Varying terms: Different AI generators have different rules for commercial use, ownership, and licensing. AI art generation is an emerging field; it’s crucial to read the terms and conditions before using or distributing any artwork created through AI.

Example with Midjourney: Under Midjourney's copyright terms (as of past review; always check current terms), non-paying users are often granted an asset license under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license, while paid subscribers may have different, often more permissive, commercial rights.

Responsible AI and content verification: Tools like SynthID support ethical AI use by embedding invisible watermarks in generated images, making it easier to verify and track AI-created content without altering its appearance.


B. Concerns Regarding AI-Generated Images and Content
Inappropriate content: Like human-created imagery, AI-generated content can be inappropriate or harmful. AI can amplify the problem through its speed of production and its ability to mimic specific people or styles.

Typical platform rules addressing this (based on community guidelines like Midjourney's):

  • Be kind and respect each other
  • SFW content only
  • Be thoughtful about how you share your creations
  • Unauthorized automation and third party apps are not allowed

Artist protests and originality: Concerns from artists about AI diminishing the value of traditional art forms, originality, and human creativity. Debates around AI models being trained on artists' work without consent or compensation.

Bias perpetuation: AI models can learn and perpetuate harmful stereotypes or biases present in their training data.


C. How to Protect Your Data
Data privacy: Be cautious when uploading personal photos, proprietary artwork, or other sensitive visual data to online AI tools. Understand how the service might use your uploaded data.

Limiting exposure: The most secure way to protect data is not to upload it to public or untrusted services.

Protective technologies: Researchers are working on methods to protect visual data from unauthorized AI training. For instance, tools like Fawkes or Glaze make tiny, often human-imperceptible pixel-level changes to images to "cloak" them, confusing AI models trying to learn artistic styles or identify individuals.

Reflecting on Visual Creation Approaches and Workflows

A visual creation project usually follows these basic steps:

  1. Starting point: Define the core idea, purpose and constraints
  2. Brainstorming: Ideation and conceptual exploration
  3. Workflow selection/design: Aligning tools, methods, and teams
  4. Execution: Production and refinement

The rise of AI is reshaping how visual creation works. Traditionally, a one-to-one correspondence prevailed between a creative task and its supporting tool (for example, a three-dimensional rendering was produced exclusively with 3D modeling software). AI disrupts this clear mapping of tasks to tools and predetermined workflows. Moreover, the velocity at which new AI functionality emerges has shortened the iteration cycle for workflow innovation from months or years to days, demanding continuous recalibration of working methods.


Ways to Integrate AI
Selective augmentation of conventional workflows
Under this approach, the conventional visual creation pipeline remains intact, and AI tools are introduced opportunistically to expedite or enhance discrete stages of the process. For instance, if an AI-based image-generation model demonstrably reduces the time required for preliminary concept sketches, designers may opt to incorporate it solely for that segment of the workflow (e.g., brainstorming or prototyping).

AI-centric workflow design
In contrast, a more radical approach involves reconceiving the entire project workflow around AI capabilities from the outset. This may entail identifying tasks that AI currently performs with superior efficiency—such as rapid style exploration or automated layout generation—and structuring the sequence of operations so that these AI-driven tasks become foundational. Conversely, components that rely on uniquely human faculties (e.g., aesthetic judgment, cultural contextualization, or highly nuanced brand messaging) are delineated explicitly as manual interventions.

End-to-end AI-driven workflows
A third extreme envisions relegating the entire production pipeline to AI—potentially requiring minimal human oversight beyond specifying high-level objectives. In this mode, prompts, iterative refinements, and final quality checks are all managed through AI agents capable of autonomously executing discrete subtasks (e.g., generating iteration variants, evaluating visual coherence, or adjusting technical specifications).
Aside from the starting point, AI can potentially handle most stages of creation. The challenge is deciding which approach fits each project.


Choosing an Approach: Quality, Time, Resources
When selecting an approach, consider three factors:

Quality

  • Accuracy: Does the output make basic sense? e.g., correct anatomy: a hand should have five fingers, not six.
  • Technical specs: Are the resolution and format appropriate? e.g., resolution high enough for a high-definition display.
  • Compliance: Does it meet the project’s requirements? e.g., correct product images in an advertisement.
  • Aesthetics: Is it visually appealing or professionally designed?
  • Creativity and originality: Is there innovation in the idea or style? While AI can recombine existing ideas in creative ways, it still relies heavily on its training data, and generating entirely new styles or truly original concepts remains largely out of reach.

Time

  • Production time: How long does AI take to produce and refine images?
  • Learning curve: How long does it take the team to explore, learn and test new AI tools and techniques?

Resources

  • Cost: Subscription fees, licensing, and hardware (e.g., GPUs).
  • Human effort: Time spent by designers on AI tasks versus higher-level creative work.

Often, AI saves time on routine tasks but may still require manual edits. If AI cannot meet a project’s specific needs, human-led work may be more efficient and cost-effective.


Balancing Trade-offs and Looking Ahead
From a business standpoint, the ideal workflow maximizes quality for the lowest cost. Currently, many specialized demands still favor human effort—especially when originality matters most. In contrast, tasks like initial mockups or basic background generation are well suited to AI.

In practice, most teams adopt a hybrid approach, letting AI handle repetitive or volume-driven tasks while humans focus on creative judgment and final refinements. Over time, these hybrid workflows may become standard, particularly in areas where AI matches or exceeds human speed and consistency.

In the long term, AI is expected to permeate every phase of the visual creation process. As this integration deepens, new demands (such as authentic creativity, handcrafted techniques, and genuine originality) are likely to emerge. Time will tell.

Contact

Xinyi Zhu
Motion Graphic Designer/Animator
xz3366@nyu.edu