How machine learning powers generative art NFTs

Artificial intelligence (AI) is playing a growing role in the non-fungible token (NFT) industry. Generative art, meaning art produced by an autonomous system, has quickly become one of the most significant NFT market categories, inspiring creative projects and striking collections. NFTs have emerged as one of the primary channels for accessing AI-powered art, from the work of AI art legends, such as Tyler Hobbs’ new QQL project, to the creations of Refik Anadol and Sofia Crespo.

Generative art has long been one of the classic uses of machine learning, but it has only recently gained widespread attention. The advancement has been driven mainly by computational advances and by new methods that allow models to learn without large amounts of expensive, scarce labelled data. Because experimenting with new techniques takes time, many of them have yet to be widely adopted by well-known artists. However, the gap between the generative art community and AI research has been closing in recent years.

The generative art catalyzers

Even many early AI pioneers, who considered generative AI a niche branch of machine learning, were taken aback by its rise. There are three primary reasons for the tremendous advancement in generative AI:

Multimodal AI: Over the past five years there has been a massive increase in AI techniques that can work across domains, including language, image, video, and sound. This made possible models like DALL-E and Stable Diffusion, which produce images from natural language.
Pretrained language models: Alongside the emergence of multimodal AI, language models such as GPT-3 have made incredible progress. This has made it possible to use language as the input mechanism for creating artistic outputs like images, sounds, or videos. By lowering the barrier for people to interact with generative AI models, language has played a crucial role in this new phase of generative AI.
Diffusion methods: Diffusion models underlie most of the photo-realistic AI-generated art we see today. They are replacing techniques like generative adversarial networks (GANs) and variational auto-encoders (VAEs), which have problems scaling and suffer from a lack of diversity in the outputs they produce. Diffusion models work around these restrictions by gradually adding noise to training images until they are destroyed, then learning to reconstruct the originals. If a model can rebuild an image from data that is, in theory, pure noise, it should be able to do so from virtually any representation, including representations from other domains like language. Unsurprisingly, text-to-image generation approaches like DALL-E 2 and Stable Diffusion are built on diffusion algorithms.
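
The destroy-then-reconstruct idea behind diffusion models can be illustrated with a minimal sketch. It assumes a linear noise schedule with hypothetical values (1,000 steps, noise levels from 1e-4 to 0.02) and uses a random array as a stand-in for a training image; none of these details come from the models named above. In a real system a trained neural network would estimate the noise, whereas here the true noise is passed in simply to show that the corruption can be undone.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal kept after each step (assumed linear schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Forward process: jump straight to step t by mixing the image with Gaussian noise."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise

def recover_x0(xt, t, alpha_bar, predicted_noise):
    """Invert add_noise given a noise estimate (a trained model would supply it)."""
    return (xt - np.sqrt(1.0 - alpha_bar[t]) * predicted_noise) / np.sqrt(alpha_bar[t])

rng = np.random.default_rng(0)
image = rng.uniform(-1.0, 1.0, size=(8, 8))   # stand-in for a training image
alpha_bar = make_alpha_bar()

# At the last step almost none of the original image survives.
noisy, true_noise = add_noise(image, len(alpha_bar) - 1, alpha_bar, rng)

# With a perfect noise estimate the original image is recovered.
restored = recover_x0(noisy, len(alpha_bar) - 1, alpha_bar, true_noise)
```

Training a diffusion model amounts to teaching a network to predict the noise from the noisy image (and, for text-to-image, from a text representation), so that it can play the role the true noise plays in this sketch.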

The influence of these techniques on generative art coincides with the emergence of NFTs, which have unlocked significant capabilities for digital art, such as digital ownership, programmable incentives, and more egalitarian distribution models.

The methods powering generative art in NFTs

Text-to-image: Text-to-image (TTI) synthesis has been the NFT community’s favourite use of generative AI. Some AI models developed in the TTI field are now tangibly influencing popular culture. OpenAI’s DALL-E is the most well-known application of TTI for producing aesthetic visuals. GLIDE, another TTI model developed by OpenAI, has been widely used in generative art applications. Google has been experimenting with several approaches in the field, including Imagen, which is based on diffusion models, and Parti, which is based on an alternative method called autoregressive modelling.
Text-to-video: Text-to-video (TTV) is a more difficult area of generative art, but significant progress is being made. Meta and Google recently published TTV models, Make-A-Video and Imagen Video respectively, that can produce high-fidelity video clips from natural language.
Image-to-image: Although text-based image synthesis feels almost natural, it has limitations when capturing characteristics like distances between items, orientation, or highly detailed landscapes. Drawings or other visuals communicate this information better. The leading diffusion models, including DALL-E, Stable Diffusion, and Imagen, include processes for producing images from sketches.
Music generation: Automatic music production is another frequent application of generative AI that has gained popularity in recent years. OpenAI has been in the vanguard of this change with models like MuseNet and, more notably, Jukebox, which can produce music in various styles and genres. Google has entered the field with AudioLM, a model that generates lifelike speech and piano music after listening to only short sound fragments. Harmonai, funded by Stability AI, began pushing the limits of AI music generation with the release of Dance Diffusion, a collection of algorithms and tools that can produce creative music clips.
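
The autoregressive approach mentioned above as the basis of Parti generates an output one token at a time, with each token sampled from a distribution conditioned on everything produced so far. The following is a toy sketch of that loop only: `toy_model` is a hypothetical stand-in for a trained network, and the four-token vocabulary is an arbitrary assumption rather than anything taken from a real model.

```python
import numpy as np

def toy_model(tokens, vocab_size=4):
    """Hypothetical next-token distribution: favours repeating the last token."""
    probs = np.full(vocab_size, 0.1)
    probs[tokens[-1]] = 0.7
    return probs

def sample_autoregressive(next_token_probs, start_token, length, rng):
    """Build a sequence token by token, conditioning each draw on the prefix so far."""
    tokens = [start_token]
    for _ in range(length - 1):
        probs = next_token_probs(tokens)
        tokens.append(int(rng.choice(len(probs), p=probs)))
    return tokens

rng = np.random.default_rng(0)
sequence = sample_autoregressive(toy_model, start_token=2, length=16, rng=rng)
```

In a real system the tokens would encode image patches or audio, and a separate decoder would turn the finished sequence back into pixels or sound.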

An enviable match: NFTs and generative art

Throughout the history of technology, relatively unrelated movements have often reinforced one another and captured enormous markets together. The social-mobile-cloud revolution is the most recent example: each of these trends expanded the market for the other two. NFTs and generative AI are beginning to display a similar dynamic. Both movements have succeeded in bringing a sophisticated technology market into popular culture. NFTs complement generative AI by providing digital ownership and distribution models that would be nearly impossible to implement otherwise. Similarly, generative AI is expected to become one of the major engines for the production of NFTs.