Friday, April 25, 2025

Part 3: Images -- The How, What, and Where

We've already talked about defining your concept and seeing what your LoRA can do, so the next step is to start building a dataset. And to do that, you need images. Training a Flux LoRA requires carefully curated images to ensure the best results. I've tried more than once to just toss together images I liked and hope for the best, but it very rarely works out. Unfortunately, if you want to make a quality LoRA, you're going to have to dedicate some time to the process.

How Many Images Do You Need?

Flux seems to train best with a low number of images, though the right number depends on their quality, their diversity, and what you are training. I typically recommend 20 to 30 images, but it's quite common to train on as few as 10 or as many as 50. I've even gotten away with quite a few one-image LoRAs and have made some very successful models with only 3 or 4 images -- but they tend to be very specialized and are prone to overfitting. Of course, it's always better to have too many (and whittle it down) than not enough, so I always start toward the top end of the range and then discard images as I comb through them. So if this is what you came here for, go out and find about 50 images and expect to curate that down to 20-25 for your final dataset. I'll include a few more details about numbers in the next sections.
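If you like to sanity-check things with a script, the ranges above can be turned into a tiny helper. This is only a sketch of my rules of thumb, not anything Flux or a trainer enforces. The extension list, the function names, and the cutoffs between the ranges are my own assumptions for illustration.

```python
from pathlib import Path

# Extensions treated as images; an assumption, adjust to your own files.
IMAGE_EXTENSIONS = {".png", ".tif", ".tiff", ".jpg", ".jpeg", ".webp"}


def count_images(folder):
    """Count files directly inside `folder` whose extension looks like an image."""
    return sum(
        1
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


def size_advice(count):
    """Map an image count onto the rough ranges discussed above."""
    if count < 10:
        return "very small: expect a specialized LoRA that is prone to overfitting"
    if count < 20:
        return "workable: below the usual 20 to 30, so every image has to be strong"
    if count <= 30:
        return "typical: inside the 20 to 30 range"
    if count <= 50:
        return "upper range: fine as is, or a good pool to curate down to 20-25"
    return "large pool: be strict when discarding"
```

Calling `size_advice(count_images("candidates"))` on a folder of candidates (the folder name is hypothetical) gives a quick read on whether you should keep collecting or start trimming.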

Selection of Images

In general, the selection of images needs to follow a somewhat oxymoronic concept – keep consistency in what you want to train and diversity in everything else. Here are a few things that are often pointed out for three LoRA types and how you'd want to apply them to Flux LoRA training.

Style LoRAs

  • Capture the Essence of the Style: Select images that highlight the distinctive features of the style you want to emulate (e.g., brush strokes, color palettes, composition). No matter how much you like an image, don’t include it unless it captures the style you are trying to train. One bad image can make a difference (not in a good way).
  • Varied Subjects, Consistent Style: Use images with different subjects but the same artistic style to teach the model the style independent of the subject matter. Avoid a dataset where certain elements (like a recurring object) dominate, which can lead the model to associate the style with that object. The fewer elements (other than the style) the images have in common, the better.
  • Compromise Quantity Before Style: It's generally thought that Flux does well training style LoRAs with 20 to 30 images. If the style is rare, you might have to work with fewer images. In such cases, focus on the most representative ones and leave out stuff on the fringe. If the images follow the first two points well, there’s a good chance you can still do well with 10 or fewer images.
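One way to check the "varied subjects" point is to jot down a few tags per image by hand and count how often each one shows up. The sketch below does that tally. The tags, the file names, and the 40% threshold are all made up for illustration; the threshold in particular is an arbitrary number, not a Flux rule.

```python
from collections import Counter


def dominant_tags(image_tags, max_share=0.4):
    """Return tags that appear in more than `max_share` of the images.

    `image_tags` maps a file name to the set of tags assigned by hand.
    `max_share` is an arbitrary illustrative threshold.
    """
    total = len(image_tags)
    if total == 0:
        return {}
    # Count each tag at most once per image.
    counts = Counter(tag for tags in image_tags.values() for tag in set(tags))
    return {
        tag: count / total
        for tag, count in counts.items()
        if count / total > max_share
    }


# Hypothetical hand-tagged style dataset.
tags = {
    "01.png": {"cat", "garden"},
    "02.png": {"cat", "street"},
    "03.png": {"cat", "portrait"},
    "04.png": {"boat", "harbor"},
}
print(dominant_tags(tags))  # {'cat': 0.75}
```

Here "cat" appears in three of four images, so the model might start to associate the style with cats. Swapping out a couple of those images would be the fix.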

Character LoRAs

  • Consistency in Appearance: Use images where the character's iconic features (face, hair, clothing) are consistent to help the model accurately learn the character and their standard appearance.
  • Avoid Crowds: Though you may want a few images with your character interacting with others, it's best to avoid images with lots of other characters present, especially if you do not have a large dataset to work with. Models can become confused about who the character is when there is a lot going on.
  • Variety of Poses and Expressions: Include images showing different angles, poses, and facial expressions to make the character adaptable in various scenarios. This includes various framing, so images with the full body, portrait only, just the head, etc. Though it may not work to extremes, you should also include various art styles if they are available. However, try not to mix up iconic elements for a character—if you want them to appear in their traditional clothing, don’t include images with them wearing alternative clothes or costumes.
  • Compromise Quantity Before Consistency: If you can’t find enough consistent images, use fewer images. 10 to 20 images can still make a good character LoRA. It's better to have a small number of good images than risk a bad LoRA, and there are ways to maximize the utility of available images (see “Getting the Most Out of a Few Images” below).

Concept LoRAs

  • Define the Concept Clearly: Choose images that represent the concept accurately and encompass its various aspects. For example, if you wanted to train a certain type of cell phone, you'd want at least a few images of people using it. If it is just sitting on a table in every image, it will be difficult for the model to figure out how people interact with it.
  • Diverse Examples: Incorporate a wide range of images to cover the breadth of the concept, ensuring the model understands its application in different contexts. You should have a varied selection of different subjects, camera angles, framing, and composition.
  • Quality Over Quantity: It's hard to put a number on how many images you need to include for a concept, as they can greatly vary; however, just like previous notes have mentioned, it's better to have fewer high-quality, relevant images than a large number of mediocre ones. You may be able to get away with only 10 or need close to 50. A concept can quickly become muddied, and much like the other LoRA types, there are ways of getting more out of a limited number of good images (see “Getting the Most Out of a Few Images” below).

Image Quality

It may be obvious if you read through all the notes above, but quality is important—and for quite a few reasons. But what is quality? Let's go through what we know about it and how to select the best quality images for a training dataset.

  • Avoid Ambiguous Images and Distracting Elements: It was mentioned before, but in general, you want to avoid having too many images that mix styles, characters, or concepts. For example, if you are training a character, don’t use an image that shows that character in a group of other characters. Exclude images with busy backgrounds, frames, or other elements that might confuse the model.
  • Use High-Resolution Images: Utilize clear, high-resolution images to ensure the model learns the most accurate features. Images with a final size at or above roughly 1 million pixels (like 1024x1024) are ideal because they can be scaled down for other training resolutions. If you have fewer than the ideal number of images, even higher resolutions will be very beneficial (see “Getting the Most Out of a Few Images”). While it is true that Flux can train at lower resolutions (512 is common), you want to leave yourself plenty of room to crop images to fit into certain training buckets or to get rid of an unwanted watermark.
  • Avoid Blurry or Pixelated Images: Even if they are technically higher resolution, avoid images with blurs, poor lighting, or pixelation. These are often images that are captured stills from streaming video or images that were enlarged from lower resolutions. Even artistic blurs can be a problem, especially if they aren't captioned well (using keywords like "bokeh" or describing a "background blur"). 
  • Use Lossless Image Formats: Without going too deep into technical details, image formats come in two types (usually indicated by the file extension): lossless and lossy. Lossless image formats preserve all the original data without any compression artifacts, ensuring that fine details and color information remain intact. Lossy image formats, on the other hand, compress images by removing some data, which can introduce artifacts and degrade image quality. When selecting images, always choose lossless over lossy. You can always save them in a different format later, but picking lossless images up front will save you trouble. So, pick original images in PNG or TIFF format over JPEG/JPG when you have the option.
  • Look Out for Compression Artifacts: As mentioned above, lossy image compression can produce poor-quality images. To the average human eye, it's not always obvious; however, computer models can learn bad habits from these artifacts. A good trick is to zoom in really far and look at the image. Look for odd color patterns, especially a "banded" appearance. When you look really closely, there is also often a halo effect around people and objects on poorly resized images -- a ripple that surrounds them where the pixels are slightly altered.
  • Avoid Watermarks and Logos: Ensure images are free from watermarks, logos, or other types of overlays that may include distracting elements that don't contribute to the learning objective. You never know when the LoRA will learn a bad habit from that one image with a watermark or logo and think that every image generated should have one too.
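Two of the checks above, the roughly 1 million pixel target and the lossless-versus-lossy format question, are easy to rough out in a script. The sketch below uses only the Python standard library, so it reads dimensions from PNG headers only; for other formats you would need an imaging library or would have to pass the dimensions in yourself. The function names and flag wording are my own, and this does nothing for blur, watermarks, or artifacts, which still need your eyes.

```python
import struct
from pathlib import Path

LOSSLESS_EXTENSIONS = {".png", ".tif", ".tiff"}
LOSSY_EXTENSIONS = {".jpg", ".jpeg"}
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    with open(path, "rb") as handle:
        header = handle.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    # Width and height are two big-endian 32-bit integers.
    return struct.unpack(">II", header[16:24])


def quality_flags(name, width, height, min_pixels=1_000_000):
    """List reasons an image may need a closer look before training."""
    flags = []
    if width * height < min_pixels:
        flags.append("below the ~1 million pixel target")
    suffix = Path(name).suffix.lower()
    if suffix in LOSSY_EXTENSIONS:
        flags.append("lossy format: zoom in and check for artifacts")
    elif suffix not in LOSSLESS_EXTENSIONS:
        flags.append("unrecognized format: check how it was saved")
    return flags


print(quality_flags("scan.png", 1024, 1024))    # []
print(quality_flags("portrait.jpg", 800, 600))  # two flags: size and lossy format
```

Keep in mind that the extension only tells you how the file was last saved. A PNG that was converted from a JPEG still carries the JPEG's artifacts, so a clean result here is a starting point and not a pass.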

Where to Find Images

Finding the right images is the foundation of your Flux LoRA. It's not just about gathering a bunch of pictures you like—it's about finding high-quality, appropriately licensed images that fit your concept well. Being mindful of where your images come from can save you a lot of trouble later, especially if you plan to share or release your LoRA. What images you can legitimately use for training an AI model may be limited by your own government, so always check your own national and local laws. Regardless of what's here, you are responsible for following the laws and regulations that affect you. I am not a lawyer and this is not legal advice.

First, you need to be aware of license types and how they relate to building a dataset. Not all images are free to use, even for training purposes. Ideally, you should only use images that are either in the public domain, openly licensed (like Creative Commons), or your own original work. Avoid anything with "All Rights Reserved" unless you have explicit permission. While it is not currently a violation of copyright laws to use images with copyrights to train AI models (in the US as of the date of this post), it's still a very gray legal and ethical area. Even if you're not planning to share the LoRA, respecting licenses is good practice and avoids potential problems down the road.

Where you should be safe

Keep in mind that laws and regulations change, and just because you aren't currently violating a copyright by using a training image under something like fair use, it doesn't mean it will always be that way. If you want to stay 99% in the clear, here are some sources of images that are unlikely to be an issue now or in the future.

  • Public Domain Archives: Websites like Wikimedia Commons and Public Domain Review host a large number of images that are free to use. Still, double-check individual entries, as not everything uploaded to these platforms is automatically public domain.
  • Creative Commons Platforms: Platforms like Flickr allow filtering by license. Stick to CC0 or CC-BY images, which are free to use with little or no attribution requirements. Always verify the license on the specific image page—some platforms mix licensing types within collections.
  • Purchasing Stock Images: Paid stock image sites (like Shutterstock, Adobe Stock, or Depositphotos) offer high-quality images, and purchasing them gives you clear rights for personal projects. Just make sure the licensing agreement covers "machine learning training" or "derivative works" if you intend to distribute your LoRA.
  • Personal Collections: If you've taken photos yourself, you're in the clear! Personal photography is a great source because it guarantees originality, and you can tailor your images exactly to your needs. Of course, you should be following all applicable laws when you take any of those pictures and have consent of all participants to use the images when training a model.
  • Generate the Images: You can use an AI model (like Flux itself) to generate base images if you need to create something very specific. Just keep in mind that model-generated images might inherit training biases and AI models also can have license restrictions for generated images.
  • Commission Artists: Hiring an artist to create custom images for your dataset can be a fantastic option—especially if you need a very distinct style or concept that doesn't exist elsewhere. Always make sure the agreement allows the images to be used for training purposes.
  • Old Books and Magazines: Many vintage publications are public domain or available with open licenses, and copyrights expire after 95 years in the US. There is a good reason a large number of my LoRAs are trained on art and other documents that are over 100 years old.
  • Government Archives: Some government-produced media (like NASA images) are public domain. Just check.
  • Open-Source Image Repositories: Some open datasets are freely available specifically for AI training.

Collecting images from multiple sources can give you both the variety and consistency needed for a strong LoRA—just make sure you keep track of your sources in case you need to reference them later.
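A simple way to keep track of sources is a small CSV manifest with one row per image. Here is a minimal sketch of that idea. The column names, the license labels, the file names, and the example.org URLs are all placeholders I made up, and the "preferred" set just mirrors the CC0 / CC-BY suggestion above rather than any legal standard.

```python
import csv
import io

# Licenses treated as low-risk here, following the CC0 / CC-BY suggestion above.
# The exact labels are an assumption; use whatever wording your sources use.
PREFERRED_LICENSES = {"public domain", "cc0", "cc-by", "own work"}
FIELDS = ["file", "source", "license", "url"]


def write_manifest(rows, handle):
    """Write one CSV row per image so each file can be traced to its source."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def needs_review(rows):
    """Return file names whose license is not in the preferred set."""
    return [
        row["file"]
        for row in rows
        if row["license"].strip().lower() not in PREFERRED_LICENSES
    ]


# Hypothetical entries for illustration.
rows = [
    {"file": "001.png", "source": "Wikimedia Commons",
     "license": "Public Domain", "url": "https://example.org/001"},
    {"file": "002.png", "source": "stock site",
     "license": "Standard license", "url": "https://example.org/002"},
]

buffer = io.StringIO()  # swap for open("sources.csv", "w", newline="") to save a file
write_manifest(rows, buffer)
print(needs_review(rows))  # ['002.png']
```

Anything that lands in the review list is not necessarily off limits. It just means you should reread that license (for example, whether a stock agreement covers machine learning training) before the image goes into the final dataset.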


