Saturday, April 26, 2025

The Making of a LoRA: Ink & Lore


I've been saving random watercolor and colored pencil images for a while now, without much of a plan for what I'd do with them. However, I recently discovered that Flux can do some unexpected and creative things when trained on a mix of styles, so I decided to run a LoRA without a particular style target in mind, aiming instead for a fusion of several mediums. I liked the way it turned out.

[Image: Art by Civitai user xanity131]
[Image: Art by Civitai user matterhorn44388]
Images generated with the Ink & Lore LoRA evoke the visual language of lost Northern myths and ancient storytelling traditions. I love the detail, scrollwork, and ritualistic aesthetic. 

The Training Settings

  • The dataset was 14 images, cropped to the following aspect ratios: 2 images (3:5), 6 images (7:9), 2 images (1:1), and 4 images (5:3).
  • That dataset was used 4 times at 4 resolutions, training at:
    • 256, batch size of 8, with 2 repeats, no buckets, random crop
    • 512, batch size of 4, with 2 repeats, buckets enabled
    • 768, batch size of 4, with 2 repeats, buckets enabled
    • 1024, batch size of 2, with 2 repeats, buckets enabled
  • Training ran for 2000 steps, but the best model was saved at 1728 steps (epoch 56 of 63)
  • Training used an equal network alpha and rank of 8.
  • It was trained on the flux-dev2pro-fp8 base model
  • Used the AdamW8bit optimizer with betas of (0.9, 0.999) and a weight decay of 0.01
  • Used cosine annealing for the LR scheduler with a max LR of 6e-4 and a min LR of 2e-4, cycling every 200 steps for 8 cycles
  • Trained clip-l with a LR of 5e-5
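
The settings above imply a predictable step count per epoch. Here's a quick sanity check of that arithmetic (a rough sketch using the 14-image dataset, repeats, and batch sizes from the list above; bucketing can shift the actual batch counts slightly):

```python
import math

IMAGES = 14   # dataset size
REPEATS = 2   # repeats per resolution
# (resolution, batch size) for each of the four training passes
PASSES = [(256, 8), (512, 4), (768, 4), (1024, 2)]

# batches needed to consume one epoch's worth of each pass
steps_per_epoch = sum(math.ceil(IMAGES * REPEATS / bs) for _, bs in PASSES)
print(steps_per_epoch)                    # 4 + 7 + 7 + 14 = 32

# 2000 total steps at 32 steps/epoch lands in epoch 63,
# which lines up with the 63-epoch run described above
print(math.ceil(2000 / steps_per_epoch))  # 63
```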

A Closer Look

Multiple Resolutions

Though there are many ways to run multi-resolution training, I've found that a wide spread, going all the way down to 256, works well for an art style LoRA. I typically set the lowest resolution without bucketing and let it randomly crop. At the lowest resolution, the model does not capture fine details, but rather overall stylistic elements. Because of this, I've also found it works well to run the lowest resolution without captions, which appears to enhance the strength of the style. Even though I can't train at 1024 at anything higher than batch 2, mixing the training sets averages out the batch sizes, speeding up training and letting me scale up the learning rate without burning the model.
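
The layout described above can be sketched as data. Note this is an illustrative structure, not any specific trainer's config syntax: the lowest-resolution pass drops buckets and captions and uses random crops so it contributes style rather than detail.

```python
# Illustrative multi-resolution dataset layout (not a real trainer's
# exact config format): one entry per training pass.
datasets = [
    {"resolution": 256,  "batch_size": 8, "repeats": 2,
     "enable_buckets": False, "random_crop": True,  "use_captions": False},
    {"resolution": 512,  "batch_size": 4, "repeats": 2,
     "enable_buckets": True,  "random_crop": False, "use_captions": True},
    {"resolution": 768,  "batch_size": 4, "repeats": 2,
     "enable_buckets": True,  "random_crop": False, "use_captions": True},
    {"resolution": 1024, "batch_size": 2, "repeats": 2,
     "enable_buckets": True,  "random_crop": False, "use_captions": True},
]

# mixing the passes averages out the batch size across the run
avg_batch = sum(d["batch_size"] for d in datasets) / len(datasets)
print(avg_batch)  # 4.5
```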

Learning Rate Schedule

I actually made a mistake training this model that turned out to work to my advantage. Typically, I've been training models with a cosine annealing LR scheduler, running 8 cycles of 200 steps each for a total of 1600 training steps. The restarts help break the model out of any slumps and make it easier to map out where the best epochs are likely to be. However, I increased the number of training steps but forgot to proportionally increase the cycles, causing the learning rate to plateau at a constant 2e-4 after the 8th cycle (1600 steps). The smooth finish of the run made it easy to pick out several good epochs from the last 500 or so steps.
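
The schedule, including the accidental plateau, is easy to reproduce with the standard cosine annealing formula (a sketch reconstructed from the settings above, not the trainer's actual code): the LR restarts at the max every 200 steps, and once the 8 cycles run out at step 1600 it simply sits at the minimum.

```python
import math

LR_MAX, LR_MIN = 6e-4, 2e-4
CYCLE_LEN, NUM_CYCLES = 200, 8

def lr_at(step: int) -> float:
    # after the last cycle ends (step 1600), the LR plateaus at the minimum
    if step >= CYCLE_LEN * NUM_CYCLES:
        return LR_MIN
    # standard cosine annealing within the current 200-step cycle
    t = (step % CYCLE_LEN) / CYCLE_LEN
    return LR_MIN + 0.5 * (LR_MAX - LR_MIN) * (1 + math.cos(math.pi * t))

print(lr_at(0))     # 6e-4: start of a cycle
print(lr_at(100))   # 4e-4: halfway down the cosine
print(lr_at(1700))  # 2e-4: the constant tail after cycle 8
```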

Training on Dev2Pro

To enhance the stability and quality of LoRA training, I opted to use the Flux-Dev2Pro model as the base. Flux-Dev2Pro is a fine-tuned version of Flux-Dev, specifically designed to address common issues encountered during LoRA training, such as distorted outputs and model collapse. I've often repeated the same training run with both Flux-Dev2Pro and the original Flux-Dev base, and training with Flux-Dev2Pro yields superior LoRA models in the majority of cases, with a few occasional ties in overall quality. However, it's important to note that if you're using training samples, they may not be trustworthy, as the script utilizes Flux-Dev2Pro for inference, which can result in subpar images. To save time, it's advisable to disable samples and evaluate your epochs once training is complete.

Where to get the LoRA

Ink & Lore is available for free download on Civitai and can be run online on Mage.Space.

Friday, April 25, 2025

Part 3: Images -- The What, How, and Where

We've already talked about defining your concept and checking what the base model can already do, so the next step is to start building a dataset. And to do that, you need images. Training a Flux LoRA requires carefully curated images to ensure the best results. I've tried more than once to just toss together images I liked and hope for the best, but it very rarely works out. Unfortunately, if you want to make a quality LoRA, you're going to have to dedicate some time to the process.

How Many Images Do You Need?

Flux seems to train best with a low number of images, depending on the quality, diversity, and what you are training. I typically recommend 20 to 30 images, but it's quite common to train on as few as 10 or as many as 50. I've even gotten away with quite a few one-image LoRAs and have made some very successful models with only 3 or 4 images -- but they tend to be very specialized and are prone to overfitting. Of course, it's always better to have too many (and whittle it down) than not enough, so I always start toward the top end of the range and then discard images as I comb through them. So if this is what you came here for, go out and find about 50 images and expect to curate that down to 20-25 for your final dataset. I'll include a few more details about numbers in the next sections.

Selection of Images

In general, the selection of images needs to follow a somewhat oxymoronic concept – keep consistency in what you want to train and diversity in everything else. Here are a few things that are often pointed out for three LoRA types and how you'd want to apply them to Flux LoRA training.

Style LoRAs

  • Capture the Essence of the Style: Select images that highlight the distinctive features of the style you want to emulate (e.g., brush strokes, color palettes, composition). No matter how much you like an image, don’t include it unless it captures the style you are trying to train. One bad image can make a difference (not in a good way).
  • Varied Subjects, Consistent Style: Use images with different subjects but the same artistic style to teach the model the style independent of the subject matter. Avoid a dataset where certain elements (like a recurring object) dominate, which can lead the model to associate the style with that object. The fewer elements (other than the style) the images have in common, the better.
  • Compromise Quantity Before Style: It's generally thought that Flux does well training style LoRAs with 20 to 30 images. If the style is rare, you might have to work with fewer images. In such cases, focus on the most representative ones and leave out anything on the fringe. If the images follow the first two points well, there’s a good chance you can still do well with 10 or fewer images.

Character LoRAs

  • Consistency in Appearance: Use images where the character's iconic features (face, hair, clothing) are consistent to help the model accurately learn the character and their standard appearance.
  • Avoid Crowds: Though you may want a few images of your character interacting with others, it's best to avoid images with lots of other characters present, especially if you do not have a large dataset to work with. Models can become confused about who the character is when there is a lot going on.
  • Variety of Poses and Expressions: Include images showing different angles, poses, and facial expressions to make the character adaptable in various scenarios. This includes various framing, so images with the full body, portrait only, just the head, etc. Though it may not work to extremes, you should also include various art styles if they are available. However, try not to mix up iconic elements for a character—if you want them to appear in their traditional clothing, don’t include images with them wearing alternative clothes or costumes.
  • Compromise Quantity Before Consistency: If you can’t find enough consistent images, use fewer images. 10 to 20 images can still make a good character LoRA. It's better to have a small number of good images than risk a bad LoRA—and there are ways to maximize the utility of available images (see “Getting the Most Out of a Few Images” below).

Concept LoRAs

  • Define the Concept Clearly: Choose images that represent the concept accurately and encompass its various aspects. For example, if you wanted to train a certain type of cell phone, you'd want at least a few images of people using it. If it is just sitting on a table in every image, it will be difficult for the model to figure out how people interact with it.
  • Diverse Examples: Incorporate a wide range of images to cover the breadth of the concept, ensuring the model understands its application in different contexts. You should have a varied selection of different subjects, camera angles, framing, and composition.
  • Quality Over Quantity: It's hard to put a number on how many images you need to include for a concept, as they can greatly vary; however, just like previous notes have mentioned, it's better to have fewer high-quality, relevant images than a large number of mediocre ones. You may be able to get away with only 10 or need closer to 50. A concept can quickly become muddied, and much like the other LoRA types, there are ways of getting more out of a limited number of good images (see “Getting the Most Out of a Few Images” below).

Image Quality

It may be obvious if you read through all the notes above, but quality is important—and for quite a few reasons. But what is quality? Let's go through what we know about it and how to select the best quality images for a training dataset.

  • Avoid Ambiguous Images and Distracting Elements: It's mentioned before, but in general, you want to avoid having too many images that mix styles, characters, or concepts. For example, if you are training a character, don’t use an image that shows that character in a group of other characters. Exclude images with busy backgrounds, frames, or other elements that might confuse the model.
  • Use High-Resolution Images: Utilize clear, high-resolution images to ensure the model learns the most accurate features. Images whose final size is at or above the 1-million-pixel range (like 1024x1024) are perfect because they can be scaled down for other training resolutions. If you have fewer images than ideal, even higher resolutions will be very beneficial (see “Getting the Most Out of a Few Images”). While it is true that Flux can train at lower resolutions (512 is common), you want to leave yourself plenty of room to crop images to fit into certain training buckets or to get rid of an unwanted watermark.
  • Avoid Blurry or Pixelated Images: Even if they are technically higher resolution, avoid images with blurs, poor lighting, or pixelation. These are often images that are captured stills from streaming video or images that were enlarged from lower resolutions. Even artistic blurs can be a problem, especially if they aren't captioned well (using keywords like "bokeh" or describing a "background blur"). 
  • Use Lossless Image Formats: Without going too deep into technical details, images come in two types (usually indicated by their file extension)—lossless and lossy. Lossless image formats preserve all the original data without any compression artifacts, ensuring that fine details and color information remain intact. Lossy image formats, on the other hand, compress images by removing some data, which can introduce artifacts and degrade image quality. When selecting images, always choose lossless over lossy—you can always save them in a different format later, but picking lossless images up front will save you trouble. So, pick original images in PNG or TIFF format over JPEG when you have the option.
  • Look Out for Compression Artifacts: As mentioned above, lossy image compression can produce bad-quality images. To the average human eye, it's not always obvious, but computer models can learn bad habits from these artifacts. A good trick is to zoom in really far and inspect the image. Look for odd color patterns, especially a "banded" appearance. When you look really closely, there is also often a halo effect around people and objects in poorly resized images -- a ripple that surrounds them where the pixels are slightly altered.
  • Avoid Watermarks and Logos: Ensure images are free from watermarks, logos, or other types of overlays that may include distracting elements that don't contribute to the learning objective. You never know when the LoRA will learn a bad habit from that one image with a watermark or logo and think that every image generated should have one too.
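
A couple of these checks can be scripted before you ever open an editor. Here's a minimal sketch: `screen_image` is a hypothetical helper, the one-megapixel threshold and the lossless-extension set reflect the guidelines above, and it assumes you already know each image's pixel dimensions.

```python
from pathlib import Path

LOSSLESS_EXTS = {".png", ".tif", ".tiff"}
MIN_PIXELS = 1024 * 1024  # ~1 megapixel target (e.g., 1024x1024)

def screen_image(path: str, width: int, height: int) -> list[str]:
    """Return a list of dataset-quality warnings for one image."""
    warnings = []
    if width * height < MIN_PIXELS:
        warnings.append("below the ~1 MP target; little room to crop or bucket")
    if Path(path).suffix.lower() not in LOSSLESS_EXTS:
        warnings.append("lossy format; inspect closely for compression artifacts")
    return warnings

print(screen_image("scan_001.jpg", 800, 600))    # flags both issues
print(screen_image("scan_002.png", 1536, 1024))  # []
```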

Where to Find Images

Finding the right images is the foundation of your Flux LoRA. It's not just about gathering a bunch of pictures you like—it's about finding high-quality, appropriately licensed images that fit your concept well. Being mindful of where your images come from can save you a lot of trouble later, especially if you plan to share or release your LoRA. What images you can legitimately use for training an AI model may be limited by your own government, so always check your own national and local laws. Regardless of what's here, you are responsible for following the laws and regulations that affect you. I am not a lawyer and this is not legal advice.

First, you need to be aware of license types and how they relate to building a dataset. Not all images are free to use, even for training purposes. Ideally, you should only use images that are either in the public domain, openly licensed (like Creative Commons), or your own original work. Avoid anything with "All Rights Reserved" unless you have explicit permission. While it is not currently a violation of copyright law to use copyrighted images to train AI models (in the US as of the date of this post), it's still a very gray legal and ethical area. Even if you're not planning to share the LoRA, respecting licenses is good practice and avoids potential problems down the road.

Where you should be safe

Keep in mind that laws and regulations change, and just because you aren't currently violating a copyright by using a training image under something like the fair use doctrine, it doesn't mean it will stay that way. If you want to stay 99% in the clear, here are some sources of images that are unlikely to be an issue now or in the future.

  • Public Domain Archives: Websites like Wikimedia Commons and Public Domain Review host a large number of images that are free to use. Still, double-check individual entries, as not everything uploaded to these platforms is automatically public domain.
  • Creative Commons Platforms: Platforms like Flickr allow filtering by license. Stick to CC0 or CC-BY images, which are free to use with little or no attribution requirements. Always verify the license on the specific image page—some platforms mix licensing types within collections.
  • Purchasing Stock Images: Paid stock image sites (like Shutterstock, Adobe Stock, or Depositphotos) offer high-quality images, and purchasing them gives you clear rights for personal projects. Just make sure the licensing agreement covers "machine learning training" or "derivative works" if you intend to distribute your LoRA.
  • Personal Collections: If you've taken photos yourself, you're in the clear! Personal photography is a great source because it guarantees originality, and you can tailor your images exactly to your needs. Of course, you should be following all applicable laws when you take any of those pictures and have consent of all participants to use the images when training a model.
  • Generate the Images: You can use an AI model (like Flux itself) to generate base images if you need to create something very specific. Just keep in mind that model-generated images might inherit training biases and AI models also can have license restrictions for generated images.
  • Commission Artists: Hiring an artist to create custom images for your dataset can be a fantastic option—especially if you need a very distinct style or concept that doesn't exist elsewhere. Always make sure the agreement allows the images to be used for training purposes.
  • Old Books and Magazines: Many vintage publications are public domain or available with open licenses, and copyrights expire after 95 years in the US. There is a good reason a large number of my LoRAs are trained on art and other documents that are over 100 years old.
  • Government Archives: Some government-produced media (like NASA images) are public domain. Just check.
  • Open-Source Image Repositories: Some open datasets are freely available specifically for AI training.

Collecting images from multiple sources can give you both the variety and consistency needed for a strong LoRA—just make sure you keep track of your sources in case you need to reference them later.



Sunday, April 13, 2025

Part 2: Do You Actually Need to Train a LoRA?


Before diving into the technicalities of training a LoRA, it's crucial to determine whether you truly need one. This step might seem counterintuitive, but it can save you a significant amount of time and prevent unnecessary frustration.

A Cautionary Tale

Let me share a personal experience from the early days of SDXL. I meticulously followed various guides, assembled a dataset of a hundred high-quality images, and manually captioned each one. After training online and downloading the results, I was thrilled with what I had created—it seemed to work exactly as intended.

However, upon reviewing the training console log, I noticed a filename error due to a typo in the LoRA name used in the prompt. It turned out that all my efforts were in vain because the base model was already capable of producing the desired results. The LoRA hadn't been applied at all. This experience taught me a valuable lesson: always verify whether a LoRA is necessary before investing time and resources.

Assessing the Need for a LoRA

Beyond avoiding redundant work, this evaluation helps you understand whether you're starting from scratch or if the base model already has some understanding of your concept but struggles with execution. Here's a straightforward process to guide you:

1. Prompt the Base Model Directly

Begin by prompting the base model with your desired concept. Use synonyms and related phrases. For instance, if you're interested in generating an image of a "lemniscate," try "infinity symbol" instead. Sometimes, the model might not recognize a specific term but can produce accurate results with a more commonly understood description.

2. Describe the Concept Visually

If the model doesn't respond well to direct prompts, try describing the concept visually. For example, instead of "kitsune," use "woman with fox ears and fox tail." This approach can often yield better results, as the model may understand visual descriptions more effectively than specific terms.

[Image: prompt "A kitsune sitting on a fallen tree in the forest"]
[Image: prompt "A woman with fox ears and a fox tail sitting on a fallen tree in the forest"]

3. Leverage AI for Captioning

Select a few images (2-3) that represent your concept and use an AI tool like ChatGPT to generate captions. Inform the AI of your intention, such as: "Caption these images using a format a text-to-image AI will understand so that I can use it as a prompt to recreate the image." Feed these captions into the model and observe the outputs.

4. Analyze the Results

Evaluate the model's performance:

  • Is it entirely unfamiliar with the concept?
  • Can it approximate the concept with careful prompting?
  • Does it recognize the concept but struggle with consistent execution?

What Do the Results Mean?

After evaluating the base model's capabilities, you can determine the necessity and scope of training a LoRA. Here's how to interpret your findings:

The Base Model Already Performs Well

This is one of the most disappointing results, but at the same time, it's better to find this out at the start rather than at the end. If the base model can generate satisfactory results with appropriate prompting, training a LoRA may be unnecessary -- refine your prompting instead of training a LoRA.

The Base Model Recognizes the Concept but Lacks Consistency

The model understands the concept but struggles to represent it consistently across different generations. Training a LoRA can help reinforce the concept and improve consistency; however, you can likely get away with a smaller dataset and less training time than normal. All you'll be making is a "helper" LoRA that gets Flux to consistently generate the concept. It's good to note this now at the start so you can be on guard against accidentally overfitting later.

The Base Model Gets Close to the Concept but not Quite There

The model shows recognition of adjacent concepts or can only produce the desired concept when it's visually described, but doesn't fully capture its nuances. In this case, you want to identify the tokens that elicit the closest representation of your concept and use those when captioning. Let Flux map out conceptual associations between what it knows and what you want it to learn. Focus on developing a dataset that emphasizes the aspects the model fails to capture, concentrating on areas where its current understanding is lacking. Just like before, you should be on guard for overfitting, as you are going to leverage knowledge the model already has.

Note: If you are working with a SDXL model, a text embedding may be a better solution than a LoRA.

The Base Model Doesn't Recognize the Concept at All

The model lacks any understanding of the concept, indicating a need for comprehensive training.​ Ironically, this is likely the most straightforward (and in many ways, the easiest) route to go. Most of this guide is written with the assumption that you are starting from scratch, but as long as you note your results early on, you can make adjustments where needed later in the process.

Part 1: What is a LoRA, Anyway?


Let’s keep it simple. A LoRA (short for Low-Rank Adaptation) is a tiny file that “teaches” a big model something new — without having to retrain the whole thing. Think of it like a focused mini-lesson for your model. You’re not rewriting the textbook; you’re just giving it a flashcard with something specific to remember. I personally think of a LoRA as a patch; LoRAs cover up a gap in knowledge about something specific.
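
The "tiny file" part falls out of the math: instead of learning a full update for a weight matrix, a LoRA learns two skinny low-rank matrices whose product forms the update, so the parameter count shrinks dramatically. A quick back-of-the-envelope sketch (the 3072 width is just an illustrative hidden size, not any particular model's exact layer shape):

```python
def full_update_params(d_out: int, d_in: int) -> int:
    # fine-tuning the whole weight matrix touches every entry
    return d_out * d_in

def lora_params(d_out: int, d_in: int, rank: int) -> int:
    # LoRA factors the update as B (d_out x r) times A (r x d_in)
    return d_out * rank + rank * d_in

d = 3072  # illustrative hidden size
print(full_update_params(d, d))  # 9437184 parameters
print(lora_params(d, d, 8))      # 49152 -- about 0.5% of the full update
```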

In the world of text-to-image models (like Stable Diffusion), LoRAs help the model learn a new thing — and that thing can be just about anything, as long as you’ve got the right data. You can then “activate” the LoRA during generation, and the model will steer its outputs based on what it learned.

For example:

  • Want the model to draw Sailor Moon in your style? Train a LoRA on that.
  • Want to make every output look like a pastel chalk sketch? Yep, LoRA.
  • Want to add a new clothing item, like chainmail bikinis or Elizabethan collars? LoRA’s got your back.
  • Want sliders to control how muscular or old someone looks? Also possible with the right kind of LoRA.

Common Types of LoRAs

When people in the AI art community talk about LoRAs, they usually refer to one of these rough categories. Each one is a little different in how it’s trained and what data it needs, but conceptually, they all work the same way: teaching the model to associate text with a certain visual result.

Character LoRAs

These are the most common. A character LoRA teaches the model how to recreate a specific character consistently. This can be:

  • A real person (e.g., Marilyn Monroe, Elon Musk)
  • A fictional or anime character (e.g., Goku, Lara Croft)
  • An original character you created
  • Even animals or mascots with a recognizable look (e.g., Hello Kitty)

The key here is consistency — you want the model to recognize and reproduce core visual traits across different poses and settings. You can click the images below to get a better look at these character LoRAs.

[Image: A LoRA to make an Elf-on-the-Shelf character]
[Image: A LoRA to make the Headless Horseman]

Style LoRAs

These change the look and feel of an image. That might be:

  • A traditional art medium like oil painting, watercolor, or pixel art
  • A color scheme (e.g., everything tinted red-orange like golden hour)
  • A visual technique or aesthetic (e.g., cyberpunk, ukiyo-e, comic book ink)

Style LoRAs can be subtle or extreme, and the best ones tend to focus on clear, consistent visual traits. You can click the images below to get a better look at these style LoRAs.


[Image: A LoRA to create images in a sumi-e art style]
[Image: A pointless LoRA that makes everything out of straw]

Concept LoRAs

This is a bit of a catch-all category. Concept LoRAs teach the model to understand something less tangible than a character or style. Think:

  • Clothing types (e.g., Victorian dresses, samurai armor)
  • Specific poses or gestures (e.g., "sitting cross-legged", “peace sign”)
  • Accessories or props (e.g., glowing swords, teacups with steam)
  • Environmental elements (e.g., foggy forests, floating islands)

Basically, if it doesn’t quite fit as a character or a style, but it’s still something you want the model to learn, it probably goes here. You can click the images below to get a better look at these concept LoRAs.

[Image: A LoRA to create clockwork animals]
[Image: A LoRA to create baroque cartography maps]

Slider LoRAs

This one’s a bit different. These LoRAs are trained in a special way so they act more like controls than discrete ideas. They let you tweak a specific aspect of the image along a scale, like turning a dial:

  • Age: baby → child → adult → elderly
  • Muscle tone: slim → athletic → bodybuilder
  • Gender presentation: masculine ↔ feminine
  • Mood or lighting: gloomy ↔ bright

Instead of saying “use this LoRA to get X,” you can say “use it at 0.2 strength for a little X, or at 0.8 for a lot of X.” It’s like giving your model a volume knob. Slider LoRAs are trained very differently and will not be covered in this guide – but never fear, you can read about them and install a specialized training script at Rohit Gandikota’s Sliders Training Script Repo.

General Terms to Know

Before we dive deeper, let's familiarize ourselves with some basic terminology that will be useful later on:

Base Model: The original, pre-trained model (e.g., Stable Diffusion) that serves as the foundation for generating images. When training a LoRA, it's crucial to use the same base model you intend to apply the LoRA to, ensuring compatibility and optimal performance.​

Fine-Tuning: The process of adjusting a model's parameters to specialize it for a specific task or dataset. While "fine-tuning" often refers to retraining the entire base model, LoRA training is a lightweight form of fine-tuning that focuses on adding new capabilities without altering the base model's core parameters.​

Prompt: The textual input you provide to guide the AI model's image generation. For example: "a futuristic cityscape at sunset."​

Caption: The descriptive text associated with each image in your training dataset. Captions teach the model the relationship between textual descriptions and visual elements, enabling it to generate relevant images based on prompts.​

Dataset: A curated collection of images and their corresponding captions used to train the LoRA. A well-structured dataset is essential for effective learning and accurate image generation.​

Trigger Word: A specific keyword or phrase included in your prompt to activate the LoRA's effect. For instance, if you've trained a LoRA on "Sailor Moon," including "Sailor Moon" in your prompt will invoke that LoRA's influence.​

LoRA Weight (or Strength): A value, typically between 0 and 1, that determines the intensity of the LoRA's influence on the output. A weight of 0.8 applies the LoRA strongly, while 0.2 applies it more subtly.​

Overfitting: Occurs when a model learns the training data too well, including its noise and anomalies, leading to poor generalization on new data. In the context of LoRA training, overfitting can result in images that are too similar to the training examples, lacking diversity.​

Underfitting: Happens when a model hasn't learned enough from the training data, resulting in outputs that don't capture the desired concept effectively. This may be due to insufficient training time or a dataset that doesn't adequately represent the concept.​

Epoch: One complete pass through the entire training dataset during the training process. Multiple epochs are often necessary for the model to learn the desired patterns effectively.

Training Step: A single update of the model's parameters based on a subset (batch) of the training data. The number of steps per epoch depends on the size of your dataset and the chosen batch size. For example, with 2,000 images and a batch size of 10, one epoch consists of 200 steps.
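
The epoch/step arithmetic from the two entries above, in code form (a minimal sketch; `steps_per_epoch` is just an illustrative helper name):

```python
import math

def steps_per_epoch(num_images: int, batch_size: int) -> int:
    # one epoch = every image seen once, consumed batch_size at a time
    return math.ceil(num_images / batch_size)

# the example above: 2,000 images at batch size 10
print(steps_per_epoch(2000, 10))       # 200 steps per epoch
# total steps for, say, 16 epochs of that dataset
print(16 * steps_per_epoch(2000, 10))  # 3200
```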

Checkpoint: A saved state of the model at a particular point during training. Checkpoints allow you to resume training from a specific state or evaluate performance at different stages. In LoRA training, checkpoints can refer to both the base model and the LoRA.
