Digital and technological autonomy

Code and ideas for a distributed internet

Linkoteca. Browsing archive

Technically they’re all machine learning models, but checkpoints are usually referred to simply as models. All models are static, meaning they only know what they were trained on. For them to learn something new, they have to be trained further, which is known as finetuning.

Their differences in very rough terms:

Checkpoints are the big models that make images on their own.

Loras and all their variations, like Lycoris, are "mini models" that plug into a checkpoint and alter its outputs. They let a checkpoint make styles, characters and concepts that the base checkpoint doesn’t know or doesn’t know very well.
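
As a rough illustration of how a Lora "plugs into" a checkpoint: in the LoRA technique, a small pair of low-rank matrices is trained, and their product is added onto one of the checkpoint's existing weight matrices. The sketch below uses plain Python lists and made-up numbers. The names apply_lora, up, down and scale are illustrative assumptions, not any UI's actual API.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def apply_lora(base, up, down, scale=1.0):
    """Return base + scale * (up @ down) without modifying base."""
    delta = matmul(up, down)
    return [[base[i][j] + scale * delta[i][j] for j in range(len(base[0]))]
            for i in range(len(base))]


# A 3x3 "checkpoint" weight matrix (toy values, hypothetical).
base = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]

# A rank-1 "Lora": a 3x1 and a 1x3 matrix, 6 numbers instead of 9.
up = [[1.0], [2.0], [3.0]]
down = [[0.5, 0.0, -0.5]]

# scale plays the role of the Lora strength: 0 leaves the checkpoint as-is.
patched = apply_lora(base, up, down, scale=0.5)
```

The checkpoint itself is left untouched, which is why the same Lora file can be applied to different checkpoints of the same architecture.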

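Hypernetworks are an older and generally less effective way of doing the same job as Loras: a small add-on network that changes a checkpoint’s outputs. They are a separate technique, not an implementation of the LoRA paper.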

Textual Inversions are a sort of bookmark or compilation of what a model already knows. They don’t necessarily teach it something new, but rearrange things the model already knows in a way it couldn’t arrange by itself.

SD 1.x, SD 2.x and SDXL are each both a base checkpoint and a distinct model architecture. Think of them as consoles that aren’t backwards compatible, if that’s easier to understand. SD 1.5 is, say, both the NES itself and one game for it. All SD 1.x based models are compatible with SD 1.x Loras and with SD 1.x models for extensions like ControlNet. SD 2.x is the SNES: it’s a different architecture, so 1.x models won’t be compatible with it. The same goes for SDXL, if you say that’s the N64.
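
The console analogy boils down to a simple rule: an add-on only works on a checkpoint of the same architecture family. Here is a minimal sketch of that rule, assuming base-model labels written like "SD 1.5" or "SDXL 1.0". The function names and the label parsing are hypothetical, for illustration only.

```python
def architecture_of(base_model: str) -> str:
    """Map a base-model label such as 'SD 1.5' to its architecture family."""
    label = base_model.upper().replace(" ", "")
    if label.startswith("SDXL"):
        return "SDXL"
    if label.startswith("SD1"):
        return "SD 1.x"
    if label.startswith("SD2"):
        return "SD 2.x"
    raise ValueError(f"unknown base model: {base_model!r}")


def is_compatible(checkpoint_base: str, addon_base: str) -> bool:
    """A Lora or ControlNet model only fits the same architecture family."""
    return architecture_of(checkpoint_base) == architecture_of(addon_base)
```

For example, is_compatible("SD 1.5", "SD 1.4") is True, while is_compatible("SD 1.5", "SDXL 1.0") is False.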

ControlNet models are also machine learning models. They inject into the Stable Diffusion process and control the denoising, hence the name. They’re used with an image made by a preprocessor, and that image is used to guide and control the denoising process.

All these models come in safetensors format. Safetensors is the standard format for Stable Diffusion models because a file only contains tensor data (the model’s weights) and no executable code, hence the name. .ckpt is the old, outdated model format; it’s unsafe because a file can contain malicious code that gets executed when the model is loaded.
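
To make the difference concrete, here is a small sketch using only the Python standard library. The first part shows why pickle-based files (which is what .ckpt files are) can run code on load. The second part writes and reads a tiny file in a simplified version of the safetensors layout: an 8-byte header length, a JSON header, then raw tensor bytes. The helper names and the tensor name "weight" are made up for illustration; real files should be handled with the safetensors library itself.

```python
import json
import os
import pickle
import struct
import tempfile


class Payload:
    """Stand-in for a malicious object hidden in a pickle-based file."""

    def __reduce__(self):
        # pickle calls this callable with these arguments while loading.
        # The expression here is harmless; an attacker could use anything.
        return (eval, ("6 * 7",))


# Loading runs eval(...) and returns its result instead of a Payload.
loaded = pickle.loads(pickle.dumps(Payload()))


def write_safetensors(path, name, values):
    """Write one float32 tensor using a simplified safetensors layout."""
    data = struct.pack(f"<{len(values)}f", *values)
    header = json.dumps({
        name: {"dtype": "F32", "shape": [len(values)],
               "data_offsets": [0, len(data)]},
    }).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header)))  # 8-byte header length
        f.write(header)                          # JSON description
        f.write(data)                            # raw tensor bytes


def read_safetensors_header(path):
    """Read only the JSON header; nothing in the file is executed."""
    with open(path, "rb") as f:
        (size,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(size))


path = os.path.join(tempfile.mkdtemp(), "tiny.safetensors")
write_safetensors(path, "weight", [0.1, 0.2, 0.3])
header = read_safetensors_header(path)
```

Reading the second file never calls anything stored inside it: the header is plain JSON and the rest is just numbers.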

Pruned models are models that have had unnecessary weights and data removed. Weights are part of what the model learned in order to turn noise into an image. Say a model has a weight for alien = 0.00000000000000000001: it’s so small that it won’t do anything, but it still takes up space. Now multiply that by many more useless weights. Pruning removes all of them so only the relevant data is left.
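
A toy version of that idea, in the same very rough terms: drop every weight whose magnitude is too small to matter. Real models store weights in large tensors rather than one named value per concept, so the dictionary, the names and the threshold below are all made up for illustration.

```python
def prune(weights, threshold=1e-8):
    """Keep only the entries whose magnitude is at least threshold."""
    return {name: w for name, w in weights.items() if abs(w) >= threshold}


# Hypothetical weights; "alien" and "spaceship" are effectively zero.
weights = {"cat": 0.73, "dog": -0.41, "alien": 1e-20, "spaceship": -3e-15}

pruned = prune(weights)
```

Here pruned keeps only "cat" and "dog", so there is less to store while the values that matter are unchanged.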

FP16 models are smaller than FP32 models because they store each weight with lower precision (16 bits instead of 32). It’s basically like calculating with fewer digits of pi: you’ll get close enough results, almost exactly the same in most cases, but not quite as precise. The images from FP16 models are not noticeably different or worse, and A1111 converts models to FP16 by default when loading them anyway, for speed.
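
The pi comparison can be tried directly. This sketch uses Python's struct module, whose "e" and "f" format codes store a number as a 16-bit and a 32-bit float respectively. The helper name round_trip is just for illustration.

```python
import math
import struct


def round_trip(value, fmt):
    """Store a Python float in the given binary format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


pi16 = round_trip(math.pi, "<e")  # half precision (FP16)
pi32 = round_trip(math.pi, "<f")  # single precision (FP32)

size16 = struct.calcsize("<e")    # bytes per FP16 value
size32 = struct.calcsize("<f")    # bytes per FP32 value

print(f"FP16: {pi16} ({size16} bytes), error {abs(pi16 - math.pi):.2e}")
print(f"FP32: {pi32} ({size32} bytes), error {abs(pi32 - math.pi):.2e}")
```

The FP16 value comes out as 3.140625, which is close but visibly less exact, and it takes half the bytes. That halving is where the smaller file size comes from.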

To use a model, place it in the matching folder inside the UI you’re using. For A1111, checkpoints go in stable-diffusion-webui\models\Stable-diffusion, and the other types go in self-explanatory folders under stable-diffusion-webui\models, such as Lora.
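
As a small sketch of that layout, the function below works out where a file should go. It only covers the two folders named above. The function name, the type labels and the example filename are hypothetical, and other UIs will use different folders.

```python
from pathlib import PureWindowsPath

# Only the two A1111 folders mentioned in the text; other types are omitted.
MODEL_FOLDERS = {
    "checkpoint": PureWindowsPath("models", "Stable-diffusion"),
    "lora": PureWindowsPath("models", "Lora"),
}


def destination(webui_root, kind, filename):
    """Return the path a model file of the given kind should be placed at."""
    try:
        folder = MODEL_FOLDERS[kind.lower()]
    except KeyError:
        raise ValueError(f"no folder known for model type {kind!r}") from None
    return PureWindowsPath(webui_root) / folder / filename


target = destination("stable-diffusion-webui", "Lora", "example.safetensors")
```

Here str(target) is stable-diffusion-webui\models\Lora\example.safetensors.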

Both checkpoints and Loras can be used either for poses or for styles, depending on what they were trained on. Pose Loras are a thing, for example.

The reason for your results is that you’re using SD 1.5 as a checkpoint. The base SD 1.5 checkpoint is almost a year old, and the Lora you’re using was trained not only on newer checkpoints, but on checkpoints better trained on anime. I recommend a newer anime checkpoint for the image you’re trying to make. The base model field on Civitai really means base architecture. Also, it’s recommended not to go below 512 in resolution for SD 1.x models and not below 1024 for SDXL models.
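
The resolution advice can be written as a quick check. This is only a sketch: it assumes the minimum applies to each side of the image, and it has no rule for SD 2.x because none is given above. The names are made up for illustration.

```python
# Recommended minimum resolution per architecture, as stated in the text.
MIN_RESOLUTION = {"SD 1.x": 512, "SDXL": 1024}


def resolution_warnings(architecture, width, height):
    """List the sides that fall below the recommended minimum, if any."""
    minimum = MIN_RESOLUTION.get(architecture)
    if minimum is None:
        return []  # no recommendation known for this architecture
    return [
        f"{side} {value} is below the recommended {minimum} for {architecture}"
        for side, value in (("width", width), ("height", height))
        if value < minimum
    ]
```

For example, resolution_warnings("SDXL", 768, 1024) returns one warning, for the width, while resolution_warnings("SD 1.x", 512, 768) returns an empty list.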