When working with LaTeX, you do not need to worry about the layout of the document; you can focus entirely on the content. Images can be captioned, labelled, and referenced using the figure environment.
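A minimal sketch of a captioned, labelled, and referenced figure might look like the following. The file name `example-image` (a placeholder image that ships with the `mwe` package in most TeX distributions), the label `fig:example`, and the width are assumptions chosen for illustration, not part of the original text.

```latex
\documentclass{article}
\usepackage{graphicx} % provides \includegraphics

\begin{document}

% \ref prints the figure number assigned by LaTeX
Figure~\ref{fig:example} shows a placeholder image.

\begin{figure}[htbp]
  \centering
  % example-image is an assumed placeholder; substitute your own file
  \includegraphics[width=0.6\textwidth]{example-image}
  \caption{A placeholder image.}
  \label{fig:example} % place \label after \caption so it picks up the figure number
\end{figure}

\end{document}
```

Compile the file twice so that the cross-reference resolves; after the first pass the reference typically appears as "??".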