Higher resolution on TurboWarp: https://turbowarp.org/1006058909 Draw with the mouse Space - see how it works (after something is drawn) R - reset image
TL;DR: The naive way to do outlines is O(w*h*r^2). This one is O(w*h) I was wondering if there exists an algorithm such that given an image, could find round outlines on it in constant time regardless of radius (or more correctly, in time only dependent on resolution O(w*h)). After some thinking I came up with this algorithm. This has probably been invented before, but here I came with it myself. It requires 4 intermediate passes, and each element is computed independently and identically, allowing this algorithm to be implemented in fragment shaders. How it works: First, block size = ceiling(radius+1) is computed. As mentioned previously, the algorithm consists of 4 passes. The first pass creates an image of the same height as original, but with width divided by block size. Each resulting pixel is created from clamping together the row of "block size" amount of pixels. The result of the operation are 2 numbers: the amount of units from the start of the row to the first opaque pixel and the amount of units to the last opaque pixel. So basically, the image is converted to store horizontal spans. It is safe to assume simplify the representation of the image like that, since the distance between pixels that get combined together in a span is less than a radius, so it will get filled anyways. Once that is done, the second pass is performed. It takes the resulting image (2 values/pixel) from the first pass and outputs 4 values/pixel. Computation of each resulting pixel involves looking at radius pixels above and radius pixels below. But since the image is already horizontally compressed, it doesn't increase time complexity O(w/r*h*r)=O(w*h). Out of resulting 4 values: 2 values represent the span of the empty pixels (that cut out from fully filled region) and are computed by looking at the starts and ends of input the filled spans one column to the left and one column to the right and expanding them by the correctly sized radius depending on how much above or below it is compared to the currently processed pixel. The other 2 values represent the span of filled pixels (that fill the part of the cutout). They are computed by looking at input filled spans in the same column and expanding them by correctly sized radius. The third and fourth passes are the same as the first and the second but for another axis (vertically compressed representation). Once both are computed, the final render is done, by figuring out which horizontally and vertically compressed pixels the current real pixel belongs to, pulling the 4 values from both of them (8 in total). Then assuming those compressed regions are filled, first "cutting out" the empty spans, then "filling" the filled spans (but since we only check 1 pixel, no filling is done, it's just few comparisons to figure out if that one location is filled or not). In the end, there are 2 results about whether the pixel is filled or not: one from horizontal, one from vertical. Only if both say it's filled, the pixel is considered to be a part of the outline. Doing just 2 passes instead of 4 generally also looks fine, but there are occasionally situations where the filled spans of the second pass are created as long ones from what should really be multiple tiny filled spans, resulting in lines where there shouldn't be. Doing it on both axis seems to fully (or very close to fully) mitigate that issue. To understand more, go look at the code. It is decently readable.