Recently I needed to implement a simple RMS (Root Mean Square) calculation over an analog-to-digital converted signal in an embedded algorithm. The most pressing and resource-consuming problem was that, after efficiently accumulating the data (calculating the sum of squares), I needed to extract the square root.
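Here is a minimal sketch of that accumulation step, assuming signed 16-bit ADC samples. The function name `mean_square_i16` and the 64-bit accumulator are assumptions made for illustration, not code from the original project:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mean of the squared samples in a block.
 * The sum of squares is kept in a 64-bit integer so it cannot
 * overflow for any realistic block length. */
float mean_square_i16(const int16_t *samples, size_t n)
{
    if (n == 0) {
        return 0.0f; /* empty block: nothing to average */
    }

    uint64_t sum_sq = 0;
    for (size_t i = 0; i < n; ++i) {
        int32_t s = samples[i];      /* widen before squaring */
        sum_sq += (uint64_t)(s * s); /* s * s is at most 2^30 */
    }

    return (float)sum_sq / (float)n;
}
```

On a small MCU a 32-bit accumulator may be cheaper, provided the block is short enough that the sum cannot overflow. Either way, what remains is the square root of the returned value.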

After a couple of Google searches, I found Newton's method for calculating the square root pretty effective, and I also found a C implementation that I'm sharing with you (credit goes to Hacker's Delight for the implementation, to which I made small changes):

```
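#include <stdint.h>

float fast_sqrt(const float x)
{
    // View the same 32 bits either as a float or as an integer.
    union
    {
        int32_t ix;
        float x;
    } un;

    un.x = x;                          // x can now be read as an int through un.ix
    un.ix = 0x1fbb3f80 + (un.ix >> 1); // Initial guess: roughly halves the exponent.
    un.x = 0.5f * (un.x + x / un.x);   // Newton step.
    un.x = 0.5f * (un.x + x / un.x);   // Newton step again.
    return un.x;
}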
```
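For reference, the Newton step in the code comes from applying Newton's method to $f(y) = y^2 - x$, whose positive root is $\sqrt{x}$. This is a short sketch of the reasoning, not something taken from the original source:

$$
y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)} = y_n - \frac{y_n^2 - x}{2\,y_n} = \frac{1}{2}\left(y_n + \frac{x}{y_n}\right)
$$

Writing the current estimate as $y_n = \sqrt{x}\,(1 + e_n)$ gives

$$
y_{n+1} = \sqrt{x}\left(1 + \frac{e_n^2}{2\,(1 + e_n)}\right)
$$

so the relative error is roughly squared at each step. Assuming the bit-trick guess is within a few percent of the true root, one step brings the error below about 0.1 percent and a second step brings it close to the precision of a `float`, which is why two steps are enough.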

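Please be very careful: this function only works properly where `float` is a 32-bit IEEE 754 single-precision value (**sizeof(float) == 4**). That is a property of the float format, not of the machine's word size. It is also only meant for positive inputs: for zero it returns a tiny non-zero value rather than 0, and negative inputs give meaningless results.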
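One way to make those assumptions explicit is sketched below. This is a variant, not the Hacker's Delight code: it uses the same constant and Newton steps, but copies the bits with `memcpy` instead of a union, checks the float size at compile time (C11 `_Static_assert`), and returns 0 for non-positive input. The name `fast_sqrt_checked` and the return-0 policy are assumptions for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Refuse to compile where float is not 4 bytes wide (C11). */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "fast_sqrt_checked needs a 32-bit float");

/* Hypothetical variant of fast_sqrt with guarded inputs. */
float fast_sqrt_checked(float x)
{
    /* Zero, negative and NaN inputs all land here.
     * Returning 0 is an assumed policy; pick what suits your application. */
    if (!(x > 0.0f)) {
        return 0.0f;
    }

    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);   /* read the float's bit pattern */
    bits = 0x1fbb3f80u + (bits >> 1); /* initial guess, as in fast_sqrt */
    memcpy(&y, &bits, sizeof y);      /* back to a float */

    y = 0.5f * (y + x / y);           /* Newton step */
    y = 0.5f * (y + x / y);           /* Newton step again */
    return y;
}
```

For the RMS use case, the result is then `fast_sqrt_checked` applied to the mean of the squared samples. The size check does not prove the format is IEEE 754, so it is worth comparing a few values against the library `sqrtf` on a new target.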