Formally, the entropy of a normal distribution is:-
$$ \frac{1}{2} \log(2 \pi e \sigma^2) $$
So if you're sampling a random (normally distributed) signal, that would seem to be the rate of entropy generation. I take it that if the logarithm is base 2, then the entropy would be in units of bits. So a process with a standard deviation of 1 V creates 2.05 bits of entropy per sample.
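Spelling out that arithmetic in base 2 for \$ \sigma = 1 \$ V:
$$ \frac{1}{2} \log_2(2 \pi e \cdot 1^2) = \frac{1}{2} \log_2(17.08) \approx 2.05 \text{ bits} $$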
My problem is with the rate calculation. I thought that the entropy rate would be governed by the precision of the samples. A 16 bit ADC must by definition create more sample data than an 8 bit ADC if they both cover the same voltage range. Intuitively more precision suggests more data. More data suggests more entropy.
My question is why does the entropy formula above take no account of sample precision?
Note. I suspect that this is not the appropriate formula for ADCs.
Edit. It's not. I think that I can manage to formulate the min. entropy for a normal distribution. Min. entropy is used in cryptography and randomness extraction and would be good enough. I get:-
$$ H_{min} = -\log_2 \left[ 2 \Phi \left(\frac{V_s - \mu}{\delta} \right) - 1 \right] $$
where \$ V_s \$ is half of the minimum ADC voltage step, as:-
$$ V_s = \frac{V_{range}}{2^{N+1}} $$
for an \$ N \$ bit ADC digitising a range of \$ V_{range} \$ volts. And generally, for any distribution, such as an (asymmetrical) log normal that you might get sampling a diode's avalanche noise:-
$$ H_{min} = -\log_2 \left[ \mathrm{CDF}(v'_{max}) - \mathrm{CDF}(v'_{min}) \right] $$
where \$ v'_{max}, v'_{min} \$ are the two quantized boundaries of the single ADC step of width \$ \frac{V_{range}}{2^{N}} \$ that bounds the mode.
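As a sanity check, here's a rough numerical sketch (assuming Python with scipy; the function and parameter names are just illustrative) that computes that min. entropy directly from the most probable ADC bin:

```python
# Illustrative sketch: min. entropy of an N-bit ADC digitising a normal
# distribution over [-V_range/2, +V_range/2]. Min. entropy is -log2 of the
# probability of the single most likely output code, i.e. the bin at the mode.
import numpy as np
from scipy.stats import norm

def adc_min_entropy(n_bits, v_range=2.0, mu=0.0, sigma=1.0):
    edges = np.linspace(-v_range / 2, v_range / 2, 2**n_bits + 1)
    edges[0], edges[-1] = -np.inf, np.inf   # outermost codes absorb the tails
    p = np.diff(norm.cdf(edges, loc=mu, scale=sigma))
    return -np.log2(p.max())

print(adc_min_entropy(8))    # min. entropy per sample, 8-bit ADC
print(adc_min_entropy(16))   # min. entropy per sample, 16-bit ADC
```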
Thus I think that I had totally the wrong equation...
Answer
That's the continuous (differential) entropy; not the entropy of the discrete random variable that your ADC output is!
You could (and seeing your profile page, you'll probably have a lot of fun doing that) look into what is called rate-distortion theory in the field that concerns itself with information entropy: information theory. Essentially, an ADC doesn't "let through" all the entropy entering it, and there are ways of measuring that.
But in this specific case, things are simpler.
Remember: The entropy of a discrete source \$X\$ is the expectation of information
$$H(X) = E(I(X))$$
and since your ADC output \$X\$ has only finitely many (and hence countable) output states, the expectation is just a sum of the probability of an output \$x\$ times the information of that output:
\begin{align} H(X) &= E(I(X))\\ &= \sum_{x} P(X=x) \cdot I(X=x)\\ &= \sum_{x} P(X=x) \cdot \left(-\log_2(P(X=x))\right)\\ &= -\sum_{x} P(X=x) \log_2(P(X=x)) \end{align}
Now, first observation: \$H(X)\$ of an \$N\$-bit ADC has an upper bound: you can't ever get more than \$N\$ bits of info out of that ADC. And: you get exactly \$H(X)=N\$ if you use the discrete uniform distribution for the values over the \$2^N\$ ADC steps (try it! set \$P(X=x) = \frac1{2^N}\$ in the formula above, and remember that you sum over \$2^N\$ different possible output states).
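(If you'd rather try that numerically than by hand, here's a quick illustrative sketch, assuming Python with numpy:)

```python
# Quick check: the discrete entropy H(X) = -sum p*log2(p) of a uniform
# distribution over 2^N output states is exactly N bits.
import numpy as np

def discrete_entropy(p):
    p = p[p > 0]                      # treat 0*log2(0) as 0
    return -np.sum(p * np.log2(p))

N = 16
p_uniform = np.full(2**N, 1 / 2**N)   # P(X = x) = 1/2^N for every code
print(discrete_entropy(p_uniform))    # -> 16.0 bits
```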
So, we can intuitively conclude that the digitized normal distribution yields fewer bits of entropy than the digitized uniform distribution.
Practically, that means something immensely simple: Instead of using, say, one 16-bit ADC to digitize your normally distributed phenomenon, use sixteen 1-bit ADCs to observe 16 analog normally distributed entities, and only measure whether the observed value is smaller or larger than the mean value of the normal distribution. That "sign bit" is uniformly distributed over \$\{-1, +1\}\$, and thus, you get one full bit out of every 1-bit ADC, summing to 16 total bits. If your noise source is white (that means: one sample isn't correlated to the next), then you can just sample 16 times as fast as your 16-bit ADC with your 1-bit ADC, and get the full 16 bit of entropy in the same time you would have done one 16-bit analog-to-digital conversion¹.
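If you want to see that sign-bit claim in numbers, here's a toy simulation (an illustrative sketch, assuming Python with numpy):

```python
# Toy simulation: a 1-bit "ADC" that only compares a normally distributed
# sample against the distribution's mean. The resulting bit is uniformly
# distributed, so it carries (essentially) one full bit of entropy per sample.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=1_000_000)
bits = (samples > 0.0)                        # the comparator / sign bit

p1 = bits.mean()                              # estimated P(bit = 1), ~0.5
p = np.array([1 - p1, p1])
print(-np.sum(p * np.log2(p)))                # -> very close to 1.0 bit
```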
But you had a very important question: How many bits do you get when you digitize a normal distribution?
First thing to realize:
The normal distribution has tails, and you will have to truncate them; meaning that all the values above the largest ADC step will be mapped to the largest value, and all the values below the smallest to the smallest value.
So, what probability do the ADC steps then get?
I'll do the following model:
- As shown above, the sign of our \$\mathcal N(0,1)\$ distributed variable is stochastically independent from the absolute value. So, I'll just calculate the values for the ADC bins \$\ge0\$; the negative ones will be symmetrically identical.
- Our model ADC covers \$[-1,1]\$ analog units; it has \$N+1\$ bits. Meaning that I'll cover \$[0,1]\$ with \$N\$ bits below.
- The smallest non-negative ADC bin thus covers analog values \$v_0=[0,2^{-N}[\$; the one after that \$v_1=[2^{-N},2\cdot 2^{-N}[\$. The \$i\$th bin (counting from 1, \$i<2^N-1\$) covers $$v_i=[i\cdot 2^{-N},(i+1)\cdot 2^{-N}[.$$
- Most interesting, however, is the last bin: it covers $$v_{2^N-1}=[ (2^N-1)\cdot 2^{-N},\infty[=[1-2^{-N},\infty[.$$
So, to get the entropy, i.e. the expected information, we'll need to calculate the individual bin's information, and average these, i.e. weigh each bin's information with its probability. What is the probability of each bin?
\begin{align} P(X\in v_i|X>0) &= \int\limits_{i\cdot 2^{-N}}^{(i+1)\cdot 2^{-N}} f_{X|X>0}(x)\,\mathrm dx && 0\le i<2^N-1\\ &= 2\left(F_X\left((i+1)\cdot 2^{-N}\right)-F_X\left(i\cdot 2^{-N}\right)\right) &&|\text{ standard normal}\\ &= 2\Phi\left((i+1)\cdot 2^{-N}\right)-2\Phi\left(i\cdot 2^{-N}\right)\\[1.5em] P(X\in v_{2^N-1}|X>0) &= 2\Phi(\infty)-2\Phi\left(1- 2^{-N}\right)\\ &= 2-2\Phi\left(1- 2^{-N}\right) \end{align}
So, let's plot the information of a 5-bit ADC (which means 4 bits represent the positive values):
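Something along these lines should reproduce those numbers (an illustrative sketch, assuming Python with scipy; the actual plotting scripts are linked further down):

```python
# Illustrative sketch: bin probabilities P and information I for the model
# above -- a 5-bit ADC over [-1, 1], so N = 4 bits cover the positive half,
# and the topmost bin swallows the truncated tail of the standard normal.
import numpy as np
from scipy.stats import norm

N = 4                                   # bits for the positive half
edges = np.arange(2**N + 1) * 2.0**-N   # bin edges 0, 2^-N, 2*2^-N, ..., 1
edges[-1] = np.inf                      # last bin extends to infinity

P = 2 * (norm.cdf(edges[1:]) - norm.cdf(edges[:-1]))   # P(X in v_i | X > 0)
I = -np.log2(P)                                        # information per bin

print(np.sum(P * I))                    # entropy of the positive half: ~3.47 bits
```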
Summing up the \$P \cdot I\$ contributions from above, we see that we only get 3.47 bits out of those 4 positive-value bits.
Let's actually do an experiment: do the extra ADC bits pay off? We'll simply plot the entropy we get against the number of ADC bits we pay for.
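Again as an illustrative sketch (assuming scipy), the same calculation swept over the ADC resolution, this time for the full range \$[-1,1]\$:

```python
# Illustrative sketch: entropy actually delivered per sample vs. the number
# of ADC bits paid for, digitising N(0,1) over the fixed range [-1, 1].
import numpy as np
from scipy.stats import norm

def adc_entropy(n_bits, a=1.0):
    edges = np.linspace(-a, a, 2**n_bits + 1)
    edges[0], edges[-1] = -np.inf, np.inf      # boundary bins take the tails
    p = np.diff(norm.cdf(edges))
    p = p[p > 0]                               # guard against empty bins
    return -np.sum(p * np.log2(p))

for n in range(1, 17):
    print(n, adc_entropy(n))                   # returns diminish as n grows
```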
As you can see, you get diminishing returns for added cost; so don't use a high-resolution ADC to digitize a strongly truncated normal distribution.
As was rather intuitive from the previous sections, the problem is that the boundary bin accumulates too much probability mass.
What if we scaled the ADC range, to cover \$[-a\sigma, a\sigma]\$ instead of just \$[-\sigma, \sigma]\$?
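Same sketch again (illustrative, assuming scipy), now sweeping the scaling factor \$a\$ for a fixed 5-bit ADC:

```python
# Illustrative sketch: digitise N(0,1) with a 5-bit ADC covering [-a, a]
# and sweep a, looking for the scaling that maximises the output entropy.
import numpy as np
from scipy.stats import norm

def adc_entropy(n_bits, a):
    edges = np.linspace(-a, a, 2**n_bits + 1)
    edges[0], edges[-1] = -np.inf, np.inf
    p = np.diff(norm.cdf(edges))
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

a_values = np.linspace(0.5, 10.0, 96)
H = [adc_entropy(5, a) for a in a_values]
best = int(np.argmax(H))
print(a_values[best], H[best])          # the range scaling with maximum entropy
```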
Which confirms our suspicion: there's a maximum entropy you'll get when you "kinda" make your normal distribution look relatively uniform across the ADC's range.
The scripts used to generate above figures can be found on Github.
¹ In fact, you must take a close look at the architecture of your ADC: ADCs are typically not meant to digitize white noise, and thus many ADC architectures strive to "color" the white noise, e.g. by shifting the noise energy to higher frequencies across successive output values; an introduction to the Delta-Sigma ADC is a must-read here! And I'm certain you'll enjoy that :)