Computers store information with two-state logic. This can be represented in a binary system with the numbers 0 and 1 (i.e., base 2).
Any number can be represented in any base as a polynomial (possibly with infinitely many terms): the digits $0 \leq x_k < b$ determine the contribution of the base $b$ raised to the $k$th power.
$$ q_b = \sum_{k=-\infty}^{+\infty} x_k b^k $$
Convert 10 (base 10, i.e. $1 \times 10^1 + 0 \times 10^0$) into binary
(Note: divmod(x, 2) returns (x // 2, x % 2), i.e. integer division and remainder):
divmod(10, 2)
(5, 0)
divmod(5, 2)
(2, 1)
divmod(2, 2)
(1, 0)
The binary representation of $10_{10}$ is $1010_2$ (keep dividing until there's only 1 left, then collect the 1 and all remainders in reverse order, essentially long division).
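This recipe is easy to automate. A minimal sketch (to_binary is a helper made up here for illustration), collecting the divmod remainders in reverse order:
def to_binary(x):
    """Base-2 digit string of a non-negative integer, via repeated divmod."""
    if x == 0:
        return "0"
    digits = []
    while x > 0:
        x, r = divmod(x, 2)   # quotient and remainder
        digits.append(str(r))
    return "".join(reversed(digits))
to_binary(10)
'1010'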
Double check by multiplying out $1010_2$:
1*2**3 + 0*2**2 + 1*2**1 + 0*2**0
10
or in Python:
int('0b1010', 2)
10
All integers are exactly representable in base 2 with a finite number of digits.
For example, with 8 bits available (a "byte"), what are the largest and smallest integers?
0b1111111 # 7 bits for number, 1 for sign (not included)
127
-0b1111111
-127
If you want to sum all terms programmatically, use numpy arrays and element-wise operations:
import numpy as np
nbits = 7
exponents = np.arange(nbits)
bases = 2*np.ones(nbits) # base 2
digits = np.ones(nbits) # all ones: binary 1111111 = 127
np.sum(digits * bases**exponents)
127.0
What are the smallest and largest integers that you can represent with a given number of bits?
Smallest and largest 4 bit unsigned integer:
0b0000
0
0b1111
15
Smallest and largest 32-bit signed integer (int32):
1 bit is the sign, 31 bits are available for the magnitude, so the highest number has 31 ones (111...111). The next higher number would be 1000...000, a one followed by 31 zeros, i.e., $2^{31}$, which needs 32 bits and no longer fits.
Thus, the highest number is $2^{31} - 1$:
2**31 - 1
2147483647
(in this sign-magnitude picture the smallest number is just $-(2^{31} - 1)$; the two's complement representation actually used for int32 gains one extra value, so its minimum is $-2^{31} = -2147483648$)
And int64 (signed):
max64 = 2**(64-1) - 1
print(-max64, max64)
print(np.log10(max64))
-9223372036854775807 9223372036854775807
18.964889726830815
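NumPy can report these limits directly with np.iinfo (a cross-check, not part of the derivation above); note that the two's complement minimum is one below $-(2^{63} - 1)$:
np.iinfo(np.int32).min, np.iinfo(np.int32).max
(-2147483648, 2147483647)
np.iinfo(np.int64).min, np.iinfo(np.int64).max
(-9223372036854775808, 9223372036854775807)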
In Python, integers have arbitrary precision: integer arithmetic (+, -, *, //) is exact and will not overflow. Thus a while True: version of the loop below would run forever (until memory is exhausted); if you run it, you can stop the evaluation with the "Kernel / Interrupt" menu command in the notebook and then investigate n and nbits. Here the loop is bounded instead:
n = 1
nbits = 1
# while True:   # the unbounded version
for i in range(10000):
    n *= 2
    nbits += 1
print(n)
19950631168807583848837421626835850838234968318861924548520089498529438830221946631919961684036194597899331129423209124271556491349413781117593785932096323957855730046793794526765246551266059895520550086918193311542508608460618104685509074866089624888090489894838009253941633257850621568309473902556912388065225096643874441046759871626985453222868538161694315775629640762836880760732228535091641476183956381458969463899410840960536267821064621427333394036525565649530603142680234969400335934316651459297773279665775606172582031407994198179607378245683762280037302885487251900834464581454650557929601414833921615734588139257095379769119277800826957735674444123062018757836325502728323789270710373802866393031428133241401624195671690574061419654342324638801248856147305207431992259611796250130992860241708340807605932320161268492288496255841312844061536738951487114256315111089745514203313820202931640957596464756010405845841566072044962867016515061920631004186422275908670900574606417856951911456055068251250406007519842261898059237118054444788072906395242548339221982707404473162376760846613033778706039803413197133493654622700563169937455508241780972810983291314403571877524768509857276937926433221599399876886660808368837838027643282775172273657572744784112294389733810861607423253291974813120197604178281965697475898164531258434135959862784130128185406283476649088690521047580882615823961985770122407044330583075869039319604603404973156583208672105913300903752823415539745394397715257455290510212310947321610753474825740775273986348298498340756937955646638621874569499279016572103701364433135817214311791398222983845847334440270964182851005072927748364550578634501100852987812389473928699540834346158807043959118985815145779177143619698728131459483783202081474982171858011389071228250905826817436220577475921417653715687725614904582904992461028630081535583308130101987675856234343538955409175623400844887526162643568648833519463720377293240094456246923254350400678027273837755376406726898636241037491410966718557050759098100246789880178271925953381282421954028302759408448955014676668389697996886241636313376393903373455801407636741877711055384225739499110186468219696581651485130494222369947714763069155468217682876200362777257723781365331611196811280792669481887201298643660768551639860534602297871557517947385246369446923087894265948217008051120322365496288169035739121368338393591756418733850510970271613915439590991598154654417336311656936031122249937969999226781732358023111862644575299135758175008199839236284615249881088960232244362173771618086357015468484058622329792853875623486556440536962622018963571028812361567512543338303270029097668650568557157505516727518899194129711337690149916181315171544007728650573189557450920330185304847113818315407324053319038462084036421763703911550639789000742853672196280903477974533320468368795868580237952218629120080742819551317948157624448298518461509704888027274721574688131594750409732115080498190455803416826949787141316063210686391511681774304792596709376
type(n)
int
int.bit_length(n)
10001
nbits
10001
NumPy data types (dtypes) are fixed precision. Overflows "wrap around":
import numpy as np
np.array([2**15-1], dtype=np.int16)
array([32767], dtype=int16)
np.array([2**15], dtype=np.int16)
array([-32768], dtype=int16)
np.array([2**15 + 1], dtype=np.int16)
array([-32767], dtype=int16)
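The wrap-around is just arithmetic modulo $2^{16}$, mapped into the signed range $[-2^{15}, 2^{15})$. A sketch of the rule (wrap_int16 is a helper made up here; two's complement, as NumPy uses, is assumed):
def wrap_int16(x):
    """Reduce x the way int16 overflow does: mod 2**16, into [-2**15, 2**15)."""
    return (x + 2**15) % 2**16 - 2**15
wrap_int16(2**15 + 1)
-32767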
Decimal fractions can be represented as binary fractions:
Convert $0.125_{10}$ to base 2:
0.125 * 2 # 0.0
0.25
_ * 2 # 0.00
0.5
_ * 2 # 0.001
1.0
Thus the binary representation of $0.125_{10}$ is $0.001_2$.
General recipe: multiply by 2; the integer part of the result is the next binary digit; drop it and continue with the fractional part:
0.3125 * 2 # 0.0
0.625
_ * 2 # 0.01
1.25
(_ - 1) * 2 # 0.010
0.5
_ * 2 # 0.0101
1.0
Thus, 0.3125 is $0.0101_2$.
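The recipe can be automated as a sketch (frac2bits is a helper made up here): multiply by 2, record the integer part, keep the fractional part:
def frac2bits(x, maxbits=30):
    """Binary digits of 0 <= x < 1, by repeated doubling."""
    bits = []
    while x != 0 and len(bits) < maxbits:
        x *= 2
        bit = int(x)          # integer part is the next digit
        bits.append(str(bit))
        x -= bit              # keep only the fractional part
    return "0." + "".join(bits)
frac2bits(0.3125)
'0.0101'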
What is the binary representation of decimal $0.1 = \frac{1}{10}$?
0.1 * 2 # 0
0.2
_ * 2 # 0
0.4
_ * 2 # 0
0.8
_ * 2 # 1
1.6
(_ - 1) * 2 # 1
1.2000000000000002
(_ - 1) * 2 # 0
0.40000000000000036
_ * 2 # 0
0.8000000000000007
_ * 2 # 1
1.6000000000000014
... etc: this is an infinitely repeating fraction and the binary representation of $0.1_{10}$ is $0.000\,1100\,1100\,1100\ldots_2$.
Thus, with a finite number of bits, 0.1 is not exactly representable in the computer; print only shows you a convenient approximation:
print(0.1)
0.1
print("{0:.55f}".format(0.1))
0.1000000000000000055511151231257827021181583404541015625
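The standard library's decimal module shows the same exact stored value (an aside; Decimal(0.1) converts the binary double exactly):
from decimal import Decimal
Decimal(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')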
Only a subset of all real numbers can be represented with floating point numbers of finite bit size. Almost all floating point numbers are not exact:
0.1 + 0.1 + 0.1 == 0.3
False
... which should have yielded True! But because the machine representation of 0.1 is not exact, the equality cannot be fulfilled.
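The practical remedy (an aside, not a fix of the representation itself) is to compare floats with a tolerance, e.g. with math.isclose from the standard library:
import math
math.isclose(0.1 + 0.1 + 0.1, 0.3)
True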
Floating point numbers are stored in "scientific notation": e.g. $c = 2.99792458 \times 10^8$ m/s
Format: $$ x = (-1)^s \times 1.f \times 2^{e - \mathrm{bias}} $$
($f$ is $M$ bits long. The leading 1 in the mantissa is assumed and not stored: "ghost" or "phantom" bit.)
Note:
IEEE float uses 32 bits
|              | s  | e     | f    |
|--------------|----|-------|------|
| bit position | 31 | 30–23 | 22–0 |

The relative precision of the 23-bit mantissa:
1/2**23
1.1920928955078125e-07
Python floating point numbers are 64-bit doubles. NumPy has dtypes float32 and float64.
IEEE double uses 64 bits
|              | s  | e     | f    |
|--------------|----|-------|------|
| bit position | 63 | 62–52 | 51–0 |

The relative precision of the 52-bit mantissa:
1/2**52
2.220446049250313e-16
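Python exposes the double precision parameters in sys.float_info (standard library); its epsilon matches this estimate:
import sys
sys.float_info.epsilon
2.220446049250313e-16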
For numerical calculations, doubles are typically required.
IEEE 754 also introduces special "numbers" that can result from floating point arithmetic:
- NaN (not a number)
- +INF and -INF (infinity)
- -0 (signed zero)

Python itself does not produce the IEEE special numbers for division by zero; it raises an exception instead:
1/0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-89-9e1622b385b6> in <module>()
----> 1 1/0

ZeroDivisionError: division by zero
But numpy does:
np.array([1, -1])/np.zeros(2)
C:\Users\csabai\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: RuntimeWarning: divide by zero encountered in true_divide
  if __name__ == '__main__':
array([ inf, -inf])
But beware: you cannot use INF to "take limits". It is purely a sign that something bad happened somewhere...
The same holds for "not a number", nan:
np.zeros(2)/np.zeros(2)
C:\Users\csabai\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: RuntimeWarning: invalid value encountered in true_divide
  if __name__ == '__main__':
array([nan, nan])
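To detect these special values, NumPy provides element-wise tests (the standard functions np.isinf and np.isnan; note that nan even compares unequal to itself):
a = np.array([1.0, np.inf, -np.inf, np.nan])
np.isinf(a), np.isnan(a)
(array([False,  True,  True, False]), array([False, False, False,  True]))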
Overflow beyond the largest representable double also produces inf:
big = 1.79e308
big
1.79e+308
2 * big
inf
2 * np.array([big], dtype=np.float64)
C:\Users\csabai\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: RuntimeWarning: overflow encountered in multiply
  if __name__ == '__main__':
array([inf])
... but you can try an even bigger data type, where available:
# not supported on Windows with MS compiler
2 * np.array([big], dtype=np.float128)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-18-f89406158fa4> in <module>()
----> 1 2 * np.array([big], dtype=np.float128)

AttributeError: module 'numpy' has no attribute 'float128'
x = 1000.2
A = 1000.2 - 1000.0
print(A)
0.20000000000004547
A == 0.2
False
... oops
x = 700
y = 1e-14
x - y
700.0
x - y < 700
False
... oops
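The reason (a quick check, not in the original argument): the gap between adjacent doubles near 700 is about $10^{-13}$, so subtracting $10^{-14}$ rounds straight back to 700.0. NumPy's np.spacing gives this gap:
np.spacing(700.0)
1.1368683772161603e-13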
Only a limited number of floating point numbers can be represented. This limited precision affects calculations:
x = 5 + 1e-16
x
5.0
x == 5
True
... oops.
Machine precision $\epsilon_m$ is defined as the maximum number that can be added to 1 in the computer without changing that number 1:
$$ 1_c + \epsilon_m := 1_c $$
Thus, the floating point representation $x_c$ of an arbitrary number $x$ is "in the vicinity of $x$":
$$ x_c = x(1 \pm \epsilon), \quad |\epsilon| \leq \epsilon_m, $$
where we don't know the true value of $\epsilon$.
Thus, except for numbers that are exactly representable in binary (such as powers of 2), all floating point numbers contain an unknown error in the 6th or 7th decimal place (32-bit floats) or the 15th or 16th (64-bit doubles).
This error should be treated as a random error because we don't know its magnitude.
N = 100
eps = 1
for nbits in range(N):
    eps /= 2
    one_plus_eps = 1.0 + eps
    # print("eps = {0}, 1 + eps = {1}".format(eps, one_plus_eps))
    if one_plus_eps == 1.0:
        print("machine precision reached for {0} bits".format(nbits))
        print("eps = {0}, 1 + eps = {1}".format(eps, one_plus_eps))
        break
machine precision reached for 52 bits
eps = 1.1102230246251565e-16, 1 + eps = 1.0
Compare to our estimate for the precision of float64:
1/2**52
2.220446049250313e-16
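NumPy reports the same machine epsilon via np.finfo:
np.finfo(np.float64).eps, np.finfo(np.float32).eps
(2.220446049250313e-16, 1.1920929e-07)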
A quick hack to convert a floating point binary representation to a floating point number.
bits = "1010.0001100110011001100110011001100110011001100110011"
import math
def bits2number(bits):
    if '.' in bits:
        integer, fraction = bits.split('.')
    else:
        integer = bits
        fraction = ""
    # integer part: the nth digit from the right contributes digit * 2**n
    powers = [int(bit) * 2**n for n, bit in enumerate(reversed(integer))]
    # fractional part: the nth digit after the point contributes digit * 2**(-n)
    powers.extend([int(bit) * 2**(-n) for n, bit in enumerate(fraction, start=1)])
    return math.fsum(powers)   # fsum sums floats with minimal rounding error
bits2number(bits)
10.1
bits2number('1111')
15.0
bits2number('0.0001100110011001100110011001100110011001100110011')
0.09999999999999964
bits2number('0.0001100')
0.09375
bits2number('0.0101')
0.3125
bits2number("10.10101")
2.65625
bits2number('0.0111111111111111111111111111111111111111')
0.4999999999990905
bits2number('0.110011001100')
0.7998046875
Python can convert to binary using the struct module:
import struct
fpack = struct.pack('f', 6.0e-8) # pack float into bytes
fint = struct.unpack('i', fpack)[0] # unpack to int
m_bits = bin(fint)[-23:] # mantissa bits
print(m_bits)
00000001101100101011001
With phantom bit:
mantissa_bits = '1.' + m_bits
print(mantissa_bits)
1.00000001101100101011001
import math
mn, ex = math.frexp(6.0e-8)
print(mn, ex)
0.50331648 -23
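The same unpacking can extract all three IEEE fields with bit masks. A sketch (float32_fields is a helper made up here, assuming the standard float32 layout):
def float32_fields(x):
    """Split a float32 into (sign, biased exponent, fraction) bit fields."""
    (i,) = struct.unpack('I', struct.pack('f', x))   # reinterpret the 4 bytes as uint32
    sign = i >> 31
    exponent = (i >> 23) & 0xFF       # 8 exponent bits, bias 127
    fraction = i & 0x7FFFFF           # 23 stored mantissa bits (phantom 1 not included)
    return sign, exponent, fraction
s, e, f = float32_fields(6.0e-8)
print(s, e - 127, format(f, '023b'))
0 -24 00000001101100101011001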