Numbers

In [20]:
1+1
Out[20]:
2

Binary representation

Computers store information with two-state logic. This can be represented in a binary system with the numbers 0 and 1, i.e. base 2.

Any number can be represented in any base as a polynomial (possibly with infinitely many terms): the digits are $0 \leq x_k < b$ and determine the contribution of the base $b$ raised to the $k$th power.

$$ q_b = \sum_{k=-\infty}^{+\infty} x_k b^k $$

Integers

Convert 10 (base 10, i.e. $1 \times 10^1 + 0\times 10^0$) into binary

(Note: divmod(x, 2) returns (x // 2, x % 2), i.e. the integer quotient and the remainder):

In [120]:
divmod(10, 2)
Out[120]:
(5, 0)
In [121]:
divmod(5, 2)
Out[121]:
(2, 1)
In [122]:
divmod(2, 2)
Out[122]:
(1, 0)

The binary representation of $10_{10}$ is $1010_2$ (keep dividing until there's only 1 left, then collect the 1 and all remainders in reverse order, essentially long division).
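The repeated-division recipe can be wrapped into a small helper function (a sketch; `to_binary` is a hypothetical name, not used elsewhere in these notes):

```python
def to_binary(n):
    """Binary digit string of a non-negative integer via repeated divmod."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, 2)   # quotient continues, remainder is the next digit
        digits.append(str(r))
    return "".join(reversed(digits))   # remainders collected in reverse order

to_binary(10)   # '1010'
```

This agrees with Python's built-in `bin`, which gives `bin(10) == '0b1010'`.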

Double check by multiplying out $1010_2$:

In [123]:
1*2**3 + 0*2**2 + 1*2**1 + 0*2**0
Out[123]:
10

or in Python, using `int` with an explicit base (here converting $1011_2$ to $11_{10}$):

In [21]:
int('0b1011', 2)
Out[21]:
11

Summary: Integers in binary representation

All integers are exactly representable in base 2 with a finite number of digits.

  • The sign (+ or –) is represented by a single bit (0 = +, 1 = –).
  • The number of available "bits" (digits) determines the largest representable integer.

For example, with 8 bits available (a "byte"), what is the largest and smallest integer?

In [125]:
0b1111111  # 7 bits for number, 1 for sign (not included)
Out[125]:
127
In [126]:
-0b1111111
Out[126]:
-127

Sidenote: using numpy to quickly convert integers

To sum all the terms of the polynomial explicitly, use numpy arrays and element-wise operations:

In [2]:
import numpy as np
In [3]:
nbits = 7
exponents = np.arange(nbits)
bases = 2*np.ones(nbits)  # base 2
digits = np.ones(nbits)   # all 1, for 1111111 (127 in binary)
In [5]:
np.sum(digits * bases**exponents)
Out[5]:
127.0

Examples: limits of integers

What is the smallest and largest integer that you can represent

  1. if you have 4 bits available and only consider non-negative ("unsigned") integers?
  2. if you have 32 bits and consider positive and negative integers?
  3. if you have 64 bits and consider positive and negative integers?

Smallest and largest 4 bit unsigned integer:

In [57]:
0b0000
Out[57]:
0
In [58]:
0b1111
Out[58]:
15

Smallest and largest 32-bit signed integer (int32):

1 bit is the sign, 31 bits are available for the magnitude, so the highest number has 31 ones (111...111). The next number up is a one followed by 31 zeros (32 bits in total), i.e. $2^{31}$.

Thus, the highest number is $2^{31} - 1$:

In [23]:
2**31 - 1
Out[23]:
2147483647

(and the smallest number is just $-(2^{31} - 1)$ in this sign/magnitude picture; actual hardware uses two's complement, which reaches one further, down to $-2^{31}$)

And int64 (signed):

In [25]:
max64 = 2**(64-1) - 1
print(-max64, max64)
print(np.log10(max64))
-9223372036854775807 9223372036854775807
18.964889726830815
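These limits can also be queried directly with `np.iinfo` (NumPy integers use two's complement, so the reported minimum is $-2^{63}$, one less than $-(2^{63}-1)$):

```python
import numpy as np

# query the representable range of fixed-width integer dtypes
for dtype in (np.int32, np.int64):
    info = np.iinfo(dtype)
    print(dtype.__name__, info.min, info.max)
```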

Python's arbitrary precision integers

In Python, integers have arbitrary precision: integer arithmetic (+, -, *, //) is exact and will not overflow. A `while True` version of the loop below would therefore run until memory is exhausted; here it is capped at 10000 iterations. (If you do run the unbounded version, stop it with the ''Kernel / Interrupt'' menu command in the notebook and then investigate n and nbits.)

In [36]:
n = 1 
nbits = 1
#while True:
for i in range(10000):
    n *= 2
    nbits += 1
print(n)
19950631168807583848837421626835850838234968318861924548520089498529438830221946631919961684036194597899331129423209124271556491349413781117593785932096323957855730046793794526765246551266059895520550086918193311542508608460618104685509074866089624888090489894838009253941633257850621568309473902556912388065225096643874441046759871626985453222868538161694315775629640762836880760732228535091641476183956381458969463899410840960536267821064621427333394036525565649530603142680234969400335934316651459297773279665775606172582031407994198179607378245683762280037302885487251900834464581454650557929601414833921615734588139257095379769119277800826957735674444123062018757836325502728323789270710373802866393031428133241401624195671690574061419654342324638801248856147305207431992259611796250130992860241708340807605932320161268492288496255841312844061536738951487114256315111089745514203313820202931640957596464756010405845841566072044962867016515061920631004186422275908670900574606417856951911456055068251250406007519842261898059237118054444788072906395242548339221982707404473162376760846613033778706039803413197133493654622700563169937455508241780972810983291314403571877524768509857276937926433221599399876886660808368837838027643282775172273657572744784112294389733810861607423253291974813120197604178281965697475898164531258434135959862784130128185406283476649088690521047580882615823961985770122407044330583075869039319604603404973156583208672105913300903752823415539745394397715257455290510212310947321610753474825740775273986348298498340756937955646638621874569499279016572103701364433135817214311791398222983845847334440270964182851005072927748364550578634501100852987812389473928699540834346158807043959118985815145779177143619698728131459483783202081474982171858011389071228250905826817436220577475921417653715687725614904582904992461028630081535583308130101987675856234343538955409175623400844887526162643568648833519463720377293240094456246923254350400678027273837755376406726898636241037
491410966718557050759098100246789880178271925953381282421954028302759408448955014676668389697996886241636313376393903373455801407636741877711055384225739499110186468219696581651485130494222369947714763069155468217682876200362777257723781365331611196811280792669481887201298643660768551639860534602297871557517947385246369446923087894265948217008051120322365496288169035739121368338393591756418733850510970271613915439590991598154654417336311656936031122249937969999226781732358023111862644575299135758175008199839236284615249881088960232244362173771618086357015468484058622329792853875623486556440536962622018963571028812361567512543338303270029097668650568557157505516727518899194129711337690149916181315171544007728650573189557450920330185304847113818315407324053319038462084036421763703911550639789000742853672196280903477974533320468368795868580237952218629120080742819551317948157624448298518461509704888027274721574688131594750409732115080498190455803416826949787141316063210686391511681774304792596709376
In [37]:
type(n)
Out[37]:
int
In [38]:
int.bit_length(n)
Out[38]:
10001
In [39]:
nbits
Out[39]:
10001

NumPy has fixed precision integers

NumPy data types (dtypes) are fixed precision. Overflows "wrap around":

In [65]:
import numpy as np
In [41]:
np.array([2**15-1], dtype=np.int16)
Out[41]:
array([32767], dtype=int16)
In [67]:
np.array([2**15], dtype=np.int16)
Out[67]:
array([-32768], dtype=int16)
In [68]:
np.array([2**15 + 1], dtype=np.int16)
Out[68]:
array([-32767], dtype=int16)

Binary fractions

Decimal fractions can be represented as binary fractions:

Convert $0.125_{10}$ to base 2:

In [69]:
0.125 * 2  # 0.0
Out[69]:
0.25
In [70]:
_ * 2      # 0.00
Out[70]:
0.5
In [71]:
_ * 2      # 0.001
Out[71]:
1.0

Thus the binary representation of $0.125_{10}$ is $0.001_2$.

General recipe:

  • multiply by 2
  • if you get a number < 1, add a digit 0 to the right
  • if you get a number ≥ 1, add a digit 1 to the right and then continue with the remainder (the value minus 1) in the same fashion
In [72]:
0.3125 * 2    # 0.0
Out[72]:
0.625
In [73]:
_ * 2         # 0.01
Out[73]:
1.25
In [74]:
(_ - 1) * 2   # 0.010
Out[74]:
0.5
In [75]:
_ * 2         # 0.0101
Out[75]:
1.0

Thus, 0.3125 is $0.0101_2$.

What is the binary representation of decimal $0.1 = \frac{1}{10}$?

In [76]:
0.1 * 2  # 0
Out[76]:
0.2
In [77]:
_ * 2   # 0
Out[77]:
0.4
In [78]:
_ *  2  # 0
Out[78]:
0.8
In [79]:
_ * 2  # 1
Out[79]:
1.6
In [80]:
(_ - 1) * 2  # 1
Out[80]:
1.2000000000000002
In [81]:
(_ - 1) * 2  # 0
Out[81]:
0.40000000000000036
In [82]:
_ * 2  # 0 
Out[82]:
0.8000000000000007
In [83]:
_ * 2  # 1 
Out[83]:
1.6000000000000014

... etc: this is an infinitely repeating fraction and the binary representation of $0.1_{10}$ is $0.000 1100 1100 1100 ..._2$.

Thus, with a finite number of bits, 0.1 is not exactly representable in the computer.

The number 0.1 is not stored exactly in the computer. print only shows you a convenient approximation:

In [84]:
print(0.1)
0.1
In [85]:
print("{0:.55f}".format(0.1))
0.1000000000000000055511151231257827021181583404541015625

Problems with floating point arithmetic

Only a subset of all real numbers can be represented with floating point numbers of finite bit size. Almost all floating point numbers are not exact:

In [86]:
0.1 + 0.1 + 0.1 == 0.3
Out[86]:
False

... which should have yielded True! But because the machine representation of 0.1 is not exact, the equality cannot be fulfilled.
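Because of this, floating point numbers should be compared within a tolerance rather than with `==`; Python's `math.isclose` (or `np.isclose` for arrays) does exactly that:

```python
import math

# exact comparison fails because 0.1 is not exactly representable
print(0.1 + 0.1 + 0.1 == 0.3)

# comparison within a (relative) tolerance succeeds
print(math.isclose(0.1 + 0.1 + 0.1, 0.3))
```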

Representation of floats: IEEE 754

Floating point numbers are stored in "scientific notation": e.g. $c = 2.99792458 \times 10^8$ m/s

  • mantissa: $2.99792458$
  • exponent: $+8$
  • sign: +

Format: $$ x = (-1)^s \times 1.f \times 2^{e - \mathrm{bias}} $$

($f$ is $M$ bits long. The leading 1 in the mantissa is assumed and not stored: "ghost" or "phantom" bit.)


Note:

  • In IEEE 754, the highest value of $e$ in the exponent is reserved and not used, e.g. for a 32-bit float (see below) the exponent has $(30 - 23) + 1 = 8$ bit and hence the highest number for $e$ is $(2^8 - 1) - 1 = 255 - 1 = 254$. Taking the bias into account (for float, bias = 127), the largest value for the exponent is $2^{254 - 127} = 2^{127}$.
  • The case of $e=0$ is also special (subnormal numbers). In this case, the format is $$x = (-1)^s \times 0.f \times 2^{1-\mathrm{bias}}$$ i.e. the "ghost 1" becomes a zero, extending the range toward even smaller numbers.
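The fields $s$, $e$ and $f$ of a concrete value can be inspected with the struct module (a sketch; `decode_float32` is a hypothetical helper name):

```python
import struct

def decode_float32(x):
    """Return (sign, biased exponent, fraction) of a 32-bit float."""
    # reinterpret the 4 bytes of a big-endian float32 as an unsigned integer
    (i,) = struct.unpack('>I', struct.pack('>f', x))
    s = (i >> 31) & 0x1        # 1 sign bit
    e = (i >> 23) & 0xFF       # 8-bit biased exponent
    f = i & 0x7FFFFF           # 23-bit fraction
    return s, e, f

decode_float32(1.0)   # (0, 127, 0): 1.0 = (-1)^0 * 1.0 * 2^(127 - 127)
```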

IEEE float (32 bit)

IEEE float uses 32 bits

  • $\mathrm{bias} = 127_{10}$
  • bits: s = bit 31 (sign), e = bits 30–23 (exponent), f = bits 22–0 (fraction)
  • six or seven decimal places of significance (1 in $2^{23}$)
  • range: $1.4 \times 10^{-45} \leq |x_{(32)}| \leq 3.4 \times 10^{38}$
In [87]:
1/2**23
Out[87]:
1.1920928955078125e-07

IEEE double (64 bit)

Python floating point numbers are 64-bit doubles. NumPy has dtypes float32 and float64.

IEEE double uses 64 bits

  • $\mathrm{bias} = 1023_{10}$
  • bits: s = bit 63 (sign), e = bits 62–52 (exponent), f = bits 51–0 (fraction)
  • about 16 decimal places of significance (1 in $2^{52}$)
  • range: $4.9 \times 10^{-324} \leq |x_{(64)}| \leq 1.8 \times 10^{308}$
In [88]:
1/2**52
Out[88]:
2.220446049250313e-16

For numerical calculations, doubles are typically required.
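The quoted precisions and ranges can be queried with `np.finfo`, the floating point counterpart of `np.iinfo`:

```python
import numpy as np

# machine epsilon, smallest normal number, and largest number per dtype
for dtype in (np.float32, np.float64):
    info = np.finfo(dtype)
    print(dtype.__name__, info.eps, info.tiny, info.max)
```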

Special numbers

IEEE 754 also introduces special "numbers" that can result from floating point arithmetic

  • NaN (not a number)
  • +INF and -INF (infinity)
  • -0 (signed zero)

Python itself does not use the IEEE special numbers in ordinary arithmetic; division by zero raises an exception instead:

In [89]:
1/0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-89-9e1622b385b6> in <module>()
----> 1 1/0

ZeroDivisionError: division by zero

But numpy does:

In [90]:
np.array([1, -1])/np.zeros(2)
C:\Users\csabai\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: RuntimeWarning: divide by zero encountered in true_divide
  if __name__ == '__main__':
Out[90]:
array([ inf, -inf])

But beware, you cannot use INF to "take limits". It is purely a sign that something bad happened somewhere...

And not a number, nan

In [91]:
np.zeros(2)/np.zeros(2)
C:\Users\csabai\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: RuntimeWarning: invalid value encountered in true_divide
  if __name__ == '__main__':
Out[91]:
array([nan, nan])
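Since nan compares unequal even to itself, special values must be detected with dedicated functions such as `np.isnan` and `np.isinf` rather than with `==`:

```python
import numpy as np

x = np.array([1.0, np.inf, -np.inf, np.nan])
print(np.isnan(x))       # True only for the nan entry
print(np.isinf(x))       # True for both infinities
print(np.nan == np.nan)  # False: nan is unequal even to itself
```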

Overflow and underflow

  • underflow: typically just set to zero (and that works well most of the time)
  • overflow: raises exception or just set to inf
In [17]:
big = 1.79e308
big
Out[17]:
1.79e+308
In [93]:
2 * big
Out[93]:
inf
In [94]:
2 * np.array([big], dtype=np.float64)
C:\Users\csabai\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: RuntimeWarning: overflow encountered in multiply
  if __name__ == '__main__':
Out[94]:
array([inf])

... on platforms that support it you can use an even bigger data type (np.float128, i.e. extended precision), but it is not available everywhere:

In [18]:
# not supported on Windows with MS compiler
2 * np.array([big], dtype=np.float128)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-18-f89406158fa4> in <module>()
----> 1 2 * np.array([big], dtype=np.float128)

AttributeError: module 'numpy' has no attribute 'float128'

Insignificant digits

In [96]:
x = 1000.2
A = 1000.2 - 1000.0
print(A)
0.20000000000004547
In [97]:
A == 0.2
Out[97]:
False

... oops

In [98]:
x = 700
y = 1e-14
x - y
Out[98]:
700.0
In [99]:
x - y < 700
Out[99]:
False

... oops

Machine precision

Only a limited number of floating point numbers can be represented. This limited precision affects calculations:

In [100]:
x = 5  + 1e-16
x
Out[100]:
5.0
In [101]:
x == 5
Out[101]:
True

... oops.

Machine precision $\epsilon_m$ is defined as the maximum number that can be added to 1 in the computer without changing that number 1:

$$ 1_c + \epsilon_m := 1_c $$

Thus, the floating point representation $x_c$ of an arbitrary number $x$ is "in the vicinity of $x$"

$$ x_c = x(1\pm\epsilon), \quad |\epsilon| \leq \epsilon_m $$

where we don't know the true value of $\epsilon$.

Thus, except for numbers that are exactly representable in binary (such as powers of 2), all floating point numbers contain an unknown error in the 6th decimal place (32-bit floats) or the 15th decimal place (64-bit doubles).

This error should be treated as a random error because we don't know its magnitude.

In [102]:
N = 100
eps = 1
for nbits in range(N):
    eps /= 2
    one_plus_eps = 1.0 + eps
    # print("eps = {0}, 1 + eps = {1}".format(eps, one_plus_eps))
    if one_plus_eps == 1.0:
        print("machine precision reached for {0} bits".format(nbits))
        print("eps = {0}, 1 + eps = {1}".format(eps, one_plus_eps))
        break
machine precision reached for 52 bits
eps = 1.1102230246251565e-16, 1 + eps = 1.0

Compare to our estimate for the precision of float64:

In [103]:
1/2**52
Out[103]:
2.220446049250313e-16
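Python exposes this value directly as `sys.float_info.epsilon`, which is the spacing $2^{-52}$ between 1.0 and the next representable double (twice the $2^{-53}$ found by the loop above, since that loop looks for the largest eps with $1 + \epsilon = 1$):

```python
import sys

print(sys.float_info.epsilon)            # 2.220446049250313e-16
print(sys.float_info.epsilon == 2**-52)  # True
```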

Appendix

A quick hack to convert a floating point binary representation to a floating point number.

In [104]:
bits = "1010.0001100110011001100110011001100110011001100110011"
In [105]:
import math
def bits2number(bits):
    if '.' in bits:
        integer, fraction = bits.split('.')
    else:
        integer = bits
        fraction = ""
    powers = [int(bit) * 2**n for n, bit in enumerate(reversed(integer))]
    powers.extend([int(bit) * 2**(-n) for n, bit in enumerate(fraction, start=1)])
    return math.fsum(powers)
In [106]:
bits2number(bits)
Out[106]:
10.1
In [107]:
bits2number('1111')
Out[107]:
15.0
In [108]:
bits2number('0.0001100110011001100110011001100110011001100110011')
Out[108]:
0.09999999999999964
In [109]:
bits2number('0.0001100')
Out[109]:
0.09375
In [110]:
bits2number('0.0101')
Out[110]:
0.3125
In [111]:
bits2number("10.10101")
Out[111]:
2.65625
In [112]:
bits2number('0.0111111111111111111111111111111111111111')
Out[112]:
0.4999999999990905
In [113]:
bits2number('0.110011001100')
Out[113]:
0.7998046875
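For completeness, the inverse conversion can be sketched as well (the helper `number2bits` is hypothetical; it combines Python's bin() for the integer part with the multiply-by-2 recipe for the fraction):

```python
def number2bits(x, maxbits=52):
    """Binary string of a non-negative number, truncated after maxbits fraction digits."""
    integer = int(x)
    fraction = x - integer
    int_bits = bin(integer)[2:]      # integer part via the built-in bin()
    frac_bits = []
    while fraction > 0 and len(frac_bits) < maxbits:
        fraction *= 2
        if fraction >= 1:
            frac_bits.append('1')
            fraction -= 1            # continue with the remainder
        else:
            frac_bits.append('0')
    return int_bits + ('.' + ''.join(frac_bits) if frac_bits else '')

number2bits(0.3125)   # '0.0101'
number2bits(10)       # '1010'
```

Feeding the result back through bits2number recovers the original value, e.g. bits2number(number2bits(0.3125)) gives 0.3125.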

Python can convert to binary using the struct module:

In [114]:
import struct
fpack = struct.pack('f', 6.0e-8)    # pack float into bytes
fint = struct.unpack('i', fpack)[0] # unpack to int
m_bits = bin(fint)[-23:]            # mantissa bits
print(m_bits)
00000001101100101011001

With phantom bit:

In [115]:
mantissa_bits = '1.' + m_bits
print(mantissa_bits)
1.00000001101100101011001
In [116]:
import math
mn, ex = math.frexp(6.0e-8)
print(mn, ex)
0.50331648 -23