Computers store information with two-state logic. This can be represented in a binary system with the numbers 0 and 1 (i.e., base 2).
Any number can be represented in any base as a polynomial (possibly with infinitely many terms): the digits $0 \leq x_k < b$ determine the contribution of the base $b$ raised to the $k$th power.
$$ q_b = \sum_{k=-\infty}^{+\infty} x_k b^k $$
Convert 10 (base 10, i.e. $1 \times 10^1 + 0 \times 10^0$) into binary
(Note: divmod(x, 2) returns (x // 2, x % 2), i.e. integer division and remainder):
divmod(10, 2)
(5, 0)
divmod(5, 2)
(2, 1)
divmod(2, 2)
(1, 0)
The binary representation of $10_{10}$ is $1010_2$ (keep dividing until there's only 1 left, then collect the 1 and all remainders in reverse order, essentially long division).
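This recipe is easy to automate. A minimal sketch (to_binary is a helper made up here for illustration), collecting the divmod remainders in reverse order:
def to_binary(x):
    """Base-2 digit string of a non-negative integer, via repeated divmod."""
    if x == 0:
        return "0"
    digits = []
    while x > 0:
        x, r = divmod(x, 2)   # quotient and remainder
        digits.append(str(r))
    return "".join(reversed(digits))
to_binary(10)
'1010'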
Double check by multiplying out $1010_2$:
1*2**3 + 0*2**2 + 1*2**1 + 0*2**0
10
or in Python:
int('0b1010', 2)
10
All integers are exactly representable in base 2 with a finite number of digits.
For example, with 8 bits available (a "byte"), what are the largest and smallest integers?
0b1111111 # 7 bits for number, 1 for sign (not included)
127
-0b1111111
-127
If you want to sum all terms programmatically, use numpy arrays and element-wise operations:
import numpy as np
nbits = 7
exponents = np.arange(nbits)
bases = 2*np.ones(nbits) # base 2
digits = np.ones(nbits) # all ones: binary 1111111 = 127
np.sum(digits * bases**exponents)
127.0
What are the smallest and largest integers that you can represent with a given number of bits?
Smallest and largest 4 bit unsigned integer:
0b0000
0
0b1111
15
Smallest and largest 32-bit signed integer (int32):
1 bit is the sign, 31 bits are available for the magnitude, so the highest number has 31 ones (111...111). The next higher number would be 1000...000, a one followed by 31 zeros, i.e., $2^{31}$, which needs 32 bits and no longer fits.
Thus, the highest number is $2^{31} - 1$:
2**31 - 1
2147483647
(in this sign-magnitude picture the smallest number is just $-(2^{31} - 1)$; the two's complement representation actually used for int32 gains one extra value, so its minimum is $-2^{31} = -2147483648$)
And int64 (signed):
max64 = 2**(64-1) - 1
print(-max64, max64)
print(np.log10(max64))
-9223372036854775807 9223372036854775807
18.964889726830815
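NumPy can report these limits directly with np.iinfo (a cross-check, not part of the derivation above); note that the two's complement minimum is one below $-(2^{63} - 1)$:
np.iinfo(np.int32).min, np.iinfo(np.int32).max
(-2147483648, 2147483647)
np.iinfo(np.int64).min, np.iinfo(np.int64).max
(-9223372036854775808, 9223372036854775807)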
In Python, integers have arbitrary precision: integer arithmetic (+, -, *, //) is exact and will not overflow. Thus a while True: version of the loop below would run forever (until memory is exhausted); if you run it, you can stop the evaluation with the "Kernel / Interrupt" menu command in the notebook and then investigate n and nbits. Here the loop is bounded instead:
n = 1
nbits = 1
# while True:   # the unbounded version
for i in range(10000):
    n *= 2
    nbits += 1
print(n)
19950631168807583848837421626835850838234968318861924548520089498529438830221946631919961684036194597899331129423209124271556491349413781117593785932096323957855730046793794526765246551266059895520550086918193311542508608460618104685509074866089624888090489894838009253941633257850621568309473902556912388065225096643874441046759871626985453222868538161694315775629640762836880760732228535091641476183956381458969463899410840960536267821064621427333394036525565649530603142680234969400335934316651459297773279665775606172582031407994198179607378245683762280037302885487251900834464581454650557929601414833921615734588139257095379769119277800826957735674444123062018757836325502728323789270710373802866393031428133241401624195671690574061419654342324638801248856147305207431992259611796250130992860241708340807605932320161268492288496255841312844061536738951487114256315111089745514203313820202931640957596464756010405845841566072044962867016515061920631004186422275908670900574606417856951911456055068251250406007519842261898059237118054444788072906395242548339221982707404473162376760846613033778706039803413197133493654622700563169937455508241780972810983291314403571877524768509857276937926433221599399876886660808368837838027643282775172273657572744784112294389733810861607423253291974813120197604178281965697475898164531258434135959862784130128185406283476649088690521047580882615823961985770122407044330583075869039319604603404973156583208672105913300903752823415539745394397715257455290510212310947321610753474825740775273986348298498340756937955646638621874569499279016572103701364433135817214311791398222983845847334440270964182851005072927748364550578634501100852987812389473928699540834346158807043959118985815145779177143619698728131459483783202081474982171858011389071228250905826817436220577475921417653715687725614904582904992461028630081535583308130101987675856234343538955409175623400844887526162643568648833519463720377293240094456246923254350400678027273837755376406726898636241037491410966718557050759098100246789880178271925953381282421954028302759408448955014676668389697996886241636313376393903373455801407636741877711055384225739499110186468219696581651485130494222369947714763069155468217682876200362777257723781365331611196811280792669481887201298643660768551639860534602297871557517947385246369446923087894265948217008051120322365496288169035739121368338393591756418733850510970271613915439590991598154654417336311656936031122249937969999226781732358023111862644575299135758175008199839236284615249881088960232244362173771618086357015468484058622329792853875623486556440536962622018963571028812361567512543338303270029097668650568557157505516727518899194129711337690149916181315171544007728650573189557450920330185304847113818315407324053319038462084036421763703911550639789000742853672196280903477974533320468368795868580237952218629120080742819551317948157624448298518461509704888027274721574688131594750409732115080498190455803416826949787141316063210686391511681774304792596709376
type(n)
int
int.bit_length(n)
10001
nbits
10001
NumPy data types (dtypes) are fixed precision. Overflows "wrap around":
import numpy as np
np.array([2**15-1], dtype=np.int16)
array([32767], dtype=int16)
np.array([2**15], dtype=np.int16)
array([-32768], dtype=int16)
np.array([2**15 + 1], dtype=np.int16)
array([-32767], dtype=int16)
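The wrap-around is just arithmetic modulo $2^{16}$, mapped into the signed range $[-2^{15}, 2^{15})$. A sketch of the rule (wrap_int16 is a helper made up here; two's complement, as NumPy uses, is assumed):
def wrap_int16(x):
    """Reduce x the way int16 overflow does: mod 2**16, into [-2**15, 2**15)."""
    return (x + 2**15) % 2**16 - 2**15
wrap_int16(2**15 + 1)
-32767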
Decimal fractions can be represented as binary fractions:
Convert $0.125_{10}$ to base 2:
0.125 * 2 # 0.0
0.25
_ * 2 # 0.00
0.5
_ * 2 # 0.001
1.0
Thus the binary representation of $0.125_{10}$ is $0.001_2$.
General recipe: multiply by 2; the integer part of the result is the next binary digit; drop it and continue with the fractional part:
0.3125 * 2 # 0.0
0.625
_ * 2 # 0.01
1.25
(_ - 1) * 2 # 0.010
0.5
_ * 2 # 0.0101
1.0
Thus, 0.3125 is $0.0101_2$.
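The recipe can be automated as a sketch (frac2bits is a helper made up here): multiply by 2, record the integer part, keep the fractional part:
def frac2bits(x, maxbits=30):
    """Binary digits of 0 <= x < 1, by repeated doubling."""
    bits = []
    while x != 0 and len(bits) < maxbits:
        x *= 2
        bit = int(x)          # integer part is the next digit
        bits.append(str(bit))
        x -= bit              # keep only the fractional part
    return "0." + "".join(bits)
frac2bits(0.3125)
'0.0101'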
What is the binary representation of decimal $0.1 = \frac{1}{10}$?
0.1 * 2 # 0
0.2
_ * 2 # 0
0.4
_ * 2 # 0
0.8
_ * 2 # 1
1.6
(_ - 1) * 2 # 1
1.2000000000000002
(_ - 1) * 2 # 0
0.40000000000000036
_ * 2 # 0
0.8000000000000007
_ * 2 # 1
1.6000000000000014
... etc: this is an infinitely repeating fraction and the binary representation of $0.1_{10}$ is $0.000\,1100\,1100\,1100\ldots_2$.
Thus, with a finite number of bits, 0.1 is not exactly representable in the computer; print only shows you a convenient approximation:
print(0.1)
0.1
print("{0:.55f}".format(0.1))
0.1000000000000000055511151231257827021181583404541015625
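The standard library's decimal module shows the same exact stored value (an aside; Decimal(0.1) converts the binary double exactly):
from decimal import Decimal
Decimal(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')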
Only a subset of all real numbers can be represented with floating point numbers of finite bit size. Almost all floating point numbers are not exact:
0.1 + 0.1 + 0.1 == 0.3
False
... which should have yielded True! But because the machine representation of 0.1 is not exact, the equality cannot be fulfilled.
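The practical remedy (an aside, not a fix of the representation itself) is to compare floats with a tolerance, e.g. with math.isclose from the standard library:
import math
math.isclose(0.1 + 0.1 + 0.1, 0.3)
True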
Floating point numbers are stored in "scientific notation": e.g. $c = 2.99792458 \times 10^8$ m/s
Format: $$ x = (-1)^s \times 1.f \times 2^{e - \mathrm{bias}} $$
($f$ is $M$ bits long. The leading 1 in the mantissa is assumed and not stored: "ghost" or "phantom" bit.)
Note:
IEEE float uses 32 bits
|              | s  | e     | f    |
|--------------|----|-------|------|
| bit position | 31 | 30–23 | 22–0 |

The relative precision of the 23-bit mantissa:
1/2**23
1.1920928955078125e-07
Python floating point numbers are 64-bit doubles. NumPy has dtypes float32 and float64.
IEEE double uses 64 bits
|              | s  | e     | f    |
|--------------|----|-------|------|
| bit position | 63 | 62–52 | 51–0 |

The relative precision of the 52-bit mantissa:
1/2**52
2.220446049250313e-16
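Python exposes the double precision parameters in sys.float_info (standard library); its epsilon matches this estimate:
import sys
sys.float_info.epsilon
2.220446049250313e-16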
For numerical calculations, doubles are typically required.
IEEE 754 also introduces special "numbers" that can result from floating point arithmetic:
- NaN (not a number)
- +INF and -INF (infinity)
- -0 (signed zero)

Python itself does not produce the IEEE special numbers for division by zero; it raises an exception instead:
1/0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-89-9e1622b385b6> in <module>()
----> 1 1/0

ZeroDivisionError: division by zero
But numpy does:
np.array([1, -1])/np.zeros(2)
C:\Users\csabai\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: RuntimeWarning: divide by zero encountered in true_divide
  if __name__ == '__main__':
array([ inf, -inf])
But beware: you cannot use INF to "take limits". It is purely a sign that something bad happened somewhere...
The same holds for "not a number", nan:
np.zeros(2)/np.zeros(2)
C:\Users\csabai\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: RuntimeWarning: invalid value encountered in true_divide
  if __name__ == '__main__':
array([nan, nan])
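To detect these special values, NumPy provides element-wise tests (the standard functions np.isinf and np.isnan; note that nan even compares unequal to itself):
a = np.array([1.0, np.inf, -np.inf, np.nan])
np.isinf(a), np.isnan(a)
(array([False,  True,  True, False]), array([False, False, False,  True]))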
Overflow beyond the largest representable double also produces inf:
big = 1.79e308
big
1.79e+308
2 * big
inf
2 * np.array([big], dtype=np.float64)
C:\Users\csabai\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: RuntimeWarning: overflow encountered in multiply
  if __name__ == '__main__':
array([inf])
... but you can try an even bigger data type, where available:
# not supported on Windows with MS compiler
2 * np.array([big], dtype=np.float128)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-18-f89406158fa4> in <module>()
----> 1 2 * np.array([big], dtype=np.float128)

AttributeError: module 'numpy' has no attribute 'float128'
x = 1000.2
A = 1000.2 - 1000.0
print(A)
0.20000000000004547
A == 0.2
False
... oops
x = 700
y = 1e-14
x - y
700.0
x - y < 700
False
... oops
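The reason (a quick check, not in the original argument): the gap between adjacent doubles near 700 is about $10^{-13}$, so subtracting $10^{-14}$ rounds straight back to 700.0. NumPy's np.spacing gives this gap:
np.spacing(700.0)
1.1368683772161603e-13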
Only a limited number of floating point numbers can be represented. This limited precision affects calculations:
x = 5 + 1e-16
x
5.0
x == 5
True
... oops.
Machine precision $\epsilon_m$ is defined as the maximum number that can be added to 1 in the computer without changing that number 1:
$$ 1_c + \epsilon_m := 1_c $$
Thus, the floating point representation $x_c$ of an arbitrary number $x$ is "in the vicinity of $x$":
$$ x_c = x(1 \pm \epsilon), \quad |\epsilon| \leq \epsilon_m, $$
where we don't know the true value of $\epsilon$.
Thus, except for numbers that are exactly representable in binary (such as powers of 2), all floating point numbers contain an unknown error in the 6th or 7th decimal place (32-bit floats) or the 15th or 16th (64-bit doubles).
This error should be treated as a random error because we don't know its magnitude.
N = 100
eps = 1
for nbits in range(N):
    eps /= 2
    one_plus_eps = 1.0 + eps
    # print("eps = {0}, 1 + eps = {1}".format(eps, one_plus_eps))
    if one_plus_eps == 1.0:
        print("machine precision reached for {0} bits".format(nbits))
        print("eps = {0}, 1 + eps = {1}".format(eps, one_plus_eps))
        break
machine precision reached for 52 bits
eps = 1.1102230246251565e-16, 1 + eps = 1.0
Compare to our estimate for the precision of float64:
1/2**52
2.220446049250313e-16
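NumPy reports the same machine epsilon via np.finfo:
np.finfo(np.float64).eps, np.finfo(np.float32).eps
(2.220446049250313e-16, 1.1920929e-07)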
A quick hack to convert a floating point binary representation to a floating point number.
bits = "1010.0001100110011001100110011001100110011001100110011"
import math
def bits2number(bits):
    if '.' in bits:
        integer, fraction = bits.split('.')
    else:
        integer = bits
        fraction = ""
    # integer part: the nth digit from the right contributes digit * 2**n
    powers = [int(bit) * 2**n for n, bit in enumerate(reversed(integer))]
    # fractional part: the nth digit after the point contributes digit * 2**(-n)
    powers.extend([int(bit) * 2**(-n) for n, bit in enumerate(fraction, start=1)])
    return math.fsum(powers)   # fsum sums floats with minimal rounding error
bits2number(bits)
10.1
bits2number('1111')
15.0
bits2number('0.0001100110011001100110011001100110011001100110011')
0.09999999999999964
bits2number('0.0001100')
0.09375
bits2number('0.0101')
0.3125
bits2number("10.10101")
2.65625
bits2number('0.0111111111111111111111111111111111111111')
0.4999999999990905
bits2number('0.110011001100')
0.7998046875
Python can convert to binary using the struct module:
import struct
fpack = struct.pack('f', 6.0e-8) # pack float into bytes
fint = struct.unpack('i', fpack)[0] # unpack to int
m_bits = bin(fint)[-23:] # mantissa bits
print(m_bits)
00000001101100101011001
With phantom bit:
mantissa_bits = '1.' + m_bits
print(mantissa_bits)
1.00000001101100101011001
import math
mn, ex = math.frexp(6.0e-8)
print(mn, ex)
0.50331648 -23
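The same unpacking can extract all three IEEE fields with bit masks. A sketch (float32_fields is a helper made up here, assuming the standard float32 layout):
def float32_fields(x):
    """Split a float32 into (sign, biased exponent, fraction) bit fields."""
    (i,) = struct.unpack('I', struct.pack('f', x))   # reinterpret the 4 bytes as uint32
    sign = i >> 31
    exponent = (i >> 23) & 0xFF       # 8 exponent bits, bias 127
    fraction = i & 0x7FFFFF           # 23 stored mantissa bits (phantom 1 not included)
    return sign, exponent, fraction
s, e, f = float32_fields(6.0e-8)
print(s, e - 127, format(f, '023b'))
0 -24 00000001101100101011001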