Doing Linear Algebra using Tensorflow 2

Run in Google Colab View source on GitHub Download notebook

In this post, we will explore ways of doing linear algebra only using tensorflow. We will only import tensorflow and nothing else. As we will see, we can do all the common linear algebra operations without using any other library. This post is very long as it covers almost all the functions that are there in the linear algebra library tf.linalg. But this is not a copy of tensorflow documentation. Rather, the tensorflow documentation is a super set of what has been discussed here. This post also assumes that readers have a working knowledge of linear algebra. Most of the times, we will give examples to illustrate a function without going into the underlying theory. Interested readers should use the contents to browse relevant sections of their interest.

import tensorflow as tf
print(tf.__version__)
2.3.0

One thing we have to keep in mind is that while accessing a function, we have to always append the function by tf.linalg. It is possible to remove the tf part by importing the linalg library from tensorflow. But even then we have to append every function by linalg. In this post, we will always use tf.linalg followed by function name. This amounts to little more typing. But we will do this to remind ourselves that we are using linalg library of tensorflow. This might seem little awkward to seasoned users of MATLAB or Julia where you just need to type the function name to use it without having to write the library name all the time. Except that, linear algebra in tensorflow seems quite natural.

Note: In this post, we will show some of the ways in which we can handle matrix operations in Tensorflow. We will mainly use 1D or 2D arrays in our examples. But matrix operations in Tensorflow are not limited to 2D arrays. In fact, the operations can be done on multidimensional arrays. If an array has more than 2 dimensions, the matrix operation is done on the last two dimensions and the same operation is carried across other dimensions. For example, if our array has a shape of (3,5,5), it can be thought of as 3 matrices each of shape (5,5). When we call a matrix function on this array, the matrix function is applied to all 3 matrices of shape (5,5). This is also true for higher dimensional arrays.

Basics

Tensorflow operates on Tensors. Tensors are characterized by their rank. Following table shows different types of tensors and their corresponding rank.

Tensors Rank
Scalars Rank 0 Tensor
Vectors (1D array) Rank 1 Tensor
Matrices (2D array) Rank 2 Tensor
3D array Rank 3 Tensor

Creating tensors

In this section, we will create tensors of different rank, starting from scalars to multi-dimensional arrays. Though tensors can be both real or complex, we will mainly focus on real tensors.

A scalar contains a single (real or complex) value.

a = tf.constant(5.0)
a
<tf.Tensor: shape=(), dtype=float32, numpy=5.0>

The output shows that the result is a tf.Tensor. As scalars are rank 0 tensors, its shape is empty. Data type of the tensor is float32. And corresponding numpy array is 5. We can get only the value of the tensor by calling numpy method.

a.numpy()
5.0

Similarly, we can define 1D and 2D tensors. While 1D tensors are called vectors, 2D tensors are called matrices.

tf.constant([1, 3, 7, 9])    # Note the shape in result. Only one shape parameter is used for vectors.
<tf.Tensor: shape=(4,), dtype=int32, numpy=array([1, 3, 7, 9], dtype=int32)>
tf.constant([[1,2,3,4],
            [5,6,7,8]])     # Note the shape in result. There are two shape parameters (rows, columns).
<tf.Tensor: shape=(2, 4), dtype=int32, numpy=
array([[1, 2, 3, 4],
       [5, 6, 7, 8]], dtype=int32)>

Another way to define a 2D array is given below.

tf.constant([1,2,3,4,5,6,7,8.0], shape = (2,4))
<tf.Tensor: shape=(2, 4), dtype=float32, numpy=
array([[1., 2., 3., 4.],
       [5., 6., 7., 8.]], dtype=float32)>

Creating a sequence of numbers

There are two ways to generate sequence of numbers in Tensorflow. Functions tf.range and tf.linspace can be used for that purpose. Sequences generated by these functions are equally spaced.

sequence = tf.range(start = 1,limit = 10, delta = 1)
sequence.numpy()
array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

Note that the last element (limit) is not included in the array. This is consistent with Python behavior but in departure with MATLAB and Julia convention. It is also possible to set delta to a fraction.

tf.range(start = 1, limit = 10, delta = 1.5).numpy()
array([1. , 2.5, 4. , 5.5, 7. , 8.5], dtype=float32)
tf.linspace(start = 1.0, stop = 10, num = 25)  # Start must be a `float`. See documentation for more details.
<tf.Tensor: shape=(25,), dtype=float32, numpy=
array([ 1.   ,  1.375,  1.75 ,  2.125,  2.5  ,  2.875,  3.25 ,  3.625,
        4.   ,  4.375,  4.75 ,  5.125,  5.5  ,  5.875,  6.25 ,  6.625,
        7.   ,  7.375,  7.75 ,  8.125,  8.5  ,  8.875,  9.25 ,  9.625,
       10.   ], dtype=float32)>

Though in this post we will mainly focus on matrices, it is easy to create higher dimensional arrays in Tensorflow.

tf.constant(tf.range(1,13), shape = (2,3,2))
<tf.Tensor: shape=(2, 3, 2), dtype=int32, numpy=
array([[[ 1,  2],
        [ 3,  4],
        [ 5,  6]],

       [[ 7,  8],
        [ 9, 10],
        [11, 12]]], dtype=int32)>

Slicing

Slicing is similar to that of numpy slicing. For vectors (rank 1 tensor with only one shape parameter), only one argument is passed that corresponds to the location of starting index and end index of sliced array. For matrices (rank 2 tensor with two shape parameters), two input arguments need to be passed. First one for rows and second one for columns.

vector = tf.range(start = 1, limit = 10)
vector
<tf.Tensor: shape=(9,), dtype=int32, numpy=array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)>
vector[3:7].numpy() 
array([4, 5, 6, 7], dtype=int32)

Indexing in tensorflow starts from zero. In the above example, start index is 3. So that corresponds to 4th element of the vector. And end index is not included. This is similar to Python convention.

matrix = tf.constant(tf.range(20, dtype = tf.float32), shape = (4,5))
matrix
<tf.Tensor: shape=(4, 5), dtype=float32, numpy=
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.]], dtype=float32)>
matrix[1:3, 2:4]
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[ 7.,  8.],
       [12., 13.]], dtype=float32)>
matrix[:3,2:]      # Same behavior as numpy
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[ 2.,  3.,  4.],
       [ 7.,  8.,  9.],
       [12., 13., 14.]], dtype=float32)>

Modifying elements of a matrix

Tensors in tensorflow, once created, can’t be modified. So the following code segment will result in an error.

>>> a = tf.constant([1,2,3,4])
>>> a[2] = 5  # Error

But there is a way to modify values of a matrix. Instead of creating a tensor, we create a Variable. Variables work just like tensors with the added advantage that their values can be modified. So if we want to modify entries of our matrix at a later stage, we have to first create our matrix as a variable. Then we can do assignment using assign command.

variable_mat = tf.Variable(tf.constant(tf.range(12, dtype = tf.float32), shape = (3,4)))
variable_mat
<tf.Variable 'Variable:0' shape=(3, 4) dtype=float32, numpy=
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]], dtype=float32)>
variable_mat[:2,2:4].assign(-1*tf.ones(shape = (2,2)))
variable_mat
<tf.Variable 'Variable:0' shape=(3, 4) dtype=float32, numpy=
array([[ 0.,  1., -1., -1.],
       [ 4.,  5., -1., -1.],
       [ 8.,  9., 10., 11.]], dtype=float32)>

Creating a complex matrix

To create a complex matrix, we have to first create the real part and imaginary part separately. Then both real and imaginary parts can be combined element wise to create a complex matrix. Elements of both real and imaginary part should be floats. This is the hard way of creating complex a complex matrix. We will discuss the simpler way next.

real_part = tf.random.uniform(shape = (3,2), minval = 1, maxval = 5)
imag_part = tf.random.uniform(shape = (3,2), minval = 1, maxval = 5)
print("Real part:")
print(real_part)
print()
print("Imaginary part:")
print(imag_part)
Real part:
tf.Tensor(
[[3.8433075 4.279177 ]
 [2.409762  1.238677 ]
 [2.4724636 1.6782365]], shape=(3, 2), dtype=float32)

Imaginary part:
tf.Tensor(
[[2.90653   4.282353 ]
 [1.0855489 1.4715123]
 [1.3954673 4.987824 ]], shape=(3, 2), dtype=float32)
complex_mat = tf.dtypes.complex(real = real_part, imag = imag_part)
print(complex_mat)
tf.Tensor(
[[3.8433075+2.90653j   4.279177 +4.282353j ]
 [2.409762 +1.0855489j 1.238677 +1.4715123j]
 [2.4724636+1.3954673j 1.6782365+4.987824j ]], shape=(3, 2), dtype=complex64)

There is a simpler way to create a complex matrix.

complex_mat_2 = tf.constant([1+2j, 2+3j , 3+4j, 4+5j, 5+6j, 6+7j], shape = (2,3))
complex_mat_2
<tf.Tensor: shape=(2, 3), dtype=complex128, numpy=
array([[1.+2.j, 2.+3.j, 3.+4.j],
       [4.+5.j, 5.+6.j, 6.+7.j]])>

Transpose of a matrix

Transpose of a real matrix

For real matrices transpose just means changing the rows into columns and vice versa. There are three functions that achieve this.

  • tf.transpose
  • tf.adjoint
  • tf.matrix_transpose

For real matrices, all three functions give identical results.

matrix
<tf.Tensor: shape=(4, 5), dtype=float32, numpy=
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.]], dtype=float32)>
tf.transpose(matrix)
<tf.Tensor: shape=(5, 4), dtype=float32, numpy=
array([[ 0.,  5., 10., 15.],
       [ 1.,  6., 11., 16.],
       [ 2.,  7., 12., 17.],
       [ 3.,  8., 13., 18.],
       [ 4.,  9., 14., 19.]], dtype=float32)>
tf.linalg.adjoint(matrix)
<tf.Tensor: shape=(5, 4), dtype=float32, numpy=
array([[ 0.,  5., 10., 15.],
       [ 1.,  6., 11., 16.],
       [ 2.,  7., 12., 17.],
       [ 3.,  8., 13., 18.],
       [ 4.,  9., 14., 19.]], dtype=float32)>
tf.linalg.matrix_transpose(matrix)
<tf.Tensor: shape=(5, 4), dtype=float32, numpy=
array([[ 0.,  5., 10., 15.],
       [ 1.,  6., 11., 16.],
       [ 2.,  7., 12., 17.],
       [ 3.,  8., 13., 18.],
       [ 4.,  9., 14., 19.]], dtype=float32)>

Transpose of a complex matrix

Things are little different when we have a complex matrix. For complex matrices, we can take regular transpose or conjugate transpose if we want. Default is regular transpose. To take conjugate transpose, we have to set conjugate = False in tf.transpose and tf.linalg.matrix_transpose or use tf.linalg.adjoint function.

complex_mat_2.numpy()
array([[1.+2.j, 2.+3.j, 3.+4.j],
       [4.+5.j, 5.+6.j, 6.+7.j]])
transpose_of_complex_mat = tf.transpose(complex_mat_2, conjugate = False) # Regular transpose
print(transpose_of_complex_mat)
tf.Tensor(
[[1.+2.j 4.+5.j]
 [2.+3.j 5.+6.j]
 [3.+4.j 6.+7.j]], shape=(3, 2), dtype=complex128)
conjugate_transpose_of_complex_mat = tf.transpose(complex_mat_2, conjugate = True) # Conjugate transpose
print(conjugate_transpose_of_complex_mat)
tf.Tensor(
[[1.-2.j 4.-5.j]
 [2.-3.j 5.-6.j]
 [3.-4.j 6.-7.j]], shape=(3, 2), dtype=complex128)

We can also do conjugate transpose by using function linalg.adjoint function.

tf.linalg.adjoint(complex_mat_2).numpy()
array([[1.-2.j, 4.-5.j],
       [2.-3.j, 5.-6.j],
       [3.-4.j, 6.-7.j]])

Another way to take transpose of a matrix is to use the function linalg.matrix_transpose. In this function, we can set argument conjugate to True or False depending on whether we want regular transpose or conjugate transpose. Default is conjugate = False.

tf.linalg.matrix_transpose(complex_mat_2)   # Conjugate = False is the default
<tf.Tensor: shape=(3, 2), dtype=complex128, numpy=
array([[1.+2.j, 4.+5.j],
       [2.+3.j, 5.+6.j],
       [3.+4.j, 6.+7.j]])>
tf.linalg.matrix_transpose(complex_mat_2, conjugate = True)
<tf.Tensor: shape=(3, 2), dtype=complex128, numpy=
array([[1.-2.j, 4.-5.j],
       [2.-3.j, 5.-6.j],
       [3.-4.j, 6.-7.j]])>

Some common matrices

Identity matrix

tf.linalg.eye(5)
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]], dtype=float32)>

Diagonal matrix

tf.linalg.diag([1,2,3,4,5])
<tf.Tensor: shape=(5, 5), dtype=int32, numpy=
array([[1, 0, 0, 0, 0],
       [0, 2, 0, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 0, 4, 0],
       [0, 0, 0, 0, 5]], dtype=int32)>

To create diagonal matrix, we can also use tf.linalg.tensor_diag with main diagonal as input.

tf.linalg.tensor_diag(tf.constant([1,2,3,4,5.]))
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[1., 0., 0., 0., 0.],
       [0., 2., 0., 0., 0.],
       [0., 0., 3., 0., 0.],
       [0., 0., 0., 4., 0.],
       [0., 0., 0., 0., 5.]], dtype=float32)>

We can also create a matrix whose only nonzero entries are on its super-diagonals or sub-diagonals.

tf.linalg.diag([1,2,3,4], k = 1)   # Values in super-diagonal
<tf.Tensor: shape=(5, 5), dtype=int32, numpy=
array([[0, 1, 0, 0, 0],
       [0, 0, 2, 0, 0],
       [0, 0, 0, 3, 0],
       [0, 0, 0, 0, 4],
       [0, 0, 0, 0, 0]], dtype=int32)>
tf.linalg.diag([1,2,3,4], k = -1)  # Values in sub-diagonal
<tf.Tensor: shape=(5, 5), dtype=int32, numpy=
array([[0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 2, 0, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 0, 4, 0]], dtype=int32)>
tf.linalg.diag([1,2,3,4,5], k = 0, padding_value = -1)
<tf.Tensor: shape=(5, 5), dtype=int32, numpy=
array([[ 1, -1, -1, -1, -1],
       [-1,  2, -1, -1, -1],
       [-1, -1,  3, -1, -1],
       [-1, -1, -1,  4, -1],
       [-1, -1, -1, -1,  5]], dtype=int32)>

Another way to create a diagonal matrix is by using tf.linalg.set_diag function.

mat = tf.zeros(shape = (5,5))
diag = tf.constant([1,2,3,4,5.])
tf.linalg.set_diag(input = mat, diagonal = diag, k = 0)
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[1., 0., 0., 0., 0.],
       [0., 2., 0., 0., 0.],
       [0., 0., 3., 0., 0.],
       [0., 0., 0., 4., 0.],
       [0., 0., 0., 0., 5.]], dtype=float32)>
tf.linalg.set_diag(mat, tf.constant([1,2,3,4.]), k = 1)  # Set super-diagonal
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[0., 1., 0., 0., 0.],
       [0., 0., 2., 0., 0.],
       [0., 0., 0., 3., 0.],
       [0., 0., 0., 0., 4.],
       [0., 0., 0., 0., 0.]], dtype=float32)>
diags = tf.constant([[1,2,3,4,5],
                     [6,7,8,9,0.]])
tf.linalg.set_diag(mat, diags, k = (-1,0))
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[1., 0., 0., 0., 0.],
       [6., 2., 0., 0., 0.],
       [0., 7., 3., 0., 0.],
       [0., 0., 8., 4., 0.],
       [0., 0., 0., 9., 5.]], dtype=float32)>

Note on alignment strategy

Functions like tf.linalg.diag and tf.linalg.set_diag and several others to be seen later, take an argument called align among other things. Though up to this point we have used default parameters of align, we believe, a note is warranted at this point. There are four different alignment strategies in tensorflow. Those are:

  • LEFT_LEFT: Superdiagonals are appended at right and subdiagonals are appended at right.
  • LEFT_RIGHT: Superdiagonals are appended at right and subdiagonals are appended at left.
  • RIGHT_LEFT: Superdiagonals are appended at left and subdiagonals are appended at right.
  • RIGHT_RIGHT: Superdiagonals are appended at left and subdiagonals are appended at left.

One way to remember the above rules is that if something is aligned to left, it is appended at right. And in LEFT_RIGHT, first word corresponds to superdiagonals and second word corresponds to subdiagonals.

Why do we need an alignment strategy?

Both superdiagonals and subdiagonals have less number of entries than the main diagonal. If we extract (for some reason) the subdiagonals (or superdiagonals) of a matrix along with the main diagonal, the resulting array will have different lengths. These type of arrays are called ragged arrays. Though tensorflow can handle ragged arrays, the results of linear algebra library are always uniform arrays (i.e., all the arrays have same number of entries). So while extracting subdiagonals (or superdiagonals), we have to append it either at the left or at right to make it of the same length as the main diagonal. This leads to the question of where the arrays should be appended. Tensorflow leaves that option to the readers. Depending on where we append the subdiagonals and superdiagonals, there are four alignment strategies as mentioned below.

In the next section, we will see a way to create tri-diagonal matrix using tf.linalg.set_diag.

Tri-diagonal matrix

Let’s create Gilbert Strang’s favorite matrix.

tf.linalg.diag(tf.repeat(2,repeats = 5)) + tf.linalg.diag(tf.repeat(-1, repeats = 4), k = -1) + tf.linalg.diag(tf.repeat(-1, repeats = 4), k = 1)
<tf.Tensor: shape=(5, 5), dtype=int32, numpy=
array([[ 2, -1,  0,  0,  0],
       [-1,  2, -1,  0,  0],
       [ 0, -1,  2, -1,  0],
       [ 0,  0, -1,  2, -1],
       [ 0,  0,  0, -1,  2]], dtype=int32)>

Tri-digonal matrix using tf.linalg.set_diag

While setting more that one diagonals using set_diag, say k = (-2,3), we have to have 6 diagonals (2 sub-diagonals, 1 main diagonal, and 3 super-diagonals). First three rows of the input diagonals will correspond to super-diagonals and have to be appended at the right by zeros (according to alignment strategy of "LEFT_RIGHT"). Fourth row corresponds to main diagonal. Last two rows correspond to sub-diagonals and have to be appended at the left by zeros (according to alignment strategy of "LEFT_RIGHT"). We could have chosen any other alignment strategy and modified our input accordingly. Using this rule, now we will create a tri-diagonal matrix.

diags = tf.constant([[-1,-1,-1,-1, 0],
                     [ 2, 2, 2, 2, 2],
                     [ 0,-1,-1,-1,-1]], dtype = tf.float32)
mat = tf.zeros(shape = (5,5))
tf.linalg.set_diag(mat,diags, k = (-1,1), align = "LEFT_RIGHT")
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[ 2., -1.,  0.,  0.,  0.],
       [-1.,  2., -1.,  0.,  0.],
       [ 0., -1.,  2., -1.,  0.],
       [ 0.,  0., -1.,  2., -1.],
       [ 0.,  0.,  0., -1.,  2.]], dtype=float32)>

There is yet another simpler way to create a tri-diagonal matrix using a linear operator. We will see that technique in a later section.

Matrix of all zeros and ones

Matrices of all 1s or all 0s are not in linalg library. But those are available in core Tensorflow.

tf.zeros(shape = (3,5), dtype = tf.float32)
<tf.Tensor: shape=(3, 5), dtype=float32, numpy=
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]], dtype=float32)>
tf.ones(shape = (5,4), dtype = tf.int32)
<tf.Tensor: shape=(5, 4), dtype=int32, numpy=
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=int32)>

Random matrices

Random matrices are also not part of linalg library. Rather, they are part of tf.random library. Using Tensorflow we can create matrices whose entries come from normal, uniform, poisson, and gamma distributions.

Random uniform matrix
tf.random.uniform(shape = (5,5), minval = 0, maxval = 5, seed= 32)
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[4.8578553 , 0.26324332, 1.7549878 , 4.434555  , 2.3975224 ],
       [3.219039  , 0.4039365 , 0.92039883, 2.9136662 , 4.9377174 ],
       [4.617196  , 3.6782126 , 4.0351195 , 4.8321657 , 4.206293  ],
       [2.3059547 , 4.922245  , 4.186061  , 2.1761923 , 0.88124394],
       [2.7422066 , 1.5948689 , 2.6099925 , 4.4901986 , 2.4033623 ]],
      dtype=float32)>
# Random matrix of integers
uniform_int = tf.random.uniform(shape = (5,5), minval= 10, maxval = 20, dtype = tf.int32, seed = 1234)
uniform_int
<tf.Tensor: shape=(5, 5), dtype=int32, numpy=
array([[19, 15, 13, 14, 10],
       [16, 18, 10, 15, 10],
       [12, 13, 19, 12, 16],
       [18, 11, 10, 18, 12],
       [17, 18, 14, 19, 10]], dtype=int32)>

For further processing we usually require matrix entries to be floating point numbers. This can be achieved by using tf.cast.

tf.dtypes.cast(uniform_int, dtype = tf.float32)
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[19., 15., 13., 14., 10.],
       [16., 18., 10., 15., 10.],
       [12., 13., 19., 12., 16.],
       [18., 11., 10., 18., 12.],
       [17., 18., 14., 19., 10.]], dtype=float32)>

Random normal matrix
tf.random.normal(shape = (5,5), mean = 1, stddev= 3, seed = 253)
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[ 1.5266892 , -5.114835  ,  4.4653835 , -1.013567  , -1.1874261 ],
       [ 5.503375  , -1.4568713 , -1.3270268 ,  0.2747649 ,  3.1374507 ],
       [ 4.211556  ,  4.618066  ,  1.2217634 ,  0.04707384,  1.4131291 ],
       [-2.7024255 ,  0.81293994, -3.11763   , -3.043394  ,  5.5663233 ],
       [ 1.4549919 ,  3.7368293 ,  1.2184538 ,  2.0713992 ,  0.19450545]],
      dtype=float32)>

Truncated random normal matrix

truncated_normal function gives values within two standard deviations of mean on both sides of normal curve.

tf.random.truncated_normal(shape = (5,5), mean = 0, stddev= 2, seed = 82) 
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[ 2.3130474 ,  1.917585  ,  1.1134342 , -3.6221776 , -2.242488  ],
       [ 2.8108876 , -1.8440692 ,  1.7630143 , -0.4591654 , -0.20763761],
       [-0.4769438 ,  2.3582413 , -0.45690525, -0.4208855 , -1.8990422 ],
       [-2.2638845 ,  2.9536312 ,  0.9591611 ,  2.670887  ,  1.4793464 ],
       [-0.60492915,  3.6320126 ,  3.9752324 , -0.4684417 , -3.2791114 ]],
      dtype=float32)>

There are ways to create deterministic random numbers using stateless_normal, stateless_uniform, etc. To know more about random number generation in Tensorflow, go to this link

Random Poisson matrix
tf.random.poisson(shape = (5,5), lam = 2, seed = 12)
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[1., 0., 1., 2., 3.],
       [0., 1., 2., 3., 4.],
       [2., 0., 2., 2., 2.],
       [2., 0., 2., 2., 3.],
       [1., 4., 2., 5., 4.]], dtype=float32)>

Random gamma matrix
tf.random.gamma(shape = (5,5), alpha = 0.7, beta= 0.3, seed = 232)
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[0.78733766, 2.5200539 , 0.9812998 , 5.141082  , 1.9184761 ],
       [1.1069427 , 0.32923967, 0.13172682, 5.066955  , 2.8487072 ],
       [0.39204285, 0.53647757, 5.3083944 , 1.618826  , 0.41352856],
       [1.0327125 , 0.27330002, 0.34577194, 0.22123706, 0.77021873],
       [0.38616025, 9.153643  , 1.4737413 , 6.029133  , 0.05517024]],
      dtype=float32)>

Some special matrices

Special matrices like toeplitz, circulant, Kronecker, etc can be created using linear operators. We will discuss this in the linear operator section.

Sparse matrices

Sparse matrices are within tf.sparse library. There are several functions specifically designed for sparse matrices. Full list of function in tf.sparse library can be found at this link. In this section, we will see how sparse matrices are created. The first argument is set of indices (rows and columns), second argument is the values at those indices. Third argument is the dense_shape of the sparse matrix.

sparse_mat = tf.sparse.SparseTensor([[0,1],[1,3],[3,2]], [-5, -10, 7], dense_shape= (5,5))
sparse_mat
<tensorflow.python.framework.sparse_tensor.SparseTensor at 0x7f8f505203a0>

To see the actual matrix, we have to convert the sparse matrix to a dense matrix. This is achieved using to_dense function.

tf.sparse.to_dense(sparse_mat)
<tf.Tensor: shape=(5, 5), dtype=int32, numpy=
array([[  0,  -5,   0,   0,   0],
       [  0,   0,   0, -10,   0],
       [  0,   0,   0,   0,   0],
       [  0,   0,   7,   0,   0],
       [  0,   0,   0,   0,   0]], dtype=int32)>

It should be noted that special algorithms exist to deal with sparse matrices. Those algorithms don’t require the sparse matrix to be converted into its dense equivalent. By converting a sparse matrix into a dense one, all its special properties are lost. Therefore, sparse matrices should not be converted into dense ones.

Matrix multiplication

To multiply two vectors, or two matrices, or a matrix with a vector in a linear algebra sense, we have to use linalg.matmul function. Using * operator in python does element wise multiplication with broadcasting wherever possible. So to multiply two matrices, we have to call linalg.matmul function. Inputs to linalg.matmul function are matrices. Therefore, while multiplying two arrays, we have to first convert them into vectors and then multiply. Also note that linalg.matmul is same as tf.matmul. Both are aliases.

Multiplying two column vectors

Vectors in tensorflow have only 1 shape parameter, where as a column vector (a matrix with one column) has two shape parameters. For example, a vector \([1,2,3]\) has shape \((3,)\), but the column vector \([1,2,3]^T\) has shape \((3,1)\).

Inner product
vector_1 = tf.constant([1., 2., 3.], shape = (3,1))
vector_2 = tf.constant([2., 3., 4.], shape = (3,1))
result = tf.matmul(a = vector_1, b = vector_2, transpose_a=True) # Inner product
result.numpy()
array([[20.]], dtype=float32)

Outer product
tf.matmul(a = vector_1, b = vector_2, transpose_b = True) # Outer product
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[ 2.,  3.,  4.],
       [ 4.,  6.,  8.],
       [ 6.,  9., 12.]], dtype=float32)>

Multiplying a matrix with a vector

There are two ways in which we can achieve this. We can convert the vector into a column vector (matrix with 1 column) and then apply tf.matmul, or we can use the inbuilt function tf.linalg.matvec to multiply a matrix with a vector.

mat_1 = tf.constant([1,2,3,4,5,6],shape = (2,3), dtype = tf.float32)
mat_1.numpy()
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)
tf.matmul(a = mat_1, b = vector_1).numpy()
array([[14.],
       [32.]], dtype=float32)
tf.linalg.matvec(mat_1, tf.constant([1,2,3.]))    # Note the shape of input vector and result.
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([14., 32.], dtype=float32)>

Multiplying two matrices

tf.linalg.matmul(a = mat, b = mat, transpose_a=True).numpy() # Without `transpose_a` argument, result will be an error.
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]], dtype=float32)
tf.linalg.matmul(a = mat, b = mat, transpose_b=True).numpy()
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]], dtype=float32)

Multiplying two tri-diagonal matrices

If matrices have sparse structure, usual matrix multiplication is not an efficient method for those type of matrices. Special algorithms are there that exploit the sparsity of the matrices.

One such sparse matrix is tri-diagonal matrix. It has nonzero entries only on its super-diagonal, main diagonal, and sub-diagonal. To multiply a tri-diagonal matrix with another matrix, we can use tf.linalg.tridiagonal_matmul function. Its first argument is the diagonals of tri-diagonal matrix and second argument is the matrix with which the tri-diagonal matrix needs to be multiplied.

diagonals = tf.constant([[-1,-1,-1,-1,0],
                         [ 2, 2, 2, 2, 2],
                         [ 0,-1,-1,-1,-1.]])
rhs = tf.constant([[1,2,3],
                   [2,1,3],
                   [4,5,6],
                   [7,8,9],
                   [2,5,4.]])
tf.linalg.tridiagonal_matmul(diagonals, rhs) 
<tf.Tensor: shape=(5, 3), dtype=float32, numpy=
array([[ 0.,  3.,  3.],
       [-1., -5., -3.],
       [-1.,  1.,  0.],
       [ 8.,  6.,  8.],
       [-3.,  2., -1.]], dtype=float32)>

We can verify the result by dense matrix multiplication. However, note that this is only for verification. For large matrix multiplications involving tri-diagonal matrix, dense multiplication will be considerably slower.

tridiag_mat = tf.linalg.set_diag(tf.zeros(shape = (5,5)), diagonals, k = (-1,1), align = "LEFT_RIGHT")
tf.matmul(tridiag_mat, rhs)
<tf.Tensor: shape=(5, 3), dtype=float32, numpy=
array([[ 0.,  3.,  3.],
       [-1., -5., -3.],
       [-1.,  1.,  0.],
       [ 8.,  6.,  8.],
       [-3.,  2., -1.]], dtype=float32)>

How to multiply two tri-diagonal matrices?

In this case, we have to convert the right tri-diagonal matrix into a full matrix and then multiply it with the left one using only the diagonals of left tri-diagonal matrix. For example, we will multiply the previous tri-diagonal matrix with itself.

tf.linalg.tridiagonal_matmul(diagonals, rhs = tridiag_mat)
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[ 5., -4.,  1.,  0.,  0.],
       [-4.,  6., -4.,  1.,  0.],
       [ 1., -4.,  6., -4.,  1.],
       [ 0.,  1., -4.,  6., -4.],
       [ 0.,  0.,  1., -4.,  5.]], dtype=float32)>
tf.matmul(tridiag_mat, tridiag_mat)
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[ 5., -4.,  1.,  0.,  0.],
       [-4.,  6., -4.,  1.,  0.],
       [ 1., -4.,  6., -4.,  1.],
       [ 0.,  1., -4.,  6., -4.],
       [ 0.,  0.,  1., -4.,  5.]], dtype=float32)>

Some common operations on matrices

Trace

Computes the trace of a tensor. For non-square rank 2 tensors, trace of the main diagonal is computed.

mat = tf.constant([[2,4,6],
                   [5,1,9.]])
tf.linalg.trace(mat).numpy()
              
3.0

Determinant

Computes the determinant of the matrix.

mat = -2*tf.linalg.diag([1,2,3.])
tf.linalg.det(mat)
<tf.Tensor: shape=(), dtype=float32, numpy=-48.0>

Rank

Computes the rank of a matrix.

A = tf.constant([[1,4,5],
                 [3,2,5],
                 [2,1,3.]])
rank = tf.linalg.matrix_rank(A)
print("Rank of A = ", rank.numpy())
Rank of A =  2

Matrix inverse

Computes the matrix inverse if it exists. It uses \(LU\) decomposition to calculate inverse. What happens if inverse doesn’t exist? Here is the answer taken directly from tensorflow documentation:

[…] If a matrix is not invertible there is no guarantee what the op does. It may detect the condition and raise an exception or it may simply return a garbage result. […]

Having read the documentation, we will apply the function to an invertible matrix.

A = tf.constant([[2, 2, 3],
                 [4,5,6],
                 [1,2,4.]])
A_inv = tf.linalg.inv(A)
print(A_inv)
tf.Tensor(
[[ 1.5999998e+00 -3.9999992e-01 -6.0000008e-01]
 [-1.9999999e+00  9.9999994e-01  7.9472862e-08]
 [ 5.9999996e-01 -3.9999998e-01  3.9999998e-01]], shape=(3, 3), dtype=float32)
tf.matmul(A,A_inv)
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[ 9.9999970e-01,  1.1920929e-07, -1.1920929e-07],
       [-5.9604645e-07,  1.0000002e+00,  0.0000000e+00],
       [-2.3841858e-07,  0.0000000e+00,  1.0000000e+00]], dtype=float32)>

Inverse using \(LU\) factors

If \(LU\) decomposition result is already available from some prior computation, it can be used to compute the inverse using command tf.linalg.lu_matrix_inverse. This command basically solves the following system:

\[AX=I \Leftrightarrow LUX=I\]

As \(L\) and \(U\) are lower and upper triangular respectively, two triangular systems can be solved to obtain \(X\) which is nothing but \(A^{-1}\). The triangular systems are \(LY = I\) (this gives \(Y\) as result) and \(UX=Y\) (this gives \(X\), i.e., \(A^{-1}\) as result). Here, the right hand side has more than one column in both the triangular systems. We will see how to solve those triangular systems at a later section. For the time being, we will just compute the inverse using the built-in command.

lu, p = tf.linalg.lu(A)
A_inv_by_lu = tf.linalg.lu_matrix_inverse(lu,p)
A_inv_by_lu
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[ 1.5999998e+00, -3.9999992e-01, -6.0000008e-01],
       [-1.9999999e+00,  9.9999994e-01,  7.9472862e-08],
       [ 5.9999996e-01, -3.9999998e-01,  3.9999998e-01]], dtype=float32)>

Extract diagonals of a matrix

Diagonals of a matrix can be extracted using tf.linalg.diag_part function. Diagonal entries are obtained by setting k=0 which is the default. By setting k to any other value, either sub-diagonal or super-diagonal can be obtained.If two values are given to k, the values correspond respectively to the lower limit and upper limit of the diagonal. And the result contains all diagonals within those limits. The result is not a matrix. It is an array of diagonals, appended if required. Sub-diagonals are appended at the right and super diagonals are appended at the left.

Another function tf.linalg.tensor_diag_part can be used to extract the main diagonal of the matrix. But it can extract only the main diagonal.

mat = tf.random.uniform(shape = (5,5), minval = 1, maxval = 20, dtype = tf.int32)
mat
<tf.Tensor: shape=(5, 5), dtype=int32, numpy=
array([[10, 11,  3, 18,  6],
       [18,  1, 16, 14, 12],
       [ 4, 18, 12, 17,  1],
       [12,  7,  5,  3,  7],
       [ 9, 16, 11, 14,  8]], dtype=int32)>
tf.linalg.diag_part(mat).numpy()
array([10,  1, 12,  3,  8], dtype=int32)
tf.linalg.tensor_diag_part(mat).numpy()
array([10,  1, 12,  3,  8], dtype=int32)
tf.linalg.diag_part(mat, k = (-1,0)).numpy()
array([[10,  1, 12,  3,  8],
       [18, 18,  5, 14,  0]], dtype=int32)
tf.linalg.diag_part(mat, k = [-2,1])  # 2 subdiagonals, main diagonal, and 1 super diagonal
<tf.Tensor: shape=(4, 5), dtype=int32, numpy=
array([[ 0, 11, 16, 17,  7],
       [10,  1, 12,  3,  8],
       [18, 18,  5, 14,  0],
       [ 4,  7, 11,  0,  0]], dtype=int32)>

Extract band part of a matrix

A band matrix is one that has nonzero values along its diagonal and a few sub-diagonals and super-diagonals. All other entries are zero. It is a sparse matrix. All of its nonzero entries are concentrated in a band along the diagonal. For example, tri-diagonal matrix is a banded matrix. It has lower bandwidth of 1 and upper bandwidth of 1. It is possible for a matrix to have different upper and lower bandwidths. It is still called a banded matrix.

Banded matrices are useful because computations are significantly faster using these matrices as compared to dense matrices of same shape. If for some application, we want the band part of a matrix, we can use linalg.band_part function to extract it. This function takes three arguments (input, num_lower, num_upper). First argument is the tensor whose band part we want to extract. Second argument is the number of sub-diagonals to keep. If set to 0, no sub-diagonal is kept. num_lower = -1 keeps all the sub-diagonals. Similarly for num_upper.

matrix = tf.constant(tf.range(25, dtype=tf.float32), shape=(5,5))
matrix
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.],
       [20., 21., 22., 23., 24.]], dtype=float32)>
tf.linalg.band_part(matrix, num_lower = 2, num_upper = 1)
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[ 0.,  1.,  0.,  0.,  0.],
       [ 5.,  6.,  7.,  0.,  0.],
       [10., 11., 12., 13.,  0.],
       [ 0., 16., 17., 18., 19.],
       [ 0.,  0., 22., 23., 24.]], dtype=float32)>
tf.linalg.band_part(matrix, num_lower = -1, num_upper = 0)  # Lower triangular part
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 5.,  6.,  0.,  0.,  0.],
       [10., 11., 12.,  0.,  0.],
       [15., 16., 17., 18.,  0.],
       [20., 21., 22., 23., 24.]], dtype=float32)>

Matrix factorizations

Some of the most common and widely used matrix factorizations are available in Tensorflow.

LU

Matrix \(A\) is factorized into a unit lower triangular matrix \(L\) and an upper triangular matrix \(U\), such that \[A=LU\]

To reduce round-off errors, partial pivoting is used. In partial pivoting, the following factorization is done. \[PA = LU\] Where, \(P\) is called the permutation matrix.

A = tf.constant([[1, 4, 7, 8],
                [24, -5, -13, 9],
                [-7, 21, 8, 19],
                [0, 18, 6, 4]], dtype = tf.float32)
lu, p = tf.linalg.lu(A)  # As per documentation
print("LU = ")
print(lu)
print()
print("P = ")
print(p)
LU = 
tf.Tensor(
[[ 24.          -5.         -13.           9.        ]
 [ -0.29166666  19.541666     4.2083335   21.625     ]
 [  0.04166667   0.21535183   6.635394     2.9680166 ]
 [  0.           0.9211088    0.32005137 -16.868896  ]], shape=(4, 4), dtype=float32)

P = 
tf.Tensor([1 2 0 3], shape=(4,), dtype=int32)

What does the above result mean? Well, both \(L\), and \(U\) matrices have been merged into one. And \(P\) contains permutation indices. In practice, we don’t have to reconstruct individual matrices \(L\),\(U\), and \(P\), because tensorflow has built-in functions for further analysis that uses the result of tf.linalg.lu as given above. For the sake of demonstration, we will show how to reconstruct those matrices from above result. To reconstruct \(P\), we will use a linear operator that is discussed next. After constructing the matrices \(A\) will be \[A = P^TLU\]

L = tf.linalg.band_part(lu,-1,0) - tf.linalg.diag(tf.linalg.diag_part(lu)) + tf.linalg.diag(tf.ones(shape = lu.shape[0],))
U = tf.linalg.band_part(lu, 0, -1)
permu_operator = tf.linalg.LinearOperatorPermutation(p)
P = permu_operator.to_dense()
print("L:")
print(L)
print()
print("U:")
print(U)
print()
print("P:")
print(P)
L:
tf.Tensor(
[[ 1.          0.          0.          0.        ]
 [-0.29166666  1.          0.          0.        ]
 [ 0.04166667  0.21535183  1.          0.        ]
 [ 0.          0.9211088   0.32005137  1.        ]], shape=(4, 4), dtype=float32)

U:
tf.Tensor(
[[ 24.         -5.        -13.          9.       ]
 [  0.         19.541666    4.2083335  21.625    ]
 [  0.          0.          6.635394    2.9680166]
 [  0.          0.          0.        -16.868896 ]], shape=(4, 4), dtype=float32)

P:
tf.Tensor(
[[0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [1. 0. 0. 0.]
 [0. 0. 0. 1.]], shape=(4, 4), dtype=float32)
tf.matmul(P, tf.matmul(L,U), transpose_a = True)
<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[  1.       ,   4.0000005,   7.       ,   8.       ],
       [ 24.       ,  -5.       , -13.       ,   9.       ],
       [ -7.       ,  21.       ,   8.       ,  19.       ],
       [  0.       ,  18.       ,   6.       ,   3.999998 ]],
      dtype=float32)>

We can easily reconstruct our original matrix from \(LU\) factors using the function tf.linalg.lu_reconstruct.

tf.linalg.lu_reconstruct(lu,p)
<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[  1.       ,   4.0000005,   7.       ,   8.       ],
       [ 24.       ,  -5.       , -13.       ,   9.       ],
       [ -7.       ,  21.       ,   8.       ,  19.       ],
       [  0.       ,  18.       ,   6.       ,   3.999998 ]],
      dtype=float32)>

Cholesky

It is defined for symmetric positive definite matrices. If A is a symmetric positive definite matrix, its Cholesky decomposition can be written as: \[ A = LL^T\] Where, \(L\) is a lower triangular matrix.

Note: Cholesky decomposition is used as a test for positive definiteness of a matrix. If Cholesky decomposition succeeds, the matrix is positive definite, otherwise it is not. Another widely reported test for positive definiteness is to check the signs of all eigenvalues of the matrix. If all the eigenvalues are positive, the matrix is positive definite. But this method requires computation of all eigenvalues which is computationally nontrivial. Therefore, Cholesky decomposition is the preferred method to test for positive definiteness. If the matrix is not positive definite, Cholesky decomposition will stop at an intermediate step. On the other hand, if it succeeds, along with verifying positive definiteness of the matrix, we will get the Cholesky factor as a byproduct. Cholesky factor can then be used to solve linear systems as we will see later.

A = tf.constant([[1,1,1],
                 [1,5,5],
                 [1,5,14]], dtype = tf.float32)
L = tf.linalg.cholesky(A)
L
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1., 0., 0.],
       [1., 2., 0.],
       [1., 2., 3.]], dtype=float32)>
tf.matmul(L,L,transpose_b=True)
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[ 1.,  1.,  1.],
       [ 1.,  5.,  5.],
       [ 1.,  5., 14.]], dtype=float32)>

QR

Given a matrix \(A\), \(QR\) decomposition decomposes the matrix into an orthogonal matrix \(Q\) and an upper triangular matrix \(R\) such that product of \(Q\) and \(R\) gives back \(A\). Columns of \(Q\) are an orthogonal basis for the column space of \(A\) (also known as range of \(A\)).

A = tf.constant([[1,2],[2,0.5],[3, 1],[4,5.]])
Q,R = tf.linalg.qr(A)
print("Q:")
print(Q)
print()
print("R:")
print(R)
Q:
tf.Tensor(
[[-0.18257415  0.4079837 ]
 [-0.36514837 -0.44398218]
 [-0.5477225  -0.575977  ]
 [-0.73029673  0.5519779 ]], shape=(4, 2), dtype=float32)

R:
tf.Tensor(
[[-5.477226  -4.7469287]
 [ 0.         2.7778888]], shape=(2, 2), dtype=float32)

We can also get full \(Q\) and \(R\) matrices by setting full_matrices = True in the argument.

Q_full, R_full = tf.linalg.qr(A, full_matrices = True)
print("Q full:")
print(Q_full)
print()
print("R full:")
print(R_full)
Q full:
tf.Tensor(
[[-0.18257415  0.4079837  -0.17102492 -0.8780469 ]
 [-0.36514837 -0.44398218 -0.81774735  0.02891004]
 [-0.5477225  -0.575977    0.54808205 -0.2604928 ]
 [-0.73029673  0.5519779   0.04056833  0.4004264 ]], shape=(4, 4), dtype=float32)

R full:
tf.Tensor(
[[-5.477226  -4.7469287]
 [ 0.         2.7778888]
 [ 0.         0.       ]
 [ 0.         0.       ]], shape=(4, 2), dtype=float32)

SVD

Singular value decomposition (SVD) of a matrix \(A\in R^{m\times n}\) is defined as \[A = U\Sigma V^T\] Where, \(U\in R^{m\times m}\) and \(V\in R^{n\times n}\) are orthogonal matrices, commonly known as left and right singular vectors respectively. \(\Sigma \in R^{m\times n}\) is a diagonal matrix.

mat = tf.constant([[5,2,3],
                   [2,9,4],
                   [3,2,6],
                   [7,8,9.]])
s,u,v = tf.linalg.svd(mat)
print("S:")
print(s)
print()
print("U:")
print(u)
print()
print("V:")
print(v)
S:
tf.Tensor([18.604359   5.459675   2.4636664], shape=(3,), dtype=float32)

U:
tf.Tensor(
[[ 0.2936678   0.40458775  0.7340845 ]
 [ 0.48711583 -0.7956307   0.01849233]
 [ 0.34406567  0.418864   -0.67870086]
 [ 0.7470583   0.16683212  0.01195723]], shape=(4, 3), dtype=float32)

V:
tf.Tensor(
[[ 0.4678568   0.5231253   0.71235543]
 [ 0.6254436  -0.76545155  0.15134181]
 [ 0.62444425  0.3747316  -0.685307  ]], shape=(3, 3), dtype=float32)

The result is a truncated SVD. To get full SVD decomposition, we have to set full_matrices = True in the argument.

s_full,u_full,v_full = tf.linalg.svd(mat, full_matrices = True)
print("S full:")
print(s_full)
print()
print("U full:")
print(u_full)
print()
print("V full:")
print(v_full)
S full:
tf.Tensor([18.604359   5.459675   2.4636664], shape=(3,), dtype=float32)

U full:
tf.Tensor(
[[ 0.2936678   0.40458775  0.7340845  -0.45955172]
 [ 0.48711583 -0.7956307   0.01849233 -0.35964906]
 [ 0.34406567  0.418864   -0.67870086 -0.4955166 ]
 [ 0.7470583   0.16683212  0.01195723  0.6433724 ]], shape=(4, 4), dtype=float32)

V full:
tf.Tensor(
[[ 0.4678568   0.5231253   0.71235543]
 [ 0.6254436  -0.76545155  0.15134181]
 [ 0.62444425  0.3747316  -0.685307  ]], shape=(3, 3), dtype=float32)

If only singular values are of interest, it can be computed without computing singular vectors. In this way, computations can be much faster.

tf.linalg.svd(mat, compute_uv=False).numpy()
array([18.604359 ,  5.459675 ,  2.4636664], dtype=float32)

Eigenvalues and eigenvectors

Symmetry is an important consideration while computing eigenvalues and eigenvectors. For symmetric matrices, different set of algorithms are used for eigen analysis that exploit the symmetry of the matrix. Therefore, two functions are available in tensorflow for eigen analysis. eig is used to compute eigenvalues and eigenvectors of a dense matrix without any special structure. eigh is used for eigen analysis of Hermitian matrices. If only eigenvalues are of interest, eigvals and eigvalsh can be used compute just eigenvalues.

Eigen-analysis of Hermitian matrices

A = tf.constant([[3,1,1],
                 [1,2,1],
                 [1,1,2.]])
values, vectors = tf.linalg.eigh(A)
print("Eigenvalues:")
print(values.numpy())
print()
print("Eigenvectors:")
print(vectors.numpy())
Eigenvalues:
[1.        1.5857866 4.4142137]

Eigenvectors:
[[ 0.         -0.7071068   0.70710665]
 [-0.70710677  0.49999994  0.50000006]
 [ 0.7071068   0.4999999   0.5       ]]

Each row is an eigenvector.

tf.linalg.matvec(A, vectors[0,:]).numpy()
array([-1.7881393e-07, -7.0710701e-01,  7.0710647e-01], dtype=float32)

Results are accurate up to 5 decimal digits.

tf.linalg.eigvalsh(A).numpy()    # Just eigenvalues
array([1.       , 1.5857866, 4.4142137], dtype=float32)

What happens if you pass a nonsymmetric matrix to eigh by mistake?

Well, while using eigh, tensorflow assumes the matrix to be symmetric. Tensorflow doesn’t check whether the matrix is symmetric or not. It just takes the lower triangular part, assumes that the upper triangular part is same because of symmetry and performs the computations. So be prepared to get a wrong result!

Eigen-analysis of non-Hermitian matrices

A = tf.constant([[1,-5,3],
                 [2,4,-7],
                 [3,9,-2.]])
values, vectors = tf.linalg.eig(A)
print("Eigenvalues:")
print(values.numpy())
print()
print("Eigenvectors:")
print(vectors.numpy())
Eigenvalues:
[2.7560833 -7.9942424e-08j 0.12195918-7.5705280e+00j
 0.12195931+7.5705280e+00j]

Eigenvectors:
[[ 0.06142625+0.95019215j  0.16093381-0.34744066j  0.13818482+0.35709903j]
 [-0.01236177-0.19122148j  0.3560446 +0.53046155j  0.38952374-0.5063876j ]
 [ 0.01535368+0.23750281j -0.33046415+0.5796737j  -0.29238018-0.59978485j]]

Only eigenvalues.

tf.linalg.eigvals(A).numpy()
array([2.7560833 -7.9942424e-08j, 0.12195918-7.5705280e+00j,
       0.12195931+7.5705280e+00j], dtype=complex64)

What happens when you pass a symmetric matrix to eig?

Nothing! We will still get the correct answer. Tensorflow will use more operations to compute results when it could have been done using less computations.

Solving dense linear systems

A linear system is (usually) written as \[ Ax = b\]

In general, \(A\) can be square or rectangular. In our case, it is dense, i.e., most of its entries are nonzero. Right hand side \(b\) is a vector in this case. If we have to solve the linear system for multiple right hand side vectors involving same \(A\), the RHS can be replaced by a matrix whose columns are different RHS vectors.

Depending on the structure of \(A\) (whether triangular, or tri-diagonal, or positive definite), suitable algorithm is chosen to solve the linear system. Tensorflow has a function tf.linalg.solve to solve linear systems. But this function doesn’t take into account the special structure of \(A\).

A = tf.constant([[1,1,1],
                 [1,5,5],
                 [1,5,13]], dtype = tf.float32)
b = tf.constant([3,11,20], shape = (3,1), dtype = tf.float32)
tf.linalg.solve(A,b)
<tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[1.   ],
       [0.875],
       [1.125]], dtype=float32)>

Using LU decomposition

If \(LU\) decomposition factors of \(A\) are known, those can be used to solve the linear system. For example, solve: \[\begin{pmatrix} 1 & 1 & 1\\ 1 & 5 & 5\\ 1 & 5 & 13 \end{pmatrix} x= \begin{pmatrix} 3\\ 11\\ 20\\ \end{pmatrix}\]

A = tf.constant([[1,1,1],
                 [1,5,5],
                 [1,5,13]], dtype = tf.float32)
b = tf.constant([3,11,20], shape = (3,1), dtype = tf.float32)
lu, p = tf.linalg.lu(A)   # Factorization result of LU
x_sol_lu = tf.linalg.lu_solve(lu,p,b)
x_sol_lu
<tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[1.   ],
       [0.875],
       [1.125]], dtype=float32)>
tf.matmul(A,x_sol_lu)
<tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[ 3.],
       [11.],
       [20.]], dtype=float32)>

Once we have obtained factors \(L\) and \(U\), we can use tf.linalg.triangular_solve to solve the linear system by solving following two triangular system. \[Ly = b\] and \[Ux = y\]

L = tf.linalg.band_part(lu,-1,0) - tf.linalg.diag(tf.linalg.diag_part(lu)) + tf.linalg.diag(tf.ones(shape = lu.shape[0],))
y = tf.linalg.triangular_solve(L,b)    # Solves Ly = b
x = tf.linalg.triangular_solve(lu,y, lower = False)  # Solves Ux = y
x
<tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[1.   ],
       [0.875],
       [1.125]], dtype=float32)>

Using Cholesky decomposition

If \(A\) is positive definite, Cholesky decomposition is an efficient method for solving the linear system. For positive definite \(A\), Cholesky decomposition requires fewer computations than \(LU\) decomposition. This is because it exploits the symmetry of the matrix \(A\). Once the Cholesky factor \(L\) is found, we solve the linear system by solving two triangular systems. Solving triangular systems only requires \(O(n^2)\) operations. \[ LL^Tx = b\] Two triangular systems are: \[ Ly = b\] This gives us \(y\). Using \(y\) we solve for \(x\) using the following equation. \[L^Tx = y\]

In tensorflow, we solve the system using tf.linalg.cholesky_solve. It takes cholesky factor \((L)\) and right hand side \(b\) as input.

For example, solve: \[\begin{pmatrix} 1 & 1 & 1\\ 1 & 5 & 5\\ 1 & 5 & 13 \end{pmatrix} x= \begin{pmatrix} 3\\ 11\\ 20\\ \end{pmatrix}\]

A = tf.constant([[1,1,1],
                 [1,5,5],
                 [1,5,13]], dtype = tf.float32)
b = tf.constant([3,11,20], shape = (3,1), dtype = tf.float32)
L = tf.linalg.cholesky(A)
sol_chol = tf.linalg.cholesky_solve(L, b)
sol_chol
<tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[1.0000001],
       [0.8750001],
       [1.1249999]], dtype=float32)>
tf.matmul(A,sol_chol)
<tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[ 3.      ],
       [11.      ],
       [19.999998]], dtype=float32)>

Solving structured linear systems

If coefficient matrix of a linear system has some kind of structure (triangular, banded, tridiagonal, etc.), it can be efficiently solved using far fewer computations than required by dense \(LU\) decomposition. Special solvers exist that exploit the structure of the coefficient matrix thus reducing flop count considerably for large structured systems. Linear algebra library of Tensorflow implements three such specialized solvers: Triangular solver, Tridiagonal solver, and Banded triangular solver.

Structured solvers have immensely useful as in many practical applications resulting coefficient matrix has some kind of structure. This is especially true in mechanical applications, such as finite element modeling, computational heat transfer, etc. In finite element modeling, a body is divided into several elements and each element has certain number of nodes. A node is connected to only a few neighboring nodes. So interaction of a node is limited to only those neighboring nodes. This results in banded systems. Other types of structured matrices are also common in applications.

Solving triangular systems

We have already used triangular solvers in the section on LU decomposition. For completeness, we will again describe it in this section. tf.linalg.triangular_solve solves both lower triangular and upper triangular systems. Following examples demonstrate that.

lower_triangular_mat = tf.constant([[1,0,0],
                                    [2,3,0],
                                    [4,5,6]], dtype = tf.float32)
rhs = tf.constant([[7],[8],[9]], dtype = tf.float32)
res_lower_triangular = tf.linalg.triangular_solve(matrix = lower_triangular_mat, rhs = rhs, lower = True)
res_lower_triangular
<tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[ 7. ],
       [-2. ],
       [-1.5]], dtype=float32)>

We can use the same solver to solve upper triangular system also. This is done by setting adjoint = True along with lower = True. lower = True extracts the lower triangular part of the coefficient matrix and adjoint = True solves the triangular system by transposing the extracted lower triangular part. For real matrices adjoint is same as transpose. By setting the arguments as above, essentially we solve the upper triangular system.

res_upper_triangular = tf.linalg.triangular_solve(matrix = lower_triangular_mat, rhs = rhs, lower = True, adjoint = True)
res_upper_triangular
<tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[0.6666665 ],
       [0.16666667],
       [1.5       ]], dtype=float32)>

We can verity the above result by multiplying the transpose of the lower triangular matrix by res_upper_triangular. This indeed gives us the right hand side \(([7,8,9]^T)\).

tf.matmul(lower_triangular_mat, res_upper_triangular, transpose_a=True)
<tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[7.],
       [8.],
       [9.]], dtype=float32)>

Solving tri-diagonal systems

If matrix \(A\) is tri-diagonal, tf.linalg.tridiagonal_solve can be used to solve the linear system efficiently. For example, we will solve the following tri-diagonal system.

\[\begin{pmatrix} 2 & -1 & 0 &0 & 0\\ -1 & 2 & -1 & 0 & 0\\ 0 & -1 & 2& -1& 0\\ 0 & 0 & -1 & 2 & -1\\ 0 & 0 & 0 & -1 & 2 \end{pmatrix}x= \begin{pmatrix} 3\\ 4\\ -5\\ 7\\ 9\end{pmatrix}\]

diags = tf.constant([[-1,-1,-1,-1, 0],
                     [ 2, 2, 2, 2, 2],
                     [ 0,-1,-1,-1,-1]], dtype = tf.float32)
b = tf.constant([3,4,-5,7,9.],shape = (5,1))
x = tf.linalg.tridiagonal_solve(diagonals = diags, rhs = b)
x
<tf.Tensor: shape=(5, 1), dtype=float32, numpy=
array([[ 6.5000005],
       [10.000001 ],
       [ 9.500001 ],
       [14.       ],
       [11.5      ]], dtype=float32)>
tf.linalg.tridiagonal_matmul(diagonals=diags, rhs = x)
<tf.Tensor: shape=(5, 1), dtype=float32, numpy=
array([[ 3.      ],
       [ 4.000001],
       [-4.999999],
       [ 7.      ],
       [ 9.      ]], dtype=float32)>

Solving banded triangular systems

Solving an \((n \times n)\) full upper triangular (or lower triangular) system uses \(n^2\) flops (including addition/subtraction and multiplication/division). But if the upper triangular (or lower triangular) system is banded with, say, \(k\) bands (including main diagonal), the system can be solved in \(\approx(2k-1)n\) flops. If \(k\) is not very large, banded triangular systems can be solved at a fraction of computational cost as compared to dense triangular systems.

However, please keep in mind that flop count is not the only criteria that determines the computational time of a system. For very very large systems, communication time between different parts of computer can be significant as compared to processing required number of flops for that system.

We can solve both banded lower triangular and banded upper triangular systems. To illustrate this we will use the following example.

\[\begin{pmatrix} 1 & 0 & 0 & 0 & 0\\ 6 & 2 & 0 & 0 & 0\\ 0 & 7 & 3 & 0 & 0\\ 0 & 0 & 8 & 4 & 0\\ 0 & 0 & 0 & 9 & 5 \end{pmatrix}x= \begin{pmatrix} 3\\ 4\\ -5\\ 7\\ 9\end{pmatrix}\]

As the matrix is banded, we need not create the full dense matrix. Instead, we can only pass the band part to the solver. Subdiagonals (or superdiagonals) contain less number of entries than main diagonal. While passing the band part to tensorflow, these subdiagonals (or superdiagonals) have to be appended either at left or right to make it of the same length as main diagonal. Whether the subdiagonals or superdiagonals will be appended at the left or right, is decided by the alignment strategy. There are four types of alignment strategies used in tensorflow. Those are:

  • LEFT_LEFT: Superdiagonals are appended at right and subdiagonals are appended at right.
  • LEFT_RIGHT: Superdiagonals are appended at right and subdiagonals are appended at left.
  • RIGHT_LEFT: Superdiagonals are appended at left and subdiagonals are appended at right.
  • RIGHT_RIGHT: Superdiagonals are appended at left and subdiagonals are appended at left.

One way to remember the above rules is that if something is aligned to left, it is appended at right. And in LEFT_RIGHT, first word corresponds to superdiagonals and second word corresponds to subdiagonal. For tf.linalg.banded_triangular_solve, tensorflow assumes that input band part has LEFT_RIGHT alignment. Band part inputs to this function can be modified accordingly. Above problem can be solved as follows.

bands = tf.constant([[1,2,3,4,5],
                     [0,6,7,8,9]], dtype = tf.float32)  # Because of LEFT_RIGHT alignment as we have a subdiagonal here
rhs = tf.constant([[3],
                   [4],
                   [-5],
                   [7],
                   [9]], dtype = tf.float32)
banded_lower_triangular_sol = tf.linalg.banded_triangular_solve(bands = bands, rhs = rhs, lower = True)
banded_lower_triangular_sol
<tf.Tensor: shape=(5, 1), dtype=float32, numpy=
array([[  3.      ],
       [ -7.      ],
       [ 14.666667],
       [-27.583334],
       [ 51.45    ]], dtype=float32)>

We can verify the above solution by multiplying the dense banded lower triangular matrix by the solution. This should give us the RHS.

dense_banded_lower_triangular_mat = tf.linalg.set_diag(tf.zeros(shape = (5,5)), diagonal = bands,
                                                       k = (-1,0), align = "LEFT_RIGHT")
dense_banded_lower_triangular_mat
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[1., 0., 0., 0., 0.],
       [6., 2., 0., 0., 0.],
       [0., 7., 3., 0., 0.],
       [0., 0., 8., 4., 0.],
       [0., 0., 0., 9., 5.]], dtype=float32)>
tf.matmul(dense_banded_lower_triangular_mat, banded_lower_triangular_sol)
<tf.Tensor: shape=(5, 1), dtype=float32, numpy=
array([[ 3.],
       [ 4.],
       [-5.],
       [ 7.],
       [ 9.]], dtype=float32)>

Similarly we can solve banded upper triangular matrix. For illustration, we will use the transpose of the above coefficient matrix. We will use the same right hand side vector.

\[\begin{pmatrix} 1 & 6 & 0 & 0 & 0\\ 0 & 2 & 7 & 0 & 0\\ 0 & 0 & 3 & 8 & 0\\ 0 & 0 & 0 & 4 & 9\\ 0 & 0 & 0 & 0 & 5 \end{pmatrix}x= \begin{pmatrix} 3\\ 4\\ -5\\ 7\\ 9\end{pmatrix}\]

bands = tf.constant([[6,7,8,9,0],
                     [1,2,3,4,5]], dtype = tf.float32)  # Because of LEFT_RIGHT alignment as we have a superdiagonal here
rhs = tf.constant([[3],
                   [4],
                   [-5],
                   [7],
                   [9]], dtype = tf.float32)
banded_upper_triangular_sol = tf.linalg.banded_triangular_solve(bands = bands, rhs = rhs, lower = False)
banded_upper_triangular_sol
<tf.Tensor: shape=(5, 1), dtype=float32, numpy=
array([[ 84.79998  ],
       [-13.63333  ],
       [  4.4666657],
       [ -2.2999997],
       [  1.8      ]], dtype=float32)>

Now we will verify the solution.

dense_banded_upper_triangular_mat = tf.linalg.set_diag(tf.zeros(shape = (5,5)), diagonal = bands,
                                                       k = (0,1), align = "LEFT_RIGHT")
dense_banded_upper_triangular_mat
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[1., 6., 0., 0., 0.],
       [0., 2., 7., 0., 0.],
       [0., 0., 3., 8., 0.],
       [0., 0., 0., 4., 9.],
       [0., 0., 0., 0., 5.]], dtype=float32)>
tf.matmul(dense_banded_upper_triangular_mat, banded_upper_triangular_sol)
<tf.Tensor: shape=(5, 1), dtype=float32, numpy=
array([[ 3.],
       [ 4.],
       [-5.],
       [ 7.],
       [ 9.]], dtype=float32)>

Solving least squares problems

Ordinary least squares

Both over determined and under determined least squares problem can be solved using the command tf.linalg.lstsq. In the underdetermined case, the output is the least norm solution. Least squares problem can be written as

\[arg\min_{x}\|Ax-b\|_2^2\]

That is, we try to find an \(x\) such that the residual error is as small as possible. For example, we will solve following two problems.

\[\begin{pmatrix} 1 & 2\\ 2 & 0.5\\ 3 & 1\\ 4 & 5\\ \end{pmatrix}x_{over}= \begin{pmatrix} 3\\ 4\\ 5\\ 6\\ \end{pmatrix}\] \[\begin{pmatrix} 3 & 1 & 2 & 5\\ 7 & 9 & 1 & 4 \end{pmatrix}x_{under}= \begin{pmatrix} 7.2\\ -5.8\\ \end{pmatrix}\]

A_over = tf.constant([[1,2],[2,0.5],[3, 1],[4,5.]])
A_under = tf.constant([[3,1,2,5],[7,9,1,4.]])
b_over = tf.constant([3,4,5,6.], shape = (4,1))
b_under = tf.constant([7.2,-5.8], shape = (2,1))
x_over = tf.linalg.lstsq(A_over, b_over)
x_over
<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[ 1.704103  ],
       [-0.04319588]], dtype=float32)>

Though it is not advisable, for this simple case, we will directly apply normal equation to get the solution.

tf.matmul(tf.linalg.inv(tf.matmul(A_over,A_over, transpose_a = True)), tf.matmul(A_over,b_over, transpose_a = True))
<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[ 1.704104  ],
       [-0.04319668]], dtype=float32)>
x_under = tf.linalg.lstsq(A_under, b_under)
x_under
<tf.Tensor: shape=(4, 1), dtype=float32, numpy=
array([[-0.04100358],
       [-1.3355565 ],
       [ 0.699703  ],
       [ 1.4518324 ]], dtype=float32)>

We will computer the least norm solution for underdetermined case using the closed form solution. However, it should be remembered that it is not advisable to do so in practice for large systems.

tf.matmul(A_under,tf.matmul(tf.linalg.inv(tf.matmul(A_under, A_under, transpose_b = True)), b_under), transpose_a = True)
<tf.Tensor: shape=(4, 1), dtype=float32, numpy=
array([[-0.04100358],
       [-1.3355561 ],
       [ 0.6997029 ],
       [ 1.4518325 ]], dtype=float32)>

Regularized least squares

Only \(l_2\) regularization is supported. The following regularized problem is solved.

\[arg\min_{x}\|Ax-b\|_2^2 + \lambda \|x\|_2^2\]

Here, \(\lambda\) is a hyperparameter. Usually several values of \(\lambda\) are tried over a logarithmic scale before choosing the best one.

x_over_reg = tf.linalg.lstsq(A_over, b_over, l2_regularizer= 2.0)
x_over_reg.numpy()
array([[1.3890449 ],
       [0.21348318]], dtype=float32)
x_under_reg = tf.linalg.lstsq(A_under, b_under, l2_regularizer=2.)
x_under_reg
<tf.Tensor: shape=(4, 1), dtype=float32, numpy=
array([[-0.04763567],
       [-1.214508  ],
       [ 0.62748903],
       [ 1.299031  ]], dtype=float32)>

Some specialized operations

Norm

Norm can be defined for vectors as well as matrices. \(p\) norm of vector is defined as \[\|x\|_p = (\Sigma_{i=1}^{n}|x_i|^p)^\frac{1}{p}\]

Matrix is norm is defined as \[\|A\|_p= \max_{x\neq 0}\frac{\|Ax\|_p}{\|x\|_p}\]

Tensorflow supports all the usual vector and matrix norms that are used in practice. Using only tensorflow we can calculate all norms except infinity norm. To calculate infinity norm we have to use ord = np.inf.

tf.linalg.norm(tf.constant([1,-2,3.]), ord = "euclidean").numpy()
3.7416575
tf.linalg.norm(tf.constant([1,-2,3]), ord = 1).numpy()
6

Fractional norms for vectors are also supported.

tf.linalg.norm(tf.constant([1,-2,3.]), ord = 0.75).numpy()
8.46176
A = tf.constant([[1,8, 2,3],
                 [2,7,6,5],
                 [0,3,2,8.]])
mat_norm_2 = tf.linalg.norm(A, ord = 2, axis = [0,1])
print("2 norm of matrix A = ", mat_norm_2.numpy())
WARNING:tensorflow:From /home/biswajit/anaconda3/envs/tf_cpu_23/lib/python3.8/site-packages/tensorflow/python/ops/linalg_ops.py:735: setdiff1d (from tensorflow.python.ops.array_ops) is deprecated and will be removed after 2018-11-30.
Instructions for updating:
This op will be removed after the deprecation date. Please switch to tf.sets.difference().
2 norm of matrix A =  15.294547

2 norm of a matrix is equivalent to the largest singular value of the matrix. We will verify that.

vals,_ ,_ = tf.linalg.svd(A)
tf.math.reduce_max(vals).numpy()
15.294547

Normalizing a tensor

Computes the norm and normalizes the tensor using that norm. By normalize we mean, divide the entries of the tensor by the norm. Here, we will consider a matrix. But the method can be extended to multi-dimensional tensor.

If computed norm is a single number, all the entries of the matrix will be divided by that number. If norm is calculated along some axis, normalization happens along that axis using individual norms. Here are some examples.

A
<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[1., 8., 2., 3.],
       [2., 7., 6., 5.],
       [0., 3., 2., 8.]], dtype=float32)>
normalized_mat, norm = tf.linalg.normalize(A, ord = 2, axis = [0,1])
print("Normalized matrix: ")
print(normalized_mat.numpy())
print()
print("Norm = ", norm.numpy())
Normalized matrix: 
[[0.06538278 0.5230622  0.13076556 0.19614834]
 [0.13076556 0.45767945 0.39229667 0.3269139 ]
 [0.         0.19614834 0.13076556 0.5230622 ]]

Norm =  [[15.294547]]

We will get the same normalized matrix by dividing the entries of the matrix by the norm.

A/norm
<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[0.06538278, 0.5230622 , 0.13076556, 0.19614834],
       [0.13076556, 0.45767945, 0.39229667, 0.3269139 ],
       [0.        , 0.19614834, 0.13076556, 0.5230622 ]], dtype=float32)>
norm_mat_by_col, norms_col = tf.linalg.normalize(A, ord = 2, axis = 0)
print("Normalized matrix:")
print(norm_mat_by_col)
print()
print("Norms of columns of A:")
print(norms_col)
Normalized matrix:
tf.Tensor(
[[0.4472136  0.724286   0.30151135 0.30304575]
 [0.8944272  0.63375026 0.904534   0.5050763 ]
 [0.         0.27160725 0.30151135 0.80812204]], shape=(3, 4), dtype=float32)

Norms of columns of A:
tf.Tensor([[ 2.236068  11.045361   6.6332498  9.899495 ]], shape=(1, 4), dtype=float32)
tf.linalg.norm(A[:,0], ord = 2).numpy()  # 2 Norm of first column of A
2.236068
A/norms_col
<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[0.4472136 , 0.724286  , 0.30151135, 0.30304575],
       [0.8944272 , 0.63375026, 0.904534  , 0.5050763 ],
       [0.        , 0.27160725, 0.30151135, 0.80812204]], dtype=float32)>
norm_mat_by_row, norms_row = tf.linalg.normalize(A, ord = 2, axis = 1)
print("Normalized matrix:")
print(norm_mat_by_row)
print()
print("Norms of rows:")
print(norms_row)
Normalized matrix:
tf.Tensor(
[[0.11322771 0.9058217  0.22645542 0.33968312]
 [0.18731716 0.6556101  0.56195146 0.4682929 ]
 [0.         0.34188172 0.22792116 0.91168463]], shape=(3, 4), dtype=float32)

Norms of rows:
tf.Tensor(
[[ 8.83176 ]
 [10.677078]
 [ 8.774964]], shape=(3, 1), dtype=float32)
tf.linalg.norm(A[0,:], ord = 2).numpy()    # 2 norm of first row of A
8.83176
A/norms_row
<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[0.11322771, 0.9058217 , 0.22645542, 0.33968312],
       [0.18731716, 0.6556101 , 0.56195146, 0.4682929 ],
       [0.        , 0.34188172, 0.22792116, 0.91168463]], dtype=float32)>

Global norm

Given two or more tensors, tf.linalg.global_norm computes the 2 norm of a vector generated by resizing all the tensors to one dimensional arrays.

a = tf.constant([1, 2, 3.])
b = tf.constant([[4,5],
                 [6,7.]])
tf.linalg.global_norm([a,b]).numpy()
11.83216
tf.linalg.norm([1,2,3,4,5,6,7.], ord = 2).numpy()
11.83216

Cross product of vectors

It is defined for 3-element vectors. For two vectors \(a = (a_1, a_2, a_3)^T\), and \(b = (b_1, b_2, b_3)^T\), cross product is defined as the determinant of following matrix

\[\begin{pmatrix} i & j & k\\ a_1 & a_2 & a_3\\ b_1 & b_2 & b_3 \end{pmatrix}\] Where \(i, j\), and \(k\) are unit direction vectors along three perpendicular right handed system.

a = tf.constant([1,2,3])
b = tf.constant([2,3,4])
tf.linalg.cross(a,b).numpy()
array([-1,  2, -1], dtype=int32)

First element in the output corresponds to the value along \(i\)th direction. Similarly for other outputs.

It is also possible to calculate cross product of more that one pair of vectors simultaneously.

c = tf.random.normal(shape = (5,3))
d = tf.random.normal(shape = (5,3))
tf.linalg.cross(c,d).numpy()
array([[-0.8989703 , -0.89722276,  0.05096105],
       [-0.44832098, -0.32573706, -0.11998086],
       [ 0.7280822 ,  0.19632304,  0.8862282 ],
       [ 0.0998309 ,  0.48842978, -0.03491247],
       [ 1.0497653 , -1.5643073 ,  0.5671499 ]], dtype=float32)

First row of output is the cross product of first rows of \(c\) and \(d\). Similarly for other rows.

Matrix square root

Square root of a matrix is defined for invertible matrices whose real eigenvalues are positive.

mat = tf.constant([[5,2,3],
                   [2,9,4],
                   [3,2,6.]])
mat_root = tf.linalg.sqrtm(mat)
mat_root
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[2.1185937 , 0.35412252, 0.6189322 ],
       [0.30147368, 2.9409115 , 0.7257953 ],
       [0.6540313 , 0.33657336, 2.3132057 ]], dtype=float32)>
tf.matmul(mat_root, mat_root)
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[4.9999986, 2.0000007, 3.0000038],
       [2.0000005, 9.000003 , 4.0000057],
       [3.0000033, 2.000003 , 6.0000052]], dtype=float32)>

Matrix exponential

Exponential of a matrix \(A\) is defined as

\[ e^A = \sum_{n=0}^\infty \frac{A^n}{n!}\]

In practice, the sum is not taken to infinity. Rather, approximations are used to compute matrix exponential. Tensorflow implementation is based on this paper.

When the matrix has a full set of independent eigenvectors, the formula can be simplified to \[e^A = Se^{\Lambda}S^{-1}\]

where, \(S\) is the eigenvector matrix and \(e^\Lambda\) is a diagonal matrix whose diagonal entries are exponentials of eigenvalues of the matrix \(A\).

A = tf.constant([[0,1],
                 [1,0.]])
tf.linalg.expm(A)
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[1.5430806, 1.1752012],
       [1.1752012, 1.5430806]], dtype=float32)>

Matrix logarithm

Computes logarithm of the matrix such that matrix exponential of the result gives back the original matrix. Refer to the documentation for further details. It is defined only for complex matrices.

mat = tf.constant([[5,2,3],
                   [2,9,4],
                   [3,2,6.]], dtype = tf.complex64)
mat_log = tf.linalg.logm(mat)
mat_log
<tf.Tensor: shape=(3, 3), dtype=complex64, numpy=
array([[1.4031177 +0.j, 0.25731087+0.j, 0.53848237+0.j],
       [0.16580153+0.j, 2.1160111 +0.j, 0.54512537+0.j],
       [0.5994888 +0.j, 0.2268081 +0.j, 1.5622762 +0.j]], dtype=complex64)>
tf.linalg.expm(mat_log)
<tf.Tensor: shape=(3, 3), dtype=complex64, numpy=
array([[4.999999 +0.j, 1.9999989+0.j, 2.9999986+0.j],
       [1.9999995+0.j, 9.000006 +0.j, 4.0000005+0.j],
       [2.9999995+0.j, 2.000001 +0.j, 5.9999995+0.j]], dtype=complex64)>

Log-determinant of a matrix

Computes the natural logarithm of the determinant of a matrix. There are two functions in tensorflow to calculate this.

  • If matrix is symmetric positive definite, use tf.linalg.logdet (Uses Cholesky decomposition)
  • For other matrices, use tf.linalg.slogdet (Uses \(LU\) decomposition)

slogdet computes the sign of the determinant as well as the log of the absolute value of the determinant.

mat = tf.constant([[5,2,3],
                   [2,9,2],
                   [3,2,6.]])   # Symmetric positive definite
tf.linalg.logdet(mat).numpy()
5.1298985
tf.math.log(tf.linalg.det(mat)).numpy()
5.1298985
mat_2 = tf.constant([[5,2,3],
                     [0,-2,2],
                     [0,0,6.]])
sign, log_abs_det = tf.linalg.slogdet(mat_2)
print("Sign of determinant = ", sign.numpy())
print("Log of absolute value of determinant = ", log_abs_det.numpy())
Sign of determinant =  -1.0
Log of absolute value of determinant =  4.0943446
tf.math.log(tf.abs(tf.linalg.det(mat_2))).numpy()
4.0943446

Pseudo inverse of a matrix

While matrix inverse is defined only for square matrices, pseudo inverse is defined for matrices of any shape. It is also defined for singular matrices. Pseudo inverse can also be used to solve ordinary least squares problem. For the problem

\[arg\min_{x}\|Ax-b\|_2^2\]

the approximate least squares solution can be written as

\[x_{ls} = A^{\dagger}b\]

Where, \(A^{\dagger}\) is the pseudo inverse of \(A\).

tf.linalg.pinv(tf.constant([[2,0,0],[0,3,0],[5,0,0.]]))
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[0.06896552, 0.        , 0.17241378],
       [0.        , 0.33333334, 0.        ],
       [0.        , 0.        , 0.        ]], dtype=float32)>
A_over = tf.constant([[1,2],[2,0.5],[3, 1],[4,5.]])
A_under = tf.constant([[3,1,2,5],[7,9,1,4.]])
b_over = tf.constant([3,4,5,6.], shape = (4,1))
b_under = tf.constant([7.2,-5.8], shape = (2,1))
x_ls_over = tf.linalg.lstsq(A_over,b_over)
x_ls_over
<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[ 1.704103  ],
       [-0.04319588]], dtype=float32)>
tf.matmul(tf.linalg.pinv(A_over),b_over)
<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[ 1.7041038 ],
       [-0.04319668]], dtype=float32)>
x_ls_under = tf.linalg.lstsq(A_under, b_under)
x_ls_under
<tf.Tensor: shape=(4, 1), dtype=float32, numpy=
array([[-0.04100358],
       [-1.3355565 ],
       [ 0.699703  ],
       [ 1.4518324 ]], dtype=float32)>
tf.matmul(tf.linalg.pinv(A_under), b_under)
<tf.Tensor: shape=(4, 1), dtype=float32, numpy=
array([[-0.04100376],
       [-1.3355565 ],
       [ 0.69970286],
       [ 1.4518324 ]], dtype=float32)>

Linear operators

Linear operators are a powerful way of defining matrices and associated operators without even doing actual computations. What does this mean? Do we ever get result of our computations using operators? Well, the computations are done only when we ask for the results. Before that the operators just act on each other (like chaining of operators) without doing any computation. We will show this by examples.

Note: We will mainly use operators to form dense matrices. But the scope of applicability of operators is far bigger than that.

To define a matrix as an operator, we use tf.linalg.LinearOperatorFullMatrix.

operator = tf.linalg.LinearOperatorFullMatrix(tf.constant([[1,2,3],
                                                           [2,3,5],
                                                           [7,8,9.]]))
operator
<tensorflow.python.ops.linalg.linear_operator_full_matrix.LinearOperatorFullMatrix at 0x7f8f501e7580>

We get only the memory location. No result is shown. To see the actual matrix we have to call the method to_dense on this operator. There are many methods that can be called on an operator. For the full list, refer the documentation.

Before seeing the result, we will apply adjoint operator to our old operator and apply to_dense to the adjoint operator. If everything works well, we should see the transpose of the matrix as result.

adj_operator = tf.linalg.LinearOperatorAdjoint(operator)
adj_operator
WARNING:tensorflow:From /home/biswajit/anaconda3/envs/tf_cpu_23/lib/python3.8/site-packages/tensorflow/python/ops/linalg/linear_operator_adjoint.py:145: LinearOperator.graph_parents (from tensorflow.python.ops.linalg.linear_operator) is deprecated and will be removed in a future version.
Instructions for updating:
Do not call `graph_parents`.





<tensorflow.python.ops.linalg.linear_operator_adjoint.LinearOperatorAdjoint at 0x7f8f501e71f0>

Again no result. At this point we want to see the result. So we will apply to_dense method to adj_operator.

adj_operator.to_dense()
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1., 2., 7.],
       [2., 3., 8.],
       [3., 5., 9.]], dtype=float32)>

To compare it with our original matrix, we will also show the original matrix.

operator.to_dense()
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [2., 3., 5.],
       [7., 8., 9.]], dtype=float32)>

As expected, the adjoint operator gives the correct answer.

Common methods on linear operators

There are many methods that can be called on the operator. Depending on the operator, the methods vary. In this section we will discuss some of the methods of LinearOperatorFullMatrix operator. Some of the methods are:

  • cond (To find condition number)
  • determinant
  • cholesky (To compute Cholesky factors of operator)
  • eigvals (Compute eigenvalues only for self-adjoint (Hermitian) matrices)
  • trace
  • inverse
  • solve (Solve linear system using operator)
  • adjoint

and many others.

operator.to_dense()
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [2., 3., 5.],
       [7., 8., 9.]], dtype=float32)>
operator.cond()
<tf.Tensor: shape=(), dtype=float32, numpy=68.21983>
operator.trace()
<tf.Tensor: shape=(), dtype=float32, numpy=13.0>
operator.determinant()
WARNING:tensorflow:Using (possibly slow) default implementation of determinant.  Requires conversion to a dense matrix and O(N^3) operations.





<tf.Tensor: shape=(), dtype=float32, numpy=6.0>
operator.adjoint().to_dense()
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1., 2., 7.],
       [2., 3., 8.],
       [3., 5., 9.]], dtype=float32)>
operator.inverse().to_dense()
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[-2.1666667 ,  1.        ,  0.16666669],
       [ 2.8333333 , -2.        ,  0.16666669],
       [-0.83333325,  1.        , -0.16666669]], dtype=float32)>
operator.matmul(operator.inverse().to_dense())
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[ 1.0000000e+00,  0.0000000e+00,  0.0000000e+00],
       [-2.3841858e-07,  1.0000000e+00,  0.0000000e+00],
       [-2.3841858e-07,  0.0000000e+00,  1.0000000e+00]], dtype=float32)>

Special matrices using operators

Toeplitz matrix

col = tf.constant([1,2,3,4,5.])
row = tf.constant([1,6,7,8,9.])
toeplitz_operator = tf.linalg.LinearOperatorToeplitz(col = col, row = row )
toeplitz_operator.to_dense()
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[1., 6., 7., 8., 9.],
       [2., 1., 6., 7., 8.],
       [3., 2., 1., 6., 7.],
       [4., 3., 2., 1., 6.],
       [5., 4., 3., 2., 1.]], dtype=float32)>

Circulant matrix

kernel = [1,2,3,4,5]
spectrum = tf.signal.fft(tf.cast(kernel, dtype = tf.complex64))
circ_operator = tf.linalg.LinearOperatorCirculant(spectrum = spectrum, input_output_dtype = tf.float32)
circ_operator.to_dense()
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[1.0000002 , 4.9999995 , 3.9999993 , 3.        , 2.0000005 ],
       [2.0000005 , 0.99999934, 4.9999995 , 3.9999993 , 2.9999998 ],
       [3.0000002 , 2.        , 0.9999998 , 4.9999995 , 3.9999998 ],
       [3.9999993 , 2.9999993 , 2.        , 1.        , 4.9999995 ],
       [4.9999995 , 3.9999993 , 3.        , 2.0000005 , 1.0000002 ]],
      dtype=float32)>
circ_operator.convolution_kernel()
<tf.Tensor: shape=(5,), dtype=float32, numpy=
array([1.       , 2.0000002, 3.       , 3.9999993, 4.9999995],
      dtype=float32)>

Block diagonal matrix

operator_1 = tf.linalg.LinearOperatorFullMatrix(tf.constant([[1,2,3],
                                                             [4,5,6],
                                                             [7,8,9]], dtype = tf.float32))
operator_2 = tf.linalg.LinearOperatorFullMatrix(-1*tf.constant([[9,8],
                                                           [7,6]], dtype = tf.float32))
blk_diag_operator = tf.linalg.LinearOperatorBlockDiag([operator_1,operator_2])
blk_diag_operator.to_dense()
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[ 1.,  2.,  3.,  0.,  0.],
       [ 4.,  5.,  6.,  0.,  0.],
       [ 7.,  8.,  9.,  0.,  0.],
       [ 0.,  0.,  0., -9., -8.],
       [ 0.,  0.,  0., -7., -6.]], dtype=float32)>

Block lower triangular matrix

operator_3 = tf.linalg.LinearOperatorFullMatrix(tf.constant(tf.repeat(6.,repeats = 6), shape = (2,3)))
blk_lower = tf.linalg.LinearOperatorBlockLowerTriangular([[operator_1], [operator_3, operator_2]])
blk_lower.to_dense()
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[ 1.,  2.,  3.,  0.,  0.],
       [ 4.,  5.,  6.,  0.,  0.],
       [ 7.,  8.,  9.,  0.,  0.],
       [ 6.,  6.,  6., -9., -8.],
       [ 6.,  6.,  6., -7., -6.]], dtype=float32)>

Householder matrix

Householder matrix can be used to triangularize a matrix using orthogonal matrices (the process is called orthogonal triangularization). But we will not pursue that point here. We will only show the method using only a column vector. Given a vector \(v = [1, 4, 7]^T\), Householder transform can transform the vector into \(v = \|v\|_2[1,0,0]^T\). It is achieved by multiplying the vector \(v\) by an orthogonal Householder matrix \(H\).

\[H\begin{pmatrix} v_1\\ v_2\\ v_3\end{pmatrix}=\|v\|_2\begin{pmatrix} 1\\ 0\\ 0\end{pmatrix}\]

This process can be repeated with other columns of a matrix to transform it into an upper triangular one. For more details, have a look at the references.

first_column_of_operator_1 = operator_1.to_dense()[:,0]
norm = tf.linalg.norm(first_column_of_operator_1)
vec = first_column_of_operator_1 - norm*tf.linalg.eye(3)[:,0]    # Whether to take positive or negative sign? See references.
householder = tf.linalg.LinearOperatorHouseholder(reflection_axis = vec)
householder.matmul(tf.reshape(first_column_of_operator_1, shape = (3,1))).numpy()
array([[8.1240387e+00],
       [4.7683716e-07],
       [4.7683716e-07]], dtype=float32)
tf.matmul(householder.to_dense(), tf.reshape(first_column_of_operator_1, shape = (3,1)))
<tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[8.1240387e+00],
       [4.7683716e-07],
       [7.1525574e-07]], dtype=float32)>

Kronecker matrix

operator_1.to_dense()
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]], dtype=float32)>
operator_2.to_dense()
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[-9., -8.],
       [-7., -6.]], dtype=float32)>
kron_operator = tf.linalg.LinearOperatorKronecker([operator_1,operator_2])
kron_operator.to_dense()
<tf.Tensor: shape=(6, 6), dtype=float32, numpy=
array([[ -9.,  -8., -18., -16., -27., -24.],
       [ -7.,  -6., -14., -12., -21., -18.],
       [-36., -32., -45., -40., -54., -48.],
       [-28., -24., -35., -30., -42., -36.],
       [-63., -56., -72., -64., -81., -72.],
       [-49., -42., -56., -48., -63., -54.]], dtype=float32)>

Permutation matrix

perm_operator = tf.linalg.LinearOperatorPermutation(tf.constant([2,4,0,3,1]))
perm_operator.to_dense()
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 1., 0., 0., 0.]], dtype=float32)>

Common matrices using operators

Identity matrix

iden_operator = tf.linalg.LinearOperatorIdentity(num_rows = 5)
iden_operator.to_dense()
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]], dtype=float32)>

Scaled identity matrix

scaled_iden_operator = tf.linalg.LinearOperatorScaledIdentity(num_rows = 5, multiplier = 5.)
scaled_iden_operator.to_dense()
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[5., 0., 0., 0., 0.],
       [0., 5., 0., 0., 0.],
       [0., 0., 5., 0., 0.],
       [0., 0., 0., 5., 0.],
       [0., 0., 0., 0., 5.]], dtype=float32)>
scaled_iden_operator_2 = tf.linalg.LinearOperatorScaledIdentity(num_rows = 3, multiplier = tf.constant([-5,7]))
scaled_iden_operator_2.to_dense()
<tf.Tensor: shape=(2, 3, 3), dtype=int32, numpy=
array([[[-5,  0,  0],
        [ 0, -5,  0],
        [ 0,  0, -5]],

       [[ 7,  0,  0],
        [ 0,  7,  0],
        [ 0,  0,  7]]], dtype=int32)>

Diagonal matrix

diag_operator = tf.linalg.LinearOperatorDiag(tf.constant([1,2,3,4.]))
diag_operator.to_dense()
<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[1., 0., 0., 0.],
       [0., 2., 0., 0.],
       [0., 0., 3., 0.],
       [0., 0., 0., 4.]], dtype=float32)>

Tri-diagonal matrix

diags = tf.constant([[-1, -1, -1, -1, 0],
                         [ 2,  2,  2,  2, 2],
                         [ 0, -1, -1, -1, -1.]])
tridiag_operator = tf.linalg.LinearOperatorTridiag(diagonals = diags)
tridiag_operator.to_dense()
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[ 2., -1.,  0.,  0.,  0.],
       [-1.,  2., -1.,  0.,  0.],
       [ 0., -1.,  2., -1.,  0.],
       [ 0.,  0., -1.,  2., -1.],
       [ 0.,  0.,  0., -1.,  2.]], dtype=float32)>

Lower triangular matrix

mat = tf.constant([[2,4,7,8],
                   [1,2,3,4],
                   [5,8,9,6],
                   [4,2,3,1]], dtype = tf.float32)
lower_tri_operator = tf.linalg.LinearOperatorLowerTriangular(mat)
lower_tri_operator.to_dense()
<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[2., 0., 0., 0.],
       [1., 2., 0., 0.],
       [5., 8., 9., 0.],
       [4., 2., 3., 1.]], dtype=float32)>

Matrix of zeros

zeros_operator = tf.linalg.LinearOperatorZeros(num_rows = 4, num_columns = 5, is_square = False,  is_self_adjoint=False)
zeros_operator.to_dense()
<tf.Tensor: shape=(4, 5), dtype=float32, numpy=
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]], dtype=float32)>

Matrix operations using operators

Low-rank update

When a low rank matrix is added to a given matrix, the resulting matrix is called a low-rank update of the original matrix. Let’s suppose our original matrix was \(A\) and we add a rank 1 update to it. The resulting matrix is \(B\). So

\[B = A + uv^T\]

Where, \(u\), and \(v\) are column vectors. It should be noted that low-rank matrix update doesn’t always increase the rank of the original matrix. For example, if rank of \(A\) was 2, updating it with a rank 1 matrix will not always make its rank 3. Here is an example.

A = tf.constant([[1,2,3],
                 [2,4,6],
                 [3,4,5]], dtype = tf.float32)
u = tf.constant([[5],
                 [6],
                 [7]], dtype = tf.float32)
v = tf.constant([[7],
                 [8],
                 [9]], dtype = tf.float32)
B = A + tf.matmul(u,v, transpose_b=True)
print("Rank of A = ", tf.linalg.matrix_rank(A).numpy())
print("Rank of B = ", tf.linalg.matrix_rank(B).numpy())
Rank of A =  2
Rank of B =  2

Why is it useful?

It turns out that this low rank update appears in many applications. One of the applications is in least squares. Imagine that you have solved the least squares problem using the available data. And now you get some new data. The problem is to get the new least squares fit. Well, you can start form scratch by including the new data to your old data and then fit the model on the whole data. But this is a wasteful approach. A better alternative is to use the new data to modify your old fit. If you do some mathematics, you will arrive at matrix update equation. We will not do the math here. Interested readers can check references.

Computations can be much faster if we use low-rank matrix update equation. In tensorflow it is done using tf.linalg.OperatorLowRankUpdate operator. Though the operator can handle more than rank 1 update, we will use it only for rank 1 update.

As an example, let’s suppose we want to compute the inverse of \(B\). We can do so by modifying the inverse of \(A\) that we have previously computed. The result is the famous Sherman–Morrison-Woodbury formula. \[B^{-1} = (A+uv^T)^{-1} = A^{-1}-\frac{A^{-1}uv^TA^{-1}}{1+v^TA^{-1}u}\]

Provided that the denominator is not equal to zero. Note that denominator is a scalar for rank 1 update. This equation show that we can compute new inverse from the old inverse by using matrix-vector and vector-vector multiplications.

operator.to_dense()
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [2., 3., 5.],
       [7., 8., 9.]], dtype=float32)>
low_rank_update = tf.linalg.LinearOperatorLowRankUpdate(operator,u = u, diag_update = None, v = v)
low_rank_update.inverse().to_dense()
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[-2.1666667 ,  1.        ,  0.625     ],
       [ 2.8333333 , -2.        , -0.24999982],
       [-0.83333325,  1.        , -0.25000012]], dtype=float32)>

Using Sherman–Morrison-Woodbury formula.

operator_inv = operator.inverse()
second_factor_numer = tf.matmul(operator_inv.matmul(u), tf.matmul(v,operator_inv.to_dense(), transpose_a=True))
second_factor_denom = 1 + tf.matmul(v,operator_inv.matmul(u), transpose_a = True)
update_inv = operator.inverse().to_dense() - (1/second_factor_denom)*second_factor_numer
print(update_inv)
tf.Tensor(
[[-2.1666667   1.          0.6250001 ]
 [ 2.8333333  -2.         -0.25      ]
 [-0.83333325  1.         -0.25000006]], shape=(3, 3), dtype=float32)
low_rank_update.to_dense()
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[36., 42., 48.],
       [44., 51., 59.],
       [56., 64., 72.]], dtype=float32)>
operator.to_dense() + tf.matmul(u, v, transpose_b = True)
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[36., 42., 48.],
       [44., 51., 59.],
       [56., 64., 72.]], dtype=float32)>

Along with inverse, other methods can be applied to LinearOperatorLowRankUpdate.

low_rank_update.diag_part().numpy()
array([36., 51., 72.], dtype=float32)

Operator inversion

Computes inverse operator of a given operator.

inv_operator = tf.linalg.LinearOperatorInversion(operator = operator)
inv_operator.to_dense()
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[-2.1666667 ,  1.        ,  0.16666669],
       [ 2.8333333 , -2.        ,  0.16666669],
       [-0.83333325,  1.        , -0.16666669]], dtype=float32)>

Operator composition

Like composition of function, this operator applies one operator over another. In terms of matrices, it just means matrix multiplication. But the result of composition is another operator. That new operator can be used for further analysis.

operator_1 = tf.linalg.LinearOperatorFullMatrix([[1,2],
                                                 [2,5],
                                                 [7,-3.]])
operator_2 = tf.linalg.LinearOperatorFullMatrix([[2,-1,3],
                                                 [-1,4,5.]])
operator_comp = tf.linalg.LinearOperatorComposition([operator_1,operator_2])
operator_comp.to_dense()
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[  0.,   7.,  13.],
       [ -1.,  18.,  31.],
       [ 17., -19.,   6.]], dtype=float32)>

Conclusion

As we have seen, using only tensorflow we can do quite a bit of linear algebra. In this post, we have only glossed over some of the functionalities. Clever use of the functions and operators will enable us to do much more than what has been covered here. At times, it might feel a little verbose. But the flexibility that it offers will make the exploration a rewarding experience.

References

  1. Tensorflow documentation
  2. Datta, Biswa Nath. Numerical linear algebra and applications. Vol. 116. Siam, 2010.
  3. (The Book) Golub, Gene H., and Charles F. Van Loan. Matrix computations. Vol. 3. JHU press, 2012.

Last updated: 30 July, 2020.

Avatar
Biswajit Sahoo
PhD Student

My research interests include machine learning, deep learning, signal processing and data-driven machinery condition monitoring.

Related