Digging into the Conjugation in Complex Inner Products

The inconsistency between the inner product of complex spaces and the quadratic forms regarding the positions of conjugate elements.

Table of Contents

The Simple Explanation

The Detailed Explanation

Why Is There the Conjugate and Conjugate Symmetry?

Why Is the Conjugate Taken on the Second Vector?

There is a phenomenon in linear algebra textbooks that is difficult to understand at first glance.

By common mathematical convention, the standard inner product $\langle \cdot, \cdot \rangle$ on an $n$ -dimensional complex space $\mathbb{C}^n$ is defined as:

\langle \mathbf{u}, \mathbf{v} \rangle = \sum_{k=1}^n u_k \overline {v_k\raisebox{0.67em}{}}

Alongside it stands the definition of a Hermitian quadratic form:

f(\mathbf{u}) = \sum_{j=1}^n \sum_{k=1}^n a_{j,k} \overline {u_j\raisebox{0.67em}{}} u_k

These two definitions, which are supposed to be closely related, take conjugates on different elements — for the inner product, it is the latter, $v_k$ ; for the quadratic form, it is the former, $u_j$ .

It is of course possible to use the opposite pair of definitions, where $\langle \mathbf{u}, \mathbf{v} \rangle$ conjugates the components of the first vector and $f(\mathbf{u})$ conjugates the second factor, but the “inconsistency” seen above is still present.

After wrestling with this strange question for a long time, I had some wonderful thoughts, which are recorded in detail here.

Unless otherwise specified, we establish the following convention:
$u_k$ and $v_k$ stand for the $k$-th components of the vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{C}^n$, respectively;
$a_{j,k}$ stands for the element in the $j$-th row and the $k$-th column of the positive definite Hermitian matrix $\mathbf{A}$.

The Simple Explanation

In fact, $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{v}^\mathsf{H}\mathbf{u}$, and $f(\mathbf{u}) = \mathbf{u}^\mathsf{H}\mathbf{A}\mathbf{u}$; hence they are both specialisations of the generalised quadratic form $Q_\mathbf{A}(\mathbf{u}, \mathbf{v}) = \mathbf{v}^\mathsf{H}\mathbf{A}\mathbf{u}$.

As for the difference in the “position” of the conjugation, it is merely an illusion created by the way it is written. It is easy to see this in the following form.

\begin{aligned} \langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{v}^\mathsf{H}\mathbf{I}\mathbf{u} &= \sum_{j=1}^n \sum_{k=1}^n \overline {v_j\raisebox{0.67em}{}} \delta_{j,k} u_k \\ f(\mathbf{u}) = \mathbf{u}^\mathsf{H}\mathbf{A}\mathbf{u} &= \sum_{j=1}^n \sum_{k=1}^n \overline {u_j\raisebox{0.67em}{}} a_{j,k} u_k \end{aligned}

(where the Kronecker delta $\delta_{j,k}$ equals the element in the $j$-th row and the $k$-th column of the identity matrix $\mathbf{I}$.)
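The two specialisations can be checked numerically. The following is a minimal sketch using only Python's built-in complex numbers; the function name `hermitian_form` and the sample values of `u`, `v` and `A` are illustrative assumptions, not part of the original text.

```python
def hermitian_form(A, u, v):
    """Generalised form Q_A(u, v) = v^H A u, written as the double sum over j, k."""
    n = len(u)
    return sum(v[j].conjugate() * A[j][k] * u[k]
               for j in range(n) for k in range(n))


def identity(n):
    """Identity matrix; its entries are the Kronecker delta."""
    return [[1 if j == k else 0 for k in range(n)] for j in range(n)]


# Sample data (assumed for illustration only).
u = [1 + 2j, 3 - 1j]
v = [2 - 1j, 1j]
A = [[2, 1j], [-1j, 3]]  # Hermitian and positive definite

# Standard inner product: A = I, conjugate on the components of v.
inner_uv = hermitian_form(identity(2), u, v)
direct = sum(uk * vk.conjugate() for uk, vk in zip(u, v))

# Hermitian quadratic form: the same expression with v = u.
quad = hermitian_form(A, u, u)

print(inner_uv == direct)
print(quad)
```

In both cases the conjugate falls on the left factor of $\mathbf{v}^\mathsf{H}\mathbf{A}\mathbf{u}$; only the written order of the arguments differs.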

The Detailed Explanation

The form $\mathbf{v}^\mathsf{H}\mathbf{A}\mathbf{u}$ is not randomly chosen. There is a more fundamental reason behind the mathematicians’ preference for it. [The physics convention seems to be the opposite, but the reason for this (the Dirac notation of quantum mechanics) is irrelevant to our topic here.]

Why Is There the Conjugate and Conjugate Symmetry?

First, why is there a conjugate in the standard inner product?

Looking back at the dot product on real space, it is essentially a generalisation of the Euclidean norm $\Vert \mathbf{u} \Vert^2 = \sum_k u_k^2$ to the case of two vectors.

On the complex space there needs to be a similar operation, satisfying positive definiteness $\Vert \mathbf{u} \Vert^2 \geq 0$. In the case of one dimension, the squared modulus of a complex number, $|z|^2 = z \cdot \overline {z\raisebox{0.67em}{}}$, is clearly a first choice. Generalising it to $n$ dimensions, the definition $\Vert \mathbf{u} \Vert^2 = \sum_k |u_k|^2 = \sum_k u_k \cdot \overline {u_k\raisebox{0.67em}{}}$ arises. The conjugation comes from here.
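A small sketch makes the need for the conjugate concrete. The helper names `norm_squared` and `naive_square_sum` and the sample vector are assumptions chosen for illustration.

```python
def norm_squared(u):
    """Sum of u_k * conj(u_k); every term is |u_k|^2, real and non-negative."""
    return sum((uk * uk.conjugate()).real for uk in u)


def naive_square_sum(u):
    """The real-space formula sum of u_k^2, copied over without conjugation."""
    return sum(uk * uk for uk in u)


u = [1j, 1 + 1j]  # sample vector (assumed)

with_conjugate = norm_squared(u)         # |i|^2 + |1+i|^2 = 1 + 2
without_conjugate = naive_square_sum(u)  # i^2 + (1+i)^2 = -1 + 2i

print(with_conjugate)
print(without_conjugate)
```

Without the conjugate the result is not even real, so it cannot serve as a squared length.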

From another perspective, why does the inner product satisfy the conjugate symmetry $\langle \mathbf{u}, \mathbf{v} \rangle = \overline {\langle \mathbf{v}, \mathbf{u} \rangle \raisebox{0.82em}{}}$ , rather than the plain symmetry $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$ ?

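Let us return to the standard inner product and separate the real part of the expression from the imaginary part. Writing $u_k = a + b\mathrm{i}$ and $v_k = c + d\mathrm{i}$ with real $a, b, c, d$, each term is $u_k \overline {v_k\raisebox{0.67em}{}} = (ac + bd) + (bc - ad)\mathrm{i}$, which gives: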

\begin{aligned} \operatorname{Re} {\langle \mathbf{u}, \mathbf{v} \rangle} &= \sum_{k=1}^n \left( \operatorname{Re} u_k \operatorname{Re} v_k + \operatorname{Im} u_k \operatorname{Im} v_k \right) \\ \operatorname{Im} {\langle \mathbf{u}, \mathbf{v} \rangle} &= \sum_{k=1}^n \left( \operatorname{Im} u_k \operatorname{Re} v_k - \operatorname{Re} u_k \operatorname{Im} v_k \right) \end{aligned}

It can be observed that the real part $\operatorname{Re} {\langle \mathbf{u}, \mathbf{v} \rangle}$ is the sum of $n$ dot products on $\mathbb{R}^2$, while the imaginary part $\operatorname{Im} {\langle \mathbf{u}, \mathbf{v} \rangle}$ is the sum of $n$ cross products on $\mathbb{R}^2$, giving the “rotation angle in the $n$-dimensional complex space from $\mathbf{v}$ to $\mathbf{u}$”. For instance, when the arguments (polar angles) of each component are equal for both vectors, $\operatorname{Im} {\langle \mathbf{u}, \mathbf{v} \rangle} = 0$. When every component of $\mathbf{u}$ is the corresponding component of $\mathbf{v}$ rotated by $90$ degrees counterclockwise and scaled by a common positive factor (that is, $\mathbf{u} = c\,\mathrm{i}\,\mathbf{v}$ with $c > 0$), $\operatorname{Im} {\langle \mathbf{u}, \mathbf{v} \rangle}$ reaches its maximal possible value $\Vert \mathbf{u} \Vert \cdot \Vert \mathbf{v} \Vert$.

And in the more general definition of the inner product, it is natural to expect the real and imaginary parts of the result to have corresponding properties respectively. Conjugate symmetry is derived from the anticommutative property of the cross product, or more generally, the anticommutative property for “rotation angles in complex spaces”.
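The decomposition and the resulting conjugate symmetry can be verified on a small example. This is a sketch with assumed helper names (`inner`, `dot2`, `cross2`) and assumed sample vectors; each complex component is treated as a point of $\mathbb{R}^2$.

```python
def inner(u, v):
    """Standard inner product on C^n, conjugate on the components of v."""
    return sum(uk * vk.conjugate() for uk, vk in zip(u, v))


def dot2(p, q):
    """Dot product of two complex numbers viewed as vectors in R^2."""
    return p.real * q.real + p.imag * q.imag


def cross2(p, q):
    """2D cross product (signed area) from p to q; swapping p and q flips the sign."""
    return p.real * q.imag - p.imag * q.real


u = [1 + 2j, 3 - 1j]  # sample vectors (assumed)
v = [2 - 1j, 1j]

ip = inner(u, v)
re_part = sum(dot2(uk, vk) for uk, vk in zip(u, v))
im_part = sum(cross2(vk, uk) for uk, vk in zip(u, v))  # measured from v to u

print(ip.real == re_part, ip.imag == im_part)

# Swapping the arguments keeps the dot products and negates the cross products,
# which is exactly conjugate symmetry.
print(inner(v, u) == ip.conjugate())
```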

Seen this way, conjugate symmetry is exactly the property the inner product ought to have.

Why Is the Conjugate Taken on the Second Vector?

To solve this problem, one has to explore another layer of the nature of inner products, and it is never wrong to start with the simplest dot product. What is the nature of the dot product?

We can of course say that it represents “the projection of $\mathbf{u}$ on $\mathbf{v}$ , multiplied by the length of $\mathbf{v}$ ”, but this is not deep enough.

3Blue1Brown explains a deeper viewpoint in a video: the dot product is the application, to $\mathbf{u}$, of a linear transformation $\mathbb{R}^n \rightarrow \mathbb{R}$ defined by $\mathbf{v}$. This transformation turns any vector $\mathbf{u}$ into the scalar value $\mathbf{v}^\mathsf{T}\mathbf{u}$.

From this perspective, an inner product function and a vector together also determine a transformation. That is to say, an “inner product” is an operator that maps vectors to linear transformations. More abstractly, this is an instance of currying: a binary function $\mathbb{C}^n \times \mathbb{C}^n \rightarrow \mathbb{C}$ can be seen as an operator that maps an argument to a unary function, $\mathbb{C}^n \rightarrow (\mathbb{C}^n \rightarrow \mathbb{C})$.

A positive definite Hermitian matrix $\mathbf{A}$ defines such an operator, which maps a vector $\mathbf{v}$ to the transformation whose matrix in the standard basis is $\mathbf{v}^\mathsf{H}\mathbf{A}$. Applying the transformation to $\mathbf{u}$ results in the previously seen form $\mathbf{v}^\mathsf{H}\mathbf{A}\mathbf{u}$. This is in fact the general form of an inner product on $\mathbb{C}^n$ (a positive definite Hermitian form).
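The currying view translates directly into a closure. The sketch below is illustrative only: the name `functional_from` and the sample data are assumptions. It maps $\mathbf{v}$ to the row vector $\mathbf{v}^\mathsf{H}\mathbf{A}$ once, and returns a function that applies it to any $\mathbf{u}$.

```python
def functional_from(A, v):
    """Map v to the linear transformation u -> v^H A u."""
    n = len(v)
    # Row vector v^H A: all conjugation happens here, inside the transformation.
    row = [sum(v[j].conjugate() * A[j][k] for j in range(n)) for k in range(n)]

    def apply(u):
        # u is used as is, with no conjugation.
        return sum(row[k] * u[k] for k in range(n))

    return apply


A = [[2, 1j], [-1j, 3]]  # sample positive definite Hermitian matrix (assumed)
u = [1 + 2j, 3 - 1j]
v = [2 - 1j, 1j]

phi_v = functional_from(A, v)  # a transformation C^n -> C determined by v
value = phi_v(u)

# The same number, computed directly as the double sum for v^H A u.
double_sum = sum(v[j].conjugate() * A[j][k] * u[k]
                 for j in range(2) for k in range(2))
print(abs(value - double_sum) < 1e-12)
```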

Hence, the conjugate is taken on $\mathbf{v}$ because we expect $\mathbf{u}$, as the element being transformed, to keep its original form, with all computations placed inside the linear transformation $\mathbb{C}^n \rightarrow \mathbb{C}$. As for why $\mathbf{u}$ is written before $\mathbf{v}$, it is probably because it is more intuitive, in this binary operation, to first specify the element being transformed and then the transformation.
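One consequence of this choice is that the form is linear in $\mathbf{u}$ and conjugate-linear in $\mathbf{v}$. A quick numerical check, with assumed helper names (`form`, `scale`) and assumed sample data:

```python
def form(A, u, v):
    """Hermitian form v^H A u as a double sum."""
    n = len(u)
    return sum(v[j].conjugate() * A[j][k] * u[k]
               for j in range(n) for k in range(n))


def scale(c, x):
    """Multiply every component of x by the scalar c."""
    return [c * xk for xk in x]


A = [[2, 1j], [-1j, 3]]  # sample positive definite Hermitian matrix (assumed)
u = [1 + 2j, 3 - 1j]
v = [2 - 1j, 1j]
c = 2 + 3j

base = form(A, u, v)
scaled_u = form(A, scale(c, u), v)  # scalar comes out unchanged
scaled_v = form(A, u, scale(c, v))  # scalar comes out conjugated

print(abs(scaled_u - c * base) < 1e-12)
print(abs(scaled_v - c.conjugate() * base) < 1e-12)
```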
