1 Introduction and Main Results
Consider the first Piola-Kirchhoff stress $ {\bf{P}} $ defined as [1]
$ \begin{equation} {\bf{P}}({\bf{F}}) = \frac{\partial W({\bf{F}})}{\partial {\bf{F}}}, \end{equation} $
(1.1)
where $ {\bf{F}} $ and $ W $ denote the deformation gradient tensor and the strain energy density, respectively. Due to the principle of material frame-indifference and the material symmetry, two important constraints must be satisfied by this relation [1]:
$ \begin{eqnarray} {\bf{P}}({\bf{Q}}{\bf{F}}) &=& {\bf{Q}}{\bf{P}}({\bf{F}}), \quad \forall {\bf{Q}} \in SO(3); \end{eqnarray} $
(1.2)
$ \begin{eqnarray} {\bf{P}}({\bf{F}}) &=& {\bf{P}}({\bf{F}}{\bf {H}}){\bf {H}}^{T}, \quad \forall {\bf {H}} \in G, \end{eqnarray} $
(1.3)
where $ SO(3) $ stands for the proper orthogonal group and $ G $ refers to the material symmetry group.
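As a quick numerical illustration of (1.2) and (1.3), the following sketch checks both constraints for a toy compressible neo-Hookean model, for which the material symmetry group contains every rotation. The model, its parameters, and the random deformations are illustrative assumptions, not part of the construction in this work.

```python
# A minimal numerical check of frame-indifference (1.2) and material symmetry (1.3),
# assuming a toy compressible neo-Hookean model (not the model of this work):
#   P(F) = mu * (F - F^{-T}) + lam * ln(det F) * F^{-T}
# For an isotropic model every rotation belongs to the symmetry group, so any Q, H in SO(3)
# should satisfy both constraints up to round-off.
import numpy as np

mu, lam = 1.0, 0.5

def P(F):
    Finv_T = np.linalg.inv(F).T
    return mu * (F - Finv_T) + lam * np.log(np.linalg.det(F)) * Finv_T

def random_rotation(rng):
    # QR factorization of a random matrix gives an orthogonal Q; fix the sign so det Q = +1
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0
    return Q

rng = np.random.default_rng(0)
F = np.eye(3) + 0.1 * rng.standard_normal((3, 3))   # deformation gradient near the identity
Q, H = random_rotation(rng), random_rotation(rng)

print(np.allclose(P(Q @ F), Q @ P(F)))        # constraint (1.2)
print(np.allclose(P(F), P(F @ H) @ H.T))      # constraint (1.3)
```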
In this work, we propose a deep symmetric network $ {\bf{P}}_{\theta}({\bf {F}}) $ to represent the first PK stress (DPK). By symmetric we mean that the network exactly satisfies the two constraints (1.2) and (1.3). The neural network enables us to construct the constitutive relation from both macroscopic observations and atomistic simulation data.
We first review the classical atomistic PK stress (virial stress), defined as
$ \begin{equation} {\bf{P}}({\bf {F}}) = \frac{1}{\Omega}\frac{\partial V(|{\bf{F}}{\bf {X}}_{12}|, ..., |{\bf{F}}{\bf {X}}_{ij}|, ...)}{\partial {\bf{F}}} = - \frac{1}{2\Omega}\sum\limits_{i}\sum\limits_{j\neq i}{\bf {f}}_{ij}\otimes {\bf {X}}_{ij}, \end{equation} $
(1.4)
in which $ {\bf {X}}_{ij} $ and $ {\bf {f}}_{ij} $ denote the relative displacement and the pair force between the $ i $-th and $ j $-th atoms, and $ V $ stands for the potential function of the atomic system. Under several assumptions on the symmetry of the potential and the crystal structure, we can prove that the virial stress exactly satisfies the two constraints (1.2) and (1.3):
Theorem 1.1 (Symmetry of virial stress) For a given crystal system with periodic boundary conditions, suppose the potential $ V $ is a symmetric function and let $ {\bf{P}}({\bf {F}}) $ be the stress defined by (1.4). Then constraint (1.2) holds for any $ {\bf {Q}} $ in $ SO(3) $, and (1.3) holds for any $ {\bf {H}} $ in the crystallographic symmetry group (point symmetry group $ G_p $, [2]).
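For concreteness, the following sketch evaluates the middle expression of (1.4), the derivative of the potential with respect to $ {\bf{F}} $, for a hypothetical Lennard-Jones-type pair potential on a small set of atoms; periodic images are omitted, and the positions, cell volume, and potential parameters are made-up illustration values.

```python
# A minimal sketch of the virial stress (1.4) for a hypothetical pair potential
#   V = (1/2) * sum_{i} sum_{j != i} phi(|F X_ij|),
# evaluated via the derivative form
#   P(F) = (1/(2*Omega)) * sum_{i} sum_{j != i} phi'(r_ij) / r_ij * (F X_ij) outer X_ij,
# where r_ij = |F X_ij|.
import numpy as np

def phi_prime(r, eps=1.0, sigma=0.3):
    # derivative of the Lennard-Jones potential 4*eps*((sigma/r)**12 - (sigma/r)**6)
    return 4.0 * eps * (-12.0 * sigma**12 / r**13 + 6.0 * sigma**6 / r**7)

def virial_stress(F, X, omega):
    """X: (N, 3) reference atom positions, omega: reference cell volume."""
    P = np.zeros((3, 3))
    N = X.shape[0]
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            Xij = X[i] - X[j]              # reference relative displacement
            rij_vec = F @ Xij              # deformed relative displacement
            rij = np.linalg.norm(rij_vec)
            P += phi_prime(rij) / rij * np.outer(rij_vec, Xij)
    return P / (2.0 * omega)

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(8, 3))     # toy 8-atom configuration
F = np.eye(3) + 0.05 * rng.standard_normal((3, 3))
print(virial_stress(F, X, omega=1.0))
```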
Motivated by the above theorem, we design the deep symmetric network as follows. For a given crystal material, we denote the crystallographic point symmetry group by $ G_p $. For any finite set of trainable 3-dimensional weight vectors $ {\bf {W}} = \{{\bf {w}}_1, ..., {\bf {w}}_N\} $, the symmetrized weight set $ \tilde{{\bf {W}}} $ is defined as the image of $ {\bf {W}} $ under $ G_p $:
$ \begin{equation} \tilde{{\bf {W}}} = \{{\bf {H}}_i {\bf {w}}_j | {\bf {H}}_i \in G_p, {\bf {w}}_j \in {\bf {W}}\}. \end{equation} $
(1.5)
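A minimal sketch of the construction (1.5), assuming a toy point group (the four rotations about the $ z $-axis by multiples of $ 90^\circ $) rather than the full crystallographic group of a real material:

```python
# Build the symmetrized weight set (1.5): the orbit of the trainable weight vectors W
# under a point group G_p.  Both G_p and W below are illustrative placeholders.
import numpy as np

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

G_p = [rot_z(k * np.pi / 2) for k in range(4)]               # toy point group: 4-fold rotations about z
W = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])]   # trainable weight vectors

def symmetrize(W, G_p, tol=1e-8):
    """Return the image set {H w : H in G_p, w in W}, dropping numerical duplicates."""
    tilde_W = []
    for H in G_p:
        for w in W:
            v = H @ w
            if all(np.linalg.norm(v - u) > tol for u in tilde_W):
                tilde_W.append(v)
    return tilde_W

tilde_W = symmetrize(W, G_p)
print(len(tilde_W))    # 5: the orbit of [1,0,0] has 4 elements, [0,0,1] is fixed by every rotation
```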
By construction $ \tilde{{\bf {W}}} = \{\tilde{{\bf {w}}}_1, ..., \tilde{{\bf {w}}}_M\} $ is invariant under $ G_p $. With this we define the deep symmetric network as follows:
Definition 1.1 (DPK) Let $ {\bf {W}} = \{{\bf {w}}_1, ..., {\bf {w}}_N\} $ be $ N $ trainable 3-dimensional weight vectors and $ \tilde{{\bf {W}}} $ the symmetrized weights defined above, and denote by $ g({\bf {x}};\theta) = h_n \circ a_{n-1} \circ ... \circ a_1 \circ h_1({\bf {x}}) $ a classical dense neural network (DNN) with linear functions $ h_i({\bf {x}};\theta_i) $ and activation functions $ a_i({\bf {x}}) $. The deep symmetric network for the PK stress is defined as
$ \begin{equation} {\bf{P}}_{\theta, {\bf {W}}}({\bf {F}}) = {\bf {F}}\sum\limits_{i=1}^{M} \frac{\partial f}{\partial x_i} (|{\bf {F}} \tilde{{\bf {w}}}_1|, ..., |{\bf {F}}\tilde{{\bf {w}}}_M|; \theta)\frac{1}{|{\bf {F}}\tilde{{\bf {w}}}_i|} \tilde{{\bf {w}}}_i\otimes \tilde{{\bf {w}}}_i, \end{equation} $
(1.6)
where $ f({\bf {x}}; \theta) = \frac{1}{M!}\sum_{\tau \in S_M} g(\tau({\bf {x}}); \theta) $, with $ S_M $ the set of all permutations of $ [1, 2, ..., M] $.
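The permutation symmetrization of $ g $ can be sketched as follows for a tiny hand-written network and $ M = 3 $; the network weights are random placeholders, and the check only illustrates that $ f $ is invariant under reordering of its arguments.

```python
# A minimal sketch of the permutation-symmetrized function
#   f(x; theta) = (1/M!) * sum_{tau in S_M} g(tau(x); theta)
# from Definition 1.1, with a toy dense network g (linear -> ReLU -> linear).
import numpy as np
from itertools import permutations

rng = np.random.default_rng(2)
M = 3
W1, b1 = rng.standard_normal((8, M)), rng.standard_normal(8)
W2, b2 = rng.standard_normal((1, 8)), rng.standard_normal(1)

def g(x):
    # classical dense network with one hidden layer
    return float(W2 @ np.maximum(W1 @ x + b1, 0.0) + b2)

def f(x):
    perms = list(permutations(range(M)))
    return sum(g(x[list(p)]) for p in perms) / len(perms)

x = rng.standard_normal(M)
print(np.isclose(f(x), f(x[::-1])))   # True: f is invariant under reordering its arguments
```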
The architecture of this network is illustrated in Fig. 1. Using the fact that $ \tilde{{\bf {W}}} $ is invariant under $ G_p $ and that $ f({\bf {x}}; \theta) = \frac{1}{M!}\sum_{\tau \in S_M} g(\tau({\bf {x}}); \theta) $ is a symmetric function, we obtain that the DPK is a symmetric stress:
Theorem 1.2 (Symmetry of DPK) For a given crystal system with crystallographic symmetry group $ G_p $, let $ {\bf{P}}_{\theta, {\bf {W}}}({\bf {F}}) $ be the DPK stress defined by (1.6). Then constraint (1.2) holds for any $ {\bf {Q}} $ in $ SO(3) $, and (1.3) holds for any $ {\bf {H}} $ in $ G_p $.
Furthermore, by applying existing results on the approximation power of DNNs, we show that the DPK stress has good approximation ability:
Theorem 1.3 (Approximation of DPK) For a finite crystal system consisting of $ N $ atoms with a symmetric potential $ V $ satisfying $ \|{\bf {X}}_{ij}\|_2 <1 $ for any $ j \neq i $, if there exist some $ n>1, B< \infty $ such that $ \|V\|_{W^{n, \infty}((0, 1)^{N(N-1)})} < B $, then for any $ \epsilon \in (0, 1/2) $ there exists a DPK function $ {\bf{P}}_0 $ such that
$ \begin{equation} \|{\bf{P}}_{virial}({\bf {F}}) - {\bf{P}}_0({\bf {F}})\|_F \leq \epsilon, \end{equation} $
(1.7)
whenever $ \|{\bf {FX}}_{ij}\|_2 <1 $. Here $ \|\cdot\|_F $ represents the Frobenius norm and $ {\bf{P}}_{virial}({\bf {F}}) $ is the virial stress defined by (1.4).
Remark 1 Although the DPK architecture introduced above supplies us with a strong approximation tool that exactly satisfies the symmetry conditions, the sum $ f({\bf {x}}; \theta) = \frac{1}{M!}\sum_{\tau \in S_M} g(\tau({\bf {x}}); \theta) $ with a large $ M $ introduces an intolerable computational cost in both the training and inference procedures.
In order to achieve a simplified architecture while keeping its symmetry and approximation capability, we borrow the idea of the quasi-harmonic approximation method [3] and decompose the virtual potential function $ f $ in (1.6) as
$ \begin{equation} f(x_1, x_2, ..., x_M;\theta) = \sum\limits_{i=1}^{M}g(x_i;\theta) \end{equation} $ |
(1.8) |
where $ g(x) = h_n \circ a_{n-1} \circ ... \circ a_1 \circ h_1(x) $ is a classical DNN function. Applying (1.8) to (1.6), we obtain the modified DPK architecture:
$ \begin{equation} {\bf{P}}_{\theta, {\bf {W}}}({\bf {F}}) = {\bf {F}}\sum\limits_{i=1}^{M} \frac{d g}{dx}(|{\bf {F}} \tilde{{\bf {w}}}_i |; \theta)\frac{1}{|{\bf {F}} \tilde{{\bf {w}}}_i |} \tilde{{\bf {w}}}_i\otimes \tilde{{\bf {w}}}_i. \end{equation} $ |
(1.9) |
A further simplification can be achieved once we represent $ \frac{1}{x}g^\prime(x;\theta) = \tilde{g}(x;\tilde{\theta}) $ directly, where $ \tilde{g}(x;\tilde{\theta}) $ is a new DNN function. In this case the simplified DPK can be written as
$ \begin{equation} {\bf{P}}_{\theta, {\bf {W}}}({\bf {F}}) = {\bf {F}}\sum\limits_{i=1}^{M} \tilde{g}(|{\bf {F}} \tilde{{\bf {w}}}_i |; \tilde{\theta}) \tilde{{\bf {w}}}_i\otimes \tilde{{\bf {w}}}_i. \end{equation} $ |
(1.10) |
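A minimal sketch of the forward pass (1.10), assuming $ \tilde{g} $ is a small hand-written ReLU network and the symmetrized weights are random placeholders; in practice both the network parameters and the weight vectors would be trained on stress data.

```python
# A minimal sketch of the simplified DPK forward pass (1.10):
#   P(F) = F * sum_i g_tilde(|F w_i|) * (w_i outer w_i),
# with a toy scalar-input, scalar-output ReLU network standing in for g_tilde.
import numpy as np

rng = np.random.default_rng(3)
A1, b1 = rng.standard_normal((16, 1)), rng.standard_normal(16)
A2, b2 = rng.standard_normal((1, 16)), rng.standard_normal(1)

def g_tilde(r):
    # scalar DNN: linear -> ReLU -> linear
    return float(A2 @ np.maximum(A1 @ np.array([r]) + b1, 0.0) + b2)

def dpk_simplified(F, tilde_W):
    S = np.zeros((3, 3))
    for w in tilde_W:
        r = np.linalg.norm(F @ w)
        S += g_tilde(r) * np.outer(w, w)
    return F @ S

tilde_W = [rng.standard_normal(3) for _ in range(6)]   # placeholder symmetrized weights
F = np.eye(3) + 0.05 * rng.standard_normal((3, 3))
print(dpk_simplified(F, tilde_W))
```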
We numerically illustrate the effectiveness and transferability of (1.10). Additionally, with some classical tools in learning theory such as the fat-shattering dimension, we prove that the estimation error (generalization error) of (1.10) can be made arbitrarily small with a sufficiently large number of training samples:
Theorem 1.4 (Learnability of simplified DPK) Let $ \mathcal{A} = \{{\bf{P}}_{\theta, {\bf {W}}}\} $ be the class of DPK functions defined by (1.10) such that $ \tilde{g}(x;\tilde{\theta}) $ has $ l\geq 2 $ layers, $ N_w $ weights (bounded by $ V $), a Lipschitz continuous activation function with Lipschitz constant $ L > 1/V $, and each unit mapping into $ [-b, b] $ for some $ b>0 $. Given training samples $ S_m = \{({\bf {F}}^1, {\bf{P}}^1), ..., ({\bf {F}}^m, {\bf{P}}^m)\} $ drawn i.i.d. from a distribution $ D $ on $ [-1, 1]^{3\times3} \times [-1, 1]^{3\times3} $ and the loss function
$ \begin{equation} l(f({\bf {X}}), {\bf {Y}}) = \frac{1}{9}\sum\limits_{i, j}l_0(f_{ij}({\bf {X}}), {\bf {Y}}_{ij}), \text{ for } f \in \mathcal{A}, \end{equation} $
(1.11)
where $ l_0(x, y) = |x-y| $, then for any approximate sample error minimization (SME, [4]) algorithm $ \mathbb{M} $ and any $ \epsilon \leq 2\max\{b, V, \sqrt{3V} \}, \delta >0 $, there exists some $ m_0(\epsilon, \delta, \mathcal{A}) < \infty $ such that $ \forall m>m_0 $, with probability at least $ 1-\delta $, $ \mathbb{M} $ returns a function $ P^* \in \mathcal{A} $ satisfying
$ \begin{equation} E_{(x, y)\sim D}[l(P^*(x), y)] < \inf\limits_{f \in \mathcal{A}}E_{(x, y)\sim D}[l(f(x), y)] + \epsilon. \end{equation} $ |
(1.12) |
Remark 2 In practice, the conditions on $ \tilde{g} $ are indeed satisfied, since we use ReLU as the activation function and bound the network weights by an $ l_1 $ regularization.
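A minimal sketch of the corresponding training objective, combining the componentwise absolute loss (1.11) averaged over samples with an $ l_1 $ penalty on the network weights; the penalty coefficient and the interface of the hypothetical predict function are illustrative assumptions.

```python
# Empirical training objective: mean of the componentwise absolute loss (1.11) over the
# samples, plus an l1 penalty on the parameters as mentioned in Remark 2.
import numpy as np

def loss_1_11(P_pred, P_true):
    # l(f(X), Y) = (1/9) * sum_{i,j} |f_ij(X) - Y_ij|
    return np.abs(P_pred - P_true).mean()

def objective(predict, params, samples, l1_coeff=1e-4):
    """predict(F, params) -> 3x3 stress; samples: list of (F, P) pairs; params: list of arrays."""
    data_term = np.mean([loss_1_11(predict(F, params), P) for F, P in samples])
    reg_term = l1_coeff * sum(np.abs(p).sum() for p in params)
    return data_term + reg_term

# toy usage with a one-parameter linear "model" P(F) = a * F
samples = [(np.eye(3), np.eye(3))]
print(objective(lambda F, p: p[0] * F, [np.array([0.9])], samples))
```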
The detailed proof of the theorems mentioned above and the numerical results can be found in [5].