<h1 id="regularisation-1-handling-real-world-data">Regularisation 1: Handling real-world data</h1>
<p><em>Ashley Scillitoe, 2021-03-15</em></p>
<p>This blog post is part of a series of posts exploring the regularised regression techniques we’ve recently added to <a href="https://equadratures.org">equadratures</a>, our open-source Python library which utilises orthogonal polynomials for a range of tasks.</p>
<p>One of the central tenets of classical machine learning is the bias-variance tradeoff, which implies that an accurate model must balance under-fitting and over-fitting. In other words, it must be rich enough to express underlying structure in the training data, but simple enough to avoid over-fitting to the spurious patterns and noise often present in real-world data.</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/regularisation1/bias_variance.png" alt="bias_variance_tradeoff" style="width: 50%;" /><figcaption>
Figure 1: The bias-variance trade-off.
</figcaption></figure>
<p>With polynomial regression, such as that used by <em>equadratures</em>, the <em>model complexity</em> is often controlled through the <em>polynomial order</em>: low order polynomials might not sufficiently characterise trends in the data, while high order polynomials risk over-fitting to noise in the data.</p>
<p>One approach to the above problem is <em>regularised regression</em>, where penalty terms are used to discourage more complex models, thus mitigating the risk of overfitting. The resulting <em>sparse solutions</em>, with fewer non-zero model coefficients, also aid model interpretability. In this series of posts I’ll be exploring a number of topics surrounding regularisation, starting with a deeper look at the motivation behind regularisation. As usual, if you want to follow along, click on the link below:</p>
<p><a href="https://colab.research.google.com/github/ascillitoe/EQ-live/blob/master/Blog_posts/regularisation/post_regularisation1.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" /></a></p>
<h1 id="high-order-polynomials">High order polynomials</h1>
<h2 id="ordinary-least-squares-regression">Ordinary Least Squares regression</h2>
<p>To demonstrate the need for regularisation when using high order polynomials, let’s start with a basic polynomial regression class. The $k^{th}$ order 1D polynomial</p>
\[y_i = \beta_0 + \beta_1 x_i +\beta_2 x_i^2 + \dots + \beta_k x_i^k \;\;\;\;\; i = [1,\dots,N]\]
<p>can be written as a linear algebra problem</p>
\[\boldsymbol{y} = \mathbf{A}\boldsymbol{\beta}\]
\[\begin{bmatrix} y_1\\ y_2\\ \vdots \\ y_N \end{bmatrix}= \begin{bmatrix} 1 & x_1 & x_1^2 & \dots & x_1^k \\ 1 & x_2 & x_2^2 & \dots & x_2^k \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_N & x_N^2 & \dots & x_N^k \end{bmatrix} \begin{bmatrix} \beta_0\\ \beta_1\\ \beta_2\\ \vdots \\ \beta_k \end{bmatrix}\]
<p>Performing a regression is then a case of finding the polynomial coefficients $\boldsymbol{\beta}$. This can be done using the <code class="language-plaintext highlighter-rouge">numpy</code> Ordinary Least Squares (OLS) solver <code class="language-plaintext highlighter-rouge">linalg.lstsq</code>, which seeks to minimise the loss term:</p>
\[\lVert \boldsymbol{y} -\mathbf{A}\boldsymbol{\beta} \rVert_2^2\]
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
</pre></td><td class="rouge-code"><pre><span class="k">class</span> <span class="nc">linear</span><span class="p">:</span>
<span class="s">"""A simple linear regression class"""</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">order</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">order</span><span class="o">=</span><span class="n">order</span>
<span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">):</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span><span class="o">==</span><span class="mi">1</span><span class="p">:</span> <span class="n">X</span> <span class="o">=</span> <span class="n">X</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">y</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span><span class="o">==</span><span class="mi">1</span><span class="p">:</span> <span class="n">y</span> <span class="o">=</span> <span class="n">y</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="n">A</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">ones</span><span class="p">(</span><span class="n">n</span><span class="p">).</span><span class="n">reshape</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="bp">self</span><span class="p">.</span><span class="n">order</span><span class="o">+</span><span class="mi">1</span><span class="p">):</span>
<span class="n">A</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">concatenate</span><span class="p">([</span><span class="n">A</span><span class="p">,</span><span class="n">np</span><span class="p">.</span><span class="n">power</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">k</span><span class="p">)],</span><span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">coeffs</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">lstsq</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">rcond</span><span class="o">=</span><span class="bp">None</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="bp">self</span><span class="p">.</span><span class="n">X</span> <span class="o">=</span> <span class="n">X</span>
<span class="bp">self</span><span class="p">.</span><span class="n">y</span> <span class="o">=</span> <span class="n">y</span>
<span class="bp">self</span><span class="p">.</span><span class="n">coeffs</span> <span class="o">=</span> <span class="n">coeffs</span>
<span class="k">def</span> <span class="nf">predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">X</span><span class="p">):</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span><span class="o">==</span><span class="mi">1</span><span class="p">:</span> <span class="n">X</span> <span class="o">=</span> <span class="n">X</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="n">A</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">ones</span><span class="p">(</span><span class="n">n</span><span class="p">).</span><span class="n">reshape</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="bp">self</span><span class="p">.</span><span class="n">order</span><span class="o">+</span><span class="mi">1</span><span class="p">):</span>
<span class="n">A</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">concatenate</span><span class="p">([</span><span class="n">A</span><span class="p">,</span><span class="n">np</span><span class="p">.</span><span class="n">power</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">k</span><span class="p">)],</span><span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">ypred</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">A</span><span class="p">,</span><span class="bp">self</span><span class="p">.</span><span class="n">coeffs</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span>
<span class="k">return</span> <span class="n">ypred</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
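<p>As a quick check of the class above, note that the loop used to assemble $\mathbf{A}$ builds a Vandermonde matrix, which numpy provides directly via <code>np.vander</code> with <code>increasing=True</code>. The standalone sketch below (the cubic and sample values are purely illustrative) fits noise-free data and recovers the known coefficients:</p>

```python
import numpy as np

# Illustrative noise-free data generated from a known cubic
rng = np.random.RandomState(0)
X = rng.uniform(-2, 2, 30)
beta_true = np.array([2.0, -1.0, -5.0, 1.0])      # beta_0 ... beta_3
y = np.vander(X, 4, increasing=True) @ beta_true  # y = A @ beta

# Design matrix with columns 1, x, x^2, x^3, then OLS via lstsq
A = np.vander(X, 4, increasing=True)
beta_hat = np.linalg.lstsq(A, y, rcond=None)[0]
print(np.allclose(beta_hat, beta_true))  # True: exact recovery without noise
```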
<p>This approach works fine in many cases. For example, for the trigonometric function</p>
\[y = 0.2\sin(5x) + 0.05\cos(x) - 0.8\sin(0.1x)\]
<p>we get an excellent approximation as long as the order $k$ is high enough (i.e. $k=15$ in this case).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
</pre></td><td class="rouge-code"><pre><span class="c1"># Create function and data
</span><span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">RandomState</span><span class="p">(</span><span class="mi">42</span><span class="p">).</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">30</span><span class="p">)</span>
<span class="n">Xplot</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">min</span><span class="p">(</span><span class="n">X</span><span class="p">),</span><span class="n">np</span><span class="p">.</span><span class="nb">max</span><span class="p">(</span><span class="n">X</span><span class="p">),</span><span class="mi">100</span><span class="p">)</span>
<span class="n">fx</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">X</span><span class="p">:</span> <span class="mf">0.2</span><span class="o">*</span><span class="n">np</span><span class="p">.</span><span class="n">sin</span><span class="p">(</span><span class="mi">5</span><span class="o">*</span><span class="n">X</span><span class="p">)</span> <span class="o">+</span> <span class="mf">0.05</span><span class="o">*</span><span class="n">np</span><span class="p">.</span><span class="n">cos</span><span class="p">(</span><span class="n">X</span><span class="p">)</span> <span class="o">-</span> <span class="mf">0.8</span><span class="o">*</span><span class="n">np</span><span class="p">.</span><span class="n">sin</span><span class="p">(</span><span class="mf">0.1</span><span class="o">*</span><span class="n">X</span><span class="p">)</span>
<span class="c1">#fx = lambda X: (X**3 - 5*X**2 - X + 2)/10
</span><span class="n">y</span> <span class="o">=</span> <span class="n">fx</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
<span class="c1"># Try different orders
</span><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span><span class="mi">5</span><span class="p">),</span><span class="n">tight_layout</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">order</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">([</span><span class="mi">3</span><span class="p">,</span><span class="mi">8</span><span class="p">,</span><span class="mi">15</span><span class="p">]):</span>
<span class="c1"># Fit Poly
</span> <span class="n">model</span> <span class="o">=</span> <span class="n">linear</span><span class="p">(</span><span class="n">order</span><span class="o">=</span><span class="n">order</span><span class="p">)</span>
<span class="n">model</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">)</span>
<span class="n">y_pred</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">predict</span><span class="p">(</span><span class="n">Xplot</span><span class="p">)</span>
<span class="c1"># Plot
</span> <span class="n">ax</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">set_title</span><span class="p">(</span><span class="s">'Order = %d'</span> <span class="o">%</span><span class="p">(</span><span class="n">order</span><span class="p">))</span>
<span class="n">ax</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">plot</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">fx</span><span class="p">(</span><span class="n">X</span><span class="p">),</span> <span class="s">'C0o'</span><span class="p">,</span> <span class="n">ms</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">mec</span><span class="o">=</span><span class="s">'k'</span><span class="p">,</span> <span class="n">mew</span><span class="o">=</span><span class="mf">1.5</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Training observations'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">plot</span><span class="p">(</span><span class="n">Xplot</span><span class="p">,</span> <span class="n">fx</span><span class="p">(</span><span class="n">Xplot</span><span class="p">),</span> <span class="s">'--k'</span><span class="p">,</span> <span class="n">lw</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Truth'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">plot</span><span class="p">(</span><span class="n">Xplot</span><span class="p">,</span> <span class="n">y_pred</span><span class="p">,</span> <span class="s">'-C3'</span><span class="p">,</span> <span class="n">lw</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Poly. approx.'</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.8</span><span class="p">)</span>
<span class="n">ax</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">'$x$'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">'$y$'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">set_ylim</span><span class="p">([</span><span class="o">-</span><span class="mf">0.4</span><span class="p">,</span><span class="mf">0.4</span><span class="p">])</span>
<span class="c1">#ax[i].legend()
</span><span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/regularisation1/OLS_regression.png" alt="OLS_regression" style="width: 100%;" /><figcaption>
Figure 2: Polynomial approximations of a trigonometric function.
</figcaption></figure>
<p>However, the picture changes when a small amount of noise is added to the training data with</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>y += np.random.normal(0,noise,len(y))
</pre></td></tr></tbody></table></code></pre></div></div>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/regularisation1/OLS_regression_noise.png" alt="OLS_regression_noise" style="width: 100%;" /><figcaption>
Figure 3: Polynomial approximations of a trigonometric function, with noisy data.
</figcaption></figure>
<p>As with the previous example, $k=8$ isn’t sufficient to describe the true function. However, with the addition of noise, the polynomial fit for $k=15$ is now poor. The model is <em>over-fitting</em> to the noise in the <em>training</em> data, and will generalise poorly to <em>test</em> data not seen during training.</p>
<h2 id="ridge-regression">Ridge regression</h2>
<p>Over-fitting occurs for the 15th order polynomial in the presence of noise because some of the higher-order coefficients become overly inflated. This is seen in Figure 4, where the $k=7,9,11$ coefficients, in particular, are rather large compared to the <em>no-noise</em> case.</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/regularisation1/poly_coefficient.png" alt="poly_coefficient" style="width: 75%;" /><figcaption>
Figure 4: Coefficients for the 15th order polynomial with and without noise.
</figcaption></figure>
<p>To prevent overly large coefficients, we can apply regularisation. Here we choose ridge regularisation [1], which adds an L2-norm penalty term to the loss:</p>
\[\lVert \boldsymbol{y} -\mathbf{A}\boldsymbol{\beta} \rVert_2^2 +\lambda||\boldsymbol{\beta}||_2^2\]
<p>This penalty pushes the coefficients towards zero, thereby discouraging overly large values. The amount of penalisation is controlled by the $\lambda$ parameter. To minimise this new loss we can replace the <code class="language-plaintext highlighter-rouge">np.linalg.lstsq()</code> solver in the <code class="language-plaintext highlighter-rouge">linear</code> class above with the <code class="language-plaintext highlighter-rouge">elastic_net(alpha=0)</code> solver from <em>equadratures</em>, e.g.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre><span class="n">solver</span> <span class="o">=</span> <span class="n">eq</span><span class="p">.</span><span class="n">solver</span><span class="p">.</span><span class="n">elastic_net</span><span class="p">({</span><span class="s">'path'</span><span class="p">:</span><span class="bp">False</span><span class="p">,</span><span class="s">'lambda'</span><span class="p">:</span><span class="mf">0.01</span><span class="p">,</span><span class="s">'alpha'</span><span class="p">:</span><span class="mf">0.0</span><span class="p">})</span>
<span class="n">coeffs</span> <span class="o">=</span> <span class="n">solver</span><span class="p">.</span><span class="n">get_coefficients</span><span class="p">(</span><span class="n">A</span><span class="p">,</span><span class="n">y</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
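<p>For intuition, ridge regression also has a closed-form solution, $\boldsymbol{\beta} = (\mathbf{A}^T\mathbf{A} + \lambda\mathbf{I})^{-1}\mathbf{A}^T\boldsymbol{y}$. Below is a minimal numpy sketch of this (note that, unlike many library implementations, it also penalises the intercept $\beta_0$):</p>

```python
import numpy as np

def ridge_fit(A, y, lam):
    """Closed-form minimiser of ||y - A b||^2 + lam * ||b||^2."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

# Compare coefficient norms against OLS on noisy 15th order data
fx = lambda X: 0.2*np.sin(5*X) + 0.05*np.cos(X) - 0.8*np.sin(0.1*X)
X = np.random.RandomState(42).uniform(-2, 2, 30)
y = fx(X) + np.random.RandomState(1).normal(0, 0.1, 30)
A = np.vander(X, 16, increasing=True)

beta_ols = np.linalg.lstsq(A, y, rcond=None)[0]
beta_ridge = ridge_fit(A, y, 0.01)
print(np.linalg.norm(beta_ridge) < np.linalg.norm(beta_ols))  # True: ridge shrinks the coefficients
```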
<p>Of course, we still have a decision to make: the value of the $\lambda$ parameter. Trying this out on the same data as before, we see that if $\lambda$ is small we tend towards OLS regression, with over-fitting occurring. If $\lambda$ is too large we dampen the coefficients too much, and the polynomial is overly smoothed out. If $\lambda$ is “just right”, we end up with a good fit, despite the noisy data!</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/regularisation1/ridge_regression.png" alt="ridge_regression" style="width: 100%;" /><figcaption>
Figure 5: Ridge regularised polynomial approximations of a trigonometric function, with noisy data.
</figcaption></figure>
<p>Taking this a step further, below is the test error (measured against the true function) across a range of $\lambda$ values. Clearly there is a range of $\lambda$ for which the <em>test error</em> is lower than for OLS regression. In fact, <a href="https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.2517-6161.1974.tb00990.x"><em>Theobald</em></a> [2] proves that there always exists a value of $\lambda$ for which the ridge estimator achieves a lower mean squared error than the OLS estimator. The challenge is finding this value, and that is the subject of the next post!</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/regularisation1/mse_vs_lambda.png" alt="mse_vs_lambda" style="width: 60%;" /><figcaption>
Figure 6: Test mean squared error versus $\lambda$ parameter for ridge regression.
</figcaption></figure>
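<p>The trend in Figure 6 can be reproduced with a simple sweep over $\lambda$. The sketch below uses the closed-form ridge solution rather than the <em>equadratures</em> <code>elastic_net</code> solver, and the noise level and $\lambda$ grid are illustrative choices:</p>

```python
import numpy as np

fx = lambda X: 0.2*np.sin(5*X) + 0.05*np.cos(X) - 0.8*np.sin(0.1*X)
X_train = np.random.RandomState(42).uniform(-2, 2, 30)
y_train = fx(X_train) + np.random.RandomState(1).normal(0, 0.1, 30)
X_test = np.linspace(-2, 2, 200)

A_train = np.vander(X_train, 16, increasing=True)
A_test = np.vander(X_test, 16, increasing=True)

lambdas = np.logspace(-6, 2, 50)
mses = []
for lam in lambdas:
    beta = np.linalg.solve(A_train.T @ A_train + lam*np.eye(16), A_train.T @ y_train)
    mses.append(np.mean((A_test @ beta - fx(X_test))**2))  # test MSE vs the true function

best = lambdas[np.argmin(mses)]
print(best, min(mses) < mses[0])  # an intermediate lambda beats the (near-)OLS end of the sweep
```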
<h1 id="feature-selection-for-high-dimensional-datasets">Feature selection for high dimensional datasets</h1>
<p>A similar scenario occurs when we make $\mathbf{A}$ wider by adding more dimensions to the problem, i.e. if $y=f(x_1,\dots,x_d)$ instead of simply $y=f(x)$ as above. In such cases, models can over-fit to noise in irrelevant independent variables, degrading accuracy and complicating model interpretation. As an example, we consider the well-known piston problem from Kenett et al. [4] (also see <a href="https://equadratures.org/_documentation/tutorial_6.html">this tutorial</a>). This is a non-linear, seven-dimensional problem which outputs the piston cycle time given the seven piston parameters shown below.</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/regularisation1/piston_parameters.png" alt="piston_parameters" style="width: 70%;" /><figcaption>
Table 1: The seven input parameters (independent variables) for the piston problem.
</figcaption></figure>
<p>The dependent quantity of interest, $y$, is the piston’s cycle time, given by</p>
\[C=2\pi\sqrt{\frac{M}{k+S^{2}\frac{P_{0}V_{0}T_{a}}{T_{0}V^{2}}}},\]
<p>where</p>
\[V=\frac{S}{2k}\left(\sqrt{A^{2}+4k\frac{P_{0}V_{0}}{T_{0}}T_{a}}-A\right),\]
<p>and $A=P_{0}S+19.62M-\frac{kV_{0}}{S}$.</p>
<p>Our objective here is to obtain a polynomial approximation $g(\boldsymbol{x})\approx f(\boldsymbol{x})$ for the true function $C=f(M,S,V_0,k,P_0,T_a,T_0)$. The approximation $g(\boldsymbol{x})$ can then be used to understand the piston’s behaviour, as well as to make predictions for new $\boldsymbol{x}$.</p>
<h2 id="generating-the-data">Generating the data</h2>
<p>Instead of using our custom linear regression class, we’ll use <em>equadratures</em> to obtain polynomials for this example. Therefore, the first step is to define our seven input parameters</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
</pre></td><td class="rouge-code"><pre><span class="kn">import</span> <span class="nn">equadratures</span> <span class="k">as</span> <span class="n">eq</span>
<span class="c1"># Define parameters
</span><span class="n">mass</span> <span class="o">=</span> <span class="n">eq</span><span class="p">.</span><span class="n">Parameter</span><span class="p">(</span><span class="n">distribution</span><span class="o">=</span><span class="s">'uniform'</span><span class="p">,</span> <span class="n">lower</span><span class="o">=</span><span class="mf">30.0</span><span class="p">,</span> <span class="n">upper</span><span class="o">=</span><span class="mf">60.0</span><span class="p">,</span> <span class="n">order</span><span class="o">=</span><span class="n">order_parameters</span><span class="p">)</span>
<span class="n">area</span> <span class="o">=</span> <span class="n">eq</span><span class="p">.</span><span class="n">Parameter</span><span class="p">(</span><span class="n">distribution</span><span class="o">=</span><span class="s">'uniform'</span><span class="p">,</span> <span class="n">lower</span><span class="o">=</span><span class="mf">0.005</span><span class="p">,</span> <span class="n">upper</span><span class="o">=</span><span class="mf">0.020</span><span class="p">,</span> <span class="n">order</span><span class="o">=</span><span class="n">order_parameters</span><span class="p">)</span>
<span class="p">...</span> <span class="n">etc</span>
<span class="n">parameters</span> <span class="o">=</span> <span class="p">[</span><span class="n">mass</span><span class="p">,</span> <span class="n">area</span><span class="p">,</span> <span class="n">volume</span><span class="p">,</span> <span class="n">spring</span><span class="p">,</span> <span class="n">pressure</span><span class="p">,</span> <span class="n">ambtemp</span><span class="p">,</span> <span class="n">gastemp</span><span class="p">]</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>along with our piston model</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
</pre></td><td class="rouge-code"><pre><span class="c1"># Define model
</span><span class="k">def</span> <span class="nf">piston</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="n">mass</span><span class="p">,</span> <span class="n">area</span><span class="p">,</span> <span class="n">volume</span><span class="p">,</span> <span class="n">spring</span><span class="p">,</span> <span class="n">pressure</span><span class="p">,</span> <span class="n">ambtemp</span><span class="p">,</span> <span class="n">gastemp</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">x</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">x</span><span class="p">[</span><span class="mi">3</span><span class="p">],</span> <span class="n">x</span><span class="p">[</span><span class="mi">4</span><span class="p">],</span> <span class="n">x</span><span class="p">[</span><span class="mi">5</span><span class="p">],</span> <span class="n">x</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span>
<span class="n">A</span> <span class="o">=</span> <span class="n">pressure</span> <span class="o">*</span> <span class="n">area</span> <span class="o">+</span> <span class="mf">19.62</span><span class="o">*</span><span class="n">mass</span> <span class="o">-</span> <span class="p">(</span><span class="n">spring</span> <span class="o">*</span> <span class="n">volume</span><span class="p">)</span><span class="o">/</span><span class="p">(</span><span class="mf">1.0</span> <span class="o">*</span> <span class="n">area</span><span class="p">)</span>
<span class="n">V</span> <span class="o">=</span> <span class="p">(</span><span class="n">area</span><span class="o">/</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">spring</span><span class="p">))</span> <span class="o">*</span> <span class="p">(</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">A</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="mi">4</span><span class="o">*</span><span class="n">spring</span> <span class="o">*</span> <span class="n">pressure</span> <span class="o">*</span> <span class="n">volume</span> <span class="o">*</span> <span class="n">ambtemp</span><span class="o">/</span><span class="n">gastemp</span><span class="p">)</span> <span class="o">-</span> <span class="n">A</span><span class="p">)</span>
<span class="n">C</span> <span class="o">=</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">pi</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">mass</span><span class="o">/</span><span class="p">(</span><span class="n">spring</span> <span class="o">+</span> <span class="n">area</span><span class="o">**</span><span class="mi">2</span> <span class="o">*</span> <span class="n">pressure</span> <span class="o">*</span> <span class="n">volume</span> <span class="o">*</span> <span class="n">ambtemp</span><span class="o">/</span><span class="p">(</span><span class="n">gastemp</span> <span class="o">*</span> <span class="n">V</span><span class="o">**</span><span class="mi">2</span><span class="p">)))</span>
<span class="k">return</span> <span class="n">C</span>
</pre></td></tr></tbody></table></code></pre></div></div>
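<p>As a quick sanity check of this model, we can evaluate it at mid-range parameter values. The values below follow the standard formulation of the piston problem; only the mass and area ranges appear explicitly in the snippet above, so the remaining mid-range values are assumptions on my part:</p>

```python
import numpy as np

def piston(x):
    """Piston cycle time (s), as defined above."""
    mass, area, volume, spring, pressure, ambtemp, gastemp = x
    A = pressure*area + 19.62*mass - (spring*volume)/area
    V = (area/(2.0*spring)) * (np.sqrt(A**2 + 4.0*spring*pressure*volume*ambtemp/gastemp) - A)
    return 2.0*np.pi*np.sqrt(mass/(spring + area**2 * pressure*volume*ambtemp/(gastemp*V**2)))

# Assumed mid-range values: M=45, S=0.0125, V0=0.006, k=3000, P0=1e5, Ta=293, T0=350
x_mid = np.array([45.0, 0.0125, 0.006, 3000.0, 1e5, 293.0, 350.0])
C = piston(x_mid)
print(C)  # a positive cycle time, of order a second or less
```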
<p>To generate our dataset we then randomly sample our parameters $N=1000$ times using the <code class="language-plaintext highlighter-rouge">get_samples()</code> method, and evaluate our <code class="language-plaintext highlighter-rouge">piston()</code> model at each of these sample points.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
</pre></td><td class="rouge-code"><pre><span class="n">N</span> <span class="o">=</span> <span class="mi">1000</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">([</span><span class="n">N</span><span class="p">,</span><span class="n">d</span><span class="o">+</span><span class="n">Nnoise</span><span class="p">])</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">d</span><span class="p">):</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">parameters</span><span class="p">[</span><span class="n">j</span><span class="p">].</span><span class="n">get_samples</span><span class="p">(</span><span class="n">N</span><span class="p">)</span>
<span class="n">X</span><span class="p">[:,</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">x</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">piston</span><span class="p">(</span><span class="n">X</span><span class="p">.</span><span class="n">T</span><span class="p">)</span> <span class="o">+</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">RandomState</span><span class="p">(</span><span class="mi">42</span><span class="p">).</span><span class="n">normal</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span><span class="mf">0.15</span><span class="p">,</span><span class="n">N</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>To add some realism, Gaussian noise has been added to the model output $\boldsymbol{y}$ to represent measurement noise, and some irrelevant noise variables are added to the input data $\boldsymbol{x}$.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
</pre></td><td class="rouge-code"><pre><span class="c1"># Add irrelevant noise parameters to X
</span><span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">d</span><span class="p">,</span><span class="n">d</span><span class="o">+</span><span class="n">Nnoise</span><span class="p">):</span>
<span class="n">X</span><span class="p">[:,</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">RandomState</span><span class="p">(</span><span class="mi">42</span><span class="p">).</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="n">noise</span><span class="p">,</span><span class="n">noise</span><span class="p">,</span><span class="n">N</span><span class="p">)</span>
<span class="n">parameters</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">eq</span><span class="p">.</span><span class="n">Parameter</span><span class="p">(</span><span class="n">distribution</span><span class="o">=</span><span class="s">'uniform'</span><span class="p">,</span> <span class="n">lower</span><span class="o">=-</span><span class="n">noise</span><span class="p">,</span> <span class="n">upper</span><span class="o">=</span><span class="n">noise</span><span class="p">,</span> <span class="n">order</span><span class="o">=</span><span class="n">order_parameters</span><span class="p">))</span>
<span class="c1"># Split into train and test data (70/30 train/test)
</span><span class="n">X_train</span><span class="p">,</span> <span class="n">X_test</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">y_test</span> <span class="o">=</span> <span class="n">eq</span><span class="p">.</span><span class="n">datasets</span><span class="p">.</span><span class="n">train_test_split</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">random_seed</span><span class="o">=</span><span class="mi">42</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<h2 id="model-interpretation-with-sobol-indicies">Model interpretation with Sobol’ indices</h2>
<p>As covered in <a href="https://equadratures.org/_documentation/tutorial_6.html">this tutorial</a>, the polynomial approximation can be used to estimate Sobol’ indices, which measure the effect of varying each of the 7 input parameters on the output, the piston cycle time $C$. We’ll compare these estimated indices to the <em>true</em> values shown below.</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/regularisation1/true_sobol.png" alt="true_sobol" style="width: 65%;" /><figcaption>
Table 2: The true Sobol’ indices for the piston problem, from [4].
</figcaption></figure>
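<p>To build some intuition for what these indices measure, a first-order Sobol’ index can also be estimated directly by Monte Carlo. The sketch below uses a Saltelli-style estimator on a made-up additive toy function; it is purely illustrative and is not the piston model, nor the polynomial-based estimator used by <em>equadratures</em>:</p>

```python
import numpy as np

def first_order_sobol(f, d, n=20000, seed=0):
    """Saltelli-style Monte Carlo estimate of first-order Sobol' indices."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1, 1, (n, d))
    B = rng.uniform(-1, 1, (n, d))
    fA, fB = f(A), f(B)
    var = np.var(np.concatenate([fA, fB]))
    S = np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                      # swap in column i from B
        S[i] = np.mean(fB * (f(ABi) - fA)) / var  # first-order variance share
    return S

# Toy additive model: analytically S = [0.8, 0.2, 0.0]
f = lambda X: 2.0 * X[:, 0] + X[:, 1]
S = first_order_sobol(f, d=3)
```

<p>Each index is the fraction of the output variance explained by one input alone; inputs the function never uses come out with an index of (approximately) zero, which is exactly the behaviour we hope to recover for the noise variables below.</p>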
<p>Since in this case the goal is <em>model interpretation</em>, we choose to use LASSO regularisation instead of the ridge regularisation used earlier. The LASSO [5] instead uses an L1-norm penalty term:</p>
\[\lVert \boldsymbol{y} -\mathbf{A}\boldsymbol{\beta} \rVert_2^2 +\lambda||\boldsymbol{\beta}||_1.\]
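<p>Minimising this objective one coordinate at a time yields the classic soft-thresholding update, which is where the LASSO’s exact zeros come from. Below is a minimal numpy sketch on toy data (a bare-bones coordinate descent for illustration only, not the <em>equadratures</em> elastic-net solver; it uses the $\frac{1}{2}\lVert \boldsymbol{y} - \mathbf{A}\boldsymbol{\beta} \rVert_2^2$ scaling convention):</p>

```python
import numpy as np

def lasso_cd(A, y, lam, n_iter=500):
    """LASSO by cyclic coordinate descent on (1/2)||y - A b||^2 + lam*||b||_1."""
    n, d = A.shape
    beta = np.zeros(d)
    col_sq = (A ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(d):
            # Correlation of column j with the partial residual (beta_j removed)
            rho = A[:, j] @ (y - A @ beta + A[:, j] * beta[j])
            # Soft-thresholding: coefficients with |rho| < lam are set exactly to zero
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 6))
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0])  # sparse ground truth
y = A @ beta_true + 0.1 * rng.normal(size=100)
beta = lasso_cd(A, y, lam=10.0)  # expect exact zeros for the irrelevant columns
```

<p>The thresholding step is the key difference from ridge regression, which only ever scales coefficients down without zeroing them.</p>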
<p>Whereas the ridge penalty pushes all the coefficients towards each other, allowing them to borrow strength from each other, the LASSO penalty tends to return sparse solutions, selecting a small number of coefficients and pushing the rest to zero. To examine the benefits of this we fit 3rd order polynomials to the training data with OLS and LASSO regression.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="rouge-code"><pre><span class="c1"># Define the basis and polynomials
</span><span class="n">mybasis</span> <span class="o">=</span> <span class="n">eq</span><span class="p">.</span><span class="n">Basis</span><span class="p">(</span><span class="s">'total-order'</span><span class="p">)</span>
<span class="c1"># OLS regression
</span><span class="n">olspoly</span> <span class="o">=</span> <span class="n">eq</span><span class="p">.</span><span class="n">Poly</span><span class="p">(</span><span class="n">parameters</span><span class="o">=</span><span class="n">parameters</span><span class="p">,</span> <span class="n">basis</span><span class="o">=</span><span class="n">mybasis</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s">'least-squares'</span><span class="p">,</span> \
<span class="n">sampling_args</span><span class="o">=</span> <span class="p">{</span><span class="s">'mesh'</span><span class="p">:</span> <span class="s">'user-defined'</span><span class="p">,</span> <span class="s">'sample-points'</span><span class="p">:</span><span class="n">X_train</span><span class="p">,</span> <span class="s">'sample-outputs'</span><span class="p">:</span> <span class="n">y_train</span><span class="p">})</span>
<span class="c1"># LASSO regression (elastic net with alpha=1.0 gives LASSO)
</span><span class="n">lassopoly</span> <span class="o">=</span> <span class="n">eq</span><span class="p">.</span><span class="n">Poly</span><span class="p">(</span><span class="n">parameters</span><span class="o">=</span><span class="n">parameters</span><span class="p">,</span> <span class="n">basis</span><span class="o">=</span><span class="n">mybasis</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s">'elastic-net'</span><span class="p">,</span> \
<span class="n">sampling_args</span><span class="o">=</span> <span class="p">{</span><span class="s">'mesh'</span><span class="p">:</span> <span class="s">'user-defined'</span><span class="p">,</span> <span class="s">'sample-points'</span><span class="p">:</span><span class="n">X_train</span><span class="p">,</span> <span class="s">'sample-outputs'</span><span class="p">:</span> <span class="n">y_train</span><span class="p">},</span>
<span class="n">solver_args</span><span class="o">=</span><span class="p">{</span><span class="s">'path'</span><span class="p">:</span><span class="bp">False</span><span class="p">,</span><span class="s">'alpha'</span><span class="p">:</span><span class="mf">1.0</span><span class="p">,</span><span class="s">'lambda'</span><span class="p">:</span><span class="mf">6.25e-03</span><span class="p">})</span>
<span class="n">olspoly</span><span class="p">.</span><span class="n">set_model</span><span class="p">()</span>
<span class="n">lassopoly</span><span class="p">.</span><span class="n">set_model</span><span class="p">()</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Using <code class="language-plaintext highlighter-rouge">...poly.get_sobol_indices(1)</code> then gives the Sobol’ indices, which are plotted below.</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/regularisation1/poly_sobol.png" alt="poly_sobol" style="width: 75%;" /><figcaption>
Figure 7: True and estimated Sobol’ indices, using OLS and LASSO regression.
</figcaption></figure>
<p>In a similar fashion to the previous example, OLS is seen to overfit to noise in the training data. Except this time, instead of overfitting to higher order $x$ terms, it is overfitting to the irrelevant independent variables <em>Noise 0</em> and <em>Noise 1</em>. The Sobol’ indices for these terms are large, which is clearly incorrect since we know they do not actually influence $C$ in the true model. Adding LASSO regularisation removes this problematic behaviour, whilst also improving the estimates for the remaining parameters.</p>
<h2 id="test-accuracy">Test accuracy</h2>
<p>So the LASSO clearly aids model interpretation in this example. To see if it can aid the approximation accuracy once again, let’s examine its performance across a range of $\lambda$ values by plotting the <em>test</em> MSE versus $\lambda$.</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/regularisation1/piston_mse_vs_lambda.png" alt="piston_mse_vs_lambda" style="width: 60%;" /><figcaption>
Figure 8: Test mean squared error versus $\lambda$ parameter for LASSO regression.
</figcaption></figure>
<p>As with the first example, there is a range of $\lambda$ values where the <em>test</em> error is improved over the OLS regression.</p>
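<p>A sweep like the one in Figure 8 is straightforward to produce: fit once per $\lambda$ and record the held-out MSE. The sketch below follows that looping pattern on a toy polynomial problem, using a closed-form ridge solve for brevity (the structure is identical for the LASSO, only the per-$\lambda$ fit changes; all data here is synthetic):</p>

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = 2 * x**3 - x + 0.2 * rng.normal(size=200)   # noisy cubic ground truth
A = np.vander(x, 9, increasing=True)            # deliberately high (8th) order
A_tr, A_te, y_tr, y_te = A[:140], A[140:], y[:140], y[140:]

# Sweep lambda on a log grid and record the held-out test MSE for each fit
lambdas = np.logspace(-6, 2, 30)
test_mse = [
    np.mean((A_te @ np.linalg.solve(A_tr.T @ A_tr + lam * np.eye(9), A_tr.T @ y_tr)
             - y_te) ** 2)
    for lam in lambdas
]
best_lam = lambdas[int(np.argmin(test_mse))]
```

<p>Plotting <code>test_mse</code> against <code>lambdas</code> on a log axis gives the characteristic U-shaped (or at least non-monotonic) curve: too little regularisation overfits, too much underfits.</p>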
<h1 id="conclusions">Conclusions</h1>
<p>In this post we’ve seen how regularisation can help control the polynomial coefficients when fitting high order or high dimensional polynomials. In fact, including regularisation can be even more important when the polynomial is both high order <strong>and</strong> high dimensional. Choosing between ridge and LASSO regularisation can be problem-dependent; LASSO can act as a form of feature selection, yielding sparse solutions with only the important dimensions/features retained. However, LASSO can struggle in the presence of strong collinearity, and in this case ridge regression is preferable. In <em>equadratures</em>, regularisation is achieved via the <code class="language-plaintext highlighter-rouge">elastic-net</code> solver. The elastic net blends the ridge and LASSO penalty terms together in order to achieve a compromise between the two, as will be explored in a future blog post.</p>
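<p>For reference, one common parameterisation of the blended penalty (the glmnet-style convention, in which $\alpha=1$ recovers the LASSO and $\alpha=0$ recovers ridge, consistent with the <code class="language-plaintext highlighter-rouge">alpha</code> solver argument used earlier; whether <em>equadratures</em> uses exactly this scaling internally is not shown here) looks like:</p>

```python
import numpy as np

def elastic_net_penalty(beta, lam, alpha):
    """glmnet-style blend: alpha=1.0 gives the LASSO (L1), alpha=0.0 gives ridge (L2)."""
    l1 = np.sum(np.abs(beta))
    l2 = 0.5 * np.sum(beta ** 2)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)
```

<p>Sweeping $\alpha$ between 0 and 1 therefore interpolates smoothly between the two behaviours discussed above.</p>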
<p>In both the examples explored here selection of a suitable $\lambda$ parameter resulted in the test error being reduced relative to OLS regression. However, we didn’t explore how to determine the <em>optimal</em> value of $\lambda$. Efficiently finding the <em>optimal</em> value is an important topic which will also be explored in a future blog post!</p>
<h1 id="references">References</h1>
<p>[1]: Hoerl, A. E.; R. W. Kennard (1970). “<a href="https://www.tandfonline.com/doi/abs/10.1080/00401706.1970.10488634"><em>Ridge regression: Biased estimation for nonorthogonal problems</em></a>”. Technometrics. <strong>12</strong> (1): 55–67.</p>
<p>[2]: Theobald, C. M. (1974) “<a href="https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.2517-6161.1974.tb00990.x"><em>Generalizations of mean square error applied to ridge regression</em></a>”, Journal of the Royal Statistical Society, Series B (Methodological), <strong>36</strong>, 103-106.</p>
<p>[3]: Tibshirani, Robert (1996). “<a href="https://rss.onlinelibrary.wiley.com/doi/10.1111/j.2517-6161.1996.tb02080.x"><em>Regression Shrinkage and Selection via the lasso</em></a>”. Journal of the Royal Statistical Society. Series B (methodological). Wiley. <strong>58</strong> (1): 267–88.</p>
<p>[4]: Kenett, R., Shelemyahu Z., and Daniele A., (2013) “<a href="https://www.wiley.com/en-gb/Modern+Industrial+Statistics%3A+with+applications+in+R%2C+MINITAB+and+JMP%2C+2nd+Edition-p-9781118456064"><em>Modern Industrial Statistics: with applications in R, MINITAB and JMP</em></a>”. John Wiley & Sons.</p>
<p>[5]: Tibshirani, Robert (1996). “<a href="https://rss.onlinelibrary.wiley.com/doi/10.1111/j.2517-6161.1996.tb02080.x"><em>Regression Shrinkage and Selection via the lasso</em></a>”. Journal of the Royal Statistical Society. Series B (methodological). Wiley. <strong>58</strong> (1): 267–88.</p>
<h1 id="exploring-the-design-of-a-temperature-probe-with-dimension-reduction"><a href="https://ascillitoe.github.io/posts/2020/Exploring-the-design-of-a-temperature-probe-with-dimension-reduction">Exploring the design of a temperature probe with dimension reduction</a></h1>
<p><em>2020-10-04</em></p>
<p><a href="https://github.com/ascillitoe/probe-subspaces" class="btn btn--primary"><i class="fas fa-download"></i> Get code</a></p>
<p>In many engineering design tasks, we suffer from the <em>curse of dimensionality</em>. The number of design parameters quickly becomes too large for us to effectively visualise or explore the design space. We could attempt to use a design optimisation procedure to arrive at an “optimal” design; however, the curse of dimensionality makes such an optimisation computationally expensive. Also, in many cases we wish to <em>understand</em> the design space, not just spit out a “better” design.</p>
<p>This brings us to <em>dimension reduction</em>, a set of ideas which allows us to reduce high dimensional spaces to lower dimensional ones. By reducing the design space to a small number of dimensions, we can more easily explore it, allowing for:</p>
<ol>
<li>A better physical understanding.</li>
<li>Exploration of previously unexplored areas of the design space.</li>
<li>Obtaining new “better” designs.</li>
<li>Assessing sensitivity to manufacturing uncertainties.</li>
</ol>
<p>This post summarises our recent <a href="https://www.researchgate.net/publication/344362850_Design_Space_Exploration_of_Stagnation_Temperature_Probes_via_Dimension_Reduction">ASME Turbo Expo 2020 paper</a>, where we use <em>equadratures</em> to perform dimension reduction on the design space of a stagnation temperature probe used in aircraft jet engines.</p>
<h2 id="part-1-obtaining-dimension-reducing-subspaces">Part 1: Obtaining dimension reducing subspaces</h2>
<h3 id="the-probes-design-space">The probe’s design space</h3>
<p>The temperature probe considered is shown in Figure 1. The probe design is parameterised by seven design parameters, so our input design vector lies in a 7D design space $\mathbf{x} \in \mathbb{R}^7$.</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/probe_design/probe_schematic.png" alt="Baseline probe design" style="width: ;" /><figcaption>
Figure 1: The baseline temperature probe design.
</figcaption></figure>
<p>Probes such as this are used in aero engines to measure the stagnation temperature in the flow. Ideally, we wish to bring the flow to rest isentropically, so that the temperature measured at the thermocouple $T_m$ is equal to the stagnation temperature $T_0$. However, in reality, various error sources mean the probe’s <em>recovery ratio</em> $R_r=T_m/T_0$ is always less than one.</p>
<p>In the paper we consider two design objectives:</p>
<ol>
<li>
<p>To reduce measurement errors, we attempt to minimise the recovery ratio’s sensitivity to Mach number $\partial R_r/\partial M$.</p>
</li>
<li>
<p>To reduce the probe’s contamination of the surrounding flow, we wish to minimise the probe’s pressure loss coefficient $Y_p = (P_{0,in}-P_{0,out})/(P_{0,in}-P_{out})$, averaged across the Mach number range.</p>
</li>
</ol>
<p>Both of these design objectives, $O_{R_r}$ and $O_{Y_p}$, are a function of our 7D design space $\mathbf{x}$:</p>
\[O_{Y_p} = g(\mathbf{x}) \\
O_{R_r} = f(\mathbf{x})\]
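<p>To make these definitions concrete, the sketch below evaluates both quantities from post-processed values along a Mach sweep. All numbers and helper names here are hypothetical (invented for illustration), and the scalarisation of $\partial R_r/\partial M$ into a single objective is just one plausible choice, not necessarily the one used in the paper:</p>

```python
import numpy as np

def yp(P0_in, P0_out, P_out):
    """Pressure loss coefficient Y_p = (P0_in - P0_out) / (P0_in - P_out)."""
    return (P0_in - P0_out) / (P0_in - P_out)

# Hypothetical post-processed CFD values across a Mach sweep (illustration only)
mach = np.array([0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
Rr   = np.array([0.995, 0.993, 0.990, 0.986, 0.981, 0.975])   # Tm/T0 at each Mach

# Objective 1: worst-case sensitivity of the recovery ratio to Mach number
O_Rr = np.max(np.abs(np.gradient(Rr, mach)))
# Objective 2: pressure loss coefficient averaged over the Mach sweep
O_Yp = np.mean([yp(101.0, 99.5, 60.0), yp(101.2, 99.4, 59.0)])
```

<p>Both objectives collapse a whole Mach sweep into a single scalar per design, which is what makes the dimension-reduction machinery below applicable.</p>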
<h3 id="design-of-experiment">Design of experiment</h3>
<p>Before we can do any dimension reduction, we need some data! To sample the design space, 128 designs are drawn uniformly within the limits of the design space (using Latin hypercube sampling). These are meshed using @bubald’s great mesh morphing code, and put through a CFD solver (at 6 different Mach numbers). We end up with 128 unique design vectors $\mathbf{x}$, and we post-process the 768 CFD results to obtain 128 values of $O_{Y_p}$ and $O_{R_r}$.</p>
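<p>For reference, Latin hypercube sampling can be written in a few lines of numpy: each dimension is split into $N$ equal strata and each stratum receives exactly one sample. The bounds below are placeholders (the true design limits from the paper are not reproduced here):</p>

```python
import numpy as np

def latin_hypercube(n, d, lower, upper, seed=None):
    """Draw n Latin hypercube samples in the d-dimensional box [lower, upper]."""
    rng = np.random.default_rng(seed)
    # A uniform random permutation of the n strata in each dimension
    strata = np.argsort(rng.uniform(size=(n, d)), axis=0)
    # Jitter each sample uniformly within its stratum, then scale to [0, 1)
    u = (strata + rng.uniform(size=(n, d))) / n
    return lower + u * (upper - lower)

lower = np.zeros(7)   # placeholder design-space bounds
upper = np.ones(7)
X = latin_hypercube(128, 7, lower, upper, seed=42)
```

<p>Compared with plain uniform sampling, this stratification guarantees every dimension is covered evenly, which matters when CFD budgets limit us to 128 runs.</p>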
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/probe_design/Mach_contours.jpg" alt="Mach contours" style="width: 60%;" /><figcaption>
Figure 2: Mach number contours for a selected probe design.
</figcaption></figure>
<h3 id="dimension-reduction-via-variable-projection">Dimension reduction via variable projection</h3>
<p>As before, let $\mathbf{x} \in \mathbb{R}^{d}$ (with $d=7$) represent a <em>sample</em> within our design space $\chi$ and within this space let $f \left( \mathbf{x} \right)$ represent our aerothermal functional, which could be either $O_{Y_{p}}\left( \mathbf{x} \right) $ or $O_{R_{r}}\left( \mathbf{x} \right) $. Our goal is to construct the approximation</p>
\[f \left( \mathbf{x} \right) \approx h \left( \mathbf{U}^{T} \mathbf{x} \right),\]
<p>where $\mathbf{U} \in \mathbb{R}^{d \times m}$ is an orthogonal matrix with $m \ll d$, implying that $h$ is a polynomial function of $m$ variables—ideally $m=1$ or $m=2$ to facilitate easy visualization. In addition to $m$, the polynomial order of $h$, given by $k$, must also be chosen. The matrix $\mathbf{U}$ isolates $m$ linear combinations of <em>all</em> the design parameters that are deemed sufficient for approximating $f$ with $h$. <em>equadratures</em> possesses two methods for determining the unknowns $\mathbf{U}$ and $h$; the <code class="language-plaintext highlighter-rouge">method=active-subspace</code> uses ideas from [1] to compute a dimension-reducing subspace with a global polynomial approximant, whilst <code class="language-plaintext highlighter-rouge">method=variable-projection</code> [2] solves a Gauss-Newton optimisation problem to compute both the polynomial coefficients and its subspace. Both of these methods involve finding solutions to the non-linear least squares problem</p>
\[\underset{\mathbf{U}, \boldsymbol{\alpha}}{\text{minimize}} \; \; \left\Vert f\left(\mathbf{x}\right)-h_{\boldsymbol{\alpha}}\left(\mathbf{U}^{T} \mathbf{x}\right)\right\Vert _{2}^{2},\]
<p>where $\boldsymbol{\alpha}$ represents unknown model variables associated with $h$. In practice, to solve this optimization problem, we assemble the $N=128$ input-output data pairs</p>
\[\mathbf{X}=\left[\begin{array}{c}
\mathbf{x}_{1}^{T}\\
\vdots\\
\mathbf{x}_{N}^{T}
\end{array}\right], \; \; \; \; \mathbf{f}=\left[\begin{array}{c}
f_{1}\\
\vdots\\
f_{N}
\end{array}\right],\]
<p>and replace $f \left( \mathbf{x} \right)$ in the least squares problem above with the evaluations $\mathbf{f}$. To do this in <em>equadratures</em> it is as simple as:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
</pre></td><td class="rouge-code"><pre><span class="n">m_Y</span> <span class="o">=</span> <span class="mi">1</span> <span class="c1"># Number of reduced dimensions we want
</span><span class="n">k_Y</span> <span class="o">=</span> <span class="mi">1</span> <span class="c1"># Polynomial order
</span>
<span class="c1"># Find a dimension reducing subspace for OYp
</span><span class="n">mysubspace_Y</span> <span class="o">=</span> <span class="n">Subspaces</span><span class="p">(</span><span class="n">method</span><span class="o">=</span><span class="s">'variable-projection'</span><span class="p">,</span> <span class="n">sample_points</span><span class="o">=</span><span class="n">X</span><span class="p">,</span> <span class="n">sample_outputs</span><span class="o">=</span><span class="n">OYp</span><span class="p">,</span> <span class="n">polynomial_degree</span><span class="o">=</span><span class="n">k_Y</span><span class="p">,</span> <span class="n">subspace_dimension</span><span class="o">=</span><span class="n">m_Y</span><span class="p">)</span>
<span class="c1"># Get the subspace Poly for use later
</span><span class="n">subpoly_Y</span> <span class="o">=</span> <span class="n">mysubspace_Y</span><span class="p">.</span><span class="n">get_subspace_polynomial</span><span class="p">()</span>
</pre></td></tr></tbody></table></code></pre></div></div>
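<p>As a sanity check on what variable projection recovers, consider the degenerate case $k=m=1$: the model $f(\mathbf{x}) \approx a_0 + a_1(\mathbf{u}^T\mathbf{x})$ is then just ordinary linear regression, so the dimension-reducing direction is simply the normalised least-squares coefficient vector (up to sign). The toy sketch below verifies this on a noise-free synthetic ridge function, without using <em>equadratures</em>:</p>

```python
import numpy as np

rng = np.random.default_rng(3)
u_true = np.array([3.0, 4.0, 0.0]) / 5.0         # the hidden unit direction
X = rng.normal(size=(200, 3))
f = 1.0 + 2.0 * (X @ u_true)                     # noise-free 1D ridge function

# Ordinary least squares with an intercept column
A = np.hstack([np.ones((200, 1)), X])
coef, *_ = np.linalg.lstsq(A, f, rcond=None)
u_est = coef[1:] / np.linalg.norm(coef[1:])      # normalised direction estimate
```

<p>For $k&gt;1$ or $m&gt;1$ no such closed form exists, which is why the Gauss-Newton machinery of variable projection is needed.</p>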
<p>There remains the question of what values to choose for the number of reduced dimensions $m$ and the polynomial order $k$. This is problem-dependent, but a simple grid search is usually sufficient here. This involves looping through different values, e.g. $m=[1,2,3]$ and $k=[1,2,3]$, evaluating the quality of the resulting dimension reducing approximations, and choosing the values of $k$ and $m$ which give the best approximations. To quantify the quality of the approximations we used adjusted $R^2$ (see <a href="https://en.wikipedia.org/wiki/Coefficient_of_determination#Adjusted_R2">here</a>), which can be calculated with the <code class="language-plaintext highlighter-rouge">score</code> helper function in <code class="language-plaintext highlighter-rouge">equadratures.datasets</code>:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre><span class="n">OYp_pred</span> <span class="o">=</span> <span class="n">subpoly_Y</span><span class="p">.</span><span class="n">get_polyfit</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
<span class="n">r2score</span> <span class="o">=</span> <span class="n">score</span><span class="p">(</span><span class="n">OYp</span><span class="p">,</span> <span class="n">OYp_pred</span><span class="p">,</span> <span class="s">'adjusted_r2'</span><span class="p">,</span> <span class="n">X</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p><strong><em>Note:</em></strong> We measure the $R^2$ scores on the <em>training data</em> here, i.e. the $N=128$ designs we used to obtain the approximations. This is OK in this case since we have only gone up to $k=3$ so we’re not too concerned with <em>overfitting</em>. If you were to try higher polynomial orders it would be important to split the data into <em>train</em> and <em>test</em> data, and examine the $R^2$ scores on the test data to judge how well the approximations generalise to data not seen during training.</p>
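<p>The grid-search logic itself is only a few lines. The sketch below shows the pattern on a 1D toy problem (ordinary least-squares polynomials standing in for the subspace fits, and a hand-rolled adjusted $R^2$ rather than the <code class="language-plaintext highlighter-rouge">score</code> helper):</p>

```python
import numpy as np

def adjusted_r2(y, y_pred, n_params):
    """Adjusted R^2: penalises R^2 for the number of fitted parameters."""
    n = len(y)
    r2 = 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - np.mean(y)) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 100)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.1 * rng.normal(size=100)   # quadratic truth

scores = {}
for k in (1, 2, 3, 4, 5):          # 1D analogue of looping over the (m, k) grid
    A = np.vander(x, k + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    scores[k] = adjusted_r2(y, A @ beta, n_params=k + 1)
```

<p>The adjustment term stops the score from rewarding extra parameters for free, so the chosen $(m, k)$ pair balances fit quality against model complexity.</p>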
<h3 id="the-subspaces">The subspaces</h3>
<p>Upon performing a grid search for $O_{Yp}$ and $O_{Rr}$, we find $k,m=1$ are sufficient for $O_{Yp}$, but $k=3, m=2$ are required for $O_{Rr}$. With these values, we can see in Figure 3 that we get relatively good approximations for $O_{Yp}$ and $O_{Rr}$.</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/probe_design/predict_vs_true.png" alt="Predicted vs true" style="width: 90%;" /><figcaption>
Figure 3: Predicted vs true values for the two dimension reducing approximations. $O_{Yp}$ on left, and $O_{Rr}$ on right.
</figcaption></figure>
<p>Now for the exciting part! The actual dimension reducing subspaces! A <em>sufficient summary plot</em> for $O_{Y_p}$, which summarises its behaviour in its reduced dimensional space, can easily be obtained from the <code class="language-plaintext highlighter-rouge">mysubspace_Y</code> object from earlier:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
</pre></td><td class="rouge-code"><pre><span class="c1"># Get the subspace matrix U
</span><span class="n">W_Y</span> <span class="o">=</span> <span class="n">mysubspace_Y</span><span class="p">.</span><span class="n">get_subspace</span><span class="p">()</span>
<span class="n">U_Y</span> <span class="o">=</span> <span class="n">W_Y</span><span class="p">[:,</span><span class="mi">0</span><span class="p">:</span><span class="n">m_Y</span><span class="p">]</span>
<span class="c1"># Get the reduced dimension design vectors u=U^T.x
</span><span class="n">u_Y</span> <span class="o">=</span> <span class="n">X</span> <span class="o">@</span> <span class="n">U_Y</span>
<span class="c1"># Plot the training data points on the dimension reducing subspace
</span><span class="n">plt</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">u_Y</span><span class="p">,</span> <span class="n">OYp</span><span class="p">,</span> <span class="n">s</span><span class="o">=</span><span class="mi">70</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="n">OYp</span><span class="p">,</span> <span class="n">marker</span><span class="o">=</span><span class="s">'o'</span><span class="p">,</span> <span class="n">edgecolors</span><span class="o">=</span><span class="s">'k'</span><span class="p">,</span> <span class="n">linewidths</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="n">cm</span><span class="p">.</span><span class="n">coolwarm</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Training designs'</span><span class="p">)</span>
<span class="c1"># Plot the subspace polynomial
</span><span class="n">u_samples</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">min</span><span class="p">(</span><span class="n">u_Y</span><span class="p">[:,</span><span class="mi">0</span><span class="p">]),</span> <span class="n">np</span><span class="p">.</span><span class="nb">max</span><span class="p">(</span><span class="n">u_Y</span><span class="p">[:,</span><span class="mi">0</span><span class="p">]),</span> <span class="mi">100</span><span class="p">)</span>
<span class="n">OYp_poly</span> <span class="o">=</span> <span class="n">subpoly_Y</span><span class="p">.</span><span class="n">get_polyfit</span><span class="p">(</span> <span class="n">u_samples</span> <span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">u_samples</span><span class="p">,</span> <span class="n">OYp_poly</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="s">'C2'</span><span class="p">,</span> <span class="n">lw</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Polynomial approx.'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/probe_design/oyp_summary_plot.png" alt="OYp summary plot" style="width: 50%;" /><figcaption>
Figure 4: Sufficient summary plot for the $O_{Yp}$ design objective.
</figcaption></figure>
<p>This shows that we have successfully mapped the original 7D function $O_{Y_p} = g(\mathbf{x})$ onto a 1D subspace $\mathbf{u}_Y = \mathbf{U}_Y^{T} \mathbf{x}$, and in this case $O_{Y_p}$ varies linearly with $\mathbf{u}_{Y}$.</p>
<p>Following a similar approach, but this time for the 2D $O_{R_r}$ subspace, gives us the summary plot shown in Figure 5. This one is especially interesting. $O_{R_r}$ appears to vary quadratically in one direction, with a clear minimum around $u_{R,1}\approx1$, while it decreases relatively linearly in the second direction $u_{R,2}$.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
</pre></td><td class="rouge-code"><pre><span class="n">m_R</span> <span class="o">=</span> <span class="mi">2</span> <span class="c1"># Number of reduced dimensions we want
</span><span class="n">k_R</span> <span class="o">=</span> <span class="mi">3</span> <span class="c1"># Polynomial order
</span>
<span class="c1"># Find a dimension reducing subspace for ORr
</span><span class="n">mysubspace_R</span> <span class="o">=</span> <span class="n">Subspaces</span><span class="p">(</span><span class="n">method</span><span class="o">=</span><span class="s">'variable-projection'</span><span class="p">,</span> <span class="n">sample_points</span><span class="o">=</span><span class="n">X</span><span class="p">,</span> <span class="n">sample_outputs</span><span class="o">=</span><span class="n">ORr</span><span class="p">,</span> <span class="n">polynomial_degree</span><span class="o">=</span><span class="n">k_R</span><span class="p">,</span> <span class="n">subspace_dimension</span><span class="o">=</span><span class="n">m_R</span><span class="p">)</span>
<span class="n">subpoly_R</span> <span class="o">=</span> <span class="n">mysubspace_R</span><span class="p">.</span><span class="n">get_subspace_polynomial</span><span class="p">()</span>
<span class="n">W_R</span> <span class="o">=</span> <span class="n">mysubspace_R</span><span class="p">.</span><span class="n">get_subspace</span><span class="p">()</span>
<span class="n">U_R</span> <span class="o">=</span> <span class="n">W_R</span><span class="p">[:,</span><span class="mi">0</span><span class="p">:</span><span class="n">m_R</span><span class="p">]</span>
<span class="n">u_R</span> <span class="o">=</span> <span class="n">X</span> <span class="o">@</span> <span class="n">U_R</span>
<span class="c1"># Plot the training data as a 3d scatter plot
</span><span class="n">figRr</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span><span class="mi">10</span><span class="p">))</span>
<span class="n">axRr</span> <span class="o">=</span> <span class="n">figRr</span><span class="p">.</span><span class="n">add_subplot</span><span class="p">(</span><span class="mi">111</span><span class="p">,</span> <span class="n">projection</span><span class="o">=</span><span class="s">'3d'</span><span class="p">)</span>
<span class="n">axRr</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">u_R</span><span class="p">[:,</span><span class="mi">0</span><span class="p">],</span> <span class="n">u_R</span><span class="p">[:,</span><span class="mi">1</span><span class="p">],</span> <span class="n">ORr</span><span class="p">,</span> <span class="n">s</span><span class="o">=</span><span class="mi">70</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="n">ORr</span><span class="p">,</span> <span class="n">marker</span><span class="o">=</span><span class="s">'o'</span><span class="p">,</span> <span class="n">ec</span><span class="o">=</span><span class="s">'k'</span><span class="p">,</span> <span class="n">lw</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="c1"># Plot the Poly approx as a 3D surface
</span><span class="n">N</span> <span class="o">=</span> <span class="mi">20</span>
<span class="n">ur1_samples</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">min</span><span class="p">(</span><span class="n">u_R</span><span class="p">[:,</span><span class="mi">0</span><span class="p">]),</span> <span class="n">np</span><span class="p">.</span><span class="nb">max</span><span class="p">(</span><span class="n">u_R</span><span class="p">[:,</span><span class="mi">0</span><span class="p">]),</span> <span class="n">N</span><span class="p">)</span>
<span class="n">ur2_samples</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">min</span><span class="p">(</span><span class="n">u_R</span><span class="p">[:,</span><span class="mi">1</span><span class="p">]),</span> <span class="n">np</span><span class="p">.</span><span class="nb">max</span><span class="p">(</span><span class="n">u_R</span><span class="p">[:,</span><span class="mi">1</span><span class="p">]),</span> <span class="n">N</span><span class="p">)</span>
<span class="p">[</span><span class="n">ur1</span><span class="p">,</span> <span class="n">ur2</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">meshgrid</span><span class="p">(</span><span class="n">ur1_samples</span><span class="p">,</span> <span class="n">ur2_samples</span><span class="p">)</span>
<span class="n">ur1_vec</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">ur1</span><span class="p">,</span> <span class="p">(</span><span class="n">N</span><span class="o">*</span><span class="n">N</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">ur2_vec</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">ur2</span><span class="p">,</span> <span class="p">(</span><span class="n">N</span><span class="o">*</span><span class="n">N</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">samples</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">hstack</span><span class="p">([</span><span class="n">ur1_vec</span><span class="p">,</span> <span class="n">ur2_vec</span><span class="p">])</span>
<span class="n">ORr_poly</span> <span class="o">=</span> <span class="n">subpoly_R</span><span class="p">.</span><span class="n">get_polyfit</span><span class="p">(</span><span class="n">samples</span><span class="p">).</span><span class="n">reshape</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">)</span>
<span class="n">surf</span> <span class="o">=</span> <span class="n">axRr</span><span class="p">.</span><span class="n">plot_surface</span><span class="p">(</span><span class="n">ur1</span><span class="p">,</span> <span class="n">ur2</span><span class="p">,</span> <span class="n">ORr_poly</span><span class="p">,</span> <span class="n">rstride</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">cstride</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="n">cm</span><span class="p">.</span><span class="n">gist_earth</span><span class="p">,</span> <span class="n">lw</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.5</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<div align="center">
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/probe_design/orr_summary_plot.gif" alt="ORr summary plot" style="width: 75%;" /><figcaption>
Figure 5: Sufficient summary plot for the $O_{Rr}$ design objective.
</figcaption></figure>
</div>
<p>That brings us to the end for now! In the second part of this post, I’ll demonstrate how these dimension reducing subspaces can actually be used.</p>
<h2 id="part-2-using-the-dimension-reducing-subspaces">Part 2: Using the dimension reducing subspaces</h2>
<h3 id="physical-insights">Physical insights</h3>
<p>In Figure 4, we saw that $O_{Yp}$ is a linear function of its dimension reducing subspace $\mathbf{u}_Y=\mathbf{U}_Y^T\mathbf{x}$. In other words, each $j^{th}$ design has its own value of ${u_Y}_j \in \mathbb{R}^1$, which we can obtain by multiplying its original design vector $\mathbf{x}_j\in \mathbb{R}^7$ by $\mathbf{U}_Y \in \mathbb{R}^{7\times1}$. The matrix $\mathbf{U}_Y$ (a vector in this case) tells us how the components of $\mathbf{x}$ (i.e. the original 7 design variables) move us around the $\mathbf{u}_Y$ subspace:</p>
\[\mathbf{U}_{Y} =[0.05,0.01,\overbrace{-0.12}^{\text{Angle hole}},-0.02,\overbrace{0.98}^{\text{Kiel }\oslash_{outer}},0.02,\overbrace{0.17}^{\text{Hole }\oslash}]\]
<p>For example, the 0.98 for the Kiel outer diameter tells us that increasing this design parameter will significantly increase $\mathbf{u}_Y$ and, as Figure 4 shows, increase $O_{Yp}$. On the other hand, the very small values of the other elements of $\mathbf{U}_Y$ imply that varying those design parameters will have almost no impact on $\mathbf{u}_Y$ (and therefore $O_{Yp}$).</p>
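To make this concrete, here is a small numerical sketch. The $\mathbf{U}_Y$ values are those printed above, but the design vector `x_j` is a made-up example, not one of the actual CFD designs:

```python
import numpy as np

# U_Y from above: only the Kiel outer diameter (0.98) carries real weight
U_Y = np.array([0.05, 0.01, -0.12, -0.02, 0.98, 0.02, 0.17])

# A hypothetical design vector in [-1, 1]^7 (for illustration only)
x_j = np.array([0.3, -0.8, 0.5, 0.9, -0.6, 0.1, 0.2])

# Its active coordinate on the 1D subspace
u_Y = U_Y @ x_j  # about -0.62 for this particular x_j

# Contribution of each design variable to u_Y
contributions = U_Y * x_j
```

Inspecting `contributions` makes the point immediately: the fifth entry (the Kiel outer diameter term) dwarfs the rest, so it almost single-handedly sets where this design lands on the $\mathbf{u}_Y$ axis.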
<p>Physically, it makes sense that increasing the Kiel outer diameter increases $O_{Yp}$, as much of the pressure loss is arising from the bluff body pressure drag effect of the probe, and this gets worse as the probe diameter is increased. The above set of ideas can also provide less obvious physical insights though. For example, consider the $O_{Rr}$ subspace, viewed from above:</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/probe_design/orr_summary_plot_above.png" alt="ORr summary plot from above" style="width: 65%;" /><figcaption>
Figure 6: The $O_{Rr}$ sufficient summary plot viewed from above
</figcaption></figure>
<p>It’s no longer as simple as looking at which components of $\mathbf{x}$ move us from left to right, as we’re in 2D now. Instead, we must choose a direction to move in. For example, let’s take the direction $\mathbf{v}_a$ shown in Figure 6, and look at the components of the vector-matrix product $\mathbf{v}_a \mathbf{U}_R$ (where $\mathbf{U}_R \in \mathbb{R}^{7\times 2}$ is the subspace matrix for $O_{Rr}$):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
</pre></td><td class="rouge-code"><pre><span class="c1"># Define va as a vector between two designs
</span><span class="n">va</span> <span class="o">=</span> <span class="n">u_R</span><span class="p">[</span><span class="mi">112</span><span class="p">,:]</span> <span class="o">-</span> <span class="n">u_R</span><span class="p">[</span><span class="mi">58</span><span class="p">,:]</span>
<span class="c1"># Normalise va
</span><span class="n">va</span> <span class="o">/=</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">va</span><span class="o">*</span><span class="n">va</span><span class="p">))</span>
<span class="c1"># Take product va*Ur
</span><span class="n">prod</span> <span class="o">=</span> <span class="n">va</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="n">U_R</span><span class="p">[:,</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">va</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">*</span><span class="n">U_R</span><span class="p">[:,</span><span class="mi">1</span><span class="p">]</span>
<span class="c1"># and then plot bar chart showing components of va...
</span></pre></td></tr></tbody></table></code></pre></div></div>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/probe_design/Rr_weights.png" alt="ORr weights" style="width: 55%;" /><figcaption>
Figure 7: Components of the vector-matrix product $\mathbf{v}_a \mathbf{U}_R$
</figcaption></figure>
<p>This tells us how elements in $\mathbf{x}$ move us along the vector $\mathbf{v}_a$. For example, increasing the vent hole diameter moves us forward along $\mathbf{v}_a$, whilst increasing the Kiel inner diameter moves us in the reverse direction. In the paper, we go into more detail about what physical insights such as these can tell us about the probe’s design space.</p>
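For completeness, the bar-chart step elided in the snippet above might look something like the following sketch. The `prod` values and variable labels here are placeholders, not the real computed weights:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend, so this runs in a script
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for the prod vector computed above (one weight per design variable)
prod = np.array([0.05, -0.30, 0.10, 0.02, -0.45, 0.60, 0.15])
labels = ['$x_%d$' % i for i in range(1, 8)]  # placeholder variable names

fig, ax = plt.subplots(figsize=(7, 4))
ax.bar(labels, prod, color='C0', ec='k')       # one bar per design variable
ax.axhline(0.0, color='k', lw=1)               # mark the zero line
ax.set_ylabel(r'Components of $\mathbf{v}_a \mathbf{U}_R$')
fig.savefig('Rr_weights.png', dpi=150)
```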
<h3 id="obtaining-new-designs">Obtaining new designs</h3>
<p>Before looking at finding new designs, it’s valuable to consider the bounds of the current design space. Our design vectors $\mathbf{x}_j$ can be considered to lie in a 7-dimensional hypercube</p>
\[\chi \subset [-1,1]^7 \;\;\; \text{where} \;\;\; \mathbf{x}_j \in \chi \;\;\; \text{for} \;\;\; j=1,\dots,N.\]
<p>Similar to the way a human (a 3D object!) projects a 2D shadow on the ground, our 7D hypercube projects a 2D silhouette onto the 2D $O_{Rr}$ subspace (thanks to @psesh for the analogy!).</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/probe_design/shadow.jpg" alt="shadow" style="width: 55%;" /><figcaption>
Figure 8: A shadow is a 2D projection of a 3D object
</figcaption></figure>
<p>This 2D projection of the design space is referred to as the <em>zonotope</em>, and can be obtained from <em>equadratures</em> with the <code class="language-plaintext highlighter-rouge">.get_zonotope_vertices()</code> method. Below this is plotted, along with contours of $O_{Rr}$.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
</pre></td><td class="rouge-code"><pre><span class="c1"># Plot zonotope for Rr (the black line)
</span><span class="n">zone_R</span> <span class="o">=</span> <span class="n">mysubspace_R</span><span class="p">.</span><span class="n">get_zonotope_vertices</span><span class="p">()</span>
<span class="n">zone_R</span> <span class="o">=</span> <span class="n">polar_sort</span><span class="p">(</span><span class="n">zone_R</span><span class="p">)</span> <span class="c1"># polar_sort function available on request
</span><span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">zone_R</span><span class="p">[:,</span><span class="mi">0</span><span class="p">],</span><span class="n">zone_R</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">]),</span> <span class="n">np</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">zone_R</span><span class="p">[:,</span><span class="mi">1</span><span class="p">],</span><span class="n">zone_R</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">]),</span><span class="s">'k-'</span><span class="p">,</span><span class="n">lw</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="c1"># Plot color contours of ORr (within the convex hull of the current u_R samples)
# The grid_within_hull function uses scipy.spatial.ConvexHull to find the convex hull
# of the given points (available upon request).
</span><span class="n">new_samples</span> <span class="o">=</span> <span class="n">grid_within_hull</span><span class="p">(</span><span class="n">u_R</span><span class="p">,</span> <span class="n">N</span><span class="o">=</span><span class="mi">25</span><span class="p">)</span>
<span class="n">ORr_values</span> <span class="o">=</span> <span class="n">subpoly_R</span><span class="p">.</span><span class="n">get_polyfit</span><span class="p">(</span><span class="n">new_samples</span><span class="p">)</span>
<span class="n">cont</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">tricontourf</span><span class="p">(</span><span class="n">new_samples</span><span class="p">[:,</span><span class="mi">0</span><span class="p">],</span> <span class="n">new_samples</span><span class="p">[:,</span><span class="mi">1</span><span class="p">],</span> <span class="n">ORr_values</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.8</span><span class="p">,</span> <span class="n">levels</span> <span class="o">=</span> <span class="mi">20</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="n">cm</span><span class="p">.</span><span class="n">gist_earth</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">tricontour</span><span class="p">(</span><span class="n">new_samples</span><span class="p">[:,</span><span class="mi">0</span><span class="p">],</span> <span class="n">new_samples</span><span class="p">[:,</span><span class="mi">1</span><span class="p">],</span> <span class="n">ORr_values</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span> <span class="n">levels</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span> <span class="n">colors</span><span class="o">=</span><span class="s">'k'</span><span class="p">)</span>
<span class="c1"># Plot baseline design location and direction of new design
</span><span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">uRbaseline</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="n">uRbaseline</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">'oC3'</span><span class="p">,</span> <span class="n">ms</span><span class="o">=</span><span class="mi">15</span><span class="p">,</span> <span class="n">mfc</span><span class="o">=</span><span class="s">'none'</span><span class="p">,</span> <span class="n">mew</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="n">unew</span> <span class="o">=</span> <span class="p">[</span><span class="mf">0.4</span><span class="p">,</span><span class="o">-</span><span class="mf">1.4</span><span class="p">]</span>
<span class="n">vnew</span> <span class="o">=</span> <span class="n">unew</span> <span class="o">-</span> <span class="n">uRbaseline</span>
<span class="n">plt</span><span class="p">.</span><span class="n">arrow</span><span class="p">(</span><span class="n">uRbaseline</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">uRbaseline</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">vnew</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">vnew</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/probe_design/manufacturing.png" alt="ORr subspace with zonotope" style="width: 65%;" /><figcaption>
Figure 9: The sufficient summary plot for $O_{Rr}$, with the zonotope added
</figcaption></figure>
<p>The contours of $O_{Rr}$ are only plotted within the convex hull of the training data. In other words, the contours are only plotted in the region of the design space covered by our CFD simulations. This is eye-opening: although our DoE sampled uniformly throughout the region $\chi \subset [-1,1]^7$, there are still large areas of the design space which are completely unexplored!</p>
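The `grid_within_hull` helper in the snippet above is only "available on request"; a minimal sketch of one way it could be implemented, using `scipy.spatial.Delaunay` to test hull membership (the signature is my guess from how the function is called):

```python
import numpy as np
from scipy.spatial import Delaunay

def grid_within_hull(points, N=25):
    """Return a regular N x N grid of 2D samples, keeping only those
    lying inside the convex hull of the given points."""
    tri = Delaunay(points)  # triangulation covering the convex hull
    x = np.linspace(points[:, 0].min(), points[:, 0].max(), N)
    y = np.linspace(points[:, 1].min(), points[:, 1].max(), N)
    xx, yy = np.meshgrid(x, y)
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    # find_simplex returns -1 for points outside the triangulated hull
    return grid[tri.find_simplex(grid) >= 0]
```

Called as `new_samples = grid_within_hull(u_R, N=25)`, this clips the plotting grid to the region actually covered by the CFD samples.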
<p>It follows that we can use the above plot to discover new designs in unexplored regions of the design space. The baseline design is highlighted by the red circle in Figure 9. It looks like we might be able to lower $O_{Rr}$ even further by heading off in the direction of the red arrow. We can pick a new point e.g. $\mathbf{u}_{R,new} = (0.4,-1.4)$, and generate new design vectors $\mathbf{x}_{new}$ there:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="rouge-code"><pre><span class="c1"># Generate 10 new xnew vectors for the chosen unew vector
# xnew has dimensions (10,7)
</span><span class="n">unew</span> <span class="o">=</span> <span class="p">[</span><span class="mf">0.4</span><span class="p">,</span><span class="o">-</span><span class="mf">1.4</span><span class="p">]</span>
<span class="n">xnew</span> <span class="o">=</span> <span class="n">mysubspace_R</span><span class="p">.</span><span class="n">get_samples_constraining_active_coordinates</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span><span class="n">unew</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>There are an infinite number of $\mathbf{x}$ vectors which map to a single $\mathbf{u}_R$ vector. Therefore, when finding $\mathbf{x}$ for a given $\mathbf{u}_R$, the <code class="language-plaintext highlighter-rouge">.get_samples_constraining_active_coordinates()</code> method will rapidly generate as many unique $\mathbf{x}$ design vectors as we want. You could generate 100 (or 10000!) designs, and select a design according to other design constraints. In the paper we use the $O_{Yp}$ ridge approximation to quickly estimate $O_{Yp}$ for each of the new designs, and then choose designs which minimise both design objectives together. <em>Please ask if you’d like the code used to do this step!</em></p>
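The selection step itself is straightforward once candidates exist; a rough, self-contained sketch of the idea is below. The surrogate `predict_Oyp` is a hypothetical stand-in for the $O_{Yp}$ polynomial approximation, and random candidates stand in for the output of `get_samples_constraining_active_coordinates`:

```python
import numpy as np

def predict_Oyp(X):
    """Hypothetical stand-in for the O_Yp polynomial surrogate
    (a simple quadratic, for illustration only)."""
    return np.sum(X**2, axis=1)

# Suppose xnew holds 100 candidate designs at the chosen u_R point, e.g.
# xnew = mysubspace_R.get_samples_constraining_active_coordinates(100, unew)
rng = np.random.default_rng(0)
xnew = rng.uniform(-1, 1, size=(100, 7))  # dummy candidates for this sketch

# Score every candidate on the second objective and keep the best one
scores = predict_Oyp(xnew)
best_design = xnew[np.argmin(scores)]
```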
<h3 id="sensitivity-to-manufacturing-uncertainty">Sensitivity to manufacturing uncertainty</h3>
<p>So far, we’ve been trying to minimise $O_{Rr}=\partial R_r/\partial M$, the sensitivity of $R_r$ with respect to Mach number. For real probes, manufacturing tolerances might mean it’s also important to minimise the sensitivity of $R_r$ with respect to the input design vector $\mathbf{x}$. One approach to understanding the sensitivities is to fit a polynomial in the full 7D design space, and then compute Sobol indices (see @Nick’s <a href="https://discourse.effective-quadratures.org/t/sensitivity-analysis-with-effective-quadratures/30">post on sensitivity analysis</a>).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
</pre></td><td class="rouge-code"><pre><span class="c1"># Construct a Poly for Rr (at Mach=0.8)
</span><span class="n">s</span> <span class="o">=</span> <span class="n">Parameter</span><span class="p">(</span><span class="n">distribution</span><span class="o">=</span><span class="s">'uniform'</span><span class="p">,</span> <span class="n">lower</span><span class="o">=-</span><span class="mf">1.</span><span class="p">,</span> <span class="n">upper</span><span class="o">=</span><span class="mf">1.</span><span class="p">,</span> <span class="n">order</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="n">myparams</span> <span class="o">=</span> <span class="p">[</span><span class="n">s</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">dim</span><span class="p">)]</span>
<span class="n">mybasis</span> <span class="o">=</span> <span class="n">Basis</span><span class="p">(</span><span class="s">'total-order'</span><span class="p">)</span>
<span class="n">mypoly</span> <span class="o">=</span> <span class="n">Poly</span><span class="p">(</span><span class="n">parameters</span><span class="o">=</span><span class="n">myparams</span><span class="p">,</span> <span class="n">basis</span><span class="o">=</span><span class="n">mybasis</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s">'least-squares'</span><span class="p">,</span> <span class="n">sampling_args</span><span class="o">=</span> <span class="p">{</span><span class="s">'mesh'</span><span class="p">:</span> <span class="s">'user-defined'</span><span class="p">,</span> <span class="s">'sample-points'</span><span class="p">:</span> <span class="n">X</span><span class="p">,</span> <span class="s">'sample-outputs'</span><span class="p">:</span> <span class="n">Rr</span><span class="p">})</span>
<span class="n">mypoly</span><span class="p">.</span><span class="n">set_model</span><span class="p">()</span>
<span class="c1"># Check R2
</span><span class="n">Rr_pred</span> <span class="o">=</span> <span class="n">mypoly</span><span class="p">.</span><span class="n">get_polyfit</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
<span class="n">r2score</span> <span class="o">=</span> <span class="n">score</span><span class="p">(</span><span class="n">Rr</span><span class="p">,</span> <span class="n">Rr_pred</span><span class="p">,</span> <span class="s">'adjusted_r2'</span><span class="p">,</span> <span class="n">X</span><span class="p">)</span>
<span class="c1"># Get Sobol indices
</span><span class="n">Si</span> <span class="o">=</span> <span class="n">mypoly</span><span class="p">.</span><span class="n">get_sobol_indices</span><span class="p">(</span><span class="n">order</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">Sij</span> <span class="o">=</span> <span class="n">mypoly</span><span class="p">.</span><span class="n">get_sobol_indices</span><span class="p">(</span><span class="n">order</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="c1"># Plot...
</span></pre></td></tr></tbody></table></code></pre></div></div>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/probe_design/sobol.png" alt="Sobol indices" style="width: 50%;" /><figcaption>
Figure 10: First and second order Sobol indices for $R_r$ at Mach=0.8
</figcaption></figure>
<p>This is certainly informative. For example, we can see that $R_r$ is most sensitive to the hole diameter, followed by the Kiel inner diameter and the hole ellipse. If we want to limit uncertainty in $R_r$, and therefore in the measured $T_0$, we need tight controls on the manufacturing tolerances of these parameters.</p>
<p>With dimension reduction we can go much further though! To explore this, we construct another dimension reducing approximation for $R_r$ itself (at Mach=0.8):</p>
\[R_r \left(\mathbf{x}_j \right) \approx \hat{g} \left(\hat{\mathbf{U}}^T \mathbf{x}_j \right)\]
<p>In this case we are concerned with the sensitivity of $R_r$ to perturbations in the design parameters:</p>
\[R_r \left( \mathbf{x}_j + \Delta \right) \approx \hat{g} \left( \hat{\mathbf{U}}^T \left( \mathbf{x}_j + \Delta \right) \right)\]
<p>where $\Delta$ represents manufacturing variations injected into each design parameter. The $R_r$ subspace is shown in Figure 11 (left). The arrows demonstrate the influence of perturbing each parameter individually by $\Delta=0.1$. Perturbations are applied to the baseline design, but the decomposition</p>
\[\hat{\mathbf{U}}^T \left( \mathbf{x}_j + \Delta \right) = \hat{\mathbf{U}}^T \mathbf{x}_j + \hat{\mathbf{U}}^T\Delta\]
<p>shows that the arrows will be the same anywhere in the $R_r$ subspace (the $\hat{\mathbf{U}}^T\Delta$ term is not a function of $\mathbf{x}_j$). Comparing the magnitudes of the arrows indicates which design parameters cause the most significant movement in the $R_r$ subspace, while considering the arrows together with the contours of $R_r$ indicates which parameters matter most for a given design.</p>
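That invariance is easy to check numerically. In this sketch a random matrix stands in for $\hat{\mathbf{U}}$, since the fitted subspace itself isn't reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)
U_hat = rng.normal(size=(7, 2))   # stand-in for the fitted R_r subspace matrix
delta = np.full(7, 0.1)           # perturb every design parameter by 0.1

# The arrow U_hat.T @ delta does not depend on the design x_j ...
arrow = U_hat.T @ delta

# ... so the movement in the subspace is identical for any design
for _ in range(5):
    x_j = rng.uniform(-1, 1, size=7)
    moved = U_hat.T @ (x_j + delta) - U_hat.T @ x_j
    assert np.allclose(moved, arrow)
```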
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/probe_design/manufacturing2.png" alt="Manufacturing uncertainties" style="width: ;" /><figcaption>
Figure 11: Summary plots for $R_r$ at Mach=0.8. Left: uncertainties in three different parameters (1. Hole $\oslash$, 2. Kiel $\oslash_{inner}$, 3: Kiel $\oslash_{outer}$) for one design. Right: uncertainty in one design parameter, for two different designs.
</figcaption></figure>
<p>These plots can also be used to find designs which are insensitive to a certain design parameter. For example, imagine the factory reports that they are unable to tightly control the hole diameter when manufacturing a probe. We would want to find a probe design which minimises the design objectives, whilst having its $R_r$ relatively insensitive to the hole diameter. From Figure 11 (right), we see that designs at point B would be preferable to designs at A, since at point A perturbations in the hole diameter run perpendicular to the $R_r$ iso-lines, while at B they run parallel.</p>
<h2 id="other-examples">Other Examples</h2>
<p>For other examples of how the <a href="https://www.effective-quadratures.org/docs/_documentation/subspaces.html">dimension reduction module</a> in <em>equadratures</em> can be used, check out:</p>
<ul>
<li>Nicholas Wong’s great <a href="https://discourse.effective-quadratures.org/t/embedded-ridge-approximations/73">blog post</a>, where he examines the use of dimension reduction for flowfield approximations.</li>
<li>Pranay Seshadri’s <a href="https://asmedigitalcollection.asme.org/turbomachinery/article-abstract/140/4/041003/378904/Turbomachinery-Active-Subspace-Performance-Maps?redirectedFrom=fulltext">paper</a>, introducing dimension reducing turbomachinery performance maps.</li>
<li>James Gross’ <a href="https://arc.aiaa.org/doi/abs/10.2514/6.2020-0157">paper</a>, where he combines dimension reduction ideas with trust region optimisation methods.</li>
</ul>
<h2 id="references">References</h2>
<p>[1] Constantine, P. G. (2015). Active Subspaces. Society for Industrial and Applied Mathematics. <a href="https://epubs.siam.org/doi/book/10.1137/1.9781611973860?mobileUi=0">Book.</a></p>
<p>[2] Hokanson, J. M., & Constantine, P. G. (2018). Data-Driven Polynomial Ridge Approximation Using Variable Projection. SIAM Journal on Scientific Computing, 40(3), A1566–A1589. <a href="https://epubs.siam.org/doi/abs/10.1137/17M1117690?mobileUi=0">Paper.</a></p>

Ashley Scillitoe (ashleyscillitoe@googlemail.com)

In many engineering design tasks, we suffer from the *curse of dimensionality*. The number of design parameters quickly becomes too large for us to effectively visualise or explore the design space. We could attempt to use a design optimisation procedure to arrive at an "optimal" design, however, the curse of dimensionality places a computational burden on the cost of the optimisation. Also, in many cases we wish to *understand* the design space, not just spit out a "better" design.

Uncertainty quantification of computational simulations
2020-04-09T00:00:00+01:00
https://ascillitoe.github.io/posts/2020/Uncertainty-quantification-for-computational-simulations

<p>In this blog post, I’ll talk about using the <a href="https://github.com/Effective-Quadratures/equadratures"><em>equadratures</em></a> python package for uncertainty quantification of computational simulations. If you’d like to follow along, the code from this post can be run interactively on the cloud by clicking below:</p>
<p><a href="https://colab.research.google.com/github/Effective-Quadratures/EQ-live/blob/master/Blog_posts/computing_moments.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" /></a></p>
<h1 id="uncertainty-quantification">Uncertainty quantification</h1>
<p><em>Uncertainty quantification</em> (UQ) is the science of quantifying, and perhaps reducing, <em>uncertainties</em> in both computational and real world applications. Many fields of engineering rely on computational simulations, such as Computational Fluid Dynamics (CFD) simulations, to predict <em>Quantities of Interest</em> (QoI’s), such as lift and drag coefficients for an aerofoil or vehicle. These simulations have many sources of uncertainty, which can be broadly split into two categories:</p>
<ul>
<li>
<p><em>Aleatory uncertainties -</em> statistical uncertainties, which are caused by intrinsic variability in our experiment or physical process. Examples would be uncertainties in manufactured geometries, or in the inflow conditions of the flow we are simulating.</p>
</li>
<li>
<p><em>Epistemic uncertainties -</em> systematic uncertainties, which are due to things that could in principle be known, but are not in practice. An example of this would be the uncertainty which arises due to the turbulence model used in a CFD simulation.</p>
</li>
</ul>
<p>The effect of aleatory uncertainties on a QoI can be quantified by propagating these uncertainties through our computational model, as shown in Figure 1. Here $s_1$ and $s_2$ are our input parameters, which are represented with <em>probability density functions</em> (PDF’s) since there is some uncertainty in their definition. These parameter uncertainties will be propagated through the model $f(s_1,s_2)$, so that we obtain a PDF of our output quantity of interest $y$.</p>
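A toy Monte Carlo version of this forward propagation looks as follows; the model $f$ and the input distributions are invented for illustration (in practice each model evaluation might be an expensive CFD run):

```python
import numpy as np

def f(s1, s2):
    # Hypothetical model standing in for an expensive simulation
    return np.exp(-(s1**2 + s2**2))

rng = np.random.default_rng(42)
s1 = rng.normal(0.0, 0.1, 50_000)   # uncertain input parameter 1
s2 = rng.normal(0.5, 0.2, 50_000)   # uncertain input parameter 2

y = f(s1, s2)                        # propagate the samples through the model
y_mean, y_var = y.mean(), y.var()    # first two moments of the QoI
```

The histogram of `y` approximates the output PDF sketched in Figure 1, and its sample moments summarise the uncertainty in the QoI.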
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/cfd_uq/2d_fwd_uq.jpg" alt="2D forward propagation" style="width: ;" /><figcaption>
Figure 1: Forward propagation of two parameter uncertainties through a model.
</figcaption></figure>
<p><em>Backward analysis</em> (statistical inference) techniques can be used to do the opposite, i.e. to infer the $s_1$ and $s_2$ distributions from the measured distribution of $y$; here, however, we’ll stick to <em>forward propagation</em>. For a comprehensive review of statistical inference, and of the quantification of epistemic uncertainties arising from turbulence models, check out this recent review paper <a href="https://www.sciencedirect.com/science/article/abs/pii/S0376042118300952">[1]</a>.</p>
<h1 id="computing-moments-with-polynomials-vs-random-sampling">Computing moments with polynomials vs random sampling</h1>
<p>Before getting to <em>forward propagation</em> on a real CFD example, let’s first explore the motivation for using <em>equadratures</em>, by comparing it to a random sampling approach. It is important to note that often we are not interested in the actual PDF of our QoI $y$, but only in its statistical moments, the first four of which are shown in Figure 2. Here we will focus on the first two, the mean $\bar{y}$ and the variance $Var(y)$, which often tell us sufficient information about the uncertainty in our QoIs.</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/cfd_uq/moments.jpg" alt="Statistical moments" style="width: 85%;" /><figcaption>
Figure 2: The first four statistical moments of a probability distribution. <br />Source: <a href="https://medium.com/paypal-engineering/statistics-for-software-e395ca08005d">https://medium.com/paypal-engineering/statistics-for-software-e395ca08005d</a>
</figcaption></figure>
<p>Why bother using polynomials for estimating moments? What exactly is the advantage? Moreover, are we guaranteed to converge to the Monte Carlo solution? The answer is a resounding yes! In fact, this is precisely what Dongbin Xiu and George Karniadakis demonstrate in their seminal paper [2]. @Nick explores this further in <a href="https://effective-quadratures.github.io/_documentation/tutorial_4.html">tutorial 4</a> for a 2D problem (Rosenbrock’s function), where he demonstrates the cost savings gained by using polynomials (see Figure 3!).</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/cfd_uq/mean_and_var.png" alt="Mean and variance" style="width: 85%;" /><figcaption>
Figure 3: Mean and variance estimates versus number of samples for Monte Carlo type random sampling and polynomial approximations (with <em>equadratures</em>). <a href="https://effective-quadratures.github.io/_documentation/tutorial_4.html">Source.</a>
</figcaption></figure>
<p>In this post we will start with a simpler univariate problem. We have one input parameter $s_1$, which has a uniform distribution $\mathcal{S}=\mathcal{U}[0,1]$, and our model is a simple quadratic, $f(s_1) = -s_1^2 + s_1 + 1$.</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/cfd_uq/1d_fwd_uq.jpg" alt="1D forward propagation" style="width: 85%;" /><figcaption>
Figure 4: A simple univariate forward propagation example.
</figcaption></figure>
<p>We wish to compute the mean and variance of $y=f(s_1)$ due to the uncertainty in $s_1$. As a reference, the analytical mean and variance of $y$ are:</p>
\[\overline{f(s_1)} = \int_0^1{f(s_1)}ds_1=\frac{7}{6}= \mathbf{1.1\dot{6}}\]
\[Var\left({f(s_1)}\right) = \int_0^1{f(s_1)^2}ds_1 - \overline{f(s_1)}^2=\frac{1}{180}= \mathbf{0.00\dot{5}}\]
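<p>As a quick sanity check (a sketch using <code>scipy</code>, separate from the <em>equadratures</em> workflow shown later), these two integrals can be reproduced numerically:</p>

```python
from scipy.integrate import quad

f = lambda s: -s**2 + s + 1

# E[f] = integral of f over [0, 1] (uniform density has weight 1)
mean, _ = quad(f, 0, 1)

# Var(f) = integral of f^2 over [0, 1] minus the squared mean
second_moment, _ = quad(lambda s: f(s)**2, 0, 1)
var = second_moment - mean**2
```

The results agree with the analytical values $7/6$ and $1/180$ to machine precision.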
<h2 id="random-sampling">Random sampling</h2>
<p>The simplest approach is Monte Carlo sampling: we evaluate our model $f(s_1)$ at $N$ values of $s_1$ randomly sampled from $\mathcal{S}$, then calculate the mean and variance of the collected model outputs.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
</pre></td><td class="rouge-code"><pre><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="n">N</span> <span class="o">=</span> <span class="mi">100</span>  <span class="c1"># number of random samples
</span><span class="n">our_function</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">s</span><span class="p">:</span> <span class="o">-</span><span class="n">s</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="n">s</span> <span class="o">+</span> <span class="mi">1</span>
<span class="n">s1_samples</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">low</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span><span class="n">high</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span><span class="n">size</span><span class="o">=</span><span class="n">N</span><span class="p">)</span>
<span class="n">y_samples</span> <span class="o">=</span> <span class="n">our_function</span><span class="p">(</span><span class="n">s1_samples</span><span class="p">)</span>
<span class="n">mean</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">y_samples</span><span class="p">)</span>
<span class="n">var</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">var</span><span class="p">(</span><span class="n">y_samples</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>On the <a href="https://eqlive-ascillitoe.notebooks.azure.com/j/notebooks/Blog_posts/computing_moments.ipynb">azure notebook</a> version of this post there is an interactive widget where you can perform the above procedure for different $N$. You should find that a large $N$ is required before accurate moments are obtained, especially for the variance. This might be OK in this example, but not when each model evaluation is a CFD simulation requiring minutes/hours/days to run!</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/cfd_uq/random_sampling.png" alt="Random sampling" style="width: ;" /><figcaption>
Figure 5: Random sampling of the model function $f(s_1)$ with $N=5$ and $N=552$.
</figcaption></figure>
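<p>The slow convergence of random sampling is easy to demonstrate for yourself. The sketch below (plain numpy, with an arbitrary seed for reproducibility) repeats the estimate for increasing $N$ and reports the error against the analytical values:</p>

```python
import numpy as np

our_function = lambda s: -s**2 + s + 1
rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility

for N in [10, 100, 1000, 100000]:
    y = our_function(rng.uniform(0, 1, N))
    # Errors versus the analytical mean (7/6) and variance (1/180)
    print(N, abs(y.mean() - 7/6), abs(y.var() - 1/180))
```

The errors shrink only like $1/\sqrt{N}$, which is why the variance in particular needs so many samples.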
<h2 id="using-equadratures">Using <em>equadratures</em></h2>
<p>Alternatively, we can use <em>equadratures</em> to compute the moments of $y$, with the code below. We simply declare the usual building blocks (a <code class="language-plaintext highlighter-rouge">Parameter</code>, <code class="language-plaintext highlighter-rouge">Basis</code> and <code class="language-plaintext highlighter-rouge">Poly</code> object), give the <code class="language-plaintext highlighter-rouge">Poly</code> our data (or function) with <code class="language-plaintext highlighter-rouge">set_model</code>, and then run <code class="language-plaintext highlighter-rouge">get_mean_and_variance</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
</pre></td><td class="rouge-code"><pre><span class="kn">from</span> <span class="nn">equadratures</span> <span class="kn">import</span> <span class="n">Parameter</span><span class="p">,</span> <span class="n">Basis</span><span class="p">,</span> <span class="n">Poly</span>
<span class="n">s1</span> <span class="o">=</span> <span class="n">Parameter</span><span class="p">(</span><span class="n">distribution</span><span class="o">=</span><span class="s">'uniform'</span><span class="p">,</span> <span class="n">lower</span><span class="o">=</span><span class="mf">0.</span><span class="p">,</span> <span class="n">upper</span><span class="o">=</span><span class="mf">1.</span><span class="p">,</span> <span class="n">order</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="n">mybasis</span> <span class="o">=</span> <span class="n">Basis</span><span class="p">(</span><span class="s">'univariate'</span><span class="p">)</span>
<span class="n">mypoly</span> <span class="o">=</span> <span class="n">Poly</span><span class="p">(</span><span class="n">parameters</span><span class="o">=</span><span class="n">s1</span><span class="p">,</span> <span class="n">basis</span><span class="o">=</span><span class="n">mybasis</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s">'numerical-integration'</span><span class="p">)</span>
<span class="n">mypoly</span><span class="p">.</span><span class="n">set_model</span><span class="p">(</span><span class="n">our_function</span><span class="p">)</span>
<span class="n">mean</span><span class="p">,</span> <span class="n">var</span> <span class="o">=</span> <span class="n">mypoly</span><span class="p">.</span><span class="n">get_mean_and_variance</span><span class="p">()</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/cfd_uq/code_output.png" alt="Code output" style="width: 60%;" /></figure>
<p>The accuracy is clearly pretty good! What have we actually done here? Behind the scenes, <em>equadratures</em> has calculated the quadrature points in $s_1$ (dependent on our choice of <code class="language-plaintext highlighter-rouge">distribution</code>, <code class="language-plaintext highlighter-rouge">order</code> and the <code class="language-plaintext highlighter-rouge">Basis</code>). It has then evaluated our model at these points, and used the results to construct a polynomial approximation (<em>response surface</em>), $f(s_1)\approx \sum_{i=0}^{N} x_i p_i(s_1)$.</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/cfd_uq/1d_response.png" alt="1D response surface" style="width: 60%;" /><figcaption>
Figure 6: Polynomial approximation of the model $f(s_1)$, and the three quadrature points the model was evaluated at.
</figcaption></figure>
<p>Once we have the polynomial coefficients $x_i$, it is straightforward to obtain the mean and variance:
\(\mathbb{E}[f(s)]=\int_Sf(s)\omega(s)ds=x_0\)</p>
\[\sigma^2[f(s)]=\int_Sf^2(s)\omega(s)ds-\mathbb{E}[f(s)]^2=\sum_{i=1}^N x_i^2\]
<p>Since we selected <code class="language-plaintext highlighter-rouge">order=2</code> here, we only required $N=3$ model evaluations to get exact values for the moments. This is expected, since our model is itself a quadratic polynomial in this case. However, it still demonstrates the potential of this approach compared to random sampling.</p>
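<p>To make the coefficient-to-moment link concrete, here is a minimal numpy sketch (independent of <em>equadratures</em>, and not its internals) that builds the order-2 orthonormal Legendre expansion on $[0,1]$ from a 3-point Gauss-Legendre rule, then reads the moments straight off the coefficients:</p>

```python
import numpy as np

f = lambda s: -s**2 + s + 1

# 3-point Gauss-Legendre rule, mapped from [-1, 1] to [0, 1]
nodes, weights = np.polynomial.legendre.leggauss(3)
s = 0.5 * (nodes + 1)
w = 0.5 * weights          # weights now sum to 1 (uniform density on [0, 1])

# Orthonormal shifted Legendre basis p_0, p_1, p_2 on [0, 1]
p = np.array([np.ones_like(s),
              np.sqrt(3) * (2*s - 1),
              np.sqrt(5) * (6*s**2 - 6*s + 1)])

# Coefficients x_i = sum_j w_j f(s_j) p_i(s_j), exact here since the
# rule integrates polynomials up to degree 5
x = p @ (w * f(s))

mean = x[0]                # = 7/6
var = np.sum(x[1:]**2)     # = 1/180
```

With only three model evaluations the mean and variance come out exactly, matching the formulas above.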
<h1 id="a-cfd-example">A CFD example!</h1>
<p>Now that we have seen the potential of using polynomial approximations to compute moments, we will explore a real CFD example, the von Karman Institute LS89 turbine cascade [3]. The open-source <a href="https://su2code.github.io/">SU2</a> CFD code is used for all simulations here.</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/cfd_uq/VKI_RANS.jpg" alt="VKI RANS simulation" style="width: 80%;" /><figcaption>
Figure 7: RANS simulation of the VKI turbine cascade.
</figcaption></figure>
<p>For the Reynolds-Averaged Navier-Stokes (RANS) turbulence closure, we select the SST model. This is a two-equation model, solving transport equations for the turbulent kinetic energy $k$ and the specific dissipation rate $\omega$.</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/cfd_uq/SST_eqns.png" alt="SST equations" style="width: 90%;" /></figure>
<p>Adding transport equations for $k$ and $\omega$ can improve accuracy compared to a 0- or 1-equation RANS model, but it introduces an additional source of aleatory uncertainty: the specification of inflow boundary conditions for $k$ and $\omega$. We wish to estimate the uncertainty in our QoI due to this new uncertainty source. Our QoI here is the loss coefficient for the cascade:</p>
\[Y_p = \frac{p_{0_{in}}-p_{0}}{p_{0_{in}}-p_{_{inflow}}}\]
<p>SU2 uses turbulence intensity $Ti$, and turbulent viscosity ratio $\nu_t/\nu$, as its turbulent boundary conditions ($k = \frac{3}{2}(U \,Ti)^2$ and $\omega = \frac{k}{\nu}\left(\frac{\nu_t}{\nu} \right)^{-1}$). We can often estimate $Ti$ with some confidence, since it can be measured using a hot-wire probe. However, $\nu_t/\nu$ is physically vaguer, and is essentially an unknown quantity. For this reason we give $Ti$ a Gaussian distribution, while for $\nu_t/\nu$ we can only say it lies within a certain range, so we choose a uniform distribution. Then we declare a <code class="language-plaintext highlighter-rouge">Basis</code> and <code class="language-plaintext highlighter-rouge">Poly</code> as usual…</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="rouge-code"><pre><span class="n">s1</span> <span class="o">=</span> <span class="n">Parameter</span><span class="p">(</span><span class="n">distribution</span><span class="o">=</span><span class="s">'uniform'</span><span class="p">,</span> <span class="n">lower</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span> <span class="n">upper</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">order</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span> <span class="c1">#turb2lamviscosity
</span><span class="n">s2</span> <span class="o">=</span> <span class="n">Parameter</span><span class="p">(</span><span class="n">distribution</span><span class="o">=</span><span class="s">'Gaussian'</span><span class="p">,</span> <span class="n">shape_parameter_A</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">shape_parameter_B</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">order</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span> <span class="c1">#Ti
</span><span class="n">mybasis</span> <span class="o">=</span> <span class="n">Basis</span><span class="p">(</span><span class="s">'tensor-grid'</span><span class="p">)</span>
<span class="n">mypoly</span> <span class="o">=</span> <span class="n">Poly</span><span class="p">(</span><span class="n">parameters</span><span class="o">=</span><span class="p">[</span><span class="n">s1</span><span class="p">,</span><span class="n">s2</span><span class="p">],</span> <span class="n">basis</span><span class="o">=</span><span class="n">mybasis</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s">'numerical-integration'</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
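<p>For reference, the boundary-condition conversion quoted above is simple enough to sketch as a small helper. This is a hypothetical function for illustration only, not part of <em>equadratures</em> or SU2:</p>

```python
def k_omega_from_bcs(U, Ti, nu, visc_ratio):
    """Convert (Ti, nu_t/nu) inflow specifications to k and omega.

    Uses k = 3/2 (U * Ti)^2 and omega = (k / nu) * (nu_t/nu)^(-1).
    """
    k = 1.5 * (U * Ti)**2
    omega = (k / nu) / visc_ratio
    return k, omega

# Illustrative values: U = 100 m/s, Ti = 10%, nu = 1.5e-5 m^2/s, nu_t/nu = 10
k, omega = k_omega_from_bcs(100.0, 0.1, 1.5e-5, 10.0)
```

This makes it easy to see how a modest range in $\nu_t/\nu$ translates into a wide range of $\omega$ at the inlet.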
<p>This time running <code class="language-plaintext highlighter-rouge">set_model</code> is a little more involved, since our model is a CFD simulation instead of a simple polynomial function. We first ask <em>equadratures</em> for the quadrature points, and save them to disk.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre><span class="n">pts</span> <span class="o">=</span> <span class="n">mypoly</span><span class="p">.</span><span class="n">get_points</span><span class="p">()</span>
<span class="n">np</span><span class="p">.</span><span class="n">save</span><span class="p">(</span><span class="s">'points_to_run.npy'</span><span class="p">,</span> <span class="n">pts</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/cfd_uq/tensor_grid.png" alt="Tensor grid" style="width: 60%;" /></figure>
<p>We have essentially just obtained a design of experiments (DoE), telling us the turbulent inflow conditions to run our CFD at. This is straightforward to do, especially with the python scripting capability of SU2! Once this is done, we load the QoI ($Y_p$) from each simulation into a numpy array <code class="language-plaintext highlighter-rouge">Yp</code>, and give it to the <code class="language-plaintext highlighter-rouge">Poly</code> with <code class="language-plaintext highlighter-rouge">mypoly.set_model(Yp)</code>. We can then compute the mean and variance.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre><span class="n">mean</span><span class="p">,</span> <span class="n">var</span> <span class="o">=</span> <span class="n">mypoly</span><span class="p">.</span><span class="n">get_mean_and_variance</span><span class="p">()</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/cfd_uq/2d_code_output.png" alt="2D code output" style="width: 70%;" /></figure>
<p>So our 95% confidence interval due to uncertainty in inflow turbulence specification is $Y_p\pm 0.00047$. This seems small but is actually 0.88% of the mean $Y_p$ value, so may be significant depending on your use case.</p>
<p>And there you have it! Quantification of aleatory uncertainties using <em>equadratures</em>, with far fewer model evaluations (CFD runs!) required compared to Monte Carlo approaches. These cost savings only increase as the number of parameters we wish to propagate increases!</p>
<h2 id="dealing-with-failed-simulations">Dealing with failed simulations</h2>
<p>Any CFD practitioner is probably all too familiar with CFD simulations not converging! So what happens to the above in this case? In the <a href="https://eqlive-ascillitoe.notebooks.azure.com/j/notebooks/Blog_posts/computing_moments.ipynb">companion notebook</a>, we set a number of our DoE samples to NaN to explore this.</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/cfd_uq/tensor_grid_NaN.png" alt="Tensor grid NaN" style="width: 60%;" /></figure>
<p>The answer is <em>no problem!</em> <em>equadratures</em> detects the NaNs in our <code class="language-plaintext highlighter-rouge">tensor-grid</code> and automatically switches to a least-squares technique to find the polynomial coefficients. The computed moments remain within 2% of the previous values!</p>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/cfd_uq/error_output.png" alt="Error output" style="width: 90%;" /></figure>
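<p>The least-squares fallback is easy to illustrate in miniature: drop a couple of points from a DoE, fit the polynomial to the survivors, and recover the moments. This is a toy numpy sketch of the idea, not the <em>equadratures</em> internals:</p>

```python
import numpy as np

f = lambda s: -s**2 + s + 1
s = np.linspace(0, 1, 9)
y = f(s)
y[[2, 6]] = np.nan                          # two "failed" runs

ok = ~np.isnan(y)
p = np.poly1d(np.polyfit(s[ok], y[ok], 2))  # least squares on surviving points

# Moments of the fitted response surface under U[0, 1]
mean = np.polyint(p)(1) - np.polyint(p)(0)
var = (np.polyint(p * p)(1) - np.polyint(p * p)(0)) - mean**2
```

Because enough valid points remain to determine the quadratic, the moments are recovered essentially exactly; with noisier data or more failures, the least-squares fit degrades gracefully rather than failing outright.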
<h1 id="sensitivity-analysis">Sensitivity analysis</h1>
<p>As a quick follow on from Nicholas Wong’s <a href="https://discourse.effective-quadratures.org/t/sensitivity-analysis-with-effective-quadratures/30">blog post</a> over on our discourse, we can compute the Sobol’ indices for the above CFD polynomial approximation.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre><span class="n">mypoly</span><span class="p">.</span><span class="n">get_sobol_indices</span><span class="p">(</span><span class="n">order</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<figure class="">
<img src="https://ascillitoe.github.io/assets/images/posts/cfd_uq/sobol_output.png" alt="Sobol output" style="width: 70%;" /></figure>
<p>Clearly, $Y_p$ is significantly more sensitive to $s_1$ ($\nu_t/\nu$) than to $s_2$ ($Ti$). This is potentially a problem when running CFD simulations of this nature, since $\nu_t/\nu$ is difficult to measure, so we often don’t have a good idea of what its value should be.</p>
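<p>For intuition, first-order Sobol’ indices simply apportion the output variance between the inputs. For an additive toy model the decomposition is exact, as this small numpy sketch (an illustrative model, not the CFD problem above) shows:</p>

```python
import numpy as np

# Additive toy model y = 4*s1 + s2, with s1, s2 ~ U[0, 1] independent.
# Analytically Var(y) = 16/12 + 1/12, so S1 = 16/17 and S2 = 1/17.
rng = np.random.default_rng(0)
s1, s2 = rng.uniform(0, 1, (2, 200_000))
y = 4*s1 + s2

# For an additive model the first-order indices capture all the variance
S1 = np.var(4*s1) / np.var(y)
S2 = np.var(s2) / np.var(y)
```

Here $S_1 + S_2 \approx 1$; for models with interactions, higher-order indices pick up the remainder.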
<h1 id="references">References</h1>
<p>[1] Xiao, H., and Cinnella, P., (2019). Quantification of Model Uncertainty in RANS Simulations: A Review. <em>Progress in Aerospace Sciences</em>, 108. <a href="https://arxiv.org/pdf/1806.10434.pdf">Preprint</a></p>
<p>[2] Xiu, D., and Karniadakis, G. E., (2002). The Wiener-Askey Polynomial Chaos for Stochastic Differential Equations. <em>SIAM Journal on Scientific Computing</em>, 24(2). <a href="https://epubs.siam.org/doi/abs/10.1137/S1064827501387826?journalCode=sjoce3">Paper</a></p>
<p>[3] Segui, L. et al., (2017). LES of the LS89 cascade: influence of inflow turbulence on the flow predictions. <em>Proceedings of 12th European Conference on Turbomachinery Fluid Dynamics & Thermodynamics.</em> <a href="https://www.euroturbo.eu/publications/proceedings-papers/etc2017-159/">Paper</a></p>