Applied Statistics for Logistics, Manufacturing, and Training

OLS regression, two-way ANOVA, Poisson probability, and two-sample hypothesis testing applied to logistics economics, manufacturing process control, and workforce training. Each analysis follows the same loop: verify assumptions, fit the model, interpret the output, run diagnostics. All computation in MATLAB.

European Logistics Expenditure via OLS

The premise was simple. Given GDP data for 11 European countries, build a linear model that predicts national logistics expenditure. The dataset spans nearly two orders of magnitude, from Ireland at $33.9B GDP to Germany at $1.19T.

GDP vs Logistics Bill scatter plot for 11 European countries, showing a strong positive linear relationship with Germany as a clear outlier at the upper extreme

Logistics spending scales with GDP, which is what you’d expect. The OLS fit:

$$\text{Logistics Bill} = -727.56 + 0.124 \times \text{GDP}$$

For every $1B increase in GDP, logistics expenditure rises by ~$124M. R² = 0.9896, which is unusually high for cross-sectional economic data. The F-statistic of 852.28 (p < 0.001) confirms the model is significant.

For a GDP of $600B, the predicted logistics bill is $73.6B, with a 95% prediction interval of [$59.8B, $87.4B]. That $27.6B width is just the reality of n = 11.

Diagnostics

The residuals-vs-fitted plot tells a different story than R² alone.

Residuals vs Fitted Values plot showing a non-random pattern with a large positive residual at the highest fitted value, corresponding to Germany

Residuals cluster tightly for small fitted values, swing negative in the mid-range, then one large positive residual appears at the extreme. That’s Germany. As Europe’s largest logistics hub, with extensive transit infrastructure and a disproportionate share of continental freight, Germany generates logistics spending that exceeds what a GDP-based model would predict. It acts as a high-leverage point that pulls the regression line toward it while simultaneously violating it.

I ran the sensitivity analysis: removing Germany improved RMSE by 81%. The model without Germany still holds, but the coefficient shifts, confirming that one country was exerting substantial influence on the parameter estimates. This doesn’t mean Germany should be excluded. It means a logistics-specific predictor (freight volume, port throughput) would capture variation that GDP alone can’t.

Testing H₀: β₀ = 0 yields t = -0.285, p = 0.782. The non-significant intercept makes economic sense: a country with zero GDP should have zero logistics expenditure.

Line Speed and Carbonation: Two-Way ANOVA

A bottling line runs at four speeds (210, 240, 270, 300 bpm) with two carbonation levels (10%, 12%). Five replicates per combination, 40 observations total. The question: does line speed affect fill accuracy, and do the two factors interact?

Before touching the ANOVA, I checked both assumptions.

Normality: chi-square goodness-of-fit yields χ² = 1.026, df = 4, p = 0.9058. Consistent with normality.

Homogeneity of variance: Levene’s test, F = 1.6935, p = 0.1460. Variances not significantly different across groups. Both hold.

SourceSSdfMSFp-value
Line Speed21.87537.2924.9340.0063
Carbonation0.81210.8120.5500.4639
Interaction0.03930.0130.0090.9989
Error47.288321.478

Line speed is significant (F = 4.934, p = 0.0063). Carbonation is not (p = 0.4639). The interaction is essentially zero.

Box plot of fill deviation by line speed showing increasing median deviation and spread as line speed increases from 210 to 300 bpm

210 bpm produces the lowest fill deviations with the tightest spread. Both median and variance increase with speed. Post-hoc Tukey HSD confirms 210 bpm is significantly different from 270 and 300 bpm.

An interaction p-value of 0.9989

That’s about as close to 1 as you’ll see in practice. The effect of line speed on fill deviation is identical regardless of carbonation level.

Interaction plot showing nearly parallel lines for 10% and 12% carbonation across all four line speeds, confirming no interaction

In practice this means the two factors can be optimised independently. You don’t need to recalibrate speed settings every time the carbonation recipe changes.

Poisson Breakdowns

Same manufacturing setting, different question. Breakdowns occur at λ = 1.5 per 8-hour shift. The Poisson model handles discrete events at a constant rate within a fixed interval, assuming independence between events.

The key probabilities:

  • P(X = 2) per shift: 0.2510
  • P(X < 2) per shift: 0.5578
  • P(X = 0) across three consecutive shifts: 0.0111

Over a 24-hour day, there’s a 98.9% probability of at least one breakdown. Maintenance needs to cover all shifts, not just days. Parts inventory should be sized for roughly 30 breakdowns per week (21 shifts at λ = 1.5).

A 20% reduction in breakdown rate to λ = 1.2 (achievable through preventive maintenance) drops the 3-shift expected count from 4.5 to 3.6. These improvements compound. The probability of a completely breakdown-free day goes from 1.1% to 2.7%.

The model assumes constant rate, independence, and stationarity. Operator fatigue, equipment degradation over time, seasonal variation, all of these could violate those assumptions. In practice you’d track observed counts against Poisson predictions and update λ when they diverge.

Welch’s t-Test: CAL vs GBL

Two training methods for assembly tasks: Computer Assisted Learning (CAL) and Group-Based Learning (GBL). Twelve employees per group. Does CAL produce faster assembly times?

StatisticCALGBL
Mean17.850s20.217s
Std Dev2.345s4.421s
Range[14.4, 21.8][13.8, 28.2]

CAL has a 2.367-second advantage and roughly four times lower variance.

Box plot comparing CAL and GBL assembly times, showing CAL with a lower median and substantially tighter spread

An F-test for equal variances returns F = 0.281, p = 0.046. The variances are significantly different. This rules out the standard pooled t-test and requires Welch’s, which uses separate variance estimates and applies the Welch-Satterthwaite approximation for degrees of freedom. Running the pooled version here would produce results on a violated assumption.

Results:

  • t-statistic: -1.6384
  • Degrees of freedom: 16.74 (Welch-Satterthwaite)
  • p-value (one-tailed): 0.0600
  • Critical value (α = 0.05): -1.7412

We fail to reject H₀. At the 5% level, there’s insufficient evidence to conclude CAL is faster.

But p = 0.06 sitting just above the threshold, on n = 12 per group with a variance ratio of 4:1, doesn’t mean the effect isn’t real. The 2.4-second improvement (11.7%) could easily be practically meaningful in a setting where seconds accumulate across thousands of units per day. The test simply doesn’t have enough power to distinguish a moderate effect from noise at this sample size. A power analysis would point to 20-30 subjects per group to reliably detect an effect this size.

The result is better described as inconclusive than negative. That distinction matters if someone is making a training decision based on it.

Spring Rate Analysis

100 springs, sample mean 44.175 N, standard deviation 1.008 N, specification limits 41.5 N (LSL) to 46.5 N (USL), target 44 N.

Converting to z-scores:

  • z_lower = (41.5 - 44.175) / 1.008 = -2.653
  • z_upper = (46.5 - 44.175) / 1.008 = 2.306

98.54% of springs within spec. 0.40% below LSL, 1.06% above USL. The asymmetry (more failing above than below) reflects the process mean sitting 0.175 N above target.

Normal distribution curve with specification limits showing 98.54% of springs within spec, 0.399% below LSL at 41.5N, and 1.057% above USL at 46.5N

Is the process centered? One-sample t-test for H₀: μ = 44 N:

  • t(99) = 1.735
  • p-value (two-tailed) = 0.0858

We fail to reject. The 0.175 N offset could plausibly arise from sampling variability alone. The 95% confidence interval [43.975, 44.375] N contains the target.

The process is both capable (98.5% within spec) and centered (mean not significantly different from target). The 6σ spread of 6.048 N fits inside the 5 N specification window, but not with a lot of margin. Any upward drift in process variability needs to be caught early.

What the Diagnostics Actually Did

In the GDP regression, the residual plot exposed Germany’s influence on the parameter estimates in a way that R² = 0.99 completely masked. In the training comparison, the F-test for equal variances shifted the analysis from a pooled t-test to Welch’s, cutting the degrees of freedom from 22 to 16.74 and changing what conclusions were defensible.

Skipping the assumption checks would have left both of those points invisible. The models would have fit, the p-values would have looked clean, and the conclusions would have been subtly wrong.

← back to writing