Optimization Landscape & Feasibility in Riemannian AmbientFlow

An empirical investigation: which local minima are reached, and do the recoverability theorem's feasibility assumptions hold?

cs.LG - Machine Learning Based on arXiv:2601.18728 Read Paper (PDF)

Problem Statement

Riemannian AmbientFlow minimizes a combined objective:

L(θ, φ) = LAF(θ, φ) + λ · ||Jfθ(0)||F2

Three feasibility assumptions for recoverability:

(F1) Data matching: Learned distribution equals ground truth.
(F2) Posterior matching: Variational posterior equals true posterior.
(F3) Geometric constraint: ||Jf(0)||F2 ≤ C* = Tr(G*(0)).

Circle in R2

S1, C*=1.0

1→2
d→D

Sphere in R3

S2 stereographic, C*=8.0

2→3
d→D

Helix in R3

Helix, C*≈1.025

1→3
d→D

Experimental Setup

200
Data Points
15
Random Starts
7
λ Values
8
Experiments

Multi-Start Landscape Exploration

15 random initializations per λ. High variance indicates multiple distinct local minima.

Feasibility Phase Diagram

Complete Feasibility Table

ManifoldλData MMPost. MM||J||2F1F2F3Feas.
Circle0.00.0791.7610.4150.9240.1721.00.159
Circle0.10.0892.5270.3890.9150.0801.00.073
Circle1.00.1471.0500.2600.8640.3501.00.302
Sphere0.00.1483.4781.4530.8620.0311.00.027
Sphere0.10.1653.7620.8820.8480.0231.00.020
Sphere2.00.2305.3180.3020.7950.0051.00.004
Helix0.00.0821.4380.4200.9210.2371.00.219
Helix0.10.0812.5620.3990.9220.0771.00.071
Helix1.00.1461.4520.2680.8640.2341.00.202
All entries reported. F3=1.0 everywhere (Jacobian norms below C*). Aggregate score dominated by F2 (posterior mismatch). Best: Circle 0.302, Sphere 0.027, Helix 0.219.

Bootstrap 95% CIs (B=200)

Oracle Feasibility: Capacity vs. Optimization

Fitting fθ directly to f* (bypassing ELBO) to separate model capacity from optimization landscape effects.

ManifoldModelParamsRecon. MSE||J||2F1F3
CircleSimple90.00021.0340.9980.967
CircleMLP1,1860.0161.0140.9570.986
SphereSimple190.1623.7640.9091.000
SphereMLP1,2510.1383.2080.9171.000
HelixSimple130.00021.0590.9980.967
HelixMLP1,2190.0211.1760.9520.860
Circle/Helix: capacity sufficient. Simple model achieves F1>0.99, MSE~0.0002. Feasibility gap is an optimization landscape phenomenon.
Sphere: genuine capacity gap. Both models achieve only F1~0.91, MSE~0.14. Stereographic projection harder to approximate.

Directional Curvature Analysis

50 random unit directions per critical point. All positive curvature provides evidence for local minimum status.

ManifoldλMin Curv.Max Curv.Neg. Dirs
Circle0.012.28461.670/50
Circle1.011.52671.040/50
Sphere0.08.1876.640/50
Sphere1.062.76859.350/50
Helix0.0141.031870.470/50
Helix1.0326.873896.460/50
Zero negative directions across 600+ random probes. Strong statistical evidence for local minimum status.

Parameter Continuation

Tracking a single local minimum as λ increases from 0 to 2. Smooth deformation, no bifurcation.

Path dependence: Feasibility decreases monotonically along continuation, but fresh multi-start finds better solutions. Basin at λ=0 is not the most feasible at larger λ.

Pullback Metric Analysis

ManifoldλTr(Gθ)Tr(G*)Ratio
Circle0.00.4501.0000.450
Circle1.00.2611.0000.261
Sphere0.01.0108.0000.126
Sphere1.00.5158.0000.064
Helix0.00.4161.0250.406
Helix1.00.2631.0250.256
Systematic underestimation: Learned metric underestimates true geometry by 2-16x. Sphere ratio drops to 0.064.

Noise Sensitivity (σ ∈ [0.01, 0.5])

Non-trivial noise dependence: Very low noise yields near-zero feasibility. Peaks at moderate noise (σ=0.1-0.2). Sphere consistently low across all noise levels.

Sample Size Sensitivity (n ∈ [50, 500])

Robust to sample size: Feasibility stable across n=50 to 500. Findings not driven by finite-sample effects.

Key Findings

1. Local minima confirmed: Zero negative curvature in 600+ probes. Multiple distinct minima exist (variance up to 2.08).
2. Feasibility trade-off: Increasing λ improves F3 but degrades F1. Non-monotonic aggregate. Best: Circle 0.302, Sphere 0.027, Helix 0.219.
3. Oracle separates capacity from optimization: Simple model achieves F1>0.99 on circle/helix when fit directly. Feasibility gap is primarily an optimization landscape phenomenon.
4. Geometric underestimation: Pullback metric underestimates true geometry by 2-16x due to Jacobian penalty.
5. Path dependence: Continuation yields worse feasibility than fresh multi-start. Basin at λ=0 not optimal at larger λ.
6. Robust across settings: Results stable across σ∈[0.05,0.5] and n∈[50,500].

Reference

Based on: Diepeveen et al., "Riemannian AmbientFlow," arXiv:2601.18728, 2026.

Read the full paper (PDF)