SpectralMoE Remote Sensing DGSS

Local Precise Refinement: A Dual-Gated Mixture-of-Experts for Enhancing Foundation Model Generalization against Spectral Shifts

Xi Chen, Maojun Zhang, Yu Liu, Shen Yan*
* Corresponding author
National University of Defense Technology
CVPR 2026

Abstract

Domain Generalized Semantic Segmentation (DGSS) in spectral remote sensing is severely challenged by spectral shifts across diverse acquisition conditions, which cause significant performance degradation when models are deployed in unseen domains. While fine-tuning foundation models is a promising direction, existing methods apply global, homogeneous adjustments. This "one-size-fits-all" tuning struggles with the spatial heterogeneity of land cover and leads to semantic confusion. We argue that the key to robust DGSS lies not in a single global adaptation, but in fine-grained, spatially adaptive refinement of a foundation model's features. To this end, we propose SpectralMoE, a novel fine-tuning framework for DGSS. SpectralMoE realizes this principle through a Mixture-of-Experts (MoE) architecture that performs local precise refinement of the foundation model's features, guided by depth features estimated from selected RGB bands of the spectral remote sensing imagery. Specifically, SpectralMoE employs a dual-gated MoE architecture that independently routes visual and depth features to top-k selected experts for specialized, modality-specific refinement. A subsequent cross-attention mechanism then judiciously fuses the refined structural cues into the visual stream, mitigating semantic ambiguities caused by spectral variations. Extensive experiments show that SpectralMoE sets a new state-of-the-art on multiple DGSS benchmarks spanning hyperspectral, multispectral, and RGB remote sensing imagery.

Teaser figure for SpectralMoE
Our SpectralMoE achieves (a) comprehensive state-of-the-art performance across spectral remote sensing DGSS benchmarks. This superiority stems from our dual-gated MoE, which enables (b) fine-grained, spatially adaptive adjustments. In the qualitative Grad-CAM visualizations (top row), our method produces complete and detailed responses for complex target regions, such as the structures highlighted by the green box, in clear contrast to the diffuse activations of global fine-tuning methods. This enhanced local refinement directly translates into more robust and qualitatively superior segmentation results in unseen domains (bottom row).

Method Overview

Overview of the SpectralMoE framework
Overview of the proposed SpectralMoE framework. SpectralMoE is inserted as a lightweight plugin into each layer of frozen Vision Foundation Models (VFMs) and Depth Foundation Models (DFMs). At its core is a dual-gated MoE mechanism: a dual-gated network independently routes visual and depth feature tokens to specialized experts, enabling the fine-grained, spatially adaptive adjustments that global, homogeneous tuning methods lack. Following this expert-based refinement, a Cross-Attention Fusion Module adaptively injects the robust spatial structural information from the refined depth features into the visual features. This fusion mitigates semantic ambiguity caused by spectral shifts and substantially enhances the model's cross-domain generalization.
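The dual-gated routing and cross-attention fusion described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' released code: all module names, the low-rank expert parameterization, and the dimensions (8 experts, top-2 routing, 256-d tokens) are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn

class TokenMoE(nn.Module):
    """Per-token top-k mixture of low-rank expert adapters (one gate per modality)."""
    def __init__(self, dim, num_experts=8, top_k=2, rank=16):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)
        # Low-rank experts: per-expert down-projection then up-projection.
        self.down = nn.Parameter(torch.randn(num_experts, dim, rank) * 0.02)
        self.up = nn.Parameter(torch.zeros(num_experts, rank, dim))

    def forward(self, x):                          # x: (B, N, dim)
        logits = self.gate(x)                      # (B, N, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)          # normalize over selected experts
        # Gather the chosen experts' low-rank weights for every token.
        down = self.down[idx]                      # (B, N, k, dim, rank)
        up = self.up[idx]                          # (B, N, k, rank, dim)
        h = torch.einsum('bnd,bnkdr->bnkr', x, down)
        delta = torch.einsum('bnkr,bnkrd->bnkd', h, up)
        # Residual refinement: each token gets its own expert mixture.
        return x + (weights.unsqueeze(-1) * delta).sum(dim=2)

class DualGatedRefiner(nn.Module):
    """Independent MoE gates for visual and depth tokens, then cross-attention fusion."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.visual_moe = TokenMoE(dim)
        self.depth_moe = TokenMoE(dim)
        self.fuse = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual, depth):              # both: (B, N, dim)
        v = self.visual_moe(visual)                # modality-specific refinement
        d = self.depth_moe(depth)
        # Inject structural cues: visual tokens query the refined depth tokens.
        fused, _ = self.fuse(query=v, key=d, value=d)
        return self.norm(v + fused)

# Toy usage on frozen-backbone-sized token maps (shapes are illustrative).
refiner = DualGatedRefiner(dim=256)
visual_tokens = torch.randn(2, 196, 256)
depth_tokens = torch.randn(2, 196, 256)
out = refiner(visual_tokens, depth_tokens)
print(out.shape)  # torch.Size([2, 196, 256])
```

Because each token selects its own top-k experts, neighboring tokens can receive different adjustments, which is the spatially adaptive behavior the framework contrasts with global fine-tuning.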

Experiments

Cross-sensor generalization
Cross-sensor generalization performance on GID
SpectralMoE achieves state-of-the-art generalization performance across hyperspectral, multispectral, and RGB remote sensing benchmarks.

Citation

@article{chen2026local,
  title={Local Precise Refinement: A Dual-Gated Mixture-of-Experts for Enhancing Foundation Model Generalization against Spectral Shifts},
  author={Chen, Xi and Zhang, Maojun and Liu, Yu and Yan, Shen},
  journal={arXiv preprint arXiv:2603.13352},
  year={2026}
}

Acknowledgements

Our implementation is mainly built on the following repositories. We thank their authors.