Abstract: Vision-language object tracking integrates advanced linguistic information, enhancing its robustness and accuracy in complex scenarios. Nevertheless, current methods are constrained by a ...