Abstract: Integrating vision and language has become an increasingly prominent focus of artificial intelligence research, driven by advances in pre-trained models. Traditionally ...