Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation
Abstract: While CLIP has advanced open-vocabulary predictions, its performance on semantic segmentation remains suboptimal. This shortfall primarily stems from its spatial-invariant semantic features ...