MIT and IBM researchers have opened a new front in multimodal artificial intelligence by releasing ChartNet, a large synthetic dataset designed to teach smaller vision-language models how to read, ...
The latest round of language models, like GPT-4o and Gemini 1.5 Pro, are touted as “multimodal,” able to understand images and audio as well as text. But a new study makes clear that they don’t really ...