I’m happy to share our new work, UniPercept, which tackles a key blind spot of today’s multimodal LLMs: perceptual-level image understanding (how images look and feel to humans), covering aesthetics, quality, structure, and texture. Our release includes UniPercept-Bench, a unified benchmark spanning image aesthetics assessment (IAA), image quality assessment (IQA), and image structure-and-texture assessment (ISTA), with support for both Visual Rating (VR) and Visual Question Answering (VQA) evaluation. We also introduce the UniPercept baseline model, which generalizes across the VR and VQA settings. Beyond benchmarking, UniPercept can serve as a reward model for post-training text-to-image systems and as a perceptual diagnostic tool for analyzing model outputs and datasets; a sketch of the reward-model use follows below. [Homepage] [GitHub] [UniPercept-Bench] [UniPercept Model] [Paper].
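To make the reward-model use case concrete, here is a minimal sketch of best-of-N re-ranking: generate several candidates for a prompt and keep the one with the highest perceptual rating. The `best_of_n` helper and the `score` callable are illustrative assumptions for this post, not the released UniPercept API; a VR-style scalar rating is assumed to be usable directly as the reward.

```python
# Minimal sketch: using a perceptual rating as a reward to re-rank
# text-to-image candidates. `score` stands in for a UniPercept
# Visual Rating (VR) call; its interface here is an assumption.
from typing import Callable, List, Tuple


def best_of_n(
    images: List[object],
    prompt: str,
    score: Callable[[object, str], float],
) -> Tuple[object, float]:
    """Return the candidate image with the highest perceptual rating."""
    scored = [(img, score(img, prompt)) for img in images]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy stand-in scorer so the sketch runs without the real checkpoint.
    fake_ratings = {"img_a": 0.2, "img_b": 0.9, "img_c": 0.5}
    fake_score = lambda img, prompt: fake_ratings[img]  # placeholder only
    best, rating = best_of_n(
        ["img_a", "img_b", "img_c"], "a misty harbor at dawn", fake_score
    )
    print(best, rating)  # -> img_b 0.9
```

The same scalar signal could, in principle, drive preference-based post-training (e.g. picking chosen/rejected pairs) rather than simple re-ranking; the selection loop above is just the simplest instance of the pattern.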