Article overview
Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding
Authors: Joshua Feinglass; Yezhou Yang
Date: 1 Sep 2023
Abstract: Object proposal generation serves as a standard pre-processing step in
Vision-Language (VL) tasks (image captioning, visual question answering, etc.).
The performance of object proposals generated for VL tasks is currently
evaluated across all available annotations, a protocol that we show is
misaligned - higher scores do not necessarily correspond to improved
performance on downstream VL tasks. Our work serves as a study of this
phenomenon and explores the effectiveness of semantic grounding to mitigate its
effects. To this end, we propose evaluating object proposals against only a
subset of available annotations, selected by thresholding an annotation
importance score. Importance of object annotations to VL tasks is quantified by
extracting relevant semantic information from text describing the image. We
show that our method is consistent and demonstrates greatly improved alignment
with annotations selected by image captioning metrics and human annotation when
compared against existing techniques. Lastly, we compare current detectors used
in the Scene Graph Generation (SGG) benchmark as a use case, which serves as an
example of when traditional object proposal evaluation techniques are
misaligned.
Source: arXiv, 2309.00215
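
The evaluation protocol described in the abstract (score proposals against only those annotations whose importance passes a threshold) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `Annotation` class, the hand-assigned `importance` values, the threshold of 0.5, and the use of recall at IoU 0.5 as the proposal metric. The abstract says only that importance is derived from semantic information in text describing the image, so the scores below are placeholders rather than the authors' method.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Annotation:
    label: str
    box: Box
    importance: float  # hypothetical score; placeholder for a text-derived value


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_annotations(annotations: List[Annotation], threshold: float) -> List[Annotation]:
    """Keep only annotations whose importance score meets the threshold."""
    return [a for a in annotations if a.importance >= threshold]


def recall(proposals: List[Box], annotations: List[Annotation], iou_threshold: float = 0.5) -> float:
    """Fraction of annotations matched by at least one proposal."""
    if not annotations:
        return 0.0
    hits = sum(
        1 for a in annotations
        if any(iou(p, a.box) >= iou_threshold for p in proposals)
    )
    return hits / len(annotations)


# Toy data: two objects a caption would likely mention, one it likely would not.
annotations = [
    Annotation("person", (0, 0, 10, 10), importance=0.9),
    Annotation("dog", (20, 20, 30, 30), importance=0.8),
    Annotation("cloud", (50, 0, 60, 5), importance=0.1),
]
proposals = [(0, 0, 10, 10), (21, 21, 30, 30)]

recall_all = recall(proposals, annotations)
recall_selected = recall(proposals, select_annotations(annotations, threshold=0.5))
print(f"recall over all annotations:      {recall_all:.3f}")
print(f"recall over selected annotations: {recall_selected:.3f}")
```

In this toy case the detector is penalized under the all-annotations protocol for missing the low-importance "cloud" box, while the thresholded protocol scores it only on the objects assumed to matter for the downstream task.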