Stats: Members: 3643 · Articles: 2'488'730 · Articles rated: 2609
29 March 2024
Article overview
CodeBLEU: a Method for Automatic Evaluation of Code Synthesis
Authors: Shuo Ren; Daya Guo; Shuai Lu; Long Zhou; Shujie Liu; Duyu Tang; Ming Zhou; Ambrosio Blanco; Shuai Ma
Date: 22 Sep 2020
Abstract: Evaluation metrics play a vital role in the growth of a research area, as they set the standard for distinguishing good models from bad ones. In code synthesis, the commonly used metrics are BLEU and perfect accuracy, but neither is well suited to evaluating code: BLEU was originally designed for natural language and neglects the important syntactic and semantic features of code, while perfect accuracy is too strict and underestimates outputs that differ in form but share the same semantic logic. To remedy this, we introduce a new automatic evaluation metric, dubbed CodeBLEU. It absorbs the strength of BLEU's n-gram match and further injects code syntax via abstract syntax trees (AST) and code semantics via data flow. We evaluate the correlation coefficient between CodeBLEU and quality scores assigned by programmers on three code synthesis tasks: text-to-code, code translation, and code refinement. Experimental results show that CodeBLEU achieves a better correlation with programmer-assigned scores than BLEU and accuracy.
Source: arXiv, 2009.10297
Services: Forum | Review | PDF | Favorites
No review found.
Did you like this article?
Note: replies to reviews or questions about the article must be posted in the forum section.
Authors are not allowed to review their own article; they may use the forum section instead.
News, job offers and information for researchers and scientists: