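Today we learned from the internet that the survey report "A Roadmap for Big Model", published by the Beijing Academy of Artificial Intelligence (BAAI) on the preprint server arXiv, is suspected of plagiarism. In response, the Academy immediately organized an internal investigation. Having confirmed that some of the articles contain problems, it has begun inviting third-party experts to conduct an independent review and will hold those responsible accountable.
We feel deeply remorseful that this problem occurred. As a research institution, BAAI attaches great importance to academic standards, encourages academic innovation and exchange, and has zero tolerance for academic misconduct. We hereby offer our sincere apologies to the authors of the original papers and to our colleagues and friends in academia and industry.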
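The preliminary findings of BAAI's internal investigation are as follows:
1. The report is a survey of the big-model field, intended to cover as much as possible of the important literature in the field from China and abroad. It was led by BAAI, which was responsible for designing its framework and compiling the manuscripts. BAAI invited 100 researchers from China and abroad to write 16 independent feature articles; each article was written by its own group of authors and carries its own byline, for a total of 200 pages. After its release, the report was continually revised based on feedback, and by April 2 it had been updated to its third version on arXiv.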
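2. On April 13, we learned that Google researcher Nicholas Carlini had pointed out on his personal blog that the report plagiarized several paragraphs of a paper he co-authored, and that other paragraphs and sentences were plagiarized from other papers. We checked each item, and a duplication check confirmed that 179 words in Section 3.1 of Article 2, 74 words in Section 3.1 of Article 8, 55 words in Section 2.3 of Article 12, 159 words in Section 2 of Article 14, and 146 words in Section 1 of Article 16 duplicate other papers and constitute plagiarism. We have decided to remove this content from the report immediately, and a revised version will be submitted to arXiv today. We have also notified the authors of all articles to conduct a comprehensive review of all their content; a new version will be released only after strict review.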
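3. As the organizer of the report, BAAI should have rigorously reviewed all the content of every article, and it bears full responsibility for this problem. We deeply reproach ourselves, and we are especially grateful to our friends in academia and the media for helping us discover it. We will take this lesson to heart, overhaul our research management and paper publication processes, and welcome everyone to oversee our work.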
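Going forward, BAAI will take this as a warning and adopt concrete measures to strengthen research integrity and academic culture:
(1) Starting today, we will invite third-party experts to conduct an independent review of the report, and we will hold the responsible individuals accountable based on the formal findings of the investigation.
(2) We will further improve our institutional management. Through stricter review mechanisms and clearer disciplinary measures, we will strengthen academic-integrity education for researchers within the Academy and those it supports, to prevent similar incidents from happening again.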
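We welcome friends from all sectors to continue overseeing our work closely, and to criticize and correct any oversights and shortcomings in it.
Beijing Academy of Artificial Intelligence
April 13, 2022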
Statement on the Alleged Plagiarism by "A Roadmap for Big Model"
It has come to our attention that the survey report "A Roadmap for Big Model" uploaded on arXiv by a BAAI team is suspected of plagiarism. Immediately upon learning of the allegations, we organized an internal investigation, which confirmed the issue. BAAI is also initiating an independent review by third-party experts to further assess the issue and determine accountability. As a research institution that attaches great importance to academic standards, BAAI holds a zero-tolerance policy towards academic misconduct. We express our sincerest apologies to the authors of the original papers and to all those affected.
The report in question is a collection of 16 feature articles on big AI models. It was intended to cover all relevant literature in this field, in China and abroad, and was led by BAAI, which was responsible for designing its structure and compiling the whole. We invited researchers in China and abroad to write the 16 articles, each of which was written by a separate group of authors, for a total of over 200 pages. Since its initial release, the report has been continuously revised based on feedback, and was updated to its third version on arXiv on April 2.
On April 13, we learned that a Google researcher, Dr. Nicholas Carlini, had identified on his blog instances in which the report plagiarized several paragraphs of his own paper, as well as content from other papers. We checked these findings item by item and confirmed that some paragraphs in Section 2.3.1 of Article 2, Section 8.3.1 of Article 8, Section 12.2.3 of Article 12, Section 14.2.2 of Article 14, and Section 16.1 of Article 16 are duplicates of other papers and constitute plagiarism. We have removed these paragraphs from the report and are submitting a revised version to arXiv. The authors of all articles in the report have also been notified to conduct a rigorous review of their respective content, and a new version will be released after this review if needed. In addition, a third-party expert panel will be assembled to conduct an independent investigation of the issue, and those identified as responsible will be held accountable based on the final findings.
As the organizer of the report in question, BAAI takes responsibility for not conducting a thorough review before publishing. We are grateful to our colleagues in academia and the media for alerting us to this issue, and we will use this unfortunate incident as an opportunity to improve our research management and publication review process. We continue to welcome input and feedback to help us further improve our processes and culture.
Beijing Academy of Artificial Intelligence
April 13th, 2022