    Magicoder is an open-source series of large language models (LLMs) for code, developed to close the gap between models trained on synthetic instruction data and the performance demanded by real-world coding tasks.

    To do this, Magicoder was trained on 75K synthetic instruction examples created by a method called OSS-Instruct, which uses open-source code snippets as seeds to generate instruction data.

    This approach is designed to mitigate the inherent bias of purely synthetic training data: by grounding generation in real code references, OSS-Instruct produces instruction data that is more diverse and realistic, with greater controllability.
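    The seeding step described above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical prompt template and function name; the paper's actual template and generation pipeline differ:

    ```python
    # Hypothetical sketch of the OSS-Instruct idea: seed an LLM prompt with a
    # real open-source code snippet so the generated instruction data stays
    # grounded in real code. The wording below is illustrative, not the
    # paper's exact template.

    def build_oss_instruct_prompt(seed_snippet: str) -> str:
        """Wrap an open-source snippet in a prompt asking an LLM to invent a
        self-contained coding problem and solution inspired by it."""
        return (
            "You are given a code snippet drawn from an open-source project:\n\n"
            f"{seed_snippet}\n\n"
            "Inspired by this snippet, write a new, self-contained programming "
            "problem, followed by a correct solution."
        )

    seed = "def chunk(xs, n):\n    return [xs[i:i+n] for i in range(0, len(xs), n)]"
    prompt = build_oss_instruct_prompt(seed)
    # Sending this prompt to an LLM (call omitted here) would yield one synthetic
    # instruction-response pair; repeating over many snippets builds the corpus.
    ```

    Because each prompt is anchored to a different real snippet, the resulting 75K examples inherit the diversity of the underlying open-source code rather than the biases of a single generating model.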

    Magicoder and its enhanced version, MagicoderS, outperform other code models of similar size on a variety of benchmarks, including text-to-code generation in Python, multilingual coding, and data-science program completion.