
starcoder

bigcode/starcoder
Open Source · chat · open-weights
GA
Context: 8.2K
Max output: not listed
Pricing: Open weights
Modalities: text
Released: 24 Apr 2023
License: bigcode-openrail-m · bigcode/starcoder
AI summary
● machine-written

StarCoder: open-weights code generation model by BigCode

StarCoder is an open-access LLM for code generation from the BigCode community. Released in 2023, it is a 15.5B-parameter model trained on permissively licensed GitHub code from The Stack dataset. It supports fill-in-the-middle editing, accepts inputs of up to about 8K tokens, and ships under the BigCode OpenRAIL-M license, which permits commercial use subject to the license's use restrictions. Its successor, StarCoder2 (released in 2024), comes in three sizes (3B, 7B, 15B) with a 16K context window and is trained on 3.3 to 4.3 trillion tokens, depending on model size, drawn from The Stack v2, which covers 619 programming languages.
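Fill-in-the-middle works by rearranging the code around a gap into a single prompt, so the model generates the missing middle last. The sketch below shows one way to build such a prompt, assuming the sentinel tokens given on the bigcode/starcoder model card (<fim_prefix>, <fim_suffix>, <fim_middle>). The helper name build_fim_prompt is hypothetical and used only for illustration; it is not part of any library.

```python
# Sentinel tokens as listed on the bigcode/starcoder model card.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Hypothetical helper: place the code before and after the gap in
    prefix-suffix-middle order so the model continues with the missing middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# The gap sits between the function signature and the final print call.
prompt = build_fim_prompt(
    prefix="def print_hello_world():\n    ",
    suffix="\n    print('Hello world!')",
)
print(prompt)
```

The resulting string would be tokenized and passed to the model like any other prompt; whatever the model generates after <fim_middle> is the proposed infill, and the surrounding lines stay untouched.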

What's new
  • Fill-in-the-middle editing for inline code completion without touching surrounding lines
  • StarCoder2 raises the context window from 8K to 16K tokens and adopts Grouped Query Attention
  • StarCoder2 is trained on The Stack v2, which covers 619 programming languages
  • StarCoder2 is available in 3B, 7B, and 15B parameter sizes
  • Entire pipeline (data curation, training code, checkpoints) is publicly released
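The context window limits how many tokens the prompt and the generated output can occupy together. As a rough sketch, the snippet below trims a list of token IDs so that room remains for generation. It assumes window sizes of 8,192 tokens for StarCoder and 16,384 for StarCoder2; the helper fit_to_context and the choice to drop the oldest tokens first are illustrative assumptions, not something the source prescribes.

```python
STARCODER_CONTEXT = 8192    # assumed StarCoder window, in tokens
STARCODER2_CONTEXT = 16384  # assumed StarCoder2 window, in tokens


def fit_to_context(token_ids: list[int], context: int, max_new_tokens: int) -> list[int]:
    """Hypothetical helper: keep the most recent tokens so that the prompt
    plus max_new_tokens generated tokens fits inside the context window."""
    budget = context - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Negative slicing keeps the tail; shorter inputs are returned unchanged.
    return token_ids[-budget:]


# A 10,000-token prompt is trimmed for StarCoder but fits StarCoder2 as-is.
ids = list(range(10_000))
trimmed = fit_to_context(ids, STARCODER_CONTEXT, max_new_tokens=256)
untouched = fit_to_context(ids, STARCODER2_CONTEXT, max_new_tokens=256)
```

Dropping tokens from the left suits plain left-to-right completion; a fill-in-the-middle prompt would need its prefix and suffix trimmed separately so the sentinel tokens survive.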
Best for
  • Code completion and inline code editing across multiple languages
  • Self-hosted deployment in VPC-isolated or enterprise environments
  • Multi-file prompts and long diff analysis with extended context
  • Cross-language translation and mixed-monorepo development
Sources

Source: https://huggingface.co/bigcode/starcoder