Create your own syntax highlighter - in vanilla JS!

Published on March 7, 2025
JavaScript

Building a Syntax Highlighter with Vanilla JavaScript

In this tutorial, we'll create a lightweight syntax highlighter from scratch using only vanilla JavaScript, HTML, and CSS. This is perfect for blogs, documentation sites, or any project where you want to display code with proper highlighting without relying on external libraries.

What We'll Build

We'll build a syntax highlighter that:

  • Identifies and highlights different code elements (keywords, strings, comments, etc.)
  • Supports multiple programming languages
  • Works with code blocks in HTML
  • Has a clean, customizable appearance
  • Uses only vanilla JavaScript (no external dependencies)

Understanding Syntax Highlighting

Before we dive into implementation, let's understand how syntax highlighting works at a basic level:

  1. Tokenization: Break the code into smaller chunks (tokens) based on language syntax rules
  2. Classification: Categorize each token (keyword, string, comment, function, etc.)
  3. Rendering: Apply styling to each token category using CSS
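To make the three stages concrete, here is a deliberately tiny sketch. It tokenizes with a single regex, classifies tokens against a hypothetical three-word keyword list, and renders HTML spans. This is only an illustration of the pipeline. The highlighter built below works differently: it runs a list of patterns in sequence.

```javascript
// Tiny illustration of tokenize -> classify -> render.
// KEYWORDS is a hypothetical three-word list, not the tutorial's JS definition.
const KEYWORDS = new Set(['const', 'let', 'return']);

// 1. Tokenization: whitespace, double-quoted strings, numbers, identifiers,
//    or any other single character
function tokenize(source) {
  const tokenRegex = /\s+|"(?:\\.|[^"\\])*"|\d+(?:\.\d+)?|[A-Za-z_$][\w$]*|./g;
  return source.match(tokenRegex) || [];
}

// 2. Classification: return a category name, or null for plain text
function classify(token) {
  if (KEYWORDS.has(token)) return 'keyword';
  if (token.length > 1 && token.startsWith('"')) return 'string';
  if (/^\d/.test(token)) return 'number';
  return null;
}

// Escape characters that would otherwise be parsed as HTML
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// 3. Rendering: wrap classified tokens in spans that the CSS can style
function render(source) {
  return tokenize(source)
    .map(token => {
      const type = classify(token);
      const safe = escapeHtml(token);
      return type ? `<span class="hl-${type}">${safe}</span>` : safe;
    })
    .join('');
}

console.log(render('const n = 42;'));
// <span class="hl-keyword">const</span> n = <span class="hl-number">42</span>;
```

The hl-keyword and hl-number class names match the stylesheet defined later, so the same CSS would color this output.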

HTML Structure

First, let's set up our HTML structure with some code blocks to highlight:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Vanilla JS Syntax Highlighter</title>
  <link rel="stylesheet" href="highlighter.css">
</head>
<body>
  <h1>Syntax Highlighter Demo</h1>

  <h2>JavaScript Example</h2>
  <pre><code class="language-js">// A simple function
function calculateArea(radius) {
  const PI = 3.14159;
  let area = PI * radius * radius;
  console.log(`The area is ${area}`);
  return area;
}

// Call the function
calculateArea(5);</code></pre>

  <h2>HTML Example</h2>
  <!-- Escape <, > and & inside code blocks, or the browser parses the sample as real markup -->
  <pre><code class="language-html">&lt;!DOCTYPE html&gt;
&lt;html&gt;
&lt;head&gt;
  &lt;title&gt;Sample Page&lt;/title&gt;
  &lt;style&gt;
    body { font-family: sans-serif; }
  &lt;/style&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;h1&gt;Hello World!&lt;/h1&gt;
  &lt;p&gt;This is a paragraph.&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;</code></pre>

  <h2>CSS Example</h2>
  <pre><code class="language-css">/* Main styles */
body {
  font-family: 'Arial', sans-serif;
  line-height: 1.6;
  color: #333;
}

.container {
  max-width: 1200px;
  margin: 0 auto;
  padding: 1rem;
}

@media (max-width: 768px) {
  .container {
    padding: 0.5rem;
  }
}</code></pre>

  <script src="highlighter.js"></script>
</body>
</html>

CSS for Styling

Next, let's create our CSS file (highlighter.css) to style our code blocks and provide the highlight colors:

/* General code styling */

pre { background-color: #f5f5f5; border-radius: 4px; padding: 1rem; overflow-x: auto; border: 1px solid #ddd; margin: 1.5rem 0; }

code { font-family: 'Consolas', 'Monaco', 'Courier New', monospace; font-size: 14px; line-height: 1.5; tab-size: 2; }

/* Syntax highlighting colors */

.hl-keyword { color: #0033B3; font-weight: bold; }

.hl-string { color: #067D17; }

.hl-comment { color: #8C8C8C; font-style: italic; }

.hl-number { color: #1750EB; }

.hl-function { color: #6F42C1; }

.hl-operator { color: #DE5833; }

.hl-tag { color: #22863A; }

.hl-attribute { color: #6F42C1; }

.hl-selector { color: #22863A; }

.hl-property { color: #6F42C1; }

.hl-value { color: #005CC5; }

.hl-variable { color: #24292E; }

JavaScript Implementation

Now let's create our syntax highlighter in JavaScript (highlighter.js):

document.addEventListener('DOMContentLoaded', () => {
  // Language definitions. Every regex needs the g flag so replace() finds all matches.
  const languages = {
    js: {
      keywords: [
        'var', 'let', 'const', 'function', 'return', 'if', 'else', 'for', 'while', 'do',
        'switch', 'case', 'break', 'continue', 'new', 'class', 'try', 'catch', 'throw',
        'finally', 'async', 'await', 'import', 'export', 'from', 'true', 'false', 'null',
        'undefined', 'typeof', 'instanceof', 'this', 'super', 'extends', 'static'
      ],
      patterns: [
        { type: 'comment', regex: /\/\/.*?$/gm },                   // Single-line comments
        { type: 'comment', regex: /\/\*[\s\S]*?\*\//g },             // Multi-line comments
        { type: 'string', regex: /'(?:\\.|[^'\\])*'/g },             // Single-quoted strings
        { type: 'string', regex: /"(?:\\.|[^"\\])*"/g },             // Double-quoted strings
        { type: 'string', regex: /`(?:\\.|[^`\\])*`/g },             // Template literals
        { type: 'number', regex: /\b\d+(?:\.\d+)?\b/g },             // Numbers
        { type: 'function', regex: /\b[a-zA-Z_$][\w$]*(?=\s*\()/g }, // Function names (without the "(")
        { type: 'operator', regex: /[+\-*/%=<>!&|^~?:]+/g }          // Operators
      ]
    },
    html: {
      patterns: [
        { type: 'comment', regex: /<!--[\s\S]*?-->/g },   // HTML comments
        { type: 'tag', regex: /<\/?[a-zA-Z0-9-]+/g },     // Opening/closing tags
        { type: 'attribute', regex: /\s[a-zA-Z0-9-]+=/g }, // Attributes
        { type: 'string', regex: /"(?:\\.|[^"\\])*"/g },   // Double-quoted strings
        { type: 'string', regex: /'(?:\\.|[^'\\])*'/g }    // Single-quoted strings
      ]
    },
    css: {
      patterns: [
        { type: 'comment', regex: /\/\*[\s\S]*?\*\//g },                              // CSS comments
        { type: 'selector', regex: /[.#]?[a-zA-Z0-9-]+(?=\s*\{)/g },                  // Selectors
        { type: 'property', regex: /[a-zA-Z-]+(?=\s*:)/g },                           // Properties
        { type: 'value', regex: /:[^;{}]+/g },                                        // Values (stop at ; { or })
        { type: 'string', regex: /"(?:\\.|[^"\\])*"/g },                              // Double-quoted strings
        { type: 'string', regex: /'(?:\\.|[^'\\])*'/g },                              // Single-quoted strings
        { type: 'number', regex: /\b\d+(?:\.\d+)?(?:px|em|rem|%|vh|vw)?\b/g }         // Numbers with units
      ]
    }
  };

  // Process every <pre><code class="language-..."> block
  document.querySelectorAll('pre code[class*="language-"]').forEach(codeElement => {
    const langClass = Array.from(codeElement.classList).find(cl => cl.startsWith('language-'));
    const language = langClass ? langClass.replace('language-', '') : null;

    if (language && languages[language]) {
      highlightCode(codeElement, languages[language]);
    }
  });

  // Escape text so it is displayed literally instead of being parsed as HTML
  function escapeHtml(text) {
    return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
  }

  function highlightCode(element, language) {
    const originalCode = element.textContent;
    const MASK = '\u0000'; // Placeholder for text that already belongs to a token
    const keywordSet = new Set(language.keywords || []);
    const tokens = [];
    let code = originalCode;

    // Earlier patterns (comments, strings) take precedence: each match is recorded,
    // then masked with a same-length placeholder so token positions stay valid
    // and later patterns cannot match that text again.
    language.patterns.forEach(pattern => {
      code = code.replace(pattern.regex, (match, ...args) => {
        const index = args[args.length - 2]; // Position of the match
        // Skip matches that overlap an existing token, and leave keywords
        // such as `if (` to the keyword pass instead of the function pattern
        if (match.includes(MASK) || (pattern.type === 'function' && keywordSet.has(match))) {
          return match;
        }
        tokens.push({ index, length: match.length, type: pattern.type });
        return MASK.repeat(match.length);
      });
    });

    // Keywords are only matched in text that no pattern has claimed
    if (keywordSet.size > 0) {
      const keywordRegex = new RegExp(`\\b(?:${language.keywords.join('|')})\\b`, 'g');
      let match;
      while ((match = keywordRegex.exec(code)) !== null) {
        tokens.push({ index: match.index, length: match[0].length, type: 'keyword' });
      }
    }

    // Sort tokens by their position, then rebuild the code with <span> wrappers
    tokens.sort((a, b) => a.index - b.index);
    let lastIndex = 0;
    let highlightedCode = '';

    tokens.forEach(token => {
      // Non-highlighted text between the previous token and this one
      highlightedCode += escapeHtml(originalCode.substring(lastIndex, token.index));
      const text = originalCode.substring(token.index, token.index + token.length);
      highlightedCode += `<span class="hl-${token.type}">${escapeHtml(text)}</span>`;
      lastIndex = token.index + token.length;
    });

    // Any remaining text after the last token
    highlightedCode += escapeHtml(originalCode.substring(lastIndex));

    // Update the code element with the highlighted code
    element.innerHTML = highlightedCode;
  }
});

How the Highlighter Works

Let's break down how our syntax highlighter works:

  1. Language Definitions: We define syntax rules for each supported language (JavaScript, HTML, CSS), including:
     • Keywords (for programming languages)
     • Regex patterns for different token types (comments, strings, functions, etc.)
  2. Tokenization Process:
     • We loop through each code block with a language class
     • Identify the language from the class attribute
     • Run each regex pattern in order to find and categorize tokens, so earlier patterns (comments, strings) take precedence
     • Record each token's position and mask the matched text with a same-length placeholder, so later patterns can't match it again and the positions stay valid
  3. Highlighting:
     • We wrap each identified token in a <span> with an appropriate class
     • Apply CSS styles based on these classes
     • Reconstruct the code with highlighting applied
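The tokenization step is where pattern order and masking decide which regex wins, so it is worth experimenting with in isolation. Below is a DOM-free sketch of the same mask-and-rebuild idea. It uses a hypothetical tokenizeWithMasks() function and a cut-down pattern list, and you can run it in Node to inspect the token list. The placeholder must be exactly as long as the text it hides. If it were shorter (an empty string, say), every later match offset would drift away from the positions in the original code.

```javascript
// DOM-free sketch of mask-and-rebuild tokenization.
// tokenizeWithMasks is a hypothetical helper, not part of the tutorial's highlighter.js.
const MASK = '\u0000';

function tokenizeWithMasks(code, patterns) {
  const tokens = [];
  let masked = code;
  for (const { type, regex } of patterns) {
    masked = masked.replace(regex, (match, ...args) => {
      const index = args[args.length - 2]; // offset of this match
      if (match.includes(MASK)) return match; // overlaps an earlier token: skip
      tokens.push({ index, type, text: match });
      // Same-length placeholder keeps later offsets aligned with `code`
      return MASK.repeat(match.length);
    });
  }
  return tokens.sort((a, b) => a.index - b.index);
}

// A cut-down pattern list, in precedence order
const patterns = [
  { type: 'comment', regex: /\/\/.*?$/gm },
  { type: 'string', regex: /'(?:\\.|[^'\\])*'/g },
  { type: 'number', regex: /\b\d+(?:\.\d+)?\b/g },
];

const tokens = tokenizeWithMasks("let s = '42'; // 7 items", patterns);
console.log(tokens.map(t => `${t.type}@${t.index}: ${t.text}`));
```

The number pattern finds nothing here. The 42 sits inside a string and the 7 inside a comment, and both were masked before the number pattern ran.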
Advanced Features

To make our syntax highlighter more powerful, we can add these features:

Line Numbers

Add line numbers to our code blocks:

// Add this function inside the DOMContentLoaded callback, next to highlightCode()
function addLineNumbers(code) {
  const lines = code.split('\n');
  let numberedCode = '<table class="code-table"><tbody>';

  lines.forEach((line, index) => {
    // One row per line: the line number, then the (already highlighted) code
    numberedCode += `<tr><td class="line-number">${index + 1}</td>` +
      `<td class="line-content">${line || ' '}</td></tr>`;
  });

  numberedCode += '</tbody></table>';
  return numberedCode;
}

// Then, in highlightCode(), replace the innerHTML line with:
element.innerHTML = addLineNumbers(highlightedCode);

Add matching CSS:

.code-table { border-collapse: collapse; width: 100%; }

.line-number { user-select: none; text-align: right; color: #999; padding-right: 1em; min-width: 2em; border-right: 1px solid #ddd; }

.line-content { padding-left: 1em; white-space: pre; }
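Splitting highlightedCode on '\n' has one catch. A token that spans several lines, such as a /* ... */ comment or a multi-line template literal, produces a <span> that opens in one table cell and closes in another. The HTML parser closes that span at the end of the first cell, so the rest of the token loses its color. One way around this is to close the span at each line break and reopen it on the next line. The sketch below does that with a hypothetical wrapToken() helper; escapeHtml is repeated so the snippet runs on its own. In highlightCode(), you would call wrapToken(token.type, <token text>) in place of the line that builds each <span>.

```javascript
// Escape characters that would otherwise be parsed as HTML
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical helper: wraps each line of a token in its own <span>,
// so no span crosses a '\n' and the output can be split into table rows safely.
function wrapToken(type, text) {
  return text
    .split('\n')
    .map(part => (part ? `<span class="hl-${type}">${escapeHtml(part)}</span>` : ''))
    .join('\n');
}

const html = wrapToken('comment', '/* first line\n   second line */');
console.log(html.split('\n'));
```

Empty lines inside a token produce no span at all, which leaves the line-number row empty, as it would be for any blank line.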

Code Copying

Add a button to copy code to the clipboard:

// Add this function to the end of the DOMContentLoaded callback
function addCopyButtons() {
  document.querySelectorAll('pre').forEach(pre => {
    const copyButton = document.createElement('button');
    copyButton.className = 'copy-button';
    copyButton.textContent = 'Copy';

    copyButton.addEventListener('click', () => {
      // With line numbers enabled, copy only the code cells (empty lines were rendered as ' ')
      const cells = pre.querySelectorAll('.line-content');
      const code = cells.length
        ? Array.from(cells, cell => (cell.textContent === ' ' ? '' : cell.textContent)).join('\n')
        : pre.querySelector('code').textContent;

      // navigator.clipboard only exists in secure contexts (HTTPS or localhost)
      const copied = navigator.clipboard
        ? navigator.clipboard.writeText(code)
        : Promise.reject(new Error('Clipboard API unavailable'));

      copied
        .then(() => { copyButton.textContent = 'Copied!'; })
        .catch(() => { copyButton.textContent = 'Copy failed'; })
        .finally(() => {
          setTimeout(() => { copyButton.textContent = 'Copy'; }, 2000);
        });
    });

    pre.style.position = 'relative';
    copyButton.style.position = 'absolute';
    copyButton.style.top = '0.5rem';
    copyButton.style.right = '0.5rem';

    pre.appendChild(copyButton);
  });
}

// Call the function
addCopyButtons();

Add CSS for the copy button:

.copy-button { background-color: #f0f0f0; border: 1px solid #ddd; border-radius: 4px; padding: 0.25rem 0.5rem; cursor: pointer; font-size: 12px; opacity: 0.7; transition: opacity 0.2s; }

.copy-button:hover { opacity: 1; }

Supporting More Languages

To add support for more languages, extend the languages object with new language definitions. Because languages is declared inside the DOMContentLoaded callback, add the new entry there, right after the object is defined and before the code blocks are processed, and give your code blocks a matching class such as language-python:

// Example: Add support for Python
languages.python = {
  keywords: [
    'def', 'class', 'from', 'import', 'as', 'return', 'if', 'elif', 'else', 'for',
    'while', 'try', 'except', 'finally', 'with', 'in', 'is', 'not', 'lambda',
    'None', 'True', 'False', 'and', 'or', 'global', 'nonlocal'
  ],
  patterns: [
    { type: 'comment', regex: /#.*?$/gm },                      // Single-line comments
    { type: 'string', regex: /"""[\s\S]*?"""/g },               // Triple double quotes
    { type: 'string', regex: /'''[\s\S]*?'''/g },               // Triple single quotes
    { type: 'string', regex: /"(?:\\.|[^"\\])*"/g },            // Double-quoted strings
    { type: 'string', regex: /'(?:\\.|[^'\\])*'/g },            // Single-quoted strings
    { type: 'function', regex: /(?<=\bdef\s+)[a-zA-Z_]\w*/g },  // Function names after "def"
    { type: 'number', regex: /\b\d+(?:\.\d+)?\b/g },            // Numbers
    { type: 'operator', regex: /[+\-*/%=<>!&|^~?:]+/g }         // Operators
  ]
};
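One mistake is easy to make when adding languages: leaving the g flag off a pattern. String.prototype.replace() with a non-global regex replaces only the first match, so only the first comment or string in a block would be highlighted. A small guard can catch this. The sketch below uses a hypothetical normalizeLanguage() helper that returns a copy of a definition in which every pattern regex is global, with the other flags left intact.

```javascript
// Hypothetical helper: copies a language definition and adds the g flag
// to any pattern regex that lacks it (other flags are preserved).
function normalizeLanguage(definition) {
  return {
    ...definition,
    patterns: definition.patterns.map(pattern => ({
      ...pattern,
      regex: pattern.regex.global
        ? pattern.regex
        : new RegExp(pattern.regex.source, pattern.regex.flags + 'g'),
    })),
  };
}

// A definition whose comment pattern forgot the g flag
const lang = normalizeLanguage({
  keywords: ['def'],
  patterns: [{ type: 'comment', regex: /#.*?$/m }],
});

console.log(lang.patterns[0].regex.flags); // "gm"
```

Inside the highlighter you would wrap each definition as you register it, for example languages.python = normalizeLanguage({ ... }).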

Improving Performance

For large code blocks, or pages with many of them, these optimizations can help:

  1. Lazy highlighting: only highlight code blocks as they scroll into view (for example with IntersectionObserver)
  2. Caching: store highlighted code so identical blocks are not processed twice
  3. Web Workers: move processing to a background thread so the UI stays responsive

Here's a simple cache implementation to add:

// Add this to the beginning of the script
const highlightCache = new Map();

// Modify the highlightCode function to use the cache
function highlightCode(element, language) {
  const code = element.textContent;
  // Key on the language name plus the source text
  const langName = language.name || Object.keys(languages).find(key => languages[key] === language);
  const cacheKey = `${langName}-${code}`;

  if (highlightCache.has(cacheKey)) {
    element.innerHTML = highlightCache.get(cacheKey);
    return;
  }

  // ... existing highlighting code ...

  // Cache the result before returning
  highlightCache.set(cacheKey, highlightedCode);
  element.innerHTML = highlightedCode;
}
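The cache above keys on the full source text and never evicts entries. On a long-lived page that keeps re-highlighting changing content, it can therefore grow without limit. One option, sketched here under that assumption, is a small least-recently-used cache built on Map's insertion order. The class name LruCache and its limit parameter are hypothetical. It exposes the same has/get/set methods the snippet uses, so it could stand in for new Map().

```javascript
// Hypothetical bounded cache with least-recently-used eviction.
class LruCache {
  constructor(limit = 50) {
    this.limit = limit;
    this.map = new Map();
  }

  has(key) {
    return this.map.has(key);
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    // Re-insert the entry so it becomes the most recently used
    const value = this.map.get(key);
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    // Map iterates in insertion order, so the first key is the least recently used
    if (this.map.size > this.limit) {
      this.map.delete(this.map.keys().next().value);
    }
    return this;
  }
}

// Usage: const highlightCache = new LruCache(100);
```

Reading an entry moves it to the end of the Map, so blocks that keep reappearing stay cached while one-off blocks are evicted first.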

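Browser Compatibility

This syntax highlighter works in all modern browsers. For older browsers, you might need polyfills for:

  • Array.from()
  • NodeList.prototype.forEach()
  • String.prototype.repeat()

Polyfills can't add syntax, though: for very old browsers, the arrow functions, template literals, and let/const declarations also need to be transpiled (for example with Babel). The copy button additionally depends on navigator.clipboard, which browsers expose only in secure contexts (HTTPS or localhost).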

Conclusion

You've now built a fully functional syntax highlighter using vanilla JavaScript! This highlighter is:

  • Lightweight and dependency-free
  • Customizable with your own styling
  • Extensible to other programming languages
  • Easy to integrate into any website

You can enhance it further by:

  • Adding more language definitions
  • Improving regex patterns for more accurate highlighting
  • Creating themes for different color schemes
  • Implementing line highlighting or line linking

By understanding the core concepts of tokenization and pattern matching, you now have the knowledge to build and customize syntax highlighting for any web project.