Create your own syntax highlighter - in vanilla JS!
Building a Syntax Highlighter with Vanilla JavaScript
In this tutorial, we'll create a lightweight syntax highlighter from scratch using only vanilla JavaScript, HTML, and CSS. This is perfect for blogs, documentation sites, or any project where you want to display code with proper highlighting without relying on external libraries.
What We'll Build
We'll build a syntax highlighter that:
- Identifies and highlights different code elements (keywords, strings, comments, etc.)
- Supports multiple programming languages
- Works with code blocks in HTML
- Has a clean, customizable appearance
- Uses only vanilla JavaScript (no external dependencies)
Understanding Syntax Highlighting
Before we dive into implementation, let's understand how syntax highlighting works at a basic level:
- Tokenization: Break the code into smaller chunks (tokens) based on language syntax rules
- Classification: Categorize each token (keyword, string, comment, function, etc.)
- Rendering: Apply styling to each token category using CSS
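The three steps can be sketched on a toy input. This is a minimal illustration rather than the highlighter built later in this tutorial: the function names (`tokenize`, `classify`, `render`), the tiny keyword list, and the `hl-` class prefix are assumptions made for the example, and the input is assumed to contain no `<`, `>` or `&`.

```javascript
// Hypothetical three-stage pipeline; names and rules are illustrative only.
const KEYWORDS = new Set(['const', 'let', 'return']);

// 1. Tokenization: split into numbers, identifiers, whitespace runs, or single characters
function tokenize(code) {
  return code.match(/\d+|[A-Za-z_$][\w$]*|\s+|./g) || [];
}

// 2. Classification: attach a category to each token
function classify(token) {
  if (KEYWORDS.has(token)) return 'keyword';
  if (/^\d+$/.test(token)) return 'number';
  return null; // Plain text, left unstyled
}

// 3. Rendering: wrap classified tokens in a span that CSS can target
function render(code) {
  return tokenize(code)
    .map(token => {
      const type = classify(token);
      return type ? `<span class="hl-${type}">${token}</span>` : token;
    })
    .join('');
}

console.log(render('const x = 42;'));
// <span class="hl-keyword">const</span> x = <span class="hl-number">42</span>;
```

A real highlighter needs richer rules (strings, comments, escaping), which is what the rest of the tutorial adds.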
HTML Structure
First, let's set up our HTML structure with some code blocks to highlight:
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Vanilla JS Syntax Highlighter</title>
  <link rel="stylesheet" href="highlighter.css">
</head>
<body>
  <h1>Syntax Highlighter Demo</h1>

  <h2>JavaScript Example</h2>
  <pre><code class="language-js">// A simple function
function calculateArea(radius) {
  const PI = 3.14159;
  let area = PI * radius * radius;
  console.log(`The area is ${area}`);
  return area;
}

// Call the function
calculateArea(5);</code></pre>

  <h2>HTML Example</h2>
  <!-- Markup shown as code must be escaped (&lt; and &gt;) so the browser displays it instead of parsing it -->
  <pre><code class="language-html">&lt;!DOCTYPE html&gt;
&lt;html&gt;
&lt;head&gt;
  &lt;title&gt;Sample Page&lt;/title&gt;
  &lt;style&gt;
    body { font-family: sans-serif; }
  &lt;/style&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;h1&gt;Hello World!&lt;/h1&gt;
  &lt;p&gt;This is a paragraph.&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;</code></pre>

  <h2>CSS Example</h2>
  <pre><code class="language-css">/* Main styles */
body {
  font-family: 'Arial', sans-serif;
  line-height: 1.6;
  color: #333;
}

.container {
  max-width: 1200px;
  margin: 0 auto;
  padding: 1rem;
}

@media (max-width: 768px) {
  .container {
    padding: 0.5rem;
  }
}</code></pre>

  <script src="highlighter.js"></script>
</body>
</html>
CSS for Styling
Next, let's create our CSS file (highlighter.css) to style our code blocks and define the highlight colors:
/* General code styling */
pre {
  background-color: #f5f5f5;
  border-radius: 4px;
  padding: 1rem;
  overflow-x: auto;
  border: 1px solid #ddd;
  margin: 1.5rem 0;
}

code {
  font-family: 'Consolas', 'Monaco', 'Courier New', monospace;
  font-size: 14px;
  line-height: 1.5;
  tab-size: 2;
}

/* Syntax highlighting colors */
.hl-keyword { color: #0033B3; font-weight: bold; }
.hl-string { color: #067D17; }
.hl-comment { color: #8C8C8C; font-style: italic; }
.hl-number { color: #1750EB; }
.hl-function { color: #6F42C1; }
.hl-operator { color: #DE5833; }
.hl-tag { color: #22863A; }
.hl-attribute { color: #6F42C1; }
.hl-selector { color: #22863A; }
.hl-property { color: #6F42C1; }
.hl-value { color: #005CC5; }
.hl-variable { color: #24292E; }
JavaScript Implementation
Now let's create our syntax highlighter in JavaScript (highlighter.js):
document.addEventListener('DOMContentLoaded', () => {
  // Language definitions. Patterns run in the order listed, so earlier ones take precedence.
  const languages = {
    js: {
      keywords: [
        'var', 'let', 'const', 'function', 'return', 'if', 'else', 'for', 'while', 'do',
        'switch', 'case', 'break', 'continue', 'new', 'class', 'try', 'catch', 'throw',
        'finally', 'async', 'await', 'import', 'export', 'from', 'true', 'false', 'null',
        'undefined', 'typeof', 'instanceof', 'this', 'super', 'extends', 'static'
      ],
      patterns: [
        { type: 'comment', regex: /\/\/.*?$/gm },            // Single-line comments
        { type: 'comment', regex: /\/\*[\s\S]*?\*\//g },     // Multi-line comments
        { type: 'string', regex: /'(?:\\.|[^'\\])*'/g },     // Single-quoted strings
        { type: 'string', regex: /"(?:\\.|[^"\\])*"/g },     // Double-quoted strings
        { type: 'string', regex: /`(?:\\.|[^`\\])*`/g },     // Template literals
        { type: 'number', regex: /\b\d+(?:\.\d+)?\b/g },     // Numbers
        // Function names: an identifier followed by "(", skipping keywords such as "if ("
        { type: 'function', regex: /\b(?!(?:if|for|while|switch|catch|function|return|typeof)\b)[a-zA-Z_$][\w$]*(?=\s*\()/g },
        { type: 'operator', regex: /[+\-*/%=<>!&|^~?:]+/g }  // Operators
      ]
    },
    html: {
      patterns: [
        { type: 'comment', regex: /<!--[\s\S]*?-->/g },      // HTML comments
        { type: 'tag', regex: /<\/?[a-zA-Z0-9-]+/g },        // Opening/closing tags
        { type: 'attribute', regex: /\s[a-zA-Z0-9-]+=/g },   // Attributes
        { type: 'string', regex: /"(?:\\.|[^"\\])*"/g },     // Double-quoted strings
        { type: 'string', regex: /'(?:\\.|[^'\\])*'/g }      // Single-quoted strings
      ]
    },
    css: {
      patterns: [
        { type: 'comment', regex: /\/\*[\s\S]*?\*\//g },                 // CSS comments
        { type: 'selector', regex: /[.#]?[a-zA-Z0-9-]+(?=\s*\{)/g },     // Selectors
        { type: 'property', regex: /[a-zA-Z-]+(?=\s*:)/g },              // Properties
        { type: 'value', regex: /:[^;{}]+/g },                           // Values (stop at ";" or a brace)
        { type: 'string', regex: /"(?:\\.|[^"\\])*"/g },                 // Double-quoted strings
        { type: 'string', regex: /'(?:\\.|[^'\\])*'/g },                 // Single-quoted strings
        { type: 'number', regex: /\b\d+(?:\.\d+)?(?:px|em|rem|%|vh|vw)?\b/g } // Numbers with units
      ]
    }
  };
  // Process all code blocks
  document.querySelectorAll('pre code[class^="language-"]').forEach(codeElement => {
    // Get the language from the "language-xxx" class
    const classList = Array.from(codeElement.classList);
    const langClass = classList.find(cl => cl.startsWith('language-'));
    const language = langClass ? langClass.replace('language-', '') : null;

    if (language && languages[language]) {
      highlightCode(codeElement, languages[language]);
    }
  });
  function highlightCode(element, language) {
    // Keep the untouched source; `code` is a working copy that gets masked as we go
    const originalCode = element.textContent;
    let code = originalCode;

    // Store the position of each token to avoid nesting issues
    const tokens = [];

    // Run the patterns in order; comments and strings come first, so they take precedence
    language.patterns.forEach(pattern => {
      code = code.replace(pattern.regex, (match, ...args) => {
        // Without named groups, the match offset is the second-to-last argument
        const index = args[args.length - 2];
        tokens.push({ index, length: match.length, type: pattern.type });

        // Mask the match with spaces of the same length so later patterns skip it
        // and every recorded index stays valid
        return ' '.repeat(match.length);
      });
    });

    // Process keywords for languages that have them
    if (language.keywords) {
      const keywordRegex = new RegExp(`\\b(${language.keywords.join('|')})\\b`, 'g');
      let match;

      while ((match = keywordRegex.exec(code)) !== null) {
        tokens.push({ index: match.index, length: match[0].length, type: 'keyword' });
      }
    }

    // Sort tokens by their position in the text
    tokens.sort((a, b) => a.index - b.index);

    // Reconstruct the original text with highlighting
    let lastIndex = 0;
    let highlightedCode = '';

    tokens.forEach(token => {
      // Skip a token that overlaps one already emitted
      if (token.index < lastIndex) return;

      // Add non-highlighted text before this token
      highlightedCode += escapeHtml(originalCode.substring(lastIndex, token.index));

      // Add the token with its highlight class
      const text = originalCode.substring(token.index, token.index + token.length);
      highlightedCode += `<span class="hl-${token.type}">${escapeHtml(text)}</span>`;

      lastIndex = token.index + token.length;
    });

    // Add any remaining text
    highlightedCode += escapeHtml(originalCode.substring(lastIndex));

    // Update the code element with highlighted code
    element.innerHTML = highlightedCode;
  }

  // Escape characters that innerHTML would otherwise parse as markup
  function escapeHtml(text) {
    return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
  }
});
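The placeholder step is easiest to see in isolation. The sketch below assumes the placeholder is a run of spaces exactly as long as the match, which keeps every later match at the same index as in the original text. `maskMatches` is a hypothetical helper written for this illustration, not a function from the highlighter itself.

```javascript
// Hypothetical helper: record each match, then blank it out without changing the string length.
function maskMatches(code, regex, type, tokens) {
  return code.replace(regex, (match, ...args) => {
    // With no named groups, the match offset is the second-to-last callback argument
    const index = args[args.length - 2];
    tokens.push({ index, length: match.length, type });
    return ' '.repeat(match.length);
  });
}

const source = 'let s = "let it be"; // let';
const tokens = [];

// Comments and strings are masked first, so the keyword pass cannot see inside them
let masked = maskMatches(source, /\/\/.*$/gm, 'comment', tokens);
masked = maskMatches(masked, /"(?:\\.|[^"\\])*"/g, 'string', tokens);
masked = maskMatches(masked, /\blet\b/g, 'keyword', tokens);

console.log(masked.length === source.length); // true
console.log(tokens.filter(t => t.type === 'keyword').length); // 1
```

Only the first `let` is reported as a keyword: the other two sit inside a string and a comment that were already masked.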
How the Highlighter Works
Let's break down how our syntax highlighter works:
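- Language Definitions: We define syntax rules for each supported language (JavaScript, HTML, CSS), including a keyword list where the language has one and regular-expression patterns for comments, strings, numbers, and other elements
- Tokenization Process: Each pattern is run against the code in order; every match is recorded with its position and type, then replaced with a placeholder so later patterns don't process it again
- Highlighting: The tokens are sorted by position and each one is wrapped in a <span> with an appropriate class
Advanced Features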
To make our syntax highlighter more powerful, we can add these features:
Line Numbers
Add line numbers to our code blocks:
// Add this helper next to highlightCode() and call it before setting innerHTML
function addLineNumbers(code) {
  const lines = code.split('\n');
  let numberedCode = '<table class="code-table"><tbody>';

  lines.forEach((line, index) => {
    numberedCode += `
      <tr>
        <td class="line-number">${index + 1}</td>
        <td class="line-content">${line || ' '}</td>
      </tr>
    `;
  });

  numberedCode += '</tbody></table>';
  return numberedCode;
}

// Then replace the innerHTML line with:
element.innerHTML = addLineNumbers(highlightedCode);
Add matching CSS:
.code-table {
  border-collapse: collapse;
  width: 100%;
}

.line-number {
  user-select: none;
  text-align: right;
  color: #999;
  padding-right: 1em;
  min-width: 2em;
  border-right: 1px solid #ddd;
}

.line-content {
  padding-left: 1em;
  white-space: pre;
}
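One thing to watch with the line-number table: addLineNumbers splits the highlighted HTML on newlines, so a `<span>` that covers several lines (a block comment, for instance) is opened in one row and closed in another. A possible workaround is to close and reopen the span at each line break. The sketch assumes that highlight spans never nest and that the code text itself is HTML-escaped. `splitHighlightedLines` is a hypothetical helper, not part of the code above.

```javascript
// Hypothetical helper: split highlighted HTML into lines, each with balanced <span> tags.
function splitHighlightedLines(html) {
  let openTag = null; // Opening tag of a span still open at the end of the previous line

  return html.split('\n').map(line => {
    const prefix = openTag || '';

    const lastOpen = line.lastIndexOf('<span');
    const lastClose = line.lastIndexOf('</span>');
    if (lastOpen > lastClose) {
      // The line ends inside a span: remember its opening tag
      openTag = line.slice(lastOpen, line.indexOf('>', lastOpen) + 1);
    } else if (lastClose !== -1) {
      // The last tag on the line is a closing one: nothing stays open
      openTag = null;
    }
    // A line without any tags leaves openTag unchanged

    return prefix + line + (openTag ? '</span>' : '');
  });
}

const sample = '<span class="hl-comment">/* first\nsecond */</span> done';
console.log(splitHighlightedLines(sample));
```

With this in place, addLineNumbers could iterate over `splitHighlightedLines(code)` instead of `code.split('\n')`.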
Code Copying
Add a button to copy code to clipboard:
// Add this function to the end of the DOMContentLoaded callback
function addCopyButtons() {
  document.querySelectorAll('pre').forEach(pre => {
    const codeElement = pre.querySelector('code');
    if (!codeElement) return; // Skip <pre> blocks without a <code> child

    const copyButton = document.createElement('button');
    copyButton.className = 'copy-button';
    copyButton.textContent = 'Copy';

    copyButton.addEventListener('click', () => {
      // navigator.clipboard is only available in secure contexts (HTTPS or localhost)
      navigator.clipboard.writeText(codeElement.textContent).then(() => {
        copyButton.textContent = 'Copied!';
        setTimeout(() => {
          copyButton.textContent = 'Copy';
        }, 2000);
      });
    });

    pre.style.position = 'relative';
    copyButton.style.position = 'absolute';
    copyButton.style.top = '0.5rem';
    copyButton.style.right = '0.5rem';

    pre.appendChild(copyButton);
  });
}

// Call the function
addCopyButtons();
Add CSS for the copy button:
.copy-button {
  background-color: #f0f0f0;
  border: 1px solid #ddd;
  border-radius: 4px;
  padding: 0.25rem 0.5rem;
  cursor: pointer;
  font-size: 12px;
  opacity: 0.7;
  transition: opacity 0.2s;
}

.copy-button:hover {
  opacity: 1;
}
Supporting More Languages
To add support for more languages, simply extend the languages object with new language definitions:
// Example: Add support for Python (place this after the languages object,
// inside the DOMContentLoaded callback, and use class="language-python")
languages.python = {
  keywords: [
    'def', 'class', 'from', 'import', 'as', 'return', 'if', 'elif', 'else', 'for',
    'while', 'try', 'except', 'finally', 'with', 'in', 'is', 'not', 'lambda',
    'None', 'True', 'False', 'and', 'or', 'global', 'nonlocal'
  ],
  patterns: [
    { type: 'comment', regex: /#.*?$/gm },                        // Single-line comments
    { type: 'string', regex: /"""[\s\S]*?"""/g },                 // Triple double quotes
    { type: 'string', regex: /'''[\s\S]*?'''/g },                 // Triple single quotes
    { type: 'string', regex: /"(?:\\.|[^"\\])*"/g },              // Double-quoted strings
    { type: 'string', regex: /'(?:\\.|[^'\\])*'/g },              // Single-quoted strings
    { type: 'function', regex: /\bdef\s+([a-zA-Z_]\w*)\s*\(/g },  // Function definitions
    { type: 'number', regex: /\b\d+(?:\.\d+)?\b/g },              // Numbers
    { type: 'operator', regex: /[+\-*/%=<>!&|^~?:]+/g }           // Operators
  ]
};
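Before wiring a new definition into the page, it can help to check its patterns against a short sample. The sketch below does this for a hypothetical JSON definition, which is not part of the tutorial's code. The `listMatches` helper is also an assumption of this example.

```javascript
// Hypothetical JSON definition, in the same shape as the entries of `languages`
const json = {
  keywords: ['true', 'false', 'null'],
  patterns: [
    { type: 'property', regex: /"(?:\\.|[^"\\])*"(?=\s*:)/g },          // Keys: a string followed by a colon
    { type: 'string', regex: /"(?:\\.|[^"\\])*"/g },                    // Any other string
    { type: 'number', regex: /-?\b\d+(?:\.\d+)?(?:[eE][+-]?\d+)?\b/g }  // Numbers
  ]
};

// List what each pattern matches in a sample, in definition order.
// Each pattern runs on the raw sample here, without the masking the highlighter applies.
function listMatches(definition, sample) {
  return definition.patterns.map(({ type, regex }) => ({
    type,
    matches: sample.match(regex) || []
  }));
}

const report = listMatches(json, '{"name": "Ada", "age": 36, "admin": true}');
report.forEach(({ type, matches }) => console.log(type, matches));
```

Because this check skips the masking step, the `string` pattern also reports the keys. In the highlighter itself the keys would already be masked by the `property` pattern that runs before it.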
Improving Performance
For large code blocks, we can improve performance with these optimizations:
- Debouncing: Wait for scrolling to pause, then highlight only the code blocks that are visible
- Caching: Store highlighted code to avoid reprocessing
- Web Workers: Move processing to a background thread for better UI responsiveness
// Add this to the beginning of the script
const highlightCache = new Map();

// Modify the highlightCode function to use the cache
function highlightCode(element, language) {
  const code = element.textContent;
  const languageName = language.name || Object.keys(languages).find(key => languages[key] === language);
  const cacheKey = `${languageName}-${code}`;

  if (highlightCache.has(cacheKey)) {
    element.innerHTML = highlightCache.get(cacheKey);
    return;
  }

  // ... existing highlighting code ...

  // Cache the result before returning
  highlightCache.set(cacheKey, highlightedCode);
  element.innerHTML = highlightedCode;
}
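Debouncing, the first item in the list, can be sketched with a small wrapper. The `debounce` helper, the 150 ms delay, and the `highlightVisibleBlocks` callback named in the usage comment are assumptions of this example, not part of the code above.

```javascript
// Hypothetical helper: postpone `fn` until `delay` ms pass without another call
function debounce(fn, delay) {
  let timerId = null;
  return function debounced(...args) {
    clearTimeout(timerId); // Cancel the run scheduled by the previous call, if any
    timerId = setTimeout(() => {
      timerId = null;
      fn.apply(this, args);
    }, delay);
  };
}

// Demonstration: three rapid calls schedule a single run
let runs = 0;
const countRun = debounce(() => { runs += 1; }, 150);
countRun();
countRun();
countRun();
console.log(runs); // 0 at this point; the single run happens after the delay

// In the browser this might be used as follows (highlightVisibleBlocks is hypothetical):
// window.addEventListener('scroll', debounce(highlightVisibleBlocks, 150));
```

The delay is a trade-off: a longer value means fewer highlighting passes while scrolling, but newly visible blocks stay unstyled for longer.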
Browser Compatibility
This syntax highlighter works in all modern browsers. For older browsers, you might need polyfills for:
Array.from()
NodeList.forEach()
String.prototype.repeat()
String.prototype.startsWith()
Conclusion
You've now built a fully functional syntax highlighter using vanilla JavaScript! This highlighter is:
- Lightweight and dependency-free
- Customizable with your own styling
- Extensible to support any programming language
- Easy to integrate into any website
To take it further, consider:
- Adding more language definitions
- Improving regex patterns for more accurate highlighting
- Creating themes for different color schemes
- Implementing line highlighting or line linking