How I Built Real-Time Log Streaming in SaturnCI

by Jason Swett

SaturnCI is a continuous integration (CI) platform I built. It's roughly analogous to products like CircleCI or GitHub Actions. To summarize in a sentence what SaturnCI does, it runs your tests in the cloud.

Any tool that runs jobs for you is of course going to need to show you your log output, preferably in real time. When I started building SaturnCI, one of the first things I had to figure out was how to build real-time log streaming. Here's how I did it.

As always with any large and ambiguous problem, I began by breaking the problem down into sub-problems in order to make it more tractable.

Breaking down the problem

The architecture of SaturnCI is that tests get executed on ephemeral worker machines, which are separate from the web servers that host the SaturnCI web application. I decided to break down the log streaming challenge into two fundamental parts.

How do I get log content from the workers to the web servers?
How do I get log content from the web server to the browser?

This mental separation was useful, although I didn't tackle these two sub-problems serially. In fact, I rarely do. Usually I aim to get a very low-fidelity version of all the sub-problems working so that I have the entire chain of sub-problems solved from a very early point in the project. Some people call this a steel thread approach. I arrived at my steel thread solution by asking myself a classic question.

"What's the simplest thing that could possibly work?"

For the first iteration of this feature I decided to say to heck with real-time streaming, I'll just wait until the whole test suite run is finished and send the logs in one big chunk. I suppose this represents an additional subdivision of the problem:

How do I get log content from the workers to the browser at all?
How do I get log content from the workers to the browser in real time?

Remember, there are two legs of the trip from worker to browser: first the log content has to get to the web server, then from there it has to get to the browser. (Creating a direct channel of communication between the workers and the browser would create duplication and make the system less modular.)

To get the logs from the worker to the web server, I used a curl command with Basic HTTP authentication. Credentials? Hard-coded. On the web side, I didn't use cloud storage, I just stuck the content right in the database because that was the simplest thing to do.

To get the logs from the web server to the browser, I simply crapped out the content to the screen and wrapped it in a pre tag.

This all might sound stupid, by the way, since I know from the beginning that my solutions won't work out in the long run. I'm building the solutions twice: once the expedient way, then once the right way. Why not just do it the right way from the beginning instead of doing double the work?

The reason I'm choosing to pay this extra price is to mitigate risk. Imagine for example that all the movie theaters in the world get taken over by thugs. (Don't think it couldn't happen.) You can either pay $20 to see a movie plus an extra "thug tax" of $30 (total cost $50), or you can try to sneak past the thugs which half the time, ends in no tax ($20 total instead of $50), but half the time ends in getting a $500 "fine" (on top of the $20 you already paid, making a total of $520) plus getting your ass kicked so bad you can't limp back to the theater for a whole week. Ouch. The options are thus:

Pay-up policy: guaranteed cost of $50 per movie
Sneak-past policy: average cost of ($20 + $500 = $520) * 0.50 = $260 per movie, plus an average of half an ass kicking and half a week of delay per movie
Organize the Great Moviegoer Uprising of 2026 and take back our theaters

The third option appeals to me personally, but from a pure cost perspective, it's far better to just pay the damn thugs. That's what I'm doing when I choose to pay the "tax" of sticking the log content straight in the database even though I know that's not what I'm going to want in the long run. I'm choosing the guarantee of a small cost instead of the risk of a large one. The price I pay on average then is smaller.

How do we make the logs update in real time?

Here's what we have working at this point:

Log content gets sent from worker to web server, in one single chunk, using curl and hard-coded Basic HTTP authentication
Log content gets saved to the main database by the web server
Log content gets shown in the browser in raw form

In any project I always try to work on the hardest and most mysterious work first. The most uncertain question in the very beginning, at least to me, was how the logs would make the trip from the worker to the web server to the browser. Now that that's sorted, what's the most mysterious thing now?

It's not migrating from hard-coded Basic HTTP authentication to something more production-ready. That's just gruntwork. It's not moving log storage from the database to the cloud. That's gruntwork too (and arguably premature optimization). Nor is it formatting and styling the log output, yet more gruntwork. I think the most mysterious part to me is the actual streaming part of the log streaming. How do we do that?

When I first built this feature, my naive first guess at how real-time log streaming might work was something like the following (in pseudo-bash):

while true; do
  LOG_CONTENT=$(cat log.txt)
  curl https://www.saturnci.com/api/v1/test_suite_run/123/logs $LOG_CONTENT
  sleep 5
done

This first pass doesn't try to be efficient. It reads in the entirety of the logs and then sends the whole thing off to the server. A nice part about doing it this way is that the server-side code doesn't even immediately have to change. It will just save to the database whatever log content it's given, and then I can repeatedly refresh the page to see the log content come through in real time.

This version of streaming is clunky as hell, and it's a far cry from what I ultimately want, but it's a small and un-risky piece of work which can get the feature meaningfully closer to the ultimate goal. As a matter of fact, even though this implementation of real-time log streaming is almost absurdly shitty, it is, nonetheless, real-time log streaming! The work from here on out doesn't consist of profound mystery-solving; it merely consists making the log streaming incrementally less crappy. That's exactly the way I wanted building this feature to work out.

Real-time updates without refreshing the page

Time to again take stock of what we have:

Log content gets sent from worker to web server, the whole log file being sent all at once, but at 5-second intervals during the execution of the test suite run instead of just once at the end
Log content gets saved to the main database every 5 seconds by the web server (thanks to the worker hitting the web server every 5 seconds)
Log content gets shown in the browser in raw form and requires manual page refreshes in order to see current log content

The obvious next step is to make the browser fetch log updates automatically so that new log content shows up without having to refresh the page. My first solution to this was to use Turbo Stream broadcasts, but I later realized that broadcasting the log content is illogical and wasteful.

The problem with broadcasting is that if the user has, say, three test suite runs running concurrently, then my web server broadcasts log updates for all three of those test suite runs every time an update comes in, even though the user is unlikely to be watching the real-time output of more than one test suite run at a time. For any test suite run whose logs the user is not watching, any log broadcast sent is a waste of resources.

The solution to this is simple: change the push to a pull. (Or, to use the more precise terms, switch from broadcasting to polling.) Instead of making the web server say, "Hey, all you test suite runs in the browser, I have new log content for you!" make the browser say, "Hey web server, I'm looking at test suite run X. Is there any new log content specifically for test suite run X?" This way, a test suite run's log content gets sent if and only if someone has a browser tab open for that specific test suite run. Switching to polling had a double advantage: it was both conceptually simpler and significantly better for performance.

How do we send only the new content?

Sending the entirety of the log file every five seconds technically works but it's not very efficient. Here's what I did to send only the new part of the content. The original code for this was actually written in bash, but the entirety of the worker code eventually got complex enough that I decided to give up my stupid push-and-pray dev cycle and start using Ruby and TDD instead. So here's the Ruby version.

most_recent_total_line_count = 0

loop do
  all_log_lines = File.readlines(@log_file_path)

  newest_log_content = all_log_lines[most_recent_total_line_count..].join("\n")
  send_content(newest_log_content)

  most_recent_total_line_count = all_log_lines.count

  sleep(@wait_interval)
end

I always try to make my code clear enough that it doesn't require an explanation in English but I'll give one here anyway, just in case: every five seconds, see how many lines the full log is, ignore however many lines we've already sent, and send whatever remains.

I expected the only-send-new-content logic would be much more complex than this, but aside from two minor tweaks (namely, "don't fire a request when there's nothing new to send" and "don't continue to stream after the test suite run has ended), the code you see above is all that has ever been needed.

Auto-scrolling

At this point, log content gets sent from worker to web server, at five second intervals, one little bit at a time instead of all at once. There still remains some gruntwork to be done before we can say that the worker-to-web-server leg of the journey is production-ready, but we can at least say that the mysterious parts have been well-solved.

The mysteries of getting log content from web server to browser have also mostly been solved. Actually, this problem can be divided into two sub-problems: 1) getting the log content, when a web request is made, from the database to the request payload, and 2) turning the request payload into something optimally usable for the user. We have the first problem solved but not the second.

The biggest shortcoming, currently, is that the log content doesn't auto-scroll, meaning that after the viewport fills up, the newest log content is hidden. Conceptually, implementing auto-scrolling is pretty simple: just scroll to the bottom each time some new content comes in.

document.addEventListener("terminal-output:newContent", () => {
  this.element.scrollTop = this.element.scrollHeight
}

There are devils in the details though. What happens when the user manually scrolls up? Obviously we don't want to keep auto-scrolling to the bottom and make the user lose their place. And when the user scrolls back down to the bottom of the log viewer, the auto-scrolling should resume. This feature actually turned out not to be that complicated. Here's the full code for the Stimulus controller that controls scrolling.

import { Controller } from "@hotwired/stimulus"

const AUTOSCROLL_BOTTOM_THRESHOLD = 100

export default class extends Controller {
  connect() {
    this.scrollMode = "auto"
    this.autoScrollToBottom()

    document.addEventListener("terminal-output:newContent", () => {
      if (this.scrollMode === "auto") {
        this.autoScrollToBottom()
      }
    })

    this.element.addEventListener("scroll", () => {
      this.scrollMode = this.isAtBottom() ? "auto" : "manual"
    })
  }

  isAtBottom() {
    return (this.element.scrollTop + this.element.clientHeight) >= this.element.scrollHeight - AUTOSCROLL_BOTTOM_THRESHOLD
  }

  autoScrollToBottom() {
    this.element.scrollTop = this.element.scrollHeight
  }
}

There are two scroll modes: auto and manual. When new content arrives (signified by the terminal-output:newContent event), the user gets automatically scrolled to the bottom—if and only if the current scroll mode is auto. Otherwise, no action is taken.

When the user manually scrolls, we check to see if they're at the bottom of the screen. If they are, then we set the scroll mode to auto, and if they're not, we set it to manual. "Bottom" is defined generously: if the scroll position is within AUTOSCROLL_BOTTOM_THRESHOLD (which happens to be set to 100) pixels of the bottom of the terminal output, then we consider the user to be scrolled to the bottom.

If my memory is not failing me, I found that, without this buffer, there's a race condition where 1) the user scrolls to the bottom, then 2) new content appears, making it so the user is no longer scrolled quite to the bottom, then 3) the scroll mode is set based on whether the user is scrolled all the way to the bottom, which is always false.

I also remember, quite vaguely, that the scrolling buffer may have been instituted in order to account for imperfect scrolling. I could always verify that assumption by taking away the buffer and seeing if it creates any problems, but the current code literally works perfectly, so I'm willing to accept the fact that it might be ever so slightly more complicated than necessary.

Log content formatting

The biggest part of making the log content presentable was converting ANSI to HTML. And I'll be darned if this wasn't a perfect use case for AI. Here's the initial version of my ANSIToHTMLHelper. Since Claude Code was not generally available until May of 2025, this code, which was apparently committed on January 9th, 2024, surely must have been copy-pasted out of ChatGPT. How quaint that seems today!

module ANSIToHTMLHelper
  RULES = {
    /\e\[0m/ => '</span>', # Reset
    /\e\[1m/ => '<span style="font-weight:bold;">',
    /\e\[32m/ => '<span style="color:green;">',
    /\e\[31m/ => '<span style="color:red;">',
    /\e\[33m/ => '<span style="color:yellow;">',
    /\e\[34m/ => '<span style="color:blue;">',
    /\e\[35m/ => '<span style="color:magenta;">',
    /\e\[36m/ => '<span style="color:cyan;">',
    /\e\[37m/ => '<span style="color:white;">',
    /\e\[\d+A/ => '', # Move cursor up
    /\e\[\d+B/ => '', # Move cursor down
    /\e\[2K/ => '',   # Clear Line
  }

  def ansi_to_html(ansi_string)
    html_string = CGI.escapeHTML(ansi_string)

    RULES.each do |ansi, html|
      html_string.gsub!(ansi, html)
    end

    html_string.html_safe
  end
end

Below is what ANSIToHTMLHelper looks like today. What a pile of garbage! I have no recollection of creating this code, but I know it was AI-generated because of the comments. (I personally only write comments extremely rarely.) The thing is though, it's okay for this code to be a mess, for two reasons. Take a little scroll through this piece of shit and I'll meet you on the other side to tell you what they are.

module ANSIToHTMLHelper
  # Map ANSI code numbers to CSS styles
  COLOR_MAP = {
    "30" => "black",
    "31" => "red",
    "32" => "green",
    "33" => "yellow",
    "34" => "blue",
    "35" => "magenta",
    "36" => "cyan",
    "37" => "white",
    "90" => "gray",
    "91" => "brightred",
    "92" => "brightgreen",
    "93" => "brightyellow",
    "94" => "brightblue",
    "95" => "brightmagenta",
    "96" => "brightcyan",
    "97" => "brightwhite"
  }

  STYLE_MAP = {
    "1" => "font-weight:bold;",
    "4" => "text-decoration:underline;"
  }

  def ansi_to_html(ansi_string)
    return "" if ansi_string.blank?
    
    # First escape HTML to prevent XSS
    html_string = CGI.escapeHTML(ansi_string.to_s)
    
    # Remove cursor movement and clearing codes
    html_string.gsub!(/(\e|\#033)\[\d+[AB]/, "")  # Cursor up/down
    html_string.gsub!(/(\e|\#033)\[2K/, "")       # Clear line
    html_string.gsub!(/\#015/, "")                # Carriage return
    html_string.gsub!(/\x00|\^@/, "")             # Null characters
    
    # Process complex ANSI codes
    open_spans = []
    
    # Process complex formatting like [1;31m or [1;34;4m
    html_string = html_string.gsub(/(\e|\#033)\[([0-9;]+)m/) do |_match|
      codes = $2.split(";")
      
      # If it contains 0, close all open spans and reset
      if codes.include?("0")
        result = open_spans.map { "</span>" }.join
        open_spans = []
        result
      else
        styles = []
        
        # Process color codes
        color_code = codes.find { |c| COLOR_MAP.key?(c) }
        styles << "color:#{COLOR_MAP[color_code]};" if color_code
        
        # Process style codes (bold, underline, etc.)
        style_codes = codes.select { |c| STYLE_MAP.key?(c) }
        styles.concat(style_codes.map { |c| STYLE_MAP[c] })
        
        # Create span with combined styles
        if styles.any?
          span = "<span style=\"#{styles.join(' ')}\">"
          open_spans.push(span)
          span
        else
          ""
        end
      end
    end
    
    # Close any remaining open spans
    html_string += open_spans.map { "</span>" }.join
    
    # Wrap in divs by line
    wrapped_html_string = html_string.split("\n").map do |line|
      "<div>#{line}</div>"
    end.join("")
    
    wrapped_html_string.html_safe
  end
end

The first reason it's fine for this code to be nasty is that it's leaf code: nothing depends on it. Provided I keep the ansi_to_html(ansi_string) method signature the same, I could literally delete every single line inside the method and rewrite it from scratch and no external client would have the slightest inkling that I did so. In fact, I'm sure that at some point I'll do exactly that.

The second reason I don't mind that this code is garbage is because through the lens of Blade Theory this is a "dull blade" (i.e. a piece of code that's hard to understand and work with) but it's a blade which we might not actually use again for a very long time. If I sharpen a blade without having a specific reason to believe that the cost of the sharpening will pay itself back, then I risk making an investment that never pays off. That's why I usually perform my refactorings just before I change a piece of code rather than at any other time. If I try to change a piece of code and find that the change is difficult, then that's concrete evidence that the code is a "dull blade" and that sharpening it will pay off. The ANSIToHTMLHelper blade is clearly dull as hell, but I might not need to cut with it again for months or even years. When the day finally comes that I need to cut with this blade, I'll sharpen it then, but no sooner.

Why did this work?

When I first resolved to climb Mount Log Streaming, I imagined it to be a steep and slippery edifice, but the application of classic tactics like breaking down the problem into sub-problems, seeking a "steel thread" solution, starting with the simplest thing that could possibly work, getting the feedback loop started early and working in small atomic increments of work, I climbed to the summit without ever mentally breaking a sweat. That's what these techniques do: they make hard problems dissolve.

I never brought it up in this post but I was boosted mightily in this project, as I am in every project, by test-driven development (TDD). My thorough test suite ensured that everything I got working stayed working, even when I made radical changes to the implementation. Since TDD always helps a programmer divide the what from the how and address each separately, thus lowering the overall mental burden at any given time, I was able to tackle the trickier parts of this project without ever having to think terribly hard. And I took great delight in the fact that I was able to run the test suite for SaturnCI on SaturnCI itself.

One way I think of AI is that it's like a "mental lubricant", reducing the friction in annoying little tasks that require intellectual gruntwork. I don't remember any single big boost that AI gave me during this project, but I know that it helped me because I've been using AI daily since sometime in 2023. I believe most if not all of this project was complete prior to that great inflection point called Claude Code General Availability, so any help that AI gave me during this project was just copy/pasting from ChatGPT, but that's still a lot more than nothing.

As helpful as AI is, it wasn't the key to success in this project, it was merely an (insanely powerful!) accelerator. What I believe allowed this project to succeed was the set of techniques described above, which have served programmers well for decades and which continue to serve us well today. I've found that when I combine the power of AI with the logic of good programming principles, almost any project is easy, quick, and fun.

Jason Swett is the host of the Code with Jason Podcast, the author of Professional Rails Testing, and the creator of SaturnCI.