This document profiles performance and documents R&D decisions.
This project has three pillars of performance: payload size, responsiveness, and capacity.
At the time of writing, the network payload size is ~3 KB and responsiveness is under 20 ms.
We will see how far we can push the capacity limit. Eventually, in distant edge cases, responsiveness will degrade; we may even see correctness issues.
We want to profile the behavior at that boundary and perhaps study solutions, which inform our design decisions but do not necessarily dictate them. In simpler terms, we may discover that it is possible to handle very large files with some tweaks, but the trade-off for the vast majority of usage may not be worth it; the long tail is not always worth optimizing for.
All tests were done on an 18 GB MacBook Air.
This is a load test with a "naive loader", which reads the whole file into the heap as one string and splits on "\n".
Results per test file (time per operation, with approximate FPS). For the 500k-line test the viewport was jumped to location 250,000 × 20; for the 5M-line test, to 2,999,999 × 20; the 10M-line test also jumped the viewport before measuring. Line counts drift by one or two during a run as operations are applied.

| Operation | 30 lines | 500k lines | 5M lines (100 MB) | 10M lines (280 MB) |
| --- | --- | --- | --- | --- |
| Scroll viewport | 0.20 ms (5000 FPS) | 0.30 ms (3333 FPS) | 0.70 ms (1429 FPS) | 0.70 ms (1429 FPS) |
| Insert | 1.90 ms (526 FPS) | 1.30 ms (769 FPS) | 1.30 ms (769 FPS) | 1.30 ms (769 FPS) |
| Insert new line | 1.60 ms (625 FPS) | 2.30 ms (435 FPS) | 6.40 ms (156 FPS) | 28.40 ms (35 FPS) |
| Delete line | 1.50 ms (667 FPS) | 2.50 ms (400 FPS) | 5.60 ms (179 FPS) | 4.60 ms (217 FPS) |
| Delete character | 1.40 ms (714 FPS) | 1.10 ms (909 FPS) | 1.20 ms (833 FPS) | 1.30 ms (769 FPS) |
It took ~1 second to load the 280 MB, 10-million-line file with the naive loader, the same as VSCode.
We stop here for the naive loader because its limit is somewhere between the 10-million and 20-million-line files. This is likely due to the string length and/or the inability to split such a large string on "\n": V8 caps strings at roughly 2^29 characters (~0.5 GB), and a 20-million-line file at ~29 bytes per line exceeds that.
To avoid hosting excessively large files in Git, the resources directory only contains files of up to 200k lines. The files were generated as follows:
```sh
seq 1 5000000 | awk '{print "This is line number " $1}' > 5_million_lines.txt
```
This is not representative of real files because of the high regularity, but it makes apples-to-apples comparisons convenient. It also makes corruption detectable: since each line states its own line number, corrupted data or wrongly attributed line numbers in the app are immediately visible.
We add an `#appendLines` method to the model and send in 100k lines at a time:
```js
// Split the whole source string into lines, then feed the model in batches.
const fileLines = fileSourceTextString.split('\n');
primary.Model.lines = []; // Reset the model to start fresh
const BATCH_SIZE = 100_000;
for (let i = 0; i < fileLines.length; i += BATCH_SIZE) {
  const batch = fileLines.slice(i, i + BATCH_SIZE);
  primary.Model.appendLines(batch);
}
```
This doesn't work; it still fails on 20-million-line files. The bottleneck seems to be holding the entire source file in memory as a single string.
Suspecting `fileSourceTextString.split('\n')` is the issue, the next attempt scanned through `fileSourceTextString` manually, splitting the string in two (the second half being the unprocessed text to the right of the newline) as we grab each chunk and append it, as sketched below. This still doesn't work.
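A minimal sketch of that attempt (the variable names mirror the snippet above; `CHUNK_LINES` and the exact loop shape are assumptions, not the actual implementation):

```js
// Hypothetical sketch: peel lines off the front of the string and keep
// only the unprocessed remainder, hoping the processed half gets GC'ed.
const CHUNK_LINES = 100_000;
let rest = fileSourceTextString;
while (rest.length > 0) {
  // Find the end of the CHUNK_LINES-th line in the remainder.
  let end = -1;
  for (let n = 0; n < CHUNK_LINES; n++) {
    const next = rest.indexOf('\n', end + 1);
    if (next === -1) { end = rest.length; break; }
    end = next;
  }
  primary.Model.appendLines(rest.slice(0, end).split('\n'));
  rest = rest.slice(end + 1); // keep only the unprocessed right half
}
```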
The next approach avoids converting the file to a String right away. The unit of iteration is bytes, i.e. 1 MB at a time, rather than lines, strings, and delimiters. A sketch follows.
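A sketch of the chunked byte reader, assuming `File.slice()` plus `Blob.text()` and a carry buffer for the partial line at each chunk boundary (the real loader may differ in detail):

```js
// Hypothetical sketch: slice the File into ~1 MB Blobs, decode each slice,
// and carry the trailing partial line into the next chunk.
// Note: slicing mid-codepoint would garble multi-byte UTF-8; the generated
// test files are pure ASCII, so this is safe here.
const CHUNK_BYTES = 1024 * 1024;
async function loadFileChunked(file) {
  let carry = ''; // partial line left over from the previous chunk
  for (let offset = 0; offset < file.size; offset += CHUNK_BYTES) {
    const text = await file.slice(offset, offset + CHUNK_BYTES).text();
    const lines = (carry + text).split('\n');
    carry = lines.pop(); // the last element may be an incomplete line
    primary.Model.appendLines(lines);
  }
  primary.Model.appendLines([carry]); // flush the final line
}
```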
It works!
The browser tab crashes out at 72 million lines on a 100-million-line file.
The naive loader is a little faster between 25k and 5 million lines, where performance starts bottoming out.
```
[Naive] Loaded 10,000,001 lines in 821.70ms // 265.969 MiB (278888897 bytes)
```
The chunked byte-reader implementation isn't much more complex, so it should be preferred for its robustness.
Note: there is a slight off-by-one bug where the chunk loader adds an implied newline at the end of the file, but that's acceptable for now.
It consumed 3660 MB of heap space, near Chrome's ~4 GB per-tab limit. Launching Chrome from the command line with `--js-flags="--max-old-space-size=8192"` does not seem to raise the tab's heap limit; otherwise we might have reached ~140 million LOC with this approach alone.
Chunk sizes tested: 256 KB, 512 KB, 1 MB, and 2 MB work; 3 MB sometimes works. With 3 MB chunks at 1,000,000 LOC:

```
warrenbuf.js:441 Uncaught (in promise) RangeError: Maximum call stack size exceeded
    at Object.appendLines (warrenbuf.js:441:18)
    at HTMLButtonElement.<anonymous>
```

At 4 MB, the same maximum-call-stack error occurs when calling `appendLines`.
The takeaway is that we don't want the chunk size to be so large that the number of lines in a single chunk makes the `appendLines` call fail; at the same time, larger chunk sizes are more performant.
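The RangeError suggests the batch is being spread into a variadic call somewhere inside `appendLines`, since every spread element becomes a function argument on the call stack. That is an assumption about the implementation; if it holds, bounding the spread size is a common fix, sketched here:

```js
// Hypothetical: this overflows the call stack for very large batches,
// because each element of `batch` becomes one function argument:
//   this.lines.push(...batch);
//
// Pushing in bounded slices keeps the argument count safe at any batch size.
function pushAll(target, batch) {
  const MAX_ARGS = 32_768; // comfortably below typical argument limits
  for (let i = 0; i < batch.length; i += MAX_ARGS) {
    target.push(...batch.slice(i, i + MAX_ARGS));
  }
}
```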
There's a slight variation on this approach: instead of `blob.text()`, we get a `FileReader` on the blob. It's maybe a few percent faster but more complex because of the callback-based async code.
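For reference, a minimal promise wrapper for the FileReader variant (the surrounding chunking loop is assumed to be the same as above):

```js
// Hypothetical: read one Blob slice with FileReader instead of blob.text().
function readSlice(blob) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result); // decoded text
    reader.onerror = () => reject(reader.error);
    reader.readAsText(blob);
  });
}
```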
The Streams API is the most modern approach. It is faster past ~1 million lines, then slows down later in the load. It crashed on the 100-million-line file, but at first we had no visibility into where between 70M and 100M it failed: the initial Streams API loader blocks rendering until the entire file is loaded. After adding logic to yield to the renderer during the load, we know the Streams API crashes at around 75 million lines. That's an improvement.
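A sketch of a Streams API loader with the render yield added (the yield mechanism here, `requestAnimationFrame`, is an assumption; any scheduling primitive that lets the frame paint would do):

```js
// Hypothetical sketch: read the File as a byte stream, decode incrementally,
// and yield to the event loop so progress can actually paint.
async function loadFileStreamed(file) {
  const reader = file.stream().getReader();
  const decoder = new TextDecoder();
  let carry = ''; // partial line spanning a chunk boundary
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const lines = (carry + decoder.decode(value, { stream: true })).split('\n');
    carry = lines.pop();
    primary.Model.appendLines(lines);
    await new Promise(requestAnimationFrame); // let the renderer paint progress
  }
  primary.Model.appendLines([carry + decoder.decode()]); // flush decoder + last line
}
```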
After inspecting the heap, I saw many sliced strings. It seemed peculiar that the sliced strings accounted for ~100 MB of heap and regular strings for another ~108 MB; considering the file itself was ~100 MB, there seemed to be redundancy and strings that were not being GC'ed. (In V8, a sliced string is a view that keeps a reference to its parent string, so a short line produced by `split` can pin an entire decoded chunk in memory.)
The following change squeezed another 5-10% of capacity out of the stream loader. In other words, peak heap usage with materialized strings is ~10% lower, albeit at a higher computational cost.
Before:

```js
primary.Model.appendLines(slicedLines, true);
```

Heap usage: 3650 MB.

After:

```js
// Force materialization using Array.from to break sliced string references
const materializedLines = slicedLines.map(line => Array.from(line).join(''));
primary.Model.appendLines(materializedLines, true);
```

Peak heap usage: 3300 MB. Heap usage after load: 3150 MB.
Note this is not deterministic; other runs peak at 3450 MB and then settle around 3300 MB. That is still a 5-10% improvement.
Heap usage after load being lower than peak usage in the materialized-line approach shows that the peak is partly collectible garbage; if collection happened sooner, we could defer hitting peak usage.
The following hack nudges the GC to kick in earlier:

```js
// Throwaway allocation; the extra allocation pressure encourages the GC
// to run sooner than it otherwise would.
const _ = new Array(100000);
```
Of course, none of this is deterministic; it depends on the GC. It merely gives a non-zero chance of reaching max capacity.
This solution was unstable past 70 million LOC, though it would often hit 88 million LOC.
This mode uses compression techniques. There's a secondary benefit: the compressed bytes live in native (C++) memory rather than the JS heap.
Consequently, the editor breezed past the previous 88-million-line cap and reached 1+ billion lines while consuming ~2.5 GB of RAM. This is contingent on the specific file and its compression profile.
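As an illustration of the idea only, a chunk of lines could be compressed and decompressed with the browser's native `CompressionStream` API; the actual scheme used here is specific to this editor, and these helpers are assumptions:

```js
// Hypothetical sketch: gzip a chunk of lines. The resulting Uint8Array's
// backing store lives outside the JS string heap, and regular text like
// the generated test files compresses extremely well.
async function compressChunk(lines) {
  const bytes = new TextEncoder().encode(lines.join('\n'));
  const stream = new Blob([bytes]).stream().pipeThrough(new CompressionStream('gzip'));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

// Decompress a chunk back into lines on demand (e.g., when it scrolls into view).
async function decompressChunk(compressed) {
  const stream = new Blob([compressed]).stream().pipeThrough(new DecompressionStream('gzip'));
  return (await new Response(stream).text()).split('\n');
}
```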
IndexedDB