Looking Back at 2025
What I want to talk about
- What I learned in 2025
  - Learning Go with the Mia's Trips project
  - Building on web skills with the Mia's Trips project (HTML, htmx, CSS, JS, Tailwind)
  - Building the database using SQLite (my first time using this database; I found it quite useful for these kinds of small projects, as it's light and easy to use, tear down, and edit)
  - Setting up an actual production server using Docker Compose and renting a server
  - Ratatui, Rust tooling
  - Working with LLMs
    - Using LangChain
    - Using agents
    - Google ADK, LangChain, OpenAI, Anthropic Claude Code
  - Codecrafters Rust tutorial for building my own grep
  - Work-related skills -> transition into MLOps and HPC servers
    - HPC (SLURM)
      - Understanding more about how Linux works (users, groups, networks, directories such as /tmp, /etc, /var), systemd, file mounts, httpd
      - Understanding how nodes can communicate with each other (/etc/hosts)
      - What an HPC is, installing the SLURM ecosystem of tools
      - HPC-related tooling -> slurmrestd, slurmdbd
      - Third-party tooling -> SLURM filesystem HTTP server using Python http, Rust-based Python CLI and SDK tooling for interacting with the SLURM API
    - MLOps
      - Understanding how a machine learning model goes from trained to deployed
      - Understanding Azure, AzureML, Azure Arc Kubernetes, Azure DevOps
      - Understanding private networks
      - Understanding Azure best practices for creating a training/deployment pipeline
      - Understanding GitHub Actions/workflows
      - Understanding Terraform (Infrastructure as Code)
    - Python, Rust tooling
      - Python scaffolding tool
    - Project B
      - Created my own agents
      - Created MCP servers for the agents to work with
      - Understood the difficulties in getting a multi-agent service going
    - Data Engineering
      - Heimdall: Rust sliding-window pipeline that pipes real-time data over to models for real-time inference
        - Listens on Pulsar workers (threading)
        - Puts that data in a sliding window (the size is set in the configuration at startup)
        - That sliding window can be accessed using references
        - Used a gRPC client/server for interactions
        - What we learned -> for pure in-memory access during real-time inference, it was best to use this as a Python SDK
        - So we refactored it into a Python SDK that keeps the data in memory for the model to use at inference
        - We did not have a use case for this yet, however, as most ML models did just fine with batch data.
  - I also did personal learning in early 2025 (interview prep mostly)
    - Algorithms course (Frontend Masters -> The Last Algorithms Course You'll Need)
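To make the sliding window idea from the Heimdall notes above concrete, here is a minimal Python sketch. It is not the actual Heimdall code (that was Rust with Pulsar and gRPC); the class and method names are made up for illustration, and it assumes a window that simply keeps the most recent N items, with N chosen at startup.

```python
from collections import deque
from threading import Lock


class SlidingWindow:
    """Fixed-size window that keeps only the most recent `size` items."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        # A deque with maxlen drops the oldest item automatically on append.
        self._items = deque(maxlen=size)
        # The lock lets a listener thread write while another thread reads.
        self._lock = Lock()

    def push(self, item) -> None:
        """Add the newest item, evicting the oldest one if the window is full."""
        with self._lock:
            self._items.append(item)

    def snapshot(self) -> list:
        """Return a copy of the current window, oldest item first."""
        with self._lock:
            return list(self._items)


# Window size would come from configuration at startup; 3 is arbitrary here.
window = SlidingWindow(size=3)
for reading in [1.0, 2.0, 3.0, 4.0, 5.0]:
    window.push(reading)

latest = window.snapshot()
print(latest)
```

Returning a copy from `snapshot` keeps the reader from seeing the window change underneath it, at the cost of one small allocation per read.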
Looking Back at 2025: A Year of Building, Systems, and Scale
Why 2025 Mattered
The start of 2025 marked the beginning of a big change in my career. After spending around three years on one project, I told my boss that I wanted to try something completely different from what I was working on. To be perfectly honest, I was getting burnt out. There was always some fire to put out, and either our team didn't have the support it needed to handle the problem or the answer became "just fix it now and we can figure out the long-term solution later". This wasn't the fault of management or bad planning. In my opinion, we were a fast-moving company adapting to a rapidly growing userbase, and we simply weighed the business benefits over the growing technical debt. In any case, I was at a point where I wanted to tackle new problems and learn new skills, and thankfully my manager obliged.
So 2025 was going to be the year in which I would really start learning and get uncomfortable. At least, that's what I told myself. I would spend a lot of personal time on fully fledged side projects while also diving into a new field at work. For all my enthusiasm, I do feel that 2025 left a lot to be desired, but overall I'm happy with the work I've done. This blog post is therefore just a review, and hopefully one that will help not just me but maybe others learn from my mistakes (or successes!).
Starting Fresh
I was really happy to get the opportunity to start working with data science and machine learning, even if it came about mostly because of work. The ask was simple enough: the team was growing quickly, and they needed a platform for training machine learning models rapidly and easily from their own development environment. We started looking into on-premise high-performance computing (HPC) systems, and one that seemed to be used frequently in both academia and industry was SLURM.
I knew nothing about how an HPC should be built, or even the details of how to set up Linux boxes to communicate with each other. Eventually, though, I started to learn through trial and error. The SLURM documentation, to be honest, isn't that intuitive (SLURM DOCS LINK HERE), and at first you don't even know what you want. Do you need just a bare-bones controller and compute node structure? Do you need the REST API? Well, then you need the accounting daemon (slurmdbd) and a MariaDB database to hold user and account information. Where are your job logs going to go? How will you export them back to the user? These were all questions that needed to be answered and implemented, so I went through many teardowns and setups. So many, in fact, that I wrote bash scripts to bring nodes up and get them ready with all the extensions.
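For the node-to-node communication piece, one small thing a setup script can automate is rendering the /etc/hosts entries for the cluster. Here is a minimal sketch in Python. The hostnames and addresses are made up, the function name is hypothetical, and it only builds the text rather than writing to /etc/hosts.

```python
def render_hosts_entries(nodes: dict[str, str]) -> str:
    """Return /etc/hosts-style lines ("IP<TAB>hostname"), sorted by hostname."""
    lines = [f"{ip}\t{name}" for name, ip in sorted(nodes.items())]
    return "\n".join(lines) + "\n"


# Hypothetical cluster: one controller and two compute nodes.
cluster = {
    "slurm-ctl": "10.0.0.10",
    "compute-01": "10.0.0.11",
    "compute-02": "10.0.0.12",
}

print(render_hosts_entries(cluster), end="")
```

Generating the same block for every node from one inventory means a rebuilt node comes back with the same name resolution as the rest of the cluster.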
I also learned much more about Linux systems: the importance of certain directories (/etc, /var, /dev), networking, and especially mounting filesystems. Systemd started to become my friend. When I needed a simple web server to host the job log files, systemd was there to start that server and bring it back up on boot. Bash scripting became easier to understand, and I started to enjoy writing that one script which would save me hours later on, from setting up a node to onboarding a new user onto SLURM.
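A simple log-hosting web server like the one mentioned above can be sketched with nothing but the Python standard library. This is an illustration under assumptions, not the server I actually ran: the function names, the port, and the log directory in the comment are all hypothetical, and it assumes the logs only need to be readable on a trusted private network.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_log_server(log_dir: str, host: str = "0.0.0.0", port: int = 8080) -> ThreadingHTTPServer:
    """Build an HTTP server that serves the files under log_dir."""
    # The directory argument roots the handler at log_dir instead of the cwd.
    handler = partial(SimpleHTTPRequestHandler, directory=log_dir)
    return ThreadingHTTPServer((host, port), handler)


def serve_logs(log_dir: str, host: str = "0.0.0.0", port: int = 8080) -> None:
    """Serve log_dir until interrupted (blocks the calling thread)."""
    server = make_log_server(log_dir, host, port)
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()


# Example entry point (hypothetical path; use wherever your job logs land):
# serve_logs("/var/log/slurm/jobs")
```

A systemd service unit whose ExecStart runs a script like this, enabled with systemctl enable, is one way to get the start-on-boot behaviour described above.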
I was very fortunate to have supportive upper management, but I believe none of this would have been possible if I had not taken that leap. Once you realize you're getting comfortable, that should be the signal that maybe it's time to try something new. It doesn't have to be as drastic as what I did; it could be as simple as talking with others, getting to know their projects, and seeing if any of them interest you. Finally, don't be afraid to fail. I know that's a cliche take, but I would never have understood SLURM as well as I do now without the numerous restarts I had to do. Sometimes there'd be a critical step I had missed and I would have to start from scratch again: uninstalling the binaries, running apt remove [some-package] for the 20th time, and installing again, over and over. Each time was a new lesson learned. I still make mistakes for sure, but now they aren't as catastrophic and, most importantly, I know where to look to figure out the issues. That, I believe, is the stage one should reach when trying to understand a system.