Journal: Learning PyTorch

Elliot Zhang (8lliotZ)

A log, on the web. A weblog. A blog, if you will.

2025 July 30

1000 epochs and a day later:

Loss has definitely improved! It plateaued somewhere around 0.5 to 0.6, and since I want to keep the model relatively compact, that's probably the best I can hope for. Up next is packaging the weights, making any final changes, and then wrapping everything in a nice GUI. Then I can call the whole thing a completed project, and my efforts won't have been wasted!
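By "packaging the weights" I basically mean torch.save on a checkpoint dict holding the state dict plus the character mapping, so the GUI can rebuild the model later. A rough sketch of the idea (CharLSTM, char_to_idx, and the sizes are placeholders, not my exact code):

import torch

# Save: the weights plus whatever is needed to rebuild the model for inference.
checkpoint = {
    "model_state": model.state_dict(),
    "char_to_idx": char_to_idx,   # character-to-index mapping used in training
    "hidden_size": 512,           # placeholder hyperparameters
    "num_layers": 2,
}
torch.save(checkpoint, "war_and_peace_lstm.pt")

# Load (e.g. inside the future GUI app):
checkpoint = torch.load("war_and_peace_lstm.pt", map_location="cpu")
model = CharLSTM(vocab_size=len(checkpoint["char_to_idx"]),
                 hidden_size=checkpoint["hidden_size"],
                 num_layers=checkpoint["num_layers"])
model.load_state_dict(checkpoint["model_state"])
model.eval()  # inference mode for generation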

some samples of text

“What is it you wish to deprive him of the pleasure of telling me?
Why did they arrange that?” said Pierre without awkward, and addressed
Pierre.
---
He had never seen her in Moscow in Prince Andrew and Princess Mary
trying to keep from under such a quiet little head that moved marrying
the icons had a drawing room.
---
On our Order the Horse Guards’ helpers and explanation of what they
had entered Vorónezh and had brought them together into the vestibule.

“The prince says that the books have retreated not to leave the enemy even
there,” remarked Borís, with a smile.

Though I would be misleading you if I didn't also include:

some more, less optimal, samples of text

Theor AlNá, TENp the
marryish! I saw in his son’s room he comfort more to Bald Hills.
---
Theardiapantmenn imarses
to move two Frenchmane of a cause.

One other problem I've noticed is that the LSTM really likes to extend "The" (the starting string I'm using) into a longer word... even if the result is a bit nonsensical, e.g. Theardifacies, Theardiaman, Theorientary, Theardiacove...

Theardiapan, forgive
me... my God!
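(For reference, the "starting string" is just the seed I feed the model before sampling: the generation loop encodes "The", then repeatedly samples the next character and feeds it back in. A simplified sketch of how that kind of loop looks, with illustrative names rather than my exact code:)

import torch

def generate(model, seed, char_to_idx, idx_to_char, length=200):
    # Assumes the model returns (logits, hidden) like a typical char-level LSTM.
    model.eval()
    indices = torch.tensor([[char_to_idx[c] for c in seed]])  # encode the seed, e.g. "The"
    out_chars = list(seed)
    with torch.no_grad():
        logits, hidden = model(indices, None)          # warm up the hidden state on the seed
        for _ in range(length):
            probs = torch.softmax(logits[:, -1, :], dim=-1)
            next_idx = torch.multinomial(probs, num_samples=1)
            out_chars.append(idx_to_char[next_idx.item()])
            logits, hidden = model(next_idx, hidden)   # feed the sampled char back in
    return "".join(out_chars)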

2025 July 29

it's been another while :(

It's okay though! I've just been busy with some math studying and essay-writing. I did figure out how to use a learning rate scheduler that cuts the learning rate whenever loss stops improving. It's running now, and we should get results tomorrow!
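For my own reference, the kind of scheduler I mean is ReduceLROnPlateau, which drops the learning rate when the monitored loss stops improving. Roughly how it slots into a training loop (a simplified sketch, not my exact code; model and train_one_epoch are placeholders):

import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer,
    mode="min",    # we want the monitored value (loss) to go down
    factor=0.5,    # halve the learning rate...
    patience=5,    # ...after 5 epochs with no improvement
)

for epoch in range(1, 1001):
    avg_loss = train_one_epoch(model, optimizer)  # placeholder for my training loop
    scheduler.step(avg_loss)                      # pass the loss, not the epoch number
    print(f"Epoch {epoch} Complete | Avg Loss: {avg_loss:.4f}")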

I'm also thinking about deploying it as a downloadable app. It could be really fun to type some text into a GUI window and have my LSTM predict the next few characters.
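If I go that route, the window itself could be as simple as a little tkinter app; here's a rough sketch of the idea, where generate_text is a stand-in for whatever my actual sampling function ends up being.

import tkinter as tk

def generate_text(prompt, length=100):
    # Placeholder: this would load the trained LSTM and sample `length` more characters.
    return prompt + "..."

def on_generate():
    prompt = entry.get()
    output_box.delete("1.0", tk.END)
    output_box.insert(tk.END, generate_text(prompt))

root = tk.Tk()
root.title("War and Peace LSTM")
entry = tk.Entry(root, width=60)                   # box for the user's starting text
entry.pack()
tk.Button(root, text="Generate", command=on_generate).pack()
output_box = tk.Text(root, height=10, width=60)    # where the continuation appears
output_box.pack()
root.mainloop()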

2025 July 22

Tweaking things: the results!!

Good news: the coherent text coming out is no longer completely overfitted!! Sadly, loss still hovers around 0.84 after a bunch of epochs, but the text output looks surprisingly solid.

The output is not necessarily coherent, but the form is a lot better. Compare the following human-generated imitations for the general idea:

Old output (human-generated imitation):

eerilu reprt Pierrere

New output (human-generated imitation):

my horse ate Natasha’s left flank at Austerlitz

Of course this new result isn't perfect, but my goal isn't perfection (especially since I'm only training on one text of ~3.4M chars); I just want output that seems reasonably close. While expanding the dataset to the whole text probably did improve model performance, it also slowed down training quite a bit. After 1000 epochs, loss kinda refused to go below 0.99. Hopefully I'll break through this bottleneck. In the meantime, have some samples!

When Kutúzov was delighted at the few minutes later the regimental men was
sitting by the room.
His bony night,
Denísov stayed in the crowd. And he again showed way on the next tobach
of the field.
From behind Dólokhov and Dron, bent his head in avail from the hill,
his head and hurriedly following the ground began to move on along the
hall.

One benefit is that I don't really have to worry about overfitting: I can just glance at the output and it's pretty clear that Tolstoy (or his translator) didn't write that. Proof of this is left as an exercise to the reader.
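(Side note for future me on why the full text slows training down: it's just volume. The ~3.4M characters get encoded as integers and sliced into fixed-length chunks, and every epoch has to walk through all of them. Roughly like this, though my actual data code differs, and the filename and chunk length here are made up:)

import torch

seq_len = 100  # characters per training example (made-up value)

text = open("war_and_peace.txt", encoding="utf-8").read()
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}
encoded = torch.tensor([char_to_idx[c] for c in text])

# Each example: seq_len input chars, plus the same window shifted by one as the target.
inputs, targets = [], []
for i in range(0, len(encoded) - seq_len - 1, seq_len):
    inputs.append(encoded[i : i + seq_len])
    targets.append(encoded[i + 1 : i + seq_len + 1])

# ~3.4M chars / 100 chars per chunk is ~34,000 examples per epoch, hence the slowdown.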

2025 July 20

Oh no it’s been 10 days already

Time flies, even when you are struggling with writing an EE, I suppose.

I’m currently getting a better conceptual understanding of each of the elements that make up the working text-generation function I have. Unfortunately I don’t have many cool things to show, because it’s mostly figuring out what each function does and noting it down in the comments.

This should be done soon; I hope to begin optimizing the parameters or structure on Monday. My goal is to get continuous, relatively coherent output of a few paragraphs while minimizing overfitting.

- See you all later... but hopefully sooner.

2025 July 10

Background

When I was a small(er) boy I would watch this guy named carykh on my iPad Air 2. He made cool projects, like AI-generated rap, AI-generated baroque music, AI-generated jazz, AI lip syncing, AI celebrity faces... All in the pre-pandemic times. Now in the post-pandemic times I remembered that he used something called "Andrej Karpathy's LSTM" to make his jazz, so I tried finding out how to use it. Turns out Torch and now PyTorch are the new cool versions of that so I'll be learning those instead!

Today

I tried reading through some tutorials and the documentation, but I got impulsive and didn't want to learn matrix multiplication. Setup honestly isn't too bad, but I'll definitely go back and actually learn what I'm typing in. I don't want to just know the correct incantations.

Hi. I don't actually know how any of this works right now, but that's okay because I'll learn by doing it. I DeepSeek'd an LLM and then fed it Leo Tolstoy's War and Peace (thanks, Gutenberg), because I thought actually doing it would help me learn how it works. I did figure out how to get text output, but it mostly didn't help me understand, and I'm still confused about how most things work. Will try to actually learn about tensors, matrices, etc. soon (hopefully tomorrow). Here are some output samples from the first 350 epochs and my notes on them:

Epoch 40: First namedrop!

The disportess. First Anna Pávlovna menarch, did not

Even this early on, you can see punctuation, basic words, the name Anna Pávlovna (133 occurrences in War and Peace), and other strings that resemble words pretty closely.

Epoch 60: Overfitting?

The elderly lady who
had been sitting with the chief

This example comes almost entirely from the Gutenberg text, lines 1460-1461; in the original, though, she is sitting with the "old aunt" instead.

Epoch 160: French appears! (it's overfit.)

The face with a
sarcastic smile.


“‘Dieu me la donne

See Gutenberg, lines 1574-7.

Epoch 210: LSTM roasts my shoulders

The fact shoulders, had is resemble of a poor invalid

I just thought that it was funny that the model is mocking my shoulders, with poor but understandable grammar.

You will notice that all the output starts with "the". This is because I set "the" as the prompt that the LSTM continues from. Trends such as "the fact" (132 instances in the training text) and "the smile" (13) were observed.

Epoch 260: A conversation

The smile of a conversation—“I often think how unfair

This output sample incorporates many idioms seen in the file. For example, the phrase "I often think" immediately follows a quotation mark in 3 of the 4 times it appears; "smile of a" and "a conversation" appear 3 and 9 times respectively, though they never join into "smile of a conversation". Em dashes appear often—though never immediately before a quotation. Overall, an example of unique and pretty solid output.

Epoch 290: 100% overfitting

the smile of a coquettish girl, which at one time pro

This output sample is of note as it is a direct quotation from lines 1551-2 of the Gutenberg text file. An unfortunate example of overfitting.

Epoch 300-310: Loss spike?

Epoch 303 Complete | Avg Loss: 0.0014
Epoch 304 Complete | Avg Loss: 0.0199
Epoch 305 Complete | Avg Loss: 0.3147
Epoch 306 Complete | Avg Loss: 0.8430
Epoch 307 Complete | Avg Loss: 0.5752
Epoch 308 Complete | Avg Loss: 0.3427
Epoch 309 Complete | Avg Loss: 0.1958
Epoch 310 Complete | Avg Loss: 0.1039

Something interesting happened here. Loss had previously bottomed out at around 0.0011 and stayed pretty consistently at that level, but then it suddenly jumped by nearly three orders of magnitude (to about 0.84) over fewer than 10 epochs. Soon afterwards it recovered and began dropping once more, though the follow-on results were noticeably less coherent.

Results and next steps

Things definitely happened today, no one can deny that. I made this blog site, wrote this, biked to an island, and got bitten by mosquitoes. Next steps are figuring out what each line of the code DeepSeek wrote means, figuring out matrices, reducing overfitting, and maybe trying different prompts.
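So I have something to compare against once I actually understand it, here's my guess at what the model in DeepSeek's script boils down to: a character-level LSTM with an embedding in front and a linear layer on top that scores the next character. This is a simplified sketch from skimming, not the actual generated code, and the layer sizes are made up.

import torch.nn as nn

class CharLSTM(nn.Module):
    def __init__(self, vocab_size, embed_size=64, hidden_size=512, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)   # char index -> vector
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers,
                            batch_first=True)                # runs over the char sequence
        self.fc = nn.Linear(hidden_size, vocab_size)         # scores for the next char

    def forward(self, x, hidden=None):
        out, hidden = self.lstm(self.embed(x), hidden)
        return self.fc(out), hidden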