Loss has definitely improved! It plateaued around 0.5 or 0.6ish. Since I want to keep the model relatively compact, that's probably the best I can hope for. Up next is packaging the weights, making any last, final changes, and putting it behind a nice GUI. Then I can call the whole thing a completed project, and my efforts won't have been wasted!
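By "packaging the weights" I basically mean something like the sketch below: save the trained parameters to a file, then rebuild the same architecture in the app and load them back in. The model and filename here are placeholders, not my actual setup.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the trained LSTM.
model = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)

# Save just the weights (the state dict), not the whole Python object.
torch.save(model.state_dict(), "war_and_peace_lstm.pt")  # placeholder filename

# Later, in the app: rebuild the same architecture and load the weights back.
model.load_state_dict(torch.load("war_and_peace_lstm.pt"))
model.eval()
```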
Though, I would be misleading you if I didn't include:
One other problem I've noticed is that the LSTM really likes to complete a word starting with "The" (the starting string I'm using)... even if the result is a bit nonsensical, e.g. Theardifacies, Theardiaman, Theorientary, Theardiacove...
It's okay though! I've just been busy with some math studying and essay-writing. I also figured out how to use a learning rate scheduler, which cuts the learning rate when loss stops improving. It's running now, and we should get results tomorrow!
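For reference, the wiring looks roughly like this minimal sketch using PyTorch's `ReduceLROnPlateau`. The model, data, and training loop here are dummies just to show where the scheduler plugs in, not my real code.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Placeholder model and data, only here to show the scheduler wiring.
model = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)
head = nn.Linear(256, 128)
criterion = nn.MSELoss()
optimizer = optim.Adam(list(model.parameters()) + list(head.parameters()), lr=1e-3)

# Halve the learning rate if the loss hasn't improved for 10 epochs in a row.
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min",
                                                 factor=0.5, patience=10)

x = torch.randn(8, 50, 128)  # dummy batch: 8 sequences of 50 steps
for epoch in range(100):
    optimizer.zero_grad()
    out, _ = model(x)
    loss = criterion(head(out), x)  # dummy objective, not the real training loss
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())     # the scheduler watches this number for plateaus
```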
I'm also thinking about deploying it as a downloadable app. It could be really fun to type some text into a GUI window and have my LSTM predict the next few chars.
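Nothing is built yet, but a rough tkinter sketch of the idea could look something like this. The `generate` function here is a hypothetical stand-in for the trained model's sampling code.

```python
import tkinter as tk

def generate(prompt, n_chars=50):
    # Hypothetical stand-in for the trained LSTM's sampling function.
    return prompt + "..."

def on_predict():
    # Read the prompt from the text box and show the model's continuation.
    output_var.set(generate(entry.get()))

root = tk.Tk()
root.title("LSTM next-char predictor")

entry = tk.Entry(root, width=60)
entry.pack(padx=10, pady=5)

tk.Button(root, text="Predict", command=on_predict).pack(pady=5)

output_var = tk.StringVar()
tk.Label(root, textvariable=output_var, wraplength=400, justify="left").pack(padx=10, pady=10)

root.mainloop()
```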
Good news: the coherent text coming out is no longer just the result of overfitting!! Sadly, loss still hovers around 0.84 after a bunch of epochs, but the text output looks surprisingly solid.
The output is not necessarily coherent, but the form is a lot better. Compare the following human-generated samples for the general idea:
Old output (human-generated imitation):
New output (human-generated imitation):
Of course this new result isn't perfect, but my goal isn't perfection (especially since I'm only training on a single text of about 3.4M chars), just output that seems reasonably close. While expanding the dataset to the whole book probably did improve model performance, it also slowed down training quite a bit. After 1000 epochs, loss kinda refused to go below 0.99. Hopefully I'll break through this bottleneck. In the meantime, have some samples!
One benefit is that I don't really have to worry about overfitting - I can just glance at the output and it's pretty clear that Tolstoy (or his translator) didn't write that. Proof of this is left as an exercise to the reader.
Time flies, even when you are struggling with writing an EE, I suppose.
I'm currently getting a better conceptual understanding of each of the elements that make up the working text generation function I have. Unfortunately, I don't have many cool things to show, because it's mostly finding out what a function does and noting it down in the comments.
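For context, a char-level generation function usually has roughly the shape sketched below. This is the general pattern rather than my actual code, and it assumes the model takes character indices (so it has its own embedding layer) and returns a (logits, hidden) pair.

```python
import torch

def generate(model, start_text, char_to_idx, idx_to_char, length=500, temperature=0.8):
    """Typical shape of a char-level LSTM sampling loop (a sketch, not my exact code)."""
    model.eval()
    hidden = None
    # Warm up the hidden state on the prompt characters.
    input_idx = torch.tensor([[char_to_idx[c] for c in start_text]])
    output_text = start_text
    with torch.no_grad():
        for _ in range(length):
            logits, hidden = model(input_idx, hidden)  # forward pass, carry hidden state
            logits = logits[:, -1, :] / temperature    # keep the last step, scale by temperature
            probs = torch.softmax(logits, dim=-1)      # turn logits into char probabilities
            next_idx = torch.multinomial(probs, 1)     # sample one character index
            output_text += idx_to_char[next_idx.item()]
            input_idx = next_idx                       # feed the sampled char back in
    return output_text
```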
The commenting pass should be done soon; I hope to start optimizing parameters or the model structure on Monday. My goal is to get continuous, relatively coherent output of a few paragraphs, and to minimize overfitting.
- See you all later... but hopefully sooner.
When I was a small(er) boy I would watch this guy named carykh on my iPad Air 2. He made cool projects, like AI-generated rap, AI-generated baroque music, AI-generated jazz, AI lip syncing, AI celebrity faces... All in the pre-pandemic times. Now in the post-pandemic times I remembered that he used something called "Andrej Karpathy's LSTM" to make his jazz, so I tried finding out how to use it. Turns out Torch and now PyTorch are the new cool versions of that, so I'll be learning those instead!
I tried reading through some tutorials and the documentation, but I got impulsive and didn't want to learn matrix multiplication. Setup honestly isn't too bad, but I'll definitely go back later to actually understand what I'm typing in. I don't want to just know the correct incantations.
Hi. I don't actually know how any of this works right now, but that's okay because I'll learn by doing it. I DeepSeek'd an LLM and then fed it Leo Tolstoy's War and Peace (thanks, Gutenberg) because I thought actually doing it would help me learn how it works. Despite figuring out how to get text output, it mostly did not, and I'm still confused about how most things work. I'll try to actually learn about tensors, matrices, etc. soon (hopefully tomorrow). Here are some output samples from the first 350 epochs and my notes on them:
Even this early on, punctuation, basic words, the name Anna Pávlovna (133 occurrences in War and Peace), and some other things that resemble words pretty closely can be seen.
This example comes almost entirely from Gutenberg, lines 1460-1. However, she is sitting with the "old aunt" instead.
See Gutenberg, lines 1574-7.
I just thought that it was funny that the model is mocking my shoulders, with poor but understandable grammar.
You will notice that all of the samples start with "the". This is because I set "the" as the prompt for the LSTM to predict the next words from. Trends such as "the fact" (132 instances in the training text) and "the smile" (13) were observed.
This output sample incorporates many idioms seen in the file. For example, the phrase "I often think" occurs immediately after a quotation mark in 3 of the 4 times the phrase appears. "smile of a" and "a conversation" appear 3 and 9 times respectively, though they never join into "smile of a conversation". Em dashes appear often in the text, though never immediately before a quotation. An example of unique and pretty solid output.
This output sample is of note as it is a direct quotation from lines 1551-2 of the Gutenberg text file. An unfortunate example of overfitting.
Something interesting happened here. While loss had previously bottomed out at around 0.0011 and stayed pretty consistently at that level, it suddenly increased significantly over the course of fewer than 10 epochs. Soon after, recovery began and loss started dropping once more, though the follow-on results were noticeably less coherent.
Things definitely happened today, no one can deny that. I made this blog site, wrote this, biked to an island, and got bitten by mosquitoes. Next steps are figuring out what each line of the code DeepSeek wrote means, figuring out matrices, reducing overfitting, and maybe trying different prompts.