Projects

🐢 mini-LLM (github): A 19M-parameter bilingual LLM built and trained with PyTorch. It is trained on a small bilingual Hindi and English dataset, tokenized with a custom tokenizer, and follows the GPT decoder architecture. It can generate stories in Hindi and English.

🐢 Kodo-cli (github): An advanced AI-powered coding assistant that understands your entire codebase through intelligent context management and AST analysis, with multi-model support.

🐢 Varnika (github): A simple Hindi tokenizer built using the Byte Pair Encoding (BPE) algorithm. It tokenizes Hindi text into subwords and decodes tokenized output back into text. Vocab size: 121K.

🐢 paper-implemented (github): A collection of papers I have implemented. So far the list is small (only 7 papers), but more are coming soon!! :)

🐢 100M_LLM (huggingface): A 100M-parameter mini GPT model trained on a small subset of the Wikipedia dataset, then instruction fine-tuned for conversations. The project was a failure, but I learned a lot from it.
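
For readers unfamiliar with the BPE idea behind the Varnika entry above, here is a minimal sketch of how merge rules can be learned and applied. It is not Varnika's actual code: the function names (`train_bpe`, `merge_pair`, `encode`, `decode`) are hypothetical, and it assumes whitespace-split words and Unicode code points as the starting symbols, whereas a real tokenizer may work on bytes and handle special tokens.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules (illustrative helper, not Varnika's API)."""
    # Assumption: each word starts as a tuple of single Unicode code points.
    words = Counter(tuple(w) for w in text.split())
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the most frequent pair merged.
        merged_words = Counter()
        for symbols, freq in words.items():
            merged_words[merge_pair(symbols, best)] += freq
        words = merged_words
    return merges


def encode(word, merges):
    """Split a word into subword tokens by replaying the merges in learned order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


def decode(tokens):
    """Join subword tokens back into the original text."""
    return "".join(tokens)


corpus = "नमस्ते नमस्ते नमन नमक"
merges = train_bpe(corpus, 5)
tokens = encode("नमस्ते", merges)
print(tokens, decode(tokens))
```

Because every merge only concatenates adjacent symbols, decoding is a plain join and always reproduces the input word.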