New top story on Hacker News: Launch HN: RunRL (YC X25) – Reinforcement learning as a service

8 points by ag8 | 0 comments on Hacker News.
Hey HN, we’re Andrew and Derik at RunRL ( https://runrl.com/ ). We've built a platform to improve models and agents with reinforcement learning. If you can define a metric, we'll make your model or agent better, without you having to think about managing GPU clusters. Here's a demo video: https://youtu.be/EtiBjs4jfCg

I (Andrew) was doing a PhD in reinforcement learning on language models, and everyone kept... not using RL, because it was too hard to get running. At some point I realized that someone's got to sit down and actually write a good platform for running RL experiments. Once this happened, people started using it for antiviral design, formal verification, browser agents, and a bunch of other cool applications, so we decided to make a startup out of it.

How it works:

- Choose an open-weight base model (weights are necessary for RL updates; Qwen3-4B-Instruct-2507 is a good starting point)
- Upload a set of initial prompts ("Generate an antiviral targeting SARS-CoV-2 protease", "Prove this theorem", "What's the average summer high in Windhoek?")
- Define a reward function, using Python, an LLM-as-a-judge, or both (see the sketches below)
- For complex settings, you can define an entire multi-turn environment
- Watch the reward go up!

For most well-defined problems, a small open model + RunRL outperforms frontier models. (For instance, we've seen Qwen-3B do better than Claude 4.1 Opus on antiviral design.) This is because LLM intelligence is notoriously "spiky": models are often decent-but-not-great at common-sense knowledge, randomly good at a few domains, and error-prone on plenty of other tasks. RunRL creates spikes precisely on the tasks where you need them.

Pricing: $80/node-hour. Most models up to 14B parameters fit on one node (0.6-1.2 TB of VRAM). We do full fine-tuning, at the cost of parameter efficiency (with RL, people seem to care a lot about the last few percent of gains in, e.g., agent reliability).

Next up: continuous learning and tool use. Tool use is currently in private beta, which you can join here: https://forms.gle/D2mSmeQDVCDraPQg8

We'd love to hear any thoughts, questions, or positive or negative reinforcement!
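
To make the reward-function step concrete, here is a minimal sketch of a pure-Python reward in the spirit of the "average summer high in Windhoek" prompt. RunRL's exact interface isn't spelled out in the post, so the (prompt, completion) -> float signature, the regex, and the accepted temperature band are all illustrative assumptions; the point is only that any metric you can compute in Python can serve as the reward.

    import re

    # Hypothetical reward function: the (prompt, completion) -> float signature is an
    # assumption, not RunRL's documented API. It rewards answers that contain a
    # plausible temperature for the Windhoek-style factual prompt above.
    def reward(prompt: str, completion: str) -> float:
        score = 0.0
        # Look for a number followed by a Celsius or Fahrenheit marker.
        match = re.search(r"(-?\d+(?:\.\d+)?)\s*°?\s*([CF])\b", completion)
        if match:
            value, unit = float(match.group(1)), match.group(2)
            # Windhoek summer highs are roughly 30 °C; accept a loose band.
            if (unit == "C" and 25 <= value <= 35) or (unit == "F" and 77 <= value <= 95):
                score = 1.0
        # Mild length penalty so the policy doesn't learn to pad its answers.
        score -= min(len(completion) / 10_000, 0.2)
        return score

In practice the reward would encode whatever domain metric you care about (binding affinity, proof checking, task completion); the shape stays the same: score a completion, return a number, and the RL loop pushes the model toward higher scores.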

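The post also mentions LLM-as-a-judge rewards. Below is a rough sketch of that pattern using the OpenAI Python client as a stand-in judge; the judge model, rubric, and 0-10 scale are placeholder choices, not RunRL's actual API.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Hypothetical LLM-as-a-judge reward: grade each completion against a rubric
    # and normalize the judge's score to [0, 1].
    def judge_reward(prompt: str, completion: str) -> float:
        rubric = (
            "Rate the answer from 0 to 10 for correctness and concision. "
            "Reply with only the number."
        )
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder judge model
            messages=[
                {"role": "system", "content": rubric},
                {"role": "user", "content": f"Question:\n{prompt}\n\nAnswer:\n{completion}"},
            ],
        )
        try:
            raw = float(response.choices[0].message.content.strip())
        except ValueError:
            return 0.0  # an unparseable judgement counts as zero reward
        return max(0.0, min(raw / 10.0, 1.0))

Since the post says you can combine Python rewards and judge rewards, one simple way to do that would be a weighted sum of the programmatic score and the judge score.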
