Hi HN! I’m Tony, co-founder of Inngest. I wanted to share AgentKit, our TypeScript multi-agent library, which we’ve been building and testing in production with early users for months.

Although OpenAI’s Agents SDK has launched since we started, we think an agent framework should offer more deterministic and flexible routing, work with multiple model providers, embrace MCP (for rich tooling), and support the unstoppable, growing community of TypeScript AI developers by enabling a smooth transition to production use cases.

This is why we are building AgentKit, and we’re really excited about it for a few reasons:

Firstly, it’s simple. We embrace the KISS principles championed by Anthropic and Hugging Face, letting you gradually add autonomy to your AgentKit program using four primitives (see the sketch after the list):

- Agents: LLM calls that can be combined with prompts, tools, and native MCP support.

- Networks: a simple way to get Agents to collaborate with a shared State, including handoffs.

- State: combines conversation history with a fully typed state machine, used in routing.

- Routers: where the autonomy lives, from code-based to LLM-based (e.g., ReAct) orchestration.
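To make that concrete, here is a minimal sketch of the primitives based on the docs; treat the exact option names as illustrative, since they may drift across versions:

```ts
import { createAgent, createNetwork, openai } from "@inngest/agent-kit";

// An Agent: a system prompt plus a model (tools and MCP servers attach here too).
const planner = createAgent({
  name: "planner",
  system: "You break a coding task into a short, ordered plan.",
  model: openai({ model: "gpt-4o" }),
});

const coder = createAgent({
  name: "coder",
  system: "You implement the next step of the plan.",
  model: openai({ model: "gpt-4o" }),
});

// A Network: agents collaborating over shared State, with an optional router.
const network = createNetwork({
  name: "coding-network",
  agents: [planner, coder],
  defaultModel: openai({ model: "gpt-4o" }),
});

await network.run("Add input validation to the signup form");
```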

Routers are where the magic happens; they let you build deterministic, reliable, testable agents.

AgentKit routing works as follows: the network runs in a loop, using a router that inspects the State to determine which agent to call next. The returned agent runs, then optionally updates state data using its tools. On the next iteration, the network inspects state data and conversation history, and determines which new agent to run.

This fully typed state machine routing allows you to deterministically build agents using any of the effective agent patterns — which means your code is easy to read, edit, understand, and debug.

This also makes handoffs incredibly easy: you define when agents should hand off to each other using regular code and state (or by calling an LLM in the router for AI-based routing). This is similar to the OpenAI Agents SDK, but easier to manage, plan, and build.
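For example, a purely code-based router (reusing the agents from the sketch above, plus a reviewer agent). The state shape here is hypothetical, and the `network.state.data` accessor follows the docs at the time of writing:

```ts
import { createNetwork } from "@inngest/agent-kit";

// Hypothetical state shape for this network; tools set these fields.
interface CodingState {
  plan?: string;
  codeWritten?: boolean;
  approved?: boolean;
}

const network = createNetwork<CodingState>({
  name: "coding-network",
  agents: [planner, coder, reviewer],
  router: ({ network }) => {
    const s = network.state.data;
    if (!s.plan) return planner;      // first loop: produce a plan
    if (!s.codeWritten) return coder; // handoff: plan exists, write code
    if (!s.approved) return reviewer; // handoff: code exists, review it
    return undefined;                 // nothing left to do: stop the loop
  },
});
```

Because the routing decision is plain code over typed state, you can unit-test it like any other function.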

Then there are the local development and move-to-production capabilities.

AgentKit is compatible with Inngest’s tooling, meaning that you can test agents using Inngest’s local DevServer, which provides traces, inputs and outputs, replay, tool and MCP inputs and outputs, and (soon) a step-over debugger, so you can easily see what’s happening in the agent loop.

In production, you can also optionally combine AgentKit with Inngest for fault-tolerant execution. Each agent’s LLM call is wrapped in a step, and tools can use multiple steps to incorporate things like human-in-the-loop. This gives you native orchestration, observability, and out-of-the-box scale.
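A sketch of that pattern, wrapping the network run from above in an Inngest function; the exact binding between AgentKit and the surrounding step context is simplified here:

```ts
import { Inngest } from "inngest";

const inngest = new Inngest({ id: "my-app" });

export const runNetwork = inngest.createFunction(
  { id: "run-coding-network" },
  { event: "app/code.requested" },
  async ({ event, step }) => {
    // Each agent's LLM call becomes a durable step: retried on failure and
    // resumable after a crash. Tools can use step.waitForEvent(...) to pause
    // for human-in-the-loop approval before continuing the loop.
    return network.run(event.data.prompt);
  },
);
```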

In the documentation you’ll find an AgentKit SWE-bench example and multiple coding-agent examples.

It’s fully open source under the Apache 2.0 license.

If you want to get started:

- npm: npm i @inngest/agent-kit

- GitHub: https://github.com/inngest/agent-kit

- Docs: https://agentkit.inngest.com/overview

We’re excited to finally launch AgentKit; let us know what you think!


Comments URL: https://news.ycombinator.com/item?id=43426164

Points: 32

# Comments: 9




Hey HN, after years building some of the core AI and NLU systems in Google Search, we decided to leave and build outside. Our goal was to put the advanced ML and data-science techniques we’ve been using in the hands of all software engineers, so that everyone can build AI and search apps at the same level of performance and sophistication as the big labs.

This was a hard technical challenge, but we were very inspired by the MVC architecture for web development. The intuition there is that when a data model changes, its view gets auto-updated. We built a similar architecture for AI. On one side is a scoring system, which encapsulates what’s good about the AI application in a set of metrics. On the other side is a set of optimizers that “compile” against this scorer: prompt optimization, data filtering, synthetic data generation, supervised learning, RL, etc. The scoring system can be calibrated using developer, user, or rater feedback, and once it’s updated, all the optimizers get recompiled against it.

The result is a setup that makes it easy to incrementally improve the quality of your AI in a tight feedback loop: You update your scorers, they auto-update your optimizers, your app gets better, you see that improvement in interpretable scores, and then you repeat, progressing from simpler to more advanced optimizers and from off-the-shelf to calibrated scorers.
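In TypeScript-flavored pseudocode, the architecture looks roughly like this. These interfaces are hypothetical, for illustration only, and are not Pi’s actual API:

```ts
// Hypothetical interfaces to illustrate the architecture; not Pi's actual API.
interface Scorer {
  // Scores one (input, output) pair across many dimensions.
  score(input: string, output: string): { total: number; byDimension: Record<string, number> };
}

interface Optimizer {
  // "Compiles" against the current scorer: prompt optimization, data
  // filtering, synthetic data generation, supervised learning, RL, ...
  compile(scorer: Scorer): Promise<void>;
}

// The MVC-like loop: recalibrate the scorer, then recompile every optimizer.
async function recalibrate(scorer: Scorer, optimizers: Optimizer[]): Promise<void> {
  for (const opt of optimizers) {
    await opt.compile(scorer);
  }
}
```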

We would love your feedback on this approach. https://build.withpi.ai has a set of playgrounds to help you quickly build a scorer and multiple optimizers. No sign in required. https://code.withpi.ai has the API reference and Notebook links. Finally, we have a Loom demo [1].

More technical details

Scorers: Our scoring system has three key differences from the common LLM-as-a-judge pattern.

First, rather than a single label or metric from an LLM judge, our scoring system is represented as a tunable tree of metrics, with 20+ dimensions which get combined into a final (non-linear) weighted score. The tree structure makes scores easily interpretable (just look at the breakdown by dimension), extensible (just add/remove a dimension), and adjustable (just re-tune the weights). Training the scoring system with labeled/preference data adjusts the weights. You can automate this process with user feedback signals, resulting in a tight feedback loop.
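Here is a hypothetical sketch of such a tree. It is illustrative only: the names, the linear combination, and the placeholder encoder call are ours, and Pi’s real system learns the (non-linear) combination rather than hard-coding it:

```ts
// Hypothetical sketch of a tunable metric tree; not Pi's actual API.
// Leaves score one dimension in [0, 1]; nodes combine weighted children.
type Metric =
  | { name: string; score: (input: string, output: string) => number } // leaf
  | { name: string; children: { weight: number; metric: Metric }[] };  // node

function scoreTree(m: Metric, input: string, output: string): number {
  if ("score" in m) return m.score(input, output);
  const totalWeight = m.children.reduce((w, c) => w + c.weight, 0);
  const weightedSum = m.children.reduce(
    (s, c) => s + c.weight * scoreTree(c.metric, input, output),
    0,
  );
  return weightedSum / totalWeight; // calibration re-tunes these weights
}

// Stand-in for a fast encoder-model call that scores a natural-language question.
const nl = (_question: string) =>
  (_input: string, _output: string): number => 0.5; // placeholder

const scorer: Metric = {
  name: "overall",
  children: [
    // Natural-language dimension: answered by a small encoder model.
    { weight: 0.7, metric: { name: "helpfulness", score: nl("Is the answer helpful?") } },
    // Quantitative dimension: plain code, no model call needed.
    {
      weight: 0.3,
      metric: { name: "brevity", score: (_i, o) => Math.min(1, 500 / Math.max(o.length, 1)) },
    },
  ],
};
```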

Second, our scoring system handles natural language dimensions (great for free-form, qualitative questions requiring NLU) alongside quantitative dimensions (like computations over dates or doc length, which can be provided in Python) in the same tree. When calibrating with your labeled or preference data, the scorer learns how to balance these.

Third, for natural-language scoring, we use specialized smaller encoder models rather than autoregressive models. Encoders are a natural fit for scoring: they are faster and cheaper to run, easier to fine-tune, and architecturally more suitable (bi-directional attention with a regression or classification head) than similarly sized decoder models. For example, we can score 20+ dimensions in sub-100ms, making it possible to use scoring everywhere from evaluation to agent orchestration to reward modeling.

Optimizers: We took the most salient ML techniques and reformulated them as optimizers against our scoring system: for DSPy, the scoring system acts as the validator; for GRPO, it acts as the reward model. We’re keen to hear the community’s feedback on which techniques to add next.
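As an illustration, reusing the hypothetical `scoreTree`/`scorer` sketch above, plugging the scorer in as a reward function or validation metric is a one-liner:

```ts
// The same tree doubles as a reward model for RL-style optimizers (e.g. GRPO)...
const reward = (prompt: string, completion: string): number =>
  scoreTree(scorer, prompt, completion);

// ...or as a validation metric for prompt optimization: pick the best candidate.
const best = (prompt: string, candidates: string[]): string =>
  candidates.reduce((a, b) => (reward(prompt, a) >= reward(prompt, b) ? a : b));
```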

Overall stack: playgrounds on Next.js and Vercel; Runpod and GCP for training GPUs; TRL for training algorithms; ModernBERT and Llama as base models; GCP and Azure for 4o and Anthropic calls.

We’d love your feedback and perspectives: Our team will be around to answer questions and discuss. If there’s a lot of interest, happy to host a live session!

- Achint, co-founder of Pi Labs

[1] http://loom.com/share/c09a1fda8cdf4003a5664fa9cfbf7804


Comments URL: https://news.ycombinator.com/item?id=43362535

Points: 10

# Comments: 0




Hey everybody, you might remember my older game, Lander! It made a big splash on Hacker News about 2 years ago. I'm still enjoying writing games with no dependencies. I've been working on Bubbles for about 6 months and would love to see your scores.

If you like it, you can build your own levels with my builder tool: https://ehmorris.com/bubbles/builder/ and share the levels here or via GitHub.


Comments URL: https://news.ycombinator.com/item?id=43355658

Points: 21

# Comments: 7




Hey HN! We're building an open-source CMS designed to help creators with every part of the content production pipeline.

We're showing our tiny first step: A tool designed to take in a Twitter username and produce an "identity card" based on it. We expect to use an approach similar to [Constitutional AI] with an explicit focus on repeatability, testability, and verification of an "identity card." We think this approach could be used to create finetuning examples for training changes, or serve as inference time insight for LLMs, or most likely a combination of the two.

The tooling we're showing today is extremely simplistic (and the AI is frankly bad), but this is intentional. We're more focused on showing the dev experience and community aspects. We'd like to make it easier to contribute to this project than to edit Wikipedia. Communities are frustrated with things like WordPress, Apache, and other open-source foundations focusing on things other than software. We have a lot of community ideas (governance via vote-by-jury is perhaps the most interesting).

We're a team of 5, and we've bounced around a few companies with each other. We're all professional creators (video + music) and we're creating tooling for ourselves first.

Previously, we did a startup called Vidpresso (YC W14) that was acquired by Facebook in 2018. We all worked at Facebook for 5 years on creator tooling, and have since left to start this thing.

After leaving FB, it was painful for us to leave the warm embrace of the Facebook infra team where we had amazing tooling. Since then, we've pivoted a bunch of times trying to figure out our "real" product. While we think we've finally nailed it, the developer experience we built is one we think others could benefit from.

Our tooling is designed so any developer can easily jump in and start contributing. It's an AI-first dev environment designed with a few key principles in mind:

1. You should be able to discover any command you need to run without looking at docs.

2. To make a change, as much context as possible should be provided as close to the code as possible.

3. AIs are "people too", in the sense that they benefit from focused context, and not being distracted by having to search deeply through multiple files or documentation to make changes.

We have a few non-traditional elements in our stack which we think are worth exploring. [Isograph] helps us simplify our component usage with GraphQL. [Replit] lets people use AI coding without needing to set up any additional tooling. We've learned how to treat it like a junior developer, and think it will be the best platform for AI-first open-source projects going forward. We use [Sapling] and Git together for version control. It might sound counterintuitive, but we use Git to manage agent interactions and Sapling to manage "purposeful" commits.

My last [Show HN post in 2013] ended up helping me find my Vidpresso cofounder, so I have high hopes for this one. I'm excited to meet anyone (developers, creators, or nice people in general) and start working with them to make this project work. I have good references for being a nice guy, and aim to keep that going with this project.

The best way to work with us is [remix our Replit app] and [join our Discord].

Thanks for reading and checking us out! It's super early, but we're excited to work with you!

[Constitutional AI]: https://www.anthropic.com/research/constitutional-ai-harmles...

[Isograph]: https://isograph.dev

[Replit]: https://replit.com

[Sapling]: https://sapling-scm.com

[Show HN post in 2013]: https://news.ycombinator.com/item?id=6993981

[remix our Replit app]: https://replit.com/t/bolt-foundry/repls/Content-Foundry/view...

[join our Discord]: https://discord.gg/TjQZfWjSQ7


Comments URL: https://news.ycombinator.com/item?id=43292058

Points: 18

# Comments: 14


