Let's say you work as a CTO at a failing startup, and you are tired of all the responsibilities, management, etc., and you just want to go back to being a productive developer and write code again. Will this be perceived as a stupid career move, or will people understand? Is it a bad move? Asking for a friend.


Comments URL: https://news.ycombinator.com/item?id=39873644

Points: 33

# Comments: 34



from Hacker News: Front Page https://ift.tt/UHx8Kgh

Hi everyone! We’re Mark, Justin, and Diandre of Soundry AI (https://soundry.ai/). We provide generative AI tools for musicians, including text-to-sound and infinite sample packs.

We (Mark and Justin) started writing music together a few years ago but felt limited in our ability to create anything we were proud of. Modern music production is highly technical and requires knowledge of sound design, tracking, arrangement, mixing, mastering, and digital signal processing. Even with our technical backgrounds (in AI and cloud computing, respectively), we struggled to learn what we needed to know.

The emergence of latent diffusion models was a turning point for us, just as it was for many others in tech. All of a sudden it was possible to leverage AI to create beautiful art. After meeting our cofounder Diandre (half of the DJ duo Bandlez and an expert music producer), we formed a team to apply generative AI to music production.

We began by focusing on generating music samples rather than full songs. Focusing on samples gave us several advantages, the biggest being the ability to build and train our custom models very quickly, thanks to the short length of the generated audio (typically 2-10 seconds). Conveniently, our early text-to-sample model also fit well within many existing music producers' workflows, which often involve heavy use of samples.

We ran into several challenges when creating our text-to-sound model. The first arose because we trained our latent transformer (similar to OpenAI's Sora) using off-the-shelf audio autoencoders (like Meta's EnCodec) and text embedders (like Google's T5). The domain gap between the data used to train those off-the-shelf models and our sample data was much greater than we expected, which led us to misattribute blame for issues among the three model components (latent transformer, autoencoder, and embedder) during development. To see how musicians can use our text-to-sound generator to write music, check out the demo below:

https://www.youtube.com/watch?v=MT3k4VV5yrs&ab_channel=Sound...
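
If you're curious how those three components fit together, here's a heavily simplified sketch in PyTorch. The modules, dimensions, and conditioning scheme below are illustrative stand-ins, not our production model:

    # Rough sketch of the three-component pipeline: text embedder ->
    # latent transformer -> audio autoencoder. Everything here is a
    # simplified stand-in; sizes and conditioning are illustrative only.
    import torch
    import torch.nn as nn

    TEXT_DIM, LATENT_DIM, SEQ_LEN = 512, 128, 250  # illustrative sizes

    class LatentTransformer(nn.Module):
        def __init__(self):
            super().__init__()
            self.text_proj = nn.Linear(TEXT_DIM, LATENT_DIM)
            layer = nn.TransformerEncoderLayer(
                d_model=LATENT_DIM, nhead=8, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=4)

        def forward(self, text_emb, latents):
            # Prepend the projected text embedding as a conditioning token.
            cond = self.text_proj(text_emb).unsqueeze(1)   # (B, 1, D)
            x = torch.cat([cond, latents], dim=1)          # (B, 1+T, D)
            return self.encoder(x)[:, 1:, :]               # drop the token

    text_emb = torch.randn(1, TEXT_DIM)            # e.g. a pooled T5 embedding
    latents = torch.randn(1, SEQ_LEN, LATENT_DIM)  # ~2-10 s of audio latents
    out = LatentTransformer()(text_emb, latents)   # a real system would then
    print(out.shape)                               # decode latents to audio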

The second issue was more on the product design side. When we spoke with our users in depth, we learned that novice music producers had no idea what to type into the prompt box, and expert producers felt that our model's output wasn't always what they had in mind when they typed in their prompt. It turns out that text is much better at specifying the contents of visual art than of music. This particular issue is what led us to our new product: the Infinite Sample Pack.

The Infinite Sample Pack does something rather unconventional: prompting with audio rather than text. Rather than requiring you to type out a prompt and specify many parameters, all you need to do is click a button to receive new samples. Each time you select a sound, our system embeds "prompt samples" as input to our model, which then creates infinite variations. By limiting the number of possible outputs, we're able to hide inference latency by pre-computing lots of samples ahead of time. This new approach has seen much wider adoption, so this month we'll be opening the system up so that everyone can create Infinite Sample Packs of their very own! To compare the two workflows, check out our new demo using the Infinite Sample Pack:

https://www.youtube.com/watch?v=BqYhGipZCDY&ab_channel=Sound...
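
To make the latency-hiding idea concrete, here's a toy sketch of the precompute-and-serve scheme. The pool size and the generate_variation() stand-in are illustrative, not our actual inference code:

    # Toy sketch of the precompute-ahead idea: variations for each prompt
    # sample are generated offline, so serving one is just a pool pop.
    # generate_variation() stands in for the real (slow) model call.
    import random

    POOL_SIZE = 32  # illustrative; not our real pool size

    def generate_variation(prompt_sample_id: str) -> bytes:
        return f"{prompt_sample_id}-{random.random()}".encode()

    class InfiniteSamplePack:
        def __init__(self, prompt_sample_ids):
            # Precompute a pool of variations per sound, ahead of time.
            self.pools = {
                sid: [generate_variation(sid) for _ in range(POOL_SIZE)]
                for sid in prompt_sample_ids
            }

        def next_sample(self, sid: str) -> bytes:
            # Instant at click time; a background job would refill the pool.
            pool = self.pools[sid]
            return pool.pop() if pool else generate_variation(sid)

    pack = InfiniteSamplePack(["kick_808", "hihat_trap"])
    sample = pack.next_sample("kick_808")  # returned with no model latency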

Overall, our founding principle is to start by asking: "What do musicians actually want?" Meta's open-sourcing of MusicGen has resulted in many interchangeable text-to-music products, but ours is embraced by musicians. By keeping an open dialog with our users, we've been able to meet many of their needs: specifying BPM and key, offering one-shot instrument samples (so musicians can write their own melodies), and adding drag-and-drop support for digital audio workstations via our desktop app and VST. To hear some of the awesome songs made with our product, take a listen to our community showcases below!

https://soundcloud.com/soundry-ai/sets/community-showcases

We hope you enjoy our tool, and we look forward to the discussion in the comments!


Comments URL: https://news.ycombinator.com/item?id=39782213

Points: 16

# Comments: 4



from Hacker News: Front Page https://soundry.ai/

I am really puzzled by TPUs. I've been reading everywhere that TPUs are powerful and a great alternative to NVIDIA.

I have been playing with TPUs for a couple of months now, and to be honest I don't understand how people can use them in production for inference:

- almost no resources online showing how to run modern generative models like Mistral, Yi 34B, etc. on TPUs
- poor compatibility between JAX and PyTorch
- very hard to understand the memory consumption of the TPU chips (no nvidia-smi equivalent; see the JAX snippet below)
- rotating IP addresses on TPU VMs
- almost impossible to get my hands on a TPU v5
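
For what it's worth, the closest thing I've found to nvidia-smi is JAX's per-device memory_stats(); the available keys vary by runtime version, and it can return None on some backends:

    # Rough nvidia-smi substitute on TPU VMs: JAX's per-device counters.
    # Key names vary by runtime version; memory_stats() can be None.
    import jax

    for device in jax.local_devices():
        stats = device.memory_stats() or {}
        used = stats.get("bytes_in_use", 0) / 2**30
        limit = stats.get("bytes_limit", 0) / 2**30
        print(f"{device.device_kind} {device.id}: "
              f"{used:.2f} / {limit:.2f} GiB")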

Is it only me? Or did I miss something?

I totally understand that TPUs can be useful for training, though.


Comments URL: https://news.ycombinator.com/item?id=39670121

Points: 16

# Comments: 2



from Hacker News: Front Page https://ift.tt/UB0tSEi

Hi everyone! We’re the cofounders of SiLogy (https://silogy.io/). We’re building chip design and verification tools to speed up the semiconductor development cycle. Here's a demo: https://www.youtube.com/watch?v=u0wAegt79EA

Interest in designing new chips is growing, thanks to demand from AI and the predicted decline of Moore’s Law. All these chips need to be tested in simulation. Since the number of possible states grows exponentially with chip complexity, the need for verification is exploding. Chip developers already spend 70% of their time on testing. (See this video on the “verification gap”: https://www.youtube.com/watch?v=rtaaOdGuMCc).

Tooling hasn't kept up. The state of the art in collaborative debugging is to walk to a coworker's desk and point to an error in a log file or waveform file. Each chip company rolls out its own tooling and infra to deal with this; building that tooling was the entire job of our cofounder Kay at his last gig. But chip engineers want to work on chips, not devtools! The solutions they come up with are often inadequate and frustrating. That's why we started SiLogy.

SiLogy is a web app to manage the entire digital verification workflow. (“Digital verification” means testing the logic of the design and includes everything before the physical design of the chip. It’s the most time-consuming stage in verification.)

We combine three capabilities:

Test orchestration and running: The heart of our product is a CI tool that runs Verilator, a popular open-source simulator, in a Docker container. When you push to your repo or manually trigger a job in the UI, we install your dependencies, compile your binaries into a Docker image, and run your tests. You can also rerun a single test with custom arguments from the UI.
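
To make that concrete, here's a rough sketch of what such a containerized run can look like; the image tag, file names, and flags are illustrative, not our actual pipeline:

    # Rough sketch of a containerized Verilator job: mount the repo,
    # compile the design to a simulator binary, then execute it.
    # Image tag, paths, and flags below are illustrative assumptions.
    import subprocess

    def docker_run(repo_dir, *args):
        """Run a command in the public verilator/verilator image,
        with the repo mounted at /work."""
        return subprocess.run(
            ["docker", "run", "--rm",
             "-v", f"{repo_dir}:/work", "-w", "/work", *args],
            capture_output=True, text=True)

    repo = "/path/to/chip-repo"  # hypothetical checkout from your push

    # 1. Compile: Verilator 5.x's --binary emits a runnable simulator.
    build = docker_run(repo, "verilator/verilator:latest",
                       "--binary", "--top-module", "top", "top.v")

    # 2. Run the generated simulator (obj_dir/Vtop) in the same image.
    if build.returncode == 0:
        test = docker_run(repo, "--entrypoint", "./obj_dir/Vtop",
                          "verilator/verilator:latest")
        print("PASS" if test.returncode == 0 else "FAIL")
    else:
        print(build.stderr)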

Test results and statistics: We display logs from each test in the web app. We’re working on displaying waveform files in the app, too. We also keep track of passing and failing tests within each test suite, and we’re working on slick visualizations of test trends, to keep managers happy. :)

Collaboration: Soon you'll be able to send a link to, and leave a comment on, a specific location within a log or waveform file, just like in Google Docs.

Unlike generic CI tools, we focus on tight integration with verification workflows. When an assertion fails, we show you the source code where it happened. We're hard at work on waveform viewing: soon you'll be able to generate waves from a failing test with the click of a button.

Our roadmap includes support for the major commercial simulators: VCS, Xcelium, and Questa. We’re also working on a test gen framework based on Buck2 to statically declare tests for your post-commit runs, or programmatically generate thousands of tests for nightly regressions.

We plan to sell seats, with discounts for individuals, startups, and research labs (we're still working out pricing). For now, we're opening up guest registration so HN can play with what we hope is the future of design verification. We owe so much of what we know to this community, and we'd be grateful for any feedback. <3 You can sign up here; just press "Use guest email address" if you don't want to give up your email: https://ift.tt/PJY17VI


Comments URL: https://news.ycombinator.com/item?id=39632872

Points: 7

# Comments: 0



from Hacker News: Front Page https://silogy.io/

Hi, I'm Will. I'm 24, autistic, and have OCD tendencies. I'm learning to code and this is my first public project. I’d really appreciate your feedback and encouragement!

This project lets me solve some of my OCD problems online. There are a couple of parts of the forums I visit (SpaceBattles, Sufficient Velocity, and Questionable Questing) that I want to remove. Specifically, I hate seeing indicators of how much is left in a forum thread, because I keep thinking about how much content is left. It stops me from immersing myself in the story. It stresses me out. Before I learned to code, I'd use my hand to block the total chapter count so I could read the blurb and see the word count. I would do my best to ignore the page navigation bar except for the next-page button, but I usually ended up failing. One of the reasons I always read in full-screen Safari is that I don't have to see the tab name, which always includes the page number. I also learned not to hover my cursor over the window, because that would tell me the page number too.

This project is a series of userscripts that hide those indicators. I wrote the userscripts in JavaScript, using https://github.com/quoid/userscripts as the manager. Even though I didn't know what a userscript was until I started coding them, AI assistance allowed me to build them with minimal help from my brother, Stevie. Khanmigo helped me plan, write, and debug the code, while ChatGPT taught me the theory. Part of the reason I coded a lot faster on the later userscripts is that I knew enough to realize when the AI was talking about something irrelevant and to redirect it. One cool moment was when I correctly predicted that I wouldn't need separate userscripts for SpaceBattles and Sufficient Velocity, because Sufficient Velocity used to be part of SpaceBattles.

I find it relaxing not to have to worry about accidentally seeing the chapter count or the final page number. Maybe these scripts will help one of you!


Comments URL: https://news.ycombinator.com/item?id=39618062

Points: 9

# Comments: 5



from Hacker News: Front Page https://ift.tt/yREacFo