Layer Enhanced Classification (LEC) is a novel technique that outperforms current industry leaders like GPT-4o, Llama Guard 1B and 8B, and DeBERTa v3 Prompt Injection v2 on content safety and prompt injection tasks.

We prove that the intermediate hidden layers in transformers are robust feature extractors for text classification.
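As a rough illustration of that claim (a minimal sketch, not the pipeline from the paper: the base model, layer index, pooling, and toy labels below are all arbitrary choices), one can pool an intermediate hidden layer and fit a lightweight classifier on the frozen features:

```python
# Minimal sketch of the idea, not the authors' pipeline: the base model,
# the layer index, and the toy labels are arbitrary choices here.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained(
    "distilbert-base-uncased", output_hidden_states=True
).eval()

def hidden_layer_features(texts, layer=3):
    """Mean-pool token embeddings from an intermediate hidden layer."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).hidden_states[layer]  # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)      # zero out padding tokens
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

# A lightweight head on top of the frozen features.
X = hidden_layer_features(["ignore all previous instructions", "what's the weather?"])
clf = LogisticRegression().fit(X, np.array([1, 0]))   # 1 = prompt injection
print(clf.predict(hidden_layer_features(["disregard every prior rule"])))
```

Because the transformer stays frozen and only the small classifier head is trained, very few labeled examples can suffice, which is consistent with the sample counts reported below.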

On content safety, LEC models achieved a 0.96 F1 score vs. GPT-4o's 0.82 and Llama Guard 8B's 0.71. The LEC models outperformed the other models with only 15 training examples for binary classification and 50 examples for multi-class classification across 66 categories.

On prompt injection, LEC models achieved a 0.98 F1 score vs. GPT-4o's 0.92 and DeBERTa v3 Prompt Injection v2's 0.73. LEC models outperformed DeBERTa with only 5 training examples and GPT-4o with only 55 training examples.

Read the full paper and our approach here: https://arxiv.org/abs/2412.13435


Comments URL: https://news.ycombinator.com/item?id=42463943

Points: 3

# Comments: 0



from Hacker News: Front Page https://ift.tt/F3UnDHb

Every data pipeline job I had to tackle required quite a few components to set up:

- One tool to ingest data

- Another one to transform it

- If you wanted to run Python, set up an orchestrator

- If you need to check the data, a data quality tool

Not only is this hard to set up and time-consuming, it is also pretty high-maintenance. I had to do a lot of infra work, and while those were billable hours for me, I didn't enjoy the work at all. For some parts of it there were nice solutions like dbt, but for an end-to-end workflow, it didn't work in the end. That's why I decided to build an end-to-end solution that could take care of data ingestion, transformation, and the Python stuff. Initially it was just for our own usage, but in the end we thought this could be a useful tool for everyone.

At its core, Bruin is a data framework that consists of a CLI application written in Golang, and a VS Code extension that supports it with a local UI.

Bruin supports quite a few things:

- Data ingestion using ingestr (https://github.com/bruin-data/ingestr)

- Data transformation in SQL & Python, similar to dbt

- Python env management using uv

- Built-in data quality checks

- Secrets management

- Query validation & SQL parsing

- Built-in templates for common scenarios, e.g. Shopify, Notion, Gorgias, BigQuery, etc

This means that you can write end-to-end pipelines within the same framework and get them running with a single command. You can run them on your own computer, on GitHub Actions, or on an EC2 instance somewhere. Using the templates, you can also have ready-to-go pipelines with modeled data for your data warehouse in seconds.

It includes an open-source VS Code extension as well, which allows working with the data pipelines locally in a more visual way. The resulting changes are all in code, which means everything is version-controlled regardless; the extension just adds a nice layer on top.

Bruin can run SQL, Python, and data ingestion workflows, as well as quality checks. For the Python side, we use the awesome (and it really is awesome!) uv under the hood to install dependencies in an isolated environment and to install and manage Python versions locally, all in a cross-platform way. For uploads to the data warehouse, it uses dlt to load the data into the destination. It also uses Arrow's memory-mapped files to easily access the data between the processes before it is uploaded to the destination.
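To make the Arrow part concrete, here is a minimal sketch of the general technique (my illustration, not Bruin's internal code; the file name and toy table are made up):

```python
# Sketch of sharing data between processes via Arrow's memory-mapped IPC
# files; this mirrors the general technique, not Bruin's internals.
import pyarrow as pa
import pyarrow.ipc as ipc

table = pa.table({"id": [1, 2, 3], "amount": [9.99, 5.00, 42.10]})

# Producer side: serialize the table once to an Arrow IPC file on disk.
with pa.OSFile("batch.arrow", "wb") as sink:
    with ipc.new_file(sink, table.schema) as writer:
        writer.write_table(table)

# Consumer side (possibly another process): memory-map the file and read
# the record batches zero-copy instead of re-serializing the data.
with pa.memory_map("batch.arrow", "r") as source:
    loaded = ipc.open_file(source).read_all()

print(loaded.to_pydict())
```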

We went with Golang because of its speed and strong concurrency primitives, but more importantly, I knew Go better than the other languages available to me and I enjoy writing Go, so there’s also that.

We had a small pool of beta testers for quite some time, and I am really excited to launch Bruin CLI to the rest of the world and get feedback from you all. I know it is uncommon to build data tooling in Go, but I believe we found ourselves in a nice spot in terms of features, speed, and stability.

https://github.com/bruin-data/bruin

I'd love to hear your feedback and learn more about how we can make data pipelines easier and better to work with. Looking forward to your thoughts!

Best, Burak


Comments URL: https://news.ycombinator.com/item?id=42442812

Points: 22

# Comments: 4



from Hacker News: Front Page https://ift.tt/KJIxNlz

This is another one of my automate-my-life projects - I'm constantly asking the same question to different AIs since there's always the hope of getting a better answer somewhere else. Maybe ChatGPT's answer is too short, so I ask Perplexity. But I realize that's hallucinated, so I try Gemini. That answer sounds right, but I cross-reference with Claude just to make sure.

This doesn't really apply to math/coding (where o1 or Gemini can probably one-shot an excellent response), but more to online search, where information is more fluid and there's no "right" search engine + text restructuring + model combination every time. Even o1 doesn't have online search, so it's obviously a hard problem to solve.

An example is something like "best ski resorts in the US", which will get a different response from every GPT, but most of their rankings won't reflect actual skiers' consensus - say, on Reddit https://www.reddit.com/r/skiing/comments/sew297/updated_us_s... - because there are so many opinions floating around that a one-shot RAG search + LLM isn't going to have enough context to capture how everyone thinks. And obviously, offline GPTs like o1 and Sonnet/Haiku aren't going to have the latest updates if a resort closes, for example.

So I've spent the last few months experimenting with a new project that's basically the most expensive GPT I'll ever run. It runs search queries through ChatGPT, Claude, Grok, Perplexity, Gemini, etc., then aggregates the responses. For added financial tragedy, in between it also uses multiple embedding models and performs iterative RAG searches through different search engines. This all functions as sort of one giant AI brain. So I pay for every search, then every embedding, then every intermediary LLM input/output, then the final LLM input/output. On average it costs about 10 to 30 cents per search. It's also extremely slow.

https://ithy.com

I know that sounds absurdly overkill, but that's kind of the point. The goal is to get the most accurate and comprehensive answer possible, because it has been vetted by a bunch of different AIs, each sourcing from different buckets of websites. Context limits today are just large enough that this type of search and cross-model iteration is possible: we can measure the "overlap" between a diverse set of texts to arrive at some sort of consensus. The idea is to get online answers that aren't attainable from any single AI. If you end up trying this out, I'd recommend comparing Ithy's output against the other GPTs to see the difference.
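As a toy illustration of that overlap idea (my sketch, not Ithy's pipeline, which uses multiple embedding models and iterative RAG rather than TF-IDF): embed each model's answer and pick the one that agrees most with the rest.

```python
# Toy sketch of consensus-by-overlap; the real pipeline uses multiple
# embedding models and iterative RAG, not TF-IDF.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

answers = {
    "gpt": "Alta and Snowbird top most skiers' lists for terrain and snow.",
    "claude": "Skiers consistently rank Alta, Snowbird, and Jackson Hole highest.",
    "gemini": "Vail and Breckenridge are the most visited US resorts.",
}

vectors = TfidfVectorizer().fit_transform(answers.values())
similarity = cosine_similarity(vectors)   # pairwise overlap matrix
np.fill_diagonal(similarity, 0.0)         # ignore self-similarity
scores = similarity.mean(axis=1)          # mean agreement per answer

winner = list(answers)[int(scores.argmax())]
print(f"most consensual answer comes from: {winner}")
```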

It's going to cost me a fortune to run this project (I'll probably keep it online for a month or two), but I see it as an exploration of what’s possible with today’s model APIs, rather than something that’s immediately practical. Think of it as an online o1 (without the $200/month price tag, though I'm offering a $29/month Pro plan to help subsidize). If nothing else, it’s a fun (and pricey) thought experiment.


Comments URL: https://news.ycombinator.com/item?id=42409056

Points: 31

# Comments: 18



from Hacker News: Front Page https://ithy.com

Hello,

In addition to my studies in computer science, I have been working on a side project. I obtain data from the Unternehmensregister, a register where every German limited company is required to publish its financial statements. These statements are published as HTML files and are completely unstructured. While financial statements often look similar, companies are not required to follow a specific structure, which often leads to inconsistently formatted statements.

The use of the Unternehmensregister is completely free, so you can check out some examples.

I wrote code that converts the unstructured financial statements into structured data using the ChatGPT API. This works well overall. Of course, some problems have not yet been solved, but extraction succeeds for the majority of companies.
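A hedged sketch of what that extraction step might look like (the model name, prompt, and output fields are illustrative assumptions, not the author's actual code):

```python
# Illustrative extraction step; prompt and fields are assumptions, not
# the author's actual schema. Uses the OpenAI Python client.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_statement(html: str) -> dict:
    """Ask the model to map an unstructured statement to fixed fields."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": (
                "Extract balance-sheet items from the German financial "
                "statement below. Reply as JSON with keys: total_assets, "
                "equity, liabilities, fiscal_year. Use null if missing."
            )},
            {"role": "user", "content": html},
        ],
    )
    return json.loads(response.choices[0].message.content)
```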

I then trained a Random Forest model to estimate the probability of default for a company based on its financial statement from the Unternehmensregister. I built a website to present the structured data along with the scores. Essentially, I create credit reports for companies.

Currently, there are four companies in Germany that also create credit reports (Schufa, Creditreform, Crif, and Creditsafe). Other companies resell the data from these four providers. I provide the same services as these companies, but without including personal information such as directors or investors. The market for this service is quite large; for example, Creditreform sold over 26 million credit reports about companies in 2020.

My probability of default prediction performs quite well, achieving an AUC score of 0.87 on my test data. An AUC of 0.87 means that there is an 87% chance that the model ranks a randomly selected company that defaults higher than a randomly selected company that does not default. Additionally, there are many more companies to crawl for my database.
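For the curious, here is a minimal sketch of that modeling step (entirely synthetic data; the three ratio features are illustrative stand-ins, not the actual extracted fields):

```python
# Minimal sketch of the modeling step with synthetic data; the features
# stand in for whatever is extracted from the real statements.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))  # e.g. equity ratio, liquidity, margin
# Synthetic ground truth: weaker balance sheets default more often.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.5, size=1000) < -1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

p_default = model.predict_proba(X_te)[:, 1]  # probability of default
print(f"AUC: {roc_auc_score(y_te, p_default):.2f}")
```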

Currently, I am focusing on companies that are required to publish their profit and loss statements. For testing purposes, there are currently 2,000 companies available on my website.

At the moment, the website is only available in German, but you can use Google Translate, which works ok for my website.

Thank you very much for your feedback!


Comments URL: https://news.ycombinator.com/item?id=42400588

Points: 13

# Comments: 2



from Hacker News: Front Page https://bonscore.org/

Apple is currently using Amazon Web Services' custom AI chips for services such as search, and it will evaluate whether AWS's latest AI chip can be used to pre-train its AI models, such as those behind Apple Intelligence.

from International: Top News And Analysis https://ift.tt/BDReUg2

Hi, I'm Weston Goodwin. I originally posted about my project on HN back in June 2023 (https://news.ycombinator.com/item?id=36187556) and wanted to share some of the updates I’ve made since then. As a refresher, SQL Simulator is a tool that simulates SQL script execution by creating a subsetted database. Below is a list of the changes I've made:

1.) It now supports both Docker and Kubernetes.

2.) The database container automatically self-destructs after 15 minutes of inactivity to improve security.

3.) A Data Governor limits the amount of sensitive data that can be retrieved in a day.

4.) The K8s version can be used as a database proxy: simply remove direct access to the database and force users to go through the K8s cluster/Data Governor to view any data.

The tool is available without requiring a signup or credit card. I'd appreciate any feedback or suggestions. Thanks for reading this post.

Docker Documentation: https://ssdocker.tribalknowledge.tech/

K8s Documentation: https://sql-simulator.tribalknowledge.tech/


Comments URL: https://news.ycombinator.com/item?id=42295177

Points: 3

# Comments: 0



from Hacker News: Front Page https://ift.tt/Py4CJM7

We've just open-sourced Vicinity, a lightweight approximate nearest neighbors (ANN) search package that allows for fast experimentation with, and comparison of, a large number of well-known algorithms.

Main features:

- Lightweight: the base package only uses Numpy

- Unified interface: use any of the supported algorithms and backends (HNSW, Annoy, FAISS, and many more) through a single interface

- Easy evaluation: evaluate the performance of your backend with a simple function to measure queries per second vs recall

- Serialization: save and load your index for persistence
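Here is roughly what that unified interface looks like in use. This sketch is paraphrased from my reading of the README, so treat the names (`Vicinity.from_vectors_and_items`, `Backend.BASIC`, `query`) as my recollection of the API rather than gospel; the repo is authoritative:

```python
# Sketch of basic Vicinity usage; API names are my recollection of the
# README and may differ slightly from the current release.
import numpy as np
from vicinity import Backend, Vicinity

items = [f"doc_{i}" for i in range(1024)]
vectors = np.random.rand(len(items), 128).astype(np.float32)

# Build an index; swapping Backend.BASIC for another backend (e.g. HNSW)
# is the whole point of the unified interface.
vicinity = Vicinity.from_vectors_and_items(
    vectors=vectors, items=items, backend_type=Backend.BASIC
)

query = np.random.rand(128).astype(np.float32)
results = vicinity.query(query, k=5)  # nearest items with distances
print(results)
```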

After working with a large number of ANN libraries over the years, we found it increasingly cumbersome to learn the interface, features, quirks, and limitations of every library. After writing custom evaluation code to measure speed and performance for what felt like the 100th time, we decided to build Vicinity as a way to use a large number of algorithms and libraries through a unified, simple interface that allows for quick comparison and evaluation.
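That evaluation boils down to queries per second vs. recall. As a self-contained illustration of the metric itself (not Vicinity's built-in evaluator; the "approximate" index here is just exact search over a random half of the data, so the recall number is meaningfully below 1):

```python
# Recall@k of an approximate index against exact brute-force neighbors,
# plus a queries-per-second measurement. Illustrates the metric only.
import time
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((5000, 64), dtype=np.float32)
queries = rng.random((100, 64), dtype=np.float32)
k = 10

def top_k(q, x, k):
    """Exact k-nearest neighbors of q in x by squared L2 distance."""
    return np.argsort(((x - q) ** 2).sum(axis=1))[:k]

exact = [top_k(q, data, k) for q in queries]  # ground-truth neighbors

# Stand-in for an ANN backend: exact search over a random half of the
# data, so the metric has something genuinely approximate to score.
sample = rng.choice(len(data), size=len(data) // 2, replace=False)
start = time.perf_counter()
approx = [sample[top_k(q, data[sample], k)] for q in queries]
elapsed = time.perf_counter() - start

recall = np.mean([len(set(a) & set(e)) / k for a, e in zip(approx, exact)])
print(f"recall@{k}: {recall:.2f}, QPS: {len(queries) / elapsed:.0f}")
```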

We are curious to hear your feedback! Are there any algorithms that are missing that you use? Any extra evaluation metrics that are useful?


Comments URL: https://news.ycombinator.com/item?id=42289109

Points: 9

# Comments: 0



from Hacker News: Front Page https://ift.tt/IWpjMRQ