Amiga Games Released in 2024 Index
Article URL: https://www.lemonamiga.com/forum/viewtopic.php?t=19114
Comments URL: https://news.ycombinator.com/item?id=42560284
Points: 5
# Comments: 0
from Hacker News: Front Page https://ift.tt/fwCRsSM
Article URL: https://www.lemonamiga.com/forum/viewtopic.php?t=19114
Comments URL: https://news.ycombinator.com/item?id=42560284
Points: 5
# Comments: 0
Article URL: https://pod.geraspora.de/posts/17342163
Comments URL: https://news.ycombinator.com/item?id=42549624
Points: 437
# Comments: 289
Article URL: https://gitlab.com/skmp/dca3-game
Comments URL: https://news.ycombinator.com/item?id=42559909
Points: 67
# Comments: 12
Article URL: https://www.ft.com/content/c755a34d-eb97-40d1-b780-ae2e2f0e7ad9
Comments URL: https://news.ycombinator.com/item?id=42552222
Points: 10
# Comments: 1
If you knew him in life or remember his contribution to the world, please share your stories.
Comments URL: https://news.ycombinator.com/item?id=42551900
Points: 156
# Comments: 14
Article URL: https://abishekmuthian.com/how-i-run-llms-locally/
Comments URL: https://news.ycombinator.com/item?id=42539155
Points: 22
# Comments: 10
Article URL: https://www.france24.com/en/europe/20241228-eu-law-mandating-universal-chargers-for-devices-comes-into-force
Comments URL: https://news.ycombinator.com/item?id=42534851
Points: 45
# Comments: 15
Article URL: https://imatix-legacy.github.io/libero/index.htm
Comments URL: https://news.ycombinator.com/item?id=42534090
Points: 11
# Comments: 4
Article URL: https://lapcatsoftware.com/articles/2024/12/3.html
Comments URL: https://news.ycombinator.com/item?id=42533685
Points: 46
# Comments: 7
Comments URL: https://news.ycombinator.com/item?id=42527572
Points: 97
# Comments: 185
Article URL: https://tidyfirst.substack.com/p/complain-and-propose
Comments URL: https://news.ycombinator.com/item?id=42523933
Points: 21
# Comments: 9
Article URL: https://www.christo.sh/building-agi-on-the-tokio-runtime/
Comments URL: https://news.ycombinator.com/item?id=42516041
Points: 23
# Comments: 7
Article URL: https://ourworldindata.org/golden-age-antibiotics
Comments URL: https://news.ycombinator.com/item?id=42495037
Points: 7
# Comments: 1
Article URL: https://www.earlychristianwritings.com/index.html
Comments URL: https://news.ycombinator.com/item?id=42509711
Points: 12
# Comments: 2
Article URL: https://www.bostonglobe.com/2024/12/24/metro/maine-prison-remote-jobs-mountain-view-correctional-facility/
Comments URL: https://news.ycombinator.com/item?id=42503258
Points: 19
# Comments: 15
Article URL: https://ploum.net/2024-12-23-julius-en.html
Comments URL: https://news.ycombinator.com/item?id=42494090
Points: 19
# Comments: 0
Article URL: https://aegisub.org/
Comments URL: https://news.ycombinator.com/item?id=42457820
Points: 27
# Comments: 1
Article URL: https://genesis-world.readthedocs.io/en/latest/
Comments URL: https://news.ycombinator.com/item?id=42457213
Points: 195
# Comments: 48
Article URL: https://primatology.xyz/blog/introducing-gazzetta
Comments URL: https://news.ycombinator.com/item?id=42480885
Points: 5
# Comments: 0
Article URL: https://www.speakandregret.michaelinzlicht.com/p/revisiting-stereotype-threat
Comments URL: https://news.ycombinator.com/item?id=42464138
Points: 67
# Comments: 41
Layer Enhanced Classification (LEC) is a novel technique that outperforms current industry leaders like GPT-4o, LlamaGuards 1 and 8B, and deBERTa v3 Prompt Injection v2 for content safety and prompt injection tasks.
We prove that the intermediate hidden layers in transformers are robust feature extractors for text classification.
On content safety, LEC models achieved a 0.96 F1 score vs GPT-4o's 0.82 and Llama Guard 8B's 0.71.The LEC models were able to outperform the other models with only 15 training examples for binary classification and 50 examples for multi-class classification across 66 categories.
On prompt injection,LEC models achieved a 0.98 F1 score vs GPT-4o's 0.92 and deBERTa v3 Prompt Injection v2's 0.73. LEC models were able to outperform deBERTa with only 5 training examples and GPT-4o with only 55 training examples.
Read the full paper and our approach here: https://arxiv.org/abs/2412.13435
Comments URL: https://news.ycombinator.com/item?id=42463943
Points: 3
# Comments: 0
Article URL: https://www.caltech.edu/about/news/thinking-slowly-the-paradoxical-slowness-of-human-behavior
Comments URL: https://news.ycombinator.com/item?id=42450811
Points: 6
# Comments: 3
Every data pipeline job I had to tackle required quite a few components to set up:
- One tool to ingest data
- Another one to transform it
- If you wanted to run Python, set up an orchestrator
- If you need to check the data, a data quality tool
Let alone this being hard to set up and taking time, it is also pretty high-maintenance. I had to do a lot of infra work, and while this being billable hours for me I didn’t enjoy the work at all. For some parts of it, there were nice solutions like dbt, but in the end for an end-to-end workflow, it didn’t work. That’s why I decided to build an end-to-end solution that could take care of data ingestion, transformation, and Python stuff. Initially, it was just for our own usage, but in the end, we thought this could be a useful tool for everyone.
In its core, Bruin is a data framework that consists of a CLI application written in Golang, and a VS Code extension that supports it with a local UI.
Bruin supports quite a few stuff:
- Data ingestion using ingestr (https://github.com/bruin-data/ingestr)
- Data transformation in SQL & Python, similar to dbt
- Python env management using uv
- Built-in data quality checks
- Secrets management
- Query validation & SQL parsing
- Built-in templates for common scenarios, e.g. Shopify, Notion, Gorgias, BigQuery, etc
This means that you can write end-to-end pipelines within the same framework and get it running with a single command. You can run it on your own computer, on GitHub Actions, or in an EC2 instance somewhere. Using the templates, you can also have ready-to-go pipelines with modeled data for your data warehouse in seconds.
It includes an open-source VS Code extension as well, which allows working with the data pipelines locally, in a more visual way. The resulting changes are all in code, which means everything is version-controlled regardless, it just adds a nice layer.
Bruin can run SQL, Python, and data ingestion workflows, as well as quality checks. For Python stuff, we use the awesome (and it really is awesome!) uv under the hood, install dependencies in an isolated environment, and install and manage the Python versions locally, all in a cross-platform way. Then in order to manage data uploads to the data warehouse, it uses dlt under the hood to upload the data to the destination. It also uses Arrow’s memory-mapped files to easily access the data between the processes before uploading them to the destination.
We went with Golang because of its speed and strong concurrency primitives, but more importantly, I knew Go better than the other languages available to me and I enjoy writing Go, so there’s also that.
We had a small pool of beta testers for quite some time and I am really excited to launch Bruin CLI to the rest of the world and get feedback from you all. I know it is not often to build data tooling in Go but I believe we found ourselves in a nice spot in terms of features, speed, and stability.
https://github.com/bruin-data/bruin
I’d love to hear your feedback and learn more about how we can make data pipelines easier and better to work with, looking forward to your thoughts!
Best, Burak
Comments URL: https://news.ycombinator.com/item?id=42442812
Points: 22
# Comments: 4
Article URL: https://outlore.dev/blog/model-context-protocol/
Comments URL: https://news.ycombinator.com/item?id=42415654
Points: 5
# Comments: 6
Article URL: https://www.sciencedirect.com/science/article/pii/S2666202724004518
Comments URL: https://news.ycombinator.com/item?id=42418264
Points: 20
# Comments: 11
Article URL: https://www.pcmag.com/articles/nyc-wants-you-to-stop-taking-traffic-cam-selfies-but-heres-how-to-do-it
Comments URL: https://news.ycombinator.com/item?id=42405559
Points: 29
# Comments: 4
This is another one of my automate-my-life projects - I'm constantly asking the same question to different AIs since there's always the hope of getting a better answer somewhere else. Maybe ChatGPT's answer is too short, so I ask Perplexity. But I realize that's hallucinated, so I try Gemini. That answer sounds right, but I cross-reference with Claude just to make sure.
This doesn't really apply to math/coding (where o1 or Gemini can probably one-shot an excellent response), but more to online search, where information is more fluid and there's no "right" search engine + text restructuring + model combination every time. Even o1 doesn't have online search, so it's obviously a hard problem to solve.
An example is something like "best ski resorts in the US", which will get a different response from every GPT, but most of their rankings won't reflect actual skiers' consensus - say, on Reddit https://www.reddit.com/r/skiing/comments/sew297/updated_us_s... - because there's so many opinions floating around, a one-shot RAG search + LLM isn't going to have enough context to find how everyone thinks. And obviously, offline GPTs like o1 and Sonnet/Haiku aren't going to have the latest updates if a resort closes for example.
So I’ve spent the last few months experimenting with a new project that's basically the most expensive GPT I’ll ever run. It runs search queries through ChatGPT, Claude, Grok, Perplexity, Gemini, etc., then aggregates the responses. For added financial tragedy, in-between it also uses multiple embedding models and performs iterative RAG searches through different search engines. This all functions as sort of like one giant AI brain. So I pay for every search, then every embedding, then every intermediary LLM input/output, then the final LLM input/output. On average it costs about 10 to 30 cents per search. It's also extremely slow.
I know that sounds absurdly overkill, but that’s kind of the point. The goal is to get the most accurate and comprehensive answer possible, because it's been vetted by a bunch of different AIs, each sourcing from different buckets of websites. Context limits today are just large enough that this type of search and cross-model iteration is possible, where we can determine the "overlap" between a diverse set of text to determine some sort of consensus. The idea is to get online answers that aren't attainable from any single AI. If you end up trying this out, I'd recommend comparing Ithy's output against the other GPTs to see the difference.
It's going to cost me a fortune to run this project (I'll probably keep it online for a month or two), but I see it as an exploration of what’s possible with today’s model APIs, rather than something that’s immediately practical. Think of it as an online o1 (without the $200/month price tag, though I'm offering a $29/month Pro plan to help subsidize). If nothing else, it’s a fun (and pricey) thought experiment.
Comments URL: https://news.ycombinator.com/item?id=42409056
Points: 31
# Comments: 18
Hello,
In addition to my studies in computer science, I have been working on a side project. I obtain data from the Unternehmensregister, a register where every German limited company is required to publish their financial statements. These statements are published as HTML files and are completely unstructured. While financial statements often look similar, companies are not required to follow a specific structure, which often leads to inconsistently formatted statements.
The use of the Unternehmensregister is completely free, so you can check out some examples.
I wrote code that converts the unstructured financial statements into structured data using the ChatGPT API. This works well. Of course, there are some problems that have not yet been solved, but data extraction works well for the majority of companies.
I than coded a Random Forest algorithm to estimate the probability of default for a company based on its financial statement from the Unternehmensregister. I built a website to present the structured data along with the scores. Essentially, I create a credit reports for companies.
Currently, there are four companies in Germany that also create credit reports (Schufa, Creditreform, Crif, and Creditsafe). Other companies resell the data from these four providers. I provide the same services as these companies, but without including personal information such as directors or investors. The market for this service is quite large; for example, Creditreform sold over 26 million credit reports about companies in 2020.
My probability of default prediction performs quite well, achieving an AUC score of 0.87 on my test data. An AUC of 0.87 means that there is an 87% chance that the model ranks a randomly selected company that defaults higher than a randomly selected company that does not default. Additionally, there are many more companies to crawl for my database.
Currently, I am focusing on companies that are required to publish their profit and loss statements. For testing purposes, there are currently 2,000 companies available on my website.
At the moment, the website is only available in German, but you can use Google Translate, which works ok for my website.
Thank you very much for your feedback!
Comments URL: https://news.ycombinator.com/item?id=42400588
Points: 13
# Comments: 2
Article URL: https://www.chalkbeat.org/2024/12/04/timss-international-test-result-us-math-scores-decline-post-pandemic/
Comments URL: https://news.ycombinator.com/item?id=42378929
Points: 19
# Comments: 23
Article URL: https://sora.com/
Comments URL: https://news.ycombinator.com/item?id=42368604
Points: 193
# Comments: 133
Article URL: https://ww2supercut.substack.com/p/combining-143-world-war-ii-movies
Comments URL: https://news.ycombinator.com/item?id=42368210
Points: 12
# Comments: 2
Article URL: https://www.androidauthority.com/phone-pc-performance-3504716/
Comments URL: https://news.ycombinator.com/item?id=42358388
Points: 56
# Comments: 50
Article URL: https://shape-of-code.com/2024/12/01/21-algol-60-compilers-in-1962/
Comments URL: https://news.ycombinator.com/item?id=42309692
Points: 37
# Comments: 29
Article URL: https://security.googleblog.com/2024/12/announcing-launch-of-vanir-open-source.html
Comments URL: https://news.ycombinator.com/item?id=42341922
Points: 8
# Comments: 0
Article URL: https://www.bloomberg.com/news/articles/2024-12-05/boeing-plea-deal-over-fatal-737-max-crashes-rejected-by-judge
Comments URL: https://news.ycombinator.com/item?id=42330454
Points: 29
# Comments: 8
Article URL: https://blog.vladovince.com/my-brand-new-digitizing-workflow-using-a-25-year-old-film-scanner/
Comments URL: https://news.ycombinator.com/item?id=42308234
Points: 10
# Comments: 4
Hi, I'm Weston Goodwin. I originally posted about my project on HN back in June 2023 (https://news.ycombinator.com/item?id=36187556) and wanted to share some of the updates I’ve made since then. As a referesher SQL Simulator is a tool that simulates SQL script execution by creating a subsetted database. Below is a list of changes I've made:
1.)It now supports both Docker and Kubernetes. 2.)The database container automatically self-destructs after 15 minutes of inactivity to improve security. 3.)A Data Governor limits the amount of sensitive data that can be retrieved in a day. 4.)The K8s version can be used as a database proxy. Simply remove direct access to the database and force users to go through the K8s cluster/Data governor to view any data.
The tool is available without requiring signup or credit card. I’d appreciate any feedback or suggestions. Thanks for this post reading.
Docker Documentation: https://ssdocker.tribalknowledge.tech/
K8s Documentation: https://sql-simulator.tribalknowledge.tech/
Comments URL: https://news.ycombinator.com/item?id=42295177
Points: 3
# Comments: 0
Article URL: https://theconversation.com/an-83-year-old-short-story-by-borges-portends-a-bleak-future-for-the-internet-242998
Comments URL: https://news.ycombinator.com/item?id=42284563
Points: 66
# Comments: 48
We’ve just open-sourced Vicinity, a lightweight approximate nearest neighbors (ANN) search package that allows for fast experimentation and comparison of a larger number of well known algorithms.
Main features:
- Lightweight: the base package only uses Numpy
- Unified interface: use any of the supported algorithms and backends with a single interface: HNSW, Annoy, FAISS, and many more algorithms and libraries are supported
- Easy evaluation: evaluate the performance of your backend with a simple function to measure queries per second vs recall
- Serialization: save and load your index for persistence
After working with a large number of ANN libraries over the years, we found it increasingly cumbersome to learn the interface, features, quirks, and limitations of every library. After writing custom evaluation code to measure the speed and performance for the 100th time to compare libraries, we decided to build this as a way to easily use a large number of algorithms and libraries with a unified, simple interface that allows for quick comparison and evaluation.
We are curious to hear your feedback! Are there any algorithms that are missing that you use? Any extra evaluation metrics that are useful?
Comments URL: https://news.ycombinator.com/item?id=42289109
Points: 9
# Comments: 0