2025-01-25 06:31:31 UTC

John Carlos Baez on Nostr

Wikipedia will eventually be a good jumping-off point for more news. Some quotes:

"DeepSeek (Chinese: 深度求索; pinyin: Shēndù Qiúsuǒ) is a Chinese artificial intelligence company that develops open-source large language models. The company is funded solely by Chinese hedge fund High-Flyer. Both DeepSeek and High-Flyer are based in Hangzhou, Zhejiang."

"In December 2024, DeepSeek-V3 was launched. It came with 671 billion parameters and trained in around 55 days at a cost of US$5.58 million, using significantly less resources compared to its peers. It was trained on a dataset of 14.8 trillion tokens. Benchmark tests showed it outperformed Llama 3.1 and Qwen 2.5 while matching GPT-4o and Claude 3.5 Sonnet. DeepSeek's optimization on limited resources highlighted potential limits of US sanctions on China's AI development. An opinion piece by The Hill described the release as American AI reaching its Sputnik moment."

"On January 20, 2025, the DeepSeek-R1 and DeepSeek-R1-Zero were released. They were based on V3-Base. Like V3, each is a MoE with 671B total parameters and 37B activated parameters. They also released some 'DeepSeek-R1-Distill' models, which are not based on R1. Instead, they are similar to other open-weight models like LLaMA and Qwen, fine-tuned on synthetic data generated by R1."
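The "671B total, 37B activated" figure comes from the mixture-of-experts (MoE) design. A router picks a few experts for each token, so only a fraction of the weights do any work on that token. Here that fraction is 37/671, or roughly 5.5%. Below is a toy Python sketch of top-k routing. It assumes generic single-matrix experts, a linear router, and a softmax over the selected experts' scores. The class name `ToyMoE`, the sizes, and the routing details are all hypothetical and are not DeepSeek's actual architecture.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random rows x cols weight matrix as a list of lists."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ToyMoE:
    """Hypothetical toy mixture-of-experts layer with top-k routing."""

    def __init__(self, dim, n_experts, k, seed=0):
        rng = random.Random(seed)
        self.dim, self.k = dim, k
        # Each (hypothetical) expert is a single dim x dim linear map.
        self.experts = [make_matrix(dim, dim, rng) for _ in range(n_experts)]
        # Router: produces one score per expert from the input token.
        self.router = make_matrix(n_experts, dim, rng)

    def total_params(self):
        # Every expert plus the router.
        return len(self.experts) * self.dim * self.dim + len(self.router) * self.dim

    def active_params(self):
        # Per token: the full router plus only the k selected experts.
        return self.k * self.dim * self.dim + len(self.router) * self.dim

    def forward(self, x):
        scores = matvec(self.router, x)
        # Keep the indices of the k highest-scoring experts.
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.k]
        # Softmax over the selected scores gives the mixing weights.
        m = max(scores[i] for i in top)
        exps = [math.exp(scores[i] - m) for i in top]
        z = sum(exps)
        out = [0.0] * self.dim
        for e, i in zip(exps, top):
            y = matvec(self.experts[i], x)  # only the chosen experts are evaluated
            out = [o + (e / z) * yi for o, yi in zip(out, y)]
        return out, top


if __name__ == "__main__":
    moe = ToyMoE(dim=16, n_experts=32, k=2)
    out, chosen = moe.forward([1.0] * 16)
    print(len(out), len(chosen))                    # 16 2
    print(moe.total_params(), moe.active_params())  # 8704 1024
    print(round(100 * 37 / 671, 1))                 # 5.5 (percent active in the quote)
```

In this toy, about 12% of the parameters run per token. A real model with many more experts, and a small k relative to the expert count, can push that fraction much lower.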

(3/n)

https://en.wikipedia.org/wiki/DeepSeek