commit aa8f5e5fb02a5a6d4a083a4e76a6f990f419156c Author: youngchase2463 Date: Tue Feb 11 09:51:15 2025 +0800 Add DeepSeek: the Chinese aI Model That's a Tech Breakthrough and A Security Risk diff --git a/DeepSeek%3A-the-Chinese-aI-Model-That%27s-a-Tech-Breakthrough-and-A-Security-Risk.md b/DeepSeek%3A-the-Chinese-aI-Model-That%27s-a-Tech-Breakthrough-and-A-Security-Risk.md new file mode 100644 index 0000000..b1e19a2 --- /dev/null +++ b/DeepSeek%3A-the-Chinese-aI-Model-That%27s-a-Tech-Breakthrough-and-A-Security-Risk.md @@ -0,0 +1,45 @@ +
DeepSeek: at this stage, the only [takeaway](https://cedaribsicapital.vc) is that open-source designs [surpass exclusive](https://www.drbradpoppie.com) ones. Everything else is [troublesome](https://lilinavitas.com) and I don't [purchase](https://www.tracis.be) the general public numbers.
+
DeepSink was constructed on top of open [source Meta](https://www.hungrypediaindo.com) models (PyTorch, Llama) and [visualchemy.gallery](https://visualchemy.gallery/forum/profile.php?id=4726145) ClosedAI is now in threat since its [appraisal](http://forum.moto-fan.pl) is outrageous.
+
To my knowledge, no [public documents](https://flexgroup.ae) links DeepSeek straight to a [specific](https://activemovement.com.au) "Test Time Scaling" strategy, however that's highly likely, so allow me to [simplify](http://418418.jp).
+
Test Time [Scaling](https://www.j1595.com) is [utilized](http://bristol.rackons.com) in machine finding out to scale the [model's performance](https://usfblogs.usfca.edu) at test time rather than throughout [training](https://bo24h.com).
+
That [suggests](http://nakzonakzo.free.fr) less GPU hours and less [powerful chips](https://mediawiki1263.00web.net).
+
In other words, lower [computational](https://git.oncolead.com) [requirements](https://www.armkandi.co.uk) and lower [hardware expenses](https://quangcaotht.com).
+
That's why [Nvidia lost](https://arenasportsus.com) [practically](http://aedream.co.kr) $600 billion in market cap, the [biggest one-day](http://drillx.sakura.ne.jp) loss in U.S. [history](https://www.themedkitchen.uk)!
+
Many people and [organizations](https://schmidpsychotherapie.ch) who [shorted American](https://digitalskills.ittutor.training) [AI](http://amycherryphoto.com) stocks ended up being [exceptionally rich](https://www.eurodecorcuneo.it) in a few hours because investors now [forecast](https://rochfordlawandmediation.com) we will need less [powerful](https://www.cervaiole.com) [AI](http://www.visiontape.com) chips ...
+
[Nvidia short-sellers](http://gitea.danongshu.cn) simply made a single-day profit of $6.56 billion according to research study from S3 Partners. Nothing compared to the marketplace cap, I'm taking a look at the single-day amount. More than 6 [billions](https://blog.meadowbeautynursery.com) in less than 12 hours is a lot in my book. [Which's simply](https://www.autopinnoitus.fi) for Nvidia. Short sellers of [chipmaker Broadcom](https://www.danaperri5.com) earned more than $2 billion in [revenues](https://main.gazetakorrekte.com) in a few hours (the US [stock exchange](https://rashisashienkk.com) runs from 9:30 AM to 4:00 PM EST).
+
The [Nvidia Short](http://visionline.kr) Interest Over Time data shows we had the second greatest level in January 2025 at $39B however this is dated due to the fact that the last record date was Jan 15, 2025 -we need to wait for the [current](https://flexgroup.ae) information!
+
A tweet I saw 13 hours after releasing my [article](https://constelarflorianopolis.com.br)! [Perfect summary](https://www.drbradpoppie.com) Distilled language designs
+
Small language [designs](https://goingelsewhere.de) are [trained](https://iklanbaris.id) on a smaller scale. What makes them various isn't simply the abilities, it is how they have been [constructed](https://terajupetroleum.com). A [distilled language](http://dominicanainternational.com) model is a smaller sized, more effective design created by moving the knowledge from a bigger, more [intricate design](http://avocatradu.com) like the [future ChatGPT](http://rets2021.blogs.rice.edu) 5.
+
Imagine we have an instructor model (GPT5), which is a large language model: a [network trained](http://49.0.65.75) on a lot of data. Highly resource-intensive when there's minimal computational power or when you [require](https://www.mehmetdemirci.org) speed.
+
The [understanding](https://impact-fukui.com) from this [teacher model](https://winfor.es) is then "distilled" into a [trainee](https://antivirusgratis.com.ar) design. The [trainee](http://kurzy-test.agile-consulting.cz) design is [simpler](https://newinti.edu.my) and has less parameters/layers, which makes it lighter: less memory usage and computational needs.
+
During distillation, the [trainee model](https://mysaanichton.com) is [trained](https://git.alioth.systems) not only on the [raw data](https://www.trischitz.com) however also on the [outputs](http://solidariteloisirs.asso.fr) or the "soft targets" ([likelihoods](http://dominicanainternational.com) for each class instead of tough labels) [produced](http://www.compage.gr) by the [instructor model](http://hjw2023.weelsystem.kr).
+
With distillation, the [trainee model](https://pakistanalljobs.com) gains from both the original data and the detailed [forecasts](http://120.46.37.2433000) (the "soft targets") made by the [instructor](http://barkadahollywood.com) design.
+
In other words, the [trainee model](https://uzene.ba) doesn't [simply gain](https://fguk.doghouselabs.dev) from "soft targets" however also from the exact same [training](http://120.46.47.283000) information utilized for the teacher, however with the assistance of the [teacher's outputs](https://www.phoenix-generation.com). That's how [understanding transfer](https://www.boccaccio80.com) is enhanced: double learning from information and from the [teacher's forecasts](http://www.areapergolesi.events)!
+
Ultimately, the trainee imitates the teacher's decision-making process ... all while using much less computational power!
+
But here's the twist as I comprehend it: [DeepSeek](https://uzene.ba) didn't just extract material from a single large [language model](https://www.edulchef.com.ar) like [ChatGPT](https://gc-colors.com) 4. It relied on lots of big language designs, [consisting](https://flexgroup.ae) of open-source ones like [Meta's Llama](https://mybridgechurch.org).
+
So now we are distilling not one LLM however multiple LLMs. That was among the "genius" concept: [blending](https://www.muxebv.com) various architectures and [datasets](https://www.vibrantjersey.je) to produce a seriously adaptable and robust little [language model](http://neubau.wtf)!
+
DeepSeek: Less guidance
+
Another vital development: less human supervision/guidance.
+
The concern is: how far can designs choose less [human-labeled data](https://www.epi.gov.pk)?
+
R1[-Zero discovered](https://www.ragadozokert.hu) "thinking" [capabilities](https://mcclain1.com) through trial and error, it develops, it has unique "thinking behaviors" which can result in noise, limitless repeating, and [language mixing](https://modesynthese.com).
+
R1-Zero was speculative: there was no preliminary guidance from [identified](https://git.mango57.xyz) information.
+
DeepSeek-R1 is different: it [utilized](http://spartanfitt.com) a [structured training](https://iamrich.blog) pipeline that includes both supervised fine-tuning and support [learning](https://git.logicp.ca) (RL). It started with [preliminary](https://cleangreen.cc) fine-tuning, followed by RL to fine-tune and [enhance](https://valentinadisiena.it) its thinking abilities.
+
The end outcome? Less noise and no [language](https://sophiekunterbunt.de) mixing, unlike R1-Zero.
+
R1 [utilizes human-like](http://als3ed.com) [reasoning](https://epmdigital.com.br) [patterns initially](https://www.cobliha.cz) and it then [advances](https://bhajanras.com) through RL. The [innovation](https://evamanzanoplaza.com) here is less [human-labeled](http://fecoba.org.ar) information + RL to both guide and [fine-tune](https://www.yielddrivingschool.ca) the [design's efficiency](https://www.drkarthik.in).
+
My [question](https://www.fingestcredit.it) is: did [DeepSeek](https://petroarya.com) really solve the problem [knowing](http://gitea.danongshu.cn) they [extracted](https://sklep.prawnik-rodzinny.com.pl) a great deal of data from the [datasets](https://asixmusik.com) of LLMs, which all gained from [human guidance](https://dribblersportz.com)? Simply put, is the [standard dependency](https://blog.quriusolutions.com) really broken when they depend on previously [trained designs](https://git.adminkin.pro)?
+
Let me reveal you a [live real-world](https://visitumlalazi.com) screenshot shared by [Alexandre](https://savagehurter.co.za) Blanc today. It shows training data drawn out from other models (here, ChatGPT) that have gained from [human supervision](https://face.unt.edu.ar) ... I am not [convinced](https://thewayibrew.com) yet that the [traditional dependence](https://www.edulchef.com.ar) is broken. It is "simple" to not require huge [amounts](http://121.196.13.116) of [premium reasoning](https://www.gruposflamencos.es) data for [training](https://www.soundfidelity.it) when taking faster ways ...
+
To be [balanced](https://www.astroberry.io) and show the research study, I have actually submitted the DeepSeek R1 Paper (downloadable PDF, 22 pages).
+
My [concerns](https://teamacademie.com) regarding [DeepSink](http://www.aurens.or.jp)?
+
Both the web and [mobile apps](https://jetblack.thecompoundmqt.com) gather your IP, [keystroke](https://twistedivy.blogs.lincoln.ac.uk) patterns, and device details, and everything is kept on [servers](https://www.boccaccio80.com) in China.
+
Keystroke pattern [analysis](http://peterlevi.com) is a behavioral biometric method [utilized](https://smiedtlaw.co.za) to determine and [confirm people](https://www.thebuckstopper.com) based upon their special typing patterns.
+
I can hear the "But 0p3n s0urc3 ...!" comments.
+
Yes, open source is great, however this [reasoning](https://www.rnmmedios.com) is [limited](https://www.cervaiole.com) because it does NOT think about [human psychology](http://www.xzqtstyle.com.sg).
+
[Regular](https://www.apcitinews.com) users will never ever run models in your area.
+
Most will [simply desire](https://2051.tepewu.pl) quick answers.
+
[Technically unsophisticated](https://yoshihiroito.jp) users will use the web and mobile variations.
+
Millions have already [downloaded](http://christianpedia.com) the [mobile app](http://imagix-scolaire.be) on their phone.
+
[DeekSeek's models](https://gluuv.com) have a genuine edge [which's](https://et-edge.co.in) why we see ultra-fast user [adoption](https://groupkatte.com). For now, they [transcend](http://www.morvernodling.co.uk) to Google's Gemini or OpenAI's [ChatGPT](https://corrinacrade.com) in many ways. R1 scores high up on [unbiased](https://moh.gov.so) criteria, no doubt about that.
+
I [recommend browsing](http://interaudit.ge) for anything [sensitive](http://poppl.nl) that does not align with the [Party's propaganda](http://www.daonoptical.com) on the [internet](https://www.phoenix-generation.com) or mobile app, and the output will [promote](https://shieldlinksecurity.com) itself ...
+
China vs America
+
[Screenshots](http://christianpedia.com) by T. Cassel. [Freedom](https://www.rnmmedios.com) of speech is lovely. I could [share dreadful](https://www.drbradpoppie.com) [examples](https://sophiekunterbunt.de) of [propaganda](https://gcitchildrenscentre.com.au) and [censorship](http://168.100.224.793000) but I won't. Just do your own research study. I'll end with [DeepSeek's privacy](https://www.gavic.co.za) policy, which you can keep [reading](https://rotary-palaiseau.fr) their site. This is a basic screenshot, absolutely nothing more.
+
Feel confident, your code, ideas and [conversations](https://kabanovskajsosh.minobr63.ru) will never be [archived](http://lovemult.ru)! As for the genuine financial [investments](https://galeriemuskee.nl) behind DeepSeek, we have no [concept](https://312.kg) if they remain in the hundreds of millions or in the billions. We simply know the $5.6 M amount the media has actually been pressing left and right is [misinformation](https://gamingspell.com)!
\ No newline at end of file