Simon Willison's Weblog

That model was trained in part using their unreleased R1 "reasoning" model. Today they have released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a remarkable flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

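That step can be sketched as a single Ollama command. The `hf.co/unsloth/...` identifier comes from the Unsloth repository named above; the `:Q8_0` quantization tag is an assumption, since the exact tag isn't stated here:

```shell
# Pull the Unsloth GGUF build straight from Hugging Face and start a chat.
# The :Q8_0 tag is an assumed quantization level; substitute the tag you want.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
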
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

I installed the plugin and ran it like this:

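A minimal sketch of that install-and-prompt step, assuming the model is addressed through LLM by the same Hugging Face identifier that Ollama fetched (the exact model ID is an assumption):

```shell
# Install the Ollama plugin for LLM, then send the model a prompt through it.
llm install llm-ollama
# Assumed model ID: the identifier Ollama registered for the GGUF download.
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
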
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place plates on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these models work.

This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:

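A sketch of that invocation, by analogy with the 8B command. The repository name and quantization tag are assumptions; Q3_K_M is a plausible level for a roughly 34GB 70B download, but check which tags Unsloth actually published:

```shell
# Assumed repository and tag: a ~34GB quantization of the 70B distill.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
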
Can it draw a pelican?

I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well:

It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

Other ways to try DeepSeek-R1

If you want to try the model out without installing anything you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then pick the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:

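A hedged sketch of what that entry could look like, using the `deepseek-reasoner` model ID and the `deepseek` key name that the commands below rely on, and assuming DeepSeek's `https://api.deepseek.com` endpoint; check LLM's extra-openai-models documentation for the exact field names:

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
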
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

This won't show you the reasoning tokens, sadly. Those are provided by the API (example here) but LLM doesn't yet have a way to display them.