That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the capacity to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.

> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just spotted that the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:
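The exact command isn't reproduced above; a sketch of what it likely looked like, assuming Unsloth's Q8_0 quantized GGUF hosted on Hugging Face (the quantization tag is an assumption, others are available):

```shell
# Fetch and run the distilled 8B model directly from Hugging Face via Ollama
# (assumes the Q8_0 quantization tag; pick a smaller quant for less RAM)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```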

This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

I installed the plugin and ran it like this:
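The original commands aren't reproduced here; a plausible sketch, assuming the llm-ollama plugin exposes the model under the same name as its Ollama tag:

```shell
# Install the plugin that lets LLM talk to local Ollama models
llm install llm-ollama

# Run a prompt against the fetched model
# (model name assumed to match the Ollama tag above)
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```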

Each response starts with a `<think>` ... `</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is terrible. But the process of getting there was such an interesting insight into how these new models work.

This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B Instruct (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
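The command itself is missing above; a sketch of what it likely looked like, assuming an Unsloth GGUF of the 70B distill. The quantization tag here is purely an assumption (a ~34GB download suggests a roughly 3-bit quant of a 70B model):

```shell
# Run the 70B distill via Ollama; the Q3_K_M tag is an assumption -
# check the repository for the quantization that matches a ~34GB download
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```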

Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:

It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

Other ways to try DeepSeek-R1

If you want to try the model out without installing anything you can do so using chat.deepseek.com - you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that through LLM by dropping this into your extra-openai-models.yaml configuration file:
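The original YAML isn't reproduced above; a sketch of the entry, assuming LLM's documented extra-openai-models.yaml format (`model_id`, `api_base`, `api_key_name`):

```yaml
# Hypothetical entry - field values assume DeepSeek's OpenAI-compatible API
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```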

Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

This won't show you the reasoning tokens, sadly. Those are provided by the API (example here) but LLM doesn't yet have a way to display them.