• 20 Posts
• 24 Comments
Joined 2 years ago
Cake day: November 8th, 2023


  • LWD@lemm.ee to Privacy@lemmy.dbzer0.com · Chat app · 3 points · 6 days ago

    any server that (openly or secretly) keeps chat history can ignore requests to delete it

    Twice as true for any client!

    The best thing a server can do is simply be a temporary relay before messages get to those clients. And the messages themselves should be indecipherable to the server. (I’m probably preaching to the choir here, but for those who don’t know, that’s how apps like Signal work.)
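
    For illustration, here’s a minimal sketch of that relay idea using PyNaCl (my choice of library; it shows the principle only, not Signal’s actual protocol, which layers on ratcheting and much more):

    ```python
    # Minimal end-to-end encryption sketch (pip install pynacl).
    from nacl.public import PrivateKey, Box

    alice_key = PrivateKey.generate()
    bob_key = PrivateKey.generate()

    # Alice encrypts for Bob with her secret key and his public key.
    ciphertext = Box(alice_key, bob_key.public_key).encrypt(b"meet at noon")

    # The relay server stores and forwards bytes it has no key to open.
    relayed = ciphertext

    # Only Bob, holding his secret key, can decrypt what Alice sent.
    assert Box(bob_key, alice_key.public_key).decrypt(relayed) == b"meet at noon"
    ```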


  • Same thing here. If I don’t see a source code repository, it’s an obvious no-go. Especially because I’m pretty sure there are other, similar projects that already exist in the wider closed-source space.

    Accountability is an important thing for these kinds of apps. Unfortunately, if you don’t have a reputation online, that’s about as bad as having a bad reputation. No offense intended; it’s just inevitable.


  • From my own fractured understanding, this is indeed true, but the “DeepSeek” everybody is excited about, which performs as well as OpenAI’s best products but faster, is a prebuilt flagship model called R1. (Benchmarks here.)

    The training data will never see the light of day. It would be an archive of every ebook under the sun, every scraped website, just copyright infringement as far as the eye can see. That would be the source they would have to release to be open source, and I doubt they would.

    But DeepSeek does have the code for “distilling” other companies’ more complex models into something smaller and faster (and a bit worse). Of course, the input models are themselves not open source, because those models (like Facebook’s restrictively licensed Llama) were also trained on stolen data. (I’ve downloaded a couple of these distillations just to mess around with them. It feels like having a dumber, slower ChatGPT in a terminal.)
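
    For the curious, the textbook logit-matching version of distillation fits in a few lines of PyTorch. This is a generic sketch of the technique, not DeepSeek’s actual pipeline:

    ```python
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both output distributions, then push the student's
        # predictions toward the teacher's using KL divergence.
        soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
        student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
        # The T^2 factor is the standard scaling from the original recipe.
        return F.kl_div(student_log_probs, soft_targets,
                        reduction="batchmean") * temperature**2

    # Toy stand-ins for real model outputs over a 32k-token vocabulary.
    teacher_logits = torch.randn(4, 32000)                      # big "teacher"
    student_logits = torch.randn(4, 32000, requires_grad=True)  # small "student"
    distillation_loss(student_logits, teacher_logits).backward()
    ```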

    Theoretically, you could train a model using DeepSeek’s open source code and ethically sourced input data, but that would be quite the task. Most people just add an extra layer of training data and call it a day. Here’s one such example (I hate it.) I can’t even imagine how much data you would have to create yourself in order to train one of these things from scratch. George RR Martin himself probably couldn’t train an AI to speak in a comprehensible manner by feeding it his life’s work.


  • You (and Ed, who I very much respect) are correct: DeepSeek software is open source*. But from the jump, their app and official server instance were plagued with security holes - most likely accidental ones, since they were harmful to DeepSeek itself! - and naturally their app sends data to China because China is where the app is from.

    I do find it pretty funny that they were also sending data to servers in the US though. This isn’t a China issue, it’s a privacy/design issue, and even after they resolve the security holes they still receive your data. Same as OpenAI, same as every other AI company.

    * DeepSeek releases genuinely open source code for everything except its models, which already exceeds industry standards. The models can be downloaded and used without restriction, and this is considered “Open” according to the OSI, but most other people would say it’s not. I don’t think it’s open either. But again, they have gone above and beyond industry standards, and that is why "Open"AI is angry at them.


  • Anonym Private Audiences is currently in closed beta, supporting early-use cases where privacy matters most.

    Wow, that really is private! So private we can’t even see what it’s up to.

    Differential “privacy,” based on what I’ve learned, seems to be a joke. The only thing it does effectively is hide the fact you’ve disabled it, if you choose to disable it. But if other people disable it, it becomes easier to identify you. The best move is to not participate, which should encourage other people to also not participate…

    And if you’re one of the unlucky few people still using it, its developers basically need to choose where on a sliding scale from “anonymous” to “useful” they want to start collecting your data. And they have every incentive to push towards “useful” and away from “anonymous.”
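
    To make that sliding scale concrete, here’s a toy sketch of randomized response, the textbook local differential privacy mechanism (Anonym’s actual system is, as noted, not something we can inspect). One parameter literally slides between “anonymous” and “useful”:

    ```python
    import random

    def randomized_response(truth: bool, p_honest: float = 0.75) -> bool:
        # With probability p_honest report the real answer; otherwise
        # answer with a fair coin flip, so any single report is deniable.
        if random.random() < p_honest:
            return truth
        return random.random() < 0.5

    # Demo: 30% of 100,000 simulated users have some sensitive attribute.
    p = 0.75
    reports = [randomized_response(random.random() < 0.3, p) for _ in range(100_000)]
    observed = sum(reports) / len(reports)
    # The collector inverts the noise to estimate the population rate,
    # never any individual's answer. Raising p_honest sharpens the estimate
    # ("useful"); lowering it makes each report noisier ("anonymous").
    estimated = (observed - (1 - p) * 0.5) / p
    print(f"estimated rate ~ {estimated:.3f}")  # close to 0.30
    ```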

    It operates separately, and is not integrated with, our flagship Firefox browser.

    Doubt…


  • Like I’ve said several times before, you’ve scraped the dregs of anti-China “news” online and probably passed over a ton of stuff that would have actually been interesting here.

    This opinion piece, in particular, is extra jingoistic and practically assumes the USA deserves control not just of computing technology worldwide, but of time itself.

    Reuters reporting confirms that High‑Flyer pivoted from equity markets to artificial intelligence research in 2023, building two super‑computing clusters stuffed with Nvidia A100 processors before US export controls came into force.

    On Capitol Hill, the discovery set alarm bells ringing. Washington had barred Beijing from buying the world’s most coveted AI chips, yet here was a Chinese firm running a model of near‑GPT‑4 heft on hardware Washington thought safely out of reach.

    So the US got upset at a Chinese hedge fund that managed to purchase the hardware legally, before the export controls took effect, and then built a product that doesn’t need any Nvidia processors to run anyway.

    Boo-fucking-hoo. A Chinese capitalist company did capitalism better than the United States. It did more open AI than OpenAI.

    Nvidia insists it obeys US law, but lawmakers are now drafting “chip end‑user tracing” legislation to brand each accelerator with an immutable provenance tag.

    And these additional regulations, tagging and tracing who ends up with each chip, are just a net negative for privacy.

    The House Select Committee… accuses the firm of “spying, stealing and subverting” by siphoning petabytes of conversational data… Through a technique called model inversion, adversaries can reconstruct fragments of that training data. In practice, that means Beijing could fish out a US senator’s embargoed speech or an Indian bureaucrat’s budget note and feed the text into targeted influence campaigns long before it ever reaches the public domain.

    In other words, literally everything OpenAI did with the “public” web. But the author doesn’t seem to care about the unethical funneling of data, just the Chineseness of where it ends up.

    Hopefully I don’t need to explain how goofy these examples are, either.