Has anyone created an AI tarpit for images yet?

hihi24522@lemm.ee · 7 days ago

Garage motor special $100 off? Hooray!

Now if only I could afford a garage…

hihi24522@lemm.ee · 7 days ago

Sorry, the point I was trying to make is that we will be able to know whether any testable statement is correct.

I just wanted to clarify that your initial comment is only true when you are counting things that don’t actually matter in science. Anything that actually matters can be tested/proven, which means that science can be 100% correct for anything that’s actually relevant.

hihi24522@lemm.ee · 7 days ago

Gödel’s incompleteness theorem is a logical proof about any consistent, effectively axiomatized system within which basic arithmetic (addition and multiplication) can be expressed.

By nature, every scientific model that uses basic arithmetic relies on those kinds of axioms and is therefore incomplete.

Furthermore, the statement “we live in a simulation” is a logical statement with a truth value. Thus it falls within the realm of first-order logic, which is part of mathematics.

The reason you cannot prove the statement is that it is standalone. The statement tells you nothing about the universe, so you cannot construct any implication from it that can be proven directly, by contradiction, by proving the contrapositive, etc.

As for the latter half of your comment, I don’t think I’m the one who hasn’t thought about this enough.

You are the one repeating the line that “science doesn’t prove things” without realizing that it is a generalization, not an absolute statement. It also depends largely on what you call science.

Many people say that science doesn’t prove things, it disproves things. Technically, both are mathematical proofs. In fact, the scientific method is simply proving an implication wrong.

You form a hypothesis to test, which is actually an implication: “if (assumptions hold true), then (hypothesis holds true).” If your hypothesis is not true, then your assumptions (your model) are not correct.
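That step from “hypothesis failed” to “assumptions are wrong” is modus tollens. Here’s a quick truth-table check in plain Python (the function names are just for illustration) showing the inference is valid, and, for contrast, that the reverse inference isn’t valid on its own, which is why direct proofs need their axioms to hold:

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    # Material implication: "if p then q" is false only when p is true and q is false.
    return (not p) or q

def is_valid(premises, conclusion) -> bool:
    # An argument form is valid if the conclusion holds in every row of the
    # truth table where all of the premises hold.
    for a, h in product([True, False], repeat=2):
        if all(prem(a, h) for prem in premises) and not conclusion(a, h):
            return False
    return True

# A = "assumptions hold", H = "hypothesis holds"
modus_tollens = is_valid(
    premises=[lambda a, h: implies(a, h), lambda a, h: not h],
    conclusion=lambda a, h: not a,
)
affirming_consequent = is_valid(
    premises=[lambda a, h: implies(a, h), lambda a, h: h],
    conclusion=lambda a, h: a,
)
print(modus_tollens, affirming_consequent)  # True False
```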

However, you can prove things directly in science very easily. Say you have a cat in a box and you think it might be dead. You open the box and it isn’t dead. You have now proven that the cat was not dead. You collected evidence, reached a true conclusion, and your limited model of the world with regard to the cat is proven correct. QED.

Say you have two clear crystals in front of you. You know one is quartz and one is calcite, but you don’t remember which. You do have vinegar with you, though, and you remember that it should react only with the calcite. You place a drop of vinegar on each rock and one starts fizzing slightly. Voilà, you have just directly proven that this rock is the calcite.

Now you can only do this kind of proof when your axioms (that one rock is calcite, one rock is quartz, and only the calcite will react with the vinegar) hold true.

The quest of science, of philosophy, is to find axioms that hold true enough we can do these proofs to predict and manipulate the world around us.

Just like in mathematics, there are often multiple different sets of axioms that can explain the same things. It doesn’t matter if you have “the right ones.” You only need ones that are not wrong in your use case and that are useful for whatever you want to prove.

The laws of thermodynamics have not been proven from first principles. They have been proven statistically, but I get the feeling that you wouldn’t count statistics as a valid form of proof.

Fortunately, engineers don’t care what you think, and with those laws as axioms, engineers have proven that there cannot be any perpetual motion machines. Furthermore, Carnot was able to prove that there is a maximum possible efficiency for a heat engine operating between two temperatures, and he was able to derive the processes an engine would need to reach it.
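Carnot’s bound has a simple closed form: an engine running between a hot reservoir at T_h and a cold one at T_c (both absolute temperatures, in kelvin) can’t beat η = 1 − T_c/T_h. A small sketch of that formula (the function name is my own):

```python
def carnot_efficiency(t_hot_k: float, t_cold_k: float) -> float:
    """Maximum efficiency of any heat engine between two reservoirs (kelvin)."""
    if t_cold_k <= 0 or t_hot_k <= t_cold_k:
        raise ValueError("need t_hot_k > t_cold_k > 0 (absolute temperatures)")
    # Derived from the second law: no engine can do better than this ratio.
    return 1.0 - t_cold_k / t_hot_k

# Example: heat taken in at 500 K and rejected to surroundings at 300 K.
print(carnot_efficiency(500.0, 300.0))  # 0.4
```

So no matter how clever the design, that engine turns at most 40% of the heat it takes in into work, and that follows purely from the axioms.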

Most inventions start as proofs based on axioms found by science. And oftentimes, science proves a model wrong by trying to do something, assuming the model was right, and then failing.

The point is that if our scientific axioms weren’t true, we would not be able to build things with them. We would not predict the world accurately. (Notice that this statement is an implication.) When this happens (when that implication is proven false), science finds the assumption/axiom in our model that was proven wrong and replaces it with one or more assumptions that are more correct.

Science is a single massive logical proof by process of elimination.

The only arguments I’ve ever seen that it isn’t real proof are in the same vein as “you can’t prove the world isn’t a simulation.” Yep, it’s impossible to be 100% certain that all of science is correct. However, that doesn’t matter.

It is absolutely possible to know/prove whether science dealing with a limited scope is a valid model, because if it isn’t, you’ll be able to prove it wrong. “Oh, but there could be multiple explanations.” Yep, the same thing happens in mathematics.

You can usually find multiple sets of axioms that prove the same things. Some of them might allow you to prove more than the others. Maybe they even disagree on certain kinds of statements. But if you are dealing with statements in that zone of disagreement, you can prove which set of axioms is wrong, and if you don’t deal with those statements at all, then both are equally valid models.

Science can never prove that only a single model is correct… because it is certain that you can construct multiple models that will be equally correct. The perfect model doesn’t matter because it doesn’t exist. What matters is what models/axioms are true enough that they can be useful, and science is proving what that is and isn’t.

hihi24522@lemm.ee · 8 days ago

This is false. Gödel’s incompleteness theorems only prove that there will be things that are unprovable in that body of models.

Good news: Newton’s flaming laser sword says that if something can’t be proven, it isn’t worth thinking about.

Imagine I said, “We live in a simulation, but it is so perfect that we’ll never be able to find evidence of it.”

Can you prove my statement? No.

In fact, no matter what proof you try to use, I can just claim it is part of the simulation. All models will be incomplete because I can always say you can’t prove me wrong. But because there is never any evidence, the fact that we live in a simulation must never be relevant/required for the explanation of things going on inside our models.

Our models are “incomplete” already, but it doesn’t matter and it won’t, because anything that has an effect can be measured/catalogued and added to a model, and anything that doesn’t have an effect doesn’t matter.

TL;DR: Science as a body of models will never be able to prove/disprove every possible statement/hypothesis, but that does not mean it can’t prove/disprove every hypothesis/statement that actually matters.

hihi24522@lemm.ee · 10 days ago

I work in a lab, so yes, I understand how data science works. However, I think you have too much faith in the people running these scrapers.

I think it’s unlikely that ChatGPT would have had those early scandals of leaking people’s SSNs or other private information if the data was actually “cleared by a human team.” The entire point of these big companies is laziness; I doubt they have someone looking over the thousands of sites’ worth of data they feed to their models.

Maybe they do quality checks on the data, but even in that event, forcing them to toss out a large data set because some of it was poisoned is a loss for the company. And if enough people poison their work or are able to feed poison to the scrapers, it becomes much less profitable to scrape images automatically.

I previously mentioned methods for possibly slipping through automatic filters in the scraper (though maybe I mentioned that in a different comment chain).

As for a scraper acting like a human by use of an LLM, that sounds hella computationally expensive on the scrapers’ side. Few would be willing to put in that much effort, and fewer scrapers make the DDoS-like effect of scraping less likely. It would also take more time, which means the scraper spends less time harassing others.

But these are good suggestions. I suppose a drastic option for fighting a true AI mimicking a human would be to give every link a random chance of sending any user to the tarpit. People would know to click back and try again, but the AI would at best have to render the site, process what it sees, decide it is in the tarpit, and then return. That would further slow down the scraper (or very likely stop/trap it), though it would make the site slightly annoying for regular users.
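A rough sketch of the random-link idea, assuming the server decides per request whether to serve the real page or bounce the visitor into the tarpit. The `/maze/` prefix and the function names are hypothetical. The second function estimates the annoyance for humans: if each click is trapped with probability p, reaching the real page takes 1/(1 − p) clicks on average.

```python
import random

# Hypothetical path where the tarpit is mounted.
TARPIT_PREFIX = "/maze/"

def resolve_link(target: str, p_trap: float, rng: random.Random) -> str:
    """Return the path to actually serve for a click on `target`.

    With probability p_trap the visitor is sent into the tarpit instead.
    """
    if not 0.0 <= p_trap <= 1.0:
        raise ValueError("p_trap must be between 0 and 1")
    if rng.random() < p_trap:
        # Random token so every trapped click lands on a different-looking page.
        return f"{TARPIT_PREFIX}{rng.getrandbits(32):08x}"
    return target

def expected_clicks(p_trap: float) -> float:
    """Average clicks a human needs to reach the real page (geometric distribution)."""
    if not 0.0 <= p_trap < 1.0:
        raise ValueError("p_trap must be in [0, 1)")
    return 1.0 / (1.0 - p_trap)

rng = random.Random(42)
print(resolve_link("/gallery/page2", 0.1, rng))
print(expected_clicks(0.1))  # about 1.11 clicks per link on average
```

With p around 0.1, a human barely notices, but a scraper that has to render and classify every page pays that cost on every single hop.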

In any case, at a certain point, tailoring an AI scraper to avoid a single specific website and navigate its traps would probably take more time and effort than sending a human to aggregate the content instead of an automated scraper.

hihi24522@lemm.ee · 10 days ago

Oh when you said arms race I thought you were referring to all anti-AI countermeasures including Anubis and tarpits.

Were you only saying you think AI poisoning methods like Glaze and Nightshade are futile? Or do you also think AI mazes/tarpits are futile?

Both kind of seem like a more selfless version of protection than something like Anubis.

Instead of protecting your own site from scrapers, a tarpit will trap the scraper, stopping it from causing harm to other people’s services whether they have their own protections in place or not.
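For anyone wondering what the “maze” part of a tarpit looks like, here’s a minimal sketch (not any particular tool’s code): every generated page links only to more generated pages, and links are derived from a hash of the URL so the same address always returns the same page. The `/maze/` prefix is an assumption; it should also be disallowed in robots.txt so crawlers that respect it never wander in. A real tarpit would typically also delay each response to hold the scraper longer.

```python
import hashlib

# Hypothetical mount point; Disallow it in robots.txt so polite crawlers stay out.
MAZE_PREFIX = "/maze/"

def maze_page(path: str, links_per_page: int = 5) -> str:
    """Build an HTML page whose links only ever lead to more maze pages."""
    seed = hashlib.sha256(path.encode("utf-8")).hexdigest()
    links = []
    for i in range(links_per_page):
        # Child names are derived from the parent path, so the same URL always
        # returns the same page and the maze looks like ordinary static content.
        child = hashlib.sha256(f"{seed}:{i}".encode("utf-8")).hexdigest()[:12]
        links.append(f'<a href="{MAZE_PREFIX}{child}">{child}</a>')
    return "<html><body>\n" + "\n".join(links) + "\n</body></html>\n"

print(maze_page("/maze/start"))
```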

In the case of poisoning, you also protect others by making it riskier for AI to train on automatically scraped data, which would disincentivize the use of automated scrapers on the net in the first place.

hihi24522@lemm.ee · 10 days ago

With aggressive scrapers, the “change” is having sites slowed or taken offline, basically DDoSed by scrapers ignoring robots.txt.

What is your proposed adaptation that’s better than countermeasures? Be rich enough to afford more powerful hardware? Simply stop hosting anything on the internet?

hihi24522@lemm.ee · 10 days ago

Isn’t that what the arms race is? Adapting to new situations?

hihi24522@lemm.ee · 10 days ago

I guess diversity of tactics is probably a good way to stop scrapers from avoiding the traps we set. Good on you for helping out. Also, I like the name lol

On a slightly unrelated note, is Rust a web dev language? I’ve been meaning to learn it myself since I’ve heard it’s basically a better, modern alternative to C++.

hihi24522@lemm.ee · 10 days ago

Oh wow, I’m dense. I didn’t even think about the fact that scrapers probably don’t render the full webpage and instead just seek out images in the HTML lol

This seems like a much easier trap to set up than creating a tarpit and then serving bullshit images.

Would it negatively impact loading times for regular users? Like, would the webpage take significantly longer to load if you added hundreds of these hidden images?
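Part of the answer depends on markup. A sketch, assuming the bait images live under a hypothetical `/bait/` path: the `hidden` attribute keeps each image out of the layout, and `loading="lazy"` asks the browser to defer the download until the image nears the viewport, which a hidden image never does. Without `lazy`, some browsers may still fetch images that are only hidden with CSS. An HTML-only scraper still sees every `src`. For 300 tags this comes to roughly 16 KB of extra HTML.

```python
import html

def hidden_image_tags(urls: list[str]) -> str:
    """Emit <img> tags that HTML-only scrapers will see but browsers shouldn't render."""
    tags = []
    for url in urls:
        # `hidden` removes the image from the layout; loading="lazy" defers the
        # fetch until the image approaches the viewport, which never happens here.
        tags.append(f'<img src="{html.escape(url, quote=True)}" alt="" hidden loading="lazy">')
    return "\n".join(tags)

markup = hidden_image_tags([f"/bait/{i}.png" for i in range(300)])
print(len(markup.encode("utf-8")), "bytes of extra HTML")
```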

hihi24522@lemm.ee · 11 days ago

That’s the intent behind Nightshade right?

Would overlaying the image with a different, slightly transparent image be enough to shift the weights? Or is there a specific method of pseudorandom hostile noise generation that you’d suggest?

I’d imagine the former is likely more computationally efficient, but if the latter is more effective at poisoning and your goal is to maximize damage regardless of cost, then that would be the better option.
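The cheap “overlay” option is just alpha compositing. Here’s a sketch on plain lists of RGB tuples so it runs without an imaging library (the names are mine). This is not how Nightshade works; Nightshade optimizes its perturbation against a model, and whether a simple blend shifts a model’s weights at all is exactly the open question here.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]

def blend(base: Image, overlay: Image, alpha: float) -> Image:
    """Composite `overlay` over `base`: out = (1 - alpha) * base + alpha * overlay."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(base) != len(overlay) or any(len(a) != len(b) for a, b in zip(base, overlay)):
        raise ValueError("images must have the same dimensions")
    return [
        [
            tuple(round((1 - alpha) * cb + alpha * co) for cb, co in zip(pb, po))
            for pb, po in zip(row_b, row_o)
        ]
        for row_b, row_o in zip(base, overlay)
    ]

print(blend([[(200, 0, 0)]], [[(0, 0, 200)]], 0.25))  # [[(150, 0, 50)]]
```

Cost is one multiply-add per channel per pixel, which is why it’s so much cheaper than any noise that has to be optimized against a model.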

hihi24522@lemm.ee · 11 days ago

Nice straw man infographic, but I’m not sure how it’s relevant.

My post was about methods to poison art-scraping models. I said nothing about my reasons for doing so. Maybe I just like fucking up corpos. Maybe I just like thinking about interesting topics and hearing other people’s ideas.

Kind of sad that you’re worked up enough about this to both miss the point, and to have an infographic on hand just in case you get offended by anyone not praising generative AI.

If you do have any knowledge of how AI functions, I’d be happy to hear your thoughts on the topic which, again, is on how to poison models that use image scrapers, not the ethics of AI or lack thereof.

hihi24522@lemm.ee · 11 days ago

That thread was hard to read. I do sometimes feel bad for people who don’t understand artists because they don’t realize they have the capability to be artists themselves.

I do realize that any poisoning of models won’t stop backups of the pre-decay models from being utilized, but if we make the web unscrapable, it will slow or even prevent art from being stolen in the future.

I highly doubt the big AI companies get people to screen the scraped images (at least not all of them) because the whole point in their mind is to remove the need to pay people for work lol

hihi24522@lemm.ee · 11 days ago

This thought did cross my mind, but I bet the quality filters check for the relevancy of words. And if they don’t already, it wouldn’t take long for them to implement a simple fix.

Generating an AI image based on the text you randomly generate would satisfy this and still cause model decay, but in both cases, generating AI images is pretty costly, which means it’s not a very viable attack option for most people.
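For comparison, the randomly generated text in tarpits is usually Markov-chain babble, which is dirt cheap. A minimal word-level sketch (function names are hypothetical): each word maps to the words that followed it in some source text, and generation is a random walk over that map.

```python
import random
from collections import defaultdict

def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to every word that follows it somewhere in the corpus."""
    words = text.split()
    chain: defaultdict[str, list[str]] = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return dict(chain)

def babble(chain: dict[str, list[str]], start: str, length: int, rng: random.Random) -> str:
    """Random-walk the chain to produce `length` words of plausible-looking filler."""
    if not chain:
        raise ValueError("chain is empty")
    word, out = start, [start]
    for _ in range(length - 1):
        followers = chain.get(word)
        # Dead end (word never followed by anything): jump to a random known word.
        word = rng.choice(followers) if followers else rng.choice(sorted(chain))
        out.append(word)
    return " ".join(out)

chain = build_chain("the cat sat on the mat and the dog sat on the cat")
print(babble(chain, "the", 12, random.Random(7)))
```

Every word comes from real text, so each pair of neighbors looks locally plausible, but nothing checks whether the whole thing makes sense, which is exactly what a word-relevancy filter could catch.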

hihi24522@lemm.ee · 11 days ago

I’d heard of Glaze before, and Nightshade seems useful, but only Glaze protects against mimicry, and the Nightshade page makes it seem like the researchers aren’t sure how well the two would work together.

It looks like Nightshade is doing what I described (though on a per-image basis): trying to trick the AI into believing the characteristics of one thing apply to another. But I’d imagine the poisoning could be much more potent if the constraint of “still looks the same to a human” were dropped.

If you know you’re feeding an AI, you can go all out on the poisoning. No one cares what it looks like as long as the AI thinks it’s valid.

As for the difficulty of generating meaningful images, it would certainly be more intensive than Markov chain text generation, but I think it might not be that hard if you just modify the real art from the site.

Say you slapped a ton of Snapchat filters on an artwork, used a blur tool in random places, drew random line segments that are roughly the same color as their nearby pixels, or shifted the hue and saturation. I bet small modifications like that could slip through quality filters but still damage the model.

Edit: Just realized this might sound like I’m suggesting that messing up the art shown on the site through more destructive means would be better than Glaze or Nightshade. That’s not what I meant.

Those edit suggestions were only for the art shown in the tarpit, so you’d only make those destructive modifications to the art you’re showing the AI scrapers. The source images shown to human patrons can remain unedited.
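Two of those tarpit-only edits (a hue shift and random line segments colored like their surroundings) sketched in plain Python with the standard `colorsys` module. Images are lists of RGB tuples so it runs without an imaging library; all names and default parameters are my own guesses, and I have no evidence this slips past any real quality filter.

```python
import colorsys
import random

Pixel = tuple[int, int, int]
Image = list[list[Pixel]]

def shift_hue(img: Image, degrees: float) -> Image:
    """Rotate every pixel's hue by `degrees`, keeping saturation and brightness."""
    out = []
    for row in img:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            h = (h + degrees / 360.0) % 1.0
            nr, ng, nb = colorsys.hsv_to_rgb(h, s, v)
            new_row.append((round(nr * 255), round(ng * 255), round(nb * 255)))
        out.append(new_row)
    return out

def draw_segment(img: Image, rng: random.Random, max_len: int = 10, jitter: int = 8) -> Image:
    """Draw one short random line whose color is sampled from where it starts."""
    h, w = len(img), len(img[0])
    out = [list(row) for row in img]  # copy so the original stays untouched
    y0, x0 = rng.randrange(h), rng.randrange(w)
    y1 = min(h - 1, max(0, y0 + rng.randint(-max_len, max_len)))
    x1 = min(w - 1, max(0, x0 + rng.randint(-max_len, max_len)))
    base = img[y0][x0]
    steps = max(abs(y1 - y0), abs(x1 - x0), 1)
    for i in range(steps + 1):
        y = y0 + round((y1 - y0) * i / steps)
        x = x0 + round((x1 - x0) * i / steps)
        # Jitter each channel a little so the stroke blends with nearby pixels.
        out[y][x] = tuple(min(255, max(0, c + rng.randint(-jitter, jitter))) for c in base)
    return out

def tarpit_copy(img: Image, rng: random.Random, strokes: int = 20) -> Image:
    """Hypothetical pipeline: a random hue shift plus a handful of color-matched strokes."""
    out = shift_hue(img, rng.uniform(-30, 30))
    for _ in range(strokes):
        out = draw_segment(out, rng)
    return out
```

The originals served to humans never go through `tarpit_copy`; only the copies behind the tarpit do.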

hihi24522@lemm.ee · 11 days ago

Has anyone created an AI tarpit for images yet?

hihi24522@lemm.ee · 11 days ago

Thanks, idk if OP needed this but I did

hihi24522@lemm.ee · 13 days ago

I definitely relate. I also kind of have this obsession with using only open source software, which also tends to hinder my creativity because some of the open source alternatives have steep learning curves.

Anyway, I think this is one of the things that makes me great at math but terrible at learning math. If something is complicated, I have to chew it down to the bone and then rebuild back to the original complicated thing.

As such, I’m really good at doing all sorts of math and even have some of my own weird identities/constants memorized, but it takes me a lot of time and effort to learn new math from a textbook instead of (re)inventing it myself.

hihi24522@lemm.ee · 13 days ago

Them: “Hey, you seem a little unfocused today. Is something wrong?”

Me, using 90% of my focus to not say random thoughts out loud or pace or make weird faces because everyone will think I’m insane: “Yeah, I’m fine, just a little tired is all.”

hihi24522@lemm.ee · 14 days ago

Out of curiosity, what was the intent of this comment?

To joke about it being irrelevant for most people to know how to fillet a fish
To make a troll joke about filleting something ludicrous (like saying “can you post how to fillet a unicorn next?”)
To make a sadistic joke about killing something that people empathize with more than a fish
To make a vegan statement about how killing a fish and killing a dog should be seen as equally distasteful (no pun intended) as the murder of a sentient thing
To ask a question because you legitimately would like to know how to fillet a dog

No judgement, I’m just fascinated by the fact there are so many different reasons someone might post a comment like this.

hihi24522@lemm.ee · 16 days ago

I did not think that was common practice or even a thing anyone would do at all till I was with a girl who told me she called her pussy “Patricia.”

The sex was great and she (the woman not “Patricia” lol) is a wonderful person, but I was, and still am, vaguely unsettled by someone naming their genitals…

hihi24522@lemm.ee · 9 months ago

We have been played for fools