@Vigge93

Vigge93@lemmy.world · 9 days ago

That’s when you get into more of the nuance with tokenization. It’s not a simple lookup table, and the AI does not have access to the original definitions of the tokens. Also, tokens do not map 1:1 onto words, and a word might be broken into several tokens. For example “There’s” might be broken into “There” + “'s”, and “strawberry” might be broken into “straw” + “berry”.

The reason we often simplify it to “token = word” is that this holds for most common words.
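To make the splitting concrete, here is a toy sketch. The vocabulary is made up, and greedy longest-match is a stand-in: real tokenizers typically learn their pieces from data (byte-pair encoding, for example), so treat this as an illustration only.

```python
# Toy vocabulary: made up for illustration, not taken from any real model.
TOY_VOCAB = ["there", "'s", "straw", "berry", "how", "many"]

def toy_tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match split of a word into vocabulary pieces."""
    pieces = []
    pos = 0
    text = word.lower()
    while pos < len(text):
        # Pick the longest vocabulary entry that matches at this position;
        # fall back to a single character if nothing matches.
        match = max(
            (p for p in vocab if text.startswith(p, pos)),
            key=len,
            default=text[pos],
        )
        pieces.append(match)
        pos += len(match)
    return pieces

print(toy_tokenize("strawberry"))  # ['straw', 'berry']
print(toy_tokenize("There's"))     # ['there', "'s"]
print(toy_tokenize("how"))         # ['how']
```

A common word like “how” stays a single piece, which is why the “token = word” shorthand usually works.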

Vigge93@lemmy.world · 9 days ago

Each word gets converted to a number before it is processed, so asking “how many r are there in strawberry” could be converted to “how many 7 are there in 13”, for example.

(Very simplified)
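A rough sketch of that conversion, with a hypothetical word-to-ID table (the IDs are arbitrary; 7 and 13 just match the example above):

```python
# Hypothetical word-to-ID table; real models use much larger learned vocabularies.
TOY_IDS = {"how": 1, "many": 2, "r": 7, "are": 3, "there": 4, "in": 5, "strawberry": 13}

def toy_encode(prompt):
    """Replace each whitespace-separated word with its ID."""
    return [TOY_IDS[word] for word in prompt.lower().split()]

ids = toy_encode("how many r are there in strawberry")
print(ids)  # [1, 2, 7, 3, 4, 5, 13]

# Counting letters is trivial on the original string...
print("strawberry".count("r"))  # 3
# ...but the ID 13 on its own says nothing about how the word is spelled.
```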

Vigge93@lemmy.world · 22 days ago

You would assume that, but you would be very wrong. People are lazier/sloppier than you might think.

Searching for “client side authentication NVD” turns up a lot of examples. There is even a CWE for “Use of Client-Side Authentication”:

https://cwe.mitre.org/data/definitions/603.html

Vigge93@lemmy.world · 23 days ago

That would be an extremely bad idea tho, because it would allow a malicious attacker to:

1. Try random usernames, and if the website returns a hash, they know that user exists.
2. Once they have the hash and the hashing algorithm, brute-force the password offline much more easily, bypassing any safeguards on the server.

Username/password validation should happen entirely server-side, with as little information as possible provided to the client
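A minimal sketch of what server-side validation could look like, using only the Python standard library. The `USERS` store, the `login` function, and the iteration count are assumptions for illustration, not a production design:

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative value; pick one based on current guidance

def hash_password(password, salt):
    """Derive a salted hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def make_record(password):
    """Create a (salt, hash) pair; this never leaves the server."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)

# Hypothetical in-memory user store; a real service would use a database.
USERS = {"alice": make_record("correct horse")}

# Dummy record so unknown usernames cost the same work as known ones.
_DUMMY_SALT, _DUMMY_HASH = make_record("dummy")

def login(username, password):
    """Return only True or False: no hash, and no hint whether the user exists."""
    salt, stored = USERS.get(username, (_DUMMY_SALT, _DUMMY_HASH))
    candidate = hash_password(password, salt)
    matches = hmac.compare_digest(candidate, stored)  # constant-time comparison
    return matches and username in USERS

print(login("alice", "correct horse"))  # True
print(login("alice", "wrong"))          # False
print(login("bob", "whatever"))         # False
```

The client only ever sees a yes/no answer, so a wrong password and a nonexistent user look the same from the outside.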