

Each word gets converted to a number before it is processed, so asking “how many r are there in strawberry” could be converted to “how many 7 are there in 13”, for example.
(Very simplified)
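To make the simplified picture concrete, here is a toy word-level encoder. The vocabulary is entirely hypothetical (the IDs 7 and 13 just mirror the example above), and real models use learned subword vocabularies rather than one ID per word. The point is that the model only ever receives the integers, so the letters inside “strawberry” are not visible to it.

```python
# Hypothetical toy vocabulary; IDs 7 and 13 mirror the example in the comment above.
# Real models use learned vocabularies with tens of thousands of entries.
VOCAB = {"how": 1, "many": 2, "are": 3, "there": 4, "in": 5, "r": 7, "strawberry": 13}


def encode(text: str) -> list[int]:
    """Map each whitespace-separated word to its ID (word-level, very simplified)."""
    return [VOCAB[word] for word in text.lower().split()]


ids = encode("how many r are there in strawberry")
print(ids)  # [1, 2, 7, 3, 4, 5, 13]
```

Nothing about the number 13 says “this word contains three r’s”, which is why letter-counting questions are awkward for the model.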
You would assume that, but you would be very wrong. People are lazier/sloppier than you might think.
Searching for “client side authentication NVD” turns up plenty of examples. There is even a CWE entry for “Use of Client-Side Authentication”.
That would be an extremely bad idea though, because it would allow a malicious attacker to
Username/password validation should happen entirely server-side, with as little information as possible exposed to the client.
That’s where the nuances of tokenization come in. It’s not a simple word lookup table, and the AI does not have access to the original definitions of the tokens. Also, tokens do not map 1:1 onto words: a single word might be broken into several tokens. For example, “There’s” might be broken into “There” + “’s”, and “strawberry” into “straw” + “berry”.
We often simplify this as token = word because that holds for most common words.
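Here is a rough sketch of how a word can be split into several tokens. This uses greedy longest-match against a tiny, made-up vocabulary; that is a simplification, not how any particular model works (real tokenizers such as BPE learn their vocabulary and merge rules from data). The vocabulary, the IDs, and the single-character fallback are all hypothetical.

```python
# Hypothetical subword vocabulary with made-up IDs.
SUBWORDS = {"There": 101, "'s": 102, "straw": 103, "berry": 104}


def tokenize(word: str) -> list[str]:
    """Greedy longest-match split of one word into known subword pieces."""
    pieces, start = [], 0
    while start < len(word):
        # Try the longest remaining substring first, shrinking until one is known.
        for end in range(len(word), start, -1):
            if word[start:end] in SUBWORDS:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No known piece: fall back to a single character (assumed behaviour).
            pieces.append(word[start])
            start += 1
    return pieces


print(tokenize("strawberry"))  # ['straw', 'berry']
print(tokenize("There's"))     # ['There', "'s"]
```

The model would then see something like `[103, 104]` for “strawberry”: the three r’s are split across two pieces and hidden behind IDs, which is part of why counting letters trips it up.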