database greenhorn

PoisonedPrisonPanda@discuss.tchncs.de · 16 hours ago

Well, indeed, the devil’s in the details.

But going with your story: yes, you are right in general. But the human input is already there.

But you have to have human-made material to train the classifier, and if the classifier doesn’t improve, then the generator never does either.

AI can already understand what stripes are, and can draw the connection that a zebra is a horse with stripes. Therefore the human input is already given. Brute-force learning will do the rest, simply because time is irrelevant and computation happens at a much faster rate.

Therefore I believe that in the future AI will enhance itself, because the input it has already received is sufficient to hone its skills.

I know that for now we are just talking about LLMs as black boxes that are repetitive in generating output (no creativity). But, in this sense, a second grader also has many skills that are sufficient to expand their knowledge without needing everything to be taught by a human.

I simply doubt this:

LLMs will get progressively less useful

Where will it get data about new programming languages or solutions to problems in new software?

On the other hand, you are right: AI will not understand abstractions of something beyond its realm. But this does not mean it won’t speed things up in areas it can draw conclusions from.

And even in the case of new programming languages, I think a trained model will pick up the logic of the code, basically making use of the pattern-recognition skills it has already learned, and probably faster than a human can learn a new programming language.

PoisonedPrisonPanda@discuss.tchncs.de · 17 hours ago

Well, I doubt that very much. Take as an analogy the success of the chess AI that was left to train itself, compared to being trained…

PoisonedPrisonPanda@discuss.tchncs.de · 17 hours ago

Programmers as it turns out are very ‘eh, the code should explain itself to anyone with enough brains to look at it’ type of people

I cannot say how much I hate this.

Even worse is old code where proper variable naming and underscores were forbidden. It’s impossible to get into someone else’s head.

PoisonedPrisonPanda@discuss.tchncs.de · 17 hours ago

Puzzling Stack Exchange

Is this simply an aggregator?

PoisonedPrisonPanda@discuss.tchncs.de · 17 hours ago

People also blame ai, but if people are going to ai to ask the common already answered questions then… good!

Exactly!

While I am indeed worried about the “wasted” energy (that’s a whole other topic), that’s pretty much what AI is good for.

PoisonedPrisonPanda@discuss.tchncs.de · 17 hours ago

Isn’t it more like the main driver of our prospering civilization?

Some might say that the shift toward desiring less is the downward path for an oversaturated humanity.

But let’s not go too deep here.

PoisonedPrisonPanda@discuss.tchncs.de · 18 hours ago

I could not comprehend what you were trying to tell us.

But the summary is:

The key essence of this post is a deeply disillusioned and angry critique of modern American society, government, and technology. The author expresses a sense of frustration with the perceived emptiness, manipulation, and decay of U.S. institutions—seeing democracy as a facade, tech innovation as overhyped and hollow, and the government as ineffective. They convey a desire for systemic collapse or radical upheaval (accelerationism), suggesting that elites will soon resort to authoritarianism to maintain control. There’s also an undercurrent of socio-political pessimism, nihilism, and rejection of both corporate and state power—coupled with a belief that the current system is unsustainable and nearing a breaking point.

PoisonedPrisonPanda@discuss.tchncs.de · 18 hours ago

To be honest (although I am guilty of using ChatGPT way too often), I have never failed to find a question and an answer to a similar problem on Stack Overflow.

The space is saturated: 90% of the common questions have been answered. Complex problems that have not yet been asked and answered are probably too difficult to formulate on Stack Overflow.

It should be kept as what it is: an enormous repository of knowledge.

PoisonedPrisonPanda@discuss.tchncs.de · 18 hours ago

Well, for now the system is not yet running on the new hardware.

We are now pondering whether to migrate everything as-is to the new hardware and then optimize/refactor,

or to refactor first (or at least develop a plan) and then improve during the migration…

Be sure to make a new post when you decide what you go with, I’m sure people here would enjoy hearing about your approach.

Nice to hear. Thanks. I will share updates.

PoisonedPrisonPanda@discuss.tchncs.de · 18 hours ago

BTW, nice username.

PoisonedPrisonPanda@discuss.tchncs.de · 18 hours ago

BTW, your hard disks are going to be your bottleneck unless you’re reaching out over the internet, so your best bet is to move that data onto an NVMe SSD. That’ll blow any other suggestion I have out of the water.

Yes, we are currently migrating to PostgreSQL and to new hardware. Nonetheless, the approach we are using is a disaster, so we will refactor it as well. I appreciate your input.

I don’t know what language you’re working in.

All processing and SQL-related transactions are executed via Python. But that should not have any influence, since the SQL server is the bottleneck.

WITH (NOLOCK)

Yes, I have already considered this for the next update, since our setup can accept dirty reads. But I have not tested or quantified any benefits yet.

Don’t do a write and a read at the same time since you’re on HDDs.

While I understand the underlying issue here, I do not yet know how to control it, since we have multiple microservices connected to the DB that either fetch (read), write, or delete from different tables. But to my understanding, since I am currently not using NOLOCK, such occurrences should be handled by the SQL server, no? What I mean is: during a process the object is locked, so no other process can interfere with it?

Thanks for putting this together; I will review it again tomorrow (Y).

PoisonedPrisonPanda@discuss.tchncs.de · 18 hours ago

High-risk, (hopefully) high-reward situation, yes. But it is also probably because I am lazy, or let’s say I don’t want to relocate my private life because of the job.

It always boils down to many factors, and my gut feeling tells me that’s the best compromise. Or it’s anxiety about change. Nobody knows.

But thanks. May your life be prosperous, especially if mine won’t be. ;)

PoisonedPrisonPanda@discuss.tchncs.de · 18 hours ago

used. Are you in a startup? And you are learning the ropes on the business side?

Yes, and yes. Basically everything. I’d estimate that covering my tasks would take three filled job positions.

PoisonedPrisonPanda@discuss.tchncs.de · 18 hours ago

The thing is, I am already in a niche, since I am coming from the engineering (non-IT) side and sliding into development. The start-up character of my company probably doesn’t help with expecting a high income for now.

But I go by “earn, learn, or leave”. And since I am learning so fucking much, I cannot leave.

PoisonedPrisonPanda@discuss.tchncs.de · 19 hours ago

I am getting paid very well

You guys are getting paid well? What the heck am I doing wrong?

PoisonedPrisonPanda@discuss.tchncs.de · 19 hours ago

broadly you want to locate individual records as quickly as possible by using the most selective criteria

What can be more selective than “ID = XXX”? Yet does the whole table still have to be scanned until XXX is found?
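Whether the whole table gets scanned depends on whether an index exists on the filtered column, not on how selective the condition is. Here is a small sketch using Python’s sqlite3 module as a stand-in database. It asks the query planner for its plan (EXPLAIN QUERY PLAN) before and after creating an index. The table, the `ext_id` column, and the index name are hypothetical. SQL Server and PostgreSQL expose the same information through their own execution-plan tools.

```python
import sqlite3

# Hypothetical products table; "ext_id" plays the role of the "ID = XXX" column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, ext_id TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO products (ext_id, title) VALUES (?, ?)",
    [(f"P{i:06d}", f"product {i}") for i in range(10_000)],
)

QUERY = "SELECT * FROM products WHERE ext_id = ?"

def plan(sql, params):
    # Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail).
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[3] for row in rows)

before = plan(QUERY, ("P004242",))
conn.execute("CREATE INDEX idx_products_ext_id ON products (ext_id)")
after = plan(QUERY, ("P004242",))

print("without index:", before)  # a SCAN: every row is visited
print("with index:   ", after)   # a SEARCH through the index's B-tree
```

Without the index, the engine has no way to jump to “P004242”, so it must visit every row. With it, the lookup is a B-tree search whose cost grows roughly logarithmically with table size.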

… and to familiarize yourself with normalization.

Based on a quick review of normalization, I doubt that it helps me, as we don’t have such relationships in our data. We “simply” have many products with certain parameters (title, description, etc.); based on those, we process each product and store it, together with additional output, in a table. However, to avoid processing products that were already processed, we want to dismiss any product in the processing pipeline that is already stored in the “final” table.
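The “dismiss anything already in the final table” step can be pushed into the database as an anti-join, so no service has to load the final table into memory. Below is a minimal sketch on Python’s sqlite3. The `pipeline` and `final` tables and their columns are hypothetical, and `NOT EXISTS` is standard SQL that also works in SQL Server and PostgreSQL.

```python
import sqlite3

# Hypothetical schema: "pipeline" holds incoming products, "final" the processed ones.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE final (ext_id TEXT PRIMARY KEY, title TEXT, output TEXT);
    CREATE TABLE pipeline (ext_id TEXT PRIMARY KEY, title TEXT);
""")
conn.executemany("INSERT INTO final VALUES (?, ?, ?)",
                 [("A1", "Chair", "done"), ("A2", "Table", "done")])
conn.executemany("INSERT INTO pipeline VALUES (?, ?)",
                 [("A1", "Chair"), ("A3", "Lamp"), ("A4", "Desk")])

# The PRIMARY KEY on final.ext_id is backed by an index, so each NOT EXISTS
# probe is an index lookup rather than a scan of the whole final table.
todo = conn.execute("""
    SELECT p.ext_id, p.title
    FROM pipeline AS p
    WHERE NOT EXISTS (SELECT 1 FROM final AS f WHERE f.ext_id = p.ext_id)
    ORDER BY p.ext_id
""").fetchall()
print(todo)  # [('A3', 'Lamp'), ('A4', 'Desk')]
```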

It isn’t just a big bucket to throw data into to retrieve later.

That’s probably the biggest enlightenment I have had since we started working with a database.

Anyway, I appreciate your input, so thank you for this.

PoisonedPrisonPanda@discuss.tchncs.de · 19 hours ago

Ms sql is trash Running anything on a Hdd is a joke

Thank you. I convinced management to migrate to modern hardware, and we are switching to PostgreSQL while refactoring our design and approach.

You read write and compare continuously? Did you try to split it into smaller chunks?

Locks are handled by the SQL server. But yes, multiple tables are read and written, and the final table is compared, by multiple requests/transactions (connections?) simultaneously. Splitting into smaller chunks would nonetheless mean that the query still loops through the whole table, just in chunks? How would this help with simultaneous transactions?
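Chunking does not reduce the total work, but it lets each chunk run as its own short transaction, so locks are released between chunks and other connections get a turn instead of waiting behind one long transaction. Below is a minimal sketch on Python’s sqlite3 using keyset pagination: each batch starts after the last primary key seen, so the index is used to jump ahead, whereas OFFSET would make the engine step over all skipped rows again. The table, the `processed` flag, and `BATCH_SIZE` are hypothetical.

```python
import sqlite3

BATCH_SIZE = 500  # hypothetical; tune to the workload

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pipeline (id INTEGER PRIMARY KEY, title TEXT, "
             "processed INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO pipeline (title) VALUES (?)",
                 [(f"product {i}",) for i in range(1_234)])
conn.commit()

def process_in_batches(conn, batch_size=BATCH_SIZE):
    """Walk the table in primary-key order, one short transaction per batch."""
    last_id = 0
    batches = 0
    while True:
        rows = conn.execute(
            "SELECT id FROM pipeline WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        ids = [r[0] for r in rows]
        with conn:  # commits this batch (or rolls it back on error)
            conn.executemany("UPDATE pipeline SET processed = 1 WHERE id = ?",
                             [(i,) for i in ids])
        last_id = ids[-1]  # the next batch starts after this key
        batches += 1
    return batches

n = process_in_batches(conn)
print(n)  # 3 batches: 500 + 500 + 234 rows
```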

PoisonedPrisonPanda@discuss.tchncs.de · 19 hours ago

A hot DB should not run on HDDs. Slap some nvme storage into that server if you can. If you can’t, consider getting a new server and migrating to it.

Did this because of the convincing replies in this thread: we are migrating to modern hardware and replacing SQL Server with PostgreSQL (because it’s already used by the other system we work with, and there is know-how available in that domain).

You should avoid scanning an entire table with a huge number of rows when possible, at least during requests.

But how can we then ensure that we are not adding/processing products that are already in the “final” table, when we have no knowledge of ALL the products in this final table?

Create an index and a table constraint on the relevant columns. … just so that the DB can do the work for you. The DB is better at enforcing constraints than you are (when it can do so).

This is helpful and matches what I experienced. At the peak of the period when the server was overloaded, the CPU load was pretty much zero; all the work was disk reads and writes. That was because of our poor query design/architecture.
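The “let the DB enforce it” advice answers the earlier question about not knowing everything in the final table. A UNIQUE constraint on the product key makes the database reject duplicates, and `INSERT ... ON CONFLICT ... DO NOTHING` turns the rejection into a silent skip. Below is a minimal sketch on Python’s sqlite3. That clause uses the same syntax in PostgreSQL, but SQL Server does not support it. The table, the `ext_id` column, and the `store_results` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint (backed by an index) lets the database reject duplicates,
# so no service needs to know every row already stored in "final".
conn.execute("CREATE TABLE final (id INTEGER PRIMARY KEY, "
             "ext_id TEXT NOT NULL UNIQUE, output TEXT)")

def store_results(conn, results):
    """Hypothetical helper: insert (ext_id, output) pairs, skipping known ext_ids."""
    with conn:
        conn.executemany(
            "INSERT INTO final (ext_id, output) VALUES (?, ?) "
            "ON CONFLICT (ext_id) DO NOTHING",
            results,
        )

store_results(conn, [("A1", "first run"), ("A2", "first run")])
store_results(conn, [("A1", "second run"), ("A3", "second run")])  # A1 is skipped

stored = conn.execute("SELECT ext_id, output FROM final ORDER BY ext_id").fetchall()
print(stored)  # [('A1', 'first run'), ('A2', 'first run'), ('A3', 'second run')]
```

Because the check happens inside the database, it stays correct even when several services insert at the same time, which an application-side “is it already there?” query cannot guarantee.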

For read-heavy workflows, consider whether caches or read replicas will benefit you.

Could you elaborate on what you mean by read replicas? Storage in memory?
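The two ideas differ. A read replica is a second database server holding a continuously synchronized copy of the primary, so read queries can be sent there instead of to the primary. A cache, by contrast, keeps recently read results in memory. Below is a minimal sketch of the cache side only: a read-through cache built with Python’s `functools.lru_cache` around a sqlite3 lookup. The table, the `get_output` function, and the cache size are hypothetical.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final (ext_id TEXT PRIMARY KEY, output TEXT)")
conn.execute("INSERT INTO final VALUES ('A1', 'done')")

db_hits = 0  # counts how often the database is actually queried

@lru_cache(maxsize=10_000)
def get_output(ext_id):
    """Read-through cache: the first call per ext_id queries the DB,
    repeated calls are answered from process memory."""
    global db_hits
    db_hits += 1
    row = conn.execute("SELECT output FROM final WHERE ext_id = ?", (ext_id,)).fetchone()
    return row[0] if row else None

for _ in range(1000):
    get_output("A1")
print(db_hits)  # 1
```

The trade-off is staleness: if a row changes (or a missing row, cached as None, is inserted later), the cache keeps returning the old answer until `get_output.cache_clear()` is called or the entry is evicted. This suits data that rarely changes.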

And finally back to my first point: read. Learn. There are no shortcuts. You cannot get better at something if you don’t take the time to educate yourself on it.

Yes, I will swallow the pill. But thanks to the replies here, I have many starting points.

RTFM is nice, but starting at page 0 is overwhelming.

PoisonedPrisonPanda@discuss.tchncs.de · 19 hours ago

With MSSQL, the first thing you should check is your indexes. You should have indexes on commonly queried fields and any foreign keys. It’s the best place to start because indexing alone can often make or break database performance.

Indexing is the most frequently suggested step. But foreign keys, to my understanding (I apologize if this is wrong), would mean splitting the data into separate tables, all related by this key, right? What difference would it make to split the columns of a table into multiple tables, all related by a shared column, let’s say “id”?
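Splitting a table one-to-one on a shared “id” gains little. The usual reason for a foreign key is a one-to-many relationship, such as one product with several processing runs, so the product’s data is stored once instead of being repeated on every run. The database can then refuse rows that point to a product that does not exist. Below is a minimal sketch on Python’s sqlite3, with hypothetical table and column names. It also indexes the foreign-key column, as the quoted advice suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE products (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    -- one product, many processing runs; product data is stored only once
    CREATE TABLE processing_runs (
        id         INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES products (id),
        output     TEXT
    );
    CREATE INDEX idx_runs_product_id ON processing_runs (product_id);
""")
conn.execute("INSERT INTO products (id, title) VALUES (1, 'Chair')")
conn.executemany("INSERT INTO processing_runs (product_id, output) VALUES (?, ?)",
                 [(1, "v1"), (1, "v2")])

try:
    # product 99 does not exist, so the foreign key rejects this row
    conn.execute("INSERT INTO processing_runs (product_id, output) VALUES (99, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```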

PoisonedPrisonPanda@discuss.tchncs.de · 19 hours ago

So yes. Stack Overflow is going to tell you to RTFM. Because someone needs to sit down with this mess, determine the pros and cons of the system design, and figure out where to start overhauling.

Yes, that’s me. But thanks to the numerous replies in this thread, I now have a clearer picture of the culprits and where to start.

The trade-offs you mentioned are exactly why we are in this mess. In the beginning, with no knowledge, we thought that certain measures would help us, but it turned out that those poor decisions led us in the wrong direction.

Thank you for the reply.

PoisonedPrisonPanda@discuss.tchncs.de · 19 days ago