• 1 Post
  • 32 Comments
Joined 2 years ago
Cake day: June 12th, 2023

  • Well, indeed, the devil’s in the details.

    But going with your story: yes, you are right in general, but the human input is already there.

    But you have to have human-made material to train the classifier, and if the classifier doesn’t improve, then the generator never does either.

    AI can already understand what stripes are, and can draw the connection that a zebra is a horse with stripes. Therefore the human input is already given. Brute-force learning will do the rest, simply because time is irrelevant and computations occur at a much faster rate.

    Therefore I believe that in the future AI will enhance itself, because the input it has already received is sufficient to hone its skills.

    I know that for now we are just talking about LLMs as black boxes that are repetitive in generating output (no creativity). But a second grader also has many skills that are sufficient to expand their knowledge without everything being taught by a human.

    I simply doubt this:

    LLMs will get progressively less useful

    Where will it get data about new programming languages or solutions to problems in new software?

    On the other hand, you are right: AI will not understand abstractions of something beyond its realm. But this does not mean it won’t excel at things it can draw conclusions from.

    And even in the case of new programming languages, I think a trained model will pick up the logic of the code - basically making use of its already learned pattern recognition skills. And probably at a faster pace than a human can understand a new programming language.







  • I could not comprehend what you were trying to tell us.

    But the summary is:

    The key essence of this post is a deeply disillusioned and angry critique of modern American society, government, and technology. The author expresses a sense of frustration with the perceived emptiness, manipulation, and decay of U.S. institutions—seeing democracy as a facade, tech innovation as overhyped and hollow, and the government as ineffective. They convey a desire for systemic collapse or radical upheaval (accelerationism), suggesting that elites will soon resort to authoritarianism to maintain control. There’s also an undercurrent of socio-political pessimism, nihilism, and rejection of both corporate and state power—coupled with a belief that the current system is unsustainable and nearing a breaking point.



  • Well. For now the system is not yet running on the new hardware.

    We are now weighing whether to migrate everything as it is to the new hardware and then optimize/refactor,

    or to refactor first (or at least develop a plan) and then improve during the migration…

    Be sure to make a new post when you decide what you go with, I’m sure people here would enjoy hearing about your approach.

    Nice to hear. Thanks. I will share updates.



  • BTW, your hard disks are going to be your bottleneck unless you’re reaching out over the internet, so your best bet is to move that data onto an NVMe SSD. That’ll blow any other suggestion I have out of the water.

    Yes, we are currently in the process of migrating to PostgreSQL and to new hardware. Nonetheless, the approach we are using is a disaster, so we will refactor our approach as well. I appreciate your input.

    I don’t know what language you’re working in.

    All processing and SQL-related transactions are executed via Python. But that should not have any influence, since the SQL server is the bottleneck.

    WITH (NOLOCK)

    Yes, I have already considered this for the next update, since our setup can accept dirty reads, but I have not tested or quantified any benefits yet.

    Don’t do a write and a read at the same time since you’re on HDDs.

    While I understand the underlying issue here, I do not yet know how to control this, since we have multiple microservices connected to the DB that each read from, write to, or delete from different tables. But to my understanding, since I am currently not using NOLOCK, such occurrences should be handled by SQL, no? What I mean is that during a process the object is locked, so no other process can interfere with it?

    Thanks for putting this together. I will review it again tomorrow (Y).






  • broadly you want to locate individual records as quickly as possible by using the most selective criteria

    What can be more selective than “if ID = "XXX"”? Yet the whole table still has to be scanned until XXX is found?
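    Whether a lookup by ID has to walk the whole table depends on whether the ID column is indexed. The sketch below is only an illustration under assumptions: it uses Python's built-in `sqlite3` as a stand-in for the real server, and the table name `final_products`, the column `product_id`, and the index name are hypothetical. It asks the query planner how it would run the same lookup before and after an index is created.

```python
import sqlite3

# In-memory SQLite database as a stand-in; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final_products (product_id TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO final_products VALUES (?, ?)",
    [(f"P{i}", f"title {i}") for i in range(10_000)],
)


def query_plan(connection, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


lookup = "SELECT title FROM final_products WHERE product_id = ?"

# Without an index the only option is to scan every row.
plan_without_index = query_plan(conn, lookup, ("P9999",))

# With an index the planner can jump straight to the matching row.
conn.execute("CREATE INDEX idx_final_product_id ON final_products (product_id)")
plan_with_index = query_plan(conn, lookup, ("P9999",))

print(plan_without_index)
print(plan_with_index)
```

    The first plan should report a scan of the table and the second a search using the index. Other database engines expose the same information through their own plan output, so the exact wording will differ.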

    … and to familiarize yourself with normalization.

    Based on a quick review of normalization, I doubt that it helps me, as we do not have such links in our data. We “simply” have many products with certain parameters (title, description, etc.); based on those we process each product and store it with additional output in a table. However, to avoid processing products that were already processed, we want to discard any product in the processing pipeline that is already stored in the “final” table.

    It isn’t just a big bucket to throw data into to retrieve later.

    That’s probably the biggest insight I have had since we started working with a database.

    Anyway, I appreciate your input, so thank you for this.


  • MS SQL is trash. Running anything on an HDD is a joke.

    Thank you. I convinced management to migrate to modern hardware, and we are switching to PostgreSQL while refactoring our design and approach.

    You read write and compare continuously? Did you try to split it into smaller chunks?

    Locks are handled by SQL. But yes, multiple tables are read and written, and the final table is compared, by multiple requests/transactions (connections?) simultaneously. Splitting into smaller chunks would nonetheless mean that the query loops through the whole table, just in chunks? How would this help with simultaneous transactions?
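    One possible reading of "smaller chunks" is to chunk the incoming candidates rather than the table: each batch of IDs is checked with a short keyed query, so no single statement holds on to the table for long. The sketch below assumes that reading. It uses Python's built-in `sqlite3` as a stand-in, and `final_products`, `product_id`, and the helper `already_processed` are hypothetical names.

```python
import sqlite3


def already_processed(connection, candidate_ids, chunk_size=500):
    """Return the subset of candidate_ids that already exist in final_products."""
    found = set()
    ids = list(candidate_ids)
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        # One placeholder per ID keeps the query parameterized.
        placeholders = ",".join("?" * len(chunk))
        rows = connection.execute(
            "SELECT product_id FROM final_products "
            f"WHERE product_id IN ({placeholders})",
            chunk,
        )
        found.update(row[0] for row in rows)
    return found


# Stand-in data: the primary key gives the lookup an index to use.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final_products (product_id TEXT PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO final_products VALUES (?, ?)",
    [(f"P{i}", f"title {i}") for i in range(1000)],
)

incoming = [f"P{i}" for i in range(990, 1010)]
known = already_processed(conn, incoming, chunk_size=8)
to_process = [pid for pid in incoming if pid not in known]
print(to_process)
```

    Because `product_id` is a key here, each batch is resolved through the index instead of a pass over the whole table. The chunk size is an arbitrary choice for the example.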


  • A hot DB should not run on HDDs. Slap some nvme storage into that server if you can. If you can’t, consider getting a new server and migrating to it.

    We did this because of the convincing replies in this thread: we are migrating to modern hardware and replacing SQL Server with PostgreSQL (because it’s already used by the other system we work with, and there is know-how available in this domain).

    You should avoid scanning an entire table with a huge number of rows when possible, at least during requests.

    But how can we then ensure that I am not adding/processing products which are already in the “final” table, when I have no knowledge about ALL the products which are in this final table?

    Create an index and a table constraint on the relevant columns. … just so that the DB can do the work for you. The DB is better at enforcing constraints than you are (when it can do so).

    This is helpful and matches what I experienced. At the peak of the period when the server was overloaded, the CPU load was pretty much zero - all the time was spent on disk reads/writes, which was due to our poor query design/architecture.
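    The suggestion to let the DB enforce the constraint could look roughly like the sketch below. It is a minimal illustration, not the actual setup from this thread: it uses Python's built-in `sqlite3` as a stand-in, and the table `final_products`, its columns, and the helper `store_if_new` are hypothetical. The `ON CONFLICT ... DO NOTHING` clause is also available in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint creates an index and makes the DB reject duplicates.
conn.execute(
    """
    CREATE TABLE final_products (
        product_id TEXT NOT NULL UNIQUE,
        title TEXT,
        output TEXT
    )
    """
)


def store_if_new(connection, product_id, title, output):
    """Insert the product unless its ID is already stored; return True if inserted."""
    cur = connection.execute(
        "INSERT INTO final_products (product_id, title, output) VALUES (?, ?, ?) "
        "ON CONFLICT (product_id) DO NOTHING",
        (product_id, title, output),
    )
    # rowcount is 1 when a row was written and 0 when the conflict was skipped.
    return cur.rowcount == 1


first = store_if_new(conn, "P1", "Lamp", "processed v1")
second = store_if_new(conn, "P1", "Lamp", "processed again")
row_count = conn.execute("SELECT COUNT(*) FROM final_products").fetchone()[0]
print(first, second, row_count)
```

    With this shape the application never needs to know all the products in the final table; it attempts the insert and the index answers whether the ID was already there.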

    For read-heavy workflows, consider whether caches or read replicas will benefit you.

    Could you elaborate on what you mean by read replicas? Storage in memory?

    And finally back to my first point: read. Learn. There are no shortcuts. You cannot get better at something if you don’t take the time to educate yourself on it.

    Yes, I will swallow the pill. But thanks to the replies here I have many points from which to start.

    RTFM is nice - but starting with page 0 is overwhelming.


  • With MSSQL, the first thing you should check is your indexes. You should have indexes on commonly queried fields and any foreign keys. It’s the best place to start because indexing alone can often make or break database performance.

    Indexing is the most frequently suggested step. But a foreign key, to my understanding (I apologize if this is wrong), would mean splitting the data into separate tables all related by this key, right? What would be the difference in splitting the columns of a table into multiple tables, all related by a mutual column, let’s say “id”?
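    As a rough illustration of the question: a foreign key is a constraint saying that a value in one table must exist as a key in another, which mainly pays off when one row relates to many rows elsewhere (for example, one product with several processing outputs). The sketch below assumes such a one-to-many case. It uses Python's built-in `sqlite3` as a stand-in, and the tables `products` and `product_outputs` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT NOT NULL, description TEXT)"
)
conn.execute(
    """
    CREATE TABLE product_outputs (
        id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES products (id),
        output TEXT NOT NULL
    )
    """
)
# Index the foreign key column so joins and lookups by product stay cheap.
conn.execute("CREATE INDEX idx_outputs_product_id ON product_outputs (product_id)")

conn.execute("INSERT INTO products (id, title, description) VALUES (1, 'Lamp', 'Desk lamp')")
conn.execute("INSERT INTO product_outputs (product_id, output) VALUES (1, 'run A')")
conn.execute("INSERT INTO product_outputs (product_id, output) VALUES (1, 'run B')")

# An output that points at a product that does not exist is rejected by the DB.
try:
    conn.execute("INSERT INTO product_outputs (product_id, output) VALUES (99, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The product's title is stored once and joined in when needed.
joined = conn.execute(
    """
    SELECT p.title, o.output
    FROM products AS p
    JOIN product_outputs AS o ON o.product_id = p.id
    ORDER BY o.id
    """
).fetchall()
print(orphan_rejected, joined)
```

    If every product only ever has exactly one output row, splitting the columns across two tables with a shared id adds a join without removing any duplication, so a single table may be the simpler choice.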


  • So yes. Stack Overflow is going to tell you to RTFM. Because someone needs to sit down with this mess, determine the pros and cons of the system design, and figure out where to start overhauling.

    Yes, that’s me. But thanks to the numerous replies in this thread, I now have a clearer picture of the culprits and of where to start.

    The tradeoffs you mentioned are exactly why we are in this mess. In the beginning, with no knowledge, we thought that certain measures would help us, but it turned out that those poor decisions led us in the wrong direction.

    Thank you for your reply.