Imagine that each MATLAB worker required 1 byte of data and instructions, and that you have a petabyte of memory. Clearly, once you increased the number of workers past 10^15, all of the memory would be used just in maintaining the workers, and you would not be able to improve performance by adding more.
The actual amount of memory used as overhead per worker varies with release; these days about 2 gigabytes is a good estimate. So with your petabyte of memory you would not be able to improve performance beyond roughly 500,000 workers.
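As a back-of-envelope check (a sketch in Python, since the arithmetic is language-agnostic; the 2 GB per-worker figure is only an estimate, not a spec):

```python
# Rough model: how many workers fit in a memory budget, assuming a
# fixed per-worker overhead (release-dependent estimate).
PETABYTE = 10**15                  # bytes
OVERHEAD_PER_WORKER = 2 * 10**9    # ~2 GB per worker

max_workers = PETABYTE // OVERHEAD_PER_WORKER
print(max_workers)  # 500000
```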
You probably don't have a petabyte, though. You probably have 8, 16, or 32 gigabytes, maybe 64. And in reality you need to account for the data used on each worker: some algorithms need very little memory, but some need gigabytes each. It would not be uncommon to start running out of memory by 8 workers.
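Extending the same arithmetic to include per-worker data (the 32 GB machine and the 2 GB of working data per worker are illustrative assumptions, not measurements):

```python
GB = 10**9

def max_workers(total_bytes, overhead_bytes, data_bytes_per_worker):
    """Workers that fit before memory runs out (rough model)."""
    return total_bytes // (overhead_bytes + data_bytes_per_worker)

# 32 GB machine, ~2 GB overhead plus ~2 GB of working data per worker:
print(max_workers(32 * GB, 2 * GB, 2 * GB))  # 8
```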
Now consider communication: you have to get each worker the data it needs to work on, and you need to transfer the results back. Each iteration can therefore require sending a notable amount of data around. If the amount of work done with the data is small, then the overhead of sending and receiving the data can dominate. This is fairly common!
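One way to see when transfer overhead dominates is to model the fraction of a worker's wall-clock time spent on useful computation (a toy model; the timings below are made up purely for illustration):

```python
def useful_fraction(compute_s, transfer_s):
    """Fraction of a worker's wall time spent computing rather than
    sending/receiving data (toy model)."""
    return compute_s / (compute_s + transfer_s)

# 1 s of computation per 10 ms of transfer: overhead is negligible.
print(round(useful_fraction(1.0, 0.01), 3))    # 0.99
# 1 ms of computation per 10 ms of transfer: overhead dominates.
print(round(useful_fraction(0.001, 0.01), 3))  # 0.091
```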
Next: each worker is a process that needs to be scheduled by the operating system. In practice the operating system needs a core to handle scheduling, device interrupts, the antivirus and firewall, polling for new email, user interaction, and so on. It doesn't necessarily need a dedicated core, but you should not count on getting much computation done on that core, so subtract one from your core count. The workers then have to be allocated to the remaining cores. If they are heavy CPU users, they will not want to give up a core even enough to make hyperthreading useful. (Hyperthreading is fast process switching, not additional computing resources: when a process has to wait on something, the CPU can quickly switch to other work, but heavy computation is not waiting on anything external except during transfer of data between processes. Hyperthreading can actually slow down code that uses the CPU extensively.)
We are now at the point where, once the number of workers exceeds (cores minus one), the workers are going to contend for core access. Setting the number of workers equal to the number of cores is common, but one of them may not run at full speed because the operating system is using that core.
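That "cores minus one" heuristic is easy to encode (a sketch; `suggested_workers` is a hypothetical helper, and it expects a count of physical cores, not hyperthreads):

```python
def suggested_workers(physical_cores):
    """Heuristic from the discussion above: leave one core for the
    operating system, but never go below one worker."""
    return max(physical_cores - 1, 1)

print(suggested_workers(8))  # 7
print(suggested_workers(1))  # 1
```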
People have studied the optimal number of workers in various scenarios. There are some computations and data patterns for which more cores always means more performance: "embarrassingly parallel" computation. But for more general tasks, it is common for performance to increase sharply up to 4 workers, moderately up to 6 (often enough to be worthwhile), less so up to 8... and beyond that it often becomes questionable whether more cores are cost effective.
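One standard way to model those diminishing returns is Amdahl's law (not mentioned above, but it captures the same shape; the 90% parallel fraction is purely illustrative):

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: overall speedup when only part of the job
    parallelizes perfectly and the rest stays serial."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)

# A job that is 90% parallelizable: gains flatten quickly.
for n in (4, 6, 8, 16):
    print(n, round(amdahl_speedup(0.9, n), 2))
# 4 -> 3.08, 6 -> 4.0, 8 -> 4.71, 16 -> 6.4
```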
If you were asked to choose between 16 cores at 2 gigahertz and 6 cores at 4 gigahertz, there are times when the larger number of slower cores is a big advantage, but more often you are better off with fewer, much faster cores (and correspondingly fewer workers).
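Plugging the same Amdahl-style model into that comparison shows why (the 90% parallel fraction is again an assumption, chosen only to illustrate the tradeoff):

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup with parallel fraction p on n cores."""
    return 1.0 / ((1.0 - p) + p / n)

# Effective throughput in single-core-GHz equivalents for a job that
# is 90% parallelizable:
many_slow = 2.0 * amdahl_speedup(0.9, 16)  # 16 cores @ 2 GHz
few_fast = 4.0 * amdahl_speedup(0.9, 6)    # 6 cores @ 4 GHz
print(many_slow, few_fast)  # 12.8 16.0 -> fewer, faster cores win here
```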