Possible approaches to sort MapReduce reducer inputs by key either before or after reducer execution?
Thank you to the MathWorks developers for implementing a version of the MapReduce technique. I am trying to use it as an alternative to the Hadoop implementation, but I have noticed one difference: there is no "shuffle and sort" step between the mapper and the reducer. As a result, the reducer is not called on its input keys in sorted order, so the order in which the reducer outputs are written to the output datastore does not necessarily match the order in which Hadoop would write them.
I'm trying to figure out a way to account for this difference and obtain output that exactly matches what Hadoop would return.
I can think of the following approaches. Could you please let me know whether any or all of them could work and, if so, how to implement them?
- Intervene in the MapReduce algorithm to introduce a sorting step of the reducer outputs before the output datastore is written to disk.
- Sort the output datastore, whether before or after it is written to disk. I could envision using the approach outlined here, but would this be Big Data "compatible," either by default or via modifications, i.e. would I need to create a tall array at some point and sort that?
- Intervene in the MapReduce algorithm to introduce a "shuffle and sort" step between the mapper and the reducer.
Thanks again,
Dmitri
Answers (1)
Ayush
on 10 Dec 2025 at 8:37
Moved: Walter Roberson
on 10 Dec 2025 at 18:09
Hi Dmitri,
To match Hadoop’s output order, load the output datastore as a tall array in MATLAB and sort it by key with sortrows. Tall arrays are designed for data that does not fit in memory, so this approach remains compatible with big data workflows.
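As a rough illustration of why sorting after the reducer can reproduce the order that a "shuffle and sort" step would give, here is a small language-neutral sketch in Python. It is not MATLAB's mapreduce API: map_reduce, word_mapper, and count_reducer are hypothetical names made up for this example. It also assumes that the reducer emits one output per input key under that same key, and that keys compare the same way in both systems.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer, sort_keys=False):
    """Tiny in-memory MapReduce (hypothetical helper, for illustration only).

    With sort_keys=True the reducer is called on keys in sorted order,
    imitating a Hadoop-style shuffle and sort. Otherwise keys are reduced
    in the order in which the mapper first emitted them.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)

    keys = sorted(groups) if sort_keys else list(groups)

    output = []
    for key in keys:
        output.extend(reducer(key, groups[key]))
    return output


def word_mapper(line):
    # Emit (word, 1) for every word in the line.
    for word in line.split():
        yield word, 1


def count_reducer(key, values):
    # Emit a single (key, total) pair, keyed by the input key.
    yield key, sum(values)


lines = ["pear apple", "fig apple pear", "apple"]

# No shuffle and sort: reducer runs in first-seen key order.
unsorted_out = map_reduce(lines, word_mapper, count_reducer)

# Shuffle and sort before the reducer.
hadoop_like = map_reduce(lines, word_mapper, count_reducer, sort_keys=True)

# Sort the reducer output by key afterwards instead.
sorted_after = sorted(unsorted_out, key=lambda kv: kv[0])
```

Under those assumptions, sorted_after and hadoop_like contain the same pairs in the same order. With several Hadoop reducers, each output file is sorted by key on its own rather than globally, so an exact match may also depend on how keys are partitioned.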
If you need Hadoop’s exact behavior, you can also deploy your MATLAB code to a Hadoop cluster as described in the following document, ensuring each worker has access to the MATLAB Runtime (MCR) and all environment variables are set:
Hope this helps!