Machine Learning at Scale
ADVANCED COURSE
3 units
SKILL SETS
Code up machine learning algorithms on single machines and on clusters of machines / Amazon AWS / Working on problems with terabytes of data / Machine learning pipelines for petabyte-scale data / Algorithmic design / Parallel computing
TOOLS
Apache Hadoop / Apache Spark
DESIGNED BY
James G. Shanahan
This course builds on and goes beyond the collect-and-analyze phase of big data by focusing on how machine learning algorithms can be rewritten and extended to scale to petabytes of structured and unstructured data, producing sophisticated models used for real-time predictions. Conceptually, the course is divided into two parts. The first covers the fundamental concepts of MapReduce parallel computing through the lens of Hadoop, MrJob, and Spark, while diving deep into Spark Core, DataFrames, the Spark Shell, Spark Streaming, Spark SQL, MLlib, and more. The second part focuses on hands-on algorithmic design and development in parallel computing environments (Spark), covering decision tree learning, graph processing algorithms (PageRank and shortest path), gradient descent algorithms (support vector machines), and matrix factorization. Students will apply MapReduce parallel computing frameworks to industrial applications and deployments in fields including advertising, finance, healthcare, and search. Examples and exercises are provided as Python notebooks (Hadoop Streaming, MrJob, and PySpark).
Advance your data science career with UC Berkeley's online Master of Information and Data Science.