In machine learning and statistics, marginalization simply means summing over a set of independent variables. For example, suppose an avid tennis player kept track of the number of days he played tennis over a period of time as well as the weather on that day:
(In this table we’re keeping track of the number of days. If you want probabilities, divide each value in the table by 180. But I think whole numbers are easier to think about so I’m keeping them.)
To marginalize one of the variables, we just sum one of the variables. For example, to marginalize the weather, we would sum each of the rows to find that 96⁄180=53% of the time tennis was played, and 47% of the time tennis was not played. Likewise, to marginalize the boolean variable of whether tennis was played, we just sum the columns: no matter whether tennis was played on that day, how many days was it sunny? 140.
Another way of saying this is the marginal distribution of sunny weather is the first column (containing 70 and 70). The marginal distribution of playing tennis is the first row (containing 70, 25, and 1).