Movie revenue is one important measure of how well a movie performs. It also gives producers, directors, and actors direct, intuitive feedback. It is therefore worth analyzing which factors affect revenue, so that movie makers can aim for higher revenue on their next movie by focusing on the most strongly correlated factors. Our project analyzes several kinds of factors and how they relate to revenue.

Full paper

CATEGORY:    Data Science

TOOL:    Python

ROLE:     Working in a team of five, we used Jupyter Notebook to analyze open data from Kaggle. Through data cleaning and visualization, we explored what affects movie revenue.

Discussion & Conclusion

Among the numeric variables (budget, popularity, runtime, vote_average, vote_count), we find that budget has the highest correlation with revenue, vote_count the second highest, and popularity the third.
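A ranking like this can be sketched with pandas. The snippet below is a minimal illustration, not the project's actual notebook code: it assumes the dataset is loaded into a DataFrame whose column names match the variables listed above plus a `revenue` column, and it uses Pearson correlation, which is an assumption since the method is not stated here. The helper name `rank_revenue_correlations` and the toy rows are hypothetical and exist only so the example runs.

```python
import pandas as pd

# Numeric variables named in the analysis above.
NUMERIC_COLS = ["budget", "popularity", "runtime", "vote_average", "vote_count"]


def rank_revenue_correlations(movies: pd.DataFrame) -> pd.Series:
    """Return each numeric column's Pearson correlation with revenue, highest first."""
    cols = NUMERIC_COLS + ["revenue"]
    # Coerce to numbers and drop rows with missing values (a simple cleaning step).
    numeric = movies[cols].apply(pd.to_numeric, errors="coerce").dropna()
    # Take the revenue column of the correlation matrix, minus revenue itself.
    corr = numeric.corr()["revenue"].drop("revenue")
    return corr.sort_values(ascending=False)


# Made-up toy rows for illustration only; these are NOT the Kaggle data.
toy = pd.DataFrame({
    "budget": [10, 50, 100, 200, 150],
    "popularity": [5.0, 20.0, 15.0, 60.0, 40.0],
    "runtime": [90, 120, 100, 140, 95],
    "vote_average": [6.1, 7.0, 5.5, 7.8, 6.4],
    "vote_count": [100, 900, 700, 5000, 2500],
    "revenue": [15, 120, 180, 700, 400],
})

ranking = rank_revenue_correlations(toy)
print(ranking)
```

On the real dataset, the top entries of the returned Series would correspond to the most strongly correlated variables.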


For the non-numeric variables, we analyze genre, release month, and actor. Among genres, science fiction has the highest average revenue and a relatively high frequency; adventure has the second-highest average revenue and a high frequency; and drama has the lowest average revenue and also the lowest frequency. We find almost no correlation between a film's release month and its revenue. As for actor fame, we find that popular actors, those who appear in more movies, tend to be associated with higher revenue.
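The genre comparison above can be sketched as a group-by over genres. This is a minimal illustration under stated assumptions, not the project's actual code: it assumes a `genres` column that already holds a list of genre names per movie (the raw data may need parsing first) and a numeric `revenue` column. The helper name `genre_revenue_summary` and the toy values are hypothetical.

```python
import pandas as pd


def genre_revenue_summary(movies: pd.DataFrame) -> pd.DataFrame:
    """Return average revenue and movie count per genre, highest average first."""
    # One row per (movie, genre) pair, so a multi-genre movie counts in each genre.
    exploded = movies.explode("genres").dropna(subset=["genres", "revenue"])
    return (
        exploded.groupby("genres")["revenue"]
        .agg(mean_revenue="mean", frequency="count")
        .sort_values("mean_revenue", ascending=False)
    )


# Made-up toy rows for illustration only; these are NOT the Kaggle data.
toy = pd.DataFrame({
    "genres": [
        ["Science Fiction", "Adventure"],
        ["Drama"],
        ["Adventure"],
        ["Drama", "Adventure"],
    ],
    "revenue": [300.0, 40.0, 200.0, 60.0],
})

summary = genre_revenue_summary(toy)
print(summary)
```

Reporting the count next to the mean matters because a genre with very few movies can show a high average that is driven by one or two titles.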


In conclusion, we think that to secure a movie's revenue, a company should invest a larger budget and increase advertising in order to raise the potential vote_count. Although some genres tend to predict higher revenue, companies should avoid making films only in highly profitable genres. Revenue reflects success to some extent, but it is not everything.
