
The Power Broker: Robert Moses and the Fall of New York
Robert A. Caro
4.7 on Amazon
142 HN comments

Never Split the Difference: Negotiating as if Your Life Depended on It
Chris Voss, Michael Kramer, et al.
4.8 on Amazon
140 HN comments

Ready Player One
Ernest Cline, Wil Wheaton, et al.
4.7 on Amazon
140 HN comments

Economics in One Lesson: The Shortest and Surest Way to Understand Basic Economics
Henry Hazlitt
4.6 on Amazon
140 HN comments

Open: An Autobiography
Andre Agassi, Erik Davies, et al.
4.7 on Amazon
139 HN comments

The Checklist Manifesto: How to Get Things Right
Atul Gawande
4.6 on Amazon
137 HN comments

The Martian
Andy Weir, Wil Wheaton, et al.
4.7 on Amazon
137 HN comments

The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers
Ben Horowitz, Kevin Kenerly, et al.
4.7 on Amazon
136 HN comments

The Moon Is a Harsh Mistress
Robert A. Heinlein, Lloyd James, et al.
4.6 on Amazon
135 HN comments

Foundation
Isaac Asimov, Scott Brick, et al.
4.5 on Amazon
133 HN comments

Calculus: Early Transcendentals
James Stewart, Daniel K. Clegg, et al.
4.2 on Amazon
132 HN comments

High Output Management
Andrew S. Grove
4.6 on Amazon
131 HN comments

Calculus
James Stewart
4.4 on Amazon
130 HN comments

The Big Short: Inside the Doomsday Machine
Michael Lewis, Jesse Boggs, et al.
4.7 on Amazon
127 HN comments

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics)
Trevor Hastie, Robert Tibshirani, et al.
4.6 on Amazon
127 HN comments
markovbling on Dec 26, 2016
Plus, it's free!
http://www-bcf.usc.edu/~gareth/ISL/
larrydag on Aug 12, 2017
Introduction to Statistical Learning http://www-bcf.usc.edu/~gareth/ISL/
Elements of Statistical Learning https://web.stanford.edu/~hastie/ElemStatLearn/
exg on Aug 22, 2012
[1] http://www-stat.stanford.edu/~tibs/ElemStatLearn/
glimcat on Aug 27, 2011
http://www-stat.stanford.edu/~tibs/ElemStatLearn/
jaf656s on Nov 21, 2009
http://www-stat.stanford.edu/~tibs/ElemStatLearn/
carbocation on Dec 7, 2014
1 = (10+ MB PDF warning) https://web.stanford.edu/~hastie/local.ftp/Springer/ESLII_pr...
deltux on Apr 18, 2017
https://statweb.stanford.edu/~tibs/ElemStatLearn/
krapht on Oct 15, 2016
If you don't understand something in the book, back up and learn the pre-reqs as needed.
http://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLI...
ktharavaad on Feb 2, 2009
To do the field of statistical learning justice, "The Elements of Statistical Learning" is an excellent book on the subject.
sn9 on June 10, 2017
Follow it up with Elements of Statistical Learning by three of the same authors for more advanced stuff.
deng on Apr 28, 2019
The first book on statistical learning by Hastie, Tibshirani and Friedman, which is absolutely terrific, is freely available for download:
The Elements of Statistical Learning
http://web.stanford.edu/~hastie/ElemStatLearn/
curiousgal on July 27, 2018
or the more beginner-friendly
An Introduction to Statistical Learning: With Applications in R
vector_spaces on June 15, 2020
https://web.stanford.edu/~hastie/pub.htm
disgruntledphd2 on Nov 24, 2020
For reference, the authors of that book (the best book about ML in general) were all involved in the development of S and R.
simonflynn on Dec 18, 2014
- "The Evaluation and Optimization of Trading Strategies" by Pardo
- "The Elements of Statistical Learning" by Hastie et al
snikolov on May 13, 2010
http://www.autonlab.org/tutorials/
and also a free ebook called Elements of Statistical Learning
http://www-stat.stanford.edu/~tibs/ElemStatLearn/
I've also found a number of course note sets helpful, for example, MIT's machine learning course
http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Compute...
dcl on June 10, 2017
It won't teach you much about theoretical statistics, or even things like experiment design, but you will learn a LOT about regression, classification and model fitting which is what everyone seems to want to be able to do these days.
madenine on Nov 22, 2019
[0]http://www.deeplearningbook.org/
warrenmar on Mar 26, 2013
I would also go over some basic probability and statistics review, and maybe some linear algebra too.
Python is a great language to do data analysis in. I recommend the scikit-learn and pandas packages and using IPython notebooks.
Another book is the Elements of Statistical Learning (http://www-stat.stanford.edu/~tibs/ElemStatLearn/).
There are also Kaggle contests for testing your chops.
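A minimal sketch of that starter workflow, with pandas loading a Kaggle-style CSV and scikit-learn fitting a first model (the file name and the "target" column are placeholders, not from the comment):

```python
# Hypothetical starter workflow: pandas for loading, scikit-learn for modeling.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv")            # placeholder Kaggle-style file
X = df.drop(columns=["target"])          # "target" is a placeholder label column
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```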
maurits on May 22, 2014
Machine Learning: a Probabilistic Perspective, by Murphy
http://www.cs.ubc.ca/~murphyk/MLbook/
Pattern Classification, by Duda et al.
http://www.amazon.com/Pattern-Classification-Pt-1-Richard-Du...
The Elements of Statistical Learning, by Hastie et al. It is free from Stanford.
http://www-stat.stanford.edu/~tibs/ElemStatLearn
Mining of Massive Datasets, free from Stanford.
http://infolab.stanford.edu/~ullman/mmds.html
Bayesian Reasoning and Machine Learning, by Barber, freely available online.
http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=...
Learning from data, by Abu-Mostafa.
It comes with Caltech video lectures: http://work.caltech.edu/telecourse.html
Pattern Recognition and Machine Learning, by Bishop
http://research.microsoft.com/en-us/um/people/cmbishop/prml/
Also noteworthy
Information Theory, Inference, and Learning Algorithms, by MacKay, free.
http://www.inference.phy.cam.ac.uk/itprnn/book.html
Classification, Parameter Estimation and State Estimation, by van der Heijden.
http://prtools.org
Computer Vision: Models, Learning, and Inference, by Prince, available for free
http://www.computervisionmodels.com/
Probabilistic Graphical Models, by Koller. Has an accompanying course on Coursera.
brent on June 5, 2008
It doesn't come with pre-canned python, but honestly almost everything (in terms of code) in PCI is available somewhere on the web and/or already built into python libraries, matlab, and/or R.
lymitshn on Dec 31, 2019
Another usual recommendation is the Elements of Statistical Learning book. Alternatively, find a MOOC that you enjoy and follow it.
[0]http://course18.fast.ai/ml
arbitrage314 on Nov 18, 2015
"The Elements of Statistical Learning" (https://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLI...) is far and away the best book I've seen.
It took me hundreds of hours to get through it, but if you're looking to understand things at a pretty deep level, I'd say it's well-worth it.
Even if you stop at chapter 3, you'll still know more than most people, and you'll have a great foundation.
Hope this helps!
melling on Dec 31, 2020
https://github.com/melling/ISLR
Would Elements of Statistical Learning be my next book?
I’ve seen the Bishop book highly recommended too, and it has been mentioned in this post.
https://www.amazon.com/Pattern-Recognition-Learning-Informat...
grayclhn on July 21, 2014
* James, Witten, Hastie, and Tibshirani's An Introduction to Statistical Learning, with Applications in R
http://www-bcf.usc.edu/~gareth/ISL/
* Hastie, Tibshirani, and Friedman's The Elements of Statistical Learning (more advanced)
http://statweb.stanford.edu/~tibs/ElemStatLearn/
ced on Mar 27, 2012
That's the canonical textbook for ML. If industry relies on splines, boosting, and support vector machines, then it is really not that far from modern academic ML research.
curiousgal on Sep 22, 2016
Thank you Tom!
markovbling on Oct 16, 2016
The standard text in ML, "The Elements of Statistical Learning", is authored by statistics professors.
Statistics is the new statistics. The rest is marketing bullshit.
cs702 on Jan 12, 2016
I highly recommend the book and this online course, both of which are FREE.
Hastie and Tibshirani's other book, "The Elements of Statistical Learning," is also excellent but far more theoretical, and best for experienced practitioners who want to use it as a reference guide.[2]
--
[1] http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Sixth%20Printing.p...
[2] http://statweb.stanford.edu/~tibs/ElemStatLearn/
huac on July 21, 2015
http://statweb.stanford.edu/~tibs/ElemStatLearn/
256 on Jan 17, 2018
The best way to learn the details is of course to read the original papers. This is especially true for following along with the latest developments in deep learning.
[1] https://www.ethz.ch/content/vp/en/lectures/d-infk/2017/autum...
kgwgk on July 6, 2018
But given that they distribute the PDF for free it's worth checking out. Hastie, Tibshirani & Friedman's The Elements of Statistical Learning and the watered-down and more practical Introduction to Statistical Learning are also nice. All of them can be downloaded from https://web.stanford.edu/~hastie/pub.htm
perturbation on Oct 7, 2019
Also, as others have mentioned, some of the most important skills for DS are data munging, data "presentation", and soft skills like managing expectations / relationships / etc.
I would not recommend this book if you want to get into DS with the idea that, "I'll read this and then I'll know everything I need to." It's too dense and academically-focused, and it would probably be discouraging if you try to read this all without getting your feet wet.
xkgt on June 15, 2020
It won't be possible to go through them all, but are there any recommendations from HN? Personally, I have come across Elements of Statistical Learning[1], Recommender Systems[2], The Algorithm Design Manual[3] in many recommended lists.
1. http://link.springer.com/openurl?genre=book&isbn=978-0-387-8...
2. http://link.springer.com/openurl?genre=book&isbn=978-3-319-2...
3. http://link.springer.com/openurl?genre=book&isbn=978-1-84800...
craigching on July 17, 2015
ISL is an excellent, free book introducing you to ML. You can go deeper, but to me this is where I wish I'd started. I am taking the Data Science track at Coursera (on Practical Machine Learning now) and I am kicking myself that I didn't start with ISL instead.
Now, I know you specifically asked about Python, but the concepts are bigger than the implementation. All of these techniques are available in Python's ML stack, scikit-learn, NumPy, pandas, etc. I don't know of the equivalent of ISL for Python, but if you learn the concepts and you're a programmer of any worth, you will be able to move from R to Python. Maybe take/read ISL, but do the labs in Python, that might be a fun way to go.
Lastly, to go along with ISL, "Elements of Statistical Learning" also by Hastie et al. is available for free to dive deeper [3]
[1] -- http://www-bcf.usc.edu/~gareth/ISL/
[2] -- https://lagunita.stanford.edu/courses/HumanitiesandScience/S...
[3] -- http://statweb.stanford.edu/~tibs/ElemStatLearn/
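As a sketch of what "ISL labs in Python" could look like (scikit-learn's built-in diabetes dataset stands in for ISL's R datasets; this is not material from the book itself):

```python
# ISL-flavored linear-regression lab, redone with scikit-learn.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lm = LinearRegression().fit(X_train, y_train)
print("coefficients:", lm.coef_)
print("R^2 on held-out data:", lm.score(X_test, y_test))
```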
disgruntledphd2 on Mar 19, 2016
http://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.p...
It assumes some understanding of calculus, but doesn't require matrix algebra.
The original (and amazing) book that lots of people used is Elements of Statistical Learning.
https://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLI...
Chapters 1-7 are worth their weight in gold. This is one of the cases where the physical books are much better, as you'll need to flick back and forth to see the figures (which are one of the best parts).
The foregoing assumes that you already know some statistics/data analysis (the latter probably being more important).
If you haven't done this before, then I suggest that you acquire some data you care about, install R (a good book is the Art of R Programming by Matloff), and start trying to make inferences. And draw graphs. Many, many, many graphs.
If you keep at this, finding papers/books and reading theory, and implementing it in your spare time, then you can probably get a good data science job in 1-2 years. You'll probably need to devote much of your free time to it though.
I'm assuming that you can already code, given the context :)
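For those starting in Python rather than R, the same "many, many, many graphs" advice might look like this (the CSV name is a placeholder):

```python
# Load some data you care about, summarize it, and plot everything
# against everything else.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("some_data_you_care_about.csv")        # placeholder file
print(df.describe())                                    # quick numeric summary

pd.plotting.scatter_matrix(df.select_dtypes("number"))  # many, many graphs
plt.show()
```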
nilkn on June 22, 2017
It's definitely the best reference on the subject. With only calculus 3 under your belt the math won't be trivial, but it should overall be fairly approachable and certainly much more so than something like "The Elements of Statistical Learning".
psv1 on Dec 31, 2019
- MIT: Big Picture of Calculus
- Harvard: Stats 110
- MIT: Matrix Methods in Data Analysis, Signal Processing, and Machine Learning
If any of these seem too difficult - Khan Academy Precalculus (they also have Linear Algebra and Calculus material).
This gives you a math foundation. Some books more specific to ML:
- Foundations of Data Science - Blum et al.
- Elements of Statistical Learning - Hastie et al. The simpler version of this book - Introduction to Statistical Learning - also has a free companion course on Stanford's website.
- Machine Learning: A Probabilistic Perspective - Murphy
That's a lot of material to cover. And at some point you should start experimenting and building things yourself of course. If you're already familiar with Python, the Data Science Handbook (Jake Vanderplas) is a good guide through the ecosystem of libraries that you would commonly use.
Things I don't recommend - Fast.ai, Goodfellow's Deep Learning Book, Bishop's Pattern Recognition and ML book, Andrew Ng's ML course, Coursera, Udacity, Udemy, Kaggle.
weavie on Sep 9, 2015
EDIT: Oops I should have said "An Introduction to Statistical Learning with Applications in R" rather than The Elements of Statistical Learning. The Elements book goes into way too much depth to be a good introduction to the subject.
brent on May 24, 2008
However, in terms of books I would add Elements of Statistical Learning (Hastie, Tibshirani, and Friedman). It is an excellent text that covers a lot of ground. The down side of this of course is that it is written at the graduate level, so be prepared.
kuusisto on Sep 12, 2018
https://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/mlbo...
It's dated, but it's quite approachable and does a great job explaining a lot of the fundamentals.
If you want to approach machine learning from a more statistical perspective, you could also have a look at An Introduction to Statistical Learning to start:
http://www-bcf.usc.edu/~gareth/ISL/
Or if you're more mathematically inclined than the average bear, you could jump directly into The Elements of Statistical Learning:
https://web.stanford.edu/~hastie/ElemStatLearn/
If you want something a little more interactive than a book though, you might have a look at Google's free crash course on machine learning:
https://developers.google.com/machine-learning/crash-course/...
I checked it out briefly maybe six months ago, and it seemed pretty good. It seemed a bit focused on TensorFlow and some other tools, but that's okay.
indigentmartian on Jan 21, 2017
Regarding books, there are many very high quality textbooks available (legitimately) for free online:
Introduction to Statistical Learning (James et al., 2014) http://www-bcf.usc.edu/~gareth/ISL/
the above book shares some authors with the denser and more in-depth/advanced
The Elements of Statistical Learning (Hastie et al., 2009) http://statweb.stanford.edu/~tibs/ElemStatLearn/
Information Theory: Inference & Learning Algorithms (MacKay, 2003) http://www.inference.phy.cam.ac.uk/itila/p0.html
Bayesian Reasoning & Machine Learning (Barber, 2012) http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=...
Deep Learning (Goodfellow et al., 2016) http://www.deeplearningbook.org/
Reinforcement Learning: An Introduction (Sutton & Barto, 1998) http://webdocs.cs.ualberta.ca/~sutton/book/ebook/the-book.ht...
^^ the above books are used on many graduate courses in machine learning and are varied in their approach and readability, but go deep into the fundamentals and theory of machine learning. Most contain primers on the relevant maths, too, so you can either use these to brush up on what you already know or as a starting point look for more relevant maths materials.
If you want more practical books/courses, more machine-learning focussed data science books can be helpful. For trying out what you've learned, Kaggle is great for providing data sets and problems.
blahi on July 13, 2016
This is a book that emphasizes practical applications without getting too hung up on the math details. If on the other hand you are a math whiz, Elements of Statistical Learning is THE book, but it expects you to be very proficient in math.
Both books are seriously underrated, which is kind of funny to say because you will find only praise about them, but they deserve even more.
disgruntledphd2 on Dec 23, 2011
Having some background in statistics, but none in either linguistics or NLP, that book was a revelation. If you read, and implemented all the exercises in, that book, you'd find a way to make millions, as NLP is a big deal right now. I did find a little too much concentration on the low-level stuff (character parsing, bag of words, etc.), but in conjunction with Elements of Statistical Learning it's wonderful.
cs702 on June 29, 2018
* fast.ai ML course: http://forums.fast.ai/t/another-treat-early-access-to-intro-...
* fast.ai DL course: part 1: http://course.fast.ai/ part 2: http://course.fast.ai/part2.html
The fast.ai courses spend very little time on theory, and you can follow the videos at your own pace.
Books:
* The best books on ML (excluding DL), in my view, are "An Introduction to Statistical Learning" by James, Witten, Hastie and Tibshirani, and "The Elements of Statistical Learning" by Hastie, Tibshirani and Friedman. The Elements arguably belongs on every ML practitioner's bookshelf -- it's a fantastic reference manual.[b]
* The only book on DL that I'm aware of is "Deep Learning," by Goodfellow, Bengio and Courville. It's a good book, but I suggest holding off on reading it until you've had a chance to experiment with a range of deep learning models. Otherwise, you will get very little that is useful out of it.[c]
Good luck!
[a] Scroll down on this page for their bios: http://course.fast.ai/about.html
[b] Introduction to Statistical Learning: http://www-bcf.usc.edu/~gareth/ISL/ The Elements of Statistical Learning: https://web.stanford.edu/~hastie/ElemStatLearn/
[c] http://www.deeplearningbook.org/
maciejgryka on Dec 23, 2011
The Elements of Statistical Learning:
http://www-stat.stanford.edu/~tibs/ElemStatLearn/
Second, while focused on computer vision, has great intro to probability and learning:
Computer Vision: Models, Learning, and Inference
http://computervisionmodels.com/
texthompson on Sep 20, 2015
* The Elements of Statistical Learning, by Hastie, Tibshirani and Friedman (https://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLI...).
* Probability Theory: The Logic of Science, by ET Jaynes (http://bayes.wustl.edu/etj/prob/book.pdf)
Best of luck. I can see from your post that you're thinking about performance tuning; I'm assuming you mean of software. That's a nice area: compared to fields like medical genetics, data on the performance of software is relatively cheap to get, so a lot of issues about small sample sizes are surmountable.
eachro on Dec 31, 2019
If you want to really understand the fundamentals of machine learning (deep learning is just one subset of ML!), there is no substitute for picking up one of the classic texts like Elements of Statistical Learning (https://web.stanford.edu/~hastie/ElemStatLearn/) or Machine Learning: A Probabilistic Perspective (https://www.cs.ubc.ca/~murphyk/MLbook/) and going through it slowly.
I'd recommend a two-pronged approach: dig into fast.ai while reading a chapter a week (or at whatever pace matches your schedule) of whatever ML textbook you end up choosing. Despite all of the hype around deep learning, you really can do some pretty sweet things (ex: classify images/text) with neural nets within a day or two of getting started. Machine learning is a broad field, and you'll find that you will never know as much as you think you should, and that's okay. The most important thing is to stick to a schedule and be consistent with your learning. Good luck on this journey :)
disgruntledphd on June 12, 2011
The PDF is free, and the book is both extremely well written and super comprehensive.
http://www-stat.stanford.edu/~tibs/ElemStatLearn/
You might also want to check out R, as it's an amazing statistics language which has hundreds of packages available for ML. There's a large user community, and the really obscure error messages you get will teach you a lot about statistics. http://cran.r-project.org/
Also, a lot of machine learning is getting the data into a usable form, so learn how to use Unix command line tools such as sed, awk, grep et al. They are absolute lifesavers.
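A rough Python stand-in for that sed/awk/grep-style munging (the log file, the pattern, and the field count are all invented for illustration):

```python
# grep-like filtering and awk-like field extraction, writing a clean TSV.
import csv
import re

with open("raw.log") as src, open("clean.tsv", "w", newline="") as dst:
    writer = csv.writer(dst, delimiter="\t")
    for line in src:
        if re.search(r"ERROR", line):      # grep: keep only matching lines
            fields = line.split()          # awk: split the line into fields
            writer.writerow(fields[:3])    # keep the first three fields
```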
fnbr on Nov 4, 2017
If you're interested in general machine learning, The Elements of Statistical Learning, by Tibshirani et al., is great; a more applied book is An Introduction to Statistical Learning by the same authors.
For a more applied view, I'd check out Tensorflow or PyTorch tutorials; there's no good book, as far as I'm aware, because the tech changes so quickly.
I've done a series of videos on how to do deep learning that might be useful; if you're interested, there's a link in my profile.
dcl on Apr 6, 2017
This book covers everything from simple regression and classification from the statistical side to things like gradient boosted decision trees and the like on the ML side with enough math to make sure you understand what's actually going on.
I should note, it doesn't touch deep learning, which is what I suspect most people interested in 'machine learning' without any background in stats want to learn about these days.
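For a taste of the gradient-boosted trees mentioned there, a minimal scikit-learn sketch on synthetic data (not an example from the book):

```python
# Gradient-boosted decision trees of the kind ESL covers, via scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
print("mean CV accuracy:", cross_val_score(gbm, X, y, cv=5).mean())
```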
phren0logy on Nov 21, 2009
1.) Head First Statistics -- Pretty good, but beware the section on Bayes Theorem which is a bit off. This is a quick and casual intro, but worthwhile. I used it to refresh me on my college stats course (which was a long time ago), and I like it. There's also Head First Data Analysis, which I haven't read but could be a reasonable companion. HF Data Analysis uses Excel and R.
2.) Using R for Introductory Statistics (Verzani) -- Good explanations and exercises, and you will also learn R. This second point is actually pretty important, because it's a very valuable tool. Whereas the Head First Stats book walks through pretty simple problems that you work out in pencil, the Verzani book has many real-world data sets to explore that would be impractical to do by hand. That said, I think it's valuable to work things out in pencil with the first book before you move on to this one.
After these books, Elements of Statistical Learning seems to be the current favorite.
dkarl on Nov 3, 2010
The Koran (Just five minutes here and there. Honestly, I find it excruciatingly boring.)
The Elements of Statistical Learning (Just started.)
Plus fun stuff. I have a Simenon on my bedside table plus The Tenant of Wildfell Hall. Not sure which is next.
amrrs on Aug 28, 2017
http://www.r-bloggers.com/in-depth-introduction-to-machine-l...
Introduction to Statistical Learning http://www-bcf.usc.edu/~gareth/ISL/ (by Rob Tibshirani and Trevor Hastie; free, I guess); for more depth, Elements of Statistical Learning by the same authors.
Linear Algebra (the linear algebra section of Andrew Ng's Introduction to Machine Learning is short and crisp)
If you're not scared of derivatives, you can check those too. But you can easily survive and even excel as a data scientist or ML practitioner with these.
kitanata on May 18, 2018
I can do everything you can do.
vowelless on Dec 28, 2019
* (Lots of machine learning books to list: PRML, All of Stats, Deep Learning, etc.)
* Active Portfolio Management - Kahn, Grinold
* Thinking, fast and slow - Kahneman
* Protein Power (the Eades') / Why we get fat (Taubes)
* Why we sleep (Walker)
* Deep Work / So Good They Can't Ignore You (Newport)
* Flowers for Algernon (Keyes)
* Getting to Yes (Fisher)
alexcnwy on Aug 20, 2017
The go-to graduate level machine learning text "The Elements of Statistical Learning" was written by 2 statistics professors (one, Prof. Hastie, a fellow South African! :)
Granted, neural networks are often taught as a bolt-on lecture in statistics course machine learning modules but topics like regularization and a rigorous study of overfitting were born in stats departments.
jll29 on July 24, 2020
Then you will want to consult a text book in your work domain (e.g. introduction to speech & language OR statistical natural language processing for the domain of natural language processing).
And finally, you will want either a book, or free online Web resources/tutorial videos that show you how to do things in practice, given a particular programming language and tool-set (e.g. Python + TensorFlow, Java + DeepLearning4J).
This recipe of Theory + Application + Practice/Tools should get you there.
Eugeleo on June 29, 2020
What resource would you recommend to get an intuitive grasp of statistics?
To give you an idea about what kind of resource (book) I'm looking for: I'm currently reading Elements of Statistical Learning and I enjoy that it has all the mathematical rigour I need to really understand why all of it works, but also that it's heavy on commentary and pictures, which helps me to understand the math quicker. Counterexamples: Baby Rudin on one side of the spectrum, The Hundred-Page Machine Learning Book on the other.
mustafaf on Feb 4, 2011
Good References:
1) Elements of Statistical Learning - Hastie, Tibshirani and Friedman
2) Pattern Classification - Duda, Hart and Stork
3) Pattern Recognition - Theodoridis, Koutroumbas
4) Machine Learning - Tom Mitchell
5) http://videolectures.net/Top/Computer_Science/Machine_Learni...
tom_b on Sep 22, 2016
I found Larry Harris' Trading and Exchanges: Market Microstructure for Practitioners a solid introduction to market making and trading. Terms and concepts are easy to pick up from the text. I was comfortable enough after reading it to skim stats journal papers talking about market making models. The Stockfighter team had mentioned it in older threads here. It's expensive, but I just borrowed it from the library at my university instead of buying.
I also like The Elements of Statistical Learning which is free from the authors (http://statweb.stanford.edu/~tibs/ElemStatLearn/download.htm...). Although it isn't specifically about economics or markets, you should at least read it.
I'm at a loss on general economics books.
jamii on Nov 10, 2010
I used some pretty simple techniques: one Bayesian filter to separate jobs from other posts in mailing lists etc., and one Bayesian filter to score jobs based on keywords. Both filters were trained by feedback from the console UI. The main problem is excluding site-specific keywords that distort the scoring (e.g. if a site with mostly crappy jobs includes its own name in the listing, then even the good jobs will score low by association). A lot of job sites have manky markup, so I also had a different scraping script for each site to extract text. All in all it's only a couple of hours' work. I've been thinking recently about extending it and adding a simple web UI, since finding freelance work is pretty time-consuming.
> What sort of collaborative filtering techniques?
I didn't have any specific ones in mind, but there are plenty of good machine learning books that cover different techniques. If you don't already have a background in maths then 'Programming Collective Intelligence' is a good book to start with. 'The Elements of Statistical Learning' goes into a lot more detail but requires some basic maths.
> Would love to chat further via email.
Email is in my profile.
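A minimal sketch of a keyword-scoring Bayesian filter like the one described above (scikit-learn; the example posts and labels are invented):

```python
# Tiny Bayesian job filter: learn keyword weights from labelled posts,
# then score new posts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

posts = ["senior python dev, remote ok", "enterprise java, on-site only",
         "ml engineer, flexible hours", "legacy cobol maintenance"]
labels = [1, 0, 1, 0]  # 1 = worth reading, 0 = skip (invented labels)

clf = make_pipeline(CountVectorizer(), MultinomialNB()).fit(posts, labels)
print(clf.predict_proba(["remote python ml role"])[:, 1])  # score a new post
```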
quanto on Sep 3, 2017
What I mean by mathematical background is at or above undergraduate level (so definitely covers calculus, linear algebra, intermediate statistics). A background that can read Elements of Statistical Learning (ESL) comfortably.
What I found is that many "data science" books cover how to use R or Pandas at a very introductory level. Books like ESL focus on core theories (which is great) but do not focus on how to tackle tough real-world data.
I suppose much of data insight come from experience, but I was wondering whether there are sources to help me jump start.
misframer on May 8, 2015
[0] http://www-bcf.usc.edu/~gareth/ISL/
[1] http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Fourth%20Printing....
jlgray on May 15, 2016
In general, scientific books are an overview of a field, which can only occur with sufficient time for hindsight and synthesis. Even a thousand page book such as Koller's PGM will be littered with references and suggestions of papers to read for a deeper understanding.
One partial exception might be the Deep Learning book by Goodfellow and Bengio, which was made public only a month or so ago. Even this, however, is just an overview. http://www.deeplearningbook.org/
monk_the_dog on Nov 5, 2011
BTW, here are some good online resources for machine learning:
* The Elements of Statistical Learning (free pdf book): http://www-stat.stanford.edu/~tibs/ElemStatLearn/
* Information Theory, Inference, and Learning Algorithms (free pdf book): http://www.inference.phy.cam.ac.uk/mackay/itila/
* Videos from Autumn School 2006: Machine Learning over Text and Images: http://videolectures.net/mlas06_pittsburgh/
* Bonus link. An Empirical Comparison of Supervised Learning Algorithms (pdf paper): http://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icm... (Note the top 3 are tree ensembles, then SVM, ANN, KNN. Yes, I know there is no 'best' classifier.)
pruthvishetty on Jan 5, 2018
Statistical Learning.
(https://web.stanford.edu/~hastie/Papers/ESLII.pdf)
alexanderchr on Mar 1, 2020
There is also Elements of Statistical Learning by the same authors if you are looking for something more rigorous. I haven't read very much of it but it is supposed to be very good too.
jamessb on Apr 25, 2016
http://www.saedsayad.com/data_mining_map.htm
http://peekaboo-vision.blogspot.co.uk/2013/01/machine-learni...
https://azure.microsoft.com/en-us/documentation/articles/mac...
However, if you want to really understand how things fit together you're probably best reading one of the standard intro textbooks: Murphy's Machine Learning, Bishop's Pattern recognition and machine learning, Hastie et al's The Elements of Statistical Learning, or Wasserman's All of statistics.
Or Barber's textbook, which is freely available online and has some nice mind-maps/concept-maps/trees at the start of each section: http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=...
hrokr on Sep 12, 2020
Since no one has really said much about Bayes yet, I think it worth mentioning just how useful it is in DS and ML. A Bayesian approach makes a very good baseline and often one that is hard to beat.
If you're not particularly fluent with Probability and Statistics now, let me suggest you add in Khan Academy (make sure to pick the CLEP version) and JBstatistics. Khan has the advantage of quizzes (so you're not just kidding yourself that you know the material). JBstatistics has the advantage of really good explanations. You'll probably want to watch Khan at x1.5 speed.
jochenleidner on Aug 28, 2017
2. Regarding books, I second the late David MacKay's "Information Theory, Inference and Learning Algorithms" and the second edition of "Elements of Statistical Learning" by Tibshirani et al. (there's also a more accessible version of a subset of the material, targeting MBA students, called James et al., An Introduction to Statistical Learning). Duda/Hart/Stork's Pattern Classification (2nd ed.) is also great.
The self-published volume by Abu-Mostafa/Magdon-Ismail/Lin, Learning from Data: A Short Course is impressive, short and useful for self-study.
3. Wikipedia is surprisingly good at providing help, and so is Stack Exchange, which has a statistics sub-forum, and of course there are many online MOOC courses on statistics/probability and more specialized ones on machine learning.
4. After that you will want to consult conference papers and online tutorials on particular models (k-means, Ward/HAC, HMM, SVM, perceptron, MLP, linear and logistic regression, kNN, multinomial naive Bayes, ...).
Evbn on Nov 25, 2012
Also, Elements of Statistical Learning is available online for free (previous edition, maybe?), which covers a lot more standard/traditional statistical curve/surface fitting topics as well, all with high mathematical rigor.
Pandabob on Mar 20, 2015
[1]: http://www-bcf.usc.edu/~gareth/ISL/
[2]: http://statweb.stanford.edu/~tibs/ElemStatLearn/
mtzet on July 14, 2017
The programming part, with R, Python, Julia, etc., seems to get the most attention here. I think the most important part here is to learn how to load datasets into your system of choice and work with them to get some nice plots out. The book "R for Data Science"[1] seems like a good intro for this with R and tidyverse.
Somewhat more overlooked here are the statistical models. I second the recommendation of "Introduction to Statistical Learning"[2], possibly supplemented with its big brother "Elements of Statistical Learning"[3] if you're more mathematically inclined and want more details. I like their emphasis on starting with simple models and working your way up. I also found their discussion on how to go from data to a mathematical model very lucid.
[1] http://r4ds.had.co.nz/
[2] http://www-bcf.usc.edu/~gareth/ISL/
[3] http://web.stanford.edu/~hastie/ElemStatLearn/
esfandia on Apr 28, 2019
- Hastie is a co-author of two machine learning books: "Elements of Statistical Learning", which is very comprehensive, and "Introduction to Statistical Learning", which is more approachable for people without too much background in stats.
tasubotadas on Feb 6, 2020
I wasn't impressed with the quality of the book either. I did learn quite a few methods there (minhash) that I got to use later, so thanks for that, but compared to MLPR, Learning from Data, or TESL, the quality of this book pales.
Yadi on Oct 1, 2018
- [0] Pattern Recognition and Machine Learning (Information Science and Statistics)
and also:
- [1] The Elements of Statistical Learning
- [2] Reinforcement Learning: An Introduction by Barto and Sutton
- [3] The Deep Learning by Aaron Courville, Ian Goodfellow, and Yoshua Bengio
- [4] Neural Network Methods for Natural Language Processing (Synthesis Lectures on Human Language Technologies) by Yoav Goldberg
Then some math tid-bits:
[5] Introduction to Linear Algebra by Strang
-----------
links:
- [0] [PDF](http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%...)
- [0] [AMZ](https://www.amazon.com/Pattern-Recognition-Learning-Informat...)
- [2] [amz](https://www.amazon.com/Reinforcement-Learning-Introduction-A...)
- [2] [pdf](http://incompleteideas.net/book/bookdraft2017nov5.pdf)
- [3] [amz](https://www.amazon.com/Deep-Learning-Adaptive-Computation-Ma...)
- [3] [site](https://www.deeplearningbook.org/)
- [4] [amz](https://www.amazon.com/Language-Processing-Synthesis-Lecture...)
- [5] [amz](https://www.amazon.com/Introduction-Linear-Algebra-Gilbert-S...)
rm999 on Sep 15, 2013
* Pattern Recognition and Machine Learning by Christopher M. Bishop
* The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman
Both are very intensive, perhaps to a fault. But they are good references and are good to at least skim through after you have baseline machine learning knowledge. At this stage you should be able to read almost any machine learning paper and actually understand it.
fractionalhare on Sep 13, 2020
This probably won't be the case for a question as basic as, "what is regression?" But for any intermediate to advanced interview question involving regression, I would expect companies to jealously guard it.
If you're earnestly interested in building and testing your knowledge, I would recommend you read The Elements of Statistical Learning and Data Analysis Using Regression and Multilevel/Hierarchical Models. Also a good upper undergrad textbook in probability, like A First Course in Probability.
joshvm on Dec 31, 2019
Aurélien Géron's O'Reilly book is great - Hands-On Machine Learning with Scikit-Learn and TensorFlow. Get the second edition, which covers TensorFlow 2.
hashr8064 on Nov 22, 2018
1. Picked up a High School Algebra Book. Read from beginning to end and did all exercises.
2. Repeat #1 for Algebra 2, Statistics, Geometry and Calculus. (Really helpful for learning those topics fast was Khan Academy).
3. Did MIT Opencourseware's Calculus and Linear Algebra Courses w/the books and exercises.
Now, this took me about two years; maybe you can get it done quicker. At that point you're at a level where you can pretty much pick up any book (I picked up Elements of Statistical Learning) and actually start parsing and understanding what the formulas mean.
One thing I always do is tear apart formulas and equations and play with little parts of them to see how the parts interact with one another and behave under specific conditions, this has really helped my understanding of all kinds of concepts from Pearson's R to Softmax.
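In that spirit, "tearing apart" softmax might look something like this (NumPy; the inputs are arbitrary):

```python
# Scale softmax inputs and watch the output distribution sharpen.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])
for scale in (0.5, 1.0, 5.0):
    print(scale, softmax(scale * z).round(3))  # larger scale -> more peaked
```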
darawk on June 26, 2021
There's a fair amount of overlap between these books, so it's not quite as much as it seems. But i'm hoping to make it through at least a chapter a week this year, which should get me most of the way through them. We'll see how it goes.
n00b101 on Oct 16, 2016
The standard text in ML, "The Elements of Statistical Learning", is authored by statistics professors.
To be fair, I think "Machine Learning" was an academic marketing term coined by Computer Science departments. It seems that the term "Statistical Learning" was coined in response by Statistics departments. Other similar marketing buzzwords used by various factions (comp sci, stats, actuarial science, industrial engineers, etc) include "Data Science, "Predictive Analytics," "Data Mining," "Knowledge Discovery," "Knowledge Engineering," "Soft Computing," "Artificial Intelligence," "Big Data," "Deep Learning"... To be honest, it's tiresome and troubling to see academic departments invent and adopt overlapping and vacuous marketing buzzwords.
psb217 on Nov 25, 2012
I've TAed my university's graduate ML course for the past couple of years, so I've read most chapters of these books in some detail and have hands-on experience using them to help people who are looking closely at these topics for the first time. Interestingly, SVMs are actually a good example of when I'd suggest both books.
achompas on July 7, 2018
One of my most valuable activities in grad school was printing and studying each chapter of EoSL.
It's a comprehensive text on the fundamentals of statistics and machine learning, a solid foundation for the cutting-edge techniques relying on deep learning and reinforcement learning.
kblarsen4 on Feb 13, 2015
Naive Bayes, for example, is more of a "machine learning" technique where the goal is to classify people into groups based on features. Naive Bayes is called Naive because it assumes that all regressors (x_j) are independent given the target variable (let's call it y and assume it is binary). In other words, the conditional log odds of y=1 given the x_j variables is equal to the sum of the log density ratios, where the log density ratio for variable x_j is ln(f(x_j|y=1)/f(x_j|y=0)).
On the other hand, in the price elasticity example described in post we want to infuse outside knowledge into the model because we don't believe what it says on its own. This is a situation where interpretation and believability is an important part of the objective function because we will be running future pricing scenarios from the model.
If you are building, say, a churn model to predict who is going to cancel their accounts, you probably wouldn't infuse your model with outside knowledge since cross validation accuracy is your main goal. You might regularize your model, however, which can be done in a number of ways (Bayesian or non-Bayesian). But in a pricing model or media mix model, and many other cases, the use case above is very real.
I suggest reading the “Elements of Statistical Learning” by Hastie, Tibshirani, et al.
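A small numerical check of that decomposition, under a Gaussian naive Bayes model with synthetic data (a log prior-ratio term appears alongside the sum of log density ratios, since the class priors need not be equal; the `var_` attribute assumes scikit-learn >= 1.0):

```python
# Verify: log odds = log(prior ratio) + sum over features of log density ratios.
import numpy as np
from scipy.stats import norm
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

nb = GaussianNB().fit(X, y)
x = X[0]
manual = np.log(nb.class_prior_[1] / nb.class_prior_[0]) + sum(
    norm.logpdf(x[j], nb.theta_[1, j], np.sqrt(nb.var_[1, j]))
    - norm.logpdf(x[j], nb.theta_[0, j], np.sqrt(nb.var_[0, j]))
    for j in range(2)
)
lp = nb.predict_log_proba(x.reshape(1, -1))[0]
print(manual, lp[1] - lp[0])  # the two log odds should match
```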
stochastic_monk on Apr 5, 2018
I simply clarified that the question was about computational learning theory, a subfield largely started by Leslie Valiant in the form of PAC (Probably Approximately Correct) learning. The difference in emphasis between the machine learning conferences I mentioned helps point out how practical machine learning (like ICML, matching PRML/ML/ESL) and feature extraction/representation learning (like ICLR, perhaps matching portions of both ICML and ICLR), while important, are not what the previous poster was asking about.
martincmartin on Sep 9, 2013
The Elements of Statistical Learning by Hastie, Tibshirani and Friedman, available for free online.
Pattern Recognition and Machine Learning by Chris Bishop. Very Bayesian.
Machine Learning: A Probabilistic Perspective by Kevin Murphy. Also Bayesian, although not as Bayesian as Bishop. The most recent of the three, and therefore covers a few topics not covered elsewhere, like deep learning and conditional random fields. The first few printings are full of errors and confusing passages, but that should improve before too long.
Did I miss any?
hamilton on Oct 12, 2009
From a technical point of view, The Elements of Statistical Learning, by Tibshirani, Friedman, and Hastie. Far and away the most illuminating demonstration that so many ML / AI techniques have a long-standing statistical foundation, and, essentially, everything boils down to the linear model.
laichzeit0 on Aug 26, 2018
All these are common statistical learning methods used in Data Science.
tomrod on June 19, 2020
As a subfield of computer science, the concern has of course often been algorithmic complexity and the like. But that is nascency exposed, in my view, and likely not representative of a fully mature field.
Armchair thought (not a historian of economics): I think Econometrics followed (and continues to follow) a similar evolution -- start with the goals (identification of model parameters, identifiability, KPIs), improve statistical validity and relevance, annotate dead ends or less common routes, continue on trucking.
mriley on Nov 14, 2007
I would also suggest Elements of Statistical Learning: http://www-stat.stanford.edu/~tibs/ElemStatLearn/
As well as Duda, Hart, and Stork's Pattern Classification: http://rii.ricoh.com/~stork/DHS.html
pskomoroch on Jan 15, 2010
"Mathematical Statistics and Data Analysis" by John A. Rice
"All of Statistics: A Concise Course in Statistics" by Larry Wasserman
"Pattern Recognition and Machine Learning" by Christopher M. Bishop
"The Elements of Statistical Learning" by T. Hastie et al http://www-stat.stanford.edu/~tibs/ElemStatLearn/
"Information Theory, Inference, and Learning Algorithms", David McKay http://www.inference.phy.cam.ac.uk/itprnn/book.html
"Introduction to Information Retrieval" - Manning et al. http://nlp.stanford.edu/IR-book/information-retrieval-book.h...
"The Algorithm Design Manual, 2nd Edition" - Steven Skiena http://www.algorist.com/
nilkn on Apr 28, 2019
The Goodfellow book is not complete as an academic intro, but no one book can be. It's not very useful as a practical tutorial, but no book seeking this could cover the mathematical arguments that Goodfellow's book does. I found Goodfellow's book extremely useful for consolidating a lot of handwaving that I'd seen elsewhere and putting it in a slightly more rigorous framework that I could make sense of and immediately work with as a (former) mathematician.
Goodfellow's treatment is especially useful for mathematicians and mathematically-trained practitioners who nevertheless lack a background in advanced statistics. The Elements of Statistical Learning, for instance, is extremely heavy on statistics-specific jargon, and I personally found it far more difficult to extract useful insights from that book than I did from Goodfellow's.
tom_b on Jan 25, 2014
For fans of hard copy, I recently found that if your local (university?) library is a SpringerLink customer, you can purchase a print-on-demand copy of either book for $26.99, which includes shipping. Interior pages are in black and white (including the graphs), but that is a really cheap price for these two.
Andrew Ng's course notes from his physical class at Stanford (CS 229 - Machine Learning) are extensive and available as well at:
http://cs229.stanford.edu/materials.html
ryankupyn on Aug 4, 2020
https://web.stanford.edu/~hastie/Papers/ESLII.pdf
earl on Feb 12, 2009
http://www.vetta.org/recommended-reading/
I'd second the recommendation of Bishop if you can hack the math, and also Elements of Statistical Learning, though I wouldn't attempt to learn techniques from the latter so much as look at a very interesting mathematical take on them.
Good luck!
bravura on Feb 12, 2009
I don't understand what is impractical about Bishop. If you are looking blindly to use an off-the-shelf machine learning implementation, that's one thing. Machine Learning has been described as the study of bias. If you want to understand when to pick certain techniques, and develop appropriate biases, then read Bishop.
"The Elements of Statistical Learning" by Hastie, Tibshirani and Friedman gives more of a statistician's approach. The treatment is simply less broad, and also more dated.
You can also look at Andrew Ng's video lectures: http://www.youtube.com/watch?v=UzxYlbK2c7E
He is very well-respected in the field. For certain students, watching a lecture may be preferable to reading a book.
_delirium on July 24, 2010
For languages, frameworks, tools, etc., I almost never read books. But I do read technical books on concepts, ideas, research areas, techniques, etc. So e.g. in computational statistics / ML, I've never bought/read a book like "R in Action", but I do own "The Elements of Statistical Learning".
me2i81 on June 4, 2008
On the other hand, going right into code examples is useful, including jumping right into getting real data downloaded and worked on.
Drbble on Mar 27, 2012
/slightly bitter former "academic AI" student.
It's not really Academic vs Industry, though. It is Agents and Logic vs Statistics.
The standard text is Elements of Statistical Learning. It is grad-level and mostly theory. For goofing around in Python, there's Programming Collective Intelligence.