HackerNews Readings
40,000 HackerNews book recommendations identified using NLP and deep learning


An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)

Gareth James, Daniela Witten, et al.

4.8 on Amazon

72 HN comments

Mastering Regular Expressions

Jeffrey E. F. Friedl

4.6 on Amazon

72 HN comments

Game Programming Patterns

Robert Nystrom

4.8 on Amazon

68 HN comments

Steve Jobs

Walter Isaacson, Dylan Baker, et al.

4.6 on Amazon

67 HN comments

Machine Learning: A Probabilistic Perspective (Adaptive Computation and Machine Learning series)

Kevin P. Murphy

4.3 on Amazon

66 HN comments

The Cuckoo's Egg: Tracking a Spy Through the Maze of Computer Espionage

Cliff Stoll, Will Damron, et al.

4.7 on Amazon

61 HN comments

Programming: Principles and Practice Using C++ (2nd Edition)

Bjarne Stroustrup

4.5 on Amazon

58 HN comments

Ghost in the Wires: My Adventures as the World’s Most Wanted Hacker

Kevin Mitnick, William L. Simon, et al.

4.6 on Amazon

55 HN comments

Modern Operating Systems

Andrew Tanenbaum and Herbert Bos

4.3 on Amazon

54 HN comments

Head First Design Patterns: Building Extensible and Maintainable Object-Oriented Software 2nd Edition

Eric Freeman and Elisabeth Robson

4.7 on Amazon

52 HN comments

The Singularity Is Near: When Humans Transcend Biology

Ray Kurzweil, George Wilson, et al.

4.4 on Amazon

51 HN comments

The Everything Store: Jeff Bezos and the Age of Amazon

Brad Stone, Pete Larkin, et al.

4.6 on Amazon

51 HN comments

Compilers: Principles, Techniques, and Tools

Alfred Aho, Monica Lam, et al.

4.1 on Amazon

50 HN comments

Test Driven Development: By Example

Kent Beck

4.4 on Amazon

45 HN comments

Patterns of Enterprise Application Architecture

Martin Fowler

4.5 on Amazon

43 HN comments


magoghm on Jan 1, 2019

I would also include some books about statistics.
Two excellent introductory books are:

Statistical Rethinking https://www.amazon.com/Statistical-Rethinking-Bayesian-Examp...

An Introduction to Statistical Learning http://www-bcf.usc.edu/~gareth/ISL/

beccaaf1229 on Sep 2, 2020

This book is awesome! How does this compare to Introduction to Statistical Learning or Elements of Statistical Learning? Other than the addition of code?

Thriptic on Oct 1, 2018

+1 for Elements. I started with Introduction to Statistical Learning and then graduated to Elements as I learned more and grew more confident. Those are fantastic books.

markovbling on Dec 26, 2016

You should start with "Introduction to Statistical Learning", which is the baby brother of "Elements of Statistical Learning" (arguably THE reference book). It's easy to follow and has examples in R, a functional language.

Plus, it's free!

http://www-bcf.usc.edu/~gareth/ISL/

larrydag on Aug 12, 2017

Two good ebooks. Go well with R.

Introduction to Statistical Learning http://www-bcf.usc.edu/~gareth/ISL/

Elements of Statistical Learning https://web.stanford.edu/~hastie/ElemStatLearn/

jphamtastic on Jan 17, 2018

For someone who might want a higher-level primer, Introduction to Statistical Learning is also great.

http://www-bcf.usc.edu/~gareth/ISL/

exox on Apr 7, 2016

Introduction To Statistical Learning:

http://www-bcf.usc.edu/~gareth/ISL/

Is an excellent statistical learning reference.

lovelearning on May 27, 2015

"Introduction to Statistical Learning" (http://www-bcf.usc.edu/~gareth/ISL/) gives an excellent foundation for all machine learning approaches.

bumby on Dec 31, 2020

Dang, I was really hoping to find examples of the MCMC methods in Ch. 8.

A strong point in the "Introduction to Statistical Learning" by the authors is that each chapter ends with example programs in R (albeit with a fair number of typos).

teruakohatu on May 5, 2020

The Elements of Statistical Learning and Introduction to Statistical Learning are THE textbooks for an introduction to statistical methods of data science. They are free and very high quality. Most of my class didn't buy it, but many including myself did.

sgillen on July 24, 2020

ESL is an excellent reference; if you are looking for an introduction instead, I can recommend "An Introduction to Statistical Learning" by (most of?) the same authors.

Breza on Jan 7, 2020

I can't suggest Introduction to Statistical Learning enough, it's a fantastic book! I loaned my copy to another data scientist because I didn't want to hog such a valuable resource.

sn9 on June 10, 2017

Introduction to Statistical Learning is free and quite good: http://www-bcf.usc.edu/~gareth/ISL/

Follow it up with Elements of Statistical Learning by three of the same authors for more advanced stuff.

DrNuke on Feb 11, 2017

The free book An Introduction to Statistical Learning is your first way to go here: http://www-bcf.usc.edu/~gareth/ISL/

deng on Apr 28, 2019

Since the site mentions "An Introduction to Statistical Learning":

The first book on statistical learning by Hastie, Tibshirani and Friedman, which is absolutely terrific, is freely available for download:

The Elements of Statistical Learning

http://web.stanford.edu/~hastie/ElemStatLearn/

ivanech on Jan 10, 2020

An interesting note: Trevor Hastie is an author on this paper. The crowd around here probably knows him best for books he co-wrote: The Elements of Statistical Learning (2001) and An Introduction to Statistical Learning (2013).

matchmike1313 on Oct 29, 2017

It's never too late. Maybe apart from some age bias in the field (but that mostly occurs closer to 40). I would suggest starting with a good foundation in stats / modeling; this is an amazing book on that: An Introduction to Statistical Learning by Robert Tibshirani and Trevor Hastie.

neaden on June 9, 2017

What's your background and what exactly do you mean by modern methods? An Introduction to Statistical Learning is good and you can download the pdf: http://www-bcf.usc.edu/~gareth/ISL/ it assumes you have a pretty decent background in mathematics though.

throw_away_777 on Oct 7, 2016

Introduction to Statistical Learning http://www-bcf.usc.edu/~gareth/ISL/ is a great text for beginners interested in machine learning. It is designed to be accessible (there is a more advanced book covering the same topics) but is still quite comprehensive in terms of machine learning basics.

Mxtetris on Apr 1, 2020

James, Witten, Hastie, and Tibshirani, "An Introduction to Statistical Learning."
Available for download: http://faculty.marshall.usc.edu/gareth-james/ISL/

Taylor and Karlin, "An Introduction to Stochastic Modeling"

rusty-rust on May 1, 2020

Large parts of this blog are a straight copy-paste from "An Introduction to Statistical Learning" by Gareth James et al.

kafkaesq on Apr 9, 2016

Yes, it's a lot of turf to cover. Until very recently, most of it was at best, barely touched on in a typical undergraduate curriculum. But here's one source you'll see cited a lot:

An Introduction to Statistical Learning

http://www-bcf.usc.edu/~gareth/ISL/

cs702 on June 6, 2015

These slide tutorials are excellent: engaging and friendly but still rigorous enough that they can be used as reference materials. They're a great companion to "Introduction to Statistical Learning" and "The Elements of Statistical Learning" by Hastie, Tibshirani, et al. The author of these tutorials is Andrew Moore, Dean of the School of Computer Science at Carnegie Mellon.

SaxonRobber on Jan 1, 2020

100-page ML book for a brisk tour
Deep Learning (Goodfellow)
Introduction to Statistical Learning

jimmy-dean on Jan 1, 2019

Oof, those are all dense reads for a newcomer... For a first dip into the waters I usually suggest Introduction to Statistical Learning. Then from there move into PRML or ESL.
Were you first introduced to core ML through Bishop? +1 for a solid reading list.

grayclhn on July 21, 2014

Two free books that I haven't seen mentioned, from more of a stats perspective:

* James, Witten, Hastie, and Tibshirani's An Introduction to Statistical Learning, with Applications in R

http://www-bcf.usc.edu/~gareth/ISL/

* Hastie, Tibshirani, and Friedman's Elements of Statistical Learning (more advanced)

http://statweb.stanford.edu/~tibs/ElemStatLearn/

bssrdf on Dec 23, 2016

Still debating whether I should start with An Introduction to Statistical Learning (ISL) or Bishop's Pattern Recognition and Machine Learning (PRML). I really don't like using R (always a python person). Both have rave reviews on Amazon. Any thoughts?

cs702 on Jan 12, 2016

The lecturers here, Hastie and Tibshirani, are also the authors of the classic text book, "Introduction to Statistical Learning," probably the best introduction to machine/statistical learning I have ever read.[1]

I highly recommend the book and this online course, both of which are FREE.

Hastie and Tibshirani's other book, "The Elements of Statistical Learning," is also excellent but far more theoretical, and best for experienced practitioners who want to use it as a reference guide.[2]

--

[1] http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Sixth%20Printing.p...
[2] http://statweb.stanford.edu/~tibs/ElemStatLearn/

scythmic_waves on Dec 31, 2020

To echo some of the other comments here, this is a text that's really only appropriate for those who already have a graduate-level grasp of statistics. I love that it's freely available, but ESL is not an introductory text.

For example, here's a screenshot from the introductory chapter (pg. 26): [1]. The authors expect you to already be familiar with matrix analysis applied to statistics.

An Introduction to Statistical Learning (ISL) [2] is aimed at those with a high school level of math.

[1] https://imgur.com/q0NeqdR
[2] https://statlearning.com/book.html

kgwgk on July 6, 2018

I think Efron & Hastie may be a bit too advanced and terse for the OP. They cover many things in not so many pages and "our intention was to maintain a technical level of discussion appropriate to Masters’-level statisticians or first-year PhD students."

But given that they distribute the PDF for free it's worth checking out. Hastie, Tibshirani & Friedman's The Elements of Statistical Learning and the watered-down and more practical Introduction to Statistical Learning are also nice. All of them can be downloaded from https://web.stanford.edu/~hastie/pub.htm

evanpw on Dec 31, 2020

"An Introduction to Statistical Learning" was written by two of the same authors, and is explicitly meant to be a lower-level introduction to the same ideas: https://statlearning.com/ISLR%20Seventh%20Printing.pdf

psyklic on Sep 19, 2018

A skilled software engineer who is good at math could probably take some MOOCs, build a portfolio, and do well at data science interviews that emphasize coding. Many interviews just ask ISL-level questions (Introduction to Statistical Learning), which is studyable over a few months. On the other hand, it would be significantly more difficult for a new-to-coding statistician to become an excellent coder in a short time ... although I've seen some people do it.

eyeball on Sep 11, 2018

The Society of Actuaries is now testing basic analytics and R programming as part of its exam system. The Introduction to Statistical Learning book is on the exam syllabus.

altairiumblue on Feb 3, 2019

She really doesn't need to spend on anything. All of the good beginner resources are free:

- Hands on programming with R - https://rstudio-education.github.io/hopr/ - Teaches basic programming concepts like working with variables, if-else statements, loops etc in R. If your friend doesn't have a technical background, this is a good place to start.

- R for Data Science - https://r4ds.had.co.nz/ - Teaches you to work with the most commonly used libraries for manipulating and visualising data.

- Introduction to Statistical Learning with R - http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing... - A great place to start with some of the theory. If she has a background in statistics, it should be quite accessible.

- Caret package site - http://topepo.github.io/caret/index.html - shows how to use one of the popular packages for machine learning.

QasimK on Jan 17, 2018

Does anyone know how Elements of Statistical Learning compares to Introduction to Statistical Learning which is also from the same authors?

psv1 on Dec 31, 2019

Good free resources:

- MIT: Big Picture of Calculus

- Harvard: Stats 110

- MIT: Matrix Methods in Data Analysis, Signal Processing, and Machine Learning

If any of these seem too difficult - Khan Academy Precalculus (they also have Linear Algebra and Calculus material).

This gives you a math foundation. Some books more specific to ML:

- Foundations of Data Science - Blum et al.

- Elements of Statistical Learning - Hastie et al. The simpler version of this book - Introduction to Statistical Learning - also has a free companion course on Stanford's website.

- Machine Learning: A Probabilistic Perspective - Murphy

That's a lot of material to cover. And at some point you should start experimenting and building things yourself, of course. If you're already familiar with Python, the Data Science Handbook (Jake Vanderplas) is a good guide through the ecosystem of libraries that you would commonly use.

Things I don't recommend - Fast.ai, Goodfellow's Deep Learning Book, Bishop's Pattern Recognition and ML book, Andrew Ng's ML course, Coursera, Udacity, Udemy, Kaggle.

RogerL on July 21, 2015

A lot of this is done in discrete math. You know, the actual probability is defined by this integral, but there is no closed form solution to the integral, so we do sums to find the approximate answer. Anyone can understand sums. And, it's probabilities, so the sums must equal one. Not that hard, right ;)

It sure helps to understand the integral equations, especially if you want to read the original literature. But realistically you are going to need to understand summing, normalizing, algorithms for clustering, and so on. You probably don't want to write your own numerical code anyway; someone else did it, and they handled all the edge cases that a naive implementation misses.

You can find PDFs of the James, Witten, Hastie, Tibshirani book "An Introduction to Statistical Learning" [1]. Scroll on through - there is nothing intimidating math wise. All the heavy lifting is left to R.

Jump in, the water is fine!

[1] http://web.stanford.edu/~hastie/pub.htm
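The sum-based approximation described above can be sketched in a few lines of plain Python (a hypothetical illustration, not taken from the book): evaluate a density on a discrete grid, then normalize so the masses sum to one.

```python
import math

# Stand-in for a density whose integral we approximate by a sum:
# here a standard normal, evaluated on a discrete grid.
def density(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

xs = [-5 + 0.01 * i for i in range(1001)]   # grid covering [-5, 5]
weights = [density(x) for x in xs]

# Normalize: divide by the total so the discrete masses sum to one,
# as any probability distribution must.
total = sum(weights)
probs = [w / total for w in weights]

print(abs(sum(probs) - 1.0) < 1e-9)   # True: the masses sum to one
```

The same discretize-and-normalize pattern underlies a lot of applied probabilistic code, which is why the comment's "the sums must equal one" shortcut goes so far.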

kuusisto on Sep 12, 2018

I got my start by getting a PhD, but that's perhaps not a practical recommendation. In reality though, you might say I started learning ML by reading Mitchell in class:
https://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/mlbo...
It's dated, but it's quite approachable and does a great job explaining a lot of the fundamentals.

If you want to approach machine learning from a more statistical perspective, you could also have a look at An Introduction to Statistical Learning to start:
http://www-bcf.usc.edu/~gareth/ISL/
Or if you're more mathematically inclined than the average bear, you could jump directly into The Elements of Statistical Learning:
https://web.stanford.edu/~hastie/ElemStatLearn/

If you want something a little more interactive than a book though, you might have a look at Google's free crash course on machine learning:
https://developers.google.com/machine-learning/crash-course/...
I checked it out briefly maybe six months ago, and it seemed pretty good. It seemed a bit focused on TensorFlow and some other tools, but that's okay.

fantispug on Jan 1, 2020

This may make sense if you want to do image processing and deep reinforcement learning. But there are lots of other domains.

For tabular data (which is probably most relevant in Pharma, and probably the best place to start), Introduction to Statistical Learning by James et al. and Max Kuhn's Applied Predictive Modeling cover a lot of the classical techniques.

For univariate time series forecasting "Forecasting Principles and Practice" is great.

For natural language processing foundations Jurafsky's Speech and Language Processing is broadly recommended; for cutting edge natural language processing Stanford's CS224n is great: http://web.stanford.edu/class/cs224n/

indigentmartian on Jan 21, 2017

Andrew Ng's Coursera course simply titled "Machine Learning" is good - it addresses the mathematics of fundamental algorithms and concepts while giving practical examples and applications: https://www.coursera.org/learn/machine-learning

Regarding books, there are many very high quality textbooks available (legitimately) for free online:

Introduction to Statistical Learning (James et al., 2014) http://www-bcf.usc.edu/~gareth/ISL/

the above book shares some authors with the denser and more in-depth/advanced

The Elements of Statistical Learning (Hastie et al., 2009) http://statweb.stanford.edu/~tibs/ElemStatLearn/

Information Theory: Inference & Learning Algorithms (MacKay, 2003) http://www.inference.phy.cam.ac.uk/itila/p0.html

Bayesian Reasoning & Machine Learning (Barber, 2012) http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=...

Deep Learning (Goodfellow et al., 2016) http://www.deeplearningbook.org/

Reinforcement Learning: An Introduction (Sutton & Barto, 1998) http://webdocs.cs.ualberta.ca/~sutton/book/ebook/the-book.ht...

^^ the above books are used on many graduate courses in machine learning and are varied in their approach and readability, but go deep into the fundamentals and theory of machine learning. Most contain primers on the relevant maths, too, so you can either use these to brush up on what you already know or as a starting point look for more relevant maths materials.

If you want more practical books/courses, more machine-learning focussed data science books can be helpful. For trying out what you've learned, Kaggle is great for providing data sets and problems.

kashifr on July 21, 2014

From my own journey I would say that a good place to start for graphical models might be "Bayesian Reasoning and Machine Learning" by Barber. It's free (http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=...). I haven't read through it, but I've heard good things. However, it doesn't cover some basic things like SVMs, RVMs, neural networks...

For those I'd suggest "Pattern Recognition and Machine Learning" by Bishop. I've read through this and it's really well organized and thought out. For more mathematically advanced ML stuff I'd suggest "Foundations of Machine Learning" by Mohri. For a good reference for anything else I'd suggest "Machine Learning: A Probabilistic Perspective" by Murphy. For more depth on graphical models look at "Probabilistic Graphical Models: Principles and Techniques" by Koller.

On the NLP front there's the standard texts "Speech and Language Processing" by Jurafsky and "Foundations of Statistical Natural Language Processing" by Manning.

I also like "An Introduction to Statistical Learning" by James, Witten, Hastie and Tibshirani.

cs702 on June 29, 2018

If you have at least some coding experience and you are interested in the practical aspects of ML/DL (i.e., you want to learn the how-to, not the why or the whence), my recommendation is to start with the fast.ai courses by Jeremy Howard (co-author of this "Matrix Calculus" cheat sheet) and Rachel Thomas[a]:

* fast.ai ML course: http://forums.fast.ai/t/another-treat-early-access-to-intro-...

* fast.ai DL course: part 1: http://course.fast.ai/ part 2: http://course.fast.ai/part2.html

The fast.ai courses spend very little time on theory, and you can follow the videos at your own pace.

Books:

* The best books on ML (excluding DL), in my view, are "An Introduction to Statistical Learning" by James, Witten, Hastie and Tibshirani, and "The Elements of Statistical Learning" by Hastie, Tibshirani and Friedman. The Elements arguably belongs on every ML practitioner's bookshelf -- it's a fantastic reference manual.[b]

* The only book on DL that I'm aware of is "Deep Learning," by Goodfellow, Bengio and Courville. It's a good book, but I suggest holding off on reading it until you've had a chance to experiment with a range of deep learning models. Otherwise, you will get very little use out of it.[c]

Good luck!

[a] Scroll down on this page for their bios: http://course.fast.ai/about.html

[b] Introduction to Statistical Learning: http://www-bcf.usc.edu/~gareth/ISL/ The Elements of Statistical Learning: https://web.stanford.edu/~hastie/ElemStatLearn/

[c] http://www.deeplearningbook.org/

joaovictortr on Feb 26, 2018

Another very interesting resource on the subject is the book Introduction to Statistical Learning by Gareth James et al [1].

The book introduces the foundational concepts of statistical learning (classification, regression, cross-validation) and algorithms such as support vector machines.

It is also available as a PDF on the website [1].

[1] http://www-bcf.usc.edu/~gareth/ISL/
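For a flavor of the classification-plus-cross-validation workflow described above, here is a rough sketch only; the book's own labs are in R, and this uses Python's scikit-learn instead. It fits a support vector classifier and estimates its accuracy with 5-fold cross-validation.

```python
# Sketch of the classification + cross-validation workflow, using
# scikit-learn (assumed installed; the book itself works in R).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

clf = SVC(kernel="rbf", C=1.0)             # support vector classifier
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation

print("fold accuracies:", scores.round(3))
print("cross-validated accuracy: %.3f" % scores.mean())
```

The averaged held-out accuracy is the kind of error estimate the book builds its model-assessment chapters around.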

amrrs on Aug 28, 2017

Statistics and Probability - For a non-math background, Openintro.org with R and SAS labs is a good one. Khan Academy videos on the same again make a lot of concepts easier.

http://www.r-bloggers.com/in-depth-introduction-to-machine-l...
Introduction to Statistical Learning http://www-bcf.usc.edu/~gareth/ISL/ (by Rob Tibshirani and Trevor Hastie; free, I believe). For more depth, Elements of Statistical Learning by the same authors.

Linear Algebra (Andrew Ng's section on this in Introduction to Machine Learning is short and crisp)

If you're not scared by derivatives, you can check them out. But you can easily survive and even excel as a data scientist or ML practitioner with these.

uoaei on Apr 30, 2020

Introduction to Statistical Learning

https://faculty.marshall.usc.edu/gareth-james/ISL/

Elements of Statistical Learning

https://web.stanford.edu/~hastie/ElemStatLearn/

Machine Learning: A Probabilistic Perspective

https://mitpress.mit.edu/books/machine-learning-1

psv1 on Jan 1, 2020

> If you like books and you want to deeply understand ML techniques I'd suggest jumping straight into "Introduction to Statistical Learning" and only learning calculus/stats/matrix methods (linear algebra) as you need them (you really don't need much from them in practice).

This doesn't work. ISL is good, but it aims to be accessible by excluding most of the math. So if you go over it, you'll neither "deeply understand ML techniques", nor will you encounter enough math that you can learn along the way as you suggest.

pjmorris on Dec 31, 2019

There's a MOOC that uses 'Introduction to Statistical Learning' by the authors of 'Elements of Statistical Learning', here: https://lagunita.stanford.edu/courses/HumanitiesSciences/Sta...

nafizh on Apr 28, 2019

The Introduction to Statistical Learning book is great.

But, and I think this is not stated enough, there is a big difference between statistical learning and machine learning in terms of how you approach a problem. The subject matter might be the same, but the approach to solving problems is different: one is a 'statistics' approach, the other a 'CS' approach. Depending on your background, you might like one but not the other.

You can learn more about what I am talking about by reading this famous piece from Leo Breiman [0].

Personally, I feel I was fortunate enough to learn ML from a so-called 'CS' perspective through Andrew Ng's course on Coursera.

0. https://projecteuclid.org/download/pdf_1/euclid.ss/100921372...

grayclhn on June 1, 2015

I wouldn't use it as a stats book. The coverage is very spotty and seems a bit old fashioned. Lots of different test statistics, not too much intuition from what I could see, and too little coverage of most things to be useful (ie the bootstrap gets a few paragraphs, Bayesian stats gets just a little more, etc).

I don't know of good stats books focusing on Python, but I'm sure there are plenty. "An Introduction to Statistical Learning" is free online [1] but it emphasizes R and has very little overlap, so I don't know if it addresses your needs, but it's very good.

1: http://www-bcf.usc.edu/~gareth/ISL/

madenine on July 22, 2019

There’s a misconception out there about the data science skills gap - the truth is there is a huge demand for highly skilled data scientists, a big demand for data and ml literate developers, and a moderate demand for entry level data scientists.

These resources from google and courses like Fast AI are great for getting devs up to speed so they can meaningfully contribute to data science projects - filling that big demand for data + ml literate devs, especially internally. They’re not designed to get people jobs (disclosure, getting people jobs in data science is what we do at thisismetis.com)

If you want to go deeper? The open source data science masters is a good set of resources[0]. The first few sections of Goodfellow’s deep learning book are a great crash course in ML math/stats theory[1]. Introduction to Statistical Learning is a staple in most people’s library[2]. There’s a glut of intro level data science content out there on the internet, but intermediate to advanced stuff usually means putting in serious effort or breaking out your checkbook and going back to school (whether traditional or otherwise).

[0]http://datasciencemasters.org/
[1]https://www.deeplearningbook.org/
[2]http://faculty.marshall.usc.edu/gareth-james/ISL/

stared on Jan 8, 2018

For intro I recommend "6.1.3 Choosing the Optimal Model" from "An Introduction to Statistical Learning" http://www-bcf.usc.edu/~gareth/ISL/
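That section is about picking model flexibility by estimated test error rather than training error. As a toy illustration (my sketch, not the book's, assuming NumPy): fit polynomials of increasing degree and keep the one with the lowest validation-set error.

```python
# Choose model flexibility (polynomial degree) by validation error,
# not training error -- the idea behind "Choosing the Optimal Model".
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 1.0, 200)  # true model: degree 2

train_x, val_x = x[:150], x[150:]
train_y, val_y = y[:150], y[150:]

val_errors = {}
for degree in range(1, 8):
    coefs = np.polyfit(train_x, train_y, degree)       # fit on training data
    pred = np.polyval(coefs, val_x)                    # predict on held-out data
    val_errors[degree] = np.mean((val_y - pred) ** 2)  # validation MSE

best = min(val_errors, key=val_errors.get)
print("chosen degree:", best)
```

Training error always falls as degree grows; the validation error bottoms out near the true degree, which is exactly the distinction the ISL section draws.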

Notre1 on Sep 12, 2016

I think a good book that is closer to Andrew Ng's course would be An Introduction to Statistical Learning (ISL)[1].

Depending on your learning style, Data Science from Scratch[2] might be another good option.

BTW, neither of these uses Octave like Andrew Ng's course does. The first one uses R and the second uses Python.

[1]: http://www-bcf.usc.edu/~gareth/ISL/
[2]: http://joelgrus.com/2015/04/26/data-science-from-scratch-fir...

misframer on May 8, 2015

An Introduction to Statistical Learning [0] is also good. It's a little less technical than The Elements of Statistical Learning. We used it for our statistical learning course at my university. The full PDF is available for free as well [1].

[0] http://www-bcf.usc.edu/~gareth/ISL/

[1] http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Fourth%20Printing....

auraham on July 24, 2020

I have Deep Learning with Python (Chollet, first edition) and Hands-On Machine Learning (Géron, first edition). Both books are highly recommended.

Introduction to Statistical Learning is also available for free online:

http://faculty.marshall.usc.edu/gareth-james/ISL/

Although I only read a few chapters from that book, I really like it (but I would have preferred a Python version of the book).

Personally, if you have to pick three books from the list, you can start with these three options.

wenc on June 25, 2021

* Fooled By Randomness (NN Taleb): Taleb is a complicated personality, but this book gave me a heuristic for thinking about long-tails and uncertain events that I could never have derived myself from a probability textbook.

* Designing Data Intensive Applications (M Kleppmann): Provided a first-principles approach for thinking about the design of modern large-scale data infrastructure. It's not just about assembling different technologies -- there are principles behind how data moves and transforms that transcend current technology, and DDIA is an articulation of those principles. After reading this, I began to notice general patterns in data infrastructure, which helped me quickly grasp how new technologies worked. (most are variations on the same principles)

* Introduction to Statistical Learning (James et al) and Applied Predictive Modeling (Kuhn et al). These two books gave me a grand sweep of predictive modeling methods pre-deep learning, methods which continue to be useful and applicable to a wider variety of problem contexts than AI/Deep Learning. (neural networks aren't appropriate for huge classes of problems)

* High Output Management (A Grove): oft-recommended book by former Intel CEO Andy Grove on how middle management in large corporations actually works, from promotions to meetings (as a unit of work). This was my guide to interpreting my experiences when I joined a large corporation, and boy was it accurate. It gave me a language and a framework for thinking about what was happening around me. I heard this was 1 of 2 books Tobi Luetke read to understand management when he went from being a technical person to CEO of Shopify (the other book being Cialdini's Influence). Hard Things about Hard Things (B Horowitz) is a different take that is also worth a read to understand the hidden--but intentional--managerial design of a modern tech company. These are some of the very few books written by practitioners--rather than management gurus--that I've found to track pretty closely with my own real life experiences.

alexanderchr on Mar 1, 2020

I can really recommend Introduction to Statistical Learning by James, Witten, Hastie and Tibshirani if you are looking for something that covers the theory without going into too much detail.

There is also Elements of Statistical Learning by the same authors if you are looking for something more rigorous. I haven't read very much of it but it is supposed to be very good too.

_fullpint on Aug 12, 2019

OP mentions Elements of Statistical Learning — it’s a pretty heavy book that gets mentioned quite a bit.

From the same Stanford group there is Introduction to Statistical Learning. It’s a good intro to machine learning as a whole.

Far too often people want to jump directly into deep learning. I’d shy away from that; a better understanding of ML as a discipline makes the application of DL much more productive.

Edit:
Also would like to add a lot of people want to use DL for imaging stuff. Take some time to understand Digital Image Processing as well. It’s a good introduction to convolution and filtering. As well as just understanding what an image is and what can be done with it!

This is just sort of advice from my path.

The second book they mention also has some pretty heavy stuff involving probability and probability models. If you can take some time to understand automata and their applications, such as Hidden Markov Models, that’ll be a big help.

Also, you mentioned that you never took a formal algorithms course. While that isn’t strictly necessary, since you probably won’t be building anything from scratch, learning some dynamic programming methods is very helpful for understanding the FFT and its role in convolution methods, and also for how some of these hidden probability models are evaluated efficiently.

fgimenez on June 11, 2020

- Norvig's AI: Doesn't have much deep learning, but you get through it and understand the expansiveness of the field.

- Algorithms - Papadimitriou and Vazirani: I had a professor who described this as a poetry book about algorithms. An alternative is Sipser.

- An Introduction to Statistical Learning: This is like a diet version of Elements of Statistical Learning; it is much more approachable and pragmatic.

- Janeway's Immunobiology - De facto standard of immunology. Great.

- SICP: duh

- Principles of Data Integration: This is more because the subject matter is so important and nobody really has studied fundamentals. Did you know general data integration is AI-complete? If 99% of work in AI was spent on data integration, the field would move so much faster.

jochenleidner on Aug 28, 2017

1. You can get a long way with high school calculus and probability theory.

2. Regarding books, I second the late David MacKay's "Information Theory, Inference and Learning Algorithms" and the second edition of "Elements of Statistical Learning" by Tibshirani et al. (there's also a more accessible version of a subset of the material, targeting MBA students: An Introduction to Statistical Learning by James et al.). Duda/Hart/Stork's Pattern Classification (2nd ed.) is also great.
The self-published volume by Abu-Mostafa/Magdon-Ismail/Lin, Learning from Data: A Short Course is impressive, short and useful for self-study.

3. Wikipedia is surprisingly good at providing help, and so is Stack Exchange, which has a statistics sub-forum, and of course there are many online MOOC courses on statistics/probability and more specialized ones on machine learning.

4. After that you will want to consult conference papers and online tutorials on particular models (k-means, Ward/HAC, HMM, SVM, perceptron, MLP, linear and logistic regression, kNN, multinomial naive Bayes, ...).
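To make one of those models concrete, here is a from-scratch sketch of kNN, the simplest entry on that list (the two-cluster dataset below is made up purely for illustration):

```python
import math
from collections import Counter

def knn_predict(train, labels, point, k=3):
    """Classify `point` by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train)),
                     key=lambda i: math.dist(train[i], point))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Tiny made-up 2-D dataset: two well-separated clusters.
train = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train, labels, (2, 2)))  # → a
print(knn_predict(train, labels, (7, 9)))  # → b
```

The whole model is "store the training data and vote", which is why kNN is usually the first stop before the parametric models on that list.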

PandabobonMar 20, 2015

I've often seen "An introduction to statistical learning" [1] and "Elements of statistical learning" [2] cited as good resources for statistical inference and machine learning. The former is more of an undergrad text, while the latter seems to be aimed at graduate students. Both books are available free online.

[1]: http://www-bcf.usc.edu/~gareth/ISL/
[2]: http://statweb.stanford.edu/~tibs/ElemStatLearn/

crdbonFeb 23, 2015

Yes and no.

Exporting visit trends over 5 landing pages and a month? Sure.

Exporting page views for 100,000 products, each of which got 5-100 views? Then that 5% sample is going to exclude most products. An unsampled export is, however, necessary if you're trying to determine how each product category is really performing.
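A quick simulation makes the point (the traffic numbers below are invented, following the 5-100 views per product shape described above): with a 5% view-level sample, a meaningful share of low-traffic products disappears entirely, and a much larger share is left with at most one sampled view.

```python
import random

random.seed(0)
n_products = 10_000
# Invented traffic: each product gets between 5 and 100 page views.
views = [random.randint(5, 100) for _ in range(n_products)]

# Sample each individual page view independently with probability 5%.
sampled = [sum(random.random() < 0.05 for _ in range(v)) for v in views]

invisible = sum(s == 0 for s in sampled)   # products the sample misses entirely
thin = sum(s <= 1 for s in sampled)        # products with at most one sampled view
print(f"missed entirely: {invisible / n_products:.0%}")
print(f"at most one sampled view: {thin / n_products:.0%}")
```

Any per-product or per-category estimate extrapolated from those zero- or one-view counts is mostly noise, which is the argument for unsampled exports at this scale.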

Two alternatives I prefer to Google Analytics Premium (once you get to that size): Webtrekk, a small but competent German company whose product costs around a tenth as much per year, has a fraction of the bugs, and does reliable unsampled daily dumps (moving to hourly, I believe), although the UI is a little less intuitive; and a self-hosted Piwik instance, which frees you from worrying about data exports entirely. The truth is that modern relational databases are incredibly powerful and will scale easily, even with data like onsite-search impressions; there are multi-TB Postgres instances out there. I really suggest installing either alongside GA, or on its own, when you set up tracking.

I do agree with you that anybody involved in any kind of job that includes "analytics" in the title, or indeed most people in management, should take an intro stats course. I particularly like Introduction to Statistical Learning because of its brevity, relatively high abstraction level, and lack of maths.

mtzetonJuly 14, 2017

As a rookie trying to get into the field myself, I think there are quite a few ways to go about it.

The programming part, with R, Python, Julia, etc., seems to get the most attention here. The most important thing is to learn how to load datasets into your system of choice and work with them until you get some nice plots out. The book "R for data science"[1] seems like a good intro to this with R and the tidyverse.

Somewhat more overlooked here are the statistical models. I second the recommendation of "Introduction to Statistical Learning"[2], possibly supplemented with its big brother "Elements of Statistical Learning"[3] if you're more mathematically inclined and want more detail. I like their emphasis on starting with simple models and working your way up. I also found their discussion of how to go from data to a mathematical model very lucid.

[1] http://r4ds.had.co.nz/

[2] http://www-bcf.usc.edu/~gareth/ISL/

[3] http://web.stanford.edu/~hastie/ElemStatLearn/
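In that spirit of starting with simple models, here is a minimal Python sketch (with made-up numbers; the comment itself recommends the R/tidyverse route) of the first model ISL builds up from, one-variable least-squares regression:

```python
# Simple linear regression (y = a + b*x) by least squares, from scratch.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up predictor values
ys = [2.1, 4.1, 5.9, 8.2, 9.9]   # made-up responses, roughly y = 2x

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Closed-form least-squares estimates for slope and intercept.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x

print(f"intercept={a:.2f} slope={b:.2f}")  # → intercept=0.13 slope=1.97
```

Going from this two-line formula to multiple regression, and then to the regularized and nonlinear models, is essentially the arc both books follow.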

spectramaxonApr 28, 2019

Geron's book is more of a tutorial/cookbook, combined with important insights into the practice of machine learning. So I recommend reading Introduction to Statistical Learning (and Elements of Statistical Learning for theoretical background) before jumping into Geron's book. As engineers, I agree we need some theoretical background, but at the same time we are applying this knowledge to real-world problems. Geron's book is invaluable, and I hope he publishes more; it is a gem.

chollida1onApr 9, 2016

I wrote this comment a while ago, but I think it's still very relevant...

I wrote about this here: https://news.ycombinator.com/item?id=8767092 and here: https://news.ycombinator.com/item?id=9433316

Long story short, the biggest mistake I see people making is not actually rolling up their sleeves and learning the math.

People are often content to watch hour after hour of Udacity, Khan Academy, and Coursera videos, but the applied follow-up is where most people drop off. At the very least, any coursework should be followed up by something practical, like a Kaggle exercise, to prove that you can apply the technique you just learned. Consider the benefit of just watching videos versus doing actual applied work.

On one hand, if you just watch videos you might learn a lot, but how do you prove that to someone hiring you? On the other hand, if you sit down and spend a week attacking a Kaggle exercise, then at the very least you have something to point people to, to show that you can apply machine learning techniques.

My recommendation has always been to read the first 5 chapters of Introduction to statistical learning: http://www-bcf.usc.edu/~gareth/ISL/

and if you fly through it then sample Elements of statistical learning http://statweb.stanford.edu/~tibs/ElemStatLearn/ for the topics that you want to learn.

If Intro to Statistical Learning is too advanced, then go to Khan Academy and work your way through their statistics videos.

From my experience, you can bucket people into skill levels by looking at how they attack a new problem.

Beginners tend to start by saying they'll need a hadoop cluster and spend the next week setting up a pipeline.

Intermediate people tend to jump into R or scikit and try to model the problem with a small subset of data and the library and technique they know best.

The advanced people tend to flesh out their hypothesis first and then work out the math and then jump to modelling with a small set of data and finally move to a cluster.

esfandiaonApr 28, 2019

- Wasserman has a book called "All of statistics" that gives a lot of the background required to understand modern machine learning

- Hastie is a co-author of two machine learning books: "Elements of Statistical Learning", which is very comprehensive, and "Introduction to Statistical Learning", which is more approachable for people without much background in stats.

fantispugonJan 1, 2020

If you like books and you want to deeply understand ML techniques I'd suggest jumping straight into "Introduction to Statistical Learning" and only learning calculus/stats/matrix methods (linear algebra) as you need them (you really don't need much from them in practice).

But it's OK to start using libraries and fitting models without deeply understanding how they work, and to come back to these books later (just make sure you come back; there are lots of useful ideas in them!). In that case, I'd recommend some of the resources the parent doesn't recommend.

cashweaveronDec 31, 2020

I found that even Introduction to Statistical Learning made a few too many assumptions when I tried to work through it. I recently finished Jim Hefferon's Linear Algebra [1], and now I'm working through Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares [2] (along with a Python companion [3]). The two texts overlap, but I've found them more helpful than redundant; it's nice to hear different angles on the same topic. I'm planning to focus on statistics next, with Blitzstein and Hwang's Introduction to Probability [4], before returning to ISLR.

[1] http://joshua.smcvt.edu/linearalgebra/

[2] http://vmls-book.stanford.edu/

[3] https://ses.library.usyd.edu.au/handle/2123/21370

[4] https://projects.iq.harvard.edu/stat110/home

chollida1onMay 8, 2015

I wrote about this here: https://news.ycombinator.com/item?id=8767092 and here: https://news.ycombinator.com/item?id=9433316

Long story short, the biggest mistake I see people making is not actually rolling up their sleeves and learning the math.

People are often content to watch hour after hour of Udacity, Khan Academy, and Coursera videos, but the applied follow-up is where most people drop off. At the very least, any coursework should be followed up by something practical, like a Kaggle exercise, to prove that you can apply the technique you just learned. Consider the benefit of just watching videos versus doing actual applied work.

On one hand, if you just watch videos you might learn a lot, but how do you prove that to someone hiring you? On the other hand, if you sit down and spend a week attacking a Kaggle exercise, then at the very least you have something to point people to, to show that you can apply machine learning techniques.

My recommendation has always been to read the first 5 chapters of Introduction to statistical learning: http://www-bcf.usc.edu/~gareth/ISL/

and if you fly through it, then sample Elements of statistical learning http://statweb.stanford.edu/~tibs/ElemStatLearn/ for the topics that you want to learn.

If intro to statistical learning is too advanced, then go to Khan academy and work your way through their statistics videos.

From my experience you can bucket people into skill level by looking at how they attack a new problem.

Beginners tend to start by saying they'll need a hadoop cluster and spend the next week setting up a pipeline.

Intermediate people tend to jump into R or scikit and try to model the problem with a small subset of data and the library and technique they know best.

The advanced people tend to flesh out their hypothesis first and then work out the math and then jump to modelling with a small set of data and finally move to a cluster.

crdbonDec 2, 2015

Global, REMOTE only, full/part time/interns.

I am running a small consulting operation with clients in Australia and South East Asia.

Right now, I am signing up new data consulting clients (including a big one in February 2016) and need some help. This involves running AWS instances, figuring out APIs, building a PostgreSQL data warehouse, and finally building various machine learning products, from recommendation engines to multivariate statistics that improve clients' understanding of their business. The latter part is the fun one, but anybody who has done this will know that 90% of the work is in the data warehouse.

I write some of the code but I mostly spend my time dealing with the clients and writing the functional specs for you.

You should be familiar with relational algebra and relational databases. I recommend having a good knowledge of the topics covered in "An Introduction to Statistical Learning" (http://www-bcf.usc.edu/~gareth/ISL/). Most of our recent projects have been done with Haskell, Postgres and some bash.

You can usually name your price (per week), worst case the client will say no. Clients are all OK with never meeting you in person (in fact, my first client still hasn’t met me).

If interested, please get in touch - email is in my profile. You can expect some technical discussion and potentially a little programming test. If you can, please point to some public code you've written.

pddproonMay 16, 2016

How does this compare to, say, "Introduction to Statistical Learning" and "Elements of Statistical Learning" by Hastie et al.? As I understand it, the former is supposed to be a concise introduction to statistical concepts, while the latter offers a more rigorous treatment. Where does this book fall in between?

theggintheskyonDec 31, 2020

That's why they wrote Introduction to Statistical Learning[0] and also a video series for the same book[1]. Both books and the video lectures are a must for anyone working in machine learning and/or statistics.

[0] http://faculty.marshall.usc.edu/gareth-james/ISL/
[1] https://www.youtube.com/watch?v=5N9V07EIfIg&list=PLOg0ngHtcq...
