An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)
Gareth James, Daniela Witten, et al.
4.8 on Amazon
72 HN comments
Mastering Regular Expressions
Jeffrey E. F. Friedl
4.6 on Amazon
72 HN comments
Game Programming Patterns
Robert Nystrom
4.8 on Amazon
68 HN comments
Steve Jobs
Walter Isaacson, Dylan Baker, et al.
4.6 on Amazon
67 HN comments
Machine Learning: A Probabilistic Perspective (Adaptive Computation and Machine Learning series)
Kevin P. Murphy
4.3 on Amazon
66 HN comments
The Cuckoo's Egg: Tracking a Spy Through the Maze of Computer Espionage
Cliff Stoll, Will Damron, et al.
4.7 on Amazon
61 HN comments
Programming: Principles and Practice Using C++ (2nd Edition)
Bjarne Stroustrup
4.5 on Amazon
58 HN comments
Ghost in the Wires: My Adventures as the World’s Most Wanted Hacker
Kevin Mitnick, William L. Simon, et al.
4.6 on Amazon
55 HN comments
Modern Operating Systems
Andrew Tanenbaum and Herbert Bos
4.3 on Amazon
54 HN comments
Head First Design Patterns: Building Extensible and Maintainable Object-Oriented Software 2nd Edition
Eric Freeman and Elisabeth Robson
4.7 on Amazon
52 HN comments
The Singularity Is Near: When Humans Transcend Biology
Ray Kurzweil, George Wilson, et al.
4.4 on Amazon
51 HN comments
The Everything Store: Jeff Bezos and the Age of Amazon
Brad Stone, Pete Larkin, et al.
4.6 on Amazon
51 HN comments
Compilers: Principles, Techniques, and Tools
Alfred Aho, Monica Lam, et al.
4.1 on Amazon
50 HN comments
Test Driven Development: By Example
Kent Beck
4.4 on Amazon
45 HN comments
Patterns of Enterprise Application Architecture
Martin Fowler
4.5 on Amazon
43 HN comments
magoghm on Jan 1, 2019
Two excellent introductory books are:
Statistical Rethinking https://www.amazon.com/Statistical-Rethinking-Bayesian-Examp...
An Introduction to Statistical Learning http://www-bcf.usc.edu/~gareth/ISL/
beccaaf1229 on Sep 2, 2020
Thriptic on Oct 1, 2018
markovbling on Dec 26, 2016
Plus, it's free!
http://www-bcf.usc.edu/~gareth/ISL/
larrydag on Aug 12, 2017
Introduction to Statistical Learning http://www-bcf.usc.edu/~gareth/ISL/
Elements of Statistical Learning https://web.stanford.edu/~hastie/ElemStatLearn/
jphamtastic on Jan 17, 2018
http://www-bcf.usc.edu/~gareth/ISL/
exox on Apr 7, 2016
http://www-bcf.usc.edu/~gareth/ISL/
Is an excellent statistical learning reference.
lovelearning on May 27, 2015
bumby on Dec 31, 2020
A strong point of the authors' "Introduction to Statistical Learning" is that each chapter ends with example programs in R (albeit with a fair number of typos).
teruakohatu on May 5, 2020
sgillen on July 24, 2020
Breza on Jan 7, 2020
sn9 on June 10, 2017
Follow it up with Elements of Statistical Learning by three of the same authors for more advanced stuff.
DrNuke on Feb 11, 2017
deng on Apr 28, 2019
The first book on statistical learning by Hastie, Tibshirani and Friedman, which is absolutely terrific, is freely available for download:
The Elements of Statistical Learning
http://web.stanford.edu/~hastie/ElemStatLearn/
ivanech on Jan 10, 2020
matchmike1313 on Oct 29, 2017
neaden on June 9, 2017
throw_away_777 on Oct 7, 2016
Mxtetris on Apr 1, 2020
Available for download: http://faculty.marshall.usc.edu/gareth-james/ISL/
Taylor and Karlin, "An Introduction to Stochastic Modeling"
rusty-rust on May 1, 2020
kafkaesq on Apr 9, 2016
An Introduction to Statistical Learning
http://www-bcf.usc.edu/~gareth/ISL/
cs702 on June 6, 2015
SaxonRobber on Jan 1, 2020
Deep Learning (Goodfellow)
Introduction to Statistical Learning
jimmy-dean on Jan 1, 2019
Were you first introduced to core ML through Bishop? +1 for a solid reading list.
grayclhn on July 21, 2014
* James, Witten, Hastie, and Tibshirani's An Introduction to Statistical Learning, with Applications in R
http://www-bcf.usc.edu/~gareth/ISL/
* Hastie, Tibshirani, and Friedman's Elements of Statistical Learning (more advanced)
http://statweb.stanford.edu/~tibs/ElemStatLearn/
bssrdf on Dec 23, 2016
cs702 on Jan 12, 2016
I highly recommend the book and this online course, both of which are FREE.
Hastie and Tibshirani's other book, "The Elements of Statistical Learning," is also excellent but far more theoretical, and best for experienced practitioners who want to use it as a reference guide.[2]
--
[1] http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Sixth%20Printing.p...
[2] http://statweb.stanford.edu/~tibs/ElemStatLearn/
scythmic_waves on Dec 31, 2020
For example, here's a screenshot from the introductory chapter (pg. 26): [1]. The authors expect you to already be familiar with matrix analysis applied to statistics.
An Introduction to Statistical Learning (ISL) [2] is aimed at those with a high school level of math.
[1] https://imgur.com/q0NeqdR
[2] https://statlearning.com/book.html
kgwgk on July 6, 2018
But given that they distribute the PDF for free it's worth checking out. Hastie, Tibshirani & Friedman's The Elements of Statistical Learning and the watered-down and more practical Introduction to Statistical Learning are also nice. All of them can be downloaded from https://web.stanford.edu/~hastie/pub.htm
evanpw on Dec 31, 2020
psyklic on Sep 19, 2018
eyeball on Sep 11, 2018
altairiumblue on Feb 3, 2019
- Hands on programming with R - https://rstudio-education.github.io/hopr/ - Teaches basic programming concepts like working with variables, if-else statements, loops etc in R. If your friend doesn't have a technical background, this is a good place to start.
- R for Data Science - https://r4ds.had.co.nz/ - Teaches you to work with the most commonly used libraries for manipulating and visualising data.
- Introduction to Statistical Learning with R - http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing... - A great place to start with some of the theory. If she has a background in statistics, it should be quite accessible.
- Caret package site - http://topepo.github.io/caret/index.html - shows how to use one of the popular packages for machine learning.
QasimK on Jan 17, 2018
psv1 on Dec 31, 2019
- MIT: Big Picture of Calculus
- Harvard: Stats 110
- MIT: Matrix Methods in Data Analysis, Signal Processing, and Machine Learning
If any of these seem too difficult - Khan Academy Precalculus (they also have Linear Algebra and Calculus material).
This gives you a math foundation. Some books more specific to ML:
- Foundations of Data Science - Blum et al.
- Elements of Statistical Learning - Hastie et al. The simpler version of this book - Introduction to Statistical Learning - also has a free companion course on Stanford's website.
- Machine Learning: A Probabilistic Perspective - Murphy
That's a lot of material to cover. And at some point you should start experimenting and building things yourself of course. If you're already familiar with Python, the Data Science Handbook (Jake Vanderplas) is a good guide through the ecosystem of libraries that you would commonly use.
Things I don't recommend - Fast.ai, Goodfellow's Deep Learning Book, Bishop's Pattern Recognition and ML book, Andrew Ng's ML course, Coursera, Udacity, Udemy, Kaggle.
RogerL on July 21, 2015
It sure helps to understand the integral equations, especially if you want to read the original literature. But realistically you are going to need to understand summing, normalizing, algorithms for clustering, and so on. You probably don't want to write your own numerical code anyway; someone else did it, and they handled all the edge cases that a naive implementation misses.
You can find PDFs of the James, Witten, Hastie, Tibshirani book "An Introduction to Statistical Learning" [1]. Scroll on through - there is nothing intimidating math wise. All the heavy lifting is left to R.
Jump in, the water is fine!
[1] http://web.stanford.edu/~hastie/pub.htm
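RogerL's point about leaning on library code can be made concrete. A minimal clustering sketch, in Python with scikit-learn (an assumption on my part; the book itself works in R): the library handles initialization, restarts, and convergence checks that a naive hand-rolled k-means would miss.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated 2-D blobs of 50 points each
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# KMeans takes care of centroid seeding (k-means++), multiple restarts,
# and convergence -- the numerical edge cases a naive loop gets wrong.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
```

A few lines of library code replace pages of numerical bookkeeping, which is exactly the "jump in, the water is fine" argument.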
kuusisto on Sep 12, 2018
https://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/mlbo...
It's dated, but it's quite approachable and does a great job explaining a lot of the fundamentals.
If you want to approach machine learning from a more statistical perspective, you could also have a look at An Introduction to Statistical Learning to start:
http://www-bcf.usc.edu/~gareth/ISL/
Or if you're more mathematically inclined than the average bear, you could jump directly into The Elements of Statistical Learning:
https://web.stanford.edu/~hastie/ElemStatLearn/
If you want something a little more interactive than a book though, you might have a look at Google's free crash course on machine learning:
https://developers.google.com/machine-learning/crash-course/...
I checked it out briefly maybe six months ago, and it seemed pretty good. It seemed a bit focused on TensorFlow and some other tools, but that's okay.
fantispug on Jan 1, 2020
For tabular data (which is probably most relevant in Pharma, and probably the best place to start), Introduction to Statistical Learning by Hastie et al. and Max Kuhn's Applied Predictive Modeling cover a lot of the classical techniques.
For univariate time series forecasting "Forecasting Principles and Practice" is great.
For natural language processing foundations Jurafsky's Speech and Language Processing is broadly recommended; for cutting edge natural language processing Stanford's CS224n is great: http://web.stanford.edu/class/cs224n/
indigentmartian on Jan 21, 2017
Regarding books, there are many very high quality textbooks available (legitimately) for free online:
Introduction to Statistical Learning (James et al., 2014) http://www-bcf.usc.edu/~gareth/ISL/
the above book shares some authors with the denser and more in-depth/advanced
The Elements of Statistical Learning (Hastie et al., 2009) http://statweb.stanford.edu/~tibs/ElemStatLearn/
Information Theory: Inference & Learning Algorithms (MacKay, 2003) http://www.inference.phy.cam.ac.uk/itila/p0.html
Bayesian Reasoning & Machine Learning (Barber, 2012) http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=...
Deep Learning (Goodfellow et al., 2016) http://www.deeplearningbook.org/
Reinforcement Learning: An Introduction (Sutton & Barto, 1998) http://webdocs.cs.ualberta.ca/~sutton/book/ebook/the-book.ht...
^^ the above books are used in many graduate courses in machine learning and are varied in their approach and readability, but go deep into the fundamentals and theory of machine learning. Most contain primers on the relevant maths, too, so you can either use these to brush up on what you already know or as a starting point to look for more relevant maths materials.
If you want more practical books/courses, more machine-learning focussed data science books can be helpful. For trying out what you've learned, Kaggle is great for providing data sets and problems.
kashifr on July 21, 2014
For those I'd suggest "Pattern Recognition and Machine Learning" by Bishop. I've read through this and it's really well organized and thought out. For more mathematically advanced ML stuff I'd suggest "Foundations of Machine Learning" by Mohri. For a good reference for anything else I'd suggest "Machine Learning: A Probabilistic Perspective" by Murphy. For more depth on graphical models look at "Probabilistic Graphical Models: Principles and Techniques" by Koller.
On the NLP front there's the standard texts "Speech and Language Processing" by Jurafsky and "Foundations of Statistical Natural Language Processing" by Manning.
I also like "An Introduction to Statistical Learning" by James, Witten, Hastie and Tibshirani.
cs702 on June 29, 2018
* fast.ai ML course: http://forums.fast.ai/t/another-treat-early-access-to-intro-...
* fast.ai DL course: part 1: http://course.fast.ai/ part 2: http://course.fast.ai/part2.html
The fast.ai courses spend very little time on theory, and you can follow the videos at your own pace.
Books:
* The best books on ML (excluding DL), in my view, are "An Introduction to Statistical Learning" by James, Witten, Hastie and Tibshirani, and "The Elements of Statistical Learning" by Hastie, Tibshirani and Friedman. The Elements arguably belongs on every ML practitioner's bookshelf -- it's a fantastic reference manual.[b]
* The only book on DL that I'm aware of is "Deep Learning," by Goodfellow, Bengio and Courville. It's a good book, but I suggest holding off on reading it until you've had a chance to experiment with a range of deep learning models. Otherwise, you will get very little of use out of it.[c]
Good luck!
[a] Scroll down on this page for their bios: http://course.fast.ai/about.html
[b] Introduction to Statistical Learning: http://www-bcf.usc.edu/~gareth/ISL/ The Elements of Statistical Learning: https://web.stanford.edu/~hastie/ElemStatLearn/
[c] http://www.deeplearningbook.org/
joaovictortr on Feb 26, 2018
The book introduces the foundational concepts of statistical learning (classification, regression, cross-validation) and algorithms such as support vector machines.
It is also available on PDF at the website [1].
[1] http://www-bcf.usc.edu/~gareth/ISL/
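To make the cross-validation idea concrete, here is a hedged sketch in Python with scikit-learn (the book's own labs use R): fit a model on k-1 folds and score it on the held-out fold, rotating through all k folds.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold CV: train on 4/5 of the data, score on the held-out 1/5,
# and repeat for each of the five folds.
scores = cross_val_score(model, X, y, cv=5)
mean_accuracy = scores.mean()
```

The spread of the fold scores estimates how the model generalizes to unseen data, which is the book's central concern.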
amrrs on Aug 28, 2017
http://www.r-bloggers.com/in-depth-introduction-to-machine-l...
Introduction to Statistical Learning http://www-bcf.usc.edu/~gareth/ISL/ (by Trevor Hastie and Rob Tibshirani, among others; it's free); for more depth, Elements of Statistical Learning by the same authors.
Linear Algebra (the linear algebra review in Andrew Ng's Introduction to Machine Learning course is short and crisp)
If derivatives don't scare you, you can check those out too. But you can easily survive, and even excel, as a data scientist or ML practitioner with these.
uoaei on Apr 30, 2020
https://faculty.marshall.usc.edu/gareth-james/ISL/
Elements of Statistical Learning
https://web.stanford.edu/~hastie/ElemStatLearn/
Machine Learning: A Probabilistic Perspective
https://mitpress.mit.edu/books/machine-learning-1
psv1 on Jan 1, 2020
This doesn't work. ISL is good, but it aims to be accessible by excluding most of the math. So if you go over it, you'll neither "deeply understand ML techniques", nor will you encounter enough math that you can learn along the way as you suggest.
pjmorris on Dec 31, 2019
nafizh on Apr 28, 2019
But, and I think this is not stated enough, there is a big difference between statistical learning and machine learning in terms of how you approach a problem. The subject matter might be the same, but the approach to solving problems is different: one is a 'statistics' approach, the other a 'CS' approach. Depending on your background, you might like one but not the other.
You can learn more about what I mean by reading this famous piece by Leo Breiman [0].
Personally, I feel I was fortunate enough to learn ML from a so-called 'CS' perspective through Andrew Ng's course on Coursera.
0. https://projecteuclid.org/download/pdf_1/euclid.ss/100921372...
grayclhn on June 1, 2015
I don't know of good stats books focusing on Python, but I'm sure there are plenty. "An Introduction to Statistical Learning" is free online [1] but it emphasizes R and has very little overlap, so I don't know if it addresses your needs, but it's very good.
1: http://www-bcf.usc.edu/~gareth/ISL/
madenine on July 22, 2019
These resources from google and courses like Fast AI are great for getting devs up to speed so they can meaningfully contribute to data science projects - filling that big demand for data + ml literate devs, especially internally. They’re not designed to get people jobs (disclosure, getting people jobs in data science is what we do at thisismetis.com)
If you want to go deeper? The open source data science masters is a good set of resources[0]. The first few sections of Goodfellow’s deep learning book are a great crash course in ML math/stats theory[1]. Introduction to Statistical Learning is a staple in most people’s library[2]. There’s a glut of intro level data science content out there on the internet, but intermediate to advanced stuff usually means putting in serious effort or breaking out your checkbook and going back to school (whether traditional or otherwise).
[0]http://datasciencemasters.org/
[1]https://www.deeplearningbook.org/
[2]http://faculty.marshall.usc.edu/gareth-james/ISL/
stared on Jan 8, 2018
Notre1 on Sep 12, 2016
Depending on your learning style, Data Science from Scratch[2] might be another good option.
BTW, neither of these uses Octave like Andrew Ng's course does. The first one uses R and the second uses Python.
[1]: http://www-bcf.usc.edu/~gareth/ISL/
[2]: http://joelgrus.com/2015/04/26/data-science-from-scratch-fir...
misframer on May 8, 2015
[0] http://www-bcf.usc.edu/~gareth/ISL/
[1] http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Fourth%20Printing....
auraham on July 24, 2020
Introduction to Statistical Learning is also available for free online:
http://faculty.marshall.usc.edu/gareth-james/ISL/
Although I have only read a few chapters of the book, I really like it (though I would have preferred a Python version).
Personally, if you have to pick three books from the list, you can start with these three options.
wenc on June 25, 2021
* Designing Data Intensive Applications (M Kleppmann): Provided a first-principles approach for thinking about the design of modern large-scale data infrastructure. It's not just about assembling different technologies -- there are principles behind how data moves and transforms that transcend current technology, and DDIA is an articulation of those principles. After reading this, I began to notice general patterns in data infrastructure, which helped me quickly grasp how new technologies worked. (most are variations on the same principles)
* Introduction to Statistical Learning (James et al) and Applied Predictive Modeling (Kuhn et al). These two books gave me a grand sweep of predictive modeling methods pre-deep learning, methods which continue to be useful and applicable to a wider variety of problem contexts than AI/Deep Learning. (neural networks aren't appropriate for huge classes of problems)
* High Output Management (A Grove): oft-recommended book by former Intel CEO Andy Grove on how middle management in large corporations actually works, from promotions to meetings (as a unit of work). This was my guide to interpreting my experiences when I joined a large corporation and boy was it accurate. It gave me a language and a framework for thinking about what was happening around me. I heard this was 1 of 2 books Tobi Luetke read to understand management when he went from being a technical person to CEO of Shopify. (the other book being Cialdini's Influence). Hard Things about Hard Things (B Horowitz) is a different take that is also worth a read to understand the hidden--but intentional--managerial design of a modern tech company. These are some of the very few books written by practitioners--rather than management gurus--that I've found to track pretty closely with my own real life experiences.
alexanderchr on Mar 1, 2020
There is also Elements of Statistical Learning by the same authors if you are looking for something more rigorous. I haven't read very much of it, but it is supposed to be very good too.
_fullpint on Aug 12, 2019
From the same Stanford group there is Introduction to Statistical Learning. It's a good intro to machine learning as a whole.
Far too often people want to jump directly into deep learning. I'd shy away from that: having a better understanding of ML as a discipline makes the application of DL much more productive.
Edit:
Also would like to add a lot of people want to use DL for imaging stuff. Take some time to understand Digital Image Processing as well. It’s a good introduction to convolution and filtering. As well as just understanding what an image is and what can be done with it!
This is just sort of advice from my path.
The second book they mention also has some pretty heavy stuff involving probability and probability models. If you can take some time to understand automata and their applications, such as Hidden Markov Models, that'll be a big help.
You also mention never having taken a formal algorithms course. While that isn't strictly necessary, since you probably won't be building anything from scratch, learning some dynamic programming methods is very helpful for understanding the FFT and its impact on convolution, and also for understanding how some of these probabilistic models are evaluated efficiently.
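The FFT-convolution connection mentioned above can be shown in a few lines of NumPy: by the convolution theorem, convolving two signals equals multiplying their Fourier transforms, which turns the O(n^2) direct sum into an O(n log n) computation.

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0, 4.0])
kernel = np.array([0.25, 0.5, 0.25])

# Direct convolution: the O(n*m) sliding-window sum
direct = np.convolve(signal, kernel)

# FFT route: zero-pad both inputs to the full output length,
# multiply in the frequency domain, and transform back.
n = len(signal) + len(kernel) - 1
via_fft = np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(kernel, n), n)
```

Both routes give the same result; for long signals the FFT route is dramatically faster, which is why it underpins fast convolution in image processing.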
fgimenez on June 11, 2020
- Algorithms - Papadimitriou and Vazirani: I had a professor who described this as a poetry book about algorithms. An alternative is Sipser
- An Introduction to Statistical Learning: This is like a diet version of Elements of Statistical Learning, much more approachable and pragmatic.
- Janeway's Immunobiology - De facto standard of immunology. Great.
- SICP: duh
- Principles of Data Integration: This is more because the subject matter is so important and nobody really has studied fundamentals. Did you know general data integration is AI-complete? If 99% of work in AI was spent on data integration, the field would move so much faster.
jochenleidner on Aug 28, 2017
2. Regarding books, I second the late David MacKay's "Information Theory, Inference and Learning Algorithms" and the second edition of "Elements of Statistical Learning" by Tibshirani et al. (there's also a more accessible version of a subset of the material, targeting MBA students, called An Introduction to Statistical Learning, by James et al.). Duda/Hart/Stork's Pattern Classification (2nd ed.) is also great.
The self-published volume by Abu-Mostafa/Magdon-Ismail/Lin, Learning from Data: A Short Course is impressive, short and useful for self-study.
3. Wikipedia is surprisingly good at providing help, and so is Stack Exchange, which has a statistics sub-forum, and of course there are many online MOOC courses on statistics/probability and more specialized ones on machine learning.
4. After that you will want to consult conference papers and online tutorials on particular models (k-means, Ward/HAC, HMM, SVM, perceptron, MLP, linear and logistic regression, kNN, multinomial naive Bayes, ...).
Pandabob on Mar 20, 2015
[1]: http://www-bcf.usc.edu/~gareth/ISL/
[2]: http://statweb.stanford.edu/~tibs/ElemStatLearn/
crdb on Feb 23, 2015
Exporting visit trends over 5 landing pages and a month? Sure.
Exporting page views for 100,000 products, each of which got 5-100 views? Then that 5% sample is going to exclude most products. The latter approach is, however, necessary if you're trying to determine how each product category is really performing.
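That sampling effect is easy to check with a rough simulation (all numbers here are illustrative, not taken from any real analytics export): products at the low end of the view range are very likely to vanish from a 5% sample entirely.

```python
import random

random.seed(42)
n_products = 10_000
# Each product gets a view count somewhere in the 5-100 range
view_counts = {pid: random.randint(5, 100) for pid in range(n_products)}

# Keep each individual page view with probability 5%; a product survives
# in the sample only if at least one of its views is kept.
sampled = {
    pid for pid, k in view_counts.items()
    if any(random.random() < 0.05 for _ in range(k))
}
missing = n_products - len(sampled)

# Focus on the low-traffic tail: products with at most 10 views
low_traffic = [pid for pid, k in view_counts.items() if k <= 10]
missing_low = sum(1 for pid in low_traffic if pid not in sampled)
missing_low_frac = missing_low / len(low_traffic)
```

In runs like this, well over half of the products with ten or fewer views disappear from the sample, which is exactly the category-level distortion described above.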
Two alternatives I prefer to Google Analytics Premium (once you get to that size): Webtrekk, a small but competent German company whose product costs around 1/10th as much per year, has a fraction of the bugs, and does reliable unsampled daily dumps (moving to hourly, I believe), although the UI is a little less intuitive; and a self-hosted Piwik instance, so you don't need to worry about data exports. The truth is modern relational databases are incredibly powerful and will easily scale even with information like impressions in onsite search. There are multi-TB instances of Postgres out there. I really suggest installing either in parallel to GA or on their own when you set up tracking.
I do agree with you that anybody involved in any kind of job that includes "analytics" in the title, or indeed most people in management, should take an intro stats course. I particularly like Introduction to Statistical Learning because of its brevity, relatively high abstraction level, and lack of maths.
mtzet on July 14, 2017
The programming part with R, Python, Julia, etc. seems to get the most attention here. I think the most important part is to learn how to load datasets into your system of choice and work with them to get some nice plots out. The book "R for Data Science"[1] seems like a good intro for this with R and the tidyverse.
Somewhat more overlooked here are the statistical models. I second the recommendation of "Introduction to Statistical Learning"[2], possibly supplemented with its big brother "Elements of Statistical Learning"[3] if you're more mathematically inclined and want more details. I like their emphasis on starting with simple models and working your way up. I also found their discussion on how to go from data to a mathematical model very lucid.
[1] http://r4ds.had.co.nz/
[2] http://www-bcf.usc.edu/~gareth/ISL/
[3] http://web.stanford.edu/~hastie/ElemStatLearn/
spectramax on Apr 28, 2019
chollida1 on Apr 9, 2016
I wrote about this here: https://news.ycombinator.com/item?id=8767092 and here: https://news.ycombinator.com/item?id=9433316
Long story short, the biggest mistake I see people making is not actually rolling up their sleeves and learning the math.
People are often content to watch hour after hour of Udacity, Khan Academy and Coursera videos, but the applied follow-up is where most people drop off. At the very least any course work should be followed up by something practical like a Kaggle exercise to prove that you can apply the technique you just learned. Consider the benefit of just watching videos vs doing actual applied work.
On one hand, if you just watch videos you might learn a lot, but how do you prove that to someone hiring you? On the other hand, if you sit down and spend a week attacking a Kaggle exercise, then at the very least you have something to point people to, to show that you can apply machine learning techniques.
My recommendation has always been to read the first 5 chapters of Introduction to statistical learning: http://www-bcf.usc.edu/~gareth/ISL/
and if you fly through it then sample Elements of statistical learning http://statweb.stanford.edu/~tibs/ElemStatLearn/ for the topics that you want to learn.
If intro to statistical learning is too advanced, then go to Khan academy and work your way through their statistics videos.
From my experience you can bucket people into skill level by looking at how they attack a new problem.
Beginners tend to start by saying they'll need a Hadoop cluster and spend the next week setting up a pipeline.
Intermediate people tend to jump into R or scikit and try to model the problem with a small subset of data and the library and technique they know best.
The advanced people tend to flesh out their hypothesis first and then work out the math and then jump to modelling with a small set of data and finally move to a cluster.
esfandia on Apr 28, 2019
- Hastie is a co-author of two machine learning books, one is "Elements of Statistical Learning" which is very comprehensive, and "Introduction to Statistical Learning", which is more approachable by people without too much background in stats.
fantispug on Jan 1, 2020
But it's ok to start using libraries and fitting models without understanding how they work deeply, and to come back to these books later (just make sure you come back; there are lots of useful ideas in them!). In that case, I'd recommend some of the resources the parent doesn't recommend.
codesushi42 on July 9, 2019
An Introduction to Statistical Learning
https://github.com/tpn/pdfs/blob/master/An%20Introduction%20...
Deep Learning
https://books.google.com/books/about/Deep_Learning.html?id=o...
Machine Learning Mastery books
https://machinelearningmastery.com/products/
Convolutional Neural Networks from the Ground Up
https://towardsdatascience.com/convolutional-neural-networks...
Transformers
https://medium.com/inside-machine-learning/what-is-a-transfo...
cashweaver on Dec 31, 2020
[1] http://joshua.smcvt.edu/linearalgebra/
[2] http://vmls-book.stanford.edu/
[3] https://ses.library.usyd.edu.au/handle/2123/21370
[4] https://projects.iq.harvard.edu/stat110/home
chollida1 on May 8, 2015
Long story short, the biggest mistake I see people making is not actually rolling up their sleeves and learning the math.
People are often content to watch hour after hour of Udacity, Khan Academy and Coursera videos, but the applied follow-up is where most people drop off. At the very least any course work should be followed up by something practical like a Kaggle exercise to prove that you can apply the technique you just learned. Consider the benefit of just watching videos vs doing actual applied work.
On one hand, if you just watch videos you might learn a lot, but how do you prove that to someone hiring you? On the other hand, if you sit down and spend a week attacking a Kaggle exercise, then at the very least you have something to point people to, to show that you can apply machine learning techniques.
My recommendation has always been to read the first 5 chapters of Introduction to statistical learning: http://www-bcf.usc.edu/~gareth/ISL/
and if you fly through it then sample Elements of statistical learning http://statweb.stanford.edu/~tibs/ElemStatLearn/ for the topics that you want to learn.
If intro to statistical learning is too advanced, then go to Khan academy and work your way through their statistics videos.
From my experience you can bucket people into skill level by looking at how they attack a new problem.
Beginners tend to start by saying they'll need a Hadoop cluster and spend the next week setting up a pipeline.
Intermediate people tend to jump into R or scikit and try to model the problem with a small subset of data and the library and technique they know best.
The advanced people tend to flesh out their hypothesis first and then work out the math and then jump to modelling with a small set of data and finally move to a cluster.
crdb on Dec 2, 2015
I am running a small consulting operation with clients in Australia and South East Asia.
Right now, I am signing up new data consulting clients (including a big one in February 2016) and need some help. This involves running AWS instances, figuring out APIs, building a PostgreSQL data warehouse, and finally building various machine learning products from recommendation engines to using multivariate statistics to improve their understanding of their business. The latter part is the fun one but anybody who has done this will know 90% of the work is in the data warehouse.
I write some of the code but I mostly spend my time dealing with the clients and writing the functional specs for you.
You should be familiar with relational algebra and relational databases. I recommend having a good knowledge of the topics covered in "An Introduction to Statistical Learning" (http://www-bcf.usc.edu/~gareth/ISL/). Most of our recent projects have been done with Haskell, Postgres and some bash.
You can usually name your price (per week), worst case the client will say no. Clients are all OK with never meeting you in person (in fact, my first client still hasn’t met me).
If interested, please get in touch - email is in my profile. You can expect some technical discussion and potentially a little programming test. If you can, please point to some public code you've written.
pddpro on May 16, 2016
thegginthesky on Dec 31, 2020
[0] http://faculty.marshall.usc.edu/gareth-james/ISL/
[1] https://www.youtube.com/watch?v=5N9V07EIfIg&list=PLOg0ngHtcq...