
Neural Networks


lunitiks

Not to take anything away from my all-time favorite threads (this and this one)...

But I think we need a thread dedicated exclusively to artificial neural networks, ML, CV and whatnot.

Because I'm pretty sure most of us have only an 'average' (at best) understanding of what an NN really is - how it works and what Tesla is doing wrt this technology. Some questions to get things started:
  • What's the difference between plain vanilla NN and "Convolutional" NN?
  • What's "reinforcement" learning?
  • What's the state of the art wrt the NN technology Tesla is attempting (FSD)?
  • How can we best explain @verygreen's great discoveries in SW?
  • etc.
Couple of very informative vids for n00bs like myself:




(^ that one was way too long for me to watch, but I'm sure it's excellent)

PS: Please don't destroy this thread with *sugar*-throwing about Lidar vs Radar :)

@jimmy_d feel free to enlighten us ;)
 
Anyone who is interested in getting a little deeper into reinforcement learning should check out this great course by David Silver, available on YouTube:


It's pretty accessible, though I had to take and review notes while going through it a couple of years ago. It's also a couple of years old, so there may have been valuable developments since. And finally, it's a bit of a time commitment at 10 lectures of ~90 minutes each, but you can have YouTube playback go at 1.5x or 2x speed and pause when necessary.

You can use neural networks to approach a reinforcement learning problem. Ways to solve these problems are explored in the course.
 
@jimmy_d feel free to enlighten us ;)

I can't compete with the internet in terms of educational material. There's so much great stuff out there that explains deep learning it's just crazy. Of course if there's any particular thing you would like pointers on I'll try to help.

I haven't been posting much stuff on the AP2 NN lately because real facts are hard to come by and I'm not sure how much people want to listen to me speculate. Certainly there seems to be a cohort which is allergic to any speculation that doesn't confirm their biases and the blowback gets old. Our mutual friend gave me access to copies of recent and some older data on the NN's and I took them apart to see what I could learn from them. I also tried building and analyzing various parts of the network description on the frameworks that they seem to have been created on, which yielded some ideas but mostly it just killed off some theories that I had.

But here's what I've got as of today:

Facts:

1) There's a vision NN running in AP2 that takes images from the main and the narrow cameras and processes them to extract feature maps. There's only one network, but it runs as two instances in two threads, each processing one camera independently.
2) The front half of the NN in AP2 is basically Googlenet with a few notable differences:
- The input is 416x640 (original Googlenet was 224x224)
- The working frame size in Googlenet is reduced by 1/2 in each dimension between each of the 5 major blocks. The AP network omits the reduction between blocks 4 and 5, so the final set of features is twice as large in each dimension (2x2 = 4x the area) compared to Googlenet.
3) The final set of output features from the googlenet "preprocessor" is digested in several different ways to generate output. All of these outputs are floating point tensor fields with 3 dimensions and they all have frame sizes that either match the input frame size of 416x640 or a reduced version of it at 104x160.
4) All of these output tensors are constructed by deconvolution from the output of the googlenet preprocessor (a rough shape sketch follows this list).
5) There are no scalar or vector outputs from the NN, so it is not end-to-end in the usual sense. Its output must be interpreted by some downstream process to make driving decisions.
6) Between versions 40 and 42, new output categories were added to the NN.
7) Sometime in the last four months major changes were made to the deconvolution portions of the NN, including the removal of network sections that are normally associated with improving the accuracy of segmentation maps. This happened after version 26 but before 40, an interval in which users reported substantial improvements in operation at highway speed.
8) Between versions 40 and 42, two additional NNs were added to the code. Neither has been seen in operation.
9) One of the new NNs is named fisheye_wiper. The network itself is a substantially simplified googlenet implementation. Its output is a five-way classifier, i.e. it reports which single one of 5 categories is detected. This is exactly the output you would expect from a rain-intensity detector intended to control the wiper blades.
10) The other new NN is named repeater. This network includes a googlenet preprocessor almost identical to the one used for the main and narrow cameras. The outputs are similar to a subset of the main/narrow outputs.
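
To make facts 2-4 concrete, here's a toy sketch of the shape arithmetic: a convolutional "preprocessor" that downsamples a 416x640 frame, followed by a deconvolution head that expands back out to a dense output map. PyTorch is used purely for illustration (it's not necessarily the framework Tesla uses), and the layer counts and channel widths are made up, not taken from the real network.

Code:
# Toy illustration of a downsampling backbone plus a deconvolution output head.
# Nothing here is the actual AP2 network; only the input/output frame sizes
# (416x640 in, 104x160 out) follow the numbers discussed above.
import torch
import torch.nn as nn

class ToyBackbone(nn.Module):
    """Stand-in for the googlenet-like front half: 416x640 input, 1/16 output."""
    def __init__(self):
        super().__init__()
        layers, in_ch = [], 3
        for out_ch in (64, 128, 256, 512):          # four 1/2 reductions -> 1/16 overall
            layers += [nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = out_ch
        self.features = nn.Sequential(*layers)

    def forward(self, x):
        return self.features(x)                     # (N, 512, 26, 40) for a 416x640 frame

class ToyDeconvHead(nn.Module):
    """Deconvolution (ConvTranspose2d) head that upsamples features back to a
    dense per-pixel map, here at 104x160 = 1/4 of the input frame."""
    def __init__(self, num_outputs=10):
        super().__init__()
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(512, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, num_outputs, 4, stride=2, padding=1))

    def forward(self, feats):
        return self.deconv(feats)                   # (N, num_outputs, 104, 160)

frame = torch.randn(1, 3, 416, 640)                 # one camera frame
feats = ToyBackbone()(frame)
print(ToyDeconvHead()(feats).shape)                 # torch.Size([1, 10, 104, 160])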


Stuff that's probably true:

1) Forward vision is probably binocular, using cropped, scaled, and undistorted segments of main and narrow that are trimmed to provide identical FOV. This allows the car to determine depth from vision independently of whatever depth information it gets from other sensors (radar). The depth extraction is not being done in the NN, but there are function names in the code that refer to stereo functions and frame undistortion. Depth extraction can be done efficiently using conventional vision processing (that is, non-NN) techniques, so that's probably why it's not included in the NN (see the sketch after this list). I haven't seen or heard about any AP2 behavior that can only be explained by the existence of binocular vision, so it's possible this is a shadow feature. No real idea.

2) Output from the forward vision NN probably includes at least one full resolution semantic segmentation map with bounding boxes for that map. It might include several maps including some at lower resolution. Output from the repeater NN is similar though it eliminates some classes that are present in the main/narrow NN. It is extremely likely that both the main/narrow and repeater networks are detecting objects in multiple classes and generating bounding boxes for those objects.

3) The NN is probably getting a lot of development. Not only does it change for every single firmware revision, some of the changes are drastic. Some look like experiments, others look like diagnostic features. Some look like transient bug fixes that are later corrected by fixing the underlying problem (eliminating the need for the hack).

4) The network is probably being trained in part by using simulated driving data. Generating labeled training data for semantic segmentation outputs of the kind the vision NN seems to be outputting is extremely labor intensive if done manually and simulation is a common approach to augmenting training data. My review of Tesla's open positions in their Autopilot division found that they were recruiting simulation experts and simulation artists with a bias towards driving environments and the Unreal 4 engine. I found no positions of the type that I would expect to see if they were manually labeling large volumes of real world data. Of course, this latter is the kind of thing you might outsource, so it's not a very strong datapoint except that it fails to show they are manually labeling lots of data.

5) Tesla is probably developing fairly advanced customizations to the tools used for testing, training, developing, and deploying neural networks. I found evidence for custom libraries and custom tools all over the place as I was trying to track down clues to how they were doing their development. There isn't anything I could find in their system that you could just drop into publicly available tools and make sense of it, but then I find references to parts of publicly available tools and libraries all over the place. It seems like they are pulling from a lot of different sources but then building their own tools.

6) After analyzing the options, I think the calibration phase for the cameras is to allow matching the main and narrow cameras to high enough accuracy to enable stereo vision. When I look at manufacturing variances for cameras and compare that to the operational variance that the system has to be able to deal with in use, the only thing I can come up with that can't be compensated on the fly is support for stereo vision. In order to support rectified, aligned-image stereo processing you need to pre-calculate the alignment transformations for the two cameras to sub-pixel accuracy (sketched below). I don't think this can be factory calibrated because the calibration probably wouldn't survive transport of the vehicle to its delivery destination. That's not true of any other vision process that I can come up with, and it's certainly not true of NN vision processes that are appropriate to vehicle applications. Calibration probably has to be re-done frequently by the car because even normal operation will lead to the alignment drifting enough for it to become a problem. And it probably needs to go through the cal process anytime there is maintenance performed that requires manipulation of the forward camera assembly. If this is true then AP2 must have been using stereo since its first incarnation, as the calibration period has always been present, and it must be using it for driving decisions since you can't use the AP features until the cameras have been calibrated.
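
Since points 1 and 6 above both come down to rectification plus conventional stereo matching, here's a rough sketch of what that pipeline looks like with standard tools (OpenCV). This is just to illustrate the idea: all the camera parameters below are made-up placeholders, and nothing here is Tesla's actual code.

Code:
# Hedged sketch of rectified-image stereo: pre-compute sub-pixel alignment maps
# once (the "calibration" product), then get depth from disparity with a
# conventional, non-NN block matcher. Placeholder intrinsics/extrinsics only.
import cv2
import numpy as np

image_size = (640, 416)                          # (width, height) of the cropped frames
K = np.array([[1000., 0., 320.],                 # placeholder intrinsics, shared by
              [0., 1000., 208.],                 # both cameras for simplicity
              [0., 0., 1.]])
D = np.zeros(5)                                  # placeholder distortion coefficients
R = np.eye(3)                                    # placeholder relative rotation
T = np.array([0.10, 0.0, 0.0])                   # placeholder 10 cm baseline

# "Calibration" step: pre-compute the rectification/undistortion maps once.
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K, D, K, D, image_size, R, T)
map1x, map1y = cv2.initUndistortRectifyMap(K, D, R1, P1, image_size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K, D, R2, P2, image_size, cv2.CV_32FC1)

def depth_from_pair(img_main, img_narrow):
    """Rectify both frames, match blocks along epipolar lines, turn disparity into depth."""
    rect1 = cv2.remap(img_main,   map1x, map1y, cv2.INTER_LINEAR)
    rect2 = cv2.remap(img_narrow, map2x, map2y, cv2.INTER_LINEAR)
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=7)
    disparity = sgbm.compute(rect1, rect2).astype(np.float32) / 16.0  # SGBM is fixed-point *16
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = (1000.0 * 0.10) / disparity[valid]  # focal_px * baseline_m / disparity
    return depth

# Call pattern with synthetic grayscale frames, just to show the shapes involved:
main_frame   = np.random.randint(0, 255, (416, 640), dtype=np.uint8)
narrow_frame = np.random.randint(0, 255, (416, 640), dtype=np.uint8)
print(depth_from_pair(main_frame, narrow_frame).shape)             # (416, 640)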

Pure speculation:

After my first look at the version 40 NN I was surprised at how simple it was, conceptually, and how 'old' the circa 2015 architectural concepts were and speculated that perhaps this version of EAP was not getting much effort. (In the deep learning world 2 years is an eternity). I thought this might make sense if the company were pushing to release a separate and much more sophisticated package and needed this current NN to be just a placeholder substitute for the missing AP1 vision chip while they readied a much more ambitious system, which might be FSD. After reviewing older versions of the network I found that the output varied from version to version so this system can't be merely a substitute for a missing module - if that were the case the I/O wouldn't be changing. Also, stereo vision is clearly a feature that wasn't possible in AP1. And now we see NNs being added for cameras not present in the AP1 hardware - the repeaters. Finally, I am finding small features being included in the NN that did not appear in the public literature until fairly recently, implying that the EAP team is actively trying out cutting edge ideas in limited domains. So at this point I think that EAP is getting the kind of development that suggests it's not a placeholder.

I recall that at some point one of the differentiating features of EAP and FSD was the number of cameras, with EAP having 4 in use and FSD the full suite of 8. With the addition of the repeaters to main and narrow we would see 4 driving cameras in use assuming the fisheye is just for the wipers in the EAP use case. That would be an interesting match up and it might indicate that these four are the candidates for on-ramp to off-ramp level of functionality in EAP.
 

Jimmy_d-

I love this write up. It's exactly what I crave to read on these forums. Sure, it's just educated guesswork, but it's invaluable to an enthusiast like me. Thank you.
 
The NN is probably getting a lot of development. Not only does it change for every single firmware revision
Actually, it's a recent development that the NNs change with every firmware revision.
Before, a single NN would be reused across multiple releases. E.g. the NN introduced in 17.24 was with us until at least 17.36, and possibly until 17.38 (I just don't have a copy of 17.38 to confirm), with zero modifications.

And it probably needs to go through the cal process anytime there is maintenance performed that requires manipulation of the forward camera assembly
Calibration is an ongoing process; it's just that Autopilot does not enable until you have some initial calibration, but once you have it, it is still constantly adjusted (and the mothership gets updated calibration data like every 5 minutes of driving!), hence when they work on a camera assembly or change a windshield or whatnot they typically do not reset the calibration.
People report that the car position portrayed on IC changes (and I can confirm that since it happened to me too), but within a couple of days the recalibration process does its thing and the discrepancies disappear.
In a way I guess this makes sense since internal cam alignment might have some minute changes due to vibrations and whatnot as well.

There was only one case I know of where the correction did not happen automatically and a full blown reset of calibration data was needed as reported by @kdday
 
If this is true then AP2 must have been using stereo since its first incarnation, as the calibration period has always been present, and it must be using it for driving decisions since you can't use the AP features until the cameras have been calibrated

Didn’t we have some past evidence that EAP could operate with one camera? It worked with tape over the other two front cameras, IIRC.

Current HW2 Autopilot using 2 of 8 cameras * Testing Inside *
 
Calibration is an ongoing process; it's just that Autopilot does not enable until you have some initial calibration, but once you have it, it is still constantly adjusted (and the mothership gets updated calibration data like every 5 minutes of driving!), hence when they work on a camera assembly or change a windshield or whatnot they typically do not reset the calibration.
People report that the car position portrayed on IC changes (and I can confirm that since it happened to me too), but within a couple of days the recalibration process does its thing and the discrepancies disappear.
In a way I guess this makes sense since internal cam alignment might have some minute changes due to vibrations and whatnot as well.

There was only one case I know of where the correction did not happen automatically and a full blown reset of calibration data was needed as reported by @kdday

Curious, when you say the "position in the IC changes" and recalibration fixes that, are you saying that the car is actually centered on the real road but the IC is displaying the car positionally incorrect on the display? In my case, when I replaced my windshield, for a good 50+ miles of driving both the real car and the IC had my car driving on the left lane marker steadily. It just wouldn't drive centered and didn't seem to recalibrate even after starting/stopping my driving sessions throughout the day. It just loved the left lane line and drove over it. Again, even more interestingly for this conversation, when Tesla Service reset my "DAS" EAP system, they told me that it would take 50-100 miles of driving before AP would be usable again. It was immediately usable the moment I left the DS, my car and IC were correctly centered in the lane, and it has been flawless since.

That also being said, when EAP first rolled out to my car, I vividly remember both release notes and forum postings saying AP would be available for use after 50-100 miles of driving after an initial calibration period. When I took my car out for the very first time when AP was installed on my car, AP was also available immediately and required no calibration period. I thought it odd because so many others reported a successful calibration period, but I've never had to do one. And I pay very close attention and immediately test all releases so I'm not being lazy here.
 
Curious, when you say the "position in the IC changes" and recalibration fixes that, are you saying that the car is actually centered on the real road but the IC is displaying the car positionally incorrect on the display?
I think that's what happened, yes; i.e., the car was hugging the left side of the lane but the display was correct, or some such. I don't remember the details now, but it certainly felt out of whack for a couple of days after the windshield replacement.
It could be that the display and actual car positions were aligned, but the car placement was non-ideal and I just forgot I guess. But the constant recalibration is real - that's for sure and my car recalibrated itself just fine with no additional prompts.

When I took delivery of my car (at the end of March), on the first drive from the SC home (~180 miles) I had no AP, and then the next day it was working. This pretty much reflects all other reports from about that time, I believe.
 
There's been some speculation about whether reinforcement learning is being actively used by the AP team. It's possible, but only in a very limited sense today.

When people talk about this in newsgroups they usually imagine one NN that does everything, but in the real world complex problems get broken down into parts and a solution is usually constructed by solving different parts with different tools. For a car there are a lot of ways you can break down the overall problem of sensing, perception, decision, and control. Some of those might have one or more NNs, but the NNs are only going to be part of the overall solution. Today the most efficient way to solve most problems amenable to solution with an NN is not going to involve RL. That is going to change as RL gets better and we learn how to use simulations alongside real world data, but today it's not done that way except for research and for certain kinds of uncommon systems.

There are 3 common kinds of neural network training: supervised, unsupervised, and reinforcement.

Supervised training has a network learn how to label its input. So if you show a network a bunch of pictures that are pre-labeled, and it has to learn to tell you the label when you show it a picture, then you're doing supervised training. This is how almost all NNs are trained today and virtually all of the semantic segmentation systems you see being used in self driving systems get trained this way. The AP2 NNs seem quite similar in concept to YOLO (You Only Look Once) semantic segmentation systems.
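
For anyone who wants to see what "supervised training" boils down to in code, here's a minimal sketch of training a tiny per-pixel classifier against pre-made labels. It's a generic PyTorch illustration with random stand-in data, not anything from AP2.

Code:
# Minimal supervised-training loop: images paired with human-provided label
# maps, and the network is pushed to reproduce the labels. Purely illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 5, 1))            # 5 candidate classes per pixel
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

images = torch.rand(8, 3, 64, 64)                     # stand-in training batch
labels = torch.randint(0, 5, (8, 64, 64))             # "ground truth" class per pixel

for step in range(20):
    opt.zero_grad()
    logits = model(images)                            # (8, 5, 64, 64) class scores
    loss = loss_fn(logits, labels)                    # penalize disagreement with labels
    loss.backward()
    opt.step()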

It's possible to generate labeled data with a simulation - in fact, that's the best way to do it in terms of accuracy, volume, and control. But to have the simulated data transfer directly over to real world application the simulation needs to be quite good and it takes a lot of effort to make a good simulation. But once you have it you can generate huge amounts of labeled training data, so this is probably a good investment for a serious production effort. I suspect that Tesla's simulation work is mainly focused on generating high quality labeled training data for their vision network at this point in time.

If you can't get labeled data you can do unsupervised training, but it takes longer and you need a really large amount of unlabeled data. You might need 100,000 labeled pieces of data to do supervised training, and 100,000,000 pieces of data without labels to do unsupervised training. So unsupervised learning takes more data and more training, and thus more time.

Tesla can in principle acquire huge amounts of unlabeled or partially labeled video data from their fleet of cars. But we don't see the cars uploading vast amounts of data. @verygreen has only seen the cars uploading pretty limited data sets and those generally surround events of interest. I think it's not unlikely that Tesla is compiling a database of video for use in training networks, but it might just be used for pre-training or for testing. The networks that we see in AP2 right now will require a substantial amount of labeled data to train their output stages but it's possible that the googlenet portions could be partially or substantially pre-trained using that portion of the network configured in a training harness as part of an autoencoder or something equivalent. I've gone back and forth on how likely that is but at the moment I'm inclined to think they may be using a stock googlenet with imagenet weights up through layer 4e and training 5a/5b along with the output filters using just labeled data.
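
To illustrate the autoencoder idea (and to be clear, whether Tesla actually does anything like this is speculation on my part), here's a minimal sketch: wrap a convolutional encoder in an autoencoder, train it to reconstruct unlabeled frames, then keep the encoder weights as a starting point for supervised training of the output heads. The architecture is illustrative only.

Code:
# Unsupervised pre-training sketch: encoder + throwaway decoder trained to
# reconstruct unlabeled frames. Only the encoder weights are kept afterwards.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(                     # stands in for the googlenet-like front half
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
decoder = nn.Sequential(                     # only used during pre-training
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

frames = torch.rand(16, 3, 64, 64)           # stand-in for unlabeled video frames
for step in range(10):
    opt.zero_grad()
    loss = F.mse_loss(decoder(encoder(frames)), frames)   # reconstruction error
    loss.backward()
    opt.step()
# encoder.state_dict() would then initialize the feature extractor before
# supervised training of the output heads on a (much smaller) labeled set.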

Even if Tesla isn't using their fleet to gather vast amounts of training video it's still extremely useful in training their NNs because they can validate an NN on the fleet in shadow mode and they can use an NN deployed to the fleet to build a database of corner cases - which is much more valuable than just raw video and much less data to manage.

With reinforcement learning you give the network control of something and a goal, and it learns to achieve the goal by controlling the system. In a sense it's the perfect NN because you just build it and turn it loose and it learns everything it needs on its own. It doesn't require any data at all, but it does require that you have a system it can play with and learn to control. RL systems today take a long time to learn stuff, so they get trained in simulation because a simple simulation can run fast enough to get useful results in a reasonable amount of time. An RL agent that learns to play a video game might have to play through it 1,000,000 times in order to get proficient. That takes a big computer and a pretty simple video game if you want to get results any time soon. It's too slow to do it with a real car even if you could, so you have to train in simulation, and that requires a really good simulator if you want to be able to transfer the knowledge directly to a real world system. That has only been done in very simple systems so far, but the tech is advancing rapidly.
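
Here's the RL loop in its simplest possible form: a tabular Q-learning agent on a toy one-dimensional "corridor" environment. Deep RL replaces the table with a neural network, but the learn-by-trial-and-error structure is the same. This is purely illustrative and has nothing to do with Autopilot.

Code:
# Toy reinforcement learning: no labeled data, just an environment, a reward,
# and trial and error. The agent learns to walk right toward the goal state.
import random

N_STATES, GOAL = 6, 5
Q = [[0.0, 0.0] for _ in range(N_STATES)]       # Q[state][action]; actions: 0=left, 1=right
alpha, gamma, eps = 0.5, 0.95, 0.2

for episode in range(500):
    s = 0
    while s != GOAL:
        # epsilon-greedy action choice; break ties randomly so exploration isn't stuck
        if random.random() < eps or Q[s][0] == Q[s][1]:
            a = random.randrange(2)
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1
        s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
        r = 1.0 if s2 == GOAL else 0.0          # the only "goal" signal the agent ever gets
        # Q-learning update: nudge Q[s][a] toward reward + discounted best future value
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print([round(max(q), 2) for q in Q])            # learned values rise toward the goal state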

In a system with a lot of parts you can use multiple networks and train different parts in different ways, depending on which subsets of the overall problem you have data and labels for and which parts you can simulate well. It's possible that Tesla is using RL to train some bits of the network, but right now I can't think of a good candidate. OTOH, I think any seriously forward-thinking company using NNs in control problems is probably paying close attention to developments in RL because it could become the thing you want to be doing very soon.

Also - in the world of reinforcement learning there are two clear leaders. The first one is Google DeepMind, who are in a league of their own. The second is OpenAI, which was co-founded by Elon Musk. Based on comments he's made I'm quite confident that Musk is paying very close personal attention to developments in RL, and I have little doubt that he will want Tesla to use it just as soon as it's ready.

But RL is really, really hard today. The number of people who can get it to work is tiny, even in the small world of NN research. This is something that might have recently had a giant breakthrough, however. The single most significant thing about deep mind's recent alphago zero paper is that they have found a way to make training an RL system much more stable than it has been in the past. Stable means you don't need to have a deep learning dream team to get the damn thing to just work at all. It was that increase in stability that let the machine learn from scratch. And if it's true that they've cracked the stability problem, then RL will become doable for a lot more people over the next several months.

total digression:

For the first several hours after I read that alphago zero paper I felt a kind of visceral dread. It was the first time I think I've felt like maybe AI might be moving too fast. This is not the kind of attitude I normally bring to this topic, but this paper was really quite a shock - it was years too soon. Everything about alphago has been 'years too soon' but this was really, really crazy too soon.

When we invented nuclear weapons one of the saving graces was that making one was not something that was doable by some pissed off asshole with a PhD, a basement, and a grudge. What would the world be like if you could make a nuke from stuff that anybody could get their hands on? Think about that for a minute.

I still believe that AI is unlikely to become the kind of technology that is the "superempowerment of the angry young man". But stuff like that alphago zero paper makes me wonder if maybe some critical breakthrough might actually be close at hand, that it might make these systems suddenly a million times more effective, and that such a thing would turn the world into something none of us recognize.
 
Why are the AI guys all using Python? Are there some C++ or C libraries around?

All the computation is done using C, or CUDA if you're running on a GPU. But it turns out that, since 99.999% of the work ends up happening down deep in some matrix math library, you can write all the high level stuff in whatever you want and it doesn't affect the performance. Most of the frameworks do everything except the highest level stuff in C, and deployed systems usually do even that part in C.

Python is popular because it's really easy to develop in and you want something easy for research since you change the code a lot and don't want to get stuck in a debugging loop. But the python stuff is more or less a fancy turing-complete config file when you get right down to it.
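
A quick way to convince yourself of this: do the same matrix multiply in pure Python and then hand it to NumPy, whose inner loops run in compiled C/BLAS. The Python-level overhead all but disappears. (Toy benchmark, numbers will vary by machine.)

Code:
# Same matrix product two ways: pure Python loops vs. NumPy dispatching to BLAS.
import time
import numpy as np

n = 200
A = np.random.rand(n, n)
B = np.random.rand(n, n)
A_list, B_list = A.tolist(), B.tolist()          # plain Python lists for the slow version

t0 = time.perf_counter()
C_slow = [[sum(A_list[i][k] * B_list[k][j] for k in range(n))   # triple loop in Python
           for j in range(n)] for i in range(n)]
t1 = time.perf_counter()
C_fast = A @ B                                   # same math, done in compiled BLAS
t2 = time.perf_counter()

print(f"pure Python: {t1 - t0:.2f}s   numpy/BLAS: {t2 - t1:.5f}s")
print("results agree:", np.allclose(C_slow, C_fast))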
 
Even if Tesla isn't using their fleet to gather vast amounts of training video it's still extremely useful in training their NNs because they can validate an NN on the fleet in shadow mode and they can use an NN deployed to the fleet to build a database of corner cases - which is much more valuable than just raw video and much less data to manage.

I have a feeling people use "shadow mode" interchangeably with "magic", and I hope people would expand on what they mean by shadow mode and how it actually works ;)

Sure, the NN can categorize your image stream, but how are you going to validate for false positives (i.e. something mislabeled) and false negatives (something that should have been labeled and was not)?
Sure, you can have a verifier NN to double-check the main one, but if it's perfect, why not use it instead of the main one? Also, we see no evidence of checker NNs anyway, and if you ship every frame to the mothership for verification that's not going to scale all that well (and we have no evidence of that either).
 
All the computation is done using C, or CUDA if you're running on a GPU. But it turns out that, since 99.999% of the work ends up happening down deep in some matrix math library, you can write all the high level stuff in whatever you want and it doesn't affect the performance. Most of the frameworks do everything except the highest level stuff in C, and deployed systems usually do even that part in C.

Python is popular because it's really easy to develop in and you want something easy for research since you change the code a lot and don't want to get stuck in a debugging loop. But the python stuff is more or less a fancy turing-complete config file when you get right down to it.

This leads to one of the most important points to understand about Deep Learning. That is the fact that the vast majority of people currently working with NN's are not really experts at them.

Instead, they're people simply using them to accomplish some goal they have.

Like I don't have a firm grasp of the convolution layers, etc. But I can still use neural networks to design a robot to chase dogs/cats off my property.
 
I know just enough to sound ignorant, but I believe the parts that need to be fast will be compiled from C or run as GPU instructions. Python can be a nice wrapper around those parts, because it’s easy to work with.

All the computation is done using C, or CUDA if you're running on a GPU. But it turns out that, since 99.999% of the work ends up happening down deep in some matrix math library, you can write all the high level stuff in whatever you want and it doesn't affect the performance. Most of the frameworks do everything except the highest level stuff in C, and deployed systems usually do even that part in C.

Python is popular because it's really easy to develop in and you want something easy for research since you change the code a lot and don't want to get stuck in a debugging loop. But the python stuff is more or less a fancy turing-complete config file when you get right down to it.


So basically everything is written in C? What about Swift, Java, Go, PHP?

I'm just a pharmacist who messes around with C++ and Qt and the Linux Shell. So I can easily export the libraries, cool.
 
This leads to one of the most important points to understand about Deep Learning. That is the fact that the vast majority of people currently working with NN's are not really experts at them.

Yeah, there aren't many people I would really consider to be experts - someone who could walk you through the derivation, intuition, history, and rationale for every part of an NN. Probably fewer than 1000 in the whole world, maybe as few as 100. Until a few years ago this field got about as much respect as cold fusion and had even fewer people working in it.

But it's really cool that there are hundreds of thousands of people who can build something useful with an NN. It's a great tool, it's awesome that people can get access to it.

But in a sense it's probably like anything - like a pencil for example. Just about anybody can use a pencil but how many people can make a good one from scratch?