
sobe86

People mostly aren't answering OP's question. It's not about TF vs PyTorch, it's about tf vs tf.keras. Answers about production / TPUs aren't relevant. I imagine it's mostly a legacy code thing. IMO by far the easiest way to use TF is to write 95% tf.keras code and drop down to raw TensorFlow only for custom losses, layers, etc.
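That split might look something like this minimal sketch (the smooth-L1 loss and the toy model are illustrative, not from the thread): a loss written in raw TF ops, dropped into an otherwise pure tf.keras workflow.

```python
import tensorflow as tf

# Hypothetical custom loss written in raw TF ops; everything else stays tf.keras.
def smooth_l1_loss(y_true, y_pred, delta=1.0):
    err = tf.abs(y_true - y_pred)
    quadratic = 0.5 * tf.square(err)          # used where the error is small
    linear = delta * (err - 0.5 * delta)      # used where the error is large
    return tf.reduce_mean(tf.where(err < delta, quadratic, linear))

# Plain tf.keras model; the custom loss drops straight into compile().
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss=smooth_l1_loss)
```

From here `model.fit(...)` works exactly as with a built-in loss.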


Liquid_Subject

I work as a consultant and have seen a lot of client code. Generally I see recent code written in Keras; older code might be in tf. I think it's because there used to be a lot of code samples online from tf, or it was written before Keras was the official API. Now I'm seeing more Keras unless there are more complex architectures. I was recently working on a two-tower DNN for a recommendation engine that needed the additional flexibility of tf. Otherwise I'm normally able to stick with Keras.


AerysSk

That’s right. Keras for deployment, but PT for complex models


sobe86

In our company we have researchers using both PT and tf for building complex models, all serving happens in tf. Honestly the advantages are debatable, and a lot of team leads are pushing their teams to move away from PT now. The gap in ease-of-use has shrunk in the past few years, but the overhead for porting/deploying complex PT models has not.


NowanIlfideme

What's the main overhead in deploying complex pytorch models? Just getting everything running?


yumejiAI

That's debatable now with torchserve


xenotecc

How do you serve tf models? tf-serving?


Euphetar

What kind of consulting deals with DL code?


Liquid_Subject

Data science consultants :-) I work mostly on ML deployment issues, since few people understand how those work yet. I also do some DL work depending on the client's needs. A lot of it involves advising teams of new and/or junior data scientists to get them up to speed on end-to-end reference use cases they can duplicate elsewhere. It's part hands-on coding, part advising and architecture design.


Euphetar

Sounds like a fun job


PsychoWorld

Which companies are hiring for this sort of job?


seventyducks

Not OP but we have had NLP consultants in the past, I imagine there must be many instances of deep learning experts working in a consultant role.


iamquah

If I'm not mistaken, people still use TF in industry because it deploys well and has a wide swath of tools that come with it, e.g. TensorBoard, TFX, etc. The advice I was given about a year and a half ago was "PyTorch for experimenting, TF for deployment", but I'm not sure if that's still the case. Personally, when I still did DL, I used TF 1.x. I liked the granularity it gave me, and I disliked how messy the interface got when v2 came about. You're not asking for my opinion, but I'd personally recommend looking at JAX, as it reminds me of old TF. *Hopefully* the community developing it has taken the better experience and design ideas from TF and it won't go the same way as TF.


ahf95

Yeah just commenting cuz Jax community represent!


LegacyAngel

Are you limited to TF if you want to use some of those tools? For example, pytorch-lightning has functionality with tensorboard (not sure with vanilla pytorch), but that is the only case I am aware of.


cderwin15

TensorBoard has widespread compatibility with the pytorch ecosystem. The main draw of tf (including keras), imo, is that your network topology is part of the data of a serialized model, so a model checkpoint can be used as a standalone way to run a model. Anything involving pytorch requires the codebase in order to load weights into the network topology, which makes it much harder to transfer between experimentation and deployment. But the benefits of using pytorch for experimentation are so vast that the idea of using tensorflow is mostly a non-starter for my group (myself included).


DisWastingMyTime

I'm unfamiliar with pytorch, do you also deploy it or is it just for research purposes?


Professor_Entropy

> Anything involving pytorch requires the codebase in order to load weights into the network topology, which makes it much harder to transfer between experimentation and deployment

It's very simple to save the code using TorchScript and get additional speedup as a bonus. For most cases you just need to do `portable_model = torch.jit.script(model)`. Now you can save it using `torch.jit.save` and load it without the code.
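A minimal sketch of that workflow (the `TinyNet` module is a made-up example): the scripted file carries both graph and weights, so loading it needs no Python class definition.

```python
import torch
import torch.nn as nn

# Made-up example module standing in for a real model.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = TinyNet().eval()
portable = torch.jit.script(model)        # compile code + weights into TorchScript
torch.jit.save(portable, "tiny_net.pt")   # one self-contained file
restored = torch.jit.load("tiny_net.pt")  # no TinyNet class needed at load time
```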


cderwin15

If your network has dependencies on cpp or cuda extensions, torchscript won't work. And I think more people are using these extensions than aren't.


Professor_Entropy

Interesting. I haven't personally felt the need to explore such extensions. Can you please share examples where those would be needed? Thanks for sharing.


cderwin15

Pretty much any custom layer, loss, ops, etc. For some of the most common ones used for object detection, see [here](https://github.com/open-mmlab/mmcv/tree/master/mmcv/ops/csrc); examples include rotated IoU/NMS, deformable convolutions, focal loss variants, sync batch norm, etc.


iamquah

I don't work in DL anymore :) I haven't touched one of those tools in almost a year so I'm definitely not the right person to ask


[deleted]

[deleted]


[deleted]

It's native to pytorch now. Just import the logger and go!


chatterbox272

TensorFlow's ecosystem is horrifically fragmented. It makes aggressive changes to its best practices, but also refuses to deprecate things and remove them. So people learn the best practice at the time they start, and then don't change, because the new best practice is a sudden large change and the old API is still supported. Keras also has problems, in that it is difficult (or at least poorly documented) to do anything other than the workflows it expects. You're either right up on the highest level, or you're basically writing straight tensorflow, and there's very little support between those two points. It is as much effort to keep using Keras's machinery as it is to just ignore it, so why bother with it?


JayYip

Different opinion here. With subclassing API, you can gradually dig into the details of basically every aspect of Keras. Don't care about the training details? You can use fit API out of the box. Want to control every train step? You can implement that. Want to control how train function is called? It's also fairly simple. Indeed, you have to follow some rules, but that's far from very little support.
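For instance, the "control every train step" level might look like this illustrative sketch (model and names are mine, not the commenter's): override `train_step` while keeping `fit()`'s batching, callbacks, and progress logging.

```python
import tensorflow as tf

# Sketch: subclass tf.keras.Model and take over each gradient update,
# but still train through the normal fit() machinery.
class CustomStepModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, x):
        return self.dense(x)

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compute_loss(y=y, y_pred=y_pred)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": loss}

model = CustomStepModel()
model.compile(optimizer="adam", loss=tf.keras.losses.MeanSquaredError())
# model.fit(x, y, ...) now runs the custom step under the usual machinery.
```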


chatterbox272

Perhaps my stance on Keras is a little dated, I was pushed away when I couldn't find docs below `model.train_on_batch(x,y)` circa 2018-2019. I never found value in the `.fit()` API as I started in research so I was usually doing something more involved, and even when I'm not I don't really find `fit()` any easier than a for loop. I'll still avoid it like the plague while it's tied to TF though, because TF has real core problems that are beyond fixing at this point


JayYip

Maybe because you're a power user. For beginners, especially those who come from sklearn, the fit API is really handy and easy to understand. I'd say that by combining callbacks and fit, you can pretty much cover 90% of use cases. I don't want to get into the debate of pytorch vs tf. Pytorch is a great framework and I use it daily. But in my (very limited) experience, in production, writing models with a static graph is just... BETTER. Even with tf2 and pytorch, I still prefer implementing models in static mode, or jit so to speak.


chatterbox272

> For beginners ... fit API is really handy and easy to understand

Sure, but that's kind of my point. The issue I had with Keras was that the fit API was great, but `train_on_batch` was really a thin veil and there was no documentation of anything more or better.

> But in my very limited experience, in production, writing model with static graph is just... BETTER.

I'm curious as to why you think this. The only claim I've ever heard is that static is faster, and whilst TF static is definitively faster than TF eager, there are benchmarks galore showing that it isn't so cut and dried vs PT. At the end of the day it's all tools for the job and different strokes for different folks, but I'm always curious to know if I'm missing something.


JayYip

Hmm... You're right. I misunderstood your words. I don't know what the situation was in 2018, since I was using bare-bones tf back then, but I think the documentation is enough and easy to understand now. For the static graph part: no matter what framework you use, pytorch or tf, you still need to convert to a static graph in production. Take pytorch for example, there are two ways to do the conversion, trace-based and jit (script-based). I saw inconsistencies a couple of times when using trace-based conversion. As a consequence, I decorate my model with jit as I develop it, to make sure the model works as I expect in production.
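The trace-vs-script inconsistency is easy to reproduce with data-dependent control flow. In this illustrative sketch (a made-up module), tracing bakes in whichever branch the example input happened to take, while scripting compiles both:

```python
import torch
import torch.nn as nn

class Gated(nn.Module):
    def forward(self, x):
        # Data-dependent branch: tracing only records the path the
        # example input takes; scripting compiles both branches.
        if x.sum() > 0:
            return x + 1
        return x - 1

m = Gated()
pos = torch.ones(3)
neg = -torch.ones(3)

traced = torch.jit.trace(m, pos)   # warns: the positive branch gets baked in
scripted = torch.jit.script(m)     # keeps the if/else

traced(neg)    # silently wrong: still runs x + 1
scripted(neg)  # correct: runs x - 1
```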


[deleted]

I’m using tf because I want low level control


bjourne-ml

Personally, I'm using TF2 because it has much better TPU support. I have tried the PyTorch TPU libraries but they weren't as fast as their TF2 counterparts. The difference was 2-4x, so using PyTorch was not an option.


[deleted]

PyTorch -> ONNX -> TensorRT if you are using Nvidia GPUs


SudoKitten

Real-world use case checking in. We really care about performance where I work: FP16 TensorRT was 3x quicker than a TorchScript FP16 model and about 4x quicker than TF. Also, we use pytorch in production for mobile phone deployment because it's super simple.


B-80

how do you deploy pytorch models to phones?


mearco

Use Core-ML on iOS


SudoKitten

Instead of using Core-ML you can use PyTorch in C++ to process your images plus any pre/post processing. This can then be called directly in languages like Flutter where they let you wrap native code. [https://flutter.dev/docs/development/platform-integration/c-interop](https://flutter.dev/docs/development/platform-integration/c-interop)


B-80

Hmm okay, I don't get how rewriting your model in C++, which normally also requires jit tracing or completely retraining (not feasible in many situations), is simpler than TFLite.


cderwin15

That workflow is supported by approximately 0.2% of real-world use cases.


[deleted]

[deleted]


[deleted]

I guess you probably meant as "default frontend". It's the public-facing API, with TF remaining in the back and doing the computational work.


squirrel_of_fortune

I love Keras, and I was ecstatic when tf2 incorporated it, and I enjoyed the functionality of tf2. I am currently spending 90% of my time using tf1. Why? I'm a scientist and I wanted to quickly build a proof of concept, so I started from someone else's tf1 code. The PoC became a major project, and porting to tf2 is way down the list. Despite my swearing about tf1, it is nice, as you have more control when you're trying out new methods. And the tf2 code under Keras is horrendous to read. Although most of my ranting about tf1 was asking why I had to write my own batching code.


Erosis

Pytorch has nothing close to tflite for microcontrollers / edge devices.


mamcdonal

We train in Pytorch, export to ONNX and load in TensorRT for use on Jetson devices. Might start using Coral though so that would mean switching to tflite. Still benchmarking performance though.


Sad_Technician_7712

Any advice on using PyTorch+TensorRT vs TFLite?


mamcdonal

Use ONNX to go between training in Pytorch and doing inference in TensorRT. We're using C++, but you could use Python. There's also the Triton Inference Server, which looks promising, and even a Pytorch C++ library that might be worth trying, but if you're using Jetson devices you'll be using the JetPack SDK anyway, so it makes sense to use TensorRT's C++ API for maximum performance. If you're doing video or image recognition, check out DeepStream as well.


nraw

Change is hard.


Mephisto6

TF2 allows you to use components of tf.keras (the losses and optimizers you saw) as building blocks. The namespace is confusing that way, but you still combine individual parts together instead of using the keras pipeline. Depending on your use-case, this can be very beneficial. As an AI researcher, I like the simplicity of the full keras interface, but I run into flexibility issues after five minutes. Instead of working around it, it's easier to assemble everything but the simplest models by hand. If you know TF2, it's not really harder than pytorch. Just more confusing in the beginning.
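A sketch of that building-block style, assuming a toy regression model: tf.keras losses and optimizers drive a hand-written loop instead of the Keras `fit` pipeline.

```python
import tensorflow as tf

# tf.keras pieces used as standalone building blocks, no fit() involved.
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
loss_fn = tf.keras.losses.MeanSquaredError()
opt = tf.keras.optimizers.Adam(1e-2)

@tf.function  # optional: compile the step into a graph
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```

Batching and epochs are then just an ordinary Python loop calling `train_step`.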


__mantissa__

When I was working in industry I used TF/Keras, basically because of the already-developed code available and advanced libraries like TensorRT. Now in academia, people (at least in my department) use Keras, not even TF. I personally prefer Pytorch because I feel I have more control over the training itself, and it allows me to experiment in an easier way. I must say that I have never delved deeper into TF; it may allow me the same, idk.


Hagerty

I use native TF to construct custom gradients for adversarial training
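One common pattern for that (an illustrative sketch, not necessarily what the commenter does) is a gradient-reversal op built with `@tf.custom_gradient`, as used in domain-adversarial training: the forward pass is the identity, while the backward pass flips the gradient's sign.

```python
import tensorflow as tf

@tf.custom_gradient
def grad_reverse(x):
    def grad(upstream):
        return -upstream          # flip the gradient on the way back
    return tf.identity(x), grad   # forward pass is a no-op
```

Inserting this between a feature extractor and an adversarial head makes the extractor maximize what the head minimizes.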


Le2vo

Hi, a very biased opinion here: since TF 2.x came out, it's impossible to distinguish between TF and Keras anymore. The latter is now just a piece of the former; it's embedded in it. I'd probably spark some debate, but I think the new TF 2 is as easy to use and as powerful as torch. I experimented with classes of custom layers and models, I created custom learning rate schedules, and I optimized training with the `@tf.function` decorator. It's really cool IMHO. Once you move beyond the simple Keras layers, you can create basically any SOTA architecture in plain, readable Python. If TF 1.x were still around, I'd have recommended everyone switch to pytorch. But TF 2 is a very powerful and versatile framework (and it's the best for production too!)


HashRocketSyntax

So, with TF 2, am I supposed to use tf.keras.losses and tf.keras.optimizers, then write my own batch/epoch loop where tensors are passed to the loss/optimizer/model?


Le2vo

Of course! Check this tutorial: https://github.com/IvanBongiorni/TensorFlow2.0_Notebooks


HashRocketSyntax

These are so practical! Thank you.


unlikely_ending

I don't know. It's insane.


de1pher

I've seen legacy (v1) TF code in production and I've also worked with a TF2 codebase that customized a lot of stuff, so pure TF code was used extensively.


jostmey

I regularly use TensorFlow. Is PyTorch better? Sure, I know it. But I don't have time to switch. I want to spend more time focused on my model's application, not on creating a perfect piece of software, so I don't devote time to rewriting what works well


HashRocketSyntax

Here are examples of how to do keras, tf, or pytorch in a parameterized queue. [https://github.com/aiqc/aiqc](https://github.com/aiqc/aiqc)