Context-Based QnA Using Google’s State-of-the-Art Pre-Trained BERT
Why Transformers? Long-standing sequence models such as RNNs and LSTMs suffer from several problems: they struggle to capture long-range dependencies; vanishing and exploding gradients are hard to resolve, even with batch gradient descent, when the network is large and deep; and a large number of training steps is required.
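Before getting into the architecture, here is a minimal sketch of what context-based QnA with a pretrained BERT looks like in practice. It assumes the Hugging Face transformers library and its question-answering pipeline with a SQuAD-fine-tuned BERT checkpoint; these specific tools are assumptions for illustration, not something prescribed by this article.

```python
# Minimal sketch: context-based question answering with a pretrained BERT.
# Assumes the Hugging Face `transformers` library and the SQuAD-fine-tuned
# checkpoint "bert-large-uncased-whole-word-masking-finetuned-squad".
from transformers import pipeline

# Load a question-answering pipeline backed by a BERT model fine-tuned on SQuAD.
qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",
)

# The "context" is the passage the model is allowed to read;
# the answer is extracted as a span from this text.
context = (
    "BERT is a transformer-based language model released by Google. "
    "It is pretrained on large text corpora and can be fine-tuned for "
    "downstream tasks such as question answering."
)

result = qa(question="Who released BERT?", context=context)
print(result["answer"], result["score"])  # e.g. "Google" plus a confidence score
```

The model does not generate free-form text here: it predicts start and end positions inside the given context, which is what makes this "context-based" QnA.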