Sounds like you have a lot to learn. I've modelled and rendered large cities, and there are trade-offs to be made. Let's go through your questions specifically:
"I need to receive accurate terrain where the central part is smooth in order to put the most important buildings."
Why does it need to be 'accurate'? With CGI, 'close enough' is usually what makes the budget work, unless your client has endless money. If you really do need accurate terrain, contact the developers and landscape architects; they have terrain elevations which you can use for surface deformation.
"Capturing terrain from some websites as mesh is not giving help, because at first sight terrains are accurate, but when I zoom and get closer to the ground, I find out that plopping buildings doesn’t give accurate result. For example, front side of building stands on ground level, but back side goes 3-5 meters underground"
This goes back to my first point: there is a trade-off to be made. Either flatten the surface around the buildings or extrude the back down. Most clients I've dealt with don't mind the extrusion as long as the front of the property and the driveways align correctly with the road. You can even cover the back with trees and plants if it suits. If you have a 3 meter drop, though, that's far too large to hide and you will need to smooth the surface. Terrain captured from websites is not accurate enough for this; see the point above about contacting the right people.
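To make the flatten-or-extrude decision concrete, here is a rough sketch in plain Python. It is not tied to any particular 3D package: the terrain is assumed to be a simple grid of heights in meters, the building footprint is assumed to be an axis-aligned block of grid cells, and the names `footprint_cells`, `place_building` and the 1 meter `max_drop` threshold are all made up for illustration. Pick whatever threshold suits your project.

```python
def footprint_cells(col0, row0, col1, row1):
    """Grid cells (row, col) covered by an axis-aligned footprint, inclusive."""
    return [(r, c) for r in range(row0, row1 + 1) for c in range(col0, col1 + 1)]


def place_building(heights, cells, max_drop=1.0):
    """Decide how to seat a building on a height grid (list of rows, meters).

    max_drop is an assumed threshold, not a rule from any package:
    - drop <= max_drop: leave the terrain alone, put the floor at the
      highest sample and extrude the walls down to cover the gap.
    - drop >  max_drop: flatten the footprint to its average height
      (this modifies `heights` in place).
    """
    samples = [heights[r][c] for r, c in cells]
    lowest, highest = min(samples), max(samples)
    drop = highest - lowest

    if drop <= max_drop:
        return {"action": "extrude", "floor": highest, "extrude_depth": drop}

    pad = sum(samples) / len(samples)
    for r, c in cells:
        heights[r][c] = pad
    return {"action": "flatten", "floor": pad, "extrude_depth": 0.0}


# Gentle slope: keep the terrain and extrude the low side down.
gentle = [[10.0, 10.2], [10.4, 10.6]]
print(place_building(gentle, footprint_cells(0, 0, 1, 1)))

# 3 meter drop: too large to hide, so the footprint gets flattened.
steep = [[10.0, 10.0], [13.0, 13.0]]
print(place_building(steep, footprint_cells(0, 0, 1, 1)))
```

Flattening like this leaves a hard step at the edge of the pad, so in a real scene you would blend the surrounding terrain back in with whatever smoothing or relax tool your software provides.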
"Which soft is good enough to flatten the terrain? Also, which software is the most optimal in consuming memory?"
All modern 3D applications can achieve this. That said, for city-sized scenes, in my experience 3ds Max is especially good at handling billions of polygons in the viewport, as long as you optimize where you need to. Every package has to be optimized somewhere at city scale, so be aware of your software's limitations. No one package is better than another when the artist knows it well and can adjust things to suit the client's needs.