As Fimm says, you'll generally get deeper depth of field with smaller apertures (higher f-stop numbers), which means you'll need longer exposures and thus usually a tripod, and ideally lots of light. The phenomenon is sometimes called the "pinhole" effect, since it's basically how a "camera obscura" or "pinhole camera" works.
Some phone cameras do also have physical (optical) zoom, believe it or not, but barring that you can select the longest lens, which can have a similar effect. Longer lenses tend to have a smaller aperture relative to their focal length, even when you don't have a physical iris you can adjust. Essentially, the further you can get from your subject the deeper the depth of field will be, and thus the more of the image you can make acceptably sharp. Some kind of aftermarket telephoto for your phone should have a similar effect, and some of them come with a tripod mount. It won't be as good as something genuinely built into the lens train, but if the phone's camera is high enough resolution it might not matter.
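If you want rough numbers for those two effects (stopping down and backing away), here's a minimal sketch using the standard thin-lens depth of field approximation. The function name, the 50 mm focal length, and the 0.03 mm circle of confusion are just illustrative assumptions on my part, not values for any particular phone or camera, so plug in your own.

```python
import math

def depth_of_field(focal_mm, f_number, subject_mm, coc_mm):
    """Return (near, far) limits of acceptable sharpness in mm.

    Thin-lens approximation. coc_mm is the circle of confusion,
    an assumed value that depends on sensor size and viewing conditions.
    """
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = (subject_mm * (hyperfocal - focal_mm)
            / (hyperfocal + subject_mm - 2 * focal_mm))
    if subject_mm >= hyperfocal:
        # Focused at or beyond the hyperfocal distance: sharp out to infinity.
        return near, math.inf
    far = subject_mm * (hyperfocal - focal_mm) / (hyperfocal - subject_mm)
    return near, far

# Hypothetical setups: same lens, varying f-stop and subject distance.
for f_number, subject_mm in [(2.8, 500), (16, 500), (2.8, 1000)]:
    near, far = depth_of_field(50, f_number, subject_mm, 0.03)
    print(f"f/{f_number} at {subject_mm} mm: "
          f"sharp from {near:.0f} mm to {far:.0f} mm ({far - near:.0f} mm deep)")
```

Comparing the first line of output against the other two shows the sharp zone growing both when you stop down and when you back away. Keep in mind that backing away also makes the model smaller in the frame, so you pay for the extra depth in cropping or in a longer lens.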
There's also a thing called "focus stacking," where you take multiple images of the same subject from the same perspective with the only difference being the focus, to get different "layers" of the image in focus. You can then "stack" them in post-processing and selectively erase the out-of-focus sections in each, leading to a tack-sharp composite. It's a bit of work in a program like Photoshop or GIMP, but a lot less work than manually despeckling an image, say, or manually erasing a background.
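For the curious, the idea behind automated stacking can be sketched in a few lines. This is a toy illustration under my own assumptions, not how Photoshop or GIMP do it: frames are small grayscale grids (lists of rows of floats) that are already perfectly aligned, sharpness is scored with a simple Laplacian, and each output pixel is copied from whichever frame scores highest there. The function names are made up for the example. A real workflow would also align the frames and blend the seams.

```python
def laplacian_magnitude(img):
    """Absolute 4-neighbour Laplacian: a crude per-pixel sharpness score."""
    h, w = len(img), len(img[0])

    def px(y, x):
        # Clamp coordinates so edge pixels reuse their nearest neighbour.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[abs(px(y - 1, x) + px(y + 1, x) + px(y, x - 1) + px(y, x + 1)
                 - 4 * px(y, x))
             for x in range(w)]
            for y in range(h)]

def focus_stack(frames):
    """For each pixel, keep the value from the frame that is sharpest there."""
    scores = [laplacian_magnitude(frame) for frame in frames]
    h, w = len(frames[0]), len(frames[0][0])
    stacked = []
    for y in range(h):
        row = []
        for x in range(w):
            best = max(range(len(frames)), key=lambda i: scores[i][y][x])
            row.append(frames[best][y][x])
        stacked.append(row)
    return stacked

# Toy data: a checkerboard "subject" where each frame has only one half in
# focus; the other half is flat grey to stand in for blur.
def checker(y, x):
    return float((x + y) % 2)

front = [[checker(y, x) if x < 3 else 0.5 for x in range(6)] for y in range(4)]
back = [[0.5 if x < 3 else checker(y, x) for x in range(6)] for y in range(4)]

for row in focus_stack([front, back]):
    print(row)
```

Picking a single winning frame per pixel is the simplest possible rule. It tends to leave visible seams on real photos, which is why doing the erasing by hand, or with a dedicated stacking tool, usually looks better.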
And there are advantages to getting in close as well, since it can be easier to make a thing look "cinematic." I find it's always a bit of a tradeoff. And having some portions of an image out of focus can be useful and even desirable. The "bokeh" effect can mean that your background doesn't distract from the subject, and when shooting a miniature in a light box there's really no need for the box to be sharp, just the miniature, making a limited depth of field your friend.
Part of the reason Tex is having trouble here, to be fair, is that his models are so good, so massive, and so spectacularly detailed that every compromise screws up something.
Focus stacking. For Tex, I think focus stacking is the answer. Chuck the white background and use something with a little color to it; maybe even a proper backdrop. And use some harder, more cinematic light. A lower color temperature will warm the image up a bit, and harder light will give some shadows and some directionality to it, which will play well with the storytelling. The light in the above shots is too soft, I think, which makes the images look flatter than they should. Something like that might work fine for a catalog photo, but you have a movie set.
That's my two cents.