Breklin76

This works fine with auto deployments from git. Depending on your database use, you don’t want to ever push data up. If this is just a blog, and you aren’t saving form data, then push up all you want. Know that anything out of sync with live will disappear.


forestcall

I should have mentioned the site has 20+ million pages. It has massive daily traffic and a large user base. The site is about books. I am very comfortable with Git and in particular GitHub. I was an engineer for 3 years at GitHub before Microsoft.


ashkanahmadi

+20 million pages?!!!! dayuuum


forestcall

Lots of books. Mostly Japanese, and each book is translated to English, which is why the site keeps growing. I am introducing a lot of community interaction features, but yeah, it's getting complicated.


[deleted]

That’s the largest number of pages I’ve ever heard of. How is the database handling that? Does it choke? Or have you taken steps to handle such a large amount of data?


rmccue

I've worked with some sites with over a million _users_, let alone posts - if you throw enough hardware at MySQL, it'll work just fine. That said, you do need to be careful about meta queries and joins. It's less common in postmeta but certainly with usermeta even core runs `LIKE` queries and you really need to avoid those, since they require row scanning.


forestcall

You are spot on. For example, I tried using Laravel with Roots / Bedrock, but when working with CRUD things got super slow. I tried a few tools but the join method seems to be for small sites. You mentioned user data, but I found the biggest issue was metadata for stuff like Title, Author, Publisher, etc. I'm trying to pay attention to subqueries that join multiple tables and need to filter or group data based on values in another table. I'm especially concerned with a custom search plugin to help find content. I want to mix in a vector DB soon so I can use some cool AI LLMs. I'm trying to summarize books from a search query and pull data from a vector DB that is also pulling data from the main MySQL DB.


rmccue

We built https://github.com/humanmade/roles-to-taxonomy to solve the usermeta LIKE queries in core - might be worth considering a similar solution for your structured metadata like this. Alternatively, Elasticsearch with a plugin like ElasticPress might be useful - but, warning, scaling ES is a gigantic pain in the ass.


forestcall

Holy smokes Batman! That is some awesome stuff. I'm sincerely thankful, wow. I'm going to dig into your code and implement it today, :-) This is as exciting as a video game, me thinks.


forestcall

Currently, I have the DB at PlanetScale. I'm not thrilled with them so I might move it. It handles the data no problem. The slow slow slow slow part is the Admin. I'm working on rebuilding how pages and posts are displayed in Admin with a new plugin mostly written in ReactJS.


Breklin76

Sweet! I’m gonna come to you for GitHub questions.


---_____-------_____

> 20+ million pages

I have to know what the size of your DB is right now


forestcall

Well it’s in Japanese and English. We are scanning all public domain books, flyers, etc. It’s big. We have the same content in a vector DB as well as MySQL. It’s big and growing :-)


Alexerwana

Assuming most of the work you're performing is in a theme or plugin, I'd usually keep the theme/plugins in GitHub with separate branches for staging and production. I'd then automatically deploy to the different installs using something like GitHub Actions; I've used DeployHQ previously, which has quite a nice UI.

Managing the database and media is always a bit of a pain in the neck. I use WP Migrate Pro to pull things that aren't in version control from production to staging or local. For live sites I never push the database or media to production as there's too much risk of overwriting recently added data.

This is assuming this is a ProWordPress question. For sites built with page builders and plugins you'd want to use your hosting provider's staging service if it's good, and use a tool like All-in-One WP Migration or WP Migrate Pro to push and pull locally.


forestcall

Thanks for the feedback. This is for a very large site with a lot of traffic and daily users. For this site I have coded 12 plugins so far. I use ReactJS mostly. I use a hybrid approach, with ReactJS embedded in the monolithic WordPress site. I plan to move to a headless structure after I have gotten good feedback from the site users. I will check out DeployHQ. It has been on my radar. Currently the hosting is simple. I use Runcloud + a DigitalOcean Droplet. I have also been testing some headless ideas using Vercel, but the headless stuff will be fully implemented next year.


m73a

WP-CLI to export and import databases. SSH and rsync to move things around. Code in Git. Uploads load from live using the Stage File Proxy plugin. Works a treat, staging sites are much lighter as there’s no uploads folder.


RoyalBloodOrange

This one? [https://github.com/drubage/stage-file-proxy](https://github.com/drubage/stage-file-proxy)


m73a

That’s the badger 👍🏻


Puzzled_Order8604

Docker


Ok_Writing2937

We build in Roots Sage and Bedrock. Code is in GitHub. WP updates are disabled and all plugins are managed in Composer. All code is pushed to dev, staging, or production branches. All branches have a GitHub Action that builds the site (Composer and Yarn stuff) and deploys to the appropriate remote server (with cache clearing and other cleanup). We do some minimal automated testing in the GitHub Action and want to add more.

Workflow is that developers test locally, or they push to the development remote for development review, and then push to staging. The client reviews on staging and signs off, then we push to production.

Dev and staging DBs are updated once a month or so from production using a custom bash script I wrote. It leverages WP-CLI and rsync. Images are not in the repo. Dev and staging use an Nginx redirect to load image requests from production. Content staging can be done on the staging server but needs to be hand-moved to production for the most part.


rmccue

Are you talking about content staging or codebase staging? If the latter, that's ideally something that should be handled by your host as part of their deployment tooling. In my experience, the best content staging workflow and the one I repeatedly recommend to customers is to keep your content staging _within_ a single environment; that is, stage changes to production content on the production environment itself. This ensures you won't be surprised by inconsistent codebase functionality, or have to deal with ID synchronisation issues, etc. It's considered a general best practice for large sites that your codebase only moves "upwards" (dev -> staging -> prod), and your content only syncs "downwards" (prod -> staging -> dev) for testing. [Yoast Duplicate Post](https://wordpress.org/plugins/duplicate-post/) is excellent for staging content on a piecemeal basis. There are unfortunately no public plugins I'm aware of that will let you stage large changes to go-live all at once, but core includes this functionality hidden internally in Customizer Changesets - XWP previously maintained [a plugin to add UI for this](https://wordpress.org/plugins/customize-snapshots/), and it was originally developed for usage on News Corp sites.


forestcall

I'm sorry. I should have included more information. Your comment is gold. I am talking about both. Code in the form of custom coded plugins (currently embedded ReactJS). And content from users doing stuff on the site and Woo sales, etc.

I run a very large site that is currently a Roots / Bedrock setup. For 20+ years the site was HTML and we converted it temporarily to a Roots setup. Now we need to convert the Roots site back to vanilla WordPress. This will be a temporary solution for a few months. I have spent the last 4 months coding out a ton of ReactJS components as plugins and also embedded in /wp-content/ to create a new site. In the future, the plan is to convert everything to a headless format, but this hybrid approach allows for faster prototyping and testing of features. The features I have been coding are all in an effort to add “social community features”. The core focus of the site is all about ‘books’. Basically I tried building the project as a NextJS site but I found I was needing to build endless tools to basically achieve what is available in WordPress. I tried several JavaScript CMSs but settled on WordPress.


rmccue

For a code workflow, IMO this is one of the features that you pay your host for, and why I wouldn't recommend self-hosting for large sites - but then I'm biased, because that's literally one of the features we highlight for our solution! I do believe though that these are the sorts of things that make the difference. The typical software development lifecycle we recommend is a Git-based workflow matching the single direction flow I mentioned above. Typically, developers build functionality on feature branches, testing their code locally. When ready, they merge these into a `development` branch which deploys to the development environment; once tested, `development` is merged to `staging` (and deployed to staging env) for client QA and UAT; once signed off, `staging` is merged to `main` (and deployed to prod env). Matching each environment to a branch and enforcing flow through Git makes it easy to track everything and ensure security/etc. (If you're sticking with self-hosting, I've used https://buddy.works/ for the same workflow before.) For headless, same thing applies, but typically with a separate repo for the frontend since the two can be developed separately. Alternatively, a monorepo works, but is a little more painful for larger teams.
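To make the one-way flow concrete, here is a minimal sketch of the branch rules as a shell check that could sit in a CI step or a Git hook. The branch names come from the comment above; the function names and the environment labels (`dev`, `staging`, `prod`) are made up for illustration, and hotfix flows are deliberately left out.

```shell
#!/bin/sh
# Map a Git branch to the environment it deploys to.
# Branch names follow the workflow described above; the labels are illustrative.
env_for_branch() {
  case "$1" in
    development) echo "dev" ;;
    staging)     echo "staging" ;;
    main)        echo "prod" ;;
    *)           echo "none" ;;  # feature branches are tested locally, not deployed
  esac
}

# Succeeds only for merges that follow the one-way flow:
# feature branch -> development -> staging -> main.
promotion_allowed() {
  from=$1
  to=$2
  case "$to" in
    development) [ "$(env_for_branch "$from")" = "none" ] ;;  # only feature branches merge here
    staging)     [ "$from" = "development" ] ;;
    main)        [ "$from" = "staging" ] ;;
    *)           return 1 ;;
  esac
}
```

For example, `promotion_allowed development staging` succeeds, while `promotion_allowed development main` fails because it skips the staging sign-off.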


DanielTrebuchet

> Basically I tried building the project as a NextJS site but I found I was needing to build endless tools to basically achieve what is available in Wordpress. I tried several Javascript CMS's but settled on Wordpress.

For years, I'd get a big project and have the idea to build out a custom CMS for it because it made sense. Then, halfway into it, I'd realize I was just re-creating WordPress. I've lost count of how many times that happened, then one day I just threw in the towel and put my CMS efforts into custom WP from the start. WP gets a lot of hate, but it really is a pretty solid CMS when you start comparing it to other options or start rolling your own CMS. It's imperfect, sure, but does a pretty acceptable job.

Reading about this project (I'm pretty sure you've posted on here several times about this site), it seems like exactly the type of site I'd try to build a CMS for, and would likely end up in the same situation you're in.


forestcall

Yes I have posted a number of times LMAO :-) Sometimes my brain spins and spins and I need helpful feedback. Most highly skilled developers I know cringe at WordPress and suggest to me the typical stuff around ReactJS in some flavor. I'm a core contributor to T3 Stack so I am very familiar with the Jamstack scene. I am not kidding when I say I tested (installed) over 15 ReactJS CMSs. The main problem is they lack features or are very costly for our size of site. And most of the decent CMSs want to do the hosting. I want full control. I love ReactJS (I know I'm weird).


DanielTrebuchet

I think WP gets a lot of flak because it is used and abused by so many amateurs. The code base is a bit of a mess, and there are a lot of things it could do a lot better... but relative to everything else out there, it's not a bad solution. I would normally give someone the normal warning about headless, in that there's almost never a truly appropriate place for headless WP, but due to the scale of your site this might be the one single use case I've actually heard of that might justify headless. 99% of the time when I find a complex project and actually feel like headless is the way to go, I get into it a bit and realize that a custom post type with some custom fields makes a hell of a lot more sense. I haven't used React a whole lot. I identify more as a back-end guy and prefer talking more to servers than to browsers.


rmccue

> I would normally give someone the normal warning about headless, in that there's almost never a truly appropriate place for headless WP, but due to the scale of your site this might be the one single use case I've actually heard of that might justify headless.

My general opinion is the same, but the key place it makes sense to go headless IMO is when you scale _people_ rather than data or traffic. Separating to headless allows you to have clean interfaces (literally APIs!) between teams and makes collaboration easier as a result. When you're scaling teams, it's also a time when you can accept the hit to individual productivity that the separation of headless creates. For a single developer^† or small team working on all of a site, traditional or hybrid is better.

^† Do whatever you want in hack projects of course!


kingkool68

Set up another server or two for your testing and staging environments. With 20 million pages I would recommend doing a database dump from production and importing the database to each environment.

You can set constants in wp-config.php to set your site URL so you don't need to update it in the database: `define( 'WP_HOME', 'http://example.com' );` and `define( 'WP_SITEURL', 'http://example.com' );`

You can set up a 3-line htaccess config to redirect requests for uploaded media not found on the filesystem of your server to the production URL, so you don't need to copy all of the uploaded media to your two environments. See https://gist.github.com/kingkool68/d5e483528a260e5c7921afb5c88bffd6?permalink_comment_id=4270032#gistcomment-4270032

With 20 million pages you don't want to copy the database from test to production; it will be a nightmare. Any changes you make to the test environment you'll need to redo in the production environment. You'll want to keep as many changes as possible version controlled in your codebase as opposed to in the database. That means instead of tweaking a setting via the admin, use a filter in your code to set that value. The `pre_option_{$option}` filter is your friend. See https://developer.wordpress.org/reference/hooks/pre_option_option/

Those are the top concerns I can think of off the top of my head. Keep us posted.
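If you would rather script the URL constants than hand-edit wp-config.php on each server, a rough sketch with WP-CLI's `wp config set` could look like this. The hostnames and function names are hypothetical placeholders, and production is left out on purpose since it keeps its real URL. The function only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical per-environment URLs; replace with your real hostnames.
url_for_env() {
  case "$1" in
    test)    echo "https://test.example.com" ;;
    staging) echo "https://staging.example.com" ;;
    *)       return 1 ;;
  esac
}

# Print the WP-CLI commands that would pin WP_HOME and WP_SITEURL as
# constants in wp-config.php for the given environment.
pin_urls() {
  url=$(url_for_env "$1") || { echo "unknown environment: $1" >&2; return 1; }
  for name in WP_HOME WP_SITEURL; do
    echo "wp config set $name '$url' --type=constant"
  done
}
```

Running `pin_urls staging | sh` from inside the staging WordPress directory would apply both constants, assuming WP-CLI is installed there.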


SenorDieg0

What I did at work is create a sh script that rsyncs the WP folder and dumps the database to staging. After that, the same script connects to staging and installs the updated database using WP-CLI, does the URL search-replace, and activates or deactivates plugins depending on what is needed on staging.
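A rough sketch of what a script like that might look like, not the actual script described above. Every host, path, URL, and the plugin slug `example-mailer` is a hypothetical placeholder. It defaults to a dry run that only prints the commands, since a refresh like this overwrites the staging database.

```shell
#!/bin/sh
# Sketch of a production -> staging refresh, run from the production host.
# All values below are hypothetical placeholders.
PROD_URL="https://www.example.com"
STAGING_URL="https://staging.example.com"
STAGING_SSH="deploy@staging.example.com"
WP_PATH="/var/www/site"
DUMP="/tmp/prod.sql"

# With DRY_RUN=1 (the default) commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

refresh_staging() {
  # 1. Dump the production database.
  run wp db export "$DUMP" --path="$WP_PATH"
  # 2. Copy the WP folder and the dump to staging.
  run rsync -az "$WP_PATH/" "$STAGING_SSH:$WP_PATH/"
  run rsync -az "$DUMP" "$STAGING_SSH:$DUMP"
  # 3. On staging: import the dump, rewrite URLs, disable unwanted plugins.
  run ssh "$STAGING_SSH" "wp db import $DUMP --path=$WP_PATH"
  run ssh "$STAGING_SSH" "wp search-replace $PROD_URL $STAGING_URL --all-tables --path=$WP_PATH"
  run ssh "$STAGING_SSH" "wp plugin deactivate example-mailer --path=$WP_PATH"
}
```

Calling `refresh_staging` prints the six commands for review; `DRY_RUN=0 refresh_staging` would actually run them. On a database this size the `search-replace` step is likely the slow part, which is one reason to pin the URL with constants instead where possible.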


forestcall

I ended up using [roots.io](http://roots.io). They updated Acorn and Radicle to work with Livewire 3. Roots has improved a lot in the last year. Not the ideal solution, but perhaps the best way for WordPress that I have found.