29 April, 2026

SitecoreAI Pathway vs XM to XM Cloud Migration Tool


In my previous blog post, I wrote about SitecoreAI Pathway when it was announced at Sitecore Symposium 2025. I mentioned that I would explore the tool once it was available. Since then, I have been hands-on with Pathway and recently presented my findings at the Sitecore User Group Coimbatore (SUGCBE) in April 2026. This post is an updated comparison between the XM to XM Cloud Migration Tool (which I have been writing about throughout 2025) and the new SitecoreAI Pathway, now informed by my actual experience using the tool.

If you have been following my migration tool series (Part 1, Part 2, Part 3), you know how the old tool works. This post explains what Pathway does differently and why Sitecore is moving in this direction.

Quick Recap: XM to XM Cloud Migration Tool

The XM to XM Cloud Migration Tool was a technical utility that helped move content, media, and users from Sitecore XP/XM to XM Cloud. It was a "Lift & Shift" approach - it moved your data as-is from source to target.

It had two modes: GUI and CLI. You would point it to your source CM instance, select the content, media, or users to migrate, and it would transfer them to the target XM Cloud environment. Simple and effective for what it was designed to do.

However, it had clear limitations:

  • It only moved data, not the website structure
  • It preserved the existing monolithic content models without modernizing them
  • No support for component-first restructuring
  • No content modeling cleanup
  • Sitecore-to-Sitecore only - no support for other CMS platforms
  • Required direct access to the source CM instance

The tool is now limited to as-is migration only — primarily used for migrating media library assets from source to target. Pathway is the new recommended approach for content migration.

What is SitecoreAI Pathway?

SitecoreAI Pathway is an AI-powered content migration tool available as a marketplace app through the Sitecore Cloud Portal. Unlike the old migration tool, Pathway does not just move data from point A to point B. It uses AI to analyze your existing site structure, intelligently map your legacy templates and renderings to modern SitecoreAI components, and transform your content model during migration.

Think of it this way: the old tool was a moving truck that transported your furniture as-is. Pathway is more like an interior designer who reorganizes and modernizes your furniture to fit a new, better-designed house.

Two Migration Paths

One of the biggest things I discovered during my exploration is that Pathway supports two distinct migration paths:

1. Sitecore Website — For existing Sitecore XM/XP customers. You use the XMComponentExtraction tool to extract source page data as JSON files and upload them to Azure Blob Storage. Pathway then groups pages by source template. This path requires .NET 9.0, PowerShell 7+, and CM access.

2. Any Website — This is the one that surprised me. Pathway has a built-in web crawler that can scrape any public website. Just provide a URL or sitemap XML link, and Pathway handles the rest — no Azure Blob setup, no extraction tools. I tested this with my own blog (nehemiahj.com) and it crawled 50 pages, grouped them by page design and HTML structure, and migrated them successfully.

Both paths share a common requirement: your target SitecoreAI site structure must already be in place — templates, components, page designs. Pathway maps content to existing structures; it does not create new ones.

Side-by-Side Comparison

| Aspect | XM to XM Cloud Migration Tool | SitecoreAI Pathway |
| --- | --- | --- |
| Approach | Lift & Shift | AI-powered intelligent mapping & transformation |
| Source Systems | Sitecore XP/XM only | Sitecore XP/XM and any public website (via built-in web crawler) |
| What it Migrates | Content, Media, Users | Page content & structure with AI-mapped components (Media still needs the old tool) |
| Content Model | Preserves existing structure as-is | Modernizes content model to headless-friendly components |
| AI Involvement | None | AI groups pages, maps components, provides reasoning for mapping decisions |
| Human Review | Select what to migrate, then it runs | Human-in-the-loop: review and validate AI mappings before execution |
| Prerequisites on Target | XM Cloud environment ready | Target site structure, templates, components, and page designs must be created first |
| Multi-language | Supported | Single language per migration run (beta limitation) |
| Interface | Desktop app (GUI) or CLI | Web-based (Sitecore Cloud Portal) + downloadable extraction packages |
| Speed | Depends on content volume | Sitecore claims up to 70% reduction in migration time through AI automation |
| Accuracy | 1:1 copy (100% of what it copies) | AI-mapped content needs manual review; post-migration refinement is expected |
| Cost | Free (part of XM Cloud) | Included (free) with Sitecore 360 / SitecoreAI subscription |

How SitecoreAI Pathway Works (High Level)

The migration process in Pathway follows these key stages:

  1. Install & Create Migration - Install Pathway from the Sitecore Cloud Portal Marketplace. Create a new migration and select your source type — "Sitecore website" or "Any website."
  2. Configure Target - Select your target SitecoreAI environment and site. Upload your target site structure (xmc-structure.json) generated using the CMSExportStructure PowerShell script.
  3. Extract Source Data - For Sitecore websites: deploy the XMComponentExtraction handler and console app to extract page data as JSON to Azure Blob Storage. For any website: click "Start Web Crawling" and Pathway's crawler discovers and extracts pages automatically via the sitemap.
  4. Content Audit — AI Page Grouping - Pathway's AI analyzes the extracted pages and groups similar pages together. For Sitecore sites, grouping is based on source templates. For any website, it analyzes page design and HTML structure.
  5. Template Mapping - The AI matches each group to the most appropriate target template. You can view the reasoning behind each mapping decision — why it chose a particular template based on page structure and content.
  6. Component Mapping - The AI maps source components to target SitecoreAI components. Review and adjust manually if needed.
  7. Execute Migration - Run the migration with a real-time dashboard showing succeeded vs failed pages. Retry failed pages as needed.
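
For the Sitecore website path, the grouping in step 4 is conceptually a group-by on the source template. A minimal sketch of that idea, assuming each extracted page is a JSON object with hypothetical `url` and `template` fields (the real XMComponentExtraction output format is not shown in this post and may well differ):

```javascript
// Hypothetical page records; the field names are assumptions for illustration,
// not the actual XMComponentExtraction schema.
const pages = [
  { url: "/blog/post-1", template: "Blog Post" },
  { url: "/blog/post-2", template: "Blog Post" },
  { url: "/products/widget", template: "Product Page" },
];

// Group page URLs by their source template name.
function groupByTemplate(records) {
  const groups = new Map();
  for (const page of records) {
    if (!groups.has(page.template)) groups.set(page.template, []);
    groups.get(page.template).push(page.url);
  }
  return groups;
}

const groups = groupByTemplate(pages);
for (const [template, urls] of groups) {
  console.log(`${template}: ${urls.length} page(s)`);
}
```

Each resulting group is what you would then map to a single target template, instead of mapping every page individually.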

What Pathway Does NOT Migrate

It is important to know the boundaries. Pathway does NOT handle:

  • Media library assets - You still need the XM to XM Cloud Migration Tool for this
  • Templates and component code - Must be recreated in SitecoreAI by developers
  • Personalization rules
  • Workflow states
  • Analytics data
  • Custom modules and integrations
  • Datasources with child items

This means the old migration tool is not obsolete. You will likely use both tools together: the old tool for media migration and Pathway for intelligent content migration.

Beta Limitations to Keep in Mind

Since Pathway is still in beta, there are some constraints I encountered during my testing:

  • 50 URL limit — Each migration run supports a maximum of 50 URLs. For larger sites, you will need multiple runs.
  • Single language per run — Pathway currently supports one language per migration. Multilingual sites will need separate runs for each language.
  • No JavaScript-rendered content — The web crawler works with static HTML only. Pages that rely heavily on client-side JavaScript rendering will not be fully captured.
  • No gated or authenticated pages — The crawler cannot access pages behind login walls or IP restrictions (unless you whitelist the crawler's IP).
  • No AI customization yet — You can see the AI's reasoning for its mapping decisions, which is great for transparency. But there is no way to guide or customize the AI instructions yet. If your team knows the source system well, you cannot tell the AI "these are product pages, not blog posts" — that capability is not available today.
  • No re-run capability — If something goes wrong mid-migration, you may need to start from the beginning rather than resuming where you left off.
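
The 50 URL limit means a larger sitemap has to be split into batches before you start. A minimal sketch of that split, where `chunkUrls` and the example URLs are hypothetical and the default batch size of 50 is the beta limit described above:

```javascript
// Split a list of URLs into batches no larger than `size`.
function chunkUrls(urls, size = 50) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < urls.length; i += size) {
    batches.push(urls.slice(i, i + size));
  }
  return batches;
}

// Example: 120 placeholder URLs become three runs of 50, 50, and 20.
const allUrls = Array.from({ length: 120 }, (_, i) => `https://example.com/page-${i + 1}`);
const runs = chunkUrls(allUrls);
console.log(runs.map((batch) => batch.length)); // [ 50, 50, 20 ]
```

How each batch is handed to Pathway (for example, as a separate sitemap file per run) depends on your setup.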

My Hands-On Experience

I tested Pathway using both migration paths. For the "Any Website" path, I used my blog nehemiahj.com. The crawler picked up 50 pages from the sitemap, the AI grouped them and named each group based on the blog post content (like "XM Migration CLI Guide," "Commerce Cache Troubleshooting"), and mapped them all to the "Page" target template. Migration result: 50 succeeded, 0 failed. The migrated pages appeared in the Content Editor organized by year and month, with URL-friendly names preserved.

The AI grouping is where Pathway really shines. Instead of mapping every page individually, the AI identifies common patterns and groups similar pages together. This dramatically reduces repetitive mapping effort, especially for sites with hundreds of pages sharing similar structures.

The prerequisite setup for the Sitecore website path is more involved — you need to deploy the XMComponentExtraction handler, configure SecurityPolicy changes, set up Azure Blob Storage, and prepare the target structure JSON. It works, but plan time for getting all the pieces in place.

Key Takeaway

The XM to XM Cloud Migration Tool and SitecoreAI Pathway serve different purposes. The old tool is a reliable data mover, now primarily used for media migration. Pathway is a content modernizer that uses AI to transform your legacy content model into a headless-friendly architecture.

For organizations migrating to SitecoreAI, the recommended approach is:

  1. Design your target site structure in SitecoreAI (templates, components, page designs)
  2. Migrate media assets using the XM to XM Cloud Migration Tool
  3. Migrate and transform content using SitecoreAI Pathway
  4. Review and refine migrated content — AI mapping is powerful but post-migration cleanup is expected

In my upcoming posts, I will share detailed walkthroughs of both migration paths with screenshots — the Sitecore website extraction process and the web crawling approach. Stay tuned.

10 April, 2026

Fixing Coveo Analytics InvalidToken and ExpiredToken Errors in Sitecore XM/XP

If you are using Coveo for Sitecore and seeing InvalidToken or ExpiredToken errors in your Coveo analytics logs, this post covers four issues I ran into and how I fixed them. These are specific to Sitecore XP sites using the Coveo REST proxy for analytics, but the patterns may help anyone dealing with Coveo analytics initialization problems.

The Setup

The site had Coveo analytics initialized in multiple places - a shared JavaScript file (coveo-analytics.js) and several cshtml Razor views (CoveoPageViewAnalytics.cshtml, ProductDetails.cshtml, ThankYou.cshtml). Each file loaded the Coveo analytics script using the standard IIFE loader and called coveoua('init', apiKey) independently.

Here is a quick comparison of the two approaches:

| Aspect | coveo-analytics.js | cshtml Razor Views |
| --- | --- | --- |
| API Key Source | Hardcoded in JS with client-side hostname check | Server-side via Settings.GetSetting("CoveoAnalyticsKey", "") |
| Init Timing | Immediate - runs as soon as script loads | Deferred - waits for CoveoSearchEndpointInitialized event |
| Analytics Routing | Direct to Coveo cloud | Through Sitecore proxy (/coveo/rest/ua/v15) |
| Sends Events | No - just initializes for other components to use | Yes - sends page view with metadata |

On pages where both the JS file and a cshtml view loaded together, things broke. Here are the four problems and fixes.

Problem 1: Analytics Bypassing Sitecore Proxy

The coveo-analytics.js file was sending analytics events directly to analytics.cloud.coveo.com instead of routing through the Sitecore proxy at /coveo/rest/ua/v15. The cshtml files had the proxy override but the JS file did not.

The fix is to override baseUrl on the Coveo analytics client prototype inside the onLoad callback (you need to wait for the script to load before the prototype is available):

coveoua('onLoad', function () {
    Object.defineProperty(
        coveoanalytics.CoveoAnalyticsClient.prototype,
        'baseUrl',
        { get() { return '/coveo/rest/ua/v15'; } }
    );
    coveoua('init', apiKey);
});

This makes sure all analytics traffic goes through the Sitecore proxy, same as the cshtml views.

Problem 2: InvalidToken from Multiple Initialization

This was the main issue. On pages where both coveo-analytics.js and a cshtml view loaded, coveoua('init') was being called twice. Each call resets the analytics client and creates a new session token. Any events queued from the first initialization become invalid - hence the InvalidToken error.

There was also a secondary issue: the Object.defineProperty call for the proxy override would throw a TypeError on the second call because the property was defined as non-configurable by default.
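
The TypeError is easy to reproduce outside the browser. A minimal sketch, using a stand-in `FakeClient` class rather than the real Coveo client object:

```javascript
// Stand-in for the analytics client class; not the real Coveo object.
class FakeClient {}

function overrideBaseUrl() {
  // Properties created by defineProperty default to configurable: false.
  Object.defineProperty(FakeClient.prototype, "baseUrl", {
    get() { return "/coveo/rest/ua/v15"; },
  });
}

overrideBaseUrl(); // first call succeeds

let secondCallError = null;
try {
  overrideBaseUrl(); // second call tries to redefine it with a new getter function
} catch (err) {
  secondCallError = err;
}

console.log(secondCallError instanceof TypeError); // true
console.log(new FakeClient().baseUrl); // /coveo/rest/ua/v15
```

The override from the first call stays in place; only the second definition attempt fails.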

The fix is a two-flag initialization guard using a shared namespace on window. Why two flags? Because the IIFE script injection is synchronous but the init call happens asynchronously inside onLoad. A single flag would either skip the IIFE or skip the init at the wrong time depending on load order.

window.SITE = window.SITE || {};

// Flag 1: Guard the IIFE script injection (synchronous)
if (!window.SITE._coveoScriptInjected) {
    window.SITE._coveoScriptInjected = true;

    (function (c, o, v, e, O, u, a) {
        a = 'coveoua'; c[a] = c[a] || function () { (c[a].q = c[a].q || []).push(arguments) };
        c[a].t = Date.now(); u = o.createElement(v); u.async = 1; u.src = e;
        O = o.getElementsByTagName(v)[0]; O.parentNode.insertBefore(u, O)
    })(window, document, 'script', 'https://static.cloud.coveo.com/coveo.analytics.js/2/coveoua.js');
}

// Flag 2: Guard the init call (asynchronous, inside onLoad)
coveoua('onLoad', function () {
    if (!window.SITE._coveoInitialized) {
        window.SITE._coveoInitialized = true;
        Object.defineProperty(
            coveoanalytics.CoveoAnalyticsClient.prototype,
            'baseUrl',
            { get() { return '/coveo/rest/ua/v15'; } }
        );
        coveoua('init', apiKey);
    }
});

This pattern works regardless of which file loads first - JS or cshtml. The first one to run sets the flag and does the initialization. The rest skip it. Event-sending code like coveoua('send', 'view', metadata) or coveoua('ec:addProduct', ...) does not need guards - those use the internal queue and will execute correctly once init completes.

I applied this guard pattern across all four files that had Coveo analytics initialization.
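
Since the same guard is repeated in several files, it could also be wrapped in a small helper. A sketch under assumptions: `initCoveoOnce` is a hypothetical name, the `win` parameter stands in for `window` so the logic can be exercised outside a browser, and the `init` callback is where the baseUrl override and coveoua('init', apiKey) would go.

```javascript
// Run `init` at most once per page, tracking state on a shared namespace.
function initCoveoOnce(win, init) {
  win.SITE = win.SITE || {};
  if (win.SITE._coveoInitialized) {
    return false; // another file already initialized analytics
  }
  win.SITE._coveoInitialized = true;
  init();
  return true;
}

// Simulate two files (for example a JS bundle and a Razor view) on one page.
const fakeWindow = {};
let initCalls = 0;
const first = initCoveoOnce(fakeWindow, () => { initCalls += 1; });
const second = initCoveoOnce(fakeWindow, () => { initCalls += 1; });

console.log(first, second, initCalls); // true false 1
```

Whichever file runs first performs the initialization, and the later call becomes a no-op.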

Problem 3: ExpiredToken and InvalidToken from Bot Traffic

While debugging the above, I also noticed two other errors in the Coveo logs that turned out to be caused by Googlebot crawling the site. This is worth calling out because in today's landscape of bots and AI crawlers, bot behavior is constantly changing. Bots render JavaScript, execute search queries, and trigger analytics events just like real users do. If you are not filtering bot traffic from your analytics endpoints, you end up with polluted data - expired tokens from stale bot renders, invalid sessions from crawlers re-initializing your analytics client, and noise that makes it harder to trust your actual user metrics. Keeping bot traffic out of your analytics pipeline is not optional anymore.

The first was a 419 ExpiredToken on the search proxy (/coveo/rest/search/v2). Sitecore generates a search JWT at page render time with a 24-hour TTL. By the time Googlebot crawled and executed the page, the token was expired. This is a server-side issue - the frontend cannot fix it.
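
To confirm that an ExpiredToken error really is a stale token, you can decode the token payload and compare its expiry to the current time. A minimal sketch for Node.js, assuming the search token is a standard JWT carrying an `exp` claim in seconds since the epoch (as defined in RFC 7519). The sample token is built locally for illustration and is not a real Coveo token, and the code only decodes the payload; it does not verify the signature.

```javascript
// Decode the payload (middle segment) of a JWT without verifying it.
function decodeJwtPayload(token) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("not a JWT: expected 3 segments");
  return JSON.parse(Buffer.from(parts[1], "base64url").toString("utf8"));
}

// True when the `exp` claim (seconds since epoch) is in the past.
function isExpired(token, nowMs = Date.now()) {
  const { exp } = decodeJwtPayload(token);
  if (typeof exp !== "number") throw new Error("token has no numeric exp claim");
  return exp * 1000 <= nowMs;
}

// Build an unsigned sample token whose exp is 24 hours after issue time.
const encode = (obj) => Buffer.from(JSON.stringify(obj)).toString("base64url");
const issuedAt = Date.UTC(2026, 3, 1) / 1000; // 1 April 2026, 00:00 UTC
const sample = [encode({ alg: "none" }), encode({ exp: issuedAt + 24 * 3600 }), "sig"].join(".");

console.log(isExpired(sample, Date.UTC(2026, 3, 1, 12))); // false: 12 hours after issue
console.log(isExpired(sample, Date.UTC(2026, 3, 3)));     // true: 48 hours after issue
```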

The second was a 400 InvalidToken on the analytics proxy (/coveo/rest/ua/v15). This was a different root cause - the CoveoAnalyticsKey Sitecore setting on one of the sites was misconfigured. It had a search JWT (issued by SearchApi, with a queryExecutor role) instead of a static analytics API key (the xx... format). The analytics endpoint does not accept search tokens, so it returned InvalidToken.

Quick check: If your CoveoAnalyticsKey Sitecore setting starts with xx, you are good. If it looks like a long JWT string, it is wrong - you need the static analytics API key from your Coveo admin console.
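
If you manage several sites, the quick check can be scripted. A heuristic sketch, where `classifyCoveoKey` is a hypothetical helper and the sample values are made up: it treats a value starting with `xx` as a static API key and a three-segment dotted string starting with `eyJ` (how a base64url-encoded JSON header typically begins) as a JWT.

```javascript
function classifyCoveoKey(value) {
  const key = (value || "").trim();
  if (key === "") return "missing";
  if (key.startsWith("xx")) return "api-key";
  // JWTs are three base64url segments separated by dots.
  if (/^eyJ[\w-]*\.[\w-]+\.[\w-]*$/.test(key)) return "jwt";
  return "unknown";
}

console.log(classifyCoveoKey("xx00000000-0000-0000-0000-000000000000")); // api-key
console.log(classifyCoveoKey("eyJhbGciOiJIUzI1NiJ9.eyJleHAiOjF9.c2ln")); // jwt
console.log(classifyCoveoKey(""));                                        // missing
```

A result of "jwt" for the CoveoAnalyticsKey setting points to the misconfiguration described above.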

Filtering Bot Traffic from Analytics

To stop bots from generating noise in your Coveo analytics, you have a few options:

| Approach | How | Notes |
| --- | --- | --- |
| Cloudflare WAF | Block Googlebot on /coveo/rest/ua/* only | Best option if Cloudflare is in your stack. Do NOT block /coveo/rest/search/ - that will break SEO. |
| IIS URL Rewrite | Match User-Agent containing "Googlebot" on ^coveo/rest/ua/.* and abort | Works at the server level before Sitecore processes the request |
| JavaScript | Check navigator.userAgent for bot patterns before calling initCoveoUa() | Secondary layer - Googlebot can execute JS, so not fully reliable |

The key point: only block bots from the analytics path (/coveo/rest/ua/). The search path (/coveo/rest/search/) needs to stay open for Googlebot to index your search-driven content like product pages and knowledge base articles.

And do not just think about Googlebot. Bingbot, SEMrush, Ahrefs, AI training crawlers - they all behave differently, and they change their patterns over time. None of them have any legitimate reason to send commerce analytics events like add-to-cart or purchase. Filtering them out keeps your Coveo ML models trained on real user behavior, not crawler noise.
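
As a sketch of the JavaScript layer, the check can be a small user-agent test that runs before initialization. The pattern list is illustrative and far from complete, and `isLikelyBot` is a hypothetical helper; `initCoveoUa()` is the init function mentioned in the options above.

```javascript
// Illustrative, non-exhaustive list of crawler user-agent tokens.
const BOT_PATTERN = /googlebot|bingbot|semrushbot|ahrefsbot|gptbot/i;

function isLikelyBot(userAgent) {
  return BOT_PATTERN.test(userAgent || "");
}

// In the browser this would wrap the init call, for example:
//   if (!isLikelyBot(navigator.userAgent)) { initCoveoUa(); }

const googlebotUa = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
const safariUa = "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1";

console.log(isLikelyBot(googlebotUa)); // true
console.log(isLikelyBot(safariUa));    // false
```

Because user-agent strings can be spoofed and crawler patterns change, this is best treated as a backup to the server-level rules.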

Problem 4: Google Translate Users Getting Failed Analytics (CORS Preflight)

This one was not a bot issue at all. I noticed 400 errors on OPTIONS /coveo/rest/ua/v15/analytics/custom in IIS logs, with a real browser User-Agent (iPhone Safari) and a Referer from *.translate.goog.

When a user browses your site through Google Translate, the page is served from *.translate.goog (e.g. yourdomain-com.translate.goog). Any JavaScript on that page making requests back to your actual domain is now cross-origin. The browser sends an HTTP OPTIONS preflight request before the actual analytics POST.

Here is the flow:

  1. User visits yourdomain-com.translate.goog
  2. JS tries to POST analytics to yourdomain.com/coveo/rest/ua/v15/analytics/custom
  3. Browser detects cross-origin and sends OPTIONS preflight first
  4. Sitecore returns 400 (Coveo proxy handler does not handle OPTIONS)
  5. Browser blocks the actual analytics POST
  6. Commerce events (add-to-cart, purchase, page view) are never recorded for translated sessions
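
Steps 4 and 5 follow from how browsers evaluate a preflight response. A simplified sketch of that check (it ignores credentials and the method and header negotiation, and `preflightPasses` is a hypothetical name): the preflight must return a 2xx status and an Access-Control-Allow-Origin value that matches the requesting origin.

```javascript
// Simplified model of the browser's preflight check: status must be 2xx and
// Access-Control-Allow-Origin must be "*" or equal the requesting origin.
function preflightPasses(status, headers, requestOrigin) {
  const ok = status >= 200 && status <= 299;
  const allowOrigin = headers["access-control-allow-origin"];
  return ok && (allowOrigin === "*" || allowOrigin === requestOrigin);
}

const translateOrigin = "https://yourdomain-com.translate.goog";

// What the Coveo proxy handler returned: 400 with no CORS headers.
console.log(preflightPasses(400, {}, translateOrigin)); // false

// A preflight response the browser would accept.
console.log(
  preflightPasses(204, { "access-control-allow-origin": translateOrigin }, translateOrigin)
); // true
```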

These are real users losing analytics tracking - not bots. The Cloudflare bot-blocking rule from Problem 3 should NOT block this traffic.

Why Not Cloudflare?

I initially considered Cloudflare for this fix, but it does not work well here. Cloudflare Transform Rules can add headers but cannot change the HTTP status code. A WAF Custom Rule with "Return fixed response" can return a 204, but it cannot echo back the dynamic Origin header value - which is required for CORS. The origin changes per site (e.g. example-com.translate.goog), so you cannot hardcode it.

Fix: Sitecore httpRequestBegin Pipeline Processor

The fix that worked is a custom Sitecore httpRequestBegin pipeline processor that intercepts OPTIONS preflight requests for the Coveo analytics path and returns 204 with the correct CORS headers before the request reaches the Coveo proxy handler.

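using System;
using System.Web;
using Sitecore.Pipelines.HttpRequest;

// Answers CORS preflight (OPTIONS) requests from Google Translate origins
// before they reach the Coveo proxy handler.
public class HandleCoveoCors : HttpRequestProcessor
{
    public override void Process(HttpRequestArgs args)
    {
        var request = HttpContext.Current.Request;
        var response = HttpContext.Current.Response;
        var origin = request.Headers["Origin"] ?? "";

        if (request.HttpMethod == "OPTIONS" &&
            request.Path.StartsWith("/coveo/rest/", StringComparison.OrdinalIgnoreCase) &&
            origin.EndsWith(".translate.goog", StringComparison.OrdinalIgnoreCase))
        {
            response.StatusCode = 204;
            // Echo the caller's origin; the header does not accept
            // wildcard patterns such as *.translate.goog.
            response.AddHeader("Access-Control-Allow-Origin", origin);
            response.AddHeader("Access-Control-Allow-Methods", "POST, OPTIONS");
            response.AddHeader("Access-Control-Allow-Headers",
                "Content-Type, Authorization");
            // Let the browser cache the preflight result for 24 hours.
            response.AddHeader("Access-Control-Max-Age", "86400");
            // Stop processing so the Coveo handler never sees the preflight.
            response.End();
        }
    }
}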

A few notes on the implementation:

  • Why origin.EndsWith(".translate.goog"): Google Translate generates subdomains dynamically per site. Matching the suffix covers all translated variants without hardcoding each one.
  • Why 204 and not 200: 204 No Content is the correct HTTP status for a successful OPTIONS preflight with no response body. Browsers accept both, but 204 is the right semantic choice.
  • Why pipeline processor: It handles both the status code and the dynamic Origin header correctly, and keeps the logic server-side alongside the Coveo proxy handler it is protecting.

Summary

Four issues, four fixes:

  1. Analytics bypassing proxy - Add baseUrl override via Object.defineProperty inside onLoad
  2. InvalidToken from duplicate init - Two-flag initialization guard (_coveoScriptInjected + _coveoInitialized) on a shared window namespace
  3. Bot traffic causing token errors - Block bots from analytics proxy only (not search), and verify your CoveoAnalyticsKey Sitecore setting has a static xx... key
  4. Google Translate CORS preflight - Sitecore httpRequestBegin pipeline processor to handle OPTIONS requests from *.translate.goog with proper CORS headers

Hope this helps if you are dealing with similar Coveo analytics issues in Sitecore. If you have questions, leave a comment below.

30 November, 2025

Sitecore XM to XM Cloud Migration to SitecoreAI Pathway

I have been writing about the Sitecore XM to XM Cloud Migration Tool this year. At the recent Sitecore Symposium 2025, the Sitecore team introduced SitecoreAI Pathway, a new solution for migrating to XM Cloud from legacy systems. Here, "legacy system" means any content system, not just Sitecore XP.

About the XM to XM Cloud Migration Tool: This was a technical utility focused on migrating existing Sitecore XP/XM content to XM Cloud. It was essentially a "Lift & Shift" tool for Sitecore customers only. It required developers to set up the tool, select the content, and migrate it to the XM Cloud environment.

The XM to XM Cloud Migration Tool has several drawbacks:

  • It only moves data, not the website
  • It perpetuates monolithic content models
  • No support for component-first restructuring
  • It skips content modeling cleanup

With SitecoreAI Pathway, Sitecore promises that its AI can analyze content, whether it lives on Sitecore XP or another CMS such as WordPress, transform the content model, and import it into the target XM Cloud environment. That is a promising feature for speeding up the migration of legacy systems to XM Cloud.

Sitecore has moved from a 'Lift & Shift' utility to a universal 'SaaS Extraction' service that it says cuts migration time by 70% and that is included for free in your XM Cloud (SitecoreAI) subscription.

SitecoreAI Pathway: Key Strategy Shifts

  • Drastic Timeline Reduction (70% Faster): Historically, full migrations took 15 months (6 months for front-end, 9 for back-end). Sitecore says the new AI-driven approach cuts this entire timeline by 70%, which would bring a 15-month project down to roughly 4.5 months.
  • Front-End Acceleration (6 Months → 2 Months): The bottleneck of recreating the "look and feel" is solved by partner innovations and AI tools, reducing the design and implementation phase from half a year to just 8 weeks.
  • Generative AI for "Back-End" Heavy Lifting: Instead of manual database scripts, SitecoreAI Pathway uses Generative AI to automatically map, transform, and migrate thousands of pages. It handles the complexity of data mapping and workflow setup with high precision.
  • Universal Compatibility (Not Just for Sitecore): The tool is platform-agnostic. While it supports XP/XM migrations immediately, it effectively extracts content from competitors like Adobe, Optimizely, and Contentful, making it a universal "on-ramp" to Sitecore.
  • Simplified Access (Sitecore 360): There is no separate license fee for this capability. The tool is included directly in the Sitecore 360 subscription, removing procurement barriers and de-risking the decision to migrate.

Based on the Symposium announcements, I expect the tool to follow this data migration pattern:

  • Process the Site: The AI crawls your live public URL to analyze page structure and assets.
  • Generate the Blueprint: It automatically groups similar content patterns.
  • Map Content Types: You visually link legacy page elements (like a "Title" header) to specific new XM Cloud components.
  • Migrate: The AI extracts, cleans, and restructures the content, effectively "re-platforming" it into the new headless architecture automatically.

Since SitecoreAI Pathway is not yet in GA, I plan to explore the tool once it is available to Sitecore partners. I will share more in upcoming posts.
