Online gaming platform and game development system Roblox has announced the release and open source availability of Cube 3D, an AI model designed to generate 3D objects and environments from text prompts.
Cube 3D will serve as the basis for many AI tools that Roblox will develop in the future, including advanced scene generation tools. Over time, it will evolve into a multimodal model that accepts text, images, videos and other input formats, and that integrates with Roblox’s existing AI creation tools. The model generates 3D models and environments directly from text descriptions and, in the future, from images.
Designing fully functional structures is essential to creating a truly immersive 3D world. Examples include the garage you drive your car into, a place to sit, or the podium in the victory lane. To achieve this, Roblox took inspiration from language models that are trained on text tokens to predict the next token and so build up a sentence. Its innovation builds on the same principle. Roblox has developed the ability to tokenize 3D objects, treating shapes as tokens, and trains Cube 3D to predict the next shape token until a complete 3D object is constructed. When extended to full scene generation, Cube 3D predicts the layout and then recursively predicts the shapes that complete it. Users can fine-tune the model, build plugins for it, or train Cube 3D on their own data to meet their specific needs.
Roblox innovates object creation with 3D tokenization
The main technical challenge was linking text and images with 3D shapes. A key innovation is 3D tokenization, which allows the platform to represent 3D objects as tokens, similar to how text is represented as tokens. This allows Roblox to predict the next shape in the same way that a language model predicts the next word in a sentence.
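To make the idea of shape tokens concrete, here is a minimal sketch in Python. It is not Cube 3D's tokenizer, which is a learned model; it assumes a much simpler scheme in which points inside the unit cube are snapped to a coarse voxel grid and each occupied voxel becomes one integer token. The function names and the `resolution` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def tokenize_shape(points: List[Point], resolution: int = 4) -> List[int]:
    """Map points in the unit cube to a sorted list of voxel-index tokens.

    Toy stand-in for a learned 3D tokenizer (assumption, not Roblox's method).
    """
    tokens = set()
    for x, y, z in points:
        # Clamp so that a coordinate of exactly 1.0 still lands in the last cell.
        ix = min(int(x * resolution), resolution - 1)
        iy = min(int(y * resolution), resolution - 1)
        iz = min(int(z * resolution), resolution - 1)
        # Flatten the (ix, iy, iz) grid cell into a single integer id.
        tokens.add((ix * resolution + iy) * resolution + iz)
    return sorted(tokens)


def detokenize_shape(tokens: List[int], resolution: int = 4) -> List[Point]:
    """Recover the centre of each occupied voxel from its token id."""
    centres = []
    for t in tokens:
        ix, rem = divmod(t, resolution * resolution)
        iy, iz = divmod(rem, resolution)
        centres.append(
            ((ix + 0.5) / resolution, (iy + 0.5) / resolution, (iz + 0.5) / resolution)
        )
    return centres
```

Once a shape is a sequence of integers like this, it can be fed to the same kind of next-token model that is used for text, which is the analogy the article draws.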
To achieve 3D generation, Roblox has developed a unified architecture for autoregressive generation that covers the creation of a single object, the completion of shapes, and the design of multi-object or scene layouts. An autoregressive transformer is a neural network that uses the previous inputs to predict the next component. This architecture supports both scalability and multimodal compatibility, allowing the model to handle different types of input (text, visual, audio, 3D). Roblox is open-sourcing this model, and at this early stage creators can generate 3D objects from text prompts. In the future, the aim is for creators to generate entire scenes using multiple input types.
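The autoregressive loop described here can be sketched in a few lines of Python. This is an illustration under strong assumptions, not Roblox's implementation: a bigram count table stands in for the transformer, decoding is greedy, and the `BOS`/`EOS` markers and function names are hypothetical. What it does show is the control flow the article describes: each new shape token is predicted from what has been generated so far, and passing a non-empty prefix turns generation into shape completion.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

BOS, EOS = -1, -2  # hypothetical start/end-of-shape markers


def fit_bigram(sequences: List[List[int]]) -> Dict[int, Counter]:
    """Count which token follows which in the training shapes."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for seq in sequences:
        padded = [BOS] + seq + [EOS]
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts: Dict[int, Counter], prefix: Sequence[int] = (),
             max_len: int = 16) -> List[int]:
    """Greedily extend `prefix` one token at a time until EOS or max_len."""
    out = list(prefix)
    prev = out[-1] if out else BOS
    while len(out) < max_len:
        if prev not in counts:
            break  # context never seen in training; stop rather than guess
        nxt = counts[prev].most_common(1)[0][0]
        if nxt == EOS:
            break
        out.append(nxt)
        prev = nxt
    return out


# Tiny made-up training set of shape-token sequences.
model = fit_bigram([[0, 5, 9], [0, 5, 9], [0, 5, 7]])
full_shape = generate(model)                 # generate from scratch
completed = generate(model, prefix=[0, 5])   # complete a partial shape
```

A real model would condition on the whole prefix and on the text prompt rather than on the last token only, and would typically sample from the predicted distribution instead of always taking the most likely token.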
To train the generative pre-trained transformer (GPT) for shape creation, Roblox uses discrete 3D shape tokens that are aligned with the text prompt. Roblox expects this approach to enable fully playable 3D scenes in the future.
Roblox is an online gaming platform and game creation system that lets users design and develop their own games and play games created by other users. From simple games to complex virtual worlds, it offers a vast virtual environment where individuals can create and share interactive 3D experiences.