Re: [Dri-devel] Mach64 for ppc xf86-log etc
You are fast, I never would have thought you would answer this fast!

Leif Delgass wrote:
> It looks like there's a problem with the drm initialization. Could you
> send the kmsg output with debugging turned on:
>
> close X
> become root
> rmmod mach64
> modprobe mach64 drm_opts=debug
> cat /proc/kmsg > kmsg.txt
> start X from another console
>
> On Sat, 11 May 2002, Peter Andersson wrote:
>

I hope this will give you some information. At least I know it will be more informative than some of my other mails. When I ran the command "modprobe mach64 drm_opts=debug" I got the following error messages:

/lib/modules/2.4.18/kernel/drivers/char/drm/mach64.o: invalid parameter parm
/lib/modules/2.4.18/kernel/drivers/char/drm/mach64.o: insmod /lib/modules/2.4.18/kernel/drivers/char/mach64.o failed
/lib/modules/2.4.18/kernel/drivers/char/drm/mach64.o: insmod mach64 failed

The kmsg.txt output is included (as an attachment). The strange thing is that I can start X now for some reason; it didn't work a couple of hours ago. I still get the "Error: flushing vertex buffer: return = -16" when running glxgears.

Regards,
Peter

<7>[drm:mach64_ioctl] pid=680, cmd=0xc0246400, nr=0x00, dev 0xe200, auth=1
<7>[drm:mach64_flush] pid = 680, device = 0xe200, open_count = 1
<7>[drm:mach64_release] open_count = 1
<7>[drm:mach64_release] pid = 680, device = 0xe200, open_count = 1
<7>[drm:mach64_fasync] fd = -1, device = 0xe200
<7>[drm:mach64_takedown]
<7>[drm:mach64_ioctl] pid=680, cmd=0xc0246400, nr=0x00, dev 0xe200, auth=1
<7>[drm:mach64_flush] pid = 680, device = 0xe200, open_count = 1
<7>[drm:mach64_release] open_count = 1
<7>[drm:mach64_release] pid = 680, device = 0xe200, open_count = 1
<7>[drm:mach64_fasync] fd = -1, device = 0xe200
<7>[drm:mach64_takedown]
<7>[drm:mach64_ioctl] pid=680, cmd=0xc0246400, nr=0x00, dev 0xe200, auth=1
<7>[drm:mach64_ioctl] pid=680, cmd=0xc0086401, nr=0x01, dev 0xe200, auth=1
<7>[drm:mach64_ioctl] pid=680, cmd=0x80086410, nr=0x10, dev 0xe200, auth=1
<7>[drm:mach64_mmap] start = 0x30019000, end = 0x3001b000, offset = 0xd591b000
<7>[drm:mach64_vm_open] 0x30019000,0x2000
<7>[drm:mach64_vm_shm_nopage] shm_nopage 0x3001a000
<7>[drm:mach64_ioctl] pid=680, cmd=0xc0086426, nr=0x26, dev 0xe200, auth=1
<7>[drm:mach64_ioctl] pid=680, cmd=0xc0246400, nr=0x00, dev 0xe200, auth=1
<7>[drm:mach64_ioctl] pid=680, cmd=0x20006430, nr=0x30, dev 0xe200, auth=1
<7>[drm:mach64_ioctl] pid=680, cmd=0xc0106403, nr=0x03, dev 0xe200, auth=1
<7>[drm:mach64_irq_busid] 0:16:0 => IRQ 48
<7>[drm:mach64_ioctl] pid=680, cmd=0x80086422, nr=0x22, dev 0xe200, auth=1
<7>[drm:mach64_fasync] fd = 7, device = 0xe200
<7>[drm:mach64_ioctl] pid=680, cmd=0x80086414, nr=0x14, dev 0xe200, auth=1
<7>[drm:mach64_irq_install] mach64_irq_install: irq=48
<7>[drm:mach64_irq_install] Before PREINSTALL: CRTC_INT_CNTL = 0x8874
<7>[drm:mach64_irq_install] After PREINSTALL: CRTC_INT_CNTL = 0x8874
<7>[drm:mach64_irq_install] Before POSTINSTALL: CRTC_INT_CNTL = 0x0840
<7>[drm:mach64_irq_install] After POSTINSTALL: CRTC_INT_CNTL = 0x0942
<7>[drm:mach64_ioctl] pid=680, cmd=0xc00c6419, nr=0x19, dev 0xe200, auth=1
<7>[drm:mach64_mapbufs] 128 buffers, retcode = 0
<7>[drm:mach64_vm_open] 0x30019000,0x2000
<7>[drm:mach64_vm_open] 0x3044a000,0x0020
<7>[drm:mach64_flush] pid = 682, device = 0xe200, open_count = 1
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000a
<7>[drm:mach64_lock] 1 has lock
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000a
<7>[drm:mach64_lock] 1 has lock
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000a
<7>[drm:mach64_lock] 1 has lock
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000a
<7>[drm:mach64_lock] 1 has lock
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000a
<7>[drm:mach64_lock] 1 has lock
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000a
<7>[drm:mach64_lock] 1 has lock
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000a
<7>[drm:mach64_lock] 1 has lock
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000a
<7>[drm:mach64_lock] 1 has lock
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000a
<7>[drm:mach64_lock] 1 has lock
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000
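The modprobe failure above ("invalid parameter parm") usually means the mach64.o that insmod found does not declare a drm_opts option at all (or was built from sources that name it differently). For reference, a minimal sketch of how a 2.4-era module would have to declare such a string parameter; the names below mirror the DRM convention but are illustrative, not the actual mach64 source:

/* Minimal sketch, not the actual mach64 sources: how a 2.4-era module
 * declares the string option that "modprobe mach64 drm_opts=debug" expects.
 * If the installed mach64.o was built without such a declaration, insmod
 * rejects the option with an "invalid parameter" error like the one above. */
#include <linux/module.h>
#include <linux/init.h>

static char *drm_opts = NULL;                /* e.g. "debug" */
MODULE_PARM( drm_opts, "s" );                /* "s" = string parameter (2.4 API) */
MODULE_PARM_DESC( drm_opts, "DRM debug options" );

static int __init mach64_init( void )
{
        /* a real driver would hand drm_opts to the DRM option parser here */
        return 0;
}

static void __exit mach64_exit( void )
{
}

module_init( mach64_init );
module_exit( mach64_exit );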
Re: [Dri-devel] MACH64_BM_GUI_TABLE(_CMD)?
On 2002.05.05 19:41 Frank C. Earl wrote:
> ...
>
> > I plan to build a test case for this, but I would like to hear
> preliminary
> > opinions about this, in case I'm missing something. Frank, have you
> tested
> > this before?
>
> Yes, pretty extensively, but I didn't have time to set up tests for
> spanning multiple pages - we ought to do that one last one before
> committing to the path we're now looking at.
Ok. I've made a test to see if this is possible, and it failed. It would be
best if Leif and Frank made a quick review of the test (attached) to see if
there is any mistake I made, before we lay this issue to rest.
Basically I changed mach64_bm_dma_test to allocate a second descriptor table
and two more data buffers. The first buffer attempts to override
MACH64_BM_GUI_TABLE so that the engine reads the second table (which points
to a third buffer that fills the vertex registers with different values).
The second buffer is the continuation of the first and has the regular cleanup.
Now I plan to reproduce the hang that we had when trying to draw a
multitextured triangle without the texture offset specified, to see whether
the engine can recover from the lockup or not. Frank, on IRC I got the
impression that you were going to try this. Did you?
José Fonseca
static int mach64_bm_dma_test( drm_device_t *dev )
{
	drm_mach64_private_t *dev_priv = dev->dev_private;
	dma_addr_t data_handle, data2_handle, data3_handle, table2_handle;
	void *cpu_addr_data, *cpu_addr_data2, *cpu_addr_data3, *cpu_addr_table2;
	u32 data_addr, data2_addr, data3_addr, table2_addr;
	u32 *table, *data, *table2, *data2, *data3;
	u32 regs[3], expected[3];
	int i;

	DRM_DEBUG( "%s\n", __FUNCTION__ );

	table = (u32 *) dev_priv->cpu_addr_table;

	/* FIXME: get a dma buffer from the freelist here rather than using the pool */
	DRM_DEBUG( "Allocating data memory ...\n" );
	cpu_addr_data = pci_pool_alloc( dev_priv->pool, SLAB_ATOMIC, &data_handle );
	cpu_addr_data2 = pci_pool_alloc( dev_priv->pool, SLAB_ATOMIC, &data2_handle );
	cpu_addr_data3 = pci_pool_alloc( dev_priv->pool, SLAB_ATOMIC, &data3_handle );
	cpu_addr_table2 = pci_pool_alloc( dev_priv->pool, SLAB_ATOMIC, &table2_handle );
	if (!cpu_addr_data || !data_handle || !cpu_addr_data2 || !data2_handle ||
	    !cpu_addr_data3 || !data3_handle || !cpu_addr_table2 || !table2_handle) {
		DRM_INFO( "data-memory allocation failed!\n" );
		return -ENOMEM;
	} else {
		data = (u32 *) cpu_addr_data;
		data_addr = (u32) data_handle;
		data2 = (u32 *) cpu_addr_data2;
		data2_addr = (u32) data2_handle;
		data3 = (u32 *) cpu_addr_data3;
		data3_addr = (u32) data3_handle;
		table2 = (u32 *) cpu_addr_table2;
		table2_addr = (u32) table2_handle;
	}

	/* clear SRC_CNTL and the vertex registers the test reads back */
	MACH64_WRITE( MACH64_SRC_CNTL, 0x00000000 );
	MACH64_WRITE( MACH64_VERTEX_1_S, 0x00000000 );
	MACH64_WRITE( MACH64_VERTEX_1_T, 0x00000000 );
	MACH64_WRITE( MACH64_VERTEX_1_W, 0x00000000 );

	for (i = 0; i < 3; i++) {
		DRM_DEBUG( "(Before DMA Transfer) reg %d = 0x%08x\n", i,
			   MACH64_READ( (MACH64_VERTEX_1_S + i*4) ) );
	}

	/* 1_90 = VERTEX_1_S, setup 3 sequential reg writes */
	/* use only s,t,w vertex registers so we don't have to mask any results */
	data[0] = cpu_to_le32(0x00020190);
	data[1] = expected[0] = 0x11111111;	/* placeholder test pattern */
	data[2] = expected[1] = 0x22222222;	/* placeholder test pattern */
	data[3] = expected[2] = 0x33333333;	/* placeholder test pattern */
	/* try to redirect the engine to the second descriptor table */
	data[4] = cpu_to_le32(MACH64_BM_GUI_TABLE_CMD);
	data[5] = cpu_to_le32(table2_addr | MACH64_CIRCULAR_BUF_SIZE_16KB);
	data[6] = cpu_to_le32(MACH64_DST_HEIGHT_WIDTH);
	data[7] = cpu_to_le32(0);

	/* second buffer: the regular cleanup */
	data2[8] = cpu_to_le32(0x006d);		/* SRC_CNTL */
	data2[9] = 0x00000000;			/* disable bus mastering */

	/* third buffer (reached via the second table): fill the vertex
	 * registers with different values */
	data3[0] = cpu_to_le32(0x00020190);
	data3[1] = 0x44444444;			/* placeholder test pattern */
	data3[2] = 0x55555555;			/* placeholder test pattern */
	data3[3] = 0x66666666;			/* placeholder test pattern */
	data3[4] = cpu_to_le32(0x006d);		/* SRC_CNTL */
	data3[5] = 0x00000000;

	DRM_DEBUG( "Preparing table ...\n" );
	table[0] = cpu_to_le32(MACH64_BM_ADDR + APERTURE_OFFSET);
	table[1] = cpu_to_le32(data_addr);
	table[2] = cpu_to_le32(8 * sizeof( u32 ) | 0x40000000);
	table[3] = 0;
	table[4] = cpu_to_le32(MACH64_BM_ADDR + APERTURE_OFFSET);
	table[5] = cpu_to_le32(data2_addr);
	table[6] = cpu_to_le32(2 * sizeof( u32 ) | 0x80000000 | 0x40000000);
	table[7] = 0;

	table2[0] = cpu_to_le32(MACH64_BM_ADDR + APERTURE_OFFSET);
	table2[1] = cpu_to_le32(data3_addr);
	table2[2] = cpu_to_le32(6 * sizeof( u32 ) | 0x80000000 | 0x40000000);
	table2[3] = 0;

	DRM_DEBUG( "table[0] = 0x%08x\n", table[0] );
	DRM_DEBUG( "table[1] = 0x%08x\n", table[1] );
[Dri-devel] Client context uploads: How to implement them? / Analysis of the Gamma driver
As it becomes clearer that for mach64 the best solution is to fill DMA buffers with the context state and the vertex buffers, I've been trying to understand how this can be done and how the Gamma driver (which has this same model) does it.

The context state is available right at the beginning of running a pipeline, and usually DDUpdateHWState is called at the beginning of RunPipeline. The problem is that although all the state information is available, we don't know which part should be uploaded, since other clients could have dirtied the hardware registers in the meantime.

I don't fully understand how the Gamma driver overcomes this. Its behavior here is controlled by a macro definition named DO_VALIDATE, which enables a series of VALIDATE_* macros whose purpose I couldn't work out. Another thing that caught my attention was the "HACK" comment on gammaDDUpdateHWState before gammaEmitHwState - it resembles a similar comment in mach64, which makes one think that the author had a better way to do it in mind. Alan, could you shed some light on these two issues please?

Before I started this little research I had already given some thought to how I would do it. One idea that crossed my mind was to reserve some space in the DMA buffer to put the context state in before submitting the buffer. Of course there would be some DMA buffer waste, but it wouldn't be that much, since there is a fairly low number of context registers. One thing that holds me back is that I still don't understand how multiple clients stay out of each other's way: what is done in parallel and what is done serially... I would also appreciate any ideas regarding this. This is surely an issue I would like to discuss further at the next meeting.

Regards,

José Fonseca
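For what it's worth, a minimal sketch of the idea above: reserve the head of a DMA buffer for the dirty context registers and append the vertex data behind them. The struct, the dirty bitmask and the emit helpers are illustrative assumptions, not mach64 code; kernel-style u32 types are assumed.

/* Sketch only: reserve the head of a DMA buffer for the dirty context
 * registers, then let the vertex data follow.  Buffer layout, field names
 * and the emit helpers are hypothetical. */
#include <linux/types.h>

#define MAX_CTX_REGS 32

struct dma_buf_sketch {
	u32 *ptr;		/* CPU pointer into the buffer */
	int  used;		/* dwords already written */
	int  size;		/* total dwords available */
};

static inline void emit_reg( struct dma_buf_sketch *buf, u32 header, u32 value )
{
	buf->ptr[buf->used++] = header;		/* register index/count header */
	buf->ptr[buf->used++] = value;
}

/* called once per buffer, before any vertices are copied in; 'dirty' is a
 * bitmask of the context registers that changed since the last emit */
static void emit_context( struct dma_buf_sketch *buf,
			  const u32 headers[MAX_CTX_REGS],
			  const u32 values[MAX_CTX_REGS],
			  unsigned int dirty )
{
	int i;
	for ( i = 0; i < MAX_CTX_REGS; i++ ) {
		if ( dirty & (1u << i) )	/* only upload what changed */
			emit_reg( buf, headers[i], values[i] );
	}
}

The waste José mentions is then bounded by MAX_CTX_REGS register/value pairs per buffer, which is small compared with a full vertex buffer.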
Re: [Dri-devel] Client context uploads: How to implement them? / Analysis of the Gamma driver
Jose,

I'd certainly forget using the gamma driver as any kind of template for any work.

There are many unimplemented features, and multiple clients just don't work.

Purely a lack of time thing.

Alan.

On Sun, May 12, 2002 at 05:27:26 +0100, José Fonseca wrote:
> As it becomes clearer that for mach64 the best solution is to fill DMA buffers with the context state and the vertex buffers, I've been trying to understand how this can be done and how the Gamma driver (which has this same model) does it.
>
> The context state is available right at the beginning of running a pipeline, and usually DDUpdateHWState is called at the beginning of RunPipeline. The problem is that although all the state information is available, we don't know which part should be uploaded, since other clients could have dirtied the hardware registers in the meantime.
>
> I don't fully understand how the Gamma driver overcomes this. Its behavior here is controlled by a macro definition named DO_VALIDATE, which enables a series of VALIDATE_* macros whose purpose I couldn't work out. Another thing that caught my attention was the "HACK" comment on gammaDDUpdateHWState before gammaEmitHwState - it resembles a similar comment in mach64, which makes one think that the author had a better way to do it in mind. Alan, could you shed some light on these two issues please?
>
> Before I started this little research I had already given some thought to how I would do it. One idea that crossed my mind was to reserve some space in the DMA buffer to put the context state in before submitting the buffer. Of course there would be some DMA buffer waste, but it wouldn't be that much, since there is a fairly low number of context registers. One thing that holds me back is that I still don't understand how multiple clients stay out of each other's way: what is done in parallel and what is done serially... I would also appreciate any ideas regarding this. This is surely an issue I would like to discuss further at the next meeting.
>
> Regards,
>
> José Fonseca
Re: [Dri-devel] Client context uploads: How to implement them? / Analysis of the Gamma driver
Alan,

On 2002.05.12 17:35 Alan Hourihane wrote:
> Jose,
>
> I'd certainly forget using the gamma driver as any kind of template for any work.
>
> There are many unimplemented features, and multiple clients just don't work.

Ok. I wasn't aware of this.

> Purely a lack of time thing.
>
> Alan.

Thanks,

José Fonseca
Re: [Dri-devel] Client context uploads: How to implement them? /Analysis of the Gamma driver
Jose,

I've been experimenting with this too, and was able to get things going with state being emitted either from the client or the drm, though I'm still having lockups and things are generally a bit buggy and unstable still. To try client-side context emits, I basically went back to having each primitive emit state into the vertex buffer before adding the vertex data, like the original hack with MMIO. This works, but may be emitting state when it's not necessary.

Now I'm trying state emits in the drm, and to do that I'm just grabbing a buffer from the freelist and adding it to the queue before the vertex buffer, so things are in the correct order in the queue. The downside of this is that buffer space is wasted, since the state emit uses a small portion of a buffer, but putting state in a separate buffer from vertex data allows the proper ordering in the queue. Perhaps we could use a private set of smaller buffers for this. At any rate, I've done the same for clears and swaps, so I have asynchronous DMA (minus blits) working with gears at least. I'm still getting lockups with anything more complicated and there are still some state problems. The good news is that I'm finally seeing an increase in frame rate, so there's light at the end of the tunnel.

Right now I'm using 1MB (half the buffers) as the high water mark, so there should always be plenty of available buffers for the drm. To get this working, I've used buffer aging rather than interrupts. What I realized with interrupts is that there doesn't appear to be an interrupt that can poll fast enough to keep up, since a VBLANK is tied to the vertical refresh -- which is relatively infrequent. I'm thinking that it might be best to start out without interrupts and to use GUI masters for blits and then investigate using interrupts, at least for blits. Anyway, I have an implementation of the freelist and other queues that's functional, though it might require some locks here and there. I'll try to stabilize things more and send a patch for you to look at.

I've also played around some more with AGP textures. I have hacked up the performance boxes client-side with clear ioctls, and this helps to see what's going on. I'll try to clean that up so I can commit it. I've found some problems with the global LRU and texture aging that I'm trying to fix as well. I'll post a more detailed summary of that soon.

BTW, as to your question about multiple clients and state: I think this is handled when acquiring the lock. If the context stamp in the SAREA doesn't match the current context after getting the lock, everything is marked as dirty to force the current context to emit all its state. Emitting state to the SAREA is always done while holding the lock.

Regards,
Leif

On Sun, 12 May 2002, José Fonseca wrote:
> As it becomes clearer that for mach64 the best solution is to fill DMA buffers with the context state and the vertex buffers, I've been trying to understand how this can be done and how the Gamma driver (which has this same model) does it.
>
> The context state is available right at the beginning of running a pipeline, and usually DDUpdateHWState is called at the beginning of RunPipeline. The problem is that although all the state information is available, we don't know which part should be uploaded, since other clients could have dirtied the hardware registers in the meantime.
>
> I don't fully understand how the Gamma driver overcomes this. Its behavior here is controlled by a macro definition named DO_VALIDATE, which enables a series of VALIDATE_* macros whose purpose I couldn't work out. Another thing that caught my attention was the "HACK" comment on gammaDDUpdateHWState before gammaEmitHwState - it resembles a similar comment in mach64, which makes one think that the author had a better way to do it in mind. Alan, could you shed some light on these two issues please?
>
> Before I started this little research I had already given some thought to how I would do it. One idea that crossed my mind was to reserve some space in the DMA buffer to put the context state in before submitting the buffer. Of course there would be some DMA buffer waste, but it wouldn't be that much, since there is a fairly low number of context registers. One thing that holds me back is that I still don't understand how multiple clients stay out of each other's way: what is done in parallel and what is done serially... I would also appreciate any ideas regarding this. This is surely an issue I would like to discuss further at the next meeting.
>
> Regards,
>
> José Fonseca
Re: [Dri-devel] Client context uploads: How to implement them? / Analysis of the Gamma driver
Darryl,

On 2002.05.12 19:11 Daryll Strauss wrote:
> On Sun, May 12, 2002 at 05:27:26PM +0100, José Fonseca wrote:
> > I would also appreciate any ideas regarding this. This is surely an issue I would like to discuss further at the next meeting.
>
> You're right, there's no automatic way to know what state has become dirty. You need to keep some flags that tell you what state has changed when you change clients. Since it is work to keep these flags up to date, you have to decide what granularity to keep.
>
> Any time you don't immediately get the hardware lock you have to check your flags to see what changed. In the tdfx driver I kept 3 flags. One was that the fifo has changed. That basically meant some other client (X server, or 3D app) had written data to the card. I had to resynchronize the fifo in that case. The second said that the 3D state was dirty. That would only occur when a second 3D app ran (the X server never touched the same state) and required that I reset all the 3D parameters. Finally there was a texture dirty flag which meant that I had to reload the textures on the card.
>
> The rationale for that breakdown was that X server context switches would be common. It has to do input handling, for example. So I wanted a cheap way to say that the X server had done stuff, but only the fifo changed. Next I argued that texture swaps were really expensive. So, if two 3D apps were running, but not using textures, it would be nice to avoid paging what could be multiple megabytes of textures. Finally that meant 3D state was everything else. It wasn't that much data to force to a known state, so it wasn't worth breaking that into smaller chunks.
>
> The three flags were stored in the SAREA, and the first time a client changed each of the areas it would store its number into the appropriate flag of the SAREA.

I've been snooping in the tdfx driver source. You're referring to the TDFXSAREAPriv.*Owner flags in tdfx_context.h and the tdfxGetLock function in tdfx_lock.c, in tdfx's Mesa driver.

So let's see if I got this straight (these notes are not specific to 3dfx):

- The SAREA writes are made with the lock held (as Leif pointed out), and the SAREA reflects the state left by the last client that grabbed it.
- All those "dirty"/"UPDATE" flags are only meaningful within a client. Whenever another client got in the way, the word is: upload whatever changed - "what" in particular depends on the granularity gauge you mentioned.
- For this to work with DMA too, the buffers must be sent in exactly the same order they are received by the DRM.
- The DRM knows nothing about this: it's up to the client to make sure that the information in the SAREA is up to date. (The 3dfx is an exception, since there is no DMA, so the state is sent to the card without DRM intervention, via the Glide library.)

> Just a small expansion on this. The texture management solution is weak. If two clients each had a small texture, it would be quite possible that they both would have fit in texture memory and no texture swapping would be required. Doing that would have required more advanced texture management that realized certain regions were in use by one client or another. We still don't have that yet. In a grand scheme regions of texture memory would be shared between 2D and multiple 3D clients.

This would mean that the texture management would have to be done by the DRM or, perhaps even better, X. Surely something to look at in the future.

Thanks for your reply. It was very informative.

José Fonseca
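A rough sketch of the pattern described above: owner stamps kept in the SAREA are compared after grabbing the hardware lock, and anything another client touched is flagged dirty so it gets re-emitted. The struct, the flag bits and the function are illustrative only (the real tdfx fields are the TDFXSAREAPriv.*Owner members mentioned above).

/* Sketch only: hypothetical SAREA owner stamps and the check made right
 * after grabbing the hardware lock. */
#define DIRTY_FIFO	0x1
#define DIRTY_CONTEXT	0x2
#define DIRTY_TEXTURES	0x4

typedef struct sketch_sarea {
	int fifo_owner;		/* last context that touched the FIFO */
	int ctx_owner;		/* last context that emitted 3D state */
	int tex_owner;		/* last context that uploaded textures */
} sketch_sarea_t;

/* 'me' is this client's context number; called with the lock already held */
static void validate_after_lock( sketch_sarea_t *sarea, int me,
				 unsigned int *dirty )
{
	if ( sarea->fifo_owner != me ) {	/* someone else used the FIFO */
		*dirty |= DIRTY_FIFO;
		sarea->fifo_owner = me;
	}
	if ( sarea->ctx_owner != me ) {		/* 3D state was clobbered */
		*dirty |= DIRTY_CONTEXT;	/* re-emit everything */
		sarea->ctx_owner = me;
	}
	if ( sarea->tex_owner != me ) {		/* textures may have been evicted */
		*dirty |= DIRTY_TEXTURES;
		sarea->tex_owner = me;
	}
}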
Re: [Dri-devel] Client context uploads: How to implement them? / Analysis of the Gamma driver
Leif,

On 2002.05.12 19:15 Leif Delgass wrote:
> Jose,
>
> I've been experimenting with this too, and was able to get things going with state being emitted either from the client or the drm, though I'm still having lockups and things are generally a bit buggy and unstable still. To try client-side context emits, I basically went back to having each primitive emit state into the vertex buffer before adding the vertex data, like the original hack with MMIO. This works, but may be emitting state when it's not necessary.

I don't see how that would happen: only the dirty context was updated before.

> Now I'm trying state emits in the drm, and

I think that doing the emits in the DRM gives us more flexibility than in the client.

> to do that I'm just grabbing a buffer from the freelist and adding it to the queue before the vertex buffer, so things are in the correct order in the queue. The downside of this is that buffer space is wasted, since the state emit uses a small portion of a buffer, but putting state in a separate buffer from vertex data allows the proper ordering in the queue.

Is it a requirement that the addresses stored in the descriptor tables must be aligned on some boundary? If not, we could use a single buffer to hold successive context emits, and the first entry of each descriptor table would point to a section of this buffer. This way there wouldn't be any waste of space, and a single buffer would suffice for a large number of DMA buffers.

> Perhaps we could use a private set of smaller buffers for this. At any rate, I've done the same for clears and swaps, so I have asynchronous DMA (minus blits) working with gears at least.

This is another way too. I don't know if we are limited to the kernel memory allocation granularity, so unless this is already done by the pci_* API we might need to split buffers into smaller sizes.

> I'm still getting lockups with anything more complicated and there are still some state problems. The good news is that I'm finally seeing an increase in frame rate, so there's light at the end of the tunnel.

My time is limited, and I can't spend more than 3 hrs per day on this, but I think that after the meeting tomorrow we should try to keep the CVS in sync, even if it's less stable - it's a development branch after all, and its stability is not as important as making progress.

> Right now I'm using 1MB (half the buffers) as the high water mark, so there should always be plenty of available buffers for the drm. To get this working, I've used buffer aging rather than interrupts.

Which register do you use to keep track of the buffers' age?

> What I realized with interrupts is that there doesn't appear to be an interrupt that can poll fast enough to keep up, since a VBLANK is tied to the vertical refresh -- which is relatively infrequent. I'm thinking that it might be best to start out without interrupts and to use GUI masters for blits and then investigate using interrupts, at least for blits.

That had crossed my mind before too. I think it may be a good idea.

> Anyway, I have an implementation of the freelist and other queues that's functional, though it might require some locks here and there. I'll try to stabilize things more and send a patch for you to look at.

Looking forward to that.

> I've also played around some more with AGP textures. I have hacked up the performance boxes client-side with clear ioctls, and this helps to see what's going on. I'll try to clean that up so I can commit it. I've found some problems with the global LRU and texture aging that I'm trying to fix as well. I'll post a more detailed summary of that soon.

What can I say? Great work, Leif! =)

> BTW, as to your question about multiple clients and state: I think this is handled when acquiring the lock. If the context stamp in the SAREA doesn't match the current context after getting the lock, everything is marked as dirty to force the current context to emit all its state. Emitting state to the SAREA is always done while holding the lock.

I hadn't realized that before. Thanks for the info.

Regards,

José Fonseca
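On the pci_* granularity question raised above: the 2.4 pci_pool API does let the DRM carve fixed-size, aligned sub-page buffers out of the pages it allocates, which is what a private set of small state buffers would need. A hedged sketch follows; the pool name, the sizes and the wrapper function are illustrative, not mach64 code.

/* Sketch only: a private pool of small, aligned DMA buffers for context
 * emits, using the 2.4 pci_pool API. */
#include <linux/pci.h>
#include <linux/slab.h>

static int state_pool_demo( struct pci_dev *pdev )
{
	struct pci_pool *state_pool;
	void *cpu_addr;
	dma_addr_t bus_addr;

	/* 1KB buffers, 1KB-aligned, so several state emits fit per page */
	state_pool = pci_pool_create( "mach64_state", pdev, 1024, 1024, 0,
				      GFP_KERNEL );
	if ( !state_pool )
		return -ENOMEM;

	cpu_addr = pci_pool_alloc( state_pool, SLAB_ATOMIC, &bus_addr );
	if ( cpu_addr ) {
		/* fill cpu_addr with register writes, point a descriptor
		 * table entry at bus_addr, free once the engine is done */
		pci_pool_free( state_pool, cpu_addr, bus_addr );
	}

	pci_pool_destroy( state_pool );
	return 0;
}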
[Dri-devel] Maya 4 on MGA G400
Hello,

I would like to know if anybody was able to run Maya 4 on a Matrox G400 using DRI; it seems that Maya only supports 3DLabs and nVidia cards. Any clue about running it on a G400?

Best regards,
- german

--
Send email with "SEND GPG KEY" as subject to receive my GnuPG public key.
Re: [Dri-devel] Client context uploads: How to implement them? / Analysis of the Gamma driver
"José Fonseca" wrote: > > Darryl, > > On 2002.05.12 19:11 Daryll Strauss wrote: > > On Sun, May 12, 2002 at 05:27:26PM +0100, José Fonseca wrote: > > > I would also appreciate any ideas regarding this. This is surely an > > issue > > > I would like to discuss further on the next meeting. > > > > You're right, there's no automatic way to know what state has become > > dirty. You need to keep some flags that tell you what state has > > changed when you change clients. Since it is work to keep these flags up > > to date, you have to decide what granularity to keep. > > > > Any time you don't immediately get the hardware lock you have to check > > your flags to see what changed. In the tdfx driver I kept 3 flags. One > > was that the fifo has changed. That basically meant some other client (X > > server, or 3D app) had written data to the card. I had to resyncronize > > the fifo in that case. The second said that the 3D state was dirty. That > > would only occur when a second 3D app ran (the X server never touched > > the same state) and required that I reset all the 3D parameters. Finally > > there was a texture dirty flag which meant that I had to reload the > > textures on the card. > > > > The rationale for that breakdown was that X server context switches > > would be common. It has to do input handling for example. So I wanted a > > cheap way to say that the X server had done stuff, but only the fifo > > changed. Next I argued that texture swaps were really expensive. So, if > > two 3D apps were running, but not using textures, it would be nice to > > avoid paging what could be multiple megabytes of textures. Finally that > > meant 3D state was everything else. It wasn't that much data to force to > > a known state, so it wasn't worth breaking that into smaller chunks. > > > > The three flags were stored in the SAREA, and the first time a client > > changed each of the areas it would store it's number into the > > appropriate flag of the SAREA. > > > > I've been snooping in the tdfx driver source. You're referring to the > TDFXSAREAPriv.*Owner flags in tdfx_context.h and the tdfxGetLock function > in tdfx_lock.c, in the tdfx's Mesa driver. > > So let's see if I got this straight (these notes are not specific to 3dfx): > > - The SAREA writes are made with the lock (as Leif pointed), and > reflects the state left by the last client that grabed it. > - All those "dirty"/"UPDATE" flags are only meaningfull within a client. > Whenever another client got into the way, the word is upload whatever > changed - "what" in particular depends of the granularity gauge you > mentioned. > - For this to work with DMA too the buffers must be sent exactly in the > same order they are received on the DRM. > - The DRM knows nothing about this: it's up to the client to make sure > that the information in SAREA is up to date. (The 3dfx is an exception > since there is no DMA so the state is sent to the card without DRM > intervention, via the Glide library). > > > Just a small expansion on this. The texture management solution is > > weak. If two clients each had a small texture, it would be quite > > possible that they both would have fit in texture memory and no texture > > swapping would be required. Doing that would have required more advanced > > texture management that realized certain regions were in use by one > > client or another. We still don't have that yet. In a grand scheme > > regions of texture memory would be shared between 2D and multiple 3D > > clients. 
We have this for other drivers - there's a linked list of regions in the sarea that let drivers know for each 'chunk' of texture memory whether it has been stolen by another client or not. Keith ___ Have big pipes? SourceForge.net is looking for download mirrors. We supply the hardware. You get the recognition. Email Us: [EMAIL PROTECTED] ___ Dri-devel mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/dri-devel
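For reference, the per-region record Keith refers to looks roughly like the texture-region structure in the DRM headers of that era, kept as an array in the driver's SAREA private together with a global age counter. The sketch below is paraphrased from memory rather than copied from drm.h, and the names are placeholders.

/* Roughly what a shared-area texture region record looks like (compare
 * with drm_tex_region_t in drm.h). */
typedef struct sketch_tex_region {
	unsigned char  next;		/* index of the next region in LRU order */
	unsigned char  prev;		/* index of the previous region */
	unsigned char  in_use;		/* claimed by some client? */
	unsigned char  padding;
	unsigned int   age;		/* bumped when the region is stolen */
} sketch_tex_region_t;

/* a driver's SAREA private typically embeds an array of these plus a
 * global age counter, e.g.: */
#define SKETCH_NR_TEX_REGIONS 64
struct sketch_sarea_tex {
	sketch_tex_region_t texList[SKETCH_NR_TEX_REGIONS + 1]; /* last entry is the list head */
	unsigned int        texAge;
};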
Re: [Dri-devel] Texture management (Was: Client context uploads...)
On Sun, 12 May 2002, Keith Whitwell wrote:
> > > Just a small expansion on this. The texture management solution is
> > > weak. If two clients each had a small texture, it would be quite
> > > possible that they both would have fit in texture memory and no texture
> > > swapping would be required. Doing that would have required more advanced
> > > texture management that realized certain regions were in use by one
> > > client or another. We still don't have that yet. In a grand scheme
> > > regions of texture memory would be shared between 2D and multiple 3D
> > > clients.
>
> We have this for other drivers - there's a linked list of regions in the sarea
> that let drivers know for each 'chunk' of texture memory whether it has been
> stolen by another client or not.
>
Good timing, I was just composing a message about this. Maybe you can
help me...
In working on AGP texturing for mach64, I'm starting from the Rage128
code, which seems to have some problems (though the texture aging problem
could affect other drivers). My understanding is that textures in the
global LRU are marked as "used" and aged so that placeholders can be
inserted in a context's local LRU when another context steals its texture
memory. The problem is that nowhere are these texture regions released by
the context using them. The global LRU is only reset when the heap is
full. So the heap has to fill up before placeholders begin to get swapped
out. I've seen this when running multiple contexts at once, or repeatedly
starting, stopping, and restarting a single app. This isn't a huge
problem with a single heap, but with an AGP heap it means that card memory
is effectively leaked. Once the card memory global LRU is nearly filled
in the sarea with regions marked as "used", newly started apps will start
out only using AGP mem (with the r128 algorithm). Only if the app uses
enough mem. to fill AGP will it start to swap out the placeholders from
the local LRU and use card memory.
One possible solution I'm playing with would be to use a context
identifier on texture regions in the global LRU rather than a boolean
"in_use" (similar to the ctxOwner identifier used for marking the last
owner of the sarea's state information). Then when a context swaps out or
destroys textures, it can free regions that it owns from the global LRU
and age them so that other contexts will swap out their corresponding
placeholders. The downside is an increased penalty for swapping textures.
Another problem: how do we reclaim "leaked" regions when an app doesn't
exit normally?
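A sketch of the change proposed above, as mentioned: the global list entries carry an owner identifier instead of a plain boolean, so a context (or the cleanup path for a dead client) can give back and age the regions it owns. All names here are illustrative, not r128 or mach64 code.

/* Sketch only: regions carry an owner id instead of a plain in_use flag. */
#define REGION_FREE 0			/* no owner */

struct owned_region {
	unsigned char next, prev;	/* LRU links, as in the global list */
	int           owner;		/* context id, or REGION_FREE */
	unsigned int  age;
};

/* called when a context swaps out/destroys its textures or is reaped */
static void release_owned_regions( struct owned_region *list, int nr,
				   int ctx, unsigned int *global_age )
{
	int i;
	for ( i = 0; i < nr; i++ ) {
		if ( list[i].owner == ctx ) {
			list[i].owner = REGION_FREE;
			list[i].age = ++(*global_age);	/* other contexts will
							 * drop their placeholders */
		}
	}
}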
I've also found what looks to me like a bug in the Rage128 driver in
UploadTexImages. The beginning of the function does this:
   /* Choose the heap appropriately */
   heap = t->heap = R128_CARD_HEAP;
   if ( !rmesa->r128Screen->IsPCI &&
        t->totalSize > rmesa->r128Screen->texSize[heap] ) {
      heap = t->heap = R128_AGP_HEAP;
   }

   /* Do we need to eject LRU texture objects? */
   if ( !t->memBlock ) {
      /* ...find a memBlock, swapping and/or changing heaps if necessary... */
   }

   /* ...update LRU and upload dirty images... */
The problem I see here is that setting t->heap before checking for an
existing memBlock could potentially lead to a situation where t->heap !=
t->memBlock->heap. So in my code I've deferred changing t->heap = heap to
inside the 'if' block where we know there is no memBlock. Again this
situation can only occur if there is an AGP heap. Is there a reason for
this behavior in the Rage128 code that I'm missing, or is this a bug?
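For clarity, this is roughly what the deferred assignment described above would look like, continuing the fragment quoted earlier; it restates the proposed fix rather than any committed r128 code:

   /* Choose the candidate heap, but don't touch t->heap yet */
   heap = R128_CARD_HEAP;
   if ( !rmesa->r128Screen->IsPCI &&
        t->totalSize > rmesa->r128Screen->texSize[heap] ) {
      heap = R128_AGP_HEAP;
   }

   /* Do we need to eject LRU texture objects? */
   if ( !t->memBlock ) {
      t->heap = heap;   /* safe: no block exists yet, so t->heap cannot
                         * disagree with t->memBlock->heap */
      /* ...find a memBlock in 'heap', swapping and/or changing heaps (and
         updating t->heap) if necessary... */
   }

   /* ...update LRU and upload dirty images... */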
--
Leif Delgass
http://www.retinalburn.net
Re: [Dri-devel] Client context uploads: How to implement them? /Analysis of the Gamma driver
On Sun, 12 May 2002, José Fonseca wrote:
> Leif,
>
> On 2002.05.12 19:15 Leif Delgass wrote:
> > Jose,
> >
> > I've been experimenting with this too, and was able to get things going with state being emitted either from the client or the drm, though I'm still having lockups and things are generally a bit buggy and unstable still. To try client-side context emits, I basically went back to having each primitive emit state into the vertex buffer before adding the vertex data, like the original hack with MMIO. This works, but may be emitting state when it's not necessary.
>
> I don't see how that would happen: only the dirty context was updated before.

It didn't really make sense to me as I was writing this, to tell the truth. :) I just had it in my head that this way was a hack. I guess it was just the client-side register programming that made it "evil" before. At any rate, as you say, I think doing this in the drm is probably better anyway.

> > Now I'm trying state emits in the drm, and
>
> I think that doing the emits in the DRM gives us more flexibility than in the client.
>
> > to do that I'm just grabbing a buffer from the freelist and adding it to the queue before the vertex buffer, so things are in the correct order in the queue. The downside of this is that buffer space is wasted, since the state emit uses a small portion of a buffer, but putting state in a separate buffer from vertex data allows the proper ordering in the queue.
>
> Is it a requirement that the addresses stored in the descriptor tables must be aligned on some boundary? If not, we could use a single buffer to hold successive context emits, and the first entry of each descriptor table would point to a section of this buffer. This way there wouldn't be any waste of space, and a single buffer would suffice for a large number of DMA buffers.

I think the data tables need to be aligned on a 4K boundary, since that's the maximum size, but I'm not positive. I know for sure that the descriptor table has to be aligned to its size.

> > Perhaps we could use a private set of smaller buffers for this. At any rate, I've done the same for clears and swaps, so I have asynchronous DMA (minus blits) working with gears at least.
>
> This is another way too. I don't know if we are limited to the kernel memory allocation granularity, so unless this is already done by the pci_* API we might need to split buffers into smaller sizes.

The pci_pool interface is intended for these sorts of small buffers, I think. We just tell it to give us 4K buffers and allocate as many as we need with pci_pool_alloc. That would give us buffers one quarter the size of a full vertex buffer and still satisfy alignment constraints. This would also be more secure, since these buffers would be private to the drm. We could use these to terminate each DMA pass as well. That's one thing that needs more investigation: what registers need to be reset at the end of a DMA pass? Right now I'm only writing src_cntl to disable the bus mastering bit. Bus_cntl isn't fifo-ed, so it doesn't make sense to me to set it, even though the utah driver did. The only drawback to using private buffers is that it complicates the freelist.

> > I'm still getting lockups with anything more complicated and there are still some state problems. The good news is that I'm finally seeing an increase in frame rate, so there's light at the end of the tunnel.
>
> My time is limited, and I can't spend more than 3 hrs per day on this, but I think that after the meeting tomorrow we should try to keep the CVS in sync, even if it's less stable - it's a development branch after all, and its stability is not as important as making progress.

OK, I'll try to check in more often. I've been trying a lot of different things, so I just need to clean things up a bit to minimize the cruft. I don't want to check in failed experiments. ;) For a while the branch is likely to cause frequent lockups. I'm trying to at least get pseudo-DMA stable again.

> > Right now I'm using 1MB (half the buffers) as the high water mark, so there should always be plenty of available buffers for the drm. To get this working, I've used buffer aging rather than interrupts.
>
> Which register do you use to keep track of the buffers' age?

I'm using the PAT_REG[0,1] registers since they aren't needed for 3D. As long as we make sure that DMA is idle and the register contents are saved/restored when switching contexts between 2D/3D, I think this should work. The DDX only uses them for mono pattern fills in the XAA routine, and it saves and restores them, so we need to do the same. I've done that in the Enter/LeaveServer in atidri.c. We should probably also modify the DDX's Sync routine for XAA to use the drm idle ioctl. I think we'll need to make sure that the DMA queue is flushed before checking fo
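A sketch of the buffer-aging scheme being discussed: every queued buffer ends with a write of an incrementing counter into a FIFO'd scratch register (the PAT_REG0/1 mentioned above), and a buffer is considered retired once the value read back from that register has reached its stamp. The structure, the header constant and the helpers below are illustrative assumptions, not the mach64 implementation.

/* Sketch only: buffer aging with a FIFO'd scratch register (e.g. PAT_REG0). */
#include <linux/types.h>

#define SCRATCH_REG_HEADER 0		/* placeholder for the real PAT_REG DMA header */

struct aged_buf {
	u32 age;			/* stamp given when the buffer was queued */
	int pending;
};

static u32 current_age;			/* monotonically increasing submit counter */

/* append a register write of the new age to the buffer's command stream so
 * the engine stores it in the scratch register after processing the buffer */
static void stamp_buffer( struct aged_buf *buf, u32 *tail, int *used )
{
	buf->age = ++current_age;
	buf->pending = 1;
	tail[(*used)++] = SCRATCH_REG_HEADER;
	tail[(*used)++] = buf->age;
}

/* a buffer may return to the freelist once the value read back from the
 * scratch register (hw_age) has reached the buffer's stamp */
static int buffer_done( struct aged_buf *buf, u32 hw_age )
{
	if ( buf->pending && hw_age >= buf->age )
		buf->pending = 0;
	return !buf->pending;
}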
Re: [Dri-devel] Client context uploads: How to implement them? / Analysis of the Gamma driver
On Sunday 12 May 2002 01:15 pm, Leif Delgass wrote:
> this working, I've used buffer aging rather than interrupts. What I realized with interrupts is that there doesn't appear to be an interrupt that can poll fast enough to keep up, since a VBLANK is tied to the vertical refresh -- which is relatively infrequent.

Depends on what you're trying to do with it. If you're polling for completion of a pass for a given caller, it may be problematic. Should we be doing that with this chip, though?

I had envisioned a scheme in which the clients didn't drive DMA; they simply submitted things to be rendered out by the chip's DMA, and the DRM managed the details of keeping track of completion, etc. The first pass of the code I was working towards was going to rely on a lack of free buffers to throttle the clients accordingly. If that didn't work as I had hoped, I was looking to use async notifications to tell clients the request had been processed.

In that model, the only things needing to deal with locks are the DRM engine code submitting the DMAable buffers or blits (run by a separate group of code...) and any clients/code directly manipulating the chip. All DMA engine clients do is ask for buffers, fill them, and then queue them up for submission to the chip's gui-master engine. The interrupt handler takes care of the rest. In that picture, you're not submitting one group of buffers for one client to the chip, you're submitting as many buffers as you think you can get away with at 60+ Hz (something like 1-2 MB, from prior experience with the Utah-GLX code...) from the queue submitted by all clients. The DRM lock scheme would give enough flexibility to allow the X server to squeeze in against the DRM DMA handler and vice-versa so that screen updates and other stuff could be managed.

Unfortunately, my lack of available time precluded me from coding more than the base framework for that scheme.

--
Frank Earl
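A minimal sketch of the submission model Frank outlines: clients only take, fill and queue buffers, the DRM batches the queued list to the GUI-master, and an empty freelist is what throttles the clients. All types and names below are illustrative, built on the stock kernel list API.

/* Sketch only: clients just take, fill and queue buffers; the DRM batches
 * the queued list to the chip, and an empty freelist throttles clients. */
#include <linux/list.h>

struct sketch_buf {
	struct list_head list;
	void *data;
};

struct sketch_queues {
	struct list_head free;		/* buffers available to clients */
	struct list_head queued;	/* filled buffers awaiting the chip */
};

/* client path: NULL means "nothing free right now, wait and retry"
 * (this is the throttle described above) */
static struct sketch_buf *client_get_buffer( struct sketch_queues *q )
{
	if ( list_empty( &q->free ) )
		return NULL;
	return list_entry( q->free.next, struct sketch_buf, list );
}

/* client path: hand a filled buffer to the DRM in submission order */
static void client_queue_buffer( struct sketch_queues *q, struct sketch_buf *buf )
{
	list_del( &buf->list );
	list_add_tail( &buf->list, &q->queued );
}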
[Dri-devel] radeon CPU usage
Hi,

I don't know if it's 'normal', but Quake3 seems to use about 50% CPU time (framesync on) whilst running, with another 50% spent in the kernel. Any ideas why?
[Dri-devel] Radeon 7500 + AMD761 Locks machine
Hello,

I posted this last month on the dri-users list in hopes that I was doing something wrong and would find out how to fix it. I eventually did fix it by installing a Matrox G450, and everything works. This pretty well rules out any configuration issues and narrows it down to: radeon 7500 + amd761 + dri = not good :-(

I did a lot of googling for references to this and found a reference to a fix in the -ac kernel branch for lockups caused by this combination, so I patched my 2.4.18 kernel with the ac patches, with no luck.

What happens is that when dri is enabled in XF86Config, as soon as X starts the machine goes dead: monitor, keyboard, even the NIC, all gone. A posthumous examination of /var/log/XFree86.0.log shows that everything is loaded successfully; other than the driver references it is the same as the log generated with the Matrox card. X is happy, it just doesn't seem to know it's dead.

Am I the only one with this problem? My motherboard is an ASUS A7M266 with an Athlon XP 1600+, with a recent upgrade to the BIOS (January). This combo worked with an Nvidia TNT2, and now Matrox. I would like to submit a bug report but don't know how to generate any more useful information than I have given here.

Thanks,
-- Greg T Hill

---
Today is Pungenday, the 60th day of Discord in the YOLD 3168
