Re: [Dri-devel] Mach64 for ppc xf86-log etc

2002-05-12 Thread Peter Andersson

You are fast, I never would have thought you would answer this fast!
Leif Delgass wrote:

>It looks like there's a problem with the drm initialization.  Could you 
>send the kmsg output with debugging turned on:
>
>close X
>become root
>rmmod mach64
>modprobe mach64 drm_opts=debug
>cat /proc/kmsg > kmsg.txt
>start X from another console
>
>On Sat, 11 May 2002, Peter Andersson wrote:
>
I hope this will give you some information. At least I know that this 
will be more informative than some of my other mails.
When I ran the command "modprobe mach64 drm_opts=debug" I got the 
following error messages:

/lib/modules/2.4.18/kernel/drivers/char/drm/mach64.o: invalid parameter parm
/lib/modules/2.4.18/kernel/drivers/char/drm/mach64.o: insmod /lib/modules/2.4.18/kernel/drivers/char/mach64.o failed
/lib/modules/2.4.18/kernel/drivers/char/drm/mach64.o: insmod mach64 failed

The kmsg.txt output is included (as an attachment).

The strange thing is that I can start X now for some reason; it didn't 
work a couple of hours ago. I still get the "Error: flushing vertex 
buffer: return = -16" when running glxgears.

Regards

Peter




<7>[drm:mach64_ioctl] pid=680, cmd=0xc0246400, nr=0x00, dev 0xe200, auth=1
<7>[drm:mach64_flush] pid = 680, device = 0xe200, open_count = 1
<7>[drm:mach64_release] open_count = 1
<7>[drm:mach64_release] pid = 680, device = 0xe200, open_count = 1
<7>[drm:mach64_fasync] fd = -1, device = 0xe200
<7>[drm:mach64_takedown] 
<7>[drm:mach64_ioctl] pid=680, cmd=0xc0246400, nr=0x00, dev 0xe200, auth=1
<7>[drm:mach64_flush] pid = 680, device = 0xe200, open_count = 1
<7>[drm:mach64_release] open_count = 1
<7>[drm:mach64_release] pid = 680, device = 0xe200, open_count = 1
<7>[drm:mach64_fasync] fd = -1, device = 0xe200
<7>[drm:mach64_takedown] 
<7>[drm:mach64_ioctl] pid=680, cmd=0xc0246400, nr=0x00, dev 0xe200, auth=1
<7>[drm:mach64_ioctl] pid=680, cmd=0xc0086401, nr=0x01, dev 0xe200, auth=1
<7>[drm:mach64_ioctl] pid=680, cmd=0x80086410, nr=0x10, dev 0xe200, auth=1
<7>[drm:mach64_mmap] start = 0x30019000, end = 0x3001b000, offset = 0xd591b000
<7>[drm:mach64_vm_open] 0x30019000,0x2000
<7>[drm:mach64_vm_shm_nopage] shm_nopage 0x3001a000
<7>[drm:mach64_ioctl] pid=680, cmd=0xc0086426, nr=0x26, dev 0xe200, auth=1
<7>[drm:mach64_ioctl] pid=680, cmd=0xc0246400, nr=0x00, dev 0xe200, auth=1
<7>[drm:mach64_ioctl] pid=680, cmd=0x20006430, nr=0x30, dev 0xe200, auth=1
<7>[drm:mach64_ioctl] pid=680, cmd=0xc0106403, nr=0x03, dev 0xe200, auth=1
<7>[drm:mach64_irq_busid] 0:16:0 => IRQ 48
<7>[drm:mach64_ioctl] pid=680, cmd=0x80086422, nr=0x22, dev 0xe200, auth=1
<7>[drm:mach64_fasync] fd = 7, device = 0xe200
<7>[drm:mach64_ioctl] pid=680, cmd=0x80086414, nr=0x14, dev 0xe200, auth=1
<7>[drm:mach64_irq_install] mach64_irq_install: irq=48
<7>[drm:mach64_irq_install] Before PREINSTALL: CRTC_INT_CNTL = 0x8874
<7>[drm:mach64_irq_install] After PREINSTALL: CRTC_INT_CNTL = 0x8874
<7>[drm:mach64_irq_install] Before POSTINSTALL: CRTC_INT_CNTL = 0x0840
<7>[drm:mach64_irq_install] After POSTINSTALL: CRTC_INT_CNTL = 0x0942
<7>[drm:mach64_ioctl] pid=680, cmd=0xc00c6419, nr=0x19, dev 0xe200, auth=1
<7>[drm:mach64_mapbufs] 128 buffers, retcode = 0
<7>[drm:mach64_vm_open] 0x30019000,0x2000
<7>[drm:mach64_vm_open] 0x3044a000,0x0020
<7>[drm:mach64_flush] pid = 682, device = 0xe200, open_count = 1
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000a
<7>[drm:mach64_lock] 1 has lock
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000a
<7>[drm:mach64_lock] 1 has lock
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000a
<7>[drm:mach64_lock] 1 has lock
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000a
<7>[drm:mach64_lock] 1 has lock
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000a
<7>[drm:mach64_lock] 1 has lock
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000a
<7>[drm:mach64_lock] 1 has lock
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000a
<7>[drm:mach64_lock] 1 has lock
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000a
<7>[drm:mach64_lock] 1 has lock
<7>[drm:mach64_ioctl] pid=680, cmd=0x8008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:mach64_lock] 1 (pid 680) requests lock (0x0001), flags = 0x000

Re: [Dri-devel] MACH64_BM_GUI_TABLE(_CMD)?

2002-05-12 Thread José Fonseca

On 2002.05.05 19:41 Frank C. Earl wrote:
> ...
> 
> > I plan to build a test case for this, but I would like to hear
> > preliminary opinions about this, in case I'm missing something. Frank,
> > have you tested this before?
> 
> Yes, pretty extensively, but I didn't have time to set up tests for
> spanning multiple pages - we ought to do that one last one before
> committing to the path we're now looking at.


Ok, I've made a test to see if this is possible, and it failed. It's best 
that Leif and Frank make a quick review of the test I made (attached), to 
see if there is any mistake, before we put a stone on this issue.

Basically, I changed mach64_bm_dma_test to allocate a second descriptor 
table and two more data buffers. The first buffer attempts to overwrite 
MACH64_BM_GUI_TABLE to read the second table (which points to a third 
buffer which fills the vertex registers with different values). The second 
buffer is the continuation of the first and has the regular cleanup.

Now I plan to reproduce the hang that we had when trying to draw a 
multitextured triangle without the texture offset specified, to see if the 
engine can recover from the lock or not. Frank, on IRC I got the 
impression that you were going to try this. Did you?

José Fonseca


static int mach64_bm_dma_test( drm_device_t *dev )
{
	drm_mach64_private_t *dev_priv = dev->dev_private;
	dma_addr_t data_handle, data2_handle, data3_handle, table2_handle;
	void *cpu_addr_data, *cpu_addr_data2, *cpu_addr_data3, *cpu_addr_table2;
	u32 data_addr, data2_addr, data3_addr, table2_addr;
	u32 *table, *data, *table2, *data2, *data3;
	u32 regs[3], expected[3];
	int i;

	DRM_DEBUG( "%s\n", __FUNCTION__ );

	table = (u32 *) dev_priv->cpu_addr_table;

	/* FIXME: get a dma buffer from the freelist here rather than using the pool */
	DRM_DEBUG( "Allocating data memory ...\n" );
	cpu_addr_data = pci_pool_alloc( dev_priv->pool, SLAB_ATOMIC, &data_handle );
	cpu_addr_data2 = pci_pool_alloc( dev_priv->pool, SLAB_ATOMIC, &data2_handle );
	cpu_addr_data3 = pci_pool_alloc( dev_priv->pool, SLAB_ATOMIC, &data3_handle );
	cpu_addr_table2 = pci_pool_alloc( dev_priv->pool, SLAB_ATOMIC, &table2_handle );
	if (!cpu_addr_data || !data_handle || !cpu_addr_data2 || !data2_handle ||
	    !cpu_addr_data3 || !data3_handle || !cpu_addr_table2 || !table2_handle) {
		DRM_INFO( "data-memory allocation failed!\n" );
		return -ENOMEM;
	} else {
		data = (u32 *) cpu_addr_data;
		data_addr = (u32) data_handle;
		data2 = (u32 *) cpu_addr_data2;
		data2_addr = (u32) data2_handle;
		data3 = (u32 *) cpu_addr_data3;
		data3_addr = (u32) data3_handle;
		table2 = (u32 *) cpu_addr_table2;
		table2_addr = (u32) table2_handle;
	}

	MACH64_WRITE( MACH64_SRC_CNTL, 0x );

	MACH64_WRITE( MACH64_VERTEX_1_S, 0x );
	MACH64_WRITE( MACH64_VERTEX_1_T, 0x );
	MACH64_WRITE( MACH64_VERTEX_1_W, 0x );

	for (i = 0; i < 3; i++) {
		DRM_DEBUG( "(Before DMA Transfer) reg %d = 0x%08x\n", i,
			   MACH64_READ( (MACH64_VERTEX_1_S + i*4) ) );
	}

	/* 1_90 = VERTEX_1_S, setup 3 sequential reg writes */
	/* use only s,t,w vertex registers so we don't have to mask any results */
	data[0] = cpu_to_le32(0x00020190);
	data[1] = expected[0] = 0x;
	data[2] = expected[1] = 0x;
	data[3] = expected[2] = 0x;
	data[4] = cpu_to_le32(MACH64_BM_GUI_TABLE_CMD);
	data[5] = cpu_to_le32(table2_addr | MACH64_CIRCULAR_BUF_SIZE_16KB);
	data[6] = cpu_to_le32(MACH64_DST_HEIGHT_WIDTH);
	data[7] = cpu_to_le32(0);
	data2[8] = cpu_to_le32(0x006d); /* SRC_CNTL */
	data2[9] = 0x;

	data3[0] = cpu_to_le32(0x00020190);
	data3[1] = 0x;
	data3[2] = 0x;
	data3[3] = 0x;
	data3[4] = cpu_to_le32(0x006d); /* SRC_CNTL */
	data3[5] = 0x;

	DRM_DEBUG( "Preparing table ...\n" );
	table[0] = cpu_to_le32(MACH64_BM_ADDR + APERTURE_OFFSET);
	table[1] = cpu_to_le32(data_addr);
	table[2] = cpu_to_le32(8 * sizeof( u32 ) | 0x4000);
	table[3] = 0;
	table[4] = cpu_to_le32(MACH64_BM_ADDR + APERTURE_OFFSET);
	table[5] = cpu_to_le32(data2_addr);
	table[6] = cpu_to_le32(2 * sizeof( u32 ) | 0x8000 | 0x4000);
	table[7] = 0;
	table2[0] = cpu_to_le32(MACH64_BM_ADDR + APERTURE_OFFSET);
	table2[1] = cpu_to_le32(data3_addr);
	table2[2] = cpu_to_le32(6 * sizeof( u32 ) | 0x8000 | 0x4000);
	table2[3] = 0;

	DRM_DEBUG( "table[0] = 0x%08x\n", table[0] );
	DRM_DEBUG( "table[1] = 0x%08x\n", table[1] );

[Dri-devel] Client context uploads: How to implement them? / Analysis of the Gamma driver

2002-05-12 Thread José Fonseca

As it becomes clearer that for the mach64 the best solution is to fill 
DMA buffers with the context state and the vertex buffers, I've been 
trying to understand how this can be done and how the Gamma driver (which 
has this same model) does it.

The context state is available right at the beginning of running a 
pipeline, and usually DDUpdateHWState is called at the beginning of 
RunPipeline. The problem is that although all state information is 
available, we don't know which part should be uploaded, since other 
clients could dirty the hardware registers in the meantime.

I don't fully understand how the Gamma driver overcomes this. Its 
behavior regarding this is controlled by a macro definition, named 
DO_VALIDATE, that enables a series of VALIDATE_* macros whose purpose I 
couldn't work out. Another thing that caught my attention was the "HACK" 
comment on gammaDDUpdateHWState before gammaEmitHwState - it reminds me 
of a similar comment in mach64, which makes one think that the author 
had a better way to do it in mind. Alan, could you shed some light on 
these two issues, please?

Before I started this little research I had already given some thought to 
how I would do it. One idea that crossed my mind was to reserve some 
space in the DMA buffer to put the context state in before submitting the 
buffer. Of course there would be some DMA buffer waste, but it wouldn't 
be that much, since there is a fairly low number of context registers. 
One thing that holds me back is that I still don't understand how 
multiple clients avoid each other: what is done in parallel, and what is 
done in series...
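As a rough illustration of the reserved-space idea (purely a sketch; `ctx_reg` and `emit_state_then_verts` are made-up names, not the mach64 DRM API):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical layout: each DMA buffer starts with reg/value pairs for
 * the (few) context registers, followed by the vertex data. */
struct ctx_reg { uint32_t reg; uint32_t val; };

/* Write nctx reg/value pairs at the head of buf, then append nverts
 * vertex words.  Returns the number of u32 words used. */
static size_t emit_state_then_verts(uint32_t *buf,
                                    const struct ctx_reg *ctx, size_t nctx,
                                    const uint32_t *verts, size_t nverts)
{
    size_t i, n = 0;
    for (i = 0; i < nctx; i++) {   /* context state in the reserved head */
        buf[n++] = ctx[i].reg;
        buf[n++] = ctx[i].val;
    }
    for (i = 0; i < nverts; i++)   /* vertex data follows, in order */
        buf[n++] = verts[i];
    return n;
}
```

The waste is bounded by two words per context register per buffer, which matches the observation above that the register count is fairly low.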

I would also appreciate any ideas regarding this. This is surely an issue 
I would like to discuss further at the next meeting.

Regards,

José Fonseca

___

Have big pipes? SourceForge.net is looking for download mirrors. We supply
the hardware. You get the recognition. Email Us: [EMAIL PROTECTED]
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Dri-devel] Client context uploads: How to implement them? / Analysis of the Gamma driver

2002-05-12 Thread Alan Hourihane

Jose,

I'd certainly forget using the gamma driver as any kind of template for
any work.

There are many unimplemented features, and multiple clients just don't
work.

Purely a lack of time thing.

Alan.

On Sun, May 12, 2002 at 05:27:26 +0100, José Fonseca wrote:
> As it becomes clearer that for the mach64 the best solution is to fill 
> DMA buffers with the context state and the vertex buffers, I've been 
> trying to understand how this can be done and how the Gamma driver 
> (which has this same model) does it.
> 
> The context state is available right at the beginning of running a 
> pipeline, and usually DDUpdateHWState is called at the beginning of 
> RunPipeline. The problem is that although all state information is 
> available, we don't know which part should be uploaded, since other 
> clients could dirty the hardware registers in the meantime.
> 
> I don't fully understand how the Gamma driver overcomes this. Its 
> behavior regarding this is controlled by a macro definition, named 
> DO_VALIDATE, that enables a series of VALIDATE_* macros whose purpose I 
> couldn't work out. Another thing that caught my attention was the 
> "HACK" comment on gammaDDUpdateHWState before gammaEmitHwState - it 
> reminds me of a similar comment in mach64, which makes one think that 
> the author had a better way to do it in mind. Alan, could you shed some 
> light on these two issues, please?
> 
> Before I started this little research I had already given some thought 
> to how I would do it. One idea that crossed my mind was to reserve some 
> space in the DMA buffer to put the context state in before submitting 
> the buffer. Of course there would be some DMA buffer waste, but it 
> wouldn't be that much, since there is a fairly low number of context 
> registers. One thing that holds me back is that I still don't 
> understand how multiple clients avoid each other: what is done in 
> parallel, and what is done in series...
> 
> I would also appreciate any ideas regarding this. This is surely an 
> issue I would like to discuss further at the next meeting.
> 
> Regards,
> 
> José Fonseca
> 




Re: [Dri-devel] Client context uploads: How to implement them? / Analysis of the Gamma driver

2002-05-12 Thread José Fonseca

Alan,

On 2002.05.12 17:35 Alan Hourihane wrote:
> Jose,
> 
> I'd certainly forget using the gamma driver as any kind of template for
> any work.
> 
> There are many unimplemented features, and multiple clients just don't
> work.
> 

Ok. I wasn't aware of this.

> Purely a lack of time thing.
> 
> Alan.
> 

Thanks,

José Fonseca




Re: [Dri-devel] Client context uploads: How to implement them? / Analysis of the Gamma driver

2002-05-12 Thread Leif Delgass

Jose,

I've been experimenting with this too, and was able to get things going
with state being emitted either from the client or the drm, though I'm
still having lockups and things are generally a bit buggy and unstable.
To try client-side context emits, I basically went back to having each
primitive emit state into the vertex buffer before adding the vertex
data, like the original hack with MMIO.  This works, but may be emitting
state when it's not necessary.  Now I'm trying state emits in the drm, and
to do that I'm just grabbing a buffer from the freelist and adding it to
the queue before the vertex buffer, so things are in the correct order in
the queue.  The downside of this is that buffer space is wasted, since the
state emit uses a small portion of a buffer, but putting state in a
separate buffer from vertex data allows the proper ordering in the queue.  
Perhaps we could use a private set of smaller buffers for this.  At any
rate, I've done the same for clears and swaps, so I have asynchronous DMA
(minus blits) working with gears at least.  I'm still getting lockups with
anything more complicated and there are still some state problems.  The
good news is that I'm finally seeing an increase in frame rate, so there's
light at the end of the tunnel.

Right now I'm using 1MB (half the buffers) as the high water mark, so
there should always be plenty of available buffers for the drm.  To get
this working, I've used buffer aging rather than interrupts.  What I
realized with interrupts is that there doesn't appear to be an interrupt
that can poll fast enough to keep up, since a VBLANK is tied to the
vertical refresh -- which is relatively infrequent.  I'm thinking that it
might be best to start out without interrupts and to use GUI masters for
blits and then investigate using interrupts, at least for blits.  Anyway,
I have an implementation of the freelist and other queues that's
functional, though it might require some locks here and there.  
I'll try to stabilize things more and send a patch for you to look at.
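If it helps, the buffer-aging scheme described above might look roughly like this (a sketch under assumptions; the real code may track ages differently, and the completed age would actually be read back from a register the engine writes):

```c
#include <stdint.h>

/* Each buffer is stamped with a monotonically increasing age when it is
 * queued; it becomes reusable once the engine's completed-age counter
 * has reached that stamp.  All names here are illustrative. */
struct dma_buf { uint32_t age; int pending; };

static uint32_t submit_age;

static void submit_buf(struct dma_buf *buf)
{
    buf->age = ++submit_age;   /* stamp with the submission order */
    buf->pending = 1;
}

/* done_age: the age of the last buffer the engine has finished with. */
static int buf_is_free(struct dma_buf *buf, uint32_t done_age)
{
    if (buf->pending && done_age >= buf->age)
        buf->pending = 0;      /* engine has passed this buffer */
    return !buf->pending;
}
```

This avoids any dependence on interrupt frequency: the freelist just compares stamps whenever it needs a buffer.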

I've also played around some more with AGP textures.  I have hacked up the
performance boxes client-side with clear ioctls, and this helps to see
what's going on.  I'll try to clean that up so I can commit it.  I've
found some problems with the global LRU and texture aging that I'm trying
to fix as well.  I'll post a more detailed summary of that soon.

BTW, as to your question about multiple clients and state:  I think this 
is handled when acquiring the lock.  If the context stamp in the SAREA 
doesn't match the current context after getting the lock, everything is 
marked as dirty to force the current context to emit all its state.  
Emitting state to the SAREA is always done while holding the lock.
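The stamp check might be sketched like this (illustrative names only, not the actual DRI lock macros or SAREA layout):

```c
#include <stdint.h>

#define DIRTY_ALL 0xffffffffu

/* Shared area holds the id of the last context that held the hardware. */
struct sarea   { uint32_t ctx_owner; };
struct context { uint32_t id; uint32_t dirty; };

/* Called right after the hardware lock is acquired. */
static void after_lock(struct context *ctx, struct sarea *sarea)
{
    if (sarea->ctx_owner != ctx->id) {
        ctx->dirty = DIRTY_ALL;     /* someone else touched the hardware */
        sarea->ctx_owner = ctx->id; /* claim it for this context */
    }
}
```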

Regards,

Leif

On Sun, 12 May 2002, José Fonseca wrote:

> As it becomes clearer that for the mach64 the best solution is to fill 
> DMA buffers with the context state and the vertex buffers, I've been 
> trying to understand how this can be done and how the Gamma driver 
> (which has this same model) does it.
> 
> The context state is available right at the beginning of running a 
> pipeline, and usually DDUpdateHWState is called at the beginning of 
> RunPipeline. The problem is that although all state information is 
> available, we don't know which part should be uploaded, since other 
> clients could dirty the hardware registers in the meantime.
> 
> I don't fully understand how the Gamma driver overcomes this. Its 
> behavior regarding this is controlled by a macro definition, named 
> DO_VALIDATE, that enables a series of VALIDATE_* macros whose purpose I 
> couldn't work out. Another thing that caught my attention was the 
> "HACK" comment on gammaDDUpdateHWState before gammaEmitHwState - it 
> reminds me of a similar comment in mach64, which makes one think that 
> the author had a better way to do it in mind. Alan, could you shed some 
> light on these two issues, please?
> 
> Before I started this little research I had already given some thought 
> to how I would do it. One idea that crossed my mind was to reserve some 
> space in the DMA buffer to put the context state in before submitting 
> the buffer. Of course there would be some DMA buffer waste, but it 
> wouldn't be that much, since there is a fairly low number of context 
> registers. One thing that holds me back is that I still don't 
> understand how multiple clients avoid each other: what is done in 
> parallel, and what is done in series...
> 
> I would also appreciate any ideas regarding this. This is surely an 
> issue I would like to discuss further at the next meeting.
> 
> Regards,
> 
> José Fonseca
> 

Re: [Dri-devel] Client context uploads: How to implement them? / Analysis of the Gamma driver

2002-05-12 Thread José Fonseca

Daryll,

On 2002.05.12 19:11 Daryll Strauss wrote:
> On Sun, May 12, 2002 at 05:27:26PM +0100, José Fonseca wrote:
> > I would also appreciate any ideas regarding this. This is surely an
> issue
> > I would like to discuss further on the next meeting.
> 
> You're right, there's no automatic way to know what state has become
> dirty. You need to keep some flags that tell you what state has
> changed when you change clients. Since it is work to keep these flags up
> to date, you have to decide what granularity to keep.
> 
> Any time you don't immediately get the hardware lock you have to check
> your flags to see what changed. In the tdfx driver I kept 3 flags. One
> was that the fifo has changed. That basically meant some other client (X
> server, or 3D app) had written data to the card. I had to resynchronize
> the fifo in that case. The second said that the 3D state was dirty. That
> would only occur when a second 3D app ran (the X server never touched
> the same state) and required that I reset all the 3D parameters. Finally
> there was a texture dirty flag which meant that I had to reload the
> textures on the card.
> 
> The rationale for that breakdown was that X server context switches
> would be common. It has to do input handling for example. So I wanted a
> cheap way to say that the X server had done stuff, but only the fifo
> changed. Next I argued that texture swaps were really expensive. So, if
> two 3D apps were running, but not using textures, it would be nice to
> avoid paging what could be multiple megabytes of textures. Finally that
> meant 3D state was everything else. It wasn't that much data to force to
> a known state, so it wasn't worth breaking that into smaller chunks.
> 
> The three flags were stored in the SAREA, and the first time a client
> changed each of the areas it would store its number into the
> appropriate flag of the SAREA.
> 

I've been snooping in the tdfx driver source. You're referring to the 
TDFXSAREAPriv.*Owner flags in tdfx_context.h and the tdfxGetLock function 
in tdfx_lock.c, in tdfx's Mesa driver.

So let's see if I got this straight (these notes are not specific to 3dfx):

  - The SAREA writes are made with the lock held (as Leif pointed out), 
and the SAREA reflects the state left by the last client that grabbed it.
  - All those "dirty"/"UPDATE" flags are only meaningful within a client. 
Whenever another client got in the way, the rule is to upload whatever 
changed - "what" in particular depends on the granularity gauge you 
mentioned.
  - For this to work with DMA too, the buffers must be sent in exactly the 
same order they are received by the DRM.
  - The DRM knows nothing about this: it's up to the client to make sure 
that the information in the SAREA is up to date. (The 3dfx is an exception 
since there is no DMA, so the state is sent to the card without DRM 
intervention, via the Glide library.)
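For what it's worth, the three-flag scheme could be modelled like this (field names are made up; the real layout is TDFXSAREAPriv in tdfx_context.h):

```c
#include <stdint.h>

/* One owner slot per granularity level, as in Daryll's description. */
struct sarea_owners { uint32_t fifo_owner, state3d_owner, tex_owner; };

enum { DIRTY_FIFO = 1, DIRTY_3D = 2, DIRTY_TEX = 4 };

/* After taking the lock: compare each owner slot with our id, claim
 * them, and return what must be re-synced or re-uploaded. */
static unsigned check_owners(struct sarea_owners *s, uint32_t me)
{
    unsigned dirty = 0;
    if (s->fifo_owner != me)    { dirty |= DIRTY_FIFO; s->fifo_owner = me; }
    if (s->state3d_owner != me) { dirty |= DIRTY_3D;   s->state3d_owner = me; }
    if (s->tex_owner != me)     { dirty |= DIRTY_TEX;  s->tex_owner = me; }
    return dirty;
}
```

The point of the split is that a client (like the X server) that only touches the fifo would overwrite only fifo_owner, so a returning 3D client pays just the cheap fifo resync instead of a full state and texture upload.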

> Just a small expansion on this. The texture management solution is
> weak. If two clients each had a small texture, it would be quite
> possible that they both would have fit in texture memory and no texture
> swapping would be required. Doing that would have required more advanced
> texture management that realized certain regions were in use by one
> client or another. We still don't have that yet. In a grand scheme
> regions of texture memory would be shared between 2D and multiple 3D
> clients.

This would mean that the texture management would have to be done by the 
DRM or, perhaps even better, X. Surely something to look into in the future.

Thanks for your reply. It was very informative.

José Fonseca




Re: [Dri-devel] Client context uploads: How to implement them? / Analysis of the Gamma driver

2002-05-12 Thread José Fonseca

Leif,

On 2002.05.12 19:15 Leif Delgass wrote:
> Jose,
> 
> I've been experimenting with this too, and was able to get things going
> with state being emitted either from the client or the drm, though I'm
> still having lockups and things are generally a bit buggy and unstable.
> To try client-side context emits, I basically went back to having each
> primitive emit state into the vertex buffer before adding the vertex
> data, like the original hack with MMIO.  This works, but may be emitting
> state when it's not necessary.

I don't see how that would happen: only the dirty context was updated 
before.

> Now I'm trying state emits in the drm, and

I think that doing the emits in the DRM gives us more flexibility than in 
the client.

> to do that I'm just grabbing a buffer from the freelist and adding it to
> the queue before the vertex buffer, so things are in the correct order in
> the queue.  The downside of this is that buffer space is wasted, since
> the
> state emit uses a small portion of a buffer, but putting state in a
> separate buffer from vertex data allows the proper ordering in the queue.

Is it a requirement that the addresses stored in the descriptor tables 
must be aligned on some boundary? If not, we could use a single buffer to 
hold successive context emits, and the first entry of each descriptor 
table would point to a section of this buffer. This way there wouldn't be 
any waste of space, and a single buffer would suffice for a large number 
of DMA buffers.
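If the alignment question has a favourable answer, the bookkeeping might look like this (the 4-word descriptor layout mirrors the test code earlier in the thread, but `bm_desc`, `STATE_SLICE` and the flag parameter are illustrative, not the real driver definitions):

```c
#include <stdint.h>
#include <stddef.h>

/* 4-word bus-master descriptor, as in the mach64 DMA test code. */
struct bm_desc { uint32_t dst, src, size_flags, next; };

#define STATE_SLICE 64   /* bytes reserved per context emit (assumed) */

/* First descriptor: a slice of one long-lived shared state buffer.
 * Second descriptor: the client's vertex buffer (eol_flag ends the list). */
static void build_table(struct bm_desc table[2],
                        uint32_t reg_aperture,
                        uint32_t state_buf_addr, unsigned slice,
                        uint32_t state_bytes,
                        uint32_t vert_addr, uint32_t vert_bytes,
                        uint32_t eol_flag)
{
    table[0].dst = reg_aperture;                          /* register window */
    table[0].src = state_buf_addr + slice * STATE_SLICE;  /* shared buffer slice */
    table[0].size_flags = state_bytes;
    table[0].next = 0;

    table[1].dst = reg_aperture;
    table[1].src = vert_addr;                             /* vertex data */
    table[1].size_flags = vert_bytes | eol_flag;
    table[1].next = 0;
}
```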

> 
> Perhaps we could use a private set of smaller buffers for this.  At any
> rate, I've done the same for clears and swaps, so I have asynchronous DMA
> (minus blits) working with gears at least.

This is another way too. I don't know if we are limited to the kernel 
memory allocation granularity, so unless this is already done by the pci_* 
API we might need to split buffers into smaller sizes.

> I'm still getting lockups
> with
> anything more complicated and there are still some state problems.  The
> good news is that I'm finally seeing an increase in frame rate, so
> there's
> light at the end of the tunnel.

My time is limited, and I can't spend more than 3 hrs per day on this, but 
I think that after the meeting tomorrow we should try to keep the cvs in 
sync, even if it's less stable - it's a development branch after all and 
its stability is not as important as making progress.

> 
> Right now I'm using 1MB (half the buffers) as the high water mark, so
> there should always be plenty of available buffers for the drm.  To get
> this working, I've used buffer aging rather than interrupts.

Which register do you use to keep track of the buffers' age?

> What I
> realized with interrupts is that there doesn't appear to be an interrupt
> that can poll fast enough to keep up, since a VBLANK is tied to the
> vertical refresh -- which is relatively infrequent.  I'm thinking that it
> might be best to start out without interrupts and to use GUI masters for
> blits and then investigate using interrupts, at least for blits.

That had crossed my mind before. I think it may be a good idea too.

> Anyway,
> I have an implementation of the freelist and other queues that's
> functional, though it might require some locks here and there.
> I'll try to stabilize things more and send a patch for you to look at.
> 

Looking forward to that.

> I've also played around some more with AGP textures.  I have hacked up
> the
> performance boxes client-side with clear ioctls, and this helps to see
> what's going on.  I'll try to clean that up so I can commit it.  I've
> found some problems with the global LRU and texture aging that I'm trying
> to fix as well.  I'll post a more detailed summary of that soon.
> 

What can I say? Great work, Leif! =)

> BTW, as to your question about multiple clients and state:  I think this
> is handled when acquiring the lock.  If the context stamp in the SAREA
> doesn't match the current context after getting the lock, everything is
> marked as dirty to force the current context to emit all its state.
> Emitting state to the SAREA is always done while holding the lock.
> 
I hadn't realized that before. Thanks for the info.

Regards,

José Fonseca




[Dri-devel] Maya 4 on MGA G400

2002-05-12 Thread German Gomez Garcia

Hello,

I would like to know if anybody was able to run Maya 4 on a Matrox G400
using DRI. It seems that Maya only supports 3DLabs and nVidia cards. Any
clue about running it on a G400?

Best regards,

- german

-- 
Send email with "SEND GPG KEY" as subject to receive my GnuPG public key.




Re: [Dri-devel] Client context uploads: How to implement them? / Analysis of the Gamma driver

2002-05-12 Thread Keith Whitwell

"José Fonseca" wrote:
> 
> Daryll,
> 
> On 2002.05.12 19:11 Daryll Strauss wrote:
> > On Sun, May 12, 2002 at 05:27:26PM +0100, José Fonseca wrote:
> > > I would also appreciate any ideas regarding this. This is surely an
> > issue
> > > I would like to discuss further on the next meeting.
> >
> > You're right, there's no automatic way to know what state has become
> > dirty. You need to keep some flags that tell you what state has
> > changed when you change clients. Since it is work to keep these flags up
> > to date, you have to decide what granularity to keep.
> >
> > Any time you don't immediately get the hardware lock you have to check
> > your flags to see what changed. In the tdfx driver I kept 3 flags. One
> > was that the fifo has changed. That basically meant some other client (X
> > server, or 3D app) had written data to the card. I had to resynchronize
> > the fifo in that case. The second said that the 3D state was dirty. That
> > would only occur when a second 3D app ran (the X server never touched
> > the same state) and required that I reset all the 3D parameters. Finally
> > there was a texture dirty flag which meant that I had to reload the
> > textures on the card.
> >
> > The rationale for that breakdown was that X server context switches
> > would be common. It has to do input handling for example. So I wanted a
> > cheap way to say that the X server had done stuff, but only the fifo
> > changed. Next I argued that texture swaps were really expensive. So, if
> > two 3D apps were running, but not using textures, it would be nice to
> > avoid paging what could be multiple megabytes of textures. Finally that
> > meant 3D state was everything else. It wasn't that much data to force to
> > a known state, so it wasn't worth breaking that into smaller chunks.
> >
> > The three flags were stored in the SAREA, and the first time a client
> > changed each of the areas it would store its number into the
> > appropriate flag of the SAREA.
> >
> 
> I've been snooping in the tdfx driver source. You're referring to the
> TDFXSAREAPriv.*Owner flags in tdfx_context.h and the tdfxGetLock function
> in tdfx_lock.c, in the tdfx's Mesa driver.
> 
> So let's see if I got this straight (these notes are not specific to 3dfx):
> 
>   - The SAREA writes are made with the lock (as Leif pointed), and
> reflects the state left by the last client that grabed it.
>   - All those "dirty"/"UPDATE" flags are only meaningfull within a client.
> Whenever another client got into the way, the word is upload whatever
> changed  - "what" in particular depends of the granularity gauge you
> mentioned.
>   - For this to work with DMA too the buffers must be sent exactly in the
> same order they are received on the DRM.
>   - The DRM knows nothing about this: it's up to the client to make sure
> that the information in SAREA is up to date. (The 3dfx is an exception
> since there is no DMA so the state is sent to the card without DRM
> intervention, via the Glide library).
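The per-client check summarized above can be sketched roughly like this. The struct and field names (`sarea_priv`, `fifoOwner`, `got_lock`) are invented for illustration, modelled loosely on the tdfx `*Owner` fields, and are not the actual driver layout:

```c
#include <assert.h>

/* Hypothetical SAREA owner flags, one per granularity level
 * (modelled on the tdfx *Owner idea, not the real structures). */
struct sarea_priv {
    int fifoOwner;     /* last context that touched the fifo */
    int ctxOwner;      /* last context that changed 3D state */
    int texOwner;      /* last context that (re)loaded textures */
};

struct client_ctx {
    int id;            /* this client's context number */
    int emitted_state; /* counts state re-emits, for illustration only */
};

/* Called right after grabbing the hardware lock: if another context
 * got in the way since we last held it, re-upload whatever it dirtied
 * and stamp our own id into the corresponding flag. */
static void got_lock(struct client_ctx *c, struct sarea_priv *s)
{
    if (s->fifoOwner != c->id) {
        /* ... resynchronize the fifo ... */
        s->fifoOwner = c->id;
    }
    if (s->ctxOwner != c->id) {
        /* ... re-emit all 3D state ... */
        c->emitted_state++;
        s->ctxOwner = c->id;
    }
    if (s->texOwner != c->id) {
        /* ... reload our textures ... */
        s->texOwner = c->id;
    }
}
```

Note that the granularity trade-off Frank described lives entirely in how many owner flags there are; the check itself is the same cheap comparison per flag.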
> 
> > Just a small expansion on this. The texture management solution is
> > weak. If two clients each had a small texture, it would be quite
> > possible that they both would have fit in texture memory and no texture
> > swapping would be required. Doing that would have required more advanced
> > texture management that realized certain regions were in use by one
> > client or another. We still don't have that yet. In a grand scheme
> > regions of texture memory would be shared between 2D and multiple 3D
> > clients.

We have this for other drivers - there's a linked list of regions in the sarea
that lets drivers know, for each 'chunk' of texture memory, whether it has been
stolen by another client or not.

Keith

___

Have big pipes? SourceForge.net is looking for download mirrors. We supply
the hardware. You get the recognition. Email Us: [EMAIL PROTECTED]
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Dri-devel] Texture management (Was: Client context uploads...)

2002-05-12 Thread Leif Delgass

On Sun, 12 May 2002, Keith Whitwell wrote:

> > > Just a small expansion on this. The texture management solution is
> > > weak. If two clients each had a small texture, it would be quite
> > > possible that they both would have fit in texture memory and no texture
> > > swapping would be required. Doing that would have required more advanced
> > > texture management that realized certain regions were in use by one
> > > client or another. We still don't have that yet. In a grand scheme
> > > regions of texture memory would be shared between 2D and multiple 3D
> > > clients.
> 
> We have this for other drivers - there's a linked list of regions in the sarea
> that lets drivers know, for each 'chunk' of texture memory, whether it has been
> stolen by another client or not.
> 

Good timing, I was just composing a message about this.  Maybe you can 
help me...

In working on AGP texturing for mach64, I'm starting from the Rage128
code, which seems to have some problems (though the texture aging problem
could affect other drivers).  My understanding is that textures in the
global LRU are marked as "used" and aged so that placeholders can be
inserted in a context's local LRU when another context steals its texture
memory.  The problem is that nowhere are these texture regions released by
the context using them.  The global LRU is only reset when the heap is
full.  So the heap has to fill up before placeholders begin to get swapped
out.  I've seen this when running multiple contexts at once, or repeatedly
starting, stopping, and restarting a single app.  This isn't a huge
problem with a single heap, but with an AGP heap it means that card memory
is effectively leaked.  Once the card memory global LRU is nearly filled
in the sarea with regions marked as "used", newly started apps will start
out using only AGP memory (with the r128 algorithm).  Only if the app uses
enough memory to fill AGP will it start to swap out the placeholders from
the local LRU and use card memory.

One possible solution I'm playing with would be to use a context
identifier on texture regions in the global LRU rather than a boolean
"in_use" (similar to the ctxOwner identifier used for marking the last
owner of the sarea's state information).  Then when a context swaps out or
destroys textures, it can free regions that it owns from the global LRU
and age them so that other contexts will swap out their corresponding
placeholders.  The downside is an increased penalty for swapping textures.  
Another open problem: how do we reclaim "leaked" regions when an app doesn't
exit normally?
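The proposed change can be sketched as follows. This is a toy model, not the r128 heap code: the entry layout, `claim_region`, and `release_context` are invented names, and the real global LRU lives in the sarea rather than in private memory:

```c
#include <assert.h>

#define MAX_REGIONS 8

/* Hypothetical global-LRU entry: instead of a boolean "in_use",
 * store which context owns the region (0 = free), as proposed. */
struct tex_region {
    int owner;  /* context id, 0 if free */
    int age;    /* global age stamp when last touched */
};

struct tex_heap {
    struct tex_region regions[MAX_REGIONS];
    int age;    /* monotonically increasing stamp */
};

/* Claim a free region for context `ctx`; return its index or -1. */
static int claim_region(struct tex_heap *h, int ctx)
{
    for (int i = 0; i < MAX_REGIONS; i++) {
        if (h->regions[i].owner == 0) {
            h->regions[i].owner = ctx;
            h->regions[i].age = ++h->age;
            return i;
        }
    }
    return -1;
}

/* On context teardown (or texture destroy), release everything the
 * context owns and bump the age so other contexts notice the change
 * and drop their corresponding local placeholders. */
static int release_context(struct tex_heap *h, int ctx)
{
    int freed = 0;
    for (int i = 0; i < MAX_REGIONS; i++) {
        if (h->regions[i].owner == ctx) {
            h->regions[i].owner = 0;
            h->regions[i].age = ++h->age;
            freed++;
        }
    }
    return freed;
}
```

The key difference from the boolean scheme is `release_context`: with only an `in_use` flag there is no way to tell which regions a departing context held, so nothing can be freed until the heap fills up.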

I've also found what looks to me like a bug in the Rage128 driver in
UploadTexImages.  The beginning of the function does this:

   /* Choose the heap appropriately */
   heap = t->heap = R128_CARD_HEAP;
   if ( !rmesa->r128Screen->IsPCI &&
        t->totalSize > rmesa->r128Screen->texSize[heap] ) {
      heap = t->heap = R128_AGP_HEAP;
   }

   /* Do we need to eject LRU texture objects? */
   if ( !t->memBlock ) {

      /* ... find a memBlock, swapping and/or changing heaps if necessary ... */

   }

   /* ... update LRU and upload dirty images ... */

The problem I see here is that setting t->heap before checking for an
existing memBlock could potentially lead to a situation where t->heap !=
t->memBlock->heap.  So in my code I've deferred setting t->heap = heap until
inside the 'if' block, where we know there is no memBlock.  Again, this
situation can only occur if there is an AGP heap.  Is there a reason for
this behavior in the Rage128 code that I'm missing, or is this a bug?
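The reordering described can be sketched like this. It is a simplified stand-in for the R128 function, with `choose_heap` and the flattened struct fields invented for illustration; only the ordering of the `t->heap` assignment is the point:

```c
#include <assert.h>
#include <stddef.h>

enum { CARD_HEAP = 0, AGP_HEAP = 1, NR_HEAPS = 2 };

struct mem_block { int heap; };

struct tex_obj {
    int heap;
    int totalSize;
    struct mem_block *memBlock;
};

/* Pick a heap, but only commit it to t->heap once we know we are
 * actually (re)allocating -- deferring the assignment preserves the
 * invariant t->heap == t->memBlock->heap for already-placed textures. */
static void choose_heap(struct tex_obj *t, int is_pci,
                        const int tex_size[NR_HEAPS])
{
    int heap = CARD_HEAP;
    if (!is_pci && t->totalSize > tex_size[CARD_HEAP])
        heap = AGP_HEAP;

    if (!t->memBlock) {
        t->heap = heap;
        /* ... find a memBlock in `heap`, swapping if necessary ... */
    }
    /* else: leave t->heap alone; it already matches t->memBlock->heap */
}
```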

-- 
Leif Delgass 
http://www.retinalburn.net






Re: [Dri-devel] Client context uploads: How to implement them? /Analysis of the Gamma driver

2002-05-12 Thread Leif Delgass

On Sun, 12 May 2002, José Fonseca wrote:

> Leif,
> 
> On 2002.05.12 19:15 Leif Delgass wrote:
> > Jose,
> > 
> > I've been experimenting with this too, and was able to get things going
> > with state being emitted either from the client or the drm, though I'm
> > still having lockups and things are generally a bit buggy and unstable
> > still.  To try client side context emits, I basically went back to having
> > each primitive emit state into the vertex buffer before adding the vertex
> > data, like the original hack with MMIO.  This works, but may be emitting
> > state when it's not necessary.
> 
> I don't see how that would happen: only the dirty context was updated 
> before.

It didn't really make sense to me as I was writing this, to tell the
truth. :)  I just had it in my head that this way was a hack. I guess it
was just the client-side register programming that made it "evil" before.
At any rate, as you say, I think doing this in the drm is probably better 
anyway.
 
> > Now I'm trying state emits in the drm, and
> 
> I think that doing the emits in the DRM gives us more flexibility than
> doing them in the client.
> 
> > to do that I'm just grabbing a buffer from the freelist and adding it to
> > the queue before the vertex buffer, so things are in the correct order in
> > the queue.  The downside of this is that buffer space is wasted, since
> > the
> > state emit uses a small portion of a buffer, but putting state in a
> > separate buffer from vertex data allows the proper ordering in the queue.
> 
> Is it a requirement that the addresses stored in the descriptor tables 
> must be aligned on some boundary? If not, we could use a single buffer to 
> hold successive context emits, and the first entry of each descriptor 
> table would point to a section of this buffer. This way there wouldn't be 
> any waste of space, and a single buffer would suffice for a large number 
> of DMA buffers.

I think the data tables need to be aligned on a 4K boundary, since that's
the maximum size, but I'm not positive.  I know for sure that the
descriptor table has to be aligned to its size.
 
> > 
> > Perhaps we could use a private set of smaller buffers for this.  At any
> > rate, I've done the same for clears and swaps, so I have asynchronous DMA
> > (minus blits) working with gears at least.
> 
> This is another way too. I don't know if we are limited to the kernel 
> memory allocation granularity, so unless this is already handled by the 
> pci_* API we might need to split buffers into smaller sizes.

The pci_pool interface is intended for this sort of small buffer, I
think.  We just tell it to give us 4K buffers and allocate as many as we
need with pci_pool_alloc.  That would give us buffers one quarter the size
of a full vertex buffer and still satisfy alignment constraints.  This
would also be more secure, since these buffers would be private to the
drm.  We could use these to terminate each DMA pass as well.  That's one
thing that needs more investigation, what registers need to be reset at
the end of a DMA pass?  Right now I'm only writing src_cntl to disable the
bus mastering bit.  Bus_cntl isn't fifo-ed, so it doesn't make sense to me
to set it, even though the utah driver did.  The only drawback to using 
private buffers is that it complicates the freelist.
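The sizing and alignment requirement can be illustrated with a user-space stand-in. In the 2.4 kernel the real allocation would go through `pci_pool_create()`/`pci_pool_alloc()`, which hand out fixed-size, properly aligned DMA-able chunks; here `posix_memalign` merely simulates the alignment guarantee, and `pool_alloc` is an invented name:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BUF_SIZE  4096   /* one descriptor-table-sized private buffer */
#define BUF_ALIGN 4096   /* aligned to its size, per the constraint above */

/* User-space stand-in for allocating from a pci_pool of small buffers:
 * every allocation is BUF_SIZE bytes and BUF_ALIGN-aligned. */
static void *pool_alloc(void)
{
    void *p = NULL;
    if (posix_memalign(&p, BUF_ALIGN, BUF_SIZE) != 0)
        return NULL;
    return p;
}
```

Fixed size plus natural alignment is what makes a pool attractive here: the 4K chunks satisfy the descriptor-table constraint automatically, with no per-allocation alignment bookkeeping.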

> > I'm still getting lockups
> > with
> > anything more complicated and there are still some state problems.  The
> > good news is that I'm finally seeing an increase in frame rate, so
> > there's
> > light at the end of the tunnel.
> 
> My time is limited, and I can't spend more than 3 hrs per day on this, but 
> I think that after the meeting tomorrow we should try to keep the cvs on 
> sync, even if it's less stable - it's a development branch after all and 
> its stability is not as important as making progress.

OK, I'll try to check in more often.  I've been trying a lot of different
things, so I just need to clean things up a bit to minimize the cruft.  I
don't want to check in failed experiments. ;)  For a while the branch is 
likely to cause frequent lockups.  I'm trying to at least get pseudo-DMA 
stable again.

> > 
> > Right now I'm using 1MB (half the buffers) as the high water mark, so
> > there should always be plenty of available buffers for the drm.  To get
> > this working, I've used buffer aging rather than interrupts.
> 
> Which register do you use to keep track of the buffers age?

I'm using the PAT_REG[0,1] registers since they aren't needed for 3D.  As
long as we make sure that DMA is idle and the register contents are
saved/restored when switching contexts between 2D/3D, I think this should
work.  The DDX only uses them for mono pattern fills in the XAA routine,
and it saves and restores them, so we need to do the same.  I've done that
in the Enter/LeaveServer in atidri.c.  We should probably also modify the
DDX's Sync routine for XAA to use the drm idle ioctl.  I think we'll need
to make sure that the DMA queue is flushed before checking fo
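The buffer-aging scheme described above can be sketched as follows. This is a user-space simulation: the scratch register, the retire model, and all names (`pat_reg0`, `submit`, `buf_done`) are simplifying assumptions, with the real write happening through a fifo-ed register command appended to the buffer:

```c
#include <assert.h>
#include <stdint.h>

/* Simulated scratch register (PAT_REG0 in the scheme above): the card
 * writes each buffer's age stamp here as it retires the buffer. */
static uint32_t pat_reg0;

struct dma_buf {
    uint32_t age;   /* stamp assigned at submission time */
};

static uint32_t next_age = 1;

static void submit(struct dma_buf *b)
{
    b->age = next_age++;
    /* ... append a register write of b->age to PAT_REG0 at the end
     *     of this buffer's descriptor entries ... */
}

/* Hardware side, simulated: retiring a buffer performs the queued write. */
static void hw_retire(const struct dma_buf *b)
{
    pat_reg0 = b->age;
}

/* A buffer may be recycled once the card has passed its stamp --
 * no interrupt needed, just a register read. */
static int buf_done(const struct dma_buf *b)
{
    return pat_reg0 >= b->age;
}
```

Because stamps increase monotonically and buffers retire in submission order, a single register read answers "is this buffer free?" for every outstanding buffer at once.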

Re: [Dri-devel] Client context uploads: How to implement them? / Analysis of the Gamma driver

2002-05-12 Thread Frank C. Earl

On Sunday 12 May 2002 01:15 pm, Leif Delgass wrote:

> this working, I've used buffer aging rather than interrupts.  What I
> realized with interrupts is that there doesn't appear to be an interrupt
> that can poll fast enough to keep up, since a VBLANK is tied to the
> vertical refresh -- which is relatively infrequent.  

Depends on what you're trying to do with it.  

If you're polling for completion of a pass for a given caller, it may be 
problematic.  Should we be doing that with this chip, though? 

I had envisioned a scheme in which the clients didn't drive DMA, they simply 
submitted things to be rendered out by the chip's DMA and the DRM managed 
details of keeping track of completion, etc.  The first pass of the code I 
was working towards was going to rely on a lack of free buffers to throttle 
the clients accordingly.  If that didn't work as I had hoped, I was looking 
to use async notifications to tell clients the request had been processed.

In that model, the only things needing to deal with locks are the DRM engine 
code submitting the DMAable buffers or blits (run by a separate group of 
code...) and any clients/code directly manipulating the chip.  All DMA engine 
clients do is ask for buffers, fill them, and then queue them up for 
submission to the chip's gui-master engine.  The interrupt handler takes care 
of the rest.  In that picture, you're not submitting one group of buffers for 
one client to the chip; you're submitting as many buffers as you think you 
can get away with at 60+ Hz (something like 1-2MB, from prior experience with 
the Utah-GLX code...) from the queue submitted by all clients.  The DRM lock 
scheme would give enough flexibility to allow the X server to squeeze in 
against the DRM DMA handler and vice-versa so that screen updates and other 
stuff could be managed.

Unfortunately, my lack of available time precluded me from coding in more 
than the base framework for that scheme.  
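The throttling part of this scheme reduces to a simple invariant, sketched here as a toy single-threaded model (names and the blocking-as-failure simplification are invented; a real client would sleep or retry rather than fail):

```c
#include <assert.h>

#define NBUFS 4

/* Toy model: clients only take buffers from a freelist and queue them
 * for the chip; an empty freelist throttles them until the interrupt
 * handler submits/retires a batch and refills the freelist. */
static int free_count = NBUFS;
static int queued = 0;

static int client_get_buffer(void)
{
    if (free_count == 0)
        return 0;           /* throttled: a real client would block/retry */
    free_count--;
    return 1;
}

static void client_queue_buffer(void)
{
    queued++;
}

/* Interrupt handler: submit everything queued to the gui-master engine,
 * then return the retired buffers to the freelist. */
static void drm_irq_batch(void)
{
    free_count += queued;
    queued = 0;
}
```

The appeal of this design is that clients need no knowledge of DMA completion at all; back-pressure from the freelist is the whole flow-control mechanism.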

-- 
Frank Earl




[Dri-devel] radeon CPU usage

2002-05-12 Thread Ian Molton

Hi

I don't know if it's 'normal', but Quake3 seems to use about 50% CPU time
(framesync on) whilst running, with another 50% spent in the kernel.

Any ideas why?




[Dri-devel] Radeon 7500 + AMD761 Locks machine

2002-05-12 Thread Greg T Hill

Hello 
I posted this last month on the dri-users list in hopes that I was doing 
something wrong and would find out how to fix it. I eventually did fix it by 
installing a matrox g450, and everything works.  This pretty well rules out 
any configuration issues and narrows it down to: radeon 7500 + amd761 + dri 
= not good :-(  I did a lot of googling for references to this and found a 
reference to a fix in the ac-kernel branch for lockups caused by this 
combination, so I patched my 2.4.18 kernel with the ac patches, with no luck.

What happens is when dri is enabled in XF86Config, as soon as X starts the 
machine goes dead: monitor, keyboard, even the NIC, all gone.  A posthumous 
examination of /var/log/XFree86.0.log shows that everything is loaded 
successfully; other than the driver references, it is the same as the log 
generated with the Matrox card. X is happy, it just doesn't seem to know 
it's dead.

Am I the only one with this problem? My motherboard is an ASUS A7M266 with 
an Athlon XP 1600+ and a recent BIOS upgrade (January).  This combo worked 
with an Nvidia TNT2, and now the Matrox.  I would like to submit a bug 
report but don't know how to generate any more useful information than I 
have given here.

Thanks
-- 
Greg T Hill
---
Today is Pungenday, the 60th day of Discord in the YOLD 3168

